
Dictionary Generator

To facilitate the production of our dictionaries and other resources, we have developed and maintain a suite of applications and software libraries. Its key strength is the ability to generate a wide variety of resources and output formats from templates.

The dictionary generation software was first conceived by Ted Young as a way to generate a PDF dictionary from Mark Vygus's word list. Being an extremely lazy person, Ted refused to accept any solution that required manual labor to create or structure his dictionaries. With some creativity and a lot of high-quality open source libraries, he was able to create the initial implementation and, subsequently, a dictionary, which led to the formation of this group.

All source code is released under the GNU Lesser General Public License and is free for use in commercial and noncommercial products with attribution.

Architecture Overview

The software's architecture is extremely powerful and flexible, allowing integration at various points in the processing chain. In fact, its templating system is so powerful that one can develop entirely new resources without programming.

Word lists are converted into lists of Java objects, which are processed and sorted. These objects are then converted to XML and transformed with an XSLT stylesheet that can generate any type of output. For example, when generating PDFs, the stylesheet produces XSL-FO, which is then rendered as a PDF.
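The objects-to-XML-to-XSLT pipeline can be sketched with the standard JAXP APIs (`javax.xml.parsers`, `javax.xml.transform`). This is a minimal illustration under stated assumptions, not the project's actual code: the `Entry` record, the `dictionary`/`entry` element names and the toy stylesheet are all hypothetical. The stylesheet here emits plain text; a PDF template would emit XSL-FO instead, which a separate renderer turns into a PDF. Records and text blocks require Java 16 or later.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class PipelineSketch {
    // Hypothetical word-list entry (MdC-style transliteration plus translation).
    record Entry(String transliteration, String translation) {}

    // Hypothetical toy template: one "translit: translation" line per entry.
    static final String TEXT_XSLT = """
        <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
          <xsl:output method="text"/>
          <xsl:template match="/dictionary">
            <xsl:for-each select="entry">
              <xsl:value-of select="@translit"/>
              <xsl:text>: </xsl:text>
              <xsl:value-of select="."/>
              <xsl:text>&#10;</xsl:text>
            </xsl:for-each>
          </xsl:template>
        </xsl:stylesheet>
        """;

    // Step 1: convert the (already sorted) Java objects into an XML DOM.
    static Document toXml(List<Entry> entries) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("dictionary");
        doc.appendChild(root);
        for (Entry e : entries) {
            Element entry = doc.createElement("entry");
            entry.setAttribute("translit", e.transliteration());
            entry.setTextContent(e.translation());
            root.appendChild(entry);
        }
        return doc;
    }

    // Step 2: transform the XML with an XSLT stylesheet; the stylesheet alone decides the output format.
    static String transform(Document doc, String xslt) throws Exception {
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    // Sort a copy of the word list, then run both steps with the toy template.
    static String render(List<Entry> wordList) {
        List<Entry> sorted = new ArrayList<>(wordList);
        sorted.sort(Comparator.comparing(Entry::transliteration));
        try {
            return transform(toXml(sorted), TEXT_XSLT);
        } catch (Exception e) {
            throw new IllegalStateException("pipeline failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.print(render(List.of(
                new Entry("nfr", "good, beautiful"),
                new Entry("anx", "life"))));
    }
}
```

Because the output format lives entirely in the stylesheet, swapping `TEXT_XSLT` for a different template changes the result without touching the Java code, which is what allows new resources to be built without programming.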

Since most output formats cannot handle hieroglyphic text natively, a series of converters has been implemented to render hieroglyphic text in common image formats. In the PDF example, hieroglyphic text is converted to SVG. The library we use to render XSL-FO to PDF also understands SVG and preserves it as vector markup to ensure high-quality printing.
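One way to structure such a series of converters is a small registry keyed by image media type, so each output format picks the converter whose images it can embed. The sketch below assumes this design; `GlyphConverter`, `PlaceholderSvgConverter` and the registry are hypothetical names, and the placeholder does not lay out real glyphs (an actual converter would delegate to a renderer such as JSesh). For the PDF case, the SVG is wrapped in XSL-FO's standard `fo:instream-foreign-object` element so the FO renderer can keep it as vector markup.

```java
import java.util.HashMap;
import java.util.Map;

public class ConverterSketch {
    // Hypothetical contract: turn hieroglyphic text (an MdC string) into one image format.
    interface GlyphConverter {
        String mediaType();
        String convert(String mdc);
    }

    // Hypothetical placeholder: writes the MdC code as SVG text instead of rendering glyphs.
    static final class PlaceholderSvgConverter implements GlyphConverter {
        public String mediaType() { return "image/svg+xml"; }
        public String convert(String mdc) {
            return "<svg xmlns=\"http://www.w3.org/2000/svg\" width=\"100\" height=\"20\">"
                    + "<text x=\"0\" y=\"15\">" + escape(mdc) + "</text></svg>";
        }
    }

    // Registry of converters, looked up by the media type an output format can embed.
    static final Map<String, GlyphConverter> REGISTRY = new HashMap<>();

    static void register(GlyphConverter c) {
        REGISTRY.put(c.mediaType(), c);
    }

    // Embed SVG inline in XSL-FO so the FO-to-PDF renderer can keep it as vectors.
    static String toFoInline(String svg) {
        return "<fo:instream-foreign-object xmlns:fo=\"http://www.w3.org/1999/XSL/Format\">"
                + svg + "</fo:instream-foreign-object>";
    }

    // Minimal XML escaping; MdC uses '&' for ligatures, so this is required.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        register(new PlaceholderSvgConverter());
        String svg = REGISTRY.get("image/svg+xml").convert("A1-B1");
        System.out.println(toFoInline(svg));
    }
}
```

Adding support for another image format would mean registering one more `GlyphConverter` implementation, leaving the rest of the pipeline unchanged.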

For a more detailed discussion of this architecture and how it was composed with a minimal amount of code, we highly recommend reading the Architecture page.

Project Status

The entire code base was refactored to take this project from a proof of concept to something that can function in the real world and be as flexible as we had envisioned. As such, the code currently available is considered snapshot quality. For this reason we are not yet offering binaries, and the JavaDoc is empty. We hope to have these issues resolved by the end of the month.

Resources

JavaDoc: http://gosp.sourceforge.net/projects/dictionary/apidocs/index.html

Unfortunately, the JavaDoc does not include any API documentation at this point. Documenting this code base is one of our highest priorities.

This JavaDoc includes documentation for three components: the dictionary generator, a JSesh integration library, and the HieroWord parsing and converter library.

Source Code (SVN): https://sourceforge.net/svn/?group_id=230208

You may browse the source code here.

Getting Involved

We can definitely use some help. Once the source code is made available, there will be plenty of room for improvement, especially in the form of enhancements and new features. There is also some work to be done on the HieroWord-to-MdC converter to handle some edge cases.

Perhaps the most glamorous and rewarding area in which to help is the development of new templates, whether to improve the layout of existing documents or to generate entirely new resources.

For more information on how you can get involved, please visit the Getting Involved page.

Project Management

Project Manager: Ted Young
Deputy Project Manager: WANTED!