This page describes the changes made in the most recent major
versions of MADA+TOKAN. This information is also presented, in
greater detail, in the MADA.CHANGES file included with each MADA+TOKAN release.
| Version | Release Date | Description |
| --- | --- | --- |
| 3.2 | January 2012 | A few convenience features, plus several minor bug fixes |
| 3.1 | August 2010 | Significant model improvements |
| 3.0 | May 2010 | Added preprocessor, ALMOR, other convenience features. SVMs reorganized. |
| 2.3.2 (== 2.32) | July 2009 | Numerous bug fixes |
| 2.0 | March 2008 | Major code refactoring, weighted features, and other improvements |
| Added the option to produce TOKAN output that is encoded as Buckwalter, Safe Buckwalter or UTF-8 |
| Added the option to build MADA using Aramorph (a free version of BAMA 1.2.1) instead of SAMA 3.1 |
| Added the option to declare an output directory that all MADA and TOKAN output will be built in |
| Added a 'quiet' mode that will suppress MADA+TOKAN status messages |
| Added a GLOSS mode to TOKAN to output an English gloss as one of the TOKAN scheme forms |
| Altered the MADA output file format slightly; the ";;MADA" line (which displays the predictions of the SVM classifiers) has been renamed ";;SVM_PREDICTIONS" for clarity |
| Minor improvements to support scripts |
| Added a TOKAN-evaluate.pl script to compare TOKAN output for some simple cases |
| Identified a bug in SRILM that can create differences in MADA output after minor changes to input; a patch for SRILM is provided to fix this |
| Fixed the handling of blank lines in TOKAN |
| Fixed a bug which caused @@LAT@@ words to only have a single output form when the TOKAN_SCHEME specified several forms |
| Various minor bug fixes in TOKAN |
| New models have been trained using roughly twice the training data |
| A flaw that rendered the SVM models in MADA 3.0 sub-optimal has been removed |
| Miscellaneous bug fixes |
| Added a preprocessor to handle input text cleaning and formatting |
| Replaced the Aragen morphological analyzer with its successor, Almorgeana (ALMOR) |
| Added an INSTALL.pl script to help with MADA installation |
| Refactored TOKAN; added the means to run multiple TOKAN schemes on the same file |
| Numerous changes to configuration variables for clarity and convenience |
| Reorganized N-gram models of lemmas and diacritics |
| Miscellaneous other bug fixes |
| Miscellaneous bug fixes |
| Added morphological backoff options |
| Refactored the entire code base to make maintenance and improvements easier |
| Added tuned feature weights to improve analysis selection |
| Improved lemma and diacritic N-gram models |
| Added a few scripts to handle common tasks on MADA files, such as feature extraction |
| Added numerous convenience features, such as the ability to process gzipped files |
| Miscellaneous bug fixes |
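One of the changes listed above renames the ";;MADA" line in MADA output files to ";;SVM_PREDICTIONS". Scripts that read output from both older and newer releases may need to accept either label. The following is a minimal sketch of one way to do that in Python. It assumes only that the label appears at the very start of the line; the helper names are hypothetical and not part of MADA+TOKAN, and nothing is assumed about the content that follows the label.

```python
import re

# Hypothetical helpers, not part of the MADA+TOKAN distribution.
# Assumption: the label sits at the very start of the line. The lookahead
# keeps longer labels that merely begin with ";;MADA" from being rewritten.
_OLD_LABEL = re.compile(r"^;;MADA(?![A-Za-z0-9_])")
_NEW_LABEL = ";;SVM_PREDICTIONS"


def normalize_prediction_label(line: str) -> str:
    """Rewrite a leading ';;MADA' label as ';;SVM_PREDICTIONS'.

    Lines that do not start with the old label are returned unchanged.
    """
    return _OLD_LABEL.sub(_NEW_LABEL, line, count=1)


def is_prediction_line(line: str) -> bool:
    """Return True if the line carries either the old or the new label."""
    return normalize_prediction_label(line).startswith(_NEW_LABEL)
```

A script could pass each input line through `normalize_prediction_label` before any further parsing, so the rest of the code only needs to recognize the newer label.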