This repo contains the implementation of various sequence-based recognisers, initially designed for the alignment of manuscript ciphers but now used for recognition in general.
To run an experiment for training or testing with a given configuration, run:

```
python3 <EXPERIMENT SCRIPT>.py --config_path <PATH TO CONFIG> [--test <PATH TO WEIGHTS>]
```

To generate a template configuration file, run:

```
python3 <EXPERIMENT SCRIPT>.py --get_template
```

For more detailed help, run:

```
python3 <EXPERIMENT SCRIPT>.py --help
```

The following components are currently implemented:

- CTC Loss
- VGG Backbone
- Simple CNN Backbone
The system incorporates both the implementation of a few models and the training loop and boilerplate code that make them work. The code is organised as follows:
The entry point of the program is an implementation of the Experiment class. One must overload the initialise_everything method to create any objects the experiment needs and then call the main method to run it.
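As a rough illustration, a concrete experiment might look like the sketch below. The import path, subclass name, and attributes are assumptions for illustration, not the repo's actual API:

```python
from experiment import Experiment  # assumed import path for the Experiment base class


class HandwritingExperiment(Experiment):
    """Hypothetical experiment; the real base class may expect a different setup."""

    def initialise_everything(self) -> None:
        # Create every object the experiment needs (model, data, optimiser, ...).
        self.model = None       # e.g. a CTC model instance
        self.train_data = None  # e.g. a GenericDecryptDataset


if __name__ == "__main__":
    HandwritingExperiment().main()
```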
Dataloaders are implemented in the Data folder. This area could use a little cleanup, as the system currently expects a GenericDecryptDataset object whenever data is needed. In any case, any dataloader that produces a Sample object when indexed will work fine.
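For illustration only, a dataloader satisfying that contract could look like the following sketch. The Sample fields shown are assumptions; the real object in the repo may carry different data:

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Sample:
    """Stand-in for the repo's Sample object; the real fields may differ."""
    image_path: str
    transcript: str


class TinyCipherDataset:
    """Hypothetical dataloader: anything returning a Sample when indexed works."""

    def __init__(self, entries: List[Tuple[str, str]]) -> None:
        self._entries = entries

    def __len__(self) -> int:
        return len(self._entries)

    def __getitem__(self, index: int) -> Sample:
        path, text = self._entries[index]
        return Sample(image_path=path, transcript=text)
```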
Formatters are classes that convert the output of the model into a picklable alternate representation. This representation may be used for logging purposes or to compute a specific metric during training. They must be picklable because the results are accumulated on disk in a pickle file before computing metrics for an entire epoch. This is needed to implement asynchronous logging and inference.
A formatter must implement the __call__ magic method to perform its main computation. The keys the formatter adds to the output dictionary must be declared by overloading the KEYS member of the class.
When more than one formatter is to be applied to the data, the
formatters.utils.Compose class can be used to combine them.
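A minimal sketch of a formatter, assuming __call__ receives the raw model output for a batch (the exact call signature in the repo may differ):

```python
from typing import Any, Dict, List


class TextFormatter:
    """Hypothetical formatter producing one picklable dict per batch sample."""

    # Keys this formatter adds to each output dictionary.
    KEYS = ("text",)

    def __call__(self, model_output: Any) -> List[Dict[str, str]]:
        # Turn the raw output into a picklable representation, one dict per sample.
        return [{"text": self._decode(sample)} for sample in model_output]

    @staticmethod
    def _decode(sample: Any) -> str:
        # Placeholder decoding step; the real code would e.g. greedy-decode CTC output.
        return str(sample)
```

Combining such a formatter with a hypothetical NumberFormatter through formatters.utils.Compose would yield dictionaries containing both keys, as in the second example below.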
The output of a formatter is a list of dictionaries aligned with the input batch samples, where each key is the name of the chosen format. Thus, applying a text formatter to the output of a model fed a batch of size two yields:
```
[
    {"text": "<sample text 1>"},
    {"text": "<sample text 2>"}
]
```

Should one use the composition formatter for both text and numbers, the output will be:
```
[
    {"text": "<sample text 1>",
     "numbers": "<sample number 1>"},
    {"text": "<sample text 2>",
     "numbers": "<sample number 2>"}
]
```

Metrics are classes that implement the computation of an output metric against the ground truth. They must implement:

- The maximise method, which states whether a higher value of the metric is better or not.
- The __call__ method, which performs the computation of the metric itself for a single data sample.
- The aggregate method, which combines all sample predictions into a meaningful global value.
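As an illustration, a character-error-rate metric could implement this interface as follows. This is a sketch under assumptions about the argument types; the repo's real metrics may differ:

```python
from typing import Iterable


def _levenshtein(a: str, b: str) -> int:
    """Plain dynamic-programming edit distance between two strings."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(previous[j] + 1,                 # deletion
                               current[j - 1] + 1,              # insertion
                               previous[j - 1] + (ca != cb)))   # substitution
        previous = current
    return previous[-1]


class CharacterErrorRate:
    """Hypothetical metric implementing the maximise/__call__/aggregate interface."""

    def maximise(self) -> bool:
        # A lower error rate is better, so this metric is not maximised.
        return False

    def __call__(self, prediction: str, ground_truth: str) -> float:
        # Value of the metric for a single data sample.
        return _levenshtein(prediction, ground_truth) / max(len(ground_truth), 1)

    def aggregate(self, values: Iterable[float]) -> float:
        # Combine per-sample values into a single global value.
        values = list(values)
        return sum(values) / max(len(values), 1)
```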
Metrics also have a metrics.utils.Compose object to combine several of them and log them in parallel. When used for early stopping and other training-related purposes, the first metric computed is the one taken into account.
A full model must implement the following methods:
- The compute_batch method, which generates the output of the model for a batch.
- The compute_loss method, which computes the loss of the model from the previously generated output and the batch information.
This distinction exists so that models can be implemented by families. Any CTC model computes the loss the same way on its output; therefore, any CTC model can be implemented by inheriting from the CTC base model without rewriting the loss computation -- only the forward and compute_batch methods need to be provided.
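The pattern could look roughly like the sketch below, assuming a PyTorch implementation and a batch dictionary with images, targets, and length tensors (all of these details are assumptions about the repo's internals):

```python
import torch
from torch import nn


class CTCBaseModel(nn.Module):
    """Hypothetical CTC family base: the loss is shared, the forward pass is not."""

    def __init__(self, blank_index: int = 0) -> None:
        super().__init__()
        self.loss_fn = nn.CTCLoss(blank=blank_index, zero_infinity=True)

    def compute_batch(self, batch: dict) -> torch.Tensor:
        # Per-timestep log-probabilities with shape (T, N, num_classes).
        return self(batch["images"])

    def compute_loss(self, output: torch.Tensor, batch: dict) -> torch.Tensor:
        # Every CTC model computes the loss the same way from its output.
        return self.loss_fn(output, batch["targets"],
                            batch["input_lengths"], batch["target_lengths"])


class SimpleCNNCTCModel(CTCBaseModel):
    """Toy subclass: only the forward pass needs to be provided."""

    def __init__(self, num_classes: int) -> None:
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d((1, None)),  # collapse height, keep width as time
        )
        self.classifier = nn.Linear(32, num_classes)

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        features = self.backbone(images)                   # (N, 32, 1, W)
        features = features.squeeze(2).permute(2, 0, 1)    # (W, N, 32)
        return self.classifier(features).log_softmax(-1)   # (W, N, num_classes)
```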
Models have associated configuration classes, which must inherit from the BaseConfig type. The configuration type should then be attached to the model class through the MODEL_CONFIG member.
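A possible wiring, assuming BaseConfig behaves like a simple dataclass-style container (the actual fields live in the repo's configuration module):

```python
from dataclasses import dataclass


@dataclass
class BaseConfig:
    """Stand-in for the repo's BaseConfig type."""
    learning_rate: float = 1e-3


@dataclass
class SimpleCNNConfig(BaseConfig):
    """Hypothetical configuration for the toy CNN model sketched above."""
    num_classes: int = 64
    hidden_channels: int = 32


class SimpleCNNCTCModel:
    """Stripped-down stand-in for the model sketched earlier."""

    # The model advertises its configuration type through the MODEL_CONFIG member.
    MODEL_CONFIG = SimpleCNNConfig

    def __init__(self, config: SimpleCNNConfig) -> None:
        self.config = config
```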
This code is licensed under the GNU GENERAL PUBLIC LICENSE Version 3 (see COPYING for the full license file).