This is the code for the paper "About Time: Do Transformers Learn Temporal Verbal Aspect?" by E. Metheniti, T. Van de Cruys and N. Hathout.
data/en and data/fr contain a cleaned-up and concatenated version of the datasets by Friedrich and Gateva (2017) and Alikhani and Stone (2019), for English and French respectively.
data/unseen_tests includes our hand-crafted and annotated sentences for testing.
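The exact file layout inside these folders is not fixed by this README, so here is a quick, hypothetical sketch of how to peek at a split (the file name and column layout are assumptions; adjust them to the actual files):

```python
import csv

# Hypothetical file name and column layout; check the actual files in data/en.
with open("data/en/telicity_train.tsv", newline="", encoding="utf-8") as f:
    rows = list(csv.reader(f, delimiter="\t"))

print(rows[:3])   # a few annotated examples
print(len(rows))  # split size
```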
- transformers
- torch >= 2.2.0
- sklearn
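The dependencies can be installed with pip (note that the `sklearn` import is provided by the `scikit-learn` package):

```bash
pip install "torch>=2.2.0" transformers scikit-learn
```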
English models:
- BERT: `bert-{base,large}-{cased,uncased}`
- RoBERTa: `roberta-{base,large}`
- XLNet: `xlnet-{base,large}-cased`
- ALBERT: `albert-{base,large}-v2`

French models:
- FlauBERT: `flaubert/flaubert_{small,base,large}_cased` or `flaubert/flaubert_base_uncased`
- CamemBERT: `camembert-base` or `camembert/camembert-large`
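All of these are standard Hugging Face model identifiers, so a model choice can be sanity-checked before a long run with a quick load through the `transformers` auto classes, e.g.:

```python
from transformers import AutoModel, AutoTokenizer

model_name = "bert-base-uncased"  # any identifier from the lists above

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

# Quick check: tokenize a sentence and run it through the encoder.
inputs = tokenizer("She ran a marathon.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch, tokens, hidden_size)
```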
```bash
python finetune_classify.py \
    --label_marker {telicity,duration} \
    --data_path {data/en,data/fr} \
    --transformer_model {...} \
    --num_epochs {2-4} \
    --batch_size {...} \
    --verb_segment_ids {yes,no} \
    --training {yes,no}
```

- `label_marker`: binary classification of telicity (telic/atelic) or duration (stative/dynamic).
- `data_path`: which dataset will be used.
- `transformer_model`: the exact model name (see the lists above).
- `num_epochs`: recommended 2-4. Default: 4.
- `batch_size`: default: 32.
- `verb_segment_ids`: whether `token_type_ids` are used to mark the verb position. Default: no. Attention: RoBERTa and CamemBERT models do NOT support `token_type_ids`, so always pass `no`. Also note that the flag is not a boolean True/False; it takes yes/no.
- `training`: whether we train the model or load a model from the checkpoints folder.
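For example, fine-tuning an uncased English BERT on telicity without verb position marking:

```bash
python finetune_classify.py \
    --label_marker telicity \
    --data_path data/en \
    --transformer_model bert-base-uncased \
    --num_epochs 4 \
    --batch_size 32 \
    --verb_segment_ids no \
    --training yes
```

With `--verb_segment_ids yes`, the verb position is marked through `token_type_ids`. A minimal sketch of that mechanism (an illustration of the idea, not the script's exact code; the verb index is hypothetical):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
enc = tokenizer("She ran a marathon.", return_tensors="pt")

# Give the verb's token segment id 1 and every other token segment id 0.
verb_token_index = 2  # hypothetical position of "ran" after tokenization
enc["token_type_ids"][:] = 0
enc["token_type_ids"][0, verb_token_index] = 1
```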
See above; run the same command with `--training no` to load a model from the checkpoints folder instead of training.
```bash
python log_reg_embeddings.py \
    --label_marker {telicity,duration} \
    --data_path {data/en,data/fr} \
    --transformer_model {...} \
    --batch_size {...} \
    --verb_segment_ids {yes,no} \
    --freeze_layer_count {1-12/24}
```
See above, and also:
- `freeze_layer_count`: from which model layer we'll get the embeddings (max. 12 for base models, max. 24 for large models).
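For example, probing telicity from layer 6 of an uncased English BERT:

```bash
python log_reg_embeddings.py \
    --label_marker telicity \
    --data_path data/en \
    --transformer_model bert-base-uncased \
    --batch_size 32 \
    --verb_segment_ids no \
    --freeze_layer_count 6
```

Conceptually, this kind of probe extracts frozen hidden states from the chosen layer and fits a logistic regression on top. A minimal sketch of the idea, with toy sentences standing in for the real data (not the script's exact implementation):

```python
import torch
from sklearn.linear_model import LogisticRegression
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_hidden_states=True)
model.eval()

def embed(sentence, layer=6):
    # Mean-pool the hidden states of the chosen layer; the encoder stays frozen.
    with torch.no_grad():
        out = model(**tokenizer(sentence, return_tensors="pt"))
    return out.hidden_states[layer][0].mean(dim=0).numpy()

# Toy examples only; the script reads its data from data/en or data/fr.
sentences = ["She ran a marathon.", "She was running.",
             "He built a house.", "He was sleeping."]
labels = [1, 0, 1, 0]  # e.g. 1 = telic, 0 = atelic

X = [embed(s) for s in sentences]
clf = LogisticRegression(max_iter=1000).fit(X, labels)
print(clf.score(X, labels))
```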
```bibtex
@inproceedings{metheniti2022about,
  title={About Time: Do Transformers Learn Temporal Verbal Aspect?},
  author={Metheniti, Eleni and Van de Cruys, Tim and Hathout, Nabil},
  booktitle={Workshop on Cognitive Modeling and Computational Linguistics 2022},
  year={2022},
  url={https://openreview.net/forum?id=Bduz_QYg8Z5}
}
```