turkic-nlp/generated-ud-data

generated-ud-data

Data Sources

The treebank data was derived by translating sentences from the following Turkish UD treebanks into the respective target languages and then applying a silver annotation process:

Training data — translated from:

Test data — translated from:

Details about resources

Azerbaijani

  • train: based on training data sources above (translated & silver annotated)
  • test: taken from here

Turkmen

  • train: based on training data sources above (translated & silver annotated)
  • test: based on test data source above (translated & silver annotated)

Tatar

  • train: based on training data sources above (translated & silver annotated)
  • test: taken from here

Bashkir

  • train: based on training data sources above (translated & silver annotated)
  • test: based on test data source above (translated & silver annotated)
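
As a starting point for working with the train and test files, the sketch below reads treebank data sentence by sentence. It assumes the files follow the standard ten-column CoNLL-U format used by UD treebanks; this README does not state the file format explicitly. The `read_conllu` helper and the `SAMPLE` text are hypothetical illustrations and are not taken from this repository.

```python
from typing import Dict, Iterable, Iterator, List

# The ten column names defined by the CoNLL-U format.
FIELDS = ["id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc"]

# Invented placeholder sentence, NOT data from this repository.
SAMPLE = (
    "# sent_id = toy-1\n"
    "# text = A B\n"
    "1\tA\ta\tNOUN\t_\t_\t2\tnsubj\t_\t_\n"
    "2\tB\tb\tVERB\t_\t_\t0\troot\t_\t_\n"
    "\n"
)


def read_conllu(lines: Iterable[str]) -> Iterator[List[Dict[str, str]]]:
    """Yield each sentence as a list of token dicts keyed by FIELDS."""
    sentence: List[Dict[str, str]] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # A blank line marks the end of a sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            # Comment lines (sent_id, text, ...) are skipped in this sketch.
            continue
        cols = line.split("\t")
        if len(cols) != len(FIELDS):
            # Skip rows that do not have exactly ten tab-separated columns.
            continue
        # Multiword-token ranges (e.g. "1-2") are kept as ordinary rows.
        sentence.append(dict(zip(FIELDS, cols)))
    if sentence:
        # Handle a final sentence that is not followed by a blank line.
        yield sentence
```

To read a file instead of the in-memory sample, pass an open handle, for example `read_conllu(open(path, encoding="utf-8"))`, where `path` points to one of the train or test files. Because the annotations are silver rather than gold, it may be worth checking a few sentences by hand before relying on them.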

Acknowledgement

  • Turkic UD Group: https://github.com/ud-turkic
  • UD_Turkish-IMST: Umut Sulubacak, Gülşen Eryiğit. Implementing Universal Dependency, Morphology and Multiword Expression Annotation Standards for Turkish Language Processing. Turkish Journal of Electrical Engineering & Computer Sciences (DOI: 10.3906/elk-1706-81):1–23. May 2018.
  • UD_Turkish-GB: Çağrı Çöltekin (2015). A grammar-book treebank of Turkish. In: Proceedings of the 14th Workshop on Treebanks and Linguistic Theories (TLT 14).
  • UD_Turkish-FrameNet: Marşan, B., Kara, N., Özçelik, M., Arıcan, B. N., Cesur, N., Kuzgun, A., ... & Yıldız, O. T. (2021, January). Building the Turkish FrameNet. In Proceedings of the 11th Global Wordnet Conference (pp. 118-125).
  • UD-Turkic/Parallel: https://github.com/ud-turkic/parallel
