Description
Hi! I am trying to use this package to normalize language codes in a project. For our specific use case, we don't differentiate between macrolanguages and their individual languages, as this distinction was not made when the data was compiled. However, in some cases the individual language was listed, so we need to identify the corresponding macrolanguage.
According to the documentation, standardize_tag(code, macro=True) should do this. However, we have noticed that it does not always work. Some examples follow:
Standardizing Mandarin gives the macrolanguage code for Chinese: standardize_tag("cmn", macro=True) returns 'zh'.
Standardizing Northern Ping Chinese, Hainanese, or Pu-Xian Chinese does not give the macrolanguage code for Chinese; it returns the code of the individual language instead. For example, standardize_tag("cpx", macro=True) should return 'zh' but returns 'cpx'.
The same thing happens with varieties of Arabic: even the most common dialects, such as Levantine Arabic, Moroccan Arabic, or Egyptian Arabic, are not mapped to the Arabic macrolanguage code.
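As a stopgap, the behavior I expected can be sketched with a small fallback table applied after standardize_tag. This is only a workaround under my own assumptions: FALLBACK_MACRO and to_macro are hypothetical names that are not part of langcodes, and the table lists only a few codes I checked against the ISO 639-3 macrolanguage mappings (cmn, cpx, cnp for Chinese; arz, ary, apc for Arabic).

```python
try:
    from langcodes import standardize_tag
except ImportError:
    # Keep the sketch runnable even when langcodes is not installed.
    standardize_tag = None

# Hypothetical fallback table (not part of langcodes). It is deliberately
# incomplete and only covers a few individual languages of 'zh' and 'ar'.
FALLBACK_MACRO = {
    "cmn": "zh",  # Mandarin Chinese
    "cpx": "zh",  # Pu-Xian Chinese
    "cnp": "zh",  # Northern Ping Chinese
    "arz": "ar",  # Egyptian Arabic
    "ary": "ar",  # Moroccan Arabic
    "apc": "ar",  # Levantine Arabic
}


def to_macro(tag):
    """Standardize a tag, then patch the primary subtag from FALLBACK_MACRO."""
    if standardize_tag is not None:
        tag = standardize_tag(tag, macro=True)
    # Only the primary language subtag is replaced; script, region, and
    # other subtags are passed through unchanged.
    primary, sep, rest = tag.partition("-")
    macro = FALLBACK_MACRO.get(primary.lower(), primary)
    return macro + sep + rest


print(to_macro("cpx"))  # zh
print(to_macro("arz"))  # ar
```

Codes that are missing from the table are returned as standardize_tag produced them, so this only hides the problem for the listed languages.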
I am currently using langcodes version 3.3.0.