standardize_tag does not properly identify macrolanguages

Hi! I am trying to use this package to normalize language codes in a project. For our use case, we don't differentiate between macrolanguages and their individual varieties, because the data was compiled without that distinction. In some cases, however, the individual language was listed, so we need to map it to its macrolanguage.

According to the documentation, ```standardize_tag(code, macro=True)``` should do this. However, we have noticed that it does not always work. Some examples follow:

Standardizing Mandarin gives the macrolanguage code for Chinese: ```standardize_tag("cmn", macro=True)``` returns ```'zh'```.

Standardizing Northern Ping Chinese, Hainanese, or Pu-Xian Chinese does not give the macrolanguage code for Chinese; it returns the code of the individual language. For example, ```standardize_tag("cpx", macro=True)``` should return ```'zh'``` but returns ```'cpx'``` instead.

The same thing happens with varieties of Arabic: even the most common dialects, such as Levantine Arabic, Moroccan Arabic, and Egyptian Arabic, are not mapped to the Arabic macrolanguage code.

I am currently using version 3.3.0 of langcodes.
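
In case it helps anyone hitting the same problem, here is a minimal sketch of the kind of fallback we are considering on our side. It does not call langcodes at all: `MACRO_FALLBACK` and `to_macro` are hypothetical names of our own, and the table is a small hand-picked subset of ISO 639-3 codes (my reading of the registry, not something exported by the package). The idea would be to apply it to whatever ```standardize_tag(code, macro=True)``` returns.

```python
# Hypothetical fallback table: individual-language code -> macrolanguage code.
# Hand-picked subset for the languages mentioned above; NOT exhaustive and
# NOT taken from langcodes itself.
MACRO_FALLBACK = {
    "cnp": "zh",  # Northern Ping Chinese
    "hnm": "zh",  # Hainanese
    "cpx": "zh",  # Pu-Xian Chinese
    "apc": "ar",  # Levantine Arabic
    "ary": "ar",  # Moroccan Arabic
    "arz": "ar",  # Egyptian Arabic
}


def to_macro(tag: str) -> str:
    """Replace the primary language subtag with its macrolanguage, if known.

    Any remaining subtags (script, region, ...) are kept unchanged.
    """
    # Accept both "arz-EG" and "arz_EG" style tags.
    primary, sep, rest = tag.replace("_", "-").partition("-")
    macro = MACRO_FALLBACK.get(primary.lower(), primary)
    return macro + sep + rest


if __name__ == "__main__":
    for code in ["cpx", "hnm", "arz-EG", "fr"]:
        print(code, "->", to_macro(code))
```

Codes that are not in the table pass through untouched, so this only papers over the specific cases listed and would still need the underlying data fixed in the package.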



standardize_tag does not properly identify macrolanguages #67
