Hi,
While working on another project, we noticed that parsing Chinese dates rarely works when the language is zh-Hans, whereas it works fine with zh.
Example output:
Detecting text: 2024年6月1日
ja: 2024-06-01 00:00:00
zh: 2024-06-01 00:00:00
zh-Hans-HK: None
zh-Hans: None
zh-Hant-HK: None
zh-Hant: None
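
For reference, here is a minimal sketch of a script that prints a comparison in this format. It assumes each code is passed through the `locales` argument of `dateparser.parse`. The helper names (`parse_with_dateparser`, `compare_locales`, `report`) are only illustrative and are not part of dateparser.

```python
from datetime import datetime
from typing import Callable, Dict, Iterable, Optional

TEXT = "2024年6月1日"
LOCALES = ["ja", "zh", "zh-Hans-HK", "zh-Hans", "zh-Hant-HK", "zh-Hant"]


def parse_with_dateparser(text: str, locale: str) -> Optional[datetime]:
    """Parse `text` while restricting dateparser to a single locale code."""
    # Imported lazily so the helpers below can also be used with another parser.
    import dateparser

    # Assumption: the code is given via `locales`, not `languages`.
    return dateparser.parse(text, locales=[locale])


def compare_locales(
    text: str,
    locales: Iterable[str],
    parse_fn: Callable[[str, str], Optional[datetime]] = parse_with_dateparser,
) -> Dict[str, Optional[datetime]]:
    """Return the parse result (or None) for each locale code, in order."""
    return {locale: parse_fn(text, locale) for locale in locales}


def report(text: str, results: Dict[str, Optional[datetime]]) -> str:
    """Format the results as one 'locale: result' line per locale."""
    lines = [f"Detecting text: {text}"]
    lines += [f"{locale}: {result}" for locale, result in results.items()]
    return "\n".join(lines)


# Usage (requires dateparser to be installed):
# print(report(TEXT, compare_locales(TEXT, LOCALES)))
```

The parser is passed in as `parse_fn` so the same comparison can be run against a patched build without changing the rest of the script.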
From what I understand about CLDR, Chinese (simplified) is the default locale for Chinese: https://st.unicode.org/cldr-apps/v#/zh_Hans//
In the dateparser source code, zh-Hans appears to be defined as a subset of zh: it shares most of the translation tables but lacks the simplification rules, which seems to be what causes the parsing failures.
I made a proposed change here: master...Merinorus:dateparser:chinese-dates
Before submitting a PR, I’d like to confirm whether my understanding is correct.
Thank you!