Added pseudo alignment strategy based on phoneme duration #116
import typing
from dataclasses import dataclass, replace

import transphone
Can you use a lazy import? We want to keep the mandatory dependencies for the core code small.
Hey @popcornell! This is a good extension for evaluating languages such as Japanese. I'm unsure about your choice for the interface. I'm not that happy to add a …
"""Divides the interval into one interval per word where the size of the interval is
proportional to the number of phonemes in the word."""

g2p = transphone.read_tokenizer(language)
The call to transphone.read_tokenizer sounds like it loads a model. Does it cache the result?
If not, we should add caching ourselves, since this function is called for every segment.
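One possible way to add such caching, as a sketch: wrap the loader in `functools.lru_cache` so that each language is loaded at most once per process. The helper name `get_tokenizer` is hypothetical; `transphone.read_tokenizer(language)` is the call used in this PR, and whether it already caches internally is not confirmed here.

```python
import functools


@functools.lru_cache(maxsize=None)
def get_tokenizer(language):
    """Return one tokenizer per language, loading it only on the first call.

    Hypothetical helper: the import is done lazily inside the function, and
    lru_cache keeps the loaded tokenizer alive for the rest of the process.
    """
    import transphone  # lazy import, only needed for this strategy

    return transphone.read_tokenizer(language)
```

Repeated calls with the same language then return the same object, so the per-segment cost is reduced to a dictionary lookup.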
  (s['start_time'], s['end_time']),
- words
- )
+ words, language)
Can you change this to words, language=language)?
  # pseudo-timestamp strategies
- def equidistant_intervals(interval, words):
+ def equidistant_intervals(interval, words, *args):
I would prefer (interval, words, language) as signature, or at least (interval, words, **kwargs).
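A sketch of the `**kwargs` variant suggested here, assuming strategies are called with `language` as a keyword argument. The function body is a simplified stand-in for the existing equidistant strategy, not a copy of the project's implementation.

```python
def equidistant_intervals(interval, words, **kwargs):
    """Split `interval` into len(words) equally sized sub-intervals.

    Extra keyword arguments such as `language` are accepted and ignored, so
    all strategies can share one call signature.
    """
    start, end = interval
    count = len(words)
    if count == 0:
        return []
    step = (end - start) / count
    return [(start + i * step, start + (i + 1) * step) for i in range(count)]
```

With this signature, a caller can pass `language=language` to every strategy, and a misspelled or misplaced positional argument can no longer be silently swallowed as it could with `*args`.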
Hi, thanks for the PR. Is … I am thinking, if …
I see arguments for both approaches. Since we now have an expert in this chat, maybe one other question first: …
I agree: if the language argument is also useful in other places, such as word splitting or normalization, it may be worth adding it to the interface.
Yeah, actually I was unsure about that too. I was assuming that one would split the reference before feeding it to meeteval, but maybe the best way to handle this is to make it dependent on the language or to have an additional argument. Are you guys OK with another argument? Like …
@popcornell Do you have examples of the output of a Japanese ASR system? The guys from NTT said that CER is usually used instead of WER, which completely ignores whitespace and splits the text into individual characters. In that case, we may want to add a time-constrained CER.
For documentation:
We discussed the following: The language should be encoded in the strategy name. Since there are potentially multiple libraries for obtaining phoneme durations, the package name should also be encoded in the strategy name. CER should go into a different PR. @popcornell Are you willing to adjust the PR with the required changes, or should we do it?
Hey guys, yeah, I plan to adjust it. But I am currently busy at JSALT; I thought I would have more time. I can do it this weekend though.
@popcornell ping! Do you still plan to work on this? If not, I'd have some time.
Hey Thilo, currently still busy with ICLR...
Hi guys,
Greetings from Brno.
I am trying to add phoneme-based duration as another pseudo-alignment strategy for word durations.
This could enable tcpWER for languages such as Japanese, where a single character (e.g. a kanji) can have a much longer duration than others.
Also adding Alexander Polok here, as he is responsible for https://huggingface.co/spaces/BUT-FIT/EMMA_leaderboard
@Lakoc
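For readers of this thread, the phoneme-proportional split described above could be sketched roughly as follows. This is not the code from the PR: the function name `proportional_intervals` and the `count_units` callable are assumptions made for illustration. In the PR the count would come from a grapheme-to-phoneme tokenizer; here any function that maps a word to a number of units can be plugged in.

```python
def proportional_intervals(interval, words, count_units):
    """Split `interval` into one sub-interval per word, sized by unit count.

    `count_units` is a hypothetical callable mapping a word to its number of
    phonemes (or any other duration proxy). Words with a count of zero are
    treated as one unit so that every word gets a non-empty interval.
    """
    start, end = interval
    counts = [max(1, count_units(word)) for word in words]
    total = sum(counts)
    if total == 0:  # no words at all
        return []
    duration = end - start
    intervals = []
    cumulative = 0
    for count in counts:
        left = start + duration * cumulative / total
        cumulative += count
        right = start + duration * cumulative / total
        intervals.append((left, right))
    return intervals
```

Using `len` as a crude stand-in for a phoneme counter, `proportional_intervals((0.0, 6.0), ["ab", "abcd"], len)` gives the second word twice the duration of the first. Computing the boundaries from cumulative counts keeps neighbouring intervals contiguous.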