Support streaming ASR evaluation by hirofumi0810 · Pull Request #30 · facebookresearch/SimulEval

hirofumi0810 · 2023-03-03T01:25:27Z

Support streaming ASR evaluation with WER. The tokenizer is migrated from fairseq.
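For readers unfamiliar with the metric, here is a minimal sketch of corpus-level WER (word error rate) as it is commonly defined: total word-level edit distance divided by total reference words. This is not the scorer added in this PR. The names `tokenize`, `edit_distance` and `wer` are hypothetical, and the whitespace tokenizer is only a stand-in for the fairseq tokenizer mentioned above.

```python
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase and split on whitespace (a stand-in, not the fairseq tokenizer)."""
    return text.lower().split()


def edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Word-level Levenshtein distance computed with a rolling row."""
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(
                min(
                    prev[j] + 1,         # deletion of a reference word
                    curr[j - 1] + 1,     # insertion of a hypothesis word
                    prev[j - 1] + cost,  # match or substitution
                )
            )
        prev = curr
    return prev[-1]


def wer(references: List[str], hypotheses: List[str]) -> float:
    """Corpus-level WER in percent: total edits / total reference words."""
    errors = 0
    words = 0
    for ref, hyp in zip(references, hypotheses):
        ref_tokens, hyp_tokens = tokenize(ref), tokenize(hyp)
        errors += edit_distance(ref_tokens, hyp_tokens)
        words += len(ref_tokens)
    return 100.0 * errors / max(words, 1)
```

For example, `wer(["the cat sat on the mat"], ["the cat sit on mat"])` counts one substitution and one deletion against six reference words.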

Summary:
Pull Request resolved: fairinternal/SimulEval#30

Test Plan: `python tt_waitk_unity_v2.py --min-unit-chunk-size 10 --latency-metrics EndOffset --no-use-ref-len`

Reviewed By: xutaima
Differential Revision: D43581361
Pulled By: annasun28
fbshipit-source-id: 86b06784888bcfd452f6228fd55736fc01b72429

* Fix several bugs 20230109 (#23)

Summary:
- Fix the bugs where the scorers' options are not passed to the CLI.
- Update the sacrebleu dependency to 2.3.1 to support the ja-mecab tokenizer.
- Fix the error when `eval_latency_unit=char`. This option is used for languages without spaces (e.g. zh and ja).
- Fix several bugs in the speech-to-text data loader.
- Fix the bug where the last delay is ignored when computing CA AL.
- Fix minor errors when running remote evaluation.
- Fix typos and type hint mismatches.

Pull Request resolved: https://github.com/fairinternal/SimulEval/pull/23
Reviewed By: annasun28
Differential Revision: D42455147
Pulled By: xutaima
fbshipit-source-id: 05b63ad0ed16c37093ad58b49b4b1a1c97fa6070

* Add missing license comments (#24)

Summary: To address the task [T140465752](https://www.internalfb.com/intern/tasks/?t=140465752)

Pull Request resolved: https://github.com/fairinternal/SimulEval/pull/24
Reviewed By: annasun28
Differential Revision: D42765319
Pulled By: xutaima
fbshipit-source-id: be595eafce62d4b333ee206db7b103f4a7911a21

* Add ATDScore (#28)

Summary: Added Average Token Delay (ATD) as a latency metric.
Paper: Average Token Delay: A Latency Metric for Simultaneous Translation (https://arxiv.org/abs/2211.13173)

X-link: #28
Reviewed By: annasun28
Differential Revision: D42768122
Pulled By: xutaima
fbshipit-source-id: f5cbeb785486dfdbb48a156859dd0451e96fd4cc

* Enable --system-dir option (#25)

Summary: Simplify the agent-building argument to just a directory name, with an optional config name.
- [x] Documentation on readthedocs
- [x] Provide example system directories based on current s2t and s2s

Given a system directory `${system_dir}`

```bash
> ls ${system_dir}
main.yaml checkpoint.pt config.yaml dict.txt sentence.bpe.model wav2vec_small.yaml
```

and `main.yaml` contains

```yaml
agent_class: fairseq.models.streaming.agents.TestTimeWaitKS2T
checkpoint: checkpoint.pt
sentencepiece_model: sentence.bpe.model
config_yaml: config.yaml
wav2vec_yaml: wav2vec_small.yaml
waitk_lagging: 2
fixed_pre_decision_ratio: 4
device: cuda:0
```

From the CLI:

```bash
simuleval --standalone --system-dir ${system_dir}
```

In Python:

```python
from simuleval.utils import build_system_from_dir

# Build the system from a directory that contains main.yaml.
system = build_system_from_dir("system")
print(system)

# `audio_frontend` is a placeholder for whatever supplies speech segments.
while True:
    speech_segment = audio_frontend.send_segment()
    output_segment = system.pushpop(speech_segment)
    print(output_segment)
    if output_segment.finished:
        break
```

Systems available now (under `/large_experiments/seamless/ust/xutaima/2023_H1/demo/systems`):

| Path | Modality | Language | Description |
| ---- | -------- | -------- | ----------- |
| `s2t_es-en_tt-waitk_multidomain` | speech-to-text | es -> en | Multidomain (2022 H2) |
| `s2t_en-de_tt-waitk_iwslt2023-must-c` | speech-to-text | en -> de | MuST-C for IWSLT 2023 |
| `s2t_en-zh_tt-waitk_iwslt2023-must-c` | speech-to-text | en -> zh | MuST-C for IWSLT 2023 |
| `s2t_en-ja_tt-waitk_iwslt2023-must-c` | speech-to-text | en -> ja | MuST-C for IWSLT 2023 |
| `s2s_es-en_tt-waitk-cascaded_multidomain` | speech-to-speech | es -> en | Multidomain Cascaded Model (2022 Q3) |
| `s2s_es-en_tt-waitk-unity2_multidomain` | speech-to-speech | es -> en | Multidomain UnitY2 model (2022 H2) |

Pull Request resolved: https://github.com/fairinternal/SimulEval/pull/25
Reviewed By: schwarzmx
Differential Revision: D42976452
Pulled By: xutaima
fbshipit-source-id: 64d2224ff88fe6ceadabc42236f3c791298d967d

* Enable stateless Agent (#26)

Summary: Add an optional `states` argument to the `policy`, `push` and `pop` functions to enable stateless agents. We may want stateless to be the only option in the future.

Pull Request resolved: https://github.com/fairinternal/SimulEval/pull/26
Reviewed By: annasun28
Differential Revision: D43005153
Pulled By: xutaima
fbshipit-source-id: 67612b96e0d2f46a1f7573aeb0296942b4a1fdd9

* Modify the current agents to stateless for production setting (#3817)

Summary: Now we can use an agent like this:

```python
from simuleval.utils import build_system_from_dir

system = build_system_from_dir(
    "/large_experiments/seamless/ust/xutaima/2023_H1/demo/systems/s2t_es-en_tt-waitk_multidomain"
)
system.to("cuda:0")
# The states live outside the system and are passed in on every call.
system_states = system.build_states()

# `audio_frontend` is a placeholder for whatever supplies speech segments.
while True:
    speech_segment = audio_frontend.send_segment()
    output_segment = system.pushpop(speech_segment, system_states)
    print(output_segment)
    if output_segment.finished:
        break
```

X-link: https://github.com/fairinternal/fairseq-py/pull/3817
Reviewed By: annasun28
Differential Revision: D43008938
Pulled By: xutaima
fbshipit-source-id: 89da787199cf961b33aea822f0a11cc46e30d8c7

* Fix ASR BLEU (#27)

Summary: Pull Request resolved: https://github.com/fairinternal/SimulEval/pull/27
Reviewed By: padentomasello
Differential Revision: D43139557
Pulled By: xutaima
fbshipit-source-id: 8093550156b591459a19e85e78f67c80ece03491

* Fix the bugs introduced in recent PRs (#28)

Summary: Pull Request resolved: https://github.com/fairinternal/SimulEval/pull/28
Reviewed By: schwarzmx
Differential Revision: D43371745
Pulled By: xutaima
fbshipit-source-id: 4440d87a02534ad441b04363ff3696806fdf61a0

* fixed the issue when running jobs through slurm (#29)

Summary: Pull Request resolved: https://github.com/fairinternal/SimulEval/pull/29
Test Plan: Imported from GitHub, without a `Test Plan:` line. Kick off evaluation through the `--slurm` option. Previous PRs introduced a bug for this option.
Reviewed By: annasun28
Differential Revision: D43506304
Pulled By: xutaima
fbshipit-source-id: f47299a59b5f0e1c796edc53efadfd54967ddc6f

* EndOffset using playback intervals (#30)

Summary: Pull Request resolved: https://github.com/fairinternal/SimulEval/pull/30
Test Plan: `python tt_waitk_unity_v2.py --min-unit-chunk-size 10 --latency-metrics EndOffset --no-use-ref-len`
Reviewed By: xutaima
Differential Revision: D43581361
Pulled By: annasun28
fbshipit-source-id: 86b06784888bcfd452f6228fd55736fc01b72429

* discontinuity metrics (#31)

---------

Co-authored-by: master-possible <kano.yasumasa.kw4@is.naist.jp>
Co-authored-by: Anna Sun <13106449+annasun28@users.noreply.github.com>
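
The stateless calling pattern from #26 and #3817 in the commit message above (`build_states()` once per stream, then `pushpop(segment, states)`) can be illustrated with a toy text-to-text sketch. Everything here is hypothetical and for illustration only: `ToyStates`, `ToyWaitKAgent` and the uppercase "translation" are not SimulEval or fairseq classes, and the real signatures may differ.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ToyStates:
    """Per-stream state kept outside the agent (hypothetical structure)."""

    source: List[str] = field(default_factory=list)
    num_written: int = 0
    source_finished: bool = False


class ToyWaitKAgent:
    """Toy wait-k agent that stores nothing per stream on itself."""

    def __init__(self, k: int = 2) -> None:
        self.k = k

    def build_states(self) -> ToyStates:
        return ToyStates()

    def push(self, token: Optional[str], states: ToyStates, finished: bool = False) -> None:
        # Record the new source token (if any) in the caller-owned states.
        if token is not None:
            states.source.append(token)
        states.source_finished = finished

    def pop(self, states: ToyStates) -> Optional[str]:
        # Write one token once the source is k tokens ahead, or once it has ended.
        unread_lead = len(states.source) - states.num_written
        can_write = unread_lead >= self.k or states.source_finished
        if states.num_written < len(states.source) and can_write:
            token = states.source[states.num_written].upper()
            states.num_written += 1
            return token
        return None

    def pushpop(self, token: Optional[str], states: ToyStates, finished: bool = False) -> Optional[str]:
        self.push(token, states, finished)
        return self.pop(states)
```

Because the states are passed in on each call, one agent instance can serve several streams at once: each stream gets its own `build_states()` result and the agent object itself is never mutated.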

* fixed the issue when running jobs through slurm
* add instruction for speech-to-speech evaluation
* Fix bugs in starting offset
* update results
* update readme
* fixed ATD for s2s (#34)
* add new line to files
* discontinuity metrics (#31)
* Add whisper ASR BLEU

---------

Co-authored-by: master-possible <kano.yasumasa.kw4@is.naist.jp>
Co-authored-by: Anna Sun <13106449+annasun28@users.noreply.github.com>
Co-authored-by: master-possible <66279784+master-possible@users.noreply.github.com>

* Save individual metrics if possible
* add metrics to log instance

Support ASR evaluation

b07ea3e

hirofumi0810 requested a review from xutaima March 3, 2023 01:25

facebook-github-bot added the CLA Signed label Mar 3, 2023

hirofumi0810 wants to merge 1 commit into main from wer_eval

hirofumi0810 commented Mar 3, 2023
