Hello, I ran into an issue with the frame numbers.
I am trying to decode an audiobook (http://ia801400.us.archive.org/7/items/jekyll_and_hyde_klh_0904_librivox/jekyll_01_stevenson_64kb.mp3), converted to WAV:
configuration = Pocketsphinx::Configuration.default
decoder = Pocketsphinx::Decoder.new(configuration)
decoder.decode 'jekyll_01_stevenson.wav'
puts decoder.words
There is something wrong with start_frame and end_frame:
http://pastebin.com/a3zRxdH9
I expected the first start_frame to be 0.
This is my audio metadata:
Channels: 1
Bits per sample: 16
Samples per second: 16000
Bytes per second: 32000
Block align: 2
Sample frame count: 13682957
If I take 84749 as the first frame, I get 13682957 / (165414 - 84749.0) = 169.626938573111 samples per frame. That should be 160, since a pocketsphinx frame is 10 ms and the audio is sampled at 16000 Hz.
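To make the arithmetic above easier to check, here is a minimal Ruby sketch. It assumes a 10 ms frame (100 frames per second) and the 16000 Hz sample rate from the metadata; `frame_to_seconds` is a hypothetical helper written for this illustration, not part of the pocketsphinx-ruby API.

```ruby
# Assumptions: 16 kHz audio (from the metadata above) and 10 ms frames.
SAMPLE_RATE   = 16_000   # samples per second
FRAME_SECONDS = 0.010    # assumed frame length in seconds

# Samples covered by one frame under these assumptions.
SAMPLES_PER_FRAME = (SAMPLE_RATE * FRAME_SECONDS).round

# Hypothetical helper: convert a frame index to a time offset in seconds.
def frame_to_seconds(frame, frame_seconds = FRAME_SECONDS)
  frame * frame_seconds
end

total_samples   = 13_682_957
expected_frames = total_samples / SAMPLES_PER_FRAME   # integer division
observed_span   = 165_414 - 84_749                    # frames reported by decoder.words
observed_ratio  = total_samples.to_f / observed_span  # samples per reported frame

puts "samples per frame (expected): #{SAMPLES_PER_FRAME}"
puts "frames in file (expected):    #{expected_frames}"
puts "frame span (reported):        #{observed_span}"
puts "samples per frame (observed): #{observed_ratio.round(3)}"
puts "frame 84749 as seconds:       #{frame_to_seconds(84_749).round(2)}"
```

Under these assumptions the whole file holds about 85518 frames (roughly 855 seconds of audio), so the reported frame numbers 84749 and 165414 do not look like offsets from the start of this file.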
Using pocketsphinx_continuous -infile jekyll_01_stevenson.wav -time yes
I get the correct times (word, start in seconds, end in seconds, probability):
<s> 1.380 1.600 0.999300
chapter 1.610 1.980 0.864662
one 1.990 2.360 0.556734
<sil> 2.370 2.500 0.949038
of 2.510 2.600 0.820433
the 2.610 2.680 0.414533
strange 2.690 3.200 0.998601
case 3.210 3.550 0.999800
of 3.560 3.640 0.567698
dr(2) 3.650 4.000 0.734228
jekyll 4.010 4.350 0.999800
and 4.360 4.480 0.636639
mr 4.490 4.800 0.999900
hyde 4.810 5.340 0.986096
</s> 5.350 5.580 1.000000