@nayyarv nayyarv commented Nov 5, 2014

[Image: typical voice sample]

As you can see in a typical voice sample, a significant portion of the recording contains no speech activity. Speech applications typically remove these low-activity regions, because otherwise we end up extracting the MFCCs of silence, which aren't very helpful in most situations.

I've modified your code slightly to allow for Voice Activity Detection. Its default behaviour is still intact, but anyone who wishes to implement a Voice Activity Detector function has the template, documentation and a simple threshold to play with, as well as an example showing simple applications.

The code allows both the frames and the entire signal to be passed in, which should be flexible enough for anyone to write their own version depending on their purpose. I considered using the frame power provided as the first MFCC, but decided that passing both was more flexible overall and allows comparison against the entire signal at once.
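For illustration only, here is a minimal sketch of what a threshold detector with that kind of interface could look like. It is not the code in this PR: the function name `threshold_vad`, the `ratio` parameter and the toy signal are all hypothetical, and the only assumption carried over is that the detector receives the frames and the entire signal.

```python
import numpy as np


def threshold_vad(frames, signal, ratio=0.1):
    """Hypothetical detector (not the PR's implementation).

    Returns a boolean mask marking frames whose mean power exceeds
    `ratio` times the mean power of the entire signal.
    """
    frames = np.asarray(frames, dtype=float)
    signal = np.asarray(signal, dtype=float)
    frame_power = np.mean(frames ** 2, axis=1)  # one value per frame
    signal_power = np.mean(signal ** 2)         # reference from the whole signal
    return frame_power > ratio * signal_power


# Toy signal: near-silence, a burst of activity, then near-silence again.
rng = np.random.default_rng(0)
signal = np.concatenate([
    0.001 * rng.standard_normal(400),
    rng.standard_normal(400),
    0.001 * rng.standard_normal(400),
])

# Six non-overlapping frames of 200 samples each (real framing usually overlaps).
frames = signal.reshape(-1, 200)

mask = threshold_vad(frames, signal)
active_frames = frames[mask]  # only these would go on to MFCC extraction
```

Comparing each frame against the power of the whole signal, rather than against a fixed constant, is one reason having the entire signal available is useful: the threshold scales with the recording level.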

This is a modification I made for my thesis, in which I used your code to extract the MFCCs from a bunch of files, and I thought other people might find it handy too.

@meresmclr

@nayyarv very nice -- could you fix the conflicts with existing code for ease of merging/evaluating your changes?

nayyarv commented Jul 26, 2016

Haha, it's been some time since I opened this PR!

I'll have a look later this week and see if it's a straightforward update, py3 compatible and not too complicated.
