I made a browser extension last month that uses the same method.
https://github.com/dessant/buster
Buster is an accessibility tool that helps people solve difficult captchas. It works quite well, and you can select between several speech recognition services.
What was your experience with the quality of these services? I've found the video model of the Google Cloud Speech API to be the most accurate. Do you know of any speech recognition services that offer a free API, like Wit.ai? I'd like to offer more choices besides Google and Wit.ai, without requiring users to sign up for different services themselves.