Hi @jlsutherland and thanks for this cool module, OCR is a hard problem and you provide a pretty efficient and simple solution.
Would you be interested by PR with text extraction for non-scanned documents ? I think it fits the module name "doc2text" quite well but maybe you want to stick with just OCR, let me know