Include a way to extract raw text from a variety of file formats. [Textract](https://textract.readthedocs.io/en/stable/) is a candidate, but I've had trouble with it in Anaconda.
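
As a rough sketch of what this could look like, the helper below reads plain-text files with the standard library and hands everything else to Textract only if it can be imported. The function name `extract_text` and the `PLAIN_TEXT_SUFFIXES` set are hypothetical, chosen for illustration. The only Textract call assumed is `textract.process`, which returns bytes.

```python
from pathlib import Path

# Extensions read directly with the standard library.
# This set is an illustrative assumption, not a fixed list.
PLAIN_TEXT_SUFFIXES = {".txt", ".md", ".csv"}


def extract_text(path, encoding="utf-8"):
    """Return the raw text of a file as a str (hypothetical helper)."""
    path = Path(path)

    # Plain-text formats need no third-party dependency.
    if path.suffix.lower() in PLAIN_TEXT_SUFFIXES:
        return path.read_text(encoding=encoding, errors="replace")

    # Import lazily so the module still loads when textract is missing,
    # for example in an environment where it could not be installed.
    try:
        import textract
    except ImportError as exc:
        raise RuntimeError(
            f"No extractor available for {path.suffix!r}: textract is not installed"
        ) from exc

    # textract.process returns bytes, so decode to str.
    return textract.process(str(path)).decode(encoding, errors="replace")
```

Keeping the import inside the function means a broken or absent Textract install only affects the formats that actually need it, and a different backend could be swapped in later without changing callers.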