Sumi is a simple OCR application with support for corrections.
- Go
- GTK+ 3.16 or later
- Tesseract 3.04.00 or later
- Trained data for your language
- One of the following:
If none of the screenshot utilities above are available on your system, you can
use the SUMI_SCREENCAPTURE environment variable to provide your own. The
utility is expected to let you select a part of the screen and to write the
image to the file path passed as its last argument. For example, for scrot a
valid SUMI_SCREENCAPTURE value would be scrot -s.
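The "last argument" convention described above can be sketched as a small
wrapper around your own tool. This is only an illustration: the names last_arg
and capture are hypothetical, and it assumes that sumi simply appends the
output path to whatever command SUMI_SCREENCAPTURE contains.

```shell
# Hypothetical helpers for a custom SUMI_SCREENCAPTURE wrapper script.
# Assumption: sumi appends the output file path as the final argument.

# Print the last positional argument, i.e. the path the image must be written to.
last_arg() {
    for arg in "$@"; do :; done
    printf '%s\n' "$arg"
}

# capture ARGS... PATH: report the target path on stderr, then run the real tool.
# scrot -s is the example from the text above; substitute your own utility.
capture() {
    echo "writing screenshot to $(last_arg "$@")" >&2
    scrot -s "$@"
}
```

A wrapper script built from these functions would end with capture "$@", and
SUMI_SCREENCAPTURE would then be set to the path of that script.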
- Download and install all dependencies from the list above
go get github.com/tsudoko/sumi
To use a language other than Japanese (or more than one language at once), pass
the ISO 639-3 code of the desired language with the -l flag, e.g. sumi -l eng.
Note that Sumi was designed to work specifically with Japanese, so it might
give worse results when used with other languages.
Sumi prints scanned text to stdout, so it can be sent to other programs
automatically. Some examples follow.
X11, requires xclip.
./sumi | while IFS= read -r a; do printf '%s\n' "$a" | xclip -i -sel clip; done
Windows, requires an sh-compatible shell and iconv. Replace $cp with your
locale's code page; for Japanese it's cp932.
./sumi.exe | while IFS= read -r a; do printf '%s\n' "$a" | iconv -t "$cp" | clip; done
With ep:
./sumi | xargs -n1 ep
With myougiden:
./sumi | xargs -n1 myougiden
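The same pattern can be generalized to any other program. The following is a
minimal sketch, not part of sumi itself: for_each_line is a hypothetical helper
that runs a given command once per line read from stdin and passes the line as
the last argument.

```shell
# for_each_line CMD [ARGS...]: run CMD once for every line on stdin,
# appending the whole line as a single final argument.
for_each_line() {
    while IFS= read -r line; do
        "$@" "$line"
    done
}

# Example use (assumes ./sumi is in the current directory):
#   ./sumi | for_each_line myougiden
```

Unlike xargs -n1, which splits its input on blanks as well as newlines, this
loop passes each line intact, which matters if the scanned text contains
spaces.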

