Saram - Image/PDF OCR detection system

Extract text (OCR) as plain .txt output from images or PDFs, processing multiple files from a directory with pytesseract, and automatically correct the rotation of pages that were scanned in the wrong orientation.
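The workflow described above can be sketched roughly as follows. This is a minimal illustration, not Saram's actual code. The helper names (`collect_input_files`, `parse_rotation`, `ocr_image`, `pdf_pages`, `ocr_directory`), the extension list, the 300 dpi rasterisation and the choice to write `<name>.txt` beside each input are all assumptions. It reads Tesseract's orientation estimate with `pytesseract.image_to_osd`, rotates the image upright, then runs `pytesseract.image_to_string`. PDFs are rasterised page by page through Wand.

```python
import io
import re
from pathlib import Path

# Assumed set of input extensions; Saram's own list may differ.
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".tif", ".tiff", ".bmp", ".gif"}
PDF_EXTS = {".pdf"}
INPUT_EXTS = IMAGE_EXTS | PDF_EXTS


def collect_input_files(dirname):
    """Return the image/PDF files directly inside `dirname`, sorted by path."""
    return sorted(
        p for p in Path(dirname).iterdir()
        if p.is_file() and p.suffix.lower() in INPUT_EXTS
    )


def parse_rotation(osd_text):
    """Read the 'Rotate: N' line of Tesseract OSD output; 0 if it is missing."""
    match = re.search(r"^Rotate:\s*(\d+)", osd_text, re.MULTILINE)
    return int(match.group(1)) if match else 0


def ocr_image(img):
    """Fix orientation with Tesseract OSD, then OCR a PIL image to a string."""
    import pytesseract  # lazy import: the helpers above work without it

    try:
        angle = parse_rotation(pytesseract.image_to_osd(img))
    except pytesseract.TesseractError:
        angle = 0  # OSD fails on images with too little text; skip rotation
    if angle:
        # Assumption: 'Rotate' is the clockwise correction in degrees, while
        # PIL rotates counter-clockwise for positive angles, hence the minus.
        img = img.rotate(-angle, expand=True)
    return pytesseract.image_to_string(img)


def pdf_pages(path, resolution=300):
    """Yield each PDF page as a PIL image, rasterised by Wand (ImageMagick)."""
    from PIL import Image
    from wand.color import Color
    from wand.image import Image as WandImage

    with WandImage(filename=str(path), resolution=resolution) as pdf:
        for page in pdf.sequence:
            with WandImage(image=page) as single:
                single.background_color = Color("white")
                single.alpha_channel = "remove"  # flatten transparency onto white
                yield Image.open(io.BytesIO(single.make_blob("png")))


def ocr_directory(dirname):
    """Write '<name>.txt' next to each image/PDF in `dirname`; return the paths."""
    from PIL import Image

    written = []
    for path in collect_input_files(dirname):
        if path.suffix.lower() in PDF_EXTS:
            text = "\n".join(ocr_image(page) for page in pdf_pages(path))
        else:
            with Image.open(path) as img:
                text = ocr_image(img)
        out = path.with_suffix(".txt")
        out.write_text(text, encoding="utf-8")
        written.append(out)
    return written
```

For example, `ocr_directory("scans")` would write `scans/page1.txt` for `scans/page1.png`. The heavy imports are kept inside the functions so the file-collection and OSD-parsing logic can be reused or tested without Tesseract installed.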

Currently in beta.

Follow: Demo run

Saram features

Note: Make sure you have an OCR engine such as Tesseract installed, along with the language data it uses for recognition (e.g. tesseract-data-eng). Pillow and Wand, used for image loading and conversion, are fetched automatically during pip install.
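A quick way to confirm these prerequisites before a first run is sketched below. It is a hypothetical helper, not part of Saram: it looks for the `tesseract` executable on `PATH`, lists the installed language packs with `tesseract --list-langs`, and checks that the `pytesseract`, `PIL` (Pillow) and `wand` packages can be found.

```python
import importlib.util
import shutil
import subprocess


def missing_python_packages(modules=("pytesseract", "PIL", "wand")):
    """Return the import names in `modules` that are not installed."""
    # find_spec only confirms the package is present; Wand additionally needs
    # the ImageMagick library at import time, which this does not check.
    return [name for name in modules if importlib.util.find_spec(name) is None]


def tesseract_languages():
    """Return the language packs Tesseract reports, or an empty set if absent."""
    exe = shutil.which("tesseract")
    if exe is None:
        return set()
    result = subprocess.run([exe, "--list-langs"], capture_output=True, text=True)
    # Some older Tesseract releases print this list to stderr instead of stdout.
    output = result.stdout or result.stderr
    return {
        line.strip()
        for line in output.splitlines()
        if line.strip() and not line.startswith("List of")
    }


if __name__ == "__main__":
    print("tesseract on PATH:", shutil.which("tesseract") is not None)
    print("'eng' data installed:", "eng" in tesseract_languages())
    print("missing Python packages:", missing_python_packages() or "none")
```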

To use Saram as a Python module, refer to the py-module branch.

Installation

Install using pip:

$ pip install saram
$ saram <dirname>

Alternatively, clone the source locally:

$ git clone https://github.com/aryaminus/saram
$ cd saram
$ git checkout py-module
$ python main.py <dirname>
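Because the command takes a single `<dirname>`, a small wrapper can run it over several directories in turn. This sketch only shells out to the documented `saram <dirname>` command; the `run_saram` name and the batch behaviour are assumptions, not a feature of Saram itself.

```python
import shutil
import subprocess


def run_saram(dirnames, exe="saram"):
    """Run `<exe> <dirname>` once per directory; return {dirname: exit code}."""
    path = shutil.which(exe)
    if path is None:
        raise FileNotFoundError(f"{exe!r} is not on PATH; try `pip install saram`")
    # Directories are processed one after another, not in parallel.
    return {d: subprocess.run([path, d]).returncode for d in dirnames}
```

For example, `run_saram(["scans", "invoices"])` returns a dict mapping each directory to the exit code of its `saram` run, so a non-zero value points at the directory that failed.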

Todo

Reference

  1. pdf-to-txt
  2. ocr-convert-image-to-text
  3. fix-image-rotation
  4. python-packaging

Contributing

  1. Fork it (https://github.com/aryaminus/saram/fork)
  2. Create your feature branch (git checkout -b feature/fooBar)
  3. Commit your changes (git commit -am 'Add some fooBar')
  4. Push to the branch (git push origin feature/fooBar)
  5. Create a new Pull Request

Enjoy!