Offline OCR and Language Translation

This article describes a procedure for performing offline Optical Character Recognition (OCR) and translation of documents that come in the form of a scanned image (JPG, PNG, etc.) or PDF.

The assumption is that these documents contain sensitive or personal information, so uploading them to online OCR and translation services is not an option because it would expose that information to outside parties.

The approach also assumes that the OCR and translation are performed on a Windows 10 platform.

Prerequisite Software

Before starting, you will need to download and install the following:

  • Microsoft .NET Framework 4.8

Note: If installing Ghostscript, ensure that the PATH environment variable is updated to include the path to the Ghostscript installation directory.
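
One way to confirm that the PATH update took effect is to look the executable up from a fresh terminal session. The sketch below is a minimal check, not part of the original procedure; it assumes Python is available and that Ghostscript was installed under one of its usual console executable names (gswin64c or gswin32c on Windows).

```python
import shutil

# Executable names commonly used by Ghostscript: the 64-bit and 32-bit
# Windows console builds, plus "gs" as used on Linux/macOS.
GHOSTSCRIPT_NAMES = ("gswin64c", "gswin32c", "gs")


def find_on_path(names, which=shutil.which):
    """Return the full path of the first executable found on PATH, or None."""
    for name in names:
        path = which(name)
        if path:
            return path
    return None


if __name__ == "__main__":
    found = find_on_path(GHOSTSCRIPT_NAMES)
    if found:
        print(f"Ghostscript found at: {found}")
    else:
        print("Ghostscript not found on PATH; check the environment variable.")
```

If the script reports that nothing was found, reopening the terminal after editing PATH is usually the first thing to try, since running sessions keep the old value.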

Process overview

  • Download the corresponding language pack/s from within VietOCR. The pack should match the language of the document on which OCR will be run

Sample Image to be Used for OCR/Translation

To demonstrate the process of performing OCR and translating from Chinese to English, we will use a sample “Chinese Resident Identity Card”. The image was sourced from Wikipedia.


Download the Source Language Pack through VietOCR

The source image contains Chinese glyphs, so we will need to download the Chinese language packs.

  • Open VietOCR

Perform OCR

Download the sample identity card image mentioned earlier in this article and save it locally.

To perform OCR using VietOCR:

  • Open the input source file from within VietOCR
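
VietOCR is a graphical front end for the Tesseract OCR engine, so the same OCR step can in principle be scripted against the Tesseract command line. This is a swapped-in route rather than the VietOCR procedure described here, and it assumes that Tesseract is installed separately, is on PATH, and has the Simplified Chinese data (chi_sim) available. The file name id_card.png is a hypothetical name for the saved sample image.

```python
import subprocess
from pathlib import Path


def build_tesseract_command(image_path, output_base, lang="chi_sim"):
    """Build the argument list for: tesseract <image> <output_base> -l <lang>.

    Tesseract appends ".txt" to output_base for its default text output.
    """
    return ["tesseract", str(image_path), str(output_base), "-l", lang]


def run_ocr(image_path, output_base, lang="chi_sim"):
    """Run Tesseract and return the path of the text file it writes."""
    subprocess.run(build_tesseract_command(image_path, output_base, lang), check=True)
    return Path(str(output_base) + ".txt")


if __name__ == "__main__":
    # Only print the command here; call run_ocr() to actually execute it.
    cmd = build_tesseract_command("id_card.png", "ocr_version")
    print(" ".join(cmd))
```

With the output base set to ocr_version, the recognised text lands in ocr_version.txt, which matches the file name used in the later import step.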

Install and Setup Google Translate on Bluestacks

Perform the following from within Bluestacks:

  • Click on the Google Play Store icon located on the Home Screen

Open up Google Translate and choose the following settings:

  • Select “English” for “Your Primary Language”

Copy the Converted Text File (ocr_version.txt) to Bluestacks

  • Open Bluestacks Media Manager
  • Select “Import From Windows”
  • Select the text file containing OCR text (ocr_version.txt)
  • Wait for the file to import. It will appear in Media Manager once import is complete
  • Click on the file to open it and choose “Just Once” to open it with the default viewer (Open with HTML Viewer)
  • The file should load and appear with the correct format
  • Position your mouse cursor over a character within the file and long-press the left mouse button. You will then be able to choose “Select All”
  • You will now have the option to select “Translate”, which will send the text to Google Translate and open up the app
  • To open up the full translation, perform the steps as per below
  • You should now be able to see a neatly formatted version of the original text along with the corresponding translation
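
If the imported file does not display correctly in the viewer, a quick check on the Windows side can rule out an encoding problem before the import is repeated. The sketch below is an optional sanity check, not part of the original procedure. It assumes the OCR output was saved as UTF-8 (with or without a byte order mark) and simply counts how many characters fall in the main CJK ideograph range.

```python
from pathlib import Path


def count_cjk(text):
    """Count characters in the CJK Unified Ideographs block (U+4E00..U+9FFF)."""
    return sum(1 for ch in text if "\u4e00" <= ch <= "\u9fff")


def check_ocr_text(path):
    """Read the file as UTF-8 and report total and CJK character counts.

    "utf-8-sig" accepts UTF-8 with or without a byte order mark.
    Raises UnicodeDecodeError if the file is not valid UTF-8.
    """
    text = Path(path).read_text(encoding="utf-8-sig")
    return {"characters": len(text), "cjk": count_cjk(text)}


if __name__ == "__main__":
    target = Path("ocr_version.txt")
    if target.exists():
        print(check_ocr_text(target))
    else:
        print(f"{target} not found in the current directory")
```

A CJK count of zero for a Chinese document, or a decode error, suggests the file was saved in a different encoding and should be re-exported before it is copied to Bluestacks.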

Notes on VietOCR Accuracy and Features

The following notes are based on excerpts from the VietOCR tech guide.

OCR Accuracy

  • VietOCR contains an implementation of a postprocessing algorithm to improve accuracy by applying corrections to common errors encountered in the OCR process
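
To give a feel for what correction-based postprocessing means, here is a minimal sketch of a whole-word replacement pass. It is an illustration of the general idea only and is not VietOCR's actual algorithm; the correction table and its entries are hypothetical.

```python
import re

# Hypothetical correction table: each key is a common misrecognition and each
# value is the intended word. These entries are illustrative only.
CORRECTIONS = {"tbe": "the", "0f": "of", "wbich": "which"}


def apply_corrections(text, corrections=CORRECTIONS):
    """Replace whole-word matches of known OCR errors with their corrections."""
    if not corrections:
        return text
    # \b anchors keep the replacement from firing inside longer words.
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(key) for key in corrections) + r")\b"
    )
    return pattern.sub(lambda match: corrections[match.group(1)], text)


if __name__ == "__main__":
    print(apply_corrections("tbe name 0f tbe holder"))
```

Matching on whole words is a deliberate choice in this sketch: a blanket substring replacement would also rewrite correct words that happen to contain an error pattern.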

Features

  • Batch processing is supported, which makes room for programmatic automation
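
As a rough idea of what such automation might start from, the sketch below pairs each supported input file in a folder with the text file its OCR output would be written to. It only plans the batch and does not invoke VietOCR; the function name, folder names, and the list of file extensions are assumptions made for illustration.

```python
from pathlib import Path

# File types named in this article as OCR inputs (assumed list).
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".pdf"}


def plan_batch(input_dir, output_dir, suffixes=IMAGE_SUFFIXES):
    """Return sorted (input_file, output_text_file) pairs for supported files."""
    input_dir, output_dir = Path(input_dir), Path(output_dir)
    pairs = []
    for path in sorted(input_dir.iterdir()):
        # Compare suffixes case-insensitively so "scan.JPG" is included.
        if path.is_file() and path.suffix.lower() in suffixes:
            pairs.append((path, output_dir / (path.stem + ".txt")))
    return pairs


if __name__ == "__main__":
    # "scans" and "ocr_output" are hypothetical folder names.
    if Path("scans").is_dir():
        for source, target in plan_batch("scans", "ocr_output"):
            print(f"{source} -> {target}")
```

Each pair can then be handed to whichever OCR invocation is in use, keeping the file discovery separate from the OCR call itself.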

Closing Notes

The procedure described in this article may be cumbersome for some; however, the focus was on performing offline OCR and translation without the use of paid software.

VietOCR is one of many OCR packages. This link provides a list of alternatives.

From my own experience, finding a readily available offline translation package is quite difficult. Given the accuracy and continuous development of existing apps (such as Google Translate), it made sense to leverage an existing product that is continually being enhanced.

If you are after alternatives to Google Translate, some options are Microsoft Translator and Yandex Translate. Both support an offline translation mode.

As of now, these translation apps do not have equivalent versions for Windows, hence the need to install the Bluestacks Android emulator.

Originally published at http://github.com.

