Offline OCR and Language Translation

This article describes a procedure for performing offline Optical Character Recognition (OCR) and translation of documents supplied as a scanned image (JPG, PNG, etc.) or PDF.

The assumption is that these documents contain sensitive or personal information, so uploading them to online OCR and translation services is not an option, because doing so would share that information with outside parties.

The approach also assumes that OCR and translation are performed on a Windows 10 platform.

Pre-requisite Software

Before starting, you will need to download and install the following:

Note: If installing Ghostscript, ensure that the PATH environment variable is updated to include the path to the Ghostscript installation directory.
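One way to confirm the PATH update took effect is a short Python check. This is only a sketch: it assumes Python is available on the machine (not part of the original setup) and that Ghostscript was installed with its usual console executables (gswin64c or gswin32c on Windows, which normally live in the bin subfolder of the installation directory). The function name find_ghostscript is made up for this illustration.

```python
import shutil

# Console executable names used by Ghostscript: the Windows installers
# ship gswin64c.exe / gswin32c.exe, while Unix-like systems use gs.
GHOSTSCRIPT_NAMES = ("gswin64c", "gswin32c", "gs")


def find_ghostscript(names=GHOSTSCRIPT_NAMES, which=shutil.which):
    """Return the full path of the first Ghostscript executable on PATH, or None."""
    for name in names:
        path = which(name)
        if path:
            return path
    return None


if __name__ == "__main__":
    found = find_ghostscript()
    if found:
        print(f"Ghostscript found: {found}")
    else:
        print("Ghostscript not found on PATH; add its folder to PATH and open a new terminal.")
```

If the script reports that Ghostscript is not found, remember that a PATH change only applies to terminals and applications started after the change was made.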

Process overview

Sample Image to be Used for OCR/Translation

To demonstrate the process of OCR and translation from Chinese to English, we will use a sample “Chinese Resident Identity Card” (embedded below). The image was sourced from Wikipedia.

Download the Source Language Pack through VietOCR

The source image contains Chinese glyphs, so we will need to download the Chinese language packs.
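After the download, it can be useful to confirm that the language data actually landed on disk. The sketch below assumes that VietOCR keeps its Tesseract language files as <code>.traineddata files in a tessdata folder, and that the Chinese packs use the standard Tesseract codes chi_sim (Simplified) and chi_tra (Traditional). The folder path in the example is hypothetical and must be adjusted to your own installation.

```python
from pathlib import Path


def missing_language_data(tessdata_dir, languages=("chi_sim", "chi_tra")):
    """Return the language codes whose .traineddata file is absent from tessdata_dir."""
    folder = Path(tessdata_dir)
    return [lang for lang in languages
            if not (folder / f"{lang}.traineddata").is_file()]


if __name__ == "__main__":
    # Hypothetical location: change this to your VietOCR tessdata folder.
    tessdata = Path(r"C:\VietOCR\tessdata")
    missing = missing_language_data(tessdata)
    print("Missing language data:", ", ".join(missing) if missing else "none")
```

For the sample identity card only the Simplified Chinese pack should be needed, so a call such as missing_language_data(folder, ("chi_sim",)) may be enough.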

Perform OCR

Download the sample identity card image mentioned earlier in this article and save it locally.

To perform OCR using VietOCR:

Install and Setup Google Translate on Bluestacks

Perform the following from within Bluestacks:

Open up Google Translate and choose the following settings:

Copy the Converted Text File (ocr_version.txt) to Bluestacks
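Before copying the file across, a quick sanity check can catch an empty or wrongly encoded OCR result. The following is a minimal sketch, assuming the OCR output was saved as UTF-8 text under the name ocr_version.txt in the current folder; the helper names are invented for this example. It counts how many characters fall in the main CJK Unified Ideographs range.

```python
from pathlib import Path


def count_cjk(text):
    """Count characters in the CJK Unified Ideographs block (U+4E00 to U+9FFF)."""
    return sum(1 for ch in text if "\u4e00" <= ch <= "\u9fff")


def summarize_ocr_file(path, encoding="utf-8"):
    """Read an OCR text file and return (non_whitespace_chars, cjk_chars)."""
    text = Path(path).read_text(encoding=encoding)
    visible = sum(1 for ch in text if not ch.isspace())
    return visible, count_cjk(text)


if __name__ == "__main__":
    ocr_file = Path("ocr_version.txt")  # assumed to sit in the current folder
    if ocr_file.is_file():
        total, cjk = summarize_ocr_file(ocr_file)
        print(f"{total} visible characters, {cjk} of them Chinese glyphs")
    else:
        print(f"{ocr_file} not found; run the OCR step first.")
```

A count of zero Chinese glyphs, or a decoding error, suggests the file was saved in a different encoding or that OCR ran with the wrong language selected.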

Notes on VietOCR Accuracy and Features

The following notes are based on excerpts from the VietOCR tech guide.

OCR Accuracy

Features

Closing Notes

The procedure described in this article may be cumbersome for some. However, the focus was on performing offline OCR and translation without the use of paid software.

VietOCR is one of many OCR packages. This link provides a list of alternatives.

From my own experience, finding a readily available offline translation package is quite difficult. Given the accuracy and continuous development of existing apps such as Google Translate, it made sense to build on a product that is continually being enhanced.

If you are after alternatives to Google Translate, some options are Microsoft Translator and Yandex Translate. Both support an offline translation mode.

As of now, these translation apps do not have equivalent versions for Windows, hence the need to install the Bluestacks Android emulator.

Originally published at http://github.com.

Learner. Interests include Cloud and Devops technologies.
