How to enable OCR for non-English images

Netwrix Data Classification
https://kb.netwrix.com/3519

The steps below describe how to deploy additional OCR language packs and how to specify which files each installed pack should process. They assume that OCR is already enabled and configured correctly; for details, see the following KB article: https://kb.netwrix.com/3517

First, check whether the language you wish to use is available in the list below, then:

  1. Log into the Partners Portal (https://partners.conceptsearching.com)
  2. Select “OCR Language Packs”
  3. Download the required language pack
  4. Deploy the pack to the following locations on every server:
    1. conceptQS (typically: C:\inetpub\wwwroot\conceptQS\bin\tessdata)
    2. conceptCollector (typically: C:\Program Files\ConceptSearching\Services\ConceptCollectorService\tessdata)
  5. The language pack file should not be renamed
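
The copy in steps 4 and 5 could be scripted and run on each server. The following is only a minimal sketch: the function name is hypothetical, the two directories are the "typical" locations quoted above and may differ per installation, and any pack file name passed in (for example rus.traineddata) is an assumption, not something this article specifies.

```python
import shutil
from pathlib import Path

# "Typical" locations quoted in step 4; adjust to match each server.
TESSDATA_DIRS = [
    Path(r"C:\inetpub\wwwroot\conceptQS\bin\tessdata"),
    Path(r"C:\Program Files\ConceptSearching\Services\ConceptCollectorService\tessdata"),
]


def deploy_language_pack(pack, target_dirs=TESSDATA_DIRS):
    """Copy the pack into every target directory that exists.

    The file keeps its original name (step 5). Returns two lists:
    the copies that were written and the directories that were missing.
    """
    pack = Path(pack)
    deployed, missing = [], []
    for target in map(Path, target_dirs):
        if not target.is_dir():
            # Report the gap rather than creating a directory in the wrong place.
            missing.append(target)
            continue
        destination = target / pack.name  # same file name, never renamed
        shutil.copy2(pack, destination)
        deployed.append(destination)
    return deployed, missing
```

Calling deploy_language_pack with the path of the downloaded pack returns the copies it made and any location it could not find, so a server where only one of the two components is installed is easy to spot.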

Next, define which files should be processed with a particular language pack:

  1. Log into the Administration Portal
  2. Select “Config”
  3. Expand “Text Processing”
  4. Select “OCR Language Mapping”
  5. Each mapping allows you to define part of a path to identify specific files for processing:
    1. Select “Add”
    2. Define the inclusion filter, such as:
      • *ru_* – Identifies any file that contains “ru_” within the path
      • * – Identifies any file
    3. Select the language (mapped to the deployed language pack)
    4. Select “Save”
  6. If a file matches multiple inclusion rules, the longest matching rule is favoured.
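
To reason about which mapping would apply to a given path, the rule in step 6 can be modelled as below. This is an illustrative sketch, not the product's implementation: it assumes "longest" means the filter with the most characters, that "*" matches any run of characters including path separators, and that matching is case-sensitive (the article states none of these). The function name, language labels and sample paths are hypothetical.

```python
from fnmatch import fnmatchcase


def pick_language(path, mappings):
    """Return the language of the longest inclusion filter matching `path`.

    `mappings` is a list of (filter, language) pairs. Returns None when
    no filter matches.
    """
    matches = [
        (pattern, language)
        for pattern, language in mappings
        if fnmatchcase(path, pattern)  # shell-style wildcard match
    ]
    if not matches:
        return None
    # Assumption: the "longest matching rule" is the longest filter string.
    return max(matches, key=lambda match: len(match[0]))[1]


# The two example filters from step 5, with hypothetical language labels.
mappings = [("*ru_*", "Russian"), ("*", "English")]
```

With these mappings, a path such as \\share\docs\ru_contract.tif matches both filters and resolves to the "*ru_*" mapping, while \\share\docs\invoice.tif matches only the catch-all "*" mapping.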