How to enable OCR for non-English images

Netwrix Data Classification
The steps below describe how to deploy additional OCR language packs and how to identify which files should be processed with each installed pack. They assume that OCR has already been enabled correctly; more details can be found in the following KB article:

Select the language you wish to use from the dropdown list in order to download the corresponding pack:

  1. Ensure that the pack is deployed on all servers to the following locations:
    1. conceptQS (typically: C:\inetpub\wwwroot\conceptQS\bin\tessdata)
    2. conceptCollector (typically: C:\Program Files\ConceptSearching\Services\ConceptCollectorService\tessdata)
  2. Do not rename the language pack file.
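
The deployment step above can be scripted. The following is a minimal sketch, not a tool shipped with the product: the function name deploy_pack is hypothetical, and the two directories are the "typical" locations quoted above, which may differ on your servers. It copies the pack into each tessdata folder without renaming it and would need to be run on every server.

```python
import shutil
from pathlib import Path

# Typical locations quoted in the steps above (assumption: your install
# uses the defaults; adjust these per server if it does not).
TESSDATA_DIRS = [
    Path(r"C:\inetpub\wwwroot\conceptQS\bin\tessdata"),
    Path(r"C:\Program Files\ConceptSearching\Services\ConceptCollectorService\tessdata"),
]


def deploy_pack(pack_file, target_dirs=TESSDATA_DIRS):
    """Copy pack_file into every target directory, keeping its original name."""
    pack_file = Path(pack_file)
    if not pack_file.is_file():
        raise FileNotFoundError(pack_file)

    copied = []
    for target in target_dirs:
        target = Path(target)
        # Fail loudly rather than create a folder in an unexpected place.
        if not target.is_dir():
            raise NotADirectoryError(target)
        # The destination reuses pack_file.name, so the pack is never renamed.
        copied.append(Path(shutil.copy2(pack_file, target / pack_file.name)))
    return copied
```

For example, calling deploy_pack with the path of the downloaded pack returns the list of copies it created; it raises an error if the pack or one of the target folders is missing.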

Next, identify which files should be processed with a particular language pack:

  1. Log into the Administration Portal
  2. Select “Config”
  3. Expand “Text Processing”
  4. Select “OCR Language Mapping”
  5. Each mapping allows you to define part of a path to identify specific files for processing:
    1. Select “Add”
    2. Define the inclusion filter, such as:
      • *ru_* – Identifies any file that contains “ru_” within the path
      • * – Identifies any file
    3. Select the language (mapped to the deployed language pack)
    4. Select “Save”
  6. If a file matches multiple inclusion rules, the longest matching rule is favoured.
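
To illustrate how the inclusion filters and the longest-rule preference interact, here is a small sketch. It is an illustration only, not the product's implementation: the function pick_language and the language labels are hypothetical, "longest" is assumed to mean the filter with the most characters, and matching is assumed to be case-insensitive wildcard matching against the full path.

```python
from fnmatch import fnmatchcase


def pick_language(path, mappings):
    """Return the language of the longest filter that matches path, or None.

    mappings is a list of (inclusion_filter, language) pairs.
    """
    lowered = path.lower()
    # Keep only the mappings whose wildcard filter matches the whole path.
    matches = [
        (pattern, language)
        for pattern, language in mappings
        if fnmatchcase(lowered, pattern.lower())
    ]
    if not matches:
        return None
    # Assumption: the "longest" rule is the filter with the most characters.
    return max(matches, key=lambda item: len(item[0]))[1]


# Hypothetical mappings mirroring the two example filters above.
mappings = [("*", "English"), ("*ru_*", "Russian")]
russian_hit = pick_language(r"C:\scans\ru_invoice.png", mappings)
fallback_hit = pick_language(r"C:\scans\invoice.png", mappings)
```

Under these assumptions the first path matches both filters and resolves to the longer one (*ru_*), while the second matches only the catch-all * filter.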
