OCR

Description

OCR Example

OCR (Optical Character Recognition) extracts text from images. It scans the visual content of an image, locates any text present, and recognises it.

The recognised text is converted into a machine-readable format so that it can be processed, searched, and manipulated digitally. Typical applications include digitising printed documents, extracting information from scanned images, enabling text-based searches within image collections, and automating data entry that would otherwise require manual transcription or typing.

Settings

caution

Tesseract models for detection and recognition are not supported when Zene is running on Raspberry Pi.

Detection Model

The detection model locates text in the scene. The following are the options:

  • EAST (Efficient and Accurate Scene Text Detector)
  • DB (Differentiable Binarization)
  • Tesseract

EAST (Efficient and Accurate Scene Text Detector) Settings

Padding Scale

Padding scale is a parameter that determines the amount of padding around the detected text regions. A higher padding scale will result in more padding around the text, which can be useful in cases where the text is surrounded by other objects or noise. This can help improve the accuracy of the text recognition process by including more context around the detected text.

NMS Threshold

NMS (non-maximum suppression) threshold is a parameter that is used to eliminate overlapping bounding boxes in the detection process. When two boxes overlap by more than the threshold, only the one with the higher confidence score is kept, which reduces duplicate detections of the same text. A lower NMS threshold will suppress overlapping boxes more aggressively, while a higher threshold will allow more individual boxes to be retained.
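The effect of the threshold can be illustrated with a minimal sketch of non-maximum suppression. It assumes axis-aligned boxes given as (x1, y1, x2, y2) and overlap measured as intersection over union (IoU). The function names are illustrative and are not part of Zene, and EAST implementations often work with rotated boxes instead.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, nms_threshold):
    """Return indices of the boxes to keep, highest score first.

    A box is dropped when its IoU with an already kept box
    exceeds nms_threshold (assumed convention).
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= nms_threshold for k in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
strict = non_max_suppression(boxes, scores, 0.5)   # second box is suppressed
lenient = non_max_suppression(boxes, scores, 0.7)  # all three boxes survive
```

The first two boxes overlap with an IoU of roughly 0.68, so they count as duplicates at a threshold of 0.5 but as separate detections at 0.7.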

Confidence Threshold

Confidence threshold is a parameter that determines the minimum confidence score a detection should have to be considered valid. Detections with confidence scores below this threshold will be discarded. A higher confidence threshold will result in fewer detections, but with higher accuracy, whereas a lower threshold will result in more detections, but with potentially lower accuracy.
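As a minimal sketch of this trade-off, the filter below keeps only detections whose score reaches the threshold. The (region name, score) pairs and the function name are hypothetical and only illustrate the behaviour described above.

```python
def filter_by_confidence(detections, confidence_threshold):
    """Discard detections whose score is below the threshold."""
    return [d for d in detections if d[1] >= confidence_threshold]


# Hypothetical detections as (region name, confidence score) pairs.
detections = [("region_a", 0.92), ("region_b", 0.41), ("region_c", 0.67)]

permissive = filter_by_confidence(detections, 0.5)  # keeps region_a and region_c
strict = filter_by_confidence(detections, 0.9)      # keeps region_a only
```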

DB (Differentiable Binarization) Settings

Binary Threshold

Binary threshold is used to convert an input image into a binary image, where each pixel is either black or white. A threshold value is selected, and any pixel with an intensity above this value is set to white (1.0), while all others are set to black (0.0). This helps separate the text from the background, making the text easier to detect and recognise.
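The thresholding step can be sketched in a few lines. This example assumes the image is a grid of intensities between 0.0 and 1.0 stored as nested lists; the function name is illustrative.

```python
def binarize(intensity_map, binary_threshold):
    """Set pixels above the threshold to 1.0 (white), all others to 0.0 (black)."""
    return [
        [1.0 if value > binary_threshold else 0.0 for value in row]
        for row in intensity_map
    ]


intensity_map = [
    [0.10, 0.35],
    [0.30, 0.90],
]
binary = binarize(intensity_map, 0.3)
```

A pixel exactly equal to the threshold stays black here, because only intensities above the threshold are set to white.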

Polygon Threshold

Polygon threshold is the minimum confidence score a detected region needs to be considered valid text. When the DB text detector processes an image and generates polygons around potential text regions, it assigns a confidence score to each polygon. Only polygons with confidence scores higher than the polygon threshold are kept as valid text regions.
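The sketch below shows one way such a score could be computed and filtered. It assumes the score of a region is the mean of the per-pixel text probabilities inside it, and it uses rectangles (x1, y1, x2, y2) instead of general polygons for brevity. Both choices, and all names, are assumptions for illustration and are not taken from Zene.

```python
def region_score(probability_map, region):
    """Mean probability inside a rectangle (x1, y1, x2, y2), end-exclusive."""
    x1, y1, x2, y2 = region
    values = [
        probability_map[y][x]
        for y in range(y1, y2)
        for x in range(x1, x2)
    ]
    return sum(values) / len(values) if values else 0.0


def keep_valid_regions(probability_map, regions, polygon_threshold):
    """Keep regions whose score is higher than the polygon threshold."""
    return [
        region for region in regions
        if region_score(probability_map, region) > polygon_threshold
    ]


probability_map = [
    [0.9, 0.8, 0.1, 0.2],
    [0.7, 0.8, 0.3, 0.2],
]
regions = [(0, 0, 2, 2), (2, 0, 4, 2)]
valid = keep_valid_regions(probability_map, regions, 0.5)
```

In this example the left region averages about 0.8 and is kept, while the right region averages about 0.2 and is discarded.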

Unclip Ratio

Unclip ratio controls how far the bounding box of a detected text region is expanded. When a text region is detected, the bounding box might be too tight around the text. To ensure that the entire text is captured, the bounding box is expanded by a factor determined by the unclip ratio. A higher unclip ratio results in a larger expansion of the bounding box.
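As a rough sketch of the expansion, the example below follows the rule used by the reference DB implementation, where the offset distance is area × unclip ratio / perimeter. Whether Zene applies exactly this rule is an assumption, and the example uses an axis-aligned rectangle (x1, y1, x2, y2) where the real detector offsets a polygon.

```python
def unclip_rectangle(box, unclip_ratio):
    """Expand a rectangle outward by area * unclip_ratio / perimeter on every side."""
    x1, y1, x2, y2 = box
    width, height = x2 - x1, y2 - y1
    perimeter = 2 * (width + height)
    if perimeter == 0:
        return box  # degenerate box, nothing to expand
    distance = (width * height) * unclip_ratio / perimeter
    return (x1 - distance, y1 - distance, x2 + distance, y2 + distance)


tight_box = (10, 10, 110, 30)  # 100 wide, 20 high
expanded = unclip_rectangle(tight_box, 1.5)
larger = unclip_rectangle(tight_box, 3.0)
```

With a ratio of 1.5 the box grows by 12.5 pixels on each side, and doubling the ratio to 3.0 doubles the offset to 25 pixels. The result may extend beyond the image, so a caller would normally clamp it to the frame.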

Maximum Results

Maximum results limits the number of text regions returned by the DB text detector. When processing an image, the detector might find multiple text regions with varying confidence scores. With this parameter set to N, the detector returns only the N text regions with the highest confidence scores.
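The selection can be sketched as a sort followed by a slice. The (region name, score) pairs and the function name are hypothetical.

```python
def top_n_regions(regions, max_results):
    """Return the max_results regions with the highest confidence scores."""
    ranked = sorted(regions, key=lambda region: region[1], reverse=True)
    return ranked[:max_results]


# Hypothetical regions as (region name, confidence score) pairs.
regions = [("header", 0.71), ("price", 0.95), ("footer", 0.55), ("title", 0.88)]
best_two = top_n_regions(regions, 2)
```

Regions beyond the limit are dropped even when their scores are high, so a value that is too low can hide valid text in busy scenes.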

Padding Scale

Padding scale controls the amount of padding added around the detected text regions. Padding is added to ensure that the entire text is captured and not cut off at the edges. The padding scale determines the size of the padding relative to the size of the text region. A higher padding scale results in more padding around the text region.
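A minimal sketch of relative padding follows. It assumes boxes are given as (x, y, width, height), that the scale is a fraction of the box width and height applied on each side, and that the padded box is clamped to the image. These conventions and the function name are illustrative assumptions.

```python
def pad_box(box, padding_scale, image_width, image_height):
    """Expand an (x, y, w, h) box by padding_scale times its size on each side."""
    x, y, w, h = box
    pad_x = w * padding_scale
    pad_y = h * padding_scale
    # Clamp the padded box so it never leaves the image.
    left = max(0.0, x - pad_x)
    top = max(0.0, y - pad_y)
    right = min(float(image_width), x + w + pad_x)
    bottom = min(float(image_height), y + h + pad_y)
    return (left, top, right - left, bottom - top)


padded = pad_box((100, 50, 200, 40), 0.25, 640, 480)
clamped = pad_box((0, 0, 100, 100), 0.5, 120, 120)
```

Because the padding is relative, a wide line of text receives more horizontal padding than a short word at the same scale.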

Tesseract Detection Settings

Confidence Threshold

The minimum probability required for the detector to consider a region in the input image as containing text. This threshold filters out false-positive detections so that only regions with a high likelihood of containing text are passed on for recognition. Adjusting it controls the balance between precision and recall: a higher threshold gives more precise detections but may miss some true text regions, while a lower threshold detects more regions at a higher risk of false positives.

Recognition Model

The recognition model reads the text in the regions found by the detection model. The following are the options:

  • Tesseract
  • CRNN (Convolutional Recurrent Neural Network)

Tesseract Recognition Settings

Language

The language used for text recognition. The following are the options:

  • Afrikaans
  • Arabic
  • Assamese
  • Azerbaijani
  • Belarusian
  • Bengali
  • Bulgarian
  • Burmese
  • Catalan
  • Chinese (Simplified)
  • Chinese (Simplified, Vertical)
  • Chinese (Traditional)
  • Chinese (Traditional, Vertical)
  • Croatian
  • Czech
  • Danish
  • Divehi
  • Dutch
  • Dzongkha
  • English
  • Esperanto
  • Estonian
  • Ewe
  • Finnish
  • French
  • German
  • Georgian
  • Greek
  • Haitian
  • Hebrew
  • Hindi
  • Hungarian
  • Indonesian
  • Icelandic
  • Italian
  • Japanese
  • Javanese
  • Khmer (Central)
  • Korean
  • Kurdish
  • Lao
  • Latin
  • Lithuanian
  • Macedonian
  • Malay
  • Nepali
  • Norwegian
  • Panjabi
  • Polish
  • Portuguese
  • Romanian
  • Russian
  • Serbian
  • Sinhala
  • Slovak
  • Slovenian
  • Spanish
  • Swahili
  • Swedish
  • Tajik
  • Tamil
  • Telugu
  • Thai
  • Tigrinya
  • Turkish
  • Uighur
  • Ukrainian
  • Uzbek
  • Vietnamese

Display Results

Overlay Results

Whether to draw the results on top of the image frame.

Draw Lines

Whether to draw the outlines of the detected text regions on top of the image frame.

Draw Text

Whether to draw the detected text on top of the image frame.