OCR
Description
OCR (Optical Character Recognition) extracts text from images. It works in two stages: a detection model locates the regions of the image that contain text, and a recognition model reads the characters inside each region.
The recognised text is returned in machine-readable form, so it can be processed, searched and edited digitally. Typical applications include digitising printed documents, extracting information from scanned images, enabling text search across image collections and automating data entry. OCR saves the time and effort of transcribing or typing text by hand.
Settings
Note: if Zene is running on a Raspberry Pi, the Tesseract detection and recognition models are not supported.
Detection Model
The detection model locates the text regions in the scene. The options are:
- EAST (Efficient and Accurate Scene Text Detector)
- DB (Differentiable Binarization)
- Tesseract
EAST (Efficient and Accurate Scene Text Detector) Settings
Padding Scale
Padding scale determines how much padding is added around each detected text region. A higher padding scale adds more padding, which helps when the detected box is too tight and cuts off the edges of the characters. Including a little context around the detected text in this way can improve the accuracy of text recognition.
NMS Threshold
NMS (Non-Maximum Suppression) threshold controls how overlapping bounding boxes are eliminated. The detector often produces several overlapping boxes for the same piece of text; NMS keeps the box with the highest confidence score and discards every other box whose overlap with it (measured as intersection over union) exceeds the threshold. A lower NMS threshold suppresses overlapping boxes more aggressively, while a higher threshold allows more overlapping boxes to be retained.
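A minimal sketch of greedy non-maximum suppression may help illustrate the effect of the threshold. It assumes axis-aligned boxes given as (x1, y1, x2, y2) and confidence scores in a parallel list; EAST itself produces rotated boxes, and the function names here are illustrative, not Zene's API.

```python
from typing import List, Tuple

# Axis-aligned box as (x1, y1, x2, y2); EAST itself produces rotated boxes.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union (overlap ratio) of two boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], scores: List[float], nms_threshold: float) -> List[int]:
    """Greedy NMS: return the indices of the boxes that survive suppression."""
    # Visit boxes from highest to lowest confidence.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap any kept box by more than the threshold.
        if all(iou(boxes[i], boxes[k]) <= nms_threshold for k in keep):
            keep.append(i)
    return keep


# Two near-duplicate boxes around one word, plus a separate word.
boxes = [(10, 10, 110, 40), (12, 11, 112, 42), (200, 50, 300, 80)]
scores = [0.9, 0.8, 0.95]
print(nms(boxes, scores, 0.4))  # [2, 0]: the duplicate box 1 is suppressed
print(nms(boxes, scores, 0.9))  # [2, 0, 1]: overlap (about 0.87) is below 0.9, so all are kept
```

The two near-duplicate boxes overlap with an IoU of about 0.87, so a threshold of 0.4 removes the weaker one, while 0.9 keeps both.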
Confidence Threshold
Confidence threshold sets the minimum confidence score a detection must have to be considered valid; detections scoring below it are discarded. A higher threshold yields fewer but more reliable detections, whereas a lower threshold yields more detections at the cost of more false positives.
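The trade-off can be shown with a tiny filter. This sketch assumes each detection carries a confidence score between 0 and 1; the region labels and the helper name are hypothetical.

```python
from typing import List, Tuple

# Each detection is (description of the region, confidence score in [0, 1]).
Detection = Tuple[str, float]


def filter_by_confidence(detections: List[Detection], threshold: float) -> List[Detection]:
    """Discard detections whose confidence is below the threshold."""
    return [d for d in detections if d[1] >= threshold]


detections = [("shop sign", 0.92), ("price tag", 0.55), ("brick texture", 0.31)]
print(filter_by_confidence(detections, 0.5))  # keeps the sign and the price tag
print(filter_by_confidence(detections, 0.9))  # keeps only the sign
```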
DB (Differentiable Binarization) Settings
Binary Threshold
Binary threshold converts the per-pixel text probability map produced by the DB model into a binary map, in which each pixel is either text or background. Any pixel whose probability is above the threshold is set to white (1.0), while all others are set to black (0.0). This separates the text from the background, making the text regions easier to locate and recognise.
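A minimal sketch of the thresholding step, assuming the probability map is a small grid of values between 0 and 1 (a real map has one value per image pixel); the function name is illustrative.

```python
from typing import List

Grid = List[List[float]]


def binarize(prob_map: Grid, binary_threshold: float) -> Grid:
    """Map each pixel to 1.0 (text) if its probability is above the threshold, else 0.0."""
    return [[1.0 if p > binary_threshold else 0.0 for p in row] for row in prob_map]


# A tiny 4x4 probability map with one text blob in the middle.
prob_map = [
    [0.05, 0.10, 0.08, 0.02],
    [0.20, 0.85, 0.90, 0.15],
    [0.25, 0.80, 0.95, 0.30],
    [0.03, 0.12, 0.07, 0.01],
]
for row in binarize(prob_map, 0.3):
    print(row)
```

With a threshold of 0.3, only the four central pixels become 1.0; raising the threshold to 0.9 would leave just the single strongest pixel.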
Polygon Threshold
The minimum confidence a detected text region must have to be accepted as text. When the DB text detector processes an image, it generates polygons around potential text regions and assigns each polygon a confidence score. Only polygons scoring higher than the polygon threshold are treated as valid text regions.
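How the polygon score is computed is not stated here. The sketch below assumes, as in common DB post-processing, that a region's score is the mean text probability of the pixels inside it, and it uses an axis-aligned rectangle in place of a polygon to stay short. Names are illustrative.

```python
from typing import List, Tuple

Grid = List[List[float]]
# Axis-aligned rectangle (x1, y1, x2, y2) standing in for a polygon; x2 and y2 are exclusive.
Rect = Tuple[int, int, int, int]


def region_score(prob_map: Grid, rect: Rect) -> float:
    """Mean text probability of the pixels inside the rectangle."""
    x1, y1, x2, y2 = rect
    values = [prob_map[y][x] for y in range(y1, y2) for x in range(x1, x2)]
    return sum(values) / len(values)


def keep_regions(prob_map: Grid, rects: List[Rect], polygon_threshold: float) -> List[Rect]:
    """Keep only the regions whose score is higher than the polygon threshold."""
    return [r for r in rects if region_score(prob_map, r) > polygon_threshold]


prob_map = [
    [0.05, 0.10, 0.08, 0.02],
    [0.20, 0.85, 0.90, 0.15],
    [0.25, 0.80, 0.95, 0.30],
    [0.03, 0.12, 0.07, 0.01],
]
candidates = [(1, 1, 3, 3), (0, 0, 2, 2)]
print([round(region_score(prob_map, r), 3) for r in candidates])  # [0.875, 0.3]
print(keep_regions(prob_map, candidates, 0.6))  # [(1, 1, 3, 3)]
```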
Unclip Ratio
A parameter used to control the extent of the text region bounding box expansion. When a text region is detected, the bounding box might be too tight around the text. To ensure that the entire text is captured, the bounding box is expanded by a factor determined by the unclip ratio. A higher unclip ratio results in a larger expansion of the bounding box.
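In the original DB formulation, a region with area A and perimeter L is expanded outward by a distance D = A × r / L, where r is the unclip ratio. Assuming Zene follows that formulation, the sketch below applies it to a rectangle. Real DB implementations offset the full polygon; growing each side of a rectangle by D gives the bounding box of that offset shape.

```python
from typing import Tuple

Rect = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def unclip_rect(rect: Rect, unclip_ratio: float) -> Rect:
    """Grow a rectangle outward by D = A * r / L on every side."""
    x1, y1, x2, y2 = rect
    w, h = x2 - x1, y2 - y1
    area = w * h                          # A
    perimeter = 2 * (w + h)               # L
    d = area * unclip_ratio / perimeter   # offset distance D
    return (x1 - d, y1 - d, x2 + d, y2 + d)


# A 100 x 20 word box: A = 2000, L = 240, so D = 2000 * 1.5 / 240 = 12.5.
print(unclip_rect((0, 0, 100, 20), 1.5))  # (-12.5, -12.5, 112.5, 32.5)
print(unclip_rect((0, 0, 100, 20), 3.0))  # (-25.0, -25.0, 125.0, 45.0)
```

Because D depends on area divided by perimeter, a long, thin word box grows by an amount tied mostly to its height rather than its length.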
Maximum Results
Limits the number of text regions returned by the DB text detector. When processing an image, the detector may find many text regions with varying confidence scores; with this parameter set to N, it returns only the N regions with the highest confidence scores.
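Selecting the top N regions by score can be sketched with Python's `heapq.nlargest`. The (label, score) representation is an assumption for illustration.

```python
import heapq
from typing import List, Tuple

Region = Tuple[str, float]  # (region label, confidence score)


def top_n(regions: List[Region], max_results: int) -> List[Region]:
    """Return at most max_results regions, highest confidence first."""
    return heapq.nlargest(max_results, regions, key=lambda r: r[1])


regions = [("a", 0.70), ("b", 0.95), ("c", 0.40), ("d", 0.88)]
print(top_n(regions, 2))   # [('b', 0.95), ('d', 0.88)]
print(top_n(regions, 10))  # all four, sorted by score
```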
Padding Scale
This is used to control the amount of padding added around the detected text regions. Padding is added to ensure that the entire text is captured and not cut off at the edges. The padding scale determines the size of the padding relative to the size of the text region. A higher padding scale results in a larger padding around the text region.
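The exact padding formula is not given here. The sketch below assumes a hypothetical definition: each side is padded by the padding scale times the box's width (horizontally) or height (vertically), and the result is clipped to the image bounds.

```python
from typing import Tuple

Rect = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def pad_box(rect: Rect, padding_scale: float, img_w: int, img_h: int) -> Rect:
    """Hypothetical padding: grow each side by padding_scale x box size, clipped to the image."""
    x1, y1, x2, y2 = rect
    pad_x = (x2 - x1) * padding_scale
    pad_y = (y2 - y1) * padding_scale
    return (
        max(0.0, x1 - pad_x),
        max(0.0, y1 - pad_y),
        min(float(img_w), x2 + pad_x),
        min(float(img_h), y2 + pad_y),
    )


print(pad_box((50, 40, 150, 60), 0.25, 640, 480))  # (25.0, 35.0, 175.0, 65.0)
print(pad_box((5, 5, 105, 25), 0.25, 120, 480))    # (0.0, 0.0, 120.0, 30.0): clipped at the edges
```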
Tesseract Detection Settings
Confidence Threshold
The minimum probability required for the detector to consider a region in the input image as containing text. This threshold helps filter out false-positive detections and ensures that only regions with a high likelihood of containing text are processed further for text recognition. By adjusting the confidence threshold, one can control the balance between precision and recall, with higher thresholds leading to more precise detections but potentially missing some true text regions, and lower thresholds resulting in more detected regions but with a higher chance of false positives.
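A sketch of confidence filtering, assuming the threshold is compared against Tesseract's per-word confidence (reported on a 0 to 100 scale). The sample dictionary is hand-written to mimic the shape returned by pytesseract's `image_to_data(..., output_type=Output.DICT)`; only its `text` and `conf` keys are used, and the values are illustrative.

```python
from typing import Dict, List, Tuple


def confident_words(data: Dict[str, list], conf_threshold: float) -> List[Tuple[str, float]]:
    """Return (word, confidence) pairs whose confidence meets the threshold."""
    words = []
    for text, conf in zip(data["text"], data["conf"]):
        score = float(conf)  # also accepts confidences given as strings, e.g. "96.5"
        # Rows with empty text (conf -1) are layout entries, not words.
        if text.strip() and score >= conf_threshold:
            words.append((text, score))
    return words


# Hand-written sample; only the "text" and "conf" keys are used here.
data = {
    "text": ["", "Invoice", "No.", "8l2", ""],
    "conf": [-1, 96.5, 91.0, 42.3, -1],
}
print(confident_words(data, 60))  # [('Invoice', 96.5), ('No.', 91.0)]
print(confident_words(data, 40))  # also keeps the low-confidence misread '8l2'
```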
Recognition Model
The recognition model reads the characters in each detected text region. The options are:
- Tesseract
- CRNN (Convolutional Recurrent Neural Network)
Tesseract Recognition Settings
Language
The language used for text recognition. The following are the options:
- Afrikaans
- Arabic
- Assamese
- Azerbaijani
- Belarusian
- Bengali
- Bulgarian
- Burmese
- Catalan
- Chinese (Simplified)
- Chinese (Simplified, Vertical)
- Chinese (Traditional)
- Chinese (Traditional, Vertical)
- Croatian
- Czech
- Danish
- Divehi
- Dutch
- Dzongkha
- English
- Esperanto
- Estonian
- Ewe
- Finnish
- French
- German
- Georgian
- Greek
- Haitian
- Hebrew
- Hindi
- Hungarian
- Indonesian
- Icelandic
- Italian
- Japanese
- Javanese
- Khmer (Central)
- Korean
- Kurdish
- Lao
- Latin
- Lithuanian
- Macedonian
- Malay
- Nepali
- Norwegian
- Panjabi
- Polish
- Portuguese
- Romanian
- Russian
- Serbian
- Sinhala
- Slovak
- Slovenian
- Spanish
- Swahili
- Swedish
- Tajik
- Tamil
- Telugu
- Thai
- Tigrinya
- Turkish
- Uighur
- Ukrainian
- Uzbek
- Vietnamese
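Tesseract identifies each language by the name of its trained data file (for example `eng` for English or `chi_sim_vert` for vertical Simplified Chinese), and its own command line combines several languages with `+`, as in `-l eng+deu`. The hypothetical helper below maps a few of the display names above to those codes; whether Zene accepts more than one language at a time is not stated here.

```python
# Hypothetical mapping from display names to Tesseract language codes (a few entries only).
TESSERACT_CODES = {
    "Afrikaans": "afr",
    "Arabic": "ara",
    "Chinese (Simplified)": "chi_sim",
    "Chinese (Simplified, Vertical)": "chi_sim_vert",
    "Chinese (Traditional)": "chi_tra",
    "English": "eng",
    "French": "fra",
    "German": "deu",
    "Japanese": "jpn",
    "Russian": "rus",
}


def tesseract_lang(*names: str) -> str:
    """Build a Tesseract language string; multiple languages are joined with '+'."""
    return "+".join(TESSERACT_CODES[name] for name in names)


print(tesseract_lang("English"))            # eng
print(tesseract_lang("English", "German"))  # eng+deu
```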
Display Results
Overlay Results
Whether to draw the results on top of the image frame.
Draw Lines
Whether to draw the outlines of the detected text regions on top of the image frame.
Draw Text
Whether to draw the recognised text on top of the image frame.