nyeoWM edited this page Jul 8, 2022 · 10 revisions

AI4WRD-OCR WIKI

AI4WRD was first developed as a non-invasive data extraction method for window-based applications, mainly for software that does not support data extraction through APIs or output to a generally usable format such as JSON or CSV. The software extracts data by running optical character recognition (OCR) on video streams, and works with video capture cards, virtual cameras, and video cameras. Because such software might have multiple screens, each with multiple sections of text to be cropped, AI4WRD uses the SIFT and FLANN algorithms to match crops to specific frames of a video stream. More details below.

This guide will first detail a standard workflow, then dive into a few of the additional features of the software.

AI4WRD-OCR Features

Basic OCR Work-Flow

image

Use the navigation drop down menu to navigate between the various sections of the app. The app currently consists of three sections:

  1. Load frame
  2. Crop
  3. OCR-Livestream

The main workflow consists of the following:

Choosing video stream -> capturing screenshots of specific screens -> cropping specific sections of each of the screens -> starting optical character recognition

The rest of the section will guide you through this workflow in detail:

1. Load Frame

captureScreen1 Load Frame Screen

Select Language

image

Use the drop down menu to select additional languages to detect. English is the default language; you can optionally detect either Traditional or Simplified Chinese at the same time.

Select and preview Video Stream

image

Use the drop down menu to select the video stream that you want to perform Optical Character Recognition on, then click the run widget to preview the video stream.

Capture screenshots

image

Capture a screenshot of each screen that you would like to perform optical character recognition on. Later, you will define crops for each screen, and the software will automatically detect the matching screen in the video stream and crop it for optical character recognition.

Screenshots are captured using the Capture screenshot button. The captured screenshots are displayed below the button.

2. Crop

cropScreen1 Crop Screen

Select Screenshot to Crop

image

Use the drop down menu to select the screenshot to crop. Crops are saved automatically and associated internally with the selected screenshot.

Cropping

image

Drag the box to specify the section of the screen to crop. The application will later perform Optical Character Recognition on the specified crops. A preview of the selected crop is shown below.

Once you are satisfied with the crop, click the crop button; the crop will be saved and listed below. You can add as many crops as you like.

Additional Configurations

image

You can additionally specify whether to show the crop preview in real time, the color of the cropping box, and the zoom level of the crops.

3. OCR Livestream

liveStream Optical Character Recognition Livestream Screen

Click on the Done Crop check box to initialize the models. Once the models are initialized, optical character recognition will begin. The application extracts features from the video stream using the Scale Invariant Feature Transform (SIFT) algorithm and matches the video stream with the associated crops using the FLANN algorithm.

Note: If this is your first time running the program, initialization might take some time because the software needs to download the required models. Please ensure that your internet connection is stable. If the program does not respond after a significant amount of time, check the terminal output. If it still does not respond, you might need to restart the program.
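As a rough illustration of SIFT and FLANN matching, the sketch below picks the saved screenshot that best matches the current frame. It is not the application's actual code: the function names, the ratio threshold of 0.7, and the minimum of 10 matches are assumptions chosen for the example, and it requires the opencv-python package.

```python
RATIO = 0.7  # assumed threshold for Lowe's ratio test


def good_matches(knn_pairs, ratio=RATIO):
    """Keep a match only when it is clearly better than the runner-up."""
    kept = []
    for pair in knn_pairs:
        if len(pair) < 2:
            continue  # the matcher can return fewer than k neighbours
        best, second = pair[0], pair[1]
        if best.distance < ratio * second.distance:
            kept.append(best)
    return kept


def best_screen(frame_gray, screenshots_gray, min_matches=10):
    """Return the index of the screenshot that best matches the frame, or None."""
    import cv2  # imported lazily; requires opencv-python

    sift = cv2.SIFT_create()
    # algorithm=1 selects FLANN's KD-tree index, which suits SIFT descriptors
    flann = cv2.FlannBasedMatcher(dict(algorithm=1, trees=5), dict(checks=50))

    _, frame_desc = sift.detectAndCompute(frame_gray, None)
    if frame_desc is None or len(frame_desc) < 2:
        return None  # not enough features in the frame to match against

    best_index, best_count = None, 0
    for index, shot in enumerate(screenshots_gray):
        _, shot_desc = sift.detectAndCompute(shot, None)
        if shot_desc is None or len(shot_desc) < 2:
            continue
        pairs = flann.knnMatch(shot_desc, frame_desc, k=2)
        count = len(good_matches(pairs))
        if count > best_count:
            best_index, best_count = index, count
    return best_index if best_count >= min_matches else None
```

Both inputs are expected to be grayscale images (NumPy arrays). Once a screenshot is identified, the crops associated with it can be applied to the frame.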

Select Optical Character Recognition Library

image

Use the drop down menu to select the optical character recognition library. Two libraries are currently available: Tesseract and EasyOCR. We recommend Tesseract for printed characters on screens and EasyOCR for streams from video cameras.

OCR Confidence Level Cut off

image

Specifies the minimum confidence level for optical character recognition. If the confidence level of a detection drops below the specified level, the text will not be displayed.
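The cut-off amounts to a simple filter over the recognized text. The sketch below is illustrative only: the function name and the (text, confidence) tuple layout are assumptions, and it assumes confidences on a 0 to 1 scale. Tesseract reports confidence on a 0 to 100 scale, so values from it would need to be divided by 100 first.

```python
def filter_by_confidence(results, cutoff):
    """Keep only (text, confidence) pairs whose confidence meets the cut-off."""
    return [(text, conf) for text, conf in results if conf >= cutoff]


# Hypothetical readings from one crop
readings = [("72 bpm", 0.93), ("7Z bpn", 0.41), ("SpO2 98", 0.88)]
print(filter_by_confidence(readings, cutoff=0.6))
```

A higher cut-off suppresses more misreads but may also hide correct text from noisy streams, so it is worth tuning per video source.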

Video Preview and Optical Character Recognition Output

Video Preview

image

Preview of the current video stream

Note that the video stream is the same stream selected on the Load Frame page. If you would like to use a different video stream, return to the Load Frame page using the navigation drop down menu and select a different stream.

OCR Output

image

Optical Character Recognition output for each crop will be displayed at the bottom of the screen.

Saving and Loading Configurations

image

Configurations of screenshots and associated crops may be saved and later reused using the save and reload functionality. Simply specify the output directory and filename and click "save screen and crop configuration". To reload the screenshot and crops, simply input the path to a previous save-file and click "load screen and crop configuration".

Configurations do not contain video stream information, only the screenshots and associated crops. The stream must be specified on the Load Frame page, accessible from the navigation drop down menu at the top of the page.

Warning: the save and load functionality makes use of the Python pickle module, and loading a pickle file can execute arbitrary code. Please only load configuration files that you trust. More information about the security of the pickle module is available here.
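To show why untrusted pickle files are risky and how the exposure can be reduced, here is a sketch of a restricted unpickler in the style described in the Python documentation. It is not part of AI4WRD: the class name, function name, and allow-list are assumptions, and a real configuration file containing screenshots would likely need additional types allowed before it could be loaded this way.

```python
import builtins
import io
import pickle

# Assumed allow-list: only a few harmless built-in types may be resolved by name
SAFE_BUILTINS = {"set", "frozenset", "bytearray", "complex"}


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called whenever the pickle stream asks for a global by name
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data):
    """Like pickle.loads, but refuses to resolve non-allow-listed globals."""
    return RestrictedUnpickler(io.BytesIO(data)).load()


# Plain containers load normally
config = restricted_loads(pickle.dumps({"crops": [(10, 20, 110, 60)]}))
print(config)

# A stream that references a callable is rejected
try:
    restricted_loads(pickle.dumps(print))
except pickle.UnpicklingError as error:
    print("rejected:", error)
```

Restricting globals narrows the attack surface but is not a complete defense, so the advice to load only trusted files still applies.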

Output to CSV and MQTT

The software provides functionality to output detected characters to CSV and MQTT.

CSV Output

image

There are two modes for CSV output. "Save previous to csv" saves all text detected since the OCR Livestream started to a CSV file. "Save continuous to csv" first creates a CSV file and then continuously appends detected characters to it. The CSV file contains the timestamps as well as the detected characters.

To use either mode, enter the path and filename that you would like to save the CSV file to and click the corresponding button.
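The difference between the two modes can be sketched with Python's csv module. This is an illustration rather than the application's code: the function names and the two-column layout (timestamp, text) are assumptions.

```python
import csv
from datetime import datetime
from pathlib import Path

HEADER = ["timestamp", "text"]  # assumed column names


def save_previous(rows, path):
    """Write every (timestamp, text) row collected so far in one go."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(HEADER)
        writer.writerows(rows)


def append_reading(path, text, timestamp=None):
    """Append one reading, creating the file with a header if it is missing."""
    is_new = not Path(path).exists()
    stamp = timestamp or datetime.now().isoformat(timespec="seconds")
    with open(path, "a", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        if is_new:
            writer.writerow(HEADER)
        writer.writerow([stamp, text])
```

The one-shot mode overwrites the target file with the full history, while the continuous mode opens the file in append mode for each new reading, so earlier rows survive if the program stops unexpectedly.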

MQTT Output

image

The software can also output detected characters through the MQTT protocol. Specify the broker, port and topic, then click publish to MQTT server. The software will then start publishing the output of the Optical Character Recognition.

Note, if there are multiple crops, each crop will be output to a different topic as follows:

<user-specific-topic><crop-number>

For instance, if there are 2 crops and the topic specified is ai4wrdOutput, the text from crop 1 will be published to ai4wrdOutput1 and the text from crop 2 will be published to ai4wrdOutput2.
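The topic naming can be reproduced on the receiving side with a small helper. The sketch below is illustrative: `crop_topic` and `publish_crops` are hypothetical names, and the publishing function assumes the paho-mqtt package, which may not be what the application itself uses.

```python
def crop_topic(base_topic, crop_number):
    """Build the per-crop topic by appending the 1-based crop number."""
    return f"{base_topic}{crop_number}"


def publish_crops(texts, base_topic, broker, port=1883):
    """Publish one message per crop, each to its own topic."""
    import paho.mqtt.publish as publish  # imported lazily; requires paho-mqtt

    for number, text in enumerate(texts, start=1):
        publish.single(
            crop_topic(base_topic, number),
            payload=text,
            hostname=broker,
            port=port,
        )


print([crop_topic("ai4wrdOutput", n) for n in (1, 2)])
```

Because the crop number is appended directly to the topic without a `/` separator, MQTT wildcards cannot match it, so a subscriber needs to subscribe to each crop's topic explicitly.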
