
📈 Combat Otter: Text recognition model training 👀


archived-dreams/combat-otter.tesseract


📈 Combat Otter: Tesseract 5 training 👀

Hi! This is an auxiliary repository for the kriakiku/combat-otter project. Its purpose is to simplify training the tesseract-ocr model on new fonts by providing ready-made commands.

😜 Of course, you can also use this solution for your own purposes, unrelated to our application.

How to install Tesseract 5 on Ubuntu

The author of the repository used WSL 2 (installation guide) with Ubuntu 22.04.3 LTS (download) and Tesseract 5 installed.

You can install the latest version of the library using the following commands:

# sudo add-apt-repository -y ppa:alex-p/tesseract-ocr-devel
# sudo apt -y update
# sudo apt install -y tesseract-ocr
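
After installation, it can be worth confirming that the binary on your PATH really is version 5. A minimal sketch, assuming the first line of `tesseract --version` looks like `tesseract 5.x.y`; the helper name `tesseract_major` is hypothetical:

```shell
# Hypothetical helper: extract the major version from the first line of
# `tesseract --version` output (assumed to look like "tesseract 5.3.0").
tesseract_major() {
  printf '%s\n' "$1" | sed -n '1s/^tesseract v\{0,1\}\([0-9][0-9]*\).*/\1/p'
}

# Only query the binary if it is actually on PATH.
if command -v tesseract >/dev/null 2>&1; then
  version_text=$(tesseract --version 2>&1)
  major=$(tesseract_major "$version_text")
  if [ "$major" = "5" ]; then
    echo "Tesseract 5 detected"
  else
    echo "Unexpected Tesseract version: $version_text" >&2
  fi
else
  echo "tesseract not found on PATH" >&2
fi
```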

How to recognize the font being used?

Take a screenshot of an area of the screen that contains only the font you are interested in, then upload it to one of these services: whatfontis, fontsquirrel.

How do I find out the real name of the downloaded font?

Use the fc-scan or fc-list CLI utility.

fc-scan usage example:

Use the fullname value from the output as the font name.

# fc-scan "./fonts/Stratum2 Bold.ttf"
Pattern has 23 elts (size 32)
        family: "Stratum2"(s) "Stratum2 Bd"(s)
        familylang: "en"(s) "en"(s)
        style: "Bold"(s)
        stylelang: "en"(s)
        fullname: "Stratum2 Bd Bold"(s)
        fullnamelang: "en"(s)
        slant: 0(i)(s)
        weight: 200(f)(s)
        width: 100(f)(s)
        foundry: "ptf"(s)
        file: "./fonts/Stratum2 Bold.ttf"(s)
        index: 0(i)(s)
        [...]
        postscriptname: "Stratum2-Bold"(s)
        color: False(s)
        symbol: False(s)
        variable: False(s)
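
To avoid copying the name by hand, the fullname field can be pulled out of the verbose output. A minimal sketch, assuming the output format shown above; the helper name `extract_fullname` is hypothetical, and if a font reports several fullname values only the first is printed:

```shell
# Hypothetical helper: read verbose fc-scan output on stdin and print the
# value of the "fullname" field (the "fullnamelang" line is not matched).
extract_fullname() {
  sed -n 's/^[[:space:]]*fullname: "\([^"]*\)".*/\1/p'
}

# Example with the file path from the output above; skipped when fc-scan
# or the font file is not available.
font_file="./fonts/Stratum2 Bold.ttf"
if command -v fc-scan >/dev/null 2>&1 && [ -f "$font_file" ]; then
  fc-scan "$font_file" | extract_fullname
fi
```

fc-scan also accepts a `--format` option (for example `--format '%{fullname}\n'`), which may make the parsing step unnecessary.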

fc-list usage example:

fc-list only shows fonts that are installed on the system.

# fc-list
[...]
~/.fonts/tessfont/Stratum2 Bold.ttf: Stratum2,Stratum2 Bd:style=Bold
[...]
~/.fonts/tessfont/Stratum2 Medium.ttf: Stratum2,Stratum2 Md:style=Medium,Regular
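
The paths in the listing above suggest the fonts were copied to `~/.fonts/tessfont`. One way to make a downloaded font visible to fc-list is sketched below under that assumption. The helper name `install_font` is hypothetical, and whether the repository's scripts already do this for you is not stated here:

```shell
# Hypothetical helper: copy a font file into a per-user font directory and
# refresh the fontconfig cache so that fc-list can see it.
install_font() {
  src=$1
  dest_dir=${2:-"$HOME/.fonts/tessfont"}
  mkdir -p "$dest_dir" || return 1
  cp -- "$src" "$dest_dir/" || return 1
  # fc-cache ships with fontconfig; skip the refresh if it is missing.
  if command -v fc-cache >/dev/null 2>&1; then
    fc-cache -f "$dest_dir"
  fi
}

# Usage (assumes the font file exists):
#   install_font "./fonts/Stratum2 Bold.ttf"
#   fc-list | grep -i "Stratum2"
```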

What fonts were found and trained?

These are the fonts used in the pre-game lobby in CoD:

  • Stratum2 Bold: EN, DE, ES (Latin), ES, FR, IT, PT (Brazilian);
  • Bio Sans Bold, Bio Sans Regular: PL, RU;
  • ????: AR, ZH (Traditional), ZH (Simplified);
  • ????: TH, KO, JA (each language has a unique font).

How to train a model?

Download the font you need and place it in the fonts folder. Next, update the environment settings in the config.ini file:

  • Enter the font name in the FONT_NAME environment variable.
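
The layout of config.ini is not shown here. As a rough sketch of this step, assuming the file consists of plain `KEY="value"` lines (an assumption, as is the helper name `set_font_name`), the variable could be updated from the command line like this:

```shell
# Hypothetical helper: set FONT_NAME in a config file that is ASSUMED to
# consist of plain KEY="value" lines (the real config.ini may differ).
set_font_name() {
  config_file=$1
  font_name=$2
  tmp_file=$(mktemp) || return 1
  # Replace an existing FONT_NAME line in place, or append one if absent.
  awk -v name="$font_name" '
    /^FONT_NAME=/ { print "FONT_NAME=\"" name "\""; found = 1; next }
    { print }
    END { if (!found) print "FONT_NAME=\"" name "\"" }
  ' "$config_file" > "$tmp_file" || { rm -f "$tmp_file"; return 1; }
  # Write back through redirection so the original file permissions stay.
  cat "$tmp_file" > "$config_file"
  rm -f "$tmp_file"
}

# Usage, with the fullname value reported by fc-scan:
#   set_font_name ./config.ini "Stratum2 Bd Bold"
```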

Step I. Generate training data

You need to generate images to retrain the model. This process takes several hours (about 195,000 files are generated), but you can stop it at any time and resume by running the command again:

# ./0.generate-training-data.sh
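
Because generation runs for hours and can be resumed, it may help to check how far along it is. A minimal sketch that counts the files in an output folder and compares the count with the approximate total quoted above. The `train` folder as the output location and the helper name `generation_progress` are assumptions:

```shell
# Hypothetical helper: report how many files exist under a directory and
# what share of an expected total that represents.
generation_progress() {
  dir=$1
  expected=${2:-195000}   # approximate total mentioned above
  count=$(find "$dir" -type f 2>/dev/null | wc -l | tr -d '[:space:]')
  percent=$((count * 100 / expected))
  echo "$count of ~$expected files (${percent}%)"
}

# Usage (assumes generated files are written under ./train):
#   generation_progress ./train
```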

You may want to save files to cloud storage or share them. To do this, use the command that archives the files:

# ./0.post-archive-generated-training-data.sh
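
What the archiving script does internally is not shown here. As a rough sketch of the same idea with plain tar, assuming the generated files live in a `train` folder (the helper names and the archive name are hypothetical):

```shell
# Hypothetical helpers: pack a folder into a gzip-compressed tarball and
# unpack it again. Names and layout are assumptions, not the script's own.
archive_training_data() {
  src_dir=$1
  archive=$2
  # -C makes paths inside the archive relative to the parent of src_dir.
  tar -czf "$archive" -C "$(dirname "$src_dir")" "$(basename "$src_dir")"
}

restore_training_data() {
  archive=$1
  dest_parent=$2
  mkdir -p "$dest_parent" && tar -xzf "$archive" -C "$dest_parent"
}

# Usage:
#   archive_training_data ./train ./train.tar.gz
#   restore_training_data ./train.tar.gz .
```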

If you need the generated files for already trained fonts, you can download and uncompress them to the train folder:

Step II.


Links
