Table Transformer Simple Inference

⚠️ 25/05/2022 Read before using: this repo will no longer be updated. Do not use this code if you need all the features that the official repo has to offer. It can serve as an example for inference, but you should not rely on our post-processing, since the official repo does a much better job at this.

This repository contains code to run simple inference and export the cells of a table (incl. text in cell), as a pandas DataFrame. Note that not all the features that the official repository offers are included. The resulting DataFrame is constructed based on the column and row predictions, and will probably not work on complex tables. The repo is built on top of this fork of the official repo.
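The idea of building the table from row and column predictions can be sketched as follows. This is a minimal illustration, not the repository's actual post-processing: the box format `(x_min, y_min, x_max, y_max)` and the names `cell_boxes` and `build_grid` are assumptions made for this example. Each cell is taken to be the intersection of one row box and one column box, which is also why spanning or merged cells in complex tables are not handled.

```python
from typing import Callable, List, Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in image pixels.
Box = Tuple[float, float, float, float]


def cell_boxes(rows: List[Box], cols: List[Box]) -> List[List[Box]]:
    """Intersect every row box with every column box to get a grid of cells."""
    rows = sorted(rows, key=lambda b: b[1])  # order rows top to bottom
    cols = sorted(cols, key=lambda b: b[0])  # order columns left to right
    grid = []
    for _, r_top, _, r_bottom in rows:
        # A cell takes its horizontal extent from the column
        # and its vertical extent from the row.
        grid.append([(c_left, r_top, c_right, r_bottom)
                     for c_left, _, c_right, _ in cols])
    return grid


def build_grid(rows: List[Box], cols: List[Box],
               read_cell: Callable[[Box], str]) -> List[List[str]]:
    """Apply a text reader (for example OCR on the cropped cell) to each cell."""
    return [[read_cell(box) for box in row] for row in cell_boxes(rows, cols)]


# Hypothetical predictions for a 2x2 table, deliberately given out of order.
rows = [(0, 30, 100, 60), (0, 0, 100, 30)]
cols = [(50, 0, 100, 60), (0, 0, 50, 60)]
grid = cell_boxes(rows, cols)
```

The nested list returned by `build_grid` can be passed directly to `pandas.DataFrame(...)` to obtain a DataFrame with one entry per cell.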

We worked with tables that were already cropped out, but you could also first apply table detection using the pre-trained weights from the official repo.

Setup Guide

You can refer to the environment.yml file to set up your environment with conda. We used a virtual environment instead, so you can either copy the requirements from the environment.yml file into a requirements.txt file or just install them manually.

Create Virtual Environment

python -m virtualenv venv
source venv/bin/activate

PyTorch

To install PyTorch you might be able to use the command below if your CUDA version (11.3) and Python version (3.8.10) match ours. If not, you can use the get started guide to compose your install command.

pip install torch==1.10.2+cu113 torchvision==0.11.3+cu113 torchaudio==0.10.2+cu113 -f https://download.pytorch.org/whl/cu113/torch_stable.html
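The `+cu113` suffix in the package versions is the build tag for CUDA 11.3. As a small sketch of how to compare a version string against what you expect, the hypothetical helpers below (`split_build_tag` and `python_matches` are names invented for this example, not part of PyTorch) split off the build tag and compare the Python major and minor version.

```python
import sys
from typing import Optional, Sequence, Tuple


def split_build_tag(version: str) -> Tuple[str, Optional[str]]:
    """Split a version such as '1.10.2+cu113' into ('1.10.2', 'cu113')."""
    base, _, tag = version.partition("+")
    return base, tag or None


def python_matches(required: Sequence[int] = (3, 8),
                   current: Optional[Sequence[int]] = None) -> bool:
    """Check whether the major.minor Python version equals the required one."""
    if current is None:
        current = sys.version_info[:2]
    return tuple(current[:2]) == tuple(required)
```

After installation, `torch.__version__` and `torch.version.cuda` report the installed build, so `split_build_tag(torch.__version__)` can be used to confirm that the expected CUDA build was picked up.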

Detectron2

Use the command below. If it doesn't work, you can follow the official installation guide.

pip install "detectron2@git+https://github.com/facebookresearch/detectron2.git@v0.5#egg=detectron2"

PyTesseract

The Tesseract OCR Python wrapper is used for text recognition on each cell in the tables. For more information about the installation you can refer to the pytesseract GitHub repo.

pip install pytesseract

Next, you need to download additional language packs, although it might be that English is supported out of the box. You can download the language packs from either the tessdata or tessdata_fast repository. Keep in mind that you have to make a speed/accuracy compromise when using the fast packs.

You can either clone the whole repository or download a single pack. During development the following language packs were used (language: abbreviation, pack name): English: 'eng', eng.traineddata; French: 'fra', fra.traineddata; German: 'deu', deu.traineddata.

Put the language packs in a directory called tessdata and set the TESSDATA_PREFIX environment variable like we do below.

export TESSDATA_PREFIX=/home/user/tessdata
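Before running OCR it can help to verify that the expected language packs are actually present in that directory. The sketch below is one way to do this with the standard library only; the function names `missing_language_packs` and `tesseract_lang_argument` are hypothetical and not part of pytesseract.

```python
import os
from pathlib import Path
from typing import Iterable, List


def missing_language_packs(tessdata_dir: str,
                           langs: Iterable[str] = ("eng", "fra", "deu")) -> List[str]:
    """Return the language codes whose <code>.traineddata file is absent."""
    directory = Path(tessdata_dir)
    return [lang for lang in langs
            if not (directory / f"{lang}.traineddata").is_file()]


def tesseract_lang_argument(langs: Iterable[str]) -> str:
    """Join language codes with '+', the form Tesseract expects for several languages."""
    return "+".join(langs)


# Read the directory from the environment variable set above (empty if unset).
tessdata_dir = os.environ.get("TESSDATA_PREFIX", "")
```

For example, `missing_language_packs(tessdata_dir)` returns an empty list when all three packs are installed, and the string from `tesseract_lang_argument(["eng", "fra"])` can be passed as the `lang` argument of `pytesseract.image_to_string`.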

Pre-trained Model Weights

Assuming that you have already cropped out the table after table detection, you can use the pre-trained model below for table structure recognition. Put the model in the root of this repository after the download has finished, or change the path to the model in the code.

Table Structure Recognition:

Model	Schedule	AP50	AP75	AP	AR	GriTS_Top	GriTS_Con	GriTS_Loc	Acc_Con	File	Size
DETR R18	20 Epochs	0.970	0.941	0.902	0.935	0.9849	0.9850	0.9786	0.8243	Weights	110 MB

Running Things Locally

You can run the main.py script which will use the 'example_table.jpg' as input to the model and output 'visualization.jpg' containing the visualization of the predictions on the original image.

python main.py

Official Repository

The official repository can be found here. If you want to extract more complex tables you will have to add the appropriate post-processing yourself, but you can use the official source code as a reference. :)
