Project Description

Xtracter

This package aims to provide a document classifier and text extractor for Indian financial institutions: it takes a stream of documents, divides them into groups based on their contents, and extracts text from them.

🚀 Getting Started

To start using this package, clone the repository from GitHub:

git clone https://github.com/PrachetShah/Document-Classifier-and-Text-Extracter.git

Then, in the project directory, install the dependencies:

pip install -r requirements.txt

👩‍💻 Usage Guide

In Development

Requirements

''' To activate the virtual environment (Windows): venv\Scripts\activate

Required Milestones:

  1. Create a Library for Data Classification and Extraction
  • The user submits a single file (image, PDF, or Word document) that contains many documents; these documents must be identified, classified, and divided into multiple groups. To achieve this:
    1. Class Extract
    2. Relevant docstrings
    3. Implement functions in the class: identify (take the input and run the OCR function), save, and an OCR function (apply conditions for the different document types and split based on the result)
  2. Once a document is classified and split, create a library that accepts the split document and extracts the data from it.

Non-Tech Issues

  1. Project README
  2. Docstring for Functions
  3. Identify which functions can be added to improve efficiency and workflow '''
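
As a rough illustration of the two milestones above, the following is a minimal sketch of what a class named Extract might look like. It is not the package's implementation, which is still in development. Everything except the class name and the `identify` method name is hypothetical: the document types (PAN card, Aadhaar card), the keyword lists, the field patterns, and the `split` and `extract_fields` methods are assumptions chosen for illustration. The sketch also assumes OCR has already been run, so each page arrives as plain text; the OCR and save steps are left out.

```python
import re
from collections import defaultdict

# Hypothetical keyword lists per document type (illustrative assumption only).
KEYWORDS = {
    "pan_card": ("income tax department", "permanent account number"),
    "aadhaar_card": ("aadhaar", "unique identification authority"),
}

# Hypothetical field patterns per document type (illustrative assumption only).
FIELD_PATTERNS = {
    # PAN: five letters, four digits, one letter.
    "pan_card": {"pan_number": re.compile(r"\b[A-Z]{5}[0-9]{4}[A-Z]\b")},
    # Aadhaar: twelve digits, optionally written in groups of four.
    "aadhaar_card": {"aadhaar_number": re.compile(r"\b\d{4}\s?\d{4}\s?\d{4}\b")},
}


class Extract:
    """Classify OCR'd pages, group them by document type, and pull out fields."""

    def identify(self, page_text):
        """Return the label whose keywords match most often, or 'unknown'."""
        lowered = page_text.lower()
        scores = {
            label: sum(keyword in lowered for keyword in keywords)
            for label, keywords in KEYWORDS.items()
        }
        best = max(scores, key=scores.get)
        return best if scores[best] > 0 else "unknown"

    def split(self, pages):
        """Group page texts by their identified label."""
        groups = defaultdict(list)
        for text in pages:
            groups[self.identify(text)].append(text)
        return dict(groups)

    def extract_fields(self, label, page_text):
        """Return the first match of each field pattern defined for the label."""
        fields = {}
        for name, pattern in FIELD_PATTERNS.get(label, {}).items():
            match = pattern.search(page_text)
            if match:
                fields[name] = match.group(0)
        return fields


# Made-up page texts standing in for OCR output of one multi-document file.
pages = [
    "INCOME TAX DEPARTMENT Permanent Account Number ABCDE1234F",
    "Unique Identification Authority of India Aadhaar 1234 5678 9012",
    "Monthly newsletter",
]
extractor = Extract()
groups = extractor.split(pages)
for label, texts in groups.items():
    for text in texts:
        print(label, extractor.extract_fields(label, text))
```

Keyword counting is only one possible way to classify pages; it is used here because it keeps the sketch dependency-free. A trained classifier or layout-based rules could replace `identify` without changing the rest of the class.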