This package aims to provide a document classifier and text extractor for Indian financial institutions: it takes a stream of documents, extracts their text, and divides the documents into groups based on their contents.
To start using this package, clone it from GitHub:
In the project directory, you can run:
In Development
''' To activate the virtual environment (Windows): venv\Scripts\activate
Required Milestones:
- Create a library for data classification and extraction
- The user submits a single file (image, PDF, or Word document) that contains many documents
- These documents must be identified, classified, and divided into multiple groups
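As a rough illustration of the identify/classify/divide milestone, the sketch below assumes the submitted file has already been OCR'd into one text string per page. The document types and keywords in KEYWORDS are hypothetical placeholders, not part of this package, and a simple keyword lookup stands in for whatever classifier the package ends up using.

```python
from __future__ import annotations

from collections import defaultdict
from itertools import groupby

# Hypothetical document types and keywords; a real classifier would be
# trained or tuned on actual Indian financial documents.
KEYWORDS = {
    "pan_card": ("income tax department", "permanent account number"),
    "bank_statement": ("statement of account", "ifsc"),
    "salary_slip": ("salary slip", "payslip", "net pay"),
}


def classify_page(text: str) -> str:
    """Return the first document type whose keywords appear in the page text."""
    lowered = text.lower()
    for doc_type, words in KEYWORDS.items():
        if any(word in lowered for word in words):
            return doc_type
    return "unknown"


def split_into_documents(pages: list[str]) -> list[tuple[str, list[int]]]:
    """Merge consecutive pages with the same label into one document.

    Returns (label, page_indices) pairs in the order they appear in the file.
    """
    labels = [classify_page(page) for page in pages]
    documents = []
    for label, run in groupby(enumerate(labels), key=lambda item: item[1]):
        documents.append((label, [index for index, _ in run]))
    return documents


def group_by_type(documents: list[tuple[str, list[int]]]) -> dict[str, list[list[int]]]:
    """Collect the split documents into groups keyed by document type."""
    groups: dict[str, list[list[int]]] = defaultdict(list)
    for label, page_indices in documents:
        groups[label].append(page_indices)
    return dict(groups)
```

Because only consecutive pages are merged, a continuation page with no recognisable keywords becomes a separate "unknown" document; attaching such pages to the preceding document is one possible refinement.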
To achieve this:
- Create an Extract class
- Relevant docstrings
- Implement methods in the class: identify() (takes the input and runs the OCR function), save(), and an OCR function (applies conditions for the different document types and splits the input based on the result)
- Once a document is classified and split, create a library that accepts the split documents and extracts data from them.
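One possible shape for the Extract class from the task list, sketched under stated assumptions: OCR and page classification are injected as callables (ocr_fn and classify_fn, both hypothetical names) rather than implemented here, and the extract() method, the JSON output of save(), and the PAN/IFSC patterns are illustrative choices, not this package's actual API. A real ocr_fn could, for example, render each page to an image and pass it to pytesseract.image_to_string.

```python
from __future__ import annotations

import json
import re
from pathlib import Path
from typing import Callable

# Illustrative field patterns: a PAN is five letters, four digits, one letter;
# an IFSC is four letters, a literal 0, then six letters or digits.
PAN_PATTERN = re.compile(r"\b[A-Z]{5}[0-9]{4}[A-Z]\b")
IFSC_PATTERN = re.compile(r"\b[A-Z]{4}0[A-Z0-9]{6}\b")


class Extract:
    """Split a multi-document file and extract fields from each part (sketch)."""

    def __init__(
        self,
        ocr_fn: Callable[[str], list[str]],
        classify_fn: Callable[[str], str],
    ) -> None:
        # Hypothetical hooks: ocr_fn maps a file path to per-page text,
        # classify_fn maps one page of text to a document-type label.
        self.ocr_fn = ocr_fn
        self.classify_fn = classify_fn
        self.documents: list[dict] = []

    def ocr(self, path: str) -> list[str]:
        """Return one OCR text string per page of the submitted file."""
        return self.ocr_fn(path)

    def identify(self, path: str) -> list[dict]:
        """OCR the file, label each page, and merge consecutive same-label pages."""
        self.documents = []
        for page_number, text in enumerate(self.ocr(path)):
            label = self.classify_fn(text)
            if self.documents and self.documents[-1]["type"] == label:
                # Same label as the previous page: treat it as a continuation.
                self.documents[-1]["pages"].append(page_number)
                self.documents[-1]["text"] += "\n" + text
            else:
                self.documents.append(
                    {"type": label, "pages": [page_number], "text": text}
                )
        return self.documents

    def extract(self, document: dict) -> dict:
        """Pull example fields (PAN and IFSC codes) out of one split document."""
        text = document["text"].upper()
        return {
            "pan": PAN_PATTERN.findall(text),
            "ifsc": IFSC_PATTERN.findall(text),
        }

    def save(self, out_dir: str) -> list[Path]:
        """Write each split document plus its extracted fields as a JSON file."""
        out = Path(out_dir)
        out.mkdir(parents=True, exist_ok=True)
        written = []
        for index, document in enumerate(self.documents):
            record = {**document, "fields": self.extract(document)}
            path = out / f"{index:02d}_{document['type']}.json"
            path.write_text(json.dumps(record, indent=2), encoding="utf-8")
            written.append(path)
        return written
```

Injecting the OCR and classification steps keeps the class testable without an OCR engine installed, and lets the "conditions for different documents" logic evolve separately from splitting and saving.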
Non-Technical Issues:
- Project README
- Docstrings for functions
- Identify additional functions that could improve efficiency and workflow '''