


njmarko/googolplex-pdf-search


googolplex-pdf-search

Python program for searching the text of PDF files, ranking the results, and exporting the search results to a PDF with the matches highlighted. It uses a trie, a stack, a heap, and a page graph. Queries are converted to postfix notation, and both logical expressions and phrases are supported. It also offers "did you mean" suggestions.
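A trie is a natural fit for the word index. The sketch below is a minimal illustration, not the project's code: it assumes each indexed word maps to the page numbers and word positions where it occurs, and the names Trie, insert, find and complete are hypothetical. The complete method shows how the same trie can also answer prefix lookups such as autocompletion.

```python
from collections import defaultdict


class TrieNode:
    """One node per character; `occurrences` is non-empty only where a word ends."""

    def __init__(self):
        self.children = {}
        # Hypothetical layout: page number -> positions of the word on that page.
        self.occurrences = defaultdict(list)


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word, page, position):
        """Record that `word` occurs at `position` on `page`."""
        node = self.root
        for ch in word.lower():
            node = node.children.setdefault(ch, TrieNode())
        node.occurrences[page].append(position)

    def find(self, word):
        """Return {page: [positions]} for an exact word, or {} if it is absent."""
        node = self._walk(word.lower())
        return dict(node.occurrences) if node else {}

    def complete(self, prefix, limit=5):
        """Return up to `limit` indexed words starting with `prefix`, alphabetically."""
        prefix = prefix.lower()
        start = self._walk(prefix)
        if start is None:
            return []
        results, stack = [], [(start, prefix)]
        while stack and len(results) < limit:
            node, text = stack.pop()
            if node.occurrences:
                results.append(text)
            # Push children in reverse order so they are popped alphabetically.
            for ch in sorted(node.children, reverse=True):
                stack.append((node.children[ch], text + ch))
        return results

    def _walk(self, text):
        """Follow `text` character by character; None if the path does not exist."""
        node = self.root
        for ch in text:
            node = node.children.get(ch)
            if node is None:
                return None
        return node


index = Trie()
for pos, word in enumerate("a graph is a set of vertices and edges".split()):
    index.insert(word, page=1, position=pos)
index.insert("graphs", page=2, position=0)
print(index.find("graph"))   # {1: [1]}
print(index.complete("gr"))  # ['graph', 'graphs']
```

Storing positions, not just page numbers, is what makes phrase matching possible later; a page-only index would be smaller but could not tell whether two words are adjacent.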

Required libraries

  • PyMuPDF
  • didyoumean.py

How to install and run the program

  1. Create a virtual environment in the project directory: virtualenv venv

  2. Activate the virtual environment:

    2.1. For Windows: venv\Scripts\activate

    2.2. For Linux: source venv/bin/activate

  3. Install the required libraries: pip install -r requirements.txt

  4. Run the program: python main.py

  5. All in one command:

    5.1. For Linux: virtualenv venv && source venv/bin/activate && pip install -r requirements.txt && python main.py

    5.2. For Windows (PowerShell): virtualenv venv; venv\Scripts\Activate; pip install -r requirements.txt; python main.py

Application screenshots

Illustration 1 - Loading bar.

Illustration 2 - Autocomplete feature.

Illustration 3 - "Did you mean" functionality.
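The project lists didyoumean.py for this feature. The sketch below does not use that library; it swaps in a plain Levenshtein edit distance to illustrate the idea, and the function names and the max_distance cutoff are hypothetical.

```python
def levenshtein(a, b):
    """Edit distance: minimum insertions, deletions and substitutions turning a into b."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        previous = current
    return previous[-1]


def did_you_mean(query, vocabulary, max_distance=2):
    """Return the closest (lowercase) vocabulary word within max_distance, or None."""
    best, best_distance = None, max_distance + 1
    for word in sorted(vocabulary):  # sorted so ties resolve alphabetically
        distance = levenshtein(query.lower(), word)
        if distance < best_distance:
            best, best_distance = word, distance
    return best


vocabulary = {"graph", "heap", "trie", "stack", "skip", "list"}
print(did_you_mean("grpah", vocabulary))  # graph
```

Scanning the whole vocabulary is O(n) distance computations per query, which is acceptable for a suggestion that is only shown when a search returns nothing.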

Illustration 4 - Third page of results for the search query "graph".
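How the project actually scores pages is not described here. As a hypothetical sketch, assume the page graph records which pages reference which other pages, and that a page's score is its own number of hits plus a weighted share of the hits on pages that link to it. A heap (Python's heapq.nlargest) then selects only the entries needed up to the requested result page instead of sorting every match. The names rank_pages, results_page and link_weight are illustrative.

```python
import heapq


def rank_pages(hits, links, link_weight=0.5):
    """Hypothetical score: own hits + link_weight * hits on pages linking here.

    hits:  {page: number of query matches on that page}
    links: {page: set of pages it references}  (assumed page-graph layout)
    Only pages with at least one hit are scored.
    """
    scores = {}
    for page, count in hits.items():
        incoming = sum(hits.get(source, 0)
                       for source, targets in links.items() if page in targets)
        scores[page] = count + link_weight * incoming
    return scores


def results_page(scores, page_number, per_page=10):
    """Return the pages listed on result page `page_number` (1-based), best first."""
    # The heap only needs the top page_number * per_page entries, not a full sort.
    # Ties are broken by lower page number first.
    top = heapq.nlargest(page_number * per_page, scores.items(),
                         key=lambda item: (item[1], -item[0]))
    start = (page_number - 1) * per_page
    return [page for page, _ in top[start:start + per_page]]


hits = {1: 3, 2: 1, 5: 4, 7: 2}
links = {1: {2}, 5: {2, 7}}
scores = rank_pages(hits, links)
print(results_page(scores, 1, per_page=2))  # [2, 5]
print(results_page(scores, 2, per_page=2))  # [7, 1]
```

Page 2 ranks first despite having a single hit because two strongly matching pages reference it, which is the kind of effect a page graph adds over plain term counts.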

Illustration 5 - Complex logical query with OR, AND, and grouping with brackets.
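Logical queries like this are converted to postfix notation before they are evaluated. The following is a minimal shunting-yard sketch, assuming uppercase AND, OR and NOT keywords, the precedence NOT > AND > OR, and double-quoted phrases kept as single tokens; the tokenizer regex and function names are illustrative, not taken from the project.

```python
import re

# Assumed precedence: NOT binds tightest, then AND, then OR.
PRECEDENCE = {"NOT": 3, "AND": 2, "OR": 1}


def tokenize(query):
    """Split a query into brackets, double-quoted phrases and bare words."""
    return re.findall(r'\(|\)|"[^"]*"|[^\s()"]+', query)


def to_postfix(query):
    """Convert an infix query to a postfix token list (shunting-yard)."""
    output, stack = [], []
    for token in tokenize(query):
        if token == "NOT":
            # Unary prefix operator: its operand has not been read yet.
            stack.append(token)
        elif token in ("AND", "OR"):
            # Pop operators that bind at least as tightly (left-associative).
            while stack and stack[-1] != "(" and PRECEDENCE[stack[-1]] >= PRECEDENCE[token]:
                output.append(stack.pop())
            stack.append(token)
        elif token == "(":
            stack.append(token)
        elif token == ")":
            while stack and stack[-1] != "(":
                output.append(stack.pop())
            if not stack:
                raise ValueError("unbalanced ')'")
            stack.pop()  # discard the matching '('
        else:
            output.append(token)  # a word or a quoted phrase
    while stack:
        if stack[-1] == "(":
            raise ValueError("unbalanced '('")
        output.append(stack.pop())
    return output


print(to_postfix('(graph OR tree) AND NOT "skip list"'))
# ['graph', 'tree', 'OR', '"skip list"', 'NOT', 'AND']
```

Matching only uppercase keywords is a deliberate choice in this sketch: it keeps ordinary words such as "and" searchable.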

Illustration 6 - Complex logical query with negation (NOT) and grouping with brackets.
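Once a query is in postfix form, a stack evaluates it in a single pass. In this hypothetical sketch every term maps to the set of pages containing it, AND and OR become set intersection and union, and NOT is the complement against the set of all pages, which is why the evaluator takes an all_pages argument. The lookup callback stands in for the real index and is an assumption.

```python
# Number of operands each operator takes from the stack.
ARITY = {"NOT": 1, "AND": 2, "OR": 2}


def evaluate_postfix(tokens, lookup, all_pages):
    """Evaluate postfix tokens to the set of matching page numbers.

    lookup(term) -> set of pages containing `term` (hypothetical index helper)
    all_pages    -> every page number; NOT is the complement against this set
    """
    stack = []
    for token in tokens:
        if token in ARITY:
            if len(stack) < ARITY[token]:
                raise ValueError(f"{token} is missing an operand")
            if token == "NOT":
                stack.append(all_pages - stack.pop())
            else:
                right, left = stack.pop(), stack.pop()
                stack.append(left & right if token == "AND" else left | right)
        else:
            stack.append(set(lookup(token)))
    if len(stack) != 1:
        raise ValueError("malformed query")
    return stack[0]


index = {"graph": {1, 2, 5}, "tree": {2, 3}, "heap": {2, 7}}
pages = set(range(1, 9))
postfix = ["graph", "tree", "OR", "heap", "NOT", "AND"]  # (graph OR tree) AND NOT heap
print(sorted(evaluate_postfix(postfix, lambda t: index.get(t, set()), pages)))  # [1, 3, 5]
```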

Illustration 7 - Phrase search for "skip list" using double quotes.
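A phrase matches where its words appear at consecutive positions on the same page. This sketch assumes the index records, for every word, its positions on each page; the phrase_pages name and the data layout are hypothetical.

```python
def phrase_pages(phrase, positions):
    """Return {page: [start positions]} where the phrase's words occur consecutively.

    positions: {word: {page: [positions of the word on that page]}}  (assumed layout)
    """
    words = phrase.strip('"').lower().split()
    if not words or any(w not in positions for w in words):
        return {}
    # Only pages that contain every word of the phrase can match.
    candidate_pages = set.intersection(*(set(positions[w]) for w in words))
    matches = {}
    for page in sorted(candidate_pages):
        following = [set(positions[w][page]) for w in words[1:]]
        # A start position p matches if word i of the phrase sits at p + i.
        starts = [p for p in positions[words[0]][page]
                  if all(p + i in s for i, s in enumerate(following, start=1))]
        if starts:
            matches[page] = starts
    return matches


positions = {"skip": {3: [10, 40], 4: [2]}, "list": {3: [11, 90], 5: [0]}}
print(phrase_pages('"skip list"', positions))  # {3: [10]}
```

Intersecting the page sets first means the position check only runs on pages that could possibly contain the whole phrase.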

Illustration 8 - Generated PDF with the search query "skip list" highlighted.
