wiki-search

As the volume of digital content grows, finding relevant information efficiently becomes harder. Whether for academic research, professional work, or personal curiosity, the ability to locate relevant material quickly in a large collection is crucial. This problem is the primary motivation for this information retrieval project.

This course project addresses one specific aspect of that challenge: optimizing ad-hoc information retrieval. The system is built on the DPR Wiki100 dataset and the PyTerrier framework. Queries are processed with the NLTK library, which provides core functionality such as tokenization and stemming. A minimal user interface built with the tkinter library makes the system easy to use in practice.
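The query processing step can be illustrated with a minimal sketch. It assumes NLTK's `wordpunct_tokenize` and `PorterStemmer`; the project's actual tokenizer, stemmer, and the function name `preprocess_query` are not specified in this README and are chosen here only for illustration.

```python
from nltk.stem import PorterStemmer
from nltk.tokenize import wordpunct_tokenize

# Assumption: a Porter stemmer; the project may use a different NLTK stemmer.
_stemmer = PorterStemmer()


def preprocess_query(query):
    """Lowercase, tokenize, drop punctuation tokens, and stem a raw query."""
    tokens = wordpunct_tokenize(query.lower())
    # Keep only alphanumeric tokens so punctuation does not reach the index.
    words = [tok for tok in tokens if tok.isalnum()]
    return [_stemmer.stem(word) for word in words]


if __name__ == "__main__":
    print(preprocess_query("Running searches"))
```

`wordpunct_tokenize` is used here because it is regex-based and needs no extra NLTK data download. The stemmed terms would then be joined and passed to the retrieval backend.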

At its core, the project targets a small but important problem: improving the performance of the baseline BM25 ranking scheme. Existing retrieval methods provide a baseline, but they leave considerable room for improvement. The aim is to improve both the effectiveness and the efficiency of a simple information retrieval system through a series of enhancements and optimizations.
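For readers unfamiliar with BM25, the following is a small, self-contained sketch of a common textbook variant of the scoring function. It is not the project's code and not necessarily the exact formula PyTerrier uses internally; the function name `bm25_scores` and the defaults `k1=1.2` and `b=0.75` are assumptions made for illustration.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each tokenized document in `docs` against `query_terms`.

    `docs` is a list of token lists. k1 controls term-frequency saturation
    and b controls document-length normalization (illustrative defaults).
    """
    if not docs:
        return []
    n_docs = len(docs)
    avg_len = sum(len(doc) for doc in docs) / n_docs

    # Document frequency: number of documents containing each term.
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))

    scores = []
    for doc in docs:
        term_freq = Counter(doc)
        score = 0.0
        for term in query_terms:
            tf = term_freq[term]
            if tf == 0:
                continue
            # Smoothed IDF that stays positive for very common terms.
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            length_norm = 1 - b + b * len(doc) / avg_len
            score += idf * tf * (k1 + 1) / (tf + k1 * length_norm)
        scores.append(score)
    return scores


if __name__ == "__main__":
    corpus = [
        ["wiki", "search", "engine"],
        ["cooking", "recipes"],
        ["search", "search", "wiki"],
    ]
    print(bm25_scores(["search"], corpus))
```

Documents that do not contain any query term score zero, and repeated occurrences of a term raise the score with diminishing returns, which is the behavior that tuning `k1` and `b` adjusts.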

What makes this project particularly interesting is its practical importance. In an age dominated by data, the ability to access relevant information quickly and accurately is valuable in many fields. The system also handles a large collection (almost 5 GB) while still returning results to the user quickly and accurately, which adds to its practical value.

The rest of this report first describes the methodology used to build the information retrieval system, detailing its components and techniques. It then analyzes the performance gains achieved through the optimizations, supported by empirical results and comparative evaluations. Finally, it discusses the implications of the findings and outlines directions for future research and possible extensions of this work.

mb-emektar/wiki-search