Awesome RAG Evaluation

The official repository for the paper: Evaluation of Retrieval-Augmented Generation: A Survey Arxiv. This paper has been accepted by the 2024 CCF Big Data.

Abstract

Retrieval-Augmented Generation (RAG) has recently gained traction in natural language processing. Numerous studies and real-world applications are leveraging its ability to enhance generative models through external information retrieval. Evaluating these RAG systems, however, poses unique challenges due to their hybrid structure and reliance on dynamic knowledge sources. To better understand these challenges, we conduct A Unified Evaluation Process of RAG (Auepora) and aim to provide a comprehensive overview of the evaluation and benchmarks of RAG systems. Specifically, we examine and compare several quantifiable metrics of the Retrieval and Generation components, such as relevance, accuracy, and faithfulness, within the current RAG benchmarks, encompassing the possible output and ground truth pairs. We then analyze the various datasets and metrics, discuss the limitations of current benchmarks, and suggest potential directions to advance the field of RAG benchmarks.

Analysis Framework for Evaluating RAG Systems

The Target modular of Auepora. The retrieval and generation components are highlighted in red and green, respectively.

Reference Framework

Category	Framework	Webpage	Paper
Tool	TruEra RAG Triad	https://www.trulens.org/trulens_eval/getting_started/core_concepts/rag_triad/	-
Tool	LangChain Bench.	https://langchain-ai.github.io/langchain-benchmarks/notebooks/retrieval/langchain_docs_qa.html	-
Tool	Databricks Eval	https://www.databricks.com/blog/LLM-auto-eval-best-practices-RAG	-
Benchmark	RAGAs	https://github.com/explodinggradients/ragas	https://aclanthology.org/2024.eacl-demo.16/
Benchmark	RECALL	-	https://arxiv.org/abs/2311.08147
Benchmark	ARES	https://github.com/stanford-futuredata/ARES	https://arxiv.org/abs/2311.09476
Benchmark	RGB	https://github.com/chen700564/RGB	https://ojs.aaai.org/index.php/AAAI/article/view/29728
Benchmark	MultiHop-RAG	https://github.com/yixuantt/MultiHop-RAG/	https://arxiv.org/abs/2401.15391
Benchmark	CRUD-RAG	https://github.com/IAAR-Shanghai/CRUD_RAG	https://arxiv.org/abs/2401.17043v2
Benchmark	MedRag	https://github.com/Teddy-XiongGZ/MedRAG	http://arxiv.org/abs/2402.13178v2
Benchmark	FeB4RAG	https://github.com/ielab/FeB4RAG	http://arxiv.org/abs/2402.11891v1
Benchmark	CDQA	https://github.com/Alibaba-NLP/CDQA	https://arxiv.org/abs/2402.19248v2
Benchmark	DomainRAG	https://github.com/ShootingWong/DomainRAG	https://arxiv.org/abs/2406.05654v2
Benchmark	ReEval	https://autodebug-llm.github.io	https://aclanthology.org/2024.findings-naacl.85/
Research	FiD-Light	-	https://doi.org/10.1145/3539618.3591687
Research	Diversity Reranker	https://towardsdatascience.com/enhancing-rag-pipelines-in-haystack-45f14e2bc9f5	-

Citation

If you find this paper or repository helpful, please consider citing our work:

@misc{yu2024evaluation,
      title={Evaluation of Retrieval-Augmented Generation: A Survey}, 
      author={Hao Yu and Aoran Gan and Kai Zhang and Shiwei Tong and Qi Liu and Zhaofeng Liu},
      year={2024},
      eprint={2405.07437},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

Changelog

2024-05-11: Initial release of the paper and repository.
2024-06-25: Acceptance of the paper by the 2024 CCF Big Data.
2024-06-30: Add two benchmarks: DomainRAG and ReEval.
2024-07-03: Update Arxiv version to v2.

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
figures		figures
LICENSE		LICENSE
README.md		README.md
README_cn.md		README_cn.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Awesome RAG Evaluation

Abstract

Analysis Framework for Evaluating RAG Systems

Reference Framework

Citation

Changelog

About

Releases

Packages

License

YHPeter/Awesome-RAG-Evaluation

Folders and files

Latest commit

History

Repository files navigation

Awesome RAG Evaluation

Abstract

Analysis Framework for Evaluating RAG Systems

Reference Framework

Citation

Changelog

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Packages