Awesome Compound AI Paper List ⭐️

Developers are increasingly building Compound AI systems that combine multiple model calls and external components to tackle complex AI tasks. By combining and orchestrating these components effectively, such systems often outperform single models, and can do so at lower cost and/or lower latency. This repository collects and categorizes papers on Compound AI, covering LLM routing, cascading, ensembling, speculative decoding, and LLM programming frameworks.

If you find this repository useful, please consider giving it a star. If a relevant paper is missing, you're welcome to create a pull request or open an issue!

Related Resources

Routing

Large Language Model Routing with Benchmark Datasets (arXiv, 2023) [PDF]
- Tal Shnitzer, Anthony Ou, Mírian Silva, Kate Soule, Yuekai Sun, Justin Solomon, Neil Thompson, Mikhail Yurochkin
Tryage: Real-time, intelligent Routing of User Prompts to Large Language Models (arXiv, 2023) [PDF]
- Surya Narayanan Hari, Matt Thomson
Harnessing the Power of Multiple Minds: Lessons Learned from LLM Routing (Workshop on Insights from Negative Results in NLP, 2024) [PDF]
- Kv Aditya Srivatsa, Kaushal Maurya, Ekaterina Kochmar
Fly-Swat or Cannon? Cost-Effective Language Model Choice via Meta-Modeling (WSDM 2024) [PDF]
- Marija Šakota, Maxime Peyrard, Robert West
Routing to the Expert: Efficient Reward-guided Ensemble of Large Language Models (NAACL 2024) [PDF]
- Keming Lu, Hongyi Yuan, Runji Lin, Junyang Lin, Zheng Yuan, Chang Zhou, Jingren Zhou
Which LLM to Play? Convergence-Aware Online Model Selection with Time-Increasing Bandits (WWW 2024) [PDF]
- Yu Xia, Fang Kong, Tong Yu, Liya Guo, Ryan A. Rossi, Sungchul Kim, Shuai Li
Hybrid LLM: Cost-Efficient and Quality-Aware Query Routing (ICLR 2024) [PDF]
- Dujian Ding, Ankur Mallick, Chi Wang, Robert Sim, Subhabrata Mukherjee, Victor Rühle, Laks V. S. Lakshmanan, Ahmed Hassan Awadallah
Towards Optimizing the Costs of LLM Usage (arXiv, 2024) [PDF]
- Shivanshu Shekhar, Tanishq Dubey, Koyel Mukherjee, Apoorv Saxena, Atharv Tyagi, Nishanth Kotla
ROUTERBENCH: A Benchmark for Multi-LLM Routing System (arXiv, 2024) [PDF]
- Qitian Jason Hu, Jacob Bieker, Xiuyu Li, Nan Jiang, Benjamin Keigwin, Gaurav Ranganath, Kurt Keutzer, Shriyash Kaustubh Upadhyay
- Code: https://github.com/withmartian/routerbench
RouteLLM: Learning to Route LLMs with Preference Data (arXiv, 2024) [PDF]
- Isaac Ong, Amjad Almahairi, Vincent Wu, Wei-Lin Chiang, Tianhao Wu, Joseph E. Gonzalez, M Waleed Kadous, Ion Stoica

Ensemble

Efficient Online ML API Selection for Multi-Label Classification Tasks (ICML 2022) [PDF]
- Lingjiao Chen, Matei Zaharia, James Zou
LLM-Blender: Ensembling Large Language Models with Pairwise Ranking and Generative Fusion (ACL 2023) [PDF]
- Dongfu Jiang, Xiang Ren, Bill Yuchen Lin
More Agents Is All You Need (arXiv, 2024) [PDF]
- Junyou Li, Qin Zhang, Yangbin Yu, Qiang Fu, Deheng Ye

Cascade

FrugalML: How to Use ML Prediction APIs More Accurately and Cheaply (NeurIPS 2020) [PDF]
- Lingjiao Chen, Matei Zaharia, James Zou
Model Cascading: Towards Jointly Improving Efficiency and Accuracy of NLP Systems (EMNLP 2022) [PDF]
- Neeraj Varshney, Chitta Baral
FrugalGPT: How to Use Large Language Models While Reducing Cost and Improving Performance (arXiv, 2023) [PDF]
- Lingjiao Chen, Matei Zaharia, James Zou
Online Cascade Learning for Efficient Inference over Streams (ICML 2024) [PDF]
- Lunyiu Nie, Zhimin Ding, Erdong Hu, Christopher Jermaine, Swarat Chaudhuri
- Code: https://github.com/Flitternie/online_cascade_learning
Language Model Cascades: Token-level uncertainty and beyond (ICLR 2024) [PDF]
- Neha Gupta, Harikrishna Narasimhan, Wittawat Jitkrittum, Ankit Singh Rawat, Aditya Krishna Menon, Sanjiv Kumar
Large Language Model Cascades with Mixture of Thoughts Representations for Cost-efficient Reasoning (ICLR 2024) [PDF]
- Murong Yue, Jie Zhao, Min Zhang, Liang Du, Ziyu Yao
Cascade-Aware Training of Language Models (arXiv, 2024) [PDF]
- Congchao Wang, Sean Augenstein, Keith Rush, Wittawat Jitkrittum, Harikrishna Narasimhan, Ankit Singh Rawat, Aditya Krishna Menon, Alec Go

Speculative Decoding

Fast Inference from Transformers via Speculative Decoding (ICML 2023) [PDF]
- Yaniv Leviathan, Matan Kalman, Yossi Matias
SpecInfer: Accelerating Generative Large Language Model Serving with Tree-based Speculative Inference and Verification (ASPLOS 2024) [PDF]
- Xupeng Miao, Gabriele Oliaro, Zhihao Zhang, Xinhao Cheng, Zeyu Wang, Zhengxin Zhang, Rae Ying Yee Wong, Alan Zhu, Lijie Yang, Xiaoxiang Shi, Chunan Shi, Zhuoming Chen, Daiyaan Arfeen, Reyna Abhyankar, Zhihao Jia
Faster Cascades via Speculative Decoding (arXiv, 2024) [PDF]
- Harikrishna Narasimhan, Wittawat Jitkrittum, Ankit Singh Rawat, Seungyeon Kim, Neha Gupta, Aditya Krishna Menon, Sanjiv Kumar

LLM/Agent Programming Framework

Prompting Is Programming: A Query Language for Large Language Models (PLDI 2023) [PDF]
- Luca Beurer-Kellner, Marc Fischer, Martin Vechev
- Code: https://github.com/eth-sri/lmql
DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines (R0-FoMo Workshop, 2023) [PDF]
- Omar Khattab, Arnav Singhvi, Paridhi Maheshwari, Zhiyuan Zhang, Keshav Santhanam, Sri Vardhamanan, Saiful Haq, Ashutosh Sharma, Thomas T. Joshi, Hanna Moazam, Heather Miller, Matei Zaharia, Christopher Potts
- Code: https://github.com/stanfordnlp/dspy
Language Agents as Optimizable Graphs (ICML 2024) [PDF]
- Mingchen Zhuge, Wenyi Wang, Louis Kirsch, Francesco Faccio, Dmitrii Khizbullin, Jürgen Schmidhuber
- Code: https://github.com/metauto-ai/gptswarm
SGLang: Efficient Execution of Structured Language Model Programs (arXiv, 2024) [PDF]
- Lianmin Zheng, Liangsheng Yin, Zhiqiang Xie, Chuyue Sun, Jeff Huang, Cody Hao Yu, Shiyi Cao, Christos Kozyrakis, Ion Stoica, Joseph E. Gonzalez, Clark Barrett, Ying Sheng
- Code: https://github.com/sgl-project/sglang
AgentLego: An open-source library of versatile tool APIs to extend and enhance LLM based agents
- OpenXLab
- Code: https://github.com/InternLM/agentlego
A Declarative System for Optimizing AI Workloads (arXiv, 2024) [PDF]
- Chunwei Liu, Matthew Russo, Michael Cafarella, Lei Cao, Peter Baille Chen, Zui Chen, Michael Franklin, Tim Kraska, Samuel Madden, Gerardo Vitagliano
- Code: https://github.com/mitdbg/palimpzest
