Awesome Compound AI Paper List ⭐️

Developers are increasingly building Compound AI systems, which combine multiple model calls and external components to tackle complex AI tasks. By combining and orchestrating components effectively, these systems often outperform single models while improving cost-efficiency, reducing latency, or both. This repository collects and categorizes papers on Compound AI, including LLM routing, cascading, ensembling, speculative decoding, and LLM programming frameworks.

If you find this repository useful, please consider giving it a star. If a relevant paper is missing, you're welcome to open a pull request or an issue!

Related Resources

Routing

  • Large Language Model Routing with Benchmark Datasets (arXiv, 2023) [PDF]
    • Tal Shnitzer, Anthony Ou, Mírian Silva, Kate Soule, Yuekai Sun, Justin Solomon, Neil Thompson, Mikhail Yurochkin
  • Tryage: Real-time, intelligent Routing of User Prompts to Large Language Models (arXiv, 2023) [PDF]
    • Surya Narayanan Hari, Matt Thomson
  • Harnessing the Power of Multiple Minds: Lessons Learned from LLM Routing (Workshop on Insights from Negative Results in NLP, 2024) [PDF]
    • Kv Aditya Srivatsa, Kaushal Maurya, Ekaterina Kochmar
  • Fly-Swat or Cannon? Cost-Effective Language Model Choice via Meta-Modeling (WSDM 2024) [PDF]
    • Marija Šakota, Maxime Peyrard, Robert West
  • Routing to the Expert: Efficient Reward-guided Ensemble of Large Language Models (NAACL 2024) [PDF]
    • Keming Lu, Hongyi Yuan, Runji Lin, Junyang Lin, Zheng Yuan, Chang Zhou, Jingren Zhou
  • Which LLM to Play? Convergence-Aware Online Model Selection with Time-Increasing Bandits (WWW 2024) [PDF]
    • Yu Xia, Fang Kong, Tong Yu, Liya Guo, Ryan A. Rossi, Sungchul Kim, Shuai Li
  • Hybrid LLM: Cost-Efficient and Quality-Aware Query Routing (ICLR 2024) [PDF]
    • Dujian Ding, Ankur Mallick, Chi Wang, Robert Sim, Subhabrata Mukherjee, Victor Rühle, Laks V. S. Lakshmanan, Ahmed Hassan Awadallah
  • Towards Optimizing the Costs of LLM Usage (arXiv, 2024) [PDF]
    • Shivanshu Shekhar, Tanishq Dubey, Koyel Mukherjee, Apoorv Saxena, Atharv Tyagi, Nishanth Kotla
  • ROUTERBENCH: A Benchmark for Multi-LLM Routing System (arXiv, 2024) [PDF] Benchmark
  • RouteLLM: Learning to Route LLMs with Preference Data (arXiv, 2024) [PDF]
    • Isaac Ong, Amjad Almahairi, Vincent Wu, Wei-Lin Chiang, Tianhao Wu, Joseph E. Gonzalez, M Waleed Kadous, Ion Stoica

Ensemble

  • Efficient Online ML API Selection for Multi-Label Classification Tasks (ICML 2022) [PDF]
    • Lingjiao Chen, Matei Zaharia, James Zou
  • LLM-Blender: Ensembling Large Language Models with Pairwise Ranking and Generative Fusion (ACL 2023) [PDF]
    • Dongfu Jiang, Xiang Ren, Bill Yuchen Lin
  • More Agents Is All You Need (arXiv, 2024) [PDF]
    • Junyou Li, Qin Zhang, Yangbin Yu, Qiang Fu, Deheng Ye

Cascade

  • FrugalML: How to Use ML Prediction APIs More Accurately and Cheaply (NeurIPS 2020) [PDF]
    • Lingjiao Chen, Matei Zaharia, James Zou
  • Model Cascading: Towards Jointly Improving Efficiency and Accuracy of NLP Systems (EMNLP 2022) [PDF]
    • Neeraj Varshney, Chitta Baral
  • FrugalGPT: How to Use Large Language Models While Reducing Cost and Improving Performance (arXiv, 2023) [PDF]
    • Lingjiao Chen, Matei Zaharia, James Zou
  • Online Cascade Learning for Efficient Inference over Streams (ICML 2024) [PDF]
  • Language Model Cascades: Token-level uncertainty and beyond (ICLR 2024) [PDF]
    • Neha Gupta, Harikrishna Narasimhan, Wittawat Jitkrittum, Ankit Singh Rawat, Aditya Krishna Menon, Sanjiv Kumar
  • Large Language Model Cascades with Mixture of Thoughts Representations for Cost-efficient Reasoning (ICLR 2024) [PDF]
    • Murong Yue, Jie Zhao, Min Zhang, Liang Du, Ziyu Yao
  • Cascade-Aware Training of Language Models (arXiv, 2024) [PDF]
    • Congchao Wang, Sean Augenstein, Keith Rush, Wittawat Jitkrittum, Harikrishna Narasimhan, Ankit Singh Rawat, Aditya Krishna Menon, Alec Go

Speculative Decoding

  • Fast Inference from Transformers via Speculative Decoding (ICML 2023) [PDF]
    • Yaniv Leviathan, Matan Kalman, Yossi Matias
  • SpecInfer: Accelerating Generative Large Language Model Serving with Tree-based Speculative Inference and Verification (ASPLOS 2024) [PDF]
    • Xupeng Miao, Gabriele Oliaro, Zhihao Zhang, Xinhao Cheng, Zeyu Wang, Zhengxin Zhang, Rae Ying Yee Wong, Alan Zhu, Lijie Yang, Xiaoxiang Shi, Chunan Shi, Zhuoming Chen, Daiyaan Arfeen, Reyna Abhyankar, Zhihao Jia
  • Faster Cascades via Speculative Decoding (arXiv, 2024) [PDF]
    • Harikrishna Narasimhan, Wittawat Jitkrittum, Ankit Singh Rawat, Seungyeon Kim, Neha Gupta, Aditya Krishna Menon, Sanjiv Kumar

LLM/Agent Programming Framework

  • Prompting Is Programming: A Query Language for Large Language Models (PLDI 2023) [PDF]
  • DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines (R0-FoMo Workshop, 2023) [PDF]
    • Omar Khattab, Arnav Singhvi, Paridhi Maheshwari, Zhiyuan Zhang, Keshav Santhanam, Sri Vardhamanan, Saiful Haq, Ashutosh Sharma, Thomas T. Joshi, Hanna Moazam, Heather Miller, Matei Zaharia, Christopher Potts
    • Code: https://github.com/stanfordnlp/dspy
  • Language Agents as Optimizable Graphs (ICML 2024) [PDF]
  • SGLang: Efficient Execution of Structured Language Model Programs (arXiv, 2024) [PDF]
    • Lianmin Zheng, Liangsheng Yin, Zhiqiang Xie, Chuyue Sun, Jeff Huang, Cody Hao Yu, Shiyi Cao, Christos Kozyrakis, Ion Stoica, Joseph E. Gonzalez, Clark Barrett, Ying Sheng
    • Code: https://github.com/sgl-project/sglang
  • AgentLego: An open-source library of versatile tool APIs to extend and enhance LLM based agents
  • A Declarative System for Optimizing AI Workloads (arXiv, 2024) [PDF]
    • Chunwei Liu, Matthew Russo, Michael Cafarella, Lei Cao, Peter Baille Chen, Zui Chen, Michael Franklin, Tim Kraska, Samuel Madden, Gerardo Vitagliano
    • Code: https://github.com/mitdbg/palimpzest