🤖 Awesome Pre-Trained Visual Representations for Motor Control and Robot Learning


Pre-trained visual representations (PVRs) are reshaping robotic control by moving beyond the traditional tabula-rasa paradigm, in which visuo-motor policies are trained from scratch. By leveraging powerful vision foundation models, researchers can harness rich, high-level abstractions to accelerate policy training and improve generalization across diverse environments. This repository compiles pioneering research on how PVRs are used to improve control learning and robot performance without relying on task-specific or in-domain visual data.
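To make the common recipe concrete, below is a minimal, hypothetical sketch (not taken from any specific paper listed here) of the frozen-PVR setup: a generic pre-trained visual encoder stands in for a PVR such as R3M or MVP (torchvision's ImageNet ResNet-50 is used purely for illustration), its weights are frozen, and only a small policy head is trained on top of its features, e.g. via behavior cloning. All class, variable, and dimension names are illustrative.

```python
# Minimal sketch of the frozen-PVR recipe: a pre-trained visual encoder
# (here an ImageNet ResNet-50 as a stand-in for a PVR) feeds a small,
# trainable policy head. Only the head is optimized.
import torch
import torch.nn as nn
import torchvision


class FrozenPVRPolicy(nn.Module):
    def __init__(self, action_dim: int, proprio_dim: int = 0):
        super().__init__()
        weights = torchvision.models.ResNet50_Weights.IMAGENET1K_V2
        backbone = torchvision.models.resnet50(weights=weights)
        backbone.fc = nn.Identity()           # keep the 2048-d pooled features
        for p in backbone.parameters():       # freeze the visual representation
            p.requires_grad = False
        self.backbone = backbone.eval()
        self.preprocess = weights.transforms()
        self.head = nn.Sequential(            # small policy head on top of the PVR
            nn.Linear(2048 + proprio_dim, 256), nn.ReLU(),
            nn.Linear(256, action_dim),
        )

    def forward(self, image: torch.Tensor, proprio: torch.Tensor = None) -> torch.Tensor:
        with torch.no_grad():                 # features are extracted, never fine-tuned
            feat = self.backbone(self.preprocess(image))
        if proprio is not None:
            feat = torch.cat([feat, proprio], dim=-1)
        return self.head(feat)


# Example: one behavior-cloning step on a stand-in (image, proprio, action) batch.
policy = FrozenPVRPolicy(action_dim=7, proprio_dim=9)
optim = torch.optim.Adam(policy.head.parameters(), lr=1e-3)
images = torch.rand(8, 3, 224, 224)           # placeholder camera observations
proprio = torch.rand(8, 9)                    # placeholder joint states
actions = torch.rand(8, 7)                    # placeholder expert actions
loss = nn.functional.mse_loss(policy(images, proprio), actions)
loss.backward()
optim.step()
```

Many of the papers below vary exactly these design choices: which encoder and pre-training objective to use, whether to keep the representation frozen or fine-tune it, and which policy learning method to place on top.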

Feel free to cite and/or contribute.

Table of Contents

All Papers

2022

  1. The Unsurprising Effectiveness of Pre-Trained Vision Models for Control
    Simone Parisi, Aravind Rajeswaran, Senthil Purushwalkam, Abhinav Gupta
    📄 🌐 :octocat: (ICML 2022)

  2. Masked Visual Pre-training for Motor Control
    Tete Xiao, Ilija Radosavovic, Trevor Darrell, Jitendra Malik
    📄 🌐 :octocat: (ArXiv Preprint 2022)

  3. R3M: A Universal Visual Representation for Robot Manipulation
    Suraj Nair, Aravind Rajeswaran, Vikash Kumar, Chelsea Finn, Abhinav Gupta
    📄 🌐 :octocat: (CoRL 2022)

  4. VIP: Towards Universal Visual Reward and Representation via Value-Implicit Pre-Training
    Yecheng Jason Ma, Shagun Sodhani, Dinesh Jayaraman, Osbert Bastani, Vikash Kumar, Amy Zhang
    📄 🌐 :octocat: (ICLR 2023)

  5. Real-World Robot Learning with Masked Visual Pre-training
    Ilija Radosavovic, Tete Xiao, Stephen James, Pieter Abbeel, Jitendra Malik, Trevor Darrell
    📄 🌐 :octocat: (CoRL 2022)

2023

  1. Where are we in the search for an Artificial Visual Cortex for Embodied Intelligence?
    Arjun Majumdar, Karmesh Yadav, Sergio Arnaud, Yecheng Jason Ma, Claire Chen, Sneha Silwal, Aryan Jain, Vincent-Pierre Berges, Pieter Abbeel, Jitendra Malik, Dhruv Batra, Yixin Lin, Oleksandr Maksymets, Aravind Rajeswaran, Franziska Meier
    📄 🌐 :octocat: (NeurIPS 2023)

  2. For Pre-Trained Vision Models in Motor Control, Not All Policy Learning Methods are Created Equal
    Yingdong Hu, Renhao Wang, Li Erran Li, Yang Gao
    📄 🌐 :octocat: (ICML 2023)

  3. Exploring Visual Pre-training for Robot Manipulation: Datasets, Models and Methods
    Ya Jing, Xuelin Zhu, Xingbin Liu, Qie Sima, Taozheng Yang, Yunhai Feng, Tao Kong
    📄 🌐 (IROS 2023)

  4. An Unbiased Look at Datasets for Visuo-Motor Pre-Training
    Sudeep Dasari, Mohan Kumar Srirama, Unnat Jain, Abhinav Gupta
    📄 🌐 :octocat: (CoRL 2023)

  5. What Makes Pre-Trained Visual Representations Successful for Robust Manipulation?
    Kaylee Burns, Zach Witzel, Jubayer Ibn Hamid, Tianhe Yu, Chelsea Finn, Karol Hausman
    📄 🌐 :octocat: (ArXiv Preprint 2023)

2024

  1. SpawnNet: Learning Generalizable Visuomotor Skills from Pre-trained Networks
    Xingyu Lin, John So, Sashwat Mahalingam, Fangchen Liu, Pieter Abbeel
    📄 🌐 :octocat: (ICRA 2024)

  2. What do we learn from a large-scale study of pre-trained visual representations in sim and real environments?
    Sneha Silwal, Karmesh Yadav, Tingfan Wu, Jay Vakil, Arjun Majumdar, Sergio Arnaud, Claire Chen, Vincent-Pierre Berges, Dhruv Batra, Aravind Rajeswaran, Mrinal Kalakrishnan, Franziska Meier, Oleksandr Maksymets
    📄 🌐 (ICRA 2024)

  3. Decomposing the Generalization Gap in Imitation Learning for Visual Robotic Manipulation
    Annie Xie, Lisa Lee, Ted Xiao, Chelsea Finn
    📄 🌐 :octocat: (ICRA 2024)

  4. Spatiotemporal Predictive Pre-training for Robotic Motor Control
    Jiange Yang, Bei Liu, Jianlong Fu, Bocheng Pan, Gangshan Wu, Limin Wang
    📄 (ArXiv Preprint 2024)

  5. Pre-trained Text-to-Image Diffusion Models Are Versatile Representation Learners for Control
    Gunshi Gupta, Karmesh Yadav, Yarin Gal, Dhruv Batra, Zsolt Kira, Cong Lu, Tim G. J. Rudner
    📄 :octocat: (ArXiv Preprint 2024)

  6. Theia: Distilling Diverse Vision Foundation Models for Robot Learning
    Jinghuan Shang, Karl Schmeckpeper, Brandon B. May, Maria Vittoria Minniti, Tarik Kelestemur, David Watkins, Laura Herlant
    📄 🌐 :octocat: (ArXiv Preprint 2024)

Other Useful Sources

Benchmarks

  1. CortexBench :octocat:
  2. Voltron :octocat:

Fine-Tuning & Training from Scratch

  1. On Pre-Training for Visuo-Motor Control: Revisiting a Learning-from-Scratch Baseline
    Nicklas Hansen, Zhecheng Yuan, Yanjie Ze, Tongzhou Mu, Aravind Rajeswaran, Hao Su, Huazhe Xu, Xiaolong Wang
    📄 :octocat: (CoRL 2022 - Workshop on Pre-training Robot Learning)

  2. Lossless Adaptation of Pretrained Vision Models For Robotic Manipulation
    Mohit Sharma, Claudio Fantacci, Yuxiang Zhou, Skanda Koppula, Nicolas Heess, Jon Scholz, Yusuf Aytar
    📄 🌐 (ICLR 2023)

Language Integration

  1. Language-Driven Representation Learning for Robotics
    Siddharth Karamcheti, Suraj Nair, Annie S. Chen, Thomas Kollar, Chelsea Finn, Dorsa Sadigh, Percy Liang
    📄 🌐 :octocat: (RSS 2023)

  2. LIV: Language-Image Representations and Rewards for Robotic Control
    Yecheng Jason Ma, William Liang, Vaidehi Som, Vikash Kumar, Amy Zhang, Osbert Bastani, Dinesh Jayaraman
    📄 🌐 :octocat: (ICML 2023)

  3. Vision-Language Foundation Models as Effective Robot Imitators
    Xinghang Li, Minghuan Liu, Hanbo Zhang, Cunjun Yu, Jie Xu, Hongtao Wu, Chilam Cheang, Ya Jing, Weinan Zhang, Huaping Liu, Hang Li, Tao Kong
    📄 🌐 :octocat: (ICLR 2024)

Citing PVRobotics

@misc{tsagkas2024awesome,
  author={Tsagkas, Nikolaos},
  title={Awesome Pre-Trained Visual Representations for Motor Control and Robot Learning},
  howpublished={\url{https://github.com/tsagkas/Awesome-PVRobotics}},
  year={2024}
}