🤖 Awesome Pre-Trained Visual Representations for Motor Control and Robot Learning


Pre-trained visual representations (PVRs) are reshaping robotic control by moving beyond the traditional tabula-rasa paradigm, in which visuo-motor policies are trained from scratch. By leveraging powerful vision foundation models, researchers can harness rich, high-level abstractions to accelerate policy training and improve generalization across diverse environments. This repository compiles pioneering research on how PVRs are used to improve control learning and robot performance without relying on task-specific or in-domain visual data.
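To make the common recipe concrete, below is a minimal, hypothetical sketch (not taken from any specific paper listed here) of the frozen-PVR setup: a generic pre-trained visual encoder stands in for a PVR such as R3M or MVP (torchvision's ImageNet ResNet-50 is used purely for illustration), its weights are frozen, and only a small policy head is trained on top of its features, e.g. via behavior cloning. All class, variable, and dimension names are illustrative.

```python
# Minimal sketch of the frozen-PVR recipe: a pre-trained visual encoder
# (here an ImageNet ResNet-50 as a stand-in for a PVR) feeds a small,
# trainable policy head. Only the head is optimized.
import torch
import torch.nn as nn
import torchvision


class FrozenPVRPolicy(nn.Module):
    def __init__(self, action_dim: int, proprio_dim: int = 0):
        super().__init__()
        weights = torchvision.models.ResNet50_Weights.IMAGENET1K_V2
        backbone = torchvision.models.resnet50(weights=weights)
        backbone.fc = nn.Identity()           # keep the 2048-d pooled features
        for p in backbone.parameters():       # freeze the visual representation
            p.requires_grad = False
        self.backbone = backbone.eval()
        self.preprocess = weights.transforms()
        self.head = nn.Sequential(            # small policy head on top of the PVR
            nn.Linear(2048 + proprio_dim, 256), nn.ReLU(),
            nn.Linear(256, action_dim),
        )

    def forward(self, image: torch.Tensor, proprio: torch.Tensor = None) -> torch.Tensor:
        with torch.no_grad():                 # features are extracted, never fine-tuned
            feat = self.backbone(self.preprocess(image))
        if proprio is not None:
            feat = torch.cat([feat, proprio], dim=-1)
        return self.head(feat)


# Example: one behavior-cloning step on a stand-in (image, proprio, action) batch.
policy = FrozenPVRPolicy(action_dim=7, proprio_dim=9)
optim = torch.optim.Adam(policy.head.parameters(), lr=1e-3)
images = torch.rand(8, 3, 224, 224)           # placeholder camera observations
proprio = torch.rand(8, 9)                    # placeholder joint states
actions = torch.rand(8, 7)                    # placeholder expert actions
loss = nn.functional.mse_loss(policy(images, proprio), actions)
loss.backward()
optim.step()
```

Many of the papers below vary exactly these design choices: which encoder and pre-training objective to use, whether to keep the representation frozen or fine-tune it, and which policy learning method to place on top.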

Feel free to cite and/or contribute.

Table of Contents

All Papers

2022

  1. The Unsurprising Effectiveness of Pre-Trained Vision Models for Control
    Simone Parisi, Aravind Rajeswaran, Senthil Purushwalkam, Abhinav Gupta
    📄 🌐 :octocat: (ICML 2022)

  2. Masked Visual Pre-training for Motor Control
    Tete Xiao, Ilija Radosavovic, Trevor Darrell, Jitendra Malik
    📄 🌐 :octocat: (ArXiv Preprint 2022)

  3. R3M: A Universal Visual Representation for Robot Manipulation
    Suraj Nair, Aravind Rajeswaran, Vikash Kumar, Chelsea Finn, Abhinav Gupta
    📄 🌐 :octocat: (CoRL 2022)

  4. VIP: Towards Universal Visual Reward and Representation via Value-Implicit Pre-Training
    Yecheng Jason Ma, Shagun Sodhani, Dinesh Jayaraman, Osbert Bastani, Vikash Kumar, Amy Zhang
    📄 🌐 :octocat: (ICLR 2023)

  5. Real-World Robot Learning with Masked Visual Pre-training
    Ilija Radosavovic, Tete Xiao, Stephen James, Pieter Abbeel, Jitendra Malik, Trevor Darrell
    📄 🌐 :octocat: (CoRL 2022)

2023

  1. Where are we in the search for an Artificial Visual Cortex for Embodied Intelligence?
    Arjun Majumdar, Karmesh Yadav, Sergio Arnaud, Yecheng Jason Ma, Claire Chen, Sneha Silwal, Aryan Jain, Vincent-Pierre Berges, Pieter Abbeel, Jitendra Malik, Dhruv Batra, Yixin Lin, Oleksandr Maksymets, Aravind Rajeswaran, Franziska Meier
    📄 🌐 :octocat: (NeurIPS 2023)

  2. For Pre-Trained Vision Models in Motor Control, Not All Policy Learning Methods are Created Equal
    Yingdong Hu, Renhao Wang, Li Erran Li, Yang Gao
    📄 🌐 :octocat: (ICML 2023)

  3. Exploring Visual Pre-training for Robot Manipulation: Datasets, Models and Methods
    Ya Jing, Xuelin Zhu, Xingbin Liu, Qie Sima, Taozheng Yang, Yunhai Feng, Tao Kong
    📄 🌐 (IROS 2023)

  4. An Unbiased Look at Datasets for Visuo-Motor Pre-Training
    Sudeep Dasari, Mohan Kumar Srirama, Unnat Jain, Abhinav Gupta
    📄 🌐 :octocat: (CoRL 2023)

  5. What Makes Pre-Trained Visual Representations Successful for Robust Manipulation?
    Kaylee Burns, Zach Witzel, Jubayer Ibn Hamid, Tianhe Yu, Chelsea Finn, Karol Hausman
    📄 🌐 :octocat: (ArXiv Preprint 2023)

2024

  1. SpawnNet: Learning Generalizable Visuomotor Skills from Pre-trained Networks
    Xingyu Lin, John So, Sashwat Mahalingam, Fangchen Liu, Pieter Abbeel
    📄 🌐 :octocat: (ICRA 2024)

  2. What do we learn from a large-scale study of pre-trained visual representations in sim and real environments?
    Sneha Silwal, Karmesh Yadav, Tingfan Wu, Jay Vakil, Arjun Majumdar, Sergio Arnaud, Claire Chen, Vincent-Pierre Berges, Dhruv Batra, Aravind Rajeswaran, Mrinal Kalakrishnan, Franziska Meier, Oleksandr Maksymets
    📄 🌐 (ICRA 2024)

  3. Decomposing the Generalization Gap in Imitation Learning for Visual Robotic Manipulation
    Annie Xie, Lisa Lee, Ted Xiao, Chelsea Finn
    📄 🌐 :octocat: (ICRA 2024)

  4. Spatiotemporal Predictive Pre-training for Robotic Motor Control
    Jiange Yang, Bei Liu, Jianlong Fu, Bocheng Pan, Gangshan Wu, Limin Wang
    📄 (ArXiv Preprint 2024)

  5. Pre-trained Text-to-Image Diffusion Models Are Versatile Representation Learners for Control
    Gunshi Gupta, Karmesh Yadav, Yarin Gal, Dhruv Batra, Zsolt Kira, Cong Lu, Tim G. J. Rudner
    📄 :octocat: (ArXiv Preprint 2024)

  6. Theia: Distilling Diverse Vision Foundation Models for Robot Learning
    Jinghuan Shang, Karl Schmeckpeper, Brandon B. May, Maria Vittoria Minniti, Tarik Kelestemur, David Watkins, Laura Herlant
    📄 🌐 :octocat: (ArXiv Preprint 2024)

Other Useful Sources

Benchmarks

  1. CortexBench :octocat:
  2. Voltron :octocat:

Fine-Tuning & Training from Scratch

  1. On Pre-Training for Visuo-Motor Control: Revisiting a Learning-from-Scratch Baseline
    Nicklas Hansen, Zhecheng Yuan, Yanjie Ze, Tongzhou Mu, Aravind Rajeswaran, Hao Su, Huazhe Xu, Xiaolong Wang
    📄 :octocat: (CoRL 2022 - Workshop on Pre-training Robot Learning)

  2. Lossless Adaptation of Pretrained Vision Models For Robotic Manipulation
    Mohit Sharma, Claudio Fantacci, Yuxiang Zhou, Skanda Koppula, Nicolas Heess, Jon Scholz, Yusuf Aytar
    📄 🌐 (ICLR 2023)

Language Integration

  1. Language-Driven Representation Learning for Robotics
    Siddharth Karamcheti, Suraj Nair, Annie S. Chen, Thomas Kollar, Chelsea Finn, Dorsa Sadigh, Percy Liang
    📄 🌐 :octocat: (RSS 2023)

  2. LIV: Language-Image Representations and Rewards for Robotic Control
    Yecheng Jason Ma, William Liang, Vaidehi Som, Vikash Kumar, Amy Zhang, Osbert Bastani, Dinesh Jayaraman
    📄 🌐 :octocat: (ICML 2023)

  3. Vision-Language Foundation Models as Effective Robot Imitators
    Xinghang Li, Minghuan Liu, Hanbo Zhang, Cunjun Yu, Jie Xu, Hongtao Wu, Chilam Cheang, Ya Jing, Weinan Zhang, Huaping Liu, Hang Li, Tao Kong
    📄 🌐 :octocat: (ICLR 2024)

Citing PVRobotics

@misc{tsagkas2024awesome,
  author={Tsagkas, Nikolaos},
  title={Awesome Pre-Trained Visual Representations for Motor Control and Robot Learning},
  howpublished={\url{https://github.com/tsagkas/Awesome-PVRobotics}},
  year={2024}
}