Humanoid Manipulation with Vision-Language-Action Model

CMU MRSD Capstone Project (Fall 2025), sponsored by Nissan and Field AI

Advised by Prof. Guanya Shi
View the project website

Skills

Computer Vision, Point Clouds, Robot Foundation Models, Vision-Language-Action Models
Frameworks and tools: PyTorch, PyTorch3D, MuJoCo, PCL, Open3D, Apple Vision Pro, Rerun

Collected 800+ high-quality teleoperated manipulation demonstrations on the Unitree G1 robot using Apple Vision Pro and a custom-built teleoperation and data-collection pipeline. Fine-tuned NVIDIA GR00T N1.5 with LoRA and deployed the resulting policy on the robot. Developed and benchmarked alternative policies that vary the input modality (point-cloud perception) and the architecture (DDPM and ACT), with a focus on latency and reliability.
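As an illustration of what benchmarking for latency and reliability can look like, here is a minimal sketch in plain Python. It is not the project's actual harness: `benchmark_policy`, `success_rate`, and the dummy policy are hypothetical names introduced for this example, and the policy is assumed to be any callable that maps one observation to one action.

```python
import math
import statistics
import time
from typing import Callable, Dict, Sequence

# Hypothetical types for illustration: an observation and an action are flat float lists.
Policy = Callable[[Sequence[float]], Sequence[float]]


def benchmark_policy(
    policy: Policy,
    observations: Sequence[Sequence[float]],
    warmup: int = 5,
) -> Dict[str, float]:
    """Time one policy call per observation and summarize latency in milliseconds."""
    if not observations:
        raise ValueError("need at least one observation")

    # Warm-up calls are not timed, so one-time setup cost does not skew the statistics.
    for obs in observations[:warmup]:
        policy(obs)

    latencies_ms = []
    for obs in observations:
        start = time.perf_counter()
        policy(obs)
        latencies_ms.append((time.perf_counter() - start) * 1000.0)

    latencies_ms.sort()
    # Nearest-rank 95th percentile over the sorted samples.
    p95_index = max(0, math.ceil(0.95 * len(latencies_ms)) - 1)
    return {
        "mean_ms": statistics.fmean(latencies_ms),
        "p50_ms": statistics.median(latencies_ms),
        "p95_ms": latencies_ms[p95_index],
        "max_ms": latencies_ms[-1],
    }


def success_rate(outcomes: Sequence[bool]) -> float:
    """Fraction of rollouts marked successful (a simple reliability measure)."""
    if not outcomes:
        raise ValueError("need at least one rollout outcome")
    return sum(outcomes) / len(outcomes)


if __name__ == "__main__":
    # Stand-in policy: returns a zero action. A real policy would run model inference.
    def dummy_policy(obs: Sequence[float]) -> Sequence[float]:
        return [0.0] * 7

    fake_observations = [[0.0] * 10 for _ in range(100)]
    print(benchmark_policy(dummy_policy, fake_observations))
    print(success_rate([True, True, False, True]))
```

Reporting tail latency (95th percentile and maximum) alongside the mean is a deliberate choice in this sketch: on a real-time control loop, occasional slow inference steps tend to matter more than the average.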

GR00T N1.5 VLA on the Unitree G1 performing box manipulation.

Project Poster

Download PDF