Humanoid Manipulation with a Vision-Language-Action Model
CMU MRSD Capstone Project (Fall 2025), sponsored by Nissan and Field AI
Advised by Prof. Guanya Shi
Skills
Computer Vision, Point Clouds, Robot Foundation Models, Vision-Language-Action Models
Frameworks and tools: PyTorch, PyTorch3D, MuJoCo, PCL, Open3D, Apple Vision Pro, Rerun
Collected 800+ high-quality teleoperated manipulation demonstrations on the Unitree G1 humanoid robot using Apple Vision Pro and a custom-built teleoperation and data collection pipeline. Fine-tuned NVIDIA GR00T N1.5 with LoRA and deployed the resulting policy. Developed and benchmarked alternative policies that varied the input modality (point-cloud perception) and the architecture (DDPM-based diffusion and ACT), with a focus on latency and reliability.
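The page does not describe how LoRA was applied to GR00T N1.5, so the following is only a generic illustration of the technique, not the project's code. LoRA freezes a pretrained weight W0 and trains a low-rank update, so a layer computes y = W0 x + (alpha / r) * B (A x). The sketch uses plain Python lists instead of PyTorch so it runs without dependencies. The class name `LoRALinear` and the rank and alpha values are hypothetical.

```python
import random


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class LoRALinear:
    """Hypothetical LoRA-adapted linear layer: y = W0 x + (alpha / r) * B (A x).

    W0 (d_out x d_in) is the frozen pretrained weight; only A (r x d_in)
    and B (d_out x r) would receive gradient updates during fine-tuning.
    """

    def __init__(self, w0, rank=2, alpha=4.0, seed=0):
        rng = random.Random(seed)
        self.w0 = w0  # frozen: never modified by this class
        d_out, d_in = len(w0), len(w0[0])
        self.rank = rank
        self.scale = alpha / rank
        # A starts small and random; B starts at zero, so the adapter is a no-op at init.
        self.a = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.b = [[0.0] * rank for _ in range(d_out)]

    def forward(self, x):
        base = matvec(self.w0, x)                   # frozen pretrained path
        delta = matvec(self.b, matvec(self.a, x))   # low-rank trainable path
        return [y + self.scale * dy for y, dy in zip(base, delta)]

    def trainable_params(self):
        # r * (d_in + d_out) instead of d_in * d_out for full fine-tuning
        return self.rank * (len(self.a[0]) + len(self.b))


if __name__ == "__main__":
    d_out, d_in = 64, 128
    rng = random.Random(1)
    w0 = [[rng.uniform(-1, 1) for _ in range(d_in)] for _ in range(d_out)]
    layer = LoRALinear(w0, rank=4, alpha=8.0)
    x = [1.0] * d_in
    assert layer.forward(x) == matvec(w0, x)  # identical to the base layer at init
    print("full fine-tune params:", d_out * d_in)              # 8192
    print("LoRA trainable params:", layer.trainable_params())  # 4 * (128 + 64) = 768
```

Because B is initialized to zero, the adapted model starts out reproducing the pretrained one, and the number of trainable parameters grows with r(d_in + d_out) rather than d_in * d_out. That is what makes fine-tuning a large foundation model on a few hundred demonstrations practical.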