Apple uses Vision Pro to train humanoid robots with human perspective data

Apple’s new research, detailed in the paper “Humanoid Policy ∼ Human Policy,” describes a way to train humanoid robots on first-person human demonstrations, often captured with an Apple Vision Pro. The work, a collaboration with several universities, aims to make robot learning cheaper and more efficient than training on robot-collected data alone.

The core idea is to collect egocentric human demonstrations: videos of people performing tasks, recorded from their own point of view. Relying only on robot-generated training data is expensive and labor-intensive by comparison. By combining more than 25,000 human demonstrations and 1,500 robot demonstrations into a unified dataset called PH2D, Apple aims to train a single AI policy that understands both human and robot actions.
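To make the idea of one pool of mixed demonstrations concrete, here is a minimal Python sketch. It is not taken from the paper: the `Demonstration` record, its fields, and the fixed-share sampling rule in `sample_batch` are all hypothetical, chosen only to show how a small set of robot recordings might be drawn alongside a much larger set of human ones.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class Demonstration:
    embodiment: str   # "human" or "robot" (hypothetical tag)
    task: str         # free-form task label, e.g. "pour"
    num_frames: int   # length of the recording in frames


def sample_batch(demos: List[Demonstration], batch_size: int,
                 robot_fraction: float, rng: random.Random) -> List[Demonstration]:
    """Draw a batch with a fixed share of robot demonstrations.

    The fixed-share rule is an assumption made for illustration only.
    """
    if not 0.0 <= robot_fraction <= 1.0:
        raise ValueError("robot_fraction must be between 0 and 1")
    robots = [d for d in demos if d.embodiment == "robot"]
    humans = [d for d in demos if d.embodiment == "human"]
    if not robots or not humans:
        raise ValueError("need at least one demonstration of each kind")
    n_robot = round(batch_size * robot_fraction)
    # Sample with replacement so the smaller robot pool can be reused.
    return rng.choices(robots, k=n_robot) + rng.choices(humans, k=batch_size - n_robot)
```

Calling `sample_batch(demos, 8, 0.25, random.Random(0))` on a mixed list would return eight records, two of them tagged as robot demonstrations. Sampling with replacement is one simple way to keep the scarcer robot data present in every batch.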

To achieve this, Apple developed a specific app for the Vision Pro. It uses the headset’s camera and ARKit to capture precise 3D head and hand movements, providing the detailed action data robots need. To make this accessible, Apple also created a mount for a ZED Mini stereo camera, allowing similar high-quality data capture with more affordable headsets like the Meta Quest 3.

This method makes data collection far faster. Teleoperating a robot to gather demonstrations is slow, whereas a person wearing the headset can record a full demonstration in seconds, which cuts costs and makes collection easier to scale. Because people move faster than robots, the human demonstrations are slowed down by a factor of four during training to match robot speeds, with no further adjustments needed.
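The article does not say how the fourfold slowdown is implemented. One plausible reading, sketched below in Python, is to stretch each recorded hand trajectory by linearly interpolating between consecutive samples. The function name `slow_down` and the choice of linear interpolation are assumptions, not the paper's stated method.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def slow_down(trajectory: Sequence[Sequence[float]], factor: int = 4) -> List[Point]:
    """Stretch a trajectory in time by an integer factor.

    Between every pair of consecutive samples, factor - 1 evenly spaced
    points are inserted, so playback at the original frame rate takes
    roughly `factor` times longer. Linear interpolation is an assumption.
    """
    if factor < 1:
        raise ValueError("factor must be at least 1")
    if len(trajectory) < 2:
        return [tuple(p) for p in trajectory]
    stretched: List[Point] = []
    for start, end in zip(trajectory, trajectory[1:]):
        for step in range(factor):
            t = step / factor  # 0, 1/factor, ..., (factor - 1)/factor
            stretched.append(tuple(a + t * (b - a) for a, b in zip(start, end)))
    stretched.append(tuple(trajectory[-1]))  # keep the final pose
    return stretched
```

A trajectory of N samples becomes (N - 1) * factor + 1 samples, so two hand positions recorded one frame apart turn into five positions spread over four frames.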

At the heart of this system is the Human Action Transformer (HAT) model. HAT processes both human and robot demonstrations in a unified format, so it can learn manipulation skills that carry over between the two. As a result, robots handle new and unfamiliar tasks more effectively, and with less robot data, than they do with traditional robot-only training.
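As a rough illustration of what a unified format could look like, the Python sketch below packs head and hand positions into one fixed-length vector, whether they come from a headset or from a robot. The `UnifiedState` type, the `unify` helper, and the nine-number layout are hypothetical; the article does not describe HAT's actual input representation.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class UnifiedState:
    embodiment: str      # "human" or "robot" (hypothetical tag)
    vector: List[float]  # head xyz, left hand xyz, right hand xyz


def unify(embodiment: str, head: Sequence[float],
          left_hand: Sequence[float], right_hand: Sequence[float]) -> UnifiedState:
    """Flatten three 3D positions into one 9-number state vector.

    Using the same layout for both embodiments lets a single model
    consume either source; the layout itself is an assumption.
    """
    parts = (head, left_hand, right_hand)
    if any(len(p) != 3 for p in parts):
        raise ValueError("each position must have exactly 3 coordinates")
    return UnifiedState(embodiment, [float(v) for p in parts for v in p])
```

With this layout, a human frame tracked by the headset and a robot frame read from its joints produce vectors of the same length, which is the property a shared policy needs.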

The “Humanoid Policy ∼ Human Policy” study shows how first-person human demonstrations can accelerate the development of capable humanoid robots.

Check out the full paper here.

About the Author

Asma is an editor at iThinkDifferent with a strong focus on social media, Apple news, streaming services, guides, mobile gaming, app reviews, and more. When not blogging, Asma loves to play with her cat, draw, and binge on Netflix shows.