Show HN: EgoExo Forge: Data and Utilities Needed for Ego and Exo Human Data
In my opinion, large-scale, diverse, and high-quality data is still the biggest bottleneck for generalized robotics deployment. I believe some version of imitation learning from human examples will be the most scalable and cleanest way to train humanoid robots (similar to what Tesla did for Full Self-Driving). Teleop data is too expensive to collect at the scale needed in reasonable time and cost, so passive collection via egocentric (and in certain cases, exocentric) views feels like the right bet.
Over the past few months, I've been building out the scaffolding for this, using Rerun as my underlying infrastructure. The data being collected is time-series and needs to be easy to inspect, and Rerun provides the right tooling for both.
My goal is to first build a representative ground-truth dataset from existing open-source data, generate some reasonable baselines, and then go out and collect my own data that adheres to the defined schema.
Starting with open-source datasets
1. EgoDex from Apple
2. HOCap from Nvidia and the University of Texas at Dallas
3. Assembly101 from Meta
All of these datasets have different sensor configurations and annotations, so my goal with egoexo-forge is to have one consistent labeling scheme and data layout. I built a data pipeline that maps all of them into one general schema, based on the 133-keypoint COCO-WholeBody layout (COCO133), that supports ego+exo, ego-only, or exo-only capture.
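The post doesn't show the schema itself, so here is a minimal Python sketch of what a layout like this could look like. It assumes the standard COCO-WholeBody index ordering, per-frame camera poses (ego cameras move every frame), and NaN for missing keypoints. All class and field names (`CameraView`, `Frame`, `Sequence`, `world_T_cam`, `mode`) are hypothetical and are not egoexo-forge's actual API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

import numpy as np

# COCO-WholeBody ordering: 17 body, 6 foot, 68 face, 21 left-hand, 21 right-hand = 133.
NUM_KEYPOINTS = 133
KEYPOINT_GROUPS = {
    "body": slice(0, 17),
    "feet": slice(17, 23),
    "face": slice(23, 91),
    "left_hand": slice(91, 112),
    "right_hand": slice(112, 133),
}


@dataclass
class CameraView:
    """Hypothetical: one camera stream. Ego views are worn, exo views are external."""
    name: str
    is_ego: bool
    intrinsics: np.ndarray  # (3, 3) pinhole matrix K


@dataclass
class Frame:
    """Hypothetical: all annotations for one timestamp. Missing keypoints are NaN."""
    timestamp_ns: int
    world_T_cam: Dict[str, np.ndarray] = field(default_factory=dict)  # view name -> (4, 4) camera-to-world pose
    keypoints_3d: Optional[np.ndarray] = None  # (133, 3) world coordinates
    keypoints_2d: Dict[str, np.ndarray] = field(default_factory=dict)  # view name -> (133, 3) [u, v, confidence]

    def __post_init__(self):
        # Reject annotations that don't follow the shared 133-keypoint layout.
        if self.keypoints_3d is not None and self.keypoints_3d.shape != (NUM_KEYPOINTS, 3):
            raise ValueError(f"keypoints_3d must be ({NUM_KEYPOINTS}, 3), got {self.keypoints_3d.shape}")
        for view, kps in self.keypoints_2d.items():
            if kps.shape != (NUM_KEYPOINTS, 3):
                raise ValueError(f"keypoints_2d[{view!r}] must be ({NUM_KEYPOINTS}, 3), got {kps.shape}")


@dataclass
class Sequence:
    """Hypothetical: one recording converted from any source dataset."""
    dataset: str
    views: List[CameraView]
    frames: List[Frame] = field(default_factory=list)

    @property
    def mode(self) -> str:
        """Classify the capture setup from its views: 'ego+exo', 'ego', or 'exo'."""
        has_ego = any(v.is_ego for v in self.views)
        has_exo = any(not v.is_ego for v in self.views)
        if has_ego and has_exo:
            return "ego+exo"
        if has_ego:
            return "ego"
        if has_exo:
            return "exo"
        raise ValueError("sequence has no camera views")
```

Deriving the capture mode from the views, rather than storing it separately, means a converter only has to describe its cameras correctly for ego-only, exo-only, and mixed datasets to be handled the same way downstream.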
Since the scaffolding is already in place, adding other datasets becomes MUCH easier. The next ones I'll be including are the HD-EPIC kitchens dataset, HOT3D, and finally my own collection setup using an iPhone and an Insta360 GO.
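One common way to make new datasets cheap to add is a converter registry: each source dataset gets one function that reads its native files and yields samples in the shared schema, and everything downstream only ever talks to the registry. The sketch below assumes that pattern; `CONVERTERS`, `register`, `load`, and the placeholder HOT3D converter are hypothetical names, not egoexo-forge's code.

```python
from typing import Callable, Dict, Iterator

# Hypothetical: a converter takes a dataset root directory and yields samples in the shared schema.
Converter = Callable[[str], Iterator[dict]]
CONVERTERS: Dict[str, Converter] = {}


def register(name: str) -> Callable[[Converter], Converter]:
    """Decorator that adds a converter to the registry under `name`."""
    def decorator(fn: Converter) -> Converter:
        if name in CONVERTERS:
            raise KeyError(f"converter for {name!r} already registered")
        CONVERTERS[name] = fn
        return fn
    return decorator


@register("hot3d")
def convert_hot3d(root: str) -> Iterator[dict]:
    # Placeholder only: a real converter would parse the dataset's files under `root`
    # and remap its annotations onto the shared 133-keypoint layout.
    yield {"dataset": "hot3d", "source": root}


def load(name: str, root: str) -> Iterator[dict]:
    """Look up a registered converter and run it on `root`."""
    try:
        converter = CONVERTERS[name]
    except KeyError:
        raise KeyError(f"no converter registered for {name!r}; known: {sorted(CONVERTERS)}") from None
    return converter(root)
```

With this shape, adding a dataset means writing one decorated function; validation, visualization, and training code stay unchanged.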
Once I have a diverse set of datasets, I'll double down on what I believe are the key algorithms needed to turn raw captures into useful data for imitation learning:
1. Camera pose estimation via SLAM/SfM for the ego perspective (and automatic calibration for exo)
2. Human pose estimation for both egocentric and exocentric views
3. Metric 3D reconstruction and object tracking
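Camera poses and human pose meet when 2D keypoint detections from several calibrated views are lifted to 3D. A standard linear baseline for this is triangulation with the direct linear transform (DLT), sketched below in NumPy. It assumes intrinsics `K` and camera-to-world poses are already known (from exo calibration, or from SLAM/SfM for the ego camera); the function names are illustrative and not taken from any particular library.

```python
import numpy as np


def projection_matrix(K: np.ndarray, world_T_cam: np.ndarray) -> np.ndarray:
    """Build the 3x4 matrix P = K [R | t] mapping homogeneous world points to pixels.

    world_T_cam is the 4x4 camera-to-world pose, so it is inverted to get world-to-camera.
    """
    cam_T_world = np.linalg.inv(world_T_cam)
    return K @ cam_T_world[:3, :]


def project(P: np.ndarray, X: np.ndarray) -> np.ndarray:
    """Project a 3D world point X (3,) to pixel coordinates (2,)."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]


def triangulate_dlt(Ps, uvs) -> np.ndarray:
    """Linear (DLT) triangulation of one keypoint observed in N >= 2 calibrated views.

    Each view contributes two rows from u * (p3 . X) - (p1 . X) = 0 and
    v * (p3 . X) - (p2 . X) = 0; the solution is the right singular vector of
    the stacked system with the smallest singular value.
    """
    rows = []
    for P, (u, v) in zip(Ps, uvs):
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    A = np.stack(rows)
    _, _, Vt = np.linalg.svd(A)
    X_h = Vt[-1]
    return X_h[:3] / X_h[3]  # dehomogenize


if __name__ == "__main__":
    # Two cameras 1 m apart along x, both looking down +z, observing one point.
    K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
    pose_b = np.eye(4)
    pose_b[0, 3] = 1.0
    Ps = [projection_matrix(K, np.eye(4)), projection_matrix(K, pose_b)]
    X = np.array([0.2, -0.1, 3.0])
    print(triangulate_dlt(Ps, [project(P, X) for P in Ps]))
```

DLT minimizes an algebraic rather than geometric error, so in practice it is usually the initialization for a nonlinear refinement of reprojection error, with per-keypoint detector confidences used to weight or drop views.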
I'll be setting up reasonable open-source baselines for each of these to validate that the datasets work, and then finally try using the generated datasets for imitation learning via the pi0-lerobot repo I've been working on.
I plan on writing a blog post with more details on all of this in the near future, so stay tuned.