Datasets Best Practices
How to Record Killer Robot Training Data
Garbage in, garbage out. Your robot is only as smart as the data you feed it. After training countless models, we've learned what separates amateur datasets from professional-grade training data that actually works.
Here's how to record datasets that turn your robot from a confused arm-waver into a precise, intelligent assistant.
Create Your Robot's Perfect World
Think of your recording environment like a photography studio – you need controlled conditions to capture the best shots.
Keep It Static
Your robot needs to focus on learning the task, not dealing with chaos. Clear the workspace of:
People walking around (they're visual noise)
Moving machinery or equipment
Shifting backgrounds or decorations
Pro tip: If you wouldn't want it in the background of an important video call, your robot doesn't want it in its training data either.
Master the Lighting Game
Shadows are your enemy. They hide crucial details and confuse your robot about object shapes and positions.
What works:
Ring lights or diffused lamps for even coverage
Consistent lighting setup for every recording session
Same lighting conditions for each object type
What doesn't:
Window light (changes with weather and time)
Harsh overhead lighting (creates confusing shadows)
Mixed lighting sources (different color temperatures)
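One way to catch lighting drift between sessions is to compare the average brightness of a reference frame from each episode. The sketch below is a rough illustration, not part of any tool mentioned here: the function names, the 15-point tolerance, and the sample values are all assumptions, and frames are assumed to be grayscale rows of 0-255 integers.

```python
from statistics import median


def mean_brightness(frame):
    """Average pixel value of a grayscale frame given as rows of 0-255 ints."""
    total = sum(sum(row) for row in frame)
    count = sum(len(row) for row in frame)
    return total / count


def flag_lighting_drift(brightness_by_episode, tolerance=15.0):
    """Return indices of episodes whose brightness is far from the median."""
    reference = median(brightness_by_episode)
    return [
        index
        for index, value in enumerate(brightness_by_episode)
        if abs(value - reference) > tolerance
    ]


# Hypothetical brightness values, one per episode (e.g. from each first frame).
brightness = [128.0, 131.5, 126.2, 171.0, 129.8]
drifted = flag_lighting_drift(brightness)
print(drifted)  # [3]: the fourth episode is much brighter than the rest
```

A flagged episode is not automatically bad, but it is worth a look: a lamp may have been moved, or window light may have crept in.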
Set Up Cameras Like a Pro
Your cameras are literally your robot's eyes during training. Get this wrong, and your robot will be blind to the world.
Match Your Model's Expectations
If you're using a pre-trained model like pi0 from Physical Intelligence, replicate the camera setup it was trained with exactly:
Wrist cameras on each robot arm
First-person view (FPV) context camera for workspace overview
Same angles and distances they used
Why this matters: Models expect to see the world the way they were trained. Change the camera setup, and you're essentially speaking a different visual language.
Position for Success
Ask yourself: "Could I control this robot using only these camera feeds?" If not, your robot can't learn from them either.
Ideal camera angles:
Wrist cameras: Track the gripper and object interaction closely
Context camera: Wide view showing the entire workspace and task context
Stability first: Secure every camera – shaky footage creates terrible training data
Teach Your Robot to Reach and Grab
How your robot approaches objects teaches it everything about manipulation.
Make the Approach Visible
Your robot needs to see the target object as early as possible, especially through the wrist cameras.
Good approach: Angle the arm so the object stays visible throughout the entire motion
Bad approach: Gripper blocks the object view during approach (robot learns nothing useful)
Be Consistent (But Not Boring)
Use a repeatable grasping strategy so your robot learns reliable patterns, but allow natural variations to build robustness. Think "theme and variations" in music – consistent core technique with slight adaptations.
Build Diversity That Actually Matters
Diversity isn't about recording random chaos – it's about strategic variation that makes your robot adaptable.
Smart Variations to Include
Object placement: Left side, right side, center, near, far
Object types: Different shapes, sizes, colors, textures
Lighting conditions: Slight variations in brightness, within the same consistent lighting setup
Approach angles: Multiple ways to reach the same target
Think in Learning Spaces
If your training data only shows objects on the left, your robot will never learn to pick up objects on the right. Map out the "space" where your robot needs to operate and make sure your data covers it.
Test your coverage: Where are the edges of your task? Make sure you have examples near those boundaries.
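A quick way to see gaps in your coverage is to bin object start positions into a coarse grid and look for empty cells. This is a minimal sketch under assumptions: the function names, the 3x3 grid, the workspace ranges in metres, and the sample positions are all made up for illustration.

```python
def coverage_grid(positions, x_range, y_range, bins=3):
    """Count start positions per cell of a bins x bins grid over the workspace."""
    counts = [[0] * bins for _ in range(bins)]
    x_min, x_max = x_range
    y_min, y_max = y_range
    for x, y in positions:
        if not (x_min <= x <= x_max and y_min <= y <= y_max):
            continue  # outside the mapped task space, so not counted here
        col = min(int((x - x_min) / (x_max - x_min) * bins), bins - 1)
        row = min(int((y - y_min) / (y_max - y_min) * bins), bins - 1)
        counts[row][col] += 1
    return counts


def empty_cells(counts):
    """List (row, col) cells that contain no recorded start position."""
    return [
        (r, c)
        for r, row in enumerate(counts)
        for c, n in enumerate(row)
        if n == 0
    ]


# Hypothetical object positions in metres: every one is on the left side.
positions = [(0.05, 0.12), (0.08, 0.25), (0.12, 0.05), (0.10, 0.28)]
counts = coverage_grid(positions, x_range=(0.0, 0.6), y_range=(0.0, 0.3))
gaps = empty_cells(counts)
print(counts)  # [[1, 0, 0], [1, 0, 0], [2, 0, 0]]
print(gaps)    # every cell in the middle and right columns is empty
```

Here the middle and right columns are empty, which is exactly the "objects only on the left" problem described above.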
Avoid the Outlier Trap
Outliers are weird examples that don't represent normal task execution. They teach your robot the wrong lessons.
Good variation: Object moved 2 inches to the right
Bad outlier: Object hanging upside down from the ceiling
Keep variations realistic and within the bounds of actual task requirements.
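If you log where the object starts in each episode, a simple bounds check can separate realistic variation from outliers. The sketch below assumes a rectangular task area and a 5 cm margin (roughly the "2 inches" above); the function name, bounds, and positions are hypothetical.

```python
def split_outliers(positions, x_range, y_range, margin=0.05):
    """Split episode indices into those inside the task bounds (plus margin) and outliers."""
    x_min, x_max = x_range
    y_min, y_max = y_range
    keep, outliers = [], []
    for index, (x, y) in enumerate(positions):
        inside = (
            x_min - margin <= x <= x_max + margin
            and y_min - margin <= y <= y_max + margin
        )
        (keep if inside else outliers).append(index)
    return keep, outliers


# Hypothetical start positions in metres for four episodes.
positions = [(0.30, 0.15), (0.35, 0.15), (0.62, 0.20), (1.40, 0.90)]
keep, outliers = split_outliers(positions, x_range=(0.0, 0.6), y_range=(0.0, 0.3))
print(keep)      # [0, 1, 2]: the third episode is slightly outside but within the margin
print(outliers)  # [3]: far outside anything the task requires
```

The margin is a judgment call: set it to the amount of variation the real task can plausibly produce.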
How Much Data Do You Actually Need?
Start with 40-50 episodes per task. An episode is one complete task execution from start to finish.
This isn't the final number – it's your starting point for testing whether your pipeline works before you invest weeks in data collection.
Sanity Check Before You Scale
Robotics datasets take forever to collect and are nearly impossible to edit. Don't waste weeks recording garbage data.
The Smart Approach
Record 3-5 test episodes first
Check data quality using LeRobot's Visualize Dataset tool
Run the full pipeline – data collection → training → inference
Test basic replay – can your trained model at least repeat what it saw?
Only then scale up to full dataset collection
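Alongside the visual check, a few structural checks on your test episodes can catch recording bugs early. The sketch below does not use LeRobot's dataset format or API: the dictionary layout with "timestamps" and "actions" keys is a hypothetical stand-in, so adapt the field access to however your episodes are actually stored.

```python
import math


def check_episode(episode):
    """Return a list of problems found in one recorded episode (empty if none)."""
    problems = []
    timestamps = episode["timestamps"]
    actions = episode["actions"]
    if len(timestamps) == 0:
        problems.append("episode has no frames")
    if len(timestamps) != len(actions):
        problems.append("timestamps and actions differ in length")
    if any(later <= earlier for earlier, later in zip(timestamps, timestamps[1:])):
        problems.append("timestamps are not strictly increasing")
    if any(math.isnan(value) for action in actions for value in action):
        problems.append("actions contain NaN")
    return problems


# Hypothetical episodes: one clean, one with three separate recording faults.
good = {"timestamps": [0.0, 0.1, 0.2], "actions": [[0.1, 0.2], [0.1, 0.3], [0.2, 0.3]]}
bad = {"timestamps": [0.0, 0.2, 0.1], "actions": [[0.1, 0.2], [float("nan"), 0.3]]}

print(check_episode(good))  # []
print(check_episode(bad))   # three problems: length mismatch, timestamp order, NaN
```

Running something like this on the 3-5 test episodes takes seconds and is much cheaper than discovering the same fault after weeks of recording.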
The Reality Check Questions
Can the data load properly without errors?
Does the trained model learn to replay at least one episode perfectly?
When you put the robot in the same starting state as your dataset, can it execute the task?
If any answer is "no," fix the problem before collecting more data.
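The replay question can be turned into a number: feed the recorded observations to the trained model and measure how far its actions are from the recorded ones. This is a sketch under assumptions, not a real training or inference API: `policy` is any callable that maps an observation to a list of action values, and the stand-in policies and data below are invented for illustration.

```python
def max_replay_error(policy, observations, recorded_actions):
    """Largest absolute difference between predicted and recorded action values."""
    worst = 0.0
    for observation, recorded in zip(observations, recorded_actions):
        predicted = policy(observation)
        for p, r in zip(predicted, recorded):
            worst = max(worst, abs(p - r))
    return worst


# Hypothetical episode: observations are just step indices here.
observations = [0, 1, 2]
recorded_actions = [[0.0, 0.1], [0.1, 0.2], [0.2, 0.3]]

# Stand-in for a model that has memorised the episode exactly.
lookup = dict(zip(observations, recorded_actions))


def memorised_policy(observation):
    return lookup[observation]


# Stand-in for a model that is consistently off by 0.5 on every action value.
def offset_policy(observation):
    return [value + 0.5 for value in lookup[observation]]


print(max_replay_error(memorised_policy, observations, recorded_actions))  # 0.0
print(max_replay_error(offset_policy, observations, recorded_actions))     # about 0.5
```

What counts as "close enough" depends on your action units and robot, so pick the tolerance yourself rather than expecting an exact zero from a real model.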
Ready to Record?
You now have the blueprint for recording professional-quality robot training data. Remember: quality beats quantity every time.
A small, well-recorded dataset will usually outperform a massive collection of sloppy demonstrations.
Your next step: Pick one simple task, set up your environment using these guidelines, and record your first dataset. Start small, validate your process, then scale to build the training data that will make your robot truly intelligent.