AI Agent Controls a Real Robot Arm: What Happened

When AI Meets the Physical World

What happens when you hand an AI agent the keys to a real robotic arm? In one recent experiment, the results were better than expected. The experimenter used an AI coding agent called OpenClaw alongside OpenAI's Codex to configure, operate, and even train a physical robot arm from scratch.

The AI successfully set up the hardware, used an onboard camera to perceive its surroundings, and gradually learned to grasp objects. It even helped train a secondary AI model to pick up and place specific items. If that sounds like science fiction, it increasingly isn't.

The Robot: A LeRobot 101

The hardware at the center of this experiment is the LeRobot 101, a prebuilt robotic arm that's part of an open-source initiative from Hugging Face. The project is designed to lower the barrier to entry for robotics experimentation, making it more accessible and affordable than traditional industrial systems.

The LeRobot system includes two arms working in tandem:

A controller arm, operated manually by a human using a handle and trigger
A follower arm, fitted with a camera, that mirrors the controller's movements

This setup enables teleoperation-based AI training. As a person moves the controller arm, the AI model observes the follower arm's camera feed and learns to replicate those movements autonomously over time.
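The teleoperation setup above comes down to logging what the follower arm's camera sees together with what the human does on the controller arm. The following is a minimal sketch of that pairing, not the LeRobot API: `Frame`, `record_episode`, and the stand-in sensor functions are hypothetical names, and the six joint values are an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Frame:
    """One timestep of a demonstration (hypothetical structure)."""
    image: List[List[int]]  # camera frame from the follower arm
    action: List[float]     # joint positions read from the controller arm


def record_episode(
    read_camera: Callable[[], List[List[int]]],
    read_controller: Callable[[], List[float]],
    num_steps: int,
) -> List[Frame]:
    """Collect (observation, action) pairs while a human moves the controller arm."""
    episode: List[Frame] = []
    for _ in range(num_steps):
        # The follower mirrors the controller, so the controller's joint
        # positions act as the label for the current camera view.
        episode.append(Frame(image=read_camera(), action=read_controller()))
    return episode


# Stand-in sensors so the sketch runs without hardware (assumed values).
def fake_camera() -> List[List[int]]:
    return [[0, 0], [0, 0]]


def fake_controller() -> List[float]:
    return [0.0, 0.5, -0.5, 0.0, 0.0, 1.0]


demo = record_episode(fake_camera, fake_controller, num_steps=3)
print(len(demo), demo[0].action)
```

A model trained on many such episodes is asked to predict the `action` from the `image`, which is what lets it later move the arm without a human at the controller.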

Building With AI: Triumphs and Growing Pains

Getting started wasn't entirely smooth. Before turning to OpenClaw, the experimenter spent several hours trying to connect and calibrate the robot manually, at one point nearly burning out the motors with incorrect settings that caused them to overheat.

Once OpenClaw and Codex entered the picture, progress accelerated significantly. The AI agent navigated the complex process of establishing connections to the robot hardware, assisted with joint calibration, and generated a Python script that combined multiple libraries to detect and grip a red ball on command.
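The generated script itself is not reproduced in the source, so the following is only a sketch of the perception half of such a task: locating a red object in a camera frame. It uses the standard-library `colorsys` module on a tiny hand-made image; the function names and the hue, saturation, and brightness thresholds are assumptions, and a real script would more likely use a vision library and pass the result to arm-control code that is not shown here.

```python
import colorsys
from typing import List, Optional, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each 0-255


def is_red(pixel: Pixel, min_sat: float = 0.5, min_val: float = 0.3) -> bool:
    """Return True if the pixel's hue is near red and it is vivid enough."""
    r, g, b = (c / 255.0 for c in pixel)
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    # Red sits at both ends of the hue circle (near 0.0 and near 1.0).
    return (h < 0.05 or h > 0.95) and s >= min_sat and v >= min_val


def find_red_centroid(image: List[List[Pixel]]) -> Optional[Tuple[float, float]]:
    """Return the (row, col) centroid of red pixels, or None if none are found."""
    row_sum, col_sum, count = 0, 0, 0
    for y, row in enumerate(image):
        for x, pixel in enumerate(row):
            if is_red(pixel):
                row_sum += y
                col_sum += x
                count += 1
    if count == 0:
        return None
    return row_sum / count, col_sum / count


# A 4x4 grey test frame with two red pixels in column 2 (made-up data).
GREY: Pixel = (128, 128, 128)
RED: Pixel = (220, 20, 20)
image = [[GREY] * 4 for _ in range(4)]
image[1][2] = RED
image[2][2] = RED

centroid = find_red_centroid(image)
print(centroid)
```

Returning `None` when nothing red is visible gives the calling code an explicit signal to keep the gripper still rather than act on a stale position.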

This approach, sometimes called vibe coding, isn't flawless. AI-generated code can contain hallucinated details and bugs, particularly when it targets specialized hardware. Even so, the overall results were striking.

From Simple Gripping to Model Training

Grabbing a red ball was just the beginning. The next phase involved using OpenClaw to guide the training of a dedicated control model for the arm. The AI proved adept at walking through each stage of the training process and evaluating the model's error rate after every training run, a task that would typically require significant robotics expertise.
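The source does not say which error metric was used. As one plausible illustration, the sketch below compares the joint positions a model predicts against the ones a human demonstrated and reports the mean absolute error for each training run. The metric choice, the function name, and all numbers are assumptions.

```python
from typing import Dict, List, Sequence


def mean_absolute_error(
    predicted: Sequence[Sequence[float]],
    demonstrated: Sequence[Sequence[float]],
) -> float:
    """Average absolute difference across all timesteps and joints."""
    total, count = 0.0, 0
    for pred_step, demo_step in zip(predicted, demonstrated):
        for p, d in zip(pred_step, demo_step):
            total += abs(p - d)
            count += 1
    if count == 0:
        raise ValueError("no values to compare")
    return total / count


# Two timesteps, two joints each (made-up values).
demonstrated = [[0.0, 1.0], [0.5, 0.5]]
runs: Dict[str, List[List[float]]] = {
    "run_1": [[0.4, 0.6], [0.1, 0.9]],
    "run_2": [[0.1, 0.9], [0.4, 0.6]],
}

# Score every run against the same demonstration and pick the lowest error.
errors = {name: mean_absolute_error(pred, demonstrated) for name, pred in runs.items()}
best_run = min(errors, key=errors.get)
print(errors, best_run)
```

Scoring every run against the same held-out demonstrations is what makes the numbers comparable from one run to the next.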

The 'Code as Policy' Revolution

The concept powering this experiment has an academic name: code as policy. The idea was introduced in a 2022 research paper, "Code as Policies: Language Model Programs for Embodied Control," which proposes that AI-generated code can serve as a versatile method for programming and controlling robots, one that sidesteps the need for highly specialized robotics knowledge.

Since that paper was published, AI coding capabilities have advanced quickly, and the code-as-policy approach has gained momentum in robotics research.
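To make the code-as-policy idea concrete, here is a small sketch in which the robot's behavior is an ordinary program written against a handful of primitives, the kind of program a coding model might emit for an instruction such as "put the ball in the bin." `SimulatedArm` and its methods are hypothetical stand-ins rather than any real robot API; the class only records the calls it receives.

```python
from typing import Dict, List, Tuple

Position = Tuple[float, float, float]


class SimulatedArm:
    """Hypothetical robot interface that logs commands instead of moving hardware."""

    def __init__(self, objects: Dict[str, Position]) -> None:
        self.objects = dict(objects)
        self.log: List[tuple] = []

    def detect(self, name: str) -> Position:
        """Look up where a named object is (a real system would use the camera)."""
        return self.objects[name]

    def move_to(self, position: Position) -> None:
        self.log.append(("move_to", position))

    def close_gripper(self) -> None:
        self.log.append(("close_gripper",))

    def open_gripper(self) -> None:
        self.log.append(("open_gripper",))


def pick_and_place(arm: SimulatedArm, item: str, target: str) -> None:
    """The policy: plain code that sequences perception and motion primitives."""
    arm.move_to(arm.detect(item))
    arm.close_gripper()
    arm.move_to(arm.detect(target))
    arm.open_gripper()


# Made-up scene coordinates for illustration.
arm = SimulatedArm({"ball": (0.1, 0.2, 0.0), "bin": (0.3, 0.0, 0.1)})
pick_and_place(arm, "ball", "bin")
print(arm.log)
```

Because the policy is readable code rather than learned weights, a person can inspect it, test it in simulation, and fix a single line when it misbehaves.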

A New Benchmark: CaP-X

A collaborative research team from UC Berkeley, Nvidia, Carnegie Mellon University, and Stanford has developed a new evaluation framework called CaP-X to measure how well AI coding models can handle robotic programming tasks.

According to CaP-X results, Google's Gemini outperforms both Claude and ChatGPT at robot programming. One likely explanation is that Google DeepMind has heavily prioritized multimodal training, which may give its models a stronger grasp of physical-world reasoning.

Alongside the benchmark, the team released:

CaP-Gym: a controlled environment for testing coding agents on both simulated and real robots
CaP-Agent0: an agentic framework that significantly boosts the performance of coding models, enabling them to outperform some models trained specifically for direct robotic motion control

Industry Leaders Are Taking Notice

Ken Goldberg, a prominent roboticist at UC Berkeley and one of the researchers behind this work, frames the significance clearly:

"AI-powered coding is super exciting because it has the potential to bridge the gap between conventional engineering methods, which are reliable but don't generalize, and contemporary vision-language-action models, which generalize but are not yet reliable."

Goldberg's team is also collaborating with Nvidia to expand the practical reach of the code-as-policy method, working to make it compatible with a broader ecosystem of robot software tools.

Spencer Huang, who has been organizing internal robotics hackathons at Nvidia and is currently involved in a joint research project with Goldberg, believes the implications extend far beyond the lab.

"Nearly anyone can get into robotics, which is the true holy grail," Huang explains. Enabling people to direct robots through spoken commands, typed instructions, or simple physical demonstrations is, in his words, the "critical unlock for robots in society."

What This Means for the Future of Robotics

The barriers that once made robotics the exclusive domain of specialists (deep programming knowledge, costly hardware, and years of training) are beginning to erode. AI-powered coding tools are emerging as a genuine equalizer, letting enthusiasts, researchers, and developers experiment with physical robots in ways that were out of reach just a few years ago.

Whether this signals an imminent robotics breakthrough or simply an exciting step in a longer journey, one thing is clear: AI is becoming more closely tied to the physical world, and more capable in it, with every passing year.
