
Researchers from Nvidia, Carnegie Mellon University, and the University of California, Berkeley have introduced ENPIRE, a framework that lets AI coding agents improve robot control policies on real hardware.
The system runs a closed loop: the robot performs a task, the environment automatically evaluates the result and returns to its initial state, and the AI agent analyzes errors, rewrites the code, and launches the next series of trials.
How ENPIRE works
In robotics, training on real hardware remains slow and expensive. After a failed attempt, the scene must be returned to its initial state, the result checked, the algorithm changed, and the trial run again. Part of this work usually requires engineers.
ENPIRE brings to the physical world an approach that Nvidia calls AutoResearch: AI agents write code, test it, and improve it in subsequent iterations. Unlike in a digital environment, however, each experiment here involves real robots, cameras, objects, grasping errors, friction, and other physical constraints.
The framework consists of four modules:
- Environment is responsible for automatic scene reset, result verification, logging, and safety interfaces;
- Policy Improvement launches the improvement of the control policy;
- Rollout evaluates the policy on one or several physical robots;
- Evolution allows agents to analyze logs, look for ideas in the literature, change the training infrastructure, and fix code.
After the initial setup of the environment, the loop can run without constant human supervision. The agent receives data from video, trajectories, and the reward function, proposes a new hypothesis, changes the code, tests the result on the robot, and saves the changes if they improve the metric.
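The loop described above can be sketched as a simple keep-if-better search. This is a minimal illustration, not ENPIRE's code: the names `improvement_loop`, `propose`, and `evaluate` are hypothetical, the "policy" is a toy list of numbers, and the evaluation is a stand-in for real robot rollouts.

```python
import random
from typing import Callable, List, Tuple

Policy = List[float]  # hypothetical stand-in for policy code or parameters


def improvement_loop(
    policy: Policy,
    propose: Callable[[Policy], Policy],
    evaluate: Callable[[Policy], float],
    iterations: int,
) -> Tuple[Policy, float]:
    """Keep a proposed change only when it improves the metric."""
    best_score = evaluate(policy)
    for _ in range(iterations):
        candidate = propose(policy)   # agent proposes a change (assumed interface)
        score = evaluate(candidate)   # rollout plus automatic verification (assumed)
        if score > best_score:        # save the change only if the metric improves
            policy, best_score = candidate, score
    return policy, best_score


if __name__ == "__main__":
    rng = random.Random(0)
    target = [1.0, -2.0]  # toy goal, purely illustrative

    def toy_evaluate(p: Policy) -> float:
        # Higher is better: negative squared distance to the toy goal.
        return -sum((a - b) ** 2 for a, b in zip(p, target))

    def toy_propose(p: Policy) -> Policy:
        # Random perturbation standing in for an agent's code edit.
        return [x + rng.uniform(-0.5, 0.5) for x in p]

    final_policy, final_score = improvement_loop(
        [0.0, 0.0], toy_propose, toy_evaluate, iterations=200
    )
    print(final_policy, final_score)
```

Because rejected candidates are discarded, the stored metric never gets worse from one iteration to the next, which mirrors the "save the changes if they improve the metric" rule in the text.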
Why automatic verification and reset are needed
A key element of ENPIRE is the automation of two operations: verifying the result and returning the scene to its initial state. The first is needed so that the system can determine on its own whether the task has been completed. For example, in the cable tie scenario, the evaluation function combined a detector, a segmentation model, and verification by two cameras. This way the agent received a success or error signal without manual labeling of each run.
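One way to picture such a combined check is a function that reports success only when every camera view agrees. The sketch below is an assumption-laden illustration: the `CameraCheck` fields, the `is_success` function, and the 0.8 threshold are all hypothetical and are not taken from the ENPIRE code.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class CameraCheck:
    """Result of checking one camera view (hypothetical structure)."""
    detected: bool        # assumed detector output: target object found in view
    mask_overlap: float   # assumed segmentation score in [0, 1]


def is_success(checks: Sequence[CameraCheck], min_overlap: float = 0.8) -> bool:
    """Report success only if every camera view passes both checks."""
    if not checks:
        return False  # no evidence is treated as failure
    return all(c.detected and c.mask_overlap >= min_overlap for c in checks)


# Two views agree, so the run counts as a success.
print(is_success([CameraCheck(True, 0.93), CameraCheck(True, 0.88)]))  # True
# The second view disagrees, so the run counts as a failure.
print(is_success([CameraCheck(True, 0.93), CameraCheck(False, 0.10)]))  # False
```

Requiring agreement across views is a conservative choice: it trades some false failures for fewer false successes, which matters when the signal is used without manual labeling.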
Automatic reset allows running many attempts in a row. After a failed action, the robot must return the object or scene to a state suitable for the next experiment. Without this, training on real hardware quickly runs into the need for constant human involvement.
As Decrypt noted, at the first stage a human helps the agent build reusable tools: a reset procedure and a reward function. These are then reused, and the agent takes over further improvement of the policy.
What was shown on the robots
In real experiments the team tested ENPIRE on several manipulation tasks. Push-T checks whether the robot can push a T-shaped object into a given zone. Pin Insertion requires inserting pins into holes 4 mm in diameter. GPU installation and operations with a cable tie are also shown.
According to Nvidia's project page, the system completed real manipulation tasks in 99% of cases when the agent was given up to eight attempts and could take previous errors into account. The metric reflects the system's ability to recover from failures and retry with context, rather than the accuracy of a single isolated attempt.
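The difference between single-attempt accuracy and success within several attempts can be shown with a small calculation. The function name `success_within_k` and the episode data below are made up for illustration and are not ENPIRE results.

```python
from typing import Sequence


def success_within_k(episodes: Sequence[Sequence[bool]], k: int) -> float:
    """Fraction of episodes with at least one success in the first k attempts."""
    if k < 1:
        raise ValueError("k must be at least 1")
    if not episodes:
        raise ValueError("episodes must not be empty")
    solved = sum(1 for attempts in episodes if any(attempts[:k]))
    return solved / len(episodes)


# Hypothetical attempt logs: one inner list per episode, True marks a success.
episodes = [
    [True],
    [False, True],
    [False, False, False],
    [False, False, True],
]

print(success_within_k(episodes, 1))  # 0.25
print(success_within_k(episodes, 3))  # 0.75
```

In this toy data only one episode in four succeeds on the first try, yet three in four succeed within three attempts, which is why the two metrics should not be compared directly.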
As coding agents, the team compared Codex on GPT-5.5, Claude Code on Opus 4.7, and Kimi Code on Kimi K2.6. The evaluation took place in the AutoEnvBench benchmark on the Push-T and Pin Insertion tasks.
The researchers also tested ENPIRE in RoboCasa, a simulator of household tasks such as opening cabinets and drawers and switching kitchen appliances on or off. In these scenarios ENPIRE outperformed Nvidia's GR00T and CaP-X, an agent system that uses tools but does not run a full automatic research cycle.
Eight robots accelerated training
A separate part of the work is devoted to scaling to a fleet of robots. Nvidia ran an experiment on eight robotic stations, each with two manipulators. Every station had its own hardware, computer, and AI coding agent.
The stations exchanged results via Git: a successful idea or code change could quickly spread between agents. This approach made it possible to reduce training time. According to Decrypt, the transition from one robot to eight reduced the time to master Push-T from about five to two hours. For Pin Insertion the time dropped from more than 90 minutes to about 40 minutes.
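The reported times translate into a speedup well below eightfold. The short calculation below uses the approximate figures quoted above (about 5 to 2 hours, and 90 to about 40 minutes, where 90 is a lower bound on the single-robot time); the helper names are hypothetical and the results are only as precise as those rounded inputs.

```python
def speedup(t_single: float, t_fleet: float) -> float:
    """How many times faster the fleet reached the goal than one robot."""
    return t_single / t_fleet


def parallel_efficiency(t_single: float, t_fleet: float, n_robots: int) -> float:
    """Speedup divided by fleet size; 1.0 would mean perfect linear scaling."""
    return speedup(t_single, t_fleet) / n_robots


# Push-T: about 5 hours on one robot, about 2 hours on eight.
print(speedup(5, 2))                   # 2.5
print(parallel_efficiency(5, 2, 8))    # 0.3125

# Pin Insertion: more than 90 minutes on one robot, about 40 minutes on eight.
print(speedup(90, 40))                 # 2.25
print(parallel_efficiency(90, 40, 8))  # 0.28125
```

Under these rounded inputs, eight robots deliver roughly a 2.3x to 2.5x speedup, or about 30% of ideal linear scaling.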
Limitations
The authors emphasized that scaling does not solve every problem. While agents read logs, write code, debug it, or wait for a response from the underlying language model, the robots and computing resources are not fully utilized. As the number of robots grows, GPU activity increases, but the average utilization of the robots themselves falls. Teams of agents spend more of their time summarizing the results of other branches and coordinating, and less on physical runs.
Another limitation is the growth in token consumption. A larger fleet of robots brings the policy to a working state faster but requires more tokens because of reading logs, sharing ideas, and coordination between agents.
In addition, ENPIRE has so far been shown on a limited set of manipulation tasks. Its results do not mean that robots can already independently master arbitrary physical skills in an open environment without engineering preparation.
As a reminder, in June Nvidia introduced the Isaac GR00T Reference Humanoid Robot, a research reference design for developing and testing the skills of humanoid robots. The configuration included a Unitree H2 Plus body and tactile five-fingered hands by Sharpa Wave.
Earlier Unitree introduced "the world's first ready-for-mass-production" piloted robot. The android is able to move on two and four limbs.
Source: ForkLog