Alibaba unveils AI models for controlling robots

Alibaba has unveiled Qwen-Robot Suite, a set of AI models for robots and tasks in the physical environment: Qwen-RobotNav for navigation, Qwen-RobotManip for manipulating objects, and Qwen-RobotWorld for predicting how a scene will unfold. The team described the project as a "full stack for embodied artificial intelligence".
These are software models meant to help physical agents perceive their surroundings, plan actions, and carry out natural-language commands. Qwen-Robot Suite is already undergoing pilot trials with select Alibaba Cloud enterprise clients in the robotics field.
Why Alibaba is bringing Qwen into the physical world
Large language and multimodal models can already work with text, images, video, and speech, but that is not enough for robots. A physical agent must not only understand a command but also translate it into motion while accounting for space, the properties of objects, sensor limitations, and the consequences of its actions.
Alibaba calls this the field of physical AI, or "embodied AI". In this approach, a model must work not only with digital data but also with the physical environment: moving around, finding objects, controlling manipulators, and predicting what will happen after an action.
Qwen-RobotNav: five navigation tasks in one model
Qwen-RobotNav is responsible for navigation. The model combines five groups of tasks:
- following instructions;
- moving to a specified point;
- searching for objects;
- tracking a target;
- autonomous driving.
According to Alibaba, Qwen-RobotNav is built on Qwen3-VL and trained on 15.6 million samples related to route planning and visual-language reasoning.
The company claims a 76.5% success rate on VLN-CE RxR and 90% on EVT-Bench. Alibaba also noted that the model can operate as a tool within larger agentic systems: a higher-level model plans the task, while Qwen-RobotNav handles the movement.
In its demonstrations, Alibaba describes scenarios such as searching for a lost item in a room or checking whether a specific object in a building is open. In such tasks, a robot must not only move but also gather visual evidence and return an answer to the user.
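The division of labor described above, with a higher-level model planning and a navigation model executing, can be illustrated with a minimal sketch. Everything in it is hypothetical: `NavResult`, `fake_nav_tool`, `simple_plan`, and `run_task` are invented for illustration and are not part of any Alibaba API, and the "navigation tool" is a stub that moves nothing.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class NavResult:
    """What a navigation tool hands back to the planner (hypothetical shape)."""
    reached: bool
    evidence: List[str]  # e.g. identifiers of captured frames


def fake_nav_tool(instruction: str) -> NavResult:
    """Stand-in for a navigation model; it does not move anything."""
    return NavResult(reached=True, evidence=[f"frame-for:{instruction}"])


def simple_plan(goal: str) -> List[str]:
    """Toy planner: a fixed two-step decomposition of any goal."""
    return [f"go to the area relevant to '{goal}'", f"look for '{goal}'"]


def run_task(goal: str, plan: Callable[[str], List[str]],
             tools: Dict[str, Callable[[str], NavResult]]) -> str:
    """High-level loop: the planner splits the goal, the tool executes each step."""
    evidence: List[str] = []
    for step in plan(goal):
        result = tools["navigate"](step)
        if not result.reached:
            return f"failed at step: {step}"
        evidence.extend(result.evidence)
    return f"done: {len(evidence)} piece(s) of visual evidence collected"


answer = run_task("lost keys", simple_plan, {"navigate": fake_nav_tool})
print(answer)
```

The point of the pattern is that the planner never issues motion commands itself; it only sees the tool's result and the evidence it returns.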
Qwen-RobotManip: actions with objects
Qwen-RobotManip is designed for physical actions with objects. The model is meant to help robots pick up, move, and place items, as well as transfer skills between different types of devices.
One of the key challenges in robotics is that different robots describe actions differently. A single-arm manipulator, a two-armed platform, a robot with a hand, and a mobile system each use their own coordinates, joints, and command formats. Qwen-RobotManip attempts to bring this data into a common representation so that training on one type of robot helps another.
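To make the idea of a common representation concrete, here is a small sketch. It is not Alibaba's format: the `CommonAction` fields, both robot command layouts, and the adapter functions are assumptions chosen only to show how commands with different units and shapes can be mapped into one schema.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class CommonAction:
    """Hypothetical shared format: displacement in metres plus gripper state."""
    dx: float
    dy: float
    dz: float
    gripper_closed: bool


def from_arm_mm(cmd: Dict[str, float]) -> CommonAction:
    """Adapter for an imaginary arm that reports millimetres and a 0..1 grip value."""
    return CommonAction(cmd["x_mm"] / 1000.0, cmd["y_mm"] / 1000.0,
                        cmd["z_mm"] / 1000.0, cmd["grip"] > 0.5)


def from_mobile_base(cmd: Tuple[float, float, bool]) -> CommonAction:
    """Adapter for an imaginary mobile platform that only moves in the plane (metres)."""
    forward, left, holding = cmd
    return CommonAction(forward, left, 0.0, holding)


# Once converted, records from both robots can sit in the same training set.
dataset: List[CommonAction] = [
    from_arm_mm({"x_mm": 120.0, "y_mm": 0.0, "z_mm": -50.0, "grip": 0.9}),
    from_mobile_base((0.5, -0.25, False)),
]
for action in dataset:
    print(action)
```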
For training, Alibaba used more than 38,100 hours of data. This volume included 11,320 hours of open robotics data, 1,933 hours of first-person human action video, and 24,808 hours of synthetic robotic demonstrations created on the basis of such videos.
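As a quick check on the data mix, the three itemized figures can be summed and expressed as shares. The numbers come from the paragraph above; the dictionary labels are shortened for illustration.

```python
# Hours per source, as itemized in the article.
hours = {
    "open robotics data": 11_320,
    "first-person human video": 1_933,
    "synthetic robot demonstrations": 24_808,
}

total = sum(hours.values())
print(f"itemized total: {total:,} hours")

# Share of each source in the itemized total.
for name, value in hours.items():
    print(f"{name}: {value / total:.1%}")
```

The itemized components add up to 38,061 hours, slightly under the 38,100-hour headline figure; the article does not explain the difference. Synthetic demonstrations make up roughly two thirds of the itemized total.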
The company stated that the model took first place in RoboChallenge Table30 v1 in the general-purpose models track. According to Alibaba, Qwen-RobotManip also showed robustness to new instructions, unfamiliar objects, and the transfer of skills between different robots.
Qwen-RobotWorld: a world model for robots
Qwen-RobotWorld is a video world model controlled by natural language. It is meant to predict how a scene will unfold after a given action.
For example, the model receives a current observation and a text command, and then generates a likely future state of the environment. This approach can be used for manipulation, autonomous driving, navigation, planning, and the creation of synthetic training data for robots.
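The observation-plus-command interface can be sketched with a toy stand-in. This is an assumption-heavy illustration rather than Qwen-RobotWorld itself: the "observation" is a grid position instead of video, `toy_world_model` is a hand-written rule instead of a learned model, and `pick_command` shows one possible use of predictions for planning.

```python
from typing import Callable, Dict, List, Tuple

Observation = Tuple[int, int]  # toy stand-in for a video frame: object position on a grid

MOVES: Dict[str, Tuple[int, int]] = {
    "push left": (-1, 0), "push right": (1, 0),
    "push up": (0, 1), "push down": (0, -1),
}


def toy_world_model(obs: Observation, command: str) -> Observation:
    """Predicts the next observation for a text command (hand-written rule, not learned)."""
    dx, dy = MOVES[command]
    return (obs[0] + dx, obs[1] + dy)


def pick_command(obs: Observation, goal: Observation,
                 model: Callable[[Observation, str], Observation],
                 commands: List[str]) -> str:
    """Chooses the command whose predicted outcome lies closest to the goal."""
    def distance(command: str) -> int:
        x, y = model(obs, command)
        return abs(x - goal[0]) + abs(y - goal[1])
    return min(commands, key=distance)


best = pick_command((0, 0), (3, 0), toy_world_model, list(MOVES))
print(best)
```

The planner never acts in the real environment here; it only compares predicted outcomes, which is also why such a model can serve as a source of synthetic training data.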
To train Qwen-RobotWorld, the team assembled the Embodied World Knowledge corpus. It includes 8.6 million video-text pairs and more than 200 million frames, and it covers more than 20 types of robotic platforms and over 500 categories of actions.
Alibaba stated that Qwen-RobotWorld took first place on EWMBench and DreamGen Bench, and also outperformed all open models on WorldModelBench and PBench. The technical description also claims that the model shows high consistency with basic physical regularities: motion, conservation of mass, fluids, and gravity.
Mass-market robots are still a long way off
Despite the claimed results, Qwen-Robot Suite for now remains a set of models rather than a finished consumer robotics platform. Real-world deployment runs into sensor noise, actuator wear, non-standard situations, perception errors, and an enormous number of rare scenarios. Many of the benchmarks on which such systems are compared are run in simulation or under limited experimental conditions.
Alibaba also did not disclose the cost of access, the timeline for a public launch, or the list of clients who are already testing Qwen-Robot Suite.
As a reminder, in April Alibaba Cloud unveiled the agentic model Qwen3.6-Plus with a context window of 1 million tokens and support for external tools.
Source: ForkLog