At almost the same moment Apple released its new product last night, the entire tech community was flooded with posts about a product called Manus.
Billed as the world’s first truly general-purpose AI agent, it can, judging by the case studies on its official website, think independently, plan and execute complex tasks, and deliver complete results directly.
Agents such as Claude’s Computer Use can also handle multiple tasks, or help you order food and book hotels, but Manus covers more areas and achieves higher execution quality.
Manus has set a new record in the authoritative GAIA benchmark, far outperforming OpenAI’s counterparts.
The name Manus comes from the Latin phrase “Mens et Manus”, meaning “mind and hand”. It is also the motto of MIT, which encourages students to turn their ideas into practical results.
A few hours before the release of Manus, founder Xiao Hong posted “The climax is coming” on Jike and shared an excerpt from Shakespeare:

It’s hard to say right now that the birth of Manus is a milestone for AGI, but it’s likely to bring the Agent era to a real “climax”.
Apply for Manus access 👇: https://manus.im/invitation
01
Screening resumes, picking houses, and trading stocks: does Manus really know how to “work”?
According to the company, Manus is not just a conversational AI tool that can only chat, but a truly autonomous agent.
While other AIs may stop at generating ideas, Manus can think and act on its own. The company presents it as a new paradigm for human-machine collaboration, and perhaps even a window onto AGI.
A four-minute demo video was released alongside Manus. In its cases, Manus completes the entire process from planning to execution autonomously, demonstrating true agent capabilities rather than simple assistant functions.

Let’s start with a common HR task: screening resumes.
The demo opens with a zip file of 10 resumes sent to Manus, which then works as efficiently as a professional recruiter.
It first unzips the file, then goes through each resume page by page and records the important information. Manus also processes tasks asynchronously, which means you can turn off your computer at any time and it will notify you when the task is complete.

Of course, you can also give it new instructions at any time during the process.
Next, upload 5 more resumes to Manus. After carefully reading all 15 resumes, Manus gives a ranked recommendation, along with candidate profiles and evaluation criteria for reference.
And that’s not all: we can also have Manus generate a spreadsheet.
Because Manus has knowledge and memory capabilities, the next time it performs a similar task it will deliver the results directly as a spreadsheet.

In another demonstration, Manus was asked to find a safe, low-crime neighborhood in New York and a property to buy that met the criteria, taking into account the family’s income and the child’s schooling needs.

Faced with such complex tasks, Manus also methodically breaks them down into steps and creates detailed to-do lists.
- Search and read articles about New York’s safest neighborhoods.
- Research the situation in New York’s secondary schools.
- Write a Python program to calculate a budget.
- Filter for suitable listings on real estate websites based on the budget.
- Consolidate all the information, write detailed reports and organize relevant materials.
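The budget step in that list could look something like the following minimal sketch. It is not the program Manus wrote in the demo. The 28% housing share, the default interest rate, and the function names `max_loan` and `affordable_price` are all assumptions chosen for illustration.

```python
def max_loan(monthly_payment: float, annual_rate: float, years: int) -> float:
    """Largest fixed-rate loan whose monthly payment equals `monthly_payment`.

    Uses the standard annuity formula: P = pmt * (1 - (1 + r)^-n) / r,
    where r is the monthly rate and n the number of monthly payments.
    """
    n = years * 12
    r = annual_rate / 12
    if r == 0:
        # With no interest, the loan is simply the sum of all payments.
        return monthly_payment * n
    return monthly_payment * (1 - (1 + r) ** -n) / r


def affordable_price(
    annual_income: float,
    down_payment: float,
    housing_share: float = 0.28,  # assumed share of income spent on housing
    annual_rate: float = 0.065,   # assumed mortgage rate, for illustration only
    years: int = 30,
) -> float:
    """Rough upper bound on purchase price: down payment plus the largest loan."""
    monthly_payment = annual_income / 12 * housing_share
    return down_payment + max_loan(monthly_payment, annual_rate, years)


if __name__ == "__main__":
    # Hypothetical household, not taken from the demo.
    price = affordable_price(annual_income=180_000, down_payment=150_000)
    print(f"Affordable purchase price: ${price:,.0f}")
```

The resulting figure is the kind of threshold an agent could then pass to the listing-filter step that follows.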

In the third case, Manus became a professional equity analyst.
Asked to analyze the correlation between the stock prices of Nvidia, Marvell Technology, and TSMC over the past 3 years, Manus accesses authoritative data sources through APIs. After validating the data, it writes code for data analysis and visualization.
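As a rough idea of what the analysis step involves, here is a minimal sketch that computes pairwise Pearson correlations of daily returns using only the standard library. The article does not show Manus’s actual code or data source, so the function names are hypothetical and the price series are synthetic placeholders, not real quotes.

```python
from itertools import combinations
from math import sqrt


def daily_returns(prices):
    """Simple day-over-day percentage returns for a list of closing prices."""
    return [(b - a) / a for a, b in zip(prices, prices[1:])]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlation_matrix(price_series):
    """Correlation of daily returns for every pair of named price series."""
    returns = {name: daily_returns(p) for name, p in price_series.items()}
    return {
        (a, b): pearson(returns[a], returns[b])
        for a, b in combinations(returns, 2)
    }


if __name__ == "__main__":
    # Synthetic placeholder prices; a real run would load validated
    # historical closing prices from a market-data API instead.
    prices = {
        "stock_a": [100.0, 102.0, 101.0, 105.0, 107.0],
        "stock_b": [50.0, 51.5, 50.5, 52.0, 53.5],
        "stock_c": [80.0, 79.0, 81.0, 80.5, 82.0],
    }
    for pair, rho in correlation_matrix(prices).items():
        print(pair, round(rho, 3))
```

Correlating returns rather than raw prices is a common choice, because two series that both trend upward can look strongly correlated even when their day-to-day moves are unrelated.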

Once the data has been analyzed and visualized, Manus can also build a website from it. With the user’s permission, it can deploy the website online and provide a shareable link.

After trying Manus, @DavidAIinchina praised it highly, calling it an “incredible use case”.
According to the company, the demos above are just the tip of the iceberg of Manus’s abilities.
The official website (https://manus.im/usecases) also shares more examples of Manus handling real-world tasks. From personalized travel planning, in-depth stock analysis, and insurance policy comparisons to supplier procurement, financial report analysis, and professional data collation, Manus has you covered.

Although Manus is not yet fully open, it has taken the Internet by storm. On major platforms, users flooded comment sections late at night asking for invitation codes, which shows how popular it is.
In the GAIA benchmark, which evaluates how well general-purpose AI assistants solve real-world problems, Manus achieved state-of-the-art (SOTA) results at all three difficulty levels.
To ensure the results are reproducible, Manus was evaluated using exactly the same configuration as its released version.

Beyond benchmarks, Manus has solved real-world problems on platforms like Upwork and Fiverr, and has proven itself in Kaggle competitions.
None of this would be possible without the excellent open-source community, so the team also hopes to give back to it.
Manus runs as a multi-agent system driven by multiple independent models. Later this year, the team plans to open-source some of these models, especially those post-trained for Manus.

02
A Chinese team, several hit products, millions of users
So who is behind this industry-shaking product?
It is reported that Xiao Hong, the founder behind Manus AI, is a 2015 alumnus of Huazhong University of Science and Technology majoring in software engineering.
After graduation, he went on to start his own businesses. In 2015, he founded Nightingale Technology and launched “Yiban Assistant” and “WeCompanion Assistant”, which served more than 2 million business users and attracted investment from Tencent and ZhenFund.
Xiao Hong is also tied to a more recognizable AI product: Monica.
It’s an AI assistant that claims to be All-in-One, and was originally launched as a browser add-on.

By integrating with mainstream models (such as Claude 3.5, DeepSeek, etc.), Monica provides chat, translation, copywriting and other functions, and users can create customized tools through natural language and share them to Tool Square.
Monica focused on overseas markets early on, grew to more than one million users, and became a leading product among AI plug-ins.
In February this year, Monica’s Chinese version (monica.cn) was launched for beta testing and is currently available for free to domestic users. This version is based on the DeepSeek R1 and V3 models, offers deep reasoning capabilities, and supports memory and real-time online search.

03
Manus’s technical philosophy: less structure, more intelligence
Manus’s technical philosophy is somewhat different from the mainstream, which is “less structure, more intelligence.”
They believe that when the data is good enough, the model is strong enough, the architecture is flexible enough, and the engineering is solid enough, capabilities such as computer use, deep research, and coding agent will naturally emerge without the need to be designed as a specific product feature.
GPT-4-Turbo, often held up as a product of the “scale works miracles” approach, averages less than 7% on the public GAIA leaderboard, and even solutions built on complex multi-agent systems reach only 40%. By comparison, Manus’s performance can be described as “far ahead”.

In a recent interview with Zhang Xiaojun, founder Xiao Hong also talked about Manus, an agent product that had not yet been released at that time.
“It looks like it’s really supposed to be a chatbot, which is very much in line with everyone’s imagination, but at the same time, it’s very complex on the application side, unlike Monica, just using different models is quite complicated.”
Xiao Hong also divides current AI applications into two categories: those that fill gaps left by mainstream products, and those that offer unique solutions for specific scenarios.
Perplexity (which provides online search capabilities) and Monica (in the form of a browser plug-in) fall into the first category, filling in the gaps left by existing products.
The second category, new scenarios driven by models, appears mainly in images and video and is powered directly by advances in model technology. Products like Pika and Runway leverage the power of models to create new use cases.
Some users joked that Manus shows “a wrapper taken to the extreme is awesome”. In fact, Xiao Hong has never been afraid to let users know that his products run on other companies’ models. Last year, he compared Monica to consumer electronics and put ChatGPT’s logo on its official website.
04
A new era of human-computer interaction has arrived, but don’t rush to put Manus on the AGI altar
In early 2024, APPSO predicted that large models would become the new operating system for smartphones, and that natural user interfaces (NUIs) would gradually replace existing graphical user interfaces (GUIs).
An important entry point for this new interaction is the Agent.
Last year, we saw similar cases at many phone launches. vivo’s press conference showcased “Phone GPT”, which can order food with AI; Huawei showed Hongmeng’s (HarmonyOS’s) Xiaoyi and its intent framework; and there were Honor’s YOYO agent and Zhipu’s AutoGLM. The core idea is the same in each:
Let the AI mimic the human Plan-Do-Check-Act cycle and operate the device the way a person would.
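To make the Plan-Do-Check-Act idea concrete, here is a toy sketch of such a loop. It does not reflect how any of the products above are implemented. The function `pdca_loop` and the four `toy_*` callables are hypothetical, and the “task” is just reaching a target sum.

```python
def pdca_loop(goal, plan, do, check, act, max_cycles=5):
    """Run Plan-Do-Check-Act until `check` passes or cycles run out.

    plan(goal)               -> initial list of steps
    do(steps)                -> result of executing the steps
    check(goal, result)      -> True if the result satisfies the goal
    act(goal, steps, result) -> revised steps for the next cycle
    """
    steps = plan(goal)
    history = []
    for _ in range(max_cycles):
        result = do(steps)
        ok = check(goal, result)
        history.append((list(steps), result, ok))
        if ok:
            return result, history
        # Act: adjust the plan based on what the check revealed.
        steps = act(goal, steps, result)
    return None, history


# Toy task: reach a target total by adding numbers.
def toy_plan(goal):
    return [3, 3]  # a deliberately incomplete first plan


def toy_do(steps):
    return sum(steps)


def toy_check(goal, result):
    return result == goal


def toy_act(goal, steps, result):
    return steps + [goal - result]  # add whatever is still missing


if __name__ == "__main__":
    result, history = pdca_loop(10, toy_plan, toy_do, toy_check, toy_act)
    print(result, history)
```

In a real agent, `do` would tap the screen or call an app, and `check` would inspect the resulting screen or response, but the control flow has the same shape.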
Zhang Peng, CEO of Zhipu AI, mentioned earlier that the current agent capability is more like adding an intelligent scheduling layer between users and applications, linking all applications and even all devices.

This can be regarded as a prototype of LLM OS, a general-purpose operating system built on large models, and it will have a great impact on the form of human-computer interaction. Andrej Karpathy, a founding member of OpenAI and a leading AI expert, has also spoken several times about the large language model operating system (LLM OS).
He sees the large model as a new kind of computer and operating system that can connect all kinds of software and hardware, as well as peripherals spanning every modality of information, and perform tasks through function calls.
In a traditional operating system, a set of peripherals is built around the CPU, such as a mouse and keyboard, disk storage, and cache. In LLM OS, the large model itself is the CPU.
I/O peripherals are no longer limited to mice and keyboards, since LLMs can take in and produce data in more modalities. Meanwhile, the external tools the large model uses will be upgraded from traditional software to agent tools.
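The “function call” mechanism at the center of this picture can be illustrated with a small sketch: the model emits a structured request, and a dispatcher routes it to a registered tool. This is a generic pattern under stated assumptions, not Karpathy’s design or any vendor’s API. The registry `TOOLS`, the `tool` decorator, the `dispatch` function, and the JSON shape with `name` and `arguments` keys are all hypothetical.

```python
import json
from typing import Any, Callable, Dict

# Registry of tools the "operating system" exposes to the model.
TOOLS: Dict[str, Callable[..., Any]] = {}


def tool(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Register a function under its own name so the model can call it."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def add(a: float, b: float) -> float:
    return a + b


@tool
def word_count(text: str) -> int:
    return len(text.split())


def dispatch(call_json: str) -> Any:
    """Execute one function call emitted by the model as a JSON string."""
    call = json.loads(call_json)
    name = call["name"]
    if name not in TOOLS:
        # Return the error as data so the model could see it and re-plan.
        return {"error": f"unknown tool: {name}"}
    return TOOLS[name](**call.get("arguments", {}))


if __name__ == "__main__":
    # In a real system this string would come from the model's output.
    print(dispatch('{"name": "add", "arguments": {"a": 2, "b": 3}}'))
```

Swapping the toy functions for wrappers around apps, devices, or other agents gives the “peripherals” described above, while the dispatcher plays the role of the system-call layer.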
Cross-application operation is the critical piece: it means the agent can carry out more complex, autonomous, and coherent operations, and it may also pave the way to real commercialization. Whether the services of different Internet companies can be connected may be the biggest obstacle to this kind of interaction in the future.
For now, however, many AI assistants act on the user’s behalf by invoking the phone’s accessibility permissions to simulate screen taps.
The advent of Manus means that the AI in Agent mode can understand the requirements and work independently until the task is completed. This is undoubtedly a big step forward in the field of human-machine interaction, and it shows us the potential of AI to transform from a tool to a partner.
But it’s too early to say we have one foot in the door of AGI. Xiao Hong himself has said that early agents are more like “feature phones” that need constant iteration and improvement. Today’s agents still depend on stronger models and better virtual environment support before they can truly handle all kinds of long-tail tasks.
To borrow an analogy from intelligent driving, this is probably equivalent to moving from L2 to L3 assisted driving. While Manus performs well in the GAIA benchmark, that doesn’t mean it has all the characteristics of artificial general intelligence. There is still a long road to AGI, and multiple challenges need to be addressed, such as model capability, self-learning, and task generalization.
But thanks to Manus’s breakthroughs in autonomy and versatility, there is one more star lighting our way on the great voyage to AGI.
Author: shiyun, 张勇毅
Source: AI Agent 的「GPT 时刻」,Manus 炸醒整个 AI 圈! (The “GPT moment” for AI agents: Manus jolts the entire AI world awake!)