Apr 9, 2026 · 4 min read

Why the next Agent OS will center on CLI + Skills

GUI was the right answer when the main user of software was a human being who needed translation. Agents change that assumption. Once the operator is a system that thinks in language, tools, parameters, and structured state, the center of interaction starts to move.

If there is one historical parallel worth keeping in mind, it is not "a smarter chatbot." It is the GUI revolution itself: the moment Windows and Mac OS displaced DOS and moved the center of interaction away from command syntax and onto the screen. The next shift may move the center again: voice and text become the natural entry point, CLI + Skills become the execution substrate, and GUI stops being the primary site of operation.

1. History

GUI was the right answer for the last era

Early command-line systems were efficient, but they assumed a trained operator. Personal computing changed the user base. Computers were no longer built only for engineers, operators, and specialists; they were being opened to office workers, students, designers, and families.

For those users, the main problem was not raw capability. It was translation. They needed visible objects, consistent controls, a lower memory burden, and a safer way to issue machine instructions. GUI solved that problem. That is why it won.

The important point is that GUI did not replace the machine contract underneath. Software still ran on commands, files, processes, RPCs, and database updates. GUI reorganized those capabilities into something ordinary users could learn and trust. For forty years, that was the correct optimization target, because humans were the bottleneck.

2. Now

Why agents need CLI

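Agents can drive a GUI through browser automation or computer-use models that look at the screen and click buttons. But that is a compatibility technique, not an ideal substrate. Compared with calling an explicit capability directly, it is slower, more brittle, more dependent on page details, and more likely to turn a simple task into a long chain of unstable steps.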

Agents perform best when actions are named, parameters are typed, output is structured, and failure states are visible. CLI offers exactly that. It gives the model stable commands, clear input parameters, parseable output, and a way to chain capabilities together without guessing what a button means today.

Put differently, the reason we keep returning to CLI is not nostalgia for black-screen terminals. It is that CLI provides a low-ambiguity execution contract with at least four properties:

▸ Actions and parameters are explicitly callable, not buried inside a visual hierarchy.
▸ Output is structured, so the next step can parse, validate, and reuse it directly.
▸ Capabilities compose, so multi-step workflows are not trapped inside any single product's UI.
▸ Failure modes, permission scopes, and logs are all inspectable, so supervision, replay, and recovery become tractable.
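
A minimal sketch of what such a contract could look like, written in Python with the standard argparse module. The tool name invoicectl, its subcommands, the in-memory INVOICES data, and the exit-code values are all hypothetical; they are chosen only to illustrate named actions, typed parameters, structured output, and visible failure states.

```python
import argparse
import json

# Hypothetical in-memory state standing in for a real application's data.
INVOICES = {"inv-001": {"customer": "acme", "amount_cents": 12500, "status": "open"}}

# Distinct exit codes (assumed values) make each failure state visible to the caller.
EXIT_OK, EXIT_USAGE, EXIT_NOT_FOUND, EXIT_REJECTED = 0, 2, 3, 4


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="invoicectl")
    commands = parser.add_subparsers(dest="command", required=True)

    show = commands.add_parser("show", help="print one invoice")
    show.add_argument("--id", required=True)

    refund = commands.add_parser("refund", help="refund part of an invoice")
    refund.add_argument("--id", required=True)
    refund.add_argument("--amount-cents", type=int, required=True)  # typed parameter
    return parser


def run(argv: list[str]) -> tuple[int, dict]:
    """Execute one command and return (exit_code, structured_result)."""
    try:
        args = build_parser().parse_args(argv)
    except SystemExit as exc:
        # argparse exits with status 2 on malformed input (and 0 on --help).
        code = exc.code if isinstance(exc.code, int) else EXIT_USAGE
        return code, {"ok": code == EXIT_OK, "error": None if code == EXIT_OK else "usage"}

    invoice = INVOICES.get(args.id)
    if invoice is None:
        return EXIT_NOT_FOUND, {"ok": False, "error": "not_found", "id": args.id}

    if args.command == "refund":
        if not 0 < args.amount_cents <= invoice["amount_cents"]:
            return EXIT_REJECTED, {"ok": False, "error": "invalid_amount", "id": args.id}
        invoice["amount_cents"] -= args.amount_cents

    return EXIT_OK, {"ok": True, "id": args.id, "invoice": dict(invoice)}


def main(argv: list[str]) -> int:
    """Print the result as one JSON object on stdout and return the exit code."""
    code, result = run(argv)
    print(json.dumps(result, sort_keys=True))
    return code
```

Under these assumptions, calling main(["show", "--id", "inv-404"]) prints a JSON object whose "error" field is "not_found" and returns 3, so a caller can branch on the failure without interpreting prose or pixels.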

An API and a CLI are, in essence, two surfaces over the same executable contract. But in desktop software, local tools, and mixed human-agent workflows, CLI remains the most practical way to expose that contract. It is scriptable, reviewable, composable, and easy to adopt incrementally, without first rebuilding the whole product.
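
To make "scriptable, reviewable, composable" a little more concrete, here is a hedged Python sketch of how an agent-side runner might chain capabilities and keep a log for later review. The names Step, RunLog, run_pipeline, export_report, and upload_file are illustrative assumptions, as is the convention that every step returns a dictionary with an "ok" field.

```python
import json
from dataclasses import dataclass, field
from typing import Callable

# Assumed convention: each capability takes and returns structured data.
Step = Callable[[dict], dict]


@dataclass
class RunLog:
    """Records every step's input and output so a run can be reviewed later."""
    entries: list[dict] = field(default_factory=list)

    def record(self, name: str, inputs: dict, output: dict) -> None:
        self.entries.append({"step": name, "input": inputs, "output": output})


def run_pipeline(steps: list[tuple[str, Step]], payload: dict, log: RunLog) -> dict:
    """Feed each step's output into the next; stop at the first reported failure."""
    for name, step in steps:
        output = step(payload)
        log.record(name, payload, output)
        if not output.get("ok", False):
            return output
        payload = output
    return payload


# Hypothetical capabilities that could belong to two different tools.
def export_report(params: dict) -> dict:
    return {"ok": True, "path": f"/tmp/{params['name']}.csv", "rows": 42}


def upload_file(params: dict) -> dict:
    if params["rows"] == 0:
        return {"ok": False, "error": "empty_file"}
    filename = params["path"].rsplit("/", 1)[-1]
    return {"ok": True, "url": f"https://files.example/{filename}"}


log = RunLog()
result = run_pipeline(
    [("export", export_report), ("upload", upload_file)], {"name": "q1"}, log
)
print(json.dumps(log.entries, indent=2))
```

Because each step only sees structured data, the two capabilities do not need to know about each other, and the log contains enough to inspect what was attempted at every step.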

This changes product economics. As more software demand is mediated by agents, a feature that exists only inside a deep GUI flow becomes harder to reach, harder to recommend, and harder to compose into a workflow. What used to be a UX detail becomes a distribution problem.

3. Next

What happens to GUI in an Agent OS

The next Agent OS will not look like a desktop copied into a chat window. Its primary interaction model will be language: speaking, typing, delegating, confirming. Underneath that, the system will need a real execution layer made of callable tools. That is where CLI + Skills comes in.

CLI provides the executable surface. Skills provide discovery, packaging, typed metadata, permissions, and reuse. Together they turn scattered capabilities into something an agent can reliably find, install, invoke, and combine.
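
One way to picture that split is the Python sketch below. It is not the format of any real Skills system: the Skill manifest fields, the SkillRegistry class, the permission string "billing:write", and the invoicectl command are all hypothetical, and are meant only to show discovery, typed metadata, and permission checks sitting on top of a CLI invocation.

```python
from dataclasses import dataclass


@dataclass
class Skill:
    """Hypothetical manifest: what a capability is, how to call it, what it may touch."""
    name: str
    description: str
    command: tuple[str, ...]            # underlying CLI invocation prefix
    params: dict[str, type]             # typed metadata for each parameter
    permissions: frozenset[str] = frozenset()


class SkillRegistry:
    def __init__(self) -> None:
        self._skills: dict[str, Skill] = {}

    def install(self, skill: Skill) -> None:
        self._skills[skill.name] = skill

    def discover(self, keyword: str) -> list[Skill]:
        """Return installed skills whose name or description mentions the keyword."""
        needle = keyword.lower()
        return [
            s for s in self._skills.values()
            if needle in s.name.lower() or needle in s.description.lower()
        ]

    def prepare(self, name: str, args: dict, granted: set[str]) -> list[str]:
        """Check permissions and argument types, then build the argv to execute."""
        skill = self._skills[name]
        missing = skill.permissions - granted
        if missing:
            raise PermissionError(f"{name} needs: {sorted(missing)}")
        argv = list(skill.command)
        for key, expected in skill.params.items():
            value = args[key]
            if not isinstance(value, expected):
                raise TypeError(f"{key} must be {expected.__name__}")
            argv += [f"--{key}", str(value)]
        return argv


registry = SkillRegistry()
registry.install(Skill(
    name="invoice-refund",
    description="Refund part of an invoice",
    command=("invoicectl", "refund"),
    params={"id": str, "amount-cents": int},
    permissions=frozenset({"billing:write"}),
))
argv = registry.prepare(
    "invoice-refund", {"id": "inv-001", "amount-cents": 2500}, {"billing:write"}
)
print(argv)
```

In this sketch the registry never executes anything itself; it only decides whether a call is allowed and well-typed, then hands back the command line, which keeps the permission check separate from the execution surface.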

GUI does not disappear, but its role changes. It stops feeling like a maze the user has to walk through layer by layer, and starts feeling like a results pane: it surfaces state, previews, diffs, and risk warnings, then asks for confirmation at the key checkpoints. In other words, the GUI that remains is no longer where users complete operations; it is where they understand outcomes and authorize milestones.
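
The checkpoint idea can be sketched in a few lines of Python. This is an illustration under assumptions, not a description of any existing product: ProposedChange, apply_with_checkpoint, the risk labels, and the confirm callback (which stands in for whatever a results pane would show the user) are all hypothetical. The diff itself comes from the standard difflib module.

```python
import difflib
from dataclasses import dataclass
from typing import Callable


@dataclass
class ProposedChange:
    """Hypothetical unit of work an agent wants approved before it is applied."""
    target: str
    before: str
    after: str
    risk: str  # assumed labels such as "low" or "high"

    def preview(self) -> str:
        """Render a unified diff, the kind of summary a results pane might show."""
        diff = difflib.unified_diff(
            self.before.splitlines(),
            self.after.splitlines(),
            fromfile=f"{self.target} (current)",
            tofile=f"{self.target} (proposed)",
            lineterm="",
        )
        return "\n".join(diff)


def apply_with_checkpoint(
    change: ProposedChange,
    store: dict[str, str],
    confirm: Callable[[str, str], bool],
) -> bool:
    """Apply the change only if the confirm callback approves the preview."""
    if not confirm(change.preview(), change.risk):
        return False
    store[change.target] = change.after
    return True


store = {"config.ini": "timeout = 30\nretries = 1"}
change = ProposedChange(
    target="config.ini",
    before=store["config.ini"],
    after="timeout = 30\nretries = 3",
    risk="low",
)
print(change.preview())
# Stand-in policy: auto-approve low-risk changes; a real UI would ask the user.
approved = apply_with_checkpoint(change, store, confirm=lambda preview, risk: risk == "low")
```

The agent does the work of producing the change; the human-facing surface only has to render the preview and the risk label and return a yes or no.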

That is why we expect future software to become flatter. Fewer nested controls. Fewer brittle workflows. Fewer screens whose only job is to manually walk the user through steps an agent could execute directly. The richer the agent layer becomes, the simpler the GUI can afford to be.

The point is not to drag humans back into terminal life; the point is that software now needs an execution contract that is honest with agents. Once that contract exists, voice and text can become the front door, CLI + Skills can become the machinery underneath, and GUI no longer has to pretend to be the whole operating model.

