什么是Agent

workflow

我们当前使用LLM的方式是Non-agentic workflow (zero-shot) 的，就是问LLM一个问题，让它从头到尾一次性把内容写完。

比如： Please type out an essay on topic X from start to finish in one go, without using backspace.

而Agent的工作方式是不断循环迭代的，就像一个人的工作流程一样。

Agentic workflow也许长这样:

Write an essay outline on topic X

Do you need any web research?

Write a first draft.

Consider what parts need revision or more research.

Revise your draft.

…

Agentic Reasoning Design Patterns

四个主要的模式，前两个属于robust technology，通常是有效的，能得到效果提升；后两个是emerging technology，并不总是生效，但有时会让得到非常惊艳的结果。

Reflection反思

例子：除了像prompt一个LLM一样让LLM为某个任务写代码，还在过程中与LLM有交互，比如在LLM给出代码后，继续追问说：“here’s code intended for a toas” 然后把LLM刚刚返回回来的代码再输入进去，然后让LLM继续 “the code carefully for correctness、efficiency、good construction”之类的。

只是简单把输入重新传给LLM，re-prompt它，就通常能得到很好的效果

Tool use

例子：写单测，然后运行

人在处理问题时会大量借助工具，比如分析工具（Wolfram Alpha、Bearly Code Interpreter、Code Execution）、搜索工具（Web browsing、Wikipedia、Productivity）、生产工具（Email、Calendar、Cloud Storage）、图像处理相关工具（Image generation、Image captioning、Object detection）等等

Planning

例子：一个来自HuggingGPT的例子，输入是一张小男孩在骑脚踏车的图片，要求生产一张新图片，是一个女孩子在读书，姿势与传入的示例图片一样，然后用我的声音讲述一下图片内容。

这需要一套解决方案，要使用好几套模型，并把他们串起来：先做Pose Determination，生成一个“openpose model”，然后找一个Pose-to-Image模型，生成小女生照片，最后调用Image-toText和Text-To-Speech。

Multi-agent collaboration

例子：两个agent，一个是coder agent、一个是Critic agent，然后两个agent对话。

搞个Agent玩一玩

参考文献

https://www.youtube.com/watch?v=sal78ACtGTc Andrew Ng对AI Agent的总结视频

Self-Refine: Iterative Refinement with Self-Feedback, Madaan et al. (2023)

Reflexion: Language Agents with Verbal Reinforcement Learning, Shinn et al. (2023)

Gorilla: Large Language Model Connected with Massive APIs, Patil et al. (2023)

MM-REACT: Prompting ChatGPT for Multimodal Reasoning and Action, Yang et al. (2023)

Chain-of-Thought Prompting Elicits Reasoning in Large Language Models, Wei et al. (2022)

HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face, Shen et al. (2023)

Communicative Agents for Software Development, Qian et al. (2023)

AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation, Wu et al. (2023)