Improve intro to agents

2024-12-30 12:23:19 +01:00 · 2024-12-30 12:23:19 +01:00 · 6f13e78aac
parent 95aae1ba85
commit 6f13e78aac
1 changed files with 36 additions and 25 deletions
--- a/docs/source/conceptual_guides/intro_agents.md
+++ b/docs/source/conceptual_guides/intro_agents.md
@@ -17,23 +17,21 @@ rendered properly in your Markdown viewer.

 ### What is an agent?

-Current LLMs are like basic reasoning robots that are trapped into a room.
-They take as input what we decide to provide to them. We pass notes under the door – be it text, or text with images for vision models, or even audio –, and they reply to each note by passing another note under the door, but they cannot do anything else.
-
 Any efficient system using AI will need to provide LLMs some kind of access to the real world: for instance the possibility to call a search tool to get external information, or to act on certain programs in order to solve a task.

-In other words, give them some agency. Agentic programs are the gateway to the outside world for LLMs.
+In other words, give them some ***agency***. Agentic programs are the gateway to the outside world for LLMs.

-Our definition of AI Agents is : “programs in which the workflow is determined by LLM outputs”. Any system leveraging LLMs will embed them into code. The influence of the LLM's input on the code workflow is the level of agency of LLMs in the system.
+For a rigorous definition, AI Agents are *“programs in which the workflow is determined by LLM outputs”*.
+
+Any system leveraging LLMs will integrate the LLM outputs into code. The influence of the LLM's output on the code workflow is the level of agency of LLMs in the system.

 Note that with this definition, "agent" is not a discrete, 0 or 1 definition: instead, "agency" evolves on a continuous spectrum, as you give more or less influence to the LLM on your workflow.

-
-If the output of the LLM has no impact on the workflow, as in a program that just postprocesses a LLM's output and returns it, this system is not agentic at all.
-
-If an LLM output is used to determine which branch of an `if/else` switch is ran, the system starts to have some level of agency: it's a router.
+- If the output of the LLM has no impact on the workflow, as in a program that just postprocesses an LLM's output and returns it, this system is not agentic at all.
+- If an LLM output is used to determine which branch of an `if/else` switch is run, the system starts to have some level of agency: it's a router.

 Then it can get more agentic.
+
 - If you use an LLM output to determine which function is run and with which arguments, that's tool calling.
 - If you use an LLM output to determine if you should keep iterating in a while loop, you get a multi-step agent.
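
The first levels of this spectrum can be sketched in a few lines. This is only an illustration under stated assumptions: `fake_llm` is a hypothetical stand-in that returns canned strings instead of calling a real model, and the tool names (`add`, `multiply`) are made up for the example.

```python
import json

def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real LLM call; returns canned outputs."""
    if "Classify" in prompt:
        return "sales"
    return json.dumps({"tool": "add", "arguments": {"a": 2, "b": 3}})

# Router: the LLM output decides which branch of the if/else runs.
def route(query: str) -> str:
    label = fake_llm(f"Classify this query as 'search' or 'sales': {query}")
    if label == "sales":
        return "contact_form"
    return "search_bar"

# Tool calling: the LLM output decides which function runs, and with which arguments.
TOOLS = {"add": lambda a, b: a + b, "multiply": lambda a, b: a * b}

def call_tool(query: str):
    call = json.loads(fake_llm(f"Pick a tool for: {query}"))
    return TOOLS[call["tool"]](**call["arguments"])

route_result = route("I want to talk to someone about group pricing")
tool_result = call_tool("What is 2 plus 3?")
print(route_result, tool_result)
```

In both functions the surrounding code is fixed; what changes between the two is how much of the control flow the model's output is allowed to decide.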

@@ -49,34 +47,46 @@ Since the system’s versatility goes in lockstep with the level of agency that

 Programs are not just tools anymore, confined to an ultra-specialized task : they are agents.

+One type of agentic system is quite simple: the multi-step agent. It has this structure:
+
+```python
+memory = [user_defined_task]
+while llm_should_continue(memory):  # this loop is the multi-step part
+    action = llm_get_next_action(memory)  # this is the tool-calling part
+    observations = execute_action(action)
+    memory += [action, observations]
+```
+
+This agentic system just runs in a loop, executing a new action at each step (the action can involve calling some pre-determined *tools* that are just functions), until its observations make it apparent that a satisfactory state has been reached to solve the given task.
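
A runnable variant of this loop might look like the sketch below. It makes several assumptions that are not in the guide: the LLM is replaced by a scripted stand-in (`llm_get_next_action`), the only tool is a hypothetical `get_weather` returning a canned forecast, the decision to stop is folded into a `final_answer` action rather than a separate `llm_should_continue` call, and a `max_steps` budget is added so the loop always terminates.

```python
def get_weather(day: str) -> str:
    """Hypothetical tool: returns a canned forecast instead of calling a real API."""
    return f"{day}: sunny, 1.5 m waves"

TOOLS = {"get_weather": get_weather}

def llm_get_next_action(memory: list) -> dict:
    """Scripted stand-in for the LLM: check the weather once, then answer."""
    already_called = any(
        isinstance(item, dict) and item["tool"] == "get_weather" for item in memory
    )
    if not already_called:
        return {"tool": "get_weather", "arguments": {"day": "Tuesday"}}
    return {"tool": "final_answer", "arguments": {"answer": f"Forecast found: {memory[-1]}"}}

def run_agent(task: str, max_steps: int = 5) -> str:
    memory = [task]
    for _ in range(max_steps):  # the multi-step part, with a step budget
        action = llm_get_next_action(memory)  # the tool-calling part
        if action["tool"] == "final_answer":
            return action["arguments"]["answer"]
        observation = TOOLS[action["tool"]](**action["arguments"])
        memory += [action, observation]
    return "Stopped: step budget exhausted"

answer = run_agent("Can I surf on Tuesday morning?")
print(answer)
```

The step budget is a design choice for the sketch: since the model decides when to stop, a hard cap keeps a confused model from looping forever.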
+

 ### When to use an agentic system ?

-Agents are useful when you need an LLM to help you determine the workflow of an app.
+Agents are useful when you need an LLM to determine the workflow of an app.

-It's advise to regularize towards not using any agentic behaviour.
-Ask yourself: do I really need flexibility in the workflow to efficiently solve the task at hand? If a fixed workflow would work, you might as well build it all in good old no-AI code for 100% robustness. Agents are useful when the ficed workflow is not enough.
+The question to ask is: "Do I really need flexibility in the workflow to efficiently solve the task at hand?"
+
+If a fixed workflow can work, you might as well build it all in good old no-AI code for 100% robustness. For the sake of simplicity and robustness, it's advised to regularize towards not using any agentic behaviour. Conversely, agents are useful when the fixed workflow is not sufficient.

 Let's take an example: say you're making an app that handles customer requests on a surfing trip website.

 You could know in advance that the requests will have to be classified in either of 2 buckets according to deterministic criteria, and you have a predefined workflow for each of these 2 cases.
 For instance, this is if you let the user click a button to determine their query, and it goes into either of these buckets:
-1. Want some knowledge on the trips. Then you give them access to a search bar to search your knowledge base
-2. Wants to talk to sales. Then you let them type in a contact form.

-If that deterministic workflow fits all queries, by all means just code verything: this will give you a 100% reliable system with no risk of error introduced by letting unpredictable LLMs meddle in your workflow.
+1. Wants some knowledge on the trips? ⇒ Then you give them access to a search bar to search your knowledge base.
+2. Wants to talk to sales? ⇒ Then you let them type in a contact form.

-But what if the workflow can't be determined that well in advance? Say, 10% or 20% of users requests do not fit properly into your rigid categories, and are thus not handled properly by your program?
+If that deterministic workflow fits all queries, by all means just code everything! This will give you a 100% reliable system with no risk of error introduced by letting unpredictable LLMs meddle in your workflow.

-Let's say, a user wants to ask : "I can come on Monday, but I forgot my passport so risk being delayed to Wednesday, is it possible to take me and my stuff to surf on Tuesday morning, with a concellation insurance?"
-This question into play many factors: availability of employees, weather, travelling distance, knowledge about cancellation policies...
-Probably none of the predetermined criteria above won't work properly on this question.
+But what if the workflow can't be determined that well in advance? Say, 20% or 40% of user requests do not fit properly into your rigid categories, and are thus not handled properly by your program?

-If these cases where the predetermined workflow falls short are frequent, that means you need more flexibility: making your system agentic will provide it that flexibility. In our example, you could just make a multi-step agent that has access to a weather API tool, a google maps API to compute travel distance, an employee availability dashboard and a RAG system on your knowledge base.
+For instance, a user wants to ask: "I can come on Monday, but I forgot my passport so risk being delayed to Wednesday, is it possible to take me and my stuff to surf on Tuesday morning, with a cancellation insurance?" This question hinges on many factors, and probably none of the predetermined criteria above will be sufficient for this request.

-Actually, most real-life tasks do not fit in a pre-determined workflow. This is why until today, our programs where always focused on infinitely narrow tasks, like "compute the sum of these numbers" or "find the shortest path in this graph". 
+If the pre-determined workflow falls short too often, that means you need more flexibility, which is just what an agentic setup provides. In the above example, you could just make a multi-step agent that has access to a weather API tool, a Google Maps API to compute travel distance, an employee availability dashboard and a RAG system on your knowledge base.

-Agentic systems are a great way to introduce the vast world of real-world tasks to programs!
+Until recently, computer programs were restricted to pre-determined workflows (with possible piles of if/else switches), thus focused on extremely narrow tasks, like "compute the sum of these numbers" or "find the shortest path in this graph".
+
+But actually, most real-life tasks are like our trip example above: they do not fit in pre-determined workflows. Agentic systems open up the vast world of real-world tasks to programs!

 ### Why `smolagents`?

@@ -98,11 +108,12 @@ These will not be that straightforward to implement correctly, especially not to

 ### Code agents

-[Multiple](https://huggingface.co/papers/2402.01030) [research](https://huggingface.co/papers/2411.01747) [papers](https://huggingface.co/papers/2401.00812) have shown that having the LLM write its actions (the tool calls) in code is much better than the current standard format for tool calling, which is across the industry different shades of "writing actions as a JSON of tools names and arguments to use".
+[Multiple](https://huggingface.co/papers/2402.01030) [research](https://huggingface.co/papers/2411.01747) [papers](https://huggingface.co/papers/2401.00812) have shown that having the LLM write its actions (the tool calls) in code is much better than the current standard format for tool calling, which across the industry is some variation of "writing actions as a JSON of tool names and arguments to use, which you then parse to know which tool to execute and with which arguments".

-Why is code better? Well, because we crafted our code languages specifically to be great at expressing actions performed by a computer. If JSON snippets was a better way, this package would have been written in JSON snippets and the devil would be laughing at us.
+Why is code better? Well, because we crafted our code languages specifically to be great at expressing actions performed by a computer. If JSON snippets were a better way, JSON would be the top programming language and programming would be hell on earth.
+
+Code has better:

-Code is just a better way to express actions on a computer. It has better:
 - **Composability:** could you nest JSON actions within each other, or define a set of JSON actions to re-use later, the same way you could just define a python function?
 - **Object management:** how do you store the output of an action like `generate_image` in JSON?
 - **Generality:** code is built to express simply anything you can do have a computer do.
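
To make the composability and object-management points concrete, here is a small contrast between the two action formats. It is a sketch, not how any particular library executes actions: `search` and `summarize` are hypothetical tools with canned behaviour, and the code action is run with a bare `exec`, which is unsafe outside of a toy example and would need sandboxing in a real system.

```python
import json

def search(query: str) -> list[str]:
    """Hypothetical tool returning canned results."""
    return [f"{query} result {i}" for i in range(3)]

def summarize(texts: list[str]) -> str:
    """Hypothetical tool: a fake summary of the given texts."""
    return f"{len(texts)} results, first: {texts[0]}"

TOOLS = {"search": search, "summarize": summarize}

# JSON-style action: one flat tool call per LLM turn.
json_action = '{"tool": "search", "arguments": {"query": "surf spots"}}'
call = json.loads(json_action)
results = TOOLS[call["tool"]](**call["arguments"])
# To summarize, the results would have to go back to the LLM,
# which would then emit a second JSON action.

# Code-style action: the output of one tool is stored in a variable
# and passed to the next tool within a single action.
code_action = """
results = search("surf spots")
summary = summarize(results)
"""
namespace = dict(TOOLS)  # expose the tools as plain functions
exec(code_action, namespace)  # illustration only: no sandboxing here
summary = namespace["summary"]
print(summary)
```

The JSON version needs one model turn per tool call, while the code version chains both calls and keeps the intermediate object in an ordinary variable.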