Refactor documentation

Aymeric 2024-12-20 16:15:06 +01:00
parent 7a1c6bce81
commit 7b0b01d8f3
5 changed files with 119 additions and 127 deletions

View File

@@ -1,25 +1,28 @@
-- sections:
-  - local: index
+- title: Get started
+  sections:
+  - local: index
     title: 🤗 Agents
   - local: quicktour
-    title: Quick tour
-  title: Get started
-- sections:
-  - local: building_good_agents
+    title: ⏱️ Quick tour
+- title: Tutorials
+  sections:
+  - local: tutorials/building_good_agents
     title: Building good agents
-  - local: tools
+  - local: tutorials/tools
     title: 🛠️ Tools - in-depth guide
-  title: Tutorials
-- sections:
-  - local: intro_agents
-    title: An introduction to agentic systems
-  title: Conceptual guides
-- sections:
-  - local: text_to_sql
+- title: Conceptual guides
+  sections:
+  - local: conceptual_guides/intro_agents
+    title: 🤖 An introduction to agentic systems
+  - local: conceptual_guides/react
+    title: 🤔 ReAct agents
+- title: Examples
+  sections:
+  - local: examples/text_to_sql
     title: Text-to-SQL
-  title: Examples
-- sections:
-  - sections:
-    - local: main_classes/agent
-      title: Agents and Tools
-    title: Main Classes
+- title: Reference
+  sections:
+  - local: reference/agents
+    title: Agent-related objects
+  - local: reference/tools
+    title: Tool-related objects

View File

@@ -27,60 +27,25 @@ An agent is a system that uses an LLM as its engine, and it has access to functions
 These *tools* are functions for performing a task, and they contain all necessary description for the agent to properly use them.
 
-The agent can be programmed to:
-- devise a series of actions/tools and run them all at once, like the [`CodeAgent`]
-- plan and execute actions/tools one by one and wait for the outcome of each action before launching the next one, like the [`JsonAgent`]
-
-### Types of agents
-
-#### Code agent
-
-This agent has a planning step, then generates python code to execute all its actions at once. It natively handles different input and output types for its tools, thus it is the recommended choice for multimodal tasks.
-
-#### React agents
-
-This is the go-to agent to solve reasoning tasks, since the ReAct framework ([Yao et al., 2022](https://huggingface.co/papers/2210.03629)) makes it really efficient to think on the basis of its previous observations.
-
-We implement two versions of JsonAgent:
-- [`JsonAgent`] generates tool calls as a JSON in its output.
-- [`CodeAgent`] is a new type of JsonAgent that generates its tool calls as blobs of code, which works really well for LLMs that have strong coding performance.
-
-> [!TIP]
-> Read [Open-source LLMs as LangChain Agents](https://huggingface.co/blog/open-source-llms-as-agents) blog post to learn more about ReAct agents.
-
-<div class="flex justify-center">
-    <img
-        class="block dark:hidden"
-        src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/Agent_ManimCE.gif"
-    />
-    <img
-        class="hidden dark:block"
-        src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/Agent_ManimCE.gif"
-    />
-</div>
-
-![Framework of a React Agent](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/blog/open-source-llms-as-agents/ReAct.png)
-
-For example, here is how a ReAct Code agent would work its way through the following question.
+For example, here is how a Code agent with access to a `web_search` tool would work its way through the following question.
 
 ```py3
 agent.run(
-    """How many more blocks (also denoted as layers) in BERT base encoder than the encoder from the architecture
-proposed in Attention is All You Need?"""
+    """How many more blocks (also denoted as layers) are there in BERT base encoder than in the encoder from the architecture proposed in Attention is All You Need?"""
 )
 ```
 
 ```text
 =====New task=====
-How many more blocks (also denoted as layers) in BERT base encoder than the encoder from the architecture proposed in Attention is All You Need?
+How many more blocks (also denoted as layers) are there in BERT base encoder than in the encoder from the architecture proposed in Attention is All You Need?
 
 ====Agent is executing the code below:
-bert_blocks = search(query="number of blocks in BERT base encoder")
+bert_blocks = web_search(query="number of blocks in BERT base encoder")
 print("BERT blocks:", bert_blocks)
 ====
 Print outputs:
 BERT blocks: twelve encoder blocks
 
 ====Agent is executing the code below:
-attention_layer = search(query="number of layers in Attention is All You Need")
+attention_layer = web_search(query="number of layers in Attention is All You Need")
 print("Attention layers:", attention_layer)
 ====
 Print outputs:

@@ -459,4 +424,4 @@ with gr.Blocks() as demo:
 
 if __name__ == "__main__":
     demo.launch()
 ```
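For readers following along, here is a minimal sketch of the setup that the `agent.run(...)` call above presupposes. The names `CodeAgent`, `HfApiEngine`, and `load_tool` all appear elsewhere in this diff, but the exact constructor arguments (`tools`, `llm_engine`) and the `"web_search"` tool identifier are assumptions, not a verbatim API.

```py3
# A hedged sketch, not the library's confirmed API: class and helper names come
# from this diff; constructor arguments and the tool identifier are assumptions.
from agents import CodeAgent, HfApiEngine, load_tool

web_search = load_tool("web_search")  # assumed tool identifier
agent = CodeAgent(tools=[web_search], llm_engine=HfApiEngine())

agent.run(
    """How many more blocks (also denoted as layers) are there in BERT base encoder than in the encoder from the architecture proposed in Attention is All You Need?"""
)
```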

View File

@@ -13,7 +13,7 @@ specific language governing permissions and limitations under the License.
 rendered properly in your Markdown viewer.
 -->
 
-# Agents & Tools
+# Agents
 
 <Tip warning={true}>

@@ -27,19 +27,16 @@ contains the API docs for the underlying classes.
 ## Agents
 
-We provide two types of agents, based on the main [`Agent`] class:
-- [`CodeAgent`] acts in one shot, generating code to solve the task, then executes it at once.
-- [`ReactAgent`] acts step by step, each step consisting of one thought, then one tool call and execution. It has two classes:
+Our agents inherit from [`ReactAgent`], which means they can act in multiple steps, each step consisting of one thought, then one tool call and execution. Read more in [this conceptual guide](../conceptual_guides/react).
+
+We provide two types of agents, based on the main [`Agent`] class.
 - [`JsonAgent`] writes its tool calls in JSON.
 - [`CodeAgent`] writes its tool calls in Python code.
 
-### Agent
+### BaseAgent
 
-[[autodoc]] Agent
-
-### CodeAgent
-
-[[autodoc]] CodeAgent
+[[autodoc]] BaseAgent
 
 ### React agents

@@ -53,35 +50,10 @@ We provide two types of agents, based on the main [`Agent`] class:
 [[autodoc]] ManagedAgent
 
-## Tools
-
-### load_tool
-
-[[autodoc]] load_tool
-
-### tool
-
-[[autodoc]] tool
-
-### Tool
-
-[[autodoc]] Tool
-
-### Toolbox
-
-[[autodoc]] Toolbox
-
-### launch_gradio_demo
-
-[[autodoc]] launch_gradio_demo
-
 ### stream_to_gradio
 
 [[autodoc]] stream_to_gradio
 
-### ToolCollection
-
-[[autodoc]] ToolCollection
-
 ## Engines

@@ -129,33 +101,3 @@ HfApiEngine()(messages, stop_sequences=["conversation"])
 ```
 
 [[autodoc]] HfApiEngine
-
-## Agent Types
-
-Agents can handle any type of object in-between tools; tools, being completely multimodal, can accept and return
-text, image, audio, video, among other types. In order to increase compatibility between tools, as well as to
-correctly render these returns in ipython (jupyter, colab, ipython notebooks, ...), we implement wrapper classes
-around these types.
-
-The wrapped objects should continue behaving as initially; a text object should still behave as a string, an image
-object should still behave as a `PIL.Image`.
-
-These types have three specific purposes:
-
-- Calling `to_raw` on the type should return the underlying object
-- Calling `to_string` on the type should return the object as a string
-- Displaying it in an ipython kernel should display the object correctly
-
-### AgentText
-
-[[autodoc]] agents.types.AgentText
-
-### AgentImage
-
-[[autodoc]] agents.types.AgentImage
-
-### AgentAudio
-
-[[autodoc]] agents.types.AgentAudio
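As context for the Engines section above: the `HfApiEngine()(messages, stop_sequences=["conversation"])` line in the hunk header suggests an engine is simply a callable over chat messages. Here is a sketch under that assumption; the precise expected signature is not confirmed by this diff.

```py3
# Sketch of a custom engine, assuming the callable contract implied by
# `HfApiEngine()(messages, stop_sequences=[...])`; the signature is an assumption.
def custom_engine(messages: list[dict], stop_sequences: list[str] | None = None) -> str:
    # Flatten the chat messages into a single prompt; a real engine would call
    # an LLM backend here instead of returning a canned reply.
    prompt = "\n".join(m["content"] for m in messages)
    completion = f"(echo) {prompt[:100]}"
    # Truncate at the first stop sequence, mirroring the stop_sequences argument.
    for stop in stop_sequences or []:
        completion = completion.split(stop)[0]
    return completion

custom_engine([{"role": "user", "content": "Hello"}], stop_sequences=["conversation"])
```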

View File

@@ -0,0 +1,82 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# Tools
<Tip warning={true}>
Transformers Agents is an experimental API which is subject to change at any time. Results returned by the agents
can vary as the APIs or underlying models are prone to change.
</Tip>
To learn more about agents and tools, make sure to read the [introductory guide](../index). This page
contains the API docs for the underlying classes.
## Tools
### load_tool
[[autodoc]] load_tool
### tool
[[autodoc]] tool
### Tool
[[autodoc]] Tool
### Toolbox
[[autodoc]] Toolbox
### launch_gradio_demo
[[autodoc]] launch_gradio_demo
### ToolCollection
[[autodoc]] ToolCollection
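A quick sketch of how the `tool` decorator listed above is typically used. The requirement of type hints plus an `Args:` section in the docstring is an assumption carried over from similar Hugging Face agent APIs, not something this diff confirms:

```py3
from agents import tool  # `tool` is listed in the reference above

@tool
def get_travel_time(origin: str, destination: str) -> str:
    """Returns an estimated travel time between two cities.

    Args:
        origin: The departure city.
        destination: The arrival city.
    """
    return f"About 3 hours from {origin} to {destination}."  # dummy result
```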
## Agent Types
Agents can pass any type of object between tools; tools, being completely multimodal, can accept and return
text, image, audio, video, and other types. To increase compatibility between tools, and to render these
returns correctly in IPython environments (Jupyter, Colab, IPython notebooks, ...), we implement wrapper classes
around these types.
The wrapped objects should continue to behave as the original type did; a text object should still behave as a string,
and an image object should still behave as a `PIL.Image`.
These types have three specific purposes:
- Calling `to_raw` on the type should return the underlying object
- Calling `to_string` on the type should return the object as a string: that can be the string in case of an `AgentText`
but will be the path of the serialized version of the object in other instances
- Displaying it in an ipython kernel should display the object correctly
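As a sketch of that contract in use: the constructor call below is an assumption, while the `to_raw`/`to_string` semantics are the three purposes just listed.

```py3
from agents.types import AgentText  # module path taken from the autodoc entries below

text = AgentText("twelve encoder blocks")
print(text.upper())      # still behaves like a plain string
print(text.to_raw())     # returns the underlying object -- here, the str itself
print(text.to_string())  # for AgentText the string; for AgentImage, a file path
```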
### AgentText
[[autodoc]] agents.types.AgentText
### AgentImage
[[autodoc]] agents.types.AgentImage
### AgentAudio
[[autodoc]] agents.types.AgentAudio

View File

@@ -197,8 +197,8 @@ agent.run(
 ```
 
-> [!WARNING]
-> Beware when adding tools to an agent that already works well because it can bias selection towards your tool or select another tool other than the one already defined.
+> [!TIP]
+> Beware of adding too many tools to an agent: this can overwhelm weaker LLM engines.
 
 Use the `agent.toolbox.update_tool()` method to replace an existing tool in the agent's toolbox.
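A short sketch of the `update_tool` call just mentioned, assuming it takes the replacement tool instance and matches the existing entry by name; this diff confirms only the method name.

```py3
from agents import tool

@tool
def web_search(query: str) -> str:
    """An improved web search tool.

    Args:
        query: The search query.
    """
    return f"Top results for: {query}"  # dummy implementation

# Assumes `agent` was created earlier with a tool already registered under
# the name "web_search"; the single-argument signature is an assumption.
agent.toolbox.update_tool(web_search)
```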