Refactor documentation

Aymeric 2024-12-20 16:15:06 +01:00
parent 7a1c6bce81
commit 7b0b01d8f3
5 changed files with 119 additions and 127 deletions

View File

@@ -1,25 +1,28 @@
-- sections:
-  - local: index
+- title: Get started
+  sections:
+  - local: index
     title: 🤗 Agents
   - local: quicktour
-    title: Quick tour
-  title: Get started
-- sections:
-  - local: building_good_agents
+    title: ⏱️ Quick tour
+- title: Tutorials
+  sections:
+  - local: tutorials/building_good_agents
     title: Building good agents
-  - local: tools
+  - local: tutorials/tools
     title: 🛠️ Tools - in-depth guide
-  title: Tutorials
-- sections:
-  - local: intro_agents
-    title: An introduction to agentic systems
-  title: Conceptual guides
-- sections:
-  - local: text_to_sql
+- title: Conceptual guides
+  sections:
+  - local: conceptual_guides/intro_agents
+    title: 🤖 An introduction to agentic systems
+  - local: conceptual_guides/react
+    title: 🤔 ReAct agents
+- title: Examples
+  sections:
+  - local: examples/text_to_sql
     title: Text-to-SQL
-  title: Examples
-- sections:
-  - sections:
-    - local: main_classes/agent
-      title: Agents and Tools
-    title: Main Classes
+- title: Reference
+  sections:
+  - local: reference/agents
+    title: Agent-related objects
+  - local: reference/tools
+    title: Tool-related objects

View File

@@ -27,60 +27,25 @@ An agent is a system that uses an LLM as its engine, and it has access to functions
 These *tools* are functions for performing a task, and they contain all necessary description for the agent to properly use them.
 
-The agent can be programmed to:
-- devise a series of actions/tools and run them all at once, like the [`CodeAgent`]
-- plan and execute actions/tools one by one and wait for the outcome of each action before launching the next one, like the [`JsonAgent`]
-
-### Types of agents
-
-#### Code agent
-
-This agent has a planning step, then generates python code to execute all its actions at once. It natively handles different input and output types for its tools, thus it is the recommended choice for multimodal tasks.
-
-#### React agents
-
-This is the go-to agent to solve reasoning tasks, since the ReAct framework ([Yao et al., 2022](https://huggingface.co/papers/2210.03629)) makes it really efficient to think on the basis of its previous observations.
-
-We implement two versions of JsonAgent:
-- [`JsonAgent`] generates tool calls as a JSON in its output.
-- [`CodeAgent`] is a new type of JsonAgent that generates its tool calls as blobs of code, which works really well for LLMs that have strong coding performance.
-
-> [!TIP]
-> Read [Open-source LLMs as LangChain Agents](https://huggingface.co/blog/open-source-llms-as-agents) blog post to learn more about ReAct agents.
-
-<div class="flex justify-center">
-    <img
-        class="block dark:hidden"
-        src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/Agent_ManimCE.gif"
-    />
-    <img
-        class="hidden dark:block"
-        src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/Agent_ManimCE.gif"
-    />
-</div>
-
-![Framework of a React Agent](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/blog/open-source-llms-as-agents/ReAct.png)
-
-For example, here is how a ReAct Code agent would work its way through the following question.
+For example, here is how a Code agent with access to a `web_search` tool would work its way through the following question.
 
 ```py3
 agent.run(
-    """How many more blocks (also denoted as layers) in BERT base encoder than the encoder from the architecture
-proposed in Attention is All You Need?"""
+    """How many more blocks (also denoted as layers) are there in BERT base encoder than in the encoder from the architecture proposed in Attention is All You Need?"""
 )
 ```
 
 ```text
 =====New task=====
-How many more blocks (also denoted as layers) in BERT base encoder than the encoder from the architecture proposed in Attention is All You Need?
+How many more blocks (also denoted as layers) are there in BERT base encoder than in the encoder from the architecture proposed in Attention is All You Need?
 
 ====Agent is executing the code below:
-bert_blocks = search(query="number of blocks in BERT base encoder")
+bert_blocks = web_search(query="number of blocks in BERT base encoder")
 print("BERT blocks:", bert_blocks)
 ====
 Print outputs:
 BERT blocks: twelve encoder blocks
 
 ====Agent is executing the code below:
-attention_layer = search(query="number of layers in Attention is All You Need")
+attention_layer = web_search(query="number of layers in Attention is All You Need")
 print("Attention layers:", attention_layer)
 ====
 Print outputs:

@@ -459,4 +424,4 @@ with gr.Blocks() as demo:
 
 if __name__ == "__main__":
     demo.launch()
 ```
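For readers following along, here is a minimal sketch of the setup that the `agent.run(...)` call above presupposes. The names `CodeAgent`, `HfApiEngine`, and `load_tool` all appear elsewhere in this diff, but the exact constructor arguments (`tools`, `llm_engine`) and the `"web_search"` tool identifier are assumptions, not a verbatim API.

```py3
# A hedged sketch, not the library's confirmed API: class and helper names come
# from this diff; constructor arguments and the tool identifier are assumptions.
from agents import CodeAgent, HfApiEngine, load_tool

web_search = load_tool("web_search")  # assumed tool identifier
agent = CodeAgent(tools=[web_search], llm_engine=HfApiEngine())

agent.run(
    """How many more blocks (also denoted as layers) are there in BERT base encoder than in the encoder from the architecture proposed in Attention is All You Need?"""
)
```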

View File

@@ -13,7 +13,7 @@ specific language governing permissions and limitations under the License.
 rendered properly in your Markdown viewer.
 -->
 
-# Agents & Tools
+# Agents
 
 <Tip warning={true}>

@@ -27,19 +27,16 @@ contains the API docs for the underlying classes.
 ## Agents
 
-We provide two types of agents, based on the main [`Agent`] class:
-- [`CodeAgent`] acts in one shot, generating code to solve the task, then executes it at once.
-- [`ReactAgent`] acts step by step, each step consisting of one thought, then one tool call and execution. It has two classes:
+Our agents inherit from [`ReactAgent`], which means they can act in multiple steps, each step consisting of one thought, then one tool call and execution. Read more in [this conceptual guide](../conceptual_guides/react).
+
+We provide two types of agents, based on the main [`Agent`] class.
 - [`JsonAgent`] writes its tool calls in JSON.
 - [`CodeAgent`] writes its tool calls in Python code.
 
-### Agent
+### BaseAgent
 
-[[autodoc]] Agent
-
-### CodeAgent
-
-[[autodoc]] CodeAgent
+[[autodoc]] BaseAgent
 
 ### React agents

@@ -53,35 +50,10 @@ We provide two types of agents, based on the main [`Agent`] class:
 [[autodoc]] ManagedAgent
 
-## Tools
-
-### load_tool
-
-[[autodoc]] load_tool
-
-### tool
-
-[[autodoc]] tool
-
-### Tool
-
-[[autodoc]] Tool
-
-### Toolbox
-
-[[autodoc]] Toolbox
-
-### launch_gradio_demo
-
-[[autodoc]] launch_gradio_demo
-
 ### stream_to_gradio
 
 [[autodoc]] stream_to_gradio
 
-### ToolCollection
-
-[[autodoc]] ToolCollection
-
 ## Engines

@@ -129,33 +101,3 @@ HfApiEngine()(messages, stop_sequences=["conversation"])
 ```
 
 [[autodoc]] HfApiEngine
-
-## Agent Types
-
-Agents can handle any type of object in-between tools; tools, being completely multimodal, can accept and return
-text, image, audio, video, among other types. In order to increase compatibility between tools, as well as to
-correctly render these returns in ipython (jupyter, colab, ipython notebooks, ...), we implement wrapper classes
-around these types.
-
-The wrapped objects should continue behaving as initially; a text object should still behave as a string, an image
-object should still behave as a `PIL.Image`.
-
-These types have three specific purposes:
-
-- Calling `to_raw` on the type should return the underlying object
-- Calling `to_string` on the type should return the object as a string
-- Displaying it in an ipython kernel should display the object correctly
-
-### AgentText
-
-[[autodoc]] agents.types.AgentText
-
-### AgentImage
-
-[[autodoc]] agents.types.AgentImage
-
-### AgentAudio
-
-[[autodoc]] agents.types.AgentAudio
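As context for the Engines section above: the `HfApiEngine()(messages, stop_sequences=["conversation"])` line in the hunk header suggests an engine is simply a callable over chat messages. Here is a sketch under that assumption; the precise expected signature is not confirmed by this diff.

```py3
# Sketch of a custom engine, assuming the callable contract implied by
# `HfApiEngine()(messages, stop_sequences=[...])`; the signature is an assumption.
def custom_engine(messages: list[dict], stop_sequences: list[str] | None = None) -> str:
    # Flatten the chat messages into a single prompt; a real engine would call
    # an LLM backend here instead of returning a canned reply.
    prompt = "\n".join(m["content"] for m in messages)
    completion = f"(echo) {prompt[:100]}"
    # Truncate at the first stop sequence, mirroring the stop_sequences argument.
    for stop in stop_sequences or []:
        completion = completion.split(stop)[0]
    return completion

custom_engine([{"role": "user", "content": "Hello"}], stop_sequences=["conversation"])
```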

View File

@@ -0,0 +1,82 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# Tools
<Tip warning={true}>
Transformers Agents is an experimental API which is subject to change at any time. Results returned by the agents
can vary as the APIs or underlying models are prone to change.
</Tip>
To learn more about agents and tools, make sure to read the [introductory guide](../index). This page
contains the API docs for the underlying classes.
## Tools
### load_tool
[[autodoc]] load_tool
### tool
[[autodoc]] tool
### Tool
[[autodoc]] Tool
### Toolbox
[[autodoc]] Toolbox
### launch_gradio_demo
[[autodoc]] launch_gradio_demo
### ToolCollection
[[autodoc]] ToolCollection
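A quick sketch of how the `tool` decorator listed above is typically used. The requirement of type hints plus an `Args:` section in the docstring is an assumption carried over from similar Hugging Face agent APIs, not something this diff confirms:

```py3
from agents import tool  # `tool` is listed in the reference above

@tool
def get_travel_time(origin: str, destination: str) -> str:
    """Returns an estimated travel time between two cities.

    Args:
        origin: The departure city.
        destination: The arrival city.
    """
    return f"About 3 hours from {origin} to {destination}."  # dummy result
```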
## Agent Types
Agents can pass any type of object between tools; tools, being completely multimodal, can accept and return
text, image, audio, video, and other types. To increase compatibility between tools, and to render these
returns correctly in IPython environments (Jupyter, Colab, IPython notebooks, ...), we implement wrapper classes
around these types.
The wrapped objects should continue to behave as the original type did; a text object should still behave as a string,
and an image object should still behave as a `PIL.Image`.
These types have three specific purposes:
- Calling `to_raw` on the type should return the underlying object
- Calling `to_string` on the type should return the object as a string: that can be the string in case of an `AgentText`
but will be the path of the serialized version of the object in other instances
- Displaying it in an ipython kernel should display the object correctly
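As a sketch of that contract in use: the constructor call below is an assumption, while the `to_raw`/`to_string` semantics are the three purposes just listed.

```py3
from agents.types import AgentText  # module path taken from the autodoc entries below

text = AgentText("twelve encoder blocks")
print(text.upper())      # still behaves like a plain string
print(text.to_raw())     # returns the underlying object -- here, the str itself
print(text.to_string())  # for AgentText the string; for AgentImage, a file path
```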
### AgentText
[[autodoc]] agents.types.AgentText
### AgentImage
[[autodoc]] agents.types.AgentImage
### AgentAudio
[[autodoc]] agents.types.AgentAudio

View File

@@ -197,8 +197,8 @@ agent.run(
 ```
 
-> [!WARNING]
-> Beware when adding tools to an agent that already works well because it can bias selection towards your tool or select another tool other than the one already defined.
+> [!TIP]
+> Beware of adding too many tools to an agent: this can overwhelm weaker LLM engines.
 
 Use the `agent.toolbox.update_tool()` method to replace an existing tool in the agent's toolbox.
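A short sketch of the `update_tool` call just mentioned, assuming it takes the replacement tool instance and matches the existing entry by name; this diff confirms only the method name.

```py3
from agents import tool

@tool
def web_search(query: str) -> str:
    """An improved web search tool.

    Args:
        query: The search query.
    """
    return f"Top results for: {query}"  # dummy implementation

# Assumes `agent` was created earlier with a tool already registered under
# the name "web_search"; the single-argument signature is an assumption.
agent.toolbox.update_tool(web_search)
```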