Improve Readme: shorter intro, VLMs, MCP, etc (#459)

parent 318b9e6700
commit 44f94eaa2d

README.md | 129
@@ -32,14 +32,17 @@ limitations under the License.

`smolagents` is a library that enables you to run powerful agents in a few lines of code. It offers:

-✨ **Simplicity**: the logic for agents fits in ~thousand lines of code (see [agents.py](https://github.com/huggingface/smolagents/blob/main/src/smolagents/agents.py)). We kept abstractions to their minimal shape above raw code!
+✨ **Simplicity**: the logic for agents fits in 1,000 lines of code (see [agents.py](https://github.com/huggingface/smolagents/blob/main/src/smolagents/agents.py)). We kept abstractions to their minimal shape above raw code!

-🧑‍💻 **First-class support for Code Agents**, i.e. agents that write their actions in code (as opposed to "agents being used to write code"). To make it secure, we support executing in sandboxed environments via [E2B](https://e2b.dev/).
+🧑‍💻 **First-class support for Code Agents**. Our [`CodeAgent`](https://huggingface.co/docs/smolagents/reference/agents#smolagents.CodeAgent) writes its actions in code (as opposed to "agents being used to write code"). To make it secure, we support executing in sandboxed environments via [E2B](https://e2b.dev/).
+  - On top of this [`CodeAgent`](https://huggingface.co/docs/smolagents/reference/agents#smolagents.CodeAgent) class, we still support the standard [`ToolCallingAgent`](https://huggingface.co/docs/smolagents/reference/agents#smolagents.ToolCallingAgent) that writes actions as JSON/text blobs.

-🤗 **Hub integrations**: you can share and load Gradio Spaces as tools to/from the Hub, and more is to come!
+🤗 **Hub integrations**: you can [share/pull tools to/from the Hub](https://huggingface.co/docs/smolagents/reference/tools#smolagents.Tool.from_hub), and more is to come!

-🌐 **Support for any LLM**: it supports models hosted on the Hub loaded in their `transformers` version or through our inference API, but also supports models from OpenAI, Anthropic and many others via our [LiteLLM](https://www.litellm.ai/) integration.
+🌐 **Model-agnostic**: smolagents supports any LLM. It can be a local `transformers` or `ollama` model, one of [many providers on the Hub](https://huggingface.co/blog/inference-providers), or any model from OpenAI, Anthropic and many others via our [LiteLLM](https://www.litellm.ai/) integration.

+👁️ **Modality-agnostic**: Agents support text, vision, video, even audio inputs! Cf [this tutorial](https://huggingface.co/docs/smolagents/examples/web_browser) for vision.

+🛠️ **Tool-agnostic**: you can use tools from [LangChain](https://huggingface.co/docs/smolagents/reference/tools#smolagents.Tool.from_langchain) or [Anthropic's MCP](https://huggingface.co/docs/smolagents/reference/tools#smolagents.ToolCollection.from_mcp); you can even use a [Hub Space](https://huggingface.co/docs/smolagents/reference/tools#smolagents.Tool.from_space) as a tool.

Full documentation can be found [here](https://huggingface.co/docs/smolagents/index).
@@ -51,7 +54,7 @@ Full documentation can be found [here](https://huggingface.co/docs/smolagents/in

- [Quick Demo](#quick-demo)
- [Command Line Interface](#command-line-interface)
- [Code Agents](#code-agents)
-- [How Smol is it Really?](#how-smol-is-it-really)
+- [How smol is this library?](#how-smol-is-this-library)
- [How Strong are Open Models for Agentic Workflows?](#how-strong-are-open-models-for-agentic-workflows)
- [Contributing](#contributing)
- [Citing smolagents](#citing-smolagents)
@@ -66,53 +69,131 @@ Then define your agent, give it the tools it needs and run it!

```py
from smolagents import CodeAgent, DuckDuckGoSearchTool, HfApiModel

-agent = CodeAgent(tools=[DuckDuckGoSearchTool()], model=HfApiModel())
+model = HfApiModel()
+agent = CodeAgent(tools=[DuckDuckGoSearchTool()], model=model)

agent.run("How many seconds would it take for a leopard at full speed to run through Pont des Arts?")
```

https://github.com/user-attachments/assets/cd0226e2-7479-4102-aea0-57c22ca47884

+Our library is LLM-agnostic: you could switch the example above to any inference provider.
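As an aside, the demo's answer can be sanity-checked by hand. The figures below are assumptions for illustration (they are not stated anywhere in this commit): a leopard's top speed of roughly 58 km/h and a bridge length of roughly 155 m.

```python
# Back-of-the-envelope check of the demo question, with assumed figures:
# leopard top speed ~58 km/h, Pont des Arts length ~155 m.
speed_m_per_s = 58 * 1000 / 3600  # convert km/h to m/s, ~16.1 m/s
bridge_length_m = 155
seconds = bridge_length_m / speed_m_per_s
print(round(seconds, 1))  # ~9.6
```

So a well-behaved agent should land on an answer of about ten seconds.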
+<details>
+<summary> <b>HfApiModel, gateway for 4 inference providers</b></summary>
+
+```py
+from smolagents import HfApiModel
+
+model = HfApiModel(
+    model_id="deepseek-ai/DeepSeek-R1",
+    provider="together",
+)
+```
+</details>
+<details>
+<summary> <b>LiteLLM to access 100+ LLMs</b></summary>
+
+```py
+import os
+
+from smolagents import LiteLLMModel
+
+model = LiteLLMModel(
+    "anthropic/claude-3-5-sonnet-latest",
+    temperature=0.2,
+    api_key=os.environ["ANTHROPIC_API_KEY"],
+)
+```
+</details>
+<details>
+<summary> <b>OpenAI-compatible servers</b></summary>
+
+```py
+import os
+
+from smolagents import OpenAIServerModel
+
+model = OpenAIServerModel(
+    model_id="deepseek-ai/DeepSeek-R1",
+    api_base="https://api.together.xyz/v1/",  # Leave this blank to query OpenAI servers.
+    api_key=os.environ["TOGETHER_API_KEY"],  # Switch to the API key for the server you're targeting.
+)
+```
+</details>
+<details>
+<summary> <b>Local `transformers` model</b></summary>
+
+```py
+from smolagents import TransformersModel
+
+model = TransformersModel(
+    model_id="Qwen/Qwen2.5-Coder-32B-Instruct",
+    max_new_tokens=4096,
+    device_map="auto",
+)
+```
+</details>
+<details>
+<summary> <b>Azure models</b></summary>
+
+```py
+import os
+
+from smolagents import AzureOpenAIServerModel
+
+model = AzureOpenAIServerModel(
+    model_id=os.environ.get("AZURE_OPENAI_MODEL"),
+    azure_endpoint=os.environ.get("AZURE_OPENAI_ENDPOINT"),
+    api_key=os.environ.get("AZURE_OPENAI_API_KEY"),
+    api_version=os.environ.get("OPENAI_API_VERSION"),
+)
+```
+</details>
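All of the model classes added above expose the same interface, so swapping providers is just a constructor change. The selector below is a hypothetical illustration of that pattern — it only maps config keys to class names and does not construct real models:

```python
# Hypothetical config-driven backend selector; class names match the
# smolagents classes shown above, but nothing is instantiated here.
MODEL_CLASSES = {
    "hf_api": "HfApiModel",
    "litellm": "LiteLLMModel",
    "openai_server": "OpenAIServerModel",
    "transformers": "TransformersModel",
    "azure": "AzureOpenAIServerModel",
}

def resolve_model_class(backend: str) -> str:
    """Return the class name for a config key, or raise ValueError."""
    try:
        return MODEL_CLASSES[backend]
    except KeyError:
        raise ValueError(
            f"Unknown backend {backend!r}; choose from {sorted(MODEL_CLASSES)}"
        ) from None

print(resolve_model_class("litellm"))  # LiteLLMModel
```

In real code you would then look the class up on the `smolagents` module and instantiate it with the provider-specific keyword arguments shown in the blocks above.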
## Command Line Interface

-You can accomplish multi-step agentic tasks using two commands: `smolagent` and `webagent`. `smolagent` is a more generalist command to run a multi-step CodeAgent that can be equipped with various tools, meanwhile `webagent` is an agent equipped with web browsing tools using [helium](https://github.com/mherrmann/helium).
+You can run agents from the CLI using two commands: `smolagent` and `webagent`. `smolagent` is a generalist command to run a multi-step `CodeAgent` that can be equipped with various tools, while `webagent` is a specific web-browsing agent using [helium](https://github.com/mherrmann/helium).

-**Web Browser in CLI**
+**Web Browser Agent in CLI**

-`webagent` allows users to automate web browsing tasks. It uses the Helium library to interact with web pages and uses defined tools to browse the web. Read more about it [here](https://github.com/huggingface/smolagents/blob/main/src/smolagents/vision_web_browser.py).
+`webagent` allows users to automate web browsing tasks. It uses the [helium](https://github.com/mherrmann/helium) library to interact with web pages and uses defined tools to browse the web. Read more about this agent [here](https://github.com/huggingface/smolagents/blob/main/src/smolagents/vision_web_browser.py).

Run the following command to get started:
```bash
webagent {YOUR_PROMPT_HERE} --model "LiteLLMModel" --model-id "gpt-4o"
```

-A good example command to get started is `$ webagent --prompt "go to xyz.com/women, get to sale section, click the first clothing item you see. Get the product details, and the price, return them. note that I'm shopping from France"`. We redacted the website here, modify it with website of your choice.
+For instance:
+```bash
+webagent --prompt "go to xyz.com/women, get to sale section, click the first clothing item you see. Get the product details, and the price, return them. note that I'm shopping from France"
+```
+We redacted the website here; modify it with the website of your choice.
-**Tool Calling Agent in CLI**
+**CodeAgent in CLI**

-You can run `smolagent` command to run a multi-step agent with [tools](https://huggingface.co/docs/smolagents/en/reference/tools). It uses web search tool by default.
+Use `smolagent` to run a multi-step agent with [tools](https://huggingface.co/docs/smolagents/en/reference/tools). It uses a web search tool by default.

-You can easily get started with `$ smolagent {YOUR_PROMPT_HERE}`. A more custom version of this one-liner is following, see more details [here](https://github.com/huggingface/smolagents/blob/main/src/smolagents/cli.py).
+You can easily get started with `$ smolagent {YOUR_PROMPT_HERE}`. You can customize this as follows (more details [here](https://github.com/huggingface/smolagents/blob/main/src/smolagents/cli.py)):

```bash
smolagent {YOUR_PROMPT_HERE} --model-type "HfApiModel" --model-id "Qwen/Qwen2.5-Coder-32B-Instruct" --imports "pandas numpy" --tools "web_search translation"
```

-A good example command to get started is `$ smolagent "Plan a trip to Tokyo, Kyoto and Osaka between Mar 28 and Apr 7. Allocate time according to number of public attraction in each, and optimize for distance and travel time. Bring all the public transportation options."`.
+For instance:
+```bash
+smolagent "Plan a trip to Tokyo, Kyoto and Osaka between Mar 28 and Apr 7. Allocate time according to the number of public attractions in each, and optimize for distance and travel time. Bring all the public transportation options."
+```
## Code agents?

-In our `CodeAgent`, the LLM engine writes its actions in code. This approach is demonstrated to work better than the current industry practice of letting the LLM output a dictionary of the tools it wants to call: it [uses 30% fewer steps](https://huggingface.co/papers/2402.01030) (thus 30% fewer LLM calls)
-and [reaches higher performance on difficult benchmarks](https://huggingface.co/papers/2411.01747). Head to [our high-level intro to agents](https://huggingface.co/docs/smolagents/conceptual_guides/intro_agents) to learn more on that.
+In our [`CodeAgent`](https://huggingface.co/docs/smolagents/reference/agents#smolagents.CodeAgent), the LLM engine writes its actions in code. This approach is demonstrated to work better than the current industry practice of letting the LLM output a dictionary of the tools it wants to call: it [uses 30% fewer steps](https://huggingface.co/papers/2402.01030) (thus 30% fewer LLM calls) and [reaches higher performance on difficult benchmarks](https://huggingface.co/papers/2411.01747). Head to [our high-level intro to agents](https://huggingface.co/docs/smolagents/conceptual_guides/intro_agents) to learn more.

In particular, since code execution can be a security concern (arbitrary code execution!), we provide options at runtime:
- a secure Python interpreter to run code more safely in your environment (more secure than raw code execution, but still risky)
- a sandboxed environment using [E2B](https://e2b.dev/) (this removes the risk to your own system).

-## How smol is it really?
+On top of this [`CodeAgent`](https://huggingface.co/docs/smolagents/reference/agents#smolagents.CodeAgent) class, we still support the standard [`ToolCallingAgent`](https://huggingface.co/docs/smolagents/reference/agents#smolagents.ToolCallingAgent) that writes actions as JSON/text blobs. But we recommend always using `CodeAgent`.

-We strived to keep abstractions to a strict minimum: the main code in `agents.py` is only ~1,000 lines of code.
+## How smol is this library?

-Still, we implement several types of agents: `CodeAgent` writes its actions as Python code snippets, and the more classic `ToolCallingAgent` leverages built-in tool calling methods.
+We strived to keep abstractions to a strict minimum: the main code in `agents.py` has fewer than 1,000 lines of code.
+Still, we implement several types of agents: `CodeAgent` writes its actions as Python code snippets, and the more classic `ToolCallingAgent` leverages built-in tool-calling methods. We also have multi-agent hierarchies, import from tool collections, remote code execution, vision models...

By the way, why use a framework at all? Well, because a big part of this stuff is non-trivial. For instance, the code agent has to keep a consistent format for code throughout its system prompt, its parser, and the execution. So our framework handles this complexity for you. But of course, we still encourage you to hack into the source code and use only the bits that you need, to the exclusion of everything else!
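To make the "JSON/text blobs" versus code-actions contrast concrete, here is an illustrative comparison of the two action formats. The tool names and the `final_answer` call are schematic sketches, not a runnable agent; the code action is just held as a string:

```python
import json

# A ToolCallingAgent-style action: the LLM emits a JSON blob naming one tool call.
json_action = '{"name": "web_search", "arguments": {"query": "Pont des Arts length"}}'
parsed = json.loads(json_action)

# A CodeAgent-style action: the LLM emits code, so a single step can chain
# several tool calls, keep intermediate variables, and use control flow.
code_action = """
length_m = web_search("Pont des Arts length in meters")
speed_m_s = web_search("leopard top speed in m/s")
final_answer(length_m / speed_m_s)
"""

print(parsed["name"])  # web_search
print(len(code_action.strip().splitlines()))  # 3
```

The JSON blob encodes exactly one call per step, while the three-line code action does the whole task in one step — which is where the reduction in LLM calls comes from.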
@@ -123,13 +204,12 @@ We've created [`CodeAgent`](https://huggingface.co/docs/smolagents/reference/age

[Find the benchmarking code here](https://github.com/huggingface/smolagents/blob/main/examples/benchmark.ipynb) for more detail on the agentic setup used, and see a comparison of LLM code agents against vanilla tool calling (spoiler: code agents work better).

<p align="center">
-    <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/blog/smolagents/benchmark_code_agents.png" alt="benchmark of different models on agentic workflows" width=70%>
+    <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/smolagents/benchmark_code_agents.jpeg" alt="benchmark of different models on agentic workflows. Open model DeepSeek-R1 beats closed-source models." width=60% max-width=500px>
</p>

This comparison shows that open-source models can now take on the best closed models!

-## Contribute
+## Contributing

To contribute, follow our [contribution guide](https://github.com/huggingface/smolagents/blob/main/CONTRIBUTING.md).
@@ -158,8 +238,9 @@ To run tests locally, run this command:

```bash
make test
```
</details>

-## Citing smolagents
+## Cite smolagents

If you use `smolagents` in your publication, please cite it by using the following BibTeX entry.
@@ -5,19 +5,19 @@

In this notebook, we'll create an **agent-powered web browser automation system**! This system can navigate websites, interact with elements, and extract information automatically.

The agent will be able to:
-✅ Navigate to web pages
-✅ Click on elements
-✅ Search within pages
-✅ Handle popups and modals
-✅ Take screenshots
-✅ Extract information
+- [x] Navigate to web pages
+- [x] Click on elements
+- [x] Search within pages
+- [x] Handle popups and modals
+- [x] Extract information

-Let's set up this system step by step.
+Let's set up this system step by step!

First, run these lines to install the required dependencies:

```bash
-pip install smolagents selenium helium pillow python-dotenv -q
+pip install smolagents selenium helium pillow -q
```

Let's import our required libraries and set up environment variables:
@@ -208,11 +208,4 @@ The system is particularly effective for tasks like:

- Data extraction from websites
- Web research automation
- UI testing and verification
- Content monitoring

-Best Practices:
-1. Always provide clear, specific instructions
-2. Use the screenshot callback for debugging
-3. Handle errors gracefully
-4. Clean up old screenshots to manage memory
-5. Set reasonable step limits for your tasks
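One of the best practices removed above — cleaning up old screenshots to manage memory — can be sketched with the standard library alone. The folder layout, `*.png` naming, and retention count here are assumptions for illustration, not part of the notebook:

```python
import tempfile
from pathlib import Path

def prune_screenshots(folder: Path, keep_last: int = 5) -> int:
    """Delete all but the `keep_last` most recent PNG screenshots; return how many were removed."""
    shots = sorted(folder.glob("*.png"), key=lambda p: p.stat().st_mtime)
    stale = shots[:-keep_last] if keep_last else shots
    for path in stale:
        path.unlink()
    return len(stale)

# Demo on a throwaway directory holding 7 empty "screenshots":
tmp = Path(tempfile.mkdtemp())
for i in range(7):
    (tmp / f"screenshot_{i}.png").write_bytes(b"")
print(prune_screenshots(tmp, keep_last=5))  # 2
```

A real agent would call something like this from a step callback, pointing it at wherever its screenshots are saved.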
@@ -162,7 +162,10 @@ def parse_code_blobs(code_blob: str) -> str:

        if "final" in code_blob and "answer" in code_blob:
            raise ValueError(
                f"""
-The code blob is invalid, because the regex pattern {pattern} was not found in {code_blob=}. It seems like you're trying to return the final answer, you can do it as follows:
+Your code snippet is invalid, because the regex pattern {pattern} was not found in it.
+Here is your code snippet:
+{code_blob}
+It seems like you're trying to return the final answer, you can do it as follows:
Code:
```py
final_answer("YOUR FINAL ANSWER HERE")

@@ -170,7 +173,10 @@ final_answer("YOUR FINAL ANSWER HERE")
            )
        raise ValueError(
            f"""
-The code blob is invalid, because the regex pattern {pattern} was not found in {code_blob=}. Make sure to include code with the correct pattern, for instance:
+Your code snippet is invalid, because the regex pattern {pattern} was not found in it.
+Here is your code snippet:
+{code_blob}
+Make sure to include code with the correct pattern, for instance:
Thoughts: Your thoughts
Code:
```py
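For context, `pattern` in these error messages is the regex `parse_code_blobs` uses to pull fenced code out of the model's output. A minimal stand-alone sketch of that extraction follows; the pattern string is an assumption for illustration, and the exact pattern in smolagents may differ:

```python
import re

# Assumed sketch of fenced-code extraction; the real pattern in
# smolagents' parse_code_blobs may differ.
pattern = r"```(?:py|python)?\n(.*?)\n```"

blob = 'Thoughts: compute the answer\nCode:\n```py\nfinal_answer("done")\n```'
matches = re.findall(pattern, blob, re.DOTALL)
if not matches:
    raise ValueError(f"The regex pattern {pattern} was not found in the code blob.")
print(matches[0])  # final_answer("done")
```

When no match is found, the function raises with the messages rewritten in this commit, echoing the offending snippet back to the model so it can correct its formatting.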