Merge branch 'huggingface:main' into add-device-parameter

commit 6f87aee50e
@@ -28,10 +28,10 @@ In this guide, we're going to see best practices for building agents.
 
 Giving an LLM some agency in your workflow introduces some risk of errors.
 
-Well-programmed agentic systems have good error logging and retry mechanisms anyway, so the LLM engine has a chance to self-correct their mistake. But to reduce the risk of LLM error to the maximum, you should simplify your worklow!
+Well-programmed agentic systems have good error logging and retry mechanisms anyway, so the LLM engine has a chance to self-correct its mistakes. But to minimize the risk of LLM error, you should simplify your workflow!
 
 Let's take again the example from [intro_agents]: a bot that answers user queries on a surf trip company.
-Instead of letting the agent do 2 different calls for "travel distance API" and "weather API" each time they are asked about a new surf spot, you could just make one unified tool "return_spot_information", a functions that calls both APIs at once and returns their concatenated outputs to the user.
+Instead of letting the agent do 2 different calls for "travel distance API" and "weather API" each time it is asked about a new surf spot, you could just make one unified tool "return_spot_information", a function that calls both APIs at once and returns their concatenated outputs to the user.
 
 This will reduce costs, latency, and error risk!
 
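The unified tool described above could be sketched as follows. This is a minimal illustration, not code from the repository: the two helper functions are hypothetical stand-ins for the real travel distance and weather APIs.

```python
# Sketch of the unified "return_spot_information" tool described above.
# Both helpers are hypothetical stand-ins for the two real APIs.

def get_travel_distance(spot: str) -> str:
    """Placeholder for the travel distance API call."""
    return f"Travel distance to {spot}: 120 km"

def get_weather(spot: str) -> str:
    """Placeholder for the weather API call."""
    return f"Weather at {spot}: sunny, 1.5 m swell"

def return_spot_information(spot: str) -> str:
    """One unified tool: calls both APIs at once and concatenates their outputs."""
    return f"{get_travel_distance(spot)}\n{get_weather(spot)}"
```

With a single tool call instead of two, the agent has fewer opportunities to pick the wrong tool or mangle an argument.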
@@ -168,7 +168,7 @@ Final answer:
 /var/folders/6m/9b1tts6d5w960j80wbw9tx3m0000gn/T/tmpx09qfsdd/652f0007-3ee9-44e2-94ac-90dae6bb89a4.png
 ```
 The user sees, instead of an image being returned, a path being returned to them.
-It could look like a bug from the system, but actually the agentic system didn't cause the error: it's just that the LLM engine tid the mistake of not saving the image output into a variable.
+It could look like a bug in the system, but the agentic system didn't actually cause the error: the LLM engine simply made the mistake of not saving the image output into a variable.
 Thus it cannot access the image again except by leveraging the path that was logged while saving the image, so it returns the path instead of an image.
 
 The first step to debugging your agent is thus "Use a more powerful LLM". Alternatives like `Qwen2/5-72B-Instruct` wouldn't have made that mistake.
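The failure mode above can be illustrated with a minimal sketch. `generate_image` here is a hypothetical stand-in for the agent's image-generation tool, which logs a save path as a side effect:

```python
import tempfile

def generate_image(prompt: str) -> dict:
    """Hypothetical stand-in for an image-generation tool: it writes the image
    to a temp file, logs the path, and returns an in-memory object."""
    path = tempfile.mktemp(suffix=".png")
    print(f"Saved image to {path}")  # this logged path is all a careless agent keeps
    return {"prompt": prompt, "path": path}

# Buggy pattern (what the weak LLM wrote): the output is never bound to a
# variable, so later steps can only fall back on the logged path string.
generate_image("A ribbon.")

# Correct pattern: keep a handle on the output so the image itself can be returned.
image = generate_image("A ribbon.")
```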
@@ -177,9 +177,9 @@ The first step to debugging your agent is thus "Use a more powerful LLM". Altern
 
 Then you can also use less powerful models but guide them better.
 
-Put yourself in the shoes if your model: if you were the model solving the task, would you struggle with the information available to you (from the system prompt + task formulation + tool description) ?
+Put yourself in the shoes of your model: if you were the model solving the task, would you struggle with the information available to you (from the system prompt + task formulation + tool description)?
 
-Would you need some added claritications ? 
+Would you need some added clarifications?
 
 To provide extra information, we do not recommend to change the system prompt right away: the default system prompt has many adjustments that you do not want to mess up except if you understand the prompt very well.
 Better ways to guide your LLM engine are:
@@ -189,14 +189,12 @@ class HfApiModel(Model):
     This engine allows you to communicate with Hugging Face's models using the Inference API. It can be used in both serverless mode or with a dedicated endpoint, supporting features like stop sequences and grammar customization.
 
     Parameters:
-        model (`str`, *optional*, defaults to `"Qwen/Qwen2.5-Coder-32B-Instruct"`):
+        model_id (`str`, *optional*, defaults to `"Qwen/Qwen2.5-Coder-32B-Instruct"`):
             The Hugging Face model ID to be used for inference. This can be a path or model identifier from the Hugging Face model hub.
         token (`str`, *optional*):
             Token used by the Hugging Face API for authentication. This token need to be authorized 'Make calls to the serverless Inference API'.
             If the model is gated (like Llama-3 models), the token also needs 'Read access to contents of all public gated repos you can access'.
             If not provided, the class will try to use environment variable 'HF_TOKEN', else use the token stored in the Hugging Face CLI configuration.
-        max_tokens (`int`, *optional*, defaults to 1500):
-            The maximum number of tokens allowed in the output.
         timeout (`int`, *optional*, defaults to 120):
             Timeout for the API request, in seconds.
 
@@ -207,12 +205,11 @@ class HfApiModel(Model):
     Example:
     ```python
     >>> engine = HfApiModel(
-    ...     model="Qwen/Qwen2.5-Coder-32B-Instruct",
+    ...     model_id="Qwen/Qwen2.5-Coder-32B-Instruct",
     ...     token="your_hf_token_here",
-    ...     max_tokens=2000
     ... )
     >>> messages = [{"role": "user", "content": "Explain quantum mechanics in simple terms."}]
-    >>> response = engine(messages, stop_sequences=["END"])
+    >>> response = engine(messages, stop_sequences=["END"], max_tokens=1500)
     >>> print(response)
     "Quantum mechanics is the branch of physics that studies..."
     ```