Study Notes - Agents Course Unit 1
status: Published
slug: agent-course-unit
type: Post
category: Technology
date: Feb 16, 2025 → Feb 20, 2025
tags: LLM, Notes, AI Agent, Course
summary: Study notes for Unit 1 of the official Hugging Face AI Agents course, covering the concept, principles, and implementation of AI Agents
The official Hugging Face AI Agents course, link: https://huggingface.co/learn/agents-course/unit0/introduction
Unit1 Introduction to Agents
What is an Agent?
- an AI model capable of reasoning, planning, and interacting with its environment.

- Formal definition: An Agent is a system that leverages an AI model to interact with its environment in order to achieve a user-defined objective. It combines reasoning, planning, and the execution of actions (often via external tools) to fulfill tasks.
- Two main parts
- The Brain: the AI model (LLM), responsible for reasoning and planning
- The Body: capabilities and tools, responsible for acting
- Summary
- Understand natural language: Interpret and respond to human instructions in a meaningful way.
- Reason and plan: Analyze information, make decisions, and devise strategies to solve problems.
- Interact with its environment: Gather information, take actions, and observe the results of those actions.
What are LLMs
- The brain of the Agent
- a type of AI model that excels at understanding and generating human language.
- trained on vast amounts of text data
- consist of many millions of parameters
- Mainstream architecture: Transformer
- Key Aspect: Attention
- identify the most relevant words to predict the next token
| Type | Example | Use Cases | Typical Size |
| --- | --- | --- | --- |
| Encoders | BERT from Google | Text classification, semantic search, Named Entity Recognition | Millions of parameters |
| Decoders | Llama from Meta | Text generation, chatbots, code generation | Billions (10^9) of parameters |
| Seq2Seq (Encoder + Decoder) | T5, BART | Translation, summarization, paraphrasing | Millions of parameters |
- token: the unit of information an LLM works with
- e.g., interest, ing, ed
- special tokens:
- The LLM uses these tokens to open and close the structured components of its generation.
- End of sequence token (EOS)
- prompt: the input sequence that is provided to an LLM
- Careful design of the prompt makes it easier to guide the generation of the LLM toward the desired output.
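The subword tokens mentioned above can be illustrated with a toy greedy longest-match tokenizer (a simplification: real tokenizers such as BPE are learned from data, and the vocabulary here is made up):

```python
# Toy illustration (not a real LLM tokenizer): greedy longest-match
# splitting of a word into subword tokens, mimicking how "interesting"
# can become "interest" + "ing".
VOCAB = {"interest", "ing", "ed", "s", "in", "ter", "est"}

def tokenize(word: str) -> list[str]:
    """Split `word` into the longest vocabulary pieces, left to right."""
    tokens = []
    i = 0
    while i < len(word):
        # Try the longest possible piece first.
        for j in range(len(word), i, -1):
            if word[i:j] in VOCAB:
                tokens.append(word[i:j])
                i = j
                break
        else:
            # Unknown character: emit it as its own token.
            tokens.append(word[i])
            i += 1
    return tokens

print(tokenize("interesting"))  # ['interest', 'ing']
print(tokenize("interested"))   # ['interest', 'ed']
```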
Next token prediction
- Autoregressive
- the output of each step becomes part of the input for the next
- Decode strategies
- Greedy decoding: always select the token with the maximum score
- Beam Search: explore multiple candidate sequences to find the one with the maximum total score
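The two decoding strategies can be sketched over a toy next-token probability table (the probabilities are made up for illustration); note how beam search can find a higher-scoring sequence that greedy decoding misses:

```python
import math

# Toy next-token distributions, keyed by the sequence so far.
# All numbers are assumptions for illustration only.
PROBS = {
    ("<s>",): {"the": 0.5, "a": 0.4},
    ("<s>", "the"): {"cat": 0.3, "dog": 0.3},
    ("<s>", "a"): {"cat": 0.9, "dog": 0.1},
}

def greedy(seq):
    """Always take the single highest-probability next token."""
    while tuple(seq) in PROBS:
        dist = PROBS[tuple(seq)]
        seq = seq + [max(dist, key=dist.get)]
    return seq

def beam_search(seq, width=2):
    """Keep the `width` best partial sequences by total log-probability."""
    beams = [(0.0, seq)]  # (total log-prob, sequence)
    while any(tuple(s) in PROBS for _, s in beams):
        candidates = []
        for score, s in beams:
            if tuple(s) not in PROBS:  # finished sequence: keep as-is
                candidates.append((score, s))
                continue
            for tok, p in PROBS[tuple(s)].items():
                candidates.append((score + math.log(p), s + [tok]))
        beams = sorted(candidates, reverse=True)[:width]
    return beams[0][1]

print(greedy(["<s>"]))       # greedy commits to "the" first (0.5)
print(beam_search(["<s>"]))  # beam search finds "a cat" (0.4 * 0.9 total)
```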
LLMs’ training
- pre-training: trained on a large dataset with self-supervised learning
- learns the structure of the language and underlying patterns in text, allowing the model to generalize to unseen data.
- fine-tuning: supervised learning on a custom dataset
- perform specific task
Messages and Special Tokens
- how LLMs structure their generations through chat templates
- Chat templates: the bridge between conversational messages (user and assistant turns) and the specific formatting requirements of your chosen LLM
Messages: The Underlying System of LLMs
- System Messages: define how the model should behave.
- persistent instructions
- Effect: gives information about the available tools, provides instructions to the model on how to format the actions to take, and includes guidelines on how the thought process should be
- Conversations: User and Assistant Messages
- exchange between the user (Human) and the assistant (LLM)
- different models have different templates
Chat-Templates
- Base models: trained on raw text data to predict the next token.
- Instruct models: fine-tuned specifically to follow instructions and engage in conversations.
- Chat templates: format prompts in a consistent way
- Conversations → Prompts
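What a chat template does can be sketched by hand (the ChatML-style `<|im_start|>`/`<|im_end|>` markers below are one common convention, not universal; in practice each model ships its own template, applied via the tokenizer):

```python
# A hand-written sketch of what a chat template does: turn a list of
# role/content messages into the single prompt string the model sees.
def apply_chat_template(messages, add_generation_prompt=True):
    prompt = ""
    for msg in messages:
        prompt += f"<|im_start|>{msg['role']}\n{msg['content']}<|im_end|>\n"
    if add_generation_prompt:
        prompt += "<|im_start|>assistant\n"  # cue the model to answer next
    return prompt

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is an Agent?"},
]
print(apply_chat_template(messages))
```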
What are Tools?
- A Tool is a function given to the LLM.
- common tools
| Tool | Description |
| --- | --- |
| Web Search | Allows the agent to fetch up-to-date information from the internet. |
| Image Generation | Creates images based on text descriptions. |
| Retrieval | Retrieves information from an external source. |
| API Interface | Interacts with an external API (GitHub, YouTube, Spotify, etc.). |
- A Tool contains:
- A textual description of what the function does.
- A Callable (something to perform an action).
- Arguments with typings.
- (Optional) Outputs with typings.
- Why tools are needed: LLMs predict the completion of a prompt based only on their training data.
- their internal knowledge only includes events prior to their training ⇒ if your agent needs up-to-date data you must provide it through some tool.
- Hallucination problem: the model may make things up when it lacks the needed knowledge
- How do tools work?
- Teach the LLM about the existence of tools
- LLM will generate text, in the form of code, to invoke that tool
- The output from the tool will then be sent back to the LLM, which will compose its final response for the user.
- Tool class:
- core function:
- to_string(): converts the tool’s attributes into a textual representation
- __call__(): Calls the function when the tool instance is invoked.
- Summary:
- What Tools Are: Functions that give LLMs extra capabilities, such as performing calculations or accessing external data.
- How to Define a Tool: By providing a clear textual description, inputs, outputs, and a callable function.
- Why Tools Are Essential: They enable Agents to overcome the limitations of static model training, handle real-time tasks, and perform specialized actions.
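The Tool class described above, with its `to_string()` and `__call__()` methods, can be sketched as follows (the field names and the `to_string()` output format are assumptions modeled on the course, not a real library API):

```python
# A minimal sketch of a Tool class: a description the LLM can read,
# plus a callable the runtime can execute.
class Tool:
    def __init__(self, name, description, func, arguments, outputs):
        self.name = name
        self.description = description
        self.func = func
        self.arguments = arguments  # list of (arg_name, type) pairs
        self.outputs = outputs

    def to_string(self) -> str:
        """Textual representation injected into the system prompt."""
        args = ", ".join(f"{n}: {t}" for n, t in self.arguments)
        return (f"Tool Name: {self.name}, Description: {self.description}, "
                f"Arguments: {args}, Outputs: {self.outputs}")

    def __call__(self, *args, **kwargs):
        """Invoke the underlying function when the tool instance is called."""
        return self.func(*args, **kwargs)

def calculator(a: int, b: int) -> int:
    """Multiply two integers."""
    return a * b

tool = Tool("calculator", "Multiply two integers.", calculator,
            [("a", "int"), ("b", "int")], "int")
print(tool.to_string())
print(tool(6, 7))  # 42
```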
Thought-Action-Observation Cycle
- Agents work in a continuous cycle: thinking (Thought) → acting (Action) → observing (Observation).
- Thought: The LLM part of the Agent decides what the next step should be.
- Action: The agent takes an action, by calling the tools with the associated arguments.
- Observation: The model reflects on the response from the tool.
- In many Agent frameworks, the rules and guidelines are embedded directly into the system prompt.
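The cycle can be sketched as a loop, with stub functions standing in for the LLM and the tools (`llm_decide` and `run_tool` are hypothetical placeholders, not a real framework's API):

```python
# A schematic Thought-Action-Observation loop. A real agent would call
# an LLM and real tools; here both are stubbed for illustration.
def llm_decide(history):
    """Stub 'Thought': decide the next step from the context so far."""
    if "Observation" in history[-1]:
        return {"action": "final_answer", "args": history[-1]}
    return {"action": "get_weather", "args": "London"}

def run_tool(action):
    """Stub 'Action': execute the chosen tool with its arguments."""
    return f"Observation: the weather in {action['args']} is sunny."

def agent_loop(task, max_steps=5):
    history = [task]
    for _ in range(max_steps):
        thought = llm_decide(history)       # Thought
        if thought["action"] == "final_answer":
            return thought["args"]
        observation = run_tool(thought)     # Action
        history.append(observation)         # Observation feeds back in
    return "Stopped: step limit reached."

print(agent_loop("What is the weather in London?"))
```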
Thought: Internal Reasoning and the Re-Act Approach
- What are thoughts?
- The Agent’s internal reasoning and planning processes to solve the task
- Re-Act approach: the concatenation of “Reasoning” (Think) with “Acting” (Act)
- A simple technique that appends “Let’s think step by step” to the prompt
- Encourages the model to generate a plan
- Encourages the model to decompose the problem into sub-tasks
- Produces fewer errors than generating the final solution directly
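In code, this trick amounts to a trivial prompt transformation (the helper name is made up for illustration):

```python
# Zero-shot chain-of-thought: append a fixed suffix that nudges the
# model to reason before answering.
def add_reasoning_suffix(prompt: str) -> str:
    return prompt + "\nLet's think step by step."

print(add_reasoning_suffix("If I have 3 apples and eat one, how many remain?"))
```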
Actions: Enabling the Agent to Engage with Its Environment
- Types of Agent Actions
- By action format
- By purpose
| Type of Agent | Description |
| --- | --- |
| JSON Agent | The action to take is specified in JSON format. |
| Code Agent | The Agent writes a code block that is interpreted externally. |
| Function-calling Agent | A subcategory of the JSON Agent, fine-tuned to generate a new message for each action. |

| Type of Action | Description |
| --- | --- |
| Information Gathering | Performing web searches, querying databases, or retrieving documents. |
| Tool Usage | Making API calls, running calculations, and executing code. |
| Environment Interaction | Manipulating digital interfaces or controlling physical devices. |
| Communication | Engaging with users via chat or collaborating with other agents. |
- The Stop and Parse Approach
- The LLM only handles text and uses it to describe the action it wants to take and the parameters to supply to the tool.
- Generation in a Structured Format
- e.g., JSON
- Halting Further Generation
- Parsing the Output
> A crucial ability: STOP generating new tokens when an action is complete
- Code Agents
- Generate an executable code block
- e.g., Python
- Advantages:
- Expressiveness
- Modularity and Reusability
- Enhanced Debuggability
- Direct Integration
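The stop-and-parse approach above can be sketched like this (the `Action:` marker, the `Observation:` stop sequence, and the JSON shape are an assumed convention, not a fixed standard):

```python
import json

# Stop and parse: generation is halted at a stop sequence, then the
# structured action is extracted from the raw text.
STOP_SEQUENCE = "Observation:"  # the runtime stops generation here

raw_generation = """Thought: I should check the weather.
Action:
{"action": "get_weather", "action_input": {"location": "London"}}
Observation:"""

def parse_action(text: str) -> dict:
    # 1. Halt at the stop sequence (a real runtime passes it to the LLM).
    text = text.split(STOP_SEQUENCE)[0]
    # 2. Extract the JSON block after the "Action:" marker.
    json_part = text.split("Action:")[1].strip()
    # 3. Parse it into a structured call the runtime can execute.
    return json.loads(json_part)

action = parse_action(raw_generation)
print(action["action"], action["action_input"])
```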
Observe, Integrating Feedback to Reflect and Adapt
- Observations are how an Agent perceives the consequences of its actions.
- The observation phase
- Collects Feedback: Receives data or confirmation that its action was successful (or not).
- Appends Results: Integrates the new information into its existing context, effectively updating its memory.
- Adapts its Strategy: Uses this updated context to refine subsequent thoughts and actions.
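Appending results boils down to adding the tool output to the message history so the next LLM call sees it (the `Observation:` prefix and the use of a `user` turn are assumptions; some runtimes use a dedicated `tool` role instead):

```python
# Observation phase: the tool result becomes a new turn in the
# conversation, effectively updating the agent's working memory.
messages = [
    {"role": "user", "content": "What is the weather in London?"},
    {"role": "assistant",
     "content": 'Action: {"action": "get_weather", "action_input": "London"}'},
]

def integrate_observation(messages, tool_result):
    """Append the tool output so it is part of the model's next context."""
    messages.append({"role": "user",
                     "content": f"Observation: {tool_result}"})
    return messages

integrate_observation(messages, "sunny, 18°C")
print(messages[-1]["content"])
```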
Dummy Agent Library
Notebook: ‣
First Agent Template
repo: ‣
What is smolagents?
- a library that provides a framework for developing your agents with ease.