Study Notes - Agents Course Unit 1

status: Published
slug: agent-course-unit
type: Post
category: Technology
date: Feb 16, 2025 → Feb 20, 2025
tags: LLM, Notes, AI Agent, Course
summary: Study notes for Unit 1 of the official Hugging Face AI Agents course, covering the concepts, principles, and implementation of AI Agents.
💡 The official Hugging Face AI Agents course: https://huggingface.co/learn/agents-course/unit0/introduction

Unit1 Introduction to Agents

What is an Agent?

  • an AI model capable of reasoning, planning, and interacting with its environment.
  • Formal: An Agent is a system that leverages an AI model to interact with its environment in order to achieve a user-defined objective. It combines reasoning, planning, and the execution of actions (often via external tools) to fulfill tasks.
    • Two main parts
      • The Brain: the AI model (LLM), which handles reasoning and planning
      • The Body: the capabilities and tools, which handle acting
    • Summary
      • Understand natural language: Interpret and respond to human instructions in a meaningful way.
      • Reason and plan: Analyze information, make decisions, and devise strategies to solve problems.
      • Interact with its environment: Gather information, take actions, and observe the results of those actions.

What are LLMs

  • The brain of the Agent
  • a type of AI model that excels at understanding and generating human language
    • trained on vast amounts of text data
    • consist of many millions of parameters
  • Mainstream architecture: Transformer
    | Type | Example | Use Cases | Typical Size |
    | --- | --- | --- | --- |
    | Encoders | BERT (Google) | Text classification, semantic search, Named Entity Recognition | Millions of parameters |
    | Decoders | Llama (Meta) | Text generation, chatbots, code generation | Billions (10^9) of parameters |
    | Seq2Seq (Encoder + Decoder) | T5, BART | Translation, summarization, paraphrasing | Millions of parameters |
    • Key Aspect: Attention
      • identify the most relevant words to predict the next token
  • token: the unit of information an LLM works with
    • e.g. "interest", "ing", "ed"
    • special token:
      • The LLM uses these tokens to open and close the structured components of its generation.
      • End of sequence token (EOS)
  • prompt: the input sequence that is provided to an LLM
    • Careful design of the prompt makes it easier to guide the generation of the LLM toward the desired output.
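To make the token idea concrete, here is a toy tokenizer. Real LLM tokenizers (BPE, SentencePiece, etc.) learn their subword vocabularies from data; the tiny hand-made vocabulary and greedy longest-match rule below are purely illustrative, not any real tokenizer's algorithm:

```python
# Illustrative only: a toy longest-match subword tokenizer over a tiny
# hand-made vocabulary. Real tokenizers learn their vocabularies from data.
VOCAB = ["interesting", "interest", "play", "ing", "ed", " "]

def toy_tokenize(text: str) -> list[str]:
    """Greedily match the longest vocabulary entry at each position."""
    tokens = []
    i = 0
    while i < len(text):
        match = None
        for piece in sorted(VOCAB, key=len, reverse=True):
            if text.startswith(piece, i):
                match = piece
                break
        if match is None:        # unknown character: emit it as a single token
            match = text[i]
        tokens.append(match)
        i += len(match)
    return tokens

print(toy_tokenize("interesting played"))
# → ['interesting', ' ', 'play', 'ed']
```

Note how "played" splits into the subwords "play" + "ed", which is exactly why LLMs can handle words they never saw whole during training.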

Next token prediction

  • Autoregressive
    • each output token is appended to the input for the next prediction step
  • Decoding strategies
    • Greedy: always select the token with the maximum score
    • Beam Search: explore multiple candidate sequences to find the one with the maximum total score
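The two strategies above can be contrasted on a toy model. The hand-written probability table below stands in for a real LLM's softmax output (everything here is illustrative); note that greedy and beam search can disagree, and beam search finds the sequence with the higher total probability:

```python
import math

# Hypothetical next-token log-probabilities for a toy 3-word language.
# In a real LLM these would come from the model's softmax output.
def next_token_logprobs(prefix: tuple[str, ...]) -> dict[str, float]:
    table = {
        (): {"the": math.log(0.5), "a": math.log(0.4), "<eos>": math.log(0.1)},
        ("the",): {"cat": math.log(0.3), "dog": math.log(0.3), "<eos>": math.log(0.4)},
        ("a",): {"cat": math.log(0.8), "dog": math.log(0.1), "<eos>": math.log(0.1)},
    }
    return table.get(prefix, {"<eos>": 0.0})

def greedy_decode(max_len: int = 5) -> tuple[str, ...]:
    """Always pick the single most likely next token."""
    seq = ()
    for _ in range(max_len):
        probs = next_token_logprobs(seq)
        tok = max(probs, key=probs.get)
        seq += (tok,)
        if tok == "<eos>":
            break
    return seq

def beam_search(beam_width: int = 2, max_len: int = 5) -> tuple[str, ...]:
    """Keep the `beam_width` best partial sequences at every step."""
    beams = [((), 0.0)]  # (sequence, cumulative log-probability)
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            if seq and seq[-1] == "<eos>":       # finished beams carry over
                candidates.append((seq, score))
                continue
            for tok, lp in next_token_logprobs(seq).items():
                candidates.append((seq + (tok,), score + lp))
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
        if all(s and s[-1] == "<eos>" for s, _ in beams):
            break
    return beams[0][0]

print(greedy_decode())   # → ('the', '<eos>'), total probability 0.20
print(beam_search())     # → ('a', 'cat', '<eos>'), total probability 0.32
```

Greedy commits to "the" (the locally best first token) and gets stuck with a low-probability continuation, while beam search keeps "a" alive and ends up with the higher-scoring full sequence.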

LLMs’ training

  • pre-training: training on a large text corpus (self-supervised learning)
    • the model learns the structure of the language and underlying patterns in text, allowing it to generalize to unseen data
  • fine-tuning: supervised training on a custom dataset
    • adapts the model to perform a specific task

Messages and Special Tokens

  • how LLMs structure their generations through chat templates
  • Chat templates: the bridge between conversational messages (user and assistant turns) and the specific formatting requirements of your chosen LLM

Messages: The Underlying System of LLMs

  • System Messages: define how the model should behave.
    • persistent instructions
    • Effect: gives information about the available tools, provides instructions to the model on how to format the actions to take, and includes guidelines on how the thought process should be structured
  • Conversations: User and Assistant Messages
    • exchange between the user (Human) and the assistant (LLM)
    • different models have different chat templates

Chat-Templates

  • Base models: trained on raw text data to predict the next token.
  • Instruct models: fine-tuned specifically to follow instructions and engage in conversations.
    • Chat templates: format prompts in a consistent way
      • conversations → prompts
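A hand-written sketch of what a chat template does: turn a list of role-tagged messages into the single prompt string the instruct model was fine-tuned on. The ChatML-style `<|im_start|>`/`<|im_end|>` tags below are one common convention; every model defines its own template, and in practice the `transformers` tokenizer's `apply_chat_template()` handles this for you:

```python
# Conversation as a list of role-tagged messages.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is an Agent?"},
]

def apply_chatml_template(messages: list[dict], add_generation_prompt: bool = True) -> str:
    """Render messages in ChatML style (one convention among many)."""
    prompt = ""
    for msg in messages:
        prompt += f"<|im_start|>{msg['role']}\n{msg['content']}<|im_end|>\n"
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here.
        prompt += "<|im_start|>assistant\n"
    return prompt

print(apply_chatml_template(messages))
```

The trailing open assistant tag is what makes the model generate the assistant's reply rather than, say, another user turn.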

What are Tools?

  • A Tool is a function given to the LLM.
    • Common tools:
      | Tool | Description |
      | --- | --- |
      | Web Search | Allows the agent to fetch up-to-date information from the internet. |
      | Image Generation | Creates images based on text descriptions. |
      | Retrieval | Retrieves information from an external source. |
      | API Interface | Interacts with an external API (GitHub, YouTube, Spotify, etc.). |
  • A Tool contains:
    • textual description of what the function does.
    • Callable (something to perform an action).
    • Arguments with typings.
    • (Optional) Outputs with typings.
  • Why tools are needed: LLMs predict the completion of a prompt based only on their training data.
    • their internal knowledge only includes events prior to their training ⇒ if your agent needs up-to-date data, you must provide it through a tool
    • hallucination problem: without grounding, the model may confidently make things up
  • How do tools work?
    • Teach the LLM about the existence of tools
    • The LLM will generate text, in the form of code, to invoke that tool
    • The output from the tool will then be sent back to the LLM, which will compose its final response for the user.
  • Tool class:
    • core function:
      • to_string(): converts the tool’s attributes into a textual representation
      • __call__(): Calls the function when the tool instance is invoked.
  • Summary:
    • What Tools Are: Functions that give LLMs extra capabilities, such as performing calculations or accessing external data.
    • How to Define a Tool: By providing a clear textual description, inputs, outputs, and a callable function.
    • Why Tools Are Essential: They enable Agents to overcome the limitations of static model training, handle real-time tasks, and perform specialized actions.
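A minimal sketch of such a `Tool` class, with the `to_string()` and `__call__()` methods described above (the exact attribute names and the `calculator` example are illustrative, not a fixed API):

```python
from typing import Callable

class Tool:
    """Wraps a function with the metadata an LLM needs to use it:
    a name, a textual description, typed arguments, and an output type."""

    def __init__(self, name: str, description: str, func: Callable,
                 arguments: list[tuple[str, str]], outputs: str):
        self.name = name
        self.description = description
        self.func = func
        self.arguments = arguments   # [(arg_name, type_name), ...]
        self.outputs = outputs

    def to_string(self) -> str:
        """Textual representation, injected into the system prompt."""
        args = ", ".join(f"{n}: {t}" for n, t in self.arguments)
        return (f"Tool Name: {self.name}, Description: {self.description}, "
                f"Arguments: {args}, Outputs: {self.outputs}")

    def __call__(self, *args, **kwargs):
        """Calling the tool instance runs the underlying function."""
        return self.func(*args, **kwargs)

def calculator(a: int, b: int) -> int:
    """Multiply two integers."""
    return a * b

calc_tool = Tool("calculator", "Multiply two integers.", calculator,
                 [("a", "int"), ("b", "int")], "int")
print(calc_tool.to_string())
print(calc_tool(3, 4))  # → 12
```

The `to_string()` output is what actually reaches the model: the LLM never executes anything itself, it only sees these descriptions and emits text asking for a tool to be run.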

Thought-Action-Observation Cycle

  • Agents work in a continuous cycle: thinking (Thought) → acting (Action) → observing (Observation).
      1. Thought: the LLM part of the Agent decides what the next step should be.
      2. Action: the Agent takes an action by calling the tools with the associated arguments.
      3. Observation: the model reflects on the response from the tool.
  • In many Agent frameworks, the rules and guidelines are embedded directly into the system prompt.
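The cycle above can be sketched as a plain loop. The scripted `fake_llm`, the `get_weather` stub, and all names below are illustrative stand-ins, not a real framework's API; the point is only the control flow Thought → Action → Observation:

```python
import json

def fake_llm(context: str) -> str:
    """Scripted stand-in for a real model: first emits a Thought and an
    Action, then, once an Observation is in context, a final answer."""
    if "Observation:" not in context:
        return ('Thought: I should check the weather.\n'
                'Action: {"tool": "get_weather", "args": {"city": "Paris"}}')
    return "Final Answer: It is sunny in Paris."

def get_weather(city: str) -> str:
    return f"sunny in {city}"          # stubbed tool result

TOOLS = {"get_weather": get_weather}

def run_agent(task: str, max_steps: int = 3) -> str:
    context = f"Task: {task}"
    for _ in range(max_steps):
        output = fake_llm(context)                        # Thought + Action
        if output.startswith("Final Answer:"):
            return output.removeprefix("Final Answer:").strip()
        action = json.loads(output.split("Action:", 1)[1])
        result = TOOLS[action["tool"]](**action["args"])  # execute the tool
        context += f"\n{output}\nObservation: {result}"   # feed back the result
    return "max steps reached"

print(run_agent("What's the weather in Paris?"))
# → It is sunny in Paris.
```

Appending the Observation to the context is the "memory update" step: the next LLM call sees the tool's result and can decide whether to act again or answer.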

Thought: Internal Reasoning and the ReAct Approach

  • What are thoughts?
    • The Agent’s internal reasoning and planning processes to solve the task
  • ReAct approach: the concatenation of "Reasoning" (Think) with "Acting" (Act)
    • a simple technique is to append "Let's think step by step" to the prompt
      • encourages the model to generate a plan
      • encourages the model to decompose the problem into sub-tasks
    • produces fewer errors than generating the final solution directly

Actions: Enabling the Agent to Engage with Its Environment

  • Types of Agent Actions
    • By action format
      | Type of Agent | Description |
      | --- | --- |
      | JSON Agent | The action to take is specified in JSON format. |
      | Code Agent | The Agent writes a code block that is interpreted externally. |
      | Function-calling Agent | A subcategory of the JSON Agent, fine-tuned to generate a new message for each action. |
    • By purpose
      | Type of Action | Description |
      | --- | --- |
      | Information Gathering | Performing web searches, querying databases, or retrieving documents. |
      | Tool Usage | Making API calls, running calculations, and executing code. |
      | Environment Interaction | Manipulating digital interfaces or controlling physical devices. |
      | Communication | Engaging with users via chat or collaborating with other agents. |
  • The Stop and Parse Approach
    • a crucial ability: the Agent STOPS generating new tokens once the action is complete
      • the LLM only handles text, using it to describe the action it wants to take and the parameters to supply to the tool
    • Generation in a structured format
      • e.g. JSON
    • Halting further generation
      • e.g. via a stop sequence such as "Observation:"
    • Parsing the output
  • Code Agents
    • Generate an executable code block
      • eg: Python
    • Advantages:
      • Expressiveness
      • Modularity and Reusability
      • Enhanced Debuggability
      • Direct Integration
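Stop-and-parse can be sketched in a few lines. The `raw_generation` string, the `"Observation:"` stop sequence, and the JSON field names below are illustrative (real frameworks each define their own format); the point is that generation is cut off before the model can hallucinate the tool's result, and the structured action is then parsed out:

```python
import json

# What the model might emit: a Thought, a JSON Action, and then an
# Observation it would otherwise make up -- which is why we stop first.
raw_generation = """Thought: I need the weather in Paris.
Action:
{"action": "get_weather", "action_input": {"location": "Paris"}}
Observation: (anything after this point would be hallucinated)"""

STOP_SEQUENCE = "Observation:"

def stop_and_parse(text: str) -> dict:
    truncated = text.split(STOP_SEQUENCE, 1)[0]     # 1. halt further generation
    action_part = truncated.split("Action:", 1)[1]  # 2. isolate the action block
    return json.loads(action_part)                  # 3. parse the structured format

action = stop_and_parse(raw_generation)
print(action["action"], action["action_input"])
# → get_weather {'location': 'Paris'}
```

In practice the stop sequence is passed to the generation API itself (so the model never emits the fake Observation at all), and the parsed dict is dispatched to the matching tool.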

Observe: Integrating Feedback to Reflect and Adapt

  • Observations are how an Agent perceives the consequences of its actions.
  • The observation phase
    • Collects Feedback: Receives data or confirmation that its action was successful (or not).
    • Appends Results: Integrates the new information into its existing context, effectively updating its memory.
    • Adapts its Strategy: Uses this updated context to refine subsequent thoughts and actions.

Dummy Agent Library

Notebook:

First Agent Template

repo:

What is smolagents?

  • a library that provides a framework for developing your agents with ease.
 

© Shansan 2021 - 2025