Agentic AI

Building agentic AI systems involves a combination of foundational AI technologies, programming languages, and specialized frameworks.

Agentic AI systems are characterized by their ability to perceive, reason, plan, act, and learn autonomously.

To achieve this, you need certain agentic AI tools and resources.

Check them out below.

LangChain – LangChain is an open-source software framework designed to simplify the development of applications powered by large language models (LLMs). It acts as an orchestration layer, providing a standardized way to connect LLMs with external data sources and computational tools. This modular approach, built around “chains” and “agents,” allows developers to create complex workflows for tasks like building chatbots, performing document analysis and summarization, generating content, and even enabling autonomous AI agents. LangChain supports integration with a wide variety of LLMs and external services, offering flexibility and streamlining the process of building sophisticated, context-aware LLM applications.
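The "chain" idea can be illustrated without the library itself: a chain is essentially a pipeline of composable steps (prompt template → model call → output parser). Here is a minimal pure-Python sketch of that pattern; the LLM call is a stub, and none of these names come from the actual LangChain API:

```python
# Conceptual sketch of a LangChain-style "chain": each step is a callable,
# and the chain pipes the output of one step into the next.
# The "fake_llm" step is a stub; a real chain would call an actual model API.

def make_chain(*steps):
    """Compose callables left-to-right into a single pipeline."""
    def chain(value):
        for step in steps:
            value = step(value)
        return value
    return chain

# Step 1: a prompt template that formats user input into a full prompt
prompt_template = lambda topic: f"Summarize the following topic in one sentence: {topic}"

# Step 2: a stand-in for an LLM call (a real step would hit a model API)
fake_llm = lambda prompt: f"[LLM response to: {prompt}]"

# Step 3: an output parser that post-processes the raw model text
output_parser = lambda text: text.strip("[]")

summarize_chain = make_chain(prompt_template, fake_llm, output_parser)
print(summarize_chain("agentic AI"))
```

The same composition idea scales up: swap the stub for a real model client and the chain's structure stays unchanged.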

LangGraph – LangGraph, built by LangChain, is an open-source AI agent framework designed for building, deploying, and managing complex generative AI agent workflows. It leverages a graph-based architecture to model and orchestrate intricate relationships between various components of an AI agent, enabling the creation of stateful and cyclical workflows. This allows for more sophisticated decision-making, improved scalability, and enhanced performance in applications like chatbots, multi-agent systems, and other LLM-backed experiences. LangGraph provides control over agent actions, facilitates human-in-the-loop oversight, and offers robust features for memory, debugging, and production-ready deployment, making it a powerful tool for developing reliable and adaptable AI agents.
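The stateful, cyclical workflow that LangGraph enables can be pictured as nodes that update a shared state and edges that route between them, looping until a terminal condition is met. This pure-Python sketch illustrates the idea only; the node names, state keys, and `run_graph` function are invented for illustration and are not the LangGraph API:

```python
# Conceptual sketch of a graph-based agent workflow: nodes are functions
# that update a shared state dict, edges decide the next node, and the
# loop cycles (draft -> review -> draft ...) until the work is approved.

def draft(state):
    state["attempts"] += 1
    state["text"] = f"draft v{state['attempts']}"
    return state

def review(state):
    # Approve only after enough rework cycles (stand-in for a real check)
    state["approved"] = state["attempts"] >= 3
    return state

def run_graph(state):
    nodes = {"draft": draft, "review": review}
    current = "draft"
    while True:
        state = nodes[current](state)
        if current == "draft":
            current = "review"
        elif state["approved"]:
            return state          # terminal condition reached
        else:
            current = "draft"     # cycle: route the state back for rework

final = run_graph({"attempts": 0, "approved": False})
print(final)
```

The cycle between `draft` and `review` is the key difference from a plain chain: state flows backwards as well as forwards, which is what makes multi-step agent behavior possible.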

Firecrawl – Firecrawl is an API service designed to simplify the process of extracting clean, LLM-ready data from websites. It functions as a web scraper and crawler, capable of converting entire websites or specific URLs into structured data, typically in markdown or JSON format. Firecrawl handles complexities like reverse proxies, caching, rate limits, and content blocked by JavaScript, making it reliable for diverse web scraping needs. It offers various modes including scrape for single URLs, crawl for full website traversal, and map to generate lists of semantically related pages. Additionally, Firecrawl provides an extract endpoint for advanced data extraction using natural language prompts and schema definitions, enabling users to retrieve specific, structured information from web pages for use in AI applications.

Warp – Warp is a modern, AI-powered terminal designed to function as an “Agentic Development Environment.” It goes beyond traditional terminals by integrating intelligent features to help software engineers with multi-step tasks. Instead of just running commands, Warp allows developers to use natural language to ask agents to write code, debug issues, or manage workflows. Key features include the “Agent Mode,” which can interpret and execute multi-step tasks, and a “Universal Input” that accepts both commands and conversational prompts. The terminal also features a “Block” system that groups commands and their output together for easy sharing and a built-in code editor, which allows developers to stay in their workflow and quickly refine agent-suggested code. By providing a management layer to track agents and their progress, Warp empowers developers to act as orchestrators of an AI-driven workflow, increasing productivity and enabling multitasking across complex projects.

When choosing agentic AI tools, consider your specific use case, desired level of autonomy, integration needs, scalability requirements, and your development team's expertise.

What agentic AI tools and resources do you use in your workflows?

Let us know in the comments section below!

 


Credit: https://hasdata.com/

In today’s data-driven world, getting information from the web and making it useful for powerful AI models is a game-changer. Imagine being able to summarize entire websites, extract specific facts from a blog series, or even train a custom chatbot on a curated set of online resources.

This tutorial will walk you through the process of scraping web data efficiently using Firecrawl and then seamlessly feeding that data into a Large Language Model (LLM) for further processing, analysis, or generation.

Why Firecrawl?

While many web scraping tools exist, Firecrawl stands out for its simplicity and its ability to return clean, structured content (Markdown or HTML) from URLs, making it ideal for LLM ingestion. It handles common scraping headaches like JavaScript rendering and content extraction with ease.

What LLM are we using?

For this tutorial, we’ll demonstrate with a hypothetical LLM API, as the exact implementation will vary depending on the LLM provider you choose (e.g., OpenAI’s GPT series, Google’s Gemini, Anthropic’s Claude, Hugging Face models, etc.). The core principle of sending scraped text remains the same.

Let’s Get Started!

Prerequisites:

  1. Firecrawl API Key: You’ll need to sign up for a Firecrawl API key. Visit their website to get one.
  2. Python: This tutorial uses Python. Make sure you have it installed.
  3. requests library: For making API calls. Install it using pip: pip install requests

Step 1: Scraping Data with Firecrawl

First, let’s write a Python script to scrape a website using Firecrawl.

For this example, we’ll scrape a hypothetical blog post.

import requests
import json
import os

# Read your Firecrawl API key from the environment (falls back to a placeholder)
FIRECRAWL_API_KEY = os.environ.get("FIRECRAWL_API_KEY", "YOUR_FIRECRAWL_API_KEY")
FIRECRAWL_API_URL = "https://api.firecrawl.dev/v0/scrape"

def scrape_website_firecrawl(url):
    """
    Scrapes a given URL using the Firecrawl API.

    Args:
        url (str): The URL to scrape.

    Returns:
        str: The scraped content in Markdown format, or None if an error occurs.
    """
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {FIRECRAWL_API_KEY}"
    }
    payload = {
        "url": url,
        "pageOptions": {
            "onlyMainContent": True  # Focus on the main content of the page
        }
    }

    try:
        response = requests.post(FIRECRAWL_API_URL, headers=headers, json=payload)
        response.raise_for_status()  # Raise an exception for bad status codes (4xx or 5xx)
        data = response.json()
        if data.get("success") and data.get("data", {}).get("content"):
            return data["data"]["content"]
        else:
            print(f"Error: Firecrawl did not return expected data for {url}")
            return None
    except requests.exceptions.RequestException as e:
        print(f"Error scraping {url}: {e}")
        return None

if __name__ == "__main__":
    target_url = "https://blog.firecrawl.dev/blog/scraping-dynamic-content-with-firecrawl-and-playwright/" # Example URL
    scraped_content = scrape_website_firecrawl(target_url)

    if scraped_content:
        print("Successfully scraped content (first 500 characters):")
        print(scraped_content[:500])
        # You might want to save this to a file for larger content
        with open("scraped_content.md", "w", encoding="utf-8") as f:
            f.write(scraped_content)
        print("\nScraped content saved to scraped_content.md")
    else:
        print("Failed to scrape content.")

Explanation:

  • We define FIRECRAWL_API_KEY and FIRECRAWL_API_URL.
  • The scrape_website_firecrawl function takes a URL, sets up the necessary headers (including your API key), and sends a POST request to the Firecrawl API.
  • "onlyMainContent": True is a crucial pageOptions setting that tells Firecrawl to focus on extracting the primary article/blog post content, ignoring sidebars, footers, and headers, which is perfect for LLM input.
  • It checks for a successful response and extracts the content, which is typically in Markdown format.
  • The scraped content is then saved to a Markdown file for easy inspection.

Step 2: Feeding Scraped Data to an LLM

Now, let’s take the scraped_content and send it to an LLM. For this example, we’ll use a placeholder for an LLM API call. Remember to replace this with the actual API endpoint and authentication for your chosen LLM.

import requests
import json
import os

# --- (Previous Firecrawl scraping code goes here) ---


# --- LLM Integration ---

# Placeholder for your LLM API details
LLM_API_URL = "https://api.your-llm-provider.com/v1/generate" # Example URL
LLM_API_KEY = "YOUR_LLM_API_KEY" # Replace with your LLM API Key

def send_to_llm(text_content, prompt):
    """
    Sends the text content and a prompt to a hypothetical LLM API.

    Args:
        text_content (str): The scraped text content to send to the LLM.
        prompt (str): The prompt to instruct the LLM.

    Returns:
        str: The LLM's response, or None if an error occurs.
    """
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {LLM_API_KEY}" # Or whatever auth your LLM uses
    }
    
    # The payload structure will vary greatly depending on your LLM provider.
    # This is a common structure for text generation.
    payload = {
        "model": "your-preferred-llm-model", # e.g., "gpt-4", "gemini-1.5-pro", etc.
        "messages": [
            {"role": "system", "content": "You are a helpful assistant that can summarize and analyze documents."},
            {"role": "user", "content": f"{prompt}\n\nHere is the document:\n\n{text_content}"}
        ],
        "max_tokens": 1000, # Adjust as needed
        "temperature": 0.7 # Adjust for creativity vs. factualness
    }

    try:
        response = requests.post(LLM_API_URL, headers=headers, json=payload)
        response.raise_for_status()
        llm_data = response.json()
        # This part also depends on the LLM's response structure
        if llm_data and llm_data.get("choices") and llm_data["choices"][0].get("message") and llm_data["choices"][0]["message"].get("content"):
            return llm_data["choices"][0]["message"]["content"]
        else:
            print("Error: LLM did not return expected response structure.")
            return None
    except requests.exceptions.RequestException as e:
        print(f"Error communicating with LLM API: {e}")
        return None

if __name__ == "__main__":
    target_url = "https://blog.firecrawl.dev/blog/scraping-dynamic-content-with-firecrawl-and-playwright/" # Example URL
    scraped_content = scrape_website_firecrawl(target_url)

    if scraped_content:
        print("\n--- Sending scraped content to LLM ---")
        user_prompt = "Summarize the key points of this article in under 200 words."
        llm_response = send_to_llm(scraped_content, user_prompt)

        if llm_response:
            print("\nLLM Response:")
            print(llm_response)
        else:
            print("Failed to get response from LLM.")
    else:
        print("Skipping LLM interaction due to failed scraping.")

Explanation:

  • LLM_API_URL and LLM_API_KEY: These are placeholders. You must replace them with the actual API endpoint and your authentication method for your chosen LLM.
  • send_to_llm function:
    • It takes the text_content (our scraped data) and a prompt as input.
    • The payload structure is generic for a conversational LLM API. Key elements are:
      • model: Specify the LLM model you want to use.
      • messages: This is where you put the conversation. We use a system message to define the LLM’s role and a user message containing your prompt and the scraped text_content.
      • max_tokens: Limits the length of the LLM’s response.
      • temperature: Controls the creativity of the LLM’s output.
    • Error handling is included to catch API communication issues.
  • if __name__ == "__main__": block:
    • We first scrape the content.
    • If successful, we define a user_prompt (e.g., “Summarize this article”).
    • Then, we call send_to_llm with the scraped content and our prompt.
    • Finally, we print the LLM’s response.
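The error handling above can be hardened for production with retries and exponential backoff. A generic wrapper sketch follows; the retry count, delays, and the `with_retries` name are arbitrary choices, not part of any library API:

```python
import time

def with_retries(func, *args, max_retries=3, base_delay=1.0, **kwargs):
    """Call func(*args, **kwargs), retrying on exceptions with
    exponential backoff: base_delay, 2x, 4x, ... between attempts."""
    for attempt in range(max_retries):
        try:
            return func(*args, **kwargs)
        except Exception as e:
            if attempt == max_retries - 1:
                raise  # out of retries: surface the last error
            delay = base_delay * (2 ** attempt)
            print(f"Attempt {attempt + 1} failed ({e}); retrying in {delay:.0f}s")
            time.sleep(delay)

# Usage sketch: wrap a call that raises on failure, e.g.
# response = with_retries(requests.post, LLM_API_URL, headers=headers, json=payload)
```

Note that it should wrap a call that raises on failure (like `requests.post` with `raise_for_status`), not a function that swallows errors and returns None.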

Putting it All Together and Beyond

By combining Firecrawl’s efficient scraping with the power of LLMs, you can unlock a vast array of possibilities:

  • Content Summarization: Quickly get the gist of long articles, reports, or research papers.
  • Information Extraction: Ask the LLM to pull out specific data points (e.g., dates, names, key metrics) from unstructured text.
  • Question Answering: Build a system that can answer questions based on the content of multiple scraped web pages.
  • Sentiment Analysis: Analyze the tone and sentiment of reviews or comments on a product page.
  • Content Generation: Use scraped data as context for generating new, related content (e.g., drafting a social media post based on a blog article).
  • Custom Chatbots: Train a chatbot on a specific knowledge base created from scraped documentation or FAQs.

Important Considerations:

  • Rate Limits: Be mindful of API rate limits for both Firecrawl and your chosen LLM. Implement delays or backoff strategies if you’re making many requests.
  • Content Length: LLMs have context window limits (the maximum amount of text they can process at once). For very long scraped articles, you might need to implement strategies like:
    • Chunking: Split the scraped content into smaller, manageable chunks and process them individually.
    • Summarization (Pre-processing): Use a smaller, faster LLM to summarize content before sending it to a more powerful LLM for deeper analysis.
  • Ethical Scraping: Always respect robots.txt and the terms of service of the websites you scrape. Avoid overloading servers with too many requests.
  • Error Handling: Robust error handling is crucial for production applications, including retries and logging.
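The chunking strategy mentioned above can be sketched with a helper that splits text on paragraph boundaries while keeping each chunk under a character budget. Character count here is a rough stand-in for tokens; a real implementation would measure with the model's own tokenizer:

```python
def chunk_text(text, max_chars=2000):
    """Split text into chunks of at most max_chars characters,
    preferring paragraph boundaries so chunks stay coherent.
    Character count is a rough proxy for a real token count."""
    paragraphs = text.split("\n\n")
    chunks, current = [], ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars:
            current = candidate
        else:
            if current:
                chunks.append(current)
            # A single paragraph longer than the budget gets a hard split
            while len(para) > max_chars:
                chunks.append(para[:max_chars])
                para = para[max_chars:]
            current = para
    if current:
        chunks.append(current)
    return chunks

# Each chunk could then be sent to send_to_llm() individually, and the
# per-chunk summaries combined in a final summarization pass.
```

This "map then reduce" pattern (summarize each chunk, then summarize the summaries) is a common way to fit long documents into a limited context window.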

Conclusion

This tutorial provides a solid foundation for integrating web scraping with LLMs. Firecrawl simplifies the data acquisition, providing clean, ready-to-use text, while LLMs empower you to derive meaningful insights and generate valuable outputs from that data. Experiment with different prompts and LLM models to discover the full potential of this powerful combination!

Happy scraping and prompting!
