
Use Firecrawl Extract 2.0 for agent-driven web scraping

reference

Scraping complex websites requiring login, navigation, and multi-step interactions fails with traditional crawlers

agents · firecrawl · web-scraping · extraction

Problem

Traditional web scrapers break on modern websites that require authentication, JavaScript rendering, pagination, and multi-step navigation. Sites with dynamic content, infinite scroll, or data gated behind login walls are especially problematic. Tools like BeautifulSoup (an HTML parser) and Scrapy (a crawling framework) cannot handle these interactions without extensive custom code for each target site.

Common failures with traditional scraping:

  • Login walls and session management
  • JavaScript-rendered content that does not exist in raw HTML
  • Paginated results requiring click-through navigation
  • Anti-bot protections and rate limiting
  • Dynamic content loaded via API calls after page render

Solution

Firecrawl Extract 2.0 uses AI agents that can perform multi-step browser interactions to scrape data from complex websites.

Basic extraction with a schema:

import Firecrawl from "@mendable/firecrawl-js";

const firecrawl = new Firecrawl({ apiKey: process.env.FIRECRAWL_API_KEY });

const result = await firecrawl.extract({
  urls: ["https://example.com/products/*"],
  prompt: "Extract all product names, prices, and availability status",
  schema: {
    type: "object",
    properties: {
      products: {
        type: "array",
        items: {
          type: "object",
          properties: {
            name: { type: "string" },
            price: { type: "number" },
            inStock: { type: "boolean" },
          },
        },
      },
    },
  },
});
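Assuming the call resolves with data shaped by the schema above (the exact response envelope may vary by SDK version, so treat the payload as unknown until checked), a small type guard can validate the result before your code relies on it. The `Product` and `ExtractData` names below are illustrative, not part of the SDK:

```typescript
// Shapes implied by the JSON Schema in the extract call above.
interface Product {
  name: string;
  price: number;
  inStock: boolean;
}

interface ExtractData {
  products: Product[];
}

// Narrow an unknown payload to ExtractData before using it.
function isExtractData(value: unknown): value is ExtractData {
  if (typeof value !== "object" || value === null) return false;
  const products = (value as { products?: unknown }).products;
  return (
    Array.isArray(products) &&
    products.every(
      (p) =>
        typeof p === "object" &&
        p !== null &&
        typeof (p as Product).name === "string" &&
        typeof (p as Product).price === "number" &&
        typeof (p as Product).inStock === "boolean",
    )
  );
}

// Sample payload in the shape the schema requests (invented values).
const sample: unknown = {
  products: [{ name: "Widget", price: 9.99, inStock: true }],
};
console.log(isExtractData(sample)); // → true
```

Running the guard before use keeps a schema drift or partial extraction from surfacing as a runtime crash deeper in your pipeline.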

Agent-driven extraction with authentication and navigation:

const result = await firecrawl.extract({
  urls: ["https://dashboard.example.com/analytics"],
  prompt: "Log in, navigate to the analytics tab, and extract monthly revenue figures",
  actions: [
    { type: "click", selector: "#login-button" },
    { type: "fill", selector: "#email", value: "user@example.com" },
    { type: "fill", selector: "#password", value: process.env.SITE_PASSWORD },
    { type: "click", selector: "#submit" },
    { type: "wait", milliseconds: 2000 },
    { type: "click", selector: "[data-tab='analytics']" },
  ],
  schema: {
    type: "object",
    properties: {
      monthlyRevenue: { type: "array", items: { type: "number" } },
      period: { type: "string" },
    },
  },
});
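Agent-driven extractions are long-running and can fail transiently on timeouts or anti-bot challenges, so wrapping the call in a retry helper is prudent. This generic helper is not part of the Firecrawl SDK, just a sketch of one way to do it:

```typescript
// Generic retry-with-exponential-backoff helper (not part of the
// Firecrawl SDK). Retries the given async function up to `attempts`
// times, doubling the delay between tries: 1s, 2s, 4s, ...
async function withRetry<T>(
  fn: () => Promise<T>,
  attempts = 3,
  baseDelayMs = 1000,
): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** i));
    }
  }
  throw lastError;
}
```

Usage would look like `const result = await withRetry(() => firecrawl.extract({ ... }))`, so a single flaky login step does not fail the whole job.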

Why It Works

Firecrawl Extract 2.0 combines a headless browser with an AI agent that interprets page structure and executes multi-step workflows. The agent handles JavaScript rendering, waits for dynamic content, and can navigate authenticated sessions. Because you define a schema, extracted data comes back in a structured format rather than as raw HTML, eliminating the need for custom parsing logic. The agent approach means you describe what you want instead of writing brittle CSS selectors that break when the site changes.
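To make the selector-versus-schema contrast concrete, here is a toy sketch (the HTML snippet and values are invented, and no Firecrawl code is involved): the traditional path parses markup by hand, while the schema path receives the same fields as structured JSON.

```typescript
// Toy illustration: raw-HTML scraping needs site-specific parsing,
// while schema-based extraction returns JSON directly.
const rawHtml =
  '<div class="product"><span class="name">Widget</span><span class="price">$9.99</span></div>';

// Traditional approach: brittle, per-site regex/selector parsing.
const name = rawHtml.match(/class="name">([^<]+)</)?.[1] ?? "";
const price = Number(rawHtml.match(/\$([\d.]+)/)?.[1] ?? NaN);

// Schema approach: the same data arrives already structured.
const extracted = { products: [{ name: "Widget", price: 9.99, inStock: true }] };

console.log(name === extracted.products[0].name); // → true
console.log(price === extracted.products[0].price); // → true
```

The regexes above break the moment the site renames a CSS class; the schema-based call does not reference markup at all, which is why it survives site redesigns.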

Context

  • Firecrawl offers both cloud-hosted and self-hosted options
  • The Extract 2.0 agent feature launched in April 2025 and supports login flows, tab navigation, and pagination
  • For bulk scraping (tens of thousands of pages), contact Firecrawl directly for increased rate limits
  • Alternative tools include Apify for LinkedIn-specific scraping and Scrapin.io for structured LinkedIn data
  • Firecrawl also provides an MCP server for integration with Claude and other AI assistants
About this share

  • Contributor: mblode
  • Repository: mblode/shares
  • Created: Feb 10, 2026