Scraping & Crawling Tools
Sapiom provides advanced web scraping and crawling via Firecrawl. These tools go beyond sapiom_fetch (which returns clean markdown for a single page): scrape with format options, crawl entire sites, map sitemaps, and extract structured data with natural language prompts.
All scraping tools accept an optional agentName parameter for spend attribution.
Scrape
sapiom_scrape
Scrape a single webpage with advanced options. Returns content in requested formats (markdown, html, rawHtml, screenshot). Supports main-content extraction and wait conditions.
- url (string, required): URL of the webpage to scrape
- formats (string[], optional): Output formats: markdown, html, rawHtml, screenshot. Defaults to markdown.
- onlyMainContent (boolean, optional): Extract only the main content, removing navigation, footers, and ads
- waitFor (number, optional): Wait time in milliseconds before scraping (useful for JS-rendered content)
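As an illustrative sketch, a sapiom_scrape argument payload might look like the following. The URL and values are hypothetical; the argument names follow the parameter list above, and how the call is actually issued depends on your MCP client.

```python
# Hypothetical argument payload for sapiom_scrape.
scrape_args = {
    "url": "https://example.com/pricing",   # page to scrape (illustrative)
    "formats": ["markdown", "screenshot"],  # request two output formats
    "onlyMainContent": True,                # drop navigation, footers, ads
    "waitFor": 2000,                        # wait 2s for JS-rendered content
    "agentName": "pricing-agent",           # optional spend attribution
}
```

The agentName key is the optional spend-attribution parameter accepted by all scraping tools; its value here is made up for the example.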
sapiom_crawl
Crawl an entire website starting from a URL. Returns a job ID for async status polling via sapiom_crawl_status. Supports depth limits, page limits, and path filtering.
- url (string, required): Starting URL for the crawl
- maxDiscoveryDepth (number, optional): Maximum link depth to crawl (min: 1)
- limit (number, optional): Maximum pages to crawl (1–10,000)
- includePaths (string[], optional): URL path patterns to include (e.g. ["/docs/*", "/blog/*"])
- excludePaths (string[], optional): URL path patterns to exclude (e.g. ["/admin/*"])
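A hedged sketch of a crawl request follows; the site, paths, and limits are illustrative assumptions, not defaults. The crawl itself is asynchronous: the call returns a job ID, which is then passed to sapiom_crawl_status.

```python
# Hypothetical argument payload for sapiom_crawl.
crawl_args = {
    "url": "https://docs.example.com",    # starting page (illustrative)
    "maxDiscoveryDepth": 2,               # follow links at most two hops deep
    "limit": 100,                         # stop after 100 pages
    "includePaths": ["/docs/*"],          # only crawl documentation paths
    "excludePaths": ["/docs/archive/*"],  # skip archived pages
}

# The returned job ID would then be checked with sapiom_crawl_status,
# e.g. an argument payload of the shape {"id": "<job id from sapiom_crawl>"}.
```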
sapiom_crawl_status
Check the status of a crawl job and retrieve results. Use the job ID returned by sapiom_crawl.
- id (string, required): Crawl job ID from sapiom_crawl
sapiom_map
Map all URLs on a website without extracting content. Fast sitemap discovery — returns a list of all discoverable URLs from the given starting page.
- url (string, required): URL of the website to map
Extract
sapiom_extract
Extract structured data from web pages using a natural language prompt and optional JSON schema. Provide URLs and describe what data to extract — returns an async job ID.
- urls (string[], required): URLs to extract data from
- prompt (string, required): Natural language description of what data to extract (e.g. “Extract all pricing tiers with names, prices, and features”)
- schema (object, optional): JSON schema defining the expected output structure
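To make the prompt-plus-schema shape concrete, here is an illustrative sapiom_extract payload using the example prompt above. The schema is one possible structure for pricing data, not a required or canonical one, and the URL is hypothetical.

```python
# Hypothetical argument payload for sapiom_extract. The schema constrains
# the structured output; without it, the prompt alone drives extraction.
extract_args = {
    "urls": ["https://example.com/pricing"],
    "prompt": "Extract all pricing tiers with names, prices, and features",
    "schema": {
        "type": "object",
        "properties": {
            "tiers": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "name": {"type": "string"},
                        "price": {"type": "string"},
                        "features": {
                            "type": "array",
                            "items": {"type": "string"},
                        },
                    },
                },
            },
        },
    },
}
```

Like the crawl tools, extraction is asynchronous: the returned job ID is polled via sapiom_extract_status.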
sapiom_extract_status
Check the status of an extract job and retrieve results. Use the job ID returned by sapiom_extract.
- id (string, required): Extract job ID from sapiom_extract
Site Search
sapiom_site_search
Search within a specific website’s content. Unlike sapiom_search (web-wide search via Linkup) or sapiom_deep_search (web-wide via You.com), this searches only within the pages of the specified site.
- query (string, required): Search query
- url (string, required): Site URL to search within (e.g. "https://docs.example.com")
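For contrast with the web-wide search tools, a site-scoped query might look like this; the query text and site are illustrative.

```python
# Hypothetical argument payload for sapiom_site_search: the same query sent
# to sapiom_search would search the whole web, while here results come only
# from pages under the given site.
site_search_args = {
    "query": "API rate limits",
    "url": "https://docs.example.com",
}
```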