Scraping & Crawling Tools
Sapiom provides advanced web scraping and crawling via Firecrawl. These tools go beyond sapiom_fetch (which returns clean markdown for a single page): scrape with format options, crawl entire sites, map sitemaps, and extract structured data with natural language prompts.
All scraping tools accept an optional agentName parameter for spend attribution.
Scrape
sapiom_scrape
Scrape a single webpage with advanced options. Returns content in requested formats (markdown, html, rawHtml, screenshot). Supports main-content extraction and wait conditions.
- url (string, required): URL of the webpage to scrape
- formats (string[], optional): Output formats: markdown, html, rawHtml, screenshot. Defaults to markdown.
- onlyMainContent (boolean, optional): Extract only the main content, removing navigation, footers, and ads
- waitFor (number, optional): Wait time in milliseconds before scraping (useful for JS-rendered content)
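As an illustrative sketch, a sapiom_scrape argument payload might look like the following. The URL and values are hypothetical; the argument names follow the parameter list above, and how the call is actually issued depends on your MCP client.

```python
# Hypothetical argument payload for sapiom_scrape.
scrape_args = {
    "url": "https://example.com/pricing",   # page to scrape (illustrative)
    "formats": ["markdown", "screenshot"],  # request two output formats
    "onlyMainContent": True,                # drop navigation, footers, ads
    "waitFor": 2000,                        # wait 2s for JS-rendered content
    "agentName": "pricing-agent",           # optional spend attribution
}
```

The agentName key is the optional spend-attribution parameter accepted by all scraping tools; its value here is made up for the example.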
sapiom_crawl
Crawl an entire website starting from a URL. Returns a job ID for async status polling via sapiom_crawl_status. Supports depth limits, page limits, and path filtering.
- url (string, required): Starting URL for the crawl
- maxDiscoveryDepth (number, optional): Maximum link depth to crawl (min: 1)
- limit (number, optional): Maximum pages to crawl (1–10,000)
- includePaths (string[], optional): URL path patterns to include (e.g. ["/docs/*", "/blog/*"])
- excludePaths (string[], optional): URL path patterns to exclude (e.g. ["/admin/*"])
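A hedged sketch of a crawl request follows; the site, paths, and limits are illustrative assumptions, not defaults. The crawl itself is asynchronous: the call returns a job ID, which is then passed to sapiom_crawl_status.

```python
# Hypothetical argument payload for sapiom_crawl.
crawl_args = {
    "url": "https://docs.example.com",    # starting page (illustrative)
    "maxDiscoveryDepth": 2,               # follow links at most two hops deep
    "limit": 100,                         # stop after 100 pages
    "includePaths": ["/docs/*"],          # only crawl documentation paths
    "excludePaths": ["/docs/archive/*"],  # skip archived pages
}

# The returned job ID would then be checked with sapiom_crawl_status,
# e.g. an argument payload of the shape {"id": "<job id from sapiom_crawl>"}.
```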
sapiom_crawl_status
Check the status of a crawl job and retrieve results. Use the job ID returned by sapiom_crawl.
- id (string, required): Crawl job ID from sapiom_crawl
sapiom_map
Map all URLs on a website without extracting content. Fast sitemap discovery — returns a list of all discoverable URLs from the given starting page.
- url (string, required): URL of the website to map
Extract
sapiom_extract
Extract structured data from web pages using a natural language prompt and optional JSON schema. Provide URLs and describe what data to extract — returns an async job ID.
- urls (string[], required): URLs to extract data from
- prompt (string, required): Natural language description of what data to extract (e.g. “Extract all pricing tiers with names, prices, and features”)
- schema (object, optional): JSON schema defining the expected output structure
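To make the prompt-plus-schema shape concrete, here is an illustrative sapiom_extract payload using the example prompt above. The schema is one possible structure for pricing data, not a required or canonical one, and the URL is hypothetical.

```python
# Hypothetical argument payload for sapiom_extract. The schema constrains
# the structured output; without it, the prompt alone drives extraction.
extract_args = {
    "urls": ["https://example.com/pricing"],
    "prompt": "Extract all pricing tiers with names, prices, and features",
    "schema": {
        "type": "object",
        "properties": {
            "tiers": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "name": {"type": "string"},
                        "price": {"type": "string"},
                        "features": {
                            "type": "array",
                            "items": {"type": "string"},
                        },
                    },
                },
            },
        },
    },
}
```

Like the crawl tools, extraction is asynchronous: the returned job ID is polled via sapiom_extract_status.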
sapiom_extract_status
Check the status of an extract job and retrieve results. Use the job ID returned by sapiom_extract.
- id (string, required): Extract job ID from sapiom_extract
Site Search
sapiom_site_search
Search within a specific website’s content. Unlike sapiom_search (web-wide search via Linkup) or sapiom_deep_search (web-wide via You.com), this searches only within the pages of the specified site.
- query (string, required): Search query
- url (string, required): Site URL to search within (e.g. "https://docs.example.com")
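For contrast with the web-wide search tools, a site-scoped query might look like this; the query text and site are illustrative.

```python
# Hypothetical argument payload for sapiom_site_search: the same query sent
# to sapiom_search would search the whole web, while here results come only
# from pages under the given site.
site_search_args = {
    "query": "API rate limits",
    "url": "https://docs.example.com",
}
```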