fetch_url
Fetch a URL with built-in reliability features: retries, a per-domain circuit breaker, caching, and anti-bot bypass.
Returns both raw HTML and clean markdown. Automatically retries on failure
with exponential backoff, falls back to plain HTTP if the browser fetch fails,
and circuit-breaks domains that are consistently down.
Args:
url: The URL to fetch
use_cache: Whether to use cached results (default: true, TTL 1 hour)
js_render: Whether to render JavaScript (default: true; set to false for faster fetches)
wait_for: CSS selector to wait for before capturing (e.g., '.results-loaded')
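The retry and fallback behavior described for fetch_url can be illustrated with a minimal sketch. This is not the tool's actual implementation: the function name `fetch_with_retry`, its parameters, the attempt count, and the delay values are all hypothetical, and the fetchers are passed in as plain callables so the example stays self-contained.

```python
import time
from typing import Callable, Optional, Sequence


def fetch_with_retry(
    url: str,
    fetchers: Sequence[Callable[[str], str]],
    max_attempts: int = 3,      # assumed value, not taken from the tool
    base_delay: float = 0.5,    # assumed value, not taken from the tool
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Try each fetcher in order, retrying each one with exponential backoff.

    `fetchers` is an ordered list such as [browser_fetch, plain_http_fetch];
    the next fetcher is only tried once the previous one has exhausted its
    attempts.
    """
    last_error: Optional[Exception] = None
    for fetch in fetchers:
        for attempt in range(max_attempts):
            try:
                return fetch(url)
            except Exception as exc:  # a real client would catch narrower errors
                last_error = exc
                # Back off before the next attempt: base, 2*base, 4*base, ...
                if attempt < max_attempts - 1:
                    sleep(base_delay * (2 ** attempt))
    raise RuntimeError(f"all fetchers failed for {url}") from last_error
```

Passing `sleep` as a parameter is a design choice for testability: a caller can substitute a recording function and check the backoff schedule without actually waiting.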
fetch_markdown
Fetch a URL and return clean markdown text optimized for LLM consumption.
Same reliability as fetch_url but returns only the markdown content,
stripping HTML, scripts, and noise. Best when you need the page
content for analysis, summarization, or data extraction.
Args:
url: The URL to fetch
use_cache: Whether to use cached results (default: true)
wait_for: CSS selector to wait for before capturing
check_domain
Check the health status of a domain.
Returns the circuit breaker state: 'closed' (healthy), 'open' (failing),
or 'half_open' (testing recovery). Use this before batch operations to
avoid wasting time on domains that are down.
Args:
domain: The domain to check (e.g., 'example.com')
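The three states reported by check_domain follow the usual circuit breaker pattern. The sketch below shows one common way such a state machine can work; the class name `CircuitBreaker`, the failure threshold, and the recovery timeout are assumptions for illustration, not the tool's actual internals.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Per-domain breaker: 'closed' -> 'open' -> 'half_open' -> 'closed'."""

    def __init__(
        self,
        failure_threshold: int = 3,      # assumed value
        recovery_timeout: float = 30.0,  # assumed value, in seconds
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self._opened_at is None:
            return "closed"      # healthy: requests pass through
        if self._clock() - self._opened_at >= self.recovery_timeout:
            return "half_open"   # timeout elapsed: allow a trial request
        return "open"            # failing: requests are rejected

    def record_failure(self) -> None:
        self._failures += 1
        # A failed trial request, or too many failures, (re)opens the circuit.
        if self.state == "half_open" or self._failures >= self.failure_threshold:
            self._opened_at = self._clock()

    def record_success(self) -> None:
        # Any success resets the breaker to the healthy state.
        self._failures = 0
        self._opened_at = None
```

In a batch job, a caller would skip any domain whose state is 'open' and proceed for 'closed' or 'half_open'.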
cache_stats
Get cache statistics: size and item count.
Useful for monitoring cache utilization and deciding when to clear.
clear_cache
Clear the entire fetch cache.
Use when you need fresh data and don't want to rely on cached results.
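To make the cache tools concrete, here is a minimal sketch of a TTL cache that exposes stats and clear operations. It assumes the 1-hour TTL mentioned for fetch_url; the class name `TTLCache`, its methods, and the stats keys are hypothetical and do not describe the tool's real storage.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """In-memory cache whose entries expire after `ttl_seconds`."""

    def __init__(
        self,
        ttl_seconds: float = 3600.0,  # 1 hour, matching the documented default
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._items: Dict[str, Tuple[float, str]] = {}

    def set(self, key: str, value: str) -> None:
        self._items[key] = (self._clock(), value)

    def get(self, key: str) -> Optional[str]:
        entry = self._items.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if self._clock() - stored_at >= self._ttl:
            del self._items[key]  # expired: evict lazily on read
            return None
        return value

    def stats(self) -> Dict[str, int]:
        # Counts include expired entries that have not been read (and
        # therefore not evicted) yet.
        size = sum(len(v.encode("utf-8")) for _, v in self._items.values())
        return {"items": len(self._items), "size_bytes": size}

    def clear(self) -> None:
        self._items.clear()
```

Checking the stats before clearing lets a caller decide whether a full clear is worth the cost of refetching every page.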