take_screenshot
Capture a screenshot of any URL and return a public image URL. By default captures the full scrollable page. Set fullPage to false for viewport-only capture (recommended for long pages). Returns image dimensions in the response.
screenshot_tablet
Capture a screenshot at iPad viewport (820×1180). By default captures viewport-only (not the full scrollable page). Set fullPage to true for full-page capture. Returns device name, dimensions, and public image URL.
screenshot_responsive
Capture screenshots at desktop (1280×800), tablet (820×1180), and mobile (393×852) viewports in one call. By default captures viewport-only (recommended). Set fullPage to true for full-page captures. Returns all three URLs for responsive comparison.
screenshot_fullpage
Capture a full-page screenshot (entire scrollable content) of any URL. Use max_height to cap extremely long pages and prevent unreadable strips.
screenshot_dark
Capture a full-page screenshot with dark mode (prefers-color-scheme: dark) emulated. Works on sites that support dark mode via CSS media queries.
screenshot_element
Capture a screenshot of a specific element on the page by CSS selector. Only the matched element is captured, not the full page. Automatically waits for the element to appear (SPA-friendly). Use delay for pages that need extra hydration time.
screenshot_pdf
Export a webpage as a PDF document (A4 format with background graphics). Returns a public URL to the PDF file.
list_recent_screenshots
List the most recent screenshots taken with this API key. Returns URLs and metadata.
get_screenshot_status
Check the status of a screenshot job by ID. Returns done/pending/failed and the public URL if ready.
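A polling loop around get_screenshot_status might look like the sketch below. The transport is left out on purpose: get_status is a hypothetical callable that you supply, and the "status" field name is an assumption. Only the done/pending/failed values come from the description above.

```python
import time
from typing import Callable


def wait_for_screenshot(
    job_id: str,
    get_status: Callable[[str], dict],   # hypothetical wrapper around get_screenshot_status
    interval: float = 1.0,
    max_attempts: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the job is done, fails, or the attempt budget runs out."""
    for _ in range(max_attempts):
        result = get_status(job_id)
        status = result.get("status")    # assumed field name
        if status == "done":
            return result
        if status == "failed":
            raise RuntimeError(f"screenshot job {job_id} failed")
        sleep(interval)                  # still pending: wait and retry
    raise TimeoutError(f"screenshot job {job_id} still pending after {max_attempts} attempts")
```

Passing sleep as a parameter keeps the loop testable without real delays.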
browser_navigate
Open a browser and navigate to a URL. Returns a screenshot of the loaded page. Use this to start a browser session — the returned sessionId must be passed to all subsequent browser_ tools. Pass width/height to start with a custom viewport (e.g. 393×852 for mobile). Set record_video to true to record the entire session as a video — the recording URL is returned when browser_close is called. When workflow metadata is provided, the resulting run can surface structured verdicts, summaries, and next actions in the dashboard.
browser_click
Click an element on the current browser page by CSS selector or visible text. Returns a screenshot after clicking.
browser_click_at
Click at specific x,y coordinates on the current browser page. Use this when elements cannot be targeted by CSS selector — such as CAPTCHA checkboxes, canvas elements, iframes, or Cloudflare Turnstile widgets. Returns a screenshot after clicking.
browser_fill
Type text into an input field on the current browser page. Clears the field first, then types the value.
browser_screenshot
Take a screenshot of the current browser page without performing any action.
browser_scroll
Scroll the browser page by a given amount in pixels.
browser_wait_for
Wait for an element to appear on the page, then return a screenshot. Useful after navigation or form submissions.
browser_evaluate
Run JavaScript in the browser page and return the result as text. Useful for extracting data, checking values, or triggering actions.
browser_set_viewport
Resize the browser viewport in an existing session. Useful for testing responsive layouts without starting a new session — e.g. switch between desktop (1280×800), tablet (820×1180), and mobile (393×852). Returns a screenshot after resizing.
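As an illustration of cycling one session through the three viewports named above, the sketch below builds one argument set per device. The sessionId, width, and height names follow the browser_navigate description; whether browser_set_viewport accepts exactly these names is an assumption.

```python
# Viewport sizes quoted in the tool descriptions.
VIEWPORTS = {
    "desktop": (1280, 800),
    "tablet": (820, 1180),
    "mobile": (393, 852),
}


def viewport_steps(session_id: str) -> list:
    """Return one browser_set_viewport argument set per device (argument names assumed)."""
    return [
        {
            "device": device,
            "arguments": {"sessionId": session_id, "width": width, "height": height},
        }
        for device, (width, height) in VIEWPORTS.items()
    ]
```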
browser_close
Close the browser session and free all resources. Always call this when the browser workflow is complete. If the session was started with record_video: true, the video recording URL is returned.
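One way to make sure browser_close always runs is a context manager, sketched below. call_tool is a hypothetical callable standing in for whatever client invokes the tools, and the "url" argument name and the shape of the returned dict are assumptions. Only sessionId is named in the descriptions above.

```python
from contextlib import contextmanager
from typing import Callable, Iterator

ToolCaller = Callable[[str, dict], dict]  # hypothetical: (tool name, arguments) -> result


@contextmanager
def browser_session(call_tool: ToolCaller, url: str, **options) -> Iterator[str]:
    """Start a session with browser_navigate and close it even if the body raises."""
    result = call_tool("browser_navigate", {"url": url, **options})
    session_id = result["sessionId"]      # must be passed to later browser_ tools
    try:
        yield session_id
    finally:
        # Runs on success and on error, so resources are always freed.
        call_tool("browser_close", {"sessionId": session_id})
```

Inside the with block, pass the yielded session ID to any other browser_ tool.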
browser_get_accessibility_tree
Get the accessibility tree of the current page. Returns a structured snapshot of all interactive elements, headings, links, buttons, form fields, images with alt text, and ARIA roles. Prefer this tool over screenshots when the goal is to understand page structure and UX.
browser_get_text
Extract all visible text from the current page. Useful for understanding page content without screenshots. Returns text in reading order.
browser_get_html
Get the HTML of the current page or a specific element. Useful for inspecting DOM structure, class names, and attributes.
browser_hover
Hover over an element on the page. Useful for triggering tooltips, dropdown menus, or hover states. Returns a screenshot after hovering.
browser_select_option
Select an option from a <select> dropdown element. Returns a screenshot after selection.
browser_go_back
Navigate back in browser history (like clicking the Back button). Returns a screenshot of the previous page.
browser_go_forward
Navigate forward in browser history. Returns a screenshot.
browser_console_logs
Get captured console logs (errors, warnings, logs) and JavaScript exceptions from the current browser session. Essential for debugging frontend issues.
browser_network_errors
Get failed network requests (4xx/5xx responses) captured during the browser session. Useful for identifying broken API calls, missing resources, and backend errors.
browser_perf_metrics
Get Core Web Vitals and performance metrics for the current page. Returns LCP, FCP, CLS, TTFB, DOM size, resource counts, and total transfer size. Essential for performance audits.
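To read the returned numbers, each metric can be rated against Google's published thresholds, as in the sketch below. The thresholds are the public web.dev values. The assumption is that timings arrive in milliseconds and CLS as a unitless score, which the description above does not state.

```python
# (good upper bound, needs-improvement upper bound)
# Timings in milliseconds (assumed unit); CLS is unitless.
THRESHOLDS = {
    "LCP": (2500, 4000),
    "FCP": (1800, 3000),
    "CLS": (0.1, 0.25),
    "TTFB": (800, 1800),
}


def rate_metric(name: str, value: float) -> str:
    """Classify one metric as good, needs improvement, or poor."""
    good_max, needs_improvement_max = THRESHOLDS[name]
    if value <= good_max:
        return "good"
    if value <= needs_improvement_max:
        return "needs improvement"
    return "poor"


def rate_all(metrics: dict) -> dict:
    """Rate every known metric in a result dict; unknown keys are ignored."""
    return {name: rate_metric(name, value) for name, value in metrics.items() if name in THRESHOLDS}
```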
browser_network_requests
Get the full network request waterfall with timing data. Shows every request made by the page — URLs, methods, status codes, resource types, durations, and sizes. Use for performance analysis and debugging.
browser_seo_audit
Extract SEO metadata from the current page: title, meta description, Open Graph tags, Twitter cards, canonical URL, heading hierarchy, structured data (JSON-LD), robots directives, and image alt text coverage.
browser_press_key
Press a keyboard key or key combination. Supports special keys like Enter, Tab, Escape, ArrowDown, and modifiers like Control+A, Shift+Tab. Returns a screenshot after pressing.
browser_cookies
Get, set, or clear cookies for the current browser session. Use 'get' to read all cookies (useful for debugging auth), 'set' to add cookies (useful for setting auth tokens), and 'clear' to delete all cookies.
browser_storage
Read or write localStorage and sessionStorage. Use for debugging client-side state, auth tokens, feature flags, and cached data.
find_login_page
Discover login/sign-in pages for a website. Checks the site's sitemap.xml and probes common login URL paths. Returns a list of candidate login URLs found. Use this before attempting to log in to a site.
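The path-probing half of this tool can be pictured roughly as below. The path list is illustrative only, since the paths the tool actually probes are not documented here. The sketch only builds candidate URLs and makes no network requests.

```python
from urllib.parse import urljoin

# Hypothetical examples of common login paths; not the tool's real list.
COMMON_LOGIN_PATHS = ("/login", "/signin", "/sign-in", "/auth/login", "/account/login")


def candidate_login_urls(base_url: str) -> list:
    """Resolve each common path against the site's origin."""
    return [urljoin(base_url, path) for path in COMMON_LOGIN_PATHS]
```

Because every path starts with "/", urljoin discards any path or query string on the base URL and keeps only its scheme and host.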
smart_login
Attempt to log in to a website. Navigates to the login URL, finds email/username and password fields, fills them in, and submits the form with click, Enter, and form-submit fallbacks for Clerk and other multi-step auth UIs. Returns a screenshot and reports whether login succeeded, failed, or needs verification. Always ask the user for credentials first — never guess. If the site requires email verification (OTP code), use read_verification_email to automatically fetch the code from Gmail (requires one-time authorize_email_access setup).
accessibility_snapshot
Get the accessibility tree for any URL without needing a browser session. Returns a structured snapshot of all interactive elements, headings, links, buttons, form fields, images with alt text, and ARIA roles. Great for quick UX audits.
screenshot_diff
Compare two URLs pixel-by-pixel and return a diff overlay image showing exactly what changed. Returns the diff image URL, percentage of pixels changed, total changed pixel count, and a match score.
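How the returned figures relate to each other can be shown on a toy image. This is a minimal sketch, not the service's algorithm: images are plain nested lists of pixel values, and defining the match score as 100 minus the percentage changed is an assumption.

```python
def diff_stats(image_a: list, image_b: list) -> dict:
    """Compare two equally sized images given as rows of pixel values."""
    same_shape = len(image_a) == len(image_b) and all(
        len(row_a) == len(row_b) for row_a, row_b in zip(image_a, image_b)
    )
    if not same_shape:
        raise ValueError("images must have identical dimensions")

    total = sum(len(row) for row in image_a)
    changed = sum(
        pixel_a != pixel_b
        for row_a, row_b in zip(image_a, image_b)
        for pixel_a, pixel_b in zip(row_a, row_b)
    )
    percent = 100.0 * changed / total if total else 0.0
    return {
        "changed_pixels": changed,
        "percent_changed": percent,
        "match_score": 100.0 - percent,   # assumed definition
    }
```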
webhook_list
List all outbound webhook endpoints registered for the current account. Use this to confirm which URLs will receive screenshot.completed, run.completed, run.failed, and quota.warning events.
webhook_create
Register a new outbound webhook endpoint. The signing secret is returned ONCE — store it before doing anything else. Default events=['*'] subscribes to every event type. Available events: screenshot.completed, screenshot.failed, run.completed, run.failed, quota.warning, test.ping.
webhook_test
Fire a test.ping event to a webhook endpoint to verify reachability and signature handling. Returns once the delivery has been enqueued — inspect with webhook_deliveries shortly after.
webhook_rotate
Rotate the signing secret for an endpoint. The new secret is returned once — update your verifier immediately to avoid signature mismatches.
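A verifier that survives rotation could accept both the old and the new secret for a short overlap window, as sketched below. The signing scheme is not documented in this listing, so the hex-encoded HMAC-SHA256 over the raw request body used here is purely an assumption. Check the service's webhook documentation for the real scheme and header name.

```python
import hashlib
import hmac


def sign(secret: str, body: bytes) -> str:
    """Assumed scheme: hex-encoded HMAC-SHA256 of the raw request body."""
    return hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()


def verify(secret: str, body: bytes, signature: str) -> bool:
    """Constant-time comparison to avoid leaking timing information."""
    return hmac.compare_digest(sign(secret, body).encode(), signature.encode())


def verify_with_rotation(secrets: list, body: bytes, signature: str) -> bool:
    """Accept a signature made with any currently trusted secret (for example old and new)."""
    return any(verify(secret, body, signature) for secret in secrets)
```

Once deliveries signed with the new secret are verifying cleanly, drop the old secret from the list.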
webhook_deliveries
List the most recent delivery attempts for a webhook endpoint, including HTTP status, attempt count, and any error message. Use after a test.ping or when debugging customer-reported missed events.
webhook_delete
Permanently delete a webhook endpoint and stop sending events to it. Existing in-flight deliveries are not retried.
screenshot_batch
Capture screenshots of multiple URLs in one call (max 10). Returns an array of results with screenshot URLs and metadata. All screenshots share the same viewport and format settings.
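For lists longer than the 10-URL limit, the caller has to split the work into several calls. A small chunking helper, sketched here with a hypothetical name, is enough:

```python
def chunk_urls(urls: list, size: int = 10) -> list:
    """Split a URL list into consecutive batches of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [urls[start:start + size] for start in range(0, len(urls), size)]
```

Each batch can then be sent as its own screenshot_batch call.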
screenshot_cross_browser
Capture a URL in Chromium, Firefox, and WebKit simultaneously. Returns three screenshot URLs — one per browser engine. Useful for cross-browser visual testing.
find_breakpoints
Detect responsive layout breakpoints for a URL. Scans viewport widths from 320px to 1920px and identifies where significant layout changes occur (large height jumps, content reflows). Returns a list of detected breakpoint widths.
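The "large height jump" heuristic can be illustrated as follows. This is a sketch of the general idea rather than the tool's implementation: it takes page heights already measured at a set of widths, and the 15% jump ratio is an arbitrary assumption.

```python
def detect_breakpoints(heights: dict, jump_ratio: float = 0.15) -> list:
    """Return widths where page height changes sharply versus the previous width.

    `heights` maps viewport width (px) to measured page height (px).
    """
    widths = sorted(heights)
    breakpoints = []
    for previous, current in zip(widths, widths[1:]):
        before, after = heights[previous], heights[current]
        # A large relative change in height suggests the layout reflowed.
        if before and abs(after - before) / before >= jump_ratio:
            breakpoints.append(current)
    return breakpoints
```

With heights of 4000, 3900, 2600, 2550, 1800, and 1790 px at widths 320, 480, 768, 1024, 1280, and 1920, the function reports 768 and 1280.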
ux_review
Run an AI-powered UX review on any URL. Captures a screenshot and analyzes it along with accessibility tree, SEO metadata, and performance metrics using Kimi k2.5 vision. Returns actionable UX feedback across categories: Accessibility, SEO, Performance, Navigation, Content, and Mobile-friendliness.
authorize_email_access
One-time setup: Connect the user's Gmail account via OAuth so the AI can read verification emails automatically. Returns an authorization URL the user must visit. After authorizing, the AI can use read_verification_email to fetch OTP codes.
read_verification_email
Read the latest email verification code (OTP) from the user's Gmail inbox. Use this after smart_login encounters a verification code screen. The user must have previously authorized Gmail access via authorize_email_access. Searches recent emails for verification codes from common senders (Clerk, Auth0, etc.).
auth_test_assist
Start here for website login, sign-up, and verification testing. This is the shared auth entrypoint for MCP and CLI workflows. It reuses your saved inbox/password, checks remembered auth state for the site's normalized origin, and returns reusable auth strategy plus site-specific signals such as recommended auth path, account-exists confidence, likely auth method, expected follow-up, and known-site history. Call it again with action='record' after auth attempts to save what worked.
create_test_inbox
Standalone inbox helper for testing. Creates or reuses the saved primary disposable email inbox. When the task is website auth, call auth_test_assist first so you also get reusable cross-site strategy and remembered per-site guidance. Returns email, password, inbox ID, and known-site history for the reusable inbox.
check_inbox
Check a disposable AgentMail inbox for new messages. Use after create_test_inbox to read verification emails, OTP codes, welcome emails, or password reset links. Automatically extracts verification codes from email content.
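Code extraction of this kind is often a regular expression over the message text. The sketch below shows one plausible approach and is not the tool's actual logic: it assumes codes are standalone runs of 4 to 8 digits and returns the first match.

```python
import re
from typing import Optional

# A run of 4-8 digits that is not part of a longer number (assumed code format).
CODE_PATTERN = re.compile(r"(?<!\d)\d{4,8}(?!\d)")


def extract_code(text: str) -> Optional[str]:
    """Return the first digit run that looks like a verification code, if any."""
    match = CODE_PATTERN.search(text)
    return match.group(0) if match else None
```

A pattern this simple also matches years and order numbers, so real messages may need sender-specific rules.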
send_test_email
Send an email from a disposable AgentMail inbox. Useful for testing contact forms, reply workflows, or sending test data to services.
solve_captcha
Automatically solve CAPTCHAs on the current page using CapSolver AI. Supports Cloudflare Turnstile, reCAPTCHA v2/v3, and hCaptcha. Detects the CAPTCHA type and sitekey automatically, sends it to CapSolver for solving, injects the token, and optionally submits the form. Use this when a CAPTCHA blocks form submission during browser automation.
og_preview
Preview how a URL will look when shared on social media. Extracts all Open Graph and Twitter Card meta tags from the rendered page, validates them, screenshots the og:image, and generates a social card mockup. Works with JS-rendered pages (SPAs). No browser session needed.
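The extraction and validation step can be approximated with the standard library, as sketched below. It parses static HTML only, so unlike the tool it does not render JavaScript. The four required properties are the ones the Open Graph protocol defines; the function and class names are hypothetical.

```python
from html.parser import HTMLParser

# Properties the Open Graph protocol marks as required for every page.
REQUIRED_OG = ("og:title", "og:type", "og:image", "og:url")


class SocialMetaParser(HTMLParser):
    """Collect og:* and twitter:* meta tags; the first value for each key wins."""

    def __init__(self):
        super().__init__()
        self.tags = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        key = attributes.get("property") or attributes.get("name")
        content = attributes.get("content")
        if key and content is not None and key.startswith(("og:", "twitter:")):
            self.tags.setdefault(key, content)


def check_social_meta(html: str):
    """Return (tags found, required Open Graph properties that are missing)."""
    parser = SocialMetaParser()
    parser.feed(html)
    missing = [key for key in REQUIRED_OG if key not in parser.tags]
    return parser.tags, missing
```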