extract_document
Extract raw text content from a document. Supports PDF (pdf-parse), Word/DOCX (mammoth), images via OCR (tesseract.js), Excel/CSV (xlsx), and plain text. Accepts a remote URL, a local file path, or base64-encoded bytes. Returns the extracted text plus character and word counts.
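As a rough illustration of the "text plus character and word counts" result, here is a minimal TypeScript sketch. The `ExtractionResult` interface, its field names, and the `withCounts` helper are assumptions made for this example, not the tool's documented schema, and the tool may count words differently.

```typescript
// Hypothetical result shape; field names are assumptions, not the tool's schema.
interface ExtractionResult {
  text: string;
  characterCount: number;
  wordCount: number;
}

// Count characters (UTF-16 code units) and whitespace-separated words
// in text that has already been extracted.
function withCounts(text: string): ExtractionResult {
  const trimmed = text.trim();
  const wordCount = trimmed === "" ? 0 : trimmed.split(/\s+/).length;
  return { text, characterCount: text.length, wordCount };
}

// Made-up sample text, used only to exercise the helper.
const sample = withCounts("Invoice 42\nTotal due: 100 USD");
console.log(sample.characterCount, sample.wordCount);
```

Splitting on whitespace is only one possible definition of a word; it treats tokens such as `due:` as single words.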
summarize_document
Extract text from a document and summarize it with Claude AI. Works with any supported format (PDF, DOCX, image, Excel/CSV, plain text). Alternatively, pass pre-extracted text directly via the `text` field. Offers three summary styles: concise (default), detailed, and bullet-points.
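The style option could be modeled on the client side roughly as follows. The three style names and the concise default come from the description above; the `resolveStyle` helper and the idea of passing the style as a plain string are assumptions for illustration only.

```typescript
// Style names taken from the description; how the tool receives them is an assumption.
const SUMMARY_STYLES = ["concise", "detailed", "bullet-points"] as const;
type SummaryStyle = (typeof SUMMARY_STYLES)[number];

// Fall back to "concise" when no style is given; reject unknown values.
function resolveStyle(style?: string): SummaryStyle {
  if (style === undefined) return "concise";
  const match = SUMMARY_STYLES.find((s) => s === style);
  if (match === undefined) {
    throw new Error(`Unknown summary style: ${style}`);
  }
  return match;
}

console.log(resolveStyle(), resolveStyle("bullet-points"));
```

Validating the style before the call keeps a typo such as `bullets` from reaching the tool, whose behavior for unknown styles is not stated here.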
parse_fields
Extract specific named fields from a document using Claude AI. Returns a JSON object keyed by the requested field names, with each extracted value as a string (null when a field is not found). Suited to structured extraction from invoices, contracts, receipts, forms, and similar documents. Accepts either pre-extracted text via `text` or a document source.
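The returned object can be pictured as a map from each requested field name to a string or null. The sketch below shows one way a caller might enforce that shape on a raw response; `ParsedFields`, `normalizeFields`, and the sample invoice values are hypothetical and are not part of the tool.

```typescript
// Shape described above: requested names as keys, string values, null when not found.
type ParsedFields = Record<string, string | null>;

// Build a result containing exactly the requested keys: string values are kept,
// and anything missing or non-string becomes null.
function normalizeFields(
  requested: string[],
  raw: Record<string, unknown>,
): ParsedFields {
  const result: ParsedFields = {};
  for (const name of requested) {
    const value = raw[name];
    result[name] = typeof value === "string" ? value : null;
  }
  return result;
}

// Made-up example: `due_date` was requested but not found, `vendor` was not requested.
const parsed = normalizeFields(
  ["invoice_number", "total", "due_date"],
  { invoice_number: "INV-001", total: "100.00", vendor: "Acme" },
);
console.log(JSON.stringify(parsed));
// prints {"invoice_number":"INV-001","total":"100.00","due_date":null}
```

Because values arrive as strings, amounts and dates still need parsing by the caller before any arithmetic or comparison.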
analyze_document
Answer a specific question about a document using Claude AI. Works with any supported format (PDF, DOCX, image, Excel/CSV, plain text). Alternatively, pass pre-extracted text directly via the `text` field.
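The choice between pre-extracted text and a document source might be handled along these lines. Only the `text` field is named in the descriptions; `question`, `source`, and the rule that `text` takes precedence when both are supplied are assumptions made for this sketch.

```typescript
// Hypothetical argument shape: only `text` is named in the descriptions;
// `question` and `source` are placeholder names.
interface AnalyzeArgs {
  question: string;
  text?: string;
  source?: string;
}

type ResolvedInput =
  | { kind: "text"; text: string }
  | { kind: "source"; source: string };

// Prefer pre-extracted text so extraction can be skipped (an assumed precedence);
// otherwise require a document source.
function resolveInput(args: AnalyzeArgs): ResolvedInput {
  if (args.text !== undefined && args.text.trim() !== "") {
    return { kind: "text", text: args.text };
  }
  if (args.source !== undefined && args.source !== "") {
    return { kind: "source", source: args.source };
  }
  throw new Error("Provide either `text` or a document source");
}

const resolved = resolveInput({ question: "Who signed?", text: "Signed by A. Doe" });
console.log(resolved.kind);
```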