
NarrateAI

REMOTE
narrateai/video

Stop recording your voice. NarrateAI adds professional voiceover to silent screen recordings — automatically. Just drop a video URL and get a narrated demo, dubbed video, or polished document back. Works from Cursor, Claude, ChatGPT, or any MCP client.

**What it does:**

- Narrates silent videos with AI voiceover — no mic, no script writing, no editing
- 7 AI voices + voice cloning from a 15-second audio sample
- Transcribes speech-to-text from any video (meetings, podcasts, tutorials)
- Translates and dubs videos into any language with cloned speaker voice
- Converts screen recordings into professional docs (5 templates: onboarding guides, tutorials, feature showcases, product docs, business overviews)
- Text-to-speech for any text

**18 MCP tools** — single videos, batch processing (up to 5), transcript editing, voice cloning, TTS, and more. Free tier: 5 minutes of video processing. Upgrade anytime at [narrateai.app](https://narrateai.app).

Tools: 18
Indexed: 7d ago
Deployment: remote
Endpoint: https://video--narrateai.run.tools
Tools (18)
get_job_result
Check job status and result. Poll every 60 seconds — do NOT poll more frequently. Video processing typically takes 3-5 minutes. Progress may stay at 20% during frame analysis for 1-3 minutes — this is completely normal. Do NOT interpret slow progress as failure. Only report failure when status is "failed" with an error message. Returns: status (processing | transcript_ready | completed | failed | poll_error). poll_error means a temporary connection issue — the job is still running, just retry. transcript_ready includes transcript. completed includes video_url.
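The status handling above can be sketched as a small client-side decision function. This is an illustration only, not part of the server: the status strings come from the tool description, while the action names and the `POLL_INTERVAL_SECONDS` constant are made up for the sketch.

```python
# Poll interval mandated by the tool description: never more often than 60 s.
POLL_INTERVAL_SECONDS = 60

def next_action(status):
    """Map a get_job_result status to the agent's next step."""
    if status in ("processing", "poll_error"):
        # poll_error is a temporary connection issue; the job keeps running.
        return "wait_and_retry"
    if status == "transcript_ready":
        return "read_transcript"
    if status == "completed":
        return "read_video_url"
    if status == "failed":
        return "report_error"
    raise ValueError(f"unknown status: {status!r}")
```

Note that slow progress (e.g. sitting at 20% during frame analysis) maps to `wait_and_retry`, never to `report_error`.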
get_upload_url
GET A SIGNED UPLOAD URL for uploading a local video to NarrateAI cloud storage. Use this ONLY when running in HTTP/remote mode and the user has a local video file. After getting the URL, upload the file with curl, then pass the returned temp_file_path to any processing tool as the video_source. For stdio/local mode this is NOT needed — tools can read local files directly. Returns: JSON with upload_url, temp_file_path, and a ready-to-use curl command.
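A minimal sketch of the remote-mode upload flow described above. The field names `upload_url` and `temp_file_path` come from the tool description; the exact curl flags are an assumption, since the tool already returns a ready-to-use command you can run verbatim instead.

```python
import json
import shlex

def build_upload_command(upload_response, local_path):
    """Parse the get_upload_url JSON and build a curl upload command.

    Returns (curl_command, temp_file_path); pass temp_file_path as the
    video_source to any processing tool afterwards.
    """
    data = json.loads(upload_response)
    cmd = (
        f"curl -X PUT --upload-file {shlex.quote(local_path)} "
        f"{shlex.quote(data['upload_url'])}"
    )
    return cmd, data["temp_file_path"]
```

In stdio/local mode none of this is needed: the tools read local files directly.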
generate_narration_script
NARRATION SCRIPT – generates an AI-written timed script for a SILENT video. No audio output. Use when the user wants a timed narration script, text-only narration, or sync data for a silent video. This does NOT extract existing speech (use transcribe_video for that). This does NOT produce a video file (use narrate_video_full for that). Runs as a background task with progress reporting. Processing takes 1-5 minutes. Returns: JSON with transcript, job_id, db_job_id.
narrate_video_full
FULL NARRATED VIDEO – produces a downloadable video with AI voiceover. Use when the user wants: "narrate this video", "add voiceover", "make a narrated video".

VOICE OPTIONS — ask the user which they prefer:
1. AI voice: male1 (default, fastest), female1 (default, fastest), female2, female3, female4, male2, male3
2. Voice cloning: user provides an audio file (voice_sample) and their voice is cloned for the narration

If voice_sample is provided, it takes priority over voice_type. Runs as a background task with progress reporting. Processing takes 2-5 minutes. Returns: JSON with video_url, transcript, job_id when done.
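The voice precedence rule (voice_sample beats voice_type) is shared by narrate_video_full, generate_tts, and continue_to_full_video, and can be sketched as follows. Parameter names come from the tool descriptions; the returned dict shape is an assumption for illustration.

```python
VALID_VOICES = {"male1", "male2", "male3", "female1", "female2", "female3", "female4"}

def resolve_voice(voice_type=None, voice_sample=None):
    """Build voice parameters: a provided voice_sample always wins over voice_type."""
    if voice_sample:
        return {"voice_sample": voice_sample}  # clone the user's voice
    voice = voice_type or "male1"  # male1 is listed as a default
    if voice not in VALID_VOICES:
        raise ValueError(f"unknown voice: {voice!r}")
    return {"voice_type": voice}
```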
abandon_job
Abandon/cancel a processing job. Call this when the user cancels on the agent side. Stops the backend from continuing audio generation and video assembly. Use after narrate_video_transcript or when continue_to_full_video was started but user cancelled. Returns: JSON with success or error.
transcribe_video
TRANSCRIPTION ONLY – video with existing voice -> speech-to-text -> timed transcript. No translation, no narrated video. Returns original speech as-is. Use when the user wants to transcribe a video that already has spoken audio (podcast, interview, meeting recording, etc.). CRITICAL: source_language is REQUIRED. If the user does not specify the language of the video, you MUST ask them concisely before calling. Supported: en, zh, yue, fr, de, it, ja, ko, pt, ru, es (Qwen) + others via Whisper. Runs as a background task with progress reporting. Processing takes 1-5 minutes. Returns: JSON with transcript, job_id, db_job_id when done.
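The language routing implied above can be sketched client-side. This only predicts which engine will likely handle a given source_language; the routing is inferred from the tool description, and the server decides for real.

```python
# Languages listed as Qwen-supported in the transcribe_video description.
QWEN_LANGUAGES = {"en", "zh", "yue", "fr", "de", "it", "ja", "ko", "pt", "ru", "es"}

def transcription_backend(source_language):
    """Qwen handles the listed languages; anything else falls back to Whisper."""
    return "qwen" if source_language in QWEN_LANGUAGES else "whisper"
```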
transcribe_and_translate
TRANSCRIBE & TRANSLATE (new upload) – video with voice -> speech-to-text -> translate -> translated transcript. No TTS, no video output. Returns translated timed transcript only. Use when the user uploads a new video and wants a translated transcript (e.g. Spanish podcast -> English transcript). CRITICAL: source_language and target_language are REQUIRED. Ask user if not specified. Runs as a background task with progress reporting. Processing takes 1-5 minutes.
translate_existing_video
TRANSLATION (existing video) – Translate transcript of a video already in the user's library. Loads transcript from cloud, translates, returns. No upload. Sync – returns immediately. Use when the user wants to translate a video they already narrated/dubbed with NarrateAI (e.g. "Translate my video X to French"). job_id is the completed video's job ID. CRITICAL: source_language and target_language are REQUIRED. For narrated videos, source is typically the narration language (e.g. en). For dubbed videos, source is the dubbed language.
dub_video_full
FULL AUTO-DUBBING – transcribe -> translate -> extract speaker voice -> TTS with cloned voice -> dubbed video. No refinement screen. Uses the video's own speaker voice for the dubbed audio. Use when the user wants a complete dubbed video (e.g. Spanish video -> English dubbed). CRITICAL: source_language, target_language, and preserve_background_music are REQUIRED. Agent MUST ask user for all three if not specified. For preserve_background_music: ask if the video has background music they want to keep (true) or replace with silence (false). Runs as a background task with progress reporting. Processing takes 2-5 minutes.
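The three REQUIRED parameters above lend themselves to a guard that refuses to call the tool until the user has answered. The parameter names are from the description; the helper itself and its error wording are hypothetical.

```python
def dub_request(video_source, source_language=None, target_language=None,
                preserve_background_music=None):
    """Assemble a dub_video_full call, refusing to guess the three required fields."""
    missing = [name for name, val in [
        ("source_language", source_language),
        ("target_language", target_language),
        ("preserve_background_music", preserve_background_music),
    ] if val is None]
    if missing:
        # The agent should ask the user rather than default these.
        raise ValueError(f"ask the user for: {', '.join(missing)}")
    return {
        "video_source": video_source,
        "source_language": source_language,
        "target_language": target_language,
        "preserve_background_music": preserve_background_music,
    }
```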
generate_document
DOCUMENT GENERATION – produces a structured markdown document from a silent video. Use when the user wants: a document, article, guide, tutorial, or written content based on a video. NOT for narrated video or voiceover.

The agent MUST ask which document type the user wants before calling:
- user_onboarding: Step-by-step onboarding guide
- tutorial_guide: Tutorial/how-to guide
- feature_showcase: Feature showcase document
- business_overview: Business overview document
- product_documentation: Product documentation

Also returns a synced transcript as a bonus – offer it to the user after the document is delivered ("I also have a synced transcript for this video, would you like it?"). Runs as a background task with progress reporting. Processing takes 1-5 minutes. Returns: JSON with document_markdown, document_data, transcript, job_id, db_job_id.
generate_tts
TEXT-TO-SPEECH – generate audio from text. Returns a downloadable audio URL. Use when the user wants: "read this aloud", "generate speech", "text to speech", "convert text to audio", "make an audio file from this text".

VOICE OPTIONS — ask the user which they prefer:
1. AI voice: male1 (default, fastest), female1 (default, fastest), female2, female3, female4, male2, male3
2. Voice cloning: user provides an audio file (voice_sample) and their voice is cloned

If voice_sample is provided, it takes priority over voice_type. Returns: JSON with audio_url, text, voice, language.
narrate_batch
BATCH NARRATION – narrate multiple videos in parallel. Each gets a full narrated video with voiceover. Use when the user has multiple videos to narrate (e.g. "narrate these 3 videos"). Maximum 5 videos per batch. Each video is processed independently – one failure does not affect others.

If the user does not specify a voice, ask them ONCE (applies to all videos). Voice options: male1 (default, fastest), female1 (default, fastest), female2, female3, female4, male2, male3.

CRITICAL – Context handling: Before calling, you MUST ask the user about context:
1. "Would you like to provide the same context for all videos, different context per video, or no context?"
2. If same for all: use manual_context.
3. If different: use contexts_json.
4. If none: leave both empty.

Runs as a background task. Processing takes 2-10 minutes depending on video count and length. Returns: JSON array of results with video_url per video.
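The context-handling branches above (shared with batch_generate_scripts) can be sketched as a small mapper from the user's answer to call parameters. The parameter names manual_context and contexts_json come from the description; the choice labels and the exact JSON shape for per-video contexts are assumptions.

```python
import json

def batch_context_params(choice, contexts=None):
    """Translate the user's context choice into batch-tool parameters."""
    if choice == "same":
        return {"manual_context": contexts}             # one string for all videos
    if choice == "different":
        return {"contexts_json": json.dumps(contexts)}  # e.g. one entry per video
    if choice == "none":
        return {}
    raise ValueError(f"unknown choice: {choice!r}")
```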
batch_generate_scripts
BATCH SCRIPT GENERATION – generate AI narration scripts for multiple silent videos in parallel. Each video gets a timed narration script (text only, no audio). Maximum 5 videos per batch. One failure does not affect others. CRITICAL – Context handling: Before calling, ask the user about context: 1. Same for all -> manual_context. 2. Different per video -> contexts_json. 3. No context -> leave both empty. Runs as a background task. Processing takes 2-10 minutes.
batch_transcribe
BATCH TRANSCRIPTION – transcribe speech from multiple videos in parallel. Each video must have existing spoken audio. Returns timed transcript per video. CRITICAL: source_language is REQUIRED – ask user if not specified. Applies to all videos. Maximum 5 videos per batch. One failure does not affect others. Runs as a background task. Processing takes 2-10 minutes.
batch_dub
BATCH DUBBING – dub multiple videos into another language in parallel. Each video gets full auto-dubbing (transcribe -> translate -> voice clone -> dubbed video). CRITICAL: source_language, target_language, preserve_background_music are REQUIRED – ask user. All videos share the same languages and music setting. Maximum 5 videos per batch. One failure does not affect others. Runs as a background task. Processing takes 2-10 minutes.
update_transcript
UPDATE TRANSCRIPT – edit the narration script before continuing to full video. Use after generate_narration_script returns a transcript and the user wants to change wording, timing, or content of specific segments. The user describes changes naturally; you apply them and call this tool with the updated segments.

Also used in the translate-then-re-narrate flow: after translate_existing_video returns a translated transcript, call this with the translated segments, reset_for_reprocessing=True, and target_language set to the translation language (e.g. "hi", "fr") to prepare the completed job for re-narration via continue_to_full_video.

The transcript_json must include ALL segments (not just changed ones) — it replaces the full transcript. Each segment needs: start_time, end_time, text. Optionally: pause_duration, chunk_type. After updating, the user can call continue_to_full_video with the same job_id. Returns: JSON with success status or error.
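Because update_transcript replaces the full transcript, it is worth validating the payload before calling. The required keys (start_time, end_time, text) are from the description; the duration check is an extra sanity assumption, not a documented server rule.

```python
def validate_transcript(segments):
    """Check that an update_transcript payload replaces the FULL transcript.

    Required keys per the description: start_time, end_time, text.
    pause_duration and chunk_type are optional and not checked here.
    """
    if not segments:
        raise ValueError("transcript_json must include ALL segments, not an empty list")
    for i, seg in enumerate(segments):
        for key in ("start_time", "end_time", "text"):
            if key not in seg:
                raise ValueError(f"segment {i} missing required key {key!r}")
        if seg["end_time"] <= seg["start_time"]:
            raise ValueError(f"segment {i} has non-positive duration")
```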
list_videos
LIST VIDEOS – get the user's video library (previously processed videos). Use when the user wants to see their existing videos, re-translate a previously narrated video, or work with videos they already processed. Returns paginated list with job IDs, filenames, status, and timestamps. The returned job IDs can be used with translate_existing_video to translate a completed video's transcript, or with get_job_result to check status. Returns: JSON with jobs array (id, filename, status, language, created_at, updated_at), total count, page, and per_page.
continue_to_full_video
Continue from transcript to full narrated video. Use after generate_narration_script returns a transcript and the user is satisfied with it.

VOICE OPTIONS — ask the user which they prefer:
1. AI voice: male1 (default, fastest), female1 (default, fastest), female2, female3, female4, male2, male3
2. Voice cloning: user provides an audio file (voice_sample) and their voice is cloned for the narration

If voice_sample is provided, it takes priority over voice_type. Runs as a background task with progress reporting. Processing takes 1-3 minutes. Returns: JSON with video_url, job_id when done.
Indexed from Smithery · Updates nightly