MCP Tools Reference

Voice Mirror provides 45 MCP (Model Context Protocol) tools across 5 dynamically loadable groups, enabling Claude Code, OpenCode, and other MCP-aware agents to interact with the system. The MCP server is a native Rust binary (voice-mirror-mcp) that communicates over stdio JSON-RPC.
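As a rough illustration of the wire format, the sketch below builds a single tools/call request the way a generic MCP client might. It assumes the server follows the standard MCP stdio transport (one JSON-RPC 2.0 message per line) and that the initialize handshake has already completed. make_tool_call is a hypothetical helper, not part of Voice Mirror.

```python
import json


def make_tool_call(request_id, tool_name, arguments):
    """Serialize one MCP tools/call request as a single line of JSON."""
    request = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    # Assumption: newline-delimited framing, as in the standard MCP stdio transport.
    return json.dumps(request) + "\n"


line = make_tool_call(
    1,
    "voice_send",
    {"instance_id": "voice-claude", "message": "Build finished."},
)
```

A client would write line to the stdin of the voice-mirror-mcp process and read the matching response from its stdout.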

Tool groups load on demand. Two groups (core and capture) are always loaded; the rest load when the active profile enables them or when a message mentions a relevant keyword. Any other loaded group that goes unused for about 15 consecutive tool calls is automatically unloaded to reduce context size, unless the active profile pins it.

| Group | Always loaded | Tools | Count |
|---|---|---|---|
| core | yes | voice_send, voice_inbox, voice_listen, voice_status, get_logs | 5 |
| capture | yes | capture_list_windows, capture_window, capture_browser, sandbox_start, sandbox_attach, sandbox_snapshot, sandbox_screenshot, sandbox_click, sandbox_type, sandbox_close_window, list_ports | 11 |
| memory | no | memory_search, memory_get, memory_remember, memory_forget, memory_stats, memory_flush | 6 |
| browser | no | browser_action | 1 |
| n8n | no | workflow / execution / credential / tag / node tools | 22 |

Core Group (5 tools)

Core voice communication and diagnostics between the AI agent and Voice Mirror.

voice_send

Send a message to the Voice Mirror inbox. The message is spoken aloud via TTS.

{
  "instance_id": "voice-claude",
  "message": "Hello! I've finished the task.",
  "thread_id": "optional-thread-id",
  "reply_to": "optional-message-id"
}

Required: instance_id, message.

voice_inbox

Read messages from the inbox. Voice queries from the user appear here.

{
  "instance_id": "voice-claude",
  "include_read": false,
  "limit": 10,
  "mark_as_read": true
}

Required: instance_id.

voice_listen

Wait for new voice input from the user. Blocks until a message arrives or the timeout expires.

{
  "instance_id": "voice-claude",
  "from_sender": "user",
  "thread_id": "optional-thread-filter",
  "timeout_seconds": 300
}

Required: instance_id, from_sender. timeout_seconds defaults to 300 (maximum 600).
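A minimal sketch of building voice_listen arguments with the documented default and maximum timeout. Checking the range on the client side is an assumption made for illustration: this page does not say whether the server rejects or clamps out-of-range values. listen_args is a hypothetical helper.

```python
DEFAULT_TIMEOUT_SECONDS = 300  # documented default
MAX_TIMEOUT_SECONDS = 600      # documented maximum


def listen_args(instance_id, from_sender,
                timeout_seconds=DEFAULT_TIMEOUT_SECONDS, thread_id=None):
    """Build the arguments object for a voice_listen call."""
    # Assumption: validate locally instead of relying on server behaviour.
    if not 0 < timeout_seconds <= MAX_TIMEOUT_SECONDS:
        raise ValueError(
            f"timeout_seconds must be between 1 and {MAX_TIMEOUT_SECONDS}"
        )
    args = {
        "instance_id": instance_id,
        "from_sender": from_sender,
        "timeout_seconds": timeout_seconds,
    }
    if thread_id is not None:
        args["thread_id"] = thread_id  # optional thread filter
    return args
```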

voice_status

Update or list the presence status of Claude instances.

{
  "instance_id": "voice-claude",
  "action": "update",
  "status": "active",
  "current_task": "Working on docs"
}

Required: instance_id. action is update or list; status is active or idle.

get_logs

Query Voice Mirror’s structured output logs. Without a channel, returns a summary of all channels with entry counts. With a channel name, returns actual log lines.

{
  "channel": "preview",
  "level": "info",
  "last": 100,
  "search": "stream"
}

System channels: app (Voice Mirror core), cli (CLI provider), voice (voice pipeline), mcp (MCP server), browser (browser bridge), frontend (frontend errors), preview (App Preview window-follow + streaming). Project channels are dynamic — created when dev servers start — and contain build logs plus browser console output for the project being developed.

Capture Group (11 tools) — Always loaded

This is the headline group. It powers the App Preview “see and drive” loop: a live, true-size view of the app you are building, where the AI looks at the same running surface the user does, then reads its element tree and clicks/types to test it.

How the AI sees the app:

  • CDP screencast for web, Tauri, WebView2, and Electron apps.
  • Windows Graphics Capture for native windows rendered at true window size.

How the AI drives the app:

  • It reads an accessibility/element tree exposed as @e{n} refs, then clicks and types by ref.
  • CDP engine -> web / Tauri / WebView2 / Electron apps.
  • UI Automation (UIA) engine -> native Windows apps (Notepad, Calculator, Settings, Win32/WinForms/WPF/Qt).
  • Both engines present the same tool surface and the same @ref model — the AI cannot tell which engine is underneath. Driving native Windows apps this way is genuinely novel.

The live preview auto-follows whichever window you or the AI last touched (two-way focus sync).

Windows only: App Preview and native-app driving currently run only on Windows.

capture_list_windows

List all visible desktop windows (title, process name, dimensions). Use this to find a target before capturing or driving it.

{
  "filter": "notepad"
}

filter is an optional case-insensitive substring match on title or process name.

capture_window

Take a screenshot of a specific desktop window. Use capture_list_windows first, then capture by title substring or exact hwnd.

{
  "title": "Calculator",
  "hwnd": 132456
}

Provide title or hwnd (HWND is more precise). Returns the screenshot as an image.

capture_browser

Screenshot the Lens browser preview at its exact current viewport size: the web app or site the user is building, as the user sees it. Prefer this over capture_window for previewing localhost apps/sites. No parameters.

sandbox_start

Call this first when starting work on a desktop app (e.g. a Tauri app). It launches the app with remote debugging on a safe port and opens the live App Preview so both you and the user see it running.

{
  "path": "C:/projects/my-tauri-app"
}

path is optional — omit it to launch Voice Mirror’s active project.

sandbox_attach

Register an app you already launched yourself (with --remote-debugging-port=PORT) as the active sandbox and open the live App Preview for it. Use this instead of sandbox_start when you launched the app in a terminal.

{
  "port": 9333
}

Required: port (must not be Voice Mirror’s own port 9222).
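The port rule above can be checked before calling the tool. This is a small sketch: attach_args is a hypothetical helper, and the TCP range check is an extra assumption beyond what the page states.

```python
VOICE_MIRROR_PORT = 9222  # reserved for Voice Mirror itself, per the note above


def attach_args(port):
    """Build sandbox_attach arguments, refusing Voice Mirror's own port."""
    if not 1 <= port <= 65535:
        raise ValueError("port must be a valid TCP port (1-65535)")
    if port == VOICE_MIRROR_PORT:
        raise ValueError("port 9222 belongs to Voice Mirror; pick another port")
    return {"port": port}
```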

sandbox_snapshot

See the structure of the app. Returns the accessibility tree as @ref element handles plus a windows list of the app’s open windows ([index] title — url). Call this first, then use the refs with sandbox_click / sandbox_type.

{
  "port": 9333,
  "hwnd": 132456,
  "window": "settings"
}

The example lists every parameter for reference; a real call passes either port or hwnd, not both.

  • Omit port/hwnd to use the active sandbox app launched by Voice Mirror.
  • Pass hwnd to drive a native (non-CDP) app via UI Automation. hwnd and port are mutually exclusive.
  • window selects a secondary window from a previous snapshot’s windows list. Prefer a URL/route substring or an index, since apps often reuse the same window title. Subsequent sandbox_click / sandbox_type calls act on whichever window you last snapshotted.
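The targeting rules in the list above can be sketched as a small argument builder. snapshot_args is a hypothetical helper; enforcing the port/hwnd exclusivity on the client is an assumption for illustration, since the page does not describe how the server reacts when both are sent.

```python
def snapshot_args(port=None, hwnd=None, window=None):
    """Build sandbox_snapshot arguments for one of the three targeting modes."""
    if port is not None and hwnd is not None:
        # port selects a CDP app, hwnd selects a native app: never both.
        raise ValueError("pass either port (CDP) or hwnd (UI Automation), not both")
    args = {}
    if port is not None:
        args["port"] = port
    if hwnd is not None:
        args["hwnd"] = hwnd
    if window is not None:
        # URL/route substring or index from a previous snapshot's windows list.
        args["window"] = window
    return args  # an empty dict targets the active sandbox app
```

For example, snapshot_args() targets the active sandbox app, snapshot_args(hwnd=132456) targets a native window, and snapshot_args(port=9333, window="settings") selects a secondary window of a CDP app.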

sandbox_screenshot

See the app rendered at its true window size (the real running window, not a stretched web preview). Works for CDP/Tauri apps and for native apps snapshotted by hwnd.

{
  "port": 9333
}

port is optional — defaults to the active sandbox app (or the native window from the last sandbox_snapshot).

sandbox_click

Click an element in the running app to test it. Acts on whichever window the most recent sandbox_snapshot targeted (CDP or native).

{
  "element_ref": "@e7",
  "port": 9333
}

Required: element_ref (from the last snapshot). port is optional.

sandbox_type

Type text into an element in the running app.

{
  "element_ref": "@e7",
  "text": "hello world",
  "port": 9333
}

Required: element_ref, text. port is optional.
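The snapshot, click, type sequence might be chained as in the sketch below. Everything outside the tool and parameter names is an assumption: call_tool stands in for whatever function your MCP client uses to invoke a tool, and choose_ref is a caller-supplied function that picks a ref out of the snapshot result, whose exact text format this page does not specify.

```python
import re

# Refs are documented as @e{n}, for example "@e7".
REF_PATTERN = re.compile(r"@e\d+")


def fill_field(call_tool, choose_ref, text, port=None):
    """Snapshot the app, pick an element ref, then click it and type into it."""
    target = {} if port is None else {"port": port}

    # 1. Snapshot first: refs are only valid relative to the latest snapshot.
    snapshot = call_tool("sandbox_snapshot", dict(target))

    # 2. Let the caller decide which element to drive.
    element_ref = choose_ref(snapshot)
    if not REF_PATTERN.fullmatch(element_ref):
        raise ValueError(f"not an element ref: {element_ref!r}")

    # 3. Focus the element, then type into it.
    call_tool("sandbox_click", {"element_ref": element_ref, **target})
    call_tool("sandbox_type", {"element_ref": element_ref, "text": text, **target})
    return element_ref
```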

sandbox_close_window

Gracefully close the app window you are currently driving (the one your last sandbox_snapshot targeted), for example a Settings window you opened. Performs the native title-bar close you cannot reach with sandbox_click.

{
  "port": 9333
}

port is optional.

list_ports

List which process holds each listening TCP port (port, PID, process name, state), so you can see what is running on a port without shelling out to PowerShell/netstat.

{
  "port": 9333
}

Pass port to filter to a single port (e.g. to check a dev port before sandbox_start); omit it to list all listening ports.

Memory Group (6 tools)

Persistent memory with hybrid semantic + keyword search across three tiers. Memories survive across sessions.

| Tier | TTL | Use case |
|---|---|---|
| core | Permanent | User preferences, project decisions, durable facts |
| stable | 7 days | Important context, session decisions |
| notes | 24 hours | Temporary reminders |
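To make the tier lifetimes concrete, here is a small sketch that computes when a memory would expire. It assumes the TTL clock starts when the memory is stored; the page does not say whether Voice Mirror refreshes the TTL on access, and expires_at is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

# TTLs taken from the tier table; None means the memory never expires.
TIER_TTL = {
    "core": None,
    "stable": timedelta(days=7),
    "notes": timedelta(hours=24),
}


def expires_at(tier, stored_at):
    """Return the expiry time for a memory, or None for permanent tiers."""
    ttl = TIER_TTL[tier]  # raises KeyError for an unknown tier
    return None if ttl is None else stored_at + ttl


stored = datetime(2025, 1, 1, tzinfo=timezone.utc)
stable_expiry = expires_at("stable", stored)  # 2025-01-08 00:00 UTC
notes_expiry = expires_at("notes", stored)    # 2025-01-02 00:00 UTC
```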

memory_search

Search memories using hybrid semantic + keyword search.

{
  "query": "user preferences for TTS voice",
  "max_results": 5,
  "min_score": 0.3
}

Required: query. Defaults: max_results 5, min_score 0.3.

memory_get

Read the full content of a memory chunk or file. Use after memory_search to pull only the lines you need.

{
  "path": "chunk_abc123",
  "from_line": 10,
  "lines": 20
}

Required: path (file path or chunk ID from search results).

memory_remember

Store a persistent memory with a tier classification.

{
  "content": "User prefers Kokoro TTS with voice af_bella",
  "tier": "core"
}

Required: content. tier is core, stable, or notes.

memory_forget

Delete a memory by content or chunk ID. This is destructive, so the call must include confirmed: true.

{
  "content_or_id": "chunk_abc123",
  "confirmed": true
}

Required: content_or_id, plus confirmed: true for the deletion to proceed.

memory_stats

Get memory system statistics including storage, index, and embedding info. No parameters.

memory_flush

Flush important context to persistent memory before the context window is compacted, to preserve key decisions, topics, and action items.

{
  "topics": ["voice pipeline debugging"],
  "decisions": ["switched to Kokoro TTS"],
  "action_items": ["test App Preview on the new build"],
  "summary": "Debugged TTS pipeline, switched engines"
}

Browser Group (1 tool)

browser_action

Browser automation is a single unified tool, browser_action, that dispatches roughly 50 sub-actions via its action parameter. (Earlier versions exposed many separate browser_* tools; that is no longer the case.) The browser exposes elements as @e{n} refs: call the snapshot action first to discover them, then target elements by ref.

{
  "action": "snapshot",
  "interactiveOnly": true
}

{
  "action": "click",
  "ref": "@e3"
}

{
  "action": "navigate",
  "url": "https://example.com"
}

Required: action. Other fields apply depending on the action (ref, selector, url, value, expression, query, name, username, password, key, pattern, content, annotate, interactiveOnly, timeout, stableMs, tabId, x, y).

Sub-actions by category:

| Category | Actions |
|---|---|
| Navigation | navigate, back, forward, reload |
| Interaction | click, dblclick, fill, fill_rich_editor, type, hover, focus, scroll, select, check, uncheck |
| Inspection | screenshot (set annotate: true for numbered overlays), snapshot (@eN refs; interactiveOnly: true to filter), gettext, content, boundingbox, isvisible, url, title |
| Scripting | evaluate, addscript |
| Tabs | tab_new, tab_list, tab_switch, tab_close |
| Waiting | wait, waitforurl, waitforloadstate, waitforstable (DOM mutation silence) |
| State | cookies_get, cookies_set, cookies_clear, storage_get, storage_set |
| Auth | auth_save, auth_login, auth_list, auth_delete |
| Web | search, fetch |
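Because every browser operation goes through one tool, a client typically needs only a thin wrapper that sets action and passes the remaining fields through. The sketch below also mirrors the category table as a lookup. The table lists 44 actions while the tool dispatches roughly 50, so it may not be exhaustive; for that reason the wrapper does not reject unlisted actions. browser_call and category_of are hypothetical helpers.

```python
# Mirror of the category table above; it may not list every sub-action.
ACTIONS_BY_CATEGORY = {
    "Navigation": ["navigate", "back", "forward", "reload"],
    "Interaction": ["click", "dblclick", "fill", "fill_rich_editor", "type",
                    "hover", "focus", "scroll", "select", "check", "uncheck"],
    "Inspection": ["screenshot", "snapshot", "gettext", "content",
                   "boundingbox", "isvisible", "url", "title"],
    "Scripting": ["evaluate", "addscript"],
    "Tabs": ["tab_new", "tab_list", "tab_switch", "tab_close"],
    "Waiting": ["wait", "waitforurl", "waitforloadstate", "waitforstable"],
    "State": ["cookies_get", "cookies_set", "cookies_clear",
              "storage_get", "storage_set"],
    "Auth": ["auth_save", "auth_login", "auth_list", "auth_delete"],
    "Web": ["search", "fetch"],
}


def category_of(action):
    """Return the documented category of a sub-action, or None if unlisted."""
    for category, actions in ACTIONS_BY_CATEGORY.items():
        if action in actions:
            return category
    return None


def browser_call(action, **fields):
    """Build browser_action arguments: action plus action-specific fields."""
    return {"action": action, **fields}


snapshot_request = browser_call("snapshot", interactiveOnly=True)
click_request = browser_call("click", ref="@e3")
```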

n8n Group (22 tools)

Full n8n workflow automation over the n8n REST API.

| Tool | Purpose |
|---|---|
| n8n_search_nodes | Search for n8n nodes by keyword |
| n8n_get_node | Get detailed node info (minimal/standard/full) |
| n8n_list_workflows | List all workflows (optionally active only) |
| n8n_get_workflow | Get workflow details by ID |
| n8n_create_workflow | Create a new workflow |
| n8n_update_workflow | Update a workflow via operations |
| n8n_delete_workflow | Delete a workflow (requires confirmation) |
| n8n_validate_workflow | Validate a workflow configuration |
| n8n_trigger_workflow | Execute via webhook trigger |
| n8n_deploy_template | Deploy from an n8n.io template |
| n8n_get_executions | Get recent executions |
| n8n_get_execution | Get execution details |
| n8n_delete_execution | Delete an execution (requires confirmation) |
| n8n_retry_execution | Retry a failed execution |
| n8n_list_credentials | List credentials |
| n8n_create_credential | Create a new credential |
| n8n_delete_credential | Delete a credential (requires confirmation) |
| n8n_get_credential_schema | Get the schema for a credential type |
| n8n_list_tags | List all tags |
| n8n_create_tag | Create a new tag |
| n8n_delete_tag | Delete a tag (requires confirmation) |
| n8n_list_variables | List global variables |

Destructive tools (n8n_delete_workflow, n8n_delete_credential, n8n_delete_tag, n8n_delete_execution) require confirmed: true. memory_forget is also gated this way.
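One way a client could respect this gate is to add confirmed: true only after an explicit approval step. The sketch below is a client-side convention and an assumption, not something the server requires in this form; prepare_arguments and the user_approved flag are hypothetical.

```python
# Tools documented above as requiring confirmed: true.
DESTRUCTIVE_TOOLS = {
    "n8n_delete_workflow",
    "n8n_delete_credential",
    "n8n_delete_tag",
    "n8n_delete_execution",
    "memory_forget",
}


def prepare_arguments(tool_name, arguments, user_approved=False):
    """Return call arguments, adding confirmed: true only when approved."""
    args = dict(arguments)  # copy so the caller's dict is not mutated
    if tool_name in DESTRUCTIVE_TOOLS:
        if not user_approved:
            raise PermissionError(f"{tool_name} is destructive and was not approved")
        args["confirmed"] = True
    return args


forget_args = prepare_arguments(
    "memory_forget", {"content_or_id": "chunk_abc123"}, user_approved=True
)
```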

All MCP tools return the standard MCP tool-result shape — a content array plus an isError flag:

{
"content": [{ "type": "text", "text": "Message sent." }],
"isError": false
}

On error, isError is true and the text describes what went wrong:

{
"content": [{ "type": "text", "text": "Description of what went wrong" }],
"isError": true
}

Screenshot tools (capture_window, sandbox_screenshot, browser_action with action: "screenshot") return an image content item (base64 data + mimeType) alongside a short text note.
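A minimal sketch of unpacking this result shape on the client side: it collects text items, decodes image items, and raises when isError is set. unpack_result is a hypothetical helper, and the item fields it reads (type, text, data, mimeType) are the ones described above.

```python
import base64


def unpack_result(result):
    """Split a tool result into text strings and decoded images.

    Raises RuntimeError carrying the text content when isError is true.
    """
    items = result.get("content", [])
    texts = [item["text"] for item in items if item.get("type") == "text"]
    if result.get("isError"):
        raise RuntimeError(" ".join(texts) or "tool call failed")
    images = [
        (item["mimeType"], base64.b64decode(item["data"]))
        for item in items
        if item.get("type") == "image"
    ]
    return texts, images


texts, images = unpack_result({
    "content": [{"type": "text", "text": "Message sent."}],
    "isError": False,
})
```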

Tools load by profile and by keyword intent:

  • Always loaded: core and capture are loaded at startup and cannot be unloaded.
  • Profile: an active tool profile pins a set of groups. The default voice-assistant profile enables core + memory + browser (with capture always loaded on top).
  • Keyword intent: if no profile restricts it, a group auto-loads when the user’s message mentions a matching keyword — for example “remember”/“recall” loads memory, “search”/“website”/“snapshot” loads browser, “screenshot”/“sandbox”/“preview” relates to capture (already always loaded), and “n8n”/“workflow”/“trigger” loads n8n.
  • Auto-unload: any non-pinned, non-always-loaded group that goes unused for ~15 consecutive tool calls is automatically unloaded to keep the tool list and context small.
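The loading rules above can be modelled with a small simulation. This is only an illustration of the described behaviour, not Voice Mirror's implementation: it covers the case with no active profile (so nothing is pinned), uses naive substring matching on the example keywords, and treats the approximate threshold as exactly 15 calls. GroupTracker is a hypothetical class.

```python
ALWAYS_LOADED = {"core", "capture"}

# Example keywords from the list above; capture keywords are omitted
# because that group is always loaded anyway.
KEYWORDS = {
    "memory": ("remember", "recall"),
    "browser": ("search", "website", "snapshot"),
    "n8n": ("n8n", "workflow", "trigger"),
}

UNLOAD_AFTER = 15  # assumption: the page only says "~15"


class GroupTracker:
    def __init__(self):
        # Optional loaded group -> consecutive tool calls without use.
        self.idle = {}

    def on_message(self, text):
        """Load any optional group whose keyword appears in the message."""
        lowered = text.lower()
        for group, words in KEYWORDS.items():
            if any(word in lowered for word in words):
                self.idle.setdefault(group, 0)

    def on_tool_call(self, group):
        """Record a tool call from `group` and unload groups idle too long."""
        for name in list(self.idle):
            self.idle[name] = 0 if name == group else self.idle[name] + 1
            if self.idle[name] >= UNLOAD_AFTER:
                del self.idle[name]

    def loaded(self):
        return ALWAYS_LOADED | set(self.idle)
```

In this model, a message containing "remember" loads memory, and fifteen consecutive calls to tools in other groups unload it again.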