MCP Tools Reference
Voice Mirror provides 45 MCP (Model Context Protocol) tools across 5 dynamically loadable groups, enabling Claude Code, OpenCode, and other MCP-aware agents to interact with the system. The MCP server is a native Rust binary (voice-mirror-mcp) that communicates over stdio JSON-RPC.
Tool groups load on demand. Two groups (core and capture) are always loaded; the rest load when the active profile enables them or when a message mentions a relevant keyword. Any group that is neither always loaded nor pinned by the profile is automatically unloaded after going unused for ~15 consecutive tool calls, which reduces context size.
| Group | Always loaded | Tools | Count |
|---|---|---|---|
| core | yes | voice_send, voice_inbox, voice_listen, voice_status, get_logs | 5 |
| capture | yes | capture_list_windows, capture_window, capture_browser, sandbox_start, sandbox_attach, sandbox_snapshot, sandbox_screenshot, sandbox_click, sandbox_type, sandbox_close_window, list_ports | 11 |
| memory | no | memory_search, memory_get, memory_remember, memory_forget, memory_stats, memory_flush | 6 |
| browser | no | browser_action | 1 |
| n8n | no | workflow / execution / credential / tag / node tools | 22 |
Core Group (5 tools) — Always loaded
Core voice communication and diagnostics between the AI agent and Voice Mirror.
voice_send
Send a message to the Voice Mirror inbox. The message is spoken aloud via TTS.

    {
      "instance_id": "voice-claude",
      "message": "Hello! I've finished the task.",
      "thread_id": "optional-thread-id",
      "reply_to": "optional-message-id"
    }

Required: instance_id, message.
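For orientation, here is a minimal Python sketch of how a client might frame this call on the wire. It assumes the standard MCP tools/call request shape and a newline-delimited stdio transport; this page only states that the server speaks JSON-RPC over stdio. The initialize handshake that normally precedes tool calls is omitted, and the helper name tools_call is hypothetical.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC request ids; any value unique per request works


def tools_call(name: str, arguments: dict) -> str:
    """Serialize one MCP tools/call request as a single newline-terminated line."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    # json.dumps escapes newlines inside strings, so the payload stays on one line.
    return json.dumps(request, separators=(",", ":")) + "\n"


line = tools_call(
    "voice_send",
    {"instance_id": "voice-claude", "message": "Hello! I've finished the task."},
)
print(line, end="")
```

A real client would write this line to the server's stdin and read the matching response (same id) from its stdout.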
voice_inbox
Read messages from the inbox. Voice queries from the user appear here.

    {
      "instance_id": "voice-claude",
      "include_read": false,
      "limit": 10,
      "mark_as_read": true
    }

Required: instance_id.
voice_listen
Wait for new voice input from the user. The call blocks until a message arrives or the timeout elapses.

    {
      "instance_id": "voice-claude",
      "from_sender": "user",
      "thread_id": "optional-thread-filter",
      "timeout_seconds": 300
    }

Required: instance_id, from_sender. timeout_seconds defaults to 300 (maximum 600).
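The timeout rule above can be sketched as a small argument builder. This is an illustration only: the helper name voice_listen_args is hypothetical, and clamping on the client side (including the lower bound of 1 second) is an assumption, since this page does not say how the server treats out-of-range values.

```python
from typing import Optional

DEFAULT_TIMEOUT_S = 300  # documented default
MAX_TIMEOUT_S = 600      # documented maximum


def voice_listen_args(instance_id: str, from_sender: str,
                      timeout_seconds: Optional[int] = None,
                      thread_id: Optional[str] = None) -> dict:
    """Build voice_listen arguments with the timeout kept in the documented range."""
    if not instance_id or not from_sender:
        raise ValueError("instance_id and from_sender are required")
    timeout = DEFAULT_TIMEOUT_S if timeout_seconds is None else int(timeout_seconds)
    # Assumption: clamp locally instead of relying on server-side validation.
    timeout = max(1, min(timeout, MAX_TIMEOUT_S))
    args = {
        "instance_id": instance_id,
        "from_sender": from_sender,
        "timeout_seconds": timeout,
    }
    if thread_id is not None:
        args["thread_id"] = thread_id  # optional thread filter
    return args


print(voice_listen_args("voice-claude", "user", timeout_seconds=900))
```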
voice_status
Update or list the presence status of Claude instances.

    {
      "instance_id": "voice-claude",
      "action": "update",
      "status": "active",
      "current_task": "Working on docs"
    }

Required: instance_id. action is update or list; status is active or idle.
get_logs
Query Voice Mirror’s structured output logs. Without a channel, it returns a summary of all channels with entry counts. With a channel name, it returns the actual log lines.

    {
      "channel": "preview",
      "level": "info",
      "last": 100,
      "search": "stream"
    }

System channels: app (Voice Mirror core), cli (CLI provider), voice (voice pipeline), mcp (MCP server), browser (browser bridge), frontend (frontend errors), preview (App Preview window-follow + streaming). Project channels are dynamic: they are created when dev servers start and contain build logs plus browser console output for the project being developed.
Capture Group (11 tools) — Always loaded
This is the headline group. It powers the App Preview “see and drive” loop: a live, true-size view of the app you are building, where the AI looks at the same running surface the user does, then reads its element tree and clicks/types to test it.
How the AI sees the app:
- CDP screencast for web, Tauri, WebView2, and Electron apps.
- Windows Graphics Capture for native windows rendered at true window size.
How the AI drives the app:
- It reads an accessibility/element tree exposed as @e{n} refs, then clicks and types by ref.
- CDP engine -> web / Tauri / WebView2 / Electron apps.
- UI Automation (UIA) engine -> native Windows apps (Notepad, Calculator, Settings, Win32/WinForms/WPF/Qt).
- Both engines present the same tool surface and the same @ref model, so the AI cannot tell which engine is underneath. Driving native Windows apps this way is genuinely novel.
The live preview auto-follows whichever window you or the AI last touched (two-way focus sync).
Windows-only today. App Preview and native-app driving currently run on Windows.
capture_list_windows
List all visible desktop windows (title, process name, dimensions). Use this to find a target before capturing or driving it.

    { "filter": "notepad" }

filter is an optional case-insensitive substring match on title or process name.
capture_window
Take a screenshot of a specific desktop window. Use capture_list_windows first, then capture by title substring or exact hwnd.

    {
      "title": "Calculator",
      "hwnd": 132456
    }

Provide either title or hwnd; the example shows both fields only for reference, and hwnd is the more precise of the two. Returns the screenshot as an image.
capture_browser
Screenshot the Lens browser preview at its exact current viewport size: the web app or site the user is building, as the user sees it. Prefer this over capture_window for previewing localhost apps/sites.

No parameters.
sandbox_start
Call this first when starting work on a desktop app (e.g. a Tauri app). It launches the app with remote debugging on a safe port and opens the live App Preview so both you and the user see it running.

    { "path": "C:/projects/my-tauri-app" }

path is optional; omit it to launch Voice Mirror’s active project.
sandbox_attach
Register an app you already launched yourself (with --remote-debugging-port=PORT) as the active sandbox and open the live App Preview for it. Use this instead of sandbox_start when you launched the app in a terminal.

    { "port": 9333 }

Required: port (must not be Voice Mirror’s own port 9222).
sandbox_snapshot
See the structure of the app. Returns the accessibility tree as @ref element handles plus a windows list of the app’s open windows (each entry gives an index, title, and URL). Call this first, then use the refs with sandbox_click / sandbox_type.

    {
      "port": 9333,
      "hwnd": 132456,
      "window": "settings"
    }

The example lists every parameter for reference; port and hwnd are mutually exclusive, so a real call passes at most one of them.

- Omit port/hwnd to use the active sandbox app launched by Voice Mirror.
- Pass hwnd to drive a native (non-CDP) app via UI Automation. It is mutually exclusive with port.
- window selects a secondary window from a previous snapshot’s windows list (prefer a URL/route substring or index, since apps often reuse the same window title). Subsequent sandbox_click / sandbox_type calls act on whichever window you last snapshotted.
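The targeting rules in the list above can be sketched as a client-side check. The helper name sandbox_snapshot_args is hypothetical, and passing window through unchanged (as a string or an integer index) is an assumption, because this page does not specify the parameter's wire type.

```python
from typing import Optional, Union


def sandbox_snapshot_args(port: Optional[int] = None,
                          hwnd: Optional[int] = None,
                          window: Union[str, int, None] = None) -> dict:
    """Build sandbox_snapshot arguments, rejecting port and hwnd together."""
    if port is not None and hwnd is not None:
        raise ValueError("port and hwnd are mutually exclusive")
    args: dict = {}
    if port is not None:
        args["port"] = port        # CDP app on a debugging port
    if hwnd is not None:
        args["hwnd"] = hwnd        # native window driven via UI Automation
    if window is not None:
        args["window"] = window    # secondary window: route substring or index
    return args                    # empty dict means "use the active sandbox app"


print(sandbox_snapshot_args())                       # active sandbox app
print(sandbox_snapshot_args(hwnd=132456))            # native app
print(sandbox_snapshot_args(port=9333, window="settings"))
```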
sandbox_screenshot
See the app rendered at its true window size (the real running window, not a stretched web preview). Works for CDP/Tauri apps and for native apps snapshotted by hwnd.

    { "port": 9333 }

port is optional and defaults to the active sandbox app (or the native window from the last sandbox_snapshot).
sandbox_click
Click an element in the running app to test it. Acts on whichever window the most recent sandbox_snapshot targeted (CDP or native).

    {
      "element_ref": "@e7",
      "port": 9333
    }

Required: element_ref (from the last snapshot). port is optional.
sandbox_type
Type text into an element in the running app.

    {
      "element_ref": "@e7",
      "text": "hello world",
      "port": 9333
    }

Required: element_ref, text. port is optional.
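The snapshot-then-act pattern used by sandbox_click and sandbox_type can be sketched as follows. Only the @e{n} ref format and the parameter names come from this page; the SAMPLE snapshot text is invented for illustration (the real tree layout is not documented here), and the helper names are hypothetical.

```python
import re
from typing import List, Optional

REF = re.compile(r"@e\d+")  # the @e{n} ref format described above

# Hypothetical snapshot text; the real accessibility-tree layout may differ.
SAMPLE = 'textbox "Name" @e7\nbutton "Save" @e12\nlink "Help" @e7'


def refs_in(snapshot_text: str) -> List[str]:
    """Return the distinct refs found in a snapshot, in first-seen order."""
    return list(dict.fromkeys(REF.findall(snapshot_text)))


def _target(element_ref: str, port: Optional[int]) -> dict:
    if not REF.fullmatch(element_ref):
        raise ValueError(f"not an @e{{n}} ref: {element_ref!r}")
    args: dict = {"element_ref": element_ref}
    if port is not None:
        args["port"] = port  # optional; omitted means the last snapshotted window
    return args


def click_args(element_ref: str, port: Optional[int] = None) -> dict:
    """Arguments for sandbox_click."""
    return _target(element_ref, port)


def type_args(element_ref: str, text: str, port: Optional[int] = None) -> dict:
    """Arguments for sandbox_type."""
    return {**_target(element_ref, port), "text": text}


print(refs_in(SAMPLE))
print(type_args("@e7", "hello world", port=9333))
```

Refs come from the most recent snapshot, so a cautious client would take a fresh snapshot after any action that changes the UI before reusing them.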
sandbox_close_window
Gracefully close the app window you are currently driving (the one your last sandbox_snapshot targeted), for example a Settings window you opened. It performs the native title-bar close that sandbox_click cannot reach.

    { "port": 9333 }

port is optional.
list_ports
List which process holds each listening TCP port (port, PID, process name, state), so you can see what is running on a port without shelling out to PowerShell/netstat.

    { "port": 9333 }

Pass port to filter to a single port (e.g. to check a dev port before sandbox_start); omit it to list all listening ports.
Memory Group (6 tools)
Persistent memory with hybrid semantic + keyword search across three tiers. Memories survive across sessions.
| Tier | TTL | Use case |
|---|---|---|
| core | Permanent | User preferences, project decisions, durable facts |
| stable | 7 days | Important context, session decisions |
| notes | 24 hours | Temporary reminders |
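The tier table can be read as a simple expiry rule. The sketch below is an illustration, not Voice Mirror's implementation: the names TIER_TTL and is_expired are hypothetical, and treating a memory as expired at exactly its TTL boundary is an assumption.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, Optional

# TTL per tier, taken from the table above; None means the memory never expires.
TIER_TTL: Dict[str, Optional[timedelta]] = {
    "core": None,
    "stable": timedelta(days=7),
    "notes": timedelta(hours=24),
}


def is_expired(tier: str, created_at: datetime, now: datetime) -> bool:
    """Return True once a memory has outlived its tier's TTL."""
    ttl = TIER_TTL[tier]  # raises KeyError for an unknown tier
    return ttl is not None and now - created_at >= ttl


created = datetime(2024, 1, 1, tzinfo=timezone.utc)
later = created + timedelta(days=2)
for tier in TIER_TTL:
    print(tier, is_expired(tier, created, later))
```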
memory_search
Search memories using hybrid semantic + keyword search.

    {
      "query": "user preferences for TTS voice",
      "max_results": 5,
      "min_score": 0.3
    }

Required: query. Defaults: max_results 5, min_score 0.3.
memory_get
Read the full content of a memory chunk or file. Use after memory_search to pull only the lines you need.

    {
      "path": "chunk_abc123",
      "from_line": 10,
      "lines": 20
    }

Required: path (file path or chunk ID from search results).
memory_remember
Store a persistent memory with a tier classification.

    {
      "content": "User prefers Kokoro TTS with voice af_bella",
      "tier": "core"
    }

Required: content. tier is core, stable, or notes.
memory_forget
Delete a memory by content or chunk ID. This is destructive and requires confirmed: true.

    {
      "content_or_id": "chunk_abc123",
      "confirmed": true
    }

Required: content_or_id. The deletion only proceeds when confirmed is true.
memory_stats
Get memory system statistics including storage, index, and embedding info.

No parameters.
memory_flush
Flush important context to persistent memory before the context window is compacted, to preserve key decisions, topics, and action items.

    {
      "topics": ["voice pipeline debugging"],
      "decisions": ["switched to Kokoro TTS"],
      "action_items": ["test App Preview on the new build"],
      "summary": "Debugged TTS pipeline, switched engines"
    }

Browser Group (1 tool)
Browser automation is a single unified tool, browser_action, that dispatches roughly 50 sub-actions via its action parameter. (Earlier versions exposed many separate browser_* tools; that is no longer the case.) The browser exposes elements as @e{n} refs: call snapshot first to discover them, then target elements by ref.
browser_action
Example calls, one JSON object per call:

    { "action": "snapshot", "interactiveOnly": true }
    { "action": "click", "ref": "@e3" }
    { "action": "navigate", "url": "https://example.com" }

Required: action. Other fields apply depending on the action (ref, selector, url, value, expression, query, name, username, password, key, pattern, content, annotate, interactiveOnly, timeout, stableMs, tabId, x, y).
Sub-actions by category:
| Category | Actions |
|---|---|
| Navigation | navigate, back, forward, reload |
| Interaction | click, dblclick, fill, fill_rich_editor, type, hover, focus, scroll, select, check, uncheck |
| Inspection | screenshot (set annotate: true for numbered overlays), snapshot (@eN refs; interactiveOnly: true to filter), gettext, content, boundingbox, isvisible, url, title |
| Scripting | evaluate, addscript |
| Tabs | tab_new, tab_list, tab_switch, tab_close |
| Waiting | wait, waitforurl, waitforloadstate, waitforstable (DOM mutation silence) |
| State | cookies_get, cookies_set, cookies_clear, storage_get, storage_set |
| Auth | auth_save, auth_login, auth_list, auth_delete |
| Web | search, fetch |
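Because everything goes through one tool, a client may want to catch misspelled sub-actions before sending a call. The sketch below copies the action names from the table above; the helper name browser_action_args is hypothetical, and it deliberately does not check which fields each action accepts, since this page does not list that mapping.

```python
# Sub-actions by category, copied from the table above.
ACTIONS = {
    "Navigation": {"navigate", "back", "forward", "reload"},
    "Interaction": {"click", "dblclick", "fill", "fill_rich_editor", "type", "hover",
                    "focus", "scroll", "select", "check", "uncheck"},
    "Inspection": {"screenshot", "snapshot", "gettext", "content", "boundingbox",
                   "isvisible", "url", "title"},
    "Scripting": {"evaluate", "addscript"},
    "Tabs": {"tab_new", "tab_list", "tab_switch", "tab_close"},
    "Waiting": {"wait", "waitforurl", "waitforloadstate", "waitforstable"},
    "State": {"cookies_get", "cookies_set", "cookies_clear", "storage_get", "storage_set"},
    "Auth": {"auth_save", "auth_login", "auth_list", "auth_delete"},
    "Web": {"search", "fetch"},
}
ALL_ACTIONS = frozenset().union(*ACTIONS.values())


def browser_action_args(action: str, **fields) -> dict:
    """Build a browser_action payload, rejecting unknown sub-action names."""
    if action not in ALL_ACTIONS:
        raise ValueError(f"unknown browser_action sub-action: {action!r}")
    return {"action": action, **fields}


# Snapshot first to discover refs, then act on one of them.
print(browser_action_args("snapshot", interactiveOnly=True))
print(browser_action_args("click", ref="@e3"))
```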
n8n Group (22 tools)
Full n8n workflow automation over the n8n REST API.
| Tool | Purpose |
|---|---|
| n8n_search_nodes | Search for n8n nodes by keyword |
| n8n_get_node | Get detailed node info (minimal/standard/full) |
| n8n_list_workflows | List all workflows (optionally active only) |
| n8n_get_workflow | Get workflow details by ID |
| n8n_create_workflow | Create a new workflow |
| n8n_update_workflow | Update a workflow via operations |
| n8n_delete_workflow | Delete a workflow (requires confirmation) |
| n8n_validate_workflow | Validate a workflow configuration |
| n8n_trigger_workflow | Execute via webhook trigger |
| n8n_deploy_template | Deploy from an n8n.io template |
| n8n_get_executions | Get recent executions |
| n8n_get_execution | Get execution details |
| n8n_delete_execution | Delete an execution (requires confirmation) |
| n8n_retry_execution | Retry a failed execution |
| n8n_list_credentials | List credentials |
| n8n_create_credential | Create a new credential |
| n8n_delete_credential | Delete a credential (requires confirmation) |
| n8n_get_credential_schema | Get the schema for a credential type |
| n8n_list_tags | List all tags |
| n8n_create_tag | Create a new tag |
| n8n_delete_tag | Delete a tag (requires confirmation) |
| n8n_list_variables | List global variables |
Destructive tools (n8n_delete_workflow, n8n_delete_credential, n8n_delete_tag, n8n_delete_execution) require confirmed: true. memory_forget is also gated this way.
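One way a client could respect this gate is to add confirmed: true only after an explicit approval step. The sketch below is an assumption about client-side handling, not part of Voice Mirror; the helper name with_confirmation and the user_approved flag are hypothetical. The example uses memory_forget because its other parameter (content_or_id) is documented on this page.

```python
DESTRUCTIVE_TOOLS = frozenset({
    "n8n_delete_workflow",
    "n8n_delete_credential",
    "n8n_delete_tag",
    "n8n_delete_execution",
    "memory_forget",
})


def with_confirmation(tool: str, arguments: dict, user_approved: bool) -> dict:
    """Return call arguments, adding confirmed: true only for approved destructive calls."""
    if tool not in DESTRUCTIVE_TOOLS:
        return dict(arguments)  # non-destructive tools pass through unchanged
    if not user_approved:
        raise PermissionError(f"{tool} is destructive and was not approved")
    return {**arguments, "confirmed": True}


print(with_confirmation("memory_forget", {"content_or_id": "chunk_abc123"}, user_approved=True))
```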
Tool Response Format
All MCP tools return the standard MCP tool-result shape: a content array plus an isError flag.

    {
      "content": [{ "type": "text", "text": "Message sent." }],
      "isError": false
    }

On error, isError is true and the text describes what went wrong:

    {
      "content": [{ "type": "text", "text": "Description of what went wrong" }],
      "isError": true
    }

Screenshot tools (capture_window, sandbox_screenshot, browser_action with action: "screenshot") return an image content item (base64 data + mimeType) alongside a short text note.
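A client can unpack this shape in a few lines. The sketch below is a minimal illustration that relies only on the fields described above (type, text, data, mimeType, isError); the names ToolError and unpack_result are hypothetical.

```python
import base64
from typing import List, Tuple


class ToolError(RuntimeError):
    """Raised when a tool result has isError set."""


def unpack_result(result: dict) -> Tuple[str, List[Tuple[str, bytes]]]:
    """Split a tool result into its text and a list of (mimeType, decoded bytes) images."""
    content = result.get("content", [])
    text = "\n".join(item["text"] for item in content if item.get("type") == "text")
    if result.get("isError"):
        raise ToolError(text or "tool call failed")
    images = [
        (item["mimeType"], base64.b64decode(item["data"]))
        for item in content
        if item.get("type") == "image"
    ]
    return text, images


ok = {"content": [{"type": "text", "text": "Message sent."}], "isError": False}
print(unpack_result(ok))
```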
Dynamic Loading
Tools load by profile and by keyword intent:
- Always loaded: core and capture are loaded at startup and cannot be unloaded.
- Profile: an active tool profile pins a set of groups. The default voice-assistant profile enables core + memory + browser (with capture always loaded on top).
- Keyword intent: if no profile restricts it, a group auto-loads when the user’s message mentions a matching keyword. For example, “remember”/“recall” loads memory, “search”/“website”/“snapshot” loads browser, and “n8n”/“workflow”/“trigger” loads n8n; “screenshot”/“sandbox”/“preview” relate to capture, which is already always loaded.
- Auto-unload: any non-pinned, non-always-loaded group that goes unused for ~15 consecutive tool calls is automatically unloaded to keep the tool list and context small.
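The loading rules above can be modelled with a small state machine. This is a simplified sketch, not the server's implementation: the class name GroupLoader is hypothetical, the threshold of exactly 15 stands in for the documented "~15", keyword matching is a plain substring test, and profile restrictions on keyword loading are ignored.

```python
ALWAYS_LOADED = frozenset({"core", "capture"})
UNLOAD_AFTER = 15  # stands in for "~15 consecutive tool calls"; exact value is an assumption
KEYWORDS = {       # example keywords from the list above
    "memory": ("remember", "recall"),
    "browser": ("search", "website", "snapshot"),
    "n8n": ("n8n", "workflow", "trigger"),
}


class GroupLoader:
    def __init__(self, pinned=()):
        # Pinned and always-loaded groups are never unloaded.
        self.pinned = set(pinned) | ALWAYS_LOADED
        # Loaded group -> number of consecutive tool calls that did not use it.
        self.idle = {group: 0 for group in self.pinned}

    @property
    def loaded(self) -> set:
        return set(self.idle)

    def on_message(self, text: str) -> None:
        """Load (or refresh) every group whose keyword appears in the message."""
        lowered = text.lower()
        for group, words in KEYWORDS.items():
            if any(word in lowered for word in words):
                self.idle[group] = 0

    def on_tool_call(self, group: str) -> None:
        """Record a tool call from one group and unload groups that sat idle too long."""
        for name in list(self.idle):
            self.idle[name] = 0 if name == group else self.idle[name] + 1
            if name not in self.pinned and self.idle[name] >= UNLOAD_AFTER:
                del self.idle[name]


loader = GroupLoader()
loader.on_message("Please remember that I prefer Kokoro")
print(sorted(loader.loaded))
```

In this model, memory loads on the keyword and drops out again after 15 tool calls that do not touch it, while core and capture stay loaded throughout.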