Voice runtime workers and transcription pipeline
This document drills into the voice-mode backend that sits below voice-mode-foundry-local.md. The existing voice-mode page explains the staff-gated /voice command, settings, runtime inspection, model picker, and TUI entry points. This page focuses on the core code paths that actually record audio, install/locate the Foundry Local runtime, load speech models, and turn PCM buffers into preview/final text.
The analyzed implementation is split across app.js and three bundled worker files:
- copilot-cli-pkg/voice-mic.worker.js captures microphone PCM through @picovoice/pvrecorder-node.
- copilot-cli-pkg/voice-installer.worker.js resolves/downloads the Foundry Local native runtime.
- copilot-cli-pkg/voice-foundry.worker.js manages Foundry Local models and transcription sessions.
Because these files are bundled/minified, line numbers are approximate. The worker bundles are mostly one-line payloads, so the exact string anchors and offsets are more useful than line numbers.
For the user-facing command and settings path, start with Voice mode and Foundry Local. The sections below trace worker-thread RPC, microphone PCM capture, runtime installation, model loading, streaming previews, final transcription, and cleanup; the resulting text re-enters the normal prompt/session flow rather than creating a separate model pipeline.
Source anchors
| Semantic alias | Minified anchor | Approx. location | Role |
|---|---|---|---|
| Voice hook/controller | qHo(...) | app.js ~6861 | React-style voice controller: inspect runtime, warm model, open mic, expose enable, disable, selectModel, and status. |
| Recording bridge | $Ho(...) | app.js ~6861 | Binds a loaded Foundry model handle to a microphone source and forwards PCM chunks into an active transcription session. |
| Model-handle wrapper | GHo(...) | app.js ~6861 | Loads a model through the Foundry client and returns beginRecording(...) / cancelCurrentRecording(). |
| Foundry RPC client | cNr / UHo(...) | app.js ~6861 | Main-thread wrapper around voice-foundry.worker.js RPC methods and events. |
| Foundry recording session | uNr | app.js ~6861 | Per-recording session wrapper; forwards appendSession, stopSession, cancelSession, and sessionPreview. |
| Foundry worker channel | FHo(...) | app.js ~6861 | Creates the voice-foundry.worker.js worker RPC channel with nativeLocation. |
| Mic worker channel | QHo(...) | app.js ~6861 | Creates the voice-mic.worker.js worker RPC channel and decodes transferable PCM buffers. |
| Microphone source adapter | HHo(...) | app.js ~6861 | Main-thread adapter for mic start/stop plus pcm and error subscriptions. |
| Runtime installer | OHo(...) / ZFa(...) | app.js ~6861 | Caches runtime inspection and launches voice-installer.worker.js for install/update. |
| Slash command entry | /voice, inspectRuntime, voice-runtime-download, voice-models | app.js ~4916 | User-facing control path that triggers runtime/model dialogs and calls the controller. |
| TUI session injection | voice:e.VOICE ? { ... } | app.js ~7342 | Injects voice controller methods into the interactive session only when the VOICE gate is enabled. |
| Mic backend state machine | var O=1600,C=15,l=class{...} | voice-mic.worker.js line 59 | Opens PvRecorder, reads PCM frames, emits pcm, and handles start/stop/shutdown. |
| Runtime install state | var J=1,b=".complete", platform map | voice-installer.worker.js line 59 | Builds the runtime cache path, validates required files, and marks completed downloads. |
| Foundry backend state machine | var h=class{managerPromise;state={tag:"unloaded"}...} | voice-foundry.worker.js line 59 | Lists/downloads/loads models and opens streaming or batch transcription sessions. |
High-level pipeline

    flowchart TD
        User[User holds space / toggles dictation] --> UI[TUI voice hook]
        UI --> Controller[app.js qHo controller]
        Controller --> Installer[OHo runtime installer]
        Installer --> InstallerWorker[voice-installer.worker.js]
        Controller --> FoundryClient[cNr Foundry client]
        FoundryClient --> FoundryWorker[voice-foundry.worker.js]
        Controller --> MicSource[HHo microphone source]
        MicSource --> MicWorker[voice-mic.worker.js]
        MicWorker -->|pcm events| MicSource
        MicSource -->|PCM sink| Recording[$Ho recording bridge]
        Recording -->|appendSession| FoundryWorker
        FoundryWorker -->|sessionPreview| FoundryClient
        FoundryWorker -->|final text| FoundryClient
        FoundryClient --> UI

The split is deliberate. The main TUI process owns settings, UI state, status text, and lifecycle cleanup. Native code and long-running voice work are isolated behind worker-thread RPC channels:
- mic capture can fail or block without freezing the TUI;
- Foundry Local runtime/model operations run outside the TUI loop;
- PCM buffers are transferred rather than copied when possible;
- shutdown can terminate or dispose each subsystem independently.
Main-thread controller in app.js
The voice controller created by qHo(...) is the runtime coordinator. It keeps a small state machine in React state:
| State | Meaning |
|---|---|
| off | No active voice runtime, mic, or Foundry client. |
| preparing | Runtime/model checks are in progress before a first activation. |
| installing | The installer worker is downloading or updating Foundry Local. |
| warming | Runtime is present and the model/mic/client are being opened. |
| ready | A selected model is loaded and the mic source is ready for recordings. |
| error | Activation or backend operation failed. |
The enable({ modelId }) path serializes work through an internal promise chain so overlapping enable/disable/select operations do not race. It:
- chooses the requested model ID or the persisted selected model;
- calls installer.inspect();
- maps installer states to runtime-unsupported, runtime-missing, runtime-outdated, or a downloaded location;
- checks whether the selected model exists and is cached through client.listModels();
- constructs an owned { client, mic } pair when needed;
- opens the microphone source;
- warms up the Foundry model with GHo(...);
- sets state to ready and fires the “Voice ready” notification after the first warmup.
When the selected model changes, qHo(...) cancels any current recording before switching the active model. On fatal backend failure, it aborts the active controller, moves to error, and disposes the owned mic/client pair.
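The serialization described for enable/disable/select can be pictured as a single promise chain. The following is a minimal sketch under that assumption, not the bundled implementation; createSerialQueue and the logged operation names are hypothetical.

```javascript
// Hypothetical helper: runs async operations one at a time, in call order.
function createSerialQueue() {
  let tail = Promise.resolve();
  return function enqueue(operation) {
    // Start this operation only after the previous one has settled.
    const result = tail.then(() => operation());
    // Swallow rejections on the chain itself so one failed operation
    // does not prevent later operations from running.
    tail = result.catch(() => {});
    return result;
  };
}

// Usage: the second operation waits even though the first one is slower.
const enqueue = createSerialQueue();
const log = [];
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

const done = Promise.all([
  enqueue(async () => {
    log.push("enable:start");
    await sleep(20);
    log.push("enable:end");
  }),
  enqueue(async () => {
    log.push("disable:start");
    log.push("disable:end");
  }),
]);
```

Callers still receive the result or rejection of their own operation, while the shared chain only tracks ordering.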
Recording bridge: $Ho(...)
$Ho(...) is the short-lived object for one recording. It joins a loaded model handle from GHo(...) with the microphone source from HHo(...).
Runtime flow:
- Open a Foundry transcription session through modelHandle.openSession(callbacks).
- Ensure the microphone source is open.
- Set the microphone sink to a callback that receives each PCM Buffer.
- For each PCM chunk:
  - call optional onPcm;
  - call session.append(buffer);
  - if append fails while active, unset the sink, surface onError, and cancel the session.
- On stop():
  - unset the sink;
  - call session.stop();
  - deliver the final text through onFinal.
- On cancel():
  - unset the sink;
  - call session.cancel();
  - resolve even if cancel itself fails.
This bridge is where audio becomes model input. Everything above it is UI/setup; everything below it is mic or Foundry worker implementation.
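The runtime flow above can be sketched roughly as follows. This is an illustration only: it assumes the session is already open, that the mic source exposes a setSink(callback) method, and that session.stop() resolves to the final text. createRecordingBridge and those shapes are hypothetical names, not the minified originals.

```javascript
// Hypothetical one-recording bridge between a mic source and a session.
function createRecordingBridge({ mic, session, onPcm, onFinal, onError }) {
  let active = true;

  mic.setSink((buffer) => {
    if (!active) return;
    if (onPcm) onPcm(buffer);
    Promise.resolve()
      .then(() => session.append(buffer))
      .catch((error) => {
        if (!active) return; // Ignore failures that arrive after stop/cancel.
        active = false;
        mic.setSink(null);
        if (onError) onError(error);
        // Best-effort cancel; a failure here is deliberately ignored.
        Promise.resolve()
          .then(() => session.cancel())
          .catch(() => {});
      });
  });

  return {
    async stop() {
      active = false;
      mic.setSink(null);
      const text = await session.stop();
      if (onFinal) onFinal(text);
      return text;
    },
    async cancel() {
      active = false;
      mic.setSink(null);
      try {
        await session.cancel();
      } catch {
        // Resolve even if cancel itself fails.
      }
    },
  };
}
```

The active flag is what keeps a late append failure from surfacing an error after the recording has already been stopped or cancelled.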
Microphone worker
voice-mic.worker.js exposes a tiny RPC backend with four methods:
| Method | Behavior |
|---|---|
| start({ inputDeviceId }) | Loads @picovoice/pvrecorder-node, opens PvRecorder, starts the read loop. |
| stop() | Cancels startup or stops an active recorder, then releases it. |
| getState() | Returns { open: false } for idle/starting/stopping and { open: true } for active. |
| shutdown() | Stops the recorder and clears event subscribers. |
The worker state machine is:

    stateDiagram-v2
        [*] --> idle
        idle --> starting: start(device)
        starting --> active: PvRecorder.start + read loop
        starting --> stopping: stop during startup
        active --> stopping: stop
        active --> idle: read error cleanup
        stopping --> idle: teardown finished
        idle --> [*]: shutdown

Important constants and behavior:
- O=1600 is passed as the PvRecorder frame length.
- C=15 is passed as the recorder buffered-frame/count argument.
- The default device is -1 when no inputDeviceId is supplied.
- Starting a different device while one is starting/active returns device-busy.
- Loading failures become mic-unavailable errors with a reinstall hint.
- Opening failures stop/release any partially-created recorder before throwing.
- runReadLoop(...) repeatedly awaits recorder.read(), converts returned PCM into a Buffer, and emits pcm.
- Read failures stop and release the recorder, reset state to idle, and emit an error event.
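To make the frame-length constant concrete, the sketch below works out the size and duration of one frame. It assumes 16 kHz, 16-bit mono samples delivered as an Int16Array; those audio parameters and the helper names are assumptions for illustration and are not confirmed by the bundle.

```javascript
// Assumptions for illustration: 16 kHz sample rate, 16-bit mono samples.
const SAMPLE_RATE_HZ = 16000;
const FRAME_LENGTH = 1600; // Matches the O=1600 constant described above.

// Duration of one frame in milliseconds.
function frameDurationMs(frameLength, sampleRateHz) {
  return (frameLength * 1000) / sampleRateHz;
}

// Wrap an Int16Array frame in a Buffer that shares the same memory.
function frameToBuffer(frame) {
  return Buffer.from(frame.buffer, frame.byteOffset, frame.byteLength);
}

const frame = new Int16Array(FRAME_LENGTH);
const pcm = frameToBuffer(frame);

console.log(frameDurationMs(FRAME_LENGTH, SAMPLE_RATE_HZ)); // 100
console.log(pcm.length); // 3200
```

Under these assumptions each pcm event carries about a tenth of a second of audio, which bounds how quickly a streaming preview can react.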
The worker posts PCM as a transferable event:
| Event | Payload |
|---|---|
| pcm | { buffer, byteOffset, byteLength }, transferred back to the parent thread. |
| error | Serialized VoiceBackendError or generic error. |
app.js decodes the pcm event back into a Buffer before handing it to HHo(...).
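A minimal sketch of that encode/decode round trip is shown below. The payload fields come from the table above; the function names and the surrounding message envelope are hypothetical.

```javascript
// Worker side (sketch): describe the bytes of a Buffer and list its
// backing ArrayBuffer so it can be passed as a postMessage transfer list.
function encodePcmEvent(pcm) {
  return {
    message: {
      type: "pcm",
      buffer: pcm.buffer,
      byteOffset: pcm.byteOffset,
      byteLength: pcm.byteLength,
    },
    transferList: [pcm.buffer],
  };
}

// Main-thread side (sketch): rebuild a Buffer view without copying.
function decodePcmEvent({ buffer, byteOffset, byteLength }) {
  return Buffer.from(buffer, byteOffset, byteLength);
}
```

Carrying byteOffset and byteLength matters because a Buffer may be a view onto a larger ArrayBuffer. For the same reason, a sender would normally avoid transferring pooled Buffers, since transferring detaches the whole backing ArrayBuffer.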
Runtime installer worker
voice-installer.worker.js is responsible for turning “voice runtime is needed” into a concrete native library location.
Platform and cache resolution
The worker maps supported Node platform/architecture pairs to Foundry runtime IDs:
| Node key | Foundry runtime directory |
|---|---|
| win32-x64 | win-x64 |
| win32-arm64 | win-arm64 |
| linux-x64 | linux-x64 |
| darwin-arm64 | osx-arm64 |
Unsupported pairs throw a user-facing “Voice mode is not supported” error. The cache path is:

    <COPILOT_CACHE_HOME or default cache root>/foundry/<hash>/<runtime-dir>

The <hash> is derived from a schema number and the expected Foundry/ONNX artifacts, so a dependency version change moves the runtime to a new cache directory.
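The platform lookup and cache path can be sketched as below. The platform map is taken from the table above; the hash algorithm, its exact inputs, and the function names are assumptions made for illustration, since the real derivation is not documented here.

```javascript
const crypto = require("node:crypto");
const path = require("node:path");

// Platform map from the table above.
const RUNTIME_DIRS = {
  "win32-x64": "win-x64",
  "win32-arm64": "win-arm64",
  "linux-x64": "linux-x64",
  "darwin-arm64": "osx-arm64",
};

function resolveRuntimeDir(platform, arch) {
  const dir = RUNTIME_DIRS[`${platform}-${arch}`];
  if (!dir) {
    throw new Error(`Voice mode is not supported on ${platform}-${arch}`);
  }
  return dir;
}

// Assumption: SHA-256 over a schema number plus an artifact list.
// The real worker may use different inputs or a different digest.
function runtimeCachePath(cacheRoot, schema, artifacts, runtimeDir) {
  const hash = crypto
    .createHash("sha256")
    .update(JSON.stringify({ schema, artifacts }))
    .digest("hex")
    .slice(0, 16);
  return path.join(cacheRoot, "foundry", hash, runtimeDir);
}
```

Because the artifact list feeds the hash, changing any expected version yields a different directory, so an old runtime is never mistaken for a current one.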
Version audit
The worker requires two upstream Foundry Local files to have the expected shape:
- foundry-local-sdk/script/install-utils.cjs must export runInstall.
- foundry-local-sdk/deps_versions.json must include:
  - foundry-local-core.nuget;
  - onnxruntime.version;
  - onnxruntime-genai.version.
Those checks are defensive source-shape audits. Their error strings explicitly mention re-running the source audit checklist if the upstream SDK layout changes.
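A shape audit of this kind might look like the sketch below. It assumes the dotted names map to nested keys in deps_versions.json (for example a foundry-local-core object with a nuget field); that layout, the function name, and the message wording are all assumptions.

```javascript
// Hypothetical shape check; returns a list of human-readable problems.
function auditFoundrySdk(installUtils, depsVersions) {
  const problems = [];

  if (typeof installUtils?.runInstall !== "function") {
    problems.push("install-utils.cjs: missing runInstall export");
  }

  // Assumed nested layout: { "<package>": { "<field>": "<string>" } }.
  const required = [
    ["foundry-local-core", "nuget"],
    ["onnxruntime", "version"],
    ["onnxruntime-genai", "version"],
  ];
  for (const [pkg, field] of required) {
    if (typeof depsVersions?.[pkg]?.[field] !== "string") {
      problems.push(`deps_versions.json: missing ${pkg}.${field}`);
    }
  }

  return problems;
}
```

Collecting every problem before failing gives a single error that names all of the upstream layout changes at once.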
Download and atomic install
Install flow:
- Check for .complete plus required runtime files.
- If missing, post download-started to the parent.
- Create a sibling temporary directory named like .tmp-<runtime>-<pid>-<timestamp>.
- Run Foundry Local runInstall(artifacts, { binDir: tmp }) while suppressing stdout/stderr noise.
- Verify required files exist in the temporary directory.
- Write the .complete sentinel.
- Rename the temporary directory into the final cache path.
- If rename collides with an already-complete runtime, delete the temporary directory and reuse the existing install.
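The install flow above follows a common write-to-temp-then-rename pattern. The sketch below illustrates it with synchronous filesystem calls; the download callback is a hypothetical stand-in for the real install step, and the function names are not taken from the bundle.

```javascript
const fs = require("node:fs");
const path = require("node:path");

const SENTINEL = ".complete";

function isComplete(dir, requiredFiles) {
  return [SENTINEL, ...requiredFiles].every((name) =>
    fs.existsSync(path.join(dir, name))
  );
}

// `download(tmpDir)` is a hypothetical stand-in for the real install step.
function installAtomically(finalDir, requiredFiles, download) {
  if (isComplete(finalDir, requiredFiles)) {
    return { dir: finalDir, downloaded: false };
  }

  const parent = path.dirname(finalDir);
  fs.mkdirSync(parent, { recursive: true });
  const tmpName = `.tmp-${path.basename(finalDir)}-${process.pid}-${Date.now()}`;
  const tmpDir = path.join(parent, tmpName);
  fs.mkdirSync(tmpDir);

  try {
    download(tmpDir);
    const missing = requiredFiles.filter(
      (name) => !fs.existsSync(path.join(tmpDir, name))
    );
    if (missing.length > 0) {
      throw new Error(`missing runtime files: ${missing.join(", ")}`);
    }
    fs.writeFileSync(path.join(tmpDir, SENTINEL), "");
    fs.renameSync(tmpDir, finalDir);
    return { dir: finalDir, downloaded: true };
  } catch (error) {
    fs.rmSync(tmpDir, { recursive: true, force: true });
    // Another process may have completed the same install first.
    if (isComplete(finalDir, requiredFiles)) {
      return { dir: finalDir, downloaded: false };
    }
    throw error;
  }
}
```

Writing the sentinel before the rename means a reader never observes a final directory that lacks the marker, even if the process dies mid-install.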
On Windows, the returned location also reports whether Microsoft.WindowsAppRuntime.Bootstrap.dll exists. The Foundry worker later uses that to add a Bootstrap setting when creating the manager.
Foundry worker
voice-foundry.worker.js is the transcription backend. It exposes these RPC methods:
| Method | Behavior |
|---|---|
| listModels() | Reads Foundry catalog models and returns ASR variants with cached/model metadata. |
| downloadModel({ variantId, downloadId }) | Downloads one variant and emits progress. |
| deleteModel({ variantId }) | Removes a cached model, unless an active session blocks deletion. |
| loadModel({ variantId }) | Loads a cached model and returns a modelGeneration. |
| openSession({ sessionId, modelGeneration }) | Opens a streaming or batch transcription session for the loaded model. |
| appendSession({ sessionId, pcm }) | Appends PCM to the active session. |
| stopSession({ sessionId }) | Stops the session and returns { text }. |
| cancelSession({ sessionId }) | Cancels and tears down the active session. |
| shutdown() | Cancels active work, unloads the selected model, and clears events. |
Manager and model lifecycle
The worker initializes FoundryLocalManager.create(...) with:
- appName: "github-copilot-cli";
- libraryPath from the installer result;
- additional settings { AzureCatalogFilter: "'',test" };
- Bootstrap: "true" on Windows when the installer reports that Windows App Runtime bootstrap is needed.
Model state moves through:

    stateDiagram-v2
        [*] --> unloaded
        unloaded --> loading: loadModel(variantId)
        loading --> ready: variant.load()
        loading --> unloaded: load failure
        ready --> loading: load different model with no active session
        ready --> unloaded: delete/unload/shutdown

The modelGeneration number prevents stale UI handles from opening sessions after a model was changed. app.js stores the generation returned by loadModel(...); openSession(...) rejects with stale-model if the currently loaded model no longer matches that generation.
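The generation check can be illustrated with a small counter. This is a sketch of the idea only; ModelSlot and its methods are hypothetical, and the real worker also tracks loading state and active sessions.

```javascript
// Hypothetical holder for the currently loaded model.
class ModelSlot {
  constructor() {
    this.generation = 0;
    this.variantId = null;
  }

  // Loading any model bumps the generation and returns the new value.
  load(variantId) {
    this.variantId = variantId;
    this.generation += 1;
    return this.generation;
  }

  // A handle created for an older generation is rejected.
  openSession(sessionId, modelGeneration) {
    if (this.variantId === null || modelGeneration !== this.generation) {
      const error = new Error("stale-model");
      error.code = "stale-model";
      throw error;
    }
    return { sessionId, variantId: this.variantId };
  }
}
```

Comparing a number is cheaper and less error-prone than comparing model identity, because reloading the same variant still invalidates handles from before the reload.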
Streaming vs batch transcription
The worker chooses the transcription mode from the variant alias:
- aliases containing streaming use createLiveTranscriptionSession();
- other ASR variants use a temporary WAV file and batch transcribe(path).
Streaming session flow:
- Create and start the Foundry live transcription session.
- appendSession(...) forwards PCM directly to foundrySdkSession.append(pcm).
- runStreamingDrain(...) reads foundrySdkSession.getStream().
- Non-final text is appended to tail; final text is appended to committed and clears tail.
- Each update emits sessionPreview with committed + tail.
- stopSession(...) calls foundrySdkSession.stop() and waits for the drain task, with a timeout.
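The committed/tail bookkeeping in the streaming flow can be written as a small reducer. The sketch follows the rules stated above literally; the { text, isFinal } result shape and the function names are assumptions, not the Foundry SDK's actual stream item format.

```javascript
function createPreviewState() {
  return { committed: "", tail: "" };
}

// Assumed result shape: { text: string, isFinal: boolean }.
function applyResult(state, { text, isFinal }) {
  if (isFinal) {
    // Final text is appended to committed and clears the tail.
    return { committed: state.committed + text, tail: "" };
  }
  // Non-final text is appended to the tail.
  return { committed: state.committed, tail: state.tail + text };
}

// The string that a sessionPreview event would carry.
function previewText(state) {
  return state.committed + state.tail;
}
```

Keeping committed text separate from the tail lets a preview be revised by a later final result without disturbing text that was already finalized.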
Batch session flow:
- Create a temporary file named voice-foundry-batch-<sessionId>.wav under os.tmpdir() or the configured temp dir.
- Write a placeholder WAV header.
- appendSession(...) writes PCM chunks and increments data size.
- stopSession(...) finalizes the WAV header.
- Call variant.createAudioClient().transcribe(wav.path) and return text ?? "".
- Delete the WAV file in finally.
Batch mode has no live preview because transcription happens only after the WAV file is finalized.
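The placeholder-then-finalize header step can be sketched with the standard 44-byte PCM WAV header. The audio parameters (16 kHz, mono, 16-bit) are assumptions for illustration, and WavAccumulator is a hypothetical in-memory stand-in for the temporary file.

```javascript
const WAV_HEADER_BYTES = 44;

// Assumed defaults: 16 kHz, mono, 16-bit PCM.
function buildWavHeader(dataBytes, sampleRate = 16000, channels = 1, bitsPerSample = 16) {
  const blockAlign = (channels * bitsPerSample) / 8;
  const header = Buffer.alloc(WAV_HEADER_BYTES);
  header.write("RIFF", 0, "ascii");
  header.writeUInt32LE(36 + dataBytes, 4); // RIFF chunk size
  header.write("WAVE", 8, "ascii");
  header.write("fmt ", 12, "ascii");
  header.writeUInt32LE(16, 16); // fmt chunk size for PCM
  header.writeUInt16LE(1, 20); // audio format 1 = PCM
  header.writeUInt16LE(channels, 22);
  header.writeUInt32LE(sampleRate, 24);
  header.writeUInt32LE(sampleRate * blockAlign, 28); // byte rate
  header.writeUInt16LE(blockAlign, 32);
  header.writeUInt16LE(bitsPerSample, 34);
  header.write("data", 36, "ascii");
  header.writeUInt32LE(dataBytes, 40); // data chunk size
  return header;
}

// In-memory stand-in for the temporary WAV file.
class WavAccumulator {
  constructor() {
    this.chunks = [buildWavHeader(0)]; // placeholder header
    this.dataBytes = 0;
  }

  append(pcm) {
    this.chunks.push(pcm);
    this.dataBytes += pcm.length;
  }

  // Rewrite the header now that the data size is known.
  finalize() {
    this.chunks[0] = buildWavHeader(this.dataBytes);
    return Buffer.concat(this.chunks);
  }
}
```

A placeholder is needed because both size fields depend on the total PCM length, which is unknown until recording stops.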
Main-thread Foundry client wrapper
cNr wraps the Foundry worker RPC channel in a main-thread client API:
- listModels() forwards listModels.
- downloadModel(variantId, onProgress) creates a downloadId, subscribes to modelDownloadProgress, and filters progress by that ID.
- loadModel(variantId) returns a handle with openSession(callbacks).
- dispose() shuts the worker down and notifies active sessions.
uNr is the per-recording session object returned by openSession(...):
| Method/event | Runtime behavior |
|---|---|
| sessionPreview event | Delivered as callbacks.onPreview(text) while the session is still open. |
| append(buffer) | Calls worker appendSession; transfers the underlying ArrayBuffer when the buffer covers it exactly. |
| stop() | Calls worker stopSession and delivers callbacks.onFinal(text). |
| cancel() | Calls worker cancelSession; errors are swallowed because cancel is best-effort cleanup. |
| dispose notification | Marks the session errored and calls callbacks.onError(...). |
This wrapper keeps recording-session state (open, stopping, final, cancelled, errored) on the main side so UI callbacks cannot fire after terminal states.
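Two of the behaviors above lend themselves to a short sketch: deciding when a Buffer's ArrayBuffer may be transferred, and suppressing callbacks after a terminal state. Both helpers are hypothetical illustrations of the idea rather than the wrapper's actual code.

```javascript
// Transfer only when the Buffer covers its backing ArrayBuffer exactly;
// otherwise transferring would detach bytes that belong to other views.
function transferListFor(buffer) {
  const backing = buffer.buffer;
  const coversAll =
    buffer.byteOffset === 0 && buffer.byteLength === backing.byteLength;
  return coversAll ? [backing] : [];
}

const TERMINAL = new Set(["final", "cancelled", "errored"]);

// Hypothetical gate that blocks callbacks once a terminal state is reached.
function createCallbackGate() {
  let state = "open";
  return {
    get state() {
      return state;
    },
    // Moves to `next` unless a terminal state was already reached.
    transition(next) {
      if (TERMINAL.has(state)) return false;
      state = next;
      return true;
    },
    // Runs the callback only while no terminal state has been reached.
    deliver(callback, value) {
      if (TERMINAL.has(state)) return false;
      callback(value);
      return true;
    },
  };
}
```

With a gate like this, a preview event that arrives after the final text is dropped instead of overwriting it in the UI.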
End-to-end dictation sequence

    sequenceDiagram
        participant UI as TUI voice keybinding
        participant Hook as qHo controller
        participant Mic as voice-mic.worker.js
        participant Bridge as $Ho recording bridge
        participant Foundry as voice-foundry.worker.js

        UI->>Hook: enable({ modelId })
        Hook->>Foundry: listModels / loadModel
        Hook->>Mic: start(inputDeviceId)
        Mic-->>Hook: pcm events ready
        UI->>Hook: beginRecording(callbacks)
        Hook->>Foundry: openSession(sessionId, modelGeneration)
        Hook->>Bridge: set mic sink
        loop while recording
            Mic-->>Bridge: pcm Buffer
            Bridge->>Foundry: appendSession(sessionId, pcm)
            Foundry-->>UI: sessionPreview(text) for streaming models
        end
        UI->>Bridge: stop()
        Bridge->>Foundry: stopSession(sessionId)
        Foundry-->>UI: final text

Failure and cleanup behavior
| Failure point | Handling |
|---|---|
| Runtime unsupported/missing/outdated | qHo(...) returns a structured result; /voice opens the runtime download/update dialog or reports unsupported platform. |
| Model not selected or not cached | qHo(...) returns no-model-selected / model-not-cached; UI opens the model picker. |
| Mic backend missing | voice-mic.worker.js returns mic-unavailable with a reinstall hint. |
| Mic read error | Worker emits error; qHo(...) logs a warning and cancels the current recording so the next recording can recover. |
| Append failure | $Ho(...) unsets the mic sink, calls onError, and cancels the Foundry session. |
| Stale model generation | Foundry worker rejects openSession with stale-model; main wrapper cleans up the session handle. |
| Streaming drain timeout | stopStreaming(...) fails with session-timeout once the timeout wrapper around the drain task expires. |
| Windows native dependency missing | Foundry manager initialization maps dependency-load errors to a Visual C++ Redistributable message. |
| Shutdown | qHo.shutdown() aborts active state, cancels recording, closes mic, disposes Foundry client, and worker shutdown callbacks clear subscribers. |
Relationship to other docs
- voice-mode-foundry-local.md covers /voice, settings, model picker, and TUI affordances.
- loader-bootstrap.md covers the secure native-module routing that makes foundry-local-sdk and @picovoice/pvrecorder-node loadable from the extracted package.
- settings-config-persistence.md covers the settings helpers used to persist voice.enabled and voice.selectedModel.
- telemetry-update-and-shutdown.md covers the broader shutdown-service pattern that voice uses for Foundry client disposal.
Created and maintained by Yingting Huang.