spych
Spych
Spych (pronounced "speech"): Talk with your computer like it's your personal assistant without sending your voice to the cloud.
A lightweight, fully offline Python toolkit for wake word detection, audio transcription, spoken AI responses, and AI integrations. Built on faster-whisper, PvRecorder, and Kokoro.
API Docs: https://connor-makowski.github.io/spych/spych.html
Installation
Recommended: pipx
pipx install spych
Alternative: pip
pip install spych
TTS Extras
By default, Spych automatically installs the right TTS backend for your Python version. You can also install explicitly:
pipx install "spych[kokoro]" # Fast, lightweight (Python < 3.13 recommended)
pipx install "spych[chatterbox]" # High-quality voice cloning (Python >= 3.13 required)
Quick Start
# Navigate to your project directory first
cd ~/my_project
# Voice-control Claude Code — say "hey claude" to trigger
spych claude
# Use a personality preset — say "hey jarvis" to trigger
spych claude --personality jarvis
# Voice-control a local Ollama model — say "hey llama" to trigger
spych ollama --model llama3.2:latest
💡 Pro tip: Saying "Hey Claude" or "Hey Llama" tends to trigger more reliably than the bare wake word.
Say "terminate" (or press Ctrl+C) to stop any session.
CLI
Available Agents
All agents require their respective CLI tool to be installed and authenticated before use.
| Command | Alias | Description | Default wake words |
|---|---|---|---|
spych claude_code_cli |
— | Voice-control Claude Code via the CLI | claude, clod, cloud, clawed |
spych claude_code_sdk |
spych claude |
Voice-control Claude Code via the Agent SDK | claude, clod, cloud, clawed |
spych codex_cli |
spych codex |
Voice-control the OpenAI Codex agent | codex |
spych gemini_cli |
spych gemini |
Voice-control the Google Gemini agent | gemini, google |
spych opencode_cli |
spych opencode |
Voice-control the OpenCode agent | opencode, open code |
spych ollama |
— | Talk to a local Ollama model | llama, ollama, lama |
Available Utilities
The following utilities are also available as CLI commands. They don't use wake words, but serve various auxiliary functions like live transcription and voice profiling.
| Command | Description |
|---|---|
spych --version |
Print the version number and exit |
spych --help |
Show detailed usage instructions and exit |
spych live |
Continuous speech-to-text transcription to file |
spych multi |
Run multiple agents simultaneously |
spych users |
Manage user profiles and global settings |
spych profile_my_voice |
Record a voice sample for TTS cloning |
Global Flags
These must be placed before the agent name:
spych --theme light claude
| Flag | Options | Default | Description |
|---|---|---|---|
--theme |
dark, light, solarized, mono |
dark |
Terminal colour theme |
💡 TUI Dashboard: Spych launches a rich terminal interface by default. Use the
--verboseflag (e.g.,spych --verbose claude) to switch to a simpler, non-interactive scrollable output.
Common Flags
All agent subcommands accept these flags:
| Flag | Default | Description |
|---|---|---|
--personality NAME |
— | Apply a named preset (sets wake words, voice, name, style) |
--name NAME |
(agent default) | Custom display name shown in the terminal |
--wake-words WORD [...] |
(agent default) | One or more words that trigger the agent |
--terminate-words WORD [...] |
terminate |
Words that stop the listener |
--listen-duration SECONDS |
0 (VAD auto) |
Seconds to record after wake word |
--follow-up-listen-duration SECONDS |
0 |
Seconds to listen for a follow-up answer |
--inactivity-timeout SECONDS |
4.0 |
Seconds of silence before returning to wake word |
--use-speaker BOOL |
true |
Speak responses aloud via TTS |
--speaker-voice VOICE |
af_heart |
Voice name for spoken responses |
--speaker-backend BACKEND |
(auto) | chatterbox or kokoro |
--response-style STYLE |
— | Style preset or custom instruction for spoken output |
--intermediate-responses BOOL |
true |
Enable intermediate response chaining for long-running tasks |
Coding agents (claude, codex, gemini, opencode) also accept:
| Flag | Default | Description |
|---|---|---|
--continue-conversation BOOL |
true |
Resume the most recent session |
--show-tool-events BOOL |
true |
Print live tool start/end events |
Agent-specific flags:
| Agent | Flag | Default | Description |
|---|---|---|---|
ollama |
--model |
llama3.2:latest |
Ollama model name |
ollama |
--history-length |
10 |
Past interactions to include in context |
ollama |
--host |
http://localhost:11434 |
Ollama instance URL |
opencode_cli |
--model |
— | Model in provider/model format |
claude_code_sdk |
--setting-sources |
user project local |
Claude Code settings sources |
Personalities
Personalities are named presets that bundle a wake word list, voice, display name, and response style into a single flag. Any explicit flag overrides the preset.
spych claude --personality jarvis
# equivalent to:
spych claude --name "JARVIS" --wake-words jarvis jarves --speaker-voice bm_george --use-speaker true --response-style jarvis
| Name | Wake words | Voice | Style |
|---|---|---|---|
assistant |
assistant, helper, computer |
af_heart |
assistant — helpful, precise, informative |
friend |
friend, buddy, pal |
af_amy |
friendly — warm and simple |
jarvis |
jarvis, jarves, jargus, jervis |
bm_george |
jarvis — precise, dry wit, "sir" |
pirate |
blackbeard, pirate, ahoy |
am_michael |
pirate — pirate speak, colorful |
news_anchor |
bella, news anchor, anchor |
af_bella |
news_anchor — professional broadcast tone |
robot |
rob, robot |
am_adam |
robot — monotone, literal |
caveman |
er, ur, caveman, cave man |
am_onyx |
caveman — very simple, direct |
User Management
Spych supports multiple user profiles, allowing agents to provide more personalized responses based on your name, age, and other context.
# Launch the interactive user management menu
spych users
The users utility allows you to:
- Create, edit, and delete user profiles.
- Set a default user for all agents.
- Change the global terminal theme (
dark,light,solarized,mono).
You can also specify a user for a specific session:
spych claude --user Connor
Response Styles
The --response-style flag shapes how the agent formats its spoken output.
| Style | Description |
|---|---|
assistant |
Helpful and precise, concise and informative |
concise |
Key points only, direct |
friendly |
Warm, approachable, simple language |
military |
Brevity-style, short sentences |
five_year_old |
Simple words, very short |
fast |
As brief as reasonably possible |
pirate |
Pirate speak, colorful |
news_anchor |
Professional broadcast tone |
haiku |
5-7-5 haiku form |
shakespearean |
Elizabethan English |
robot |
Monotone, literal |
caveman |
Very simple, direct |
yoda |
Inverted sentence structure |
jarvis |
JARVIS from Iron Man — precise, dry wit, addresses user as "sir" or "ma'am" |
You can also pass any custom instruction string directly: --response-style "Reply in exactly one sentence.".
Text-to-Speech & Voices
Spoken responses are enabled by default for personality presets and when --use-speaker true is set.
spych claude --use-speaker true --speaker-voice bm_george
spych claude --use-speaker true --speaker-backend kokoro
spych claude --use-speaker false # disable TTS
When TTS is active, short responses are spoken verbatim; longer ones use the agent's short summary. If the response ends with a question, Spych automatically listens for a follow-up — no wake word required.
TTS Backends
| Backend | Best for | Python support |
|---|---|---|
| Chatterbox (default priority) | Natural voices, zero-shot voice cloning | 3.11+ (required for 3.13+) |
| Kokoro (lightweight fallback) | Fast, low-resource devices (e.g. Raspberry Pi) | 3.11–3.12 recommended |
Spych tries Chatterbox first, then Kokoro. Use --speaker-backend to force one explicitly.
Available Voices
The same voice names work for both backends.
- Chatterbox wave voices: https://github.com/connor-makowski/spych/tree/main/voices/wave
- Kokoro pt voices (56 total): https://github.com/connor-makowski/spych/tree/main/voices/pt
American English (am_ / af_):
| Voice | Gender | Grade |
|---|---|---|
af_heart |
F | A (default) |
af_bella |
F | A- |
af_nicole |
F | B- |
am_michael |
M | C+ |
am_fenrir |
M | C+ |
am_puck |
M | C+ |
British English (bm_ / bf_):
| Voice | Gender | Grade |
|---|---|---|
bf_emma |
F | B- |
bf_isabella |
F | C |
bm_george |
M | C |
Voice Cloning
Record a 10-second sample of your voice, then use it as the speaker voice. Requires the Chatterbox backend.
# Step 1: record your profile
spych profile_my_voice --name my_voice
# Step 2: use it
spych claude --use-speaker true --speaker-voice my_voice --speaker-backend chatterbox
# Or use any .wav file directly
spych claude --use-speaker true --speaker-voice /path/to/my_voice.wav --speaker-backend chatterbox
Live Transcription
spych live continuously records from the microphone using VAD and writes the transcript to disk in real time. No wake word required — it transcribes everything until stopped.
CLI
spych live # writes transcript.srt
spych live --output-path meeting --output-format both
spych live --terminate-words "stop recording"
spych live --no-timestamps --whisper-model small.en
Stop by pressing the stop key (default: q + Enter), saying a terminate word, or pressing Ctrl+C.
Parameters
| Flag | Default | Description |
|---|---|---|
--output-path PATH |
transcript |
Base output file path without extension |
--output-format FORMAT |
srt |
txt, srt, or both |
--no-timestamps |
false | Omit timestamps from terminal and .txt output |
--stop-key KEY |
q |
Key (then Enter) to stop the session |
--terminate-words WORD [...] |
— | Spoken words that stop the session |
--device-index N |
-1 |
Microphone device index; -1 uses system default |
--whisper-model MODEL |
base.en |
faster-whisper model name |
--whisper-device DEVICE |
cpu |
cpu or cuda |
--whisper-compute-type TYPE |
int8 |
int8, float16, or float32 |
--no-speech-threshold FLOAT |
0.3 |
Whisper segments above this no_speech_prob are dropped |
--speech-threshold FLOAT |
0.5 |
VAD speech onset probability |
--silence-threshold FLOAT |
0.35 |
VAD silence probability during speech |
--silence-frames N |
20 |
Consecutive silent frames to end a segment (~32ms each) |
--speech-pad-frames N |
5 |
Pre-roll frames and onset confirmation count |
--max-speech-duration SECONDS |
30.0 |
Hard cap on a single segment |
--context-words N |
32 |
Trailing words passed as whisper initial_prompt |
Python
from spych.live import SpychLive
SpychLive(
output_format="srt", # "txt", "srt", or "both"
output_path="my_transcript", # written to my_transcript.srt
show_timestamps=True,
stop_key="q", # type q + Enter to stop
terminate_words=["stop recording"],
).start()
SpychLive Parameters
| Parameter | Default | Description |
|---|---|---|
output_format |
"srt" |
Output format(s): "txt", "srt", or "both" |
output_path |
"transcript" |
Base path without extension |
show_timestamps |
True |
Prepend [HH:MM:SS] timestamps to terminal and .txt output |
stop_key |
"q" |
Key (then Enter) to stop the session |
terminate_words |
None |
Spoken words that stop the session |
on_terminate |
None |
No-argument callback executed when a terminate word fires |
device_index |
-1 |
Microphone device index; -1 uses system default |
whisper_model |
"base.en" |
faster-whisper model name |
whisper_device |
"cpu" |
Device for inference: "cpu" or "cuda" |
whisper_compute_type |
"int8" |
Compute precision: "int8", "float16", or "float32" |
no_speech_threshold |
0.4 |
Whisper segments above this are discarded |
speech_threshold |
0.5 |
Silero VAD onset probability |
silence_threshold |
0.35 |
Silero VAD silence probability during speech |
silence_frames_threshold |
20 |
Consecutive silent frames to close a segment |
speech_pad_frames |
5 |
Pre-roll frame count and onset confirmation threshold |
max_speech_duration_s |
30.0 |
Hard cap on a single segment in seconds |
context_words |
32 |
Trailing transcript words passed as initial_prompt |
Multi-agent
Run several agents simultaneously under a single listener, each bound to its own wake words. Say "hey claude" to talk to Claude, "hey llama" to talk to Ollama — all in the same terminal session.
CLI
# Two agents, default wake words
spych multi --agents claude gemini
# Include Ollama with a specific model
spych multi --agents claude ollama --ollama-model llama3.2:latest
# Tune listen duration across all agents
spych multi --agents claude codex --listen-duration 8
Multi-agent CLI Flags
| Flag | Default | Description |
|---|---|---|
--agents AGENT [...] |
(required) | Agents to run: claude (claude_code_cli), claude_sdk (claude_code_sdk), codex (codex_cli), gemini (gemini_cli), opencode (opencode_cli), ollama |
--terminate-words WORD [...] |
terminate |
Words that stop all agents |
--listen-duration SECONDS |
5 |
Seconds to listen after a wake word |
--follow-up-listen-duration SECONDS |
0 |
Seconds to listen for follow-up answers |
--inactivity-timeout SECONDS |
4.0 |
Seconds of silence before returning to wake word |
--continue-conversation BOOL |
true |
Resume the most recent session for each coding agent |
--show-tool-events BOOL |
true |
Print live tool start/end events |
--use-speaker BOOL |
true |
Speak responses aloud via TTS |
--speaker-backend BACKEND |
(auto) | chatterbox or kokoro |
--intermediate-responses BOOL |
true |
Enable intermediate response chaining for long-running tasks |
--ollama-model MODEL |
llama3.2:latest |
Only used when ollama is in --agents |
--ollama-host URL |
http://localhost:11434 |
Only used when ollama is in --agents |
--ollama-history-length N |
10 |
Only used when ollama is in --agents |
--opencode-model MODEL |
— | provider/model format. Only used when opencode_cli is in --agents |
--setting-sources SOURCE [...] |
user project local |
Only used when claude_code_sdk is in --agents |
Python
from spych.core import Spych
from spych.orchestrator import SpychOrchestrator
from spych.agents.claude import LocalClaudeCodeCLIResponder
from spych.agents.ollama import OllamaResponder
spych_object = Spych(whisper_model="base.en")
SpychOrchestrator(
entries=[
{
"responder": LocalClaudeCodeCLIResponder(spych_object=spych_object),
"wake_words": ["claude", "clod", "cloud", "clawed"],
"terminate_words": ["terminate"],
},
{
"responder": OllamaResponder(spych_object=spych_object, model="llama3.2:latest"),
"wake_words": ["llama", "ollama", "lama"],
},
]
).start()
OrchestratorEntry Keys
| Key | Required | Default | Description |
|---|---|---|---|
responder |
✓ | — | A BaseResponder instance |
wake_words |
✓ | — | Words that trigger this responder. Must be unique across all entries |
terminate_words |
["terminate"] |
Words that stop the entire orchestrator |
SpychOrchestrator Parameters
| Parameter | Default | Description |
|---|---|---|
entries |
(required) | List of OrchestratorEntry dicts |
spych_wake_kwargs |
None |
Extra kwargs forwarded to SpychWake |
Python — Built-in Agents
The same agents available from the CLI can be used directly from Python.
Claude Code CLI
from spych.agents import claude_code_cli
# Say "hey claude" to trigger
claude_code_cli()
Claude Code SDK
from spych.agents import claude_code_sdk
# Say "hey claude" to trigger
claude_code_sdk()
Codex CLI
from spych.agents import codex_cli
# Say "hey codex" to trigger
codex_cli()
Gemini CLI
from spych.agents import gemini_cli
# Say "hey gemini" to trigger
gemini_cli()
OpenCode CLI
from spych.agents import opencode_cli
# Say "hey opencode" to trigger
opencode_cli()
Ollama
from spych.agents import ollama
# Pull the model first: ollama pull llama3.2:latest
# Say "hey llama" to trigger
ollama(model="llama3.2:latest")
Coding Agent Parameters
| Parameter | claude_code_cli |
claude_code_sdk |
codex_cli |
gemini_cli |
opencode_cli |
Description |
|---|---|---|---|---|---|---|
name |
Claude |
Claude |
Codex |
Gemini |
OpenCode |
Custom display name |
wake_words |
["claude", "clod", "cloud", "clawed"] |
["claude", "clod", "cloud", "clawed"] |
["codex"] |
["gemini", "google"] |
["opencode", "open code"] |
Words that trigger the agent |
terminate_words |
["terminate"] |
["terminate"] |
["terminate"] |
["terminate"] |
["terminate"] |
Words that stop the listener |
model |
— | — | — | — | None |
Model in provider/model format |
listen_duration |
0 |
0 |
0 |
0 |
0 |
Seconds to listen (0 = VAD auto) |
continue_conversation |
True |
True |
True |
True |
True |
Resume the most recent session |
setting_sources |
— | ["user", "project", "local"] |
— | — | — | Claude Code settings sources |
show_tool_events |
True |
True |
True |
True |
True |
Print live tool start/end events |
use_speaker |
False |
False |
False |
False |
False |
Speak responses aloud via TTS |
speaker_voice |
"af_heart" |
"af_heart" |
"af_heart" |
"af_heart" |
"af_heart" |
Voice name for TTS |
response_style |
"" |
"" |
"" |
"" |
"" |
Style preset or custom instruction |
allow_intermediate_responses |
True |
True |
True |
True |
True |
Enable intermediate response chaining |
spych_kwargs |
— | — | — | — | — | Extra kwargs passed to Spych |
spych_wake_kwargs |
— | — | — | — | — | Extra kwargs passed to SpychWake |
Ollama Parameters
| Parameter | Default | Description |
|---|---|---|
name |
"Ollama" |
Custom display name |
wake_words |
["llama", "ollama", "lama"] |
Words that trigger the agent |
terminate_words |
["terminate"] |
Words that stop the listener |
model |
"llama3.2:latest" |
Ollama model name |
listen_duration |
0 |
Seconds to listen (0 = VAD auto) |
history_length |
10 |
Past interactions to include in context |
host |
"http://localhost:11434" |
Ollama instance URL |
use_speaker |
False |
Speak responses aloud via TTS |
speaker_voice |
"af_heart" |
Voice name for TTS |
response_style |
"" |
Style preset or custom instruction |
allow_intermediate_responses |
True |
Enable intermediate response chaining |
spych_kwargs |
None |
Extra kwargs passed to Spych |
spych_wake_kwargs |
None |
Extra kwargs passed to SpychWake |
Python: Building Your Own Agent
Subclass BaseResponder, implement respond, and Spych handles the rest: wake word detection, transcription, spinner UI, timing, TTS, error handling.
respond() must return an AgentResponse. Use self.format_prompt() to inject the JSON schema into your prompt and self.parse_output() to parse the result:
from spych.responders import BaseResponder, AgentResponse
class MyResponder(BaseResponder):
def respond(self, user_input: str) -> AgentResponse:
raw = call_my_llm(self.format_prompt(user_input))
return self.parse_output(raw)
A complete working example with a custom wake word:
from spych import Spych, SpychOrchestrator
from spych.responders import BaseResponder, AgentResponse
class EchoResponder(BaseResponder):
def respond(self, user_input: str) -> AgentResponse:
return AgentResponse(
response=f"'{self.name}' heard: {user_input}",
summary=f"Heard: {user_input}",
requires_user_feedback=False,
)
SpychOrchestrator(
entries=[
{
"responder": EchoResponder(
spych_object=Spych(whisper_model="base.en"),
listen_duration=5,
name="TestResponder",
),
"wake_words": ["test"],
"terminate_words": ["terminate"],
}
]
).start()
You can also subclass a built-in agent. For example, a translation agent that routes to Ollama:
from spych import Spych, SpychOrchestrator
from spych.agents import OllamaResponder
from spych.responders import AgentResponse
class Spanish(OllamaResponder):
def respond(self, user_input: str) -> AgentResponse:
user_input = f"Translate the following to Spanish and return only the translated text: '{user_input}'"
return super().respond(user_input)
class German(OllamaResponder):
def respond(self, user_input: str) -> AgentResponse:
user_input = f"Translate the following to German and return only the translated text: '{user_input}'"
return super().respond(user_input)
spych_object = Spych(whisper_model="base.en")
SpychOrchestrator(
entries=[
{
"responder": Spanish(spych_object=spych_object, name="SpanishTranslator", model="llama3.2:latest"),
"wake_words": ["spanish"],
"terminate_words": ["terminate"],
},
{
"responder": German(spych_object=spych_object, name="GermanTranslator", model="llama3.2:latest"),
"wake_words": ["german"],
"terminate_words": ["terminate"],
},
]
).start()
Think your agent would be useful to others? Open a PR or file a feature request via a GitHub issue.
Python: Lower-Level API
Need more control? Use Spych and SpychWake directly.
Transcription
from spych import Spych
spych = Spych(
whisper_model="base.en", # tiny, small, medium, large — all faster-whisper models work
whisper_device="cpu", # use "cuda" for Nvidia GPU
)
print(spych.listen(duration=5))
See: https://connor-makowski.github.io/spych/spych/core.html
Wake Word Detection
from spych import SpychWake, Spych
spych = Spych(whisper_model="base.en", whisper_device="cpu")
def on_wake():
print("Wake word detected! Listening...")
print(spych.listen(duration=5))
SpychWake(
wake_word_map={"speech": on_wake},
whisper_model="tiny.en",
whisper_device="cpu",
).start()
See: https://connor-makowski.github.io/spych/spych/wake.html
API Reference
Full docs including all parameters and methods: https://connor-makowski.github.io/spych/spych.html
Support
Found a bug or want a new feature? Open an issue on GitHub.
Contributing
Contributions are welcome!
- Fork the repo and clone it locally.
- Make your changes.
- Run tests and make sure they pass.
- Commit atomically with clear messages.
- Submit a pull request.
Virtual environment setup:
python3.11 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
./utils/test.sh
1""" 2# Spych 3[](https://badge.fury.io/py/spych) 4[](https://opensource.org/licenses/MIT) 5[](https://pypi.org/project/spych/) 6 7**Spych** (pronounced "speech"): Talk with your computer like it's your personal assistant without sending your voice to the cloud. 8 9A lightweight, fully offline Python toolkit for wake word detection, audio transcription, spoken AI responses, and AI integrations. Built on [faster-whisper](https://github.com/SYSTRAN/faster-whisper), [PvRecorder](https://github.com/Picovoice/pvrecorder), and [Kokoro](https://github.com/hexgrad/kokoro). 10 11**API Docs**: https://connor-makowski.github.io/spych/spych.html 12 13--- 14 15# Installation 16 17### Recommended: pipx 18 19```bash 20pipx install spych 21``` 22 23### Alternative: pip 24 25```bash 26pip install spych 27``` 28 29### TTS Extras 30 31By default, Spych automatically installs the right TTS backend for your Python version. You can also install explicitly: 32 33```bash 34pipx install "spych[kokoro]" # Fast, lightweight (Python < 3.13 recommended) 35pipx install "spych[chatterbox]" # High-quality voice cloning (Python >= 3.13 required) 36``` 37 38--- 39 40# Quick Start 41 42```bash 43# Navigate to your project directory first 44cd ~/my_project 45 46# Voice-control Claude Code — say "hey claude" to trigger 47spych claude 48 49# Use a personality preset — say "hey jarvis" to trigger 50spych claude --personality jarvis 51 52# Voice-control a local Ollama model — say "hey llama" to trigger 53spych ollama --model llama3.2:latest 54``` 55 56> 💡 **Pro tip:** Saying "Hey Claude" or "Hey Llama" tends to trigger more reliably than the bare wake word. 57 58Say **"terminate"** (or press `Ctrl+C`) to stop any session. 59 60--- 61 62# CLI 63 64## Available Agents 65 66All agents require their respective CLI tool to be installed and authenticated before use. 67 68| Command | Alias | Description | Default wake words | 69|---|---|---|---| 70| `spych claude_code_cli` | — | Voice-control Claude Code via the CLI | `claude`, `clod`, `cloud`, `clawed` | 71| `spych claude_code_sdk` | `spych claude` | Voice-control Claude Code via the Agent SDK | `claude`, `clod`, `cloud`, `clawed` | 72| `spych codex_cli` | `spych codex` | Voice-control the OpenAI Codex agent | `codex` | 73| `spych gemini_cli` | `spych gemini` | Voice-control the Google Gemini agent | `gemini`, `google` | 74| `spych opencode_cli` | `spych opencode` | Voice-control the OpenCode agent | `opencode`, `open code` | 75| `spych ollama` | — | Talk to a local Ollama model | `llama`, `ollama`, `lama` | 76 77## Available Utilities 78 79The following utilities are also available as CLI commands. They don't use wake words, but serve various auxiliary functions like live transcription and voice profiling. 80 81| Command | Description | 82|---|---| 83| `spych --version` | Print the version number and exit | 84| `spych --help` | Show detailed usage instructions and exit | 85| `spych live` | Continuous speech-to-text transcription to file | 86| `spych multi` | Run multiple agents simultaneously | 87| `spych users` | Manage user profiles and global settings | 88| `spych profile_my_voice` | Record a voice sample for TTS cloning | 89 90## Global Flags 91 92These must be placed **before** the agent name: 93 94```bash 95spych --theme light claude 96``` 97 98| Flag | Options | Default | Description | 99|---|---|---|---| 100| `--theme` | `dark`, `light`, `solarized`, `mono` | `dark` | Terminal colour theme | 101 102> 💡 **TUI Dashboard:** Spych launches a rich terminal interface by default. Use the `--verbose` flag (e.g., `spych --verbose claude`) to switch to a simpler, non-interactive scrollable output. 103 104--- 105 106## Common Flags 107 108All agent subcommands accept these flags: 109 110| Flag | Default | Description | 111|---|---|---| 112| `--personality NAME` | — | Apply a named preset (sets wake words, voice, name, style) | 113| `--name NAME` | *(agent default)* | Custom display name shown in the terminal | 114| `--wake-words WORD [...]` | *(agent default)* | One or more words that trigger the agent | 115| `--terminate-words WORD [...]` | `terminate` | Words that stop the listener | 116| `--listen-duration SECONDS` | `0` (VAD auto) | Seconds to record after wake word | 117| `--follow-up-listen-duration SECONDS` | `0` | Seconds to listen for a follow-up answer | 118| `--inactivity-timeout SECONDS` | `4.0` | Seconds of silence before returning to wake word | 119| `--use-speaker BOOL` | `true` | Speak responses aloud via TTS | 120| `--speaker-voice VOICE` | `af_heart` | Voice name for spoken responses | 121| `--speaker-backend BACKEND` | *(auto)* | `chatterbox` or `kokoro` | 122| `--response-style STYLE` | — | Style preset or custom instruction for spoken output | 123| `--intermediate-responses BOOL` | `true` | Enable intermediate response chaining for long-running tasks | 124 125Coding agents (`claude`, `codex`, `gemini`, `opencode`) also accept: 126 127| Flag | Default | Description | 128|---|---|---| 129| `--continue-conversation BOOL` | `true` | Resume the most recent session | 130| `--show-tool-events BOOL` | `true` | Print live tool start/end events | 131 132Agent-specific flags: 133 134| Agent | Flag | Default | Description | 135|---|---|---|---| 136| `ollama` | `--model` | `llama3.2:latest` | Ollama model name | 137| `ollama` | `--history-length` | `10` | Past interactions to include in context | 138| `ollama` | `--host` | `http://localhost:11434` | Ollama instance URL | 139| `opencode_cli` | `--model` | — | Model in `provider/model` format | 140| `claude_code_sdk` | `--setting-sources` | `user project local` | Claude Code settings sources | 141 142--- 143 144# Personalities 145 146Personalities are named presets that bundle a wake word list, voice, display name, and response style into a single flag. Any explicit flag overrides the preset. 147 148```bash 149spych claude --personality jarvis 150# equivalent to: 151spych claude --name "JARVIS" --wake-words jarvis jarves \ 152 --speaker-voice bm_george --use-speaker true \ 153 --response-style jarvis 154``` 155 156| Name | Wake words | Voice | Style | 157|---|---|---|---| 158| `assistant` | `assistant`, `helper`, `computer` | `af_heart` | `assistant` — helpful, precise, informative | 159| `friend` | `friend`, `buddy`, `pal` | `af_amy` | `friendly` — warm and simple | 160| `jarvis` | `jarvis`, `jarves`, `jargus`, `jervis` | `bm_george` | `jarvis` — precise, dry wit, "sir" | 161| `pirate` | `blackbeard`, `pirate`, `ahoy` | `am_michael` | `pirate` — pirate speak, colorful | 162| `news_anchor` | `bella`, `news anchor`, `anchor` | `af_bella` | `news_anchor` — professional broadcast tone | 163| `robot` | `rob`, `robot` | `am_adam` | `robot` — monotone, literal | 164| `caveman` | `er`, `ur`, `caveman`, `cave man` | `am_onyx` | `caveman` — very simple, direct | 165 166--- 167 168# User Management 169 170Spych supports multiple user profiles, allowing agents to provide more personalized responses based on your name, age, and other context. 171 172```bash 173# Launch the interactive user management menu 174spych users 175``` 176 177The `users` utility allows you to: 178- Create, edit, and delete user profiles. 179- Set a default user for all agents. 180- Change the global terminal theme (`dark`, `light`, `solarized`, `mono`). 181 182You can also specify a user for a specific session: 183```bash 184spych claude --user Connor 185``` 186 187--- 188 189# Response Styles 190 191The `--response-style` flag shapes how the agent formats its spoken output. 192 193| Style | Description | 194|---|---| 195| `assistant` | Helpful and precise, concise and informative | 196| `concise` | Key points only, direct | 197| `friendly` | Warm, approachable, simple language | 198| `military` | Brevity-style, short sentences | 199| `five_year_old` | Simple words, very short | 200| `fast` | As brief as reasonably possible | 201| `pirate` | Pirate speak, colorful | 202| `news_anchor` | Professional broadcast tone | 203| `haiku` | 5-7-5 haiku form | 204| `shakespearean` | Elizabethan English | 205| `robot` | Monotone, literal | 206| `caveman` | Very simple, direct | 207| `yoda` | Inverted sentence structure | 208| `jarvis` | JARVIS from Iron Man — precise, dry wit, addresses user as "sir" or "ma'am" | 209 210You can also pass any custom instruction string directly: `--response-style "Reply in exactly one sentence."`. 211 212--- 213 214# Text-to-Speech & Voices 215 216Spoken responses are enabled by default for personality presets and when `--use-speaker true` is set. 217 218```bash 219spych claude --use-speaker true --speaker-voice bm_george 220spych claude --use-speaker true --speaker-backend kokoro 221spych claude --use-speaker false # disable TTS 222``` 223 224When TTS is active, short responses are spoken verbatim; longer ones use the agent's short `summary`. If the response ends with a question, Spych automatically listens for a follow-up — no wake word required. 225 226### TTS Backends 227 228| Backend | Best for | Python support | 229|---|---|---| 230| **Chatterbox** (default priority) | Natural voices, zero-shot voice cloning | 3.11+ (required for 3.13+) | 231| **Kokoro** (lightweight fallback) | Fast, low-resource devices (e.g. Raspberry Pi) | 3.11–3.12 recommended | 232 233Spych tries Chatterbox first, then Kokoro. Use `--speaker-backend` to force one explicitly. 234 235### Available Voices 236 237The same voice names work for both backends. 238 239- Chatterbox wave voices: https://github.com/connor-makowski/spych/tree/main/voices/wave 240- Kokoro pt voices (56 total): https://github.com/connor-makowski/spych/tree/main/voices/pt 241 242American English (`am_` / `af_`): 243 244| Voice | Gender | Grade | 245|---|---|---| 246| `af_heart` | F | A (default) | 247| `af_bella` | F | A- | 248| `af_nicole` | F | B- | 249| `am_michael` | M | C+ | 250| `am_fenrir` | M | C+ | 251| `am_puck` | M | C+ | 252 253British English (`bm_` / `bf_`): 254 255| Voice | Gender | Grade | 256|---|---|---| 257| `bf_emma` | F | B- | 258| `bf_isabella` | F | C | 259| `bm_george` | M | C | 260 261### Voice Cloning 262 263Record a 10-second sample of your voice, then use it as the speaker voice. Requires the Chatterbox backend. 264 265```bash 266# Step 1: record your profile 267spych profile_my_voice --name my_voice 268 269# Step 2: use it 270spych claude --use-speaker true --speaker-voice my_voice --speaker-backend chatterbox 271 272# Or use any .wav file directly 273spych claude --use-speaker true --speaker-voice /path/to/my_voice.wav --speaker-backend chatterbox 274``` 275 276--- 277 278# Live Transcription 279 280`spych live` continuously records from the microphone using VAD and writes the transcript to disk in real time. No wake word required — it transcribes everything until stopped. 281 282## CLI 283 284```bash 285spych live # writes transcript.srt 286spych live --output-path meeting --output-format both 287spych live --terminate-words "stop recording" 288spych live --no-timestamps --whisper-model small.en 289``` 290 291Stop by pressing the stop key (default: `q` + Enter), saying a terminate word, or pressing `Ctrl+C`. 292 293### Parameters 294 295| Flag | Default | Description | 296|---|---|---| 297| `--output-path PATH` | `transcript` | Base output file path without extension | 298| `--output-format FORMAT` | `srt` | `txt`, `srt`, or `both` | 299| `--no-timestamps` | false | Omit timestamps from terminal and `.txt` output | 300| `--stop-key KEY` | `q` | Key (then Enter) to stop the session | 301| `--terminate-words WORD [...]` | — | Spoken words that stop the session | 302| `--device-index N` | `-1` | Microphone device index; -1 uses system default | 303| `--whisper-model MODEL` | `base.en` | faster-whisper model name | 304| `--whisper-device DEVICE` | `cpu` | `cpu` or `cuda` | 305| `--whisper-compute-type TYPE` | `int8` | `int8`, `float16`, or `float32` | 306| `--no-speech-threshold FLOAT` | `0.3` | Whisper segments above this `no_speech_prob` are dropped | 307| `--speech-threshold FLOAT` | `0.5` | VAD speech onset probability | 308| `--silence-threshold FLOAT` | `0.35` | VAD silence probability during speech | 309| `--silence-frames N` | `20` | Consecutive silent frames to end a segment (~32ms each) | 310| `--speech-pad-frames N` | `5` | Pre-roll frames and onset confirmation count | 311| `--max-speech-duration SECONDS` | `30.0` | Hard cap on a single segment | 312| `--context-words N` | `32` | Trailing words passed as whisper `initial_prompt` | 313 314## Python 315 316```python 317from spych.live import SpychLive 318 319SpychLive( 320 output_format="srt", # "txt", "srt", or "both" 321 output_path="my_transcript", # written to my_transcript.srt 322 show_timestamps=True, 323 stop_key="q", # type q + Enter to stop 324 terminate_words=["stop recording"], 325).start() 326``` 327 328### `SpychLive` Parameters 329 330| Parameter | Default | Description | 331|---|---|---| 332| `output_format` | `"srt"` | Output format(s): `"txt"`, `"srt"`, or `"both"` | 333| `output_path` | `"transcript"` | Base path without extension | 334| `show_timestamps` | `True` | Prepend `[HH:MM:SS]` timestamps to terminal and `.txt` output | 335| `stop_key` | `"q"` | Key (then Enter) to stop the session | 336| `terminate_words` | `None` | Spoken words that stop the session | 337| `on_terminate` | `None` | No-argument callback executed when a terminate word fires | 338| `device_index` | `-1` | Microphone device index; `-1` uses system default | 339| `whisper_model` | `"base.en"` | faster-whisper model name | 340| `whisper_device` | `"cpu"` | Device for inference: `"cpu"` or `"cuda"` | 341| `whisper_compute_type` | `"int8"` | Compute precision: `"int8"`, `"float16"`, or `"float32"` | 342| `no_speech_threshold` | `0.4` | Whisper segments above this are discarded | 343| `speech_threshold` | `0.5` | Silero VAD onset probability | 344| `silence_threshold` | `0.35` | Silero VAD silence probability during speech | 345| `silence_frames_threshold` | `20` | Consecutive silent frames to close a segment | 346| `speech_pad_frames` | `5` | Pre-roll frame count and onset confirmation threshold | 347| `max_speech_duration_s` | `30.0` | Hard cap on a single segment in seconds | 348| `context_words` | `32` | Trailing transcript words passed as `initial_prompt` | 349 350--- 351 352# Multi-agent 353 354Run several agents simultaneously under a single listener, each bound to its own wake words. Say "hey claude" to talk to Claude, "hey llama" to talk to Ollama — all in the same terminal session. 355 356## CLI 357 358```bash 359# Two agents, default wake words 360spych multi --agents claude gemini 361 362# Include Ollama with a specific model 363spych multi --agents claude ollama --ollama-model llama3.2:latest 364 365# Tune listen duration across all agents 366spych multi --agents claude codex --listen-duration 8 367``` 368 369### Multi-agent CLI Flags 370 371| Flag | Default | Description | 372|---|---|---| 373| `--agents AGENT [...]` | *(required)* | Agents to run: `claude` (`claude_code_cli`), `claude_sdk` (`claude_code_sdk`), `codex` (`codex_cli`), `gemini` (`gemini_cli`), `opencode` (`opencode_cli`), `ollama` | 374| `--terminate-words WORD [...]` | `terminate` | Words that stop all agents | 375| `--listen-duration SECONDS` | `5` | Seconds to listen after a wake word | 376| `--follow-up-listen-duration SECONDS` | `0` | Seconds to listen for follow-up answers | 377| `--inactivity-timeout SECONDS` | `4.0` | Seconds of silence before returning to wake word | 378| `--continue-conversation BOOL` | `true` | Resume the most recent session for each coding agent | 379| `--show-tool-events BOOL` | `true` | Print live tool start/end events | 380| `--use-speaker BOOL` | `true` | Speak responses aloud via TTS | 381| `--speaker-backend BACKEND` | *(auto)* | `chatterbox` or `kokoro` | 382| `--intermediate-responses BOOL` | `true` | Enable intermediate response chaining for long-running tasks | 383| `--ollama-model MODEL` | `llama3.2:latest` | Only used when `ollama` is in `--agents` | 384| `--ollama-host URL` | `http://localhost:11434` | Only used when `ollama` is in `--agents` | 385| `--ollama-history-length N` | `10` | Only used when `ollama` is in `--agents` | 386| `--opencode-model MODEL` | — | `provider/model` format. Only used when `opencode_cli` is in `--agents` | 387| `--setting-sources SOURCE [...]` | `user project local` | Only used when `claude_code_sdk` is in `--agents` | 388 389## Python 390 391```python 392from spych.core import Spych 393from spych.orchestrator import SpychOrchestrator 394from spych.agents.claude import LocalClaudeCodeCLIResponder 395from spych.agents.ollama import OllamaResponder 396 397spych_object = Spych(whisper_model="base.en") 398 399SpychOrchestrator( 400 entries=[ 401 { 402 "responder": LocalClaudeCodeCLIResponder(spych_object=spych_object), 403 "wake_words": ["claude", "clod", "cloud", "clawed"], 404 "terminate_words": ["terminate"], 405 }, 406 { 407 "responder": OllamaResponder(spych_object=spych_object, model="llama3.2:latest"), 408 "wake_words": ["llama", "ollama", "lama"], 409 }, 410 ] 411).start() 412``` 413 414### `OrchestratorEntry` Keys 415 416| Key | Required | Default | Description | 417|---|---|---|---| 418| `responder` | ✓ | — | A `BaseResponder` instance | 419| `wake_words` | ✓ | — | Words that trigger this responder. Must be unique across all entries | 420| `terminate_words` | | `["terminate"]` | Words that stop the entire orchestrator | 421 422### `SpychOrchestrator` Parameters 423 424| Parameter | Default | Description | 425|---|---|---| 426| `entries` | *(required)* | List of `OrchestratorEntry` dicts | 427| `spych_wake_kwargs` | `None` | Extra kwargs forwarded to `SpychWake` | 428 429--- 430 431# Python — Built-in Agents 432 433The same agents available from the CLI can be used directly from Python. 434 435## Claude Code CLI 436 437```python 438from spych.agents import claude_code_cli 439 440# Say "hey claude" to trigger 441claude_code_cli() 442``` 443 444## Claude Code SDK 445 446```python 447from spych.agents import claude_code_sdk 448 449# Say "hey claude" to trigger 450claude_code_sdk() 451``` 452 453## Codex CLI 454 455```python 456from spych.agents import codex_cli 457 458# Say "hey codex" to trigger 459codex_cli() 460``` 461 462## Gemini CLI 463 464```python 465from spych.agents import gemini_cli 466 467# Say "hey gemini" to trigger 468gemini_cli() 469``` 470 471## OpenCode CLI 472 473```python 474from spych.agents import opencode_cli 475 476# Say "hey opencode" to trigger 477opencode_cli() 478``` 479 480## Ollama 481 482```python 483from spych.agents import ollama 484 485# Pull the model first: ollama pull llama3.2:latest 486# Say "hey llama" to trigger 487ollama(model="llama3.2:latest") 488``` 489 490### Coding Agent Parameters 491 492| Parameter | `claude_code_cli` | `claude_code_sdk` | `codex_cli` | `gemini_cli` | `opencode_cli` | Description | 493|---|---|---|---|---|---|---| 494| `name` | `Claude` | `Claude` | `Codex` | `Gemini` | `OpenCode` | Custom display name | 495| `wake_words` | `["claude", "clod", "cloud", "clawed"]` | `["claude", "clod", "cloud", "clawed"]` | `["codex"]` | `["gemini", "google"]` | `["opencode", "open code"]` | Words that trigger the agent | 496| `terminate_words` | `["terminate"]` | `["terminate"]` | `["terminate"]` | `["terminate"]` | `["terminate"]` | Words that stop the listener | 497| `model` | — | — | — | — | `None` | Model in `provider/model` format | 498| `listen_duration` | `0` | `0` | `0` | `0` | `0` | Seconds to listen (0 = VAD auto) | 499| `continue_conversation` | `True` | `True` | `True` | `True` | `True` | Resume the most recent session | 500| `setting_sources` | — | `["user", "project", "local"]` | — | — | — | Claude Code settings sources | 501| `show_tool_events` | `True` | `True` | `True` | `True` | `True` | Print live tool start/end events | 502| `use_speaker` | `False` | `False` | `False` | `False` | `False` | Speak responses aloud via TTS | 503| `speaker_voice` | `"af_heart"` | `"af_heart"` | `"af_heart"` | `"af_heart"` | `"af_heart"` | Voice name for TTS | 504| `response_style` | `""` | `""` | `""` | `""` | `""` | Style preset or custom instruction | 505| `allow_intermediate_responses` | `True` | `True` | `True` | `True` | `True` | Enable intermediate response chaining | 506| `spych_kwargs` | — | — | — | — | — | Extra kwargs passed to `Spych` | 507| `spych_wake_kwargs` | — | — | — | — | — | Extra kwargs passed to `SpychWake` | 508 509### Ollama Parameters 510 511| Parameter | Default | Description | 512|---|---|---| 513| `name` | `"Ollama"` | Custom display name | 514| `wake_words` | `["llama", "ollama", "lama"]` | Words that trigger the agent | 515| `terminate_words` | `["terminate"]` | Words that stop the listener | 516| `model` | `"llama3.2:latest"` | Ollama model name | 517| `listen_duration` | `0` | Seconds to listen (0 = VAD auto) | 518| `history_length` | `10` | Past interactions to include in context | 519| `host` | `"http://localhost:11434"` | Ollama instance URL | 520| `use_speaker` | `False` | Speak responses aloud via TTS | 521| `speaker_voice` | `"af_heart"` | Voice name for TTS | 522| `response_style` | `""` | Style preset or custom instruction | 523| `allow_intermediate_responses` | `True` | Enable intermediate response chaining | 524| `spych_kwargs` | `None` | Extra kwargs passed to `Spych` | 525| `spych_wake_kwargs` | `None` | Extra kwargs passed to `SpychWake` | 526 527--- 528 529# Python: Building Your Own Agent 530 531Subclass `BaseResponder`, implement `respond`, and Spych handles the rest: wake word detection, transcription, spinner UI, timing, TTS, error handling. 532 533`respond()` must return an `AgentResponse`. Use `self.format_prompt()` to inject the JSON schema into your prompt and `self.parse_output()` to parse the result: 534 535```python 536from spych.responders import BaseResponder, AgentResponse 537 538class MyResponder(BaseResponder): 539 def respond(self, user_input: str) -> AgentResponse: 540 raw = call_my_llm(self.format_prompt(user_input)) 541 return self.parse_output(raw) 542``` 543 544A complete working example with a custom wake word: 545 546```python 547from spych import Spych, SpychOrchestrator 548from spych.responders import BaseResponder, AgentResponse 549 550class EchoResponder(BaseResponder): 551 def respond(self, user_input: str) -> AgentResponse: 552 return AgentResponse( 553 response=f"'{self.name}' heard: {user_input}", 554 summary=f"Heard: {user_input}", 555 requires_user_feedback=False, 556 ) 557 558SpychOrchestrator( 559 entries=[ 560 { 561 "responder": EchoResponder( 562 spych_object=Spych(whisper_model="base.en"), 563 listen_duration=5, 564 name="TestResponder", 565 ), 566 "wake_words": ["test"], 567 "terminate_words": ["terminate"], 568 } 569 ] 570).start() 571``` 572 573You can also subclass a built-in agent. For example, a translation agent that routes to Ollama: 574 575```python 576from spych import Spych, SpychOrchestrator 577from spych.agents import OllamaResponder 578from spych.responders import AgentResponse 579 580class Spanish(OllamaResponder): 581 def respond(self, user_input: str) -> AgentResponse: 582 user_input = f"Translate the following to Spanish and return only the translated text: '{user_input}'" 583 return super().respond(user_input) 584 585class German(OllamaResponder): 586 def respond(self, user_input: str) -> AgentResponse: 587 user_input = f"Translate the following to German and return only the translated text: '{user_input}'" 588 return super().respond(user_input) 589 590spych_object = Spych(whisper_model="base.en") 591 592SpychOrchestrator( 593 entries=[ 594 { 595 "responder": Spanish(spych_object=spych_object, name="SpanishTranslator", model="llama3.2:latest"), 596 "wake_words": ["spanish"], 597 "terminate_words": ["terminate"], 598 }, 599 { 600 "responder": German(spych_object=spych_object, name="GermanTranslator", model="llama3.2:latest"), 601 "wake_words": ["german"], 602 "terminate_words": ["terminate"], 603 }, 604 ] 605).start() 606``` 607 608Think your agent would be useful to others? Open a PR or file a feature request via a [GitHub issue](https://github.com/connor-makowski/spych/issues). 609 610--- 611 612# Python: Lower-Level API 613 614Need more control? Use `Spych` and `SpychWake` directly. 615 616## Transcription 617 618```python 619from spych import Spych 620 621spych = Spych( 622 whisper_model="base.en", # tiny, small, medium, large — all faster-whisper models work 623 whisper_device="cpu", # use "cuda" for Nvidia GPU 624) 625 626print(spych.listen(duration=5)) 627``` 628 629See: https://connor-makowski.github.io/spych/spych/core.html 630 631## Wake Word Detection 632 633```python 634from spych import SpychWake, Spych 635 636spych = Spych(whisper_model="base.en", whisper_device="cpu") 637 638def on_wake(): 639 print("Wake word detected! Listening...") 640 print(spych.listen(duration=5)) 641 642SpychWake( 643 wake_word_map={"speech": on_wake}, 644 whisper_model="tiny.en", 645 whisper_device="cpu", 646).start() 647``` 648 649See: https://connor-makowski.github.io/spych/spych/wake.html 650 651--- 652 653# API Reference 654 655Full docs including all parameters and methods: https://connor-makowski.github.io/spych/spych.html 656 657--- 658 659# Support 660 661Found a bug or want a new feature? [Open an issue on GitHub](https://github.com/connor-makowski/spych/issues). 662 663--- 664 665# Contributing 666 667Contributions are welcome! 668 6691. Fork the repo and clone it locally. 6702. Make your changes. 6713. Run tests and make sure they pass. 6724. Commit atomically with clear messages. 6735. Submit a pull request. 674 675**Virtual environment setup:** 676```bash 677python3.11 -m venv venv 678source venv/bin/activate 679pip install -r requirements.txt 680./utils/test.sh 681``` 682""" 683 684from .core import Spych 685from .wake import SpychWake 686from .orchestrator import SpychOrchestrator