spych

Spych

PyPI version License: MIT PyPI Downloads

Spych (pronounced "speech"): Talk with your computer like it's your personal assistant without sending your voice to the cloud.

A lightweight, fully offline Python toolkit for wake word detection, audio transcription, spoken AI responses, and AI integrations. Built on faster-whisper, PvRecorder, and Kokoro.

API Docs: https://connor-makowski.github.io/spych/spych.html


Installation

pipx install spych

Alternative: pip

pip install spych

TTS Extras

By default, Spych automatically installs the right TTS backend for your Python version. You can also install explicitly:

pipx install "spych[kokoro]"       # Fast, lightweight (Python < 3.13 recommended)
pipx install "spych[chatterbox]"   # High-quality voice cloning (Python >= 3.13 required)

Quick Start

# Navigate to your project directory first
cd ~/my_project

# Voice-control Claude Code — say "hey claude" to trigger
spych claude

# Use a personality preset — say "hey jarvis" to trigger
spych claude --personality jarvis

# Voice-control a local Ollama model — say "hey llama" to trigger
spych ollama --model llama3.2:latest

💡 Pro tip: Saying "Hey Claude" or "Hey Llama" tends to trigger more reliably than the bare wake word.

Say "terminate" (or press Ctrl+C) to stop any session.


CLI

Available Agents

All agents require their respective CLI tool to be installed and authenticated before use.

Command Alias Description Default wake words
spych claude_code_cli Voice-control Claude Code via the CLI claude, clod, cloud, clawed
spych claude_code_sdk spych claude Voice-control Claude Code via the Agent SDK claude, clod, cloud, clawed
spych codex_cli spych codex Voice-control the OpenAI Codex agent codex
spych gemini_cli spych gemini Voice-control the Google Gemini agent gemini, google
spych opencode_cli spych opencode Voice-control the OpenCode agent opencode, open code
spych ollama Talk to a local Ollama model llama, ollama, lama

Available Utilities

The following utilities are also available as CLI commands. They don't use wake words, but serve various auxiliary functions like live transcription and voice profiling.

Command Description
spych --version Print the version number and exit
spych --help Show detailed usage instructions and exit
spych live Continuous speech-to-text transcription to file
spych multi Run multiple agents simultaneously
spych users Manage user profiles and global settings
spych profile_my_voice Record a voice sample for TTS cloning

Global Flags

These must be placed before the agent name:

spych --theme light claude
Flag Options Default Description
--theme dark, light, solarized, mono dark Terminal colour theme

💡 TUI Dashboard: Spych launches a rich terminal interface by default. Use the --verbose flag (e.g., spych --verbose claude) to switch to a simpler, non-interactive scrollable output.


Common Flags

All agent subcommands accept these flags:

Flag Default Description
--personality NAME Apply a named preset (sets wake words, voice, name, style)
--name NAME (agent default) Custom display name shown in the terminal
--wake-words WORD [...] (agent default) One or more words that trigger the agent
--terminate-words WORD [...] terminate Words that stop the listener
--listen-duration SECONDS 0 (VAD auto) Seconds to record after wake word
--follow-up-listen-duration SECONDS 0 Seconds to listen for a follow-up answer
--inactivity-timeout SECONDS 4.0 Seconds of silence before returning to wake word
--use-speaker BOOL true Speak responses aloud via TTS
--speaker-voice VOICE af_heart Voice name for spoken responses
--speaker-backend BACKEND (auto) chatterbox or kokoro
--response-style STYLE Style preset or custom instruction for spoken output
--intermediate-responses BOOL true Enable intermediate response chaining for long-running tasks

Coding agents (claude, codex, gemini, opencode) also accept:

Flag Default Description
--continue-conversation BOOL true Resume the most recent session
--show-tool-events BOOL true Print live tool start/end events

Agent-specific flags:

Agent Flag Default Description
ollama --model llama3.2:latest Ollama model name
ollama --history-length 10 Past interactions to include in context
ollama --host http://localhost:11434 Ollama instance URL
opencode_cli --model Model in provider/model format
claude_code_sdk --setting-sources user project local Claude Code settings sources

Personalities

Personalities are named presets that bundle a wake word list, voice, display name, and response style into a single flag. Any explicit flag overrides the preset.

spych claude --personality jarvis
# equivalent to:
spych claude --name "JARVIS" --wake-words jarvis jarves              --speaker-voice bm_george --use-speaker true              --response-style jarvis
Name Wake words Voice Style
assistant assistant, helper, computer af_heart assistant — helpful, precise, informative
friend friend, buddy, pal af_amy friendly — warm and simple
jarvis jarvis, jarves, jargus, jervis bm_george jarvis — precise, dry wit, "sir"
pirate blackbeard, pirate, ahoy am_michael pirate — pirate speak, colorful
news_anchor bella, news anchor, anchor af_bella news_anchor — professional broadcast tone
robot rob, robot am_adam robot — monotone, literal
caveman er, ur, caveman, cave man am_onyx caveman — very simple, direct

User Management

Spych supports multiple user profiles, allowing agents to provide more personalized responses based on your name, age, and other context.

# Launch the interactive user management menu
spych users

The users utility allows you to:

  • Create, edit, and delete user profiles.
  • Set a default user for all agents.
  • Change the global terminal theme (dark, light, solarized, mono).

You can also specify a user for a specific session:

spych claude --user Connor

Response Styles

The --response-style flag shapes how the agent formats its spoken output.

Style Description
assistant Helpful and precise, concise and informative
concise Key points only, direct
friendly Warm, approachable, simple language
military Brevity-style, short sentences
five_year_old Simple words, very short
fast As brief as reasonably possible
pirate Pirate speak, colorful
news_anchor Professional broadcast tone
haiku 5-7-5 haiku form
shakespearean Elizabethan English
robot Monotone, literal
caveman Very simple, direct
yoda Inverted sentence structure
jarvis JARVIS from Iron Man — precise, dry wit, addresses user as "sir" or "ma'am"

You can also pass any custom instruction string directly: --response-style "Reply in exactly one sentence.".


Text-to-Speech & Voices

Spoken responses are enabled by default for personality presets and when --use-speaker true is set.

spych claude --use-speaker true --speaker-voice bm_george
spych claude --use-speaker true --speaker-backend kokoro
spych claude --use-speaker false   # disable TTS

When TTS is active, short responses are spoken verbatim; longer ones use the agent's short summary. If the response ends with a question, Spych automatically listens for a follow-up — no wake word required.

TTS Backends

Backend Best for Python support
Chatterbox (default priority) Natural voices, zero-shot voice cloning 3.11+ (required for 3.13+)
Kokoro (lightweight fallback) Fast, low-resource devices (e.g. Raspberry Pi) 3.11–3.12 recommended

Spych tries Chatterbox first, then Kokoro. Use --speaker-backend to force one explicitly.

Available Voices

The same voice names work for both backends.

American English (am_ / af_):

Voice Gender Grade
af_heart F A (default)
af_bella F A-
af_nicole F B-
am_michael M C+
am_fenrir M C+
am_puck M C+

British English (bm_ / bf_):

Voice Gender Grade
bf_emma F B-
bf_isabella F C
bm_george M C

Voice Cloning

Record a 10-second sample of your voice, then use it as the speaker voice. Requires the Chatterbox backend.

# Step 1: record your profile
spych profile_my_voice --name my_voice

# Step 2: use it
spych claude --use-speaker true --speaker-voice my_voice --speaker-backend chatterbox

# Or use any .wav file directly
spych claude --use-speaker true --speaker-voice /path/to/my_voice.wav --speaker-backend chatterbox

Live Transcription

spych live continuously records from the microphone using VAD and writes the transcript to disk in real time. No wake word required — it transcribes everything until stopped.

CLI

spych live                                                 # writes transcript.srt
spych live --output-path meeting --output-format both
spych live --terminate-words "stop recording"
spych live --no-timestamps --whisper-model small.en

Stop by pressing the stop key (default: q + Enter), saying a terminate word, or pressing Ctrl+C.

Parameters

Flag Default Description
--output-path PATH transcript Base output file path without extension
--output-format FORMAT srt txt, srt, or both
--no-timestamps false Omit timestamps from terminal and .txt output
--stop-key KEY q Key (then Enter) to stop the session
--terminate-words WORD [...] Spoken words that stop the session
--device-index N -1 Microphone device index; -1 uses system default
--whisper-model MODEL base.en faster-whisper model name
--whisper-device DEVICE cpu cpu or cuda
--whisper-compute-type TYPE int8 int8, float16, or float32
--no-speech-threshold FLOAT 0.3 Whisper segments above this no_speech_prob are dropped
--speech-threshold FLOAT 0.5 VAD speech onset probability
--silence-threshold FLOAT 0.35 VAD silence probability during speech
--silence-frames N 20 Consecutive silent frames to end a segment (~32ms each)
--speech-pad-frames N 5 Pre-roll frames and onset confirmation count
--max-speech-duration SECONDS 30.0 Hard cap on a single segment
--context-words N 32 Trailing words passed as whisper initial_prompt

Python

from spych.live import SpychLive

SpychLive(
    output_format="srt",          # "txt", "srt", or "both"
    output_path="my_transcript",  # written to my_transcript.srt
    show_timestamps=True,
    stop_key="q",                 # type q + Enter to stop
    terminate_words=["stop recording"],
).start()

SpychLive Parameters

Parameter Default Description
output_format "srt" Output format(s): "txt", "srt", or "both"
output_path "transcript" Base path without extension
show_timestamps True Prepend [HH:MM:SS] timestamps to terminal and .txt output
stop_key "q" Key (then Enter) to stop the session
terminate_words None Spoken words that stop the session
on_terminate None No-argument callback executed when a terminate word fires
device_index -1 Microphone device index; -1 uses system default
whisper_model "base.en" faster-whisper model name
whisper_device "cpu" Device for inference: "cpu" or "cuda"
whisper_compute_type "int8" Compute precision: "int8", "float16", or "float32"
no_speech_threshold 0.4 Whisper segments above this are discarded
speech_threshold 0.5 Silero VAD onset probability
silence_threshold 0.35 Silero VAD silence probability during speech
silence_frames_threshold 20 Consecutive silent frames to close a segment
speech_pad_frames 5 Pre-roll frame count and onset confirmation threshold
max_speech_duration_s 30.0 Hard cap on a single segment in seconds
context_words 32 Trailing transcript words passed as initial_prompt

Multi-agent

Run several agents simultaneously under a single listener, each bound to its own wake words. Say "hey claude" to talk to Claude, "hey llama" to talk to Ollama — all in the same terminal session.

CLI

# Two agents, default wake words
spych multi --agents claude gemini

# Include Ollama with a specific model
spych multi --agents claude ollama --ollama-model llama3.2:latest

# Tune listen duration across all agents
spych multi --agents claude codex --listen-duration 8

Multi-agent CLI Flags

Flag Default Description
--agents AGENT [...] (required) Agents to run: claude (claude_code_cli), claude_sdk (claude_code_sdk), codex (codex_cli), gemini (gemini_cli), opencode (opencode_cli), ollama
--terminate-words WORD [...] terminate Words that stop all agents
--listen-duration SECONDS 5 Seconds to listen after a wake word
--follow-up-listen-duration SECONDS 0 Seconds to listen for follow-up answers
--inactivity-timeout SECONDS 4.0 Seconds of silence before returning to wake word
--continue-conversation BOOL true Resume the most recent session for each coding agent
--show-tool-events BOOL true Print live tool start/end events
--use-speaker BOOL true Speak responses aloud via TTS
--speaker-backend BACKEND (auto) chatterbox or kokoro
--intermediate-responses BOOL true Enable intermediate response chaining for long-running tasks
--ollama-model MODEL llama3.2:latest Only used when ollama is in --agents
--ollama-host URL http://localhost:11434 Only used when ollama is in --agents
--ollama-history-length N 10 Only used when ollama is in --agents
--opencode-model MODEL provider/model format. Only used when opencode_cli is in --agents
--setting-sources SOURCE [...] user project local Only used when claude_code_sdk is in --agents

Python

from spych.core import Spych
from spych.orchestrator import SpychOrchestrator
from spych.agents.claude import LocalClaudeCodeCLIResponder
from spych.agents.ollama import OllamaResponder

spych_object = Spych(whisper_model="base.en")

SpychOrchestrator(
    entries=[
        {
            "responder": LocalClaudeCodeCLIResponder(spych_object=spych_object),
            "wake_words": ["claude", "clod", "cloud", "clawed"],
            "terminate_words": ["terminate"],
        },
        {
            "responder": OllamaResponder(spych_object=spych_object, model="llama3.2:latest"),
            "wake_words": ["llama", "ollama", "lama"],
        },
    ]
).start()

OrchestratorEntry Keys

Key Required Default Description
responder A BaseResponder instance
wake_words Words that trigger this responder. Must be unique across all entries
terminate_words ["terminate"] Words that stop the entire orchestrator

SpychOrchestrator Parameters

Parameter Default Description
entries (required) List of OrchestratorEntry dicts
spych_wake_kwargs None Extra kwargs forwarded to SpychWake

Python — Built-in Agents

The same agents available from the CLI can be used directly from Python.

Claude Code CLI

from spych.agents import claude_code_cli

# Say "hey claude" to trigger
claude_code_cli()

Claude Code SDK

from spych.agents import claude_code_sdk

# Say "hey claude" to trigger
claude_code_sdk()

Codex CLI

from spych.agents import codex_cli

# Say "hey codex" to trigger
codex_cli()

Gemini CLI

from spych.agents import gemini_cli

# Say "hey gemini" to trigger
gemini_cli()

OpenCode CLI

from spych.agents import opencode_cli

# Say "hey opencode" to trigger
opencode_cli()

Ollama

from spych.agents import ollama

# Pull the model first: ollama pull llama3.2:latest
# Say "hey llama" to trigger
ollama(model="llama3.2:latest")

Coding Agent Parameters

Parameter claude_code_cli claude_code_sdk codex_cli gemini_cli opencode_cli Description
name Claude Claude Codex Gemini OpenCode Custom display name
wake_words ["claude", "clod", "cloud", "clawed"] ["claude", "clod", "cloud", "clawed"] ["codex"] ["gemini", "google"] ["opencode", "open code"] Words that trigger the agent
terminate_words ["terminate"] ["terminate"] ["terminate"] ["terminate"] ["terminate"] Words that stop the listener
model None Model in provider/model format
listen_duration 0 0 0 0 0 Seconds to listen (0 = VAD auto)
continue_conversation True True True True True Resume the most recent session
setting_sources ["user", "project", "local"] Claude Code settings sources
show_tool_events True True True True True Print live tool start/end events
use_speaker False False False False False Speak responses aloud via TTS
speaker_voice "af_heart" "af_heart" "af_heart" "af_heart" "af_heart" Voice name for TTS
response_style "" "" "" "" "" Style preset or custom instruction
allow_intermediate_responses True True True True True Enable intermediate response chaining
spych_kwargs Extra kwargs passed to Spych
spych_wake_kwargs Extra kwargs passed to SpychWake

Ollama Parameters

Parameter Default Description
name "Ollama" Custom display name
wake_words ["llama", "ollama", "lama"] Words that trigger the agent
terminate_words ["terminate"] Words that stop the listener
model "llama3.2:latest" Ollama model name
listen_duration 0 Seconds to listen (0 = VAD auto)
history_length 10 Past interactions to include in context
host "http://localhost:11434" Ollama instance URL
use_speaker False Speak responses aloud via TTS
speaker_voice "af_heart" Voice name for TTS
response_style "" Style preset or custom instruction
allow_intermediate_responses True Enable intermediate response chaining
spych_kwargs None Extra kwargs passed to Spych
spych_wake_kwargs None Extra kwargs passed to SpychWake

Python: Building Your Own Agent

Subclass BaseResponder, implement respond, and Spych handles the rest: wake word detection, transcription, spinner UI, timing, TTS, error handling.

respond() must return an AgentResponse. Use self.format_prompt() to inject the JSON schema into your prompt and self.parse_output() to parse the result:

from spych.responders import BaseResponder, AgentResponse

class MyResponder(BaseResponder):
    def respond(self, user_input: str) -> AgentResponse:
        raw = call_my_llm(self.format_prompt(user_input))
        return self.parse_output(raw)

A complete working example with a custom wake word:

from spych import Spych, SpychOrchestrator
from spych.responders import BaseResponder, AgentResponse

class EchoResponder(BaseResponder):
    def respond(self, user_input: str) -> AgentResponse:
        return AgentResponse(
            response=f"'{self.name}' heard: {user_input}",
            summary=f"Heard: {user_input}",
            requires_user_feedback=False,
        )

SpychOrchestrator(
    entries=[
        {
            "responder": EchoResponder(
                spych_object=Spych(whisper_model="base.en"),
                listen_duration=5,
                name="TestResponder",
            ),
            "wake_words": ["test"],
            "terminate_words": ["terminate"],
        }
    ]
).start()

You can also subclass a built-in agent. For example, a translation agent that routes to Ollama:

from spych import Spych, SpychOrchestrator
from spych.agents import OllamaResponder
from spych.responders import AgentResponse

class Spanish(OllamaResponder):
    def respond(self, user_input: str) -> AgentResponse:
        user_input = f"Translate the following to Spanish and return only the translated text: '{user_input}'"
        return super().respond(user_input)

class German(OllamaResponder):
    def respond(self, user_input: str) -> AgentResponse:
        user_input = f"Translate the following to German and return only the translated text: '{user_input}'"
        return super().respond(user_input)

spych_object = Spych(whisper_model="base.en")

SpychOrchestrator(
    entries=[
        {
            "responder": Spanish(spych_object=spych_object, name="SpanishTranslator", model="llama3.2:latest"),
            "wake_words": ["spanish"],
            "terminate_words": ["terminate"],
        },
        {
            "responder": German(spych_object=spych_object, name="GermanTranslator", model="llama3.2:latest"),
            "wake_words": ["german"],
            "terminate_words": ["terminate"],
        },
    ]
).start()

Think your agent would be useful to others? Open a PR or file a feature request via a GitHub issue.


Python: Lower-Level API

Need more control? Use Spych and SpychWake directly.

Transcription

from spych import Spych

spych = Spych(
    whisper_model="base.en",  # tiny, small, medium, large — all faster-whisper models work
    whisper_device="cpu",     # use "cuda" for Nvidia GPU
)

print(spych.listen(duration=5))

See: https://connor-makowski.github.io/spych/spych/core.html

Wake Word Detection

from spych import SpychWake, Spych

spych = Spych(whisper_model="base.en", whisper_device="cpu")

def on_wake():
    print("Wake word detected! Listening...")
    print(spych.listen(duration=5))

SpychWake(
    wake_word_map={"speech": on_wake},
    whisper_model="tiny.en",
    whisper_device="cpu",
).start()

See: https://connor-makowski.github.io/spych/spych/wake.html


API Reference

Full docs including all parameters and methods: https://connor-makowski.github.io/spych/spych.html


Support

Found a bug or want a new feature? Open an issue on GitHub.


Contributing

Contributions are welcome!

  1. Fork the repo and clone it locally.
  2. Make your changes.
  3. Run tests and make sure they pass.
  4. Commit atomically with clear messages.
  5. Submit a pull request.

Virtual environment setup:

python3.11 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
./utils/test.sh
  1"""
  2# Spych
  3[![PyPI version](https://badge.fury.io/py/spych.svg)](https://badge.fury.io/py/spych)
  4[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
  5[![PyPI Downloads](https://img.shields.io/pypi/dm/spych.svg?label=PyPI%20downloads)](https://pypi.org/project/spych/)
  6
  7**Spych** (pronounced "speech"): Talk with your computer like it's your personal assistant without sending your voice to the cloud.
  8
  9A lightweight, fully offline Python toolkit for wake word detection, audio transcription, spoken AI responses, and AI integrations. Built on [faster-whisper](https://github.com/SYSTRAN/faster-whisper), [PvRecorder](https://github.com/Picovoice/pvrecorder), and [Kokoro](https://github.com/hexgrad/kokoro).
 10
 11**API Docs**: https://connor-makowski.github.io/spych/spych.html
 12
 13---
 14
 15# Installation
 16
 17### Recommended: pipx
 18
 19```bash
 20pipx install spych
 21```
 22
 23### Alternative: pip
 24
 25```bash
 26pip install spych
 27```
 28
 29### TTS Extras
 30
 31By default, Spych automatically installs the right TTS backend for your Python version. You can also install explicitly:
 32
 33```bash
 34pipx install "spych[kokoro]"       # Fast, lightweight (Python < 3.13 recommended)
 35pipx install "spych[chatterbox]"   # High-quality voice cloning (Python >= 3.13 required)
 36```
 37
 38---
 39
 40# Quick Start
 41
 42```bash
 43# Navigate to your project directory first
 44cd ~/my_project
 45
 46# Voice-control Claude Code — say "hey claude" to trigger
 47spych claude
 48
 49# Use a personality preset — say "hey jarvis" to trigger
 50spych claude --personality jarvis
 51
 52# Voice-control a local Ollama model — say "hey llama" to trigger
 53spych ollama --model llama3.2:latest
 54```
 55
 56> 💡 **Pro tip:** Saying "Hey Claude" or "Hey Llama" tends to trigger more reliably than the bare wake word.
 57
 58Say **"terminate"** (or press `Ctrl+C`) to stop any session.
 59
 60---
 61
 62# CLI
 63
 64## Available Agents
 65
 66All agents require their respective CLI tool to be installed and authenticated before use.
 67
 68| Command | Alias | Description | Default wake words |
 69|---|---|---|---|
 70| `spych claude_code_cli` | — | Voice-control Claude Code via the CLI | `claude`, `clod`, `cloud`, `clawed` |
 71| `spych claude_code_sdk` | `spych claude` | Voice-control Claude Code via the Agent SDK | `claude`, `clod`, `cloud`, `clawed` |
 72| `spych codex_cli` | `spych codex` | Voice-control the OpenAI Codex agent | `codex` |
 73| `spych gemini_cli` | `spych gemini` | Voice-control the Google Gemini agent | `gemini`, `google` |
 74| `spych opencode_cli` | `spych opencode` | Voice-control the OpenCode agent | `opencode`, `open code` |
 75| `spych ollama` | — | Talk to a local Ollama model | `llama`, `ollama`, `lama` |
 76
 77## Available Utilities
 78
 79The following utilities are also available as CLI commands. They don't use wake words, but serve various auxiliary functions like live transcription and voice profiling.
 80
 81| Command | Description |
 82|---|---|
 83| `spych --version` | Print the version number and exit |
 84| `spych --help` | Show detailed usage instructions and exit |
 85| `spych live` | Continuous speech-to-text transcription to file |
 86| `spych multi` | Run multiple agents simultaneously |
 87| `spych users` | Manage user profiles and global settings |
 88| `spych profile_my_voice` | Record a voice sample for TTS cloning |
 89
 90## Global Flags
 91
 92These must be placed **before** the agent name:
 93
 94```bash
 95spych --theme light claude
 96```
 97
 98| Flag | Options | Default | Description |
 99|---|---|---|---|
100| `--theme` | `dark`, `light`, `solarized`, `mono` | `dark` | Terminal colour theme |
101
102> 💡 **TUI Dashboard:** Spych launches a rich terminal interface by default. Use the `--verbose` flag (e.g., `spych --verbose claude`) to switch to a simpler, non-interactive scrollable output.
103
104---
105
106## Common Flags
107
108All agent subcommands accept these flags:
109
110| Flag | Default | Description |
111|---|---|---|
112| `--personality NAME` | — | Apply a named preset (sets wake words, voice, name, style) |
113| `--name NAME` | *(agent default)* | Custom display name shown in the terminal |
114| `--wake-words WORD [...]` | *(agent default)* | One or more words that trigger the agent |
115| `--terminate-words WORD [...]` | `terminate` | Words that stop the listener |
116| `--listen-duration SECONDS` | `0` (VAD auto) | Seconds to record after wake word |
117| `--follow-up-listen-duration SECONDS` | `0` | Seconds to listen for a follow-up answer |
118| `--inactivity-timeout SECONDS` | `4.0` | Seconds of silence before returning to wake word |
119| `--use-speaker BOOL` | `true` | Speak responses aloud via TTS |
120| `--speaker-voice VOICE` | `af_heart` | Voice name for spoken responses |
121| `--speaker-backend BACKEND` | *(auto)* | `chatterbox` or `kokoro` |
122| `--response-style STYLE` | — | Style preset or custom instruction for spoken output |
123| `--intermediate-responses BOOL` | `true` | Enable intermediate response chaining for long-running tasks |
124
125Coding agents (`claude`, `codex`, `gemini`, `opencode`) also accept:
126
127| Flag | Default | Description |
128|---|---|---|
129| `--continue-conversation BOOL` | `true` | Resume the most recent session |
130| `--show-tool-events BOOL` | `true` | Print live tool start/end events |
131
132Agent-specific flags:
133
134| Agent | Flag | Default | Description |
135|---|---|---|---|
136| `ollama` | `--model` | `llama3.2:latest` | Ollama model name |
137| `ollama` | `--history-length` | `10` | Past interactions to include in context |
138| `ollama` | `--host` | `http://localhost:11434` | Ollama instance URL |
139| `opencode_cli` | `--model` | — | Model in `provider/model` format |
140| `claude_code_sdk` | `--setting-sources` | `user project local` | Claude Code settings sources |
141
142---
143
144# Personalities
145
146Personalities are named presets that bundle a wake word list, voice, display name, and response style into a single flag. Any explicit flag overrides the preset.
147
148```bash
149spych claude --personality jarvis
150# equivalent to:
151spych claude --name "JARVIS" --wake-words jarvis jarves \
152             --speaker-voice bm_george --use-speaker true \
153             --response-style jarvis
154```
155
156| Name | Wake words | Voice | Style |
157|---|---|---|---|
158| `assistant` | `assistant`, `helper`, `computer` | `af_heart` | `assistant` — helpful, precise, informative |
159| `friend` | `friend`, `buddy`, `pal` | `af_amy` | `friendly` — warm and simple |
160| `jarvis` | `jarvis`, `jarves`, `jargus`, `jervis` | `bm_george` | `jarvis` — precise, dry wit, "sir" |
161| `pirate` | `blackbeard`, `pirate`, `ahoy` | `am_michael` | `pirate` — pirate speak, colorful |
162| `news_anchor` | `bella`, `news anchor`, `anchor` | `af_bella` | `news_anchor` — professional broadcast tone |
163| `robot` | `rob`, `robot` | `am_adam` | `robot` — monotone, literal |
164| `caveman` | `er`, `ur`, `caveman`, `cave man` | `am_onyx` | `caveman` — very simple, direct |
165
166---
167
168# User Management
169
170Spych supports multiple user profiles, allowing agents to provide more personalized responses based on your name, age, and other context.
171
172```bash
173# Launch the interactive user management menu
174spych users
175```
176
177The `users` utility allows you to:
178- Create, edit, and delete user profiles.
179- Set a default user for all agents.
180- Change the global terminal theme (`dark`, `light`, `solarized`, `mono`).
181
182You can also specify a user for a specific session:
183```bash
184spych claude --user Connor
185```
186
187---
188
189# Response Styles
190
191The `--response-style` flag shapes how the agent formats its spoken output.
192
193| Style | Description |
194|---|---|
195| `assistant` | Helpful and precise, concise and informative |
196| `concise` | Key points only, direct |
197| `friendly` | Warm, approachable, simple language |
198| `military` | Brevity-style, short sentences |
199| `five_year_old` | Simple words, very short |
200| `fast` | As brief as reasonably possible |
201| `pirate` | Pirate speak, colorful |
202| `news_anchor` | Professional broadcast tone |
203| `haiku` | 5-7-5 haiku form |
204| `shakespearean` | Elizabethan English |
205| `robot` | Monotone, literal |
206| `caveman` | Very simple, direct |
207| `yoda` | Inverted sentence structure |
208| `jarvis` | JARVIS from Iron Man — precise, dry wit, addresses user as "sir" or "ma'am" |
209
210You can also pass any custom instruction string directly: `--response-style "Reply in exactly one sentence."`.
211
212---
213
214# Text-to-Speech & Voices
215
216Spoken responses are enabled by default for personality presets and when `--use-speaker true` is set.
217
218```bash
219spych claude --use-speaker true --speaker-voice bm_george
220spych claude --use-speaker true --speaker-backend kokoro
221spych claude --use-speaker false   # disable TTS
222```
223
224When TTS is active, short responses are spoken verbatim; longer ones use the agent's short `summary`. If the response ends with a question, Spych automatically listens for a follow-up — no wake word required.
225
226### TTS Backends
227
228| Backend | Best for | Python support |
229|---|---|---|
230| **Chatterbox** (default priority) | Natural voices, zero-shot voice cloning | 3.11+ (required for 3.13+) |
231| **Kokoro** (lightweight fallback) | Fast, low-resource devices (e.g. Raspberry Pi) | 3.11–3.12 recommended |
232
233Spych tries Chatterbox first, then Kokoro. Use `--speaker-backend` to force one explicitly.
234
235### Available Voices
236
237The same voice names work for both backends.
238
239- Chatterbox wave voices: https://github.com/connor-makowski/spych/tree/main/voices/wave
240- Kokoro pt voices (56 total): https://github.com/connor-makowski/spych/tree/main/voices/pt
241
242American English (`am_` / `af_`):
243
244| Voice | Gender | Grade |
245|---|---|---|
246| `af_heart` | F | A (default) |
247| `af_bella` | F | A- |
248| `af_nicole` | F | B- |
249| `am_michael` | M | C+ |
250| `am_fenrir` | M | C+ |
251| `am_puck` | M | C+ |
252
253British English (`bm_` / `bf_`):
254
255| Voice | Gender | Grade |
256|---|---|---|
257| `bf_emma` | F | B- |
258| `bf_isabella` | F | C |
259| `bm_george` | M | C |
260
261### Voice Cloning
262
263Record a 10-second sample of your voice, then use it as the speaker voice. Requires the Chatterbox backend.
264
265```bash
266# Step 1: record your profile
267spych profile_my_voice --name my_voice
268
269# Step 2: use it
270spych claude --use-speaker true --speaker-voice my_voice --speaker-backend chatterbox
271
272# Or use any .wav file directly
273spych claude --use-speaker true --speaker-voice /path/to/my_voice.wav --speaker-backend chatterbox
274```
275
276---
277
278# Live Transcription
279
280`spych live` continuously records from the microphone using VAD and writes the transcript to disk in real time. No wake word required — it transcribes everything until stopped.
281
282## CLI
283
284```bash
285spych live                                                 # writes transcript.srt
286spych live --output-path meeting --output-format both
287spych live --terminate-words "stop recording"
288spych live --no-timestamps --whisper-model small.en
289```
290
291Stop by pressing the stop key (default: `q` + Enter), saying a terminate word, or pressing `Ctrl+C`.
292
293### Parameters
294
295| Flag | Default | Description |
296|---|---|---|
297| `--output-path PATH` | `transcript` | Base output file path without extension |
298| `--output-format FORMAT` | `srt` | `txt`, `srt`, or `both` |
299| `--no-timestamps` | false | Omit timestamps from terminal and `.txt` output |
300| `--stop-key KEY` | `q` | Key (then Enter) to stop the session |
301| `--terminate-words WORD [...]` | — | Spoken words that stop the session |
302| `--device-index N` | `-1` | Microphone device index; -1 uses system default |
303| `--whisper-model MODEL` | `base.en` | faster-whisper model name |
304| `--whisper-device DEVICE` | `cpu` | `cpu` or `cuda` |
305| `--whisper-compute-type TYPE` | `int8` | `int8`, `float16`, or `float32` |
306| `--no-speech-threshold FLOAT` | `0.3` | Whisper segments above this `no_speech_prob` are dropped |
307| `--speech-threshold FLOAT` | `0.5` | VAD speech onset probability |
308| `--silence-threshold FLOAT` | `0.35` | VAD silence probability during speech |
309| `--silence-frames N` | `20` | Consecutive silent frames to end a segment (~32ms each) |
310| `--speech-pad-frames N` | `5` | Pre-roll frames and onset confirmation count |
311| `--max-speech-duration SECONDS` | `30.0` | Hard cap on a single segment |
312| `--context-words N` | `32` | Trailing words passed as whisper `initial_prompt` |
313
314## Python
315
316```python
317from spych.live import SpychLive
318
319SpychLive(
320    output_format="srt",          # "txt", "srt", or "both"
321    output_path="my_transcript",  # written to my_transcript.srt
322    show_timestamps=True,
323    stop_key="q",                 # type q + Enter to stop
324    terminate_words=["stop recording"],
325).start()
326```
327
328### `SpychLive` Parameters
329
330| Parameter | Default | Description |
331|---|---|---|
332| `output_format` | `"srt"` | Output format(s): `"txt"`, `"srt"`, or `"both"` |
333| `output_path` | `"transcript"` | Base path without extension |
334| `show_timestamps` | `True` | Prepend `[HH:MM:SS]` timestamps to terminal and `.txt` output |
335| `stop_key` | `"q"` | Key (then Enter) to stop the session |
336| `terminate_words` | `None` | Spoken words that stop the session |
337| `on_terminate` | `None` | No-argument callback executed when a terminate word fires |
338| `device_index` | `-1` | Microphone device index; `-1` uses system default |
339| `whisper_model` | `"base.en"` | faster-whisper model name |
340| `whisper_device` | `"cpu"` | Device for inference: `"cpu"` or `"cuda"` |
341| `whisper_compute_type` | `"int8"` | Compute precision: `"int8"`, `"float16"`, or `"float32"` |
342| `no_speech_threshold` | `0.4` | Whisper segments above this are discarded |
343| `speech_threshold` | `0.5` | Silero VAD onset probability |
344| `silence_threshold` | `0.35` | Silero VAD silence probability during speech |
345| `silence_frames_threshold` | `20` | Consecutive silent frames to close a segment |
346| `speech_pad_frames` | `5` | Pre-roll frame count and onset confirmation threshold |
347| `max_speech_duration_s` | `30.0` | Hard cap on a single segment in seconds |
348| `context_words` | `32` | Trailing transcript words passed as `initial_prompt` |
349
350---
351
352# Multi-agent
353
354Run several agents simultaneously under a single listener, each bound to its own wake words. Say "hey claude" to talk to Claude, "hey llama" to talk to Ollama — all in the same terminal session.
355
356## CLI
357
358```bash
359# Two agents, default wake words
360spych multi --agents claude gemini
361
362# Include Ollama with a specific model
363spych multi --agents claude ollama --ollama-model llama3.2:latest
364
365# Tune listen duration across all agents
366spych multi --agents claude codex --listen-duration 8
367```
368
369### Multi-agent CLI Flags
370
371| Flag | Default | Description |
372|---|---|---|
373| `--agents AGENT [...]` | *(required)* | Agents to run: `claude` (`claude_code_cli`), `claude_sdk` (`claude_code_sdk`), `codex` (`codex_cli`), `gemini` (`gemini_cli`), `opencode` (`opencode_cli`), `ollama` |
374| `--terminate-words WORD [...]` | `terminate` | Words that stop all agents |
375| `--listen-duration SECONDS` | `5` | Seconds to listen after a wake word |
376| `--follow-up-listen-duration SECONDS` | `0` | Seconds to listen for follow-up answers |
377| `--inactivity-timeout SECONDS` | `4.0` | Seconds of silence before returning to wake word |
378| `--continue-conversation BOOL` | `true` | Resume the most recent session for each coding agent |
379| `--show-tool-events BOOL` | `true` | Print live tool start/end events |
380| `--use-speaker BOOL` | `true` | Speak responses aloud via TTS |
381| `--speaker-backend BACKEND` | *(auto)* | `chatterbox` or `kokoro` |
382| `--intermediate-responses BOOL` | `true` | Enable intermediate response chaining for long-running tasks |
383| `--ollama-model MODEL` | `llama3.2:latest` | Only used when `ollama` is in `--agents` |
384| `--ollama-host URL` | `http://localhost:11434` | Only used when `ollama` is in `--agents` |
385| `--ollama-history-length N` | `10` | Only used when `ollama` is in `--agents` |
386| `--opencode-model MODEL` | — | `provider/model` format. Only used when `opencode_cli` is in `--agents` |
387| `--setting-sources SOURCE [...]` | `user project local` | Only used when `claude_code_sdk` is in `--agents` |
388
389## Python
390
391```python
392from spych.core import Spych
393from spych.orchestrator import SpychOrchestrator
394from spych.agents.claude import LocalClaudeCodeCLIResponder
395from spych.agents.ollama import OllamaResponder
396
397spych_object = Spych(whisper_model="base.en")
398
399SpychOrchestrator(
400    entries=[
401        {
402            "responder": LocalClaudeCodeCLIResponder(spych_object=spych_object),
403            "wake_words": ["claude", "clod", "cloud", "clawed"],
404            "terminate_words": ["terminate"],
405        },
406        {
407            "responder": OllamaResponder(spych_object=spych_object, model="llama3.2:latest"),
408            "wake_words": ["llama", "ollama", "lama"],
409        },
410    ]
411).start()
412```
413
414### `OrchestratorEntry` Keys
415
416| Key | Required | Default | Description |
417|---|---|---|---|
418| `responder` | ✓ | — | A `BaseResponder` instance |
419| `wake_words` | ✓ | — | Words that trigger this responder. Must be unique across all entries |
420| `terminate_words` | | `["terminate"]` | Words that stop the entire orchestrator |
421
422### `SpychOrchestrator` Parameters
423
424| Parameter | Default | Description |
425|---|---|---|
426| `entries` | *(required)* | List of `OrchestratorEntry` dicts |
427| `spych_wake_kwargs` | `None` | Extra kwargs forwarded to `SpychWake` |
428
429---
430
431# Python — Built-in Agents
432
433The same agents available from the CLI can be used directly from Python.
434
435## Claude Code CLI
436
437```python
438from spych.agents import claude_code_cli
439
440# Say "hey claude" to trigger
441claude_code_cli()
442```
443
444## Claude Code SDK
445
446```python
447from spych.agents import claude_code_sdk
448
449# Say "hey claude" to trigger
450claude_code_sdk()
451```
452
453## Codex CLI
454
455```python
456from spych.agents import codex_cli
457
458# Say "hey codex" to trigger
459codex_cli()
460```
461
462## Gemini CLI
463
464```python
465from spych.agents import gemini_cli
466
467# Say "hey gemini" to trigger
468gemini_cli()
469```
470
471## OpenCode CLI
472
473```python
474from spych.agents import opencode_cli
475
476# Say "hey opencode" to trigger
477opencode_cli()
478```
479
480## Ollama
481
482```python
483from spych.agents import ollama
484
485# Pull the model first: ollama pull llama3.2:latest
486# Say "hey llama" to trigger
487ollama(model="llama3.2:latest")
488```
489
490### Coding Agent Parameters
491
492| Parameter | `claude_code_cli` | `claude_code_sdk` | `codex_cli` | `gemini_cli` | `opencode_cli` | Description |
493|---|---|---|---|---|---|---|
494| `name` | `Claude` | `Claude` | `Codex` | `Gemini` | `OpenCode` | Custom display name |
495| `wake_words` | `["claude", "clod", "cloud", "clawed"]` | `["claude", "clod", "cloud", "clawed"]` | `["codex"]` | `["gemini", "google"]` | `["opencode", "open code"]` | Words that trigger the agent |
496| `terminate_words` | `["terminate"]` | `["terminate"]` | `["terminate"]` | `["terminate"]` | `["terminate"]` | Words that stop the listener |
497| `model` | — | — | — | — | `None` | Model in `provider/model` format |
498| `listen_duration` | `0` | `0` | `0` | `0` | `0` | Seconds to listen (0 = VAD auto) |
499| `continue_conversation` | `True` | `True` | `True` | `True` | `True` | Resume the most recent session |
500| `setting_sources` | — | `["user", "project", "local"]` | — | — | — | Claude Code settings sources |
501| `show_tool_events` | `True` | `True` | `True` | `True` | `True` | Print live tool start/end events |
502| `use_speaker` | `False` | `False` | `False` | `False` | `False` | Speak responses aloud via TTS |
503| `speaker_voice` | `"af_heart"` | `"af_heart"` | `"af_heart"` | `"af_heart"` | `"af_heart"` | Voice name for TTS |
504| `response_style` | `""` | `""` | `""` | `""` | `""` | Style preset or custom instruction |
505| `allow_intermediate_responses` | `True` | `True` | `True` | `True` | `True` | Enable intermediate response chaining |
506| `spych_kwargs` | — | — | — | — | — | Extra kwargs passed to `Spych` |
507| `spych_wake_kwargs` | — | — | — | — | — | Extra kwargs passed to `SpychWake` |
508
509### Ollama Parameters
510
511| Parameter | Default | Description |
512|---|---|---|
513| `name` | `"Ollama"` | Custom display name |
514| `wake_words` | `["llama", "ollama", "lama"]` | Words that trigger the agent |
515| `terminate_words` | `["terminate"]` | Words that stop the listener |
516| `model` | `"llama3.2:latest"` | Ollama model name |
517| `listen_duration` | `0` | Seconds to listen (0 = VAD auto) |
518| `history_length` | `10` | Past interactions to include in context |
519| `host` | `"http://localhost:11434"` | Ollama instance URL |
520| `use_speaker` | `False` | Speak responses aloud via TTS |
521| `speaker_voice` | `"af_heart"` | Voice name for TTS |
522| `response_style` | `""` | Style preset or custom instruction |
523| `allow_intermediate_responses` | `True` | Enable intermediate response chaining |
524| `spych_kwargs` | `None` | Extra kwargs passed to `Spych` |
525| `spych_wake_kwargs` | `None` | Extra kwargs passed to `SpychWake` |
526
527---
528
529# Python: Building Your Own Agent
530
531Subclass `BaseResponder`, implement `respond`, and Spych handles the rest: wake word detection, transcription, spinner UI, timing, TTS, error handling.
532
533`respond()` must return an `AgentResponse`. Use `self.format_prompt()` to inject the JSON schema into your prompt and `self.parse_output()` to parse the result:
534
535```python
536from spych.responders import BaseResponder, AgentResponse
537
538class MyResponder(BaseResponder):
539    def respond(self, user_input: str) -> AgentResponse:
540        raw = call_my_llm(self.format_prompt(user_input))
541        return self.parse_output(raw)
542```
543
544A complete working example with a custom wake word:
545
546```python
547from spych import Spych, SpychOrchestrator
548from spych.responders import BaseResponder, AgentResponse
549
550class EchoResponder(BaseResponder):
551    def respond(self, user_input: str) -> AgentResponse:
552        return AgentResponse(
553            response=f"'{self.name}' heard: {user_input}",
554            summary=f"Heard: {user_input}",
555            requires_user_feedback=False,
556        )
557
558SpychOrchestrator(
559    entries=[
560        {
561            "responder": EchoResponder(
562                spych_object=Spych(whisper_model="base.en"),
563                listen_duration=5,
564                name="TestResponder",
565            ),
566            "wake_words": ["test"],
567            "terminate_words": ["terminate"],
568        }
569    ]
570).start()
571```
572
573You can also subclass a built-in agent. For example, a translation agent that routes to Ollama:
574
575```python
576from spych import Spych, SpychOrchestrator
577from spych.agents import OllamaResponder
578from spych.responders import AgentResponse
579
580class Spanish(OllamaResponder):
581    def respond(self, user_input: str) -> AgentResponse:
582        user_input = f"Translate the following to Spanish and return only the translated text: '{user_input}'"
583        return super().respond(user_input)
584
585class German(OllamaResponder):
586    def respond(self, user_input: str) -> AgentResponse:
587        user_input = f"Translate the following to German and return only the translated text: '{user_input}'"
588        return super().respond(user_input)
589
590spych_object = Spych(whisper_model="base.en")
591
592SpychOrchestrator(
593    entries=[
594        {
595            "responder": Spanish(spych_object=spych_object, name="SpanishTranslator", model="llama3.2:latest"),
596            "wake_words": ["spanish"],
597            "terminate_words": ["terminate"],
598        },
599        {
600            "responder": German(spych_object=spych_object, name="GermanTranslator", model="llama3.2:latest"),
601            "wake_words": ["german"],
602            "terminate_words": ["terminate"],
603        },
604    ]
605).start()
606```
607
608Think your agent would be useful to others? Open a PR or file a feature request via a [GitHub issue](https://github.com/connor-makowski/spych/issues).
609
610---
611
612# Python: Lower-Level API
613
614Need more control? Use `Spych` and `SpychWake` directly.
615
616## Transcription
617
618```python
619from spych import Spych
620
621spych = Spych(
622    whisper_model="base.en",  # tiny, small, medium, large — all faster-whisper models work
623    whisper_device="cpu",     # use "cuda" for Nvidia GPU
624)
625
626print(spych.listen(duration=5))
627```
628
629See: https://connor-makowski.github.io/spych/spych/core.html
630
631## Wake Word Detection
632
633```python
634from spych import SpychWake, Spych
635
636spych = Spych(whisper_model="base.en", whisper_device="cpu")
637
638def on_wake():
639    print("Wake word detected! Listening...")
640    print(spych.listen(duration=5))
641
642SpychWake(
643    wake_word_map={"speech": on_wake},
644    whisper_model="tiny.en",
645    whisper_device="cpu",
646).start()
647```
648
649See: https://connor-makowski.github.io/spych/spych/wake.html
650
651---
652
653# API Reference
654
655Full docs including all parameters and methods: https://connor-makowski.github.io/spych/spych.html
656
657---
658
659# Support
660
661Found a bug or want a new feature? [Open an issue on GitHub](https://github.com/connor-makowski/spych/issues).
662
663---
664
665# Contributing
666
667Contributions are welcome!
668
6691. Fork the repo and clone it locally.
6702. Make your changes.
6713. Run tests and make sure they pass.
6724. Commit atomically with clear messages.
6735. Submit a pull request.
674
675**Virtual environment setup:**
676```bash
677python3.11 -m venv venv
678source venv/bin/activate
679pip install -r requirements.txt
680./utils/test.sh
681```
682"""
683
684from .core import Spych
685from .wake import SpychWake
686from .orchestrator import SpychOrchestrator