%%{init: {'theme': 'base', 'themeVariables': {
'primaryColor': '#1a1a1a',
'primaryTextColor': '#d4d4d4',
'primaryBorderColor': '#c49a6c',
'lineColor': '#8a6d4a',
'secondaryColor': '#2a2a2a',
'tertiaryColor': '#111',
'fontFamily': 'SF Mono, Fira Code, Consolas, monospace',
'fontSize': '12px',
'clusterBkg': '#1a1a1a',
'clusterBorder': '#2a2a2a',
'edgeLabelBackground': '#111'
}}}%%
flowchart TD
subgraph A["Lane A — Analog"]
direction TB
Campus[Analog Phone on Campus]
Tape[Tape Recorder]
Euro[Eurorack]
OT_A[Octatrack]
Campus --> Tape --> Euro --> OT_A
end
subgraph B["Lane B — Digital"]
direction TB
VoIP[VoIP / Twilio]
Pipeline[Python Pipeline]
Grains[Grain Library]
Packs[Daily Packs]
OT_B[Octatrack]
VoIP --> Pipeline --> Grains --> Packs --> OT_B
end
subgraph C["Lane C — Installation"]
direction TB
OT_C[OT Stem Exports]
Engine[Python Engine]
XR18[XR18 Mixer]
Inside[Inside Speakers]
Outside[Outside Speakers]
OT_C --> Engine --> XR18
XR18 --> Inside
XR18 --> Outside
Mic[Room Mic] -.->|modulation| Engine
end
OT_A --> OT_C
OT_B --> OT_C
style A fill:#1a1a1a,stroke:#9a5a5a,color:#d4d4d4
style B fill:#1a1a1a,stroke:#c49a6c,color:#d4d4d4
style C fill:#1a1a1a,stroke:#5a9a5a,color:#d4d4d4
style Campus fill:#2a2a2a,stroke:#9a5a5a,color:#d4d4d4
style Tape fill:#2a2a2a,stroke:#9a5a5a,color:#d4d4d4
style Euro fill:#2a2a2a,stroke:#9a5a5a,color:#d4d4d4
style OT_A fill:#2a2a2a,stroke:#9a5a5a,color:#d4d4d4
style VoIP fill:#2a2a2a,stroke:#c49a6c,color:#d4d4d4
style Pipeline fill:#2a2a2a,stroke:#c49a6c,color:#d4d4d4
style Grains fill:#2a2a2a,stroke:#c49a6c,color:#d4d4d4
style Packs fill:#2a2a2a,stroke:#c49a6c,color:#d4d4d4
style OT_B fill:#2a2a2a,stroke:#c49a6c,color:#d4d4d4
style OT_C fill:#2a2a2a,stroke:#5a9a5a,color:#d4d4d4
style Engine fill:#2a2a2a,stroke:#5a9a5a,color:#d4d4d4
style XR18 fill:#2a2a2a,stroke:#5a9a5a,color:#d4d4d4
style Inside fill:#2a2a2a,stroke:#5a9a5a,color:#d4d4d4
style Outside fill:#2a2a2a,stroke:#5a9a5a,color:#d4d4d4
style Mic fill:#2a2a2a,stroke:#9a9a5a,color:#d4d4d4
After the Tone
System Architecture for a Participatory Sound Installation
1 Executive Summary
After the Tone is a participatory sound installation where visitors leave voicemail messages on a dedicated phone line. These messages are transcribed, tended, and transmuted into audio material that plays continuously in the exhibition space.
The system operates across three lanes:
- Lane A (Analog): Analog phone on campus → tape recorder → eurorack/Octatrack for artifact-making and color
- Lane B (Digital): VoIP → Python pipeline → grains + curated daily packs → Octatrack
- Lane C (Installation): OT exports stems → computer plays zones, triggers fades, occasional mic injection
The computer runs Lane C: stable, unattended playback. It does not do creative sound design. It plays pre-composed stems, manages zones, and reacts subtly to the room.
2 System Architecture
2.1 Deployment: Relay + Studio Split
In practice, the system runs across two machines to separate the always-on phone line from the residency-bound audio engine.
%%{init: {'theme': 'base', 'themeVariables': {
'primaryColor': '#1a1a1a',
'primaryTextColor': '#d4d4d4',
'primaryBorderColor': '#c49a6c',
'lineColor': '#8a6d4a',
'secondaryColor': '#2a2a2a',
'tertiaryColor': '#111',
'fontFamily': 'SF Mono, Fira Code, Consolas, monospace',
'fontSize': '12px',
'clusterBkg': '#1a1a1a',
'clusterBorder': '#2a2a2a',
'edgeLabelBackground': '#111'
}}}%%
flowchart LR
subgraph Relay["Relay — Windows PC (home, always-on)"]
direction TB
Twilio[Twilio Webhook]
Mode{mode?}
Greeting[Play Random Greeting]
Dual[Dial Tape via SIP +<br/>Record Digitally]
Download[Download WAV]
Save[Save WAV + JSON Sidecar]
Twilio --> Mode
Mode -->|record| Greeting --> Download --> Save
Mode -->|dual| Dual --> Download
end
subgraph Sync["Dropbox"]
Folder[after_the_tone/incoming/]
end
subgraph Studio["Studio — Mac Laptop (residency)"]
direction TB
Incoming[Incoming Page]
Ingest[Ingest Button]
Pipeline[Pipeline: Transcribe → Scrub → Analyze → Grains]
Engine[Engine + Dashboard]
Incoming --> Ingest --> Pipeline --> Engine
end
Save --> Folder --> Incoming
style Relay fill:#1a1a1a,stroke:#c49a6c,color:#d4d4d4
style Sync fill:#1a1a1a,stroke:#666,color:#d4d4d4
style Studio fill:#1a1a1a,stroke:#5a9a5a,color:#d4d4d4
style Twilio fill:#2a2a2a,stroke:#c49a6c,color:#d4d4d4
style Mode fill:#2a2a2a,stroke:#c49a6c,color:#d4d4d4
style Greeting fill:#2a2a2a,stroke:#c49a6c,color:#d4d4d4
style Dual fill:#2a2a2a,stroke:#9a5a5a,color:#d4d4d4
style Download fill:#2a2a2a,stroke:#c49a6c,color:#d4d4d4
style Save fill:#2a2a2a,stroke:#c49a6c,color:#d4d4d4
style Folder fill:#2a2a2a,stroke:#666,color:#d4d4d4
style Incoming fill:#2a2a2a,stroke:#5a9a5a,color:#d4d4d4
style Ingest fill:#2a2a2a,stroke:#5a9a5a,color:#d4d4d4
style Pipeline fill:#2a2a2a,stroke:#5a9a5a,color:#d4d4d4
style Engine fill:#2a2a2a,stroke:#5a9a5a,color:#d4d4d4
Why split? The laptop handles audio hardware (XR18, speakers, room mic) and needs stability. The phone line needs always-on internet + ngrok. If the laptop crashes, the phone line stays up. If the PC crashes, the installation keeps playing.
| | Relay (PC) | Studio (Mac) |
|---|---|---|
| Role | Answer calls, save recordings | Everything else |
| Dependencies | fastapi, uvicorn, httpx, twilio, pyyaml, python-multipart | Full att package |
| Internet | Required (Twilio + ngrok) | Only for Anthropic API calls |
| Audio hardware | None | XR18 + speakers + mic |
| Database | None | SQLite |
| Uptime | Always-on | During residency hours |
Sync: Dropbox folder after_the_tone/incoming/. Relay writes YYYYMMDD_HHMMSS_{CallSid}.wav + .json sidecar. Mac picks them up from the dashboard Incoming page. After ingest, files move to incoming/ingested/.
Greetings: MP3 files live in ~/Dropbox/after_the_tone/greetings/ on both machines, synced via Dropbox. The relay serves them to Twilio via a static mount at /greetings/{filename}. On each call, one is chosen at random. The relay prefers .mp3 (faster for Twilio to fetch), falls back to .wav. If no greeting files exist, the call skips straight to the beep with no TTS fallback.
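The selection rule described above can be sketched in a few lines. This is an illustration, not the relay's actual code: the function name `pick_greeting` is hypothetical, and it assumes "prefers .mp3" means that any available MP3 wins over every WAV.

```python
import random
from pathlib import Path
from typing import Optional


def pick_greeting(greetings_dir: Path,
                  rng: Optional[random.Random] = None) -> Optional[Path]:
    """Return a random greeting file, preferring .mp3 over .wav.

    Returns None when the folder holds no greeting at all, in which case
    the call goes straight to the beep (there is no TTS fallback).
    """
    rng = rng or random.Random()
    for suffix in (".mp3", ".wav"):
        candidates = sorted(
            p for p in greetings_dir.glob(f"*{suffix}") if p.is_file()
        )
        if candidates:
            return rng.choice(candidates)
    return None
```

Returning `None` rather than raising keeps the "no greeting files" case an ordinary branch in the webhook handler.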
JSON sidecar format (written alongside each WAV):
{
"call_sid": "CA...",
"phone_hash": "sha256[:16]",
"duration": 45.2,
"timestamp": "20250225_143022"
}

Credentials: The relay reads TWILIO_ACCOUNT_SID and TWILIO_AUTH_TOKEN from environment variables. On the deployed PC, these are set in relay/start.bat (plaintext; acceptable for a home machine, not suitable for shared infrastructure).
ngrok: The relay uses a static free-tier ngrok domain (salena-crenate-coequally.ngrok-free.dev). This survives ngrok restarts, so the Twilio webhook URL never needs updating.
Single-machine mode still works: leave audio.incoming_watch_dir unset and the Twilio webhook mounts directly on the Mac as before.
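As a sketch of the file naming and JSON sidecar format described earlier in this section, the relay's save step might look like the following. The function names are hypothetical. The document specifies `phone_hash` only as "sha256[:16]", so hashing the raw caller string with no salt or normalization is an assumption.

```python
import hashlib
import json
from datetime import datetime
from pathlib import Path


def hash_phone(number: str) -> str:
    """First 16 hex characters of the SHA-256 digest of the caller's number."""
    return hashlib.sha256(number.encode("utf-8")).hexdigest()[:16]


def save_recording(incoming: Path, wav_bytes: bytes, call_sid: str,
                   caller: str, duration: float, now: datetime) -> Path:
    """Write YYYYMMDD_HHMMSS_{CallSid}.wav plus a matching .json sidecar."""
    stamp = now.strftime("%Y%m%d_%H%M%S")
    incoming.mkdir(parents=True, exist_ok=True)
    wav_path = incoming / f"{stamp}_{call_sid}.wav"
    wav_path.write_bytes(wav_bytes)
    sidecar = {
        "call_sid": call_sid,
        "phone_hash": hash_phone(caller),   # the raw number is never written
        "duration": duration,
        "timestamp": stamp,
    }
    wav_path.with_suffix(".json").write_text(json.dumps(sidecar, indent=2))
    return wav_path
```

Because the WAV and its sidecar share a stem, the Studio side can pair them by filename alone once Dropbox has synced both.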
2.2 Campus Phone: Grandstream HT812
An analog phone on campus connects to the same Twilio pipeline via a Grandstream HT812 ATA (Analog Telephone Adapter). When someone picks up the phone, it auto-dials Twilio over SIP — no keypad, no dialing, just pick up and talk.
%%{init: {'theme': 'base', 'themeVariables': {
'primaryColor': '#1a1a1a',
'primaryTextColor': '#d4d4d4',
'primaryBorderColor': '#c49a6c',
'lineColor': '#8a6d4a',
'secondaryColor': '#2a2a2a',
'tertiaryColor': '#111',
'fontFamily': 'SF Mono, Fira Code, Consolas, monospace',
'fontSize': '12px',
'clusterBkg': '#1a1a1a',
'clusterBorder': '#2a2a2a',
'edgeLabelBackground': '#111'
}}}%%
flowchart LR
subgraph Campus["Residency — On-Site Network"]
direction TB
Phone[Analog Phone]
HT812[Grandstream HT812<br/>FXS → SIP]
Opal[GL-iNet Opal<br/>WiFi bridge]
XR18[XR18 Mixer]
iPad[iPad<br/>Mixing Station]
Phone -->|RJ11| HT812
HT812 -->|ethernet| Opal
XR18 -->|ethernet| Opal
iPad -.->|WiFi| Opal
end
subgraph Cloud["Twilio"]
SIP[SIP Domain]
Webhook[Voice URL Webhook]
SIP --> Webhook
end
subgraph Home["Home PC — Relay"]
Relay[relay/server.py]
end
Opal -->|internet| SIP
Webhook -->|ngrok| Relay
style Campus fill:#1a1a1a,stroke:#5a9a5a,color:#d4d4d4
style Cloud fill:#1a1a1a,stroke:#c49a6c,color:#d4d4d4
style Home fill:#1a1a1a,stroke:#c49a6c,color:#d4d4d4
style Phone fill:#2a2a2a,stroke:#9a5a5a,color:#d4d4d4
style HT812 fill:#2a2a2a,stroke:#c49a6c,color:#d4d4d4
style Opal fill:#2a2a2a,stroke:#5a9a5a,color:#d4d4d4
style XR18 fill:#2a2a2a,stroke:#5a9a5a,color:#d4d4d4
style iPad fill:#2a2a2a,stroke:#666,color:#d4d4d4
style SIP fill:#2a2a2a,stroke:#c49a6c,color:#d4d4d4
style Webhook fill:#2a2a2a,stroke:#c49a6c,color:#d4d4d4
style Relay fill:#2a2a2a,stroke:#c49a6c,color:#d4d4d4
Network: The GL-iNet Opal travel router connects to the residency WiFi and bridges it to its LAN ports. The HT812 and XR18 share the Opal’s two ethernet ports. The iPad connects to the Opal’s WiFi AP for mixer control. Local traffic (iPad ↔︎ XR18) stays on the Opal’s subnet; only the HT812’s SIP traffic goes out to the internet. The Mac laptop is not in this path — it doesn’t need to be on or connected for the phone to work.
Call flow: Phone pickup → HT812 offhook auto-dial → SIP to Twilio → Twilio fires Voice URL webhook → relay on home PC → greeting + record → WAV to Dropbox → Mac ingests later.
Two ports, two paths:
| HT812 Port | Connected Device | SIP User | Behavior |
|---|---|---|---|
| FXS 1 | Analog phone | airstream | Offhook auto-dial → digital pipeline (greeting, record, grains) |
| FXS 2 | Tape answering machine (Code-a-Phone) | tape | Receives incoming SIP calls → tape records analog copy |
FXS 2 is optional. In standard mode (mode: record), only FXS 1 is used — callers hear a random MP3 greeting, leave a message, and the relay saves a digital WAV. FXS 2 sits idle.
2.2.1 Dual Recording Mode
In dual mode (mode: dual), every call simultaneously records to the tape machine AND digitally. The relay uses Twilio’s <Dial record="record-from-answer-dual"> to bridge the caller to the tape machine’s SIP endpoint while Twilio captures a parallel digital recording.
Caller → Twilio → relay (dual mode)
├── SIP dial → HT812 FXS 2 → Code-a-Phone (analog tape)
└── Twilio records call digitally → WAV to Dropbox
The caller hears the tape machine’s physical greeting (not the MP3) and leaves a message on the cassette. Meanwhile, Twilio captures the entire call as a WAV and the relay saves it to Dropbox as usual. You get two copies: one with analog tape character, one clean digital.
Relay config for dual mode:
mode: dual
tape_sip: sip:tape@after-the-tone.sip.twilio.com

Recording behavior:
- No silence cutoff: timeout=0 disables Twilio’s default 5-second silence detection. Recordings only end when the caller hangs up or hits the max length.
- Max recording length: 3600 seconds (1 hour), long enough that callers are not cut off in practice.
- Download timeout: 300 seconds, to handle large recordings over slow connections.
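To make the dual-mode bridge concrete, here is a minimal sketch of the TwiML the relay would need to return. It builds the XML with the standard library rather than the Twilio helper package, and the function name `dual_mode_twiml` is hypothetical. Only the `<Dial record="record-from-answer-dual">` attribute and the SIP URI come from the text above. Any further attributes the relay sets are not shown.

```python
import xml.etree.ElementTree as ET


def dual_mode_twiml(tape_sip: str) -> str:
    """Bridge the caller to the tape machine while Twilio records both legs."""
    response = ET.Element("Response")
    dial = ET.SubElement(response, "Dial", {"record": "record-from-answer-dual"})
    sip = ET.SubElement(dial, "Sip")
    sip.text = tape_sip          # e.g. the tape_sip value from the relay config
    return ET.tostring(response, encoding="unicode")


print(dual_mode_twiml("sip:tape@after-the-tone.sip.twilio.com"))
```

The `<Sip>` noun inside `<Dial>` is what routes the call to HT812 FXS 2, and the `record` attribute is what yields the parallel digital copy.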
3 Lane Definitions
3.1 Lane A: Analog Artifact
Purpose: Create textured, degraded audio material with analog character.
| | |
|---|---|
| Input | Messages left on an analog phone on campus, recorded directly to tape |
| Process | Tape recordings processed through eurorack (filters, reverb, saturation), sampled into Octatrack |
| Output | Processed stems with analog warmth and tape artifacts |
| Rule | All creative decisions happen here, not on the computer |
3.2 Lane B: Digital Indexed
Purpose: Systematically process voicemails into an indexed, searchable grain library.
| | |
|---|---|
| Input | VoIP recordings via Twilio webhook (relay PC or local) |
| Process | Transcribe (Whisper) → scrub PII (spaCy) → analyze emotions (Claude) → segment into grains (librosa) |
| Output | Tagged grain library + curated daily sample packs for Octatrack |
| Rule | Pipeline is automated; curation happens when selecting daily packs |
| Deployment | Twilio answering runs on the relay PC; pipeline processing runs on the Mac (see Relay + Studio Split) |
3.3 Lane C: Installation Playback
Purpose: Reliable, deterministic playback in the exhibition space.
| | |
|---|---|
| Input | Pre-composed stems (from DAW, Octatrack, or any source) + curated voicemail clips |
| Process | Two-zone playback engine: beds loop inside, curated clips play outside, cross-zone intensity modulation |
| Output | 3-channel audio to XR18 → speakers (stereo inside + mono outside) |
| Rule | No creative sound design on the computer. Stable unattended operation. |
4 Lane B — Digital Pipeline
A deep-dive into the automated VoIP ingest pipeline that powers Lane B.
%%{init: {'theme': 'base', 'themeVariables': {
'primaryColor': '#1a1a1a',
'primaryTextColor': '#d4d4d4',
'primaryBorderColor': '#c49a6c',
'lineColor': '#8a6d4a',
'secondaryColor': '#2a2a2a',
'tertiaryColor': '#111',
'fontFamily': 'SF Mono, Fira Code, Consolas, monospace',
'fontSize': '12px',
'clusterBkg': '#1a1a1a',
'clusterBorder': '#2a2a2a',
'edgeLabelBackground': '#111'
}}}%%
flowchart LR
VoIP[VoIP Recording] --> Transcribe
Transcribe --> Scrub[PII Scrub]
Scrub --> Analyze[Emotional Analysis]
Analyze --> Grains[Extract Grains]
Grains --> Ready
style VoIP fill:#2a2a2a,stroke:#c49a6c,color:#d4d4d4
style Transcribe fill:#2a2a2a,stroke:#c49a6c,color:#d4d4d4
style Scrub fill:#2a2a2a,stroke:#c49a6c,color:#d4d4d4
style Analyze fill:#2a2a2a,stroke:#c49a6c,color:#d4d4d4
style Grains fill:#2a2a2a,stroke:#c49a6c,color:#d4d4d4
style Ready fill:#2a2a2a,stroke:#5a9a5a,color:#d4d4d4
Status state machine: new → transcribed → scrubbed → analyzed → ready
Key file: att/ingest/pipeline.py — orchestrates the full sequence: transcribe → scrub → analyze → extract grains.
Library: faster-whisper (local inference, no cloud upload)
Key file: att/ingest/transcribe.py
Config (whisper section):
| Parameter | Default | Description |
|---|---|---|
| model_size | small | Whisper model size (tiny/base/small/medium/large) |
| device | auto | Compute device (auto/cpu/cuda) |
| compute_type | int8 | Quantization for faster inference |
| language | null | Language code or null for auto-detect |
Output to DB:
- transcripts.raw_text: full transcription
- transcripts.language: detected language code
- transcripts.confidence: model confidence score
Privacy: Audio never leaves the machine. All transcription runs locally.
Failure: Empty audio → status set to "empty", pipeline aborts for that message.
People call a voicemail line and say whatever is on their mind. They say their names. They say their mother’s name. They leave phone numbers, addresses, confessions. This project asks people to be vulnerable, and the minimum obligation in return is that their identifying information never leaves the machine they trusted it to. PII scrubbing is not a feature — it is the condition under which this project is ethical.
Libraries: spaCy (en_core_web_sm) + custom regex patterns — both run locally, no network calls.
Key file: att/ingest/scrub.py
4.0.1 Where it sits in the pipeline
Scrubbing is step 2 of the ingest pipeline (att/ingest/pipeline.py). It runs after local transcription and before any text is sent to an external API. The ordering is enforced by the pipeline itself — there is no code path that sends unscrubbed text to Claude. The raw transcript and the original audio never leave the local machine.
4.0.2 Two-pass approach
The scrubber makes two passes over the transcript, then deduplicates overlapping detections (keeping the longer match):
Pass 1 — Regex patterns (deterministic, high-precision):
| Pattern | Examples caught | Placeholder |
|---|---|---|
| PHONE | (555) 123-4567, +1-555-123-4567 | [PHONE] |
| EMAIL | name@example.com | [EMAIL] |
| SSN | 123-45-6789 | [SSN] |
| ADDRESS | 1234 Main Street, 56 Oak Ave | [ADDRESS] |
Pass 2 — spaCy NER (statistical, catches what regex misses):
| Entity type | What it catches | Placeholder |
|---|---|---|
| PERSON | Personal names (“my daughter Sarah”) | [NAME] |
| ORG | Organizations, employers, schools | [ORGANIZATION] |
| GPE / LOC / FAC | Cities, states, landmarks, buildings | [LOCATION] |
The two passes are complementary: regex catches structured identifiers that NER models often miss (phone numbers, SSNs), while spaCy catches names and places that no regex can reliably match.
4.0.3 What the scrubbed text looks like
A raw transcript like:
“Hi, this is Sarah Chen calling from 1455 Oak Drive. You can reach me at 555-0142. I just wanted to say I miss you, Mom.”
becomes:
“Hi, this is [NAME] calling from [ADDRESS]. You can reach me at [PHONE]. I just wanted to say I miss you, Mom.”
The emotional content — the part that matters for this project — is preserved. The identifying details are not.
4.0.4 Overlap deduplication
When regex and spaCy flag the same span (e.g., both detect “Sarah Chen”), the scrubber sorts by position and keeps the longer match to avoid double-replacement or garbled output.
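The following sketch shows the regex pass, the sort-and-keep-longer deduplication, and placeholder substitution working together. The patterns are illustrative stand-ins, since the project's actual regexes are not shown here. The NER pass is represented by spans passed in by the caller instead of a real spaCy call, and all function names are hypothetical.

```python
import re
from typing import Iterable, List, Tuple

Span = Tuple[int, int, str]  # (start, end, label)

# Illustrative patterns only (assumption): US-style phone, email, SSN.
PATTERNS = {
    "PHONE": re.compile(r"(?:\+?1[-.\s]?)?(?:\(\d{3}\)|\d{3})[-.\s]?\d{3}[-.\s]?\d{4}"),
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}
# NER labels map to the placeholders listed in the Pass 2 table.
PLACEHOLDERS = {"PERSON": "[NAME]", "ORG": "[ORGANIZATION]",
                "GPE": "[LOCATION]", "LOC": "[LOCATION]", "FAC": "[LOCATION]"}


def dedupe(spans: Iterable[Span]) -> List[Span]:
    """Sort by position; where two spans overlap, keep the longer one."""
    kept: List[Span] = []
    for span in sorted(spans, key=lambda s: (s[0], -(s[1] - s[0]))):
        if kept and span[0] < kept[-1][1]:               # overlaps last kept span
            if span[1] - span[0] > kept[-1][1] - kept[-1][0]:
                kept[-1] = span                          # longer match wins
            continue
        kept.append(span)
    return kept


def scrub(text: str, ner_spans: Iterable[Span] = ()) -> Tuple[str, List[Span]]:
    """Return (scrubbed_text, detected_spans) for one transcript."""
    spans = list(ner_spans)
    for label, pattern in PATTERNS.items():
        spans.extend((m.start(), m.end(), label) for m in pattern.finditer(text))
    kept = dedupe(spans)
    for start, end, label in reversed(kept):             # right-to-left keeps offsets valid
        text = text[:start] + PLACEHOLDERS.get(label, f"[{label}]") + text[end:]
    return text, kept
```

Replacing from the end of the string backwards means earlier character offsets stay valid while placeholders of different lengths are substituted.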
4.0.5 Output to DB
- transcripts.scrubbed_text: the redacted version of the transcript, with placeholders replacing every detected entity
- transcripts.pii_entities: a JSON array recording what was found and where, e.g. {"text": "Sarah Chen", "label": "PERSON", "start": 13, "end": 23}
The original entity text is stored only in this local DB field so that scrubbing can be audited and improved. It is never sent to any external service.
4.0.6 What reaches the outside world
Nothing in this pipeline contacts the network except the Claude API call in the next step (emotional analysis), and that call receives only the scrubbed text. The code path is explicit — analyze_message() takes a scrubbed_text parameter; the raw transcript is not passed.
To be concrete about what never leaves the machine:
- The original audio recording
- The raw Whisper transcript
- Any detected PII entities (names, phone numbers, addresses, SSNs)
4.0.7 Config
None — the regex patterns and spaCy model are hardcoded. This is intentional: PII scrubbing should not be something you can accidentally misconfigure or turn off. The patterns are conservative (biased toward over-scrubbing).
4.0.8 Limitations
This is a best-effort system, not a legal guarantee. spaCy’s en_core_web_sm is a small model optimized for speed over recall. It will miss some names, especially unusual ones or those in non-English speech. The regex patterns cover US-format identifiers. If a caller says “I live on the corner of Fifth and Main” without a street number, the address regex won’t catch it (though spaCy’s GPE/LOC may).
The design principle is: when in doubt, scrub it. A false positive (over-redacting) costs nothing meaningful — the emotional analysis still works with [NAME] placeholders. A false negative (leaking a name to an API) is the failure mode that matters.
Service: Anthropic Claude API (async)
Key file: att/ingest/analyze.py
Config (analysis section):
| Parameter | Default | Description |
|---|---|---|
| model | claude-sonnet-4-20250514 | Claude model for analysis |
| themes | 17 emotions | Whitelist of valid theme labels |
Output to DB:
- analysis.sentiment: primary emotional category
- analysis.intensity: float [0, 1]
- analysis.themes: list of matched theme labels
- analysis.summary: one-sentence description
- analysis.slug: filename-safe identifier
- analysis.narrative_thread: thematic grouping
Validation: Intensity clamped to [0, 1], themes filtered against the configured whitelist, handles markdown fences in Claude’s JSON response.
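The three validation steps named above (fence handling, intensity clamping, theme filtering) could look roughly like this. The function names are hypothetical and the real checks in att/ingest/analyze.py may differ. Only the standard library is used.

```python
import json
from typing import Any, Dict, Iterable

FENCE = "`" * 3  # a Markdown code fence marker, built this way for readability


def strip_fences(raw: str) -> str:
    """Drop a leading fence line (with optional 'json' tag) and a trailing fence."""
    text = raw.strip()
    if text.startswith(FENCE):
        _, _, text = text.partition("\n")      # discard the opening fence line
        text = text.rstrip()
        if text.endswith(FENCE):
            text = text[: -len(FENCE)]
    return text.strip()


def validate_analysis(raw: str, allowed_themes: Iterable[str]) -> Dict[str, Any]:
    """Parse the model's JSON reply, clamp intensity, and filter themes."""
    data = json.loads(strip_fences(raw))       # invalid JSON raises ValueError
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    allowed = set(allowed_themes)
    data["intensity"] = min(1.0, max(0.0, float(data.get("intensity", 0.0))))
    data["themes"] = [t for t in data.get("themes", []) if t in allowed]
    return data
```

Letting `json.loads` raise on malformed input matches the failure behavior described next: the caller can catch the exception and set the message status to "error".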
Failure: Missing API key → exception. Invalid JSON response → exception. Both set message status to "error".
Libraries: librosa (silence detection, spectral features) + soundfile
Key file: att/ingest/grains.py
Config (grains section):
| Parameter | Default | Description |
|---|---|---|
| min_grain_secs | 2.0 | Minimum grain duration |
| max_grain_secs | 10 | Maximum grain duration |
| silence_threshold_db | 30.0 | Silence detection threshold |
| min_silence_gap_secs | 0.3 | Minimum gap between grains |
Algorithm:
- Detect silence intervals in audio
- Merge intervals closer than min_silence_gap_secs
- Merge segments shorter than min_grain_secs
- Split segments longer than max_grain_secs
- Apply 10ms fade in/out to each grain
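The merge and split steps above can be sketched on plain intervals, independent of any audio library. The sketch assumes the detector has already produced non-silent (start, end) pairs in seconds, for example from `librosa.effects.split` after converting samples to seconds. The function names, the choice to fold a short segment into the one that follows it, and the equal-parts split are all assumptions, not necessarily what att/ingest/grains.py does.

```python
import math
from typing import List, Tuple

Interval = Tuple[float, float]  # (start_secs, end_secs) of non-silent audio


def merge_close(intervals: List[Interval], min_gap: float) -> List[Interval]:
    """Join neighbours separated by less than min_gap seconds of silence."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start - merged[-1][1] < min_gap:
            merged[-1] = (merged[-1][0], max(end, merged[-1][1]))
        else:
            merged.append((start, end))
    return merged


def merge_short(intervals: List[Interval], min_len: float) -> List[Interval]:
    """Fold a too-short segment into the next one; drop any that stay too short."""
    out: List[Interval] = []
    for start, end in intervals:
        if out and out[-1][1] - out[-1][0] < min_len:
            out[-1] = (out[-1][0], end)
        else:
            out.append((start, end))
    return [seg for seg in out if seg[1] - seg[0] >= min_len]


def split_long(intervals: List[Interval], max_len: float) -> List[Interval]:
    """Cut segments longer than max_len into equal parts that fit."""
    out: List[Interval] = []
    for start, end in intervals:
        parts = max(1, math.ceil((end - start) / max_len))
        step = (end - start) / parts
        out += [(start + i * step, start + (i + 1) * step) for i in range(parts)]
    return out


def segment(intervals: List[Interval], min_grain_secs: float = 2.0,
            max_grain_secs: float = 10.0,
            min_silence_gap_secs: float = 0.3) -> List[Interval]:
    """Run the three interval steps in the order listed in the algorithm."""
    merged = merge_close(intervals, min_silence_gap_secs)
    return split_long(merge_short(merged, min_grain_secs), max_grain_secs)
```

With the defaults from the config table, a message shorter than 2 seconds yields an empty list, which matches the failure behavior described below.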
Spectral features per grain:
- spectral_centroid (Hz): brightness measure
- rms_energy: RMS level, a rough proxy for perceived loudness
Output: Grain WAV files in audio/grains/, DB rows in grains table.
Failure: Audio shorter than min_grain_secs → empty grain list. No speech detected → empty grain list.
Key file: att/ingest/export.py
Naming convention: {sentiment}/{slug}_{grain_num:02d}.wav (e.g. grief/miss_you_mom_01.wav)
Directory structure: Sentiment subfolders + grains.csv metadata file.
CSV columns:
filename, folder, slug, sentiment, themes, intensity, duration, message_id, grain_index, start, end, spectral_centroid, rms_energy, summary
Config: Uses audio.storage_path for the export directory root.
In addition to the automatic silence-based grain extraction above, a curated phrase extraction tool lets you search for specific spoken phrases across all transcripts and cut them out precisely.
Key file: scripts/extract_phrases.py
How it works:
- You write a text file (phrases.txt) with one phrase per line: the phrases that struck you while listening
- The tool fuzzy-matches each phrase against all transcripts’ word-level timestamps (generated by Whisper)
- Matching audio segments are extracted with configurable padding (default 1.5s before and after) and 1.5s crossfades
- Output: clean WAV files in audio/curated/, ready for the outside zone
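A rough sketch of the matching idea: slide a window of words across a transcript, score each window against the phrase, and return the timestamps of the best one. It uses the standard library's `difflib` as a stand-in scorer; the document does not say which fuzzy-matching library scripts/extract_phrases.py relies on, so scores will not be identical. The names `find_phrase` and `padded` are hypothetical.

```python
import re
from difflib import SequenceMatcher
from typing import List, Optional, Tuple

Word = Tuple[str, float, float]  # (text, start_secs, end_secs) from the transcript


def _norm(text: str) -> str:
    """Lowercase and strip punctuation so 'you,' matches 'you'."""
    return re.sub(r"[^a-z0-9' ]+", "", text.lower()).strip()


def find_phrase(phrase: str, words: List[Word],
                min_score: float = 75.0) -> Optional[Tuple[float, float, float]]:
    """Return (start_secs, end_secs, score) of the best window, or None."""
    target = _norm(phrase)
    n = len(target.split())
    best: Optional[Tuple[float, float, float]] = None
    for size in range(max(1, n - 1), n + 2):        # one word of slack either way
        for i in range(len(words) - size + 1):
            window = words[i:i + size]
            candidate = _norm(" ".join(w[0] for w in window))
            score = 100.0 * SequenceMatcher(None, target, candidate).ratio()
            if score >= min_score and (best is None or score > best[2]):
                best = (window[0][1], window[-1][2], score)
    return best


def padded(start: float, end: float, total_secs: float,
           pad_before: float = 1.5, pad_after: float = 1.5) -> Tuple[float, float]:
    """Widen the cut by the padding, clamped to the bounds of the recording."""
    return max(0.0, start - pad_before), min(total_secs, end + pad_after)
```

The `min_score` and padding defaults mirror the `--min-score`, `--pad-before`, and `--pad-after` options shown in the commands that follow.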
# Preview matches without extracting
python scripts/extract_phrases.py phrases.txt --dry-run
# Extract with default settings
python scripts/extract_phrases.py phrases.txt
# Custom padding and match threshold
python scripts/extract_phrases.py phrases.txt --pad-before 0.5 --pad-after 0.5 --min-score 75
# Keep all non-overlapping matches (not just the best)
python scripts/extract_phrases.py phrases.txt --all-matches
# Then normalize the output for consistent playback volume
python scripts/normalize_curated.py

If you’re working with field recordings, interviews, or any spoken-word material and want to build a usable sample library, here’s the workflow this project uses. The tools are open and the approach generalizes to any audio corpus.
The philosophy: you still listen to everything. The machine doesn’t decide what matters — you do. It just makes the extraction fast once you’ve made your choices.
- Listen first: Go through every recording. Apply your own creative judgment. Note what resonates, what surprises you, what you want to use.
- Transcribe: Whisper runs locally (no cloud upload) and produces word-level timestamps. Every word gets a precise start and end time. Run scripts/retranscribe.py if your existing transcripts are missing word timestamps.
- Note the phrases that struck you: As you listen, write down the phrases in a plain text file (one per line). These are your curatorial choices, the moments that matter to you.
- Let the tool do the cutting: The fuzzy matcher (scripts/extract_phrases.py) finds your chosen phrases in the transcripts, even when speakers mumbled or trailed off. It handles the tedious part: locating the exact timecode and cutting the audio with padding and crossfades. Output: clean WAV files ready to drop into a sampler, DAW, or this installation’s curated zone.
- Normalize: Batch-normalize all clips to consistent loudness (scripts/normalize_curated.py targets -23 dB RMS with a -1 dB peak ceiling) so the library plays back at even levels.
- Organize: The export tool (att/ingest/export.py) sorts grains into folders by emotional category (grief/, hope/, silence/) with descriptive filenames derived from content analysis. A CSV manifest ties everything together: filename, sentiment, themes, intensity, duration, spectral features.
Frame it this way: you do the listening, the choosing, the curating. The machine handles the cutting and organizing.
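The Normalize step in the workflow above comes down to one gain calculation: raise or lower the clip until its RMS reaches -23 dB, unless that would push the peak above -1 dB. The sketch below works on a plain list of float samples in the range [-1, 1]. How scripts/normalize_curated.py measures RMS (whole file or gated) is not stated, so a whole-clip average is assumed, and the function names are hypothetical.

```python
import math
from typing import Sequence


def to_db(amplitude: float) -> float:
    """Convert a linear amplitude to decibels relative to full scale."""
    return 20.0 * math.log10(max(amplitude, 1e-12))


def normalization_gain(samples: Sequence[float], target_rms_db: float = -23.0,
                       peak_ceiling_db: float = -1.0) -> float:
    """Linear gain that hits the RMS target without exceeding the peak ceiling."""
    if not samples:
        return 1.0
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    peak = max(abs(s) for s in samples)
    gain_db = target_rms_db - to_db(rms)                    # gain needed for the RMS target
    gain_db = min(gain_db, peak_ceiling_db - to_db(peak))   # headroom before the ceiling
    return 10.0 ** (gain_db / 20.0)
```

Multiplying every sample by the returned gain gives a clip at the target level, or a quieter one when the peak ceiling is the binding constraint.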
If you’re working with hours of interview footage and hunting for the three sentences that tell the story, this toolchain can help.
- The problem: Scrubbing through timelines is slow. You know the kind of thing you’re looking for, but finding it means watching everything again.
- Word-level timestamps: Whisper generates timecoded transcripts where every word has a start and end time. These can be exported as SRT subtitle files or used programmatically to jump straight to any moment.
- Phrase search: Write down the phrases or ideas you’re looking for in a text file. The fuzzy matcher (scripts/extract_phrases.py) scans all transcripts and returns timecodes, even when speakers trailed off or used slightly different words than you remembered.
- Auto-extract: The extraction tool cuts audio at those timecodes with padding and fades. Build a rough cut from a text file instead of scrubbing through hours of footage.
Instead of scrubbing through timelines, describe what you’re looking for in words. The machine finds it for you.
If you’re going to use AI with recordings of real people, here’s how to think about doing it responsibly. This section documents the ethical framework built into this project — not as a legal disclaimer, but as a set of design decisions that encode a responsibility to the people who shared their voices.
4.0.9 Nothing leaves the machine by default
All transcription (Whisper) runs locally. Original audio, raw transcripts, and detected personal information never touch the internet.
4.0.10 PII scrubbing before any AI analysis
Before the system asks Claude to analyze emotional content, it strips names, phone numbers, addresses, SSNs, and organizations using a two-pass approach:
- Pass 1: Deterministic regex patterns — catches phone numbers, emails, SSNs, street addresses
- Pass 2: spaCy NER — catches personal names, organizations, locations
The design bias is explicit: over-redacting is acceptable, leaking PII is not. A false positive (replacing “Mom” with [NAME]) costs nothing — the emotional analysis still works. A false negative (sending someone’s name to an external API) is the failure mode that matters.
4.0.11 Only scrubbed text reaches Claude
The code path is explicit: scrub_pii() runs first, analyze_message() only receives the scrubbed output. There is no code path that sends raw transcript to an external API. The pipeline enforces this ordering — it’s not a setting you can turn off.
4.0.12 Audit trail
Detected PII entities are stored locally (in the pii_entities field) so scrubbing accuracy can be reviewed and improved. This data never leaves the machine.
4.0.13 Limitations (honest)
- spaCy’s en_core_web_sm is a small model optimized for speed. It will miss some names, especially unusual ones or those in non-English speech.
- Regex patterns cover US-format identifiers. International phone numbers and address formats may slip through.
- If a caller says “I live on the corner of Fifth and Main” without a street number, the address regex won’t catch it (though spaCy’s location detection may).
- The system errs on the side of caution but isn’t perfect.
4.0.14 Voice as property
A person’s voice is theirs. AI models trained on voice recordings can later be used to generate synthetic speech that sounds like that person — deepfakes. Exposing raw voice audio online means it could be scraped and used for voice cloning without consent. This is why:
- Original audio files never leave the local machine
- The public constellation website streams playback but does not allow downloading, sharing, or saving audio files
- Only scrubbed text (not audio) is sent to the Claude API for analysis
- The voice recordings exist for listening in the moment, not for extraction
4.0.15 Why this matters
People called a voicemail line and shared personal stories. Some left their names. Some left phone numbers. Some said things they might not want the world to hear. Using AI to analyze that content comes with a responsibility to protect their privacy and their voice. The architecture encodes that responsibility — it’s not an afterthought.
5 Preparing Sound for the Installation
A stem is a composed piece of audio that the installation will loop on the inside speakers. It can come from anywhere — a DAW, a hardware sampler, a modular synthesizer, field recordings processed through effects. The only requirements are technical: the engine needs files in a specific format to play them reliably.
5.1 What Makes a Valid Stem
| Property | Requirement |
|---|---|
| Format | WAV (PCM 24-bit) |
| Sample rate | 48kHz (files at other rates are rejected on import) |
| Channels | Stereo (the engine preserves stereo throughout the signal path) |
| Loudness | Normalized to -14 LUFS on import (the importer does this automatically) |
| Duration | 1 second minimum, 1 hour maximum |
Longer stems are better — 5 to 30 minutes avoids obvious looping. The engine streams from disk with a 10-second read-ahead buffer, so even very long files work without loading into memory.
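Before uploading, the requirements in the table can be checked locally. This is a minimal sketch using Python's built-in `wave` module. It assumes the table describes the file as uploaded (including 24-bit depth), and it does not measure loudness, since the importer normalizes that itself. The function name `stem_problems` is hypothetical and is not the importer's own validation.

```python
import wave
from pathlib import Path
from typing import List

REQUIRED_RATE = 48_000            # Hz
REQUIRED_CHANNELS = 2             # stereo
REQUIRED_WIDTH = 3                # bytes per sample, i.e. 24-bit PCM
MIN_SECS, MAX_SECS = 1.0, 3600.0


def stem_problems(path: Path) -> List[str]:
    """Return the reasons a file fails the stem table (empty list if it passes)."""
    try:
        with wave.open(str(path), "rb") as wav:
            rate = wav.getframerate()
            channels = wav.getnchannels()
            width = wav.getsampwidth()
            frames = wav.getnframes()
    except (wave.Error, EOFError) as exc:
        return [f"not a readable PCM WAV file: {exc}"]

    problems: List[str] = []
    if rate != REQUIRED_RATE:
        problems.append(f"sample rate is {rate} Hz, expected {REQUIRED_RATE}")
    if channels != REQUIRED_CHANNELS:
        problems.append(f"{channels} channel(s), expected stereo")
    if width != REQUIRED_WIDTH:
        problems.append(f"{width * 8}-bit samples, expected 24-bit")
    duration = frames / rate if rate else 0.0
    if not MIN_SECS <= duration <= MAX_SECS:
        problems.append(f"duration {duration:.1f}s outside 1s to 1h")
    return problems
```

Returning every problem at once, instead of stopping at the first, saves a round trip when a stem was exported with several wrong settings.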
5.2 How to Upload
Via dashboard: Go to the Stems page, drag-drop a WAV file onto the upload zone. The importer validates format, normalizes loudness, and adds it to the stem library. Set a mood tag for diversity (the engine avoids repeating the same mood back-to-back).
Via API:
curl -X POST -F file=@my_stem.wav localhost:8000/api/stems/upload

5.3 Generating Curated Samples
The outside zone plays short voicemail clips from audio/curated/. To build or refresh this library from the voicemail corpus:
# 1. Edit phrases.txt with phrases you want to extract (one per line)
# 2. Extract matching audio from transcripts
python scripts/extract_phrases.py phrases.txt
# 3. Normalize to consistent volume (-23 dB RMS)
python scripts/normalize_curated.py

Output goes to audio/curated/. The outside zone picks up new files automatically on its next scan cycle.
6 How the Installation Plays Sound
Walk into the exhibition space and you hear two things: a continuous wash of ambient sound from the speakers inside, and occasional fragments of actual voicemail messages drifting from the speakers outside. The two zones breathe together — when the inside gets louder, the outside responds. The sound is always changing, but slowly, like weather.
Under the hood, this is a two-zone playback engine (att/engine/player.py). Each zone has its own independent playback system, and they communicate through a simple intensity signal.
%%{init: {'theme': 'base', 'themeVariables': {
'primaryColor': '#1a1a1a',
'primaryTextColor': '#d4d4d4',
'primaryBorderColor': '#c49a6c',
'lineColor': '#8a6d4a',
'secondaryColor': '#2a2a2a',
'tertiaryColor': '#111',
'fontFamily': 'SF Mono, Fira Code, Consolas, monospace',
'fontSize': '12px',
'clusterBkg': '#1a1a1a',
'clusterBorder': '#2a2a2a',
'edgeLabelBackground': '#111'
}}}%%
flowchart LR
subgraph Inside["Inside Zone — Stems"]
direction TB
Stems[(Stem Library)]
Select[Weighted Random<br/>+ Mood Diversity]
Layer[Playback Layer<br/>stream from disk]
Stems --> Select --> Layer
end
subgraph Outside["Outside Zone — Curated"]
direction TB
Folder[(audio/curated/)]
Pick[Random Selection]
Play[One-Shot Playback]
Folder --> Pick --> Play
end
Layer --> Mix[Mixer]
Play --> Mix
Layer -.->|RMS intensity| Play
Mix --> XR18[XR18 Mixer]
XR18 --> Spk_I[Inside Speakers<br/>ch 0-1 stereo]
XR18 --> Spk_O[Outside Speaker<br/>ch 2 mono]
style Inside fill:#1a1a1a,stroke:#5a9a5a,color:#d4d4d4
style Outside fill:#1a1a1a,stroke:#c49a6c,color:#d4d4d4
style Stems fill:#2a2a2a,stroke:#5a9a5a,color:#d4d4d4
style Select fill:#2a2a2a,stroke:#5a9a5a,color:#d4d4d4
style Layer fill:#2a2a2a,stroke:#5a9a5a,color:#d4d4d4
style Folder fill:#2a2a2a,stroke:#c49a6c,color:#d4d4d4
style Pick fill:#2a2a2a,stroke:#c49a6c,color:#d4d4d4
style Play fill:#2a2a2a,stroke:#c49a6c,color:#d4d4d4
style Mix fill:#2a2a2a,stroke:#c49a6c,color:#d4d4d4
style XR18 fill:#2a2a2a,stroke:#c49a6c,color:#d4d4d4
style Spk_I fill:#2a2a2a,stroke:#5a9a5a,color:#d4d4d4
style Spk_O fill:#2a2a2a,stroke:#5a9a5a,color:#d4d4d4
6.1 Inside Zone — Beds of Sound
The inside speakers play stereo bed stems: composed pieces of ambient audio that loop continuously, one dissolving into the next. Think of each stem as a slow scene change — the texture of the room shifts, but there’s never silence.
How it works:
- Stems are stereo WAV files at 48kHz, streamed from disk with a 10-second read-ahead buffer (not loaded into memory — even hour-long compositions work)
- When a stem reaches its end, the engine crossfades to a new one over 3 seconds — one texture dissolving into another
- Mood diversity: the system avoids repeating the same emotional tone back-to-back. If the current stem is tagged “grief,” the next one won’t be
- Weighted selection: stems with higher weights are chosen more often, giving you curatorial control over which compositions dominate
- Stems can also be swapped manually from the dashboard at any time
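The selection rules above can be sketched in a few lines of Python. This is a minimal illustration, not the engine's actual code: the `Stem` dataclass and `pick_next_stem` function are hypothetical names, and falling back to same-mood stems when nothing else is available is an assumption.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Stem:
    name: str
    mood: str
    weight: float = 1.0  # higher weight = chosen more often


def pick_next_stem(stems, current, rng=random):
    """Weighted random choice that avoids repeating the current mood."""
    # Never pick the stem that is already playing.
    candidates = [s for s in stems if current is None or s.name != current.name]
    if current is not None:
        # Mood diversity: prefer stems whose mood differs from the current one.
        different = [s for s in candidates if s.mood != current.mood]
        if different:
            candidates = different  # assumption: fall back to same mood only if forced
    if not candidates:
        return current  # only one stem in the library: keep looping it
    weights = [s.weight for s in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]
```

With a library of two "grief" stems, one "hope" stem, and one "peace" stem, a call with a "grief" stem as `current` only ever returns "hope" or "peace", and the heavier of the two comes up more often.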
6.2 Outside Zone — Fragments of Past Messages
The outside speakers whisper fragments of past voicemail messages — the software picks them at random, spacing them out like distant memories. Each clip plays once, then the speaker goes quiet for 30–75 seconds before another.
The outside zone listens to the inside zone’s energy (measured as RMS — root mean square, a way of measuring audio loudness):
- When the inside zone is loud: curated clips play louder and gaps between them are shorter — the outside joins the conversation
- When the inside zone is quiet: clips are softer and more spaced out — the outside steps back
- The volume floor is 5% (never fully silent) and the ceiling is 50% of the zone’s configured volume
This cross-zone modulation means the two zones breathe together without any manual coordination.
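As a rough sketch of this modulation, the function below maps an RMS reading to a clip volume and a gap length. The linear mapping, the `rms_full_scale` reference level, and the function names are assumptions made for illustration; it also assumes both the 5% floor and the 50% ceiling are fractions of the zone's configured volume, and that the gap moves within the 30–75 second range.

```python
import math


def rms(samples):
    """Root mean square of a block of samples (0.0 for an empty block)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def curated_params(inside_rms, zone_volume=0.8, sleep_range=(30.0, 75.0),
                   floor=0.05, ceiling=0.50, rms_full_scale=0.3):
    """Map inside-zone RMS to (clip volume, gap seconds) for the outside zone."""
    # Normalize RMS to 0..1; rms_full_scale is a hypothetical "loud" reference.
    intensity = max(0.0, min(1.0, inside_rms / rms_full_scale))
    # Louder inside -> louder clips, between floor and ceiling of the zone volume.
    volume = zone_volume * (floor + (ceiling - floor) * intensity)
    # Louder inside -> shorter gaps, within the configured sleep range.
    shortest, longest = sleep_range
    gap = longest - (longest - shortest) * intensity
    return volume, gap
```

With the default zone volume of 0.8, a silent inside zone gives a clip volume of 0.04 and a 75-second gap, while a loud one gives 0.4 and a 30-second gap.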
6.3 Per-Zone Gain Automation
On top of the zone volumes you set in the dashboard, the engine adds slow, random gain drift — subtle volume movement that keeps the sound from feeling static. Think of it as the room itself breathing.
- Drift range: gain multiplier wanders between 0.6 and 1.0
- Slew-limited: maximum 0.02 change per update (no abrupt jumps, only gradual shifts)
- Update interval: every 0.5–2 seconds, a new random target is chosen
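A slew-limited random walk of this kind might look like the sketch below. The `GainDrift` class is a hypothetical name, and starting at full gain is an assumption; the ranges are the ones listed above.

```python
import random


class GainDrift:
    """Slow random gain drift with a per-update slew limit."""

    def __init__(self, drift_range=(0.6, 1.0), max_step=0.02,
                 interval_range=(0.5, 2.0), rng=None):
        self.low, self.high = drift_range
        self.max_step = max_step
        self.interval_range = interval_range
        self.rng = rng or random.Random()
        self.gain = self.high  # assumption: start at full gain

    def tick(self):
        """Advance one update; return (gain, seconds until the next update)."""
        # Choose a fresh random target inside the drift range.
        target = self.rng.uniform(self.low, self.high)
        # Slew limit: move toward the target by at most max_step.
        step = max(-self.max_step, min(self.max_step, target - self.gain))
        self.gain = max(self.low, min(self.high, self.gain + step))
        wait = self.rng.uniform(*self.interval_range)
        return self.gain, wait
```

The caller multiplies the zone volume by the returned gain and waits the returned number of seconds before calling `tick()` again, so the gain never moves by more than 0.02 between updates.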
6.4 Safe Mode
A mandatory fallback for unattended operation — if something goes wrong while nobody’s watching, the installation keeps playing.
| Aspect | Details |
|---|---|
| Trigger | Automatic on any engine exception, or manual from dashboard |
| Behavior | All zones play the same bed stem, looped, at a fixed safe volume (60%) |
| Dashboard | Shows “SAFE MODE” prominently in red |
| Exit | Manual from dashboard only |
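The trigger and exit rules can be pictured as a small state holder wrapped around each engine step. This is only a sketch under assumptions: `EngineState` and its methods are hypothetical names, not the engine's real interface.

```python
class EngineState:
    """Tracks whether the engine is in safe mode."""

    def __init__(self, safe_volume=0.6):
        self.safe_mode = False
        self.safe_volume = safe_volume
        self.last_error = None

    def run_step(self, step):
        """Run one engine step; any exception drops the engine into safe mode."""
        if self.safe_mode:
            return "safe"  # keep looping the safe bed, skip normal logic
        try:
            step()
            return "normal"
        except Exception as exc:  # deliberately broad: any failure triggers the fallback
            self.enter_safe_mode(reason=repr(exc))
            return "safe"

    def enter_safe_mode(self, reason="manual"):
        self.safe_mode = True
        self.last_error = reason

    def exit_safe_mode(self):
        # Called from the dashboard only; the engine never exits on its own.
        self.safe_mode = False
        self.last_error = None
```

The point of the design is that recovery is one-way: an exception can push the engine into safe mode, but only an explicit `exit_safe_mode()` call brings it back.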
6.5 Threading Model
The engine uses three concurrent systems to keep audio smooth:
- Mix thread: a dedicated Python thread that fills the audio buffer using time.sleep() for precise timing. This is the real-time audio path; it never touches asyncio.
- Trigger task: an asyncio task that polls for bed-change timers and manual triggers (non-realtime, runs every 0.5s)
- Curated loops: one asyncio task per curated zone, managing sample selection, playback timing, and gap scheduling
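The split between a plain thread and asyncio tasks can be sketched as follows. All names here are hypothetical, the loops only increment counters instead of touching audio, and the intervals are shortened so the sketch finishes in a fraction of a second.

```python
import asyncio
import threading
import time

stop = threading.Event()
counters = {"mix_blocks": 0, "trigger_polls": 0, "curated_clips": 0}


def mix_thread(block_seconds=0.005):
    # Stand-in for the real-time path: "fill" one block, then sleep until the next.
    while not stop.is_set():
        counters["mix_blocks"] += 1
        time.sleep(block_seconds)


async def trigger_task(poll_seconds=0.01):
    # Non-realtime polling loop (the document states 0.5s; shortened here).
    while not stop.is_set():
        counters["trigger_polls"] += 1
        await asyncio.sleep(poll_seconds)


async def curated_loop(gap_seconds=0.01):
    # One of these would run per curated zone.
    while not stop.is_set():
        counters["curated_clips"] += 1
        await asyncio.sleep(gap_seconds)


async def main(run_seconds=0.1):
    worker = threading.Thread(target=mix_thread, daemon=True)
    worker.start()
    tasks = [asyncio.create_task(trigger_task()),
             asyncio.create_task(curated_loop())]
    await asyncio.sleep(run_seconds)
    stop.set()  # signal every loop to finish
    await asyncio.gather(*tasks)
    worker.join(timeout=1.0)
```

Running `asyncio.run(main())` starts all three loops, lets them run briefly, and shuts them down. The mix loop lives on its own thread so a slow asyncio task cannot delay it.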
7 What Makes the Sound Change
The installation is designed to run unattended — the sound evolves on its own. Here’s what causes changes:
- Bed stems auto-rotate: when a stem finishes playing (reaches 98% progress), the engine crossfades to a new one with a different mood
- Curated clips self-regulate: the outside zone picks random clips and adjusts its own volume and pacing based on the inside zone’s energy
- Gain drift: slow random volume movement across all zones (see Per-Zone Gain Automation above)
- Manual control: from the dashboard, you can swap a stem, change zone volumes, set a target mood, or enter/exit safe mode
8 Hardware Setup
8.1 XR18 Channel Mapping
| Channel | Assignment | Zone |
|---|---|---|
| 0 | Inside L | inside |
| 1 | Inside R | inside |
| 2 | Outside (mono) | outside |
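To make the mapping concrete, the sketch below places per-zone frames into three-channel output frames. It uses plain Python lists for clarity; `mix_block` and `ZONE_CHANNELS` are hypothetical names, and a real engine would more likely use array operations.

```python
ZONE_CHANNELS = {"inside": [0, 1], "outside": [2]}


def mix_block(zone_blocks, zone_channels=ZONE_CHANNELS, total_channels=3):
    """Place per-zone frames into output frames of total_channels samples each."""
    n_frames = len(next(iter(zone_blocks.values())))
    out = [[0.0] * total_channels for _ in range(n_frames)]
    for zone, frames in zone_blocks.items():
        channels = zone_channels[zone]
        for i, frame in enumerate(frames):
            # Each sample in the zone's frame goes to that zone's output channel.
            for sample, ch in zip(frame, channels):
                out[i][ch] += sample
    return out
```

A stereo inside frame `(0.1, 0.2)` and a mono outside frame `(0.5,)` combine into the output frame `[0.1, 0.2, 0.5]`.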
8.2 Signal Flow
Python Engine (sounddevice) → XR18 USB Audio → XR18 Mixer → Speakers
Room Mic → XR18 Aux In → Python Engine (sounddevice input)
8.3 On-Site Network
Sou'wester WiFi
│
GL-iNet Opal (WiFi client → LAN bridge + WiFi AP)
│
├── LAN 1 → XR18 (mixer control)
├── LAN 2 → Grandstream HT812 (SIP to Twilio)
└── WiFi AP → iPad (Mixing Station)
The Mac laptop connects to the Sou’wester WiFi directly (or the Opal’s AP) — it only needs internet for Anthropic API calls during ingest. The audio engine runs entirely offline over USB to the XR18.
8.4 Grandstream HT812
| Port | Device | Purpose |
|---|---|---|
| WAN | — | Not used (ethernet goes to Opal LAN port) |
| LAN | — | Not used |
| FXS 1 | Analog phone | Auto-dials Twilio on pickup |
| FXS 2 | Tape answering machine (optional) | Receives incoming SIP calls |
9 Dashboard Reference
9.1 Pages
| Page | Purpose |
|---|---|
| Home | Stats, recent messages, upload zone, theme distribution |
| Incoming | Relay recordings from Dropbox, per-file ingest button (only visible when incoming_watch_dir is set) |
| Messages | Message list with sentiment filter, detail view with inline editing |
| Stems | Stem library with role/mood filters, upload zone, role assignment |
| Engine | Zone status, controls (play/pause/swap/safe mode), per-zone volume sliders, mood selector |
| Zones | Zone configuration viewer (channels, volume, type) |
| Settings | Configuration viewer |
9.2 Key Workflows
Upload a stem:
- Go to Stems page
- Drag-drop a WAV file onto the upload zone
- Optionally set mood, weight, and tags
- Click “play” to crossfade it into the inside zone
Ingest a relay recording:
- Recordings arrive in the Dropbox folder automatically
- Go to the Incoming page (visible when incoming_watch_dir is set)
- Click ingest on a recording
- Pipeline runs: transcribe → scrub → analyze → extract grains
- File moves to the ingested/ subfolder, message appears on Messages page
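The ingest steps above amount to running a list of stages in order and then moving the file. The sketch below shows that shape only: `ingest_recording` is a hypothetical function, and the stages are passed in as plain callables rather than the real Whisper, scrub, and analysis code.

```python
import shutil
from pathlib import Path


def ingest_recording(wav_path, stages):
    """Run each (name, stage) pair in order, then move the file into ingested/."""
    wav_path = Path(wav_path)
    result = {"source": wav_path.name}
    for name, stage in stages:
        # Each stage sees the file path and the results collected so far.
        result[name] = stage(wav_path, result)
    done_dir = wav_path.parent / "ingested"
    done_dir.mkdir(exist_ok=True)
    # Move only after every stage succeeded, so a failed file stays visible.
    shutil.move(str(wav_path), str(done_dir / wav_path.name))
    return result
```

Because the move happens last, a recording whose pipeline raises an exception stays in the incoming folder and can be retried.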
Archive/unarchive a message:
- Go to Messages page, open a message detail
- Click archive to hide it from the public website (export script filters archived messages)
- Click unarchive to restore it
Enter safe mode:
- Go to Engine page
- Click “enter safe mode”
- All zones play the same bed at fixed volume
- Click “exit safe mode” to resume normal playback
Swap a stem:
- Go to Engine page
- Click a specific stem in the library to crossfade to it in the inside zone
- Or let the engine auto-rotate when the current stem finishes
10 The Constellation — Public Website
After the residency, the voicemails live on as a constellation of light and sound. Each message becomes a point of light in a dark sky, and visitors can listen to them from anywhere.
Live at: after-the-tone.netlify.app (also embedded at sonicswitchyard.com/art/afterthetone)
10.1 What You See
The page opens to a dark canvas. Points of warm light appear in a tight cluster at the center, then drift apart like a big bang — settling into a constellation over a few seconds. This is a force-directed layout running live in the browser: messages that share emotional themes are pulled together by faint lines, while a gentle repulsion keeps them from overlapping. 2D Perlin noise adds slow ambient drift, so the field is never quite still.
Each star encodes properties of the original voicemail:
| Visual Property | Data Source | How It Maps |
|---|---|---|
| Size | Duration of the call | Longer messages make bigger stars (log-scaled) |
| Brightness / glow | Emotional intensity | More intense messages glow brighter |
| Connections | Shared themes (2+) | Faint lines link messages with overlapping emotional themes |
| Clustering | Sentiment + themes | Messages with similar feelings drift together |
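Two of these mappings are easy to show in code. The live layout runs in the browser, so the Python below is only an illustration of the rules in the table; the function names and the `min_radius` and `scale` constants are assumptions.

```python
import math
from itertools import combinations


def star_radius(duration_secs, min_radius=2.0, scale=3.0):
    """Log-scaled size: long calls grow the star, but not linearly."""
    return min_radius + scale * math.log1p(duration_secs)


def theme_links(messages, min_shared=2):
    """Return (id_a, id_b, shared_themes) for pairs sharing min_shared or more themes."""
    links = []
    for a, b in combinations(messages, 2):
        shared = set(a["themes"]) & set(b["themes"])
        if len(shared) >= min_shared:
            links.append((a["id"], b["id"], sorted(shared)))
    return links
```

With log scaling, an hour-long call is roughly twice the radius of a 36-second one rather than a hundred times larger, which keeps long messages from dominating the canvas.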
Click a star (or the “listen to a message” button) to hear the original voicemail and read the scrubbed transcript. Theme tags appear below. The constellation highlights the selected star and its connected neighbors.
Audio plays in-browser only. There are no download links, no right-click-save, no sharing buttons. This is deliberate: a person’s voice is their property, and exposing downloadable audio risks voice cloning and deepfake use. The constellation is for listening, not extracting. See Privacy and Responsible AI Use for the full rationale.
10.2 Archive Feature
Messages can be hidden from the public site via the dashboard’s Messages page (archive/unarchive buttons). The export script filters out archived messages, so they won’t appear in the constellation after the next deploy. This gives curatorial control over what’s public without deleting anything from the database.
10.3 Build and Deploy Pipeline
The public site is a static HTML page (public_src/index.html) that loads a data.json manifest and audio files. No server required — it runs on Netlify’s free tier.
Export script (scripts/export_public.py):
- Queries SQLite for non-archived messages with completed analysis
- Converts WAV audio to 128kbps mono MP3 via ffmpeg
- Generates data.json with scrubbed transcripts, sentiment, themes, intensity, duration
- Copies public_src/index.html to the output directory
- Writes netlify.toml with CORS headers (for Squarespace iframe embedding) and cache rules
# Full export (audio + data)
source .venv/bin/activate
python scripts/export_public.py
# Data only (skip slow audio conversion)
python scripts/export_public.py --skip-audio
# Deploy to Netlify
netlify deploy --prod --dir=public/ --site=d81467c4-1d84-41f4-a584-aaa944e67d0a --no-build
Source files:
| File | Purpose |
|---|---|
| public_src/index.html | Single-page app: constellation canvas, audio player, force layout, Perlin drift |
| scripts/export_public.py | Build script: SQLite → data.json + MP3 audio + HTML + netlify.toml |
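The core of such an export can be sketched as a query plus an ffmpeg command. The table and column names below (`messages`, `archived`, `analysis_status`, and so on) are hypothetical, since the real schema lives in att/db.py, and the manifest keys are illustrative rather than the actual data.json format.

```python
import json
import sqlite3


def build_manifest(conn):
    """Collect non-archived, fully analyzed messages for the public site."""
    rows = conn.execute(
        "SELECT id, transcript_scrubbed, sentiment, themes, intensity, duration_secs "
        "FROM messages WHERE archived = 0 AND analysis_status = 'complete' "
        "ORDER BY id"
    ).fetchall()
    return [
        {"id": r[0], "transcript": r[1], "sentiment": r[2],
         "themes": json.loads(r[3]),  # assumption: themes stored as a JSON list
         "intensity": r[4], "duration": r[5], "audio": f"audio/{r[0]}.mp3"}
        for r in rows
    ]


def ffmpeg_mp3_command(wav_path, mp3_path):
    """Argument list for a 128 kbps mono MP3 conversion."""
    return ["ffmpeg", "-y", "-i", str(wav_path),
            "-ac", "1", "-b:a", "128k", str(mp3_path)]
```

The command list can be passed to `subprocess.run(..., check=True)`, and the manifest written with `json.dump`. Filtering on the archive flag in the query is what keeps archived messages off the site after the next deploy.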
11 Configuration Reference
All parameters from config.yaml, organized by domain. This section is a technical reference — consult it when setting up or tuning the installation.
11.1 Audio
How the system talks to your audio interface.
| Parameter | Default | Description |
|---|---|---|
| audio.sample_rate | 48000 | Global sample rate in Hz; all stems and curated clips must match this |
| audio.channels | 3 | Total output channels (inside stereo + outside mono) |
| audio.block_size | 1024 | Audio buffer block size in samples (lower = less latency, higher = more stable) |
| audio.buffer_seconds | 2.0 | Ring buffer duration: headroom before audio underruns |
| audio.device | X18/XR18 | sounddevice output device name or index (null for system default) |
| audio.storage_dir | audio | Root directory for audio files |
| audio.incoming_watch_dir | null | Dropbox folder for relay recordings; set this to enable relay mode |
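The block size and ring buffer settings translate into time and frame counts with simple arithmetic. The helper names below are hypothetical; the numbers follow from the defaults in the table.

```python
def block_latency_ms(block_size=1024, sample_rate=48000):
    """Duration of one audio block in milliseconds."""
    return 1000.0 * block_size / sample_rate


def ring_buffer_frames(buffer_seconds=2.0, sample_rate=48000):
    """Number of frames the ring buffer holds."""
    return int(buffer_seconds * sample_rate)


def blocks_in_ring_buffer(buffer_seconds=2.0, block_size=1024, sample_rate=48000):
    """How many blocks of headroom the ring buffer provides."""
    return ring_buffer_frames(buffer_seconds, sample_rate) / block_size
```

At the defaults, one block lasts about 21.3 ms and the 2-second ring buffer holds 96,000 frames, or 93.75 blocks of headroom before an underrun.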
11.2 Zones
Each zone is a group of output channels with its own playback type. The type field determines what kind of audio the zone plays.
| Parameter | Default | Description |
|---|---|---|
| zones.<name>.channels | (none) | Output channel numbers (e.g., [0, 1] for stereo, [2] for mono) |
| zones.<name>.volume | 0.8 | Zone master volume, adjustable at runtime via dashboard |
| zones.<name>.type | stems | Zone type: stems (loops bed stems) or curated (plays one-shot samples) |
| zones.<name>.curated.directory | audio/curated | Directory to scan for WAV samples (curated zones only) |
| zones.<name>.curated.sleep_range_secs | [30, 75] | Range for gap duration between curated clips (seconds) |
| zones.<name>.curated.volume | 0.8 | Base volume for curated playback (modulated by stem zone intensity) |
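A mismatch between zone channels and the device's channel count is easy to catch before the engine starts. The check below is a sketch under assumptions: `validate_zones` is a hypothetical helper that takes the zones section as a plain dict, and the rule that two zones may not share a channel is an assumption.

```python
def validate_zones(zones, total_channels):
    """Return a list of problems with the zone channel assignments."""
    errors = []
    used = {}  # channel number -> zone that claimed it
    for name, zone in zones.items():
        zone_type = zone.get("type", "stems")
        if zone_type not in ("stems", "curated"):
            errors.append(f"{name}: unknown type {zone_type!r}")
        for ch in zone["channels"]:
            if not 0 <= ch < total_channels:
                errors.append(f"{name}: channel {ch} is outside 0..{total_channels - 1}")
            elif ch in used:
                errors.append(f"{name}: channel {ch} already used by {used[ch]}")
            else:
                used[ch] = name
    return errors
```

For the two-zone layout in this document and `total_channels=3` the list is empty; dropping to two output channels reports that the outside zone's channel 2 is out of range.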
11.3 Whisper
Local speech-to-text. Runs entirely on your machine — no audio leaves the network.
| Parameter | Default | Description |
|---|---|---|
| whisper.model_size | small | Whisper model (tiny/base/small/medium/large); larger = more accurate, slower |
| whisper.device | auto | Compute device (auto/cpu/cuda) |
| whisper.compute_type | int8 | Quantization type for faster inference |
| whisper.language | null | Language code or null for auto-detect |
11.4 Analysis
Emotional analysis via Claude API. Only scrubbed text (PII removed) is sent.
| Parameter | Default | Description |
|---|---|---|
| analysis.model | claude-sonnet-4-20250514 | Claude model for emotional analysis |
| analysis.themes | 17 items | Emotion whitelist: grief, longing, memory, love, family, loss, hope, anger, peace, regret, gratitude, apology, farewell, confession, prayer, humor, silence |
11.5 Grains
Controls how voicemails are split into segments during ingest. Used by the legacy grain engine and the phrase extraction pipeline.
| Parameter | Default | Description |
|---|---|---|
| grains.min_grain_secs | 3 | Minimum grain duration (seconds) |
| grains.max_grain_secs | 10 | Maximum grain duration (seconds) |
| grains.silence_threshold_db | 25 | Silence detection sensitivity (dB); higher = more splits |
| grains.min_silence_gap_secs | 0.3 | Gaps shorter than this are merged |
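One way these three duration parameters could interact is sketched below. It assumes a silence detector has already produced a list of (start, end) speech segments in seconds; the function name, the choice to drop grains shorter than the minimum, and the choice to cut long segments at fixed `max_grain` boundaries are all assumptions, not the pipeline's confirmed behavior.

```python
def segments_to_grains(segments, min_grain=3.0, max_grain=10.0, min_gap=0.3):
    """Turn speech segments into grains of min_grain..max_grain seconds."""
    # 1. Merge segments separated by a gap shorter than min_gap.
    merged = []
    for start, end in sorted(segments):
        if merged and start - merged[-1][1] < min_gap:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    # 2. Split anything longer than max_grain, drop anything shorter than min_grain.
    grains = []
    for start, end in merged:
        while end - start > max_grain:
            grains.append((start, start + max_grain))
            start += max_grain
        if end - start >= min_grain:
            grains.append((start, end))
    return grains
```

For example, two segments 0.2 seconds apart are merged into one grain, a 1-second segment is dropped, and a 22-second segment yields two 10-second grains with the 2-second remainder discarded.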
11.6 Stems
Controls how the inside zone plays bed stems.
| Parameter | Default | Description |
|---|---|---|
| stems.storage_dir | audio/stems | Where imported stems are stored on disk |
| stems.crossfade_seconds | 3.0 | Duration of crossfade when switching stems |
| stems.loop | true | Whether stems loop when they reach the end |
| stems.default_mood | null | If set, prefer stems tagged with this mood on startup |
| stems.bed_volume | 0.8 | Default volume for stem playback layers |
| stems.import_target_lufs | -14.0 | Loudness normalization target for imported stems (LUFS) |
| stems.import_required_sample_rate | 48000 | Required sample rate; stems at other rates are rejected |
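Loudness normalization to a LUFS target comes down to a gain offset in decibels. The sketch below assumes the stem's integrated loudness has already been measured elsewhere; `normalization_gain` and `accept_sample_rate` are hypothetical helper names.

```python
def normalization_gain(measured_lufs, target_lufs=-14.0):
    """Return (gain in dB, linear multiplier) needed to reach the target loudness."""
    gain_db = target_lufs - measured_lufs
    return gain_db, 10 ** (gain_db / 20)


def accept_sample_rate(sample_rate, required=48000):
    """Stems at any other rate are rejected on import."""
    return sample_rate == required
```

A stem measured at -20 LUFS needs +6 dB, a linear multiplier of about 2.0, to reach the -14 LUFS target. A stem already at -14 LUFS is left unchanged.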
11.6.1 Safe Mode
The fallback mode for unattended operation. If the engine crashes, it drops to safe mode automatically.
| Parameter | Default | Description |
|---|---|---|
| stems.safe_mode.enabled | true | Whether safe mode fallback is available |
| stems.safe_mode.safe_bed_name | null | Specific bed stem for safe mode (null = use whatever’s loaded) |
| stems.safe_mode.fixed_volume | 0.6 | Fixed volume during safe mode |
11.6.2 Gain Automation
Slow, random volume drift that keeps the sound from feeling static.
| Parameter | Default | Description |
|---|---|---|
| stems.gain_automation.update_interval_secs | [0.5, 2.0] | How often a new gain target is picked (seconds) |
| stems.gain_automation.max_gain_change_per_tick | 0.02 | Maximum gain change per update; slew limit to prevent jumps |
| stems.gain_automation.drift_range | [0.6, 1.0] | Gain multiplier bounds: how far volume can drift |
11.7 Twilio
Voice recording configuration for the phone line.
| Parameter | Default | Description |
|---|---|---|
| twilio.greeting | "" | TTS greeting text (empty = MP3 greeting files only, no text-to-speech) |
| twilio.max_recording_seconds | 3600 | Maximum recording duration: 1 hour, callers are never cut off |
| twilio.recording_channels | 1 | Recording channel count |
| twilio.voice | Polly.Brian | TTS voice (only used if greeting text is set) |
| twilio.voice_rate | fast | TTS speaking rate |
| twilio.voice_pitch | -20% | TTS pitch adjustment |
| twilio.pause_before_instructions | 1 | Seconds of silence before greeting plays |
11.8 Dashboard
| Parameter | Default | Description |
|---|---|---|
| dashboard.enabled | true | Enable the web dashboard at http://localhost:8000 |
mode: single
role: all
server:
  host: 0.0.0.0
  port: 8000
audio:
  sample_rate: 48000
  channels: 3
  block_size: 1024
  buffer_seconds: 2.0
  device: X18/XR18
  storage_dir: audio
  incoming_watch_dir: ~/Dropbox/after_the_tone/incoming
zones:
  inside:
    channels: [0, 1]
    volume: 0.8
    type: stems
  outside:
    channels: [2]
    volume: 0.8
    type: curated
    curated:
      directory: audio/curated
      sleep_range_secs: [30, 75]
whisper:
  model_size: small
  device: auto
  compute_type: int8
  language: null
analysis:
  model: claude-sonnet-4-20250514
  themes:
    - grief
    - longing
    - memory
    - love
    - family
    - loss
    - hope
    - anger
    - peace
    - regret
    - gratitude
    - apology
    - farewell
    - confession
    - prayer
    - humor
    - silence
grains:
  min_grain_secs: 3
  max_grain_secs: 10
  silence_threshold_db: 25
  min_silence_gap_secs: 0.3
stems:
  storage_dir: audio/stems
  crossfade_seconds: 3.0
  loop: true
  default_mood: null
  bed_volume: 0.8
  import_target_lufs: -14.0
  import_required_sample_rate: 48000
  safe_mode:
    enabled: true
    safe_bed_name: null
    fixed_volume: 0.6
  gain_automation:
    update_interval_secs: [0.5, 2.0]
    max_gain_change_per_tick: 0.02
    drift_range: [0.6, 1.0]
twilio:
  greeting: ''
  max_recording_seconds: 3600
  recording_channels: 1
  voice: Polly.Brian
  voice_rate: fast
  voice_pitch: -20%
  pause_before_instructions: 1
dashboard:
  enabled: true
12 File Map
12.1 Relay (Windows PC)
| File | Purpose |
|---|---|
| relay/server.py | Standalone Twilio relay: answer calls, play greeting, save WAV + JSON to Dropbox. Supports standard and dual (tape + digital) modes. |
| relay/config.yaml | Relay config: ngrok URL, output dir, greetings dir, recording mode, tape SIP URI |
| relay/requirements.txt | Minimal Python deps (fastapi, uvicorn, httpx, twilio, pyyaml, python-multipart) |
| relay/start.bat | Windows startup script: sets Twilio env vars, launches ngrok + relay server |
12.2 Studio (Mac Laptop)
| File | Purpose |
|---|---|
| att/main.py | FastAPI app, lifespan, engine/mic/audio initialization |
| att/config.py | Pydantic config models, YAML + env loading |
| att/db.py | SQLite schema + async query layer (messages, transcripts, analysis, grains, stems) |
| att/ingest/pipeline.py | Ingest orchestrator: transcribe → scrub → analyze → extract |
| att/ingest/transcribe.py | Local Whisper transcription |
| att/ingest/scrub.py | PII removal (spaCy + regex) |
| att/ingest/analyze.py | Claude emotional analysis |
| att/ingest/grains.py | Grain extraction with librosa |
| att/ingest/export.py | Grain export with slug naming + CSV |
| att/ingest/twilio_webhook.py | VoIP webhook: TwiML greeting, recording download, ingest queue (disabled in relay mode) |
| att/engine/player.py | StemPlayer (two-zone beds + curated playback) + GrainEngine (legacy) |
| att/engine/triggers.py | Trigger system (manual triggers, bed-change timers) |
| att/engine/importer.py | Stem import validation + loudness normalization |
| att/engine/mixer.py | Voice/Mixer for grain-based playback (legacy) |
| att/engine/selector.py | Grain selection with sentiment/theme filtering (legacy) |
| att/audio/output.py | Ring buffer + sounddevice multi-channel output |
| att/audio/processing.py | DSP utilities: normalize, trim_silence, crossfade |
| att/dashboard/routes.py | HTML page routes (home, incoming, messages, stems, engine, settings) |
| att/dashboard/api.py | REST API endpoints (CRUD, engine controls, incoming ingest, archive, batch ops) |
12.3 Scripts & Public Site
| File | Purpose |
|---|---|
| scripts/extract_phrases.py | Extract curated samples from voicemails using fuzzy phrase matching against phrases.txt |
| scripts/normalize_curated.py | Normalize curated samples to -23dB RMS, resample to 48kHz |
| scripts/export_public.py | Build the public website: SQLite → data.json + MP3 audio + HTML |
| scripts/retranscribe.py | Re-run Whisper transcription on messages missing word timestamps |
| public_src/index.html | Public website source: constellation visualization + audio player |
| phrases.txt | Curated phrase list for sample extraction |
13 Failure Modes and Mitigations
Risk: The real-time audio thread hits an exception and stops producing sound. Mitigation: Safe mode activates automatically — all zones play the same bed stem at a fixed volume. The dashboard shows a red “SAFE MODE” indicator. Fix the underlying issue (usually a channel count mismatch between config and audio device) and restart.
Risk: The outside zone has no samples to play. Mitigation: If audio/curated/ is empty, the curated loop logs a warning and retries every 10 seconds. Just add WAV files to the folder. If files exist but have the wrong sample rate, they’re skipped with a warning — run scripts/normalize_curated.py to fix.
Risk: Gain drift or cross-zone modulation causes unexpected volume levels. Mitigation: All gain changes are slew-limited (max 0.02 per update). Curated volume is capped at 50% of the zone’s configured volume. Gain drift stays within configurable bounds (0.6–1.0 by default). Zone volumes are always adjustable from the dashboard.
14 Daily Operational Protocol
This protocol assumes the VoIP pipeline and engine are already running. See Quick Start for initial setup.
- Collect — let phones/tapes run, VoIP pipeline processes automatically
- Ingest — copy daily pack to OT SD card, capture new tape messages to OT raw folder
- Compose — update bed + fragments in OT with strict timebox (60–90 min), save project version
- Export — render stems to laptop, upload via dashboard, assign roles
- Promote — verify stems play in engine, approve as current set
- Stability check — reboot test: can you restart playback in < 5 minutes?
15 Quick Start
15.1 Single-Machine Mode
If running everything on one machine (local development, or if you don’t need the relay):
# Clone and install
git clone <repo-url> after_the_tone
cd after_the_tone
pip install -e .
# Configure
cp config.yaml.example config.yaml # edit as needed
cp .env.example .env # add API keys
# Run
python -m att.main
# or
uvicorn att.main:app --host 0.0.0.0 --port 8000
# Open dashboard
open http://localhost:8000
15.2 Relay + Studio Mode
15.2.1 Relay (Windows PC) Setup
From a fresh Windows machine:
# 1. Install Python + ngrok
winget install Python.Python.3.12
winget install Ngrok.Ngrok
ngrok config add-authtoken <your-token>
# 2. Clone the repo
git clone <repo-url> after_the_tone
cd after_the_tone\relay
# 3. Create a virtual environment and install deps
python -m venv venv
venv\Scripts\activate
pip install -r requirements.txt
# 4. Edit relay/config.yaml
# - Set base_url to your ngrok static domain
# - Verify output_dir and greetings_dir point to your Dropbox paths
# 5. Set Twilio credentials in relay/start.bat
# Edit the TWILIO_ACCOUNT_SID and TWILIO_AUTH_TOKEN lines
# 6. Test manually
relay\start.bat
# Call the phone number — should hear a greeting, then a beep
# Check ~/Dropbox/after_the_tone/incoming/ for the WAV + JSON sidecar
Greetings sync automatically from the Mac via ~/Dropbox/after_the_tone/greetings/.
15.2.2 Studio (Mac Laptop) Setup
cd after_the_tone
# Enable relay mode: set the Dropbox watch folder in config.yaml
# Under the audio: section, uncomment and set:
# incoming_watch_dir: ~/Dropbox/after_the_tone/incoming
# Run the app as usual
python -m att.main
# Open dashboard — the "incoming" nav link will appear
open http://localhost:8000/incoming
When new recordings arrive via Dropbox, they appear on the Incoming page. Click ingest to run the full pipeline (transcribe → scrub → analyze → grains). The file moves to incoming/ingested/ and the message appears on the Messages page.
15.2.3 Relay Auto-Start
The repo includes relay/start.bat which sets Twilio credentials, launches ngrok with the static domain, waits for the tunnel, then starts the relay server.
Task Scheduler setup (runs at user logon):
- Trigger: “At log on” (specific user)
- Action: Start a program → relay\start.bat
- Settings: AllowStartIfOnBatteries, DontStopIfGoingOnBatteries, StartWhenAvailable
Limitation: Runs at login, not at boot. If the PC restarts and nobody logs in, the relay stays down. For true boot-time startup, use NSSM to wrap it as a Windows service.
ngrok domain: The relay uses a static free-tier ngrok domain, so the Twilio webhook URL survives ngrok restarts with no reconfiguration needed.
15.3 Campus Phone (Grandstream HT812)
15.3.1 1. Twilio SIP Domain Setup
In the Twilio Console:
- Go to Elastic SIP Trunking → SIP Domains (or Voice → SIP Domains)
- Create a new SIP domain: after-the-tone.sip.twilio.com
- Set Voice URL to: https://salena-crenate-coequally.ngrok-free.dev/twilio/voice (HTTP POST), the same webhook as PSTN calls
- Under Credential Lists, create a new list and add credentials:
  - Username: airstream, Password: (choose a strong password)
  - Username: tape, Password: (choose a strong password), only needed if using FXS 2 for tape path
- Assign the credential list to the SIP domain for authentication
15.3.2 2. Network Setup
Connect the GL-iNet Opal to the local WiFi (repeater mode):
- Access Opal admin at 192.168.8.1
- Internet → Repeater → scan and connect to WiFi network
- The Opal now bridges WiFi to its LAN ports and runs its own WiFi AP
- Plug HT812 into Opal LAN port 1, XR18 into LAN port 2
- Verify iPad still controls XR18 via Mixing Station (local traffic, unaffected)
15.3.3 3. HT812 Configuration
Access the HT812 web admin at its LAN IP (check Opal admin → connected clients, or try 192.168.8.x).
FXS Port 1 — Digital Path (auto-dial on pickup):
| Setting | Value |
|---|---|
| SIP Server | after-the-tone.sip.twilio.com |
| SIP User ID | airstream |
| Authenticate ID | airstream |
| Authenticate Password | (your password from step 1) |
| Offhook Auto-Dial | greeting |
| Offhook Auto-Dial Delay | 0 (immediate — no dial tone, just pick up and go) |
| NAT Traversal | Keep-Alive |
FXS Port 2 — Tape Path (optional, incoming calls):
| Setting | Value |
|---|---|
| SIP Server | after-the-tone.sip.twilio.com |
| SIP User ID | tape |
| Authenticate ID | tape |
| Authenticate Password | (your password) |
15.3.4 4. Test
- Check HT812 status page — FXS 1 should show Registered
- Pick up the analog phone
- You should hear a random greeting, then a beep
- Leave a message, hang up
- Check ~/Dropbox/after_the_tone/incoming/ for the WAV + JSON sidecar
Troubleshooting:
- Can’t log into HT812 web admin: The V2 model does not use admin/admin. It has a unique password printed on a sticker on the bottom of the unit. Username is admin.
- Not registered: Check SIP server address, credentials, and that the Opal has internet (try pinging from the Opal admin). Make sure SIP Registration is enabled on the Twilio SIP Domain.
- Registered but no audio: Check the NAT traversal setting and try switching between Keep-Alive and STUN
- One-way audio: Usually a NAT issue; enable a STUN server (stun.l.google.com:19302) in HT812 settings
15.4 First Run
- Upload at least one stem via the dashboard (Stems page)
- Assign it the bed role
- The engine will start playing it immediately
- Upload more stems with fragments and air roles for full layered playback