After the Tone

System Architecture for a Participatory Sound Installation

Published: March 17, 2026

1 Executive Summary

After the Tone is a participatory sound installation where visitors leave voicemail messages on a dedicated phone line. These messages are transcribed, tended, and transmuted into audio material that plays continuously in the exhibition space.

The system operates across three lanes:

Lane A (Analog): Analog phone on campus → tape recorder → eurorack/Octatrack for artifact-making and color
Lane B (Digital): VoIP → Python pipeline → grains + curated daily packs → Octatrack
Lane C (Installation): OT exports stems → computer plays zones, triggers fades, occasional mic injection

The computer runs Lane C: stable, unattended playback. It does not do creative sound design. It plays pre-composed stems, manages zones, and reacts subtly to the room.

2 System Architecture

%%{init: {'theme': 'base', 'themeVariables': {
  'primaryColor': '#1a1a1a',
  'primaryTextColor': '#d4d4d4',
  'primaryBorderColor': '#c49a6c',
  'lineColor': '#8a6d4a',
  'secondaryColor': '#2a2a2a',
  'tertiaryColor': '#111',
  'fontFamily': 'SF Mono, Fira Code, Consolas, monospace',
  'fontSize': '12px',
  'clusterBkg': '#1a1a1a',
  'clusterBorder': '#2a2a2a',
  'edgeLabelBackground': '#111'
}}}%%
flowchart TD
    subgraph A["Lane A — Analog"]
        direction TB
        Campus[Analog Phone on Campus]
        Tape[Tape Recorder]
        Euro[Eurorack]
        OT_A[Octatrack]
        Campus --> Tape --> Euro --> OT_A
    end

    subgraph B["Lane B — Digital"]
        direction TB
        VoIP[VoIP / Twilio]
        Pipeline[Python Pipeline]
        Grains[Grain Library]
        Packs[Daily Packs]
        OT_B[Octatrack]
        VoIP --> Pipeline --> Grains --> Packs --> OT_B
    end

    subgraph C["Lane C — Installation"]
        direction TB
        OT_C[OT Stem Exports]
        Engine[Python Engine]
        XR18[XR18 Mixer]
        Inside[Inside Speakers]
        Outside[Outside Speakers]
        OT_C --> Engine --> XR18
        XR18 --> Inside
        XR18 --> Outside
        Mic[Room Mic] -.->|modulation| Engine
    end

    OT_A --> OT_C
    OT_B --> OT_C

    style A fill:#1a1a1a,stroke:#9a5a5a,color:#d4d4d4
    style B fill:#1a1a1a,stroke:#c49a6c,color:#d4d4d4
    style C fill:#1a1a1a,stroke:#5a9a5a,color:#d4d4d4
    style Campus fill:#2a2a2a,stroke:#9a5a5a,color:#d4d4d4
    style Tape fill:#2a2a2a,stroke:#9a5a5a,color:#d4d4d4
    style Euro fill:#2a2a2a,stroke:#9a5a5a,color:#d4d4d4
    style OT_A fill:#2a2a2a,stroke:#9a5a5a,color:#d4d4d4
    style VoIP fill:#2a2a2a,stroke:#c49a6c,color:#d4d4d4
    style Pipeline fill:#2a2a2a,stroke:#c49a6c,color:#d4d4d4
    style Grains fill:#2a2a2a,stroke:#c49a6c,color:#d4d4d4
    style Packs fill:#2a2a2a,stroke:#c49a6c,color:#d4d4d4
    style OT_B fill:#2a2a2a,stroke:#c49a6c,color:#d4d4d4
    style OT_C fill:#2a2a2a,stroke:#5a9a5a,color:#d4d4d4
    style Engine fill:#2a2a2a,stroke:#5a9a5a,color:#d4d4d4
    style XR18 fill:#2a2a2a,stroke:#5a9a5a,color:#d4d4d4
    style Inside fill:#2a2a2a,stroke:#5a9a5a,color:#d4d4d4
    style Outside fill:#2a2a2a,stroke:#5a9a5a,color:#d4d4d4
    style Mic fill:#2a2a2a,stroke:#9a9a5a,color:#d4d4d4

2.1 Deployment: Relay + Studio Split

In practice, the system runs across two machines to separate the always-on phone line from the residency-bound audio engine.

%%{init: {'theme': 'base', 'themeVariables': {
  'primaryColor': '#1a1a1a',
  'primaryTextColor': '#d4d4d4',
  'primaryBorderColor': '#c49a6c',
  'lineColor': '#8a6d4a',
  'secondaryColor': '#2a2a2a',
  'tertiaryColor': '#111',
  'fontFamily': 'SF Mono, Fira Code, Consolas, monospace',
  'fontSize': '12px',
  'clusterBkg': '#1a1a1a',
  'clusterBorder': '#2a2a2a',
  'edgeLabelBackground': '#111'
}}}%%
flowchart LR
    subgraph Relay["Relay — Windows PC (home, always-on)"]
        direction TB
        Twilio[Twilio Webhook]
        Mode{mode?}
        Greeting[Play Random Greeting]
        Dual[Dial Tape via SIP +<br/>Record Digitally]
        Download[Download WAV]
        Save[Save WAV + JSON Sidecar]
        Twilio --> Mode
        Mode -->|record| Greeting --> Download --> Save
        Mode -->|dual| Dual --> Download
    end

    subgraph Sync["Dropbox"]
        Folder[after_the_tone/incoming/]
    end

    subgraph Studio["Studio — Mac Laptop (residency)"]
        direction TB
        Incoming[Incoming Page]
        Ingest[Ingest Button]
        Pipeline[Pipeline: Transcribe → Scrub → Analyze → Grains]
        Engine[Engine + Dashboard]
        Incoming --> Ingest --> Pipeline --> Engine
    end

    Save --> Folder --> Incoming

    style Relay fill:#1a1a1a,stroke:#c49a6c,color:#d4d4d4
    style Sync fill:#1a1a1a,stroke:#666,color:#d4d4d4
    style Studio fill:#1a1a1a,stroke:#5a9a5a,color:#d4d4d4
    style Twilio fill:#2a2a2a,stroke:#c49a6c,color:#d4d4d4
    style Mode fill:#2a2a2a,stroke:#c49a6c,color:#d4d4d4
    style Greeting fill:#2a2a2a,stroke:#c49a6c,color:#d4d4d4
    style Dual fill:#2a2a2a,stroke:#9a5a5a,color:#d4d4d4
    style Download fill:#2a2a2a,stroke:#c49a6c,color:#d4d4d4
    style Save fill:#2a2a2a,stroke:#c49a6c,color:#d4d4d4
    style Folder fill:#2a2a2a,stroke:#666,color:#d4d4d4
    style Incoming fill:#2a2a2a,stroke:#5a9a5a,color:#d4d4d4
    style Ingest fill:#2a2a2a,stroke:#5a9a5a,color:#d4d4d4
    style Pipeline fill:#2a2a2a,stroke:#5a9a5a,color:#d4d4d4
    style Engine fill:#2a2a2a,stroke:#5a9a5a,color:#d4d4d4

Why split? The laptop handles audio hardware (XR18, speakers, room mic) and needs stability. The phone line needs always-on internet + ngrok. If the laptop crashes, the phone line stays up. If the PC crashes, the installation keeps playing.

	Relay (PC)	Studio (Mac)
Role	Answer calls, save recordings	Everything else
Dependencies	fastapi, uvicorn, httpx, twilio, pyyaml, python-multipart	Full `att` package
Internet	Required (Twilio + ngrok)	Only for Anthropic API calls
Audio hardware	None	XR18 + speakers + mic
Database	None	SQLite
Uptime	Always-on	During residency hours

Sync: Dropbox folder after_the_tone/incoming/. Relay writes YYYYMMDD_HHMMSS_{CallSid}.wav + .json sidecar. Mac picks them up from the dashboard Incoming page. After ingest, files move to incoming/ingested/.

Greetings: MP3 files live in ~/Dropbox/after_the_tone/greetings/ on both machines, synced via Dropbox. The relay serves them to Twilio via a static mount at /greetings/{filename}. On each call, one is chosen at random. The relay prefers .mp3 (faster for Twilio to fetch), falls back to .wav. If no greeting files exist, the call skips straight to the beep with no TTS fallback.
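The greeting selection rule can be sketched in a few lines. This is a minimal illustration rather than the relay's actual code: the function name `pick_greeting` is hypothetical, and it assumes that "prefers .mp3" means any available MP3 wins over every WAV.

```python
import random
from pathlib import Path
from typing import Optional


def pick_greeting(greetings_dir: Path, rng: Optional[random.Random] = None) -> Optional[Path]:
    """Return one random greeting file, or None if the folder has none."""
    rng = rng or random.Random()
    for ext in (".mp3", ".wav"):  # MP3 first, WAV only as a fallback
        candidates = sorted(greetings_dir.glob(f"*{ext}"))
        if candidates:
            return rng.choice(candidates)
    return None  # no greeting files: the caller goes straight to the beep
```

A `None` result corresponds to the "skip straight to the beep" case described above.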

JSON sidecar format (written alongside each WAV):

{
  "call_sid": "CA...",
  "phone_hash": "sha256[:16]",
  "duration": 45.2,
  "timestamp": "20250225_143022"
}
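As a rough sketch of how such a sidecar could be produced, the snippet below builds the `YYYYMMDD_HHMMSS_{CallSid}` file stem and writes the JSON next to the WAV. The function name `write_sidecar` is hypothetical, and it assumes that `sha256[:16]` means the first 16 hex characters of an unsalted SHA-256 digest of the caller's number; the relay's real hashing scheme is not shown in this document.

```python
import hashlib
import json
from datetime import datetime
from pathlib import Path


def write_sidecar(incoming: Path, call_sid: str, caller_number: str,
                  duration: float, now: datetime) -> Path:
    """Write {stamp}_{call_sid}.json into the incoming folder and return its path."""
    stamp = now.strftime("%Y%m%d_%H%M%S")
    sidecar = {
        "call_sid": call_sid,
        # Assumption: truncated, unsalted SHA-256 of the number as a string.
        "phone_hash": hashlib.sha256(caller_number.encode("utf-8")).hexdigest()[:16],
        "duration": duration,
        "timestamp": stamp,
    }
    path = incoming / f"{stamp}_{call_sid}.json"
    path.write_text(json.dumps(sidecar, indent=2), encoding="utf-8")
    return path
```

An unsalted hash of a phone number can be brute-forced, so a deployment that treats the hash as private may want to mix in a secret salt.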

Credentials: The relay reads TWILIO_ACCOUNT_SID and TWILIO_AUTH_TOKEN from environment variables. On the deployed PC, these are set in relay/start.bat (plaintext — acceptable for a home machine, not suitable for shared infrastructure).

ngrok: The relay uses a static free-tier ngrok domain (salena-crenate-coequally.ngrok-free.dev). This survives ngrok restarts, so the Twilio webhook URL never needs updating.

Single-machine mode still works: leave audio.incoming_watch_dir unset and the Twilio webhook mounts directly on the Mac as before.

2.2 Campus Phone: Grandstream HT812

An analog phone on campus connects to the same Twilio pipeline via a Grandstream HT812 ATA (Analog Telephone Adapter). When someone picks up the phone, it auto-dials Twilio over SIP — no keypad, no dialing, just pick up and talk.

%%{init: {'theme': 'base', 'themeVariables': {
  'primaryColor': '#1a1a1a',
  'primaryTextColor': '#d4d4d4',
  'primaryBorderColor': '#c49a6c',
  'lineColor': '#8a6d4a',
  'secondaryColor': '#2a2a2a',
  'tertiaryColor': '#111',
  'fontFamily': 'SF Mono, Fira Code, Consolas, monospace',
  'fontSize': '12px',
  'clusterBkg': '#1a1a1a',
  'clusterBorder': '#2a2a2a',
  'edgeLabelBackground': '#111'
}}}%%
flowchart LR
    subgraph Campus["Residency — On-Site Network"]
        direction TB
        Phone[Analog Phone]
        HT812[Grandstream HT812<br/>FXS → SIP]
        Opal[GL-iNet Opal<br/>WiFi bridge]
        XR18[XR18 Mixer]
        iPad[iPad<br/>Mixing Station]
        Phone -->|RJ11| HT812
        HT812 -->|ethernet| Opal
        XR18 -->|ethernet| Opal
        iPad -.->|WiFi| Opal
    end

    subgraph Cloud["Twilio"]
        SIP[SIP Domain]
        Webhook[Voice URL Webhook]
        SIP --> Webhook
    end

    subgraph Home["Home PC — Relay"]
        Relay[relay/server.py]
    end

    Opal -->|internet| SIP
    Webhook -->|ngrok| Relay

    style Campus fill:#1a1a1a,stroke:#5a9a5a,color:#d4d4d4
    style Cloud fill:#1a1a1a,stroke:#c49a6c,color:#d4d4d4
    style Home fill:#1a1a1a,stroke:#c49a6c,color:#d4d4d4
    style Phone fill:#2a2a2a,stroke:#9a5a5a,color:#d4d4d4
    style HT812 fill:#2a2a2a,stroke:#c49a6c,color:#d4d4d4
    style Opal fill:#2a2a2a,stroke:#5a9a5a,color:#d4d4d4
    style XR18 fill:#2a2a2a,stroke:#5a9a5a,color:#d4d4d4
    style iPad fill:#2a2a2a,stroke:#666,color:#d4d4d4
    style SIP fill:#2a2a2a,stroke:#c49a6c,color:#d4d4d4
    style Webhook fill:#2a2a2a,stroke:#c49a6c,color:#d4d4d4
    style Relay fill:#2a2a2a,stroke:#c49a6c,color:#d4d4d4

Network: The GL-iNet Opal travel router connects to the residency WiFi and bridges it to its LAN ports. The HT812 and XR18 share the Opal’s two ethernet ports. The iPad connects to the Opal’s WiFi AP for mixer control. Local traffic (iPad ↔︎ XR18) stays on the Opal’s subnet; only the HT812’s SIP traffic goes out to the internet. The Mac laptop is not in this path — it doesn’t need to be on or connected for the phone to work.

Call flow: Phone pickup → HT812 offhook auto-dial → SIP to Twilio → Twilio fires Voice URL webhook → relay on home PC → greeting + record → WAV to Dropbox → Mac ingests later.

Two ports, two paths:

HT812 Port	Connected Device	SIP User	Behavior
FXS 1	Analog phone	`airstream`	Offhook auto-dial → digital pipeline (greeting, record, grains)
FXS 2	Tape answering machine (Code-a-Phone)	`tape`	Receives incoming SIP calls → tape records analog copy

FXS 2 is optional. In standard mode (mode: record), only FXS 1 is used — callers hear a random MP3 greeting, leave a message, and the relay saves a digital WAV. FXS 2 sits idle.

2.2.1 Dual Recording Mode

In dual mode (mode: dual), every call simultaneously records to the tape machine AND digitally. The relay uses Twilio’s <Dial record="record-from-answer-dual"> to bridge the caller to the tape machine’s SIP endpoint while Twilio captures a parallel digital recording.

Caller → Twilio → relay (dual mode)
                    ├── SIP dial → HT812 FXS 2 → Code-a-Phone (analog tape)
                    └── Twilio records call digitally → WAV to Dropbox

The caller hears the tape machine’s physical greeting (not the MP3) and leaves a message on the cassette. Meanwhile, Twilio captures the entire call as a WAV and the relay saves it to Dropbox as usual. You get two copies: one with analog tape character, one clean digital.

Relay config for dual mode:

mode: dual
tape_sip: sip:tape@after-the-tone.sip.twilio.com

Recording behavior:

No silence cutoff — timeout=0 disables Twilio’s default 5-second silence detection. Recordings only end when the caller hangs up or hits the max length.
Max recording length — 3600 seconds (1 hour). Callers are never cut off for going long.
Download timeout — 300 seconds, to handle large recordings over slow connections.
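To make the two modes concrete, here is a hedged sketch of the TwiML each one might return. It builds the XML with the standard library instead of the Twilio helper package so it runs anywhere. `build_twiml` is a hypothetical name, and the exact verbs and attributes the relay emits (callbacks, beep settings) are assumptions beyond what is stated above.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def build_twiml(mode: str, greeting_url: Optional[str] = None,
                tape_sip: Optional[str] = None) -> str:
    """Return a TwiML document for the given relay mode."""
    response = ET.Element("Response")
    if mode == "dual":
        if not tape_sip:
            raise ValueError("dual mode needs tape_sip")
        # Bridge the caller to the tape machine while Twilio records the call.
        dial = ET.SubElement(response, "Dial", {"record": "record-from-answer-dual"})
        ET.SubElement(dial, "Sip").text = tape_sip
    elif mode == "record":
        if greeting_url:  # no greeting files: go straight to recording
            ET.SubElement(response, "Play").text = greeting_url
        # Values taken from the recording behavior listed above.
        ET.SubElement(response, "Record", {"timeout": "0", "maxLength": "3600"})
    else:
        raise ValueError(f"unknown mode: {mode}")
    return ET.tostring(response, encoding="unicode")
```

For example, `build_twiml("dual", tape_sip="sip:tape@after-the-tone.sip.twilio.com")` yields a single `<Dial>` wrapping a `<Sip>` noun.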

3 Lane Definitions

3.1 Lane A: Analog Artifact

Purpose: Create textured, degraded audio material with analog character.

Input	Messages left on an analog phone on campus, recorded directly to tape
Process	Tape recordings processed through eurorack (filters, reverb, saturation), sampled into Octatrack
Output	Processed stems with analog warmth and tape artifacts
Rule	All creative decisions happen here, not on the computer

3.2 Lane B: Digital Indexed

Purpose: Systematically process voicemails into an indexed, searchable grain library.

Input	VoIP recordings via Twilio webhook (relay PC or local)
Process	Transcribe (Whisper) → scrub PII (spaCy) → analyze emotions (Claude) → segment into grains (librosa)
Output	Tagged grain library + curated daily sample packs for Octatrack
Rule	Pipeline is automated; curation happens when selecting daily packs
Deployment	Twilio answering runs on the relay PC; pipeline processing runs on the Mac (see Relay + Studio Split)

3.3 Lane C: Installation Playback

Purpose: Reliable, deterministic playback in the exhibition space.

Input	Pre-composed stems (from DAW, Octatrack, or any source) + curated voicemail clips
Process	Two-zone playback engine: beds loop inside, curated clips play outside, cross-zone intensity modulation
Output	3-channel audio to XR18 → speakers (stereo inside + mono outside)
Rule	No creative sound design on the computer. Stable unattended operation.

4 Lane B — Digital Pipeline

A deep dive into the automated VoIP ingest pipeline that powers Lane B.

%%{init: {'theme': 'base', 'themeVariables': {
  'primaryColor': '#1a1a1a',
  'primaryTextColor': '#d4d4d4',
  'primaryBorderColor': '#c49a6c',
  'lineColor': '#8a6d4a',
  'secondaryColor': '#2a2a2a',
  'tertiaryColor': '#111',
  'fontFamily': 'SF Mono, Fira Code, Consolas, monospace',
  'fontSize': '12px',
  'clusterBkg': '#1a1a1a',
  'clusterBorder': '#2a2a2a',
  'edgeLabelBackground': '#111'
}}}%%
flowchart LR
    VoIP[VoIP Recording] --> Transcribe
    Transcribe --> Scrub[PII Scrub]
    Scrub --> Analyze[Emotional Analysis]
    Analyze --> Grains[Extract Grains]
    Grains --> Ready

    style VoIP fill:#2a2a2a,stroke:#c49a6c,color:#d4d4d4
    style Transcribe fill:#2a2a2a,stroke:#c49a6c,color:#d4d4d4
    style Scrub fill:#2a2a2a,stroke:#c49a6c,color:#d4d4d4
    style Analyze fill:#2a2a2a,stroke:#c49a6c,color:#d4d4d4
    style Grains fill:#2a2a2a,stroke:#c49a6c,color:#d4d4d4
    style Ready fill:#2a2a2a,stroke:#5a9a5a,color:#d4d4d4

Status state machine: new → transcribed → scrubbed → analyzed → ready

Key file: att/ingest/pipeline.py — orchestrates the full sequence: transcribe → scrub → analyze → extract grains.
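The status progression can be pictured as a small linear state machine. The sketch below is illustrative only: `run_pipeline`, `next_status`, and `EmptyAudio` are hypothetical names, the step functions are supplied by the caller, and treating every unexpected exception as "error" is an assumption (the document only states this for the analysis step).

```python
STATUS_ORDER = ["new", "transcribed", "scrubbed", "analyzed", "ready"]


class EmptyAudio(Exception):
    """Raised by the transcribe step when a recording has no usable audio."""


def next_status(current: str) -> str:
    """Return the status that follows `current` in the linear sequence."""
    index = STATUS_ORDER.index(current)  # ValueError for "empty", "error", or unknown
    if index == len(STATUS_ORDER) - 1:
        raise ValueError("'ready' is terminal")
    return STATUS_ORDER[index + 1]


def run_pipeline(message: dict, steps: dict) -> dict:
    """Advance a message one status at a time; `steps` maps target status to a callable."""
    while message["status"] != "ready":
        target = next_status(message["status"])
        try:
            steps[target](message)
        except EmptyAudio:
            message["status"] = "empty"  # pipeline aborts for this message
            break
        except Exception:
            message["status"] = "error"
            break
        message["status"] = target
    return message
```

Because each step is keyed by the status it produces, a step can only run once the previous status has been reached, which is one simple way to keep the ordering fixed.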

Transcribe

Library: faster-whisper (local inference, no cloud upload)

Key file: att/ingest/transcribe.py

Config (whisper section):

Parameter	Default	Description
`model_size`	`small`	Whisper model size (tiny/base/small/medium/large)
`device`	`auto`	Compute device (auto/cpu/cuda)
`compute_type`	`int8`	Quantization for faster inference
`language`	`null`	Language code or null for auto-detect

Output to DB:

transcripts.raw_text — full transcription
transcripts.language — detected language code
transcripts.confidence — model confidence score

Privacy: Audio never leaves the machine. All transcription runs locally.

Failure: Empty audio → status set to "empty", pipeline aborts for that message.
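For orientation, this is roughly what a local transcription call looks like with faster-whisper using the defaults from the table above. It is a sketch under assumptions, not the contents of att/ingest/transcribe.py: `WhisperConfig` and `transcribe` are hypothetical names, and mapping `confidence` to the language-detection probability is a guess at what the project stores.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WhisperConfig:
    model_size: str = "small"
    device: str = "auto"
    compute_type: str = "int8"
    language: Optional[str] = None  # None lets the model auto-detect


def transcribe(audio_path: str, cfg: WhisperConfig) -> dict:
    """Transcribe one file locally and return the fields written to the DB."""
    # Imported lazily so this module loads without the package installed.
    from faster_whisper import WhisperModel

    model = WhisperModel(cfg.model_size, device=cfg.device, compute_type=cfg.compute_type)
    segments, info = model.transcribe(audio_path, language=cfg.language, word_timestamps=True)
    text = " ".join(segment.text.strip() for segment in segments)
    return {
        "raw_text": text,
        "language": info.language,
        # Assumption: language-detection probability stands in for "confidence".
        "confidence": info.language_probability,
    }
```

In a long-running process the model would normally be loaded once and reused, since loading it is the slow part.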

PII Scrub

Why this step exists

People call a voicemail line and say whatever is on their mind. They say their names. They say their mother’s name. They leave phone numbers, addresses, confessions. This project asks people to be vulnerable, and the minimum obligation in return is that their identifying information never leaves the machine they trusted it to. PII scrubbing is not a feature — it is the condition under which this project is ethical.

Libraries: spaCy (en_core_web_sm) + custom regex patterns — both run locally, no network calls.

Key file: att/ingest/scrub.py

4.0.1 Where it sits in the pipeline

Scrubbing is step 2 of the ingest pipeline (att/ingest/pipeline.py). It runs after local transcription and before any text is sent to an external API. The ordering is enforced by the pipeline itself — there is no code path that sends unscrubbed text to Claude. The raw transcript and the original audio never leave the local machine.

4.0.2 Two-pass approach

The scrubber makes two passes over the transcript, then deduplicates overlapping detections (keeping the longer match):

Pass 1 — Regex patterns (deterministic, high-precision):

Pattern	Examples caught	Placeholder
`PHONE`	`(555) 123-4567`, `+1-555-123-4567`	`[PHONE]`
`EMAIL`	`name@example.com`	`[EMAIL]`
`SSN`	`123-45-6789`	`[SSN]`
`ADDRESS`	`1234 Main Street`, `56 Oak Ave`	`[ADDRESS]`

Pass 2 — spaCy NER (statistical, catches what regex misses):

Entity type	What it catches	Placeholder
`PERSON`	Personal names (“my daughter Sarah”)	`[NAME]`
`ORG`	Organizations, employers, schools	`[ORGANIZATION]`
`GPE` / `LOC` / `FAC`	Cities, states, landmarks, buildings	`[LOCATION]`

The two passes are complementary: regex catches structured identifiers that NER models often miss (phone numbers, SSNs), while spaCy catches names and places that no regex can reliably match.

4.0.3 What the scrubbed text looks like

A raw transcript like:

“Hi, this is Sarah Chen calling from 1455 Oak Drive. You can reach me at 555-0142. I just wanted to say I miss you, Mom.”

becomes:

“Hi, this is [NAME] calling from [ADDRESS]. You can reach me at [PHONE]. I just wanted to say I miss you, Mom.”

The emotional content — the part that matters for this project — is preserved. The identifying details are not.

4.0.4 Overlap deduplication

When regex and spaCy flag the same span (e.g., both detect “Sarah Chen”), the scrubber sorts by position and keeps the longer match to avoid double-replacement or garbled output.
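The two passes and the overlap rule can be sketched as follows. This is a minimal illustration, not the contents of att/ingest/scrub.py: the regex patterns are simplified stand-ins for the project's real ones, the names `Span`, `regex_spans`, `dedupe`, and `scrub` are hypothetical, and the spaCy pass is represented by spans the caller passes in so the example runs without a model.

```python
import re
from typing import Iterable, List, NamedTuple


class Span(NamedTuple):
    start: int
    end: int
    placeholder: str


# Illustrative patterns only; the project's real regexes are not shown here.
PATTERNS = {
    "[EMAIL]": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "[SSN]": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "[PHONE]": re.compile(r"(?:\+?1[-.\s]?)?(?:\(\d{3}\)\s?|\d{3}[-.\s])?\d{3}[-.\s]\d{4}\b"),
    "[ADDRESS]": re.compile(
        r"\b\d{1,5}\s+(?:[A-Z][a-z]+\s+){1,3}"
        r"(?:Street|St|Avenue|Ave|Drive|Dr|Road|Rd|Boulevard|Blvd|Lane|Ln)\b"
    ),
}


def regex_spans(text: str) -> List[Span]:
    """Pass 1: collect every regex match as a span."""
    return [Span(m.start(), m.end(), label)
            for label, pattern in PATTERNS.items()
            for m in pattern.finditer(text)]


def dedupe(spans: Iterable[Span]) -> List[Span]:
    """Sort by position; where two spans overlap, keep the longer one."""
    kept: List[Span] = []
    for span in sorted(spans, key=lambda s: (s.start, -(s.end - s.start))):
        if kept and span.start < kept[-1].end:
            if span.end - span.start > kept[-1].end - kept[-1].start:
                kept[-1] = span
            continue
        kept.append(span)
    return kept


def scrub(text: str, ner_spans: Iterable[Span] = ()) -> str:
    """Replace detected spans with placeholders, working right to left."""
    for span in reversed(dedupe(list(regex_spans(text)) + list(ner_spans))):
        text = text[:span.start] + span.placeholder + text[span.end:]
    return text
```

Replacing from the end of the string backwards keeps the earlier character offsets valid while the text changes length.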

4.0.5 Output to DB

transcripts.scrubbed_text — the redacted version of the transcript, with placeholders replacing every detected entity
transcripts.pii_entities — a JSON array recording what was found and where: {"text": "Sarah Chen", "label": "PERSON", "start": 13, "end": 23}

The original entity text is stored only in this local DB field so that scrubbing can be audited and improved. It is never sent to any external service.

4.0.6 What reaches the outside world

Nothing in this pipeline contacts the network except the Claude API call in the next step (emotional analysis), and that call receives only the scrubbed text. The code path is explicit — analyze_message() takes a scrubbed_text parameter; the raw transcript is not passed.

To be concrete about what never leaves the machine:

The original audio recording
The raw Whisper transcript
Any detected PII entities (names, phone numbers, addresses, SSNs)

4.0.7 Config

None — the regex patterns and spaCy model are hardcoded. This is intentional: PII scrubbing should not be something you can accidentally misconfigure or turn off. The patterns are conservative (biased toward over-scrubbing).

4.0.8 Limitations

This is a best-effort system, not a legal guarantee. spaCy’s en_core_web_sm is a small model optimized for speed over recall. It will miss some names, especially unusual ones or those in non-English speech. The regex patterns cover US-format identifiers. If a caller says “I live on the corner of Fifth and Main” without a street number, the address regex won’t catch it (though spaCy’s GPE/LOC may).

The design principle is: when in doubt, scrub it. A false positive (over-redacting) costs nothing meaningful — the emotional analysis still works with [NAME] placeholders. A false negative (leaking a name to an API) is the failure mode that matters.

Emotional Analysis

Service: Anthropic Claude API (async)

Key file: att/ingest/analyze.py

Config (analysis section):

Parameter	Default	Description
`model`	`claude-sonnet-4-20250514`	Claude model for analysis
`themes`	17 emotions	Whitelist of valid theme labels

Output to DB:

analysis.sentiment — primary emotional category
analysis.intensity — float [0, 1]
analysis.themes — list of matched theme labels
analysis.summary — one-sentence description
analysis.slug — filename-safe identifier
analysis.narrative_thread — thematic grouping

Validation: intensity is clamped to [0, 1], themes are filtered against the configured whitelist, and markdown code fences around Claude’s JSON response are stripped before parsing.
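A sketch of that validation step, using only the standard library. The function name `parse_analysis` is hypothetical, and defaulting a missing intensity to 0.0 is an assumption; the real logic lives in att/ingest/analyze.py and may differ.

```python
import json
from typing import Set

FENCE = "`" * 3  # markdown code-fence marker


def parse_analysis(raw: str, allowed_themes: Set[str]) -> dict:
    """Parse the model's reply into a validated dict."""
    text = raw.strip()
    if text.startswith(FENCE):
        lines = text.splitlines()[1:]  # drop the opening fence (and any "json" tag)
        if lines and lines[-1].strip() == FENCE:
            lines = lines[:-1]  # drop the closing fence
        text = "\n".join(lines)
    data = json.loads(text)  # raises json.JSONDecodeError on invalid JSON
    data["intensity"] = min(1.0, max(0.0, float(data.get("intensity", 0.0))))
    data["themes"] = [t for t in data.get("themes", []) if t in allowed_themes]
    return data
```

The JSON error is left to propagate so the caller can set the message status to "error", as described in the failure note that follows.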

Failure: Missing API key → exception. Invalid JSON response → exception. Both set message status to "error".

Extract Grains

Libraries: librosa (silence detection, spectral features) + soundfile

Key file: att/ingest/grains.py

Config (grains section):

Parameter	Default	Description
`min_grain_secs`	`2.0`	Minimum grain duration
`max_grain_secs`	`10`	Maximum grain duration
`silence_threshold_db`	`30.0`	Silence detection threshold
`min_silence_gap_secs`	`0.3`	Minimum gap between grains

Algorithm:

Detect silence intervals in audio
Merge intervals closer than min_silence_gap_secs
Merge segments shorter than min_grain_secs
Split segments longer than max_grain_secs
Apply 10ms fade in/out to each grain
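The interval bookkeeping in these steps can be sketched in plain Python. This is one possible reading, not the code in att/ingest/grains.py: it assumes the detected intervals are the non-silent stretches (as a call such as `librosa.effects.split` would return, converted to seconds), that a too-short segment is absorbed into its neighbor, and that long segments are split into equal pieces. `plan_grains` is a hypothetical name, and the fade step is omitted.

```python
import math
from typing import List, Tuple

Interval = Tuple[float, float]  # (start_sec, end_sec)


def plan_grains(intervals: List[Interval], min_gap: float = 0.3,
                min_len: float = 2.0, max_len: float = 10.0) -> List[Interval]:
    """Turn non-silent intervals into grain boundaries."""
    # Merge intervals separated by a gap shorter than min_gap.
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start - merged[-1][1] < min_gap:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))

    # Absorb segments shorter than min_len into the following segment.
    grown: List[Interval] = []
    for start, end in merged:
        if grown and grown[-1][1] - grown[-1][0] < min_len:
            grown[-1] = (grown[-1][0], end)
        else:
            grown.append((start, end))
    # A short final segment is folded into the one before it.
    if len(grown) > 1 and grown[-1][1] - grown[-1][0] < min_len:
        _, tail_end = grown.pop()
        grown[-1] = (grown[-1][0], tail_end)
    # Anything still too short (e.g. a lone short recording) is dropped.
    grown = [iv for iv in grown if iv[1] - iv[0] >= min_len]

    # Split segments longer than max_len into equal pieces.
    grains: List[Interval] = []
    for start, end in grown:
        pieces = math.ceil((end - start) / max_len)
        step = (end - start) / pieces
        grains.extend((start + i * step, start + (i + 1) * step) for i in range(pieces))
    return grains
```

Note that merging across a pause keeps that pause inside the grain, which may or may not match what the real extractor does.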

Spectral features per grain:

spectral_centroid (Hz): brightness measure
rms_energy: root-mean-square signal level, a rough proxy for loudness

Output: Grain WAV files in audio/grains/, DB rows in grains table.

Failure: Audio shorter than min_grain_secs → empty grain list. No speech detected → empty grain list.

Export

Key file: att/ingest/export.py

Naming convention: {sentiment}/{slug}_{grain_num:02d}.wav (e.g. grief/miss_you_mom_01.wav)

Directory structure: Sentiment subfolders + grains.csv metadata file.

CSV columns:

filename, folder, slug, sentiment, themes, intensity, duration, message_id, grain_index, start, end, spectral_centroid, rms_energy, summary
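As a small illustration of the naming convention and the manifest layout, the sketch below formats a grain path and writes a CSV header with the columns listed above. `grain_relpath` and `manifest_csv` are hypothetical helpers, not functions from att/ingest/export.py.

```python
import csv
import io
from typing import Iterable

CSV_COLUMNS = [
    "filename", "folder", "slug", "sentiment", "themes", "intensity", "duration",
    "message_id", "grain_index", "start", "end", "spectral_centroid", "rms_energy",
    "summary",
]


def grain_relpath(sentiment: str, slug: str, grain_num: int) -> str:
    """Build the export path, with the grain number zero-padded to two digits."""
    return f"{sentiment}/{slug}_{grain_num:02d}.wav"


def manifest_csv(rows: Iterable[dict]) -> str:
    """Render rows as CSV text; keys missing from a row are left blank."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=CSV_COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```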

Config: Uses audio.storage_path for the export directory root.

Curated Phrase Extraction

In addition to the automatic silence-based grain extraction above, a curated phrase extraction tool lets you search for specific spoken phrases across all transcripts and cut them out precisely.

Key file: scripts/extract_phrases.py

How it works:

You write a text file (phrases.txt) with one phrase per line — the phrases that struck you while listening
The tool fuzzy-matches each phrase against all transcripts’ word-level timestamps (generated by Whisper)
Matching audio segments are extracted with configurable padding (default 1.5s before and after) and 1.5s crossfades
Output: clean WAV files in audio/curated/, ready for the outside zone

# Preview matches without extracting
python scripts/extract_phrases.py phrases.txt --dry-run

# Extract with default settings
python scripts/extract_phrases.py phrases.txt

# Custom padding and match threshold
python scripts/extract_phrases.py phrases.txt --pad-before 0.5 --pad-after 0.5 --min-score 75

# Keep all non-overlapping matches (not just the best)
python scripts/extract_phrases.py phrases.txt --all-matches

# Then normalize the output for consistent playback volume
python scripts/normalize_curated.py
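The matching step can be approximated with the standard library. The sketch below slides a window over word-level timestamps and scores each window against the phrase on a 0 to 100 scale. It swaps in `difflib.SequenceMatcher` for whatever fuzzy matcher scripts/extract_phrases.py actually uses, `find_phrase` is a hypothetical name, and the default `min_score` of 75 is an assumption borrowed from the example command above.

```python
import re
from difflib import SequenceMatcher
from typing import List, Optional, Tuple

Word = Tuple[str, float, float]  # (text, start_sec, end_sec)


def _norm(text: str) -> str:
    """Lowercase and strip punctuation so 'Mom,' matches 'mom'."""
    return re.sub(r"[^a-z0-9' ]+", "", text.lower()).strip()


def find_phrase(phrase: str, words: List[Word], min_score: float = 75.0,
                pad_before: float = 1.5, pad_after: float = 1.5) -> Optional[dict]:
    """Return the best-scoring window as padded cut points, or None."""
    target = _norm(phrase)
    n = len(target.split())
    if n == 0 or len(words) < n:
        return None
    best = None  # (score, window start index)
    for i in range(len(words) - n + 1):
        window = _norm(" ".join(w[0] for w in words[i:i + n]))
        score = 100.0 * SequenceMatcher(None, target, window).ratio()
        if score >= min_score and (best is None or score > best[0]):
            best = (score, i)
    if best is None:
        return None
    score, i = best
    return {
        "score": score,
        "start": max(0.0, words[i][1] - pad_before),  # never cut before time zero
        "end": words[i + n - 1][2] + pad_after,
    }
```

The returned `start` and `end` are where the audio would be cut before fades are applied.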

For Fellow Artists: Building a Sample Library from Field Recordings

If you’re working with field recordings, interviews, or any spoken-word material and want to build a usable sample library, here’s the workflow this project uses. The tools are open and the approach generalizes to any audio corpus.

The philosophy: you still listen to everything. The machine doesn’t decide what matters — you do. It just makes the extraction fast once you’ve made your choices.

Listen first — Go through every recording. Apply your own creative judgment. Note what resonates, what surprises you, what you want to use.
Transcribe — Whisper runs locally (no cloud upload) and produces word-level timestamps. Every word gets a precise start and end time. Run scripts/retranscribe.py if your existing transcripts are missing word timestamps.
Note the phrases that struck you — As you listen, write down the phrases in a plain text file (one per line). These are your curatorial choices — the moments that matter to you.
Let the tool do the cutting — The fuzzy matcher (scripts/extract_phrases.py) finds your chosen phrases in the transcripts, even when speakers mumbled or trailed off. It handles the tedious part: locating the exact timecode and cutting the audio with padding and crossfades. Output: clean WAV files ready to drop into a sampler, DAW, or this installation’s curated zone.
Normalize — Batch-normalize all clips to consistent loudness (scripts/normalize_curated.py targets -23 dB RMS with a -1 dB peak ceiling) so the library plays back at even levels.
Organize — The export tool (att/ingest/export.py) sorts grains into folders by emotional category (grief/, hope/, silence/) with descriptive filenames derived from content analysis. A CSV manifest ties everything together: filename, sentiment, themes, intensity, duration, spectral features.

In short: you do the listening, the choosing, and the curating; the machine handles the cutting and organizing.

For Filmmakers: Scanning Documentary Footage with Transcription

If you’re working with hours of interview footage and hunting for the three sentences that tell the story, this toolchain can help.

The problem — Scrubbing through timelines is slow. You know the kind of thing you’re looking for, but finding it means watching everything again.
Word-level timestamps — Whisper generates timecoded transcripts where every word has a start and end time. These can be exported as SRT subtitle files or used programmatically to jump straight to any moment.
Phrase search — Write down the phrases or ideas you’re looking for in a text file. The fuzzy matcher (scripts/extract_phrases.py) scans all transcripts and returns timecodes — even when speakers trailed off or used slightly different words than you remembered.
Auto-extract — The extraction tool cuts audio at those timecodes with padding and fades. Build a rough cut from a text file instead of scrubbing through hours of footage.

Instead of scrubbing through timelines, describe what you’re looking for in words. The machine finds it for you.

Privacy and Responsible AI Use

If you’re going to use AI with recordings of real people, here’s how to think about doing it responsibly. This section documents the ethical framework built into this project — not as a legal disclaimer, but as a set of design decisions that encode a responsibility to the people who shared their voices.

4.0.9 Nothing leaves the machine by default

All transcription (Whisper) runs locally. Original audio, raw transcripts, and detected personal information never touch the internet.

4.0.10 PII scrubbing before any AI analysis

Before the system asks Claude to analyze emotional content, it strips names, phone numbers, addresses, SSNs, and organizations using a two-pass approach:

Pass 1: Deterministic regex patterns — catches phone numbers, emails, SSNs, street addresses
Pass 2: spaCy NER — catches personal names, organizations, locations

The design bias is explicit: over-redacting is acceptable, leaking PII is not. A false positive (replacing “Mom” with [NAME]) costs nothing — the emotional analysis still works. A false negative (sending someone’s name to an external API) is the failure mode that matters.

4.0.11 Only scrubbed text reaches Claude

The code path is explicit: scrub_pii() runs first, analyze_message() only receives the scrubbed output. There is no code path that sends raw transcript to an external API. The pipeline enforces this ordering — it’s not a setting you can turn off.

4.0.12 Audit trail

Detected PII entities are stored locally (in the pii_entities field) so scrubbing accuracy can be reviewed and improved. This data never leaves the machine.

4.0.13 Limitations (honest)

spaCy’s en_core_web_sm is a small model optimized for speed. It will miss some names, especially unusual ones or those in non-English speech.
Regex patterns cover US-format identifiers. International phone numbers and address formats may slip through.
If a caller says “I live on the corner of Fifth and Main” without a street number, the address regex won’t catch it (though spaCy’s location detection may).
The system errs on the side of caution but isn’t perfect.

4.0.14 Voice as property

A person’s voice is theirs. AI models trained on voice recordings can later be used to generate synthetic speech that sounds like that person — deepfakes. Exposing raw voice audio online means it could be scraped and used for voice cloning without consent. This is why:

Original audio files never leave the local machine
The public constellation website streams playback but does not allow downloading, sharing, or saving audio files
Only scrubbed text (not audio) is sent to the Claude API for analysis
The voice recordings exist for listening in the moment, not for extraction

4.0.15 Why this matters

People called a voicemail line and shared personal stories. Some left their names. Some left phone numbers. Some said things they might not want the world to hear. Using AI to analyze that content comes with a responsibility to protect their privacy and their voice. The architecture encodes that responsibility — it’s not an afterthought.

5 Preparing Sound for the Installation

A stem is a composed piece of audio that the installation will loop on the inside speakers. It can come from anywhere — a DAW, a hardware sampler, a modular synthesizer, field recordings processed through effects. The only requirements are technical: the engine needs files in a specific format to play them reliably.

5.1 What Makes a Valid Stem

Property	Requirement
Format	WAV (PCM 24-bit)
Sample rate	48 kHz (files at other rates are rejected on import)
Channels	Stereo (the engine preserves stereo throughout the signal path)
Loudness	Normalized to -14 LUFS on import (the importer does this automatically)
Duration	1 second minimum, 1 hour maximum

Longer stems are better — 5 to 30 minutes avoids obvious looping. The engine streams from disk with a 10-second read-ahead buffer, so even very long files work without loading into memory.
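The format checks in the table can be sketched with Python's built-in `wave` module. This is an illustration rather than the importer's code: `validate_stem` is a hypothetical name, treating 24-bit depth and stereo as hard requirements is an assumption (the table only says other sample rates are rejected), and loudness normalization is left out.

```python
import wave
from typing import List


def validate_stem(path: str) -> List[str]:
    """Return a list of problems; an empty list means the file passes."""
    try:
        with wave.open(path, "rb") as wav:
            rate = wav.getframerate()
            channels = wav.getnchannels()
            width = wav.getsampwidth()  # bytes per sample
            frames = wav.getnframes()
    except (wave.Error, EOFError) as exc:
        # Older Python versions also reject WAVE_FORMAT_EXTENSIBLE files here.
        return [f"not a readable PCM WAV: {exc}"]

    problems = []
    if rate != 48000:
        problems.append(f"sample rate is {rate} Hz, expected 48000")
    if channels != 2:
        problems.append(f"{channels} channel(s), expected stereo")
    if width != 3:
        problems.append(f"{width * 8}-bit samples, expected 24-bit")
    duration = frames / rate if rate else 0.0
    if not 1.0 <= duration <= 3600.0:
        problems.append(f"duration {duration:.1f}s outside 1s to 1h")
    return problems
```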

5.2 How to Upload

Via dashboard: Go to the Stems page and drag a WAV file onto the upload zone. The importer validates the format, normalizes loudness, and adds the file to the stem library. Set a mood tag for diversity (the engine avoids repeating the same mood back-to-back).

Via API:

curl -X POST -F file=@my_stem.wav localhost:8000/api/stems/upload

5.3 Generating Curated Samples

The outside zone plays short voicemail clips from audio/curated/. To build or refresh this library from the voicemail corpus:

# 1. Edit phrases.txt with phrases you want to extract (one per line)
# 2. Extract matching audio from transcripts
python scripts/extract_phrases.py phrases.txt

# 3. Normalize to consistent volume (-23 dB RMS)
python scripts/normalize_curated.py

Output goes to audio/curated/. The outside zone picks up new files automatically on its next scan cycle.
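The normalization target (-23 dB RMS, with the -1 dB peak ceiling mentioned for scripts/normalize_curated.py) comes down to a single gain calculation. The sketch below works on a plain list of float samples in the range -1 to 1. It assumes the ceiling is respected by limiting the gain rather than by applying a limiter, which may not be how the script does it; the function names are hypothetical.

```python
import math
from typing import List


def rms_db(samples: List[float]) -> float:
    """RMS level in dB relative to full scale (1.0)."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20.0 * math.log10(rms)


def peak_db(samples: List[float]) -> float:
    """Peak level in dB relative to full scale (1.0)."""
    return 20.0 * math.log10(max(abs(s) for s in samples))


def normalize(samples: List[float], target_rms_db: float = -23.0,
              peak_ceiling_db: float = -1.0) -> List[float]:
    """Scale samples toward the RMS target without exceeding the peak ceiling."""
    if not samples or max(abs(s) for s in samples) == 0.0:
        return list(samples)  # empty or silent: nothing to scale
    gain_db = min(target_rms_db - rms_db(samples),    # gain that hits the RMS target
                  peak_ceiling_db - peak_db(samples)) # largest gain the ceiling allows
    gain = 10.0 ** (gain_db / 20.0)
    return [s * gain for s in samples]
```

A clip with one loud spike will end up quieter than the RMS target under this approach, because the ceiling wins.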

6 How the Installation Plays Sound

Walk into the exhibition space and you hear two things: a continuous wash of ambient sound from the speakers inside, and occasional fragments of actual voicemail messages drifting from the speakers outside. The two zones breathe together — when the inside gets louder, the outside responds. The sound is always changing, but slowly, like weather.

Under the hood, this is a two-zone playback engine (att/engine/player.py). Each zone has its own independent playback system, and they communicate through a simple intensity signal.

%%{init: {'theme': 'base', 'themeVariables': {
  'primaryColor': '#1a1a1a',
  'primaryTextColor': '#d4d4d4',
  'primaryBorderColor': '#c49a6c',
  'lineColor': '#8a6d4a',
  'secondaryColor': '#2a2a2a',
  'tertiaryColor': '#111',
  'fontFamily': 'SF Mono, Fira Code, Consolas, monospace',
  'fontSize': '12px',
  'clusterBkg': '#1a1a1a',
  'clusterBorder': '#2a2a2a',
  'edgeLabelBackground': '#111'
}}}%%
flowchart LR
    subgraph Inside["Inside Zone — Stems"]
        direction TB
        Stems[(Stem Library)]
        Select[Weighted Random<br/>+ Mood Diversity]
        Layer[Playback Layer<br/>stream from disk]
        Stems --> Select --> Layer
    end

    subgraph Outside["Outside Zone — Curated"]
        direction TB
        Folder[(audio/curated/)]
        Pick[Random Selection]
        Play[One-Shot Playback]
        Folder --> Pick --> Play
    end

    Layer --> Mix[Mixer]
    Play --> Mix

    Layer -.->|RMS intensity| Play

    Mix --> XR18[XR18 Mixer]
    XR18 --> Spk_I[Inside Speakers<br/>ch 0-1 stereo]
    XR18 --> Spk_O[Outside Speaker<br/>ch 2 mono]

    style Inside fill:#1a1a1a,stroke:#5a9a5a,color:#d4d4d4
    style Outside fill:#1a1a1a,stroke:#c49a6c,color:#d4d4d4
    style Stems fill:#2a2a2a,stroke:#5a9a5a,color:#d4d4d4
    style Select fill:#2a2a2a,stroke:#5a9a5a,color:#d4d4d4
    style Layer fill:#2a2a2a,stroke:#5a9a5a,color:#d4d4d4
    style Folder fill:#2a2a2a,stroke:#c49a6c,color:#d4d4d4
    style Pick fill:#2a2a2a,stroke:#c49a6c,color:#d4d4d4
    style Play fill:#2a2a2a,stroke:#c49a6c,color:#d4d4d4
    style Mix fill:#2a2a2a,stroke:#c49a6c,color:#d4d4d4
    style XR18 fill:#2a2a2a,stroke:#c49a6c,color:#d4d4d4
    style Spk_I fill:#2a2a2a,stroke:#5a9a5a,color:#d4d4d4
    style Spk_O fill:#2a2a2a,stroke:#5a9a5a,color:#d4d4d4

6.1 Inside Zone — Beds of Sound

The inside speakers play stereo bed stems: composed pieces of ambient audio that loop continuously, one dissolving into the next. Think of each stem as a slow scene change — the texture of the room shifts, but there’s never silence.

How it works:

Stems are stereo WAV files at 48kHz, streamed from disk with a 10-second read-ahead buffer (not loaded into memory — even hour-long compositions work)
When a stem reaches its end, the engine crossfades to a new one over 3 seconds — one texture dissolving into another
Mood diversity: the system avoids repeating the same emotional tone back-to-back. If the current stem is tagged “grief,” the next one won’t be
Weighted selection: stems with higher weights are chosen more often, giving you curatorial control over which compositions dominate
Stems can also be swapped manually from the dashboard at any time
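The selection rules above (weighted choice plus mood diversity) can be sketched in a few lines. This is a minimal illustration, not the code in att/engine/player.py: the `Stem` dataclass, the `pick_next_stem` name, and the fallback when every stem shares one mood are all assumptions.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Stem:
    name: str
    mood: str
    weight: float = 1.0


def pick_next_stem(stems, current, rng=random):
    """Weighted random choice that avoids repeating the current mood."""
    candidates = [s for s in stems if current is None or s.mood != current.mood]
    if not candidates:
        # Assumed fallback: if every stem shares the mood, pick any other stem.
        candidates = [s for s in stems if s != current] or list(stems)
    weights = [s.weight for s in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]
```

With a "grief" stem playing, a "hope" stem of weight 5 is picked about five times as often as a "peace" stem of weight 1, and no other "grief" stem is picked at all.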

6.2 Outside Zone — Fragments of Past Messages

The outside speakers whisper fragments of past voicemail messages — the software picks them at random, spacing them out like distant memories. Each clip plays once, then the speaker goes quiet for 30–75 seconds before another.

The outside zone listens to the inside zone’s energy (measured as RMS — root mean square, a way of measuring audio loudness):

When the inside zone is loud: curated clips play louder and gaps between them are shorter — the outside joins the conversation
When the inside zone is quiet: clips are softer and more spaced out — the outside steps back
The volume floor is 5% (never fully silent) and the ceiling is 50% of the zone’s configured volume

This cross-zone modulation means the two zones breathe together without any manual coordination.
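One way to picture the modulation is as a pair of linear maps driven by the inside zone's RMS. The sketch below assumes the intensity has already been normalized to 0..1, that the mapping is linear, and that both the 5% floor and the 50% ceiling are relative to the zone's configured volume. None of these details are confirmed by the engine source; `curated_response` is a hypothetical name.

```python
import math


def rms(samples):
    """Root mean square of a block of samples in the range -1.0..1.0."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def curated_response(intensity, zone_volume=0.8, sleep_range=(30.0, 75.0),
                     floor=0.05, ceiling=0.5):
    """Map inside-zone intensity (0..1) to outside clip volume and gap."""
    x = min(max(intensity, 0.0), 1.0)
    # Louder inside -> louder clips, clamped between floor and ceiling.
    volume = zone_volume * (floor + (ceiling - floor) * x)
    # Louder inside -> shorter gap between clips.
    short, long = sleep_range
    gap = long - (long - short) * x
    return volume, gap
```

At zero intensity this gives the quietest clips and the full 75-second gap; at full intensity it gives the loudest clips and the 30-second gap.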

6.3 Per-Zone Gain Automation

On top of the zone volumes you set in the dashboard, the engine adds slow, random gain drift — subtle volume movement that keeps the sound from feeling static. Think of it as the room itself breathing.

Drift range: gain multiplier wanders between 0.6 and 1.0
Slew-limited: maximum 0.02 change per update (no abrupt jumps, only gradual shifts)
Update interval: every 0.5–2 seconds, a new random target is chosen
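The three parameters above describe a slew-limited random walk. Here is a minimal sketch, assuming target selection and gain updates are separate steps; the `GainDrift` class and its method names are hypothetical, not the engine's API.

```python
import random


class GainDrift:
    """Slew-limited random walk between the drift_range bounds."""

    def __init__(self, drift_range=(0.6, 1.0), max_step=0.02, rng=None):
        self.low, self.high = drift_range
        self.max_step = max_step
        self.rng = rng or random.Random()
        self.gain = self.high
        self.target = self.high

    def pick_target(self):
        """Choose a new random target inside the drift range."""
        self.target = self.rng.uniform(self.low, self.high)

    def tick(self):
        """Move toward the target by at most max_step and return the gain."""
        delta = self.target - self.gain
        delta = max(-self.max_step, min(self.max_step, delta))
        self.gain += delta
        return self.gain
```

Because each tick moves toward a target that is itself inside the range, the gain can never leave 0.6 to 1.0, and no single tick changes it by more than 0.02.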

6.4 Safe Mode

A mandatory fallback for unattended operation — if something goes wrong while nobody’s watching, the installation keeps playing.

Trigger	Automatic on any engine exception, or manual from dashboard
Behavior	All zones play the same bed stem, looped, at a fixed safe volume (60%)
Dashboard	Shows “SAFE MODE” prominently in red
Exit	Manual from dashboard only
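The trigger, behavior, and exit rows above amount to a small state machine. The sketch below is a toy stand-in, not the real engine: the `Engine` class, the `run_step` wrapper, and the `"safe_bed"` placeholder name are all hypothetical.

```python
class Engine:
    """Toy stand-in for the playback engine; names here are hypothetical."""

    SAFE_VOLUME = 0.6  # fixed safe volume from the table above

    def __init__(self, zones):
        self.zones = {name: {"volume": vol, "stem": None} for name, vol in zones.items()}
        self.safe_mode = False

    def enter_safe_mode(self, bed="safe_bed"):
        # All zones play the same looped bed at the fixed safe volume.
        for zone in self.zones.values():
            zone["stem"] = bed
            zone["volume"] = self.SAFE_VOLUME
        self.safe_mode = True

    def exit_safe_mode(self):
        # Exit is manual: nothing in run_step ever calls this.
        self.safe_mode = False

    def run_step(self, step):
        """Run one engine step; any exception drops to safe mode."""
        if self.safe_mode:
            return
        try:
            step(self)
        except Exception:
            self.enter_safe_mode()
```

The key design point is the asymmetry: entry can be automatic, but nothing inside the engine ever leaves safe mode on its own.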

6.5 Threading Model

The engine uses three concurrent systems to keep audio smooth:

Mix thread: a dedicated Python thread that fills the audio buffer and paces itself with time.sleep(). This is the real-time audio path; it never touches asyncio.
Trigger task: an asyncio task that polls for bed-change timers and manual triggers (non-realtime, runs every 0.5 s)
Curated loops: one asyncio task per curated zone, managing sample selection, playback timing, and gap scheduling
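The split between one plain thread and several asyncio tasks can be sketched as follows. This is a runnable toy, not the engine: the function names are hypothetical, the loops only increment counters, and the intervals are shortened (the real trigger task polls every 0.5 s) so the demo finishes in a fraction of a second.

```python
import asyncio
import threading
import time


def mix_loop(stop, counters, block_seconds=0.005):
    """Real-time path: a plain thread paced with time.sleep(), no asyncio."""
    while not stop.is_set():
        counters["blocks"] += 1  # stand-in for filling the audio buffer
        time.sleep(block_seconds)


async def trigger_task(counters, interval=0.01):
    """Non-realtime polling for bed-change timers and manual triggers."""
    while True:
        counters["polls"] += 1
        await asyncio.sleep(interval)


async def curated_loop(zone, counters, gap=0.01):
    """One task per curated zone: pick a clip, play it, wait out the gap."""
    while True:
        counters[zone] = counters.get(zone, 0) + 1
        await asyncio.sleep(gap)


async def run_engine(duration=0.1, curated_zones=("outside",)):
    counters = {"blocks": 0, "polls": 0}
    stop = threading.Event()
    mixer = threading.Thread(target=mix_loop, args=(stop, counters), daemon=True)
    mixer.start()
    tasks = [asyncio.create_task(trigger_task(counters))]
    tasks += [asyncio.create_task(curated_loop(z, counters)) for z in curated_zones]
    await asyncio.sleep(duration)
    for task in tasks:
        task.cancel()
    await asyncio.gather(*tasks, return_exceptions=True)
    stop.set()
    mixer.join()
    return counters
```

Calling `asyncio.run(run_engine())` returns the counters. Keeping the mix loop off the event loop means a slow dashboard request or a stalled curated task cannot starve the audio buffer.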

7 What Makes the Sound Change

The installation is designed to run unattended — the sound evolves on its own. Here’s what causes changes:

Bed stems auto-rotate: when a stem finishes playing (reaches 98% progress), the engine crossfades to a new one with a different mood
Curated clips self-regulate: the outside zone picks random clips and adjusts its own volume and pacing based on the inside zone’s energy
Gain drift: slow random volume movement across all zones (see Per-Zone Gain Automation above)
Manual control: from the dashboard, you can swap a stem, change zone volumes, set a target mood, or enter/exit safe mode

8 Hardware Setup

8.1 XR18 Channel Mapping

Channel	Assignment	Zone
0	Inside L	inside
1	Inside R	inside
2	Outside (mono)	outside

8.2 Signal Flow

Python Engine (sounddevice) → XR18 USB Audio → XR18 Mixer → Speakers
Room Mic → XR18 Aux In → Python Engine (sounddevice input)

8.3 On-Site Network

Sou'wester WiFi
       │
  GL-iNet Opal (WiFi client → LAN bridge + WiFi AP)
       │
       ├── LAN 1 → XR18 (mixer control)
       ├── LAN 2 → Grandstream HT812 (SIP to Twilio)
       └── WiFi AP → iPad (Mixing Station)

The Mac laptop connects to the Sou’wester WiFi directly (or the Opal’s AP) — it only needs internet for Anthropic API calls during ingest. The audio engine runs entirely offline over USB to the XR18.

8.4 Grandstream HT812

Port	Device	Purpose
WAN	—	Not used (ethernet goes to Opal LAN port)
LAN	—	Not used
FXS 1	Analog phone	Auto-dials Twilio on pickup
FXS 2	Tape answering machine (optional)	Receives incoming SIP calls

9 Dashboard Reference

9.1 Pages

Page	Purpose
Home	Stats, recent messages, upload zone, theme distribution
Incoming	Relay recordings from Dropbox, per-file ingest button (only visible when `incoming_watch_dir` is set)
Messages	Message list with sentiment filter, detail view with inline editing
Stems	Stem library with role/mood filters, upload zone, role assignment
Engine	Zone status, controls (play/pause/swap/safe mode), per-zone volume sliders, mood selector
Zones	Zone configuration viewer (channels, volume, type)
Settings	Configuration viewer

9.2 Key Workflows

Upload a stem:

Go to Stems page
Drag-drop a WAV file onto the upload zone
Optionally set mood, weight, and tags
Click “play” to crossfade it into the inside zone

Ingest a relay recording:

Recordings arrive in the Dropbox folder automatically
Go to the Incoming page (visible when incoming_watch_dir is set)
Click ingest on a recording
Pipeline runs: transcribe → scrub → analyze → extract grains
File moves to ingested/ subfolder, message appears on Messages page

Archive/unarchive a message:

Go to Messages page, open a message detail
Click archive to hide it from the public website (export script filters archived messages)
Click unarchive to restore it

Enter safe mode:

Go to Engine page
Click “enter safe mode”
All zones play the same bed at fixed volume
Click “exit safe mode” to resume normal playback

Swap a stem:

Go to Engine page
Click a specific stem in the library to crossfade to it in the inside zone
Or let the engine auto-rotate when the current stem finishes

10 The Constellation — Public Website

After the residency, the voicemails live on as a constellation of light and sound. Each message becomes a point of light in a dark sky, and visitors can listen to them from anywhere.

Live at: after-the-tone.netlify.app (also embedded at sonicswitchyard.com/art/afterthetone)

10.1 What You See

The page opens to a dark canvas. Points of warm light appear in a tight cluster at the center, then drift apart like a big bang — settling into a constellation over a few seconds. This is a force-directed layout running live in the browser: messages that share emotional themes are pulled together by faint lines, while a gentle repulsion keeps them from overlapping. 2D Perlin noise adds slow ambient drift, so the field is never quite still.

Each star encodes properties of the original voicemail:

Visual Property	Data Source	How It Maps
Size	Duration of the call	Longer messages make bigger stars (log-scaled)
Brightness / glow	Emotional intensity	More intense messages glow brighter
Connections	Shared themes (2+)	Faint lines link messages with overlapping emotional themes
Clustering	Sentiment + themes	Messages with similar feelings drift together
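The size and connection rows of the table can be made concrete with a short sketch. It is written in Python for illustration only, since the live layout runs in the browser. The constants `min_size` and `scale`, the message dictionaries, and both function names are assumptions rather than values from public_src/index.html.

```python
import math
from itertools import combinations


def star_size(duration_secs, min_size=2.0, scale=3.0):
    """Log-scaled size so long calls grow the star without dwarfing others."""
    return min_size + scale * math.log10(1.0 + max(duration_secs, 0.0))


def theme_links(messages, min_shared=2):
    """Return (id_a, id_b, shared_count) for pairs sharing min_shared+ themes."""
    links = []
    for a, b in combinations(messages, 2):
        shared = set(a["themes"]) & set(b["themes"])
        if len(shared) >= min_shared:
            links.append((a["id"], b["id"], len(shared)))
    return links
```

Log scaling keeps a ten-minute message from visually swamping a ten-second one, and the two-theme threshold keeps the line network sparse enough to read.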

Click a star (or the “listen to a message” button) to hear the original voicemail and read the scrubbed transcript. Theme tags appear below. The constellation highlights the selected star and its connected neighbors.

Voice as Property — No Download, No Save

Audio plays in-browser only. There are no download links, no right-click-save, no sharing buttons. This is deliberate: a person’s voice is their property, and exposing downloadable audio risks voice cloning and deepfake use. The constellation is for listening, not extracting. See Privacy and Responsible AI Use for the full rationale.

10.2 Archive Feature

Messages can be hidden from the public site via the dashboard’s Messages page (archive/unarchive buttons). The export script filters out archived messages, so they won’t appear in the constellation after the next deploy. This gives curatorial control over what’s public without deleting anything from the database.

10.3 Build and Deploy Pipeline

The public site is a static HTML page (public_src/index.html) that loads a data.json manifest and audio files. No server required — it runs on Netlify’s free tier.

Export script (scripts/export_public.py):

Queries SQLite for non-archived messages with completed analysis
Converts WAV audio to 128kbps mono MP3 via ffmpeg
Generates data.json with scrubbed transcripts, sentiment, themes, intensity, duration
Copies public_src/index.html to the output directory
Writes netlify.toml with CORS headers (for Squarespace iframe embedding) and cache rules

# Full export (audio + data)
source .venv/bin/activate
python scripts/export_public.py

# Data only (skip slow audio conversion)
python scripts/export_public.py --skip-audio

# Deploy to Netlify
netlify deploy --prod --dir=public/ --site=d81467c4-1d84-41f4-a584-aaa944e67d0a --no-build
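The first three export steps can be sketched with the standard library. This is not scripts/export_public.py: the table and column names (`messages`, `archived`, `analysis_done`, and so on) are a hypothetical schema, and the `audio/<id>.mp3` path is an assumed naming scheme. The ffmpeg flags shown (`-ac 1` for mono, `-b:a 128k` for bitrate, `-y` to overwrite) are standard ffmpeg options.

```python
import json
import sqlite3


def build_manifest(conn):
    """Collect non-archived, analyzed messages for data.json."""
    rows = conn.execute(
        "SELECT id, scrubbed_text, sentiment, themes, intensity, duration "
        "FROM messages WHERE archived = 0 AND analysis_done = 1 ORDER BY id"
    ).fetchall()
    return [
        {
            "id": row[0],
            "transcript": row[1],
            "sentiment": row[2],
            "themes": json.loads(row[3]),  # assumes themes stored as a JSON list
            "intensity": row[4],
            "duration": row[5],
            "audio": f"audio/{row[0]}.mp3",
        }
        for row in rows
    ]


def ffmpeg_mp3_command(src_wav, dst_mp3):
    """Argument list for a 128 kbps mono MP3 encode (pass to subprocess.run)."""
    return ["ffmpeg", "-y", "-i", str(src_wav), "-ac", "1", "-b:a", "128k", str(dst_mp3)]
```

Because archived messages are filtered in the query itself, nothing about them reaches data.json or the audio directory.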

Source files:

File	Purpose
`public_src/index.html`	Single-page app: constellation canvas, audio player, force layout, Perlin drift
`scripts/export_public.py`	Build script: SQLite → data.json + MP3 audio + HTML + netlify.toml

11 Configuration Reference

All parameters from config.yaml, organized by domain. This section is a technical reference — consult it when setting up or tuning the installation.

11.1 Audio

How the system talks to your audio interface.

Parameter	Default	Description
`audio.sample_rate`	`48000`	Global sample rate in Hz — all stems and curated clips must match this
`audio.channels`	`3`	Total output channels (inside stereo + outside mono)
`audio.block_size`	`1024`	Audio buffer block size in samples — lower = less latency, higher = more stable
`audio.buffer_seconds`	`2.0`	Ring buffer duration — headroom before audio underruns
`audio.device`	`X18/XR18`	sounddevice output device name or index (`null` for system default)
`audio.storage_dir`	`audio`	Root directory for audio files
`audio.incoming_watch_dir`	`null`	Dropbox folder for relay recordings — set this to enable relay mode
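To get a feel for what the defaults mean in practice, the arithmetic below converts them into time and memory. The only assumption is 4 bytes per sample (float32), which the table does not state.

```python
SAMPLE_RATE = 48000     # audio.sample_rate
BLOCK_SIZE = 1024       # audio.block_size
BUFFER_SECONDS = 2.0    # audio.buffer_seconds
CHANNELS = 3            # audio.channels

block_ms = 1000.0 * BLOCK_SIZE / SAMPLE_RATE       # duration of one block
buffer_frames = int(BUFFER_SECONDS * SAMPLE_RATE)  # frames of headroom
buffer_blocks = buffer_frames / BLOCK_SIZE         # blocks of headroom
buffer_bytes = buffer_frames * CHANNELS * 4        # assuming float32 samples

print(f"{block_ms:.1f} ms per block")                          # 21.3 ms per block
print(f"{buffer_frames} frames = {buffer_blocks:.2f} blocks")  # 96000 frames = 93.75 blocks
print(f"{buffer_bytes / 1024:.0f} KiB ring buffer")            # 1125 KiB ring buffer
```

So the mix thread can stall for roughly 93 blocks before the output underruns, at a cost of about 1.1 MiB of memory.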

11.2 Zones

Each zone is a group of output channels with its own playback type. The type field determines what kind of audio the zone plays.

Parameter	Default	Description
`zones.<name>.channels`	—	Output channel numbers (e.g., `[0, 1]` for stereo, `[2]` for mono)
`zones.<name>.volume`	`0.8`	Zone master volume, adjustable at runtime via dashboard
`zones.<name>.type`	`stems`	Zone type: `stems` (loops bed stems) or `curated` (plays one-shot samples)
`zones.<name>.curated.directory`	`audio/curated`	Directory to scan for WAV samples (curated zones only)
`zones.<name>.curated.sleep_range_secs`	`[30, 75]`	Range for gap duration between curated clips (seconds)
`zones.<name>.curated.volume`	`0.8`	Base volume for curated playback (modulated by stem zone intensity)
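Since zone channels must fit inside `audio.channels`, a quick check on the loaded config can catch mismatches before the engine starts. The sketch below works on a plain dictionary shaped like config.yaml; `validate_zones` is a hypothetical helper, not part of att/config.py.

```python
def validate_zones(config):
    """Return a list of problems found in the zones section of a config dict."""
    total = config["audio"]["channels"]
    problems = []
    used = {}
    for name, zone in config["zones"].items():
        for channel in zone["channels"]:
            if not 0 <= channel < total:
                problems.append(f"{name}: channel {channel} outside 0..{total - 1}")
            if channel in used:
                problems.append(f"{name}: channel {channel} already used by {used[channel]}")
            used.setdefault(channel, name)
        if zone.get("type", "stems") not in ("stems", "curated"):
            problems.append(f"{name}: unknown type {zone['type']!r}")
    return problems
```

An empty list means every zone maps onto distinct, in-range output channels and uses a known type.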

11.3 Whisper

Local speech-to-text. Runs entirely on your machine — no audio leaves the network.

Parameter	Default	Description
`whisper.model_size`	`small`	Whisper model (tiny/base/small/medium/large) — larger = more accurate, slower
`whisper.device`	`auto`	Compute device (auto/cpu/cuda)
`whisper.compute_type`	`int8`	Quantization type for faster inference
`whisper.language`	`null`	Language code or null for auto-detect

11.4 Analysis

Emotional analysis via Claude API. Only scrubbed text (PII removed) is sent.

Parameter	Default	Description
`analysis.model`	`claude-sonnet-4-20250514`	Claude model for emotional analysis
`analysis.themes`	17 items	Emotion whitelist: grief, longing, memory, love, family, loss, hope, anger, peace, regret, gratitude, apology, farewell, confession, prayer, humor, silence

11.5 Grains

Controls how voicemails are split into segments during ingest. Used by the legacy grain engine and the phrase extraction pipeline.

Parameter	Default	Description
`grains.min_grain_secs`	`3`	Minimum grain duration (seconds)
`grains.max_grain_secs`	`10`	Maximum grain duration (seconds)
`grains.silence_threshold_db`	`25`	Silence detection sensitivity (dB) — higher = more splits
`grains.min_silence_gap_secs`	`0.3`	Gaps shorter than this are merged

11.6 Stems

Controls how the inside zone plays bed stems.

Parameter	Default	Description
`stems.storage_dir`	`audio/stems`	Where imported stems are stored on disk
`stems.crossfade_seconds`	`3.0`	Duration of crossfade when switching stems
`stems.loop`	`true`	Whether stems loop when they reach the end
`stems.default_mood`	`null`	If set, prefer stems tagged with this mood on startup
`stems.bed_volume`	`0.8`	Default volume for stem playback layers
`stems.import_target_lufs`	`-14.0`	Loudness normalization target for imported stems (LUFS)
`stems.import_required_sample_rate`	`48000`	Required sample rate — stems at other rates are rejected

11.6.1 Safe Mode

The fallback mode for unattended operation. If the engine hits an exception, it drops to safe mode automatically.

Parameter	Default	Description
`stems.safe_mode.enabled`	`true`	Whether safe mode fallback is available
`stems.safe_mode.safe_bed_name`	`null`	Specific bed stem for safe mode (null = use whatever’s loaded)
`stems.safe_mode.fixed_volume`	`0.6`	Fixed volume during safe mode

11.6.2 Gain Automation

Slow, random volume drift that keeps the sound from feeling static.

Parameter	Default	Description
`stems.gain_automation.update_interval_secs`	`[0.5, 2.0]`	How often a new gain target is picked (seconds)
`stems.gain_automation.max_gain_change_per_tick`	`0.02`	Maximum gain change per update — slew limit to prevent jumps
`stems.gain_automation.drift_range`	`[0.6, 1.0]`	Gain multiplier bounds — how far volume can drift

11.7 Twilio

Voice recording configuration for the phone line.

Parameter	Default	Description
`twilio.greeting`	`""`	TTS greeting text (empty = MP3 greeting files only, no text-to-speech)
`twilio.max_recording_seconds`	`3600`	Maximum recording duration: 1 hour, so callers are effectively never cut off
`twilio.recording_channels`	`1`	Recording channel count
`twilio.voice`	`Polly.Brian`	TTS voice (only used if greeting text is set)
`twilio.voice_rate`	`fast`	TTS speaking rate
`twilio.voice_pitch`	`-20%`	TTS pitch adjustment
`twilio.pause_before_instructions`	`1`	Seconds of silence before greeting plays

11.8 Dashboard

Parameter	Default	Description
`dashboard.enabled`	`true`	Enable the web dashboard at `http://localhost:8000`

Full config.yaml (current deployment)

mode: single
role: all
server:
  host: 0.0.0.0
  port: 8000
audio:
  sample_rate: 48000
  channels: 3
  block_size: 1024
  buffer_seconds: 2.0
  device: X18/XR18
  storage_dir: audio
  incoming_watch_dir: ~/Dropbox/after_the_tone/incoming
zones:
  inside:
    channels: [0, 1]
    volume: 0.8
    type: stems
  outside:
    channels: [2]
    volume: 0.8
    type: curated
    curated:
      directory: audio/curated
      sleep_range_secs: [30, 75]
whisper:
  model_size: small
  device: auto
  compute_type: int8
  language: null
analysis:
  model: claude-sonnet-4-20250514
  themes:
    - grief
    - longing
    - memory
    - love
    - family
    - loss
    - hope
    - anger
    - peace
    - regret
    - gratitude
    - apology
    - farewell
    - confession
    - prayer
    - humor
    - silence
grains:
  min_grain_secs: 3
  max_grain_secs: 10
  silence_threshold_db: 25
  min_silence_gap_secs: 0.3
stems:
  storage_dir: audio/stems
  crossfade_seconds: 3.0
  loop: true
  default_mood: null
  bed_volume: 0.8
  import_target_lufs: -14.0
  import_required_sample_rate: 48000
  safe_mode:
    enabled: true
    safe_bed_name: null
    fixed_volume: 0.6
  gain_automation:
    update_interval_secs: [0.5, 2.0]
    max_gain_change_per_tick: 0.02
    drift_range: [0.6, 1.0]
twilio:
  greeting: ''
  max_recording_seconds: 3600
  recording_channels: 1
  voice: Polly.Brian
  voice_rate: fast
  voice_pitch: -20%
  pause_before_instructions: 1
dashboard:
  enabled: true

12 File Map

12.1 Relay (Windows PC)

File	Purpose
`relay/server.py`	Standalone Twilio relay: answer calls, play greeting, save WAV + JSON to Dropbox. Supports standard and dual (tape + digital) modes.
`relay/config.yaml`	Relay config: ngrok URL, output dir, greetings dir, recording mode, tape SIP URI
`relay/requirements.txt`	Minimal Python deps (fastapi, uvicorn, httpx, twilio, pyyaml, python-multipart)
`relay/start.bat`	Windows startup script: sets Twilio env vars, launches ngrok + relay server

12.2 Studio (Mac Laptop)

File	Purpose
`att/main.py`	FastAPI app, lifespan, engine/mic/audio initialization
`att/config.py`	Pydantic config models, YAML + env loading
`att/db.py`	SQLite schema + async query layer (messages, transcripts, analysis, grains, stems)
`att/ingest/pipeline.py`	Ingest orchestrator: transcribe → scrub → analyze → extract
`att/ingest/transcribe.py`	Local Whisper transcription
`att/ingest/scrub.py`	PII removal (spaCy + regex)
`att/ingest/analyze.py`	Claude emotional analysis
`att/ingest/grains.py`	Grain extraction with librosa
`att/ingest/export.py`	Grain export with slug naming + CSV
`att/ingest/twilio_webhook.py`	VoIP webhook: TwiML greeting, recording download, ingest queue (disabled in relay mode)
`att/engine/player.py`	StemPlayer (two-zone beds + curated playback) + GrainEngine (legacy)
`att/engine/triggers.py`	Trigger system (manual triggers, bed-change timers)
`att/engine/importer.py`	Stem import validation + loudness normalization
`att/engine/mixer.py`	Voice/Mixer for grain-based playback (legacy)
`att/engine/selector.py`	Grain selection with sentiment/theme filtering (legacy)
`att/audio/output.py`	Ring buffer + sounddevice multi-channel output
`att/audio/processing.py`	DSP utilities: normalize, trim_silence, crossfade
`att/dashboard/routes.py`	HTML page routes (home, incoming, messages, stems, engine, settings)
`att/dashboard/api.py`	REST API endpoints (CRUD, engine controls, incoming ingest, archive, batch ops)

12.3 Scripts & Public Site

File	Purpose
`scripts/extract_phrases.py`	Extract curated samples from voicemails using fuzzy phrase matching against `phrases.txt`
`scripts/normalize_curated.py`	Normalize curated samples to -23dB RMS, resample to 48kHz
`scripts/export_public.py`	Build the public website: SQLite → data.json + MP3 audio + HTML
`scripts/retranscribe.py`	Re-run Whisper transcription on messages missing word timestamps
`public_src/index.html`	Public website source: constellation visualization + audio player
`phrases.txt`	Curated phrase list for sample extraction

13 Failure Modes and Mitigations

Mix Thread Crash

Risk: The real-time audio thread hits an exception and stops producing sound.

Mitigation: Safe mode activates automatically: all zones play the same bed stem at a fixed volume. The dashboard shows a red “SAFE MODE” indicator. Fix the underlying issue (usually a channel count mismatch between config and audio device) and restart.

Curated Zone Silence

Risk: The outside zone has no samples to play.

Mitigation: If audio/curated/ is empty, the curated loop logs a warning and retries every 10 seconds. Just add WAV files to the folder. If files exist but have the wrong sample rate, they’re skipped with a warning; run scripts/normalize_curated.py to fix them.
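The skip-on-wrong-sample-rate behavior described above can be sketched with the standard library `wave` module. This is an illustration, not the engine's scanner; `scan_curated` is a hypothetical name, and `wave` only reads PCM files, so other WAV encodings land in the skipped list here.

```python
import wave
from pathlib import Path


def scan_curated(directory, required_rate=48000):
    """Return (playable, skipped) WAV paths based on sample rate."""
    playable, skipped = [], []
    for path in sorted(Path(directory).glob("*.wav")):
        try:
            with wave.open(str(path), "rb") as wav:
                rate = wav.getframerate()
        except (wave.Error, EOFError):
            skipped.append(path)  # unreadable or not a PCM WAV
            continue
        (playable if rate == required_rate else skipped).append(path)
    return playable, skipped
```

An empty `playable` list corresponds to the "log a warning and retry" case, and a non-empty `skipped` list is the cue to run the normalize script.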

Volume Runaway

Risk: Gain drift or cross-zone modulation causing unexpected volume levels.

Mitigation: All gain changes are slew-limited (max 0.02 per update). Curated volume is capped at 50% of the zone’s configured volume. Gain drift stays within configurable bounds (0.6–1.0 by default). Zone volumes are always adjustable from the dashboard.

14 Daily Operational Protocol

Prerequisites

This protocol assumes the VoIP pipeline and engine are already running. See Quick Start for initial setup.

Collect — let phones/tapes run, VoIP pipeline processes automatically
Ingest — copy daily pack to OT SD card, capture new tape messages to OT raw folder
Compose — update bed + fragments in OT with strict timebox (60–90 min), save project version
Export — render stems to laptop, upload via dashboard, assign roles
Promote — verify stems play in engine, approve as current set
Stability check — reboot test: can you restart playback in < 5 minutes?

15 Quick Start

15.1 Single-Machine Mode

If running everything on one machine (local development, or if you don’t need the relay):

# Clone and install
git clone <repo-url> after_the_tone
cd after_the_tone
pip install -e .

# Configure
cp config.yaml.example config.yaml  # edit as needed
cp .env.example .env                # add API keys

# Run
python -m att.main
# or
uvicorn att.main:app --host 0.0.0.0 --port 8000

# Open dashboard
open http://localhost:8000

15.2 Relay + Studio Mode

15.2.1 Relay (Windows PC) Setup

From a fresh Windows machine:

# 1. Install Python + ngrok
winget install Python.Python.3.12
winget install Ngrok.Ngrok
ngrok config add-authtoken <your-token>

# 2. Clone the repo
git clone <repo-url> after_the_tone
cd after_the_tone\relay

# 3. Create a virtual environment and install deps
python -m venv venv
venv\Scripts\activate
pip install -r requirements.txt

# 4. Edit relay/config.yaml
#    - Set base_url to your ngrok static domain
#    - Verify output_dir and greetings_dir point to your Dropbox paths

# 5. Set Twilio credentials in relay/start.bat
#    Edit the TWILIO_ACCOUNT_SID and TWILIO_AUTH_TOKEN lines

# 6. Test manually (from the repo root; step 2 left you in the relay folder)
cd ..
relay\start.bat
# Call the phone number — should hear a greeting, then a beep
# Check ~/Dropbox/after_the_tone/incoming/ for the WAV + JSON sidecar

Greetings sync automatically from the Mac via ~/Dropbox/after_the_tone/greetings/.

15.2.2 Studio (Mac Laptop) Setup

cd after_the_tone

# Enable relay mode: set the Dropbox watch folder in config.yaml
# Under the audio: section, uncomment and set:
#   incoming_watch_dir: ~/Dropbox/after_the_tone/incoming

# Run the app as usual
python -m att.main

# Open dashboard — the "incoming" nav link will appear
open http://localhost:8000/incoming

When new recordings arrive via Dropbox, they appear on the Incoming page. Click ingest to run the full pipeline (transcribe → scrub → analyze → grains). The file moves to incoming/ingested/ and the message appears on the Messages page.
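The ingest-then-move flow can be sketched as a small orchestrator. This is not att/ingest/pipeline.py: the stage callables are stubs, `ingest_recording` is a hypothetical name, and moving the JSON sidecar along with the WAV is an assumption.

```python
import shutil
from pathlib import Path


def ingest_recording(wav_path, stages):
    """Run each stage in order, then move the WAV and sidecar to ingested/."""
    wav_path = Path(wav_path)
    result = {"source": wav_path.name}
    for name, stage in stages:
        # Each stage sees the file and the results of earlier stages.
        result[name] = stage(wav_path, result)
    done_dir = wav_path.parent / "ingested"
    done_dir.mkdir(exist_ok=True)
    for path in (wav_path, wav_path.with_suffix(".json")):
        if path.exists():
            shutil.move(str(path), str(done_dir / path.name))
    return result
```

Because the move happens only after every stage returns, a recording whose pipeline raises stays in the incoming folder and can simply be ingested again.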

15.2.3 Relay Auto-Start

The repo includes relay/start.bat which sets Twilio credentials, launches ngrok with the static domain, waits for the tunnel, then starts the relay server.

Task Scheduler setup (runs at user logon):

Trigger: “At log on” (specific user)
Action: Start a program → relay\start.bat
Settings: AllowStartIfOnBatteries, DontStopIfGoingOnBatteries, StartWhenAvailable

Limitation: Runs at login, not at boot. If the PC restarts and nobody logs in, the relay stays down. For true boot-time startup, use NSSM to wrap it as a Windows service.

ngrok domain: The relay uses a static free-tier ngrok domain, so the Twilio webhook URL survives ngrok restarts with no reconfiguration needed.

15.3 Campus Phone (Grandstream HT812)

15.3.1 Step 1: Twilio SIP Domain Setup

In the Twilio Console:

Go to Elastic SIP Trunking → SIP Domains (or Voice → SIP Domains)
Create a new SIP domain: after-the-tone.sip.twilio.com
Set Voice URL to: https://salena-crenate-coequally.ngrok-free.dev/twilio/voice (HTTP POST) — same webhook as PSTN calls
Under Credential Lists, create a new list and add credentials:
- Username: airstream, Password: (choose a strong password)
- Username: tape, Password: (choose a strong password) — only needed if using FXS 2 for tape path
Assign the credential list to the SIP domain for authentication

15.3.2 Step 2: Network Setup

Connect the GL-iNet Opal to the local WiFi (repeater mode):

Access Opal admin at 192.168.8.1
Internet → Repeater → scan and connect to WiFi network
The Opal now bridges WiFi to its LAN ports and runs its own WiFi AP
Plug the XR18 into Opal LAN port 1 and the HT812 into LAN port 2 (matching the On-Site Network diagram)
Verify iPad still controls XR18 via Mixing Station (local traffic, unaffected)

15.3.3 Step 3: HT812 Configuration

Access the HT812 web admin at its LAN IP (check Opal admin → connected clients, or try 192.168.8.x).

FXS Port 1 — Digital Path (auto-dial on pickup):

Setting	Value
SIP Server	`after-the-tone.sip.twilio.com`
SIP User ID	`airstream`
Authenticate ID	`airstream`
Authenticate Password	(your password from step 1)
Offhook Auto-Dial	`greeting`
Offhook Auto-Dial Delay	`0` (immediate — no dial tone, just pick up and go)
NAT Traversal	Keep-Alive

FXS Port 2 — Tape Path (optional, incoming calls):

Setting	Value
SIP Server	`after-the-tone.sip.twilio.com`
SIP User ID	`tape`
Authenticate ID	`tape`
Authenticate Password	(your password)

15.3.4 Step 4: Test

Check HT812 status page — FXS 1 should show Registered
Pick up the analog phone
You should hear a random greeting, then a beep
Leave a message, hang up
Check ~/Dropbox/after_the_tone/incoming/ for the WAV + JSON sidecar

Troubleshooting:

Can’t log into HT812 web admin: The V2 model does not use admin/admin. It has a unique password printed on a sticker on the bottom of the unit. Username is admin.
Not registered: Check SIP server address, credentials, and that the Opal has internet (try pinging from the Opal admin). Make sure SIP Registration is enabled on the Twilio SIP Domain.
Registered but no audio: Check NAT traversal setting — try switching between Keep-Alive and STUN
One-way audio: Usually a NAT issue — enable STUN server (stun.l.google.com:19302) in HT812 settings

15.4 First Run

Upload at least one stem via the dashboard (Stems page)
Assign it the bed role
The engine will start playing it immediately
Upload more stems with fragments and air roles for full layered playback