After the Tone

System Architecture for a Participatory Sound Installation

Author

JF

Published

March 17, 2026

1 Executive Summary

After the Tone is a participatory sound installation where visitors leave voicemail messages on a dedicated phone line. These messages are transcribed, tended, and transmuted into audio material that plays continuously in the exhibition space.

The system operates across three lanes:

  • Lane A (Analog): Analog phone on campus → tape recorder → eurorack/Octatrack for artifact-making and color
  • Lane B (Digital): VoIP → Python pipeline → grains + curated daily packs → Octatrack
  • Lane C (Installation): OT exports stems → computer plays zones, triggers fades, occasional mic injection

The computer runs Lane C: stable, unattended playback. It does not do creative sound design. It plays pre-composed stems, manages zones, and reacts subtly to the room.

2 System Architecture

%%{init: {'theme': 'base', 'themeVariables': {
  'primaryColor': '#1a1a1a',
  'primaryTextColor': '#d4d4d4',
  'primaryBorderColor': '#c49a6c',
  'lineColor': '#8a6d4a',
  'secondaryColor': '#2a2a2a',
  'tertiaryColor': '#111',
  'fontFamily': 'SF Mono, Fira Code, Consolas, monospace',
  'fontSize': '12px',
  'clusterBkg': '#1a1a1a',
  'clusterBorder': '#2a2a2a',
  'edgeLabelBackground': '#111'
}}}%%
flowchart TD
    subgraph A["Lane A — Analog"]
        direction TB
        Campus[Analog Phone on Campus]
        Tape[Tape Recorder]
        Euro[Eurorack]
        OT_A[Octatrack]
        Campus --> Tape --> Euro --> OT_A
    end

    subgraph B["Lane B — Digital"]
        direction TB
        VoIP[VoIP / Twilio]
        Pipeline[Python Pipeline]
        Grains[Grain Library]
        Packs[Daily Packs]
        OT_B[Octatrack]
        VoIP --> Pipeline --> Grains --> Packs --> OT_B
    end

    subgraph C["Lane C — Installation"]
        direction TB
        OT_C[OT Stem Exports]
        Engine[Python Engine]
        XR18[XR18 Mixer]
        Inside[Inside Speakers]
        Outside[Outside Speakers]
        OT_C --> Engine --> XR18
        XR18 --> Inside
        XR18 --> Outside
        Mic[Room Mic] -.->|modulation| Engine
    end

    OT_A --> OT_C
    OT_B --> OT_C

    style A fill:#1a1a1a,stroke:#9a5a5a,color:#d4d4d4
    style B fill:#1a1a1a,stroke:#c49a6c,color:#d4d4d4
    style C fill:#1a1a1a,stroke:#5a9a5a,color:#d4d4d4
    style Campus fill:#2a2a2a,stroke:#9a5a5a,color:#d4d4d4
    style Tape fill:#2a2a2a,stroke:#9a5a5a,color:#d4d4d4
    style Euro fill:#2a2a2a,stroke:#9a5a5a,color:#d4d4d4
    style OT_A fill:#2a2a2a,stroke:#9a5a5a,color:#d4d4d4
    style VoIP fill:#2a2a2a,stroke:#c49a6c,color:#d4d4d4
    style Pipeline fill:#2a2a2a,stroke:#c49a6c,color:#d4d4d4
    style Grains fill:#2a2a2a,stroke:#c49a6c,color:#d4d4d4
    style Packs fill:#2a2a2a,stroke:#c49a6c,color:#d4d4d4
    style OT_B fill:#2a2a2a,stroke:#c49a6c,color:#d4d4d4
    style OT_C fill:#2a2a2a,stroke:#5a9a5a,color:#d4d4d4
    style Engine fill:#2a2a2a,stroke:#5a9a5a,color:#d4d4d4
    style XR18 fill:#2a2a2a,stroke:#5a9a5a,color:#d4d4d4
    style Inside fill:#2a2a2a,stroke:#5a9a5a,color:#d4d4d4
    style Outside fill:#2a2a2a,stroke:#5a9a5a,color:#d4d4d4
    style Mic fill:#2a2a2a,stroke:#9a9a5a,color:#d4d4d4

2.1 Deployment: Relay + Studio Split

In practice, the system runs across two machines to separate the always-on phone line from the residency-bound audio engine.

%%{init: {'theme': 'base', 'themeVariables': {
  'primaryColor': '#1a1a1a',
  'primaryTextColor': '#d4d4d4',
  'primaryBorderColor': '#c49a6c',
  'lineColor': '#8a6d4a',
  'secondaryColor': '#2a2a2a',
  'tertiaryColor': '#111',
  'fontFamily': 'SF Mono, Fira Code, Consolas, monospace',
  'fontSize': '12px',
  'clusterBkg': '#1a1a1a',
  'clusterBorder': '#2a2a2a',
  'edgeLabelBackground': '#111'
}}}%%
flowchart LR
    subgraph Relay["Relay — Windows PC (home, always-on)"]
        direction TB
        Twilio[Twilio Webhook]
        Mode{mode?}
        Greeting[Play Random Greeting]
        Dual[Dial Tape via SIP +<br/>Record Digitally]
        Download[Download WAV]
        Save[Save WAV + JSON Sidecar]
        Twilio --> Mode
        Mode -->|record| Greeting --> Download --> Save
        Mode -->|dual| Dual --> Download
    end

    subgraph Sync["Dropbox"]
        Folder[after_the_tone/incoming/]
    end

    subgraph Studio["Studio — Mac Laptop (residency)"]
        direction TB
        Incoming[Incoming Page]
        Ingest[Ingest Button]
        Pipeline[Pipeline: Transcribe → Scrub → Analyze → Grains]
        Engine[Engine + Dashboard]
        Incoming --> Ingest --> Pipeline --> Engine
    end

    Save --> Folder --> Incoming

    style Relay fill:#1a1a1a,stroke:#c49a6c,color:#d4d4d4
    style Sync fill:#1a1a1a,stroke:#666,color:#d4d4d4
    style Studio fill:#1a1a1a,stroke:#5a9a5a,color:#d4d4d4
    style Twilio fill:#2a2a2a,stroke:#c49a6c,color:#d4d4d4
    style Mode fill:#2a2a2a,stroke:#c49a6c,color:#d4d4d4
    style Greeting fill:#2a2a2a,stroke:#c49a6c,color:#d4d4d4
    style Dual fill:#2a2a2a,stroke:#9a5a5a,color:#d4d4d4
    style Download fill:#2a2a2a,stroke:#c49a6c,color:#d4d4d4
    style Save fill:#2a2a2a,stroke:#c49a6c,color:#d4d4d4
    style Folder fill:#2a2a2a,stroke:#666,color:#d4d4d4
    style Incoming fill:#2a2a2a,stroke:#5a9a5a,color:#d4d4d4
    style Ingest fill:#2a2a2a,stroke:#5a9a5a,color:#d4d4d4
    style Pipeline fill:#2a2a2a,stroke:#5a9a5a,color:#d4d4d4
    style Engine fill:#2a2a2a,stroke:#5a9a5a,color:#d4d4d4

Why split? The laptop handles audio hardware (XR18, speakers, room mic) and needs stability. The phone line needs always-on internet + ngrok. If the laptop crashes, the phone line stays up. If the PC crashes, the installation keeps playing.

| | Relay (PC) | Studio (Mac) |
|---|---|---|
| Role | Answer calls, save recordings | Everything else |
| Dependencies | fastapi, uvicorn, httpx, twilio, pyyaml, python-multipart | Full att package |
| Internet | Required (Twilio + ngrok) | Only for Anthropic API calls |
| Audio hardware | None | XR18 + speakers + mic |
| Database | None | SQLite |
| Uptime | Always-on | During residency hours |

Sync: Dropbox folder after_the_tone/incoming/. Relay writes YYYYMMDD_HHMMSS_{CallSid}.wav + .json sidecar. Mac picks them up from the dashboard Incoming page. After ingest, files move to incoming/ingested/.

Greetings: MP3 files live in ~/Dropbox/after_the_tone/greetings/ on both machines, synced via Dropbox. The relay serves them to Twilio via a static mount at /greetings/{filename}. On each call, one is chosen at random. The relay prefers .mp3 (faster for Twilio to fetch), falls back to .wav. If no greeting files exist, the call skips straight to the beep with no TTS fallback.
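A minimal sketch of the greeting choice described above. It assumes the preference applies folder-wide (any MP3 beats any WAV); the function name `pick_greeting` and its signature are illustrative and not taken from the relay code.

```python
import random
from pathlib import Path
from typing import Optional


def pick_greeting(greetings_dir: Path,
                  rng: Optional[random.Random] = None) -> Optional[str]:
    """Return the filename of a random greeting, or None if there are none."""
    chooser = rng if rng is not None else random
    if not greetings_dir.is_dir():
        return None
    # Prefer MP3s; WAVs are only considered when no MP3 exists.
    for suffix in (".mp3", ".wav"):
        candidates = sorted(
            p.name for p in greetings_dir.iterdir()
            if p.suffix.lower() == suffix
        )
        if candidates:
            return chooser.choice(candidates)
    return None  # no greeting files: the call goes straight to the beep
```

The returned filename would then be appended to the /greetings/ static mount to form the URL handed to Twilio.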

JSON sidecar format (written alongside each WAV):

{
  "call_sid": "CA...",
  "phone_hash": "sha256[:16]",
  "duration": 45.2,
  "timestamp": "20250225_143022"
}
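For illustration, a sidecar in this shape could be written as follows. This is a sketch under assumptions: `write_sidecar` is a hypothetical helper, and hashing the caller's number string directly, with no salt, is only a guess at what `sha256[:16]` stands for.

```python
import hashlib
import json
from datetime import datetime
from pathlib import Path


def write_sidecar(incoming: Path, call_sid: str, caller_number: str,
                  duration: float, when: datetime) -> Path:
    """Write YYYYMMDD_HHMMSS_{CallSid}.json next to the WAV of the same stem."""
    stamp = when.strftime("%Y%m%d_%H%M%S")
    sidecar = {
        "call_sid": call_sid,
        # Assumption: first 16 hex characters of SHA-256 over the number string.
        "phone_hash": hashlib.sha256(caller_number.encode("utf-8")).hexdigest()[:16],
        "duration": duration,
        "timestamp": stamp,
    }
    path = incoming / f"{stamp}_{call_sid}.json"
    path.write_text(json.dumps(sidecar, indent=2), encoding="utf-8")
    return path
```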

Credentials: The relay reads TWILIO_ACCOUNT_SID and TWILIO_AUTH_TOKEN from environment variables. On the deployed PC, these are set in relay/start.bat (plaintext — acceptable for a home machine, not suitable for shared infrastructure).

ngrok: The relay uses a static free-tier ngrok domain (salena-crenate-coequally.ngrok-free.dev). This survives ngrok restarts, so the Twilio webhook URL never needs updating.

Single-machine mode still works: leave audio.incoming_watch_dir unset and the Twilio webhook mounts directly on the Mac as before.

2.2 Campus Phone: Grandstream HT812

An analog phone on campus connects to the same Twilio pipeline via a Grandstream HT812 ATA (Analog Telephone Adapter). When someone picks up the phone, it auto-dials Twilio over SIP — no keypad, no dialing, just pick up and talk.

%%{init: {'theme': 'base', 'themeVariables': {
  'primaryColor': '#1a1a1a',
  'primaryTextColor': '#d4d4d4',
  'primaryBorderColor': '#c49a6c',
  'lineColor': '#8a6d4a',
  'secondaryColor': '#2a2a2a',
  'tertiaryColor': '#111',
  'fontFamily': 'SF Mono, Fira Code, Consolas, monospace',
  'fontSize': '12px',
  'clusterBkg': '#1a1a1a',
  'clusterBorder': '#2a2a2a',
  'edgeLabelBackground': '#111'
}}}%%
flowchart LR
    subgraph Campus["Residency — On-Site Network"]
        direction TB
        Phone[Analog Phone]
        HT812[Grandstream HT812<br/>FXS → SIP]
        Opal[GL-iNet Opal<br/>WiFi bridge]
        XR18[XR18 Mixer]
        iPad[iPad<br/>Mixing Station]
        Phone -->|RJ11| HT812
        HT812 -->|ethernet| Opal
        XR18 -->|ethernet| Opal
        iPad -.->|WiFi| Opal
    end

    subgraph Cloud["Twilio"]
        SIP[SIP Domain]
        Webhook[Voice URL Webhook]
        SIP --> Webhook
    end

    subgraph Home["Home PC — Relay"]
        Relay[relay/server.py]
    end

    Opal -->|internet| SIP
    Webhook -->|ngrok| Relay

    style Campus fill:#1a1a1a,stroke:#5a9a5a,color:#d4d4d4
    style Cloud fill:#1a1a1a,stroke:#c49a6c,color:#d4d4d4
    style Home fill:#1a1a1a,stroke:#c49a6c,color:#d4d4d4
    style Phone fill:#2a2a2a,stroke:#9a5a5a,color:#d4d4d4
    style HT812 fill:#2a2a2a,stroke:#c49a6c,color:#d4d4d4
    style Opal fill:#2a2a2a,stroke:#5a9a5a,color:#d4d4d4
    style XR18 fill:#2a2a2a,stroke:#5a9a5a,color:#d4d4d4
    style iPad fill:#2a2a2a,stroke:#666,color:#d4d4d4
    style SIP fill:#2a2a2a,stroke:#c49a6c,color:#d4d4d4
    style Webhook fill:#2a2a2a,stroke:#c49a6c,color:#d4d4d4
    style Relay fill:#2a2a2a,stroke:#c49a6c,color:#d4d4d4

Network: The GL-iNet Opal travel router connects to the residency WiFi and bridges it to its LAN ports. The HT812 and XR18 share the Opal’s two ethernet ports. The iPad connects to the Opal’s WiFi AP for mixer control. Local traffic (iPad ↔︎ XR18) stays on the Opal’s subnet; only the HT812’s SIP traffic goes out to the internet. The Mac laptop is not in this path — it doesn’t need to be on or connected for the phone to work.

Call flow: Phone pickup → HT812 offhook auto-dial → SIP to Twilio → Twilio fires Voice URL webhook → relay on home PC → greeting + record → WAV to Dropbox → Mac ingests later.

Two ports, two paths:

| HT812 Port | Connected Device | SIP User | Behavior |
|---|---|---|---|
| FXS 1 | Analog phone | airstream | Offhook auto-dial → digital pipeline (greeting, record, grains) |
| FXS 2 | Tape answering machine (Code-a-Phone) | tape | Receives incoming SIP calls → tape records analog copy |

FXS 2 is optional. In standard mode (mode: record), only FXS 1 is used — callers hear a random MP3 greeting, leave a message, and the relay saves a digital WAV. FXS 2 sits idle.

2.2.1 Dual Recording Mode

In dual mode (mode: dual), every call simultaneously records to the tape machine AND digitally. The relay uses Twilio’s <Dial record="record-from-answer-dual"> to bridge the caller to the tape machine’s SIP endpoint while Twilio captures a parallel digital recording.

Caller → Twilio → relay (dual mode)
                    ├── SIP dial → HT812 FXS 2 → Code-a-Phone (analog tape)
                    └── Twilio records call digitally → WAV to Dropbox

The caller hears the tape machine’s physical greeting (not the MP3) and leaves a message on the cassette. Meanwhile, Twilio captures the entire call as a WAV and the relay saves it to Dropbox as usual. You get two copies: one with analog tape character, one clean digital.

Relay config for dual mode:

mode: dual
tape_sip: sip:tape@after-the-tone.sip.twilio.com

Recording behavior:

  • No silence cutoff: timeout=0 disables Twilio’s default 5-second silence detection. Recordings only end when the caller hangs up or hits the max length.
  • Max recording length: 3600 seconds (1 hour). Callers are never cut off for going long.
  • Download timeout: 300 seconds, to handle large recordings over slow connections.
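The two call modes can be illustrated as TwiML. The sketch below builds the XML with the Python standard library so it runs without extra packages; the relay lists the twilio package as a dependency and may construct its responses differently. `build_twiml` is a hypothetical name, and the attribute values are the ones quoted in this section.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def build_twiml(mode: str, greeting_url: Optional[str] = None,
                tape_sip: Optional[str] = None) -> str:
    """Return a TwiML document for 'record' or 'dual' mode."""
    response = ET.Element("Response")
    if mode == "dual":
        if not tape_sip:
            raise ValueError("dual mode needs a tape_sip URI")
        # Bridge the caller to the tape machine while Twilio records the call.
        dial = ET.SubElement(response, "Dial", {"record": "record-from-answer-dual"})
        ET.SubElement(dial, "Sip").text = tape_sip
    elif mode == "record":
        if greeting_url:  # no greeting file: go straight to recording
            ET.SubElement(response, "Play").text = greeting_url
        # Values from this section: no silence cutoff, one-hour maximum.
        ET.SubElement(response, "Record", {"timeout": "0", "maxLength": "3600"})
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return ET.tostring(response, encoding="unicode")


print(build_twiml("dual", tape_sip="sip:tape@after-the-tone.sip.twilio.com"))
```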

3 Lane Definitions

3.1 Lane A: Analog Artifact

Purpose: Create textured, degraded audio material with analog character.

| | |
|---|---|
| Input | Messages left on an analog phone on campus, recorded directly to tape |
| Process | Tape recordings processed through eurorack (filters, reverb, saturation), sampled into Octatrack |
| Output | Processed stems with analog warmth and tape artifacts |
| Rule | All creative decisions happen here, not on the computer |

3.2 Lane B: Digital Indexed

Purpose: Systematically process voicemails into an indexed, searchable grain library.

| | |
|---|---|
| Input | VoIP recordings via Twilio webhook (relay PC or local) |
| Process | Transcribe (Whisper) → scrub PII (spaCy) → analyze emotions (Claude) → segment into grains (librosa) |
| Output | Tagged grain library + curated daily sample packs for Octatrack |
| Rule | Pipeline is automated; curation happens when selecting daily packs |
| Deployment | Twilio answering runs on the relay PC; pipeline processing runs on the Mac (see Relay + Studio Split) |

3.3 Lane C: Installation Playback

Purpose: Reliable, deterministic playback in the exhibition space.

| | |
|---|---|
| Input | Pre-composed stems (from DAW, Octatrack, or any source) + curated voicemail clips |
| Process | Two-zone playback engine: beds loop inside, curated clips play outside, cross-zone intensity modulation |
| Output | 3-channel audio to XR18 → speakers (stereo inside + mono outside) |
| Rule | No creative sound design on the computer. Stable unattended operation. |

4 Lane B — Digital Pipeline

A deep-dive into the automated VoIP ingest pipeline that powers Lane B.

%%{init: {'theme': 'base', 'themeVariables': {
  'primaryColor': '#1a1a1a',
  'primaryTextColor': '#d4d4d4',
  'primaryBorderColor': '#c49a6c',
  'lineColor': '#8a6d4a',
  'secondaryColor': '#2a2a2a',
  'tertiaryColor': '#111',
  'fontFamily': 'SF Mono, Fira Code, Consolas, monospace',
  'fontSize': '12px',
  'clusterBkg': '#1a1a1a',
  'clusterBorder': '#2a2a2a',
  'edgeLabelBackground': '#111'
}}}%%
flowchart LR
    VoIP[VoIP Recording] --> Transcribe
    Transcribe --> Scrub[PII Scrub]
    Scrub --> Analyze[Emotional Analysis]
    Analyze --> Grains[Extract Grains]
    Grains --> Ready

    style VoIP fill:#2a2a2a,stroke:#c49a6c,color:#d4d4d4
    style Transcribe fill:#2a2a2a,stroke:#c49a6c,color:#d4d4d4
    style Scrub fill:#2a2a2a,stroke:#c49a6c,color:#d4d4d4
    style Analyze fill:#2a2a2a,stroke:#c49a6c,color:#d4d4d4
    style Grains fill:#2a2a2a,stroke:#c49a6c,color:#d4d4d4
    style Ready fill:#2a2a2a,stroke:#5a9a5a,color:#d4d4d4

Status state machine: new → transcribed → scrubbed → analyzed → ready

Key file: att/ingest/pipeline.py — orchestrates the full sequence: transcribe → scrub → analyze → extract grains.
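The status progression can be sketched as a small driver loop. This is an assumption-laden illustration, not the contents of pipeline.py: `run_pipeline`, the `steps` mapping, and the dict-based message are all hypothetical, while the status names (including the terminal "empty" and "error" values) are the ones this document uses.

```python
from typing import Callable, Dict, List

ORDER: List[str] = ["new", "transcribed", "scrubbed", "analyzed", "ready"]


def run_pipeline(message: Dict, steps: Dict[str, Callable[[Dict], None]]) -> Dict:
    """Advance a message one status at a time; stop on 'empty' or 'error'.

    `steps` maps each target status to the function that produces it.
    """
    while message["status"] in ORDER[:-1]:
        current = message["status"]
        target = ORDER[ORDER.index(current) + 1]
        try:
            steps[target](message)
        except Exception as exc:
            message["status"] = "error"
            message["error"] = str(exc)
            break
        if message["status"] == current:
            # The step did not set a terminal status (such as "empty") itself.
            message["status"] = target
    return message
```

Because each step is only reachable from the status before it, analysis cannot run on a message that has not been scrubbed.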

Library: faster-whisper (local inference, no cloud upload)

Key file: att/ingest/transcribe.py

Config (whisper section):

| Parameter | Default | Description |
|---|---|---|
| model_size | small | Whisper model size (tiny/base/small/medium/large) |
| device | auto | Compute device (auto/cpu/cuda) |
| compute_type | int8 | Quantization for faster inference |
| language | null | Language code or null for auto-detect |

Output to DB:

  • transcripts.raw_text — full transcription
  • transcripts.language — detected language code
  • transcripts.confidence — model confidence score

Privacy: Audio never leaves the machine. All transcription runs locally.

Failure: Empty audio → status set to "empty", pipeline aborts for that message.

Important: Why this step exists

People call a voicemail line and say whatever is on their mind. They say their names. They say their mother’s name. They leave phone numbers, addresses, confessions. This project asks people to be vulnerable, and the minimum obligation in return is that their identifying information never leaves the machine they trusted it to. PII scrubbing is not a feature — it is the condition under which this project is ethical.

Libraries: spaCy (en_core_web_sm) + custom regex patterns — both run locally, no network calls.

Key file: att/ingest/scrub.py

4.0.1 Where it sits in the pipeline

Scrubbing is step 2 of the ingest pipeline (att/ingest/pipeline.py). It runs after local transcription and before any text is sent to an external API. The ordering is enforced by the pipeline itself — there is no code path that sends unscrubbed text to Claude. The raw transcript and the original audio never leave the local machine.

4.0.2 Two-pass approach

The scrubber makes two passes over the transcript, then deduplicates overlapping detections (keeping the longer match):

Pass 1 — Regex patterns (deterministic, high-precision):

| Pattern | Examples caught | Placeholder |
|---|---|---|
| PHONE | (555) 123-4567, +1-555-123-4567 | [PHONE] |
| EMAIL | name@example.com | [EMAIL] |
| SSN | 123-45-6789 | [SSN] |
| ADDRESS | 1234 Main Street, 56 Oak Ave | [ADDRESS] |

Pass 2 — spaCy NER (statistical, catches what regex misses):

| Entity type | What it catches | Placeholder |
|---|---|---|
| PERSON | Personal names (“my daughter Sarah”) | [NAME] |
| ORG | Organizations, employers, schools | [ORGANIZATION] |
| GPE / LOC / FAC | Cities, states, landmarks, buildings | [LOCATION] |

The two passes are complementary: regex catches structured identifiers that NER models often miss (phone numbers, SSNs), while spaCy catches names and places that no regex can reliably match.
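As a rough illustration of Pass 1, the sketch below runs four regular expressions and returns spans in the same shape as the pii_entities records. The patterns are illustrative guesses that cover the examples in the table above; the actual expressions in att/ingest/scrub.py are not shown in this document and are likely broader. `regex_pass` is a hypothetical name.

```python
import re
from typing import Dict, List

# Hypothetical patterns, written to match the examples in the table above.
PATTERNS = {
    "PHONE": re.compile(
        r"(?<!\d)(?:\+?1[-.\s]?)?(?:\(\d{3}\)\s?|\d{3}[-.\s])?\d{3}[-.\s]\d{4}(?!\d)"
    ),
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "SSN": re.compile(r"(?<!\d)\d{3}-\d{2}-\d{4}(?!\d)"),
    "ADDRESS": re.compile(
        r"\b\d{1,5}\s+(?:[A-Z][a-z]+\s+){1,3}"
        r"(?:Street|St|Avenue|Ave|Road|Rd|Drive|Dr|Boulevard|Blvd|Lane|Ln)\b"
    ),
}


def regex_pass(text: str) -> List[Dict]:
    """Return every regex detection as {text, label, start, end}, by position."""
    found = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            found.append({
                "text": match.group(),
                "label": label,
                "start": match.start(),
                "end": match.end(),
            })
    return sorted(found, key=lambda e: e["start"])
```

The spaCy pass would contribute entries in the same shape, built from each entity's text, label, and character offsets.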

4.0.3 What the scrubbed text looks like

A raw transcript like:

“Hi, this is Sarah Chen calling from 1455 Oak Drive. You can reach me at 555-0142. I just wanted to say I miss you, Mom.”

becomes:

“Hi, this is [NAME] calling from [ADDRESS]. You can reach me at [PHONE]. I just wanted to say I miss you, Mom.”

The emotional content — the part that matters for this project — is preserved. The identifying details are not.

4.0.4 Overlap deduplication

When regex and spaCy flag the same span (e.g., both detect “Sarah Chen”), the scrubber sorts by position and keeps the longer match to avoid double-replacement or garbled output.
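One way to implement this rule is sketched below, assuming detections arrive as dicts with `label`, `start`, and `end` keys (the shape used in pii_entities). The function names `dedupe` and `redact` are hypothetical; the placeholder mapping comes from the two tables above.

```python
from typing import Dict, List

PLACEHOLDERS = {
    "PHONE": "[PHONE]", "EMAIL": "[EMAIL]", "SSN": "[SSN]", "ADDRESS": "[ADDRESS]",
    "PERSON": "[NAME]", "ORG": "[ORGANIZATION]",
    "GPE": "[LOCATION]", "LOC": "[LOCATION]", "FAC": "[LOCATION]",
}


def dedupe(entities: List[Dict]) -> List[Dict]:
    """Sort by position; when two spans overlap, keep the longer one."""
    kept: List[Dict] = []
    # Ties on start are ordered longest-first so the longer span is seen first.
    for ent in sorted(entities, key=lambda e: (e["start"], e["start"] - e["end"])):
        if kept and ent["start"] < kept[-1]["end"]:  # overlaps the previous span
            prev = kept[-1]
            if ent["end"] - ent["start"] > prev["end"] - prev["start"]:
                kept[-1] = ent
        else:
            kept.append(ent)
    return kept


def redact(text: str, entities: List[Dict]) -> str:
    """Replace each surviving span with its placeholder, left to right."""
    out, cursor = [], 0
    for ent in dedupe(entities):
        out.append(text[cursor:ent["start"]])
        out.append(PLACEHOLDERS[ent["label"]])
        cursor = ent["end"]
    out.append(text[cursor:])
    return "".join(out)
```

Replacing left to right from a cursor, rather than calling a string replace per entity, avoids offsets shifting after the first substitution.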

4.0.5 Output to DB

  • transcripts.scrubbed_text — the redacted version of the transcript, with placeholders replacing every detected entity
  • transcripts.pii_entities — a JSON array recording what was found and where: {"text": "Sarah Chen", "label": "PERSON", "start": 13, "end": 23}

The original entity text is stored only in this local DB field so that scrubbing can be audited and improved. It is never sent to any external service.

4.0.6 What reaches the outside world

Nothing in this pipeline contacts the network except the Claude API call in the next step (emotional analysis), and that call receives only the scrubbed text. The code path is explicit — analyze_message() takes a scrubbed_text parameter; the raw transcript is not passed.

To be concrete about what never leaves the machine:

  • The original audio recording
  • The raw Whisper transcript
  • Any detected PII entities (names, phone numbers, addresses, SSNs)

4.0.7 Config

None — the regex patterns and spaCy model are hardcoded. This is intentional: PII scrubbing should not be something you can accidentally misconfigure or turn off. The patterns are conservative (biased toward over-scrubbing).

4.0.8 Limitations

This is a best-effort system, not a legal guarantee. spaCy’s en_core_web_sm is a small model optimized for speed over recall. It will miss some names, especially unusual ones or those in non-English speech. The regex patterns cover US-format identifiers. If a caller says “I live on the corner of Fifth and Main” without a street number, the address regex won’t catch it (though spaCy’s GPE/LOC may).

The design principle is: when in doubt, scrub it. A false positive (over-redacting) costs nothing meaningful — the emotional analysis still works with [NAME] placeholders. A false negative (leaking a name to an API) is the failure mode that matters.

Service: Anthropic Claude API (async)

Key file: att/ingest/analyze.py

Config (analysis section):

| Parameter | Default | Description |
|---|---|---|
| model | claude-sonnet-4-20250514 | Claude model for analysis |
| themes | 17 emotions | Whitelist of valid theme labels |

Output to DB:

  • analysis.sentiment — primary emotional category
  • analysis.intensity — float [0, 1]
  • analysis.themes — list of matched theme labels
  • analysis.summary — one-sentence description
  • analysis.slug — filename-safe identifier
  • analysis.narrative_thread — thematic grouping

Validation: Intensity clamped to [0, 1], themes filtered against the configured whitelist, handles markdown fences in Claude’s JSON response.

Failure: Missing API key → exception. Invalid JSON response → exception. Both set message status to "error".
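The validation rules above could look roughly like this. It is a sketch, not the code in analyze.py: `parse_analysis` is a hypothetical name, and defaulting a missing intensity to 0.0 is an assumption. The fence marker is built from single backticks only to keep this example easy to embed.

```python
import json
import re
from typing import Dict, Set

FENCE = "`" * 3  # a markdown code fence
FENCED = re.compile(
    r"^" + FENCE + r"(?:json)?\s*(.*?)\s*" + FENCE + r"$", re.DOTALL
)


def parse_analysis(raw: str, allowed_themes: Set[str]) -> Dict:
    """Parse the model's JSON reply and enforce the documented constraints."""
    text = raw.strip()
    fenced = FENCED.match(text)
    if fenced:  # reply was wrapped in a markdown fence
        text = fenced.group(1)
    data = json.loads(text)  # invalid JSON raises; the caller marks the message "error"
    # Clamp intensity to [0, 1].
    data["intensity"] = min(1.0, max(0.0, float(data.get("intensity", 0.0))))
    # Drop any theme that is not on the configured whitelist.
    data["themes"] = [t for t in data.get("themes", []) if t in allowed_themes]
    return data
```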

Libraries: librosa (silence detection, spectral features) + soundfile

Key file: att/ingest/grains.py

Config (grains section):

| Parameter | Default | Description |
|---|---|---|
| min_grain_secs | 2.0 | Minimum grain duration |
| max_grain_secs | 10 | Maximum grain duration |
| silence_threshold_db | 30.0 | Silence detection threshold |
| min_silence_gap_secs | 0.3 | Minimum gap between grains |

Algorithm:

  1. Detect silence intervals in audio
  2. Merge intervals closer than min_silence_gap_secs
  3. Merge segments shorter than min_grain_secs
  4. Split segments longer than max_grain_secs
  5. Apply 10ms fade in/out to each grain
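Steps 2 to 4 only manipulate time spans, so they can be sketched without any audio library. The code below assumes step 1 yields the non-silent (start, end) spans in seconds (for example, intervals from librosa.effects.split converted from samples), that short spans are folded into a neighbour together with the silence between them, and that long spans are split evenly. All function names are hypothetical, and the fade in step 5 is omitted.

```python
import math
from typing import List, Tuple

Span = Tuple[float, float]  # (start, end) in seconds


def merge_close(spans: List[Span], min_gap: float) -> List[Span]:
    """Step 2: join spans separated by less than min_gap seconds."""
    merged: List[Span] = []
    for start, end in sorted(spans):
        if merged and start - merged[-1][1] < min_gap:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def absorb_short(spans: List[Span], min_len: float) -> List[Span]:
    """Step 3: fold spans shorter than min_len into a neighbour."""
    out: List[Span] = []
    for start, end in spans:
        if out and out[-1][1] - out[-1][0] < min_len:
            out[-1] = (out[-1][0], end)  # previous span too short: extend it forward
        else:
            out.append((start, end))
    if len(out) > 1 and out[-1][1] - out[-1][0] < min_len:
        _, tail_end = out.pop()  # trailing short span: fold it backward
        out[-1] = (out[-1][0], tail_end)
    return out


def split_long(spans: List[Span], max_len: float) -> List[Span]:
    """Step 4: cut spans longer than max_len into equal pieces."""
    out: List[Span] = []
    for start, end in spans:
        pieces = max(1, math.ceil((end - start) / max_len))
        step = (end - start) / pieces
        out.extend((start + i * step, start + (i + 1) * step) for i in range(pieces))
    return out


def segment(spans: List[Span], min_grain_secs: float = 2.0,
            max_grain_secs: float = 10.0,
            min_silence_gap_secs: float = 0.3) -> List[Span]:
    spans = merge_close(spans, min_silence_gap_secs)
    spans = absorb_short(spans, min_grain_secs)
    spans = split_long(spans, max_grain_secs)
    # Audio that never reaches the minimum length yields no grains.
    return [s for s in spans if s[1] - s[0] >= min_grain_secs]
```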

Spectral features per grain:

  • spectral_centroid (Hz) — brightness measure
  • rms_energy — perceptual loudness

Output: Grain WAV files in audio/grains/, DB rows in grains table.

Failure: Audio shorter than min_grain_secs → empty grain list. No speech detected → empty grain list.

Key file: att/ingest/export.py

Naming convention: {sentiment}/{slug}_{grain_num:02d}.wav (e.g. grief/miss_you_mom_01.wav)

Directory structure: Sentiment subfolders + grains.csv metadata file.

CSV columns:

filename, folder, slug, sentiment, themes, intensity, duration, message_id, grain_index, start, end, spectral_centroid, rms_energy, summary

Config: Uses audio.storage_path for the export directory root.
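The naming convention and manifest can be sketched as below. `grain_relpath` and `write_manifest` are hypothetical helpers, and joining themes with a semicolon inside one CSV cell is an assumption; the column list is the one given above.

```python
import csv
from pathlib import Path
from typing import Dict, Iterable

COLUMNS = [
    "filename", "folder", "slug", "sentiment", "themes", "intensity",
    "duration", "message_id", "grain_index", "start", "end",
    "spectral_centroid", "rms_energy", "summary",
]


def grain_relpath(sentiment: str, slug: str, grain_num: int) -> str:
    """Build {sentiment}/{slug}_{grain_num:02d}.wav."""
    return f"{sentiment}/{slug}_{grain_num:02d}.wav"


def write_manifest(export_root: Path, rows: Iterable[Dict]) -> Path:
    """Write grains.csv at the export root; missing columns are left blank."""
    export_root.mkdir(parents=True, exist_ok=True)
    path = export_root / "grains.csv"
    with path.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=COLUMNS)
        writer.writeheader()
        for row in rows:
            # Assumption: the list of themes is flattened into one cell.
            writer.writerow({**row, "themes": ";".join(row.get("themes", []))})
    return path
```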

In addition to the automatic silence-based grain extraction above, a curated phrase extraction tool lets you search for specific spoken phrases across all transcripts and cut them out precisely.

Key file: scripts/extract_phrases.py

How it works:

  1. You write a text file (phrases.txt) with one phrase per line — the phrases that struck you while listening
  2. The tool fuzzy-matches each phrase against all transcripts’ word-level timestamps (generated by Whisper)
  3. Matching audio segments are extracted with configurable padding (default 1.5s before and after) and 1.5s crossfades
  4. Output: clean WAV files in audio/curated/, ready for the outside zone

# Preview matches without extracting
python scripts/extract_phrases.py phrases.txt --dry-run

# Extract with default settings
python scripts/extract_phrases.py phrases.txt

# Custom padding and match threshold
python scripts/extract_phrases.py phrases.txt --pad-before 0.5 --pad-after 0.5 --min-score 75

# Keep all non-overlapping matches (not just the best)
python scripts/extract_phrases.py phrases.txt --all-matches

# Then normalize the output for consistent playback volume
python scripts/normalize_curated.py
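The matching in step 2 might work along these lines. This sketch slides a window over the word-level timestamps and scores each window with the standard library's difflib, which stands in here for whatever fuzzy-matching library the script actually uses. The 0 to 100 score scale mirrors the --min-score flag but is an assumption, as are the `find_phrase` and `padded` names and the `word`/`start`/`end` keys.

```python
import re
from difflib import SequenceMatcher
from typing import Dict, List, Optional, Tuple


def _norm(text: str) -> str:
    """Lowercase, drop punctuation, collapse whitespace."""
    return " ".join(re.sub(r"[^a-z0-9' ]+", " ", text.lower()).split())


def find_phrase(phrase: str, words: List[Dict],
                min_score: float = 75.0) -> Optional[Tuple[float, float, float]]:
    """Return (score, start, end) of the best-matching window, or None."""
    target = _norm(phrase)
    n = len(target.split())
    if n == 0:
        return None
    best = None
    for i in range(len(words) - n + 1):
        window = words[i:i + n]
        candidate = _norm(" ".join(w["word"] for w in window))
        score = 100.0 * SequenceMatcher(None, target, candidate).ratio()
        if score >= min_score and (best is None or score > best[0]):
            best = (score, window[0]["start"], window[-1]["end"])
    return best


def padded(start: float, end: float,
           pad_before: float = 1.5, pad_after: float = 1.5) -> Tuple[float, float]:
    """Widen a match by the padding, without going before the file start."""
    return max(0.0, start - pad_before), end + pad_after
```

A fixed-size window keeps the sketch short; it will miss matches where the speaker inserted or dropped a word, which a real matcher would need to tolerate.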

Tip: For Fellow Artists: Building a Sample Library from Field Recordings

If you’re working with field recordings, interviews, or any spoken-word material and want to build a usable sample library, here’s the workflow this project uses. The tools are open and the approach generalizes to any audio corpus.

The philosophy: you still listen to everything. The machine doesn’t decide what matters — you do. It just makes the extraction fast once you’ve made your choices.

  1. Listen first — Go through every recording. Apply your own creative judgment. Note what resonates, what surprises you, what you want to use.

  2. Transcribe — Whisper runs locally (no cloud upload) and produces word-level timestamps. Every word gets a precise start and end time. Run scripts/retranscribe.py if your existing transcripts are missing word timestamps.

  3. Note the phrases that struck you — As you listen, write down the phrases in a plain text file (one per line). These are your curatorial choices — the moments that matter to you.

  4. Let the tool do the cutting — The fuzzy matcher (scripts/extract_phrases.py) finds your chosen phrases in the transcripts, even when speakers mumbled or trailed off. It handles the tedious part: locating the exact timecode and cutting the audio with padding and crossfades. Output: clean WAV files ready to drop into a sampler, DAW, or this installation’s curated zone.

  5. Normalize — Batch-normalize all clips to consistent loudness (scripts/normalize_curated.py targets -23 dB RMS with a -1 dB peak ceiling) so the library plays back at even levels.

  6. Organize — The export tool (att/ingest/export.py) sorts grains into folders by emotional category (grief/, hope/, silence/) with descriptive filenames derived from content analysis. A CSV manifest ties everything together: filename, sentiment, themes, intensity, duration, spectral features.

Frame it this way: you do the listening, the choosing, the curating. The machine handles the cutting and organizing.

Tip: For Filmmakers: Scanning Documentary Footage with Transcription

If you’re working with hours of interview footage and hunting for the three sentences that tell the story, this toolchain can help.

  1. The problem — Scrubbing through timelines is slow. You know the kind of thing you’re looking for, but finding it means watching everything again.

  2. Word-level timestamps — Whisper generates timecoded transcripts where every word has a start and end time. These can be exported as SRT subtitle files or used programmatically to jump straight to any moment.

  3. Phrase search — Write down the phrases or ideas you’re looking for in a text file. The fuzzy matcher (scripts/extract_phrases.py) scans all transcripts and returns timecodes — even when speakers trailed off or used slightly different words than you remembered.

  4. Auto-extract — The extraction tool cuts audio at those timecodes with padding and fades. Build a rough cut from a text file instead of scrubbing through hours of footage.

Instead of scrubbing through timelines, describe what you’re looking for in words. The machine finds it for you.

Important: Privacy and Responsible AI Use

If you’re going to use AI with recordings of real people, here’s how to think about doing it responsibly. This section documents the ethical framework built into this project — not as a legal disclaimer, but as a set of design decisions that encode a responsibility to the people who shared their voices.

4.0.9 Nothing leaves the machine by default

All transcription (Whisper) runs locally. Original audio, raw transcripts, and detected personal information never touch the internet.

4.0.10 PII scrubbing before any AI analysis

Before the system asks Claude to analyze emotional content, it strips names, phone numbers, addresses, SSNs, and organizations using a two-pass approach:

  • Pass 1: Deterministic regex patterns — catches phone numbers, emails, SSNs, street addresses
  • Pass 2: spaCy NER — catches personal names, organizations, locations

The design bias is explicit: over-redacting is acceptable, leaking PII is not. A false positive (replacing “Mom” with [NAME]) costs nothing — the emotional analysis still works. A false negative (sending someone’s name to an external API) is the failure mode that matters.

4.0.11 Only scrubbed text reaches Claude

The code path is explicit: scrub_pii() runs first, analyze_message() only receives the scrubbed output. There is no code path that sends raw transcript to an external API. The pipeline enforces this ordering — it’s not a setting you can turn off.

4.0.12 Audit trail

Detected PII entities are stored locally (in the pii_entities field) so scrubbing accuracy can be reviewed and improved. This data never leaves the machine.

4.0.13 Limitations (honest)

  • spaCy’s en_core_web_sm is a small model optimized for speed. It will miss some names, especially unusual ones or those in non-English speech.
  • Regex patterns cover US-format identifiers. International phone numbers and address formats may slip through.
  • If a caller says “I live on the corner of Fifth and Main” without a street number, the address regex won’t catch it (though spaCy’s location detection may).
  • The system errs on the side of caution but isn’t perfect.

4.0.14 Voice as property

A person’s voice is theirs. AI models trained on voice recordings can later be used to generate synthetic speech that sounds like that person — deepfakes. Exposing raw voice audio online means it could be scraped and used for voice cloning without consent. This is why:

  • Original audio files never leave the local machine
  • The public constellation website streams playback but does not allow downloading, sharing, or saving audio files
  • Only scrubbed text (not audio) is sent to the Claude API for analysis
  • The voice recordings exist for listening in the moment, not for extraction

4.0.15 Why this matters

People called a voicemail line and shared personal stories. Some left their names. Some left phone numbers. Some said things they might not want the world to hear. Using AI to analyze that content comes with a responsibility to protect their privacy and their voice. The architecture encodes that responsibility — it’s not an afterthought.

5 Preparing Sound for the Installation

A stem is a composed piece of audio that the installation will loop on the inside speakers. It can come from anywhere — a DAW, a hardware sampler, a modular synthesizer, field recordings processed through effects. The only requirements are technical: the engine needs files in a specific format to play them reliably.

5.1 What Makes a Valid Stem

| Property | Requirement |
|---|---|
| Format | WAV (PCM 24-bit) |
| Sample rate | 48kHz (files at other rates are rejected on import) |
| Channels | Stereo (the engine preserves stereo throughout the signal path) |
| Loudness | Normalized to -14 LUFS on import (the importer does this automatically) |
| Duration | 1 second minimum, 1 hour maximum |

Longer stems are better — 5 to 30 minutes avoids obvious looping. The engine streams from disk with a 10-second read-ahead buffer, so even very long files work without loading into memory.
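To check a file against these requirements before uploading, something like the following would do. It is a sketch using Python's built-in wave module, not the importer's own validation; `stem_problems` is a hypothetical name. Note that some tools write 24-bit files with a WAVE_FORMAT_EXTENSIBLE header, which older Python versions of the wave module refuse to open.

```python
import wave
from pathlib import Path
from typing import List


def stem_problems(path: Path) -> List[str]:
    """Return a list of reasons the file is not a valid stem (empty if fine)."""
    problems: List[str] = []
    with wave.open(str(path), "rb") as wav:
        if wav.getsampwidth() != 3:
            problems.append(f"expected 24-bit PCM, got {8 * wav.getsampwidth()}-bit")
        if wav.getframerate() != 48_000:
            problems.append(f"expected 48000 Hz, got {wav.getframerate()} Hz")
        if wav.getnchannels() != 2:
            problems.append(f"expected stereo, got {wav.getnchannels()} channel(s)")
        seconds = wav.getnframes() / wav.getframerate()
        if not 1.0 <= seconds <= 3600.0:
            problems.append(f"duration {seconds:.1f}s is outside 1s to 1h")
    return problems
```

Loudness is not checked here because the importer normalizes it automatically.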

5.2 How to Upload

Via dashboard: Go to the Stems page, drag-drop a WAV file onto the upload zone. The importer validates format, normalizes loudness, and adds it to the stem library. Set a mood tag for diversity (the engine avoids repeating the same mood back-to-back).

Via API:

curl -X POST -F file=@my_stem.wav localhost:8000/api/stems/upload

5.3 Generating Curated Samples

The outside zone plays short voicemail clips from audio/curated/. To build or refresh this library from the voicemail corpus:

# 1. Edit phrases.txt with phrases you want to extract (one per line)
# 2. Extract matching audio from transcripts
python scripts/extract_phrases.py phrases.txt

# 3. Normalize to consistent volume (-23 dB RMS)
python scripts/normalize_curated.py

Output goes to audio/curated/. The outside zone picks up new files automatically on its next scan cycle.
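The normalization target can be expressed as a gain calculation. The sketch below works on a plain list of float samples in the range -1 to 1 and assumes the peak ceiling is enforced by reducing the gain rather than by limiting; how scripts/normalize_curated.py actually handles peaks is not stated here, and `normalize_clip` is a hypothetical name.

```python
import math
from typing import List, Sequence


def to_db(amplitude: float) -> float:
    return 20.0 * math.log10(amplitude)


def rms(samples: Sequence[float]) -> float:
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def normalize_clip(samples: Sequence[float], target_rms_db: float = -23.0,
                   peak_ceiling_db: float = -1.0) -> List[float]:
    """Scale samples toward the RMS target without exceeding the peak ceiling."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # silence: nothing to scale
    gain_db = target_rms_db - to_db(rms(samples))
    # Assumption: if the target would push peaks past the ceiling, back off.
    gain_db = min(gain_db, peak_ceiling_db - to_db(peak))
    gain = 10.0 ** (gain_db / 20.0)
    return [s * gain for s in samples]
```

With this approach a clip dominated by one loud transient ends up quieter than -23 dB RMS, which is the trade-off of not using a limiter.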

6 How the Installation Plays Sound

Walk into the exhibition space and you hear two things: a continuous wash of ambient sound from the speakers inside, and occasional fragments of actual voicemail messages drifting from the speakers outside. The two zones breathe together — when the inside gets louder, the outside responds. The sound is always changing, but slowly, like weather.

Under the hood, this is a two-zone playback engine (att/engine/player.py). Each zone has its own independent playback system, and they communicate through a simple intensity signal.

%%{init: {'theme': 'base', 'themeVariables': {
  'primaryColor': '#1a1a1a',
  'primaryTextColor': '#d4d4d4',
  'primaryBorderColor': '#c49a6c',
  'lineColor': '#8a6d4a',
  'secondaryColor': '#2a2a2a',
  'tertiaryColor': '#111',
  'fontFamily': 'SF Mono, Fira Code, Consolas, monospace',
  'fontSize': '12px',
  'clusterBkg': '#1a1a1a',
  'clusterBorder': '#2a2a2a',
  'edgeLabelBackground': '#111'
}}}%%
flowchart LR
    subgraph Inside["Inside Zone — Stems"]
        direction TB
        Stems[(Stem Library)]
        Select[Weighted Random<br/>+ Mood Diversity]
        Layer[Playback Layer<br/>stream from disk]
        Stems --> Select --> Layer
    end

    subgraph Outside["Outside Zone — Curated"]
        direction TB
        Folder[(audio/curated/)]
        Pick[Random Selection]
        Play[One-Shot Playback]
        Folder --> Pick --> Play
    end

    Layer --> Mix[Mixer]
    Play --> Mix

    Layer -.->|RMS intensity| Play

    Mix --> XR18[XR18 Mixer]
    XR18 --> Spk_I[Inside Speakers<br/>ch 0-1 stereo]
    XR18 --> Spk_O[Outside Speaker<br/>ch 2 mono]

    style Inside fill:#1a1a1a,stroke:#5a9a5a,color:#d4d4d4
    style Outside fill:#1a1a1a,stroke:#c49a6c,color:#d4d4d4
    style Stems fill:#2a2a2a,stroke:#5a9a5a,color:#d4d4d4
    style Select fill:#2a2a2a,stroke:#5a9a5a,color:#d4d4d4
    style Layer fill:#2a2a2a,stroke:#5a9a5a,color:#d4d4d4
    style Folder fill:#2a2a2a,stroke:#c49a6c,color:#d4d4d4
    style Pick fill:#2a2a2a,stroke:#c49a6c,color:#d4d4d4
    style Play fill:#2a2a2a,stroke:#c49a6c,color:#d4d4d4
    style Mix fill:#2a2a2a,stroke:#c49a6c,color:#d4d4d4
    style XR18 fill:#2a2a2a,stroke:#c49a6c,color:#d4d4d4
    style Spk_I fill:#2a2a2a,stroke:#5a9a5a,color:#d4d4d4
    style Spk_O fill:#2a2a2a,stroke:#5a9a5a,color:#d4d4d4

6.1 Inside Zone — Beds of Sound

The inside speakers play stereo bed stems: composed pieces of ambient audio that loop continuously, one dissolving into the next. Think of each stem as a slow scene change — the texture of the room shifts, but there’s never silence.

How it works:

  • Stems are stereo WAV files at 48kHz, streamed from disk with a 10-second read-ahead buffer (not loaded into memory — even hour-long compositions work)
  • When a stem reaches its end, the engine crossfades to a new one over 3 seconds — one texture dissolving into another
  • Mood diversity: the system avoids repeating the same emotional tone back-to-back. If the current stem is tagged “grief,” the next one won’t be
  • Weighted selection: stems with higher weights are chosen more often, giving you curatorial control over which compositions dominate
  • Stems can also be swapped manually from the dashboard at any time
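
The selection rules above (weighted choice, no back-to-back mood) can be sketched as follows. The Stem class and pick_next_stem function are hypothetical names for illustration, and the fallback behavior when every other stem shares the current mood is an assumption.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Stem:
    name: str
    mood: str
    weight: float = 1.0  # must be positive for random.choices


def pick_next_stem(stems, current, rng=random):
    """Weighted random pick that avoids repeating the current stem's mood."""
    candidates = [s for s in stems if s is not current]
    if current is not None:
        different = [s for s in candidates if s.mood != current.mood]
        # Assumed fallback: if every alternative shares the mood, allow a repeat.
        candidates = different or candidates
    if not candidates:
        return current  # only one stem in the library: keep looping it
    weights = [s.weight for s in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]
```

With a library of two "grief" stems, one "hope" stem, and one "peace" stem, a call with a "grief" stem as current only ever returns "hope" or "peace", and the heavier of the two comes up more often.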

6.2 Outside Zone — Fragments of Past Messages

The outside speakers whisper fragments of past voicemail messages — the software picks them at random, spacing them out like distant memories. Each clip plays once, then the speaker goes quiet for 30–75 seconds before another.

The outside zone listens to the inside zone’s energy (measured as RMS — root mean square, a way of measuring audio loudness):

  • When the inside zone is loud: curated clips play louder and gaps between them are shorter — the outside joins the conversation
  • When the inside zone is quiet: clips are softer and more spaced out — the outside steps back
  • The volume floor is 5% (never fully silent) and the ceiling is 50% of the zone’s configured volume

This cross-zone modulation means the two zones breathe together without any manual coordination.
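
One way to picture the modulation is as two small mapping functions. This is a sketch under stated assumptions: the inside-zone intensity is already normalized to 0..1, the mapping is linear, and both the 5% floor and the 50% ceiling are relative to the zone's configured volume. The function names are hypothetical.

```python
import random

VOLUME_FLOOR = 0.05    # 5% floor from the text above
VOLUME_CEILING = 0.50  # 50% ceiling from the text above


def curated_volume(intensity, zone_volume):
    """Map inside-zone intensity (0..1) to a playback volume for the next clip."""
    x = min(max(intensity, 0.0), 1.0)
    scale = VOLUME_FLOOR + (VOLUME_CEILING - VOLUME_FLOOR) * x
    return scale * zone_volume


def next_gap_secs(intensity, sleep_range=(30.0, 75.0), rng=random):
    """Pick a gap inside sleep_range, biased shorter when the inside zone is loud."""
    lo, hi = sleep_range
    x = min(max(intensity, 0.0), 1.0)
    # Loud room: the upper bound shrinks toward lo. Quiet room: full range available.
    upper = hi - (hi - lo) * x
    return rng.uniform(lo, upper)
```

At a zone volume of 0.8, a silent inside zone yields a clip volume of 0.04 and a fully loud one yields 0.40.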

6.3 Per-Zone Gain Automation

On top of the zone volumes you set in the dashboard, the engine adds slow, random gain drift — subtle volume movement that keeps the sound from feeling static. Think of it as the room itself breathing.

  • Drift range: gain multiplier wanders between 0.6 and 1.0
  • Slew-limited: maximum 0.02 change per update (no abrupt jumps, only gradual shifts)
  • Update interval: every 0.5–2 seconds, a new random target is chosen
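
The three parameters above combine into a slew-limited random walk. The following is a minimal sketch, assuming that each update both picks a new target and moves at most one step toward it; the GainDrift class is a hypothetical name and the starting gain of 1.0 is an assumption.

```python
import random


class GainDrift:
    """Slew-limited random drift for one zone's gain multiplier."""

    def __init__(self, drift_range=(0.6, 1.0), max_step=0.02,
                 interval_range=(0.5, 2.0), rng=None):
        self.lo, self.hi = drift_range
        self.max_step = max_step
        self.interval_range = interval_range
        self.rng = rng or random.Random()
        self.gain = self.hi  # assumed starting point

    def next_interval(self):
        """Seconds to wait before the next update."""
        return self.rng.uniform(*self.interval_range)

    def update(self):
        """Pick a new random target and move toward it by at most max_step."""
        target = self.rng.uniform(self.lo, self.hi)
        step = max(-self.max_step, min(self.max_step, target - self.gain))
        self.gain = min(self.hi, max(self.lo, self.gain + step))
        return self.gain
```

The returned multiplier would be applied on top of the zone volume set in the dashboard, so the effective level is zone volume times drift gain.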

6.4 Safe Mode

A mandatory fallback for unattended operation — if something goes wrong while nobody’s watching, the installation keeps playing.

Trigger: Automatic on any engine exception, or manual from dashboard
Behavior: All zones play the same bed stem, looped, at a fixed safe volume (60%)
Dashboard: Shows “SAFE MODE” prominently in red
Exit: Manual from dashboard only
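
The trigger and exit rules can be sketched as a small state holder plus a guarded step function. EngineState and run_step are hypothetical names; the real engine's structure is not shown in this document.

```python
class EngineState:
    """Tiny stand-in for the engine's mode flag (names are hypothetical)."""

    def __init__(self, safe_volume=0.6):
        self.safe_mode = False
        self.safe_volume = safe_volume
        self.last_error = None

    def enter_safe_mode(self, reason):
        self.safe_mode = True
        self.last_error = reason

    def exit_safe_mode(self):
        # Only called from an explicit operator action, never automatically.
        self.safe_mode = False
        self.last_error = None


def run_step(state, step):
    """Run one engine step; any exception drops the engine into safe mode."""
    if state.safe_mode:
        return "safe"  # keep looping the bed at state.safe_volume
    try:
        step()
        return "normal"
    except Exception as exc:  # broad on purpose: unattended operation
        state.enter_safe_mode(repr(exc))
        return "safe"
```

Once safe mode is entered, later steps are skipped until exit_safe_mode() is called, which mirrors the "manual exit only" rule.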

6.5 Threading Model

The engine uses three concurrent systems to keep audio smooth:

  • Mix thread — a dedicated Python thread that fills the audio buffer, paced with time.sleep() (the ring buffer absorbs any scheduling jitter). This is the real-time audio path; it never touches asyncio.
  • Trigger task — an asyncio task that polls for bed-change timers and manual triggers (non-realtime, runs every 0.5s)
  • Curated loops — one asyncio task per curated zone, managing sample selection, playback timing, and gap scheduling
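
The split between a plain thread for audio and asyncio tasks for everything else can be sketched as below. This is an illustration of the pattern only: the counters stand in for real work, and all function names are hypothetical.

```python
import asyncio
import threading
import time


def mix_loop(stop, counter, block_secs=1024 / 48000):
    """Stand-in for the mix thread: do one block of work, then sleep until the next."""
    next_deadline = time.monotonic()
    while not stop.is_set():
        counter["blocks"] += 1  # real code would fill the ring buffer here
        next_deadline += block_secs
        time.sleep(max(0.0, next_deadline - time.monotonic()))


async def trigger_task(stop, counter, poll_secs=0.5):
    """Stand-in for the trigger task: polls on the event loop, never in the mix thread."""
    while not stop.is_set():
        counter["polls"] += 1
        await asyncio.sleep(poll_secs)


async def run_for(seconds, poll_secs=0.5):
    """Start both systems, let them run for a while, then shut down cleanly."""
    stop = threading.Event()
    counter = {"blocks": 0, "polls": 0}
    mixer = threading.Thread(target=mix_loop, args=(stop, counter), daemon=True)
    mixer.start()
    poller = asyncio.create_task(trigger_task(stop, counter, poll_secs))
    await asyncio.sleep(seconds)
    stop.set()
    await poller
    mixer.join(timeout=1.0)
    return counter
```

Calling asyncio.run(run_for(1.0)) runs both for one second. Sleeping until a deadline, not for a fixed duration, keeps small sleep overshoots from accumulating across blocks.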

7 What Makes the Sound Change

The installation is designed to run unattended — the sound evolves on its own. Here’s what causes changes:

  • Bed stems auto-rotate: when a stem finishes playing (reaches 98% progress), the engine crossfades to a new one with a different mood
  • Curated clips self-regulate: the outside zone picks random clips and adjusts its own volume and pacing based on the inside zone’s energy
  • Gain drift: slow random volume movement across all zones (see Per-Zone Gain Automation above)
  • Manual control: from the dashboard, you can swap a stem, change zone volumes, set a target mood, or enter/exit safe mode
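
The auto-rotate rule reduces to a progress check plus a pair of fade gains. In this sketch the 98% threshold and the 3-second duration come from the text; the equal-power (sine/cosine) curve is an assumption, since the document does not say which fade shape the engine uses. Both function names are hypothetical.

```python
import math


def should_rotate(position_frames, total_frames, threshold=0.98):
    """True once playback has reached the auto-rotate threshold."""
    return total_frames > 0 and position_frames / total_frames >= threshold


def crossfade_gains(elapsed_secs, duration_secs=3.0):
    """Return (outgoing_gain, incoming_gain) for an equal-power crossfade."""
    x = min(max(elapsed_secs / duration_secs, 0.0), 1.0)
    return math.cos(x * math.pi / 2), math.sin(x * math.pi / 2)
```

With this curve the squared gains always sum to 1, so perceived loudness stays roughly constant while one stem dissolves into the next.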

8 Hardware Setup

8.1 XR18 Channel Mapping

| Channel | Assignment | Zone |
|---|---|---|
| 0 | Inside L | inside |
| 1 | Inside R | inside |
| 2 | Outside (mono) | outside |

8.2 Signal Flow

Python Engine (sounddevice) → XR18 USB Audio → XR18 Mixer → Speakers
Room Mic → XR18 Aux In → Python Engine (sounddevice input)

8.3 On-Site Network

Sou'wester WiFi
       │
  GL-iNet Opal (WiFi client → LAN bridge + WiFi AP)
       │
       ├── LAN 1 → XR18 (mixer control)
       ├── LAN 2 → Grandstream HT812 (SIP to Twilio)
       └── WiFi AP → iPad (Mixing Station)

The Mac laptop connects to the Sou’wester WiFi directly (or the Opal’s AP) — it only needs internet for Anthropic API calls during ingest. The audio engine runs entirely offline over USB to the XR18.

8.4 Grandstream HT812

| Port | Device | Purpose |
|---|---|---|
| WAN | | Not used (ethernet goes to Opal LAN port) |
| LAN | | Not used |
| FXS 1 | Analog phone | Auto-dials Twilio on pickup |
| FXS 2 | Tape answering machine (optional) | Receives incoming SIP calls |

9 Dashboard Reference

9.1 Pages

| Page | Purpose |
|---|---|
| Home | Stats, recent messages, upload zone, theme distribution |
| Incoming | Relay recordings from Dropbox, per-file ingest button (only visible when incoming_watch_dir is set) |
| Messages | Message list with sentiment filter, detail view with inline editing |
| Stems | Stem library with role/mood filters, upload zone, role assignment |
| Engine | Zone status, controls (play/pause/swap/safe mode), per-zone volume sliders, mood selector |
| Zones | Zone configuration viewer (channels, volume, type) |
| Settings | Configuration viewer |

9.2 Key Workflows

Upload a stem:

  1. Go to Stems page
  2. Drag-drop a WAV file onto the upload zone
  3. Optionally set mood, weight, and tags
  4. Click “play” to crossfade it into the inside zone

Ingest a relay recording:

  1. Recordings arrive in the Dropbox folder automatically
  2. Go to the Incoming page (visible when incoming_watch_dir is set)
  3. Click ingest on a recording
  4. Pipeline runs: transcribe → scrub → analyze → extract grains
  5. File moves to ingested/ subfolder, message appears on Messages page
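
The ingest steps above can be sketched as an ordered list of stages followed by a file move. The stage callables here are placeholders supplied by the caller, not the real transcribe, scrub, analyze, or grain functions, and ingest_recording is a hypothetical name.

```python
import shutil
from pathlib import Path


def ingest_recording(wav_path, stages):
    """Run the stages in order, then move the recording into ingested/."""
    wav_path = Path(wav_path)
    result = {"audio": str(wav_path)}
    for name, stage in stages:
        result[name] = stage(result)  # each stage sees what earlier stages produced
    done_dir = wav_path.parent / "ingested"
    done_dir.mkdir(exist_ok=True)
    shutil.move(str(wav_path), str(done_dir / wav_path.name))
    return result
```

Because the move happens last, a recording whose pipeline raises an exception stays in the incoming folder and can be retried.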

Archive/unarchive a message:

  1. Go to Messages page, open a message detail
  2. Click archive to hide it from the public website (export script filters archived messages)
  3. Click unarchive to restore it

Enter safe mode:

  1. Go to Engine page
  2. Click “enter safe mode”
  3. All zones play the same bed at fixed volume
  4. Click “exit safe mode” to resume normal playback

Swap a stem:

  1. Go to Engine page
  2. Click a specific stem in the library to crossfade to it in the inside zone
  3. Or let the engine auto-rotate when the current stem finishes

10 The Constellation — Public Website

After the residency, the voicemails live on as a constellation of light and sound. Each message becomes a point of light in a dark sky, and visitors can listen to them from anywhere.

Live at: after-the-tone.netlify.app (also embedded at sonicswitchyard.com/art/afterthetone)

10.1 What You See

The page opens to a dark canvas. Points of warm light appear in a tight cluster at the center, then drift apart like a big bang — settling into a constellation over a few seconds. This is a force-directed layout running live in the browser: messages that share emotional themes are pulled together by faint lines, while a gentle repulsion keeps them from overlapping. 2D Perlin noise adds slow ambient drift, so the field is never quite still.

Each star encodes properties of the original voicemail:

| Visual Property | Data Source | How It Maps |
|---|---|---|
| Size | Duration of the call | Longer messages make bigger stars (log-scaled) |
| Brightness / glow | Emotional intensity | More intense messages glow brighter |
| Connections | Shared themes (2+) | Faint lines link messages with overlapping emotional themes |
| Clustering | Sentiment + themes | Messages with similar feelings drift together |
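
Two of these mappings are easy to state in code. The site itself runs in the browser; the sketch below uses Python only to illustrate the logic. The record shape (id, themes), the function names, and the radius constants are assumptions; only the "2+ shared themes" rule and the log scaling come from the table.

```python
import math
from itertools import combinations


def theme_links(messages, min_shared=2):
    """Return (id_a, id_b, shared_themes) for every pair sharing enough themes."""
    links = []
    for a, b in combinations(messages, 2):
        shared = set(a["themes"]) & set(b["themes"])
        if len(shared) >= min_shared:
            links.append((a["id"], b["id"], sorted(shared)))
    return links


def star_radius(duration_secs, base=2.0, scale=1.5):
    """Log-scaled size: longer calls grow the star, with diminishing returns."""
    return base + scale * math.log1p(max(duration_secs, 0.0))
```

Log scaling keeps a ten-minute message from dwarfing a ten-second one while still making it visibly larger.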

Click a star (or the “listen to a message” button) to hear the original voicemail and read the scrubbed transcript. Theme tags appear below. The constellation highlights the selected star and its connected neighbors.

Important: Voice as Property — No Download, No Save

Audio plays in-browser only. There are no download links, no right-click-save, no sharing buttons. This is deliberate: a person’s voice is their property, and exposing downloadable audio risks voice cloning and deepfake use. The constellation is for listening, not extracting. See Privacy and Responsible AI Use for the full rationale.

10.2 Archive Feature

Messages can be hidden from the public site via the dashboard’s Messages page (archive/unarchive buttons). The export script filters out archived messages, so they won’t appear in the constellation after the next deploy. This gives curatorial control over what’s public without deleting anything from the database.

10.3 Build and Deploy Pipeline

The public site is a static HTML page (public_src/index.html) that loads a data.json manifest and audio files. No server required — it runs on Netlify’s free tier.

Export script (scripts/export_public.py):

  1. Queries SQLite for non-archived messages with completed analysis
  2. Converts WAV audio to 128kbps mono MP3 via ffmpeg
  3. Generates data.json with scrubbed transcripts, sentiment, themes, intensity, duration
  4. Copies public_src/index.html to the output directory
  5. Writes netlify.toml with CORS headers (for Squarespace iframe embedding) and cache rules

# Full export (audio + data)
source .venv/bin/activate
python scripts/export_public.py

# Data only (skip slow audio conversion)
python scripts/export_public.py --skip-audio

# Deploy to Netlify
netlify deploy --prod --dir=public/ --site=d81467c4-1d84-41f4-a584-aaa944e67d0a --no-build
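
Steps 1 to 3 of the export can be sketched as a query plus a command builder. The table and column names below (messages, archived, analysis_done, and so on) are hypothetical, since the actual schema in att/db.py is not shown here; themes are assumed to be stored as a JSON array. The ffmpeg options used (-y, -i, -ac, -b:a) are standard.

```python
import json
import sqlite3


def ffmpeg_mp3_command(src_wav, dst_mp3):
    """Build the ffmpeg call for a 128 kbps mono MP3 (returned, not executed)."""
    return ["ffmpeg", "-y", "-i", src_wav, "-ac", "1", "-b:a", "128k", dst_mp3]


def export_manifest(conn):
    """Return the data.json payload for non-archived, fully analyzed messages."""
    rows = conn.execute(
        "SELECT id, transcript, sentiment, themes, intensity, duration "
        "FROM messages WHERE archived = 0 AND analysis_done = 1 ORDER BY id"
    ).fetchall()
    keys = ("id", "transcript", "sentiment", "themes", "intensity", "duration")
    messages = []
    for row in rows:
        item = dict(zip(keys, row))
        item["themes"] = json.loads(item["themes"])  # assumed JSON array column
        messages.append(item)
    return json.dumps({"messages": messages}, indent=2)
```

The command list can be passed to subprocess.run(cmd, check=True) once ffmpeg is on the PATH.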

Source files:

| File | Purpose |
|---|---|
| public_src/index.html | Single-page app: constellation canvas, audio player, force layout, Perlin drift |
| scripts/export_public.py | Build script: SQLite → data.json + MP3 audio + HTML + netlify.toml |

11 Configuration Reference

All parameters from config.yaml, organized by domain. This section is a technical reference — consult it when setting up or tuning the installation.

11.1 Audio

How the system talks to your audio interface.

| Parameter | Default | Description |
|---|---|---|
| audio.sample_rate | 48000 | Global sample rate in Hz — all stems and curated clips must match this |
| audio.channels | 3 | Total output channels (inside stereo + outside mono) |
| audio.block_size | 1024 | Audio buffer block size in samples — lower = less latency, higher = more stable |
| audio.buffer_seconds | 2.0 | Ring buffer duration — headroom before audio underruns |
| audio.device | X18/XR18 | sounddevice output device name or index (null for system default) |
| audio.storage_dir | audio | Root directory for audio files |
| audio.incoming_watch_dir | null | Dropbox folder for relay recordings — set this to enable relay mode |
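
The defaults translate into concrete timing numbers. The arithmetic below is a small worked example; buffer_budget is a hypothetical helper, not part of the codebase.

```python
def buffer_budget(sample_rate=48000, block_size=1024, buffer_seconds=2.0):
    """Return (milliseconds per block, frames in the ring buffer, blocks in the ring buffer)."""
    block_ms = 1000.0 * block_size / sample_rate
    ring_frames = int(sample_rate * buffer_seconds)
    blocks_in_ring = ring_frames / block_size
    return block_ms, ring_frames, blocks_in_ring
```

With the defaults, one block lasts about 21.3 ms, and the ring buffer holds 96000 frames, or 93.75 blocks. The mix thread can therefore stall for close to two seconds before the output underruns.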

11.2 Zones

Each zone is a group of output channels with its own playback type. The type field determines what kind of audio the zone plays.

| Parameter | Default | Description |
|---|---|---|
| zones.<name>.channels | | Output channel numbers (e.g., [0, 1] for stereo, [2] for mono) |
| zones.<name>.volume | 0.8 | Zone master volume, adjustable at runtime via dashboard |
| zones.<name>.type | stems | Zone type: stems (loops bed stems) or curated (plays one-shot samples) |
| zones.<name>.curated.directory | audio/curated | Directory to scan for WAV samples (curated zones only) |
| zones.<name>.curated.sleep_range_secs | [30, 75] | Range for gap duration between curated clips (seconds) |
| zones.<name>.curated.volume | 0.8 | Base volume for curated playback (modulated by stem zone intensity) |

11.3 Whisper

Local speech-to-text. Runs entirely on your machine — no audio leaves the network.

| Parameter | Default | Description |
|---|---|---|
| whisper.model_size | small | Whisper model (tiny/base/small/medium/large) — larger = more accurate, slower |
| whisper.device | auto | Compute device (auto/cpu/cuda) |
| whisper.compute_type | int8 | Quantization type for faster inference |
| whisper.language | null | Language code or null for auto-detect |

11.4 Analysis

Emotional analysis via Claude API. Only scrubbed text (PII removed) is sent.

| Parameter | Default | Description |
|---|---|---|
| analysis.model | claude-sonnet-4-20250514 | Claude model for emotional analysis |
| analysis.themes | 17 items | Emotion whitelist: grief, longing, memory, love, family, loss, hope, anger, peace, regret, gratitude, apology, farewell, confession, prayer, humor, silence |

11.5 Grains

Controls how voicemails are split into segments during ingest. Used by the legacy grain engine and the phrase extraction pipeline.

| Parameter | Default | Description |
|---|---|---|
| grains.min_grain_secs | 3 | Minimum grain duration (seconds) |
| grains.max_grain_secs | 10 | Maximum grain duration (seconds) |
| grains.silence_threshold_db | 25 | Silence detection sensitivity (dB) — higher = more splits |
| grains.min_silence_gap_secs | 0.3 | Gaps shorter than this are merged |

11.6 Stems

Controls how the inside zone plays bed stems.

| Parameter | Default | Description |
|---|---|---|
| stems.storage_dir | audio/stems | Where imported stems are stored on disk |
| stems.crossfade_seconds | 3.0 | Duration of crossfade when switching stems |
| stems.loop | true | Whether stems loop when they reach the end |
| stems.default_mood | null | If set, prefer stems tagged with this mood on startup |
| stems.bed_volume | 0.8 | Default volume for stem playback layers |
| stems.import_target_lufs | -14.0 | Loudness normalization target for imported stems (LUFS) |
| stems.import_required_sample_rate | 48000 | Required sample rate — stems at other rates are rejected |

11.6.1 Safe Mode

The fallback mode for unattended operation. If the engine crashes, it drops to safe mode automatically.

| Parameter | Default | Description |
|---|---|---|
| stems.safe_mode.enabled | true | Whether safe mode fallback is available |
| stems.safe_mode.safe_bed_name | null | Specific bed stem for safe mode (null = use whatever’s loaded) |
| stems.safe_mode.fixed_volume | 0.6 | Fixed volume during safe mode |

11.6.2 Gain Automation

Slow, random volume drift that keeps the sound from feeling static.

| Parameter | Default | Description |
|---|---|---|
| stems.gain_automation.update_interval_secs | [0.5, 2.0] | How often a new gain target is picked (seconds) |
| stems.gain_automation.max_gain_change_per_tick | 0.02 | Maximum gain change per update — slew limit to prevent jumps |
| stems.gain_automation.drift_range | [0.6, 1.0] | Gain multiplier bounds — how far volume can drift |

11.7 Twilio

Voice recording configuration for the phone line.

| Parameter | Default | Description |
|---|---|---|
| twilio.greeting | "" | TTS greeting text (empty = MP3 greeting files only, no text-to-speech) |
| twilio.max_recording_seconds | 3600 | Maximum recording duration — 1 hour, so callers are effectively never cut off |
| twilio.recording_channels | 1 | Recording channel count |
| twilio.voice | Polly.Brian | TTS voice (only used if greeting text is set) |
| twilio.voice_rate | fast | TTS speaking rate |
| twilio.voice_pitch | -20% | TTS pitch adjustment |
| twilio.pause_before_instructions | 1 | Seconds of silence before greeting plays |

11.8 Dashboard

| Parameter | Default | Description |
|---|---|---|
| dashboard.enabled | true | Enable the web dashboard at http://localhost:8000 |

Example config.yaml (relay mode enabled via incoming_watch_dir):

mode: single
role: all
server:
  host: 0.0.0.0
  port: 8000
audio:
  sample_rate: 48000
  channels: 3
  block_size: 1024
  buffer_seconds: 2.0
  device: X18/XR18
  storage_dir: audio
  incoming_watch_dir: ~/Dropbox/after_the_tone/incoming
zones:
  inside:
    channels: [0, 1]
    volume: 0.8
    type: stems
  outside:
    channels: [2]
    volume: 0.8
    type: curated
    curated:
      directory: audio/curated
      sleep_range_secs: [30, 75]
whisper:
  model_size: small
  device: auto
  compute_type: int8
  language: null
analysis:
  model: claude-sonnet-4-20250514
  themes:
    - grief
    - longing
    - memory
    - love
    - family
    - loss
    - hope
    - anger
    - peace
    - regret
    - gratitude
    - apology
    - farewell
    - confession
    - prayer
    - humor
    - silence
grains:
  min_grain_secs: 3
  max_grain_secs: 10
  silence_threshold_db: 25
  min_silence_gap_secs: 0.3
stems:
  storage_dir: audio/stems
  crossfade_seconds: 3.0
  loop: true
  default_mood: null
  bed_volume: 0.8
  import_target_lufs: -14.0
  import_required_sample_rate: 48000
  safe_mode:
    enabled: true
    safe_bed_name: null
    fixed_volume: 0.6
  gain_automation:
    update_interval_secs: [0.5, 2.0]
    max_gain_change_per_tick: 0.02
    drift_range: [0.6, 1.0]
twilio:
  greeting: ''
  max_recording_seconds: 3600
  recording_channels: 1
  voice: Polly.Brian
  voice_rate: fast
  voice_pitch: -20%
  pause_before_instructions: 1
dashboard:
  enabled: true

12 File Map

12.1 Relay (Windows PC)

| File | Purpose |
|---|---|
| relay/server.py | Standalone Twilio relay: answer calls, play greeting, save WAV + JSON to Dropbox. Supports standard and dual (tape + digital) modes. |
| relay/config.yaml | Relay config: ngrok URL, output dir, greetings dir, recording mode, tape SIP URI |
| relay/requirements.txt | Minimal Python deps (fastapi, uvicorn, httpx, twilio, pyyaml, python-multipart) |
| relay/start.bat | Windows startup script: sets Twilio env vars, launches ngrok + relay server |

12.2 Studio (Mac Laptop)

| File | Purpose |
|---|---|
| att/main.py | FastAPI app, lifespan, engine/mic/audio initialization |
| att/config.py | Pydantic config models, YAML + env loading |
| att/db.py | SQLite schema + async query layer (messages, transcripts, analysis, grains, stems) |
| att/ingest/pipeline.py | Ingest orchestrator: transcribe → scrub → analyze → extract |
| att/ingest/transcribe.py | Local Whisper transcription |
| att/ingest/scrub.py | PII removal (spaCy + regex) |
| att/ingest/analyze.py | Claude emotional analysis |
| att/ingest/grains.py | Grain extraction with librosa |
| att/ingest/export.py | Grain export with slug naming + CSV |
| att/ingest/twilio_webhook.py | VoIP webhook: TwiML greeting, recording download, ingest queue (disabled in relay mode) |
| att/engine/player.py | StemPlayer (two-zone beds + curated playback) + GrainEngine (legacy) |
| att/engine/triggers.py | Trigger system (manual triggers, bed-change timers) |
| att/engine/importer.py | Stem import validation + loudness normalization |
| att/engine/mixer.py | Voice/Mixer for grain-based playback (legacy) |
| att/engine/selector.py | Grain selection with sentiment/theme filtering (legacy) |
| att/audio/output.py | Ring buffer + sounddevice multi-channel output |
| att/audio/processing.py | DSP utilities: normalize, trim_silence, crossfade |
| att/dashboard/routes.py | HTML page routes (home, incoming, messages, stems, engine, settings) |
| att/dashboard/api.py | REST API endpoints (CRUD, engine controls, incoming ingest, archive, batch ops) |

12.3 Scripts & Public Site

| File | Purpose |
|---|---|
| scripts/extract_phrases.py | Extract curated samples from voicemails using fuzzy phrase matching against phrases.txt |
| scripts/normalize_curated.py | Normalize curated samples to -23 dB RMS, resample to 48 kHz |
| scripts/export_public.py | Build the public website: SQLite → data.json + MP3 audio + HTML |
| scripts/retranscribe.py | Re-run Whisper transcription on messages missing word timestamps |
| public_src/index.html | Public website source: constellation visualization + audio player |
| phrases.txt | Curated phrase list for sample extraction |

13 Failure Modes and Mitigations

Warning: Mix Thread Crash

Risk: The real-time audio thread hits an exception and stops producing sound.

Mitigation: Safe mode activates automatically — all zones play the same bed stem at a fixed volume. The dashboard shows a red “SAFE MODE” indicator. Fix the underlying issue (usually a channel count mismatch between config and audio device) and restart.
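
Since a channel count mismatch is named as the usual cause, a preflight check before starting the engine can catch it early. This is a sketch, assuming the config has been loaded into a plain dict shaped like config.yaml; check_channels is a hypothetical helper, and the device channel count would come from the audio backend.

```python
def check_channels(config, device_output_channels):
    """Return a list of human-readable problems; an empty list means the layout is usable."""
    problems = []
    total = config["audio"]["channels"]
    if total > device_output_channels:
        problems.append(
            f"config wants {total} channels, device offers {device_output_channels}"
        )
    seen = {}
    for name, zone in config["zones"].items():
        for ch in zone["channels"]:
            if not 0 <= ch < total:
                problems.append(f"zone {name!r} uses channel {ch}, outside 0..{total - 1}")
            if ch in seen:
                problems.append(f"channel {ch} is claimed by both {seen[ch]!r} and {name!r}")
            seen[ch] = name
    return problems
```

Running this at startup and refusing to launch on a non-empty result turns a mid-exhibition crash into a setup-time error message.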

Warning: Curated Zone Silence

Risk: The outside zone has no samples to play.

Mitigation: If audio/curated/ is empty, the curated loop logs a warning and retries every 10 seconds. Just add WAV files to the folder. If files exist but have the wrong sample rate, they’re skipped with a warning — run scripts/normalize_curated.py to fix.
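
The folder scan with sample-rate filtering can be sketched with the standard library's wave module. This is an illustration, not the engine's own scan code; scan_curated is a hypothetical name, and the glob pattern only matches lowercase .wav extensions on case-sensitive filesystems.

```python
import wave
from pathlib import Path


def scan_curated(directory, required_rate=48000):
    """Split the WAV files in a folder into (usable, skipped) lists of paths."""
    usable, skipped = [], []
    for path in sorted(Path(directory).glob("*.wav")):
        try:
            with wave.open(str(path), "rb") as wav:
                rate = wav.getframerate()
        except (wave.Error, EOFError):
            skipped.append(path)  # unreadable, truncated, or not PCM WAV
            continue
        (usable if rate == required_rate else skipped).append(path)
    return usable, skipped
```

An empty usable list corresponds to the "no samples" case above, and a non-empty skipped list is the cue to run the normalization script.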

Warning: Volume Runaway

Risk: Gain drift or cross-zone modulation causing unexpected volume levels.

Mitigation: All gain changes are slew-limited (max 0.02 per update). Curated volume is capped at 50% of the zone’s configured volume. Gain drift stays within configurable bounds (0.6–1.0 by default). Zone volumes are always adjustable from the dashboard.

14 Daily Operational Protocol

Important: Prerequisites

This protocol assumes the VoIP pipeline and engine are already running. See Quick Start for initial setup.

  1. Collect — let phones/tapes run, VoIP pipeline processes automatically
  2. Ingest — copy daily pack to OT SD card, capture new tape messages to OT raw folder
  3. Compose — update bed + fragments in OT with strict timebox (60–90 min), save project version
  4. Export — render stems to laptop, upload via dashboard, assign roles
  5. Promote — verify stems play in engine, approve as current set
  6. Stability check — reboot test: can you restart playback in < 5 minutes?

15 Quick Start

15.1 Single-Machine Mode

If running everything on one machine (local development, or if you don’t need the relay):

# Clone and install
git clone <repo-url> after_the_tone
cd after_the_tone
pip install -e .

# Configure
cp config.yaml.example config.yaml  # edit as needed
cp .env.example .env                # add API keys

# Run
python -m att.main
# or
uvicorn att.main:app --host 0.0.0.0 --port 8000

# Open dashboard
open http://localhost:8000

15.2 Relay + Studio Mode

15.2.1 Relay (Windows PC) Setup

From a fresh Windows machine:

# 1. Install Python + ngrok
winget install Python.Python.3.12
winget install Ngrok.Ngrok
ngrok config add-authtoken <your-token>

# 2. Clone the repo
git clone <repo-url> after_the_tone
cd after_the_tone\relay

# 3. Create a virtual environment and install deps
python -m venv venv
venv\Scripts\activate
pip install -r requirements.txt

# 4. Edit relay/config.yaml
#    - Set base_url to your ngrok static domain
#    - Verify output_dir and greetings_dir point to your Dropbox paths

# 5. Set Twilio credentials in relay/start.bat
#    Edit the TWILIO_ACCOUNT_SID and TWILIO_AUTH_TOKEN lines

# 6. Test manually (from the repo root; step 2 left you inside relay\)
cd ..
relay\start.bat
# Call the phone number — should hear a greeting, then a beep
# Check ~/Dropbox/after_the_tone/incoming/ for the WAV + JSON sidecar

Greetings sync automatically from the Mac via ~/Dropbox/after_the_tone/greetings/.

15.2.2 Studio (Mac Laptop) Setup

cd after_the_tone

# Enable relay mode: set the Dropbox watch folder in config.yaml
# Under the audio: section, uncomment and set:
#   incoming_watch_dir: ~/Dropbox/after_the_tone/incoming

# Run the app as usual
python -m att.main

# Open dashboard — the "incoming" nav link will appear
open http://localhost:8000/incoming

When new recordings arrive via Dropbox, they appear on the Incoming page. Click ingest to run the full pipeline (transcribe → scrub → analyze → grains). The file moves to incoming/ingested/ and the message appears on the Messages page.

15.2.3 Relay Auto-Start

The repo includes relay/start.bat which sets Twilio credentials, launches ngrok with the static domain, waits for the tunnel, then starts the relay server.

Task Scheduler setup (runs at user logon):

  • Trigger: “At log on” (specific user)
  • Action: Start a program → relay\start.bat
  • Settings: AllowStartIfOnBatteries, DontStopIfGoingOnBatteries, StartWhenAvailable

Limitation: Runs at login, not at boot. If the PC restarts and nobody logs in, the relay stays down. For true boot-time startup, use NSSM to wrap it as a Windows service.

ngrok domain: The relay uses a static free-tier ngrok domain, so the Twilio webhook URL survives ngrok restarts with no reconfiguration needed.

15.3 Campus Phone (Grandstream HT812)

15.3.1 1. Twilio SIP Domain Setup

In the Twilio Console:

  1. Go to Elastic SIP Trunking → SIP Domains (or Voice → SIP Domains)
  2. Create a new SIP domain: after-the-tone.sip.twilio.com
  3. Set Voice URL to: https://salena-crenate-coequally.ngrok-free.dev/twilio/voice (HTTP POST) — same webhook as PSTN calls
  4. Under Credential Lists, create a new list and add credentials:
    • Username: airstream, Password: (choose a strong password)
    • Username: tape, Password: (choose a strong password) — only needed if using FXS 2 for tape path
  5. Assign the credential list to the SIP domain for authentication

15.3.2 2. Network Setup

Connect the GL-iNet Opal to the local WiFi (repeater mode):

  1. Access Opal admin at 192.168.8.1
  2. Internet → Repeater → scan and connect to WiFi network
  3. The Opal now bridges WiFi to its LAN ports and runs its own WiFi AP
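  4. Plug the XR18 into Opal LAN port 1 and the HT812 into LAN port 2, matching the On-Site Network diagram (the two ports share one LAN, so swapping them also works)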
  5. Verify iPad still controls XR18 via Mixing Station (local traffic, unaffected)

15.3.3 3. HT812 Configuration

Access the HT812 web admin at its LAN IP (check Opal admin → connected clients, or try 192.168.8.x).

FXS Port 1 — Digital Path (auto-dial on pickup):

| Setting | Value |
|---|---|
| SIP Server | after-the-tone.sip.twilio.com |
| SIP User ID | airstream |
| Authenticate ID | airstream |
| Authenticate Password | (your password from step 1) |
| Offhook Auto-Dial | greeting |
| Offhook Auto-Dial Delay | 0 (immediate — no dial tone, just pick up and go) |
| NAT Traversal | Keep-Alive |

FXS Port 2 — Tape Path (optional, incoming calls):

| Setting | Value |
|---|---|
| SIP Server | after-the-tone.sip.twilio.com |
| SIP User ID | tape |
| Authenticate ID | tape |
| Authenticate Password | (your password) |

15.3.4 4. Test

  1. Check HT812 status page — FXS 1 should show Registered
  2. Pick up the analog phone
  3. You should hear a random greeting, then a beep
  4. Leave a message, hang up
  5. Check ~/Dropbox/after_the_tone/incoming/ for the WAV + JSON sidecar

Troubleshooting:

  • Can’t log into HT812 web admin: The V2 model does not use admin/admin. It has a unique password printed on a sticker on the bottom of the unit. Username is admin.
  • Not registered: Check SIP server address, credentials, and that the Opal has internet (try pinging from the Opal admin). Make sure SIP Registration is enabled on the Twilio SIP Domain.
  • Registered but no audio: Check NAT traversal setting — try switching between Keep-Alive and STUN
  • One-way audio: Usually a NAT issue — enable STUN server (stun.l.google.com:19302) in HT812 settings

15.4 First Run

  1. Upload at least one stem via the dashboard (Stems page)
  2. Assign it the bed role
  3. The engine will start playing it immediately
  4. Upload more stems with fragments and air roles for full layered playback