marycamacho/audio-transcription-automation

Audio Transcription Automation

Overview

This application automates private, server‑side transcription and AI post‑processing of audio and video recordings for a distributed team using Nextcloud StorageShare.

It is designed for:

  • Zoom local recordings saved into Nextcloud
  • Local audio recorders that sync into Nextcloud
  • Fully private processing (no SaaS transcription services)
  • Clear separation between transcription and AI enrichment

The system runs as two independent workers:

  1. Transcription worker (dev) — converts audio/video → .txt
  2. AI worker (ai) — converts selected transcripts → structured Markdown documents

A third script exists for test seeding only.


High‑Level Architecture

  • Each team member has a private *-Transcripts folder in Nextcloud
  • An automation user is granted access to each space
  • Files are pulled via WebDAV, processed locally, then written back
  • No polling state is stored remotely; idempotency is enforced by file moves
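
As a rough illustration of "idempotency by file moves", the sketch below builds the WebDAV MOVE request that would retire a processed recording from the inbox. The /remote.php/dav/files/<user>/ prefix is Nextcloud's standard WebDAV endpoint; the function name, the return shape, and the example user space "Ana-Transcripts" are assumptions, not taken from this repository.

```javascript
// Sketch only: buildMoveRequest and its return shape are hypothetical.
// Once a file has been moved out of New-Recordings/, a later run no longer
// sees it, so no separate "already processed" state is needed.
function buildMoveRequest(base, user, fromPath, toPath) {
  // Percent-encode each path segment but keep the slashes.
  const encode = (p) =>
    p.split("/").filter(Boolean).map(encodeURIComponent).join("/");
  const davRoot =
    `${base.replace(/\/+$/, "")}/remote.php/dav/files/${encodeURIComponent(user)}`;
  return {
    url: `${davRoot}/${encode(fromPath)}`,
    method: "MOVE",
    headers: {
      // WebDAV MOVE takes the target as an absolute URL in this header.
      Destination: `${davRoot}/${encode(toPath)}`,
      // "F" asks the server to refuse the move if the target already exists.
      Overwrite: "F",
    },
  };
}

const moveRequest = buildMoveRequest(
  "https://nextcloud.example.com",
  "automation-user",
  "/Transcripts/Ana-Transcripts/New-Recordings/standup.m4a",
  "/Transcripts/Ana-Transcripts/Completed/Audio/standup.m4a",
);
console.log(moveRequest.method, moveRequest.url);
```

The returned object could then be handed to an HTTP client together with an Authorization header; that part is left out here.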

Folder Model (per user)

<Name>-Transcripts/
├── New-Recordings/
│   ├── Audio/                  (default save location for voice recorders)
│   ├── <Zoom meeting folders>/
│   └── *.m4a / *.mp3 / *.mp4
├── Transcripts/                (generated .txt files)
├── Completed/
│   ├── Audio/
│   └── Video/
├── AI/
│   └── *.txt                   (user‑selected AI inputs)
├── AI/Output/                  (generated .md files)
└── Hold/                       (quarantined failures)

Workers

1. Transcription Worker (npm run dev)

Purpose

  • Detect new audio/video files
  • Transcribe with whisper.cpp
  • Normalize known terminology
  • Upload .txt transcripts
  • Move processed media out of inboxes

Behavior

  • Supports loose files and Zoom folders
  • Multi‑audio Zoom folders produce numbered transcripts
  • Zero‑byte or invalid files are moved to Hold
  • Re‑runs are safe (existing transcripts are skipped)
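
The behavior rules above can be pictured as a small per-file triage step. This is a minimal sketch under stated assumptions: the function names, the action labels, and the idea that a transcript shares the media file's base name are all hypothetical, and "invalid" files (which need an actual decode attempt to detect) are not covered.

```javascript
// Sketch only: names and return values are illustrative, not the worker's API.
const MEDIA_EXTENSIONS = [".m4a", ".mp3", ".mp4"];

// Assumed naming rule: "standup.m4a" -> "standup.txt".
function transcriptNameFor(mediaName) {
  return mediaName.replace(/\.[^.]+$/, "") + ".txt";
}

// file: { name, size }; existingTranscripts: Set of names in Transcripts/.
function planAction(file, existingTranscripts) {
  const ext = file.name.slice(file.name.lastIndexOf(".")).toLowerCase();
  if (!MEDIA_EXTENSIONS.includes(ext)) return "ignore";
  if (file.size === 0) return "hold"; // zero-byte file: quarantine in Hold/
  if (existingTranscripts.has(transcriptNameFor(file.name))) return "skip";
  return "transcribe";
}

const alreadyDone = new Set(["standup.txt"]);
console.log(planAction({ name: "standup.m4a", size: 1024 }, alreadyDone));
console.log(planAction({ name: "retro.mp4", size: 2048 }, alreadyDone));
```

Checking for an existing transcript before doing any work is what makes a second run over the same inbox harmless.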

Audio formats

  • .m4a, .mp3, .mp4
  • Audio (including the audio track of .mp4 files) is converted to mono 16 kHz WAV before transcription
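
One common way to produce mono 16 kHz WAV is ffmpeg. The README does not say which tool the worker uses, so treat ffmpeg here as an assumption; the flags themselves are standard ffmpeg options, and the function name is hypothetical.

```javascript
// Sketch only: assumes ffmpeg is the converter, which the README does not state.
function wavConversionArgs(inputPath, outputPath) {
  return [
    "-y",               // overwrite the output file if it already exists
    "-i", inputPath,    // source .m4a / .mp3 / .mp4
    "-vn",              // drop any video stream
    "-ac", "1",         // one audio channel (mono)
    "-ar", "16000",     // 16 kHz sample rate
    "-c:a", "pcm_s16le", // 16-bit PCM, the WAV flavour whisper.cpp reads
    outputPath,
  ];
}

console.log(["ffmpeg", ...wavConversionArgs("standup.m4a", "standup.wav")].join(" "));
```

Returning an argument array rather than a shell string keeps file names with spaces safe when the array is passed to something like execFile from node:child_process.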

2. AI Worker (npm run ai)

Purpose

  • Process only transcripts explicitly placed in AI/
  • Generate structured Markdown documents via OpenAI
  • Move original .txt back to Transcripts/

Key rules

  • If AI/ does not exist → skip
  • If AI/ exists but is empty → no‑op
  • AI/Output/ is created only when needed
  • Output filenames are deduplicated (-2, -3, etc.)
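
The -2, -3 suffix rule could be implemented along these lines. This is a sketch: the function name is hypothetical, and it assumes the suffix goes before the file extension.

```javascript
// Sketch only: dedupeName is an illustrative helper, not the worker's API.
// existing: Set of file names already present in AI/Output/.
function dedupeName(name, existing) {
  if (!existing.has(name)) return name;
  const dot = name.lastIndexOf(".");
  const stem = dot > 0 ? name.slice(0, dot) : name;
  const ext = dot > 0 ? name.slice(dot) : "";
  // Start at 2 so the sequence reads name.md, name-2.md, name-3.md, ...
  for (let n = 2; ; n += 1) {
    const candidate = `${stem}-${n}${ext}`;
    if (!existing.has(candidate)) return candidate;
  }
}

const outputs = new Set(["weekly-sync.md", "weekly-sync-2.md"]);
console.log(dedupeName("weekly-sync.md", outputs));
```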

Output

  • Markdown documents (meeting summaries, notes, etc.)
  • Schema‑validated JSON → Markdown
  • Company name normalization enforced
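
To make "schema-validated JSON → Markdown" concrete, here is a minimal sketch. The README does not publish the actual schema, so the fields used here (title, summary, actionItems) are purely hypothetical, as are the function names; the point is only the order of operations: validate first, render second.

```javascript
// Sketch only: the field names below are assumed, not the project's schema.
function validateSummary(doc) {
  const errors = [];
  if (typeof doc?.title !== "string" || doc.title.trim() === "") {
    errors.push("title must be a non-empty string");
  }
  if (typeof doc?.summary !== "string") {
    errors.push("summary must be a string");
  }
  if (!Array.isArray(doc?.actionItems) ||
      !doc.actionItems.every((item) => typeof item === "string")) {
    errors.push("actionItems must be an array of strings");
  }
  return errors;
}

function toMarkdown(doc) {
  const errors = validateSummary(doc);
  if (errors.length > 0) {
    // Refuse to write a document from malformed model output.
    throw new Error(`Invalid AI output: ${errors.join("; ")}`);
  }
  const lines = [`# ${doc.title}`, "", doc.summary, "", "## Action items", ""];
  for (const item of doc.actionItems) lines.push(`- ${item}`);
  return lines.join("\n") + "\n";
}

console.log(toMarkdown({
  title: "Weekly sync",
  summary: "Short status round.",
  actionItems: ["Send notes to the team"],
}));
```

Rendering from validated JSON rather than accepting free-form Markdown from the model keeps the output layout under the worker's control.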

Whisper Model

The system uses whisper.cpp with a local model file.

Required

MODEL_PATH=/absolute/path/to/ggml-base.en.bin

Example deployment mount:

-v /var/lib/whisper-models:/models:ro
MODEL_PATH=/models/ggml-base.en.bin

The container will fail fast if the model is missing.
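
A fail-fast check of this kind might look like the sketch below. The function name is hypothetical, and the file-existence test is passed in as a parameter so the example stays free of imports; in a real worker that argument would typically be existsSync from node:fs.

```javascript
// Sketch only: requireModelPath is illustrative, not the project's startup code.
// fileExists: (path) => boolean, e.g. existsSync from node:fs.
function requireModelPath(env, fileExists) {
  const modelPath = env.MODEL_PATH;
  if (!modelPath) {
    throw new Error("MODEL_PATH is not set");
  }
  if (!modelPath.startsWith("/")) {
    throw new Error(`MODEL_PATH must be an absolute path: ${modelPath}`);
  }
  if (!fileExists(modelPath)) {
    throw new Error(`Whisper model not found at ${modelPath}`);
  }
  return modelPath;
}

// Demo with a stubbed file system that only "contains" the mounted model.
const mounted = new Set(["/models/ggml-base.en.bin"]);
console.log(
  requireModelPath({ MODEL_PATH: "/models/ggml-base.en.bin" }, (p) => mounted.has(p)),
);
```

Running this before any download or transcription work means a missing volume mount surfaces immediately rather than partway through a batch.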


Environment Variables (Core)

Nextcloud

NC_BASE=https://nextcloud.example.com
NC_USER=automation-user
NC_PASS=app-password
NC_ROOT=/Transcripts
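
As a sketch of how these four variables might be consumed, the helper below checks that they are present and derives an HTTP Basic Authorization header from the user and app password. The function name and the returned object are assumptions for illustration.

```javascript
// Sketch only: loadNextcloudConfig and its return shape are hypothetical.
function loadNextcloudConfig(env) {
  const required = ["NC_BASE", "NC_USER", "NC_PASS", "NC_ROOT"];
  const missing = required.filter((key) => !env[key]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  // Basic auth is "user:password" encoded as base64.
  const token = Buffer.from(`${env.NC_USER}:${env.NC_PASS}`).toString("base64");
  return {
    base: env.NC_BASE.replace(/\/+$/, ""), // tolerate a trailing slash
    root: env.NC_ROOT,
    user: env.NC_USER,
    authHeader: `Basic ${token}`,
  };
}

const ncConfig = loadNextcloudConfig({
  NC_BASE: "https://nextcloud.example.com/",
  NC_USER: "automation-user",
  NC_PASS: "app-password",
  NC_ROOT: "/Transcripts",
});
console.log(ncConfig.base, ncConfig.root);
```

In the worker itself the argument would be process.env.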

Folder Templates

NC_TEMPLATE_INBOX=New-Recordings
NC_TEMPLATE_INBOX_AUDIO=New-Recordings/Audio
NC_TEMPLATE_TRANSCRIPTS=Transcripts
NC_TEMPLATE_COMPLETED_AUDIO=Completed/Audio
NC_TEMPLATE_COMPLETED_VIDEO=Completed/Video
NC_TEMPLATE_HOLD=Hold
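
One plausible reading of these templates is that each is a path relative to a user's <Name>-Transcripts folder under NC_ROOT. The sketch below resolves them on that assumption; the function name, the result keys, and the example space name are hypothetical.

```javascript
// Sketch only: assumes templates are relative to <NC_ROOT>/<Name>-Transcripts.
function resolveFolders(env, spaceName) {
  // Join path pieces with single slashes and one leading slash.
  const join = (...parts) =>
    "/" + parts.flatMap((p) => String(p ?? "").split("/")).filter(Boolean).join("/");
  const space = join(env.NC_ROOT, spaceName);
  return {
    space,
    inbox: join(space, env.NC_TEMPLATE_INBOX),
    inboxAudio: join(space, env.NC_TEMPLATE_INBOX_AUDIO),
    transcripts: join(space, env.NC_TEMPLATE_TRANSCRIPTS),
    completedAudio: join(space, env.NC_TEMPLATE_COMPLETED_AUDIO),
    completedVideo: join(space, env.NC_TEMPLATE_COMPLETED_VIDEO),
    hold: join(space, env.NC_TEMPLATE_HOLD),
  };
}

const folders = resolveFolders(
  {
    NC_ROOT: "/Transcripts",
    NC_TEMPLATE_INBOX: "New-Recordings",
    NC_TEMPLATE_INBOX_AUDIO: "New-Recordings/Audio",
    NC_TEMPLATE_TRANSCRIPTS: "Transcripts",
    NC_TEMPLATE_COMPLETED_AUDIO: "Completed/Audio",
    NC_TEMPLATE_COMPLETED_VIDEO: "Completed/Video",
    NC_TEMPLATE_HOLD: "Hold",
  },
  "Ana-Transcripts",
);
console.log(folders.inboxAudio);
console.log(folders.completedVideo);
```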

AI

OPENAI_API_KEY=sk-...
OPENAI_MODEL=gpt-5-mini

Test Seeding (Development Only)

Script

src/seed-testdata.mjs

Purpose

  • Create deterministic, realistic test data in Nextcloud
  • Validate transcription and AI behavior end‑to‑end
  • Exercise failure paths safely

Critical Rules

  • DESTRUCTIVE: archives any existing test root
  • Must use:
NC_ROOT=/Transcripts/_TEST_LATEST
  • Never run against production paths
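
Because the script is destructive, a guard at the top that refuses any other root is a cheap safety net. The sketch below shows one way such a check could look; the function and constant names are hypothetical and the seed script may enforce the rule differently.

```javascript
// Sketch only: assertTestRoot is illustrative, not copied from seed-testdata.mjs.
const REQUIRED_TEST_ROOT = "/Transcripts/_TEST_LATEST";

function assertTestRoot(env) {
  // Ignore trailing slashes so "/Transcripts/_TEST_LATEST/" is also accepted.
  const root = (env.NC_ROOT ?? "").replace(/\/+$/, "");
  if (root !== REQUIRED_TEST_ROOT) {
    throw new Error(
      `Refusing to seed: NC_ROOT is "${env.NC_ROOT}", expected "${REQUIRED_TEST_ROOT}"`,
    );
  }
  return root;
}

console.log(assertTestRoot({ NC_ROOT: "/Transcripts/_TEST_LATEST" }));
```

An exact match is deliberately stricter than a prefix check, so a production path that merely contains the test root's name is still rejected.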

What it does

  • Archives previous test runs to _TEST_RUNS/<timestamp>
  • Creates multiple user spaces
  • Seeds:
    • Loose audio
    • Zoom folders (single/multi/no audio)
    • Bad audio
    • Zero‑byte files
    • AI input transcripts

AI Output Note

  • AI/Output is never created by the seed script
  • This avoids duplicate folder artifacts in Nextcloud

Usage

npm run seed:test
npm run dev:test
npm run ai:test

Deployment Model

  • Docker image contains all runtime dependencies except the Whisper model
  • Systemd timers trigger:
    • Transcription worker (e.g. every 15 minutes)
    • AI worker (independent cadence)

Workers are intentionally stateless and safe to rerun.


Design Principles

  • Private by default
  • No background daemons
  • No implicit AI processing
  • Clear file‑based user intent
  • Fail fast, quarantine safely
  • Replaceable infrastructure

Non‑Goals

  • Real‑time transcription
  • Multi‑language support (currently)
  • SaaS transcription vendors
  • Automatic AI processing without user intent

About

Updated: Fully deployed app for automating transcription of audio and video saved to Nextcloud
