Audio Transcription Automation

Overview

This application automates private, server‑side transcription and AI post‑processing of audio and video recordings for a distributed team using Nextcloud StorageShare.

It is designed for:

Zoom local recordings saved into Nextcloud
Local audio recorders that sync into Nextcloud
Fully private processing (no SaaS transcription services)
Clear separation between transcription and AI enrichment

The system runs as two independent workers:

Transcription worker (dev) — converts audio/video → .txt
AI worker (ai) — converts selected transcripts → structured Markdown documents

A third script exists for test seeding only.

High‑Level Architecture

Each team member has a private *-Transcripts folder in Nextcloud
An automation user is granted access to each space
Files are pulled via WebDAV, processed locally, then written back
No polling state is stored remotely; idempotency is enforced by file moves
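
A minimal sketch of how the pull and move steps might address files over WebDAV. It assumes Nextcloud's standard `/remote.php/dav/files/<user>/` endpoint; the helper names `davUrl` and `moveRequest` are hypothetical and not taken from the repository.

```javascript
// Build a Nextcloud WebDAV URL for a path inside a user's files.
// Assumes the standard /remote.php/dav/files/<user>/ endpoint.
function davUrl(base, user, path) {
  const segments = path.split("/").filter(Boolean).map(encodeURIComponent);
  const root = base.replace(/\/+$/, "");
  return `${root}/remote.php/dav/files/${encodeURIComponent(user)}/${segments.join("/")}`;
}

// Describe a WebDAV MOVE. "Overwrite: F" asks the server to reject the move
// if the destination already exists, so a repeated run cannot clobber a file.
function moveRequest(base, user, from, to) {
  return {
    url: davUrl(base, user, from),
    method: "MOVE",
    headers: { Destination: davUrl(base, user, to), Overwrite: "F" },
  };
}
```

The returned object can be handed to any HTTP client that supports custom methods, with the automation user's credentials added as an Authorization header. Because a moved file no longer appears in the inbox, the next run simply does not see it.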

Folder Model (per user)

<Name>-Transcripts/
├── New-Recordings/
│   ├── Audio/                  (default save location for voice recorders)
│   ├── <Zoom meeting folders>/
│   └── *.m4a / *.mp3 / *.mp4
├── Transcripts/                (generated .txt files)
├── Completed/
│   ├── Audio/
│   └── Video/
├── AI/
│   ├── *.txt                   (user‑selected AI inputs)
│   └── Output/                 (generated .md files)
└── Hold/                       (quarantined failures)
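
As an illustration of the layout above, processed media could be routed to Completed/ by file extension. The mapping of .mp4 to Video and .m4a/.mp3 to Audio is an assumption, and `completedPathFor` is a hypothetical helper.

```javascript
// Extension to destination under Completed/ (assumed mapping, not confirmed by the source).
const COMPLETED_DIR = {
  ".m4a": "Completed/Audio",
  ".mp3": "Completed/Audio",
  ".mp4": "Completed/Video",
};

// Return the post-processing location for a media file,
// or null when the extension is not a supported media type.
function completedPathFor(userRoot, fileName) {
  const dot = fileName.lastIndexOf(".");
  const ext = dot === -1 ? "" : fileName.slice(dot).toLowerCase();
  const dir = COMPLETED_DIR[ext];
  return dir ? `${userRoot}/${dir}/${fileName}` : null;
}
```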

Workers

1. Transcription Worker (`npm run dev`)

Purpose

Detect new audio/video files
Transcribe with whisper.cpp
Normalize known terminology
Upload .txt transcripts
Move processed media out of inboxes

Behavior

Supports loose files and Zoom folders
Multi‑audio Zoom folders produce numbered transcripts
Zero‑byte or invalid files are moved to Hold
Re‑runs are safe (existing transcripts are skipped)
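
The numbering and skip behavior can be sketched as a small planning function. The naming scheme (`<folder>.txt` for one audio file, `<folder>-1.txt`, `<folder>-2.txt`, and so on for several) is an assumption; the source only states that transcripts are numbered.

```javascript
// Decide which transcripts a Zoom folder still needs.
// Naming is an assumption: "<folder>.txt" for a single audio file,
// "<folder>-1.txt", "<folder>-2.txt", ... when there are several.
function planTranscripts(folderName, audioFiles, existingTranscripts) {
  const existing = new Set(existingTranscripts);
  const sorted = [...audioFiles].sort(); // same order, and so same numbers, on every run
  return sorted
    .map((audio, i) => ({
      audio,
      transcript: sorted.length === 1 ? `${folderName}.txt` : `${folderName}-${i + 1}.txt`,
    }))
    .filter((job) => !existing.has(job.transcript)); // re-runs skip finished work
}
```

Sorting before numbering keeps the transcript names stable between runs, which is what lets the existence check act as the skip condition.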

Audio formats

.m4a, .mp3, .mp4
Audio is converted to mono 16 kHz WAV before transcription
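
One way to perform this conversion is with ffmpeg. The source does not name the conversion tool, so treat ffmpeg as an assumption; `wavConversionArgs` is a hypothetical helper that only builds the argument list.

```javascript
// Build ffmpeg arguments that produce mono, 16 kHz, 16-bit PCM WAV.
// Assumes ffmpeg is the converter; the source does not name the tool.
function wavConversionArgs(inputPath, outputPath) {
  return [
    "-y",                // overwrite a stale output left by an interrupted run
    "-i", inputPath,
    "-vn",               // ignore any video stream (for .mp4 inputs)
    "-ac", "1",          // one channel (mono)
    "-ar", "16000",      // 16 kHz sample rate
    "-c:a", "pcm_s16le", // 16-bit little-endian PCM
    outputPath,
  ];
}
```

The array can be passed to `execFile("ffmpeg", args)` from `node:child_process`, which avoids shell quoting problems with spaces in Zoom folder names.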

2. AI Worker (`npm run ai`)

Purpose

Process only transcripts explicitly placed in AI/
Generate structured Markdown documents via OpenAI
Move original .txt back to Transcripts/

Key rules

If AI/ does not exist → skip
If AI/ exists but is empty → no‑op
AI/Output/ is created only when needed
Output filenames are deduplicated (-2, -3, etc.)
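
The deduplication rule can be illustrated with a short function. This is a sketch of the `-2`, `-3` suffix scheme described above, not the repository's implementation; `dedupeName` is a hypothetical name.

```javascript
// Pick a filename that does not collide with names already in AI/Output/.
// The first collision yields "-2", the next "-3", and so on.
function dedupeName(fileName, existingNames) {
  const taken = new Set(existingNames);
  if (!taken.has(fileName)) return fileName;
  const dot = fileName.lastIndexOf(".");
  const stem = dot > 0 ? fileName.slice(0, dot) : fileName;
  const ext = dot > 0 ? fileName.slice(dot) : "";
  for (let n = 2; ; n++) {
    const candidate = `${stem}-${n}${ext}`;
    if (!taken.has(candidate)) return candidate;
  }
}
```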

Output

Markdown documents (meeting summaries, notes, etc.)
Model responses are schema‑validated JSON, which is then rendered to Markdown
Company name normalization enforced
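
A minimal sketch of the validate-then-render step. The document shape used here (`title`, `summary`, `actionItems`) is purely hypothetical; the real schema lives in the repository and is not described in this README.

```javascript
// Hypothetical document shape: { title, summary, actionItems[] }.
// Returns a list of problems; an empty list means the document is valid.
function validateSummary(doc) {
  const errors = [];
  if (typeof doc?.title !== "string" || doc.title.trim() === "") {
    errors.push("title must be a non-empty string");
  }
  if (typeof doc?.summary !== "string") {
    errors.push("summary must be a string");
  }
  if (!Array.isArray(doc?.actionItems) || !doc.actionItems.every((x) => typeof x === "string")) {
    errors.push("actionItems must be an array of strings");
  }
  return errors;
}

// Render only after validation passes, so malformed model output
// never reaches AI/Output/ as a half-formed document.
function renderMarkdown(doc) {
  const errors = validateSummary(doc);
  if (errors.length > 0) throw new Error(`Invalid AI output: ${errors.join("; ")}`);
  const items = doc.actionItems.map((item) => `- ${item}`).join("\n");
  return `# ${doc.title}\n\n${doc.summary}\n\n## Action Items\n\n${items}\n`;
}
```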

Whisper Model

The system uses whisper.cpp with a local model file.

Required

MODEL_PATH=/absolute/path/to/ggml-base.en.bin

Example deployment mount:

-v /var/lib/whisper-models:/models:ro
MODEL_PATH=/models/ggml-base.en.bin

The container will fail fast if the model is missing.
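
A fail-fast check along these lines could run at startup. This is a sketch under stated assumptions: `requireModelPath` is a hypothetical name, and the file check is injected as a parameter so the logic can be exercised without touching the disk.

```javascript
// Return the model path or throw before any work starts.
// `fileExists` is injected; in a worker it would typically be
// existsSync from node:fs.
function requireModelPath(env, fileExists) {
  const modelPath = env.MODEL_PATH;
  if (!modelPath) throw new Error("MODEL_PATH is not set");
  if (!modelPath.startsWith("/")) {
    throw new Error(`MODEL_PATH must be an absolute path: ${modelPath}`);
  }
  if (!fileExists(modelPath)) {
    throw new Error(`Whisper model not found at ${modelPath}`);
  }
  return modelPath;
}
```

Calling this before listing any Nextcloud folders means a missing volume mount surfaces as one clear error rather than a failure on every recording.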

Environment Variables (Core)

Nextcloud

NC_BASE=https://nextcloud.example.com
NC_USER=automation-user
NC_PASS=app-password
NC_ROOT=/Transcripts

Folder Templates

NC_TEMPLATE_INBOX=New-Recordings
NC_TEMPLATE_INBOX_AUDIO=New-Recordings/Audio
NC_TEMPLATE_TRANSCRIPTS=Transcripts
NC_TEMPLATE_COMPLETED_AUDIO=Completed/Audio
NC_TEMPLATE_COMPLETED_VIDEO=Completed/Video
NC_TEMPLATE_HOLD=Hold

AI

OPENAI_API_KEY=sk-...
OPENAI_MODEL=gpt-5-mini
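
The variables above might be loaded and checked as follows. This is a sketch: `loadConfig` and `userPath` are hypothetical helpers, and whether the real workers fall back to defaults for the folder templates is an assumption.

```javascript
const REQUIRED = ["NC_BASE", "NC_USER", "NC_PASS", "NC_ROOT"];

// Read configuration from an env-like object and report every missing
// variable in one error instead of failing on the first.
function loadConfig(env) {
  const missing = REQUIRED.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  return {
    base: env.NC_BASE.replace(/\/+$/, ""),
    user: env.NC_USER,
    pass: env.NC_PASS,
    root: env.NC_ROOT,
    // Fallbacks mirror the values listed above (assumed defaults).
    inbox: env.NC_TEMPLATE_INBOX ?? "New-Recordings",
    transcripts: env.NC_TEMPLATE_TRANSCRIPTS ?? "Transcripts",
    hold: env.NC_TEMPLATE_HOLD ?? "Hold",
  };
}

// Join root, a user's space, and a folder template into one remote path.
function userPath(config, userFolder, template) {
  return [config.root, userFolder, template].join("/").replace(/\/{2,}/g, "/");
}
```

In a worker, `loadConfig(process.env)` would be called once at startup.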

Test Seeding (Development Only)

Script

src/seed-testdata.mjs

Purpose

Create deterministic, realistic test data in Nextcloud
Validate transcription and AI behavior end‑to‑end
Exercise failure paths safely

Critical Rules

DESTRUCTIVE: archives any existing test root
Must use:

NC_ROOT=/Transcripts/_TEST_LATEST

Never run against production paths
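
A guard of this kind could enforce the rule at the top of the seed script. It is a sketch only; `assertSafeSeedRoot` is a hypothetical name, and the actual script may implement the check differently.

```javascript
const TEST_ROOT = "/Transcripts/_TEST_LATEST";

// Refuse to continue unless NC_ROOT is exactly the test root.
// An exact match (rather than a prefix check) rules out near-miss paths.
function assertSafeSeedRoot(env) {
  if (env.NC_ROOT !== TEST_ROOT) {
    throw new Error(
      `Refusing to seed: NC_ROOT must be ${TEST_ROOT}, got ${env.NC_ROOT ?? "(unset)"}`
    );
  }
}
```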

What it does

Archives previous test runs to _TEST_RUNS/<timestamp>
Creates multiple user spaces
Seeds:
- Loose audio
- Zoom folders (single/multi/no audio)
- Bad audio
- Zero‑byte files
- AI input transcripts

AI Output Note

AI/Output is never created by the seed script
This avoids duplicate folder artifacts in Nextcloud

Usage

npm run seed:test
npm run dev:test
npm run ai:test

Deployment Model

Docker image contains all runtime dependencies except the Whisper model
Systemd timers trigger:
- Transcription worker (e.g. every 15 minutes)
- AI worker (independent cadence)

Workers are intentionally stateless and safe to rerun.

Design Principles

Private by default
No background daemons
No implicit AI processing
Clear file‑based user intent
Fail fast, quarantine safely
Replaceable infrastructure

Non‑Goals

Real‑time transcription
Multi‑language support (currently)
SaaS transcription vendors
Automatic AI processing without user intent

marycamacho/audio-transcription-automation

Folders and files

Latest commit

History

Repository files navigation

Audio Transcription Automation

Overview

High‑Level Architecture

Folder Model (per user)

Workers

1. Transcription Worker (npm run dev)

2. AI Worker (npm run ai)

Whisper Model

Environment Variables (Core)

Nextcloud

Folder Templates

AI

Test Seeding (Development Only)

Script

Purpose

Critical Rules

What it does

AI Output Note

Usage

Deployment Model

Design Principles

Non‑Goals

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

1. Transcription Worker (`npm run dev`)

2. AI Worker (`npm run ai`)

Packages