This application automates private, server‑side transcription and AI post‑processing of audio and video recordings for a distributed team using Nextcloud StorageShare.
It is designed for:
- Zoom local recordings saved into Nextcloud
- Local audio recorders that sync into Nextcloud
- Fully private processing (no SaaS transcription services)
- Clear separation between transcription and AI enrichment
The system runs as two independent workers:
- Transcription worker (`dev`) — converts audio/video → `.txt`
- AI worker (`ai`) — converts selected transcripts → structured Markdown documents
A third script exists for test seeding only.
- Each team member has a private `*-Transcripts` folder in Nextcloud
- An automation user is granted access to each space
- Files are pulled via WebDAV, processed locally, then written back
- No polling state is stored remotely; idempotency is enforced by file moves
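The workers talk to Nextcloud through its standard WebDAV files endpoint. The sketch below shows that round trip under the assumptions that Node 18+ (global `fetch`) is available and the NC_* variables documented below are set; the helper names are illustrative, not the project's actual module.

```js
// Minimal WebDAV round trip: pull, process locally, write back.
const auth =
  "Basic " + Buffer.from(`${process.env.NC_USER}:${process.env.NC_PASS}`).toString("base64");
const dav = (remotePath) =>
  `${process.env.NC_BASE}/remote.php/dav/files/${process.env.NC_USER}${encodeURI(remotePath)}`;

// Download a remote file into a Buffer.
async function download(remotePath) {
  const res = await fetch(dav(remotePath), { headers: { Authorization: auth } });
  if (!res.ok) throw new Error(`GET ${remotePath}: ${res.status}`);
  return Buffer.from(await res.arrayBuffer());
}

// Upload a Buffer or string to a remote path.
async function upload(remotePath, body) {
  const res = await fetch(dav(remotePath), {
    method: "PUT",
    headers: { Authorization: auth },
    body,
  });
  if (!res.ok) throw new Error(`PUT ${remotePath}: ${res.status}`);
}

// MOVE is what makes re-runs safe: once a file leaves its inbox it is never
// picked up again, so no polling state has to live on the server.
async function move(fromPath, toPath) {
  const res = await fetch(dav(fromPath), {
    method: "MOVE",
    headers: { Authorization: auth, Destination: dav(toPath), Overwrite: "F" },
  });
  if (!res.ok) throw new Error(`MOVE ${fromPath} -> ${toPath}: ${res.status}`);
}
```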
Folder layout
<Name>-Transcripts/
├── New-Recordings/
│   ├── Audio/ (default save location for voice recorders)
│   ├── <Zoom meeting folders>/
│   └── *.m4a / *.mp3 / *.mp4
├── Transcripts/ (generated .txt files)
├── Completed/
│   ├── Audio/
│   └── Video/
├── AI/
│   ├── *.txt (user‑selected AI inputs)
│   └── Output/ (generated .md files)
└── Hold/ (quarantined failures)
Transcription worker (`dev`)
Purpose
- Detect new audio/video files
- Transcribe with whisper.cpp
- Normalize known terminology
- Upload `.txt` transcripts
- Move processed media out of inboxes
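The transcription step shells out to the whisper.cpp CLI. The sketch below assumes a hypothetical `WHISPER_BIN` override for the binary name (recent whisper.cpp builds ship `whisper-cli`, older ones a binary simply named `main`); the `-m`, `-f`, `-otxt`, and `-of` flags are the standard whisper.cpp options.

```js
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const run = promisify(execFile);

// Transcribe an already-prepared 16 kHz mono WAV with whisper.cpp.
async function transcribe(wavPath, outPrefix) {
  await run(process.env.WHISPER_BIN ?? "whisper-cli", [
    "-m", process.env.MODEL_PATH, // ggml model file (see MODEL_PATH below)
    "-f", wavPath,                // input WAV (mono, 16 kHz)
    "-otxt",                      // emit a plain-text transcript
    "-of", outPrefix,             // output prefix -> <outPrefix>.txt
  ]);
  return `${outPrefix}.txt`;      // caller uploads this and applies terminology fixes
}
```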
Behavior
- Supports loose files and Zoom folders
- Multi‑audio Zoom folders produce numbered transcripts
- Zero‑byte or invalid files are moved to `Hold`
- Re‑runs are safe (existing transcripts are skipped)
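Roughly, the per-file decision behind these rules looks like the following sketch; `existsRemote` is a hypothetical helper that would check the user's Transcripts/ folder over WebDAV.

```js
// Decide what to do with one inbox entry before any work is spent on it.
async function planAction(entry) {
  if (entry.size === 0) {
    return { action: "hold", reason: "zero-byte file" }; // quarantine, never retry blindly
  }
  if (await existsRemote(`Transcripts/${entry.baseName}.txt`)) {
    return { action: "skip", reason: "transcript already exists" }; // safe re-run
  }
  return { action: "transcribe" }; // decode/transcription failures also end up in Hold/
}
```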
Audio formats
- .m4a, .mp3, .mp4
- Audio is converted to mono 16 kHz WAV before transcription
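The conversion is a plain ffmpeg call; a sketch, assuming ffmpeg is available on the container's PATH.

```js
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const run = promisify(execFile);

// Normalize any supported input (.m4a / .mp3 / .mp4) to the mono 16 kHz WAV
// that whisper.cpp expects. For .mp4, ffmpeg extracts the audio track.
async function toWav(inputPath, wavPath) {
  await run("ffmpeg", [
    "-y",                // overwrite a stale temp file from an earlier failed run
    "-i", inputPath,     // source audio or video container
    "-ac", "1",          // mono
    "-ar", "16000",      // 16 kHz sample rate
    "-c:a", "pcm_s16le", // 16-bit PCM WAV
    wavPath,
  ]);
  return wavPath;
}
```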
AI worker (`ai`)
Purpose
- Process only transcripts explicitly placed in `AI/`
- Generate structured Markdown documents via OpenAI
- Move original `.txt` back to `Transcripts/`
Key rules
- If `AI/` does not exist → skip
- If `AI/` exists but is empty → no‑op
- `AI/Output/` is created only when needed
- Output filenames are deduplicated (`-2`, `-3`, etc.)
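The dedup rule amounts to probing for the first free name; a small sketch with illustrative names.

```js
// Probe for the first free name: Notes.md, Notes-2.md, Notes-3.md, ...
// `existing` is a Set of filenames already present in AI/Output/.
function dedupeName(baseName, ext, existing) {
  let candidate = `${baseName}${ext}`;
  for (let n = 2; existing.has(candidate); n++) {
    candidate = `${baseName}-${n}${ext}`;
  }
  return candidate;
}

// dedupeName("Notes", ".md", new Set(["Notes.md", "Notes-2.md"])) -> "Notes-3.md"
```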
Output
- Markdown documents (meeting summaries, notes, etc.)
- Schema‑validated JSON → Markdown
- Company name normalization enforced
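A hedged sketch of the enrichment step: request schema-constrained JSON from the Chat Completions API, then render it to Markdown. The schema, prompt, and field names here are placeholders, not the real document contract; company-name normalization would hook into the rendering step.

```js
// Illustrative JSON Schema for the structured output.
const schema = {
  type: "object",
  additionalProperties: false,
  required: ["title", "summary", "action_items"],
  properties: {
    title: { type: "string" },
    summary: { type: "string" },
    action_items: { type: "array", items: { type: "string" } },
  },
};

async function enrich(transcript) {
  const res = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: process.env.OPENAI_MODEL,
      messages: [
        { role: "system", content: "Summarize this meeting transcript." },
        { role: "user", content: transcript },
      ],
      response_format: {
        type: "json_schema",
        json_schema: { name: "meeting_doc", strict: true, schema },
      },
    }),
  });
  if (!res.ok) throw new Error(`OpenAI: ${res.status}`);
  const doc = JSON.parse((await res.json()).choices[0].message.content);

  // Render the validated JSON as Markdown.
  return [
    `# ${doc.title}`,
    "",
    doc.summary,
    "",
    "## Action items",
    ...doc.action_items.map((item) => `- ${item}`),
  ].join("\n");
}
```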
The system uses whisper.cpp with a local model file.
Required
MODEL_PATH=/absolute/path/to/ggml-base.en.bin
Example deployment mount:
-v /var/lib/whisper-models:/models:ro
MODEL_PATH=/models/ggml-base.en.bin
The container will fail fast if the model is missing.
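The fail-fast check can be as small as this startup sketch.

```js
import { accessSync, constants } from "node:fs";

// Abort at startup rather than failing halfway through a batch.
const modelPath = process.env.MODEL_PATH;
if (!modelPath) {
  console.error("MODEL_PATH is not set");
  process.exit(1);
}
try {
  accessSync(modelPath, constants.R_OK); // model must exist and be readable
} catch {
  console.error(`Whisper model not readable at ${modelPath}`);
  process.exit(1);
}
```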
NC_BASE=https://nextcloud.example.com
NC_USER=automation-user
NC_PASS=app-password
NC_ROOT=/Transcripts
NC_TEMPLATE_INBOX=New-Recordings
NC_TEMPLATE_INBOX_AUDIO=New-Recordings/Audio
NC_TEMPLATE_TRANSCRIPTS=Transcripts
NC_TEMPLATE_COMPLETED_AUDIO=Completed/Audio
NC_TEMPLATE_COMPLETED_VIDEO=Completed/Video
NC_TEMPLATE_HOLD=Hold
OPENAI_API_KEY=sk-...
OPENAI_MODEL=gpt-5-mini
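The NC_TEMPLATE_* values are folder names joined onto each user space. A sketch of that composition; how spaces are discovered is omitted, "Alice-Transcripts" is a made-up example, and the usage line assumes the per-user spaces sit directly under NC_ROOT.

```js
const env = process.env;

// Resolve the well-known subfolders inside one user space.
function spacePaths(spaceRoot) {
  return {
    inbox:          `${spaceRoot}/${env.NC_TEMPLATE_INBOX}`,
    inboxAudio:     `${spaceRoot}/${env.NC_TEMPLATE_INBOX_AUDIO}`,
    transcripts:    `${spaceRoot}/${env.NC_TEMPLATE_TRANSCRIPTS}`,
    completedAudio: `${spaceRoot}/${env.NC_TEMPLATE_COMPLETED_AUDIO}`,
    completedVideo: `${spaceRoot}/${env.NC_TEMPLATE_COMPLETED_VIDEO}`,
    hold:           `${spaceRoot}/${env.NC_TEMPLATE_HOLD}`,
  };
}

// spacePaths(`${env.NC_ROOT}/Alice-Transcripts`).inbox
//   -> "/Transcripts/Alice-Transcripts/New-Recordings"
```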
src/seed-testdata.mjs
- Create deterministic, realistic test data in Nextcloud
- Validate transcription and AI behavior end‑to‑end
- Exercise failure paths safely
- DESTRUCTIVE: archives any existing test root
- Must use: `NC_ROOT=/Transcripts/_TEST_LATEST`
- Never run against production paths
- Archives previous test runs to `_TEST_RUNS/<timestamp>`
- Creates multiple user spaces
- Seeds:
- Loose audio
- Zoom folders (single/multi/no audio)
- Bad audio
- Zero‑byte files
- AI input transcripts
- `AI/Output` is never created by the seed script
- This avoids duplicate folder artifacts in Nextcloud
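Given the rules above, a guard like the following sketch keeps the seed script away from production paths (illustrative, not the script's actual code).

```js
// Refuse to seed anything that does not look like the dedicated test root.
const root = process.env.NC_ROOT ?? "";
if (!root.endsWith("/_TEST_LATEST")) {
  console.error(`Refusing to seed into "${root}": NC_ROOT must end with /_TEST_LATEST`);
  process.exit(1);
}
```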
npm run seed:test
npm run dev:test
npm run ai:test
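A plausible shape for these scripts, assuming the :test variants only pin NC_ROOT to the test space; the worker entry-point filenames are placeholders.

```json
{
  "scripts": {
    "seed:test": "NC_ROOT=/Transcripts/_TEST_LATEST node src/seed-testdata.mjs",
    "dev:test": "NC_ROOT=/Transcripts/_TEST_LATEST node src/transcribe-worker.mjs",
    "ai:test": "NC_ROOT=/Transcripts/_TEST_LATEST node src/ai-worker.mjs"
  }
}
```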
- Docker image contains all runtime dependencies except the Whisper model
- Systemd timers trigger:
- Transcription worker (e.g. every 15 minutes)
- AI worker (independent cadence)
Workers are intentionally stateless and safe to rerun.
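An example of the timer wiring for the transcription worker; the unit names, env file, and image tag are placeholders for your deployment, and the AI worker would get an equivalent pair on its own cadence.

```ini
# /etc/systemd/system/transcribe-worker.service (example)
[Unit]
Description=Nextcloud transcription worker (one-shot)

[Service]
Type=oneshot
ExecStart=/usr/bin/docker run --rm \
  --env-file /etc/transcribe-worker.env \
  -v /var/lib/whisper-models:/models:ro \
  transcribe-worker:latest

# /etc/systemd/system/transcribe-worker.timer (example)
[Unit]
Description=Run the transcription worker every 15 minutes

[Timer]
OnCalendar=*:0/15
Persistent=true

[Install]
WantedBy=timers.target
```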
Design principles
- Private by default
- No background daemons
- No implicit AI processing
- Clear file‑based user intent
- Fail fast, quarantine safely
- Replaceable infrastructure
Out of scope
- Real‑time transcription
- Multi‑language support (currently)
- SaaS transcription vendors
- Automatic AI processing without user intent