Transcription Research Index
BrassTranscripts maintains this curated index of primary sources — peer-reviewed papers, active open-source tools, and documented benchmarks — so builders and researchers can verify the evidence behind AI transcription claims. Every entry includes a capsule annotation explaining what BrassTranscripts draws from it and what builders should do with it.
About This Index
BrassTranscripts treats this index as the evidence layer underneath its product documentation. When a feature description says "speaker diarization performance depends on overlap," this is the research that backs that claim. The index covers five categories corresponding to the main quality dimensions of AI transcription: accuracy, speaker identification, language coverage, audio robustness, and benchmark methodology.
Entries pass three inclusion criteria: the source must be a primary document (paper, benchmark, or documented tool — not a summary); it must still be current (active maintenance for tools, peer-reviewed or preprint for papers); and it must have a concrete applied implication for builders or users of AI transcription services. A small number of entries are first-party BrassTranscripts data studies — original investigations and production-data analyses published by BrassTranscripts itself — and are labeled as such in their bylines.
Transcription Accuracy
Covers the AI speech recognition architecture, the inference engine, the Open ASR Leaderboard, the Artificial Analysis benchmark, and Distil-Whisper, plus a first-party BrassTranscripts investigation into the "98% accuracy" claim. Together these form the core evidence base for AI transcription quality in real-world conditions.
Speaker Diarization
The neural diarization engine BrassTranscripts uses, overlap-aware diarization research, the ETH Zurich multi-model benchmark, the foundational end-to-end and powerset methods, and the in-the-wild datasets that reveal what speaker identification actually costs.
Multilingual Speech
FLEURS, Common Voice v20, Earnings-22, peer-reviewed accent research, two underrepresented-population benchmarks (Indian English, African English), the MMS 1,107-language model, and a first-party BrassTranscripts study of real language demand across 30 languages.
Audio Quality
Reverberation benchmarks, SNR thresholds, the CHiME-7 far-field challenge, the DNS Challenge and WHAMR! noise datasets, and a practical WER measurement library: the evidence base for explaining which audio problems hurt transcripts.
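For readers unfamiliar with the metric, word error rate (WER) is conventionally the word-level edit distance between a reference transcript and a hypothesis, divided by the number of reference words. The sketch below is a minimal, dependency-free illustration of that definition. It is not the library listed in this index, and the function name `word_error_rate` and the simple lowercase-and-split normalization are assumptions made for the example; production tools apply more careful text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Illustrative WER: (substitutions + deletions + insertions) / reference words.

    Assumption: normalization is just lowercasing and whitespace splitting.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (standard Levenshtein recurrence).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))
```

Under this definition a "98% accuracy" figure corresponds to roughly 2 errors per 100 reference words, and WER can exceed 1.0 when the hypothesis contains many inserted words.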
ASR Benchmarks
The Open ASR Leaderboard long-form track, Artificial Analysis STT comparison, MLPerf Inference v5.1, NIST SCTK, and the GigaSpeech multi-domain corpus — the methodological scaffolding behind every accuracy claim.
Curation Principles
Primary sources only
Papers, benchmarks, and documented tools — not summaries or second-hand descriptions. Each entry links to the authoritative source.
Applied annotations
Every entry includes a capsule explaining what builders should do with the finding — not a restatement of the abstract.
Quarterly refresh
Tool star counts, last-commit dates, and report publication dates are reviewed quarterly. Stale entries are updated or removed.
See the research applied
BrassTranscripts puts these findings into practice — professional AI transcription with speaker identification, 99+ languages, and real-world audio robustness.