Transcription Research Index

BrassTranscripts maintains this curated index of primary sources — peer-reviewed papers, active open-source tools, and documented benchmarks — so builders and researchers can verify the evidence behind AI transcription claims. Every entry includes a capsule annotation explaining what BrassTranscripts draws from it and what builders should do with it.

About This Index

BrassTranscripts treats this index as the evidence layer underneath its product documentation. When a feature description says "speaker diarization performance depends on overlap," the research backing that claim is listed here. The index covers five categories corresponding to the main quality dimensions of AI transcription: accuracy, speaker identification, language coverage, audio robustness, and benchmark methodology.

Entries must pass three inclusion criteria. The source must be a primary document (a paper, benchmark, or documented tool, not a summary). It must be current and credible (active maintenance for tools, peer-reviewed or preprint status for papers). And it must have a concrete applied implication for builders or users of AI transcription services. A small number of entries are first-party BrassTranscripts data studies, meaning original investigations and production-data analyses published by BrassTranscripts itself, and are labeled as such in their bylines.

Transcription Accuracy

The AI speech recognition architecture, the inference engine, the Open ASR Leaderboard, the Artificial Analysis benchmark, Distil-Whisper, and the Conformer architecture, plus a first-party BrassTranscripts investigation into the "98% accuracy" claim: the core evidence base for AI transcription quality in real-world conditions.

8 entries

Speaker Diarization

The neural diarization engine BrassTranscripts uses, overlap-aware diarization research, the ETH Zurich multi-model benchmark, the foundational end-to-end and powerset methods, the Third DIHARD challenge, and the in-the-wild datasets that reveal what speaker identification actually costs.

8 entries

Multilingual Speech

FLEURS, Common Voice v20, Earnings-22, peer-reviewed accent research, two underrepresented-population benchmarks (Indian English, African English), the MMS 1,107-language model, and a first-party BrassTranscripts study of real language demand across 30 languages.

8 entries

Audio Quality

Reverberation benchmarks, SNR thresholds, the CHiME-7 far-field challenge, the DNS Challenge and WHAMR! noise datasets, the DNSMOS perceptual quality metric, and a practical WER measurement library: the evidence base for explaining which audio problems hurt transcripts.

7 entries

ASR Benchmarks

The Open ASR Leaderboard long-form track, Artificial Analysis STT comparison, MLPerf Inference v5.1, NIST SCTK, the GigaSpeech multi-domain corpus, and The People's Speech 30,000-hour dataset — the methodological scaffolding behind every accuracy claim.

8 entries

Curation Principles

Primary sources only

Papers, benchmarks, and documented tools — not summaries or second-hand descriptions. Each entry links to the authoritative source.

Applied annotations

Every entry includes a capsule explaining what builders should do with the finding — not a restatement of the abstract.

Quarterly refresh

Tool star counts, last-commit dates, and report publication dates are reviewed quarterly. Stale entries are updated or removed.

See the research applied

BrassTranscripts puts these findings into practice — professional AI transcription with speaker identification, 99+ languages, and real-world audio robustness.
