Multilingual Audio & Video Transcription in 99+ Languages

BrassTranscripts transcribes recordings in 99+ languages with automatic language detection and speaker identification — no language dropdown, no separate tool per language. Upload audio or video in any supported language and get TXT, SRT, VTT, and JSON transcripts at a $2.50-$6 flat rate, no subscription.

  • 99+ languages with auto-detection
  • Automatic speaker ID across languages
  • $2.50-$6, the same flat rate for every language
  • 4 formats: TXT, SRT, VTT, JSON

How Multilingual Transcription Works

BrassTranscripts detects the spoken language automatically — there is no language to select and no wrong choice to make. The workflow is identical no matter what language your recording is in.

1. Upload Your Recording: No Language Setting

Drag and drop any of 11 supported formats (MP3, M4A, WAV, MP4, and more) up to 450 MB. You don't pick a language: the AI transcription engine identifies the spoken language directly from the audio, even when you're unsure of the exact language or dialect.

2. Automatic Transcription + Speaker Labels

The engine transcribes in the detected language, taking about 1-3 minutes of processing per hour of audio, and labels each speaker automatically. For recordings that switch languages or have speakers using different languages, each segment is transcribed in the language actually spoken, with speaker labels preserved across the whole file.

3. Preview, Pay, Download Four Formats

Review the first 30 words to verify quality on your specific audio, then pay the flat rate: $2.50 for files up to 15 minutes, or $6.00 for files of 16 minutes or longer (any length). Download TXT, SRT, VTT, and JSON; the JSON includes the detected language and word-level timestamps. To translate the result, run it through the multilingual AI prompt set.
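As a rough illustration of working with the JSON download, the sketch below reads a transcript and summarizes it. The page states only that the JSON includes the detected language and word-level timestamps; the field names used here (`language`, `words`, `word`, `start`, `end`, `speaker`) and the sample data are hypothetical, so check them against a real file before relying on this.

```python
import json

# Hypothetical sample: these field names and values are assumptions for
# illustration, not the documented BrassTranscripts JSON schema.
SAMPLE_JSON = """
{
  "language": "es",
  "words": [
    {"word": "Hola", "start": 0.0, "end": 0.4, "speaker": "Speaker A"},
    {"word": "buenos", "start": 0.5, "end": 0.9, "speaker": "Speaker A"},
    {"word": "Gracias", "start": 1.2, "end": 1.8, "speaker": "Speaker B"}
  ]
}
"""


def summarize(transcript_json: str) -> dict:
    """Summarize a transcript: detected language, speakers, word count, duration."""
    data = json.loads(transcript_json)
    words = data.get("words", [])
    # Collect the distinct speaker labels, skipping words without one.
    speakers = sorted({w["speaker"] for w in words if "speaker" in w})
    # Use the latest word end time as the transcript duration in seconds.
    duration = max((w["end"] for w in words), default=0.0)
    return {
        "language": data.get("language"),
        "speakers": speakers,
        "word_count": len(words),
        "duration_seconds": duration,
    }


print(summarize(SAMPLE_JSON))
```

A summary like this is a quick way to confirm the detected language and speaker count before sending the transcript on to translation or review.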

99+ Languages, With Honest Accuracy Expectations

BrassTranscripts supports 99+ languages, but accuracy is not uniform across them — it tracks how much training data exists for each language. We're transparent about this rather than claiming one accuracy number for every language.

Professional-grade coverage

High-resource languages with large training corpora — English, Spanish, Portuguese, French, German, Italian, Mandarin, Japanese, Dutch — produce consistent, professional-grade results suitable for most business and content workflows.

Variable coverage — verify first

Lower-resource languages, heavy regional accents, and non-native speech produce more variable results that may need human review for high-stakes documents. The 30-word preview lets you check quality before paying.

Why accuracy varies by language: the supporting evidence (coverage maps, accent benchmarks, and underrepresented-language research) is documented on the multilingual speech research page, part of the BrassTranscripts Research Index.

Mixed-Language Recordings & Speaker Labels

Real multilingual audio rarely stays in one language. A meeting opens in English and shifts to Spanish; an interview subject code-switches mid-sentence; a panel has speakers in three languages. BrassTranscripts transcribes each segment in the language actually spoken and keeps speaker labels consistent across the whole recording — so a mixed-language file becomes one attributed transcript instead of a jumble.

  • Code-switching: segments are transcribed in the language spoken, not forced into one language.
  • Multi-language speakers: automatic speaker identification labels who said what, across languages.
  • Post-processing: the multilingual AI prompt set adds translate-to-English, language identification, code-switching analysis, and cross-language terminology cleanup.

Who Needs Multilingual Transcription

🌍 Global & Distributed Teams

Multilingual meetings, all-hands, and cross-border calls transcribed with speaker labels — no separate tool per region.

🔬 Researchers & Journalists

Field interviews and source material in the subject's own language, with attributed quotes and timestamps for analysis and fact-checking.

🎙️ International Podcasters & Creators

Non-English and bilingual shows get SRT/VTT captions and TXT show notes from a single upload — captions in the language spoken.

⚖️ Legal, Immigration & Social Services

Client recordings in the speaker's native language, transcribed verbatim with speaker separation before translation or review.

Same Flat Rate, Every Language

No language surcharge, no subscription, no per-minute meter. The price depends only on duration — identical whether the recording is in English, Portuguese, Arabic, or any other supported language.

  • ✓ $2.50 flat for files up to 15 minutes
  • ✓ $6.00 flat for files 16 minutes and longer (any length)
  • ✓ Automatic language detection + speaker identification, included
  • ✓ All four formats: TXT, SRT, VTT, JSON
  • ✓ 30-word preview before payment
  • ✓ 100% money-back satisfaction guarantee
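The duration-only pricing above can be sketched as a small function. This is a minimal illustration, not BrassTranscripts' billing code: the function name and constants are hypothetical, and it assumes any recording longer than 15 minutes falls in the $6.00 tier, since the page does not say how durations between 15 and 16 minutes are billed.

```python
from decimal import Decimal

# Tier values taken from the pricing list; the names are illustrative.
SHORT_FILE_LIMIT_MINUTES = 15
SHORT_FILE_PRICE = Decimal("2.50")
LONG_FILE_PRICE = Decimal("6.00")


def flat_rate(duration_minutes: float) -> Decimal:
    """Return the flat price in USD for a recording of the given duration.

    Assumption: anything longer than 15 minutes costs $6.00. Language is
    deliberately not a parameter, because the price depends only on duration.
    """
    if duration_minutes <= 0:
        raise ValueError("duration must be positive")
    if duration_minutes <= SHORT_FILE_LIMIT_MINUTES:
        return SHORT_FILE_PRICE
    return LONG_FILE_PRICE


for minutes in (5, 15, 16, 180):
    print(f"{minutes:>4} min -> ${flat_rate(minutes)}")
```

Note that a three-hour recording costs the same as a 16-minute one, so the effective per-minute cost falls as recordings get longer.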

Transcribe Your Recording in Any of 99+ Languages

Upload • Automatic language detection • Speaker labels • TXT, SRT, VTT, JSON in minutes

Start Transcribing →

Preview before paying • $2.50-$6 flat rate • No subscription • 100% satisfaction guarantee

Frequently Asked Questions About Multilingual Transcription

How many languages does BrassTranscripts transcribe?

BrassTranscripts transcribes audio and video in 99+ languages with automatic language detection — you don't select the language before uploading. Coverage spans high-resource languages like English, Spanish, Portuguese, French, German, Italian, Mandarin, Japanese, and Arabic, plus dozens of lower-resource languages. Accuracy is strongest for high-resource languages with large training corpora; less-resourced languages produce more variable results and may need human review for high-stakes documents.

Do I need to specify the language before uploading?

No. BrassTranscripts detects the spoken language automatically from the audio itself, so there is no language dropdown to set and no risk of choosing the wrong one. This matters most when you're unsure of a recording's exact language or dialect, or don't know it in advance. The detected language is recorded with the transcript and exposed in the JSON output.

Can BrassTranscripts handle a recording with more than one language?

Yes. BrassTranscripts transcribes recordings that switch between languages (code-switching) and multi-party recordings where different speakers use different languages, labeling each speaker automatically. The AI transcription engine transcribes each segment in the language actually spoken. For mixed-language meetings, interviews, and panel discussions, this produces a single transcript that preserves who said what, in the language they said it.

Does multilingual transcription include speaker identification?

Yes. BrassTranscripts includes automatic speaker identification on every transcript in every language at no extra charge, labeling speakers as Speaker A, Speaker B, and so on with consistent labels throughout. Speaker labels work across languages, which is what makes multilingual meetings, multi-host podcasts, and cross-border interviews usable as a single attributed transcript.

Can BrassTranscripts translate a non-English recording into English?

BrassTranscripts transcribes recordings in their original spoken language. To turn a non-English transcript into English, BrassTranscripts publishes a set of multilingual AI prompts — including a translate-to-English prompt — that run the finished transcript through ChatGPT, Claude, or Gemini. This keeps the verbatim source-language transcript intact while producing a separate English version you can verify against the original.

How accurate is AI transcription for non-English languages?

Accuracy correlates with how much training data exists for a language. BrassTranscripts produces professional-grade results for high-resource languages (English, Spanish, Portuguese, French, German, Italian) and more variable results for low-resource languages and heavy regional accents. Because accuracy depends on your specific recording, BrassTranscripts shows a 30-word preview before payment so you can verify quality on your actual audio before paying.

What does multilingual transcription cost?

BrassTranscripts charges the same flat rate for every language: $2.50 for files up to 15 minutes and $6.00 for files 16 minutes and longer (any length), with no subscription and no per-minute meter. Every transcript includes automatic speaker identification, automatic language detection, and all four output formats (TXT, SRT, VTT, JSON) regardless of language.

More questions about multilingual transcription? Visit our complete FAQ page or contact us.