12 min read | BrassTranscripts Team

What File Formats Can Be Transcribed? [Complete Audio & Video Format Guide]

BrassTranscripts supports 11 file formats total: 9 audio formats (MP3, M4A, WAV, AAC, FLAC, OGG, Opus, WebM, MPGA) and 2 video formats (MP4, MPEG). Maximum file size is 250MB. Audio/video duration must be between 5 minutes and 2 hours.

Complete Supported Formats List

BrassTranscripts supports 11 total formats (9 audio + 2 video). Read about why we chose to support these 11 formats in our technical decisions.

Audio Formats (9 Total)

| Format | Extension | Best For | Quality |
|--------|-----------|----------|---------|
| MP3 | .mp3 | General use, podcasts | Good compression, widely compatible |
| M4A | .m4a | Apple devices, iTunes | Better quality than MP3 at same size |
| WAV | .wav | Professional recordings | Uncompressed, maximum quality |
| AAC | .aac | Streaming, modern devices | Efficient compression, good quality |
| FLAC | .flac | Archival, audiophile | Lossless compression |
| OGG | .ogg | Open-source projects | Free format, good compression |
| Opus | .opus | Voice recordings, VoIP | Optimized for speech |
| WebM | .webm | Web recordings | Browser-native format |
| MPGA | .mpga | MPEG audio streams | Legacy audio format |

Video Formats (2 Total)

| Format | Extension | Best For | Note |
|--------|-----------|----------|------|
| MP4 | .mp4 | Universal video standard | Audio track extracted for transcription |
| MPEG | .mpeg | Legacy video files | Audio track extracted for transcription |

Important: For video files, BrassTranscripts extracts the audio track and transcribes it. Video content is not analyzed—only the spoken audio.
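As a rough illustration of the lists above, the following Python sketch classifies a file by its extension. The function name `classify_upload` is hypothetical, and the sketch assumes the extension reflects the real contents of the file; it is not how BrassTranscripts itself validates uploads.

```python
from pathlib import Path

# Extensions copied from the two tables above (9 audio + 2 video).
AUDIO_EXTENSIONS = {".mp3", ".m4a", ".wav", ".aac", ".flac",
                    ".ogg", ".opus", ".webm", ".mpga"}
VIDEO_EXTENSIONS = {".mp4", ".mpeg"}


def classify_upload(filename: str) -> str:
    """Return "audio", "video", or "unsupported" based on the extension only."""
    ext = Path(filename).suffix.lower()  # ".MP3" and ".mp3" are treated alike
    if ext in AUDIO_EXTENSIONS:
        return "audio"
    if ext in VIDEO_EXTENSIONS:
        return "video"  # only the audio track would be transcribed
    return "unsupported"
```

For example, `classify_upload("meeting.mp4")` returns `"video"`, which is a reminder that only the spoken audio in that file will end up in the transcript.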


Audio Formats Explained

MP3 (.mp3) - Most Common Format

Why it's popular: MP3 has been the standard audio format for decades. Nearly every device and software supports it.

Transcription quality: Excellent. Even with compression, MP3 retains sufficient audio quality for accurate transcription.

When to use:

  • Podcast recordings
  • Downloaded audio files
  • General audio transcription needs
  • Files from older recording equipment

Technical specs: Lossy compression, typically 128-320 kbps bitrate


M4A (.m4a) - Apple's Preferred Format

Why it's popular: M4A is the default format on Apple devices (iPhone, iPad, Mac). Both iTunes and the Voice Memos app create M4A files.

Transcription quality: Excellent. M4A typically provides better audio quality than MP3 at the same file size.

When to use:

  • iPhone/iPad recordings (Voice Memos app)
  • iTunes audio files
  • Apple ecosystem recordings
  • Higher quality audio at smaller sizes

Technical specs: AAC compression in MP4 container, typically 128-256 kbps


WAV (.wav) - Professional Uncompressed Audio

Why it's popular: Standard for professional audio recording. No compression means no quality loss.

Transcription quality: Maximum. Uncompressed audio preserves every detail.

When to use:

  • Professional studio recordings
  • High-quality interviews
  • Archival recordings
  • When file size is not a concern

Technical specs: Uncompressed PCM audio, typically 1,411 kbps (CD quality)

Warning: WAV files are large. A 1-hour recording can be 600MB+, exceeding BrassTranscripts' 250MB limit. Consider converting to FLAC or high-bitrate MP3 for long recordings.
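The size of an uncompressed WAV file follows directly from its sample rate, bit depth, and channel count. The sketch below works through that arithmetic for CD-quality audio. It assumes 1 MB = 1,000,000 bytes and ignores the small WAV header; the function names are illustrative.

```python
def pcm_bitrate_kbps(sample_rate_hz: int = 44_100,
                     bit_depth: int = 16,
                     channels: int = 2) -> float:
    """Bitrate of uncompressed PCM audio in kilobits per second."""
    return sample_rate_hz * bit_depth * channels / 1000


def file_size_mb(bitrate_kbps: float, duration_seconds: float) -> float:
    """Approximate file size in MB (decimal) for a constant bitrate."""
    total_bits = bitrate_kbps * 1000 * duration_seconds
    return total_bits / 8 / 1_000_000


cd_quality = pcm_bitrate_kbps()                 # about 1,411 kbps
one_hour_mb = file_size_mb(cd_quality, 3600)    # about 635 MB
```

Under these assumptions a one-hour CD-quality stereo recording comes to roughly 635 MB, which is why it cannot be uploaded as-is.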


AAC (.aac) - Modern Efficient Format

Why it's popular: Standard for streaming services and modern devices. Better compression efficiency than MP3.

Transcription quality: Excellent. AAC maintains high quality even at lower bitrates.

When to use:

  • Streaming audio downloads
  • Modern device recordings
  • YouTube audio extraction
  • Efficient storage with high quality

Technical specs: Advanced Audio Coding, typically 128-256 kbps


FLAC (.flac) - Lossless Compression

Why it's popular: Audiophile favorite. Compresses audio without losing quality (unlike MP3/AAC).

Transcription quality: Maximum. Audio quality is identical to WAV, but files are typically 40-60% smaller.

When to use:

  • High-quality archival recordings
  • Professional interviews requiring perfect fidelity
  • Long recordings where WAV would exceed size limits
  • Music recordings with subtle audio details

Technical specs: Lossless compression, typically 700-900 kbps


OGG (.ogg) - Open-Source Alternative

Why it's popular: Free, patent-free format. Common in open-source software and Linux systems.

Transcription quality: Good to excellent, depending on encoding settings.

When to use:

  • Linux system recordings
  • Open-source project audio
  • Game audio files
  • Web-based recordings using open formats

Technical specs: Ogg Vorbis compression, variable bitrate


Opus (.opus) - Speech-Optimized Format

Why it's popular: Designed specifically for voice and speech. Used in VoIP applications and voice chat.

Transcription quality: Excellent for speech. Optimized for clarity over music quality.

When to use:

  • VoIP call recordings (Discord, Signal, WhatsApp)
  • Voice chat recordings
  • Webinar recordings
  • Speech-only content

Technical specs: Opus codec, typically 16-64 kbps for speech


WebM (.webm) - Browser-Native Format

Why it's popular: Standard for web-based audio recording. Browsers can record directly to WebM.

Transcription quality: Good. Quality depends on recording settings.

When to use:

  • Browser-based recordings
  • Web application audio captures
  • Screen recordings with audio
  • Online meeting recordings

Technical specs: Opus or Vorbis audio in WebM container


MPGA (.mpga) - MPEG Audio Stream

Why it's popular: Legacy format for MPEG audio streams. Less common than MP3 but still in use.

Transcription quality: Good. Similar to MP3 quality.

When to use:

  • Legacy audio files
  • MPEG stream extractions
  • Older recording systems

Technical specs: MPEG-1 or MPEG-2 audio layer


Video Formats Explained

MP4 (.mp4) - Universal Video Standard

Why it's supported: MP4 is the most common video format worldwide. Nearly every modern device and platform supports it.

How transcription works: BrassTranscripts extracts the audio track from MP4 files and transcribes the speech. Video content is not analyzed.

When to use:

  • Zoom/Teams/Google Meet recordings
  • YouTube video downloads
  • Phone video recordings
  • Screen recordings with audio
  • Interview recordings (video cameras)
  • Webinar recordings

Common sources:

  • Video conferencing platforms (Zoom, Teams, Meet)
  • Smartphone cameras (iPhone, Android)
  • Screen recording software (OBS, Camtasia)
  • Video editing software exports
  • Security camera footage with audio

Technical specs: H.264 or H.265 video + AAC audio, maximum 250MB


MPEG (.mpeg) - Legacy Video Format

Why it's supported: Older video format still used in professional video equipment and legacy systems.

How transcription works: Audio track extracted and transcribed (same as MP4).

When to use:

  • Legacy video files
  • Professional video equipment recordings
  • Older security camera footage
  • Broadcast video files

Technical specs: MPEG-2 video + audio, maximum 250MB


File Size and Duration Limits

Maximum File Size: 250MB

Why this limit: Balances upload speed, processing efficiency, and practical recording lengths. All formats process at similar speeds - learn more in our processing time guide.

What 250MB allows:

  • MP3 (128 kbps): ~4 hours of audio
  • MP3 (320 kbps): ~1.7 hours of audio
  • M4A (128 kbps): ~4 hours of audio
  • WAV (CD quality): ~25 minutes of audio
  • FLAC: ~35-50 minutes of audio, depending on how well the recording compresses
  • MP4 video (1080p): ~15-30 minutes depending on bitrate
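To estimate how much audio fits under the size limit at a bitrate not listed above, the calculation can be sketched as follows. It assumes a constant bitrate, 1 MB = 1,000,000 bytes, and no container overhead, so real files will come out slightly different; `max_minutes` is an illustrative name.

```python
def max_minutes(bitrate_kbps: float, limit_mb: float = 250) -> float:
    """Minutes of audio that fit in limit_mb at a constant bitrate."""
    limit_bits = limit_mb * 1_000_000 * 8
    seconds = limit_bits / (bitrate_kbps * 1000)
    return seconds / 60


for label, kbps in [("MP3 128 kbps", 128), ("MP3 320 kbps", 320),
                    ("WAV CD quality", 1411.2)]:
    print(f"{label}: about {max_minutes(kbps):.0f} minutes")
```

Keep in mind that the 2-hour duration limit still applies even when the file size would allow a longer recording.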

If your file exceeds 250MB:

  1. Compress the file: Convert high-quality formats (WAV) to efficient formats (MP3, M4A)
  2. Reduce bitrate: Re-encode at lower bitrate while maintaining speech clarity
  3. Split the file: Break long recordings into multiple segments under 250MB each
  4. Remove video: If transcribing video, extract audio-only (much smaller)
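Steps 1, 2, and 4 can often be done in a single FFmpeg pass. The sketch below only builds the command as a list of arguments; it assumes FFmpeg is installed and on your PATH, and the helper name and file names are hypothetical. The `-vn` option drops the video stream and `-b:a` sets the audio bitrate, while the encoder is chosen from the output extension.

```python
def ffmpeg_extract_audio_cmd(src: str, dst: str,
                             bitrate_kbps: int = 128) -> list[str]:
    """Build an FFmpeg command that drops video and re-encodes the audio."""
    return [
        "ffmpeg",
        "-i", src,                     # input file (audio or video)
        "-vn",                         # discard any video stream
        "-b:a", f"{bitrate_kbps}k",    # target audio bitrate
        dst,                           # output format inferred from extension
    ]


cmd = ffmpeg_extract_audio_cmd("meeting.mp4", "meeting.mp3")
```

Passing `cmd` to `subprocess.run(cmd, check=True)` would run the conversion. Building the command as a list avoids shell quoting problems with file names that contain spaces.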

Duration Limits: 5 Minutes to 2 Hours

Minimum duration: 5 minutes

Why: Ensures meaningful transcription content. Very short clips often lack context and are inefficient to process.

Maximum duration: 2 hours

Why: Practical limit for transcription quality and processing time. Most meetings, interviews, and podcasts fit within 2 hours.

If your recording exceeds 2 hours:

  1. Split into segments: Break into 2-hour (or shorter) chunks
  2. Transcribe separately: Upload each segment individually
  3. Combine transcripts: Merge the text files afterward

Example: 3-hour conference recording → Split into 90-minute segments → Upload twice → Combine transcripts
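The splitting arithmetic in this example can be sketched as a small helper that divides a recording into the fewest equal-length segments that stay within the 2-hour cap. The function name `split_plan` is illustrative, and the sketch works in whole seconds.

```python
import math

MAX_SEGMENT_SECONDS = 2 * 60 * 60  # the 2-hour limit described above


def split_plan(total_seconds: int,
               max_seconds: int = MAX_SEGMENT_SECONDS) -> list[tuple[int, int]]:
    """Return (start, duration) pairs, in seconds, for equal-length segments."""
    if total_seconds <= 0:
        raise ValueError("total_seconds must be positive")
    count = math.ceil(total_seconds / max_seconds)   # fewest segments needed
    base, extra = divmod(total_seconds, count)       # spread any remainder
    plan = []
    start = 0
    for i in range(count):
        duration = base + (1 if i < extra else 0)
        plan.append((start, duration))
        start += duration
    return plan


three_hours = split_plan(3 * 60 * 60)  # two 90-minute segments
```

Each `(start, duration)` pair corresponds to FFmpeg's `-ss` and `-t` options if you cut the file on the command line. Equal-length segments also avoid leaving a short final piece that could fall under the 5-minute minimum.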


Format Conversion Tips

When to Convert Formats

Convert WAV to FLAC or MP3:

  • WAV files exceeding 250MB
  • Long professional recordings
  • Archival storage with size constraints

Convert video to audio-only:

  • Video files exceeding 250MB
  • Only speech matters (no need to upload video)
  • Faster upload and processing

Convert exotic formats to MP3/M4A:

  • Formats not listed above (WMA, RA, etc.)
  • Ensuring maximum compatibility
  • Reducing troubleshooting

Free Conversion Tools

Desktop software:

  • Audacity (Windows, Mac, Linux) - Free, open-source audio editor and converter
  • VLC Media Player (Windows, Mac, Linux) - Convert audio and video formats
  • FFmpeg (Command-line tool) - Professional conversion for technical users

Online converters: Use with caution for sensitive content.

Privacy note: For confidential recordings (business meetings, legal interviews, medical content), use desktop software only. Online converters upload your files to third-party servers.


Conversion Best Practices

Maintain quality for transcription:

  • Use 128 kbps minimum for MP3/M4A (lower reduces accuracy)
  • Keep sample rate at 44.1 kHz or higher (speech clarity)
  • Preserve mono or stereo (don't force mono from stereo unnecessarily)

Reduce file size efficiently:

  1. Remove video track: Audio-only files are often around 90% smaller than the original video
  2. Use efficient formats: FLAC instead of WAV, M4A instead of WAV
  3. Normalize audio levels: Has little effect on file size at a fixed bitrate, but consistent volume helps transcription accuracy
  4. Trim silence: Remove long silent sections (intro/outro music, dead air)

Avoid these mistakes:

  • ❌ Converting already-compressed formats multiple times (MP3 → AAC → OGG degrades quality)
  • ❌ Using very low bitrates (<64 kbps) to shrink files (damages transcription accuracy)
  • ❌ Converting lossless to lossy then back to lossless (doesn't restore quality)

Common Format Questions

Can I upload files from Zoom/Teams/Google Meet?

Yes, absolutely. All major video conferencing platforms export compatible formats:

  • Zoom: MP4 (video) or M4A (audio-only) ✅
  • Microsoft Teams: MP4 ✅
  • Google Meet: MP4 ✅
  • Webex: MP4 ✅

Pro tip: Download audio-only recordings when available (M4A from Zoom). They're much smaller than video files and upload faster.


What about iPhone/Android recordings?

iPhone (Voice Memos app): M4A ✅ Directly supported

Android:

  • Most devices: MP3 or M4A ✅ Directly supported
  • Samsung Voice Recorder: M4A ✅ Directly supported
  • Google Recorder: M4A ✅ Directly supported

Can I transcribe YouTube videos?

Yes, but you need to download the audio first. BrassTranscripts does not download YouTube videos for you.

How to download YouTube audio:

  1. Use yt-dlp (command-line tool, most reliable)
  2. Use 4K Video Downloader (desktop app)
  3. Extract audio as MP3 or M4A
  4. Upload to BrassTranscripts

Legal note: Only download content you have rights to transcribe (your own videos, educational use with permission, public domain content).
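For steps 1 and 3, a yt-dlp invocation might look like the sketch below. It only builds the argument list; it assumes yt-dlp and FFmpeg are both installed (yt-dlp relies on FFmpeg for audio extraction), and the URL is a placeholder rather than a real video.

```python
def ytdlp_audio_cmd(url: str, audio_format: str = "mp3") -> list[str]:
    """Build a yt-dlp command that downloads a video and keeps only the audio."""
    return [
        "yt-dlp",
        "-x",                            # extract audio (--extract-audio)
        "--audio-format", audio_format,  # convert the result, e.g. mp3 or m4a
        url,
    ]


# Placeholder URL; substitute a video you have the rights to transcribe.
cmd = ytdlp_audio_cmd("https://www.youtube.com/watch?v=VIDEO_ID", "m4a")
```

Running the command with `subprocess.run(cmd, check=True)` should leave an M4A file in the current directory, ready to upload.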


What if my format isn't supported?

Solution: Convert to MP3 or M4A (universally compatible)

Unsupported formats you might encounter:

  • WMA (Windows Media Audio) → Convert to MP3
  • RA (RealAudio) → Convert to MP3
  • AMR (Adaptive Multi-Rate, old phone recordings) → Convert to M4A
  • AIFF (Apple Interchange File Format) → Convert to M4A or FLAC

Use conversion tools mentioned earlier (Audacity, VLC, FFmpeg).
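If you handle many files, the suggestions above can be captured in a small lookup table. This is a minimal sketch: the mapping mirrors the list in this section, the names are illustrative, and any extension not listed falls back to the general advice of MP3 or M4A.

```python
from pathlib import Path

# Unsupported extension -> suggested conversion targets, per the list above.
CONVERSION_TARGETS = {
    ".wma": ["mp3"],
    ".ra": ["mp3"],
    ".amr": ["m4a"],
    ".aiff": ["m4a", "flac"],
}


def suggested_targets(filename: str) -> list[str]:
    """Suggest target formats for a file that needs converting."""
    ext = Path(filename).suffix.lower()
    # Fallback for formats not listed here: MP3 or M4A.
    return CONVERSION_TARGETS.get(ext, ["mp3", "m4a"])
```

Note that the helper does not check whether a file is already in a supported format, so it is only meant for files you already know need converting.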


Does audio quality affect format choice?

For transcription accuracy: Format matters less than recording quality.

What actually affects accuracy:

  1. Clear speech (no mumbling, good enunciation)
  2. Low background noise (quiet environment)
  3. Good microphone (reduces distortion)
  4. Adequate volume levels (not too quiet or clipping)

Format impacts:

  • Compressed formats (MP3, AAC, OGG): Minimal impact if bitrate is reasonable (128+ kbps)
  • Lossless formats (WAV, FLAC): Maximum fidelity but often overkill for speech
  • Very low bitrates (<64 kbps): Can damage transcription accuracy

Recommendation: Use MP3 or M4A at 128-192 kbps for speech. Higher quality formats don't significantly improve transcription but create larger files.


Can I transcribe files from cloud storage?

BrassTranscripts requires direct file upload. You cannot provide cloud storage links (Google Drive, Dropbox, OneDrive).

Workflow:

  1. Download file from cloud storage to your device
  2. Verify format is supported (see list above)
  3. Upload to BrassTranscripts

Why direct upload: Ensures security, privacy, and reliable file access during processing.


What happens to video content in MP4/MPEG files?

Video content is ignored. BrassTranscripts extracts and transcribes only the audio track.

This means:

  • Visual information is not captured (slides, screen shares, text overlays)
  • Speaker identification is based on voice, not visual appearance
  • Gestures, body language, and visual context are not transcribed

For videos with important visual content: Consider using screen reader software or manual notes to supplement the transcript with visual descriptions.


Are there any format-specific pricing differences?

No. All supported formats have the same pricing.

Pricing is based on audio duration, not file format or size:

  • 1-15 minutes: $2.25 flat rate
  • 16+ minutes: $0.15 per minute

Example:

  • 20-minute MP3 file: $3.00
  • 20-minute WAV file: $3.00
  • 20-minute MP4 video: $3.00

(All priced identically despite different file sizes)
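The two pricing tiers can be written as a short function. This is a sketch of the rule as stated above, not the billing code itself: it takes whole minutes, because the article does not say how partial minutes are rounded, and the name `price_usd` is illustrative.

```python
from decimal import Decimal

FLAT_RATE = Decimal("2.25")    # 1-15 minutes
PER_MINUTE = Decimal("0.15")   # 16+ minutes


def price_usd(minutes: int) -> Decimal:
    """Price for a recording of the given length in whole minutes."""
    if minutes < 1:
        raise ValueError("minutes must be at least 1")
    if minutes <= 15:
        return FLAT_RATE
    return PER_MINUTE * minutes


twenty_minute_price = price_usd(20)  # Decimal("3.00"), whatever the format
```

The function takes no format or file size argument, which is the point of this section: only duration affects the price. `Decimal` is used so that currency amounts are not subject to floating-point rounding.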


Summary: Choose the Right Format

Best all-around formats:

  1. MP3: Maximum compatibility, good quality, reasonable size
  2. M4A: Apple ecosystem, better quality than MP3 at same size
  3. MP4: Video recordings (audio extracted automatically)

For maximum quality:

  • WAV: Uncompressed, but watch 250MB limit
  • FLAC: Lossless compression, best balance of quality and size

For specific use cases:

  • Podcasts: MP3 or M4A
  • Interviews: MP3, M4A, or FLAC
  • Meetings: MP4 video recordings or M4A audio-only
  • Professional recordings: WAV or FLAC
  • VoIP/Voice chat: Opus or MP3

Format checklist:

  • ✅ Is your format in the supported list? (11 formats total)
  • ✅ Is your file under 250MB?
  • ✅ Is your recording between 5 minutes and 2 hours?
  • ✅ If not, can you convert or split the file?
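The checklist can also be run as a quick local check before uploading. The sketch below is hypothetical: it assumes you already know the file's size and duration (for example from your file manager or media player), treats 250MB as 250,000,000 bytes, and trusts the file extension.

```python
from pathlib import Path

SUPPORTED_EXTENSIONS = {".mp3", ".m4a", ".wav", ".aac", ".flac", ".ogg",
                        ".opus", ".webm", ".mpga", ".mp4", ".mpeg"}
MAX_BYTES = 250 * 1_000_000     # assumption: decimal megabytes
MIN_SECONDS = 5 * 60
MAX_SECONDS = 2 * 60 * 60


def upload_problems(filename: str, size_bytes: int,
                    duration_seconds: float) -> list[str]:
    """Return a list of checklist failures; an empty list means ready to upload."""
    problems = []
    if Path(filename).suffix.lower() not in SUPPORTED_EXTENSIONS:
        problems.append("unsupported format: convert to MP3 or M4A")
    if size_bytes > MAX_BYTES:
        problems.append("over 250MB: compress, extract audio, or split")
    if duration_seconds < MIN_SECONDS:
        problems.append("shorter than 5 minutes")
    elif duration_seconds > MAX_SECONDS:
        problems.append("longer than 2 hours: split into segments")
    return problems
```

For instance, `upload_problems("talk.mp3", 100_000_000, 3600)` returns an empty list, while a 3-hour, 300MB WMA file would fail all three checks.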

Ready to transcribe? Upload your audio or video file at BrassTranscripts.com and get your transcript in minutes.
