13 min read · BrassTranscripts Team

Transcribe My Audio: Upload & Convert Audio to Text in Minutes

Transcribe your audio file to text with AI transcription that takes 1-3 minutes per hour of audio. Upload a file in any supported format (MP3, M4A, WAV, AAC, FLAC, OGG, Opus, WebM, MPGA, or the audio track of an MP4 or MPEG video) and receive a transcript with automatic speaker identification. No subscription is required: you pay only for what you use, starting at $2.25 for audio up to 15 minutes.

This guide shows you exactly how to transcribe your audio file to text, what formats are supported, pricing details, and how to get the best transcription results for your recording. For a quick overview of our service features, visit our audio transcription service page.

How to Transcribe My Audio File (Step-by-Step)

Converting your audio file to text takes 5 simple steps with BrassTranscripts:

Step 1: Upload Your Audio File

Visit BrassTranscripts.com and click the upload area or drag your audio file directly into the browser. The system accepts 11 audio formats and processes files up to 250MB and 2 hours in duration.

Supported formats: MP3, M4A, WAV, AAC, FLAC, OGG, Opus, WebM, MPGA, MP4 (audio), MPEG (audio)

Step 2: AI Processing (1-3 Minutes Per Hour)

The WhisperX large-v3 model transcribes your audio, and Pyannote 3.1 speaker diarization identifies the speakers. Processing averages 1-3 minutes per hour of audio, so a 60-minute file typically completes in 1-3 minutes.

What happens during processing:

  • Speech-to-text conversion with 99+ language support
  • Automatic language detection
  • Speaker identification and labeling (Speaker A, Speaker B, etc.)
  • Timestamp generation for each segment

Step 3: Preview First 30 Words Free

Before paying, view the first 30 words of your transcript to verify accuracy and speaker separation. This preview lets you confirm the transcription quality matches your needs.

Preview shows:

  • Transcription accuracy for your audio quality
  • Speaker identification (if multiple speakers detected)
  • Formatting and structure

Step 4: Pay Only for What You Use

BrassTranscripts uses simple pay-per-use pricing with no subscription:

  • $2.25 flat rate for audio 1-15 minutes
  • $0.15 per minute for audio 16+ minutes

Payment examples:

  • 10-minute file: $2.25
  • 30-minute file: $4.50 (2.25 + 15 × 0.15)
  • 60-minute file: $9.00 (2.25 + 45 × 0.15)
  • 90-minute file: $13.50 (2.25 + 75 × 0.15)
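
The payment examples above all follow one rule, which can be sketched in a few lines of Python. The function name `transcript_price` is illustrative and not part of any BrassTranscripts API, and the sketch assumes durations are billed in whole minutes, which this guide does not state.

```python
from decimal import Decimal

FLAT_RATE = Decimal("2.25")    # covers the first 15 minutes
PER_MINUTE = Decimal("0.15")   # each minute beyond the first 15
INCLUDED_MINUTES = 15


def transcript_price(minutes: int) -> Decimal:
    """Return the price in USD for audio of the given length in whole minutes."""
    if minutes < 1:
        raise ValueError("duration must be at least 1 minute")
    extra_minutes = max(0, minutes - INCLUDED_MINUTES)
    return FLAT_RATE + PER_MINUTE * extra_minutes


for m in (10, 30, 60, 90):
    print(f"{m}-minute file: ${transcript_price(m)}")
```

Using `Decimal` rather than floats keeps the cents exact, which matters when the result is shown as a price.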

Step 5: Download in 4 Formats (All Included)

After payment, immediately download your transcript in all 4 formats:

  • TXT: Plain text for easy reading and editing
  • SRT: Subtitle format for video captioning
  • VTT: Web video text tracks for HTML5 video
  • JSON: Structured data with timestamps and speaker labels

All formats include speaker identification and timestamps. No additional charges for multiple formats—download all 4 with every transcript.

Transcribe My Audio Now →

What Audio Files Can I Transcribe?

BrassTranscripts accepts 11 audio formats covering virtually all common recording types.

Supported Audio Formats

Lossy compressed formats (most common):

  • MP3: Universal audio format, widely compatible
  • M4A: Apple/iTunes format, high quality
  • AAC: Advanced audio coding, streaming quality
  • OGG: Open-source compressed audio
  • Opus: Modern compressed format
  • MPGA: MPEG audio format

Lossless formats (highest quality):

  • WAV: Uncompressed professional recording standard
  • FLAC: Lossless compressed audio

Video formats (audio extraction):

  • WebM: Web video format
  • MP4: Video file (audio extracted)
  • MPEG: Video file (audio extracted)

File Limits

  • Maximum file size: 250MB
  • Maximum duration: 2 hours
  • Minimum duration: 5 minutes
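
If you prepare many files, a quick local check before uploading can save a failed attempt. The sketch below is only an illustration: `upload_problems` is a hypothetical helper, the mapping from format names to file extensions is an assumption, and "250MB" is assumed to mean 250 × 1024 × 1024 bytes. It checks the format and the two maximums only, and you supply the duration yourself.

```python
from pathlib import Path

# Assumed extensions for the 11 formats listed in this guide.
SUPPORTED_EXTENSIONS = {
    ".mp3", ".m4a", ".wav", ".aac", ".flac", ".ogg",
    ".opus", ".webm", ".mpga", ".mp4", ".mpeg",
}
MAX_BYTES = 250 * 1024 * 1024   # assumption: "250MB" read as 250 MiB
MAX_SECONDS = 2 * 60 * 60       # 2 hours


def upload_problems(filename: str, size_bytes: int, duration_seconds: float) -> list[str]:
    """Return a list of reasons the file may be rejected (empty if none found)."""
    problems = []
    extension = Path(filename).suffix.lower()
    if extension not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {extension or '(none)'}")
    if size_bytes > MAX_BYTES:
        problems.append("file is larger than 250MB")
    if duration_seconds > MAX_SECONDS:
        problems.append("audio is longer than 2 hours")
    return problems


print(upload_problems("interview.wma", 40_000_000, 1800))
```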

Tip: Compressed formats like MP3 or M4A work well and stay under the size limit. A 60-minute MP3 at 128 kbps is about 58MB.
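
The size of a constant-bitrate file follows directly from bitrate and duration, so you can estimate it before exporting. This is a rough sketch: `estimated_size_mb` is an illustrative name, it counts 1 MB as 1,000,000 bytes, and it ignores container overhead and variable-bitrate encoding.

```python
def estimated_size_mb(minutes: float, bitrate_kbps: int) -> float:
    """Approximate size in MB of constant-bitrate audio (1 MB = 1,000,000 bytes)."""
    total_bits = bitrate_kbps * 1000 * minutes * 60
    return total_bits / 8 / 1_000_000


print(round(estimated_size_mb(60, 128), 1))    # 57.6
print(round(estimated_size_mb(120, 192), 1))   # 172.8
```

By this estimate, even a full 2-hour recording exported as a 192 kbps MP3 stays under the 250MB limit.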

What If My Audio Format Isn't Supported?

Convert your audio file using free tools:

  • Windows: Use VLC Media Player to convert to MP3 or WAV
  • macOS: Use QuickTime or iTunes to export as M4A or MP3
  • Online: Use CloudConvert to convert any audio to MP3

Most audio editing software (Audacity, GarageBand, Adobe Audition) exports to supported formats.

Transcribe My Audio: Features

Professional transcription features included with every transcript:

Automatic Speaker Identification

WhisperX with Pyannote 3.1 speaker diarization automatically detects and labels different speakers in your audio. The system analyzes voice characteristics to distinguish between speakers and assigns consistent labels throughout the transcript.

Speaker identification works best with:

  • 2-6 speakers
  • Clear voice separation
  • Minimal overlapping speech
  • Distinct voice characteristics

Transcript format with speakers:

Speaker A: Let's discuss the quarterly results.

Speaker B: The revenue increased by 23% this quarter.

Speaker A: That's excellent news. What were the main drivers?
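
If you want to work with speaker turns in a script, lines in this shape are easy to split into speaker and text. The sketch below assumes the TXT output puts one turn per line as "Speaker X: text", exactly as in the sample above. The real file layout may differ, and `parse_turns` is a hypothetical helper.

```python
import re

# Matches lines such as "Speaker A: Let's discuss the quarterly results."
SPEAKER_LINE = re.compile(r"^(Speaker [A-Z]+):\s*(.+)$")


def parse_turns(transcript: str) -> list[tuple[str, str]]:
    """Return (speaker, text) pairs for every line that carries a speaker label."""
    turns = []
    for line in transcript.splitlines():
        match = SPEAKER_LINE.match(line.strip())
        if match:
            turns.append((match.group(1), match.group(2)))
    return turns


sample = """Speaker A: Let's discuss the quarterly results.

Speaker B: The revenue increased by 23% this quarter.

Speaker A: That's excellent news. What were the main drivers?"""

turns = parse_turns(sample)
for speaker, text in turns:
    print(f"{speaker} said: {text}")
```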

Learn more about speaker identification technology.

99+ Languages with Auto-Detection

The WhisperX large-v3 model supports 99+ languages with automatic language detection. Upload audio in any supported language and the system detects the language and transcribes it.

Commonly transcribed languages:

  • English (US, UK, Australian, Canadian)
  • Spanish, French, German, Italian
  • Mandarin, Japanese, Korean
  • Portuguese, Russian, Arabic
  • Hindi, Bengali, and 80+ more

No need to specify language—automatic detection handles mixed-language content within the same audio file.

Multiple Output Formats Included

Every transcript includes all 4 formats at no additional cost:

TXT (Plain Text):

  • Easy to read and edit
  • Compatible with any text editor
  • Best for general use, analysis, archiving

SRT (SubRip Subtitle):

  • Standard subtitle format
  • Compatible with YouTube, Vimeo, video editors
  • Includes timestamps and speaker labels
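
For readers who have not looked inside an SRT file, each cue is a sequence number, a time range written as HH:MM:SS,mmm --> HH:MM:SS,mmm, and the caption text. The sketch below builds one cue. Prefixing the text with the speaker label is an assumption about how labels appear, and both function names are illustrative.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as the HH:MM:SS,mmm timestamp used by SRT."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def srt_cue(index: int, start: float, end: float, speaker: str, text: str) -> str:
    """Build one SRT cue: sequence number, time range, then the caption text."""
    return f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{speaker}: {text}\n"


print(srt_cue(1, 0.0, 3.5, "Speaker A", "Let's discuss the quarterly results."))
```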

VTT (WebVTT):

  • Web standard for HTML5 video
  • Advanced subtitle features
  • Browser-compatible captioning
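
VTT and SRT are close relatives: a WebVTT file starts with a WEBVTT header line and writes milliseconds after a period instead of a comma. BrassTranscripts already delivers both files, so the conversion below is only a sketch to show the difference. `srt_to_vtt` is an illustrative helper that handles plain cues, not styling or positioning.

```python
import re

# An SRT timestamp looks like 00:01:02,345; WebVTT writes it as 00:01:02.345.
SRT_TIME = re.compile(r"(\d{2}:\d{2}:\d{2}),(\d{3})")


def srt_to_vtt(srt_text: str) -> str:
    """Convert simple SRT text to WebVTT by adding the header and fixing timestamps."""
    lines = []
    for line in srt_text.strip().splitlines():
        if "-->" in line:                       # only touch the timing lines
            line = SRT_TIME.sub(r"\1.\2", line)
        lines.append(line)
    return "WEBVTT\n\n" + "\n".join(lines) + "\n"


srt = "1\n00:00:00,000 --> 00:00:03,500\nSpeaker A: Let's discuss the quarterly results.\n"
print(srt_to_vtt(srt))
```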

JSON (Structured Data):

  • Complete transcript data with metadata
  • Timestamps per word and segment
  • Speaker labels with timing
  • Ideal for custom processing or integration
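
As an example of custom processing, the sketch below totals speaking time per speaker. The JSON layout here is hypothetical: the `segments`, `start`, `end`, `speaker`, and `text` field names are assumptions made for illustration, so check them against a real export before relying on them.

```python
import json
from collections import defaultdict

# Hypothetical payload: the field names are assumptions, not the documented schema.
raw = """
{"segments": [
  {"start": 0.0, "end": 3.0, "speaker": "Speaker A", "text": "Let's discuss the quarterly results."},
  {"start": 3.5, "end": 7.0, "speaker": "Speaker B", "text": "The revenue increased by 23% this quarter."},
  {"start": 7.5, "end": 11.5, "speaker": "Speaker A", "text": "That's excellent news. What were the main drivers?"}
]}
"""


def speaking_time(transcript: dict) -> dict[str, float]:
    """Sum segment durations (in seconds) for each speaker label."""
    totals = defaultdict(float)
    for segment in transcript["segments"]:
        totals[segment["speaker"]] += segment["end"] - segment["start"]
    return {speaker: round(total, 2) for speaker, total in totals.items()}


totals = speaking_time(json.loads(raw))
print(totals)
```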

Fast Processing Speed

WhisperX processes audio at 20-60x realtime speed:

  • 10-minute audio: ~30 seconds processing
  • 30-minute audio: ~1 minute processing
  • 60-minute audio: ~1-3 minutes processing
  • 2-hour audio: ~3-6 minutes processing
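
The figures above fall out of the 20-60x realtime range: divide the audio length by the speed factor. A minimal sketch, with `processing_seconds` as an illustrative name, and with no allowance for upload time or queueing:

```python
def processing_seconds(audio_minutes: float) -> tuple[float, float]:
    """Return (fastest, slowest) processing time in seconds at 60x and 20x realtime."""
    audio_seconds = audio_minutes * 60
    return audio_seconds / 60, audio_seconds / 20


for minutes in (10, 30, 60, 120):
    fastest, slowest = processing_seconds(minutes)
    print(f"{minutes}-minute audio: {fastest:.0f}-{slowest:.0f} seconds")
```

For a 60-minute file this gives 60-180 seconds, which matches the 1-3 minutes quoted above.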

Processing starts as soon as the upload finishes, and most files are ready within a few minutes.

Privacy and Data Security

  • Audio retention: 24 hours after upload
  • Transcript retention: 48 hours after purchase
  • Automatic deletion: Files removed from servers after retention period
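
Because files are deleted automatically, it helps to know your download deadline. The sketch below computes both deletion times from the retention windows above. The timestamps are made-up examples, `deletion_times` is an illustrative helper, and exact server-side timing may differ.

```python
from datetime import datetime, timedelta, timezone

AUDIO_RETENTION = timedelta(hours=24)        # counted from upload
TRANSCRIPT_RETENTION = timedelta(hours=48)   # counted from purchase


def deletion_times(uploaded_at: datetime, purchased_at: datetime) -> tuple[datetime, datetime]:
    """Return (audio deletion time, transcript deletion time)."""
    return uploaded_at + AUDIO_RETENTION, purchased_at + TRANSCRIPT_RETENTION


# Example timestamps only.
uploaded = datetime(2025, 1, 6, 9, 0, tzinfo=timezone.utc)
purchased = datetime(2025, 1, 6, 9, 10, tzinfo=timezone.utc)

audio_gone, transcript_gone = deletion_times(uploaded, purchased)
print("Audio deleted after:", audio_gone.isoformat())
print("Download transcript before:", transcript_gone.isoformat())
```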

Your audio and transcripts are not used for training AI models or shared with third parties.

Transcribe My Audio: Pricing

Simple pay-per-use pricing with no subscription fees or monthly commitments.

Pricing Structure

Audio Duration | Price | Per-Minute Cost
1-15 minutes | $2.25 flat | $0.15/min at 15 minutes, higher for shorter files
16-30 minutes | $2.40-$4.50 | $0.15/min
31-60 minutes | $4.65-$9.00 | $0.15/min
61-90 minutes | $9.15-$13.50 | $0.15/min
91-120 minutes | $13.65-$18.00 | $0.15/min

Formula: $2.25 for first 15 minutes + $0.15 per additional minute

Price Comparison

How BrassTranscripts compares to alternatives:

Service | 30-Minute Audio | 60-Minute Audio | Model
BrassTranscripts | $4.50 | $9.00 | Pay-per-use
Rev.com | $45.00 | $90.00 | $1.50/minute
Trint | $60/month | $60/month | Subscription
Otter.ai Pro | $17/month | $17/month | Subscription + limits
Sonix | $10/hour | $10/hour | Subscription + per-hour

Savings over manual services: 90% (Rev charges $1.50/minute, BrassTranscripts $0.15/minute)

Advantage of no subscription: Whether you transcribe 2 files per year or 20 files per month, you pay the same per-minute rate.

What's Included in Price

  • Automatic speaker identification (Pyannote 3.1)
  • 99+ languages with auto-detection
  • All 4 formats (TXT, SRT, VTT, JSON)
  • Processing in 1-3 minutes per hour
  • 30-word preview before payment
  • 100% money-back guarantee

No hidden fees. No per-speaker charges. No format conversion fees.

See Pricing Examples →

Use Cases for Audio Transcription

Common scenarios where transcribing audio files to text helps productivity, accessibility, and content creation.

Transcribe Meeting Recordings

Convert team meetings, client calls, and conference sessions to searchable text. Meeting transcripts enable:

  • Reference specific decisions without re-listening
  • Share key points with absent team members
  • Create action item lists from discussions
  • Document project decisions and reasoning

Learn more about meeting transcription workflows.

Transcribe Interview Audio

Research interviews, journalism interviews, and stakeholder interviews benefit from accurate transcripts:

  • Qualitative research analysis and coding
  • Quote extraction for articles
  • Evidence documentation
  • Pattern identification across interviews

See our complete interview transcription guide.

Transcribe Podcast Episodes

Podcast creators use transcripts for:

  • SEO-optimized show notes
  • Blog post creation from episodes
  • Social media quote extraction
  • Accessibility for deaf/hard-of-hearing audiences

Read our podcast transcription workflow.

Transcribe Lecture Recordings

Students and educators transcribe lectures for:

  • Study guides and review materials
  • Accessibility accommodations
  • Note-taking support
  • Course material documentation

See lecture transcription best practices.

Transcribe Video Content

Video creators transcribe for:

  • YouTube captions and subtitles
  • Video SEO through searchable text
  • Content repurposing (blog posts, social media)
  • Accessibility compliance (ADA/WCAG)

Learn about video transcription.

Transcribe Research Audio

Academic and market researchers transcribe:

  • Focus group discussions
  • User research sessions
  • Ethnographic interviews
  • Field recordings

Transcribe Phone Calls

Business professionals transcribe:

  • Client consultations
  • Sales calls
  • Customer support calls
  • Phone interviews

Legal note: Verify recording consent laws in your jurisdiction before recording phone calls. Most US states require only one party's consent, but some require consent from all parties.

How to Get Better Transcription Results

Audio quality directly affects transcription accuracy. Follow these practices for optimal results.

Recording Environment

Choose quiet locations:

  • Private office or conference room
  • Library study room or quiet workspace
  • Avoid: Coffee shops, outdoor locations, traffic areas

Minimize background noise:

  • Turn off HVAC systems, fans, appliances
  • Close windows to block street noise
  • Silence phone notifications
  • Put computers in sleep mode (fan noise)

Recording Equipment

Use quality microphones:

  • Dedicated USB microphone: Audio-Technica ATR2100x ($80), Blue Yeti ($100)
  • Smartphone with good recording app: Voice Memos (iOS), Voice Recorder (Android)
  • Avoid: Laptop built-in microphones (highly variable quality)

Microphone positioning:

  • 6-8 inches from speaker's mouth
  • Point microphone directly at speaker
  • Use pop filter to reduce plosives (P, B, T sounds)

Recording Settings

Optimal audio settings:

  • Sample rate: 44.1 kHz or 48 kHz
  • Bit depth: 16-bit minimum
  • Format: WAV (uncompressed) or MP3 (192+ kbps)
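
To confirm that a WAV recording meets these settings before uploading, you can read its header with Python's standard `wave` module. This sketch handles only uncompressed PCM WAV files (the module raises an error for other encodings). The helper names and the dictionary keys are illustrative.

```python
import wave


def wav_settings(path: str) -> dict:
    """Read sample rate, bit depth, channel count, and duration from a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        frame_count = wav.getnframes()
        sample_rate = wav.getframerate()
        return {
            "sample_rate_hz": sample_rate,
            "bit_depth": wav.getsampwidth() * 8,   # sample width is reported in bytes
            "channels": wav.getnchannels(),
            "duration_seconds": frame_count / sample_rate,
        }


def meets_recommendation(settings: dict) -> bool:
    """True for 44.1 kHz or 48 kHz audio with at least 16 bits per sample."""
    return settings["sample_rate_hz"] in (44_100, 48_000) and settings["bit_depth"] >= 16
```

Calling `meets_recommendation(wav_settings("recording.wav"))` on your own file returns `True` when the settings above are met.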

Multi-speaker recordings:

  • Individual microphones per speaker (ideal)
  • Place single microphone equidistant from all speakers
  • Encourage turn-taking (minimal interruptions)

Audio Post-Processing

If your audio has quality issues, apply basic processing before transcription:

Noise reduction:

  • Use Audacity's Noise Reduction effect
  • Apply gentle reduction (50-70%) to avoid distortion

Normalization:

  • Normalize audio to -3dB to -1dB peak level
  • Ensures consistent volume throughout
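
Peak normalization is simple arithmetic: find the loudest sample, then scale everything so that sample lands on the target level. The sketch below works on a plain list of floating-point samples in the range -1.0 to 1.0. It illustrates the idea only and is not how Audacity or any particular editor implements it.

```python
def peak_normalize(samples: list[float], target_dbfs: float = -1.0) -> list[float]:
    """Scale samples (floats in -1.0..1.0) so the loudest one sits at target_dbfs."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)                     # silence: nothing to scale
    target_amplitude = 10 ** (target_dbfs / 20)  # convert dB to linear amplitude
    gain = target_amplitude / peak
    return [s * gain for s in samples]


quiet = [0.0, 0.1, -0.25, 0.2]
louder = peak_normalize(quiet, target_dbfs=-3.0)
print(round(max(abs(s) for s in louder), 3))     # 0.708
```

A single gain is applied to the whole clip, so the relative loudness of speakers is unchanged. Evening out volume between speakers needs compression or per-section gain instead.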

EQ adjustment:

  • Boost midrange frequencies (1-4 kHz) for voice clarity
  • Reduce low frequencies (<80 Hz) to minimize rumble

See our complete audio quality guide.

What to Avoid

Don't transcribe:

  • Audio with loud music overlaying speech
  • Heavily compressed or distorted recordings
  • Audio with constant background noise louder than speech
  • Recordings where speakers are barely audible

Better approach: Re-record if possible, or use human transcription services for extremely poor audio quality.

Frequently Asked Questions

How long does it take to transcribe my audio?

Processing takes 1-3 minutes per hour of audio. A 60-minute audio file typically completes in 1-3 minutes, a 30-minute file in 30-60 seconds. After processing completes, transcripts are available for immediate download in all 4 formats.

What audio formats can I transcribe?

BrassTranscripts accepts 11 audio formats: MP3, M4A, WAV, AAC, FLAC, OGG, Opus, WebM, MPGA, and audio from MP4/MPEG video files. Maximum file size is 250MB, maximum duration is 2 hours, minimum duration is 5 minutes.

How much does it cost to transcribe my audio?

Pricing is $2.25 for audio 1-15 minutes, then $0.15 per minute for audio 16+ minutes. Examples: 30-minute file costs $4.50, 60-minute file costs $9.00. No subscription required—pay only for transcripts you purchase.

Can I transcribe audio for free?

BrassTranscripts offers a free 30-word preview of every transcript before payment. This preview shows transcription accuracy and speaker identification quality. Full transcripts require payment, but the preview feature lets you verify quality before purchasing.

Do I need to create an account?

No account is required. You can upload audio, preview the first 30 words, and purchase the transcript without creating one. Creating an account is optional and gives easier access to your transcript history.

How accurate is the transcription?

Transcription accuracy depends primarily on audio quality. Clear audio with minimal background noise produces professional-grade transcripts suitable for most business and academic uses. Poor audio quality, heavy accents, or excessive background noise may require manual correction. Preview 30 words free to verify accuracy before purchasing.

Does transcription include speaker identification?

Yes. Automatic speaker identification using Pyannote 3.1 labels different speakers throughout the transcript (Speaker A, Speaker B, etc.). Works best with 2-6 speakers with distinct voice characteristics and minimal overlapping speech.

What languages are supported?

WhisperX large-v3 supports 99+ languages with automatic language detection. Common languages include English, Spanish, French, German, Italian, Mandarin, Japanese, Korean, Portuguese, Russian, Arabic, Hindi, and 80+ more. No need to specify language—detection is automatic.

What transcript formats do I receive?

Every transcript includes all 4 formats: TXT (plain text), SRT (subtitles), VTT (web captions), JSON (structured data with timestamps and speaker labels). All formats included in price—no additional fees.

Is my audio secure and private?

Audio files are stored for 24 hours after upload, transcripts for 48 hours after purchase, then automatically deleted from servers. Audio and transcripts are not used for AI model training or shared with third parties.

What if the transcription has errors?

BrassTranscripts offers a 100% money-back satisfaction guarantee. If transcription quality doesn't meet your needs, contact support@brasstranscripts.com for a full refund. The free 30-word preview helps verify quality before purchasing.

Can I transcribe audio with multiple speakers?

Yes. Automatic speaker identification detects and labels different speakers throughout the transcript. Works best with 2-6 speakers with distinct voices. Each speaker receives a consistent label (Speaker A, Speaker B) throughout the transcript.

Get Started: Transcribe Your Audio File Now

Ready to convert your audio file to text? Upload any audio format and receive accurate transcripts with speaker identification in minutes.

Simple process:

  1. Upload audio (11 formats supported)
  2. Preview first 30 words free
  3. Pay $2.25 for 1-15 minutes ($0.15/min after)
  4. Download TXT, SRT, VTT, JSON formats

Features included:

  • Automatic speaker identification
  • 99+ languages with auto-detection
  • All 4 formats (no extra charge)
  • Processing in 1-3 minutes per hour
  • 100% money-back guarantee

Transcribe My Audio →

Before recording: Use our Audio Quality Pre-Recording Checklist to prevent quality issues. After transcription: Fix speaker labels with our Speaker Attribution Error Corrector or apply formatting with our Transcript Formatting & Style Standardizer.

Need help with audio quality? See our audio quality optimization guide for recording tips and best practices.

Have questions? Contact support@brasstranscripts.com for assistance with transcription, technical issues, or pricing questions.
