
Transcription Service FAQ – Audio to Text Questions Answered

Everything you need to know about our AI transcription service with automatic speaker identification and fast audio to text conversion.

How accurate is your audio to text transcription service?

Our AI transcription service achieves 95%+ accuracy for clear audio with minimal background noise. Accuracy depends on audio quality, speaker clarity, and background noise levels. We use WhisperX, one of the most advanced speech recognition models available for professional audio to text conversion.

What audio and video formats do you support?

We support all major audio and video formats including MP3, MP4, M4A, WAV, AAC, FLAC, OGG, Opus, WebM, MPEG, and MPGA. Maximum file size is 250MB, and recordings can be up to 2 hours long.
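The limits above can be checked locally before uploading. The following is a minimal sketch, not the service's own validation: the helper name `check_upload` is hypothetical, "250MB" is read as 250 × 1024 × 1024 bytes, and the format is judged by file extension only. All three are assumptions.

```python
from pathlib import Path

# Extensions derived from the supported-formats list above.
SUPPORTED_EXTENSIONS = {
    ".mp3", ".mp4", ".m4a", ".wav", ".aac", ".flac",
    ".ogg", ".opus", ".webm", ".mpeg", ".mpga",
}
MAX_BYTES = 250 * 1024 * 1024  # assumption: "250MB" interpreted as 250 MiB
MAX_SECONDS = 2 * 60 * 60      # 2-hour recording limit


def check_upload(filename: str, size_bytes: int, duration_seconds: float) -> list[str]:
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    if Path(filename).suffix.lower() not in SUPPORTED_EXTENSIONS:
        problems.append("unsupported format")
    if size_bytes > MAX_BYTES:
        problems.append("file larger than 250MB")
    if duration_seconds > MAX_SECONDS:
        problems.append("recording longer than 2 hours")
    return problems
```

An extension check cannot confirm what a file actually contains, so a file that passes here may still fail during processing.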

How fast is your audio transcription service?

Our fast transcription service typically takes 1-2 minutes per hour of audio. You'll see a preview with the first 30 words within minutes, allowing you to verify transcription quality before payment.

Does your transcription service include speaker identification?

Yes! Our transcription service includes automatic speaker diarization that detects and labels the different speakers in your audio. The system can identify multiple speakers and clearly separates their dialogue in the transcript.

What transcription formats do you provide?

You'll receive your transcript in multiple formats: TXT (plain text with speaker labels), SRT (subtitle format), VTT (web video format), and JSON (structured data with timestamps). All formats are included with every transcription.

How much does it cost?

We use simple tiered pricing: files of 1-15 minutes cost a flat rate of $2.25, and files of 16 minutes or longer cost $0.15 per minute, charged on the total duration. You'll see the exact cost once processing is complete, based on the actual audio duration detected.

Can you show me pricing examples?

Here are some examples of our tiered pricing structure:

Audio Duration         | Pricing Tier               | Total Cost
10 minutes             | Flat rate (1-15 min)       | $2.25
15 minutes             | Flat rate (1-15 min)       | $2.25
16 minutes             | Per minute (16 × $0.15)    | $2.40
30 minutes             | Per minute (30 × $0.15)    | $4.50
60 minutes (1 hour)    | Per minute (60 × $0.15)    | $9.00
120 minutes (2 hours)  | Per minute (120 × $0.15)   | $18.00

All prices include automatic speaker identification and multiple output formats (TXT, SRT, VTT, JSON).
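The pricing rule behind these examples can be written as a short function. This is a sketch under stated assumptions rather than the billing code itself: the name `transcription_cost` is hypothetical, and durations are taken as whole minutes because this FAQ does not say how partial minutes are rounded.

```python
from decimal import Decimal

FLAT_RATE = Decimal("2.25")    # files of 1-15 minutes
PER_MINUTE = Decimal("0.15")   # files of 16+ minutes, applied to the total duration
FLAT_RATE_MAX_MINUTES = 15


def transcription_cost(minutes: int) -> Decimal:
    """Return the price in dollars for a file of the given whole-minute duration."""
    if minutes < 1:
        raise ValueError("duration must be at least 1 minute")
    if minutes <= FLAT_RATE_MAX_MINUTES:
        return FLAT_RATE
    # Assumption: whole minutes only; rounding of partial minutes is not specified.
    return PER_MINUTE * minutes
```

For example, `transcription_cost(16)` returns `Decimal('2.40')`, matching the 16-minute example. `Decimal` is used instead of `float` so that currency amounts stay exact.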

Is there a satisfaction guarantee?

Absolutely! We offer a 100% satisfaction guarantee, even after delivery. If you're not completely satisfied with your transcription for any reason, we'll provide a full refund, no questions asked.

How secure is my audio file?

Your privacy is our top priority. All files are encrypted during upload and storage. We automatically delete audio files after 24 hours and transcripts after 48 hours. We don't store any personal information, and payment processing is handled securely by LemonSqueezy.

How long do you keep my files?

We use a two-tier deletion schedule for maximum privacy: Audio files (your uploaded recordings) are automatically deleted after 24 hours, while transcript files (the text output) remain available for 48 hours to give you time to download them. This ensures your sensitive audio data has minimal exposure while maintaining user convenience.
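The two-tier schedule can be expressed as two offsets from a single starting time. The sketch below is illustrative only: `deletion_times` is a hypothetical helper, and it assumes both retention clocks start at the same moment (for example, upload time), which this FAQ does not specify.

```python
from datetime import datetime, timedelta, timezone

AUDIO_RETENTION = timedelta(hours=24)       # uploaded recordings
TRANSCRIPT_RETENTION = timedelta(hours=48)  # generated transcript files


def deletion_times(start: datetime) -> dict[str, datetime]:
    """Return when the audio and the transcript would be deleted.

    Assumption: both retention periods are counted from `start`.
    """
    return {
        "audio": start + AUDIO_RETENTION,
        "transcript": start + TRANSCRIPT_RETENTION,
    }


# Example: a file uploaded now, using a timezone-aware UTC timestamp.
schedule = deletion_times(datetime.now(timezone.utc))
```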

Why do you keep audio files for 24 hours?

Audio files are kept briefly for technical reasons: processing retries if our system encounters issues, quality validation, and error recovery. Once transcription has completed successfully, the audio files are no longer needed and are deleted within 24 hours. This is half the 48-hour window for transcripts.

Do you store my personal information?

No, we don't collect or store personal information from you as a user. We don't use cookies for tracking. The only data we temporarily store is your uploaded audio file (24 hours) and generated transcript (48 hours), which are both automatically deleted.

Can I download my transcript multiple times?

Yes, you can download your transcript in all available formats as many times as needed during the 48-hour availability period. We recommend downloading your files promptly after completion.

What happens if processing fails?

If processing fails for any reason, you won't be charged. Our system will notify you of the issue, and you can try uploading again. Most failures are due to poor audio quality or unsupported file formats.

What languages can your transcription service transcribe?

Our AI-powered transcription service supports 99 languages with automatic language detection. The system uses the Whisper large-v3 model, which can identify and transcribe: English, Spanish, French, German, Italian, Portuguese, Chinese (Mandarin), Japanese, Korean, Russian, Arabic, Hindi, Dutch, Swedish, Danish, Norwegian, Finnish, Polish, Turkish, Greek, Hebrew, Thai, Vietnamese, Indonesian, Malay, Filipino, Ukrainian, Czech, Hungarian, Romanian, Bulgarian, Croatian, Slovak, Slovenian, Estonian, Latvian, Lithuanian, Maltese, Irish, Welsh, Scottish Gaelic, Basque, Catalan, Galician, Icelandic, Luxembourgish, Afrikaans, Albanian, Amharic, Armenian, Azerbaijani, Belarusian, Bengali, Bosnian, Burmese, Cantonese, Esperanto, Faroese, Georgian, Gujarati, Haitian Creole, Hausa, Hawaiian, Javanese, Kannada, Kazakh, Khmer, Lao, Latin, Macedonian, Malagasy, Malayalam, Marathi, Mongolian, Nepali, Occitan, Pashto, Persian, Punjabi, Sanskrit, Serbian, Shona, Sindhi, Sinhala, Somali, Sundanese, Swahili, Tajik, Tamil, Tatar, Telugu, Tibetan, Turkmen, Urdu, Uzbek, Yiddish, Yoruba, and more. Language detection is automatic; simply upload your audio file.

Can you transcribe multiple languages?

Yes, our multilingual transcription service handles 99+ languages. The AI identifies the primary language in your audio automatically and transcribes it, so you don't need to specify the language beforehand.

What about audio with heavy accents or background noise?

While our AI handles accents well, transcription accuracy may decrease with heavy accents or significant background noise. The 30-word preview lets you assess quality before paying, ensuring you only pay for transcriptions that meet your needs.

Do you offer bulk discounts?

Currently, we process files individually at our standard rate. For large-volume transcription needs, please contact our support team to discuss custom solutions.

Can I get timestamps for specific words?

Yes! The JSON format includes precise timestamps for individual words and segments, allowing you to sync the transcript with your audio perfectly. This is ideal for creating subtitles or detailed analysis.
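As an illustration of how word-level timestamps can drive subtitles, the sketch below builds one SRT cue from a list of timed words. The JSON schema is not documented in this FAQ, so the word structure used here (dictionaries with `word`, `start`, and `end` keys, times in seconds) is an assumption, and both function names are hypothetical.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_cue(index: int, words: list[dict]) -> str:
    """Build one SRT cue spanning the first word's start to the last word's end.

    Assumption: each word looks like {"word": str, "start": float, "end": float}.
    """
    start = words[0]["start"]
    end = words[-1]["end"]
    text = " ".join(w["word"] for w in words)
    return f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"


# Hypothetical sample data in the assumed shape.
sample_words = [
    {"word": "Hello", "start": 0.5, "end": 0.9},
    {"word": "world", "start": 1.0, "end": 1.25},
]
cue = words_to_cue(1, sample_words)
```

The field names should be adjusted to match the JSON file you actually receive.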

What if I need to transcribe longer files?

Our current limit is 2 hours per file. For longer recordings, we recommend splitting them into smaller segments. Each segment can be processed separately while maintaining the same high quality and speaker identification.
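To plan the split, the cut points for a long recording can be computed up front. This is a minimal sketch: `segment_bounds` is a hypothetical helper, and it uses fixed-length cuts at the 2-hour limit, so a boundary may land in the middle of a sentence.

```python
def segment_bounds(total_seconds: float, max_seconds: float = 7200.0) -> list[tuple[float, float]]:
    """Return (start, end) pairs in seconds, each no longer than max_seconds."""
    if total_seconds <= 0 or max_seconds <= 0:
        raise ValueError("durations must be positive")
    bounds = []
    start = 0.0
    while start < total_seconds:
        # The final segment is shortened to end exactly at the recording's end.
        end = min(start + max_seconds, total_seconds)
        bounds.append((start, end))
        start = end
    return bounds


# Example: a 5-hour recording (18000 seconds) yields three segments.
parts = segment_bounds(18000.0)
```

The resulting boundaries can then be passed to any audio editor or command-line tool to cut the file before uploading each piece.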

How can I improve my raw transcripts after downloading?

Our raw transcripts include speaker labels (Speaker A, Speaker B) and timestamps, but you can enhance them further using AI chat tools. This process can turn your transcript into a polished, professional document with real names, better formatting, and an organized structure.

→ Learn how to use AI chat tools to process your raw transcripts

This guide includes copy-paste AI prompts, speaker mapping instructions, and specialized techniques for different use cases like business meetings, interviews, and podcasts.
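The speaker-mapping step can also be done without an AI tool. The sketch below replaces generic labels with real names in a plain-text transcript. It assumes the labels appear literally as "Speaker A", "Speaker B", and so on. The helper name `rename_speakers` and the names in the example are placeholders.

```python
import re


def rename_speakers(transcript: str, names: dict[str, str]) -> str:
    """Replace generic speaker labels with the names given in `names`."""
    if not names:
        return transcript
    # Longest labels first, so a longer label is never shadowed by a shorter one.
    labels = sorted(names, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(label) for label in labels) + r")\b")
    return pattern.sub(lambda match: names[match.group(1)], transcript)


# Hypothetical transcript excerpt and name mapping.
raw = "Speaker A: Hi\nSpeaker B: Hello"
named = rename_speakers(raw, {"Speaker A": "Dana", "Speaker B": "Lee"})
```

The word-boundary match keeps a label such as "Speaker A" from being replaced inside a longer token, and every replacement happens in a single pass over the text.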

Need help with transcription jobs or audio to text questions?

Our support team is here to help with any questions about our AI transcription service, speaker identification features, or transcription jobs.
