Open Source Licenses & Acknowledgments
Brass Scripts is built on the shoulders of open source giants. We gratefully acknowledge the following projects and their contributors.
AI Models & Core Technologies
OpenAI Whisper
Our transcription service is powered by OpenAI's Whisper, a general-purpose speech recognition model trained on a large, diverse audio dataset and capable of multilingual speech recognition, speech translation, and language identification.
- Project: OpenAI Whisper
- License: MIT License
- Usage: Base speech recognition model for audio transcription
WhisperX
We use WhisperX for enhanced transcription with precise word-level timestamps, combining a faster-whisper backend with wav2vec2-based forced alignment for improved performance.
- Project: WhisperX
- License: BSD-4-Clause License
- Usage: Enhanced speech recognition with word-level timestamps
- Authors: Max Bain and others
pyannote.audio
Speaker diarization (identifying different speakers) is handled by pyannote.audio, an open-source toolkit offering state-of-the-art pretrained models and pipelines based on PyTorch.
- Project: pyannote.audio
- License: MIT License
- Usage: Speaker diarization and identification
- Authors: Hervé Bredin and contributors
faster-whisper
For improved performance, we use faster-whisper, a reimplementation of OpenAI's Whisper model built on CTranslate2 that offers significantly faster transcription with lower memory usage.
- Project: faster-whisper
- License: MIT License
- Usage: Optimized Whisper inference engine
- Authors: SYSTRAN and contributors
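As an illustration of how the pieces above fit together, the following is a minimal, hypothetical sketch of a WhisperX-plus-pyannote pipeline. It is not Brass Scripts' actual code: the function name and parameters are illustrative, and the pretrained pipeline identifier (`pyannote/speaker-diarization-3.1`) is an assumption about which checkpoint is used.

```python
# Hypothetical sketch: transcription with word-level timestamps (WhisperX,
# backed by faster-whisper) followed by speaker diarization (pyannote.audio).
# Illustrative only; model choices and names are assumptions.
def transcribe_with_speakers(audio_path: str, hf_token: str, device: str = "cpu"):
    import whisperx                      # transcription + alignment
    from pyannote.audio import Pipeline  # speaker diarization

    # 1. Transcribe with WhisperX (faster-whisper under the hood).
    model = whisperx.load_model("base", device)
    audio = whisperx.load_audio(audio_path)
    result = model.transcribe(audio)

    # 2. Align words to precise timestamps with a wav2vec2 alignment model.
    align_model, metadata = whisperx.load_align_model(
        language_code=result["language"], device=device
    )
    result = whisperx.align(result["segments"], align_model, metadata, audio, device)

    # 3. Identify speakers with a pretrained pyannote.audio pipeline
    #    (checkpoint name assumed; requires a Hugging Face access token).
    diarizer = Pipeline.from_pretrained(
        "pyannote/speaker-diarization-3.1", use_auth_token=hf_token
    )
    diarization = diarizer(audio_path)
    return result, diarization
```

The heavy imports are deferred into the function body so the sketch can be read (and the function defined) without the model weights installed.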
Supporting Libraries
PyTorch
The machine learning computations are powered by PyTorch, an open-source machine learning library providing GPU-accelerated tensor computation and deep neural networks.
- Project: PyTorch
- License: BSD-3-Clause License
- Usage: Machine learning framework for model inference
Transformers (Hugging Face)
Model loading and inference utilities are provided by Hugging Face's Transformers library, offering state-of-the-art natural language processing models and tools.
- Project: Transformers
- License: Apache 2.0 License
- Usage: Model loading and preprocessing utilities
Web Platform Technologies
Next.js
Our web application is built with Next.js, a React framework for production-grade applications with features like server-side rendering, API routes, and optimized performance.
- Project: Next.js
- License: MIT License
- Usage: Web application framework
React
The user interface is built with React, a JavaScript library for building interactive user interfaces with component-based architecture.
- Project: React
- License: MIT License
- Usage: Frontend user interface library
Tailwind CSS
Styling and responsive design are implemented using Tailwind CSS, a utility-first CSS framework for rapidly building custom user interfaces.
- Project: Tailwind CSS
- License: MIT License
- Usage: CSS framework for styling and layout
Infrastructure & Services
Supabase
Database services and real-time functionality are provided by Supabase, an open-source Firebase alternative built on PostgreSQL.
- Project: Supabase
- License: Apache 2.0 License
- Usage: Database and backend services
Node.js
Server-side processing and API endpoints run on Node.js, a JavaScript runtime built on Chrome's V8 engine for building fast and scalable network applications.
- Project: Node.js
- License: MIT License
- Usage: Server-side JavaScript runtime
License Compliance
We are committed to respecting and complying with all open source licenses. All dependencies are used in accordance with their respective licenses, and we extend our gratitude to the maintainers and contributors of these projects.
License Summary
- MIT License: Whisper, WhisperX, faster-whisper, pyannote.audio, Next.js, React, Tailwind CSS, Node.js
- BSD Licenses: WhisperX (BSD-4-Clause), PyTorch (BSD-3-Clause)
- Apache 2.0 License: Transformers, Supabase
Attribution & Credits
Special recognition goes to the researchers, developers, and maintainers who have made these technologies available to the open source community:
- OpenAI team for developing and open-sourcing the Whisper speech recognition model
- Max Bain and collaborators for WhisperX enhancements and word-level timestamp alignment
- Hervé Bredin and the pyannote.audio team for state-of-the-art speaker diarization
- SYSTRAN for the optimized faster-whisper implementation
- Hugging Face for the Transformers library and model ecosystem
- Meta (Facebook) for the PyTorch machine learning framework
- Vercel for Next.js and React ecosystem contributions
- Supabase team for the open-source backend platform
- All open source contributors who have improved these projects
Research Papers & Citations
Our service is built on academic research. If you use our transcripts in academic work, please consider citing the underlying research:
Whisper
Radford, A., Kim, J. W., Xu, T., Brockman, G., McLeavey, C., & Sutskever, I. (2022). Robust Speech Recognition via Large-Scale Weak Supervision. arXiv preprint arXiv:2212.04356.
WhisperX
Bain, M., Huh, J., Han, T., & Zisserman, A. (2023). WhisperX: Time-Accurate Speech Transcription of Long-Form Audio. arXiv preprint arXiv:2303.00747.
pyannote.audio
Bredin, H., & Laurent, A. (2021). End-to-end speaker segmentation for overlap-aware resegmentation. Proc. Interspeech 2021, 3111-3115.
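For convenience, the citations above in BibTeX form (entry keys are illustrative; author names are expanded from the initials given above):

```bibtex
@article{radford2022whisper,
  title   = {Robust Speech Recognition via Large-Scale Weak Supervision},
  author  = {Radford, Alec and Kim, Jong Wook and Xu, Tao and Brockman, Greg
             and McLeavey, Christine and Sutskever, Ilya},
  journal = {arXiv preprint arXiv:2212.04356},
  year    = {2022}
}

@article{bain2023whisperx,
  title   = {WhisperX: Time-Accurate Speech Transcription of Long-Form Audio},
  author  = {Bain, Max and Huh, Jaesung and Han, Tengda and Zisserman, Andrew},
  journal = {arXiv preprint arXiv:2303.00747},
  year    = {2023}
}

@inproceedings{bredin2021segmentation,
  title     = {End-to-end speaker segmentation for overlap-aware resegmentation},
  author    = {Bredin, Herv{\'e} and Laurent, Antoine},
  booktitle = {Proc. Interspeech 2021},
  pages     = {3111--3115},
  year      = {2021}
}
```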