OpenAI launched Whisper, an open-source speech recognition model, in September 2022.

Soon after, OpenAI introduced the Whisper API. Because Whisper is open source, many developers host it on their own servers and offer their own APIs.

In this post, I list websites that offer a Whisper API. Most of them are more affordable than OpenAI and allow larger files; OpenAI limits uploads to 25 MB.

Whisper API vs Host Locally

Here is a comparison of hosting Whisper yourself versus using a Whisper API, to help you decide which is better for you.

	Host Whisper	Whisper API
Pricing	Low	High
Implementation	Difficult	Easy
Setup Cost	$1000-5000	No cost

Whisper needs powerful computers and skilled developers, which is why it is a good idea to start with a Whisper API.

Most developers prefer Whisper Large v2. If you want to use any other model, it is a good idea to host locally.
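To put the setup cost from the comparison table in perspective, here is a rough break-even sketch. It assumes the only self-hosting expense is the one-time setup cost (it ignores electricity, maintenance, and developer time), and the API rate is just an example input: the $0.0048/min Deepgram Large V2 rate quoted later in this post.

```python
def break_even_minutes(setup_cost_usd: float, api_price_per_min_usd: float) -> float:
    """Minutes of audio at which API spend equals the one-time setup cost."""
    if api_price_per_min_usd <= 0:
        raise ValueError("API price per minute must be positive")
    return setup_cost_usd / api_price_per_min_usd


# Setup cost range from the comparison table; the API rate is an example input.
API_RATE = 0.0048  # USD per minute (assumed for illustration)

low = break_even_minutes(1000, API_RATE)
high = break_even_minutes(5000, API_RATE)
print(f"Break-even: {low / 60:,.0f} to {high / 60:,.0f} hours of audio")
```

Under these simplified assumptions, self-hosting only pays for itself after thousands of hours of audio, which supports starting with an API.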

Top Whisper APIs

Here are the websites that offer a Whisper API, besides the official OpenAI Whisper API.

Deepgram

Deepgram has built its own speech-to-text model, Nova 2. Alongside Nova 2, it also offers a Whisper API.

It is more affordable than OpenAI and offers more features. Here are the advantages of the Deepgram Whisper API:

Better performance
Bigger File Size Option (Up to 2 GB)
3X Faster

You can use the Deepgram Whisper API by making a simple curl request. Deepgram provides all model sizes (Large, Medium, Small, Base, and Tiny).

Deepgram also provides $200 in free credits to test its API. Here is the pricing of Deepgram Whisper Cloud.
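As a rough Python equivalent of that curl request, the sketch below builds (but does not send) a transcription request. The endpoint URL, the `Token` authorization scheme, and the `whisper-<size>` model names are assumptions based on my reading of Deepgram's documentation, so check them against the current docs. `build_whisper_request` is a hypothetical helper, not part of any SDK.

```python
import urllib.parse
import urllib.request

# Assumed endpoint; confirm against Deepgram's API reference.
DEEPGRAM_URL = "https://api.deepgram.com/v1/listen"
ALLOWED_SIZES = {"tiny", "base", "small", "medium", "large"}


def build_whisper_request(api_key: str, audio: bytes, size: str = "large",
                          content_type: str = "audio/mpeg") -> urllib.request.Request:
    """Build (but do not send) a POST request for the chosen Whisper size."""
    if size not in ALLOWED_SIZES:
        raise ValueError(f"size must be one of {sorted(ALLOWED_SIZES)}")
    # Assumed model naming scheme: whisper-tiny ... whisper-large.
    query = urllib.parse.urlencode({"model": f"whisper-{size}"})
    return urllib.request.Request(
        f"{DEEPGRAM_URL}?{query}",
        data=audio,
        method="POST",
        headers={
            "Authorization": f"Token {api_key}",
            "Content-Type": content_type,
        },
    )


req = build_whisper_request("YOUR_API_KEY", b"fake-audio-bytes", size="medium")
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req).read()
```

Keeping request construction separate from sending makes it easy to inspect the URL and headers before spending any credits.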

Model	Pay as you go	Growth ($4000/year required)
Tiny	$0.0033/min	$0.0027/min
Base	$0.0035/min	$0.0028/min
Small	$0.0038/min	$0.0032/min
Medium	$0.0042/min	$0.0035/min
Large V2	$0.0048/min	$0.0048/min
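To see what these per-minute rates mean for a real workload, here is a small cost estimator. The rates are copied from the table above; the dictionary keys and the `transcription_cost` helper are names made up for this illustration, and the Growth plan's $4000/year commitment is not factored in.

```python
# Per-minute rates in USD, copied from the pricing table above.
RATES = {
    "tiny": {"payg": 0.0033, "growth": 0.0027},
    "base": {"payg": 0.0035, "growth": 0.0028},
    "small": {"payg": 0.0038, "growth": 0.0032},
    "medium": {"payg": 0.0042, "growth": 0.0035},
    "large-v2": {"payg": 0.0048, "growth": 0.0048},
}


def transcription_cost(model: str, minutes: float, plan: str = "payg") -> float:
    """Estimated cost in USD for transcribing `minutes` of audio."""
    return RATES[model][plan] * minutes


hours = 100  # example workload
for model in RATES:
    payg = transcription_cost(model, hours * 60, "payg")
    growth = transcription_cost(model, hours * 60, "growth")
    print(f"{model:9s} pay-as-you-go ${payg:6.2f}   growth ${growth:6.2f}")
```

For example, 100 hours of audio on Large V2 works out to 6,000 minutes at $0.0048, or about $28.80 on either plan.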

Deepgram’s own speech recognition model is much faster than Whisper: it can transcribe an hour of audio in a few seconds. It is also more affordable than Whisper.

Deepgram also provides audio summarization and entity detection for its own speech-to-text API, Nova.

Replicate

Replicate is a GPU hosting service that provides GPU access on demand. This means you only pay for the GPU when you need it, without having to rent a whole server.

This lets you tune cost and speed to your needs.

Replicate also provides an API for Whisper. It bills for the GPU time you use, not per minute of audio transcribed.

On the backend, Replicate uses an Nvidia A40 GPU, which costs $0.000725 per second.

I noticed that performance is quite low compared to other solutions. However, you can save money and customize the API output to suit your needs.

Replicate also provides APIs for hundreds of other AI models, so it can help you build a complete project on a smaller budget.
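Because billing is per second of GPU time, the effective price per minute of audio depends on how fast the model runs. The sketch below converts the $0.000725/second figure into an hourly rate and a per-audio-minute estimate. The `speedup` value (seconds of audio processed per second of GPU time) is a hypothetical input, not a measured benchmark.

```python
A40_PRICE_PER_SECOND = 0.000725  # USD, the per-second figure quoted above


def gpu_cost(run_seconds: float) -> float:
    """Cost in USD for `run_seconds` of GPU time."""
    return run_seconds * A40_PRICE_PER_SECOND


def cost_per_audio_minute(speedup: float) -> float:
    """Estimated USD per minute of audio.

    `speedup` is seconds of audio processed per second of GPU time.
    It is an assumption you must measure for your own workload.
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return gpu_cost(60 / speedup)


print(f"Hourly GPU rate: ${gpu_cost(3600):.2f}")
for speedup in (5, 10, 20):  # hypothetical values
    print(f"{speedup:>2}x real time: ${cost_per_audio_minute(speedup):.5f} per audio minute")
```

Measuring the actual run time of a few of your own files is the only reliable way to pick the `speedup` value.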

Replicate keeps adding new AI models, some of which are customized versions by their users.

Azure

Azure also offers every API that you can get from OpenAI. Currently, Whisper is available in preview and the file limit is only 25 MB.

However, Microsoft has announced that it will soon increase the file limit to 2 GB and allow multiple concurrent connections.

Unlike other Azure APIs, you can’t access it directly. You have to request access to the OpenAI APIs. Once approved, you can use any OpenAI model: GPT-3.5, GPT-4, Whisper, etc.

Azure offers the API at the same price as OpenAI. Azure also provides startup credits of up to $150K for OpenAI models. It is a reliable option compared to the other speech-to-text providers.

DataCrunch

DataCrunch is another company that provides the Whisper API, but you have to request access. DataCrunch offers GPU services. They recently added Whisper, Llama2, and Stable-Diffusion-XL APIs.

DataCrunch will enable self-serve API access in the very near future.

On price, DataCrunch is the cheapest Whisper API listed here. It charges $0.0010 per minute of audio processed. Since DataCrunch is well known for GPUs, you can expect reliable performance.

Unlike OpenAI, the DataCrunch API is not open to everybody. You have to contact them, but don’t worry: there are no specific requirements.

DataCrunch has data centers located in Finland and Iceland to provide low-latency Whisper API access globally. Their GPU servers are powered by Nvidia technology to enable fast transcription speeds.

When you request access to the Whisper API, DataCrunch will have you fill out a form describing your intended use case and expected usage volumes. They offer customized pricing plans beyond the $0.0010/minute base rate for high-volume customers.

The Whisper API is available through a simple REST interface that is easy to connect to from any programming language.

DataCrunch handles setting up and maintaining the Whisper models behind the scenes so you don’t have to worry about the ML infrastructure. All you need is an API key to start sending requests.

Currently, their API accepts only audio files in mp3 format, and the maximum file size is 200 MB (roughly 2 hours of audio).
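Given those limits, it can save a failed upload to check a file before sending it. This is a minimal sketch: `check_upload` is a hypothetical helper, and treating 200 MB as 200 * 1024 * 1024 bytes is an assumption about how the limit is counted.

```python
from pathlib import Path

# Assumption: the 200 MB limit is counted as 200 * 1024 * 1024 bytes.
MAX_BYTES = 200 * 1024 * 1024


def check_upload(path: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    if Path(path).suffix.lower() != ".mp3":
        problems.append("file must be in mp3 format")
    if size_bytes > MAX_BYTES:
        size_mb = size_bytes / (1024 * 1024)
        problems.append(f"file is {size_mb:.0f} MB, above the 200 MB limit")
    return problems


print(check_upload("interview.mp3", 50 * 1024 * 1024))
print(check_upload("talk.wav", 250 * 1024 * 1024))
```

For a real file, the size can be read with `os.path.getsize(path)`. The check only looks at the extension, not the actual audio encoding.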

Whisper API

The Whisper API website offers its API at a low price: $0.0025/minute, with a minimum purchase of 10 hours. It also provides 30 minutes of transcription for free.
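The numbers above translate into a small minimum spend, as this sketch shows. It assumes the 10-hour minimum is billed at the same $0.0025/minute rate and that the 30 free minutes are deducted before any paid usage. Both are my reading of the pricing, not confirmed billing rules.

```python
PRICE_PER_MINUTE = 0.0025   # USD, from the pricing above
MIN_PURCHASE_HOURS = 10     # minimum purchase from the pricing above
FREE_MINUTES = 30           # free transcription allowance


def minimum_top_up() -> float:
    """Smallest purchase in USD, assuming the standard per-minute rate applies."""
    return MIN_PURCHASE_HOURS * 60 * PRICE_PER_MINUTE


def usage_cost(minutes: float) -> float:
    """Cost in USD, assuming free minutes are used up before billing starts."""
    billable = max(0.0, minutes - FREE_MINUTES)
    return billable * PRICE_PER_MINUTE


print(f"Minimum top-up: ${minimum_top_up():.2f}")
print(f"Cost of 20 minutes: ${usage_cost(20):.2f}")
print(f"Cost of 5 hours: ${usage_cost(300):.3f}")
```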

You have to top up your balance before you can use the API. You can call the API with a simple curl request.

The main limitation I found with this website is that pay-as-you-go billing is not available. You have to top up your account and keep track of your usage.

Whisper API provides a detailed overview of your usage.
