Listen Monster

Best Whisper API Providers Better Than OpenAI

In September 2022, OpenAI released Whisper, an open-source speech recognition model.

Soon after, OpenAI introduced the Whisper API. Because Whisper is open source, many developers also host it on their own servers and offer it as an API.

In this post, I list websites that offer a Whisper API. Most of these APIs are more affordable than OpenAI's and allow larger files; OpenAI's API only accepts files of up to 25 MB.
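If your recordings may exceed OpenAI's 25 MB limit, a quick size check tells you whether a file fits, or roughly how many pieces you would need to split it into. This is a minimal sketch: reading "25 MB" as 25 × 1024 × 1024 bytes is an assumption, and the function names are hypothetical. Real splitting should happen on audio boundaries with an audio tool, not by cutting raw bytes.

```python
import math
import os

# Assumption: OpenAI's 25 MB upload limit is interpreted as 25 MiB here.
OPENAI_LIMIT_BYTES = 25 * 1024 * 1024


def chunks_needed(size_bytes: int, limit_bytes: int = OPENAI_LIMIT_BYTES) -> int:
    """Minimum number of pieces a file of `size_bytes` must be split into."""
    if size_bytes <= 0:
        return 0
    return math.ceil(size_bytes / limit_bytes)


def fits_openai_limit(path: str) -> bool:
    """True if the file at `path` is within the (assumed) 25 MB limit."""
    return os.path.getsize(path) <= OPENAI_LIMIT_BYTES


if __name__ == "__main__":
    # A hypothetical 60 MB recording needs at least three pieces.
    print(chunks_needed(60 * 1024 * 1024))  # 3
```

Providers with larger limits let you skip the splitting step entirely.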

Whisper API vs. Hosting Locally

Here is a comparison of self-hosting Whisper versus using a Whisper API, to help you decide which suits you better.

Factor           Host Whisper     Whisper API
Pricing          Low              High
Implementation   Difficult        Easy
Setup cost       $1,000-5,000     No cost

Self-hosting Whisper requires powerful computers and skilled developers, which is why it is a good idea to start with a Whisper API.
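To put the table's trade-off into numbers, you can estimate how many minutes of audio you would need to transcribe before a self-hosted setup pays for itself. A minimal sketch: $0.006/min is OpenAI's published Whisper API price, while the $0.001/min self-hosting running cost in the example is a hypothetical figure to replace with your own hardware, power, and maintenance estimates.

```python
import math


def break_even_minutes(setup_cost: float, api_price_per_min: float,
                       self_host_cost_per_min: float) -> float:
    """Minutes of audio after which self-hosting becomes cheaper than an API.

    Returns math.inf if self-hosting never pays off, i.e. its per-minute
    running cost is not lower than the API price.
    """
    saving_per_min = api_price_per_min - self_host_cost_per_min
    if saving_per_min <= 0:
        return math.inf
    return setup_cost / saving_per_min


if __name__ == "__main__":
    # $1,000 setup (low end of the table), $0.006/min API price,
    # hypothetical $0.001/min self-hosting running cost.
    minutes = break_even_minutes(1000, 0.006, 0.001)
    print(f"{minutes:,.0f} minutes (~{minutes / 60:,.0f} hours)")
```

With these assumed numbers the break-even point is around 200,000 minutes of audio, which is why an API is usually the cheaper way to start.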

Most developers prefer Whisper Large v2. If you want to use a different model, it is a good idea to host it locally.

Top Whisper API Providers

Here are the providers, other than OpenAI itself, that offer a Whisper API.

Deepgram

Deepgram has built its own speech-to-text model, Nova 2. Alongside Nova 2, it also offers a Whisper API.

Deepgram Whisper Cloud API

It is more affordable than OpenAI's Whisper API and offers more features. Here are the advantages of Deepgram's Whisper API:

  1. Better performance
  2. Larger file size limit (up to 2 GB)
  3. 3x faster

You can use Deepgram's Whisper API by making a simple curl request. Deepgram provides all Whisper model sizes (Large, Medium, Small, Base, and Tiny).
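The same request can be built from Python with only the standard library. This is a sketch under assumptions: it targets Deepgram's pre-recorded `/v1/listen` endpoint with `Token` authorization, and the model names (`whisper-large`, `whisper-medium`, and so on) and the JSON path to the transcript reflect Deepgram's documentation at the time of writing, so check the current docs before relying on them. The helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

DEEPGRAM_URL = "https://api.deepgram.com/v1/listen"


def build_deepgram_request(audio: bytes, api_key: str,
                           model: str = "whisper-large",
                           content_type: str = "audio/mpeg") -> urllib.request.Request:
    """Prepare (but do not send) a pre-recorded transcription request.

    Assumption: Deepgram selects its hosted Whisper via the `model` query
    parameter, e.g. "whisper-tiny" ... "whisper-large".
    """
    query = urllib.parse.urlencode({"model": model})
    return urllib.request.Request(
        f"{DEEPGRAM_URL}?{query}",
        data=audio,  # raw audio bytes are sent as the request body
        headers={
            "Authorization": f"Token {api_key}",
            "Content-Type": content_type,
        },
        method="POST",
    )


def extract_transcript(response_body: bytes) -> str:
    """Pull the first transcript out of Deepgram's JSON response (assumed shape)."""
    payload = json.loads(response_body)
    return payload["results"]["channels"][0]["alternatives"][0]["transcript"]
```

To actually send it, open your audio file in binary mode, pass the result of `build_deepgram_request` to `urllib.request.urlopen`, and feed the response body to `extract_transcript`. Building the request separately from sending it keeps the code easy to test without a network connection.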

Deepgram also provides $200 in free credits to test its API. Here is the pricing of Deepgram Whisper Cloud:

Model       Pay as you go    Growth ($4,000/year required)
Tiny        $0.0033/min      $0.0027/min
Base        $0.0035/min      $0.0028/min
Small       $0.0038/min      $0.0032/min
Medium      $0.0042/min      $0.0035/min
Large V2    $0.0048/min      $0.0048/min
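To see what these rates mean for a real workload, you can multiply them out. A minimal sketch: the prices come straight from the table above (the lowercase keys are my own labels), the $0.006/min OpenAI figure is OpenAI's published Whisper API price, and the 100-hour volume is hypothetical. Remember that the Growth column assumes the $4,000/year commitment.

```python
# Per-minute prices from the Deepgram Whisper Cloud table above (USD).
DEEPGRAM_WHISPER_PRICES = {
    "tiny": {"payg": 0.0033, "growth": 0.0027},
    "base": {"payg": 0.0035, "growth": 0.0028},
    "small": {"payg": 0.0038, "growth": 0.0032},
    "medium": {"payg": 0.0042, "growth": 0.0035},
    "large-v2": {"payg": 0.0048, "growth": 0.0048},
}

# OpenAI's published Whisper API price, for comparison (USD per minute).
OPENAI_WHISPER_PRICE = 0.006


def deepgram_cost(minutes: float, model: str, plan: str = "payg") -> float:
    """USD cost of transcribing `minutes` of audio with a Deepgram Whisper model."""
    return minutes * DEEPGRAM_WHISPER_PRICES[model][plan]


if __name__ == "__main__":
    hours = 100  # hypothetical monthly volume
    for model in DEEPGRAM_WHISPER_PRICES:
        print(f"{model:>8}: ${deepgram_cost(hours * 60, model):.2f} for {hours} h")
    print(f"  openai: ${hours * 60 * OPENAI_WHISPER_PRICE:.2f} for {hours} h")
```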

Deepgram's own speech recognition model is much faster than Whisper: you can transcribe an hour of audio in a few seconds. It is also more affordable than Whisper.

For its own speech-to-text API, Nova, Deepgram also provides audio summarization and entity detection.

Replicate

Replicate is a GPU hosting service that provides GPU access on demand. This means you only pay for the GPU when you need it, without having to rent a whole server.

This lets you balance cost and speed according to your needs.

Replicate also provides a Whisper API. It charges for the GPU time used, not for the minutes of audio transcribed.

Behind the scenes, Replicate runs Whisper on an Nvidia A40 GPU, which costs $0.000725 per second.
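Because billing is based on GPU time rather than audio length, your effective price per audio minute depends on how quickly the model runs. A minimal sketch using the A40 rate quoted above; the 300 GPU-seconds per hour of audio in the example is a hypothetical speed, not a measurement.

```python
A40_PRICE_PER_SECOND = 0.000725  # USD per GPU-second, Replicate's A40 rate quoted above


def replicate_cost(gpu_seconds: float,
                   price_per_second: float = A40_PRICE_PER_SECOND) -> float:
    """USD cost of `gpu_seconds` of GPU time."""
    return gpu_seconds * price_per_second


def cost_per_audio_minute(audio_minutes: float, gpu_seconds: float) -> float:
    """Effective USD price per minute of audio for one transcription job."""
    return replicate_cost(gpu_seconds) / audio_minutes


if __name__ == "__main__":
    # Hypothetical job: 60 minutes of audio that takes 300 s of GPU time.
    print(f"${replicate_cost(300):.4f} total")
    print(f"${cost_per_audio_minute(60, 300):.5f} per audio minute")
```

At this rate a full hour of GPU time costs $2.61, so a slow run can easily end up more expensive per minute than a per-minute API.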

I noticed that its performance is quite low compared to the other solutions; however, you can save money and customize the API output according to your needs.

Replicate also provides APIs for hundreds of other AI models, which can help you build a complete project on a smaller budget.

Replicate keeps adding new AI models, some of which are customized versions published by its users.

Azure

Azure also offers the APIs you can get from OpenAI. Currently, Whisper is available in preview, and the file size limit is only 25 MB.

However, Microsoft has announced that it will soon increase the file size limit to 2 GB and allow multiple concurrent connections.

Azure Whisper API

Unlike other Azure APIs, you can't access it directly: you have to request access to the OpenAI APIs. Once approved, you can use any OpenAI model, such as GPT-3.5, GPT-4, or Whisper.
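Once access is approved and you have deployed Whisper in your Azure OpenAI resource, transcription is a multipart upload. This is a sketch under assumptions: the URL layout (`/openai/deployments/<deployment>/audio/transcriptions`), the `api-key` header, and the `2023-09-01-preview` API version follow Azure's documentation as I understand it, so verify them against the current docs. The resource and deployment names in any call are your own; the helper name is hypothetical.

```python
import urllib.parse
import urllib.request
import uuid

# Assumption: a preview API version that supported Whisper at the time of
# writing; check Azure's docs for the current value.
API_VERSION = "2023-09-01-preview"


def build_azure_whisper_request(resource: str, deployment: str, api_key: str,
                                filename: str, audio: bytes) -> urllib.request.Request:
    """Prepare (but do not send) a multipart transcription request."""
    url = (f"https://{resource}.openai.azure.com/openai/deployments/"
           f"{urllib.parse.quote(deployment)}/audio/transcriptions"
           f"?api-version={API_VERSION}")
    boundary = uuid.uuid4().hex
    # One multipart/form-data part named "file" that carries the audio bytes.
    body = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode() + audio + f"\r\n--{boundary}--\r\n".encode()
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "api-key": api_key,
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
        method="POST",
    )
```

Sending the request with `urllib.request.urlopen` should return JSON with the transcript in a `text` field, as with OpenAI's own API. Keep the 25 MB preview limit in mind when choosing files.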

Azure charges the same price as OpenAI for these APIs. Azure also offers startup credits of up to $150K for OpenAI models. Compared to all the other speech-to-text options, Azure is definitely a reliable choice.

DataCrunch

DataCrunch is another company that provides a Whisper API, but you have to request access. DataCrunch offers GPU services and recently added Whisper, Llama 2, and Stable Diffusion XL APIs.

DataCrunch plans to enable self-serve API access in the near future.

On price, DataCrunch is the cheapest Whisper API on this list: it charges $0.0010 per minute of audio processed. Since DataCrunch is well known for its GPU services, you can expect reliable performance.
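The gap between per-minute rates grows quickly with volume. A minimal sketch comparing monthly costs: the DataCrunch and Deepgram rates come from this post, the OpenAI rate is OpenAI's published $0.006/min, and the 500-hour monthly volume is hypothetical.

```python
# Per-minute prices in USD.
PRICES_PER_MIN = {
    "DataCrunch": 0.0010,
    "Deepgram Large V2 (pay as you go)": 0.0048,
    "OpenAI": 0.006,  # OpenAI's published Whisper API price
}


def monthly_cost(hours_per_month: float, price_per_min: float) -> float:
    """USD cost of transcribing `hours_per_month` hours of audio at a per-minute rate."""
    return hours_per_month * 60 * price_per_min


if __name__ == "__main__":
    hours = 500  # hypothetical monthly volume
    for name, price in sorted(PRICES_PER_MIN.items(), key=lambda kv: kv[1]):
        print(f"{name:<36} ${monthly_cost(hours, price):>8.2f}")
```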

Unlike OpenAI's API, the DataCrunch API is not open to everybody: you have to contact them first. Don't worry, there are no specific requirements.

DataCrunch has data centers in Finland and Iceland to provide low-latency Whisper API access globally. Its GPU servers use Nvidia hardware for fast transcription.

When you request access to the Whisper API, DataCrunch will ask you to fill out a form describing your intended use case and expected usage volume. For high-volume customers, they offer customized pricing plans beyond the $0.0010/minute base rate.

The Whisper API is available through a simple REST interface that is easy to connect to from any programming language.

DataCrunch handles setting up and maintaining the Whisper models behind the scenes, so you don't have to worry about the ML infrastructure. All you need is an API key to start sending requests.

Currently, their API accepts only audio files in MP3 format, and the maximum file size is 200 MB (roughly 2 hours of audio).
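Checking a file against these limits before uploading saves a rejected request. A minimal sketch: it only inspects the file extension (not the actual audio encoding), reading "200 MB" as 200 MiB is an assumption, and the function names are hypothetical.

```python
import os

MAX_BYTES = 200 * 1024 * 1024  # assumption: the 200 MB limit read as 200 MiB


def check_datacrunch_upload(filename: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    if not filename.lower().endswith(".mp3"):
        problems.append("only MP3 files are accepted")
    if size_bytes > MAX_BYTES:
        problems.append("file is larger than 200 MB")
    return problems


def check_file(path: str) -> list[str]:
    """Convenience wrapper that reads the size of a file on disk."""
    return check_datacrunch_upload(os.path.basename(path), os.path.getsize(path))
```

If a WAV or oversized file fails the check, convert or split it before sending.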

Whisper API

The Whisper API website offers its API at a pretty low price: $0.0025 per minute, with a minimum purchase of 10 hours. It also provides 30 minutes of transcription for free.

You have to top up your balance before using the API, and you can call it with a simple curl request.

The main limitation I found on this website is that pay-as-you-go billing is not available. You have to top up your account in advance and keep track of your usage.
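Since billing is prepaid, it helps to track your remaining minutes on your side too. A minimal sketch of a hypothetical local tracker: the $0.0025/min rate, the 10-hour minimum purchase, and the 30 free minutes come from this section, while treating the free minutes as an opening balance is an assumption.

```python
from dataclasses import dataclass

PRICE_PER_MIN = 0.0025      # USD per minute, as quoted above
MIN_PURCHASE_MIN = 10 * 60  # minimum purchase of 10 hours, in minutes


@dataclass
class PrepaidBalance:
    """Hypothetical local tracker for a prepaid transcription balance."""
    minutes_left: float = 30.0  # assumption: the 30 free minutes as a starting balance

    def top_up(self, minutes: float) -> float:
        """Add minutes to the balance and return the USD charged."""
        if minutes < MIN_PURCHASE_MIN:
            raise ValueError(f"minimum purchase is {MIN_PURCHASE_MIN} minutes")
        self.minutes_left += minutes
        return minutes * PRICE_PER_MIN

    def record_usage(self, minutes: float) -> None:
        """Deduct transcribed minutes, refusing to let the balance go negative."""
        if minutes > self.minutes_left:
            raise RuntimeError("insufficient balance: top up first")
        self.minutes_left -= minutes
```

The smallest allowed top-up (10 hours) costs $1.50 at this rate.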

On the plus side, Whisper API provides a detailed overview of your usage.

Final Words

Whisper is an excellent speech recognition model that is becoming more popular every day.

Along with its high accuracy, it is free and open source. Using a Whisper API will definitely save you initial budget and time.

In this post, I have covered the top Whisper API providers that offer more features than the official OpenAI API.

If you think I missed an important API provider, do let me know.
