I've tried out OpenAI's Whisper by building an interface for it in C# and WPF. As in the other articles I've written about OpenAI, the mechanism is the same as before, and the interface looks like all my other interfaces.
To make it clear: OpenAI's Whisper is an API for transcribing and translating audio files. It supports the following audio file types: mp3, mp4, mpeg, mpga, m4a, wav and webm. My guess is that smaller files give quicker responses. Logically I would think so, but I have not spent much time or resources finding out. What I do know is that the webm format is very small compared to the other file types. WAV files are big, and you can't fit much audio into one before the 25 MB upload limit is used up. I made a recording that lasted about one minute and it was 11 MB. I converted that file to webm and it became 570 KB. That is (by my calculation) about 5% of the WAV file size! (I just used the default values when creating the files.)
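To put those numbers in perspective, here is a small sketch of the arithmetic. It assumes that file size grows linearly with recording length and that my one-minute recording is representative, so treat the results as rough estimates. The class and method names are just made up for the example.

```csharp
using System;

static class SizeEstimate
{
    const double UploadLimitMb = 25.0; // the upload limit mentioned above

    // Minutes of audio that fit under the limit,
    // assuming size grows linearly with duration.
    static double MinutesThatFit(double mbPerMinute)
    {
        return UploadLimitMb / mbPerMinute;
    }

    static void Main()
    {
        double wavMb = 11.0;  // roughly one minute recorded as WAV
        double webmMb = 0.57; // same recording as webm (570 KB, counting 1 MB = 1000 KB)

        // Ratio of the two file sizes (roughly 5%).
        Console.WriteLine($"webm is {webmMb / wavMb:P1} of the WAV size");

        // How much audio each format could hold in a single upload.
        Console.WriteLine($"WAV:  about {MinutesThatFit(wavMb):F1} minutes per upload");
        Console.WriteLine($"webm: about {MinutesThatFit(webmMb):F1} minutes per upload");
    }
}
```

So with these sizes a WAV upload runs out after a couple of minutes, while a webm file should have room for well over half an hour.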
I started testing with mp3, since it was the most logical choice for me and I had never heard of webm. Even mp3 is a big improvement over WAV: the MP3 file was approximately 1/10 of the size at a bitrate of 128 kbps. I also tried some other file formats (aac, wma, mpeg, mp4). OpenAI will not accept AAC and WMA, but I was curious, and together with mp3 these were the first formats I could convert to. Speaking of which: I used NAudio to convert from wav to mp3, aac and wma. It was an exciting time.
But I was curious about the webm format, so I asked our trusted friend ChatGPT (GPT-3.5 Turbo) about it, and it mentioned MediaToolkit. If you're using Visual Studio, installing it through NuGet works fine. Now I could convert from wav to all the other file formats the easy way. Yes, for the moment I'm using both MediaToolkit and NAudio. MediaToolkit is great for conversion and for getting audio information. NAudio I used to make a simple player so I could test my files (yes, cue another eye-roll: I have VLC too). But it's nice to be able to play, convert and record files from the same application, so I made some Tools. But first of all, my Whisper application:
Important options in Whisper are:
- Speech to Text: Transcription or Translation
- Language (I think it can detect this itself, but the success rate is higher if you give it a hint)
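To show how these two options map onto the request, here is a minimal sketch that calls the OpenAI audio endpoints with plain `HttpClient`. It is not the exact code from my application. The class name, the method name and keeping the key in an `OPENAI_API_KEY` environment variable are assumptions made for the example. As far as I know the language hint only applies to transcription, so the sketch leaves it out when translating.

```csharp
using System;
using System.IO;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Threading.Tasks;

static class WhisperClient
{
    static readonly HttpClient Http = new HttpClient();

    // translate = false: transcription (text in the spoken language)
    // translate = true:  translation (English text)
    public static async Task<string> SpeechToTextAsync(
        string audioPath, bool translate, string languageHint = null)
    {
        // Assumption: the API key is stored in an environment variable.
        string apiKey = Environment.GetEnvironmentVariable("OPENAI_API_KEY");
        string endpoint = translate
            ? "https://api.openai.com/v1/audio/translations"
            : "https://api.openai.com/v1/audio/transcriptions";

        using (var request = new HttpRequestMessage(HttpMethod.Post, endpoint))
        using (var form = new MultipartFormDataContent())
        using (var file = File.OpenRead(audioPath))
        {
            request.Headers.Authorization =
                new AuthenticationHeaderValue("Bearer", apiKey);

            // The audio file and the model name are required fields.
            form.Add(new StreamContent(file), "file", Path.GetFileName(audioPath));
            form.Add(new StringContent("whisper-1"), "model");
            form.Add(new StringContent("text"), "response_format");

            // Optional language hint as an ISO-639-1 code, e.g. "no" for Norwegian.
            if (!translate && !string.IsNullOrEmpty(languageHint))
                form.Add(new StringContent(languageHint), "language");

            request.Content = form;
            using (var response = await Http.SendAsync(request))
            {
                response.EnsureSuccessStatusCode();
                return await response.Content.ReadAsStringAsync();
            }
        }
    }
}
```

A call could then look like `await WhisperClient.SpeechToTextAsync(path, translate: false, languageHint: "no")` for a Norwegian transcription, or with `translate: true` for the English translation.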
As you can see from the above: yes, it works. At the moment Whisper only translates to English; I tried it, and that works too. The text above is actually taken from a Norwegian newspaper. When I translate it I get this:
Not perfect, but one gets the idea of what it's all about. It was very exciting to work with this, and so easy. What I thought was the fun part was making the Tools in the menu.
First I made a recording-tool so that I have something to test with:
For this I used the NAudio library for C#.
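A minimal sketch of what such a recorder can look like with NAudio's `WaveInEvent` and `WaveFileWriter`. The class name `WavRecorder` is made up for the example, and the 44.1 kHz, 16-bit stereo format is an assumption, chosen because it gives roughly the 11 MB per minute I saw.

```csharp
using System;
using NAudio.Wave;

// Records from the default input device into a WAV file until Stop() is called.
sealed class WavRecorder : IDisposable
{
    private readonly WaveInEvent waveIn;
    private readonly WaveFileWriter writer;

    public WavRecorder(string outputPath)
    {
        // Assumption: 44.1 kHz, 16-bit, stereo (about 10-11 MB per minute).
        waveIn = new WaveInEvent { WaveFormat = new WaveFormat(44100, 16, 2) };
        writer = new WaveFileWriter(outputPath, waveIn.WaveFormat);

        // Append each captured buffer to the file as it arrives.
        waveIn.DataAvailable += (s, e) => writer.Write(e.Buffer, 0, e.BytesRecorded);

        // Disposing the writer finalizes the WAV header when recording ends.
        waveIn.RecordingStopped += (s, e) => writer.Dispose();
    }

    public void Start() { waveIn.StartRecording(); }

    public void Stop() { waveIn.StopRecording(); }

    public void Dispose()
    {
        waveIn.Dispose();
        writer.Dispose();
    }
}
```

In a WPF window, `Start()` and `Stop()` would typically be wired to the click handlers of two buttons.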
Then I wanted to convert the files. So I made a conversion-tool:
It says convert to MP3, and that's what it started with, but I added all the other formats when I found out how. For this I used MediaToolkit, which has a very easy API to understand and use. I used VLC to check that the files really were of the expected format. The next thing that came to mind was being able to play the files for testing without opening VLC, so I also made a very simple audio playback tool, described below.
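For the conversion tool, the core of it can be sketched in a few lines with MediaToolkit. This is a simplified version under assumptions, not my full tool. The class name `AudioConverter` and the file paths in the usage note are made up, and the output format is picked from the extension of the output file.

```csharp
using System;
using MediaToolkit;
using MediaToolkit.Model;

static class AudioConverter
{
    // Converts inputPath to the format implied by outputPath's extension,
    // e.g. "recording.wav" to "recording.webm".
    public static void Convert(string inputPath, string outputPath)
    {
        var input = new MediaFile { Filename = inputPath };
        var output = new MediaFile { Filename = outputPath };

        using (var engine = new Engine())
        {
            // Reads information about the input file into input.Metadata.
            engine.GetMetadata(input);
            Console.WriteLine($"Duration: {input.Metadata.Duration}");

            // Runs the actual conversion.
            engine.Convert(input, output);
        }
    }
}
```

Usage would be something like `AudioConverter.Convert(@"C:\temp\recording.wav", @"C:\temp\recording.webm");`.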
The playback of the files was a little trickier. I could not get NAudio to handle playback of wav files properly: I could make it play, but I could not make it stop when I wanted. So I used the player from the windows.media library to play these (and wma).
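Assuming the player in question is WPF's `System.Windows.Media.MediaPlayer`, a bare-bones play/stop wrapper could look like the sketch below. The class name `SimplePlayer` is made up for the example, and the object should be created on the UI thread.

```csharp
using System;
using System.Windows.Media; // WPF (PresentationCore)

sealed class SimplePlayer
{
    private readonly MediaPlayer player = new MediaPlayer();

    // Opens the file and starts playback. The path must be absolute.
    public void Play(string path)
    {
        player.Open(new Uri(path, UriKind.Absolute));
        player.Play();
    }

    // Stops playback and releases the file.
    public void Stop()
    {
        player.Stop();
        player.Close();
    }
}
```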
All in all I must say that this project was fun. It would have been nice to work more on it and make it better, and I will when I get the time. I had some time because of the Easter holidays, but sadly, that's over now.