Web · 2019-08-29

‘Spext’ is a fusion of speech & text, & creates a new interface for working with content: CEO – Exclusive

For writers who want to transcribe anything ‘voice’, e.g. interviews, hiring a freelancer or a service for the work is the norm. Voice-to-text tech has advanced a fair bit, though. Recent strides by Google, Amazon & others will soon make digitally transcribing voice far more common, & a lot more accurate than it has been in the past.

One startup that has taken a lead in using these advances to produce innovative voice transcription software is Spext. It’s a startup from India, founded by Ashutosh Trivedi, Sanyam Jain & Anup Gosavi, with the aim of developing an AI-based transcription & audio editing Software as a Service (SaaS) platform.

The team has already made great headway, & currently, its software addresses & solves a selection of issues faced by audio & video creators. 

Spext has targeted the growing podcast market specifically, but its features are equally useful to videographers, journalists, novelists, educators as well as audio & text creators from other varied fields.

What adds to the novelty of Spext is the fact that audio can now be edited just like any text document.

Whats New On The Net chatted with Anup Gosavi on email about Spext:

Q) Why Spext?

A) Spext is a fusion of the words – speech & text. Traditionally, we think about speech (voice) & text as two completely different mediums. If you fuse them together as a single thing, you can create an entirely new interface for working with & generating spoken-word content.

The idea for Spext started simply – to convert podcasts into written blogs using machine transcription, & to give a simple editor to correct the transcript. While building this editor, we thought, “What if you could edit the transcript & it actually edited the audio?” Compared to editing waveforms, this was a more intuitive interface to work with audio files & so we built it 🙂

Q) Which aspects, specifically, of your software use machine learning?

A) Aligning spoken words to text – we have to be very accurate as the cuts have to be smooth and without glitches.

Noise reduction & leveling – Reducing noise & leveling voices both use machine learning.

The underlying speech to text technology is machine learning as well but we rely on providers like Google, Amazon, AssemblyAI, etc.

Q) How much time, in your estimation, does your transcription software save for, say, journalists?

A) Most of our users say it takes around 7-8 minutes to transcribe 1 minute of media/ audio. With Spext, it takes usually 1.5-2 minutes to do the same thing. We reduce the production time by almost 80%.
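As a rough check on that figure, the quoted ranges can be turned into a percentage reduction. This is only a sketch: the function name & the best/midpoint/worst pairing of the range endpoints are assumptions made for illustration, not numbers published by Spext.

```python
def reduction(before_minutes: float, after_minutes: float) -> float:
    """Fraction of time saved when a task drops from `before` to `after`."""
    return 1 - after_minutes / before_minutes


# Minutes of work per 1 minute of audio, taken from the ranges quoted above.
# How the endpoints are paired up is an assumption for illustration.
cases = {
    "best": (8.0, 1.5),       # slowest manual pace vs fastest Spext pace
    "midpoint": (7.5, 1.75),  # middle of both ranges
    "worst": (7.0, 2.0),      # fastest manual pace vs slowest Spext pace
}

for name, (before, after) in cases.items():
    print(f"{name}: {reduction(before, after):.0%}")
# best: 81%
# midpoint: 77%
# worst: 71%
```

On these assumptions the saving lands between roughly 71% & 81%, which is in line with the "almost 80%" quoted.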

Q) What is the accuracy level? What affects it negatively?

A) Normally it is 92-95%. Factors that affect the accuracy:
1. Noise level in the recording
2. Sampling rate (higher sampling rate is better)


What Makes Spext Special?

Voice transcription software Spext makes audio editing a breeze. Traditionally, waveform editing tools, such as Audacity, are used to edit podcasts or video audio files. Not only is this a time-consuming process, but these tools often have a very steep learning curve & so have limited value for a layperson who sets out to edit their own audio files.

Spext, on the other hand, uses various voice-to-text APIs (Google, Amazon) to convert the audio into editable text, which can then be manipulated much like normal text in an ordinary text editor.

Deleting silences 

Removing unwanted silences from audio or video files is a fairly complex procedure. It involves following the waveform to identify the ‘flat’ areas, then carefully snipping out these sections. Obviously, much can go wrong during this process, especially in inexperienced hands. Spext eliminates the complexity: to remove ‘dead space’, all you need to do is delete the ‘- – -’ areas from the transcribed text wherever you want a silence removed, & that’s it.
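To make the idea concrete, here is a minimal sketch of how deleting items from a transcript could translate into audio cuts. It assumes every transcript token (a word, or a silence marker) carries start & end times in seconds; the `Token` class, the `keep_ranges` function & the "---" marker are hypothetical names for illustration, not Spext's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class Token:
    text: str     # a word, or "---" for a detected silence (assumed marker)
    start: float  # start time in seconds
    end: float    # end time in seconds


def keep_ranges(tokens, deleted):
    """Return merged (start, end) audio ranges left after deleting tokens by index."""
    ranges = []
    for index, token in enumerate(tokens):
        if index in deleted:
            continue  # this stretch of audio is cut out
        if ranges and abs(ranges[-1][1] - token.start) < 1e-9:
            # Token follows directly on the previous kept range: extend it.
            ranges[-1] = (ranges[-1][0], token.end)
        else:
            ranges.append((token.start, token.end))
    return ranges


tokens = [
    Token("Welcome", 0.0, 0.5),
    Token("---", 0.5, 2.0),
    Token("back", 2.0, 2.4),
]
silences = {i for i, t in enumerate(tokens) if t.text == "---"}
print(keep_ranges(tokens, silences))  # [(0.0, 0.5), (2.0, 2.4)]
```

An editor built this way would then only need to stitch the kept ranges together, which is why deleting text is enough to cut the audio.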

Removing filler words

Unnecessary ‘filler’ words, such as ‘uh’ & ‘um’, are a natural part of spontaneous conversational language, but they usually spoil a podcast or video production & make it sound unprofessional. Spext automatically highlights these words in red, so instead of having to carefully listen to the audio, identify the exact moment where a superfluous word crops up, & then cut it at the exact juncture in the waveform, you simply delete it from the text transcription.
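A simple way to picture the highlighting step is a lookup against a list of filler words. The sketch below is an assumption about how such flagging could work, not Spext's method; the `FILLERS` set & both function names are made up for illustration.

```python
import re

# Assumed filler list; a real tool would likely use a longer, tuned list.
FILLERS = {"uh", "um"}


def flag_fillers(transcript):
    """Return (word, is_filler) pairs for each whitespace-separated word."""
    flagged = []
    for word in transcript.split():
        # Strip punctuation & case so "Um," still matches "um".
        normalized = re.sub(r"[^\w']", "", word).lower()
        flagged.append((word, normalized in FILLERS))
    return flagged


def strip_fillers(transcript):
    """Rebuild the transcript with the flagged words deleted."""
    return " ".join(word for word, is_filler in flag_fillers(transcript) if not is_filler)


print(strip_fillers("So, um, we built uh an editor."))  # So, we built an editor.
```

The flagged pairs are what an editor could render in red, & deleting them from the text is the same operation as any other transcript edit.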

Altering the flow

If the audio needs sections moved from one position to another to enhance the flow of the conversation, you only need to select the text in the transcription, then cut & paste it wherever you want. This feature operates much like moving text in a text editor, except that, once moved, you’ll be able to ‘hear’ it in its new position. At this point you cannot add new text to the file, but it’s something the team are working on.

Finalizing the audio production

As soon as the audio is edited to your satisfaction, Spext makes it phenomenally easy to add intro & outro music, set the speech volume of all speakers to the same level, & remove background noise & general ‘fuzziness’ from the audio file. This is achieved by adding your music & clicking one button to ‘fix’ everything. Pretty cool.

To take advantage of these tools you need to sign up for a Spext account & upload your audio files. Editing takes place in the browser (only Google Chrome at present), so you don’t need to download anything & the tool will work on any device.

Additionally, Spext has a host of clever AI tools which assist in the editing process, such as ‘audio search’, which automatically finds sections of the audio based on a keyword. If you want to find references to the various beaches described in a travel podcast, Spext will instantly find the sections & show the exact time of each one in the file for easy identification.
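Once a transcript has word-level timestamps, a keyword search of this kind can be sketched in a few lines. The example below is an illustration under that assumption; the function names, the prefix-matching rule & the sample data are hypothetical & not taken from Spext.

```python
def format_time(seconds):
    """Format a time offset in seconds as m:ss."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes}:{secs:02d}"


def find_keyword(words, keyword):
    """Return timestamps of words that start with `keyword`.

    `words` is a list of (text, start_seconds) pairs. Prefix matching is an
    assumed rule so that "beach" also finds "Beaches".
    """
    key = keyword.lower()
    return [
        format_time(start)
        for text, start in words
        if text.lower().strip(".,!?").startswith(key)
    ]


# Made-up sample transcript with start times in seconds.
words = [
    ("The", 12.0), ("beach", 12.3), ("was", 12.8), ("empty.", 13.0),
    ("Beaches", 754.2), ("nearby", 754.9),
]
print(find_keyword(words, "beach"))  # ['0:12', '12:34']
```

A production search would probably add stemming or fuzzy matching, but the core idea is the same: search the text, then report the time attached to each hit.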

It also offers standard text-to-voice editing, so that you can write scripts for your audio productions & hear them before you record the voiceover. Templates to convert audio to video are also available, so that you can share your podcasts on social media, which is better suited to the video format.

Spext produces transcripts from audio files very quickly, with a turnaround time of mere minutes, even for files which are hours long. This is a real advantage, since it would take a human hours to do the same work.

Image Credit: Spext

Video Credit: YouTube

