2 min readfrom Machine Learning

What will be the next breakthrough in ASR? [D]

Hey All,

I am currently working on ASR models, and I have gathered some recent literature. From my literature search, it seems like the ASR models are getting more and more powerful due to two main things.

  1. Because pseudo-labelled data is growing, supervised models are rising rapidly. Whisper-large-v3 has been trained on 5M hours of weakly supervised data, and Nvidia Parakeet v3 has been trained on 660k hours of labelled data (open-sourced). Funny enough, Nvidia Parakeet v3 actually beats Whisper-large-v3 on almost every benchmark, even though it has a smaller model size and smaller data scale. So clearly, scale is not everything.

  2. New architectures are on the rise; We used to have self-supervised + CTC to solve the ASR task, but now it seems like Transducer, and Token-Duration-Transducers are taking off. As well as attention encoder-decoder architectures (Qwen) that are all trained in a supervised manner.

Now, given that the labelled data is very huge, and the new architectures are coming up, are we saying bye to the self-supervised learning approaches like Data2Vec2.0, WavLM, etc., for ASR, and will we only use them for general-purpose speech tasks?

This is actually not similar to how computer vision operates now. Dinov3 is a self-supervised approach that is extremely performant in segmentation, classification, depth estimation etc but I do not see this in the speech domain now. ASR is dominated by these huge supervised architectures (which is a dense-prediction task), as well as emotion recognition, diarization, and speech seperation are also all dominated by the supervised approaches.

Do you think we will have our Dino moment with a new self-supervised architecture? Or supervised learning is the way to go? How would these methods actually perform if we trained a self-supervised model on these huge datasets?

submitted by /u/ComprehensiveTop3297
[link] [comments]

Want to read more?

Check out the full article on the original site

View original article

Tagged with

#generative AI for data analysis
#Excel alternatives for data analysis
#big data management in spreadsheets
#self-service analytics tools
#conversational data analysis
#real-time data collaboration
#intelligent data visualization
#data visualization tools
#enterprise data management
#big data performance
#self-service analytics
#data analysis tools
#data cleaning solutions
#natural language processing for spreadsheets
#machine learning in spreadsheet applications
#large dataset processing
#rows.com
#financial modeling with spreadsheets
#ASR
#Speech recognition