
Using Whisper AI and WebVTT for Immersive Reading and Transcripts


Making knowledge easier is the focus of this site. We normally use Otter.ai to create transcripts and clickable immersive text to help our readers read, digest and, most importantly, learn from the massive amount of information on this site. Much of that information comes from medical professionals, scientists and lived-experience professionals who create content in the form of podcasts, audio and video.


It has always been the goal to integrate this technology into the site itself, for ease of use, control of visual aesthetics, cost and more. I currently cover all of the site's costs and donate my time, so any way to create and use tools that help with that can go a long way.


This post serves as a testing ground for some new tools I have found, specifically Whisper AI and WebVTT.


Using the Colab notebook listed under Tools below, I have found an alternative to the self-hosted Whisper AI web Docker program I was using. Being able to use free GPU power from Google for transcripts is going to enable more translations as well as audio playback. As AI grows, text to speech will likely follow.


Tools:


Deepgram's Google Colab Python Notebook: https://colab.research.google.com/github/deepgram-devs/try-whisper-in-google-collab/blob/main/try_whisper_in_three_easy_steps.ipynb


Huberman Lab Podcast:

https://www.youtube.com/@hubermanlab


HTML5 Player with Clickable Transcript project:

Writeup: https://masf-html5.blogspot.com/2016/04/html5-player-with-clickable-transcript.html

JS BIN: http://jsbin.com/bevehi/edit?html,css,js,output


AblePlayer

https://github.com/ableplayer/ableplayer



To get started, we are going to use Huberman Lab Podcast #87, Dr. Erich Jarvis: The Neuroscience of Speech, Language & Music.





Getting a WebVTT Transcript Using Python and Google Colab:


First, we set up the notebook and run the imports. We need to modify the code to point to that specific podcast:
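The notebook cells themselves are not reproduced in this post, so here is a minimal sketch of what the setup step could look like. It assumes the yt-dlp package is installed in the Colab runtime (for example with `pip install -U yt-dlp`), which may not be what the Deepgram notebook uses. The `VIDEO_URL` value is also an assumption, built from the video ID that appears in the audio file name further down, and `video_id` and `download_audio` are hypothetical helper names.

```python
from urllib.parse import urlparse, parse_qs

# Assumption: URL built from the video ID seen in the audio file name in this post.
VIDEO_URL = "https://www.youtube.com/watch?v=LVxL_p_kToc"


def video_id(url: str) -> str:
    """Return the YouTube video ID from a watch URL or a youtu.be short URL."""
    parsed = urlparse(url)
    if parsed.hostname == "youtu.be":
        return parsed.path.lstrip("/")
    return parse_qs(parsed.query)["v"][0]


def download_audio(url: str) -> str:
    """Download the audio track (m4a when available) and return the local path."""
    import yt_dlp  # imported lazily so the helper above works without yt-dlp

    options = {
        "format": "m4a/bestaudio/best",
        # Name the file "<title> <video id>.<extension>".
        "outtmpl": "%(title)s %(id)s.%(ext)s",
    }
    with yt_dlp.YoutubeDL(options) as ydl:
        info = ydl.extract_info(url, download=True)
        return ydl.prepare_filename(info)
```

Pointing the notebook at a different episode would then only mean changing `VIDEO_URL` before calling `download_audio(VIDEO_URL)`.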





Next, pull in the audio from YouTube and run Whisper AI on it to create the VTT file (renamed to .txt to make Wix happy):
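As a rough illustration of this step, the sketch below turns Whisper's timed segments into WebVTT text. It assumes the openai-whisper package is installed (`pip install -U openai-whisper`) and that the `base` model is good enough for a first pass. The helper names `vtt_timestamp`, `segments_to_vtt` and `transcribe_to_vtt` are made up for this example and are not taken from the notebook.

```python
def vtt_timestamp(seconds: float) -> str:
    """Format a time in seconds as a WebVTT timestamp (hh:mm:ss.ttt)."""
    ms = int(round(seconds * 1000))
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def segments_to_vtt(segments) -> str:
    """Build WebVTT text from dicts that have 'start', 'end' and 'text' keys."""
    lines = ["WEBVTT", ""]
    for segment in segments:
        start = vtt_timestamp(segment["start"])
        end = vtt_timestamp(segment["end"])
        lines.append(f"{start} --> {end}")
        lines.append(segment["text"].strip())
        lines.append("")  # a blank line ends each cue
    return "\n".join(lines)


def transcribe_to_vtt(audio_path: str, vtt_path: str, model_name: str = "base") -> None:
    """Run Whisper on an audio file and save the result as a WebVTT file."""
    import whisper  # imported lazily; needs the openai-whisper package

    model = whisper.load_model(model_name)
    result = model.transcribe(audio_path)
    with open(vtt_path, "w", encoding="utf-8") as handle:
        handle.write(segments_to_vtt(result["segments"]))
```

Saving with a `.txt` extension instead of `.vtt` only changes the file name passed as `vtt_path`; the contents stay valid WebVTT.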



Download: The Neuroscience of Speech, Language and Music LVxL_p_kToc copy.m4a.txt (TXT, 175KB)

Now that we have an audio file and a VTT file, we can start to work on the player.


First, let's see how we can integrate this project into a Wix blog post by embedding the HTML code directly:
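One way to produce HTML for an embed like this is to generate it from the VTT file. The sketch below is not the code from the clickable transcript project or from AblePlayer; it is a simplified stand-in that assumes a plain VTT file (timestamps with or without the hours part, no cue settings or styling) and an audio file hosted at a URL you supply. The names `parse_vtt` and `transcript_html` are hypothetical.

```python
import html
import re

# Matches the start time of a cue line such as "00:01.000 --> 00:04.000".
CUE_START = re.compile(r"(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})\s+-->")


def parse_vtt(vtt_text: str):
    """Return a list of (start_seconds, cue_text) pairs from simple WebVTT text."""
    cues = []
    lines = vtt_text.splitlines()
    for index, line in enumerate(lines):
        match = CUE_START.match(line)
        if not match:
            continue
        hours, minutes, seconds, millis = (int(g) if g else 0 for g in match.groups())
        start = hours * 3600 + minutes * 60 + seconds + millis / 1000
        text = []
        for following in lines[index + 1:]:
            if not following.strip():  # a blank line ends the cue
                break
            text.append(following.strip())
        cues.append((start, " ".join(text)))
    return cues


def transcript_html(vtt_text: str, audio_src: str, player_id: str = "player") -> str:
    """Build an audio player plus a transcript whose cues seek the audio on click."""
    spans = "\n".join(
        f'<span data-start="{start:.3f}">{html.escape(text)}</span>'
        for start, text in parse_vtt(vtt_text)
    )
    return f"""<audio id="{player_id}" controls src="{html.escape(audio_src, quote=True)}"></audio>
<div id="transcript">
{spans}
</div>
<script>
document.getElementById("transcript").addEventListener("click", function (event) {{
  var start = event.target.dataset.start;
  if (start !== undefined) {{
    var player = document.getElementById("{player_id}");
    player.currentTime = parseFloat(start);
    player.play();
  }}
}});
</script>"""
```

The returned string can be pasted into an HTML embed element. Each transcript span carries its start time in `data-start`, so the small click handler only has to set `currentTime` on the audio element.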





