Splitting YouTube videos using the terminal

Wow! It is 2016 already and another Google I/O has gone by. What boggled my mind this time was using Instant Apps without installing them! I started this blog in May 2013, after watching the live stream of I/O, and it’s been 3 years already. I was very caught up last year, as I switched jobs, moved to a new city and then back to my old place.

I am currently working on a startup called Born2Blossom with a couple of friends. India hasn’t yet achieved universal primary education for its kids and this is a problem. The government is investing in creating infrastructure, but dropout rates are still startling and the teacher-to-student ratio is abysmal. We realized there is huge potential to support both the teachers and the students. We are attempting to expose upper primary school children to STEM (Science, Technology, Engineering and Math) through hands-on activities. We believe that leveraging technology can help us create a scalable solution that supports those who lack access to these resources. It’s well known that we learn by doing things hands-on, especially science. I am satisfied with the way the product is maturing and also intrigued by the interesting challenges that a startup brings. We are grappling with developing a rock-solid business plan, but I am sure we’ll get there.

Aah, I digress! So! At Born2Blossom, I am looking after the technical aspects of the app. That led me to explore a lot of frameworks that I might talk about in the future, but today was particularly interesting.

We want to curate quality content available on YouTube in our prototype phase, which involves using only parts of a video and embedding them in our app for the kids to view. Wanting to automate the process of downloading YouTube videos and splitting them into parts based on timestamps led me to write a short script and also to discover an amazing utility called youtube-dl. Yeah, you guessed it, this simple command line utility comes in really handy for downloading YouTube videos. But it does so much more! Kudos to the developers and community for the amazing documentation. The program lets you search for videos using a host of filters and specify the quality of the video or audio you want to download. It also lets you download multiple videos using a batch file and fetch entire playlists! As far as I can tell, the program is also very well maintained and easy to install and use on different platforms.

OK, so we have the downloading part taken care of. How do we split the video at given timestamps? Enter FFmpeg! I had written a post on how to record screencasts with ffmpeg. Basically, think audio and video editing or manipulation and I’m sure FFmpeg has got you covered. As expected, a search through the official wiki and a couple of answers on stackoverflow.com confirmed that a video can be split into parts with it. The natural consequence was me deciding to write a script to combine the two. Although I could have written a shell script to do this, I decided to go with Python because I found a module named youtube_dl that exposes the YouTube downloader directly in Python, and frankly I find file manipulation much easier in Python. Last but not least, Python is installed out of the box on a Mac!

So, go on and install youtube-dl on your machine. It should be a breeze.

I know, a command line Python program shouldn’t need a separate Python module just to be used from Python in the first place. But nonetheless, installing the package with pip should put youtube_dl on your machine. As a sanity check, open a Python interpreter in the terminal and try import youtube_dl. When I did, I got the error can't find module youtube_dl. It’s most likely a Python version issue. On a Mac, Python is installed by default at /usr/bin/python, while pip places packages under /usr/local/lib/python2.7.x/. Just correct the path by running export PYTHONPATH="/usr/local/lib/python2.7/site-packages:$PYTHONPATH" (there must be no spaces around the equals sign) or by adding that line to your bash_profile. This makes sure the modules installed by pip are found, and the error should most probably go away. Now, I was taking inputs from a file (let’s call it video_data.txt) with one row per video.

The first column is the video ID on YouTube. The second column is the start timestamp and the next column is the end timestamp where I need to cut the video; further start and end pairs on the same row describe additional parts. So in the first row of my file, the first part is between 15 and 35 seconds of the video and the second part is between 42 and 58 seconds. This kind of structure is fairly generic and should be easily reusable.
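To make the row layout concrete, here is a minimal sketch of how such a file could be parsed. It assumes whitespace-separated columns and timestamps given as whole seconds; the function name parse_video_data and the VIDEO_ID placeholders are made up for illustration and are not real video IDs.

```python
def parse_video_data(lines):
    """Turn rows like 'VIDEO_ID 15 35 42 58' into (video_id, [(start, end), ...])."""
    rows = []
    for line_number, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        video_id, times = fields[0], fields[1:]
        # Timestamps must come in start/end pairs.
        if not times or len(times) % 2 != 0:
            raise ValueError("line %d: expected start/end pairs" % line_number)
        seconds = [int(value) for value in times]
        segments = list(zip(seconds[0::2], seconds[1::2]))
        for start, end in segments:
            if end <= start:
                raise ValueError("line %d: end must be after start" % line_number)
        rows.append((video_id, segments))
    return rows


# Hypothetical sample rows; the IDs are placeholders.
sample = ["VIDEO_ID_1 15 35 42 58", "VIDEO_ID_2 0 20"]
for video_id, segments in parse_video_data(sample):
    print(video_id, segments)
```

With a real file you would pass the open file object instead of the sample list, since iterating over a file yields its lines.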

The final Python script that combines reading the file with the ffmpeg splitting is called generator.py.
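What follows is a minimal sketch of the core of such a script, not the exact listing. It assumes youtube_dl is importable and ffmpeg is on the PATH. The option values, the output naming scheme and the helper names (download_video, build_ffmpeg_command, split_video) are my own assumptions for illustration.

```python
import subprocess

# Assumed youtube_dl options: ask for mp4 and name the file after the video id.
YDL_OPTIONS = {
    "format": "mp4",
    "outtmpl": "%(id)s.%(ext)s",
}


def download_video(video_id):
    """Download one video and return the local file name implied by the template."""
    import youtube_dl  # imported here so the rest of the sketch runs without it

    url = "https://www.youtube.com/watch?v=" + video_id
    with youtube_dl.YoutubeDL(YDL_OPTIONS) as ydl:
        ydl.download([url])
    return video_id + ".mp4"


def build_ffmpeg_command(input_path, start, end, output_path):
    """Build the argument list; -ss is the start and -t the duration in seconds."""
    return ["ffmpeg", "-i", input_path,
            "-ss", str(start), "-t", str(end - start), output_path]


def split_video(video_id, segments):
    """Download a video, then cut one output file per (start, end) pair."""
    input_path = download_video(video_id)
    for index, (start, end) in enumerate(segments, start=1):
        output_path = "{}_part{}.mp4".format(video_id, index)
        subprocess.call(build_ffmpeg_command(input_path, start, end, output_path))


# Show the command that would run for one segment (placeholder file names).
print(build_ffmpeg_command("VIDEO_ID.mp4", 15, 35, "VIDEO_ID_part1.mp4"))
```

Calling split_video("VIDEO_ID", [(15, 35), (42, 58)]) with a real video ID would download the video once and then run ffmpeg once per part.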

The most important parts of the script are the youtube_dl options and the ffmpeg call. We specify a few of the tons of options provided by youtube_dl to ensure that we download mp4 videos and that output file names follow a set template. I do a raw subprocess call to invoke the ffmpeg command line utility. You could just as easily use a good ffmpeg wrapper for Python; I’m sure there are a lot of options out there. In any case, I decided to go with a subprocess and pass every part of the command as a separate string in a list. Alternatively, you can combine the parts into a single string, but make sure you type it exactly as you would at a command prompt. So something like subprocess.call("ffmpeg -i input.mp4 -ss 30 -t 40 output.mp4", shell=True) is valid. Of course, a lot of things in the ffmpeg command string would be variables, which is why I preferred the list approach: it is cleaner. Adding the shell flag invokes the program through a platform-dependent shell (of course :P). The official documentation warns that using shell=True can be a security hazard, especially when the command contains untrusted input. I am referencing a stackoverflow answer for further reading on the subprocess shell flag.
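To illustrate the difference between the two forms, here is a small sketch using the standard shlex module. It shows that the single string and the list describe the same command, and how each variable part could be quoted if you did want the string form. The helper name shell_command and the file names are hypothetical.

```python
import shlex

# The single-string form from above, split the way a POSIX shell would split it.
command_string = "ffmpeg -i input.mp4 -ss 30 -t 40 output.mp4"
command_list = shlex.split(command_string)
print(command_list)


def shell_command(input_path, start, duration, output_path):
    """Build a string for shell=True, quoting every part so spaces stay intact."""
    parts = ["ffmpeg", "-i", input_path,
             "-ss", str(start), "-t", str(duration), output_path]
    return " ".join(shlex.quote(part) for part in parts)


# File names with spaces are the kind of input that breaks naive string building.
print(shell_command("my video.mp4", 30, 40, "part 1.mp4"))
```

With the list form none of this quoting is needed, because each element is handed to ffmpeg as one argument without going through a shell.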

This should get you started in the right direction when splitting YouTube videos into multiple parts for whatever reason. Are there better ways? Let me know, I am curious!