I woke up this morning wanting to count mp3s. I’d been meaning to start reorganizing my music, and I wanted to know how many mp3s I really had. I also wanted to find any folders that had a mix of mp3s and subfolders, because I want to fix those.
So I wrote myself a short Python script to count the mp3s in a folder, and to tell me which and how many mixed folders I had. Here’s how I did it: countMp3s.py (and here’s a version with more comments). I thought this would make a nice Python tutorial for beginners with just a little bit of programming experience.
Let’s take a look at the code, and then walk through it step by step.
Here’s the whole script:
```python
from fnmatch import fnmatch
import os
import sys

if len(sys.argv) < 2:
    print("Please supply a path argument; this is the folder path to your music folder")
    sys.exit(1)

path = sys.argv[1]

total_mp3s = 0
total_folders = 0
total_weird_folders = 0

for (sub_path, folders, files) in os.walk(path):
    mp3files = [x for x in files if fnmatch(x, "*.mp3")]
    total_mp3s += len(mp3files)
    total_folders += 1
    if len(folders) > 0 and len(mp3files) > 0:
        print(sub_path)
        total_weird_folders += 1

fraction_weird_folders = 0
if total_folders > 0:
    fraction_weird_folders = total_weird_folders / float(total_folders)

print("Total number of mp3s: %d" % total_mp3s)
print("Number of folders with mp3s and subfolders: %d" % total_weird_folders)
print("Total number of folders: %d" % total_folders)
print("Fraction of folders with mp3s and subfolders: %f" % fraction_weird_folders)
```
Let’s go through this line by line:
```python
from fnmatch import fnmatch
import os
import sys
```
Here, we’re asking Python to import a few extra tools for us to use. ‘os’ stands for operating system. It gives us the power to do operating-system-type stuff, like listing folders and moving files. ‘sys’ stands for system. Among other things, it gives us the power to read what was typed on the command line. ‘fnmatch’ stands for file name match. It lets us use filename wildcards, in this case ‘*.mp3’.
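To get a feel for wildcards, here is a small Python 3 sketch with made-up file names. The `*` matches any run of characters, but the pattern has to cover the whole name, so “notes.mp3.txt” does not match “*.mp3”. One thing worth knowing: `fnmatch()` (the function the script uses) case-normalizes names with `os.path.normcase`, so on Windows “02 Outro.MP3” would match “*.mp3”, while on Mac and Linux it would not. `fnmatchcase()` is always case-sensitive, which makes the results below the same on every system. The `is_mp3` helper is hypothetical, just one way to count upper-case extensions too.

```python
from fnmatch import fnmatchcase

def is_mp3(name):
    # Hypothetical helper: lowercase the name first so "SONG.MP3" counts as an mp3.
    return fnmatchcase(name.lower(), "*.mp3")

for name in ["01 Intro.mp3", "cover.jpg", "notes.mp3.txt", "02 Outro.MP3"]:
    # Case-sensitive wildcard match vs. the lowercase-first helper.
    print(name, fnmatchcase(name, "*.mp3"), is_mp3(name))
```

Only “01 Intro.mp3” matches case-sensitively; `is_mp3` also accepts “02 Outro.MP3”.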
We’re going to run this script from the command prompt. What is the command prompt? It’s the old-style part of your computer where you type in commands and stuff happens. How to use it on: Mac, Windows.
So, at a command prompt, we’re going to type something like
```
python countMp3s.py /Users/mcarlin/Music
```
to run our program. The first part tells the computer to run Python. The next two parts are called arguments; they’re extra pieces of information for Python to use. The first argument is the name of our script, countMp3s.py; with this, we’re telling Python to run our script. The second argument tells our program where to find the music.
```python
if len(sys.argv) < 2:
    print("Please supply a path argument; this is the folder path to your music folder")
    sys.exit(1)
```
This code checks whether the script has been given both arguments. If it hasn’t, the script prints a message asking the user to try again, and quits with exit code 1, which signals that something went wrong.

sys.argv is the list of arguments. It should hold two things: the script name, and the location of the music.
```python
path = sys.argv[1]
```
sys.argv[1] is how we get the location of the music. The bracketed 1 means “get me thing 1 from sys.argv”. Python starts counting at 0, so if we had said sys.argv[0], we would have gotten the script name. We put the location of the music in a variable named path.
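Why check the length before reading `sys.argv[1]`? Asking for thing 1 from a list that only has thing 0 raises an `IndexError` and crashes the script with a confusing traceback instead of a friendly message. Here is a minimal Python 3 sketch that wraps both steps in a function; `get_music_path` is a hypothetical name, and the lists stand in for what `sys.argv` would hold.

```python
def get_music_path(argv):
    """Return the music folder from an argv-style list, or None if it is missing."""
    # argv[0] is always the script name, so a path means at least two entries.
    if len(argv) < 2:
        # Checking first avoids argv[1] raising IndexError on a one-item list.
        return None
    return argv[1]

# Hypothetical argument lists standing in for sys.argv:
print(get_music_path(["countMp3s.py", "/Users/mcarlin/Music"]))  # /Users/mcarlin/Music
print(get_music_path(["countMp3s.py"]))                          # None
```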
```python
total_mp3s = 0
total_folders = 0
total_weird_folders = 0
```
Here we’re creating counters for the number of mp3s, number of folders, and number of weird folders. They all start at zero.
```python
for (sub_path, folders, files) in os.walk(path):
```
os.walk takes a path (the location of our music) and “walks” through that folder and every folder beneath it. For each folder it visits, you get three things: the path to that folder (which we call sub_path), a list of the subfolders directly inside it, and a list of the files directly inside it. Since we’re in a for loop, the next few lines of code (everything that is indented) run repeatedly, once for every folder.
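If you want to see exactly what os.walk hands back, this Python 3 sketch builds a tiny throwaway folder (made-up names: loose.mp3 at the top and Album/01.mp3 inside) and lists every triple. `walk_summary` is a hypothetical helper for the demo. Note that the top folder itself is the first one visited, so it counts toward the folder total as well.

```python
import os
import tempfile

def walk_summary(top):
    """List (relative folder, subfolders, files) for every folder os.walk visits."""
    summary = []
    for (sub_path, folders, files) in os.walk(top):
        # sub_path is a full path; show it relative to the top folder instead.
        summary.append((os.path.relpath(sub_path, top), sorted(folders), sorted(files)))
    return summary

# Build a small temporary folder tree, then walk it.
with tempfile.TemporaryDirectory() as music:
    os.mkdir(os.path.join(music, "Album"))
    for name in ("loose.mp3", os.path.join("Album", "01.mp3")):
        open(os.path.join(music, name), "w").close()  # create an empty file
    for row in walk_summary(music):
        print(row)
```

This prints `('.', ['Album'], ['loose.mp3'])` for the top folder and then `('Album', [], ['01.mp3'])`. By the script’s definition, the top folder here is a weird folder: it has both an mp3 and a subfolder.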
```python
    mp3files = [x for x in files if fnmatch(x, "*.mp3")]
```
This one might be self-explanatory. It’s called a list comprehension, and it’s equivalent to saying: please give me every file in the list of files, if the file matches “*.mp3”.
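If the one-liner is hard to read, it may help to see it next to the longer loop it stands for. Both build the same list from a made-up set of file names.

```python
from fnmatch import fnmatch

files = ["01 Intro.mp3", "cover.jpg", "02 Outro.mp3"]  # made-up file names

# The list comprehension from the script...
mp3files = [x for x in files if fnmatch(x, "*.mp3")]

# ...builds the same list as this longer loop.
mp3files_loop = []
for x in files:
    if fnmatch(x, "*.mp3"):
        mp3files_loop.append(x)

print(mp3files)                   # ['01 Intro.mp3', '02 Outro.mp3']
print(mp3files == mp3files_loop)  # True
```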
```python
    total_mp3s += len(mp3files)
    total_folders += 1
```
len(mp3files) is the length of the list of mp3files. However long that list is, that’s how many mp3s are in this folder. We add this to the total mp3 count. We also add one to the total folder count.
```python
    if len(folders) > 0 and len(mp3files) > 0:
        print(sub_path)
        total_weird_folders += 1
```
If this folder has more than zero subfolders and more than zero mp3s, it’s a weird folder (remember, that’s something I’m looking to count). We print out the location of this folder and add one to the weird folders count.

Okay, we’re done with the loop. The rest of this stuff happens just once, after the loop is over and everything has been counted.
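One way to check the loop’s logic without pointing it at a real music folder is to move it into a function that accepts anything shaped like os.walk’s output. This is a sketch, not the script itself: `count_music` is a hypothetical name, and it collects the weird folder paths in a list (whose length is the weird folder count) instead of printing them.

```python
from fnmatch import fnmatch

def count_music(walk_results):
    """Tally mp3s, folders, and weird folders from (sub_path, folders, files) tuples.

    walk_results can be os.walk(path) or a hand-made list for testing.
    """
    total_mp3s = 0
    total_folders = 0
    weird_folders = []  # paths of folders holding both mp3s and subfolders
    for (sub_path, folders, files) in walk_results:
        mp3files = [x for x in files if fnmatch(x, "*.mp3")]
        total_mp3s += len(mp3files)
        total_folders += 1
        if len(folders) > 0 and len(mp3files) > 0:
            weird_folders.append(sub_path)
    return total_mp3s, total_folders, weird_folders

# A made-up folder tree in the same shape os.walk produces:
fake_walk = [
    ("Music", ["Album"], ["loose.mp3"]),        # weird: an mp3 AND a subfolder
    ("Music/Album", [], ["01.mp3", "02.mp3"]),  # fine: only mp3s
]
print(count_music(fake_walk))  # (3, 2, ['Music'])
```

With a real folder you would call `count_music(os.walk(path))` after importing os.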
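```python
fraction_weird_folders = 0
if total_folders > 0:
    fraction_weird_folders = total_weird_folders / float(total_folders)
```
We know the number of weird folders, but I also want to know the fraction. So, as long as the total number of folders isn’t zero (dividing by zero would crash the script), I divide the number of weird folders by the total number of folders to get the fraction. The float() turns the total into a floating point number first: in Python 2, dividing one whole number by another throws away the fractional part, so the answer would come out as 0 whenever there are fewer weird folders than folders.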
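Here is that division worked through in Python 3 with the counts from my run of the script. The `//` operator does whole-number (floor) division, which is what Python 2’s `/` did with two integers; converting one side to a float keeps the fraction.

```python
total_weird_folders = 113
total_folders = 13226

# Whole-number division keeps only the integer part, which here is 0.
print(total_weird_folders // total_folders)  # 0

# Converting to float keeps the fractional part.
# (In Python 3, plain / already does this; in Python 2 it did not.)
fraction = total_weird_folders / float(total_folders)
print("%f" % fraction)  # 0.008544
```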
```python
print("Total number of mp3s: %d" % total_mp3s)
print("Number of folders with mp3s and subfolders: %d" % total_weird_folders)
print("Total number of folders: %d" % total_folders)
print("Fraction of folders with mp3s and subfolders: %f" % fraction_weird_folders)
```
Now we just print out all the numbers. You can put a variable into the printed text by using %s (for strings), %d (for integers), or %f (for fractional, or floating point, numbers).
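A few more formatting details that tend to come up: %f shows six decimal places unless you ask for a different number, and several values go in a tuple after the %. The last line shows the f-string form available in Python 3.6 and later. The fraction 0.25 is a made-up value chosen so the output is easy to check.

```python
total_mp3s = 25191
fraction = 0.25  # made-up value, easy to check by eye

examples = [
    "Total number of mp3s: %d" % total_mp3s,      # integer
    "Fraction: %f" % fraction,                     # float, 6 decimal places by default
    "Fraction: %.2f" % fraction,                   # float, 2 decimal places
    "%d mp3s, status: %s" % (total_mp3s, "done"),  # several values go in a tuple
    f"Total number of mp3s: {total_mp3s}",         # Python 3.6+ f-string equivalent
]
for line in examples:
    print(line)
```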
That’s it! That’s the whole program. I ran this program on my music collection, and here are the numbers I got:
```
Total number of mp3s: 25191
Number of folders with mp3s and subfolders: 113
Total number of folders: 13226
Fraction of folders with mp3s and subfolders: 0.008544
```
So I have 25,191 mp3s, and just under 1% of my folders are weird. Yay!
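If you would rather have the script print a percentage than a fraction, multiplying by 100 does it. A Python 3 sketch using my counts; inside a % format string, `%%` prints a literal percent sign.

```python
fraction_weird_folders = 113 / 13226  # the counts from my run

# Multiply by 100 for a percentage; "%%" prints a literal "%".
print("%.2f%% of folders are weird" % (fraction_weird_folders * 100))  # 0.85% of folders are weird
```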
You can do a lot of really powerful things with the ‘os’ tools in Python. You could remove all the duplicates from your music collection. You could remove “The” from the beginning of every file name. You could copy all the folders with more than 10 files to a different location.
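As a taste of the “remove The” idea, here is a cautious Python 3 sketch. It only plans the renames and touches nothing; `without_leading_the` and `plan_renames` are hypothetical names. It walks with `topdown=False` so that items inside a folder come before the folder itself, which means the old paths stay valid if you rename in list order. It does not check whether the new name already exists (“The Band” and “Band” side by side would clash), so look over the list before calling `os.rename(old, new)` on anything.

```python
import os

def without_leading_the(name):
    """Return name with a leading "The " removed, e.g. "The Beatles" -> "Beatles"."""
    if name.startswith("The "):
        return name[len("The "):]
    return name

def plan_renames(top):
    """Return (old_path, new_path) pairs; a dry run that renames nothing."""
    plan = []
    # topdown=False visits the deepest folders first, so renaming in this
    # order never changes a path that a later entry still depends on.
    for (sub_path, folders, files) in os.walk(top, topdown=False):
        for name in folders + files:
            new_name = without_leading_the(name)
            if new_name != name:
                plan.append((os.path.join(sub_path, name),
                             os.path.join(sub_path, new_name)))
    return plan
```

Printing each pair from `plan_renames("/Users/mcarlin/Music")` would show what would change before you commit to it.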
If you have an idea for something to do with these powers, leave me a comment. I might write another few of these tutorials based on reader input.