Counting mp3s! (A Short Python Tutorial)

I woke up this morning wanting to count mp3s. I’d been meaning to start reorganizing my music, and I wanted to know how many mp3s I really had. I also wanted to find any folders where I had a mix of mp3s and subfolders, because I want to fix those.

So I wrote myself a short python script to count mp3s in a folder, and to tell me which and how many mixed folders I had. Here’s how I did it: countMp3s.py (and here’s a version with more comments). I thought this would make a nice python tutorial for beginners with just a little bit of programming experience.

Let’s take a look at the code, and then walk through it step by step.

Here’s the whole script:

from fnmatch import fnmatch
import os
import sys

if len(sys.argv) < 2:
  print("Please supply a path argument; this is the folder path to your music folder")
  sys.exit(1)

path = sys.argv[1]

total_mp3s = 0
total_folders = 0
total_weird_folders = 0

for (sub_path, folders, files) in os.walk(path):
  mp3files = [x for x in files if fnmatch(x, "*.mp3")]

  total_mp3s += len(mp3files)
  total_folders += 1

  if len(folders) > 0 and len(mp3files) > 0:
    print(sub_path)
    total_weird_folders += 1

fraction_weird_folders = 0
if total_folders > 0:
  fraction_weird_folders = total_weird_folders / float(total_folders)

print("Total number of mp3s: %d" % total_mp3s)
print("Number of folders with mp3s and subfolders: %d" % total_weird_folders)
print("Total number of folders: %d" % total_folders)
print("Fraction of folders with mp3s and subfolders: %f" % fraction_weird_folders)

Let’s go through this line by line:

from fnmatch import fnmatch
import os
import sys

Here, we’re asking python to import a bunch of extra tools for us to use. ‘os’ is operating system. This gives us powers to do operating system type stuff, like listing folders, moving files, stuff like that. ‘sys’ is system. This gives us the power to read things from the command line. ‘fnmatch’ is file name match. It will let us do filename wildcards, in this case ‘*.mp3’.
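To get a feel for what fnmatch does on its own, here is a tiny sketch. The file names are made up for the example; fnmatch simply answers yes or no for one name and one wildcard pattern.

```python
from fnmatch import fnmatch

# Made-up file names, just for illustration.
is_song = fnmatch("01 - Intro.mp3", "*.mp3")   # name ends in .mp3
is_cover = fnmatch("cover.jpg", "*.mp3")       # name does not end in .mp3

print(is_song)
print(is_cover)
```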

We’re going to run this script from the command prompt. What is the command prompt? It’s the old-style part of your computer where you can enter commands by keyboard and stuff happens. How to use it for: Mac, Windows.

So, at a command prompt, we’re going to type something like

python countMp3s.py /Users/mcarlin/Music

to run our program. The first part tells the computer to run python. The next two parts are called arguments; they’re extra pieces of information for python to use. The first argument is the name of our script, countMp3s.py. With this, we’re telling python to run our script. The second argument is going to tell our program where to find music.

if len(sys.argv) < 2:
  print("Please supply a path argument; this is the folder path to your music folder")
  sys.exit(1)

This code checks to see whether the script has been given two arguments. If it hasn’t, the script asks the user to try again, and quits.

sys.argv is the list of arguments (all lowercase; Python cares about capitalization). It should have two things: the script name, and the location of music.

path = sys.argv[1]

sys.argv[1] is how we get the location of the music. The bracketed 1 means “get me thing 1 from sys.argv”. If we had said sys.argv[0], we would have gotten the script name. We put the location of music in a variable named path.
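If you want to play with this without a command prompt, here is a small sketch that does the same check on an ordinary list. The helper name get_music_path is made up for this example; it is not part of the script.

```python
def get_music_path(argv):
    """Return the music folder from an argv-style list, or None if it is missing."""
    if len(argv) < 2:
        return None
    return argv[1]

# Roughly what sys.argv holds for: python countMp3s.py /Users/mcarlin/Music
example_argv = ["countMp3s.py", "/Users/mcarlin/Music"]

print(get_music_path(example_argv))
print(get_music_path(["countMp3s.py"]))   # no path given
```

Thing 0 is the script name and thing 1 is the path, which is why the script refuses to continue when the list has fewer than two things in it.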

total_mp3s = 0
total_folders = 0
total_weird_folders = 0

Here we’re creating counters for the number of mp3s, number of folders, and number of weird folders. They all start at zero.

for (sub_path, folders, files) in os.walk(path):

os.walk takes a path (the location of our music) and “walks” through every sub folder. In each case, you get three things: the path to the sub folder (which we call sub_path), a list of folders inside this folder, and a list of files in this folder. Since we’re in a for loop, the next few lines of code (everything which is indented) will happen repeatedly, once for every sub folder.
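Here is a sketch of what os.walk hands back, using a tiny throwaway folder tree so it doesn’t touch your real music. The folder and file names are made up for the example.

```python
import os
import shutil
import tempfile

# Build a small temporary tree: <root>/Artist/Album/track.mp3
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "Artist", "Album"))
open(os.path.join(root, "Artist", "Album", "track.mp3"), "w").close()

visited = []
for (sub_path, folders, files) in os.walk(root):
    # Store each folder's path relative to the root so it is easier to read.
    visited.append((os.path.relpath(sub_path, root), folders, files))

for entry in visited:
    print(entry)

# Clean up the temporary tree.
shutil.rmtree(root)
```

The loop body runs three times here: once for the top folder, once for Artist, and once for Album.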

  mp3files = [x for x in files if fnmatch(x, "*.mp3")]

This one might be self explanatory. It’s equivalent to saying: please give me every file in the list of files, if the file matches “*.mp3”.
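In case it isn’t self explanatory, here is the same line next to a longer version written as an ordinary loop. The file names are made up; both versions should build the same list.

```python
from fnmatch import fnmatch

files = ["a.mp3", "b.txt", "c.mp3"]   # made-up file names

# The one-line version from the script.
mp3files = [x for x in files if fnmatch(x, "*.mp3")]

# The same thing, spelled out step by step.
mp3files_loop = []
for x in files:
    if fnmatch(x, "*.mp3"):
        mp3files_loop.append(x)

print(mp3files)
print(mp3files == mp3files_loop)
```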

  total_mp3s += len(mp3files)
  total_folders += 1

len(mp3files) is the length of the list of mp3files. However long that list is, that’s how many mp3s are in this folder. We add this to the total mp3 count. We also add one to the total folder count.

  if len(folders) > 0 and len(mp3files) > 0:
    print(sub_path)
    total_weird_folders += 1

If this folder has more than zero sub folders and more than zero mp3s, it’s a weird folder (remember, that’s something I’m looking to count). We print out the location of this folder, and add one to the weird folders count. Okay, we’re done with the loop. The rest of this stuff happens just once, after the loop is over and everything has been counted.
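To see the whole loop in action on something small, here is a sketch that wraps it in a function and runs it on a throwaway folder tree. The function name count_mp3s and the folder names are made up for this example, and it collects the weird folders in a list instead of printing them as it goes.

```python
import os
import shutil
import tempfile
from fnmatch import fnmatch

def count_mp3s(path):
    """Return (total mp3s, total folders, list of weird folders) under path."""
    total_mp3s = 0
    total_folders = 0
    weird_folders = []
    for (sub_path, folders, files) in os.walk(path):
        mp3files = [x for x in files if fnmatch(x, "*.mp3")]
        total_mp3s += len(mp3files)
        total_folders += 1
        # A weird folder has both subfolders and mp3s of its own.
        if len(folders) > 0 and len(mp3files) > 0:
            weird_folders.append(sub_path)
    return total_mp3s, total_folders, weird_folders

# Temporary tree: Artist/ holds loose.mp3 plus a subfolder Album/ with track.mp3
root = tempfile.mkdtemp()
album = os.path.join(root, "Artist", "Album")
os.makedirs(album)
open(os.path.join(root, "Artist", "loose.mp3"), "w").close()
open(os.path.join(album, "track.mp3"), "w").close()

total_mp3s, total_folders, weird_folders = count_mp3s(root)
print(total_mp3s)
print(total_folders)
print(weird_folders)

shutil.rmtree(root)
```

Only Artist counts as weird here, because it is the only folder with both an mp3 and a subfolder.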

fraction_weird_folders = 0
if total_folders > 0:
  fraction_weird_folders = total_weird_folders / float(total_folders)

We know the number of weird folders, but I also want to know the fraction. So, assuming the total number of folders isn’t zero, I divide number of weird folders by total number of folders to get the fraction.
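The float() call matters on Python 2, where dividing one whole number by another drops the remainder; on Python 3 the / operator already gives a fraction. Here is a small sketch using the folder counts reported at the end of this post (// is the drop-the-remainder division on both versions).

```python
total_weird_folders = 113
total_folders = 13226

# Whole-number division drops the remainder.
whole = total_weird_folders // total_folders

# Converting one side to a float keeps the fractional part.
fraction_weird_folders = total_weird_folders / float(total_folders)

print(whole)
print("%f" % fraction_weird_folders)
```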

print("Total number of mp3s: %d" % total_mp3s)
print("Number of folders with mp3s and subfolders: %d" % total_weird_folders)
print("Total number of folders: %d" % total_folders)
print("Fraction of folders with mp3s and subfolders: %f" % fraction_weird_folders)

Now we just print out all the numbers. You can put a variable into the print statement by use of %s (for strings), %d (for integers), or %f (for fractional or floating point numbers).
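Here is a quick sketch of the three placeholders side by side. The folder name is a made-up value; the numbers are the ones from my run.

```python
folder_name = "Music"             # made-up value for the example
total_mp3s = 25191
fraction_weird_folders = 0.008544

line_s = "Folder: %s" % folder_name
line_d = "Total number of mp3s: %d" % total_mp3s
line_f = "Fraction: %f" % fraction_weird_folders

# To fill in more than one placeholder, put the values in parentheses.
combined = "%s has %d mp3s" % (folder_name, total_mp3s)

print(line_s)
print(line_d)
print(line_f)
print(combined)
```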

That’s it! That’s the whole program. I ran this program on my music collection, and here are the numbers I got:

Total number of mp3s: 25191
Number of folders with mp3s and subfolders: 113
Total number of folders: 13226
Fraction of folders with mp3s and subfolders: 0.008544

So I have 25,191 mp3s, and just under 1% of my folders are weird. Yay!

You can do a lot of really powerful things with the ‘os’ tools in python. You could remove all the duplicates from your music collection. You could remove “The” from the beginning of every file name. You could copy all the folders with more than 10 files to a different location.

If you have an idea for something to do with these powers, leave me a comment. I might write another few of these tutorials based on reader input.

I’m Hopeful that new Pope Francis is to Social Injustice what John Paul II was to the USSR

A new Pope has been elected. Cardinal Jorge Bergoglio has been elected Pope Francis I.

I’m a lapsed Catholic, an agnostic, and I have been accustomed to strongly opposing the Catholic church for all of my adult life.

So why am I so happy about this choice?

I’m happy because Bergoglio seems like a humble man. As a Cardinal, he is supposed to have lived in a regular people apartment, taken public transportation, cooked his own meals. He was supposedly a front runner in the 2005 conclave, but made strong pleas not to be chosen, and so was not chosen. The phrase most attached to his name is “Social Justice”. He’s of Italian descent, but born and raised in Argentina, in an age and a country which are strongly identified with social justice, with socialism, with balancing power towards average people, not rich ones. He has chosen the name Francis, the first ever to do so, after Francis of Assisi, the man who gave up everything to live a life of poverty and help others… perhaps the only historical Christian I have ever respected.

Rich v Poor is the new Cold War. The fight to ensure that ordinary people aren’t left behind by hyper-capitalism is the fight of our generation. It concerns so many things, among them social justice, rich-vs-poor, the financial crisis, Occupy Wall Street, crushing austerity in Greece, Italy and other places, corporate evils, and universal health care, to name a bare handful.

On all of these issues, it’s already pretty clear that Cardinal Bergoglio stands on the side of good.

In 1978, the Catholic church elected little known Karol Wojtyla to be Pope John Paul II. He came from a land that was hurt by the USSR, by the Cold War. For all his other faults, he put the soft power of a billion people squarely against the Iron Curtain, against the Berlin Wall, against Soviet puppet governments, and he did it peacefully. He didn’t do it alone, by any means, but it’s fair to say he was a significant force for peaceful good in the Cold War.

I have disliked the Catholic church, intensely, for many years, but I think this time they made the same kind of choice. Unable to summon the willpower to clean house, unable to come to terms with the sexual abuse scandal, unwilling to consider modern views on sexual practices, I think they nevertheless made a very good choice: to throw the moral weight and soft power of a billion people squarely on the right side of social justice. He will have many faults [1], but I’m really, honestly hopeful that he can be a powerful ally in my generation’s most important fight.

 

No, that’s Jonathan Pryce

[1] I’m disappointed the new Pope (most probably) opposes homosexuality, though with the Cardinals, that was to be expected. Honestly, and I may get a lot of flak for saying this: gay people are winning. Poor people are losing! I think gay marriage will eventually become a reality, almost everywhere, whereas I’m very worried that extreme wealth distribution and feudalism will also become the norm. I would absolutely rather a Pope against gay people and in favor of the poor than the other way around.

Projects: Letterpress in a Day

My favorite game of late is Letterpress. It’s a wonderful iPhone game which combines Scrabble-esque word play with territory capture, à la Risk. It’s an instant classic, one I’d love to play with my grandmother.

What a shame, then, that Letterpress is iOS only, and doesn’t support local play.

For kicks, today I challenged myself to make a local Letterpress clone in one day. Bonus points: I wanted it to be playable on the iPad, so I could set it down in front of my grandmother.

Three and a half hours later…

 

Here it is! A complete two player local clone of Letterpress, up on my webspace.

I’ll leave this up until the inevitable cease and desist letter, and then, with great respect to atebits and his awesome game, I’ll take down my version.

This turned out to be a really simple project, and might serve as a good introduction to game programming. It’s just JavaScript and an HTML canvas element (i.e., a thingy which allows you to draw images at x and y coordinates).

For those who are interested, here’s the entire project source code and resources, released under the Crapl License (that is, open source, with the understanding that the source code is a quick hack).

If enough people are interested, I’ll make a tutorial out of this. It’s definitely, honestly a beginner level project, with enough guidance.

Oh, but my bonus goal? The iPad? Don’t even try it. It loads a large dictionary file. iPad Safari segfaults every time I try to visit the site. Sigh 🙂

Let me know what you think!

My Bike Has Saved My Life

How healthy is it, in the long run, to bike places instead of driving?

I began biking regularly in the latter half of 2006. How much have these past six years of biking helped my life?

For all this time, I’ve averaged about 20 miles a week, the typical length of my commute to school or work. Many weeks of inactivity have been roughly balanced out by a lot of weeks of 70 or 100 miles.

So I’ve biked about 20 * 50 * 6 = 6000 miles in six years. That’s just three miles a day, a pittance, but sustained over a long time.

Using this calorie counter for bikes, with my average weight during this time (about 250 lbs), 10 mph average speed, no elevation change, 80% flat ground, 10% uphill, 10% downhill (conservative estimates for sure!), and 6000 miles, I get 475,305 calories.

At 3500 calories a pound, that’s 135 pounds that I could have otherwise gained.

I weigh 273 lbs. Even at six feet tall, that makes me a very large man! If I had gained all those calories, I would weigh 408 lbs. At that size, my risk factors for diabetes and heart disease would be huge.
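For anyone who wants to check my arithmetic, here it is as a short Python sketch. The calorie total is simply the number the calorie counter gave me, typed in as-is; the variable names are made up for this example.

```python
miles_per_week = 20
weeks_per_year = 50
years = 6
total_miles = miles_per_week * weeks_per_year * years

calories_burned = 475305       # figure reported by the calorie counter
calories_per_pound = 3500

# Whole pounds of weight I could otherwise have gained.
pounds_not_gained = calories_burned // calories_per_pound

current_weight = 273
weight_without_biking = current_weight + pounds_not_gained

print(total_miles)
print(pounds_not_gained)
print(weight_without_biking)
```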

 

I’m totally sober in this photo. That’s just how much I like cake.

 

This isn’t even considering the downstream effects (more muscle, thus even more automatic weight loss) of having biked those miles, nor the roughly $600 in gasoline I would have otherwise spent*, nor the general therapeutic benefits to my happiness.

My bicycles have saved my life. We’re not done yet (273 lbs is a long, long way off from the good), but I’m glad to know I got something done.

 

This post is dedicated to my favorite of all bikes, the blue bomber. I miss you, dude!

 

*Okay, maybe more like $400, taking out a lot of the fun rides.

Concerning Scale

How big is big? Many of my friends in technology like to talk about big scale. Programmers at big companies work on big projects which have billions of transactions a day, on tens of thousands of computers. We talk about these things like they’re really big. They’re really big, aren’t they?

Are they?

I wanted to try to gain a better intuition about big, and maybe to bust my pride a little bit, so I asked myself:

Which is bigger? A mountain, or all the Google searches ever, if each search was worth one bean?

What?

Well, a Google search is a little thing, but not insignificant. A lot of beans (or a lot of Google searches) should really add up.

So I wanted to know: if we’d been throwing a bean on a pile each time someone made a Google search, would we have a pile that was comparable in size to a mountain?

 

Fancy Google Data Center

 

Here’s a page with some of Google’s yearly search totals. Google had something like 1.7 trillion searches in all of 2011. I think 12 trillion is a safe estimate for all the Google searches ever. It’s almost certainly within a factor of two (ie, somewhere between 6 and 24 trillion).

 

Hey. Beans!

 

A bean is something like 1.5 cm³.

So, a bean per Google search means about 18 trillion cm³. Let’s convert that to km³. That pile is 0.018 km³.

Mount Fuji is 336 km³.

 

Mount Fuji is huge.

 

Mount Fuji is eighteen thousand times larger than the bean pile for all the Google searches ever. Even if Google grew to ten times its 2011 search volume, it would take roughly thirteen thousand more years of bean piling to stack up to Mount Fuji (and Mount Fuji isn’t the largest mountain on Earth. Not even close).
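Here is the unit conversion as a short Python sketch, using only the estimates above (12 trillion searches, 1.5 cubic centimeters per bean, 336 cubic kilometers for Mount Fuji). The variable names are made up for this example.

```python
searches = 12e12                 # estimated Google searches ever
bean_cm3 = 1.5                   # rough volume of one bean
fuji_km3 = 336.0                 # volume of Mount Fuji

# 1 km is 100,000 cm, so 1 cubic km is 100,000 cubed cubic cm.
cm3_per_km3 = (100 * 1000) ** 3

pile_km3 = searches * bean_cm3 / cm3_per_km3
ratio = fuji_km3 / pile_km3

print(pile_km3)
print(ratio)
```

The ratio comes out between eighteen and nineteen thousand, which is where “eighteen thousand times larger” comes from.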

It might even be the case that Mount Fuji is larger in volume than all the beans the human species has ever eaten in all history. I don’t know.

Go Nature!

 

The Status is Not Quo.

Since the recent election, the idea has surfaced that the American people have chosen yet another divided government. In his column on Wednesday, George Will wrote,

A nation vocally disgusted with the status quo has reinforced it by ratifying existing control of the executive branch and both halves of the legislative branch.

Is this really true? Many Americans who voted for Democratic members of the House of Representatives might wonder who exactly voted for the Republican House. I had a suspicion, so I did some data digging.

Using the New York Times election data from Thursday morning, with a little programming magic, I came up with the following result:

  • Popular Votes for Democratic representatives: 54,329,835
  • Popular Votes for Republican representatives: 53,828,891
  • Popular Votes for third parties and write-ins: 3,141,569

That’s right, folks. Credit to the Republicans, they won the House fairly under the present rules… but the Democrats won the popular vote for the House, 48.81% to 48.36%.
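The percentages can be reproduced from the three vote totals above. This is just a sketch of the arithmetic, not the program I used to pull the data; the dictionary and its keys are made up for this example.

```python
votes = {
    "Democratic": 54329835,
    "Republican": 53828891,
    "Other": 3141569,
}

total_votes = sum(votes.values())

# Each party's share of all House votes, as a percentage.
shares = {}
for party in votes:
    shares[party] = 100.0 * votes[party] / total_votes

for party in sorted(shares):
    print("%s: %.2f%%" % (party, shares[party]))
```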

This doesn’t seem that surprising, does it? After all, every election for the last sixteen years or so, we’ve heard about how, due to the Electoral college, the popular vote for president could go differently than the state by state result.

This is different for two reasons. First, while it is the right of each state to apportion its representatives, the intention of the House is to reflect the will of the populace. The Senate, which sends two senators from each state, big or small, is intended to reflect the state by state preferences of the nation. It could skew very heavily, if one or the other party was more strongly represented in the smaller states.

The House is supposed to be more balanced according to the percentages of the population. While it can still end up skewed on a state by state basis, it is against the spirit of the lower chamber for the result to skew very far on a national scale.

The second, much more important reason is this: as of this writing, with 9 seats undecided, the Democrats have 193 members in the House, while the Republicans have 233. This means:

  • The popular vote for the House was 48.81% Democrat, 48.36% Republican
  • But the Democrats only control 45% of the House, while the Republicans control 54%.

This is a 9% skew towards the Republicans on tied voting. The American people narrowly preferred Democrats for the House, at large… but the Republicans have a 9% margin of victory. Shocking, isn’t it?

Think of this another way: we all have a mental picture where battleground state votes have “high value”, where they are worth a lot more than other votes. If we average House seats by number of votes, and then normalize so that the average vote is worth 1, we get the following very unpleasant statistic:

  • If you voted for a Democrat for the House in this election, your vote was worth 0.9 votes.
  • If you voted for a Republican for the House in this election, your vote was worth 1.1 votes.
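Here is one way to arrive at those two numbers, as a Python sketch. It assumes the average is taken over the two major parties only, using the decided seats (193 and 233) and the vote totals above; that choice and the variable names are assumptions made for this example.

```python
dem_votes = 54329835
rep_votes = 53828891
dem_seats = 193
rep_seats = 233

# Seats won per vote, averaged over both major parties.
average_seats_per_vote = (dem_seats + rep_seats) / float(dem_votes + rep_votes)

# Each party's seats per vote, scaled so the average vote is worth 1.
dem_vote_worth = (dem_seats / float(dem_votes)) / average_seats_per_vote
rep_vote_worth = (rep_seats / float(rep_votes)) / average_seats_per_vote

print("%.1f" % dem_vote_worth)
print("%.1f" % rep_vote_worth)
```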

I hope you’re flabbergasted by that. I know I am. The vote of a Democrat is worth significantly less than the vote of a Republican.

This is the price of gerrymandering. It is a very real political tactic in which politicians redraw the districts (often times with very strange shapes) to concentrate their opponents’ votes in fewer districts, and spread their own winnings out into more districts.

Take another look at the NYTimes district map, or this one by Google. Look at something like south Texas, for instance. See those weird narrow districts at the bottom of the state? Look at Pennsylvania district 12, or North Carolina districts 1, 7, or 11. These are the shape they are specifically to change the apportionment of seats in the House. Both sides engage in this process, and each has some states where it gains an unfair advantage, but data shows that the Republicans are very definitely the worse offenders.

Older followers of politics know that gerrymandering is a problem. Younger followers might not know about it yet. I very much doubt anyone knows just how bad it is. I certainly didn’t. Well, this is how bad it is: one side has votes that are worth 0.9. The other side has votes that are worth 1.1. If they reach a national tie, one side will gain an automatic 9% advantage.

Those interested in cleaning up Washington’s gridlock might look to this as a problem worth solving. It won’t be easy; some severe gerrymandering has gone before the Supreme Court, and been upheld. We probably can’t count on the courts to solve the problem; it will have to be solved politically, at a grassroots and state level. Personally, I think this should be near the top of the Democrats’ to-do list for the next decade. I know I personally want a whole vote.