Wait, haven't I already heard this song?
You're at your desk playing a game, coding, writing emails, checking whether your Facebook friends are as bored as you, checking the latest basketball results – or better yet – checking Statisticator's blog to look at basketball results predictions.
And then it happens.
The iTunes song that is playing right now. You've already heard yesterday, and maybe even the day before. You check iTunes and indeed it already has multiple plays whereas there are a whole bunch of other songs that have never been played.
How is that possible?
More questions then sprout: how many songs will I listen to before hearing each of the songs in my playlist at least once? Once I have heard each song at least once, how many times will I have listened to the song with maximum plays?
To make a more common statistical analogy, suppose you have an urn with N balls each uniquely numbered 1 through N. You then proceed to pull a ball, take note of its number (kind of like bingo), and then place it back in the urn. The question is then: how many balls will you have to pull in order to reach a point where you will have pulled each of the N different balls at least once?
Before jumping into the theory, a few words should be said about how shuffling works in iTunes, and what other iTunes-related analyses have reported.
There are two types of shuffling options in iTunes (at least used to be before version 11): "shuffle" and "party shuffle". With "shuffle" iTunes will randomly play all the songs in your library \textit{without} replacement. It simply creates a random permutation of your songs, and all songs will be played once and only once. This feature does not interest us in the framework of this paper. The other option, "party shuffle", randomly plays songs \textit{with} replacement. This is the feature that interests us: after any given song has been played, any song (including the one just finished) has an equal probability of being played. Steve Jobs has often emphatically declared that the shuffling in both cases was entirely random, which we will use as our key assumption.
While not the primary purpose of this paper, it is worthwhile to mention that there are numerous discussions on iTunes' randomness on the Internet. Certain users have noted that the same sequence of four or five songs is always played in the exact same order, suggesting the permutation is not perfect and the algorithm relies on some faulty pseudo-randomness. Others comment that the algorithm incorporates user star rating to determine the order in which to play the songs with a tendancy of playing five-starred songs more. Another analysis suggests that not all songs are created equally, observing that popular songs and artists, as well as songs purchased from the iTunes store, are played more often than expected.
Again, our purpose here is not to explore iTunes' randomness (or for that matter the randomness of any other shuffling mode on other devices), but to determine some of the consequences of the assumed perfect randomness on the number of times we will need to play songs before all of them are heard.
Perhaps perfect randomness is not such a desirable feature after all!
We will tackle these questions through different posts:
- Part 1 will contain some initial plots, as well as definitions and notations used in further parts
- Part 2 will tackle the problem using some more naive methods
- Part 3 will look at the elegant hidden recursions involved in the problem
- Part 4 is the One-Way Elevator theory generating the recursions in Part 3
- Part 5 will solve the Tangled Equations brought to light with the One-Way Elevator
- Part 6 will move one step beyond the Tangled Equations and attempt to solve them in a more general context with regression tools
- Part 7 will put all the pieces back together!