Wednesday, February 27, 2013

iTunes and Shuffling: The unintended consequences of perfect randomness





Wait, haven't I already heard this song?

You're at your desk playing a game, coding, writing emails, checking whether your Facebook friends are as bored as you, checking the latest basketball results – or better yet – checking Statisticator's blog to look at basketball results predictions.

And then it happens.

The iTunes song that is playing right now: you already heard it yesterday, and maybe even the day before. You check iTunes and, indeed, it already has multiple plays, whereas a whole bunch of other songs have never been played.

How is that possible?

More questions then sprout: how many songs will I listen to before hearing each of the songs in my playlist at least once? Once I have heard each song at least once, how many times will I have listened to the song with maximum plays?

To make a more common statistical analogy, suppose you have an urn with N balls each uniquely numbered 1 through N. You then proceed to pull a ball, take note of its number (kind of like bingo), and then place it back in the urn. The question is then: how many balls will you have to pull in order to reach a point where you will have pulled each of the N different balls at least once?
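The urn experiment is easy to try on a computer before doing any theory. Here is a minimal simulation sketch, assuming every draw is uniform and independent; the function names (`play_until_all_heard`, `expected_plays`, `average_plays`) are made up for illustration. For comparison it also computes the classical closed form for the expected number of draws, N(1 + 1/2 + ... + 1/N), known as the coupon collector result.

```python
import random
from fractions import Fraction


def play_until_all_heard(n, rng):
    """Draw uniformly with replacement from n songs until each has been
    played at least once. Returns (total plays, plays of the top song)."""
    plays = [0] * n
    unheard = n
    total = 0
    while unheard > 0:
        song = rng.randrange(n)   # every song is equally likely on each draw
        if plays[song] == 0:
            unheard -= 1          # first time this song comes up
        plays[song] += 1
        total += 1
    return total, max(plays)


def expected_plays(n):
    """Classical expectation for the urn problem: n * (1 + 1/2 + ... + 1/n)."""
    return float(n * sum(Fraction(1, k) for k in range(1, n + 1)))


def average_plays(n, trials=2000, seed=0):
    """Average total plays over many simulated listening sessions."""
    rng = random.Random(seed)
    return sum(play_until_all_heard(n, rng)[0] for _ in range(trials)) / trials


if __name__ == "__main__":
    n = 100  # hypothetical playlist size
    print("simulated average:", average_plays(n))
    print("closed form      :", expected_plays(n))
```

The second value returned by `play_until_all_heard` speaks to the other question above: how many plays the most-played song has racked up by the time the last unheard song finally comes on.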




Before jumping into the theory, a few words should be said about how shuffling works in iTunes, and what other iTunes-related analyses have reported.

There are two types of shuffling options in iTunes (or at least there used to be before version 11): "shuffle" and "party shuffle". With "shuffle", iTunes randomly plays all the songs in your library without replacement. It simply creates a random permutation of your songs, and every song is played once and only once. This feature does not interest us in the framework of this paper. The other option, "party shuffle", randomly plays songs with replacement. This is the feature that interests us: after any given song has been played, every song (including the one that just finished) has an equal probability of being played next. Steve Jobs has often emphatically declared that the shuffling in both cases was entirely random, which we will use as our key assumption.
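To make the difference between the two modes concrete, here is a small sketch of sampling without and with replacement. It only illustrates the idealized behavior assumed in this post, not how iTunes is actually implemented; the function names and the tiny library are hypothetical.

```python
import random


def shuffle_order(songs, rng):
    """'Shuffle': a random permutation, so each song is played exactly once."""
    order = list(songs)
    rng.shuffle(order)
    return order


def party_shuffle(songs, n_plays, rng):
    """'Party shuffle': each play is an independent uniform draw, so repeats
    (even back to back) are possible and some songs may never come up."""
    return [rng.choice(songs) for _ in range(n_plays)]


if __name__ == "__main__":
    rng = random.Random(42)
    library = ["song A", "song B", "song C", "song D"]  # hypothetical library
    print(shuffle_order(library, rng))
    print(party_shuffle(library, len(library), rng))
```

With the first function every song appears once in the output; with the second, the same number of plays will usually leave at least one song unheard.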

While not the primary purpose of this paper, it is worthwhile to mention that there are numerous discussions of iTunes' randomness on the Internet. Certain users have noted that the same sequence of four or five songs is always played in the exact same order, suggesting the permutation is not perfect and the algorithm relies on some faulty pseudo-randomness. Others comment that the algorithm incorporates user star ratings to determine the order in which to play the songs, with a tendency to play five-star songs more often. Another analysis suggests that not all songs are created equal, observing that popular songs and artists, as well as songs purchased from the iTunes store, are played more often than expected.




Again, our purpose here is not to explore iTunes' randomness (or for that matter the randomness of any other shuffling mode on other devices), but to determine some of the consequences of the assumed perfect randomness on the number of times we will need to play songs before all of them are heard.

Perhaps perfect randomness is not such a desirable feature after all!

We will tackle these questions through different posts:


  • Part 1 will contain some initial plots, as well as definitions and notations used in further parts
  • Part 2 will tackle the problem using some more naive methods
  • Part 3 will look at the elegant hidden recursions involved in the problem
  • Part 4 is the One-Way Elevator theory generating the recursions in Part 3
  • Part 5 will solve the Tangled Equations brought to light with the One-Way Elevator
  • Part 6 will move one step beyond the Tangled Equations and attempt to solve them in a more general context with regression tools
  • Part 7 will put all the pieces back together!

Monday, February 18, 2013

Who will make the Playoffs?


The All-Star weekend is a nice break in the middle of the regular season. Dunk contest, three-point contest, Rising Stars game and, naturally, the All-Star Game itself. A good break indeed from the intense regular season schedule.

But for me it offered a unique opportunity: no games for four consecutive nights! The perfect opportunity to run simulations. Many simulations.

A. Downpour. Of. Simulations.

You be the judge: the outcomes of over 2 BILLION games were simulated.

I previously worked on some code to forecast the outcome of the rest of the season based on each team's latest performance. I have already used it to look at the Lakers' probability of making the Playoffs, and at whether Dallas or LA (Lakers) had a better chance of making the Playoffs. But I was eagerly waiting for the All-Star weekend to run the code for every day since December 1st 2012, in order to look at the trends and shifts in each team's probability of making the Playoffs, and in each team's expected final standing.
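For readers curious about what such a forecast can look like, here is a bare-bones Monte Carlo sketch. It is not the code behind the numbers in this post. It assumes each team has a single hypothetical "strength" number, that the home team wins a game with probability strength[home] / (strength[home] + strength[away]), and that ties in the standings are broken at random. The function name and all inputs are made up for illustration.

```python
import random


def playoff_probabilities(wins, strength, schedule, spots, n_sims=10000, seed=0):
    """Estimate each team's chance of finishing in the top `spots`.

    wins:     dict team -> wins so far
    strength: dict team -> positive number (assumed rating, not a real stat)
    schedule: list of (home, away) games still to be played
    """
    rng = random.Random(seed)
    made = {team: 0 for team in wins}
    for _ in range(n_sims):
        season = dict(wins)  # start every simulated season from today's standings
        for home, away in schedule:
            # Assumed win model: share of total strength held by the home team.
            p_home = strength[home] / (strength[home] + strength[away])
            winner = home if rng.random() < p_home else away
            season[winner] += 1
        # Rank by final wins; a random second key breaks ties (an assumption).
        ranked = sorted(season, key=lambda t: (season[t], rng.random()), reverse=True)
        for team in ranked[:spots]:
            made[team] += 1
    return {team: made[team] / n_sims for team in wins}


if __name__ == "__main__":
    # Entirely hypothetical three-team league with two playoff spots.
    wins = {"A": 30, "B": 25, "C": 24}
    strength = {"A": 1.2, "B": 1.0, "C": 0.9}
    schedule = [("B", "C"), ("C", "A"), ("A", "B"), ("C", "B")]
    print(playoff_probabilities(wins, strength, schedule, spots=2))
```

Running the same function once per calendar day, with the standings and remaining schedule as of that day, would give the kind of day-by-day probability curves discussed below.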

Here are the results by conference.

Eastern Conference

Let's start by looking at the evolution of team standings:

And now for the evolution of the probability of making it to the Playoffs:

The latest values, as of February 15th 2013, are as follows:

Team   Expected position   Playoff probability
MIA     1.196              100%
NYK     3.157              100%
BRK     4.455              99.8%
IND     4.076              99.8%
CHI     4.445              99.7%
ATL     4.942              99.6%
BOS     6.706              95.4%
MIL     7.230              91.4%
PHI     9.666              8.2%
TOR    10.195              4.2%
DET    10.588              1.8%
WAS    13.069              0.1%
CHA    14.470              0%
CLE    12.564              0%
ORL    13.241              0%

Although Philadelphia still has a glimmer of hope of making the Playoffs provided Milwaukee runs into a very bad stretch, the eastern teams making the Playoffs seem decided. The huge uncertainty however lies around positions 3-6 where all teams are extremely close. With the Brooklyn Nets facing Chicago and Indiana in the last two weeks of the regular season, the eastern final brackets will not be decided until the very end.


Western Conference

Similarly to what we did for the Eastern Conference, let's start by looking at the standings evolution:

And now for the "Playoff Probability" evolution:

The latest values, as of February 15th 2013, are as follows:

Team   Expected position   Playoff probability
LAC     2.749              100%
OKC     2.371              100%
SAS     1.329              100%
DEN     4.899              99.9%
MEM     4.682              99.5%
GSW     5.824              97.5%
UTA     7.190              86.9%
HOU     7.550              80.7%
POR     9.752              14.8%
LAL     9.877              12.1%
DAL    10.376              8.1%
MIN    12.519              0.4%
NOH    12.870              0.1%
PHO    14.224              0%
SAC    13.788              0%

The story is a little different on the western front, where the suspense is not around positions 3-6 and the final Playoff bracket but around the eighth and final spot. Houston has a good chance of keeping that spot, which it currently holds, but Portland and Los Angeles (Lakers of course!) are on their heels from a safe-but-not-THAT-safe distance. Any slip and the two will pounce on the position!



I will provide updated probabilities as the season progresses!



Tuesday, February 12, 2013

Nice infographic on the Lakers' last road trip

Given I'm in the midst of analyzing the Lakers' performance and chances of making the Playoffs, I thought I'd share this pretty neat infographic tweeted by the Los Angeles Lakers (@Lakers), summarizing their last 7-game road trip. Experts on NBA.com said they would reconsider the Lakers for the Playoffs if they went 5-2 or better on this road trip; they fell just short at 4-3.

Based on my most recent analysis, their current probability of making the Playoffs is 15.7%.



Monday, February 11, 2013

NBA article: CAN LAKERS OR MAVS TAKE NO. 8 IN WEST?



I wrote a post a couple of weeks ago on the Lakers' odds of making it to the playoffs. While currently working on another post on what the final Playoff picture will most likely look like for both conferences and all positions, I wanted to respond to an article on NBA.com on whether the Lakers or the Mavericks have a shot at the #8 spot in the West.

Portland, Houston and Utah have not been doing as well lately as they had been until now, which might give both the Lakers and the Mavericks some hope for some post-season action.

Running some simulations for all five teams' upcoming performance, here's what transpired:


Position   Lakers (%)   Mavericks (%)   Jazz (%)   Trail Blazers (%)   Rockets (%)
 3         /            /               /          /                   0.4
 4         /            0.2             0.8        0.1                 0.9
 5         0.1          0.2             3          0.3                 3
 6         1.1          0.3             10.4       1.9                 10.2
 7         3.9          4.2             26.9       9.3                 32.1
 8         10.6         8.7             28.8       14.6                29.9
 9         20.7         15.3            18.8       27.2                14.2
10         29.7         25.1            8          24.1                6.8
11         26.1         32.9            2.5        16.9                2.2
12         6            8.8             0.8        4.6                 0.3
13         1.4          3.1             /          0.7                 /
14         0.4          0.9             /          0.3                 /
15         /            0.3             /          /                   /

The Lakers' probability of making the Playoffs is 15.7%, a significant increase over the past two weeks since my previous post, thanks to three consecutive wins at home followed by 4 wins out of 7 during a grueling road trip! This 70% winning percentage over the last 10 games has boosted the home and road winning percentages in the model.

As for the Mavericks, their probability of making the Playoffs is close to the Lakers': 13.6%. The Trail Blazers have almost twice as much of a chance of making it, with 26.2%. The Jazz (69.9%) and Rockets (76.5%) have less to worry about at this point, but should still remain on their guard: there are no guarantees in the NBA!

So while the Rockets', Jazz's and Trail Blazers' latest stumbles can give Lakers and Mavericks fans some hope, they should remain realistic: the climb out of their respective holes to the eighth spot is a long and difficult one. But there is no doubt that both teams will fight till the final game!