Naïve approach: Take 1
Considering N songs , let's start with the most extreme case: what is the probability that each song will be heard once after N songs are played? That is to say k=0, and each song is played exactly once, without any songs being listened to twice. The sequences satisfying this are all the possible permutations of N songs, which is N!, while the number of possible song orderings is N^N, which gives us:
Another way to obtain the previous results is to make the following reasonning:
- the first song played has no impact on the probability
- the requirement for the second song is to be different from the first, which occurs with probability (N-1)/N
- the requirement for the third song is to be different from the first two, which occurs with probability (N-2)/N
- the requirement for the i-th song is to be different from the first i-1, which occurs with probability (N-i+1)/N
- the requirement for the last song is to be different from the first N-1, which occurs with probability 1/N
Putting all the pieces together also yields:
So let's look at the next case: with still N songs in our playlist, what is the probability that each song will be heard at least once after N+1 songs are played? In other words, what is the probability that k=1, and one, and only one, song will have been heard twice, and all others once? A possible scenario with 5 songs would be: 3-2-1-5-1-4.
So how many song orderings satisfy this scenario? It's a little trickier, but one way to look at the number of successful songs ordering is as follows:
- Out of the N songs, select the one that will be played last (N possibilities).
- Out of the remaining N-1 songs, select the one that will be played twice (N-1 possibilities).
- We now have N songs (N-1 unique, and one duplicate) that we need to order for the first N plays. There are N!/2 ways of doing so (dividing by 2 because two songs are identical).
- With N+1 songs being played, the size of the universe of song orders is now N^(N+1).
Putting the pieces together yields:
However, this approach does not generalize easily. Why? Consider N+2 plays, so k=2. We now have to distinguish two cases for the two extra plays: do they correspond to the same song or to different songs? Do we have 1-2-2-2-3 or 1-2-1-2-3? What about k=17? How do we split the additional 17 plays across the N songs? Although not impossible, it is a rather difficult task to determine all the ways of splitting the k extra plays across the N songs.
Naïve Approach: Take 2
Previously we looked at computing P(k|N) for fixed k to understand the relationship for small k and then derive a more general formula for any k. We can also take the opposite approach: compute P(k|N) for all k and small values of N, and see if we can generalize from there.
Consider 2 balls. What is P(k|2)? After hearing the first of the two songs, what is the probability of listening to it k more times before hearing the unheard song and ending the successful order of length k+2? We therefore derive:
For N=3, we get awfully close to the former brick wall. For instance, if k=7, we know that 7 songs will be repeats, but we don't know how the repeats will be spread out across the songs. Will it be 1-1-1-2-1-1-1-1-1-3 with 1 getting all the repeats, or something more balanced out like 1-2-1-2-2-1-1-1-2-3?