So it's not easy to spot on the boxes, but despite the same price, the Ninjago box contains about twice as many pieces (575) as the Frozen one (292). Ninjago is a line of sets produced, and therefore owned, by Lego, while Frozen is the result of a partnership with Disney. Quickly scanning the other boxes, I seemed to see a trend: similarly priced sets appeared to have fewer pieces when the theme came from an external partnership rather than from Lego itself.

Having some free time on my hands during the Christmas break, I extracted as much data as I could from the LEGO site, pulling the theme, the number of pieces and the price for each set. I was able to identify close to 700 sets, which is a reasonable sample for exploring some trends. Here are all the data points, with number of pieces on the x-axis and price on the y-axis. Some jitter was added, although it wasn't strictly necessary (prices tend to take discrete levels, but the number of pieces does not).
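Jitter simply nudges each point by a small random amount so that sets sharing the same list price do not sit exactly on top of each other. A minimal Python sketch; the `jitter` helper, its `scale` parameter and the sample prices are hypothetical, not the code or data behind the plot.

```python
import random

def jitter(values, scale=0.5, seed=0):
    """Return a copy of `values` with uniform noise in [-scale, scale] added.

    Handy for scatter plots where many points share a discrete price level
    and would otherwise overlap exactly.
    """
    rng = random.Random(seed)  # fixed seed so the plot is reproducible
    return [v + rng.uniform(-scale, scale) for v in values]

# Hypothetical prices: several sets share the same list price.
prices = [19.99, 19.99, 19.99, 39.99, 39.99, 59.99]
jittered = jitter(prices, scale=0.5)
print(jittered)
```

The noise only affects where the dots are drawn; any statistics should still be computed on the original prices.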

A few observations:

- the data is densely concentrated around the origin, and the outliers stretch the scale so much that it is hard to see what exactly is going on there
- there appears to be quite a lot of variability in the number of pieces for a given price point, which confirms my initial impression from the Lego store. Looking along the $200 horizontal line, we see boxes at that price with fewer than 1000 pieces, and others with over 2500!
- overall, the relationship seems fairly linear, along the lines of pieces = 10 * price: every $1 gets you about 10 pieces. I was expecting more of a convex shape, where each incremental piece costs a little less than the previous one, similar to Starbucks, where the larger the drink, the better the size-to-price ratio. I guess this makes some sense: with food and drinks, two half-size units are equivalent to one double-size unit (if a gallon costs too much, I'll just buy two half-gallons), but two 300-piece Lego sets are not equivalent to one 600-piece set, so Lego can afford to maintain the linear relationship.

Two notable outliers:

- at 3808 pieces and $399, the hard-to-find Star Wars Death Star
- at 4634 pieces and $349, the Ghostbusters Firestation (to be released in 2016)
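The rough "pieces = 10 * price" rule amounts to a least-squares line forced through the origin, whose slope has a closed form: minimising the sum of (pieces − k · price)² over k gives k = Σ price · pieces / Σ price². A minimal Python sketch; the `slope_through_origin` helper is hypothetical, and the only real inputs are the two outliers listed above.

```python
def slope_through_origin(prices, pieces):
    """Least-squares slope k for pieces ≈ k * price (line forced through the origin).

    Minimising sum((pieces_i - k * price_i) ** 2) over k gives
    k = sum(price_i * pieces_i) / sum(price_i ** 2).
    """
    num = sum(p * n for p, n in zip(prices, pieces))
    den = sum(p * p for p in prices)
    return num / den

# The two outliers listed above: Death Star and Ghostbusters Firestation.
prices = [399, 349]
pieces = [3808, 4634]
print(f"{slope_through_origin(prices, pieces):.2f} pieces per dollar")
```

These two large sets alone give a little over 11 pieces per dollar, in the same ballpark as the ~10 rule; a real estimate would of course use all of the roughly 700 sets.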

Let's focus a little more around the origin where most of the data resides (92% of sets are priced less than $100):

Close to the y-axis, there appears to be a category of sets (green dots) consisting of just a few pieces but priced incredibly high. These belong to the Mindstorms category: very sophisticated Lego pieces, sold separately at high price points, that let you build robots with touch and light sensors. In the rest of this post, we will exclude the Mindstorms category, as well as the Power Functions category for the same reason. The Dimensions category was also excluded: while its pieces are not as sophisticated as those of Mindstorms and Power Functions, they are quite elaborate because they interact with the PlayStation console (its average pieces-to-price ratio is about 3 pieces per dollar).

There appears to be another category with its own specific piece-to-price relationship (red dots). While overall every $1 seemed to buy about 10 pieces, this category gets only about 1.5 pieces per $1. This is actually the Duplo category for younger children, whose pieces are much larger than regular Legos. That being said, I'm wondering if Lego isn't taking advantage of all the parents eager to give their toddlers a head start in the Lego world... Duplo sets are also excluded from the rest of the post.
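Setting aside Mindstorms, Power Functions, Dimensions and Duplo is a simple filter on the theme. A minimal Python sketch, assuming each set is stored as a record with a name, theme, piece count and price; the `LegoSet` class, the `comparable_sets` helper, the exact theme strings and the sample records are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class LegoSet:
    name: str
    theme: str
    pieces: int
    price: float  # list price in US dollars

# Themes whose pieces are not comparable to regular bricks.
# The exact theme strings used on the LEGO site are an assumption.
EXCLUDED_THEMES = {"Mindstorms", "Power Functions", "Dimensions", "Duplo"}

def comparable_sets(sets):
    """Keep only sets whose theme is not in EXCLUDED_THEMES."""
    return [s for s in sets if s.theme not in EXCLUDED_THEMES]

catalog = [  # hypothetical records, for illustration only
    LegoSet("Robot kit", "Mindstorms", 600, 350.0),
    LegoSet("Toddler farm", "Duplo", 30, 20.0),
    LegoSet("Small house", "Creator", 400, 40.0),
    LegoSet("Police car", "City", 200, 20.0),
]
kept = comparable_sets(catalog)
print([s.name for s in kept])
```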

Back to our original question, how do the different themes compare to each other, and is there a price difference between internal and external brands?

The following boxplot provides some insight into the pieces-to-price ratio within each category. I've sorted the categories by decreasing median (a higher median means a 'good deal': many pieces for every dollar), and color-coded them by whether the theme is internal (red) or external (blue) to Lego.
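The ordering of the boxes comes down to one median pieces-per-dollar value per theme. A minimal sketch of that ranking, assuming records are (theme, pieces, price) tuples; the `median_ratio_by_theme` helper, the `INTERNAL_THEMES` set and the sample records are hypothetical. Drawing the boxes themselves could be done with a plotting library such as matplotlib (`Axes.boxplot`).

```python
from collections import defaultdict
from statistics import median

def median_ratio_by_theme(sets):
    """Return [(theme, median pieces per dollar)] sorted by decreasing median.

    `sets` is an iterable of (theme, pieces, price) tuples.
    """
    ratios = defaultdict(list)
    for theme, pieces, price in sets:
        ratios[theme].append(pieces / price)
    return sorted(
        ((theme, median(r)) for theme, r in ratios.items()),
        key=lambda item: item[1],
        reverse=True,
    )

# Hypothetical internal/external labels and records, for illustration only.
INTERNAL_THEMES = {"Creator", "Ninjago"}
sets = [
    ("Creator", 500, 40.0), ("Creator", 700, 60.0),
    ("Star Wars", 300, 40.0), ("Star Wars", 450, 50.0),
    ("Ninjago", 550, 50.0),
]
for theme, med in median_ratio_by_theme(sets):
    origin = "internal" if theme in INTERNAL_THEMES else "external"
    print(f"{theme:10s} {med:6.2f} pieces/$  ({origin})")
```

The median is used rather than the mean so that one unusually cheap or expensive set does not move a theme's position much.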

Glancing at the graph, the two main take-aways are that:

- there is strong variability within each category (in Star Wars, for instance, the Troop Carrier set has 565 pieces for $40, while Battle on Takodana has fewer pieces (409) for a 50% higher price)
- there does nonetheless seem to be a trend that internal themes have a better pieces-to-price ratio
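The Star Wars example in the first takeaway is easy to check. Taking "50% higher" to mean $60 (1.5 × $40), the Troop Carrier gives roughly twice as many pieces per dollar:

```python
# Troop Carrier: 565 pieces for $40.
# Battle on Takodana: 409 pieces for a 50% higher price, i.e. 1.5 * 40 = $60.
troop_carrier = 565 / 40        # pieces per dollar
takodana = 409 / (1.5 * 40)     # pieces per dollar
print(f"Troop Carrier:      {troop_carrier:.2f} pieces/$")
print(f"Battle on Takodana: {takodana:.2f} pieces/$")
print(f"Ratio: {troop_carrier / takodana:.1f}x")
```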

To test this, I fit a regression of the number of pieces on price, with a separate line for internal and external themes. The difference between the two slopes is not statistically significant (9.67 pieces/$ for external brands vs. 10.15 pieces/$ for internal brands), but there is a significant difference in intercept: at the same price, an external-brand set has about 50 fewer pieces.
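One way to run this kind of comparison is an ordinary least-squares fit of pieces on price with an `external` dummy (1 for licensed themes, 0 for Lego's own) and a price × external interaction: the interaction coefficient is the difference in slopes, and the dummy coefficient is the difference in intercepts. The sketch below uses only NumPy; the `design` and `ols` helpers and the simulated data are hypothetical, not the author's analysis or dataset.

```python
import numpy as np

def design(price, external):
    """Columns: intercept, price, external dummy, price * external interaction."""
    price = np.asarray(price, dtype=float)
    external = np.asarray(external, dtype=float)
    return np.column_stack([np.ones_like(price), price, external, price * external])

def ols(X, y):
    """Ordinary least squares: coefficients, standard errors and t-statistics."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    n, k = X.shape
    sigma2 = resid @ resid / (n - k)        # unbiased residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)   # coefficient covariance matrix
    se = np.sqrt(np.diag(cov))
    return beta, se, beta / se

# Simulated data (NOT the post's data): same slope for both groups,
# external themes shifted down by 50 pieces.
rng = np.random.default_rng(0)
price = rng.uniform(10, 100, size=400)
external = rng.integers(0, 2, size=400)
pieces = 10 * price - 50 * external + rng.normal(0, 30, size=400)

beta, se, t = ols(design(price, external), pieces)
for name, b, tv in zip(["intercept", "price", "external", "price:external"], beta, t):
    print(f"{name:15s} {b:8.2f}  t = {tv:6.2f}")
```

The internal-theme slope is `beta[1]` and the external one is `beta[1] + beta[3]`; an |t| above roughly 2 corresponds to significance at about the 5% level for a sample this size. With pandas and statsmodels, `smf.ols("pieces ~ price * external", data=df).fit().summary()` (where `smf` is `statsmodels.formula.api`) reports the same coefficients together with p-values.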

So in summary, don't feel Lego is completely overpricing your Disney Princesses or Star Wars figurines, although there is a small pricing difference. If you do want the biggest bang for your buck, take a look at the Creator theme, and in particular here's the overall pieces-to-price winner (which I ended up getting for my kid!):

Happy building!