Showing posts with label french. Show all posts

Thursday, March 7, 2013

What is "Face de Bouc"?


This recent article in French District (for French people in California) looks at how many people speak French in each US state. To do this, it uses Facebook statistics, counting how many individuals in each state have listed French as a language. States are then ranked by those absolute numbers. The main insight the author draws is that California has the most French-speaking individuals (213,200) and Wyoming the fewest (1,440).

Personally, I don't find this super insightful and actually found the results to be presented in a misleading way.

There are definitely some questions we need to ask ourselves first before plunging into the data:
  • Is the proportion of Facebook users in the population the same across states? The article answers this implicitly and hints that the proportion ranges at least from a half to two-thirds.
  • Is the proportion of Facebook users reporting their language the same across states? One could imagine that in more culturally diverse environments such as New York or California, people might be more used to sharing this type of information.
  • The article does sometimes put the absolute number of French speakers in perspective based on the state's population. Nonetheless, the ranking in the article is based on absolute numbers. So naturally we would expect California, the most populous state, and Wyoming, the least populous, to sit at opposite ends of the ranking. Does that tell us anything?


Based on the values reported in the article I was able to rank states by proportion of reported French-speaking Facebook users in the population:
State                   % of French-Speaking Facebook Users
District of Columbia    2.5%
New York                0.9%
Nevada                  0.9%
Florida                 0.7%
Hawaii                  0.7%
California              0.6%
Alaska                  0.5%
Illinois                0.4%
Georgia                 0.4%
Louisiana               0.4%
Texas                   0.3%
Pennsylvania            0.3%
Ohio                    0.3%
New Jersey              0.3%
Arizona                 0.3%
Wisconsin               0.3%
Oregon                  0.3%
Montana                 0.3%
North Dakota            0.3%
Wyoming                 0.3%
Oklahoma                0.2%
Arkansas                0.2%
Mississippi             0.2%
South Dakota            0.2%

Although not perfect, this does provide a better comparison. The District of Columbia is mentioned in the original article as having the highest proportion, but nowhere does the article state that this value is almost three times that of the second-highest state, New York. That definitely puts things into perspective!

And California, despite having the largest absolute number, is sandwiched between Hawaii and Alaska, probably not the states you would have thought of first!

Of course, this ranking is only meaningful if we assume that the first two points above hold, that is, that the proportion of Facebook users and the proportion of users reporting their language are constant across states.
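For anyone who wants to redo the arithmetic, here is a minimal sketch of how a percentage like the ones in the table can be computed. The two French-speaker counts come from the article; the state populations are my own assumption (2010 Census figures), and may not be the exact values used for the table above.

```python
# French-speaking Facebook users per state, as quoted from the article.
french_speakers = {"California": 213_200, "Wyoming": 1_440}

# ASSUMPTION: 2010 Census populations, not taken from the article.
population = {"California": 37_253_956, "Wyoming": 563_626}


def share_pct(state):
    """Reported French-speaking Facebook users as a % of the state population."""
    return 100.0 * french_speakers[state] / population[state]


# Rank states by proportion instead of by absolute number.
ranking = sorted(french_speakers, key=share_pct, reverse=True)
for state in ranking:
    print(f"{state:<12} {share_pct(state):.1f}%")
```

With these two states the order does not change, but the gap shrinks from a factor of roughly 150 in absolute numbers to a factor of about two in proportions.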

Oh and by the way, French people like referring to Facebook as "Face de Bouc" which sounds the same but literally means "billy goat face" :-)




Thursday, April 26, 2012

Predicting France's next president?


We're quite literally in the middle of the French elections, a perfect opportunity to try to predict the new president 10 days ahead of time!

Before we jump into the model, a few words on the French system, thankfully much simpler than the US one!

French elections 101

The election is a two-step process. During the first step, called the "first round", all candidates are eligible, and each French voter casts a vote for one of them.

After this first round, the two candidates having received the most votes go to the "second round" and are the only two eligible candidates at this point. This second round takes place exactly two weeks after the first round. Today we are right between the two rounds, and the two remaining candidates are current president Nicolas Sarkozy seeking his second term (left picture), and François Hollande (right picture).

 

The polls were pretty much spot on in predicting that Nicolas Sarkozy and François Hollande would battle in the second round, with François Hollande having a slight advantage in first-round votes.

So, can we predict who will win the second round?

I looked at historical results for the past six presidential elections (1974, 1981, 1988, 1995, 2002 and 2007), recording for each candidate first round and second round percentage of votes.

The model aims at computing the probability of becoming president for the candidate receiving the most votes in the first round.

Now, out of the six past elections, the first-round vote leader won only three, with the challenger winning the other three. So looking at the difference in first-round percentage votes is not sufficient on its own.

Based on various theories on elections, it is also important to consider the percentage of votes received by the eliminated candidates with close affinities to the round-two candidates. Now, with only six observations, it is difficult to introduce many variables, but I decided to add one more in addition to the first-round delta in percentage votes between the two leading candidates. This second variable is the delta between the percentage votes received by each round-two candidate's closest candidates. Let me explain based on an example:

Let us rank the 1995 first round candidates by left-right political affinity:

Candidate            First Round %       Sum closest two
Arlette Laguiller             5.30                  8.66
Robert Hue                    8.66                 28.60
Lionel Jospin                23.30                 11.98
Dominique Voynet              3.32                 41.87
Edouard Balladur             18.57                 24.16
Jacques Chirac               20.84                 23.31
Philippe de Villiers          4.74                 35.84
Jean-Marie Le Pen            15.00                  5.02
Jacques Cheminade             0.28                 15.00

For each candidate I then computed the sum of the first-round percentages of the two candidates immediately to the left and to the right on the political scale (the "Sum closest two" column above). A candidate at either end of the scale has only one neighbor, so the sum is just that neighbor's percentage.

And the variable I introduce is the delta of this sum metric between the first-round leader and the runner-up. So in 1995, the first-round leader was Lionel Jospin, his first-round percentage delta with runner-up Jacques Chirac was 2.46 (23.30 - 20.84), and the "closest candidate delta" was -11.33 (11.98 - 23.31).
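The two variables can be reproduced from the 1995 table above with a few lines of Python. This is only a sketch of the arithmetic described in this post; the function names are mine and purely illustrative.

```python
# 1995 first-round results, ordered from left to right on the political scale
# (values copied from the table above).
FIRST_ROUND_1995 = [
    ("Arlette Laguiller", 5.30),
    ("Robert Hue", 8.66),
    ("Lionel Jospin", 23.30),
    ("Dominique Voynet", 3.32),
    ("Edouard Balladur", 18.57),
    ("Jacques Chirac", 20.84),
    ("Philippe de Villiers", 4.74),
    ("Jean-Marie Le Pen", 15.00),
    ("Jacques Cheminade", 0.28),
]


def neighbor_sums(results):
    """Sum of the first-round % of the immediate left and right neighbors."""
    sums = {}
    for i, (name, _) in enumerate(results):
        # Candidates at either end of the scale have a single neighbor.
        left = results[i - 1][1] if i > 0 else 0.0
        right = results[i + 1][1] if i < len(results) - 1 else 0.0
        sums[name] = round(left + right, 2)
    return sums


def model_variables(results, leader, runner_up):
    """Return (first round delta, closest candidate delta) for the two finalists."""
    scores = dict(results)
    sums = neighbor_sums(results)
    first_round_delta = round(scores[leader] - scores[runner_up], 2)
    closest_delta = round(sums[leader] - sums[runner_up], 2)
    return first_round_delta, closest_delta


print(model_variables(FIRST_ROUND_1995, "Lionel Jospin", "Jacques Chirac"))
```

For 1995 this gives back the 2.46 and -11.33 quoted above.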

Model results

With these variables, I built a quick logistic model to estimate the probability that the first-round leader wins the second round, as a function of "first round delta" and "closest candidate delta".

Applying the model to the 2012 first-round results indicates that the president for the next five years will be....
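For readers unfamiliar with logistic models, here is a rough sketch of the general form such a model takes once fitted. The function and parameter names are illustrative assumptions, and no fitted coefficient values are given in this post, so they are left as arguments rather than filled in.

```python
import math


def leader_win_probability(first_round_delta, closest_delta,
                           intercept, b_first, b_closest):
    """Probability that the first-round leader wins the second round.

    The three coefficients (intercept, b_first, b_closest) are placeholders:
    they would come from fitting the model on the historical elections.
    """
    # Linear score built from the two variables and the intercept.
    score = intercept + b_first * first_round_delta + b_closest * closest_delta
    # The logistic function maps the score to a probability between 0 and 1.
    return 1.0 / (1.0 + math.exp(-score))
```

A probability above 0.5 would mean the model favors the first-round leader, and below 0.5 the runner-up.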

Nicolas Sarkozy !

Now, despite the small number of observations, I decided to exclude one of them, which could be seen as an outlier. Indeed, in 2002 the extreme right party created a monumental surprise by reaching the second round. The second round became a right vs. extreme right battle instead of the usual right vs. left one. And that year Jacques Chirac won the second round with an unprecedented 82% of votes, whereas the winner's share usually falls in the 50%-55% range.

Excluding that observation, the model was a perfect fit for the five remaining observations and predicted that the president for the next five years will be....

Nicolas Sarkozy !

Wait until May 6th to criticize...

Now, I could not agree more with the criticism this approach deserves for fitting three parameters (two variables plus the intercept) when we only have five or six observations.

But the objective here is not to publish in a stats journal, just to play around with the data. And all the polls indicate that François Hollande will be the next president. So in 10 days we'll see if this method that predicts Nicolas Sarkozy isn't as faulty as it would initially appear...