We're quite literally in the middle of the French Elections, a perfect opportunity to try to predict the new president 10 days ahead of time!

Before we jump into the model, a few words on the French system, which is thankfully much simpler than the US one!

**French elections 101**

The election is a two-step process. During the first step, called the "first round", all candidates are eligible, and each French voter casts a vote for one of them.

After this first round, the two candidates having received the most votes go to the "second round" and are the only two eligible candidates at this point. This second round takes place exactly two weeks after the first round. Today we are right between the two rounds, and the two remaining candidates are current president Nicolas Sarkozy seeking his second term (left picture), and François Hollande (right picture).

The polls were pretty much spot on in predicting that Nicolas and François would battle it out in the second round, with François Hollande holding a slight advantage in first round votes.

**So, can we predict who will win the second round?**

I looked at historical results for the past six presidential elections (1974, 1981, 1988, 1995, 2002 and 2007), recording each candidate's first round and second round percentage of votes.

The model aims to compute the probability that the candidate who received the most votes in the first round goes on to become president.

Now out of the six past elections, the first round vote leader won only 3 elections with the challenger winning the other 3. So looking at the difference in first round percentage votes is not sufficient.

According to various theories on elections, it is also important to consider the percentage of votes received by eliminated candidates with close affinities to the two second round candidates. Now, with only 6 observations, it is difficult to introduce many variables, but I decided to add one more on top of the first round delta in percentage votes between the two leading candidates. This second variable is the delta between the percentage votes received by each leading candidate's closest neighbors on the political scale. Let me explain based on an example:

Let us rank the 1995 first round candidates by left-right political affinity:

| Candidate | First Round % | Sum closest two |
|---|---|---|
| Arlette Laguiller | 5.30 | 8.66 |
| Robert Hue | 8.66 | 28.60 |
| Lionel Jospin | 23.30 | 11.98 |
| Dominique Voynet | 3.32 | 41.87 |
| Edouard Balladur | 18.57 | 24.16 |
| Jacques Chirac | 20.84 | 23.31 |
| Philippe de Villiers | 4.74 | 35.84 |
| Jean-Marie Le Pen | 15.00 | 5.02 |
| Jacques Cheminade | 0.28 | 15.00 |

For each candidate I then computed the sum of the first round percentages of the two closest candidates: the one immediately to the left and the one immediately to the right on the political scale. Candidates at either end of the scale have a single neighbor, so their sum is just that neighbor's percentage.

The variable I introduce is the delta of this sum metric between the first round leader and the runner-up. So in 1995, the first round leader was Lionel Jospin, his first round percentage delta with second vote leader Jacques Chirac was 2.46 (23.30 - 20.84), and the "closest candidate delta" was -11.33 (11.98 - 23.31).
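For readers who want to reproduce the two inputs, here is a minimal Python sketch of the computation for 1995. The function names (`neighbor_sums`, `model_inputs`) are mine, and treating a missing neighbor at either end of the scale as zero is an assumption inferred from the table, not something spelled out elsewhere in this post.

```python
# 1995 first round results, ordered left to right on the political scale
# (values copied from the table above).
first_round_1995 = [
    ("Arlette Laguiller", 5.30),
    ("Robert Hue", 8.66),
    ("Lionel Jospin", 23.30),
    ("Dominique Voynet", 3.32),
    ("Edouard Balladur", 18.57),
    ("Jacques Chirac", 20.84),
    ("Philippe de Villiers", 4.74),
    ("Jean-Marie Le Pen", 15.00),
    ("Jacques Cheminade", 0.28),
]


def neighbor_sums(ranked):
    """Map each candidate to the summed score of the adjacent candidates.

    Assumption: a candidate at either end of the scale has only one
    neighbor, so the missing side counts as 0.
    """
    scores = [pct for _, pct in ranked]
    sums = {}
    for i, (name, _) in enumerate(ranked):
        left = scores[i - 1] if i > 0 else 0.0
        right = scores[i + 1] if i < len(scores) - 1 else 0.0
        sums[name] = round(left + right, 2)
    return sums


def model_inputs(ranked, leader, runner_up):
    """Return (first round delta, closest candidate delta) for the two finalists."""
    scores = dict(ranked)
    sums = neighbor_sums(ranked)
    first_round_delta = round(scores[leader] - scores[runner_up], 2)
    closest_delta = round(sums[leader] - sums[runner_up], 2)
    return first_round_delta, closest_delta


print(model_inputs(first_round_1995, "Lionel Jospin", "Jacques Chirac"))
# (2.46, -11.33)
```

Note that the result depends entirely on how the candidates are ordered from left to right, which is a judgment call.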

**Model results**

With these variables, I built a quick logistic model to estimate the probability that the first round leader wins the second round, as a function of "first round delta" and "closest candidate delta".
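As a rough illustration of what such a model looks like once fitted, the sketch below writes out the standard logistic form in Python. The function name `win_probability` and the coefficient names `b0`, `b1`, `b2` are placeholders of mine; the fitted values are not given in this post, so none are shown here.

```python
import math


def win_probability(first_round_delta, closest_delta, b0, b1, b2):
    """Logistic model: P(first round leader wins the second round).

    b0 is the intercept; b1 and b2 are the coefficients of the two
    variables. All three are hypothetical inputs, not fitted values.
    """
    z = b0 + b1 * first_round_delta + b2 * closest_delta
    # Numerically stable evaluation of 1 / (1 + exp(-z)).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)
```

With all coefficients set to zero the function returns 0.5, which is a handy sanity check: the model then carries no information and the race is a coin flip.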

Applying the model to the 2012 first round results indicates that the president for the next five years will be....

Nicolas Sarkozy!

Now, despite the small number of observations, I decided to exclude one of them which could be seen as an outlier. Indeed, in 2002 the extreme right party created a monumental surprise by reaching the second round. The second round became a right vs. extreme right battle instead of the usual right vs. left one. And that year Jacques Chirac won the second round with an unprecedented 82% of votes, whereas second round shares usually sit in the 45%-55% range.

Excluding that observation, the model was a perfect fit for the five remaining observations and predicted that the president for the next five years will be....

Nicolas Sarkozy!

**Wait until May 6th to criticize...**

Now, I could not agree more with the criticism this approach deserves: the model fits two variables plus an intercept when we only have five or six observations.

But the objective here is not to publish in a stats journal, just to play around with the data. And all the polls indicate that François Hollande will be the next president. So in 10 days we'll see whether this method, which predicts Nicolas Sarkozy, is as faulty as it initially appears...

Let's look on the bright side of the analysis: I will now have an additional observation for the 2017 elections :-)

