# Description and modeling of FlapMMO score data

FlapMMO is an online game similar to Flappy Bird. The player controls a bird and the aim is to fly as far as possible, avoiding the pipes. At the same time, you see other players playing (title image).

In this post, I explore a dataset of scores using descriptive statistics and a few probabilistic models.

Retrieving data. Thanks to the work of Connor Sauve, it is possible to retrieve the data stream, which contains useful information for each attempt, such as:

• an id field, which uniquely identifies the player,
• a nickname,
• a date,
• a list of jump dates (from the beginning to the end of the attempt).

In the end, I have two datasets:

• data0, obtained by Connor Sauve on 13 Feb 2014, which contains about 400,000 attempts with more than 18,000 different players,
• data1, obtained by myself on 2 Mar 2014, which contains about 100,000 attempts with more than 5,000 players.

The plots below use data1. The last paragraph gives a brief (one-line) comparison between the two datasets.

Variable of interest. For each id, I focus only on the pipe that the bird hits in each successive attempt. I then transform the datasets to obtain something like:

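| id   | attempt 1 | attempt 2 | attempt 3 | attempt 4 |
|------|-----------|-----------|-----------|-----------|
| 3266 | 2         | 1         |           |           |
| 3267 | 1         | 6         | 2         | 3         |
| 3268 | 1         |           |           |           |
| 3269 | 1         | 1         | 1         |           |
| 3270 | 2         | 10        | 5         | 1         |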

For example, the player with id 3266 played twice. In his first try, his bird hit the second pipe. In his next try, it hit the first pipe. Then he stopped playing.

Note that I remove the attempts which do not reach the first pipe.
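As an illustration of this transformation, here is a minimal Python sketch. It assumes the raw stream has already been reduced to chronologically ordered `(player_id, score)` pairs, where a score of 0 marks an attempt that ended before the first pipe. That input format and the function name are assumptions made for the example, not the format of the actual data.

```python
from collections import defaultdict


def scores_by_player(attempts):
    """Group scores by player id, keeping the order of the attempts.

    `attempts` is a hypothetical iterable of (player_id, score) pairs,
    sorted by date. A score of 0 means the first pipe was never reached.
    """
    table = defaultdict(list)
    for player_id, score in attempts:
        if score >= 1:  # drop attempts that do not reach the first pipe
            table[player_id].append(score)
    return dict(table)


# Toy input: player 3266 has one discarded attempt (score 0).
attempts = [(3266, 2), (3267, 1), (3266, 0), (3266, 1), (3268, 1), (3267, 6)]
table = scores_by_player(attempts)
```

Each value in `table` corresponds to one row of the table above, without the empty cells.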

How much time each player continues to play? In the following graph, I plot the frequency of players as a function of the number of attempts.

We observe that most people play only a few times: 50% of the players play 10 times or fewer, and 75% of them make fewer than 25 attempts.

From this plot, we deduce the probability that a player plays again as a function of the number of attempts already made.

This plot suggests that the more attempts a player has made, the more likely he is to keep playing. This might indicate that the game is addictive.
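The probability of playing again can be computed directly from the number of attempts per player. The sketch below is one way to do it, assuming the estimate is the share of players with more than n attempts among those with at least n. The function name is illustrative, and the toy input is the number of attempts of the five players in the example table above.

```python
from collections import Counter


def continuation_probability(attempt_counts):
    """Return {n: P(more than n attempts | at least n attempts)}."""
    freq = Counter(attempt_counts)
    probs = {}
    at_least = len(attempt_counts)  # players with at least 1 attempt
    for n in range(1, max(freq) + 1):
        more = at_least - freq.get(n, 0)  # players with more than n attempts
        probs[n] = more / at_least
        at_least = more
        if at_least == 0:
            break
    return probs


# Attempts per player for ids 3266 to 3270 in the example table.
probs = continuation_probability([2, 4, 1, 3, 4])
```

Since the data cover a limited time window, a player who appears to have stopped may simply come back later, so these estimates are likely to understate the true probability of continuing.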

How far each player go? The next descriptive graph represents the frequency of players as a function of the pipe reached in their best play.

Here, we observe that most players are able to pass a pipe, but 50% of them don’t reach the fifth pipe and 75% lose before the eighth pipe. It is noteworthy that someone reached the 140th pipe (outside of the graph).

Evolution of the score between two consecutive tries. Knowing the score of a player for an attempt (the pipe that the bird hits), we want to infer the score of the next try. For this purpose, we use a homogeneous Markov model. This is a simplistic model, because the next score may depend on the whole history of scores (so it is not a one-step Markov model) and on the number of attempts the player has already made (so it is not a homogeneous model).

An empirical transition matrix is obtained, where each cell (i,j) represents the probability that a player who scores i in a try will score j in the next try. Only 10 states are kept:

{1, 2, 3, 4, 5, 6, 7, 8, >, †}.

Here “>” represents a score greater than 8 and “†” means that the player stopped playing.

The matrix is given by:

For example, the probability that a player who scores 1 in a try will score 6 in the next try is 0.02 = 2%. The probability that a player who scores greater than 8 will leave the game is 0.09 = 9%.
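An empirical transition matrix of this kind can be built by counting consecutive pairs of scores and normalizing each row. The following is a sketch under stated assumptions: the input is a list of per-player score lists, the last attempt of each player is counted as a transition to “†”, and only the nine non-absorbing states get a row. The function names and the toy input (the example table above) are illustrative.

```python
import numpy as np

STATES = ["1", "2", "3", "4", "5", "6", "7", "8", ">", "†"]


def state_index(score):
    """Map a score to its state index; scores above 8 share the state '>'."""
    return score - 1 if score <= 8 else 8


def transition_matrix(score_lists):
    """Rows: current state (1..8, '>'). Columns: next state, including '†'."""
    counts = np.zeros((9, 10))
    for scores in score_lists:
        if not scores:
            continue
        for cur, nxt in zip(scores, scores[1:]):
            counts[state_index(cur), state_index(nxt)] += 1
        counts[state_index(scores[-1]), 9] += 1  # last attempt, then quit
    row_sums = counts.sum(axis=1, keepdims=True)
    # Rows with no observation are left at zero instead of dividing by zero.
    return np.divide(counts, row_sums, out=np.zeros_like(counts),
                     where=row_sums > 0)


example = [[2, 1], [1, 6, 2, 3], [1], [1, 1, 1], [2, 10, 5, 1]]
P = transition_matrix(example)
```

With this layout, `P[0, 5]` is the estimated probability of scoring 6 right after scoring 1, and `P[8, 9]` the estimated probability of leaving after a score greater than 8.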

From this matrix, we plot the probability of scoring 1 (respectively, of scoring greater than 8) in the next try as a function of the score in the current try.

We deduce that players who score high in one try tend to score high in the next, and players who score low tend to score low again. The game is therefore not purely a game of chance.

Also from the matrix, we plot the probability of leaving the game as a function of the score in the current try.

Thus, people who reach great scores are more likely to leave the game.

Skill of players. Now, we take players individually and try to measure each one's skill. Let {1,…,N} denote the players. We make only the following assumption:

“when the bird of the player k is in front of a pipe, it dies with probability p_{k}”.

Then, for player k, and assuming that pipes are passed independently of one another, each score follows a geometric distribution (on N*) with rate p_{k}. After estimating the rates for all players, we plot the histogram of rates for players who made more than 30 attempts.

The distribution of rates is not uniform, and most players (with more than 30 attempts) have a rate around 0.5.

Note that the geometric hypothesis was tested with the Cramér-von Mises test. For almost all players, the hypothesis of a geometric distribution could not be rejected (even for players who played many times).
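Under the geometric assumption, the rate of a player can be estimated by maximum likelihood, which for a geometric law on N* is the number of attempts divided by the sum of the scores. The post does not say which estimator was used, so the sketch below is one natural choice rather than the method behind the histogram. The function names are illustrative.

```python
def geometric_rate(scores):
    """MLE of p when P(S = s) = (1 - p)**(s - 1) * p for s = 1, 2, ..."""
    if not scores:
        raise ValueError("at least one score is needed")
    return len(scores) / sum(scores)


def estimate_rates(table, min_attempts=30):
    """Estimate a rate for every player with more than `min_attempts` attempts."""
    return {pid: geometric_rate(s) for pid, s in table.items()
            if len(s) > min_attempts}


# Players 3267 and 3270 from the example table.
rate_3267 = geometric_rate([1, 6, 2, 3])   # 4 attempts, 12 pipes in total
rate_3270 = geometric_rate([2, 10, 5, 1])  # 4 attempts, 18 pipes in total
```

A rate around 0.5 therefore corresponds to an average score of about 2 pipes per attempt.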

Evolution of the skill. The previous model cannot capture how a player's skill evolves over many attempts. To fix that, we replace the previous assumption with:

“when the bird of the player k is in front of a pipe, with l denoting the number of attempts already made by the player, it dies with probability p_{k,l}”.

We use a uniform convolution to estimate each rate p_{k,l}. Then, for players who made many attempts, we can observe whether their skill improves or not. Here are the plots for players 1725, 1433 and 4147.

Player 1725 begins without knowing how to play and then constantly improves his performance.

Player 1433 knows how to play, and his skill improvement is slow.

Player 4147 played many times but his skill remains more or less constant.
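The uniform convolution used for p_{k,l} can be sketched as a moving average of the scores, followed by the geometric estimate (one over the local mean score) in each window. The window width, the handling of the edges and the function name below are assumptions made for the example, since the post does not specify them.

```python
import numpy as np


def local_rates(scores, window=5):
    """Estimate p_{k,l} along the attempts of one player.

    The scores are averaged with a uniform kernel of width `window`,
    and each local rate is the inverse of the local mean score.
    Only windows that fit entirely in the data are kept.
    """
    scores = np.asarray(scores, dtype=float)
    if window < 1 or window > len(scores):
        raise ValueError("window must be between 1 and the number of scores")
    kernel = np.ones(window) / window
    local_mean = np.convolve(scores, kernel, mode="valid")
    return 1.0 / local_mean  # scores are >= 1, so the mean is never zero


# Toy player whose scores increase over time.
rates = local_rates([1, 1, 2, 2, 4, 4], window=2)
```

A decreasing sequence of rates, as in this toy example, corresponds to a player who is improving. A wider window gives a smoother curve at the cost of reacting more slowly to changes.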

Comparison between data0 and data1. The shapes of all the plots look similar, except that the skill of players is greater in data1.

Requests:

• How can I find reasonable ways to fit the plots in the sections “How much time each player continues to play?”, “How far each player go?” and “Skill of players”?