I'll admit: the points are part of the reason why I do anything on PI Square (including writing this post). Points are a cheap way to add meaningless competition to help encourage participation. On the home page of PI Square, the All-Time Leaderboard shows you yourself, the 5 people above you, the 5 people below you, and the top 25 people. That's it. That's not very satisfying.

In this post, we will look at the full PI Square All-Time Leaderboard in a few different ways. This post is just for fun, so the results are not super meaningful.

# To start off…

I collected the point data and rank data from 2020-08-16 to 2020-08-21, so the data is not accurate to any particular point in time. Points and ranks would have changed during that time, users could have deleted their accounts, and new users would have been added. Also, I know that I missed a few users that existed before I started collecting data, but I still managed to capture more than 99% of the available point data and rank data.

54352 users are included in the analysis, but there are only 1222 distinct point totals and therefore only 1222 distinct ranks.

# Points versus rank: the curve of best fit

In Excel, I created a **scatterplot** of points versus rank. The best-fitting trendline that I was able to use was a "power" trendline. Duplicated data points were excluded. The **data points are blue** and the **trendline is red**. Here is the result:

We can zoom in if we look at only the top 4000 people:

The **curve of best fit** seems to fit well for the low-ranking users, which is most users, but overshoots the number of points earned by the high-ranking users. Why?

Look closely at the full graph. There is a cluster of data points with ranks from 20000 to 25000 and another cluster of data points with ranks 45000 to 50000. If we exclude those 29 points, we get a much better fit:

Of course, excluding valid data points is not ideal. Perhaps another type of curve of best fit—one that Excel does not support—would have been more appropriate. As you can see, the data points form a pretty smooth curve, but how to accurately describe that curve without overfitting is still a mystery to me.
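For readers who want to reproduce a power trendline outside Excel, here is a minimal sketch in Python. It assumes the common approach of fitting a straight line to the log-transformed data (reportedly how Excel computes its power trendline too), and it reports R² on that log-log scale. The function name `fit_power_law` and the demo numbers are hypothetical; they are not the real leaderboard data.

```python
import numpy as np

def fit_power_law(rank, points):
    """Fit points = a * rank**b by least squares on log-transformed data."""
    rank = np.asarray(rank, dtype=float)
    points = np.asarray(points, dtype=float)
    # Logarithms need strictly positive inputs, so zero-point users are dropped.
    keep = (rank > 0) & (points > 0)
    log_rank = np.log(rank[keep])
    log_points = np.log(points[keep])
    # polyfit returns the slope first, then the intercept, for degree 1.
    b, log_a = np.polyfit(log_rank, log_points, 1)
    # R^2 measured on the log-log scale, where the fit is a straight line.
    predicted = log_a + b * log_rank
    ss_res = np.sum((log_points - predicted) ** 2)
    ss_tot = np.sum((log_points - log_points.mean()) ** 2)
    r_squared = 1.0 - ss_res / ss_tot
    return float(np.exp(log_a)), float(b), float(r_squared)

# Made-up data that follows an exact power law, for illustration only.
demo_rank = np.array([1.0, 10.0, 100.0, 1000.0, 10000.0])
demo_points = 50000.0 * demo_rank ** -0.5

a, b, r_squared = fit_power_law(demo_rank, demo_points)
print(f"points = {a:.1f} * rank^{b:.3f}, R^2 = {r_squared:.4f}")
```

Because the fit is done on logarithms, large clusters of tied users at low point totals can pull the line away from the top of the leaderboard, which is one possible reason for the overshoot described above.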

# Do PI Champions live up to the title?

"PI Champion" is a self-appointed title that a user uses to show their commitment to learning and using PI. I say "self-appointed" because all you need to do to become a PI Champion is join the group. Is this just job title inflation, or is there actually something special about PI Champions?

Below is a scatterplot of points versus rank for the PI Champions only. There are 367 PI Champions.

As you can see, the PI Champions all rank fairly highly, and, based on the **R²** value, the curve is a much better fit for only the PI Champions compared to all users. However, when you join the PI Champions group, you automatically get 500 points, and so the lowest rank that a PI Champion would be at is 2340. We also saw earlier that the curve of best fit tends to fit better when the low-ranking users are excluded, which explains the high R² value. Basically, this graph doesn't tell us much.

To properly analyze the PI Champions compared to other users, we need to exclude the free 500 points that they all receive, which is done in the graphs below.

Scatterplot of points versus rank for PI Champions only, minus 500 points:

Zoomed in:

So far, this looks very similar to the scatterplots that plotted all users, so PI Champions are not too special when it comes to how rank changes with points or vice versa.

Now it's time to highlight where the PI Champions stand compared to other users. Below are the same graphs that show all PI Square users, but this time, the PI Champions, minus their free 500 points, are shown in red:

Zoomed in:

The graphs show that PI Champions, without their freebie points, come from many different ranks. However, the **median** rank of the PI Champions, minus 500 points, is 2337, while the median rank of all PI Square users is 21973. PI Champions therefore still generally earn more points than regular users, so the title isn't merely a free ego boost.
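The comparison can be sketched in Python as follows. This is only one possible way to do it: it assumes that the 500 bonus points are subtracted from every PI Champion, that everyone is then re-ranked together, and that a rank is 1 plus the number of users with strictly more points. The user names and point totals are invented for illustration.

```python
from statistics import median

BONUS = 500  # points granted for joining the PI Champions group

# (name, points, is_champion): made-up users, not real leaderboard data
users = [
    ("ann", 4200, True),
    ("bob", 900, True),
    ("cat", 650, False),
    ("dan", 500, True),
    ("eve", 120, False),
    ("fay", 100, False),
    ("gus", 100, False),
    ("hal", 0, False),
]

# Remove the free bonus from champions only.
adjusted = [pts - BONUS if champ else pts for _, pts, champ in users]

# Rank = 1 + number of users with strictly more (adjusted) points.
# This quadratic loop is fine for a sketch but slow for 50000+ users.
ranks = [1 + sum(other > mine for other in adjusted) for mine in adjusted]

champion_ranks = [r for r, (_, _, champ) in zip(ranks, users) if champ]
median_champion_rank = median(champion_ranks)
median_overall_rank = median(ranks)

print("median champion rank:", median_champion_rank)
print("median overall rank:", median_overall_rank)
```

A lower median rank for the champions than for everyone, even after removing the bonus, is what supports the claim that the title reflects real activity.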

# Number of users per level

Because having a point system apparently did not gamify PI Square participation enough, each user also has a **level**. Each level is named after an element on the periodic table and represents a range of points.

The **histogram** below shows the number of users in each level.

Without hydrogen:

Because there is a big difference in the size of the numbers of users in the lower levels versus the higher levels, it helps to look instead at the **logarithm** of the number of users:

From helium to oxygen, there seems to be a **linear** decrease in the logarithm of the number of users, or in other words, there seems to be an **exponential** decrease in the number of users. Beyond oxygen, we see not only far fewer users per level but also more fluctuation between levels. Hydrogen has a ridiculously high number of users no matter how you slice it.
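To make the link between "linear in the logarithm" and "exponential" concrete, here is a small Python sketch. The counts are hypothetical and chosen to halve at every level; they are not the real PI Square numbers. A straight-line fit to log10(count) has a slope s, and 10^s is then the factor by which the user count shrinks from one level to the next.

```python
import numpy as np

# Hypothetical users-per-level counts for seven consecutive levels
# (illustrative only; each level has half the users of the previous one).
counts = np.array([6400.0, 3200.0, 1600.0, 800.0, 400.0, 200.0, 100.0])
level_index = np.arange(len(counts))  # 0, 1, 2, ... in level order

# Straight-line fit: log10(count) = intercept + slope * level_index
slope, intercept = np.polyfit(level_index, np.log10(counts), 1)

# Undoing the logarithm turns the line into an exponential:
# count = 10**intercept * (10**slope) ** level_index
ratio_per_level = 10 ** slope
starting_count = 10 ** intercept

print(f"each level has about {ratio_per_level:.2f}x the users of the previous one")
```

With real data, the fit would only be expected to hold over the helium-to-oxygen range, since the higher levels fluctuate too much.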

# The hydrogen clusters

Earlier, we saw a cluster of data points with ranks from 20000 to 25000 and another cluster of data points with ranks 45000 to 50000. Let's look at these in more detail.

First, look at the table below, which shows the 5 most common point totals among users, from most common to least common:

Points | Number Of Users | Rank |
---|---|---|
100 | 24394 | 21973 |
120 | 8038 | 12533 |
0 | 7201 | 47152 |
160 | 1272 | 8311 |
110 | 916 | 20636 |

As you can see, there are more users that have 100 points than users that have 0 points. There is an activity called "Logging in for the first time" that gives you 100 points, but since some users have fewer than 100 points, this activity must have been introduced at some point after PI Square's launch. For a user to have 120 points, they need to have joined PI Square and either used the search bar, tagged something, updated their status, or liked something. For a user to have 160 points, they need to have done 3 of these things, or only 1 of these things but logged in 10 times. Astonishingly, about 77% of all PI Square users (41821 of 54352) have one of the 5 point totals listed above!
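The share of users covered by these 5 point totals can be recomputed directly from the table and the total of 54352 users; it comes out at roughly 77%. The sketch below also runs a small consistency check that assumes standard competition ranking: the users tied at 0 points start at rank 47152, so the last of them should sit at position 47152 + 7201 − 1, which should equal the total number of users.

```python
TOTAL_USERS = 54352

# (points, number of users, rank) copied from the table above
table = [
    (100, 24394, 21973),
    (120, 8038, 12533),
    (0, 7201, 47152),
    (160, 1272, 8311),
    (110, 916, 20636),
]

covered = sum(count for _, count, _ in table)
share = covered / TOTAL_USERS
print(f"{covered} of {TOTAL_USERS} users ({share:.1%}) hold one of these totals")

# Users tied at 0 points occupy positions rank .. rank + count - 1.
_, zero_count, zero_rank = table[2]
last_position = zero_rank + zero_count - 1
print("last position on the leaderboard:", last_position)
```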

The reason for the clusters existing is probably this: there are many new users and many users that do not do much on PI Square. Because of **missions**, it does not take much activity before you earn >200 points, and it also means that in users' early days on PI Square, most of their points would come from missions. Since many missions assign the same number of points and since many new users will do the same few activities, this causes many users to have the same number of points. Because PI Square uses **standard competition ranking**, these ties create gaps in the ranks. When the gaps are huge, they create the clusters that we see. If we were to use **dense ranking** instead, there would be no clusters, as shown in the graph below:
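The difference between the two ranking schemes can be sketched in a few lines of Python. The point totals are made up for illustration, and the two helper functions are hypothetical names, not anything taken from PI Square itself.

```python
def standard_competition_ranks(points):
    """Tied users share a rank, and the ranks after a tie are skipped (1, 2, 2, 4)."""
    ordered = sorted(points, reverse=True)
    first_position = {}
    for position, value in enumerate(ordered, start=1):
        # Only the first (best) position of each point total is recorded.
        first_position.setdefault(value, position)
    return [first_position[p] for p in points]

def dense_ranks(points):
    """Tied users share a rank, and no ranks are skipped (1, 2, 2, 3)."""
    distinct = sorted(set(points), reverse=True)
    rank_of = {value: rank for rank, value in enumerate(distinct, start=1)}
    return [rank_of[p] for p in points]

# Made-up point totals with large ties at 120, 100 and 0 points.
points = [900, 160, 120, 120, 120, 100, 100, 100, 100, 0, 0]

standard = standard_competition_ranks(points)
dense = dense_ranks(points)

print(standard)  # [1, 2, 3, 3, 3, 6, 6, 6, 6, 10, 10]
print(dense)     # [1, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5]
```

With standard competition ranking, the jump from rank 6 to rank 10 is as wide as the tie before it, which is the gap effect described above. With dense ranking, the number of distinct ranks always equals the number of distinct point totals and there are no gaps.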

# Trivia

- No user has earned more points than there are users.
- You keep the freebie PI Champion points even after you leave the group.
- Due to the 500 points that every PI Champion receives, it is impossible for a PI Champion to be at the hydrogen, helium, or lithium level.
- Hydrogen, followed by helium, is the most abundant element both in the universe and on PI Square.
- Normally, we talk about analyzing, in PI, data that is not about PI. This post is the opposite: we analyze data about PI (Square) without using any PI programs.
- My scatterplots show rank on the x-axis and points on the y-axis. Since rank depends on points, by convention, I should have flipped the axes around (independent variable on the x-axis, dependent variable on the y-axis). However, I think that it would be more common for users to ask "How many points do I need to achieve [rank]?" rather than "How high will I rank if I get [points] points?".

# Conclusion

Hopefully, you enjoyed this post. It goes to show that data and patterns are everywhere, even in places where you least expect them. In my next blog post, we will use these results to help us track our progress in rising through the ranks of PI Square.

While I am really not knowledgeable about statistics, why should a curve of best fit (quadratic, I presume) be a fitting model for this dataset?

There are different ways to earn points, and I think only the points earned from posts would really fit the curve. Could that explain the better fit in the PI Champions group: they are generally more frequent posters because they are more engaged in the forum?