# Understanding NFT Rarity Calculations

Nonsensical methods are pushed by multiple platforms that mislead investors about the true value of an NFT.

Personal Profile Picture (PFP) collections are traded in numerous marketplaces where the relative price of a PFP within its collection should be somehow linked to the rarity of the individual PFP’s attributes. However, there is no consensus on even the proper definition of rarity, let alone on how it should be measured. Descriptions of the various methods currently in the public domain are mired in confusion because mathematical representations are poorly presented. Moreover, the price-rarity relationship is blurred by tokens being ranked completely differently depending on the rarity metric used by the marketplace.

Outrank have a mathematical framework which unifies all the rarity metrics in the public domain into one simple formula. This includes the OpenRarity open-source code which gives a ranking identical to that obtained using the geometric mean or, equivalently, the product of trait frequencies. Our formula also encompasses the Jaccard distance proposed by NFTGo as a special case, which is equivalent to the arithmetic mean ranking. Many other rarity analytics providers use a metric that is equivalent to the harmonic mean. Apart from the simple ‘rarest trait’ metric, every PFP rarity metric in the public domain provides a ranking that is identical to one of these three so-called Pythagorean means.

There are only four different rarity metrics in the public domain, as shown in the table below. When the model described on a provider's website yields a ranking identical to that of a Pythagorean mean, we indicate this using ✔. Some sites, like Rarity.tools, provide more than one ranking model. The * indicates the ability to add or remove trait normalization.

To understand this table, consider a PFP collection with $n$ different traits where each trait $i$ has $ω_i$ different attributes for $i = 1, ..., n$. The count of a particular attribute is the number of tokens having this attribute. Every token has a set of counts $(m_1, ..., m_n)$ where, for each trait $i$, the count $m_i$ is the number of tokens in the entire collection having the same attribute as that token. Now we define a token’s ‘rarest trait’ to be the minimum of the token's attribute counts. We also have the Pythagorean means:

- **Harmonic** is the reciprocal of the arithmetic average of the reciprocals of the attribute counts
- **Geometric** is the $n^{th}$ root of the product of all the attribute counts
- **Arithmetic** is the average of all the attribute counts

Each count has a corresponding frequency, which is the count divided by $m$, the total number of tokens in the collection. The definitions above use counts, but frequencies would serve just as well, because both lead to the same rarity score.
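The four measurements defined above can be sketched in a few lines of Python. The function names and the example counts are hypothetical, chosen only for illustration; they are not taken from any provider's code.

```python
from math import prod

def rarest_trait(counts):
    """Minimum of a token's attribute counts."""
    return min(counts)

def harmonic_mean(counts):
    """Reciprocal of the average of the reciprocals."""
    return len(counts) / sum(1 / c for c in counts)

def geometric_mean(counts):
    """n-th root of the product of the counts."""
    return prod(counts) ** (1 / len(counts))

def arithmetic_mean(counts):
    """Plain average of the counts."""
    return sum(counts) / len(counts)

# Hypothetical token with three traits in a collection of m = 10,000 tokens.
m = 10_000
counts = [100, 2500, 5000]
freqs = [c / m for c in counts]

for name, fn in [("rarest", rarest_trait), ("harmonic", harmonic_mean),
                 ("geometric", geometric_mean), ("arithmetic", arithmetic_mean)]:
    # Each measurement on frequencies is the one on counts divided by m.
    print(name, fn(counts), fn(freqs))
```

For any set of positive counts the output respects the usual ordering: rarest trait ≤ harmonic ≤ geometric ≤ arithmetic.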

### Rarity Scores and Ranks

We represent a rarity measurement $r$ as a score by first finding the minimum and maximum measurements, considering all the tokens in the collection, and then setting the score for $r$ to be:

It does not matter whether we define the Pythagorean means on frequencies or counts: the score is the same.
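As an illustration, the sketch below assumes the score is the usual min-max normalization, $(r - r_{min}) / (r_{max} - r_{min})$. That exact form is an assumption, not something confirmed here, although it is consistent with the statement that counts and frequencies give the same score. The function name `minmax_scores` and the sample measurements are hypothetical.

```python
def minmax_scores(measurements):
    """Rescale measurements to [0, 1] using the collection-wide min and max.

    Assumed form: (r - r_min) / (r_max - r_min).
    """
    lo, hi = min(measurements), max(measurements)
    if hi == lo:
        # Degenerate collection: every token has the same measurement.
        return [0.0 for _ in measurements]
    return [(r - lo) / (hi - lo) for r in measurements]

# Hypothetical rarity measurements (on counts) for four tokens.
m = 10_000
count_means = [283.0, 1077.2, 2533.3, 5000.0]
freq_means = [r / m for r in count_means]

scores_counts = minmax_scores(count_means)
scores_freqs = minmax_scores(freq_means)
print(scores_counts)
print(scores_freqs)
```

Dividing every measurement by $m$ rescales the numerator and the denominator by the same factor, which is why the two lists agree up to floating-point rounding.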

The figure below compares the densities of rarity scores for the harmonic, geometric and arithmetic mean rarity metrics. Note that the natural ordering of Pythagorean means is lost once raw measurements are converted to scores. The shapes are diverse, depending on both the collection and the rarity metric. In some collections the score densities are all similar and do not depend much on the metric used. In others, very different scores can be obtained, depending on whether one uses the harmonic, geometric or arithmetic mean.

Scores are usually converted to ranks, so that the token ranked 1 has the smallest score and the token ranked $m$ has the highest; that is, they rank low to high. This throws away much valuable information. Moreover, ranks are invariant under strictly increasing transformations. For instance, it does not matter whether you use the geometric mean or just the product of counts (or frequencies): the ranking remains the same. Some analytics sites use a strictly decreasing transformation of a Pythagorean mean, but then they rank high to low, so the result is the same. It is this logic that underpins our classification in the table above.
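The invariance of ranks can be checked directly. The sketch below uses a hypothetical `ranks` helper and made-up counts for three tokens. It ranks them by the product of counts, by the geometric mean (a strictly increasing transformation of the product), and by the reciprocal of the product ranked high to low.

```python
from math import prod

def ranks(values, high_to_low=False):
    """Return the rank of each value; rank 1 goes to the smallest value
    (or to the largest when high_to_low is True). Ties are not treated
    specially in this sketch."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=high_to_low)
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result

# Hypothetical attribute counts for three tokens with three traits each.
tokens = [[100, 2500, 5000], [50, 9000, 9000], [1200, 1300, 1400]]

products = [prod(t) for t in tokens]
geometric = [p ** (1 / 3) for p in products]
inverse = [1 / p for p in products]

print(ranks(products))                   # [1, 3, 2]
print(ranks(geometric))                  # [1, 3, 2]
print(ranks(inverse, high_to_low=True))  # [1, 3, 2]
```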

### Weighted Power Means

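Let $p$ be any non-zero real number and $(ω_1,…,ω_n)$ be a set of positive weights. Then the *weighted power mean* with exponent $p$ of a set of $n$ positive real numbers $(a_1,…,a_n)$ is defined as:

$$M_p(a_1,…,a_n) = \left( \frac{\sum_{i=1}^{n} ω_i \, a_i^{p}}{\sum_{i=1}^{n} ω_i} \right)^{1/p}$$

This single formula provides a unified rarity metric that encompasses all the rarity metrics currently in the public domain. The exponent parameter $p$ can be any non-zero real number, but we make a special definition for the case $p = 0$. In fact, we are particularly interested in this and other specific values of $p$. First, the weighted harmonic mean has $p = -1$:

$$M_{-1}(a_1,…,a_n) = \frac{\sum_{i=1}^{n} ω_i}{\sum_{i=1}^{n} ω_i / a_i}$$

Then for $p = 0$ we set the weighted power mean to the weighted geometric mean:

$$M_0(a_1,…,a_n) = \left( \prod_{i=1}^{n} a_i^{ω_i} \right)^{1 / \sum_{i=1}^{n} ω_i}$$

and for $p = 1$ we have the weighted arithmetic mean:

$$M_1(a_1,…,a_n) = \frac{\sum_{i=1}^{n} ω_i \, a_i}{\sum_{i=1}^{n} ω_i}$$

We note that the limits as $p$ approaches $-∞$ and $+∞$ are the minimum and maximum of the numbers (the positive weights do not affect these limits), i.e.

$$\lim_{p \to -∞} M_p(a_1,…,a_n) = \min(a_1,…,a_n), \qquad \lim_{p \to +∞} M_p(a_1,…,a_n) = \max(a_1,…,a_n)$$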

The Pythagorean means correspond to setting $( a_1 , … , a_n )$ to be counts, or frequencies, and the weights all equal to one. These unweighted power means are also called generalized means.
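A minimal sketch of the weighted power mean follows, assuming the standard definition $(\sum ω_i a_i^p / \sum ω_i)^{1/p}$ with the geometric mean substituted at $p = 0$. The function name `weighted_power_mean` and its signature are hypothetical.

```python
import math

def weighted_power_mean(values, weights=None, p=1.0):
    """Weighted power mean of positive numbers.

    p = -1, 0, 1 give the weighted harmonic, geometric and arithmetic means;
    p = -inf and p = +inf are handled as the min and max limits.
    """
    if weights is None:
        weights = [1.0] * len(values)  # unweighted (generalized) mean
    total = sum(weights)
    if p == 0:
        # Special definition: weighted geometric mean, computed via logs.
        return math.exp(sum(w * math.log(a) for w, a in zip(weights, values)) / total)
    if p == -math.inf:
        return min(values)
    if p == math.inf:
        return max(values)
    return (sum(w * a ** p for w, a in zip(weights, values)) / total) ** (1 / p)

counts = [100, 2500, 5000]  # hypothetical attribute counts for one token
for p in (-math.inf, -50, -1, 0, 1, 50, math.inf):
    print(p, weighted_power_mean(counts, p=p))
```

The printed values increase with $p$, moving from the rarest-trait count towards the most common one. Very large values of $|p|$ can underflow or overflow in floating point, which is why the infinite limits are handled as separate cases.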

### Trait Normalization

To understand why we need trait normalization, consider the simple example of two traits in a collection with 10,000 tokens. Mouth has 100 different types, all equally likely, so 100 tokens have a mouth of type 1 and so on. Eyes have only two different types: 9,900 tokens have blue eyes and only 100 have brown eyes. Even though a token has a 1% chance of brown eyes and a 1% chance of any given type of mouth, having brown eyes is clearly much more special than any sort of mouth. We want to weight these two counts of 100 differently by assigning a smaller weight to the eyes frequencies than to the mouth frequencies (recall, smaller frequencies are rarer). For trait normalization, Outrank sets the real numbers $(a_1,…,a_n)$ to the trait counts (not frequencies) and the weights $(ω_1,…,ω_n)$ to the number of trait values for each trait type.
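The mouth and eyes example can be worked through numerically. The sketch below uses the weighted geometric mean with weights equal to the number of trait values (100 for mouth, 2 for eyes), as described above. The helper name and the choice of the geometric mean for the illustration are assumptions.

```python
import math

def weighted_geometric_mean(values, weights):
    """Weighted geometric mean of positive numbers, computed via logs."""
    total = sum(weights)
    return math.exp(sum(w * math.log(a) for w, a in zip(weights, values)) / total)

# Traits in order (mouth, eyes); weights are the number of values per trait.
trait_weights = [100, 2]
equal_weights = [1, 1]

brown_eyed = [100, 100]   # counts: any mouth type, brown eyes
blue_eyed = [100, 9900]   # counts: any mouth type, blue eyes

for label, token in [("brown eyes", brown_eyed), ("blue eyes", blue_eyed)]:
    print(label,
          "unweighted:", weighted_geometric_mean(token, equal_weights),
          "normalized:", weighted_geometric_mean(token, trait_weights))
```

The brown-eyed token measures 100 under either weighting because both of its counts are 100. For the blue-eyed token, the normalized measurement is pulled towards the mouth count, since the mouth trait carries the larger weight.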

### Collection Characteristics

Outrank have developed two useful tools to summarize the characteristics of an entire PFP collection:

**Bar Code:** The first is a graphical representation of the difference between the maximum and minimum attribute frequencies. That is, for each token we find its most common trait and its rarest trait and take the difference between the two frequencies. A scatter plot of this difference against the unique token ID is called the bar code. The height of the top bar determines how divergent the rarity scores corresponding to different Pythagorean means can be for this collection. The maximum height is 1.0 and the minimum is 0.0. Most collections have several bars. Tokens corresponding to a point in a higher bar may be ranked very differently depending on the metric used (highly discordant ranking). Tokens with points in lower bars have less discordant rankings because there is much less difference between the max and the min limits of the unweighted power means.
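The quantity plotted in the bar code is simple to compute. The sketch below uses a hypothetical helper name and made-up tokens, and leaves out the plotting step. For each token it returns the difference between the largest and smallest attribute frequency.

```python
def barcode_heights(token_counts, collection_size):
    """For each token, the most common attribute frequency minus the rarest one."""
    return [(max(counts) - min(counts)) / collection_size for counts in token_counts]

# Hypothetical collection of m = 10,000 tokens; each inner list holds one
# token's attribute counts.
m = 10_000
token_counts = [[100, 100], [100, 9900], [5000, 5000]]

heights = barcode_heights(token_counts, m)
print(heights)  # [0.0, 0.98, 0.0]
```

Plotting `heights` against the token ID would give the scatter plot described above. The second token sits in a high bar, so its rank is likely to vary with the metric used.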

**QR Code:** Rankings diverge depending on the metric used, with the greatest difference apparent between the harmonic and arithmetic ranks. The QR code is a scatter plot where each point again represents a token, but the x-coordinate is its arithmetic rank, and the y-coordinate is its harmonic rank. The tokens lying on or near the purple dashed line have similar ranks according to both metrics. Those that lie far above (below) the line are less (more) common under the harmonic rank than they are under the arithmetic rank.
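The coordinates behind the QR code can be sketched as follows, using hypothetical helper names and three made-up tokens. Each token becomes a point whose x-coordinate is its arithmetic rank and whose y-coordinate is its harmonic rank, with both ranked low to high.

```python
def ranks(values):
    """Rank low to high: rank 1 goes to the smallest value (ties not handled)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result

def arithmetic_mean(counts):
    return sum(counts) / len(counts)

def harmonic_mean(counts):
    return len(counts) / sum(1 / c for c in counts)

# Hypothetical attribute counts for three tokens.
tokens = [[10, 9000, 9000], [2000, 2000, 2000], [3000, 3000, 3000]]

x = ranks([arithmetic_mean(t) for t in tokens])
y = ranks([harmonic_mean(t) for t in tokens])
points = list(zip(x, y))
print(points)  # [(3, 1), (1, 2), (2, 3)]
```

The first token has one very rare attribute alongside two common ones, so it lands far from the diagonal. The harmonic mean is dominated by the rare attribute, whereas the arithmetic mean is dominated by the common ones.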

### Independence

Tests for independence of traits use a simple chi-squared test and the Cramer’s V statistic. The first table below shows the chi-squared statistics above the diagonal and their 95% critical values (which depend on the numbers of trait values) below the diagonal. Trait pairs are only independent when the number above the diagonal is less than the corresponding number below the diagonal.

The second table reports Cramer’s V statistics, which allow comparison between different trait pairs when the contingency tables behind the chi-squared tests have fewer than 5 elements in some cells, which makes the independence test less reliable. Because they are robust to low cell counts in contingency tables, the Cramer’s V statistics are especially useful for trait pairs that are only marginally non-independent. If their Cramer’s V is close to the values corresponding to independent trait pairs, we comfortably regard these traits as independent too.
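Both statistics can be computed from a contingency table of two traits. The sketch below is a plain-Python illustration using the textbook formulas, not Outrank's implementation. The function names and the example tables are hypothetical, and it assumes every row and column total is positive.

```python
from math import sqrt

def chi_squared(table):
    """Pearson chi-squared statistic for a contingency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected cell count if the two traits were independent.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat

def cramers_v(table):
    """Cramer's V: sqrt(chi2 / (n * (min(rows, cols) - 1))), between 0 and 1."""
    n = sum(sum(row) for row in table)
    k = min(len(table), len(table[0])) - 1
    return sqrt(chi_squared(table) / (n * k))

# Rows: values of one trait; columns: values of another (made-up counts).
independent = [[20, 30], [40, 60]]   # cell counts proportional to the margins
dependent = [[50, 0], [0, 50]]       # one trait fully determines the other

print(chi_squared(independent), cramers_v(independent))  # 0.0 0.0
print(chi_squared(dependent), cramers_v(dependent))      # 100.0 1.0
```

The chi-squared statistic is compared against the critical value for $(r-1)(c-1)$ degrees of freedom, where $r$ and $c$ are the numbers of values of the two traits. For a 2×2 table, the 95% critical value is about 3.84.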

Of all the rarity metrics in the public domain there is only one that could be correct, and even this is only correct for certain collections. To be precise, independence of all trait value distributions implies the geometric mean rarity metric is correct. Lack of trait value independence implies that none of the metrics currently in the public domain are adequate and that there is only one universal, mathematically sound approach, which is based on combinatorial algebra.
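The link between independence and the geometric mean can be sketched in one line. Here $f_i = m_i / m$ denotes the frequency of the token's attribute for trait $i$ (notation introduced only for this illustration). If the traits are independent, the probability of the token's full combination of attributes factorizes into the product of the individual frequencies:

$$P(\text{token's attribute combination}) = \prod_{i=1}^{n} f_i = \left( \Big( \prod_{i=1}^{n} f_i \Big)^{1/n} \right)^{n}$$

The right-hand side is a strictly increasing function of the geometric mean of the frequencies, so ranking by the geometric mean is the same as ranking by this joint probability. Without independence, the product of frequencies no longer equals the joint probability and the argument fails.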

### Summary

Outrank have developed a unified solution which resolves the discrepancies between the many different rarity metrics currently in the public domain. While there is so much disagreement on ranks, the relationship between price and rarity will be impossible to model.

All the metrics currently available can be subsumed into a single formula for the weighted power mean. But only the weighted geometric mean has correct statistical foundations. The other metrics can only be regarded as ad-hoc distance metrics, on which there is no prospect of universal agreement.

Even the geometric mean is not statistically correct unless the trait type distributions are independent. Outrank provide independence tests which, if failed, would require the collection to be scored and ranked according to our new universal rarity metric, coming soon.

Please access the academic paper here: https://c765i-5iaaa-aaaap-qbo7q-cai.icp0.io/Non_Rarity_Metrics_for_Non_Fungible_Tokens.pdf
