Understanding NFT Rarity Calculations

Nonsensical methods are pushed by multiple platforms that mislead investors about the true value of an NFT

Personal Profile Picture (PFP) collections are traded in numerous marketplaces where the relative price of a PFP within its collection should be somehow linked to the rarity of the individual PFP’s attributes. However, there is no consensus on even the proper definition of rarity, let alone on how it should be measured. Descriptions of the various methods currently in the public domain are mired in confusion because mathematical representations are poorly presented. Moreover, the price-rarity relationship is blurred by tokens being ranked completely differently depending on the rarity metric used by the marketplace.

Outrank have a mathematical framework which unifies all the rarity metrics in the public domain into one simple formula. This includes the OpenRarity open-source code which gives a ranking identical to that obtained using the geometric mean or, equivalently, the product of trait frequencies. Our formula also encompasses the Jaccard distance proposed by NFTGo as a special case, which is equivalent to the arithmetic mean ranking. Many other rarity analytics providers use a metric that is equivalent to the harmonic mean. Apart from the simple ‘rarest trait’ metric, every PFP rarity metric in the public domain provides a ranking that is identical to one of these three so-called Pythagorean means.

There are only four different rarity metrics in the public domain, as shown in the table below. When the model described on a provider's website yields a ranking identical to that of a Pythagorean mean, we indicate this using ✔. Some sites, like Rarity.tools, provide more than one ranking model. The * indicates the ability to add or remove trait normalization.

  • Harmonic is the reciprocal of the arithmetic average of the reciprocals of the attribute counts

  • Geometric is the n-th root of the product of the n attribute counts

  • Arithmetic is the average of all the attribute counts
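As a minimal sketch of these definitions, the snippet below computes the three Pythagorean means of a token's attribute counts. The toy collection, the trait names and the helper functions are assumptions made for illustration only; they are not taken from any marketplace or from Outrank's own code.

```python
from collections import Counter
from statistics import geometric_mean, harmonic_mean, mean

# Hypothetical toy collection: each token maps trait type -> trait value.
tokens = [
    {"background": "blue", "hat": "cap"},
    {"background": "blue", "hat": "crown"},
    {"background": "red", "hat": "cap"},
    {"background": "blue", "hat": "cap"},
]

# Count how many tokens carry each (trait type, trait value) pair.
counts = Counter((t, v) for token in tokens for t, v in token.items())

def attribute_counts(token):
    """Return the collection-wide count of each attribute the token has."""
    return [counts[(t, v)] for t, v in token.items()]

def pythagorean_means(token):
    """Harmonic, geometric and arithmetic means of the attribute counts."""
    c = attribute_counts(token)
    return harmonic_mean(c), geometric_mean(c), mean(c)

for i, token in enumerate(tokens):
    h, g, a = pythagorean_means(token)
    print(f"token {i}: harmonic={h:.3f} geometric={g:.3f} arithmetic={a:.3f}")
```

For the second token the attribute counts are 3 and 1, so the three means differ: the harmonic mean is pulled closest to the rare attribute and the arithmetic mean is pulled least.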

Rarity Scores and Ranks

It does not matter whether we define the Pythagorean means on frequencies or on counts: the score is the same.

The figure below compares the densities of rarity scores for the harmonic, geometric and arithmetic mean rarity metrics. Note that the natural ordering of Pythagorean means is lost once raw measurements are converted to scores. The shapes are diverse, depending on both the collection and the rarity metric. In some collections the score densities are all similar and do not depend much on the metric used. In others, very different scores can be obtained, depending on whether one uses the harmonic, geometric or arithmetic mean.

Scores are usually converted to ranks, so that the token ranked 1 has the lowest score and the token ranked m has the highest; that is, they rank low to high. This throws away much valuable information. Moreover, ranks are invariant under strictly increasing transformations. For instance, it does not matter whether you use the geometric mean or just the product of counts (or frequencies): the ranking remains the same. Some analytics sites use a strictly decreasing transformation of a Pythagorean mean, but they then rank high to low, so the result is the same. It is this logic that underpins our classification in the table above.
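The invariance argument can be checked numerically. The sketch below uses made-up attribute counts for five tokens in a collection of assumed size m = 100, each with the same number of traits. It ranks them by the geometric mean, by the raw product of counts, by the product of frequencies, and by a decreasing transformation ranked high to low. The `rank` helper is hypothetical.

```python
import math

# Hypothetical attribute counts for five tokens in a collection of m = 100.
m = 100
token_counts = [
    [50, 10, 5],
    [50, 40, 30],
    [2, 60, 45],
    [20, 20, 20],
    [1, 90, 80],
]

def rank(scores, high_to_low=False):
    """Rank 1 goes to the lowest score (or the highest if high_to_low)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i],
                   reverse=high_to_low)
    ranks = [0] * len(scores)
    for position, i in enumerate(order, start=1):
        ranks[i] = position
    return ranks

geometric = [math.prod(c) ** (1 / len(c)) for c in token_counts]
product_of_counts = [math.prod(c) for c in token_counts]
product_of_freqs = [math.prod(x / m for x in c) for c in token_counts]
# A strictly decreasing transformation of the product.
inverse_product = [1 / p for p in product_of_counts]

ranks_geometric = rank(geometric)
ranks_product = rank(product_of_counts)
ranks_freqs = rank(product_of_freqs)
ranks_inverse = rank(inverse_product, high_to_low=True)
print(ranks_geometric, ranks_product, ranks_freqs, ranks_inverse)
```

All four rankings coincide, even though the scores themselves are on very different scales.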

Weighted Power Means

Trait Normalization

Collection Characteristics

Outrank have developed two useful tools to summarize the characteristics of an entire PFP collection:

Bar Code: The first is a graphical representation of the difference between the maximum and minimum attribute frequencies. That is, for each token we find its most common trait and its rarest trait and take the difference between the two frequencies. A scatter plot of this difference against the unique token ID is called the bar code. The height of the top bar determines how divergent the rarity scores corresponding to different Pythagorean means can be for this collection. The maximum height is 1.0 and the minimum is 0.0. Most collections have several bars. Tokens corresponding to a point in a higher bar may be ranked very differently depending on the metric used (highly discordant ranking). Tokens with points in lower bars have less discordant rankings because there is much less difference between the max and the min limits of the unweighted power means.
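A minimal sketch of the bar code calculation, under the assumption that a collection is stored as a mapping from token ID to a dictionary of traits. The data and the names used here are illustrative only.

```python
from collections import Counter

# Hypothetical collection: token ID -> {trait type: trait value}.
collection = {
    1: {"background": "blue", "hat": "cap", "eyes": "laser"},
    2: {"background": "blue", "hat": "cap", "eyes": "plain"},
    3: {"background": "blue", "hat": "crown", "eyes": "plain"},
    4: {"background": "red", "hat": "cap", "eyes": "plain"},
}
m = len(collection)

# Frequency of each (trait type, trait value) pair across the collection.
counts = Counter((t, v) for traits in collection.values()
                 for t, v in traits.items())

def frequency_spread(traits):
    """Most common attribute frequency minus rarest attribute frequency."""
    freqs = [counts[(t, v)] / m for t, v in traits.items()]
    return max(freqs) - min(freqs)

# One (token ID, spread) point per token; plotting these gives the bar code.
bar_code = [(token_id, frequency_spread(traits))
            for token_id, traits in sorted(collection.items())]
print(bar_code)
```

Token 2 has three equally common attributes, so its point sits at height 0. The other tokens each combine a common attribute with a rare one and sit in a higher bar.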

QR Code: Rankings diverge depending on the metric used, with the greatest difference apparent between the harmonic and arithmetic ranks. The QR code is a scatter plot where each point again represents a token, but the x-coordinate is its arithmetic rank, and the y-coordinate is its harmonic rank. The tokens lying on or near the purple dashed line have similar ranks according to both metrics. Those that lie far above (below) the line are less (more) common under the harmonic rank than they are under the arithmetic rank.
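The coordinates of the QR code can be sketched as follows. The attribute counts are invented for illustration, the scores are taken to be the plain means of the counts ranked low to high (an assumption, since the exact score transformation is not given here), and the `rank_low_to_high` helper is hypothetical.

```python
from statistics import harmonic_mean, mean

# Hypothetical attribute counts per token.
token_counts = {
    "A": [1, 99],
    "B": [40, 40],
    "C": [30, 66],
    "D": [10, 20],
}

def rank_low_to_high(scores):
    """Map each token to its rank, with rank 1 for the lowest score."""
    ordered = sorted(scores, key=scores.get)
    return {token: position for position, token in enumerate(ordered, start=1)}

arithmetic_rank = rank_low_to_high({t: mean(c) for t, c in token_counts.items()})
harmonic_rank = rank_low_to_high({t: harmonic_mean(c) for t, c in token_counts.items()})

# Each token becomes one point: x = arithmetic rank, y = harmonic rank.
points = {t: (arithmetic_rank[t], harmonic_rank[t]) for t in token_counts}
print(points)
```

Token A, which pairs a very rare attribute with a very common one, lands far from the diagonal, while the other tokens stay within one rank of it.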

Independence

Tests for independence of traits use a simple chi-squared test and the Cramer’s V statistic. The first table below shows the chi-squared statistics above the diagonal and their 95% critical values (which depend on the numbers of trait values) below the diagonal. Trait pairs are only independent when the number above the diagonal is less than the corresponding number below the diagonal.

The second table reports Cramer’s V statistics, which allow comparison between different trait pairs even when the contingency tables behind the chi-squared tests have fewer than 5 observations in some cells, a situation that makes the independence test less reliable. Because they are robust to low cell counts in contingency tables, the Cramer’s V statistics are especially useful for trait pairs that are only marginally non-independent. If their Cramer’s V is close to the values corresponding to independent trait pairs, we comfortably regard these traits as independent too.
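For readers who want to reproduce this kind of test, here is a minimal pure-Python sketch of the Pearson chi-squared statistic and Cramer's V for one pair of traits. The function name and the sample data are assumptions for illustration; 3.841 is the standard 95% critical value of the chi-squared distribution with 1 degree of freedom.

```python
import math
from collections import Counter

def chi_squared_and_cramers_v(pairs):
    """pairs holds one (value of trait A, value of trait B) tuple per token."""
    n = len(pairs)
    observed = Counter(pairs)
    row_totals = Counter(a for a, _ in pairs)
    col_totals = Counter(b for _, b in pairs)

    chi2 = 0.0
    for a, row_total in row_totals.items():
        for b, col_total in col_totals.items():
            expected = row_total * col_total / n  # count under independence
            chi2 += (observed[(a, b)] - expected) ** 2 / expected

    r, c = len(row_totals), len(col_totals)
    dof = (r - 1) * (c - 1)
    k = min(r - 1, c - 1)
    v = math.sqrt(chi2 / (n * k)) if k > 0 else 0.0
    return chi2, dof, v

# Hypothetical data: hat is fully determined by background.
dependent = [("blue", "cap")] * 10 + [("red", "crown")] * 10
# Hypothetical data: every combination is equally likely.
independent = [(bg, hat) for bg in ("blue", "red")
               for hat in ("cap", "crown")] * 5

CRITICAL_95_DOF_1 = 3.841  # 95% critical value for 1 degree of freedom

for name, data in (("dependent", dependent), ("independent", independent)):
    chi2, dof, v = chi_squared_and_cramers_v(data)
    rejected = chi2 > CRITICAL_95_DOF_1
    print(f"{name}: chi2={chi2:.3f} dof={dof} V={v:.3f} reject={rejected}")
```

The critical value depends on the degrees of freedom, so a collection with more trait values per trait type needs a different threshold than the one hard-coded here.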

Of all the rarity metrics in the public domain there is only one that could be correct, and even this is only correct for certain collections. To be precise, independence of all trait value distributions implies that the geometric mean rarity metric is correct. Lack of trait value independence implies that none of the metrics currently in the public domain is adequate, and that there is only one universal, mathematically sound approach, which is based on combinatorial algebra.
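One way to read the independence claim is sketched below. The notation is an assumption introduced here, not taken from the paper: f_{i,k} is the collection-wide frequency of the value that token i has for trait type k, and n is the number of trait types.

```latex
% Under independence, the probability of token i's trait combination
% factorises into the marginal frequencies:
P(\text{token } i) = \prod_{k=1}^{n} f_{i,k}

% The geometric mean of the same frequencies is a strictly increasing
% function of that probability:
G_i = \Bigl( \prod_{k=1}^{n} f_{i,k} \Bigr)^{1/n} = P(\text{token } i)^{1/n}

% Hence ranking by G_i is the same as ranking by P(token i).
```

Under this reading, the geometric mean ranks tokens by how likely their trait combination is, and that justification disappears once the factorisation in the first line fails.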

Summary

Outrank have developed a unified solution which resolves the discrepancies between the many different rarity metrics currently in the public domain. While there is so much disagreement on ranks, the relationship between price and rarity will be impossible to model.

All the metrics currently available can be subsumed into a single formula: the weighted power mean. But only the weighted geometric mean has correct statistical foundations. The other metrics can only be regarded as ad-hoc distance metrics, on which there is no prospect of universal agreement.
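The standard textbook form of the weighted power mean is M_p = (Σ w_k x_k^p)^(1/p), with weights summing to one and the geometric mean as the limit p → 0. The sketch below implements that general form; it is not Outrank's own code, and the function name, weights and sample values are illustrative assumptions.

```python
import math

def weighted_power_mean(values, weights, p):
    """Weighted power mean of positive values with exponent p."""
    total = sum(weights)
    w = [x / total for x in weights]  # normalise weights to sum to 1
    if p == 0:
        # Limit p -> 0: the weighted geometric mean.
        return math.exp(sum(wi * math.log(v) for wi, v in zip(w, values)))
    return sum(wi * v ** p for wi, v in zip(w, values)) ** (1 / p)

# Hypothetical attribute counts for one token, equally weighted.
counts = [3, 1]
weights = [1, 1]

harmonic = weighted_power_mean(counts, weights, -1)
geometric = weighted_power_mean(counts, weights, 0)
arithmetic = weighted_power_mean(counts, weights, 1)
print(harmonic, geometric, arithmetic)
```

With equal weights, the exponents p = -1, 0 and 1 recover the harmonic, geometric and arithmetic means respectively. As p decreases towards minus infinity the power mean approaches the smallest value, which corresponds to the 'rarest trait' metric.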

Even the geometric mean is not statistically correct unless the trait type distributions are independent. Outrank provide independence tests which, if failed, would require the collection to be scored and ranked according to our new universal rarity metric, which is coming soon.

Please access the academic paper here: https://c765i-5iaaa-aaaap-qbo7q-cai.icp0.io/Non_Rarity_Metrics_for_Non_Fungible_Tokens.pdf
