It was observed more than a century ago that the first digits of real-world numbers are not uniformly distributed: numbers with lower first digits (1, 2, ...) occur more frequently than those with higher first digits (..., 8, 9).
The astronomer Simon Newcomb first proposed this idea in 1881, after noticing that the pages of logarithm tables were more heavily thumbed for lower digits than for higher ones. He supported the observation with a mathematical formula predicting the distribution of first digits, arguing that the pattern arose because scientists more often needed to look up the logarithms of numbers beginning with smaller digits than with larger ones.
Later, the phenomenon was rediscovered by the physicist Frank Benford, who described it in a 1938 article titled "The Law of Anomalous Numbers." It is now known as Benford's law, the law of anomalous numbers or the first-digit law. A set of numbers is said to satisfy Benford's law if the leading digit d (d ∈ {1, ..., 9}) occurs with probability P(d) = log(1 + 1/d), where log is the base-10 logarithm.
The leading digits in such a set thus have the following distribution: 1 (30.1 percent), 2 (17.6 percent), 3 (12.5 percent), 4 (9.7 percent), 5 (7.9 percent), 6 (6.7 percent), 7 (5.8 percent), 8 (5.1 percent) and 9 (4.6 percent), rounded to one decimal place. Benford's law also predicts the distribution of second, third and subsequent digits, as well as digit combinations.
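For readers who prefer code to tables, the expected first-digit probabilities can be computed directly from the formula above. The short Python sketch below is only illustrative; the figures it prints match the rounded percentages listed here.

```python
import math

# Expected first-digit probabilities under Benford's law: P(d) = log10(1 + 1/d)
for d in range(1, 10):
    p = math.log10(1 + 1 / d)
    print(f"first digit {d}: {p * 100:.1f} percent")
```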
In 1995, the phenomenon was given a rigorous mathematical explanation by Theodore P. Hill, whose proof rests on the fact that data series following Benford's law are, in effect, "second-generation" distributions, i.e., combinations of other distributions. Hill, now a retired professor of mathematics at Georgia Tech in Atlanta, told Reuters in 2020 that Benford's law would not prove beyond a reasonable doubt that fraud had occurred, adding that it is only a red-flag test that can raise doubts.
Fewster (2009) notes that publications on Benford's law have grown in recent years, mainly covering tests of the law in various data sources, fraud detection, computer science applications and new probability theorems. Fewster also observes that the actual cause of Benford's law is difficult to pinpoint.
The law has been found to apply to a wide range of datasets, from countries' populations to financial data, physical constants and earthquakes. It can also help spot possibly fake or anomalous data points: for instance, it may be a sign of number manipulation if a company's financial figures do not match the expected distribution of first digits. Benford's law has likewise been applied in forensic accounting, auditing and election monitoring to look for potential electoral fraud. However, as Reuters reported in 2020, another point of view holds that, for elections, deviation from Benford's law does not prove fraud.
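As a rough illustration of how such a screening test might look in practice, the sketch below compares the first-digit counts of a set of figures with the counts Benford's law would predict, using a chi-square goodness-of-fit statistic. The sample values and the first_digit helper are made up for illustration, and a large deviation is only a red flag, not proof of manipulation.

```python
import math
from collections import Counter
from scipy.stats import chisquare

def first_digit(x):
    # Leading significant digit of a nonzero number (hypothetical helper).
    s = f"{abs(x):.15e}"  # scientific notation, e.g. "1.023500000000000e+03"
    return int(s[0])

# Made-up figures standing in for a company's reported amounts.
figures = [1023.5, 1894.0, 2210.7, 318.4, 1450.0, 962.3, 118.9, 274.6, 1733.2, 489.1]

counts = Counter(first_digit(x) for x in figures)
observed = [counts.get(d, 0) for d in range(1, 10)]

# Expected counts under Benford's law: N * log10(1 + 1/d).
n = len(figures)
expected = [n * math.log10(1 + 1 / d) for d in range(1, 10)]

stat, p_value = chisquare(observed, f_exp=expected)
print("chi-square statistic:", round(stat, 2), " p-value:", round(p_value, 3))
```

In real screening work one would use far more than ten figures; with a tiny sample, the expected counts are too small for the chi-square approximation to be trustworthy.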
Kaiser's analysis, published in the Journal of Economic Surveys in 2019, indicates that although income data generally obey Benford's law, almost all of the data sets examined show substantial discrepancies from it, which the author interprets as a strong indicator of reliability issues in the underlying surveys. In other words, survey data may not always be reliable.
Research by Sambridge et al. produced the first identification of an anomalous seismic disturbance from first-digit information alone; the event was later determined to be a tiny local Canberra earthquake. Out of curiosity, I tested recent data on more than 200,000 earthquakes recorded in one particular region of the planet. I found that the most frequent first digits for depth are 7, 6 and 1, which does not conform to Benford's law. For earthquake magnitude, the first-digit frequencies are 1 (66.9 percent), 2 (27.9 percent), 3 (4.47 percent), 4 (0.7 percent), 5 (0.057 percent), 6 (0.04 percent), 7 (0.0009 percent), and so on. The sequence is thus decreasing, as in Benford's law, even though the distribution is not exactly the same.
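A check of this kind can be reproduced with a few lines of code. The sketch below tabulates the first significant digits of a column of values and prints the observed share of each digit next to the Benford expectation; the short list of magnitudes and the leading_digit helper are placeholders, not the actual catalog analyzed above.

```python
import math
from collections import Counter

def leading_digit(x):
    # First significant digit of a positive value (placeholder helper).
    while x >= 10:
        x /= 10
    while x < 1:
        x *= 10
    return int(x)

# Placeholder magnitudes; in practice this would be a column from an earthquake catalog.
magnitudes = [1.2, 1.8, 2.4, 1.1, 3.0, 1.5, 2.2, 1.9, 1.3, 2.7, 1.6, 4.1]

counts = Counter(leading_digit(m) for m in magnitudes)
total = len(magnitudes)

print("digit  observed  Benford")
for d in range(1, 10):
    observed = 100 * counts.get(d, 0) / total
    benford = 100 * math.log10(1 + 1 / d)
    print(f"{d}      {observed:5.1f}%   {benford:5.1f}%")
```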
I am not a seismologist but, using common sense, possible explanations include: (a) structural features of the earth at depths of 6 to 7 kilometers in this region that result in high seismic activity; (b) earthquakes at greater depths that go undetected because of technological limitations and/or small magnitudes and are therefore not included in the dataset (earthquakes occur in the crust or upper mantle, which extends from the earth's surface to about 800 kilometers deep); (c) errors in the dataset, which would call its reliability into question; (d) measurement errors related to the technologies used; or (e) something else.
Researchers can evaluate and interpret their findings more accurately if they are aware of the expected distribution of first, second and subsequent digits in a specific dataset. In some cases, a deviation of this distribution from Benford's law may be a sign of biased data collection or processing. Conversely, if the distribution does follow the law, this can help researchers find underlying patterns or relationships in the data.
Readers interested in testing their own data can use online Benford's law calculators such as https://benfords-law.netlify.app/, https://www.dcode.fr/benford-law and https://ezcalc.me/benfords-law-calculator, or do the computation themselves in Excel or Google Sheets.
Rushan Ziatdinov (www.ziatdinov-lab.com) is a professor in the Department of Industrial Engineering at Keimyung University, Daegu. He can be reached at ziatdinov.rushan@gmail.com.