As a producer of digital forensic software, we are regularly learning more about forensic methods and tasks. Recently I came across a curious article (and video) in Business Insider called “How forensic accountants use Benford’s Law to detect fraud”.
The video states that forensic accountants can use Benford’s Law to analyze financial data and identify red flags.
This sounds interesting because it is very easy to check whether a set of data obeys this law. Let’s do some checks.
Benford’s Law is a phenomenological law about the frequency distribution of leading digits in many real-life sets of numerical data. Mathematically it can be written as
$$P(d) = \log_{10}\left(1 + \frac{1}{d}\right)$$
where
d is the leading digit,
P(d) is the probability that a number begins with d.
For d from 1 to 9, we can calculate probabilities as follows:
1: 30.1%, 2: 17.6%, 3: 12.5%, 4: 9.7%, 5: 7.9%, 6: 6.7%, 7: 5.8%, 8: 5.1%, 9: 4.6%
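These probabilities can be reproduced with a few lines of Python. This is only a minimal sketch of the formula P(d) = log10(1 + 1/d); the function name `benford_probability` is made up for illustration.

```python
import math


def benford_probability(d: int) -> float:
    """Expected share of numbers whose leading digit is d (1..9)."""
    if not 1 <= d <= 9:
        raise ValueError("leading digit must be between 1 and 9")
    return math.log10(1 + 1 / d)


if __name__ == "__main__":
    # Print the expected share for every possible leading digit.
    for d in range(1, 10):
        print(f"{d}: {benford_probability(d):.1%}")
```

The nine values sum to 1, with digit 1 expected in roughly 30% of cases and digit 9 in fewer than 5%.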
It is stated in the article that Benford’s Law works well for values that result from multiplication, division and raising to a power, operations which are common in financial calculations.
Microsoft Excel lets us verify this with little effort. I will multiply two random values and divide the product by another random value: RANDBETWEEN(1,99)*RANDBETWEEN(1,99)/RANDBETWEEN(1,99)
and calculate the distribution of leading digits for 500 values.
Great! This really looks like the Benford’s Law distribution, and it may really help in a forensic investigation when analyzing calculated values. If you are interested in how I extracted the first digit from a number, it’s very easy:
NUMBERVALUE(LEFT(TEXT(A2,0),1))
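For readers who prefer a script to a spreadsheet, the same experiment can be sketched in Python. This is an illustration under assumptions, not the workbook used above: the names `first_digit` and `simulate` are hypothetical, and `random.randint(1, 99)` stands in for RANDBETWEEN(1,99). One difference is worth noting. As far as I can tell, TEXT(A2,0) rounds the value to a whole number before the first character is taken, so a value below 0.5 would give 0 and a value like 9.6 would give 1. The sketch below takes the first significant digit instead.

```python
import random
from collections import Counter
from decimal import Decimal


def first_digit(value) -> int:
    """Return the first non-zero (significant) digit of a non-zero number."""
    if value == 0:
        raise ValueError("zero has no leading digit")
    # Decimal(str(x)) keeps the digits exactly as Python prints the number.
    digits = Decimal(str(value)).as_tuple().digits
    return next(d for d in digits if d != 0)


def simulate(n: int = 500, seed=None) -> Counter:
    """Count leading digits of n values of the form a * b / c, each drawn from 1..99."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n):
        value = rng.randint(1, 99) * rng.randint(1, 99) / rng.randint(1, 99)
        counts[first_digit(value)] += 1
    return counts


if __name__ == "__main__":
    counts = simulate(500)
    for d in range(1, 10):
        print(f"{d}: {counts[d] / 500:.1%}")
```

Because the values are random, the printed shares differ from run to run; pass a `seed` to make a run repeatable.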
But the article’s video starts with the statement that the population of every US county obeys this law as well. Although I understand that we can predict demographic values using math formulas, this sounds weird. Since the population of the US obeys this law, I guess this should work for the world population too. Let’s check it.
I was skeptical about the result, but it really looks like the Benford’s Law distribution!
OK, if it works with population, maybe it will work with similar entities that are specific to digital forensics. First, I thought about event logs, e.g. the number of events in a log. But obviously it won’t work, because event logs are limited in size. In my example, as the event log grows, the first digit will always be ‘2’ until you clear the log.
Moreover, Windows commonly uses only a few event logs, so we won’t get any statistical significance. And of course Benford’s Law won’t work if we test Event ID numbers: they are purely artificial, and it’s up to the developer to choose an identifier for an event.
But what about files? What if we test it with file sizes: will it work? If it works for test samples, we could suppose that when the files in an investigation case don’t obey the law, someone may have removed many files. I will test it with the C:\Windows folder and its subfolders on my PC (it contains more than 100 000 files). By the way, since Windows keeps event log files in C:\Windows\System32\winevt\Logs and these files are limited in size, they may affect the statistics. Fortunately, there are only about 150 files in the Logs folder, so this effect is not significant. We will take the size of every file in kilobytes.
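A folder scan like this can be sketched with the Python standard library. The details are assumptions, since the post does not say how the sizes were collected: here each size is rounded up to whole kilobytes, empty and unreadable files are skipped, and the function name `leading_digit_counts` is hypothetical.

```python
import os
from collections import Counter


def leading_digit_counts(root: str) -> Counter:
    """Count leading digits of file sizes (in KB) under root, including subfolders."""
    counts = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size_bytes = os.path.getsize(path)
            except OSError:
                # Locked, vanished or otherwise unreadable files are skipped.
                continue
            # Assumption: round up to whole kilobytes; empty files stay at 0.
            size_kb = -(-size_bytes // 1024)
            if size_kb == 0:
                continue
            counts[int(str(size_kb)[0])] += 1
    return counts
```

Calling `leading_digit_counts(r"C:\Windows")` returns a counter keyed by digits 1 to 9, and dividing each count by the total gives shares to compare with the expected ones. A different rounding rule, or using bytes instead of kilobytes, would change individual leading digits, so it is worth trying more than one.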
Well, it is really similar to the Benford’s Law distribution. But one sample is not enough to draw conclusions. It is interesting that in all my cases we can see a small hump at digits 7 or 8.
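Judging similarity by eye only goes so far. One common way to put a number on it is Pearson’s chi-square statistic, which is not something used in the tests above, so treat the following as an optional sketch. The function name `benford_chi_square` is hypothetical, and `counts` is assumed to map each leading digit (1 to 9) to how often it was observed.

```python
import math


def benford_chi_square(counts) -> float:
    """Pearson chi-square statistic of observed leading-digit counts vs Benford's Law."""
    total = sum(counts.get(d, 0) for d in range(1, 10))
    if total == 0:
        raise ValueError("no observations")
    stat = 0.0
    for d in range(1, 10):
        expected = total * math.log10(1 + 1 / d)  # count predicted by Benford's Law
        observed = counts.get(d, 0)
        stat += (observed - expected) ** 2 / expected
    return stat
```

With nine digits there are 8 degrees of freedom, so a statistic above roughly 15.5 would usually be read as a significant deviation at the 5% level. With very large samples, such as 100 000 files, even small humps can push the statistic over that threshold, so the number is a hint rather than a verdict.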
Let us summarize what we found here. Benford’s Law works for large volumes of data that are the result of certain calculations, like financial results. It works with some statistical data. It may work for some specific digital data: although my test shows that it works for file sizes, it should be rechecked with other data sets. It will probably work with other digital data that a digital forensic investigator may get as evidence. However, this method won’t work for neat frauds. If only a small part of the data is modified, the distribution won’t show any significant deviation. In the case of financial fraud, if a malefactor modified the original data and recalculated all the results, I suppose that the new data set will obey the law.
Therefore, like any forensic instrument, Benford’s Law has its own applications, but it is not a panacea.