Today In Data Mining? Maybe?


Finance professor Jialan Wang won the Internet today with a beautiful note on Benford's law in US accounting data (for completeness of her victory see here, here, here, here, and here).

Here's the argument. Benford's Law is a statistical regularity that applies to many collections of numbers of differing orders of magnitude. As Wang writes:

A second earth-shattering fact is that there are more numbers in the universe that begin with the digit 1 than 2, or 3, or 4, or 5, or 6, or 7, or 8, or 9. And more numbers that begin with 2 than 3, or 4, and so on. This relationship holds for the lengths of rivers, the populations of cities, molecular weights of chemicals, and any number of other categories.

The explanation generally seems linked with exponential growth, and the formula is P(d) = log10 (1 + 1/d). So the probability of a number starting with a 1 is log 2, or 30%; the probability of it starting with a 9 is log 1.11, or about 4.6%. Strong men have been driven mad peering into this abyss.
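If you want to check that arithmetic yourself, here is a minimal sketch of the formula above. The function name `benford_probability` is just an illustrative choice, not anything from Wang's note.

```python
import math


def benford_probability(d: int) -> float:
    """Probability that a number's first digit is d (1..9) under Benford's law."""
    if not 1 <= d <= 9:
        raise ValueError("leading digit must be between 1 and 9")
    return math.log10(1 + 1 / d)


# Print the expected share of each leading digit as a percentage.
for digit in range(1, 10):
    print(digit, f"{benford_probability(digit):.1%}")
```

The nine probabilities sum to 1, since the logs telescope to log10(10).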

Benford's law ought to hold for lots of kinds of financial data, particularly if you just take a big unsorted pile of stuff. So Wang took 50 years of various financial data (revenues, assets, and 41 other publicly reported categories) from 20,000 publicly reporting companies and just plotted the number of numbers that started with 1s, 2s, 3s ... etc. And it was a pretty good match to the Benford distribution:
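The counting step is simple enough to sketch. This is not Wang's code, just one plausible way to tally first digits; the helper names and the sample values are made up for illustration, and it assumes zeros get skipped and signs get ignored.

```python
from collections import Counter
from decimal import Decimal
from typing import Dict, Iterable, Optional


def first_digit(value) -> Optional[int]:
    """Leading nonzero digit of a number, ignoring sign; None for zero, NaN or infinity."""
    digits = Decimal(str(value)).as_tuple().digits
    return next((d for d in digits if d != 0), None)


def first_digit_shares(values: Iterable) -> Dict[int, float]:
    """Share of values starting with each digit 1..9 (values with no leading digit are skipped)."""
    counts = Counter(d for d in map(first_digit, values) if d is not None)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no values with a leading digit")
    return {d: counts[d] / total for d in range(1, 10)}


# Hypothetical line items, not real accounting data.
sample = [1200, 1.9e6, 0.042, -310, 17, 0, 95000]
print(first_digit_shares(sample))
```

Going through `Decimal(str(value))` avoids the rounding surprises you can get from formatting floats in scientific notation.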

So far so good. Now the bad news: the relationship has been moving away from a Benford distribution over time.

That chart shows the sum of squared deviations, so 0.01 means that the average digit's share is off by about 3.3 percentage points, in one direction or the other, from what Benford's law predicts (that is the square root of 0.01/9). You can tell evocative stories about various points on that graph. Wang's stories include:

- Deviation in finance went up in 1981-1982, "coincident with two major deregulatory acts that sparked the beginnings of that other big mortgage debacle, the Savings and Loan Crisis." It peaked in 1988 and matched that level in 2008, corresponding with banking crises.
- Deviation in tech surged during the dotcom bubble.
- Deviation in tech and manufacturing did not decline around 1990, as it did in finance, "since neither industry experienced major fraud scandals during that period."
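For the chart's metric, here is a rough sketch of a sum-of-squared-deviations measure and of how a value of 0.01 maps to the 3.3% figure. It assumes the deviations are taken between observed and expected digit shares and that "average" means root-mean-square across the nine digits; Wang's exact calculation may differ, and the function names are illustrative.

```python
import math

# Expected share of each leading digit under Benford's law.
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def sum_squared_deviation(shares):
    """Sum over digits 1..9 of (observed share - Benford share) squared."""
    return sum((shares[d] - BENFORD[d]) ** 2 for d in range(1, 10))


def rms_digit_deviation(ssd):
    """Root-mean-square per-digit gap implied by a sum of squares spread over 9 digits."""
    return math.sqrt(ssd / 9)


# A chart value of 0.01 corresponds to a typical gap of about 0.033 per digit.
print(f"{rms_digit_deviation(0.01):.3f}")
```

Note that this is a gap in shares (percentage points), not a relative change in how often a digit appears.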

Her conclusion:

While these time series don't prove anything decisively, deviations from Benford's law are compellingly correlated with known financial crises, bubbles, and fraud waves. And overall, the picture looks grim. Accounting data seem to be less and less related to the natural data-generating process that governs everything from rivers to molecules to cities. Since these data form the basis of most of our research in finance, Benford's law casts serious doubt on the reliability of our results. And it's just one more reason for investors to beware.

The research is clever, simple, alarming, and just really really cool, and everyone seems pretty convinced. And my statistical knowledge is ... so-so. But I'm still a little skeptical. Partly it's that the stories seem a little cherry-picked. (Why did deviations go up for every industry in the early 1980s - banking deregulation? Why do ups and downs in manufacturing track those in tech so closely when manufacturing lacked a lot of the IPO-boom, options-backdating incentives to manipulate earnings that tech arguably had?)

But mostly I worry that the explanation seems light on mechanism. Benford's law has been used to spot fraud in corporate expense accounts, as well as in Enron and Greece. The idea is that people manipulate numbers in ways that aren't natural. A $0.09 EPS gets pushed up to $0.10. Totally made-up profits might as well be round or amusing numbers. So if a company's numbers deviate from Benford's law, that could suggest that that company is up to something suspicious.

But it seems like something that would wash out in aggregating 20,000 companies. A company with an expense account limit of $100 might see a lot of $99 receipts. A company with a $10 minimum for reimbursement might see a lot of $10.25 receipts. Greece might not want to admit to €300 billion in debt but will be cool with €299bn. Enron might prefer $0.10 to $0.09 EPS. Some of those choices will push up the numbers of 1s, 2s, ... etc.; some will push them down.

Aggregating 20,000 companies, even if they're all committing fraud, ought to wash out as long as they all have slightly different accounting policies, achieved slightly different actual results, and are committing slightly different frauds. One company's earnings time series is either an artifact of nature or something manipulated by crooks; 20,000 earnings time series, faked or not, are just their own collection of naturally occurring numbers.

I'm not aware of the literature here - though what I've found is all related to individual frauds perpetrated by individuals, or at least within one company with one expense policy - so, y'know, enlighten me if I'm missing something. Maybe the natural tendency of all manipulation is to increase 1s and 9s. And I don't have a better explanation for why the deviations from Benford's law are increasing. But I'm not yet ready to throw, um, every US public company in jail on the basis of these charts.

Benford's Law and the Decreasing Reliability of Accounting Data for US Firms [Studies in Everyday Life]


Today In Swiss Banks With Creepy But Defensible Structured Products

I don't really understand it but the TVIX thing is creepy fun. If you haven't followed it, Credit Suisse issued this exchange-traded note called TVIX that was a 2x levered bet on the VIX. They suspended new issuance about a month ago due to position limits, and people were just so damn excited to own the thing that its price crept up to 189% of its fair value, where "fair value" is a reasonably easily measurable thing based on the formula in the TVIX prospectus. Then last week Credit Suisse announced that they would be creating more units, and the price plummeted to and then through fair value, which is what you'd expect to happen. Except that it started plummeting a few hours before that announcement, which is Suspicious.

So of course people are sad and so there's a Bloomberg Brief with sort of sad-funny quotes like:

“When it started to fall, I bought more because I couldn’t believe how low it was going. I didn’t realize I was playing with a hand grenade.” – Michael Gamble [heh! - ed.], 67, who doubled down on his TVIX investment before the price collapsed.

Investors “all think: ‘Oh, I’ll just buy these things, I’ll be hedged against volatility and everything will be wonderful.’ And now they’ve seen the market goes down and their volatility protection goes down too, and they’re going ‘Hmm, what happened here?’ These people are going to have to pay a really expensive lesson.” – Larry McMillan, who manages $30 million as president of McMillan Analysis Corp.

So, yes, Larry, they are going to pay a really expensive lesson. But what is it? Stephen Lubben has a little thing in DealBook today where he frets: