Super Crunchers

Ian Ayres is a surprisingly engaging writer, taking what many would consider a very dry topic, statistics, and turning it into a thought-provoking but flawed book entitled Super Crunchers: Why Thinking-By-Numbers is the New Way To Be Smart.

From the opening pages, Ayres pits the “super crunchers” (people applying statistics to large data sets) against experts in an area, be it viticulture, baseball, or marketing. With barely suppressed glee he describes how number crunching out-predicts the experts time and time again. His point is that as collecting, storing and analysing large amounts of data becomes cheaper and cheaper, more and more decision-making will take the results of “super crunching” into account, with experts having either to step aside or to learn some statistical chops. To back his argument for the rise of “super crunching”, Ayres draws on a large number of examples from a variety of areas and even experiments with the technique himself, describing how he used it to help choose the title of his book.

Although I am more or less convinced by Ayres’ arguments, I found myself questioning his credibility in several places. I think the main reason was the tone of the book, which occasionally crosses the fine line separating “enthusiastic, popular account” from “overly simplistic, gushing rave”. The constant use of “super crunching” got on my nerves after a while, because it overemphasises the newness of what could as easily be called “statistical analysis”. Eventually I mentally replaced “super crunching” with the less sensational “statistical analysis” wherever I encountered it.

Conversely, Ayres constantly refers to “regression” when talking about the techniques analysts use to make predictions. At first, I thought this was a convenient shorthand for a range of techniques that he didn’t want to spend time distinguishing between. It was only when neural networks were described as “a newfangled competitor to the tried-and-true regression formula” and “an important contributor to the Super Crunching revolution” that I realised Ayres may not know as much about the nuts and bolts of computational statistics as I had first thought. This impression was confirmed when Ayres later confused “summary statistics” with “sufficient statistics” and talked tautologically of “binary bytes”.

Stylistically, there is too much foreshadowing and repetition of topics throughout the book for my liking. This feels a little condescending at times, as do his direct requests that the reader stop and think about a concept or problem at various points.

Overall, I wanted to like this book more than I did. It was a light, enjoyable read and I wholeheartedly agree with Ayres’ belief in the continuing importance of statistics in decision-making and his call to improve the average person’s intuition for statistics. Unfortunately, I found that much of “Super Crunchers” substitutes enthusiasm for coherence, and impressions and anecdote for any kind of meaningful argument.

Comments (3)

  1. ansate wrote:

    thanks for the review! I’d been thinking about reading this but hadn’t gotten to it. Sounds like he’s enthusiastic about the same things I am, but doesn’t add enough to the discussion to be worth it to us geeks who used these arguments in our grad school entrance essays.

    Sunday, September 28, 2008 at 12:31 am
  2. Great review!

    Tuesday, September 30, 2008 at 10:21 am
  3. Bob Carpenter wrote:

    I also thought about getting this book, so thanks for saving me some time. I was so turned off by the breathless style of The Numerati (another pop book about data mining) that I think I’ll wait a while before delving into another pop quant book.

    I believe the right question to ask is whether we need domain experts at all, or just need a whole lot of data.

    I think the answer’s pretty obvious. Even the basic structure of a statistical model entails a large degree of design in everything from setting up dependencies to selecting predictors.

    The most accurate natural language systems bring in all kinds of human-generated knowledge sources from labeled data for classifiers or part-of-speech taggers to domain-specific dictionaries to full-blown ontologies.

    I’d cut just about anybody slack for not sorting out all of our redundant terminology. I only just recently realized that so-called max entropy classifiers, logistic regression, and one-layer neural nets with sigmoid/softmax activation were the same thing, and that L1 norms, Laplace priors, double-exponential priors, and the “lasso” are the same thing.

    Thursday, December 18, 2008 at 5:18 am
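
The equivalences listed in the final comment can be checked numerically. The following is a minimal Python sketch, not anything taken from the book or from the commenters: the feature vector, the weights, and the prior scale b are made-up values, and the helper names are hypothetical. It illustrates two of the claims: a two-class softmax layer gives the same probability as logistic regression on the difference of the two weight vectors, and the negative log of a Laplace (double-exponential) prior is an L1 penalty with strength 1/b plus a constant.

```python
import math


def sigmoid(z):
    """Logistic function used by logistic regression."""
    return 1.0 / (1.0 + math.exp(-z))


def softmax(scores):
    """Softmax over a list of scores (shifted by the max for stability)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def laplace_neg_log_density(w, b):
    """Negative log of the Laplace density exp(-|w| / b) / (2b), centred at 0."""
    return abs(w) / b + math.log(2.0 * b)


def l1_penalty(weights, lam):
    """L1 (lasso-style) penalty: lam times the sum of absolute weights."""
    return lam * sum(abs(w) for w in weights)


# Illustrative values only (assumptions, not data from any source).
x = [1.0, 2.0, -0.5]
w_pos = [0.3, -0.2, 0.8]   # weights for the positive class
w_neg = [-0.1, 0.4, 0.1]   # weights for the negative class

# One-layer net with a two-way softmax output.
p_softmax = softmax([dot(w_neg, x), dot(w_pos, x)])[1]

# Logistic regression with a single weight vector w_pos - w_neg.
w_diff = [a - c for a, c in zip(w_pos, w_neg)]
p_logistic = sigmoid(dot(w_diff, x))

# Laplace prior with scale b versus an L1 penalty with lam = 1 / b.
b = 2.0
weights = [0.3, -1.2, 0.0]
neg_log_prior = sum(laplace_neg_log_density(w, b) for w in weights)
constant = len(weights) * math.log(2.0 * b)   # does not depend on the weights
l1_term = l1_penalty(weights, 1.0 / b)

print(math.isclose(p_softmax, p_logistic))
print(math.isclose(neg_log_prior - constant, l1_term))
```

Because the extra term depends only on b and not on the weights, maximising the posterior under a Laplace prior selects the same weights as minimising the loss plus an L1 penalty, which in the linear-regression case is the lasso.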