When in danger or in doubt, run in circles, scream and shout.

I have voted in every election (and I mean every election, even the weird February local-initiative ones where you’re wondering why they saw fit to bring this up *now*) since 2000. I read the book that comes out, I do fact checks, and I vote.

There are some things I wish I could wave a magic wand and just have go away:

  1. Opinion Journalism. How you say what you say matters: you can take a statement of fact and amplify the parts that suit your need to sway an audience, or play down the ones that don’t. We have forums for editorial journalism — they’re in the Editorials section, cleverly enough — and opinions should stay there. Since the dawn of “alternative facts” this has grown sketchier and sketchier, and it feeds the hysteria.
  2. Speaking of hysteria – can we have a round of applause for the Hysteria Machine? No? Good. Because the Hysteria Machine is exhausting. Yes, I know s/he said the thing. It’s on tape, I saw it. I do not need you to reinforce to me how awful the thing is. All I need is the fact that s/he said the thing (or did the thing). Let me have my own disgust, or anger, or sadness, without layering a healthy coat of *yours* on top of it. (By the by, I’m referring to articles, blog posts, radio, podcasts, etc. If you are my friend and we talk socially and you want to commiserate over the whatever — or even *healthily debate with facts and reasoning over differences of opinion* — then that’s cool.) I just don’t want a national news syndicate telling me where my outrage should come from. It’s insulting (it implies I don’t understand things and need them dumbed down to an emotional reaction) and it’s exhausting.
  3. Armchair data science. I love data. I love data science. I love everything about data, from where and how and under what rigors it is collected, to the pipelines it runs through, to the output in which it is consumed. I love data even — and perhaps especially — when it disproves an assumption or bias I have, because learning is hard and sometimes un-fun, and that means you are exercising your brain. Go brains! Armchair data science is none of these. Armchair data science looks like this:

Let’s play a game.  What’s wrong with this poll?

Firstly, it sits on a very popular media landing page, sandwiched between international news and Latest Video (of… stuff, I guess), below an article about free pastries at McDonald’s and above local news (predominantly about COVID). The context is negligible, or confusing at best. In what context am I being asked how I feel about polls? Apparently one in which I am also interested in a McDonald’s apple pie while self-isolating and reading about how things are going far away from me.

Secondly, look at the nature of the question: “Do you like taking polls?” The question can be answered three different ways:

  1. Yes, I like taking polls.
  2. No, I do not like taking polls.
  3. No, I do not like taking polls, but I do anyway, because I can’t help myself.

The first one is easy – yep, I like taking polls, so I’m going to check that box.

The second one has got to be self-defeating – if I do not like taking polls, I’m not going to take your poll. The results you get will not reflect the actual population that likes or does not like taking polls; they will skew heavily towards those who like taking polls. You’re not going to get the volume of “No”s that reflects reality, because your poll does not have ESP: it can’t read my mind as I register what it is asking, note that I don’t like polls, and therefore decline to engage. (The fact that I’m engaging this much on my blog and yet still won’t click your damn button illustrates this.)

The third one is even better — I do not like taking polls, but I am unable to stop myself from grasping my mouse and clicking that button (or taking my finger and poking at it). What is being measured here is the impulse to click a button because of the little dopamine rush that clicking provides, and it likely has nothing to do with polls per se.
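If you want to put numbers on that skew, here is a back-of-the-envelope simulation (Python; every rate in it is invented purely for illustration):

```python
# Back-of-the-envelope simulation of the poll's self-selection bias.
# All rates are invented for illustration.
import random

random.seed(42)
TRUE_DISLIKE_RATE = 0.70   # suppose 7 in 10 visitors genuinely dislike polls
P_CLICK_IF_LIKE = 0.90     # poll-likers almost always click "Yes" (answer 1)
P_CLICK_IF_DISLIKE = 0.10  # dislikers who click "No" anyway (answer 3)

yes = no = 0
for _ in range(100_000):
    dislikes = random.random() < TRUE_DISLIKE_RATE
    if dislikes:
        if random.random() < P_CLICK_IF_DISLIKE:
            no += 1    # the dopamine clickers; answer 2 never clicks at all
    elif random.random() < P_CLICK_IF_LIKE:
        yes += 1

print(f"true dislike rate:    {TRUE_DISLIKE_RATE:.0%}")
print(f"measured 'No' share:  {no / (yes + no):.0%}")  # roughly 21%
```

Seven in ten visitors dislike polls; the poll hears from roughly one in five, because it only ever hears from people willing to click.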

The results of this poll will be useless — they will be heavily skewed towards the first and third answers, and, if the respondents who would give the second answer actually behave the way that answer implies, they would not be represented at all. What’s wrong with a useless poll?

This useless poll will probably drive someone’s decision, somewhere. It will drive a marketing choice (have more polls! people love taking them!), an editorial choice (we should put polls on the front page every day!), or a behavioral choice (people love clicking things, let’s add more clickable content!). Those choices then drive other behaviors and choices, and what you end up with is ad-filled, click-bait-filled pages of no material use to those of us who just wanted the facts.

This is just an innocuous, stupid little poll about polling. What happens when it looks like a legit poll about how people feel about COVID? Or the economy? Or healthcare? Or personal freedoms? The output of that drives more of the Hysteria Machine, of course, because now we know how to cater to our clickers: they care about the economy, so let’s tell them what is happening with it, but not objectively — let’s not share specific data points with a holistic view; let’s instead concentrate on the Stock Market. Or on the jobs data — but not all the jobs data, just the numbers we think will drive the most clicks.

Ironically, this means that those of us who would like all the data, so we can make informed choices absent editorial sway and anxiety exacerbation, have to click *more* … to dig it all out.

 

Owning Your Data

I realize I’m terribly late to this party. I’m not even fashionably late, I’m “you arrived just as the caterers were cleaning up and the hostess had taken off her shoes” late. I’ve been busy (as, I think, I’ve amply covered).

However, I really must say a word or two about Reinhart and Rogoff.

For those who don’t follow economics, or kinda remember they heard about it but aren’t sure what the big hullabaloo is, I recommend you google it; look to the Economist, the Guardian, and the Atlantic for non-editorial resources to start. There are a few. Then you can go off to the editorials for dessert. For those who don’t want to google, here’s the Twitter version: two economists presented work suggesting that there is a deep drop-off in economic performance once government debt gets high enough (a finding that was used to argue for austerity measures). Essentially they said that when debt is high, growth slows to a grinding halt; the graph they presented roughly resembled the cliffs of Dover.

And it was wrong.

Because of an Excel spreadsheet formula error.

Normally this wouldn’t be awful. Anyone, and I do mean anyone, who has used Excel to convey data (or volumes of analysis) has made a spreadsheet error, and it can be as simple as a SUM formula that doesn’t span the full range, or as complex as messing up the VLOOKUP inside your nested IF statement. Excel has been bastardized over the years into an analytics tool (largely by default, in that it’s on nearly every machine) that it really can’t fully accommodate without failsafes; EVERYONE makes an Excel error.
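To make that concrete, here is a minimal sketch of the class of error reportedly at the heart of the R&R spreadsheet: an average whose range quietly drops some of its rows. It’s Python rather than Excel, and the growth figures are invented:

```python
# A sketch of a spreadsheet-style range error (all numbers invented).
# Pretend column B holds average GDP growth (%) for ten high-debt countries.
growth = [-3.0, -1.5, 0.4, -2.2, 1.1, 2.8, 3.5, 2.9, 3.2, 2.6]

full = sum(growth) / len(growth)  # like =AVERAGE(B1:B10), the whole range
oops = sum(growth[:5]) / 5        # like =AVERAGE(B1:B5), last rows silently dropped

print(f"full range:      {full:+.2f}%")  # +0.98%  (a speedbump)
print(f"truncated range: {oops:+.2f}%")  # -1.04%  (the cliffs of Dover)
```

Same data, one misdrawn range, and the headline flips from mild growth to contraction; the cell just shows a number, and Excel never complains.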

Reinhart and Rogoff’s mistake is NOT that they made a spreadsheet formula error. And, contrary to the article I linked to above, it’s only partially that the work wasn’t peer reviewed.

It was governments’ (plural, many, varied) mistake to use it to shape policy.

Lookit, suppose I told you that, according to my Excel spreadsheet, you were very likely to die from dehydration if you didn’t eradicate all but 0.4 grams of salt per day from your diet. For perspective, the average diet has about 5 times that. You would very rightly look to other studies, other data, other sources of information. You’d poll your neighbors. You’d check with friends. You’d do your due diligence before you acted on my say-so, no matter how shiny my Excel spreadsheet, or even how shiny my MD (this is fiction, after all). Plenty of people are told by their doctor to lose 10 lbs because it will make a difference in the long run, and plenty of them blithely ignore it because they don’t have corresponding (personal, attributable, anecdotal) data.

So why, why, why did any government, financial body, or fiscal institution leap on the screeching panic train when R&R’s study hit? Why did no one look for a second opinion, a different study; why didn’t they check the data for themselves before subjecting their economies to the fiscal equivalent of a rectal exam?

I have been in data now for 15 years. It’s not a long time in the scheme of things, but it’s something I’m known to be passionate about. I can go on and on about how data works, or doesn’t; what you can derive from it; how data *is* integrity if done right. Any form of analytic reporting that is worth its salt has been tested, peer-reviewed, and validated against two or three other methods before it is used in a practical space. At Expedia, at one point, I managed 500 ad-hoc requests per month, and each of those was eyeballed against existing reporting and given a decent sense-check before being used to cut deals (or not).
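In case “sense-check” sounds fuzzy, here is a minimal sketch of what I mean (Python; the tolerance and the figures are invented): no ad-hoc number goes out the door without being compared to a trusted reference.

```python
# A sketch of the sense-check: compare an ad-hoc figure to a trusted
# reference before anyone acts on it. Tolerance and numbers are invented.

def sense_check(adhoc: float, reference: float, tolerance: float = 0.05) -> bool:
    """True if the ad-hoc figure lands within `tolerance` of the reference."""
    if reference == 0:
        return adhoc == 0
    return abs(adhoc - reference) / abs(reference) <= tolerance

# e.g. an ad-hoc bookings count vs. the standing weekly report
print(sense_check(10_480, 10_212))  # True:  ~2.6% off, plausible drift
print(sense_check(14_900, 10_212))  # False: ~46% off, go find out why
```

The 5% threshold is arbitrary; the point is that the comparison happens at all, and that a big unexplained delta stops the number before it drives a deal.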

Now, please understand: R&R screwed up. And, even with the formula error corrected, they insist the outcome is the same (and it is, but it’s the equivalent of saying “ok, it’s not a steep drop-off anymore, more of a speedbump, but still, it’s a delta!!”). This is the foible of the data monkey; again, something we’ve all been prey to. But not all of us have done it with the policies of large (and small) governments hanging on the result, and most of us have learned to admit when we’re wrong. That is the crux of it: if no one is perfect and no data is perfect, then pretending yours is, against evidence to the contrary, is specious at best and negligent at worst.

I argue, though, that the more egregious mistake is to *follow* that data without validation. To quote Ben Kenobi: “Who’s more foolish, the fool, or the fool who follows him?”