“It’s all just the algorithm.” We hear it a lot: bandied about in media coverage of, well, the media; used as an explainer for why Facebook knows you like teacups with dragons on them, why Amazon suggests you purchase tissues, and why you see those ads in your Gmail about bulbs or deer or survivalist stuff. (All true, btw.) I think there’s a decent-sized chunk of the population that has a context-specific definition for algorithm (e.g., “I know that this means a black box in which things are magically done and then Instagram *just knows* that I like fitness videos”) but not an *actual* one. Which means that when I hear “the algorithm knows I have no problem with GMOs but prefer organics and less-processed foodstuffs,” I think that it “just knows” without really understanding what that means.
So here’s a primer of algorithms, because this is what goes through my overcaffeinated brain of a Sunday morning. If you’d like to understand more about them, or if you’d like to explain them to someone you think should understand them more, this one’s for you.
The first thing to know about algorithms is they are not smart. They have no intelligence whatsoever. They’re basically an equation, a formula, a set of rules by which one or more pieces of data (“Bobbie likes pie”, “Bobbie tracks her food on MyFitnessPal”) gets “looked at”, checked somewhere against a list of criteria (“People who like pie like junk food”, “Women who track their food are on a diet”), and then a “logical” conclusion is spat out. You actually can use algorithms in your day-to-day; you probably already are. And just like the algorithms in your brain, algorithms in computers are built by humans.
For example: up until 2020, I drove about 20,000 miles per year. For those non-drivers in the world, or those who are metric-based, that’s more than average. Most dealerships will assume, for their “bring you in for maintenance” purposes, that you’re driving about 12–15,000 miles per year. Because I had a relatively new car up until 2020, and because it was covered for maintenance through some package deal I bought, I was bringing my car in every 5,000 miles. However, the dealer scheduled that every-5,000-miles service based on what they considered “typical use”. This means that they’d always want to schedule my next maintenance 4 or 5 months from my current one, and I’d frequently have to bump it sooner, because at 20,000 miles/year, I’m driving 5,000 miles every 3 months. I know this, and because these are nice simple round numbers, I didn’t have to keep a spreadsheet on it. My driving mileage has been pretty consistent for 15 or so years. So the *algorithm* we’re looking at here, to predict when my next appointment is, is Number of Miles Per Year Expected / 5,000 = How Many Times Per Year my car gets serviced. Then it’s How Many Months Per Year / How Many Times Per Year my car gets serviced, to get how many months between each service. If I wanted to be fancy, I could write that as (Months Per Year)/(Miles Per Year Expected/5000). The reason the dealer and I get different numbers is that while we both agree on how many months there are in a year, they are working with a different Miles Per Year Expected. The *algorithm* isn’t wrong, because it isn’t *right*, either. It’s all dependent on what goes in, to determine what comes out.
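That formula can be sketched in a few lines of code. This is just an illustrative sketch, using the mileage figures from the text:

```python
# The service-interval "algorithm": same formula for everyone,
# different output depending on the Miles Per Year Expected input.

MILES_PER_SERVICE = 5_000
MONTHS_PER_YEAR = 12

def months_between_services(expected_miles_per_year: float) -> float:
    """(Months Per Year) / (Miles Per Year Expected / 5000)."""
    services_per_year = expected_miles_per_year / MILES_PER_SERVICE
    return MONTHS_PER_YEAR / services_per_year

print(months_between_services(20_000))  # my number: 3.0 months apart
print(months_between_services(15_000))  # the dealer's high estimate: 4.0 months
print(months_between_services(12_000))  # the dealer's low estimate: 5.0 months
```

Same formula, two different "Miles Per Year Expected" inputs, two different answers — which is exactly the disagreement with the dealer.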
What Happens When Things Change
Now that we are under COVID restrictions, I still drive quite a bit to go visit immediate family every 2 weeks, but aside from that I’m working from home and I’m working out at home, and so I don’t drive nearly as much. The *algorithm* still hasn’t changed — but the Miles Per Year Expected has. So now, my number looks a lot more like the dealer’s number: I’m driving about 12k miles/year, and so I would come in every 4 or 5 months. If the *dealer* changes their expectations, though, thinking “oh wow, people aren’t driving with COVID, we should bump that down to like 5k/year”, then our outputs of the algorithm will once again differ.
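In code, the change looks like this — the formula itself never moves, only the input does. A minimal sketch, reusing the figures from the text:

```python
# Same algorithm, new inputs: nothing about the formula changed with COVID.
def months_between_services(expected_miles_per_year: float,
                            miles_per_service: float = 5_000) -> float:
    return 12 / (expected_miles_per_year / miles_per_service)

pre_covid = months_between_services(20_000)      # 3.0 months between visits
covid = months_between_services(12_000)          # 5.0 months between visits
dealer_guess = months_between_services(5_000)    # 12.0 months, if the dealer
                                                 # assumes "nobody drives now"
```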
Slightly More Sophisticated Stuff
Simple algorithms are like the one above: one or more inputs (expected miles per year) and at least one output (Bobbie needs to get her car serviced in June). You can add more inputs, though, and some “checking stations”. These can be what are called “if” statements (If Bobbie likes strawberry pie, then assume excess calorie consumption from April to July; if Bobbie likes blueberry pie, then assume excess calorie consumption from July to August), which in turn can depend on other “if” statements (If strawberries, then In Season = April, May, June; if blueberries, then In Season = July, August). You can take these “if” statements, or conditions, and sprinkle them in all of the parts of the algorithm: at the beginning, in the middle, and even at the end to determine the ending.
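Those chained “if” statements look like this as code — a toy sketch, using the pie seasons from the text:

```python
# One "if" (which pie is in season when) feeding another
# "if" (when to assume excess calorie consumption).

def in_season_months(pie_flavor: str) -> list[str]:
    if pie_flavor == "strawberry":
        return ["April", "May", "June"]
    elif pie_flavor == "blueberry":
        return ["July", "August"]
    else:
        return []

def excess_calorie_months(pies_bobbie_likes: list[str]) -> set[str]:
    # The outer checking station: if Bobbie likes a pie, assume
    # extra calories whenever that pie is in season.
    months = set()
    for flavor in pies_bobbie_likes:
        months.update(in_season_months(flavor))
    return months

print(excess_calorie_months(["strawberry", "blueberry"]))
```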
Again, you probably do this all the time. Say you’re at Costco. I don’t know about you but I like to limit my Costco trips because crowds are not my thing; also because I like to limit my trips in general (I’m the sort of person who has a categorized grocery list). Most folks have a grocery list, and most folks have a Costco list. You’re at Costco, and they have special pallet stacks of stuff on sale (the pricing usually indicates how much off). And you’re in front of the toilet paper, which was not originally on your list. This is a more sophisticated algorithm you’re running in your head:
- Toilet Paper is On Sale
- Toilet Paper is 36 rolls
- Sale is only good for about 1 week
- I am not coming back to Costco for at least 3 weeks.
- How much toilet paper I have at home
Evaluation: Here you need your algo to check a few things:
- Do you have the money in your planned budget for the extra toilet paper that was not on your list? – this is an evaluation that you can do with only one of the inputs – the Sale Price
- Do you need toilet paper between now and the time you *think* it will next be on sale? – this evaluation is done with the input of the volume of toilet paper you have at home, plus the amount of time between now and when you think it could be next on sale. (You know the next time you’re coming to Costco, in at least 3 weeks. But it may not be on sale then.)
- Do you have the storage capacity for the extra 36 rolls? – this evaluation is done independently of 1 and 2 — straight up can you stock 36 rolls or not?
As you evaluate each of these, you spit out the “result” of your algorithm, perhaps as these steps (remember, these assume you didn’t need toilet paper right now, and that this was just something to evaluate on top of your regular list):
1. If I have money for this, then go to step 2. Otherwise, keep rolling my cart.
2. If I think toilet paper will be on sale the next time I am here:
   - *AND* I can last that long until I need toilet paper, then keep rolling my cart;
   - *AND* I cannot last that long until I need toilet paper, go to step 5.
   - *Otherwise* (I think it will not be on sale next time), go to step 3.
3. If it is worth it to me to delay purchasing the toilet paper for next time at the expense of the sale price (e.g., is 3 weeks wait better than $4 off?), then keep rolling my cart; else go to step 4.
4. If I can store the toilet paper, go to step 5. Else, keep rolling my cart.
5. Buy toilet paper.
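The steps above translate almost line-for-line into code. A sketch, where every input is one of the judgment calls from the steps (all the names are mine, not some real API):

```python
# The Costco toilet-paper decision, step for step.
# True = buy it; False = keep rolling my cart.

def buy_toilet_paper(have_budget: bool,
                     sale_next_time: bool,
                     can_last_until_then: bool,
                     worth_waiting: bool,
                     can_store: bool) -> bool:
    # Step 1: no room in the budget? Keep rolling.
    if not have_budget:
        return False
    # Step 2: will it be on sale next visit?
    if sale_next_time:
        if can_last_until_then:
            return False   # wait for the next sale
        return True        # step 5: I need it before then, buy it
    # Step 3: not on sale next time -- is waiting worth losing the discount?
    if worth_waiting:
        return False
    # Step 4: only buy 36 rolls if there's somewhere to put them.
    if can_store:
        return True        # step 5: buy toilet paper
    return False
```

Swap in “steaks” or “kale” for “toilet paper” and the function doesn’t change at all — only the inputs do, which is the whole point.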
Here’s the thing: this evaluation happens in the space of a minute or two in your brain, standing at the endcap of toilet paper in Costco while trying to avoid getting sideswiped by carts and small children running to get the free food. You probably spent more time reading through that list than you would actually doing the evaluation in your head, at Costco. You’ve just run an algorithm, because you could easily have replaced “toilet paper” in this decision with, say, “steaks” or “beer” or “high-end whey protein shake mix” or “kale” or “salmon” or “bread” or any of a number of consumable goods. You could replace the windows of your visits to Costco with different figures (I know folks who go every week, every two weeks, only when needed, etc.). You could replace the amount of the sale price in the evaluation (e.g., a $4 trade-off for your visit window may be enough. But is $2? Or would $10 be a good trade-off of convenience for a 2-month window? etc.). The *steps* are the same, the kinds of things that you are checking in the steps are the same, but the specifics differ from situation to situation.
Algorithms In the World
When we say “Facebook runs an algorithm and so they know you like Argyle Socks”, we mean that Facebook has a HUGE volume of inputs (ones you give it and ones it infers and ones it purchases) and a HUGE volume of conditions it evaluates.
It can for example extrapolate from the data you give it (say, photos, comments on friends’ posts, clicks you do *on Facebook*, etc.) that you like socks.
It can infer from things your friends post, or from cookies it drops. (Think: a little text tracker that sits in the background of your computer and that, when you leave Facebook.com, gets “looked for” by other websites that Facebook has deals with. That rando website checks: “hey computer, you got a Facebook cookie?” and your computer says “yup, I got a Facebook cookie, it’s cookie number bla-bla”, and that website says “cool beans, thanks, I’ll make a note of it”. Because Facebook *made* the cookie, it knows that bla-bla belongs to you. And because there are millions of sites that Facebook agrees to check for cookies with — sites that Facebook does not own or operate — Facebook can know that you went on Target, for example, and shopped for argyle socks.)
Facebook also straight up purchases data. “Hey argyle sock company, let me know the typical demographic by zip code of people who buy your socks!” When the argyle sock company comes back and says “ok, so like in 98074 the typical argyle sock purchaser is female (we infer this because they bought women’s argyle socks) and over 30 (we infer this because she didn’t use PayPal or Apple Pay; she used an old-school credit card)”, Facebook can marry that up with marketing data that says the average 98074 female over 30 is also married, with an income bracket of XYZ, and likely owns rather than rents.
Facebook can then take all of *that* data and run it through *another* set of checking stations and say ok so if she likes argyle socks then with this other data we have about her *what else* can we market to her? Maybe there’s a high correlation of female argyle sock wearing disposable income homeowner to coffee consumption. Let’s try that. Oh, did she click it? Our checking stations were *right*, let’s use them more. Oh, did she not? More data for the checking stations.
This is just one (very tortured) example: nearly every site you interact with (not just Facebook or its properties), every company that you purchase goods or services from (e.g., banks, insurance companies, etc.), and most especially every company you work with that gives you something “for free” (e.g., Instagram, Snapchat, Pinterest, etc.) collects this information, and has its own special list of algorithms it chugs through to spit out ideas as to what you like or don’t like, what you do or do not want. Sometimes they sell these ideas; sometimes they purchase others’ ideas and marry them up with *their* ideas to get super-specific ideas about you. The more inputs they can get, the more outputs they can test, and the more testing they do, the more accurate they can get. This isn’t just about argyle socks, either: they can suggest or infer political preference, disposable income, sexual preference, charitable leanings, religious leanings, and so forth. They can then market to you based on what they think you want to hear, or want to read, or want to buy.
All just an algorithm.