Remember those stories from a few decades ago, about how improvements in productivity and automation would mean that we’d only be working 5 hours a week by now? They seem as fanciful, and predictive, as The Jetsons.
It’s almost a punch-line now, because many actually work more hours today than back then. How could they have been so stupid?
Of course there is actually a lot baked into the fact that higher productivity has not reduced hours worked (nor, it turns out, has it raised wages for the bottom 50% of earners in the last three decades, quite contrary to the previous, oh, 200 years). But a big part of why that prediction seems so silly now is it included an implicit assumption: That the work we’d be doing today is pretty much the same as what we did then. You know, before computers.
And I don’t mean, they could not foresee how much time we’d waste on social media. Their primary mistake was not thinking about how we’d use the space created by greater productivity.
Induced demand. People who would not have driven with only two lanes, because of all the traffic, will now drive if there are four lanes. Trips you would not have taken because of the gas cost now become affordable and reasonable.
If your goal was to get more people driving, and maybe get the resulting increase in economic activity, then I suppose you got what you wanted. But if you were trying to reduce traffic - which is why most people say they want more freeways - then you’ve failed. If anything, you made it worse, because now twice the number of people are stuck in traffic. You didn’t reduce the problem, you just spread it around.
Demand is induced in many other areas. I grew up remodeling houses with my dad and got to see the impact of pneumatic nail guns and paint sprayers, which automated a huge chunk of how we’d spent our days. Of course, we didn’t use that newly free time to sit around, we changed our behavior: We did higher quality work, we finished jobs faster so got more done in a year, and we lowered prices.
I ran into this perception conflict all the time as I was building Puppet. I’d say I was building automation for the data center, and salespeople and execs would say, “Oh, so you can fire sysadmins!” I don’t know what they had against operations teams, but no, I would respond: I can give you a choice between reduced cost at the same service level (i.e., you fire people), or keep costs the same but increase service quality.
“Wait, that’s an option?!” Without higher level tools like Puppet, people had no idea how to increase service quality. Cost was their only dial, so they focused on that, even if it made no sense. Once I gave them other choices, of course they wanted better quality.
So where success in 1999 was shipping twice a year, and not going down, like, that often, now success is shipping multiple times a day and never appearing down to your customers. The standards have changed entirely, and that change was made possible mostly through automation. You can bet Netflix doesn’t become the new standard for infrastructure hotness using the bad old manual practices of the 90s.
You might say, no, people’s standards changed and that’s what motivated them to invest in automation, but you would be wrong. They always wanted higher quality. They always wanted better service. They just felt they had no choice but to accept the status quo, until we showed them other options.
I’m not saying automation never destroys jobs, never reduces the amount of work to be done in an area. But it’s by no means the default result.
The first impact of automation is to increase quality. I felt this myself, building houses. The main reason I’m no longer a carpenter is because of how incompetent I was at setting trim nails. You drive the nail most of the way into the wood1, then use a small punch to recess it. You cover that one little hole in spackle, and no one can tell. Unless you’re me. When I do it, I make a little sunflower, with a hole in the middle and holes all around in a circle, because I’m just that good at letting the punch head slide off the nail, into the wood. If we’d had the trim nailers that exist now, even I could have done a decent job. That kind of success might not have driven me out of the industry and into the welcoming arms of computers. I’m not a lot better typist, but the delete key does wonders for my self-confidence.
And let’s be honest about what we’re automating: It’s literally the most boring and least useful work we do. I wasn’t exactly high skilled labor, but I assure you there were more valuable things for me to do than try to set a nail while on my knees, bent over to the base molding. As a SysAdmin, yeah, my bosses thought my job was typing the same command 1000 times a day, but we can look back now and see that definition actually got in the way of the work, rather than being it. We had much more important stuff to do. Or at least, I did. Not sure about the bosses.
If we were all willing to drive cars from the 70s, while living in houses from the 70s (oh god the colors), using computers from the 70s (all three of them), watching channels from the 70s (all three of them), then yeah, maybe we could also work 5 hours a week.
But we all know what we’d do with the spare time: We’d make more work.
You’d do something silly, like write a book. And obviously, most of those books would be junk, but enough would be good that it would become someone’s job. And maybe the other writers didn’t actually have a knack for writing, but found they could be great at helping other writers. Oops, now you’re an editor, or a publisher. And now you’re working more than five hours a week again!
Or maybe you don’t want to be a writer, you just hate Avocado Green enough to do something different with your kitchen. Oops, now your friends want the same thing, and you’re an interior designer.
Or maybe you realize that the gas-guzzling death-traps we all drive in are insane, and you figure out some way to start bringing those sweet little rides over from Japan. You know, just to fill the time. Because you can only consume so much of your day watching three channels (and remember, no ESPN in this scenario).
You’re not the only one that’s bored. Everyone else has more time, and a need to fill it. They can spend the effort to become experts in watches, furniture, bicycles, hiking, boating, economics, philosophy, or any number of other areas. And we all know the main outcomes of expertise, beyond insufferable newsletters: Demand for more specialized stuff, or enough disgust at the lack of it that you’ll just do it your darn self.
Some number of us won’t do the work, but we still want more, because why else would I have read three books on horology? It only takes a few people to step in to fill that demand, and suddenly your time is taken up.
And of course, if you want to buy that fancy Patek Philippe to pass on to your kid, you probably need to work a few more hours than the minimum.
So now you know why we don’t work five hours a week, and hopefully you have a little more confidence that automation will generally be good, not bad. It’ll create more entrepreneurs, raise wages, increase quality, and reduce cost. Not every time, but most times.
It doesn’t explain why those prognosticators in the 70s were so silly, but, well, I bet it wasn’t the worst decision they made that decade.
16oz Estwing finish head with a curved claw, natch. ↩
I recently went mountain biking in Bend, Oregon, which is a few hours’ drive from my house. As I usually do, I used Strava to record the ride. This was a new trail for me, and I got lost multiple times, so I also used Trailforks to figure out how to get back to my vehicle.
I’ve been thinking a lot about how services like Facebook start out helping us, but at some point the relationship shifts and we’re stuck helping them more. I don’t think Strava has gotten to a stage where the relationship is abusive, and maybe this is weird but I hope they stay too small to ever get there. Even without that shift, though, I am a little uncomfortable with our relationship.
I was standing on the trail, trying to find the right trail and idly watching a coyote search for chipmunks. As I switched between apps, I realized that I was losing a lot by recording in Strava instead of Trailforks. Or rather, I wasn’t; people who want to bike this trail in Bend were.
I log my rides (and my infrequent, hilariously slow runs) in Strava for, um, reasons. I don’t want kudos, and I don’t really look at my historical performance, although I do enjoy being able to study my rides at times. I want the data there, even if I rarely use it. And it conveniently automatically completes my daily exercise goal in Streaks. It’s kind of useful, but if it went away tomorrow, I wouldn’t miss the online component much, if at all.
If I instead recorded all of the trail rides in Trailforks, then everyone who came after me could get some value from the information provided by my ride. They could see what route I took, could likely tell based on my speed when I had to walk and when I was able to ride, and they could over time get a sense of what routes are most popular, or even what signage is confusing.
I built an open source company, so I’ve thought a lot about the worth of contributions. A developer’s time on one project can’t be spent on another. Someone who writes documentation for your baby is giving up the opportunity to contribute elsewhere. It’s a conscious choice on the part of the contributor, and a constant interaction between the project and its constituents to keep people coming back.
I think Trailforks really understands the value of my contributions: If you do this, people who ride here will have higher quality data, and probably better rides.
I am super confident that Strava knows the value of my usage: I’ll get feedback from friends and I can track my speeds and feeds. But those aren’t contributions; that’s not something I’m giving up for the greater good. It’s something I am doing for selfish reasons.
The value of my trail ride could be for the greater good, though. Even my road rides and runs could be, as they could help people find routes, but the trail rides especially seem valuable, because the downsides of being lost out there are materially worse than not having the right route for your run.
It’s clear to me that Strava is not seeing my data as a contribution. They’re focused on engagement. That’s not inherently bad - lots of people use and love the app - but it is different. I find it interesting to think of what the experience would look like if that changed.
But after that, I thought: Why can’t I just share the data with both apps?
I mean, to some extent I can. I can just run both of them and let each record its own view of the world. This is what I did with Slopes and Strava in Mammoth, taking the lifts up and riding down. That made a little more sense because neither quite has the correct view of the world - Slopes doesn’t know what bikes are, and Strava doesn’t know what lifts are. It was pretty kludgy, but more importantly, I didn’t run into this conflict because the apps exist to do pretty much the same thing, just for different sports.
I could duplicate here, I assume, but… it seems stupid.
Beyond our relationships to service providers, I’ve been thinking about what it means to own your own data. It sounds awesome, but it’s rarely very useful in practice.
It turns out, I do own the data that I have posted on Strava. Great! So I’ll just share it with Trailforks, too.
Hmm. What would that look like? Can I… download the data, and then upload it to Trailforks? Is it a common data type?
Can I record it separately on my phone and post it to both apps? Is that what truly owning my data would look like?
It’s hard to imagine that world: You use apps that generate data, which by default is yours and only yours. It gets recorded in forms that are easy to share, understand, and manage. If you like, you can then contribute that data to other sites, and in doing so, you get to negotiate with them exactly what rights you’re passing on. Either way you have the data, but now they get a copy, too. If you don’t like their offer, you still get the data, and most likely, given you have all your delicious data, other apps will crop up with a different offer, because they can focus on that rather than all the data collection.
It would look a lot like the text editor I’m using to write this article, Ulysses. It allows me to publish, but is built first and foremost to make it easy to write. Sharing, contributing, engagement, and all of the other online stuff is left to other sites, other apps, like Wordpress and Medium. And yeah, those apps do allow both writing and publishing, but it’s a horrible experience, a great way to lose data, and if you only write there then your data is stuck in their system and is pretty hard to get out in a useful way.
The world of writing looks weirdly different from the world of recording rides. And a lot worse.
I’m not in control. Legally I own the data, but, ah, I don’t have it. Strava does.
I would never write directly in Medium, so why am I logging my rides directly in Strava? What am I giving up because of it?
I’m pleased to find that Strava will allow me to give other people the data - the data that I own! - and it turns out that Trailforks knows how to slurp it out of Strava.
So it all ends well: My rides are in both locations, and every mountain bike ride I post to Strava will now be automatically imported intro Trailforks. Probably.
But for that brief moment, in Bend, while watching the coyote… I saw what it would take for me to really own my data. I liked it a lot.
I’ve always had a warm place in my heart for filesystems.
I taught myself shell scripting while automating the installation of Disksuite, Sun’s free but sadistic disk mirroring software. I barely recall the actual work, instead remembering a hallway. I undertook a literal journey to learn programming, a repeated pilgrimage to the desk of a friend who took visible pleasure in explaining to me what I was doing wrong.1 It’s fair to say that if filesystems were less painful in the 90s, I would not be where I am today.
When Sun started advertising ZFS as the (finally!) successor to Disksuite and the filesystem it was built around, UFS, most of its functionality seemed obviously good - make the computers manage the disks, don’t demand people know up front how big a filesystem should be, don’t fail miserably when the server crashes, little things like that. But what was this data integrity thing? I’m embarrassed to say it took me a while to realize I needed it - who really cares if your filesystem is good at storing data, amirite? - and even longer to understand how it worked.
To explain it, I’m going to have to teach you cryptography. Just a little. You’re welcome to skip ahead if you’ve already got this part covered, but I expect most could use a little, ah, refresher. Step 1 in cryptography guides is usually: “Get a masters in mathematics from MIT.” I’m hoping to do a bit better than that. Cryptography really is just a form of math, and while we can’t all understand the details (I certainly don’t) we can at least understand the “algorithms happen here” flow diagrams2.
Cryptography is most famous for its privacy utility: You use it to ensure you and only you can read your files and chat messages. It gets more complex once we need to read them on all of our different devices, but most of it is pretty similar in concept. Even more useful is ensuring both you and I can read some text, but no one else can. It’s more complex, but is essentially an extension of that first use.
Privacy is not the only use case for cryptography. It’s also useful for efficient validation. That is, it can be used to see if a file you have today is the same one you had yesterday. I sent you a document, you think it looks wrong; how do we make sure it did not get changed somehow in transit?
Obviously one way to do that is to just send it again. This is not a great solution, because if you did not trust it the first time, why would you trust it the second? That might also be a bad idea if bandwidth is expensive. You generally want a verification mechanism that takes less space than the original file, and less CPU power than directly comparing the two files.
Cryptography provides just such a capability, usually called a ‘hash function’. It’s an algorithm that converts, say, a large text file into a much shorter string. If you want to ensure the file is not changed in some way, just run it again and compare the output. The short strings are easier to compare than the long documents, and you could even read them over the phone to someone so they can check the file on their end. These algorithms generally produce a string of a fixed length, regardless of input - this makes them efficient for long term storage and comparison, and safe to run on any size file. Here’s an example hash from my files:
As you can see, it’s just a long string of gibberish. It’s not only useful for comparison, not meaningful in it’s own right.
What’s critical about these algorithms is that given a unique input they always provide a unique output. If you and I each have a file that hashes to a given string, then we can be confident we have exactly the same file. Of course, this can’t literally be true: We could design a hash function that only had 256 possible outputs, and there are obviously more than 256 possible inputs. This would produce a lot of what are called collisions, when two files hash to the same output, and, ah, is not terribly useful.
All of the modern hash functions are incredibly long. It is possible in theory but not in practice that a collision would happen. You’d need to execute the function 2^128 times. That’s 3.4 with 38 zeros after it. So, mathematically possible, but you can expect the sun to swallow the earth before the most secure hash functions get compromised. I mean, you can’t. You’ll be gone by then. But your files will still be safe.
Now that you’re at least as much an expert on cryptography as most of the bitcoin hodlers, why does any of this matter?
We were talking about data integrity.
You’d be right to guess that ZFS uses these hash functions to provide it. It goes further than just validating individual files. A little bit of cryptographic genius called a Merkle tree is the key. These don’t just hash the content on disk for later validation; they build a tree of hashes, where the leaf nodes are hashed by the nodes above them in the tree, which are themselves hashed by the root node. If any part of this system is corrupted - because the disk is broken, or someone changed the content some other way - it’s easy to detect. It’s not just that the individual hash will be different; remember each parent hashes all of its children, so now the parent is wrong. And its parent is wrong, too.
If the content is changed by any mechanism that does not also also update the Merkle tree, then it is easy to detect by rehashing all of the content and comparing the results to the stored tree.
This is how ZFS validates data integrity. It can write a block to disk, then pull the block and ensure it still matches the hash. When it writes a block, it updates the parallel tree, and when you ask for the block later, it can tell you if the block is still correct. If it’s not, it throws an error instead of handing it back to you.
When I first learned of this, it seemed overkill, but over time I remembered just how many ways there are for data to get corrupted. The most obvious one is someone changes it for nefarious reasons, but far more commonly you have a failure somewhere in the writing or reading process. The old spinning disks were error-prone, and the new SSD drives degrade eventually. It’s the complexity of reading and writing that really gets you, though: There are multiple layers of caches, drivers, and connections, any of which could introduce corruption.
For the first time on a normal production system, you could at least detect any of those problems. It’s too bad no one ever used it.3
I know, I know, you came to hear about how you could get all the awesomeness of blockchain without using the blockchain and instead I’m giving lessons on two things you could literally not care less about, cryptography and filesystems. Don’t worry. It gets worse from here.
Long after I learned about and promptly forgot ZFS (after all, it’s not like I was using it), I adopted Git. It’s a version control system, used for storing and managing source code. Every geek knows about it, but most of the world only recently learned of it when Microsoft bought Github for $7.5B with a ‘b’. I was an early adopter, switching Puppet to Git in 20084. Eventually I even learned how it works. I was titillated and a bit horrified that I had duplicated in Puppet one of the key features that made Git work: A system of storing files that allowed them to be looked up by their content (or rather, a hash of their content). Normally you store files by a name, but if lots of people (or, in Puppet’s case, computers) store the same file, they might not call it the same thing, so Git and Puppet instead stored them by their hash. This ensured we never backed up more than one copy of a file, saving a lot of space, and made it easy to check for changes in files.
For Puppet, we just used this to back up files we changed, in case people later wanted to revert.
Git did a lot more than that.
Like ZFS, it builds a Merkle tree of the entire file repository, with a similar goal: To understand what files have changed and how. After all, git is used to track and share changes to a collection of files. The sharing is a critical component; you can easily copy an entire git repository to another computer, or another person, and it’s important that they be able to confirm that they have a faithful copy.
Git stores the hash tree alongside all of the files. At any point, you can use the tree to validate every file in your tree. If there are changes (which is pretty much the whole point of a version control system), it can automatically store the new files and update the related tree.
Just like ZFS, one of the key features here is that the Merkle tree allows us to validate every file stored. We can walk the file tree and compare each file to its hash, and then compare the file listing to its own hash, all the way up. Any discrepancy is easily spotted.
This is my favorite kind of cleverness: It’s simple in implementation, yet makes Git more flexible and useful. It has power that other version control systems are missing, just because it relies on this basic mechanism for storage and validation.
It would be easy to see the blockchain as a sudden revolution, a dramatic change in what’s possible. Viewed this way, it’s hard to separate the pieces from the whole. If all you see is the big picture, it’s easy not to notice that every individual component has its own history, its own value.
The blockchain was gradual, for both me and the industry. It was not one giant leap forward. It was part of a story, a sequence, and the most interesting aspect - Merkle trees - is decades old in math and now pushing decades old even in popular usage. Most of the interesting features touted in the blockchain come directly from them. Immutability (which isn’t) and trustless systems derive directly.
It’s worth understanding that history, to see which stages and steps apply to problems you have. The current cryptocurrency tech stack is built to solve problems I don’t think exist. Certainly they aren’t problems I have.
Unlike the blockchain as a whole, though, the individual technical components have been used for years, even decades, in production. Focusing on the current trend can blind you to the opportunity history demonstrates. I think you’re a lot more likely to find broadly applicable solutions there than in trying to replace currency.
Because I got here from the world of filesystems and version control, I see different benefits than you might if you approach thinking of currencies or exchanges. Or chat messages. That does not make me right or wrong, but it does, at least, mean we’re going to work on different problems.
I expect most of you think this is boring. That’s great. It will give me that much more time to build something.
My brightest memory is learning that of course the ’echo’ command resets the exit code variable. This was a critical early lesson in how your own debugging can dramatically change the behavior of a program. ↩
When people talk about the futility of trying to ban cryptography, this is what they mean: You can’t ban math. ↩
I am a tool junkie. I love the effortless balance of a well-known chef’s knife, like my hands know what to do all on their own. Heavy usage builds callouses and tunes muscles, its usefulness evidenced by scuff marks and changed infrastructure. Failure leaves blisters or even hospital visits in its wake.
A good tool proves its utility. Knives slowly shrink with sharpening, work pants thin, machines need oil. If they don’t, you’re either not maintaining your tools, or barely using them.
This wear is proof of your usage. They should be scratched. Dented. Aged. Patinas should be acquired from the shop, not factory treatments. Their callouses should pair yours. Tools can not be precious. They’ll just live on a shelf, then retire to your attic. You should seek that perfect middle ground, where you spend enough money that your kids can inherit them, but not so much that you are squeamish about giving them a job.
Tools only deserve the label if they help you work.
You might say I have strong feelings about them. I’m assuming this love led to my focus as a software entrepreneur on helping people people work. Or maybe my experience with tools in the physical world led me to seek them in the digital world, learning to make what I could not buy.
Given my tool fetish, you’d think I’d have a solid grasp of what I mean when I use the word. Apparently, not so much. I was recently pulled up short by a simple question, asked by Jordan Hayles of the Radical Brand Lab: What do you mean by tools?
What do you mean, what do I mean? It’s a simple question, right? The above text gives one example, but I would have thought I could answer it in a bunch of reasonable ways, none of which seem terribly controversial.
But the more I explored, the less simple the question became.
I’ve been describing my goal as building power tools for people. This phrase comes from my time building houses with my dad, and ‘power tools’ just meant the things you plugged in. You know? Because they needed power? It’s a common usage, maybe the word choice here did not mean much.
Except… I’ve spent more than a decade learning product management, describing myself as a product-oriented founder, managing that function in a growing company, and attempting to teach it to others. Yet here I am ignoring both the term and the field entirely. Why am I so quickly dumping my work of the last ten years? Is it just creative branding? Cynicism about my industry?
Why not power products? That’s a motor boat of alliteration: ‘power products for people.’ Awesome, right?
Ok, maybe not.
Product management as we know it began in the consumer goods industry. You’re handed a train car full of dish soap and told to sell it. You’ve got to package it, set pricing, convince a local store to carry it, argue with them about location, move it away from competitors, all that. Every product you see in your local grocery store is loved by a product manager who fights for its shelf space, believes it is beautiful, and wants you to give it a good home.
Product management can also be used for evil. Laser printers had toner cartridges you could just refill. Not very clean, but cheap and reliable to run once you plonked down the cash for the expensive printer. Modern inkjet printers instead use disposable cartridges. To sustain profit margins in a rapidly commoditizing industry, manufacturers started putting rules in place on the cartridges: You had to buy them from the manufacturer, they had to be replaced every year, you could not refill them, you could not print in black and white if any color cartridges are empty.
That’s good product management. Well, it’s evil, but you know what I mean. It’s effective. We’re talking big-B revenue effective. Hmm. A moral distinction begins to reveal itself.
These are examples of companies forcing their business model onto their customers. There’s no difference between the dish soap sold at retail and the one sold in bulk, yet they’re separate products, differentiated through packaging, shipping needs, and labeling. You pay much more for the retail package than the wholesale one, primarily because the business model behind them is so different.
But when I think of a tool, these complications are missing. When I use a hammer, it just has to fit my hand and smash stuff. When I pick up my drill, it works with every bit I own, regardless of the logo. The battery and charger are proprietary, but the vendor’s most visible role in my life is color choice. My yellow drill works just fine with bits from the blue or green companies. (You probably visualized brands by my just mentioning colors. That’s still effective here.) It does not matter whether I bought the drill from Home Depot or inherited it from my dad; once in my hands, it just works.
I think this begins to answer the question of what a tool is.
It helps you do your job, without your worrying about the vendor’s needs.
I know that DeWalt and Mikita need to make money to sell me a drill, but I don’t think about it when I’m using their tools. Even after more than two decades without one, I can comfortably recite that “my” hammer is the Estwing 22oz waffle head with a straight claw1, but none of those details mean I need the vendor’s permission to hit a nail with it. I make a decision about the right tool, I buy it, I use it. End of story.
It is small. If you call something a tool, not a product, you’re saying it’s less, it’s not as complete a solution. This can be belittling, insulting, but it does not have to be. It’s also a statement of independence. Of freedom. Of, and this is going to sound crazy, compatibility.
Products have an implicit, ongoing dependence on their vendor. If that’s me, I love it: I want you to pay me all the time, not just once. That ongoing relationship is how I afford to keep improving what I’ve built for you. This can be a great way to ensure we have a long-term, sustainable partnership. But it’s not always a healthy relationship. The more you have to deal with how I make money, the worse the experience is for you.
I think this is what I like about tools. They’re self-contained. Independent. Using them is fundamentally pragmatic, not a lifetime commitment.
That independence has downsides for me as a vendor. You don’t get any of those delicious growth-hacker buzzwords. Your product isn’t “sticky”, there’s no “moat.” Those are examples of my customers being constrained by my business model, and their absence means revenue is harder to build, to protect.
One might argue I’m better off because treating my customers with more respect makes a better business in the long term, and I’d probably agree. This kind of respectful partnership should deliver higher returns than one that traps and mistreats its customers. I think this is often the right answer, but it’s not a popular one. It’s harder to get funding, to get off the ground. I might be accused of not “wanting to build a real company,” or I might have Silicon Valley’s most dire insult hurled at me: “That’s just a lifestyle business”.
Tell that to Adobe. Or AutoDesk. These are great tools companies. They are the behemoths we know today because they knuckled down and solved their customers’ problems. They worried about that, rather than how they could extract maximum revenue over time. It was a different time, but people have not changed.
I don’t think that every product is compromised when the vendor’s needs show up in the customer’s life, but I think most are. Some of it is laziness, shoring up product limitations with business model innovations, but a lot of it is strategy, recognizing the value of painting your customer into a corner.
Honestly, some of it is just survival. A lot of those inkjet printers are unaffordably cheap, but buyers care only about cost, not value. Some markets are intrinsically dysfunctional, with users and vendors slowly killing each through bad deals and cynical behavior. But as a vendor, I get to make a choice about what markets to play in, and how to work with my customers.
I am a simple person with a simple dream: I want to build something that helps someone work. I have to make money while doing it, because that’s the nature of the job, but I’m more interested in my customers’ work than my own. I know I need a business model, a go-to-market strategy, a plan for growing and supporting my business. But my customers should not need to care about that, should they? If they like what I’m building, they should be able to buy it, and use it. And tell all their friends how great it is. They should not wake up one day to find they’ve accidentally gotten married to me.
I just want to build tools. And I’m proud of it.
We told with great pleasure the (most likely apocryphal) story that this hammer was illegal in Florida because the metal haft could cut your thumb off. ↩
Congress recently required Mark Zuckerberg to defend his lifelong practice of mistreating your private information. Movements to give you control of this critical data took the opportunity to claim they can prevent future such breaches. Blockchain is the new solution in search of a problem, and personal data is in the crosshairs.
But can the blockchain actually help secure your personal data? What would that take? And seriously, what do people mean when they say we should own our own data?
It sounds nice. Too bad it won’t help. The problem is not “ownership” (whatever that even means in a world of infinite digital copies). It’s centralization. Having one person’s data is a small threat, only to that individual. Having everyone’s data is a national crisis.
By now we’re familiar with the huge amounts that Facebook, Google, Amazon, and apparently everyone except Apple have on us. But how did they get it? Mostly, we gave it to them, through using their products. What we didn’t give to them, we gave to someone else who then passed it on.
There have been massive breaches at Equifax, Facebook1, and many others. Even the general public is becoming aware of the real causes. Some of the the largest companies in the world exist purely to collect your information and sell access to you based on it. They might not sell your data, but they definitely sell your attention using it.
These are the problems you know about. Don’t worry; it gets worse from here. If you think your birthdate and pictures of your kids are personal, what about your DNA?
Anne Wojcicki is married to a Google founder, and she liked their data accumulation so much she started her own company to build a huge pile of even more personal data. 23andMe does not scrape the internet - or your cheeks - to get your DNA; no, people pay for the privilege of giving it to them. Yes, they offer a service in return, but do they clean house after? Hah! No. They keep it. (Hopefully somewhat more safely than Equifax does.)
What’s so wrong about there being a database of DNA from a big chunk of the population? Let’s ask the police.
You might not be afraid of the police. You should consider yourself lucky. I know anyone of color in the US is and should be. I know I am; I grew up on a commune, and policed raided us using helicopters and assault rifles in hopes of busting us for cannabis. I don’t mean to imply that hippies have been as systematically oppressed as African Americans (and certainly not in the south); just that I grew up with my own justified skepticism of exactly what that force was here for.
Even if you don’t fear the police, you should fear the consequences of DNA testing. The science behind most parts of DNA are absolutely rock solid2. The police work is another matter. Beyond outright fraud used to wrongly convict people, the messy world of testing DNA at crime scenes just makes it hard to get correct results. Juries inappropriately treat a complicated test as foolproof. It could be compromised anywhere from the crime scene to police handling to the lab itself. The failure rate even without fraud is high enough that I would not want to trust my life to it.
Not to imply that DNA testing is worthless; quite the contrary. It has been used to exonerate many people who were incorrectly imprisoned and put on death row. It’s not that it always fails, just that you don’t want to finding yourself gambling on it against life in prison.
But remember: This is just for cases where someone has a single person’s DNA. Like having just your fingerprint. What happens when someone like 23andMe has a whole database of it?
“If you didn’t do anything wrong, then you have nothing to fear.” Pfft. Yes, it starts with requests for the DNA of individual suspects, but it escalates to doing a database-wide search for DNA that matches. And by ‘matches’, we don’t mean, “is 100% guaranteed”, we mean, “eh, it’s pretty close”. A DNA “match” directed the police to someone they thought was a relative of a suspect, who was then brought in for questioning. So I guess as long as you’ve never done anything wrong, and aren’t related to anyone ever doing anything wrong, you’re fine. Right?
I feel so much better.
I had investors literally laugh at the idea that collecting this data introduced security concerns. They grew up at Google, so it’s not surprising they could not see centralization as a problem. Just like Equifax started out wanting to make it easier to get loans, and now they’ve got so much power you can’t get one without them.
There is a world of difference between giving someone your data, and allowing someone to include your data in a massive pile of it. Any discussion of the risks of data needs to acknowledge that.
Now we see our discussions of owning your own data don’t quite have it right. What we actually want is decentralization of data. We don’t want a single company to have access to this much information about huge groups of people.
And now you see the problem.
New technology can’t break Facebook’s business model. It can’t prevent Google from scraping every web site on the internet and identifying you by connecting everything. Whether you give it to them or not, they’ll know what you look like, where you live, and who you hang out with.
Most importantly, it can’t prevent people from sharing all that data with these services. After all, they’re getting something valuable in return, like connecting with friends and family. Or figuring out their family tree.
The problem is not the centralization. It’s the effectiveness of a business model built on centralization.
So anyone who comes to you and says “The blockchain will allow you to own your own data!”, ask them in return, “How will you make it such a joy to use that Facebook will go bankrupt?” And please, record the conversation, because I want to see them stammer.
This is fundamentally a product and design problem, but the technofuturists are treating it like a technology problem. “Oh, if only those college students had access to better cryptographic tools they never would have shared that data with Facebook!” 🤯 No. People will stop using Facebook, and 23andMe, and Google, when there are better solutions. And unfortunately, they need to work ten times better, not just a little bit.
So talk to me about the blockchain. I really do want to hear how you’ll use it to help people own their own data, and remove the incentive to centralize all of this data.
But talk to me of products. Of user benefits. Of business models built around all of this.
Because people have to want what you’re selling, and the only way to get that is to build something they want to use. Only then will they be able to own their own data.