Decentralize Your Data

What does owning your data even mean, and can the Blockchain help?

Photo courtesy of Dennis Kummer

Congress recently required Mark Zuckerberg to defend his lifelong practice of mistreating your private information. Movements to give you control of this critical data took the opportunity to claim they can prevent future such breaches. Blockchain is the new solution in search of a problem, and personal data is in the crosshairs.

But can the blockchain actually help secure your personal data? What would that take? And seriously, what do people mean when they say we should own our own data?

It sounds nice. Too bad it won’t help. The problem is not “ownership” (whatever that even means in a world of infinite digital copies). It’s centralization. Having one person’s data is a small threat, only to that individual. Having everyone’s data is a national crisis.

By now we’re familiar with the huge amounts of data that Facebook, Google, Amazon, and apparently everyone except Apple have on us. But how did they get it? Mostly, we gave it to them, through using their products. What we didn’t give to them, we gave to someone else who then passed it on.

There have been massive breaches at Equifax, Facebook[1], and many others. Even the general public is becoming aware of the real causes. Some of the largest companies in the world exist purely to collect your information and sell access to you based on it. They might not sell your data, but they definitely sell your attention using it.

These are the problems you know about. Don’t worry; it gets worse from here. If you think your birthdate and pictures of your kids are personal, what about your DNA?

Anne Wojcicki is married to a Google founder, and she liked their data accumulation so much she started her own company to build a huge pile of even more personal data. 23andMe does not scrape the internet — or your cheeks — to get your DNA; no, people pay for the privilege of giving it to them. Yes, they offer a service in return, but do they clean house after? Hah! No. They keep it. (Hopefully somewhat more safely than Equifax does.)

What’s so wrong about there being a database of DNA from a big chunk of the population? Let’s ask the police.

You might not be afraid of the police. You should consider yourself lucky. I know anyone of color in the US is and should be. I know I am; I grew up on a commune, and police raided us using helicopters and assault rifles in hopes of busting us for cannabis. I don’t mean to imply that hippies have been as systematically oppressed as African Americans (and certainly not in the south); just that I grew up with my own justified skepticism of exactly what that force was here for.

Even if you don’t fear the police, you should fear the consequences of DNA testing. The science behind most parts of DNA is absolutely rock solid[2]. The police work is another matter. Beyond outright fraud used to wrongly convict people, the messy world of testing DNA at crime scenes just makes it hard to get correct results. Juries inappropriately treat a complicated test as foolproof. It could be compromised anywhere from the crime scene to police handling to the lab itself. The failure rate even without fraud is high enough that I would not want to trust my life to it.

Not to imply that DNA testing is worthless; quite the contrary. It has been used to exonerate many people who were incorrectly imprisoned and put on death row. It’s not that it always fails, just that you don’t want to find yourself gambling on it against life in prison.

But remember: This is just for cases where someone has a single person’s DNA. Like having just your fingerprint. What happens when someone like 23andMe has a whole database of it?

“If you didn’t do anything wrong, then you have nothing to fear.” Pfft. Yes, it starts with requests for the DNA of individual suspects, but it escalates to doing a database-wide search for DNA that matches. And by ‘matches’, we don’t mean, “is 100% guaranteed”, we mean, “eh, it’s pretty close”. A DNA “match” directed the police to someone they thought was a relative of a suspect, who was then brought in for questioning. So I guess as long as you’ve never done anything wrong, and aren’t related to anyone ever doing anything wrong, you’re fine. Right?

I feel so much better.

I had investors literally laugh at the idea that collecting this data introduced security concerns. They grew up at Google, so it’s not surprising they could not see centralization as a problem. Just like Equifax started out wanting to make it easier to get loans, and now they’ve got so much power you can’t get one without them.

There is a world of difference between giving someone your data, and allowing someone to include your data in a massive pile of it. Any discussion of the risks of data needs to acknowledge that.

Now we see our discussions of owning your own data don’t quite have it right. What we actually want is decentralization of data. We don’t want a single company to have access to this much information about huge groups of people.

And now you see the problem.

New technology can’t break Facebook’s business model. It can’t prevent Google from scraping every web site on the internet and identifying you by connecting everything. Whether you give it to them or not, they’ll know what you look like, where you live, and who you hang out with.

Most importantly, it can’t prevent people from sharing all that data with these services. After all, they’re getting something valuable in return, like connecting with friends and family. Or figuring out their family tree.

The problem is not the centralization. It’s the effectiveness of a business model built on centralization.

So anyone who comes to you and says “The blockchain will allow you to own your own data!”, ask them in return, “How will you make it such a joy to use that Facebook will go bankrupt?” And please, record the conversation, because I want to see them stammer.

This is fundamentally a product and design problem, but the technofuturists are treating it like a technology problem. “Oh, if only those college students had access to better cryptographic tools they never would have shared that data with Facebook!” 🤯 No. People will stop using Facebook, and 23andMe, and Google, when there are better solutions. And unfortunately, they need to work ten times better, not just a little bit.

So talk to me about the blockchain. I really do want to hear how you’ll use it to help people own their own data, and remove the incentive to centralize all of this data.

But talk to me of products. Of user benefits. Of business models built around all of this.

Because people have to want what you’re selling, and the only way to get that is to build something they want to use. Only then will they be able to own their own data.

This is the third article in a series of indefinite length on The Blockchain Without Blockchain.

  1. Although technically not a breach, since their usage rules weren’t broken — that’s how little they respect your privacy.
  2. Although we’ve still got a lot to learn about the epigenome, so don’t think we’re done here.

Trusting more with the blockchain

Society is built on trust, and improves or weakens with it.

Photo by Nathaniel Tetteh

I know I have trust issues. I don’t need the blockchain crowd telling me.

Trusting is scary. We’ve all been burned at some point. But we can also look back and see trusting someone helped us develop, personally and professionally. None of us could be who we are if we had not learned this critical skill. Knowing where and how to trust is critical to growth, to life. It’s not even just humans — we can see this in our pets, our livestock.

A cynic might say that trust limits us. That if we only had less, we could do and be more. I’m not exactly known for looking on the bright side, but even I know this is wrong. Trust is the infrastructure for our experiences. Removing it flattens everything, not just limiting what you can do but limiting why you would do it.

Our problem is too little trust, not too much.

We know the stereotype of someone who does not trust. Someone outside of society. We know a person who cannot trust is broken in some way, missing something critical, in need of healing. Many of us also know the allure of not needing to trust, or be trusted. “Ah, to be independent, to owe nothing to anyone…”

This is the dream of remembered childhood. It was always a lie. We were failing to notice the work being done in our name, for us. It was a joyous lie, made more pleasant with the golden tinge of nostalgia. Grown, we miss the lie, we reach for it.

But deep down, we know: More than anything else, life is about trust.

Great companies have been built on this truth. eBay could only exist by creating trust between unknown parties.

Some look at this and see failure. “If only eBay had not needed trust…”

One of the Blockchain’s great claims is enabling commerce between people who don’t trust each other. Never mind that of course you still have to trust something — the code, the packaging of what you’re buying, the exchange, etc. You might scoff and say these are a given, but none of those things can be trusted in the current world of the blockchain. Never mind that commerce has always been done between people with little or no trust. That’s not what matters.

It is philosophical, psychological: Given the recognition that life is enriched by trust, and more riches require more trust, what do you do? Find a way to add trust to your life, or look for a way to get riches without it?

I can’t say the blockchain people are wrong. Maybe they really do need some kind of trustless commerce. I don’t know them. Well, other than the drug dealers. I know why they want this.

But in my life, for my problems? More trust is the answer, not less.

Ironically, the blockchain can actually help with that. Without changing a thing. Its boosters are right about its utility, they’re just wrong about why it works.

I don’t like to trust people with my data. People talk about wanting to own their data, being able to share bits with Facebook but not the whole thing. It’s a nice, if naive[1], idea, but that’s not what I mean.

I don’t trust you to touch it. You’ll muck it up.

Heck, I don’t even trust myself. Actually, I was never given that choice. My apps don’t trust me with my own data. They keep it hidden away somewhere, behind an API, in proprietary formats.

Their distrust is reasonable. I don’t know how the app works. The data model is hidden, the storage internal. Most importantly, they can’t tell if I mess with it, and they can’t fix what I break.

Things were in some ways better in the age of documents, but now our data is all hidden. We ask them to give us access, and they sometimes comply with simplistic APIs. But they do not trust us.

What if they did? What if I were allowed access to my own data? What if I could share it with you, my close friend, because I trust you with it?

I mean, not entirely trust. I’m not stupid. We’re not that close.

With the right tools, I could see what you did, understand it, ensure it all makes sense. You could change it, query it, hand it back to me, and I could validate the whole thing. Get the best out of your work, but keep safety lines in place.

Again, a cynic would call this an example of eliminating trust. But is it?

Is the key to this new interaction really that I don’t trust you?

No. I don’t want just anyone to have my data. It’s for you. My close friend. Who I already trust. Mostly.

This change does not entice me to share with psychopaths, strangers, or, god forbid, the people I went to high school with. It provides just enough of a bridge that I’m willing to give you, my good friend, who I just met on the internet, rights that I’d otherwise hold back.

Of course I know this is not what blockchain people mean when they talk about trust. Meh. I’m not interested in making capitalism even less moral, less human. I don’t even want to hang out with the people who do. But I am interested in making data more useful. And I’m especially interested in connecting with other people.

And this certainly does that.

Now my applications can expose their insides. They can be slugs instead of snails.[2] I can use the apps I love, but my tools can fill their gaps. I can script my way around their missing APIs and limited reporting. Heck, I don’t need their interfaces at all. Just because they’re provided doesn’t make them good, and I can get there faster using the tools I already know.

I can pick the best app, without committing to a long-term relationship. I can give you my data, take advantage of what you offer, and if I want to change later, I don’t lose everything.

There’s a great example out there: GitHub. If they went away tomorrow, I would lose almost no data I care about. They host, but do not control, my most important data: my source code. I get some utility out of their hosting it, but they don’t get special rights.[3]

GitHub actually created a new kind of trust relationship: Because its users can trust the data store, they began to trust strangers to contribute code. If I were using Subversion, I would have to give you all or no access; in Git, I can give you qualified access. GitHub calls these ‘pull requests’. “Yes, you can contribute code, but I get to read it first.” That enables a flowering of trust, potentially leading to a deep relationship. That path to complete trust is much narrower, much harder without this infrastructure of gradual trust.

You can choose whether to see this as more or less trust. You can’t argue with the new bonds created, the new groups formed, all through the help of tooling. Almost like how commerce starts with low-trust exchanges of money and can lead to deep and meaningful relationships.

What would a world look like if all of my applications had just as much, or as little, ownership of my data as GitHub does?

Of course, the apps won’t like it much. Their trusting me with my data also gives me power over it, where today I have none.

With enough usage, our expectations start to change. Given two otherwise equivalent accounting apps, wouldn’t you pick the one that trusted you? That gave you equal access to your data, simplifying automation and reporting?

I would.

This is the world I want.

This is the second article in a series of indefinite length on The Blockchain without Blockchain.

  1. Facebook does not have all of your data because you provided a massive dump at once; they’ve painstakingly collected it over the years, bit by bit. You can be sure they’ll store every little piece you selectively reveal to them.
  2. Honestly I don’t know if slugs are more useful, but they’re certainly more vulnerable. And I could not think of a better analogy.
  3. There is some data that only they have. This is a limitation in git, more than the plan of GitHub. E.g., my follower list is not in my repo, which is probably good, but it’s not anywhere else I can access, which is probably bad.

Blockchain without Blockchain

There is powerful technology hidden in the blockchain. Just strip out the parts everyone cares about

Image courtesy of MemeGenerator

People are very excited about the blockchain, convinced it’s going to replace cash, or gold, or credit cards, or even the internet. I’ve even read claims that it’s the most important invention since little things like money and democracy. Starving through a failed attempt at utopia has taught me to align with the skeptics who point to the lack of proven use cases (other than fraud and crime), and with the pranksters who make fun of the claims of salvation.

That doesn’t mean I think it’s worthless.

The economic use cases (cash, gold standard, “sound money”, etc.) rest somewhere between unproven and conspiracy theory. None get any love from me. Just because I don’t respect the Austrian school of economics doesn’t mean I don’t understand it. They’re so stuck on their commitment to not learning from history that they can’t let go of the switch away from the gold standard. I am also not silly enough to think that the anti-government libertarians are in it for anyone but themselves.

I’m only here for the tech, people.

The world of the blockchain is in a state of flux. This is normal for movements, and it is especially normal for the definitions of words to shift quickly. I experienced this directly as Puppet was helping to create the DevOps space. Today, when we say ‘blockchain’ we mean a specific set of features provided by the code underlying Bitcoin; some might add the requirement for smart contracts, as added by Ethereum.

This definition won’t last.

I think that when we look up in five years, the word will mean something quite different. As the conversation inevitably shifts from hype about its universal applicability to finding practical use cases, I am convinced the term ‘blockchain’ will invert from its maximalist form, including all possible features, to a much more minimal form, defining a base set of functionality that is suitable to solving far more problems. Different implementations will layer features onto that base set, but all of these derivatives will be called ‘blockchain’ (which will soon have purists crying we’re not using the “real” blockchain).

It’s that base set of functionality that I’m interested in. I’m hoping the world’s excitement over the complete package can turn into momentum around a more flexible solution. What can we take away, and what can we do once we strip things down?

Let’s start with the most obvious thing we don’t need: Lack of trust. Free markets are built on trust, so it’s silly to try to build one that isn’t. I just don’t run into many problems that can only be solved if I never trust anyone. Even writing that out makes me cringe.

I’m interested in power tools, user productivity, and helping teams, which inherently means I’m interested in people who know and trust each other. That isn’t to say data validation and simple synchronization are never helpful, but trust between people is not a problem. So take out that whole part of the story.

Next up is consensus. Let’s be honest: You don’t really want the world to have access to most of your data. And most of you who do, well, it probably doesn’t want to see it. This solution for deciding whose copy of data is right? I don’t need it.

That isn’t to say there aren’t important consensus problems; it’s just that the blockchain’s version of them is not useful to me. I do actually struggle with managing conflicts in my own work. I’m writing this essay on one of the six devices I regularly work from, and I obviously want it to transparently synchronize between the rest. I can dream about never having conflicts, but in the real world I need a simple way to handle them when they happen. That’s my consensus algorithm. If I use the Blockchain’s, then one chunk of work wins and the rest of the work gets thrown away. That might work well in some environments, but is obviously a non-starter for my personal work. Instead, I need an algorithm that gives me control over managing my own work streams.

I expect a blockchain hodler would flip at calling this a “consensus algorithm”. It’s not. It’s more like a tool for managing merge conflicts. If you buy into the blockchain, you’ve added a ton of complexity that can and should be handled by a person.
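To make the contrast concrete, here is a minimal sketch of that kind of tool: a three-way merge over simple key/value documents. Everything in it is an assumption for illustration (the `three_way_merge` name, the dict-shaped data, the `MergeResult` type); it is not how any particular blockchain or sync product works. The point is only that when two devices change the same field, neither change is thrown away. Both are handed back for a person to resolve.

```python
from dataclasses import dataclass, field

# Sentinel for "this key does not exist in this copy" (hypothetical helper).
_MISSING = object()


@dataclass
class MergeResult:
    merged: dict                                    # everything that merged cleanly
    conflicts: dict = field(default_factory=dict)   # key -> (mine, theirs)


def three_way_merge(base: dict, mine: dict, theirs: dict) -> MergeResult:
    """Merge two edited copies of a document against their common ancestor."""
    merged, conflicts = {}, {}
    for key in base.keys() | mine.keys() | theirs.keys():
        b = base.get(key, _MISSING)
        m = mine.get(key, _MISSING)
        t = theirs.get(key, _MISSING)
        if m == t:
            chosen = m      # both copies agree (or both deleted the key)
        elif m == b:
            chosen = t      # only the other copy changed it
        elif t == b:
            chosen = m      # only my copy changed it
        else:
            # Both changed it differently: keep both versions for a human.
            conflicts[key] = (
                None if m is _MISSING else m,
                None if t is _MISSING else t,
            )
            chosen = b      # leave the ancestor's value in place for now
        if chosen is not _MISSING:
            merged[key] = chosen
    return MergeResult(merged, conflicts)


base = {"title": "Draft", "body": "Hello"}
laptop = {"title": "Draft", "body": "Hello world"}
phone = {"title": "Final", "body": "Hello there"}
result = three_way_merge(base, laptop, phone)
print(result.merged["title"])   # the phone's title change merges cleanly
print(result.conflicts)         # both versions of "body" survive
```

A "longest chain wins" rule would have kept one device's edits and silently dropped the other's. Here the losing work is still on the table.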

It’s not a big leap from managing conflicts between one person’s devices to managing conflicts within a team. This is still an important problem, one wrestled with by anyone building online collaboration tools, but requires nothing like the level of infrastructure built into the blockchain. After all, I trust myself. I trust my team. I like to be able to verify, but I’m more worried about mistakes and data loss than I am about intentional subversion of the system. And actually, the current solution is horrible for teams. It’s one thing for someone else’s work to cause yours to sit in a queue for a bit, it’s another thing entirely to have your code thrown away because someone is in front of you.

Taking out the trust-less consensus allows us to remove the worst part of bitcoin: Proof of work. This is how parties in a blockchain transaction fight to have their work accepted and others’ rejected. If they win, they are rewarded with a token, which is how they’re “paid” for their work.
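For anyone who has not seen proof of work up close, here is a toy version. It is a sketch under loud assumptions: the function names are made up, and difficulty is expressed as a number of leading zero hex digits in a single SHA-256 hash, which is a simplification of what Bitcoin actually does (double SHA-256 over a block header, compared against a numeric target). The shape of the idea is the same: finding a valid nonce takes a lot of guessing, while checking one takes a single hash.

```python
import hashlib
from itertools import count


def _digest(payload: bytes, nonce: int) -> str:
    """SHA-256 of the payload with the nonce appended, as hex."""
    return hashlib.sha256(payload + str(nonce).encode()).hexdigest()


def proof_of_work(payload: bytes, difficulty: int) -> int:
    """Brute-force the first nonce whose digest starts with `difficulty` zeros."""
    target = "0" * difficulty
    for nonce in count():
        if _digest(payload, nonce).startswith(target):
            return nonce


def verify(payload: bytes, nonce: int, difficulty: int) -> bool:
    """Checking a claimed nonce costs one hash, no matter how long the search took."""
    return _digest(payload, nonce).startswith("0" * difficulty)


nonce = proof_of_work(b"my block of transactions", difficulty=3)
print(nonce, verify(b"my block of transactions", nonce, 3))
```

Each extra zero digit multiplies the expected number of guesses by sixteen, and every one of those guesses is work done purely so that strangers do not have to trust each other.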

We don’t need any of that.

We’ve already established we don’t need an automated, trustless process for deciding who gets to update the database (and we’ve concluded we don’t want conflicting work just thrown away). As a result, this whole mechanism can just be removed. Good thing, too, because bitcoin is in the process of consuming all of the world’s natural resources in the name of never trusting anyone. This also simplifies the problem of scaling these databases. Bitcoin is stupid slow because its proof of work system is stupid slow. Remove that, and any iPhone can trivially record new data as fast as you want.

Now that we don’t need proof of work, we also don’t need, surprise!, the currency itself. In a trustless system, tokens are used to compensate the networks recording the transactions. This is now so cheap we don’t need to reward people to do it.

There you go. Now you’ve got the blockchain, except without cryptocurrencies, trustless transactions, consensus algorithms, or proof of work.

The blockchain without blockchain.

It’s important to note: It’s not that I think none of these features are ever useful in any case. It’s that none of them are necessary to get the most important features out of the blockchain, and each of them should only ever be added to a system if they’re truly needed. In most cases, YAGNI.

What can we do now? Well, obviously, now that we’ve removed so much functionality we can do a lot more. General purpose languages are more powerful than specialized ones, and a more flexible database is a more powerful one. The current set of blockchain features can really only be used for a narrow set of use cases, and even those seem to be more theoretical than actual.

What’s left?

A database. That’s what the blockchain always was anyway. It’s built on Merkle trees, which means you can validate every change back to initialization if necessary. This makes it kind of trustless: I can validate your work, rather than just trusting your word of what you did. This ability actually encourages me to trust you more with my data, though.
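Here is a minimal sketch of what "validate every change back to initialization" can look like. To keep it short it uses a plain hash chain (each entry stores the hash of the one before it) rather than a full Merkle tree, and the `append`, `validate`, and `entry_hash` names plus the JSON encoding are all assumptions for illustration, not any real blockchain's format.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the very first entry


def entry_hash(prev_hash: str, data: dict) -> str:
    """Hash an entry's data together with the hash of the entry before it."""
    payload = json.dumps({"prev": prev_hash, "data": data}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()


def append(log: list, data: dict) -> None:
    """Add a change to the log, linking it to everything that came before."""
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"prev": prev, "data": data, "hash": entry_hash(prev, data)})


def validate(log: list) -> bool:
    """Recompute every hash from the first entry forward."""
    prev = GENESIS
    for entry in log:
        if entry["prev"] != prev:
            return False
        if entry["hash"] != entry_hash(prev, entry["data"]):
            return False
        prev = entry["hash"]
    return True


log = []
append(log, {"account": "rent", "amount": 1200})
append(log, {"account": "coffee", "amount": 4})
print(validate(log))             # an untouched log checks out

log[0]["data"]["amount"] = 12    # someone quietly edits history
print(validate(log))             # the stored hashes no longer match
```

No mining, no tokens, no network of strangers. I hand you the log, you hand it back, and I can check in milliseconds whether the history I gave you is still the history I have.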

Check back later for more about what I mean.

This article is the first in a series of indefinite length. I’ll explore the consequences of this stripping away. I will also address some of the potential I see for the full current blockchain feature set. I look forward to approaching this topic from a different direction.