Owning My Strava Ride

I know what it means to own my code contributions on Github, but what does it mean to own my rides on Strava?

Photo by William Hook

I recently went mountain biking in Bend, Oregon, which is a few hours’ drive from my house. As I usually do, I used Strava to record the ride. This was a new trail for me, and I got lost multiple times, so I also used Trailforks to figure out how to get back to my vehicle.

I’ve been thinking a lot about how services like Facebook start out helping us, but at some point the relationship shifts and we’re stuck helping them more. I don’t think Strava has gotten to a stage where the relationship is abusive, and maybe this is weird but I hope they stay too small to ever get there. Even without that shift, though, I am a little uncomfortable with our relationship.

I was standing on the trail, trying to find the right trail and idly watching a coyote search for chipmunks. As I switched between apps, I realized that I was losing a lot by recording in Strava instead of Trailforks. Or rather, I wasn’t; people who want to bike this trail in Bend were.

I log my rides (and my infrequent, hilariously slow runs) in Strava for, um, reasons. I don’t want kudos, and I don’t really look at my historical performance, although I do enjoy being able to study my rides at times. I want the data there, even if I rarely use it. And it conveniently automatically completes my daily exercise goal in Streaks. It’s kind of useful, but if it went away tomorrow, I wouldn’t miss the online component much, if at all.

If I instead recorded all of the trail rides in Trailforks, then everyone who came after me could get some value from the information provided by my ride. They could see what route I took, could likely tell based on my speed when I had to walk and when I was able to ride, and they could over time get a sense of what routes are most popular, or even what signage is confusing.

I built an open source company, so I’ve thought a lot about the worth of contributions. A developer’s time on one project can’t be spent on another. Someone who writes documentation for your baby is giving up the opportunity to contribute elsewhere. It’s a conscious choice on the part of the contributor, and a constant interaction between the project and its constituents to keep people coming back.

I think Trailforks really understands the value of my contributions: If you do this, people who ride here will have higher quality data, and probably better rides.

I am super confident that Strava knows the value of my usage: I’ll get feedback from friends and I can track my speeds and feeds. But those aren’t contributions; that’s not something I’m giving up for the greater good. It’s something I am doing for selfish reasons.

The value of my trail ride could be for the greater good, though. Even my road rides and runs could be, as they could help people find routes, but the trail rides especially seem valuable, because the downsides of being lost out there are materially worse than not having the right route for your run.

It’s clear to me that Strava is not seeing my data as a contribution. They’re focused on engagement. That’s not inherently bad — lots of people use and love the app — but it is different. I find it interesting to think of what the experience would look like if that changed.

But after that, I thought: Why can’t I just share the data with both apps?

I mean, to some extent I can. I can just run both of them and let each record its own view of the world. This is what I did with Slopes and Strava in Mammoth, taking the lifts up and riding down. That made a little more sense because neither quite has the correct view of the world — Slopes doesn’t know what bikes are, and Strava doesn’t know what lifts are. It was pretty kludgy, but more importantly, I didn’t run into this conflict because the apps exist to do pretty much the same thing, just for different sports.

I could duplicate here, I assume, but… it seems stupid.

Beyond our relationships to service providers, I’ve been thinking about what it means to own your own data. It sounds awesome, but it’s rarely very useful in practice.

It turns out, I do own the data that I have posted on Strava. Great! So I’ll just share it with Trailforks, too.

Hmm. What would that look like? Can I… download the data, and then upload it to Trailforks? Is it a common data type?

Can I record it separately on my phone and post it to both apps? Is that what truly owning my data would look like?

It’s hard to imagine that world: You use apps that generate data, which by default is yours and only yours. It gets recorded in forms that are easy to share, understand, and manage. If you like, you can then contribute that data to other sites, and in doing so, you get to negotiate with them exactly what rights you’re passing on. Either way you have the data, but now they get a copy, too. If you don’t like their offer, you still get the data, and most likely, given you have all your delicious data, other apps will crop up with a different offer, because they can focus on that rather than all the data collection.

It would look a lot like the text editor I’m using to write this article, Ulysses. It allows me to publish, but is built first and foremost to make it easy to write. Sharing, contributing, engagement, and all of the other online stuff is left to other sites, other apps, like WordPress and Medium. And yeah, those apps do allow both writing and publishing, but it’s a horrible experience, a great way to lose data, and if you only write there then your data is stuck in their system and is pretty hard to get out in a useful way.

The world of writing looks weirdly different from the world of recording rides. And a lot worse.

I’m not in control. Legally I own the data, but, ah, I don’t have it. Strava does.

I would never write directly in Medium, so why am I logging my rides directly in Strava? What am I giving up because of it?

I’m pleased to find that Strava will allow me to give other people the data — the data that I own! — and it turns out that Trailforks knows how to slurp it out of Strava.

So it all ends well: My rides are in both locations, and every mountain bike ride I post to Strava will now be automatically imported into Trailforks. Probably.

But for that brief moment, in Bend, while watching the coyote… I saw what it would take for me to really own my data. I liked it a lot.

An alternative crypto-history

Filesystems, version control, and the blockchain

Image courtesy of Adarsh Kummur

This is the fourth, and possibly final, article in my series on the blockchain without blockchain.

I’ve always had a warm place in my heart for filesystems.

I taught myself shell scripting while automating the installation of Disksuite, Sun’s free but sadistic disk mirroring software. I barely recall the actual work, instead remembering a hallway. I undertook a literal journey to learn programming, a repeated pilgrimage to the desk of a friend who took visible pleasure in explaining to me what I was doing wrong.[1] It’s fair to say that if filesystems were less painful in the 90s, I would not be where I am today.

When Sun started advertising ZFS as the (finally!) successor to Disksuite and the filesystem it was built around, UFS, most of its functionality seemed obviously good — make the computers manage the disks, don’t demand people know up front how big a filesystem should be, don’t fail miserably when the server crashes, little things like that. But what was this data integrity thing? I’m embarrassed to say it took me a while to realize I needed it — who really cares if your filesystem is good at storing data, amirite? — and even longer to understand how it worked.

To explain it, I’m going to have to teach you cryptography. Just a little. You’re welcome to skip ahead if you’ve already got this part covered, but I expect most could use a little, ah, refresher. Step 1 in cryptography guides is usually: “Get a masters in mathematics from MIT.” I’m hoping to do a bit better than that. Cryptography really is just a form of math, and while we can’t all understand the details (I certainly don’t) we can at least understand the “algorithms happen here” flow diagrams[2].

Cryptography is most famous for its privacy utility: You use it to ensure you and only you can read your files and chat messages. It gets more complex once we need to read them on all of our different devices, but most of it is pretty similar in concept. Even more useful is ensuring both you and I can read some text, but no one else can. It’s more complex, but is essentially an extension of that first use.

Privacy is not the only use case for cryptography. It’s also useful for efficient validation. That is, it can be used to see if a file you have today is the same one you had yesterday. I sent you a document, you think it looks wrong; how do we make sure it did not get changed somehow in transit?

Obviously one way to do that is to just send it again. This is not a great solution, because if you did not trust it the first time, why would you trust it the second? That might also be a bad idea if bandwidth is expensive. You generally want a verification mechanism that takes less space than the original file, and less CPU power than directly comparing the two files.

Cryptography provides just such a capability, usually called a ‘hash function’. It’s an algorithm that converts, say, a large text file into a much shorter string. If you want to ensure the file is not changed in some way, just run it again and compare the output. The short strings are easier to compare than the long documents, and you could even read them over the phone to someone so they can check the file on their end. These algorithms generally produce a string of a fixed length, regardless of input — this makes them efficient for long term storage and comparison, and safe to run on any size file. Here’s an example hash from my files:

03f39f4bfad04f6f2cfe09ced161ab740094905c

As you can see, it’s just a long string of gibberish. It’s only useful for comparison, not meaningful in its own right.
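Here is a minimal sketch of those properties using Python’s standard `hashlib` module. The 40-character example above is consistent with SHA-1, so that is what this sketch assumes; the article itself does not name the algorithm, and the `fingerprint` helper is a hypothetical name used only for illustration.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return the SHA-1 digest of data as a 40-character hex string."""
    return hashlib.sha1(data).hexdigest()


short_doc = b"hello"
long_doc = b"hello" * 1_000_000  # about 5 MB of input

# The digest length is fixed, regardless of input size.
print(len(fingerprint(short_doc)), len(fingerprint(long_doc)))

# The same input always produces the same digest...
print(fingerprint(short_doc) == fingerprint(b"hello"))

# ...while a one-character change produces a different digest.
print(fingerprint(b"hello") == fingerprint(b"hellp"))
```

The short hex string is the thing you would read over the phone; the five megabytes stay where they are.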

What’s critical about these algorithms is that given a unique input they always provide a unique output. If you and I each have a file that hashes to a given string, then we can be confident we have exactly the same file. Of course, this can’t literally be true: We could design a hash function that only had 256 possible outputs, and there are obviously more than 256 possible inputs. This would produce a lot of what are called collisions, when two files hash to the same output, and, ah, is not terribly useful.
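To make the 256-output thought experiment concrete, here is a small sketch. The `tiny_hash` function (sum of bytes modulo 256) and the `file-N` inputs are invented purely for illustration; no real system should use a hash like this.

```python
from typing import Dict, Optional, Tuple


def tiny_hash(data: bytes) -> int:
    """A deliberately weak hash: sum of bytes modulo 256 (only 256 outputs)."""
    return sum(data) % 256


def find_collision(limit: int = 1000) -> Optional[Tuple[bytes, bytes]]:
    """Hash distinct inputs until two of them share an output."""
    seen: Dict[int, bytes] = {}
    for i in range(limit):
        candidate = f"file-{i}".encode()
        digest = tiny_hash(candidate)
        if digest in seen:
            return seen[digest], candidate
        seen[digest] = candidate
    return None


collision = find_collision()
assert collision is not None  # guaranteed once more than 256 inputs are tried
first, second = collision
print(first, second, tiny_hash(first))
```

With only 256 possible outputs, any 257 distinct inputs must include a collision, and in practice one turns up much sooner.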

All of the modern hash functions produce incredibly long outputs. It is possible in theory but not in practice that a collision would happen. You’d need to execute the function 2^128 times. That’s about 3.4 × 10^38, a number 39 digits long. So, mathematically possible, but you can expect the sun to swallow the earth before the most secure hash functions get compromised. I mean, you can’t. You’ll be gone by then. But your files will still be safe.
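The arithmetic is easy to check for yourself. In this sketch the rate of one billion hashes per second is a made-up figure chosen only to give a sense of scale, not a measurement of any real hardware.

```python
attempts = 2**128
print(attempts)            # the full integer
print(len(str(attempts)))  # how many digits it has

HASHES_PER_SECOND = 1_000_000_000  # hypothetical rate, for illustration only
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60

years = attempts / HASHES_PER_SECOND / SECONDS_PER_YEAR
print(f"{years:.2e} years")
```

Even at that assumed rate, the result is many orders of magnitude beyond the few billion years usually quoted for the sun’s remaining life.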

Now that you’re at least as much an expert on cryptography as most of the bitcoin hodlers, why does any of this matter?

We were talking about data integrity.

You’d be right to guess that ZFS uses these hash functions to provide it. It goes further than just validating individual files. A little bit of cryptographic genius called a Merkle tree is the key. These don’t just hash the content on disk for later validation; they build a tree of hashes, where the leaf nodes are hashed by the nodes above them in the tree, which are themselves hashed by the root node. If any part of this system is corrupted — because the disk is broken, or someone changed the content some other way — it’s easy to detect. It’s not just that the individual hash will be different; remember each parent hashes all of its children, so now the parent is wrong. And its parent is wrong, too.

If the content is changed by any mechanism that does not also update the Merkle tree, then it is easy to detect by rehashing all of the content and comparing the results to the stored tree.
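A minimal sketch of the idea, assuming a simple binary tree over a list of blocks and SHA-256 as the hash. Pairing hashes two at a time and duplicating the last hash on odd levels is just one common convention; it is not a description of how ZFS actually lays out its tree, and `merkle_root` is a name made up for this example.

```python
import hashlib
from typing import List


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(blocks: List[bytes]) -> bytes:
    """Hash each block, then hash pairs of hashes upward until one root remains."""
    if not blocks:
        raise ValueError("need at least one block")
    level = [h(block) for block in blocks]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last hash on odd levels
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


blocks = [b"block 0", b"block 1", b"block 2", b"block 3"]
stored_root = merkle_root(blocks)

# Corrupt one block without updating the stored root.
blocks[2] = b"block 2 (corrupted)"
print(merkle_root(blocks) == stored_root)  # False: the change reaches the root
```

Because every parent hashes its children, a change to a single leaf changes every hash on the path up to the root, so comparing one root value is enough to know something is wrong.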

This is how ZFS validates data integrity. It can write a block to disk, then pull the block and ensure it still matches the hash. When it writes a block, it updates the parallel tree, and when you ask for the block later, it can tell you if the block is still correct. If it’s not, it throws an error instead of handing it back to you.
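The write-then-verify-on-read behavior can be sketched in a few lines. This is a toy model, not ZFS: the `VerifiedStore` class, its two dictionaries, and the `CorruptBlockError` exception are all invented here, with one dictionary standing in for the disk and the other for the separately stored checksums.

```python
import hashlib
from typing import Dict


class CorruptBlockError(Exception):
    """Raised when a block no longer matches its recorded checksum."""


class VerifiedStore:
    """Toy block store: checksums are kept apart from the data they describe."""

    def __init__(self) -> None:
        self.blocks: Dict[int, bytes] = {}   # stands in for the disk
        self.checksums: Dict[int, str] = {}  # stands in for the parallel tree

    def write(self, block_id: int, data: bytes) -> None:
        self.blocks[block_id] = data
        self.checksums[block_id] = hashlib.sha256(data).hexdigest()

    def read(self, block_id: int) -> bytes:
        data = self.blocks[block_id]
        if hashlib.sha256(data).hexdigest() != self.checksums[block_id]:
            raise CorruptBlockError(f"block {block_id} failed verification")
        return data


store = VerifiedStore()
store.write(7, b"important data")
assert store.read(7) == b"important data"

# Simulate silent corruption underneath the store.
store.blocks[7] = b"important dat4"
try:
    store.read(7)
except CorruptBlockError as err:
    print(err)
```

The important design choice is that the caller gets an error rather than bad data: a loud failure is something you can act on, while a silently wrong block is not.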

When I first learned of this, it seemed overkill, but over time I remembered just how many ways there are for data to get corrupted. The most obvious one is someone changes it for nefarious reasons, but far more commonly you have a failure somewhere in the writing or reading process. The old spinning disks were error-prone, and the new SSD drives degrade eventually. It’s the complexity of reading and writing that really gets you, though: There are multiple layers of caches, drivers, and connections, any of which could introduce corruption.

For the first time on a normal production system, you could at least detect any of those problems. It’s too bad no one ever used it.[3]

I know, I know, you came to hear about how you could get all the awesomeness of blockchain without using the blockchain and instead I’m giving lessons on two things you could literally not care less about, cryptography and filesystems. Don’t worry. It gets worse from here.

Long after I learned about and promptly forgot ZFS (after all, it’s not like I was using it), I adopted Git. It’s a version control system, used for storing and managing source code. Every geek knows about it, but most of the world only recently learned of it when Microsoft bought Github for $7.5B with a ‘b’. I was an early adopter, switching Puppet to Git in 2008[4]. Eventually I even learned how it works. I was titillated and a bit horrified that I had duplicated in Puppet one of the key features that made Git work: A system of storing files that allowed them to be looked up by their content (or rather, a hash of their content). Normally you store files by a name, but if lots of people (or, in Puppet’s case, computers) store the same file, they might not call it the same thing, so Git and Puppet instead stored them by their hash. This ensured we never backed up more than one copy of a file, saving a lot of space, and made it easy to check for changes in files.

For Puppet, we just used this to back up files we changed, in case people later wanted to revert.
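Storing files by the hash of their content can be sketched like this. It is an illustration of the general technique, not the code of either tool: the `ContentStore` class is hypothetical, and hashing the raw content with SHA-1 is a simplification (Git, for instance, hashes a small type-and-length header together with the content).

```python
import hashlib
from typing import Dict


class ContentStore:
    """Toy content-addressed store: the key is the hash of the content."""

    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}

    def put(self, content: bytes) -> str:
        key = hashlib.sha1(content).hexdigest()
        self.objects[key] = content  # identical content lands on the same key
        return key

    def get(self, key: str) -> bytes:
        return self.objects[key]


store = ContentStore()
key_a = store.put(b"PermitRootLogin no\n")   # a file backed up from one machine
key_b = store.put(b"PermitRootLogin no\n")   # the same content from another
key_c = store.put(b"PermitRootLogin yes\n")  # a changed version

print(key_a == key_b)      # True: same content, same key
print(len(store.objects))  # 2: the duplicate was stored only once
```

Both benefits fall out of the key choice: deduplication is free because identical content maps to one key, and checking for a change is just comparing two short strings.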

Git did a lot more than that.

Like ZFS, it builds a Merkle tree of the entire file repository, with a similar goal: To understand what files have changed and how. After all, git is used to track and share changes to a collection of files. The sharing is a critical component; you can easily copy an entire git repository to another computer, or another person, and it’s important that they be able to confirm that they have a faithful copy.

Git stores the hash tree alongside all of the files. At any point, you can use the tree to validate every file in your tree. If there are changes (which is pretty much the whole point of a version control system), it can automatically store the new files and update the related tree.

Just like ZFS, one of the key features here is that the Merkle tree allows us to validate every file stored. We can walk the file tree and compare each file to its hash, and then compare the file listing to its own hash, all the way up. Any discrepancy is easily spotted.
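That walk can be sketched as follows. The object format here (a `blob` or `tree` tag on the first line, and tree listings of `hash name` lines) is invented for this example and is only loosely modeled on Git; it is not Git’s real on-disk format, and `store_node` and `verify` are hypothetical names.

```python
import hashlib
from typing import Dict, Union

Node = Union[bytes, dict]  # file content, or a directory mapping names to nodes


def store_node(node: Node, objects: Dict[str, bytes]) -> str:
    """Store a file or directory in `objects` and return its hash."""
    if isinstance(node, bytes):
        payload = b"blob\n" + node
    else:
        # A directory is stored as a listing of child hashes and names,
        # so its own hash covers everything beneath it.
        lines = [f"{store_node(child, objects)} {name}"
                 for name, child in sorted(node.items())]
        payload = b"tree\n" + "\n".join(lines).encode()
    key = hashlib.sha1(payload).hexdigest()
    objects[key] = payload
    return key


def verify(key: str, objects: Dict[str, bytes]) -> bool:
    """Check an object against its hash, then walk down into its children."""
    payload = objects[key]
    if hashlib.sha1(payload).hexdigest() != key:
        return False
    kind, _, body = payload.partition(b"\n")
    if kind == b"tree" and body:
        for line in body.decode().split("\n"):
            child_key = line.split(" ", 1)[0]
            if not verify(child_key, objects):
                return False
    return True


objects: Dict[str, bytes] = {}
root = store_node({"README": b"hello\n", "src": {"main.py": b"print('hi')\n"}}, objects)
print(verify(root, objects))  # True

# Tamper with one stored file without touching any hashes.
victim = next(k for k, v in objects.items() if v == b"blob\nhello\n")
objects[victim] = b"blob\nhullo\n"
print(verify(root, objects))  # False
```

Given only the root hash, anyone holding a copy of the objects can confirm they have a faithful copy, which is what makes sharing a repository with a stranger workable.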

This is my favorite kind of cleverness: It’s simple in implementation, yet makes Git more flexible and useful. It has power that other version control systems are missing, just because it relies on this basic mechanism for storage and validation.

Ok. Now we get to the point.

Again, I’m not actually interested in the blockchain. I’m interested in peeling it apart, putting the useful bits to work while avoiding the whole anarcho-capitalist aspect.

It would be easy to see the blockchain as a sudden revolution, a dramatic change in what’s possible. Viewed this way, it’s hard to separate the pieces from the whole. If all you see is the big picture, it’s easy not to notice that every individual component has its own history, its own value.

The blockchain was gradual, for both me and the industry. It was not one giant leap forward. It was part of a story, a sequence, and the most interesting aspect — Merkle trees — is decades old in math and now pushing decades old even in popular usage. Most of the interesting features touted in the blockchain come directly from them. Immutability (which isn’t) and trustless systems derive directly.

It’s worth understanding that history, to see which stages and steps apply to problems you have. The current cryptocurrency tech stack is built to solve problems I don’t think exist. Certainly they aren’t problems I have.

Unlike the blockchain as a whole, though, the individual technical components have been used for years, even decades, in production. Focusing on the current trend can blind you to the opportunity history demonstrates. I think you’re a lot more likely to find broadly applicable solutions there than in trying to replace currency.

Because I got here from the world of filesystems and version control, I see different benefits than you might if you approach thinking of currencies or exchanges. Or chat messages. That does not make me right or wrong, but it does, at least, mean we’re going to work on different problems.

I expect most of you think this is boring. That’s great. It will give me that much more time to build something.

  1. My brightest memory is learning that of course the ‘echo’ command resets the exit code variable. This was a critical early lesson in how your own debugging can dramatically change the behavior of a program.
  2. When people talk about the futility of trying to ban cryptography, this is what they mean: You can’t ban math.
  3. Yes, I know some people use and love ZFS. But never to the extent it should be.
  4. Resulting in one of our critical community members abandoning Puppet in protest, for some reason.

Decentralize Your Data

What does owning your data even mean, and can the Blockchain help?

Photo courtesy of Dennis Kummer

Congress recently required Mark Zuckerberg to defend his lifelong practice of mistreating your private information. Movements to give you control of this critical data took the opportunity to claim they can prevent future such breaches. Blockchain is the new solution in search of a problem, and personal data is in the crosshairs.

But can the blockchain actually help secure your personal data? What would that take? And seriously, what do people mean when they say we should own our own data?

It sounds nice. Too bad it won’t help. The problem is not “ownership” (whatever that even means in a world of infinite digital copies). It’s centralization. Having one person’s data is a small threat, only to that individual. Having everyone’s data is a national crisis.

By now we’re familiar with the huge amounts that Facebook, Google, Amazon, and apparently everyone except Apple have on us. But how did they get it? Mostly, we gave it to them, through using their products. What we didn’t give to them, we gave to someone else who then passed it on.

There have been massive breaches at Equifax, Facebook[1], and many others. Even the general public is becoming aware of the real causes. Some of the largest companies in the world exist purely to collect your information and sell access to you based on it. They might not sell your data, but they definitely sell your attention using it.

These are the problems you know about. Don’t worry; it gets worse from here. If you think your birthdate and pictures of your kids are personal, what about your DNA?

Anne Wojcicki is married to a Google founder, and she liked their data accumulation so much she started her own company to build a huge pile of even more personal data. 23andMe does not scrape the internet — or your cheeks — to get your DNA; no, people pay for the privilege of giving it to them. Yes, they offer a service in return, but do they clean house after? Hah! No. They keep it. (Hopefully somewhat more safely than Equifax does.)

What’s so wrong about there being a database of DNA from a big chunk of the population? Let’s ask the police.

You might not be afraid of the police. You should consider yourself lucky. I know anyone of color in the US is and should be. I know I am; I grew up on a commune, and police raided us using helicopters and assault rifles in hopes of busting us for cannabis. I don’t mean to imply that hippies have been as systematically oppressed as African Americans (and certainly not in the south); just that I grew up with my own justified skepticism of exactly what that force was here for.

Even if you don’t fear the police, you should fear the consequences of DNA testing. The science behind most parts of DNA is absolutely rock solid[2]. The police work is another matter. Beyond outright fraud used to wrongly convict people, the messy world of testing DNA at crime scenes just makes it hard to get correct results. Juries inappropriately treat a complicated test as foolproof. It could be compromised anywhere from the crime scene to police handling to the lab itself. The failure rate even without fraud is high enough that I would not want to trust my life to it.

Not to imply that DNA testing is worthless; quite the contrary. It has been used to exonerate many people who were incorrectly imprisoned and put on death row. It’s not that it always fails, just that you don’t want to find yourself gambling on it against life in prison.

But remember: This is just for cases where someone has a single person’s DNA. Like having just your fingerprint. What happens when someone like 23andMe has a whole database of it?

“If you didn’t do anything wrong, then you have nothing to fear.” Pfft. Yes, it starts with requests for the DNA of individual suspects, but it escalates to doing a database-wide search for DNA that matches. And by ‘matches’, we don’t mean, “is 100% guaranteed”, we mean, “eh, it’s pretty close”. A DNA “match” directed the police to someone they thought was a relative of a suspect, who was then brought in for questioning. So I guess as long as you’ve never done anything wrong, and aren’t related to anyone ever doing anything wrong, you’re fine. Right?

I feel so much better.

I had investors literally laugh at the idea that collecting this data introduced security concerns. They grew up at Google, so it’s not surprising they could not see centralization as a problem. Just like Equifax started out wanting to make it easier to get loans, and now they’ve got so much power you can’t get one without them.

There is a world of difference between giving someone your data, and allowing someone to include your data in a massive pile of it. Any discussion of the risks of data needs to acknowledge that.

Now we see our discussions of owning your own data don’t quite have it right. What we actually want is decentralization of data. We don’t want a single company to have access to this much information about huge groups of people.

And now you see the problem.

New technology can’t break Facebook’s business model. It can’t prevent Google from scraping every web site on the internet and identifying you by connecting everything. Whether you give it to them or not, they’ll know what you look like, where you live, and who you hang out with.

Most importantly, it can’t prevent people from sharing all that data with these services. After all, they’re getting something valuable in return, like connecting with friends and family. Or figuring out their family tree.

The problem is not the centralization. It’s the effectiveness of a business model built on centralization.

So anyone who comes to you and says “The blockchain will allow you to own your own data!”, ask them in return, “How will you make it such a joy to use that Facebook will go bankrupt?” And please, record the conversation, because I want to see them stammer.

This is fundamentally a product and design problem, but the technofuturists are treating it like a technology problem. “Oh, if only those college students had access to better cryptographic tools they never would have shared that data with Facebook!” 🤯 No. People will stop using Facebook, and 23andMe, and Google, when there are better solutions. And unfortunately, they need to work ten times better, not just a little bit.

So talk to me about the blockchain. I really do want to hear how you’ll use it to help people own their own data, and remove the incentive to centralize all of this data.

But talk to me of products. Of user benefits. Of business models built around all of this.

Because people have to want what you’re selling, and the only way to get that is to build something they want to use. Only then will they be able to own their own data.

This is the third article in a series of indefinite length on The Blockchain Without Blockchain

  1. Although technically not a breach, since their usage rules weren’t broken — that’s how little they respect your privacy.
  2. Although we’ve still got a lot to learn about the epigenome, so don’t think we’re done here.

Trusting more with the blockchain

Society is built on trust, and improves or weakens with it.

Photo by Nathaniel Tetteh

I know I have trust issues. I don’t need the blockchain crowd telling me.

Trusting is scary. We’ve all been burned at some point. But we can also look back and see trusting someone helped us develop, personally and professionally. None of us could be who we are if we had not learned this critical skill. Knowing where and how to trust is critical to growth, to life. It’s not even just humans — we can see this in our pets, our livestock.

A cynic might say that trust limits us. That if we only had less, we could do and be more. I’m not exactly known for looking on the bright side, but even I know this is wrong. Trust is the infrastructure for our experiences. Removing it flattens everything, not just limiting what you can do but limiting why you would do it.

Our problem is too little trust, not too much.

We know the stereotype of someone who does not trust. Someone outside of society. We know a person who cannot trust is broken in some way, missing something critical, in need of healing. Many of us also know the allure of not needing to trust, or be trusted. “Ah, to be independent, to owe nothing to anyone…”

This is the dream of remembered childhood. It was always a lie. We were failing to notice the work being done in our name, for us. It was a joyous lie, made more pleasant with the golden tinge of nostalgia. Grown, we miss the lie, we reach for it.

But deep down, we know: More than anything else, life is about trust.

Great companies have been built on this truth. eBay could only exist by creating trust between unknown parties.

Some look at this and see failure. “If only eBay had not needed trust…”

One of the Blockchain’s great claims is enabling commerce between people who don’t trust each other. Never mind that of course you still have to trust something — the code, the packaging of what you’re buying, the exchange, etc. You might scoff and say these are a given, but none of those things can be trusted in the current world of the blockchain. Never mind that commerce has always been done between people with little or no trust. That’s not what matters.

It is philosophical, psychological: Given the recognition that life is enriched by trust, and more riches require more trust, what do you do? Find a way to add trust to your life, or look for a way to get riches without it?

I can’t say the blockchain people are wrong. Maybe they really do need some kind of trustless commerce. I don’t know them. Well, other than the drug dealers. I know why they want this.

But in my life, for my problems? More trust is the answer, not less.

Ironically, the blockchain can actually help with that. Without changing a thing. Its boosters are right about its utility, they’re just wrong about why it works.

I don’t like to trust people with my data. People talk about wanting to own their data, being able to share bits with Facebook but not the whole thing. It’s a nice, if naive[1], idea, but that’s not what I mean.

I don’t trust you to touch it. You’ll muck it up.

Heck, I don’t even trust myself. Actually, I was never given that choice. My apps don’t trust me with my own data. They keep it hidden away somewhere, behind an API, in proprietary formats.

Their distrust is reasonable. I don’t know how the app works. The data model is hidden, the storage internal. Most importantly, they can’t tell if I mess with it, and they can’t fix what I break.

Things were in some ways better in the age of documents, but now our data is all hidden. We ask them to give us access, and they sometimes comply with simplistic APIs. But they do not trust us.

What if they did? What if I were allowed access to my own data? What if I could share it with you, my close friend, because I trust you with it?

I mean, not entirely trust. I’m not stupid. We’re not that close.

With the right tools, I could see what you did, understand it, ensure it all makes sense. You could change it, query it, hand it back to me, and I could validate the whole thing. Get the best out of your work, but keep safety lines in place.

Again, a cynic would call this an example of eliminating trust. But is it?

Is the key to this new interaction really that I don’t trust you?

No. I don’t want just anyone to have my data. It’s for you. My close friend. Who I already trust. Mostly.

This change does not entice me to share with psychopaths, strangers, or, god forbid, the people I went to high school with. It provides just enough of a bridge that I’m willing to give you, my good friend, who I just met on the internet, rights that I’d otherwise hold back.

Of course I know this is not what blockchain people mean when they talk about trust. Meh. I’m not interested in making capitalism even less moral, less human. I don’t even want to hang out with the people who do. But I am interested in making data more useful. And I’m especially interested in connecting with other people.

And this certainly does that.

Now my applications can expose their insides. They can be slugs instead of snails.[2] I can use the apps I love, but my tools can fill their gaps. I can script my way around their missing APIs and limited reporting. Heck, I don’t need their interfaces at all. Just because they’re provided doesn’t make them good, and I can get there faster using the tools I already know.

I can pick the best app, without committing to a long-term relationship. I can give you my data, take advantage of what you offer, and if I want to change later, I don’t lose everything.

There’s a great example out there: Github. If they went away tomorrow, I would lose almost no data I care about. They host, but do not control, my most important data: my source code. I get some utility out of their hosting it, but they don’t get special rights.[3]

Github actually created a new kind of trust relationship: Because its users can trust the data store, they began to trust strangers to contribute code. If I were using Subversion, I would have to give you all or no access; in Git, I can give you qualified access. Github calls these ‘pull requests’. “Yes, you can contribute code, but I get to read it first.” That enables a flowering of trust, potentially leading to a deep relationship. That path to complete trust is much narrower, much harder without this infrastructure of gradual trust.

You can choose whether to see this as more or less trust. You can’t argue with the new bonds created, the new groups formed, all through the help of tooling. Almost like how commerce starts with low-trust exchanges of money and can lead to deep and meaningful relationships.

What would a world look like if all of my applications had just as much, or as little, ownership of my data as Github does?

Of course, the apps won’t like it much. Their trusting me with my data also gives me power over it, where today I have none.

With enough usage, our expectations start to change. Given two otherwise equivalent accounting apps, wouldn’t you pick the one that trusted you? That gave you equal access to your data, simplifying automation and reporting?

I would.

This is the world I want.

This is the second article in a series of indefinite length on The Blockchain without Blockchain.

  1. Facebook does not have all of your data because you provided a massive dump at once; they’ve painstakingly collected it over the years, bit by bit. You can be sure they’ll store every little piece you selectively reveal to them.
  2. Honestly I don’t know if slugs are more useful, but they’re certainly more vulnerable. And I could not think of a better analogy.
  3. There is some data that only they have. This is a limitation in git, more than the plan of GitHub. E.g., my follower list is not in my repo, which is probably good, but it’s not anywhere else I can access, which is probably bad.

Blockchain without Blockchain

There is powerful technology hidden in the blockchain. Just strip out the parts everyone cares about

Image courtesy of MemeGenerator

People are very excited about the blockchain, convinced it’s going to replace cash, or gold, or credit cards, or even the internet. I’ve even read claims that it’s the most important invention since little things like money and democracy. Starving through a failed attempt at utopia has taught me to align with the skeptics who point to the lack of proven use cases (other than fraud and crime), and with the pranksters who make fun of the claims of salvation.

That doesn’t mean I think it’s worthless.

The economic use cases (cash, gold standard, “sound money”, etc.) rest somewhere between unproven and conspiracy theory. None get any love from me. Just because I don’t respect the Austrian school of economics doesn’t mean I don’t understand it. They’re so stuck on their commitment to not learning from history that they can’t let go of the switch away from the gold standard. I am also not silly enough to think that the anti-government libertarians are in it for anyone but themselves.

I’m only here for the tech, people.

The world of the blockchain is in a state of flux. This is normal for movements, and it’s especially normal for the definitions of words to shift quickly. I experienced this directly as Puppet was helping to create the DevOps space. Today, when we say ‘blockchain’ we mean a specific set of features provided by the code underlying Bitcoin; some might add the requirement for smart contracts, as added by Ethereum.

This definition won’t last.

I think that when we look up in five years, the word will mean something quite different. As the conversation inevitably shifts from hype about its universal applicability to finding practical use cases, I am convinced the term ‘blockchain’ will invert from its maximalist form, including all possible features, to a much more minimal form, defining a base set of functionality that is suitable to solving far more problems. Different implementations will layer features onto that base set, but all of these derivatives will be called ‘blockchain’ (which will soon have purists crying we’re not using the “real” blockchain).

It’s that base set of functionality that I’m interested in. I’m hoping the world’s excitement over the complete package can turn into momentum around a more flexible solution. What can we take away, and what can we do once we strip things down?

Let’s start with the most obvious thing we don’t need: Lack of trust. Free markets are built on trust, so it’s silly to try to build one without it. I just don’t run into many problems that can only be solved if I never trust anyone. Even writing that out makes me cringe.

I’m interested in power tools, user productivity, and helping teams, which inherently means I’m interested in people who know and trust each other. That isn’t to say data validation and simple synchronization are never helpful, but trust between people is not a problem. So take out that whole part of the story.

Next up is consensus. Let’s be honest: You don’t really want the world to have access to most of your data. And most of you who do, well, it probably doesn’t want to see it. This solution for deciding whose copy of data is right? I don’t need it.

That isn’t to say there aren’t important consensus problems; it’s just that the blockchain’s version of them is not useful to me. I do actually struggle with managing conflicts in my own work. I’m writing this essay on one of the six devices I regularly work from, and I obviously want it to transparently synchronize across the rest. I can dream about never having conflicts, but in the real world I need a simple way to handle them when they happen. That’s my consensus algorithm. If I use the blockchain’s, then one chunk of work wins and the rest of the work gets thrown away. That might work well in some environments, but it’s obviously a non-starter for my personal work. Instead, I need an algorithm that gives me control over managing my own work streams.

I expect a blockchain hodler would flip at calling this a “consensus algorithm”. It’s not. It’s more like a tool for managing merge conflicts. If you buy into the blockchain, you’ve added a ton of complexity that can and should be handled by a person.
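To make the difference concrete, here’s a minimal sketch of the kind of merge I mean, assuming each document is just a dictionary of named sections. The `merge` function and the sample section names are hypothetical, made up for illustration. Changes made on only one device are accepted automatically; where two devices changed the same section differently, both versions are kept and handed to a person instead of one being thrown away.

```python
def merge(base, mine, theirs):
    """Three-way merge of dicts; returns (merged, conflicts).

    Assumption: a missing key and a value of None both mean "absent".
    """
    merged, conflicts = {}, {}
    for key in sorted(set(base) | set(mine) | set(theirs)):
        b, m, t = base.get(key), mine.get(key), theirs.get(key)
        if m == t:        # both sides agree (or neither changed it)
            value = m
        elif m == b:      # only the other device changed it
            value = t
        elif t == b:      # only this device changed it
            value = m
        else:             # both changed it differently: keep both for a human
            conflicts[key] = (m, t)
            continue
        if value is not None:
            merged[key] = value
    return merged, conflicts


# Hypothetical example: the same essay edited on two devices.
base = {"title": "Draft", "intro": "Hello"}
laptop = {"title": "Blockchain without Blockchain", "intro": "Hello"}
phone = {"title": "Stripped-down blockchain", "intro": "Hello there"}

merged, conflicts = merge(base, laptop, phone)
print(merged)     # sections that merged cleanly
print(conflicts)  # sections where a person has to choose
```

Nothing here decides a winner. The conflicting title comes back with both candidates intact, which is the whole point.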

It’s not a big leap from managing conflicts between one person’s devices to managing conflicts within a team. This is still an important problem, one wrestled with by anyone building online collaboration tools, but it requires nothing like the level of infrastructure built into the blockchain. After all, I trust myself. I trust my team. I like to be able to verify, but I’m more worried about mistakes and data loss than I am about intentional subversion of the system. And actually, the current solution is horrible for teams. It’s one thing for someone else’s work to cause yours to sit in a queue for a bit; it’s another thing entirely to have your code thrown away because someone is in front of you.

Taking out the trust-less consensus allows us to remove the worst part of bitcoin: Proof of work. This is how parties in a blockchain transaction fight to have their work accepted and others’ rejected. If they win, they are rewarded with a token, which is how they’re “paid” for their work.

We don’t need any of that.

We’ve already established we don’t need an automated, trustless process for deciding who gets to update the database (and we’ve concluded we don’t want conflicting work just thrown away). As a result, this whole mechanism can just be removed. Good thing, too, because bitcoin is in the process of consuming all of the world’s natural resources in the name of never trusting anyone. This also simplifies the problem of scaling these databases. Bitcoin is stupid slow because its proof of work system is stupid slow. Remove that, and any iPhone can trivially record new data as fast as you want.
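A toy sketch shows where the cost comes from. This is not Bitcoin’s actual scheme (which, as I understand it, hashes block headers against a numeric target); it’s a simplified stand-in where “work” means guessing a nonce until a SHA-256 hash starts with a few zeros. The `mine` and `plain_hash` names and the sample record are made up for illustration.

```python
import hashlib


def mine(data, difficulty):
    """Toy proof of work: try nonces until the hash has `difficulty` leading zeros."""
    prefix = "0" * difficulty
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{data}:{nonce}".encode("utf-8")).hexdigest()
        if digest.startswith(prefix):
            return nonce, digest
        nonce += 1


def plain_hash(data):
    """Without proof of work, recording an entry costs exactly one hash."""
    return hashlib.sha256(data.encode("utf-8")).hexdigest()


record = "ride: Bend, 21 km"  # hypothetical record
nonce, digest = mine(record, difficulty=3)
print(f"proof of work: {nonce + 1} hashes to find {digest[:12]}...")
print(f"plain append:  1 hash to get {plain_hash(record)[:12]}...")
```

Each extra zero of difficulty multiplies the expected number of guesses by sixteen. Drop the requirement and the write costs a single hash.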

Now that we don’t need proof of work, we also don’t need, surprise!, the currency itself. In a trustless system, tokens are used to compensate the networks recording the transactions. This is now so cheap we don’t need to reward people to do it.

There you go. Now you’ve got the blockchain, except without cryptocurrencies, trustless transactions, consensus algorithms, or proof of work.

The blockchain without blockchain.

It’s important to note: It’s not that I think none of these features are ever useful in any case. It’s that none of them are necessary to get the most important features out of the blockchain, and each of them should only ever be added to a system if they’re truly needed. In most cases, YAGNI.

What can we do now? Well, obviously, now that we’ve removed so much functionality we can do a lot more. General purpose languages are more powerful than specialized ones, and a more flexible database is a more powerful one. The current set of blockchain features can really only be used for a narrow set of use cases, and even those seem to be more theoretical than actual.

What’s left?

A database. That’s what the blockchain always was anyway. It’s built on Merkle trees, which means you can validate every change back to initialization if necessary. This makes it kind of trustless: I can validate your work, rather than just trusting your word of what you did. This ability actually encourages me to trust you more with my data, though.
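Here’s a minimal sketch of that kind of validation, assuming a simple hash chain rather than a full Merkle tree (the chain is the simpler relative, and enough to show the idea). Every entry stores the hash of its parent, so anyone holding the log can recheck it from the first entry forward. The names `append`, `verify`, and `GENESIS`, and the sample ride data, are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder parent for the very first entry


def entry_hash(parent, data):
    """Hash an entry's data together with its parent's hash."""
    payload = json.dumps({"parent": parent, "data": data}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append(log, data):
    """Add an entry that points back at the current tip of the log."""
    parent = log[-1]["hash"] if log else GENESIS
    log.append({"parent": parent, "data": data, "hash": entry_hash(parent, data)})
    return log


def verify(log):
    """Recompute every hash from initialization; False if anything was altered."""
    parent = GENESIS
    for entry in log:
        if entry["parent"] != parent:
            return False
        if entry["hash"] != entry_hash(parent, entry["data"]):
            return False
        parent = entry["hash"]
    return True


log = []
append(log, {"ride": "Bend", "km": 21})
append(log, {"ride": "home loop", "km": 8})
print(verify(log))  # the untouched log checks out
```

If someone quietly edits an old entry, its stored hash no longer matches and `verify` fails. I don’t have to take your word for what you did; I can check.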

Check back later for more about what I mean.

This article is the first in a series of indefinite length. I’ll explore the consequences of this stripping away. I will also address some of the potential I see for the full current blockchain feature set. I look forward to approaching this topic from a different direction.