An alternative crypto-history

Filesystems, version control, and the blockchain

Image courtesy of Adarsh Kummur

This is the fourth, and possibly final, article in my series on the blockchain without blockchain.

I’ve always had a warm place in my heart for filesystems.

I taught myself shell scripting while automating the installation of Disksuite, Sun’s free but sadistic disk mirroring software. I barely recall the actual work, instead remembering a hallway. I undertook a literal journey to learn programming, a repeated pilgrimage to the desk of a friend who took visible pleasure in explaining to me what I was doing wrong.1 It’s fair to say that if filesystems were less painful in the 90s, I would not be where I am today.

When Sun started advertising ZFS as the (finally!) successor to Disksuite and the filesystem it was built around, UFS, most of its functionality seemed obviously good — make the computers manage the disks, don’t demand people know up front how big a filesystem should be, don’t fail miserably when the server crashes, little things like that. But what was this data integrity thing? I’m embarrassed to say it took me a while to realize I needed it — who really cares if your filesystem is good at storing data, amirite? — and even longer to understand how it worked.

To explain it, I’m going to have to teach you cryptography. Just a little. You’re welcome to skip ahead if you’ve already got this part covered, but I expect most could use a little, ah, refresher. Step 1 in cryptography guides is usually: “Get a masters in mathematics from MIT.” I’m hoping to do a bit better than that. Cryptography really is just a form of math, and while we can’t all understand the details (I certainly don’t) we can at least understand the “algorithms happen here” flow diagrams2.

Cryptography is most famous for its privacy utility: You use it to ensure you and only you can read your files and chat messages. It gets more complex once we need to read them on all of our different devices, but most of it is pretty similar in concept. Even more useful is ensuring both you and I can read some text, but no one else can. It’s more complex, but is essentially an extension of that first use.

Privacy is not the only use case for cryptography. It’s also useful for efficient validation. That is, it can be used to see if a file you have today is the same one you had yesterday. I sent you a document, you think it looks wrong; how do we make sure it did not get changed somehow in transit?

Obviously one way to do that is to just send it again. This is not a great solution, because if you did not trust it the first time, why would you trust it the second? That might also be a bad idea if bandwidth is expensive. You generally want a verification mechanism that takes less space than the original file, and less CPU power than directly comparing the two files.

Cryptography provides just such a capability, usually called a ‘hash function’. It’s an algorithm that converts, say, a large text file into a much shorter string. If you want to ensure the file is not changed in some way, just run it again and compare the output. The short strings are easier to compare than the long documents, and you could even read them over the phone to someone so they can check the file on their end. These algorithms generally produce a string of a fixed length, regardless of input — this makes them efficient for long term storage and comparison, and safe to run on any size file. Here’s an example hash from my files:

03f39f4bfad04f6f2cfe09ced161ab740094905c

As you can see, it’s just a long string of gibberish. It’s not only useful for comparison, not meaningful in it’s own right.

What’s critical about these algorithms is that given a unique input they always provide a unique output. If you and I each have a file that hashes to a given string, then we can be confident we have exactly the same file. Of course, this can’t literally be true: We could design a hash function that only had 256 possible outputs, and there are obviously more than 256 possible inputs. This would produce a lot of what are called collisions, when two files hash to the same output, and, ah, is not terribly useful.

All of the modern hash functions are incredibly long. It is possible in theory but not in practice that a collision would happen. You’d need to execute the function 2^128 times. That’s 3.4 with 38 zeros after it. So, mathematically possible, but you can expect the sun to swallow the earth before the most secure hash functions get compromised. I mean, you can’t. You’ll be gone by then. But your files will still be safe.

Now that you’re at least as much an expert on cryptography as most of the bitcoin hodlers, why does any of this matter?

We were talking about data integrity.

You’d be right to guess that ZFS uses these hash functions to provide it. It goes further than just validating individual files. A little bit of cryptographic genius called a Merkle tree is the key. These don’t just hash the content on disk for later validation; they build a tree of hashes, where the leaf nodes are hashed by the nodes above them in the tree, which are themselves hashed by the root node. If any part of this system is corrupted — because the disk is broken, or someone changed the content some other way — it’s easy to detect. It’s not just that the individual hash will be different; remember each parent hashes all of its children, so now the parent is wrong. And its parent is wrong, too.

If the content is changed by any mechanism that does not also also update the Merkle tree, then it is easy to detect by rehashing all of the content and comparing the results to the stored tree.

This is how ZFS validates data integrity. It can write a block to disk, then pull the block and ensure it still matches the hash. When it writes a block, it updates the parallel tree, and when you ask for the block later, it can tell you if the block is still correct. If it’s not, it throws an error instead of handing it back to you.

When I first learned of this, it seemed overkill, but over time I remembered just how many ways there are for data to get corrupted. The most obvious one is someone changes it for nefarious reasons, but far more commonly you have a failure somewhere in the writing or reading process. The old spinning disks were error-prone, and the new SSD drives degrade eventually. It’s the complexity of reading and writing that really gets you, though: There are multiple layers of caches, drivers, and connections, any of which could introduce corruption.

For the first time on a normal production system, you could at least detect any of those problems. It’s too bad no one ever used it.3

I know, I know, you came to hear about how you could get all the awesomeness of blockchain without using the blockchain and instead I’m giving lessons on two things you could literally not care less about, cryptography and filesystems. Don’t worry. It gets worse from here.

Long after I learned about and promptly forgot ZFS (after all, it’s not like I was using it), I adopted Git. It’s a version control system, used for storing and managing source code. Every geek knows about it, but most of the world only recently learned of it when Microsoft bought Github for $7.5B with a ‘b’. I was an early adopter, switching Puppet to Git in 20084. Eventually I even learned how it works. I was titillated and a bit horrified that I had duplicated in Puppet one of the key features that made Git work: A system of storing files that allowed them to be looked up by their content (or rather, a hash of their content). Normally you store files by a name, but if lots of people (or, in Puppet’s case, computers) store the same file, they might not call it the same thing, so Git and Puppet instead stored them by their hash. This ensured we never backed up more than one copy of a file, saving a lot of space, and made it easy to check for changes in files.

For Puppet, we just used this to back up files we changed, in case people later wanted to revert.

Git did a lot more than that.

Like ZFS, it builds a Merkle tree of the entire file repository, with a similar goal: To understand what files have changed and how. After all, git is used to track and share changes to a collection of files. The sharing is a critical component; you can easily copy an entire git repository to another computer, or another person, and it’s important that they be able to confirm that they have a faithful copy.

Git stores the hash tree alongside all of the files. At any point, you can use the tree to validate every file in your tree. If there are changes (which is pretty much the whole point of a version control system), it can automatically store the new files and update the related tree.

Just like ZFS, one of the key features here is that the Merkle tree allows us to validate every file stored. We can walk the file tree and compare each file to its hash, and then compare the file listing to its own hash, all the way up. Any discrepancy is easily spotted.

This is my favorite kind of cleverness: It’s simple in implementation, yet makes Git more flexible and useful. It has power that other version control systems are missing, just because it relies on this basic mechanism for storage and validation.

Ok. Now we get to the point.

Again, I’m not actually interested in the blockchain. I’m interested in peeling it apart, putting the useful bits to work while avoiding the whole anarcho-capitalist aspect.

It would be easy to see the blockchain as a sudden revolution, a dramatic change in what’s possible. Viewed this way, it’s hard to separate the pieces from the whole. If all you see is the big picture, it’s easy not to notice that every individual component has its own history, its own value.

The blockchain was gradual, for both me and the industry. It was not one giant leap forward. It was part of a story, a sequence, and the most interesting aspect — Merkle trees — is decades old in math and now pushing decades old even in popular usage. Most of the interesting features touted in the blockchain come directly from them. Immutability (which isn’t) and trustless systems derive directly.

It’s worth understanding that history, to see which stages and steps apply to problems you have. The current cryptocurrency tech stack is built to solve problems I don’t think exist. Certainly they aren’t problems I have.

Unlike the blockchain as a whole, though, the individual technical components have been used for years, even decades, in production. Focusing on the current trend can blind you to the opportunity history demonstrates. I think you’re a lot more likely to find broadly applicable solutions there than in trying to replace currency.

Because I got here from the world of filesystems and version control, I see different benefits than you might if you approach thinking of currencies or exchanges. Or chat messages. That does not make me right or wrong, but it does, at least, mean we’re going to work on different problems.

I expect most of you think this is boring. That’s great. It will give me that much more time to build something.

  1. My brightest memory is learning that of course the ‘echo’ command resets the exit code variable. This was a critical early lesson in how your own debugging can dramatically change the behavior of a program.
  2. When people talk about the futility of trying to ban cryptography, this is what they mean: You can’t ban math.
  3. Yes, I know some people use and love ZFS. But never to the extent it should be.
  4. Resulting in one of our critical community members abandoning Puppet in protest, for some reason.

The virtue of a tool

Vendor success should be about customer needs, not its own.

Photo by Philip Swinburn

I am a tool junkie. I love the effortless balance of a well-known chef’s knife, like my hands know what to do all on their own. Heavy usage builds callouses and tunes muscles, its usefulness evidenced by scuff marks and changed infrastructure. Failure leaves blisters or even hospital visits in its wake.

A good tool proves its utility. Knives slowly shrink with sharpening, work pants thin, machines need oil. If they don’t, you’re either not maintaining your tools, or barely using them.

This wear is proof of your usage. They should be scratched. Dented. Aged. Patinas should be acquired from the shop, not factory treatments. Their callouses should pair yours. Tools can not be precious. They’ll just live on a shelf, then retire to your attic. You should seek that perfect middle ground, where you spend enough money that your kids can inherit them, but not so much that you are squeamish about giving them a job.

Tools only deserve the label if they help you work.

You might say I have strong feelings about them. I’m assuming this love led to my focus as a software entrepreneur on helping people people work. Or maybe my experience with tools in the physical world led me to seek them in the digital world, learning to make what I could not buy.

Given my tool fetish, you’d think I’d have a solid grasp of what I mean when I use the word. Apparently, not so much. I was recently pulled up short by a simple question, asked by Jordan Hayles of the Radical Brand Lab: What do you mean by tools?

What do you mean, what do I mean? It’s a simple question, right? The above text gives one example, but I would have thought I could answer it in a bunch of reasonable ways, none of which seem terribly controversial.

But the more I explored, the less simple the question became.

I’ve been describing my goal as building power tools for people. This phrase comes from my time building houses with my dad, and ‘power tools’ just meant the things you plugged in. You know? Because they needed power? It’s a common usage, maybe the word choice here did not mean much.

Except… I’ve spent more than a decade learning product management, describing myself as a product-oriented founder, managing that function in a growing company, and attempting to teach it to others. Yet here I am ignoring both the term and the field entirely. Why am I so quickly dumping my work of the last ten years? Is it just creative branding? Cynicism about my industry?

Why not power products? That’s a motor boat of alliteration: ‘power products for people.’ Awesome, right?

Ok, maybe not.

Product management as we know it began in the consumer goods industry. You’re handed a train car full of dish soap and told to sell it. You’ve got to package it, set pricing, convince a local store to carry it, argue with them about location, move it away from competitors, all that. Every product you see in your local grocery store is loved by a product manager who fights for its shelf space, believes it is beautiful, and wants you to give it a good home.

Tide soap is one of the most commonly stolen consumer goods, but not because it’s soap. The strong brand makes it easy to resell, even allowing it to be used as a stand-in for money in drug deals. I wish I was that good at product management. For all that, it says nothing about the soap.

Product management can also be used for evil. Laser printers had toner cartridges you could just refill. Not very clean, but cheap and reliable to run once you plonked down the cash for the expensive printer. Modern inkjet printers instead use disposable cartridges. To sustain profit margins in a rapidly commoditizing industry, manufacturers started putting rules in place on the cartridges: You had to buy them from the manufacturer, they had to be replaced every year, you could not refill them, you could not print in black and white if any color cartridges are empty.

The user was getting hurt so the vendor could make more money. People got pissed of enough that the US Supreme Court weighed in.

That’s good product management. Well, it’s evil, but you know what I mean. It’s effective. We’re talking big-B revenue effective. Hmm. A moral distinction begins to reveal itself.

These are examples of companies forcing their business model onto their customers. There’s no difference between the dish soap sold at retail and the one sold in bulk, yet they’re separate products, differentiated through packaging, shipping needs, and labeling. You pay much more for the retail package than the wholesale one, primarily because the business model behind them is so different.

But when I think of a tool, these complications are missing. When I use a hammer, it just has to fit my hand and smash stuff. When I pick up my drill, it works with every bit I own, regardless of the logo. The battery and charger are proprietary, but the vendor’s most visible role in my life is color choice. My yellow drill works just fine with bits from the blue or green companies. (You probably visualized brands by my just mentioning colors. That’s still effective here.) It does not matter whether I bought the drill from Home Depot or inherited it from my dad; once in my hands, it just works.

I think this begins to answer the question of what a tool is.

It helps you do your job, without your worrying about the vendor’s needs.

I know that DeWalt and Mikita need to make money to sell me a drill, but I don’t think about it when I’m using their tools. Even after more than two decades without one, I can comfortably recite that “my” hammer is the Estwing 22oz waffle head with a straight claw1, but none of those details mean I need the vendor’s permission to hit a nail with it. I make a decision about the right tool, I buy it, I use it. End of story.

It is small. If you call something a tool, not a product, you’re saying it’s less, it’s not as complete a solution. This can be belittling, insulting, but it does not have to be. It’s also a statement of independence. Of freedom. Of, and this is going to sound crazy, compatibility.

Products have an implicit, ongoing dependence on their vendor. If that’s me, I love it: I want you to pay me all the time, not just once. That ongoing relationship is how I afford to keep improving what I’ve built for you. This can be a great way to ensure we have a long-term, sustainable partnership. But it’s not always a healthy relationship. The more you have to deal with how I make money, the worse the experience is for you.

I think this is what I like about tools. They’re self-contained. Independent. Using them is fundamentally pragmatic, not a lifetime commitment.

That independence has downsides for me as a vendor. You don’t get any of those delicious growth-hacker buzzwords. Your product isn’t “sticky”, there’s no “moat.” Those are examples of my customers being constrained by my business model, and their absence means revenue is harder to build, to protect.

One might argue I’m better off because treating my customers with more respect makes a better business in the long term, and I’d probably agree. This kind of respectful partnership should deliver higher returns than one that traps and mistreats its customers. I think this is often the right answer, but it’s not a popular one. It’s harder to get funding, to get off the ground. I might be accused of not “wanting to build a real company,” or I might have Silicon Valley’s most dire insult hurled at me: “That’s just a lifestyle business”.

Tell that to Adobe. Or AutoDesk. These are great tools companies. They are the behemoths we know today because they knuckled down and solved their customers’ problems. They worried about that, rather than how they could extract maximum revenue over time. It was a different time, but people have not changed.

I don’t think that every product is compromised when the vendor’s needs show up in the customer’s life, but I think most are. Some of it is laziness, shoring up product limitations with business model innovations, but a lot of it is strategy, recognizing the value of painting your customer into a corner.

Honestly, some of it is just survival. A lot of those inkjet printers are unaffordably cheap, but buyers care only about cost, not value. Some markets are intrinsically dysfunctional, with users and vendors slowly killing each through bad deals and cynical behavior. But as a vendor, I get to make a choice about what markets to play in, and how to work with my customers.

I am a simple person with a simple dream: I want to build something that helps someone work. I have to make money while doing it, because that’s the nature of the job, but I’m more interested in my customers’ work than my own. I know I need a business model, a go-to-market strategy, a plan for growing and supporting my business. But my customers should not need to care about that, should they? If they like what I’m building, they should be able to buy it, and use it. And tell all their friends how great it is. They should not wake up one day to find they’ve accidentally gotten married to me.

I just want to build tools. And I’m proud of it.

  1. We told with great pleasure the (most likely apocryphal) story that this hammer was illegal in Florida because the metal haft could cut your thumb off.

Several short sentences

I have plenty to share. I can even write it. I just can’t publish it.

Picture courtesy of Cliff Johnson

Last year I decided to write more. Daily, in fact. One of my first actions was to ask Om Malik for advice. I had been following him since he was writing for Business 2.0, which was an actually valuable business magazine in the tech bubble. I am now lucky enough to know him through True Ventures.

When we talked, he shared a story about his first published piece. I did not find it helpful.

The magazine he was writing for (Fortune, I think) said his article was too long. He shortened it. A lot. It was still rejected. He compressed it more. Again. And again. I believe the original 1500 word piece became 300. Om’s punch line was that they told him it was their best-ever first contribution by an author.

I don’t think I’m cut out for writing 300 word articles. It’s not just that that kind of compressed writing is hard. It’s also about the missed opportunity: If you’re reading one of my articles, I want to take full advantage.

But my lack of desire to write short pieces is not why his story wasn’t helpful.

It’s one thing to say: “Write shorter”. Ok. I can see how that makes sense. Everyone knows the line, “I’m sorry this is so long, I didn’t have time to make it shorter.” It’s intuitive, implying that investing in the quality of writing somehow intrinsically shortens it.

But more than a year after this story from Om, I had still not found a way to put this principle into practice.

I could compress writing a bit. Programming taught me that shorter code is often simpler and clearer. Of course, it still took years to be good at it.

Stephen King had taught me that adverbs often indicate a problem. If you can get rid of them without losing meaning, your work is almost always better. One way to do this is to choose more expansive verbs.

I was able to translate my speaking coach’s advice into writing, too: She helped rid my talks of filler words. You might not think this is useful in writing — I never find myself typing out ‘um’1 — but I am over-fond of long phrases that can be easily replaced with a single word.

I tend to over-qualify. This is an opinion piece. You know that. I can skip all the incidences of “I think”. One “maybe” is enough, I don’t need “should maybe”. Two examples suffice; I don’t need three and an “etc”.

For example, in this piece, I replaced the phrase “was still not able to learn” with “had not learned”. “I had learned as a programmer” became “programming taught me”. Both of them are clearer, simpler, just… better. In particular, the verbs are much stronger.

Again, it’s not that I had learned nothing over the last year. I just… I knew I was using tactics. Simple rules. My writing was shorter, but… not short. The Hemingway app was still harshly judging all my failures.

Then I got lucky. I ran into Lucy Bellwood — a fellow Reedie! — at an Indie.vc event, and we traded book recommendations while her partner patiently slept on his feet. She recommended a new book to me: Several Short Sentences About Writing, by Verlyn Klinkenborg.

It’s amazing.

Of course, I doubt I’ll ever finish it.

It reads more like poetry than prose. I do not like poetry. Nearly every paragraph is a single sentence. Almost none grow to more than one line.

Many books on writing can be summarized as: Sit down and write. Seriously. Now. Keep doing that.

Not this one.

I can’t summarize it for you. It’s, it’s… dense. But I can share a little I have learned. And how it has cursed me.

I received a new term from it: Transitions. I knew what adverbs were long before Stephen King taught me that they’re suspect. Not to be entirely avoided, but hold your nose when you use them.

I already knew my sentences were too long. Even when I broke them up, I knew I relied too heavily on words that connect them (like the phrase at the beginning of this one). This knowledge wasn’t useful, because I wasn’t able to translate it into methods of fixing them.

The first thing Klinkenborg gave me was a name for these words: Transitions.

Like adverbs, they’re usually a sign that you’ve failed somewhere. That you were lazy.

This labeling did not magically fix my writing, but it gave me something to track. An easy measure for how I was doing.

Then he delivered the kicker: Simple guidance on how to eliminate them. Or at least, reduce them.

Knowing sentences should be shorter is not useful if you don’t know how to get there. What makes a great short sentence, vs a crappy long one?

His answer is incredibly dense. I have to slow down when reading it: You should minimize the distance between the period of your previous sentence and the point of this one.

Those long, complex sentences I like to write aren’t bad because they’re long, or because they have too many phrases. They’re bad because their point is so far away from their start. The reader is left to wander through it, holding out hope for a conclusion.

What are you trying to say? Say that. Immediately. Did you leave something important out? Say that. Now. Keep it up until you’re done.

Your old, complex sentence is now a series of short sentences, in order of importance, each getting right to the point.

Of course, that’s not how the book describes it. My description would likely horrify its author. There’s far more to it than this. But it’s a start. And a huge departure from what I was doing.

It’s also why I can’t write any more.

The topics I’m working on now are incredibly important to me. They’re hard to reason about, to capture. And while I’m sitting here struggling with the content, the writing itself keeps getting in the way. The form. The structure of the sentences. The line breaks.

Where I put paragraphs.

It’s not that I’m embarrassed. It’s that the process of writing is distracting me from what I am trying to write.

So please. Forgive me a little writer’s block. I promise I’ll get better.

When I do, I hope you’ll also tolerate experimentation and failure in how to put all this to work. Expect wild oscillations in sentence length. Inconsistency between sentences and paragraphs. Confusion across pieces. I’m in that ’conscious incompetence’ phase of short sentences, and it’s going to be a rough path for a little while.

I’ll come out a better writer, though. And hopefully you’ll gain something from witnessing the process.

I recently updated Om on my progress, sharing how much his story confused me and these tactical lessons I’d just received. Thankfully he appreciated the note, and was happy to see I took his story as a challenge, rather than a judgement. And he didn’t seem fazed in the slightest that it took more than a year to make anything of his help.

  1. Except, obviously, in this case. Not never. Just not usually.