In our First Line of Code series, Commit co-founder Beier Cai talks to prominent tech founders and tech leads building the next generation of companies, to hear experiences and lessons learned from their early days.
Charity Majors is the CTO of Honeycomb.io, a San Francisco-based software company. Honeycomb provides full-stack observability designed for high-cardinality data and collaborative problem solving, enabling engineers to deeply understand and debug production software together.
Commit co-founder Beier Cai sat down with Charity to talk about the early days of Honeycomb and what it was like writing the first lines of code.
Beier: How did you get into software? And how did you get into DevOps in general?
Charity: I was a music major in school and I had a crush on a boy in the computer labs, so I started hanging out there. I’m a very oppositional personality: as soon as I saw these computer labs that were full of dudes and no women I was like, ‘I belong here!’ I got my first gig at the university and it grew from there.
I really enjoy how computers are, at least for this generation, both art and science. They’re beyond our ability to grasp using very simple mental models—you have to use them to get a sense of their flavour and the feel. I had the opportunity to do that by building bigger and bigger systems.
I was also fortunate in the teams that I got to work with, because this is such an apprenticeship industry. You become a great engineer by getting to work with other great engineers, seeing how well they do things and carrying those lessons forward.
Beier: What was the first piece of technology you started toying around with?
Charity: It was FreeBSD and then Linux. I love Unix to this day. After that it was databases like MongoDB and Redis. Before Honeycomb I was mostly known for my work helping MongoDB grow up.
Databases are super fun to me. Part of the fun, ironically, is that they’ve been lagging when it comes to tools for the layperson to understand them. They’re one of the last priesthoods of software. Watching how completely that priesthood falls apart at scale motivated me to do Honeycomb.
Beier: Every company has one or two heroes who know everything. When production goes down, they’re the ones who instinctively know what went wrong.
Charity: And they’re always the ones who’ve been there the longest. That’s how you know that your system is running more on intuition than on your ability to understand it. The people who have been there the longest and who built most of it are the ones who are able to instinctively point at the place where it’s broken.
Funnily enough, I’ve now worked on three teams where that hasn’t been the case: where the best debuggers are not the people who’ve been there the longest, but the people who have the most curiosity and the most interest. The difference, I think, is in the kind of tooling that you have. I think it’s the last generation of tooling where you have dashboards that force you to read between the lines. You have to know which questions to ask, which graphs to look at, and how to interpret them.
For Ops, one of our chief functionalities of the last decade or two has been interpreting computers to software engineers: being able to look at all the low-level activities, draw a story out of that and translate it into what could be happening with the code you just wrote.
For the new generation of tools, we’re trying to make them speak to software engineers in the language that they use every day. We’re trying to bring it up the stack by a level or two so you can understand what happens at the moment you write your own code, without needing a translator or a priest to interpret it for you.
We’ve heard for years about how Ops and DevOps need to learn to write code, or write code better. But I think in the last few years we’ve seen a big swing to where now it’s like, ‘OK software engineers, it’s your turn. You need to learn to write observable code and build systems set at scale.’
Beier: Honeycomb has been around for a few years and it’s been very successful. As a technical founder, how did you choose between new technology and tried and tested stuff?
Charity: That’s a great question, because from day one we faced the core technical decision that would define our fate, which was whether or not to write our own storage engine. Our own database, basically.
I spent my entire career yelling at people to not write a database. Every software engineer dreams of writing a database when they grow up. I’d tell them: ‘Just don’t do it.’ But when my co-founder Christine and I looked around, we realized nothing existed that we could actually use.
There was Druid, which got us two-thirds of the way. But it was written in Java, it didn’t have any provisioning for flexible schemas, and its usage of data types was meh. We needed something purpose-built for observability.
So we decided to write our own database—and it almost killed us. We knew we had to write our own or we would have looked and felt like every other product out there. And we knew we wanted to do something that was dramatically different.
But at the same time, when you’ve taken venture money, VCs expect you to be out there proving your product from day one and getting it in front of customers. We weren’t able to start doing that until a year and a half in. So our investors were very cranky about this.
They were like, why don’t you use something off the shelf, stuff it in MySQL, and when you’ve found product-market fit go back and optimize for it. But I was sure that if we did that we’d end up looking and feeling like so many other products out there.
So I managed to antagonize and alienate our first couple generations of investors. And I think I was right, but they were also right in that we had a ticking time bomb on our hands, financially. We were really fortunate to find investors that had the patience for us. It easily could have gone the other way.
There’s a real big barrier to opportunity in the infrastructure space when it comes to anything that’s genuinely disruptive, because infrastructure software takes a long time to build and to mature. Nobody wants to trust their infrastructure with anything cutting edge or Series A or otherwise very new. In general, I’m always keen to buy it or use open source if I can.
I like the way Dan McKinley put it in one of his classic blog posts. He says that if you’re a startup, you have two or three ‘innovation tokens,’ so spend them wisely. You can’t innovate on everything, or you’ll spend all your time on bullshit, constantly ironing out tiny little details.
You have to use the tried and true as much as possible so the things that you build yourself can become real core differentiators. They have to be things that will make it or break it for your product. We chose to make our storage engine our one big innovation token, and we tried to go very ‘middle of the stream’ for everything else.
Beier: I’ve done a few interviews before and most founders have said, ‘Just use existing open source products.’ But your database has become a core competency for you.
Charity: It emanates into everything. We’re five-and-a-half years in, and it’s cool to see how the choice that we made there continues to be a differentiator for us—in a good way.
We see incumbents trying to refactor their backend so they can do what we do, and it’s a drag on them. Because they already do other things so well. Plus, if you’re trying to optimize your system to do a different set of things well, you risk alienating your old customers, and it’s very difficult when you have all this data.
So our architecture keeps working for us as we’re trying to do all of these new and shiny things. But I don’t want to undersell what a risk it was and how dangerous it was.
Beier: I want to dig into that a bit more, because this doesn’t happen very often. I know there are many technical founders who want to build a certain piece of technology as their main differentiator. What are the key lessons you’ve learned about building your own technology? You must have felt pressure from your customers and your investors to do otherwise. How did you handle that?
Charity: Badly. I had planned on being CTO from the beginning, but I ended up having to be CEO for three-and-a-half years, and it almost killed me. I don’t have a good poker face. I’m not a very disciplined or organized person, and I think that you really want the CEO to be that kind of person.
I kept us alive long enough to get to the point where being disciplined would be an asset, so it was very chaotic in those early years. At first we thought we were going to be doing an enterprise thing that was just B2B. Then we thought we were going to be a better Elastic.
The hardest thing about my job, in retrospect, was figuring out how to talk about what we were doing. Every term in data is so loaded. We knew early on that what we were doing was dramatically different from monitoring.
Towards the end of the first year, I happened to Google the term ‘observability,’ which describes how much you can understand the inner workings of a system by looking at it from the outside. It comes from mechanical engineering and control theory, and I was like, ‘oh my god, that’s exactly what we’re trying to do.’ But then I had to educate the entire world about what that meant. Then, of course, after we used it for a year or two, everybody jumped on the bandwagon.
It seems so easy now, but it felt like birthing a child at the time. It was the hardest thing I’ve ever had to do. I have new respect now for product marketing. Writing code was hard, but translating it for humans was hella difficult.
Beier: You’ve been around for more than five years. How do you prioritize and manage technical debt as a whole, for the product and the team?
Charity: I’m not the best person to answer this these days; Emily Nakashima, my VP of engineering, is the one who really makes these decisions on the ground. But I will say that the most important limiting factor is your horizon. Are you funded? Are you able to plan for the next three months? Six months? Two years?
This number bounces around over a company’s lifetime. There have been moments when Emily and I felt like we couldn’t plan farther out than two weeks. So you’re obviously going to rack up a lot of tech debt there.
When you’ve just gotten funded, you often have tension with your investors. They’re like, ‘Let’s see some progress.’ While we’re thinking, ‘We just got money, now we can pay down our technical debt.’
It’s more art than science. That’s why you really want to have deeply technical managers, because they have a gut feeling about when you’re starting to accrue too much technical debt. It’s not something you can teach someone who hasn’t been through the digestive system of a company before.
Beier: Now that your team is bigger, what’s the day-to-day like for the engineering team to deal with the technical debt?
Charity: We plan in what we call eighths, which is a little awkward, but it’s eighths of a year—six weeks at a time.
We know exactly what we’re doing for the next six weeks. And we have a pretty good idea of what we’re doing for the six weeks after that. Then we’ve got stuff on deck and in design land. So we try to plan work that can produce results between now and then.
The way that we plan for technical debt is we try to spread it around. We try to make sure that people always feel like they’re making progress on something. Because engineers want to pay down tech debt.
That’s the thing I think most management doesn’t understand, how badly engineers want to pay down that tech debt. You usually have to hold them back from paying down tech debt to crank out features.
I believe in empowering engineers, not creating a culture where someone just makes and assigns tickets and the engineers are code monkeys. Especially when we’re a product for developers. Our engineers are our best consumers, and they’re creators of our own products.
Beier: Right. So it’s less of a top-down situation and more a bottom-up one where the team can surface technical issues.
Charity: For low-level technical decisions, the people who are best equipped to make them are the people who are on the ground. But there’s a role for leadership, too. We tell people where we need to be a year from now, or six months from now. But how do we get there? I have no business telling people what code to write today. That’s not my job.