HashiCorp CTO Armon Dadgar

What HashiCorp CTO Armon Dadgar learned from writing the company’s first line of code

February 2, 2021 in FLOC

In our First Line of Code series, Commit co-founder Beier Cai talks to prominent tech founders and tech leads building the next generation of companies, to hear experiences and lessons learned from their early days.

Armon Dadgar is the co-founder and CTO of HashiCorp, one of the fastest growing tech companies in the cloud and DevOps spaces. Well known for its suite of products such as HashiCorp Terraform, HashiCorp Vault, HashiCorp Nomad, and HashiCorp Consul, HashiCorp is also highly respected by the developer community as a prominent open-source promoter and a leader in all-remote working practices.

Commit co-founder Beier Cai sat down with Armon to talk about the early days of HashiCorp and what it was like writing the first lines of code.

Beier Cai: Tell me how you got into software. What was your first experience with coding?

Armon Dadgar: My dad was an electrical engineer—he worked at Cadence Design Systems. Growing up, I would see him doing things on his computer and I was curious. I was a 10-year-old, like, “What are you working on? Show me what you’re doing.” It got me really interested. I begged my parents for Christmas, “Can you get me a coding book? I want to learn to program.”

I remember very fondly my parents got me the box set of the .NET kit. It came in this big box from Microsoft. It had a 600-page manual and the 12 different CDs that you needed. With Visual Basic, we built a Tic-Tac-Toe app, Tetris and things like that. I remember going through the book on my winter break.

Beier: Do you remember when you came up with the idea for a startup?

Armon: Flash forward to sophomore year of college. I did my first internship at a big company—at Amazon—and, I’ll be honest, I hated it. I think my impression was that it was bureaucratic.

I thought it would be great—I’d go and write software all day. But instead you’re doing sprint reviews and design planning, and if you want anything approved you have to go to multiple committees. There were database design committees and architecture committees, and I’m spending all my time going to committee meetings, trying to convince people to approve the design of something.

I remember coming off of that experience, coming back to school and thinking, “Wow, I really don’t want to work for a big tech company. I’d rather launch a startup.”

I formed a group with a bunch of friends. We worked on a bunch of little startup ideas while we were at university. Everything from an e-commerce website for textbooks to a point-of-sale thing, kind of like Square, to a GroupMe type app. We kicked around different ideas and I guess that’s where the startup seed got planted.

Beier: How did you manage your personal financial risk when getting started on the startup?

Armon: I was in a super lucky position in that I went to a state school. My tuition was probably $5,000 a year, so when I graduated I didn’t have any debt. I lived at home when I went to school so I didn’t incur a lot of living costs. When I graduated, I worked at a tech company for a few years, with a pretty reasonable tech salary, so I was able to save some money and didn’t really have any debt. When I founded HashiCorp I was in my early 20s. I was single—no kids, no mortgage. The only cost I had was my rent. Rent and food, and beer money. That was basically it. And I had some savings from working for a few years.

Me and Mitchell used to joke all the time when we started the company. We were like, “Great, we’ll start it in my living room. All we need is two Ikea desks, and if we fail after six months we’ll go work at Google.” So I guess I didn’t think about it too much. There wasn’t a whole lot of personal financial risk.

Beier: I think that’s a great way to think about it, especially for young people in their early 20s who don’t have a lot of debt. That’s really the best time to try something. If it doesn’t work there’s a backup option. I know a lot of recent graduates who went straight to Facebook or Google and they are thinking, “I get paid so well,” and they get used to that life and never think about startups again. Which is unfortunate. 

Armon: I think it’s valuable for people to spend a few years [at an established company]. I think my time at Amazon was very valuable—my time working before HashiCorp—because you learn a lot from it. Plus, it lets you save some money, build a little bit of a cushion.

I think it’s hard to graduate from school and go straight to a startup, because you don’t really have enough experience. You haven’t seen software done in industry. The way you write software in school is not the same as how you write software in industry. I think it’s useful to get a few years of that exposure. But then, yeah, when you’re in your early 20s, that’s the best time to take the risk.

Beier: Early on at HashiCorp, how did you choose between newer technology and tried and tested technology? 

Armon: One of the best articles I’ve read described the job of a startup founder as being a risk manager. Think of yourself as a portfolio manager. If you’re investing in a portfolio, there’s a set of assets where you’re going to want a safe choice. It’s going to have a low ROI, but I’m not taking a risk. A small percent of my portfolio I might allocate to high-risk, high-yield investments. But overall I’m trying to have a balanced risk approach.

If you apply that approach to a startup you can say, okay, the area of HashiCorp where I’m making a high-risk investment is the products we develop: Consul, Vault, Terraform. I’m making high-risk bets that I can change the way the market works. I can get a user community around a new tool, I can develop a new product.

In other areas of my business I think, where can I de-risk my portfolio? Where can I take unnecessary risk out of it? Should my back-end database be MySQL? That’s very low risk. It’s a boring technology. From my customers’ perspective, do they care if the record is MySQL or a fancy new NoSQL system? No, it doesn’t matter to them, it’s invisible detail. So I would rather allocate my risk to Vault and Consul and Terraform and de-risk my investment on the other things.

I tend to have a very conservative posture. I’ve vetoed our engineering team multiple times on different shiny technologies where I’m like, no, we’re going to use Memcache, we’re going to use MySQL and Postgres. We’re going to use boring things because we understand how they work, we know they work at massive scale, and they don’t come back to bite you.

Beier: What percent do you put towards higher risk tech versus safer technology?

Armon: The way I like to break it down is to think, what should be a core competency? We should be the experts at developing Vault. Should we be taking risks on our control plane? No. So let’s use boring technology for the control plane. All of our SRE work, I’d say we pick the boring technology where we can.

I tend to view the north star to use for all that is just asking yourself, “Does this add customer value?” And if the answer is no, take the boring thing. If I’ve got my own license manager or I buy a license manager, does it add customer value?

Beier: What are some of the major technical milestones that you guys have achieved that didn’t get media attention?

Armon: One of the things we like to do as an opportunity to push ourselves internally are these large-scale benchmarks. We did one in 2016, 2017, with Nomad, where we did what we called our million container challenge. We said, let’s pick a crazy number: could you run a million containers on top of Nomad?

Of course, as you’d expect, the first time we tried it failed miserably. We had all these product issues. Nomad would crash, it wasn’t designed to be able to handle that. But we chipped away at those issues. It maybe took 15 minutes for Nomad to get all of the containers running, or 20 minutes. By the time we finished optimizing it we got it down to four and a half minutes.

All of this work went into bug fixing, performance optimization, profiling the system, changing the data layout on disk. All of these optimizations were to get it down to the four-and-a-half-minute mark. We can talk about how I enabled a million containers, but most customers don’t have that use case. For the rest of the customers, what they felt was just this massively increased stability and performance. I think those kinds of benchmarks, while they’re invisible in some sense, work for the customer. It wasn’t a new feature but the system works that much better.

Beier: What’s your philosophy for managing technical debt?

Armon: There are a few different principles here. One is an ounce of prevention is worth a pound of cure.

Stuff I’ll put in that category is people being like, “Oh, we don’t have time to write tests—we’ve just got to get this product out.” That’s the type of technical debt you never want to incur, because it’s so much more expensive to pay that down later than it is to prevent it by just writing the unit test. Like, your code is compiling, just write the damn unit test. You never want to skip basic hygiene.

Then I think there’s a different class of technical debt. I’ll call it “architectural.” Architectural debt is another class you want to avoid. There’s this view that once we get to some point of maturity, we’ll come back and redo the architecture. Find me someone who’s ever gone back and redone an architecture successfully. It’s like, “I’ll build the house and then, you know, once I like the design, I’ll change the foundation.” It just doesn’t work. You’re going to be fighting that thing forever.

You don’t want to over-engineer it, but you certainly don’t want to under-engineer to the point where you anticipate that your architecture is going to be a pain point for your extensibility or your ability to iterate on it.

The third type of debt, which I think is the most acceptable kind to take on, is what I’ll call “scale design.” I can say, “Great, today I can design with a single MySQL instance, or should I design around a sharding scheme, or should I design around a multi-datacenter replication scheme, where I can span the U.S. and Europe and make it active/active?”

Those are three very different scale design points and each of them is 10x harder than the last one. If I’m just working on getting product-market fit, great. I should just use the single-instance MySQL and take on what we’ll call design debt, so that when I get to a really big scale I have design debt. I’m going to move to sharding. And when I get to super scale, I have design debt, I need to get to multi-datacenter replication. That’s okay because I’m going to pay that debt off when my customers demand that.

If my product failed, I didn’t waste a whole lot of money designing a multi-data centre replicated thing that I ended up never needing.
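As a rough illustration of the “scale design” debt described above, the sketch below shows one way application code could stay the same while the storage layer moves from a single instance to a sharded layout. It is not HashiCorp’s code: the `Store` interface, both store classes and `record_signup` are hypothetical names invented for this example, and an in-memory dict stands in for a MySQL instance.

```python
import zlib
from typing import Dict, List, Optional, Protocol


class Store(Protocol):
    """Hypothetical storage interface that application code depends on."""

    def put(self, key: str, value: str) -> None: ...

    def get(self, key: str) -> Optional[str]: ...


class SingleInstanceStore:
    """Stage 1: a single database instance (a dict stands in for MySQL)."""

    def __init__(self) -> None:
        self._rows: Dict[str, str] = {}

    def put(self, key: str, value: str) -> None:
        self._rows[key] = value

    def get(self, key: str) -> Optional[str]:
        return self._rows.get(key)


class ShardedStore:
    """Stage 2: the same interface, with keys spread across several instances."""

    def __init__(self, shard_count: int) -> None:
        self._shards: List[SingleInstanceStore] = [
            SingleInstanceStore() for _ in range(shard_count)
        ]

    def _shard_for(self, key: str) -> SingleInstanceStore:
        # crc32 gives the same value on every run, so a key always maps
        # to the same shard (Python's built-in hash() of a str does not).
        index = zlib.crc32(key.encode("utf-8")) % len(self._shards)
        return self._shards[index]

    def put(self, key: str, value: str) -> None:
        self._shard_for(key).put(key, value)

    def get(self, key: str) -> Optional[str]:
        return self._shard_for(key).get(key)


def record_signup(store: Store, user: str) -> None:
    """Application logic: unaware of which storage layout is behind it."""
    store.put(f"user:{user}", "active")


if __name__ == "__main__":
    for store in (SingleInstanceStore(), ShardedStore(shard_count=4)):
        record_signup(store, "alice")
        print(type(store).__name__, store.get("user:alice"))
```

Under these assumptions, starting with `SingleInstanceStore` costs little up front, and the move to `ShardedStore` is the debt that gets paid only once scale demands it. A real migration would also have to move existing data between instances, which this sketch leaves out.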

Beier: What was your hiring strategy in the very early days? What kind of people or engineers did you look for?

Armon: There were a few different strategies. One was, early on, we wanted to be remote. Even when we were three people we had three different locations. So we said we’re going to make a conscious effort to hire in a remote way and we’re not going to filter down to San Francisco or Seattle or whatever.

The second big one was actually a programming language bet. Heroku made the bet on Ruby and it became the place to work if you want to work on Ruby. I think we made a similar big bet on Go. People who wanted to program in it and be on the bleeding edge of the Go community—we want HashiCorp to be the place they’d go to.

And the third was open source. All of our core products are open source, so if you’re motivated by working in open source, working directly with the community, having your work showcased—that’s a huge appeal and a huge attractor for HashiCorp. The other advantage of open source is it ends up being a great hiring inbound funnel, where the people who collaborate and contribute are like, wow this person’s great to work with. They write great code. Then we’ll reach out and be like “Hey, would you be interested in a role at HashiCorp?” So a lot of our early hires, probably more than 50 percent, are people we found through the open-source community.

Beier: Did you look for any specific employee talents or qualities in the early days?

Armon: This is sort of a controversial one: one of my very early hiring philosophies was to only hire seniors. That’s changed now; obviously we hire junior people at this point. But what’s tough is that junior people require a whole lot of mentorship. And there’s—especially when you’re a small startup—there is a hidden cost to that. It goes back to being a risk manager. I’d rather pay more for someone who is senior, where they’re not going to take time away from me to do mentorship, because time is a precious commodity. Your number one enemy is time when you’re a startup.

Beier: How did it work out for you?

Armon: Really good. If I had to do it again I would do the exact same thing. Now we actively hire junior people, and we have college internship programs and things like that. Our engineering team is over 300 people now, so you have a lot more bandwidth to be able to absorb junior people and mentor them and train them. But in the early days, that focus and efficiency you got by just having an all senior team was really valuable.

Beier: A personal experience for me as a tech leader is that at some point I became a bottleneck. I could do the work, or I could spend time communicating, educating and convincing others to do it. But then I’m not in control anymore and I’m worried the quality might suffer. Did you have a similar experience in the early days?

Armon: That was the hardest transition. Me and Mitchell wrote code for the initial versions of all our products. We knew them inside and out. What was tough was that, in the very early days, I would literally do a code review for every change across every project. So imagine six projects as different as Terraform, Consul, Vault and Nomad, and I was trying to do code reviews across all of that. It was insanity, it was impossible.

How big were we when it tipped? I don’t know, maybe 18, 20 people, where we were like, it’s just not sustainable anymore. You have to have people that you delegate to and trust. We started grooming people to be team leads for the respective projects. I would have one-on-one meetings with the team leads and if there was a controversial code request or there was something big and breaking about the architecture, I would discuss that with them.

For the day-to-day code review and approvals, at some point I had to let go. But it was super tough. It was like, I know the system better than anyone else, I’ll think about the edge cases, I’ll be the code reviewer for it. But at some point it was just impractical. People were like, I submitted this to you a week ago and I’m still waiting for a code review. I was just buried.

Beier: How did you start to let go? Any advice you can give to help people manage the transition?

Armon: One was realizing that I can still review it without needing to be in the change path. So even though I stopped being the PR review path, I would still look over a lot of the PRs afterwards. I would see the notification. Great, it would get merged without me, but I can go back and review it and I’d leave comments if I noticed something.

And I think you have to just realize that software is malleable. It’s okay, even if the pull request got merged and it wasn’t exactly the way you want to do it, they can do another pull request and change it or fix it. It’s not set in stone.

And then I think the other piece of it—and it was hard, it took me six months to internalise—is that I can only be so effective with just two hands at a keyboard. But if I can train 10, 20 other people to think the way I do, or to apply the same design principles, then it’s like I have 20 hands at the keyboard. Making that cognitive shift from IC-land into manager-land is a really tough one.

My motivation, and the motivation I’ve always tried to transfer onto the folks internally is the customer use case. I think there’s a sense of pride and a sense of motivation that comes from that, versus “Hey, here’s this abstract feature we’re building for Vault.” That’s always been the most grounding and exciting thing. Like “Oh wow, these are the cool use cases we’re going to enable and isn’t that exciting to be a part of?”

Beier: What is an epic technical failure you’ve experienced at HashiCorp and what did you learn from it?

Armon: The biggest by far was our first big on-premise Terraform Enterprise customer, where we were making the transition from a SaaS version, where we managed Terraform Enterprise, to a self-hosted, on-premise version that they wanted. And it was a total cluster. There are so many different layers.

There were product-level challenges. A SaaS product is very different in design than something that’s designed for on-premise self-operation. There were bad decisions in terms of build versus buy. This was a classic example where we tried to build a thing when we probably should’ve just bought a thing. Later on we did buy and replicate it through the on-premise installer, but at the time we tried to build it, which was a mess and a disaster. There was a lot more complexity there than we really realized.

And there were a huge amount of people-process issues—people ended up getting fired afterwards. There was an intermediate manager, I wasn’t managing it directly anymore. So there was me, and a manager, and the team working on it—and I didn’t stay close enough to that team.

Every week we’d meet and do the readouts and every week it was, you know, green, green, green—up until the week when we were going to go do the install at the customer’s site and it went red. How did we go from green every week to red?

Once we got a little bit closer I realized the manager had not done a good job breaking out all the work items. They were making unrealistic assumptions about how much progress they were going to make. You know, a classic 80-20 rule: 80 percent of the stuff takes 20 percent of the time and all of the hard stuff was back loaded to that last 20 percent. It was a huge flaming mess. We ended up having to push out the install for the customer. It took us two days to get the thing up and running on the customer’s site, as opposed to two hours. It was a nightmare. We ended up having to fire and reorganize that team, reset the entire product direction for how it worked.

There were a bunch of lessons in it.

One is that going from on-prem to SaaS or SaaS to on-prem is not trivial. Two: it was the first time I was in a director role. It was hard. I think I placed maybe too much faith in the manager. I wasn’t close enough and I hadn’t done enough diligence, like, “Show me your timelines, show me the Gantt chart, show me all the sub-projects and how they’re tracking.” I trusted that they had broken it out and were executing against it and I didn’t click one level deeper to verify that.

And I think there was probably even a product decision in there that was a mistake: should we have pivoted our business to support this one-off customer? Those types of one-off big exceptions are really tough for a small startup. You don’t have the capacity for it.

So there were a lot of different lessons there. Was it a good business move? Was it a bad technical approach? Was it a bad management approach? You know, the answer is probably yes to all of them.