Nurture over Nature in Software

This past Sunday, I had the pleasure of talking to Ant Weiss as the inaugural guest on his new “DevOps Shorts” podcast. Below is the transcript of the episode, edited for clarity.

• • •

Ant: Hello and welcome to DevOps Shorts, the show where we invite wonderful human beings to have a lightning-fast talk about DevOps and other mythical creatures. Each episode is only 15 minutes long, and we are focused on three main questions, so it’s short and sweet. Why? Well, because if there’s one thing we know, it is that great delivery comes in small batches. Our guest for today is Tobias Kunze Briseño. Did I get this right? Tobias was the tech co-founder of the company that later became known as OpenShift. Today Tobias is the founder and CEO of a startup called Glasnostic. Glasnostic is building a novel technology, and here I’ll need my notes to put it right: it is “a novel technology for operators to provide their applications with instant resilience so developers can deploy more and the business stays up.” Sounds great!

Hello, Tobias, how are you today?

Tobias: I am doing great, thank you.

1. Why Do You Love Being in Tech?

Ant: Wonderful! So, without further ado, let’s dive straight into our three questions. We have three main questions, and our first question for today is very simple: why do you love being in tech? Why did you fall in love with information technology? I have this stupid, naive assumption that you love what you do, and you can, of course, tell me “no, that’s bullshit, I hate what I do!” Well, I don’t think that is the situation with you. So, starting the timer right now and go!

The Renaissance of our Time

Tobias: Perfect. Yeah, thanks for having me. So, it’s a really interesting question, because I think the question should actually be something like “why wouldn’t you want to be in tech,” right? I think if you look back, IT is the defining human innovation today. Oh, and it has been that for the last 20-plus years. It’s the Renaissance of our times. It’s like, 200 years ago, we had the Enlightenment, and you would be kind of weird if you didn’t want to be part of it, right? Or the Industrial Revolution: it was a great event, the event of everyone’s lifetime. I think IT is the event of our time. There are many other things that are interesting, but IT is the defining event, I think, at least of our lifetime.

Continual Crafting

But on a more personal level, what I really like about technology is the particular crafting of things and the personal challenges that come with that. It’s a personal challenge as well as an intellectual one, right? Now, different people approach crafting in different ways. There are the artsy types that tend to be somewhat navel-gazing, and when they craft something, it often gets polished until it is perfect, right? In science, on the other hand, you’re trying to adapt to something that’s extremely slow-moving: you’re looking for natural laws. But in engineering, we make things possible! We do things that otherwise wouldn’t be possible. And it’s the continually and rapidly changing environment we live in that really makes it very challenging and exciting, right? As engineers, we need to continually adapt to a rapidly changing environment. Everything we create needs to be adjusted, modified, nurtured, almost “raised” like a kid throughout its lifecycle.

For instance, my background is in art. I am a trained classical musician. I studied composition and orchestral conducting in a previous life. But from there, it was a very straight line for me to technology. First, doing digital sound synthesis, programming VAXes of all things. Then I dabbled in algorithmic composition. Then I worked for Silicon Graphics when it was a great company, and from then on, it was startup after startup, and I never looked back. What made me stick with technology was this particular, extremely creative activity of crafting things.

So, if you think back to when you were a kid, there are kids that love watercolors, kids that love building airplane models. I was the kid that would play Lego. I loved that you could build something and then discover “Oh! There’s this other thing I could build on top of that!” You never reach the end, it’s a continual process. And the process of crafting a solution is very similar in technology, right? Because, even if some of us look at change requests coming in as disruptive and annoying, and we get frustrated (“I just built this, and now you want me to throw it all out again?”), that’s the nature of things. And in IT we’re living this every single day. It’s the intellectually challenging way of having to write new code, form new hypotheses, and then test them against reality. And when we are done, reality comes back with more changes, right? It’s a continual loop. So, how do you adapt continually? How do you craft continually? That is really what it is all about.

Ant: Poking reality and seeing what happens!

Tobias: Yeah, exactly!

2. What Was Your DevOps “Aha!” Moment?

Ant: OK, let’s go straight to the next question: What was your DevOps “Aha!” moment? In the “DevOps Handbook,” each of the authors describes their DevOps “Aha!” moment. That moment when they realized that the way things work in IT really actually sucks, but there is a better way. So, I suppose you, too, had this moment someday in your career, and I’d love to hear about that.

Horizontal vs. Vertical: Thread of Execution vs. Environment

Tobias: Yeah, so for me, I think it was a very slow-moving moment. It was more like an “Aaaaaaaaaahaaaaaaaaaah!” moment. When we built Makara (which was the company that became OpenShift), we were laser-focused on how to make it easy for developers to build applications. So, support the building of applications. And that meant we baked all lifecycle management aspects into the platform. Monitoring was baked in. Incident resolution was baked in. That, of course, had natural limits, and those limits were much narrower than I expected them to be initially. And then, when we started hosting this as the first version of OpenShift, it became apparent that all these operational knobs and levers were really not that useful to developers. Because, as it happens in multi-tenant environments, you’re already facing something that has nothing to do with your code: the environment. I call this the vertical domain. As developers and as engineers, as computer science types, we think in terms of horizontal thread-of-execution problems. That’s what we’re focused on. That’s what debugging is all about. That’s what tracing is all about. That’s what observability is all about: what does my code do?

And then we discover all of a sudden: that’s the least of your problems. A bug—finding a bug, debugging it, fixing it—anybody can do this after a week of programming. Actually, you probably did that before you finished your first program! Then, of course, you’ll also have to find problems in other people’s code very quickly, which is a bit more difficult. But the real problems are those vertical problems that are not thread-of-execution problems. And there are disciplines in computer science and programming where that is an issue, of course, like thread programming and parallel programming, where unpredictable things happen outside of the thread of execution.

As developers, we like to think in terms of threads of execution, the execution of individual calls. But the interesting problems are all outside of it. And with cloud and microservices and, of course, parallel development, where you split up into many “two-pizza teams,” that is the defining problem of our time.

Nature vs. Nurture in Software

So my “Aha!” moment was seeing that there’s this nature-versus-nurture debate that nobody wants to have in software engineering. And the nature part is the code I’m writing, the genetic code that is put in place. But that is almost meaningless, right? In the grand scheme of things, it doesn’t matter. There are millions of ways to write this code. It’s just a function that I expose. But the really defining problem is how we nurture it. How do we operate it? How do I “raise” it and make it successful? Because, ultimately, nothing exists in isolation.

Successful Systems Are Not Built, They Are Run

So, I would describe my “Aha!” moment as the realization that successful systems do not become successful because they are well-engineered. They become successful because they are operated well. Engineering is needed—I’m not saying it is not an important piece—but it is a very small piece, and we tend to overthink that piece. So, I would say: the vertical domain, that’s what it’s all about, getting ahold of the environmental problems.

Ant: I love the picture you painted. What I really envisioned when you were talking, I saw code as grain, and when a grain falls into fertile soil, then something grows from that, and if the grain falls onto dry earth, then nothing good will come out of it.

Tobias: Exactly, yeah, very good picture.

3. What Is Next for DevOps and the IT Industry?

Ant: Yeah. Next question, the third question. And this is the most exciting question for me because I like to describe myself as a software delivery futurist. So let’s talk about the future. What do you think is next for DevOps and the IT industry? This is your chance to look into the future. You can talk about, you know, the day after tomorrow, or you can talk about what happens five, ten, twenty years from now. Just go wild, whatever comes to your mind, I want to hear it because you’re the one dealing with technology, so you probably have a vision of where all this is going.

Operators are the New Kingmakers

Tobias: Yeah, and obviously, I founded Glasnostic, which I wouldn’t have done if I hadn’t had very specific convictions about that space, so take this with a grain of salt. But, yeah, building on what I said earlier, the vertical domain, the environmental factors, the operational concerns that matter: I think the defining piece over the next couple of years—not ten years: two years—is that we’re going to see a cultural shift where the operations groups become the kingmakers. This is for larger organizations, of course. I’m not talking about the ten-person San Francisco startup or, you know, Tel Aviv startup. Those have one product, so one application, that’s it. They are not dealing with complexity. They may deal with scale and other technical issues that are also important in their own right. I am talking about larger organizations with multiple levels of engineering.

We’ve been living under this paradigm, “developers are the kingmakers,” for about ten years now. Stephen O’Grady of RedMonk wrote the book on that topic. I think that pendulum will swing towards the operations side. And also, within the organization, the business folks need to start talking to the operations teams first, before they talk to developers, right? Because developers talk about, like, “OK, it’s done,” but there are a thousand definitions of “done,” right? As a developer, I can’t oversee the entire complexity of delivering something to the customer all the way, the last centimeter, the last inch. I’m done when it works on my laptop.

But all those delivery, business continuity and disaster recovery concerns—all these other pieces that come after code is “done”—are increasingly the important pieces, certainly for the business. The business is not interested in your commit or whether you are done with the happy-path implementation or all the other definitions of “done” that come after that. The business only wants to know: when can I sell it? And it’s the operations groups that can answer this.

From Resilience Engineering to Resilience Operations

And, looking again at the two-dimensional space, where development is focused on the horizontal thread-of-execution dimension and operations on the vertical, environmental-factors dimension: there is now more and more interest in application resilience and resilience engineering. I think we’re going to see resilience operations. Some things you do in code, like compensation strategies. The vast majority of things that are important for resilience, however (noisy neighbors, ripple effects, compound failures, gray failures in general), all these things that bring your application down, that bring your business down, need to be dealt with at runtime, in ops.

So, my prediction is that the unrelenting growth of hyper-connected applications is causing the industry to “move up the stack” to the ops level and that we’ll see a massive need for real-time, environmental control.

Ant: Great answer. OK, so operations in the future! Thank you for watching this, thank you, Tobias, and a lot of interesting stuff to work on!

Tobias: It was a pleasure talking to you!