Complex systems are complex (and fragile)

About every two months, a colleague and I travel to various cities in the US (and sometimes abroad) to teach Microsoft customers how to license their software effectively over a rather intense two-day course.

Almost none of these attendees want to game the system. Instead, most come (often repeatedly, sometimes with more people each time) to simply understand the ever-changing rules, how to apply them correctly, and how to (as I often hear it said) “do the right thing”.

Doing the right thing, whether we’re talking about licensing, security, compliance, or anything else, often isn’t cheap. It takes planning, auditing, understanding the entire system, understanding the application lifecycle, and hiring competent developers and testers to help build and verify everything.

In the case of software licensing, we’ve generally found that there is no single person who knows the breadth of a typical organization’s infrastructure. How could there possibly be? But the problem is that if you want to license effectively (or build systems that are secure, compliant, or reliable), an individual or group of individuals must understand the entire integrated application stack – or face the reality that there will be holes. And what about the technology itself, when issues like Heartbleed come along and expose fundamental flaws across the Internet?

The reality is that complex systems are complex. But it is because of this complexity that these systems must be planned, documented, and clearly understood at some level, or we’re kidding ourselves that we can secure, protect, defend (and properly pay for) these systems, and have them be available with any kind of reliability.

Two friends on Twitter had a dialog the other day about responsibility/culpability when open source components are included in an application/system. One commented, “I never understand why doing it right & not getting sued for doing it wrong aren’t a strong argument.”

I get what she means. But having been at a small ISV that wound up suing a much larger retail company for pirating our software, I’ve unfortunately seen that “doing the right thing” in business sometimes comes down to “doing the cheap, quick, or lazy thing”. In our case, an underling at the retail company had told us they were pirating our software, and he wanted to rectify it. He wanted to do the right thing. Negotiations occurred to try to come to closure about the piracy, but when it came down to paying the bill for the software that had been (and was still being) used, a higher-up vetoed the payment due to us. Why? Simple risk management. Cheaper was believed to be better than the right thing. This tiny Texas software company couldn’t ever challenge them in court and win (for posterity: we could, and we did).

Unfortunately, we hear stories of this sort all the time. It’s a game of chicken, and it plays out in software constantly.

I wish I could say that I’m shocked when I hear of companies taking shortcuts – using open-source (or commercial) software outside the bounds of how it is licensed, deploying complex systems without understanding their security threat model, or continuing to run software after it has left support. But no. Not much really surprises me anymore.

What does concern me, though, is that the world assumed that OpenSSL was secure, and that it had been reviewed and audited by enough skilled eyes to avoid elementary bugs like the one that created Heartbleed. But no, that’s not the case. As with any complex system, at a certain point countless people around the world simply assumed that OpenSSL worked, accepted it, and deployed it; yet here it failed at a fundamental level for two years.

In a recent interview, the developer responsible for the flaw behind Heartbleed discussed the issue, stating, “But in this case, it was a simple programming error in a new feature, which unfortunately occurred in a security relevant area.”

I can’t tell you how troubling I find that statement. Long ago, Microsoft went through a sea change in how software was developed. Key components of this change included:

  1. Developing threat models in order to be certain we understood the types and angles of approach for any threat vectors we could find
  2. Deeper security foundations across the OS and applications
  3. Finally, a much more comprehensive approach to testing (in large part to try to ensure that “simple programming errors in new features” wouldn’t blow the entire system apart).

No, even Microsoft’s system is not perfect, and flaws still happen, even with new operating systems. But as I noted, I find it remarkably troubling that a flaw as significant as Heartbleed could make it through development, peer review, and whatever bounds-checking testing is done in the OpenSSL development process, and into release (where it will generally be accepted as “known good” by the community at large – warranted or not) for two years. It’s also concerning that the statement frames the Heartbleed flaw as having “unfortunately occurred in a security relevant area”. As I said on Twitter – this is OpenSSL. The entire thing should be considered a security-relevant area.
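
To make the class of error concrete: at its core, Heartbleed came down to trusting a length field supplied by the peer without checking it against the data actually received. The sketch below is hypothetical C, not the actual OpenSSL code; the structure and function names are invented purely to illustrate the missing bounds check and what the fix looks like.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical message: the peer supplies a payload plus a claimed
 * payload length. Nothing guarantees the two agree. */
struct heartbeat_msg {
    unsigned char *payload;      /* bytes actually received */
    size_t         received_len; /* how many bytes we really have */
    size_t         claimed_len;  /* length field taken from the wire */
};

/* Buggy pattern: echo back claimed_len bytes, trusting the peer.
 * If claimed_len > received_len, memcpy reads past the end of the
 * buffer and leaks whatever happens to sit next to it in memory. */
unsigned char *echo_buggy(const struct heartbeat_msg *m)
{
    unsigned char *out = malloc(m->claimed_len);
    if (out == NULL)
        return NULL;
    memcpy(out, m->payload, m->claimed_len);   /* no bounds check */
    return out;
}

/* Fixed pattern: refuse any message whose claimed length does not
 * fit within the bytes that were actually received. */
unsigned char *echo_checked(const struct heartbeat_msg *m)
{
    if (m->claimed_len > m->received_len)      /* the missing check */
        return NULL;
    unsigned char *out = malloc(m->claimed_len);
    if (out == NULL)
        return NULL;
    memcpy(out, m->payload, m->claimed_len);
    return out;
}
```

One missing comparison is the entire difference between the two functions – which is exactly why “a simple programming error in a new feature” is such cold comfort in code like this.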

The biggest problem here is that there should be ongoing threat modeling and bounds checking among users of OpenSSL (or any software, open or commercial), and in this case within the OpenSSL development community itself, to ensure that the software is actually secure. But as with any complex system, there’s a uniform expectation that a project like this produces code that can be generally regarded as safe. Most companies simply assume that a project as mature and ubiquitous as OpenSSL must be safe, do little to no verification of the software themselves, deploy it, and later hear from others about vulnerabilities in it.
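
Verification doesn’t have to mean a full audit, either. Even a small hostile-input test, the kind a downstream consumer could write without ever touching the library’s internals, catches the pattern above. A minimal sketch, meant to be compiled in the same file as the hypothetical functions from the previous example (again, invented names, not a real OpenSSL test):

```c
#include <assert.h>
#include <stdio.h>

/* Minimal acceptance check: hand the handler a message whose claimed
 * length exceeds what was actually received, and insist it refuses. */
int main(void)
{
    unsigned char payload[4] = { 'p', 'i', 'n', 'g' };
    struct heartbeat_msg m = {
        .payload      = payload,
        .received_len = sizeof(payload),
        .claimed_len  = 64 * 1024            /* lie about the length */
    };

    /* echo_buggy() would happily hand back ~64 KB of adjacent memory
     * here; echo_checked() must return NULL instead. */
    assert(echo_checked(&m) == NULL);

    puts("over-long length claim correctly rejected");
    return 0;
}
```

It’s a trivial check; the trouble, as described above, is that almost no one downstream writes even this much before deploying.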

In the complex software stacks of today, most businesses aren’t qualified to, simply aren’t willing to, or aren’t aware of the need to perform acceptance checking on the third-party software they use in their own systems (and likely don’t have developers on staff who are qualified to review software such as OpenSSL). As a result, a complex and fragile system becomes even more complex. And even more fragile. Even more dangerous: without any level of internal testing, these systems of internal and external components are assumed to be reliable, safe, and secure – until time (and usually a highly skilled developer being compensated for finding vulnerabilities) shows that not to be the case, and then we find ourselves in wild-goose-chase mode, as we are right now.
