Stop cargo culting

Software is a formal discipline, and yet the industry is saturated with cargo cult practices. Most companies are not Google, Amazon, Twitter, Facebook, or Netflix, and they never will be, but a lot of programmers uncritically internalize and advocate for practices developed at those companies. Let’s critically analyze some of the thought leadership and technology that has recently come out of those companies.

Netflix – Chaos Engineering

Netflix is famous for seemingly bucking the trend when it comes to engineering practices and theories. If you search you will find a bunch of slide decks outlining their work culture and how they famously don’t put up with “low” performing team members. I don’t know how they decide who is a “low” performer and who is a “high” performer. I suspect there are a lot of long work hours involved if you want to be considered a “high” performer. This might be something that appeals to you or you might find out that after a few months you are burned out even though they’re technically paying you $500k+ for all your effort.

What’s the point of all this? The point is that no one else is paying their engineers ridiculous sums of money to manage thousands of servers. Chances are you are not as highly paid as a Netflix engineer. At best you’re managing hundreds of servers, and whatever you were using for tens of servers is going to be sufficient for managing hundreds. You’re not gonna need to inject any “chaos” into your system for things to fail. Things will fail on their own, and you will not be able to keep up because neither you nor the people you work with are paid $500k+.

You and the company you work at also don’t have the same economic incentives as Netflix. Every minute Netflix is down is probably a million dollars of revenue lost. I’m exaggerating, of course, but they pay their engineers so much because they can afford to pay their engineers that much. Every minute of uptime is real money that goes into Netflix coffers and they’re willing to give some percentage of that to “high” performing engineers to make sure Netflix continues to make as much money as possible. Chaos engineering is how they keep themselves honest when building systems that print millions of dollars every hour.

Can people benefit from the same kind of thinking (not the ridiculous work hours and culture but the engineering)? Certainly. It doesn’t hurt to engineer resilient systems. It also doesn’t hurt to use programming languages with static types, so if you want to build more resilient systems, consider investing time in learning a few of them. That time is going to be much more useful than the time invested in learning “chaos engineering”.

Google – Big Data

I can’t imagine the scale Google operates at. Their scale is so large that they can consistently throw away good products and still print money. They famously killed Google Reader, their RSS reader, and more recently Inbox. Both times there was a lot of outrage but Google DGAF. They print money. I repeat, they print money with ads. Operating at such a large scale is so ingrained for Google that everything they do is “BIG”, so it’s no wonder that they champion engineering approaches involving “BIG data”.

It’s a safe bet to assume that you’re not operating at Google scale and you’re not collecting Google scale data. I’m almost 100% sure you don’t have “BIG data”. At best you have regular data that fits on a single large VM with a few hundred GB of RAM. That’s it. That’s your scale. If you go looking for engineering practices structured around having so much data that it doesn’t fit on a single server then you’re gonna have a bad time mmm’kay.
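A quick back-of-the-envelope check makes the point concrete. The sketch below is only an estimate under stated assumptions: the function name, the row counts, the average row size, and the 2x overhead factor for indexes and in-memory representation are all hypothetical and should be replaced with your own measurements.

```python
def fits_in_memory(row_count: int, avg_row_bytes: int, ram_gb: int,
                   overhead: float = 2.0) -> bool:
    """Estimate whether a dataset fits in RAM on a single machine.

    overhead is an assumed multiplier for indexes and in-memory
    representation; it is a guess, not a measured value.
    """
    estimated_bytes = row_count * avg_row_bytes * overhead
    available_bytes = ram_gb * 1024 ** 3
    return estimated_bytes <= available_bytes


# Hypothetical workload: one billion rows at roughly 100 bytes each,
# checked against a VM with 256 GB of RAM.
print(fits_in_memory(1_000_000_000, 100, ram_gb=256))
```

If a calculation like this comes back true for your data, the distributed tooling is solving a problem you don’t have.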

You’re gonna try to deploy monstrosities like Hadoop/HDFS, Spark, Flume, Pig, Kafka, Samza, etc. so that you can manage your “BIG data”, all the while forgetting that you only have a single server’s worth of data. What you really needed was a bunch of ETL processes, but what you got instead was a distributed system and all the attendant headaches and costs. Instead of printing money you are now burning it because you adopted practices that were designed for unimaginable scale.
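For a sense of how small “a bunch of ETL processes” can be, here is a minimal sketch using only the Python standard library. The CSV layout, the `payments` table, and the function names are all made up for illustration; a real pipeline would read from files or a database and need error handling.

```python
import csv
import io
import sqlite3

# Hypothetical raw export; in practice this would be a file on disk.
RAW_CSV = """user_id,amount,currency
1,10.50,usd
2,,usd
1,4.25,USD
3,7.00,eur
"""


def extract(source):
    """Yield raw rows as dicts from a CSV text stream."""
    yield from csv.DictReader(source)


def transform(rows):
    """Drop rows with a missing amount; normalize types and casing."""
    for row in rows:
        if not row["amount"]:
            continue
        yield int(row["user_id"]), float(row["amount"]), row["currency"].upper()


def load(conn, records):
    """Insert cleaned records into a local SQLite table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS payments "
        "(user_id INTEGER, amount REAL, currency TEXT)"
    )
    conn.executemany("INSERT INTO payments VALUES (?, ?, ?)", records)
    conn.commit()


def run_etl(conn, source):
    """Run the three stages end to end on one machine."""
    load(conn, transform(extract(source)))


conn = sqlite3.connect(":memory:")
run_etl(conn, io.StringIO(RAW_CSV))
totals = conn.execute(
    "SELECT currency, SUM(amount) FROM payments "
    "GROUP BY currency ORDER BY currency"
).fetchall()
print(totals)
```

The stages are plain generators, so rows stream through one at a time. Everything runs in a single process with nothing to deploy, coordinate, or keep in sync.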

Facebook – Moving Fast and Breaking Things

I hope I don’t have to convince you that this one is ridiculous. When your site consists of people posting comments and pictures, you can afford to lose a few comments and pictures because people will just repost them. Facebook is not a safety-critical system, and in the early days they prioritized growth at the expense of everything else.

The practice of shipping any code that is barely functional is a viable business strategy only if your system is basically inconsequential. Some software is that way, so it’s fine to pile on technical debt to accomplish some business goal, but at some point that stops working. Now that Facebook is an actual company, that technical debt is starting to catch up. All the recent security breaches are a good example of what happens when you prioritize growth at the expense of everything else. Badly engineered systems that no one understands are easier to hack and game.

So just don’t do what Facebook did. Don’t prioritize growth at the expense of everything else. I’m pretty sure making software the way Facebook made software is borderline unethical.

Amazon, Twitter, etc. – Microservice All the Things!!!

This one is also pretty ridiculous. Anyone who has ever managed a large enough software system with a sprawling mess of dependencies has in theory gotten a taste of what it means to operate with microservices. There is an irreducible amount of complexity in any software system. The complexity can be moved around but it can’t be hidden. It’s like a physical conservation law. You can’t destroy or create energy; all you can do is transfer it from one form to another.

The trade-off you’re making with microservices is more dependencies for potentially more velocity in changing those dependencies. In practice what actually happens is that each microservice starts to depend on internal implementation details of the other microservices and you end up with a distributed monolith. I have yet to see a successful microservice deployment. If you know of one then let me know in the comments.

Conclusion

Don’t cargo cult. Don’t run after the hype train. There is no substitute for using one’s head (even though I keep hearing the AI singularity is on the horizon). Do listen to that inner voice that says “This is unnecessarily complicated”. Do try to simplify as much as possible, because there is no substitute for a simple, well-designed software system that results from clear thinking.