What happened to performance engineering in the cloud?

Instead of tossing money at performance issues, take a second look at how to manage and optimize cloud computing performance.


Remember when performance was everything? I used to spend days in computer labs testing performance and reporting for tech magazines about what I found. The tech that provided the best performance, meaning CPU processing, data, storage, and other components at the highest speed, typically won “Editor’s Choice.”

These days we understand that good application, platform, and database performance is table stakes for any deployed system, including those in the cloud. That said, I don’t hear the discussions about cloud system performance as much as I did just five to seven years ago. What happened?

Maybe it’s a sign that we’ve gotten so good at performance that it’s no longer an issue. I think performance issues remain, but how we fix them in the cloud is not discussed as much as it should be. The approaches and technology used to address performance issues are not as well understood, at least in my experience with cloud migration projects and net-new cloud system development.

When cloud architects are asked why they no longer do performance modeling and testing to the degree that we once did, I think most will say that public clouds have an almost unlimited amount of compute and storage resources. If performance becomes an issue, we’ll just allocate more resources until the trouble is fixed.

There are a few problems with this assumption.

First, those resources are not free. They increase the operational cost of the systems deployed in the cloud, perhaps to three to five times what other types of performance fixes, such as improved design, would cost. Tossing money at a problem is not a “technology solution,” and although I’m sure such a fix can work, if it costs you five times more, it’s not a real solution.

Second, we’re moving so fast in migrating and deploying net-new systems to the cloud that provisioning more resources becomes the fastest solution and thus the one that’s picked most often. Architects assume that the inefficiencies will be found and engineered out of the systems at some point down the road. As most of you already know, this rarely happens.

Finally, we don’t truly understand the root causes of performance challenges. Those who have been in the software engineering field as long as I have know that design fixes are often far more cost-effective than adding horsepower and faster storage I/O.

The danger is that performance engineering and optimization for cloud systems will become a lost art. I’m bringing up performance engineering and optimization more often than I did 10 years ago. I fear that we are losing track of how to test, engineer, and fix for performance, falling back instead on the knee-jerk response of tossing more cloud resources at the problem.

This is another case where a poorly designed and deployed solution “works.” However, it’s not at all optimized, and it will quietly drain money from the business while no one is really aware of what’s happening.

Hopefully we don’t go too far in that direction. I’ll keep reminding you if we do.

Copyright © 2022 IDG Communications, Inc.
