
How Machine Learning Enhances Performance Engineering and Testing

As enterprise software platforms expand in complexity and importance, performance anomalies have become a serious threat that can result in millions of dollars in losses. Faced with this challenge, performance engineering experts have begun utilizing machine learning algorithms to predict performance issues, remedy them, and even avoid them altogether.

Machine learning solutions can analyze and interpret thousands of statistics per second, providing real-time (or near real-time) insight into a system's performance. They can be used to recognize data patterns, build statistical models, and make predictions that are invaluable to the process of performance monitoring and testing.
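To make that concrete, here is a minimal sketch of the kind of model such a tool might build behind the scenes. The algorithm (scikit-learn's IsolationForest) and every metric and number in it are our own choices for illustration, not something any particular product prescribes:

```python
# Learn the "normal" shape of a few performance metrics, then score new samples.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)

# Hypothetical baseline window: response time (ms), CPU (%), throughput (req/s).
baseline = np.column_stack([
    rng.normal(120, 15, 5_000),   # response time
    rng.normal(55, 8, 5_000),     # CPU utilization
    rng.normal(900, 60, 5_000),   # throughput
])

model = IsolationForest(contamination=0.01, random_state=0).fit(baseline)

# New observations from monitoring; the last one is deliberately degraded.
incoming = np.array([
    [125, 57, 890],
    [118, 52, 910],
    [480, 96, 310],   # slow responses, saturated CPU, collapsed throughput
])
labels = model.predict(incoming)             # 1 = looks normal, -1 = anomalous
scores = model.decision_function(incoming)   # lower = more anomalous
for row, label, score in zip(incoming, labels, scores):
    print(row, "anomaly" if label == -1 else "normal", round(float(score), 3))
```

In practice such a model would be retrained as workloads evolve, but the principle is the same: learn what normal looks like across many metrics at once, then score incoming samples against it in near real time.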

With these abilities, machine learning tools can resolve performance issues faster and more accurately than teams working manually, significantly improving efficiency. They also help teams understand a platform's behavior quickly while mitigating the risks associated with poor performance, such as reputational damage, customer attrition, and financial losses.

Here we take a closer look at the necessary considerations for organizations looking to harness the power of machine learning algorithms to improve performance anomaly detection and, as a result, overall performance testing. 

Recognizing Performance Anomalies

During testing, there are numerous signs that an application is experiencing a performance anomaly, such as delayed response times, increased latency, systems that hang, freeze, or crash, and decreased throughput.

The root cause of these issues can be traced to any number of sources, including operator errors, hardware/software failures, over- or under-provisioning of resources, or unexpected interactions between system components in different locations.

There are three types of performance anomalies that performance testing experts look out for (a short code sketch after the list illustrates each one).

  • Point anomalies: A single instance of data that's vastly different from the rest of the data in the dataset or database.
  • Contextual anomalies: Here the anomaly is specific to a given context. These are common in time-series data, like in the case of performance peaks due to traffic increase.
  • Collective anomalies: In this case, there is a set of data instances that together indicate abnormal behavior.
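These differences are easiest to see on data. The sketch promised above fabricates two weeks of hourly latency, injects one anomaly of each type, and flags them with simple statistics; every threshold and number is an arbitrary choice for the example:

```python
# Two weeks of synthetic hourly latency (ms) with a normal daily traffic cycle.
import numpy as np

rng = np.random.default_rng(7)
hours = np.arange(24 * 14)
cycle = 100 + 40 * np.sin(2 * np.pi * (hours % 24) / 24)
latency = cycle + rng.normal(0, 5, hours.size)

latency[100] = 400       # point anomaly: one value far outside the whole dataset
latency[42] = 150        # contextual anomaly: fine at peak hours, not at this quiet hour
latency[200:206] += 20   # collective anomaly: a modest but sustained shift

# Point anomaly: extreme relative to the entire series.
z = (latency - latency.mean()) / latency.std()
print("point:", np.where(np.abs(z) > 5)[0])

# Contextual anomaly: extreme relative to the baseline for that hour of day
# (the point anomaly naturally shows up here as well).
hod = hours % 24
baseline = np.array([latency[hod == h].mean() for h in range(24)])
residual = latency - baseline[hod]
noise = 1.4826 * np.median(np.abs(residual - np.median(residual)))   # robust sigma
print("contextual:", np.where(np.abs(residual) > 6 * noise)[0])

# Collective anomaly: individually modest values that drift together. Mask the
# single-point outliers, then look for a sustained shift in a rolling mean.
clean = np.where(np.abs(residual) > 6 * noise, np.nan, residual)
window = 6
roll = np.array([np.nanmean(clean[i:i + window]) for i in range(clean.size - window + 1)])
print("collective (window start):", np.where(roll > 3 * noise / np.sqrt(window))[0])
```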

In all cases, anomaly detection is similar to what's known as noise removal or novelty detection during the performance testing process. The difference is that while anomaly detection looks to flag potential threats, novelty detection works to identify patterns that were not observed in training data, and noise removal works to separate unwanted observations from the desired data for analysis.
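The distinction between anomaly (outlier) detection and novelty detection can be made concrete with scikit-learn, which exposes the same estimator in both modes. This is purely our illustration, not a tool the article endorses:

```python
# LocalOutlierFactor in its two modes: outlier detection (fit_predict on the data
# you want to inspect) versus novelty detection (fit on clean training data, then
# predict on unseen observations). All data here is invented for illustration.
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)
train = rng.normal([120, 55], [10, 5], size=(1_000, 2))   # "clean" response time / CPU pairs

# Outlier (anomaly) detection: flag unusual points inside one dataset.
mixed = np.vstack([train[:200], [[400, 95], [10, 99]]])
outlier_labels = LocalOutlierFactor(n_neighbors=20).fit_predict(mixed)   # -1 = outlier
print("outliers found:", int((outlier_labels == -1).sum()))

# Novelty detection: learn "normal" first, then judge observations never seen in training.
novelty_model = LocalOutlierFactor(n_neighbors=20, novelty=True).fit(train)
print(novelty_model.predict([[125, 57], [390, 97]]))   # typically [ 1 -1 ]
```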

[RELEVANT READING: Using Machine Learning to Detect Anomalies in Performance Engineering]

Performance Anomaly Detection Without Machine Learning

One of the most basic methods of anomaly detection in performance testing is to identify and flag data points that stray from the common model through simple statistical techniques.

  • Reactive approach

Here a team may set a threshold for specific performance metrics, like CPU utilization, disk I/O, memory consumption, or network traffic, and raise alarms when that threshold is violated (a simple version of this is sketched below). The challenge with this approach is that larger data systems can have variable workloads, so static thresholds tend to trigger false alarms and do little to help the team understand the effect of application changes or updates on performance.
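In this hypothetical sketch of threshold-based alerting, the metric names and limits are invented, and a real system would feed the check from its monitoring agent rather than a hard-coded sample:

```python
# Reactive, static-threshold alerting: raise an alert whenever a metric crosses
# a fixed limit. Thresholds would normally come from a configuration file.
from typing import Dict, List

THRESHOLDS = {
    "cpu_util_pct": 85.0,
    "disk_io_ops": 10_000,
    "memory_used_pct": 90.0,
    "network_mbps": 800.0,
}

def check_sample(sample: Dict[str, float]) -> List[str]:
    """Return an alert for every metric that violates its static threshold."""
    return [
        f"ALERT: {name}={value} exceeds {THRESHOLDS[name]}"
        for name, value in sample.items()
        if name in THRESHOLDS and value > THRESHOLDS[name]
    ]

# One polled sample from a hypothetical monitoring agent.
print(check_sample({"cpu_util_pct": 92.3, "memory_used_pct": 71.0, "network_mbps": 120.0}))
```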

  • Proactive approach

In this category, teams continuously evaluate a system by comparing its current behavior against baselines or statistical models of expected performance.
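A common way to realize this idea, sketched here with arbitrary window and sensitivity settings, is to keep a rolling window of recent measurements and flag values that fall well outside that moving baseline:

```python
# Proactive baseline monitoring: compare each new measurement against a rolling
# baseline learned from recent history instead of a fixed threshold.
from collections import deque
from statistics import mean, stdev

class BaselineMonitor:
    def __init__(self, window: int = 60, sigmas: float = 3.0):
        self.history = deque(maxlen=window)   # rolling window of recent samples
        self.sigmas = sigmas

    def observe(self, value: float) -> bool:
        """Return True if value deviates from the rolling baseline, then record it."""
        anomalous = False
        if len(self.history) >= 10:           # wait until a minimal baseline exists
            mu, sd = mean(self.history), stdev(self.history)
            anomalous = abs(value - mu) > self.sigmas * max(sd, 1e-9)
        self.history.append(value)
        return anomalous

monitor = BaselineMonitor(window=60)
for t, latency_ms in enumerate([118, 121, 117, 120, 119, 122, 118, 120, 121, 119, 118, 350]):
    if monitor.observe(latency_ms):
        print(f"t={t}: latency {latency_ms} ms deviates from the rolling baseline")
```

Because the baseline moves with the workload, a deviation that would be lost in static-threshold noise stands out clearly, which addresses the main weakness of the reactive approach described above.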