As software becomes more complex across the board, new techniques emerge that simplify and improve the software development lifecycle, particularly within the area of product performance.
Thanks to its natural compatibility with performance analysis, data science is a particularly valuable discipline for software development outsourcing companies as they aim for higher levels of excellence in their solutions.
But why is data science so well-suited to performance analysis? Here are some of the challenges we've faced and the benefits we're seeing at PSL after applying data science techniques to our performance testing.
When dealing with software performance issues, analysis comes down to reviewing the metrics and looking for answers. This is often a manual process, with experienced analysts copying and pasting metrics into a spreadsheet and working from there. This approach is too slow to deliver results quickly, and it leaves too much room for error.
Another common challenge is the sheer number of metrics to analyze. The continuous performance testing process takes a lot of time, and it's impossible to assess all of an app's functionalities at once, so the most important metrics must be singled out and prioritized. Again, this is very difficult to achieve manually by combing through endless Excel sheets with endless rows of data.
There are many routine questions to ask when looking at performance, such as where the application broke or which resource caused the problem. Even so, crunching the numbers manually is very difficult for engineers or developers who don't have a strong background in mathematics or statistics.
To make the performance analysis process beneficial, we needed our teams to first understand the source of the numbers, how to process them, and how they can be transformed into more meaningful values. Naturally, we turned to the data science experts to help develop the solution.
Data scientists can help software development outsourcing companies take a more formal approach to performance analysis, or point out useful techniques for improving the entire process.
At PSL, we worked with a team of data science experts who helped us understand how to identify consistent performance variations. One of the most valuable pieces of advice they gave us was to increase the number of performance tests we ran, which greatly improved the reliability of the numbers we had to work with. We could then pass this knowledge on to our customers and help them make more informed decisions from the very start of the project.
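To illustrate why running more tests makes the numbers more reliable, here is a minimal sketch using only the Python standard library. It is not PSL's tooling: the function name `mean_with_margin`, the sample response times, and the normal-approximation factor `z=1.96` are all assumptions made for illustration. It shows that the margin of error around an average response time shrinks as the number of runs grows.

```python
import math
import statistics


def mean_with_margin(samples, z=1.96):
    """Return (mean, margin) for a rough 95% confidence interval.

    Uses a normal approximation, which is only a rough guide for
    small sample sizes.
    """
    mean = statistics.mean(samples)
    # Standard error of the mean: sample standard deviation / sqrt(n).
    stderr = statistics.stdev(samples) / math.sqrt(len(samples))
    return mean, z * stderr


# Hypothetical response times (ms) from repeated runs of the same test.
few_runs = [212.0, 198.0, 230.0, 205.0]
# Artificial stand-in for "more runs with a similar spread":
# the same values repeated four times.
many_runs = few_runs * 4

mean_few, margin_few = mean_with_margin(few_runs)
mean_many, margin_many = mean_with_margin(many_runs)

print(f"{len(few_runs)} runs: {mean_few:.1f} ms, margin {margin_few:.1f} ms")
print(f"{len(many_runs)} runs: {mean_many:.1f} ms, margin {margin_many:.1f} ms")
```

The average is the same in both cases, but the margin around it is narrower with more runs, so a real performance variation is easier to tell apart from noise.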
Working this way, we found that their knowledge and expertise helped us to become stronger performance analysts, define our own path, build our own tools, develop our own knowledge, and solve our own problems.
The result: our amazing engineers developed innovations in performance analysis automation.
Our time working with data scientists also helped us find gaps in our team's knowledge and uncover problems with the tools we were using. With those findings, we designed training courses for our engineers, which gave them the knowledge they needed to innovate.
Once the team understood the performance analysis process, they were empowered to create automation tools to increase efficiency, boost productivity, and generate more valuable insights from the data. By automating the analysis part of the process, or at least most of it, we have been able to cover more ground and reduce the number of errors in production.
Another reason for automating much of the performance analysis process is to pinpoint the metrics that engineers should be focusing on. There is no single rule of thumb for determining the metrics for analysis, so we have to collect them all and use automated analysis to identify the most relevant. During this process, the more common metrics we look at are the number of transactions per second, response time, and error rate. We also look at resource usage metrics, such as how much CPU the app is using, or how much memory is required to deliver a specific number of transactions.
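As a rough sketch of what "collect them all and let automated analysis pick the most relevant" could look like, the example below ranks every collected metric by how strongly it moves with response time. The article does not say which technique PSL uses; absolute Pearson correlation is swapped in here as a simple stand-in. The metric names, the sample values, and the helpers `pearson` and `rank_metrics` are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant metric carries no signal
    return cov / math.sqrt(var_x * var_y)


def rank_metrics(metrics, target):
    """Sort metric names by absolute correlation with the target, strongest first."""
    scores = {name: abs(pearson(values, target)) for name, values in metrics.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical samples taken at the same moments during a load test.
response_time_ms = [110, 120, 150, 210, 320, 480]
metrics = {
    "transactions_per_second": [50, 60, 70, 80, 90, 100],
    "cpu_percent": [35, 38, 47, 62, 80, 96],
    "memory_mb": [512, 510, 515, 511, 514, 512],
    "error_rate": [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
}

ranked = rank_metrics(metrics, response_time_ms)
for name, score in ranked:
    print(f"{name}: {score:.2f}")
```

Correlation only flags metrics worth a closer look; it does not show that a metric caused the slowdown, so the top of the ranking is a starting point for the engineer rather than a diagnosis.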
The data science techniques we have applied have also helped us to reduce the dimensionality of our final data sets, resulting in much more meaningful results. We now cover as many variables as we need to, while also defining the most relevant metrics for each application. This is incredibly important because in some cases, bottlenecks are hidden in uncommon variables, like the CPU context switching or the garbage collecti