Nov 25, 2013

Stop Doing Too Much Automation

While researching my article on testing careers for the Testing Planet, a thought struck me about the number of respondents who indicated that 'test' automation was one of their main learning goals. This made me think a little about how our craft appears to be going down a path where automation is seen as the magic bullet that can resolve all the issues we have in testing.
The idea for this article has been floating around in my head for a while now, and the final push came when I saw the article by Alan Page (Tooth of the Angry Weasel), Last Word on the A Word, in which he said much of what I was thinking. So how can I expand on what I feel is a great article by Alan?
The part of the article that I found the most interesting was the following:

“..In fact, one of the touted benefits of automation is repeatability – but no user executes the same tasks over and over the exact same way, so writing a bunch of automated tasks to do the same is often silly.”

This is similar to what I want to write about in this article. Time and time again I see dashboards and metrics being shared around stating that by running an automated 'test' 1 million times we have saved a tester from running it manually 1 million times; therefore, if the 'test' took 1 hour and 1 minute to run manually and 1 minute to run automated, we have saved 1 million hours of testing. This is very tempting to a business that speaks in value, and in this context value means cost. Saving 1 million hours of testing by automating is a significant cost saving, and it is exactly the kind of thing a business likes to see: a tangible measure that shows ROI (Return on Investment) for doing 'test' automation. Worryingly, this is how some companies sell their 'test' automation tools.
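
To make the temptation concrete, here is a minimal sketch of the arithmetic behind such dashboards. The run count and durations are the assumed figures from the example above, not real measurements:

    # Naive 'time saved' claim often shown on automation dashboards.
    # All figures are illustrative assumptions taken from the example above.
    runs = 1_000_000          # number of automated runs
    manual_minutes = 61       # 1 hour 1 minute to run the script manually
    automated_minutes = 1     # 1 minute to run it automated

    saved_hours = runs * (manual_minutes - automated_minutes) / 60
    print(f"Claimed saving: {saved_hours:,.0f} hours")  # Claimed saving: 1,000,000 hours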

If we step back for a minute and re-read Alan's statement: the thing that most people who say we should automate all testing talk about is the repeatability factor. Now let us really think about this. When you run a test script manually you do more than what is written down in the script. You think both critically and creatively, and you observe things far from the beaten track the script was telling you to follow. Computers see in assertions: true or false, black or white, 0 or 1. They cannot see what they have not been told to see. Even with the advances in artificial intelligence it is very difficult for automation systems to 'check' more than they have been told to check. To really test, and test well, you need a human being with the ability to think and observe. Going back to our million-times example: if we ran the same test a million times on a piece of code that has not changed, the chances of finding NEW issues or problems remain very slim. Running it manually with a different person each time, our chances of finding issues or problems increase. I am aware our costs also increase and there is a point of diminishing returns. James Lyndsay has talked about this on his blog, where he discusses the importance of diversity. The article also has a very clever visual aid to demonstrate why diversity is important, and as a side effect it helps to highlight the point of diminishing returns. This is the area the business needs to focus on rather than how many times you have run a test.

My other concern is the use of metrics in automation to indicate how many of your tests you have automated or could automate. How many of you have been asked this question? The problem I see is what people mean by "how many of your tests". What is this question based upon? Is it...
  • all the tests that you know about now?
  • all possible tests you could run?
  • all tests you plan to run?
  • all your priority one tests?
The issue is that this number will constantly change as you explore and test the system and learn more. Therefore, if you start reporting it as a metric, especially as a percentage, it soon becomes a non-valuable measure which costs more to collect and collate than any benefit it may imply. I like to use the following example as an extreme view.

Manager: Can you provide me with the percentage of all the possible tests that you could run for system X that you could automate?
Me:  Are you sure you mean all possible tests?
Manager: Yes
Me: Ok, easy it is 0%
Manager:  ?????

Most people are aware that testing can involve an infinite number of tests, even for the most simple of systems, so any number divided by infinity will be close to zero; hence the answer provided in the example scenario above. Others could argue that we only care about how much of what we have planned can be automated, or only the high-priority stuff, and that is OK, to a point, but be careful about measuring this percentage since it can and will vary up or down and this can cause confusion. As we test we find new things, and as we find new things our number of things to test increases.

My final worry with 'test' automation is the amount of 'test' automation we are doing (hence the title of this article). I have seen cases where people automate for the sake of automation because that is what they have been told to do. This links in with the previous point about measuring the tests that can be automated. There needs to be some intelligence when deciding what to automate and, more importantly, what not to automate. The problem is that when we are measured by the number of 'tests' we can automate, human nature makes us act in ways that look good against what we are being measured on. There are major problems with this: people stop thinking about what would be the best automation solution and concentrate on trying to automate as much as they can regardless of cost.

What! You did not realise that automation has a cost? One of the common problems I see when people sell 'test' automation is that they conveniently, or otherwise, forget to include the hidden costs of automation. We always see figures for the amount of testing time (and money) saved by running this set of 'tests' each time. What does not get reported, and is very rarely measured, is the time spent maintaining the automation and analysing the results from it. This is important, since it is time a tester could be spending on testing and finding new information rather than confirming existing expectations. It appears to be missing whenever I hear people talking about 'test' automation in a positive way. What I see is a race to automate all that can be automated, regardless of the cost to maintain it.
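
As a rough illustration of the point, here is a minimal sketch of the cost side that rarely makes it onto the dashboard. Every figure below is invented purely to show the shape of the calculation:

    # Hypothetical figures; the point is the shape of the calculation, not the numbers.
    maintenance_hours_per_week = 20       # keeping scripts alive as the product changes
    failed_runs_per_week = 40             # flaky or genuinely failing checks to triage
    analysis_minutes_per_failed_run = 15
    weeks = 52

    hidden_hours = weeks * (maintenance_hours_per_week
                            + failed_runs_per_week * analysis_minutes_per_failed_run / 60)
    print(f"Hidden cost not on the dashboard: {hidden_hours:,.0f} tester-hours per year")
    # -> 1,560 tester-hours per year: a large slice of a tester's time spent
    #    feeding the automation rather than finding new information.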

If you are looking at implementing test automation you seriously need to think about what the purpose of the automation is. I would suggest you do 'just enough' automation to give you confidence that the system appears to work in the way your customer expects. This level of automation then frees up your testers to do some actual testing, or to create automation tools that can aid testing. You need to stop doing too much automation and look at ways to make your 'test' automation effective and efficient without it becoming a bloated, cumbersome, hard-to-maintain monstrosity (does that describe some people's current automation system?). Also, automation is mostly code, so it should be treated like code: regularly reviewed and refactored to reduce duplication and waste.

I am not against automation at all, and in my daily job I encourage and support people to use automation to help them do excellent testing. I feel it plays a vital role as a tool to SUPPORT testing; it should NOT be sold on the premise that it can replace testing or thinking testers.

Some observant readers may wonder why I write ‘test’ in this way when mentioning ‘test’ automation.  My reasons for this can be found in the article by James Bach on testing vs. checking refined.
 
Resource: Stevo

How to measure and analyze testing efficiency?

Measurements, metrics, stats: these are common terms you hear in every management meeting. What exactly do they mean? A bunch of numbers? Hyped-up colorful graphs? Why should I bother about those? Well, right from pre-KG education to K12 to college, we are used to something called "marks". These are a set of numbers that reflect the academic capabilities of students (note: marks are not real indicators of IQ). These marks, along with other parameters, in turn help academic institutions to select candidates. In the same way, if one wants to measure the extent of testing, what are the numbers one must look at?

The fundamental rule: do not complicate the numbers. The more you complicate them, the more confusion there will be. We need to start with some basic numbers that reflect the speed of testing, the coverage of testing and the efficiency of testing. If all these indicators move up, we can be reasonably confident that testing efficiency is getting better and better.

Every number collected from a project helps to fine-tune the processes in that project and across the company itself.

Test planning rate (TPR). TPR = Total number of test cases planned / total person-hours spent on planning. This number indicates how fast the testing team thinks through, articulates and documents the tests.

Test execution rate (TER). TER = Total number of test cases executed / total person-hours spent on execution. This indicates the speed of the testers in executing those tests.

Requirements coverage (RC). The ideal goal is 100% coverage, but it is very hard to say how many test cases will cover 100% of the requirements. There is a simple range you can assume: if we test each requirement in just 2 different ways, 1 positive and 1 negative, we need 2N test cases, where N is the number of distinct requirements. On average, most commercial application requirements can be covered with about 8N test cases. So the chances of achieving 100% coverage are high if you try to test every requirement in 8 different ways, although not all requirements need an eight-way approach.

Planning Miss (PM). PM = Number of ad hoc test cases framed at the time of execution / Number of test cases planned before execution. This indicates whether the testers are able to plan the tests based on the documentation and their level of understanding. This number should be as low as possible, but it is very difficult to reach zero.

Bug Dispute Rate (BDR). BDR = Number of bugs rejected by the development team / Total number of bugs posted by the testing team. A high number here leads to unwanted arguments between the two teams.
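
These testing-team numbers can be computed straight from raw counts. The sketch below simply encodes the formulas above, including the 2N to 8N requirements-coverage range; all the sample figures are hypothetical:

    # Sketch of the testing-team metrics defined above; sample inputs are hypothetical.

    def tpr(planned_cases, planning_hours):
        """Test planning rate: cases planned per person-hour of planning."""
        return planned_cases / planning_hours

    def ter(executed_cases, execution_hours):
        """Test execution rate: cases executed per person-hour of execution."""
        return executed_cases / execution_hours

    def coverage_range(requirements):
        """Requirements coverage rule of thumb: 2N (minimal) to 8N (typical) test cases."""
        return 2 * requirements, 8 * requirements

    def planning_miss(adhoc_cases, planned_cases):
        """Planning miss: ad hoc cases framed during execution vs. cases planned before it."""
        return adhoc_cases / planned_cases

    def bug_dispute_rate(rejected_bugs, total_bugs):
        """Bug dispute rate: bugs rejected by development vs. total bugs posted."""
        return rejected_bugs / total_bugs

    print(f"TPR: {tpr(240, 40):.1f} cases/hour")                      # 240 cases planned in 40 hours
    print(f"TER: {ter(300, 60):.1f} cases/hour")                      # 300 cases executed in 60 hours
    print(f"Coverage range for 50 requirements: {coverage_range(50)}")
    print(f"PM:  {planning_miss(30, 240):.0%}")
    print(f"BDR: {bug_dispute_rate(12, 90):.0%}")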

There is a set of metrics that reflects the efficiency of the development team, based on the bugs found by the testing team. These metrics do not really reflect the efficiency of the testing team, but without the testing team they cannot be calculated. Here are a few of them.

Bug Fix Rate (BFR). BFR = Total number of hours spent on fixing bugs / total number of bugs fixed by the dev team. This indicates the speed of the developers in fixing bugs: the lower the number, the faster the fixes.

Number of re-opened bugs. This absolute number indicates how many potential bad fixes or regression effects are injected into the application by the development team. The ideal goal is zero.
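
The development-side numbers fall out of the bug tracker's raw counts in the same way. A small sketch with hypothetical figures:

    # Hypothetical bug-tracker counts for the development-side metrics above.
    fix_hours = 180        # total hours spent fixing bugs
    bugs_fixed = 60        # total bugs fixed by the dev team
    reopened_bugs = 5      # ideal goal is zero

    bfr = fix_hours / bugs_fixed   # bug fix rate: hours per fixed bug (lower means faster fixes)
    print(f"BFR: {bfr:.1f} hours/bug, re-opened bugs: {reopened_bugs}")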

Bug Bounce Chart (BBC). The BBC is not just a number but a line chart. The X axis plots the build numbers in sequence; the Y axis shows how many new plus reopened bugs are found in each build. Ideally this graph keeps dropping towards zero as quickly as possible. If instead we see a swinging pattern, like a sinusoidal wave, it indicates that new bugs are being injected build over build due to regression effects. After code freeze, product companies must keep a keen watch on this chart.
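
The chart itself is easy to produce from per-build counts of new plus reopened bugs. A minimal sketch, assuming matplotlib is available and using invented per-build figures:

    import matplotlib.pyplot as plt

    # Invented counts of new + reopened bugs per build. A healthy trend keeps
    # dropping towards zero; a sinusoidal swing suggests regressions injected
    # build over build.
    builds = list(range(1, 11))
    new_plus_reopened = [42, 35, 28, 31, 22, 25, 14, 9, 4, 1]

    plt.plot(builds, new_plus_reopened, marker="o")
    plt.xlabel("Build number")
    plt.ylabel("New + reopened bugs")
    plt.title("Bug Bounce Chart (BBC)")
    plt.savefig("bbc.png")  # or plt.show() for interactive use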
 
Resource: QAmonitor

Performance Testing - Basics

Squeeze the app before release; if the app withstands that, it is fit for release. But how do we squeeze it? How do we determine the number of users, the data volume and so on? Let us take this step by step. Performance tests are usually postponed until customers feel the pinch, mainly because of the cost of the tools and the capability needed to use them. If one wants to earn millions of dollars through a hosted app, a good, proven and simple way is to increase users and reduce price. Do this and the business volume will grow, but it brings performance issues along with it.

Most of the tools use the same concept: emulating requests from the client side of the application. This has to be done programmatically. Once one is able to generate requests, processing the responses is a relatively easy task (a minimal sketch of this idea appears after the feature lists below). When you choose a tool, it is better to look first for the must-have features and then for the nice-to-have features.

The must-have features are listed below.

  1. Select protocols (HTTP, FTP, SMTP etc.)
  2. Record the user sequence and generate script
  3. Parameterize the script to supply a variety of data
  4. Process dynamic data sent by server side (correlation)
  5. Configure user count, iterations and pacing between iterations
  6. Configure user ramp-up
  7. Process secondary requests
  8. Configure network speed, browser types
  9. Check for specific patterns in the response
  10. Execute multiple scripts in parallel
  11. Measure hits, throughput, response time for every page 
  12. Log important details and server response data for troubleshooting
  13. Provide custom coding facility to add additional logic
The nice-to-have features are listed below.
  1. Configure performance counters for the OS, web server, app server and database server. This way, you can get all results in one single tool
  2. Automatically correlate standard dynamic values based on the Java or .NET framework. This will reduce scripting time
  3. Provide a visual UI to script and build logic
  4. Generate data as needed - sequential, random and unique data
  5. Provide a flexible licensing model - permanent as well as pay-per-use would be great
  6. Integrate with profiling tools to pinpoint issues at code level
When evaluating a performance testing tool, do a simple proof of concept against the features above to see how effectively the tool handles them. Needless to say, the tool must also be simple to use.
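
As a rough illustration of the core concept, the sketch below generates concurrent, parameterized HTTP requests and measures response times. It uses Python's standard library plus the third-party requests package; the URL, test data and user counts are assumptions for the example, and real tools add recording, correlation, ramp-up control and far richer reporting:

    import time
    from concurrent.futures import ThreadPoolExecutor

    import requests  # third-party: pip install requests

    BASE_URL = "https://example.com/search"   # hypothetical endpoint
    TERMS = ["alpha", "beta", "gamma"]        # parameterized test data
    VIRTUAL_USERS = 10
    ITERATIONS = 5
    PACING_SECONDS = 1.0                      # pacing between iterations

    def virtual_user(user_id):
        """One simulated user: send parameterized requests and record timings."""
        timings = []
        for i in range(ITERATIONS):
            term = TERMS[(user_id + i) % len(TERMS)]
            start = time.perf_counter()
            response = requests.get(BASE_URL, params={"q": term}, timeout=30)
            timings.append((response.status_code, time.perf_counter() - start))
            time.sleep(PACING_SECONDS)
        return timings

    with ThreadPoolExecutor(max_workers=VIRTUAL_USERS) as pool:
        results = list(pool.map(virtual_user, range(VIRTUAL_USERS)))

    all_times = [elapsed for user in results for (_, elapsed) in user]
    print(f"requests: {len(all_times)}, "
          f"avg response: {sum(all_times) / len(all_times):.3f}s, "
          f"max: {max(all_times):.3f}s")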


Here are a few simple terms you need to be clear about, at least academically. There are many different definitions for the phrases given below, but we take the most widely accepted definitions from various project groups.

Load Testing - Test the app for an expected number of users. Usually customers know their current user base (for example, the total number of account holders in a bank). The number of online users will usually be between 5% and 10% of the customer base. But an online user may just be a logged-in user doing no transactions with the server; our interest is always in the concurrent users. Concurrent users are usually between 5% and 10% of online users. So if the total customer base is 100,000, then 10% of that, 10,000, will be online users, and 10% of that, 1,000, will be concurrent users.
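
The arithmetic in that example is just two successive percentages. A tiny sketch using the same assumed 10% ratios:

    # Rule-of-thumb funnel from the example above; the 10% ratios are assumptions.
    customer_base = 100_000
    online_users = int(customer_base * 0.10)      # 5-10% of the customer base; 10% used here
    concurrent_users = int(online_users * 0.10)   # 5-10% of online users

    print(online_users, concurrent_users)  # 10000 1000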

Stress Testing - Overload the system by x%. That x% may be 10% more than the normal load or even 300% more. Usually load tests run for a longer duration, while stress tests run for a shorter duration as spikes with abnormally high user counts. Stress is like a flash flood.

Scalability/Capacity Testing - Find the level at which the system crashes. Keep increasing the users and you will see more and more failures and, eventually, a crash. Some companies use the term stress testing to include capacity testing as well.

Volume Testing - Keep increasing the data size of requests, and process requests when the application database has hundreds of millions of records. This usually checks the robustness and speed of data retrieval and processing.

Endurance/Availability Testing - Test the system for a very long period of time. Let the users keep sending requests 24x7, maybe even for a week or a month, and see whether the system behaves consistently over that period.

Resource: softsmith