Metamorphic Testing in a Nutshell

23 Dec, 2021

Tags: software-testing

“Building the product right versus building the right product” is a popular cliché that separates verification from validation, and it is often borrowed to describe product-market fit. Software testing touches both halves of that phrase. There is an analogy in the mathematical world, where every claim is validated by a proof, and a proof is only acceptable if it handles every case. In the world of software testing, nothing stops us from considering every case, no matter how trivial. The difficulty lies in enumerating every possibility, because the number of cases can grow exponentially. Traditional testing requires crafting a few examples (input/output pairs) and checking that the output matches our expectations (the ground truth) [[1]](). In some situations, obtaining test examples is difficult: traditional testing suffers from the oracle problem, where the expected output is unknown or too expensive to obtain, or from insufficient coverage. Both limit the applicability of traditional testing in those areas, and metamorphic testing is one way to handle them.

### Verification versus Validation

There is usually an upfront cost to writing tests for an application, and because of that cost some managers are hesitant to invest in testing. There is a preponderance of evidence for the value that testing provides to an organization, although the level of testing varies across organizations. Some schools of thought treat testing as an afterthought, while others consider test-driven development the only way to go. Rather than pick either extreme, my take is that the programmer should write enough tests to give confidence that features can be added without breaking existing functionality. When done right, the benefits of a testing culture include the following:

- A reduction in the time spent troubleshooting bugs.
- A quality mentality that spreads among team members in the organization.
Anecdotal evidence suggests that software engineers who write tests while creating new features tend to accumulate less technical debt in their source code. Software testing can take the following forms:

- **Verification** is a series of checks on the source code, design documents, and architecture to ensure the quality of the software via reviews and inspection. This entails checking whether the software matches its planned requirements.
- **Validation** is checking whether the software meets the needs of the customer.

In a game-theoretic sense, we can have a set of testers and a set of programmers, which may overlap, all acting in good faith. The programmers and testers are the actors in our fictitious game, and their goals can be restated as follows:

- Programmers write code that passes the tests.
- Testers write tests that can fail and reveal problems.

One may be tempted to ask the following questions:

- Is there any value in a passing test?
- If so, what information derives from a passing test [[2]]()?

One may think that a passing test is useless. However, a passing test assures us that the exercised code path works as intended; it tells us nothing about the behaviour of any untested code path. This stresses the need for higher test coverage, and metamorphic testing comes to the rescue.

### What is Metamorphic Testing?

Metamorphic testing (MT) reduces the need for an oracle by using the relationships between two or more inputs (derived from the original input) and their expected outputs. These relationships, known as metamorphic relations (MRs), are more general than individual input/output pairs and can help increase test coverage. If any metamorphic relation is violated, then a bug has been detected. MT helps with both test case generation and result verification [[2]](), and it can provide a framework for robust testing of edge cases and failure modes. Metamorphic testing has long been used in practice in various forms; however, only recently did academics formalize it.
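As a minimal sketch of what a metamorphic relation looks like in code, consider the identity sin(x) = sin(π − x). The test never needs to know the true value of sin(x); it only compares the output for a source input with the output for a follow-up input derived from it. The names `check_sine_relation` and `buggy_sine` are hypothetical and introduced here purely for illustration, and the tolerance is an assumed value.

```python
import math
import random


def check_sine_relation(sine, trials=1000, tolerance=1e-9, seed=0):
    """Return the source inputs for which sine(x) != sine(pi - x)."""
    rng = random.Random(seed)
    violations = []
    for _ in range(trials):
        x = rng.uniform(-10.0, 10.0)   # source input
        follow_up = math.pi - x        # follow-up input derived from x
        if not math.isclose(sine(x), sine(follow_up), abs_tol=tolerance):
            violations.append(x)
    return violations


def buggy_sine(x):
    """Deliberately wrong implementation: a truncated Taylor series."""
    return x - x**3 / 6


# The library implementation satisfies the relation; the buggy one does not.
print(len(check_sine_relation(math.sin)))
print(len(check_sine_relation(buggy_sine)) > 0)
```

No expected value of the sine function appears anywhere in the check, which is the sense in which the relation stands in for the oracle.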
Metamorphic testing can help test a program when the expected output (the oracle) is unavailable; in the absence of ground truth, the metamorphic relation serves as a proxy for it. There are numerous applications of metamorphic testing in machine learning [[4]](), compiler development [[7]](), and bioinformatics [[6]](). Metamorphic testing verifies the output of a test case by applying a slight perturbation to the input that retains the same information. Typical problems that inherently lack ground truth for traditional testing include:

- Testing the effectiveness of a text-to-speech (TTS) processor.
- Deciding whether the result of a search query is correct.

The rest of this blog provides examples of metamorphic relations. A possible metamorphic relation in graph-theoretic problems is symmetry. Assume that we have an undirected graph, in which every link is bidirectional and carries the same weight in both directions. We can then be sure that

\begin{equation}
\mathrm{Dist}(G, a, z) = \mathrm{Dist}(G, z, a)
\end{equation}

where $\mathrm{Dist}(G, x, y)$ is the distance between node $x$ and node $y$ in a graph $G$. Similarly, finding the minimum of two values should be independent of their order:

\begin{equation}
\min(a, b) = \min(b, a)
\end{equation}

More examples of metamorphic testing:

+ Comparing diffs of binaries
+ Verifying a shortest-path algorithm
    - In a single-source shortest path, given the reported cost of each node, it can be checked that no edge can still be relaxed.
+ Checking the order of execution, e.g. happens-before relations in distributed systems
+ Automated log analysis

Here is an example of traditional testing.

```
def example1():
    p = 5
    assert p == 5
```

Notice that we have only examined a single point in the problem space. This is in contrast to generative testing, which is the basis for property-based testing. Metamorphic testing is related to property-based testing, as both are generative testing paradigms.
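The distance symmetry relation described above lends itself to the same generative style. The following is a sketch under stated assumptions rather than a definitive implementation: the graph representation (a dictionary of neighbour weights), the helper names `shortest_distance`, `random_undirected_graph`, and `symmetry_violations`, and the use of Dijkstra's algorithm as the program under test are all choices made here for illustration. Integer weights are used so that the two distances can be compared exactly.

```python
import heapq
import random


def shortest_distance(graph, source, target):
    """Dijkstra's algorithm over an adjacency dict {node: {neighbour: weight}}."""
    best = {source: 0}
    queue = [(0, source)]
    while queue:
        cost, node = heapq.heappop(queue)
        if node == target:
            return cost
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry
        for neighbour, weight in graph[node].items():
            candidate = cost + weight
            if candidate < best.get(neighbour, float("inf")):
                best[neighbour] = candidate
                heapq.heappush(queue, (candidate, neighbour))
    return float("inf")  # target is unreachable


def random_undirected_graph(rng, nodes=8, edges=14):
    """Build a random graph whose links carry the same weight in both directions."""
    graph = {n: {} for n in range(nodes)}
    for _ in range(edges):
        a, b = rng.sample(range(nodes), 2)
        weight = rng.randint(1, 20)  # integer weights avoid floating-point noise
        graph[a][b] = weight
        graph[b][a] = weight
    return graph


def symmetry_violations(distance, trials=200, seed=1):
    """Collect (graph, a, z) triples where Dist(G, a, z) != Dist(G, z, a)."""
    rng = random.Random(seed)
    violations = []
    for _ in range(trials):
        graph = random_undirected_graph(rng)
        a, z = rng.sample(range(len(graph)), 2)
        if distance(graph, a, z) != distance(graph, z, a):
            violations.append((graph, a, z))
    return violations


print(len(symmetry_violations(shortest_distance)))
```

Any distance function with the same signature can be passed to `symmetry_violations`, so the check is independent of how the shortest path is actually computed.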
A generative counterpart of `example1` draws the point at random and asserts a property that every generated point must satisfy:

```
import random

def example2():
    # Stand-in for the original random_point(): any point drawn from above 5.
    p = random.uniform(5.5, 100.0)
    assert p > 5
```

The benefit of generative testing is higher coverage. However, we trade specificity for generality [[1]](). The challenges of using generative testing are:

+ Finding appropriate relationships.
+ The specific-versus-general tradeoff.
    - A specific test carries more information, while a general test works across multiple use cases but carries minimal information.

### Possible Use cases

- In a neural network, if you have done some convolutions, you may check whether the translation matrices obey the properties of a typical translation matrix.
- In tabulation, if you are dealing with human ages, it is possible to check that ages fall between 0 and 115, since a limited lifespan is a universal feature of humans. In some applications, you may also want to verify whether the user has reached the acceptable age to use the application.
- If you know the call graph of processes in a browser, it is possible to predict the loading order of the page and use it to identify bugs.
- Checking time-bound violations in distributed systems.
- Investigating the logs of an application to detect bugs.
- Creating a parser for CSS files that checks for printer-safe and colour-blind-safe colour codes. This can identify usability and accessibility bugs.
- Checking UI placement (snapshot testing).
- Validating a clustering algorithm without ground truth. Deciding on the optimal number of clusters can be challenging and ill-posed. Formulate adequacy criteria based on the structure of the data. A solution may look unintuitive and still be the right idea for a specific use case. See the paper [[4]]() for more information.

### Advantages of MT

Here are some advantages of MT [[2]]():

- Simple
- Easy to implement
- Easy to automate
- Low cost

### Case studies in information retrieval

Here are examples of metamorphic relations for evaluating the usefulness of a search engine [[3, 5]]().
- Equivalence: A query should return similar results regardless of the ordering of its terms. For example, a query "houses in Vancouver" should yield results similar to "Vancouver houses".
- Equality: If results are sorted by relevance, repeating the same query should return the same result. For example, if I run the same search twice within a short time, the results should be identical.
- Subset: A more restrictive query should return no more results than the broader query that contains it. For example, a query "houses in British Columbia" should return more results than a query "houses in Vancouver", because Vancouver is a city in the province of British Columbia.
- Disjoint: Irrelevant responses should not be returned. In the query "houses in Vancouver", we should not see any houses from Saskatchewan in the search result.
- Complete: A universe is the union of all of its components. For example, a query "houses in British Columbia" will display a combination of houses in every city in the province of British Columbia.
- Difference: A query "houses in British Columbia, not Vancouver" should not return any results for houses listed in Vancouver.

##### Misconceptions around MT Relations

Several things can affect how MT is used in practice. Not every necessary property is an MR, even though violating such a property signals an error. For example, $-1 \le \sin(x) \le 1$ is a necessary property of sine, yet it is not an MR because it constrains a single input rather than relating several. The following points help to clear up common misconceptions:

- "MR targets multiple inputs and their expected output."
- "Not all MRs separate into input-only and output-only sub-relations."
- "Not all MRs are equality relations."
- "MR can work with or without Oracle."

### Applying metamorphic testing in practice

- Figure out how inputs can be modified to effect a predictable pattern in the output.
- Because of the gap between necessary and sufficient conditions, prefer raising a warning to asserting when a relation is violated.
- Metamorphic testing should not replace standard testing (unit testing, interaction testing). MT is meant to augment existing test suites by increasing test coverage.
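Putting this advice together with the search-engine relations listed earlier, a minimal sketch might look like the following. Everything here is an assumption made for illustration: the listings are invented, `search` is a toy stand-in for a real search engine, and `check_relations` is a hypothetical helper. The equivalence and complete relations are omitted because the toy engine takes a place name rather than free text. Violations are reported with `warnings.warn` instead of `assert`, in line with the advice above.

```python
import warnings

# Hypothetical listings, invented for illustration: (title, city, province).
LISTINGS = [
    ("Two-bedroom house", "Vancouver", "British Columbia"),
    ("Family house", "Victoria", "British Columbia"),
    ("Lakeside house", "Kelowna", "British Columbia"),
    ("Prairie house", "Saskatoon", "Saskatchewan"),
]


def search(place, exclude=None):
    """Toy search engine: listings located in `place`, minus an excluded city."""
    return {
        listing
        for listing in LISTINGS
        if place in listing[1:] and listing[1] != exclude
    }


def check_relations(search_fn):
    """Warn (rather than assert) on each violated relation; return their names."""
    province = search_fn("British Columbia")
    city = search_fn("Vancouver")
    checks = {
        # Equality: repeating the query gives the same result.
        "equality": search_fn("Vancouver") == city,
        # Subset: the city results are contained in the province results.
        "subset": city <= province,
        # Disjoint: no Saskatchewan listing appears in the Vancouver results.
        "disjoint": not (city & search_fn("Saskatchewan")),
        # Difference: excluding Vancouver removes every Vancouver listing.
        "difference": not (search_fn("British Columbia", exclude="Vancouver") & city),
    }
    violated = [name for name, holds in checks.items() if not holds]
    for name in violated:
        warnings.warn(f"metamorphic relation violated: {name}")
    return violated


print(check_relations(search))
```

Returning the names of the violated relations keeps the check usable inside an existing test suite, where a team can decide case by case whether a warning should be promoted to a failure.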
### Conclusions

Metamorphic testing should serve as a secondary testing paradigm in your tool set. It is reasonable to see metamorphic tests as sanity checks based on the properties of a problem's inputs and outputs. One rule of thumb for applying metamorphic testing is to study successful case studies in the industry and adapt them to your specific needs. Finally, creating metamorphic relations is a combination of art and science.

Note: the opening quote is attributed to Boehm.

### References

- [[1]]() Blog, https://www.hillelwayne.com/post/metamorphic-testing/
- [[2]]() Chen, T.Y., Kuo, F.-C., Liu, H., Poon, P., Towey, D., Tse, T.H., and Zhou, Z.Q. (2018). Metamorphic Testing: A Review of Challenges and Opportunities. ACM Computing Surveys.
- [[3]]() Segura, S., Parejo, J.A., Troya, J., and Ruiz-Cortés, A. (2018). Metamorphic Testing of RESTful Web APIs. IEEE Transactions on Software Engineering, 44, 1083-1099.
- [[4]]() Xie, X., Zhang, Z., Chen, T.Y., Liu, Y., Poon, P., and Xu, B. (2018). METTLE: a METamorphic testing approach to assessing and validating unsupervised machine LEarning systems.
- [[5]]() Zhou, Z., Xiang, S., and Chen, T.Y. (2016). Metamorphic Testing for Software Quality Assessment: A Study of Search Engines. IEEE Transactions on Software Engineering.
- [[6]]() Chen, T.Y., Ho, J.W., Liu, H., and Xie, X. (2009). An innovative approach for testing bioinformatics programs using metamorphic testing. BMC Bioinformatics.
- [[7]]() Donaldson, A.F., Evrard, H., Lascu, A., and Thomson, P. (2017). Automated Testing of Graphics Shader Compilers. Proceedings of the ACM on Programming Languages.

### **How to Cite this Article**

```
BibTeX Citation
@article{kodoh2021a,
  author = {Odoh, Kenneth},
  title = {Metamorphic Testing in a Nutshell},
  year = {2021},
  note = {https://kenluck2001.github.io/blog_post/metamorphic_testing_in_a_nutshell.html}
}
```
