Testing (Tosca) and AI... GenAI, a revolutionary dimension in testing or just a hype? (part 2)

In the previous part of this blog series (disclaimer), I explored the foundations and historical context of Generative AI and how it is shaping the future of various industries, including the software testing industry. Furthermore, I examined the current state of AI integration within Tosca and explored how GPT tools like Microsoft’s Copilot and Google’s Gemini can assist in the test design process, despite some limitations. I also demonstrated the use of Applitools, a cloud-based platform for visual and regression testing at the user interface level, through a real-world scenario within a fictional company called Acompany.

Part two of this article delves further into the role and goal of GenAI in testing. Furthermore, I will evaluate some GenAI use cases where specific tools and test strategies can enhance the testing process. Finally, I will examine whether Acompany can achieve its goal of structuring a more efficient testing process.

The role and goal of GenAI in Testing

We know that thousands of articles and studies have been written about the role of GenAI in testing (GenAI was used in reviewing this blog). The key takeaway and underlying theme is that GenAI in testing is not just about using machine learning models to automate repetitive tasks, create test data or generate test cases. Rather, it is about enhancing the testing process by making it smarter and more efficient. This can be achieved through methods such as intelligent test case prioritization, predictive analytics for defect identification, and advanced anomaly detection. Figure 1 below shows part of an example of an anomaly detection program written in Python.



Figure 1. Anomaly detection with PyOD library (PyPi.org)


Python is widely used in data science and machine learning, and anomaly detection is the process of identifying unusual or unexpected data points, patterns or events that deviate significantly from the norm in a software system. These so-called “anomalies” can be indicative of potential defects, bugs or performance issues. As test purists, we know from experience that software anomalies first need to be detected by (peer) reviewing the documentation or code, so identifying anomalies early in the software development lifecycle is crucial. In this context, “anomaly” is really nothing more than a fancy word for a finding. As test professionals, understanding how to leverage these findings can drastically improve product quality, and anomalies can play a pivotal role in your testing efforts. Let’s dive into some use cases where GenAI can assist in addressing anomalies!
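To make this concrete, here is a minimal, self-contained sketch in the spirit of figure 1, assuming the open-source PyOD library and a synthetic data set (the detector choice, sample sizes and contamination rate are illustrative assumptions, not taken from the figure):

```python
# Minimal anomaly detection sketch with PyOD (pip install pyod).
# The data here is synthetic; in a test context the feature vectors could be
# response times, error counts or resource metrics collected during test runs.
from pyod.models.knn import KNN
from pyod.utils.data import generate_data

# Create a small synthetic data set with 10% injected outliers.
X_train, X_test, y_train, y_test = generate_data(
    n_train=200, n_test=100, n_features=2, contamination=0.1
)

# Fit a k-nearest-neighbours detector on the training data.
detector = KNN()
detector.fit(X_train)

# 0 = normal, 1 = anomaly; the decision scores express how unusual a point is.
labels = detector.predict(X_test)
scores = detector.decision_function(X_test)

print(f"Flagged {labels.sum()} of {len(labels)} samples as potential anomalies")
```

In a testing context, each flagged point is a candidate finding that still needs a human to decide whether it represents a defect, a performance issue or simply noise.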

Use case 1: Early detection prevents escalation

When anomalies are detected through code reviews or peer assessments, they serve as early indicators of potential issues. By catching these anomalies before they manifest as full-blown defects, you can save valuable time and resources down the line (Boehm's law: the cost of fixing a defect grows the later in the lifecycle it is found).

Looking at GenAI involvement, GenAI models can be trained on past codebases to automatically detect potential anomalies during code reviews. For instance, GitHub Copilot or similar AI tools can highlight suspicious code patterns, providing real-time suggestions to developers before the code is merged. This proactive detection of anomalies via AI can significantly reduce the need for manual review effort.

A use case can be a “GenAI-powered” code review tool that flags a particular logic structure as error-prone based on historical defect data. The developer, after reviewing the suggestion, realizes the logic could lead to a potential infinite loop and resolves the issue before it even reaches the testing phase. In figure 2 below, you can see a workflow of how developers can utilize a GenAI feature (in this case: Gemini) to assist with code reviews within a specific CI/CD process.



Figure 2. GenAI-powered code review in a CI/CD process (source: GitHub)
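To illustrate what such a review step could look like in practice, below is a hedged sketch assuming the google-generativeai Python SDK; the model name, prompt wording and diff handling are illustrative assumptions, not the exact setup shown in figure 2:

```python
# Hedged sketch of a GenAI-assisted review step in a CI/CD job, assuming the
# google-generativeai Python SDK (pip install google-generativeai). The model
# name, prompt and diff handling are illustrative assumptions.
import os
import subprocess

import google.generativeai as genai

genai.configure(api_key=os.environ["GEMINI_API_KEY"])
model = genai.GenerativeModel("gemini-1.5-flash")

# Collect the diff of the current branch against main (illustrative).
diff = subprocess.run(
    ["git", "diff", "origin/main...HEAD"],
    capture_output=True, text=True, check=True
).stdout

prompt = (
    "You are reviewing a pull request. Flag error-prone patterns such as "
    "possible infinite loops, unchecked inputs or missing error handling, "
    "and briefly explain each finding:\n\n" + diff
)

response = model.generate_content(prompt)
print(response.text)  # In a real pipeline this would be posted as a review comment.
```

A human developer still decides whether each flagged pattern is a genuine problem; the model only points at candidates.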


Use case 2: Refining the test strategy

Anomalies can also guide you in refining your test approach. Each anomaly presents a learning opportunity, often pointing towards weaknesses in the code or architecture that might have been overlooked.

Looking at GenAI involvement here, by analyzing the anomalies discovered, GenAI models can suggest optimized test cases or highlight areas in the code that require more robust testing. AI-based tools can also recommend test case improvements based on past defects, making the testing process smarter and more targeted.

A possible use case for a test strategy is that, after repeated anomalies in certain areas of the code, a GenAI tool suggests creating additional test cases focused on specific edge cases that were not initially considered. These suggestions improve the coverage of the test suite, reducing the likelihood of missed defects in future releases. If we extend figure 2 above with a “test strategy” module, the result looks like figure 3 below.



Figure 3. Testing strategy supported by GPT


The test strategy module can be part of an agile project management tool like JIRA and assisted by a GPT tool like Gemini. The gear icons in figure 3 above indicate the extent of the setting and configuration changes for the tools involved.
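As a rough illustration of such a test-strategy step, the sketch below feeds a summary of recurring anomalies to Gemini and asks for additional edge-case test ideas; the anomaly data, prompt and model name are assumptions for illustration only:

```python
# Illustrative sketch of a "test strategy" step: recurring anomaly areas are
# summarized and sent to Gemini, which proposes additional edge-case test cases.
# The anomaly data, prompt and model name are assumptions; in a real setup the
# suggestions could be pushed to a backlog tool such as JIRA for review.
import os

import google.generativeai as genai

genai.configure(api_key=os.environ["GEMINI_API_KEY"])
model = genai.GenerativeModel("gemini-1.5-flash")

# Hypothetical summary of where anomalies keep recurring.
recurring_anomalies = {
    "registration form": ["duplicate submissions", "timeouts on slow networks"],
    "course enrolment": ["wrong capacity check when two users enrol at once"],
}

prompt = (
    "Based on these recurring anomalies per component, propose additional "
    "edge-case test cases (title plus a one-line description) that the current "
    f"regression suite is probably missing:\n{recurring_anomalies}"
)

print(model.generate_content(prompt).text)
```

The generated suggestions are input for the test specialist, not finished test cases; they still need to be reviewed and prioritized before they land in the test suite.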

Use case 3: Improved communication across teams and continuous improvement

Involving developers, business analysts and other stakeholders in anomaly detection can foster better communication. Furthermore, anomalies often highlight process inefficiencies or gaps in understanding. As you detect and analyze anomalies over time, patterns may emerge that point to broader process improvements.

Regarding GenAI involvement, GenAI-powered tools can automatically generate reports and summaries from complex testing data, making anomalies easier to understand for non-technical stakeholders. AI can also assist in translating technical findings into business impact terms, facilitating smoother communication between teams. Finally, GenAI can help automate the analysis of recurring anomalies, detecting patterns and suggesting improvements in coding practices or test processes.

A use case from everyday practice can be a GenAI tool, like Claude.ai, that generates a high-level dashboard from detailed anomaly reports in order to translate complex defect data into visual insights that are easily digestible for the business team. This helps prioritize anomalies that have a higher impact on end-users, ensuring that development focuses on the most critical issues first. Keep in mind that at the moment there is no such thing as a “Claude.ai” API, but Anthropic, the company behind Claude.ai, offers various APIs, so in the future you could, for example, communicate with your CI/CD pipelines using specific GenAI APIs. In figure 4 below, you see a simple example of a test results dashboard created by Claude.ai.



Figure 4. Test results dashboard generated by Claude.ai
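Since Anthropic does offer programmatic access, a report-summarization step along these lines could look like the sketch below, assuming the official anthropic Python SDK; the model name and report content are illustrative assumptions, and the output would still need to be rendered into an actual dashboard:

```python
# Hedged sketch of summarizing anomaly/test-result data via the Anthropic API,
# assuming the official anthropic Python SDK (pip install anthropic). The model
# name and report content are illustrative assumptions.
import os

import anthropic

client = anthropic.Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])

anomaly_report = """
Regression run for release 2.3: 412 passed, 9 failed.
Failures cluster around the enrolment workflow and the exam-scheduling API.
Average response time for the 'teacher' role degraded by roughly 35%.
"""

message = client.messages.create(
    model="claude-3-5-sonnet-latest",   # assumed model alias
    max_tokens=500,
    messages=[{
        "role": "user",
        "content": "Summarize this test report for business stakeholders, "
                   "highlighting end-user impact and what to fix first:\n"
                   + anomaly_report,
    }],
)

print(message.content[0].text)
```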


Acompany's Advanced and Adaptive Test Automation Framework

The three use cases above demonstrate the importance of anomalies in testing, as they often reveal deviations from expected behavior, which can indicate defects or potential issues in your software or system. Let's now build a bridge to a previously developed scenario.

In my previous blog, I introduced a scenario where a test consultant (you) was hired by a fictitious company called “Acompany” to help structure their testing process. The decision was made to develop a test automation framework using Python, Selenium, and Applitools Eyes, focusing on improving regression testing at the UI level. The setup of this Test Automation Framework (TAF) proved to be successful.
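As a reminder of what that framework does, the following condensed sketch shows a typical visual regression check, assuming the Applitools Eyes SDK for Selenium (eyes-selenium); the URL, application name and test name are hypothetical:

```python
# Condensed sketch of the kind of UI regression check described above, assuming
# the Applitools Eyes SDK for Selenium (pip install eyes-selenium). The URL,
# application name and test name are hypothetical.
import os

from applitools.selenium import Eyes, Target
from selenium import webdriver

driver = webdriver.Chrome()
eyes = Eyes()
eyes.api_key = os.environ["APPLITOOLS_API_KEY"]

try:
    eyes.open(driver, "Acompany web app", "Homepage visual regression")
    driver.get("https://acompany.example/")   # hypothetical AUT URL
    eyes.check("Homepage", Target.window())    # visual checkpoint
    eyes.close()                                # raises if visual differences are found
finally:
    eyes.abort()    # safety net if the test did not finish normally
    driver.quit()
```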

Following this recent success, Acompany, led by its visionary CTO, embarked on a transformative journey to explore GenAI’s role in ensuring application quality. The CTO, in close collaboration with domain experts, initiated a comprehensive corporate business and risk analysis. This was not merely a traditional assessment of business risks; it also focused on future-proofing the organization’s application landscape. During the analysis, one key conclusion stood out: GenAI could play a pivotal role in safeguarding the integrity and quality of the company’s complex application portfolio.

To put these insights into practice, the CTO enlisted your help, given your years of experience in setting up test processes, and promoted you to test lead / test architect. Tasked with conducting a Test Process Improvement (TPI) analysis, your role was to identify areas where testing practices could be enhanced through automation, GenAI and smarter frameworks. The outcome of the TPI revealed significant opportunities for improvement. With the support of leadership, you were given the responsibility to lead several Proof of Concept (POC) teams to evolve the company’s TAF. This framework, primarily built on Python and integrated with tools like Applitools, was already sophisticated, but it needed to scale to meet the growing demands of the business: Acompany's Advanced and Adaptive Test Automation Framework (3ATAF) project was born. To keep it simple, we will focus on one of the POC teams, the Tosca Copilot team.

The Tosca Copilot Team and EduCompany

Within the 3ATAF project, the Tosca Copilot team was tasked with demonstrating the feasibility of Tosca's Copilot feature in automating the end-to-end functional testing process. The research focused on reducing manual test case creation through GenAI-generated test steps and enhancing test coverage by integrating advanced GenAI capabilities.

Because Acompany is also active in the education sector, the 'EduCompany' product (see Figure 5) was the most logical choice for conducting a proof of concept (POC). The POC focused on leveraging Tosca Copilot to automate and optimize testing efforts, aiming to streamline the testing of Acompany’s educational platform by enhancing the efficiency of test case generation and execution.



Figure 5. AUT: EduCompany


As of writing this blog, the Tosca Copilot team has been using Tosca version 2024.1.2 to explore how GenAI can be integrated into the testing lifecycle. Their first challenge and assignment was clear: “Test Tosca Copilot in a real-world scenario, and evaluate its potential as a core component of Acompany’s corporate testing strategy”. An important prerequisite for using Tosca Copilot is having a Tosca Cloud account or tenant; for more information, the team refers to the Tosca Cloud documentation. After successfully requesting a (trial) license for Tosca Cloud and completing the registration process, the team began by creating a Tosca Cloud workspace and configuring Copilot for their project (see figures 6 through 11).



Figure 6. Step 1: Navigating inside the Tosca Cloud environment to “Workspaces”



Figure 7. Step 2: Creating the Tosca Cloud workspace



Figure 8. Step 3: Checking and editing the settings of the Tosca Cloud workspace



Figure 9. Step 4: The created Tosca Cloud workspace “EduCompany”



Figure 10. Step 5: Connecting to the Tosca Cloud workspace from Tosca Commander



Figure 11. Step 6: The EduCompany project structure, connected to Tosca Cloud


After the infrastructural set-up, the team began integrating Tosca Copilot into their workflow; see figures 12 and 13 below.



Figure 12. Step 7: Navigating to Tosca Copilot add-on (part of standard Tosca installation)



Figure 13. Step 8: Connecting to / logging in to the EduCompany tenant


This integration led to measurable improvements in test coverage. For example, regression testing, which previously took days due to its manual and repetitive nature, could now be completed in hours. This shift was driven by the tool’s capabilities, enabling test specialists to analyze existing test cases and execution results more deeply (see figures 14 and 15). It is important to emphasize that the (test) results still had to be analyzed further, due to certain limitations in the tool.



Figure 14. Tosca Copilot’s main window



Figure 15. Tosca Copilot in action: analyzing test execution results


By utilizing GenAI-generated analysis of the test steps, the team improved the creation of detailed test cases. In one instance, Tosca Copilot was used to analyze the EduCompany platform’s user interface for a complex registration process. While this reduced the time required for test case creation, it also highlighted the tool’s dependency on the quality of existing data.

In terms of test execution, the team observed some efficiency gains, but these improvements were far from seamless. Tosca Copilot’s analytics capabilities offered real-time insights into test results, which helped identify issues faster, such as inconsistencies in the platform’s response times across different user roles during regression testing. However, this functionality came with significant caveats. Without a deep understanding of the testing context and the tool’s limitations, the insights provided could easily be misinterpreted or overlooked.

The scenario above underscores a critical point: Tosca Copilot is not a standalone solution or a guarantee of success. Like other GenAI tools, it has room for improvement, particularly in reducing the learning curve and improving the clarity of its insights. Its value is entirely dependent on the skill and expertise of the test specialist using it, and it should be seen as a supplementary tool. Furthermore, the main function of a tool is to augment, assist or enhance human capabilities to perform a specific task more efficiently, accurately, or effectively. A tool is never a replacement for human skill or judgment, but rather a means to extend what humans can achieve.

Tosca’s Position in the GenAI Landscape…

Like other test tools in the market, Tosca is navigating the evolving landscape of Generative AI to enhance its capabilities and stay competitive. GenAI could “revolutionize” software testing by enabling intelligent automation, reducing manual effort and improving test accuracy. Tosca version 2024.2.1 was released on January 29, 2025, but it does not include significant improvements regarding GenAI. In Tosca version 2024.2.3, released on March 31, 2025, an additional feature was added: the generation of TestCases. Although Tosca Copilot is included in the standard Tosca installation, it is not free and requires a Tosca Cloud tenant/account with a paid license. Furthermore, Copilot is not yet a fully GenAI-powered tool, but it represents a significant step forward in leveraging AI to streamline testing processes to some extent.

That said, Tosca still has room to grow in incorporating more advanced GenAI capabilities. For instance, the potential integration of GPT technology could unlock new possibilities for intelligent test generation, maintenance and even “self-healing” test scripts. The journey toward fully harnessing GenAI is ongoing, and Tosca’s Copilot represents a step in that direction, similar to other “Copilot” initiatives in the software testing industry.

Conclusion

Is GenAI in testing merely a hype, or does it offer meaningful value in understanding software quality? We cannot predict the future as simple souls, but GenAI is transforming the way we think about software testing, enabling smarter and more efficient (working) processes. Yet its adoption is not without hurdles. While tools like Applitools, Testim, and potentially Tosca are incorporating advanced GenAI capabilities, there remains substantial room for growth and innovation. Integrating GenAI into testing frameworks presents challenges, but the potential benefits are significant, such as improved test coverage, faster test creation and greater accuracy. Furthermore, it is important to realize that GenAI models are still constantly evolving. These models can sometimes generate nonsensical content, reflect biased training data, or rely on (test) data from unverified sources.

Furthermore, anomaly detection plays a critical role in testing, as it helps identify unexpected behaviors, defects or vulnerabilities in software that might otherwise go unnoticed. These anomalies can be subtle and complex, making them difficult to detect using traditional methods. GenAI-supported anomaly detection tools are the rule rather than the exception.

It is said that AI is transforming the landscape of software testing, making it smarter, more efficient, and more effective. But you still need a human being: a skilled test specialist is essential to integrate GenAI capabilities into a test automation framework, interpret results, and make strategic decisions. Tools like Tosca Copilot, when combined with human expertise, can empower organizations to achieve higher levels of quality, reduce maintenance efforts, and proactively address potential issues. But that's easier said than done.

In the case of the simulated company Acompany, the implementation of Tosca Copilot demonstrates how GenAI can be useful in optimizing testing processes, hopefully delivering robust and reliable software. As the technology continues to evolve, the collaboration between human ingenuity and AI-driven tools will undoubtedly shape the future of software testing. Ultimately, the worldwide community will decide; it holds the power in its own hands!


Used abbreviations

•       AI (Artificial Intelligence)

•       API (Application Programming Interface)

•       AUT (Application Under Test)

•       CI/CD (Continuous Integration/Continuous Delivery)

•       CTO (Chief Technology Officer)

•       GenAI (Generative Artificial Intelligence)

•       GPT (Generative Pre-trained Transformer)

•       IDE (Integrated Development Environment)

•       POC (Proof of Concept)

•       TAF (Test Automation Framework)

•       TPI (Test Process Improvement)

Sources

•       https://pypi.org/project/pyod/

•       https://github.com/GoogleCloudPlatform/genai-for-developers

•       https://claude.ai/

•       https://www.anthropic.com/

•       https://documentation.tricentis.com/tricentis_cloud/en/content/home.htm

•       https://documentation.tricentis.com/tosca_copilot/en/content/landing_page.htm#

•       https://www.tmap.net/building-blocks/test-process-improvement-tpi

