Let AI write effective UI automation with real-time debugging

Mar 15, 2024, 03:46 PM

About the author

Thales Fu, senior R&D manager at Ctrip, is committed to finding better ways to combine AI and engineering to solve real-life problems.

Introduction

In fast-iterating software development cycles, automated user interface (UI) testing has become key to improving efficiency and ensuring product quality. However, as applications grow more complex, traditional UI automation methods increasingly show their limitations. AI-driven UI automation has arrived, but it still faces accuracy and reliability challenges. Against this background, this article proposes a different approach: letting the AI debug its scripts in real time can significantly improve the effectiveness of the UI automation scripts it writes.

This is not only a technical challenge; it also bears on how to deliver software faster while maintaining its quality. This article explores how real-time debugging helps AI understand and execute UI test scripts more accurately, and how this approach can bring significant changes to software development.

1. Current status of UI automation

UI automation has come a long way, from simple record-and-playback tools to today's complex scripting frameworks. Despite this progress, traditional UI automation methods still struggle with rapidly changing application interfaces, and as applications become more complex and dynamic they may no longer be enough. Engineers are therefore looking for more flexible and reliable solutions, and a new generation of UI automation tools and technologies is emerging to meet this need.

According to industry surveys, writing test scripts by hand is inefficient, and much of the work must be redone whenever the application is updated. Research shows that maintaining UI automation test scripts may account for 60% to 70% of the entire testing effort. In an agile development environment, rewriting and retesting existing automation scripts can take more than 100 hours per application update. This high maintenance cost highlights the inefficiency and resource consumption of traditional UI automation methods.

2. The introduction of behavior-driven development (BDD)

Behavior-driven development (BDD) is an agile software development practice that encourages more effective communication between a project's developers, testers, and non-technical stakeholders. Cucumber is a popular tool for implementing BDD; it allows team members to write explicit, executable test cases in natural language.

Cucumber uses a domain-specific language (DSL) called Gherkin, which is extremely easy to read and allows non-technical people to understand the purpose and content of the test. Test scenarios are written in the form of a series of Given-When-Then statements, which clearly describe how the system should respond under specific conditions.

For example, the shopping cart function of an online shopping website may have the following Gherkin scenario:

[Figure: example Gherkin scenario for the shopping cart function]

This approach uses natural language descriptions to promote better communication and understanding between technical and non-technical teams. The natural language test scenarios also serve as project documentation, helping new team members quickly understand what the project does. Non-technical personnel can take part directly in writing and verifying test cases, which keeps development work closely aligned with business needs.

But BDD also has limitations. Although test scenarios are written in natural language, the implementation behind each step (the step definition) must still be written by technical personnel in a programming language, so implementing the test logic can involve complex coding. As the application grows and changes, maintaining and updating the corresponding step definitions becomes tedious, especially when the UI changes frequently. Flexibility is also limited: Cucumber test scripts rely on predefined steps and structures, and some complex test scenarios require creative workarounds to get past the limits of the framework.
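To make the split between natural-language steps and coded step definitions concrete, here is a minimal Python sketch. It does not use Cucumber itself: the scenario text, the `step` decorator, and `run_scenario` are all hypothetical stand-ins written for illustration, and the scenario is an invented shopping cart example rather than the one from the original article.

```python
import re

# Hypothetical scenario for illustration only.
SCENARIO = """\
Given the shopping cart is empty
When the user adds 2 units of "notebook" to the cart
Then the cart contains 2 items
"""

STEPS = []  # (compiled pattern, handler) pairs


def step(pattern):
    """Register a handler for steps whose text matches the pattern."""
    def decorator(func):
        STEPS.append((re.compile(pattern), func))
        return func
    return decorator


@step(r"the shopping cart is empty")
def cart_is_empty(ctx):
    ctx["cart"] = {}


@step(r'the user adds (\d+) units of "([^"]+)" to the cart')
def add_to_cart(ctx, quantity, name):
    ctx["cart"][name] = ctx["cart"].get(name, 0) + int(quantity)


@step(r"the cart contains (\d+) items")
def cart_contains(ctx, expected):
    assert sum(ctx["cart"].values()) == int(expected)


def run_scenario(text):
    """Run each Given/When/Then line against the registered step definitions."""
    ctx = {}
    for line in text.strip().splitlines():
        keyword, _, body = line.strip().partition(" ")
        for pattern, handler in STEPS:
            match = pattern.fullmatch(body)
            if match:
                handler(ctx, *match.groups())
                break
        else:
            # A readable step is useless until someone codes its definition.
            raise LookupError(f"no step definition for: {keyword} {body}")
    return ctx
```

Every new phrasing in a scenario needs a matching pattern and handler, which is the maintenance burden described above.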

3. Current application of AI in UI automation

In recent years, AI technology has been integrated into UI automation, especially since the emergence of large models such as GPT, which can generate code on their own. The industry has begun to use large models to turn test cases written in Gherkin directly into test code.

However, the test code generated by current large models does not fully meet expectations. There are two main problems: first, the generated script may contain syntax errors that prevent it from running; second, it may not accurately cover the checkpoints that the test case requires. In our practice, the first-attempt success rate was no more than 5%.

When generation fails, people have to step in and fix it: by debugging, by modifying the use case and regenerating, or by editing the generated script directly.

These tasks themselves require a lot of manpower, which defeats the original purpose of our system: generating test scripts automatically through AI.

4. AI fully automatically writes effective test scripts

To solve this problem, we rethought the whole process by which AI generates test scripts.

We treated the human work as part of the system as well. People were doing the debugging and modification; could AI do this part too? The idea is to let the system run the generated code by itself and let the AI debug and fix the faulty code it produced.

We therefore adjusted the system design so that AI performs these tasks autonomously instead of humans. In the end, across all the use cases for Ctrip's hotel order details page, 83.3% were generated successfully without any human participation. During script generation, bugs were discovered in 8% of the cases. We generated these use cases three times in a row, with success rates of 84.3%, 81.4% and 83.3% respectively, which indicates that the system is stable and effective.
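As a quick check on the stability claim, the three reported success rates can be summarized with a few lines of Python. The variable names are illustrative; the only inputs are the three percentages quoted above.

```python
from statistics import mean

# Success rates (%) of the three consecutive generation runs reported above.
rates = [84.3, 81.4, 83.3]

average = round(mean(rates), 1)            # 83.0
spread = round(max(rates) - min(rates), 1)  # 2.9 percentage points
```

The runs average 83.0% and differ by at most 2.9 percentage points, which is consistent with the description of the system as stable.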

A specific test case and the code generated for it are as follows:

[Figure: the test case]

First, scroll down the order details page to the user rights module, then tap the booking optimization area to bring up the price floating layer.

Then check to see if the fee details include Black Diamond VIP.

The final generated test code is as follows:

[Figure: the generated test code]

5. System Implementation

The core architecture of the system is shown below. At its center is a program built on the LangChain framework. It calls the large model, and we have equipped it with multiple tools in two main categories: tools for obtaining page information and tools for debugging.

LangChain automatically calls the page information acquisition tool as needed to fetch page data and determine which control the current operation requires, so that code can be generated. It then uses the debugging tool to actually execute the code on the mobile phone, and the AI judges from the debugging feedback whether the code it generated is correct.
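The generate, execute, and revise cycle described here can be sketched in plain Python. This is only an illustration under stated assumptions: it does not use LangChain, and `generate_with_debugging`, `DebugResult`, the retry limit, and the two stub functions are hypothetical names standing in for the model call and the on-device execution.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class DebugResult:
    ok: bool
    message: str  # error text or output returned by the device


def generate_with_debugging(
    step_text: str,
    generate: Callable[[str, Optional[str]], str],
    execute: Callable[[str], DebugResult],
    max_attempts: int = 3,
) -> Optional[str]:
    """Return the first statement that executes successfully, else None."""
    feedback = None
    for _ in range(max_attempts):
        statement = generate(step_text, feedback)  # model call in a real system
        result = execute(statement)                # run on the device
        if result.ok:
            return statement
        feedback = result.message                  # fed into the next attempt
    return None


def fake_generate(step_text, feedback):
    # Stand-in for the model: corrects the control ID once it sees an error.
    return 'tap("price_layer")' if feedback else 'tap("price_layr")'


def fake_execute(statement):
    # Stand-in for the device: only one control ID exists.
    if statement == 'tap("price_layer")':
        return DebugResult(True, "ok")
    return DebugResult(False, "control not found")


script_line = generate_with_debugging(
    "tap the price area", fake_generate, fake_execute
)
```

In this toy run the first attempt fails with "control not found", the error is passed back to the generator, and the second attempt succeeds. Capping the number of attempts is a design choice of the sketch, not something the article specifies.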

[Figure: core architecture of the system]

5.1 Prompt words

With the basic architecture in place, we need a prompt to glue these tools together and tell the AI how it should work. Structurally, our prompt has four parts: first, it tells the AI how to think and work; second, it tells the AI to debug each statement it generates through the Debug tool; third, it specifies the output format; and finally, it gives the complete text of the use case to be handled.

The part that tells the AI how to think and work expands into the following steps: look at which modules are on the page; decide which module the current step operates on; look at which controls and components are in that module; decide which control or component to operate; decide which action to perform; check which special syntax can be used; and then generate the statement.
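A four-part prompt of this shape could be assembled as in the sketch below. The section order follows the description above, but the exact wording, the `THINKING_STEPS` list, and the `build_prompt` function are assumptions made for illustration; the article does not publish its actual prompt text.

```python
# Thinking steps paraphrased from the description above.
THINKING_STEPS = [
    "List the modules on the page.",
    "Decide which module the current step operates on.",
    "List the controls and components in that module.",
    "Choose the control or component to operate.",
    "Choose the action to perform.",
    "Check whether a special syntax applies.",
    "Generate the statement.",
]


def build_prompt(use_case: str) -> str:
    """Assemble the four prompt sections in a fixed order."""
    thinking = "\n".join(
        f"{number}. {text}" for number, text in enumerate(THINKING_STEPS, 1)
    )
    parts = [
        "How to think and work:\n" + thinking,
        # Hypothetical wording for the debugging instruction.
        "Debugging: run every generated statement through the Debug tool "
        "and revise it if the tool reports an error.",
        # Hypothetical output format.
        "Output format: return only the final test statements, one per line.",
        "Use case:\n" + use_case.strip(),
    ]
    return "\n\n".join(parts)
```

Calling `build_prompt` with the text of a BDD use case returns a single string with the use case placed last, after the working method, the debugging rule, and the output format.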

5.2 Debugging Tools

In essence, the debugging tool connects to the phone remotely through adb. Once connected, we can send the instructions generated by the AI to the phone to run, read the results, and hand them back to the AI so that it can judge whether the instructions it generated are correct.
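A debugging tool along these lines might wrap `adb` as in the following sketch. It assumes the instruction is something `adb shell` can run directly and that the exit status reflects success; the article does not say what form its instructions take. `adb_runner`, `debug_on_device`, and the returned dictionary layout are hypothetical names, and the runner is injectable so the logic can be exercised without a device.

```python
import subprocess
from typing import Callable, List, Tuple

Runner = Callable[[List[str]], Tuple[int, str]]


def adb_runner(args: List[str]) -> Tuple[int, str]:
    """Run an adb command and return (exit code, combined output)."""
    proc = subprocess.run(
        ["adb", *args], capture_output=True, text=True, timeout=60
    )
    return proc.returncode, (proc.stdout + proc.stderr).strip()


def debug_on_device(
    instruction: str, serial: str, run: Runner = adb_runner
) -> dict:
    """Send one instruction to the device and package the result for the model."""
    # `adb -s <serial> shell <command>` targets one specific connected device.
    code, output = run(["-s", serial, "shell", instruction])
    return {"instruction": instruction, "ok": code == 0, "output": output}
```

For a phone reachable over the network, `adb connect <host>:<port>` would be run once beforehand. The dictionary returned by `debug_on_device` is what would be handed back to the AI as debugging feedback.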

5.3 Page information acquisition tool

The ultimate purpose of the page information acquisition tool is to help the AI determine what the BDD use case asks it to operate on and the ID of the specific control involved. Only with the ID can the subsequent program instructions be generated. To get the ID, we need a library of controls and components. The core of this library is the ID of each control and component together with its description. With these two pieces of information, the AI can read the BDD use case and infer from the descriptions which control is needed.

To this end, we built a page control library. In addition to the ID and description of each control on the page, the library also records the relationships between pages and components and between components and controls, so that the AI can query it step by step.

The control library itself is generated by a job that statically analyzes our code. In practice, however, the controls actually displayed on a page vary with the scenario state, and some controls are hidden in certain scenarios. The page information acquisition tool therefore intersects the controls currently present on the page with the controls found in the control library, yielding the controls actually displayed on the current page and their descriptions.
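The intersection step can be illustrated with a small Python sketch. The library layout (page, then component, then control ID mapped to a description) mirrors the relationships described above, but every page name, component name, control ID, and description here is made up for the example.

```python
# Hypothetical control library: page -> component -> {control ID: description}.
CONTROL_LIBRARY = {
    "order_detail": {
        "user_rights_module": {
            "btn_booking_offer": "Booking offer area; opens the price layer",
            "txt_vip_level": "Label showing the member level",
        },
        "price_layer": {
            "txt_fee_detail": "Fee detail text in the price floating layer",
        },
    }
}


def visible_controls(page, on_screen_ids):
    """Intersect the library's controls for a page with the IDs on screen."""
    result = {}
    for controls in CONTROL_LIBRARY[page].values():
        for control_id, description in controls.items():
            if control_id in on_screen_ids:
                result[control_id] = description
    return result
```

For example, `visible_controls("order_detail", {"btn_booking_offer", "some_other_id"})` returns only the `btn_booking_offer` entry: IDs that are on screen but not in the library are ignored, and library controls that are currently hidden are left out.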

5.4 Further Split AI

[Figure: the generation workflow, with the human work highlighted in yellow]

After completing this work, AI can basically do the yellow part of the picture above, which is the work people used to do. The generation success rate rose from 5% to 55%, but a 55% success rate is still not enough.

We analyzed the failed cases further and found that the main problem was AI hallucination. Although the prompt was fairly detailed, the AI sometimes did not follow the instructions and sometimes made things up.

Our conclusion was that the AI had been given too much responsibility and had too many things to consider. The problem is not a shortage of tokens; when it has to do too many things at once, it forgets some of them and cannot complete the requirements accurately. We therefore decided to split the work, still using LangChain's capabilities. Since the AI completes its tasks through tools, why can't a tool itself be an AI?

You can even split it again.

Through these splits, each AI has less to consider and a simpler job, so it works more accurately. The final generation success rate rose to more than 80%.
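The idea that a tool can itself be an AI can be sketched without any framework: if an agent has the same call signature as a tool, agents can be nested to any depth. The sketch below is an assumption-laden illustration in plain Python, not LangChain code; `make_agent`, the `decide` functions (which stand in for model calls), and the tool names are all hypothetical.

```python
from typing import Callable, Dict

Tool = Callable[[str], str]


def make_agent(decide: Callable[[str], str], tools: Dict[str, Tool]) -> Tool:
    """Build an agent; the result is itself a Tool, so agents can nest."""
    def agent(task: str) -> str:
        tool_name = decide(task)  # a model would choose the tool here
        return tools[tool_name](task)
    return agent


# Leaf tools: plain functions standing in for real integrations.
def query_library(task: str) -> str:
    return "btn_booking_offer"


def run_on_device(task: str) -> str:
    return "ok"


# Sub-agent with one narrow job: find the control for a step.
locate_control = make_agent(
    lambda task: "query_library", {"query_library": query_library}
)


def top_decide(task: str) -> str:
    # Stand-in for the top-level model's tool choice.
    return "locate_control" if task.startswith("find") else "run_on_device"


# The top-level agent uses the sub-agent as if it were an ordinary tool.
top_agent = make_agent(
    top_decide,
    {"locate_control": locate_control, "run_on_device": run_on_device},
)
```

Each agent sees only its own small set of tools and its own narrow task, which is the reduction in responsibility that the split is meant to achieve.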

6. Follow-up development

With this work, AI can now generate automated test code without human participation at a success rate of about 80%. This is exciting, but many problems remain to be solved.

1) The cost of calling large models is still not low. Is there a better way to complete the work at a lower cost?

2) Some operations and verifications are still difficult to handle. At a success rate of 80% there is still a lot of room for improvement, and people still need to review the generated results.

3) Other aspects of the system also leave room for improvement and deserve continued work.
