Human-AI Collaboration in Software Testing


  • Michelle Yi

    Full-stack human

    People often ask me what they can implement, in addition to existing best practices (e.g., unit tests), when testing software built on generative AI. Here are four types of practical testing to consider for responsible generative AI:

    1️⃣ An adversarial robustness metric (full gradient) to check whether data poisoning is degrading model performance ▶ https://lnkd.in/gc7MNYwF

    2️⃣ False Positive Rate (FPR) differences between groups to check for bias from spurious correlations (e.g., dogs represented only outdoors, never indoors) ▶ https://lnkd.in/gZCVG44h

    3️⃣ Area Under the Curve (AUC) differences between models to identify bias from underrepresentation (e.g., computer scientists represented as men but not women) ▶ https://lnkd.in/gZCVG44h

    4️⃣ Bug and out-of-distribution (OOD) scenario identification via text-based adaptive testing, where an LLM prompts human testers toward OOD questions; the bugs they surface are fed back into the pipeline so the model can be corrected ▶ https://lnkd.in/gSZffpi3 (image credit to this paper)

    A minimal code sketch of the group-level checks in 2️⃣ and 3️⃣ follows below.

    As we build more GenAI-driven applications, let's adopt new best practices to ensure safe AI. #artificialintelligence #ethicalai #softwaredevelopment #ICCV23
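
To make 2️⃣ and 3️⃣ concrete, here is a minimal Python sketch that computes a per-group False Positive Rate gap and a per-group AUC gap for a single classifier on synthetic data. The group names, threshold, and data are illustrative assumptions (and where the post frames AUC as a comparison between models, this sketch compares groups under one model), so treat it as a starting template rather than the exact metrics from the linked papers.

```python
# Illustrative sketch: per-group FPR and AUC gaps as simple bias checks.
# Group names, threshold, and synthetic data are assumptions for the example.
import numpy as np
from sklearn.metrics import roc_auc_score

def false_positive_rate(y_true, y_pred):
    """FPR = false positives / all actual negatives."""
    negatives = (y_true == 0)
    if negatives.sum() == 0:
        return float("nan")
    return ((y_pred == 1) & negatives).sum() / negatives.sum()

def group_gap(metric, y_true, values_for_metric, groups):
    """Largest absolute difference in a metric across groups (smaller is fairer)."""
    per_group = {
        g: metric(y_true[groups == g], values_for_metric[groups == g])
        for g in np.unique(groups)
    }
    vals = list(per_group.values())
    return per_group, max(vals) - min(vals)

# Synthetic scores, hard predictions, labels, and a group attribute.
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, 1000)
groups = rng.choice(["group_a", "group_b"], 1000)
scores = np.clip(y_true * 0.6 + rng.normal(0.2, 0.25, 1000), 0, 1)
y_pred = (scores > 0.5).astype(int)

fpr_by_group, fpr_gap = group_gap(false_positive_rate, y_true, y_pred, groups)
auc_by_group, auc_gap = group_gap(roc_auc_score, y_true, scores, groups)

print("FPR by group:", fpr_by_group, "gap:", round(fpr_gap, 3))
print("AUC by group:", auc_by_group, "gap:", round(auc_gap, 3))
# A test suite could assert that both gaps stay below an agreed threshold, e.g. 0.05.
```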

  • Itamar Friedman

    Co-Founder & CEO @ Qodo | Intelligent Software Development | Code Integrity: Review, Testing, Quality

    Meta's TestGen-LLM provides another glimpse into the future of software development. It offers a compelling use case and results: the system generates test cases that measurably improve code coverage.

    1) It first generates a batch of candidate tests, then filters out those that don't build, don't pass, or don't add to code coverage. In highly controlled cases the ratio of surviving to generated tests is 1:4; in real-world scenarios it is 1:20.

    2) It then lets a human reviewer accept or reject each surviving test. The reported acceptance ratio is 1:2, or even 73% in the best reported cases.

    It is important to note that each test the system generates is a single addition to a test suite that already exists, written earlier by a professional developer. Moreover, it doesn't necessarily improve any given test suite: "In total, over the three test-a-thons, 196 test classes were successfully improved, while the TestGen-LLM tool was applied to a total of 1,979 test classes. TestGen-LLM was therefore able to automatically improve approximately 10% of the test classes to which it was applied."

    While many, including myself, are excited about this work, I'm also sharing its limitations. We are still in the era of AI assistants for test generation (like CodiumAI's IDE plugins); AI teammates that run fully automated workflows will come soon. #automationtesting #ai
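
The generate-then-filter workflow described above can be sketched as a simple pipeline. Everything below is a hypothetical stand-in: the candidate generator, the build/pass check, the coverage-diff check, and the human-review gate are stubs for illustration, not Meta's actual TestGen-LLM code.

```python
# Sketch of a "generate many candidates, keep only the survivors" test-improvement loop.
# All helpers are stubs standing in for an LLM call, a test runner, a coverage tool,
# and a human reviewer; they are assumptions for the example, not a real API.
import random
from dataclasses import dataclass

@dataclass
class CandidateTest:
    name: str
    source: str

# --- Stubs for illustration only -------------------------------------------------
def generate_candidate_test(suite: str, i: int) -> CandidateTest:
    return CandidateTest(name=f"test_extra_case_{i}", source="assert add(2, 2) == 4")

def builds_and_passes(suite: str, test: CandidateTest) -> bool:
    return random.random() > 0.3          # stand-in for compiling and running the test

def increases_coverage(suite: str, test: CandidateTest) -> bool:
    return random.random() > 0.5          # stand-in for a coverage-diff check

def human_accepts(test: CandidateTest) -> bool:
    return random.random() > 0.5          # stand-in for engineer review
# ---------------------------------------------------------------------------------

def improve_test_class(existing_suite: str, n_candidates: int = 20) -> list[CandidateTest]:
    """Keep only candidates that build, pass, add coverage, and survive human review."""
    accepted = []
    for i in range(n_candidates):
        candidate = generate_candidate_test(existing_suite, i)
        if not builds_and_passes(existing_suite, candidate):
            continue
        if not increases_coverage(existing_suite, candidate):
            continue
        if human_accepts(candidate):
            accepted.append(candidate)
    return accepted

if __name__ == "__main__":
    kept = improve_test_class("tests/test_calculator.py")
    print(f"Accepted {len(kept)} of 20 candidate tests")
```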

  • 🛠️ Your Organization Isn't Designed to Work with GenAI.

    ❎ Many companies are struggling to get the most out of generative AI (GenAI) because they're using the wrong approach. 🤝 They treat it like a standard automation tool instead of a collaborative partner that can learn and improve alongside humans. 📢 This Harvard Business Review article highlights a new framework called "Design for Dialogue" to help organizations unlock the full potential of GenAI. Here are the key takeaways:

    🪷 Traditional methods for process redesign don't work with GenAI because it's dynamic and interactive, unlike previous technologies.
    ✍ Design for Dialogue emphasizes collaboration between humans and AI, with each taking the lead at different points based on expertise and context. This approach involves:
    📋 Task analysis that assigns each task to the right leader, AI or human
    🧑💻 Interaction protocols that outline how AI and humans communicate and collaborate, rather than a fixed process
    🔁 Feedback loops to continuously assess and fine-tune AI-human collaboration.

    A 5-step guide to implementing Design for Dialogue in your organization:
    🔍 Identify high-value processes. Begin with a thorough assessment of existing workflows, identifying areas where AI could have the most significant impact. Processes that involve a high degree of work with words, images, numbers, and sounds (what the authors call WINS work) are ripe for giving humans GenAI leverage.
    🎢 Perform task analysis. Understand the sequence of actions, decisions, and interactions that define a business process. For each identified task, develop a profile that outlines the decision points, required expertise, potential risks, and contextual factors that will influence the AI's or the human's ability to lead.
    🎨 Design protocols. Define how AI systems should engage with human operators and vice versa, including clear guidelines for how and when AI should seek human input. Develop feedback mechanisms, both automated and human-led. (A minimal sketch of one such protocol follows below.)
    🏋🏼♂️ Train teams. Conduct comprehensive training sessions to familiarize employees with the new AI tools and protocols. Focus on building comfort and trust in AI's capabilities, and teach people how to give constructive feedback to and collaborate with AI systems.
    ⚖ Evaluate and scale. Roll out the AI integration with continuous monitoring to capture performance data and user feedback, and refine the process. Continuously update task profiles and interaction protocols to improve AI-human collaboration, while also looking for process steps that can be fully automated based on the interaction data captured.

    By embracing Design for Dialogue, organizations can:
    🚀 Boost innovation and efficiency
    📈 Improve employee satisfaction
    💪 Gain a competitive advantage

    🗣️ What are your thoughts on the future of AI and human collaboration? Please share your insights in the comments! #GenAI #AI #FutureOfWork #Collaboration
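
One way to picture an interaction protocol like the ones described above is a confidence-based handoff: the AI leads when it is confident, and otherwise the human takes the lead and their decision is logged as feedback. The threshold, data shapes, and logging below are illustrative assumptions, not part of the HBR framework itself.

```python
# Hypothetical sketch of a confidence-based AI/human handoff protocol.
# Threshold, field names, and the stand-in reviewer are assumptions for illustration.
from dataclasses import dataclass, field

@dataclass
class TaskResult:
    task_id: str
    answer: str
    handled_by: str                      # "ai" or "human"
    feedback: list[str] = field(default_factory=list)

def run_task(task_id: str, ai_answer: str, ai_confidence: float,
             human_review, confidence_threshold: float = 0.8) -> TaskResult:
    """AI leads when confident; otherwise the human leads and the AI draft is
    recorded as feedback for later tuning of the protocol."""
    if ai_confidence >= confidence_threshold:
        return TaskResult(task_id, ai_answer, handled_by="ai")
    human_answer = human_review(task_id, ai_answer)      # human takes the lead
    return TaskResult(task_id, human_answer, handled_by="human",
                      feedback=[f"ai_draft={ai_answer}",
                                f"ai_confidence={ai_confidence}"])

# Example usage with a stand-in reviewer.
result = run_task("ticket-42", ai_answer="Refund approved", ai_confidence=0.55,
                  human_review=lambda tid, draft: "Escalate to billing team")
print(result)
```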
