Intelligent Software Engineering:
Working at the Intersection of AI and
Software Engineering
Tao Xie
Department of Computer Science and Technology
Peking University
Artificial Intelligence ↔ Software Engineering
• Intelligent Software Engineering (AI for software engineering)
• Intelligence Software Engineering (software engineering for AI)
Speakers
SIGSOFT Webinar: Intelligent Software
Engineering: Synergy between AI and Software
Engineering (Feb 21, 2019)
What can AI do for software engineering, and how can we as software
engineers design and build better AI systems? As AI continues to disrupt
many fields from agriculture to manufacturing, it’s important to explore
the essential connections between AI and software engineering.
1. The creation of AI software (How do we architect, build, maintain,
deploy, test, and verify AI software?)
2. The application of AI to software engineering (How can AI help
software engineers better do their jobs and advance the state of the
practice?)
3. AI and software engineering in use (How have applications blended
AI and software engineering so far?)
4. The AI landscape and its effect on software engineering (How do
related topics such as AI technology investment, ethics, data collection,
and security affect the work of software developers?)
DSML 2021 Keynote: Intelligent Software Engineering: Working at the Intersection of AI and Software Engineering
Figure created by Christian Kaestner, taken from https://ckaestne.github.io/seai/S2020/
Challenges in Autonomics: “In search of a foundation for
next generation autonomous systems”
• How to specify autonomous system behavior in the face of
unpredictability?
• How to carry out faithful analysis of system behavior with respect to rich
environments that include humans, physical artifacts, and other systems?
• How to build such systems by combining executable modeling techniques
from software engineering with AI and ML?
Harel, Marron, Sifakis. Autonomics: In search of a foundation for next-generation autonomous systems. Proc. Natl. Acad. Sci. USA 2020
Sample Requirements in Autonomous Driving System (ADS)
• Stability: ADS must assure stable control and avoid dangerous actions for the vehicle
• R1.1: ADS must avoid impossible steering angles
• Safety: ADS must avoid collision with moving or static objects along the path
• R2.1: ADS must keep a safe distance from other objects
• Compliance: the ADS must respect the traffic regulations enforced by law in a
geographical area
• R3.1: the velocity of the vehicle should be less than the speed limit
• R3.2: the vehicle should not run the red light
• R3.3: the vehicle should stay in the correct lane
• Comfort: the planned trajectory should be comfortable for the passenger
• R4.1: the vehicle’s velocity should not change too much
• R4.2: the vehicle’s acceleration should not change too much
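Requirements such as R3.1, R4.1, and R4.2 can be turned into executable checks over a recorded or simulated trajectory. The following is a minimal sketch, not the formalization used in the cited work: the function name `check_trajectory`, the fixed sampling interval, and all thresholds are hypothetical, and "change too much" is interpreted here as a bound on acceleration (R4.1) and on jerk (R4.2).

```python
def check_trajectory(speeds, dt, speed_limit, max_accel, max_jerk):
    """Return the IDs of the requirements violated by a sampled speed profile.

    speeds: vehicle speed in m/s, sampled every dt seconds.
    All thresholds are illustrative assumptions, not values from the slides.
    """
    # Finite differences approximate acceleration and jerk.
    accels = [(b - a) / dt for a, b in zip(speeds, speeds[1:])]
    jerks = [(b - a) / dt for a, b in zip(accels, accels[1:])]

    violations = []
    if any(v >= speed_limit for v in speeds):
        violations.append("R3.1")  # velocity must stay below the speed limit
    if any(abs(a) > max_accel for a in accels):
        violations.append("R4.1")  # velocity should not change too much
    if any(abs(j) > max_jerk for j in jerks):
        violations.append("R4.2")  # acceleration should not change too much
    return violations


smooth = check_trajectory([10.0, 10.5, 11.0, 11.5], dt=1.0,
                          speed_limit=13.9, max_accel=2.0, max_jerk=1.0)
abrupt = check_trajectory([10.0, 15.0, 15.0], dt=1.0,
                          speed_limit=13.9, max_accel=2.0, max_jerk=1.0)
```

A test generator could use such checks as oracles: any generated scenario whose trajectory yields a non-empty list is a candidate failure.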
Czarnecki, “Automated Driving System (ADS) High-Level Quality Requirements Analysis – Driving Behavior Safety,” Univ Waterloo, 2018.
Tuncali, Fainekos, Prokhorov, Ito, Kapinski, “Requirements-driven test generation for autonomous vehicles with machine learning components,” IEEE Transactions
on Intelligent Vehicles, 2019.
Microsoft's Teen Chatbot Tay
Turned into Genocidal Racist (2016 March 23/24)
http://www.businessinsider.com/ai-expert-explains-why-microsofts-tay-chatbot-is-so-racist-2016-3
"There are a number of precautionary
steps they [Microsoft] could have taken.
It wouldn't have been too hard to create
a blacklist of terms; or narrow the scope
of replies. They could also have simply
manually moderated Tay for the first few
days, even if that had meant slower
responses."
“businesses and other AI developers will
need to give more thought to the
protocols they design for testing and
training AIs like Tay.”
Adversarial Machine Learning/Testing
● Generate adversarial examples derived from legitimate examples
with slight modifications (imperceptible to humans) to induce
misclassification
[Figure: a driving image and its slightly modified counterpart, with predicted actions "Turn right" and "Go straight"]
Tian et al. DeepTest: Automated Testing of Deep-Neural-Network-driven Autonomous Cars. ICSE 2018
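The idea of a slight modification that flips a prediction can be illustrated on a toy model. This is a sketch under strong assumptions: the linear "steering classifier", its weights, and the input features are hypothetical, and the perturbation is a fast-gradient-sign-style step rather than the image transformations used by DeepTest.

```python
def predict(weights, bias, features):
    """Toy linear steering classifier (hypothetical): positive score means 'Go straight'."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return "Go straight" if score > 0 else "Turn right"


def sign(value):
    """Return -1, 0, or 1 according to the sign of value."""
    return (value > 0) - (value < 0)


def perturb(weights, bias, features, epsilon):
    """Shift each feature by at most epsilon, in the direction that pushes
    the score toward the other class (for a linear model the gradient of
    the score with respect to the input is the weight vector)."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    direction = -1 if score > 0 else 1
    return [x + direction * epsilon * sign(w) for w, x in zip(weights, features)]


WEIGHTS = [1.0, -2.0, 0.5]   # illustrative values only
BIAS = 0.0
original = [0.2, 0.0, 0.1]
adversarial = perturb(WEIGHTS, BIAS, original, epsilon=0.1)

label_before = predict(WEIGHTS, BIAS, original)
label_after = predict(WEIGHTS, BIAS, adversarial)
```

Each feature moves by no more than `epsilon`, yet the predicted action changes, which is the behavior adversarial testing tries to expose.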
Example Detected Erroneous Behaviors
[Figure: image pairs with differing predictions: "Turn right" vs. "Go straight", and "Go straight" vs. "Turn left"]
Pei et al. DeepXplore: Automated Whitebox Testing of Deep Learning Systems. SOSP 2017.
Tian et al. DeepTest: Automated Testing of Deep-Neural-Network-driven Autonomous Cars. ICSE 2018.
Lu et al. NO Need to Worry about Adversarial Examples in
Object Detection in Autonomous Vehicles. CVPR’17.
https://arxiv.org/abs/1707.03501
Zhou et al. DeepBillboard: Systematic Physical-World
Testing of Autonomous Driving Systems. ICSE 2020.
Robustness Certification for Deep Learning Model
Li, Weber, Xu, Rimanic, Kailkhura, Xie, Zhang, Li. TSS: Transformation-Specific Smoothing for Robustness Certification. CCS 2021.
Transformation-Specific Smoothing-based robustness certification
• A general robustness certification framework against various semantic transformations.
• A range of transformation-specific smoothing protocols and techniques that provide substantially better certified robustness bounds than state-of-the-art approaches on large-scale datasets.
DL Software Development and Deployment
DL Software Development [ISSTA’18, ISSRE’19, ESEC/FSE’19, ICSE’20]
• Inputs: DL frameworks, training data, DL programs
• Output: trained DL models
DL Software Deployment
• Trained DL models are converted for server/cloud, mobile, and browser platforms
• Deployment-related frameworks: TF Serving (server/cloud), Core ML and TF Lite (mobile), TF.js (browser)
Knowledge Gap!
Research Questions
RQ1: Popularity of DL software deployment
RQ2: Difficulty level of DL software deployment
RQ3: Challenges in DL software deployment
Chen, Cao, Liu, Wang, Xie, Liu. A Comprehensive Study on Challenges in Deploying Deep Learning Based Software. ESEC/FSE 2020.
Collect Relevant Questions from Stack Overflow (SO)
Identification of relevant questions (extraction and filtering)
• Server/cloud deployment: TF Serving, Google Cloud AI, Amazon SageMaker
• Mobile deployment: Core ML, TF Lite
• Browser deployment: TF.js

Platforms       Counts
Cloud/server    1,325
Mobile          1,533
Browser           165
In total        3,023
RQ1: Popularity
[Figure: trend of users; trend of questions]
DL software deployment is gaining increasing attention,
demonstrating the timeliness and urgency of this study.
RQ2: Difficulty Level
Metrics
• %no acc.: percentage of questions with no accepted answer
• acc. time: time needed to receive an accepted answer
A high %no acc. and a long acc. time indicate harder questions; a low %no acc. and a short acc. time indicate easier ones.

Topics                          %no acc.   acc. time (median)
DL software deployment          70.7%      405 min
Other aspects of DL software    62.7%      146 min
Big data [ESEC/FSE’19]          60.5%      198 min
Concurrency [ESEM’18]           43.8%      42 min
Mobile [EMSE’16]                55.0%      55 min

Questions related to DL software deployment are difficult to
resolve, motivating us to identify the challenges behind them.
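The two difficulty metrics are simple to compute once each question is reduced to the time until its accepted answer. The sketch below assumes a hypothetical input format (a list holding minutes to the accepted answer, or `None` when no answer was accepted); the sample values are made up and are not data from the study.

```python
from statistics import median


def difficulty_metrics(minutes_to_accepted):
    """Compute (%no acc., median acc. time) for a set of questions.

    minutes_to_accepted: one entry per question; None means that the
    question has no accepted answer.
    """
    answered = [m for m in minutes_to_accepted if m is not None]
    unanswered = len(minutes_to_accepted) - len(answered)
    pct_no_acc = 100.0 * unanswered / len(minutes_to_accepted)
    # The median is taken only over questions that did get an accepted answer.
    median_time = median(answered) if answered else None
    return pct_no_acc, median_time


# Illustrative sample: two of five questions have no accepted answer.
sample = [None, 30, None, 90, 405]
pct_no_acc, median_time = difficulty_metrics(sample)
```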
RQ3: Challenges
A wide spectrum of challenges for each of the three platforms
• organized as taxonomies
• 72 categories of challenges

Server / Cloud (100%)
• General Questions (16.7%)
  ○ Entire procedure of deployment (9.7%)
  ○ Conceptual questions (4.4%)
  ○ Limitations of platforms / frameworks (2.6%)
• Model Export (15.0%)
  ○ Procedure (4.4%)
  ○ Specification of model information (6.2%)
  ○ Export of unsupported models (3.1%)
  ○ Selection / usage of APIs (0.9%)
  ○ Model quantization (0.4%)
• Environment (19.4%)
  ○ Installing / building frameworks (7.5%)
  ○ Avoiding version incompatibility (4.0%)
  ○ Configuration of environment variables (7.9%)
• Request (13.3%)
  ○ Procedure (0.9%)
  ○ Setting request parameters / body (8.4%)
  ○ Batching request (2.2%)
  ○ Getting information of exposed model (1.8%)
• Serving (13.2%)
  ○ Procedure (0.4%)
  ○ Model loading (3.1%)
  ○ Configuration of batching (2.6%)
  ○ Serving multiple models simultaneously (3.5%)
  ○ Bidirectional streaming (0.4%)
  ○ Authenticating client (1.8%)
  ○ Parsing request (1.3%)
• Data Processing (19.8%)
  ○ Procedure (1.3%)
  ○ Setting size / shape of input data (1.8%)
  ○ Setting format / datatype of input data (8.8%)
  ○ Parsing output (4.8%)
  ○ Migrating pre-processing (3.1%)
• Model Update (2.6%)

Mobile (100%)
• General Questions (18.6%)
  ○ Entire procedure of deployment (13.4%)
  ○ Conceptual questions (4.8%)
  ○ Limitations of frameworks (0.4%)
• Model Conversion (26.5%)
  ○ Procedure (3.9%)
  ○ Saving models (1.3%)
  ○ Conversion of unsupported models (6.1%)
  ○ Model quantization (4.8%)
  ○ Specification of model information (8.2%)
  ○ Selection / usage of APIs (0.9%)
  ○ Parsing converted models (1.3%)
• DL Integration into Projects (21.2%)
  ○ Procedure (0.9%)
  ○ Avoiding version incompatibility (1.7%)
  ○ Configuration of input / output information (8.2%)
  ○ Importing / loading models (4.3%)
  ○ Build configuration (3.9%)
  ○ Thread management (2.2%)
• DL Library Compilation (7.8%)
  ○ Procedure (1.7%)
  ○ Usage of prebuilt libraries (0.4%)
  ○ Register of unsupported operators (3.0%)
  ○ Build configuration (2.6%)
• Data Processing (16.9%)
  ○ Procedure (1.7%)
  ○ Setting size / shape of input data (3.0%)
  ○ Setting format / datatype of input data (5.2%)
  ○ Parsing output (2.2%)
  ○ Migrating pre-processing (4.8%)
• Inference Speed (3.9%)
• Model Update (3.0%)
• Data Extraction (1.7%)
• Model Security (0.4%)

Browser (100%)
• General Questions (5.6%)
  ○ Entire procedure of deployment (3.2%)
  ○ Limitations of frameworks (2.4%)
• Model Conversion (18.4%)
  ○ Procedure (3.2%)
  ○ Specification of model information (5.6%)
  ○ Conversion of unsupported models (4.0%)
  ○ Selection / usage of APIs (2.4%)
  ○ Saving models (3.2%)
• Model Loading (24.0%)
  ○ Procedure (4.8%)
  ○ Loading from local storage (8.0%)
  ○ Loading from an HTTP endpoint (2.4%)
  ○ Asynchronous loading (5.6%)
  ○ Selection / usage of APIs (2.4%)
  ○ Improving loading speed (0.8%)
• Environment (19.2%)
  ○ Importing libraries (10.4%)
  ○ Avoiding version incompatibility (8.8%)
• Data Processing (18.4%)
  ○ Procedure (1.6%)
  ○ Setting size / shape of input data (5.6%)
  ○ Setting format / datatype of input data (4.8%)
  ○ Migrating pre-processing (2.4%)
  ○ Data Loading (4.0%)
• Inference Speed (7.2%)
• Data Extraction (3.2%)
• Model Security (2.4%)
• Model Update (1.6%)
Common Challenges across Three Platforms
• Model Conversion: Cloud/Server 15.0%, Mobile 26.5%, Browser 18.4%
  ○ Unsupported models
  ○ Specification of model information
  ○ Selection/usage of APIs
  ○ Model quantization
• Data Processing: Cloud/Server 19.8%, Mobile 16.9%, Browser 18.4%
  ○ Setting size / shape / format / datatype of input data
  ○ Migrating pre-processing
  ○ Parsing output
Unique Challenges in Client Platforms (Mobile and Browser)
• Model Security: Server/Cloud 0.0%, Mobile 0.4%, Browser 2.4%
  ○ Models on client platforms are easier to obtain than those on server/cloud platforms
• Inference Speed: Server/Cloud 0.0%, Mobile 3.9%, Browser 7.2%
  ○ Client platforms have weaker computing power than server/cloud platforms
Summary of Challenges in DL Software Deployment
• RQ1 (Popularity): DL software deployment is gaining increasing attention.
• RQ2 (Difficulty Level): Questions about DL software deployment are difficult to resolve.
• RQ3 (Challenges): We build a taxonomy consisting of 72 categories, linked to challenges in deploying DL software on server/cloud, mobile, and browser platforms.
Chen, Cao, Liu, Wang, Xie, Liu. A Comprehensive Study on Challenges in Deploying Deep Learning Based Software. ESEC/FSE 2020.
Neural Machine Translation
Screen snapshot captured on April 5, 2018
• Overall better than statistical machine translation
• Worse controllability
• Existing translation quality assurance
  ○ Needs reference translation; not applicable online
Translation Quality Assurance
● Key idea: black-box algorithms specialized for common problems
○ No need for reference translation; need only the original sentence and the generated translation
○ Precise problem localization
● Common problems
○ Under-translation
○ Over-translation
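To make the two problem types concrete, the sketch below flags them using only the source sentence and the generated translation. It is a deliberately simplified illustration, not the algorithms from the cited paper: the function `detect_problems` and the tiny bilingual lexicon are hypothetical, and real systems would need proper tokenization and alignment.

```python
from collections import Counter


def detect_problems(source_tokens, target_tokens, lexicon):
    """Flag source words that look under- or over-translated.

    lexicon: hypothetical mapping from a source word to the set of target
    words accepted as its translation. Words missing from the lexicon are
    skipped rather than guessed.
    """
    source_counts = Counter(source_tokens)
    target_counts = Counter(target_tokens)
    under, over = [], []
    for word, n_source in source_counts.items():
        candidates = lexicon.get(word)
        if not candidates:
            continue
        n_target = sum(target_counts[t] for t in candidates)
        if n_target < n_source:
            under.append(word)   # some occurrence was left untranslated
        elif n_target > n_source:
            over.append(word)    # translated more often than it occurs
    return under, over


LEXICON = {"i": {"我"}, "like": {"喜欢"}, "green": {"绿"}, "tea": {"茶"}}
under, over = detect_problems(
    ["i", "like", "green", "tea"],
    ["我", "喜欢", "茶", "茶"],
    LEXICON,
)
```

Because the check reports the offending source words, it also gives a coarse form of problem localization.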
Collaborative Work with Tencent
Wang, Zheng, Liu, Zhang, Zeng, Deng, Yang, He, Xie. Detecting Failures of Neural Machine Translation in the Absence of Reference Translations. DSN 2019 Industry Track
Deployment of Translation Quality Assurance
● Adopted to improve the WeChat translation service (over 1 billion users;
12 million translation tasks served online)
○ Offline monitoring (regression testing)
○ Online monitoring (real time selection of best model)
● Large scale test data for translation
○ ~130K English/180K Chinese words/phrases
○ Detects numerous problems in other vendors' services as well
[Figures: BLEU score improvement; % problems reduction; problem cases in other translation services]
Neural Network Architecture → Training → Neural Network Model
Existing work on NN model: testing, verification, bug detection, …
Existing work on NN architecture: ?
Why Neural Network Architecture?
1. Bugs at model level are difficult to fix
   Data + Code (Architecture) → Training (Magic) → NN Model
2. Bugs in architectures may cause failures in training
   Training can take hours, days, weeks, months, …
3. Quality assurance needs to be provided for architectures
   An architecture vendor → many developers; an NN architecture → many NN models → software systems
Numerical Bugs
Bugs leading to errors in numerical operations, such as “NaN”,
“INF”, or crashes during training or inference.
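An Example of Numerical Bugs
…
y_softmax = tf.nn.softmax(h_fc)
cross_entropy = y_ * tf.log(y_softmax)
…
Computation graph: h_fc → Softmax → Log → Mul ← y_
h_fc ∈ [-100,100]
y_softmax ∈ [0,1]
Because y_softmax can reach 0, tf.log(y_softmax) can evaluate to -inf, and multiplying it by y_ can produce NaN.
We can use static analysis to infer
the range of tensors.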
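The failure can be reproduced without TensorFlow. The sketch below emulates float32 arithmetic with the standard library; the helper names (`to_float32`, `softmax32`, `ieee_log`) are hypothetical, and clipping the probabilities before the logarithm is shown as one common mitigation, not necessarily the repair applied by the developers.

```python
import math
import struct


def to_float32(value):
    """Round a Python float (float64) to float32, a common tensor precision."""
    return struct.unpack("f", struct.pack("f", value))[0]


def softmax32(logits):
    """Softmax with float32 rounding applied to each intermediate value."""
    peak = max(logits)
    exps = [to_float32(math.exp(v - peak)) for v in logits]
    total = sum(exps)
    return [to_float32(e / total) for e in exps]


def ieee_log(value):
    """Logarithm that returns -inf at 0, as tensor libraries do
    (math.log would raise ValueError instead)."""
    return -math.inf if value == 0.0 else math.log(value)


def cross_entropy_terms(labels, probs, clip_min=None):
    """Element-wise y_ * log(y_softmax), optionally clipping probs first."""
    if clip_min is not None:
        probs = [max(p, clip_min) for p in probs]
    return [y * ieee_log(p) for y, p in zip(labels, probs)]


h_fc = [100.0, -100.0]  # within the range [-100, 100] shown on the slide
y_ = [1.0, 0.0]         # one-hot label
y_softmax = softmax32(h_fc)  # exp(-200) underflows to 0 in float32
buggy = cross_entropy_terms(y_, y_softmax)                  # 0 * -inf = NaN
fixed = cross_entropy_terms(y_, y_softmax, clip_min=1e-10)  # finite
```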
Detecting Numerical Bugs in Neural Network Architectures
NN Architecture → Computation Graph → Static Analysis → Check Unsafe Operations (Log, Exp, …)
Zhang, Ren, Chen, Xiong, Cheung, Xie. Detecting Numerical Bugs in Neural Network Architectures. ESEC/FSE’20.
ACM SIGSOFT Distinguished Paper Award
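The pipeline above can be approximated with a small interval analysis over a linear computation graph. This is a simplified sketch and not the tool from the paper: the graph encoding, the `analyze` function, and the abstract transfer functions are assumptions made for illustration, and only three operations are modeled.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    lo: float
    hi: float


def abstract_clip(x, lo, hi):
    """Clamp both ends of the interval into [lo, hi]."""
    def clamp(v):
        return min(max(v, lo), hi)
    return Interval(clamp(x.lo), clamp(x.hi))


def analyze(graph, input_ranges):
    """Propagate value ranges through the graph and report unsafe log calls.

    graph: list of (output_name, op, input_name, args) in execution order.
    """
    ranges = dict(input_ranges)
    warnings = []
    for out, op, inp, args in graph:
        x = ranges[inp]
        if op == "softmax":
            # Sound over-approximation: every softmax output lies in [0, 1].
            ranges[out] = Interval(0.0, 1.0)
        elif op == "clip":
            ranges[out] = abstract_clip(x, *args)
        elif op == "log":
            if x.lo <= 0.0:
                warnings.append(f"log({inp}) may receive a value <= 0")
            lo = math.log(x.lo) if x.lo > 0.0 else -math.inf
            hi = math.log(x.hi) if x.hi > 0.0 else -math.inf
            ranges[out] = Interval(lo, hi)
        else:
            raise ValueError(f"unsupported operation: {op}")
    return warnings


INPUTS = {"h_fc": Interval(-100.0, 100.0)}
buggy_graph = [
    ("y_softmax", "softmax", "h_fc", ()),
    ("log_y", "log", "y_softmax", ()),
]
fixed_graph = [
    ("y_softmax", "softmax", "h_fc", ()),
    ("y_clipped", "clip", "y_softmax", (1e-10, 1.0)),
    ("log_y", "log", "y_clipped", ()),
]
buggy_warnings = analyze(buggy_graph, INPUTS)
fixed_warnings = analyze(fixed_graph, INPUTS)
```

The analysis never runs the network: it flags the unclipped `log` purely from the inferred range, which is why such checks can be applied before any training starts.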
Bugs Detected in Real-World Architectures
• Found 11 buggy statements in the code repository
• Submitted pull requests; 3 buggy statements have been repaired by the developers
Open Topics in Intelligence Software Engineering (ISE)
• How to solicit and specify requirements for intelligence software?
• How to tackle the complexity of integrating intelligence software with the rest of the
software system?
• How to define test oracles (or properties) for intelligence software?
• How to design high-quality library/framework APIs for developing intelligence
software?
• How to transfer ISE research results into industrial/open source practice?
• …
(SE ↔ AI) → Practice Impact
[Figure: problem domain, solution domain, and practice, linking Intelligent Software Engineering and Intelligence Software Engineering]
Thank You!
Q & A
Tao Xie
Peking University
taoxie@pku.edu.cn
https://taoxiease.github.io/

More Related Content

PDF
SETTA'18 Keynote: Intelligent Software Engineering: Synergy between AI and So...
PDF
Software Analytics: Data Analytics for Software Engineering and Security
PDF
Planning and Executing Practice-Impactful Research
PDF
MSRA 2018: Intelligent Software Engineering: Synergy between AI and Software ...
PDF
Intelligent Software Engineering: Synergy between AI and Software Engineering...
PPTX
Intelligent Software Engineering: Synergy between AI and Software Engineering
PDF
ISEC'18 Keynote: Intelligent Software Engineering: Synergy between AI and Sof...
PPTX
ACM Chicago March 2019 meeting: Software Engineering and AI - Prof. Tao Xie, ...
SETTA'18 Keynote: Intelligent Software Engineering: Synergy between AI and So...
Software Analytics: Data Analytics for Software Engineering and Security
Planning and Executing Practice-Impactful Research
MSRA 2018: Intelligent Software Engineering: Synergy between AI and Software ...
Intelligent Software Engineering: Synergy between AI and Software Engineering...
Intelligent Software Engineering: Synergy between AI and Software Engineering
ISEC'18 Keynote: Intelligent Software Engineering: Synergy between AI and Sof...
ACM Chicago March 2019 meeting: Software Engineering and AI - Prof. Tao Xie, ...

What's hot (20)

PDF
ISEC'18 Tutorial: Research Methodology on Pursuing Impact-Driven Research
PDF
Software Analytics: Data Analytics for Software Engineering
PPTX
Machine learning testing survey, landscapes and horizons, the Cliff Notes
PPTX
Transferring Software Testing Tools to Practice
PPTX
Towards Mining Software Repositories Research that Matters
PPTX
Impact-Driven Research on Software Engineering Tooling
PDF
Measuring Agile Software Development
PPTX
An introduction to AI in Test Engineering
PDF
Se research update
PPTX
Software Engineering for ML/AI, keynote at FAS*/ICAC/SASO 2019
PDF
TechEvent 2019: Artificial Intelligence in Dev & Ops; Martin Luckow - Trivadis
PDF
Speeding-up Software Testing With Computational Intelligence
PDF
Adaptation of the technology of the static code analyzer for developing paral...
PDF
Transferring Software Testing Tools to Practice (AST 2017 Keynote)
PDF
Implications of Open Source Software Use (or Let's Talk Open Source)
PDF
2015 2016 ieee matlab project titles
PDF
Opinion Mining for Software Engineering
PPTX
It Does What You Say, Not What You Mean: Lessons From A Decade of Program Repair
PDF
PhD Welcome Day 2014
PDF
VST2022.pdf
ISEC'18 Tutorial: Research Methodology on Pursuing Impact-Driven Research
Software Analytics: Data Analytics for Software Engineering
Machine learning testing survey, landscapes and horizons, the Cliff Notes
Transferring Software Testing Tools to Practice
Towards Mining Software Repositories Research that Matters
Impact-Driven Research on Software Engineering Tooling
Measuring Agile Software Development
An introduction to AI in Test Engineering
Se research update
Software Engineering for ML/AI, keynote at FAS*/ICAC/SASO 2019
TechEvent 2019: Artificial Intelligence in Dev & Ops; Martin Luckow - Trivadis
Speeding-up Software Testing With Computational Intelligence
Adaptation of the technology of the static code analyzer for developing paral...
Transferring Software Testing Tools to Practice (AST 2017 Keynote)
Implications of Open Source Software Use (or Let's Talk Open Source)
2015 2016 ieee matlab project titles
Opinion Mining for Software Engineering
It Does What You Say, Not What You Mean: Lessons From A Decade of Program Repair
PhD Welcome Day 2014
VST2022.pdf
Ad

Similar to DSML 2021 Keynote: Intelligent Software Engineering: Working at the Intersection of AI and Software Engineering (20)

PDF
Testing Machine Learning-enabled Systems: A Personal Perspective
PDF
Racing with Artificial Intelligence
PDF
2020 09-16-ai-engineering challanges
PDF
Requirements in Engineering AI- Enabled Systems: Open Problems and Safe AI Sy...
PDF
Adobe Audition Crack FRESH Version 2025 FREE
PDF
20181212 ibm aot
PDF
Alison Lowndes, Artificial Intelligence DevRel, Nvidia – Fueling the Artifici...
PDF
Solutions for ADAS and AI data engineering using OpenPOWER/POWER systems
PDF
01 - Course setup software sustainability
PDF
IRJET- Self Driving Car using Deep Q-Learning
PDF
Applications of Search-based Software Testing to Trustworthy Artificial Intel...
PDF
AI in the Financial Services Industry
PDF
IRJET - Autonomous Navigation System using Deep Learning
PDF
DriveBuild: Automation of Tests in the Field of Autonomous Cars
PPTX
Intelligent Software Engineering: Synergy between AI and Software Engineering
PDF
Automated Testing and Safety Analysis of Deep Neural Networks
PDF
Autonomous Systems: How to Address the Dilemma between Autonomy and Safety
PPTX
Sundance's presentation at B:RAI 2020
PDF
Leveraging Artificial Intelligence Processing on Edge Devices
 
PDF
Reinforcement Learning with Sagemaker, DeepRacer and Robomaker
Testing Machine Learning-enabled Systems: A Personal Perspective
Racing with Artificial Intelligence
2020 09-16-ai-engineering challanges
Requirements in Engineering AI- Enabled Systems: Open Problems and Safe AI Sy...
Adobe Audition Crack FRESH Version 2025 FREE
20181212 ibm aot
Alison Lowndes, Artificial Intelligence DevRel, Nvidia – Fueling the Artifici...
Solutions for ADAS and AI data engineering using OpenPOWER/POWER systems
01 - Course setup software sustainability
IRJET- Self Driving Car using Deep Q-Learning
Applications of Search-based Software Testing to Trustworthy Artificial Intel...
AI in the Financial Services Industry
IRJET - Autonomous Navigation System using Deep Learning
DriveBuild: Automation of Tests in the Field of Autonomous Cars
Intelligent Software Engineering: Synergy between AI and Software Engineering
Automated Testing and Safety Analysis of Deep Neural Networks
Autonomous Systems: How to Address the Dilemma between Autonomy and Safety
Sundance's presentation at B:RAI 2020
Leveraging Artificial Intelligence Processing on Edge Devices
 
Reinforcement Learning with Sagemaker, DeepRacer and Robomaker
Ad

More from Tao Xie (16)

PDF
MSR 2022 Foundational Contribution Award Talk: Software Analytics: Reflection...
PDF
Diversity and Computing/Engineering: Perspectives from Allies
PPTX
Advances in Unit Testing: Theory and Practice
PDF
Common Technical Writing Issues
PPTX
HotSoS16 Tutorial "Text Analytics for Security" by Tao Xie and William Enck
PPTX
Transferring Software Testing and Analytics Tools to Practice
PDF
User Expectations in Mobile App Security
PDF
Software Analytics - Achievements and Challenges
PDF
Software Mining and Software Datasets
PPTX
Next Generation Developer Testing: Parameterized Testing
PPTX
Csise15 codehunt
PDF
Text Analytics for Security
PPTX
Gamifying Teaching and Learning of Software Engineering and Programming
PDF
Tutorial: Text Analytics for Security
PPTX
Software Analytics: Towards Software Mining that Matters (2014)
PPTX
Teaching and Learning Programming and Software Engineering via Interactive Ga...
MSR 2022 Foundational Contribution Award Talk: Software Analytics: Reflection...
Diversity and Computing/Engineering: Perspectives from Allies
Advances in Unit Testing: Theory and Practice
Common Technical Writing Issues
HotSoS16 Tutorial "Text Analytics for Security" by Tao Xie and William Enck
Transferring Software Testing and Analytics Tools to Practice
User Expectations in Mobile App Security
Software Analytics - Achievements and Challenges
Software Mining and Software Datasets
Next Generation Developer Testing: Parameterized Testing
Csise15 codehunt
Text Analytics for Security
Gamifying Teaching and Learning of Software Engineering and Programming
Tutorial: Text Analytics for Security
Software Analytics: Towards Software Mining that Matters (2014)
Teaching and Learning Programming and Software Engineering via Interactive Ga...

Recently uploaded (20)

PPTX
Comprehensive Guide to Digital Image Processing Concepts and Applications
PDF
Coding with GPT-5- What’s New in GPT 5 That Benefits Developers.pdf
PPTX
UNIT II: Software design, software .pptx
PPTX
Chapter 1 - Transaction Processing and Mgt.pptx
PDF
Mobile App for Guard Tour and Reporting.pdf
PDF
Streamlining Project Management in Microsoft Project, Planner, and Teams with...
PDF
Top 10 Project Management Software for Small Teams in 2025.pdf
PPTX
Chapter_05_System Modeling for software engineering
PPTX
Human-Computer Interaction for Lecture 1
PDF
SOFTWARE ENGINEERING Software Engineering (3rd Edition) by K.K. Aggarwal & Yo...
PDF
Building an Inclusive Web Accessibility Made Simple with Accessibility Analyzer
PDF
IT Consulting Services to Secure Future Growth
PDF
Engineering Document Management System (EDMS)
PDF
Cloud Native Aachen Meetup - Aug 21, 2025
PPTX
Bandicam Screen Recorder 8.2.1 Build 2529 Crack
PPTX
Human-Computer Interaction for Lecture 2
PDF
Understanding the Need for Systemic Change in Open Source Through Intersectio...
PDF
Mobile App Backend Development with WordPress REST API: The Complete eBook
PDF
WhatsApp Chatbots The Key to Scalable Customer Support.pdf
PPTX
Post-Migration Optimization Playbook: Getting the Most Out of Your New Adobe ...
Comprehensive Guide to Digital Image Processing Concepts and Applications
Coding with GPT-5- What’s New in GPT 5 That Benefits Developers.pdf
UNIT II: Software design, software .pptx
Chapter 1 - Transaction Processing and Mgt.pptx
Mobile App for Guard Tour and Reporting.pdf
Streamlining Project Management in Microsoft Project, Planner, and Teams with...
Top 10 Project Management Software for Small Teams in 2025.pdf
Chapter_05_System Modeling for software engineering
Human-Computer Interaction for Lecture 1
SOFTWARE ENGINEERING Software Engineering (3rd Edition) by K.K. Aggarwal & Yo...
Building an Inclusive Web Accessibility Made Simple with Accessibility Analyzer
IT Consulting Services to Secure Future Growth
Engineering Document Management System (EDMS)
Cloud Native Aachen Meetup - Aug 21, 2025
Bandicam Screen Recorder 8.2.1 Build 2529 Crack
Human-Computer Interaction for Lecture 2
Understanding the Need for Systemic Change in Open Source Through Intersectio...
Mobile App Backend Development with WordPress REST API: The Complete eBook
WhatsApp Chatbots The Key to Scalable Customer Support.pdf
Post-Migration Optimization Playbook: Getting the Most Out of Your New Adobe ...

DSML 2021 Keynote: Intelligent Software Engineering: Working at the Intersection of AI and Software Engineering

  • 1. Intelligent Software Engineering: Working at the Intersection of AI and Software Engineering Tao Xie Department of Computer Science and Technology Peking University
  • 2. Artificial Intelligence  Software Engineering Artificial Intelligence Software Engineering Intelligent Software Engineering Intelligence Software Engineering
  • 3. Speakers SIGSOFT Webinar: Intelligent Software Engineering: Synergy between AI and Software Engineering (Feb 21, 2019)
  • 4. What can AI do for software engineering, and how can we as software engineers design and build better AI systems? As AI continues to disrupt many fields from agriculture to manufacturing, it’s important to explore the essential connections between AI and software engineering. 1. The creation of AI software (How do we architect, build, maintain, deploy, test, and verify AI software?) 2. The application of AI to software engineering (How can AI help software engineers better do their jobs and advance the state of the practice?) 3. AI and software engineering in use (How have applications blended AI and software engineering so far?) 4. The AI landscape and its effect on software engineering (How do related topics such as AI technology investment, ethics, data collection, and security affect the work of software developers?)
  • 6. Artificial Intelligence  Software Engineering Artificial Intelligence Software Engineering Intelligent Software Engineering Intelligence Software Engineering
  • 7. Figure created by Christian Kaestner, taken from https://ckaestne.github.io/seai/S2020/
  • 8. Figure created by Christian Kaestner, taken from https://ckaestne.github.io/seai/S2020/
  • 9. Challenges in Autonomics: “In search of a foundation for next generation autonomous systems” • How to specify autonomous system behavior in the face of unpredictability; • How to carry out faithful analysis of system behavior with respect to rich environments that include humans, physical artifacts, and other systems? • How to build such systems by combining executable modeling techniques from software engineering with AI and ML? Harel, Marron, Sifakis. Autonomics: In search of a foundation for next-generation autonomous systems. Proc. Natl. Acad. Sci. USA 2020
  • 10. Sample Requirements in Autonomous Driving System (ADS) • Stability: ADS must assure stable control and avoid dangerous actions for the vehicle • R1.1: ADS must avoid impossible steering angles • Safety: ADS must avoid collision with moving or static objects along the path • R2.1: ADS must keep a safe distance from other objects • Compliance: the ADS must respect the traffic regulations enforced by law in a geographical area • R3.1: the velocity of the vehicle should be less than the speed limit • R3.2: the vehicle should not run the red light • R3.3: the vehicle should stay in the correct lane • Comfort: the planned trajectory should be comfortable for the passenger • R4.1: the vehicle’s velocity should not change too much • R4.2: the vehicle’s acceleration should not change too much Czarnecki, “Automated Driving System (ADS) High–Level Quality 66 Requirements Analysis– Driving Behavior Safety,” Univ Waterloo, 2018. Tuncali, Fainekos, Prokhorov, Ito, Kapinski, “Requirements-driven test generation for autonomous vehicles with machine learning components,” IEEE Transactions on Intelligent Vehicles, 2019.
  • 11. Microsoft's Teen Chatbot Tay Turned into Genocidal Racist (2016 March 23/24) http://www.businessinsider.com/ai-expert-explains-why-microsofts-tay-chatbot-is-so-racist-2016-3 "There are a number of precautionary steps they [Microsoft] could have taken. It wouldn't have been too hard to create a blacklist of terms; or narrow the scope of replies. They could also have simply manually moderated Tay for the first few days, even if that had meant slower responses." “businesses and other AI developers will need to give more thought to the protocols they design for testing and training AIs like Tay.”
  • 12. Figure created by Christian Kaestner, taken from https://ckaestne.github.io/seai/S2020/
  • 13. Adversarial Machine Learning/Testing ● Generate adversarial examples derived from legitimate examples with slight modification (imperceptible to human) to induce misclassification 13 Turn right Go straight Slight modification Tian et al. DeepTest: Automated Testing of Deep-Neural-Network-driven Autonomous Cars. ICSE 2018
  • 14. Example Detected Erroneous Behaviors [Figure: image pairs with predicted steering labels: “Turn right” / “Go straight” and “Go straight” / “Turn left”] Pei et al. DeepXplore: Automated Whitebox Testing of Deep Learning Systems. SOSP 2017. Tian et al. DeepTest: Automated Testing of Deep-Neural-Network-driven Autonomous Cars. ICSE 2018.
  • 15. Example Detected Erroneous Behaviors [Figure: image pairs with predicted steering labels: “Turn right” / “Go straight” and “Go straight” / “Turn left”] Pei et al. DeepXplore: Automated Whitebox Testing of Deep Learning Systems. SOSP 2017. Tian et al. DeepTest: Automated Testing of Deep-Neural-Network-driven Autonomous Cars. ICSE 2018. Lu et al. NO Need to Worry about Adversarial Examples in Object Detection in Autonomous Vehicles. CVPR’17. https://arxiv.org/abs/1707.03501
  • 16. Example Detected Erroneous Behaviors [Figure: image pairs with predicted steering labels: “Turn right” / “Go straight” and “Go straight” / “Turn left”] Pei et al. DeepXplore: Automated Whitebox Testing of Deep Learning Systems. SOSP 2017. Tian et al. DeepTest: Automated Testing of Deep-Neural-Network-driven Autonomous Cars. ICSE 2018. Zhou et al. DeepBillboard: Systematic Physical-World Testing of Autonomous Driving Systems. ICSE 2020.
  • 17. Robustness Certification for Deep Learning Models: Transformation-Specific Smoothing-based robustness certification
      • A general robustness certification framework against various semantic transformations.
      • A range of transformation-specific smoothing protocols and techniques that provide substantially better certified robustness bounds than state-of-the-art approaches on large-scale datasets.
      Li, Weber, Xu, Rimanic, Kailkhura, Xie, Zhang, Li. TSS: Transformation-Specific Smoothing for Robustness Certification. CCS 2021.
  • 18. Figure created by Christian Kaestner, taken from https://ckaestne.github.io/seai/S2020/
  • 19. DL Software Development and Deployment
      • DL software development: training data, DL programs, and DL frameworks yield trained DL models [ISSTA’18, ISSRE’19, ESEC/FSE’19, ICSE’20]
  • 20. DL Software Development and Deployment
      • DL software development: training data, DL programs, and DL frameworks yield trained DL models [ISSTA’18, ISSRE’19, ESEC/FSE’19, ICSE’20]
      • DL software deployment: trained DL models are converted and deployed with deployment-related frameworks to server/cloud platforms (TF Serving), mobile platforms (Core ML, TF Lite), and browser platforms (TF.js)
      • Knowledge gap!
  • 21. Research Questions
      • RQ1: Popularity of DL software deployment
      • RQ2: Difficulty level of DL software deployment
      • RQ3: Challenges in DL software deployment
      Chen, Cao, Liu, Wang, Xie, Liu. A Comprehensive Study on Challenges in Deploying Deep Learning Based Software. ESEC/FSE 2020.
  • 22. Collect Relevant Questions from Stack Overflow (SO)
      • Server/cloud deployment: TF Serving, Google Cloud AI, Amazon SageMaker
      • Mobile deployment: Core ML, TF Lite
      • Browser deployment: TF.js
      • Extraction and filtering, then identification of relevant questions
      • Question counts by platform: cloud/server 1,325; mobile 1,533; browser 165; in total 3,023
  • 23. RQ1: Popularity [Figures: trend of users; trend of questions] DL software deployment is gaining increasing attention, demonstrating the timeliness and urgency of this study.
  • 24. RQ2: Difficulty Level
      • Metrics: % of questions with no accepted answer (%no acc.); time needed to receive an accepted answer (acc. time). A high %no acc. and a long acc. time indicate higher difficulty; a low %no acc. and a short acc. time indicate lower difficulty.
      • Results by topic (%no acc., median acc. time):
          • DL software deployment: 70.7%, 405 min
          • Other aspects of DL software: 62.7%, 146 min
          • Big data [ESEC/FSE’19]: 60.5%, 198 min
          • Concurrency [ESEM’18]: 43.8%, 42 min
          • Mobile [EMSE’16]: 55.0%, 55 min
      • Questions related to DL software deployment are difficult to resolve, motivating us to identify the challenges behind it.
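The two difficulty metrics above can be computed from per-question data roughly as follows. This is a minimal sketch: the input format (minutes until an answer was accepted, or `None` when no answer was accepted) and the `difficulty` helper are assumptions for illustration, not the study's actual tooling, and the sample values are made up.

```python
from statistics import median
from typing import List, Optional, Tuple


def difficulty(acc_minutes: List[Optional[float]]) -> Tuple[float, float]:
    """Return (%no acc., median acc. time in minutes) for a set of questions.

    Each entry is the time until an answer was accepted, or None if the
    question has no accepted answer.
    """
    answered = [m for m in acc_minutes if m is not None]
    pct_no_acc = 100.0 * (len(acc_minutes) - len(answered)) / len(acc_minutes)
    return pct_no_acc, median(answered)


# Made-up sample: five questions, two of them without an accepted answer.
pct_no_acc, median_time = difficulty([None, 30.0, None, 400.0, 90.0])
```

The median is taken only over questions that did receive an accepted answer, so the two metrics capture different aspects of difficulty.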
  • 25. RQ3: Challenges. A wide spectrum of challenges for each of the three platforms, organized as taxonomies with 72 categories of challenges.
      • Server/Cloud (100%)
          • General Questions (16.7%): Entire procedure of deployment (9.7%); Conceptual questions (4.4%); Limitations of platforms / frameworks (2.6%)
          • Model Export (15.0%): Procedure (4.4%); Specification of model information (6.2%); Export of unsupported models (3.1%); Selection / usage of APIs (0.9%); Model quantization (0.4%)
          • Data Processing (19.8%): Procedure (1.3%); Setting size / shape of input data (1.8%); Setting format / datatype of input data (8.8%); Parsing output (4.8%); Migrating pre-processing (3.1%)
          • Environment (19.4%): Installing / building frameworks (7.5%); Avoiding version incompatibility (4.0%); Configuration of environment variables (7.9%)
          • Request (13.3%): Procedure (0.9%); Setting request parameters / body (8.4%); Batching request (2.2%); Getting information of exposed model (1.8%)
          • Serving (13.2%): Procedure (0.4%); Authenticating client (1.8%); Parsing request (1.3%); Model loading (3.1%); Configuration of batching (2.6%); Serving multiple models simultaneously (3.5%); Bidirectional streaming (0.4%)
          • Model Update (2.6%)
      • Mobile (100%)
          • General Questions (18.6%): Entire procedure of deployment (13.4%); Conceptual questions (4.8%); Limitations of frameworks (0.4%)
          • Model Conversion (26.5%): Procedure (3.9%); Saving models (1.3%); Conversion of unsupported models (6.1%); Model quantization (4.8%); Specification of model information (8.2%); Selection / usage of APIs (0.9%); Parsing converted models (1.3%)
          • DL Integration into Projects (21.2%): Procedure (0.9%); Importing / loading models (4.3%); Build configuration (3.9%); Configuration of input / output information (8.2%); Avoiding version incompatibility (1.7%); Thread management (2.2%)
          • Data Processing (16.9%): Procedure (1.7%); Setting size / shape of input data (3.0%); Setting format / datatype of input data (5.2%); Parsing output (2.2%); Migrating pre-processing (4.8%)
          • DL Library Compilation (7.8%): Procedure (1.7%); Usage of prebuilt libraries (0.4%); Register of unsupported operators (3.0%); Build configuration (2.6%)
          • Inference Speed (3.9%)
          • Model Update (3.0%)
          • Data Extraction (1.7%)
          • Model Security (0.4%)
      • Browser (100%)
          • General Questions (5.6%): Entire procedure of deployment (3.2%); Limitations of frameworks (2.4%)
          • Model Loading (24.0%): Procedure (4.8%); Loading from local storage (8.0%); Loading from a Http endpoint (2.4%); Asynchronous loading (5.6%); Selection / usage of APIs (2.4%); Improving loading speed (0.8%)
          • Environment (19.2%): Importing libraries (10.4%); Avoiding version incompatibility (8.8%)
          • Model Conversion (18.4%): Procedure (3.2%); Specification of model information (5.6%); Conversion of unsupported models (4.0%); Selection / usage of APIs (2.4%); Saving models (3.2%)
          • Data Processing (18.4%): Procedure (1.6%); Data Loading (4.0%); Setting size / shape of input data (5.6%); Setting format / datatype of input data (4.8%); Migrating pre-processing (2.4%)
          • Inference Speed (7.2%)
          • Data Extraction (3.2%)
          • Model Security (2.4%)
          • Model Update (1.6%)
  • 26. Common Challenges across Three Platforms
      • Model Conversion (cloud/server 15.0%, mobile 26.5%, browser 18.4%): unsupported models; specification of model information; selection / usage of APIs; model quantization
      • Data Processing (cloud/server 19.8%, mobile 16.9%, browser 18.4%): setting size / shape / format / datatype of input data; migrating pre-processing; parsing output
  • 27. Unique Challenges in Client Platforms (Mobile and Browser)
      • Model Security (server/cloud 0.0%, mobile 0.4%, browser 2.4%): models on client platforms are easier to obtain than those on server/cloud platforms
      • Inference Speed (server/cloud 0.0%, mobile 3.9%, browser 7.2%): client platforms have weaker computing power than server/cloud platforms
  • 28. Summary of Challenges in DL Software Deployment
      • RQ1 (Popularity): DL software deployment is gaining increasing attention.
      • RQ2 (Difficulty Level): Questions about DL software deployment are difficult to resolve.
      • RQ3 (Challenges): We build a taxonomy consisting of 72 categories, linked to challenges in deploying DL software on server/cloud, mobile, and browser platforms.
      Chen, Cao, Liu, Wang, Xie, Liu. A Comprehensive Study on Challenges in Deploying Deep Learning Based Software. ESEC/FSE 2020.
  • 29. Figure created by Christian Kaestner, taken from https://ckaestne.github.io/seai/S2020/
  • 30. Neural Machine Translation (screen snapshot captured on April 5, 2018)
      • Overall better than statistical machine translation
      • Worse controllability
      • Existing translation quality assurance → needs a reference translation, so it is not applicable online
  • 31. Translation Quality Assurance (collaborative work with Tencent)
      ● Key idea: black-box algorithms specialized for common problems
          ○ No need for a reference translation; need only the original sentence and the generated translation
          ○ Precise problem localization
      ● Common problems
          ○ Under-translation
          ○ Over-translation
      Wang, Zheng, Liu, Zhang, Zeng, Deng, Yang, He, Xie. Detecting Failures of Neural Machine Translation in the Absence of Reference Translations. DSN 2019 Industry Track.
  • 32. Deployment of Translation Quality Assurance (collaborative work with Tencent)
      ● Adopted to improve the WeChat translation service (over 1 billion users, online serving 12 million translation tasks)
          ○ Offline monitoring (regression testing)
          ○ Online monitoring (real-time selection of the best model)
      ● Large-scale test data for translation
          ○ ~130K English / 180K Chinese words/phrases
          ○ Detected numerous problems in other vendors' services as well
      [Figures: BLEU score improvement; % problems reduction; problem cases in other translation services]
      Wang, Zheng, Liu, Zhang, Zeng, Deng, Yang, He, Xie. Detecting Failures of Neural Machine Translation in the Absence of Reference Translations. DSN 2019 Industry Track.
  • 33. Figure created by Christian Kaestner, taken from https://ckaestne.github.io/seai/S2020/
  • 34. Neural Network Architecture [Diagram: Neural Network Architecture → Training → Neural Network Model] Existing work on NN models: testing, verification, bug detection, … Work on NN architectures: ?
  • 35. Why Neural Network Architecture?
      • 1. Bugs at the model level are difficult to fix
      • 2. Bugs in architectures may cause failures in training
      • 3. Quality assurance needs to be provided for architectures
      [Diagram: Data and Code (Architecture) go through Training (“Magic”; hours, days, weeks, months, …) to produce an NN Model. An architecture vendor provides an NN architecture to many developers, who train many NN models used in software systems.]
  • 36. Numerical Bugs: bugs leading to errors in numerical operations, such as “NaN”, “INF”, or crashes during training or inference.
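  • 37. An Example of Numerical Bugs
      • Code: … y_softmax = tf.nn.softmax(h_fc); cross_entropy = y_ * tf.log(y_softmax) …
      • Computation graph: h_fc → Softmax → y_softmax → Log → Mul (with y_)
      • Inferred ranges: h_fc ∈ [-100,100], y_softmax ∈ [0,1]. Because y_softmax can reach 0, tf.log(y_softmax) can evaluate log(0).
      • We can use static analysis to infer the range of tensors.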
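The failure mode on this slide can be reproduced without TensorFlow. The sketch below mirrors the two statements in NumPy with float32 values; the concrete inputs are chosen for illustration, and the clipping in the last step is one common mitigation that is assumed here, not something the slide prescribes.

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    # Numerically stable softmax: subtract the maximum before exponentiating.
    exps = np.exp(logits - np.max(logits))
    return exps / np.sum(exps)


h_fc = np.array([100.0, -100.0], dtype=np.float32)  # within [-100, 100]
y_ = np.array([1.0, 0.0], dtype=np.float32)         # one-hot label

y_softmax = softmax(h_fc)  # exp(-200) underflows to 0 in float32

with np.errstate(divide="ignore", invalid="ignore"):
    # Same shape as the slide's statement: log(0) = -inf, and 0 * -inf = NaN.
    unsafe = y_ * np.log(y_softmax)

# Assumed mitigation: keep the argument of log away from 0.
safe = y_ * np.log(np.clip(y_softmax, 1e-10, 1.0))
```

Once a NaN appears in the loss, it propagates through later arithmetic, which is why such bugs can surface as failed training runs.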
  • 38. Detecting Numerical Bugs in Neural Network Architectures [Pipeline: NN Architecture → Computation Graph → Static Analysis → Check Unsafe Operations (Log, Exp, …)] Zhang, Ren, Chen, Xiong, Cheung, Xie. Detecting Numerical Bugs in Neural Network Architectures. ESEC/FSE’20. ACM SIGSOFT Distinguished Paper Award.
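A toy version of the range-based check might look like the following. It is a minimal interval-analysis sketch written for this transcript, not the cited tool: the `Interval` class, the coarse `softmax_range` rule, and the float32 overflow bound used for `exp` are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass

FLOAT32_MAX = 3.4028235e38  # largest finite float32 value


@dataclass(frozen=True)
class Interval:
    lo: float
    hi: float


def softmax_range(_x: Interval) -> Interval:
    # Sound but coarse: softmax outputs always lie in [0, 1].
    return Interval(0.0, 1.0)


def log_is_safe(x: Interval) -> bool:
    # log is unsafe if its argument may be zero or negative.
    return x.lo > 0.0


def exp_is_safe(x: Interval) -> bool:
    # exp overflows float32 once its argument exceeds log(FLOAT32_MAX).
    return x.hi <= math.log(FLOAT32_MAX)


h_fc = Interval(-100.0, 100.0)     # range taken from the example slide
y_softmax = softmax_range(h_fc)    # inferred range [0, 1]

warnings = []
if not log_is_safe(y_softmax):
    warnings.append("Log: argument may be 0")
if not exp_is_safe(h_fc):
    warnings.append("Exp: argument may overflow float32")
```

For the example slide's graph, the `Log` check fires because the inferred range of `y_softmax` includes 0. The `Exp` check is shown only to illustrate a second unsafe-operation rule.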
  • 39. Bugs Detected in Real-World Architectures
      • Found 11 buggy statements in the code repository
      • Submitted pull requests; 3 buggy statements have been repaired by the developers
      Zhang, Ren, Chen, Xiong, Cheung, Xie. Detecting Numerical Bugs in Neural Network Architectures. ESEC/FSE’20. ACM SIGSOFT Distinguished Paper Award.
  • 40. Open Topics in Intelligence Software Engineering (ISE)
      • How to solicit and specify requirements for intelligence software?
      • How to tackle the complexity of integrating intelligence software with the rest of the software system?
      • How to define test oracles (or properties) for intelligence software?
      • How to design high-quality library/framework APIs for developing intelligence software?
      • How to transfer ISE research results into industrial/open source practice?
      • …
  • 41. (SE ⇔ AI) → Practice Impact [Diagram: Problem Domain, Solution Domain, Practice; Intelligent Software Engineering; Intelligence Software Engineering]
  • 42. Thank You! Q & A. Tao Xie, Peking University, https://taoxiease.github.io/