DSML 2021 Keynote: Intelligent Software Engineering: Working at the Intersection of AI and Software Engineering

Intelligent Software Engineering:
Working at the Intersection of AI and
Software Engineering
Tao Xie
Department of Computer Science and Technology
Peking University

[Diagram: Artificial Intelligence and Software Engineering, connected by Intelligent Software Engineering and Intelligence Software Engineering]

SIGSOFT Webinar: Intelligent Software Engineering: Synergy between AI and Software Engineering (Feb 21, 2019)

What can AI do for software engineering, and how can we as software
engineers design and build better AI systems? As AI continues to disrupt
many fields from agriculture to manufacturing, it’s important to explore
the essential connections between AI and software engineering.
1. The creation of AI software (How do we architect, build, maintain,
deploy, test, and verify AI software?)
2. The application of AI to software engineering (How can AI help
software engineers better do their jobs and advance the state of the
practice?)
3. AI and software engineering in use (How have applications blended
AI and software engineering so far?)
4. The AI landscape and its effect on software engineering (How do
related topics such as AI technology investment, ethics, data collection,
and security affect the work of software developers?)

Figure created by Christian Kaestner, taken from https://ckaestne.github.io/seai/S2020/

Challenges in Autonomics: “In search of a foundation for
next generation autonomous systems”
• How to specify autonomous system behavior in the face of unpredictability?
• How to carry out faithful analysis of system behavior with respect to rich
environments that include humans, physical artifacts, and other systems?
• How to build such systems by combining executable modeling techniques
from software engineering with AI and ML?
Harel, Marron, Sifakis. Autonomics: In search of a foundation for next-generation autonomous systems. Proc. Natl. Acad. Sci. USA 2020

Sample Requirements in Autonomous Driving System (ADS)
• Stability: ADS must assure stable control and avoid dangerous actions for the vehicle
  • R1.1: ADS must avoid impossible steering angles
• Safety: ADS must avoid collision with moving or static objects along the path
  • R2.1: ADS must keep a safe distance from other objects
• Compliance: the ADS must respect the traffic regulations enforced by law in a geographical area
  • R3.1: the velocity of the vehicle should be less than the speed limit
  • R3.2: the vehicle should not run the red light
  • R3.3: the vehicle should stay in the correct lane
• Comfort: the planned trajectory should be comfortable for the passenger
  • R4.1: the vehicle’s velocity should not change too much
  • R4.2: the vehicle’s acceleration should not change too much
Czarnecki, “Automated Driving System (ADS) High-Level Quality Requirements Analysis – Driving Behavior Safety,” Univ Waterloo, 2018.
Tuncali, Fainekos, Prokhorov, Ito, Kapinski, “Requirements-driven test generation for autonomous vehicles with machine learning components,” IEEE Transactions on Intelligent Vehicles, 2019.
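Requirements such as R2.1, R3.1, R4.1, and R4.2 can be turned into executable checks over a recorded driving trace. The following is a minimal sketch, not taken from the cited works: the `State` record, the `check_trace` function, and all thresholds are hypothetical names and values chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class State:
    """One sample of a (hypothetical) driving trace."""
    t: float         # time in seconds
    velocity: float  # vehicle speed in m/s
    gap: float       # distance to the nearest object in m


def check_trace(trace: List[State], speed_limit: float, min_gap: float,
                max_accel: float, max_jerk: float) -> List[Tuple[str, float]]:
    """Return (requirement id, time) pairs for every violation in the trace."""
    violations = []
    prev_accel = None
    for i, cur in enumerate(trace):
        if cur.gap < min_gap:               # R2.1: keep a safe distance
            violations.append(("R2.1", cur.t))
        if cur.velocity >= speed_limit:     # R3.1: stay below the speed limit
            violations.append(("R3.1", cur.t))
        if i == 0:
            continue                        # no derivative for the first sample
        prev = trace[i - 1]
        dt = cur.t - prev.t
        accel = (cur.velocity - prev.velocity) / dt
        if abs(accel) > max_accel:          # R4.1: velocity changes too fast
            violations.append(("R4.1", cur.t))
        if prev_accel is not None and abs(accel - prev_accel) / dt > max_jerk:
            violations.append(("R4.2", cur.t))  # R4.2: acceleration changes too fast
        prev_accel = accel
    return violations


trace = [State(0.0, 10.0, 30.0), State(1.0, 11.0, 30.0), State(2.0, 16.0, 4.0)]
for req, t in check_trace(trace, speed_limit=15.0, min_gap=5.0,
                          max_accel=3.0, max_jerk=2.0):
    print(f"{req} violated at t={t}s")
```

A checker of this kind can serve as a test oracle: a generated scenario fails if the returned list is non-empty.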

Microsoft's Teen Chatbot Tay
Turned into Genocidal Racist (2016 March 23/24)
http://www.businessinsider.com/ai-expert-explains-why-microsofts-tay-chatbot-is-so-racist-2016-3
"There are a number of precautionary
steps they [Microsoft] could have taken.
It wouldn't have been too hard to create
a blacklist of terms; or narrow the scope
of replies. They could also have simply
manually moderated Tay for the first few
days, even if that had meant slower
responses."
“businesses and other AI developers will
need to give more thought to the
protocols they design for testing and
training AIs like Tay.”

Adversarial Machine Learning/Testing
● Generate adversarial examples derived from legitimate examples with slight modifications (imperceptible to humans) to induce misclassification
[Figure: an original image and its slightly modified version, with predicted steering labels “Turn right” and “Go straight”]
Tian et al. DeepTest: Automated Testing of Deep-Neural-Network-driven Autonomous Cars. ICSE 2018
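To make the idea of a slight, misclassification-inducing modification concrete, here is a toy sketch. It applies a gradient-sign style perturbation to a hand-written linear classifier; this is a swapped-in illustration, not the approach of the cited papers, and the weights, inputs, and function names are all hypothetical.

```python
def predict(weights, bias, x):
    """Toy linear steering classifier: positive score means 'turn right'."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return "turn right" if score > 0 else "go straight"


def perturb(weights, x, epsilon):
    """Move every feature by at most epsilon in the direction that raises the score.

    For a linear model the gradient of the score with respect to x is simply
    the weight vector, so only the sign of each weight is needed.
    """
    def sign(w):
        return 1.0 if w > 0 else -1.0 if w < 0 else 0.0
    return [xi + epsilon * sign(w) for w, xi in zip(weights, x)]


weights, bias = [0.5, -0.25, 1.0], -0.1   # hypothetical model parameters
x = [0.2, 0.4, 0.05]                      # hypothetical legitimate input
x_adv = perturb(weights, x, epsilon=0.05)

print(predict(weights, bias, x))      # label for the original input
print(predict(weights, bias, x_adv))  # label after the small perturbation
```

Each feature changes by at most 0.05, yet the predicted label flips, which is the behavior adversarial testing looks for.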

Example Detected Erroneous Behaviors
[Figure: image pairs with differing predicted steering labels: “Turn right” vs. “Go straight”, and “Go straight” vs. “Turn left”]
Pei et al. DeepXplore: Automated Whitebox Testing of Deep Learning Systems. SOSP 2017.
Tian et al. DeepTest: Automated Testing of Deep-Neural-Network-driven Autonomous Cars. ICSE 2018.

Lu et al. NO Need to Worry about Adversarial Examples in Object Detection in Autonomous Vehicles. CVPR’17.
https://arxiv.org/abs/1707.03501

Zhou et al. DeepBillboard: Systematic Physical-World Testing of Autonomous Driving Systems. ICSE 2020.

Robustness Certification for Deep Learning Models
Transformation-Specific Smoothing-based robustness certification
• A general robustness certification framework against various semantic transformations
• A range of transformation-specific smoothing protocols and techniques that provide substantially better certified robustness bounds than state-of-the-art approaches on large-scale datasets
Li, Weber, Xu, Rimanic, Kailkhura, Xie, Zhang, Li. TSS: Transformation-Specific Smoothing for Robustness Certification. CCS 2021.
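The smoothing idea can be illustrated with a toy example: instead of classifying one input, sample random values of a transformation parameter, classify each transformed input, and take the majority vote. This is only a sketch of the prediction step under assumed names (`brighten`, `base_classifier`, `smoothed_predict`); it does not compute the certified bounds that the TSS paper derives.

```python
import random
from collections import Counter


def brighten(pixels, b):
    """Semantic transformation: add a brightness offset b, clipped to [0, 1]."""
    return [min(1.0, max(0.0, p + b)) for p in pixels]


def base_classifier(pixels):
    """Hypothetical base model: thresholds the mean pixel intensity."""
    return "bright" if sum(pixels) / len(pixels) > 0.5 else "dark"


def smoothed_predict(pixels, sigma=0.1, n_samples=200, seed=0):
    """Majority vote of the base classifier under Gaussian noise on the
    transformation parameter (here, the brightness offset)."""
    rng = random.Random(seed)
    votes = Counter(
        base_classifier(brighten(pixels, rng.gauss(0.0, sigma)))
        for _ in range(n_samples)
    )
    label, count = votes.most_common(1)[0]
    return label, count / n_samples


label, share = smoothed_predict([0.8, 0.75, 0.85])
print(label, share)
```

The vote share of the winning label is the quantity from which a certification procedure would then derive a bound on tolerable parameter changes; that derivation is omitted here.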

DL Software Development and Deployment
DL Software Development: Training Data, DL programs, and DL frameworks produce Trained DL Models [ISSTA’18, ISSRE’19, ESEC/FSE’19, ICSE’20]
DL Software Deployment: Trained DL Models become Converted DL Models, deployed with deployment-related frameworks
• Server/Cloud Platforms: TF Serving
• Mobile Platforms: Core ML, TF Lite
• Browser Platforms: TF.js
Knowledge Gap!

Research Questions
RQ1: Popularity of DL software deployment
RQ2: Difficulty level of DL software deployment
RQ3: Challenges in DL software deployment
Chen, Cao, Liu, Wang, Xie, Liu. A Comprehensive Study on Challenges in Deploying Deep Learning Based Software. ESEC/FSE 2020.

Collect Relevant Questions from Stack Overflow (SO)
Identification of relevant questions (extraction & filtering), by deployment framework:
• Server/cloud deployment: TF Serving, Google Cloud AI, Amazon SageMaker
• Mobile deployment: Core ML, TF Lite
• Browser deployment: TF.js
Platforms      Counts
Cloud/server   1,325
Mobile         1,533
Browser        165
In total       3,023

RQ1: Popularity
[Figures: Trend of users; Trend of questions]
DL software deployment is gaining increasing attention, demonstrating timeliness and urgency of this study.

RQ2: Difficulty Level
Metrics
• %no acc.: % of questions with no accepted answer
• acc. time: time needed to receive an accepted answer
Difficulty level ranges from high %no acc. and long acc. time down to low %no acc. and short acc. time.
Topics                          %no acc.   acc. time (median)
DL software deployment          70.7%      405 min
Other aspects of DL software    62.7%      146 min
Big data [ESEC/FSE’19]          60.5%      198 min
Concurrency [ESEM’18]           43.8%      42 min
Mobile [EMSE’16]                55.0%      55 min
Questions related to DL software deployment are difficult to resolve, motivating us to identify challenges behind it.
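The two difficulty metrics are simple to compute once each question is reduced to its time-to-accepted-answer. The sketch below assumes a hypothetical input format (a list of minutes, with `None` for questions that have no accepted answer) and uses made-up sample data, not the study's dataset.

```python
from statistics import median
from typing import List, Optional, Tuple


def difficulty_metrics(minutes_to_accept: List[Optional[float]]
                       ) -> Tuple[float, Optional[float]]:
    """Return (%no acc., median acc. time in minutes).

    Each entry is the time a question waited for its accepted answer,
    or None if the question has no accepted answer.
    """
    answered = [m for m in minutes_to_accept if m is not None]
    pct_no_acc = 100.0 * (len(minutes_to_accept) - len(answered)) / len(minutes_to_accept)
    acc_time = median(answered) if answered else None
    return pct_no_acc, acc_time


# Made-up sample: 10 questions, 4 of which received an accepted answer.
sample = [None, 30, None, 400, 120, None, None, None, 60, None]
pct, acc = difficulty_metrics(sample)
print(f"%no acc. = {pct:.1f}%, median acc. time = {acc} min")
```

The median is used rather than the mean because answer times are heavily skewed by a few very slow answers.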

RQ3: Challenges
A wide spectrum of challenges for each of three platforms
• organized as taxonomies
• 72 categories of challenges

Server / Cloud (100%)
• General Questions (16.7%): Entire procedure of deployment (9.7%); Conceptual questions (4.4%); Limitations of platforms / frameworks (2.6%)
• Model Export (15.0%): Procedure (4.4%); Specification of model information (6.2%); Export of unsupported models (3.1%); Selection / usage of APIs (0.9%); Model quantization (0.4%)
• Environment (19.4%): Installing / building frameworks (7.5%); Avoiding version incompatibility (4.0%); Configuration of environment variables (7.9%)
• Request (13.3%): Procedure (0.9%); Setting request parameters / body (8.4%); Batching request (2.2%); Getting information of exposed model (1.8%)
• Serving (13.2%): Procedure (0.4%); Model loading (3.1%); Configuration of batching (2.6%); Serving multiple models simultaneously (3.5%); Bidirectional streaming (0.4%); Authenticating client (1.8%); Parsing request (1.3%)
• Data Processing (19.8%): Procedure (1.3%); Setting size / shape of input data (1.8%); Setting format / datatype of input data (8.8%); Parsing output (4.8%); Migrating pre-processing (3.1%)
• Model Update (2.6%)

Mobile (100%)
• General Questions: Entire procedure of deployment (13.4%); Conceptual questions (4.8%); Limitations of frameworks (0.4%)
• Model Conversion (26.5%): Procedure (3.9%); Saving models (1.3%); Conversion of unsupported models (6.1%); Model quantization (4.8%); Specification of model information (8.2%); Selection / usage of APIs (0.9%); Parsing converted models (1.3%)
• DL Integration into Projects (21.2%): Procedure (0.9%); Importing / loading models (4.3%); Build configuration (3.9%); Configuration of input / output information (8.2%); Avoiding version incompatibility (1.7%); Thread management (2.2%)
• DL Library Compilation (7.8%): Procedure (1.7%); Usage of prebuilt libraries (0.4%); Register of unsupported operators (3.0%); Build configuration (2.6%)
• Data Processing (16.9%): Procedure (1.7%); Setting size / shape of input data (3.0%); Setting format / datatype of input data (5.2%); Parsing output (2.2%); Migrating pre-processing (4.8%)
• Inference Speed (3.9%)
• Model Update (3.0%)
• Data Extraction (1.7%)
• Model Security (0.4%)

Browser (100%)
• General Questions: Entire procedure of deployment (3.2%); Limitations of frameworks (2.4%)
• Model Loading (24.0%): Procedure (4.8%); Loading from local storage (8.0%); Loading from an HTTP endpoint (2.4%); Asynchronous loading (5.6%); Selection / usage of APIs (2.4%); Improving loading speed (0.8%)
• Environment (19.2%): Importing libraries (10.4%); Avoiding version incompatibility (8.8%)
• Model Conversion (18.4%): Procedure (3.2%); Specification of model information (5.6%); Conversion of unsupported models (4.0%); Selection / usage of APIs (2.4%); Saving models (3.2%)
• Data Processing (18.4%): Procedure (1.6%); Data Loading (4.0%); Setting size / shape of input data (5.6%); Setting format / datatype of input data (4.8%); Migrating pre-processing (2.4%)
• Inference Speed (7.2%)
• Data Extraction (3.2%)
• Model Security (2.4%)
• Model Update (1.6%)

Common Challenges across Three Platforms
Model Conversion (Cloud/Server 15.0%, Mobile 26.5%, Browser 18.4%)
• Unsupported models
• Specification of model information
• Selection/usage of APIs
• Model quantization
Data Processing (Cloud/Server 19.8%, Mobile 16.9%, Browser 18.4%)
• Setting size / shape / format / datatype of input data
• Migrating pre-processing
• Parsing output

Unique Challenges in Client Platforms (Mobile and Browser)
Model Security (Server/Cloud 0.0%, Mobile 0.4%, Browser 2.4%)
• Models on client platforms are easier to obtain than those on server/cloud platforms
Inference Speed (Server/Cloud 0.0%, Mobile 3.9%, Browser 7.2%)
• Client platforms have weaker computing power than server/cloud platforms

Summary of Challenges in DL Software Deployment
• RQ1 (Popularity): DL software deployment is gaining increasing attention.
• RQ2 (Difficulty Level): Questions about DL software deployment are difficult to resolve.
• RQ3 (Challenges): We build a taxonomy consisting of 72 categories, linked to challenges in deploying DL software (Server/Cloud, Mobile, Browser).
Chen, Cao, Liu, Wang, Xie, Liu. A Comprehensive Study on Challenges in Deploying Deep Learning Based Software. ESEC/FSE 2020.

Neural Machine Translation
Screen snapshot captured on April 5, 2018
• Overall better than statistical machine translation
• Worse controllability
• Existing translation quality assurance
  • Needs reference translations, so not applicable online
Translation Quality Assurance
● Key idea: black-box algorithms specialized for common problems
○ No need for reference translation; need only the original sentence and generated translation
○ Precise problem localization
● Common problems
○ Under-translation
○ Over-translation
Collaborative Work with Tencent
Wang, Zheng, Liu, Zhang, Zeng, Deng, Yang, He, Xie. Detecting Failures of Neural Machine Translation in the Absence of Reference Translations. DSN 2019 Industry Track
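A reference-free check needs only the source sentence and the generated translation. The sketch below shows one simplified way such black-box checks could look; it is not the algorithm from the cited paper. The bilingual `lexicon`, the `slack` threshold, and both function names are hypothetical, and a practical version would also need to exclude function words that repeat legitimately.

```python
from collections import Counter


def find_over_translation(source_tokens, target_tokens, slack=1):
    """Flag target words repeated more often than any source word plus slack."""
    limit = max(Counter(source_tokens).values(), default=0) + slack
    counts = Counter(target_tokens)
    return [word for word, n in counts.items() if n > limit]


def find_under_translation(source_tokens, target_tokens, lexicon):
    """Flag source words none of whose known translations appear in the output."""
    produced = set(target_tokens)
    return [w for w in source_tokens
            if w in lexicon and not (lexicon[w] & produced)]


# Hypothetical toy lexicon mapping source words to acceptable translations.
lexicon = {"电影": {"movie", "film"}, "很": {"very"}, "好": {"good", "great"}}
source = "这 部 电影 很 好".split()

print(find_over_translation(source, "the movie is very very very good".split()))
print(find_under_translation(source, "the movie is".split(), lexicon))
```

Because each check returns the offending words, it also supports the precise problem localization mentioned above.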

Deployment of Translation Quality Assurance
● Adopted to improve WeChat translation service (over 1 billion users, online serving 12 million translation tasks)
○ Offline monitoring (regression testing)
○ Online monitoring (real time selection of best model)
● Large scale test data for translation
○ ~130K English/180K Chinese words/phrases
○ Detect numerous problems in other vendors as well
[Figures: BLEU Score Improvement; %Problems Reduction; Problem Cases in Other Translation Services]
Collaborative Work with Tencent
Wang, Zheng, Liu, Zhang, Zeng, Deng, Yang, He, Xie. Detecting Failures of Neural Machine Translation in the Absence of Reference Translations. DSN 2019 Industry Track

[Diagram: Neural Network Architecture → Training → Neural Network Model]
Existing work on NN model: Testing, Verification, Bug Detection, …
Neural Network Architecture: ?

Why Neural Network Architecture?
[Diagram: Data and Code (Architecture) → Training (Magic; Hours, Days, Weeks, Months, …) → NN Model → Software Systems]
1. Bugs at model level are difficult to fix
2. Bugs in architectures may cause failures in training
3. Quality assurance needs to be provided for architectures (An architecture vendor, Many Developers; A NN Architecture, Many NN Models)

Numerical Bugs
Bugs leading to errors in numerical operations, such as “NaN”, “INF”, or crashes during training or inference.

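An Example of Numerical Bugs
…
y_softmax = tf.nn.softmax(h_fc)
cross_entropy = y_ * tf.log(y_softmax)
…
[Computation graph: h_fc → Softmax → Log → Mul, with y_ as the second input of Mul]
h_fc ∈ [-100,100]
y_softmax ∈ [0,1]
Since y_softmax can be 0, tf.log(y_softmax) may evaluate to -INF.
We can use static analysis to infer the range of tensors.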
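The failure in the two statements above can be reproduced without TensorFlow. The sketch below emulates single-precision arithmetic in plain Python by rounding through `struct` (an assumption made so the example has no dependencies; it is not how TensorFlow computes softmax internally), then shows a clipped variant as one common mitigation, not necessarily the fix the authors propose.

```python
import math
import struct


def f32(x):
    """Round a Python float to the nearest IEEE-754 single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def softmax_f32(logits):
    """Softmax with intermediate results rounded to float32."""
    m = max(logits)
    exps = [f32(math.exp(v - m)) for v in logits]  # tiny values underflow to 0.0
    total = f32(sum(exps))
    return [f32(e / total) for e in exps]


def unsafe_term(y, p):
    """Mirrors y_ * log(y_softmax); log(0) is treated as -inf."""
    log_p = math.log(p) if p > 0.0 else -math.inf
    return y * log_p


def clipped_term(y, p, eps=1e-10):
    """Mitigation: clip the probability away from 0 before taking the log."""
    return y * math.log(min(1.0, max(p, eps)))


h_fc = [100.0, -100.0]        # within the range [-100, 100] from the slide
y_ = [1.0, 0.0]               # one-hot label
y_softmax = softmax_f32(h_fc)

print(y_softmax)
print([unsafe_term(y, p) for y, p in zip(y_, y_softmax)])   # second term is NaN
print([clipped_term(y, p) for y, p in zip(y_, y_softmax)])  # all terms finite
```

The NaN arises because the smaller probability underflows to exactly 0 in single precision, so the product becomes 0 × (-inf).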

Detecting Numerical Bugs in Neural Network Architectures
NN Architecture → Computation Graph → Static Analysis → Check Unsafe Operations (Log, Exp, …)
Zhang, Ren, Chen, Xiong, Cheung, Xie. Detecting Numerical Bugs in Neural Network Architectures. ESEC/FSE’20.
ACM SIGSOFT Distinguished Paper Award
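The pipeline of computation graph, static analysis, and unsafe-operation check can be sketched with a coarse interval analysis. This is an illustrative simplification under assumed names (`Interval`, `analyze`, the tuple-based graph encoding); the analysis in the cited paper is more precise than plain intervals.

```python
import math
from dataclasses import dataclass

F32_EXP_LIMIT = 88.7  # approximate: exp(x) exceeds the float32 maximum beyond this


@dataclass(frozen=True)
class Interval:
    lo: float
    hi: float


def analyze(ops, input_ranges):
    """Propagate value ranges through (output, operation, input) triples
    and collect warnings for operations that may be numerically unsafe."""
    env = dict(input_ranges)
    warnings = []
    for out, kind, arg in ops:
        x = env[arg]
        if kind == "softmax":
            env[out] = Interval(0.0, 1.0)   # outputs are always probabilities
        elif kind == "exp":
            if x.hi > F32_EXP_LIMIT:
                warnings.append(f"{out}: exp may overflow for input in [{x.lo}, {x.hi}]")
            env[out] = Interval(math.exp(min(x.lo, F32_EXP_LIMIT)),
                                math.exp(min(x.hi, F32_EXP_LIMIT)))
        elif kind == "log":
            if x.lo <= 0.0:
                warnings.append(f"{out}: log may receive a non-positive input in [{x.lo}, {x.hi}]")
            lo = math.log(x.lo) if x.lo > 0.0 else -math.inf
            hi = math.log(x.hi) if x.hi > 0.0 else -math.inf
            env[out] = Interval(lo, hi)
        else:
            raise ValueError(f"unsupported operation: {kind}")
    return warnings, env


graph = [("y_softmax", "softmax", "h_fc"), ("log_out", "log", "y_softmax")]
warnings, ranges = analyze(graph, {"h_fc": Interval(-100.0, 100.0)})
for w in warnings:
    print(w)
```

On the example graph, the inferred range of y_softmax includes 0, so the Log node is reported as unsafe without running any training.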

Bugs Detected in Real-World Architectures
• Found 11 buggy statements in the code repository
• Submitted pull requests, and 3 buggy statements have been repaired by the developers
Zhang, Ren, Chen, Xiong, Cheung, Xie. Detecting Numerical Bugs in Neural Network Architectures. ESEC/FSE’20.
ACM SIGSOFT Distinguished Paper Award

Open Topics in Intelligence Software Engineering (ISE)
• How to solicit and specify requirements for intelligence software?
• How to tackle the complexity of integrating intelligence software with the rest of the
software system?
• How to define test oracles (or properties) for intelligence software?
• How to design high-quality library/framework APIs for developing intelligence
software?
• How to transfer ISE research results into industrial/open source practice?
• …

(SE, AI) and Practice Impact
[Diagram: Problem Domain, Solution Domain, and Practice, with Intelligent Software Engineering and Intelligence Software Engineering]

Thank You!
Q & A
Tao Xie
Peking University
taoxie@pku.edu.cn
https://taoxiease.github.io/
