AI in Security
Subrat Kumar Panda
AI First Thought Leader,
Director of Engineering, AI and Data Sciences,
Capillary Technologies
Bangalore
Agenda
● AI and Industry 4.0
● Brief intro to AI, ML, IoT
● Security Evolution (AI related)
● Era of Data
● AI use cases in security
● Building and deploying an Intelligent Security Product
Brief Introduction about me
● BTech (2002), PhD (2009) – CSE, IIT Kharagpur
● Synopsys (EDA), IBM (CPU), NVIDIA (GPU), Taro (Full Stack Engineer), Capillary (Principal Architect - AI)
● Applying AI to Retail
● Co-Founded IDLI (for social good) with Prof. Amit Sethi (IIT Bombay), Jacob Minz (Synopsys) and Biswa Gourav Singh (Capillary)
● https://www.facebook.com/groups/idliai/
● LinkedIn - https://www.linkedin.com/in/subratpanda/
● Facebook - https://www.facebook.com/subratpanda
● Twitter - @subratpanda
Industry 4.0
https://en.wikipedia.org/wiki/Industry_4.0
1. Interoperability
2. Information transparency
3. Technical assistance
4. Decentralized decisions
Knowledge is Power - Sir Francis Bacon
- Industry 4.0 enabled by IoT, BigData and AI
- IoT is the intelligent sensor
- BigData will enable processing huge volumes of data
- AI will make sense of the data in decision making
- AI helps transform raw data into power - AI will transform businesses for sure
- Primarily Machine Learning and then the deeper aspects with Deep Learning
AI is the bedrock on which Industry 4.0 relies.
The AI landscape - Nvidia
Machine Learning – http://techleer.com
What AI Can and Cannot Do Today?
https://hbr.org/2016/11/what-artificial-intelligence-can-and-cant-do-right-now
Supervised Learning
1. Being able to input A and output B will transform many industries.
2. The technical term for building this A→B software is supervised learning.
3. The best solutions today are built with a technology called deep learning or deep neural
networks, which were loosely inspired by the brain.
4. Labelled data is the most important requirement for supervised learning.
If a typical person can do a mental task with less than one second of thought, we can probably automate it
using AI either now or in the near future. - Andrew Ng
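The A→B idea above can be sketched in a few lines: labelled pairs go in, and a predictor for unseen inputs comes out. This is a minimal illustration using a 1-nearest-neighbour rule rather than deep learning. The feature meanings (failed logins per hour, MB uploaded), the values, and the function names are all hypothetical.

```python
import math

def fit_nearest_neighbour(examples):
    """'Training' for 1-NN is simply memorising the labelled (A, B) pairs."""
    return list(examples)

def predict_label(model, a):
    """Return the label B of the stored input A that is closest to `a`."""
    _, label = min(model, key=lambda pair: math.dist(pair[0], a))
    return label

# Hypothetical labelled data: (failed logins per hour, MB uploaded) -> label
labelled = [
    ((1.0, 5.0), "benign"),
    ((2.0, 8.0), "benign"),
    ((40.0, 300.0), "suspicious"),
    ((55.0, 250.0), "suspicious"),
]

model = fit_nearest_neighbour(labelled)
low_activity = predict_label(model, (3.0, 6.0))
high_activity = predict_label(model, (50.0, 280.0))
print(low_activity, high_activity)
```

Without the labels in `labelled` there is nothing to learn from, which is the point of item 4.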
Transfer Learning - http://ruder.io/transfer-learning/
Drivers of ML Success
Machine Learning Tasks
● Regression (or prediction) — a task of predicting the next value based on the previous values.
● Classification — a task of separating things into different categories.
● Clustering — similar to classification but the classes are unknown, grouping things by their
similarity.
● Association rule learning (or recommendation) — a task of recommending something based on
the previous experience.
● Dimensionality reduction — or generalization, a task of searching common and most important
features in multiple examples.
● Generative models — a task of creating something based on the previous knowledge of the
distribution.
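To make the difference between classification and clustering concrete, here is a minimal k-means sketch on one-dimensional data. No labels are supplied; the groups emerge from similarity alone. The request sizes and the starting centroids are hypothetical, and a real pipeline would normally use a library implementation.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Plain k-means on scalars: assign each value to its nearest centroid, then re-average."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Keep the old centroid if a cluster ends up empty
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical request sizes in KB, with no labels attached
request_sizes = [1, 2, 3, 10, 11, 12]
centroids, clusters = kmeans_1d(request_sizes, centroids=[0, 5])
print(centroids, clusters)
```

The two groups are found without anyone saying which requests are "small" or "large"; a classifier would instead need those labels up front.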
AI Funding in Cybersecurity
https://www.ciab.com/resources/artificial-intelligence-cybersecurity/
Trends to Watch
https://www.ciab.com/resources/artificial-intelligence-cybersecurity/
Future of AI
https://threatpost.com/artificial-intelligence-a-cybersecurity-tool-for-good-and-sometimes-bad/137831/
AI powered Information Security
https://blog.capterra.com/artificial-intelligence-in-cybersecurity/
Awesome ML Papers and Code for Cyber Security
- https://github.com/jivoi/awesome-ml-for-cybersecurity
- Datasets
- Papers
- Books
- Talks
- Tutorials
- Courses
ML Applications
https://ccdcoe.org/uploads/2018/10/Art-19-On-the-Effectiveness-of-Machine-and-Deep-Learning-for-Cyber-Security.pdf
Malware Detection
https://arxiv.org/pdf/1904.02441.pdf
Malware Detection Methodology
- Problem Formulation - Binary Classification Problem
- Dataset
- Feature Extraction
- Dimensionality Reduction
- Model Building and Analysis
Datasets
- Malicia Project data
- Class imbalance: 11,308 malware vs. 2,819 benign executables
- Oversampling, undersampling, and cluster-based sampling help offset the imbalance
- Generalizability checked with k-fold cross-validation
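The sampling and cross-validation steps can be sketched as follows. This is an illustrative pure-Python version, not the code used in the paper: it does random oversampling only, the function names are made up for this example, and integer stand-ins replace real feature vectors. One assumption worth stating loudly: oversampling is applied to the training folds only, so that duplicated samples never leak into the test fold.

```python
import random
from collections import Counter

def random_oversample(samples, labels, seed=0):
    """Duplicate minority-class samples at random until every class matches the largest."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        out_x.extend(xs + extra)
        out_y.extend([y] * target)
    return out_x, out_y

def stratified_kfold(labels, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs that keep class ratios similar in every fold."""
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        for pos, i in enumerate(idxs):
            folds[pos % k].append(i)  # deal each class round-robin across folds
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test

# Class counts from the slide; the samples themselves are stand-in integers
labels = ["malware"] * 11308 + ["benign"] * 2819
samples = list(range(len(labels)))

balanced_counts = []
for train_idx, test_idx in stratified_kfold(labels, k=5):
    x_train = [samples[i] for i in train_idx]
    y_train = [labels[i] for i in train_idx]
    x_bal, y_bal = random_oversample(x_train, y_train)
    # A model would be fit on (x_bal, y_bal) and scored on the untouched test fold
    balanced_counts.append(Counter(y_bal))
print(balanced_counts[0])
```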
Feature Extraction
- Decoding the executables
- Literature shows that static attributes such as Windows API calls, strings, opcodes, and
control flow graphs make good feature vectors
- They used opcode frequency as a discriminatory feature
- Dimensionality Reduction
- Variance Threshold
- Autoencoders
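As a rough sketch of the opcode-frequency and variance-threshold steps: each executable becomes a vector of relative opcode frequencies, and columns that barely vary across samples are dropped. The opcode vocabulary, the three toy programs, and the function names are hypothetical. The threshold rule used here (keep a feature only if its variance is strictly greater than the threshold) is an assumption for illustration.

```python
from collections import Counter
from statistics import pvariance

def opcode_frequencies(opcodes, vocabulary):
    """Turn one disassembled opcode sequence into relative frequencies over a fixed vocabulary."""
    counts = Counter(opcodes)
    total = len(opcodes) or 1  # avoid dividing by zero for an empty sequence
    return [counts[op] / total for op in vocabulary]

def variance_threshold(rows, threshold=0.0):
    """Return the indices of columns whose population variance exceeds the threshold."""
    columns = list(zip(*rows))
    return [j for j, col in enumerate(columns) if pvariance(col) > threshold]

# Hypothetical vocabulary and opcode sequences
vocabulary = ["mov", "push", "call", "xor", "nop"]
programs = [
    ["mov", "push", "call", "mov", "nop"],
    ["xor", "xor", "mov", "call", "nop"],
    ["mov", "mov", "push", "xor", "nop"],
]

features = [opcode_frequencies(p, vocabulary) for p in programs]
kept = variance_threshold(features)
reduced = [[row[j] for j in kept] for row in features]
kept_opcodes = [vocabulary[j] for j in kept]
print(kept_opcodes)
```

In this toy data `nop` occurs with the same frequency in every program, so it carries no discriminating signal and is the one column removed.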
Building the Learning Model
- Exploration/Ensemble of multiple models
- Random Forest
- DNN-2L
- DNN-4L
- DNN-7L
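To give a feel for what the random forest entry involves, here is a heavily simplified stand-in: a bagged ensemble of one-level trees (decision stumps), each trained on a bootstrap sample with a random subset of features, combined by majority vote. A real random forest grows much deeper trees and would normally come from a library. The data, feature meanings, and function names below are hypothetical.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def fit_stump(rows, labels, feature_ids):
    """Choose the (feature, threshold) split with the lowest weighted Gini impurity."""
    best = None
    for f in feature_ids:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, t,
                        Counter(left).most_common(1)[0][0],
                        Counter(right).most_common(1)[0][0])
    if best is None:  # no usable split: always predict the majority class
        majority = Counter(labels).most_common(1)[0][0]
        return (feature_ids[0], float("inf"), majority, majority)
    return best[1:]

def fit_forest(rows, labels, n_trees=25, seed=0):
    """Train stumps on bootstrap samples, each seeing a random subset of features."""
    rng = random.Random(seed)
    n_features = len(rows[0])
    k = max(1, int(n_features ** 0.5))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # sample with replacement
        feats = rng.sample(range(n_features), k)
        forest.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feats))
    return forest

def forest_predict(forest, row):
    """Majority vote over all stumps."""
    votes = [left if row[f] <= t else right for f, t, left, right in forest]
    return Counter(votes).most_common(1)[0][0]

# Hypothetical opcode-frequency rows: [mov, push, call, xor]
rows = [
    [0.40, 0.20, 0.10, 0.02], [0.38, 0.22, 0.12, 0.03], [0.42, 0.18, 0.11, 0.01],
    [0.39, 0.21, 0.09, 0.02], [0.41, 0.19, 0.10, 0.03], [0.37, 0.23, 0.12, 0.02],
    [0.12, 0.04, 0.30, 0.25], [0.15, 0.06, 0.28, 0.22], [0.10, 0.02, 0.33, 0.27],
    [0.13, 0.05, 0.31, 0.20], [0.11, 0.03, 0.29, 0.24], [0.14, 0.04, 0.32, 0.26],
]
labels = ["benign"] * 6 + ["malware"] * 6

forest = fit_forest(rows, labels)
benign_guess = forest_predict(forest, [0.40, 0.20, 0.05, 0.00])
malware_guess = forest_predict(forest, [0.08, 0.01, 0.35, 0.30])
print(benign_guess, malware_guess)
```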
Results
- Achieved the highest accuracy of 99.78% with random forest and variance threshold, an improvement of 1.26% over the previously reported best accuracy.
- In feature reduction, variance threshold outperformed autoencoders in improving model performance.
- The best result did not come from any of the deep learning models.
- DL was overkill for the Malicia dataset
Hardware Based Malware detector
https://cse.iitk.ac.in/users/spramod/papers/date17.pdf
Feature Sets
https://cse.iitk.ac.in/users/spramod/papers/date17.pdf
Reinforcement Learning
DQN architecture
Questions?
