AUTOMATIC ATTENDANCE SYSTEM
By: Pinaki Ranjan Sarkar
Under the guidance of:
Dr. Gorthi R.K.S.S. Manyam &
Dr. Deepak Mishra
OUTLINE
▪ Motivation
▪ Objective
▪ System Requirements
▪ Design Details
▪ Tried methods
▪ Inspiration
▪ Main design
▪ Status so far
▪ Future work
MOTIVATION
▪ Taking attendance in large classes is:
▪ Cumbersome
▪ Repetitive
▪ A drain on valuable class time
▪ What if we build an efficient face detection and recognition system for this task?
OBJECTIVES
▪ Automatic user identification via face detection and recognition.
▪ Develop and implement an efficient face detection and recognition
system.
▪ End-to-end face recognition system using deep learning.
DIFFICULTIES
▪ Large pose variation
▪ Hidden faces & tiny faces
▪ Different illumination conditions, occlusions
SYSTEM REQUIREMENTS
▪ Hardware:
▪ A camera
▪ PC or Raspberry Pi
▪ Software:
▪ MATLAB 2013+
▪ Python 2.7
▪ Lasagne API
DESIGN DETAILS
Blocks: Database, Face Detection, Face Recognition
Example attendance output (1 = present, 0 = absent):
Abhi - 1
Priya - 1
Ayushi - 0
Pinaki - 0
Akshay - 1
Sidd - 1
All are using CNN!!
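The last step of the pipeline, turning the recognizer's output into a 1/0 attendance record, can be sketched as follows. This is only an illustration: `mark_attendance` and its inputs are hypothetical names, and the list of recognized students is made up to reproduce the example record above.

```python
def mark_attendance(roster, recognized):
    """Return {name: 1 or 0} for every enrolled student.

    roster     -- names enrolled in the database (hypothetical input)
    recognized -- names reported by the face recognition stage
    """
    present = set(recognized)
    return {name: int(name in present) for name in roster}


# Names taken from the example record; the recognized list is assumed.
roster = ["Abhi", "Priya", "Ayushi", "Pinaki", "Akshay", "Sidd"]
recognized = ["Sidd", "Abhi", "Akshay", "Priya"]

attendance = mark_attendance(roster, recognized)
for name in roster:
    print("%s - %d" % (name, attendance[name]))
```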
GOING DEEP INTO FACE RECOGNITION
▪ Various methods are employed to recognize a person in the wild.
▪ Compared to traditional handcrafted features such as high-dimensional
LBP, Active Appearance Model (AAM), Active Shape Model (ASM),
Bayesian face, or Gaussian face, automatically learnt deep features
based on personal identity are more advantageous.
▪ In most deep-learning-based face recognition methods, the inputs to the
deep model are aligned face images.
TRIED METHODS
▪ Tal Hassner, Shai Harel, Eran Paz, Roee Enbar, "Effective Face Frontalization
in Unconstrained Images", CVPR-2015
TRIED METHODS
▪ Zhu, Xiangyu, et al. "Face alignment across large poses: A 3D
solution." CVPR-2016
TRIED METHODS
▪ I tried to implement some more papers, but they failed when
dealing with large pose variation.
▪ Instead of AAM or a 3D fitted model (3D frontalisation doesn't show
significant improvements over simple 2D alignment*), we used deep
learning techniques to recognize a face using only personal identity
clues.
* Banerjee, Sandipan, et al. "To Frontalize or Not To Frontalize: Do We Really Need Elaborate Pre-Processing to
Improve Face Recognition Performance?" arXiv preprint arXiv:1610.04823 (2016).
INSPIRATION
▪ Our work is inspired by some of the state-of-the-art papers.
▪ DeepFace: Closing the Gap to Human-Level Performance in Face Verification, CVPR-2014
▪ FaceNet: A Unified Embedding for Face Recognition and Clustering, CVPR-2015
▪ DeepID3: Face Recognition with Very Deep Neural Networks, arXiv-2015
▪ Supervised Transformer Network for Efficient Face Detection, ECCV-2016
▪ Towards End-to-End Face Recognition through Alignment Learning, arXiv-2017
▪ Spatial transformer networks. NIPS-2015
▪ Finding Tiny Faces. arXiv-2016
MAIN DESIGN
▪ The complete architecture has two stages:
▪ Face Detection
▪ Face Recognition
ARCHITECTURE FOR DETECTION
▪ The authors of "Finding Tiny Faces" provide an in-depth analysis of image
resolution, object scale, and spatial context for the purpose of finding small faces.
▪ A detailed study of the paper is still pending, as I found it only
recently. I will briefly describe their architecture in the next
slide.
WHERE DOES IT FAIL?
▪ The method works well under out-of-plane rotation, but its accuracy
drops when in-plane (2D) rotation comes into the picture.
▪ Some of the failures are shown in the next slide.
Example 1: 11/14 true detections, 1 false detection
Example 2: 1/14 true detections, 1 false detection
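To put the two examples on a common scale, the counts can be converted to recall and precision. This is a small worked calculation, assuming "x/14" means x correctly detected faces out of 14 faces present in the image; the function name is illustrative.

```python
def detection_metrics(true_detections, false_detections, faces_present):
    """Recall and precision from raw detection counts."""
    recall = true_detections / float(faces_present)
    precision = true_detections / float(true_detections + false_detections)
    return recall, precision


# Counts from the two example images above.
recall_1, precision_1 = detection_metrics(11, 1, 14)
recall_2, precision_2 = detection_metrics(1, 1, 14)

print("Example 1: recall %.3f, precision %.3f" % (recall_1, precision_1))
print("Example 2: recall %.3f, precision %.3f" % (recall_2, precision_2))
```

Under that reading, the first image gives a recall of about 0.786 and the second about 0.071, which is the drop in accuracy the failure case illustrates.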
ARCHITECTURE FOR RECOGNITION
Augmented image (128 × 128) → Localization Network → Transform parameters → Transformer → Aligned face (64 × 64) → Recognition Network → Features
The Localization Network, transform parameters, and Transformer form the Spatial Transformer Network.
SPATIAL TRANSFORMER NETWORK
▪ Intuition behind STN
Sampling
SPATIAL TRANSFORMER NETWORK
▪ According to the original DeepMind paper, the spatial transformer can
be used to implement any parametrizable transformation, including
translation, scaling, affine, and projective transformations.
▪ Suppose that for the i-th target point $p_i^t = (x_i^t, y_i^t, 1)$ in the output image,
a grid generator generates its source coordinates $(x_i^s, y_i^s, 1)$ in the input
image according to the transformation parameters.
Projective transformation equation
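A grid generator for the projective case can be sketched in NumPy as below. It assumes the common parameterization in which the eight parameters θ1..θ8 fill a 3 × 3 matrix row by row and the ninth entry is fixed to 1, and it assumes normalized coordinates in [-1, 1]; the function name and both conventions are illustrative choices, not taken from the slides.

```python
import numpy as np


def projective_grid(theta, out_h, out_w):
    """Map every target pixel to its source coordinates.

    theta -- eight parameters theta_1..theta_8; the ninth matrix entry
             is fixed to 1 (assumed convention)
    Returns x_s, y_s with shape (out_h, out_w), in normalized [-1, 1] units.
    """
    H = np.append(np.asarray(theta, dtype=float), 1.0).reshape(3, 3)

    # Regular grid of target points (x_t, y_t, 1) in homogeneous form.
    ys, xs = np.meshgrid(np.linspace(-1, 1, out_h),
                         np.linspace(-1, 1, out_w), indexing="ij")
    targets = np.stack([xs.ravel(), ys.ravel(), np.ones(out_h * out_w)])

    # Apply the 3x3 matrix, then divide by the third homogeneous coordinate.
    src = H.dot(targets)
    x_s = (src[0] / src[2]).reshape(out_h, out_w)
    y_s = (src[1] / src[2]).reshape(out_h, out_w)
    return x_s, y_s


# The identity parameters leave the grid unchanged.
x_s, y_s = projective_grid([1, 0, 0, 0, 1, 0, 0, 0], 64, 64)
```

Setting θ7 = θ8 = 0 reduces this to an affine transformation, which is why the affine and similarity cases can reuse the same generator.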
SPATIAL TRANSFORMER NETWORK
▪ Sampler (mathematical formulation)
SPATIAL TRANSFORMER NETWORK
▪ We use the bilinear kernel.
▪ The overall transformer model is equivalent to convolving a sampling kernel
k with the source image of dimension H × W.
▪ All the blocks should be differentiable.
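For reference, the bilinear sampler in the spatial transformer paper computes each output value as V_i = Σ_n Σ_m U_nm · max(0, 1 - |x_i^s - m|) · max(0, 1 - |y_i^s - n|). A minimal NumPy sketch of that formula follows, assuming a single-channel source image and source coordinates already expressed in pixel units; the function name is illustrative.

```python
import numpy as np


def bilinear_sample(U, x_s, y_s):
    """Sample image U at fractional pixel coordinates (x_s, y_s).

    U        -- source image, shape (H, W), single channel
    x_s, y_s -- source coordinates in pixel units (column, row)
    """
    U = np.asarray(U, dtype=float)
    x_s = np.asarray(x_s, dtype=float)
    y_s = np.asarray(y_s, dtype=float)
    H, W = U.shape

    # Bilinear kernel weights along each axis: max(0, 1 - |distance|).
    wx = np.maximum(0.0, 1.0 - np.abs(x_s[..., None] - np.arange(W)))
    wy = np.maximum(0.0, 1.0 - np.abs(y_s[..., None] - np.arange(H)))

    # Double sum over rows n and columns m of wy * U * wx.
    return np.einsum("...n,nm,...m->...", wy, U, wx)


U = np.array([[0.0, 1.0],
              [2.0, 3.0]])
print(bilinear_sample(U, 0.5, 0.5))  # centre point: mean of the four pixels
```

The dense double sum mirrors the formula directly; a practical implementation would touch only the four neighbouring pixels, since every other weight is zero.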
SPATIAL TRANSFORMER NETWORK
During backpropagation, we need to calculate the gradient
of $V_i$ with respect to each of the eight transformation parameters.
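For the bilinear kernel, the spatial transformer paper gives the gradient with respect to the source coordinate in (sub-)gradient form; the gradient with respect to each parameter then follows by the chain rule. The block below restates this in the notation used here, for a single channel, with the parameters written as θ_k for k = 1..8 (an assumed naming). The cases are checked in order, and the expression for y_i^s is symmetric.

```latex
% Gradient of the sampled value with respect to the source x-coordinate
\frac{\partial V_i}{\partial x_i^s}
  = \sum_{n=1}^{H} \sum_{m=1}^{W} U_{nm}\,
    \max\bigl(0,\; 1 - |y_i^s - n|\bigr)\,
    \begin{cases}
      0  & \text{if } |m - x_i^s| \ge 1 \\
      1  & \text{if } m \ge x_i^s \\
      -1 & \text{if } m < x_i^s
    \end{cases}

% Chain rule to each transformation parameter \theta_k, k = 1, \dots, 8
\frac{\partial V_i}{\partial \theta_k}
  = \frac{\partial V_i}{\partial x_i^s}\,
    \frac{\partial x_i^s}{\partial \theta_k}
  + \frac{\partial V_i}{\partial y_i^s}\,
    \frac{\partial y_i^s}{\partial \theta_k}
```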
SPATIAL TRANSFORMER NETWORK
▪ The similarity transformation is parameterized by α, the rotation angle; λ, the scaling factor;
and t1, t2, the horizontal and vertical translation displacements respectively. The gradients
of $V_i$ with respect to α and λ are derived analogously.
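One way to make the similarity case concrete is sketched below. It assumes the convention x_s = λ(cos α · x_t - sin α · y_t) + t1 and y_s = λ(sin α · x_t + cos α · y_t) + t2, which is a common choice and not necessarily the exact form used on the slide. The partial derivatives of the source coordinates with respect to α and λ are what the chain rule multiplies with ∂V_i/∂x_s and ∂V_i/∂y_s; the function names are illustrative.

```python
import numpy as np


def similarity_source(alpha, lam, t1, t2, x_t, y_t):
    """Source coordinates of target point (x_t, y_t), assumed convention."""
    x_s = lam * (np.cos(alpha) * x_t - np.sin(alpha) * y_t) + t1
    y_s = lam * (np.sin(alpha) * x_t + np.cos(alpha) * y_t) + t2
    return x_s, y_s


def similarity_grads(alpha, lam, x_t, y_t):
    """Analytic partial derivatives of (x_s, y_s) w.r.t. alpha and lambda."""
    dxs_dalpha = lam * (-np.sin(alpha) * x_t - np.cos(alpha) * y_t)
    dys_dalpha = lam * (np.cos(alpha) * x_t - np.sin(alpha) * y_t)
    dxs_dlam = np.cos(alpha) * x_t - np.sin(alpha) * y_t
    dys_dlam = np.sin(alpha) * x_t + np.cos(alpha) * y_t
    return dxs_dalpha, dys_dalpha, dxs_dlam, dys_dlam


# Finite-difference check of the rotation-angle derivative at one point.
alpha, lam, t1, t2, x_t, y_t = 0.3, 1.2, 0.1, -0.2, 0.5, -0.4
eps = 1e-6
x_plus, _ = similarity_source(alpha + eps, lam, t1, t2, x_t, y_t)
x_minus, _ = similarity_source(alpha - eps, lam, t1, t2, x_t, y_t)
numeric = (x_plus - x_minus) / (2 * eps)
analytic = similarity_grads(alpha, lam, x_t, y_t)[0]
print(abs(numeric - analytic) < 1e-6)
```

The translation terms t1 and t2 have derivative 1 with respect to their own coordinate and 0 with respect to the other, so they need no separate function.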
STATUS SO FAR
▪ The STN is implemented and tested on the Labeled Faces in the Wild (LFW) dataset.
▪ Out of 5423 classes, we used only 1000 because of limited
computational resources.
▪ During training, we augmented the face data with random 2D affine transformations
to increase the training set size.
▪ We had 15399 training images, 3501 testing images, and 2100 validation images.
▪ We introduced a CNN architecture to extract deep features from the
transformed face.
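The augmentation step can be illustrated with a sketch that draws a random 2 × 3 affine matrix (rotation, anisotropic scale, shear, and translation). The parameter ranges and the function name are hypothetical, since the slides do not state them. The same block also checks the split sizes quoted above.

```python
import numpy as np


def random_affine(rng, max_rot_deg=15.0, scale_range=(0.9, 1.1),
                  max_shear=0.1, max_shift=0.1):
    """Draw a random 2x3 affine matrix; all ranges are assumed values."""
    angle = np.deg2rad(rng.uniform(-max_rot_deg, max_rot_deg))
    sx = rng.uniform(scale_range[0], scale_range[1])
    sy = rng.uniform(scale_range[0], scale_range[1])
    shear = rng.uniform(-max_shear, max_shear)
    tx, ty = rng.uniform(-max_shift, max_shift, size=2)

    rotation = np.array([[np.cos(angle), -np.sin(angle)],
                         [np.sin(angle), np.cos(angle)]])
    shear_m = np.array([[1.0, shear],
                        [0.0, 1.0]])
    scale_m = np.diag([sx, sy])

    # Linear part is rotation * shear * scale; translation is the last column.
    linear = rotation.dot(shear_m).dot(scale_m)
    return np.hstack([linear, np.array([[tx], [ty]])])


rng = np.random.RandomState(0)
matrix = random_affine(rng)

# Split sizes quoted in the status list.
splits = {"train": 15399, "test": 3501, "validation": 2100}
total = sum(splits.values())
for name in ("train", "test", "validation"):
    print("%s: %d images (%.1f%%)" % (name, splits[name], 100.0 * splits[name] / total))
```

The three splits add up to 21000 images, roughly a 73/17/10 division between training, testing, and validation.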
STATUS SO FAR
▪ Output of STN network
STATUS SO FAR
[Figure: block diagrams of the STN architecture and the recognition architecture, each built from Conv, Pool & Actv, Dense, and Actv blocks]
FUTURE WORK
▪ Validate the architecture on real data (taken from a classroom).
▪ Without training a new CNN model, compare recognition accuracy with
ImageNet-winning pre-trained models.
▪ Add 2D rotation invariance to face detection with the recent model.
THANK YOU!
