Computer Vision with Deep Learning
Introduction to Computer Vision
• Definition: Enabling machines to "see" by interpreting visual data (images or
videos).
• Applications: Face recognition, self-driving cars, medical imaging, surveillance,
AR/VR.
Key Topics:
• Image processing (image in, image out) vs. computer vision (image in, interpretation out)
• Visual pipeline: Image acquisition → Preprocessing → Feature extraction →
Interpretation
Digital Image Fundamentals
• Image Representation: Grayscale (1 channel), RGB (3 channels), resolution,
pixels
• Coordinate system: Origin (0,0) is the top-left pixel; x increases to the right and y increases downward. Height and width are measured in pixels
Core Concepts:
• Color models: RGB, HSV, CIE Lab
• Bit depth: Number of bits per channel (8-bit = 0–255)
• Image formats: JPEG (lossy), PNG (lossless), BMP, TIFF
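To make the representation concrete, here is a minimal sketch of a tiny RGB image stored as nested Python lists. The helper names (max_value, to_gray) and the pixel values are made up for illustration, and the grayscale weights are the common ITU-R BT.601 luma weights, which is one reasonable choice rather than the only one.

```python
# A tiny 2 x 3 RGB image: image[row][col] = (R, G, B), 8 bits per channel.
# Row 0 is the top row and column 0 is the left column,
# so image[0][0] is the top-left pixel.
image = [
    [(255, 0, 0), (0, 255, 0), (0, 0, 255)],
    [(0, 0, 0), (128, 128, 128), (255, 255, 255)],
]

height = len(image)       # number of rows
width = len(image[0])     # number of columns


def max_value(bit_depth):
    """Largest value one channel can hold at the given bit depth."""
    return 2 ** bit_depth - 1


def to_gray(pixel):
    """Collapse an (R, G, B) pixel to one channel using BT.601 luma weights."""
    r, g, b = pixel
    return round(0.299 * r + 0.587 * g + 0.114 * b)


# Grayscale version: same height and width, one channel per pixel.
gray = [[to_gray(p) for p in row] for row in image]
```

Libraries such as OpenCV or NumPy store images as arrays with the same row-first indexing, so image[y][x] is the usual access pattern.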
Image Processing Basics
• Goal: Improve image quality or extract basic features
Techniques:
• Filtering: Gaussian, Median, Sobel
• Thresholding: Binary, Adaptive
• Morphological operations: Dilation, Erosion
• Edge detection: Canny, Laplacian, Sobel
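Two of the techniques above can be sketched in a few lines of plain Python. This is an illustrative sketch only: the function names (threshold, filter3x3) and the sample image are assumptions, the filter handles only interior pixels, and a real pipeline would normally use a library such as OpenCV.

```python
def threshold(img, t):
    """Binary thresholding: pixels above t become 255, all others 0."""
    return [[255 if v > t else 0 for v in row] for row in img]


# Sobel kernel that responds to horizontal intensity changes (vertical edges).
SOBEL_X = [
    [-1, 0, 1],
    [-2, 0, 2],
    [-1, 0, 1],
]


def filter3x3(img, kernel):
    """Slide a 3x3 kernel over the interior pixels (no padding)."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(1, h - 1):
        row = []
        for x in range(1, w - 1):
            acc = 0
            for ky in range(3):
                for kx in range(3):
                    acc += kernel[ky][kx] * img[y + ky - 1][x + kx - 1]
            row.append(acc)
        out.append(row)
    return out


# Dark left half, bright right half: one vertical edge in the middle.
img = [
    [10, 10, 200, 200],
    [10, 10, 200, 200],
    [10, 10, 200, 200],
    [10, 10, 200, 200],
]

binary = threshold(img, 100)
edges = filter3x3(img, SOBEL_X)
```

The Sobel response is large wherever the 3x3 window straddles the dark-to-bright boundary, which is the basis for gradient-based edge detectors such as Canny.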
Classical Feature Extraction
Before deep learning, features were hand-engineered.
Popular Techniques:
• SIFT (Scale-Invariant Feature Transform)
• SURF (Speeded-Up Robust Features)
• ORB (Oriented FAST and Rotated BRIEF)
• HOG (Histogram of Oriented Gradients)
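As a rough illustration of what "hand-engineered" means, the sketch below computes a single gradient-orientation histogram, which is the core idea behind HOG. It is heavily simplified: one cell covering the whole image, central-difference gradients, no bin interpolation, and no block normalization. The function name and the 9-bin default are assumptions for this example.

```python
import math


def orientation_histogram(img, bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations (0-180 degrees)."""
    h, w = len(img), len(img[0])
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Central differences approximate the image gradient.
            gx = img[y][x + 1] - img[y][x - 1]
            gy = img[y + 1][x] - img[y - 1][x]
            magnitude = math.hypot(gx, gy)
            # Fold the direction into [0, 180) so opposite directions share a bin.
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle // bin_width) % bins] += magnitude
    return hist


# Image with a single vertical edge: all gradient energy is horizontal.
img = [
    [10, 10, 200, 200],
    [10, 10, 200, 200],
    [10, 10, 200, 200],
    [10, 10, 200, 200],
]
hist = orientation_histogram(img)
```

For this image all of the gradient magnitude lands in the first bin (orientations near 0 degrees), which is the kind of compact shape cue a classical classifier would consume.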
Deep Learning for Computer Vision
• Why deep learning? Features are learned from data instead of being hand-designed, which typically improves accuracy when enough training data is available
Frameworks:
• TensorFlow / Keras
• PyTorch
• OpenCV (not a training framework; used for preprocessing and visualization)
Convolutional Neural Networks (CNNs)
CNNs are the backbone of modern computer vision.
CNN Architecture:
• Input Layer: Image tensor (H × W × C)
• Convolutional Layer: Filters that scan the image
• Activation Function: ReLU
• Pooling Layer: Max/Avg Pooling for downsampling
• Fully Connected Layer: Classification/Prediction
• Softmax: Converts the final scores into class probabilities
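The non-convolutional stages in the list above are simple enough to sketch directly. This is a minimal pure-Python sketch with made-up input values; the function names are chosen for this example, and a framework such as PyTorch or Keras would provide optimized equivalents.

```python
import math


def relu(feature_map):
    """ReLU activation: negative values become zero."""
    return [[max(0, v) for v in row] for row in feature_map]


def max_pool(feature_map, size=2):
    """Non-overlapping max pooling: keep the largest value in each size x size window."""
    h, w = len(feature_map), len(feature_map[0])
    pooled = []
    for y in range(0, h - size + 1, size):
        row = []
        for x in range(0, w - size + 1, size):
            window = [feature_map[y + dy][x + dx]
                      for dy in range(size) for dx in range(size)]
            row.append(max(window))
        pooled.append(row)
    return pooled


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


feature_map = [
    [-1, 2, 0, -3],
    [4, -5, 6, 1],
    [0, 0, -2, -2],
    [3, -1, -4, 7],
]
activated = relu(feature_map)
pooled = max_pool(activated)         # 4x4 map downsampled to 2x2
probs = softmax([2.0, 1.0, 0.1])     # example class scores
```

Pooling halves each spatial dimension here, and softmax preserves the ranking of the scores while making them sum to 1.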
Key Terms:
• Padding
• Stride
• Filter/kernel size
• Feature maps
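The key terms interact through the standard output-size relation: for input size n, kernel size k, padding p and stride s, the output size is floor((n + 2p - k) / s) + 1. The sketch below applies that relation and implements a single-channel convolution in plain Python. The function names are assumptions for this example, and, as in most deep learning frameworks, the "convolution" here is technically cross-correlation (the kernel is not flipped).

```python
def conv_output_size(n, k, stride=1, padding=0):
    """Spatial output size of a convolution along one dimension."""
    return (n + 2 * padding - k) // stride + 1


def conv2d(img, kernel, stride=1, padding=0):
    """Single-channel 2D convolution with zero padding; returns one feature map."""
    k = len(kernel)
    h, w = len(img), len(img[0])
    # Zero-pad the input on all four sides.
    padded = [[0] * (w + 2 * padding) for _ in range(h + 2 * padding)]
    for y in range(h):
        for x in range(w):
            padded[y + padding][x + padding] = img[y][x]
    out_h = conv_output_size(h, k, stride, padding)
    out_w = conv_output_size(w, k, stride, padding)
    feature_map = []
    for oy in range(out_h):
        row = []
        for ox in range(out_w):
            acc = 0
            for ky in range(k):
                for kx in range(k):
                    acc += kernel[ky][kx] * padded[oy * stride + ky][ox * stride + kx]
            row.append(acc)
        feature_map.append(row)
    return feature_map


ones_img = [[1] * 4 for _ in range(4)]
ones_kernel = [[1] * 3 for _ in range(3)]

same = conv2d(ones_img, ones_kernel, stride=1, padding=1)     # 4x4 output
strided = conv2d(ones_img, ones_kernel, stride=2, padding=1)  # 2x2 output
```

With padding 1 and stride 1 a 3x3 kernel preserves the input size, while stride 2 roughly halves it. Corner outputs are smaller than center outputs because part of the window covers zero padding.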
Image Classification with CNN
• Task: Assign a label to an entire image
Workflow:
1. Prepare dataset (e.g., CIFAR-10, MNIST)
2. Preprocess data (normalize, resize)
3. Build model (CNN layers)
4. Compile the model (loss: categorical cross-entropy)
5. Train and evaluate
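The preprocessing, loss, and evaluation steps of this workflow can be sketched without any framework. The helper names and sample values below are assumptions for illustration; in practice Keras or PyTorch provide these operations, and the predictions would come from a trained CNN rather than being typed in by hand.

```python
import math


def normalize(pixels, max_value=255):
    """Scale 8-bit pixel values into the range [0, 1]."""
    return [p / max_value for p in pixels]


def one_hot(label, num_classes):
    """Encode an integer class label as a one-hot vector."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


def categorical_crossentropy(y_true, y_pred, eps=1e-12):
    """Cross-entropy between a one-hot target and predicted probabilities."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))


def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=values.__getitem__)


def accuracy(labels, predictions):
    """Fraction of samples whose most probable class matches the label."""
    correct = sum(1 for y, p in zip(labels, predictions) if y == argmax(p))
    return correct / len(labels)


scaled = normalize([0, 128, 255])
loss = categorical_crossentropy(one_hot(0, 3), [0.5, 0.25, 0.25])
acc = accuracy([0, 1], [[0.9, 0.1], [0.2, 0.8]])
```

Only the probability assigned to the true class contributes to the loss, so a confident correct prediction drives it toward zero.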
Transfer Learning
• Use pretrained models (e.g., VGG, ResNet, EfficientNet) trained on ImageNet
• Fine-tuning: Freeze the early layers (which capture generic features such as edges and textures) and retrain the later ones on your dataset
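The freezing mechanism itself is simple: frozen layers are skipped by the optimizer. The toy sketch below shows only that mechanism. The layer names, weights, and gradients are hypothetical placeholders, not values from any real pretrained model, and frameworks expose the same idea through a per-layer or per-parameter trainable flag.

```python
# Hypothetical "pretrained" parameters; a real model would load these from disk.
params = {
    "conv1": {"weights": [0.5, -0.2], "trainable": True},
    "conv2": {"weights": [0.1, 0.3], "trainable": True},
    "classifier": {"weights": [1.0, 1.0], "trainable": True},
}


def freeze(params, layer_names):
    """Mark the given layers as frozen so training leaves them untouched."""
    for name in layer_names:
        params[name]["trainable"] = False


def sgd_step(params, grads, lr=0.1):
    """One gradient-descent update that skips frozen layers."""
    for name, layer in params.items():
        if not layer["trainable"]:
            continue
        layer["weights"] = [w - lr * g
                            for w, g in zip(layer["weights"], grads[name])]


# Freeze the early layers and update only the classifier head.
freeze(params, ["conv1", "conv2"])
dummy_grads = {name: [1.0, 1.0] for name in params}  # placeholder gradients
sgd_step(params, dummy_grads)
```

After the step only the classifier weights have changed, which is the behavior fine-tuning relies on when the dataset is too small to retrain the whole network.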
Object Detection
• Goal: Locate and classify objects in an image
Approaches:
• Traditional: Sliding window + classifier
• Deep Learning:
o R-CNN, Fast R-CNN, Faster R-CNN
o YOLO (You Only Look Once)
o SSD (Single Shot MultiBox Detector)
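Detectors like those listed above typically produce many overlapping candidate boxes, which are then filtered using intersection over union (IoU) and non-maximum suppression (NMS). The sketch below is a minimal greedy version, assuming boxes are given as (x1, y1, x2, y2) corner coordinates; the sample boxes and scores are made up.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the best box, drop boxes that overlap a kept box too much."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
```

The second box overlaps the first with an IoU of about 0.68, so it is suppressed, while the distant third box survives.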
Semantic Segmentation
• Goal: Label each pixel with a class
Architectures:
• U-Net
• SegNet
• DeepLab
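Whatever the architecture, the output is usually a score per class at every pixel, and the predicted mask is the per-pixel argmax. The sketch below shows that step plus a per-class IoU, a common segmentation metric. The function names and the 2x2 score map are assumptions for illustration.

```python
def predict_mask(scores):
    """scores[y][x] is a list of per-class scores; return the best class per pixel."""
    return [[max(range(len(px)), key=px.__getitem__) for px in row]
            for row in scores]


def class_iou(pred, target, cls):
    """Intersection over union for one class across all pixels."""
    inter = union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            inter += (p == cls and t == cls)
            union += (p == cls or t == cls)
    return inter / union if union else float("nan")


# Two-class score map for a 2 x 2 image (made-up values).
scores = [
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.6, 0.4], [0.3, 0.7]],
]
target = [
    [0, 1],
    [1, 1],
]

mask = predict_mask(scores)
iou_class1 = class_iou(mask, target, 1)
```

Averaging class_iou over all classes gives mean IoU, which is often reported for segmentation benchmarks.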
Image Generation & GANs
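• Generative Adversarial Networks (GANs): Generate realistic images by training two networks against each other
• Components:
o Generator: Produces synthetic images from random noise
o Discriminator: Tries to distinguish real images from generated ones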
Applications:
• Image super-resolution
• Style transfer
• Data augmentation
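The adversarial setup can be summarized by its two loss functions. The sketch below uses the standard binary cross-entropy form with the commonly used non-saturating generator loss. The arguments d_real and d_fake are assumed to be the discriminator's output probabilities (strictly between 0 and 1) for a real and a generated image; the numbers are made up.

```python
import math


def discriminator_loss(d_real, d_fake):
    """Low when real images score near 1 and generated images score near 0."""
    return -(math.log(d_real) + math.log(1.0 - d_fake))


def generator_loss(d_fake):
    """Non-saturating generator loss: low when the discriminator is fooled."""
    return -math.log(d_fake)


# Discriminator is currently winning: it rates the fake image as unlikely real.
d_loss_strong = discriminator_loss(d_real=0.9, d_fake=0.1)
g_loss_weak = generator_loss(d_fake=0.1)

# Generator has improved: the fake image now looks plausible.
d_loss_fooled = discriminator_loss(d_real=0.9, d_fake=0.6)
g_loss_better = generator_loss(d_fake=0.6)
```

The two losses pull in opposite directions: as the generator's loss falls, the discriminator's rises, which is the adversarial dynamic training tries to balance.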
Vision Transformers (ViTs)
• An alternative to CNNs built on self-attention instead of convolution
• Treat image patches as tokens (like NLP)
• Examples: ViT, Swin Transformer
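The "patches as tokens" step can be sketched as follows: the image is cut into non-overlapping square patches and each patch is flattened into a vector. This sketch covers only the splitting for a single-channel image; a real ViT would then apply a learned linear projection and add position embeddings. The function name is an assumption for this example.

```python
def image_to_patches(img, patch):
    """Split an image into non-overlapping patch x patch tiles, each flattened to a token."""
    h, w = len(img), len(img[0])
    if h % patch or w % patch:
        raise ValueError("image size must be divisible by the patch size")
    tokens = []
    for py in range(0, h, patch):          # patches are read row by row
        for px in range(0, w, patch):
            tokens.append([img[py + dy][px + dx]
                           for dy in range(patch) for dx in range(patch)])
    return tokens


# 4 x 4 image whose pixel value encodes its position (row * 4 + col).
img = [[r * 4 + c for c in range(4)] for r in range(4)]
tokens = image_to_patches(img, 2)

# Token count for a 224 x 224 image with 16 x 16 patches.
num_tokens_224 = (224 // 16) ** 2
```

A 224 x 224 image with 16 x 16 patches yields 196 tokens, so the sequence length seen by the attention layers is set by the patch size rather than the pixel count.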
Self-Supervised and Contrastive Learning
• Learn useful representations without labels
• Examples: SimCLR, MoCo, BYOL
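Contrastive methods such as SimCLR and MoCo train with an InfoNCE-style loss: two augmented views of the same image (a positive pair) should have similar embeddings, and views of different images (negatives) should not. The sketch below computes that loss for one anchor. The embeddings and the temperature value are made up, and BYOL does not use negative pairs, so this sketch does not describe it.

```python
import math


def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.5):
    """InfoNCE-style loss: low when the anchor is closer to the positive than to negatives."""
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))


anchor = [1.0, 0.0]

# Good representation: the positive view lies close to the anchor.
good_loss = info_nce(anchor, [0.9, 0.1], [[0.0, 1.0], [-1.0, 0.0]])

# Poor representation: the positive view is no closer than a negative.
bad_loss = info_nce(anchor, [0.0, 1.0], [[0.9, 0.1], [-1.0, 0.0]])
```

No labels appear anywhere in the computation; the supervision comes entirely from knowing which views were derived from the same image.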
Real-Time Computer Vision
• Techniques for deploying vision models efficiently:
o Quantization
o Pruning
o Optimized inference runtimes and exchange formats (e.g., TensorRT, ONNX)
o Edge deployment (e.g., Jetson Nano, Coral)
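Quantization, the first technique above, can be sketched as an affine mapping from floating-point weights to 8-bit integers. This is a minimal post-training sketch under simple assumptions (one scale and zero point for the whole tensor, unsigned range 0 to 255); the function names and weights are made up, and deployment toolchains use more elaborate calibration.

```python
def quantize(values, num_bits=8):
    """Affine quantization of floats to unsigned integers; returns (q, scale, zero_point)."""
    qmin, qmax = 0, 2 ** num_bits - 1
    # Include 0.0 in the range so that zero maps to an integer code.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin) or 1.0   # fall back to 1.0 if all values are zero
    zero_point = round(qmin - lo / scale)
    q = [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point


def dequantize(q, scale, zero_point):
    """Map integer codes back to approximate float values."""
    return [(x - zero_point) * scale for x in q]


weights = [-1.0, 0.0, 0.5, 1.0]
q, scale, zero_point = quantize(weights)
restored = dequantize(q, scale, zero_point)
```

Each weight now needs 8 bits instead of 32, at the cost of a reconstruction error of at most about one quantization step (scale) per value.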