CS5670: Intro to Computer Vision
(Cornell Tech)
Depth from a single image
Visualizing scenes from tourist photos
Reconstructing dynamic 3D scenes
DynIBaR: Neural Dynamic Image-Based Rendering [https://dynibar.github.io/]
Zhengqi Li, Qianqian Wang, Forrester Cole, Richard Tucker, Noah Snavely
CVPR 2023
Today
1. What is computer vision?
2. Why study computer vision?
3. Course overview
4. Images & image filtering [time permitting]
Today
• Readings
– Szeliski, Chapter 1 (Introduction)
Every image tells a story
• Goal of computer vision: perceive the “story” behind the picture
• Compute properties of the world
– 3D shape
– Names of people or objects
– What happened?
The goal of computer vision
Can computers match human perception?
• Yes and no (mainly no)
– Computers can be better at “easy” things
– Humans are better at “hard” things
• But huge progress
– Accelerating in the last five years due to deep learning
– What is considered “hard” keeps changing
Human perception has its shortcomings
https://twitter.com/pickover/status/1460275132958662657/
But humans can tell a lot about a scene from a little information…
Source: “80 million tiny images” by Torralba, et al.
The goal of computer vision
• Compute the 3D shape of the world
ZED 2i Camera
The goal of computer vision
• Recognize objects and people
Terminator 2, 1991
slide credit: Fei-Fei, Fergus & Torralba
Example labeled scene: sky, building, flag, face, banner, wall, street lamp, buses, cars
slide credit: Fei-Fei, Fergus & Torralba
The goal of computer vision
• “Enhance” images
The goal of computer vision
• Forensics
Source: Nayar and Nishino, “Eyes for Relighting”
The goal of computer vision
• Improve photos (“Computational Photography”)
Super-resolution (source: 2d3)
Depth of field on cell phone camera (source: Google Research Blog)
Removing objects (Google Magic Eraser)
Low-light photography (credit: Hasinoff et al., SIGGRAPH Asia 2016)
April 10, 2019
Why study computer vision?
• Billions of images/videos captured per day
• Huge number of potential applications
• The next slides show the current state of the art
Optical character recognition (OCR)
• If you have a scanner, it probably came with OCR software
Digit recognition, AT&T Labs (1990s)
http://yann.lecun.com/exdb/lenet/
License plate readers
http://en.wikipedia.org/wiki/Automatic_number_plate_recognition
Sudoku grabber
http://sudokugrab.blogspot.com/
Automatic check processing
Face detection
• Nearly all cameras detect faces in real time
– (Why?)
Face analysis and recognition
Vision-based biometrics
Who is she? Source: S. Seitz
Vision-based biometrics
“How the Afghan Girl was Identified by Her Iris Patterns” Read the story
Source: S. Seitz
Login without a password
Fingerprint scanners on many new smartphones and other devices
Face unlock on Apple iPhone X
See also http://www.sensiblevision.com/
New York Times, Jan. 18, 2020, by Kashmir Hill
Bird identification
Merlin Bird ID (based on Cornell Tech technology!)
Special effects: shape capture
The Matrix movies, ESC Entertainment, XYZRGB, NRC
Source: S. Seitz
Special effects: motion capture
Pirates of the Caribbean, Industrial Light and Magic
Source: S. Seitz
3D face tracking with consumer cameras
Snapchat Lenses
Face2Face system (Thies et al.)
Image synthesis
Karras, et al., Progressive Growing of GANs for Improved Quality, Stability, and Variation, ICLR
Which face is real?
https://www.whichfaceisreal.com/
Image synthesis
“An astronaut riding a horse in a photorealistic style” – DALL-E 2
“A photo of a Corgi dog riding a bike in Times Square. It is wearing sunglasses and a beach hat” – Imagen
Sports
Sportvision first down line
Explanation on www.howstuffworks.com
Source: S. Seitz
Smart cars
• Mobileye
• Tesla Autopilot
• Safety features in many cars
Self-driving cars
Waymo
Robotics
NASA’s Mars Curiosity Rover
https://en.wikipedia.org/wiki/Curiosity_(rover)
Amazon Picking Challenge
http://www.robocup2016.org/en/events/amazon-picking-challenge/
Amazon Prime Air, Amazon Scout
Medical imaging
3D imaging (MRI, CT)
Skin cancer classification with deep learning
https://cs.stanford.edu/people/esteva/nature/
Virtual & Augmented Reality
6DoF head tracking
Hand & body tracking
3D scene understanding
3D-360 video capture
Current state of the art
• You just saw many examples of current systems.
– Many of these are less than 5 years old
• Computer vision is an active and rapidly changing research area
– Expect many new applications in the next 5 years
– Deep learning and generative methods power many modern applications
• Many startups across a dizzying array of areas
– Generative AI, robotics, autonomous vehicles, medical imaging, construction, inspection, VR/AR, …
Why is computer vision difficult?
Viewpoint variation
Credit: Flickr user michaelpaul
Scale
Illumination
Why is computer vision difficult?
Motion (Source: S. Lazebnik)
Intra-class variation
Background clutter
Occlusion
Challenges: local ambiguity
slide credit: Fei-Fei, Fergus & Torralba
But there are lots of visual cues we can use…
Source: S. Lazebnik
Bottom line
• Perception is an inherently ambiguous problem
– Many different 3D scenes could have given rise to a given 2D image
– We often must use prior knowledge about the world’s structure
Artist Julian Beever with his anamorphic Coke bottle (image source: F. Durand)
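The ambiguity above can be made concrete with an idealized pinhole camera. This is a minimal illustration, not course code: it assumes a camera at the origin looking down +Z with perspective projection x = f·X/Z, y = f·Y/Z and no lens distortion (a model the slides have not introduced yet), and `project` is a hypothetical helper name. Under that model, sliding any 3D point along its viewing ray leaves its image position unchanged, which is the kind of effect anamorphic street art relies on.

```python
import numpy as np


def project(points_3d, f=1.0):
    """Pinhole projection of Nx3 camera-space points (Z > 0) to Nx2 image points.

    Hypothetical helper: assumes an ideal camera at the origin looking down +Z,
    focal length f, and no lens distortion.
    """
    points_3d = np.asarray(points_3d, dtype=float)
    X, Y, Z = points_3d[:, 0], points_3d[:, 1], points_3d[:, 2]
    return np.stack([f * X / Z, f * Y / Z], axis=1)


# Scene A: three points in front of the camera.
scene = np.array([[1.0, 2.0, 4.0],
                  [-1.0, 0.5, 2.0],
                  [0.0, -1.0, 5.0]])

# Scene B: the same layout, but 10x larger and 10x farther away.
bigger_and_farther = 10.0 * scene

# Scene C: each point slid independently along its own viewing ray.
along_rays = scene * np.array([[0.5], [3.0], [7.0]])

img_a = project(scene)
print(np.allclose(img_a, project(bigger_and_farther)))  # True
print(np.allclose(img_a, project(along_rays)))          # True
```

Because the division by Z discards depth along each ray, three different 3D scenes produce one and the same 2D image; choosing among them requires extra assumptions about the world.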
CS5670: Introduction to Computer Vision
• Project-based course whose goal is to teach you the basics of computer vision (image processing, geometry, recognition) in a hands-on way
Course requirements
• Prerequisites
– Data structures
– Good working knowledge of Python programming
– Linear algebra
– Vector calculus
• Course does not assume prior imaging experience
– computer vision, image processing, graphics, etc.
Course overview (tentative)
1. Low-level vision
– image processing, edge detection, feature detection, cameras, image formation
2. Geometry & appearance
– projective geometry, stereo, structure from motion, optimization, lighting & materials
3. Recognition & generative models
– object classification, deep learning,
1. Low-level vision
• Basic image processing and image formation
Filtering (convolution, *), edge detection
Feature extraction
Image formation
Project: Hybrid images
Project: Feature detection and matching
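The filtering operation on this slide (written with *) can be previewed with a short NumPy sketch. It is an illustration under stated assumptions rather than the course's implementation: grayscale images as 2D float arrays, an odd-sized kernel, zero padding at the borders, and hypothetical helper names `filter2d` and `convolve2d`. `filter2d` computes cross-correlation; `convolve2d` flips the kernel first, which is what distinguishes convolution. For a symmetric kernel such as the 3×3 box (mean) filter used here, the two give identical results.

```python
import numpy as np


def filter2d(image, kernel):
    """Cross-correlate a 2D image with an odd-sized 2D kernel (zero-padded borders).

    Hypothetical helper for illustration; output has the same shape as the input.
    """
    image = np.asarray(image, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(image, ((ph, ph), (pw, pw)), mode="constant")
    out = np.zeros_like(image)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            # Weighted sum of the kernel-sized window centered on pixel (i, j).
            window = padded[i:i + kh, j:j + kw]
            out[i, j] = np.sum(window * kernel)
    return out


def convolve2d(image, kernel):
    """Convolution: flip the kernel in both axes, then cross-correlate."""
    return filter2d(image, np.flip(kernel))


box = np.ones((3, 3)) / 9.0  # 3x3 mean filter: a simple blur

image = np.zeros((5, 5))
image[2, 2] = 9.0  # a single bright pixel

blurred = convolve2d(image, box)
# The bright pixel is spread into a 3x3 block of (approximately) 1.0 around (2, 2).
print(np.round(blurred, 3))
```

The explicit double loop is slow but makes the sliding-window idea visible; in practice one would reach for a library routine such as `scipy.signal.convolve2d`.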
2. Geometry & appearance
Image credit: IDS Imaging
Projective geometry
Stereo vision
Multi-view stereo
Structure from motion
Project: Creating panoramas
Project: 3D reconstruction
3. Recognition, Deep Learning & Generative Models
Image classification (e.g., “dog”)
Convolutional Neural Networks
Image generation (e.g., “a class watching a computer vision lecture at Cornell Tech”)
Project: Neural Radiance Fields (NeRFs)
Questions?