Jürgen Schmidhuber
The Swiss AI Lab IDSIA
Univ. Lugano & SUPSI
http://www.idsia.ch/~juergen
True Artificial
Intelligence Will
Change Everything
NNAISENSE
Jürgen Schmidhuber
(pronounced roughly: You_again Shmidhoobuh)
According to Nature’s
1999 Millennium Issue
Fritz Haber
Nobel 1918
Carl Bosch
½ Nobel 1931
Haber-Bosch process: extracts
nitrogen from thin air to make
fertilizer – else 1 in 2 people (soon
2 in 3) wouldn't exist. Nature called it
the "detonator of the population explosion"
Most Influential Invention
of the 20th Century?
The AI explosion of
the 21st century will
be far bigger still
Konrad Zuse 1941
First working
general computer
Every 5 years,
computing gets 10 times cheaper:
75 years ≅ a factor of 10¹⁵
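The arithmetic behind that factor (simple compounding, spelled out here for reference rather than quoted from the slide):

```latex
% one order of magnitude every 5 years, compounded over 75 years:
\underbrace{10 \times 10 \times \cdots \times 10}_{75/5 \;=\; 15 \text{ steps}} \;=\; 10^{15}
```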
Reinforcement-learn & improve the
learning algorithm itself, and also
the meta-learning algorithm, etc.
My diploma thesis (1987):
first concrete design of
recursively self-improving AI
http://people.idsia.ch/~juergen/metalearner.html
SINCE 1991
SCHMIDHUBER
THE SWISS AI LAB
JÜRGEN
IDSIA - USI & SUPSI
NNAISENSE
Father of Deep Learning
A. G. Ivakhnenko, since 1965
Deep multilayer perceptrons with
polynomial activation functions
Incremental layer-wise training by
regression analysis - learn
numbers of layers and units per
layer - prune superfluous units
8 layers already back in 1971
still used in the 2000s
Supervised Backpropagation (BP)
Continuous BP in Euler-Lagrange calculus + dynamic
programming: Bryson 1961, Kelley 1960. BP through the
chain rule only: Dreyfus 1962. 'Modern BP' in sparse,
discrete, NN-like nets: Linnainmaa 1970. Weight
changes: Dreyfus 1973. Automatic differentiation:
Speelpenning 1980. BP applied to NNs: Werbos 1982.
Experiments & internal representations: Rumelhart et al.
1986. RNNs: e.g., Williams, Werbos, Robinson, 1980s...
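For reference, the core shared by all these BP variants is the chain rule applied backwards through the network (schematic textbook form; matrix transposes and ordering suppressed):

```latex
% loss L, activations a^{(j)} of layer j, output layer K:
\frac{\partial L}{\partial W^{(k)}}
  \;=\; \frac{\partial L}{\partial a^{(K)}}
  \left( \prod_{j=k+1}^{K} \frac{\partial a^{(j)}}{\partial a^{(j-1)}} \right)
  \frac{\partial a^{(k)}}{\partial W^{(k)}}
```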
RNNaissance of the deepest NNs:
RNNs are general computers
Learn program = weight matrix
http://www.idsia.ch/~juergen/rnn.html
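The "learn program = weight matrix" view in one generic textbook equation (my notation, not from the slide): the recurrent weights are the stored program, the hidden state is the working memory.

```latex
h_t = \tanh\!\left(W\,h_{t-1} + U\,x_t + b\right), \qquad y_t = g\!\left(V\,h_t\right)
```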
http://www.idsia.ch/~juergen/firstdeeplearner.html
Schmidhuber 1991: first very deep
learner. Unsupervised pretraining
for hierarchical temporal memory:
a stack of RNNs → history
compression → speed-up of
supervised learning. Compare the
feedforward NN case:
autoencoder stacks (Ballard
1987) and Deep Belief NNs
(Hinton et al. 2006)
With Hochreiter (1997), Gers (2000), Graves, Fernandez, Gomez, Bayer…
1997-2009. Since 2015 on your phone! Google, Microsoft, IBM, Apple, all use LSTM now
http://www.idsia.ch/~juergen/rnn.html
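For reference, the standard LSTM cell with forget gates (Gers et al. 2000), in common textbook notation:

```latex
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) && \text{(forget gate)}\\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) && \text{(input gate)}\\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) && \text{(output gate)}\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tanh(W_c x_t + U_c h_{t-1} + b_c)\\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
```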
Examples of recent benchmark records with LSTM
RNNs / CTC, often at major IT companies:
1. Large vocabulary speech recognition (Sak et al., Google, Interspeech 2014)
2. English to French translation (Sutskever et al., Google, NIPS 2014)
3. CTC RNNs break Switchboard record (Hannun et al., Baidu, 2014)
4. Text-to-speech synthesis (Fan et al., Microsoft, 2014, Zen et al., Google, 2015)
5. Prosody contour prediction (Fernandez et al., IBM, Interspeech 2014)
6. Google Voice improved by 49% (Sak et al, 2015, now for billions of users)
7. Syntactic parsing for NLP (Vinyals et al., Google, 2014-15)
8. Photo-real talking heads (Soong and Wang, Microsoft, ICASSP 2015)
9. Quicktype for iOS (Apple, 2016)
10. Image caption generation (Vinyals et al., Google, 2014)
11. Keyword spotting (Chen et al., Google, ICASSP 2015)
12. Video to textual description (Donahue et al., 2014; Li Yao et al., 2015)
http://www.idsia.ch/~juergen/rnn.html
World’s most valuable
public companies are
massively using LSTM
… for best speech recognition,
machine translation, image
captions, chat bots, etc, etc.
2015-2016: our LSTM / CTC is
now available to billions of users
on smartphones, PCs,…
Our Deep GPU-Based Max-Pooling CNN (IJCAI 2011)
e.g., http://www.idsia.ch/~juergen/deeplearning.html
Alternating convolutional and subsampling layers (CNN): Fukushima 1979. Backprop for shift-invariant 1D
CNN or TDNN: Waibel et al, 1987, for 2D CNN: LeCun et al 1989. Max-pooling (MP): Weng 1993. Backprop
for MPCNNs: Ranzato et al 2007, Scherer et al 2010, GPU-MPCNN - Ciresan et al, IDSIA, 2011
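A minimal NumPy sketch of the MP (max-pooling) step alone, illustrative only; the actual GPU-MPCNN is a full CUDA implementation of convolution plus pooling:

```python
import numpy as np

def max_pool_2x2(feature_map):
    """Non-overlapping 2x2 max-pooling: the MP in MPCNN."""
    h, w = feature_map.shape
    assert h % 2 == 0 and w % 2 == 0, "sketch handles even dims only"
    # Group the map into 2x2 blocks, then take each block's maximum.
    return feature_map.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

x = np.arange(16.0).reshape(4, 4)
print(max_pool_2x2(x))  # [[ 5.  7.]
                        #  [13. 15.]]
```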
Traffic Sign Contest, Silicon Valley, 2011:
our GPU-MPCNN was twice as good as humans,
3 times better than the closest artificial competitor,
6 times better than the best non-neural method: first place
Ernst
Dickmanns,
the robot
car pioneer,
Munich, 80s
1995: Munich to
Denmark and
back on public
Autobahns, up to
180 km/h, no
GPS, passing
other cars
Robot Cars
2014: 20th anniversary of
self-driving cars in highway traffic
Our Deep Learner
Won ISBI 2012 Brain
Image Segmentation
Contest:
First feedforward
Deep Learner to win
an image
segmentation
competition
(but compare deep
recurrent LSTM
2009: segmentation
& classification)
Our Deep Learner Won ICPR
2012 Contest on Mitosis
Detection: First pure Deep
Learner to win a contest on
object detection (in large
images). Very fast MPCNN
scans: Masci, Giusti, Ciresan,
Gambardella, Schmidhuber,
ICIP 2013
Thanks to Dan Ciresan & Alessandro Giusti
http://www.idsia.ch/~juergen/deeplearningwinsMICCAIgrandchallenge.html
Image caption generation with LSTM RNNs translating internal representations of CNNs
(Vinyals, Toshev, Bengio, Erhan, Google, 2014)
Highway
Networks
The very similar, later ResNet is a feedforward LSTM
(each layer computes a function of x plus x) without gates;
won ImageNet (Microsoft, 150 layers)
“Feedforward LSTM” (in each layer a function of x plus x)
with forget gates; to train NNs with hundreds of layers:
Srivastava, Greff, Schmidhuber, May 2015, also NIPS 2015
http://people.idsia.ch/~juergen/microsoft-wins-imagenet-through-feedforward-LSTM-without-gates.html
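In symbols (standard notation from the Highway Networks paper; the ResNet line is the gateless special case described above):

```latex
\begin{aligned}
\text{Highway layer:}\quad & y = H(x, W_H)\odot T(x, W_T) \;+\; x \odot \bigl(1 - T(x, W_T)\bigr)\\
\text{ResNet layer:}\quad  & y = F(x) + x \qquad \text{(the gate } T \text{ removed)}
\end{aligned}
```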
LSTM concepts keep invading CNN territory
Best Segmentation with PyraMiD-LSTM (NIPS 2015)
Stollenga, Byeon,
Liwicki, Schmidhuber
Open Source Neural Networks Library by my PhD students K Greff and R Srivastava
http://people.idsia.ch/~juergen/brainstorm.html
LSTM learns knot-tying tasklets:
Mayr, Gomez, Wierstra, Nagy, Knoll,
Schmidhuber, IROS'06
Some of Our Deep Learning “Firsts”
• First very deep learner (1991-1993) – tasks with >1000 computational stages
• First neural learner of sequential attention (hard: 1991, soft: 1993)
• First self-referential RNNs that run their own learning algorithm (1993)
• First very deep supervised learner (LSTM, 1990s-2006 and beyond)
• First recurrent NN to win international contests (2009)
• First NN to win connected handwriting contests (2009)
• First outperformance of humans in a computer vision contest (2011)
• First deep NN to win Chinese handwriting contest (2011)
• European handwriting (MNIST): old error record almost halved (2011)
• First deep NN to win image segmentation contest (May 2012)
• First deep NN to win object detection contest (2012)
• First deep NN to win medical imaging contest (2012)
• First feedforward NNs with >100 layers (Highway NNs, May 2015)
• First RNN controller that reinforcement learns from raw video (2013)
Board Games: Non-Human
World Champions
1994: Backgammon
TD-Gammon (learns), IBM
1997: Chess
Deep Blue (does not learn), IBM
2016: Go
AlphaGo (learns),
Google DeepMind
Google bought DeepMind for
600 M to do Machine Learning
(ML) & AI. First DeepMinders
with PhDs in ML & AI: my lab’s
ex-PhD students Legg (co-
founder) & Wierstra (#4).
Background of the other co-
founders: neurobiology &
video games (Hassabis) &
business (Suleyman)
DeepMind hired 2 more PhD
students of mine: Graves (e.g.,
CTC & NTM) & Schaul (RL, on our
2010 Atari-Go paper)
Finds Complex Neural Controllers with a Million Weights – RAW VIDEO INPUT!
Faustino Gomez, Jan Koutnik, Giuseppe Cuccu, J. Schmidhuber, GECCO 2013
Reinforcement Learning in
Partially Observable Worlds
J.S.: IJCNN 1990, NIPS 1991: Reinforcement Learning
with Recurrent Controller & Recurrent World Model
Learning
and
planning
with
recurrent
networks
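A runnable toy of that controller/world-model planning loop (my illustration under stated assumptions: the linear "model" and "controller" here are hypothetical stand-ins; the 1990/1991 papers use recurrent networks for both):

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy learned components (stand-ins, not the original architectures):
W_model = rng.normal(size=(4, 6)) * 0.1   # maps [obs, action] -> next obs
w_reward = rng.normal(size=6) * 0.1       # maps [obs, action] -> reward
W_ctrl = rng.normal(size=(2, 4)) * 0.1    # controller: obs -> action

def imagine_rollout(obs, horizon=10):
    """Plan inside the model: roll the controller forward using the
    world model's predictions instead of the real environment."""
    total = 0.0
    for _ in range(horizon):
        action = np.tanh(W_ctrl @ obs)
        oa = np.concatenate([obs, action])
        total += float(w_reward @ oa)   # model-predicted reward
        obs = np.tanh(W_model @ oa)     # model-predicted next observation
    return total

print(imagine_rollout(rng.normal(size=4)))
```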
IJNS 1991: R-Learning of Visual Attention
on 100,000 times slower computers
http://people.idsia.ch/~juergen/attentive.html
1991: current goal = extra fixed input
2015: all of this is coming back!
RoboCup World Champion 2004, Fastest League, 5 m/s
Alex @ IDSIA, led
FU Berlin’s RoboCup
World Champion
Team 2004
Lookahead expectation & planning with neural networks
(Schmidhuber, IEEE INNS 1990): successfully used for
RoboCup by Alexander Gloye-Förster (went to IDSIA)
http://www.idsia.ch/~juergen/learningrobots.html
RNNAIssance
2014-2015
On Learning to
Think: Algorithmic
Information
Theory for Novel
Combinations of
Reinforcement
Learning RNN-based
Controllers
(RNNAIs) and
Recurrent Neural
World Models
http://arxiv.org/abs/1511.09249
Formal theory of fun & novelty &
surprise & attention & creativity &
curiosity & art & science & humor
Maximize future Fun(Data X, O(t)) ~ ∂ CompResources(X, O(t)) / ∂t
(intrinsic reward ~ the rate of improvement in compressing X)
E.g., Connection Science 18(2):173-187, 2006
IEEE Transactions AMD 2(3):230-247, 2010
http://www.idsia.ch/~juergen/creativity.html
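One common discrete-time reading of that derivative, paraphrasing the cited 2010 paper rather than quoting it:

```latex
% intrinsic reward at time t: the bits saved when the observer's
% compressor improves from O(t-1) to O(t) on the history X
r_{\mathrm{int}}(t) \;=\; C\bigl(X, O(t-1)\bigr) \;-\; C\bigl(X, O(t)\bigr)
```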
https://www.youtube.com/watch?v=OTqdXbTEZpE
Continual curiosity-driven skill
acquisition from high-dimensional
video inputs for humanoid robots.
Kompella, Stollenga, Luciw,
Schmidhuber. Artificial Intelligence,
2015
PowerPlay not only solves but also continually
invents problems at the borderline between what's
known and unknown - training an increasingly
general problem solver by continually searching for
the simplest still unsolvable problem
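A deliberately tiny, runnable toy of that loop (my illustration, not the formal PowerPlay framework; tasks here are just integers ordered by size):

```python
def powerplay_toy(steps=5):
    """Toy PowerPlay cycle: repeatedly find the simplest task the current
    solver cannot handle, extend the solver, and verify that no
    previously solved task broke."""
    solver = set()  # the 'solver' just memorizes integers it can solve
    for _ in range(steps):
        # simplest still-unsolvable problem: smallest integer not yet solved
        task = next(n for n in range(10**6) if n not in solver)
        candidate = solver | {task}                  # proposed new solver
        assert all(t in candidate for t in solver)   # old skills preserved
        solver = candidate
        print(f"invented & solved task {task}; repertoire = {sorted(solver)}")

powerplay_toy()
```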
Jürgen Schmidhuber
The Swiss AI Lab IDSIA
Univ. Lugano & SUPSI
http://www.idsia.ch/~juergen
True Artificial
Intelligence Will
Change Everything
NNAISENSE
Next: build small
animal-like AI that
learns to think and
plan hierarchically
like a crow or a
capuchin monkey
Evolution
needed billions
of years for this,
then only a few
more millions
for humans
Now talking to investors:
NNAISENSE, neural networks-based
artificial intelligence
Finally: my beautiful simple
pattern (discovered in 2014)
of exponential acceleration of
the most important events in
the history of the universe
from a human perspective
http://www.reddit.com/r/MachineLearning/comments/2xcyrl/i_am_j%C3%BCrgen_schmidhuber_ama/cozd9ju
Ω = 2050 or so - 13.8 B years: Big Bang
Ω - 3.5 B years: first life on Earth
Ω - 0.9 B years: first animal-like mobile life
Ω - 220 M years: first mammals (your ancestors)
Ω - 55 M years: first primates (your ancestors)
Ω - 13 M years: first hominids (your ancestors)
Ω - 3.5 M years: first stone tools (dawn of technology)
Ω - 850 K years: controlled fire (next big tech breakthrough)
Ω - 210 K years: first anatomically modern man (your ancestors)
Ω - 50 K years: behaviorally modern man colonizing earth
Ω - 13 K years: neolithic revolution, agriculture, domestic animals
Ω - 3.3 K years: iron age, 1st population explosion starts
Ω - 800 years: first guns & rockets (in China)
Ω - 200 years: industrial revolution, 2nd population explosion starts
Ω - 50 years: digital nervous system, WWW, cell phones for all
Ω - 12 years: cheap small computers with 1 brain power?
Ω - 3 years: ??
Ω - 9 months: ????
Ω - 2 months: ????????
Ω - 2 weeks: ???????????????? ….
Ω - 1/4 of this time: …
(repeated over and over: each remaining interval is a quarter of the previous one, so the event times converge to Ω)
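Why those intervals pile up at a finite date (elementary geometric series; the interpretation of Ω as a limit point is from the slide, the algebra is standard):

```latex
% if each new event arrives after 1/4 of the time still remaining,
% the remaining time after n more events shrinks geometrically:
T_n = T_0 \left(\tfrac{1}{4}\right)^{n} \xrightarrow{\;n \to \infty\;} 0,
\qquad \text{so all events occur before the finite date } \Omega .
```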
True Artificial Intelligence Will Change Everything
MANY OF MY WEB PAGES AND TALK SLIDES
ARE GRAPHICALLY STRUCTURED THROUGH
FIBONACCI NUMBER RATIOS RELATED TO
THE “GOLDEN” OR “DIVINE” PROPORTION.
MANY ARTISTS CLAIM THE HUMAN EYE
PREFERS THIS RATIO OVER OTHERS
THE RATIOS OF SUBSEQUENT
FIBONACCI NUMBERS: 1/1, 1/2,
2/3, 3/5, 5/8, 8/13, 13/21, 21/34 ...
CONVERGE TO THE
HARMONIC PROPORTION
½ (√5 − 1) = 0.618034...,
DIVIDING THE UNIT
INTERVAL INTO SEGMENTS OF
LENGTHS A AND B (A + B = 1)
SUCH THAT A/B = B
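A quick check of that convergence (illustrative snippet, not from the slides):

```python
# Ratios of consecutive Fibonacci numbers approach (sqrt(5) - 1) / 2.
a, b = 1, 1
for _ in range(30):
    a, b = b, a + b
print(a / b)               # 0.6180339887...
print((5 ** 0.5 - 1) / 2)  # 0.6180339887...
```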
BTW, MOST OF MY
SLIDES ARE DESIGNED
THROUGH RECURSIVE
“GOLDEN” HARMONIC
PROPORTIONS
True Artificial Intelligence Will Change Everything