(1) This document provides a quick tour of machine learning concepts including the components, types, and step-by-step process of machine learning.
(2) It discusses machine learning applications in areas like credit approval, education, recommender systems, and reinforcement learning.
(3) The tour outlines the key components of a machine learning problem including the target function, training data, learning algorithm, hypothesis set, and learned hypothesis. It also distinguishes between supervised, unsupervised, and semi-supervised learning problems.
- The document discusses a lecture on machine learning given by Ravi Gupta and G. Bharadwaja Kumar.
- Machine learning allows computers to automatically improve at tasks through experience. It is used for problems where the output is unknown and computation is expensive.
- Machine learning involves training a decision function or hypothesis on examples to perform tasks like classification, regression, and clustering. The training experience and representation impact whether learning succeeds.
- Choosing how to represent the target function, how to select training examples, and how to update weights to improve performance are key issues in designing machine learning systems.
This document provides an introduction to machine learning concepts including regression analysis, similarity and metric learning, Bayes classifiers, clustering, and neural networks. It discusses techniques such as linear regression, K-means clustering, naive Bayes classification, and backpropagation in neural networks. Code examples and exercises are provided to help readers learn how to apply these machine learning algorithms.
This document discusses important lessons for human learning from big data. It emphasizes that asking good questions is key, and that simple machine learning models should be tried before complex ones. It also stresses the importance of encoding human domain knowledge and intelligence into the data through feature construction to improve machine learning performance.
This document summarizes a talk on practical machine learning issues. It discusses identifying the right machine learning scenario for a given task, such as classification, regression, clustering, or reinforcement learning. It also addresses common reasons why machine learning models may fail, such as using the wrong evaluation metrics, not having enough labeled training data, or not performing proper feature engineering. The document emphasizes the importance of choosing the appropriate machine learning model, having sufficient high-quality data, and selecting useful features.
An Introduction to Supervised Machine Learning and Pattern Classification: Th... (Sebastian Raschka)
The document provides an introduction to supervised machine learning and pattern classification. It begins with an overview of the speaker's background and research interests. Key concepts covered include definitions of machine learning, examples of machine learning applications, and the differences between supervised, unsupervised, and reinforcement learning. The rest of the document outlines the typical workflow for a supervised learning problem, including data collection and preprocessing, model training and evaluation, and model selection. Common classification algorithms like decision trees, naive Bayes, and support vector machines are briefly explained. The presentation concludes with discussions around choosing the right algorithm and avoiding overfitting.
An introductory course on building ML applications, with a primary focus on supervised learning. Covers the typical ML application cycle: problem formulation, data definitions, offline modeling, and platform design. It also includes key tenets for building applications.
Note: This is an old slide deck. The content on building internal ML platforms is a bit outdated and slides on the model choices do not include deep learning models.
Lecture 01: Machine Learning for Language Technology - Introduction (Marina Santini)
This document provides an introduction to a machine learning course being taught at Uppsala University. It outlines the schedule, reading list, assignments, and examination. The course covers topics like decision trees, linear models, ensemble methods, text mining, and unsupervised learning. It discusses the differences between supervised and unsupervised learning, as well as classification, regression, and other machine learning techniques. The goal is to introduce students to commonly used methods in natural language processing.
Machine Learning and Real-World Applications (MachinePulse)
This presentation was created by Ajay, Machine Learning Scientist at MachinePulse, to present at a Meetup on Jan. 30, 2015. These slides provide an overview of widely used machine learning algorithms. The slides conclude with examples of real world applications.
Ajay Ramaseshan is a Machine Learning Scientist at MachinePulse. He holds a Bachelor's degree in Computer Science from NITK Surathkal and a Master's in Machine Learning and Data Mining from Aalto University School of Science, Finland. He has extensive experience in the machine learning domain and has dealt with various real-world problems.
This slide deck gives a brief overview of supervised, unsupervised and reinforcement learning. Algorithms discussed are Naive Bayes, k-nearest neighbour, SVM, decision tree, and the Markov model.
It also covers the difference between regression and classification, the difference between supervised and reinforcement learning, the iterative functioning of Markov models, and machine learning applications.
Introduction and designing a learning system (swapnac12)
The document discusses machine learning and provides definitions and examples. It covers the following key points:
- Machine learning is a subfield of artificial intelligence concerned with developing algorithms that allow computers to learn from data without being explicitly programmed.
- Well-posed learning problems have a defined task, performance measure, and training experience. Examples given include learning to play checkers and recognize handwritten words.
- Designing a machine learning system involves choosing a training experience, target function, representation of the target function, and learning algorithm to approximate the function. A checkers-playing example is used to illustrate these design decisions.
Machine learning is the science of getting computers to act without being explicitly programmed. In the past decade, machine learning has given us self-driving cars, practical speech recognition, effective web search, and a vastly improved understanding of the human genome. Machine learning is so pervasive today that you probably use it dozens of times a day without knowing it.
We will review some modern machine learning applications, understand a variety of machine learning problem definitions, and go through particular approaches to solving machine learning tasks.
In 2015, Amazon and Microsoft introduced services to perform machine learning tasks in the cloud. Microsoft Azure Machine Learning offers a streamlined experience for all data scientist skill levels, from setting up with only a web browser, to using drag-and-drop gestures and simple data flow graphs to set up experiments.
We will briefly review Azure ML Studio features and run a machine learning experiment.
Introduction to machine learning and model building using linear regression (Girish Gore)
A basic introduction to machine learning and a kick-start to the model-building process using linear regression. Covers the fundamentals of the data science field called machine learning, focusing on the fundamental supervised learning method of linear regression. Importantly, it does this using the R language and shows how to interpret the results of a linear regression model. Interpretation of results, tuning, and accuracy metrics like RMSE (Root Mean Squared Error) are covered here.
This document provides an overview of classification in machine learning. It discusses supervised learning and the classification process. It describes several common classification algorithms including k-nearest neighbors, Naive Bayes, decision trees, and support vector machines. It also covers performance evaluation metrics like accuracy, precision and recall. The document uses examples to illustrate classification tasks and the training and testing process in supervised learning.
Machine Learning, AI, Deep Learning, Statistics, Data Mining... in short, these are all the buzzwords of the moment, but what lies behind them?
Through concrete examples, we will walk through the different approaches to Machine Learning and the main families of algorithms (don't worry: without going into the heart of their implementations), then the tools and frameworks available to Data Scientists... and to finish, we will try to predict the future!
Salon Data - Nantes - 19 September 2017
https://salondata.fr/2017/07/12/0930-1030-ml/
The document discusses recommender systems and sequential recommendation problems. It covers several key points:
1) Matrix factorization and collaborative filtering techniques are commonly used to build recommender systems, but they have limitations such as cold-start problems and difficulty incorporating additional constraints.
2) Sequential recommendation problems can be framed as multi-armed bandit problems, where past recommendations influence future recommendations.
3) Various bandit algorithms like UCB, Thompson sampling, and LinUCB can be applied, but extending guarantees to models like matrix factorization is challenging. Offline evaluation on real-world datasets is important.
Dictionary Learning for Massive Matrix Factorization (recsysfr)
The document presents a new algorithm called Subsampled Online Dictionary Learning (SODL) for solving very large matrix factorization problems with missing values efficiently. SODL adapts an existing online dictionary learning algorithm to handle missing values by only using the known ratings for each user, allowing it to process large datasets with billions of ratings in linear time with respect to the number of known ratings. Experiments on movie rating datasets show that SODL achieves similar prediction accuracy as the fastest existing solver but with a speed up of up to 6.8 times on the largest Netflix dataset tested.
1. Machine learning is a branch of artificial intelligence concerned with algorithms that allow computers to learn from data without being explicitly programmed.
2. A major focus is automatically learning patterns from training data to make intelligent decisions on new data. This is challenging since the set of all possible behaviors given all inputs is too large to observe completely.
3. Machine learning is applied in areas like search engines, medical diagnosis, stock market analysis, and game playing by developing algorithms that improve automatically through experience. Decision trees, Bayesian networks, and neural networks are common algorithms.
This document provides an overview of machine learning concepts and techniques including linear regression, logistic regression, unsupervised learning, and k-means clustering. It discusses how machine learning involves using data to train models that can then be used to make predictions on new data. Key machine learning types covered are supervised learning (regression, classification), unsupervised learning (clustering), and reinforcement learning. Example machine learning applications are also mentioned such as spam filtering, recommender systems, and autonomous vehicles.
MS CS - Selecting Machine Learning Algorithm (Kaniska Mandal)
ML algorithms usually solve an optimization problem: we need to find parameters for a given model that minimize
- the loss function (prediction error)
- the model complexity (a regularization term that favors simpler models)
Machine Learning basics
The document provides an introduction to machine learning concepts including:
- Machine learning algorithms can learn from data to estimate functions and make predictions.
- Key components of machine learning systems include datasets, models, objective functions, and optimization algorithms.
- Popular machine learning tasks include classification, regression, clustering, and dimensionality reduction.
- Classical machine learning methods like decision trees, k-nearest neighbors, and support vector machines aim to generalize from training data but struggle with high-dimensional or complex problems.
- Modern deep learning methods address these challenges through representation learning, stochastic gradient descent, and the ability to learn from large amounts of data using many parameters.
Fundamentals of Machine Learning and Deep Learning (ParrotAI)
An introduction to machine learning and deep learning for beginners. Learn the applications of machine learning and deep learning and how they can solve different problems.
Machine Learning: Foundations Course Number 0368403401 (butest)
This machine learning foundations course will consist of 4 homework assignments with both theoretical and programming problems in Matlab. There will be a final exam. Students will work in groups of 2-3 to take notes during classes in LaTeX format; these class notes will contribute 30% to the overall grade. The course will cover basic machine learning concepts like storage and retrieval, learning rules, estimating flexible models, and applications in areas like control, medical diagnosis, and document retrieval.
The document provides an overview of machine learning. It defines machine learning as algorithms that can learn from data to optimize performance and make predictions. It discusses different types of machine learning including supervised learning (classification and regression), unsupervised learning (clustering), and reinforcement learning. Applications mentioned include speech recognition, autonomous robot control, data mining, playing games, fault detection, and clinical diagnosis. Statistical learning and probabilistic models are also introduced. Examples of machine learning problems and techniques like decision trees and naive Bayes classifiers are provided.
Machine learning techniques can be used to build software applications that learn from data. Decision trees are a popular machine learning method that can be used to classify data and make predictions. Decision trees work by splitting a dataset into subgroups based on attribute values and building a tree-like model of decisions. Overfitting can be a problem for decision trees, where the model learns the training data too well and does not generalize to new data. Techniques like reduced-error pruning can help avoid overfitting by pruning branches from the tree that do not improve predictions on a validation dataset.
Machine learning involves developing systems that can learn from data and experience. The document discusses several machine learning techniques including decision tree learning, rule induction, case-based reasoning, supervised and unsupervised learning. It also covers representations, learners, critics and applications of machine learning such as improving search engines and developing intelligent tutoring systems.
This document discusses multimodal learning analytics (MLA), which examines learning through multiple modalities like video, audio, digital pens, etc. It provides examples of extracting features from these modalities to analyze problem solving, expertise levels, and presentation quality. Key challenges of MLA are integrating different modalities and developing tools to capture real-world learning outside online systems. While current accuracy is limited, MLA is an emerging field that could provide insights beyond traditional learning analytics.
This course guides those who are unfamiliar with, but curious about, data analysis through the full process of using the R language: collecting data, interpreting it through exploratory analysis, and performing text mining to uncover meaning hidden beneath the data that is invisible to the naked eye. It is designed for those with a basic knowledge of R who want to become more familiar with hands-on analysis; by the end of the course you should be more comfortable with R as a rich analytical tool. Using a dataset of Apple Daily charitable donations, you will learn how to parse web pages from scratch and write crawlers to collect data automatically; how to clean, integrate, and explore the data once collected; and how to use off-the-shelf packages for text mining and text analysis. We will walk step by step through the data analysis process, processing, observing, and deconstructing the data, to see which factors influence people's donation decisions and how such findings are mined from the data.
In this era when data science is all the rage, an engineer curious about new technology will naturally want to expand their toolkit and learn new data analysis tools. R, a scripting language developed by statisticians specifically for data exploration and analysis, backed by a large open-source community and tens of thousands of packages of every kind, is a top choice among today's data science tools.
However, R's design logic differs from that of typical programming languages, and engineers' prior experience with other languages often becomes an obstacle to learning R. This course starts from the basics of R so that, through lectures and interactive hands-on sessions, students can thoroughly understand R's core concepts, learn how to ask questions of data with R, and write efficient, highly readable R code from a data analysis perspective.
Generative Adversarial Networks (GANs) are clearly the next hot topic in deep learning. Yann LeCun called them "the most interesting idea in the last 10 years in ML" and "the coolest thing since sliced bread". What problem do GANs solve? In machine learning, the tasks of regression and classification are by now well understood, but getting machines to go further and create structured, complex objects (for example, images or sentences) remains a major challenge. With GANs, machines can already draw convincingly realistic human faces, generate a matching image from a text description, and even draw anime-style character portraits (the character portraits on the left were generated by a machine). This course introduces GANs, one of the frontier techniques of deep learning.
This document discusses applying data mining techniques to analyze active users on Reddit. It defines active users as those who posted or commented in at least 5 subreddits and have at least 5 posts/comments in each subreddit. The preprocessing steps extract over 25,000 active users and their posts from the raw Reddit data. K-means clustering is then used to cluster the active users into 10 groups based on their activities to gain insights into different types of active users on Reddit.
This document provides an introduction to exploring and visualizing data using the R programming language. It discusses the history and development of R, introduces key R packages like tidyverse and ggplot2 for data analysis and visualization, and provides examples of reading data, examining data structures, and creating basic plots and histograms. It also demonstrates more advanced ggplot2 concepts like faceting, mapping variables to aesthetics, using different geoms, and combining multiple geoms in a single plot.
(1) The document provides a quick tour of machine learning concepts including definitions of machine learning, components of machine learning problems, different types of machine learning problems, and the general step-by-step process for machine learning.
(2) It defines machine learning as using data to compute a hypothesis that improves some performance measure, and discusses common machine learning applications like classification, regression, and recommendation systems.
(3) The document outlines the key components of a machine learning problem including the input data, output labels, target function to be learned, hypothesis set, and learning algorithm.
This document provides an outline for a talk on machine learning and support vector machines. It begins with an introduction to machine learning, including the goal of allowing computers to learn from examples without being explicitly programmed. It then discusses different types of machine learning problems, including supervised learning problems where labeled training data is provided. Support vector machines are introduced as a method for supervised learning classification and regression tasks by finding optimal separating hyperplanes in feature spaces. The document outlines kernels and how they can be used to map data to higher dimensions to allow for linear separation. Polynomial and Gaussian kernels are briefly described. Applications mentioned include natural language processing, data mining, speech recognition, and web classification.
The document provides an overview of machine learning, including definitions of machine learning, the differences between programming and machine learning, examples of machine learning applications, and descriptions of various machine learning algorithms and techniques. It discusses supervised learning methods like classification and regression. Unsupervised learning methods like clustering are also covered. The document outlines the machine learning process and provides cautions about machine learning.
A Few Useful Things to Know about Machine Learning (nep_test_account)
1. Machine learning algorithms can automatically learn programs from data by generalizing from examples, which is often more feasible and cost-effective than manual programming. However, developing successful machine learning applications requires expertise beyond what textbooks provide.
2. Machine learning consists of three main components: representation, evaluation, and optimization. Choosing appropriate combinations of these components is key to building effective learners.
3. The goal of machine learning is generalization to new examples, not just accuracy on the training data. Strict separation of training and test data is necessary to evaluate generalization performance.
This document provides an introduction to machine learning, including examples of applications, types of data, and problem formulations. Some key applications discussed are web page ranking by search engines, spam filtering, recommendation systems, computer vision, and natural language processing. The introduction outlines different types of data commonly used in machine learning like text, images, and numerical data. It then describes machine learning problems at a high level, focusing on classification and regression. The document lays the groundwork for exploring machine learning concepts and algorithms in more depth later.
This document provides an introduction to machine learning, including examples of applications, types of data, and problem formulations. It discusses how machine learning is used in applications like web search ranking and spam filtering. It also outlines the basic goals of machine learning problems, which aim to build models from sample data that can predict or describe unseen data. Finally, it provides an overview of the rest of the document, which introduces probability and statistical tools, basic algorithms, and later chapters that discuss more advanced techniques.
This document provides an introduction to machine learning. It begins with examples of machine learning applications including search engines, collaborative filtering, automatic translation, and face recognition. It then discusses the different types of data used in machine learning like text, images, user preferences. Finally, it outlines some common machine learning problems like classification, regression, clustering. The introduction sets the stage for discussing probability, algorithms, and other machine learning foundations.
This document provides an introduction to machine learning. It begins with examples of machine learning applications including search engines, collaborative filtering, automatic translation, and face recognition. It then discusses the different types of data used in machine learning like text, images, user preferences. Finally, it outlines some common machine learning problems like classification, regression, clustering. The document sets the stage for explaining basic machine learning concepts and algorithms.
This document provides an introduction to machine learning, including examples of applications, types of data, and problem formulations. It discusses how machine learning is used in applications like web search ranking and spam filtering. It also outlines the basic goals of machine learning problems, which aim to build models from sample data in order to make predictions or decisions without being explicitly programmed. The introduction provides an overview of machine learning and sets the stage for further technical discussions in later chapters.
This document provides an introduction to machine learning. It begins with examples of machine learning applications including web search ranking, collaborative filtering, automatic translation, and face recognition. It then discusses the different types of data used in machine learning like text, images, user preferences. Finally, it outlines some common machine learning problems like classification, regression, clustering. The document sets the stage for explaining basic machine learning concepts and algorithms.
Our Technology Lead Cory Zibell gave a presentation about machine learning, covering the algorithms, processes, techniques, and modules it entails. It's meant for anyone to grasp; check it out!
The document discusses machine learning and provides information about several key concepts:
1) Machine learning allows computer systems to learn from data without being explicitly programmed by using statistical techniques to identify patterns in large amounts of data.
2) There are three main approaches to machine learning: supervised learning which uses labeled data to build predictive models, unsupervised learning which finds patterns in unlabeled data, and reinforcement learning which learns from success and failures.
3) Effective machine learning requires balancing model complexity, amount of training data, and ability to generalize to new examples in order to avoid underfitting or overfitting the data. Learning algorithms aim to minimize these risks.
Machine learning is a type of artificial intelligence (AI) that provides computers with the ability to learn without being explicitly programmed. Machine learning focuses on the development of computer programs that can teach themselves to grow and change when exposed to new data.
Introduction to Machine Learning (Aristotelis Tsirigos, butest)
This document provides an introduction to machine learning, covering several key concepts:
- Machine learning aims to build models from data to make predictions without being explicitly programmed.
- There are different types of learning problems including supervised, unsupervised, and reinforcement learning.
- Popular machine learning algorithms discussed include Bayesian learning, nearest neighbors, decision trees, linear classifiers, and ensembles.
- Proper evaluation of machine learning models is important using techniques like cross-validation.
Machine Learning: Foundations Course Number 0368403401butest
This machine learning course will cover theoretical and practical machine learning concepts. It will include 4 homework assignments and programming in Matlab. Lectures will be supplemented by student-submitted class notes in LaTeX. Topics will include learning approaches like storage and retrieval, rule learning, and flexible model estimation, as well as applications in areas like control, medical diagnosis, and web search. A final exam format has not been determined yet.
The document provides an overview of machine learning algorithms and concepts, including:
- Supervised learning algorithms like regression and classification that use labeled training data to predict target values or categories. Unsupervised learning algorithms like clustering that find hidden patterns in unlabeled data.
- Popular Python libraries for machine learning like NumPy, SciPy, Matplotlib, and Scikit-learn that make implementing algorithms more convenient.
- Examples of supervised and unsupervised learning using a toy that teaches a child to sort shapes or find patterns without explicit labeling of data.
- Definitions of artificial intelligence, machine learning, and deep learning, and how they relate to each other.
This document provides an introduction to machine learning, including definitions, examples of tasks well-suited to machine learning, and different types of machine learning problems. It discusses how machine learning algorithms learn from examples to produce a program or model, and contrasts this with hand-coding programs. It also briefly covers supervised vs. unsupervised vs. reinforcement learning, hypothesis spaces, regularization, validation sets, Bayesian learning, and maximum likelihood learning.
This document provides an overview of machine learning concepts and techniques. It discusses supervised learning methods like classification and regression using algorithms such as naive Bayes, K-nearest neighbors, logistic regression, support vector machines, decision trees, and random forests. Unsupervised learning techniques like clustering and association are also covered. The document contrasts traditional programming with machine learning and describes typical machine learning processes like training, validation, testing, and parameter tuning. Common applications and examples of machine learning are also summarized.
This document is a presentation by Ted Chang about creating new opportunities for Taiwan's intelligent transformation. It discusses paradigm shifts in technology such as mobile phones and cloud computing. It introduces concepts like the Internet of Things, artificial intelligence, and how they can be combined. It argues that key driving forces for the future will be machine learning, big data, cloud computing and AI. The presentation envisions applications of these technologies in areas like future medicine and smart manufacturing. It ends by emphasizing the importance of wisdom and intelligence in shaping the future.
- The document discusses how artificial intelligence can enable earlier and safer medicine.
- It provides background on the author and their expertise in biomedical informatics and roles as editor-in-chief of several academic journals.
- Key applications of AI in healthcare discussed include using machine learning on large medical datasets to detect suspicious moles earlier, reduce medication errors, and more accurately predict cancer occurrence up to 12 months in advance.
- The author argues that AI has the potential to transform medicine by enabling more preventive and earlier detection approaches compared to traditional reactive healthcare models.
Jane may be able to help. Let me check with her personal assistant Jane-ML.
Meera checks with Jane-ML
User-Agent Interaction (V)
PA_Meera: Mina, do you have trouble in debugging?
Mina: Yes, is there anyone who has done this?
Personal Agent [Meera]
Jane-ML: Jane has done a similar debugging problem before. She is available now and willing to help.
1) Kaggle is the largest platform for AI and data science competitions, acquired by Google in 2017. It has been used by companies like Bosch, Mercedes, and Asus for challenges like improving production lines, accelerating testing processes, and component failure prediction.
2) The document discusses the author's experiences winning silver medals in Kaggle competitions involving camera model identification, passenger screening algorithms, and pneumonia detection. For camera model identification, the author used transfer learning with InceptionResNetV2 and high-pass filters to identify camera models from images.
3) For passenger screening, the author modified a 2D CNN to 3D and used 3D data augmentation to rank in the top 7% of the $1
[Taiwan AI Academy] Bridging AI to Precision Agriculture through IoT (Taiwan Data Science Conference)
The document describes a system for precision agriculture using IoT. It involves sensors collecting environmental data from fields and feeding it to a control board connected to actuators like irrigation systems. The data is also sent to an IoTtalk engine and AgriTalk server in the cloud for analysis and remote access/control through an AgriGUI interface. Equations were developed to estimate nutrient levels like nitrogen from sensor readings to help optimize crop growth.
The document discusses Open Robot Club and includes several links to its website and YouTube videos. It provides information on the club's computing resources like NVIDIA V100 GPUs. Tables with metrics like underkill and overkill percentages are included for different types of tasks like AI AOI and PCB inspection. The club's website and demos are referenced throughout.
Internal Architecture of Database Management Systems (M Munim)
A Database Management System (DBMS) is software that allows users to define, create, maintain, and control access to databases. Internally, a DBMS is composed of several interrelated components that work together to manage data efficiently, ensure consistency, and provide quick responses to user queries. The internal architecture typically includes modules for query processing, transaction management, and storage management. This assignment delves into these key components and how they collaborate within a DBMS.
Understanding Tree Data Structure and Its Applications (M Munim)
A Tree Data Structure is a widely used hierarchical model that represents data in a parent-child relationship. It starts with a root node and branches out to child nodes, forming a tree-like shape. Each node can have multiple children but only one parent, except for the root, which has none. Trees are efficient for organizing and managing data, especially when quick searching, inserting, or deleting is needed. Common types include binary trees, binary search trees (BST), heaps, and tries. A binary tree allows each node to have up to two children, while a BST maintains sorted order for fast lookup. Trees are used in various applications like file systems, databases, compilers, and artificial intelligence. Traversal techniques such as preorder, inorder, postorder, and level-order help in visiting all nodes systematically. Trees are fundamental to many algorithms and are essential for solving complex computational problems efficiently.
The final presentation of our time series forecasting project for the "Data Science for Society and Business" Master's program at Constructor University Bremen
15 Benefits of Data Analytics in Business Growth.pdf (AffinityCore)
Explore how data analytics boosts business growth with insights that improve decision-making, customer targeting, operations, and long-term profitability.
Ethical Frameworks for Trustworthy AI – Opportunities for Researchers in Huma... (Karim Baïna)
Artificial Intelligence (AI) is reshaping societies and raising complex ethical, legal, and geopolitical questions. This talk explores the foundations and limits of Trustworthy AI through the lens of global frameworks such as the EU’s HLEG guidelines, UNESCO’s human rights-based approach, OECD recommendations, and NIST’s taxonomy of AI security risks.
We analyze key principles like fairness, transparency, privacy, robustness, and accountability — not only as ideals, but in terms of their practical implementation and tensions. Special attention is given to real-world contexts such as Morocco’s deployment of 4,000 intelligent cameras and the country’s positioning in AI readiness indexes. These examples raise critical issues about surveillance, accountability, and ethical governance in the Global South.
Rather than relying on standardized terms or ethical "checklists", this presentation advocates for a grounded, interdisciplinary, and context-aware approach to responsible AI — one that balances innovation with human rights, and technological ambition with social responsibility.
This rich Trustworthy and Responsible AI framework context is a serious opportunity for human and social sciences researchers: should they operate as gatekeepers, reinforcing existing ethical constraints, or become revolutionaries, pioneering new paradigms that redefine how AI interacts with society, knowledge production, and policymaking?
egc.pdf - English-language materials for high school (THPT) students (huyenmy200809)
[Event Series] Machine Learning Course
1. Machine Learning
Yuh-Jye Lee
Lab of Data Science and Machine Intelligence
Dept. of Applied Math. at NCTU
Feb. 12, 2017
2. Outline
1 Introduction to Machine Learning
Some Examples
Basic concept of learning theory
2 Three Fundamental Algorithms
3 Optimization
4 Support Vector Machine
5 Evaluation and Closing Remarks
3. The Plan of My Lecture
Focus mainly on Supervised Learning (30 minutes)
Many examples
Basic concept of learning theory
Will give you three basic algorithms (80 minutes)
k-Nearest Neighbor
Naive Bayes Classifier
Online Perceptron Algorithm
Brief introduction to Optimization (90 minutes)
Support Vector Machines (90 minutes)
Evaluation and Closing Remarks (70 minutes)
4. Some Examples
AlphaGo and Master
5. Some Examples
Mayhem Wins DARPA Cyber Grand Challenge
6. Some Examples
Supervised Learning Problems
Assumption: training instances are drawn independently from an unknown but fixed probability distribution P(x, y).
Our learning task: given a training set S = {(x1, y1), (x2, y2), ..., (x_ℓ, y_ℓ)}, we would like to construct a rule f(x) that can correctly predict the label y given an unseen x.
If f(x) ≠ y then we get some loss or penalty, for example ℓ(f(x), y) = (1/2)|f(x) − y|.
Learning examples: classification, regression and sequence labeling.
If y is drawn from a finite set it will be a classification problem. The simplest case, y ∈ {−1, +1}, is called the binary classification problem.
If y is a real number it becomes a regression problem.
More generally, y can be a vector with each element drawn from a finite set; this is the sequence labeling problem.
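For y ∈ {−1, +1} this loss is exactly the 0-1 loss: 0 for a correct prediction, 1 for a mistake. A tiny Python sketch of my own (not from the slides) to make that concrete:

def zero_one_loss(fx, y):
    # loss(f(x), y) = |f(x) - y| / 2; with f(x) and y in {-1, +1} this is
    # 0 for a correct prediction and 1 for a misclassification
    return abs(fx - y) / 2

assert zero_one_loss(+1, +1) == 0.0  # correct: no penalty
assert zero_one_loss(-1, +1) == 1.0  # wrong side: penalty of 1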
7. Some Examples
Binary Classification Problem (A Fundamental Problem in Data Mining)
Find a decision function (classifier) to discriminate two categories of data.
Supervised learning in Machine Learning: Decision Tree, Deep Neural Network, k-NN and Support Vector Machines, etc.
Discrimination Analysis in Statistics: Fisher Linear Discriminator
Successful applications: Cyber Security, Marketing, Bioinformatics, Fraud detection
8. Some Examples
Bankruptcy Prediction: Solvent vs. Bankrupt (A Binary Classification Problem)
Deutsche Bank Dataset
40 financial indicators (the x part) from 422 middle-market capitalization firms in Benelux.
74 firms went bankrupt and 348 were solvent (the y part).
The variables to be used in the model as explanatory inputs are 40 financial indicators such as liquidity, profitability and solvency measurements.
Machine Learning will identify the most important indicators.
W. Härdle, Y.-J. Lee, D. Schäfer and Y.-R. Yeh, "Variable selection and oversampling in the use of smooth support vector machines for predicting the default risk of companies", Journal of Forecasting, vol. 28(6), pp. 512-534, 2009.
9. Some Examples
Binary Classification Problem
Given a training dataset S = {(xi, yi) | xi ∈ R^n, yi ∈ {−1, 1}, i = 1, ..., ℓ}
xi ∈ A+ ⇔ yi = 1 and xi ∈ A− ⇔ yi = −1
Main goal:
Predict the unseen class label for new data
Estimate the a posteriori probability of the class label: Pr(y = 1|x) > Pr(y = −1|x) ⇒ x ∈ A+
Find a function f : R^n → R by learning from data: f(x) ≥ 0 ⇒ x ∈ A+ and f(x) < 0 ⇒ x ∈ A−
The simplest function is linear: f(x) = w^T x + b
10. Basic concept of learning theory
Goal of Learning Algorithms
The early learning algorithms were designed to find an accurate fit to the training data; at that time, training sets were relatively small.
A classifier is said to be consistent if it performs the correct classification of the training data.
Please note that consistency is NOT our learning purpose.
The ability of a classifier to correctly classify data not in the training set is known as its generalization.
Bible code? 1994 Taipei Mayor election?
Predict the real future; do NOT just fit the data in your hand or predict the desired results.
11. Three Fundamental Algorithms
Naïve Bayes Classifier: based on Bayes' Rule
k-Nearest Neighbors Algorithm: a distance- and instance-based algorithm; lazy learning
Online Perceptron Algorithm: a mistake-driven algorithm; the smallest unit of Deep Neural Networks
12. Conditional Probability
Definition: the conditional probability of an event A, given that an event B has occurred, is
P(A|B) = P(A ∩ B) / P(B)
Example: suppose that a fair die is tossed once. Find the probability of a 1 (event A), given that an odd number was obtained (event B).
P(A|B) = P(A ∩ B) / P(B) = (1/6) / (1/2) = 1/3
This restricts the sample space to the event B.
13. Partition Theorem
Assume that {B1, B2, ..., Bk} is a partition of S such that P(Bi) > 0 for i = 1, 2, ..., k. Then
P(A) = Σ_{i=1}^{k} P(A|Bi) P(Bi)
Note that {B1, B2, ..., Bk} is a partition of S if
1 S = B1 ∪ B2 ∪ ... ∪ Bk
2 Bi ∩ Bj = ∅ for i ≠ j
(The slide shows S partitioned into B1, B2, B3, with A split into A ∩ B1, A ∩ B2, A ∩ B3.)
14. Bayes' Rule
Assume that {B1, B2, ..., Bk} is a partition of S such that P(Bi) > 0 for i = 1, 2, ..., k. Then
P(Bj|A) = P(A|Bj) P(Bj) / Σ_{i=1}^{k} P(A|Bi) P(Bi)
(The slide repeats the partition picture: S split into B1, B2, B3 and A into A ∩ B1, A ∩ B2, A ∩ B3.)
15. Naïve Bayes for Classification
Also good for multi-class classification.
Estimate the a posteriori probability of the class label.
Let each attribute (variable) be a random variable. What is the probability
Pr(y = 1|x) = Pr(y = 1 | X1 = x1, X2 = x2, ..., Xn = xn)?
Naïve Bayes makes TWO not entirely reasonable assumptions:
The importance of each attribute is equal.
All attributes are conditionally independent, so that
Pr(y = 1|x) = (1 / Pr(X = x)) Pr(y = 1) Π_{i=1}^{n} Pr(Xi = xi | y = 1)
16. The Weather Data Example
(Ian H. Witten & Eibe Frank, Data Mining)
Outlook    Temperature  Humidity  Windy  Play(Label)
Sunny      Hot          High      False  -1
Sunny      Hot          High      True   -1
Overcast   Hot          High      False  +1
Rainy      Mild         High      False  +1
Rainy      Cool         Normal    False  +1
Rainy      Cool         Normal    True   -1
Overcast   Cool         Normal    True   +1
Sunny      Mild         High      False  -1
Sunny      Cool         Normal    False  +1
Rainy      Mild         Normal    False  +1
Sunny      Mild         Normal    True   +1
Overcast   Mild         High      True   +1
Overcast   Hot          Normal    False  +1
Rainy      Mild         High      True   -1
17. Probabilities for Weather Data
(Using Maximum Likelihood Estimation)

Attribute value      Play = Yes   Play = No
Outlook: Sunny          2/9          3/5
Outlook: Overcast       4/9          0/5
Outlook: Rainy          3/9          2/5
Temperature: Hot        2/9          2/5
Temperature: Mild       4/9          2/5
Temperature: Cool       3/9          1/5
Humidity: High          3/9          4/5
Humidity: Normal        6/9          1/5
Windy: True             3/9          3/5
Windy: False            6/9          2/5
Play (prior)            9/14         5/14

Likelihood of the two classes:
Pr(y = 1 | sunny, cool, high, T) ∝ 2/9 · 3/9 · 3/9 · 3/9 · 9/14
Pr(y = −1 | sunny, cool, high, T) ∝ 3/5 · 1/5 · 4/5 · 3/5 · 5/14
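A minimal Python sketch (my own illustration, not from the slides; the function name is hypothetical) that reproduces this computation directly from the weather table:

data = [  # (outlook, temperature, humidity, windy, label) from the previous slide
    ("Sunny","Hot","High",False,-1), ("Sunny","Hot","High",True,-1),
    ("Overcast","Hot","High",False,+1), ("Rainy","Mild","High",False,+1),
    ("Rainy","Cool","Normal",False,+1), ("Rainy","Cool","Normal",True,-1),
    ("Overcast","Cool","Normal",True,+1), ("Sunny","Mild","High",False,-1),
    ("Sunny","Cool","Normal",False,+1), ("Rainy","Mild","Normal",False,+1),
    ("Sunny","Mild","Normal",True,+1), ("Overcast","Mild","High",True,+1),
    ("Overcast","Hot","Normal",False,+1), ("Rainy","Mild","High",True,-1),
]

def naive_bayes_score(x, label):
    # unnormalized Pr(y = label | x): prior times the product of
    # per-attribute conditional probabilities Pr(Xi = xi | y = label)
    rows = [r for r in data if r[-1] == label]
    score = len(rows) / len(data)            # prior, e.g. 9/14 for y = +1
    for j, value in enumerate(x):
        score *= sum(r[j] == value for r in rows) / len(rows)
    return score

x = ("Sunny", "Cool", "High", True)
print(naive_bayes_score(x, +1))  # 2/9 * 3/9 * 3/9 * 3/9 * 9/14, about 0.0053
print(naive_bayes_score(x, -1))  # 3/5 * 1/5 * 4/5 * 3/5 * 5/14, about 0.0206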
18. Zero-frequency Problem
What if an attribute value does NOT occur with a class value?
The posterior probability will be zero, no matter how likely the other attribute values are!
The Laplace estimator fixes the zero-frequency problem: (k + λ) / (n + aλ)
Question: roll a die 8 times. The outcomes are 2, 5, 6, 2, 1, 5, 3, 6. What is the probability of showing 4?
Pr(X = 4) = (0 + λ) / (8 + 6λ),  Pr(X = 5) = (2 + λ) / (8 + 6λ)
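A quick sketch (my own; k, n, a and λ are the slide's symbols) of the Laplace estimate with λ = 1:

def laplace_estimate(k, n, a, lam=1.0):
    # smoothed probability (k + lam) / (n + a*lam), where k is the count of
    # the value, n the total number of observations, a the number of values
    return (k + lam) / (n + a * lam)

rolls = [2, 5, 6, 2, 1, 5, 3, 6]
print(laplace_estimate(rolls.count(4), len(rolls), 6))  # Pr(X = 4) = 1/14
print(laplace_estimate(rolls.count(5), len(rolls), 6))  # Pr(X = 5) = 3/14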
19. Instance-based Learning: k-nearest neighbor algorithm
Fundamental philosophy: two instances that are close or similar to each other should share the same label.
Also known as memory-based learning, since such methods store the training instances in a lookup table and interpolate from them.
This requires memory of O(N).
Given an input, similar instances must be found, and finding them requires computation of O(N).
Such methods are also called lazy learning algorithms, because they do NOT compute a model when given a training set but postpone the computation of the model until they are given a new test instance (query point).
20. k-Nearest Neighbors Classifier
Given a query point x_o, we find the k training points x^{(i)}, i = 1, 2, ..., k, closest in distance to x_o
Then classify using a majority vote among these k neighbors
Choosing k to be odd helps avoid ties; remaining ties are broken at random
If all attributes (features) are real-valued, we can use the Euclidean distance d(x, x_o) = \|x - x_o\|_2
If the attribute values are discrete, we can use the Hamming distance, which counts the number of nonmatching attributes:
d(x, x_o) = \sum_{j=1}^{n} 1(x_j ≠ x_o,j)
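A minimal MATLAB sketch of this classifier (the function name knn_predict is ours; it assumes real-valued attributes, binary labels in {+1, -1}, and an odd k so sign() does not return 0):

function yhat = knn_predict(Xtr, ytr, xq, k)
% Majority vote among the k nearest training points (Euclidean distance).
% Xtr: ell-by-n data, ytr: ell-by-1 labels in {+1,-1}, xq: 1-by-n query point.
d = sqrt(sum(bsxfun(@minus, Xtr, xq).^2, 2));  % distances to every training point
[~, idx] = sort(d);                            % neighbors ordered by distance
yhat = sign(sum(ytr(idx(1:k))));               % majority vote for binary labels
end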
21. 1-Nearest Neighbor Decision Boundary (Voronoi)
[Figure: the 1-NN decision boundary as a Voronoi tessellation of the training points.]
22. Distance Measure
Using different distance measures will give very different results in the k-NN algorithm
Be careful when you compute the distance
We might need to normalize the scales of different attributes, for example yearly income vs. daily spending
Typically we first standardize each attribute to have mean zero and variance 1:
\hat{x}_j = (x_j - μ_j) / σ_j
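In MATLAB this standardization is one line, assuming X is an ell-by-n data matrix with one attribute per column:

Xs = bsxfun(@rdivide, bsxfun(@minus, X, mean(X)), std(X));  % zero mean, unit variance per column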
23. Learning a Distance Measure
Find a distance function d(x_i, x_j) such that the distance is small if x_i and x_j belong to the same class, and large if they belong to different classes
Euclidean distance: \|x_i - x_j\|_2^2 = (x_i - x_j)^T (x_i - x_j)
Mahalanobis distance: d(x_i, x_j) = (x_i - x_j)^T M (x_i - x_j), where M is a positive semi-definite matrix
Factoring M = L^T L gives
(x_i - x_j)^T M (x_i - x_j) = (x_i - x_j)^T L^T L (x_i - x_j) = (L x_i - L x_j)^T (L x_i - L x_j)
The matrix L can be of size k × n with k << n
24. Linear Learning Machines
The simplest function is linear: f(x) = w^T x + b
Find this simplest function via an online, mistake-driven procedure
Update the weight vector and bias when there is a misclassified point
25. Binary Classification Problem (Linearly Separable Case)
[Figure: classes A+ (Benign) and A- (Malignant) separated by the hyperplane x^T w + b = 0, with bounding planes x^T w + b = +1 and x^T w + b = -1 and normal vector w.]
26. Online Learning
Definition of online learning:
Given a set of new training data, an online learner can update its model without reading the old data while improving its performance. In contrast, an offline learner must combine the old and new data and start learning all over again; otherwise the performance will suffer.
Online learning is considered a solution for large learning tasks
Usually requires several passes (or epochs) through the training instances
Needs to keep all instances unless we run the algorithm for only a single pass
27. Perceptron Algorithm (Primal Form)
Rosenblatt, 1956
Given a training dataset S, an initial weight vector w_0 = 0, and bias b_0 = 0; let R = max_{1 ≤ i ≤ ℓ} \|x_i\| and k = 0
Repeat:
  for i = 1 to ℓ
    if y_i(⟨w_k, x_i⟩ + b_k) ≤ 0 then
      w_{k+1} ← w_k + η y_i x_i
      b_{k+1} ← b_k + η y_i R^2
      k ← k + 1
    end if
  end for
Until no mistakes are made within the for loop
Return: k, (w_k, b_k).
What is k?
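A minimal MATLAB sketch of the primal form (the function name is ours; note the loop only terminates when the data are linearly separable):

function [w, b, k] = perceptron_primal(X, y, eta)
% X: ell-by-n data, y: ell-by-1 labels in {+1,-1}, eta: learning rate.
[ell, n] = size(X);
w = zeros(n, 1); b = 0; k = 0;
R = max(sqrt(sum(X.^2, 2)));          % R = max_i ||x_i||
mistake = true;
while mistake
    mistake = false;
    for i = 1:ell
        if y(i)*(X(i,:)*w + b) <= 0   % misclassified point: update
            w = w + eta*y(i)*X(i,:)';
            b = b + eta*y(i)*R^2;
            k = k + 1; mistake = true;
        end
    end
end
end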
28. Is y_i(⟨w_{k+1}, x_i⟩ + b_{k+1}) > y_i(⟨w_k, x_i⟩ + b_k)?
With w_{k+1} ← w_k + η y_i x_i and b_{k+1} ← b_k + η y_i R^2:
y_i(⟨w_{k+1}, x_i⟩ + b_{k+1}) = y_i(⟨w_k + η y_i x_i, x_i⟩ + b_k + η y_i R^2)
  = y_i(⟨w_k, x_i⟩ + b_k) + y_i (η y_i (⟨x_i, x_i⟩ + R^2))
  = y_i(⟨w_k, x_i⟩ + b_k) + η(⟨x_i, x_i⟩ + R^2)
where R = max_{1 ≤ i ≤ ℓ} \|x_i\|.
29. Perceptron Algorithm Stops in Finitely Many Steps
Theorem (Novikoff)
Let S be a non-trivial training set and let R = max_{1 ≤ i ≤ ℓ} \|x_i\|.
Suppose that there exists a vector w_opt such that \|w_opt\| = 1 and
y_i(⟨w_opt, x_i⟩ + b_opt) ≥ γ for 1 ≤ i ≤ ℓ.
Then the number of mistakes made by the online perceptron algorithm on S is at most (2R/γ)^2.
30. Perceptron Algorithm (Dual Form)
w = \sum_{i=1}^{ℓ} α_i y_i x_i
Given a linearly separable training set S; let α = 0 ∈ R^ℓ, b = 0, R = max_{1 ≤ i ≤ ℓ} \|x_i\|.
Repeat:
  for i = 1 to ℓ
    if y_i(\sum_{j=1}^{ℓ} α_j y_j ⟨x_j, x_i⟩ + b) ≤ 0 then
      α_i ← α_i + 1;  b ← b + y_i R^2
    end if
  end for
Until no mistakes are made within the for loop
Return: (α, b)
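A minimal MATLAB sketch of the dual form (function name ours); note that the data enter only through the Gram matrix, which is the point of the next slide:

function [alpha, b] = perceptron_dual(X, y)
% X: ell-by-n data, y: ell-by-1 column of labels in {+1,-1}.
ell = size(X, 1);
G = X*X';                              % Gram matrix, G(i,j) = <x_i, x_j>
alpha = zeros(ell, 1); b = 0;
R2 = max(sum(X.^2, 2));                % R^2 = max_i ||x_i||^2
mistake = true;
while mistake
    mistake = false;
    for i = 1:ell
        if y(i)*(G(i,:)*(alpha.*y) + b) <= 0
            alpha(i) = alpha(i) + 1;
            b = b + y(i)*R2;
            mistake = true;
        end
    end
end
end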
31. What Do We Get in the Dual Form of the Perceptron Algorithm?
The number of updates equals \sum_{i=1}^{ℓ} α_i = \|α\|_1 ≤ (2R/γ)^2
α_i > 0 implies that the training point (x_i, y_i) has been misclassified at least once during training
α_i = 0 implies that removing the training point (x_i, y_i) will not affect the final result
The training data appear in the algorithm only through the entries of the Gram matrix G ∈ R^{ℓ×ℓ}, defined by G_ij = ⟨x_i, x_j⟩
This is the key idea of the kernel trick in SVMs and all kernel methods
32. Outline
1 Introduction to Machine Learning
  Some Examples
  Basic concepts of learning theory
2 Three Fundamental Algorithms
3 Optimization
4 Support Vector Machine
5 Evaluation and Closing Remarks
33. You Have Learned (Unconstrained) Optimization in High School
Let f(x) = ax^2 + bx + c, a ≠ 0, and x* = -b/2a
Case 1: f''(x*) = 2a > 0  ⇒  x* ∈ arg min_{x ∈ R} f(x)
Case 2: f''(x*) = 2a < 0  ⇒  x* ∈ arg max_{x ∈ R} f(x)
For the minimization problem (Case 1):
f'(x*) = 0 is called the first order optimality condition
f''(x*) > 0 is the second order optimality condition
34. Gradient and Hessian
Let f : R^n → R be a differentiable function. The gradient of f at a point x ∈ R^n is defined as
∇f(x) = [∂f(x)/∂x_1, ∂f(x)/∂x_2, ..., ∂f(x)/∂x_n]^T ∈ R^n
If f : R^n → R is twice differentiable, the Hessian matrix of f at a point x ∈ R^n is defined as
∇²f(x) = [ ∂²f/∂x_i ∂x_j ]_{i,j = 1,...,n} ∈ R^{n×n}
35. Example of Gradient and Hessian
f(x) = x_1^2 + x_2^2 - 2x_1 + 4x_2 = (1/2) [x_1 x_2] [2 0; 0 2] [x_1; x_2] + [-2 4] [x_1; x_2]
∇f(x) = [2x_1 - 2, 2x_2 + 4]^T,  ∇²f(x) = [2 0; 0 2]
By setting ∇f(x) = 0, we have x* = [1, -2]^T ∈ arg min_{x ∈ R^2} f(x)
36. Quadratic Functions (Standard Form)
Let f : R^n → R with f(x) = (1/2) x^T H x + p^T x, where H ∈ R^{n×n} is a symmetric matrix and p ∈ R^n. Then
∇f(x) = Hx + p
∇²f(x) = H  (Hessian)
Note: if H is positive definite, then x* = -H^{-1} p is the unique solution of min f(x).
37. Least-squares Problem
min_{x ∈ R^n} \|Ax - b\|_2^2,  A ∈ R^{m×n}, b ∈ R^m
f(x) = (Ax - b)^T (Ax - b) = x^T A^T A x - 2 b^T A x + b^T b
∇f(x) = 2 A^T A x - 2 A^T b
∇²f(x) = 2 A^T A
If A^T A is a nonsingular matrix (hence positive definite), then
x* = (A^T A)^{-1} A^T b ∈ arg min_{x ∈ R^n} \|Ax - b\|_2^2
Note: x* is an analytical solution.
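A tiny MATLAB illustration of the analytical solution (the data here are made up):

A = [1 1; 1 2; 1 3]; b = [1; 2; 2];   % a small made-up example
xstar = (A'*A) \ (A'*b)               % solve the normal equations A'*A*x = A'*b
% in practice, x = A \ b is the numerically preferred way in MATLAB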
38. How to Solve an Unconstrained MP
Get an initial point and iteratively decrease the objective function value
Stop once a stopping criterion is satisfied
Steepest descent might not be a good choice
Newton's method is highly recommended:
  a local and quadratically convergent algorithm
  a good step size is needed to guarantee global convergence
39. The First Order Taylor Expansion
Let f : R^n → R be a differentiable function. Then
f(x + d) = f(x) + ∇f(x)^T d + α(x, d)\|d\|,  where lim_{d → 0} α(x, d) = 0
If ∇f(x)^T d < 0 and d is small enough, then f(x + d) < f(x). We call such a d a descent direction.
40. Steepest Descent with Exact Line Search
Start with any x^0 ∈ R^n. Having x^i, stop if ∇f(x^i) = 0. Else compute x^{i+1} as follows:
1 Steepest descent direction: d^i = -∇f(x^i)
2 Exact line search: choose a stepsize λ such that
  df(x^i + λd^i)/dλ = ∇f(x^i + λd^i)^T d^i = 0
3 Update: x^{i+1} = x^i + λd^i
41. MATLAB Code for Steepest Descent with Exact Line Search (Quadratic Functions Only)
function [x, fvalue, iter] = grdlines(Q, p, x0, esp)
%
% min 0.5*x'*Q*x + p'*x
% Solving unconstrained minimization via
% steepest descent with exact line search
%
42. MATLAB Code (continued)
flag = 1;
iter = 0;
while flag > esp
    grad = Q*x0 + p;                       % gradient at the current point
    temp1 = grad'*grad;
    if temp1 < 1e-12                       % gradient numerically zero: stop
        flag = esp;
    else
        stepsize = temp1/(grad'*Q*grad);   % exact line search for quadratics
        x1 = x0 - stepsize*grad;
        flag = norm(x1 - x0);
        x0 = x1;
    end;
    iter = iter + 1;
end;
x = x0;
fvalue = 0.5*x'*Q*x + p'*x;
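A usage sketch on the quadratic from slide 35 (tolerance 1e-6 chosen only for illustration):

Q = [2 0; 0 2]; p = [-2; 4];                  % the quadratic from slide 35
[x, fvalue, iter] = grdlines(Q, p, [0; 0], 1e-6)
% returns x = (1, -2), matching the analytical minimizer found earlier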
43. The Key Idea of Newton's Method
Let f : R^n → R be a twice differentiable function. Then
f(x + d) = f(x) + ∇f(x)^T d + (1/2) d^T ∇²f(x) d + β(x, d)\|d\|^2,  where lim_{d → 0} β(x, d) = 0
At the i-th iteration, use a quadratic function to approximate f:
f̃(x) = f(x^i) + ∇f(x^i)^T (x - x^i) + (1/2)(x - x^i)^T ∇²f(x^i)(x - x^i)
x^{i+1} = arg min f̃(x)
44. Newton's Method
Start with x^0 ∈ R^n. Having x^i, stop if ∇f(x^i) = 0. Else compute x^{i+1} as follows:
1 Newton direction: ∇²f(x^i) d^i = -∇f(x^i)
  (we have to solve a system of linear equations here!)
2 Update: x^{i+1} = x^i + d^i
Converges only when x^0 is close enough to x*.
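A minimal MATLAB sketch of the pure Newton iteration on the quadratic from slide 35 (the handles fgrad and fhess are ours; for a quadratic, one full step reaches the minimizer):

fgrad = @(x) [2*x(1)-2; 2*x(2)+4];    % gradient of the slide-35 example
fhess = @(x) [2 0; 0 2];              % its (constant) Hessian
x = [5; 5];                           % an arbitrary starting point
while norm(fgrad(x)) > 1e-8
    d = -(fhess(x) \ fgrad(x));       % Newton direction: solve the linear system
    x = x + d;                        % full Newton step (no step size control)
end
% x is now (1, -2)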
45. Newton's Method Can Fail to Converge to the Optimal Solution
f(x) = (1/6)x^6 + (1/4)x^4 + 2x^2
g(x) = f(x^i) + f'(x^i)(x - x^i) + (1/2) f''(x^i)(x - x^i)^2
[Figure: f and its quadratic approximation g at x^i.]
46. People of ACM: David Blei (Sept. 9, 2014)
The recipient of the 2013 ACM-Infosys Foundation Award in the Computing Sciences, he is joining Columbia University this fall as a Professor of Statistics and Computer Science, and will become a member of Columbia's Institute for Data Sciences and Engineering.
47. What is the most important recent innovation in machine learning?
[A]: One of the main recent innovations in ML research has been that we (the ML community) can now scale up our algorithms to massive data, and I think that this has fueled the modern renaissance of ML ideas in industry. The main idea is called stochastic optimization, which is an adaptation of an old algorithm invented by statisticians in the 1950s.
48. What is the most important recent innovation in machine learning? (continued)
[A]: In short, many machine learning problems can be boiled down to trying to find parameters that maximize (or minimize) a function. A common way to do this is "gradient ascent," iteratively following the steepest direction to climb a function to its top. This technique requires repeatedly calculating the steepest direction, and the problem is that this calculation can be expensive. Stochastic optimization lets us use cheaper approximate calculations. It has transformed modern machine learning.
49. Gradient Descent: Batch Learning
For an optimization problem
min f(w) = min_w r(w) + (1/ℓ) \sum_{i=1}^{ℓ} loss(w; (x_i, y_i))
GD tries to find a direction and a learning rate that decrease the objective function value:
w^{t+1} = w^t - η ∇f(w^t)
where η is the learning rate and -∇f(w^t) is the steepest direction, with
∇f(w^t) = ∇r(w^t) + (1/ℓ) \sum_{i=1}^{ℓ} ∇loss(w^t; (x_i, y_i))
When ℓ is large, computing \sum_{i=1}^{ℓ} ∇loss(w^t; (x_i, y_i)) may cost much time.
50. Stochastic Gradient Descent: Online Learning
In GD, we compute the gradient using the entire training set.
In stochastic gradient descent (SGD), we use ∇loss(w^t; (x_t, y_t)) instead of (1/ℓ) \sum_{i=1}^{ℓ} ∇loss(w^t; (x_i, y_i)).
So the gradient of f(w^t) becomes
∇f(w^t) = ∇r(w^t) + ∇loss(w^t; (x_t, y_t))
SGD computes the gradient using only one instance.
In experiments, SGD is significantly faster than GD when ℓ is large.
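A minimal SGD sketch on the squared loss with no regularizer (MATLAB; the data matrix X, targets y, learning rate and epoch count are assumptions of ours):

eta = 0.01;
w = zeros(size(X, 2), 1);
for epoch = 1:10                              % several passes over the data
    for t = randperm(size(X, 1))              % visit instances in random order
        g = (X(t,:)*w - y(t)) * X(t,:)';      % gradient of (1/2)(x_t'*w - y_t)^2
        w = w - eta*g;                        % update using one instance only
    end
end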
51. Online Perceptron Algorithm [Rosenblatt, 1956]
The Perceptron can be considered an SGD method. The underlying optimization problem of the algorithm:
min_{(w,b) ∈ R^{n+1}} \sum_{i=1}^{ℓ} (-y_i(⟨w, x_i⟩ + b))_+
In the linearly separable case, the Perceptron algorithm terminates in finitely many steps no matter what learning rate is chosen
In the nonseparable case, deciding on an appropriate learning rate that makes the fewest mistakes is very difficult
The learning rate can be a nonnegative number; in the more general case, it can be a positive definite matrix
52. What is Machine Learning?
Representation + Optimization + Evaluation
Pedro Domingos, "A few useful things to know about machine learning", Communications of the ACM, Vol. 55 Issue 10, 78-87, October 2012
The most important reading assignment in my Machine Learning and Data Science and Machine Intelligence Lab at NCTU
53-54. The Master Algorithm
[Two figure-only slides.]
55. Expected Risk vs. Empirical Risk
Assumption: training instances are drawn independently from an unknown but fixed probability distribution P(x, y)
Ideally, we would like to have the optimal rule f* that minimizes the expected risk
E(f) = \int loss(f(x), y) dP(x, y) among all functions
Unfortunately, we cannot do it: P(x, y) is unknown, and we have to restrict ourselves to a certain hypothesis space F
How about computing f* ∈ F that minimizes the empirical risk E_ℓ(f) = (1/ℓ) \sum_i loss(f(x_i), y_i)?
Only minimizing the empirical risk runs the danger of overfitting
56-61. Approximation Optimization Approach
Most learning algorithms can be formulated as an optimization problem
The objective function consists of two parts: E_ℓ(f) + a control on the VC-error bound
Controlling the VC-error bound avoids the risk of overfitting
This can be achieved by adding a regularization term to the objective function
Note: we have made lots of approximations when formulating a learning task as an optimization problem
Why bother to find the optimal solution of the problem? One could stop the optimization iterations before convergence
62. Constrained Optimization Problem
Problem setting: given functions f, g_i, i = 1, ..., k, and h_j, j = 1, ..., m, defined on a domain Ω ⊆ R^n:
min_{x ∈ Ω} f(x)
s.t. g_i(x) ≤ 0, ∀i
     h_j(x) = 0, ∀j
where f(x) is called the objective function and g(x) ≤ 0, h(x) = 0 are called constraints.
64. Example
min_{x ∈ R^2} x_1^2 + x_2^2
s.t. x_1 + x_2 ≤ 4
     -x_1 - x_2 ≤ -2
     x_1, x_2 ≥ 0
∇f(x) = [2x_1, 2x_2]^T;  at the solution, ∇f(x*) = [2, 2]^T
[Figure: feasible region and level curves of f.]
65. Definitions and Notation
Feasible region:
F = {x ∈ Ω | g(x) ≤ 0, h(x) = 0}
where g(x) = [g_1(x), ..., g_k(x)]^T and h(x) = [h_1(x), ..., h_m(x)]^T
A solution of the optimization problem is a point x* ∈ F such that there is no x ∈ F for which f(x) < f(x*); such an x* is called a global minimum.
66. Definitions and Notation
A point x̄ ∈ F is called a local minimum of the optimization problem if ∃ ε > 0 such that
f(x) ≥ f(x̄), ∀x ∈ F with \|x - x̄\| < ε
At the solution x*, an inequality constraint g_i(x) is said to be active if g_i(x*) = 0; otherwise it is called an inactive constraint
g_i(x) ≤ 0 ⇔ g_i(x) + ξ_i = 0, ξ_i ≥ 0, where ξ_i is called the slack variable
67. Definitions and Notation
Removing an inactive constraint from an optimization problem does NOT affect the optimal solution
  a very useful feature in SVM
If F = R^n then the problem is called an unconstrained minimization problem
  the least-squares problem is in this category
  the SSVM formulation is in this category
Without a convexity assumption, it is difficult to find the global minimum
68. The Most Important Concepts in Optimization (Minimization)
A point is an optimal solution of an unconstrained minimization problem if there exists no descent direction
  ⇒ ∇f(x*) = 0
A point is an optimal solution of a constrained minimization problem if there exists no feasible descent direction
  ⇒ the KKT conditions
  There might exist a descent direction, but moving along it would leave the feasible region
69. Minimum Principle
Let f : R^n → R be a convex and differentiable function and F ⊆ R^n the feasible region. Then
x* ∈ arg min_{x ∈ F} f(x)  ⇔  ∇f(x*)^T (x - x*) ≥ 0 ∀x ∈ F
Example: min (x - 1)^2 s.t. a ≤ x ≤ b
70. Example (revisited)
min_{x ∈ R^2} x_1^2 + x_2^2
s.t. x_1 + x_2 ≤ 4
     -x_1 - x_2 ≤ -2
     x_1, x_2 ≥ 0
∇f(x) = [2x_1, 2x_2]^T;  at the solution, ∇f(x*) = [2, 2]^T
71. Linear Programming Problem
An optimization problem in which the objective function and all constraints are linear functions is called a linear programming problem:
(LP)  min p^T x
      s.t. Ax ≤ b
           Cx = d
           L ≤ x ≤ U
72. Linear Programming Solver in MATLAB
X = LINPROG(f, A, b) attempts to solve the linear programming problem:
  min_x f'*x  subject to:  A*x <= b
X = LINPROG(f, A, b, Aeq, beq) solves the problem above while additionally satisfying the equality constraints Aeq*x = beq.
X = LINPROG(f, A, b, Aeq, beq, LB, UB) defines a set of lower and upper bounds on the design variables X, so that the solution is in the range LB <= X <= UB.
Use empty matrices for LB and UB if no bounds exist. Set LB(i) = -Inf if X(i) is unbounded below; set UB(i) = Inf if X(i) is unbounded above.
73. Linear Programming Solver in MATLAB (continued)
X = LINPROG(f, A, b, Aeq, beq, LB, UB, X0) sets the starting point to X0. This option is only available with the active-set algorithm. The default interior-point algorithm will ignore any non-empty starting point.
You can type "help linprog" in MATLAB to get more information!
74. L1-Approximation: min_{x ∈ R^n} \|Ax - b\|_1
\|z\|_1 = \sum_{i=1}^{m} |z_i|
min_{x,s} 1^T s
s.t. -s ≤ Ax - b ≤ s
Or, componentwise,
min_{x,s} \sum_{i=1}^{m} s_i
s.t. -s_i ≤ A_i x - b_i ≤ s_i ∀i
In matrix form:
min_{x,s} [0 ... 0 1 ... 1] [x; s]
s.t. [A -I; -A -I] [x; s] ≤ [b; -b],  with constraint matrix in R^{2m×(n+m)}
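A sketch of the matrix form above with linprog, assuming A (m-by-n) and b (m-by-1) are given:

[m, n] = size(A);
f = [zeros(n,1); ones(m,1)];           % objective: 1's on the s block only
Aineq = [A, -eye(m); -A, -eye(m)];     % encodes -s <= A*x - b <= s
bineq = [b; -b];
z = linprog(f, Aineq, bineq);
x = z(1:n); s = z(n+1:end);            % x solves min ||Ax - b||_1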
75. Chebyshev Approximation: min_{x ∈ R^n} \|Ax - b\|_∞
\|z\|_∞ = max_{1 ≤ i ≤ m} |z_i|
min_{x,γ} γ
s.t. -1γ ≤ Ax - b ≤ 1γ
In matrix form:
min_{x,γ} [0 ... 0 1] [x; γ]
s.t. [A -1; -A -1] [x; γ] ≤ [b; -b],  with constraint matrix in R^{2m×(n+1)}
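The same pattern with linprog, again assuming A and b are given:

[m, n] = size(A);
f = [zeros(n,1); 1];                        % objective: minimize gamma
Aineq = [A, -ones(m,1); -A, -ones(m,1)];    % encodes -1*gamma <= A*x - b <= 1*gamma
bineq = [b; -b];
z = linprog(f, Aineq, bineq);
x = z(1:n); gamma = z(end);                 % x solves min ||Ax - b||_inf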
76. Quadratic Programming Problem
If the objective function is convex quadratic while the constraints are all linear, then the problem is called a convex quadratic programming problem:
(QP)  min (1/2) x^T Q x + p^T x
      s.t. Ax ≤ b
           Cx = d
           L ≤ x ≤ U
77. Quadratic Programming Solver in MATLAB
X = QUADPROG(H, f, A, b) attempts to solve the quadratic programming problem:
  min_x 0.5*x'*H*x + f'*x  subject to:  A*x <= b
X = QUADPROG(H, f, A, b, Aeq, beq) solves the problem above while additionally satisfying the equality constraints Aeq*x = beq.
X = QUADPROG(H, f, A, b, Aeq, beq, LB, UB) defines a set of lower and upper bounds on the design variables X, so that the solution is in the range LB <= X <= UB.
Use empty matrices for LB and UB if no bounds exist. Set LB(i) = -Inf if X(i) is unbounded below; set UB(i) = Inf if X(i) is unbounded above.
78. Quadratic Programming Solver in MATLAB (continued)
X = QUADPROG(H, f, A, b, Aeq, beq, LB, UB, X0) sets the starting point to X0.
You can type "help quadprog" in MATLAB to get more information!
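A tiny usage sketch, reusing the quadratic from slide 35 with one made-up linear constraint:

Q = [2 0; 0 2]; p = [-2; 4];     % the quadratic from slide 35
A = [1 1]; b = 4;                % one linear constraint: x1 + x2 <= 4
x = quadprog(Q, p, A, b)         % the constraint is inactive here, so x = (1, -2)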
79. Standard Support Vector Machine
min_{w,b,ξ_A,ξ_B} C(1^T ξ_A + 1^T ξ_B) + (1/2)\|w\|_2^2
s.t. (Aw + 1b) + ξ_A ≥ 1
     (Bw + 1b) - ξ_B ≤ -1
     ξ_A ≥ 0, ξ_B ≥ 0
80. Farkas' Lemma
For any matrix A ∈ R^{m×n} and any vector b ∈ R^n, either
  Ax ≤ 0, b^T x > 0 has a solution,
or
  A^T α = b, α ≥ 0 has a solution,
but never both.
81. Farkas' Lemma: First Alternative
Ax ≤ 0, b^T x > 0 has a solution: b is NOT in the cone generated by A_1 and A_2
[Figure: the cone generated by A_1, A_2 and the solution area {x | b^T x > 0} ∩ {x | Ax ≤ 0} ≠ ∅.]
82. Farkas' Lemma: Second Alternative
A^T α = b, α ≥ 0 has a solution: b is in the cone generated by A_1 and A_2
[Figure: b inside the cone; {x | b^T x > 0} ∩ {x | Ax ≤ 0} = ∅.]
83. Minimization Problem vs. Kuhn-Tucker Stationary-point Problem
MP:
  min_{x ∈ Ω} f(x)
  s.t. g(x) ≤ 0
KTSP:
  Find x̄ ∈ Ω, ᾱ ∈ R^m such that
    ∇f(x̄) + ᾱ^T ∇g(x̄) = 0
    ᾱ^T g(x̄) = 0
    g(x̄) ≤ 0
    ᾱ ≥ 0
84. Lagrangian Function
Let L(x, α) = f(x) + α^T g(x) with α ≥ 0
If f(x), g(x) are convex, then L(x, α) is convex
For a fixed α ≥ 0, if x̄ ∈ arg min{L(x, α) | x ∈ R^n}, then
  ∂L(x, α)/∂x |_{x = x̄} = ∇f(x̄) + α^T ∇g(x̄) = 0
The above result is a sufficient condition if L(x, α) is convex.
85. KTSP with Equality Constraints?
(Assume h(x) = 0 are linear functions.)
h(x) = 0 ⇔ h(x) ≤ 0 and -h(x) ≤ 0
KTSP:
  Find x̄ ∈ Ω, ᾱ ∈ R^k, β̄^+, β̄^- ∈ R^m such that
    ∇f(x̄) + ᾱ^T ∇g(x̄) + (β̄^+ - β̄^-)^T ∇h(x̄) = 0
    ᾱ^T g(x̄) = 0, (β̄^+)^T h(x̄) = 0, (β̄^-)^T (-h(x̄)) = 0
    g(x̄) ≤ 0, h(x̄) = 0
    ᾱ ≥ 0, β̄^+, β̄^- ≥ 0
86. KTSP with Equality Constraints
KTSP:
  Find x̄ ∈ Ω, ᾱ ∈ R^k, β̄ ∈ R^m such that
    ∇f(x̄) + ᾱ^T ∇g(x̄) + β̄^T ∇h(x̄) = 0
    ᾱ^T g(x̄) = 0, g(x̄) ≤ 0, h(x̄) = 0
    ᾱ ≥ 0
Let β̄ = β̄^+ - β̄^- with β̄^+, β̄^- ≥ 0; then β̄ is a free variable.
87. Generalized Lagrangian Function
Let L(x, α, β) = f(x) + α^T g(x) + β^T h(x) with α ≥ 0
If f(x), g(x) are convex and h(x) is linear, then L(x, α, β) is convex
For fixed α ≥ 0, if x̄ ∈ arg min{L(x, α, β) | x ∈ R^n}, then
  ∂L(x, α, β)/∂x |_{x = x̄} = ∇f(x̄) + α^T ∇g(x̄) + β^T ∇h(x̄) = 0
The above result is a sufficient condition if L(x, α, β) is convex.
88-89. Lagrangian Dual Problem
max_{α,β} min_{x ∈ Ω} L(x, α, β)
s.t. α ≥ 0
Equivalently,
max_{α,β} θ(α, β)
s.t. α ≥ 0,  where θ(α, β) = inf_{x ∈ Ω} L(x, α, β)
90. Weak Duality Theorem
Let x̄ ∈ Ω be a feasible solution of the primal problem and (α, β) a feasible solution of the dual problem. Then f(x̄) ≥ θ(α, β), since
θ(α, β) = inf_{x ∈ Ω} L(x, α, β) ≤ L(x̄, α, β) ≤ f(x̄)
Corollary:
sup{θ(α, β) | α ≥ 0} ≤ inf{f(x) | g(x) ≤ 0, h(x) = 0}
91. Weak Duality Theorem: Corollary
If f(x*) = θ(α*, β*), where α* ≥ 0, g(x*) ≤ 0 and h(x*) = 0, then x* and (α*, β*) solve the primal and dual problems, respectively. In this case,
0 ≤ α ⊥ g(x) ≤ 0
(complementarity: α_i g_i(x) = 0 for every i)
92. Saddle Point of the Lagrangian
Let x* ∈ Ω, α* ≥ 0, β* ∈ R^m satisfy
L(x*, α, β) ≤ L(x*, α*, β*) ≤ L(x, α*, β*),  ∀x ∈ Ω, ∀α ≥ 0
Then (x*, α*, β*) is called a saddle point of the Lagrangian function.
93. Saddle Point of f(x, y) = x^2 - y^2
[Figure: the saddle surface of f(x, y) = x^2 - y^2.]
94. Dual Problem of a Linear Program
Primal LP:  min_{x ∈ R^n} p^T x  subject to Ax ≥ b, x ≥ 0
Dual LP:    max_{α ∈ R^m} b^T α  subject to A^T α ≤ p, α ≥ 0
All duality theorems hold and work perfectly!
95. Lagrangian Function of the Primal LP
L(x, α) = p^T x + α_1^T (b - Ax) + α_2^T (-x)
max_{α_1,α_2 ≥ 0} min_{x ∈ R^n} L(x, α_1, α_2)
⇒ max_{α_1,α_2 ≥ 0} p^T x + α_1^T (b - Ax) + α_2^T (-x)
   subject to p - A^T α_1 - α_2 = 0
   (from ∇_x L(x, α_1, α_2) = 0)
96. Application of LP Duality: the Normal Equation Always Has a Solution
For any matrix A ∈ R^{m×n} and any vector b ∈ R^m, consider min_{x ∈ R^n} \|Ax - b\|_2^2
x* ∈ arg min{\|Ax - b\|_2^2}  ⇔  A^T A x* = A^T b
Claim: A^T A x = A^T b always has a solution.
97. Dual Problem of a Strictly Convex Quadratic Program
Primal QP:
  min_{x ∈ R^n} (1/2) x^T Q x + p^T x
  s.t. Ax ≤ b
With the strict convexity assumption, we have the
Dual QP:
  max -(1/2)(p + A^T α)^T Q^{-1} (A^T α + p) - α^T b
  s.t. α ≥ 0
98. Outline
1 Introduction to Machine Learning
  Some Examples
  Basic concepts of learning theory
2 Three Fundamental Algorithms
3 Optimization
4 Support Vector Machine
5 Evaluation and Closing Remarks
99. Binary Classification Problem (Linearly Separable Case)
[Figure: classes A+ (Benign) and A- (Malignant) separated by the hyperplane x^T w + b = 0, with bounding planes x^T w + b = +1 and x^T w + b = -1 and normal vector w.]
100. Support Vector Machines: Maximizing the Margin between Bounding Planes
[Figure: bounding planes x^T w + b = +1 and x^T w + b = -1 around classes A+ and A-, with normal vector w.]
Margin = 2 / \|w\|_2
101. Why Use Support Vector Machines?
Powerful tools for data mining
An SVM classifier is an optimally defined surface
SVMs have a good geometric interpretation
SVMs can be generated very efficiently
Can be extended from the linear to the nonlinear case
  typically nonlinear in the input space
  linear in a higher dimensional "feature space"
  implicitly defined by a kernel function
Have a sound theoretical foundation
  based on Statistical Learning Theory
102. Why Do We Maximize the Margin? (Based on Statistical Learning Theory)
The Structural Risk Minimization (SRM) principle:
  the expected risk will be less than or equal to the empirical risk (training error) + a VC (error) bound
\|w\|_2 ∝ VC bound
min VC bound ⇔ min (1/2)\|w\|_2^2 ⇔ max Margin
103. Summary of the Notation
Let S = {(x_1, y_1), (x_2, y_2), ..., (x_ℓ, y_ℓ)} be a training dataset, represented by the matrices
A = [x_1^T; x_2^T; ...; x_ℓ^T] ∈ R^{ℓ×n},  D = diag(y_1, ..., y_ℓ) ∈ R^{ℓ×ℓ}
A_i w + b ≥ +1 for D_ii = +1, and A_i w + b ≤ -1 for D_ii = -1, which is equivalent to
D(Aw + 1b) ≥ 1, where 1 = [1, 1, ..., 1]^T ∈ R^ℓ
104. Support Vector Classification (Linearly Separable Case, Primal)
The hyperplane (w, b) is determined by solving the minimization problem:
min_{(w,b) ∈ R^{n+1}} (1/2)\|w\|_2^2
s.t. D(Aw + 1b) ≥ 1
It realizes the maximal margin hyperplane with geometric margin γ = 1/\|w\|_2
105. Support Vector Classification (Linearly Separable Case, Dual Form)
The dual problem of the previous MP:
max_{α ∈ R^ℓ} 1^T α - (1/2) α^T D A A^T D α
subject to 1^T D α = 0, α ≥ 0
Applying the KKT optimality conditions, we have w = A^T D α. But where is b?
Don't forget
0 ≤ α ⊥ D(Aw + 1b) - 1 ≥ 0
106. Dual Representation of SVM
(Key of kernel methods: w = A^T D α* = \sum_{i=1}^{ℓ} y_i α_i* x_i)
The hypothesis is determined by (α*, b*):
h(x) = sgn(x^T A^T D α* + b*)
     = sgn(\sum_{i=1}^{ℓ} y_i α_i* ⟨x_i, x⟩ + b*)
     = sgn(\sum_{α_i* > 0} y_i α_i* ⟨x_i, x⟩ + b*)
Remember: A_i^T = x_i
107. Soft Margin SVM (Nonseparable Case)
If the data are not linearly separable:
  the primal problem is infeasible
  the dual problem is unbounded above
Introduce a slack variable for each training point:
y_i(w^T x_i + b) ≥ 1 - ξ_i,  ξ_i ≥ 0, ∀i
The inequality system is then always feasible, e.g., w = 0, b = 0, ξ = 1
108. Soft Margin SVM: Geometry
[Figure: bounding planes x^T w + b = +1 and x^T w + b = -1 with margin 2/\|w\|_2; margin-violating points carry slacks ξ_i, ξ_j.]
109. Robust Linear Programming (Preliminary Approach to SVM)
min_{w,b,ξ} 1^T ξ
s.t. D(Aw + 1b) + ξ ≥ 1   (LP)
     ξ ≥ 0
where ξ is a nonnegative slack (error) vector
The term 1^T ξ, the 1-norm measure of the error vector, is called the training error
For the linearly separable case, at the solution of (LP): ξ = 0
110. Support Vector Machine Formulations (Two Different Measures of Training Error)
2-Norm Soft Margin:
min_{(w,b,ξ) ∈ R^{n+1+ℓ}} (1/2)\|w\|_2^2 + (C/2)\|ξ\|_2^2
s.t. D(Aw + 1b) + ξ ≥ 1
1-Norm Soft Margin (Conventional SVM):
min_{(w,b,ξ) ∈ R^{n+1+ℓ}} (1/2)\|w\|_2^2 + C 1^T ξ
s.t. D(Aw + 1b) + ξ ≥ 1
     ξ ≥ 0
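A sketch of solving the 1-norm soft margin primal with quadprog, under our own packing of the variables as u = [w; b; ξ] (the data matrix A, label vector y and parameter C are assumed given):

[ell, n] = size(A);
D = diag(y);
H = blkdiag(eye(n), 0, zeros(ell));        % quadratic term: (1/2)||w||^2 only
f = [zeros(n+1,1); C*ones(ell,1)];         % linear term: C*1'*xi
Aineq = [-D*A, -y, -eye(ell)];             % encodes D(A*w + 1*b) + xi >= 1
bineq = -ones(ell,1);
lb = [-inf(n+1,1); zeros(ell,1)];          % xi >= 0; w and b are free
u = quadprog(H, f, Aineq, bineq, [], [], lb, []);
w = u(1:n); b = u(n+1); xi = u(n+2:end);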
111. Tuning Procedure: How to Determine C?
[Figure: testing set correctness as a function of C; too large a C leads to overfitting.]
The final value of the parameter is the one with the maximum testing set correctness!
112. 1-Norm SVM (A Different Measure of Margin)
1-Norm SVM:
min_{(w,b,ξ) ∈ R^{n+1+ℓ}} \|w\|_1 + C 1^T ξ
s.t. D(Aw + 1b) + ξ ≥ 1
     ξ ≥ 0
Equivalent to:
min_{(s,w,b,ξ) ∈ R^{2n+1+ℓ}} 1^T s + C 1^T ξ
s.t. D(Aw + 1b) + ξ ≥ 1
     -s ≤ w ≤ s
     ξ ≥ 0
Good for feature selection; similar to the LASSO
113. Two-spiral Dataset (94 White Dots & 94 Red Dots)
[Figure: the two-spiral dataset.]
114. Learning in Feature Space (Could Simplify the Classification Task)
Learning in a high dimensional space could degrade generalization performance
  this phenomenon is called the curse of dimensionality
By using a kernel function that represents the inner product of training examples in feature space, we never need to know the nonlinear map explicitly
  we do not even need to know the dimensionality of the feature space
There is no free lunch:
  we have to deal with a huge and dense kernel matrix
  a reduced kernel can avoid this difficulty
115. Outline Introduction to Machine Learning Three Fundamental Algorithms Optimization Support Vector Machine Evaluation and
Φ
X − −− −→F
Feature map
nonlinear pattern in data space approximate linear pattern in feature space
115 / 136
116. Linear Machine in Feature Space
Let φ : X → F be a nonlinear map from the input space to some feature space
The classifier will be in the form (primal):
f(x) = \sum_{j=1}^{?} w_j φ_j(x) + b   (the dimension of F may be unknown)
Make it into the dual form:
f(x) = \sum_{i=1}^{ℓ} α_i y_i ⟨φ(x_i), φ(x)⟩ + b
117. Kernel: Represent the Inner Product in Feature Space
Definition: a kernel is a function K : X × X → R such that for all x, z ∈ X
K(x, z) = ⟨φ(x), φ(z)⟩,  where φ : X → F
The classifier will become:
f(x) = \sum_{i=1}^{ℓ} α_i y_i K(x_i, x) + b
118. A Simple Example of a Kernel
Polynomial kernel of degree 2: K(x, z) = ⟨x, z⟩^2
Let x = [x_1; x_2], z = [z_1; z_2] ∈ R^2, and define the nonlinear map φ : R^2 → R^3 by
φ(x) = [x_1^2; x_2^2; \sqrt{2} x_1 x_2]
Then ⟨φ(x), φ(z)⟩ = ⟨x, z⟩^2 = K(x, z)
There are many other nonlinear maps ψ(x) that satisfy the relation ⟨ψ(x), ψ(z)⟩ = ⟨x, z⟩^2 = K(x, z)
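A two-line numerical check in MATLAB (the points are made up):

x = [1; 2]; z = [3; 4];                          % made-up points in R^2
phi = @(u) [u(1)^2; u(2)^2; sqrt(2)*u(1)*u(2)];  % the feature map above
dot(x, z)^2                                      % kernel value: 11^2 = 121
dot(phi(x), phi(z))                              % feature-space inner product: also 121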
119. Power of the Kernel Technique
Consider a nonlinear map φ : R^n → R^p that consists of distinct features of all the monomials of degree d. Then p = binom(n + d - 1, d).
For example, the monomial x_1^3 x_2^1 x_3^4 x_4^4 can be encoded as x ooo x o x oooo x oooo
For n = 11, d = 10: p = 184756
Is this necessary? We only need to know ⟨φ(x), φ(z)⟩!
This can be achieved by K(x, z) = ⟨x, z⟩^d
120. Kernel Technique: Based on Mercer's Condition (1909)
The value of the kernel function represents the inner product of two training points in feature space
The kernel function merges two steps:
1 map the input data from the input space to the feature space (which might be infinite dimensional)
2 do the inner product in the feature space
121. Examples of Kernels
K(A, B) : R^{ℓ×n} × R^{n×ℓ̃} → R^{ℓ×ℓ̃}
A ∈ R^{ℓ×n}, a ∈ R^ℓ, µ ∈ R, d is an integer:
Polynomial kernel: (A A^T + µ a a^T)_•^d  (entrywise power)
  linear kernel A A^T: µ = 0, d = 1
Gaussian (radial basis) kernel:
K(A, A^T)_ij = e^{-µ \|A_i - A_j\|_2^2},  i, j = 1, ..., ℓ
The ij-entry of K(A, A^T) represents the "similarity" of the data points A_i and A_j
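A sketch of building the Gaussian kernel matrix for the rows of A (the parameter mu is assumed given):

sq = sum(A.^2, 2);                          % squared norms of the rows of A
D2 = bsxfun(@plus, sq, sq') - 2*(A*A');     % D2(i,j) = ||A_i - A_j||_2^2
K = exp(-mu * D2);                          % Gaussian kernel matrix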
122. Nonlinear Support Vector Machine (Applying the Kernel Trick)
1-Norm Soft Margin Linear SVM:
max_{α ∈ R^ℓ} 1^T α - (1/2) α^T D A A^T D α  s.t. 1^T D α = 0, 0 ≤ α ≤ C1
Applying the kernel trick runs the linear SVM in the feature space without knowing the nonlinear mapping.
1-Norm Soft Margin Nonlinear SVM:
max_{α ∈ R^ℓ} 1^T α - (1/2) α^T D K(A, A^T) D α
s.t. 1^T D α = 0, 0 ≤ α ≤ C1
All you need to do is replace A A^T by K(A, A^T)
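A sketch of solving this dual with quadprog, assuming a label column y, a kernel matrix K (e.g., built as in the Gaussian kernel sketch above) and a parameter C; note 1^T D α = y^T α:

ell = numel(y);
D = diag(y);
H = D*K*D;                            % quadratic term of the (negated) dual
f = -ones(ell, 1);                    % maximizing 1'*alpha = minimizing -1'*alpha
alpha = quadprog(H, f, [], [], y', 0, zeros(ell,1), C*ones(ell,1));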
124. Outline
1 Introduction to Machine Learning
  Some Examples
  Basic concepts of learning theory
2 Three Fundamental Algorithms
3 Optimization
4 Support Vector Machine
5 Evaluation and Closing Remarks
125. How to Evaluate What's Been Learned (When Cost Is Not Sensitive)
Measure the performance of a classifier in terms of error rate or accuracy:
Error rate = (number of misclassified points) / (total number of data points)
Main goal: predict the unseen class label of new data
We have to assess a classifier's error rate on a set that played no role in learning the classifier
Split the data instances at hand into two parts:
1 Training set: for learning the classifier
2 Testing set: for evaluating the classifier
126. k-fold Stratified Cross Validation
Maximizes the use of the data at hand
Split the data into k approximately equal partitions
Each partition in turn is used for testing while the remainder is used for training
The labels (+/-) in the training and testing sets should be in about the right proportion
  doing the random splitting within the positive class and the negative class separately will guarantee this
  this procedure is called stratification
Leave-one-out cross-validation: k = number of data points
  no random sampling is involved, but it is nonstratified
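A minimal MATLAB sketch of the stratified split (labels y in {+1, -1}; k = 10 is our choice):

k = 10;
fold = zeros(size(y));
for c = [-1, +1]
    idx = find(y == c);
    idx = idx(randperm(numel(idx)));            % shuffle within the class
    fold(idx) = mod(0:numel(idx)-1, k) + 1;     % deal folds 1..k out in turn
end
% fold(i) gives the test fold of instance i, with class proportions
% preserved in every fold up to rounding.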
127. How to Compare Two Classifiers? (Testing Hypothesis: Paired t-test)
We compare two learning algorithms by comparing their average error rates over several cross-validations
Assume the same cross-validation split can be used for both methods
H_0: d̄ = 0  vs.  H_1: d̄ ≠ 0
where d̄ = (1/k) \sum_{i=1}^{k} d_i and d_i = x_i - y_i
The t-statistic:
t = d̄ / \sqrt{σ_d^2 / k}
128. How to Evaluate What's Been Learned? (When Cost Is Sensitive)
Two types of error will occur: False Positives (FP) and False Negatives (FN)
For a binary classification problem, the results can be summarized in a 2 × 2 confusion matrix:

                Predicted +        Predicted -
Actual +        True Pos. (TP)     False Neg. (FN)
Actual -        False Pos. (FP)    True Neg. (TN)
129. ROC Curve (Receiver Operating Characteristic Curve)
An evaluation method for learning models
What it concerns is the ranking of instances produced by the learning model
A ranking means that we sort the instances w.r.t. the probability of being a positive instance, from high to low
The ROC curve plots the true positive rate (TPr) as a function of the false positive rate (FPr)
130. An Example of an ROC Curve
[Figure: an ROC curve.]
131. Using ROC to Compare Two Methods
Figure: under the same FP rate, method A is better than B.
132. Using ROC to Compare Two Methods (continued)
[Figure: ROC curves of the two methods.]
133. Area under the Curve (AUC)
An index of the ROC curve with range from 0 to 1
An AUC value of 1 corresponds to a perfect ranking (all positive instances are ranked higher than all negative instances)
A simple formula for calculating AUC:
AUC = (\sum_{i=1}^{m} \sum_{j=1}^{n} I_{f(x_i) > f(x_j)}) / (mn)
where m is the number of positive instances and n the number of negative instances
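A two-line MATLAB sketch of this formula, assuming fpos holds the classifier scores of the m positive instances and fneg those of the n negative instances:

pairs = bsxfun(@gt, fpos(:), fneg(:)');           % pairs(i,j) = 1 if f(x_i) > f(x_j)
AUC = sum(pairs(:)) / (numel(fpos)*numel(fneg));  % fraction of correctly ordered pairs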
134. Performance Measures in Information Retrieval (IR)
An IR system, such as Google, given a query (keyword search), will try to retrieve all relevant documents in a corpus
  documents returned that are NOT relevant: FP
  relevant documents that are NOT returned: FN
Performance measures in IR: recall & precision
Recall = TP / (TP + FN)  and  Precision = TP / (TP + FP)
135. Balancing the Trade-off between Recall and Precision
Two extreme cases:
1 Return only documents with 100% confidence; then precision = 1 but recall will be very small
2 Return all documents in the corpus; then recall = 1 but precision will be very small
The F-measure balances this trade-off:
F-measure = 2 / (1/Recall + 1/Precision)
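The measures above in three lines of MATLAB, assuming confusion counts TP, FP and FN are given:

recall = TP / (TP + FN);
precision = TP / (TP + FP);
F = 2 / (1/recall + 1/precision);   % harmonic mean of recall and precision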