Computer Vision-Lec 3

1. Dataset
* A dataset is a particular instance of data that is used for analysis or model building at any given time.
* A dataset comes in different flavors, such as numerical data, categorical data, text data, image data, voice data, and video data.
* For beginning data science projects, the most popular type of dataset is one containing numerical data, typically stored in a comma-separated values (CSV) file (a loading sketch appears after section 8).

2. Data Wrangling
* Data wrangling is the process of converting data from its raw form to a tidy form ready for analysis.
* Data wrangling is an important step in data preprocessing and includes several processes, such as data importing, data cleaning, data structuring, string processing, HTML parsing, handling dates and times, handling missing data, and text mining.

3. Data Visualization
* Data visualization is one of the main tools used to analyze and study relationships between different variables.
* Data visualization (e.g., scatter plots, line graphs, bar plots, histograms, Q-Q plots, smooth densities, box plots, pair plots, heat maps) can be used for descriptive analytics.
* Data visualization is also used in machine learning for data preprocessing and analysis, feature selection, model building, model testing, and model evaluation.

4. Outliers
* An outlier is a data point that is very different from the rest of the dataset.
* Outliers are very common and are expected in large datasets.
* One common way to detect outliers in a dataset is by using a box plot (a detection sketch appears after section 8).
* Outliers can significantly degrade the predictive power of a machine learning model.
* Advanced methods for dealing with outliers include the RANSAC method.

5. Data Imputation
* Most datasets contain missing values. However, removing samples or dropping entire feature columns is often not feasible, because we might lose too much valuable data.
* Instead, we can use different interpolation techniques to estimate the missing values from the other training samples in our dataset.
* One of the most common interpolation techniques is mean imputation, where we simply replace the missing value with the mean value of the entire feature column (an imputation sketch appears after section 8).

6. Data Scaling
* Scaling your features will help improve the quality and predictive power of your model.
* Without scaling, the model will be biased towards features with larger numeric ranges.
* To bring features onto the same scale, we can use either normalization or standardization of features (a scaling sketch appears after section 8).

7. Data Partitioning
* In machine learning, the dataset is often partitioned into training and testing sets.
* The model is trained on the training dataset and then tested on the testing dataset.
* The testing dataset thus acts as the unseen dataset, which can be used to estimate the generalization error (the error expected when the model is applied to a real-world dataset after it has been deployed).

# import train_test_split and split the data into 70% training / 30% testing
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)

8. Supervised Learning
* Supervised learning algorithms perform learning by studying the relationship between the feature variables and a known target variable.
* Supervised learning has two subcategories:
a) Continuous target variables: Linear Regression, K-Neighbors Regression (KNR), and Support Vector Regression (SVR).
b) Discrete target variables: Logistic Regression classifier, Support Vector Machines (SVM), and Decision Tree classifier (a classification sketch appears after this section).
[Figure: annotated training images labeled "These are apples", used to train a model that then makes predictions on new images.]
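A minimal sketch of loading a CSV dataset and performing a couple of basic wrangling steps with pandas; the file name and its columns are hypothetical, not from the lecture:

import pandas as pd

# Load a numerical dataset stored in CSV format (the file name here is hypothetical).
df = pd.read_csv("measurements.csv")

# A few common wrangling steps: inspect the data, tidy column names, drop duplicates.
print(df.head())
df.columns = df.columns.str.strip().str.lower()
df = df.drop_duplicates()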
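A small sketch of detecting outliers with a box plot, together with the equivalent 1.5 x IQR rule computed numerically; the feature values are made up for illustration:

import numpy as np
import matplotlib.pyplot as plt

# Toy feature with two obvious outliers (values made up for illustration).
x = np.array([4.1, 4.3, 3.9, 4.0, 4.2, 4.4, 12.0, -5.0])

# A box plot makes the outliers visible at a glance.
plt.boxplot(x)
plt.title("Box plot of a single feature")
plt.show()

# The same idea numerically: points beyond 1.5 * IQR from the quartiles.
q1, q3 = np.percentile(x, [25, 75])
iqr = q3 - q1
print(x[(x < q1 - 1.5 * iqr) | (x > q3 + 1.5 * iqr)])   # -> [12. -5.]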
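A minimal sketch of mean imputation using scikit-learn's SimpleImputer; the small matrix and its missing entries are invented for illustration:

import numpy as np
from sklearn.impute import SimpleImputer

# Toy feature matrix with missing entries (values invented for illustration).
X = np.array([[1.0, 2.0],
              [np.nan, 3.0],
              [7.0, np.nan]])

# Mean imputation: each NaN is replaced by the mean of its feature column.
imputer = SimpleImputer(strategy="mean")
print(imputer.fit_transform(X))
# [[1.  2. ]
#  [4.  3. ]
#  [7.  2.5]]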
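A short sketch contrasting normalization and standardization with scikit-learn; the numbers are made up so that the two feature scales differ sharply:

import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Two features on very different scales (values invented for illustration).
X = np.array([[1.0, 200.0],
              [2.0, 400.0],
              [3.0, 600.0]])

# Normalization: rescale each feature to the [0, 1] range.
print(MinMaxScaler().fit_transform(X))

# Standardization: center each feature at mean 0 with unit variance.
print(StandardScaler().fit_transform(X))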
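A compact classification sketch for a discrete target variable, using a Logistic Regression classifier on scikit-learn's built-in iris dataset; the dataset choice is an assumption, not from the lecture:

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Discrete target: predict the iris species from four flower measurements.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)                              # learn the feature-target relationship
print("test accuracy:", clf.score(X_test, y_test))     # evaluate on the unseen test set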
9. Unsupervised Learning
* In unsupervised learning, we deal with unlabeled data or data of unknown structure.
* Using unsupervised learning techniques, we are able to explore the structure of our data and extract meaningful information without the guidance of a known outcome variable or reward function.
* K-means clustering is an example of an unsupervised learning algorithm (a clustering sketch appears after section 11).
[Figure: unlabeled input data fed directly into the model.]

10. Reinforcement Learning
* Reinforcement learning (RL) is a type of machine learning technique that enables an agent to learn in an interactive environment by trial and error, using feedback from its own actions and experiences.
* Reinforcement learning uses rewards and punishments as signals for positive and negative behavior (a toy sketch appears after section 11).
[Figure: agent-environment loop; the agent takes an ACTION, and the environment returns a new STATE and a REWARD.]

11. Cross-Validation
* Cross-validation is a method of evaluating a machine learning model's performance across random samples of the dataset.
* In k-fold cross-validation, the dataset is randomly partitioned into training and testing folds.
* The model is trained on the training folds and evaluated on the held-out testing fold, and the process is repeated k times.
* The average training and testing scores are then calculated by averaging over the k folds (a cross-validation sketch appears below).
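A small clustering sketch with scikit-learn's KMeans; the two-blob toy data is generated here purely for illustration:

import numpy as np
from sklearn.cluster import KMeans

# Unlabeled 2-D points drawn around two centers (toy data generated for illustration).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.5, size=(50, 2)),
               rng.normal(5.0, 0.5, size=(50, 2))])

# K-means groups the points into k clusters without using any labels.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.cluster_centers_)   # roughly [0, 0] and [5, 5]
print(kmeans.labels_[:5])        # cluster assignment for the first few points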
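A toy reinforcement-learning sketch: tabular Q-learning on a five-state corridor. The environment, rewards, and hyperparameters are all invented here to illustrate the trial-and-error / reward-feedback idea, not taken from the lecture:

import numpy as np

# Toy corridor environment: states 0..4, actions 0 = left, 1 = right.
# Reaching state 4 ends the episode with reward +1; every other step gives reward 0.
n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma = 0.1, 0.9                      # learning rate and discount factor
rng = np.random.default_rng(0)

for episode in range(500):
    s = 0
    while s != 4:
        a = rng.integers(n_actions)                   # explore by trial and error
        s_next = max(s - 1, 0) if a == 0 else s + 1   # environment transition
        r = 1.0 if s_next == 4 else 0.0               # reward signal from the environment
        # Q-learning update: reward plus discounted future value is the feedback.
        Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
        s = s_next

# Greedy policy for the non-terminal states: it learns to choose "right" (1) everywhere.
print(np.argmax(Q[:4], axis=1))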
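A minimal k-fold cross-validation sketch using scikit-learn's cross_val_score; the model and dataset choices are assumptions made for illustration:

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# 5-fold CV: train on 4 folds, evaluate on the held-out fold, repeat 5 times.
scores = cross_val_score(model, X, y, cv=5)
print(scores)          # one test score per fold
print(scores.mean())   # average score over the k folds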
