Data Science
AI Needs Data
Data Science
Computer Vision
Natural Language Processing
What is Data Science?
Data Science is the domain of computer science in which insights are extracted from available data
with the help of scientific methods, algorithms and statistics.
It analyses data to create impactful solutions from the given data or to predict outcomes for a
problem statement.
In short, Data Science is used to solve real-life problems using data.
Applications of Data Science include Internet search and digital advertisements.
NumPy: Numerical array handling package. It is used for analysis and calculation on large
numerical data sets.
OpenCV: Image processing package. It is used for manipulating and processing images, for example
cropping, resizing and editing.
Matplotlib: Data visualisation package. It is used to produce high-quality graphical
representations of numerical data.
NLTK (Natural Language Toolkit): Natural language processing package. It helps in tasks
related to textual data.
Pandas: Tabular data handling package. It works with data arranged in rows and columns, typically
loaded from spreadsheets or database software.
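As a rough illustration of how two of these packages are typically used together, the sketch below computes an average with NumPy and then filters a small table with pandas. The student names and marks are made up for the example and do not come from any real data set.

```python
import numpy as np
import pandas as pd

# Hypothetical marks for four students (illustrative values only).
marks = np.array([72, 85, 90, 64])

# NumPy: numerical calculation on an array.
average = marks.mean()

# pandas: the same data arranged as a table with named columns.
table = pd.DataFrame({
    "student": ["Asha", "Bina", "Chetan", "Dev"],  # hypothetical names
    "marks": marks,
})

# Keep only the rows whose marks are above the average.
above_average = table[table["marks"] > average]

print("Average marks:", average)
print(above_average)
```

NumPy does the arithmetic on the raw numbers, while pandas keeps each number attached to its row and column labels, which is why the two are often used side by side.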
Understanding the K-Nearest Neighbour (K-NN) Model
The K-Nearest Neighbour (K-NN) model is a supervised machine learning algorithm.
It is used to solve both classification and regression problems. For a new data point, it looks at
the K most similar points in the existing data and estimates which group the new point belongs to,
or what value it should have, from those neighbours.
The K-NN algorithm assumes that similar things exist in close proximity. It follows the saying
“Birds of a feather flock together”.
It is also called a lazy learner algorithm because it does not learn from the training set
immediately. Instead, it stores the data set and performs its computation on the stored data only
at the time of classification.
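The idea described above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the function name knn_classify, the choice of Euclidean distance and the toy data points are all assumptions made for the example.

```python
import math
from collections import Counter

def knn_classify(train_points, train_labels, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest points."""
    # Lazy learning: nothing is computed in advance. The stored data is
    # only examined now, when a prediction is requested.
    distances = []
    for point, label in zip(train_points, train_labels):
        # Euclidean distance is assumed here; other distance measures are possible.
        distances.append((math.dist(point, query), label))

    # Sort by distance and keep the labels of the k closest points.
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:k]]

    # The most common label among the neighbours is the prediction.
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical toy data: two well-separated groups, "A" and "B".
train_points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5),
                (6.0, 6.5), (7.0, 6.0), (6.5, 7.5)]
train_labels = ["A", "A", "A", "B", "B", "B"]

print(knn_classify(train_points, train_labels, (2.0, 2.0), k=3))
print(knn_classify(train_points, train_labels, (6.0, 7.0), k=3))
```

An odd value of K is a common choice for two-class problems because it avoids tied votes.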
K-Nearest Neighbour (K-NN) Algorithm: Illustration
Why We Need KNN
Advantages of KNN Algorithm
It is simple to understand and implement
It is versatile for regression and classification
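It can give high accuracy even though the technique itself is simple
It can become more accurate as more training data is available, although prediction gets slower
because a new point must be compared with every stored point
It is useful for non-linear data because it makes no assumption about the shape of the data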
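To illustrate the point that K-NN also works for regression, here is a minimal sketch in which the prediction is the average of the values of the K nearest neighbours instead of a majority vote. The function name knn_regress, the Euclidean distance and the sample numbers (hours studied against score) are assumptions chosen for the example.

```python
import math

def knn_regress(train_points, train_values, query, k=3):
    """Predict a number for `query` as the mean value of its k nearest points."""
    # Pair every stored point's distance to the query with its known value,
    # then sort so that the closest points come first.
    distances = sorted(
        (math.dist(point, query), value)
        for point, value in zip(train_points, train_values)
    )

    # Average the values of the k closest points.
    nearest_values = [value for _, value in distances[:k]]
    return sum(nearest_values) / len(nearest_values)

# Hypothetical toy data: hours studied (one feature) and the score obtained.
train_points = [(1.0,), (2.0,), (3.0,), (4.0,), (5.0,)]
train_values = [10.0, 20.0, 30.0, 40.0, 50.0]

print(knn_regress(train_points, train_values, (3.0,), k=3))
```

The only change from classification is the final step: neighbours are found in the same way, but their values are averaged rather than counted.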