
Unit 4

Data Cleaning and Preparation

Handling Missing Data


Missing data is a common issue in datasets. Pandas provides several methods to handle missing values
effectively:

1. Detecting Missing Values:

o Use .isnull() or .notnull() to identify missing data.

import pandas as pd
df = pd.DataFrame({'A': [1, None, 3], 'B': [4, 5, None]})
print(df.isnull()) # True for missing values

2. Dropping Missing Values:

o Remove rows or columns with missing data using .dropna().

df = df.dropna() # Drops rows with any missing values

3. Filling Missing Values:

o Replace missing data with a specific value or computed statistics using .fillna().

df = df.fillna(0)  # Replace missing values with 0
df['A'] = df['A'].fillna(df['A'].mean())  # Or replace with the column mean

4. Forward/Backward Fill:

o Fill missing values with the previous or next non-missing value.

df = df.ffill()  # Forward fill (fillna(method='ffill') is deprecated in recent pandas)
df = df.bfill()  # Backward fill

Data Transformation

Data transformation involves converting data into a suitable format for analysis.
1. Scaling and Normalization:

o Normalize data to bring all values into a similar range.

from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()
scaled_data = scaler.fit_transform(df[['A', 'B']])  # Scales each column to [0, 1]

2. Type Conversion:

o Convert columns to appropriate data types using .astype().

df['A'] = df['A'].astype(int)

3. Renaming Columns:

o Rename columns to meaningful names using .rename().

df = df.rename(columns={'A': 'Column1', 'B': 'Column2'})

4. Removing Duplicates:

o Remove duplicate rows using .drop_duplicates().

df = df.drop_duplicates()

String Manipulation

Pandas provides string functions for handling textual data.

1. Lowercase Conversion:

df['Name'] = df['Name'].str.lower()

2. Removing Whitespace:

df['Name'] = df['Name'].str.strip()

3. Replacing Substrings:
df['Name'] = df['Name'].str.replace('old', 'new')

4. Splitting Strings:

df[['First Name', 'Last Name']] = df['Name'].str.split(' ', expand=True)

Data Wrangling

Join, Combine, and Reshape

1. Combining and Merging Datasets:

o Use pd.concat() to combine datasets vertically or horizontally.

import pandas as pd

df1 = pd.DataFrame({'ID': [1, 2], 'Value': ['A', 'B']})
df2 = pd.DataFrame({'ID': [3, 4], 'Value': ['C', 'D']})
combined_df = pd.concat([df1, df2])  # Combine vertically (stack rows)
print(combined_df)

o Use pd.merge() to merge datasets on common columns.

merged_df = pd.merge(df1, df2, on='ID', how='inner')  # Inner join on the ID column
print(merged_df)  # Empty here: df1 and df2 share no ID values

2. Hierarchical Indexing:

o Create multi-level indices for better organization.

data = pd.DataFrame({'City': ['NY', 'SF'], 'Year': [2020, 2021], 'Value': [100, 200]})
indexed_data = data.set_index(['City', 'Year'])
print(indexed_data)

3. Reshaping and Pivoting:

o Use .pivot() to reshape data based on column values.


pivoted_data = data.pivot(index='City', columns='Year', values='Value')  # Missing City/Year pairs become NaN
print(pivoted_data)

4. Melting Data:

o Convert wide-format data into long-format using .melt().

melted_data = pivoted_data.reset_index().melt(id_vars=['City'], var_name='Year', value_name='Value')
print(melted_data)

These techniques are essential for cleaning and preparing messy datasets for analysis and modeling tasks!

Introduction to Modeling Libraries in Python

Unit 5

Interfacing Between Pandas and Model Code

Pandas is commonly used for preparing datasets before feeding them into machine learning or statistical
models. The integration between Pandas and modeling libraries allows seamless data manipulation and
model building.

1. Data Preparation:

o Use Pandas for cleaning, transforming, and organizing data.

o Example:

import pandas as pd
from sklearn.linear_model import LinearRegression

# Prepare data
df = pd.DataFrame({'X': [1, 2, 3], 'Y': [2, 4, 6]})
X = df[['X']]
Y = df['Y']

# Fit model
model = LinearRegression()
model.fit(X, Y)
print(model.coef_) # Output: [2.]

2. Model Integration:

o Pandas DataFrames can be directly used as inputs for libraries like Scikit-learn.

Creating Model Descriptions with Patsy

Patsy is a Python library that simplifies the creation of statistical models by using formulas similar to R.

1. Formula Syntax:

o Patsy allows you to specify relationships between variables using formulas.

o Example:

import patsy

# Create design matrices
y, X = patsy.dmatrices('Y ~ X', data=df)
print(X)  # Design matrix for predictor variables

2. Advantages:

o Automatically handles categorical variables and interactions.

o Simplifies the process of creating design matrices for linear models.
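For instance, the categorical handling mentioned above can be sketched as follows (a minimal illustration; the 'Group' column and its values are hypothetical):

import pandas as pd
import patsy

# Hypothetical data with a categorical predictor
df2 = pd.DataFrame({'Y': [1.0, 2.0, 3.0, 4.0],
                    'X': [1, 2, 3, 4],
                    'Group': ['a', 'b', 'a', 'b']})

# C() marks 'Group' as categorical; patsy expands it into dummy columns
y, X = patsy.dmatrices('Y ~ X + C(Group)', data=df2)
print(X)  # Includes an intercept, X, and a dummy column for Group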

Introduction to Statsmodels

Statsmodels is a Python library for statistical modeling and hypothesis testing.

1. Features:
o Provides tools for linear regression, time series analysis, and more.

o Offers detailed statistical summaries.

2. Example: Linear Regression:

import statsmodels.api as sm

# Prepare data
X = sm.add_constant(df['X']) # Add intercept term
Y = df['Y']

# Fit model
model = sm.OLS(Y, X).fit()
print(model.summary())

3. Advantages:

o Comprehensive output including p-values, confidence intervals, and residuals.

Plotting and Visualization

A Brief Matplotlib API Primer

Matplotlib is a foundational library for creating static plots in Python.

1. Basic Plotting:

import matplotlib.pyplot as plt

plt.plot([1, 2, 3], [4, 5, 6])
plt.title("Basic Plot")
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.show()

2. Customizations:
o Add labels, legends, and gridlines.

o Example:

plt.plot([1, 2], [3, 4], label="Line")
plt.legend()
plt.grid(True)
plt.show()

Plotting with Pandas

Pandas has built-in plotting capabilities using Matplotlib.

1. Line Plot:

df.plot(x='X', y='Y', kind='line')
plt.show()

2. Bar Plot:

df.plot(kind='bar')
plt.show()

Plotting with Seaborn

Seaborn is built on top of Matplotlib and provides aesthetically pleasing visualizations.

1. Scatter Plot:

import seaborn as sns

sns.scatterplot(x='X', y='Y', data=df)
plt.show()

2. Heatmap:
sns.heatmap(df.corr(), annot=True)
plt.show()

Other Python Visualization Tools

1. Plotly: Interactive plots for dashboards.

2. Bokeh: Browser-based interactive visualizations.

3. Altair: Declarative statistical visualizations.

These tools provide advanced features like interactivity and integration with web applications!

DATA MINING TECHNIQUES USING R

An Idea on Data Warehouse

A data warehouse is a centralized system designed to store and manage structured data from multiple
sources. It integrates data for business intelligence, enabling fast queries, insightful reporting, and
decision-making. It is a core component of business intelligence systems, helping organizations analyze
historical and current data to generate actionable insights[27][28][29].

Key Characteristics:

1. Subject-Oriented: Focuses on specific topics like sales or inventory rather than overall processes.

2. Integrated: Combines data from different sources into a consistent format.

3. Time-Variant: Stores historical data for long-term analysis.

4. Nonvolatile: Data in the warehouse is read-only and cannot be updated or deleted.

Architecture:

1. Bottom Tier: Relational database for storing raw data.

2. Middle Tier: OLAP (Online Analytical Processing) servers for complex queries.

3. Top Tier: Front-end tools for data visualization and analysis[31][30].


Data Mining - KDD vs Data Mining

Knowledge Discovery in Databases (KDD):

• KDD is the overall process of extracting useful knowledge from data, including steps like data
preparation, transformation, mining, and interpretation.

• It focuses on discovering actionable insights from raw data.

Data Mining:

• A subset of KDD that involves applying algorithms to extract patterns from data.

• Techniques include clustering, classification, association rules, and anomaly detection.

Aspect        KDD                                      Data Mining
Scope         Full process (data prep → insights)      Focuses on extracting patterns
Objective     Knowledge extraction                     Pattern identification
Techniques    Includes preprocessing and evaluation    Statistical and machine learning methods

Stages of the Data Mining Process

The data mining process consists of several stages:

1. Business Understanding:

o Define goals and objectives (e.g., fraud detection).

2. Data Preparation:

o Clean, integrate, transform, and select relevant datasets.

3. Model Building:

o Choose algorithms like decision trees or neural networks.

4. Evaluation:

o Validate patterns using metrics like accuracy or precision.

5. Deployment:
o Apply insights to operational systems (e.g., CRM)[29].

Task Primitives in Data Mining:

• Specify the dataset to be mined.

• Define the type of knowledge to discover (e.g., clustering or classification).

• Set thresholds for interestingness (e.g., minimum support/confidence).

Data Mining Techniques

1. Association Rules:

o Discover relationships between items (e.g., "If a customer buys bread, they are likely to buy
butter").

2. Classification:

o Assign labels to data based on predefined categories (e.g., spam detection).

3. Clustering:

o Group similar data points without predefined labels (e.g., customer segmentation).

4. Regression Analysis:

o Predict numerical outcomes based on input variables.

5. Anomaly Detection:

o Identify rare or unusual patterns in the data (e.g., fraud detection).

6. Neural Networks:

o Model complex nonlinear relationships for tasks like image recognition.

Data Mining Knowledge Representation

Knowledge representation involves encoding mined patterns in formats that are easy to interpret:

1. Rules:
o Represent relationships as "IF...THEN" statements.

o Example: "IF age > 30 AND income > $50K THEN likely to buy luxury products."

2. Decision Trees:

o Visualize decisions as a tree structure with branches representing choices.

3. Graphs and Networks:

o Represent relationships between entities visually (e.g., fraud ring detection).

4. Statistical Summaries:

o Provide numerical summaries like mean, variance, or correlation coefficients.

Effective representation bridges technical outputs with actionable business decisions!

Data Mining Query Languages

What is DMQL?

The Data Mining Query Language (DMQL) was proposed by Han, Fu, and Wang for the DBMiner system.
It is designed to support ad hoc and interactive data mining tasks, enabling users to define data mining
tasks and specify task-relevant data. DMQL is based on SQL and can work with databases and data
warehouses, making it a powerful tool for knowledge discovery[32][33].

Syntax for Task-Relevant Data Specification:

use database database_name
or use data warehouse data_warehouse_name
in relevance to att_or_dim_list
from relation(s)/cube(s)
[where condition]
order by order_list
group by grouping_list
This syntax allows users to specify the dataset, attributes, dimensions, and conditions for mining
tasks[32][33].
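As a hedged illustration only (the database, attribute, and relation names below are invented for the example), a query in this syntax might look like:

use database sales_db
in relevance to age, income, purchase_amount
from customer
where income > 40000
order by age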

Integration of Data Mining System with a Data Warehouse

Why Integrate?

Integrating a data mining system with a data warehouse provides:

1. Centralized Data Access: Data warehouses store large volumes of historical and current data in a
unified format.

2. Efficient Query Execution: Data mining systems can leverage pre-aggregated data from
warehouses for faster processing.

3. Enhanced Decision-Making: Combines operational data with analytical insights.

Issues in Integration:

1. Data Quality: Ensuring clean and consistent data across sources.

2. Performance: Mining large datasets requires optimized algorithms.

3. Scalability: Handling increasing volumes of data efficiently.

4. Metadata Management: Tracking transformations and lineage of mined patterns[32].

Data Preprocessing

Data preprocessing is essential to prepare raw data for analysis. It includes steps like cleaning,
transformation, feature selection, and dimensionality reduction.

1. Data Cleaning

• Removes noise, inconsistencies, and missing values from the dataset.

• Techniques:

o Handling Missing Values:


import pandas as pd
df = df.fillna(0)  # Replace missing values with 0 (assign back to keep the change)

o Removing Duplicates:

df = df.drop_duplicates()  # Assign back; drop_duplicates() returns a new DataFrame

2. Data Transformation

Transforms raw data into a suitable format for analysis.

• Techniques:

o Normalization (scaling values between 0 and 1).

o Encoding categorical variables into numerical formats.
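A minimal sketch of both techniques in pandas (the column names and values are illustrative):

import pandas as pd

df = pd.DataFrame({'income': [30000, 60000, 90000],
                   'city': ['NY', 'SF', 'NY']})

# Normalization: rescale 'income' into the [0, 1] range
df['income_scaled'] = (df['income'] - df['income'].min()) / (df['income'].max() - df['income'].min())

# Encoding: convert the categorical 'city' column into numeric indicator columns
df = pd.get_dummies(df, columns=['city'])
print(df)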

3. Feature Selection

Selects relevant features (variables) that contribute most to the predictive model.

• Example: Use algorithms like Recursive Feature Elimination (RFE) or mutual information.
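A minimal RFE sketch with scikit-learn, using synthetic data in place of a real dataset:

from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Synthetic data: 10 features, only 3 of which are informative
X, y = make_classification(n_samples=100, n_features=10, n_informative=3, random_state=0)

# Recursively drop the weakest feature until 3 remain
rfe = RFE(estimator=LogisticRegression(max_iter=1000), n_features_to_select=3)
rfe.fit(X, y)
print(rfe.support_)  # Boolean mask marking the selected features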

4. Dimensionality Reduction

Reduces the number of features while retaining essential information.

• Techniques:

o Principal Component Analysis (PCA): Projects data into lower dimensions.

o Singular Value Decomposition (SVD).
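A minimal PCA sketch with scikit-learn (the Iris dataset stands in for any numeric feature matrix):

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data  # 150 samples, 4 features

# Project the 4-dimensional data onto its top 2 principal components
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)
print(X_reduced.shape)                # (150, 2)
print(pca.explained_variance_ratio_)  # Variance retained by each component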


Unit 5

Concept Description: Characterization and Comparison

What is Concept Description?

Concept description in data mining refers to summarizing and comparing data characteristics. It involves:

1. Characterization: Summarizing the general features of a dataset.

2. Comparison: Comparing features between datasets or subsets.

Example:

• Characterizing customer purchase behavior (e.g., "Most customers buy electronics during sales").

• Comparing sales trends across regions.

Data Generalization by Attribute-Oriented Induction (AOI)

What is AOI?

Attribute-Oriented Induction (AOI) is a method for data generalization. It transforms detailed data into
higher-level concepts by replacing specific values with generalized ones using concept hierarchies.

Steps in AOI for Data Characterization:

1. Data Preprocessing:

o Clean and prepare the dataset.

o Example: Remove missing values or duplicates.

2. Attribute Generalization:
o Replace low-level attribute values with higher-level concepts.

o Example: Replace "Toyota Corolla" with "Car".

3. Aggregation:

o Combine data into summarized forms.

o Example: Calculate the average sales for each product category.

4. Presentation:

o Visualize generalized data using charts or tables.
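A minimal sketch of steps 2 and 3 in pandas, assuming a hypothetical concept hierarchy that maps products to categories:

import pandas as pd

sales = pd.DataFrame({'product': ['Toyota Corolla', 'Honda Civic', 'iPhone 15'],
                      'amount': [25000, 23000, 1000]})

# Concept hierarchy (hypothetical): low-level value -> higher-level concept
hierarchy = {'Toyota Corolla': 'Car', 'Honda Civic': 'Car', 'iPhone 15': 'Electronics'}

# Attribute generalization: replace specific products with general categories
sales['category'] = sales['product'].map(hierarchy)

# Aggregation: summarize at the generalized level
print(sales.groupby('category')['amount'].mean())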

Efficient Implementation of AOI:

• Use concept hierarchies to automate generalization.

• Optimize aggregation algorithms for large datasets.

Mining Frequent Patterns, Associations, and Correlations

Basic Concepts

Frequent patterns are recurring relationships in datasets, such as items frequently purchased together.
Mining these patterns helps identify associations and correlations.

Frequent Itemset Mining Methods

1. Apriori Method

The Apriori algorithm is used to find frequent itemsets in transactional data by leveraging the principle
that subsets of frequent itemsets must also be frequent.

Steps:

1. Generate candidate itemsets of length 1.

2. Filter itemsets based on minimum support.

3. Extend itemsets by adding items and repeat filtering.

4. Stop when no more frequent itemsets can be generated.


Example:

• If "Milk" and "Bread" are frequently bought together, they form a frequent itemset.

Generating Association Rules

Association rules describe relationships between items in frequent itemsets, such as:

• Rule: "If Milk, then Bread".

• Metrics:

o Support: Frequency of the itemset in the dataset.

o Confidence: Probability of Bread being bought given Milk was bought.

o Lift: Strength of association compared to random chance.
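To make the metrics concrete, here is a small hand computation for the rule {Milk} → {Bread} on a toy set of four transactions:

transactions = [{'Milk', 'Bread'}, {'Milk', 'Bread', 'Butter'},
                {'Bread'}, {'Milk'}]
n = len(transactions)

both = sum(1 for t in transactions if {'Milk', 'Bread'} <= t)  # Milk and Bread together
milk = sum(1 for t in transactions if 'Milk' in t)
bread = sum(1 for t in transactions if 'Bread' in t)

support = both / n               # 2/4 = 0.50
confidence = both / milk         # 2/3 ≈ 0.67
lift = confidence / (bread / n)  # ≈ 0.89; below 1, so the association is weak here
print(support, confidence, lift)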

Improving the Efficiency of Apriori

1. Reduce candidate generation using pruning techniques.

2. Use hash-based methods to count itemsets efficiently.

3. Parallelize computations for large datasets.

2. Pattern-Growth Approach for Mining Frequent Itemsets

The pattern-growth approach avoids candidate generation by recursively dividing the dataset into smaller
subsets based on frequent items (using structures like FP-trees).

Steps:

1. Construct an FP-tree from transactional data.

2. Traverse the tree to find frequent patterns directly.

Advantages:

• Faster than Apriori for large datasets.

• Reduces memory usage by avoiding candidate generation.
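As a hedged sketch, the same mlxtend library used above also offers an FP-growth implementation with an identical interface (the transaction data is again made up):

import pandas as pd
from mlxtend.frequent_patterns import fpgrowth

transactions = pd.DataFrame({'Milk':   [1, 1, 0, 1],
                             'Bread':  [1, 1, 1, 0],
                             'Butter': [0, 1, 1, 1]}, dtype=bool)

# FP-growth builds an FP-tree internally; there is no candidate generation step
frequent = fpgrowth(transactions, min_support=0.5, use_colnames=True)
print(frequent)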


Summary

Concept description provides insights into datasets through characterization and comparison, while AOI
simplifies data by generalizing attributes using hierarchies. Frequent pattern mining methods like Apriori
and pattern-growth help uncover associations and correlations efficiently, enabling businesses to make
informed decisions based on hidden patterns in their data[34][35][36].

Unit 4

Classification: Basic Concepts

What is Classification?

Classification is a supervised learning technique in data mining that assigns data instances to predefined
categories (classes) based on their features. It involves building a model using labeled training data and
applying it to predict the class labels of new, unseen data.

Key Steps in Classification:

1. Data Collection: Gather relevant data containing features and labels.

2. Model Building: Train a classification algorithm using the training dataset.

3. Prediction: Apply the trained model to classify new data instances.

4. Evaluation: Assess the model’s performance using metrics like accuracy, precision, recall, and F1-
score.
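A minimal end-to-end sketch of these four steps with scikit-learn, using synthetic data in place of a real collection step:

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# 1. Data collection (synthetic stand-in)
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 2. Model building
clf = DecisionTreeClassifier(max_depth=3, random_state=0)
clf.fit(X_train, y_train)

# 3. Prediction on unseen data
y_pred = clf.predict(X_test)

# 4. Evaluation
print(accuracy_score(y_test, y_pred))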

Types of Classification:

1. Binary Classification: Two classes (e.g., spam vs. not spam).

2. Multi-Class Classification: More than two classes (e.g., categorizing fruits as apple, banana, or
orange).

Decision Tree Induction

What is a Decision Tree?


A decision tree is a tree-like structure used for classification and prediction. Each internal node represents
an attribute test, each branch represents an outcome of the test, and each leaf node represents a class
label.

Decision Tree Induction Algorithm:

1. Start with the entire dataset.

2. Select the best attribute for splitting the data based on an attribute selection measure.

3. Partition the dataset into subsets based on the selected attribute.

4. Repeat recursively for each subset until all instances belong to the same class or stopping criteria
are met.

Attribute Selection Measures

Attribute selection measures help identify the best attribute to split the dataset at each step.

1. Information Gain:

o Measures how much information an attribute contributes to classification.

o Formula: Information Gain = Entropy(before) − Entropy(after)

o Example: Select an attribute that maximizes information gain.

2. Gini Index:

o Measures impurity in a dataset.

o Lower values indicate better splits.

3. Chi-Square Test:

o Evaluates statistical significance of attributes in classification.
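As an illustration of the information-gain formula above, a small hand computation for a binary split (the labels and the split are invented):

from collections import Counter
from math import log2

def entropy(labels):
    # Shannon entropy of a list of class labels
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

parent = ['yes', 'yes', 'yes', 'no', 'no', 'no']         # Before the split
left, right = ['yes', 'yes', 'yes'], ['no', 'no', 'no']  # After splitting on some attribute

# Weighted average entropy after the split
after = (len(left) * entropy(left) + len(right) * entropy(right)) / len(parent)
print(entropy(parent) - after)  # Information gain = 1.0 (a perfect split)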

Tree Pruning

Tree pruning removes unnecessary branches from a decision tree to reduce complexity and improve
generalization.
1. Pre-Pruning: Stop tree growth early based on criteria like minimum samples per node.

2. Post-Pruning: Remove branches after the tree is built by evaluating their impact on accuracy.

Bayes Classification Methods

What is Bayes Classification?

Bayesian classifiers use probability theory to predict class labels based on prior probabilities and
likelihoods of features given classes.

Naive Bayes Classifier:

• Assumes independence between features.

• Formula:

P(C|X) = P(X|C) · P(C) / P(X)

Where P(C|X) is the posterior probability of class C given features X, P(X|C) is the likelihood, P(C) is the prior, and P(X) is the evidence.

• Example:
Predict whether an email is spam based on word frequencies.
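A minimal sketch of the spam example with scikit-learn's MultinomialNB (the vocabulary and word counts are invented):

from sklearn.naive_bayes import MultinomialNB

# Each row: counts of the words ['free', 'meeting', 'winner'] in one email
X = [[3, 0, 2],   # spam
     [0, 2, 0],   # not spam
     [2, 0, 1],   # spam
     [0, 1, 0]]   # not spam
y = ['spam', 'ham', 'spam', 'ham']

model = MultinomialNB()
model.fit(X, y)
print(model.predict([[1, 0, 1]]))  # Predicts 'spam' for a new email's word counts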

Summary

Classification organizes data into predefined categories using methods like decision trees and Bayesian
classifiers. Decision tree induction involves selecting attributes using measures like information gain and
pruning unnecessary branches for simplicity. Bayesian methods rely on probabilistic reasoning, making
them effective for tasks with independent features.
Unit 5

Association Rule Mining

Antecedent and Consequent

• Antecedent: The "IF" part of an association rule. It represents the item or itemset found in the data.
Example: In the rule {Bread} → {Butter}, "Bread" is the antecedent.

• Consequent: The "THEN" part of the association rule. It represents the item or itemset that occurs
along with the antecedent. Example: In {Bread} → {Butter}, "Butter" is the consequent[37][38].

Multi-Relational Association Rules

Multi-relational association rules extend traditional association rules by considering relationships across
multiple tables or datasets. They are useful in scenarios where data is stored in relational databases,
allowing analysis across different dimensions (e.g., customer demographics and purchase history).

ECLAT Algorithm

ECLAT (Equivalence Class Transformation) is a frequent itemset mining algorithm that uses a depth-first
search strategy:

1. Steps:

o Transform the dataset into a vertical format (item → transaction IDs).

o Use intersections of transaction IDs to find frequent itemsets.

2. Advantages:

o Efficient for dense datasets.

o Avoids candidate generation like Apriori[37].
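A minimal sketch of the vertical-format idea (the transactions are illustrative, and only item pairs are checked, whereas full ECLAT recurses to longer itemsets):

from itertools import combinations

transactions = {1: {'Milk', 'Bread'}, 2: {'Milk', 'Bread', 'Butter'},
                3: {'Bread', 'Butter'}, 4: {'Milk', 'Butter'}}

# Vertical format: item -> set of transaction IDs containing it
tidsets = {}
for tid, items in transactions.items():
    for item in items:
        tidsets.setdefault(item, set()).add(tid)

min_support = 2
# Intersect TID-sets: the intersection size is the pair's support count
for a, b in combinations(sorted(tidsets), 2):
    common = tidsets[a] & tidsets[b]
    if len(common) >= min_support:
        print({a, b}, 'support =', len(common))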

Case Study on Market Basket Analysis

Market Basket Analysis identifies patterns in customer purchasing behavior, such as items frequently
bought together, to optimize sales strategies.

Example: Amazon
Amazon uses market basket analysis to recommend products under headings like "Frequently bought
together" or "Customers who bought this item also bought." This improves cross-selling and enhances
customer experience[39][40][38].

Applications:

1. Retail: Optimize store layouts and product placement (e.g., placing milk near bread).

2. E-commerce: Automate product recommendations.

3. Fraud Detection: Identify unusual credit card transactions[37][40].

Benefits:

• Enhanced customer understanding.

• Improved inventory management.

• Increased sales through cross-selling[41].

Cluster Analysis

What is Cluster Analysis?

Cluster analysis groups similar data points into clusters based on their attributes. It is an unsupervised
learning technique used for segmentation, anomaly detection, and pattern recognition.

Partitioning Methods

Partitioning methods divide data into non-overlapping subsets (clusters). Examples include:

1. K-Means Clustering:

o Divides data into 𝑘 clusters based on centroids.

o Iteratively minimizes within-cluster variance.

o Example:

from sklearn.cluster import KMeans

kmeans = KMeans(n_clusters=3)
kmeans.fit(data)
print(kmeans.labels_)

2. K-Medoids:

o Similar to K-Means but uses medoids (actual data points) as cluster centers.

Hierarchical Methods

Hierarchical clustering builds a tree-like structure of clusters:

1. Agglomerative Clustering:

o Starts with individual points as clusters and merges them iteratively.

2. Divisive Clustering:

o Starts with all points in one cluster and splits them iteratively.

Example:

from scipy.cluster.hierarchy import dendrogram, linkage

linked = linkage(data, method='ward')
dendrogram(linked)

Density-Based Methods

Density-based methods group points that are closely packed together.

DBSCAN (Density-Based Spatial Clustering of Applications with Noise):

1. Groups points based on density (minimum number of points within a radius).

2. Identifies noise as outliers.

3. Steps:

o Select core points based on density threshold.

o Expand clusters around core points.

4. Example:
from sklearn.cluster import DBSCAN
dbscan = DBSCAN(eps=0.5, min_samples=5)
dbscan.fit(data)
print(dbscan.labels_)

5. Advantages:

o Handles arbitrary-shaped clusters.

o Robust to noise.

Summary

Association rule mining uncovers relationships between items, such as antecedents and consequents,
enabling insights like cross-selling opportunities through algorithms like Apriori and ECLAT. Market
Basket Analysis exemplifies its application in retail and e-commerce for optimizing product placement
and recommendations.

Cluster analysis segments data into meaningful groups using methods like K-Means, hierarchical
clustering, and DBSCAN, each suited for specific types of datasets and clustering needs. These techniques
are essential for understanding patterns and improving decision-making across industries!

1. https://www.datacamp.com/blog/what-is-data-analysis-expert-guide

2. https://www.sganalytics.com/blog/what-is-the-meaning-of-data-analysis/

3. https://www.upwork.com/resources/data-analysis-vs-data-analytics

4. https://www.questionpro.com/blog/data-analytics-vs-data-analysis/

5. https://www.w3schools.com/python/python_intro.asp

6. https://www.tutorialspoint.com/python/python_features.htm

7. https://www.sisense.com/glossary/python-for-data-analysis/

8. https://www.linkedin.com/pulse/what-makes-python-brilliant-choice-data-analysis-pratibha-kumari-jha

9. https://en.wikipedia.org/wiki/Data_warehouse
10. https://www.sap.com/mena/products/data-cloud/datasphere/what-is-a-data-warehouse.html

11. https://cdn.aaai.org/KDD/1996/KDD96-014.pdf

12. https://en.wikipedia.org/wiki/Data_mining

13. https://www.ibm.com/think/topics/data-mining

14. https://www.studocu.com/in/messages/question/4583274/discuss-data-mining-task-primitives

15. https://www.investopedia.com/terms/d/datamining.asp

16. https://www.aimasterclass.com/glossary/knowledge-representation

17. http://dataminingzone.weebly.com/uploads/6/5/9/4/6594749/ch_9data_mining_query_language.pdf

18. https://www.tutorialspoint.com/data_mining/dm_query_language.htm

19. https://data-flair.training/blogs/data-mining-query-language/

20. https://liris.cnrs.fr/Documents/Liris-1661.pdf

21. https://marketsplash.com/what-is-ipython/

22. https://domino.ai/data-science-dictionary/jupyter-notebook

23. https://jupyter.org

24. https://www.freecodecamp.org/news/the-python-guide-for-beginners/

25. https://en.wikipedia.org/wiki/Python_syntax_and_semantics

26. https://bootcamp.cvn.columbia.edu/blog/python-basics-guide/

27. https://www.techtarget.com/searchdatamanagement/definition/data-warehouse

28. https://atlan.com/data-warehouse-101/

29. https://en.wikipedia.org/wiki/Data_warehouse

30. https://www.simplilearn.com/data-warehouse-article

31. https://www.sap.com/india/products/technology-platform/datasphere/what-is-a-data-warehouse.html

32. https://www.tutorialspoint.com/data_mining/dm_query_language.htm

33. https://data-flair.training/blogs/data-mining-query-language/
34. https://www.investopedia.com/terms/d/datamining.asp

35. https://www.techtarget.com/searchbusinessanalytics/definition/data-mining

36. https://en.wikipedia.org/wiki/Data_mining

37. https://www.geeksforgeeks.org/market-basket-analysis-in-data-mining/

38. https://www.linkedin.com/pulse/what-market-basket-analysis-overview-uses-types-chatterjee-

39. https://www.simplilearn.com/what-is-market-basket-analysis-article

40. https://www.techtarget.com/searchcustomerexperience/definition/market-basket-analysis

41. https://www.alteryx.com/resources/use-case/market-basket-analysis
