
Data_Science_Cheat_Sheet (3)

This cheat sheet provides essential commands and concepts for data science, covering NumPy for array manipulation, Pandas for data handling, and basic statistics. It also includes machine learning fundamentals with model training and evaluation, as well as data visualization techniques using Matplotlib and Seaborn. Key statistical concepts such as mean, median, mode, and hypothesis testing are also summarized.

Uploaded by

C. Adithya
Copyright
© All Rights Reserved

Data Science Cheat Sheet

NumPy Basics
- import numpy as np
- np.array([1, 2, 3]) # Create array
- np.zeros((3,3)) # Create 3x3 zero matrix
- np.ones((2,2)) # Create 2x2 one matrix
- np.random.rand(3,3) # Random values
- np.mean(arr), np.std(arr) # Mean & population std (ddof=0) of an existing array arr
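
The calls above can be combined into a short runnable sketch. The sample array `arr` and the seeded generator are assumptions added for illustration; the cheat sheet itself does not define them.

```python
import numpy as np

# Hypothetical sample data; `arr` is not defined in the cheat sheet.
arr = np.array([1, 2, 3, 4])

zeros = np.zeros((3, 3))          # 3x3 matrix filled with 0.0
ones = np.ones((2, 2))            # 2x2 matrix filled with 1.0

# Seeded generator so results are reproducible; an alternative to np.random.rand.
rng = np.random.default_rng(0)
rand = rng.random((3, 3))         # uniform values in [0, 1)

mean = np.mean(arr)               # arithmetic mean
pop_std = np.std(arr)             # population std (ddof=0), NumPy's default
sample_std = np.std(arr, ddof=1)  # sample std, divides by n - 1

print(mean, pop_std, sample_std)
```

Note that `np.std` divides by n unless `ddof=1` is passed, so it can differ from the value pandas reports for the same data.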

Pandas Essentials
- import pandas as pd
- df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]}) # Create DataFrame
- df.head(), df.tail() # View top/bottom rows
- df.info(), df.describe() # Column dtypes & non-null counts; summary stats
- df['col_name'], df.iloc[0], df.loc['row'] # Access a column, a row by position, a row by index label
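
A minimal sketch of the access patterns above. It reuses the two-column frame from the list; the string index labels `r0` and `r1` are hypothetical, added only so that `.loc` has labels to look up.

```python
import pandas as pd

# Toy frame from the list above, plus a hypothetical string index for .loc.
df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]}, index=['r0', 'r1'])

col_a = df['A']            # Series holding column 'A'
first_row = df.iloc[0]     # first row, selected by integer position
row_r1 = df.loc['r1']      # row selected by index label
cell = df.loc['r1', 'B']   # single value: row label plus column name
summary = df.describe()    # count, mean, std, min, quartiles, max per numeric column

print(summary)
```

`iloc` always counts positions from 0, while `loc` matches whatever labels the index holds, so the two agree only when the index is the default 0..n-1 range.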

Probability & Statistics
- Mean: sum(x)/n
- Median: Middle value of sorted data
- Mode: Most frequent value
- Variance: avg((x - mean)²) (population form; the sample variance divides by n - 1 instead of n)
- Standard Deviation: sqrt(variance)
- Normal Distribution: bell curve, symmetric
- Hypothesis Testing: test a null hypothesis (H0) against an alternative (H1); reject H0 when the p-value is below the chosen significance level
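
The descriptive statistics above can be checked by hand on a small sample. This sketch uses Python's standard `statistics` module; the data values are hypothetical, chosen so the results are round numbers.

```python
import statistics as st

# Hypothetical sample, already sorted for readability.
data = [2, 4, 4, 5, 7, 8]

mean = st.mean(data)           # sum(x)/n = 30/6
median = st.median(data)       # even n: average of the two middle values (4 and 5)
mode = st.mode(data)           # most frequent value
variance = st.pvariance(data)  # population variance: avg((x - mean)^2) = 24/6
std = st.pstdev(data)          # sqrt of the population variance

print(mean, median, mode, variance, std)
```

`st.variance` and `st.stdev` give the sample versions, which divide by n - 1.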

Machine Learning Basics
- from sklearn.model_selection import train_test_split
- X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
- from sklearn.linear_model import LinearRegression
- model = LinearRegression()
- model.fit(X_train, y_train)
- y_pred = model.predict(X_test)
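- from sklearn.metrics import mean_squared_error, r2_score # Regression metrics (accuracy_score is for classifiers)
- mean_squared_error(y_test, y_pred), r2_score(y_test, y_pred) # Lower MSE and higher R² are better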
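
An end-to-end sketch of the workflow above on synthetic data. The data-generating rule (y = 3x + 2 plus noise) and the fixed seeds are assumptions for illustration. Because `LinearRegression` predicts continuous values, the sketch scores it with `mean_squared_error` and `r2_score`; `accuracy_score` compares class labels and is meant for classifiers.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic data (assumption): y = 3x + 2 with small Gaussian noise.
rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(100, 1))
y = 3 * X[:, 0] + 2 + rng.normal(0, 0.5, size=100)

# Hold out 20% of the rows for testing; random_state fixes the split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

mse = mean_squared_error(y_test, y_pred)  # average squared error on held-out data
r2 = r2_score(y_test, y_pred)             # share of variance explained, at most 1.0

print(model.coef_, model.intercept_, mse, r2)
```

For a classification model such as `LogisticRegression`, the same split/fit/predict steps apply and `accuracy_score(y_test, y_pred)` becomes a suitable metric.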

Data Visualization
- import matplotlib.pyplot as plt
- plt.plot(x, y), plt.show() # Line plot
- plt.hist(data) # Histogram
- plt.scatter(x, y) # Scatter plot
- import seaborn as sns
- sns.heatmap(df.corr(numeric_only=True), annot=True) # Correlation heatmap of numeric columns
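
A small sketch that draws the three Matplotlib plot types above side by side. The sample data, the `Agg` backend, and the output filename are assumptions for illustration; the Seaborn heatmap is left out to keep the example to one plotting library.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this also runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical sample data.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = np.sin(x)
data = rng.normal(size=200)

# One row, three panels: line plot, histogram, scatter plot.
fig, (ax_line, ax_hist, ax_scatter) = plt.subplots(1, 3, figsize=(12, 3))
ax_line.plot(x, y)
ax_line.set_title("Line plot")
ax_hist.hist(data, bins=20)
ax_hist.set_title("Histogram")
ax_scatter.scatter(x, y)
ax_scatter.set_title("Scatter plot")

fig.tight_layout()
fig.savefig("cheat_sheet_plots.png")  # hypothetical output filename
```

In an interactive session, drop the `matplotlib.use("Agg")` line and call `plt.show()` instead of saving to a file.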
