
Data Analytics


Ex No: 01
RANDOM SAMPLING USING PYTHON
Date: 14/02/2024

AIM:
To write and execute a Python program for random sampling.

ALGORITHM:

Step 1: Start.
Step 2: Import the necessary libraries and modules.
Step 3: Assign the random sample to a variable.
Step 4: Display the random value.
Step 5: Stop.

PROGRAM:

(a) Print a single random float drawn from the half-open interval [0.0, 1.0).
Called without a size argument, np.random.sample() returns one float rather than an array.
# importing numpy
import numpy as np
# output random value
out_val = np.random.sample()
print("Output random value : ", out_val)
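np.random.sample() draws from NumPy's global random state, so the printed value changes on every run. When a repeatable result is wanted, a seeded generator can be used instead. The sketch below assumes NumPy 1.17 or later (for np.random.default_rng); the seed value 42 is an arbitrary choice and is not part of the original exercise.

```python
import numpy as np

# Seeded generator; the seed 42 is an arbitrary value chosen for this sketch
rng = np.random.default_rng(seed=42)

# Single float drawn uniformly from [0.0, 1.0)
single_val = rng.random()

# The size argument returns an array of draws instead of one float
grid = rng.random(size=(3, 3))

# A second generator built from the same seed repeats the same first draw
rng_again = np.random.default_rng(seed=42)
assert rng_again.random() == single_val

print("Single value:", single_val)
print("3x3 array:\n", grid)
```

Fixing the seed is useful when the output has to be verified against a recorded value; leaving it out gives a fresh sample on each run.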

OUTPUT:

221801007 1 AD19441
(b) Random Sampling for 2D Array

# importing numpy
import numpy as np
# output array
out_arr = np.random.sample(size=(3, 3))
print("Output 2D Array filled with random floats : ", out_arr)

OUTPUT:

(c) Random sampling from a DataFrame with sample()

import pandas as pd
# Load the dataset
df = pd.read_csv('C:/Users/Kavitha/Documents/221801007/Diet.csv')
print(df)
# One random row (n defaults to 1)
print(df.sample())
# One random column
print(df.sample(axis=1))
# Three random rows
print(df.sample(n=3))
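DataFrame.sample() also accepts frac, replace and random_state. The sketch below shows these options on a small hand-made table, because the contents of Diet.csv are not shown here; the column names and values are invented for illustration only.

```python
import pandas as pd

# Hypothetical stand-in for Diet.csv: columns and values are invented
df = pd.DataFrame({
    "person": range(1, 11),
    "diet": [1, 2, 3, 1, 2, 3, 1, 2, 3, 1],
    "weight_lost": [3.8, 2.1, 5.0, 0.6, 3.2, 4.4, 1.9, 2.7, 6.1, 3.0],
})

# random_state fixes the draw, so the same rows come back on every run
three_rows = df.sample(n=3, random_state=0)

# frac draws a proportion of the rows instead of a fixed count
half = df.sample(frac=0.5, random_state=0)

# replace=True samples with replacement, so a row may appear more than once
bootstrap = df.sample(n=len(df), replace=True, random_state=0)

print(three_rows)
print(half)
print(bootstrap)
```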

OUTPUT:

RESULT:
Thus, the Python program for random sampling was executed successfully
and the output was verified.

Ex No: 02
T-test case study
Date:28/02/2024

AIM:
To write and execute a Python program for a t-test case study.

ALGORITHM:

Step 1: Start.

Step 2: Import the necessary libraries.

Step 3: Define the given information.

Step 4: Calculate the sample statistics

Step 5: Perform the t-test

Step 6: Hypothesis testing using the t-statistic

Step 7: Interpret the results

Step 8: Stop.

PROGRAM:

import numpy as np
from scipy import stats
N = 10
x = np.random.randn(N) + 2
y = np.random.randn(N)
# Sample variances (ddof=1 gives the unbiased estimate)
var_x = x.var(ddof=1)
var_y = y.var(ddof=1)
# Pooled standard deviation (this form is valid because both samples have size N)
SD = np.sqrt((var_x + var_y) / 2)
print("Standard Deviation =", SD)
# t-statistic for two independent samples of equal size
tval = (x.mean() - y.mean()) / (SD * np.sqrt(2 / N))
# Degrees of freedom
dof = 2 * N - 2
# Two-sided p-value; abs() keeps it valid when the t-statistic is negative
pval = 2 * stats.t.sf(abs(tval), df=dof)
print("t = " + str(tval))
print("p = " + str(pval))
# Cross-check using the built-in function from SciPy
tval2, pval2 = stats.ttest_ind(x, y)
print("t = " + str(tval2))
print("p = " + str(pval2))
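The program prints t and p but stops before the decision described in Steps 6 and 7. A minimal sketch of that decision follows, assuming a two-sided test at a significance level of 0.05. The level, the seed and the variable names t_critical, reject_by_t and reject_by_p are assumptions of this sketch, not part of the original program.

```python
import numpy as np
from scipy import stats

# Seeded synthetic samples; seed, N and alpha are assumptions of this sketch
rng = np.random.default_rng(0)
N = 10
x = rng.standard_normal(N) + 2   # sample drawn around a true mean of 2
y = rng.standard_normal(N)       # sample drawn around a true mean of 0
alpha = 0.05

# Two-sided independent-samples t-test (equal variances assumed by default)
tval, pval = stats.ttest_ind(x, y)
dof = 2 * N - 2

# Two-sided critical value: H0 is rejected when |t| exceeds it
t_critical = stats.t.ppf(1 - alpha / 2, df=dof)

# The two decision rules are equivalent and should agree
reject_by_t = abs(tval) > t_critical
reject_by_p = pval < alpha

print("t =", tval, "| critical t =", t_critical, "| p =", pval)
print("Reject H0" if reject_by_p else "Fail to reject H0")
```

Rejecting H0 here means the data are unlikely under the assumption that both groups share the same mean; failing to reject does not prove the means are equal.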

OUTPUT:

RESULT:
Thus, the Python program for the t-test was executed successfully
and the output was verified.

Ex No: 03
Z-test case study
Date:13/03/2024

AIM:
To write and execute a Python program for a z-test case study.

ALGORITHM:

Step 1: Start.
Step 2: Import the necessary libraries.
Step 3: Define the given information.
Step 4: Calculate the Z-score.
Step 5: Determine the critical Z-score.
Step 6: Hypothesis testing using critical Z-score.
Step 7: Hypothesis testing using P-value.
Step 8: Stop.

PROGRAM:

import numpy as np
import scipy.stats as stats
sample_mean = 110
population_mean = 100
population_std = 15
sample_size = 50
alpha = 0.05
# Z-score of the sample mean under the null hypothesis
z_score = (sample_mean - population_mean) / (population_std / np.sqrt(sample_size))
print('Z-Score :', z_score)
# Critical value for a right-tailed test at significance level alpha
z_critical = stats.norm.ppf(1 - alpha)
print('Critical Z-Score :', z_critical)
if z_score > z_critical:
    print("Reject Null Hypothesis")
else:
    print("Fail to Reject Null Hypothesis")
# Right-tailed p-value
p_value = 1 - stats.norm.cdf(z_score)
print('p-value :', p_value)
if p_value < alpha:
    print("Reject Null Hypothesis")
else:
    print("Fail to Reject Null Hypothesis")
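The program above is a right-tailed test: it rejects only when the sample mean is significantly larger than the population mean. The sketch below wraps the same calculation in a helper that also covers the left-tailed and two-sided cases. The function name one_sample_z_test and its alternative parameter are hypothetical, introduced here for illustration.

```python
import numpy as np
from scipy import stats


def one_sample_z_test(sample_mean, population_mean, population_std,
                      sample_size, alternative="greater"):
    """Return (z, p) for a one-sample z-test with a known population std."""
    standard_error = population_std / np.sqrt(sample_size)
    z = (sample_mean - population_mean) / standard_error
    if alternative == "greater":        # right-tailed
        p = stats.norm.sf(z)
    elif alternative == "less":         # left-tailed
        p = stats.norm.cdf(z)
    elif alternative == "two-sided":    # both tails
        p = 2 * stats.norm.sf(abs(z))
    else:
        raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")
    return z, p


# Same figures as the program above: mean 110 vs 100, std 15, n = 50
z, p_right = one_sample_z_test(110, 100, 15, 50)
_, p_two = one_sample_z_test(110, 100, 15, 50, alternative="two-sided")

print("z =", z)
print("right-tailed p =", p_right)
print("two-sided p =", p_two)
```

stats.norm.sf(z) is the survival function, equal to 1 - cdf(z) but more accurate for large z, where the subtraction loses precision.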

OUTPUT:

RESULT:
Thus, the Python program for the z-test was executed successfully and
the output was verified.

Ex No: 04
ANOVA case study
Date:20/03/2024

AIM:
To write and execute a Python program for an ANOVA case study.

ALGORITHM:

Step 1: Start.
Step 2: Import the necessary libraries.
Step 3: Load the dataset
Step 4: Group the data by the factor of interest
Step 5: Perform the ANOVA
Step 6: Hypothesis testing using the F-statistic
Step 7: Visualize the data
Step 8: Stop.

PROGRAM:

import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm
from statsmodels.formula.api import ols
import seaborn as sns
import numpy as np
plt.style.use('fivethirtyeight')
# Load the dataset and preview the first rows
mydata = pd.read_csv('C:/Users/Kavitha/Documents/221801007/Diet.csv')
print(mydata.head())
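The listing stops after loading the data, while Steps 4 to 6 call for grouping, the ANOVA itself and a decision on the F-statistic. A minimal sketch of a one-way ANOVA follows. It uses synthetic data because the columns of Diet.csv are not shown here: the column names diet and weight_lost, the group means and the seed are all invented. It also uses scipy.stats.f_oneway rather than the statsmodels imports above, and computes the F-statistic by hand as a check.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical stand-in for Diet.csv: three groups of 20 invented observations
rng = np.random.default_rng(1)
data = pd.DataFrame({
    "diet": np.repeat(["A", "B", "C"], 20),
    "weight_lost": np.concatenate([
        rng.normal(3.0, 2.0, 20),
        rng.normal(3.2, 2.0, 20),
        rng.normal(5.0, 2.0, 20),
    ]),
})

# Step 4: one array of observations per level of the factor
groups = [g["weight_lost"].to_numpy() for _, g in data.groupby("diet")]

# Step 5 by hand: F = (between-group mean square) / (within-group mean square)
grand_mean = data["weight_lost"].mean()
k, n_total = len(groups), len(data)
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
f_manual = (ss_between / (k - 1)) / (ss_within / (n_total - k))
p_manual = stats.f.sf(f_manual, k - 1, n_total - k)

# Step 5 with SciPy, as a cross-check of the manual calculation
f_scipy, p_scipy = stats.f_oneway(*groups)

# Step 6: decision at an assumed significance level of 0.05
alpha = 0.05
print("F =", f_scipy, "| p =", p_scipy)
print("Reject H0" if p_scipy < alpha else "Fail to reject H0")
```

A significant F only says that at least one group mean differs; identifying which one needs a follow-up comparison.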

OUTPUT:

RESULT:
Thus, the Python program for the ANOVA case study was executed
successfully and the output was verified.

Ex No: 05
REGRESSION
Date:03/04/2024

AIM:
To write and execute a Python program for implementing
regression.

ALGORITHM:

STEP 1: Start.

STEP 2: Import the necessary libraries numpy, matplotlib, sklearn.

STEP 3: Load the Boston dataset.

STEP 4: Split the dataset into training and testing sets.

STEP 5: Train a linear regression model on the training data.

STEP 6: Plot the residual errors for both the training and testing data.

STEP 7: Display the plot.

STEP 8: Stop.

PROGRAM:

import matplotlib.pyplot as plt
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.impute import SimpleImputer
# Load the housing data
data = pd.read_csv('C:/Users/Kavitha/Documents/221801007/HousingData.csv')
# Replace missing values with the column mean
imputer = SimpleImputer(strategy='mean')
data_imputed = pd.DataFrame(imputer.fit_transform(data), columns=data.columns)
X = data_imputed.drop('MEDV', axis=1)  # Features
y = data_imputed['MEDV']  # Target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=1)
regression_model = LinearRegression()
regression_model.fit(X_train, y_train)
print('Regression Coefficients:', regression_model.coef_)
# Predict once and reuse the results for scoring and plotting
train_pred = regression_model.predict(X_train)
y_pred = regression_model.predict(X_test)
print('R^2 score:', r2_score(y_test, y_pred))
# Residual plot: predicted value against (predicted - actual)
plt.style.use('fivethirtyeight')
plt.scatter(train_pred, train_pred - y_train, color="green", s=10, label='Train data')
plt.scatter(y_pred, y_pred - y_test, color="blue", s=10, label='Test data')
max_pred_value = max(train_pred.max(), y_pred.max())
plt.hlines(y=0, xmin=0, xmax=max_pred_value, linewidth=2)
plt.legend(loc='upper right')
plt.title("Residual errors")
plt.show()
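The residual plot is easier to read once it is clear what a well-fitted model looks like. The sketch below fits the same kind of model to synthetic data whose true coefficients are known, so the fitted values can be checked directly. The data, the coefficients 3.0 and -1.5, the intercept 4.0 and the noise level are all invented for this illustration and have nothing to do with HousingData.csv.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic data: y = 4.0 + 3.0*x1 - 1.5*x2 + noise (all values invented)
rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(200, 2))
true_coef = np.array([3.0, -1.5])
y = 4.0 + X @ true_coef + rng.normal(0, 0.5, size=200)

# Same split proportions as the program above
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=1)

model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

# Residuals with the same sign convention as the plot: predicted - actual
residuals = y_pred - y_test
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
r2 = r2_score(y_test, y_pred)

print("Fitted coefficients:", model.coef_)
print("Fitted intercept:", model.intercept_)
print("Test RMSE:", rmse)
print("Test R^2:", r2)
print("Mean residual:", residuals.mean())
```

For a model that matches the data-generating process, the residuals scatter around zero with no visible pattern, and the RMSE is close to the noise level. Curvature or a funnel shape in the residual plot suggests the linear model is missing something.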

OUTPUT:

RESULT:
Thus, the Python program for implementing regression was executed
successfully and the output was verified.

Ex No: 06
LOGISTIC REGRESSION
Date:10/04/2024

AIM:
To write and execute a Python program for implementing logistic
regression.

ALGORITHM:

Step 1: Start.
Step 2: Import the necessary libraries.
Step 3: Load the dataset
Step 4: Split the dataset into training and testing sets
Step 5: Train a logistic regression model on the training data
Step 6: Evaluate the model
Step 7: Plot the ROC curve
Step 8: Stop.

PROGRAM:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, classification_report,
                             confusion_matrix, roc_curve, auc)
# Load the diabetes dataset
diabetes = load_diabetes()
X, y = diabetes.data, diabetes.target

# Convert the target to binary (1 if disease progression is above the median, 0 otherwise)
y_binary = (y > np.median(y)).astype(int)
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y_binary, test_size=0.2,
random_state=42)
# Standardize features
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
# Train the Logistic Regression model
model = LogisticRegression()
model.fit(X_train, y_train)
# Evaluate the model
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy: {:.2f}%".format(accuracy * 100))
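The program imports roc_curve, auc and confusion_matrix but stops at the accuracy, while Step 7 calls for the ROC curve. A minimal sketch of that step follows. It repeats the same data preparation so that it runs on its own, and it computes the curve without drawing it; the variable names y_score, roc_auc and cm are assumptions of this sketch.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import auc, confusion_matrix, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Same preparation as the program above
X, y = load_diabetes(return_X_y=True)
y_binary = (y > np.median(y)).astype(int)
X_train, X_test, y_train, y_test = train_test_split(
    X, y_binary, test_size=0.2, random_state=42)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
model = LogisticRegression().fit(X_train, y_train)

# The ROC curve needs scores, not hard labels: take P(class = 1)
y_score = model.predict_proba(X_test)[:, 1]

# False positive rate and true positive rate at each threshold
fpr, tpr, thresholds = roc_curve(y_test, y_score)
roc_auc = auc(fpr, tpr)

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_test, model.predict(X_test))

print("AUC:", roc_auc)
print("Confusion matrix:\n", cm)
```

Passing fpr and tpr to plt.plot(fpr, tpr) draws the curve; an AUC of 0.5 corresponds to random guessing and 1.0 to perfect separation.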

OUTPUT:

RESULT:
Thus, the Python program for implementing logistic regression was
executed successfully and the output was verified.

Ex No: 07
Time Series Analysis
Date:17/04/2024

AIM:
To write and execute a Python program for implementing time
series analysis.

ALGORITHM:

Step 1: Start.

Step 2: Import the necessary libraries.

Step 3: Load the time series dataset

Step 4: Plot the time series data

Step 5: Decompose the time series data

Step 6: Split the dataset into training and testing sets

Step 7: Train a SARIMA model on the training data

Step 8: Forecast and plot the results

Step 9: Evaluate the model

Step 10: Stop

PROGRAM:

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# Load the monthly passenger counts
df = pd.read_csv('C:/Users/Kavitha/Documents/221801007/AirPassengers.csv')
print(df.head())
print(df.tail())
# Parse the 'Month' column as dates and use it as the index
df['Month'] = pd.to_datetime(df['Month'], format='%Y-%m')
print(df.head())
df.index = df['Month']
del df['Month']
print(df.head())
# Plot the series
sns.lineplot(data=df)
plt.title('Time Series Analysis')
plt.ylabel('Number of Passengers')
plt.show()
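The program covers loading and plotting only. Steps 5, 6 and 9 (decomposition, a chronological train/test split, and evaluation) can be sketched with pandas alone, as below. This is not the SARIMA model named in Step 7: the forecast is a seasonal-naive baseline swapped in to keep the example dependency-free, and a SARIMA fit would normally be judged against such a baseline. The series is synthetic, with invented trend, seasonality and noise, because the contents of AirPassengers.csv are not reproduced here.

```python
import numpy as np
import pandas as pd

# Synthetic monthly series standing in for AirPassengers.csv (values invented)
idx = pd.date_range("2000-01-01", periods=96, freq="MS")
t = np.arange(96)
rng = np.random.default_rng(0)
values = 100 + 2.0 * t + 15 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 3, 96)
series = pd.Series(values, index=idx, name="Passengers")

# Classical additive decomposition.
# Trend: centered 2x12 moving average (the usual choice for an even period)
trend = series.rolling(12).mean().rolling(2).mean().shift(-6)

# Seasonal: mean detrended value per calendar month, centered to sum to zero
detrended = series - trend
seasonal_means = detrended.groupby(detrended.index.month).mean()
seasonal_means = seasonal_means - seasonal_means.mean()
month_pos = series.index.month.to_numpy() - 1   # 0 for January ... 11 for December
seasonal = pd.Series(seasonal_means.to_numpy()[month_pos], index=series.index)

# Residual: what is left after removing trend and seasonality
residual = series - trend - seasonal

# Chronological split: the last 12 months are held out, never shuffled
train, test = series.iloc[:-12], series.iloc[-12:]

# Seasonal-naive baseline: repeat the last observed year of the training data
forecast = pd.Series(train.iloc[-12:].to_numpy(), index=test.index)

# Evaluation with root mean squared error
rmse = float(np.sqrt(((test - forecast) ** 2).mean()))
print("Seasonal component by month:\n", seasonal_means)
print("Seasonal-naive RMSE:", rmse)
```

Because the synthetic series has an upward trend, the seasonal-naive forecast lags behind the held-out year by roughly one year of growth, which is exactly the kind of error a trend-aware model is expected to reduce.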

OUTPUT:

RESULT:
Thus, the Python program for implementing time series analysis
was executed successfully and the output was verified.
