Data Analytics
AIM:
To write and execute a Python program for random sampling.
ALGORITHM:
Step 1: Start.
Step 2: Import the necessary libraries and modules.
Step 3: Assign the random sample to a variable.
Step 4: Display the random value.
Step 5: Stop.
PROGRAM:
(a) Print a single random float from the half-open interval [0.0, 1.0).
With no size argument, np.random.sample() returns one float, not an array.
# importing numpy
import numpy as np
# output random value
out_val = np.random.sample()
print("Output random value : ", out_val)
OUTPUT:
221801007 1 AD19441
(b) Random sampling for a 2D array
# importing numpy
import numpy as np
# output array
out_arr = np.random.sample(size=(3, 3))
print("Output 2D Array filled with random floats : ", out_arr)
OUTPUT:
(c) Random sampling from a DataFrame
import pandas as pd
import seaborn as sns
df = pd.read_csv('C:/Users/Kavitha/Documents/221801007/Diet.csv')
print(df)
print(df.sample())
print(df.sample(axis=1))
print(df.sample(n=3))
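The DataFrame.sample calls above can be tried without the CSV file. The sketch below uses a small in-memory table as a stand-in; the column names and values are hypothetical and are not taken from Diet.csv. It also passes random_state so that each draw is repeatable.

```python
import pandas as pd

# Hypothetical stand-in data; column names and values are illustrative only.
df = pd.DataFrame({
    "person": range(1, 11),
    "diet": [1, 2, 3, 1, 2, 3, 1, 2, 3, 1],
    "weight_loss": [3.8, 2.1, 5.0, 0.6, 3.2, 4.4, 1.9, 2.7, 6.1, 2.3],
})

one_row = df.sample(random_state=0)           # one random row (default n=1)
three_rows = df.sample(n=3, random_state=0)   # three rows without replacement
half = df.sample(frac=0.5, random_state=0)    # 50% of the rows
one_col = df.sample(axis=1, random_state=0)   # one random column
boot = df.sample(n=len(df), replace=True, random_state=0)  # bootstrap resample

print(three_rows)
print(one_col.columns.tolist())
```

Sampling with replace=True can repeat rows, which is what a bootstrap resample needs; the default draws each row at most once.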
OUTPUT:
RESULT:
Thus the Python program for implementing random sampling has been
successfully executed and the output is verified.
Ex No: 02
T-test case study
Date:28/02/2024
AIM:
To write and execute a Python program for a t-test case study.
ALGORITHM:
Step 1: Start.
Step 8: Stop.
PROGRAM:
import numpy as np
from scipy import stats
N = 10
x = np.random.randn(N) + 2
y = np.random.randn(N)
# Calculating the Standard Deviation
# Calculating the variance to get the standard deviation
var_x = x.var(ddof = 1)
var_y = y.var(ddof = 1)
# Pooled standard deviation (this simple average assumes both samples have size N)
SD = np.sqrt((var_x + var_y) / 2)
print("Standard Deviation =", SD)
# Calculating the T-Statistics
tval = (x.mean() - y.mean()) / (SD * np.sqrt(2 / N))
# Comparing with the critical T-Value
# Degrees of freedom
dof = 2 * N - 2
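# Upper-tail p-value for |t|; it is doubled below for the two-sided test.
# Using the absolute value keeps the result valid when tval is negative.
pval = 1 - stats.t.cdf(np.abs(tval), df = dof)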
print("t = " + str(tval))
print("p = " + str(2 * pval))
## Cross Checking using the internal function from SciPy Package
tval2, pval2 = stats.ttest_ind(x, y)
print("t = " + str(tval2))
print("p = " + str(pval2))
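The manual calculation above relies on both samples having the same size N. As a hedged sketch, the same two-sided, equal-variance test can be written for unequal sample sizes and checked against stats.ttest_ind; the function name pooled_t_test, the seed and the sample sizes are assumptions made for illustration.

```python
import numpy as np
from scipy import stats


def pooled_t_test(x, y):
    """Two-sided, equal-variance t-test for two independent samples."""
    nx, ny = len(x), len(y)
    dof = nx + ny - 2
    # Pooled variance weights each sample variance by its degrees of freedom.
    pooled_var = ((nx - 1) * x.var(ddof=1) + (ny - 1) * y.var(ddof=1)) / dof
    t = (x.mean() - y.mean()) / np.sqrt(pooled_var * (1 / nx + 1 / ny))
    # sf(|t|) is the upper-tail probability; doubling gives the two-sided p.
    p = 2 * stats.t.sf(np.abs(t), df=dof)
    return t, p


rng = np.random.default_rng(0)      # arbitrary seed for repeatability
x = rng.normal(loc=2.0, size=10)
y = rng.normal(loc=0.0, size=12)    # unequal size on purpose

t_manual, p_manual = pooled_t_test(x, y)
t_scipy, p_scipy = stats.ttest_ind(x, y)  # equal_var=True by default
print("manual: t =", t_manual, " p =", p_manual)
print("scipy : t =", t_scipy, " p =", p_scipy)
```

When nx equals ny, the pooled variance reduces to the simple average of the two variances used in the program above.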
OUTPUT:
RESULT:
Thus the Python program for the t-test has been successfully executed
and the output is verified.
Ex No: 03
Z-test case study
Date:13/03/2024
AIM:
To write and execute a Python program for a z-test case study.
ALGORITHM:
Step 1: Start.
Step 2: Import the necessary libraries.
Step 3: Define the given information.
Step 4: Calculate the Z-score.
Step 5: Determine the critical Z-score.
Step 6: Hypothesis testing using critical Z-score.
Step 7: Hypothesis testing using P-value.
Step 8: Stop.
PROGRAM:
import numpy as np
import scipy.stats as stats
sample_mean = 110
population_mean = 100
population_std = 15
sample_size = 50
alpha = 0.05
z_score = (sample_mean - population_mean) / (population_std /
np.sqrt(sample_size))
print('Z-Score :', z_score)
z_critical = stats.norm.ppf(1 - alpha)
print('Critical Z-Score :', z_critical)
if z_score > z_critical:
    print("Reject Null Hypothesis")
else:
    print("Fail to Reject Null Hypothesis")
p_value = 1 - stats.norm.cdf(z_score)
print('p-value :', p_value)
if p_value < alpha:
    print("Reject Null Hypothesis")
else:
    print("Fail to Reject Null Hypothesis")
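As a cross-check, the same right-tailed test can be reproduced with only the Python standard library. This sketch swaps scipy.stats.norm for statistics.NormalDist; the function name one_sample_z_test is hypothetical, and the input values are the ones defined in the program above.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z_test(sample_mean, pop_mean, pop_std, n, alpha=0.05):
    """Right-tailed one-sample z-test; returns (z, z_critical, p_value)."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    z = (sample_mean - pop_mean) / (pop_std / sqrt(n))
    z_critical = std_normal.inv_cdf(1 - alpha)
    p_value = 1 - std_normal.cdf(z)
    return z, z_critical, p_value


z, z_crit, p = one_sample_z_test(110, 100, 15, 50)
print("Z-Score          :", z)
print("Critical Z-Score :", z_crit)
print("p-value          :", p)
# For a right-tailed test the two decision rules must agree.
print("Reject H0 (critical value):", z > z_crit)
print("Reject H0 (p-value)       :", p < 0.05)
```

Both rules test the same event, so a disagreement between them would indicate a coding error rather than a statistical result.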
OUTPUT:
RESULT:
Thus the Python program for the z-test has been successfully executed and
the output is verified.
Ex No: 04
ANOVA case study
Date:20/03/2024
AIM:
To write and execute a Python program for an ANOVA case study.
ALGORITHM:
Step 1: Start.
Step 2: Import the necessary libraries.
Step 3: Load the dataset
Step 4: Group the data by the factor of interest
Step 5: Perform the ANOVA
Step 6: Hypothesis testing using the F-statistic
Step 7: Visualize the data
Step 8: Stop.
PROGRAM:
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm
from statsmodels.formula.api import ols
import seaborn as sns
import numpy as np
import pandas.tseries
plt.style.use('fivethirtyeight')
mydata = pd.read_csv('C:/Users/Kavitha/Documents/221801007/Diet.csv')
print(mydata.head())
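The listing above stops after loading the data, so Steps 4 to 6 of the algorithm are not shown. The following is a minimal sketch of those steps on synthetic data. It assumes a one-way design and uses scipy.stats.f_oneway rather than the statsmodels ols interface imported above; the column names diet and weight_loss, the group means and the seed are all hypothetical and are not taken from Diet.csv.

```python
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(1)  # arbitrary seed

# Synthetic stand-in for the diet data: 3 groups of 20 people each.
data = pd.DataFrame({
    "diet": np.repeat(["A", "B", "C"], 20),
    "weight_loss": np.concatenate([
        rng.normal(3.0, 1.0, 20),
        rng.normal(3.2, 1.0, 20),
        rng.normal(5.0, 1.0, 20),
    ]),
})

# Step 4: group the response by the factor of interest.
groups = [g["weight_loss"].to_numpy() for _, g in data.groupby("diet")]

# Step 5: one-way ANOVA. H0 states that all group means are equal.
f_stat, p_value = stats.f_oneway(*groups)
print("F-statistic :", f_stat)
print("p-value     :", p_value)

# Step 6: decision at the chosen significance level.
alpha = 0.05
if p_value < alpha:
    print("Reject Null Hypothesis: at least one group mean differs")
else:
    print("Fail to Reject Null Hypothesis")
```

A significant F-statistic only says that some group mean differs; identifying which one would need a follow-up comparison.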
OUTPUT:
RESULT:
Thus the Python program for the ANOVA case study has been successfully
executed and the output is verified.
Ex No: 05
REGRESSION
Date:03/04/2024
AIM:
To write and execute a Python program for implementing
regression.
ALGORITHM:
STEP 1: Start.
STEP 6: Plot the residual errors for both the training and testing data.
STEP 8: Stop.
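The program for this exercise is not present in this record. As a hedged illustration of the one surviving step (plotting residual errors for the training and testing data), the sketch below fits a simple linear regression with scikit-learn on synthetic data. The data, seed, colours and output file name are assumptions and are not the original program.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)  # arbitrary seed

# Synthetic data: y = 3x + 2 plus Gaussian noise (illustrative only).
X = rng.uniform(0, 10, size=(200, 1))
y = 3.0 * X[:, 0] + 2.0 + rng.normal(0, 1.0, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LinearRegression()
model.fit(X_train, y_train)
print("Coefficient :", model.coef_[0])
print("Intercept   :", model.intercept_)
print("Test R^2    :", model.score(X_test, y_test))

# Residual error = predicted value - actual value.
train_pred = model.predict(X_train)
test_pred = model.predict(X_test)
train_resid = train_pred - y_train
test_resid = test_pred - y_test

plt.scatter(train_pred, train_resid, color="green", s=10, label="Train data")
plt.scatter(test_pred, test_resid, color="blue", s=10, label="Test data")
plt.axhline(y=0, color="black", linewidth=1)
plt.xlabel("Predicted value")
plt.ylabel("Residual error")
plt.title("Residual errors")
plt.legend()
# Saved to a file so the sketch also runs without a display;
# plt.show() can be used instead in an interactive session.
plt.savefig("residual_errors.png")
```

Residuals scattered evenly around the zero line for both sets suggest that a linear fit is adequate; a visible curve or funnel shape would suggest otherwise.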
PROGRAM:
OUTPUT:
RESULT:
Thus the Python program for implementing regression has been
successfully executed and the output is verified.
Ex No: 06
LOGISTIC REGRESSION
Date:10/04/2024
AIM:
To write and execute a Python program for implementing logistic
regression.
ALGORITHM:
Step 1: Start.
Step 2: Import the necessary libraries.
Step 3: Load the dataset
Step 4: Split the dataset into training and testing sets
Step 5: Train a logistic regression model on the training data
Step 6: Evaluate the model
Step 7: Plot the ROC curve
Step 8: Stop.
PROGRAM:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, classification_report,
                             confusion_matrix, roc_curve, auc)
# Load the diabetes dataset
diabetes = load_diabetes()
X, y = diabetes.data, diabetes.target
# Convert the target to binary. The target is a continuous measure of disease
# progression, so 1 means above-median progression and 0 means at or below it.
y_binary = (y > np.median(y)).astype(int)
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y_binary, test_size=0.2,
random_state=42)
# Standardize features
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
# Train the Logistic Regression model
model = LogisticRegression()
model.fit(X_train, y_train)
# Evaluate the model
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy: {:.2f}%".format(accuracy * 100))
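Step 7 of the algorithm (plotting the ROC curve) is not covered by the program above. A minimal sketch of that step follows. It repeats the same data preparation so that it runs on its own, uses predict_proba to obtain scores, and relies on the roc_curve and auc functions already imported in the program; the plot labels and output file name are illustrative choices.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import auc, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Same preparation as the program above.
X, y = load_diabetes(return_X_y=True)
y_binary = (y > np.median(y)).astype(int)
X_train, X_test, y_train, y_test = train_test_split(
    X, y_binary, test_size=0.2, random_state=42
)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
model = LogisticRegression().fit(X_train, y_train)

# ROC needs continuous scores, not hard labels: use P(class = 1).
y_score = model.predict_proba(X_test)[:, 1]
fpr, tpr, thresholds = roc_curve(y_test, y_score)
roc_auc = auc(fpr, tpr)
print("AUC: {:.3f}".format(roc_auc))

plt.plot(fpr, tpr, label="ROC curve (AUC = {:.2f})".format(roc_auc))
plt.plot([0, 1], [0, 1], linestyle="--", label="Chance level")
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.title("ROC curve for logistic regression")
plt.legend(loc="lower right")
# Saved to a file so the sketch also runs without a display;
# plt.show() can be used instead in an interactive session.
plt.savefig("roc_curve.png")
```

An AUC of 0.5 corresponds to the dashed chance line, and values closer to 1.0 indicate better separation of the two classes.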
OUTPUT:
RESULT:
Thus the Python program for implementing logistic regression has
been successfully executed and the output is verified.
Ex No: 07
Time Series Analysis
Date:17/04/2024
AIM:
To write and execute a Python program for the implementation of time
series analysis.
ALGORITHM:
Step 1: Start.
PROGRAM:
import pandas as pd
df = pd.read_csv('C:/Users/Kavitha/Documents/221801007/AirPassengers.csv')
print(df.head())
print(df.tail())
df['Month'] = pd.to_datetime(df['Month'], format='%Y-%m')
print(df.head())
df.index = df['Month']
del df['Month']
print(df.head())
import matplotlib.pyplot as plt
import seaborn as sns
sns.lineplot(data=df)
plt.title('Time Series Analysis')
plt.ylabel('Number of Passengers')
plt.show()
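The program above plots the raw series only. As a hedged sketch of a common next step, a 12-month rolling mean can expose the trend and yearly totals can summarise the series. The example uses a synthetic monthly series instead of AirPassengers.csv; the dates, values and seed are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic monthly series standing in for the passenger counts.
months = pd.date_range("2000-01-01", periods=48, freq="MS")
rng = np.random.default_rng(0)  # arbitrary seed
steps = np.arange(48)
values = (
    100
    + 2.0 * steps                          # upward trend
    + 10 * np.sin(2 * np.pi * steps / 12)  # yearly seasonality
    + rng.normal(0, 2, 48)                 # noise
)
ts = pd.Series(values, index=months, name="Passengers")

# A 12-month rolling mean averages out the seasonal cycle.
# The first 11 entries are NaN because the window is not yet full.
trend = ts.rolling(window=12).mean()

# Yearly totals, grouped by the year of the DatetimeIndex.
yearly = ts.groupby(ts.index.year).sum()

print(trend.dropna().head())
print(yearly)
```

The same two calls apply to the passenger DataFrame once Month has been set as its index, as done in the program above.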
OUTPUT:
RESULT:
Thus the Python program for implementing time series analysis
has been successfully executed and the output is verified.