
ml file

Data science is an interdisciplinary field focused on extracting insights from large data sets using statistical methods and machine learning. The document outlines the impact of data science across various industries, highlights essential Python libraries for data manipulation, and provides an overview of a housing prices dataset for analysis. It includes practical exercises using Python's Pandas library for data manipulation tasks such as filtering, sorting, and grouping data.

Uploaded by meet008828
Copyright © All Rights Reserved

Machine Exercise: 1

What is Data Science?
Data science is an interdisciplinary field that involves extracting insights and knowledge
from large volumes of data. It combines statistical methods, machine learning algorithms,
and domain expertise to solve complex problems. Data scientists play a crucial role in
transforming raw data into actionable information that drives decision-making.

Impact of Data Science

Data science is applied across a wide range of industries, from healthcare and finance to
marketing and e-commerce. Key impacts include:

- Improved decision-making through data-driven insights
- Enhanced customer experience through personalized recommendations
- Fraud detection and prevention
- Optimization of processes and resource allocation
- Advancements in scientific research and discovery

Essential Python Libraries for Data Science

Python is one of the most widely used languages for data science because of its simplicity,
readability, and extensive ecosystem of libraries.

Objective
This exercise aims to demonstrate basic data manipulation techniques using Python's
Pandas library.

Dataset Overview
The Raw Housing Prices dataset provides detailed information about housing sales. This
dataset is useful for understanding pricing trends, property characteristics, and market
behaviors.

Data Description
- Date House was Sold: The date when the house was sold.
- Sale Price: The price at which the house was sold.
- Zipcode: The postal (ZIP) code of the property's location.
- Bedrooms: The number of bedrooms in the house.
- Bathrooms: The number of bathrooms in the house.
- Living Area (sqft): The living space size in square feet.
- Lot Area (sqft): The size of the lot in square feet.
- Floors: The number of floors in the house.
- Waterfront View: Whether the house has a view of the waterfront.
- Condition: The overall condition of the property.
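The description above gives each field's meaning but not its data type, so after loading it is worth checking how pandas interpreted the columns. The sketch below is a hedged illustration: it reads a few made-up rows from an in-memory string (`io.StringIO` stands in for the real file) and uses only the four column names that appear in the exercise code; the ISO date format is an assumption about the real file.

```python
import io

import pandas as pd

# Hypothetical toy rows; values are made up and the date format is assumed.
# Only these four column names are confirmed by the exercise code.
csv_text = """Date House was Sold,Sale Price,Zipcode,Waterfront View
2017-05-14,221900,98178,No
2017-12-09,538000,98125,Yes
2016-02-25,180000,98028,No
"""

# parse_dates converts the listed column to a datetime dtype while reading.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date House was Sold"])

# Show the dtype pandas inferred for each column.
print(df.dtypes)
```

Zip codes are read as integers here, which would drop leading zeros; passing `dtype={"Zipcode": str}` to `read_csv` keeps them as text.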

Purpose
- Price Trend Analysis: Identifying pricing trends over time and across locations.
- Property Segmentation: Analyzing features that affect property prices.
- Location Insights: Understanding how location impacts housing prices.
- Market Behavior: Evaluating how the market behaves over time to support real estate decision-making.
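For the price trend analysis, one common approach is to bucket sales by calendar month and summarize the price in each bucket. This is a minimal sketch on hypothetical toy data, not the exercise's required method; it uses the median rather than the mean because a few very expensive sales can pull the mean upward.

```python
import pandas as pd

# Hypothetical toy sales; a real analysis would use the df loaded from the CSV.
df = pd.DataFrame({
    "Date House was Sold": pd.to_datetime(
        ["2017-01-05", "2017-01-20", "2017-02-03", "2017-02-17", "2017-02-28"]
    ),
    "Sale Price": [300000, 500000, 420000, 380000, 610000],
})

# Convert each sale date to its calendar month, then take the median price per month.
monthly_median = (
    df.groupby(df["Date House was Sold"].dt.to_period("M"))["Sale Price"]
    .median()
)
print(monthly_median)
```

Adding `"Zipcode"` to the grouping keys would extend the same idea to trends across locations.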

Q 1.1) Basic Data Manipulation Tasks

# Import libraries
import pandas as pd

# Load data from the provided CSV file
df = pd.read_csv('/content/Raw_Housing_Prices3.csv')

# Display the first five rows
print(df.head())
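When only a few columns are needed, `read_csv` can load just those through its `usecols` parameter, which reduces memory use on wide files. The sketch below assumes a made-up two-row CSV held in memory in place of `Raw_Housing_Prices3.csv`.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real CSV file; values are made up.
csv_text = """Date House was Sold,Sale Price,Zipcode,Waterfront View
2017-05-14,221900,98178,No
2017-12-09,538000,98125,Yes
"""

# usecols loads only the listed columns; they keep their order from the file.
df = pd.read_csv(io.StringIO(csv_text), usecols=["Sale Price", "Zipcode"])

# head(n) returns the first n rows (n defaults to 5).
print(df.head(1))
```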

Q 1.2) Selecting Multiple Columns

# Select the relevant columns (a list of labels returns a DataFrame)
selected_columns = df[['Date House was Sold', 'Sale Price', 'Zipcode', 'Waterfront View']]
print(selected_columns)
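The double brackets in the selection above matter: a single label returns a Series, while a list of labels returns a DataFrame. This sketch, on a hypothetical toy frame, shows the difference and the equivalent `.loc` form.

```python
import pandas as pd

# Hypothetical toy frame using column names from the exercise.
df = pd.DataFrame({
    "Sale Price": [221900, 538000, 180000],
    "Zipcode": [98178, 98125, 98028],
})

# Single brackets with one label return a Series.
prices = df["Sale Price"]

# Double brackets (a list of labels) return a DataFrame, even for one column.
price_frame = df[["Sale Price"]]

# .loc[:, labels] is an equivalent, explicit way to select columns.
subset = df.loc[:, ["Zipcode", "Sale Price"]]

print(type(prices).__name__, type(price_frame).__name__)
```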

Q 1.3) Displaying a Concise Summary of the DataFrame

df.info()
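`info()` reports a non-null count per column; to see the number of missing values directly, `isna().sum()` is a common complement. A minimal sketch on a hypothetical toy frame with one missing price:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with one missing sale price.
df = pd.DataFrame({
    "Sale Price": [221900, np.nan, 180000],
    "Zipcode": [98178, 98125, 98028],
})

# isna() marks missing cells as True; sum() counts them per column.
missing_per_column = df.isna().sum()
print(missing_per_column)
```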

Q 1.4) Generating Descriptive Statistics

# Summary statistics (count, mean, std, min, quartiles, max) for numeric columns
print(df.describe())
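`describe()` also works on a single column, which is handy when only the sale price is of interest. This sketch uses made-up prices; the `percentiles` argument requests cut points other than the default quartiles.

```python
import pandas as pd

# Hypothetical prices for illustration.
df = pd.DataFrame({"Sale Price": [200000, 300000, 400000, 500000]})

# On a single column, describe() returns a Series of summary statistics:
# count, mean, std, min, 25%, 50%, 75%, max.
price_stats = df["Sale Price"].describe()
print(price_stats)

# Percentiles other than the quartiles can be requested explicitly.
print(df["Sale Price"].describe(percentiles=[0.1, 0.9]))
```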

Q 1.5) Displaying the Number of Rows and Columns

# shape is a (rows, columns) tuple
print(df.shape)
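Because `shape` is a plain `(rows, columns)` tuple, it can be unpacked into two variables. A short sketch on a hypothetical 3 x 2 toy frame:

```python
import pandas as pd

# Hypothetical toy frame: 3 rows, 2 columns.
df = pd.DataFrame({
    "Sale Price": [221900, 538000, 180000],
    "Zipcode": [98178, 98125, 98028],
})

# Unpack the (rows, columns) tuple.
n_rows, n_cols = df.shape
print(f"{n_rows} rows x {n_cols} columns")  # 3 rows x 2 columns
```

`len(df)` gives the same row count as `df.shape[0]`.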

Q 2) Exporting Data

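# Build a one-row DataFrame from a dictionary of column lists
user_data = {'Uniroll': [2234219], 'Name': ['meet'], 'Percentage': [80]}
user_df = pd.DataFrame(user_data)

# Write it to CSV; index=False omits the row index column
user_df.to_csv('user_data.csv', index=False)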
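A simple way to check an export is to read the file back and compare it with the original frame. This sketch writes into a temporary directory (an assumption made so the check leaves no file behind) and reuses the same `user_data` values.

```python
import os
import tempfile

import pandas as pd

user_df = pd.DataFrame({"Uniroll": [2234219], "Name": ["meet"], "Percentage": [80]})

# Write into a temporary directory so the check leaves no file behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "user_data.csv")
    user_df.to_csv(path, index=False)

    # Read the file back and compare values and dtypes with the original.
    reloaded = pd.read_csv(path)
    round_trip_ok = reloaded.equals(user_df)

print(round_trip_ok)
```

Without `index=False`, the row index would be written as an extra column and read back as `Unnamed: 0`, so the comparison would fail.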

Q 3.1) Filtering Data

# Rows with a sale price above 500,000, keeping only two columns
filtered_data = df.loc[df['Sale Price'] > 500000, ['Zipcode', 'Sale Price']]
print(filtered_data)
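Filters can combine several conditions. Each condition goes in parentheses and is joined with `&` (and) or `|` (or); `query()` expresses the same filter as a string. The zip codes and prices below are hypothetical toy values.

```python
import pandas as pd

# Hypothetical toy data; zipcodes and prices are made up.
df = pd.DataFrame({
    "Zipcode": [98178, 98125, 98028, 98125],
    "Sale Price": [221900, 538000, 650000, 720000],
})

# Boolean mask: price above 500,000 AND zipcode in the given list.
mask = (df["Sale Price"] > 500000) & (df["Zipcode"].isin([98125]))
expensive_98125 = df.loc[mask, ["Zipcode", "Sale Price"]]
print(expensive_98125)

# Same filter with query(); backticks quote column names containing spaces.
same_rows = df.query("`Sale Price` > 500000 and Zipcode == 98125")
print(same_rows)
```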

Q 3.2) Sorting Data

# Sort all rows from the highest to the lowest sale price
sorted_df = df.sort_values(by='Sale Price', ascending=False)
print(sorted_df[['Zipcode', 'Sale Price']])
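`sort_values` also accepts several keys with a per-key direction, and `nlargest` returns the top rows directly; pandas documents it as equivalent to `sort_values(..., ascending=False).head(n)` but typically faster. A sketch on hypothetical toy data:

```python
import pandas as pd

# Hypothetical toy data.
df = pd.DataFrame({
    "Zipcode": [98125, 98028, 98125, 98028],
    "Sale Price": [538000, 650000, 720000, 180000],
})

# Zipcode ascending, then price descending within each zipcode.
by_zip_then_price = df.sort_values(
    by=["Zipcode", "Sale Price"], ascending=[True, False]
)
print(by_zip_then_price)

# The two most expensive sales.
top_two = df.nlargest(2, "Sale Price")
print(top_two)
```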

Q 3.3) Grouping Data

# Total sale price per zipcode
grouped_df = df.groupby('Zipcode')['Sale Price'].sum()
print(grouped_df)
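A total per zip code mostly reflects how many sales each area had. To compare locations, it can help to compute several aggregates at once with `agg()`. This is a sketch on hypothetical toy data.

```python
import pandas as pd

# Hypothetical toy data.
df = pd.DataFrame({
    "Zipcode": [98125, 98028, 98125, 98028, 98125],
    "Sale Price": [500000, 300000, 700000, 200000, 600000],
})

# agg() computes several aggregations per group in one call.
zip_summary = df.groupby("Zipcode")["Sale Price"].agg(["count", "mean", "sum"])
print(zip_summary)
```

Here 98125 has the larger total partly because it has more sales; the `mean` column separates price level from sales volume.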
