Social Network Analysis: Cheruvu Nvss Suhas 21BCE8374

Uploaded by scheruvu.30
Copyright © All Rights Reserved
SOCIAL NETWORK ANALYSIS

CHERUVU NVSS SUHAS


21BCE8374

1/31/24, 9:33 PM sna - Jupyter Notebook

In [1]:
    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    import seaborn as sns

In [2]:
    df = pd.read_csv("C:\\Users\\User1\\OneDrive\\Desktop\\jobs_in_data.csv")

In [3]:
    # Remove exact duplicate rows, then renumber the index 0..n-1.
    # drop=True discards the old index instead of adding it as an 'index' column.
    df = df.drop_duplicates()
    df = df.reset_index(drop=True)
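A small check, using a made-up toy DataFrame rather than the real data, that `drop_duplicates()` followed by `reset_index(drop=True)` produces the same frame as calling `reset_index()` and then dropping the resulting `index` column. The helper name `dedupe` is illustrative only.

```python
import pandas as pd

def dedupe(frame: pd.DataFrame) -> pd.DataFrame:
    """Drop exact duplicate rows and renumber the index 0..n-1 (hypothetical helper)."""
    return frame.drop_duplicates().reset_index(drop=True)

# Toy data (made up): row 2 is an exact copy of row 0
toy = pd.DataFrame({
    "job_title": ["Data Engineer", "Data Analyst", "Data Engineer"],
    "salary_in_usd": [100000, 80000, 100000],
})

clean = dedupe(toy)

# Alternative: reset_index() stores the old index in a column named
# 'index', which is then dropped explicitly.
three_step = toy.drop_duplicates().reset_index()
three_step = three_step.drop(columns=["index"])

print(clean)
print(clean.equals(three_step))  # True
```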

In [4]:
    df.info()

    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 5341 entries, 0 to 5340
    Data columns (total 12 columns):
     #   Column              Non-Null Count  Dtype
    ---  ------              --------------  -----
     0   work_year           5341 non-null   int64
     1   job_title           5341 non-null   object
     2   job_category        5341 non-null   object
     3   salary_currency     5341 non-null   object
     4   salary              5341 non-null   int64
     5   salary_in_usd       5341 non-null   int64
     6   employee_residence  5341 non-null   object
     7   experience_level    5341 non-null   object
     8   employment_type     5341 non-null   object
     9   work_setting        5341 non-null   object
     10  company_location    5341 non-null   object
     11  company_size        5341 non-null   object
    dtypes: int64(3), object(9)
    memory usage: 500.8+ KB

localhost:8888/notebooks/sna.ipynb 1/9

In [5]:
    df

Out[5]:
          work_year  job_title                  job_category                    salary_currency  salary  salary_in_usd  ...
    0     2023       Data DevOps Engineer       Data Engineering                EUR              88000   95012          ...
    1     2023       Data Architect             Data Architecture and Modeling  USD              186000  186000         ...
    2     2023       Data Architect             Data Architecture and Modeling  USD              81800   81800          ...
    3     2023       Data Scientist             Data Science and Research       USD              212000  212000         ...
    4     2023       Data Scientist             Data Science and Research       USD              93300   93300          ...
    ...   ...        ...                        ...                             ...              ...     ...            ...
    5336  2021       Data Specialist            Data Management and Strategy    USD              165000  165000         ...
    5337  2020       Data Scientist             Data Science and Research       USD              412000  412000         ...
    5338  2021       Principal Data Scientist   Data Science and Research       USD              151000  151000         ...
    5339  2020       Data Scientist             Data Science and Research       USD              105000  105000         ...
    5340  2020       Business Data Analyst      Data Analysis                   USD              100000  100000         ...

    5341 rows × 12 columns

In [6]:
    df['work_year'].unique()

Out[6]: array([2023, 2022, 2020, 2021], dtype=int64)


In [7]:
    df['job_title'].value_counts()

Out[7]:
    job_title
    Data Engineer                    1100
    Data Scientist                   1039
    Data Analyst                      744
    Machine Learning Engineer         518
    Analytics Engineer                207
                                     ...
    Deep Learning Researcher            1
    Analytics Engineering Manager       1
    BI Data Engineer                    1
    Power BI Developer                  1
    Marketing Data Engineer             1
    Name: count, Length: 125, dtype: int64

In [8]:
    # bar plot
    # Note: one bar is drawn per row, so rows sharing a job title overlap
    # at the same x position.
    plt.figure(figsize=(30, 10))
    plt.bar(df['job_title'], df['salary_in_usd'])
    plt.xticks(rotation=90)
    plt.ylabel('Salary')
    plt.xlabel('Job role')
    plt.title('Job vs Salary')
    plt.show()
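Passing every row to `plt.bar`, as In [8] does, stacks all rows of the same job title at one x position, so the visible height is effectively the largest salary for that title. A sketch of one alternative, assuming the same `job_title` and `salary_in_usd` columns: aggregate first, taking the mean salary per title and keeping the top `n`. The helper name and `n` are illustrative choices, and the toy rows are made up.

```python
import pandas as pd

def mean_salary_by_title(frame: pd.DataFrame, n: int = 20) -> pd.Series:
    """Mean salary_in_usd per job_title, highest first, top n (hypothetical helper)."""
    return (
        frame.groupby("job_title")["salary_in_usd"]
        .mean()
        .sort_values(ascending=False)
        .head(n)
    )

# Made-up rows standing in for df
toy = pd.DataFrame({
    "job_title": ["Data Engineer", "Data Engineer", "Data Analyst", "Data Scientist"],
    "salary_in_usd": [100000, 140000, 80000, 150000],
})

means = mean_salary_by_title(toy, n=3)
print(means)
```

The resulting Series can then be plotted with `plt.bar(means.index, means.values)`, giving exactly one bar per job title whose height is that title's mean salary.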

In [9]:
    df['job_category'].unique()

Out[9]: array(['Data Engineering', 'Data Architecture and Modeling',
               'Data Science and Research', 'Machine Learning and AI',
               'Data Analysis', 'Leadership and Management',
               'BI and Visualization', 'Data Quality and Operations',
               'Data Management and Strategy', 'Cloud and Database'], dtype=object)

In [10]:
    df['work_year'].unique()

Out[10]: array([2023, 2022, 2020, 2021], dtype=int64)


In [11]:
    group = df.groupby('job_category')

In [12]:
    data_engineer = group.get_group('Data Engineering')

    # Group by the column name itself (not ['work_year']) so get_group() takes a bare year
    data_engineer_group = data_engineer.groupby('work_year')
    data_engineer_mean_salary = [data_engineer_group.get_group(2020)['salary_in_usd'].mean(),
                                 data_engineer_group.get_group(2021)['salary_in_usd'].mean(),
                                 data_engineer_group.get_group(2022)['salary_in_usd'].mean(),
                                 data_engineer_group.get_group(2023)['salary_in_usd'].mean()]

In [13]:
    data_analyst = group.get_group('Data Analysis')
    data_analyst_group = data_analyst.groupby('work_year')
    data_analyst_mean_salary = [data_analyst_group.get_group(2020)['salary_in_usd'].mean(),
                                data_analyst_group.get_group(2021)['salary_in_usd'].mean(),
                                data_analyst_group.get_group(2022)['salary_in_usd'].mean(),
                                data_analyst_group.get_group(2023)['salary_in_usd'].mean()]

In [14]:
    data_scientist = group.get_group('Data Science and Research')
    data_scientist_group = data_scientist.groupby('work_year')
    data_scientist_mean_salary = [data_scientist_group.get_group(2020)['salary_in_usd'].mean(),
                                  data_scientist_group.get_group(2021)['salary_in_usd'].mean(),
                                  data_scientist_group.get_group(2022)['salary_in_usd'].mean(),
                                  data_scientist_group.get_group(2023)['salary_in_usd'].mean()]
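In [12] to In [14] repeat the same per-category, per-year pattern. A sketch of a single-pass equivalent, assuming the same `job_category`, `work_year` and `salary_in_usd` columns: group by both keys, take the mean, and `unstack` so each row is a year and each column a category. The helper name and the toy values are made up for illustration.

```python
import pandas as pd

def mean_salary_by_year(frame: pd.DataFrame, categories: list[str]) -> pd.DataFrame:
    """Rows: work_year, columns: job_category, values: mean salary_in_usd (hypothetical helper)."""
    subset = frame[frame["job_category"].isin(categories)]
    return (
        subset.groupby(["work_year", "job_category"])["salary_in_usd"]
        .mean()
        .unstack("job_category")
        .sort_index()
    )

# Made-up rows standing in for df
toy = pd.DataFrame({
    "work_year": [2020, 2020, 2021, 2021, 2021],
    "job_category": ["Data Engineering", "Data Analysis", "Data Engineering",
                     "Data Engineering", "Data Analysis"],
    "salary_in_usd": [90000, 60000, 100000, 120000, 70000],
})

table = mean_salary_by_year(toy, ["Data Engineering", "Data Analysis"])
print(table)
```

One design difference worth noting: a year with no rows for a category shows up as `NaN` in the table, whereas `get_group()` raises a `KeyError` for a missing group. `table.plot()` would then draw one line per category with the years on the x axis.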


In [15]:
    # Time series: mean salary per year for three job categories
    years = [2020, 2021, 2022, 2023]
    plt.plot(years, data_engineer_mean_salary)
    plt.plot(years, data_analyst_mean_salary)
    plt.plot(years, data_scientist_mean_salary)

    plt.xlabel('year')
    plt.ylabel('average salary in USD')
    # Legend entries are matched to the lines in plotting order
    plt.legend(['Data engineer', 'Data analyst', 'Data scientist'])
    plt.show()

In [16]:
    df['salary_currency'].value_counts()

Out[16]:
    salary_currency
    USD    4707
    EUR     282
    GBP     276
    CAD      37
    AUD      11
    PLN       7
    SGD       6
    CHF       5
    BRL       4
    TRY       3
    DKK       3
    Name: count, dtype: int64
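Out[5] shows that for USD rows `salary` equals `salary_in_usd`, while the EUR row converts 88000 to 95012. A sketch that uses only those three rows from Out[5] to back-compute the USD-per-unit ratio each row implies and summarise it per currency. The notebook does not say how the dataset performed the conversion, so the ratio is descriptive only; the column name `usd_per_unit` is made up.

```python
import pandas as pd

# Rows 0-2 of Out[5] (only the currency-related columns)
rows = pd.DataFrame({
    "salary_currency": ["EUR", "USD", "USD"],
    "salary": [88000, 186000, 81800],
    "salary_in_usd": [95012, 186000, 81800],
})

# Ratio implied by each row: USD received per unit of the original currency
rows["usd_per_unit"] = rows["salary_in_usd"] / rows["salary"]

# Median ratio per currency (robust to a few odd rows when applied to all of df)
implied = rows.groupby("salary_currency")["usd_per_unit"].median()
print(implied)
```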


In [17]:
    df['work_setting'].value_counts()

Out[17]:
    work_setting
    In-person    2913
    Remote       2239
    Hybrid        189
    Name: count, dtype: int64

In [18]:
    # pie chart: share of each work setting
    counts = df['work_setting'].value_counts()
    plt.figure(figsize=(8, 8))
    plt.pie(counts, autopct='%0.1f%%', labels=counts.index)
    plt.title('Most preferred work type')
    plt.show()
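The percentages that `autopct='%0.1f%%'` prints on each wedge are the category counts divided by the total. A sketch that rebuilds a column from the counts in Out[17] (2913, 2239 and 189 rows) and gets the same shares directly with `value_counts(normalize=True)`:

```python
import pandas as pd

# Column rebuilt from the counts reported in Out[17]
work_setting = pd.Series(
    ["In-person"] * 2913 + ["Remote"] * 2239 + ["Hybrid"] * 189,
    name="work_setting",
)

# Fraction of rows per category, scaled to percent
shares = work_setting.value_counts(normalize=True) * 100
print(shares.round(1))  # In-person 54.5, Remote 41.9, Hybrid 3.5
```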


In [19]:
    # histogram
    plt.hist(df['salary_in_usd'], bins=10, color='gray')
    plt.xlabel('Salary in USD')
    plt.ylabel('Frequency')
    plt.title('Histogram of salary')
    plt.show()
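`plt.hist` bins the data with `numpy.histogram`, so `bins=10` means ten equal-width intervals spanning the minimum to the maximum salary. A sketch on made-up salaries (not from the dataset) that prints the same counts and bin edges the histogram would draw:

```python
import numpy as np

# Hypothetical salaries in USD, not taken from the dataset
salaries = np.array([20000, 45000, 60000, 80000, 95000, 120000, 150000, 220000])

# Ten equal-width bins from min to max; each bin is [left, right)
# except the last, which also includes the maximum value.
counts, edges = np.histogram(salaries, bins=10)
width = edges[1] - edges[0]

for left, right, c in zip(edges[:-1], edges[1:], counts):
    print(f"{left:>9.0f} to {right:>9.0f}: {c}")
print("bin width:", width)
```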

In [20]:
    # Row position 0..n-1, used as the x axis of the scatter plots below
    df['Serial_no'] = list(range(0, len(df)))


In [21]:
    # data points in a graph
    plt.figure(figsize=(20, 10))
    plt.scatter(df['Serial_no'], df['salary_in_usd'])
    plt.xlabel('Serial number')
    plt.ylabel('Salary in usd')
    plt.grid()
    plt.show()

In [22]:
    # scatter plot
    plt.figure(figsize=(20, 10))
    plt.scatter(df['Serial_no'], df['salary_in_usd'], marker='+')
    plt.xlabel('Index')
    plt.ylabel('Salary in USD')
    plt.legend(['Salary in USD'])
    plt.title('Salary in USD')
    plt.show()


In [23]:
    # density plot: kernel density estimate of salary, filled under the curve
    sns.kdeplot(df['salary_in_usd'], fill=True)
    plt.legend(['salary(USD)'])
    plt.title('Density curve of salary(USD)')
    plt.show()
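`sns.kdeplot` draws a kernel density estimate: a Gaussian bump is centred on every observation and the bumps are averaged, so the area under the curve is 1. A minimal NumPy sketch of that idea, assuming Scott's rule for the bandwidth (seaborn's default `bw_method`); it illustrates the technique and is not seaborn's own implementation. The helper name and the salaries are made up.

```python
import numpy as np

def gaussian_kde_1d(samples: np.ndarray, grid: np.ndarray) -> np.ndarray:
    """Average of Gaussian kernels centred on each sample (hypothetical helper)."""
    samples = np.asarray(samples, dtype=float)
    n = samples.size
    # Scott's rule in one dimension: sample std (ddof=1) times n^(-1/5)
    h = samples.std(ddof=1) * n ** (-1 / 5)
    # z[i, j] = distance from grid point i to sample j, in bandwidth units
    z = (grid[:, None] - samples[None, :]) / h
    kernels = np.exp(-0.5 * z**2) / (h * np.sqrt(2 * np.pi))
    return kernels.mean(axis=1)

# Hypothetical salaries in USD, not taken from the dataset
salaries = np.array([60000, 85000, 90000, 110000, 120000, 150000, 210000])
grid = np.linspace(salaries.min() - 300000, salaries.max() + 300000, 2001)
density = gaussian_kde_1d(salaries, grid)

# Trapezoidal estimate of the area under the curve; it should be close to 1
area = np.sum((density[1:] + density[:-1]) / 2 * np.diff(grid))
print(round(float(area), 4))
```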
