Social Network Analysis: Cheruvu Nvss Suhas 21BCE8374
1/31/24, 9:33 PM sna - Jupyter Notebook
In [2]:
import pandas as pd

df = pd.read_csv("C:\\Users\\User1\\OneDrive\\Desktop\\jobs_in_data.csv")
In [3]:
# Remove exact duplicate rows, then renumber the index from 0
df = df.drop_duplicates()
df = df.reset_index()
df.drop(columns=['index'], inplace=True)
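The three statements in In [3] can be collapsed into one chain. `reset_index(drop=True)` discards the old index instead of copying it into an `index` column, so the follow-up `drop` is unnecessary. Below is a minimal sketch. The helper name `dedupe` and the toy frame are hypothetical; the salary values are borrowed from Out[5].

```python
import pandas as pd


def dedupe(data: pd.DataFrame) -> pd.DataFrame:
    """Drop exact duplicate rows and renumber the index 0..n-1.

    reset_index(drop=True) throws the old index away rather than
    inserting it as an 'index' column, so no extra drop() is needed.
    """
    return data.drop_duplicates().reset_index(drop=True)


# Hypothetical toy frame: rows 0 and 1 are exact duplicates
toy = pd.DataFrame({
    "job_title": ["Data Architect", "Data Architect", "Data DevOps Engineer"],
    "salary_in_usd": [81800, 81800, 95012],
})
clean = dedupe(toy)
print(clean)
```

In the notebook this would become `df = dedupe(df)`. The result matches the three-step version: the first occurrence of each duplicated row is kept.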
In [4]:
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5341 entries, 0 to 5340
Data columns (total 12 columns):
 #   Column  Non-Null Count  Dtype
localhost:8888/notebooks/sna.ipynb 1/9
In [5]: df
Out[5]:
   work_year  job_title             job_category                    salary_currency  salary  salary_in_usd  empl
0       2023  Data DevOps Engineer  Data Engineering                EUR               88000          95012
1       2023  Data Architect        Data Architecture and Modeling  USD              186000         186000
2       2023  Data Architect        Data Architecture and Modeling  USD               81800          81800
...
In [6]: df['work_year'].unique()
In [7]: df['job_title'].value_counts()
Out[7]: job_title
Data Engineer 1100
Data Scientist 1039
Data Analyst 744
Machine Learning Engineer 518
Analytics Engineer 207
...
Deep Learning Researcher 1
Analytics Engineering Manager 1
BI Data Engineer 1
Power BI Developer 1
Marketing Data Engineer 1
Name: count, Length: 125, dtype: int64
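The output has a long tail: 125 distinct titles, several of which appear only once. Below is a minimal sketch for measuring that tail. The helper `rare_titles` and its `max_count` threshold are hypothetical. The toy series uses title names from the output above, but the counts are made up; on the real data you would pass `df['job_title']`.

```python
import pandas as pd


def rare_titles(titles: pd.Series, max_count: int = 1) -> pd.Series:
    """Return the titles whose frequency is at most max_count."""
    counts = titles.value_counts()
    return counts[counts <= max_count]


# Hypothetical sample (names from the output above, counts invented)
titles = pd.Series([
    "Data Engineer", "Data Engineer", "Data Scientist",
    "Power BI Developer", "BI Data Engineer",
])
rare = rare_titles(titles)
print(len(rare), "titles occur only once")
```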
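In [8]:
import matplotlib.pyplot as plt

# Bar plot: one bar per row, so rows sharing a job title overlap and
# only the tallest (maximum) salary for each title is visible
plt.figure(figsize=(30, 10))
plt.bar(df['job_title'], df['salary_in_usd'])
plt.xticks(rotation=90)
plt.ylabel('Salary (USD)')
plt.xlabel('Job role')
plt.title('Job vs Salary')
plt.show()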
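Passing every row to `plt.bar` stacks thousands of bars on 125 category positions, so the chart effectively shows each title's maximum salary. Aggregating first gives exactly one bar per title. Below is a minimal sketch that uses the mean. The helper names, the `top_n` cut-off, and the toy frame are assumptions; the toy frame is built from the three rows shown in Out[5].

```python
import matplotlib.pyplot as plt
import pandas as pd


def mean_salary_by_title(data: pd.DataFrame) -> pd.Series:
    """Mean salary_in_usd for each job_title, highest first."""
    return (
        data.groupby("job_title")["salary_in_usd"]
        .mean()
        .sort_values(ascending=False)
    )


def plot_mean_salary(mean_salary: pd.Series, top_n: int = 20) -> None:
    """Bar chart with exactly one bar per title, so no bars overlap."""
    top = mean_salary.head(top_n)  # keep only the top_n best-paid titles
    plt.figure(figsize=(12, 6))
    plt.bar(top.index, top.values)
    plt.xticks(rotation=90)
    plt.ylabel("Mean salary (USD)")
    plt.xlabel("Job role")
    plt.title(f"Mean salary by job role (top {top_n})")
    plt.tight_layout()
    plt.show()


# Hypothetical toy frame built from the three rows shown in Out[5]
toy = pd.DataFrame({
    "job_title": ["Data DevOps Engineer", "Data Architect", "Data Architect"],
    "salary_in_usd": [95012, 186000, 81800],
})
mean_salary = mean_salary_by_title(toy)
print(mean_salary)
```

On the full data, `plot_mean_salary(mean_salary_by_title(df))` would draw the 20 best-paid titles. `top_n` is an arbitrary cut-off chosen here to keep the labels readable. Swapping `.mean()` for `.median()` is reasonable if a few outliers dominate a title.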
In [9]: df['job_category'].unique()
In [10]: df['work_year'].unique()
In [16]: df['salary_currency'].value_counts()
Out[16]: salary_currency
USD 4707
EUR 282
GBP 276
CAD 37
AUD 11
PLN 7
SGD 6
CHF 5
BRL 4
TRY 3
DKK 3
Name: count, dtype: int64
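Most salaries are reported in USD, but the rest are in ten other currencies. `salary` appears to be the amount in `salary_currency`, and `salary_in_usd` its USD equivalent. The ratio of the two therefore recovers the conversion rate applied to each row. Below is a minimal sketch under that assumption. The helper `implied_usd_rate` is hypothetical, and the toy frame reuses the three rows from Out[5].

```python
import pandas as pd


def implied_usd_rate(data: pd.DataFrame) -> pd.Series:
    """Median ratio salary_in_usd / salary for each salary_currency."""
    ratio = data["salary_in_usd"] / data["salary"]
    # Group the per-row ratios by currency and take the median of each group
    return ratio.groupby(data["salary_currency"]).median()


# Toy frame: the three rows shown in Out[5]
toy = pd.DataFrame({
    "salary_currency": ["EUR", "USD", "USD"],
    "salary": [88000, 186000, 81800],
    "salary_in_usd": [95012, 186000, 81800],
})
rates = implied_usd_rate(toy)
print(rates.round(4))
```

A USD rate of exactly 1.0 is a quick sanity check that the two columns agree for rows that need no conversion.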
In [17]: df['work_setting'].value_counts()
Out[17]: work_setting
In-person 2913
Remote 2239
Hybrid 189
Name: count, dtype: int64
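The three counts sum to 5341, the same as the number of entries reported by `df.info()`. Because `value_counts` drops missing values by default, this means no row lacks a work setting. On the DataFrame itself, `df['work_setting'].value_counts(normalize=True)` returns the same figures as fractions. The sketch below reproduces them from the printed counts, so it runs without the CSV.

```python
import pandas as pd

# Counts copied from Out[17]
counts = pd.Series({"In-person": 2913, "Remote": 2239, "Hybrid": 189})

# Share of rows per work setting (equivalent to value_counts(normalize=True))
shares = counts / counts.sum()
print(shares.round(3))
```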
In [19]:
import matplotlib.pyplot as plt

# Histogram of salaries, split into 10 equal-width bins
plt.hist(df['salary_in_usd'], bins=10, color='gray')
plt.xlabel('Salary in USD')
plt.ylabel('Frequency')
plt.title('Histogram of salary')
plt.show()
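With `bins=10`, `plt.hist` divides the interval from the minimum to the maximum salary into 10 equal-width bins. `numpy.histogram` with the same `bins` argument should produce the same edges and counts, returned as arrays. This is handy for reading exact values off the plot. Below is a minimal sketch on a hypothetical toy array; in the notebook you would pass `df['salary_in_usd']`.

```python
import numpy as np

# Hypothetical salaries (USD); substitute df['salary_in_usd'] in the notebook
salaries = np.array([81800, 95012, 120000, 150000, 186000])

counts, edges = np.histogram(salaries, bins=10)

# Each bin is [left, right); the last bin also includes its right edge
for left, right, n in zip(edges[:-1], edges[1:], counts):
    print(f"{left:>10.0f} - {right:>10.0f}: {n}")
```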