UFCFFY-15-M Cyber Security Analytics
Portfolio Assignment: Worksheet 1
Conduct an investigation on an organisation's web server application to identify malicious attack activity using Python data science libraries
For this task, the company "UWEcyberSolutions" have enlisted your support as a security data analyst. They know that they have suffered an attack on their web server application; however, they are unable to diagnose exactly what happened, or which of their users caused the attack. The company have provided you with their recent log data records. You will need to identify any suspicious activity in the dataset, based on your knowledge and understanding of web application security, and report your findings back to the company.
Dataset: You will be randomly issued a unique dataset based on your UWE username - failure to use the dataset
assigned to your username will result in a zero grade. Please see the folder *"Portfolio Assignment"* under the
Assignment tab on Blackboard for further detail related to the access and download of the necessary dataset.
Hint: The TryHackMe room "HTTP in detail" may help you decide what to investigate within this large dataset. More information about Microsoft Internet Information Services (IIS) can also be found at the following URL:
https://docs.microsoft.com/en-us/previous-versions/iis/6.0-sdk/ms525410(v=vs.90)
Assessment and Marking
The completion of this worksheet is worth 20% of your portfolio assignment for the UFCFFY-15-M Cyber Security
Analytics (CSA) module.
For Part A, the set of guided questions carry individual marks for the successful completion of each task, with a maximum
of 12 marks available. Where a question is worth more than 1 mark, a partial solution to the question may warrant partial
marks.
For Part B, the single question is an unguided task that will be graded against three core criteria:
Each criterion is marked from 0 to 4:
Identifying the suspicious activity
  0: No or very little evidence of progress
  1: Limited attempt to address this criterion
  2: A possible solution but with weaknesses
  3: A good solution with some justification
  4: An excellent solution with clear justification
Analytical reasoning to uncover the activity
  0: No or very little evidence of progress
  1: Limited attempt to address this criterion
  2: Some fair attempt but with weaknesses
  3: A good solution with some justification
  4: An excellent solution with clear justification
Clarity and presentation
  0: No or very little evidence of progress
  1: Limited attempt to address this criterion
  2: A reasonable attempt but with some weaknesses
  3: Good detail and presentation
  4: Excellent detail, professional presentation
Submission Documents
Your submission for this task should include:
One Jupyter Notebook exported in PDFviaHTML format:
You should complete your work using the .ipynb file provided (i.e., this document). Once you have completed your work, use the export function in Jupyter to save your notebook as a PDF document ("File", "Save and Export Notebook As", "PDFviaHTML"). *Do not submit your .ipynb file; we will not execute any code during marking. Therefore, you must ensure that all code cell output is presented clearly in your PDF document before you make your final submission.*
The deadline for your portfolio submission is TUESDAY 2ND MAY @ 14:00. This assignment is eligible for the 5-day late window policy; however, module staff will not be able to assist with any queries after the deadline.
The portfolio will be submitted to Blackboard as 4 independent documents:
*STUDENT_ID-TASK1.pdf* (a PDF document exported from your Jupyter notebook)
*STUDENT_ID-TASK2.pdf* (a PDF document exported from your Jupyter notebook)
*STUDENT_ID-TASK3.pdf* (a PDF report of your research investigation)
*STUDENT_ID-TASK4.mp4* or *STUDENT_ID-TASK4.txt* (either the video file of your presentation, or a text file
that contains instructions for accessing your video online)
Contact
Questions about this assignment should be directed to your module leader ([email protected]). You should use the online Q&A form, as well as the on-site teaching sessions, to ask questions related to this module and this assignment.
Student ID: -ENTER STUDENT NUMBER-
By submitting this assignment to Blackboard as part of your portfolio, I declare that the submission is my
own work.
In [3]: # Import libraries as required
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
pd.set_option('display.max_rows', 10)
In the cell below, you will need to change data_file to your own specific data filename. The example data file is purely
to demonstrate some initial steps for your investigation and should not be used.
In [4]: data_file = 'YOUR-USERNAME'
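In [178]: # Load in the data set as required
# The log fields are whitespace-separated (sep=r'\s+' replaces the deprecated delim_whitespace=True)
data = pd.read_csv(data_file, sep=r'\s+')
# The header row appears to carry one extra leading token (e.g. '#Fields:'), so every
# column name sits one place to the right of its data: drop the last (empty) column
# and re-assign the names shifted by one; .copy() avoids SettingWithCopyWarning below
temp_df = data.iloc[:, :-1].copy()
temp_df.columns = data.columns[1:]
data = temp_df
# Combine the date and time fields into a single timestamp column
data['datetime'] = pd.to_datetime(data['date'] + " " + data['time'])
data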
Out[178]:
       date        time      s-ip          cs-method  cs-uri-stem        cs-uri-query  s-port  cs-username  c-ip            cs(User-Agent)
0      2022-01-01  04:36:00  192.67.2.200  GET        bjgstfyo.js        v=596413      443     -            194.79.31.2     (Windows+NT+10.0;+ ...
1      2022-01-01  04:36:00  192.67.2.200  GET        index.aspx         -             443     -            194.79.31.2     (Windows+NT+10.0;+ ...
2      2022-01-01  04:36:15  192.67.2.200  GET        osivymrb.css       -             443     -            194.79.31.2     (Windows+NT+10.0;+ ...
3      2022-01-01  04:36:15  192.67.2.200  GET        laepfxqk.css       -             443     -            194.79.31.2     (Windows+NT+10.0;+ ...
4      2022-01-01  04:36:15  192.67.2.200  GET        template.css       v=alngoccj    443     -            194.79.31.2     (Windows+NT+10.0;+ ...
...    ...         ...       ...           ...        ...                ...           ...     ...          ...             ...
69545  2022-01-30  23:54:29  192.67.2.200  GET        transactions.aspx  page=4        443     ew361149     81.161.226.136  (X11;+Linux+x86_6 ...
69546  2022-01-30  23:54:32  192.67.2.200  GET        template.css       v=nhxjnpwa    443     ew361149     81.161.226.136  (X11;+Linux+x86_6 ...
69547  2022-01-30  23:54:32  192.67.2.200  GET        favico.ico         -             443     ew361149     81.161.226.136  (X11;+Linux+x86_6 ...
69548  2022-01-30  23:54:32  192.67.2.200  GET        template.css       v=eaftdmgp    443     ew361149     81.161.226.136  (X11;+Linux+x86_6 ...
69549  2022-01-30  23:54:32  192.67.2.200  GET        transactions.aspx  page=5        443     ew361149     81.161.226.136  (X11;+Linux+x86_6 ...
69550 rows × 16 columns
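As an alternative to shifting the column names by hand, the field names can be read directly from the log's own header. This is a minimal sketch, assuming the file follows the W3C extended log format that IIS writes: metadata lines start with '#', and a '#Fields:' directive lists the space-separated column names. The helper name load_iis_log and the two sample log lines are hypothetical, for illustration only.

```python
import io

import pandas as pd


def load_iis_log(source):
    """Parse a W3C extended (IIS) log from a text file-like object (hypothetical helper).

    Assumes a '#Fields:' directive names the whitespace-separated columns.
    """
    lines = source.read().splitlines()
    # Column names come from the '#Fields:' directive, minus the directive token itself
    fields = next(line.split()[1:] for line in lines if line.startswith('#Fields:'))
    # All other '#' lines ('#Software:', '#Date:', ...) are metadata and are skipped
    rows = [line.split() for line in lines if line and not line.startswith('#')]
    df = pd.DataFrame(rows, columns=fields)
    # Every field is read as text; convert the status code so comparisons are numeric
    if 'sc-status' in df.columns:
        df['sc-status'] = pd.to_numeric(df['sc-status'])
    # Build one timestamp column from the separate date and time fields
    df['datetime'] = pd.to_datetime(df['date'] + ' ' + df['time'])
    return df


# Made-up log content, for illustration only
sample = io.StringIO(
    "#Software: Microsoft Internet Information Services 10.0\n"
    "#Fields: date time c-ip cs-method cs-uri-stem sc-status\n"
    "2022-01-01 04:36:00 194.79.31.2 GET index.aspx 200\n"
    "2022-01-01 04:36:15 194.79.31.2 GET login.aspx 404\n"
)
logs = load_iis_log(sample)
print(logs)
```

With a real file, the same helper could be called as load_iis_log(open(data_file)). Splitting on whitespace is safe here because IIS encodes spaces inside fields such as the user agent as '+'.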
Part A:
Please answer the following questions by providing suitable Python code.
Questions 1-6 should require only a single line of code per answer. Question 8 should be answered in two lines of
code only. These questions make up 12 possible marks towards the assignment.
Question 1: How many unique machines (defined by client IP address 'c-ip') have
accessed this web server application? (1 Mark)
In [209… # ANSWER
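One way to approach this is to count the distinct values in the 'c-ip' column with Series.nunique(). The toy DataFrame below is a hypothetical stand-in for the loaded log data; its values are made up.

```python
import pandas as pd

# Toy stand-in for the loaded log data (values are made up)
data = pd.DataFrame({'c-ip': ['194.79.31.2', '194.79.31.2', '81.161.226.136', '10.0.0.5']})

# Number of distinct client IP addresses
unique_ips = data['c-ip'].nunique()
print(unique_ips)  # 3
```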
Question 2: How many unique usernames (defined by 'cs-username') have
accessed this web server application? (1 Mark)
In [208… # ANSWER
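The same nunique() approach works for 'cs-username', with one caveat: W3C logs write '-' for an empty field, so unauthenticated requests appear with the username '-'. Whether that placeholder should count as a "user" is a judgement call; the sketch below, on made-up toy data, computes both counts.

```python
import pandas as pd

# Toy stand-in; '-' marks requests with no authenticated user (values are made up)
data = pd.DataFrame({'cs-username': ['-', 'ew361149', 'ew361149', 'ab123456', '-']})

# Distinct values, including the '-' placeholder
all_names = data['cs-username'].nunique()
# Distinct values, excluding the '-' placeholder
real_names = data.loc[data['cs-username'] != '-', 'cs-username'].nunique()
print(all_names, real_names)  # 3 2
```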
Question 3: Which URLs (defined by 'cs(Referer)') have been accessed the most times? (1 Mark)
In [207… # ANSWER
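Series.value_counts() returns each distinct value with its frequency, sorted with the most common first, so the top rows answer "which URLs appear most often". A minimal sketch on hypothetical toy data (the example.com URLs are placeholders); note that '-' means the request carried no referrer, so you may want to exclude it.

```python
import pandas as pd

# Toy stand-in for the log data (URLs are placeholders)
data = pd.DataFrame({'cs(Referer)': [
    'https://example.com/index.aspx',
    'https://example.com/index.aspx',
    'https://example.com/login.aspx',
    '-',
]})

# Frequency of each referrer, most common first
referer_counts = data['cs(Referer)'].value_counts()
print(referer_counts.head())
```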
Question 4: What is the minimum value in the 'sc-status' column? (1 Mark)
In [206… # ANSWER
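A single call to Series.min() gives the smallest HTTP status code. The sketch below uses made-up toy values and assumes the column holds integers, which is what read_csv normally infers for an all-numeric column.

```python
import pandas as pd

# Toy stand-in for the log data (values are made up)
data = pd.DataFrame({'sc-status': [200, 304, 404, 500, 200]})

# Smallest HTTP status code recorded
min_status = data['sc-status'].min()
print(min_status)  # 200
```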
Question 5: How many entries in the data column 'cs-uri-query' start with the
string 'v='? (2 Marks)
In [205… # ANSWER
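The vectorised string method str.startswith() returns a boolean per row, and summing the booleans counts the True values. Passing na=False treats any missing values as non-matches. A sketch on made-up toy data; note that 'id=1&v=2' contains 'v=' but does not start with it, so it is not counted.

```python
import pandas as pd

# Toy stand-in for the log data (values are made up)
data = pd.DataFrame({'cs-uri-query': ['v=596413', '-', 'page=4', 'v=alngoccj', 'id=1&v=2']})

# Count queries whose text begins with 'v='
v_count = data['cs-uri-query'].str.startswith('v=', na=False).sum()
print(v_count)  # 2
```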
Question 6: How many entries in the data column 'cs(User-Agent)' contain the term 'Win64'? (2 Marks)
In [204… # ANSWER
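For a substring test, str.contains() with regex=False matches the literal text rather than treating it as a regular expression. The match is case-sensitive by default, so 'WOW64' does not match 'Win64'. The user-agent strings below are made-up toy values.

```python
import pandas as pd

# Toy stand-in for the log data (user agents are made up)
data = pd.DataFrame({'cs(User-Agent)': [
    'Mozilla/5.0+(Windows+NT+10.0;+Win64;+x64)',
    'Mozilla/5.0+(X11;+Linux+x86_64)',
    'Mozilla/5.0+(Windows+NT+10.0;+WOW64)',
]})

# Literal, case-sensitive substring match; missing values count as non-matches
win64_count = data['cs(User-Agent)'].str.contains('Win64', regex=False, na=False).sum()
print(win64_count)  # 1
```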
Question 7: Which file extension occurs the most within the 'cs-uri-stem' column?
(2 Marks)
In [203… # ANSWER
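One approach is to pull the text after the final '.' out of each stem with str.extract(), then take the most frequent value. In this sketch the regular expression only accepts an extension containing no further '.' or '/', so stems without an extension become NaN and are ignored by value_counts(). The stems are toy values.

```python
import pandas as pd

# Toy stand-in for the log data (stems are made up)
data = pd.DataFrame({'cs-uri-stem': [
    'index.aspx', 'template.css', 'osivymrb.css', 'bjgstfyo.js',
    'favico.ico', 'laepfxqk.css', '/api/status',
]})

# Capture the extension after the last dot; expand=False returns a Series
extensions = data['cs-uri-stem'].str.extract(r'\.([^./]+)$', expand=False)
# Most frequent extension (NaN rows are excluded by value_counts)
top_extension = extensions.value_counts().idxmax()
print(top_extension)  # css
```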
Question 8: How many entries return an 'sc-status' value of 404 before 06:00 AM? (2 Marks)
In [211… # ANSWER
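The dataset spans several days, so "before 06:00 AM" could mean before 06:00 on any day or before 06:00 on the first day only. This two-line sketch assumes the first reading and filters on the hour of the 'datetime' column built when loading the data; the toy rows are made up.

```python
import pandas as pd

# Toy stand-in for the log data (values are made up)
data = pd.DataFrame({
    'datetime': pd.to_datetime(['2022-01-01 04:36:00', '2022-01-01 05:59:59',
                                '2022-01-01 06:00:00', '2022-01-02 03:10:00']),
    'sc-status': [404, 404, 404, 200],
})

# Requests logged before 06:00 on any day (assumed time-of-day reading)
early = data[data['datetime'].dt.hour < 6]
# Of those, how many returned 404 Not Found
early_404 = (early['sc-status'] == 404).sum()
print(early_404)  # 2
```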
Part B:
Investigate the dataset further to uncover the suspicious activity.
This unguided question will be graded against the following criteria:
Identifying the suspicious activity (4 Marks)
Analytical reasoning to uncover the activity (4 Marks)
Clarity and presentation (4 Marks)
You should state all suspicious IP addresses that you have identified as part of your conclusion, and you should explain in
clear written English how you have uncovered this information, based on how you have used Python code for data
investigation. This should be clear and concise, and you only need to include code that helped you to solve the challenge.
In [1]: # ANSWER
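A possible starting point for Part B is to look at the log from several angles and see which clients stand out. The sketch below shows three common triage views: error responses per client IP, authenticated usernames seen from more than one IP, and query strings containing a few injection markers. It runs on invented toy data, and the regular expression is a small illustrative set that will produce false positives and miss many real attacks, so any hits are leads to investigate rather than proof.

```python
import pandas as pd

# Toy stand-in for the log data; every value here is invented
data = pd.DataFrame({
    'c-ip': ['10.0.0.1', '10.0.0.1', '10.0.0.2', '10.0.0.3', '10.0.0.3', '10.0.0.3'],
    'cs-username': ['ab1', 'ab1', 'ab1', '-', '-', '-'],
    'cs-uri-query': ['page=1', 'page=2', 'page=1', "id=1'+OR+1=1--", 'id=2', '-'],
    'sc-status': [200, 200, 200, 500, 404, 401],
})

# 1. Error responses (4xx/5xx) per client IP, largest first
errors_per_ip = (data[data['sc-status'] >= 400]
                 .groupby('c-ip').size()
                 .sort_values(ascending=False))

# 2. Authenticated usernames ('-' excluded) seen from more than one client IP
ips_per_user = (data[data['cs-username'] != '-']
                .groupby('cs-username')['c-ip'].nunique())
shared_users = ips_per_user[ips_per_user > 1]

# 3. Query strings containing a few illustrative injection markers (case-insensitive)
pattern = r"(?:'|%27|union|select|<script|%3cscript|\.\./)"
injection_hits = data[data['cs-uri-query'].str.contains(pattern, case=False, regex=True, na=False)]

print(errors_per_ip)
print(shared_users)
print(injection_hits[['c-ip', 'cs-uri-query']])
```

On the real dataset, IPs that appear in more than one of these views are natural candidates to examine request by request, for example by filtering data on that 'c-ip' and sorting by 'datetime'.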