
2223 Task01 Student

The document outlines an assignment for a Cyber Security Analytics course, where students must analyze a dataset from a web server application to identify malicious activities. Students are required to use Python data science libraries to investigate log data provided by the company 'UWEcyberSolutions' and report their findings. The assignment is divided into two parts, with specific questions and grading criteria, and submissions must be made in a specified format by a set deadline.

Uploaded by

poshakptandiya
Copyright
© All Rights Reserved

UFCFFY-15-M Cyber Security Analytics

Portfolio Assignment: Worksheet 1

Conduct an investigation on an organisation's web server application to identify malicious attack activity using Python data science libraries

For this task, the company "UWEcyberSolutions" have enlisted your support as a security data analyst. They know that
they have suffered an attack on their web server application; however, they are unable to diagnose exactly what has
happened, or which of their users caused the attack. The company have provided you with their recent log data records,
and you will need to identify any suspicious activities that have occurred in the dataset, based on your knowledge and
understanding of web application security, and report back to the company on your findings.

Dataset: You will be randomly issued a unique dataset based on your UWE username - failure to use the dataset
assigned to your username will result in a zero grade. Please see the folder *"Portfolio Assignment"* under the
Assignment tab on Blackboard for further detail related to the access and download of the necessary dataset.

Hint: The TryHackMe room "HTTP in detail" may help your research for what to investigate within this large dataset. More
information about Microsoft Internet Information Services (IIS) can also be found at the following URL:
https://docs.microsoft.com/en-us/previous-versions/iis/6.0-sdk/ms525410(v=vs.90)

Assessment and Marking

The completion of this worksheet is worth 20% of your portfolio assignment for the UFCFFY-15-M Cyber Security
Analytics (CSA) module.

For Part A, the set of guided questions carry individual marks for the successful completion of each task, with a maximum
of 12 marks available. Where a question is worth more than 1 mark, a partial solution to the question may warrant partial
marks.

For Part B, the single question is an unguided task that will be graded against three core criteria:

| Criteria | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| Identifying the suspicious activity | No or very little evidence of progress | Limited attempt to address this criteria | A possible solution but with weaknesses | A good solution with some justification | An excellent solution with clear justification |
| Analytical reasoning to uncover the activity | No or very little evidence of progress | Limited attempt to address this criteria | Some fair attempt but with weaknesses | A good solution with some justification | An excellent solution with clear justification |
| Clarity and presentation | No or very little evidence of progress | Limited attempt to address this criteria | A reasonable attempt but with some weaknesses | Good detail and presentation | Excellent detail, professional presentation |

Submission Documents

Your submission for this task should include:

1 Jupyter Notebook exported in PDFviaHTML format:

You should complete your work using the ipynb file provided (i.e., this document). Once you have completed your work,
you should use the export function in Jupyter to save your notebook as a PDF document ("File", "Save and Export
Notebook As", "PDFviaHTML"). *Do not submit your ipynb file - we will not execute any code during marking.
Therefore, you must ensure that all code cell output is presented clearly in your PDF document before you make
your final submission.*

The deadline for your portfolio submission is TUESDAY 2ND MAY @ 14:00. This assignment is eligible for the 5-day late
window policy; however, module staff will not be able to assist with any queries after the deadline.

The portfolio will be submitted to Blackboard as 4 independent documents:

*STUDENT_ID-TASK1.pdf* (a PDF document exported from your Jupyter notebook)
*STUDENT_ID-TASK2.pdf* (a PDF document exported from your Jupyter notebook)
*STUDENT_ID-TASK3.pdf* (a PDF report of your research investigation)
*STUDENT_ID-TASK4.mp4* or *STUDENT_ID-TASK4.txt* (either the video file of your presentation, or a text file
that contains instructions for accessing your video online)

Contact

Questions about this assignment should be directed to your module leader ([email protected]). You should use the
online Q&A form to ask questions related to this module and this assignment, as well as utilising the on-site teaching
sessions.

Student ID: -ENTER STUDENT NUMBER-


By submitting this assignment to Blackboard as part of your portfolio, I declare that the submission is my
own work.

In [3]: # Import libraries as required


import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
pd.set_option('display.max_rows', 10)

In the cell below, you will need to change data_file to your own specific data filename. The example data file is purely
to demonstrate some initial steps for your investigation and should not be used.

In [4]: data_file = 'YOUR-USERNAME'

In [178… # Load in the data set as required


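# sep=r'\s+' splits on any run of whitespace (replaces the deprecated delim_whitespace=True)
data = pd.read_csv(data_file, sep=r'\s+')
# The header names are offset by one position relative to the values, so drop
# the last column and re-label the remaining columns with the shifted names.
column_names = data.columns[1:]
data = data[data.columns[:-1]].copy()  # .copy() avoids a SettingWithCopyWarning below
data.columns = column_names
# Combine the separate date and time strings into a single timestamp column
data['datetime'] = pd.to_datetime(data['date'] + " " + data['time'])
data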

Out[178]:
       date        time      s-ip          cs-method  cs-uri-stem        cs-uri-query  s-port  cs-username  c-ip
0      2022-01-01  04:36:00  192.67.2.200  GET        bjgstfyo.js        v=596413      443     -            194.79.31.2     (Windows+NT+10.0;+
1      2022-01-01  04:36:00  192.67.2.200  GET        index.aspx         -             443     -            194.79.31.2     (Windows+NT+10.0;+
2      2022-01-01  04:36:15  192.67.2.200  GET        osivymrb.css       -             443     -            194.79.31.2     (Windows+NT+10.0;+
3      2022-01-01  04:36:15  192.67.2.200  GET        laepfxqk.css       -             443     -            194.79.31.2     (Windows+NT+10.0;+
4      2022-01-01  04:36:15  192.67.2.200  GET        template.css       v=alngoccj    443     -            194.79.31.2     (Windows+NT+10.0;+
...    ...         ...       ...           ...        ...                ...           ...     ...          ...
69545  2022-01-30  23:54:29  192.67.2.200  GET        transactions.aspx  page=4        443     ew361149     81.161.226.136  (X11;+Linux+x86_6
69546  2022-01-30  23:54:32  192.67.2.200  GET        template.css       v=nhxjnpwa    443     ew361149     81.161.226.136  (X11;+Linux+x86_6
69547  2022-01-30  23:54:32  192.67.2.200  GET        favico.ico         -             443     ew361149     81.161.226.136  (X11;+Linux+x86_6
69548  2022-01-30  23:54:32  192.67.2.200  GET        template.css       v=eaftdmgp    443     ew361149     81.161.226.136  (X11;+Linux+x86_6
69549  2022-01-30  23:54:32  192.67.2.200  GET        transactions.aspx  page=5        443     ew361149     81.161.226.136  (X11;+Linux+x86_6

69550 rows × 16 columns

Part A:
Please answer the following questions by providing suitable Python code
Questions 1-6 should require only a single line of code per answer. Question 8 should be answered in two lines of
code only. These questions make up 12 possible marks towards the assignment.

Question 1: How many unique machines (defined by client IP address 'c-ip') have
accessed this web server application? (1 Mark)
In [209… # ANSWER

Question 2: How many unique usernames (defined by 'cs-username') have
accessed this web server application? (1 Mark)
In [208… # ANSWER
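Questions 1 and 2 both come down to counting distinct values in a column. The sketch below shows one way to do that with `Series.nunique()`. The `toy` DataFrame and its values are hypothetical and stand in for the issued dataset; only the column names are taken from the worksheet. It also assumes that `-` marks a request with no logged username, as the sample output above suggests.

```python
import pandas as pd

# Hypothetical toy log: same column names as the worksheet, invented values.
toy = pd.DataFrame({
    "c-ip": ["10.0.0.1", "10.0.0.2", "10.0.0.1", "10.0.0.3"],
    "cs-username": ["alice", "-", "alice", "bob"],
})

# Number of distinct client IP addresses
unique_ips = toy["c-ip"].nunique()

# Number of distinct values in the username column, placeholder included
unique_users = toy["cs-username"].nunique()

# Assumption: "-" means "no username logged", so it may need excluding
named = toy.loc[toy["cs-username"] != "-", "cs-username"]
unique_named_users = named.nunique()
```

Whether the `-` placeholder should count as a username is a judgment call that is worth stating explicitly alongside the answer.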

Question 3: Which URLs (defined by 'cs(Referer)') have been accessed the greatest
number of times? (1 Mark)
In [207… # ANSWER
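Ranking values by frequency, as Question 3 asks, can be sketched with `Series.value_counts()`, which returns counts sorted from most to least frequent. The `toy` data below is hypothetical; only the column name `cs(Referer)` comes from the worksheet.

```python
import pandas as pd

# Hypothetical toy log with invented referer values.
toy = pd.DataFrame({
    "cs(Referer)": ["/index.aspx", "/login.aspx", "/index.aspx", "-", "/index.aspx"],
})

# Count occurrences of each referer, most frequent first
referer_counts = toy["cs(Referer)"].value_counts()

# Keep only the top few entries for display
top_referers = referer_counts.head(3)
```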

Question 4: What is the minimum value in the 'sc-status' column? (1 Mark)
In [206… # ANSWER

Question 5: How many entries in the data column 'cs-uri-query' start with the
string 'v='? (2 Marks)
In [205… # ANSWER

Question 6: How many entries in the data column 'cs(User-Agent)' contain the
term 'Win64'? (2 Marks)
In [204… # ANSWER
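Questions 5 and 6 are both string-matching counts. A minimal sketch, assuming the columns hold plain strings, uses the pandas `.str` accessor: `startswith` for a prefix test and `contains` for a substring test. The `toy` values are hypothetical.

```python
import pandas as pd

# Hypothetical toy log with invented query strings and user agents.
toy = pd.DataFrame({
    "cs-uri-query": ["v=596413", "-", "page=4", "v=alngoccj"],
    "cs(User-Agent)": [
        "Mozilla/5.0+(Windows+NT+10.0;+Win64;+x64)",
        "Mozilla/5.0+(X11;+Linux+x86_64)",
        "Mozilla/5.0+(Windows+NT+10.0;+Win64;+x64)",
        "Mozilla/5.0+(Macintosh)",
    ],
})

# Boolean mask of rows whose query begins with "v=", summed to give a count
starts_with_v = int(toy["cs-uri-query"].str.startswith("v=").sum())

# regex=False treats "Win64" as a literal substring rather than a pattern
contains_win64 = int(toy["cs(User-Agent)"].str.contains("Win64", regex=False).sum())
```

If a column contains missing values, passing `na=False` to either method keeps the mask strictly boolean so the sum still works as a count.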

Question 7: Which file extension occurs the most within the 'cs-uri-stem' column?
(2 Marks)
In [203… # ANSWER
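For Question 7, one possible approach is to pull the text after the final dot out of each `cs-uri-stem` value and then count the results. The sketch below uses `Series.str.extract` with a regular expression; the `toy` values are hypothetical, and the pattern assumes an extension is whatever follows the last `.` with no `/` after it.

```python
import pandas as pd

# Hypothetical toy log with invented file names.
toy = pd.DataFrame({
    "cs-uri-stem": ["bjgstfyo.js", "index.aspx", "template.css", "osivymrb.css", "favico.ico"],
})

# Capture the characters after the last "." (rows without an extension become NaN)
extensions = toy["cs-uri-stem"].str.extract(r"\.([^./]+)$", expand=False)

# Count each extension and pick the most frequent one
extension_counts = extensions.value_counts()
most_common_extension = extension_counts.idxmax()
```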

Question 8: How many entries return a 'sc-status' value of 404 before 06:00AM? (2
Marks)
In [211… # ANSWER
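Question 8 combines a status filter with a time-of-day filter. The sketch below assumes "before 06:00AM" means a time of day earlier than 06:00 on any date, and relies on a `datetime` column built the same way as in the loading cell. The `toy` values are hypothetical.

```python
import pandas as pd

# Hypothetical toy log with invented timestamps and status codes.
toy = pd.DataFrame({
    "date": ["2022-01-01", "2022-01-01", "2022-01-02", "2022-01-02"],
    "time": ["04:36:00", "07:10:00", "05:59:59", "06:00:00"],
    "sc-status": [404, 404, 404, 200],
})
toy["datetime"] = pd.to_datetime(toy["date"] + " " + toy["time"])

# True for rows whose time of day is earlier than 06:00:00
before_six = toy["datetime"].dt.hour < 6

# Rows that are both 404 responses and before 06:00
early_404_count = int(((toy["sc-status"] == 404) & before_six).sum())
```

If the question is instead read as "before 06:00 on the first day only", the mask would compare `toy["datetime"]` against a single timestamp rather than using `dt.hour`.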
Part B:
Investigate the dataset further to uncover the suspicious activity.
This unguided question will be graded against the following criteria:

Identifying the suspicious activity (4 Marks)
Analytical reasoning to uncover the activity (4 Marks)
Clarity and presentation (4 Marks)

You should state all suspicious IP addresses that you have identified as part of your conclusion, and you should explain in
clear written English how you have uncovered this information, based on how you have used Python code for data
investigation. This should be clear and concise, and you only need to include code that helped you to solve the challenge.

In [1]: # ANSWER
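As a possible starting point for Part B, per-client summaries can help narrow a large log down to a few addresses worth a closer look. The sketch below groups requests by `c-ip` and computes the share of responses with a status of 400 or above. This is only one illustrative indicator under assumed data, not the intended solution: the `toy` values are hypothetical and the `is_error` column is a helper introduced for this example.

```python
import pandas as pd

# Hypothetical toy log with invented client IPs and status codes.
toy = pd.DataFrame({
    "c-ip": ["10.0.0.1", "10.0.0.1", "10.0.0.2", "10.0.0.2", "10.0.0.2", "10.0.0.3"],
    "sc-status": [200, 200, 404, 401, 404, 200],
})

# Helper column (hypothetical): True where the server returned a 4xx/5xx status
toy["is_error"] = toy["sc-status"] >= 400

# Per-IP request count, error count, and error rate
per_ip = toy.groupby("c-ip")["is_error"].agg(["count", "sum", "mean"])
per_ip.columns = ["requests", "errors", "error_rate"]

# Clients with the highest share of failed requests come first
ranked = per_ip.sort_values("error_rate", ascending=False)
```

A high error rate alone does not prove malicious intent, so any address surfaced this way would still need to be checked against the requested paths, query strings, and timing before being reported.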
