BCS Level 4 Module in Data Analysis
Sample Paper A
Record your surname / last / family name and initials on the answer sheet.
A number of possible answers are given for each multiple choice or multiple response
question, indicated by either A B C or D (up to E in the skills scenarios). A number of other
questions will require you to re-order a list or fill in the blanks. Your answers should be
clearly indicated on your answer sheet.
Copying of this paper is expressly forbidden without the direct approval of BCS,
The Chartered Institute for IT.
Page 1 of 19
Copyright © BCS 2022
BCS Level 4 Module in Data Analysis
Version 1.3 March 2022
1 Your organisation needs to collate data from different internal departments, as
well as data originating from external stakeholders. Which ONE of the following
options would be necessary before this dispersed data can be used effectively?
2 Under which of the following conditions would there MOST LIKELY be a potential
issue when working with personal financial information?
A The data has been stored within a customer database which can be accessed by
internal staff.
B The data has been stored in a downloads folder accessed by a member of
internal staff.
C The data has been stored on a well-known cloud storage service.
D The data has been stored on a remote company data centre which can be backed
up remotely.
4 You are currently gathering data relating to coastal erosion, in order to predict the
measurement of the erosion in another 20 years' time. It includes a series of
measurements that have been recorded over a period of 50 years. Which of the
following types of data would you MOST LIKELY use in your analysis?
A Continuous.
B Descriptive.
C Structured.
D Nominal.
At which point in the data lifecycle is this issue MOST LIKELY to have arisen?
A Creation.
B Storage.
C Use.
D Deletion.
6 Which of the following data structures includes the use of a parent node?
A Graph.
B List.
C Tree.
D Array.
8 Which of the following unstructured data formats would require the MOST amount
of pre-processing before the data can be used effectively?
A Audio.
B Word processed files.
C Video.
D Social media feeds.
9 You want to gain insight into the influence your customers have on brand visibility.
You have a structured data source in the form of a CRM, as well as unstructured
data from social media feeds.
What would be the MAIN benefit of using this unstructured data alongside the
structured data source?
A You could identify how active your customers are on social media.
B You could identify how many times your brand's name is mentioned.
C You could identify the number of followers your customers have.
D You could identify how many times your main competitor's name is mentioned.
A Incomplete data.
B Out-of-date data.
C Duplicate data.
D Unverified data.
11 You have been tasked to produce a monthly report on sales from the previous
month. What sort of analytics would you use?
A Decision analytics.
B Descriptive analytics.
C Predictive analytics.
D Prescriptive analytics.
12 An organisation's Data Protection Officer has been asked to carry out a
GDPR-compliant right to erasure request. Which of the following will they need to
know?
13 Match each of the following GDPR-related roles to the appropriate description of
their responsibilities.
14 A business wants to run a data analysis project to assess the financial viability of
ongoing projects. Sort the following options into business requirements and
technical requirements by deleting the option as appropriate.
A Recounting.
B Observation.
C Enacting.
D Technical testing.
16 You want to create a data model which describes the technical requirements of a
data analysis project. The intended audience will be non-technical company
directors. Select the MOST appropriate data model from the following options.
A Procedures.
B Intuition.
C Experience.
D Databases.
18 Which of the following should define what data is collected and stored in an
organisation?
A Policies.
B Standards.
C Storage space.
D Processing availability.
A Data modelling.
B Data integration.
C Data warehousing.
D Data migration.
20 Which of the following are the key challenges in dealing with Big Data?
A Volume of data.
B Velocity of data.
C Variety of data.
D Verification of data.
Scenario 1: Database design and SQL
You have been put in charge of running the monthly scorecard for the IT department. You will be
using the 'IT Assets' database to retrieve the relevant data. The main tables to retrieve information
from are shown in the ERD below.
What new field could be a suitable primary key for the Manager table?
A first_name
B last_name
C department
D date_started
E manager_id
22 Which field in the Employees table would be a suitable foreign key in the new 'Managers'
table?
A first_name
B last_name
C department
D employee_id
E other_details
A Group By
B Count
C Sort Ascending
D Left
E Union
24 If an asset_id is removed from the IT_assets table, all related information for that
asset should also be removed from the other tables. Which database
functionality should be relied upon in this instance?
A Delete query
B Update query
C Cascading Delete
D Index
E Date fields
25 Fill in the blanks to complete the SQL query shown below, using the given options
listed.
The report requires a view of how many employees are not entering their name
when taking an asset. This report should show how many employees per
department are leaving their name blank.
Scenario 2 - Data Preparation and Integration
You are working as a data analyst in the IT department and have been asked to look into
processing and analysing the IT operational data. The IT department are keen to operationalise the
reporting processes and are therefore using programming languages to automate the importing,
cleansing and manipulation of the data.
26 Order the following Python commands into a logical flow for importing the data contained
in the file called "IToutages.csv".
After importing the entire file you should print the first ten characters of data to screen.
f.close()
print(FirstTenChars)
data=f.read()
f = open("IToutages.csv")
FirstTenChars = data[0:10]
You can use this space below to provide your answer unless using a separate answer
sheet:
27 Which is the missing line of code in the following Python program to find the mean of
2, 3, 4, 5, 6?
Numbers = [2,3,4,5,6]
Total=sum(Numbers)
____________
print(Mean)
A Mean=2*6/4
B Range=6-2
C Mean=Total/5
D Mean=sqrt(Total)
E Mean=Total**2
28 Which of the following R commands will correctly show different averages and quartiles
of a dataset?
A str(dataset)
B quantile(dataset)
C mean(dataset)
D summary(dataset)
E median(dataset)
29 The IT department are also considering using the R language.
What R command could you use to see a snapshot of the first six rows in a dataset?
A head(dataset)
B head(dataset, 5)
C tail(dataset,6)
D dataset[6]
E dataset[5]
30 You should be able to visualise time series data very quickly in R. Order the following
lines of code into a logical flow to read in the 'ITemployees.txt' file and plot the data as an
annual time series. Having realised that the data is not annual, you should then change
the time series to be monthly and then plot a second graph that starts in 1999.
Scenario 3 – ERD Normalisation
You have been asked to lead a project working on the data used by a retail business. They
currently capture all the key sales information in a spreadsheet but would like to migrate this to a
relational database. The structure of the spreadsheet is shown below with five rows of data.
31 Which of these outcomes would you expect when normalising this data to first normal
form?
A No change.
B Repeat the Salesperson information and have 12 separate rows of data.
C Delete the Customer Number column.
D Creation of two or more separate tables.
E Sales amounts would be aggregated.
A Discount table.
B Purchase table.
C Customer table.
D Shop location table.
E Salesperson table.
33 If you created a purchase table storing information about the product, price and date of
purchase, what other field should be added?
A Salesperson number.
B Customer number.
C A new primary key – PurchaseID.
D Currency exchange rate.
E Table creation date.
34 Creating the normalised form of this data would be categorised as which form of model?
A Physical model.
B Business model.
C Conceptual model.
D Database model.
E Technical model.
35 Creating a database for the retail data will require discussions with stakeholders to
clarify requirements. Which of the following techniques would be appropriate for this
scenario?
A Prototype.
B Interviews.
C Process modelling.
D Financial costing.
E Return on investment.
Scenario 4 – Data Modelling
You have been asked to show senior managers what a statistical analysis of data could be used
for in the business. You have decided to create a predictive model to forecast sales of products
within the company's stores.
36 In planning your approach to the predictive model, you have decided to create a project
plan with the key steps required. Order the given steps in the correct order (earliest to
latest). Indicate your chosen order by writing the letters A-E in the spaces below.
A Cleanse data.
B Create problem hypothesis.
C Analyse data.
D Collect data.
E Document results.
__ __ __ __ __
37 You feel that weather has a lot to do with sales in stores so you would like to create a
model to show the impact that weather has. You believe that sunny weather increases
total sales. Which of the following statements would form a suitable null hypothesis for
this model?
38 Having seen that weather does not appear to impact sales in stores, you have decided to
forecast sales using a linear regression model. Having defined the model, you are looking
to now train the model. What would be an appropriate size subset of your data to use for
this training?
A 0%
B 20%
C 30%
D 50%
E 70%
39 How much data should you set aside for your testing and validation data sets?
A 30%
B 50%
C 70%
D 80%
E 100%
40 You are looking to present the results of your linear regression model to senior
stakeholders. Which visualisation would be most appropriate for a linear regression
forecast?
A Bar chart.
B Histogram.
C Gantt chart.
D Heat map.
E Scatter chart.
End of Paper
BCS Level 4 Module in Data Analysis
Answer Key and Rationale
3. Responsible for overseeing data protection strategy and its implementation.
4. Responsible for monitoring GDPR compliance.

Question 14
Answer: A-C Business; D Technical
Explanation / Rationale: The format of the data is a technical requirement; while options A-C have a technical element to them, they are business requirements.
Syllabus Section: 3.5

Question 15
Answer: B
Explanation / Rationale: This technique is an effective means for monitoring a current or ongoing process, for example, deciphering how a user performs their role by assessing their work environment.
Syllabus Section: 3.6

Question 16
Answer: A
Explanation / Rationale: Physical and logical data models would be too technical for the intended audience; conceptual data model identifies business concepts and is used to document business from a data perspective - more about actual business data rather than database design.
Syllabus Section: 3.7

Question 17
Answer: B and C
Explanation / Rationale: Unwritten and hard to communicate knowledge of a business - not usually documented in any form but resides with experts
Syllabus Section: 3.7

Question 18
Answer: A and B
Explanation / Rationale: Data architecture should be defined by policies, rules and standards rather than available resource or technical constraints.
Syllabus Section: 4.1

Question 19
Answer: C
Explanation / Rationale: Data warehousing is the only one of the four options which relates to historical data
Syllabus Section: 4.2

Question 20
Answer: A, B and C
Explanation / Rationale: Acknowledging the 3 V's of describing Big Data challenges - there are other accepted descriptions but verification is not a relevant term.
Syllabus Section: 4.3

Question 21
Answer: E
Explanation / Rationale: The answer field implies an 'id' based on manager which should lead through thought process to unique identifier of manager.
Syllabus Section: 4.4, 4.6

Question 22
Answer: D
Explanation / Rationale: Employee_id is the primary key in the linked table so the only option to be described as a foreign key. Whilst alternatives could be argued, this is a case of the only sensible option from the ones given.
Syllabus Section: 4.4, 4.6

Question 23
Answer: A and B
Explanation / Rationale: To return results from a query that gives the number of assets will mandate use of a 'Count' aggregation. As the 'Count' aggregation won't be required on the rest of the data i.e. condition, this will need to be aggregated using a 'Group By'.
Syllabus Section: 4.7
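
The 'Count' plus 'Group By' combination described for question 23 can be sketched with Python's built-in sqlite3 module. This is an illustration only, not the query from the paper: the table name it_assets follows the scenario, but the column asset_condition and every row of data are assumptions made up for the example.

```python
import sqlite3

# Hypothetical schema and rows, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE it_assets (asset_id INTEGER PRIMARY KEY, asset_condition TEXT)"
)
conn.executemany(
    "INSERT INTO it_assets (asset_id, asset_condition) VALUES (?, ?)",
    [(1, "Good"), (2, "Good"), (3, "Damaged"), (4, "Good"), (5, "Retired")],
)

# COUNT aggregates the assets; GROUP BY keeps one result row per condition.
rows = conn.execute(
    "SELECT asset_condition, COUNT(asset_id) AS asset_count "
    "FROM it_assets GROUP BY asset_condition ORDER BY asset_condition"
).fetchall()
conn.close()

for asset_condition, asset_count in rows:
    print(asset_condition, asset_count)
```

Without the GROUP BY clause the query would return a single overall count rather than one count per condition.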

Question 24
Answer: C
Explanation / Rationale: This question tests deeper database functionality rather than SQL. The feature of 'cascading delete' is standard functionality across different systems and would be used to ensure related records are removed from related tables.
Syllabus Section: 4.7
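
As a minimal sketch of a cascading delete, the example below uses Python's sqlite3 module. The asset_loans table, its columns and all rows are hypothetical; only the it_assets table name and asset_id field come from the scenario. Note that SQLite enforces foreign keys only when the foreign_keys pragma is switched on, and other database systems configure this differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign key rules when this pragma is enabled.
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical parent and child tables, invented for illustration.
conn.execute("CREATE TABLE it_assets (asset_id INTEGER PRIMARY KEY, asset_name TEXT)")
conn.execute(
    "CREATE TABLE asset_loans ("
    " loan_id INTEGER PRIMARY KEY,"
    " asset_id INTEGER NOT NULL"
    " REFERENCES it_assets(asset_id) ON DELETE CASCADE)"
)
conn.executemany("INSERT INTO it_assets VALUES (?, ?)", [(1, "Laptop"), (2, "Monitor")])
conn.executemany("INSERT INTO asset_loans VALUES (?, ?)", [(10, 1), (11, 1), (12, 2)])

# Deleting the parent row also removes its related child rows.
conn.execute("DELETE FROM it_assets WHERE asset_id = 1")
remaining_loans = conn.execute("SELECT loan_id, asset_id FROM asset_loans").fetchall()
conn.close()

print(remaining_loans)
```

Only the loan linked to asset 2 should remain, because both loans for asset 1 are removed along with the asset itself.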

Question 25
Answer: See explanation
Explanation / Rationale: The answer would require choice of options (four from six) that return the number of employees (count) using the criteria of blank first name. The group by will require the department name to match the question criteria and the order by checks understanding of labelling columns rather than using field names (i.e. 2 for the second column)
Syllabus Section: 4.7
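
The shape of query described for question 25 might look like the sketch below, run through Python's sqlite3 module. It is not the query from the paper, whose blanks and options are not reproduced here. The employees table, its columns, the sample rows and the choice to treat both empty strings and NULL as blank are all assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and rows, invented for illustration.
conn.execute(
    "CREATE TABLE employees ("
    " employee_id INTEGER PRIMARY KEY, first_name TEXT, department TEXT)"
)
conn.executemany(
    "INSERT INTO employees (first_name, department) VALUES (?, ?)",
    [
        ("", "Finance"), ("", "Finance"), (None, "Finance"), ("Dee", "Finance"),
        ("", "IT"), ("", "IT"), ("Ann", "IT"),
        ("", "HR"), ("Bob", "HR"),
    ],
)

# Count employees with a blank first name per department.
# ORDER BY 2 sorts on the second output column (the count), by position.
rows = conn.execute(
    "SELECT department, COUNT(employee_id) AS blank_names "
    "FROM employees "
    "WHERE first_name = '' OR first_name IS NULL "
    "GROUP BY department "
    "ORDER BY 2 DESC"
).fetchall()
conn.close()

for department, blank_names in rows:
    print(department, blank_names)
```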

Question 26
Answer:
f = open("IToutages.csv")
data=f.read()
f.close()
FirstTenChars = data[0:10]
print(FirstTenChars)
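
A common alternative to the explicit open, read and close sequence is a with block, which closes the file automatically. The sketch below assumes that exactly ten characters are wanted and first writes a small stand-in file so that it runs anywhere. The file contents are invented and do not represent the real IToutages.csv data.

```python
import os
import tempfile

# Create a small stand-in file; the contents are invented for illustration.
path = os.path.join(tempfile.mkdtemp(), "IToutages.csv")
with open(path, "w") as f:
    f.write("date,system,minutes\n2022-01-03,email,42\n")

# Read the whole file; it is closed automatically on leaving the block.
with open(path) as f:
    data = f.read()

# The end index of a slice is exclusive, so [:10] gives characters 0 to 9.
first_ten_chars = data[:10]
print(first_ten_chars)
```

Because the end index is exclusive, a slice ending at 9 would return only nine characters.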

Question 27
Answer: C
Explanation / Rationale: A basic test of statistical knowledge - specifically how to calculate the mean of a set of numbers. Putting it into a Python context adds no difficulty but does test some understanding of algorithmic steps.
Syllabus Section: 5.1, 5.2, 5.3, 5.4, 5.5
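
The calculation behind question 27 can be checked with a short sketch. Dividing by len(numbers) instead of a hard-coded 5 is a design choice of this example rather than something taken from the paper. The standard library's statistics.mean is shown only as a cross-check.

```python
from statistics import mean

numbers = [2, 3, 4, 5, 6]
total = sum(numbers)                 # 2 + 3 + 4 + 5 + 6 = 20
manual_mean = total / len(numbers)   # 20 / 5

print(manual_mean)                   # 4.0

# Cross-check against the standard library implementation.
library_mean = mean(numbers)
print(manual_mean == library_mean)   # True
```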

Question 28
Answer: D
Explanation / Rationale: All of the functions are valid but only summary will return more than one average as well as the quartiles.
Syllabus Section: 5.1, 5.2, 5.3, 5.4, 5.5

Question 29
Answer: A
Explanation / Rationale: This checks a basic function of data visualisation in R. Head is a common function; some may not know that the default number of rows returned is 6, but a process of elimination will lead them there if they understand what's wrong with the other options.
Syllabus Section: 5.1, 5.2, 5.3, 5.4, 5.5

Question 31
Answer: B and D
Explanation / Rationale: Each record has to have a primary key (and purchases does not currently have one). Each record cannot have repeating groups of attributes, therefore Salesperson needs to be separated into individual records for each customer/purchase.
Syllabus Section: 2.2, 3.8, 4.4, 4.6

Question 32
Answer: B, C and E
Explanation / Rationale: The Salesperson and Customer data are clear sets of data, with purchase information (date and amount) also a valid choice to store. There is no indication of discount or location data so these tables would not be an outcome from this process.
Syllabus Section: 2.2, 3.8, 4.4, 4.6

Question 33
Answer: C
Explanation / Rationale: The new table should always have a unique field for the primary key. There is no obvious choice for this in the current fields so a new ID field should be created.
Syllabus Section: 2.2, 3.8, 4.4, 4.6

Question 34
Answer: C
Explanation / Rationale: The conceptual model is to establish the entities, their attributes, and their relationships. The logical data model defines the structure of the data elements and sets the relationships between them. The physical data model describes the database-specific implementation of the data model.
Syllabus Section: 2.2, 3.8, 4.4, 4.6

Question 35
Answer: A, B and C
Explanation / Rationale: As a conceptual model, this is aimed at process requirements elicitation. The financial measures are not relevant at this stage but the other options are all potential opportunities to gather requirements.
Syllabus Section: 3.7

Question 36
Answer: B, D, A, C, E
Explanation / Rationale: Some of the typical steps of data analysis in the correct order (as per syllabus).
Syllabus Section: 6.1

Question 37
Answer: C
Explanation / Rationale: Null hypothesis is indicated by H0 and would state that there is not enough statistical evidence to show that sunshine affects product sales.
Syllabus Section: 6.2

Question 38
Answer: E
Explanation / Rationale: Normal practice for size of a training data set is at least 70% (sometimes increasing but 70% option is the only realistic choice in this list).
Syllabus Section: 6.4

Question 39
Answer: A
Explanation / Rationale: The remaining data after the training data set is taken out would be used for testing/validation.
Syllabus Section: 6.5
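
A 70/30 split of the kind described for questions 38 and 39 could be sketched as follows. The sales records, the fixed seed and the use of a random shuffle are all assumptions made for the example. For time-ordered sales data, a chronological split may be more appropriate than a random one.

```python
import random

# Hypothetical data: 20 (week, sales) pairs invented for illustration.
records = [(week, 100 + 5 * week) for week in range(1, 21)]

rng = random.Random(42)   # fixed seed so the split is repeatable
shuffled = records[:]     # copy, leaving the original order untouched
rng.shuffle(shuffled)

# Integer arithmetic avoids floating-point rounding in the split point.
split_index = len(shuffled) * 70 // 100
train_set = shuffled[:split_index]   # 70% used to train the model
test_set = shuffled[split_index:]    # remaining 30% for testing/validation

print(len(train_set), len(test_set))
```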

Question 40
Answer: E
Explanation / Rationale: As a linear model, the most appropriate visualisation should show the data points and the linear trend line i.e. Scatter Chart.
Syllabus Section: 6.6
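
The trend line drawn on such a scatter chart is typically an ordinary least squares fit. The sketch below computes the slope and intercept in plain Python for a handful of invented points. The plotting step itself is omitted, since it would depend on whichever charting library is available.

```python
# Hypothetical (x, y) observations invented for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2.0, 4.1, 5.9, 8.2, 9.8]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Ordinary least squares for a single predictor:
# slope = covariance(x, y) / variance(x)
sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sum_xx = sum((x - mean_x) ** 2 for x in xs)
slope = sum_xy / sum_xx
intercept = mean_y - slope * mean_x

# Points on the fitted line, one per observed x value.
trend = [intercept + slope * x for x in xs]

print(round(slope, 2), round(intercept, 2))
```

The observed points would be plotted as the scatter and the trend values joined as a straight line through them.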