AnswerData Analyst
descriptive analytics: analyzing customer ordering history to find popular products within last 6 months
predictive analytics: analyzing historical data of temperature to prepare latest weather forecast
prescriptive analytics: reviewing sales data of top 5 competitors to determine the marketing strategy of own
product for next quarter
2. Match the type of data analysis with the question it can answer.
Descriptive analytics uses a feedback system to track the outcome of actions taken.
Prescriptive analytics uses historical data to provide regular reports on events that have already
happened.
Predictive analytics uses simulation models and forecasting to suggest what could happen.
Explanation: Descriptive analytics uses historical data to provide regular reports on events that have
already happened. Prescriptive analytics uses a feedback system to track the outcome of actions taken.
Predictive analytics uses data and statistical techniques to predict future trends. All three types of
analytics (Prescriptive, Predictive, and Descriptive) are used in Big Data analysis.
4. A data analyst uses Excel and Tableau to identify patterns and correlations in a dataset to draw
conclusions. Which phase of the analytic process is the analyst currently working on?
Explanation: During the analyzing data phase of an analytic project, an analyst looks for patterns and
correlations in the data set to draw conclusions.
5. Refer to the exhibit. A data analyst wants to create a formula in Microsoft Excel that will
automatically calculate the net revenue from the sales of dining sets. Which formula inserted in cell F2
will accomplish this?
=(C2*(B2-D2)*E2)
=B2*C2-D2*E2
=B2-D2+C2*E2
=(B2*C2)-(D2*C2)*E2
Explanation: The correct formula is =(C2*(B2-D2)*E2) and will yield a net loss in revenue amounting to
$580.10.
6. Which type of variable is used for qualitative values such as gender or eye color?
ordinal
discrete
continuous
nominal
Explanation: Variables are either categorical or numerical. Categorical variables are qualitative and are
either nominal or ordinal. Nominal variables are used for values that are based on the identity of the
object such as eye color or gender.
7. What are three resources provided by the Kaggle web site? (Choose three.)
code
community competitions
Explanation: Kaggle offers many resources for new and advanced data scientists. These include publicly
available data sets, code, community, inspiration, competitions, and courses.
ratio quantitative values that can specify if a value for a variable exists
transform is the process of conditioning data into a usable form, such as removing duplicate data, providing
missing data, and correcting any errors in the data.
In the ELT process, the transform step occurs on the stored data as it is used.
In the ELT process, the load step occurs after the transform step.
In the ETL process, the extract step occurs after the load step.
In the ETL process, the load step occurs before the transform step.
Explanation: In the ETL process (Extract, Transform, and Load), data is first extracted, then transformed,
and then loaded into the database. In the ELT process (Extract, Load, and Transform), the load and
transform steps are reversed. ELT enables raw data to skip the transformation step and go straight to
storage in an unstructured form. Transformation then occurs on the stored data as it is used.
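The difference in ordering can be sketched in Python. This is only a minimal illustration, not a real pipeline: the extract, transform, and load helpers and the sample records are hypothetical, and a plain list stands in for the database.

```python
def extract():
    # Hypothetical raw records, including a duplicate and a missing value.
    return [
        {"id": 1, "city": " austin "},
        {"id": 1, "city": " austin "},
        {"id": 2, "city": None},
    ]

def transform(rows):
    # Remove duplicates, fill in missing data, and tidy up the text.
    seen, clean = set(), []
    for row in rows:
        if row["id"] in seen:
            continue
        seen.add(row["id"])
        city = (row["city"] or "unknown").strip().title()
        clean.append({"id": row["id"], "city": city})
    return clean

def load(rows, store):
    # Append the rows to the (in-memory) store and return it.
    store.extend(rows)
    return store

# ETL: transform before loading, so the store only ever holds clean data.
etl_store = load(transform(extract()), [])

# ELT: load the raw data first, then transform it when it is used.
elt_store = load(extract(), [])
elt_view = transform(elt_store)
```

In the ELT case the store still contains the raw rows, and the cleaned view is produced on demand.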
11. An analyst in an online order company is researching products that online customers spend the
most time browsing for on the website but do not buy. The result of the analysis will enable the
company to release quick discount notices on the website to encourage customers to buy those
products. What relevant data is required to do the research?
inventory of products
brands of products
Explanation: Time plays an important role in modern data analytics. Businesses rely on real-time data
to make quick decisions that will bring them the greatest benefit. In this case, the customer
viewing data can be analyzed against a policy to release discount notices in a timely fashion in order to
encourage interested customers to make a purchase.
12. Which two tasks are performed as part of the transform step of the ETL data process? (Choose
two.)
Explanation: The process of transforming the data includes tasks such as:
CONCATENATE it is used to combine the data from one or more columns into a single column
INDEX it is used to return a value or the reference to a value from within a table or range
14. Which option in Microsoft Excel represents a formula with an absolute reference?
=($A1-$B1)
=(B$1-B$3)
=($A$1-$B$1)
=(B$=(C$1$-F$1$)
Explanation: Dollar symbols in a cell reference indicate to Microsoft Excel to treat the cell reference as
absolute and to always refer to the value of the cell regardless of whether it is moved or where the
formula is located. An absolute reference is designated in a formula by the addition of a dollar sign ($)
before the column and row. As an absolute reference was used to refer to Cell B1, the formula will
automatically update to wherever the contents of this cell are moved.
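The effect of the dollar signs is easiest to see when a formula is copied to another cell. The following Python sketch mimics that behavior under simplifying assumptions: copy_formula and its helpers are hypothetical names, and the pattern only recognizes plain references such as A1, $A1, B$3, or $A$1.

```python
import re

# Matches references such as A1, $A1, B$3, or $A$1. This simple pattern is an
# assumption for the sketch; it does not handle sheet names or function names.
_REF = re.compile(r"(\$?)([A-Z]+)(\$?)(\d+)")

def _col_to_num(col):
    # Convert column letters to a 1-based number (A -> 1, Z -> 26, AA -> 27).
    n = 0
    for ch in col:
        n = n * 26 + (ord(ch) - ord("A") + 1)
    return n

def _num_to_col(n):
    # Convert a 1-based column number back to letters.
    col = ""
    while n > 0:
        n, rem = divmod(n - 1, 26)
        col = chr(ord("A") + rem) + col
    return col

def copy_formula(formula, d_rows, d_cols):
    """Mimic copying a formula d_rows down and d_cols to the right."""
    def shift(match):
        col_abs, col, row_abs, row = match.groups()
        if not col_abs:  # no $ before the column: it moves with the copy
            col = _num_to_col(_col_to_num(col) + d_cols)
        if not row_abs:  # no $ before the row: it moves with the copy
            row = str(int(row) + d_rows)
        return f"{col_abs}{col}{row_abs}{row}"
    return _REF.sub(shift, formula)

print(copy_formula("=(A1-B1)", 1, 1))      # fully relative
print(copy_formula("=($A1-$B1)", 1, 1))    # column fixed, row relative
print(copy_formula("=(B$1-B$3)", 1, 1))    # row fixed, column relative
print(copy_formula("=($A$1-$B$1)", 1, 1))  # fully absolute
```

Only the fully absolute formula comes back unchanged, which is why it is the one that always refers to the same cells.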
15. A data analyst needs to organize sales data for analysis. Which Excel function can order the data
by date sold, with the most recent sales listed first?
Conditional Formatting
Text to Columns
Explanation: The Sort & Filter tool in Excel allows an analyst to sort the contents of a column in either
ascending order so that dates are earliest to most recent or descending order so that dates are most
recent to earliest.
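For comparison, the same descending sort by date can be sketched in Python. The records and field names here are hypothetical and only illustrate the ordering, not anything Excel does internally.

```python
from datetime import date

# Hypothetical sales records with a date-sold field.
sales = [
    {"item": "helmet", "sold": date(2023, 1, 15)},
    {"item": "bike", "sold": date(2023, 3, 2)},
    {"item": "lock", "sold": date(2022, 11, 30)},
]

# reverse=True gives descending order: most recent sale first.
newest_first = sorted(sales, key=lambda record: record["sold"], reverse=True)

for record in newest_first:
    print(record["sold"].isoformat(), record["item"])
```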
16. How can data analysts use the Conditional Formatting tool in Excel to aid in data analysis of bike
sales records?
to separate the contents of the product description column into separate columns
Explanation: By using conditional formatting, a data analyst can highlight cells that contain certain
values or that meet certain criteria.
17. A learner is analyzing a large volume of data in a Microsoft Excel spreadsheet and wishes to find
duplicate data values. The data has been organized in a table where each row has different but
related forms of data in each column. Which Microsoft Excel function can be used to do this?
ISNA
VLOOKUP
SUM
IF
Explanation: VLOOKUP is a very powerful data analysis tool in Microsoft Excel that is used to find
information in a large spreadsheet including duplicate values. VLOOKUP is a vertical lookup function, so
the data needs to be organized in a table where each row has different but related forms of data in each
column.
The IF function makes logical comparisons.
The ISNA looks for cells with #N/A.
The SUM function is used to add the values in cells selected.
18. A data analyst wants to compare the average life expectancy and GDP for forty countries. Which
type of visual representation would best suit this task?
pie chart
scatter plots
bar chart
line graph
Explanation: A scatter plot is a type of data visualization that shows the relationship between different
variables. This data is shown by placing various data points between an x- and y-axis. Scatter plots are
very popular for correlation visualizations, or when you want to show the distribution of a large number
of data points. Scatter plots are also useful for demonstrating clustering or identifying outliers in the
data.
19. The figure contains a section of an Excel spreadsheet. Cell C2 contains the formula
“=VLOOKUP(B1,$A$2:$A$10,1,FALSE)” When cell C2 is clicked “#N/A” is displayed.
Refer to the exhibit. A learner is analyzing a data spreadsheet in Microsoft Excel and notices a formula
function “=VLOOKUP(B1,$A$2:$A$10,1,FALSE)”. When clicking on cell C2, the value displayed in the
field in cell C2 is “#N/A”. Why is this value displayed?
Explanation: VLOOKUP can also be used to help with data cleaning by finding duplicates. With VLOOKUP
you can compare two columns (or lists) and find duplicate values. The formula is written in cell C2 as
=VLOOKUP(B1,$A$2:$A$10,1,FALSE). This formula compares the value in B1 with the range A2:A10. If
there is no duplicate then an error is displayed as “#N/A”. If a duplicate was found it would display the
name of the duplicate.
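The exact-match behavior described above can be imitated in a few lines of Python. This is a rough sketch rather than Excel's implementation: vlookup_exact is a hypothetical helper, and the names list stands in for a small lookup range.

```python
def vlookup_exact(lookup_value, table, col_index):
    """Return the value in column col_index (1-based) of the first row whose
    first column equals lookup_value, or "#N/A" if no row matches."""
    for row in table:
        if row[0] == lookup_value:
            return row[col_index - 1]
    return "#N/A"

# Hypothetical single-column lookup range.
names = [["Ana"], ["Ben"], ["Cara"]]

print(vlookup_exact("Ben", names, 1))  # a duplicate exists, so the name is returned
print(vlookup_exact("Dev", names, 1))  # no duplicate, so the error marker is returned
```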
20. What is a statistical analysis result that a descriptive statistical analysis will not provide?
predictions made about other data sets that are not in the population
Explanation: Descriptive statistics are used to describe or summarize data in ways that are meaningful
and useful. They describe the current or historical state of the observed population but do not allow for
comparison of groups, conclusions to be drawn, or predictions to be made about other data sets that
are not in the population.
21. Refer to the exhibit. Which option will correctly display the user names and email addresses of
users in this table?
SELECT User
WHERE user_id, user_name, user_email
SELECT User
FROM user_id, user_name, user_email
SELECT user_id, user_name, user_email
FROM User
Explanation: The SELECT command is used to request the specified fields, using commas to separate
the fields.
The FROM clause names the table from which the selected fields are read.
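To try the statement out, it can be run against an in-memory SQLite database from Python. The table contents below are made up for illustration; only the table and column names come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE User (user_id INTEGER, user_name TEXT, user_email TEXT)"
)
# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO User VALUES (?, ?, ?)",
    [(1, "ana", "ana@example.com"), (2, "ben", "ben@example.com")],
)

# SELECT lists the fields, FROM names the table.
rows = conn.execute(
    "SELECT user_id, user_name, user_email FROM User"
).fetchall()

for row in rows:
    print(row)
```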
22. Drag the SQL Data Manipulation Language (DML) statements to the correct description.
ORDER BY used to structure the resulting query in ascending (ASC) order by default
23. Refer to the exhibit. A portion of the Movie table from the Movies database is shown. A data
analyst is writing a query that will return the title and release date of movies released after 2000. The
analyst further needs the list ordered by the release date. Which query will return the required
results?
SELECT Movie
FROM Title, Release_date
WHERE Release_date > ‘2000-12-31’
ORDER BY Release-date
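As a point of comparison, a query with the clauses in their usual roles (fields after SELECT, the table after FROM, the filter in WHERE, and the sort column in ORDER BY) can be sketched with Python's sqlite3 module. The rows are hypothetical, and the dates are assumed to be stored as ISO-formatted text so that string comparison matches date order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Movie (Title TEXT, Release_date TEXT)")
# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO Movie VALUES (?, ?)",
    [("Alpha", "1999-06-01"), ("Beta", "2005-03-10"), ("Gamma", "2001-11-20")],
)

rows = conn.execute(
    """
    SELECT Title, Release_date
    FROM Movie
    WHERE Release_date > '2000-12-31'
    ORDER BY Release_date
    """
).fetchall()

for row in rows:
    print(row)
```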
spreadsheet it stores data in a sheet using a tabular format of columns, rows, and cells
flat file database it stores records in a single file with no hierarchical structure
Explanation: The Data Manipulation Language stores, modifies, retrieves, and deletes data in a database
table. DML includes the most common SQL commands, including SELECT, INSERT, UPDATE and DELETE.
Explanation: A field within the database schema refers to a specific attribute of an individual record.
Fields are named by column headings. An individual movie title in the Title column would be a field in
the movie table.
27. Refer to the exhibit. A data analyst writes a SQL query to extract information from multiple tables
in the Movies database. To complete the ON command, what needs to be entered in place of the
question mark in this query?
FROM Review AS r
JOIN Movie AS m
ON m.Title = ?;
r.Date;
r.Comment;
r.Score;
r.MovieTitle;
r.ID;
Explanation: To combine two tables, a SQL JOIN can use columns containing the same data type and
information in both tables. The Title column in the Movie table includes the movie’s name, and
the MovieTitle column in the Review table contains the same information. Even though
the Date column appears in both tables, it does not have the same information and, therefore, cannot
be used in the JOIN.
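A small runnable sketch of this join, using Python's sqlite3 module, is shown here. The column set and the rows are assumptions made for illustration; the exhibit's actual data is not reproduced.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Movie (Title TEXT, Date TEXT);
    CREATE TABLE Review (ID INTEGER, MovieTitle TEXT, Score INTEGER, Date TEXT);
    INSERT INTO Movie VALUES ('Alpha', '2001-05-01'), ('Beta', '1999-03-12');
    INSERT INTO Review VALUES (1, 'Alpha', 8, '2021-01-02'),
                              (2, 'Alpha', 6, '2022-07-09');
    """
)

# Title and MovieTitle hold the same information, so they link the tables.
rows = conn.execute(
    """
    SELECT m.Title, r.Score
    FROM Review AS r
    JOIN Movie AS m
    ON m.Title = r.MovieTitle
    ORDER BY r.ID
    """
).fetchall()

for row in rows:
    print(row)
```

Joining on the two Date columns instead would match nothing here, because a release date and a review date describe different things.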
28. Refer to the exhibit. Match the JOIN operation with the proper description.
LEFT JOIN Returns all records from Table 1 and only the matched records from Table 2.
RIGHT JOIN Returns all records from Table 2 and the matched records from Table 1.
INNER JOIN Returns only records where the selected fields have matching values in Table 1 and Table 2.
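The three descriptions can be checked on two tiny tables with Python's sqlite3 module. The tables t1 and t2 and their rows are hypothetical. Because older SQLite versions do not accept RIGHT JOIN, this sketch emulates it by swapping the tables in a LEFT JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE t1 (k INTEGER, a TEXT);
    CREATE TABLE t2 (k INTEGER, b TEXT);
    INSERT INTO t1 VALUES (1, 'a1'), (2, 'a2');
    INSERT INTO t2 VALUES (2, 'b2'), (3, 'b3');
    """
)

# INNER JOIN: only keys present in both tables.
inner = conn.execute(
    "SELECT t1.k, a, b FROM t1 INNER JOIN t2 ON t1.k = t2.k ORDER BY t1.k"
).fetchall()

# LEFT JOIN: every row of t1, with NULL (None) where t2 has no match.
left = conn.execute(
    "SELECT t1.k, a, b FROM t1 LEFT JOIN t2 ON t1.k = t2.k ORDER BY t1.k"
).fetchall()

# RIGHT JOIN emulated: every row of t2, with NULL where t1 has no match.
right = conn.execute(
    "SELECT t2.k, a, b FROM t2 LEFT JOIN t1 ON t1.k = t2.k ORDER BY t2.k"
).fetchall()

print(inner)
print(left)
print(right)
```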
29. Which two are benefits of organizing information from multiple datasets into a dashboard?
(Choose two)
Explanation: There are several benefits to using dashboards to present your data.
They give you access to data and reports from multiple data sources on a single screen
simplifying data analysis.
worksheets
objects
screens
dashboards
Explanation: In Tableau, using a dashboard is a way to display information that consolidates multiple
views or visualizations.
To create visualizations.
Explanation: Tableau is a business intelligence and data visualization tool. With Tableau you can create
great and dynamic visualizations.
32. Match the data analysis tool with the use case.
avoid selecting only data and methods that support your assumptions
focus on the larger patterns and trends and include outliers in the overall analysis
1. Be aware that bias exists. Record your assumptions and hypotheses before beginning your
analysis. Avoid selecting only data and methods that support your assumptions.
2. Validate your data sources and the methodology used to collect the data.
3. Focus on the larger patterns and trends, and remove outliers from the overall analysis in order to
investigate them further.
4. Review your methods and data with others in your team. They may be able to spot bias that you
may have overlooked.
5. Be open-minded and impartial in your analysis. Allow the data to inform your conclusions.
34. What are two methods that ensure confidentiality? (Choose two.)
authorization
encryption
availability
nonrepudiation
authentication
integrity
Explanation: Confidentiality means that information can be viewed only by those who need to know. This
can be accomplished by encrypting data and authenticating users who request access.
35. A data analyst conducting a study stops collecting more data once the evidence starts to support
the hypothesis. What type of bias has the analyst introduced into the findings?
confirmation
selection
interpretation
information
Explanation: Confirmation bias can occur when an analyst only collects or analyzes data that supports a
particular hypothesis.
36. Which type of bias in data analysis can be caused by the influence of outliers?
interpretation
information
confirmation
selection
Explanation: Information bias can occur when outliers are present in the data and they are not dealt
with appropriately. Outliers can skew the outcomes of analysis and distort the results.
37. Which three features should your data project portfolio contain to ensure that it is considered
favorably by employers and recruiters? (Choose three.)
it contains many multi-colored charts and graphics that show the results of the data analysis
it uses a layout and presentation that showcases your web design and markup skills
there are links to software, so non-technical reviewers can replicate your analysis
Explanation: A project portfolio should present those data projects you want to showcase in a
well-organized and easy-to-navigate format. Non-technical reviewers will probably have neither the
skills and knowledge nor the need to use software to replicate your analysis. A data project portfolio
should focus on the processes and outcomes of your projects, not your web design and markup skills.
Extensive use of multi-colored charts and graphics in your portfolio will detract from the content you
want to showcase.
38. Why is the Jupyter Notebook tool useful when developing and testing data analysis software?
markdown text explains the operation of SQL queries and displays the results
Explanation: Python code is interpreted and not compiled. Jupyter Notebook runs code interactively
within a web browser and not in a standalone application. SQL queries are applied to relational
databases and not to Jupyter Notebooks. A notebook’s displayed output results from code executed
in real time.
39. Which data analytic tool can create interactive documents containing executable program code
and markdown text?
Excel
Jupyter Notebooks
Kaggle
Tableau
Explanation: Jupyter Notebook is a web-based interactive computing platform. It is a valuable tool for
data analysts that provides a way to run code interactively within a web browser. It also includes
markdown text to explain what the code is doing.
40. What are the three functions of programming languages optimized for data analysis? (Choose
three)
statistical analysis
data visualization
data cleaning
Explanation: Network communication, hardware virtualization, and fast response times are not
functions that programming languages optimized for data analysis are expected to perform.
41. Refer to the exhibit. What is the resulting output from the formula in cell D8?
46
12
7.6
6
Explanation: The COUNT function counts the number of cells containing numeric data in the formula
range. In this example, the number of cells in the range D2 through D7 that contain numeric data is 6.
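The counting rule can be imitated in Python. The excel_count helper and the sample cells below are hypothetical and do not reproduce the exhibit; they only show that text and empty cells are skipped.

```python
def excel_count(cells):
    # Count only numeric cells; text, empty cells, and booleans are skipped.
    return sum(
        1
        for cell in cells
        if isinstance(cell, (int, float)) and not isinstance(cell, bool)
    )

# Hypothetical range: three numbers, one text value, one empty cell, one blank string.
cells = [12, 7.5, "n/a", None, 3, ""]

print(excel_count(cells))
```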
42. Which Microsoft Excel formula would correctly multiply a value in cell A1 with a value in cell A19?
=A1*A19
=A1xA19
=MULTIPLY(A1:A19)
=xA1:A19
Explanation: The formula =A1*A19 takes the value in cell A1 and multiplies it with the value in cell A19.
The symbol for multiplication in Microsoft Excel is the asterisk (*).
43. What are two plain-text file types that are compatible with numerous applications and use a
standard method of representing data records? (Choose two.)
JSON
DOC
XLS
XML
Explanation: As data is collected from varying sources and in varying formats, it is beneficial to utilize
specific file types that allow easy conversion and universal application support. CSV, JSON, and XML are
plain text file types that allow for collecting and analyzing of data in a format that is easily compatible
and applicable for analysis.
it is a value or data point that varies significantly from others in the data set
Explanation: An outlier is defined as a value or data point varying significantly from the others, either
much smaller or much greater. Outliers can produce anomalies in the results obtained and distort the
outcome of your analysis. Outliers have to be cleaned up before the data set can be used for effective
analysis.
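One common way to flag such values is the interquartile range (IQR) rule, sketched below in Python. The rule and the 1.5 multiplier are an assumption chosen for illustration, not a method named in the question, and find_outliers and the sample values are hypothetical.

```python
import statistics

def find_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical measurements with one value far above the rest.
readings = [10, 12, 11, 13, 12, 95]

print(find_outliers(readings))
```

Flagged values can then be investigated separately before deciding whether to remove or correct them.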
Explanation: The FROM statement in an SQL query specifies the table where data is stored.
46. You are preparing a presentation that needs a visualization showing the relative levels of coffee
production in the coffee regions of the world. Which type of visualization in Tableau is well suited for
displaying this type of data?
bar chart
bubble map
heat map
area graph
Explanation: The bubble map is useful for comparing proportions over geographic areas. Circles or dots,
each proportional in size to its value in the dataset, are displayed over a designated geographic
region.
Data is recoverable.
Explanation: Data integrity ensures that data is unaltered in transit. With data integrity there is
confidence that the data is accurate, consistent, and trustworthy. Cryptographic hashing functions are
used to ensure data integrity.
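A hash-based integrity check can be sketched with Python's hashlib module. SHA-256 is used here as one example of a cryptographic hash function, and the messages are hypothetical.

```python
import hashlib

def digest(data: bytes) -> str:
    # Return the SHA-256 digest of the data as a hexadecimal string.
    return hashlib.sha256(data).hexdigest()

# The sender computes a digest and transmits it alongside the data.
original = b"order_id=42,total=19.99"
sent_digest = digest(original)

# The receiver recomputes the digest and compares it with the one sent.
received_ok = b"order_id=42,total=19.99"
received_bad = b"order_id=42,total=99.99"

print(digest(received_ok) == sent_digest)   # unaltered data matches
print(digest(received_bad) == sent_digest)  # altered data does not match
```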
48. Which feature of Python reduces the coding requirements when data tasks such as exploratory
data analysis and machine learning are required?
Explanation: The availability of code libraries such as NumPy and scikit-learn reduces the coding effort
when using Python programs to perform data tasks such as exploratory data analysis and machine
learning. All the other options are general features of Python that are not necessarily related to data
analysis.
49. What are two types of continuous variables? (Choose two.)
discrete
nominal
ordinal
ratio
interval
Explanation: Variables are either categorical or numerical. Categorical variables are qualitative and are
either nominal or ordinal. Numerical variables are quantitative and are either continuous or discrete.
Continuous variables are measured along a range of values and are either interval or ratio variables.
Explanation: Pivot tables provide a way to automatically summarize, analyze, explore, and present data.
Using the built-in tools you can identify trends, make comparisons between data items and create charts
in different styles to visualize your data. Pivoting data can help you answer different questions and even
experiment with your data to discover new trends and patterns.
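The kind of summary a pivot table produces can be approximated in plain Python. This sketch uses hypothetical sales records and totals the units sold for each region and product.

```python
from collections import defaultdict

# Hypothetical records: (region, product, units sold).
sales = [
    ("North", "bike", 2),
    ("North", "helmet", 5),
    ("South", "bike", 1),
    ("North", "bike", 3),
]

# Rows are regions, columns are products, values are summed units.
pivot = defaultdict(lambda: defaultdict(int))
for region, product, units in sales:
    pivot[region][product] += units

for region in sorted(pivot):
    print(region, dict(pivot[region]))
```

Swapping the roles of region and product in the loop pivots the same data the other way, which is the sense in which pivoting lets you ask different questions of one dataset.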
51. Which action is taken during the data investigation step of the data analysis lifecycle?
Transform the data into a format appropriate for the analysis methods and tools.