
Journal of Management Control (2020) 31:55–95

https://doi.org/10.1007/s00187-020-00294-0

ORIGINAL PAPER

Does design matter when visualizing Big Data?


An empirical study to investigate the effect of visualization
type and interaction use

Lisa Perkhofer1 · Conny Walchshofer1 · Peter Hofer1

Published online: 17 February 2020


© The Author(s) 2020

Abstract
The need for good visualization is increasing, as data volume and complexity expand.
In order to work with high volumes of structured and unstructured data, visualizations
that support the ability of humans to make perceptual inferences are of the utmost
importance. In this regard, a lot of interactive visualization techniques have been
developed in recent years. However, little emphasis has been placed on the evaluation
of their usability and, in particular, on design characteristics. This paper contributes
to closing this research gap by measuring the effects of appropriate visualization use
based on data and task characteristics. Further, we specifically test the feature of
interaction, as it has been said to be an essential component of Big Data visualizations
but has scarcely been isolated as an independent variable in experimental research. Data collection
for the large-scale quantitative experiment was done using crowdsourcing (Amazon
Mechanical Turk). The results indicate that both choosing an appropriate visualization
based on task characteristics and using the feature of interaction increase usability
considerably.

Keywords Interactive visualization · Usability · Multidimensional data visualization · Visual analytics · Big Data

Electronic supplementary material The online version of this article (https://doi.org/10.1007/s00187-020-00294-0) contains supplementary material, which is available to authorized users.

Lisa Perkhofer (corresponding author), [email protected]
Conny Walchshofer, [email protected]
Peter Hofer, [email protected]

1 Controlling, Accounting and Finance, University of Applied Sciences Upper Austria, Steyr, Austria


1 Introduction

One of the main purposes of management accounting is to provide decision-makers
with relevant information for an easy, accurate, fast and rational decision-making
process (Appelbaum et al. 2017; Dilla et al. 2010; Ohlert and Weißenberger 2015;
Perkhofer et al. 2019b). Being able to fulfill this fundamental task is becoming more
and more difficult as market dynamics increase (Eisl et al. 2012). The consequence is
most likely a disruption of current working practices. Management accountants need to
expand their scope from historical data reporting to real-time data processing, from
using only in-house data to the inclusion of external data sources, from traditional
paper based to interactive and online based reporting, and to shift the focus from
reporting the past to predicting the future (Appelbaum et al. 2017; Goes 2014). To
achieve this shift, new tools and technical instruments such as algorithms, appropriate
management accounting systems and reporting software (Pasch 2019) and especially
interactive data visualization are necessary (Perkhofer et al. 2019b; Janvrin et al. 2014;
Bačić and Fadlalla 2016; Ohlert and Weißenberger 2015).
Visualizing Big Data proves to be of great importance when problems (or tasks)
are high in complexity (Hirsch et al. 2015; Perkhofer 2019), or not sufficiently well-
defined for computers to handle algorithmically, meaning human involvement and
transparency is required (e.g. in fraud detection) (Munzner 2014; Kehrer and Hauser
2013; Dilla and Raschke 2015; Keim et al. 2008). Visualizing data means organiz-
ing information by spatial location and supporting perceptual inferences (Perkhofer
et al. 2019a). Perceptual inferences are comparatively easy for humans to draw, as
the human visual sense is superior (even compared with fast programmable algorithms) and data
transformation to the essential stores in human memory is astoundingly fast (Ware
2012; Keim 2001; Sweller 2010). Visualization thereby enhances the ability of both
searching and recognizing, and thus significantly enhances sense-making capabilities
(Munzner 2014; Bačić and Fadlalla 2016).
However, in order to optimally support the human perceptual system, an appropriate
and easy-to-use visualization needs to be presented to the decision-maker (Munzner
2014; Pike et al. 2009; Vessey and Galletta 1991; Perkhofer 2019; Falschlunger et al.
2016a). Especially when data and tasks increase in complexity, as is the case with
high-dimensional datasets, traditional business charts are no longer able to convey
all information in one chart. Therefore, newer forms of visualizations, also called Big
Data visualizations, have to be taken into account (Grammel et al. 2010). Big Data
visualizations are unique in their purpose and designed to deal with, and present, larger
amounts and various forms of data types (Perkhofer et al. 2019b). Novel forms with
the goal of presenting multidimensional data, often used in visual analytics (Liu et al.
2017; Bačić and Fadlalla 2016), range from parallel coordinates and polar coordinates
plots, through sunburst, Sankey, and heatmap visualizations, to scatterplot matrices and
parallel coordinated views (Liu et al. 2017; Albo et al. 2016; Bertini et al. 2011;
Claessen and van Wijk 2011; Lehmann et al. 2010).
By the use of Big Data visualizations, the management accountant is able to show
the whole dataset within one comprehensive visualization and is therefore able to
generate new insights that would otherwise remain hidden. For gaining insight, however,
the visual display alone is not enough. The user needs to be able to interact with


the interface (Elmqvist et al. 2011). Interacting in this context means using filter or
selection techniques, drilling down to analyze the next layer of a data dimension, or
also interchanging data dimensions or value attributes (Perkhofer et al. 2019b; Heer
and Shneiderman 2012). Only if the user is able to interactively work with the dataset
and answer predefined questions or questions that arise during the process of analysis
can Big Data visualizations unfold their full potential, and new correlations, trends, or
clusters be detected for further use (Perkhofer et al. 2019a, c).
Unlike conventional charts used in everyday life (e.g. line, pie, or bar charts), new
visualizations require a close focus on design and interaction in order to be considered
useful (Liu et al. 2017; Kehrer and Hauser 2013; Elmqvist et al. 2011; Pike et al.
2009; Bačić and Fadlalla 2016). Unfortunately, for both the design and use of new
visualization options, and the design and use of interaction, limited empirical research
is available (Isenberg et al. 2013; Perkhofer et al. 2019b). Users still have to go through
cost-intensive and unsatisfying trial and error routines in order to identify best practice
instead of being able to rely on empirical evidence (van Wijk 2013). This led us to
identify two concrete and pressing questions in current literature, addressed in this
study:

(1) Appropriate use of new visualization types: Depending on data and task
characteristics, some visualization types are claimed to outperform others when
it comes to optimal decision-support. However, these claims are mostly based
on their developers' opinions or on small-scale user studies rather than on experimental
research (Isenberg et al. 2013; Perkhofer 2019). As multiple options to
visualize Big Data are available, we limit the scope of this study to visualizations
for multidimensional data. This is due to the fact that it is impossible
for traditional forms to show more than three attributes or three dimensions at
the same time within one visualization. This, we think, highlights the importance
and need of Big Data visualizations and demonstrates their benefits. Further, as
a starting point to investigate Big Data visualizations, we chose four frequently
cited and actively used visualization types (for details see Table 1), namely the
sunburst visualization, the Sankey visualization, the parallel coordinates plot and
the polar coordinates plot (Bertini et al. 2011; Keim 2002; Shneiderman 1996).
We wanted to investigate whether one particular visualization type can outperform the
others based on the three tasks identify, compare, and summarize (classification
based on Brehmer and Munzner 2013), using two different perspectives on the
dataset (multiple hierarchy-levels vs. multiple attribute comparisons).
(2) Appropriate use of interaction techniques: Pike et al. claim that interaction “has
not been isolated as an experimental variable” yet, therefore hindering direct
causal interpretation on this highly discussed and frequently used visual analytics
feature (Pike et al. 2009, p. 272). This is because most user studies concentrate
on the visualization itself, while interaction is added as an integrated feature
incorporated into the source code of the visual representation. Visualizations
can be used and tested without interaction (as a static form), however, interaction
does not work without the visualization itself (Kehrer and Hauser 2013). “Exactly
what kind and degree of benefit is realized by allowing a user to interact with visual
representation is still undetermined.” (Pike et al. 2009, p. 272). Consequently, to


answer this claim, we isolate the effect of interaction and evaluate the difference
between an almost static and a highly interactive visualization.

Performance is measured by the three components of usability defined by ISO 9241
(efficiency, effectivity and satisfaction) as well as by one comprehensive sum score
for usability described and created by the authors. For data collection, we used the
crowdsourcing platform Amazon Mechanical Turk, resulting in a large sample size
of N = 2272. Results obtained by MTurk have been shown to be congruent with
lab experiments in the context of visual analytics (Harrison et al. 2014), allowing us
to believe it is an appropriate and reliable platform to test our selected visualization
options. Statistical analysis was based on MANCOVA (for simultaneously assessing
efficiency, effectivity and satisfaction) and, respectively, ANCOVA (for assessing the
sum score for usability).
Results indicate that the used visualization type and the degree of interaction have
an influence on efficiency and satisfaction, while the task type primarily influences
effectivity. More precisely, from a user's perspective, information retrieval and therefore
a fast and accurate decision is encouraged when users are confronted with Cartesian-based
rather than Polar-based visualization types (the Sankey visualization or the
parallel coordinates plot rather than the polar coordinates plot or the sunburst
visualization) and if visualizations are made accessible in a highly interactive form. For
users to make effective decisions, the underlying task needs to be supported by the
visualization type. For example, the task type summarize is executed more effectively
if the data presented in the visualization is already condensed by dimensions (e.g. the
Sankey or the sunburst visualization) while the task type identify is easier to execute
if each single data entry is presented as a single property within the visualization (e.g.
the parallel coordinates plot). These results can be seen as general guidelines for Big
Data visualization use in the context of managerial accounting; in addition, specific
information on the individual visualization types used is presented in this experimental study.
The remainder of this paper is structured as follows: First, the general purpose of
visualizations, specific visualization types and interaction techniques are discussed
and the hypotheses for our experimental design are presented. In Sect. 3, the study
design is laid out in detail before analysis is presented in Sect. 4. The last sections
discuss and conclude our research findings, state limitations, and propose opportunities
for further research in the context of interactive visualizations for multidimensional
datasets.

2 Theoretical background and hypotheses

The fundamental goal of visualizations is to generate a particular insight or to execute
a specific task by emphasizing distinct features of the underlying dataset (Lurie and
Mason 2007; Anderson et al. 2011). Insights can either be the discovery of trends,
correlations, associations, clusters, and events (that allow the generation or the verifi-
cation of hypotheses), or the presentation of information to a particular audience by
telling a persuasive and data-supported story for decision-making purposes (Brehmer
and Munzner 2013). While telling a story mostly follows a standardized procedure


such as reporting, the generation or verification of a hypothesis, in contrast, is typically
ad hoc and unstructured (Perkhofer et al. 2019b). Especially in situations that
ask for an ad hoc evaluation of a highly complex and large dataset, the use of Big
Data visualization is of great importance (Chen and Zhang 2014; Falschlunger et al.
2016a). Consequently, users confronted with such problems have already established
the use of Big Data visualizations. Pioneering examples can be found in fraud detection
(Singh and Best 2019; Keim 2001), in the analysis of network traffic (Keim et al.
2006), and in the evaluation of business models to reduce costs while maintaining
quality (Appelbaum et al. 2017). Further, Big Data visualizations are also customary in companies with a
high focus on personalized marketing and social media to evaluate the implications
of certain initiatives on product satisfaction and innovation (Keim 2001; Appelbaum
et al. 2017).
Novel and interactive visualization types, such as those mentioned in the introduc-
tion (if used for their intended purpose and designed optimally), allow information to
be uncovered which would otherwise stay hidden (Grammel et al. 2010). Currently,
these new insights can be seen as a way to better attract customers or to optimize
maintenance (Appelbaum et al. 2017); however, in the near future, generating insight from
Big Data will be a necessity to stay competitive (Perkhofer et al. 2019b). Nonetheless,
in order to generate insight, the users and their abilities as well as needs have
to be considered in the process of selection and design (Perkhofer 2019; Endert et al.
2014). While targeting specific users or user groups has already been identified as an
essential part in standardized or scientific visualization use (Yigitbasioglu and Velcu
2012; Speier 2006), unfortunately, researchers and developers working on Big Data
visualizations still put their sole focus on the generation of new visualization options
to present a holistic view on the dataset (Perkhofer 2019). In doing so, they often fail to
consider the users' precise needs and risk that their visualizations misinform users or
are not used at all (Isenberg et al. 2013; van Wijk 2005; Perkhofer et al. 2019b).
In order to create or select appropriate visualizations, three stages are crucial accord-
ing to Brehmer and Munzner: (1) encode (select and design appropriate visual forms),
(2) manipulate (enable the user to interact with the data), and (3) introduce (enable
the user to add additional data and save results) (Brehmer and Munzner 2013). In
this paper, we concentrate on the selection and the design of appropriate, and most
importantly interactive visualizations and therefore put emphasis only on the first two
stages.

2.1 Encode: choosing the visual representation and design

Visual representation is synonymous with visual encoding and means transforming
data into pictures. Analyzing data through visual inference is easier and cognitively
less demanding than looking at raw data because it allows for the identification
of proximity, similarity, continuity, and closure (Zhou and Feiner 1998; Ohlert and
Weißenberger 2015). Before we evaluate Big Data visualizations and their influencing
factors based on usability (ISO 9241), a classification of the multiple options proposed


and presented to potential users (in literature and free libraries such as D3.js¹) is
necessary. We limit our investigation to frequently cited and open-source visualization
options (Perkhofer et al. 2019a, b) as the evaluation of all Big Data visualization
options goes beyond the scope of this paper.

2.1.1 Classification and description of frequently used Big Data visualizations

For classification, we distinguish between two features: the type of data that
can be represented in the proposed visualization (1a. multiple dimensions but only
one attribute² → hierarchical visualization vs. 1b. multiple attributes but only one
dimension → multi-attribute visualization) and the basic layout (2a. Polar- or 2b.
Cartesian-coordinates based visualizations). A summary of the identified visualization
types is presented in Table 1 (please note that the table does not claim to be exhaustive,
but should rather be seen as an indicator of frequently used or proposed visualization
methods for multidimensional datasets, which is the selection criterion for our empirical
analysis).
Based on this summary of highly cited and used visualization types, we can conclude
that both a mix of Polar- and Cartesian-coordinates based visualizations and a
mix of hierarchical and multi-attribute based ones are common. From this pool of
options, we picked the most frequently cited pair of each category for comparison.
For a better understanding, each individual visualization type is explained in more
detail in the following:
The sunburst visualization (Polar-coordinates based layout and hierarchical data
structure): The sunburst visualization is one of the more frequently used visualization
types compared to other and newer forms of visualizations (Perkhofer et al. 2019b).
It projects the multiple dimensions of the dataset in a hierarchically dependent manner
onto rings and can therefore be classified as a Polar-coordinates based visualization
option. The sunburst is a compact and space-filling presentation and shows the respective
proportion of the total value for each dimension and its sub-components by its
angular size (Rodden 2014). Due to the strict structure of a sunburst, the innermost
ring represents the highest hierarchical level, and all dimensions dependent on it are
represented in further rings to the outside (Keim et al. 2006). The position of the
rings influences interpretation; re-positioning these dimensions (using
another sequence of dimensions for the display of the rings) can therefore yield
new and valuable insights. Additionally, based on the Vega-Lite specification, categorical
color scales are used to encode discrete data values, each representing a distinct
category, and sequential single-hue schemes are used to visually link related data characteristics
(Satyanarayan et al. 2017).
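For intuition, the following minimal sketch renders such a sunburst in Python with plotly. The study's stimuli were implemented in D3.js; the library choice, column names, and values here are illustrative assumptions, not the authors' implementation:

```python
import pandas as pd
import plotly.express as px

# Toy stand-in for the study's wine-trade data (all names/values invented).
orders = pd.DataFrame({
    "state":   ["Austria", "Austria", "France", "France", "Chile"],
    "trader":  ["Wine&Co9", "Wine&Co11", "Wine&Co9", "Schenkenfelder", "Wine&Co11"],
    "wine":    ["White", "Red", "Red", "White", "Red"],
    "bottles": [1200, 800, 1500, 700, 900],
})

# `path` fixes the ring order from the innermost ring outwards; re-ordering it
# corresponds to the re-positioning of dimensions described above.
fig = px.sunburst(orders, path=["state", "trader", "wine"], values="bottles",
                  color="state")  # categorical colors per top-level category
fig.show()
```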
The Sankey visualization (Cartesian-coordinates based layout and hierarchical
data structure): The Sankey chart focuses on sequences of data, which can either be
time-related or dependent on a hierarchical structure (Hofer et al. 2018).
¹ D3.js (Data-Driven Documents) is a popular JavaScript library that visualization researchers use to share
and exchange code for new visualization types; many visualization types available in software are first
launched as D3.js implementations.
² An attribute is something measurable, like revenues, costs, or contribution margins, while a dimension is
a way of clustering the data for analysis, such as customer groups, product groups, or sales regions.


Table 1 Highly cited and used visualization types identified for multidimensional data

Parallel coordinates plot. Data type: multi-attribute; layout: Cartesian-coordinates based visualization. References: Abi Akle et al. (2019), Hofer et al. (2018), Netzel et al. (2017), Johansson and Forsell (2016), Kanjanabose et al. (2015), Wang et al. (2015), Harrison et al. (2014), Heer and Shneiderman (2012), Heinrich et al. (2012), Claessen and van Wijk (2011), Lehmann et al. (2010), Henley et al. (2007), Keim (2002), Shneiderman (1996), Inselberg and Dimsdale (1990)

Sunburst visualization. Data type: hierarchical; layout: Polar-coordinates based visualization. References: Hofer et al. (2018), Harrison et al. (2014), Rodden (2014), Kim and Draper (2014), Mansmann et al. (2012), Diehl et al. (2010), Stab et al. (2010), Draper et al. (2009), Keim et al. (2006), Stasko and Zhang (2000)

Polar coordinates plot (also known as star plot or radar chart). Data type: multi-attribute; layout: Polar-coordinates based visualization. References: Perkhofer et al. (2019a), Liu et al. (2017), Albo et al. (2016), Harrison et al. (2014), Claessen and van Wijk (2011), Draper et al. (2009), Elmqvist et al. (2007)

Treemap visualization. Data type: hierarchical; layout: Cartesian-coordinates based visualization. References: Perkhofer et al. (2019a), Wang et al. (2015), Kim and Draper (2014), Bostock et al. (2011), Keim et al. (2006), Songer et al. (2004), Bruls et al. (2000), Johnson et al. (1991)

Sankey visualization. Data type: hierarchical; layout: Cartesian-coordinates based visualization. References: Hofer et al. (2018), Chou et al. (2016), Rodden (2014), Riehmann et al. (2005)

Heatmap visualization. Data type: multi-attribute; layout: Cartesian-coordinates based visualization. References: Perkhofer et al. (2019a), Barter and Yu (2018), Perrot et al. (2017), Severino (2015)

It is often used to present election analyses, as it allows highlighting populations
remaining loyal to the same party as well as populations changing their vote from one
election to the next. Thus, the Sankey visualization is designed to present information
on flows between multiple dimensions (e.g. processes, entities, …) (Chou et al. 2016). With
regard to storytelling and sensemaking, interactions like re-ordering (changing the
sequence of dimensions) and reducing the amount of visible nodes to minimize visual
clutter are indispensable (Chou et al. 2016). In addition, for a consistent analysis of
the data, it is necessary to find a way to highlight information across nodes by making
use of selectors (Hofer et al. 2018).
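As an illustration of such a flow display, here is a minimal plotly sketch; again only a hypothetical stand-in for the D3.js-based stimuli, with node labels and flow values invented:

```python
import plotly.graph_objects as go

# Two neighboring dimensions (origin -> wine type); `source`/`target` index
# into `labels`, and `value` controls the width of each flow.
labels = ["Europe", "Americas", "Red wine", "White wine"]
fig = go.Figure(go.Sankey(
    node=dict(label=labels, pad=15, thickness=20),
    link=dict(source=[0, 0, 1, 1],
              target=[2, 3, 2, 3],
              value=[5200, 3100, 900, 1700]),
))
fig.show()
```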
The parallel coordinates plot (Cartesian-coordinates based layout and multi-
attribute data structure): The parallel coordinates plot is a very popular and strongly
recommended visualization in the InfoVis (Information Visualization) community and
is highly cited in scientific research. This is due to the fact that the parallel coordinates
plot is one of the few visualizations able to present multiple attributes in one
chart (Hofer et al. 2018; Perkhofer et al. 2019a). Two or more horizontal dimension
axes are connected via polygonal lines at the height of the respective dimension value
(Keim 2002; Perkhofer et al. 2019c). To do so, data is geometrically transformed
(Keim 2001) and each line represents one data entry (e.g. an order, a sales entry). With
respect to interpretation, Inselberg introduced common rules for the identification of
correlations and trends (Inselberg and Dimsdale 1990):
• lines that are parallel to each other suggest a positive correlation,
• lines crossing in an X-shape suggest a negative correlation, and
• lines crossing randomly show no particular relationship.
Similar to the Sankey visualization, a user has to be able to re-arrange the axes on
demand, as only neighboring axes can be interpreted in a meaningful way (Perkhofer
et al. 2019a). By making use of both categorical/sequential single-hue color scales and
filtering options, cluster analysis can be performed (Perkhofer et al. 2019c).
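A minimal sketch of the idea in plotly, where each row becomes one polygonal line and users can drag axes into a new order out of the box (column names are assumptions):

```python
import pandas as pd
import plotly.express as px

# Each row is one data entry (e.g. one order) and becomes one line.
orders = pd.DataFrame({
    "gross_sales":    [410.0, 120.5, 980.0, 260.3],
    "discounts":      [0.05, 0.10, 0.02, 0.12],
    "net_sales":      [389.5, 108.5, 960.4, 229.1],
    "shipping_costs": [12.0, 7.5, 30.0, 9.9],
})

# The order of `dimensions` matters: only neighboring axes can be read
# jointly, so interactive re-arranging of axes is essential.
fig = px.parallel_coordinates(
    orders,
    dimensions=["gross_sales", "discounts", "net_sales", "shipping_costs"],
    color="net_sales",  # sequential single-hue scale supports cluster reading
)
fig.show()
```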
The polar coordinates plot (Polar-coordinates based layout and multi-attribute data
structure): The polar coordinates plot is a radial projection of a parallel coordinates plot
(Diehl et al. 2010). Attributes are arranged radially, and each attribute value is positioned
proportionally to its magnitude relative to the attribute's minimum and maximum
values. Each line connecting the attribute values represents one
data entry. Characteristic for a polar coordinates plot is the detection of dissimilarities
and outliers. Nonetheless, it is difficult to compare lengths across the uncommon
axes (Diehl et al. 2010). Users decoding a polar coordinates plot try to interpret the
area that appears as soon as all attributes are connected. Unfortunately, areas that
arise arbitrarily, depending on the loosely selected order of attributes, misinform
the user. Further, areas are more difficult to compare than straight lines connecting
data points (Kim and Draper 2014), and data points in outer rings cause areas to
appear disproportionately larger; due to this distortion, angles are harder to assess
than straight lines (Perkhofer et al. 2019a). These effects have not
been tested yet (Albo et al. 2016).
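A minimal sketch of this radial projection in plotly (long-format toy data; the normalization and all names are assumptions):

```python
import pandas as pd
import plotly.express as px

# Long format: one row per (order, attribute) pair; each order becomes one
# closed line around the radial axes.
df = pd.DataFrame({
    "order":     ["A"] * 3 + ["B"] * 3,
    "attribute": ["gross_sales", "discounts", "shipping_costs"] * 2,
    "value":     [0.9, 0.3, 0.5, 0.4, 0.8, 0.2],  # min-max normalized
})

fig = px.line_polar(df, r="value", theta="attribute", color="order",
                    line_close=True)  # closing the line creates the "area"
fig.show()
```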

2.1.2 Possible factors influencing usability of Big Data visualizations

After presenting the most frequently used visualization options for Big Data, we
now discuss possible factors influencing their ability to encode specific
information and to make it accessible to their users. As already explained, each
visualization type has the potential to uncover and present a different type of insight
to its audience (supporting a different task), while at the same time hiding another
(Perkhofer 2019). As theories and experimental research on the process of encoding
for interactive Big Data visualizations are limited, research from standard business
graphics and dashboarding is used for hypotheses development (Falschlunger
et al. 2016a; Speier 2006; Vessey and Galletta 1991; Perkhofer 2019; Yigitbasioglu
and Velcu 2012). The purpose of this approach is to test the applicability of existing
principles to interactive and new forms of visualizations and to shed light on the
process of encoding in order to foster decision-making.
Previous findings have shown that the following factors (explained in more detail
below) influence the ability of the user to successfully decode information given a
chosen visualization option (Perkhofer 2019; Falschlunger et al. 2016a, b; Ware 2012;
Vessey and Galletta 1991; Speier 2006):
(1) the design of the visualization,
(2) the dataset,
(3) the task, and
(4) the decision-maker characteristics (in particular previous experience and knowl-
edge on reading and interpreting visualizations).
With respect to the design of the visualization, it has been shown that a high
data-ink ratio (Tufte 1983) and the display of coherent information in juxtaposition
(Perkhofer 2019) allow for faster processing of information. Both of these principles
are satisfied by Big Data visualizations, as they are designed to visualize the
full dataset within one coherent visualization. However, a need for discussion can be
identified when choosing a basic layout: visualizations are either based on a polar-coordinate
or a Cartesian-coordinate system, and the basic layout fundamentally changes
the way information needs to be decoded by the user (Rodden 2014). While in polar-coordinates
based visualizations angles need to be assessed, in Cartesian-coordinates
based systems it is the height of a column or the length of a line that needs to be compared.
With respect to standardized business charts, Cartesian-coordinates based visualiza-
tions (scatterplots, line and bar charts) are known to outperform polar-coordinates
ones (pie charts) (Diehl et al. 2010). However, this result on the most appropriate lay-
out needs to be re-evaluated for Big Data visualizations as interactivity might change


results (Albo et al. 2016). Further, the share of polar-based visualizations within the
available and applied visualization tools is quite large and therefore deserves a closer
look. This leads to our first hypotheses:
H1a: The basic layout influences usability of a visualization.
H1b: Cartesian-coordinate based visualization types outperform polar-coordinate
based visualization types.
Next to the design, the underlying dataset influences usability. It is known that data
can only be assessed as long as enough cognitive capacity is available for data processing
(Sweller 2010; Atkinson and Shiffrin 1968; Miller 1956). Otherwise, or more
precisely in a state of information overload, a negative effect on effectivity, efficiency,
and satisfaction can be identified (Bawden and Robinson 2009; Falschlunger et al.
2016a). It has also been demonstrated that data which is presented in a familiar form
(e.g. known since childhood), or which can be related to already known information
stored in long-term memory, is processed faster and more accurately, as the burden
it poses on working memory is reduced (Perkhofer and Lehner 2019; Atkinson and
Shiffrin 1968).
As presented in Table 1, one needs to distinguish between hierarchical and multi-
attribute visualization types when dealing with multidimensional datasets. While for
hierarchical visualizations only one attribute (e.g. one KPI such as sales) needs to
be evaluated based on different levels and compositions of aggregation, for multi-
attribute visualizations multiple attributes need to be processed. For the latter, not
only do different measures need to be known and understood, but they also have to be
analyzed in reference to each other for new insights to appear. Consequently, multi-attribute
visualizations are said to increase the burden placed on the user (Falschlunger
et al. 2016a), leading to the following hypotheses for our investigation:
H2a: The underlying dataset influences the usability of a visualization.
H2b: Hierarchy based visualizations types outperform multi-attribute based visu-
alization types.
Without question and as already mentioned multiple times, tasks and insights differ
with different visualization types. Matching the visualization to its respective task has
been identified as a main influence in traditional visualization use. It has been shown
that a mismatch increases cognitive load and consequently impairs decision-making
outcome (Falschlunger et al. 2016a, b; Dilla et al. 2010; Shaft and Vessey 2006;
Speier 2006; Perkhofer 2019). Up to now, the question of tables versus charts has
been extensively tested, resulting in the classification that spatial tasks (looking for trends,
comparisons, etc.) are best supported by spatial visualizations (charts) and symbolic
tasks (looking for specific data points) are best supported by symbolic visualizations
(tables) (Vessey and Galletta 1991). With respect to Big Data visualizations, a new
classification of tasks has been established, namely identify (search for a specific data
point), compare (compare two different data points or also two different aggregation
levels), and summarize (generate overall insights by looking at the whole dataset)
(Brehmer and Munzner 2013). However, these tasks have not yet been associated with
visualization types or characteristics of visualization types.
Based on the fundamental activities that users have to perform and given the above
presented task type classification, we hypothesize that the task identify is easier to


perform if no previous aggregation based on different dimensions has influenced the
visual appearance of the dataset. On the other hand, summarize should be easier to
accomplish if the dataset has already (at least to some extent) been aggregated and
not every single data point is displayed in isolation. With respect to compare, results
will be better for single data comparison tasks if the display shows every data-point
in isolation, while results on the comparison of sub-dimensions (already aggregated
data) will be better in hierarchical visualizations.
H3a: The task type influences the usability of a visualization.
H3b: Users will perform better with a multi-attribute visualization than with a
hierarchy-based visualization when confronted with the task type identify.
H3c: Users will perform better with a hierarchy-based visualization than with a
multi-attribute visualization when confronted with the task type summarize.
Finally, yet importantly, the factor user characteristics needs to be considered
when choosing a specific visualization type. Results on standard business charts have
demonstrated that choosing the right chart type (bar, line, pie, or column) has led
to contradictory results for different user groups (Falschlunger et al. 2016a). Only
if the factor previous experience, not only with the dataset and the KPIs but also with
the respective visualization type used, is considered and included in the selection
process are satisfying results the consequence (Falschlunger et al. 2016b; Perkhofer
and Lehner 2019). This can again be explained by cognitive load theory: If
the user has never seen the layout before, a lot of cognitive resources are needed to read
the visualization rather than to interpret the information represented by it. The
more experience a user has with a specific visualization, the more reading strategies
exist and the more automated the process of extracting data from the visualization
becomes, leaving ample room for data interpretation (Perkhofer 2019).
H4: Previous experience/usage of the different visualization types positively affects
usability.
Based on these findings, the different visualization options for representing multidimensional
data presented and described above will most likely result in differences in
usability depending on the task (identify, compare, summarize), the dataset used (hierarchical
or multi-attribute data), the basic layout they represent (Cartesian-based or
Polar-based layout), and the level of previous experience. Nonetheless, we hope to find
general rules and guidelines, similar to those of standard and scientific visualization
use, to guide designers and users.

2.2 Manipulate: using interaction to manipulate existing elements

Visualizations designed to present large amounts of data greatly benefit from the
process of interaction. In particular, the following processes are better supported
by interactive visualizations when confronted with visual analytics tasks: detecting
the expected, discovering the unexpected (generate hypotheses), and drawing data-
supported conclusions (reject or verify hypothesis) (Kehrer and Hauser 2013). To be
more specific, working with interactive visualizations is driven by a particular ana-
lytics task. However, when working interactively, analysis does not end by finding a


proper answer to the initial task, but rather allows the generation and verification of
additional and different hypotheses, which are then called insight (Pike et al. 2009).
These are generated only because the user interactively works with the dataset, and
the process of doing so increases engagement, opportunity, and creativity (Brehmer
and Munzner 2013; Dilla et al. 2010).
As a consequence, visualizations presented in Table 1 are claimed to only become
useful as soon as the user is able to interact with the data. Interaction is of such high
importance, because the actions of a user demonstrate the “discourse the user has
with his or her information, prior knowledge, colleagues and environment” (Pike et al.
2009, p. 273). Further, the sequence of actions is not predefined but rather individual
and dependent on the user. It thereby particularly supports the user’s knowledge base
and perceptual abilities (Dilla et al. 2010; Brehmer and Munzner 2013; Elmqvist
et al. 2011; Dörk et al. 2008; Liu et al. 2017). Consequently, interaction requires
active physical and mental engagement and throughout this process, understanding is
increased and decision-making capabilities are enhanced (Pike et al. 2009; Pretorius
and van Wijk 2005; van Wijk 2005; Wilkinson 2005; Dix and Ellis 1998; Buja et al.
1996; Shneiderman 1996). In a static form, only a general overview is presented
to the user; without the opportunity for interaction, hypothesis verification
or further hypothesis generation is extremely limited (Hofer et al. 2018; Liu et al.
2017; Pike et al. 2009; Perkhofer et al. 2019b). This not only frustrates users, but also
contradicts the well-known mantra of visual information seeking: “overview first,
zoom and filter, then details on demand” (Shneiderman 1996). On a more practical
level, interaction allows the user to filter, select, navigate, arrange, or change either the
amount of data or the characteristics of the visual display (for details on the interaction
techniques see Table 20 in the “Appendix”).
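To make this concrete on the data side, the following pandas sketch shows what filtering, drilling down, and re-arranging amount to as data operations. This is a toy stand-in under assumed column names, not the study's actual implementation:

```python
import pandas as pd

orders = pd.DataFrame({
    "trader":    ["Wine&Co9", "Wine&Co9", "Wine&Co11", "Wine&Co11"],
    "state":     ["Austria", "France", "Austria", "Chile"],
    "wine":      ["Red", "White", "Red", "Red"],
    "net_sales": [380.0, 120.0, 510.0, 240.0],
})

# Filter: reduce the amount of data shown.
subset = orders[orders["trader"] == "Wine&Co9"]

# Drill down: step one level deeper in the dimension hierarchy.
by_state = subset.groupby("state")["net_sales"].sum()

# Arrange: change the dimension sequence, which yields a different view
# of the same data (cf. re-ordering rings or axes).
by_wine_then_state = orders.groupby(["wine", "state"])["net_sales"].sum()
```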
Results on the proper use of interaction techniques are limited. Unfortunately, studies
conducted up to now tend to blur the concept of visualization in
combination with interaction (Pike et al. 2009). However, existing recommendations
predominantly support the use of multiple interaction methods (Rodden 2014). “The
more ways a user can ‘hold’ their data (by changing their form or exploring them from
different angles and via different transformations), the more insight will accumulate”
(Pike et al. 2009, p. 264). By intentionally clicking, scrolling and filtering the data, the
user gains a deeper understanding of the relations within the given dataset. Interaction
is therefore an essential part of the sense-making process and enhances the user’s
processing and sense-making capabilities (Shneiderman 1996). Building on previous
literature, the following hypotheses are presented:

H5a: Interaction influences the usability of a visualization.


H5b: Users will perform better with a highly interactive visualization than with a
mostly static one.

3 Study design

The purpose of this paper was directed toward the identification of an interactive
visualization to present multidimensional data effectively and efficiently. Therefore,
a within- and between-subjects experimental design (4 × 3 × 2) was used (see Fig. 1).


Fig. 1 Research model. Between-subjects factor: visualization type (sunburst visualization, Sankey visualization, polar coordinates plot, parallel coordinates plot), with usage/experience included as covariate. Within-subjects factors: task type (identify, compare, summarize) and interaction (yes/no). Dependent variable: usability, composed of effectivity (high priority), efficiency (medium priority), and satisfaction (low priority). The sunburst and Sankey visualizations display x dimensions and one attribute (hierarchical); the polar and parallel coordinates plots display x attributes and one dimension (multi-attribute); the sunburst and polar coordinates plots use a Polar-coordinates based layout, while the Sankey visualization and the parallel coordinates plot use a Cartesian-coordinates based layout.

The visualization type was manipulated at four levels: for the experiment, we chose the most frequently
researched and available visualization types, the sunburst visualization, the Sankey
visualization, the parallel coordinates plot and the polar coordinates plot (please see
Table 1). Two of the visualization types under investigation are in a Polar-based lay-
out and two in a Cartesian-based one. Further, two out of four show a hierarchical
dataset while the other two present a multi-attribute one. The task type was manip-
ulated at three levels: The statements were based on Brehmer and Munzner’s task
taxonomy—identify, compare and summarize (Brehmer and Munzner 2013). And
finally, interaction was manipulated at two levels (limited interaction, high interac-
tion).
The experimental study was conducted using LimeSurvey and the crowdsourcing
platform Amazon Mechanical Turk (MTurk). For each visualization type, a separate
but identical experiment was created. Each participant evaluated only one visualization
type, but had to assess various statements to simulate the process of hypothesis
verification (the task types). Visualizations were coded based on the D3.js library, extended,
and adjusted to fit our purpose (most significant changes needed to be implemented
with respect to interaction techniques; available visualizations had limited options).
Visualizations are available for download on the authors' homepage or can be
accessed via the link presented in Table 22. Specifications on the dataset,
the tasks, and the visualizations used are presented in the following subsections, and
the research model is presented in Fig. 1.


3.1 Data sample

We used a self-generated data sample for our study as a basis to compare the different
visualization types. The dataset simulated a wine trade company and consisted of 9961
records, whereby each record represented a customer’s order. During construction of
the sample, six finance experts designed key metrics typically used in trade companies
to simulate a close-to-reality example for data exploration. The dataset consisted of
14 dimensions (order number, trader, grape variety, winemaker, state, etc.) and 12
attributes (gross margin, net margin, gross sales, net sales, discounts, gross profit,
shipping costs, etc.) in total. As a result, our dataset can be described as being structured
and shows no inconsistencies or missing values. Users were confronted with a large
amount of data shown within one visualization, including multiple possible dimensions
and attributes in order to find patterns, trends, or outliers. This allowed the assumption
that confusion and misunderstanding based on the dataset were kept to a minimum (also
confirmed in pre-tests). Each visualization used, without any filters active, showed the
complete underlying dataset of 9961 records.
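For intuition, here is a hypothetical sketch of how such a sample could be constructed. The actual dataset was designed by finance experts; all distributions, dimension names, and attribute names below are invented:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2018)
n = 9961  # one record per customer order, as in the study

orders = pd.DataFrame({
    # illustrative subset of the 14 dimensions
    "order_number":  np.arange(1, n + 1),
    "trader":        rng.choice([f"Wine&Co{i}" for i in range(1, 13)], n),
    "grape_variety": rng.choice(["Riesling", "Merlot", "Pinot Noir", "Syrah"], n),
    "state":         rng.choice(["Austria", "France", "Chile", "USA"], n),
    # illustrative subset of the 12 attributes
    "gross_sales":   rng.gamma(shape=2.0, scale=180.0, size=n).round(2),
    "discount_rate": rng.uniform(0.0, 0.15, n).round(3),
})
orders["net_sales"] = (orders["gross_sales"] * (1 - orders["discount_rate"])).round(2)
orders["shipping_costs"] = (0.04 * orders["gross_sales"] + rng.normal(5, 1, n)).round(2)
```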

3.2 Manipulation of the independent variables

As already explained in Sect. 2, we tested four distinct visualization types. These four
visualization types can be characterized by two features: the structure of the data
they are capable of displaying (hierarchical data vs. multi-attribute data) and the
overall layout of the visualization (horizontal/Cartesian vs. radial/Polar). Additionally,
interaction is the central component for understanding and working with Big Data
visualization tools. Therefore, based on a closer look at already existing prototypes
and literature, two interaction concepts per visualization type were designed to establish
comparability and fairness, but also to present each type in the best possible and most
natural way. The visualization types used, which are virtually available, and their respective
interaction concepts are presented in Tables 21 and 22 in the “Appendix”.
Based on the previously described dataset, statements in accordance with Brehmer
and Munzner’s task classification model for Big Data visualizations were created and
presented to the participants for evaluation in randomized order. In the experimental
conditions, participants were asked to assess the truth of the statements (examples are
presented in Table 2). Each task type was assessed twice per visualization-interaction
combination.

3.3 Dependent variable usability

For assessing the quality of a visualization, the effects on user performance (efficiency
and effectiveness) alone are not sufficient; instead, one needs to measure the
whole concept of usability (Pike et al. 2009; van Wijk 2013). Usability is defined by
ISO 9241-11 and represents a combination of effectiveness, efficiency, and satisfaction
(SAT) in order to present the user with the best possible solution. For effectiveness, we
rated the number of statements answered correctly, while for efficiency, we measured
the time for task execution (logged by LimeSurvey as soon as answers to a given task


Table 2 Task types used including their predefined answer options

Identify: Wine&Co9 sells fewer than 3.000 bottles of white wine
Options: ● This statement is correct. ● This statement is incorrect. ● This statement cannot be answered
Compare: Wine&Co11 sells more red wine than Schenkenfelder 2
Options: ● This statement is correct. ● This statement is incorrect. ● This statement cannot be answered
Summarize: Overall, more white wine is sourced from North and South America than from Europe
Options: ● This statement is correct. ● This statement is incorrect. ● This statement cannot be answered

Step 1, eliminate outliers and correct the direction of efficiency: the top and bottom 5% of task times are excluded (based on experience in lab experiments, where extreme values indicate confusion or missing motivation), and the direction of task time is aligned with the other two variables, effectivity and satisfaction, by multiplying by −1 (the higher the better).
Step 2, use z-transformation and adjust absolute levels: z-scores are calculated as (sample value − mean) / standard deviation; min and max are calculated to measure the absolute distance, and the absolute distance of the largest parameter (efficiency) is used for level adjustment (details see Table 6).
Step 3, calculate the sum score: the adjusted values for effectivity, efficiency, and satisfaction are added up.

Fig. 2 Calculation of one comprehensive score for usability

were submitted). With respect to satisfaction, we collected data not per single task, but
for each visualization and interaction level. Participants had to rate their satisfaction on
a 5-point Likert scale (Question: Please rate your overall level of satisfaction with the
visualization in the figure presented below. Please bear in mind the experimental tasks
when filling out the scale. Answer options: very satisfied—satisfied—neutral—unsat-
isfied—very unsatisfied).
As usability is measured inconsistently throughout the literature, we provide
insights into all three sub-components but also introduce one comprehensive mea-
sure for usability for better readability of results. In order to do so, we first calculate
z-scores for the components before adjusting the absolute distance between min and
max (the largest distance exists for efficiency; this distance is used to re-calculate
min and max for effectivity and satisfaction). After data transformation, a sum score
is calculated. Figure 2 documents the calculation and Fig. 3 shows details on the
distribution of the used variables.
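The following pandas sketch walks through the three steps (column names are assumptions); scaling each z-scored component by the ratio of min-max distances reproduces the before/after values shown in Fig. 3:

```python
import pandas as pd

def usability_score(df: pd.DataFrame) -> pd.DataFrame:
    """Three-step sum score from Fig. 2 (column names are assumptions)."""
    d = df.copy()
    # Step 1: drop the fastest and slowest 5% of task times as outliers and
    # flip the sign of time so that, like effectivity and satisfaction,
    # higher values mean better performance.
    lo, hi = d["time"].quantile([0.05, 0.95])
    d = d[d["time"].between(lo, hi)].copy()
    d["efficiency"] = -d["time"]
    # Step 2: z-transform each component, then stretch effectivity and
    # satisfaction so that all three span the same absolute distance as
    # efficiency (the component with the widest min-max range).
    for col in ["effectivity", "efficiency", "satisfaction"]:
        d[col] = (d[col] - d[col].mean()) / d[col].std()
    span = d["efficiency"].max() - d["efficiency"].min()
    for col in ["effectivity", "satisfaction"]:
        d[col] *= span / (d[col].max() - d[col].min())
    # Step 3: add the three adjusted components into one usability score.
    d["usability"] = d[["effectivity", "efficiency", "satisfaction"]].sum(axis=1)
    return d
```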

3.4 Control variable

Usage/Experience was assessed by analyzing each visualization type based on a
5-point Likert scale (1 = no use to 5 = daily use) (Question: How often do you use the


Z-scores before adjustment: effectivity min −1.61, max 0.62, distance 2.23; efficiency min −7.15, max 1.15, distance 8.30; satisfaction min −1.83, max 1.30, distance 3.13.
Z-scores after adjustment: effectivity min −5.99, max 2.31, distance 8.30; efficiency min −7.15, max 1.15, distance 8.30; satisfaction min −4.86, max 3.45, distance 8.30.

Fig. 3 Adjustment of z-scores to the absolute level of efficiency (for effectivity and satisfaction only the presented scores exist; for efficiency, the exact times observed during the study are used for analysis)

visualization in the figure presented below?). Insights on the obtained results are
presented in Table 3.

3.5 Procedure

Before starting the experiment, an introduction served to explain the given dataset and
the procedure of the study. Further, in order to work effectively and efficiently with the
visualizations and the various interaction techniques used, we included a short video
showing the visualization type in detail as well as all possible interaction techniques
(the video remained accessible throughout the experiment via a link posted in the help
section). For each visualization type and each interaction stage, not only the video but
also a verbal explanation was included throughout the experiment.
After reading the introduction, the attention of the participants was tested by asking
control questions to ensure quality. Only if 4 out of 6 questions were answered
correctly was the data used for analysis. The static and the interactive layout were grouped
and presented to the participants in randomized order (either they started with all ques-
tions based on the static or with all questions based on the interactive layout). Tasks
within each layout were again presented in randomized order. After completing the
experimental task, participants filled out a preference questionnaire concerning inter-
action and provided information on their experience with visual analytics and with
the visualization type under investigation. Additionally, demographic information was
collected.
The studies (one per visualization type) were launched on Amazon Mechanical
Turk in June 2018 (Sunburst and Sankey visualization) and in January 2019 (Parallel
Coordinates and Polar Coordinates Plot). Participants were compensated for their
participation ($10 per participant; the study lasted approximately 45 min). Participants
were only compensated for their invested time when the complete dataset was handed
in and no response pattern (e.g. always choosing the same answer option, choosing
“no answer” more than 30% of the time) could be identified. For data analysis, we


Table 3 Details on control variable usage (static % / interactive %)

Sunburst visualization: Not at all 46/60; Occasionally 40/31; Monthly 8/5; Weekly 0/4; Daily 6/0
Sankey visualization: Not at all 51/51; Occasionally 43/31; Monthly 4/10; Weekly 2/8; Daily 0/0
Parallel coordinates plot: Not at all 48/46; Occasionally 40/38; Monthly 10/4; Weekly 2/10; Daily 0/2
Polar coordinates plot: Not at all 63/53; Occasionally 29/27; Monthly 8/14; Weekly 0/4; Daily 0/2

excluded the top 5% and bottom 5% of task times per visualization to eliminate outliers.
This is motivated by the researchers’ previous experience in lab settings: We could
observe extremely low task times in cases when not enough effort was put into solving
the tasks (also leading to poor effectivity), and extremely high task times in cases
of distractions. These steps were necessary to ensure high-quality data, even without
observing participants during the execution of the experimental tasks.

3.6 Participants

Current evaluation practice for experimental research in the field of Information
Visualization is to recruit undergraduate or graduate students to participate in a lab
experiment. However, depending on their degree, students may have very little expe-
rience in visual analytics and thereby might cause misleading results (van Wijk 2013).
Hence, we decided to recruit participants via Amazon Mechanical Turk with the
requirement of at least a US Bachelor's degree as educational level; we introduced
the topic and asked for knowledgeable participants in the survey description and,
additionally, checked for working experience and experience in visual analytics.
Both the results on actual visualization use (see Table 3) and experience in visual
analytics (see Table 4) show that participants were knowledgeable and therefore the


right target group for our study. In total, we recruited 198 participants resulting in
2376 evaluable task assessments. Details on the participants per study can be found
in Table 4.

4 Results

In the following, the results for each visualization type are presented first.
This initial analysis shows whether interaction is a necessary component of a Big Data
visualization (Sect. 4.1). For evaluation, MANCOVA is used to analyze our dependent
variables individually (effectivity, efficiency, and satisfaction) and ANCOVA to analyze
the generated sum score of usability. To check the quality of our results, we also
conducted randomization tests to see whether a random allocation of results shows
a difference in outcome for any of the variables' specifications (number of resamples:
200). All randomization tests showed satisfying results, which are presented in the
“Appendix”. In the second part of this analysis, the effect of task type is analyzed
in more detail (Sect. 4.2).
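As a sketch, an analysis of this kind can be expressed in Python with statsmodels; the file name and column names are assumptions, and the paper does not specify the software actually used:

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols
from statsmodels.multivariate.manova import MANOVA

# One row per task assessment; assumed columns: RA (response accuracy),
# RT (response time), SAT (satisfaction), usability (sum score),
# vis_type, interactive (0/1), usage (covariate).
df = pd.read_csv("experiment_results.csv")  # hypothetical file

# MANCOVA: all three dependent variables at once; Pillai's trace and
# Wilks' lambda are part of the mv_test() output.
mancova = MANOVA.from_formula(
    "RA + RT + SAT ~ C(vis_type) * C(interactive) + usage", data=df)
print(mancova.mv_test())

# ANCOVA on the comprehensive usability sum score.
ancova = ols("usability ~ C(vis_type) * C(interactive) + usage", data=df).fit()
print(sm.stats.anova_lm(ancova, typ=2))
```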

4.1 Results of interaction technique (per visualization type)

4.1.1 Descriptive statistics

Table 5 shows descriptive statistics for the variables interaction technique and visual-
ization type, which are the subjects of the first MANCOVA. While interaction seems
to have a limited effect on effectivity, it seems to have a positive effect on both, effi-
ciency and satisfaction. With respect to the visualization types, unfavorable results
from the polar coordinates plot for the two variables efficiency and satisfaction stand
out. Further, measures on effectivity and satisfaction show excess kurtosis as well as
skewness of around −1, indicating that these measures are a little steeper and somewhat
skewed to the left compared to a normal distribution. Response time shows a
higher deviation from normality (skewness: −1.7; kurtosis: 4.0), as is typical for
time-related experiments with no time constraints imposed on the users.
Based on these results and by taking a closer look at the visualization types used,
we find initial support for our hypotheses 1a and 2a, namely that both the layout
and the dataset influence usability. While the layout seems to have a stronger
influence on task time, the dataset has a stronger influence on response accuracy. With
respect to hypothesis 1b, we can support the claim that Cartesian-based visualizations
show a higher usability, especially when looking at the interactive form. For
hypothesis 2b, in contrast, we find only partial support, as efficiency and satisfaction show a better
performance for hierarchy-based visualizations while task accuracy is higher for multi-attribute
ones. Further, we find initial support for hypotheses 5a and 5b, stating that
interaction indeed has an influence and that usability is higher for interactive than for
static visualizations. The next step is to test whether these findings show significance in our
multivariate linear model.

Table 4 Demographic information on the four different participant groups

Sunburst: N = 52; 624 task assessments; 52% female; mean age 37.9; completed degree: Bachelor 83%, Master 11%, Doctorate 2%, N/A 4%; working experience (a): Yes 94%, No 2%, N/A 4%; visual analytics experience (b): No exp. 21%, Slightly 38%, Moderately 39%, Extremely 2%

Sankey: N = 49; 588 task assessments; 45% female; mean age 36.1; completed degree: Bachelor 86%, Master 4%, Doctorate 2%, N/A 8%; working experience: Yes 88%, No 6%, N/A 6%; visual analytics experience: No exp. 12%, Slightly 41%, Moderately 45%, Extremely 2%

Polar coordinates plot: N = 49; 588 task assessments; 40% female; mean age 39.3; completed degree: Bachelor 82%, Master 16%, Doctorate 0%, N/A 2%; working experience: Yes 96%, No 0%, N/A 4%; visual analytics experience: No exp. 16%, Slightly 43%, Moderately 39%, Extremely 2%

Parallel coordinates plot: N = 48; 576 task assessments; 35% female; mean age 37.1; completed degree: Bachelor 75%, Master 19%, Doctorate 0%, N/A 6%; working experience: Yes 94%, No 2%, N/A 4%; visual analytics experience: No exp. 29%, Slightly 36%, Moderately 33%, Extremely 2%

a We measure working experience by asking the question: “Do you have working experience outside of summer jobs or trainee positions?”; answer options: “Yes, No, N/A”
b We measure experience with visual analytics by asking the question: “Please indicate your experience with visual analytics”; answer options: “No experience at all—extremely experienced”
74 L. Perkhofer et al.

Table 5 Descriptive statistics—interaction technique per visualization type (means before z-transformation)

Means | Effectivity (in %): static / interactive | Efficiency (in s): static / interactive | Satisfaction (1–5): static / interactive
Sunburst | 68.7 / 67.4 | 86.2 / 52.6 | 2.0 / 2.7
Sankey | 72.7 / 72.2 | 82.7 / 55.4 | 2.5 / 3.2
Parallel coordinates | 72.3 / 75.4 | 83.3 / 48.5 | 1.8 / 3.0
Polar coordinates | 78.9 / 68.9 | 143.0 / 90.1 | 1.1 / 2.6
Ø | 73.4 / 71.0 | 99.8 / 62.3 | 1.8 / 2.9
Ø Cartesian-based | 72.5 / 73.8 | 83.03 / 51.97 | 2.1 / 3.1
Ø Polar-based | 74.2 / 70.0 | 116.86 / 63.23 | 1.5 / 2.7
Δ Layout | −1.7 / 3.8 | −33.83 / −11.26 | 0.6 / 0.4
Ø Hierarchy-based | 70.8 / 69.8 | 84.37 / 54.04 | 2.3 / 3.0
Ø Multi-attribute based | 75.7 / 75.7 | 114.13 / 59.56 | 1.4 / 2.9
Δ Dataset | −4.9 / −5.9 | −29.76 / −5.52 | 0.9 / 0.1

4.1.2 MANCOVA for evaluating effectivity, efficiency, and satisfaction

We decided to use MANCOVA for the analysis although Box's Test of equality of covariance
matrices is significant (p < 0.001): first, N is high (approx. 500 per visualization type
and 1000 per interaction concept—static vs. interactive), so the test might
be overly sensitive to violations of equality; and second, groups are roughly
equal in size (between 250 and 270). As a consequence, we report Pillai's Trace and
Wilks' Lambda, as those two are the more conservative measures and minimize type I
error.
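
As an illustration of this model specification (our analysis was not run with this code; the file and column names are assumptions), such a MANCOVA can be sketched with statsmodels:

```python
import pandas as pd
from statsmodels.multivariate.manova import MANOVA

# One row per task assessment; RA_z, RT_z, SAT_z are the z-standardized
# outcomes, vis_type and interactive the factors, usage the covariate.
df = pd.read_csv("experiment_results.csv")  # hypothetical file name

mancova = MANOVA.from_formula(
    "RA_z + RT_z + SAT_z ~ C(vis_type) * C(interactive) + usage", data=df
)
# mv_test() reports Pillai's trace and Wilks' lambda (among others)
# for every term, analogous to Table 6.
print(mancova.mv_test())
```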
Table 6 shows that the use of interaction has the strongest effect size in this model,
with a partial eta squared (η²) of 0.285. Based on Cohen's classification (1969), this
effect can be seen as strong (η² ≥ 0.138), while the effect of visualization type is
0.091 and therefore considered to be medium in size (η² ≥ 0.059). Usage (i.e., previous
experience with the particular visualization type) as well as the statistical interaction
effect between visualization type and interaction technique (VisType x Interaction) can
be interpreted as small effects (η² ≥ 0.010). After this initial analysis of the independent
variables and the covariate, a more detailed analysis based on the three dependent
variables follows in Table 7.
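
For reference, the partial eta squared values reported for the between-subject effects follow the standard definition

\[
\eta_p^2 = \frac{SS_{\text{effect}}}{SS_{\text{effect}} + SS_{\text{error}}},
\]

with Cohen's benchmarks as applied above: small (η² ≥ 0.010), medium (η² ≥ 0.059), and large (η² ≥ 0.138).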
Table 7 shows that none of the independent variables under investigation has a
significant effect on effectivity (response accuracy—RA). However, choosing the right
visualization type has an effect on efficiency (response time—RT) and satisfaction.
Further, using Big Data visualizations in an interactive form also shows an effect on
efficiency and satisfaction. Our covariate usage (previous experience
with the particular visualization type) influences only the variable satisfaction; no
effects can be found on efficiency and effectivity. These results allow us to confirm
hypotheses 4 and 5a.

Table 6 Multivariate test (interaction technique and visualization type)

Effect | Test | Value | F | Sig. | Partial eta squared
Usage | Pillai's Trace | 0.023 | 16.011 | 0.000 | 0.023
Usage | Wilks' Lambda | 0.977 | 16.011 | 0.000 | 0.023
VisType | Pillai's Trace | 0.272 | 66.589 | 0.000 | 0.091
VisType | Wilks' Lambda | 0.735 | 72.944 | 0.000 | 0.098
Interaction | Pillai's Trace | 0.285 | 266.168 | 0.000 | 0.285
Interaction | Wilks' Lambda | 0.715 | 266.168 | 0.000 | 0.285
VisType x Interaction | Pillai's Trace | 0.039 | 9.000 | 0.000 | 0.013
VisType x Interaction | Wilks' Lambda | 0.961 | 9.000 | 0.000 | 0.013

Table 7 Test of between-subject effects (interaction technique and visualization type)

Effect | Dependent variable | Type III sum of squares | F | Sig. | Partial eta squared
Usage | RA_z | 0.746 | 0.747 | 0.387 | 0.000
Usage | RT_z | 0.180 | 0.214 | 0.644 | 0.000
Usage | SAT_z | 0.816 | 134.610 | 0.000 | 0.063
VisType | RA_z | 5.421 | 1.811 | 0.143 | 0.003
VisType | RT_z | 303.862 | 143.305 | 0.000 | 0.177
VisType | SAT_z | 163.566 | 80.236 | 0.000 | 0.107
Interaction | RA_z | 1.022 | 1.025 | 0.311 | 0.001
Interaction | RT_z | 218.534 | 309.189 | 0.000 | 0.134
Interaction | SAT_z | 306.526 | 451.092 | 0.000 | 0.184
VisType x Interaction | RA_z | 6.019 | 2.011 | 0.110 | 0.003
VisType x Interaction | RT_z | 16.899 | 7.970 | 0.000 | 0.012
VisType x Interaction | SAT_z | 32.665 | 16.024 | 0.000 | 0.023
R Squared | RA_z | 0.007
R Squared | RT_z | 0.278
R Squared | SAT_z | 0.323

Drilling further down in our analysis, post hoc Sidak comparisons indicate that for all
visualization types the interactive form shows superior results for response time, and
that the polar coordinates plot performs significantly worse than all other visualization
types for Big Data. Regarding satisfaction, we can identify a superior visualization
type (Sankey) as well as an inferior one (polar coordinates). Further, we can again
see that interactive visualization types satisfy participants, while static
ones seem to frustrate them. (Note: the analysis based on effectivity is not presented, as
no significant between-subject results could be obtained.)
The detailed analysis in Table 8 allows us to confirm hypothesis 5b (based on
efficiency and satisfaction; no result on effectivity).
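
The logic of Sidak-adjusted pairwise testing can be sketched as follows. Note that this is an approximation with Welch t-tests on the raw scores, whereas our analysis compared estimated marginal means within the MANCOVA model; all names are illustrative.

```python
from itertools import combinations
from scipy.stats import ttest_ind
from statsmodels.stats.multitest import multipletests

def sidak_pairwise(groups, alpha=0.05):
    """groups: dict mapping a condition name to an array of scores."""
    pairs, pvals = [], []
    for (name_a, a), (name_b, b) in combinations(groups.items(), 2):
        _, p = ttest_ind(a, b, equal_var=False)  # Welch t-test per pair
        pairs.append(f"{name_a} vs. {name_b}")
        pvals.append(p)
    # Sidak correction: p_adj = 1 - (1 - p) ** m for m comparisons
    reject, p_adj, _, _ = multipletests(pvals, alpha=alpha, method="sidak")
    return list(zip(pairs, p_adj, reject))
```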
Our analysis in Table 9 shows that Cartesian-coordinate based visualizations outperform
polar-coordinate based visualizations in terms of efficiency and satisfaction,
while the difference in effectivity is not significant (confirming hypothesis 1b).

Table 8 Significant pairwise comparisons—interaction (post hoc Sidak)

Dependent variable | Significant pairs only | Mean difference | Sig.
Efficiency RT_z | Static—interactive | −0.662 | 0.000
Satisfaction SAT_z | Static—interactive | −0.784 | 0.000

Table 9 Significant pairwise comparisons—visualization type (post hoc Sidak)

Dependent variable | Significant pairs only | Mean difference | Sig.
Effectivity RA_z | Hierarchy—multi-attribute | −0.079 | 0.077
Efficiency RT_z | Sunburst—polar coordinates | 0.848 | 0.000
Efficiency RT_z | Sankey—polar coordinates | 0.869 | 0.000
Efficiency RT_z | Parallel coordinates—polar coordinates | 0.911 | 0.000
Efficiency RT_z | Polar—Cartesian | −0.499 | 0.000
Efficiency RT_z | Hierarchy—multi-attribute | 0.419 | 0.000
Satisfaction SAT_z | Sunburst—Sankey | −0.398 | 0.000
Satisfaction SAT_z | Sunburst—polar coordinates | 0.393 | 0.000
Satisfaction SAT_z | Sankey—parallel coordinates | 0.407 | 0.000
Satisfaction SAT_z | Sankey—polar coordinates | 0.384 | 0.000
Satisfaction SAT_z | Parallel coordinates—polar coordinates | 0.384 | 0.000
Satisfaction SAT_z | Polar—Cartesian | −0.406 | 0.000
Satisfaction SAT_z | Hierarchy—multi-attribute | 0.417 | 0.000

Taking a closer look at the dataset used, we find partial support for our hypothesis
2b: hierarchy-based visualizations perform better in terms of efficiency and satisfaction,
while they perform worse (though only at a p < 0.1 significance level) in terms of
effectivity.

4.1.3 ANCOVA for evaluating our sum score for usability

After the initial analysis based on MANCOVA, where each dependent variable was
looked at independently, we now investigate how a single score for usability influences
results and interpretation. Table 10 presents all independent variables and
the covariate usage.
First, interpretation is easier, as only one dependent variable needs to be considered
during the analysis. However, statistical power and explainability are reduced
considerably. The previously observed strong effects are reduced to medium
strength, and r² is only 0.147, indicating that the model explains only about 15% of the
variability in the usability sum score. Nonetheless, the interpretation based on pairwise
comparison stays the same. We can derive that interactive visualizations are superior
to static ones for all visualization types tested, and we can conclude that the polar
coordinates plot is inferior to the other visualization types used in this study (Tables 11,
12).
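
To illustrate how this univariate model can be specified, the following sketch first assembles a usability sum score from z-standardized components (assuming equal weights and a sign inversion of response time so that higher is always better, which may deviate from the exact construction used in the study) and then fits the ANCOVA with Type III sums of squares; the file and column names are assumptions:

```python
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
from scipy.stats import zscore

df = pd.read_csv("experiment_results.csv")  # hypothetical file name

# Equal-weight sum score; response time is inverted so that larger = better.
df["usability"] = (
    zscore(df["accuracy"]) - zscore(df["response_time"]) + zscore(df["satisfaction"])
)

# Sum-to-zero contrasts make the Type III sums of squares meaningful.
ancova = smf.ols(
    "usability ~ C(vis_type, Sum) * C(interactive, Sum) + usage", data=df
).fit()
print(sm.stats.anova_lm(ancova, typ=3))  # analogous to Table 10
```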


Table 10 Test of between-subject effects (visualization type and interaction technique)

Effect | Type III sum of squares | F | Sig. | Partial eta squared
Usage | 472.451 | 24.472 | 0.000 | 0.012
VisType | 2228.604 | 38.480 | 0.000 | 0.055
Interaction | 3295.185 | 170.686 | 0.000 | 0.079
VisType × Interaction | 247.064 | 4.266 | 0.005 | 0.006
R squared | 0.147

Table 11 Significant pairwise comparisons—interaction (post hoc Sidak)

Dependent variable | Significant pairs only | Mean difference | Sig.
Usability | Static—interactive | −2.570 | 0.000

Table 12 Significant pairwise comparisons—visualization type (post hoc Sidak)

Dependent variable | Significant pairs only | Mean difference | Sig.
Usability | Sunburst—Sankey | −1.207 | 0.020
Usability | Sunburst—parallel coordinates | −1.133 | 0.037
Usability | Sankey—polar coordinates | 2.203 | 0.000
Usability | Parallel coordinates—polar coordinates | 2.129 | 0.000
Usability | Polar—Cartesian | −1.741 | 0.000
Usability | Hierarchy—multi-attribute | 1.229 | 0.000

4.2 Results on task type (per interactive visualization type)

From our analysis in Sect. 4.1, we know that using Big Data visualizations in an
interactive form results in a better performance with respect to the two dependent
variables efficiency and satisfaction, which in turn increases usability. As a result,
the analysis based on task type is carried out only for interactive visualization types in
order to concentrate on identifying the best visualization for specific task types. Again,
we separately look at the dependent variables in a multivariate general linear model
(Sect. 4.2.2) and use a univariate general linear model to evaluate the sum score for
usability (Sect. 4.2.3).

4.2.1 Descriptive statistics

Table 13 shows mean values for the obtained data. What we can see from this analysis
is that the visualizations showing multiple attributes seem to outperform hierarchy-based
visualizations for the task type identify (when looking at
effectivity), while for summarize no clear indication of superiority can be found.

Table 13 Descriptive statistics task type per visualization type (means before z-transformation)

Means | Effectivity (in %): identify / compare / summarize | Efficiency (in s): identify / compare / summarize | SAT^a
Sunburst | 51.1 / 77.2 / 80.4 | 49.0 / 55.5 / 54.0 | 2.7
Sankey | 57.1 / 74.4 / 84.7 | 53.2 / 52.7 / 60.2 | 3.2
Parallel Coord. | 81.0 / 67.9 / 77.4 | 50.1 / 46.8 / 48.7 | 3.0
Polar Coord. | 76.7 / 61.1 / 68.9 | 90.4 / 92.2 / 87.7 | 2.6
Ø | 66.3 / 70.2 / 77.4 | 60.9 / 62.1 / 64.2 | 2.9
Ø Hierarchy | 54.0 / 75.8 / 83.2 | 51.0 / 54.1 / 58.0 | 2.9
Ø Multi-attribute | 78.7 / 67.9 / 77.4 | 70.9 / 46.8 / 48.7 | 2.9
Δ Dataset | −24.7 / 7.9 / 5.8 | −19.9 / 7.3 / 9.3 | –

^a Satisfaction had to be assessed based on the four visualization types and interaction; no distinction was used between task types

Regarding the distribution of the variables, we again see a rather strong deviation
from normality for response time (skewness: −2.2; excess kurtosis: 6.8), while the
other two variables show measures around −1.
Based on these initial results, we can derive that the task type indeed has an influence
on the variables used; the strongest influence can be identified for effectivity.
Further, it seems that our hypothesis 3b can be supported, while there might be little
empirical support for 3c. Nonetheless, for a final résumé, the statistical analysis in the
next section is necessary.

4.2.2 MANCOVA for evaluating effectivity, efficiency and satisfaction

Again, Box's Test of equality of covariance matrices is significant (p < 0.001), but also
for this analysis N is high (approx. 250 per visualization type and 350 per task
type—identify, compare, and summarize) and groups are roughly equal in size (between
84 and 92). Consequently, we report Pillai's Trace and Wilks' Lambda, as those
two are the more conservative measures.
Table 14 indicates that the visualization type has an influence on our multivariate
general linear model; based on the effect size represented by η², the effect can be
classified as medium. The covariate usage also shows a significant effect, reflecting
increased satisfaction with increasing previous experience with the respective
visualization type. An effect is also visible for the statistical interaction
of task type and visualization type (VisType x TaskType), while no significance is
shown for task type. A detailed analysis based on the dependent variables is presented
in Table 15.
The more detailed analysis in Table 15 reveals that (as already known from the analysis
in Sect. 4.1) the visualization type influences efficiency as well as satisfaction, while
it has no effect on effectivity. More interestingly, this analysis also shows that task
type influences effectivity (supporting H3a) and that there is a significant statistical
interaction effect between visualization type and task type (VisType x TaskType).


Table 14 Multivariate test (task type and interactive visualization type)

Effect | Test | Value | F | Sig. | Partial eta squared
Usage | Pillai's Trace | 0.070 | 24.71 | 0.000 | 0.070
Usage | Wilks' Lambda | 0.970 | 24.71 | 0.000 | 0.070
VisType | Pillai's Trace | 0.298 | 36.48 | 0.000 | 0.099
VisType | Wilks' Lambda | 0.710 | 40.58 | 0.000 | 0.108
TaskType | Pillai's Trace | 0.011 | 1.82 | 0.091 | 0.005
TaskType | Wilks' Lambda | 0.989 | 1.82 | 0.091 | 0.005
VisType x TaskType | Pillai's Trace | 0.043 | 2.40 | 0.001 | 0.014
VisType x TaskType | Wilks' Lambda | 0.957 | 2.41 | 0.001 | 0.014

Table 15 Test of between-subject effects (task type and interactive visualization type)

Effect | Dependent variable | Type III sum of squares | F | Sig. | Partial eta squared
Usage | RA_z | 2.081 | 2.114 | 0.146 | 0.002
Usage | RT_z | 0.105 | 0.377 | 0.539 | 0.000
Usage | SAT_z | 46.534 | 70.307 | 0.000 | 0.066
VisType | RA_z | 3.712 | 1.237 | 0.288 | 0.004
VisType | RT_z | 92.759 | 30.920 | 0.000 | 0.251
VisType | SAT_z | 36.058 | 23.019 | 0.000 | 0.052
TaskType | RA_z | 10.275 | 5.138 | 0.006 | 0.010
TaskType | RT_z | 0.207 | 0.140 | 0.690 | 0.001
TaskType | SAT_z | 0.020 | 0.010 | 0.985 | 0.000
VisType x TaskType | RA_z | 36.847 | 6.236 | 0.000 | 0.036
VisType x TaskType | RT_z | 1.842 | 1.101 | 0.360 | 0.007
VisType x TaskType | SAT_z | 0.448 | 0.113 | 0.995 | 0.001
R Squared | RA_z | 0.051
R Squared | RT_z | 0.257
R Squared | SAT_z | 0.119

Table 16 Significant pairwise comparisons—task type

Dependent variable | Significant pairs only | Mean difference | Sig.
Effectivity RA_z | Identify—summarize | −0.253 | 0.004

This interaction effect indicates that the difference in means varies depending on the
visualization type used; for further insight, pairwise comparisons are needed to explain
the connection (Table 16).
Response accuracy, without distinguishing between the visualization types, is significantly
higher for the task type summarize than for the task type identify.

Table 17 Test of between-subject effects (task types and interactive visualization types)

Effect | Type III sum of squares | F | Sig. | Partial eta squared
Usage | 153.233 | 7.981 | 0.005 | 0.008
VisType | 853.638 | 14.820 | 0.000 | 0.043
TaskType | 133.150 | 3.467 | 0.032 | 0.007
VisType x TaskType | 443.878 | 3.853 | 0.001 | 0.023
R Squared | 0.080

Table 18 Significant pairwise comparisons—task type

Dependent variable | Significant pairs only | Mean difference | Sig.
Usability_Sum | Identify—summarize | −0.915 | 0.027

Taking a closer look at the task type identify, it becomes obvious that visualization
types that show disaggregated data (more attributes and only one dimension: parallel
coordinates and polar coordinates) are superior, while visualization types that aggregate
data based on multiple dimensions (sunburst and Sankey) are inferior. The opposite is
true for the task type compare (although the differences are only significant at a p < 0.1
level), and for the task type summarize no significant difference between visualization
types can be found. Based on these results, hypothesis 3b can be supported, while 3c
cannot.
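
Descriptively, this fit between task type and visualization type can be inspected by cross-tabulating cell means, analogous to Table 13; a minimal sketch with assumed file and column names:

```python
import pandas as pd

df = pd.read_csv("experiment_results.csv")  # hypothetical file name

# Mean response accuracy per visualization type and task type; diverging
# patterns across the rows are the descriptive face of the significant
# VisType x TaskType interaction effect.
cell_means = df.pivot_table(
    values="accuracy", index="vis_type", columns="task_type", aggfunc="mean"
)
print((100 * cell_means).round(1))  # in percent, as in Table 13
```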
With respect to response time and satisfaction, the same results as already presented
in Sect. 4.1 are visible. Overall, the polar coordinates plot is inferior to the other three
visualization types; no significant difference can be found between the sunburst visualization,
the Sankey visualization, and the parallel coordinates plot. For satisfaction,
the Sankey chart shows superior results, while the polar coordinates plot shows the worst
outcome.

4.2.3 ANCOVA for evaluating our sum score for usability

After analyzing the multivariate model, we again take a look at the univariate model
based on our calculated sum score for usability. Table 17 shows the obtained
results based on the between-subject effects.
As already visible in the first analysis presented in Sect. 4.1.3, the sum score shows
a reduced r² (compared to RT and SAT), and the effect sizes of the independent
variables are downsized from previously medium effects to small ones. Pairwise comparison
reveals that the Sankey visualization and the parallel coordinates plot show the
highest scores, while the polar coordinates plot shows the worst (significant at p < 0.05).
Further, we can again derive that, irrespective of the visualization type used, the task
type identify is more difficult to perform when Big Data visualizations are presented
to the user than the task type summarize (especially when confronted with hierarchy-based
visualizations) (Table 18).


5 Conclusion and future work

5.1 Discussion and implications

Reducing complexity by relying on a visual layout and enhancing problem-solving
capabilities is the fundamental goal of visualizations (Ohlert and Weißenberger 2015).
However, as datasets increase in complexity, conventional business charts (e.g.
line, bar, and pie charts) are no longer sufficient, and newer forms of visualizations specifically
designed to deal with this increasing complexity need to be applied (Perkhofer
et al. 2019b). Within the discipline of Information Visualization, many researchers
are concerned with the creation of such new visualization types; thus, a large pool
of options already exists (Perkhofer 2019). What is missing, however, are tests of their
ability to inform and satisfy users within their explicit areas of application, such as
management accounting.
Thus, in this study we focus on a use case based on data common within
the discipline of management accounting and perform an experiment using participants
knowledgeable in visual analytics. The dataset was specifically created
for this experiment to provide a close-to-real data sample from a management-related
perspective (Plaisant et al. 2008). Designing the data sample together with finance
experts allowed us to draw from their experience and calculate metrics highly relevant
for managerial accounting in a common trading company (in our case, the wine
trade). The contribution of this paper is the evaluation of multidimensional interactive
visualizations for Big Data and the distinct components influencing their usability in a
large-scale quantitative experiment. By comparing four visualization types (the sunburst
visualization, the Sankey visualization, the parallel coordinates plot, and the
polar coordinates plot), three different task types (identify, compare, and summarize),
as well as different interaction techniques (mostly static vs. interactive), their effect
on decision-making (efficiency and effectiveness) and satisfaction could be evaluated.
Recommendations based on these findings should help to increase the usability of these
four visualization types in particular, but can also be applied to other available types
(Fig. 4).
Summarizing the obtained results, the polar coordinates plot shows
below-average performance in all comparisons and should not be used, while the parallel
coordinates plot (Cartesian layout) as well as the Sankey visualization (again a
Cartesian layout) performed best. Using the right visualization type has a strong effect
on efficiency and a medium effect on satisfaction, while no effect on effectivity could
be detected. This analysis also indicates that horizontal or Cartesian-based layouts outperform
polar visualizations. Further, our results imply that visualizations representing
hierarchies are easier to interpret and work with than visualizations showing multiple
attributes; however, when identification tasks need to be performed, they are inferior.
With respect to task type, the results demonstrate that the task type influences effectivity
and therefore the most important variable of usability. In more detail, the task
type summarize shows significantly better performance than the task type identify.

Fig. 4 Results on the research model. Directed effect size indicated by color and arrow thickness
(→ small, → medium, and → large effect)

Overall, we can state that a visualization results in better performance when it fits
the task: identify asks for visualization types that show data in a disaggregated form
(parallel and polar coordinates), while summarize asks for visualizations that show
data in an aggregated form. This is very much in line with previous research asking
for a cognitive fit (Ohlert and Weißenberger 2015; Hirsch et al. 2015; Vessey and
Galletta 1991).
Our results also clearly indicate the need for interaction when working with
Big Data visualizations. The use of interaction has a strong effect on satisfaction and,
additionally, a medium effect on efficiency. Concerning our covariate usage, we
detect a significant influence on satisfaction, while it has no effect on effectivity or
efficiency. The results on all proposed hypotheses are summarized in
Table 19.
With respect to the evaluation methods used—multiple dependent variables versus
one sum score—we observed that a lot of explainability is lost by only looking
at the sum score of usability instead of analyzing the three dependent variables
effectivity, efficiency, and satisfaction independently. On the other hand, results
based on the single score can be presented more clearly, and with respect to interpretation,
no differences in the recommendations to users are visible. From our perspective, this
combined evaluation of the dependent variables and the sum score provides a clear
picture of the effects tested in this study.

5.2 Limitations and further research

Of course, this study also includes some limiting factors that need to be discussed and
considered when interpreting the results. The limitations identified indicate further
research opportunities that can be addressed in supplementary research endeavors:
Limited number of visualizations used: As already explained, we only analyzed a
subset of the visualization options available; a huge pool of further possibilities exists.

Table 19 Results on our hypotheses

Hypothesis | Dependent variable: result
H1a: The basic layout influences usability of a visualization | Usability: True
H1b: Cartesian-coordinate based visualization types outperform polar-coordinate based visualization types | Usability: True; Effectivity: No significance; Efficiency: True; Satisfaction: True
H2a: The underlying dataset influences the usability of a visualization | Usability: Partly true
H2b: Hierarchy-based visualization types outperform multi-attribute based visualization types | Usability: Partly true; Effectivity: False; Efficiency: True; Satisfaction: True
H3a: The task type influences the usability of a visualization | Usability: True; Effectivity: True; Efficiency: True^a; Satisfaction: No significance
H3b: Users will perform better with a multi-attribute visualization than with a hierarchy-based visualization when confronted with the task type identify | Effectivity: True
H3c: Users will perform better with a hierarchy-based visualization than with a multi-attribute visualization when confronted with the task type summarize | Effectivity: No significance
H4: Previous experience/usage of the different visualization types positively affects usability | Usability: True; Effectivity: No significance; Efficiency: No significance; Satisfaction: True
H5a: Interaction influences the usability of a visualization | Usability: True
H5b: Users will perform better with a highly interactive visualization than with a mostly static one | Usability: True; Effectivity: No significance; Efficiency: True; Satisfaction: True

^a Significant statistical interaction effect between visualization type and task type

They range from additional forms of one comprehensive visualization to the use
of small multiples (multiple small visualizations put in juxtaposition, such as a parallel
coordinates plot matrix or a scatterplot matrix). Although the form of one comprehensive
visualization is of relevance, it would be of special interest to investigate whether a difference

in mental demand and/or usability persists when using more than one visualization
to display the dataset. These options need to be further explored.
Different interaction techniques depending on the visualization type: Results on
all visualization types are directly comparable, as they were tested with the same
questions and the same data dimensions or attributes, respectively. However, interaction
techniques were used according to common practice, leaving us with different concepts
using a different mix of individual techniques. We decided on this approach to ensure
high external validity, knowing that we might introduce possible limiting factors for
internal validity. We did not want to depart too much from the real world and thereby
introduce simplifications for the sake of fulfilling all requirements of a valid scientific
experiment. However, we do not imply that this is the better approach in general, but it
fits our purpose. The most important requirement for this study is that the visualization
and the task are generic but also realistic.
Use of MTurk: While the use of MTurk comes with the advantages of a large pool
of possible participants and fast survey completion rates, there are also some related
limitations. First, during data collection, we as researchers have no control over the
process. Participants could be disturbed or interrupted, drawing away the attention
that might be needed to successfully fulfill the required tasks. Therefore,
special attention needs to be paid to the design of the questionnaire, and quality checks
need to be implemented to separate good responses from bad ones. Second, most workers live in the
United States and India, which might introduce a cultural bias. However, workers
tend to be more educated than the general population, and therefore more complex
issues can be posted on MTurk, which was important for our study. Further, specific
characteristics of the workers (e.g. the need for a bachelor's degree) can be linked to
the posted HITs (human intelligence tasks) in exchange for higher payment. Despite
the possible drawbacks, a comparative study in the context of visual analytics that was
posted on MTurk and also executed in the lab produced comparable results
(Harrison et al. 2014). For this initial stage of our research, we need answers to many
manipulations (design, dataset, correlation type…), and we therefore believe MTurk
to be an appropriate platform.
No information on the information retrieval process: The way information is retrieved
from a visual display provides many indications of design problems. Controlled experiments
using eye-tracking have proven to be particularly useful in providing insight
into the data retrieval process (Falschlunger et al. 2014, 2016a) and might be able to
shed further light on specific design issues. By using the crowdsourcing platform
MTurk, we could not make use of a controlled environment. Thus, we cannot assume
that all participants took part under the same conditions, meaning the same
internet speed, the same display accuracy of the end device, as well as the same
environmental conditions (e.g. a silent surrounding). Further, having enough cognitive
resources is of high importance in order to uncover insight. Measuring cognitive load
directly, for example by relying on physiological measurement methods such as eye-tracking
(Perkhofer and Lehner 2019) or heart rate variability (Hjortskov et al. 2004),
might allow for more reliable results than measuring based on self-reported data.
Effectivity is measured dichotomously: The use of questions that are either correctly
or incorrectly answered could be the reason why only low effects on effectivity are visible
within the model. Asking different questions that allow the assessment of effectivity
using finer distinctions rather than 0 and 1 could be beneficial to gain further insight
on this measure.

5.3 Concluding remarks

In conclusion, Big Data visualizations make it possible to show a large amount of data in one
comprehensive visualization; however, a special focus needs to be placed on their
design (including an appropriate layout and interaction techniques) as well as on the
task they are supposed to support. The presented new visualizations (sunburst, Sankey,
parallel coordinates plot, and polar coordinates plot) are better directed towards showing
transaction-based data. Moreover, management accounting is an interesting area
of application; however, users' lack of experience leaves them at a disadvantage
in terms of interpretation and satisfaction (Perkhofer et al. 2019b). Without knowledge
(or stored schemas in long-term memory) on how to interpret these visual forms and
knowledge on how to operate them, insight cannot be triggered (Sweller 2010). This
necessitates a detailed focus on user-centered visual and functional design, a fact that
has been largely neglected so far (Isenberg et al. 2013). This study is a first attempt at
closing this gap.

Acknowledgements Open access funding provided by University of Applied Sciences Upper Austria. The
authors of this paper are thankful for the support of Hans-Christian Jetter and Thomas Luger and for the
support of the University of Applied Sciences Upper Austria. The authors further gratefully acknowledge
the Austrian Research Promotion Agency (FFG) Grant #856316 USIVIS: User-Centered Interactive Visu-
alization of “Big Data”.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which
permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give
appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence,
and indicate if changes were made. The images or other third party material in this article are included
in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If
material is not included in the article’s Creative Commons licence and your intended use is not permitted
by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the
copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Appendix

See Tables 20, 21 and 22; the randomization checks are presented in Tables 23–33.


Table 20 Interaction techniques identified and clustered

Select: Selection allows highlighting data based on certain options presented to the user. The amount of visible data stays the same; however, highlighted data is moved into the spotlight (e.g. through color intensity). Multiple selectors are possible; yet the more selectors are active, the higher the mental demand posed on the user. Examples: linking and brushing, drop-down, checkboxes, radio-buttons, scrollable lists, slider, directly clicking on elements in the visualization, etc. References: Liu et al. (2017); Johansson and Forsell (2016); Brehmer and Munzner (2013); Heer and Shneiderman (2012); Pike et al. (2009); Yi et al. (2007); Chengzhi et al. (2003); Keim (2001)

Filter: In contrast to selecting, filtering allows the exclusion of data from being visible and therefore actively manipulates the amount of visible data. Multiple active filters can be very helpful to identify details along multiple dimensions or facets; however, it is difficult to maintain overview (even more so if filters are not visible). Examples: drop-down, checkboxes, radio-buttons, scrollable lists, slider, directly clicking on elements in the visualization, etc. References: Liu et al. (2017); Brehmer and Munzner (2013); Kehrer and Hauser (2013); Pike et al. (2009); Yi et al. (2007); Chengzhi et al. (2003); Keim (2001)

Navigate: Navigate alters the viewpoint of the user. This can be used, for instance, for geographical maps where the area of interest can be moved to custom locations (directly by mouse-click). Examples: zooming, panning, rotating. References: Johansson and Forsell (2016); Brehmer and Munzner (2013); Pike et al. (2009); Yi et al. (2007); Keim (2001)

Arrange: Reorganizing elements of a visualization spatially (data dimensions or attributes) is of high importance with regard to displays where only neighboring datasets can be analyzed (e.g. changing axes within parallel coordinates plots or Sankey visualizations). Examples: re-ordering axes, re-ordering rows/columns, re-ordering the spatial layout if multiple visualizations are involved. References: Liu et al. (2017); Chou et al. (2016); Brehmer and Munzner (2013); Pike et al. (2009); Yi et al. (2007)

Change: Change is represented by the possibility of altering the visual representation itself. It is often mentioned as being a high-level interaction. However, following and understanding a complete change of the display can be cognitively difficult to process, and change should therefore be used sparingly. Examples: view reconfiguration (changing the visualization type, changing color schemes). References: Brehmer and Munzner (2013); Kehrer and Hauser (2013); Dörk et al. (2012); Elmqvist et al. (2011); Pike et al. (2009); Yi et al. (2007); Chengzhi et al. (2003)

Aggregate: Using statistical measures to describe multiple data points by one measure. This can be very helpful to generate quick insight; however, even though statistical characterization is a very powerful approach, valuable information is often lost. Examples: mean, median, variance, counts, summations. References: Liu et al. (2017); Chou et al. (2016); Brehmer and Munzner (2013); Heer and Shneiderman (2012); Munzner (2014)

Table 21 Hierarchical prototypes used (self-coded) for experimental research

Sunburst ●
Interaction concept static: only mouse-over
https://usivis-survey.herokuapp.com/sunburst/static4
Interaction concept interactive: mouse-over, drill-through (filter) and drill-up (remove filter), re-ordering of dimensions (from inner to outer layer)
https://usivis-survey.herokuapp.com/sunburst/sortable

Sankey ◆
Interaction concept static: mouse-over
https://usivis-survey.herokuapp.com/horizontal
Interaction concept interactive: mouse-over, brushing and highlighting, multiple selectors, re-ordering of dimensions (from left to right), re-ordering of dimension details (from top to bottom)
http://hgb-sankey-wine.herokuapp.com/

Table 22 Multi-attribute prototypes used (self-coded) for experimental research

Polar coordinates plot ▲ ●
Interaction concept static: multiple filters, drop-down
https://polarcoordinates.herokuapp.com/?question=JoMac&isMonochrome=true
Interaction concept interactive: multiple filters, drop-down, re-ordering of attributes, multi-color use to differentiate traders (dimension details)
https://polarcoordinates.herokuapp.com/?question=JoMac&isMonochrome=false

Parallel coordinates plot ▲ ◆
Interaction concept static: multiple filters, range-slider
http://intervis.projekte.fh-hagenberg.at/USIVIS/PC1/examples/PC1.html
Interaction concept interactive: multiple filters, range-slider, re-ordering of attributes, multi-color use to differentiate traders (dimension details)
http://intervis.projekte.fh-hagenberg.at/USIVIS/PC3/examples/PC3.html


Table 23 Randomization check—visualization type (RT1)
Sample results (RT1): Mean: −47.12
Resample results (200 resamples): Mean: 0.12 / SD: 3.44
Probability of samples showing means above mean (sample): 0.00%
Probability of samples showing means below mean (sample): 0.00%
Two tails probability: 0.00%

Table 24 Randomization check—visualization type (RT2)
Sample results (RT2): Mean: −48.36
Resample results (200 resamples): Mean: −0.17 / SD: 3.16
Probability of samples showing means above mean (sample): 0.00%
Probability of samples showing means below mean (sample): 0.00%
Two tails probability: 0.00%

Table 25 Randomization check—visualization type (RT3)
Sample results (RT3): Mean: −50.60
Resample results (200 resamples): Mean: 0.07 / SD: 3.61
Probability of samples showing means above mean (sample): 0.00%
Probability of samples showing means below mean (sample): 0.00%
Two tails probability: 0.00%

Table 26 Randomization check—visualization type (SAT1)
Sample results (SAT1): Mean: −0.52
Resample results (200 resamples): Mean: 0.005 / SD: 0.072
Probability of samples showing means above mean (sample): 0.00%
Probability of samples showing means below mean (sample): 0.00%
Two tails probability: 0.00%

Randomization check 1: visualization type

Randomization checks are only provided for significant pairwise comparison results
in Sect. 4 (Tables 23, 24, 25, 26, 27, 28, 29, 30):
• RT1: Sunburst—polar coordinates plot
• RT2: Sankey—polar coordinates plot
• RT3: Parallel coordinates plot—polar coordinates plot
• SAT1: Sunburst—Sankey
• SAT2: Sunburst—polar coordinates
• SAT3: Sankey—parallel coordinates
• SAT4: Sankey—polar coordinates
• SAT5: Parallel coordinates—polar coordinates


Table 27 Randomization check—visualization type (SAT2)
Sample results (SAT2): Mean: 0.52
Resample results (200 resamples): Mean: 0.006 / SD: 0.078
Probability of samples showing means above mean (sample): 0.00%
Probability of samples showing means below mean (sample): 0.00%
Two tails probability: 0.00%

Table 28 Randomization check—visualization type (SAT3)
Sample results (SAT3): Mean: 0.50
Resample results (200 resamples): Mean: 0.009 / SD: 0.080
Probability of samples showing means above mean (sample): 0.00%
Probability of samples showing means below mean (sample): 0.00%
Two tails probability: 0.00%

Table 29 Randomization check—visualization type (SAT4)
Sample results (SAT4): Mean: 1.05
Resample results (200 resamples): Mean: 0.007 / SD: 0.070
Probability of samples showing means above mean (sample): 0.00%
Probability of samples showing means below mean (sample): 0.00%
Two tails probability: 0.00%

Table 30 Randomization check—visualization type (SAT5)
Sample results (SAT5): Mean: 0.54
Resample results (200 resamples): Mean: −0.001 / SD: 0.084
Probability of samples showing means above mean (sample): 0.00%
Probability of samples showing means below mean (sample): 0.00%
Two tails probability: 0.00%


Randomization check 2: interaction type

Randomization checks are only provided for significant pairwise comparison results
in Sect. 4 (Tables 31, 32):

• RT1: Static—interactive
• SAT1: Static—interactive


Table 31 Randomization check—interaction type (RT1)
Sample results (RT1): Mean: 37.13
Resample results (200 resamples): Mean: 0.27 / SD: 2.56
Probability of samples showing means above mean (sample): 0.00%
Probability of samples showing means below mean (sample): 0.00%
Two tails probability: 0.00%

Table 32 Randomization check—interaction type (SAT1)
Sample results (SAT1): Mean: −1.06
Resample results (200 resamples): Mean: −0.003 / SD: 0.059
Probability of samples showing means above mean (sample): 0.00%
Probability of samples showing means below mean (sample): 0.00%
Two tails probability: 0.00%

Table 33 Randomization check—task type (RA1)
Sample results (RA1): Mean: −0.11
Resample results (200 resamples): Mean: 0.002 / SD: 0.034
Probability of samples showing means above mean (sample): 0.00%
Probability of samples showing means below mean (sample): 0.00%
Two tails probability: 0.00%

Randomization check 3: task type

Randomization checks are only provided for significant pairwise comparison results
in Sect. 4 (Table 33):
• RA1: Identify—summarize

References
Abi Akle, A., Yannou, B., & Minel, S. (2019). Information visualisation for efficient knowledge discovery
and informed decision in design by shopping. Journal of Engineering Design, 30(6), 227–253. https://
doi.org/10.1080/09544828.2019.1623383.
Albo, Y., Lanir, J., Bak, P., & Rafaeli, S. (2016). Off the radar. Comparative evaluation of radial visualization
solutions for composite indicators. IEEE Transactions on Visualization and Computer Graphics, 22(1),
569–578. https://doi.org/10.1109/tvcg.2015.2467322.
Anderson, E. W., Potter, K. C., Matzen, L. E., Shepherd, J. F., Preston, G. A., & Silva, C. T. (2011). A user
study of visualization effectiveness using EEG and cognitive load. Computer Graphics Forum, 30(3),
791–800.
Appelbaum, D., Kogan, A., Vasarhelyi, M., & Yan, Z. (2017). Impact of business analytics and enterprise
systems on managerial accounting. International Journal of Accounting Information Systems, 25,
29–44. https://doi.org/10.1016/j.accinf.2017.03.003.


Atkinson, R. C., & Shiffrin, R. M. (1968). Human memory. A proposed system and its control processes.
In K. W. Spence & J. T. Spence (Eds.), The psychology of learning and motivation (Vol. 2, pp. 89–195).
New York: Academic Press.
Bačić, D., & Fadlalla, A. (2016). Business information visualization intellectual contributions: An integra-
tive framework of visualization capabilities and dimensions of visual intelligence. Decision Support
Systems, 89, 77–86. https://doi.org/10.1016/j.dss.2016.06.011.
Barter, R. L., & Yu, B. (2018). Superheat: An R package for creating beautiful and extendable heatmaps for
visualizing complex data. Journal of Computational and Graphical Statistics: A Joint Publication of
American Statistical Association, Institute of Mathematical Statistics, Interface Foundation of North
America, 27(4), 910–922. https://doi.org/10.1080/10618600.2018.1473780.
Bawden, D., & Robinson, L. (2009). The dark side of information. Overload, anxiety and other paradoxes
and pathologies. Journal of Information Science, 35(2), 180–191.
Bertini, E., Tatu, A., & Keim, D. A. (2011). Quality metrics in high-dimensional data visualization. An
overview and systematization. IEEE Transactions on Visualization and Computer Graphics, 17(12),
2203–2212.
Bostock, M., Ogievetsky, V., & Heer, J. (2011). D3: Data-driven documents. IEEE Transactions on Visual-
ization and Computer Graphics, 17(12), 2301–2309.
Brehmer, M., & Munzner, T. (2013). A multi-level typology of abstract visualization tasks. IEEE Transac-
tions on Visualization and Computer Graphics, 19(12), 2376–2385. https://doi.org/10.1109/TVCG.
2013.124.
Bruls, M., Huizing, K., & van Wijk, J. J. (2000). Squarified treemaps. In W. de Leeuw & R. van Liere (Eds.),
Eurographics/IEEE VGTC. With assistance of IEEE computer society. IEEE VGTC symposium on
visualization. Amsterdam, 29–30.05.2000 (pp. 1–10).
Buja, A., Cook, D., & Swayne, D. F. (1996). Interactive high-dimensional data visualization. Journal of
Computational and Graphical Statistics, 5(1), 78. https://doi.org/10.2307/1390754.
Chen, C. L. P., & Zhang, C.-Y. (2014). Data-intensive applications, challenges, techniques and tech-
nologies. A survey on Big Data. Information Sciences, 275, 314–347.
Chengzhi, Q., Chenghu, Z., & Tao, P. (2003). The taxonomy of visualization techniques and systems.
Concerns between users and developers are different. In Proceedings of the Asia GIS 2003. Asia GIS
conference. Wuhan, China, 16.–18.10.2003 (pp. 1–14).
Chou, J.-K., Wang, Y., & Ma, K.-L. (2016). Privacy preserving event sequence data visualization using a
Sankey diagram-like representation. In SIGGRAPH ASIA 2016 symposium on visualization (pp. 1–8).
Macau: ACM.
Claessen, J. H. T., & van Wijk, J. J. (2011). Flexible linked axes for multivariate data visualization. IEEE
Transactions on Visualization and Computer Graphics, 17(12), 2310–2316. https://doi.org/10.1109/
TVCG.2011.201.
Diehl, S., Beck, F., & Burch, M. (2010). Uncovering strengths and weaknesses of radial visualizations–an
empirical approach. IEEE Transactions on Visualization and Computer Graphics, 16(6), 935–942.
https://doi.org/10.1109/TVCG.2010.209.
Dilla, W., Janvrin, D. J., & Raschke, R. (2010). Interactive data visualization. New directions for accounting
information systems research. Journal of Information Systems, 24(2), 1–37. https://doi.org/10.2308/
jis.2010.24.2.1.
Dilla, W. N., & Raschke, R. L. (2015). Data visualization for fraud detection. Practice implications and a
call for future research. International Journal of Accounting Information Systems, 16, 1–22. https://
doi.org/10.1016/j.accinf.2015.01.001.
Dix, A., & Ellis, G. (Eds.) (1998). Starting simple. Adding value to static visualisation through simple
interaction. In AVI ‘98 Proceedings of the working conference on Advanced visual interfaces. L’Aquila,
Italy: ACM New York, NY, USA (AVI ‘98).
Dörk, M., Carpendale, S., Collings, C., & Williamson, C. (2008). VisGets: Coordinated visualization for
web-based information exploration and discovery. IEEE Transactions on Visualization and Computer
Graphics, 14(6), 1205–1212.
Dörk, M., Riche, N. H., Ramos, G., & Dumais, S. (2012). PivotPaths: Strolling through faceted information
spaces. IEEE Transactions on Visualization and Computer Graphics, 18(12), 2709–2719.
Draper, G. M., Livnat, Y., & Riesenfeld, R. F. (2009). A survey of radial methods for information visual-
ization. IEEE Transactions on Visualization and Computer Graphics, 15(5), 759–776. https://doi.org/
10.1109/TVCG.2009.23.


Eisl, C., Losbichler, H., Falschlunger, L., Fischer, B., & Hofer, P. (2012). Reporting design. Status quo und
neue Wege in der Gestaltung des internen und externen Berichtswesens. In C. Eisl, H. Losbichler,
C. Engelbrechtsmüller, M. Büttner, H. Wambach, & A. Schmidt-Pöstion (Eds.), FH Oberösterreich,
KPMG Advisory AG, pmOne AG.
Elmqvist, N., Moere, A. V., Jetter, H.-C., Cernea, D., Reiterer, H., & Jankun-Kelly, T. J. (2011). Fluid
interaction for information visualization. Information Visualization, 10(4), 327–340. https://doi.org/
10.1177/1473871611413180.
Elmqvist, N., Stasko, J., & Tsigas, P. (Eds.) (2007). DataMeadow. A visual canvas for analysis of large-scale
multivariate data. In 2007 IEEE symposium on visual analytics science and technology.
Endert, A., Hossain, M. S., Ramakrishnan, N., North, C., Fiaux, P., & Andrews, C. (2014). The human is the
loop: New directions for visual analytics. Journal of Intelligent Information Systems, 43(3), 411–435.
https://doi.org/10.1007/s10844-014-0304-9.
Falschlunger, L., Eisl, C., Losbichler, H., & Greil, A. (2014). Improving information perception of graphical
displays. An experimental study on the display of column graphs. In V. Skala (Ed.), Proceedings of the
22nd WSCG. Conference on computer graphics, visualization and computer vision (WSCG). Pilsen,
02.–02.06.2014 (pp. 19–26).
Falschlunger, L., Lehner, O., & Treiblmaier, H. (2016a). InfoVis: The impact of information overload
on decision making outcome in high complexity settings. In Proceedings of the 15th annual Pre-
ICIS workshop on HCI research in MIS. SIGHCI 2016. Dublin, 11.12.2016. AIS Electronic Library:
Association for Information Systems (Special Interest Group on Human-Computer Interaction), 1–6,
Paper 3.
Falschlunger, L., Lehner, O., Treiblmaier, H., & Eisl, C. (2016b). Visual representation of information
as an antecedent of perceptive efficiency. The effect of experience. In Proceedings of the 49th
Hawaii international conference on system sciences (HICSS). Koloa, HI, USA, 05.01.2016–08.01.2016
(pp. 668–676). IEEE.
Goes, P. B. (2014). Big data and IS research. MIS Quarterly, 38(3), 3–8.
Grammel, L., Tory, M., & Storey, M. A. (2010). How information visualization novices construct visual-
izations. IEEE Transactions on Visualization and Computer Graphics, 16(6), 943–952.
Harrison, L., Yang, F., Franconeri, S., & Chang, R. (2014). Ranking visualizations of correlation using
Weber’s law. IEEE Transactions on Visualization and Computer Graphics, 20(12), 1943–1952. https://
doi.org/10.1109/TVCG.2014.2346979.
Heer, J., & Shneiderman, B. (2012). Interactive dynamics for visual analysis. Communications of the ACM,
55(4), 45–54. https://doi.org/10.1145/2133806.2133821.
Heinrich, J., Stasko, J., & Weiskopf, D. (2012). The parallel coordinates matrix. In Eurographics conference
on visualization (EuroVis). 33rd annual conference of the European association for computer graphics.
Cagliari, Sardinia, Italy, 13.–18.05.2012 (pp. 1–5). European Association for Computer Graphics.
Henley, M., Hagen, M., & Bergeron, D. (2007). Evaluating two visualization techniques for genome com-
parison. In E. Banissi (Ed.), 11th [IEEE] international conference information visualization. IV 2007
[proceedings] 4–6 July 2007, Zurich, Switzerland. IEEE Computer Society (pp. 1–6). Los Alamitos
Calif., Washington D.C.: IEEE Computer Society; Conference Publishing Services.
Hirsch, B., Seubert, A., & Sohn, M. (2015). Visualisation of data in management accounting reports. Journal
of Applied Accounting Research, 16(2), 221–239. https://doi.org/10.1108/JAAR-08-2012-0059.
Hjortskov, N., Rissén, D., Blangsted, A. K., Fallentin, N., Lundberg, U., & Søgaard, K. (2004). The effect
of mental stress on heart rate variability and blood pressure during computer work. European Journal
of Applied Physiology, 92(1–2), 84–89. https://doi.org/10.1007/s00421-004-1055-z.
Hofer, P., Walchshofer, C., Eisl, C., Mayr, A., & Perkhofer, L. (2018). Sankey, Sunburst & Co. Interactive
big data visualizierungen im usability test. In L. Nadig, & U. Egle (Eds.), Proceedings of CARF 2018.
Controlling, accouting, risk, and finance. CARF Luzern 2018. Luzern, 06.–07.09.2018. (pp. 92–112).
University of Applied Sciences Luzern: Verlag IFZ.
Inselberg, A., & Dimsdale, B. (1990). Parallel coordinates. A tool for visualizing multi-dimensional geom-
etry. In Proceedings of the First IEEE conference on visualization: visualization’ 90. San Francisco,
CA, USA, 23–26 Oct. 1990. (pp. 361–378). IEEE Comput. Soc. Press.
Isenberg, T., Isenberg, P., Chen, J., Sedlmair, M., & Möller, T. (2013). A systematic review on the practice
of evaluating visualization. IEEE Transactions on Visualization and Computer Graphics, 19(12),
2818–2827.


Janvrin, D. J., Raschke, R. L., & Dilla, W. N. (2014). Making sense of complex data using interactive data
visualization. Journal of Accounting Education, 32(4), 31–48. https://doi.org/10.1016/j.jaccedu.2014.
09.003.
Johansson, J., & Forsell, C. (2016). Evaluation of parallel coordinates. Overview, categorization and guide-
lines for future research. IEEE Transactions on Visualization and Computer Graphics, 22(1), 579–588.
https://doi.org/10.1109/tvcg.2015.2466992.
Johnson, B., & Shneiderman, B. (1991). Tree-maps: A space-filling approach to the visualization of
hierarchical information structures. In Proceedings of the 2nd IEEE conference on visualization:
visualization’91 (pp. 284–291). San Diego, CA, USA.
Kanjanabose, R., Abdul-Rahman, A., & Chen, M. (2015). A multi-task comparative study on scatter plots
and parallel coordinates plots. Computer Graphics Forum, 34(3), 261–270. https://doi.org/10.1111/
cgf.12638.
Kehrer, J., & Hauser, H. (2013). Visualization and visual analysis of multifaceted scientific data. A survey.
IEEE Transactions on Visualization and Computer Graphics, 19(3), 495–513. https://doi.org/10.1109/
tvcg.2012.110.
Keim, D. A. (2001). Visual exploration of large data sets. Communications of the ACM, 44(8), 38–44.
https://doi.org/10.1145/381641.381656.
Keim, D. A. (2002). Information visualization and visual data mining. IEEE Transactions on Visualization
and Computer Graphics, 8(1), 1–8. https://doi.org/10.1109/2945.981847.
Keim, D. A., Andrienko, G., Fekete, J.-D., Görg, C., Kohlhammer, J., & Melancon, G. (2008). Visual
analytics. Definition, process, and challenges. In A. Kerren, J. T. Stasko, J. D. Fekete, & C. North
(Eds.), Information visualization. Lecture notes in computer science (Vol. 4950, pp. 154–175). Berlin:
Springer.
Keim, D. A., Mansmann, F., Schneidewind, J., & Schreck, T. (Eds.) (2006). Monitoring network traffic with
radial traffic analyzer. In 2006 IEEE symposium on visual analytics science and technology.
Kim, M., & Draper, G. M. (2014). Radial vs. cartesian revisited. A comparison of space-filling visu-
alizations. In Prodeedings of the VINCI’14. With assistance of ACM. 7th international symposium
on visual information communication and interaction VINCI’14. Sydney, Australia, 05–08.08.2014
(pp. 196–199).
Lehmann, D. J., Albuquerque, G., Eisemann, M., Tatu, A., Keim, D., Schumann, H., et al. (2010). Visual-
isierung und Analyse multidimensionaler Datensätze. Informatik-Spektrum, 6(33), 589–600.
Liu, S., Maljovec, D., Wang, B., Bremer, P.-T., & Pascucci, V. (2017). Visualizing high-dimensional data.
Advances in the past decade. IEEE Transactions on Visualization and Computer Graphics, 23(3),
1249–1268. https://doi.org/10.1109/tvcg.2016.2640960.
Lurie, N. H., & Mason, C. H. (2007). Visual representation: implications for decision making. Journal of
Marketing, 71(1), 160–177.
Mansmann, F., Göbel, T., & Cheswick, W. (2012). Visual analysis of complex firewall configurations. In
D. Schweitzer & D. Quist (Eds.), Proceedings of the ninth international symposium on visualization
for cyber security (VisSec’12). Seattle, Washington, USA, 15.10.2012 (pp. 1–8). ACM.
Miller, G. A. (1956). The magical number seven, plus or minus two. Some limits on our capacity for
processing information. Psychological Review, 101(2), 343–352.
Munzner, T. (2014). Visualization analysis and design. AK Peters visualization series (1st ed.). Boca Raton:
CRC Press.
Netzel, R., Vuong, J., Engelke, U., O’Donoghue, S., Weiskopf, D., & Heinrich, J. (2017). Comparative
eye-tracking evaluation of scatterplots and parallel coordinates. Visual Informatics, 1(2), 118–131.
https://doi.org/10.1016/j.visinf.2017.11.001.
Ohlert, C. R., & Weißenberger, B. E. (2015). Beating the base-rate fallacy: An experimental approach on the
effectiveness of different information presentation formats. Journal of Management Control, 26(1),
51–80. https://doi.org/10.1007/s00187-015-0205-2.
Pasch, T. (2019). Strategy and innovation: The mediating role of management accountants and management
accounting systems’ use. Journal of Management Control, 30(2), 213–246. https://doi.org/10.1007/
s00187-019-00283-y.
Perkhofer, L. M. (2019). A cognitive load-theoretic framework for information visualization. In O. Lehner
(Ed.), Proceedings of the 17th conference on finance, risk and accounting perspectives, in Print
(FRAP). Helsinki, 23.–25.09.2019 (pp. 9–25). ACRN Oxford.
Perkhofer, L., Hofer, P., & Walchshofer, C. (2019a). BIG data visualisierungen 2.0. Optimale Gestaltung
und Einsatz neuartiger Visualisierungsmöglichkeiten. In L. Nadig (Ed.), Proceedings of CARF 2019.


Controlling, accounting, risk and finance. CARF Luzern 2019. Luzern, 5.–6.9.2019 (pp. 76–104).
University of Applied Sciences Luzern: Verlag IFZ.
Perkhofer, L. M., Hofer, P., Walchshofer, C., Plank, T., & Jetter, H.-C. (2019b). Interactive visualization of
big data in the field of accounting. Journal of Applied Accounting Research, 5(1), 78. https://doi.org/
10.1108/JAAR-10-2017-0114.
Perkhofer, L., & Lehner, O. (2019). Using gaze behavior to measure cognitive load. In F. Davis, R. Riedl,
J. Vom Brocke, P.-M. Léger, & A. Randolph (Eds.), Information systems and neuroscience. NeuroIS
Retreat 2018. Lecture notes in information systems and organisation, NeuroIS Retreat 2018 (1st ed.,
Vol. 29, pp. 73–83). Berlin: Springer.
Perkhofer, L., Walchshofer, C., & Hofer, P. (2019c). Designing visualizations to identify and assess correla-
tions and trends. An experimental study based on price developments. In O. Lehner (Ed.), Proceedings
of the 17th conference on finance, risk and accounting perspectives (FRAP). Helsinki, 23.–25.09.2019
(pp. 294–340). ACRN Oxford.
Perrot, A., Bourqui, R., Hanusse, N., & Auber, D. (2017). HeatPipe: High throughput, low latency big
data heatmap with spark streaming. In 21st international conference on information visualization
2017. Information visualization 2017. London, UK, 11.–14.07.2017. IVS (pp. 1–6). https://hal.archives-ouvertes.fr/hal-01516888/document. Retrieved December 2018.
Pike, W. A., Stasko, J., Chang, R., & O’Connell, T. A. (2009). The science of interaction. Information
Visualization, 8(4), 263–274. https://doi.org/10.1057/ivs.2009.22.
Plaisant, C., Fekete, J.-D., & Grinstein, G. (2008). Promoting insight-based evaluation of visualizations.
From contest to benchmark repository. IEEE Transactions on Visualization and Computer Graphics,
14(1), 120–134. https://doi.org/10.1109/tvcg.2007.70412.
Pretorius, J., & van Wijk, J. (2005). Multidimensional visualization of transition systems. In E. Banissi, M.
Sarfraz, J. C. Roberts, B. Loften, A. Ursyn, & R. A. Burkhard et al. (Eds.), Proceedings of the ninth
international conference on information visualization (IV’05). London, UK, 06.–08.07.2005 (pp. 1–6).
IEEE Computer Society.
Riehmann, P., Hanfler, M., & Froehlich, B. (2005). Interactive Sankey diagrams. In IEEE symposium on
information visualization (InfoVis). Minneapolis, USA, 23.–25.10.2005 (pp. 233–240). IEEE Com-
puter Society.
Rodden, K. (2014). Applying a sunburst visualization to summarize user navigation sequences. IEEE
Computer Graphics and Applications, 34(5), 36–40. https://doi.org/10.1109/MCG.2014.63.
Satyanarayan, A., Moritz, D., Wongsuphasawat, K., & Heer, J. (2017). Vega-Lite: A grammar of interactive
graphics. IEEE Transactions on Visualization and Computer Graphics, 23(1), 341–350. https://doi.org/10.1109/TVCG.2016.2599030.
Severino, R. (2015). The data visualization catalogue (online blog): Heatmap. Retrieved June 21, 2019
from https://datavizcatalogue.com/methods/heatmap.html.
Shaft, T. M., & Vessey, I. (2006). The role of cognitive fit in the relationship between software comprehension
and modification. MIS Quarterly, 30(1), 29–55.
Shneiderman, B. (1996). The eyes have it. A task by data type taxonomy for information visualization. In
IEEE proceedings, August 14–16, 1996, Blue Mountain Lake, New York (pp. 336–343). Los Alamitos,
CA: IEEE Computer Society Press.
Singh, K., & Best, P. (2019). Anti-money laundering: Using data visualization to identify suspicious activity.
International Journal of Accounting Information Systems, 34, 100418. https://doi.org/10.1016/j.accinf.2019.06.001.
Songer, A. D., Hays, B., & North, C. (2004). Multidimensional visualization of project control data. Con-
struction Innovation, 4(3), 173–190. https://doi.org/10.1108/14714170410815088.
Speier, C. (2006). The influence of information presentation formats on complex task decision-making
performance. International Journal of Human-Computer Studies, 64(11), 1115–1131.
Stab, C., Breyer, M., Nazemi, K., Burkhardt, D., Hofmann, C., & Fellner, D. (2010). SemaSun: Visualization
of semantic knowledge based on an improved sunburst visualization metaphor. In J. Herrington & C.
Montgomerie (Eds.), Proceedings of ED-MEDIA 2010. World conference on educational multimedia,
hypermedia & telecommunications. Toronto, Canada, 29.06.2010. (pp. 911–919). Association for the
Advancement of Computing in Education (AACE).
Stasko, J., & Zhang, E. (2000). Focus + context display and navigation techniques for enhancing radial,
space-filling hierarchy visualizations. In Proceedings of the INFOVIS 2000. IEEE symposium on
information visualization 2000. Salt Lake City, Utah, 09–10.10.2000 (pp. 57–65). ACM SIGGRAPH.
Sweller, J. (2010). Element interactivity and intrinsic, extraneous, and germane cognitive load. Educational
Psychology Review, 22(2), 123–138.
Tufte, E. R. (1983). The visual display of quantitative information (1st ed.). Connecticut: Graphics Press.
van Wijk, J. J. (2005). The value of visualization. In Proceedings of the 2005 IEEE VIS. IEEE visualization.
Minneapolis, MN, USA, 23.–28.10.2005 (pp. 79–86).
van Wijk, J. J. (2013). Evaluation. A challenge for visual analytics. Computer, 46(7), 56–60. https://doi.org/10.1109/mc.2013.151.
Vessey, I., & Galletta, D. (1991). Cognitive fit: An empirical study of information acquisition. Information
Systems Research, 2(1), 63–84.
Wang, L., Wang, G., & Alexander, C. (2015). Big data and visualization. Methods, challenges, and tech-
nology progress. Digital Technologies, 1(1), 33–38. https://doi.org/10.1002/9781119197249.ch1.
Ware, C. (2012). Information visualization. Perception for design (3rd ed.). Oxford: Elsevier Ltd.
Wilkinson, L. (2005). The grammar of graphics (2nd ed.). New York: Springer.
Yi, J. S., Kang, Y. A., Stasko, J., & Jacko, J. (2007). Toward a deeper understanding of the role of interaction
in information visualization. IEEE Transactions on Visualization and Computer Graphics, 13(6),
1224–1231. https://doi.org/10.1109/TVCG.2007.70515.
Yigitbasioglu, O. M., & Velcu, O. (2012). A review of dashboards in performance management: Implications
for design and research. International Journal of Accounting Information Systems, 13(1), 41–59.
Zhou, M. X., & Feiner, S. K. (1998). Visual task characterization for automated visual discourse synthesis. In
Proceedings of the SIGCHI conference on human factors in computing systems. CHI 98. Los Angeles,
C.A., USA, 18.–23.04.1998 (pp. 392–399). ACM.

Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps
and institutional affiliations.