Does Design Matter When Visualizing Big Data?
https://doi.org/10.1007/s00187-020-00294-0
ORIGINAL PAPER
Abstract
The need for good visualization is increasing as data volume and complexity expand.
In order to work with high volumes of structured and unstructured data, visualizations
that support the ability of humans to make perceptual inferences are of the utmost
importance. In this regard, many interactive visualization techniques have been
developed in recent years. However, little emphasis has been placed on the evaluation
of their usability and, in particular, on their design characteristics. This paper contributes
to closing this research gap by measuring the effects of appropriate visualization use
based on data and task characteristics. Further, we specifically test the feature of
interaction, as it is said to be an essential component of Big Data visualizations but
has scarcely been isolated as an independent variable in experimental research. Data
collection for the large-scale quantitative experiment was done using crowdsourcing
(Amazon Mechanical Turk). The results indicate that both choosing an appropriate
visualization based on task characteristics and using the feature of interaction increase
usability considerably.
Lisa Perkhofer (corresponding author)
[email protected]
Conny Walchshofer
[email protected]
Peter Hofer
[email protected]
1 Controlling, Accounting and Finance, University of Applied Sciences Upper Austria, Steyr, Austria
1 Introduction
the interface (Elmqvist et al. 2011). Interacting in this context means using filter or
selection techniques, drilling down to analyze the next layer of a data dimension, or
interchanging data dimensions or value attributes (Perkhofer et al. 2019b; Heer
and Shneiderman 2012). Only if the user is able to work interactively with the dataset
and answer predefined questions, or questions that arise during the process of analysis,
can Big Data visualizations unfold their full potential and new correlations, trends, or
clusters be detected for further use (Perkhofer et al. 2019a, c).
Unlike conventional charts used in everyday life (e.g. line, pie, or bar charts), new
visualizations require a close focus on design and interaction in order to be considered
useful (Liu et al. 2017; Kehrer and Hauser 2013; Elmqvist et al. 2011; Pike et al.
2009; Bačić and Fadlalla 2016). Unfortunately, for both the design and use of new
visualization options and the design and use of interaction, limited empirical research
is available (Isenberg et al. 2013; Perkhofer et al. 2019b). Users still have to go through
cost-intensive and unsatisfying trial-and-error routines to identify best practices
instead of being able to rely on empirical evidence (van Wijk 2013). This led us to
identify two concrete and pressing questions in the current literature, addressed in this
study:
(1) Appropriate use of new visualization types: Depending on data and task
characteristics, some visualization types are claimed to outperform others when
it comes to optimal decision support. However, these claims are mostly based
on their developers' opinions or on small-scale user studies rather than on
experimental research (Isenberg et al. 2013; Perkhofer 2019). As multiple options to
visualize Big Data are available, we limit the scope of this study to visualizations
for multidimensional data. This is because it is impossible for traditional forms
to show more than three attributes or three dimensions at the same time within
one visualization, which, we think, highlights the importance and need of Big Data
visualizations and demonstrates their benefits. Further, as a starting point to
investigate Big Data visualizations, we chose four frequently cited and actively used
visualization types (for details see Table 1), namely the sunburst visualization,
the Sankey visualization, the parallel coordinates plot, and the polar coordinates
plot (Bertini et al. 2011; Keim 2002; Shneiderman 1996). We wanted to investigate
whether one particular visualization type can outperform the others based on the
three tasks identify, compare, and summarize (classification based on Brehmer and
Munzner 2013), using two different perspectives on the dataset (multiple
hierarchy levels vs. multiple attribute comparisons).
(2) Appropriate use of interaction techniques: Pike et al. claim that interaction “has
not been isolated as an experimental variable” yet, hindering direct causal
interpretation of this highly discussed and frequently used visual analytics feature
(Pike et al. 2009, p. 272). This is because most user studies concentrate on the
visualization itself, while interaction is added as an integrated feature incorporated
into the source code of the visual representation. Visualizations can be used and
tested without interaction (as a static form); interaction, however, does not work
without the visualization itself (Kehrer and Hauser 2013). “Exactly what kind of
degree of benefit is realized by allowing a user to interact with visual
representation is still undetermined.” (Pike et al. 2009, p. 272). Consequently, to
answer this claim we isolate the effect of interaction and evaluate the difference
between an almost static versus a highly interactive visualization.
and presented to potential users (in literature and free libraries such as D3.js¹) is
necessary. We limit our investigation to frequently cited and open-source visualization
options (Perkhofer et al. 2019a, b), as the evaluation of all Big Data visualization
options goes beyond the scope of this paper.
For classification, we distinguish between two features: the type of data that
can be represented in the proposed visualization (1a. multiple dimensions but only
one attribute² → hierarchical visualization vs. 1b. multiple attributes but only one
dimension → multi-attribute visualization) and the basic layout (2a. Polar- or 2b.
Cartesian-coordinates based visualizations). A summary of the identified visualizations
is presented in Table 1. (Please note that the table does not claim to be exhaustive,
but should rather be seen as an indicator of frequently used or proposed visualization
methods for multidimensional datasets, which is the selection criterion for our
empirical analysis.)
Based on this summary of highly cited and used visualization types, we can conclude
that both a mix of Polar- and Cartesian-coordinates based visualizations and a
mix of hierarchical and multi-attribute based ones are common. From this pool of
options, we picked the most frequently cited pair of each category for comparison.
For a better understanding, each visualization type is explained in more detail in the
following:
The sunburst visualization (Polar-coordinates based layout and hierarchical data
structure): The sunburst visualization is one of the more frequently used visualization
types compared to other and newer forms of visualizations (Perkhofer et al. 2019b).
It projects the multiple dimensions of the dataset, in a hierarchically dependent manner,
onto rings and can therefore be classified as a Polar-coordinates based visualization
option. The sunburst is a compact and space-filling presentation and shows the
respective proportion of the total value held by each dimension and its sub-components
through angular size (Rodden 2014). Due to the strict structure of a sunburst, the
innermost ring represents the highest hierarchical level, and all dimensions dependent
on it are represented in further rings toward the outside (Keim et al. 2006). The position
of the rings influences interpretation; re-positioning these dimensions (using
another sequence of dimensions for the display of the rings) therefore yields other and
new valuable insights. Additionally, based on the Vega-Lite specification, categorical
color scales are used to encode discrete data values, each representing a distinct
category, and sequential single-hue schemes to visually link related data characteristics
(Satyanarayan et al. 2017).
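The angular encoding described above can be sketched in a few lines. The following Python snippet is illustrative only (the category names and values are not from the study's dataset, and `ring_angles` is a hypothetical helper, not part of any sunburst library): each segment of a ring receives an angular extent proportional to its share of the total.

```python
# Sketch: angular extents of one sunburst ring, proportional to value share.
# The category values below are illustrative, not from the study's dataset.

def ring_angles(values):
    """Map each category's value to an angular extent (degrees) so that
    the segments partition the full 360-degree ring proportionally."""
    total = sum(values.values())
    angles = {}
    start = 0.0
    for name, value in values.items():
        extent = 360.0 * value / total   # share of total -> angular size
        angles[name] = (start, start + extent)
        start += extent
    return angles

sales_by_region = {"North": 500, "South": 250, "West": 250}
segments = ring_angles(sales_by_region)
# "North" holds half the total value, so its segment spans 180 degrees.
```

Deeper hierarchy levels would repeat this computation per parent segment, restricting each child ring to its parent's angular range.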
The Sankey visualization (Cartesian-coordinates based layout and hierarchical
data structure): The Sankey chart focuses on sequences of data, which can either be
time-related or dependent on a hierarchical structure (Hofer et al. 2018). It is often used
¹ D3.js (data-driven documents) is a popular JavaScript-based library through which visualization researchers share
and exchange code for new visualization types; many visualization types available in software are first
launched on D3.js.
² An attribute is something measurable like revenues, costs, or contribution margins, while a dimension is
a way of clustering the data for analysis, such as customer groups, product groups, or sales regions.
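The attribute/dimension distinction in footnote 2 can be made concrete with a small aggregation sketch. The records, field names, and the `aggregate` helper below are illustrative assumptions, not the study's actual wine-trade dataset: dimensions serve as grouping keys, attributes as the measures summed within each group.

```python
# Sketch of footnote 2's distinction: dimensions cluster the data,
# attributes are the measured values. Records are illustrative.
from collections import defaultdict

records = [
    {"customer_group": "Retail",    "region": "North", "revenue": 120, "costs": 80},
    {"customer_group": "Retail",    "region": "South", "revenue": 100, "costs": 70},
    {"customer_group": "Wholesale", "region": "North", "revenue": 300, "costs": 210},
]

def aggregate(records, dimension, attribute):
    """Sum one attribute (measure) over the levels of one dimension."""
    totals = defaultdict(float)
    for rec in records:
        totals[rec[dimension]] += rec[attribute]
    return dict(totals)

revenue_by_group = aggregate(records, "customer_group", "revenue")
# {'Retail': 220.0, 'Wholesale': 300.0}
```

A hierarchical visualization fixes one attribute (e.g. revenue) and varies the dimension used for grouping; a multi-attribute visualization fixes the grouping and shows several attributes side by side.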
Table 1 Highly cited and used visualization types identified for multidimensional data
area that appears as soon as all attributes are connected. Unfortunately, areas that
appear at random, depending on the loosely selected order of attributes, misinform
the user. Further, areas are more difficult to compare than straight lines connecting
data points (Kim and Draper 2014), and data points in outer layers cause areas to
appear disproportionately large; because of this distortion, angles are harder to assess
than straight lines (Perkhofer et al. 2019a). These effects have not been tested yet
(Albo et al. 2016).
After presenting the most frequently used visualization options for Big Data, we
discuss possible influencing factors on their ability to encode specific
information and make it accessible to their users. As already explained, each
visualization type has the potential to uncover and present a different type of insight
to its audience (supporting a different task), while at the same time hiding another
(Perkhofer 2019). As theories and experimental research on the process of encoding
for interactive Big Data visualizations are limited, research from standard
business graphics and dashboarding is used for hypothesis development (Falschlunger
et al. 2016a; Speier 2006; Vessey and Galletta 1991; Perkhofer 2019; Yigitbasioglu
and Velcu 2012). The purpose of this approach is to test existing principles for their
applicability to interactive and new forms of visualizations and to shed light on the
process of encoding in order to foster decision-making.
Previous findings have shown that the following factors (explained in more detail
below) influence the ability of the user to successfully decode information given a
chosen visualization option (Perkhofer 2019; Falschlunger et al. 2016a, b; Ware 2012;
Vessey and Galletta 1991; Speier 2006):
(1) the design of the visualization,
(2) the dataset,
(3) the task, and
(4) the decision-maker characteristics (in particular previous experience and knowledge
of reading and interpreting visualizations).
With respect to the design of the visualization, it has been shown that a high
data-ink ratio (Tufte 1983) and the display of coherent information in juxtaposition
(Perkhofer 2019) allow for faster processing of information. Both of these principles
are satisfied by Big Data visualizations, as they are designed to visualize the
full dataset within one coherent visualization. However, a need for discussion can be
identified when choosing a basic layout: visualizations are based on either a polar-
coordinate or a Cartesian-coordinate system, and the basic layout fundamentally changes
the way information needs to be decoded by the user (Rodden 2014). While in polar-
coordinates based visualizations angles need to be assessed, in Cartesian-coordinates
based systems the height of a column or the length of a line needs to be compared.
With respect to standardized business charts, Cartesian-coordinates based
visualizations (scatterplots, line and bar charts) are known to outperform polar-coordinates
based ones (pie charts) (Diehl et al. 2010). However, this result on the most appropriate
layout needs to be re-evaluated for Big Data visualizations, as interactivity might change
results (Albo et al. 2016). Further, the share of polar-based visualizations among the
available and applied visualization tools is quite large and therefore deserves a closer
look. This leads to our first hypotheses:
H1a: The basic layout influences usability of a visualization.
H1b: Cartesian-coordinate based visualization types outperform polar-coordinate
based visualization types.
Next to the design, the underlying dataset influences usability. It is known that data
can only be assessed as long as enough cognitive capacity is available for data
processing (Sweller 2010; Atkinson and Shiffrin 1968; Miller 1956). Otherwise, or more
precisely in a state of information overload, a negative effect on effectiveness, efficiency,
and satisfaction can be identified (Bawden and Robinson 2009; Falschlunger et al.
2016a). It has also been demonstrated that data which is presented in a familiar form
(e.g. known since childhood), or which can be related to already known information
stored in long-term memory, is processed faster and more accurately, as the burden
it poses on working memory is reduced (Perkhofer and Lehner 2019; Atkinson and
Shiffrin 1968).
As presented in Table 1, one needs to distinguish between hierarchical and multi-
attribute visualization types when dealing with multidimensional datasets. While for
hierarchical visualizations only one attribute (e.g. one KPI such as sales) needs to
be evaluated across different levels and compositions of aggregation, for multi-
attribute visualizations multiple attributes need to be processed. For the latter, not
only do the different measures need to be known and understood, but they also have to
be analyzed in reference to each other for new insights to appear. Consequently, multi-
attribute visualizations are said to increase the burden placed on the user (Falschlunger
et al. 2016a), leading to the following hypotheses for our investigation:
H2a: The underlying dataset influences the usability of a visualization.
H2b: Hierarchy based visualizations types outperform multi-attribute based visu-
alization types.
Without question, and as already mentioned multiple times, tasks and insights differ
between visualization types. Matching the visualization to its respective task has
been identified as a main influence in traditional visualization use. It has been shown
that a mismatch increases cognitive load and consequently impairs decision-making
outcomes (Falschlunger et al. 2016a, b; Dilla et al. 2010; Shaft and Vessey 2006;
Speier 2006; Perkhofer 2019). Up to now, the question of tables versus charts has
been extensively tested, resulting in the classification that spatial tasks (looking for
trends, comparisons, etc.) are best supported by spatial visualizations (charts) and
symbolic tasks (looking for specific data points) by symbolic visualizations
(tables) (Vessey and Galletta 1991). With respect to Big Data visualizations, a new
classification of tasks has been established, namely identify (search for a specific data
point), compare (compare two different data points or two different aggregation
levels), and summarize (generate overall insights by looking at the whole dataset)
(Brehmer and Munzner 2013). However, these tasks have not yet been associated with
visualization types or characteristics of visualization types.
Based on the fundamental activities that users have to perform and given the above
presented task type classification, we hypothesize that the task identify is easier to
Visualizations designed to present large amounts of data greatly benefit from the
process of interaction. In particular, the following processes are better supported
by interactive visualizations when confronted with visual analytics tasks: detecting
the expected, discovering the unexpected (generate hypotheses), and drawing data-
supported conclusions (reject or verify hypothesis) (Kehrer and Hauser 2013). To be
more specific, working with interactive visualizations is driven by a particular ana-
lytics task. However, when working interactively, analysis does not end by finding a
proper answer to the initial task, but rather allows the generation and verification of
additional and different hypotheses, which are then called insights (Pike et al. 2009).
These are generated only because the user interactively works with the dataset, and
the process of doing so increases engagement, opportunity, and creativity (Brehmer
and Munzner 2013; Dilla et al. 2010).
As a consequence, visualizations presented in Table 1 are claimed to only become
useful as soon as the user is able to interact with the data. Interaction is of such high
importance, because the actions of a user demonstrate the “discourse the user has
with his or her information, prior knowledge, colleagues and environment” (Pike et al.
2009, p. 273). Further, the sequence of actions is not predefined but rather individual
and dependent on the user. It thereby particularly supports the user’s knowledge base
and perceptual abilities (Dilla et al. 2010; Brehmer and Munzner 2013; Elmqvist
et al. 2011; Dörk et al. 2008; Liu et al. 2017). Consequently, interaction requires
active physical and mental engagement and throughout this process, understanding is
increased and decision-making capabilities are enhanced (Pike et al. 2009; Pretorius
and van Wijk 2005; van Wijk 2005; Wilkinson 2005; Dix and Ellis 1998; Buja et al.
1996; Shneiderman 1996). In a static form, only a general overview is presented
to the user; without the opportunity for interaction, however, hypothesis verification
and further hypothesis generation are extremely limited (Hofer et al. 2018; Liu et al.
2017; Pike et al. 2009; Perkhofer et al. 2019b). This not only frustrates users, but also
contradicts the well-known mantra of visual information seeking: “overview first,
zoom and filter, then details on demand” (Shneiderman 1996). On a more practical
level, interaction allows the user to filter, select, navigate, arrange, or change either the
amount of data or the characteristics of the visual display (for details on the interaction
techniques see Table 20 in the “Appendix”).
Results on the proper use of interaction techniques are limited. Unfortunately, the
studies conducted up to now tend to blur the concept of visualization in
combination with interaction (Pike et al. 2009). However, existing recommendations
predominantly support the use of multiple interaction methods (Rodden 2014). “The
more ways a user can ‘hold’ their data (by changing their form or exploring them from
different angles and via different transformations), the more insight will accumulate”
(Pike et al. 2009, p. 264). By intentionally clicking, scrolling, and filtering the data, the
user gains a deeper understanding of the relations within the given dataset. Interaction
is therefore an essential part of the sense-making process and enhances the user's
processing and sense-making capabilities (Shneiderman 1996). Building on previous
literature, the following hypotheses are presented:
3 Study design
The purpose of this paper is to identify an interactive visualization that presents
multidimensional data effectively and efficiently. Therefore
[Fig. 1 Study design: visualization type as between-subjects factor (sunburst visualization ▲, Sankey visualization ▲, polar coordinates plot, parallel coordinates plot); task type and interaction (yes/no) as within-subjects factors; usage/experience as covariate; usability as dependent variable. ▲ = hierarchical type (x dimensions/1 attribute) vs. multi-attribute type (x attributes/1 dimension); layouts are Polar- or Cartesian-coordinates based.]
We used a self-generated data sample for our study as a basis to compare the different
visualization types. The dataset simulated a wine trade company and consisted of 9961
records, whereby each record represented a customer's order. During construction of
the sample, six finance experts designed key metrics typically used in trade companies
to simulate a close-to-reality example for data exploration. The dataset consisted of
14 dimensions (order number, trader, grape variety, winemaker, state, etc.) and 12
attributes (gross margin, net margin, gross sales, net sales, discounts, gross profit,
shipping costs, etc.) in total. Our dataset can thus be described as structured, with
no inconsistencies or missing values. Users were confronted with a large amount of
data shown within one visualization, including multiple possible dimensions and
attributes, in order to find patterns, trends, or outliers. This allowed the assumption
that confusion and misunderstanding based on the dataset were kept to a minimum (also
confirmed in pre-tests). Each visualization used, without any filters active, showed the
complete underlying dataset of 9961 records.
As already explained in Sect. 2, we tested four distinct visualization types. These four
visualization types can be characterized by two features: the structure of the data
they are capable of displaying (hierarchical data vs. multi-attribute data) and the
overall layout of the visualization (horizontal/Cartesian vs. radial/Polar). Additionally,
interaction is the central component for understanding and working with Big Data
visualization tools. Therefore, by taking a closer look at already existing prototypes
and literature, two interaction concepts per visualization type were designed to
establish comparison and fairness, but also to present each type in the best possible and
most natural way. The visualization types used and their respective interaction
concepts are presented in Tables 21 and 22 in the “Appendix”.
Based on the previously described dataset, statements in accordance with Brehmer
and Munzner's task classification model for Big Data visualizations were created and
presented to the participants for evaluation in randomized order. In the experimental
conditions, participants were asked to assess the statements' truth (examples are
presented in Table 2). Each task type was assessed twice per visualization-interaction
combination.
For assessing the quality of a visualization, the effects on user performance (efficiency
and effectiveness) alone are not sufficient; instead, one needs to measure the
whole concept of usability (Pike et al. 2009; van Wijk 2013). Usability is defined by
ISO 9241-11 and represents a combination of effectiveness, efficiency, and satisfaction
(SAT) in order to present the user with the best possible solution. For effectiveness, we
counted the number of statements answered correctly, while for efficiency, we measured
the time for task execution (logged by LimeSurvey as soon as answers to a given task
Fig. 2 Subtasks to calculate the usability score (steps, formulas, and assumptions):
1. Eliminate outliers and correct direction for efficiency: exclude the 5% percentiles, based on experience in lab experiments (confusion vs. no motivation); adjust the direction of task time to match the direction of the other two variables, effectivity and satisfaction (the higher, the better), by multiplying by −1.
2. Use z-transformation for effectivity, efficiency, and satisfaction, and adjust absolute levels: calculate z-scores ((sample value − mean)/standard deviation); calculate max and min to measure the absolute distance, and use the absolute distance from the largest parameter (efficiency) for level adjustment (details see Table 6).
3. Calculate sum score: add up the values for effectivity, efficiency, and satisfaction.
were submitted). With respect to satisfaction, we collected data not per single task, but
for each visualization and interaction level. Participants had to rate their satisfaction on
a 5-point Likert scale (Question: Please rate your overall level of satisfaction with the
visualization in the figure presented below. Please bear in mind the experimental tasks
when filling out the scale. Answer options: very satisfied—satisfied—neutral—unsat-
isfied—very unsatisfied).
As usability is measured inconsistently throughout the literature, we provide
insights into all three sub-components but also introduce one comprehensive measure
of usability for better readability of results. To do so, we first calculate
z-scores for the components before adjusting the absolute distance between min and
max (the largest distance exists for efficiency; this distance is used to re-calculate
min and max for effectivity and satisfaction). After data transformation, a sum score
is calculated. Figure 2 documents the calculation and Fig. 3 shows details on the
distribution of the used variables.
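The core of this transformation (z-scores per component, sign flip for task time, then a sum score) can be sketched as follows. This is a simplified illustration: the per-participant values are invented, and the min/max level-adjustment step from Fig. 2 is omitted for brevity.

```python
# Sketch of the usability sum score: z-transform each component, flip
# the sign of task time (lower is better), then add the three z-scores.
# Values are illustrative; the min/max level adjustment is omitted.
from statistics import mean, stdev

def z_scores(values):
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

effectivity  = [4, 5, 3, 6]              # statements answered correctly
task_time    = [50.0, 40.0, 70.0, 60.0]  # seconds; direction must be flipped
satisfaction = [3, 4, 2, 5]              # 5-point Likert rating

efficiency = [-t for t in task_time]     # multiply by -1: higher = better

usability = [
    ze + zf + zs
    for ze, zf, zs in zip(z_scores(effectivity), z_scores(efficiency),
                          z_scores(satisfaction))
]
# Participant 3 (low accuracy, slow, unsatisfied) gets the lowest score.
```

Because each component is z-standardized, the usability scores sum to (approximately) zero across participants; the level adjustment in the paper then rescales effectivity and satisfaction to efficiency's absolute range.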
Fig. 3 Adjustment of z-scores to absolute levels of efficiency (panels: z-score before adjustment, z-score after adjustment; for effectivity and satisfaction only the presented scores exist; for efficiency, exact times observed during the study are used for analysis)
visualization in the figure presented below?). Insights on the obtained results are
presented in Table 3.
3.5 Procedure
Before starting the experiment, an introduction served to explain the given dataset and
the procedure of the study. Further, in order to work effectively and efficiently with the
visualizations and the various interaction techniques used, we included a short video
showing the visualization type in detail as well as all possible interaction techniques
(the video was also available throughout the experiment via a link posted in the help
section). For each visualization type and each interaction stage, not only the video but
also a verbal explanation was included throughout the experiment.
After reading the introduction, the attention of the participants was tested with control
questions to ensure quality. Data was used for analysis only if 4 out of 6 questions
were answered correctly. The static and the interactive layout were grouped
and presented to the participants in randomized order (participants started either with
all questions based on the static or with all questions based on the interactive layout).
Tasks within each layout were again presented in randomized order. After completing
the experimental tasks, participants filled out a preference questionnaire concerning
interaction and provided information on their experience with visual analytics and with
the visualization type under investigation. Additionally, demographic information was
collected.
The studies (one per visualization type) were launched on Amazon Mechanical
Turk in June 2018 (Sunburst and Sankey visualization) and in January 2019 (Parallel
Coordinates and Polar Coordinates Plot). Participants were compensated for their
invested time ($10 per participant; the study lasted approximately 45 min) only when
the complete dataset was handed in and no response pattern (e.g. always choosing the
same answer option, choosing “no answer” more than 30% of the time) could be
identified. For data analysis, we
excluded the highest 5% and lowest 5% of task times per visualization to eliminate
outliers. This is motivated by the researchers' previous experience in lab settings: we
could observe extremely low task times when not enough effort was put into solving
the tasks (also leading to poor effectivity), and extremely high task times in cases
of distraction. These steps were necessary to ensure high-quality data, even without
observing participants during the execution of the experimental tasks.
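The two-sided trimming rule described above can be sketched as follows. The times and the `trim_percentiles` helper are illustrative assumptions, not the study's code: the lowest and highest 5% of observations per visualization are dropped before analysis.

```python
# Sketch of the outlier rule described above: drop the fastest and the
# slowest 5% of task times per visualization. Times are illustrative.

def trim_percentiles(times, frac=0.05):
    """Return times with the lowest and highest `frac` share removed."""
    ordered = sorted(times)
    k = int(len(ordered) * frac)   # observations cut per tail
    return ordered[k:len(ordered) - k] if k else ordered

times = [3.0] + [20.0 + i for i in range(38)] + [400.0]  # 40 observations
kept = trim_percentiles(times)
# 5% of 40 = 2 observations per tail; the 3.0s rush and the
# 400.0s distraction both fall in the trimmed tails.
```

With small samples, `int(len * frac)` can round to zero, in which case nothing is trimmed; the study's larger per-visualization samples avoid this edge case.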
3.6 Participants
right target group for our study. In total, we recruited 198 participants resulting in
2376 evaluable task assessments. Details on the participants per study can be found
in Table 4.
4 Results
In the following, results for each visualization type are presented first. This initial
analysis shows whether interaction is a necessary component of a Big Data
visualization (Sect. 4.1). For evaluation, MANCOVA is used to analyze our dependent
variables individually (effectivity, efficiency, and satisfaction) and ANCOVA to
analyze the generated sum score of usability. To check the quality of our results, we also
conducted randomization tests to see whether a random allocation of results shows
a difference in outcome for one of the variables' specifications (number of resamples:
200). All randomization tests showed satisfying results, which are presented in the
“Appendix”. In the second part of this analysis, the effect of task type is analyzed
in more detail (Sect. 4.2).
Table 5 shows descriptive statistics for the variables interaction technique and
visualization type, which are the subjects of the first MANCOVA. While interaction seems
to have a limited effect on effectivity, it seems to have a positive effect on both
efficiency and satisfaction. With respect to the visualization types, unfavorable results
for the polar coordinates plot on the two variables efficiency and satisfaction stand
out. Further, measures of effectivity and satisfaction show excess kurtosis as well as
skewness of around −1, indicating that the measure is a little steeper and somewhat
left-skewed compared to a normal distribution. Response time shows a
higher deviation from normality (skewness: −1.7; kurtosis: 4.0), as is typical for time-
related experiments with no time constraints imposed on the users.
Based on these results, and by taking a closer look at the visualization types used,
we find initial support for our hypotheses 1a and 2a, i.e. that both the layout and
the dataset influence usability. While the layout seems to have a stronger
influence on task time, the dataset has a stronger influence on response accuracy. With
respect to hypothesis 1b, we can support the claim that Cartesian-based visualizations
show a higher usability, especially when looking at the interactive form. For
hypothesis 2b, we find only partial support, as efficiency and satisfaction show a better
performance for hierarchy-based visualizations while task accuracy is higher for multi-
attribute ones. Further, we find initial support for hypotheses 5a and 5b, stating that
interaction indeed has an influence and that usability is higher for interactive than for
static visualizations. The next step is to test whether these findings are significant in
our multivariate linear model.
Table 4 Demographic information on the four different participant groups

Sunburst: N = 52; 624 task assessments; 52% female; mean age 37.9; completed degree: Bachelor 83%, Master 11%, Doctorate 2%, N/A 4%; working experience: yes 94%, no 2%, N/A 4%; visual analytics experience: none 21%, slight 38%, moderate 39%, extreme 2%.

Sankey: N = 49; 588 task assessments; 45% female; mean age 36.1; completed degree: Bachelor 86%, Master 4%, Doctorate 2%, N/A 8%; working experience: yes 88%, no 6%, N/A 6%; visual analytics experience: none 12%, slight 41%, moderate 45%, extreme 2%.

Polar coordinates plot: N = 49; 588 task assessments; 40% female; mean age 39.3; completed degree: Bachelor 82%, Master 16%; working experience: yes 96%, no 0%; visual analytics experience: none 16%, slight 43%.
Table 5 Descriptive statistics—interaction technique per visualization type (means before z-transformation)
We decided to use MANCOVA for the analysis even though Box's test of equality of
covariance matrices is significant (p < 0.001): first, N is high (approx. 500 per
visualization type and 1,000 per interaction concept, static vs. interactive), so the
test may be overly sensitive to violations of equality; second, the groups are roughly
equal in size (between 250 and 270). As a consequence, we report Pillai's Trace and
Wilks' Lambda, as these two are the more conservative measures and minimize type I
error.
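Both reported multivariate statistics can be computed directly from the hypothesis (H) and error (E) sums-of-squares-and-cross-products matrices of the multivariate model. A minimal illustrative sketch in Python (the authors presumably used a statistics package such as SPSS; this only shows the definitions):

```python
import numpy as np

def pillai_trace(H: np.ndarray, E: np.ndarray) -> float:
    """Pillai's trace: V = tr(H @ inv(H + E)); larger values mean a stronger effect."""
    return float(np.trace(H @ np.linalg.inv(H + E)))

def wilks_lambda(H: np.ndarray, E: np.ndarray) -> float:
    """Wilks' lambda: det(E) / det(H + E); smaller values mean a stronger effect."""
    return float(np.linalg.det(E) / np.linalg.det(H + E))

# With no hypothesis effect (H = 0), Pillai's trace is 0 and Wilks' lambda is 1.
p = 3                     # number of dependent variables (e.g. RT, RA, SAT)
E = np.eye(p) * 5.0       # toy error SSCP matrix
H0 = np.zeros((p, p))     # toy "no effect" hypothesis SSCP matrix
print(pillai_trace(H0, E), wilks_lambda(H0, E))
```

As the effect grows relative to the error, Pillai's trace rises towards the number of dependent variables and Wilks' lambda falls towards 0.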
Table 6 shows that the use of interaction has the strongest effect size in this model,
with a partial eta squared (η²) of 0.285. Based on Cohen's (1969) classification, this
effect can be considered strong (η² ≥ 0.138), while the effect of visualization type is
0.091 and therefore medium in size (η² ≥ 0.059). Usage (previous experience with
the particular visualization type) as well as the statistical interaction effect between
visualization type and interaction technique (VisType × Interaction) can be interpreted
as small effects (η² ≥ 0.010). After this initial analysis of the independent variables
and the covariate, a more detailed analysis based on the three dependent variables
follows in Table 7.
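The formula behind partial eta squared and the Cohen thresholds cited above can be written out directly; a small illustrative helper (thresholds exactly as used in the text):

```python
def partial_eta_squared(ss_effect: float, ss_error: float) -> float:
    """Partial eta squared: SS_effect / (SS_effect + SS_error)."""
    return ss_effect / (ss_effect + ss_error)

def cohen_effect_label(eta_sq: float) -> str:
    """Cohen's (1969) benchmarks for (partial) eta squared as cited in the text."""
    if eta_sq >= 0.138:
        return "large"
    if eta_sq >= 0.059:
        return "medium"
    if eta_sq >= 0.010:
        return "small"
    return "negligible"

# The effect sizes reported in the text map onto the benchmarks as follows:
print(cohen_effect_label(0.285))  # interaction
print(cohen_effect_label(0.091))  # visualization type
print(cohen_effect_label(0.012))  # usage covariate
```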
Table 7 shows that none of the independent variables under investigation has a
significant effect on effectivity (response accuracy, RA). However, choosing the right
visualization type has an effect on efficiency (response time, RT) and on satisfaction.
Further, using Big Data visualizations in an interactive form also shows an effect on
efficiency and satisfaction. Our introduced covariate usage (previous experience with
the particular visualization type) influences only satisfaction; no effects can be found
on efficiency and effectivity. These results allow us to confirm hypothesis 4 as well
as 5a.
Drilling further down in our analysis, post hoc Šidák tests indicate that, for all
visualization types, the interactive form shows superior results for response time and
that the polar coordinates plot performs significantly worse than all other visualization
types for Big Data. Regarding satisfaction, we can identify a superior visualization
type (Sankey) as well as an inferior one (polar coordinates). Further, we can again see
that interactive visualization types satisfy participants while static ones seem to
frustrate them. (Note: analysis based on effectivity is not presented, as no significant
between-subjects results could be obtained.)
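The post hoc Šidák procedure used here controls the family-wise error rate by tightening the per-comparison significance level. A minimal sketch of the adjustment (illustrative; the actual comparisons were run in the authors' statistics package):

```python
def sidak_adjusted_alpha(alpha: float, m: int) -> float:
    """Per-comparison significance level that keeps the family-wise
    error rate at `alpha` across m pairwise comparisons."""
    return 1.0 - (1.0 - alpha) ** (1.0 / m)

def sidak_adjusted_p(p: float, m: int) -> float:
    """Sidak-adjusted p-value for one of m pairwise comparisons."""
    return 1.0 - (1.0 - p) ** m

# Example: the 6 pairwise comparisons among 4 visualization types at alpha = 0.05.
m = 4 * 3 // 2
per_comparison_alpha = sidak_adjusted_alpha(0.05, m)
print(per_comparison_alpha)  # roughly 0.0085
```

The Šidák bound is slightly less conservative than the Bonferroni bound (alpha/m), which is why it is a common default for post hoc pairwise tests.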
The detailed analysis in Table 8 allows us to confirm hypothesis 5b (based on
efficiency and satisfaction; no result on effectivity).
Our analysis in Table 9 shows that Cartesian-coordinate-based visualizations
outperform polar-coordinate-based visualizations in terms of efficiency and
satisfaction, while the difference in effectivity is not significant (confirming
hypothesis 1b). Taking a closer look at the dataset used, we find partial support for
our hypothesis 2b: hierarchy-based visualizations perform better in terms of
efficiency and satisfaction, while they perform worse (though only at a p < 0.1
significance level) in terms of effectivity.
After the initial analysis based on MANCOVA, where each dependent variable was
looked at independently, we now investigate how a single score for usability influences
results and interpretation. The following table presents all independent variables and
the used covariate for usage (Table 10).
First, interpretation is easier, as only one dependent variable needs to be considered
during the analysis. However, statistical power and explainability are reduced
considerably. The previously observed strong effects are reduced to medium strength,
and R² is only 0.147, indicating that only about 15% of the variability in the
dependent variable can be explained by our sum score for usability. Nonetheless,
interpretation based on pairwise comparison stays the same: we can derive that
interactive visualizations are superior to static ones for all visualization types tested,
and we can conclude that the polar coordinates plot is inferior to the other
visualization types used in this study (Tables 11, 12).
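The paper does not spell out the exact construction of the usability sum score; a plausible sketch, assuming the three z-transformed dependent variables are combined with response time negated so that higher values consistently mean better usability (the variable names and the equal weighting are assumptions for illustration):

```python
import numpy as np

def zscore(x):
    """Standard z-transformation: zero mean, unit standard deviation."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()

def usability_sum_score(response_accuracy, response_time, satisfaction):
    # z-transform each dependent variable; negate response time because
    # shorter times indicate better usability, then sum the three scores.
    return (zscore(response_accuracy)
            - zscore(response_time)
            + zscore(satisfaction))

# Toy example: participant 0 is accurate, fast, and satisfied.
scores = usability_sum_score([1, 0, 1, 0], [2, 9, 3, 8], [5, 1, 4, 2])
print(scores)
```

By construction the score has mean zero across participants, which matches the z-transformed means reported in the descriptive tables.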
Table 10 Test of between-subject effects (visualization type and interaction technique)

Effect                  Type III sum of squares   F         Sig.    Partial eta squared
Usage                   472.451                   24.472    0.000   0.012
VisType                 2228.604                  38.480    0.000   0.055
Interaction             3295.185                  170.686   0.000   0.079
VisType × Interaction   247.064                   4.266     0.005   0.006
R squared = 0.147
From our analysis in Sect. 4.1, we know that using Big Data visualizations in an
interactive form results in better performance with respect to the two dependent
variables efficiency and satisfaction, which in turn increases usability. As a result,
the analysis based on task type is carried out only for interactive visualization types,
in order to concentrate on identifying the best visualization for specific task types.
Again, we look separately at the dependent variables in a multivariate general linear
model (Sect. 4.2.1) and use a univariate general linear model to evaluate the sum
score for usability (Sect. 4.2.2).
Table 13 (descriptive statistics: task type per visualization type; means before
z-transformation) shows mean values for the obtained data. What we can see from
this analysis is that the visualizations showing multiple attributes seem to outperform
hierarchy-based visualizations for the task type identify (when looking at effectivity),
while for summarize no clear indication of superiority can be found. Regarding the
distribution of the variables, we again see a rather strong deviation from normality
for response time (skewness: −2.2; excess kurtosis: 6.8), while the other two variables
show values of around −1.
Based on these initial results, we can derive that the task type indeed has an influence
on the variables used; the strongest influence can be identified for effectivity. Further,
it seems that our hypothesis 3b can be supported, while there might be little empirical
support for 3c. Nonetheless, for a final conclusion, the statistical analysis in the next
section is necessary.
Again, Box's test of equality of covariance matrices is significant (p < 0.001), but for
this analysis, too, N is high (approx. 250 per visualization type and 350 per task type:
identify, compare, and summarize) and the groups are roughly equal in size (between
84 and 92). Consequently, we report Pillai's Trace and Wilks' Lambda as the more
conservative measures.
Table 14 indicates that the visualization type has an influence in our multivariate
general linear model and, based on the effect size represented by η², the effect can be
classified as medium. Also, the covariate usage shows a significant effect and stands
for an increase in satisfaction along with an increase in previous experience with the
respective visualization type. An effect is also visible for the statistical interaction of
task type and visualization type (VisType × TaskType), while no significance is shown
for task type. Detailed analysis based on the dependent variables is presented in
Table 15.
The more detailed analysis in Table 15 reveals that (as already known from the
analysis in Sect. 4.1) the visualization type influences efficiency as well as
satisfaction, while it has no effect on effectivity. More interestingly, this analysis also
shows that task type influences effectivity (supporting H3a) and that there is a
significant statistical interaction effect between visualization type and task type
(VisType × TaskType).
Table 15 Test of between-subject effects (task type and interactive visualization type)

This interaction effect means that the difference in means varies depending on the
visualization type used; for further insight, pairwise comparison is needed to explain
the connection (Table 16).
Response accuracy, without distinguishing between the visualization types, is
significantly higher for the task type summarize than for the task type identify.
Taking a closer look at the task type identify, it becomes obvious that the
visualization types that show disaggregated data (more attributes and only one
dimension: parallel coordinates and polar coordinates) are superior, while
visualization types that aggregate data based on multiple dimensions (sunburst and
Sankey) are inferior. The opposite is true for the task type compare (although
differences are only significant at a p < 0.1 level), and for the task type summarize
no significant difference between visualization types can be found. Based on these
results, hypothesis 3b can be supported, while 3c cannot.

Table 17 Test of between-subject effects (task types and interactive visualization types)

Effect               Type III sum of squares   F        Sig.    Partial eta squared
Usage                153.233                   7.981    0.005   0.008
VisType              853.638                   14.820   0.000   0.043
TaskType             133.150                   3.467    0.032   0.007
VisType × TaskType   443.878                   3.853    0.001   0.023
R squared = 0.080
With respect to response time and satisfaction, the same results as already presented
in Sect. 4.1 are visible. Overall, polar coordinates are inferior to the other three
visualization types; no significant difference can be found between the sunburst
visualization, the Sankey visualization, and the parallel coordinates plot. For
satisfaction, the Sankey chart shows superior results while the polar coordinates plot
shows the worst outcome.
After analyzing the multivariate model, we again take a look at the univariate model
based on our calculated sum score for usability. The following table shows the obtained
results based on the between-subject effects (Table 17).
As already visible in the first analysis presented in Sect. 4.1.2, the sum score shows
a reduced R² (compared to RT and SAT), and the effect sizes of the independent
variables shrink from medium to small. Pairwise comparison reveals that the Sankey
visualization and the parallel coordinates plot show the highest scores while the polar
coordinates plot shows the worst (significant at p < 0.05). Further, we can again
derive that, irrespective of the visualization type used, the task type identify is more
difficult to perform when Big Data visualizations are presented to the user than the
task type summarize (especially when confronted with hierarchy-based
visualizations) (Table 18).
Fig. 4 Results on the research model. Directed effect size indicated by color and arrow thickness
(→ small, → medium, and → large effect)
form (parallel and polar coordinates) while the task type summarize asks rather for
visualizations that show data in an aggregated form. This is very much in line with
previous research asking for a cognitive fit (Ohlert and Weißenberger 2015; Hirsch
et al. 2015; Vessey and Galletta 1991).
Our results also clearly indicate the need for interaction when working with Big Data
visualizations. The use of interaction has a strong effect on satisfaction and,
additionally, a medium effect on efficiency. Concerning our covariate usage, we
detect a significant influence on satisfaction, while it has no effect on effectivity or
efficiency. Results on all posited hypotheses are presented and summarized in
Table 19.
With respect to the evaluation methods used (multiple dependent variables versus
one sum score), we observed that a lot of explanatory power is lost by looking only
at the sum score of usability instead of analyzing the three dependent variables
effectivity, efficiency, and satisfaction independently. On the other hand, results
based on the single score can be presented more clearly, and with respect to
interpretation, no differences in recommendations to users are visible. From our
perspective, this combined evaluation of the dependent variables and the sum score
gives a clear picture of the effects tested in this study.
Of course, this study also has some limiting factors that need to be discussed and
considered when interpreting the results. The limitations identified also indicate
further research opportunities that can be addressed in future work:
Limited number of visualizations used: As already explained, we analyzed only a
subset of the available visualization options; a huge pool of further possibilities
exists. These range from additional forms of one comprehensive visualization to the
use of small multiples (multiple small visualizations placed in juxtaposition, such as
a parallel coordinates plot matrix or a scatterplot matrix). Although the form of one
comprehensive visualization is of relevance, it would be of special interest to
investigate whether a difference in mental demand and/or usability persists when
using more than one visualization for displaying the dataset. These options need to
be further explored.
Different interaction techniques depending on the visualization type: Results on
all visualization types are directly comparable as they were tested with the same
questions and the same data dimensions or attributes respectively. However, interaction
techniques were used according to common practice, leaving us with different concepts
using a different mix of individual techniques. We decided on this approach to ensure
high external validity, knowing that we might introduce possible limiting factors for
internal validity. We did not want to depart too much from the real world and thereby
introduce simplification for the sake of fulfilling all requirements for a valid scientific
experiment. However, we do not imply that this is the better approach in general, but it
fits our purpose. The most important requirement for this study is that the visualization
and the task are generic but also realistic.
Use of MTurk: While the use of MTurk comes with the advantage of a large pool
of possible participants and fast survey completion rates, there are also some related
limitations. First, during data collection, we as researchers have no control over the
process. Participants could be disturbed or interrupted, drawing away the attention
needed to successfully fulfill the required tasks. Therefore, special attention needs to
be paid to the design of the questionnaire, and quality checks need to be implemented
to separate good responses from bad ones. Second, most workers live in the
United States and India, which might introduce a cultural bias. However, workers
tend to be more educated than the general population and therefore more complex
issues can be posted on MTurk, which was important for our study. Further, specific
characteristics of the workers (e.g. the need for a bachelor’s degree) can be linked to
the posted HIT (human intelligence tasks) in exchange for higher payment. Despite
the possible drawbacks, a comparative study which was posted on MTurk and also
executed in the lab in the context of visual analytics produced comparable results
(Harrison et al. 2014). For this initial stage of our research, we need answers to many
manipulations (design, dataset, correlation type…) and we therefore believe MTurk
to be an appropriate platform.
No information on the information retrieval process: The way information is
retrieved from a visual display gives many indications of design problems. Controlled
experiments using eye-tracking have proven particularly useful in providing insight
into the data retrieval process (Falschlunger et al. 2014, 2016a) and might be able to
shed further light on the specific design issues. We could not make use of a controlled
environment by using the crowdsourcing platform MTurk. Thus, we cannot assume
that all participants have participated under the same conditions, meaning the same
speed of internet, the same display accuracy of the end device, as well as the same
environmental conditions (e.g. a silent surrounding). Further, having enough cognitive
resources is of high importance in order to uncover insight. Measuring cognitive load
directly, for example by relying on physiological measurement methods such as eye-
tracking (Perkhofer and Lehner 2019) or heart rate variability (Hjortskov et al. 2004),
might allow for more reliable results than relying on self-reported data.
Effectivity is measured dichotomously: The use of questions that are either correctly
or incorrectly answered could be the reason why only low effects on effectivity are
visible within the model. Asking different questions that allow the assessment of
effectivity using finer distinctions rather than 0 and 1 could be beneficial to gain
further insight on this measure.
In conclusion, Big Data visualizations make it possible to show a large amount of
data in one comprehensive visualization; however, a special focus needs to be placed
on their design (including an appropriate layout and interaction techniques) as well
as on the task they are supposed to support. The newly presented visualizations
(sunburst, Sankey, parallel coordinates plot, and polar coordinates plot) are
particularly suited to showing transaction-based data. Moreover, management
accounting is an interesting area of application; however, users' lack of experience
leaves them at a disadvantage in terms of interpretation and satisfaction (Perkhofer
et al. 2019b). Without knowledge (or stored schemas in long-term memory) of how
to interpret these visual forms and how to operate them, insight cannot be triggered
(Sweller 2010). This necessitates a detailed focus on user-centered visual and
functional design, a fact that has been largely neglected so far (Isenberg et al. 2013).
This study is a first attempt at closing this gap.
Acknowledgements Open access funding provided by University of Applied Sciences Upper Austria. The
authors of this paper are thankful for the support of Hans-Christian Jetter and Thomas Luger and for the
support of the University of Applied Sciences Upper Austria. The authors further gratefully acknowledge
the Austrian Research Promotion Agency (FFG) Grant #856316 USIVIS: User-Centered Interactive Visu-
alization of “Big Data”.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which
permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give
appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence,
and indicate if changes were made. The images or other third party material in this article are included
in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If
material is not included in the article’s Creative Commons licence and your intended use is not permitted
by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the
copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Appendix
Table 20 continued
Sunburst ● Sankey ◆
Randomization checks are only provided for significant pairwise comparison results
in Sect. 4 (Tables 23, 24, 25, 26, 27, 28, 29, 30):
• RT1: Sunburst—polar coordinates plot
• RT2: Sankey—polar coordinates plot
• RT3: Parallel coordinates plot—polar coordinates plot
• SAT1: Sunburst—Sankey
• SAT2: Sunburst—polar coordinates
• SAT3: Sankey—parallel coordinates
Randomization checks are only provided for significant pairwise comparison results
in Sect. 4 (Tables 31, 32):
• RT1: Static—interactive
• SAT1: Static—interactive
Randomization checks are only provided for significant pairwise comparison results
in Sect. 4 (Table 33):
• RA1: Identify—summarize
References
Abi Akle, A., Yannou, B., & Minel, S. (2019). Information visualisation for efficient knowledge discovery
and informed decision in design by shopping. Journal of Engineering Design, 30(6), 227–253. https://
doi.org/10.1080/09544828.2019.1623383.
Albo, Y., Lanir, J., Bak, P., & Rafaeli, S. (2016). Off the radar. Comparative evaluation of radial visualization
solutions for composite indicators. IEEE Transactions on Visualization and Computer Graphics, 22(1),
569–578. https://doi.org/10.1109/tvcg.2015.2467322.
Anderson, E. W., Potter, K. C., Matzen, L. E., Shepherd, J. F., Preston, G. A., & Silva, C. T. (2011). A user
study of visualization effectiveness using EEG and cognitive load. Computer Graphics Forum, 30(3),
791–800.
Appelbaum, D., Kogan, A., Vasarhelyi, M., & Yan, Z. (2017). Impact of business analytics and enterprise
systems on managerial accounting. International Journal of Accounting Information Systems, 25,
29–44. https://doi.org/10.1016/j.accinf.2017.03.003.
Atkinson, R. C., & Shiffrin, R. M. (1968). Human memory. A proposed system and its control processes.
In K. W. Spence & J. T. Spence (Eds.), The psychology of learning and motivation (Vol. 2, pp. 89–195).
New York: Academic Press.
Bačić, D., & Fadlalla, A. (2016). Business information visualization intellectual contributions: An integra-
tive framework of visualization capabilities and dimensions of visual intelligence. Decision Support
Systems, 89, 77–86. https://doi.org/10.1016/j.dss.2016.06.011.
Barter, R. L., & Yu, B. (2018). Superheat: An R package for creating beautiful and extendable heatmaps for
visualizing complex data. Journal of Computational and Graphical Statistics: A Joint Publication of
American Statistical Association, Institute of Mathematical Statistics, Interface Foundation of North
America, 27(4), 910–922. https://doi.org/10.1080/10618600.2018.1473780.
Bawden, D., & Robinson, L. (2009). The dark side of information. Overload, anxiety and other paradoxes
and pathologies. Journal of Information Science, 35(2), 180–191.
Bertini, E., Tatu, A., & Keim, D. A. (2011). Quality metrics in high-dimensional data visualization. An
overview and systematization. IEEE Transactions on Visualization and Computer Graphics, 17(12),
2203–2212.
Bostock, M., Ogievetsky, V., & Heer, J. (2011). D3: Data-driven documents. IEEE Transactions on Visual-
ization and Computer Graphics, 17(12), 2301–2309.
Brehmer, M., & Munzner, T. (2013). A multi-level typology of abstract visualization tasks. IEEE Transac-
tions on Visualization and Computer Graphics, 19(12), 2376–2385. https://doi.org/10.1109/TVCG.
2013.124.
Bruls, M., Huizing, K., & van Wijk, J. J. (2000). Squarified treemaps. In W. de Leeuw & R. van Liere (Eds.),
Eurographics/IEEE VGTC. With assistance of IEEE computer society. IEEE VGTC symposium on
visualization. Amsterdam, 29–30.05.2000 (pp. 1–10).
Buja, A., Cook, D., & Swayne, D. F. (1996). Interactive high-dimensional data visualization. Journal of
Computational and Graphical Statistics, 5(1), 78. https://doi.org/10.2307/1390754.
Chen, C. L. P., & Zhang, C.-Y. (2014). Data-intensive applications, challenges, techniques and tech-
nologies. A survey on Big Data. Information Sciences, 275, 314–347.
Chengzhi, Q., Chenghu, Z., & Tao, P. (2003). The taxonomy of visualization techniques and systems.
Concerns between users and developers are different. In Proceedings of the Asia GIS 2003. Asia GIS
conference. Wuhan, China, 16.–18.10.2003 (pp. 1–14).
Chou, J.-K., Wang, Y., & Ma, K.-L. (2016). Privacy preserving event sequence data visualization using a
Sankey diagram-like representation. In SIGGRAPH ASIA 2016 symposium on visualization (pp. 1–8).
Macau: ACM.
Claessen, J. H. T., & van Wijk, J. J. (2011). Flexible linked axes for multivariate data visualization. IEEE
Transactions on Visualization and Computer Graphics, 17(12), 2310–2316. https://doi.org/10.1109/
TVCG.2011.201.
Diehl, S., Beck, F., & Burch, M. (2010). Uncovering strengths and weaknesses of radial visualizations–an
empirical approach. IEEE Transactions on Visualization and Computer Graphics, 16(6), 935–942.
https://doi.org/10.1109/TVCG.2010.209.
Dilla, W., Janvrin, D. J., & Raschke, R. (2010). Interactive data visualization. New directions for accounting
information systems research. Journal of Information Systems, 24(2), 1–37. https://doi.org/10.2308/
jis.2010.24.2.1.
Dilla, W. N., & Raschke, R. L. (2015). Data visualization for fraud detection. Practice implications and a
call for future research. International Journal of Accounting Information Systems, 16, 1–22. https://
doi.org/10.1016/j.accinf.2015.01.001.
Dix, A., & Ellis, G. (Eds.) (1998). Starting simple. Adding value to static visualisation through simple
interaction. In AVI ‘98 Proceedings of the working conference on Advanced visual interfaces. L’Aquila,
Italy: ACM New York, NY, USA (AVI ‘98).
Dörk, M., Carpendale, S., Collings, C., & Williamson, C. (2008). VisGets: Coordinated visualization for
web-based information exploration and discovery. IEEE Transactions on Visualization and Computer
Graphics, 14(6), 1205–1212.
Dörk, M., Riche, N. H., Ramos, G., & Dumais, S. (2012). PivotPaths: Strolling through faceted information
spaces. IEEE Transactions on Visualization and Computer Graphics, 18(12), 2709–2719.
Draper, G. M., Livnat, Y., & Riesenfeld, R. F. (2009). A survey of radial methods for information visual-
ization. IEEE Transactions on Visualization and Computer Graphics, 15(5), 759–776. https://doi.org/
10.1109/TVCG.2009.23.
Eisl, C., Losbichler, H., Falschlunger, L., Fischer, B., & Hofer, P. (2012). Reporting design. Status quo und
neue Wege in der Gestaltung des internen und externen Berichtswesens. In C. Eisl, H. Losbichler,
C. Engelbrechtsmüller, M. Büttner, H. Wambach, & A. Schmidt-Pöstion (Eds.), FH Oberösterreich,
KPMG Advisory AG, pmOne AG.
Elmqvist, N., Moere, A. V., Jetter, H.-C., Cernea, D., Reiterer, H., & Jankun-Kelly, T. J. (2011). Fluid
interaction for information visualization. Information Visualization, 10(4), 327–340. https://doi.org/
10.1177/1473871611413180.
Elmqvist, N., Stasko, J., & Tsigas, P. (Eds.) (2007). DataMeadow. A visual canvas for analysis of large-scale
multivariate data. In 2007 IEEE symposium on visual analytics science and technology.
Endert, A., Hossain, M. S., Ramakrishnan, N., North, C., Fiaux, P., & Andrews, C. (2014). The human is the
loop: New directions for visual analytics. Journal of Intelligent Information Systems, 43(3), 411–435.
https://doi.org/10.1007/s10844-014-0304-9.
Falschlunger, L., Eisl, C., Losbichler, H., & Greil, A. (2014). Improving information perception of graphical
displays. An experimental study on the display of column graphs. In V. Skala (Ed.), Proceedings of the
22nd WSCG. Conference on computer graphics, visualization and computer vision (WSCG). Pilsen,
02.–02.06.2014 (pp. 19–26).
Falschlunger, L., Lehner, O., & Treiblmaier, H. (2016a). InfoVis: The impact of information overload
on decision making outcome in high complexity settings. In Proceedings of the 15th annual Pre-
ICIS workshop on HCI research in MIS. SIGHCI 2016. Dublin, 11.12.2016. AIS Electronic Library:
Association for Information Systems (Special Interest Group on Human-Computer Interaction), 1–6,
Paper 3.
Falschlunger, L., Lehner, O., Treiblmaier, H., & Eisl, C. (2016b). Visual representation of information
as an antecedent of perceptive efficiency. The effect of experience. In Proceedings of the 49th
Hawaii international conference on system sciences (HICSS). Koloa, HI, USA, 05.01.2016–08.01.2016
(pp. 668–676). IEEE.
Goes, P. B. (2014). Big data and IS research. MIS Quarterly, 38(3), 3–8.
Grammel, L., Tory, M., & Storey, M. A. (2010). How information visualization novices construct visual-
izations. IEEE Transactions on Visualization and Computer Graphics, 16(6), 943–952.
Harrison, L., Yang, F., Franconeri, S., & Chang, R. (2014). Ranking visualizations of correlation using
Weber’s law. IEEE Transactions on Visualization and Computer Graphics, 20(12), 1943–1952. https://
doi.org/10.1109/TVCG.2014.2346979.
Heer, J., & Shneiderman, B. (2012). Interactive dynamics for visual analysis. Communications of the ACM,
55(4), 45–54. https://doi.org/10.1145/2133806.2133821.
Heinrich, J., Stasko, J., & Weiskopf, D. (2012). The parallel coordinates matrix. In Eurographics conference
on visualization (EuroVis). 33rd annual conference of the European association for computer graphics.
Cagliari, Sardinia, Italy, 13.–18.05.2012 (pp. 1–5). European Association for Computer Graphics.
Henley, M., Hagen, M., & Bergeron, D. (2007). Evaluating two visualization techniques for genome com-
parison. In E. Banissi (Ed.), 11th [IEEE] international conference information visualization. IV 2007
[proceedings] 4–6 July 2007, Zurich, Switzerland. IEEE Computer Society (pp. 1–6). Los Alamitos
Calif., Washington D.C.: IEEE Computer Society; Conference Publishing Services.
Hirsch, B., Seubert, A., & Sohn, M. (2015). Visualisation of data in management accounting reports. Journal
of Applied Accounting Research, 16(2), 221–239. https://doi.org/10.1108/JAAR-08-2012-0059.
Hjortskov, N., Rissén, D., Blangsted, A. K., Fallentin, N., Lundberg, U., & Søgaard, K. (2004). The effect
of mental stress on heart rate variability and blood pressure during computer work. European Journal
of Applied Physiology, 92(1–2), 84–89. https://doi.org/10.1007/s00421-004-1055-z.
Hofer, P., Walchshofer, C., Eisl, C., Mayr, A., & Perkhofer, L. (2018). Sankey, Sunburst & Co. Interactive
Big Data visualizations in usability tests. In L. Nadig & U. Egle (Eds.), Proceedings of CARF 2018.
Controlling, accounting, risk, and finance. CARF Luzern 2018. Luzern, 06.–07.09.2018 (pp. 92–112).
University of Applied Sciences Luzern: Verlag IFZ.
Inselberg, A., & Dimsdale, B. (1990). Parallel coordinates. A tool for visualizing multi-dimensional geom-
etry. In Proceedings of the First IEEE conference on visualization: visualization’ 90. San Francisco,
CA, USA, 23–26 Oct. 1990. (pp. 361–378). IEEE Comput. Soc. Press.
Isenberg, T., Isenberg, P., Chen, J., Sedlmair, M., & Möller, T. (2013). A systematic review on the practice
of evaluating visualization. IEEE Transactions on Visualization and Computer Graphics, 19(12),
2818–2827.
Janvrin, D. J., Raschke, R. L., & Dilla, W. N. (2014). Making sense of complex data using interactive data
visualization. Journal of Accounting Education, 32(4), 31–48. https://doi.org/10.1016/j.jaccedu.2014.
09.003.
Johansson, J., & Forsell, C. (2016). Evaluation of parallel coordinates. Overview, categorization and guide-
lines for future research. IEEE Transactions on Visualization and Computer Graphics, 22(1), 579–588.
https://doi.org/10.1109/tvcg.2015.2466992.
Johnson, B., & Shneiderman, B. (1991). Tree-maps: A space-filling approach to the visualization of
hierarchical information structures. In Proceedings of the 2nd IEEE conference on visualization:
visualization’91 (pp. 284–291). San Diego, CA, USA.
Kanjanabose, R., Abdul-Rahman, A., & Chen, M. (2015). A multi-task comparative study on scatter plots
and parallel coordinates plots. Computer Graphics Forum, 34(3), 261–270. https://doi.org/10.1111/
cgf.12638.
Kehrer, J., & Hauser, H. (2013). Visualization and visual analysis of multifaceted scientific data. A survey.
IEEE Transactions on Visualization and Computer Graphics, 19(3), 495–513. https://doi.org/10.1109/
tvcg.2012.110.
Keim, D. A. (2001). Visual exploration of large data sets. Communications of the ACM, 44(8), 38–44.
https://doi.org/10.1145/381641.381656.
Keim, D. A. (2002). Information visualization and visual data mining. IEEE Transactions on Visualization
and Computer Graphics, 8(1), 1–8. https://doi.org/10.1109/2945.981847.
Keim, D. A., Andrienko, G., Fekete, J.-D., Görg, C., Kohlhammer, J., & Melancon, G. (2008). Visual
analytics. Definition, process, and challenges. In A. Kerren, J. T. Stasko, J. D. Fekete, & C. North
(Eds.), Information visualization. Lecture notes in computer science (Vol. 4950, pp. 154–175). Berlin:
Springer.
Keim, D. A., Mansmann, F., Schneidewind, J., & Schreck, T. (2006). Monitoring network traffic with radial traffic analyzer. In 2006 IEEE symposium on visual analytics science and technology.
Kim, M., & Draper, G. M. (2014). Radial vs. Cartesian revisited. A comparison of space-filling visualizations. In Proceedings of the 7th international symposium on visual information communication and interaction (VINCI'14). Sydney, Australia, 05–08.08.2014 (pp. 196–199). ACM.
Lehmann, D. J., Albuquerque, G., Eisemann, M., Tatu, A., Keim, D., Schumann, H., et al. (2010). Visualisierung und Analyse multidimensionaler Datensätze. Informatik-Spektrum, 33(6), 589–600.
Liu, S., Maljovec, D., Wang, B., Bremer, P.-T., & Pascucci, V. (2017). Visualizing high-dimensional data.
Advances in the past decade. IEEE Transactions on Visualization and Computer Graphics, 23(3),
1249–1268. https://doi.org/10.1109/tvcg.2016.2640960.
Lurie, N. H., & Mason, C. H. (2007). Visual representation: implications for decision making. Journal of
Marketing, 71(1), 160–177.
Mansmann, F., Göbel, T., & Cheswick, W. (2012). Visual analysis of complex firewall configurations. In
D. Schweitzer & D. Quist (Eds.), Proceedings of the ninth international symposium on visualization
for cyber security (VisSec’12). Seattle, Washington, USA, 15.10.2012 (pp. 1–8). ACM.
Miller, G. A. (1956). The magical number seven, plus or minus two. Some limits on our capacity for processing information. Psychological Review, 63(2), 81–97.
Munzner, T. (2014). Visualization analysis and design. AK Peters visualization series (1st ed.). Boca Raton:
CRC Press.
Netzel, R., Vuong, J., Engelke, U., O’Donoghue, S., Weiskopf, D., & Heinrich, J. (2017). Comparative
eye-tracking evaluation of scatterplots and parallel coordinates. Visual Informatics, 1(2), 118–131.
https://doi.org/10.1016/j.visinf.2017.11.001.
Ohlert, C. R., & Weißenberger, B. E. (2015). Beating the base-rate fallacy: An experimental approach on the
effectiveness of different information presentation formats. Journal of Management Control, 26(1),
51–80. https://doi.org/10.1007/s00187-015-0205-2.
Pasch, T. (2019). Strategy and innovation: The mediating role of management accountants and management
accounting systems’ use. Journal of Management Control, 30(2), 213–246. https://doi.org/10.1007/
s00187-019-00283-y.
Perkhofer, L. M. (2019). A cognitive load-theoretic framework for information visualization. In O. Lehner
(Ed.), Proceedings of the 17th conference on finance, risk and accounting perspectives, in Print
(FRAP). Helsinki, 23.–25.09.2019 (pp. 9–25). ACRN Oxford.
Perkhofer, L., Hofer, P., & Walchshofer, C. (2019a). BIG data visualisierungen 2.0. Optimale Gestaltung
und Einsatz neuartiger Visualisierungsmöglichkeiten. In L. Nadig (Ed.), Proceedings of CARF 2019.
Controlling, accounting, risk and finance. CARF Luzern 2019. Luzern, 5.–6.9.2019 (pp. 76–104).
University of Applied Sciences Luzern: Verlag IFZ.
Perkhofer, L. M., Hofer, P., Walchshofer, C., Plank, T., & Jetter, H.-C. (2019b). Interactive visualization of
big data in the field of accounting. Journal of Applied Accounting Research, 5(1), 78. https://doi.org/
10.1108/JAAR-10-2017-0114.
Perkhofer, L., & Lehner, O. (2019). Using gaze behavior to measure cognitive load. In F. Davis, R. Riedl,
J. Vom Brocke, P.-M. Léger, & A. Randolph (Eds.), Information systems and neuroscience. NeuroIS
Retreat 2018. Lecture notes in information systems and organisation, NeuroIS Retreat 2018 (1st ed.,
Vol. 29, pp. 73–83). Berlin: Springer.
Perkhofer, L., Walchshofer, C., & Hofer, P. (2019c). Designing visualizations to identify and assess correla-
tions and trends. An experimental study based on price developments. In O. Lehner (Ed.), Proceedings
of the 17th conference on finance, risk and accounting perspectives (FRAP). Helsinki, 23.–25.09.2019
(pp. 294–340). ACRN Oxford.
Perrot, A., Bourqui, R., Hanusse, N., & Auber, D. (2017). HeatPipe: High throughput, low latency big data heatmap with spark streaming. In Proceedings of the 21st international conference on information visualization (IV 2017). London, UK, 11–14.07.2017 (pp. 1–6). https://hal.archives-ouvertes.fr/hal-01516888/document. Retrieved Dec 2018.
Pike, W. A., Stasko, J., Chang, R., & O’Connell, T. A. (2009). The science of interaction. Information
Visualization, 8(4), 263–274. https://doi.org/10.1057/ivs.2009.22.
Plaisant, C., Fekete, J.-D., & Grinstein, G. (2008). Promoting insight-based evaluation of visualizations.
From contest to benchmark repository. IEEE Transactions on Visualization and Computer Graphics,
14(1), 120–134. https://doi.org/10.1109/tvcg.2007.70412.
Pretorius, J., & van Wijk, J. (2005). Multidimensional visualization of transition systems. In E. Banissi, M.
Sarfraz, J. C. Roberts, B. Loften, A. Ursyn, & R. A. Burkhard et al. (Eds.), Proceedings of the ninth
international conference on information visualization (IV’05). London, UK, 06.–08.07.2005 (pp. 1–6).
IEEE Computer Society.
Riehmann, P., Hanfler, M., & Froehlich, B. (2005). Interactive Sankey diagrams. In IEEE symposium on
information visualization (InfoVis). Minneapolis, USA, 23.–25.10.2005 (pp. 233–240). IEEE Com-
puter Society.
Rodden, K. (2014). Applying a sunburst visualization to summarize user navigation sequences. IEEE
Computer Graphics and Applications, 34(5), 36–40. https://doi.org/10.1109/MCG.2014.63.
Satyanarayan, A., Moritz, D., Wongsuphasawat, K., & Heer, J. (2017). Vega-Lite: A grammar of interactive
graphics. IEEE Transactions on Visualization and Computer Graphics, 23(1), 341–350. https://doi.
org/10.1109/TVCG.2016.2599030.
Severino, R. (2015). The data visualization catalogue—An online blog. Heatmap. Retrieved June 21, 2019 from https://datavizcatalogue.com/methods/heatmap.html.
Shaft, T. M., & Vessey, I. (2006). The role of cognitive fit in the relationship between software comprehension
and modification. MIS Quarterly, 30(1), 29–55.
Shneiderman, B. (1996). The eyes have it. A task by data type taxonomy for information visualization. In Proceedings, August 14–16, 1996, Blue Mountain Lake, New York. New York State Center for Advanced Technology in Computer Applications and Software Engineering (Syracuse University) (pp. 336–343). Los Alamitos, CA: IEEE Computer Society Press.
Singh, K., & Best, P. (2019). Anti-money laundering: Using data visualization to identify suspicious activity.
International Journal of Accounting Information Systems, 34, 100418. https://doi.org/10.1016/j.accinf.
2019.06.001.
Songer, A. D., Hays, B., & North, C. (2004). Multidimensional visualization of project control data. Con-
struction Innovation, 4(3), 173–190. https://doi.org/10.1108/14714170410815088.
Speier, C. (2006). The influence of information presentation formats on complex task decision-making
performance. International Journal of Human-Computer Studies, 64(11), 1115–1131.
Stab, C., Breyer, M., Nazemi, K., Burkhardt, D., Hofmann, C., & Fellner, D. (2010). SemaSun: Visualization
of semantic knowledge based on an improved sunburst visualization metaphor. In J. Herrington & C.
Montgomerie (Eds.), Proceedings of ED-MEDIA 2010. World conference on educational multimedia,
hypermedia & telecommunications. Toronto, Canada, 29.06.2010. (pp. 911–919). Association for the
Advancement of Computing in Education (AACE).
Stasko, J., & Zhang, E. (2000). Focus + context display and navigation techniques for enhancing radial,
space-filling hierarchy visualizations. In Proceedings of the INFOVIS 2000. IEEE symposium on
information visualization 2000. Salt Lake City, Utah, 09–10.10.2000 (pp. 57–65). ACM SIGGRAPH.
Sweller, J. (2010). Element interactivity and intrinsic, extraneous, and germane cognitive load. Educational
Psychology Review, 22(2), 123–138.
Tufte, E. R. (1983). The visual display of quantitative information (1st ed.). Connecticut: Graphics Press.
van Wijk, J. J. (2005). The value of visualization. In Proceedings of the 2005 IEEE visualization conference (VIS 05). Minneapolis, MN, USA, 23–28 Oct. 2005 (pp. 79–86).
van Wijk, J. J. (2013). Evaluation. A challenge for visual analytics. Computer, 46(7), 56–60. https://doi.
org/10.1109/mc.2013.151.
Vessey, I., & Galletta, D. (1991). Cognitive fit: An empirical study of information acquisition. Information
Systems Research, 2(1), 63–84.
Wang, L., Wang, G., & Alexander, C. (2015). Big data and visualization. Methods, challenges, and tech-
nology progress. Digital Technologies, 1(1), 33–38. https://doi.org/10.1002/9781119197249.ch1.
Ware, C. (2012). Information visualization. Perception for design (3rd ed.). Oxford: Elsevier Ltd.
Wilkinson, L. (2005). The grammar of graphics (2nd ed.). New York: Springer.
Yi, J. S., Kang, Y. A., Stasko, J., & Jacko, J. (2007). Toward a deeper understanding of the role of interaction
in information visualization. IEEE Transactions on Visualization and Computer Graphics, 13(6),
1224–1231. https://doi.org/10.1109/TVCG.2007.70515.
Yigitbasioglu, O. M., & Velcu, O. (2012). A review of dashboards in performance management: Implications
for design and research. International Journal of Accounting Information Systems, 13(1), 41–59.
Zhou, M. X., & Feiner, S. K. (1998). Visual task characterization for automated visual discourse synthesis. In
Proceedings of the SIGCHI conference on human factors in computing systems. CHI 98. Los Angeles,
CA, USA, 18.–23.04.1998 (pp. 392–399). ACM.
Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps
and institutional affiliations.