
CSE512 DataAndImageModels

1) The document discusses data and image models for visualization: data types, taxonomies for organizing data, and how data is mapped to visual representations.
2) Common data types discussed include nominal, ordinal, interval, and ratio variables. Relational and statistical data models are also described.
3) The document explains how data properties such as dimensions and measures influence how data can be aggregated and analyzed visually.


CSE512 :: 9 Jan 2014

Data and Image Models

Jeffrey Heer, University of Washington

1
Last Time:
Value of Visualization

2
The Value of Visualization
Record information
Blueprints, photographs, seismographs, …
Analyze data to support reasoning
Develop and assess hypotheses
Discover errors in data
Expand memory
Find patterns
Communicate information to others
Share and persuade
Collaborate and revise

3
Marey’s sphygmograph [from Braun 83]

4
Make a decision: Challenger

Visualizations drawn by Tufte show how low temperatures damage O-rings [Tufte 97]

5
“to affect thro’ the Eyes what we fail to convey to the public through their word-proof ears”
1856 “Coxcomb” of Crimean War Deaths, Florence Nightingale

6
Info-Vis vs. Sci-Vis?

7
Visualization Reference Model
[Figure: Raw Data → (Data Transformations) → Data Tables → (Visual Encodings) → Visual Structures → (View Transformations) → Views; stages grouped under Data, Visual Form, and Task]

8
Data and Image Models

9
The Big Picture
[Figure: data is mapped to an image by a visual encoding, guided by the task.
 Data: physical type (int, float, etc.); abstract type (nominal, ordinal, etc.); domain (metadata, semantics, conceptual model)
 Image: processing (algorithms); visual channel (retinal variables); visual metaphor]

10
Topics
Properties of data
Properties of images
Mapping data to images

11
Data

12
Data models vs. Conceptual models
Data models are low-level descriptions of the data
 Math: Sets with operations on them
 Example: integers with + and × operators

Conceptual models are mental constructions


 Include semantics and support reasoning

Examples (data vs. conceptual)


 (1D floats) vs. Temperature
 (3D vector of floats) vs. Space

13
Taxonomy (?)
1D (sets and sequences)
Temporal
2D (maps)
3D (shapes)
nD (relational)
Trees (hierarchies)
Networks (graphs)
Are there others?
The eyes have it: A task by data type taxonomy for information visualization [Shneiderman 96]

14
Types of variables
Physical types
 Characterized by storage format
 Characterized by machine operations
Example: bool, short, int32, float, double, string, …

Abstract types
 Provide descriptions of the data
 May be characterized by methods/attributes
 May be organized into a hierarchy
Example: plants, animals, metazoans, …

15
Nominal, Ordinal and Quantitative
N - Nominal (labels)
 Fruits: Apples, oranges, …
O - Ordered
 Quality of meat: Grade A, AA, AAA
Q - Interval (Location of zero arbitrary)
 Dates: Jan 19, 2006; Location: (LAT 33.98, LONG -118.45)
 Like a geometric point: values cannot be compared directly (e.g., as ratios)
 Only differences (i.e., intervals) may be compared
Q - Ratio (zero fixed)
 Physical measurement: Length, Mass, Temp (Kelvin), …
 Counts and amounts
 Like a geometric vector, origin is meaningful
S. S. Stevens, On the theory of scales of measurement, 1946

16
Nominal, Ordinal and Quantitative
N - Nominal (labels)
 Operations: =, ≠
O - Ordered
 Operations: =, ≠, <, >
Q - Interval (Location of zero arbitrary)
 Operations: =, ≠, <, >, -
 Can measure distances or spans
Q - Ratio (zero fixed)
 Operations: =, ≠, <, >, -, %
 Can measure ratios or proportions

S. S. Stevens, On the theory of scales of measurement, 1946
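The operation sets nest: each level permits everything the previous level does plus one more operation. A minimal Python sketch that transcribes the slide into a hypothetical `ALLOWED_OPS` table, writing the ratio operation (shown as % on the slide) as "/":

```python
# Hypothetical transcription of the slide: operations meaningful at each level.
# "/" stands for taking a ratio (written "%" on the slide).
ALLOWED_OPS = {
    "N": {"=", "!="},
    "O": {"=", "!=", "<", ">"},
    "Q-interval": {"=", "!=", "<", ">", "-"},
    "Q-ratio": {"=", "!=", "<", ">", "-", "/"},
}
LEVELS = ["N", "O", "Q-interval", "Q-ratio"]

def is_meaningful(level: str, op: str) -> bool:
    """Return True if `op` is a meaningful operation on data of this level."""
    return op in ALLOWED_OPS[level]

# Each level's operations are a strict subset of the next level's.
for lower, higher in zip(LEVELS, LEVELS[1:]):
    assert ALLOWED_OPS[lower] < ALLOWED_OPS[higher]

print(is_meaningful("Q-interval", "-"))  # True: e.g., days between two dates
print(is_meaningful("Q-interval", "/"))  # False: "2006 is twice 1003" is meaningless
```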

17
From data model to N,O,Q data type

Data model
 32.5, 54.0, -17.3, …
 floats
Conceptual model
 Temperature (°C)
Data type
 Burned vs. Not burned (N)
 Hot, warm, cold (O)
 Continuous range of values (Q)

18
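The same floats can be read at each level, depending on how the conceptual model is used. A minimal Python sketch; the cutoffs (50 °C for "burned", 10 °C and 30 °C between cold, warm, and hot) are hypothetical values chosen only for illustration:

```python
readings_c = [32.5, 54.0, -17.3]  # data model: plain floats

# Hypothetical cutoffs, for illustration only.
BURN_THRESHOLD_C = 50.0

def as_nominal(t: float) -> str:
    """N: unordered labels."""
    return "burned" if t >= BURN_THRESHOLD_C else "not burned"

def as_ordinal(t: float) -> str:
    """O: ordered bins, cold < warm < hot."""
    if t < 10.0:
        return "cold"
    if t < 30.0:
        return "warm"
    return "hot"

def as_quantitative(t: float) -> float:
    """Q: keep the continuous value (degrees Celsius form an interval scale)."""
    return t

for t in readings_c:
    print(t, as_nominal(t), as_ordinal(t), as_quantitative(t))
```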
Sepal and petal lengths and widths for three species of iris [Fisher 1936].
19
Q
O
N

20
Relational data model
Represent data as a table (relation)
Each row (tuple) represents a single record
Each record is a fixed-length tuple
Each column (attribute) represents a single variable
Each attribute has a name and a data type
A table’s schema is the set of names and data types

A database is a collection of tables (relations)
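These definitions can be mirrored in a few lines of Python: a schema as an ordered list of (name, type) pairs and a table as a list of fixed-length tuples. This is a hypothetical sketch (the `Table` class and its checks are illustrative, not any database's API); the two rows are taken from the census sample shown later in this lecture:

```python
from dataclasses import dataclass, field

@dataclass
class Table:
    """A relation: a schema (attribute names and types) plus rows (tuples)."""
    schema: list  # list of (attribute name, Python type) pairs
    rows: list = field(default_factory=list)

    def insert(self, row: tuple) -> None:
        # Each record is a fixed-length tuple whose values match the schema types.
        if len(row) != len(self.schema):
            raise ValueError(f"expected {len(self.schema)} values, got {len(row)}")
        for value, (name, typ) in zip(row, self.schema):
            if not isinstance(value, typ):
                raise TypeError(f"{name} must be {typ.__name__}")
        self.rows.append(row)

    def column(self, name: str) -> list:
        # Each column (attribute) holds the values of a single variable.
        idx = [n for n, _ in self.schema].index(name)
        return [row[idx] for row in self.rows]

# A database is a collection of named tables.
schema = [("year", int), ("age", int), ("marst", int), ("sex", int), ("people", int)]
database = {"census": Table(schema)}
database["census"].insert((1850, 0, 0, 1, 1483789))
database["census"].insert((1860, 0, 0, 1, 2120846))
print(database["census"].column("people"))
```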

21
Relational Algebra [Codd]
 Data transformations (SQL)
 Projection (select)
 Selection (where)
 Sorting (order by)
 Aggregation (group by, sum, min, …)
 Set operations (union, …)
 Combine (inner join, outer join, …)

22
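Each operation in the list corresponds to a SQL clause. A small sketch using Python's built-in sqlite3 module; the `sales` and `origin` tables and their values are made up for illustration:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE sales (fruit TEXT, region TEXT, qty INTEGER);
    INSERT INTO sales VALUES ('apple', 'east', 5), ('apple', 'west', 3),
                             ('orange', 'east', 2), ('pear', 'west', 4);
    CREATE TABLE origin (fruit TEXT, country TEXT);
    INSERT INTO origin VALUES ('apple', 'US'), ('orange', 'ES');
""")

def q(sql):
    return con.execute(sql).fetchall()

projection = q("SELECT fruit FROM sales")                      # keep some columns
selection = q("SELECT * FROM sales WHERE region = 'east'")      # keep some rows
sorting = q("SELECT fruit, qty FROM sales ORDER BY qty DESC")   # order rows
aggregation = q("SELECT fruit, SUM(qty) FROM sales GROUP BY fruit ORDER BY fruit")
union = q("SELECT fruit FROM sales UNION SELECT fruit FROM origin ORDER BY fruit")
inner = q("SELECT s.fruit, s.qty, o.country FROM sales s "
          "JOIN origin o ON s.fruit = o.fruit ORDER BY s.qty")
outer = q("SELECT s.fruit, o.country FROM sales s "
          "LEFT OUTER JOIN origin o ON s.fruit = o.fruit ORDER BY s.fruit")
```

Note the naming clash: SQL's SELECT column list performs projection, while relational selection is the WHERE clause. The outer join keeps 'pear' with a NULL (None) country because it has no match in `origin`.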
Statistical data model
Variables or measurements
Categories or factors or dimensions
Observations or cases

23
Statistical data model
Variables or measurements
Categories or factors or dimensions
Observations or cases
Month    Control  Placebo  300 mg  450 mg
March    165      163      166     168
April    162      159      161     163
May      164      158      161     153
June     162      161      158     160
July     166      158      160     148
August   163      158      157     150
Blood Pressure Study (4 treatments, 6 months)
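In this table, month and treatment are the categories (dimensions), the blood pressure readings are the measurements, and each cell is one observation. A minimal Python sketch that unpivots the wide table into one (month, treatment, value) record per observation; the variable names are illustrative:

```python
treatments = ["Control", "Placebo", "300 mg", "450 mg"]
wide = {
    "March":  [165, 163, 166, 168],
    "April":  [162, 159, 161, 163],
    "May":    [164, 158, 161, 153],
    "June":   [162, 161, 158, 160],
    "July":   [166, 158, 160, 148],
    "August": [163, 158, 157, 150],
}

# Unpivot: one observation per (month, treatment) pair.
observations = [
    (month, treatment, value)
    for month, values in wide.items()
    for treatment, value in zip(treatments, values)
]

print(len(observations))  # 6 months x 4 treatments = 24
print(observations[0])    # ('March', 'Control', 165)
```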
24
Dimensions and Measures
Dimensions: Discrete variables describing data
 Dates, categories of values (independent vars)
Measures: Data values that can be aggregated
 Numbers to be analyzed (dependent vars)
 Aggregate as sum, count, average, std. deviation
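The split can be sketched as "group by the dimension, aggregate the measure". A minimal Python example on hypothetical records ("region" as the dimension, "sales" as the measure), using the standard statistics module:

```python
import statistics
from collections import defaultdict

# Hypothetical records: "region" is a dimension, "sales" is a measure.
records = [
    {"region": "east", "sales": 10.0},
    {"region": "east", "sales": 14.0},
    {"region": "west", "sales": 7.0},
    {"region": "west", "sales": 9.0},
    {"region": "west", "sales": 11.0},
]

def aggregate(rows, dimension, measure):
    """Group rows by a dimension and summarize the measure in each group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[dimension]].append(row[measure])
    return {
        key: {
            "sum": sum(vals),
            "count": len(vals),
            "average": statistics.mean(vals),
            "std": statistics.pstdev(vals),  # population standard deviation
        }
        for key, vals in groups.items()
    }

summary = aggregate(records, "region", "sales")
```

The dimension only serves as a grouping key; summing or averaging it (e.g., adding up region names or years) would not produce a meaningful value.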

25
Example: U.S. Census Data
People: # of people in group
Year: 1850 – 2000 (every decade)
Age: 0 – 90+
Sex: Male, Female
Marital Status: Single, Married, Divorced, …

26
Example: U.S. Census

People
Year
Age
Sex
Marital Status

2348 data points

27
Census: N, O, Q?
People Count Q-Ratio
Year Q-Interval (O)
Age Q-Ratio (O)
Sex (M/F) N
Marital Status N

28
Census: Dimension or Measure?
People Count Measure
Year Dimension
Age Depends!
Sex (M/F) Dimension
Marital Status Dimension

29
Roll-Up and Drill-Down
Want to examine marital status in each decade?
Roll-up the data along the desired dimensions
SELECT year, marst, sum(people)
FROM census
GROUP BY year, marst;
(Dimensions: year, marst. Measure: sum(people).)

30
Roll-Up and Drill-Down
Need more detailed information?
Drill-down into additional dimensions

SELECT year, age, marst, sum(people)
FROM census
GROUP BY year, age, marst;
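Both queries run as written in SQLite. A minimal sketch using Python's sqlite3 module on a tiny census table; the rows, counts, and marital status codes are hypothetical (only the column names follow the slides), and ORDER BY is added so the output order is deterministic:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE census (year INT, age INT, marst INT, sex INT, people INT)")
# Hypothetical counts; marst codes are illustrative (e.g., 1 = single, 2 = married).
con.executemany(
    "INSERT INTO census VALUES (?, ?, ?, ?, ?)",
    [
        (1990, 20, 1, 1, 100), (1990, 20, 1, 2, 120), (1990, 40, 1, 1, 50),
        (1990, 40, 2, 1, 300), (1990, 40, 2, 2, 280),
        (2000, 20, 1, 1, 110), (2000, 40, 2, 2, 290),
    ],
)

# Roll-up: aggregate away age and sex, keeping year and marital status.
rollup = con.execute(
    "SELECT year, marst, sum(people) FROM census "
    "GROUP BY year, marst ORDER BY year, marst"
).fetchall()

# Drill-down: add age back as a grouping dimension for more detail.
drilldown = con.execute(
    "SELECT year, age, marst, sum(people) FROM census "
    "GROUP BY year, age, marst ORDER BY year, age, marst"
).fetchall()
```

Summing the drill-down rows over age recovers the roll-up rows, e.g., 220 + 50 = 270 single people in 1990.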

31
[Figure: data cube of Year (1970, 1980, 1990, 2000) × Age (0-19, 20-39, 40-59, 60+) × Marital Status (Single, Married, Divorced, Widowed). Its faces show sums along Marital Status (“All Marital Status”), along Age (“All Ages”), and along Year (“All Years”).]

32
[Figure: the same data cube of Year × Age × Marital Status with its summed faces (“All Marital Status”, “All Ages”, “All Years”), plus arrows labeled Roll-Up and Drill-Down between the full cube and the summed views.]

33
YEAR  AGE  MARST  SEX  PEOPLE
1850    0      0    1  1,483,789
1850    5      0    1  1,411,067
1860    0      0    1  2,120,846
1860    5      0    1  1,804,467
...

AGE  MARST  SEX  1850       1860       ...
  0      0    1  1,483,789  2,120,846  ...
  5      0    1  1,411,067  1,804,467  ...
...
Which format might we prefer?
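The first layout has one row per (year, age, marst, sex) record; the second pivots YEAR into columns. A minimal pure-Python sketch of that pivot, using the four rows shown in the first table (`pivot_years` is a hypothetical helper name):

```python
# Long format: (year, age, marst, sex, people), as in the first table.
long_rows = [
    (1850, 0, 0, 1, 1483789),
    (1850, 5, 0, 1, 1411067),
    (1860, 0, 0, 1, 2120846),
    (1860, 5, 0, 1, 1804467),
]

def pivot_years(rows):
    """Pivot to wide format: key (age, marst, sex) -> {year: people}."""
    wide = {}
    for year, age, marst, sex, people in rows:
        wide.setdefault((age, marst, sex), {})[year] = people
    return wide

wide = pivot_years(long_rows)
for key, by_year in wide.items():
    print(*key, by_year.get(1850), by_year.get(1860))
```

One trade-off: the long form keeps a fixed schema no matter how many years exist and suits GROUP BY queries, while the wide form puts values for different years side by side for direct comparison.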
34
Row vs. Column-Oriented Databases

35
Relational Data Organizations
Transactions vs. Analysis
Row-oriented Column-oriented

36
Relational Data Organizations
Row-oriented Column-oriented

37
Relational Data Organizations
Column-oriented: speed up analysis
 Reduce data transfer
 Improved locality
 Data compression
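A toy illustration of the idea, not of how any particular database engine works: the same hypothetical table stored as row tuples and as column lists. In the column layout, an aggregate over one attribute reads just that list, and a sorted, repetitive column compresses well, here with simple run-length encoding:

```python
from itertools import groupby

rows = [  # row-oriented: each record stored together (hypothetical values)
    (1850, "M", 100),
    (1850, "F", 110),
    (1860, "M", 120),
    (1860, "F", 130),
]

# Column-oriented: each attribute stored together.
columns = {
    "year": [r[0] for r in rows],
    "sex": [r[1] for r in rows],
    "people": [r[2] for r in rows],
}

# In a row store, summing one attribute still reads whole records.
total_from_rows = sum(r[2] for r in rows)
# In a column store, the scan reads only the "people" column.
total_from_columns = sum(columns["people"])

def run_length_encode(values):
    """Compress runs of repeated values into (value, run length) pairs."""
    return [(v, len(list(g))) for v, g in groupby(values)]

encoded_years = run_length_encode(columns["year"])
```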

38
Administrivia

39
Announcements
Auditors
 Requirements: Come to class and participate (online as well)

Class participation requirements
 Complete readings before class
 In-class discussion
 Post at least one substantive discussion comment/question on Piazza within 24 hours after each lecture (by 11am the next day)

40
Assignment 1: Visualization Design
Design a static visualization for a given data set.

Deliverables (submit via Catalyst)


 Image of your visualization
 Short description and design rationale (≤ 4 para.)

Due by 5:00pm on Monday 1/13.

41
Questions?

42
Image

43
44
Visual language is a sign system

Images perceived as a set of signs


Sender encodes information in signs
Receiver decodes information from signs

Sémiologie Graphique, 1967

Jacques Bertin

45
Bertin’s Semiology of Graphics
[Figure: three marks A, B, C]
1. A, B, C are distinguishable.
2. B is between A and C.
3. BC is twice as long as AB.
∴ Encode quantitative variables

“Resemblance, order and proportion are the three signifieds in graphics.” - Bertin

46
47
Visual encoding variables
Position (×2: horizontal and vertical)
Size
Value
Texture
Color
Orientation
Shape

48
Visual encoding variables
Position
Length
Area
Volume
Value
Texture
Color
Orientation
Shape
Transparency
Blur / Focus …
49
Information in color and value
Value is perceived as ordered
∴ Encode ordinal variables (O)

∴ Encode continuous variables (Q) [not as well]

Hue is normally perceived as unordered


∴ Encode nominal variables (N) using color
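A minimal sketch of this guideline using Python's standard colorsys module: ordered levels get monotonically changing lightness (value), while unordered categories get evenly spaced hues at a fixed lightness. The lightness range and saturation are arbitrary choices, and the helper names are hypothetical:

```python
import colorsys

def ordinal_to_gray(levels):
    """Map ordered levels to gray lightness from light (first) to dark (last)."""
    n = len(levels)
    # Lightness steps from 0.9 down to 0.2; the range is an arbitrary choice.
    return {
        level: 0.9 - 0.7 * i / max(n - 1, 1)
        for i, level in enumerate(levels)
    }

def nominal_to_hues(categories, lightness=0.5, saturation=0.6):
    """Map unordered categories to evenly spaced hues at equal lightness."""
    n = len(categories)
    return {
        cat: colorsys.hls_to_rgb(i / n, lightness, saturation)
        for i, cat in enumerate(categories)
    }

grays = ordinal_to_gray(["cold", "warm", "hot"])
hues = nominal_to_hues(["apples", "oranges", "pears"])
```

Keeping lightness fixed for the hues avoids implying an order among the nominal categories.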

50
Bertin’s “Levels of Organization”
Position     N O Q
Size         N O Q
Value        N O Q
Texture      N O
Color        N
Orientation  N
Shape        N
N = Nominal, O = Ordered, Q = Quantitative
Note: Q < O < N
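The table can be read as a lookup from data type to usable visual variables. A hypothetical Python transcription (variable names as in Bertin's table) that lists which variables can express a given level:

```python
# Transcription of the levels each visual variable supports, per Bertin's table.
LEVELS = {
    "Position": {"N", "O", "Q"},
    "Size": {"N", "O", "Q"},
    "Value": {"N", "O", "Q"},
    "Texture": {"N", "O"},
    "Color": {"N"},
    "Orientation": {"N"},
    "Shape": {"N"},
}

def variables_for(data_type: str) -> list:
    """Return the visual variables that can encode data of this type."""
    return [var for var, levels in LEVELS.items() if data_type in levels]

print(variables_for("Q"))  # ['Position', 'Size', 'Value']
print(variables_for("O"))  # ['Position', 'Size', 'Value', 'Texture']
```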
51
Design Space of Visual Encodings

52
Univariate data
[Figure: a data table with one variable (row 1) over factors A, B, C; example charts labeled A-D]

53
Univariate data
[Figure: the same table with example charts labeled A-D, including a Tukey box plot marking the low values, the middle 50%, the high values, and the mean]

54
Bivariate data
[Figure: a data table with two variables (rows 1, 2); its cases plotted as labeled points]
Scatter plot is common

55
Trivariate data
[Figure: a data table with three variables (rows 1-3); its cases plotted as labeled points in 3D]
3D scatter plot is possible

56
Three variables
Two variables [x,y] can map to points
 Scatterplots, maps, …
Third variable [z] must use
 Color, size, shape, …

57
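One common way to carry a third variable z on top of an (x, y) scatter plot is marker size. A minimal sketch, assuming circle area (not radius) should be proportional to a non-negative z, so the radius scales with the square root; `encode_points` and `max_radius` are hypothetical names:

```python
import math

def encode_points(data, max_radius=10.0):
    """Map (x, y, z) records to (x, y, radius) marks.

    x and y map to position; z (assumed non-negative) maps to circle area,
    so the radius grows with sqrt(z).
    """
    z_max = max(z for _, _, z in data)
    marks = []
    for x, y, z in data:
        radius = max_radius * math.sqrt(z / z_max) if z_max > 0 else 0.0
        marks.append((x, y, radius))
    return marks

marks = encode_points([(1, 2, 25.0), (3, 1, 100.0), (2, 4, 0.0)])
```

Here z = 25 and z = 100 give radii 5 and 10, so the circle areas differ by a factor of 4, matching the data.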
Large design space (visual metaphors)

[Bertin, Graphics and Graphic Info. Processing, 1981]


58
Multidimensional data
How many variables can be depicted in an image?
[Figure: a data table with eight variables (rows 1-8) over columns A, B, C]

59
Multidimensional data
How many variables can be depicted in an image?
[Figure: a data table with eight variables (rows 1-8) over columns A, B, C]
“With up to three rows, a data table can be constructed directly as a single image … However, an image has only three dimensions. And this barrier is impassable.” - Bertin

60
Deconstructions

61
Playfair 1786

62
Playfair 1786

x-axis: year (Q)


y-axis: currency (Q)
color: imports/exports (N, O)

63
Wattenberg 1998

http://www.smartmoney.com/marketmap/
64
Wattenberg 1998

rectangle size: market cap (Q)


rectangle position: market sector (N), market cap (Q)
color hue: loss vs. gain (N, O)
color value: magnitude of loss or gain (Q)

65
Minard 1869: Napoleon’s march

66
Single axis composition

= [based on slide from Mackinlay]

67
Mark composition

y-axis: temperature (Q)

+ x-axis: longitude (Q) / time (O)

=
temp over space/time (Q x Q)

[based on slide from Mackinlay]

68
Mark composition
y-axis: latitude (Q)

+ x-axis: longitude (Q)

+
width: army size (Q)

=
army position (Q x Q) and army size (Q)

[based on slide from Mackinlay]

69
[Figure: Minard’s map and temperature chart annotated with their encodings: longitude (Q), latitude (Q), army size (Q), temperature (Q), longitude (Q) / time (O)]

[based on slide from Mackinlay]

70
Minard 1869: Napoleon’s march

Depicts at least 5 quantitative variables. Any others?


71
Formalizing Design
(Mackinlay 1986)

72
Choosing Visual Encodings
Challenge:
Assume 8 visual encodings and n data attributes.
We would like to pick the “best” encoding among a
combinatorial set of possibilities with size (n+1)^8
(each of the 8 encodings is either unused or assigned one of the n attributes).
Principle of Consistency:
The properties of the image (visual variables) should
match the properties of the data.
Principle of Importance Ordering:
Encode the most important information in the most
effective way.
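A brute-force check of the (n+1)^8 count for small n: treat each of the 8 encodings as either unused (None) or assigned one of the n attributes, and enumerate every assignment with itertools.product. The attribute names are placeholders:

```python
from itertools import product

NUM_ENCODINGS = 8

def count_designs(attributes):
    """Count assignments where each encoding is unused or shows one attribute."""
    choices = [None] + list(attributes)  # n + 1 options per encoding
    return sum(1 for _ in product(choices, repeat=NUM_ENCODINGS))

n = 2
assert count_designs(["a", "b"]) == (n + 1) ** NUM_ENCODINGS  # 3**8 = 6561
```

This raw count includes designs that show an attribute twice or leave one out entirely, which is why criteria such as expressiveness and effectiveness are needed to narrow the candidates.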

73
Design Criteria (Mackinlay)
Expressiveness
A set of facts is expressible in a visual language if the
sentences (i.e. the visualizations) in the language
express all the facts in the set of data, and only the
facts in the data.

74
Cannot express the facts
A one-to-many (1 → N) relation cannot be expressed in
a single horizontal dot plot because multiple tuples are
mapped to the same position
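The failure can be detected mechanically: in a single horizontal dot plot, a tuple's position is determined by one attribute, so tuples sharing that value collide. A minimal sketch with a hypothetical one-to-many relation (makers and their models) and a hypothetical `collisions` helper:

```python
from collections import defaultdict

# Hypothetical 1 -> N relation: one maker has several models.
relation = [
    ("Maker A", "model 1"),
    ("Maker A", "model 2"),
    ("Maker B", "model 3"),
]

def collisions(tuples, position_of):
    """Return positions onto which more than one tuple is mapped."""
    by_position = defaultdict(list)
    for t in tuples:
        by_position[position_of(t)].append(t)
    return {pos: ts for pos, ts in by_position.items() if len(ts) > 1}

# Dot plot: x-position encodes the maker; the model gets no distinct position.
overlaps = collisions(relation, position_of=lambda t: t[0])
```

Any non-empty result means some facts in the data cannot be read back from the image.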

75
Expresses facts not in the data
A length is interpreted as a quantitative value;
∴ Length of bar says something untrue about N data

[Mackinlay, APT, 1986]


76
Design Criteria (Mackinlay)
Expressiveness
A set of facts is expressible in a visual language if the
sentences (i.e. the visualizations) in the language
express all the facts in the set of data, and only the
facts in the data.

Effectiveness
A visualization is more effective than another
visualization if the information conveyed by one
visualization is more readily perceived than the
information in the other visualization.

77
Mackinlay’s Ranking

Conjectured effectiveness of the encoding


78
Mackinlay’s Design Algorithm
User formally specifies data model and type
 Additional input: ordered list of data variables to show

APT searches over design space


 Tests expressiveness of each visual encoding
 Generates specification for encodings that pass test
 Tests perceptual effectiveness of resulting image

Outputs the “most effective” visualization
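The loop can be outlined in a few lines. This is a heavily simplified, hypothetical sketch, not APT itself: it assigns encodings greedily in the user's order of importance, the expressiveness test is a small N/O/Q lookup, and the per-type effectiveness rankings are placeholder orders rather than Mackinlay's published ranking:

```python
# Hypothetical expressiveness table: data types each encoding can express.
EXPRESSES = {
    "x-position": {"N", "O", "Q"},
    "y-position": {"N", "O", "Q"},
    "size": {"N", "O", "Q"},
    "value": {"N", "O", "Q"},
    "color hue": {"N"},
    "shape": {"N"},
}
# Placeholder effectiveness rankings per data type (best first); illustrative only.
RANKING = {
    "Q": ["x-position", "y-position", "size", "value"],
    "O": ["x-position", "y-position", "value", "size"],
    "N": ["x-position", "y-position", "color hue", "shape", "value", "size"],
}

def design(variables):
    """Greedily assign encodings to (name, type) pairs in order of importance."""
    used, spec = set(), {}
    for name, dtype in variables:
        for enc in RANKING[dtype]:
            # Skip encodings already taken or unable to express this data type.
            if enc not in used and dtype in EXPRESSES[enc]:
                spec[name] = enc
                used.add(enc)
                break
        else:
            raise ValueError(f"no expressive encoding left for {name}")
    return spec

spec = design([("year", "Q"), ("people", "Q"), ("sex", "N"), ("marital status", "N")])
```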

79
Limitations
Does not cover many visualization techniques
 Bertin and others discuss networks, maps, diagrams
 Does not consider 3D, animation, illustration, photography, …

Does not model interaction

Does not consider semantic data types / conventions

80
Summary
Formal specification
 Data model
 Image model
 Encodings mapping data to image

Choose expressive and effective encodings


 Formal test of expressiveness
 Experimental tests of perceptual effectiveness

81