Introduction
MIDS W209: Information Visualization

Partially based on slides from Tamara Munzner

What we are going to learn

  • Infovis definition
  • Why should we visualize?
  • What are insights?
  • Problem abstraction
  • Visualization tools landscape

Definitions

Defining Information Visualization (vis)

Computer-based visualization systems provide visual representations of datasets designed to help people carry out tasks more effectively.

Why?

Have the human in the loop

Visualization is suitable when there is a need to augment human capabilities rather than replace people with computational decision-making methods.

Three requirements


  • Users
  • Data
  • Tasks

A good visualization enables users to complete tasks effectively on the data.

When not to use vis

You don't need vis when a fully automatic solution exists and is trusted.

But...

  • Many analysis problems are ill-specified.
    • You don't know exactly what questions to ask in advance.

What vis allows for

  • Long-term use for end users (e.g., exploratory analysis of scientific data)
  • Presentation of known results
  • Stepping stone to better understanding of requirements before developing models
  • Helps developers of automatic solutions refine and debug them and determine parameters
  • Helps end users of automatic solutions verify results and build trust

Why use an external representation?

Computer-based visualization systems provide visual representations of datasets designed to help people carry out tasks more effectively.


External representation: replace cognition with perception

[Cerebral: Visualizing Multiple Experimental Conditions on a Graph with Biological Context. Barsky, Munzner, Gardy, and Kincaid. IEEE TVCG (Proc. InfoVis) 14(6):1253-1260, 2008.]

Why use a computer in the loop?

Computer-based visualization systems provide visual representations of datasets designed to help people carry out tasks more effectively.


  • Go beyond human patience
  • Scale to large datasets
  • Support interactivity

Why depend on vision?

Computer-based visualization systems provide visual representations of datasets designed to help people carry out tasks more effectively.


  • The human visual system is a high-bandwidth channel to the brain
    • Overview possible due to background processing
    • Subjective experience of seeing everything simultaneously
    • Significant processing occurs in parallel and pre-attentively
  • Sound: lower bandwidth and different semantics
    • Overview not supported
    • Subjective experience of a sequential stream
  • Touch/haptics: impoverished record/replay capacity
    • Only very low-bandwidth communication thus far
  • Taste, smell: no viable record/replay devices

Why show data in detail?

  • Summaries lose information.
    • Confirm expected patterns and find unexpected ones.
    • Assess the validity of a statistical model.
University Of California at Berkeley logo

Visualization Design Space

Idioms

An idiom is a distinct approach to creating or manipulating visual representations.

Think about...

In how many ways can you visualize the numbers 13 and 23?

Idiom design space

The design space of possible vis idioms is huge and includes the considerations of both how to create and how to interact with visual representations.

Idioms

  • How to draw it: visual encoding idiom
    • Many possibilities for how to create
  • How to manipulate it: interaction idiom
    • Even more possibilities
    • Make a single idiom dynamic
    • Link multiple idioms together through interaction

Why focus on tasks and effectiveness?

Computer-based visualization systems provide visual representations of datasets designed to help people carry out tasks more effectively.


  • Tasks serve as a constraint on design (as do data)
    • Idioms do not serve all tasks equally!
    • Challenge: recast tasks from domain-specific vocabulary to abstract forms
  • Most possibilities are ineffective
    • Validation is necessary, but tricky
    • Understanding the full space of possibilities increases the chance of finding good solutions

What counts as effective?

  • Novel: enable entirely new kinds of analysis
  • Faster: speed up existing workflows

Resource limitations

Vis designers must take into account three very different kinds of resource limitations: those of computers, of humans, and of displays.

Computational limits

  • Processing time
  • System memory

Human limits

  • Human attention
  • Memory
  • Retention

Display limits

  • Pixels are a precious resource, the most constrained one
  • Information density: ratio of space used to encode information vs. unused white space
    • Trade-off between clutter and wasted space: find the sweet spot between dense and sparse
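The information density ratio can be made concrete with a toy calculation. This is only a crude proxy under a stated assumption: the chart is represented as a hypothetical grid of single-character pixels, and every non-background pixel counts as "used", including axes and labels that encode no data. The function name and grid format are illustrative, not from any library.

```python
"""Crude information-density proxy for a rendered chart (hypothetical pixel-grid format)."""


def information_density(grid: list[str], background: str = ".") -> tuple[float, float]:
    """Return (used / total, used / unused) for a grid of single-character pixels.

    Assumption: any non-background pixel counts as "used". Real charts also contain
    non-data ink (axes, labels), so this overestimates how much space encodes data.
    """
    total = sum(len(row) for row in grid)
    used = sum(1 for row in grid for px in row if px != background)
    unused = total - used
    fraction = used / total if total else 0.0
    ratio = used / unused if unused else float("inf")
    return fraction, ratio


# Two hypothetical 4x8 "charts": one sparse, one dense.
sparse = ["........", "..#.....", "........", ".....#.."]
dense = ["##.##.##", "##.##.##", "##.##.##", "##.##.##"]

print(information_density(sparse))
print(information_density(dense))
```

With these made-up grids, the dense one uses 75% of its pixels (a used-to-unused ratio of 3.0) and the sparse one about 6%; neither number says whether the display is cluttered, which is the judgment the trade-off above asks for.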

Why should we visualize?

The purpose of visualization is insight, not pictures

The purpose of data science is insight, not (just) models

Anscombe's quartet

      I            II            III           IV
     x      y      x      y      x      y      x      y
  10.0   8.04   10.0   9.14   10.0   7.46    8.0   6.58
   8.0   6.95    8.0   8.14    8.0   6.77    8.0   5.76
  13.0   7.58   13.0   8.74   13.0  12.74    8.0   7.71
   9.0   8.81    9.0   8.77    9.0   7.11    8.0   8.84
  11.0   8.33   11.0   9.26   11.0   7.81    8.0   8.47
  14.0   9.96   14.0   8.10   14.0   8.84    8.0   7.04
   6.0   7.24    6.0   6.13    6.0   6.08    8.0   5.25
   4.0   4.26    4.0   3.10    4.0   5.39   19.0  12.50
  12.0  10.84   12.0   9.13   12.0   8.15    8.0   5.56
   7.0   4.82    7.0   7.26    7.0   6.42    8.0   7.91
   5.0   5.68    5.0   4.74    5.0   5.73    8.0   6.89

Property (shared by all four datasets)                  Value
Mean of x                                               9
Sample variance of x                                    11
Mean of y                                               7.50
Sample variance of y                                    4.125
Correlation between x and y                             0.816
Linear regression                                       y = 3.00 + 0.500x
Coefficient of determination of the linear regression   0.67

Anscombe's quartet visualized

More examples, same statistics

https://dabblingwithdata.wordpress.com/2017/05/03/the-datasaurus-a-monstrous-anscombe-for-the-21st-century/

Datasaurus!

https://dabblingwithdata.wordpress.com/2017/05/03/the-datasaurus-a-monstrous-anscombe-for-the-21st-century/

Visual Analytics

How do people do data science?

Traditional

  • Query for known patterns
  • Display results using traditional techniques

Pros:
  • Many solutions
  • Easier to implement

Cons:
  • Can’t search for the unexpected

Data Mining/ML

  • Based on statistics
  • Black box approach
  • Output outliers and correlations
  • Human out of the loop

Pros:
  • Scalable

Cons:
  • Analysts have to make sense of the results
  • Makes assumptions about the data

InfoVis

  • Visual interactive interfaces
  • Human in the loop

Pros:
  • Visual bandwidth is enormous
  • Experts decide what to search for
  • Identify unknown patterns and errors in the data

Cons:
  • Scalability can be an issue

In Infovis, we look for insights

  • Deep understanding
  • Meaningful
  • Non-obvious
  • Actionable
  • Based on data

An insight is:

  • Something that the user can learn from the data using the infovis
  • That she didn't know or expect
  • That is useful or needed for her
  • That she can leverage

Insights

Tweetometro

Task: Twitter behavior during presidential elections

User: me

http://old.tweetometro.co

Normal tweets

Line chart of the number of retweets over time for all collected tweets, during the first two hours

Weird tweets?

Suspicious tweets that jumped to many retweets within a single second
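Once a visualization has revealed a pattern like this, it can be turned into a query for a known pattern. The sketch below is hypothetical and is not how Tweetometro was built: it buckets each tweet's retweet timestamps into whole seconds and flags tweets whose busiest second exceeds a threshold. The tweet IDs, timestamps, and threshold are all made up.

```python
"""Flag tweets whose retweets spike within a single second (hypothetical data)."""
from collections import Counter


def max_retweets_per_second(timestamps: list[float]) -> int:
    """Largest number of retweets that fall into the same whole-second bucket."""
    per_second = Counter(int(t) for t in timestamps)  # floor to whole seconds (t >= 0)
    return max(per_second.values(), default=0)


def flag_bursts(retweets: dict[str, list[float]], threshold: int) -> list[str]:
    """Return IDs of tweets with more than `threshold` retweets in some one-second bucket."""
    return sorted(
        tweet_id
        for tweet_id, ts in retweets.items()
        if max_retweets_per_second(ts) > threshold
    )


# Hypothetical retweet times, in seconds since each tweet was posted.
retweets = {
    "organic": [3.2, 15.0, 41.7, 90.1, 300.4, 1200.9],
    "suspicious": [5.0 + i / 100 for i in range(80)] + [600.0],  # 80 retweets in second 5
}
print(flag_bursts(retweets, threshold=20))
```

With this made-up data only `suspicious` is flagged. Fixed one-second buckets can split a burst across a boundary; a sliding window would be more robust at the cost of a little more code.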

Creation dates

Number of followers


What car to buy?

User: person buying a car

Task: What's the best car to buy?

Data: all cars on sale

Normal procedure

Ask friends and family

Problem

That's inferring statistics from a sample of size n = 1

Better approach

Data-based decisions

https://tucarro.com

Jeep Willys

  • Colombia bought many Jeeps after the war
  • They serve as a sort of mountain taxi
  • There is a trend to pimp them up
Colombian Jeep Willys. Photo by John Alexis Guerra Gómez

Problem Abstraction

What/Why/How

  • What is visualized?
    • data abstraction
  • Why is the user looking at it?
    • task abstraction
  • How is it visualized?
    • idiom: visual encoding and interaction

Abstract language avoids domain-specific pitfalls
What/Why/How lets us navigate the design space systematically

Abstraction Example

SpaceTree

http://www.cs.umd.edu/hcil/spacetree/


TreeJuxtaposer

https://www.cs.ubc.ca/~tmm/papers/tj/



Abstraction in Action

The what/why/how framework allows us to make productive comparisons of different vis idioms. A couple of examples:

  • Pie chart vs bar chart
  • Word clouds vs word shift graphs

The classic: pie chart vs bar chart.

You've seen both. Asked in isolation, "which one is better?" is an ill-formed question: there is no single correct answer. We need the framework:

  • What: tabular data.
  • Idiom: bar or pie chart.
    • How (encode bar chart): line marks; value attribute on aligned vertical position; key attribute on horizontal position.
    • How (encode pie chart): area marks (wedges); value attribute on angle.
  • Why: Identify one: the largest group.
  • Why: Identify one: the 2nd/3rd/4th largest group (rank them).
  • Why: Compare values.
  • Why: Identify part-to-whole relationships: is a group 1/2, 1/3, or 1/4 of the whole?
Pie vs bar chart (image courtesy of Wikipedia).
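The "How" rows above translate directly into a declarative grammar. A minimal sketch, assuming the Vega-Lite v5 JSON format (the format Altair generates under the hood): the two specs share the same hypothetical data and differ only in mark and encoding. The category names and values are made up; either printed spec can be pasted into the Vega-Lite online editor to render it.

```python
"""Same data, two idioms: Vega-Lite v5 specs for a bar chart and a pie chart."""
import json

SCHEMA = "https://vega.github.io/schema/vega-lite/v5.json"

# Hypothetical table: one key attribute (category) and one value attribute.
DATA = [
    {"category": "A", "value": 28},
    {"category": "B", "value": 25},
    {"category": "C", "value": 24},
    {"category": "D", "value": 23},
]


def bar_spec(values):
    """Bar chart: value on aligned vertical position, key on horizontal position."""
    return {
        "$schema": SCHEMA,
        "data": {"values": values},
        "mark": "bar",
        "encoding": {
            "x": {"field": "category", "type": "nominal"},
            "y": {"field": "value", "type": "quantitative"},
        },
    }


def pie_spec(values):
    """Pie chart: wedge (arc) marks, value on angle (theta); the key needs color."""
    return {
        "$schema": SCHEMA,
        "data": {"values": values},
        "mark": "arc",
        "encoding": {
            "theta": {"field": "value", "type": "quantitative"},
            "color": {"field": "category", "type": "nominal"},
        },
    }


print(json.dumps(bar_spec(DATA), indent=2))
print(json.dumps(pie_spec(DATA), indent=2))
```

With near-equal values such as these, ranking B, C, and D is typically easier from aligned bar heights than from wedge angles, which is the "rank them" task above; the part-to-whole question is where the pie has more to offer.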

Word clouds vs word shift graph

You've seen a word cloud, but maybe not a word shift graph; let's look at one first.

The same data on a word cloud

Word cloud.
  • What: Weighted average measure over words.
  • Idiom: Word cloud vs word shift graph.
    • How (encode word cloud): text marks; value on size; spatial position and color random.
    • How (encode word shift graph): text and line marks; shared value on horizontal position; value rank on vertical position; key type on line color.
  • Why: Enjoy.
  • Why: Identify one, the largest word.
  • Why: Identify one, the 2nd/3rd/4th largest word (rank them).
  • Why: Compare distribution of words.
  • Why: Summarize features.
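Both idioms start from the same weighted average over words, such as average happiness. One common formulation of a word shift (treat the exact formula as an assumption here) writes the difference between a comparison text and a reference text as a sum of per-word contributions, (h_i - h_ref)(p_i,comp - p_i,ref), where h_i is the word's score, h_ref is the reference text's weighted average, and p_i is the word's relative frequency in each text. The sketch computes those contributions and ranks them by magnitude, which is what the graph encodes on vertical position. The scores and counts are hypothetical, and every word is assumed to have a score.

```python
"""Per-word contributions behind a word shift graph (hypothetical scores and counts)."""


def weighted_average(scores, counts):
    """Frequency-weighted average score of a text, given word counts."""
    return sum(scores[w] * c for w, c in counts.items()) / sum(counts.values())


def word_shift(scores, ref_counts, comp_counts):
    """Return (word, contribution) pairs sorted by |contribution|, largest first.

    contribution_i = (h_i - h_ref) * (p_i_comp - p_i_ref). Because both frequency
    distributions sum to 1, the contributions add up to
    weighted_average(comp) - weighted_average(ref).
    """
    vocab = set(ref_counts) | set(comp_counts)
    n_ref = sum(ref_counts.values())
    n_comp = sum(comp_counts.values())
    h_ref = weighted_average(scores, ref_counts)
    contributions = {
        w: (scores[w] - h_ref)
        * (comp_counts.get(w, 0) / n_comp - ref_counts.get(w, 0) / n_ref)
        for w in vocab
    }
    return sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Hypothetical word scores (e.g., happiness on a 1-9 scale) and word counts.
scores = {"love": 8.4, "happy": 8.3, "the": 5.0, "hate": 2.3, "sad": 2.4}
ref_counts = {"love": 10, "happy": 5, "the": 70, "hate": 5, "sad": 10}
comp_counts = {"love": 20, "happy": 10, "the": 60, "hate": 5, "sad": 5}

for word, contribution in word_shift(scores, ref_counts, comp_counts):
    print(f"{word:>6} {contribution:+.4f}")
```

With these made-up counts, `love` has the largest contribution, while `the`, the most frequent word and therefore the biggest word in a word cloud of the same data, contributes almost nothing to the shift.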

Practice

The visualization tool space

Partially based on slides from Jeff Heer

Choose your tool

  • D3
  • Vega-Lite
  • Altair
  • Tableau

Web Development Basics

https://observablehq.com/@observablehq/introduction-to-html

Introduction to Observable

Introduction to D3

https://observablehq.com/@d3/learn-d3

Introduction to Vega-Lite

https://observablehq.com/@uwdata/introduction-to-vega-lite

Introduction to Python and Altair

https://altair-viz.github.io/getting_started/overview.html

Interview with Jake VanderPlas, Part 1

Introduction to Tableau

https://www.tableau.com/learn/get-started/creator

What we learned

  • Infovis definition
  • Why should we visualize?
  • What are insights?
  • Problem abstraction
  • Visualization tools landscape