Link to slides:
http://opr.princeton.edu/workshops/Downloads/2015Jan_StatisticalGraphicsConsiderationsKoffman.pdf
iterative process
ted - see anscombe's quartet
slides 13 and 14
(It's pretty famous.)
haiti is almost always an exception, you need to know that (!!!)
slide 17: how do you decide what is the scale of presentation
the way it is the pattern is really magnified, how do you decide?
you need to think of your audience
slide 21
life expectancy in haiti is 62
canada is 81
this line should be 75 percent of canada
these people had no intention of distorting or confusing anyone
in a bar graph the length of the bar should be proportional to the value it represents
there is a way to change the title and make the visualization accurate
the visualization does not represent what the title says
what would be a better title?
number of years of life expectancy above age 60?
that is what is really being shown here
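a quick check of the proportions for slide 21, as a minimal sketch. the two life expectancies are from the notes above; the baseline of 60 is an assumption taken from the suggested title, not something read off the slide.

```python
# Life expectancies quoted in the notes.
haiti = 62
canada = 81


def bar_ratio(a, b, baseline=0):
    """Ratio of drawn bar lengths when the axis starts at `baseline`."""
    return (a - baseline) / (b - baseline)


# Bars drawn from zero: lengths are proportional to the values.
honest = bar_ratio(haiti, canada)

# Bars drawn from an assumed baseline of 60 (hypothetical, see lead-in).
truncated = bar_ratio(haiti, canada, baseline=60)

print(f"from 0: {honest:.2f}, from 60: {truncated:.2f}")
```

with a zero baseline the haiti bar is roughly three quarters of the canada bar; with the assumed baseline it shrinks to a small fraction of it.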
slide 22
this is awful
this is more accurate
but it is an ugly boring graph
don't make this graph
slide 23
this is fine because there are no proportions here
when you look at bar graphs you will see so many that don't start at 0 and make this mistake
slide 24
political graphs
campaign political graphs do this all the time
slide 25
from an academic journal
i tried to anonymize it
this is the same problem as the haiti canada graph of life expectancy
north should be 3/4s of central, and this is an academic journal
s26
lines need interval scales for slopes to be meaningful
the slope here means nothing, nominal values
this level of error is seen rarely in academic journals
s27
but this is seen a lot
this is not interval scale numeric data
1900-1980
you need to make this an interval scale
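a minimal sketch of why the x axis needs an interval scale. the years and values are hypothetical, not the slide's data; the point is only that slopes differ when unevenly spaced years are treated as evenly spaced categories.

```python
# Hypothetical observations at unevenly spaced years (not from the slide).
years = [1900, 1950, 1970, 1980]
values = [10.0, 20.0, 24.0, 26.0]


def slopes(x, y):
    """Slope of each line segment between consecutive points."""
    return [(y[i + 1] - y[i]) / (x[i + 1] - x[i]) for i in range(len(x) - 1)]


# Years treated as evenly spaced categories (positions 0, 1, 2, 3).
as_categories = slopes(list(range(len(years))), values)

# Years treated as numbers on an interval scale.
as_interval = slopes(years, values)

print("change per category step:", as_categories)
print("change per year:", as_interval)
```

in this made-up series the rate of change per year is constant, but the category version draws a line that appears to flatten.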
s29
the intersection point is wherever
two scales showing two variables on the same graph
lots of people put them together
to avoid confusion, as you said, you would put the general pattern for this and that
keep them separate and to my mind they are more accurate
the intersection point or not makes the situation worse
s31
exact same data
the curve goes above the bars in one and doesn't reach anything in the other
you can put the bar anywhere
making that a dual scale graph makes it a picture and a point that is not really there
what if you wanted to answer the question: in 2012 is chinese wage growth higher or lower than us wage growth?
if you put them on the same scale
and then, yes
s32
ggplot2
you can't make graphs with dual scales
people say they need it
i believe that plots with separate y scales are fundamentally flawed
you can't map it back to the data space
easily manipulated to mislead
they are arbitrary
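a small sketch of what "arbitrary" means for dual scales, under made-up numbers (not the slide's data). the helper names are hypothetical. the same line ends up entirely above or entirely below the bars depending only on the limits chosen for the right axis.

```python
def to_screen(value, lo, hi):
    """Map a data value to a 0..1 vertical position on an axis spanning lo..hi."""
    return (value - lo) / (hi - lo)


# Hypothetical series in different units.
bars = [100, 120, 140]   # left axis, fixed at 0..200
line = [3.0, 3.5, 4.0]   # right axis, limits chosen freely


def line_above_bars(right_hi):
    """True if every line point is drawn above its bar for a right axis of 0..right_hi."""
    bar_pos = [to_screen(v, 0, 200) for v in bars]
    line_pos = [to_screen(v, 0, right_hi) for v in line]
    return all(l > b for l, b in zip(line_pos, bar_pos))


for right_hi in (4, 10):
    print(f"right axis 0..{right_hi}: line above all bars? {line_above_bars(right_hi)}")
```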
s33
when displaying odds ratios
measured on a logarithmic scale
they should be shown on an odds scale
the distance from 1 to 2 should match the distance from 1 to .5
there is lots of disagreement here
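a quick numeric check of the distance claim, as a sketch: on a log axis, doubling and halving the odds are the same distance from 1; on a linear axis they are not.

```python
import math


def log_distance(a, b):
    """Distance between two odds ratios on a log-scaled axis."""
    return abs(math.log(a) - math.log(b))


doubling = log_distance(1, 2)    # odds ratio of 2
halving = log_distance(1, 0.5)   # odds ratio of 0.5

print("log axis:", doubling, halving)
print("linear axis:", abs(2 - 1), abs(1 - 0.5))
```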
s34
using a linear scale
to show odds ratios
that is incorrect and less accurate
should odds ratios be graphed using a log scale?
controversial
s36
see reference and think about it
s37
visual representation of the data value
bubbles as magnitude population
the radius and not the area corresponds to magnitude
the way people perceive the size of the circle is by area
us and canada: us looks 40 times larger
if we map to the area
then, the circle for the us is about 9 times larger
up until recently in ggplot2, the default was to map the size to the radius
in the recent release, that was one of the changes
if you don't specify the scale, it will map the size to the area of the bubble
and that is how most of us perceive it
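a minimal sketch of the radius vs. area mapping, using two hypothetical values (one is 4 times the other) rather than the population figures from the slide.

```python
import math


def circle_area(radius):
    """Area of a circle with the given radius."""
    return math.pi * radius ** 2


# Hypothetical values: b is 4 times a.
a, b = 1.0, 4.0

# Value mapped to radius: the drawn area ratio is the square of the value ratio.
radius_mapped = circle_area(b) / circle_area(a)

# Value mapped to area: use radius = sqrt(value), so the area ratio equals the value ratio.
area_mapped = circle_area(math.sqrt(b)) / circle_area(math.sqrt(a))

print("area ratio when mapping to radius:", radius_mapped)
print("area ratio when mapping to area:", area_mapped)
```

since people read bubbles by area, the radius mapping exaggerates the larger value.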
s38
overplotting obscures your story (it is not inaccurate)
there are lots of things you can play with
the point shape and transparency etc.
s39
i adjusted the point size to reduce the amount of overplotting
s40
open circles rather than closed circles
s41
open circles tiny
s42
changed shape, plus signs
s43
point transparency
the points are half as dark
and then you can see the overplotting more
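a rough sketch of why transparency reveals overplotting, assuming standard "over" alpha compositing (an assumption about the renderer; the slides do not say how the plot was drawn). the function name is hypothetical.

```python
def stacked_opacity(alpha, n):
    """Opacity where n points with the same alpha overlap ('over' compositing)."""
    return 1 - (1 - alpha) ** n


# With points at half opacity, denser piles get visibly darker.
for n in (1, 2, 4, 8):
    print(n, "overlapping points ->", round(stacked_opacity(0.5, n), 3))
```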
s44
another thing i like to do is to stratify the data
if they are overplotted, i want to see if there is a pattern that emerges
i can stratify with different variables
s45
use point jittering
it shows that a point is concealing multiple points
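a minimal sketch of jittering with hypothetical data: small uniform noise is added so that coincident points separate. the width and seed are illustrative choices, not values from the slides.

```python
import random


def jitter(values, width=0.2, seed=0):
    """Return a copy of values with uniform noise in [-width, +width] added to each."""
    rng = random.Random(seed)
    return [v + rng.uniform(-width, width) for v in values]


# Hypothetical data: five observations plotted at the same x position.
xs = [3, 3, 3, 3, 3]
jittered = jitter(xs)

print("distinct positions before:", len(set(xs)))
print("distinct positions after:", len(set(jittered)))
```

the noise should stay small relative to the spacing of the data, since the jittered positions are no longer the true values.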
s46
we are going to move to how to make clear comparisons
there is always a comparison
this point here
i have been told by students
and this is the most important information
when you make a graph, do that for the true quantity you want people to see
people graph the data they have and make the audience do the work
do you want to compare magnitudes
or the difference between the magnitude of a and b
if you only graph a and b, you are making your reader do the subtraction
if you have to write about gaps think about this
what is the real point of interest
your graphs will be better
it could be the ratio, the absolute difference or what a and b are
this is a highlight
this will really change the way you think about what you want to put on a graph
make sure the data is easily seen
points large enough, contrast, hidden data
show data not just summary measures
think about what type of graph you will use based on this scale from cleveland
length vs angle vs color
consider proximity, alignment and ordering
move things so that they are right next to each other
s48
number one point
famous graph on imports and exports from england
if you want to focus on the trade imbalance
that is what you should graph
do the subtraction: it is hard to do in your head and see
do you want them to see all three, that is also fine
if the important point is the difference, graph the difference
the difference between curves is hard to perceive
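a sketch of "do the subtraction" with hypothetical numbers (not the figures from the slide): compute the balance first, then that series is what gets graphed.

```python
# Hypothetical yearly series, for illustration only.
years = [1700, 1720, 1740, 1760, 1780]
imports = [70, 80, 85, 75, 90]
exports = [60, 75, 90, 95, 120]

# The quantity of interest is the trade balance, so compute it directly.
balance = [e - i for e, i in zip(exports, imports)]

for year, b in zip(years, balance):
    print(year, b)
```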
s50
the y axis is using a log scale
black vs white wages
the title is about gaps
the only way you can see the gap is to look inside with 17x to 8x
we are at the log scale so the difference looks minimal
they are showing you the wage for whites and blacks
and not the gap which is what this is about
s51
the bars are proportional
we don't have an accuracy problem
how can i tell what is a big difference?
in some of them the task is not that long
here is a unified graph
s52
dot plots
connect a stereo
this, where data is unified, is easier to get the big picture from
task data improvement
is that what i want to see
that is going to show me the number of minutes
some of the tasks are longer than others
maybe the number of minutes is not what i am interested in
s53
how about percent improvement
some of these revised instructions led to a greater percent improvement
this is a really good example of graphing what you want your reader to take away
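a small sketch of minutes saved vs. percent improvement. the task names and times are hypothetical, not the data behind the slide.

```python
# Hypothetical task times in minutes: (original instructions, revised instructions).
tasks = {
    "task A": (40.0, 30.0),
    "task B": (10.0, 5.0),
}


def percent_improvement(before, after):
    """Time saved as a percentage of the original time."""
    return 100.0 * (before - after) / before


for name, (before, after) in tasks.items():
    saved = before - after
    print(name, "minutes saved:", saved, "percent:", percent_improvement(before, after))
```

in this made-up example the longer task saves more minutes, but the shorter task improves more in percent terms, so the two graphs would tell different stories.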
what i encourage you to do is before you make the graph
somebody made the observations
they came back to the desk
i am saying step back before going to the graph
and think what is it that is the real point
what is the real kicker or summary here
think about that before implementing graphs
what is the essence of what you have?
s54
show the data
this of course is related to the audience and the size of the room
show the data, not just summaries
s55
if it is easy, a box plot for different continents
i can easily put the data, which will provide some information on outliers
it might be interesting to label those outliers
it often adds to the story
s56
...
s59
you can do histograms to show frequency
distributions of continuous variables
s60
density curves are nice, they make it easier to compare
s62
cleveland perceptual ease tasks
the easiest things to perceive (experiments done in the 80s)
i wonder if things have changed with screens, different distance, whiteness
position on the common scale is easier to perceive
nonaligned scales are harder
that is as easy as is length
with no common base line, harder to perceive
angle slope harder
area harder
color is hardest to perceive
s63
pie chart: controversial and incredibly common
angles and areas which are hard for humans to judge
people love them because the parts add up to the whole
dramatic differences work better in pie charts than values that are similar
a table is always better than a dumb pie chart
several pie charts that is really hard
the only thing that is worse is several of them where the audience has to compare
failure to order numbers along a visual scale
cleveland: pie charts have severe perceptual problems
compared to dot charts very unreliable
if perceiving the information is not so important, then a pie chart is fine?!?!
BUT
better than bars for comparing
especially for compound proportions
s67
this is bad
so hard to compare across
s68
perceptual tasks - using length vs. position along a common scale
the graph on the left is hard to interpret especially for the values in the middle
it would be easier to do y graphs for each country using a common scale
s69
it is hard to compare lines
s70
this is fine
the blue lines have a common base line on the left and the red lines have a common base line on the right
and even the one on the bottom is not too bad
the one on the upper right
there is no common base line for the middle values
s71
nicely done graph
it is a stacked bar chart
no common base line
s74
a line graph is so much better here
s75
why are the women in bars: to highlight them maybe
stiff neck from reading the country names
s76
let's make this upright
s77
made two sets of bars
equalized the men and women visually
s78
now they are easier to compare, the points are unified
they are ordered by women
s81
color
this is the basics
there is a lot more to learn
colors have hue
unordered representation of the color
then there is chroma, how much gray is added to pure color
luminance, lightness is ordered
hue is categorical
and for continuous you can use chroma or luminance
color to distinguish groups
highlight particular data
encode quantitative values
s82
the country graphs, you are interested in the height of the bar, the color, i would argue, is not necessary
don't vary the color and the pattern
you are changing the bar
i would rather make this a bigger graph
and label the bars rather than use legends: legends require work
hue is unordered and not perceived quantitatively
use color to highlight
if you are trying to emphasize something, that is a time to make it a different color
that to me is an excellent use of color
s86
to encode quantitative information
color varies from light to dark
i think this is intuitive
s88
proximity:
grouped bar charts are hard to read
non zero base line here
this is showing life expectancy
s89
we may do something like this
you can see the size of the gap as well as the individual values
there may be a training of your audience: these are not confidence intervals
these are life expectancies
if this is too cumbersome, you can go for the more familiar graph form
it is a trade off
i go towards trying to make the graph that i think is the clearest
i go towards clarity
s90
trend over time, why bar charts
why not a line graph
this surprises me as the latter is so much clearer
s91
important
to ease comparisons when you have two groups
align things vertically
and not horizontally, even though the horizontal layout is more common
and use a common axis
s92
don't order alphabetically
it won't translate to a different language either
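a minimal sketch of ordering by value instead of by name. the haiti and canada values are from these notes; the other two entries are made-up placeholders for illustration.

```python
# Haiti and Canada are from the notes; Brazil and Australia are placeholder values.
life_expectancy = {"Canada": 81, "Brazil": 74, "Haiti": 62, "Australia": 82}

# Alphabetical order depends on the language of the labels, not on the data.
alphabetical = sorted(life_expectancy)

# Ordering by the plotted value puts similar values next to each other.
by_value = sorted(life_expectancy, key=life_expectancy.get, reverse=True)

print("alphabetical:", alphabetical)
print("by value:", by_value)
```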
s93
relationship among subsets
you want to see how the relationship changes between categories
the difference between using shape or fill
s94
so often the cleanest thing you can do is to have small multiples
it is visually hard to perceive this
if you have a lot of data points and you want to show a summary
is there any way to do that so that each point represents 10 people?
you can say what each point represents
that is available in ggplot2
s95
this is a series of graphs
but keep the scale the same!
s99
s102
famous graph
using dark lots??
a way to capture data entry error
s104
y axis is not common
smaller problem
the lines vary in color and pattern
we have more dimensions of display than we actually have data
how many dimensions of display do you recommend?
would it be too much to vary pattern here?
i would have to see the data
what looks clear vs. confusing for me
you have to make it, look at it, show it to people, look at it again
it also depends on how the data lays out
we often get away from common sense
ugly graphs
s106
fluctuating gas prices
all sorts of patterns
the message here is simplified
you are trying to show how behavior changes
over time
simpler is clearer
s107
you don't want 3d bars
they don't line up on the same place on the y axis
why is there 3d or background color
3d bars are just not good
s108
let's simplify this graph
read the legend
find matching color
find next one
back and forth
s110
put outcome variable on vertical (y axis)