Link to slides: http://opr.princeton.edu/workshops/Downloads/2015Jan_StatisticalGraphicsConsiderationsKoffman.pdf

iterative process

ted - see Anscombe's quartet, slides 13 and 14 (It's pretty famous.)

Haiti is almost always an exception; you need to know that (!!!)

slide 17: how do you decide the scale of presentation? the way it is, the pattern is really magnified. how do you decide? you need to think of your audience.

slide 21 life expectancy in Haiti is 62, Canada is 81. this line should be 75 percent of Canada. these people had no intention of distorting or confusing anyone. in a bar graph the length of the bar should be proportional to the value it represents. there is a way to change the title and make the visualization accurate: the visualization does not represent what the title says. what would be a better title? number of years of life expectancy above age 60? that is what is really being shown here.

slide 22 this is awful. this is more accurate, but it is an ugly, boring graph. don't make this graph.

slide 23 this is fine because there are no proportions here. when you look at bar graphs you will see so many that don't start at 0 and make this mistake.

slide 24 political graphs: campaign political graphs do this all the time.

slide 25 from an academic journal; i tried to anonymize it. this is the same problem as the Haiti/Canada graph of life expectancy: north should be 3/4 of central, and this is an academic journal.

s26 line graphs need interval scales for slopes to be meaningful. the slope here means nothing: the values are nominal. this level of error is seen rarely in academic journals.

s27 but this is seen a lot. this is not interval-scale numeric data (1900-1980). you need to make this an interval scale.

s29 two scales showing two variables on the same graph: the intersection point is wherever the scales happen to put it. lots of people put them together. to avoid confusion, as you said, you would put the general pattern for this and that. keep them separate; to my mind they are more accurate.
having an intersection point or not makes the situation worse.

s31 exact same data: the curve goes above the bars in one and doesn't reach anything in the other. you can put the bar anywhere. making that a dual-scale graph makes a picture and a point that is not really there. what if you wanted to answer the question: in 2012, is Chinese wage growth higher or lower than US wage growth? if you put them on the same scale, then, yes.

s32 ggplot2: you can't make graphs with dual scales. people say they need it. "i believe that plots with separate y scales are fundamentally flawed": you can't map it back to the data space, they are easily manipulated to mislead, they are arbitrary.

s33 when displaying odds ratios, measured on a logarithmic scale, they should be shown on an odds scale. the distance from 1 to 2 should match the distance from 1 to .5. there is lots of disagreement here.

s34 using a linear scale to show odds ratios: that is incorrect and less accurate. should odds ratios be graphed using a log scale? controversial.

s36 see reference and think about it.

s37 visual representation of the data value: bubbles as magnitude (population). here the radius and not the area corresponds to magnitude, but the way people perceive the size of a circle is by area. US and Canada: the US looks 40 times larger. if we map to the area, then the circle for the US is about 9 times larger. up until recently, the ggplot2 default was to map the size to the radius; that was one of the changes in the recent release. if you don't specify the scale, it will map the size to the area of the bubble, and that is how most of us perceive it.

s38 overplotting obscures your story (it is not inaccurate). there are lots of things you can play with: the point shape, transparency, etc.
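a quick arithmetic check of the s37 bubble point, as a minimal sketch. the function name `drawn_area_ratio` and the value ratio of 9 (taken from the "about 9 times larger" note) are illustrative assumptions, and the radius mapping here is idealized with no minimum point size, so it is not expected to reproduce the exact ratio drawn on the slide.

```python
import math


def drawn_area_ratio(value_a, value_b, mapping):
    """Ratio of the drawn circle areas for two data values.

    mapping="radius": radius is proportional to the value.
    mapping="area":   area is proportional to the value.
    """
    if mapping == "radius":
        # radius ~ value, so area ~ value squared
        r_a, r_b = value_a, value_b
    elif mapping == "area":
        # area ~ value, so radius ~ sqrt(value)
        r_a, r_b = math.sqrt(value_a), math.sqrt(value_b)
    else:
        raise ValueError("mapping must be 'radius' or 'area'")
    # area = pi * r^2; pi cancels in the ratio
    return (r_a / r_b) ** 2


# hypothetical magnitudes with a 9-to-1 ratio (assumption, see lead-in)
big, small = 9.0, 1.0
ratio_radius = drawn_area_ratio(big, small, "radius")
ratio_area = drawn_area_ratio(big, small, "area")
print(f"radius mapping: drawn area ratio = {ratio_radius:.0f}")
print(f"area mapping:   drawn area ratio = {ratio_area:.0f}")
```

under these assumptions a radius mapping draws a 9-to-1 difference as 81-to-1 in area, while an area mapping draws it as 9-to-1, which is the comparison the eye actually makes.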
s39 i adjusted the point size to reduce the amount of overplotting.

s40 open circles rather than closed circles.

s41 open circles, tiny.

s42 changed shape: plus signs.

s43 point transparency: the points are half as dark, and then you can see the overplotting more.

s44 another thing i like to do is to stratify the data. if they are overplotted, i want to see if there is a pattern that emerges. i can stratify with different variables.

s45 use point jittering to show that a point is concealing multiple points.

s46 we are going to move to how to make clear comparisons. there is always a comparison. this point here, i have been told by students, is the most important information: when you make a graph, do that for the true quantity you want people to see. people graph the data they have and make the audience do the work. do you want to compare magnitudes, or the difference between the magnitudes of a and b? if you only graph a and b, you are making your reader do the subtraction. if you have to write about gaps, think about this: what is the real point of interest? your graphs will be better. it could be the ratio, the absolute difference, or what a and b are. this is a highlight; this will really change the way you think about what you want to put on a graph.
make sure the data is easily seen: points large enough, contrast, no hidden data.
show data, not just summary measures.
think about what type of graph you will use based on this scale from Cleveland: length vs angle vs color.
consider proximity, alignment and ordering: move things so that they are right next to each other.

s48 number one point: famous graph on imports and exports from England. if you want to focus on the trade imbalance, that is what you should graph. do the subtraction: it is hard to do in your head and see. do you want them to see all three? that is also fine. if the important point is the difference, graph the difference. the difference between curves is hard to perceive.

s50 the y axis is using a log scale. black vs white wages. the title is about gaps. the
only way you can see the gap is to look inside, with 17x to 8x. we are at the log scale, so the difference looks minimal. they are showing you the wage for whites and blacks and not the gap, which is what this is about.

s51 the bars are proportional; we don't have an accuracy problem. how can i tell what is a big difference? in some of them the task is not that long. here is a unified graph.

s52 dot plots, connect a stereo. this, where the data is unified, is easier to get the big picture from. task data improvement: is that what i want to see? that is going to show me the number of minutes. some of the tasks are longer than others; maybe the number of minutes is not what i am interested in.

s53 how about percent improvement? some of these revised instructions led to a greater percent improvement. this is a really good example of graphing what you want your reader to take away. what i encourage you to do is this, before you make the graph: somebody made the observations, they came back to the desk. i am saying step back before going to the graph and think: what is the real point? what is the real kicker or summary here? think about that before implementing graphs. what is the essence of what you have?

s54 show the data. this of course is related to the audience and the size of the room. show the data, not just summaries.

s55 if it is easy: a box plot for different continents. i can easily put in the data, which will provide some information on outliers. it might be interesting to label those outliers; it often adds to the story.

s56 ...
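the s52-s53 point (graph the quantity you want the reader to take away) can be sketched with a little arithmetic before any plotting. the task names and times below are made up for illustration, and `improvement` is a hypothetical helper, not something from the slides.

```python
def improvement(original_minutes, revised_minutes):
    """Return (minutes saved, percent improvement) for one task."""
    saved = original_minutes - revised_minutes
    percent = 100.0 * saved / original_minutes
    return saved, percent


# hypothetical task times in minutes: (original instructions, revised instructions)
tasks = {
    "task A": (20.0, 15.0),
    "task B": (4.0, 2.0),
}

rows = []
for name, (original, revised) in tasks.items():
    saved, percent = improvement(original, revised)
    rows.append((name, saved, percent))

# order by the quantity of interest (percent improvement), not alphabetically
ranked = sorted(rows, key=lambda row: row[2], reverse=True)
for name, saved, percent in ranked:
    print(f"{name}: saved {saved:.1f} min, improved {percent:.0f}%")
```

with these made-up numbers the longer task saves more minutes, but the shorter task improves by a larger percent, so the two graphs would tell different stories. deciding which one is the real point comes before choosing the graph.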
s59 you can do histograms to show frequency distributions of continuous variables.

s60 density curves are nice; they make it easier to compare.

s62 Cleveland: perceptual ease of tasks, the easiest things to perceive (experiments done in the 80s). i wonder if things have changed with screens, different distance, whiteness.
position on a common scale is easier to perceive. nonaligned scales are harder; that is as easy as length. with no common baseline, harder to perceive. angle and slope: harder. area: harder. color is hardest to perceive.

s63 pie chart: controversial and incredibly common. it uses angles and areas, which are hard for humans to judge. people love them because the parts add up to the whole. dramatic differences work better in pie charts than values that are similar. a table is always better than a dumb pie chart. several pie charts: that is really hard. the only thing worse than one pie chart is several of them where the audience has to compare. failure to order numbers along a visual scale.
Cleveland: pie charts have severe perceptual problems compared to dot charts; very unreliable. if perceiving the information is not so important, then a pie chart is fine?!?! BUT better than bars for comparing, especially for compound proportions.

s67 this is bad: so hard to compare across.

s68 perceptual tasks: using length vs.
position along a common scale. the graph on the left is hard to interpret, especially for the values in the middle. it would be easier to do y graphs for each country using a common scale.

s69 it is hard to compare lines.

s70 this is fine: the blue lines have a common baseline on the left and the red lines have a common baseline on the right, and even the one on the bottom is not too bad. for the one on the upper right there is no common baseline for the middle values.

s71 nicely done graph, but it is a stacked bar chart: no common baseline.

s74 a line graph is so much better here.

s75 why are the women in bars? to highlight them, maybe. stiff neck from reading the country names.

s76 let's make this upright.

s77 made two sets of bars; equalized the men and women visually.

s78 now they are easier to compare; the points are unified. they are ordered by women.

s81 color: this is the basics, there is a lot more to learn. colors have hue, an unordered representation of the color. then there is chroma, how much gray is added to pure color. luminance (lightness) is ordered. hue is categorical, and for continuous values you can use chroma or luminance.
use color to: distinguish groups, highlight particular data, encode quantitative values.

s82 the country graphs: you are interested in the height of the bar; the color, i would argue, is not necessary. don't vary the color and the pattern; you are changing the bar. i would rather make this a bigger graph and label the bars rather than use legends: legends require work. hue is unordered and not perceived quantitatively. use color to highlight: if you are trying to emphasize something, that is a time to make that a different color. that to me is an excellent use of color.

s86 to encode quantitative information, color varies from light to dark. i think this is intuitive.

s88 proximity: grouped bar charts are hard to read. nonzero baseline here. this is showing life expectancy.

s89 we may do something like this: you can see the size of the gap as well as the individual values. there may be a
training of your audience: these are not confidence intervals, these are life expectancies. if this is too cumbersome, you can go for the more familiar graph form. it is a trade-off. i go towards trying to make the graph that i think is the clearest; i go towards clarity.

s90 trend over time: why bar charts? why not a line graph? this surprises me, as the latter is so much clearer.

s91 important to ease comparisons: when you have two groups, align things vertically and not horizontally, even though the horizontal layout is more common, and use a common axis.

s92 don't order alphabetically; it won't translate to a different language either.

s93 relationship among subsets: you want to see how the relationship changes between categories. the difference between using shape or fill.

s94 so often the cleanest thing you can do is to have small multiples. it is visually hard to perceive this. if you have a lot of data points and you want to show a summary, is there any way to do that so that each point represents 10 people? you can say what each point represents; that is available in ggplot2.

s95 this is a series of graphs, but keep the scale the same!

s99 s102 famous graph using dark lots?? a way to capture data entry error.

s104 the y axis is not common. smaller problem: the lines vary in color and pattern; we have more dimensions of display than we actually have data. how many dimensions of display do you recommend? would it be too much to vary pattern here? i would have to see the data: what looks clear vs.
confusing. for me, you have to make it, look at it, show it to people, look at it again. it also depends on how the data lays out. we often get away from common sense.

ugly graphs

s106 fluctuating gas prices, all sorts of patterns. the message here is simplified: you are trying to show how behavior changes over time. simpler is clearer.

s107 you don't want 3d bars: they don't line up on the same place on the y axis. why is there 3d or background color? 3d bars are just not good.

s108 let's simplify this graph. read the legend, find the matching color, find the next one, back and forth.

s110 put the outcome variable on the vertical (y) axis.
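a worked check of the bar-graph rule from slide 21 (bar length proportional to the value), using the Haiti and Canada figures from these notes. this is only a sketch: `bar_length_ratio` is a hypothetical helper, and the baseline of 60 is an assumption inferred from the "life expectancy above age 60" title suggestion.

```python
def bar_length_ratio(value_a, value_b, baseline=0.0):
    """Ratio of drawn bar lengths when the axis starts at `baseline`."""
    return (value_a - baseline) / (value_b - baseline)


haiti, canada = 62.0, 81.0  # life expectancies quoted on slide 21

# axis starting at zero: bar lengths match the data
honest = bar_length_ratio(haiti, canada)

# axis starting at 60 (assumed baseline): bars show only the years above 60
truncated = bar_length_ratio(haiti, canada, baseline=60.0)

print(f"zero baseline: Haiti bar is {honest:.0%} of Canada's")
print(f"baseline 60:   Haiti bar is {truncated:.0%} of Canada's")
```

with a zero baseline the Haiti bar is roughly three quarters of Canada's, which matches the "75 percent" remark. with the assumed baseline of 60 it shrinks to about a tenth, which is why the graph only becomes accurate if the title is changed to match what is drawn.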