Our pumpkins this year.
Gallup-Healthways Well-Being
Will Wilkinson points to the Gallup-Healthways Well-Being Index, which purports to measure overall health (“not only the absence of infirmity and disease, but also a state of physical, mental, and social well-being”) at the congressional district level for the United States. Will hypothesizes that Utah’s high score may be due to “a skoche of culture-driven upward inflation” (Mormons overstating their happiness).
Fortunately, the components of the Well-Being Index are reported as well. Two components, Life Evaluation and Emotional Health, measure self-reported happiness. If Will’s hypothesis were correct, we would expect these components to account for a disproportionate share of Utah’s overall index. In the scatterplots to the right, the three Utah congressional districts are highlighted in orange. Contrary to Will, Utah is above average only in the Work Quality component. On all the others, including Life Evaluation and Emotional Health, Utah is average or below average.
The wellness data is in Excel, since I couldn’t figure out how to get it from the Gallup-Healthways site. The visualization was done in Tableau.
Posted in Uncategorized.
our day, visualizing the American Time Use Survey
For Jeff’s class I created an interactive visualization of the American Time Use Survey. I got sick last week so didn’t have a lot of time to work on it. As a result it turned out somewhat derivative of the Baby Name Voyager and other stacked area plots.
That said, I think it lets you find some rather interesting patterns in how people use their time. Most noticeable is the extra hour or so that people sleep in on the weekends.
Posted in Uncategorized.
Fundamental Statistical Concepts in Presenting Data: Principles for Constructing Better Graphics
Via Andrew Gelman I came across this long paper on statistical visualization by Rafe Donahue. I haven’t read it through carefully yet, but I enjoyed the examples of visualizations from his children’s schoolwork.
He criticizes boxplots, which prompted a discussion in the comments on Andrew’s post. I read Tukey’s EDA recently and was surprised to see how much of Tukey’s work was focused on visualization by hand. The boxplot was a sensible visualization when you had to compute and plot manually. Using only 5 numbers, it portrayed much of what was important about the data. However, now that plotting is cheap, it makes a lot more sense to just plot all the data.
In general, summaries, visual or otherwise, which assume a single mode, or worse normality, should be treated with a great deal of caution.
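As a small illustration of why single-mode summaries deserve caution, here is a minimal Python sketch. The two samples are made up for the purpose, and `five_number_summary` and `dot_plot` are hypothetical helper names, not anything from Donahue’s paper or Tukey’s book. Both samples produce the same five numbers, so their boxplots would look the same, yet a plot of every point shows one sample peaking in the middle and the other split into two clusters.

```python
import statistics
from collections import Counter


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max), the five numbers a boxplot draws."""
    ordered = sorted(values)
    # "inclusive" treats the sample as containing its own min and max.
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return (ordered[0], q1, median, q3, ordered[-1])


def dot_plot(values):
    """Text plot of every integer observation: one '*' per data point."""
    counts = Counter(values)
    rows = range(min(values), max(values) + 1)
    return "\n".join(f"{v:>2} {'*' * counts[v]}" for v in rows)


# Made-up samples chosen so the summaries coincide.
single_peak = [0, 1, 2, 3, 4, 5, 5, 5, 6, 7, 8, 9, 10]
two_peaks = [0, 3, 3, 3, 3, 3, 5, 7, 7, 7, 7, 7, 10]

for name, sample in [("single_peak", single_peak), ("two_peaks", two_peaks)]:
    print(name, five_number_summary(sample))
    print(dot_plot(sample))
```

The summaries cannot tell the two samples apart. The dot plots can, which is the argument for plotting all the data when it is cheap to do so.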
Posted in Uncategorized.
R packages
In class today we covered R packages. A quick attempt to create a package on Windows revealed that the Windows version of R does not come with the necessary build tools. I tried again on a Mac and ran into problems where package.skeleton failed to create the package directories since .find.package couldn’t find my newly created package. After a little playing around I found that package names cannot contain an ‘_’ (at least on a Mac).
The R CMD check command is very nice. It expands on the idea of static code checking to also check documentation, the install process, example code, etc.
Posted in Uncategorized.
Exploratory Model Analysis
I’ve recently come across a few papers on Exploratory Model Analysis. I wasn’t familiar with this work when writing the EnsembleMatrix paper, but the two are very closely related. I was working with an ML researcher while designing the EnsembleMatrix visual interface and so did quite a bit of looking around in the ML literature. EMA is emerging in statistics and so didn’t appear in my search.
Here are a few pointers:
Parallel coordinates for exploratory modelling analysis (Antony Unwin)
Exploratory modelling analysis: visualizing the value of variables (Antony Unwin)
Meifly: Models explored interactively (Hadley Wickham)
I tried installing meifly, but it appears to depend on ggplot which is no longer available since ggplot2 has been released.
[2004] Exploratory data analysis for complex models (with discussion) (Andrew Gelman)
Discussion of this paper by Andreas Buja (Andreas Buja)
Rejoinder to discussion (Andrew Gelman)
This discussion is quite inspiring. The idea that visualizations can be thought of as statistical tests was quite eye-opening. I think that this suggests quite a few directions for research in InfoVis. However, there hasn’t been much work in this area in the 4 years since the paper came out. Why? Perhaps the artificial division between InfoVis and statistical visualization has kept it from being noticed. Perhaps it’s just very hard.
Posted in Uncategorized.
Visualizing Obama’s voter contact operation
Mark Blumenthal writes about new voter turnout information from the 2008 election. The following graph shows the level of voter contact from the Kerry and Obama campaigns (red=low to green=high). Obama had a broader voter contact operation spreading resources more effectively across those with a high probability of voting and voting Democrat.
Suggestions:
- swap the direction of the vertical axis to put high turnout on top
- add scale numbers, how many contacts? how high is high turnout?
- since the number of contacts is nonnegative, I would use a sequential (one-sided) color scale (running from white, 0, to green) rather than a diverging scale.
- how many people fall into each bucket? An additional grayscale plot showing the distribution of people would be helpful. Or preferably, if possible, the axes could be transformed to make the distribution of individuals roughly uniform across the plot.
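Two of these suggestions can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the function names, the example probabilities, and the particular green are hypothetical and have nothing to do with how the original graph was produced. `sequential_green` maps a nonnegative contact count onto a one-sided white-to-green scale, and `to_percentile_rank` replaces each axis value with its empirical percentile so that individuals spread roughly uniformly along the axis.

```python
from bisect import bisect_right


def sequential_green(count, max_count):
    """Map a count in [0, max_count] to a hex color from white (0) to green."""
    t = min(max(count / max_count, 0.0), 1.0)  # clamp to [0, 1]
    white, green = (255, 255, 255), (0, 128, 0)
    # Linear interpolation between the two endpoints, channel by channel.
    r, g, b = (round(w + (c - w) * t) for w, c in zip(white, green))
    return f"#{r:02x}{g:02x}{b:02x}"


def to_percentile_rank(values):
    """Replace each value with the fraction of observations <= that value."""
    ordered = sorted(values)
    n = len(ordered)
    return [bisect_right(ordered, v) / n for v in values]


# Hypothetical turnout probabilities, heavily skewed toward low values.
turnout_probability = [0.05, 0.1, 0.1, 0.2, 0.9]
print(to_percentile_rank(turnout_probability))
print([sequential_green(c, 4) for c in range(5)])
```

Plotting against the percentile ranks instead of the raw probabilities gives each equal-width slice of the axis roughly the same number of people, at the cost of an axis that needs relabeling in the original units.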
Posted in visualization.
First time designing a visualization
The CHANCE contest submission below was my first time creating a complete static visualization that tries to tell a story. It’s sort of sad that I’m in my third year as a Ph.D. student studying visualization and I hadn’t done that yet.
I found it quite satisfying. Back in the olden days when I worked in rendering there was an immense amount of satisfaction that came from getting a rendering right–both visually and algorithmically. In visualization I hadn’t felt that yet, since all of my projects so far have been rather flaky research prototypes.
Over at FlowingData, Nathan is running a biweekly visualization competition/discussion. The first installment uses US poverty statistics. This’ll be a good chance for me to get more design experience.
Posted in Uncategorized.
My submission to the CHANCE contest
Contest description is here:
http://www.public.iastate.edu/~larsen/graphics%20contest.pdf
Posted in Uncategorized.
R complaints
I’ve recently read a number of complaints about the R programming language and thought I’d pull together the complaints into one place.
- Inconsistent return types, list/vector confusion (Andrew Gelman)
I always get mixed up about when to use [] and [[]].
- Lack of useful types (Andrew Gelman)
Having nonnegative or [0,1) constrained floating point types would be quite useful in many circumstances. I haven’t used factors enough to know if they would work in most scenarios where an enumerated type is used in other languages. Having built-in random variable types would be useful too.
- Scalability and S4 complexity issues, mixed with R coding style issues (Andrew Gelman)
I haven’t used S4, so I can’t comment on that. However, I have found it very useful to be able to type the name of a function on the command line and see its code directly. Unfortunately, built-in functions (e.g. lapply) don’t print out (probably because they really only exist in C code). It would be nice for such functions to print out an equivalent R implementation with a note saying that it really executes in C.
- Vector indexing issues #1, #2, #2a, #3 (Radford Neal)
As a CS guy I find 1-based vectors hard to justify, but Radford notes a number of other issues. I’ve been bitten by the automatic dimension dropping “feature” rather frequently.
Posted in Uncategorized.


