Nothing to do with the charting advice, which mainly makes sense, but the Myers-Briggs data touched on a pet peeve of mine: Researchers who report R rather than R-squared in cases where anti-correlation (negative R) is not operative, and thus there's no good reason to use the signed R rather than the unsigned R-squared. This is especially prevalent in cases of weak correlation.
In this particular case, the largest correlation shown corresponds to an R-squared of 0.05. But that looks almost like a null result. So the researchers instead report R=0.23, which looks more impressive.
Totally agree that R^2 is standard. Though if that weren't an issue, I feel like R might be "more linear" in our intuition for effects? So if we ignore convention, I might prefer R for the same reason I'd rather use standard deviation than variance in a bar chart?
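For anyone who wants to check the arithmetic in this thread, here is a minimal sketch of the R versus R-squared conversion. The function names are made up for illustration, and the sample values other than 0.23 are arbitrary. It only shows how the same weak relationship looks under each convention; it doesn't settle which one is better to report.

```python
import math


def r_squared_from_r(r: float) -> float:
    """Square a correlation coefficient to get the share of variance explained."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("R must lie in [-1, 1]")
    return r * r


def r_from_r_squared(r_squared: float) -> float:
    """Recover the magnitude of R from R-squared (the sign is lost)."""
    if not 0.0 <= r_squared <= 1.0:
        raise ValueError("R-squared must lie in [0, 1]")
    return math.sqrt(r_squared)


# Illustrative values only; 0.23 is the figure quoted in the comment above.
for r in (0.10, 0.23, 0.50, 0.90):
    print(f"R = {r:.2f}  ->  R^2 = {r_squared_from_r(r):.4f}")
```

Squaring 0.23 gives about 0.053, which rounds to the 0.05 quoted above. The gap between the two numbers is widest for weak correlations, which is where the choice of convention changes the impression most.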
For some of your claims, I think doing a literature review of the data visualization research would be helpful. They do examine axes and aspect ratios. Empirical studies are needed here and they'd be cheap to run with survey software and Prolific.
In the case of charts where the y-axis doesn't start at zero (if zero is semantically meaningful, e.g. money or a dimensionless number, but not temperature), IMO there should be a vertical axis line starting at zero and then a *break* in the axis line before the range where values are actually being plotted. That is, in "Years vs. GDP again", the vertical axis should go $0T, break, $51T, $52T, $53T. Even if that violates the idea of not having an axis line when the x-axis is something like years, I think it's important to emphasize that discontinuity, because a lot of graphs dishonestly exaggerate small y-axis differences by narrowing to a small range and aren't very clear about it.
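To put a rough number on that exaggeration, here is a small sketch. It assumes two hypothetical GDP values of $51T and $53T (borrowed from the tick labels above, not from any real data) and an axis that starts at $50T, which is also an assumption. The function names are invented for illustration.

```python
def true_ratio(low: float, high: float) -> float:
    """Actual ratio between two positive values."""
    if low <= 0:
        raise ValueError("low must be positive")
    return high / low


def apparent_ratio(low: float, high: float, axis_start: float) -> float:
    """Ratio of the plotted heights when the y-axis begins at axis_start."""
    if axis_start >= low:
        raise ValueError("axis_start must be below the smaller value")
    return (high - axis_start) / (low - axis_start)


# Hypothetical values in trillions of dollars; the 50.0 axis start is assumed.
low, high = 51.0, 53.0
print(f"true ratio:                 {true_ratio(low, high):.3f}")
print(f"apparent ratio (axis at 50): {apparent_ratio(low, high, 50.0):.3f}")
print(f"apparent ratio (axis at 0):  {apparent_ratio(low, high, 0.0):.3f}")
```

With the axis starting at 50, the taller point sits three times as high above the baseline as the shorter one, even though the values differ by roughly 4%. Starting the axis at zero makes the apparent ratio match the true one, which is the information an axis break is meant to flag.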
I actually think the graphs are easier to read when the rule is flipped: When the vertical distances are meaningful, draw the vertical line; when the horizontal distances are meaningful, draw the horizontal line. E.g., for the years versus GDP example, the correct graph in my opinion is the upper right, not the bottom left.
I literally cracked up just reading the TITLE. I love this newsletter so much.
"time did not start in 1980" - very true; it started exactly ten years earlier.
I don't suppose the rant about rings could go in a footnote somewhere, could it?
I think a plot without any axis lines looks absolutely terrible.
Thanks for giving me yet another thing to be annoyed about