This is exactly the type of content I like to see. Precisely debunking the "common advice" because that advice is wrong. But also with the verve and enthusiasm communicated to emphasize the point.
If I had a tankard, I would be banging it on the table in appreciation. Yes! Yes please! Science is so bad right now you guys and anything, anything to help us.
On the plus side, the incredibly poor quality of most scientific papers out there might reduce the ability of AI to doom us if it gets trained on them?
Agreed, but in my world, this is very important: "And The Principles of Biomedical Scientific Writing says: Significant figures (significant digits) should reflect the degree of precision of the original measurement." There is sometimes a blithe ignorance of calibration errors, the accuracy of an instrument's maintenance log, the ambient conditions during which the measurement was made, and the ability of another user to reproduce pertinent measurements.
Hear hear! An additional obnoxious killer of precision: SEM. Just report SD and the number of subjects. I'm perfectly capable of dividing by sqrt(n) myself, thanks. What use is it to know that the fraction of Sundays spent napping was 50 +/- 1% (mean +/- SEM) in your study of 500 people? Gosh, I really needed that standard deviation for my prediction intervals, and now I'm down another factor of 20 in my precision!
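(To spell out the arithmetic behind that complaint, here's a minimal sketch; the 50 +/- 1% with n = 500 is the made-up example above, and I'm assuming the printed "1%" was rounded to the nearest whole percent:)

import math

n = 500
sem_reported = 1.0                        # percent, as printed

# SEM = SD / sqrt(n), so the SD is just a multiplication away...
sd_implied = sem_reported * math.sqrt(n)  # ~22.4%

# ...but if "1%" was rounded to the nearest whole percent, the true SEM
# could be anywhere in [0.5, 1.5], so the SD is barely pinned down at all.
sd_low = 0.5 * math.sqrt(n)               # ~11.2%
sd_high = 1.5 * math.sqrt(n)              # ~33.5%

print(f"implied SD ~ {sd_implied:.1f}%, but could be {sd_low:.1f}%-{sd_high:.1f}%")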
This is an annoyance for me, too. More generally, it drives me nuts how many people refuse to publish the most basic descriptive statistics for their data, whilst they spend page after page showing regression coefficients for 5 different analyses that they imply capture causal effects.
Hear, hear... and a couple of further thoughts:
1. In response to the question "shouldn’t we simply require authors to publish their data?", rather than answer no, I would respond "yes, and ask for more digits". I'd then take issue with the "simply". Shouldn't we push authors to publish their data *and do it well*? Including more significant digits is a handy workaround, and I'm all for not letting the great be the enemy of the good, but let's also keep pushing for the great (or, at least, honestly, the merely adequate).
2. A downside of including more digits is that some people will inevitably treat them as significant. One of my core beliefs is that every number is an invitation to misinterpretation (https://climateer.substack.com/p/numbers), and additional digits give people more opportunity to, like, compare 2.1856 in one paper with 2.1853 in another paper (when both were explicitly ±0.5) and come to some wild conclusion. Which is not a reason to omit the extra digits, my point is simply that... huh, I don't actually have a point, just a pet peeve the size of a Great Dane.
Definitely in favor of better data. The replication crisis is only 13 years old if we date it to the Bem affair. But boy do those wheels turn slowly. I think the real problem is that the only way to really guarantee reproducibility is to have reviewers actually try to replicate the given results from the given data. Most reviewers simply don't want to work that hard. So it's a hard lift.
My best guess is that the solution won't come from extracting more work from reviewers, but from having much better "post-publication review". Authors should be expected to stand behind their work long-term, not just sneak it past peer review and then disappear into the mist. I want a single place I can go and see if other people have struggled to reproduce published results. And I want hiring/tenure/promotion/grant committees to look at all that. Seems possible, but a big collective action problem.
Love all of this. A culture of stronger post-publication review would be wonderful.
I imagine AI capabilities are already at the point where a useful degree of post-publication review could be automated? Enough to flag some obvious mathematical errors and some cases of missing / questionable data (you'd have to expect false positives, so there would be a need for manual review).
Hilarious and lighthearted but makes an important, very serious point. Great stuff.
Inspired by James Heathers GRIM test? https://en.wikipedia.org/wiki/GRIM_test
Related - SPRITE (I don't understand it yet, but I need to) https://peerj.com/preprints/26968/
Also - scientists should never be allowed to publish the phrase "p<0.05" - it should always be required to report the exact p value. In many contexts, a p value of 10^-45 is super suspicious.
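For anyone who hasn't met GRIM before, the arithmetic is almost embarrassingly simple. Here's a rough Python sketch of the idea as I understand it (not Heathers and Brown's actual code, and the 3.48 / 3.49 numbers are made up for illustration):

import math

def grim_consistent(reported_mean, n, decimals=2):
    # A mean of n integer values must equal total/n for some integer total.
    # The reported (rounded) mean is consistent iff some such total rounds back to it.
    half_ulp = 0.5 * 10 ** (-decimals)
    lo = math.floor((reported_mean - half_ulp) * n)
    hi = math.ceil((reported_mean + half_ulp) * n)
    return any(
        round(total / n, decimals) == round(reported_mean, decimals)
        for total in range(lo, hi + 1)
    )

print(grim_consistent(3.48, 25))   # True: 87/25 = 3.48
print(grim_consistent(3.49, 25))   # False: no integer total over 25 rounds to 3.49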
I hadn't seen either of those specific references before, but they're examples of just the kind of reverse-engineering I find myself doing ALL THE TIME. This bit from the SPRITE paper certainly rings true to me:
"Wicherts et al. (2006) and Bakker and Wicherts (2011) reported receiving data from 38 out of 141 (23%) and 21 from 49 (43%) authors respectively, Vines et al. (2014) obtained variously 17% to 29% for papers in cohorts within the previous 10 years, two of us received 9 out of 21 (43%) requested data sets from articles published in the previous 5 years (Brown and Heathers, 2017), and most recently it was found that for a journal with a data availability policy only 65 out of 180 authors provided the requested materials (Stodden et al., 2018"
Except I think this actually understates the issue. Even when the authors do provide data, most of the time it's incomprehensible or impossible to reconcile with the published results!
The rate of simple statistical errors in published articles is strikingly high in the fields where people have looked. When the GRIM test was applied to papers in "leading psychology journals", among the articles where the test could be applied, 50 PERCENT had inconsistencies in their summary stats. https://journals.sagepub.com/doi/abs/10.1177/1948550616673876
Among studies on data accessibility, according to James Heathers this is the best one, and probably also the most famous by now (figure 1 went viral on Twitter): https://molecularbrain.biomedcentral.com/articles/10.1186/s13041-020-0552-2
(although it's a bit different because here it was an editor asking for raw data before review)
That Miyakawa article is extraordinary. It's infuriating, but there's also something hilarious about someone being so uncompromising and actually expecting people to live up to the promises they're making. Beautiful.
if anyone likes listening to podcasts and people ranting, James Heathers talks about Miyakawa's article here https://everythinghertz.com/183
"this is a look at what's going on behind the curtain.. and it's a curtain all the way in the back of the sanctum"
You make a very good argument for including "extra" digits in numerical results when the quantities are derived from categorical attributes in small data sets. As a former fanatical "get rid of all those meaningless digits!" person, you have succeeded in slightly decreasing my fanaticism.
However, it's a bit different when dealing with larger quantities of data or intrinsically continuous data. There is nothing I hate more than scanning over an entire table of multiple rows and multiple columns of numbers and trying to see patterns in the data when the entries are shown to 8 decimal places. In this case, the extra digits are just distracting noise and cognitive overhead.
In addition, there are cases even for single results where the extra digits really are meaningless, and nothing is gained from showing them. E.g., if I'm looking at the output from a large machine learning model, as opposed to a calculation from a small collection of discrete data, then it's pointless to report P(CatInImage) as 0.8716245356131 rather than simply 0.87.
I must admit there are definitely situations where it's best to trim extra digits. But I think the upside is typically smaller than the downside, so I'd suggest this be considered an "advanced maneuver", only to be done with care by experts. I'm worried about making it conventional wisdom to remove the extra digits. The current status quo is that less statistically literate people often leave in digits, which is exactly what we want. We should preserve that status quo! So I guess my real advice would be "please show lots of digits unless you really know what you're doing".
So if "statistically significant digits" isn't the standard, then what is the *correct* number of digits to include in figures? Because we can carry calculations to arbitrary precision....
This is a hard question, but if I had to choose a single universal number, I guess I'd suggest 7 digits? That's about the precision you have even if you're only working with 32-bit floats, and not too crazily long, so probably a good compromise.
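For what it's worth, the "about 7 digits in a 32-bit float" figure is easy to check with just the standard library; a quick sketch (the starting value is made up):

import struct

x = 0.123456789012345
# Round-trip through a 32-bit float to see how many digits survive:
x32 = struct.unpack("f", struct.pack("f", x))[0]
print(x)     # 0.123456789012345
print(x32)   # ~0.1234568: agrees with x only to about 7 significant digits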
What do you think of this? https://docs.google.com/document/d/1eJ0P0nHYCZKe8rp1N1Wpd1UpASXgxpLiYynsOk92kWs/edit?tab=t.0#heading=h.ve98pzewxplx
I feel your pain ❤️
Ok, I’m at least halfway sold.
Agreed
Started the article like "Huh, a rare Dynomight misfire; be interesting to read this and see just where D goes awry.." and finished the article like "SOLD! WHERE DO I SIGN?!"
... almost! I can think of two cases where trimming digits still seems to be preferable (and potentially even very important indeed..)
1) In situations where the press are likely to want to publish their own interpretations of the paper: "[...] we find the gender pay gap to be 1.3502464%, which aligns with previous studies finding a gender pay gap of 1.35% [...]" gets reported as GENDER PAY GAP INCREASING OUT-OF-CONTROL. Values of 7.989723, 7.990019, 8.000169 get reported in the newspaper using a graph clipped at 7.98 on the Y axis, thus implying a steep upward trend.
(Source: used to date a statistician who *literally collected* such press clippings)
2) As a matter of public record, using too many digits has caused us to lose ships at sea. Yes, deck officers understand the technical limitations of our navigation equipment/techniques when we're in a nice comfortable classroom at Nautical College - but when it's 0400 and you've just come on watch after virtually no sleep because the ship has been launching and recovering Chinook helicopters right outside your cabin throughout your max.-8-hour off-watch period, and there's six feet of swell running which causes your inner ear and half your internal organs to experience different G-forces to the rest of you every ten seconds or so, and the off-going Officer of the Watch has charted your position as 19° 51.24083'N, 75° 11.04089'W, it potentially takes more imagination than you might possess at that particular moment to realise that you're not at 19° 51.24083'N, 75° 11.04089'W but rather are actually just somewhere within 15" (500 yards-ish) of 19° 51'N, 75° 11'W, and to navigate accordingly. Even in the age of GNSS this has caused ships to run aground on reefs and shoals, stray into coastal artillery firing ranges, enter other nations' territorial waters without diplomatic clearance leading to diplomatic incidents, &c. &c. For this reason, it's all-but beaten into you as a Cadet to only log your position to sufficient digits to represent the level of confidence you have in it regardless of how many decimal places you're getting from your 3-point terrestrial fix, GNSS receiver, etc. (such beatings are administered with stacks of Marine Incident Investigation Branch reports..)
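(If anyone wants to check the arithmetic: one minute of latitude is one nautical mile, 1852 m, so a rough sketch of the comparison, ignoring the longitude scaling:)

M_PER_MIN_LAT = 1852.0
YD_PER_M = 1.09361

fifteen_arcsec_yd = (15 / 60) * M_PER_MIN_LAT * YD_PER_M     # ~506 yards
logged_precision_yd = 0.00001 * M_PER_MIN_LAT * YD_PER_M     # ~0.02 yards, i.e. about 2 cm(!)

print(round(fifteen_arcsec_yd), round(logged_precision_yd, 2))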
Also, something like an ISO standard for supporting data would be nice; specifying minimums for what authors need to include, data formatting, a common file format, perhaps even centralised repository/hosting criteria (something like a journal- or institution-sponsored GitHub type thing). The journals could then mutually agree to require authors to maintain ISO-compliant datasets before publishing a paper.
(Also, I'd like a pony..)
Before reading this, I believed writing extra digits was silly, but you've changed my mind.
Currently learning Python, so I comment with some trepidation, but since you reindexed the three ranges from 1 rather than 0, and since Python doesn’t execute the loop for the high end value, isn’t there an off-by-one bug in those three for loops?
Claude says the two outermost loops are fine, but there’s probably a problem with the innermost loop underlooping by one.
I was intentionally excluding both endpoints on the logic that it was impossible that ALL students passed or failed in both groups. But... perhaps that assumption isn't justified! I think we only really know that:
L>0
R>0
l>0
r<R
So I THINK the right loops would be
L in range(1, tot_students) # this is fine
l in range(1, L+1) # needs to change
r in range(0,R) # needs to change
thoughts?
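(In case it helps to sanity-check, here's how I'd spell those out as one self-contained snippet. This is my reading of the corrected ranges, with R = tot_students - L and a made-up total:)

tot_students = 10                      # made-up total, just for illustration

candidates = []
for L in range(1, tot_students):       # both groups non-empty: L > 0 and R > 0
    R = tot_students - L
    for l in range(1, L + 1):          # l > 0, up to and including l == L
        for r in range(0, R):          # 0 <= r < R
            candidates.append((L, R, l, r))

print(len(candidates), "candidate (L, R, l, r) combinations")

If I've read the original ranges right, this now includes the l == L and r == 0 corners that were being skipped before.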
So that's why robots and Vulcans are always saying the odds are 427863 to 1: in the future we won't use significant digits to indicate precision 😊