There are many ways to use data to justify whatever story or agenda one has. My p-hacking post showed how statistical analyses can be bent to produce statistically significant results so that work gets published. Because journals and editors rarely celebrate replication studies, those studies can be hard to fund. There are also ways to manipulate graphs to fit any narrative you want.
Take the figure below, which was published on the Georgia Department of Public Health website on May 10, 2020. Notice something funny going on with the x-axis: it looks like one of Dr. Who’s voyages across time, trying to solve the coronavirus crisis. The dates on the x-axis are not in chronological order (Bump, 2020; Fowler, 2020; Mariano & Trubey, 2020; McFall-Johnsen, 2020; Wallace, 2020). Instead, they are arranged in whatever order makes the number of coronavirus cases in Georgia’s five most impacted counties appear to decrease over time.
Figure 1: May 10 top five impacted counties bar chart from the Georgia Department of Public Health website.
If the dates were placed in chronological order, the figure above would tell a different story. Once the chart was made public, it drew heavy media coverage and was later fixed. But this kind of thing happens all the time when people have an agenda: they manipulate an axis to get the result they want. It is rare, though, to see a real-life example of it on the x-axis.
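To see why reordering the dates is enough on its own, here is a minimal Python sketch. It uses made-up daily counts for a single county (not the Georgia data) that rise over a week. Listed chronologically, the bars trend upward; sorted largest-first, the same bars always form a steady decline, whatever the real trend is. The counts, the start date, and the helper `is_non_increasing` are all hypothetical.

```python
from datetime import date, timedelta

# Hypothetical daily case counts for one county (NOT the real Georgia data).
# The true trend rises over the week.
start = date(2020, 4, 28)
counts = [30, 42, 35, 50, 47, 61, 58]
days = [start + timedelta(days=i) for i in range(len(counts))]
series = dict(zip(days, counts))

# Honest x-axis: days in chronological order.
chronological = sorted(series.items(), key=lambda kv: kv[0])

# Misleading x-axis: days ordered by their count, largest first.
by_count = sorted(series.items(), key=lambda kv: kv[1], reverse=True)


def is_non_increasing(values):
    """True if every bar is at least as tall as the one to its right."""
    return all(a >= b for a, b in zip(values, values[1:]))


for label, ordering in [("chronological", chronological), ("sorted by count", by_count)]:
    print(label, [d.strftime("%b %d") for d, _ in ordering])
    print("  looks like a decline:", is_non_increasing([c for _, c in ordering]))
```

Sorting by value guarantees a "decline" for any data at all, which is why a date axis out of chronological order should be treated as a red flag.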
But wait, there’s more! Notice the order of the top five impacted counties within each day. Pick a color and follow it: the Covid-19 counts per county look like they are playing musical chairs. Within each day, the five counties were sorted in descending order of case count, so a county’s position changes from one day to the next. That makes the chart even harder to understand and interpret, again sowing a narrative that may not be accurate (Bump, 2020; Fowler, 2020; Mariano & Trubey, 2020; McFall-Johnsen, 2020; Wallace, 2020).
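The musical-chairs effect can be reproduced with a short Python sketch. It uses five hypothetical counties (`County A` to `County E`) and invented counts, not the real data. When each day is sorted on its own, the county in the first bar position changes from day to day; a fixed order keeps every county in the same slot, so its bars can be compared across days.

```python
# Hypothetical counts per county per day (illustrative only, not real data).
daily = {
    "May 01": {"County A": 40, "County B": 55, "County C": 30, "County D": 25, "County E": 20},
    "May 02": {"County A": 60, "County B": 45, "County C": 35, "County D": 50, "County E": 22},
    "May 03": {"County A": 38, "County B": 52, "County C": 48, "County D": 28, "County E": 41},
}

# A single, fixed county order used for every day (hypothetical choice).
FIXED_ORDER = ["County A", "County B", "County C", "County D", "County E"]


def per_day_sorted(day_counts):
    """Confusing layout: each day's bars re-sorted by count, largest first."""
    ranked = sorted(day_counts.items(), key=lambda kv: kv[1], reverse=True)
    return [name for name, _ in ranked]


def fixed(day_counts):
    """Readable layout: the same county always occupies the same slot."""
    return [name for name in FIXED_ORDER if name in day_counts]


for day, day_counts in daily.items():
    print(day, "sorted:", per_day_sorted(day_counts))
    print(day, "fixed: ", fixed(day_counts))
```

With per-day sorting, the first slot holds County B, then County A, then County B again, so a reader scanning positions rather than colors compares different counties without realizing it.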
According to Fowler (2020), there are also problems with how Covid-19 cases get counted here, which adds to the misinformation and sows further distrust. Case definitions are just another way to build the narrative you wish you had: by carving out an explicit definition of what is in and what is out, you can artificially skew your data, either to favor a narrative or to produce false results that could be accidentally generalized. As Fowler explains:
“When a new positive case is reported, Georgia assigns that date retroactively to the first sign of symptoms a patient had – or when the test was performed, or when the results were completed.”
Part of the issue is that the virus has many asymptomatic carriers who never get reported. And because a person can be asymptomatic for days while still carrying Covid-19, a date tied to the first sign of symptoms can be far off from when the infection actually occurred. Fowler also explains that testing backlogs can delay the reporting of a positive case by days. Since each late-arriving case is backdated under the definition above, the counts for the most recent 14 days shift substantially every time the graph is updated. So even after Figure 1 was fixed, the last 14 days will inherently show a decrease in cases because of the testing backlog, the backdating definition, and the virus’s asymptomatic spread (see Figure 2).
Figure 2: May 19 top five impacted counties bar chart from the Georgia Department of Public Health website.
The department did fix the ordering of the counties and the x-axis, but only after the error was reported by Fox News, the Washington Post, and Business Insider, to name a few. Even so, the definition of what counts as a Covid-19 case still distorts the numbers and tells a misleading story. The effect is easy to see when you compare the May 4–9 data in Figure 1 and Figure 2: Figure 2 records more Covid-19 cases over that same period. That is why definitions and inclusion criteria matter just as much as the way a graph is drawn.
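The backlog effect can be sketched in Python as a simple reporting-delay model (epidemiologists call this undercounting of the most recent dates right truncation). The sketch assumes a perfectly flat outbreak of 100 new symptom onsets per day and an invented schedule for how quickly those cases are reported and backdated to their onset date; neither number comes from Georgia. Viewed on May 10, the final days of the 14-day window appear to fall sharply; viewed again on May 19, the same dates fill back in, which mirrors the Figure 1 versus Figure 2 comparison.

```python
from datetime import date, timedelta

# Assumption: a perfectly flat outbreak with the same number of true
# symptom onsets every day. Nothing here is real Georgia data.
TRUE_ONSETS_PER_DAY = 100

# Assumption: cumulative percent of an onset day's cases that have been
# reported d days after onset (index = d). Late reports are backdated
# to the onset date, as in the definition Fowler describes.
REPORTED_PCT = [10, 30, 50, 65, 75, 85, 90, 95, 100]


def pct_reported(days_elapsed: int) -> int:
    """Percent of an onset day's cases visible on the dashboard after `days_elapsed` days."""
    if days_elapsed < 0:
        return 0
    if days_elapsed >= len(REPORTED_PCT):
        return 100
    return REPORTED_PCT[days_elapsed]


def snapshot(as_of: date, first: date, last: date) -> dict:
    """Cases per onset date as a dashboard published on `as_of` would show them."""
    counts = {}
    day = first
    while day <= last:
        counts[day] = TRUE_ONSETS_PER_DAY * pct_reported((as_of - day).days) // 100
        day += timedelta(days=1)
    return counts


# The same 14-day window, viewed on two different publication dates.
window_start, window_end = date(2020, 4, 26), date(2020, 5, 9)
early = snapshot(date(2020, 5, 10), window_start, window_end)
late = snapshot(date(2020, 5, 19), window_start, window_end)

for day in early:
    print(day.strftime("%b %d"), "as of May 10:", early[day], "| as of May 19:", late[day])
```

In this toy model nothing about the outbreak changes between the two snapshots; only the reporting catches up. An apparent drop at the right edge of a backdated chart is therefore expected even when true cases are flat.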
Mariano & Trubey (2020) do have a point: some errors are expected during a time of chaos, but basic good charting practice should still be observed. Either way, be careful about how data is collected and how it is represented on graphs, and check not only the commonly manipulated y-axis but also the x-axis. That is why the methodology sections in peer-reviewed work are so important.
- Bump, P. (2020). The new political battleground over the coronavirus? Math. Washington Post. Retrieved from https://www.washingtonpost.com/politics/2020/05/19/new-political-battleground-over-coronavirus-math/
- Fowler, S. (2020). Georgia’s Gaffe-Prone COVID-19 Dashboard Is Useful – If You Know Where To Look. GPB News. Retrieved from https://www.gpbnews.org/post/georgia-s-gaffe-prone-covid-19-dashboard-useful-if-you-know-where-look
- Mariano, W., & Trubey, J. S. (2020). ‘It’s just cuckoo’: state’s latest data mishap causes critics to cry foul. The Atlanta Journal-Constitution. Retrieved from https://www.ajc.com/news/state--regional-govt--politics/just-cuckoo-state-latest-data-mishap-causes-critics-cry-foul/182PpUvUX9XEF8vO11NVGO/
- McFall-Johnsen, M. (2020). A ‘cuckoo’ graph with no sense of time or place shows how Georgia bungled coronavirus data as it reopens. Business Insider. Retrieved from https://www.businessinsider.com/graph-shows-georgia-bungling-coronavirus-data-2020-5
- Wallace, D. (2020). Georgia apologizes over ‘processing error’ after accusations officials were manipulating coronavirus case counts. Fox News. Retrieved from https://www.foxnews.com/us/georgia-health-department-apologize-coronavirus-data-gaffe-processing-error