Let’s imagine a hypothetical situation. There’s an infection going round, and we want to predict the future severity of someone’s illness. There is a test that offers a good prediction. Let’s say the outcome of the test has a correlation of 0.78 with the patient's severity of infection. The problem with the test is that…
Datasets which are identical over a number of statistical properties, yet produce dissimilar graphs, are frequently used to illustrate the importance of graphical representations when exploring data. This paper presents a novel method for generating such datasets, along with several examples. Our technique varies from previous approaches in that new datasets are iteratively generated from a seed dataset through random perturbations of individual data points, and can be directed towards a desired outcome through a simulated annealing optimization strategy.
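The abstract's idea (perturb one point at a time, keep the summary statistics fixed, and steer toward a target with simulated annealing) can be illustrated with a minimal sketch. This is not the authors' implementation: the circle target, the linear cooling schedule, the two-decimal tolerance, and every name and parameter below (`summary`, `circle_error`, `perturb`, `shake`, the starting temperature of 0.4) are assumptions chosen for illustration.

```python
import math
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation, computed by hand to avoid version-specific APIs."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def summary(points, digits=2):
    """Means, standard deviations and correlation, rounded to `digits` places."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    stats = (statistics.fmean(xs), statistics.fmean(ys),
             statistics.stdev(xs), statistics.stdev(ys), pearson(xs, ys))
    return tuple(round(v, digits) for v in stats)


def circle_error(points, cx, cy, radius):
    """Total distance of the points from a circle (the assumed target shape)."""
    return sum(abs(math.hypot(x - cx, y - cy) - radius) for x, y in points)


def perturb(seed_points, steps=20000, shake=0.1, rng_seed=0):
    """Nudge single points toward the target while the rounded statistics stay fixed."""
    rng = random.Random(rng_seed)
    points = list(seed_points)          # work on a copy of the seed dataset
    target_stats = summary(points)
    cx, cy, radius = target_stats[0], target_stats[1], target_stats[2]
    error = circle_error(points, cx, cy, radius)
    for step in range(steps):
        temperature = 0.4 * (1 - step / steps)   # assumed linear cooling
        i = rng.randrange(len(points))
        old = points[i]
        points[i] = (old[0] + rng.gauss(0, shake), old[1] + rng.gauss(0, shake))
        new_error = circle_error(points, cx, cy, radius)
        # Annealing rule: take improvements, and sometimes take worse moves.
        wanted = new_error < error or rng.random() < temperature
        if wanted and summary(points) == target_stats:
            error = new_error
        else:
            points[i] = old             # reject: restore the previous point
    return points
```

Every accepted move passes the `summary(points) == target_stats` check, so the returned dataset has the same rounded statistics as the seed regardless of how far its shape has drifted.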
As just about every statistics student can attest, Simpson's Paradox (a statistical phenomenon where an apparent trend is reversed when you look at subgroups) is notoriously hard to explain. You can look at examples, such as the fact that US wages are rising overall but dropping within every educational group, but that doesn't really help to explain the paradox. It's not really a paradox at all; it is simply that the disparate rates at which members of the study join the subgroups aren't accounted for in the analysis. To demonstrate this effect, the Visualizing Urban Data...
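A small worked example may make the wage case concrete. The figures below are invented purely for illustration (they are not real US wage data), and the group labels and the `overall` helper are hypothetical: the mean wage falls inside each group, yet the overall mean rises because more workers move into the higher-paid group.

```python
# Hypothetical data: group -> (number of workers, mean wage)
before = {"no degree": (60, 30.0), "degree": (40, 60.0)}
after = {"no degree": (40, 29.0), "degree": (60, 58.0)}


def overall(groups):
    """Mean wage across all workers, weighting each group by its size."""
    total_workers = sum(n for n, _ in groups.values())
    return sum(n * wage for n, wage in groups.values()) / total_workers


for name in before:
    print(name, before[name][1], "->", after[name][1])   # falls in each group
print("overall", overall(before), "->", overall(after))  # rises overall
```

The reversal comes entirely from the changing group sizes: the weights in the overall average shift from 60/40 to 40/60, which outweighs the drop within each group.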
R. Westerholt, M. Gröbe, A. Zipf, and D. Burghardt. 10th International Conference on Geographic Information Science (GIScience 2018), volume 114 of Leibniz International Proceedings in Informatics (LIPIcs), page 63:1--63:7. Dagstuhl, Germany, Schloss Dagstuhl--Leibniz-Zentrum fuer Informatik, (2018)