
Jeff Sauro • Quantifying the User Experience: Practical Statistics for User Research

As a pragmatic matter, it’s more common to test the hypothesis of 0 difference than some other hypothetical difference.
Chapters 6 and 7 contain a thorough discussion of power and computing sample sizes to control Type II errors.
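To make the sample-size idea concrete, here is a minimal sketch of such a computation in Python, assuming the statsmodels library; the effect size, alpha, and power values below are illustrative placeholders, not figures from the book.

```python
# Minimal sketch: sample size for a two-sample t-test, assuming statsmodels.
# The effect size, alpha, and power values are illustrative assumptions.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Participants per group needed to detect a medium effect (Cohen's d = 0.5)
# with a 5% Type I error rate and 80% power (i.e., a 20% Type II error rate).
n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.8,
                                   alternative='two-sided')
print(round(n_per_group))  # roughly 64 participants per group
```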
With null hypothesis testing, all it takes is sufficient evidence (instead of definitive proof) that a 0 difference between means isn’t likely and you can operate as if at least some difference is true.
We can say there is a difference when one doesn’t really exist (called a Type I error), or we can conclude no difference exists when one in fact does exist (called a Type II error).
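A small simulation can make both error types tangible. The sketch below (illustrative values, not from the book) runs repeated two-sample t-tests with scipy: when the true difference is 0, every "significant" result is a Type I error; when a real difference exists, every miss is a Type II error.

```python
# Sketch: Type I errors when the true difference is 0, Type II errors
# when a real difference of 0.5 SD exists. All values are illustrative.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
trials, n, alpha = 5000, 20, 0.05

# True difference is 0: any "significant" result is a Type I error.
type1 = sum(stats.ttest_ind(rng.normal(0, 1, n), rng.normal(0, 1, n)).pvalue < alpha
            for _ in range(trials)) / trials

# True difference is 0.5 SD: any non-significant result is a Type II error.
type2 = sum(stats.ttest_ind(rng.normal(0, 1, n), rng.normal(0.5, 1, n)).pvalue >= alpha
            for _ in range(trials)) / trials

print(f"Type I rate:  {type1:.3f}  (should sit near alpha = 0.05)")
print(f"Type II rate: {type2:.3f}  (misses of a real difference)")
```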
Interactive lessons with many visualizations and examples are available on www.measuringusability.com.
Rejecting the opposite of what we’re interested in seems like a lot of hoops to jump through. Why not just test the hypothesis that there is a difference between versions? The reason for this approach is at the heart of the scientific process of falsification. It’s very difficult to prove something scientifically.
The p-value tells us the probability we’re making a Type I error. When we see a p-value of 0.05, we interpret this to mean that the probability of obtaining a difference this large or larger if the difference is really 0 is about 5%.
When we speak in terms of the standard deviation of the distribution of sample means, this special standard deviation goes by the name “standard error” to remind us that each sample mean we obtain differs by some amount from the true unknown population mean.
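The relationship is easy to verify numerically: the standard deviation of many sample means converges to sigma divided by the square root of n. A quick sketch (illustrative numbers, not from the book):

```python
# Sketch: the SD of sample means matches sigma / sqrt(n), the standard error.
import numpy as np

rng = np.random.default_rng(7)
sigma, n = 10.0, 25

# Draw 10,000 samples of size n and record each sample's mean.
sample_means = rng.normal(50, sigma, size=(10_000, n)).mean(axis=1)

print(f"SD of the sample means: {sample_means.std():.2f}")   # ~2.0
print(f"sigma / sqrt(n):        {sigma / np.sqrt(n):.2f}")   # 2.0
```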
Using the Excel function =TDIST(2.3,11,2) we get 0.04, which is called the p-value. A p-value is just a percentile rank or point in the t-distribution. It’s the same concept as the percent of area under the normal curve used with z-scores. A p-value of 0.04 means that only 4% of differences would be greater than 15 seconds if there really was no difference.
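The same two-tailed p-value can be computed outside Excel. A sketch using scipy, with the t-statistic of 2.3 and the 11 degrees of freedom taken from the passage above:

```python
# Sketch: scipy equivalent of Excel's =TDIST(2.3,11,2).
from scipy import stats

t_stat, df = 2.3, 11
p_value = 2 * stats.t.sf(t_stat, df)  # two-tailed tail area of the t-distribution
print(f"p = {p_value:.3f}")  # about 0.04
```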