
Roman's Data Science: How to monetize your data

Kozyrkov implores us to “always evaluate decision quality based only on what was known at the time the decision was made.”
Roman Zykov • Roman's Data Science: How to monetize your data
In his 1925 monograph Statistical Methods for Research Workers, Ronald Fisher (the founder of hypothesis testing) outlined concepts such as the statistical significance criterion, the rules for testing statistical hypotheses, analysis of variance, and experiment planning. This work defined our current approach to experiment planning.
Roman Zykov • Roman's Data Science: How to monetize your data
It is important to note that the p-value is not the probability that hypothesis H0 is correct. Rather, it only works towards rejecting the null hypothesis.
Roman Zykov • Roman's Data Science: How to monetize your data
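A note on the quote above: the p-value is the probability of seeing a result at least as extreme as the observed one, assuming H0 is true. The sketch below illustrates that definition with a permutation test on the difference in means. The function name `permutation_p_value`, the sample data, and the choice of a permutation test are assumptions made for illustration; they are not taken from the book.

```python
import random
from statistics import mean


def permutation_p_value(a, b, n_permutations=2000, seed=0):
    """Estimate a two-sided p-value for the difference in group means.

    Under H0 the group labels are interchangeable, so we shuffle them and
    count how often the shuffled difference is at least as large as the
    observed one.
    """
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            extreme += 1
    return extreme / n_permutations


# Hypothetical measurements for a control and a test group.
control = [1, 2, 3, 4, 5]
variant = [101, 102, 103, 104, 105]
print(permutation_p_value(control, variant))
```

A small value says only that the data would be surprising if H0 held. It does not say how likely H0 itself is.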
A hypothesis is an idea for how to improve a product.
Roman Zykov • Roman's Data Science: How to monetize your data
the further apart the peaks (averages) of these distributions are, the higher the power and the lower the probability of a Type 2 error (that the null hypothesis will be accepted incorrectly). This is most logical, as the further the averages of the distributions are from each other, the more obvious the difference between the hypotheses becomes, thus making ...
Roman Zykov • Roman's Data Science: How to monetize your data
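The relationship between the distance of the means and the power can be sketched numerically. The example assumes a one-sided, one-sample z-test with a known standard deviation, which is a simplification chosen here and not necessarily the setting the book uses. The function name `z_test_power` and its parameters are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_test_power(delta, sigma, n, alpha=0.05):
    """Power of a one-sided z-test when the true mean shift is `delta`.

    delta: distance between the means under H0 and H1
    sigma: known standard deviation of a single observation
    n:     sample size
    alpha: allowed probability of a Type 1 error
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)   # rejection threshold under H0
    shift = delta * sqrt(n) / sigma          # distance between the peaks, in standard errors
    return 1 - std_normal.cdf(z_crit - shift)


for delta in (0.0, 0.1, 0.3, 0.5):
    power = z_test_power(delta, sigma=1.0, n=100)
    print(f"delta={delta:.1f}  power={power:.3f}  type2={1 - power:.3f}")
```

With no shift at all, the power equals alpha. As the shift grows, the power rises and the Type 2 error probability (1 minus power) falls.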
When I hear the word “distribution,” I imagine a histogram showing the frequency of occurrences of a given event.
Roman Zykov • Roman's Data Science: How to monetize your data
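The histogram picture of a distribution can be made concrete in a few lines. This is a minimal sketch: the `histogram` helper, the bin width, and the order values are all made up for illustration.

```python
from collections import Counter


def histogram(values, bin_width):
    """Count how many values fall into each bin, keyed by the bin's left edge."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))


# Hypothetical order values.
order_values = [12, 18, 25, 27, 31, 33, 34, 38, 41, 58]
for left_edge, count in histogram(order_values, 10).items():
    print(f"{left_edge:>3}-{left_edge + 10:<3} {'#' * count}")
```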
Z-test – for checking the mean of a normally distributed quantity. Student’s t-test – the same as a z-test, but for small samples (n < 100).
Roman Zykov • Roman's Data Science: How to monetize your data
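Why small samples need their own test can be seen by simulation. When the standard deviation is estimated from the sample itself, the test statistic has heavier tails than the normal distribution, so the rejection threshold has to be larger. The sketch below estimates the two-sided 5% threshold empirically under H0. The function name `simulated_critical_value`, the sample sizes, and the number of simulated samples are assumptions for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist


def simulated_critical_value(n, n_samples=5000, level=0.95, seed=0):
    """Empirical two-sided critical value of the t statistic under H0.

    Draws `n_samples` samples of size `n` from a standard normal
    distribution and returns the `level` quantile of |t|.
    """
    rng = random.Random(seed)
    stats = []
    for _ in range(n_samples):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        m = sum(sample) / n
        s = sqrt(sum((x - m) ** 2 for x in sample) / (n - 1))  # sample std
        stats.append(abs(m / (s / sqrt(n))))
    stats.sort()
    return stats[int(level * n_samples)]


z_crit = NormalDist().inv_cdf(0.975)    # z-test threshold, about 1.96
small = simulated_critical_value(5)     # noticeably larger than z_crit
large = simulated_critical_value(100)   # close to z_crit
print(f"z: {z_crit:.2f}  n=5: {small:.2f}  n=100: {large:.2f}")
```

For large samples the two thresholds nearly coincide, which is why the z-test becomes an acceptable approximation there.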
In data analysis, survival bias is taking the known into account while neglecting the unknown (which nevertheless exists).
Roman Zykov • Roman's Data Science: How to monetize your data
All measurements contain errors. This is a fact, get over it. Errors themselves should be noted and not considered errors as such (I’ll explain how we can monitor this in a later chapter).