QX) Question where the answer is: continuity correction.
- we
QX) What is underflow and how do we prevent it?
QX) Your client, CEO of Big Ballers, is interested in knowing the mean height of their customer base, so you collect data from a random sample of 10 customers and use a Z-test to get the 95% confidence interval [169.3cm, 198.8cm]. Write a concise, plain language statement explaining this _____ to a non-statistician client
“A plausible range of values for the true (population) mean is [169.3cm, 198.8cm], at a 95% confidence level.”
Why is saying “There is a 95% chance that the population mean is in this interval” incorrect?
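Whether or not the true mean is within your observed confidence interval is deterministic. The true population mean is an unknown but fixed, single real value - therefore it is either inside your confidence interval or it isn’t - a 100% or 0% chance. The 95% describes the procedure, not this one interval: across many repeated samples, about 95% of the intervals built this way would contain the true mean.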
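A small simulation can make the "fixed but unknown" point concrete. The sketch below assumes a normal population with a known standard deviation; the values `true_mean=180.0` and `sigma=20.0` are made up for illustration and are not the client's data. It counts how often a 95% Z-interval from a sample of 10 contains the true mean.

```python
import random
from statistics import NormalDist, mean


def coverage(true_mean=180.0, sigma=20.0, n=10, level=0.95,
             trials=10_000, seed=0):
    """Fraction of simulated Z-intervals that contain the true mean.

    true_mean and sigma are hypothetical values chosen for illustration.
    """
    rng = random.Random(seed)
    # Two-sided critical value, e.g. about 1.96 for a 95% level.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sigma / n ** 0.5
    hits = 0
    for _ in range(trials):
        sample_mean = mean(rng.gauss(true_mean, sigma) for _ in range(n))
        # Each individual interval either contains the true mean or it doesn't.
        if abs(sample_mean - true_mean) <= half_width:
            hits += 1
    return hits / trials


print(f"share of intervals containing the true mean: {coverage():.3f}")
```

Every single interval is a hit or a miss; the 95% only shows up as the long-run share of hits.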
How helpful is this confidence interval? What could you do to make it narrower?
It’s not very helpful, since it spans almost 30cm. To narrow it:
- Increase the sample size
- Lower the confidence level
☞ There is a tradeoff between the level of confidence and the utility of the interval; this tradeoff is also present in machine learning
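To see how much each option helps, here is a rough sketch of the Z-interval width, 2 · z · σ / √n. It assumes the population standard deviation is known and backs it out of the reported interval, which is an assumption made only for illustration; the function name `z_interval_width` is hypothetical.

```python
from statistics import NormalDist


def z_interval_width(sigma, n, level):
    """Full width of a two-sided Z confidence interval for a mean."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return 2 * z * sigma / n ** 0.5


# Standard deviation implied by [169.3, 198.8] with n = 10 at 95%
# (an assumption for illustration: sigma is treated as known).
observed_width = 198.8 - 169.3
z95 = NormalDist().inv_cdf(0.975)
sigma = observed_width * 10 ** 0.5 / (2 * z95)

width_n10 = z_interval_width(sigma, 10, 0.95)  # the original interval
width_n40 = z_interval_width(sigma, 40, 0.95)  # 4x the sample size
width_90 = z_interval_width(sigma, 10, 0.90)   # lower confidence level

print(f"n=10, 95%: {width_n10:.1f} cm")
print(f"n=40, 95%: {width_n40:.1f} cm")
print(f"n=10, 90%: {width_90:.1f} cm")
```

Because the width scales with 1/√n, quadrupling the sample size halves the interval, while dropping the confidence level narrows it at the cost of capturing the true mean less often.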
What is the best confidence level for a confidence interval?
There is no ‘best choice’. 95%, 90% and 50% are common choices and vary by application
- Whatever you choose, the true value is never guaranteed to be inside the interval; there is always a chance it will be outside
- If the confidence level is very high
  - More likely to capture the true value
  - But impractically wide
- If very low
  - More ‘useful’ in the sense of being more selective about the possible values of the parameter
  - Comes at the expense of a loss of confidence - not very certain about whether the true value is captured inside the interval
Scuffed numbers. You suspect your users are making up data
How could you test this?
The accountant thing and a Chi Squared test of goodness of fit (from the tute, once I do it)
Q6.1) What is a test statistic?
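☞ A test statistic is a single number computed from your sample that summarises how far the data are from what the null hypothesis predicts. Its distribution under the null hypothesis is known, which is what lets us judge how surprising the observed value is.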
Q6.1) What does the ‘size’ of a statistical test mean?
Q6.2) What is the critical region of a statistical test?
these are bad. Just do p-value and extra: what is a test statistic
Q6.2) What does the ‘power’ of a statistical test mean?
☞ In plain English, the power of a statistical test tells you how likely it is to reject the null hypothesis when the null is actually false and should be rejected.
E.g. a test with 99% power will, on average, correctly reject a false null hypothesis 99% of the time.
Power tells you how ‘powerful’ a test is because it is the probability that the test detects an effect that is really there. A test that uses more of the information in the data will tend to have higher power. E.g. the signed rank test uses the magnitude of each difference as well as its sign.
- Power = 1 − β, where β is the probability of a Type II error (failing to reject a false null hypothesis)
Q6.1) Phrase this as an actual question w context; but essentially do we prefer a one sided or two sided test in this case.
Ans:
- When a one-sided test is appropriate, we prefer it, since it has more power than a two-sided test of the same size (you’re only betting on one side)
- This indicates a general principle: if a one-sided test is appropriate (i.e. we happen to be sure that θ is not less than 10 in this case, for some reason independent of the data), then the one-sided test is the better test, being more powerful for any fixed size. NB: this decision must be made before any observations are taken - it is wrong to choose a one-sided test based on what has been observed.
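The power gap can be checked numerically. The sketch below assumes a Z-test with known standard deviation, which is a choice made here for illustration and not necessarily the test the question has in mind. `delta` is the standardised true effect, (true mean − null mean) · √n / σ, and both function names are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def power_one_sided(delta, alpha=0.05):
    """Power of the upper-tailed Z-test of size alpha."""
    z = STD_NORMAL.inv_cdf(1 - alpha)
    return 1 - STD_NORMAL.cdf(z - delta)


def power_two_sided(delta, alpha=0.05):
    """Power of the two-sided Z-test of size alpha."""
    z = STD_NORMAL.inv_cdf(1 - alpha / 2)
    # Reject in either tail; the lower tail adds very little when delta > 0.
    return (1 - STD_NORMAL.cdf(z - delta)) + STD_NORMAL.cdf(-z - delta)


for delta in (0.5, 1.0, 2.0, 3.0):
    one = power_one_sided(delta)
    two = power_two_sided(delta)
    print(f"delta={delta}: one-sided {one:.3f}, two-sided {two:.3f}")
```

For any effect in the direction the one-sided test is looking, it has the higher power at the same size. If the true effect is in the opposite direction, the one-sided test has almost no chance of detecting it, which is why the choice must be justified before seeing the data.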
QX) (situation where one should use mean as a measure of central tendency)
QX) (situation where one should use median as a measure of central tendency)
The bowl example , Week 7 LAB STATS
QX) Perform a Hypothesis Test on the accuracy of a new ML Algorithm
A particular machine learning algorithm is believed to correctly classify on average 60% of new data points. To test this hypothesis, twenty data points are tested using this algorithm, and if fewer than 8 are correctly classified, it will be assumed that the classification rate is less than 0.60. State H0 and H1, the test statistic, and its exact distribution. What is the size of the test? What is the power of the test for a classification rate of 0.30?
Solution:
- State H0 and H1:
  - H0 (null hypothesis): the algorithm correctly classifies 60% of data points, i.e. p = 0.60.
  - H1 (alternative hypothesis): the algorithm correctly classifies less than 60% of data points, i.e. p < 0.60.
- Test statistic and its exact distribution: the test statistic X is the number of data points correctly classified. Under H0 it follows a binomial distribution with n = 20 (number of trials) and p = 0.60 (probability of success on each trial), i.e. X ~ B(20, 0.60).
- Size of the test: the size (or significance level) of the test is the probability of rejecting the null hypothesis when it is true, i.e. P(X < 8 | p = 0.60). Using a binomial probability table or calculator for B(20, 0.60): P(X < 8) = P(X ≤ 7) ≈ 0.021. So the size of the test is about 0.021 (or 2.1%).
- Power of the test for a classification rate of 0.30: the power of the test is the probability of rejecting the null hypothesis when a specific alternative is true. Here, the specific alternative is p = 0.30, so the power is P(X < 8 | p = 0.30). Using a binomial probability table or calculator for B(20, 0.30): P(X < 8) = P(X ≤ 7) ≈ 0.772. Therefore, the power of the test when the true classification rate is 0.30 is about 0.772 (or 77.2%).
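The size and power can be checked without tables by summing binomial probabilities directly. This is a minimal sketch using only the standard library; `binom_cdf` is a helper name introduced here for illustration.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), summed exactly from the pmf."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


N = 20
CUTOFF = 7  # "fewer than 8 correct" means X <= 7

size = binom_cdf(CUTOFF, N, 0.60)   # P(reject H0 | p = 0.60)
power = binom_cdf(CUTOFF, N, 0.30)  # P(reject H0 | p = 0.30)

print(f"size  = {size:.3f}")   # 0.021
print(f"power = {power:.3f}")  # 0.772
```

The same rejection rule is evaluated under two different values of p: under the null it gives the size, and under the alternative it gives the power.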
Follow up: State your conclusion as precisely as possible
☞ It is good practice to say that the test indicates weak/strong evidence against the null hypothesis, rather than saying we ‘reject’ the null hypothesis and ‘accept’ the alternative. This is because we can never be 100% certain about which hypothesis is true in reality; we can only convey our level of confidence in each.
QX) Imagine we have done an ANOVA to see whether there is a difference in
☞ Similarly, if we get a p-value that is greater than 0.05, it is best practice to avoid saying ‘we reject the alternative hypothesis and accept the null hypothesis’. The underlying truth of which hypothesis is true has not changed. Instead, you can say “There is insufficient evidence against the null hypothesis, and thus no strong conclusions can be made about XYZ.”