Hypothesis Testing in ML – Explained to Kids

Machine learning practitioners come from a tradition of algorithms, with a pragmatic focus on results and model skill above other concerns such as model interpretability.

Statisticians work on much the same type of modeling problems under the names of applied statistics and statistical learning. Coming from a mathematical background, they have more of a focus on the behavior of models and the explainability of predictions.

The very close relationship between the two approaches to the same problem means that both fields have a lot to learn from each other. Machine learning draws on many statistical concepts, and one of them is hypothesis testing.

Let’s understand hypothesis testing in ML.

What is Hypothesis Testing in Machine Learning?

Machine learning models are chosen based on their mean performance, often calculated using k-fold cross-validation.

The algorithm with the best mean performance is expected to be better than those algorithms with worse mean performance. But what if the difference in the mean performance is caused by a statistical fluke?

The solution is to use a statistical hypothesis test to evaluate whether the difference in the mean performance between any two algorithms is real or not.

Model selection involves evaluating a suite of different machine learning algorithms or modeling pipelines and comparing them based on their performance.

The model or modeling pipeline that achieves the best performance according to your performance metric is then selected as the final model that you can then use to start making predictions on new data.

This applies to regression and classification predictive modeling tasks with classical machine learning algorithms and deep learning. It’s always the same process.

The problem is, how do you know the difference between two models is real and not just a statistical fluke?

This problem can be addressed using a statistical hypothesis test.
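As a rough illustration of comparing two models fold by fold, the sketch below runs an exact sign-flip (permutation) test on paired per-fold scores. This is one possible test, chosen here because it needs only the Python standard library; the text does not prescribe a specific test. The fold accuracies and the function name `paired_sign_flip_test` are hypothetical. Scores from k-fold cross-validation are not fully independent because the training sets overlap, so the resulting p-value should be read as approximate.

```python
from itertools import product
from statistics import mean


def paired_sign_flip_test(scores_a, scores_b):
    """Exact two-sided sign-flip (permutation) test on paired fold scores.

    Under H0 (no real difference between the models), each per-fold
    difference is equally likely to be positive or negative. We enumerate
    every sign assignment and count how often the mean difference is at
    least as extreme as the one actually observed.
    """
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(mean(diffs))
    extreme = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        flipped = mean(s * d for s, d in zip(signs, diffs))
        # Small tolerance guards against floating-point ties.
        if abs(flipped) >= observed - 1e-12:
            extreme += 1
    return mean(diffs), extreme / total


# Hypothetical accuracies of two models on the same 10 folds (not real data).
model_a = [0.81, 0.79, 0.84, 0.80, 0.83, 0.82, 0.78, 0.85, 0.80, 0.82]
model_b = [0.78, 0.77, 0.80, 0.79, 0.80, 0.79, 0.77, 0.81, 0.78, 0.79]

mean_diff, p_value = paired_sign_flip_test(model_a, model_b)
print(f"mean difference = {mean_diff:.3f}, p-value = {p_value:.4f}")
```

In this made-up data, model A wins on every fold, so only 2 of the 1,024 possible sign assignments are as extreme as the observed one. The p-value is about 0.002, which suggests the difference is unlikely to be a fluke.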

What is Hypothesis Testing?

Hypothesis testing is a form of statistical inference that uses data from a sample to draw conclusions about a population parameter or a population probability distribution.

First, a tentative assumption is made about the parameter or distribution. This assumption is called the Null Hypothesis and is denoted by H0. An Alternative Hypothesis (denoted Ha), which is the opposite of what is stated in the null hypothesis, is then defined. The hypothesis-testing procedure involves using sample data to determine whether or not H0 can be rejected. If H0 is rejected, the statistical conclusion is that the data support the alternative hypothesis Ha.

For example, assume that a radio station selects the music it plays based on the assumption that the average age of its listening audience is 30 years. To determine whether this assumption is valid, a hypothesis test could be conducted with the null hypothesis given as H0: μ = 30 and the alternative hypothesis given as Ha: μ ≠ 30.

Based on a sample of individuals from the listening audience, the sample mean age, x̄, can be computed and used to determine whether there is sufficient statistical evidence to reject H0. Conceptually, a value of the sample mean that is “close” to 30 is consistent with the null hypothesis, while a value of the sample mean that is “not close” to 30 provides support for the alternative hypothesis. What is considered “close” and “not close” is determined by using the sampling distribution of x̄.
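The radio station example can be sketched in a few lines of Python. The ages below are invented for illustration, and the function name `one_sample_z_test` is hypothetical. The sketch assumes the sample is large enough for a normal approximation to the sampling distribution of the sample mean. With a small sample, a t-distribution would be the more usual choice.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def one_sample_z_test(sample, mu0):
    """Two-sided test of H0: mu = mu0 using a normal approximation."""
    n = len(sample)
    x_bar = mean(sample)
    # Standard error: how much the sample mean varies from sample to sample.
    standard_error = stdev(sample) / sqrt(n)
    # How many standard errors the sample mean lies from the hypothesized mean.
    z = (x_bar - mu0) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return x_bar, z, p_value


# Hypothetical listener ages (not real data); ten values repeated to give n = 40.
ages = [22, 25, 27, 28, 29, 30, 31, 33, 35, 38] * 4

x_bar, z, p_value = one_sample_z_test(ages, mu0=30)
print(f"sample mean = {x_bar:.1f}, z = {z:.2f}, p-value = {p_value:.2f}")
```

Here the sample mean is 29.8, which is “close” to 30 relative to the standard error, so the p-value is large and H0: μ = 30 is not rejected.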

Process of Conducting Hypothesis Testing

When you are evaluating a hypothesis, you need to account for both the variability in your sample and how large your sample is.  Based on this information, you’d like to make an assessment of whether any differences you see are meaningful, or if they are likely just due to chance.  This is formally done through a process called hypothesis testing.

The five steps in Hypothesis Testing are:

  • Specify the Null Hypothesis
  • Specify the Alternative Hypothesis
  • Set the Significance Level
  • Calculate the Test Statistic and Corresponding p-Value
  • Drawing a Conclusion

Step 1: Specify the Null Hypothesis

The null hypothesis (H0) is a statement of no effect, relationship, or difference between two or more groups or factors.  In research studies, a researcher is usually interested in disproving the null hypothesis.

Examples

  • There is no difference in intubation rates across ages 0 to 5 years.
  • The intervention and control groups have the same survival rate (or, the intervention does not improve the survival rate).
  • There is no association between injury type and whether or not the patient received an IV in the prehospital setting.

Step 2: Specify the Alternative Hypothesis

The alternative hypothesis (Ha) is the statement that there is an effect or difference.  This is usually the hypothesis the researcher is interested in proving.  The alternative hypothesis can be one-sided (only provides one direction, e.g., lower) or two-sided.  We often use two-sided tests even when our true hypothesis is one-sided because it requires more evidence against the null hypothesis to accept the alternative hypothesis.

Examples

  • The intubation success rate differs with the age of the patient being treated (two-sided).
  • The time to resuscitation from cardiac arrest is lower for the intervention group than for the control (one-sided).
  • There is an association between injury type and whether or not the patient received an IV in the prehospital setting (two-sided).
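To see why a two-sided test asks for more evidence than a one-sided one, the sketch below computes both p-values for the same test statistic. It assumes the statistic follows a standard normal distribution when the null hypothesis is true. The value z = -1.8 and the function name `p_values_from_z` are hypothetical.

```python
from statistics import NormalDist


def p_values_from_z(z):
    """Return (lower-tail one-sided, two-sided) p-values for a z statistic."""
    std_normal = NormalDist()
    one_sided_lower = std_normal.cdf(z)            # Ha: the value is lower
    two_sided = 2 * (1 - std_normal.cdf(abs(z)))   # Ha: the value differs
    return one_sided_lower, two_sided


z = -1.8  # hypothetical test statistic
one_sided, two_sided = p_values_from_z(z)
print(f"one-sided p = {one_sided:.3f}, two-sided p = {two_sided:.3f}")
```

For a negative z, the two-sided p-value is twice the one-sided one. In this made-up case the one-sided p-value falls below 0.05 while the two-sided p-value does not, so the same data would count as significant only under the one-sided alternative.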

Step 3: Set the Significance Level (α)

The significance level (denoted by the Greek letter alpha, α) is generally set at 0.05.  This means that you accept a 5% chance of rejecting your null hypothesis, and so accepting your alternative hypothesis, when the null hypothesis is actually true. The smaller the significance level, the greater the burden of proof needed to reject the null hypothesis, or in other words, to support the alternative hypothesis.
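A small simulation can make the meaning of α concrete. The sketch below repeatedly draws samples for which the null hypothesis really is true and counts how often a test at α = 0.05 rejects it anyway (a false positive, also called a Type I error). The setup (normal data with known standard deviation 1, 2,000 repetitions, fixed random seed) and the function name `false_positive_rate` are assumptions made for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def false_positive_rate(alpha=0.05, n_tests=2000, sample_size=30, seed=0):
    """Share of tests that reject H0 even though H0 is true."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    rejections = 0
    for _ in range(n_tests):
        # Draw a sample for which H0 (mean = 0, standard deviation = 1) holds.
        sample = [rng.gauss(0.0, 1.0) for _ in range(sample_size)]
        # z statistic with known sigma = 1: mean / (1 / sqrt(n)).
        z = fmean(sample) * sqrt(sample_size)
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
        if p_value <= alpha:
            rejections += 1
    return rejections / n_tests


rate = false_positive_rate()
print(f"share of true nulls rejected at alpha = 0.05: {rate:.3f}")
```

The printed share should land near 0.05, and lowering `alpha` lowers it accordingly.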

Step 4: Calculate the Test Statistic and Corresponding p-Value

Hypothesis testing generally uses a test statistic that compares groups or examines associations between variables.  When describing a single sample without establishing relationships between variables, a confidence interval is commonly used.

The p-value describes the probability of obtaining a sample statistic as extreme as, or more extreme than, the one you observed by chance alone if your null hypothesis is true.  This p-value is determined based on the result of your test statistic.  Your conclusions about the hypothesis are based on your p-value and your significance level.

Examples

Ex 1: p-value = 0.01. A result at least this extreme will happen 1 in 100 times by pure chance if your null hypothesis is true. Not likely to happen strictly by chance.

Ex 2: p-value = 0.75. A result at least this extreme will happen 75 in 100 times by pure chance if your null hypothesis is true. Very likely to occur strictly by chance.

If you do a large number of tests to evaluate a hypothesis (called multiple testing), then you need to control for this in your designation of the significance level or calculation of the p-value.  For example, if three outcomes measure the effectiveness of a drug or other intervention, you will have to adjust for these three analyses.
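The text does not say which adjustment to use. One common and conservative choice is the Bonferroni correction, which divides the significance level by the number of tests. The sketch below applies it to three made-up p-values; the function name `bonferroni_reject` is hypothetical.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Flag which hypotheses are rejected after a Bonferroni correction."""
    # Each individual test must clear a stricter threshold: alpha / number of tests.
    adjusted_alpha = alpha / len(p_values)
    decisions = [p <= adjusted_alpha for p in p_values]
    return adjusted_alpha, decisions


# Hypothetical p-values for three outcomes of the same intervention.
p_values = [0.030, 0.010, 0.200]

adjusted_alpha, decisions = bonferroni_reject(p_values)
print(f"per-test threshold = {adjusted_alpha:.4f}, reject = {decisions}")
```

With three tests, the per-test threshold drops from 0.05 to about 0.0167, so the first p-value (0.030) would be significant on its own but is no longer significant after the adjustment.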

Step 5: Drawing a Conclusion

  1. p-value <= significance level (α) => Reject your null hypothesis in favor of your alternative hypothesis.  Your result is statistically significant.
  2. p-value > significance level (α) => Fail to reject your null hypothesis.  Your result is not statistically significant.

Hypothesis testing is not set up so that you can absolutely prove a null hypothesis.  Therefore, when you do not find evidence against the null hypothesis, you fail to reject the null hypothesis. When you do find strong enough evidence against the null hypothesis, you reject the null hypothesis. 

Your conclusions also translate into a statement about your alternative hypothesis.  When presenting the results of a hypothesis test, include descriptive statistics in your conclusions as well.  Report exact p-values rather than a certain range. 

For example, “The intubation rate differed significantly by patient age with younger patients having a lower rate of successful intubation (p=0.02).”  Here are two more examples with the conclusion stated in several different ways.

Examples

Ex 1:

  • H0: There is no difference in survival between the intervention and control groups.
  • Ha: There is a difference in survival between the intervention and control groups.
  • α = 0.05; 20% increase in survival for the intervention group; p-value = 0.002

Conclusion:

  • Reject the null hypothesis in favor of the alternative hypothesis.
  • The difference in survival between the intervention and control groups was statistically significant.
  • There was a 20% increase in survival for the intervention group compared to the control (p=0.002).

Ex 2:

  • H0: There is no difference in survival between the intervention and control groups.
  • Ha: There is a difference in survival between the intervention and control groups.
  • α = 0.05; 5% increase in survival for the intervention group; p-value = 0.20.

Conclusion:

  • Fail to reject the null hypothesis.
  • The difference in survival between the intervention and control groups was not statistically significant.
  • There was no significant increase in survival for the intervention group compared to the control (p=0.20).
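The decision rule from Step 5 is simple enough to write as a short function. The sketch below applies it to the stated p-values of the two examples (0.002 and 0.20) at α = 0.05; the function name `draw_conclusion` is hypothetical.

```python
def draw_conclusion(p_value, alpha=0.05):
    """Apply the Step 5 rule: reject H0 only if p-value <= alpha."""
    if p_value <= alpha:
        return "reject H0 (statistically significant)"
    return "fail to reject H0 (not statistically significant)"


# p-values taken from Ex 1 and Ex 2 above.
examples = {"Ex 1": 0.002, "Ex 2": 0.20}

for name, p_value in examples.items():
    print(f"{name}: p = {p_value} -> {draw_conclusion(p_value)}")
```

Note that the function never returns "accept H0": a large p-value only means the data did not provide enough evidence against the null hypothesis.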

Practice Problems

  1. What is Hypothesis Testing?
  2. What is the Null Hypothesis?
  3. What is the Alternative Hypothesis?
  4. What is the level of significance?
  5. What is two-tail hypothesis testing?
  6. What is one-tail hypothesis testing?
  7. How many types of one-tail hypothesis testing are there?
  8. What is a p-value?
  9. Write down the criteria for drawing a conclusion.

FAQs

What do you mean by hypothesis testing?

Hypothesis testing is a form of statistical inference that uses data from a sample to draw conclusions about a population parameter or a population probability distribution.

What is the main purpose of hypothesis testing?

The purpose of hypothesis testing is to determine whether there is enough statistical evidence in favor of a certain belief, or hypothesis, about a parameter.

What are the steps involved in hypothesis testing?

Hypothesis testing involves five steps:
a) Specify the Null Hypothesis
b) Specify the Alternative Hypothesis
c) Set the Significance Level
d) Calculate the Test Statistic and Corresponding p-Value
e) Drawing a Conclusion

Conclusion

Hypothesis testing is a form of statistical inference that uses data from a sample to draw conclusions about a population parameter or a population probability distribution. The purpose of hypothesis testing is to determine whether there is enough statistical evidence in favor of a certain belief, or hypothesis, about a parameter.
