What is Hypothesis Testing in Data Science? / Dr Wajid Khan, PhD

Data helps people figure out all sorts of things, like guessing if a new toy will sell or if rain might fall. Hypothesis testing acts like a detective tool in data science, helping decide if guesses about numbers hold true. Imagine playing a game where you test ideas to see if they’re right! Below begins an exciting journey to explore hypothesis testing, step by simple step, using ideas from a special lesson plan.

Learning hypothesis testing makes digging into data feel like solving a mystery. Picture yourself as a number detective, hunting for clues to prove or bust guesses. Each part explains what hypothesis testing means, how to measure numbers, check their spread, test guesses, and use computer tricks—all in a way that’s clear and fun for young readers.

Definition

Hypothesis testing means checking if a guess about numbers makes sense. Imagine wondering if most kids like chocolate ice cream best. You’d test that idea by asking some kids and looking at their answers. Hypothesis testing uses numbers to decide if your guess stands up or falls flat.

In data science, it helps answer big questions. A shop might guess customers spend 50 dollars on average, then test it by counting real sales. A smart thinker named Ronald Fisher said hypothesis testing helps us learn truths from numbers, like finding treasure in a pile of clues.

It starts with two guesses: one saying things stay normal (called the null guess), and another saying something changes (the alternative guess). Testing picks the winner based on proof from the numbers collected.

Computers play a big role here. Tools like Python can crunch numbers fast, helping test guesses without paper and pencil. It’s like having a speedy robot helper!

Hypothesis testing pops up everywhere. Think about guessing if a new game beats an old one or if plants grow taller with rain. They’re all ideas to test and explore.

Measuring Numbers

Hypothesis testing needs ways to sum up numbers. Let’s look at three easy helpers!

Most Common Number (Mode)

The mode picks the number showing up most often. Imagine five kids picking snacks: three choose cookies, one picks chips, one grabs candy. Cookies win as the mode since three kids picked them—more than any other snack.

Finding the mode helps spot what’s popular. A toy store might count which toy sells most, making “mode” their star helper for stocking shelves.

It’s super simple—just look for the number repeating most. If two numbers tie, both can be the mode, like a fun double winner!
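Here is a small sketch of the snack example in Python. It assumes Python 3.8 or newer and uses the standard `statistics` module; the list names are made up for illustration.

```python
from statistics import multimode

# Hypothetical snack picks from five kids, matching the example above.
snack_picks = ["cookies", "cookies", "cookies", "chips", "candy"]

# multimode returns a list holding every value tied for most common.
modes = multimode(snack_picks)
print(modes)  # ['cookies']

# When two snacks tie, both come back as modes.
tied_picks = ["cookies", "cookies", "chips", "chips", "candy"]
tied_modes = multimode(tied_picks)
print(tied_modes)  # ['cookies', 'chips']
```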

Middle Number (Median)

The median finds the number smack in the middle. Picture lining up five heights: 100, 110, 120, 130, 140 centimetres. Sort them, and 120 lands in the centre—that’s the median.

It helps when numbers jump around. If one kid grows super tall at 200 centimetres, the median stays steady, showing a fair middle ground.

Finding it means sorting numbers low to high, then picking the one in the middle. It’s like finding the heart of a number line!
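The height example can be tried in Python too. This is only a sketch: the second list is an assumption that swaps the tallest kid for one who is 200 centimetres, to show the median holding steady.

```python
from statistics import median

# The five heights from the example, deliberately out of order.
heights_cm = [140, 100, 130, 110, 120]

# median sorts the numbers for us and picks the middle one.
middle = median(heights_cm)
print(middle)  # 120

# Assumed variation: the tallest kid is now 200 cm instead of 140 cm.
heights_with_tall_kid = [200, 100, 130, 110, 120]
middle_with_tall_kid = median(heights_with_tall_kid)
print(middle_with_tall_kid)  # 120, the middle has not moved
```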

Average Number (Mean)

The mean adds all numbers and splits them evenly. Say kids score 10, 20, and 30 on a game. Add them—60—then divide by 3, giving 20 as the mean.

It’s great for seeing the usual amount. A teacher might use the mean to check how kids do on tests overall, giving a big-picture idea.

To find it, add everything up and divide by how many numbers exist. It’s like sharing candies equally among pals!
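The add-then-divide recipe looks like this in Python. It is a minimal sketch using the three game scores from above; the variable names are just for illustration.

```python
from statistics import mean

scores = [10, 20, 30]

# Step by step: add everything up, then divide by how many numbers exist.
total = sum(scores)              # 60
mean_by_hand = total / len(scores)

# The standard library does the same job in one call.
mean_builtin = mean(scores)

print(mean_by_hand, mean_builtin)
```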

Checking Spread

Numbers spread out sometimes, and hypothesis testing checks how far they go. Let’s explore two helpers!

Number Stretch (Variance)

Variance measures how far numbers stray from the mean. Imagine three scores: 10, 20, 30. The mean is 20. Variance looks at differences: 10 is 10 below, 20 is 0 away, 30 is 10 above. Square those (100, 0, 100), add them (200), and divide by 3—about 67.

It shows if numbers stick close or fly apart. A big variance means wild differences, like kids’ heights all over the place.

Calculating variance takes each number’s gap from the mean, squares it, adds them up, and divides by the count. It’s a stretch-o-meter for numbers!
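The sketch below follows those same steps for the scores 10, 20, and 30, then checks the answer against Python's `statistics.pvariance`, which also divides by the count. The variable names are assumptions made for this example.

```python
from statistics import pvariance

scores = [10, 20, 30]
mean_score = sum(scores) / len(scores)  # 20.0

# Gap from the mean for each score, squared: 100, 0, 100.
squared_gaps = [(score - mean_score) ** 2 for score in scores]

# Add the squared gaps and divide by the count: 200 / 3, about 67.
variance_by_hand = sum(squared_gaps) / len(scores)

# pvariance ("population variance") uses the same divide-by-count rule.
variance_builtin = pvariance(scores)

print(round(variance_by_hand, 2), round(variance_builtin, 2))
```

A related function, `statistics.variance`, divides by the count minus one instead, so it gives 100 for these scores. That version is the usual choice when the numbers are only a sample from a bigger group.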

Average Jump (Standard Deviation)

Standard deviation shrinks variance into an easier number. Take that 67 from variance, find its square root—about 8. That’s how far numbers jump from the mean on average.

It helps guess where most numbers land. For a mean of 20 and standard deviation of 8, most scores hover between 12 and 28, like a cozy number hug.

To get it, find variance first, then take its square root. It’s like measuring how bouncy numbers feel!
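As a quick sketch, Python can take the square root for us. This example reuses the scores 10, 20, and 30 and the divide-by-count variance from earlier; the names are chosen just for illustration.

```python
from math import sqrt
from statistics import pstdev, pvariance

scores = [10, 20, 30]

# Route 1: find the variance first, then take its square root.
std_from_variance = sqrt(pvariance(scores))

# Route 2: pstdev does both steps in one call.
std_dev = pstdev(scores)

# One "jump" either side of the mean of 20.
low = 20 - std_dev
high = 20 + std_dev

print(round(std_dev, 2), round(low, 1), round(high, 1))
```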

Testing Guesses

Hypothesis testing checks guesses in fun ways. Let’s break it down!

Binomial Guess Game

Imagine rolling 20 dice, each with a skull on one side out of six. What’s the chance of exactly 4 skulls? Each die has a one-out-of-six shot at a skull. Testing works out how often 4 skulls pop up in 20 rolls.

Computers make it easy. Python can work out that the chance is about 20 out of 100. Pretty neat! It’s like rolling dice a zillion times fast.

This game helps when things happen or don’t, like flipping coins or picking winning tickets. It’s a yes-or-no guess party!
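One way to get the skull number is the binomial formula: count the ways to pick which dice show skulls, then multiply by the chance of each pattern. The helper name `binomial_probability` below is made up for this sketch; `math.comb` is part of standard Python 3.8 and newer.

```python
from math import comb

def binomial_probability(successes, trials, chance):
    """Chance of exactly `successes` hits in `trials` independent tries."""
    ways = comb(trials, successes)                 # which tries are the hits
    hit_part = chance ** successes                 # every hit happens
    miss_part = (1 - chance) ** (trials - successes)  # every other try misses
    return ways * hit_part * miss_part

# 20 dice, exactly 4 skulls, each die has a 1-in-6 chance of a skull.
p_four_skulls = binomial_probability(4, 20, 1 / 6)
print(round(p_four_skulls, 2))  # about 0.2, or 20 out of 100
```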

Normal Guess Shape

Numbers sometimes form a bell shape, called normal. Picture kids’ heights: most cluster around the middle, with fewer super short or tall. Testing checks if numbers follow this shape, starting with an average (mean) of 0 and a jump (standard deviation) of 1.

It predicts most numbers land close: about 68 out of 100 stay within one jump of the mean, about 95 out of 100 within two jumps, and more than 99 out of 100 within three jumps. It’s like a number map!

Testing shifts numbers to fit this shape. If kids’ scores average 50 and jump 10, a score of 32 turns into a spot 1.8 jumps below the mean. The chance of scoring that low or lower is tiny, about 3.59 out of 100.
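The shift can be sketched with Python's `statistics.NormalDist` (available from Python 3.8). The mean of 50, jump of 10, and score of 32 come from the example; the variable names are assumptions.

```python
from statistics import NormalDist

mean_score = 50
std_dev = 10
score = 32

# Shift the score onto the standard bell: how many jumps from the mean?
z = (score - mean_score) / std_dev
print(z)  # -1.8

# Chance of landing at this spot or lower on the standard bell shape.
standard_bell = NormalDist()  # mean 0, standard deviation 1
chance_below = standard_bell.cdf(z)
print(round(chance_below * 100, 2))  # about 3.59 out of 100

# Share of numbers within one jump of the mean, about 68 out of 100.
within_one_jump = standard_bell.cdf(1) - standard_bell.cdf(-1)
print(round(within_one_jump * 100))
```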

T Guess Shape

Sometimes only a few numbers exist, not the whole bunch. The T shape helps then, looking like the bell but plumper at the ends. It’s great for small groups, like 5 kids’ scores.

It guesses more carefully than the normal shape, spreading chances wider. Small groups need that extra wiggle room!

Using T means guessing averages when not everything is known, like a cautious detective checking clues.

Z and T Guess Checks

Z checks guesses when the whole spread is known. Imagine knowing all kids’ score jumps—Z tests if their average matches 67.5. T steps in when only some jumps are known, using what’s collected.

Most times, T wins since the full spread usually stays hidden. Z fits big groups, T fits small ones, and each has its own guessing superpower!

Picking between them depends on what’s known. Z loves full maps, T loves little sketches—both solve number mysteries!
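To see the difference side by side, here is a rough sketch. The claimed average of 67.5 comes from the text above, but the five scores and the "known" jump of 5 are invented for illustration only.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

claimed_mean = 67.5
scores = [70, 65, 72, 68, 75]  # hypothetical sample of five kids
n = len(scores)
sample_mean = mean(scores)     # 70

# Z check: pretend the true jump (standard deviation) is known to be 5.
known_std_dev = 5.0            # assumption for this sketch
z_stat = (sample_mean - claimed_mean) / (known_std_dev / sqrt(n))

# Two-sided chance of a gap this big if the claimed mean were true.
z_p_value = 2 * (1 - NormalDist().cdf(abs(z_stat)))

# T check: the jump is unknown, so estimate it from the sample itself.
sample_std_dev = stdev(scores)  # divides by n - 1
t_stat = (sample_mean - claimed_mean) / (sample_std_dev / sqrt(n))

print(round(z_stat, 3), round(z_p_value, 3), round(t_stat, 3))
```

Turning `t_stat` into a chance needs the T shape with n - 1 = 4 degrees of freedom, which the standard library does not include. A T table or a library such as SciPy (`scipy.stats.t`) can do that step.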

Using Computer Tricks

Computers speed up hypothesis testing big time!

Python crunches guesses fast. For 20 dice and 4 skulls, it spits out about 20 out of 100 without breaking a sweat. For 100 coin flips and only 5 heads, it finds the tiny odds super quick too.
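Another way to let the computer do the work is to actually "roll" pretend dice many times and count. The sketch below is one possible approach, not the only one: the trial count, the seed, and the choice that face 1 is the skull are all assumptions made here.

```python
import random
from math import comb

rng = random.Random(2025)   # fixed seed so the run can be repeated
trials = 20_000             # assumed number of pretend games
hits = 0

for _ in range(trials):
    # Roll 20 dice; treat face 1 as the skull side.
    skulls = sum(1 for _ in range(20) if rng.randint(1, 6) == 1)
    if skulls == 4:
        hits += 1

# Share of games with exactly 4 skulls; it should land near 0.20.
estimated_chance = hits / trials
print(round(estimated_chance, 3))

# The coin example needs no simulation: exactly 5 heads in 100 fair flips.
p_five_heads = comb(100, 5) / 2 ** 100
print(p_five_heads)  # a very tiny number
```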

It turns tough math into playtime. Instead of adding and dividing forever, Python zips through, showing answers like magic.

Testing normal or T shapes gets easier too. Python shifts numbers and checks chances, saving time for more guessing fun!

Computers act like trusty pals, making hypothesis testing a breeze for any number adventure.

Importance of Hypothesis Testing

Hypothesis testing rocks in data science—here’s why!

It proves guesses right or wrong, like if a new snack beats the old one. Shops use it to plan what kids might buy next.

Doctors test if medicine helps, keeping guesses real and fair. Game makers check if new levels work better, making playtime awesome.

A clever mind, George Box, said all guesses need testing—none start perfect. It keeps ideas honest and sharp!

Hypothesis testing turns data into answers, helping everyone make smart moves every day.

Applying Hypothesis Testing

Using hypothesis testing feels like a number treasure hunt!

Start by picking a guess, like kids liking cookies best. Collect numbers by asking some kids their picks. Check them by measuring averages and spreads.

Test the guess by deciding if it holds or flops, using computer tricks to speed it up. It’s a detective game for finding number truths!
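The whole hunt can be sketched in a few lines. Everything specific here is an assumption for illustration: the survey result (7 of 10 kids picking cookies), the null guess that three snacks are equally liked, and the common 5-out-of-100 cutoff for calling a result surprising.

```python
from math import comb

# Step 1: the guesses.
# Null guess: cookies are no more popular than chips or candy (chance 1/3).
# Alternative guess: kids like cookies best (chance above 1/3).
null_chance = 1 / 3

# Step 2: collect numbers (hypothetical survey).
kids_asked = 10
picked_cookies = 7

# Step 3: if the null guess were true, how often would 7 or more
# kids out of 10 pick cookies just by luck?
p_value = sum(
    comb(kids_asked, k) * null_chance ** k * (1 - null_chance) ** (kids_asked - k)
    for k in range(picked_cookies, kids_asked + 1)
)

# Step 4: decide, using an assumed cutoff of 0.05.
cutoff = 0.05
winner = "alternative" if p_value < cutoff else "null"

print(round(p_value, 4), winner)
```

With these made-up numbers the luck-only chance is about 2 out of 100, so the alternative guess wins. A different survey result could easily flip the decision.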

Conclusion

Hypothesis testing stands as a data science buddy, checking guesses with numbers. It measures averages, tracks spreads, tests ideas like dice or shapes, and uses computers for fun. Each step builds a mini-adventure, turning anyone into a number detective. Hypothesis testing spins data into cool answers for school, games, or just wondering!

References

Slide 2: Central Tendency Summary, Statistics for Data Science.
Slide 3: Variance and Standard Deviation, Statistics for Data Science.
Slide 4: Binomial Distribution, Statistics for Data Science.
Slide 7: Standard Normal Distribution, Statistics for Data Science.
Slide 20: One-Sample Z-Test, Statistics for Data Science.

Books

Montgomery, D. C. (2013). Design and Analysis of Experiments. A big book on testing ideas.
Moore, D. S. (2016). Introduction to the Practice of Statistics. A friendly guide to number fun.
Box, G. E. P. (1978). Statistics for Experimenters. Explains why testing guesses matters.

Activities for Students

Here follow five easy games for playing with hypothesis testing:

Snack Pick Count
Task: Ask 10 pals their favourite snack—cookies, chips, or candy.
Steps: Count the most picked (mode) and guess if cookies win most often.
Outcome: See if your guess about cookies holds true.

Height Line-Up
Task: Measure 5 friends’ heights in centimetres.
Steps: Find the middle height (median) and guess if most kids match it.
Outcome: Check if the middle tells a real story.

Game Score Average
Task: Play a game 5 times, jotting scores.
Steps: Add scores, divide by 5 (mean), and guess if it’s usual.
Outcome: Test if your average fits most tries.

Dice Skull Hunt
Task: Roll 6 dice 10 times, counting skulls each time (one skull side per die).
Steps: Guess if 1 skull happens most and check your counts.
Outcome: Explore binomial guessing fun.

Toy Test Guess
Task: Test 5 toys for working or breaking, guessing most work.
Steps: Count working ones and see if your guess stands.
Outcome: Play a simple Z or T test game.
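For the Dice Skull Hunt, it can help to know what the maths expects before rolling. The sketch below lists the chance of each skull count for 6 dice, assuming fair dice with one skull side each; the names are made up for this example.

```python
from math import comb

dice = 6
skull_chance = 1 / 6  # assumes fair dice with one skull side

# Chance of seeing exactly k skulls in one roll of all 6 dice.
chances = {
    k: comb(dice, k) * skull_chance ** k * (1 - skull_chance) ** (dice - k)
    for k in range(dice + 1)
}

# The skull count with the biggest chance.
most_likely = max(chances, key=chances.get)

for k, chance in chances.items():
    print(k, round(chance, 3))
print("Most likely skull count:", most_likely)
```

With only 10 real rolls, your own counts may not match these chances closely, and that gap is part of what the game explores.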

Additional Reading

Coursera, Statistical Inference
Link: coursera.org/learn/statistical-inference
Description: Free course from Johns Hopkins unpacking guess testing, active in 2025.

edX, Statistics Basics
Link: edx.org/learn/statistics
Description: Free lessons diving into number testing, current and kid-friendly.

Khan Academy, Significance Tests
Link: khanacademy.org/math/statistics-probability/significance-tests-one-sample
Description: Free, lively lessons on proving guesses, checked functional for 2025.
