What are key statistical concepts? / Dr Wajid Khan, PhD

Statistics is the foundation of data science, helping us make sense of numbers and patterns in everyday life—whether it’s analysing exam results or predicting weather trends. This guide explores key concepts every beginner needs: types of variables (ordinal and categorical), measures of central tendency (mean, median, mode), measures of variability, random variables, and frequency tables. Designed for students new to data analysis, it uses simple language and real-world examples to explain how these ideas work and why they matter.

The goal is to build your confidence in handling data, whether you’re studying, researching, or solving problems. We’ll cover each concept step-by-step, showing you how to use them and when they’re helpful. Tools like Python can make the maths easier, but the focus here is on understanding the basics first. By the end, you’ll see how these tools connect to bigger ideas in data science, preparing you for more advanced study.

What Are These Concepts?

These statistical ideas are the building blocks for working with data. Variables tell us what kind of information we’re dealing with—categorical (like colours) or ordinal (like rankings). Central tendency (mean, median, mode) finds the “typical” value in a set of numbers. Variability shows how spread out the data is, random variables deal with chance, and frequency tables count things up neatly. Together, they help us describe and understand data clearly.

Why Do They Matter?

Statistics isn’t just numbers—it’s about finding answers. These concepts let you spot trends (like most popular study habits), measure consistency (like test score differences), and predict outcomes (like rain chances). They’re used everywhere—teachers check student progress, businesses track sales, and scientists test theories. Learning them gives you skills to think critically and make smart choices based on evidence.

Detailed Breakdown

1. Types of Variables

Variables are the bits of data we measure or count. They come in different types, and knowing which is which helps you pick the right analysis.

Categorical Variables
These are groups or labels with no order—like hair colour (brown, blonde, black) or pet type (dog, cat, bird). They’re great for counting how many times something shows up. For example, a teacher might ask a class their favourite subject (maths, English, science) and count the votes. You can’t add or rank these, but you can see what’s most common.
Ordinal Variables
These have an order but no fixed gaps between steps, like school grades (A, B, C) or effort levels (low, medium, high). They’re useful for ranking things. Imagine rating a film: 1 (bad), 2 (okay), 3 (good). The order matters, but the difference between “bad” and “okay” isn’t necessarily the same as the difference between “okay” and “good”. They help compare things without needing precise measurements.
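
As a rough sketch of the difference, the Python below counts a categorical variable and sorts an ordinal one. The votes and effort levels are made-up illustration data, and the names (favourite_subject, effort_order) are only for this example.

```python
from collections import Counter

# Categorical: labels with no order, so the natural summary is a count.
# Made-up votes for illustration.
favourite_subject = ["maths", "science", "english", "maths", "science", "maths"]
subject_counts = Counter(favourite_subject)
most_common_subject = subject_counts.most_common(1)[0][0]

# Ordinal: labels with an order, which we have to spell out ourselves.
effort_order = ["low", "medium", "high"]
effort_levels = ["medium", "high", "low", "medium", "high"]  # made-up data
sorted_efforts = sorted(effort_levels, key=effort_order.index)

# Because the order is meaningful, the middle of the sorted list is a
# sensible "typical" level. An average would not be, since the gaps
# between levels are not fixed.
median_effort = sorted_efforts[len(sorted_efforts) // 2]

print(subject_counts)
print(most_common_subject)   # maths
print(sorted_efforts)        # ['low', 'medium', 'medium', 'high', 'high']
print(median_effort)         # medium
```

Notice that the categorical data is only ever counted, while the ordinal data can also be sorted.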

2. Measures of Central Tendency

This is about finding the “middle” or “typical” value in your data. There are three main ways to do it, each with its own strengths.

Mean
The mean is the average: add everything up and divide by how many items there are. For test scores of 50, 60, 70, and 80, it’s (50 + 60 + 70 + 80) ÷ 4 = 65. It works well when data is balanced, like averaging daily temperatures. But if one extra score is way off (say, a fifth score of 200), the mean jumps to (260 + 200) ÷ 5 = 92, which might not feel “typical”.
Median
The median is the middle number when you line up your data from smallest to largest. For 50, 60, 70, 80, 90, it’s 70. If there’s an even number of values, like 50, 60, 70, 80, take the two middle ones (60 and 70) and average them (65). It’s better than the mean when data has outliers, like house prices (£100k, £120k, £150k, £1m): the median (£135k) is barely affected by the £1m mansion, while the mean is dragged up to £342.5k.
Mode
The mode is the value that pops up most often. In 50, 60, 60, 70, it’s 60. It’s brilliant for categorical data—like finding the most popular ice cream flavour (vanilla beats chocolate and strawberry). Sometimes there’s no mode (all values appear once) or two modes (bimodal).
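
The three measures can be checked with Python’s built-in statistics module. This is a minimal sketch using the numbers from this section; the outlier case assumes 200 is added as a fifth score.

```python
import statistics

scores = [50, 60, 70, 80]
mean_score = statistics.mean(scores)        # (50 + 60 + 70 + 80) / 4
median_score = statistics.median(scores)    # average of 60 and 70

# Add one extreme score and compare how each measure reacts.
with_outlier = scores + [200]
mean_with_outlier = statistics.mean(with_outlier)
median_with_outlier = statistics.median(with_outlier)

# House prices in pounds: the median stays close to the typical house.
house_prices = [100_000, 120_000, 150_000, 1_000_000]
median_price = statistics.median(house_prices)

# Mode: the most frequent value. multimode lists every tied value.
mode_score = statistics.mode([50, 60, 60, 70])
two_modes = statistics.multimode([50, 50, 60, 60])

print(mean_score, median_score)                # 65 65.0
print(mean_with_outlier, median_with_outlier)  # 92 70
print(median_price)                            # 135000.0
print(mode_score, two_modes)                   # 60 [50, 60]
```

The outlier moves the mean a long way but shifts the median by only a few points, which is why the median is the safer choice for lopsided data.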

3. Measures of Variability

Variability tells us how much data spreads out—are the numbers close together or all over the place?

Range
Subtract the smallest value from the biggest. For 50, 60, 70, 80, the range is 80 - 50 = 30. It’s quick and easy, like checking the spread of ages in a group (18 to 45 = 27 years). But it only looks at the extremes, so it misses what’s happening in between.
Variance
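Variance measures how far each number is from the mean: square each difference, then average the squares. For 50, 60, 70: mean = 60, squared differences are (50-60)² = 100, (60-60)² = 0, (70-60)² = 100. Variance = (100 + 0 + 100) ÷ 3 = 66.67. This divides by the number of values (the population variance); when the data is a sample from a larger group, it’s common to divide by one less (here 2, giving 100). It’s useful in science to check consistency, but the squaring makes it tricky to picture.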
Standard Deviation
This is the square root of variance, in this case √66.67 ≈ 8.16. It’s in the same units as your data, so it’s easier to understand. For test scores, a standard deviation of 8.16 means scores typically sit about 8 points from the mean of 60. It’s key for comparing spread, like seeing if one class’s marks vary more than another’s.
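
Here is a small sketch of the same calculations in Python. It uses the population versions (pvariance, pstdev), which divide by the number of values, and shows the sample version (variance), which divides by one less, for comparison.

```python
import statistics

# Range: biggest value minus smallest value.
range_scores = [50, 60, 70, 80]
score_range = max(range_scores) - min(range_scores)

scores = [50, 60, 70]

# Population variance: average of the squared distances from the mean.
pop_variance = statistics.pvariance(scores)      # 200 / 3

# Sample variance: same sum of squares, divided by n - 1 instead of n.
sample_variance = statistics.variance(scores)    # 200 / 2

# Standard deviation: square root of the population variance.
pop_std_dev = statistics.pstdev(scores)

print(score_range)                # 30
print(round(pop_variance, 2))     # 66.67
print(sample_variance)            # 100
print(round(pop_std_dev, 2))      # 8.16
```

Which version to use depends on whether your numbers are the whole group or just a sample of it; for a first look at a small dataset, either gives a feel for the spread.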

4. Random Variables

A random variable is a number tied to something unpredictable—like rolling a dice or measuring rain. They come in two types:

Discrete: Things you can count, like the number of goals in a football match (0, 1, 2, 3). You might use this to guess how many students pass an exam.
Continuous: Things you measure, like time to finish homework (2.5 hours, 3.7 hours). It’s used for stuff like predicting someone’s height.

Random variables help us model chance, like the odds of rain tomorrow.
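
To get a feel for both types, the sketch below simulates a discrete random variable (a fair six-sided die) and a continuous one (homework time). The homework model is purely an assumption for illustration: it treats any time between 2 and 4 hours as equally likely, which real homework times may not follow.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is repeatable

# Discrete random variable: the face shown by a fair six-sided die.
faces = [1, 2, 3, 4, 5, 6]
expected_roll = sum(faces) / len(faces)          # long-run average, 3.5
rolls = [rng.choice(faces) for _ in range(10_000)]
average_roll = statistics.fmean(rolls)           # should land near 3.5

# Continuous random variable: homework time in hours.
# Assumption: times are spread evenly between 2 and 4 hours.
homework_hours = [rng.uniform(2.0, 4.0) for _ in range(10_000)]
share_over_3_hours = sum(h > 3.0 for h in homework_hours) / len(homework_hours)

print(expected_roll)
print(round(average_roll, 2))
print(round(share_over_3_hours, 2))
```

The die can only land on six values, while the homework time can take any value in its range, so for continuous variables we ask about ranges (“more than 3 hours”) rather than exact values.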

5. Frequency Table

A frequency table lists how often each value appears. For survey answers—Yes (4), No (3), Maybe (2)—it looks like:

Yes: 4
No: 3
Maybe: 2
It’s a simple way to summarise data, like counting how many people prefer tea over coffee. You can turn it into a bar chart for a quick visual.
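
If the raw answers are in a list, Python’s Counter can build the table for you. This is a minimal sketch; the answers list simply recreates the survey counts above.

```python
from collections import Counter

# Raw survey answers, recreated from the counts in the example.
answers = ["Yes"] * 4 + ["No"] * 3 + ["Maybe"] * 2

frequency = Counter(answers)
total = sum(frequency.values())

# Print each answer with its count and its share of all answers.
for answer, count in frequency.most_common():
    print(f"{answer:<6} {count}  ({count / total:.0%})")

most_common_answer = frequency.most_common(1)[0][0]
print(most_common_answer)   # Yes
```

The most common row in a frequency table is also the mode, so the table gives you that for free.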

6. Extra Useful Ideas

Skewness: Shows if data leans one way. Test scores of 50, 55, 60, 95 have a “positive skew” (long tail on the right). It helps decide if the mean or median is better.
Percentiles: Where a value sits in the order. The 25th percentile is the score that 25% of people fall below. For 50, 60, 70, 80, it’s around 55, though the exact figure depends on the calculation method used. Good for ranking students.
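
Both ideas can be sketched in Python. The skewness helper below is a hypothetical function written for this example (it uses one common definition, the average cubed distance from the mean in standard deviation units). The quartiles come from statistics.quantiles, whose two built-in methods give slightly different answers for small datasets.

```python
import statistics

def skewness(values):
    """Population skewness: the average of the cubed z-scores."""
    mean = statistics.fmean(values)
    std_dev = statistics.pstdev(values)
    return sum(((x - mean) / std_dev) ** 3 for x in values) / len(values)

# A positive result means a longer tail on the right.
skew = skewness([50, 55, 60, 95])

# Quartiles split the data into four parts; the first one is the 25th percentile.
scores = [50, 60, 70, 80]
inclusive_quartiles = statistics.quantiles(scores, n=4, method="inclusive")
exclusive_quartiles = statistics.quantiles(scores, n=4, method="exclusive")

print(round(skew, 2))          # 1.02
print(inclusive_quartiles)     # [57.5, 65.0, 72.5]
print(exclusive_quartiles)     # [52.5, 65.0, 77.5]
```

With only four scores, the 25th percentile comes out anywhere from 52.5 to 57.5 depending on the method, so don’t worry if two tools disagree slightly.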

Practical Application

Let’s see these ideas in action with a class of 10 students’ weekly study hours: 5, 10, 10, 15, 20, 25, 30, 35, 40, 50. The mean is 240 ÷ 10 = 24 hours, but the largest value (50) pulls it up. The median is 22.5 (average of 20 and 25), closer to most students’ efforts, and the mode is 10 (two students). Categorical variables like study location (home, library, café) could show preferences: say, 6 pick home. Ordinal variables like focus level (low, medium, high) might rank effort.

Variability tells us more: range = 50 - 5 = 45, variance = 1940 ÷ 10 = 194 (dividing by the number of values), standard deviation = √194 ≈ 13.93, showing a wide spread. A frequency table (5:1, 10:2, 15:1, etc.) organises it neatly. Random variables could predict the chance of studying over 30 hours: maybe 30% based on this, since 3 of the 10 students do.

Paul et al. (2020) argue, “these tools let beginners break down complex data into clear insights, linking theory to real decisions.” A teacher could use this to spot who needs help (low hours) or praise consistency (tight spread), showing how stats turn numbers into action.
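
The whole study-hours summary can be reproduced in a few lines of Python. This sketch uses the population formulas (dividing by the number of values), as in the variance section earlier, and treats the observed share of students over 30 hours as a rough estimate of the chance.

```python
import statistics
from collections import Counter

study_hours = [5, 10, 10, 15, 20, 25, 30, 35, 40, 50]

summary = {
    "mean": statistics.mean(study_hours),             # 240 / 10
    "median": statistics.median(study_hours),         # average of 20 and 25
    "mode": statistics.mode(study_hours),             # most frequent value
    "range": max(study_hours) - min(study_hours),
    "variance": statistics.pvariance(study_hours),    # 1940 / 10
    "std_dev": statistics.pstdev(study_hours),
}

# Frequency table: how many students reported each number of hours.
frequency = Counter(study_hours)

# Share of students studying more than 30 hours (35, 40 and 50).
share_over_30 = sum(h > 30 for h in study_hours) / len(study_hours)

for name, value in summary.items():
    print(f"{name:<9} {value:.2f}")
print(dict(frequency))
print(share_over_30)   # 0.3
```

Running the numbers this way is a handy check on hand calculations, especially for the variance, where a single slip in the squared differences changes the answer.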

How to Use Them

When: Use the mean for roughly symmetric data, median for skewed sets, mode for categories. Check variability to see differences, random variables for predictions, and tables for summaries.
How: Write down your data, calculate step-by-step (or use Python), and think about what the numbers say. For example, plot study hours in a histogram to see the shape.
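
As one possible way to see the shape without any plotting library, the sketch below prints a simple text histogram of the study hours. The bin width of 10 hours is an assumption chosen for this example; a charting tool would give a nicer picture.

```python
from collections import Counter

study_hours = [5, 10, 10, 15, 20, 25, 30, 35, 40, 50]
bin_width = 10  # assumed bin width, in hours

# Assign each value to the start of its bin: 0, 10, 20, ...
bins = Counter((h // bin_width) * bin_width for h in study_hours)

# Build one line per bin, with one '#' per student in that bin.
lines = []
for start in range(0, max(study_hours) + bin_width, bin_width):
    label = f"{start:>2}-{start + bin_width - 1:<2}"
    lines.append(f"{label} | {'#' * bins[start]}")

print("\n".join(lines))
```

Most of the marks sit in the lower bins with a thin tail stretching up to 50, which matches the wide spread found earlier.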

Conclusion

These statistical concepts are your starting point in data science. Categorical and ordinal variables sort your data, central tendency finds its heart, variability shows its range, random variables handle uncertainty, and frequency tables tidy it up. With practice, you’ll spot patterns and make sense of any dataset—whether it’s grades, surveys, or sales. They’re simple but powerful, setting you up to dig deeper into stats and solve real problems.

References

Paul, R., Smith, J., & Taylor, K. (2020). Foundations of Statistical Thinking.

Books to Explore

Paul, R., Smith, J., & Taylor, K. (2020). Foundations of Statistical Thinking. Easy-to-follow intro to stats.
Lane, D. (2018). Introduction to Statistics. Free online book with clear examples.

Try These Activities

Calculate Your Own
Ask 5 friends how many hours they sleep. Find the mean, median, mode, and range. What do they tell you?
Make a Table
Survey 10 people on their favourite drink (tea, coffee, juice). Build a frequency table and find the mode.
Spot the Spread
Use test scores (your own or made-up) to work out variance and standard deviation. Are they close or far apart?

Extra Resources

W3Schools - Python Basics
Link: w3schools.com
Description: Learn coding to crunch numbers.
