What is Statistical Analysis In Data Science?

Dr Wajid Khan
Feb 21, 2025 · 8 mins read

Data drives decision-making in countless fields, from science to business, and statistical methods offer a structured approach to uncovering its meaning. Statistical methods involve techniques to gather, summarise, and evaluate data, revealing patterns and insights crucial for solving problems. Data analytics builds on such methods, applying them to interpret information, often using computational tools like Python.

This article is grounded in the learning outcomes of TU London’s Statistical Methods for Data Analytics module. Key skills include critically evaluating strategic challenges, selecting valid statistical approaches, interpreting outcomes, appraising method reliability, and understanding ethical implications. The sections below cover foundational concepts, qualitative and quantitative techniques, and their applications, and are tailored for students new to the subject.

Understanding statistical methods equips learners to handle data confidently and builds skills in research, critical thinking, and professional responsibility. The content is designed to develop personal, transferable, cognitive, and ethical competencies, preparing students to analyse data meaningfully in academic and real-world contexts.

Definition

Statistical methods encompass systematic processes for collecting, organising, and interpreting data to draw reliable conclusions. Such approaches span descriptive statistics (e.g., calculating averages), inferential statistics (e.g., testing hypotheses), and visualisation (e.g., plotting graphs). Data analytics integrates these techniques to extract insights, frequently employing programming environments like Python or Google Colab. According to Freedman et al. (2007), “statistics provides a framework to address variability and uncertainty in data, enabling informed decisions.”

Data splits into qualitative (descriptive, non-numerical) and quantitative (numerical, measurable) forms, each requiring distinct handling. The module emphasises critical analysis, ethical considerations, and method selection, ensuring students grasp both theory and practice.

Types of Data

Data categorisation guides analytical approaches, shaping the tools and techniques applied. Key classifications emerge from the module’s focus:

Based on Nature

  • Qualitative Data: Non-numerical information capturing qualities, opinions, or behaviours (e.g., survey responses on game enjoyment). It offers depth and context, often gathered through interviews or focus groups.
  • Quantitative Data: Numerical values suited to measurement and computation (e.g., hours spent studying). It provides objectivity and precision for statistical evaluation.

Based on Collection Method

  • Primary Data: Collected directly from sources via surveys, observations, or experiments (e.g., student performance records).
  • Secondary Data: Derived from existing studies or archives (e.g., published research data).

Based on Scale

  • Nominal Data: Unordered categories (e.g., course subjects: maths, history).
  • Ordinal Data: Ordered categories lacking equal intervals (e.g., satisfaction levels: poor, fair, good).
  • Interval Data: Numerical values with equal intervals but no true zero (e.g., temperature in Celsius).
  • Ratio Data: Numerical values with equal intervals and a true zero (e.g., distance in metres).

Recognising data types ensures appropriate method selection, a core skill in the module’s outcomes.
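
As a minimal sketch of how the scale of the data limits which summaries make sense, the Python snippet below uses the mode for nominal data and the median for ordinal data. The responses are invented for illustration, and the variable names are assumptions rather than part of the module material.

```python
import statistics

# Nominal data: unordered categories, so only the mode is meaningful.
# (Hypothetical responses, for illustration only.)
subjects = ["maths", "history", "maths", "art", "maths"]
most_common_subject = statistics.mode(subjects)

# Ordinal data: categories have an order but no equal intervals,
# so the median is meaningful while the mean is not.
levels = ["poor", "fair", "good"]  # lowest to highest
responses = ["good", "fair", "poor", "good", "fair", "good", "fair"]

# Convert each response to its rank, take the middle rank, convert back.
ranks = [levels.index(response) for response in responses]
median_level = levels[statistics.median_low(ranks)]

print("Most common subject:", most_common_subject)
print("Median satisfaction:", median_level)
```

Interval and ratio data additionally support means and standard deviations, because differences between values are meaningful on those scales.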

Statistical Methods in Analytics

Statistical methods divide into descriptive and inferential domains, each addressing specific analytical needs outlined in the module.

Descriptive Statistics

Descriptive statistics condense data into summaries:

  • Central Tendency: Mean (arithmetic average), median (middle value), and mode (most common value) pinpoint the data’s core.
  • Dispersion: Variance (average squared difference from mean) and standard deviation (variance’s square root) measure spread.
  • Visualisation: Histograms (frequency bars), scatter plots (point patterns), and boxplots (quartiles and outliers) illustrate distributions.

Such techniques help students present data clearly, supporting the module outcome of critically evaluating and presenting challenges.
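
The measures of central tendency and dispersion listed above can be computed with Python's built-in statistics module. The following is a small sketch on an invented sample of weekly study hours; the data and variable names are assumptions made for illustration.

```python
import statistics

# Hypothetical sample: hours studied per week by ten students.
hours = [4, 6, 6, 7, 8, 9, 10, 12, 6, 12]

mean_hours = statistics.mean(hours)      # arithmetic average
median_hours = statistics.median(hours)  # middle value of the sorted data
mode_hours = statistics.mode(hours)      # most common value

# Population variance: average squared difference from the mean.
variance_hours = statistics.pvariance(hours)
# Standard deviation: square root of the variance.
std_dev_hours = statistics.pstdev(hours)

print(f"mean={mean_hours}, median={median_hours}, mode={mode_hours}")
print(f"variance={variance_hours:.2f}, standard deviation={std_dev_hours:.2f}")
```

When the data are a sample drawn from a larger population, statistics.variance and statistics.stdev (which divide by n - 1 rather than n) are usually the more appropriate choice.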

Inferential Statistics

Inferential statistics extend beyond observed data:

  • Probability: Quantifies event likelihood (e.g., success rate of a study method).
  • Hypothesis Testing: Assesses claims via T-tests (mean comparisons) or Chi-squared tests (categorical relationships). Moore et al. (2016) note, “hypothesis testing measures evidence against a null hypothesis.”
  • Regression: Models variable relationships (e.g., predicting grades from attendance).
  • ANOVA: Compares multiple group means (e.g., test scores across classes).

These methods enable pattern identification and significance assessment, key cognitive skills in the module.
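
To make the idea of a mean comparison concrete, here is a hedged sketch of a two-sample t-statistic. It uses Welch's version, which does not assume equal variances; that is a choice made for this example, not something the module prescribes. The function name welch_t and the test scores are hypothetical.

```python
import math
import statistics

def welch_t(sample_a, sample_b):
    """Return Welch's t-statistic and its approximate degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Sample variances (n - 1 denominator), each divided by its sample size.
    var_a = statistics.variance(sample_a) / n_a
    var_b = statistics.variance(sample_b) / n_b
    mean_diff = statistics.mean(sample_a) - statistics.mean(sample_b)
    t_stat = mean_diff / math.sqrt(var_a + var_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (var_a + var_b) ** 2 / (var_a ** 2 / (n_a - 1) + var_b ** 2 / (n_b - 1))
    return t_stat, df

# Hypothetical test scores from two classes.
class_a = [62, 70, 68, 75, 66]
class_b = [74, 80, 77, 85, 79]

t_stat, df = welch_t(class_a, class_b)
print(f"t = {t_stat:.2f}, df = {df:.1f}")
```

The t-statistic is then compared against a t-distribution with df degrees of freedom, using a printed table or a statistics library, to obtain a p-value. A large absolute value of t counts as evidence against the null hypothesis that the two group means are equal.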

Qualitative Analysis

Qualitative analysis explores non-numerical data, uncovering themes and meanings vital to understanding human perspectives.

Purpose and Approach

Qualitative research delves into subjective experiences, addressing “why” and “how” questions. It adapts as data emerges, often lacking initial hypotheses. Relevant to computing areas like gamification or cybercrime, it employs:

  • Thematic Analysis: Detects recurring ideas (e.g., student study motivations).
  • Discourse Analysis: Studies language use.
  • Narrative Analysis: Examines story structures.

The module highlights thematic analysis for its accessibility and rigour.

Thematic Analysis Process

Thematic analysis follows a structured sequence:

  1. Familiarisation: Reviewing data (e.g., interview responses) to grasp content.
  2. Initial Coding: Noting patterns or concepts.
  3. Theme Development: Grouping codes into themes (e.g., “time management”).
  4. Review: Checking themes against data.
  5. Definition: Naming and detailing themes.
  6. Reporting: Summarising findings with examples.

For larger datasets, themes are tracked across sources to assess their prevalence and the links between them, which supports critical appraisal of method validity.
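
Once excerpts have been coded by hand, theme prevalence across sources can be tallied with a few lines of Python. The sketch below assumes the coding and the code-to-theme mapping have already been done by the researcher; the excerpts, codes, and themes are all invented for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical output of initial coding: (source, code) pairs.
coded_excerpts = [
    ("interview_1", "planning ahead"),
    ("interview_1", "phone distractions"),
    ("interview_2", "planning ahead"),
    ("interview_2", "study groups"),
    ("interview_3", "deadlines"),
    ("interview_3", "phone distractions"),
]

# Hypothetical theme development step: each code is grouped under a theme.
code_to_theme = {
    "planning ahead": "time management",
    "deadlines": "time management",
    "phone distractions": "focus",
    "study groups": "peer support",
}

theme_mentions = Counter()        # how often each theme appears
theme_sources = defaultdict(set)  # which sources mention each theme

for source, code in coded_excerpts:
    theme = code_to_theme[code]
    theme_mentions[theme] += 1
    theme_sources[theme].add(source)

for theme, count in theme_mentions.most_common():
    print(f"{theme}: {count} mentions across {len(theme_sources[theme])} sources")
```

Counting only supports the review step. Deciding which codes belong together, and whether a theme is meaningful, remains a matter of interpretation.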

Quantitative Analysis

Quantitative analysis applies numerical techniques to test ideas and quantify results, supporting the module outcome of interpreting statistical results.

Key Techniques

  • Correlation: Gauges variable relationships (e.g., study hours and marks).
  • T-tests: Compare two group means (e.g., test scores by gender).
  • Chi-squared: Tests categorical associations (e.g., subject preference by year).
  • Regression: Predicts outcomes from inputs (e.g., sales from marketing spend).
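
As an illustration of correlation and simple linear regression, the sketch below computes Pearson's r and a least-squares line from first principles. The study-hours data and the function names pearson_r and least_squares are assumptions made for this example.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)

def least_squares(x, y):
    """Slope and intercept of the line y = slope * x + intercept."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    slope = s_xy / s_xx
    return slope, mean_y - slope * mean_x

# Hypothetical data: weekly study hours and exam marks for five students.
study_hours = [2, 4, 6, 8, 10]
marks = [50, 58, 63, 72, 77]

r = pearson_r(study_hours, marks)
slope, intercept = least_squares(study_hours, marks)
predicted_mark = slope * 7 + intercept  # prediction for 7 hours of study

print(f"r = {r:.3f}")
print(f"marks = {slope:.2f} * hours + {intercept:.2f}")
print(f"predicted mark for 7 hours: {predicted_mark:.1f}")
```

A value of r close to 1 indicates a strong positive linear relationship in this sample. It does not show that extra study hours cause higher marks.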

Visualisation

Graphs like scatter plots (relationships) and histograms (distributions) clarify findings, supporting data presentation skills.

Computational Tools

Python, paired with Google Colab, facilitates computation and plotting. Students calculate statistics, run tests, and visualise data using basic code. Alternatives like R or NVivo (for qualitative coding) offer flexibility, though Python’s simplicity suits novices, enhancing transferable skills.

Importance of Statistical Methods

Statistical methods hold immense value:

  • Pattern Detection: Reveals trends (e.g., learning preferences).
  • Decision-Making: Guides strategies in education or research.
  • Ethical Awareness: Highlights data’s societal impact, per module outcome 5.

Combining qualitative and quantitative approaches maximises understanding, as Creswell (2014) states, “mixed methods harness complementary strengths.”

Applying Statistical Methods

Practical application involves:

  • Data Collection: Choosing methods (e.g., surveys for quantitative, interviews for qualitative).
  • Analysis: Employing descriptive stats, inferential tests, or thematic coding.
  • Evaluation: Judging method reliability and impact, per module outcomes.

Such steps develop judgement in method selection and responsibility for decisions.

Conclusion

Statistical methods anchor data analytics, empowering students to interpret qualitative and quantitative data adeptly. Techniques spanning probability, regression, and thematic analysis build a versatile skillset. Aligned to module outcomes, mastery fosters critical evaluation, pattern recognition, method appraisal, and ethical understanding, preparing learners to transform data into knowledge across disciplines.


References

  1. Freedman, D., Pisani, R., & Purves, R. (2007). Statistics.
  2. Moore, D. S., McCabe, G. P., & Craig, B. A. (2016). Introduction to the Practice of Statistics.
  3. Creswell, J. W. (2014). Research Design: Qualitative, Quantitative, and Mixed Methods Approaches.
  4. Braun, V., & Clarke, V. (2006). Using Thematic Analysis in Psychology.
  5. Montgomery, D. C. (2013). Design and Analysis of Experiments.

Books

  1. Freedman, D., Pisani, R., & Purves, R. (2007). Statistics. Introduces statistical foundations clearly.
  2. Moore, D. S., McCabe, G. P., & Craig, B. A. (2016). Introduction to the Practice of Statistics. Explains descriptive and inferential methods.
  3. Creswell, J. W. (2014). Research Design. Covers qualitative and quantitative approaches.

Activities for Students

Below are five practical activities to apply statistical methods, reinforcing module outcomes:

  1. Survey Design and Analysis
    • Task: Create a short survey (5 questions) on a topic (e.g., study habits). Collect responses from 10 peers (quantitative: hours studied; qualitative: preferred methods).
    • Steps: Calculate mean and median hours, then code qualitative responses into themes (e.g., “focus techniques”).
    • Outcome: Practises data collection, descriptive stats, and thematic analysis (Outcomes 2, 3).
  2. Plotting Practice
    • Task: Gather daily temperatures for a week (e.g., from a weather site).
    • Steps: Use Google Colab to plot a histogram and a scatter plot of temperatures. Note patterns (e.g., spread).
    • Outcome: Builds visualisation skills and critical presentation (Outcomes 1, 3).
  3. Hypothesis Testing Game
    • Task: Hypothesise that flipping a coin 20 times yields equal numbers of heads and tails.
    • Steps: Record results, apply a Chi-squared test manually or via Python, and interpret significance.
    • Outcome: Enhances inferential skills and method justification (Outcomes 2, 3).
  4. Interview Coding Challenge
    • Task: Interview 3 friends about a game they enjoy, recording answers.
    • Steps: Perform thematic analysis, identifying 2-3 themes (e.g., “graphics quality”). Justify choices in a short paragraph.
    • Outcome: Develops qualitative analysis and reliability appraisal (Outcomes 3, 4).
  5. Ethical Reflection
    • Task: Analyse a dataset (e.g., public social media stats) and write 200 words on potential misuse (e.g., privacy risks).
    • Steps: Consider societal impact and suggest ethical guidelines.
    • Outcome: Fosters ethical understanding in data use (Outcome 5).
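
For the hypothesis testing activity (activity 3), the Chi-squared calculation might be sketched in Python as follows. The function name chi_squared_coin and the example counts are hypothetical. The p-value uses the fact that, with one degree of freedom, the Chi-squared tail probability equals erfc(sqrt(chi2 / 2)), so no extra library is needed.

```python
import math

def chi_squared_coin(heads, tails):
    """Chi-squared goodness-of-fit test against a fair coin (1 degree of freedom)."""
    total = heads + tails
    expected = total / 2  # a fair coin predicts equal heads and tails
    chi2 = ((heads - expected) ** 2 + (tails - expected) ** 2) / expected
    # Tail probability of the Chi-squared distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical result of 20 flips.
chi2, p_value = chi_squared_coin(heads=14, tails=6)
print(f"chi-squared = {chi2:.2f}, p = {p_value:.3f}")

if p_value < 0.05:
    print("Evidence against a fair coin at the 5% level.")
else:
    print("No significant evidence against a fair coin at the 5% level.")
```

With only 20 flips the test has limited power, so a non-significant result does not prove the coin is fair.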

Additional Reading