Case Study: Automating the Analytical Workflow for a Quasi-Experimental Educational Intervention Using R & Excel


1. Project & Data Context

In this thesis case study, a PhD researcher evaluated a school-based instructional program’s impact on knowledge (0–20 scale) and attitude (0–72 scale) among 120 children and 97 adolescents from urban and rural schools. Data were collected at pre-intervention, immediate post-intervention, and three follow-up timepoints to assess both efficacy and durability of the intervention’s effects.


2. Software Ecosystem

  • R (Base + Custom Libraries)
    • Statistical functions: chisq.test(), t.test(), aov(), cor.test()
    • Custom scripts for assumption diagnostics and power/sample-size calculations
  • Microsoft Excel with VBA Macros
    • “Masterchart” standardization and data normalization
    • Automated dashboard generation (tables + charts + text abstracts)
  • R Markdown & Git
    • Reproducible reporting (.Rmd → HTML/PDF)
    • Version control for all scripts and documentation.
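The custom power/sample-size scripts mentioned above are not shown in the source, but a minimal sketch is possible with base R’s `power.t.test()`. The effect size, alpha, and power below are illustrative assumptions, not values taken from the thesis:

```r
# Sketch of a sample-size check for the paired pre/post comparison.
# delta = 0.4 (a standardized effect, since sd = 1), alpha = 0.05, and
# power = 0.80 are illustrative assumptions, not thesis values.
pwr <- power.t.test(delta = 0.4, sd = 1,
                    sig.level = 0.05,
                    power = 0.80,
                    type = "paired")

n_required <- ceiling(pwr$n)  # participants needed for the paired analysis
```

With these inputs, roughly 50 paired observations are needed, comfortably within the study’s samples of 120 children and 97 adolescents.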

3. Data Ingestion & Masterchart Standardization

  1. Raw Import
    • Pre-, post-, and follow-up CSV files loaded into R (read.csv()).
  2. Schema Validation
    • validate_schema() function checks column names, data types, and missing-value codes.
  3. Masterchart Macro
    • VBA macro reformats data into a single “masterchart”: one row per participant, consistent columns, embedded codebook metadata—serving as the authoritative source for all analyses.
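The internals of the researcher’s `validate_schema()` function are not given in the source; the following is a hedged reconstruction of what such a check might look like. The expected column names, types, and missing-value codes are illustrative assumptions:

```r
# Hypothetical reconstruction of the custom schema check; the expected
# columns, types, and sentinel codes below are illustrative, not from
# the thesis.
validate_schema <- function(df,
                            expected_cols  = c("id", "age_group",
                                               "knowledge_score",
                                               "attitude_score"),
                            expected_types = c("character", "character",
                                               "numeric", "numeric"),
                            na_codes       = c(-99, 999)) {
  # 1. Column names present?
  missing_cols <- setdiff(expected_cols, names(df))
  if (length(missing_cols) > 0)
    stop("Missing columns: ", paste(missing_cols, collapse = ", "))

  # 2. Data types as expected?
  actual_types <- vapply(df[expected_cols], class, character(1))
  if (any(actual_types != expected_types))
    stop("Type mismatch in: ",
         paste(expected_cols[actual_types != expected_types],
               collapse = ", "))

  # 3. Recode sentinel missing-value codes to NA before analysis
  score_cols <- c("knowledge_score", "attitude_score")
  df[score_cols] <- lapply(df[score_cols],
                           function(x) replace(x, x %in% na_codes, NA))
  df
}
```

Returning the cleaned data frame (rather than just TRUE/FALSE) lets the validation step slot directly into the ingestion pipeline.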

4. Automated Descriptive Dashboards

R Script: descriptive_dashboards.R

      # Load the masterchart and summarize knowledge scores by age group
      library(dplyr)

      df <- read.csv("masterchart.csv")

      summary_stats <- df %>%
        group_by(age_group) %>%
        summarize(mean_knowledge = mean(knowledge_score, na.rm = TRUE),
                  sd_knowledge   = sd(knowledge_score, na.rm = TRUE))

      write.csv(summary_stats, "descriptive_summary.csv", row.names = FALSE)

Excel VBA:

  • Loops over 30 pre-defined “headings” to create pivot tables and corresponding charts.
  • Exports each dashboard to its own worksheet for review and integration.
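The VBA pivot-table loop itself is not reproduced in the source, but the same “one output per heading” pattern can be sketched in base R. The grouping columns below are hypothetical stand-ins for the thesis’s 30 pre-defined headings:

```r
# Base-R sketch of the VBA dashboard loop: one summary CSV per heading.
# The headings vector is illustrative; the thesis used 30 headings.
headings <- c("age_group", "school_location")

export_dashboards <- function(df, headings, out_dir = "dashboards") {
  dir.create(out_dir, showWarnings = FALSE)
  for (h in headings) {
    # Mean knowledge score within each level of the current heading
    means <- tapply(df$knowledge_score, df[[h]], mean, na.rm = TRUE)
    tab <- data.frame(group = names(means),
                      mean_knowledge = as.numeric(means))
    write.csv(tab, file.path(out_dir, paste0(h, "_summary.csv")),
              row.names = FALSE)
  }
  invisible(length(headings))
}
```

Each heading gets its own file, mirroring the one-worksheet-per-dashboard layout of the Excel version.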

5. Inferential Pipeline Automation

All inferential analyses are executed in inferential_pipeline.R, producing ready-to-publish outputs:

  • Chi-Squared Test (chisq.test(table(...))): baseline associations between demographics and knowledge/attitude
  • Paired t-Test (t.test(..., paired = TRUE)): pre- vs. post-intervention within-subject changes
  • Repeated-Measures ANOVA (aov(score ~ time + Error(id/time))): durability of gains across follow-ups
  • Pearson Correlation (cor.test(knowledge, attitude)): linear relationship between knowledge and attitude
  • Independent t-Test (t.test(score ~ age_group)): between-group comparison (children vs. adolescents)

Each step writes CSV summaries, diagnostic plots (residuals, QQ-plots), and appends to a centralized results folder.
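A condensed sketch of how one pipeline step (the paired pre/post t-test) might write its CSV summary and diagnostic plot. The column names `pre_knowledge`/`post_knowledge` and the output paths are assumptions, not details from the thesis:

```r
# Sketch of one inferential_pipeline.R step, assuming masterchart columns
# pre_knowledge and post_knowledge; file paths are illustrative.
run_paired_step <- function(df, out_dir = "results") {
  dir.create(out_dir, showWarnings = FALSE)

  tt <- t.test(df$post_knowledge, df$pre_knowledge, paired = TRUE)

  # One-row, ready-to-publish summary appended to the results folder
  summary_row <- data.frame(test     = "paired t-test",
                            estimate = unname(tt$estimate),  # mean difference
                            t        = unname(tt$statistic),
                            df       = unname(tt$parameter),
                            p_value  = tt$p.value)
  write.csv(summary_row, file.path(out_dir, "paired_t_knowledge.csv"),
            row.names = FALSE)

  # Diagnostic QQ-plot of the paired differences
  diffs <- df$post_knowledge - df$pre_knowledge
  pdf(file.path(out_dir, "paired_t_knowledge_qq.pdf"))
  qqnorm(diffs); qqline(diffs)
  dev.off()

  summary_row
}
```

Keeping each test in a function of this shape is what makes the pipeline modular: adding the attitude scale or another timepoint is a one-line call, not a new script.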

6. Reproducible Reporting & Audit

  • R Markdown
    • Parameterized templates generate interim reports with embedded code, tables, and narrative.
  • Version Control
    • Git feature branches (e.g., feature/paired_t_tests) ensure peer-reviewed changes.
  • Data Audit Report
    • Final deliverable includes a Data Audit PDF detailing integrity checks, missing-data protocols, outlier handling, and IEC/IRB compliance.
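The parameterized reporting step might be invoked as follows; the template filename and parameter names are assumptions, and rendering requires the rmarkdown package plus pandoc, so this is a configuration sketch rather than a runnable snippet:

```r
# Hypothetical invocation of a parameterized R Markdown template; the file
# interim_report.Rmd and its params are illustrative, not from the thesis.
library(rmarkdown)

render("interim_report.Rmd",
       output_format = "html_document",
       output_file   = "interim_report_children.html",
       params        = list(age_group = "children",
                            data_file = "masterchart.csv"))
```

Running the same template once per cohort (children, adolescents) is what keeps the interim reports uniform across chapters.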

7. Key Benefits & Lessons Learned

  • Consistency & Efficiency: Automating data normalization and dashboard creation reduced manual effort by 70%, ensuring uniform presentation across chapters.
  • Modularity: Encapsulating each analysis in separate scripts facilitated debugging and reuse for future studies.
  • Auditability: Version-controlled workflows and automated audit reports established a clear provenance for every result—enhancing credibility and easing peer review.
  • Scalability: The same pipeline can process additional cohorts or variables with minimal code changes, supporting future extensions of the research.

This walkthrough under the “Software & Workflow Walkthroughs” cluster demonstrates how a robust, automated analytics pipeline—combining R scripting and Excel macros—streamlines complex inferential research in a quasi-experimental PhD study.


Want to explore more PhD-level case studies? Check out our Comprehensive Case Studies on PhD Statistical Analysis guide page.

