1. Project & Data Context in this automating analytical workflow case
In this automating analytical workflow Thesis, a PhD researcher evaluated a school-based instructional program’s impact on knowledge (0–20 scale) and attitude (0–72 scale) among 120 children and 97 adolescents from urban and rural schools. Data were collected at pre-intervention, immediate post-intervention, and three follow-up timepoints to assess both efficacy and durability .
2. Software Ecosystem of this automating analytical workflow case
- R (Base + Custom Libraries)
- Statistical functions:
chisq.test()
,t.test()
,aov()
,cor.test()
- Custom scripts for assumption diagnostics and power/sample-size calculations
- Statistical functions:
- Microsoft Excel with VBA Macros
- “Masterchart” standardization and data normalization
- Automated dashboard generation (tables + charts + text abstracts)
- R Markdown & Git
- Reproducible reporting (
.Rmd
→ HTML/PDF) - Version control for all scripts and documentation .
- Reproducible reporting (
3. Data Ingestion & Masterchart Standardization
- Raw Import
- Pre-, post-, and follow-up CSV files loaded into R (
read.csv()
).
- Pre-, post-, and follow-up CSV files loaded into R (
- Schema Validation
validate_schema()
function checks column names, data types, and missing-value codes.
- Masterchart Macro
- VBA macro reformats data into a single “masterchart”: one row per participant, consistent columns, embedded codebook metadata—serving as the authoritative source for all analyses .
4. Automated Descriptive Dashboards
▶ R Script: descriptive_dashboards.R
df <- read.csv("masterchart.csv")
library(dplyr)
summary_stats <- df %>%
group_by(age_group) %>%
summarize(mean_knowledge = mean(knowledge_score),
sd_knowledge = sd(knowledge_score))
write.csv(summary_stats, "descriptive_summary.csv")
▶ Excel VBA:
- Loops over 30 pre-defined “headings” to create pivot tables and corresponding charts.
- Exports each dashboard to its own worksheet for review and integration .
5. Inferential Pipeline Automation
All inferential analyses are executed in inferential_pipeline.R
, producing ready-to-publish outputs:
Technique | R Function | Purpose |
---|---|---|
Chi-Squared Test | chisq.test(table(...)) | Baseline associations between demographics and knowledge/attitude |
Paired t-Test | t.test(..., paired=TRUE) | Pre- vs. post-intervention within-subject changes |
Repeated Measures ANOVA | aov(score ~ time + Error(id/time)) | Durability of gains across follow-ups |
Pearson Correlation | cor.test(knowledge, attitude) | Linear relationship between knowledge and attitude |
Independent t-Test | t.test(score ~ age_group) | Between-group comparison (children vs. adolescents) |
Each step writes CSV summaries, diagnostic plots (residuals, QQ-plots), and appends to a centralized results folder.
6. Reproducible Reporting & Audit
- R Markdown
- Parameterized templates generate interim reports with embedded code, tables, and narrative.
- Version Control
- Git feature branches (e.g.,
feature/paired_t_tests
) ensure peer-reviewed changes.
- Git feature branches (e.g.,
- Data Audit Report
- Final deliverable includes a Data Audit PDF detailing integrity checks, missing-data protocols, outlier handling, and IEC/IRB compliance .
7. Key Benefits & Lessons Learned from in this automating analytical workflow case
- Consistency & Efficiency: Automating data normalization and dashboard creation reduced manual effort by 70 %, ensuring uniform presentation across chapters.
- Modularity: Encapsulating each analysis in separate scripts facilitated debugging and reuse for future studies.
- Auditability: Version-controlled workflows and automated audit reports established a clear provenance for every result—enhancing credibility and easing peer review.
- Scalability: The same pipeline can process additional cohorts or variables with minimal code changes, supporting future extensions of the research.
This walkthrough under the “Software & Workflow Walkthroughs” cluster demonstrates how a robust, automated analytics pipeline—combining R scripting and Excel macros—streamlines complex inferential research in a quasi-experimental PhD study.
Want to explore more PhD-level case studies? Check out our Comprehensive Case Studies on PhD Statistical Analysis guide page.
Discover more from PhDStats Advisor
Subscribe to get the latest posts sent to your email.