Overview of an automated survey analytics case study
This automated survey analytics case study involves managing 948 survey responses on hybrid instruction motivations required a scalable and reproducible analytics pipeline. This walkthrough explains how Excel VBA, R scripting, and R Markdown combined to automate data ingestion, transformation, dashboarding, inferential testing, and final reporting.
Software Stack used in this automated survey analytics case study
- Excel 365 + VBA: Builds the master dataset and automates dashboards.
- R (v4.2+) + Key Packages: Uses
tidyverse
,simstudy
,MASS
, andknitr
for data processing and analysis. - R Markdown: Generates dynamic HTML and PDF reports with embedded code and visuals.
- Git (GitHub): Tracks code changes, collaboration, and continuous integration.
Workflow Steps
- Masterchart Generation (VBA Macro)
- The
BuildMasterchart()
macro readsdemographics.csv
andresponses.csv
. - Then it enforces column names, merges datasets, and handles missing values.
- Finally, it exports the unified
masterchart.csv
for downstream analysis.
- The
- Likert Transformation & Composite Scoring (
likert_pipeline.R
)
library(tidyverse)
df <- read_csv("masterchart.csv")
collapse <- function(x) case_when(x >= 4 ~ 3, x == 3 ~ 2, TRUE ~ 1)
df <- df %>%
mutate(across(Q1:Q10, collapse)) %>%
rowwise() %>%
mutate(
comp_usability = mean(c(Q1, Q4, Q7)),
comp_relevance = mean(c(Q2, Q5, Q8)),
comp_flex = mean(c(Q3, Q6, Q9))
) %>%
mutate(across(
starts_with("comp_"),
~ case_when(. > 2.33 ~ 3, . >= 1.67 ~ 2, TRUE ~ 1)
))
write_csv(df, "likert_masterchart.csv")
This script collapses raw Likert scores into three categories and computes composite subscale scores.
- Dashboard Automation (Excel VBA)
- A looping macro cycles through each question and each composite score.
- It creates pivot tables summarizing scores by
program_year
. - Next it generates clustered bar charts with uniform formatting.
- Finally it exports each chart to its own standardized worksheet tab.
- Inferential Testing (
inferential_pipeline.R
)- The script runs chi‑square tests for each demographic variable against
high_intent
. - Then it fits a logistic regression:
- Test statistics and p‑values are saved to CSV files.
- Diagnostic plots (e.g., mosaic plots, residual charts) are output to
/results/
.
- The script runs chi‑square tests for each demographic variable against
glm(high_intent ~ age + gender + year + access, family = binomial, data = df)
- Reporting (R Markdown)
- A parameterized
.Rmd
acceptsdata_path
andregion
as inputs. - It renders HTML and PDF reports with sections for executive summaries, methods, dashboards, inferential results, and appendices.
- Each run produces a self‑contained report suitable for web or print distribution.
- A parameterized
- Version Control & Continuous Integration
- All code and configuration live in a GitHub repository.
- GitHub Actions run a test suite on each push, validating schema and sample analytics.
- Successful builds deploy the updated HTML report to GitHub Pages automatically.
Benefits
- Efficiency: Automation cut manual processing time by 75%.
- Consistency: A single masterchart prevented data version drift.
- Reproducibility: Git commits, macro logs, and parameterized reports created a full audit trail.
This end‑to‑end pipeline empowered the researcher to focus on interpretation rather than manual data wrangling. Continuous integration and scripted workflows ensured accuracy, transparency, and scalability for hybrid instruction analytics.
Want to explore more PhD-level case studies? Check out our Comprehensive Case Studies on PhD Statistical Analysis guide page.
Discover more from PhDStats Advisor
Subscribe to get the latest posts sent to your email.