Skip to content
Go back

ggplot From Zero 08: A Full Workflow From Raw Table to Final Figure

On this page15 sections
Cover image for ggplot From Zero 08: A Full Workflow From Raw Table to Final Figure

Who This Is For

This article is for readers who want to see what a complete beginner-friendly ggplot workflow looks like from start to finish. Instead of learning one isolated geom, you will move from a raw table to a final publishable figure using the same ideas from the earlier articles.

What You Will Do

  • Start from a small synthetic bioinformatics-style table.
  • Summarize and clean the data.
  • Build a rough first plot.
  • Improve it with grouping, faceting, labels, and styling.
  • Export a final figure and compare it to the rough version.

Before You Start

  • You should be comfortable with the first seven articles in this series.
  • You need ggplot2, dplyr, tidyr, ggrepel, and patchwork.
  • You should understand that plot design is a sequence of decisions, not a one-line trick.

The companion script for this article is:

R draw/scripts/08-ggplot-from-zero-full-workflow.R

Step 1: Start from a Raw Table

The script creates a synthetic table with:

  • sample_id
  • module
  • condition
  • batch
  • replicate
  • score

This dataset is small on purpose. The goal is not biological realism at full scale. The goal is to practice the workflow with a structure that feels closer to real scientific data than the earlier public teaching datasets.

raw_bioinfo <- tidyr::expand_grid(
  sample_id = paste0("Sample_", 1:24),
  module = c("Inflammation", "Cell cycle", "Interferon"),
  condition = c("Control", "Treatment"),
  batch = c("Batch_1", "Batch_2"),
  replicate = 1:3
)

Step 2: Summarize Before Plotting

Rather than plotting every raw row immediately, the script creates a summary table first:

workflow_table <- raw_bioinfo %>%
  group_by(module, condition, batch) %>%
  summarise(
    mean_score = mean(score),
    sd_score = sd(score),
    n = n(),
    se = sd_score / sqrt(n),
    .groups = "drop"
  )

This is a very common scientific plotting pattern. The figure is usually built from an analysis-ready summary table, not always from the raw table directly.

Step 3: Build a Rough First Draft

Start with the simplest plot that answers the question at all.

plot_rough <- ggplot(workflow_table, aes(x = module, y = mean_score, fill = condition)) +
  geom_col(position = "dodge") +
  labs(title = "Rough first draft")

That is enough to test whether the plot type is reasonable.

Step 4: Improve the Plot Deliberately

Then improve it step by step:

  • better labels
  • explicit colors
  • error bars
  • faceting by batch
  • cleaner theme
  • better legend placement

The companion script produces this comparison:

08-full-workflow-rough-vs-final.png

And here is the final version on its own:

08-full-workflow-final.png

Step 5: Highlight the Most Important Result

The series ends with one more refinement: selective labeling.

08-full-workflow-labeled.png

Instead of labeling everything, the script labels only the highest summary values within each batch. That keeps the figure readable while still drawing the reader’s attention.

Step 6: Adapt This Workflow to Your Own Data

If you want to reuse this pattern for your own project, replace these pieces first:

  • the raw input table
  • the grouping variables
  • the summary statistics
  • the color scale
  • the faceting variable

You do not need to rewrite the whole plot from zero every time. You only need to swap the parts that define your own scientific question.

Show Explanation

Files Produced by the Companion Script

  • R draw/figures/08-full-workflow-rough-vs-final.png
  • R draw/figures/08-full-workflow-final.png
  • R draw/figures/08-full-workflow-labeled.png
  • R draw/figures/08-full-workflow-raw-table.csv
  • R draw/figures/08-full-workflow-summary-table.csv
```

How to Confirm It Worked

  • The script creates all three figure files and both CSV files.
  • You can explain why the rough plot is weaker than the final plot.
  • You can point to at least three design decisions that improved the final figure.
  • You can identify which columns you would replace in your own dataset.

Common Questions

Why use synthetic data here instead of a downloaded real dataset?

Because the goal is a stable, self-contained tutorial that still feels close to a real scientific workflow without adding fragile external download steps.

Why make a rough version first?

Because rough drafts help you test the question and the plot type quickly. They are part of a good workflow, not a sign of failure.

What is the main lesson of the whole series?

A good ggplot figure is built by a chain of small decisions: data, mapping, geometry, structure, style, and export.

Review Score

Score: 95/100 Verdict: This draft is ready for human review and works well as the series finale because it turns the earlier isolated lessons into one complete workflow.

Show Explanation

Score Breakdown

  • Accuracy: 24/25. The workflow reflects a realistic summary-table-first scientific plotting pattern.
  • Beginner friendliness: 24/25. The article connects earlier lessons without assuming too much extra theory.
  • Reproducibility: 25/25. The script generates all data, figures, and summary tables locally with no external dependency.
  • Professional judgment and risk handling: 22/25. The article keeps the final workflow approachable while still showing a more domain-relevant example.

Review Notes

  • Ready for human review.
  • Before publication, consider adding one short note about how this workflow would change if the raw table came from a CSV or Excel file instead of being generated locally.
```

Personnel

  • ✍ Creator: Chenglin Cai
  • 🤖 AI Collaboration: ChatGPT
  • 🧪 Data Provider: synthetic bioinformatics-style example data
  • 💻 Code Contributor: ChatGPT