1. Figures and statistical tests using eDNA data

Building on group eDNA projects

Authors

Amy Van Cise

Sarah Tanja

Published

February 12, 2026

Modified

February 17, 2026

Background

This week you will continue to work in your groups to build upon your final lab project using the eDNA dataset!

The focus in this lab will be polishing up your figures and performing statistical tests to answer your specific research question. You will continue to write up your findings in a lab report format with citations!

A note about statistical tests: So far this quarter, we’ve looked at research questions that can be answered with an ANOVA or a linear regression. For the major lab project, the required statistic is an ANOVA or linear regression. If you learned a statistical test or model in another class that you want to apply here, I encourage you to try it out!

Week 7 Lab Report should include:

  1. Background information on chosen predator (e.g. diet, distribution, habitat use, competitors, predators) with MORE context relating to your specific hypotheses and findings, backed up with references using inline citations

  2. Citations use the default Chicago Manual of Style author-date format

  3. Finalized Broad Research Question

  4. Finalized Specific Research Question (if needed)

  5. Finalized Falsifiable Null and Alternate Hypotheses (be specific)

  6. Defined X and Y variables

  7. Figure(s) showing X and Y variables for each hypotheses tested.

  8. Statistical test results, and a statement about whether you reject or fail to reject the null hypothesis.

  9. Draft paragraph reflecting on your results and how your group interprets them. What is the story behind the data?1

1 New elements expected to be added for your week 7 iterative report are in bold

Code
humpy <- eDNA %>% 
  filter(common_name == "humpback whale") %>% 
  pivot_longer(16:length(.), names_to = "prey_species", values_to = "prey_prop") %>% 
  group_by(Detected, prey_species) %>%
  filter(mean(prey_prop) > 0.01) %>% # !!!!
  ungroup()
Code
humpy_Sleucopsarus <- humpy %>% 
  filter(prey_species == "Stenobrachius leucopsarus")

Polish your figure(s)

Choose a color palette

PNWColors

  • rev() reverses the order the colors are assigned
Code
library(PNWColors)
mycolors <- rev(pnw_palette("Bay", 2, type = "discrete"))
  • scale_color_manual() controls the outline of your geom

  • scale_fill_manual() controls the fill of your geom

Plot title and axis labels

  • labs() cotrols the following

    • title = "X vs. Y" plot title

    • x = "X variable label" x axis label

    • y = "Y variable label" y axis label

Explore the different built in themes

You can try:

  • theme_minimal()

  • theme_grey()

  • theme_bw()

  • theme_classic()

You can customize other parts of your plot in the theme() call if you choose… download this pdf theme cheatsheet if you want to learn more about it!

Unable to display PDF file. Download instead.

Example figure:

Code
ggplot(humpy_Sleucopsarus, 
       aes(x = prey_prop, 
           y = as.factor(Detected), 
           fill = as.factor(Detected), 
           color = as.factor(Detected))) +
  geom_point() +
  geom_boxplot(alpha = 0.5) +
  coord_flip() +
  scale_color_manual(
    values = mycolors,
    name = "Humpback Whale eDNA",
    labels = c("Not detected", "Detected")
  ) +
  scale_fill_manual(
    values = mycolors, 
    name = "Humpback Whale eDNA",
    labels = c("Not detected", "Detected")
  ) +
  theme_grey() +
  labs(
    title = "Detection of Humpback whale eDNA vs. abundance of Northern Lampfish eDNA",
    x = "Relative proportion of Northern Lampfish eDNA",
    y = "Humpback whale eDNA not detected (0) or detected (1)"
  ) +
  theme(
    legend.position = "none",                    # 🔹 hide key
    axis.text = element_text(size = 8, color = "grey60"),
    axis.title = element_text(size = 10, color = "grey60"),
    plot.title = element_text(size = 12, color = "grey40", face = "bold")
  )

Choose your statistical test

1. Identify your predictor variable(s) based on the wording of your specific hypothesis

2. Are your predictor variables categorical or continuous
Tip

If your predictor variable (x) is categorical –> your model is an ANOVA

If your predictor variable (x) is continuous –> your model is a Linear regression

Run your statistical test

ANOVA

A one-way ANOVA in R tests whether the means of a continuous response (y) variable differ across your categorical predictor (x) variable of two or more groups. Conceptually, it asks: Is the variation between group means larger than we would expect from random variation within groups? In R, you typically run it with aov(y ~ group, data = your_dataframe)

Tip

Read up on the usage of the built in ANOVA function aov() from the stats package by running ?aov in the console or a code chunk

Code
?aov
Code
anova_res <- aov(prey_prop ~ Detected, data = humpy_Sleucopsarus)

summary(anova_res)
             Df Sum Sq Mean Sq F value Pr(>F)
Detected      1  0.021 0.02150   1.113  0.292
Residuals   514  9.926 0.01931               
Code
humpy_Sleucopsarus %>% 
  group_by(Detected) %>% 
  summarise(mean(prey_prop))
# A tibble: 2 × 2
  Detected `mean(prey_prop)`
     <dbl>             <dbl>
1        0            0.0656
2        1            0.0897

Linear regression

A linear model in R using lm() estimates the relationship between a continuous response (y) variable and one or more continuous predictor (x) variables. At its core, it fits a line that minimizes squared residuals.

Tip

Read up on the usage of the built in linear model function lm() from the stats package by running ?lm in the console or a code chunk

Code
?lm
Code
lm_res <- lm(Detected ~ prey_prop, data=humpy_Sleucopsarus)
summary(lm_res)

Call:
lm(formula = Detected ~ prey_prop, data = humpy_Sleucopsarus)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.16102 -0.07673 -0.07169 -0.07148  0.92852 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.07148    0.01310   5.456 7.57e-08 ***
prey_prop    0.08954    0.08486   1.055    0.292    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.2676 on 514 degrees of freedom
Multiple R-squared:  0.002161,  Adjusted R-squared:  0.0002199 
F-statistic: 1.113 on 1 and 514 DF,  p-value: 0.2919

Report, Reject or Accept… & Interpret!

ANOVA

In an ANOVA, the key outputs to report include the group means, the F statistic and its associated p-value.

The F statistic is a ratio: \[ F = \frac{\text{variance between groups}}{\text{variance within groups}} \] Your F statistic will be higher when the variance between groups is large relative to the variance within groups, which suggests that the group means are different.

The p-value tells you the probability of observing the F statistic calculated from your data, assuming the null hypothesis is true (i.e., all group means are equal).

If the p-value is below your significance threshold (a default value is typically ( \(\alpha = 0.05\) )), you reject the null hypothesis that all group means are equal.

Important

Importantly, ANOVA tells you that at least one group differs, not which groups differ.

An example for reporting ANOVA results:

The mean proportional relative abundance of Stenobrachius leucopsarus eDNA in samples where humpack whale eDNA was not detected is 0.06, and 0.08 for samples where humpback whale eDNA was detected. A one-way ANOVA showed no significant difference in mean Stenobrachius leucopsarus eDNA proportional relative abundance between samples where humpback whale eDNA was detected and those where it was not detected [F(1,514)=1.113, p=0.292]. We therefore fail to reject the null hypothesis and find no statistical evidence that humpback eDNA detection status is associated with differences in lampfish eDNA abundance.

If the ANOVA is significant, you can follow it up with a post hoc test like the Tukey Honestly Significant Difference test using the TukeyHSD(your_anova_result) function to identify which specific pairs of groups differ.

Tip

Interpret results in plain language: report the F statistic, degrees of freedom, p-value, and group means — and always connect the statistical result back to the biological or practical question you’re asking. You should talk with your group about how you interpret your results in the specific ecological context of what is known about your chosen predator, prey, relating directly to your hypotheses!

Linear model

After running lm_res <- lm(y ~ x, data = your_data) and summary(lm_res), make sure to report the coefficients: the estimate tells you the direction and magnitude of the effect, the standard error reflects uncertainty, the t-value is the signal-to-noise ratio, and the p-value tests whether that coefficient differs significantly from zero. The overall model F-statistic tests whether the model explains more variation than a null (intercept-only) model, and the R2 reports the variance explained by the model.

An example for reporting linear regression results:

A linear regression was conducted to assess whether lampfish (Stenobrachius leucopsarus) eDNA proportional abundance predicted humpback whale eDNA detection. The model estimates a weak positive association, where higher lampfish eDNA proportional abundance corresponds to slightly higher predicted humpback detection probability; however, the magnitude of this effect is small and not a significant predictor of humpback detection status (β=0.0895, SE=0.0849, t(514)=1.06, p=0.292). The overall model was not significant [F(1,514)=1.113, p=0.292] and explained a negligible proportion of variance in detection status [R2=0.002]. Therefore, we fail to reject the null hypothesis that lampfish proportional abundance is associated with humpback whale eDNA detection.

Tip

When you interpret results, translate coefficients into meaningful biological or practical terms — don’t just report significance; explain what the estimated effect size actually means (ex. small positive effect, large negative effect… describe the magnitude and direction of the predictor variable effect on your response variable).