1. Figures and statistical tests using eDNA data

Building on group eDNA projects

Authors

Amy Van Cise

Sarah Tanja

Published

February 12, 2026

Modified

February 17, 2026

Background

This week you will continue to work in your groups to build upon your final lab project using the eDNA dataset!

The focus in this lab will be polishing up your figures and performing statistical tests to answer your specific research question. You will continue to write up your findings in a lab report format with citations!

A note about statistical tests: So far this quarter, we’ve looked at research questions that can be answered with an ANOVA or a linear regression. For the major lab project, the required statistic is an ANOVA or linear regression. If you learned a statistical test or model in another class that you want to apply here, I encourage you to try it out!

Week 7 Lab Report should include:

Background information on chosen predator (e.g. diet, distribution, habitat use, competitors, predators) with MORE context relating to your specific hypotheses and findings, backed up with references using inline citations
Citations use the default Chicago Manual of Style author-date format
Finalized Broad Research Question
Finalized Specific Research Question (if needed)
Finalized Falsifiable Null and Alternate Hypotheses (be specific)
Defined X and Y variables
Figure(s) showing X and Y variables for each hypotheses tested.
Statistical test results, and a statement about whether you reject or fail to reject the null hypothesis.
Draft paragraph reflecting on your results and how your group interprets them. What is the story behind the data?¹

¹ New elements expected to be added for your week 7 iterative report are in bold

Code

humpy <- eDNA %>% 
  filter(common_name == "humpback whale") %>% 
  pivot_longer(16:length(.), names_to = "prey_species", values_to = "prey_prop") %>% 
  group_by(Detected, prey_species) %>%
  filter(mean(prey_prop) > 0.01) %>% # !!!!
  ungroup()

Code

humpy_Sleucopsarus <- humpy %>% 
  filter(prey_species == "Stenobrachius leucopsarus")

Polish your figure(s)

Explore the types of plots you could make in the R graph gallery

Important

Pick at least one plot or figure that illustrates your X and Y variables from each set of tested hypotheses

Choose a color palette

PNWColors

rev() reverses the order the colors are assigned

Code

library(PNWColors)
mycolors <- rev(pnw_palette("Bay", 2, type = "discrete"))

scale_color_manual() controls the outline of your geom
scale_fill_manual() controls the fill of your geom

Plot title and axis labels

labs() cotrols the following
- title = "X vs. Y" plot title
- x = "X variable label" x axis label
- y = "Y variable label" y axis label

Explore the different built in themes

You can try:

theme_minimal()
theme_grey()
theme_bw()
theme_classic()

You can customize other parts of your plot in the theme() call if you choose… download this pdf theme cheatsheet if you want to learn more about it!

Example figure:

Code

ggplot(humpy_Sleucopsarus, 
       aes(x = prey_prop, 
           y = as.factor(Detected), 
           fill = as.factor(Detected), 
           color = as.factor(Detected))) +
  geom_point() +
  geom_boxplot(alpha = 0.5) +
  coord_flip() +
  scale_color_manual(
    values = mycolors,
    name = "Humpback Whale eDNA",
    labels = c("Not detected", "Detected")
  ) +
  scale_fill_manual(
    values = mycolors, 
    name = "Humpback Whale eDNA",
    labels = c("Not detected", "Detected")
  ) +
  theme_grey() +
  labs(
    title = "Detection of Humpback whale eDNA vs. abundance of Northern Lampfish eDNA",
    x = "Relative proportion of Northern Lampfish eDNA",
    y = "Humpback whale eDNA not detected (0) or detected (1)"
  ) +
  theme(
    legend.position = "none",                    # 🔹 hide key
    axis.text = element_text(size = 8, color = "grey60"),
    axis.title = element_text(size = 10, color = "grey60"),
    plot.title = element_text(size = 12, color = "grey40", face = "bold")
  )

Choose your statistical test

1. Identify your predictor variable(s) based on the wording of your specific hypothesis

2. Are your predictor variables categorical or continuous

Tip

If your predictor variable (x) is categorical –> your model is an ANOVA

If your predictor variable (x) is continuous –> your model is a Linear regression

Run your statistical test

ANOVA

A one-way ANOVA in R tests whether the means of a continuous response (y) variable differ across your categorical predictor (x) variable of two or more groups. Conceptually, it asks: Is the variation between group means larger than we would expect from random variation within groups? In R, you typically run it with aov(y ~ group, data = your_dataframe)

Tip

Read up on the usage of the built in ANOVA function aov() from the stats package by running ?aov in the console or a code chunk

Code

?aov

Code

anova_res <- aov(prey_prop ~ Detected, data = humpy_Sleucopsarus)

summary(anova_res)

             Df Sum Sq Mean Sq F value Pr(>F)
Detected      1  0.021 0.02150   1.113  0.292
Residuals   514  9.926 0.01931

Code

humpy_Sleucopsarus %>% 
  group_by(Detected) %>% 
  summarise(mean(prey_prop))

# A tibble: 2 × 2
  Detected `mean(prey_prop)`
     <dbl>             <dbl>
1        0            0.0656
2        1            0.0897

Linear regression

A linear model in R using lm() estimates the relationship between a continuous response (y) variable and one or more continuous predictor (x) variables. At its core, it fits a line that minimizes squared residuals.

Tip

Read up on the usage of the built in linear model function lm() from the stats package by running ?lm in the console or a code chunk

Code

?lm

Code

lm_res <- lm(Detected ~ prey_prop, data=humpy_Sleucopsarus)
summary(lm_res)


Call:
lm(formula = Detected ~ prey_prop, data = humpy_Sleucopsarus)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.16102 -0.07673 -0.07169 -0.07148  0.92852 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.07148    0.01310   5.456 7.57e-08 ***
prey_prop    0.08954    0.08486   1.055    0.292    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.2676 on 514 degrees of freedom
Multiple R-squared:  0.002161,  Adjusted R-squared:  0.0002199 
F-statistic: 1.113 on 1 and 514 DF,  p-value: 0.2919

Report, Reject or Accept… & Interpret!

ANOVA

In an ANOVA, the key outputs to report include the group means, the F statistic and its associated p-value.

The F statistic is a ratio: \[ F = \frac{\text{variance between groups}}{\text{variance within groups}} \] Your F statistic will be higher when the variance between groups is large relative to the variance within groups, which suggests that the group means are different.

The p-value tells you the probability of observing the F statistic calculated from your data, assuming the null hypothesis is true (i.e., all group means are equal).

If the p-value is below your significance threshold (a default value is typically ( $\alpha = 0.05$ )), you reject the null hypothesis that all group means are equal.

Important

Importantly, ANOVA tells you that at least one group differs, not which groups differ.

An example for reporting ANOVA results:

The mean proportional relative abundance of Stenobrachius leucopsarus eDNA in samples where humpack whale eDNA was not detected is 0.06, and 0.08 for samples where humpback whale eDNA was detected. A one-way ANOVA showed no significant difference in mean Stenobrachius leucopsarus eDNA proportional relative abundance between samples where humpback whale eDNA was detected and those where it was not detected [F(1,514)=1.113, p=0.292]. We therefore fail to reject the null hypothesis and find no statistical evidence that humpback eDNA detection status is associated with differences in lampfish eDNA abundance.

If the ANOVA is significant, you can follow it up with a post hoc test like the Tukey Honestly Significant Difference test using the TukeyHSD(your_anova_result) function to identify which specific pairs of groups differ.

Tip

Interpret results in plain language: report the F statistic, degrees of freedom, p-value, and group means — and always connect the statistical result back to the biological or practical question you’re asking. You should talk with your group about how you interpret your results in the specific ecological context of what is known about your chosen predator, prey, relating directly to your hypotheses!

Linear model

After running lm_res <- lm(y ~ x, data = your_data) and summary(lm_res), make sure to report the coefficients: the estimate tells you the direction and magnitude of the effect, the standard error reflects uncertainty, the t-value is the signal-to-noise ratio, and the p-value tests whether that coefficient differs significantly from zero. The overall model F-statistic tests whether the model explains more variation than a null (intercept-only) model, and the R² reports the variance explained by the model.

An example for reporting linear regression results:

A linear regression was conducted to assess whether lampfish (Stenobrachius leucopsarus) eDNA proportional abundance predicted humpback whale eDNA detection. The model estimates a weak positive association, where higher lampfish eDNA proportional abundance corresponds to slightly higher predicted humpback detection probability; however, the magnitude of this effect is small and not a significant predictor of humpback detection status (β=0.0895, SE=0.0849, t(514)=1.06, p=0.292). The overall model was not significant [F(1,514)=1.113, p=0.292] and explained a negligible proportion of variance in detection status [R²=0.002]. Therefore, we fail to reject the null hypothesis that lampfish proportional abundance is associated with humpback whale eDNA detection.

Tip

When you interpret results, translate coefficients into meaningful biological or practical terms — don’t just report significance; explain what the estimated effect size actually means (ex. small positive effect, large negative effect… describe the magnitude and direction of the predictor variable effect on your response variable).

--- title: "1. Figures and statistical tests using eDNA data" subtitle: "Building on group eDNA projects" page-layout: article author: - Amy Van Cise - Sarah Tanja date: "2026-02-12" draft: false date-modified: today order: 1 format: html: toc: true toc-depth: 2 number-sections: false code-fold: true editor: markdown: wrap: 72 --- ```{r setup, include=FALSE} knitr::opts_chunk$set( echo = TRUE, message = FALSE, warning = FALSE ) ``` # Background This week you will continue to work in your groups to build upon your final lab project using the eDNA dataset! The focus in this lab will be polishing up your figures and performing statistical tests to answer your specific research question. You will continue to write up your findings in a lab report format *with citations*! > A note about statistical tests: So far this quarter, we've looked at > research questions that can be answered with an ANOVA or a linear > regression. For the major lab project, the required statistic is an > ANOVA or linear regression. If you learned a statistical test or model > in another class that you want to apply here, I encourage you to try > it out! # Week 7 Lab Report should include: 1. Background information on chosen predator (e.g. diet, distribution, habitat use, competitors, predators) ***with MORE context relating to your specific hypotheses and findings, backed up with references using inline citations*** 2. Citations **use the default [Chicago Manual of Style](https://www.chicagomanualofstyle.org/home.html) author-date format** 3. Finalized Broad Research Question 4. Finalized Specific Research Question (if needed) 5. Finalized Falsifiable Null and Alternate Hypotheses (be specific) 6. Defined X and Y variables 7. **Figure(s) showing X and Y variables for each hypotheses tested**. 8. **Statistical test results, and a statement about whether you reject or fail to reject the null hypothesis.** 9. **Draft paragraph reflecting on your results and how your group interprets them. What is the story behind the data?**[^1] [^1]: New elements expected to be added for your week 7 iterative report are in **bold** ```{r, include=FALSE} library(tidyverse) library(colorspace) ``` ```{r, include=FALSE} eDNA <- read_csv("../../week5/data/eDNA_MM_fish_detections_clean.csv") ``` ```{r} humpy <- eDNA %>% filter(common_name == "humpback whale") %>% pivot_longer(16:length(.), names_to = "prey_species", values_to = "prey_prop") %>% group_by(Detected, prey_species) %>% filter(mean(prey_prop) > 0.01) %>% # !!!! ungroup() ``` ```{r} humpy_Sleucopsarus <- humpy %>% filter(prey_species == "Stenobrachius leucopsarus") ``` # Polish your figure(s) #### Explore the types of plots you could make in the [R graph gallery](https://r-graph-gallery.com/) ::: callout-important Pick at least one plot or figure that illustrates your X and Y variables from *each* set of tested hypotheses ::: #### Choose a color palette [PNWColors](https://github.com/jakelawlor/PNWColors) - `rev()` reverses the order the colors are assigned ```{r} library(PNWColors) mycolors <- rev(pnw_palette("Bay", 2, type = "discrete")) ``` - `scale_color_manual()` controls the outline of your geom - `scale_fill_manual()` controls the fill of your geom #### Plot title and axis labels - `labs()` cotrols the following - `title = "X vs. Y"` plot title - `x = "X variable label"` x axis label - `y = "Y variable label"` y axis label #### Explore the different [built in themes](https://ggplot2-book.org/themes.html#sec-themes) You can try: - `theme_minimal()` - `theme_grey()` - `theme_bw()` - `theme_classic()` You can customize other parts of your plot in the `theme()` call if you choose... download this pdf theme cheatsheet if you want to learn more about it! <object data="../refs/ggplot theme system cheatsheet.pdf" type="application/pdf" width="100%" height="500px"> <p>Unable to display PDF file. <a href="../refs/ggplot theme system cheatsheet.pdf">Download</a> instead.</p> </object> #### Example figure: ```{r} ggplot(humpy_Sleucopsarus, aes(x = prey_prop, y = as.factor(Detected), fill = as.factor(Detected), color = as.factor(Detected))) + geom_point() + geom_boxplot(alpha = 0.5) + coord_flip() + scale_color_manual( values = mycolors, name = "Humpback Whale eDNA", labels = c("Not detected", "Detected") ) + scale_fill_manual( values = mycolors, name = "Humpback Whale eDNA", labels = c("Not detected", "Detected") ) + theme_grey() + labs( title = "Detection of Humpback whale eDNA vs. abundance of Northern Lampfish eDNA", x = "Relative proportion of Northern Lampfish eDNA", y = "Humpback whale eDNA not detected (0) or detected (1)" ) + theme( legend.position = "none", # 🔹 hide key axis.text = element_text(size = 8, color = "grey60"), axis.title = element_text(size = 10, color = "grey60"), plot.title = element_text(size = 12, color = "grey40", face = "bold") ) ``` # Choose your statistical test ##### 1. Identify your predictor variable(s) based on the wording of your specific hypothesis ![](/modules/week4/refs/xy.png) ##### 2. Are your predictor variables categorical or continuous ::: callout-tip If your predictor variable (x) is categorical –\> your model is an ANOVA If your predictor variable (x) is continuous –\> your model is a Linear regression ::: # Run your statistical test ## ANOVA A one-way ANOVA in R tests whether the **means of a continuous response (y) variable differ across your categorical predictor (x) variable of two or more groups**. Conceptually, it asks: *Is the variation between group means larger than we would expect from random variation within groups?* In R, you typically run it with `aov(y ~ group, data = your_dataframe)` ::: callout-tip Read up on the usage of the built in ANOVA function `aov()` from the `stats` package by running `?aov` in the console or a code chunk ::: ```{r, eval=FALSE} ?aov ``` ```{r} anova_res <- aov(prey_prop ~ Detected, data = humpy_Sleucopsarus) summary(anova_res) ``` ```{r} humpy_Sleucopsarus %>% group_by(Detected) %>% summarise(mean(prey_prop)) ``` ## Linear regression A linear model in R using `lm()` estimates the relationship between a **continuous response (y) variable** and one or more **continuous predictor (x) variables**. At its core, it fits a line that minimizes squared residuals. ::: callout-tip Read up on the usage of the built in linear model function `lm()` from the `stats` package by running `?lm` in the console or a code chunk ::: ```{r, eval=FALSE} ?lm ``` ```{r} lm_res <- lm(Detected ~ prey_prop, data=humpy_Sleucopsarus) summary(lm_res) ``` # Report, Reject or Accept... & Interpret! ##### ANOVA In an ANOVA, the key outputs to report include the **group means**, the **F statistic** and its associated **p-value**. The F statistic is a ratio: $$ F = \frac{\text{variance between groups}}{\text{variance within groups}} $$ Your F statistic will be higher when the variance between groups is large relative to the variance within groups, which suggests that the group means are different. The p-value tells you the probability of observing the F statistic calculated from your data, assuming the null hypothesis is true (i.e., all group means are equal). If the p-value is below your significance threshold (a default value is typically ( $\alpha = 0.05$ )), you reject the null hypothesis that all group means are equal. ::: callout-important Importantly, ANOVA tells you **that at least one group differs**, not *which* groups differ. ::: An example for reporting ANOVA results: > The mean proportional relative abundance of *Stenobrachius > leucopsarus* eDNA in samples where humpack whale eDNA was not detected > is 0.06, and 0.08 for samples where humpback whale eDNA was detected. > A one-way ANOVA showed no significant difference in mean > *Stenobrachius leucopsarus* eDNA proportional relative abundance > between samples where humpback whale eDNA was detected and those where > it was not detected \[F(1,514)=1.113, p=0.292\]. We therefore fail to > reject the null hypothesis and find no statistical evidence that > humpback eDNA detection status is associated with differences in > lampfish eDNA abundance. If the ANOVA is significant, you can follow it up with a **post hoc test** like the Tukey Honestly Significant Difference test using the `TukeyHSD(your_anova_result)` function to identify which specific pairs of groups differ. ::: callout-tip Interpret results in plain language: report the F statistic, degrees of freedom, p-value, and group means — and always connect the statistical result back to the biological or practical question you’re asking. You should talk with your group about how you interpret your results in the specific ecological context of what is known about your chosen predator, prey, relating directly to your hypotheses! ::: ##### Linear model After running `lm_res <- lm(y ~ x, data = your_data)` and `summary(lm_res)`, make sure to report the **coefficients**: the estimate tells you the direction and magnitude of the effect, the **standard error** reflects uncertainty, the **t-value** is the signal-to-noise ratio, and the **p-value** tests whether that coefficient differs significantly from zero. The overall model **F-statistic** tests whether the model explains more variation than a null (intercept-only) model, and the **R^2^** reports the variance explained by the model. An example for reporting linear regression results: > A linear regression was conducted to assess whether lampfish > (*Stenobrachius leucopsarus*) eDNA proportional abundance predicted > humpback whale eDNA detection. The model estimates a weak positive > association, where higher lampfish eDNA proportional abundance > corresponds to slightly higher predicted humpback detection > probability; however, the magnitude of this effect is small and not a > significant predictor of humpback detection status (β=0.0895, > SE=0.0849, t(514)=1.06, p=0.292). The overall model was not > significant \[F(1,514)=1.113, p=0.292\] and explained a negligible > proportion of variance in detection status \[R^2^=0.002\]. Therefore, > we fail to reject the null hypothesis that lampfish proportional > abundance is associated with humpback whale eDNA detection. ::: callout-tip When you interpret results, translate coefficients into meaningful biological or practical terms — don’t just report significance; explain what the estimated effect size actually means (ex. small positive effect, large negative effect... describe the magnitude and direction of the predictor variable effect on your response variable). :::