This project was developed as part of the "Statistics for Biologists using R" course at the University of Göttingen. The goal was to master the application of multiple linear regression and interaction modeling on real-world biological datasets.
This repository contains a series of reproducible R workflows for analyzing biological datasets using Multiple Linear Regression. The scripts demonstrate how to transition from raw biological data to statistically sound, publication-ready visualizations.
The workflows cover morphological traits (e.g., insect mass and reproductive age) and behavioral ecology (e.g., flight initiation distances in mammals), applying rigorous statistical checks at each step.
- Continuous Variables (`01_Continuous_Regression.Rmd`)
  - Regression modeling using continuous predictors.
  - Predicting age at first reproduction based on morphological mass.
- Categorical Variables & Dummy Coding (`02_Categorical_Regression.Rmd`)
  - Handling non-numeric predictors (e.g., sex, dominance hierarchies).
  - Releveling factors and setting reference categories.
- Interaction Terms (`03_Interaction_Effects.Rmd`)
  - Modeling cases where the effect of one predictor depends on the level of another (e.g., the interaction between Sex and Season on behavioral traits).
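The three model types listed above can be sketched in base R roughly as follows. This is a minimal illustration on simulated data, not the code from the `.Rmd` files: the data frame `insects`, its column names, and the effect sizes are all hypothetical stand-ins.

```r
# Simulated stand-in data; names and effect sizes are hypothetical.
set.seed(42)
n <- 120
insects <- data.frame(
  mass   = runif(n, min = 5, max = 25),
  sex    = factor(sample(c("female", "male"), n, replace = TRUE)),
  season = factor(sample(c("dry", "wet"), n, replace = TRUE))
)
insects$age_first_repro <- 10 + 0.8 * insects$mass +
  2.0 * (insects$sex == "male") +
  1.5 * (insects$sex == "male") * (insects$season == "wet") +
  rnorm(n, sd = 2)

# 1. Continuous predictor: age at first reproduction as a function of mass.
m_cont <- lm(age_first_repro ~ mass, data = insects)

# 2. Categorical predictor: lm() dummy-codes factors automatically.
#    relevel() changes the reference category (here from "female" to "male"),
#    so the sex coefficient becomes the female-vs-male difference.
insects$sex <- relevel(insects$sex, ref = "male")
m_cat <- lm(age_first_repro ~ mass + sex, data = insects)

# 3. Interaction: sex * season expands to sex + season + sex:season.
m_int <- lm(age_first_repro ~ mass + sex * season, data = insects)

# Inspect the coefficient table of the interaction model.
print(summary(m_int)$coefficients)

# F-test comparing the additive model with the interaction model.
print(anova(m_cat, m_int))
```

The `sex:season` coefficient estimates how much the sex difference changes between seasons; when it is clearly non-zero, the main effects should not be interpreted on their own.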
- Language: R
- Data Manipulation: Base R, factor management
- Statistical Modeling and Data Visualization: `lm()`, `car`, `effects`, `psych`, `ggpubr`, `lmtest`
- Implementation of dummy coding and reference releveling for multi-level categorical variables.
- Rigorous diagnostic testing for model assumptions (Normality, Homoscedasticity, Multicollinearity) and influential outlier detection (Cook's Distance, DFBETAS).
- Creation of publication-quality effect displays featuring confidence bands to accurately visualize main effects and complex interaction terms.
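The diagnostic checks and confidence bands described above might look roughly like this. The sketch uses simulated data with hypothetical column names (`fid`, `mass`, `body_length`), and the cutoffs for Cook's distance and DFBETAS are common rules of thumb rather than values taken from the workflows. The calls to `lmtest::bptest()` and `car::vif()` run only if those packages are installed, and the confidence band is computed with base R `predict()` as a stand-in for the `effects` package.

```r
# Simulated stand-in data; column names are hypothetical.
set.seed(1)
n <- 100
dat <- data.frame(
  mass        = runif(n, 5, 25),
  body_length = runif(n, 10, 40)
)
dat$fid <- 3 + 0.5 * dat$mass + 0.2 * dat$body_length + rnorm(n)
model <- lm(fid ~ mass + body_length, data = dat)

# Normality of residuals (Shapiro-Wilk test).
print(shapiro.test(residuals(model)))

# Homoscedasticity (Breusch-Pagan test), only if lmtest is installed.
if (requireNamespace("lmtest", quietly = TRUE)) {
  print(lmtest::bptest(model))
}

# Multicollinearity (variance inflation factors), only if car is installed.
if (requireNamespace("car", quietly = TRUE)) {
  print(car::vif(model))
}

# Influential observations: Cook's distance and DFBETAS.
cooks     <- cooks.distance(model)
flag_cook <- which(cooks > 4 / n)               # rule-of-thumb cutoff
dfb       <- dfbetas(model)
flag_dfb  <- which(apply(abs(dfb) > 2 / sqrt(n), 1, any))

# 95% confidence band for the effect of mass,
# holding body_length fixed at its mean.
grid <- data.frame(
  mass        = seq(min(dat$mass), max(dat$mass), length.out = 50),
  body_length = mean(dat$body_length)
)
band <- predict(model, newdata = grid, interval = "confidence", level = 0.95)
print(head(cbind(grid, band)))
```

The `lwr` and `upr` columns of `band` can be passed to any plotting layer that draws a ribbon around the fitted line.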
To run these workflows locally on your machine:
- Clone the repository: `git clone https://github.com/yazalj/BioData-Linear-Models.git`, then `cd BioData-Linear-Models`.
- Prerequisites: Ensure you have R and RStudio installed.
- Install Dependencies: The scripts utilize the `pacman` package manager for clean environment setup. You only need to install `pacman` directly with `install.packages("pacman")`; it will automatically install and load all other required packages when you run the scripts.
- Execution:
  - Open any of the `.Rmd` files located in the `scripts/` directory using RStudio.
  - Click the "Knit" button in RStudio. This will automatically execute the code, source the datasets from the `data/` folder via relative paths, and generate a clean HTML report.
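For reference, the dependency step above could be wrapped in a small helper like the one below. This is only a sketch: the function name `load_workflow_packages` is hypothetical, and the package vector is assumed from the tooling list in this README rather than copied from the scripts.

```r
# Packages assumed from this README's tooling list (lm() is part of base R).
required_pkgs <- c("car", "effects", "psych", "ggpubr", "lmtest")

# Hypothetical helper: installs pacman if needed, then lets p_load()
# install any missing package and attach all of them.
load_workflow_packages <- function(pkgs = required_pkgs) {
  if (!requireNamespace("pacman", quietly = TRUE)) {
    install.packages("pacman")
  }
  pacman::p_load(char = pkgs)
  invisible(pkgs)
}
```

Calling `load_workflow_packages()` once at the start of a session should be enough. As an alternative to the "Knit" button, and assuming the `rmarkdown` package is installed, `rmarkdown::render("scripts/01_Continuous_Regression.Rmd")` run from the repository root should produce the same report.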
[Figure panels: Continuous Variables | Categorical Variables | Interactions]


