3 changes: 3 additions & 0 deletions DESCRIPTION
@@ -18,6 +18,7 @@ Encoding: UTF-8
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.2.3
Suggests:
applicable,
baguette,
beans,
bestNormalize,
@@ -40,13 +41,15 @@ Suggests:
mixOmics,
multilevelmod,
nlme,
probably,
ranger,
roxygen2,
rsconnect,
rstanarm,
rules,
stringr,
testthat (>= 3.0.0),
textrecipes,
tidymodels,
tidyposterior,
tidyverse,
2 changes: 1 addition & 1 deletion inst/tutorials/02-a-tidyverse-primer/tutorial.Rmd
@@ -1,6 +1,6 @@
---
title: A Tidyverse Primer
author: Pratham Kancherla and David Kane
author: Pratham Kancherla
tutorial:
id: a-tidyverse-primer
output:
2 changes: 1 addition & 1 deletion inst/tutorials/04-the-ames-housing-data/tutorial.Rmd
@@ -1,6 +1,6 @@
---
title: The Ames Housing Data
author: Pratham Kancherla and David Kane
author: Pratham Kancherla
tutorial:
id: the-ames-housing-data
output:
120 changes: 92 additions & 28 deletions inst/tutorials/07-a-model-workflow/tutorial.Rmd
@@ -1,12 +1,12 @@
---
title: A Model Workflow
author: Pratham Kancherla and David Kane
author: Pratham Kancherla
tutorial:
id: a-model-workflow
output:
learnr::tutorial:
progressive: true
allow_skip: true
progressive: yes
allow_skip: yes
runtime: shiny_prerendered
description: 'Tutorial for Chapter 7: A Model Workflow'
---
@@ -73,6 +73,16 @@ multilevel_workflow <-

multilevel_fit <- fit(multilevel_workflow, data = Orthodont)

parametric_spec <- survival_reg()

parametric_workflow <-
workflow() |>
add_variables(outcome = c(fustat, futime), predictors = c(age, rx)) |>
add_model(parametric_spec,
formula = Surv(futime, fustat) ~ age + strata(rx))

parametric_fit <- fit(parametric_workflow, data = ovarian)

location <- list(
longitude = Sale_Price ~ Longitude,
latitude = Sale_Price ~ Latitude,
@@ -83,7 +93,7 @@ location <- list(
location_models <- workflow_set(preproc = location, models = list(lm = lm_model))

location_models <-
location_models %>%
location_models |>
mutate(fit = map(info, ~ fit(.x$workflow[[1]], ames_train)))

final_lm_res <- last_fit(lm_wflow, ames_split)
@@ -840,27 +850,81 @@ fit(multilevel_workflow, data = Orthodont)

### Exercise 15

<!-- PK: Not sure if I should just give this code since it is kind of repetitive from the last 13 exercises of just split it up. Split it up! Repetition in the pursuit of understanding is no vice! -->

We can even use the previously mentioned `strata()` function from the survival package for survival analysis. Run the following code.
Type `survival_reg()` and assign it to `parametric_spec`. Then, pipe `workflow()` to `add_variables()`. Add the argument `outcome`, setting it equal to `c(fustat, futime)`, and the argument `predictors`, setting it equal to `c(age, rx)`.

```{r how-does-a-workflow--15, exercise = TRUE}
library(censored)

```

<button onclick = "transfer_code(this)">Copy previous code</button>

```{r how-does-a-workflow--15-hint-1, eval = FALSE}
parametric_spec <- survival_reg()

workflow() |>
add_variables(outcome = ..., predictors = ...)
```

```{r include = FALSE}
parametric_spec <- survival_reg()

workflow() |>
add_variables(outcome = c(fustat, futime), predictors = c(age, rx))
```

###

Outliers can significantly impact analysis; preprocessing involves identifying and handling outliers using techniques like Z-score, IQR, or clustering-based methods.

### Exercise 16

Copy the previous code (delete the `parametric_spec` line) and pipe it to `add_model()`. Add the arguments `parametric_spec` and `formula`, setting the latter equal to `Surv(futime, fustat) ~ age + strata(rx)`. Then, assign the entire expression to `parametric_workflow` using `<-`.

```{r how-does-a-workflow--16, exercise = TRUE}

```

<button onclick = "transfer_code(this)">Copy previous code</button>

```{r how-does-a-workflow--16-hint-1, eval = FALSE}
parametric_workflow <-
... |>
add_model(parametric_spec,
formula = ...)
```

```{r include = FALSE}
parametric_workflow <-
workflow() %>%
add_variables(outcome = c(fustat, futime), predictors = c(age, rx)) %>%
workflow() |>
add_variables(outcome = c(fustat, futime), predictors = c(age, rx)) |>
add_model(parametric_spec,
formula = Surv(futime, fustat) ~ age + strata(rx))
```

###

Transformation techniques like log-transformations, scaling, and standardization are used to adjust the data distribution or make it suitable for certain algorithms.

### Exercise 17

Type `fit()`. Add the arguments `parametric_workflow` and `data`, setting the latter equal to `ovarian`. Then, assign the entire expression to `parametric_fit` using `<-` and print `parametric_fit` on the next line.

```{r how-does-a-workflow--17, exercise = TRUE}

```

```{r how-does-a-workflow--17-hint-1, eval = FALSE}
parametric_fit <- fit(..., data = ...)
```

```{r include = FALSE}
parametric_fit <- fit(parametric_workflow, data = ovarian)
parametric_fit
```

###

<!-- PK: DONE. Not sure if I should just give this code since it is kind of repetitive from the last 13 exercises of just split it up. Split it up! Repetition in the pursuit of understanding is no vice!-->

Great Job! You now know how a workflow uses different sorts of formulas from a data set.

## Creating Multiple Workflows at Once
@@ -1024,12 +1088,12 @@ location_models$fit[[1]]

We use a **purrr** function here to map through our models, but there is an easier, better approach to fit workflow sets that will be introduced in later tutorials.

###
###

Great Job! You now know how to create multiple workflows and put them in a workflow set. You also know how to extract these sets and analyze them based on the model of the chosen workflow set.

## Evaluating the Test Set
###
###

Let’s say that we’ve concluded our model development and have settled on a final model. There is a convenience function called `last_fit()` that will fit the model to the entire training set and evaluate it with the testing set.

@@ -1041,15 +1105,15 @@ Enter `last_fit()` and add the parameter `lm_wflow`. Hit "Run Code." (Note: This

```

```{r evaluatin-the-test-s-1-hint, eval = FALSE}
```{r evaluatin-the-test-s-1-hint-1, eval = FALSE}
last_fit(...)
```

```{r, include = FALSE}
```{r include = FALSE}
#last_fit(lm_wflow)
```

###
###

The `last_fit()` function fits the final model on the training portion of an initial train/test split and evaluates it once on the testing portion. It is useful when model development is finished and you want the final model, trained on the entire training set, for making predictions on new, unseen data.

@@ -1063,15 +1127,15 @@ We always need to have a split for `last_fit()`. Add the parameter `ames_split`

<button onclick = "transfer_code(this)">Copy previous code</button>

```{r evaluatin-the-test-s-2-hint, eval = FALSE}
```{r evaluatin-the-test-s-2-hint-1, eval = FALSE}
final_lm_res <- last_fit(lm_wflow, ...)
```

```{r, include = FALSE}
```{r include = FALSE}
final_lm_res <- last_fit(lm_wflow, ames_split)
```

###
###

The `.workflow` column contains the fitted workflow and can be pulled out of the results using `extract_workflow()`.

@@ -1083,15 +1147,15 @@ Use `extract_workflow()` and add the parameter `final_lm_res`. Hit "Run Code".

```

```{r evaluatin-the-test-s-3-hint, eval = FALSE}
```{r evaluatin-the-test-s-3-hint-1, eval = FALSE}
extract_workflow(...)
```

```{r, include = FALSE}
```{r include = FALSE}
extract_workflow(final_lm_res)
```

###
###

`collect_metrics()` and `collect_predictions()` provide access to the performance metrics and predictions, respectively. The `collect_metrics()` function is a lovely way to extract model performance metrics with resampling. `collect_predictions()` can summarize the various results over replicate out-of-sample predictions.

@@ -1105,17 +1169,17 @@ Run `collect_metrics()` and `collect_predictions()`, on separate lines, with the

<button onclick = "transfer_code(this)">Copy previous code</button>

```{r evaluatin-the-test-s-4-hint, eval = FALSE}
```{r evaluatin-the-test-s-4-hint-1, eval = FALSE}
c_mtrcs <- collect_metrics(...)
c_predic <- collect_predictions(...)
```

```{r, include = FALSE}
```{r include = FALSE}
c_mtrcs <- collect_metrics(final_lm_res)
c_predic <- collect_predictions(final_lm_res)
```

###
###

Statistical metrics are used to describe the distribution of data, compare groups, assess relationships between variables, and draw conclusions from data. The model takes the predictor variables from the test data and generates predictions for the outcome variable. For example, in linear regression, the model estimates the response variable based on the values of the predictor variables.

@@ -1129,19 +1193,19 @@ Finally, let's `slice()` the predictions output, as it is too many unnecessary ro

<button onclick = "transfer_code(this)">Copy previous code</button>

```{r evaluatin-the-test-s-5-hint, eval = FALSE}
```{r evaluatin-the-test-s-5-hint-1, eval = FALSE}
c_predic <-
collect_predictions(final_lm_res) |>
slice(...)
```

```{r, include = FALSE}
```{r include = FALSE}
c_predic <-
collect_predictions(final_lm_res) |>
slice(1:5)
```

###
###

Great Job! You now know how to evaluate a model on the testing set using `last_fit()`, and how to retrieve the resulting metrics and predictions using `collect_metrics()` and `collect_predictions()`.

2 changes: 1 addition & 1 deletion inst/tutorials/09-judging-model-effectiveness/tutorial.Rmd
@@ -1,6 +1,6 @@
---
title: Judging Model Effectiveness
author: Pratham Kancherla and David Kane
author: Pratham Kancherla
tutorial:
id: judging-model-effectiveness
output:
3 changes: 1 addition & 2 deletions inst/tutorials/11-comparing-models/tutorial.Rmd
@@ -1,6 +1,6 @@
---
title: Comparing Models with Resampling
author: Pratham Kancherla and David Kane
author: Pratham Kancherla
tutorial:
id: comparing-models-with-resampling
output:
@@ -1735,7 +1735,6 @@ How does the number of resamples affect these types of formal Bayesian compariso

Great Job! You now have a basic understanding of Bayesian methods and how to analyze them using models and the functions that build those models.

<!-- PK: Skipping a graph because it includes knowledge I am not aware of, therefore cannot explain it well enough. -->

## Summary
###