diff --git a/_articles/.DS_Store b/_articles/.DS_Store deleted file mode 100644 index 85d172400f..0000000000 Binary files a/_articles/.DS_Store and /dev/null differ diff --git a/_articles/RJ-2014-031/RJ-2014-031.Rmd b/_articles/RJ-2014-031/RJ-2014-031.Rmd index ba1d8371ed..188a01ad00 100644 --- a/_articles/RJ-2014-031/RJ-2014-031.Rmd +++ b/_articles/RJ-2014-031/RJ-2014-031.Rmd @@ -18,7 +18,7 @@ abstract: Assessing the assumption of multivariate normality is required by many robust Mahalanobis distances. Moreover, this package offers functions to check the univariate normality of marginal distributions through both tests and plots. Furthermore, especially for non-R users, we provide a user-friendly web application of the package. - This application is available at . + This application is available at (link updated 27 Jan 2026). author: - name: Selcuk Korkmaz affiliation: Department of Biostatistics, Hacettepe University @@ -190,7 +190,7 @@ $\mu$ and variance $\sigma^2$ as given below (updated 2025-08-29: $\mu$ and $\si $$\begin{aligned} \mu = & \, 1 - a^{-p/2}\left(1 + \frac{p\,\beta^{2}}{a} + \frac{p(p+2)\,\beta^{4}}{2a^{2}}\right)\\ - \sigma^2 = & \, 2(1+4\beta^{2})^{-p/2} + 2a^{-p}\!\left(1 + \frac{2p\beta^{4}}{a^{2}} + \frac{3p(p+2)\beta^{8}}{4a^{4}}\right)\\ + \sigma^2 = & \, 2(1+4\beta^{2})^{-p/2} + 2a^{-p}\!\left(1 + \frac{2p\beta^{4}}{a^{2}} + \frac{3p(p+2)\beta^{8}}{4a^{4}}\right)\\ & \, - 4w_{\beta}^{-p/2}\!\left(1 + \frac{3p\beta^{4}}{2w_{\beta}} + \frac{p(p+2)\beta^{8}}{2w_{\beta}^{2}}\right) \end{aligned}$$ @@ -226,7 +226,7 @@ obtained from the normality transformation proposed by $$\label{eq:xandwjs} \begin{array}{l l l l} \text{if } \: 4\leq n \leq 11; & \qquad x = n \quad & \text{and} & \quad w_j = -\text{log}\left[\gamma - \text{log}\left(1- W_j\right)\right] \\ - \text{if } \: 12\leq n \leq 2000; & \qquad x = \text{log} (n) \quad & \text{and} & \quad w_j = \text{log}\left(1- W_j\right) + \text{if } \: 12\leq n \leq 2000; & \qquad x = \text{log} (n) 
\quad & \text{and} & \quad w_j = \text{log}\left(1- W_j\right) \end{array} (\#eq:xandwjs) $$ As seen from equation \@ref(eq:xandwjs), $x$ and $w_j$-s change with the @@ -234,7 +234,7 @@ sample size ($n$). By using equation \@ref(eq:xandwjs), transformed values of each random variable can be obtained from equation \@ref(eq:zjs). -$$\label{eq:zjs} +$$\label{eq:zjs} Z_j = \frac{w_j - \mu}{\sigma} (\#eq:zjs) $$ where $\gamma$, $\mu$ and $\sigma$ are derived from the polynomial @@ -243,10 +243,10 @@ coefficients are provided by @royston1992approx for different sample sizes. $$\begin{aligned} - \label{eq:polygms} + \label{eq:polygms} \gamma = & \: a_{0\gamma} + a_{1\gamma}x + a_{2\gamma}x^2 + \cdots + a_{d\gamma}x^{d} \nonumber \\ \mu = & \: a_{0\mu} + a_{1\mu}x + a_{2\mu}x^2 + \cdots + a_{d\mu}x^{d} \\ - \text{log}(\sigma) = & \: a_{0\sigma} + a_{1\sigma}x + a_{2\sigma}x^2 + \cdots + a_{d\sigma}x^{d} \nonumber + \text{log}(\sigma) = & \: a_{0\sigma} + a_{1\sigma}x + a_{2\sigma}x^2 + \cdots + a_{d\sigma}x^{d} \nonumber \end{aligned} (\#eq:polygms) $$ The Royston's test statistic for multivariate normality is as follows: @@ -259,9 +259,9 @@ is the cumulative distribution function for the standard normal distribution such that, $$\begin{aligned} - \label{eq:edf} + \label{eq:edf} e &= p / [1 + (p - 1)\bar{c}] \nonumber \\ - \psi_j &= \left\{\Phi^{-1}\left[\Phi(-Z_j)/2\right]\right\}^2, \qquad j=1,2,\ldots,p. + \psi_j &= \left\{\Phi^{-1}\left[\Phi(-Z_j)/2\right]\right\}^2, \qquad j=1,2,\ldots,p. 
\end{aligned} (\#eq:edf) $$ As seen from equation \@ref(eq:edf), another extra term $\bar{c}$ has to @@ -314,7 +314,7 @@ Similarly, the *Iris* data can be loaded from the R database by using the following R code: ``` r -# load Iris data +# load Iris data data(iris) ``` @@ -346,23 +346,23 @@ result ``` ``` r - Mardia's Multivariate Normality Test ---------------------------------------- - data : setosa + Mardia's Multivariate Normality Test +--------------------------------------- + data : setosa - g1p : 3.079721 - chi.skew : 25.66434 - p.value.skew : 0.1771859 + g1p : 3.079721 + chi.skew : 25.66434 + p.value.skew : 0.1771859 - g2p : 26.53766 - z.kurtosis : 1.294992 - p.value.kurt : 0.1953229 + g2p : 26.53766 + z.kurtosis : 1.294992 + p.value.kurt : 0.1953229 - chi.small.skew : 27.85973 - p.value.small : 0.1127617 + chi.small.skew : 27.85973 + p.value.small : 0.1127617 - Result : Data are multivariate normal. ---------------------------------------- + Result : Data are multivariate normal. +--------------------------------------- ``` Here: @@ -397,15 +397,15 @@ result ``` ``` r - Henze-Zirkler's Multivariate Normality Test ---------------------------------------------- - data : setosa + Henze-Zirkler's Multivariate Normality Test +--------------------------------------------- + data : setosa - HZ : 0.9488453 - p-value : 0.04995356 + HZ : 0.9488453 + p-value : 0.04995356 - Result : Data are not multivariate normal. ---------------------------------------------- + Result : Data are not multivariate normal. 
+--------------------------------------------- ``` Here, `HZ` is the value of the Henze-Zirkler's test statistic at @@ -429,15 +429,15 @@ result ``` ``` r - Royston's Multivariate Normality Test ---------------------------------------------- - data : setosa + Royston's Multivariate Normality Test +--------------------------------------------- + data : setosa - H : 31.51803 - p-value : 2.187653e-06 + H : 31.51803 + p-value : 2.187653e-06 - Result : Data are not multivariate normal. ---------------------------------------------- + Result : Data are not multivariate normal. +--------------------------------------------- ``` Here, `H` is the value of the Royston's test statistic at significance @@ -519,7 +519,7 @@ following code chunk is used to perform the Shapiro-Wilk's normality test on each variable: ``` r -uniNorm(setosa, type = "SW") +uniNorm(setosa, type = "SW") ``` ``` r @@ -542,7 +542,7 @@ are given in Table [1](#tbl:setosa). ------------------------------------------ Test Test Statistic p-value --------------- ---------------- --------- - Mardia + Mardia Skewness 11.249 0.338 @@ -626,7 +626,7 @@ statistical significance of bivariate normal distribution of the ------------------------------------------ Test Test Statistic p-value --------------- ---------------- --------- - Mardia + Mardia Skewness 0.760 0.944 @@ -692,11 +692,11 @@ outliers. For this example, we will use another subset of the *Iris* data, which is *versicolor* flowers, with the first three variables. ``` r -versicolor <- iris[51:100, 1:3] +versicolor <- iris[51:100, 1:3] # Mahalanobis distance -result <- mvOutlier(versicolor, qqplot = TRUE, method = "quan") +result <- mvOutlier(versicolor, qqplot = TRUE, method = "quan") # Adjusted Mahalanobis distance -result <- mvOutlier(versicolor, qqplot = TRUE, method = "adj.quan") +result <- mvOutlier(versicolor, qqplot = TRUE, method = "adj.quan") ```
@@ -815,4 +815,4 @@ Research Fund of Marmara University \[FEN-C-DRP-120613-0273\]. [^1]: -[^2]: +[^2]: (link updated 27 Jan 2026) diff --git a/_articles/RJ-2022-048/RJ-2022-048_cache/.DS_Store b/_articles/RJ-2022-048/RJ-2022-048_cache/.DS_Store deleted file mode 100644 index bd4d27a619..0000000000 Binary files a/_articles/RJ-2022-048/RJ-2022-048_cache/.DS_Store and /dev/null differ diff --git a/_articles/RJ-2022-048/RJ-2022-048_files/.DS_Store b/_articles/RJ-2022-048/RJ-2022-048_files/.DS_Store deleted file mode 100644 index d9960dce17..0000000000 Binary files a/_articles/RJ-2022-048/RJ-2022-048_files/.DS_Store and /dev/null differ diff --git a/_articles/RJ-2022-048/did2s_cache/.DS_Store b/_articles/RJ-2022-048/did2s_cache/.DS_Store deleted file mode 100644 index 48f9b3587e..0000000000 Binary files a/_articles/RJ-2022-048/did2s_cache/.DS_Store and /dev/null differ diff --git a/_articles/RJ-2022-048/did2s_files/.DS_Store b/_articles/RJ-2022-048/did2s_files/.DS_Store deleted file mode 100644 index c6afdea821..0000000000 Binary files a/_articles/RJ-2022-048/did2s_files/.DS_Store and /dev/null differ diff --git a/_articles/RJ-2022-052/RJ-2022-052_files/.DS_Store b/_articles/RJ-2022-052/RJ-2022-052_files/.DS_Store deleted file mode 100644 index 712db1c9c8..0000000000 Binary files a/_articles/RJ-2022-052/RJ-2022-052_files/.DS_Store and /dev/null differ diff --git a/_articles/RJ-2022-055/RJ-2022-055_cache/.DS_Store b/_articles/RJ-2022-055/RJ-2022-055_cache/.DS_Store deleted file mode 100644 index bc6e23b04d..0000000000 Binary files a/_articles/RJ-2022-055/RJ-2022-055_cache/.DS_Store and /dev/null differ diff --git a/_articles/RJ-2022-055/RJ-2022-055_files/.DS_Store b/_articles/RJ-2022-055/RJ-2022-055_files/.DS_Store deleted file mode 100644 index 25ff09b7c3..0000000000 Binary files a/_articles/RJ-2022-055/RJ-2022-055_files/.DS_Store and /dev/null differ diff --git a/_articles/RJ-2022-055/hopkins_cache/.DS_Store b/_articles/RJ-2022-055/hopkins_cache/.DS_Store deleted file 
mode 100644 index cc29b9488c..0000000000 Binary files a/_articles/RJ-2022-055/hopkins_cache/.DS_Store and /dev/null differ diff --git a/_articles/RJ-2022-055/hopkins_files/.DS_Store b/_articles/RJ-2022-055/hopkins_files/.DS_Store deleted file mode 100644 index 84c50811c7..0000000000 Binary files a/_articles/RJ-2022-055/hopkins_files/.DS_Store and /dev/null differ diff --git a/_articles/RJ-2022-056/supplemental_code/.DS_Store b/_articles/RJ-2022-056/supplemental_code/.DS_Store deleted file mode 100644 index a6e0a3bb01..0000000000 Binary files a/_articles/RJ-2022-056/supplemental_code/.DS_Store and /dev/null differ diff --git a/_articles/RJ-2023-003/examples/.DS_Store b/_articles/RJ-2023-003/examples/.DS_Store deleted file mode 100644 index fa2801b509..0000000000 Binary files a/_articles/RJ-2023-003/examples/.DS_Store and /dev/null differ diff --git a/_articles/RJ-2023-007/img/.DS_Store b/_articles/RJ-2023-007/img/.DS_Store deleted file mode 100644 index b0dc0aacb5..0000000000 Binary files a/_articles/RJ-2023-007/img/.DS_Store and /dev/null differ diff --git a/_articles/RJ-2023-010/salmeron-etal_files/.DS_Store b/_articles/RJ-2023-010/salmeron-etal_files/.DS_Store deleted file mode 100644 index cedd04720e..0000000000 Binary files a/_articles/RJ-2023-010/salmeron-etal_files/.DS_Store and /dev/null differ diff --git a/_articles/RJ-2023-013/RJ-2023-013_cache/.DS_Store b/_articles/RJ-2023-013/RJ-2023-013_cache/.DS_Store deleted file mode 100644 index 9b9ae8451d..0000000000 Binary files a/_articles/RJ-2023-013/RJ-2023-013_cache/.DS_Store and /dev/null differ diff --git a/_articles/RJ-2023-013/RJ-2023-013_files/.DS_Store b/_articles/RJ-2023-013/RJ-2023-013_files/.DS_Store deleted file mode 100644 index e9511dfa31..0000000000 Binary files a/_articles/RJ-2023-013/RJ-2023-013_files/.DS_Store and /dev/null differ diff --git a/_articles/RJ-2023-013/jeppson-hofmann_cache/.DS_Store b/_articles/RJ-2023-013/jeppson-hofmann_cache/.DS_Store deleted file mode 100644 index 
8c045b81c4..0000000000 Binary files a/_articles/RJ-2023-013/jeppson-hofmann_cache/.DS_Store and /dev/null differ diff --git a/_articles/RJ-2023-013/jeppson-hofmann_files/.DS_Store b/_articles/RJ-2023-013/jeppson-hofmann_files/.DS_Store deleted file mode 100644 index 92b20b4e0b..0000000000 Binary files a/_articles/RJ-2023-013/jeppson-hofmann_files/.DS_Store and /dev/null differ diff --git a/_articles/RJ-2023-014/RJ-2023-014_cache/.DS_Store b/_articles/RJ-2023-014/RJ-2023-014_cache/.DS_Store deleted file mode 100644 index af192b928f..0000000000 Binary files a/_articles/RJ-2023-014/RJ-2023-014_cache/.DS_Store and /dev/null differ diff --git a/_articles/RJ-2023-014/RJ-2023-014_cache/html5/.DS_Store b/_articles/RJ-2023-014/RJ-2023-014_cache/html5/.DS_Store deleted file mode 100644 index 5008ddfcf5..0000000000 Binary files a/_articles/RJ-2023-014/RJ-2023-014_cache/html5/.DS_Store and /dev/null differ diff --git a/_articles/RJ-2023-014/RJ-2023-014_cache/latex/.DS_Store b/_articles/RJ-2023-014/RJ-2023-014_cache/latex/.DS_Store deleted file mode 100644 index 5008ddfcf5..0000000000 Binary files a/_articles/RJ-2023-014/RJ-2023-014_cache/latex/.DS_Store and /dev/null differ diff --git a/_articles/RJ-2023-015/RJ-2023-015_cache/.DS_Store b/_articles/RJ-2023-015/RJ-2023-015_cache/.DS_Store deleted file mode 100644 index e0f352a9f6..0000000000 Binary files a/_articles/RJ-2023-015/RJ-2023-015_cache/.DS_Store and /dev/null differ diff --git a/_articles/RJ-2023-015/RJ-2023-015_files/.DS_Store b/_articles/RJ-2023-015/RJ-2023-015_files/.DS_Store deleted file mode 100644 index ade6482370..0000000000 Binary files a/_articles/RJ-2023-015/RJ-2023-015_files/.DS_Store and /dev/null differ diff --git a/_articles/RJ-2023-019/RJ-2023-019_files/.DS_Store b/_articles/RJ-2023-019/RJ-2023-019_files/.DS_Store deleted file mode 100644 index 3264547923..0000000000 Binary files a/_articles/RJ-2023-019/RJ-2023-019_files/.DS_Store and /dev/null differ diff --git 
a/_articles/RJ-2023-019/smith_files/.DS_Store b/_articles/RJ-2023-019/smith_files/.DS_Store deleted file mode 100644 index e0c283429a..0000000000 Binary files a/_articles/RJ-2023-019/smith_files/.DS_Store and /dev/null differ diff --git a/_articles/RJ-2023-020/RJ-2023-020_cache/.DS_Store b/_articles/RJ-2023-020/RJ-2023-020_cache/.DS_Store deleted file mode 100644 index 99c8188387..0000000000 Binary files a/_articles/RJ-2023-020/RJ-2023-020_cache/.DS_Store and /dev/null differ diff --git a/_articles/RJ-2023-020/RJ-2023-020_files/.DS_Store b/_articles/RJ-2023-020/RJ-2023-020_files/.DS_Store deleted file mode 100644 index 9d4df38ef0..0000000000 Binary files a/_articles/RJ-2023-020/RJ-2023-020_files/.DS_Store and /dev/null differ diff --git a/_articles/RJ-2023-020/openVA-RJ-R1_cache/.DS_Store b/_articles/RJ-2023-020/openVA-RJ-R1_cache/.DS_Store deleted file mode 100644 index 80ac1d07fa..0000000000 Binary files a/_articles/RJ-2023-020/openVA-RJ-R1_cache/.DS_Store and /dev/null differ diff --git a/_articles/RJ-2023-020/openVA-RJ-R1_files/.DS_Store b/_articles/RJ-2023-020/openVA-RJ-R1_files/.DS_Store deleted file mode 100644 index 0ad0738fae..0000000000 Binary files a/_articles/RJ-2023-020/openVA-RJ-R1_files/.DS_Store and /dev/null differ diff --git a/_articles/RJ-2025-032/RJ-2025-032.R b/_articles/RJ-2025-032/RJ-2025-032.R new file mode 100644 index 0000000000..ffaa70ac9e --- /dev/null +++ b/_articles/RJ-2025-032/RJ-2025-032.R @@ -0,0 +1,146 @@ +# Generated by `rjournal_pdf_article()` using `knitr::purl()`: do not edit by hand +# Please edit RJ-2025-032.Rmd to modify this file + +## ----setup, include=FALSE----------------------------------------------------- +knitr::opts_chunk$set( + echo = TRUE, # show code by default + warning = FALSE, # suppress warnings + message = FALSE, # suppress messages + fig.align = "center", # center figures + fig.width = 6, # default figure width in inches + fig.height = 4, # default figure height in inches + dpi = 300, # high-res figures 
for PDF + out.width = "100%", + cache = TRUE +) +options(csurvey.multicore = FALSE) +library(Matrix) +library(data.table) +library(coneproj) +#library(foreign) +library(tidyverse) +library(csurvey) +library(MASS) +library(survey) + + +## ----------------------------------------------------------------------------- +library(csurvey) +data(nhdat2, package = "csurvey") +dstrat <- svydesign(ids = ~id, strata = ~str, data = nhdat2, weight = ~wt) + + +## ----------------------------------------------------------------------------- +ans <- csvy(chol ~ incr(age), design = dstrat, n.mix = 100) + + +## ----------------------------------------------------------------------------- +cat("CIC (constrained):", ans$CIC, "\n") + + +## ----------------------------------------------------------------------------- +cat("CIC (unconstrained):", ans$CIC.un, "\n") + + +## ----------------------------------------------------------------------------- +cat(svycontrast(ans, list(avg = c(rep(-1, 13)/13, rep(1, 12)/12))), "\n") + + +## ----nh1big, fig.cap="Estimates of average cholesterol level for 25 ages, with 95% confidence intervals, for a stratified sample in the R dataset `nhdat2`, $n=1933$.", fig.align='center', echo=FALSE---- +knitr::include_graphics("figures/nhanes1.png") + + +## ----------------------------------------------------------------------------- +set.seed(1) +ans <- csvy(chol ~ incr(age)*incr(wcat)*icat, design = dstrat) + + +## ----------------------------------------------------------------------------- +domains <- data.frame(age = c(24, 35), wcat = c(2, 4), icat = c(2, 3)) +pans <- predict(ans, newdata = domains, se.fit = TRUE) +cat("Predicted values, confidence intervals and standard errors for specified domains:\n") +print (pans) + + +## ----nh2, fig.cap="Constrained estimates of population domain means for 400 domains in a 25x4x4 grid. The increasing population domain estimates for the 25 ages are shown within the waist size and income categories. 
The blue bands indicate 95% confidence intervals for the population domain means, with two specific domains, namely, (age, waist, income) = (24, 2, 2) and (35, 4, 3) marked in red. Empty domains are marked with a red 'x' sign.", fig.align='center', echo=FALSE---- +knitr::include_graphics("figures/nhanes_grid3.png") + + +## ----nh2un, fig.cap="Unconstrained estimates of population domain means for 400 domains in a 25x4x4 grid. The population domain estimates for the 25 ages are shown within the waist size and income categories. The green bands indicate 95% confidence intervals for the population domain means. Empty domains are marked with a red 'x' sign.", fig.align='center', echo=FALSE---- +knitr::include_graphics("figures/nhanes_grid3_un.png") + + +## ----surface9, fig.cap="Estimates of average log(salary) by field of study and year of degree, for observations where highest degree is a Bachelor's, for each of the nine regions.", fig.align='center', echo=FALSE, out.width="60%"---- +knitr::include_graphics("figures/new_surfaces9.png") + + +## ----NEreg, fig.cap="Estimates of average log(salary) for the 75 domains in each of three regions. The blue dots represent the constrained domain mean estimates, while the grey dots represent the unconstrained domain mean estimates. The blue band is the 95% confidence interval for the domains, using the constraints; the grey band is the 95% unconstrained domain mean confidence interval.", fig.align='center', echo=FALSE, out.width="100%"---- +knitr::include_graphics("figures/newplot4.png") + + +## ----test, fig.cap="Estimates of average log(salary) by father's education level, for each of five regions and four fields, for subjects whose degree was attained in 2016-2017. 
The solid blue lines connect the estimates where the average salary is constrained to be increasing in father's education, and the solid red lines connect unconstrained estimates of average salary.", fig.align='center', echo=FALSE, out.width="100%"---- +knitr::include_graphics("figures/daded.png") + + +## ----comppv, echo=FALSE, results='asis'--------------------------------------- +library(knitr) +library(kableExtra) + +years <- c("2008-09","2010-11","2012-13","2014-15","2016-17","2018-19") +vals <- matrix( + c(".008", "n/a", + "<.001", ".018", + "<.001", "<.001", + "<.001", "n/a", + ".003", ".417", + "<.001", "n/a"), + nrow = 1, byrow = TRUE +) +df <- as.data.frame(vals, stringsAsFactors = FALSE) +colnames(df) <- rep(c("one", "two"), length(years)) + +kable(df, booktabs = TRUE, + caption = "One-sided and two-sided $p$-values for the test of the null hypothesis that salary is constant in father's education level. The two-sided test results in n/a when the grid has at least one empty domain.", + escape = TRUE) %>% + add_header_above(setNames(rep(2, length(years)), years)) %>% + kable_styling(latex_options = c("hold_position")) + + +## ----------------------------------------------------------------------------- +load("./nscg19_2.rda") +data <- nscg2 |> + dplyr::filter(hd_year %in% c(2008, 2009)) + +rds <- svrepdesign(data = data, repweights = dplyr::select(data, "RW0001":"RW0320"), weights = ~w, + combined.weights = TRUE, mse = TRUE, type = "other", + scale = 1, rscale = 0.05) + +set.seed(1) +ans <- csvy(logSalary ~ incr(daded) * field * region, design = rds, test = TRUE) + + +## ----eval=T------------------------------------------------------------------- +summary(ans) + + +## ----------------------------------------------------------------------------- +data(nhdat, package = "csurvey") +dstrat <- svydesign(ids = ~ id, strata = ~ str, data = nhdat, weight = ~ wt) +set.seed(1) +ans <- csvy(chol ~ incr(age) * incr(wcat) * gender, design = dstrat, + family = 
binomial(link = "logit"), test = TRUE) + + +## ----------------------------------------------------------------------------- +summary(ans) + + +## ----eval=FALSE, echo=FALSE--------------------------------------------------- +# ctl <- list(angle = 0, x1size = 2, x2size = 2, x1lab = "waist", x2_labels = c("male", "female"), +# subtitle.size=6) +# plot(ans, x1 = "wcat", x2 = "gender", type="both", control = ctl) + + +## ----nhanesbin, fig.cap="Estimates of probability of high cholesterol level for each combination of age, waist and gender. The blue dots represent the constrained domain mean estimates, while the green dots represent the unconstrained domain mean estimates. The blue band is the 95% confidence interval for the domains, using the constraints; the green band is the 95% unconstrained domain mean confidence interval.", fig.align='center', echo=FALSE---- +knitr::include_graphics("figures/nhanes_bin.png") + diff --git a/_articles/RJ-2025-032/RJ-2025-032.Rmd b/_articles/RJ-2025-032/RJ-2025-032.Rmd new file mode 100644 index 0000000000..b494e836a6 --- /dev/null +++ b/_articles/RJ-2025-032/RJ-2025-032.Rmd @@ -0,0 +1,394 @@ +--- +title: 'csurvey: Implementing Order Constraints in Survey Data Analysis' +author: +- name: Xiyue Liao + affiliation: San Diego State University + address: + - San Diego State University + - Department of Mathematics and Statistics + - San Diego, California 92182, United States of America + orcid: 0000-0002-4508-9219 + email: xliao@sdsu.edu +- name: Mary C. Meyer + email: meyer@stat.colostate.edu + affiliation: Colorado State University + address: + - Colorado State University + - Department of Statistics + - Fort Collins, Colorado 80523, United States of America +abstract: | + Recent work in survey domain estimation has shown that incorporating a priori assumptions about orderings of population domain means reduces the variance of the estimators. 
The new R package csurvey allows users to implement order constraints using a design specified in the well-known survey package. The order constraints not only give estimates that satisfy a priori assumptions, but they also provide smaller confidence intervals with good coverage, and allow for design-based estimation in small-sample or empty domains. A test for constant versus increasing domain means is available in the package, with generalizations to other one-sided tests. A cone information criterion may be used to provide evidence that the order constraints are valid. Examples with well-known survey data sets show the utility of the methods. This package is now available from the Comprehensive R Archive Network at . +preamble: | + % Any extra LaTeX you need in the preamble +output: + rjtools::rjournal_article: + toc: no + self_contained: yes +bibliography: article.bib +date: '2026-01-05' +date_received: '2023-09-24' +volume: 17 +issue: 4 +slug: RJ-2025-032 +draft: no +journal: + lastpage: 19 + firstpage: 4 + +--- + + +```{r setup, include=FALSE} +knitr::opts_chunk$set( + echo = TRUE, # show code by default + warning = FALSE, # suppress warnings + message = FALSE, # suppress messages + fig.align = "center", # center figures + fig.width = 6, # default figure width in inches + fig.height = 4, # default figure height in inches + dpi = 300, # high-res figures for PDF + out.width = "100%", + cache = TRUE +) +options(csurvey.multicore = FALSE) +library(Matrix) +library(data.table) +library(coneproj) +#library(foreign) +library(tidyverse) +library(csurvey) +library(MASS) +library(survey) +``` + +# Introduction + +We assume that a finite population is partitioned into a number of domains, and the goal is to estimate the population domain means for the study variable and provide valid inference, such as confidence intervals and hypothesis tests. + +Order constraints are a common type of *a priori* knowledge in survey data analysis. 
For example, we might know that salary increases with job rank within job type and location, or average cholesterol in a population increases with age category, or test scores decrease as poverty increases, or that the amount of pollution decreases with distance from the source. There might be a "block ordering" of salaries by location, where the locations are partitioned into blocks where salaries for all locations in one block (such as major metropolitan areas) are assumed to be higher, on average, than salaries in another block (such as rural areas), without imposing orderings within the blocks. + +The order constraints may be imposed on domain mean estimates, or systematic component estimates if the response variable is not Gaussian but belongs to an exponential family of distributions. For example, if we are estimating the proportion of people with diabetes in a population, the study variable might be binary (whether or not the subject has diabetes), and perhaps we want to assume that diabetes prevalence increases with age, within ethnicity and socio-economic categories. The \CRANpkg{csurvey} package extends the \CRANpkg{survey} \citep{survey} package in that it allows the user to impose linear inequality constraints on the domain means. If the inequality constraints are valid, this leads to improved inference, as well as the ability to estimate means for empty or small-sample domains without additional assumptions. The \CRANpkg{csurvey} package provides constrained estimates of means or proportions, estimated variances of the estimates, and confidence intervals where the upper and lower bounds of the intervals also satisfy the order constraints. + +The \CRANpkg{csurvey} package implements one-sided tests. Suppose the study variable is salary, and interest is in whether the subject's salary is affected by the level of education of the subject's father, controlling for the subject's education level and field. 
The null hypothesis is that there is no difference in salary by the level of father's education. The one-sided test with alternative hypothesis that the salary increases with father's education has a higher power than the two-sided test with the larger alternative "salary is not the same across the levels of father's education." + +Finally, the \CRANpkg{csurvey} package includes graphical functions for visualizing the order-constrained estimator and comparing it to the unconstrained estimator. Confidence bands can also be displayed to illustrate estimation uncertainty. + +This package relies on \CRANpkg{coneproj} \citep{coneproj} and \CRANpkg{survey} for its core computations and handling of complex sampling designs. The domain grid is constructed using the \CRANpkg{data.table} \citep{dtable} package. Additional functionality, such as data filtering, variable transformation, and visualization of model results, leverages functions from \CRANpkg{igraph} \citep{igraph}, \CRANpkg{dplyr} \citep{dplyr}, and \CRANpkg{ggplot2} \citep{ggplot2}. When simulating the mixture covariance matrix and the sampling distribution of the one-sided test statistic, selected functions from \CRANpkg{MASS} \citep{mass} and \CRANpkg{Matrix} \citep{matrix} are employed. + +# Research incorporated in csurvey + +The \CRANpkg{csurvey} package provides estimation and inference on population domain variables with order constraints, using recently developed methodology. \citet{wu16} considered a complete ordering on a sequence of domain means. They applied the pooled adjacent violators algorithm of \citet{brunk58} for domain mean estimation, and derived asymptotic confidence intervals that have smaller width without sacrificing coverage, compared to the estimators that do not consider the ordering. \citet{oliva20} developed methodology for partial orderings and more general constraints on domains.
These include block orderings and orderings on domains arranged in grids by multiple variables of interest. \citet{xu20} refined these methods by proposing variance estimators based on a mixture of covariance matrices, and showed that the mixture covariance estimator improves coverage of confidence intervals while retaining smaller interval lengths. \citet{liao23} showed how to use the order constraints to provide conservative design-based inference in domains with small sample sizes, or even in empty domains, without additional model assumptions. \citet{xu23} developed a test for constant versus increasing domain means, and extended it to one-sided tests for more general orderings. \citet{oliva19} proposed a cone information criterion (CIC) for survey data as a diagnostic method to measure possible departures from the assumed ordering. This criterion is similar to Akaike’s information criterion (AIC) \citep{aic73} and the Bayesian information criterion (BIC) \citep{bic78}, and can be used for model selection. For example, we can use CIC to choose between an unconstrained estimator and an order constrained estimator. The one with a smaller CIC value will be chosen. + +All of the above papers include extensive simulations showing that the methods substantially improve inference when the order restrictions are valid. These methods are easy for users to implement in \CRANpkg{csurvey}, with commands to impose common orderings. + +# How to use csurvey + +Statisticians and practitioners working with survey data are familiar with the R package \CRANpkg{survey} (see \citet{survey}) for analysis of complex survey data. Commands in the package allow users to specify the survey design with which their data were collected, then obtain estimation and inference for population variables of interest. The new \CRANpkg{csurvey} package extends the utility of \CRANpkg{survey} by allowing users to implement order constraints on domains. 
+ +The \CRANpkg{csurvey} package relies on the functions in the \CRANpkg{survey} package such as `svydesign` and `svrepdesign`, which allow users to specify the survey design. These functions produce a design object that contains information about the sampling design, allowing the user to specify strata, sampling weights, etc. This object is used in statistical functions in the \CRANpkg{csurvey} package in the same manner as for the \CRANpkg{survey} package. In addition, the mixture covariance matrix in \cite{xu20} is constructed from an initial estimate of covariance obtained from \CRANpkg{survey}. + +Consider a finite population with labels \(U = \{1,\ldots,N\}\), partitioned into domains \(U_d\), \(d=1,\ldots,D\), where \(U_d\) has \(N_d\) elements. For a study variable \(y\), suppose interest is in estimating the population domain means +\[ +\bar{y}_{U_d} = \frac{\sum_{k\in U_d} y_k}{N_d} +\] +for each \(d\), and providing inference such as confidence intervals for each \(\bar{y}_{U_d}\). +Given a survey design, a sample \(s\subset U\) is chosen, and the unconstrained estimator of the population domain means is a weighted average of the sample observations in each domain. This estimator \(\tilde{\mathbf{y}}_s=(\tilde{y}_{s_1},\ldots, \tilde{y}_{s_D})\) in \cite{hajek71} is provided by the \CRANpkg{survey} package. + +The desired orderings are imposed as linear inequality constraints on the domain means, in the form of an \(m\times D\) constraint matrix \(\mathbf{A}\). The \CRANpkg{csurvey} package will find the constrained estimator \(\tilde{\boldsymbol{\theta}}\) by solving +$$ +\min_{\theta}(\tilde{\mathbf{y}}_s - \boldsymbol{\theta})^{\top}\mathbf{W}_s(\tilde{\mathbf{y}}_s-\boldsymbol{\theta}) \quad \mbox{such that} \hspace{2mm} \mathbf{A}\boldsymbol{\theta}\geq \mathbf{0} +$$ +where the weights $\mathbf{W}_s$ are provided by the survey design. (See \cite{oliva20} for details.)
For a simple example of a constraint matrix, consider five domains with a simple ordering, where we assume \(\bar{y}_{U_1}\leq \bar{y}_{U_2}\leq \bar{y}_{U_3}\leq \bar{y}_{U_4}\leq \bar{y}_{U_5}\). Perhaps these are average salaries over five job levels. Then the constraint matrix is +\[ \mathbf{A} = \left(\begin{array}{ccccc} -1 & 1 & 0 & 0 & 0 \\0 & -1 & 1 & 0 & 0 \\0 & 0 & -1 & 1 & 0 \\0 &0 &0 & -1 & 1 \end{array}\right). +\] +For simple orderings on \(D\) domains, the constraint matrix is \((D-1) \times D\). +For a block ordering example, suppose we again have five domains, and we know that each of the population means in the first two domains must be smaller than each of the means of the last three domains. The constraint matrix is +\[ \mathbf{A} = \left(\begin{array}{ccccc} -1 & 0 & 1 & 0 & 0 \\ -1 & 0 & 0 & 1& 0 \\ -1 & 0 & 0 & 0 & 1\\ +0 & -1 & 1 & 0 & 0 \\0 & -1 & 0 & 1& 0 \\ 0 & -1 & 0 & 0 & 1\\ +\end{array}\right). +\] +The number of rows for a block ordering with two blocks is \(D_1 \times D_2\), where \(D_1\) is the number of domains in the first block and \(D_2\) is the number in the second block. The package also allows users to specify grids of domains with various order constraints along the dimensions of the grid. The constraint matrices are automatically generated for some standard types of constraints. + +The \CRANpkg{csurvey} package allows users to specify a simple ordering with the symbolic \code{incr} function. For example, suppose \code{y} is the survey variable of interest (say, cholesterol level), and the variable \code{x} takes values \(1\) through \(D\) (age groups for example). Suppose we assume that average cholesterol level is increasing with age in this population, and that the design is specified by the object \code{ds}. Then + +```r +ans <- csvy(y ~ incr(x), design = ds) +``` + +creates an object containing the estimated population means and confidence intervals. 
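Constraint matrices like the block-ordering example above can also be generated programmatically. The helper below is a hypothetical base-R sketch (\CRANpkg{csurvey} constructs its constraint matrices automatically), producing one row per pair of domains across the two blocks:

```r
# Hypothetical helper: build the constraint matrix A for a two-block
# ordering, with one row per (i, j) pair, i in block1 and j in block2,
# encoding theta_i <= theta_j as -theta_i + theta_j >= 0.
block_constraints <- function(block1, block2, D) {
  pairs <- expand.grid(i = block1, j = block2)
  A <- matrix(0, nrow = nrow(pairs), ncol = D)
  for (r in seq_len(nrow(pairs))) {
    A[r, pairs$i[r]] <- -1
    A[r, pairs$j[r]] <-  1
  }
  A
}

A <- block_constraints(block1 = 1:2, block2 = 3:5, D = 5)
dim(A)   # D1 x D2 = 2 x 3 = 6 rows, 5 columns
```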
The \code{decr} function is used similarly, to specify decreasing means. + +Next, suppose \code{x1} takes values \(1\) through \(D_1\) and \code{x2} takes values \(1\) through \(D_2\) (say, age group and ethnicity), and we wish to estimate the population means over the \(D_1\times D_2\) grid of values of \code{x1} and \code{x2}. If we assume that the population domain means are ordered in \code{x1} but there is no ordering in \code{x2}, then the command + +```r +ans <- csvy(y ~ incr(x1) * x2, design = ds) +``` + +will provide domain mean estimates where the means are non-decreasing in age, within each ethnicity. Note that `csvy()` does not allow "+" in the model formula, because estimates are produced for all \(D_1\times D_2\) domain combinations of \code{x1} and \code{x2}. + +For an example of a block ordering with three blocks, the command + +```r +ans <- csvy(y ~ block.Ord(x, order = c(1,1,1,2,2,2,3,3,3)), design = ds) +``` + +specifies that the variable \code{x} takes values \(1\) through \(9\), and the domains with values 1, 2, and 3 each have population means not greater than each of the population means in the domains with \code{x} values 4, 5, and 6. The domains with \code{x} values 4, 5, and 6 each have population means not greater than each of the population means in the domains with \code{x} values 7, 8, and 9. More examples of the implementation of constraints are given below. + +Implementing order constraints leads to "pooling" of domains where the order constraints are binding. This naturally leads to smaller confidence intervals, as the averaging is over a larger number of observations. The mixture covariance estimator for \(\tilde{\boldsymbol{\theta}}\) that was derived in \cite{xu20} is provided by \CRANpkg{csurvey}. This covariance estimator is constructed by recognizing that for different samples, different constraints are binding, so that different sets of domains are "pooled" to obtain the constrained estimator.
The covariance estimator then is a mixture of pooled covariance matrix estimators, with the mixture distribution approximated via simulations. Using this mixture, rather than the covariance matrix for the observed pooling, provides confidence intervals with coverage closer to the target, while retaining the shorter interval lengths. The method introduced in \cite{liao23} further pools information across domains to provide upper and lower confidence interval bounds that also satisfy the constraints, effectively reducing the confidence interval length for domains with small sample sizes, and allowing for estimation and inference in empty domains. + +The test of \(H_0:\mathbf{A}\bar{\mathbf{y}}_U=\mathbf{0}\) versus the one-sided \(H_1:\mathbf{A}\bar{\mathbf{y}}_U\geq\mathbf{0}\) in \CRANpkg{csurvey} has improved power over the \(F\)-test implemented by applying the \code{anova} command to an object from \code{svyglm} in the \CRANpkg{survey} package. That \(F\)-test uses the two-sided alternative \(H_2:\mathbf{A}\bar{\mathbf{y}}_U\neq\mathbf{0}\). For example, suppose we have measures of amounts of pollutants in samples of water from small lakes in a certain region. We also have measurements of distances from sources, such as factories or waste dumps, that are suspected of contributing to the pollution. The test of the null hypothesis that the amount of pollution does not depend on the distance will have greater power if the alternative is one-sided, that is, that the pollution amount is, on average, larger for smaller distances. + +In the following sections, we show only the main code used to generate the results. Supplementary or non-essential code is available in the accompanying R script submitted with this manuscript. + +# NHANES Example with monotonic domain means {#sec-monotonic} + +The National Health and Nutrition Examination Survey (NHANES) combines in-person interviews and physical examinations to produce a comprehensive data set from a probability sample of residents of the U.S.
The data are made available to the public at this [CDC NHANES website](https://wwwn.cdc.gov/nchs/nhanes/). The subset used in this example is derived from the 2009–2010 NHANES cycle and is provided in the \CRANpkg{csurvey} package as the \code{nhdat2} data set. We consider the task of estimating the average total cholesterol level (mg/dL), originally recorded as `LBXTC` in the raw NHANES data, by age (in years), corresponding to the original variable `RIDAGEYR`. Focusing on young adults aged 21 to 45, we analyze a sample of $n = 1,933$ participants and specify the survey design using the associated weights and strata available in the data. + +```{r} +library(csurvey) +data(nhdat2, package = "csurvey") +dstrat <- svydesign(ids = ~id, strata = ~str, data = nhdat2, weight = ~wt) +``` + +Then, to get the proposed constrained domain mean estimate, we use the \code{csvy} function in the \CRANpkg{csurvey} package. In this function, \code{incr} is a symbolic function used to specify that the population domain means of `chol` are increasing with respect to the predictor `age`. + +```{r} +ans <- csvy(chol ~ incr(age), design = dstrat, n.mix = 100) +``` + +The `n.mix` parameter controls the number of simulations used to estimate the mixture covariance matrix, with a default of `n.mix = 100`. To speed up computation, users can set `n.mix` to a smaller number, e.g., 10. + +We can extract from \code{ans} the CIC value for the constrained estimator as + +```{r} +cat("CIC (constrained):", ans$CIC, "\n") +``` + +and the CIC value for the unconstrained estimator as + +```{r} +cat("CIC (unconstrained):", ans$CIC.un, "\n") +``` + +We see that for this example, the constrained estimator has a smaller CIC value, which indicates a better fit. + +If we want to construct a contrast of domain means and get its standard error, we can use the \code{svycontrast} function, which is inherited from the \CRANpkg{survey} package.
Note that in the \CRANpkg{survey} package, it is impossible to get a contrast estimate when there is any empty domain. The \CRANpkg{csurvey} package inherits this limitation. For example, to compare the average cholesterol for the first thirteen age groups with that for the last twelve age groups, we can write: + +```{r} +cat(svycontrast(ans, list(avg = c(rep(-1, 13)/13, rep(1, 12)/12))), "\n") +``` + +The \code{csvy} function produces both the constrained fit and the corresponding unconstrained fit using methods from the \CRANpkg{survey} package. A visual comparison between the two fits can be easily obtained by applying the \code{plot} method to the resulting \code{csvy} object by specifying the argument `type = "both"`. For illustration, Figure \@ref(fig:nh1big) displays the estimated domain means along with 95% confidence intervals, generated by the following code: + +```r +plot(ans, type = "both") +``` + +```{r nh1big, fig.cap="Estimates of average cholesterol level for 25 ages, with 95% confidence intervals, for a stratified sample in the R dataset `nhdat2`, $n=1933$.", fig.align='center', echo=FALSE} +knitr::include_graphics("figures/nhanes1.png") +``` + +The \code{confint} function can be used to extract the confidence interval for each domain mean. When the response is not Gaussian, then \code{type="link"} produces the confidence interval for the average systematic component over domains, while \code{type="response"} produces the confidence interval for the domain mean. + +For this data set, the sample sizes for each of the twenty-five ages range from 54 to 99, so that none of the domains is "small." To demonstrate \code{csvy} in the case of small domains, we next provide domain mean estimates and confidence intervals for $400$ domains, arranged in a grid with 25 ages, four waist-size categories, and four income categories.
We divide the waist measurement by height to make a relative girth, and split the observations into four groups with variable name `wcat`, which is a 4-level ordinal categorical variable representing waist-to-height ratio categories, computed from `BMXWAIST` (waist circumference in cm) and `BMXHT` (height in cm) in the body measures file `BMX_F.XPT` from NHANES. We have income information for the subjects, in terms of a multiple of the federal poverty level. Our first income category includes those with income at or below 75% of the poverty level. The second category is .75 through 1.38 times the poverty level (1.38 determines Medicaid eligibility), the third category goes from above 1.38 to 3.5, and finally above 3.5 times the poverty level is the fourth category. These indicators are contained in the variable `icat` (categorized income). We create `icat` from the `INDFMPIR` variable in the file `DEMO_F.XPT` from NHANES. + +The domain sample sizes average only 4.8 observations, and there are 16 empty domains. The sample size for each domain can be checked by `ans$nd`. To estimate the population means we assume average cholesterol level is increasing in both age and waist size, but no ordering is imposed on income categories: + +```{r} +set.seed(1) +ans <- csvy(chol ~ incr(age)*incr(wcat)*icat, design = dstrat) +``` + +To extract estimates and confidence intervals for specific domains defined by the model, the user can use the \code{predict} function as follows: + +```{r} +domains <- data.frame(age = c(24, 35), wcat = c(2, 4), icat = c(2, 3)) +pans <- predict(ans, newdata = domains, se.fit = TRUE) +cat("Predicted values, confidence intervals and standard errors for specified domains:\n") +print(pans) +``` + +Figure \@ref(fig:nh2) displays the domain mean estimates along with 95% confidence intervals, highlighting in red the two domains specified in the \code{predict} function. The code to create this plot is shown below.
The `control` argument is used to let the user adjust the aesthetics of the plot. + +```{r nh2, fig.cap="Constrained estimates of population domain means for 400 domains in a 25x4x4 grid. The increasing population domain estimates for the 25 ages are shown within the waist size and income categories. The blue bands indicate 95% confidence intervals for the population domain means, with two specific domains, namely, (age, waist, income) = (24, 2, 2) and (35, 4, 3) marked in red. Empty domains are marked with a red 'x' sign.", fig.align='center', echo=FALSE} +knitr::include_graphics("figures/nhanes_grid3.png") +``` + +```r +ctl <- list(x1lab = "waist", x2lab = "income", subtitle.size = 8) +plot(ans, x1 = "wcat", x2 = "icat", control = ctl, domains = domains) +``` + +The user can visualize the unconstrained fit by specifying \code{type = "unconstrained"} in the \code{plot} function. The corresponding output is presented in Figure \@ref(fig:nh2un). + +```r +plot(ans, x1 = "wcat", x2 = "icat", control = ctl, type="unconstrained") +``` + +```{r nh2un, fig.cap="Unconstrained estimates of population domain means for 400 domains in a 25x4x4 grid. The population domain estimates for the 25 ages are shown within the waist size and income categories. The green bands indicate 95% confidence intervals for the population domain means. Empty domains are marked with a red 'x' sign.", fig.align='center', echo=FALSE} +knitr::include_graphics("figures/nhanes_grid3_un.png") +``` + +Without the order constraints, the sample sizes are too small to provide valid estimates and confidence intervals, unless further model assumptions are used, as in some small area estimation methods. With order constraints, design-based estimation and inference are possible for substantially smaller domain sample sizes, compared to the unconstrained design-based estimation. 
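Small and empty domains like those above can be spotted before fitting by tabulating sample sizes over the grid. The sketch below uses simulated labels matching the 25 x 4 x 4 layout (with the real data, the fitted object's `ans$nd` reports the same information):

```r
# Sketch: count observations per domain in a 25 x 4 x 4 grid.
# The age/waist/income labels here are simulated for illustration only.
set.seed(1)
n <- 1933
dat <- data.frame(age  = sample(21:45, n, replace = TRUE),
                  wcat = sample(1:4,  n, replace = TRUE),
                  icat = sample(1:4,  n, replace = TRUE))
nd <- table(dat$age, dat$wcat, dat$icat)   # 400 domain sample sizes
c(mean = mean(nd), empty = sum(nd == 0))   # average size and empty-domain count
```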
+ +# Constrained domain means with a block ordering + +We consider the 2019 National Survey of College Graduates (NSCG), conducted by the U.S. Census Bureau and sponsored by the National Center for Science and Engineering Statistics (NCSES) within the National Science Foundation. The NSCG provides data on the characteristics of the nation's college graduates, with a focus on those in the science and engineering workforce. The datasets and documentation are available to the public on the [National Survey of College Graduates (NSCG) website](https://www.nsf.gov/statistics/srvygrads). Replicate weights are available separately from NCSES upon request. Because the size of subsets of the NSCG survey used in this paper exceeds the size limit allowed for an R package stored on CRAN, the subsets are not included in the \CRANpkg{csurvey} package. Instead, we provide the link to access the subsets `nscg19.rda` and `nscg19_2.rda` used in this section at this [website](https://github.com/xliaosdsu/csurvey-data). + +The study variable of interest is annual salary (denoted as `SALARY` in the dataset), which exhibits substantial right-skewness in its raw form. To reduce the influence of outliers and improve model stability, we restricted the analysis to observations with annual salaries between \$30,000 and \$400,000. A logarithmic transformation was applied to the salary variable to address skewness. Four predictors of salary were considered: + +- Field of study (`field`, denoted by `NDGMEMG` in the raw dataset): This nominal variable defines the field of study for the highest degree. There are five levels: (1) Computer and mathematical sciences; (2) Biological, agricultural and environmental life sciences; (3) Physical and related sciences; (4) Social and related sciences; (5) Engineering. *Block ordering constraint*: given the other predictors, the average annual salary for each of the fields (2) and (4) is less than for the STEM fields (1), (3) and (5).
+ +- Grouped year of award of highest degree (`hd_year_grouped`, denoted by `HDAY5`): This ordinal variable has five levels: (1) 1995 to 1999; (2) 2000 to 2004; (3) 2005 to 2009; (4) 2010 to 2014; (5) 2015 or later. *Isotonic constraint*: given the other predictors, the average annual salary decreases with the year of award of highest degree; i.e., the more experience respondents have, on average, the higher the annual salary. + +- Highest degree type (`hd_type`, denoted by `DGRDG`): The three levels are: (1) Bachelor's; (2) Master's; (3) Doctorate and Professional. *Isotonic constraint*: given the other predictors, the average annual salary increases with respect to the highest degree type. + +- Region code for employer (`region`, denoted by `EMRG`): This nominal variable defines the regions in which the respondents worked within the U.S. The nine levels are: (1) New England; (2) Middle Atlantic; (3) East North Central; (4) West North Central; (5) South Atlantic; (6) East South Central; (7) West South Central; (8) Mountain; (9) Pacific and US Territories. There is no constraint for this predictor. + +This data set contains $n=30,368$ observations in a four-dimensional grid of 675 domains, where the sample size in the domains ranges from one to 491. Here, we specify the shape and order constraints in a similar fashion as in the previous examples. The symbolic routine \code{block.Ord} is used to impose a block ordering on `field`, and the order is specified in the \code{order} argument. The \code{svydesign} function specifies a survey design with no clusters. The command \code{svrepdesign} creates a survey design with replicate weights, where the columns named "RW0001", "RW0002", \ldots, "RW0320" are the 320 NSCG replicate weights and \code{weights = \textasciitilde w} denotes the sampling weight. The variance is computed as the sum of squared deviations of the replicates from the mean.
The general formula for computing a variance estimate using replicate weights is as follows: +\[v_{REP}(\hat{\theta})=\sum_{r=1}^{R}c_r(\hat{\theta}_r-\hat{\theta})^2\] +where the estimate \(\hat{\theta}\) is computed based on the final full-sample survey weights, and each replicate estimate \(\hat{\theta}_r\) is computed using the \(r\)th set of replicate weights (\(r=1,\ldots, R\)). The replication adjustment factor \(c_r\) is a multiplier for the \(r\)th squared difference. The argument \code{scale = 1} is an overall multiplier, and \code{rscale = 0.05} denotes a vector of replicate-specific multipliers, which are the values of \(c_r\) in the formula above. + +```r +load("./nscg19.rda") +rds <- svrepdesign(data = nscg, repweights = dplyr::select(nscg, "RW0001":"RW0320"), + weights = ~w, combined.weights = TRUE, mse = TRUE, type = "other", scale = 1, rscale = 0.05) +``` + +Estimates of domain means for the 225 domains for which the highest degree is a Bachelor's are shown in Figure \@ref(fig:surface9), as surfaces connecting the estimates over the grid of field indicators and year of award of highest degree. The block-ordering constraints can be seen in the surfaces, where the fields labeled 2 and 4 have lower means than those labeled 1, 3, and 5. The surfaces are also constrained to be decreasing in year of award of highest degree. + +To improve computational efficiency, the simulations used to estimate the mixture covariance matrix and the sampling distribution of the test statistic can be parallelized. This can be enabled by setting the following R option: + +```r +options(csurvey.multicore = TRUE) +``` + +The model is fitted using the following code: + +```r +ans <- csvy(logSalary~decr(hd_year_grouped)*incr(hd_type)*block.Ord(field, + order = c(2, 1, 2, 1, 2))*region, design = rds) +``` + +The \code{plotpersp} function is used to create Figure \@ref(fig:surface9).
It will create a three-dimensional estimated surface plot when there are at least two predictors. In this example, we use \code{plotpersp} to generate a 3D perspective plot of the estimated average log-transformed salary with respect to field and year of award of highest degree. Among the three `hd_type` categories, the Bachelor's degree group has the highest number of observations. The \code{plotpersp} function visualizes the 225 domains corresponding to the most frequently observed level of this fourth predictor. Plot aesthetics are customized via the `control` argument: + +```r +ctl <- list(categ = "region", categnm = c("New England", "Middle Atlantic", "East North Central", + "West North Central", "South Atlantic", "East South Central", "West South Central", "Mountain", + "Pacific and US Territories"), NCOL = 3, th = 60, xlab = "years since degree", + ylab = "field of study", zlab = "log(salary)") + +plotpersp(ans, x1="hd_year_grouped", x2="field", control = ctl) +``` + +```{r surface9, fig.cap="Estimates of average log(salary) by field of study and year of degree, for observations where highest degree is a Bachelor's, for each of the nine regions.", fig.align='center', echo=FALSE, out.width="60%"} +knitr::include_graphics("figures/new_surfaces9.png") +``` + + + + + +Estimates and confidence intervals for the 75 domain means associated with three of the regions are shown in Figure \@ref(fig:NEreg). The constrained estimates are shown as blue dots, and the unconstrained means are depicted as grey dots. The confidence intervals are indicated with blue bands for the constrained estimates and grey bands for the unconstrained estimates. + +For the Northeast Region, the average length of the constrained confidence intervals is .353, while the average length for the unconstrained confidence intervals is .477. 
In the domain for those in the math and computer science field, with PhDs obtained in 2000-2004, the unconstrained log-salary estimate is well below the corresponding constrained estimate, because the latter is forced to be at least as large as the estimates for lower degrees and for newer PhDs. If the constraints hold in the population, then the unconstrained confidence interval is unlikely to capture the population value. As seen in the NHANES example in Section `r if(knitr::is_html_output()) "[4](#sec-monotonic)" else "\\@ref(sec-monotonic)"`, the unconstrained estimators are unreliable when the sample domain size is small. The Pacific region has the largest sample sizes, ranging from 12 to 435 observations. With this larger sample size, most of the unconstrained estimates already satisfy the constraints, but the average length for the constrained estimator is .301, while the average length for the unconstrained estimator is .338, showing that the mixture covariance matrix leads to more precision in the confidence intervals. Also shown is the region with the smallest sample sizes: the East South Central region. Here the average length for the unconstrained confidence intervals is .488 while the average length for the constrained confidence intervals is .444. + +```{r NEreg, fig.cap="Estimates of average log(salary) for the 75 domains in each of three regions. The blue dots represent the constrained domain mean estimates, while the grey dots represent the unconstrained domain mean estimates. The blue band is the 95% confidence interval for the domains, using the constraints; the grey band is the 95% unconstrained domain mean confidence interval.", fig.align='center', echo=FALSE, out.width="100%"} +knitr::include_graphics("figures/newplot4.png") +``` + +# One-sided testing + +For this example we again use the NSCG data set, but we use some new variables and make some variable groupings.
First, instead of using the grouped year of award of highest degree (`hd_year_grouped`), we use the actual calendar year when the highest degree was awarded (`hd_year`, denoted as `HDACYR` in the raw data set). We conduct the one-sided test for six pairs of consecutive calendar years. In addition, we choose father's education level (`daded`, denoted as `EDDAD`) as the main predictor, and the interest is in determining whether salaries are higher for people whose father has a higher education level. In the raw data set, `EDDAD` has seven categories; we group them into five categories for this example: 1 = no high school degree, 2 = high school degree but no college degree, 3 = bachelor's degree, 4 = master's degree, and 5 = PhD or professional degree. We further group `region` (`EMRG`) as: 1 = Northeast, 2 = North Central, 3 = Southeast, 4 = West, 5 = Pacific and Territories, and group `field` (`NDGMEMG`) as: 1 = Computer and Mathematical Sciences, Physical and Related Sciences, and Engineering, which can be considered core STEM fields; 2 = Biological, Agricultural, and Environmental Life Sciences and Social and Related Sciences, which can be considered life and social sciences; 3 = Science and Engineering-Related Fields; 4 = Non-Science and Engineering Fields, which is non-STEM. Finally, people's highest degree type (denoted as `DGRDG` in the raw data) is used to limit the study sample to people whose highest degree is a Master's degree. The sample size is $n=25,177$. + +It seems reasonable that people whose parents are more educated are more educated themselves, but if we control for the person's level of education, as well as field, region, and year of degree, will the salary still be increasing with level of father's education? For this, the null hypothesis is that within the region, field, and year of degree, the salary is constant over levels of father's education. The one-sided alternative is that salaries increase with father's education level.
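The replicate-weight variance formula from the previous section, \(v_{REP}(\hat{\theta})=\sum_{r} c_r(\hat{\theta}_r-\hat{\theta})^2\), is easy to sketch numerically. The replicate estimates below are made up for illustration; in practice each comes from re-estimating with one of the 320 sets of replicate weights, with \(c_r = 0.05\) as set by `rscale`:

```r
# Numerical sketch of v_REP with hypothetical estimates.
thetahat <- 11.02                                  # full-sample estimate
theta_r  <- c(11.00, 11.05, 10.98, 11.03, 11.01)   # hypothetical replicate estimates
c_r      <- 0.05                                   # replication factor (rscale)
v_rep <- sum(c_r * (theta_r - thetahat)^2)
sqrt(v_rep)   # replicate-weight standard error
```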
+ +The constrained and unconstrained fits to the data under the alternative hypotheses are shown in Figure \@ref(fig:test) for five regions and four fields, for degree years 2016-2017. The dots connected by solid blue lines represent the fit with the one-sided alternative, i.e., constrained to be increasing in father's education. The dots connected by solid red lines represent the fit with the two-sided alternative, i.e., with unconstrained domain means. + +```{r test, fig.cap="Estimates of average log(salary) by father's education level, for each of five regions and four fields, for subjects whose degree was attained in 2016-2017. The solid blue lines connect the estimates where the average salary is constrained to be increasing in father's education, and the solid red lines connect unconstrained estimates of average salary.", fig.align='center', echo=FALSE, out.width="100%"} +knitr::include_graphics("figures/daded.png") +``` + +The \(p\)-values for the tests of the null hypothesis are given in Table \@ref(tab:comppv), where n/a is shown for the two-sided \(p\)-value in the case of an empty domain. The test is conducted for six pairs of consecutive calendar years, with sample sizes of 2,129, 2,795, 3,069, 2,895, 2,423, and 1,368, respectively. The one-sided test consistently has a small \(p\)-value, indicating that father's education level is positively associated with salary, even after controlling for region, year of degree, and field.
+ +```{r comppv, echo=FALSE, results='asis'} +library(knitr) +library(kableExtra) + +years <- c("2008-09","2010-11","2012-13","2014-15","2016-17","2018-19") +vals <- matrix( + c(".008", "n/a", + "<.001", ".018", + "<.001", "<.001", + "<.001", "n/a", + ".003", ".417", + "<.001", "n/a"), + nrow = 1, byrow = TRUE +) +df <- as.data.frame(vals, stringsAsFactors = FALSE) +colnames(df) <- rep(c("one", "two"), length(years)) + +kable(df, booktabs = TRUE, + caption = "One-sided and two-sided $p$-values for the test of the null hypothesis that salary is constant in father's education level. The two-sided test results in n/a when the grid has at least one empty domain.", + escape = TRUE) %>% + add_header_above(setNames(rep(2, length(years)), years)) %>% + kable_styling(latex_options = c("hold_position")) +``` + +The \(p\)-value of the one-sided test is included in the object \code{ans}. For example, to check the \(p\)-value for the fit for the year 2008 to 2009, we can fit the model and print out its summary table as: + +```{r} +load("./nscg19_2.rda") +data <- nscg2 |> + dplyr::filter(hd_year %in% c(2008, 2009)) + +rds <- svrepdesign(data = data, repweights = dplyr::select(data, "RW0001":"RW0320"), weights = ~w, + combined.weights = TRUE, mse = TRUE, type = "other", + scale = 1, rscale = 0.05) + +set.seed(1) +ans <- csvy(logSalary ~ incr(daded) * field * region, design = rds, test = TRUE) +``` + +```{r eval=T} +summary(ans) +``` + +# Binary outcome + +Finally, we use another subset of the NHANES 2009–2010 data to demonstrate how our method applies when the outcome is binary. This subset is included in the \CRANpkg{csurvey} package as `nhdat`. The construction of variables, sampling weights, and strata in this subset closely follows the approach described in Section `r if(knitr::is_html_output()) "[4](#sec-monotonic)" else "\\@ref(sec-monotonic)"`. 
It contains $n = 1,680$ observations with complete records on total cholesterol, age, height, and waist circumference for adults aged 21–40. The binary outcome indicates whether an individual has high total cholesterol, coded as 1 if total cholesterol exceeds 200 mg/dL, and 0 otherwise. We estimate the population proportion with high cholesterol by age, waist, and gender (1 = male, 2 = female). The waist variable, denoted as `wcat`, is a 4-level ordinal categorical variable representing waist-to-height ratio categories. + +It is reasonable to assume that, on average, the proportion of individuals with high cholesterol increases with both age and waist. The model is specified using the following code: + +```{r} +data(nhdat, package = "csurvey") +dstrat <- svydesign(ids = ~ id, strata = ~ str, data = nhdat, weight = ~ wt) +set.seed(1) +ans <- csvy(chol ~ incr(age) * incr(wcat) * gender, design = dstrat, + family = binomial(link = "logit"), test = TRUE) +``` + +The CIC of the constrained estimator is smaller than that of the unconstrained estimator, and the one-sided hypothesis test has a $p$-value close to zero. + +```{r} +summary(ans) +``` + +The combination of age, waist, and gender gives 160 domains. This implies that the average sample size for each domain is only around 10. Due to the small sample sizes, the unconstrained estimator shows implausible jumps as age increases within each waist category. On the other hand, the constrained estimator is more stable and tends to have smaller confidence intervals than the unconstrained Hájek estimator. + +```{r eval=FALSE, echo=FALSE} +ctl <- list(angle = 0, x1size = 2, x2size = 2, x1lab = "waist", x2_labels = c("male", "female"), + subtitle.size=6) +plot(ans, x1 = "wcat", x2 = "gender", type="both", control = ctl) +``` + +```{r nhanesbin, fig.cap="Estimates of probability of high cholesterol level for each combination of age, waist and gender.
The blue dots represent the constrained domain mean estimates, while the green dots represent the unconstrained domain mean estimates. The blue band is the 95% confidence interval for the domains, using the constraints; the green band is the 95% unconstrained domain mean confidence interval.", fig.align='center', echo=FALSE} +knitr::include_graphics("figures/nhanes_bin.png") +``` + +# Discussion + +While model-based small area estimators, such as those implemented in the \CRANpkg{sae} package \citep{sae2015} and the \CRANpkg{emdi} package \citep{emdi2019}, are powerful tools for borrowing strength across domains, they rely on parametric assumptions that may be violated in practice. Design-based methods remain essential for official statistical agencies, as they provide transparent and model-free inference that is directly tied to the survey design. Estimation and inference for population domain means with survey data can be substantially improved if constraints based on natural orderings are implemented. The \CRANpkg{csurvey} package (version 1.15) \citep{csurvey2025} allows users to specify orderings on grids of domains, and to obtain estimates of and confidence intervals for population domain means. The package also implements a design-based small area estimation method, which allows inference for population domain means when the sample domain is empty, and which further improves estimates for domains with small sample sizes. The one-sided testing procedure available in \CRANpkg{csurvey} has higher power than the standard two-sided test, and can also be applied in grids with some empty domains. Confidence intervals for domain means have better coverage rates and smaller interval widths than those produced by unconstrained estimation. Finally, the package provides functions that allow the user to easily visualize the data and the fits. The utility of the package has been demonstrated with well-known survey data sets.
+
+**Acknowledgment:** This work was partially funded by NSF MMS-1533804.

diff --git a/_articles/RJ-2025-032/RJ-2025-032.html b/_articles/RJ-2025-032/RJ-2025-032.html
new file mode 100644
index 0000000000..3cfea63134
--- /dev/null
+++ b/_articles/RJ-2025-032/RJ-2025-032.html
@@ -0,0 +1,2238 @@
+

csurvey: Implementing Order Constraints in Survey Data Analysis

+

Recent work in survey domain estimation has shown that incorporating a priori assumptions about orderings of population domain means reduces the variance of the estimators. The new R package csurvey allows users to implement order constraints using a design specified in the well-known survey package. The order constraints not only give estimates that satisfy a priori assumptions, but they also provide smaller confidence intervals with good coverage, and allow for design-based estimation in small-sample or empty domains. A test for constant versus increasing domain means is available in the package, with generalizations to other one-sided tests. A cone information criterion may be used to provide evidence that the order constraints are valid. Examples with well-known survey data sets show the utility of the methods. This package is now available from the Comprehensive R Archive Network at http://CRAN.R-project.org/package=csurvey.

+
+
+

1 Introduction

+

We assume that a finite population is partitioned into a number of domains, and the goal is to estimate the population domain means for the study variable and provide valid inference, such as confidence intervals and hypothesis tests.

+

Order constraints are a common type of a priori knowledge in survey data analysis. For example, we might know that salary increases with job rank within job type and location, or average cholesterol in a population increases with age category, or test scores decrease as poverty increases, or that the amount of pollution decreases with distance from the source. There might be a “block ordering” of salaries by location, where the locations are partitioned into blocks where salaries for all locations in one block (such as major metropolitan areas) are assumed to be higher, on average, than salaries in another block (such as rural areas), without imposing orderings within the blocks.

+

The order constraints may be imposed on domain mean estimates, or systematic component estimates if the response variable is not Gaussian but belongs to an exponential family of distributions. For example, if we are estimating the proportion of people with diabetes in a population, the study variable might be binary (whether or not the subject has diabetes), and perhaps we want to assume that diabetes prevalence increases with age, within ethnicity and socio-economic categories. The csurvey package extends the survey package in that it allows the user to impose linear inequality constraints on the domain means. If the inequality constraints are valid, this leads to improved inference, as well as the ability to estimate means for empty or small-sample domains without additional assumptions. The csurvey package provides constrained estimates of means or proportions, estimated variances of the estimates, and confidence intervals where the upper and lower bounds of the intervals also satisfy the order constraints.

+

The csurvey package implements one-sided tests. Suppose the study variable is salary, and interest is in whether the subject’s salary is affected by the level of education of the subject’s father, controlling for the subject’s education level and field. The null hypothesis is that there is no difference in salary by the level of father’s education. The one-sided test with alternative hypothesis that the salary increases with father’s education has a higher power than the two-sided test with the larger alternative “salary is not the same across the levels of father’s education.”

+

Finally, the csurvey package includes graphical functions for visualizing the order-constrained estimator and comparing it to the unconstrained estimator. Confidence bands can also be displayed to illustrate estimation uncertainty.

+

This package relies on coneproj and survey for its core computations and handling of complex sampling designs. The domain grid is constructed using the data.table package. Additional functionality, such as data filtering, variable transformation, and visualization of model results, leverages functions from igraph, dplyr, and ggplot2. When simulating the mixture covariance matrix and the sampling distribution of the one-sided test statistic, selected functions from MASS and Matrix are employed.

+

2 Research incorporated in csurvey

+

The csurvey package provides estimation and inference for population domain means with order constraints, using recently developed methodology. One line of work considered a complete ordering on a sequence of domain means, applied the pooled adjacent violators algorithm for domain mean estimation, and derived asymptotic confidence intervals that have smaller width without sacrificing coverage, compared to estimators that ignore the ordering. Subsequent work developed methodology for partial orderings and more general constraints on domains, including block orderings and orderings on domains arranged in grids by multiple variables of interest. These methods were refined with variance estimators based on a mixture of covariance matrices, and the mixture covariance estimator was shown to improve the coverage of confidence intervals while retaining smaller interval lengths. Further work showed how to use the order constraints to provide conservative design-based inference in domains with small sample sizes, or even in empty domains, without additional model assumptions; developed a test for constant versus increasing domain means, extended to one-sided tests for more general orderings; and proposed a cone information criterion (CIC) for survey data as a diagnostic measure of possible departures from the assumed ordering. This criterion is similar to Akaike's information criterion (AIC) and the Bayesian information criterion (BIC), and can be used for model selection. For example, we can use the CIC to choose between an unconstrained estimator and an order-constrained estimator; the one with the smaller CIC value is chosen.

+

All of the above papers include extensive simulations showing that the methods substantially improve inference when the order restrictions are valid. These methods are easy for users to implement in csurvey, with commands to impose common orderings.

+

3 How to use csurvey

+

Statisticians and practitioners working with survey data are familiar with the R package survey for analysis of complex survey data. Commands in the package allow users to specify the survey design with which their data were collected, then obtain estimation and inference for population variables of interest. The new csurvey package extends the utility of survey by allowing users to implement order constraints on domains.

+

The csurvey package relies on functions in the survey package, such as svydesign and svrepdesign, which allow users to specify the survey design. These functions produce an object that contains information about the sampling design, allowing the user to specify strata, sampling weights, etc. This object is used in statistical functions in the csurvey package in the same manner as in the survey package. In addition, the mixture covariance matrix is constructed from an initial estimate of covariance obtained from survey.

+

Consider a finite population with labels \(U = \{1,\ldots,N\}\), partitioned into domains \(U_d\), \(d=1,\ldots,D\), where \(U_d\) has \(N_d\) elements. For a study variable \(y\), suppose interest is in estimating the population domain means
+\[
+\bar{y}_{U_d} = \frac{\sum_{k\in U_d} y_k}{N_d}
+\]
+for each \(d\), and providing inference such as confidence intervals for each \(\bar{y}_{U_d}\).
+Given a survey design, a sample \(s\subset U\) is chosen, and the unconstrained estimator of the population domain means is a weighted average of the sample observations in each domain. This estimator \(\tilde{\mathbf{y}}_s=(\tilde{y}_{s_1},\ldots, \tilde{y}_{s_D})\) is provided by the survey package.
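For concreteness, this unconstrained domain mean estimator can be computed with the survey package alone. The following sketch uses simulated data, so all variable names and values are illustrative rather than taken from the package's data sets.

```r
library(survey)

# Simulated stratified element sample: 4 strata, arbitrary sampling weights.
set.seed(1)
df <- data.frame(
  id  = 1:200,
  str = rep(1:4, each = 50),
  wt  = runif(200, 1, 3),
  x   = sample(1:5, 200, replace = TRUE),  # domain variable (e.g., age group)
  y   = rnorm(200, mean = 100, sd = 10)    # study variable
)
ds <- svydesign(ids = ~id, strata = ~str, weights = ~wt, data = df)

# One weighted (unconstrained) mean per domain, with standard errors
svyby(~y, ~x, ds, svymean)
```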

+

The desired orderings are imposed as linear inequality constraints on the domain means, in the form of an \(m\times D\) constraint matrix \(\mathbf{A}\). The csurvey package finds the constrained estimator \(\tilde{\boldsymbol{\theta}}\) by solving
+\[
+\min_{\theta}(\tilde{\mathbf{y}}_s - \boldsymbol{\theta})^{\top}\mathbf{W}_s(\tilde{\mathbf{y}}_s-\boldsymbol{\theta}) \quad \mbox{such that} \hspace{2mm} \mathbf{A}\boldsymbol{\theta}\geq \mathbf{0}
+\]
+where the weights \(\mathbf{W}_s\) are provided by the survey design. For a simple example of a constraint matrix, consider five domains with a simple ordering, where we assume \(\bar{y}_{U_1}\leq \bar{y}_{U_2}\leq \bar{y}_{U_3}\leq \bar{y}_{U_4}\leq \bar{y}_{U_5}\). Perhaps these are average salaries over five job levels. Then the constraint matrix is
+\[ \mathbf{A} = \left(\begin{array}{ccccc} -1 & 1 & 0 & 0 & 0 \\0 & -1 & 1 & 0 & 0 \\0 & 0 & -1 & 1 & 0 \\0 &0 &0 & -1 & 1 \end{array}\right).
+\]
+For simple orderings on \(D\) domains, the constraint matrix is \((D-1) \times D\).
+For a block ordering example, suppose we again have five domains, and we know that each of the population means in the first two domains must be smaller than each of the means of the last three domains. The constraint matrix is
+\[ \mathbf{A} = \left(\begin{array}{ccccc} -1 & 0 & 1 & 0 & 0 \\ -1 & 0 & 0 & 1& 0 \\ -1 & 0 & 0 & 0 & 1\\
+0 & -1 & 1 & 0 & 0 \\0 & -1 & 0 & 1& 0 \\ 0 & -1 & 0 & 0 & 1\\
+\end{array}\right).
+\]
+The number of rows for a block ordering with two blocks is \(D_1 \times D_2\), where \(D_1\) is the number of domains in the first block and \(D_2\) is the number in the second block. The package also allows users to specify grids of domains with various order constraints along the dimensions of the grid. The constraint matrices are automatically generated for some standard types of constraints.
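For the simple ordering, the \((D-1)\times D\) matrix \(\mathbf{A}\) can be built in a couple of lines of base R; this sketch constructs it for \(D = 5\) and checks that a non-decreasing mean vector satisfies \(\mathbf{A}\boldsymbol{\theta}\geq \mathbf{0}\):

```r
D <- 5
# Each row encodes one adjacent constraint: -theta_d + theta_{d+1} >= 0
A <- cbind(-diag(D - 1), 0) + cbind(0, diag(D - 1))
dim(A)  # (D - 1) x D, here 4 x 5

theta <- c(1, 2, 2, 3, 5)  # a non-decreasing candidate mean vector
all(A %*% theta >= 0)      # the constraints hold for a monotone vector
```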

+

The csurvey package allows users to specify a simple ordering with the symbolic function incr. For example, suppose y is the survey variable of interest (say, cholesterol level), and the variable x takes values \(1\) through \(D\) (age groups, for example). Suppose we assume that average cholesterol level is increasing with age in this population, and that the design is specified by the object ds. Then

+
ans <- csvy(y ~ incr(x), design = ds)
+

creates an object containing the estimated population means and confidence intervals. The decr function is used similarly, to specify decreasing means.

+

Next, suppose x1 takes values \(1\) through \(D_1\) and x2 takes values \(1\) through \(D_2\) (say, age group and ethnicity), and we wish to estimate the population means over the \(D_1\times D_2\) grid of values of x1 and x2. If we assume that the population domain means are ordered in x1 but there is no ordering in x2, then the command

+
ans <- csvy(y ~ incr(x1) * x2, design = ds)
+

will provide domain mean estimates where the means are non-decreasing in age, within each ethnicity. Note that we don't allow "+" when defining a formula in csvy(), because all \(D_1\times D_2\) combinations of x1 and x2 are considered.

+

For an example of a block ordering with three blocks, the command

+
ans <- csvy(y ~ block.Ord(x, order = c(1,1,1,2,2,2,3,3,3)), design = ds)
+

specifies that the variable x takes values \(1\) through \(9\), and that the domains with values 1, 2, and 3 each have population means not greater than each of the population means in the domains with values 4, 5, and 6. The domains with values 4, 5, and 6, in turn, each have population means not greater than each of the population means in the domains with values 7, 8, and 9. More examples of implementing constraints are given below.
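A block-ordering constraint matrix can also be generated mechanically from a vector of block labels, one row per pair of domains in adjacent blocks. The helper below is our own base-R sketch, not a csurvey function:

```r
# Build the constraint matrix for a block ordering: every domain in block b
# must have a mean no greater than every domain in block b + 1.
block_constraints <- function(blocks) {
  D <- length(blocks)
  rows <- list()
  for (b in seq_len(max(blocks) - 1)) {
    lo <- which(blocks == b)
    hi <- which(blocks == b + 1)
    for (i in lo) {
      for (j in hi) {
        a <- numeric(D)
        a[i] <- -1
        a[j] <- 1
        rows[[length(rows) + 1]] <- a
      }
    }
  }
  do.call(rbind, rows)
}

# Two blocks {1,2} and {3,4,5}: 2 x 3 = 6 pairwise constraints
A <- block_constraints(c(1, 1, 2, 2, 2))
dim(A)

# The nine-domain block ordering above: 9 + 9 = 18 pairwise constraints
nrow(block_constraints(c(1, 1, 1, 2, 2, 2, 3, 3, 3)))
```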

+

Implementing order constraints leads to "pooling" of domains where the order constraints are binding. This naturally leads to smaller confidence intervals, as the averaging is over a larger number of observations. The mixture covariance estimator for \(\tilde{\boldsymbol{\theta}}\) derived in the work described above is provided by csurvey. This covariance estimator is constructed by recognizing that, for different samples, different constraints are binding, so that different sets of domains are "pooled" to obtain the constrained estimator. The covariance estimator is then a mixture of pooled covariance matrix estimators, with the mixture distribution approximated via simulation. Using this mixture, rather than the covariance matrix for the observed pooling, provides confidence intervals with coverage closer to the target, while retaining the shorter lengths. A related method further pools information across domains to provide upper and lower confidence interval bounds that also satisfy the constraints, effectively reducing the confidence interval length for domains with small sample sizes and allowing for estimation and inference in empty domains.

+

The test \(H_0:\mathbf{A}\bar{\mathbf{y}}_U=0\) versus the one-sided \(H_1:\mathbf{A}\bar{\mathbf{y}}_U\geq0\) in csurvey has improved power over the \(F\)-test implemented in the survey package, which uses the two-sided alternative \(H_2:\mathbf{A}\bar{\mathbf{y}}_U\neq0\). For example, suppose we have measures of amounts of pollutants in samples of water from small lakes in a certain region, together with measurements of distances from sources, such as factories or waste dumps, that are suspected of contributing to the pollution. The test of the null hypothesis that the amount of pollution does not depend on distance will have greater power if the alternative is one-sided, that is, that the pollution amount is, on average, larger at smaller distances.

+

In the following sections, we only show the main part of code used to generate the results. Supplementary or non-essential code is available in the accompanying R script submitted with this manuscript.

+

4 NHANES Example with monotonic domain means

+

The National Health and Nutrition Examination Survey (NHANES) combines in-person interviews and physical examinations to produce a comprehensive data set from a probability sample of residents of the U.S. The data are made available to the public on the CDC NHANES website. The subset used in this example is derived from the 2009–2010 NHANES cycle and is provided in the csurvey package as the nhdat2 data set. We consider the task of estimating the average total cholesterol level (mg/dL), originally recorded as LBXTC in the raw NHANES data, by age (in years), corresponding to the original variable RIDAGEYR. Focusing on young adults aged 21 to 45, we analyze a sample of \(n = 1,933\) participants and specify the survey design using the associated weights and strata available in the data.

+
+
+
library(csurvey)
+data(nhdat2, package = "csurvey")
+dstrat <- svydesign(ids = ~id,  strata = ~str, data = nhdat2,  weight = ~wt)
+
+
+

Then, to get the proposed constrained domain mean estimate, we use the csvy function in the csurvey package. In this call, incr is a symbolic routine used to specify that the population domain means of chol are increasing with respect to the predictor age.

+
+
+
ans <- csvy(chol ~ incr(age), design = dstrat, n.mix = 100)
+
+
+

The n.mix argument controls the number of simulations used to estimate the mixture covariance matrix, with a default of n.mix = 100. To speed up computation, users can set n.mix to a smaller number, e.g., 10.

+

We can extract the CIC value for the constrained estimator from ans as

+
+
+
cat("CIC (constrained):", ans$CIC, "\n")
+
+
CIC (constrained): 32.99313 
+
+

and the CIC value for the unconstrained estimator as

+
+
+
cat("CIC (unconstrained):", ans$CIC.un, "\n")
+
+
CIC (unconstrained): 51.65159 
+
+

We see that for this example the constrained estimator has a smaller CIC value, which indicates a better fit.

+

If we want to construct a contrast of domain means and get its standard error, we can use the svycontrast function, which is inherited from the survey package. Note that in the survey package it is impossible to obtain a contrast estimate when any domain is empty; the csurvey package inherits this behavior. For example, to compare the average cholesterol for the first thirteen age groups with that for the last twelve age groups, we can write:

+
+
+
cat(svycontrast(ans, list(avg = c(rep(-1, 13)/13, rep(1, 12)/12))), "\n")
+
+
19.44752 
+
+

The csvy function produces both the constrained fit and the corresponding unconstrained fit using methods from the survey package. A visual comparison between the two fits can be obtained by applying the plot method to the resulting object with the argument type = "both". For illustration, Figure 1 displays the estimated domain means along with 95% confidence intervals, generated by the following code:

+
plot(ans, type = "both")
+
+

+Figure 1: Estimates of average cholesterol level for 25 ages, with 95% confidence intervals, for a stratified sample in the R dataset nhdat2, \(n=1933\). +

+
+
+

The confidence interval for each domain mean can also be extracted from the fitted object. When the response is not Gaussian, a confidence interval is available both for the average systematic component over domains and for the domain mean itself.

+

For this data set, the sample sizes for each of the twenty-five ages range from 54 to 99, so that none of the domains is "small." To demonstrate the case of small domains, we next provide domain mean estimates and confidence intervals for \(400\) domains, arranged in a grid with 25 ages, four waist-size categories, and four income categories. We divide the waist measurement by height to make a relative girth, and split the observations into four groups with variable name wcat, which is a 4-level ordinal categorical variable representing waist-to-height ratio categories, computed from BMXWAIST (waist circumference in cm) and BMXHT (height in cm) in the body measures file BMX_F.XPT from NHANES. We have income information for the subjects, in terms of a multiple of the federal poverty level. Our first income category includes those with income that is 75% or less of the poverty line. The second category is 0.75 through 1.38 times the poverty level (1.38 determines Medicaid eligibility), the third category goes from above 1.38 to 3.5 times, and above 3.5 times the poverty level is the fourth category. These indicators are contained in the variable icat (categorized income). We create icat from the INDFMPIR variable in the file DEMO_F.XPT from NHANES.
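The icat and wcat constructions described above can be sketched with cut(). The poverty-ratio breaks follow the text; the four wcat groups are formed here with sample quartiles as a stand-in, since the exact cut points are not stated. The data frame nh and its values are simulated for illustration.

```r
set.seed(1)
# Simulated stand-ins for the raw NHANES variables
nh <- data.frame(
  INDFMPIR = runif(200, 0, 5),    # income as a multiple of the poverty level
  BMXWAIST = rnorm(200, 95, 12),  # waist circumference (cm)
  BMXHT    = rnorm(200, 168, 9)   # height (cm)
)

# Income categories: <= 0.75, (0.75, 1.38], (1.38, 3.5], > 3.5 x poverty level
nh$icat <- cut(nh$INDFMPIR, breaks = c(-Inf, 0.75, 1.38, 3.5, Inf),
               labels = 1:4)

# Waist-to-height ratio, split into four ordered groups (quartiles here)
ratio   <- nh$BMXWAIST / nh$BMXHT
nh$wcat <- cut(ratio, breaks = quantile(ratio, probs = seq(0, 1, 0.25)),
               include.lowest = TRUE, labels = 1:4)

table(nh$icat)
table(nh$wcat)
```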

+

The domain sample sizes average only 4.8 observations, and there are 16 empty domains. The sample size for each domain can be checked by ans$nd. To estimate the population means we assume average cholesterol level is increasing in both age and waist size, but no ordering is imposed on income categories:

+
+
+
set.seed(1)
+ans <- csvy(chol ~ incr(age)*incr(wcat)*icat, design = dstrat)
+
+
+

To extract estimates and confidence intervals for specific domains defined by the model, the user can call the predict function as follows:

+
+
+
domains <- data.frame(age = c(24, 35), wcat = c(2, 4), icat = c(2, 3))
+pans <- predict(ans, newdata = domains, se.fit = TRUE)
+cat("Predicted values, confidence intervals and standard errors for specified domains:\n")
+
+
Predicted values, confidence intervals and standard errors for specified domains:
+
+
print(pans)
+
+
$fit
+[1] 162.9599 202.0061
+
+$lwr
+[1] 148.6083 183.6379
+
+$upp
+[1] 177.8976 219.7764
+
+$se.fit
+[1] 7.471908 9.219167
+
+

Figure 2 displays the domain mean estimates along with 95% confidence intervals, highlighting in red the two domains specified in the predict call. The code used to create this plot is shown below; the control argument lets the user adjust the aesthetics of the plot.

+
+

+Figure 2: Constrained estimates of population domain means for 400 domains in a 25x4x4 grid. The increasing population domain estimates for the 25 ages are shown within the waist size and income categories. The blue bands indicate 95% confidence intervals for the population domain means, with two specific domains, namely, (age, waist, income) = (24, 2, 2) and (35, 4, 3) marked in red. Empty domains are marked with a red ‘x’ sign. +

+
+
+
ctl <- list(x1lab = "waist", x2lab = "income", subtitle.size = 8)
+plot(ans, x1 = "wcat", x2 = "icat", control = ctl, domains = domains)
+

The user can visualize the unconstrained fit by specifying type = "unconstrained" in the plot function. The corresponding output is presented in Figure 3.

+
plot(ans, x1 = "wcat", x2 = "icat", control = ctl, type="unconstrained")
+
+

+Figure 3: Unconstrained estimates of population domain means for 400 domains in a 25x4x4 grid. The population domain estimates for the 25 ages are shown within the waist size and income categories. The green bands indicate 95% confidence intervals for the population domain means. Empty domains are marked with a red ‘x’ sign. +

+
+
+

Without the order constraints, the sample sizes are too small to provide valid estimates and confidence intervals, unless further model assumptions are used, as in some small area estimation methods. With order constraints, design-based estimation and inference are possible for substantially smaller domain sample sizes, compared to the unconstrained design-based estimation.

+

5 Constrained domain means with a block ordering

+

We consider the 2019 National Survey of College Graduates (NSCG), conducted by the U.S. Census Bureau and sponsored by the National Center for Science and Engineering Statistics (NCSES) within the National Science Foundation. The NSCG provides data on the characteristics of the nation's college graduates, with a focus on those in the science and engineering workforce. The datasets and documentation are available to the public on the National Survey of College Graduates (NSCG) website. Replicate weights are available separately from NCSES upon request. Because the size of the subsets of the NSCG survey used in this paper exceeds the size limit allowed for an R package stored on CRAN, the subsets are not included in the csurvey package. Instead, we provide a link to access the subsets nscg19.rda and nscg19_2.rda used in this section at this website.

+ +

The study variable of interest is annual salary (denoted as SALARY in the dataset), which exhibits substantial right-skewness in its raw form. To reduce the influence of outliers and improve model stability, we restricted the analysis to observations with annual salaries between $30,000 and $400,000. A logarithmic transformation was applied to the salary variable to address skewness. Four predictors of salary were considered:

+
    +
  • Field of study (field, denoted by NDGMEMG in the raw dataset): This nominal variable defines the field of study for the highest degree. There are five levels: (1) Computer and mathematical sciences; (2) Biological, agricultural and environmental life sciences; (3) Physical and related sciences; (4) Social and related sciences; (5) Engineering. Block ordering constraint: given the other predictors, the average annual salary for each of the fields (2) and (4) is less than for the STEM fields (1), (3) and (5).

  • Grouped year of award of highest degree (hd_year_grouped, denoted by HDAY5): This ordinal variable has five levels: (1) 1995 to 1999; (2) 2000 to 2004; (3) 2005 to 2009; (4) 2010 to 2014; (5) 2015 or later. Isotonic constraint: given the other predictors, the average annual salary decreases with the year of award of highest degree; i.e., the more experience respondents have, on average, the higher the annual salary.

  • Highest degree type (hd_type, denoted by DGRDG): The three levels are: (1) Bachelor’s; (2) Master’s; (3) Doctorate and Professional. Isotonic constraint: given the other predictors, the average annual salary increases with respect to the highest degree type.

  • Region code for employer (region, denoted by EMRG): This nominal variable defines the regions in which the respondents worked within the U.S. Nine levels are: (1) New England; (2) Middle Atlantic; (3) East North Central; (4) West North Central; (5) South Atlantic; (6) East South Central; (7) West South Central; (8) Mountain; (9) Pacific and US Territories. There is no constraint for this predictor.

+

This data set contains \(n=30,368\) observations in a four-dimensional grid of 675 domains, where the sample size in the domains ranges from one to 491. Here we specify the shape and order constraints in a similar fashion as in the previous examples. The symbolic routine block.Ord is used to impose a block ordering on field, with the ordering specified in the order argument. The svrepdesign command creates a survey design with replicate weights and no clusters, where the columns named "RW0001", "RW0002", …, "RW0320" are the 320 NSCG replicate weights and w denotes the sampling weight. The variance is computed as a weighted sum of squared deviations of the replicate estimates from the full-sample estimate. The general formula for computing a variance estimate using replicate weights is
+\[v_{REP}(\hat{\theta})=\sum_{r=1}^{R}c_r(\hat{\theta}_r-\hat{\theta})^2\]
+where the estimate \(\hat{\theta}\) is computed using the final full-sample survey weights and each replicate estimate \(\hat{\theta}_r\) is computed using the \(r\)th set of replicate weights (\(r=1,\cdots, R\)). The replication adjustment factor \(c_r\) is a multiplier for the \(r\)th squared difference. The scale argument is an overall multiplier, and rscale gives a vector of replicate-specific multipliers; together they determine the values of \(c_r\) in the formula above.

+
load("./nscg19.rda")
+rds <- svrepdesign(data = nscg, repweights = dplyr::select(nscg, "RW0001":"RW0320"), 
+  weights = ~w, combined.weights = TRUE, mse = TRUE, type = "other", scale = 1, rscale = 0.05)
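The replicate-variance formula above is straightforward to compute directly. In this base-R sketch, the helper name rep_var is ours, the default \(c_r = 0.05\) mirrors the rscale = 0.05 used in the svrepdesign call, and all numeric values are simulated for illustration.

```r
# v_REP(theta_hat) = sum_r c_r * (theta_hat_r - theta_hat)^2
rep_var <- function(theta_hat, theta_r, c_r = rep(0.05, length(theta_r))) {
  sum(c_r * (theta_r - theta_hat)^2)
}

set.seed(1)
theta_hat <- 10.8                        # full-sample estimate (illustrative)
theta_r   <- rnorm(320, theta_hat, 0.2)  # 320 simulated replicate estimates
rep_var(theta_hat, theta_r)              # replicate variance estimate
```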
+

Estimates of domain means for the 225 domains for which the highest degree is a Bachelor’s are shown in Figure 4, as surfaces connecting the estimates over the grid of field indicators and year of award of highest degree. The block-ordering constraints can be seen in the surfaces, where the fields labeled 2 and 4 have lower means than those labeled 1, 3, and 5. The surfaces are also constrained to be decreasing in year of award of highest degree.

+

To improve computational efficiency, the simulations used to estimate the mixture covariance matrix and the sampling distribution of the test statistic can be parallelized. This can be enabled by setting the following R option

+
options(csurvey.multicore = TRUE)
+

The model is fitted using the following code

+
ans <- csvy(logSalary~decr(hd_year_grouped)*incr(hd_type)*block.Ord(field, 
+      order = c(2, 1, 2, 1, 2))*region, design = rds)
+

The plotpersp function is used to create Figure 4. It creates a three-dimensional estimated surface plot when there are at least two predictors. In this example, we use it to generate a 3D perspective plot of the estimated average log-transformed salary with respect to field and year of award of highest degree. Among the three hd_type categories, the Bachelor's degree group has the highest number of observations, and the function visualizes the 225 domains corresponding to this most frequently observed level. Plot aesthetics are customized via the control argument:

+
ctl <- list(categ = "region", categnm = c("New England", "Middle Atlantic", "East North Central", 
+  "West North Central", "South Atlantic", "East South Central", "West South Central", "Mountain", 
+  "Pacific and US Territories"), NCOL = 3, th = 60, xlab = "years since degree", 
+  ylab = "field of study", zlab = "log(salary)")
+
+plotpersp(ans, x1="hd_year_grouped", x2="field", control = ctl) 
+
+

+Figure 4: Estimates of average log(salary) by field of study and year of degree, for observations where highest degree is a Bachelor’s, for each of the nine regions. +

+
+
+

Estimates and confidence intervals for the 75 domain means associated with three of the regions are shown in Figure 5. The constrained estimates are shown as blue dots, and the unconstrained means are depicted as grey dots. The confidence intervals are indicated with blue bands for the constrained estimates and grey bands for the unconstrained estimates.

+

For the Northeast Region, the average length of the constrained confidence intervals is 0.353, while the average length for the unconstrained confidence intervals is 0.477. In the domain for those in the math and computer science field, with PhDs obtained in 2000-2004, the unconstrained log-salary estimate is well below the corresponding constrained estimate, because the latter is forced to be at least that of lower degrees, and that of newer PhDs. If the constraints hold in the population, then the unconstrained confidence interval is unlikely to capture the population value. As seen in the NHANES example in Section 4, the unconstrained estimators are unreliable when the domain sample size is small. The Pacific region has the largest sample sizes, ranging from 12 to 435 observations. With these larger sample sizes, most of the unconstrained estimates already satisfy the constraints, but the average length for the constrained estimator is 0.301, while the average length for the unconstrained estimator is 0.338, showing that the mixture covariance matrix leads to more precise confidence intervals. Also shown is the region with the smallest sample sizes, the East South Central region; here the average length for the unconstrained confidence intervals is 0.488, while the average length for the constrained confidence intervals is 0.444.

+
+

+Figure 5: Estimates of average log(salary) for the 75 domains in each of three regions. The blue dots represent the constrained domain mean estimates, while the grey dots represent the unconstrained domain mean estimates. The blue band is the 95% confidence interval for the domains, using the constraints; the grey band is the 95% unconstrained domain mean confidence interval. +

+
+
+

6 One-sided testing


For this example we again use the NSCG data set, but with some new variables and variable groupings. First, instead of the grouped year of award of highest degree (hd_year_grouped), we use the actual calendar year in which the highest degree was awarded (hd_year, denoted as HDACYR in the raw data set). We conduct the one-sided test for six pairs of consecutive calendar years. In addition, we choose father’s education level (daded, denoted as EDDAD) as the main predictor, and the interest is in determining whether salaries are higher for people whose fathers are more educated. In the raw data set, EDDAD has seven categories; we group them into five categories for this example: 1 = no high school degree, 2 = high school degree but no college degree, 3 = bachelor’s degree, 4 = master’s degree, and 5 = PhD or professional degree. We further group region (EMRG) as: 1 = Northeast, 2 = North Central, 3 = Southeast, 4 = West, 5 = Pacific and Territories, and group field (NDGMEMG) as: 1 = Computer and Mathematical Sciences, Physical and Related Sciences, and Engineering, which can be considered core STEM fields; 2 = Biological, Agricultural, and Environmental Life Sciences and Social and Related Sciences, which can be considered life and social sciences; 3 = Science and Engineering-Related Fields; 4 = Non-Science and Engineering Fields, i.e., non-STEM. Finally, the highest degree type (denoted as DGRDG in the raw data) is used to limit the study sample to people whose highest degree is a master’s degree. The sample size is \(n=25,177\).
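The regrouping described above can be sketched with dplyr. The grouped category codes are from the text, but the raw NSCG codes used on the right-hand sides below are hypothetical placeholders, not the actual codebook values:

```r
library(dplyr)

# Hypothetical sketch of the recoding described in the text; the raw
# EDDAD codes 1:7 and the DGRDG code for a Master's degree are assumed
# for illustration only, not taken from the NSCG codebook.
nscg2 <- nscg2 |>
  mutate(daded = case_when(
    EDDAD %in% c(1, 2) ~ 1,  # no high school degree
    EDDAD %in% c(3, 4) ~ 2,  # high school degree, no college degree
    EDDAD == 5         ~ 3,  # bachelor's degree
    EDDAD == 6         ~ 4,  # master's degree
    EDDAD == 7         ~ 5   # PhD or professional degree
  )) |>
  filter(DGRDG == 2)         # restrict to highest degree = Master's (code assumed)
```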


It seems reasonable that people whose parents are more educated are more educated themselves, but if we control for the person’s level of education, as well as field, region, and year of degree, will the salary still be increasing with level of father’s education? For this, the null hypothesis is that within the region, field, and year of degree, the salary is constant over levels of father’s education. The one-sided alternative is that salaries increase with father’s education level.


The constrained and unconstrained fits to the data under the alternative hypotheses are shown in Figure 6 for five regions and four fields, for degree years 2016-2017. The dots connected by solid blue lines represent the fit with the one-sided alternative, i.e., constrained to be increasing in father’s education. The dots connected by solid red lines represent the fit with the two-sided alternative, i.e., with unconstrained domain means.

Figure 6: Estimates of average log(salary) by father’s education level, for each of five regions and four fields, for subjects whose degree was attained in 2016-2017. The solid blue lines connect the estimates where the average salary is constrained to be increasing in father’s education, and the solid red lines connect unconstrained estimates of average salary.

The \(p\)-values for the tests of the null hypothesis are given in Table 1, where n/a is shown for the two-sided \(p\)-value in the case of an empty domain. The test is conducted for six pairs of consecutive calendar years, with sample sizes of 2,129, 2,795, 3,069, 2,895, 2,423, and 1,368, respectively. The one-sided test consistently has a small \(p\)-value, indicating that father’s education level is positively associated with salary, even after controlling for region, field, and year of degree.

Table 1: One-sided and two-sided \(p\)-values for the test of the null hypothesis that salary is constant in father’s education level. The two-sided test results in n/a when the grid has at least one empty domain.

              2008-09   2010-11   2012-13   2014-15   2016-17   2018-19
  one-sided   .008      <.001     <.001     <.001     .003      <.001
  two-sided   n/a       .018     <.001      n/a       .417      n/a

The \(p\)-value of the one-sided test is included in the fitted object. For example, to check the \(p\)-value for the fit for the years 2008 to 2009, we can fit the model and print its summary table as follows:

load("./nscg19_2.rda")
data <- nscg2 |>
  dplyr::filter(hd_year %in% c(2008, 2009))

rds <- svrepdesign(data = data,
                   repweights = dplyr::select(data, "RW0001":"RW0320"),
                   weights = ~w, combined.weights = TRUE, mse = TRUE,
                   type = "other", scale = 1, rscale = 0.05)

set.seed(1)
ans <- csvy(logSalary ~ incr(daded) * field * region, design = rds, test = TRUE)
+
summary(ans)
Call:
csvy(formula = logSalary ~ incr(daded) * field * region, design = rds, 
    test = TRUE)

Null deviance:  460.6182  on 97  degrees of freedom 
Residual deviance:  391.3722  on 51  degrees of freedom 

Approximate significance of constrained fit: 
                         edf mixture.of.Beta p.value   
incr(daded):field:region  47          0.5536  0.0068 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
CIC (constrained estimator):  0.0275
CIC (unconstrained estimator):  0.0354
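The fit above can be repeated over all six pairs of consecutive years to collect the one-sided \(p\)-values reported in Table 1. A sketch, assuming the one-sided \(p\)-value can be read from the fitted object; the accessor name `pval` is an assumption, so consult the package documentation for the actual component:

```r
year_pairs <- list(c(2008, 2009), c(2010, 2011), c(2012, 2013),
                   c(2014, 2015), c(2016, 2017), c(2018, 2019))

# Refit the same constrained model for each pair of consecutive years
# and collect the one-sided p-values (accessor name assumed).
pvals <- sapply(year_pairs, function(yrs) {
  d <- dplyr::filter(nscg2, hd_year %in% yrs)
  rds <- svrepdesign(data = d,
                     repweights = dplyr::select(d, "RW0001":"RW0320"),
                     weights = ~w, combined.weights = TRUE, mse = TRUE,
                     type = "other", scale = 1, rscale = 0.05)
  fit <- csvy(logSalary ~ incr(daded) * field * region, design = rds, test = TRUE)
  fit$pval  # assumed accessor for the one-sided p-value
})
```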

7 Binary outcome


Finally, we use another subset of the NHANES 2009–2010 data to demonstrate how our method applies when the outcome is binary. This subset is included in the csurvey package as nhdat. The construction of variables, sampling weights, and strata in this subset closely follows the approach described in Section 4. It contains \(n = 1,680\) observations with complete records on total cholesterol, age, height, and waist circumference for adults aged 21–40. The binary outcome indicates whether an individual has high total cholesterol, coded as 1 if total cholesterol exceeds 200 mg/dL, and 0 otherwise. We estimate the population proportion with high cholesterol by age, waist, and gender (1 = male, 2 = female). The waist variable, denoted as wcat, is a 4-level categorized ordinal variable representing waist-to-height ratios.
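The construction of the binary outcome and the waist-to-height categories described above can be sketched as follows. The NHANES variable names LBXTC, BMXWAIST, and BMXHT are taken from the earlier sections, while the quartile-based cut points for wcat are an assumption for illustration:

```r
# Sketch of the variable construction described in the text; `nh` is a
# data frame of raw NHANES records (hypothetical name).
nh$chol <- as.integer(nh$LBXTC > 200)   # 1 = total cholesterol > 200 mg/dL
ratio   <- nh$BMXWAIST / nh$BMXHT       # waist-to-height ratio
# 4-level ordinal category; quartile cut points are assumed here.
nh$wcat <- cut(ratio,
               breaks = quantile(ratio, probs = seq(0, 1, 0.25), na.rm = TRUE),
               labels = FALSE, include.lowest = TRUE)
```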


It is reasonable to assume that, on average, the proportion of individuals with high cholesterol increases with both age and waist. The model is specified using the following code:

data(nhdat, package = "csurvey")
dstrat <- svydesign(ids = ~id, strata = ~str, data = nhdat, weight = ~wt)
set.seed(1)
ans <- csvy(chol ~ incr(age) * incr(wcat) * gender, design = dstrat,
            family = binomial(link = "logit"), test = TRUE)

The CIC of the constrained estimator is smaller than that of the unconstrained estimator, and the one-sided hypothesis test has a \(p\)-value close to zero.
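As in the Gaussian example earlier in the article, the two CIC values can also be read directly from the fitted object:

```r
# Extract the CIC values for the constrained and unconstrained fits.
cat("CIC (constrained):", ans$CIC, "\n")
cat("CIC (unconstrained):", ans$CIC.un, "\n")
```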

summary(ans)
Call:
csvy(formula = chol ~ incr(age) * incr(wcat) * gender, design = dstrat, 
    family = binomial(link = "logit"), test = TRUE)

Null deviance:  2054.947  on 159  degrees of freedom 
Residual deviance:  1906.762  on 126  degrees of freedom 

Approximate significance of constrained fit: 
                            edf mixture.of.Beta               p.value    
incr(age):incr(wcat):gender  34          0.5174 < 0.00000000000000022 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
CIC (constrained estimator):  0.3199
CIC (unconstrained estimator):  1.2778

The combination of age, waist, and gender gives 160 domains, so the average sample size per domain is only around 10. Due to the small sample sizes, the unconstrained estimator shows implausible jumps as age increases within each waist category. The constrained estimator, on the other hand, is more stable and tends to have smaller confidence intervals than the unconstrained Hájek estimator.
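As with the earlier examples, the two estimators can be compared visually with the plot method for csvy objects:

```r
# Overlay constrained and unconstrained estimates with confidence bands.
plot(ans, type = "both")
```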

Figure 7: Estimates of probability of high cholesterol level for each combination of age, waist and gender. The blue dots represent the constrained domain mean estimates, while the green dots represent the unconstrained domain mean estimates. The blue band is the 95% confidence interval for the domains, using the constraints; the green band is the 95% unconstrained domain mean confidence interval.

8 Discussion


While model-based small area estimators, such as those implemented in the sae and emdi packages, are powerful tools for borrowing strength across domains, they rely on parametric assumptions that may be violated in practice. Design-based methods remain essential for official statistical agencies, as they provide transparent and model-free inference that is directly tied to the survey design. Estimation and inference for population domain means with survey data can be substantially improved if constraints based on natural orderings are implemented. The csurvey package (version 1.15) allows users to specify orderings on grids of domains and to obtain estimates of, and confidence intervals for, population domain means. The package also implements a design-based small area estimation method, which allows inference for population domain means when the sample domain is empty, and which further improves estimates for domains with small sample sizes. The one-sided testing procedure available in csurvey has higher power than the standard two-sided test, and can be applied in grids with some empty domains. The constrained confidence intervals for domain means have better coverage rates and smaller widths than those produced by unconstrained estimation. Finally, the package provides functions that allow the user to easily visualize the data and the fits. The utility of the package has been demonstrated with well-known survey data sets.


Acknowledgment: This work was partially funded by NSF MMS-1533804.


8.1 Supplementary materials


Supplementary materials are available in addition to this article. They can be downloaded as RJ-2025-032.zip.


8.2 CRAN packages used


csurvey, survey, coneproj, data.table, igraph, dplyr, ggplot2, MASS, Matrix, sae, emdi


8.3 CRAN Task Views implied by cited packages


ChemPhys, Databases, Distributions, Econometrics, Environmetrics, Finance, GraphicalModels, HighPerformanceComputing, MixedModels, ModelDeployment, NetworkAnalysis, NumericalMathematics, OfficialStatistics, Optimization, Phylogenetics, Psychometrics, Robust, Spatial, Survival, TeachingStatistics, TimeSeries, WebTechnologies


References


Reuse


Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".


Citation


For attribution, please cite this work as

Liao & Meyer, "csurvey: Implementing Order Constraints in Survey Data Analysis", The R Journal, 2026

BibTeX citation

@article{RJ-2025-032,
  author = {Liao, Xiyue and Meyer, Mary C.},
  title = {csurvey: Implementing Order Constraints in Survey Data Analysis},
  journal = {The R Journal},
  year = {2026},
  note = {https://doi.org/10.32614/RJ-2025-032},
  doi = {10.32614/RJ-2025-032},
  volume = {17},
  issue = {4},
  issn = {2073-4859},
  pages = {4-19}
}
diff --git a/_articles/RJ-2025-032/RJ-2025-032.pdf b/_articles/RJ-2025-032/RJ-2025-032.pdf
new file mode 100644
index 0000000000..e39b8dc472
Binary files /dev/null and b/_articles/RJ-2025-032/RJ-2025-032.pdf differ
diff --git a/_articles/RJ-2025-032/RJ-2025-032.tex b/_articles/RJ-2025-032/RJ-2025-032.tex
new file mode 100644
index 0000000000..472fc53c28
--- /dev/null
+++ b/_articles/RJ-2025-032/RJ-2025-032.tex
@@ -0,0 +1,449 @@
% !TeX root = RJwrapper.tex
\title{csurvey: Implementing Order Constraints in Survey Data Analysis}

\author{by Xiyue Liao and Mary C. Meyer}

\maketitle

\abstract{%
Recent work in survey domain estimation has shown that incorporating a priori assumptions about orderings of population domain means reduces the variance of the estimators. The new R package csurvey allows users to implement order constraints using a design specified in the well-known survey package. The order constraints not only give estimates that satisfy a priori assumptions, but they also provide smaller confidence intervals with good coverage, and allow for design-based estimation in small-sample or empty domains. A test for constant versus increasing domain means is available in the package, with generalizations to other one-sided tests. A cone information criterion may be used to provide evidence that the order constraints are valid. Examples with well-known survey data sets show the utility of the methods. This package is now available from the Comprehensive R Archive Network at \url{http://CRAN.R-project.org/package=csurvey}.
}

\section{Introduction}\label{introduction}

We assume that a finite population is partitioned into a number of domains, and the goal is to estimate the population domain means for the study variable and provide valid inference, such as confidence intervals and hypothesis tests.

Order constraints are a common type of \emph{a priori} knowledge in survey data analysis.
For example, we might know that salary increases with job rank within job type and location, or average cholesterol in a population increases with age category, or test scores decrease as poverty increases, or that the amount of pollution decreases with distance from the source. There might be a ``block ordering'' of salaries by location, where the locations are partitioned into blocks where salaries for all locations in one block (such as major metropolitan areas) are assumed to be higher, on average, than salaries in another block (such as rural areas), without imposing orderings within the blocks.

The order constraints may be imposed on domain mean estimates, or systematic component estimates if the response variable is not Gaussian but belongs to an exponential family of distributions. For example, if we are estimating the proportion of people with diabetes in a population, the study variable might be binary (whether or not the subject has diabetes), and perhaps we want to assume that diabetes prevalence increases with age, within ethnicity and socio-economic categories. The \CRANpkg{csurvey} package extends the \CRANpkg{survey} \citep{survey} package in that it allows the user to impose linear inequality constraints on the domain means. If the inequality constraints are valid, this leads to improved inference, as well as the ability to estimate means for empty or small-sample domains without additional assumptions. The \CRANpkg{csurvey} package provides constrained estimates of means or proportions, estimated variances of the estimates, and confidence intervals where the upper and lower bounds of the intervals also satisfy the order constraints.

The \CRANpkg{csurvey} package implements one-sided tests. Suppose the study variable is salary, and interest is in whether the subject's salary is affected by the level of education of the subject's father, controlling for the subject's education level and field.
The null hypothesis is that there is no difference in salary by the level of father's education. The one-sided test with alternative hypothesis that the salary increases with father's education has higher power than the two-sided test with the larger alternative ``salary is not the same across the levels of father's education.''

Finally, the \CRANpkg{csurvey} package includes graphical functions for visualizing the order-constrained estimator and comparing it to the unconstrained estimator. Confidence bands can also be displayed to illustrate estimation uncertainty.

This package relies on \CRANpkg{coneproj} \citep{coneproj} and \CRANpkg{survey} for its core computations and handling of complex sampling designs. The domain grid is constructed using the \CRANpkg{data.table} \citep{dtable} package. Additional functionality, such as data filtering, variable transformation, and visualization of model results, leverages functions from \CRANpkg{igraph} \citep{igraph}, \CRANpkg{dplyr} \citep{dplyr}, and \CRANpkg{ggplot2} \citep{ggplot2}. When simulating the mixture covariance matrix and the sampling distribution of the one-sided test statistic, selected functions from \CRANpkg{MASS} \citep{mass} and \CRANpkg{Matrix} \citep{matrix} are employed.

\section{Research incorporated in csurvey}\label{research-incorporated-in-csurvey}

The \CRANpkg{csurvey} package provides estimation and inference on population domain variables with order constraints, using recently developed methodology. \citet{wu16} considered a complete ordering on a sequence of domain means. They applied the pooled adjacent violators algorithm \citep{brunk58} for domain mean estimation, and derived asymptotic confidence intervals that have smaller width without sacrificing coverage, compared to the estimators that do not consider the ordering. \citet{oliva20} developed methodology for partial orderings and more general constraints on domains.
These include block orderings and orderings on domains arranged in grids by multiple variables of interest. \citet{xu20} refined these methods by proposing variance estimators based on a mixture of covariance matrices, and showed that the mixture covariance estimator improves coverage of confidence intervals while retaining smaller interval lengths. \citet{liao23} showed how to use the order constraints to provide conservative design-based inference in domains with small sample sizes, or even in empty domains, without additional model assumptions. \citet{xu23} developed a test for constant versus increasing domain means, and extended it to one-sided tests for more general orderings. \citet{oliva19} proposed a cone information criterion (CIC) for survey data as a diagnostic method to measure possible departures from the assumed ordering. This criterion is similar to Akaike's information criterion (AIC) \citep{aic73} and the Bayesian information criterion (BIC) \citep{bic78}, and can be used for model selection. For example, we can use CIC to choose between an unconstrained estimator and an order constrained estimator. The one with a smaller CIC value will be chosen.

All of the above papers include extensive simulations showing that the methods substantially improve inference when the order restrictions are valid. These methods are easy for users to implement in \CRANpkg{csurvey}, with commands to impose common orderings.

\section{How to use csurvey}\label{how-to-use-csurvey}

Statisticians and practitioners working with survey data are familiar with the R package \CRANpkg{survey} (see \citet{survey}) for analysis of complex survey data. Commands in the package allow users to specify the survey design with which their data were collected, then obtain estimation and inference for population variables of interest. The new \CRANpkg{csurvey} package extends the utility of \CRANpkg{survey} by allowing users to implement order constraints on domains.
The \CRANpkg{csurvey} package relies on functions in the \CRANpkg{survey} package such as \texttt{svydesign} and \texttt{svrepdesign}, which allow users to specify the survey design. These functions produce an object that contains information about the sampling design, allowing the user to specify strata, sampling weights, etc. This object is used in statistical functions in the \CRANpkg{csurvey} package in the same manner as for the \CRANpkg{survey} package. In addition, the mixture covariance matrix in \cite{xu20} is constructed from an initial estimate of covariance obtained from \CRANpkg{survey}.

Consider a finite population with labels \(U = \{1,\ldots,N\}\), partitioned into domains \(U_d\), \(d=1,\ldots,D\), where \(U_d\) has \(N_d\) elements. For a study variable \(y\), suppose interest is in estimating the population domain means
\[
\bar{y}_{U_d} = \frac{\sum_{k\in U_d} y_k}{N_d}
\]
for each \(d\), and providing inference such as confidence intervals for each \(\bar{y}_{U_d}\).
Given a survey design, a sample \(s\subset U\) is chosen, and the unconstrained estimator of the population domain means is a weighted average of the sample observations in each domain. This estimator \(\tilde{\mathbf{y}}_s=(\tilde{y}_{s_1},\ldots, \tilde{y}_{s_D})\) in \cite{hajek71} is provided by the \CRANpkg{survey} package.

The desired orderings are imposed as linear inequality constraints on the domain means, in the form of an \(m\times D\) constraint matrix \(\mathbf{A}\). The \CRANpkg{csurvey} package will find the constrained estimator \(\tilde{\boldsymbol{\theta}}\) by solving
\[
\min_{\boldsymbol{\theta}}(\tilde{\mathbf{y}}_s - \boldsymbol{\theta})^{\top}\mathbf{W}_s(\tilde{\mathbf{y}}_s-\boldsymbol{\theta}) \quad \mbox{such that} \hspace{2mm} \mathbf{A}\boldsymbol{\theta}\geq \mathbf{0},
\]
where the weights \(\mathbf{W}_s\) are provided by the survey design (see \cite{oliva20} for details).
For a simple example of a constraint matrix, consider five domains with a simple ordering, where we assume \(\bar{y}_{U_1}\leq \bar{y}_{U_2}\leq \bar{y}_{U_3}\leq \bar{y}_{U_4}\leq \bar{y}_{U_5}\). Perhaps these are average salaries over five job levels. Then the constraint matrix is
\[ \mathbf{A} = \left(\begin{array}{ccccc} -1 & 1 & 0 & 0 & 0 \\ 0 & -1 & 1 & 0 & 0 \\ 0 & 0 & -1 & 1 & 0 \\ 0 & 0 & 0 & -1 & 1 \end{array}\right).
\]
For simple orderings on \(D\) domains, the constraint matrix is \((D-1) \times D\).
For a block ordering example, suppose we again have five domains, and we know that each of the population means in the first two domains must be smaller than each of the means of the last three domains. The constraint matrix is
\[ \mathbf{A} = \left(\begin{array}{ccccc} -1 & 0 & 1 & 0 & 0 \\ -1 & 0 & 0 & 1 & 0 \\ -1 & 0 & 0 & 0 & 1 \\
0 & -1 & 1 & 0 & 0 \\ 0 & -1 & 0 & 1 & 0 \\ 0 & -1 & 0 & 0 & 1 \\
\end{array}\right).
\]
The number of rows for a block ordering with two blocks is \(D_1 \times D_2\), where \(D_1\) is the number of domains in the first block and \(D_2\) is the number in the second block. The package also allows users to specify grids of domains with various order constraints along the dimensions of the grid. The constraint matrices are automatically generated for some standard types of constraints.

The \CRANpkg{csurvey} package allows users to specify a simple ordering with the symbolic \code{incr} function. For example, suppose \code{y} is the survey variable of interest (say, cholesterol level), and the variable \code{x} takes values \(1\) through \(D\) (age groups for example). Suppose we assume that average cholesterol level is increasing with age in this population, and that the design is specified by the object \code{ds}. Then

\begin{verbatim}
ans <- csvy(y ~ incr(x), design = ds)
\end{verbatim}

creates an object containing the estimated population means and confidence intervals.
The \code{decr} function is used similarly, to specify decreasing means.

Next, suppose \code{x1} takes values \(1\) through \(D_1\) and \code{x2} takes values \(1\) through \(D_2\) (say, age group and ethnicity), and we wish to estimate the population means over the \(D_1\times D_2\) grid of values of \code{x1} and \code{x2}. If we assume that the population domain means are ordered in \code{x1} but there is no ordering in \code{x2}, then the command

\begin{verbatim}
ans <- csvy(y ~ incr(x1) * x2, design = ds)
\end{verbatim}

will provide domain mean estimates where the means are non-decreasing in age, within each ethnicity. Note that we don't allow ``+'' when defining a formula in \texttt{csvy()} because we consider all combinations \(D_1\times D_2\) given by \code{x1} and \code{x2}.

For an example of a block ordering with three blocks, the command

\begin{verbatim}
ans <- csvy(y ~ block.Ord(x, order = c(1,1,1,2,2,2,3,3,3)), design = ds)
\end{verbatim}

specifies that the variable \code{x} takes values \(1\) through \(9\), and the domains with values 1, 2, and 3 each have population means not greater than each of the population means in the domains with \code{x} values 4, 5, and 6. The domains with \code{x} values 4, 5, and 6 each have population means not greater than each of the population means in the domains with \code{x} values 7, 8, and 9. More examples of implementation of constraints will be given below.

Implementing order constraints leads to ``pooling'' of domains where the order constraints are binding. This naturally leads to smaller confidence intervals as the averaging is over a larger number of observations. The mixture covariance estimator for \(\tilde{\boldsymbol{\theta}}\) that was derived in \cite{xu20} is provided by \CRANpkg{csurvey}.
This covariance estimator is constructed by recognizing that for different samples, different constraints are binding, so that different sets of domains are ``pooled'' to obtain the constrained estimator. The covariance estimator then is a mixture of pooled covariance matrix estimators, with the mixture distribution approximated via simulations. Using this mixture rather than the covariance matrix for the observed pooling provides confidence intervals with coverage that is closer to the target, while retaining the shorter lengths. The method introduced in \cite{liao23} further pools information across domains to provide upper and lower confidence interval bounds that also satisfy the constraints, effectively reducing the confidence interval length for domains with small sample sizes, and allowing for estimation and inference in empty domains.

The test \(H_0:\mathbf{A}\bar{\mathbf{y}}_U=0\) versus the one-sided \(H_1:\mathbf{A}\bar{\mathbf{y}}_U\geq0\) in \CRANpkg{csurvey} has improved power over the \(F\)-test implemented by applying the \code{anova} command to the object from \code{svyglm} in the \CRANpkg{survey} package. That \(F\)-test uses the two-sided alternative \(H_2:\mathbf{A}\bar{\mathbf{y}}_U\neq0\). For example, suppose we have measures of amounts of pollutants in samples of water from small lakes in a certain region. We also have measurements of distances from sources, such as factories or waste dumps, that are suspected of contributing to the pollution. The test of the null hypothesis that the amount of pollution does not depend on the distance will have greater power if the alternative is one-sided, that is, that the pollution amount is, on average, larger for smaller distances.

In the following sections, we show only the main code used to generate the results. Supplementary or non-essential code is available in the accompanying R script submitted with this manuscript.
\section{NHANES Example with monotonic domain means}\label{sec-monotonic}

The National Health and Nutrition Examination Survey (NHANES) combines in-person interviews and physical examinations to produce a comprehensive data set from a probability sample of residents of the U.S. The data are made available to the public at this \href{https://wwwn.cdc.gov/nchs/nhanes/}{CDC NHANES website}. The subset used in this example is derived from the 2009--2010 NHANES cycle and is provided in the \CRANpkg{csurvey} package as the \code{nhdat2} data set. We consider the task of estimating the average total cholesterol level (mg/dL), originally recorded as \texttt{LBXTC} in the raw NHANES data, by age (in years), corresponding to the original variable \texttt{RIDAGEYR}. Focusing on young adults aged 21 to 45, we analyze a sample of \(n = 1,933\) participants and specify the survey design using the associated weights and strata available in the data.

\begin{verbatim}
library(csurvey)
data(nhdat2, package = "csurvey")
dstrat <- svydesign(ids = ~id, strata = ~str, data = nhdat2, weight = ~wt)
\end{verbatim}

Then, to get the proposed constrained domain mean estimate, we use the \code{csvy} function in the \CRANpkg{csurvey} package. In this function, \code{incr} is a symbolic function used to define that the population domain means of \texttt{chol} are increasing with respect to the predictor \texttt{age}.

\begin{verbatim}
ans <- csvy(chol ~ incr(age), design = dstrat, n.mix = 100)
\end{verbatim}

The \texttt{n.mix} parameter controls the number of simulations used to estimate the mixture covariance matrix, with a default of \texttt{n.mix\ =\ 100}. To speed up computation, users can set \texttt{n.mix} to be a smaller number, e.g., 10.
We can extract from \code{ans} the CIC value for the constrained estimator as

\begin{verbatim}
cat("CIC (constrained):", ans$CIC, "\n")
\end{verbatim}

\begin{verbatim}
#> CIC (constrained): 32.99313
\end{verbatim}

and the CIC value for the unconstrained estimator as

\begin{verbatim}
cat("CIC (unconstrained):", ans$CIC.un, "\n")
\end{verbatim}

\begin{verbatim}
#> CIC (unconstrained): 51.65159
\end{verbatim}

We see that for this example, the constrained estimator has a smaller CIC value, which implies it is the better fit.

If we want to construct a contrast of domain means and get its standard error, we can use the \code{svycontrast} function, which is inherited from the \CRANpkg{survey} package. Note that in the \CRANpkg{survey} package, it is impossible to get a contrast estimate when there is any empty domain; the \CRANpkg{csurvey} package inherits this behavior. For example, suppose we want to compare the average cholesterol for the first thirteen age groups to the last twelve age groups; we can code it as:

\begin{verbatim}
cat(svycontrast(ans, list(avg = c(rep(-1, 13)/13, rep(1, 12)/12))), "\n")
\end{verbatim}

\begin{verbatim}
#> 19.44752
\end{verbatim}

The \code{csvy} function produces both the constrained fit and the corresponding unconstrained fit using methods from the \CRANpkg{survey} package. A visual comparison between the two fits can be easily obtained by applying the \code{plot} method to the resulting \code{csvy} object by specifying the argument \texttt{type\ =\ "both"}.
For illustration, Figure \ref{fig:nh1big} displays the estimated domain means along with 95\% confidence intervals, generated by the following code:

\begin{verbatim}
plot(ans, type = "both")
\end{verbatim}

\begin{figure}

{\centering \includegraphics[width=1\linewidth]{figures/nhanes1}

}

\caption{Estimates of average cholesterol level for 25 ages, with 95\% confidence intervals, for a stratified sample in the R dataset `nhdat2`, $n=1933$.}\label{fig:nh1big}
\end{figure}

The \code{confint} function can be used to extract the confidence interval for each domain mean. When the response is not Gaussian, then \code{type="link"} produces the confidence interval for the average systematic component over domains, while \code{type="response"} produces the confidence interval for the domain mean.

For this data set, the sample sizes for each of the twenty-five ages range from 54 to 99, so that none of the domains is ``small.'' To demonstrate \code{csvy} in the case of small domains, we next provide domain mean estimates and confidence intervals for \(400\) domains, arranged in a grid with 25 ages, four waist-size categories, and four income categories. We divide the waist measurement by height to make a relative girth, and split the observations into four groups with variable name \texttt{wcat}, which is a 4-level ordinal categorical variable representing waist-to-height ratio categories, computed from \texttt{BMXWAIST} (waist circumference in cm) and \texttt{BMXHT} (height in cm) in the body measures file \texttt{BMX\_F.XPT} from NHANES. We have income information for the subjects, in terms of a multiple of the federal poverty level. Our first income category includes those with income that is 75\% or less than the poverty line. The second category is .75 through 1.38 times the poverty level (1.38 determines Medicaid eligibility), the third category goes from above 1.38 to 3.5, and finally above 3.5 times the poverty level is the fourth category.
These indicators are contained in the variable \texttt{icat} (categorized income). We create \texttt{icat} from the \texttt{INDFMPIR} variable in the file \texttt{DEMO\_F.XPT} from NHANES.
+
+The domain sample sizes average only 4.8 observations, and there are 16 empty domains. The sample size for each domain can be checked with \texttt{ans\$nd}. To estimate the population means, we assume average cholesterol level is increasing in both age and waist size, but no ordering is imposed on income categories:
+
+\begin{verbatim}
+set.seed(1)
+ans <- csvy(chol ~ incr(age)*incr(wcat)*icat, design = dstrat)
+\end{verbatim}
+
+To extract estimates and confidence intervals for specific domains defined by the model, the user can use the \code{predict} function as follows:
+
+\begin{verbatim}
+domains <- data.frame(age = c(24, 35), wcat = c(2, 4), icat = c(2, 3))
+pans <- predict(ans, newdata = domains, se.fit = TRUE)
+cat("Predicted values, confidence intervals and standard errors for specified domains:\n")
+\end{verbatim}
+
+\begin{verbatim}
+#> Predicted values, confidence intervals and standard errors for specified domains:
+\end{verbatim}
+
+\begin{verbatim}
+print(pans)
+\end{verbatim}
+
+\begin{verbatim}
+#> $fit
+#> [1] 162.9599 202.0061
+#>
+#> $lwr
+#> [1] 148.6083 183.6379
+#>
+#> $upp
+#> [1] 177.8976 219.7764
+#>
+#> $se.fit
+#> [1] 7.471908 9.219167
+\end{verbatim}
+
+Figure \ref{fig:nh2} displays the domain mean estimates along with 95\% confidence intervals, highlighting in red the two domains specified in the \code{predict} call. The code to create this plot is below; the \texttt{control} argument lets the user adjust the aesthetics of the plot.
+
+\begin{figure}
+
+{\centering \includegraphics[width=1\linewidth]{figures/nhanes_grid3}
+
+}
+
+\caption{Constrained estimates of population domain means for 400 domains in a 25x4x4 grid. The increasing population domain estimates for the 25 ages are shown within the waist size and income categories. 
The blue bands indicate 95\% confidence intervals for the population domain means, with two specific domains, namely, (age, waist, income) = (24, 2, 2) and (35, 4, 3) marked in red. Empty domains are marked with a red 'x' sign.}\label{fig:nh2} +\end{figure} + +\begin{verbatim} +ctl <- list(x1lab = "waist", x2lab = "income", subtitle.size = 8) +plot(ans, x1 = "wcat", x2 = "icat", control = ctl, domains = domains) +\end{verbatim} + +The user can visualize the unconstrained fit by specifying \code{type = "unconstrained"} in the \code{plot} function. The corresponding output is presented in Figure \ref{fig:nh2un}. + +\begin{verbatim} +plot(ans, x1 = "wcat", x2 = "icat", control = ctl, type="unconstrained") +\end{verbatim} + +\begin{figure} + +{\centering \includegraphics[width=1\linewidth]{figures/nhanes_grid3_un} + +} + +\caption{Unconstrained estimates of population domain means for 400 domains in a 25x4x4 grid. The population domain estimates for the 25 ages are shown within the waist size and income categories. The green bands indicate 95\% confidence intervals for the population domain means. Empty domains are marked with a red 'x' sign.}\label{fig:nh2un} +\end{figure} + +Without the order constraints, the sample sizes are too small to provide valid estimates and confidence intervals, unless further model assumptions are used, as in some small area estimation methods. With order constraints, design-based estimation and inference are possible for substantially smaller domain sample sizes, compared to the unconstrained design-based estimation. + +\section{Constrained domain means with a block ordering}\label{constrained-domain-means-with-a-block-ordering} + +We consider the 2019 National Survey of College Graduates (NSCG), conducted by the U.S. Census Bureau and sponsored by the National Center for Science and Engineering Statistics (NCSES) within the National Science Foundation. 
The NSCG provides data on the characteristics of the nation's college graduates, with a focus on those in the science and engineering workforce. The datasets and documentation are available to the public on the \href{https://www.nsf.gov/statistics/srvygrads}{National Survey of College Graduates (NSCG) website}. Replicate weights are available separately from NCSES upon request. Because the size of the subsets of the NSCG survey used in this paper exceeds the size limit allowed for an R package stored on CRAN, the subsets are not included in the \CRANpkg{csurvey} package. Instead, we provide a link to the subsets \texttt{nscg19.rda} and \texttt{nscg19\_2.rda} used in this section at this \href{https://github.com/xliaosdsu/csurvey-data}{website}.
+
+The study variable of interest is annual salary (denoted as SALARY in the dataset), which exhibits substantial right-skewness in its raw form. To reduce the influence of outliers and improve model stability, we restrict the analysis to observations with annual salaries between \$30,000 and \$400,000, and apply a logarithmic transformation to the salary variable to address the skewness. We consider four predictors of salary:
+
+\begin{itemize}
+\item
+  Field of study (\texttt{field}, denoted by \texttt{NDGMEMG} in the raw dataset): This nominal variable defines the field of study for the highest degree. There are five levels: (1) Computer and mathematical sciences; (2) Biological, agricultural and environmental life sciences; (3) Physical and related sciences; (4) Social and related sciences; (5) Engineering. \emph{Block ordering constraint}: given the other predictors, the average annual salary for each of the fields (2) and (4) is less than for the STEM fields (1), (3) and (5).
+\item
+  Grouped year of award of highest degree (\texttt{hd\_year\_grouped}, denoted by \texttt{HDAY5}): This ordinal variable has five levels: (1) 1995 to 1999; (2) 2000 to 2004; (3) 2005 to 2009; (4) 2010 to 2014; (5) 2015 or later. 
\emph{Isotonic constraint}: given the other predictors, the average annual salary decreases with the year of award of the highest degree; i.e., the more experience respondents have, the higher, on average, their annual salary.
+\item
+  Highest degree type (\texttt{hd\_type}, denoted by DGRDG): The three levels are: (1) Bachelor's; (2) Master's; (3) Doctorate and Professional. \emph{Isotonic constraint}: given the other predictors, the average annual salary increases with the highest degree type.
+\item
+  Region code for employer (\texttt{region}, denoted by EMRG): This nominal variable defines the regions in which the respondents worked within the U.S. The nine levels are: (1) New England; (2) Middle Atlantic; (3) East North Central; (4) West North Central; (5) South Atlantic; (6) East South Central; (7) West South Central; (8) Mountain; (9) Pacific and US Territories. There is no constraint for this predictor.
+\end{itemize}
+
+This data set contains \(n=30,368\) observations in a four-dimensional grid of 675 domains, where the sample size in the domains ranges from one to 491. Here, we specify the shape and order constraints in a similar fashion to the previous examples. The symbolic routine \code{block.Ord} imposes a block ordering on \texttt{field}, with the order specified in the \code{order} argument. The \code{svydesign} function specifies a survey design with no clusters. The command \code{svrepdesign} creates a survey design with replicate weights, where the columns named ``RW0001'', ``RW0002'', \ldots, ``RW0320'' are the 320 NSCG replicate weights and the argument \code{weights = \textasciitilde w} denotes the sampling weight. The variance is computed as the sum of squared deviations of the replicates from the mean. 
The general formula for computing a variance estimate from replicate weights is
+\[v_{REP}(\hat{\theta})=\sum_{r=1}^{R}c_r(\hat{\theta}_r-\hat{\theta})^2,\]
+where the estimate \(\hat{\theta}\) is computed from the final full-sample survey weights and each replicate estimate \(\hat{\theta}_r\) is computed from the \(r\)th set of replicate weights (\(r=1,\cdots, R\)). The replication adjustment factor \(c_r\) is a multiplier for the \(r\)th squared difference. The argument \code{scale = 1} is an overall multiplier, and \code{rscale = 0.05} gives the replicate-specific multipliers, i.e., the values of \(c_r\) in the formula above.
+
+\begin{verbatim}
+load("./nscg19.rda")
+rds <- svrepdesign(data = nscg, repweights = dplyr::select(nscg, "RW0001":"RW0320"),
+    weights = ~w, combined.weights = TRUE, mse = TRUE, type = "other", scale = 1, rscale = 0.05)
+\end{verbatim}
+
+Estimates of domain means for the 225 domains for which the highest degree is a Bachelor's are shown in Figure \ref{fig:surface9}, as surfaces connecting the estimates over the grid of field indicators and year of award of highest degree. The block-ordering constraints can be seen in the surfaces, where the fields labeled 2 and 4 have lower means than those labeled 1, 3, and 5. The surfaces are also constrained to be decreasing in year of award of highest degree.
+
+To improve computational efficiency, the simulations used to estimate the mixture covariance matrix and the sampling distribution of the test statistic can be parallelized. This is enabled by setting the following R option:
+
+\begin{verbatim}
+options(csurvey.multicore = TRUE)
+\end{verbatim}
+
+The model is fitted using the following code:
+
+\begin{verbatim}
+ans <- csvy(logSalary~decr(hd_year_grouped)*incr(hd_type)*block.Ord(field,
+    order = c(2, 1, 2, 1, 2))*region, design = rds)
+\end{verbatim}
+
+The \code{plotpersp} function is used to create Figure \ref{fig:surface9}. 
It will create a three-dimensional estimated surface plot when there are at least two predictors. In this example, we use \code{plotpersp} to generate a 3D perspective plot of the estimated average log-transformed salary with respect to field and year of award of highest degree. Among the three \texttt{hd\_type} categories, the Bachelor's degree group has the highest number of observations. The \code{plotpersp} function visualizes the 225 domains corresponding to the most frequently observed level of this fourth predictor. Plot aesthetics are customized via the \texttt{control} argument: + +\begin{verbatim} +ctl <- list(categ = "region", categnm = c("New England", "Middle Atlantic", "East North Central", + "West North Central", "South Atlantic", "East South Central", "West South Central", "Mountain", + "Pacific and US Territories"), NCOL = 3, th = 60, xlab = "years since degree", + ylab = "field of study", zlab = "log(salary)") + +plotpersp(ans, x1="hd_year_grouped", x2="field", control = ctl) +\end{verbatim} + +\begin{figure} + +{\centering \includegraphics[width=0.6\linewidth]{figures/new_surfaces9} + +} + +\caption{Estimates of average log(salary) by field of study and year of degree, for observations where highest degree is a Bachelor's, for each of the nine regions.}\label{fig:surface9} +\end{figure} + +Estimates and confidence intervals for the 75 domain means associated with three of the regions are shown in Figure \ref{fig:NEreg}. The constrained estimates are shown as blue dots, and the unconstrained means are depicted as grey dots. The confidence intervals are indicated with blue bands for the constrained estimates and grey bands for the unconstrained estimates. + +For the Northeast Region, the average length of the constrained confidence intervals is .353, while the average length for the unconstrained confidence intervals is .477. 
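The standard errors behind these intervals come from the replicate-weight variance formula given earlier, $v_{REP}(\hat{\theta})=\sum_{r=1}^{R}c_r(\hat{\theta}_r-\hat{\theta})^2$. As a minimal illustration (a Python sketch with synthetic replicate estimates, not \CRANpkg{csurvey}'s implementation), using the NSCG settings $R=320$ and a common $c_r=0.05$:

```python
import numpy as np

def rep_variance(theta_hat, theta_reps, c=0.05):
    """Replicate-weight variance: sum over r of c_r * (theta_r - theta_hat)^2.
    A single common multiplier c_r = c is assumed, as in the NSCG settings."""
    theta_reps = np.asarray(theta_reps, dtype=float)
    return float(c * np.sum((theta_reps - theta_hat) ** 2))

# Synthetic example: a full-sample estimate and R = 320 replicate estimates
rng = np.random.default_rng(1)
theta = 11.3                                   # e.g., an average log(salary)
reps = theta + rng.normal(0.0, 0.02, size=320)
se = rep_variance(theta, reps) ** 0.5          # replicate standard error
```

A replicate set identical to the full-sample estimate yields zero variance, as expected.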
In the domain for those in the math and computer science field, with PhDs obtained in 2000-2004, the unconstrained log-salary estimate is well below the corresponding constrained estimate, because the latter is forced to be at least as large as the estimates for lower degrees and for newer PhDs. If the constraints hold in the population, the unconstrained confidence interval is then unlikely to capture the population value. As seen in the NHANES example in Section \ref{sec-monotonic}, the unconstrained estimators are unreliable when the sample domain size is small. The Pacific region has the largest sample sizes, ranging from 12 to 435 observations. With these larger sample sizes, most of the unconstrained estimates already satisfy the constraints, yet the average length for the constrained estimator is .301, compared with .338 for the unconstrained estimator, showing that the mixture covariance matrix yields more precise confidence intervals. Also shown is the region with the smallest sample sizes, the East South Central region; here the average length is .488 for the unconstrained confidence intervals and .444 for the constrained intervals.
+
+\begin{figure}
+
+{\centering \includegraphics[width=1\linewidth]{figures/newplot4}
+
+}
+
+\caption{Estimates of average log(salary) for the 75 domains in each of three regions. The blue dots represent the constrained domain mean estimates, while the grey dots represent the unconstrained domain mean estimates. The blue band is the 95\% confidence interval for the domains, using the constraints; the grey band is the 95\% unconstrained domain mean confidence interval.}\label{fig:NEreg}
+\end{figure}
+
+\section{One-sided testing}\label{one-sided-testing}
+
+For this example we again use the NSCG data set, but with some new variables and some variable groupings. 
First, instead of the grouped year of award of highest degree (\texttt{hd\_year\_grouped}), we use the actual calendar year in which the highest degree was awarded (\texttt{hd\_year}, denoted as \texttt{HDACYR} in the raw data set), and conduct the one-sided test for six pairs of consecutive calendar years. In addition, we choose father's education level (\texttt{daded}, denoted as \texttt{EDDAD}) as the main predictor; the interest is in determining whether salaries are higher for people whose father has more education. In the raw data set, \texttt{EDDAD} has seven categories, which we group into five for this example: 1 = no high school degree, 2 = high school degree but no college degree, 3 = bachelor's degree, 4 = master's degree, and 5 = PhD or professional degree. We further group \texttt{region} (\texttt{EMRG}) as: 1 = Northeast, 2 = North Central, 3 = Southeast, 4 = West, 5 = Pacific and Territories, and group \texttt{field} (\texttt{NDGMEMG}) as: 1 = Computer and Mathematical Sciences, Physical and Related Sciences, and Engineering (core STEM fields); 2 = Biological, Agricultural, and Environmental Life Sciences, and Social and Related Sciences (life and social sciences); 3 = Science and Engineering-Related Fields; and 4 = Non-Science and Engineering Fields (non-STEM). Finally, the highest degree type (denoted as \texttt{DGRDG} in the raw data) is used to limit the study sample to people whose highest degree is a Master's degree. The sample size is \(n=25,177\).
+
+It seems reasonable that people whose parents are more educated are more educated themselves, but if we control for the person's level of education, as well as field, region, and year of degree, will salary still increase with the level of the father's education? For this, the null hypothesis is that, within region, field, and year of degree, the average salary is constant over levels of father's education. 
The one-sided alternative is that salaries increase with father's education level.
+
+The constrained and unconstrained fits to the data under the alternative hypotheses are shown in Figure \ref{fig:test} for five regions and four fields, for degree years 2016-2017. The dots connected by solid blue lines represent the fit under the one-sided alternative, i.e., constrained to be increasing in father's education. The dots connected by solid red lines represent the fit under the two-sided alternative, i.e., with unconstrained domain means.
+
+\begin{figure}
+
+{\centering \includegraphics[width=1\linewidth]{figures/daded}
+
+}
+
+\caption{Estimates of average log(salary) by father's education level, for each of five regions and four fields, for subjects whose degree was attained in 2016-2017. The solid blue lines connect the estimates where the average salary is constrained to be increasing in father's education, and the solid red lines connect unconstrained estimates of average salary.}\label{fig:test}
+\end{figure}
+
+The \(p\)-values for the tests of the null hypotheses are given in Table \ref{tab:comppv}, where n/a is shown for the two-sided \(p\)-value in the case of an empty domain. The test is conducted for six pairs of consecutive calendar years, with sample sizes of 2,129, 2,795, 3,069, 2,895, 2,423, and 1,368, respectively. The one-sided test consistently has a small \(p\)-value, indicating that father's education level is positively associated with salary, even after controlling for region, degree type, and field.
+
+\begin{table}[!h]
+\centering
+\caption{\label{tab:comppv}One-sided and two-sided $p$-values for the test of the null hypothesis that salary is constant in father's education level. 
The two-sided test results in n/a when the grid has at least one empty domain.} +\centering +\begin{tabular}[t]{llllllllllll} +\toprule +\multicolumn{2}{c}{2008-09} & \multicolumn{2}{c}{2010-11} & \multicolumn{2}{c}{2012-13} & \multicolumn{2}{c}{2014-15} & \multicolumn{2}{c}{2016-17} & \multicolumn{2}{c}{2018-19} \\ +\cmidrule(l{3pt}r{3pt}){1-2} \cmidrule(l{3pt}r{3pt}){3-4} \cmidrule(l{3pt}r{3pt}){5-6} \cmidrule(l{3pt}r{3pt}){7-8} \cmidrule(l{3pt}r{3pt}){9-10} \cmidrule(l{3pt}r{3pt}){11-12} +one & two & one & two & one & two & one & two & one & two & one & two\\ +\midrule +.008 & n/a & <.001 & .018 & <.001 & <.001 & <.001 & n/a & .003 & .417 & <.001 & n/a\\ +\bottomrule +\end{tabular} +\end{table} + +The \(p\)-value of the one-sided test is included in the object \code{ans}. For example, to check the \(p\)-value for the fit for the year 2008 to 2009, we can fit the model and print out its summary table as: + +\begin{verbatim} +load("./nscg19_2.rda") +data <- nscg2 |> + dplyr::filter(hd_year %in% c(2008, 2009)) + +rds <- svrepdesign(data = data, repweights = dplyr::select(data, "RW0001":"RW0320"), weights = ~w, + combined.weights = TRUE, mse = TRUE, type = "other", + scale = 1, rscale = 0.05) + +set.seed(1) +ans <- csvy(logSalary ~ incr(daded) * field * region, design = rds, test = TRUE) +\end{verbatim} + +\begin{verbatim} +summary(ans) +\end{verbatim} + +\begin{verbatim} +#> Call: +#> csvy(formula = logSalary ~ incr(daded) * field * region, design = rds, +#> test = TRUE) +#> +#> Null deviance: 460.6182 on 97 degrees of freedom +#> Residual deviance: 391.3722 on 51 degrees of freedom +#> +#> Approximate significance of constrained fit: +#> edf mixture.of.Beta p.value +#> incr(daded):field:region 47 0.5536 0.0068 ** +#> --- +#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 
0.1 ' ' 1 +#> CIC (constrained estimator): 0.0275 +#> CIC (unconstrained estimator): 0.0354 +\end{verbatim} + +\section{Binary outcome}\label{binary-outcome} + +Finally, we use another subset of the NHANES 2009--2010 data to demonstrate how our method applies when the outcome is binary. This subset is included in the \CRANpkg{csurvey} package as \texttt{nhdat}. The construction of variables, sampling weights, and strata in this subset closely follows the approach described in Section \ref{sec-monotonic}. It contains \(n = 1,680\) observations with complete records on total cholesterol, age, height, and waist circumference for adults aged 21--40. The binary outcome indicates whether an individual has high total cholesterol, coded as 1 if total cholesterol exceeds 200 mg/dL, and 0 otherwise. We estimate the population proportion with high cholesterol by age, waist, and gender (1 = male, 2 = female). The waist variable, denoted as \texttt{wcat}, is a 4-level categorized ordinal variable representing waist-to-height ratios. + +It is reasonable to assume that, on average, the proportion of individuals with high cholesterol increases with both age and waist. The model is specified using the following code: + +\begin{verbatim} +data(nhdat, package = "csurvey") +dstrat <- svydesign(ids = ~ id, strata = ~ str, data = nhdat, weight = ~ wt) +set.seed(1) +ans <- csvy(chol ~ incr(age) * incr(wcat) * gender, design = dstrat, + family = binomial(link = "logit"), test = TRUE) +\end{verbatim} + +The CIC of the constrained estimator is smaller than that of the unconstrained estimator, and the one-sided hypothesis test has a \(p\)-value close to zero. 
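Since the fit uses the logit link, fitted domain proportions are obtained by applying the inverse link to the constrained fit on the linear-predictor scale (cf. the \code{type = "link"} and \code{type = "response"} options of \code{confint}). A minimal Python sketch with hypothetical linear predictors, not values from the fitted model:

```python
import math

def inv_logit(eta):
    """Inverse logit link: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical linear predictors for three domains, increasing in age/waist;
# the corresponding fitted proportions increase monotonically as well.
probs = [round(inv_logit(eta), 3) for eta in (-1.5, -0.4, 0.3)]
print(probs)  # [0.182, 0.401, 0.574]
```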
+
+\begin{verbatim}
+summary(ans)
+\end{verbatim}
+
+\begin{verbatim}
+#> Call:
+#> csvy(formula = chol ~ incr(age) * incr(wcat) * gender, design = dstrat,
+#> family = binomial(link = "logit"), test = TRUE)
+#>
+#> Null deviance: 2054.947 on 159 degrees of freedom
+#> Residual deviance: 1906.762 on 126 degrees of freedom
+#>
+#> Approximate significance of constrained fit:
+#> edf mixture.of.Beta p.value
+#> incr(age):incr(wcat):gender 34 0.5174 < 2.2e-16 ***
+#> ---
+#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
+#> CIC (constrained estimator): 0.3199
+#> CIC (unconstrained estimator): 1.2778
+\end{verbatim}
+
+The combination of age, waist, and gender gives 160 domains, so the average sample size per domain is only around 10. Due to the small sample sizes, the unconstrained estimator shows implausible jumps as age increases within each waist category. The constrained estimator, on the other hand, is more stable and tends to have narrower confidence intervals than the unconstrained Hájek estimator.
+
+\begin{figure}
+
+{\centering \includegraphics[width=1\linewidth]{figures/nhanes_bin}
+
+}
+
+\caption{Estimates of the probability of a high cholesterol level for each combination of age, waist and gender. The blue dots represent the constrained domain mean estimates, while the green dots represent the unconstrained domain mean estimates. The blue band is the 95\% confidence interval for the domains, using the constraints; the green band is the 95\% unconstrained domain mean confidence interval.}\label{fig:nhanesbin}
+\end{figure}
+
+\section{Discussion}\label{discussion}
+
+While model-based small area estimators, such as those implemented in the \CRANpkg{sae} package \citep{sae2015} and the \CRANpkg{emdi} package \citep{emdi2019}, are powerful tools for borrowing strength across domains, they rely on parametric assumptions that may be violated in practice. 
Design-based methods remain essential for official statistical agencies, as they provide transparent, model-free inference that is directly tied to the survey design. Estimation and inference for population domain means with survey data can be substantially improved when constraints based on natural orderings are imposed. The \CRANpkg{csurvey} package (version 1.15) \citep{csurvey2025} allows users to specify orderings on grids of domains and to obtain estimates of, and confidence intervals for, population domain means. The package also implements a design-based small area estimation method, which allows inference for population domain means even when the sample domain is empty, and which further improves estimates for domains with small sample sizes. The one-sided testing procedure available in \CRANpkg{csurvey} has higher power than the standard two-sided test and can also be applied in grids with some empty domains. Confidence intervals for domain means have better coverage rates and smaller widths than those produced by unconstrained estimation. Finally, the package provides functions that allow the user to easily visualize the data and the fits. The utility of the package has been demonstrated with well-known survey data sets.
+
+\textbf{Acknowledgment:} This work was partially funded by NSF MMS-1533804.
+
+\bibliography{article.bib}
+
+\address{%
+Xiyue Liao\\
+San Diego State University\\%
+San Diego State University\\ Department of Mathematics and Statistics\\ San Diego, California 92182, United States of America\\
+%
+%
+\textit{ORCiD: \href{https://orcid.org/0000-0002-4508-9219}{0000-0002-4508-9219}}\\%
+\href{mailto:xliao@sdsu.edu}{\nolinkurl{xliao@sdsu.edu}}%
+}
+
+\address{%
+Mary C. 
Meyer\\ +Colorado State University\\% +Colorado State University\\ Department of Statistics\\ Fort Collins, Colorado 80523, United States of America\\ +% +% +% +\href{mailto:meyer@stat.colostate.edu}{\nolinkurl{meyer@stat.colostate.edu}}% +} diff --git a/_articles/RJ-2025-032/RJ-2025-032.zip b/_articles/RJ-2025-032/RJ-2025-032.zip new file mode 100644 index 0000000000..72e3ec0c39 Binary files /dev/null and b/_articles/RJ-2025-032/RJ-2025-032.zip differ diff --git a/_articles/RJ-2025-032/RJournal.sty b/_articles/RJ-2025-032/RJournal.sty new file mode 100644 index 0000000000..351990be38 --- /dev/null +++ b/_articles/RJ-2025-032/RJournal.sty @@ -0,0 +1,358 @@ +% Package `RJournal' to use with LaTeX2e +% Copyright (C) 2010 by the R Foundation +% Copyright (C) 2013 by the R Journal +% +% Originally written by Kurt Hornik and Friedrich Leisch with subsequent +% edits by the editorial board +% +% CAUTION: +% Do not modify this style file. Any changes to this file will be reset when your +% article is submitted. +% If you must modify the style or add LaTeX packages to the article, these +% should be specified in RJwrapper.tex + +\NeedsTeXFormat{LaTeX2e}[1995/12/01] +\ProvidesPackage{RJournal}[2025/10/05 v0.17 RJournal package] + +\RequirePackage{tikz} + +% Overall page layout, fonts etc ----------------------------------------------- + +% Issues of of \emph{The R Journal} are created from the standard \LaTeX{} +% document class \pkg{report}. + +\RequirePackage{geometry} +\geometry{a4paper, + textwidth=14cm, top=1cm, bottom=1cm, + includehead,includefoot,centering, + footskip=1.5cm} +\raggedbottom +\sloppy +\clubpenalty = 10000 +\widowpenalty = 10000 +\brokenpenalty = 10000 +\usepackage{microtype} + + +\RequirePackage{fancyhdr} +\fancyhead{} +\fancyheadoffset{2cm} +\fancyhead[L]{\textsc{\RJ@sectionhead}} +\fancyhead[R]{\thepage} +\fancyfoot{} +\fancyfoot[L]{The R Journal Vol. 
\RJ@volume/\RJ@number, \RJ@month~\RJ@year} +\fancyfoot[R]{ISSN 2073-4859} +\pagestyle{fancy} + +% We use the following fonts (all with T1 encoding): +% +% rm & palatino +% tt & inconsolata +% sf & helvetica +% math & palatino + +\RequirePackage{microtype} + +\RequirePackage[scaled=0.92]{helvet} +\RequirePackage{palatino,mathpazo} +\RequirePackage[scaled=1.02]{inconsolata} +\RequirePackage[T1]{fontenc} + +\RequirePackage[hyphens]{url} +\RequirePackage[pagebackref]{hyperref} +\renewcommand{\backref}[1]{[p#1]} + +% Dark blue colour for all links +\RequirePackage{color} +\definecolor{link}{rgb}{0.45,0.51,0.67} +\hypersetup{ + colorlinks,% + citecolor=link,% + filecolor=link,% + linkcolor=link,% + urlcolor=link +} + +% Give the text a little room to breath +\setlength{\parskip}{3pt} +\RequirePackage{setspace} +\setstretch{1.05} + +% Issue and article metadata --------------------------------------------------- + +% Basic front matter information about the issue: volume, number, and +% date. + +\newcommand{\volume}[1]{\def\RJ@volume{#1}} +\newcommand{\volnumber}[1]{\def\RJ@number{#1}} +\renewcommand{\month}[1]{\def\RJ@month{#1}} +\renewcommand{\year}[1]{\def\RJ@year{#1}} + + +% Individual articles correspond to +% chapters, and are contained in |article| environments. This makes it +% easy to have figures counted within articles and hence hyperlinked +% correctly. + +% An article has an author, a title, and optionally a subtitle. We use +% the obvious commands for specifying these. Articles will be put in certain +% journal sections, named by \sectionhead. + +\newcommand {\sectionhead} [1]{\def\RJ@sectionhead{#1}} +\renewcommand{\author} [1]{\def\RJ@author{#1}} +\renewcommand{\title} [1]{\def\RJ@title{#1}} +\newcommand {\subtitle} [1]{\def\RJ@subtitle{#1}} + +% Control appearance of titles: make slightly smaller than usual, and +% suppress section numbering. 
See http://tex.stackexchange.com/questions/69749 +% for why we don't use \setcounter{secnumdepth}{-1} + +\usepackage[medium]{titlesec} +\usepackage{titletoc} +\titleformat{\section} {\normalfont\large\bfseries}{\arabic{section}}{1em}{} +\titleformat{\subsection}{\normalfont\normalsize\bfseries}{\arabic{section}.\arabic{subsection}}{0.5em}{} +\titlecontents{chapter} [0em]{}{}{}{\titlerule*[1em]{.}\contentspage} + +% Article layout --------------------------------------------------------------- + +% Environment |article| clears the article header information at its beginning. +% We use |\FloatBarrier| from the placeins package to keep floats within +% the article. +\RequirePackage{placeins} +\newenvironment{article}{\author{}\title{}\subtitle{}\FloatBarrier}{\FloatBarrier} + +% Refereed articles should have an abstract, so we redefine |\abstract| to +% give the desired style + +\renewcommand{\abstract}[1]{\noindent\textbf{Abstract} #1} +\renewenvironment{abstract}{\noindent\textbf{Abstract}~}{} + +% The real work is done by a redefined version of |\maketitle|. Note +% that even though we do not want chapters (articles) numbered, we +% need to increment the chapter counter, so that figures get correct +% labelling. + +\renewcommand{\maketitle}{% +\noindent + \chapter{\RJ@title}\refstepcounter{chapter} + \ifx\empty\RJ@subtitle + \else + \noindent\textbf{\RJ@subtitle} + \par\nobreak\addvspace{\baselineskip} + \fi + \ifx\empty\RJ@author + \else + \noindent\textit{\RJ@author} + \par\nobreak\addvspace{\baselineskip} + \fi + \@afterindentfalse\@nobreaktrue\@afterheading +} + +% Now for some ugly redefinitions. We do not want articles to start a +% new page. 
(Actually, we do, but this is handled via explicit +% \newpage +% +% The name@of@eq is a hack to get hyperlinks to equations to work +% within each article, even though there may be multiple eq.(1) +% \begin{macrocode} +\renewcommand\chapter{\secdef\RJ@chapter\@schapter} +\providecommand{\nohyphens}{% + \hyphenpenalty=10000\exhyphenpenalty=10000\relax} +\newcommand{\RJ@chapter}{% + \edef\name@of@eq{equation.\@arabic{\c@chapter}}% + \renewcommand{\@seccntformat}[1]{}% + \@startsection{chapter}{0}{0mm}{% + -2\baselineskip \@plus -\baselineskip \@minus -.2ex}{\p@}{% + \phantomsection\normalfont\huge\bfseries\raggedright}} + +% Book reviews should appear as sections in the text and in the pdf bookmarks, +% however we wish them to appear as chapters in the TOC. Thus we define an +% alternative to |\maketitle| for reviews. +\newcommand{\review}[1]{ + \pdfbookmark[1]{#1}{#1} + \section*{#1} + \addtocontents{toc}{\protect\contentsline{chapter}{#1}{\thepage}{#1.1}} +} + +% We want bibliographies as starred sections within articles. +% +\RequirePackage[sectionbib,round]{natbib} +\bibliographystyle{abbrvnat} +\renewcommand{\bibsection}{\section*{References}} + +% Equations, figures and tables are counted within articles, but we do +% not show the article number. For equations it becomes a bit messy to avoid +% having hyperref getting it wrong. + +% \numberwithin{equation}{chapter} +\renewcommand{\theequation}{\@arabic\c@equation} +\renewcommand{\thefigure}{\@arabic\c@figure} +\renewcommand{\thetable}{\@arabic\c@table} + +% Issue layout ----------------------------------------------------------------- + +% Need to provide our own version of |\tableofcontents|. We use the +% tikz package to get the rounded rectangle. Notice that |\section*| +% is really the same as |\chapter*|. 
+\renewcommand{\contentsname}{Contents} +\renewcommand\tableofcontents{% + \vspace{1cm} + \section*{\contentsname} + { \@starttoc{toc} } +} + +\renewcommand{\titlepage}{% + \thispagestyle{empty} + \hypersetup{ + pdftitle={The R Journal Volume \RJ@volume/\RJ@number, \RJ@month \RJ@year},% + pdfauthor={R Foundation for Statistical Computing},% + } + \noindent + \begin{center} + \fontsize{50pt}{50pt}\selectfont + The \raisebox{-8pt}{\includegraphics[height=77pt]{Rlogo-5}}\hspace{10pt} + Journal + + \end{center} + {\large \hfill Volume \RJ@volume/\RJ@number, \RJ@month{} \RJ@year \quad} + + \rule{\textwidth}{1pt} + \begin{center} + {\Large A peer-reviewed, open-access publication of the \\ + R Foundation for Statistical Computing} + \end{center} + + % And finally, put in the TOC box. Note the way |tocdepth| is adjusted + % before and after producing the TOC: thus, we can ensure that only + % articles show up in the printed TOC, but that in the PDF version, + % bookmarks are created for sections and subsections as well (provided + % that the non-starred forms are used). + \setcounter{tocdepth}{0} + \tableofcontents + \setcounter{tocdepth}{2} + \clearpage +} + +% Text formatting -------------------------------------------------------------- + +\newcommand{\R}{R} +\newcommand{\address}[1]{\addvspace{\baselineskip}\noindent\emph{#1}} +\newcommand{\email}[1]{\href{mailto:#1}{\normalfont\texttt{#1}}} + +% Simple font selection is not good enough. For example, |\texttt{--}| +% gives `\texttt{--}', i.e., an endash in typewriter font. Hence, we +% need to turn off ligatures, which currently only happens for commands +% |\code| and |\samp| and the ones derived from them. Hyphenation is +% another issue; it should really be turned off inside |\samp|. And +% most importantly, \LaTeX{} special characters are a nightmare. E.g., +% one needs |\~{}| to produce a tilde in a file name marked by |\file|. 
+% Perhaps a few years ago, most users would have agreed that this may be +% unfortunate but should not be changed to ensure consistency. But with +% the advent of the WWW and the need for getting `|~|' and `|#|' into +% URLs, commands which only treat the escape and grouping characters +% specially have gained acceptance + +\DeclareRobustCommand\code{\bgroup\@noligs\@codex} +\def\@codex#1{\texorpdfstring% +{{\normalfont\ttfamily\hyphenchar\font=-1 #1}}% +{#1}\egroup} +\newcommand{\kbd}[1]{{\normalfont\texttt{#1}}} +\newcommand{\key}[1]{{\normalfont\texttt{\uppercase{#1}}}} +\DeclareRobustCommand\samp{`\bgroup\@noligs\@sampx} +\def\@sampx#1{{\normalfont\texttt{#1}}\egroup'} +\newcommand{\var}[1]{{\normalfont\textsl{#1}}} +\let\env=\code +\newcommand{\file}[1]{{`\normalfont\textsf{#1}'}} +\let\command=\code +\let\option=\samp +\newcommand{\dfn}[1]{{\normalfont\textsl{#1}}} +% \acronym is effectively disabled since not used consistently +\newcommand{\acronym}[1]{#1} +\newcommand{\strong}[1]{\texorpdfstring% +{{\normalfont\fontseries{b}\selectfont #1}}% +{#1}} +\let\pkg=\strong +\newcommand{\CRANpkg}[1]{\href{https://CRAN.R-project.org/package=#1}{\pkg{#1}}}% +\let\cpkg=\CRANpkg +\newcommand{\ctv}[1]{\href{https://CRAN.R-project.org/view=#1}{\emph{#1}}} +\newcommand{\BIOpkg}[1]{\href{https://www.bioconductor.org/packages/release/bioc/html/#1.html}{\pkg{#1}}} + +% Example environments --------------------------------------------------------- +\RequirePackage{fancyvrb} +\RequirePackage{alltt} + +\DefineVerbatimEnvironment{example}{Verbatim}{} +\renewenvironment{example*}{\begin{alltt}}{\end{alltt}} + +% Support for output from Sweave, and generic session style code +% These used to have fontshape=sl for Sinput/Scode/Sin, but pslatex +% won't use a condensed font in that case. 
+ +% Update (2015-05-28 by DS): remove fontsize=\small to match example environment + +\DefineVerbatimEnvironment{Sinput}{Verbatim}{} +\DefineVerbatimEnvironment{Soutput}{Verbatim}{} +\DefineVerbatimEnvironment{Scode}{Verbatim}{} +\DefineVerbatimEnvironment{Sin}{Verbatim}{} +\DefineVerbatimEnvironment{Sout}{Verbatim}{} +\newenvironment{Schunk}{}{} + +% Mathematics ------------------------------------------------------------------ + +% The implementation of |\operatorname| is similar to the mechanism +% \LaTeXe{} uses for functions like sin and cos, and simpler than the +% one of \AmSLaTeX{}. We use |\providecommand| for the definition in +% order to keep the one of the \pkg{amstex} if this package has +% already been loaded. +% \begin{macrocode} +\providecommand{\operatorname}[1]{% + \mathop{\operator@font#1}\nolimits} +\RequirePackage{amsfonts} + +\renewcommand{\P}{% + \mathop{\operator@font I\hspace{-1.5pt}P\hspace{.13pt}}} +\newcommand{\E}{% + \mathop{\operator@font I\hspace{-1.5pt}E\hspace{.13pt}}} +\newcommand{\VAR}{\operatorname{var}} +\newcommand{\COV}{\operatorname{cov}} +\newcommand{\COR}{\operatorname{cor}} + +% Figures ---------------------------------------------------------------------- + +% For use with pandoc > 3.2.1 +\newsavebox\pandoc@box +\newcommand*\pandocbounded[1]{% scales image to fit in text height/width + \sbox\pandoc@box{#1}% + \Gscale@div\@tempa{\textheight}{\dimexpr\ht\pandoc@box+\dp\pandoc@box\relax}% + \Gscale@div\@tempb{\linewidth}{\wd\pandoc@box}% + \ifdim\@tempb\p@<\@tempa\p@\let\@tempa\@tempb\fi% select the smaller of both + \ifdim\@tempa\p@<\p@\scalebox{\@tempa}{\usebox\pandoc@box}% + \else\usebox{\pandoc@box}% + \fi% +} + +\RequirePackage[font=small,labelfont=bf]{caption} + +% Wide environments for figures and tables ------------------------------------- +\RequirePackage{environ} + +% An easy way to make a figure span the full width of the page +\NewEnviron{widefigure}[1][]{ +\begin{figure}[#1] +\advance\leftskip-2cm 
+\begin{minipage}{\dimexpr\textwidth+4cm\relax}% + \captionsetup{margin=2cm} + \BODY +\end{minipage}% +\end{figure} +} + +\NewEnviron{widetable}[1][]{ +\begin{table}[#1] +\advance\leftskip-2cm +\begin{minipage}{\dimexpr\textwidth+4cm\relax}% + \captionsetup{margin=2cm} + \BODY +\end{minipage}% +\end{table} +} diff --git a/_articles/RJ-2025-032/RJwrapper.tex b/_articles/RJ-2025-032/RJwrapper.tex new file mode 100644 index 0000000000..5fb543f497 --- /dev/null +++ b/_articles/RJ-2025-032/RJwrapper.tex @@ -0,0 +1,72 @@ +\documentclass[a4paper]{report} +\usepackage[utf8]{inputenc} +\usepackage[T1]{fontenc} +\usepackage{RJournal} +\usepackage{amsmath,amssymb,array} +\usepackage{booktabs} + + +% tightlist command for lists without linebreak +\providecommand{\tightlist}{% + \setlength{\itemsep}{0pt}\setlength{\parskip}{0pt}} + +\usepackage{longtable} + +% Always define CSL refs as bib entries are contained in separate doc +% Pandoc citation processing +%From Pandoc 3.1.8 +% definitions for citeproc citations +\NewDocumentCommand\citeproctext{}{} +\NewDocumentCommand\citeproc{mm}{% + \begingroup\def\citeproctext{#2}\cite{#1}\endgroup} +\makeatletter + % allow citations to break across lines + \let\@cite@ofmt\@firstofone + % avoid brackets around text for \cite: + \def\@biblabel#1{} + \def\@cite#1#2{{#1\if@tempswa , #2\fi}} +\makeatother +\newlength{\cslhangindent} +\setlength{\cslhangindent}{1.5em} +\newlength{\csllabelwidth} +\setlength{\csllabelwidth}{3em} +\newenvironment{CSLReferences}[2] % #1 hanging-indent, #2 entry-spacing + {\begin{list}{}{% + \setlength{\itemindent}{0pt} + \setlength{\leftmargin}{0pt} + \setlength{\parsep}{0pt} + % turn on hanging indent if param 1 is 1 + \ifodd #1 + \setlength{\leftmargin}{\cslhangindent} + \setlength{\itemindent}{-1\cslhangindent} + \fi + % set entry spacing + \setlength{\itemsep}{#2\baselineskip}}} + {\end{list}} +\usepackage{calc} +\newcommand{\CSLBlock}[1]{#1\hfill\break} 
+\newcommand{\CSLLeftMargin}[1]{\parbox[t]{\csllabelwidth}{#1}} +\newcommand{\CSLRightInline}[1]{\parbox[t]{\linewidth - \csllabelwidth}{#1}\break} +\newcommand{\CSLIndent}[1]{\hspace{\cslhangindent}#1} + + +% Any extra LaTeX you need in the preamble + + +\begin{document} + + +%% do not edit, for illustration only +\sectionhead{Contributed research article} +\volume{17} +\volnumber{4} +\year{2025} +\month{December} +\setcounter{page}{4} + +\begin{article} + \input{RJ-2025-032} +\end{article} + + +\end{document} diff --git a/_articles/RJ-2025-032/article.bib b/_articles/RJ-2025-032/article.bib new file mode 100644 index 0000000000..920284a000 --- /dev/null +++ b/_articles/RJ-2025-032/article.bib @@ -0,0 +1,523 @@ +@manual{csurvey2025, + title = {csurvey: Constrained Regression for Survey Data}, + author = {Xiyue Liao}, + year = {2025}, + note = {R package version 1.15}, + url = {https://github.com/xliaosdsu/csurvey}, + doi = {10.32614/CRAN.package.csurvey} +} +@article{sae2015, + author = {Isabel Molina and Yolanda Marhuenda}, + title = {{sae}: An {R} Package for Small Area Estimation}, + journal = {The R Journal}, + year = {2015}, + volume = {7}, + number = {1}, + pages = {81--98}, + month = {jun}, + url = {https://journal.r-project.org/archive/2015/RJ-2015-007/RJ-2015-007.pdf} +} +@article{emdi2019, + title = {The {R} Package {emdi} for Estimating and Mapping Regionally Disaggregated Indicators}, + author = {Ann-Kristin Kreutzmann and S\"oren Pannier and Natalia Rojas-Perilla and Timo Schmid and Matthias Templ and Nikos Tzavidis}, + journal = {Journal of Statistical Software}, + year = {2019}, + volume = {91}, + number = {7}, + pages = {1--33}, + doi = {10.18637/jss.v091.i07} +} +@manual{dtable, + title = {data.table: Extension of `data.frame`}, + author = {Tyson Barrett and Matt Dowle and Arun Srinivasan and Jan Gorecki and Michael Chirico and Toby Hocking and Benjamin Schwendinger and Ivan Krylov}, + year = {2025}, + note = {R package version 1.17.0}, + url = 
{https://CRAN.R-project.org/package=data.table}, + doi = {10.32614/CRAN.package.data.table} +} +@manual{igraph, + title = {{igraph}: Network Analysis and Visualization in {R}}, + author = {Cs{\'a}rdi, G{\'a}bor and Nepusz, Tam{\'a}s and Traag, Vincent and Horv{\'a}t, Szabolcs and Fabio Zanini and Daniel Noom and Kirill Müller}, + year = {2025}, + note = {R package version 2.1.4}, + doi = {10.5281/zenodo.7682609}, + url = {https://CRAN.R-project.org/package=igraph} +} +@manual{dplyr, + title = {dplyr: A Grammar of Data Manipulation}, + author = {Hadley Wickham and Romain François and Lionel Henry and Kirill Müller and Davis Vaughan}, + year = {2023}, + note = {R package version 1.1.4}, + url = {https://CRAN.R-project.org/package=dplyr}, + doi = {10.32614/CRAN.package.dplyr} +} +@book{ggplot2, + author = {Hadley Wickham}, + title = {ggplot2: Elegant Graphics for Data Analysis}, + publisher = {Springer-Verlag New York}, + year = {2016}, + isbn = {978-3-319-24277-4}, + url = {https://ggplot2.tidyverse.org} +} +@book{mass, + title = {Modern Applied Statistics with {S}}, + author = {W. N. Venables and B. D. Ripley}, + publisher = {Springer}, + edition = {Fourth}, + address = {New York}, + year = {2002}, + note = {ISBN 0-387-95457-0}, + url = {https://www.stats.ox.ac.uk/pub/MASS4/} +} +@manual{matrix, + title = {Matrix: Sparse and Dense Matrix Classes and Methods}, + author = {Douglas Bates and Martin Maechler and Mikael Jagan}, + year = {2025}, + note = {R package version 1.7-3}, + url = {https://CRAN.R-project.org/package=Matrix}, + doi = {10.32614/CRAN.package.Matrix} +} +@article{bic78, + title = {Estimating the Dimension of a Model}, + author = {Gideon Schwarz}, + year = {1978}, + journal = {Annals of Statistics}, + volume = {6}, + number = {2}, + pages = {461-464}, + doi = {10.1214/aos/1176344136} +} +@inproceedings{aic73, + author = {Akaike, Hirotugu}, + booktitle = {Second International Symposium on Information Theory}, + editor = {Petrov, B. N. 
and Csaki, F.}, + pages = {267--281}, + publisher = {{Akad{\'e}miai Kiad{\'o}}}, + title = {Information theory as an extension of the maximum likelihood principle}, + year = {1973} +} +@article{liao23, + author = {Xiyue Liao and Mary C. Meyer and Xiaoming Xu}, + journal = {Survey Methodology}, + number = {2}, + pages = {303-321}, + title = {Design-Based Estimation of Small and Empty Domains in Survey Data Analysis using Order Constraints}, + url = {http://www.statcan.gc.ca/pub/12-001-x/2024002/article/00010-eng.pdf}, + volume = {50}, + year = {2024} +} +@article{survey, + author = {Thomas Lumley}, + doi = {10.18637/jss.v009.i08}, + journal = {Journal of Statistical Software}, + number = {8}, + pages = {1-19}, + title = {Analysis of Complex Survey Samples}, + volume = {9}, + year = {2004} +} +@article{daniel14, + author = {Daniel Oberski}, + doi = {10.18637/jss.v057.i01}, + journal = {Journal of Statistical Software}, + number = {1}, + pages = {1--27}, + title = {lavaan.survey: An {R} Package for Complex Survey Analysis of Structural Equation Models}, + volume = {57}, + year = {2014} +} +@article{andreas13, + author = {Andreas Alfons and Matthias Templ}, + doi = {10.18637/jss.v054.i15}, + journal = {Journal of Statistical Software}, + number = {15}, + pages = {1--25}, + title = {Estimation of Social Exclusion Indicators from Complex Surveys: The {R} Package laeken}, + volume = {54}, + year = {2013} +} +@article{Imai05, + author = {Imai, Kosuke and Van Dyk, David A.}, + doi = {10.18637/jss.v014.i03}, + journal = {Journal of Statistical Software}, + number = {3}, + pages = {1--32}, + title = {{MNP}: {R} Package for Fitting the Multinomial Probit Model}, + volume = {14}, + year = {2005} +} +@article{xu23, + author = {Xiaoming Xu and Mary C. 
Meyer}, + journal = {Survey Methodology}, + number = {1}, + pages = {117-138}, + title = {One-Sided Testing of Population Domain Means in Surveys}, + url = {https://www150.statcan.gc.ca/n1/en/pub/12-001-x/2023001/article/00001-eng.pdf}, + volume = {49}, + year = {2023} +} +@article{boistard12, + author = {Helene Boistard and Hendrik P. Lopuhaa}, + journal = {Electronic Journal of Statistics}, + pages = {1967-1983}, + title = {Approximation of rejective sampling inclusion probabilities and application to high order correlations}, + volume = {6}, + year = {2012} +} +@article{xu20, + author = {Xiaoming Xu and Mary C. Meyer and Jean D. Opsomer}, + doi = {10.1016/j.jspi.2021.02.004}, + journal = {Journal of Statistical Planning and Inference}, + pages = {47-71}, + title = {Improved Variance Estimation for Inequality-Constrained Domain Mean Estimators using Survey Data}, + volume = {215}, + year = {2021} +} +@article{opsomer2005, + author = {Opsomer, J.D. and Miller, C.P.}, + journal = {Journal of Nonparametric Statistics}, + number = {5}, + pages = {593--611}, + title = {Selecting the amount of smoothing in nonparametric regression estimation for complex surveys}, + volume = {17}, + year = {2005} +} +@article{breidt17, + author = {F. J. Breidt and J. D. Opsomer}, + journal = {Statistical Science}, + number = {2}, + pages = {190-205}, + title = {Model-Assisted Survey Estimation with Modern Prediction Techniques}, + volume = {32}, + year = {2017} +} +@article{oliva20, + author = {Cristian Oliva-Avil\'es AND Mary C. Meyer AND Jean D. 
Opsomer}, +  journal = {Survey Methodology}, +  number = {2}, +  pages = {145-180}, +  title = {Estimation and Inference of Domain Means Subject to Qualitative Constraints}, +  url = {https://www150.statcan.gc.ca/n1/pub/12-001-x/2020002/article/00002-eng.htm}, +  volume = {46}, +  year = {2020} +} +@incollection{akaike73, +  address = {Budapest}, +  author = {Akaike, H.}, +  booktitle = {Second International Symposium on Information Theory}, +  editor = {Petrov, B. N. and Csaki, F.}, +  pages = {267-281}, +  publisher = {Akademiai Kiado}, +  title = {Information theory as an extension of the maximum likelihood principle}, +  year = {1973} +} +@manual{bcgam, +  author = {Cristian Oliva-Aviles and Mary C. Meyer}, +  note = {{R} package version 1.0, URL \url{https://CRAN.R-project.org/package=bcgam}}, +  title = {bcgam: {Bayesian} Constrained Generalized Additive Model}, +  year = {2018} +} +@article{brunk55, +  author = {Brunk, H. D.}, +  journal = {The Annals of Mathematical Statistics}, +  pages = {607-616}, +  title = {Maximum Likelihood Estimates of Monotone Parameters}, +  volume = {26}, +  year = {1955} +} +@article{brunk58, +  author = {Brunk, H. D.}, +  doi = {10.1214/aoms/1177728420}, +  journal = {The Annals of Mathematical Statistics}, +  number = {2}, +  pages = {437-454}, +  title = {On the Estimation of Parameters Restricted by Inequalities}, +  volume = {29}, +  year = {1958} +} +@manual{cgam, +  author = {Mary C. Meyer and Xiyue Liao}, +  note = {{R} package version 1.9, URL \url{https://CRAN.R-project.org/package=cgam}}, +  title = {cgam: Constrained Generalized Additive Model}, +  year = {2017} +} +@article{liao19, +  author = {Liao, X. and Meyer, M.
C.}, +  doi = {10.18637/jss.v089.i05}, +  journal = {Journal of Statistical Software}, +  number = {5}, +  pages = {1--24}, +  title = {cgam: An {R} Package for the Constrained Generalized Additive Model}, +  volume = {89}, +  year = {2019} +} +@incollection{duncan61, +  author = {Duncan, Otis Dudley}, +  booktitle = {Occupations and Social Status}, +  editor = {{Reiss, A. J. Jr.}}, +  publisher = {New York: Free Press [Table VI-1]}, +  title = {A socioeconomic index for all occupations}, +  year = {1961} +} +@article{fayherriot79, +  author = {Fay, R. E. and Herriot, R. A.}, +  journal = {Journal of the American Statistical Association}, +  pages = {269-277}, +  title = {Estimates of income for small places: an application of {J}ames-{S}tein procedures to census data}, +  volume = {74}, +  year = {1979} +} +@book{fenchel53, +  address = {Princeton, New Jersey}, +  author = {Fenchel, W.}, +  publisher = {{Mimeographed notes by D. W. Blackett, Princeton Univ. Press}}, +  title = {Convex cones, sets, and functions}, +  year = {1953} +} +@article{fuller99, +  author = {Fuller, W. A.}, +  journal = {Journal of Agricultural, Biological and Environmental Statistics}, +  pages = {331-345}, +  title = {Environmental surveys over time}, +  volume = {4}, +  year = {1999} +} +@incollection{hajek71, +  address = {Toronto}, +  author = {H\'ajek, J.}, +  booktitle = {Foundations of Statistical Inference}, +  editor = {V. P. Godambe and D. A. Sprott}, +  pages = {236}, +  publisher = {Holt, Rinehart and Winston}, +  title = {Comment on a paper by {D. Basu}}, +  year = {1971} +} +@article{horvitz52, +  author = {Horvitz, D. G. and Thompson, D. J.}, +  journal = {Journal of the American Statistical Association}, +  pages = {663-685}, +  title = {A generalization of sampling without replacement from a finite universe}, +  volume = {47}, +  year = {1952} +} +@article{kott2001, +  author = {Kott, P.
S.}, +  journal = {Journal of Official Statistics}, +  pages = {521-526}, +  title = {The delete-a-group jackknife}, +  volume = {17}, +  year = {2001} +} +@article{coneproj, +  author = {Xiyue Liao and Mary C. Meyer}, +  doi = {10.18637/jss.v061.i12}, +  journal = {Journal of Statistical Software}, +  number = {12}, +  pages = {1--22}, +  title = {coneproj: An {R} Package for the Primal or Dual Cone Projections with Routines for Constrained Regression}, +  volume = {61}, +  year = {2014} +} +@article{meyer08, +  author = {Meyer, M. C.}, +  journal = {Annals of Applied Statistics}, +  number = {3}, +  pages = {1013-1033}, +  title = {Inference using shape-restricted regression splines}, +  volume = {2}, +  year = {2008} +} +@article{meyer11, +  author = {Meyer, M. C. and Hackstadt, A. J. and Hoeting, J. A.}, +  journal = {Journal of Nonparametric Statistics}, +  number = {4}, +  pages = {867-884}, +  title = {{Bayesian} estimation and inference for generalised partial linear models using shape-restricted splines}, +  volume = {23}, +  year = {2011} +} +@article{meyer13, +  author = {Meyer, M. C.}, +  journal = {Journal of Nonparametric Statistics}, +  pages = {715-730}, +  title = {Semi-parametric additive constrained regression}, +  volume = {25}, +  year = {2013} +} +@article{meyer13b, +  author = {Meyer, M. C.}, +  journal = {Communications in Statistics}, +  pages = {1126-1139}, +  title = {A simple new algorithm for quadratic programming with applications in statistics}, +  volume = {42}, +  year = {2013} +} +@misc{nimblesoft18, +  author = {{NIMBLE Development Team}}, +  title = {NIMBLE: An {R} Package for Programming with {BUGS} models, Version 0.6-9}, +  url = {http://r-nimble.org}, +  year = {2018} +} +@article{opsomer08, +  author = {Opsomer, J. D. and Claeskens, G. and Ranalli, M. G. and Kauermann, G. and Breidt, F.
J.}, +  journal = {Journal of the Royal Statistical Society, Series B}, +  pages = {265-286}, +  title = {Nonparametric small area estimation using penalized spline regression}, +  volume = {70}, +  year = {2008} +} +@article{opsomer16, +  author = {Opsomer, J. D. and Breidt, F. J. and White, M. and Li, Y.}, +  journal = {Journal of Survey Statistics and Methodology}, +  number = {1}, +  pages = {43-70}, +  title = {Successive difference replication variance estimation in two-phase sampling}, +  volume = {4}, +  year = {2016} +} +@manual{Rteam, +  address = {Vienna, Austria}, +  author = {{R Core Team}}, +  organization = {R Foundation for Statistical Computing}, +  title = {{R}: A Language and Environment for Statistical Computing}, +  url = {https://www.R-project.org}, +  year = {2018} +} +@article{ramsay88, +  author = {Ramsay, J. O.}, +  journal = {Statistical Science}, +  pages = {425-461}, +  title = {Monotone regression splines in action}, +  volume = {3}, +  year = {1988} +} +@book{rao03, +  address = {Hoboken, New Jersey}, +  author = {Rao, J. N. K.}, +  publisher = {Wiley}, +  title = {Small Area Estimation}, +  year = {2003} +} +@article{rao08, +  author = {Rao, J. N. K.}, +  journal = {Rivista Internazionale di Scienze Sociali}, +  pages = {387-406}, +  title = {Some methods for small area estimation}, +  volume = {4}, +  year = {2008} +} +@book{rao15, +  address = {Hoboken, New Jersey}, +  author = {Rao, J. N. K. and Molina, Isabel}, +  edition = {2nd}, +  publisher = {Wiley}, +  title = {Small Area Estimation}, +  year = {2015} +} +@book{robertson88, +  address = {New York}, +  author = {Robertson, T. and Wright, F. T. and Dykstra, R. L.}, +  publisher = {John Wiley \& Sons}, +  title = {Order Restricted Statistical Inference}, +  year = {1988} +} +@book{rockafellar70, +  address = {New Jersey}, +  author = {Rockafellar, R.
T.}, +  publisher = {Princeton University Press}, +  title = {Convex Analysis}, +  year = {1970} +} +@article{schwarz78, +  author = {Schwarz, G.}, +  journal = {Annals of Statistics}, +  pages = {461-464}, +  title = {Estimating the dimension of a model}, +  volume = {6}, +  year = {1978} +} +@book{silvapulle05, +  address = {Hoboken, New Jersey}, +  author = {Silvapulle, M. J. and Sen, P.}, +  publisher = {Wiley}, +  title = {Constrained Statistical Inference}, +  year = {2005} +} +@article{vaneeden56, +  author = {van Eeden, C.}, +  journal = {Indagationes Mathematicae}, +  pages = {444-455}, +  title = {Maximum likelihood estimation of ordered probabilities}, +  volume = {18}, +  year = {1956} +} +@incollection{wollan86, +  address = {New York}, +  author = {Wollan, P. C. and Dykstra, R. L.}, +  booktitle = {Advances in Order Restricted Statistical Inference}, +  editor = {Dykstra, R. L. and Robertson, T. and Wright, F. T.}, +  pages = {279-295}, +  publisher = {Springer-Verlag}, +  title = {Conditional tests with an order restriction as a null hypothesis}, +  year = {1986} +} +@article{wu16, +  author = {Wu, J. and Meyer, M. C. and Opsomer, J. D.}, +  doi = {10.1002/cjs.11301}, +  journal = {Canadian Journal of Statistics}, +  number = {4}, +  pages = {431-444}, +  title = {Survey Estimation of Domain Means that Respect Natural Orderings}, +  volume = {44}, +  year = {2016} +} +@article{oliva19, +  author = {Cristian Oliva-Avil\'es and Mary C. Meyer and Jean D. Opsomer}, +  doi = {10.1002/cjs.11496}, +  journal = {Canadian Journal of Statistics}, +  number = {2}, +  pages = {315-331}, +  title = {Checking Validity of Monotone Domain Mean Estimators}, +  volume = {47}, +  year = {2019} +} +@article{breidt2000, +  author = {Breidt, F. and Opsomer, J.}, +  journal = {The Annals of Statistics}, +  number = {4}, +  pages = {1026-1053}, +  title = {Local Polynomial Regression Estimators in Survey Sampling}, +  volume = {28}, +  year = {2000} +} +@article{meyer99, +  author = {Meyer, M.
C.}, + journal = {Journal of Statistical Planning and Inference}, + pages = {13-31}, + title = {An extension of the mixed primal-dual bases algorithm to the case of more constraints than dimensions}, + volume = {81}, + year = {1999} +} +@book{fuller96, + address = {New York}, + author = {Fuller, W.A.}, + edition = {2nd}, + publisher = {Wiley}, + title = {Introduction to Statistical Time Series}, + year = {1996} +} +@book{sarndal92, + address = {New York}, + author = {S\"arndal, C.-E. and Swensson, B. and Wretman, J.}, + publisher = {Springer}, + title = {Model Assisted Survey Sampling}, + year = {1992} +} +@article{zhang13, + author = {Zhang, X. and Onufrak, S. and Holt, J. B. and Croft, J. B.}, + journal = {Preventing Chronic Disease}, + pages = {120252}, + title = {A multilevel approach to estimating small area childhood obesity prevalence at the census block-group level.}, + volume = {10}, + year = {2013} +} diff --git a/_articles/RJ-2025-032/figures/daded.png b/_articles/RJ-2025-032/figures/daded.png new file mode 100644 index 0000000000..4fcea37902 Binary files /dev/null and b/_articles/RJ-2025-032/figures/daded.png differ diff --git a/_articles/RJ-2025-032/figures/new_surfaces9.pdf b/_articles/RJ-2025-032/figures/new_surfaces9.pdf new file mode 100644 index 0000000000..4c8d534117 Binary files /dev/null and b/_articles/RJ-2025-032/figures/new_surfaces9.pdf differ diff --git a/_articles/RJ-2025-032/figures/new_surfaces9.png b/_articles/RJ-2025-032/figures/new_surfaces9.png new file mode 100644 index 0000000000..e6ad913364 Binary files /dev/null and b/_articles/RJ-2025-032/figures/new_surfaces9.png differ diff --git a/_articles/RJ-2025-032/figures/newplot4.png b/_articles/RJ-2025-032/figures/newplot4.png new file mode 100644 index 0000000000..1160b25931 Binary files /dev/null and b/_articles/RJ-2025-032/figures/newplot4.png differ diff --git a/_articles/RJ-2025-032/figures/nhanes1.png b/_articles/RJ-2025-032/figures/nhanes1.png new file mode 100644 index 
0000000000..5e5017d338 Binary files /dev/null and b/_articles/RJ-2025-032/figures/nhanes1.png differ diff --git a/_articles/RJ-2025-032/figures/nhanes_bin.png b/_articles/RJ-2025-032/figures/nhanes_bin.png new file mode 100644 index 0000000000..94b9a02644 Binary files /dev/null and b/_articles/RJ-2025-032/figures/nhanes_bin.png differ diff --git a/_articles/RJ-2025-032/figures/nhanes_grid3.png b/_articles/RJ-2025-032/figures/nhanes_grid3.png new file mode 100644 index 0000000000..16ef62d650 Binary files /dev/null and b/_articles/RJ-2025-032/figures/nhanes_grid3.png differ diff --git a/_articles/RJ-2025-032/figures/nhanes_grid3_un.png b/_articles/RJ-2025-032/figures/nhanes_grid3_un.png new file mode 100644 index 0000000000..f307736895 Binary files /dev/null and b/_articles/RJ-2025-032/figures/nhanes_grid3_un.png differ diff --git a/_articles/RJ-2025-032/nscg19.rda b/_articles/RJ-2025-032/nscg19.rda new file mode 100644 index 0000000000..032a438f25 Binary files /dev/null and b/_articles/RJ-2025-032/nscg19.rda differ diff --git a/_articles/RJ-2025-032/nscg19_2.rda b/_articles/RJ-2025-032/nscg19_2.rda new file mode 100644 index 0000000000..f7cc3012bd Binary files /dev/null and b/_articles/RJ-2025-032/nscg19_2.rda differ diff --git a/_articles/RJ-2025-032/rcode/article.R b/_articles/RJ-2025-032/rcode/article.R new file mode 100644 index 0000000000..7d533a87d3 --- /dev/null +++ b/_articles/RJ-2025-032/rcode/article.R @@ -0,0 +1,243 @@ +library(coneproj) +library(csurvey) +library(survey) +library(data.table) +library(tidyverse) +library(MASS) +library(rlang) +library(zeallot) +library(patchwork) +library(knitr) +library(tinytex) +#options(csurvey.multicore = TRUE) + +#------------------------------------------------ +#NHANES Example with monotonic domain means +#------------------------------------------------ +data(nhdat2, package = "csurvey") +#use ?nhdat2 to see details of this data set +#specify the design: +dstrat <- svydesign(ids = ~id, strata = ~str, data = 
nhdat2, weights = ~wt) +#fit the model +set.seed(1) +ans <- csvy(chol ~ incr(age), design = dstrat, n.mix = 100) + +cat("CIC (constrained):", ans$CIC, "\n") +cat("CIC (unconstrained):", ans$CIC.un, "\n") +#estimated contrast between the average of the last 12 and the first 13 domain means +print(svycontrast(ans, list(avg = c(rep(-1, 13)/13, rep(1, 12)/12)))) + +#Figure 1 +#see ?plot_csvy_control to find out how to adjust aesthetics +ctl <- list(unconstrained_color = "grey80") +png("nhanes1.png") +plot(ans, type = 'both', control = ctl) +dev.off() + +#Figure 2 +data(nhdat2, package = 'csurvey') +#specify the design: +dstrat <- svydesign(ids = ~id, strata = ~str, data = nhdat2, weights = ~wt) +#use parallel computing: +#options(csurvey.multicore=TRUE) +set.seed(1) +ans <- csvy(chol ~ incr(age)*incr(wcat)*icat, design = dstrat) + +domains <- data.frame(age = c(24, 35), wcat = c(2, 4), icat = c(2, 3)) +pans <- predict(ans, newdata = domains, se.fit = TRUE) +cat("Predicted values, confidence intervals and standard errors for specified domains:\n") +print(pans) + +ctl <- list(x1lab = 'waist', x2lab = 'income', subtitle.size = 8) + +png('nhanes_grid3.png') +plot(ans, x1 = "wcat", x2 = "icat", control = ctl, domains = domains) +dev.off() + +#Figure 3 +png('nhanes_grid3_un.png') +plot(ans, x1 = "wcat", x2 = "icat", control = ctl, type="unconstrained") +dev.off() + +#---------------------------------------------------------------------- +#Example: Constrained domain means with a block ordering +#---------------------------------------------------------------------- +load('../nscg19.rda') +rds <- svrepdesign(data = nscg, repweights = dplyr::select(nscg, "RW0001":"RW0320"), +                   weights = ~w, combined.weights = TRUE, mse = TRUE, type = "other", scale = 1, rscale = 0.05) + +#options(csurvey.multicore = TRUE) +ans <- csvy(logSalary~decr(hd_year_grouped)*incr(hd_type)*block.Ord(field, order = c(2, 1, 2, 1, 2))*region, design = rds) + +#Figure 4 +ctl <- list(categ = "region", categnm = c("New England", "Middle Atlantic", "East North Central", "West North Central",
"South Atlantic", "East South Central", "West South Central", "Mountain", "Pacific and US Territories"), + NCOL = 3, th = 60, xlab = "years since degree", ylab = "field of study", zlab = "log(salary)") + +png('new_surfaces9.png') +plotpersp(ans, x1="hd_year_grouped", x2="field", control = ctl) +dev.off() + +#Figure 5 +#use regions: 3, 9, 6 +#use the patchwork package to stack three plots into one +ctl1 <- list(x1lab = " ",x2lab = "Field of study", x3lab = "Year of award of highest degree", + x1_labels = FALSE, x2_labels = FALSE, x3_labels = c(1995, 2000, 2005, 2010, 2015), + x4_labels = c("New England", "Middle Atlantic", "East North Central", "West North Central", "South Atlantic", "East South Central", "West South Central", + "Mountain", "Pacific and US Territories"), ynm = "log(Salary)", ci = TRUE, legend = FALSE, ylab=FALSE, unconstrained_color = 'grey80', x4_vals = 3, subtitle.size = 10) + +p1 <- plot(ans, x1 = 'hd_type', x2 = 'field', x3 = 'hd_year_grouped', + control = ctl1, type = 'both') + +ctl2 <- list(x1lab = " ", x2lab = " ", x3lab = " ", x1_labels = FALSE, x2_labels = FALSE, x3_labels = FALSE, + x4_labels = c("New England", "Middle Atlantic", "East North Central", "West North Central", "South Atlantic", "East South Central", "West South Central", + "Mountain", "Pacific and US Territories"), ynm = 'log(Salary)', ci = TRUE, legend = FALSE, ylab = TRUE, unconstrained_color = 'grey80', x4_vals = 9) + +p2 <- plot(ans, x1 = 'hd_type', x2 = 'field', x3 = 'hd_year_grouped', + control = ctl2, type = 'both') + +ctl3 <- list(x1lab = 'Highest degree type', x2lab = " ", x3lab = " ", + x1_labels = c('BS', 'MS', 'PHD'), x2_labels = FALSE, x3_labels = FALSE, + x4_labels = c("New England", "Middle Atlantic", "East North Central", "West North Central", "South Atlantic", "East South Central", "West South Central", + "Mountain", "Pacific and US Territories"), ynm = 'log(Salary)', ci = TRUE, legend = FALSE, ylab = FALSE, x1size = 2.5, + unconstrained_color = 'grey80', 
x4_vals = 6, angle = 0, hjust = .5) + +p3 <- plot(ans, x1 = 'hd_type', x2 = 'field', x3 = 'hd_year_grouped', control = ctl3, type = 'both') + +#use the patchwork package to stack three plots into one +library(patchwork) + +png('newplot4.png') +print(p1 / p2 / p3) +dev.off() + +#-------------------------- +#Example: One-sided testing +#-------------------------- +load('../nscg19_2.rda') + +#code to generate Table 1 +pvals.table = data.frame(csvy_onesided = numeric(6), svyglm_twosided = numeric(6)) +rownames(pvals.table) = c('2008-09', '2010-11', '2012-13', '2014-15', '2016-17', '2018-19') + +#set a random seed because the p-value for csvy is simulated +set.seed(1) +#svyglm may throw an error when a domain is empty, so its calls are wrapped in tryCatch +#the failing svyglm cases are included on purpose, to show that csvy can work in some cases where svyglm fails +for(i in 1:6){ +  print(i) +  years <- c(2008, 2009) + 2 * (i - 1) +  #select the data for specific years +  data <- nscg2 |> dplyr::filter(hd_year %in% years) +  rds <- svrepdesign(data = data, repweights = dplyr::select(data, "RW0001":"RW0320"), weights = ~w, +                     combined.weights = TRUE, mse = TRUE, type = "other", scale = 1, rscale = 0.05) + +  ans <- csvy(logSalary ~ incr(daded) * field * region, design = rds, test = TRUE) +  #p-value for the constrained estimate +  pvals.table[i, 1] = ans$pval + +  #when a domain is empty, svyglm may fail with an error, so we use tryCatch for the +  #objects fitted by svyglm in this for loop +  ansu1 <- tryCatch({svyglm(formula = logSalary ~ factor(field) * factor(daded) * factor(region), design = rds, data = data)}, error = function(e) {e}) +  ansu2 <- tryCatch({svyglm(formula = logSalary ~ factor(field) * factor(region), design = rds, data = data)}, error = function(e) {e}) +  #if ansu1 is an error, anova will fail as well, so we also use tryCatch for the
+  #object fitted by anova in this for loop
+  punc <- tryCatch({as.numeric(anova(ansu1, ansu2)[6])}, error = function(e) {e})
+  if (inherits(punc, "error")) {
+    #the p-value for the unconstrained estimate is not available
+    pvals.table[i, 2] = NA
+  } else {
+    pvals.table[i, 2] = punc
+  }
+}
+
+comppv = data.frame(year = c(rep("2008-09", 2), rep("2010-11", 2), rep("2012-13", 2),
+                             rep("2014-15", 2), rep("2016-17", 2), rep("2018-19", 2)),
+                    test = rep(c("one", "two"), 6), pvals = as.vector(t(round(pvals.table, 3))))
+
+#print out the results in Table 1
+#cat("Table 1:", "\n")
+comppv = kable(t(comppv))
+print(comppv)
+
+#compile the LaTeX file holding the table into a pdf file with tinytex::pdflatex
+tinytex::pdflatex("../tables/comppv.tex")
+
+
+#Figure 6
+data <- nscg2 |> dplyr::filter(hd_year %in% c(2016, 2017))
+#specify a survey design with replicate weights
+rds <- svrepdesign(data = data, repweights = dplyr::select(data, "RW0001":"RW0320"), weights = ~w,
+                   combined.weights = TRUE, mse = TRUE, type = "other", scale = 1, rscale = 0.05)
+#fit the model
+ans <- csvy(logSalary ~ incr(daded) * field * region, design = rds)
+
+#create the plot
+ctl <- list(x1lab = 'Field', x2lab = 'Region', subtitle.size = 10, x1size = 3, x2size = 3, x3lab = "father's education",
+            ynm = "log(salary)", unconstrained_color = "red", ci = FALSE,
+            x2_labels = c('Northeast', 'North Central', 'Southeast', 'West', 'Pacific & Territories'))
+
+png('daded.png')
+plot(ans, x1 = 'field', x2 = 'region', control = ctl, type = 'both')
+dev.off()
+
+#show the one-sided test result for 2008-2009
+load("../nscg19_2.rda")
+data <- nscg2 |> dplyr::filter(hd_year %in% c(2008, 2009))
+rds <- svrepdesign(data = data, repweights = dplyr::select(data, "RW0001":"RW0320"), weights = ~w,
+                   combined.weights = TRUE, mse = TRUE, type = "other",
+                   scale = 1, rscale = 0.05)
+set.seed(1)
+ans <- csvy(logSalary ~ incr(daded) * field * region, design = rds, test = TRUE)
+summary(ans)
+
+#------------------------
+#Example: Binary outcome
+#------------------------
+#use ?nhdat to check the details of this data set
+data(nhdat, package = 'csurvey')
+#specify the design
+dstrat <- svydesign(ids = ~ id, strata = ~ str, data = nhdat, weight = ~ wt)
+
+#fit the model
+set.seed(1)
+ans <- csvy(chol ~ incr(age) * incr(wcat) * gender, design = dstrat, family = binomial(link = "logit"), test = TRUE)
+
+#show the one-sided test result
+summary(ans)
+
+#Figure 7
+#see ?plot_csvy_control for more information about how to adjust aesthetics
+ctl <- list(x1lab = 'waist', x2lab = 'gender', ynm = 'high cholesterol level', x2_labels = c('male', 'female'), ci = TRUE, subtitle.size = 8)
+png('nhanes_bin.png')
+plot(ans, x1 = 'wcat', x2 = 'gender', type = "both", control = ctl)
+dev.off()
diff --git a/_articles/RJ-2025-032/tables/comppv.pdf b/_articles/RJ-2025-032/tables/comppv.pdf
new file mode 100644
index 0000000000..502a6a0222
Binary files /dev/null and b/_articles/RJ-2025-032/tables/comppv.pdf differ
diff --git a/_articles/RJ-2025-032/tables/comppv.tex b/_articles/RJ-2025-032/tables/comppv.tex
new file mode 100644
index 0000000000..e223f84cb3
--- /dev/null
+++ b/_articles/RJ-2025-032/tables/comppv.tex
@@ -0,0 +1,20 @@
+\documentclass{article}
+\usepackage{booktabs} % Ensures proper table formatting
+\begin{document}
+
+\begin{table}
+\begin{center}
+\begin{tabular}{cccccccccccc}
+\multicolumn{2}{c}{2008-09} &\multicolumn{2}{c}{2010-11} &\multicolumn{2}{c}{2012-13} &\multicolumn{2}{c}{2014-15} &\multicolumn{2}{c}{2016-17} &\multicolumn{2}{c}{2018-19}\\
+ one & two & one & two & one & two & one & two & one & two & one & two \\
+ \hline
+ .008 & n/a & $<$.001 & .018 & $<$.001 & $<$.001 & $<$.001 & n/a & .003 & .417 & $<$.001 & n/a
+ \\
+ \hline
+ \end{tabular}
+\caption{One-sided and two-sided $p$-values for the test of the null hypothesis that salary is constant in father's education level.
The two-sided test results in n/a when the grid has at least one empty domain. } +\label{comppv} +\end{center} +\end{table} + +\end{document} \ No newline at end of file diff --git a/_articles/RJ-2025-033/RJ-2025-033.Rmd b/_articles/RJ-2025-033/RJ-2025-033.Rmd new file mode 100644 index 0000000000..bc5bf85f00 --- /dev/null +++ b/_articles/RJ-2025-033/RJ-2025-033.Rmd @@ -0,0 +1,1174 @@ +--- +title: 'LHD: An All-encompassing R Package for Constructing Optimal Latin Hypercube + Designs' +abstract: | + Optimal Latin hypercube designs (LHDs), including maximin distance + LHDs, maximum projection LHDs and orthogonal LHDs, are widely used in + computer experiments. It is challenging to construct such designs with + flexible sizes, especially for large ones, for two main reasons. One + reason is that theoretical results, such as algebraic constructions + ensuring the maximin distance property or orthogonality, are only + available for certain design sizes. For design sizes where theoretical + results are unavailable, search algorithms can generate designs. + However, their numerical performance is not guaranteed to be optimal. + Another reason is that when design sizes increase, the number of + permutations grows exponentially. Constructing optimal LHDs is a + discrete optimization process, and enumeration is nearly impossible + for large or moderate design sizes. Various search algorithms and + algebraic constructions have been proposed to identify optimal LHDs, + each having its own pros and cons. We develop the R package LHD which + implements various search algorithms and algebraic constructions. We + embedded different optimality criteria into each of the search + algorithms, and they are capable of constructing different types of + optimal LHDs even though they were originally invented to construct + maximin distance LHDs only. 
Another input argument that controls + maximum CPU time is added to each of the search algorithms to let + users flexibly allocate their computational resources. We demonstrate + functionalities of the package by using various examples, and we + provide guidance for experimenters on finding suitable optimal + designs. The LHD package is easy to use for practitioners and possibly + serves as a benchmark for future developments in LHD. +author: +- name: Hongzhi Wang + affiliation: | + [wanghongzhi.ut@gmail.com](wanghongzhi.ut@gmail.com){.uri} +- name: Qian Xiao + affiliation: |- + Department of Statistics, School of Mathematical Sciences, Shanghai + Jiao Tong University + address: + - 800 Dongchuan Road, Minhang, Shanghai, 200240 + - China + - | + [qian.xiao@sjtu.edu.cn](qian.xiao@sjtu.edu.cn){.uri} +- name: Abhyuday Mandal + affiliation: Department of Statistics, University of Georgia + address: + - 310 Herty Drive, Athens, GA 30602 + - USA + - | + [amandal@stat.uga.edu](amandal@stat.uga.edu){.uri} +date: '2026-01-05' +date_received: '2023-11-20' +journal: + firstpage: 20 + lastpage: 36 +volume: 17 +issue: 4 +slug: RJ-2025-033 +packages: + cran: [] + bioc: [] +preview: preview.png +bibliography: wang-xiao-mandal.bib +CTV: ~ +legacy_pdf: yes +legacy_converted: yes +output: + rjtools::rjournal_web_article: + self_contained: yes + toc: no + mathjax: https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-chtml-full.js + md_extension: -tex_math_single_backslash +draft: no + +--- + + +::::::::::::: article +## Introduction + +Computer experiments are widely used in scientific research and +industrial production, where complex computer codes, commonly +high-fidelity simulators, generate data instead of real physical systems +[@sacks1989designs; @fang2005design]. The outputs from computer +experiments are deterministic (that is, free of random errors), and +therefore replications are not needed +[@butler2001optimal; @joseph2008orthogonal; @ba2015optimal]. 
Latin hypercube designs (LHDs, [@mckay1979comparison]) are arguably the
+most popular type of experimental design for computer experiments
+[@fang2005design; @xiao2018construction]: they avoid replicated levels
+in every dimension and have uniform one-dimensional projections.
+Depending on practical needs, there are various types of optimal LHDs,
+including space-filling LHDs, maximum projection LHDs, and orthogonal
+LHDs. There is a rich literature on the construction of such designs,
+but it is still very challenging to find good ones for moderate to
+large design sizes
+[@ye1998orthogonal; @fang2005design; @joseph2015maximum; @xiao2018construction].
+One key reason is that theoretical results, such as algebraic
+constructions which guarantee the maximin distance property or
+orthogonality, are only established for specific design sizes. These
+constructions provide theoretical guarantees on the design quality but
+are limited in their applicability. For design sizes where such
+theoretical guarantees do not exist, search algorithms can generate
+designs. However, the performance of search-based designs depends on
+the algorithm employed, the search space explored, and the
+computational resources allocated, so they cannot be guaranteed to be
+optimal. The second key reason is that constructing optimal LHDs is a
+discrete optimization problem: enumerating all possible solutions would
+guarantee the optimal design of a given size, but this approach becomes
+computationally infeasible because the number of permutations grows
+exponentially with the design size.
+
+An LHD with $n$ runs and $k$ factors is an $n \times k$ matrix with
+each column being a permutation of the numbers $1, \ldots, n$.
+Throughout this paper, $n$ denotes the run size and $k$ denotes the
+factor size.
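This definition can be sketched in a few lines of base R (a hypothetical illustration written for this article, not code from the `LHD` package, whose `rLHD()` function serves this purpose):

``` r
#A hypothetical base-R sketch (not from the LHD package): an LHD is
#built by letting every column be an independent random permutation
#of 1, ..., n.
random_lhd <- function(n, k) {
  replicate(k, sample(n))
}

set.seed(2024)
X <- random_lhd(n = 5, k = 3)
#Each column projects uniformly onto the levels 1, ..., n:
all(apply(X, 2, sort) == 1:5)  #TRUE
```

Because every column is a permutation, each factor is observed exactly once at each of the $n$ levels, which is the uniform one-dimensional projection property mentioned above.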
A space-filling LHD spreads its design points over the sampled region
+as evenly as possible, minimizing the unsampled region and thus
+accounting for uniformity in all dimensions. Different criteria have
+been proposed to measure designs' space-filling properties, including
+the maximin and minimax distance criteria
+[@johnson1990minimax; @morris1995exploratory], the discrepancy criteria
+[@hickernell1998generalized; @fang2002centered; @fang2005design] and the
+entropy criterion [@shewry1987maximum]. Since there are as many as
+$(n!)^{k}$ candidate LHDs for a given design size, it is nearly
+impossible to find the most space-filling one by enumeration when $n$
+and $k$ are moderate or large. In the current literature, both search
+algorithms
+[@morris1995exploratory; @leary2003optimal; @joseph2008orthogonal; @ba2015optimal; @kenny2000algorithmic; @jin2005efficient; @liefvendahl2006study; @grosso2009finding; @chen2013optimizing]
+and algebraic constructions
+[@zhou2015space; @xiao2017construction; @wang2018optimal] are used to
+construct space-filling LHDs.
+
+Space-filling designs often focus on the full-dimensional space. To
+further improve the space-filling properties of all possible
+sub-spaces, [@joseph2015maximum] proposed maximum projection designs.
+By considering all two- to $(k-1)$-dimensional sub-spaces, maximum
+projection LHDs (MaxPro LHDs) are generally more space-filling than
+classic maximin distance LHDs. The construction of MaxPro LHDs is also
+challenging, especially for large ones, and [@joseph2015maximum]
+proposed a simulated annealing (SA) based algorithm. In the `LHD`
+package, we combined the MaxPro criterion with other algorithms, such
+as the particle swarm optimization (PSO) and genetic algorithm (GA)
+frameworks, leading to many better MaxPro LHDs; see
+Section 3 for examples.
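For reference, the MaxPro criterion of [@joseph2015maximum] for an $n \times k$ design $\textbf{X}$ is $\psi(\textbf{X}) = \{\binom{n}{2}^{-1} \sum_{i<j} \prod_{m=1}^{k} (x_{im}-x_{jm})^{-2}\}^{1/k}$, with smaller values indicating better projection properties. A minimal illustrative R sketch of this formula (written for this article, not the package's own implementation; the package's `MaxProCriterion()` function computes the criterion):

``` r
#Illustrative computation of the MaxPro criterion (smaller is better);
#this sketch is not the LHD package's implementation.
maxpro_crit <- function(X) {
  n <- nrow(X); k <- ncol(X)
  total <- 0
  for (i in 1:(n - 1)) {
    for (j in (i + 1):n) {
      #reciprocal of the product of squared coordinate-wise differences
      total <- total + 1 / prod((X[i, ] - X[j, ])^2)
    }
  }
  (total / choose(n, 2))^(1 / k)
}

X <- cbind(c(2, 4, 3, 1, 5), c(1, 3, 2, 4, 5), c(4, 3, 2, 5, 1))
maxpro_crit(X)  #approximately 0.5375
```

Note that the criterion is well defined for LHDs because no two runs share a level in any column, so every factor-wise difference is nonzero.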
+
+Unlike space-filling LHDs, which minimize the similarities among rows,
+orthogonal LHDs (OLHDs) are another popular type of optimal design,
+one that instead targets similarities among columns. Specifically,
+OLHDs have zero column-wise correlations. Algebraic constructions are
+available for certain design sizes
+[@ye1998orthogonal; @cioppa2007efficient; @steinberg2006construction; @sun2010construction; @sun2009construction; @yang2012construction; @georgiou2014some; @butler2001optimal; @tang1993orthogonal; @lin2009construction],
+but there are many design sizes for which theoretical results are not
+available. In the `LHD` package, we implemented the average absolute
+correlation criterion and the maximum absolute correlation criterion
+[@georgiou2009orthogonal] with SA, PSO, and GA to identify both OLHDs
+and nearly orthogonal LHDs (NOLHDs) for almost all design sizes.
+
+This paper introduces the R package `LHD`, available on the
+Comprehensive R Archive Network
+(), which
+implements currently popular search algorithms and algebraic
+constructions for building maximin distance LHDs, MaxPro LHDs, OLHDs,
+and NOLHDs. We embedded different optimality criteria, including the
+maximin distance criterion, the MaxPro criterion, the average absolute
+correlation criterion, and the maximum absolute correlation criterion,
+in each of the search algorithms, which were originally invented to
+construct maximin distance LHDs only
+[@morris1995exploratory; @leary2003optimal; @joseph2008orthogonal; @liefvendahl2006study; @chen2013optimizing];
+as a result, each algorithm in the package can construct different
+types of optimal LHDs. To let users flexibly allocate their
+computational resources, we also added an input argument that limits
+the maximum CPU time of each algorithm, so that users can easily define
+how and when they want the algorithms to stop.
An algorithm can
+stop in one of two ways: either when the user-defined maximum number
+of iterations is reached or when the user-defined maximum CPU time is
+exceeded. For example, users can allow the algorithm to run for a
+specified number of iterations without restricting the CPU time, or
+they can set a CPU time limit that stops the algorithm regardless of
+the number of iterations completed. After an algorithm completes or is
+stopped, the number of iterations completed and the average CPU time
+per iteration are reported to users. The R package `LHD` is an
+integrated tool for users with little or no background in design
+theory, who can easily find optimal LHDs of desired sizes. Many of the
+newly found designs improve on existing ones; see Section 3.
+
+The remainder of the paper is organized as follows. Section 2
+illustrates different optimality criteria for LHDs. Section 3
+demonstrates some popular search algorithms and their implementation
+details in the `LHD` package, along with examples. Section 4 discusses
+some useful algebraic constructions as well as examples of how to
+implement them via the developed package. Section 5 reviews other R
+packages for Latin hypercubes and provides a comparative discussion.
+Section 6 concludes with a summary.
+
+## Optimality Criteria for LHDs {#OC}
+
+Various criteria have been proposed to measure designs' space-filling
+properties
+[@johnson1990minimax; @hickernell1998generalized; @fang2002centered]. In
+this paper, we focus on the currently popular maximin distance
+criterion [@johnson1990minimax], which seeks to scatter design points
+over the experimental domain so that the minimum distance between
+points is maximized. Let $\textbf{X}$ denote an LHD matrix throughout
+this paper.
+Define the $L_q$-distance between two runs $x_i$ and $x_j$ of
+$\textbf{X}$ as
+$d_q(x_i, x_j) = \left\{ \sum_{m=1}^{k} \vert x_{im}-x_{jm}\vert ^q \right\}^{1/q}$,
+where $q$ is a positive integer. Define the $L_q$-distance of the
+design $\textbf{X}$ as
+$d_q(\textbf{X}) = \min \left\{ d_q(x_i, x_j) : 1 \leq i < j \leq n \right\}$.
+A maximin distance LHD maximizes $d_q(\textbf{X})$. As a
+computationally convenient surrogate, [@morris1995exploratory] proposed
+ranking designs by the $\phi_p$ criterion
+
+$$\phi_{p} = \left\{ \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} d_q(x_i, x_j)^{-p} \right\}^{1/p}, (\#eq:E2)$$
+
+where $p$ is a positive integer; minimizing $\phi_p$ with a large $p$
+tends to maximize $d_q(\textbf{X})$. The package also implements the
+maximum projection (MaxPro) criterion of [@joseph2015maximum] and the
+average and maximum absolute column-wise correlation criteria of
+[@georgiou2009orthogonal]; smaller values are better under all of these
+criteria. The function `rLHD` generates a random LHD of a given size:
+
+``` r
+> X = rLHD(n = 5, k = 3); X #This generates a 5 by 3 random LHD, denoted as X
+     [,1] [,2] [,3]
+[1,]    2    1    4
+[2,]    4    3    3
+[3,]    3    2    2
+[4,]    1    4    5
+[5,]    5    5    1
+```
+
+The input arguments for the function `rLHD` are the run size `n` and
+the factor size `k`. Continuing with the randomly generated LHD X
+above, we evaluate it with respect to the different optimality
+criteria. For example,
+
+``` r
+> phi_p(X) #The maximin L1-distance criterion.
+[1] 0.3336608
+> phi_p(X, p = 10, q = 2) #The maximin L2-distance criterion.
+[1] 0.5797347
+> MaxProCriterion(X) #The maximum projection criterion.
+[1] 0.5375482
+> AvgAbsCor(X) #The average absolute correlation criterion.
+[1] 0.5333333
+> MaxAbsCor(X) #The maximum absolute correlation criterion.
+[1] 0.9
+```
+
+The input arguments of the function `phi_p` are an LHD matrix `X`,
+`p`, and `q`, where `p` and `q` come directly from Equation
+\@ref(eq:E2). Note that the default settings within the function
+`phi_p` are $p=15$ and $q=1$ (the Manhattan distance), and users can
+change these settings. The functions `MaxProCriterion`, `AvgAbsCor`,
+and `MaxAbsCor` each take only one input argument, an LHD matrix `X`.
+
+## Search Algorithms for Optimal LHDs with Flexible Sizes {#Algs}
+
+### Simulated Annealing Based Algorithms
+
+Simulated annealing (SA, [@kirkpatrick1983optimization]) is a
+probabilistic optimization algorithm whose name comes from the
+annealing process in metallurgy. [@morris1995exploratory] proposed a
+modified SA that randomly exchanges elements within columns of an LHD
+to seek potential improvements. If such an exchange leads to a better
+LHD under a given optimality criterion, the exchange is maintained.
Otherwise, it is kept with a probability of
+$\hbox{exp}[-(\Phi(\textbf{X}_{new})-\Phi(\textbf{X}))/T]$, where $\Phi$
+is a given optimality criterion, $\textbf{X}$ is the original LHD,
+$\textbf{X}_{new}$ is the LHD after the exchange, and $T$ is the
+current temperature. In this article, all of the optimality criteria
+outlined in Section 2 are to be minimized, so only minimization
+problems are considered. This acceptance probability guarantees that an
+exchange leading to a slightly worse LHD has a higher chance of being
+kept than an exchange leading to a significantly worse LHD, because the
+former has a smaller value of
+$\Phi(\textbf{X}_{new})-\Phi(\textbf{X})$. The exchange procedure is
+applied iteratively to improve the LHD. When there are no improvements
+after a certain number of attempts, the current temperature $T$ is
+annealed (reduced). Note that an exchange with a large value of
+$\Phi(\textbf{X}_{new})-\Phi(\textbf{X})$ (one leading to a
+significantly worse LHD) is more likely to be kept during the early
+phase of the search, when $T$ is relatively high, and less likely to be
+kept later, as $T$ decreases. The best LHD is identified after the
+algorithm converges or the budget constraint is reached. In the `LHD`
+package, the function `SA()` implements this algorithm:
+
+``` r
+SA(n, k, N = 10, T0 = 10, rate = 0.1, Tmin = 1, Imax = 5, OC = "phi_p",
+p = 15, q = 1, maxtime = 5)
+```
+
+Table \@ref(tab:T1) provides an overview of all the input arguments in
+`SA()`. `n` and `k` are the desired run size and factor size. `T0` is
+the initial temperature, `rate` is the temperature decrease rate, and
+`Tmin` is the minimum temperature. If the current temperature falls
+below `Tmin`, the current loop in the algorithm stops and the current
+number of iterations increases by one.
There are two stopping criteria for the +entire function: when the current number of iterations reaches the +maximum (denoted as `N` in the function) or when the cumulative CPU time +reaches the maximum (denoted as `maxtime` in the function), +respectively. Either of those will trigger the stop of the function, +whichever is earlier. For input argument `OC` (optimality criterion), +"phi_p\" returns maximin distance LHDs, "MaxProCriterion\" returns +MaxPro LHDs, and "AvgAbsCor\" or "MaxAbsCor\" returns orthogonal LHDs. + +:::: center +::: {#T1} + -------------------------------------------------------------------------------------------------------- + Argument Description + ----------- -------------------------------------------------------------------------------------------- + `n` A positive integer that defines the number of rows (or run size) of output LHD. + + `k` A positive integer that defines the number of columns (or factor size) of output LHD. + + `N` A positive integer that defines the maximum number of iterations in the algorithm. + + A large value of `N` will result in a high CPU time, and it is recommended to be no + + greater than 500. The default is set to be 10. + + `T0` A positive number that defines the initial temperature. The default is set to be 10, + + which means the temperature anneals from 10 in the algorithm. + + `rate` A positive percentage that defines the temperature decrease rate, and it should be + + in (0,1). For example, rate=0.25 means the temperature decreases by 25% each time. + + The default is set to be 10%. + + `Tmin` A positive number that defines the minimum temperature allowed. When current + + temperature becomes smaller or equal to `Tmin`, the stopping criterion for current + + loop is met. The default is set to be 1. + + `Imax` A positive integer that defines the maximum perturbations the algorithm will try + + without improvements before temperature is reduced. The default is set to be 5. 
+
+              For CPU time consideration, `Imax` is recommended to be no greater than 5.
+
+  `OC`        An optimality criterion. The default setting is "phi_p", and it could be one of
+
+              the following: "phi_p", "AvgAbsCor", "MaxAbsCor", "MaxProCriterion".
+
+  `p`         A positive integer, which is one parameter in the $\phi_{p}$ formula, and `p` is
+
+              preferred to be large. The default is set to be 15.
+
+  `q`         A positive integer, which is one parameter in the $\phi_{p}$ formula, and `q` could
+
+              be either 1 or 2. If `q` is 1, the Manhattan (rectangular) distance will be
+
+              calculated. If `q` is 2, the Euclidean distance will be calculated.
+
+  `maxtime`   A positive number, which indicates the expected maximum CPU time in minutes.
+
+              For example, `maxtime` = 3.5 means the CPU time will be no greater than three
+
+              and a half minutes. The default is set to be 5.
+  --------------------------------------------------------------------------------------------------------
+
+  : (#tab:T1) Overview of Input Arguments of the `SA` Function
+:::
+::::
+
+[@leary2003optimal] modified the SA algorithm of [@morris1995exploratory]
+to search for optimal orthogonal array-based LHDs (OALHDs).
+[@tang1993orthogonal] showed that OALHDs tend to have better
+space-filling properties than random LHDs. The SA in [@leary2003optimal]
+starts with a random OALHD rather than a random LHD; the remaining steps
+are the same as in the SA of [@morris1995exploratory]. Note that an
+OALHD exists only if a corresponding initial OA exists. In the `LHD`
+package, the function `OASA()` implements this modified SA algorithm:
+
+``` r
+OASA(OA, N = 10, T0 = 10, rate = 0.1, Tmin = 1, Imax = 5, OC = "phi_p",
+p = 15, q = 1, maxtime = 5)
+```
+
+All input arguments are the same as in `SA()`, except that `OA`
+must be an orthogonal array.
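The acceptance rule shared by the SA variants above can be sketched as follows (an illustrative fragment written for this article, not the package's internal code; `phi_old` and `phi_new` stand for $\Phi(\textbf{X})$ and $\Phi(\textbf{X}_{new})$):

``` r
#Illustrative SA acceptance step for a criterion Phi to be minimized
#(not the LHD package's internal code). A mildly worse candidate is
#kept with probability exp(-(phi_new - phi_old) / temperature), which
#is close to 1 early in the search, when the temperature is high.
accept_exchange <- function(phi_old, phi_new, temperature) {
  if (phi_new <= phi_old) return(TRUE)  #improvement: always keep
  runif(1) < exp(-(phi_new - phi_old) / temperature)
}

accept_exchange(0.30, 0.25, temperature = 10)  #improvement, so TRUE
#At temperature 10, a deterioration of 0.01 survives with probability
exp(-0.01 / 10)
```

As the temperature is annealed, the survival probability of a given deterioration shrinks toward zero, which is why the search settles into a good design late in the run.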
+
+[@joseph2008orthogonal] proposed another modified SA to identify
+orthogonal-maximin LHDs, which considers both the orthogonality and
+the maximin distance criteria. The algorithm starts by generating a
+random LHD and then chooses the column that has the largest average
+pairwise correlation with all other columns. Next, the algorithm
+selects the row that has the largest total row-wise distance to all
+other rows. Then, the element at the selected row and column is
+exchanged with a random element from the same column. The remaining
+steps are the same as in the SA of [@morris1995exploratory]. In the
+`LHD` package, the function `SA2008()` implements this algorithm:
+
+``` r
+SA2008(n, k, N = 10, T0 = 10, rate = 0.1, Tmin = 1, Imax = 5, OC = "phi_p",
+p = 15, q = 1, maxtime = 5)
+```
+
+All input arguments are the same as in `SA()`.
+
+### Particle Swarm Optimization Algorithms
+
+Particle swarm optimization (PSO, [@kennedy1995particle]) is a
+metaheuristic optimization algorithm inspired by the social behaviors
+of animals. [@chen2013optimizing] adapted the classic PSO algorithm and
+proposed LaPSO to identify maximin distance LHDs. Since this is a
+discrete optimization task, LaPSO redefines the steps in which each
+particle updates its velocity and position in the general PSO
+framework. In the `LHD` package, the function `LaPSO()` implements this
+algorithm:
+
+``` r
+LaPSO(n, k, m = 10, N = 10, SameNumP = 0, SameNumG = n/4, p0 = 1/(k - 1),
+OC = "phi_p", p = 15, q = 1, maxtime = 5)
+```
+
+Table \@ref(tab:T2) provides an overview of all the input arguments in
+`LaPSO()`, where `n`, `k`, `N`, `OC`, `p`, `q`, and `maxtime` are
+exactly the same as the input arguments in the function `SA()`. `m` is
+the number of particles, that is, the number of candidate solutions in
+the PSO framework.
`SameNumP` and
+`SameNumG` are two tuning parameters that specify how many exchanges
+are performed to reduce the Hamming distance to the personal best and
+the global best, respectively. `p0` is a tuning parameter giving the
+probability of a random swap of two elements in the current column of
+the current particle, which prevents the algorithm from getting stuck
+at a local optimum. [@chen2013optimizing] provided the following
+suggestions: `SameNumP` is approximately $n/2$ when `SameNumG` is $0$,
+`SameNumG` is approximately $n/4$ when `SameNumP` is $0$, and `p0`
+should be between $1/(k-1)$ and $2/(k-1)$. The stopping criterion of
+the function is the same as that of the function `SA`.
+
+:::: center
+::: {#T2}
+  ---------------------------------------------------------------------------------------------------------
+  Argument     Description
+  ------------ --------------------------------------------------------------------------------------------
+  `n`          A positive integer that defines the number of rows (or run size) of output LHD.
+
+  `k`          A positive integer that defines the number of columns (or factor size) of output LHD.
+
+  `m`          A positive integer that defines the number of particles, where each particle is a
+
+               candidate solution. A large value of `m` will result in a high CPU time, and it is
+
+               recommended to be no greater than 100. The default is set to be 10.
+
+  `N`          A positive integer that defines the maximum number of iterations in the algorithm.
+
+               A large value of `N` will result in a high CPU time, and it is recommended to be no
+
+               greater than 500. The default is set to be 10.
+
+  `SameNumP`   A non-negative integer that defines how many elements in the current column of the
+
+               current particle should be the same as in the corresponding Personal Best. SameNumP
+
+               can be 0, 1, 2, ..., n, where 0 means to skip the element exchange, which is the
+
+               default setting.
+  
+ + `SameNumG` A non-negative integer that defines how many elements in current column of + + current particle should be the same as corresponding Global Best. SameNumG can + + be 0, 1, 2, ..., n, where 0 means to skip the element exchange. The default setting + + is n/4. Note that SameNumP and SameNumG cannot be 0 at the same time. + + `p0` A probability of exchanging two randomly selected elements in current column of + + current particle LHD. The default is set to be 1/(k - 1). + + `OC` An optimality criterion. The default setting is "phi_p\", and it could be one of + + the following: "phi_p\", "AvgAbsCor\", "MaxAbsCor\", "MaxProCriterion\". + + `p` A positive integer, which is one parameter in the $\phi_{p}$ formula, and `p` is preferred + + to be large. The default is set to be 15. + + `q` A positive integer, which is one parameter in the $\phi_{p}$ formula, and `q` could be + + either 1 or 2. If `q` is 1, the Manhattan (rectangular) distance will be calculated. + + If `q` is 2, the Euclidean distance will be calculated. + + `maxtime` A positive number, which indicates the expected maximum CPU time, and it is + + measured by minutes. For example, `maxtime`=3.5 indicates the CPU time will + + be no greater than three and a half minutes. The default is set to be 5. + --------------------------------------------------------------------------------------------------------- + + : (#tab:T2) Overview of Input Arguments of the `LaPSO` Function +::: +:::: + +### Genetic Algorithms + +The genetic algorithm (GA) is a nature-inspired metaheuristic +optimization algorithm that mimics Charles Darwin's idea of natural +selection [@holland1992adaptation; @goldberg1989genetic]. +[@liefvendahl2006study] proposed a version of GA for identifying maximin +distance LHDs. They implement the column exchange technique to solve the +discrete optimization task. 
In the `LHD` package, the function `GA()`
+implements this algorithm:
+
+``` r
+GA(n, k, m = 10, N = 10, pmut = 1/(k - 1), OC = "phi_p", p = 15, q = 1,
+maxtime = 5)
+```
+
+Table \@ref(tab:T3) provides an overview of all the input arguments in
+`GA()`, where `n`, `k`, `N`, `OC`, `p`, `q`, and `maxtime` are exactly
+the same as the input arguments in the function `SA()`. `m` is the
+population size, that is, the number of candidate solutions in each
+iteration, and it must be an even number. `pmut` is the tuning
+parameter that controls how likely mutation is to occur. When mutation
+occurs, two randomly selected elements are exchanged in the current
+column of the current LHD. `pmut` serves the same purpose as `p0` in
+`LaPSO()`, preventing the algorithm from getting stuck at a local
+optimum, and it is recommended to be $1/(k-1)$. The stopping criterion
+of the function is the same as that of the function `SA`.
+
+:::: center
+::: {#T3}
+  --------------------------------------------------------------------------------------------------------
+  Argument    Description
+  ----------- --------------------------------------------------------------------------------------------
+  `n`         A positive integer that defines the number of rows (or run size) of output LHD.
+
+  `k`         A positive integer that defines the number of columns (or factor size) of output LHD.
+
+  `m`         A positive even integer that stands for the population size. The default is set to
+
+              be 10. A large value of `m` will result in a high CPU time, and it is recommended
+
+              to be no greater than 100.
+
+  `N`         A positive integer that defines the maximum number of iterations in the algorithm.
+
+              A large value of `N` will result in a high CPU time, and it is recommended to be no
+
+              greater than 500. The default is set to be 10.
+
+  `pmut`      A probability for mutation.
When the mutation happens, two randomly selected + + elements in current column of current LHD will be exchanged. The default is + + set to be 1/(k - 1). + + `OC` An optimality criterion. The default setting is "phi_p\", and it could be one of + + the following: "phi_p\", "AvgAbsCor\", "MaxAbsCor\", "MaxProCriterion\". + + `p` A positive integer, which is one parameter in the $\phi_{p}$ formula, and `p` is preferred + + to be large. The default is set to be 15. + + `q` A positive integer, which is one parameter in the $\phi_{p}$ formula, and `q` could be + + either 1 or 2. If `q` is 1, the Manhattan (rectangular) distance will be calculated. + + If `q` is 2, the Euclidean distance will be calculated. + + `maxtime` A positive number, which indicates the expected maximum CPU time, and it is + + measured by minutes. For example, `maxtime`=3.5 indicates the CPU time will + + be no greater than three and a half minutes. The default is set to be 5. + -------------------------------------------------------------------------------------------------------- + + : (#tab:T3) Overview of Input Arguments of the `GA` Function +::: +:::: + +### Illustrating Examples for the Implemented Search Algorithms + +This subsection demonstrates some examples on how to use the search +algorithms in the developed `LHD` package. In +Table \@ref(tab:T4), we summarize +the R functions of the algorithms discussed in the previous subsections, +which can be used to identify different types of optimal LHDs. Users who +seek fast solutions can use the default settings of the input arguments +after specifying the design sizes. See the following examples. + +:::: center +::: {#A1} + ------------------------------------------------------------------------------------------- + Function Description + ---------- -------------------------------------------------------------------------------- + SA Returns an LHD via the simulated annealing algorithm [@morris1995exploratory]. 
  OASA       Returns an LHD via the orthogonal-array-based simulated annealing algorithm
             [@leary2003optimal], where an OA of the required design size must exist.

  SA2008     Returns an LHD via the simulated annealing algorithm with the multi-objective
             optimization approach [@joseph2008orthogonal].

  LaPSO      Returns an LHD via the particle swarm optimization [@chen2013optimizing].

  GA         Returns an LHD via the genetic algorithm [@liefvendahl2006study].
  -------------------------------------------------------------------------------------------

  : (#tab:T4) Search algorithm functions in the `LHD` package
:::
::::

``` r
#Generate a 5 by 3 maximin distance LHD by the SA function.
> try.SA = SA(n = 5, k = 3); try.SA
     [,1] [,2] [,3]
[1,]    2    2    1
[2,]    5    3    2
[3,]    4    5    5
[4,]    3    1    4
[5,]    1    4    3
> phi_p(try.SA) #\phi_p is smaller than that of a random LHD (0.3336608).
[1] 0.2169567

#Similarly, generate 5 by 3 maximin distance LHDs by the SA2008, LaPSO and GA functions.
> try.SA2008 = SA2008(n = 5, k = 3)
> try.LaPSO = LaPSO(n = 5, k = 3)
> try.GA = GA(n = 5, k = 3)

#Generate an OA(9,2,3,2), an orthogonal array with 9 runs, 2 factors,
#3 levels, and strength 2.
> OA = matrix(c(rep(1:3, each = 3), rep(1:3, times = 3)),
+ ncol = 2, nrow = 9, byrow = FALSE)
#Generate a maximin distance LHD with the same design size as the input OA
#by the orthogonal-array-based simulated annealing algorithm.
> try.OASA = OASA(OA)
> OA; try.OASA
     [,1] [,2]        [,1] [,2]
[1,]    1    1   [1,]    1    2
[2,]    1    2   [2,]    2    6
[3,]    1    3   [3,]    3    9
[4,]    2    1   [4,]    4    3
[5,]    2    2   [5,]    6    5
[6,]    2    3   [6,]    5    7
[7,]    3    1   [7,]    7    1
[8,]    3    2   [8,]    9    4
[9,]    3    3   [9,]    8    8
```

Note that the default optimality criterion embedded in all search algorithms is "phi_p" (that is, the maximin distance criterion), leading to maximin $L_2$-distance LHDs.
For other optimality criteria, users should change the setting of the input argument `OC` (with options "phi_p", "AvgAbsCor", "MaxAbsCor" and "MaxProCriterion"). The following examples illustrate some details of different argument settings.

``` r
#Below try.SA is a 5 by 3 maximin distance LHD generated by the SA with 30 iterations (N = 30).
#The temperature starts at 10 (T0 = 10) and decreases by 10% (rate = 0.1) each time.
#The minimum temperature allowed is 1 (Tmin = 1), and the maximum number of perturbations
#the algorithm will try without improvements is 5 (Imax = 5). The optimality criterion
#used is the maximin distance criterion (OC = "phi_p") with p = 15 and q = 1, and the
#maximum CPU time is 5 minutes (maxtime = 5).
> try.SA = SA(n = 5, k = 3, N = 30, T0 = 10, rate = 0.1, Tmin = 1, Imax = 5, OC = "phi_p",
+ p = 15, q = 1, maxtime = 5); try.SA
     [,1] [,2] [,3]
[1,]    1    3    4
[2,]    2    5    2
[3,]    5    4    3
[4,]    4    1    5
[5,]    3    2    1
> phi_p(try.SA)
[1] 0.2169567

#Below try.SA2008 is a 5 by 3 maximin distance LHD generated by SA with
#the multi-objective optimization approach. The input arguments are interpreted
#the same as for the design try.SA above.
> try.SA2008 = SA2008(n = 5, k = 3, N = 30, T0 = 10, rate = 0.1, Tmin = 1, Imax = 5,
+ OC = "phi_p", p = 15, q = 1, maxtime = 5)

#Below try.OASA is a 9 by 2 maximin distance LHD generated by the
#orthogonal-array-based simulated annealing algorithm with the input
#OA (defined previously); the remaining input arguments are interpreted the
#same as for the design try.SA above.
> try.OASA = OASA(OA, N = 30, T0 = 10, rate = 0.1, Tmin = 1, Imax = 5,
+ OC = "phi_p", p = 15, q = 1, maxtime = 5)

#Below try.LaPSO is a 5 by 3 maximum projection LHD generated by the particle swarm
#optimization algorithm with 20 particles (m = 20) and 30 iterations (N = 30).
#Zero (or two) elements in any column of the current particle should be the same as
#the elements of the corresponding column from the personal best (or global best),
#because of SameNumP = 0 (or SameNumG = 2).
#The probability of exchanging two randomly selected elements is 0.5 (p0 = 0.5).
#The optimality criterion is the maximum projection criterion (OC = "MaxProCriterion").
#The maximum CPU time is 5 minutes (maxtime = 5).
> try.LaPSO = LaPSO(n = 5, k = 3, m = 20, N = 30, SameNumP = 0, SameNumG = 2,
+ p0 = 0.5, OC = "MaxProCriterion", maxtime = 5); try.LaPSO
     [,1] [,2] [,3]
[1,]    4    5    4
[2,]    3    1    3
[3,]    5    2    1
[4,]    2    3    5
[5,]    1    4    2
#Recall the value is 0.5375482 for the random LHD in Section 2.
> MaxProCriterion(try.LaPSO)
[1] 0.3561056

#Below try.GA is a 5 by 3 nearly orthogonal LHD generated by the genetic algorithm
#with population size 20 (m = 20), number of iterations 30 (N = 30), mutation
#probability 0.5 (pmut = 0.5), the maximum absolute correlation criterion
#(OC = "MaxAbsCor"), and maximum CPU time 5 minutes (maxtime = 5).
> try.GA = GA(n = 5, k = 3, m = 20, N = 30, pmut = 0.5, OC = "MaxAbsCor",
+ maxtime = 5); try.GA
     [,1] [,2] [,3]
[1,]    2    1    2
[2,]    4    4    5
[3,]    3    5    1
[4,]    5    2    3
[5,]    1    3    4
#Recall the value is 0.9 for the random LHD in Section 2.
> MaxAbsCor(try.GA)
[1] 0.1 #The maximum absolute correlation between columns is 0.1
```

Next, we discuss some details of the implementation. In the SA-based algorithms (`SA`, `SA2008`, and `OASA`), the number of iterations `N` is recommended to be no greater than 500 for computing time considerations. The input `rate` determines the percentage decrease of the current temperature (for example, $0.1$ means a decrease of $10\%$ each time). A high rate makes the temperature drop rapidly, which leads to an early stop of the algorithm. It is recommended to set `rate` between $0.1$ and $0.15$.
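The effect of `rate` can be seen with a short base-R sketch. This is illustrative only, not package code, and it assumes the temperature is multiplied by `1 - rate` at each reduction, as described above:

``` r
# Illustrative sketch (not from the LHD package) of the geometric cooling
# schedule implied by T0, rate and Tmin, assuming the temperature is
# multiplied by (1 - rate) at each reduction.
T0 <- 10; rate <- 0.1; Tmin <- 1
Temp <- T0
stages <- 0
while (Temp > Tmin) {
  Temp <- Temp * (1 - rate)
  stages <- stages + 1
}
stages  # 22 temperature reductions; rate = 0.15 would allow only 15
```

Under this assumption, the default `rate = 0.1` allows 22 temperature reductions before the schedule bottoms out, while `rate = 0.15` allows only 15, which is why a high rate stops the algorithm early.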
`Imax` indicates the maximum number of perturbations that the algorithm will attempt without improvement before the temperature is reduced, and it is recommended to be no greater than 5 for computing time considerations. `OC` chooses the optimality criterion, and the `"phi_p"` criterion in \@ref(eq:E2) is set as the default. `OC` has other options, including `"MaxProCriterion"`, `"AvgAbsCor"` and `"MaxAbsCor"`. Our algorithms support both the $L_1$ and $L_2$ distances.

For every algorithm, we incorporate a progress bar to visualize the computing time used. After an algorithm is completed, information on the "average CPU time per iteration" and the "number of iterations completed" will be presented. Users can set the limit for the CPU time used by each algorithm via the argument `maxtime`, according to their practical needs.

We also provide some illustrative code to demonstrate that the designs found by the `LHD` package are better than the existing ones, and the code below can be easily modified to construct other design sizes or other LHD types. Out of 100 trials, the code below shows that the GA in the `LHD` package constructed better MaxPro LHDs 99 times compared to the algorithm in the `MaxPro` package, when 500 iterations are set for both algorithms. We did not compare the CPU time between these two packages since one is written in the R environment and the other in the C++ environment, but with the same number of iterations, the GA in the `LHD` package almost always constructs better MaxPro LHDs.
``` r
#Make sure both packages are properly installed before loading them.
> library(LHD)
> library(MaxPro)

> count = 0 #Define a variable for counting purposes

> k = 5 #Factor size 5
> n = 10*k #Run size = 10*factor size

#Setting 500 iterations for both algorithms, the loop below counts
#how many times the GA from the LHD package outperforms the algorithm
#from the MaxPro package out of 100 trials.
> for (i in 1:100) {

  LHD = LHD::GA(n = n, k = k, m = 100, N = 500)
  MaxPro = MaxPro::MaxProLHD(n = n, p = k, total_iter = 500)$Design

  #MaxPro * n + 0.5 applies the transformation mentioned in Section 2
  #to revert the scaling.
  Result.LHD = LHD::MaxProCriterion(LHD)
  Result.MaxPro = LHD::MaxProCriterion(MaxPro * n + 0.5)

  if (Result.LHD < Result.MaxPro) {count = count + 1}

}

> count
[1] 99
```

## Algebraic Constructions for Optimal LHDs with Certain Sizes {#Constr}

There are algebraic constructions available for certain design sizes, and theoretical results have been developed to guarantee the efficiency of such designs. Algebraic constructions require almost no search, which makes them especially attractive for large designs. In this section, we present the algebraic constructions that are available in the `LHD` package for maximin distance LHDs and orthogonal LHDs.

### Algebraic Constructions for Maximin Distance LHDs {#WT}

[@wang2018optimal] proposed to generate maximin distance LHDs via good lattice point (GLP) sets [@zhou2015space] and the Williams transformation [@williams1949experimental]. In practice, their method can lead to space-filling designs with relatively flexible sizes, where the run size $n$ is flexible but the factor size $k$ must be no greater than the number of positive integers that are co-prime to $n$. They proved that the resulting designs of sizes $n \times (n-1)$ (with $n$ being any odd prime) and $n \times n$ (with $2n+1$ or $n+1$ being an odd prime) are optimal under the maximin $L_1$ distance criterion.
This construction method by [@wang2018optimal] is very attractive for constructing large maximin distance LHDs. In the `LHD` package, the function `FastMmLHD()` implements this method:

``` r
FastMmLHD(n, k, method = "manhattan", t1 = 10)
```

where `n` and `k` are the desired run size and factor size. `method` is a distance measure, which can be one of the following: "euclidean", "maximum", "manhattan", "canberra", "binary" or "minkowski". Any unambiguous substring can be given. `t1` is a tuning parameter, which determines how many repeats will be implemented to search for the optimal design. The default is set to be 10.

[@tang1993orthogonal] proposed to construct orthogonal-array-based LHDs (OALHDs) from existing orthogonal arrays (OAs), and showed that OALHDs can have better space-filling properties than general LHDs. In the `LHD` package, the function `OA2LHD()` implements this method:

``` r
OA2LHD(OA)
```

where `OA` is an orthogonal array matrix. Users only need to input an OA, and the function will return an OALHD with the same design size as the input OA.

### Algebraic Constructions for Orthogonal LHDs {#sec:olhd}

Orthogonal LHDs (OLHDs), which have zero pairwise correlation between any two columns, are widely used by practitioners. There is a rich literature on the construction of OLHDs with various design sizes, but the constructions are often too hard for practitioners to replicate in practice. The `LHD` package implements some currently popular methods [@ye1998orthogonal; @cioppa2007efficient; @sun2010construction; @tang1993orthogonal; @lin2009construction; @butler2001optimal] for practitioners, and the functions are easy to use.

[@ye1998orthogonal] proposed a construction for OLHDs with run sizes $n=2^m+1$ and factor sizes $k=2m-2$, where $m$ is any integer bigger than 2.
In the `LHD` package, function `OLHD.Y1998()` implements this +algebraic construction: + +``` r +OLHD.Y1998(m), + +``` + +where input argument `m` is the $m$ in the construction of +[@ye1998orthogonal]. [@cioppa2007efficient] extended +[@ye1998orthogonal]'s method to construct OLHDs with run size $n=2^m+1$ +and factor size $k=m+ {m-1 \choose 2}$, where $m$ is any integer bigger +than 2. In the `LHD` package, function `OLHD.C2007()` implements this +algebraic construction with input argument `m` remaining the same: + +``` r +OLHD.C2007(m) + +``` + +[@sun2010construction] extended their earlier work +[@sun2009construction] to construct OLHDs with $n=r2^{c+1}+1$ or +$n=r2^{c+1}$ and $k=2^c$, where $r$ and $c$ are positive integers. In +the `LHD` package, function `OLHD.S2010()` implements this algebraic +construction: + +``` r +OLHD.S2010(C, r, type = "odd"), + +``` + +where input arguments `C` and `r` are $c$ and $r$ in the construction. +When input argument `type` is `"odd"`, the output design size would be +$n=r2^{c+1}+1$ by $k=2^c$. When input argument `type` is `"even"`, the +output design size would be $n=r2^{c+1}$ by $k=2^c$. + +[@lin2009construction] constructed OLHDs or NOLHDs with $n^2$ runs and +$2fp$ factors by coupling OLHD($n$, $p$) or NOLHD($n$, $p$) with an +OA($n^2,2f,n,2$). For example, an OLHD(11, 7), coupled with an +OA(121,12,11,2), would yield an OLHD(121, 84). The design size of output +OLHD or NOLHD highly depends on the existence of the OAs. In the `LHD` +package, function `OLHD.L2009()` implements this algebraic construction: + +``` r +OLHD.L2009(OLHD, OA), + +``` + +where input arguments `OLHD` and `OA` are the OLHD and OA to be coupled, +and their design sizes need to be aligned with the designated pattern of +the construction. + +[@butler2001optimal] proposed a method to construct OLHDs with the run +size $n$ being odd primes and factor size $k$ being less than or equal +to $n-1$ via the Williams transformation [@williams1949experimental]. 
In the `LHD` package, the function `OLHD.B2001()` implements this algebraic construction with input arguments `n` and `k` exactly matching those in the construction:

``` r
OLHD.B2001(n, k)
```

### Illustrating Examples for the Implemented Algebraic Constructions

In Table \@ref(tab:T5), we summarize the algebraic constructions implemented by the developed `LHD` package, where `FastMmLHD` and `OA2LHD` are for maximin distance LHDs and `OLHD.Y1998`, `OLHD.C2007`, `OLHD.S2010`, `OLHD.L2009` and `OLHD.B2001` are for orthogonal LHDs. The following examples illustrate how to use them.

:::: center
::: {#A2}
  -----------------------------------------------------------------------------------------
  Function     Description
  ------------ ----------------------------------------------------------------------------
  FastMmLHD    Returns a maximin distance LHD matrix [@wang2018optimal].

  OA2LHD       Expands an orthogonal array to an LHD [@tang1993orthogonal].

  OLHD.Y1998   Returns a $2^m+1$ by $2m-2$ orthogonal LHD matrix [@ye1998orthogonal],
               where $m$ is an integer and $m \geq 2$.

  OLHD.C2007   Returns a $2^m+1$ by $m+{m-1 \choose 2}$ orthogonal LHD matrix
               [@cioppa2007efficient], where $m$ is an integer and $m \geq 2$.

  OLHD.S2010   Returns a $r2^{c+1}+1$ or $r2^{c+1}$ by $2^c$ orthogonal LHD matrix
               [@sun2010construction], where $r$ and $c$ are positive integers.

  OLHD.L2009   Couples an $n$ by $p$ orthogonal LHD with an $n^2$ by $2f$ strength-$2$,
               level-$n$ orthogonal array to generate an $n^2$ by $2fp$ orthogonal LHD
               [@lin2009construction].

  OLHD.B2001   Returns an orthogonal LHD [@butler2001optimal] with the run size $n$
               being an odd prime and the factor size $k$ being less than or equal to $n-1$.
  -----------------------------------------------------------------------------------------

  : (#tab:T5) Algebraic constructions in the `LHD` package
:::
::::

``` r
#FastMmLHD(8, 8) generates an optimal 8 by 8 maximin L_1 distance LHD.
> try.FastMm = FastMmLHD(n = 8, k = 8); try.FastMm
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,]    0    1    2    3    4    5    6    7
[2,]    1    3    5    7    6    4    2    0
[3,]    2    5    7    4    1    0    3    6
[4,]    3    7    4    0    2    6    5    1
[5,]    4    6    1    2    7    3    0    5
[6,]    5    4    0    6    3    1    7    2
[7,]    6    2    3    5    0    7    1    4
[8,]    7    0    6    1    5    2    4    3

#OA2LHD(OA) expands an input OA to an LHD of the same run size.
> try.OA2LHD = OA2LHD(OA)
> OA; try.OA2LHD
     [,1] [,2]        [,1] [,2]
[1,]    1    1   [1,]    1    2
[2,]    1    2   [2,]    2    4
[3,]    1    3   [3,]    3    9
[4,]    2    1   [4,]    4    3
[5,]    2    2   [5,]    5    5
[6,]    2    3   [6,]    6    7
[7,]    3    1   [7,]    9    1
[8,]    3    2   [8,]    8    6
[9,]    3    3   [9,]    7    8
```

``` r
#OLHD.Y1998(m = 3) generates a 9 by 4 orthogonal LHD.
#Note that 2^m+1 = 9 and 2*m-2 = 4.
> try.Y1998 = OLHD.Y1998(m = 3); try.Y1998
     [,1] [,2] [,3] [,4]
[1,]    4   -3   -2    1
[2,]    3    4   -1   -2
[3,]    1   -2    3   -4
[4,]    2    1    4    3
[5,]    0    0    0    0
[6,]   -4    3    2   -1
[7,]   -3   -4    1    2
[8,]   -1    2   -3    4
[9,]   -2   -1   -4   -3
> MaxAbsCor(try.Y1998) #column-wise correlations are 0
[1] 0

#OLHD.C2007(m = 4) generates a 17 by 7 orthogonal LHD.
#Note that 2^m+1 = 17 and m + choose(m-1, 2) = 4 + 3 = 7.
> try.C2007 = OLHD.C2007(m = 4); dim(try.C2007)
[1] 17  7
> MaxAbsCor(try.C2007) #column-wise correlations are 0
[1] 0

#OLHD.S2010(C = 3, r = 3, type = "odd") generates a 49 by 8 orthogonal LHD.
#Note that 3*2^4+1 = 49 and 2^3 = 8.
> dim(OLHD.S2010(C = 3, r = 3, type = "odd"))
[1] 49  8
> MaxAbsCor(OLHD.S2010(C = 3, r = 3, type = "odd")) #column-wise correlations are 0
[1] 0

#OLHD.S2010(C = 3, r = 3, type = "even") generates a 48 by 8 orthogonal LHD.
#Note that 3*2^4 = 48 and 2^3 = 8.
> dim(OLHD.S2010(C = 3, r = 3, type = "even"))
[1] 48  8
> MaxAbsCor(OLHD.S2010(C = 3, r = 3, type = "even")) #column-wise correlations are 0
[1] 0

#Create a 5 by 2 OLHD.
> OLHD = OLHD.C2007(m = 2)

#Create an OA(25, 6, 5, 2).
> OA = matrix(c(2,2,2,2,2,1,2,1,5,4,3,5,3,2,1,5,4,5,1,5,4,3,2,5,4,1,3,5,2,3,
1,2,3,4,5,2,1,3,5,2,4,3,1,1,1,1,1,1,4,3,2,1,5,5,5,5,5,5,5,1,4,4,4,4,4,1,
3,1,4,2,5,4,3,3,3,3,3,1,3,5,2,4,1,3,3,4,5,1,2,2,5,4,3,2,1,5,2,3,4,5,1,2,
2,5,3,1,4,4,1,4,2,5,3,4,4,2,5,3,1,4,2,4,1,3,5,3,5,3,1,4,2,4,5,2,4,1,3,3,
5,1,2,3,4,2,4,5,1,2,3,2), ncol = 6, nrow = 25, byrow = TRUE)

#OLHD.L2009(OLHD, OA) generates a 25 by 12 orthogonal LHD.
#Note that n = 5 so n^2 = 25; p = 2 and f = 3 so 2fp = 12.
> dim(OLHD.L2009(OLHD, OA))
[1] 25 12
> MaxAbsCor(OLHD.L2009(OLHD, OA)) #column-wise correlations are 0
[1] 0

#OLHD.B2001(n = 11, k = 5) generates an 11 by 5 orthogonal LHD.
> dim(OLHD.B2001(n = 11, k = 5))
[1] 11  5
```

## Other R Packages for Latin Hypercubes and a Comparative Discussion

Several R packages have been developed to facilitate Latin hypercube samples and design constructions for computer experiments. Among these, the `lhs` package [@lhs] is widely recognized for its utility. It provides functions for generating both random and optimized Latin hypercube samples (but not designs), and its methods are particularly useful for simulation studies where space-filling properties are desired but design optimality is not the primary focus. The `SLHD` package [@SLHD] was originally developed for generating sliced LHDs [@ba2015optimal], but practitioners can set the number of slices to one to use the package for generating maximin LHDs. The `MaxPro` package [@MaxPro] focuses on constructing designs that maximize projection properties. One of its functions, `MaxProLHD`, generates MaxPro LHDs using a simulated annealing algorithm [@joseph2015maximum].

While we acknowledge the contributions of other relevant R packages, we emphasize the distinguishing features of our package. The `LHD` package embeds multiple optimality criteria, enabling the construction of various types of optimal LHDs.
In contrast, `lhs` and `SLHD` primarily focus on space-filling Latin hypercube samples and designs, while `MaxPro` primarily focuses on maximum projection LHDs. The `LHD` package implements various search algorithms and algebraic constructions, whereas the other three packages do not implement algebraic constructions, and both `SLHD` and `MaxPro` implement only one algorithm each to construct LHDs. The primary application of `LHD` is the design of computer experiments, whereas `lhs` is mainly used for sampling and simulation studies. Therefore, `LHD` emphasizes design optimality, while `lhs` emphasizes the space-filling properties of samples.

## Conclusion and Recommendation {#Con}

The `LHD` package implements popular search algorithms, including SA [@morris1995exploratory], OASA [@leary2003optimal], SA2008 [@joseph2008orthogonal], LaPSO [@chen2013optimizing] and GA [@liefvendahl2006study], along with some widely used algebraic constructions [@wang2018optimal; @ye1998orthogonal; @cioppa2007efficient; @sun2010construction; @tang1993orthogonal; @lin2009construction; @butler2001optimal], for constructing three types of commonly used optimal LHDs: maximin distance LHDs, maximum projection LHDs and (nearly) orthogonal LHDs. We aim to provide guidance and an easy-to-use tool for practitioners to find appropriate experimental designs. Algebraic constructions are preferred when available, especially for large designs. Search algorithms are used to generate optimal LHDs with flexible sizes.

Among the few R packages dedicated to LHDs, `LHD` is comprehensive and self-contained: it not only provides search algorithms and algebraic constructions, but also other useful functions for LHD research and development, such as calculating different optimality criteria, generating random LHDs, exchanging two random elements in a matrix, and calculating intersite distances between matrix rows.
The help manual in the package documentation contains further details and illustrative examples for users who want to explore more of the functions in the package.

## Acknowledgments {#acknowledgments .unnumbered}

This research was partially supported by the National Science Foundation (NSF) grant DMS-2311186 and the National Key R&D Program of China 2024YFA1016200. The authors appreciate the reviewers' constructive comments and suggestions.
:::::::::::::
# LHD: An All-encompassing R Package for Constructing Optimal Latin Hypercube Designs
Optimal Latin hypercube designs (LHDs), including maximin distance LHDs, maximum projection LHDs and orthogonal LHDs, are widely used in computer experiments. It is challenging to construct such designs with flexible sizes, especially large ones, for two main reasons. One reason is that theoretical results, such as algebraic constructions ensuring the maximin distance property or orthogonality, are only available for certain design sizes. For design sizes where theoretical results are unavailable, search algorithms can generate designs, but their numerical performance is not guaranteed to be optimal. Another reason is that as design sizes increase, the number of permutations grows exponentially. Constructing optimal LHDs is a discrete optimization process, and enumeration is nearly impossible for moderate or large design sizes. Various search algorithms and algebraic constructions have been proposed to identify optimal LHDs, each with its own pros and cons. We develop the R package LHD, which implements various search algorithms and algebraic constructions. We embedded different optimality criteria into each of the search algorithms, making them capable of constructing different types of optimal LHDs even though they were originally invented to construct maximin distance LHDs only. Another input argument that controls the maximum CPU time is added to each of the search algorithms to let users flexibly allocate their computational resources. We demonstrate the functionality of the package with various examples, and we provide guidance for experimenters on finding suitable optimal designs. The LHD package is easy to use for practitioners and may serve as a benchmark for future developments in LHDs.
## Introduction
Computer experiments are widely used in scientific research and industrial production, where complex computer codes, commonly high-fidelity simulators, generate data instead of real physical systems (Sacks et al. 1989; Fang et al. 2005). The outputs from computer experiments are deterministic (that is, free of random errors), and therefore replications are not needed (Butler 2001; Joseph and Hung 2008; Ba et al. 2015). Latin hypercube designs (LHDs; McKay et al. 1979) may be the most popular type of experimental design for computer experiments (Fang et al. 2005; Xiao and Xu 2018); they avoid replications in every dimension and have uniform one-dimensional projections. According to practical needs, there are various types of optimal LHDs, including space-filling LHDs, maximum projection LHDs, and orthogonal LHDs. There is a rich literature on the construction of such designs, but it is still very challenging to find good ones for moderate to large design sizes (Ye 1998; Fang et al. 2005; Joseph et al. 2015; Xiao and Xu 2018). One key reason is that theoretical results, such as algebraic constructions which guarantee the maximin distance property or orthogonality, are only established for specific design sizes. These constructions provide theoretical guarantees on the design quality but are limited in their applicability. For design sizes where such theoretical guarantees do not exist, search algorithms can generate designs. However, the performance of search-based designs depends on the algorithm employed, the search space explored, and the computational resources allocated, meaning they cannot be guaranteed to be optimal. Constructing optimal LHDs is a discrete optimization process, where enumerating all possible solutions guarantees the optimal design for a given size. However, this approach becomes computationally infeasible as the number of permutations grows exponentially with increasing design sizes, which is another key reason for the challenge.
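The scale of the enumeration problem is easy to quantify: each of the \(k\) columns of an \(n \times k\) LHD is one of the \(n!\) permutations of \(1, \ldots, n\), giving \((n!)^{k}\) candidate designs in total (this count is also used in Section 2). A quick base-R calculation (illustrative only) shows how fast enumeration becomes hopeless:

``` r
# Number of candidate LHDs of a given size: each of the k columns is one
# of the n! permutations of 1..n, so there are (n!)^k candidates in total.
n_lhd <- function(n, k) factorial(n)^k
n_lhd(5, 3)   # 1,728,000 candidate 5 x 3 LHDs -- already too many to inspect
n_lhd(10, 5)  # roughly 6.3e32 -- enumeration is out of the question
```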
An LHD with \(n\) runs and \(k\) factors is an \(n \times k\) matrix with each column being a random permutation of the numbers \(1, \ldots, n\). Throughout this paper, \(n\) denotes the run size and \(k\) denotes the factor size. A space-filling LHD scatters its design points over the sampled region as evenly as possible, minimizing the unsampled region and thus accounting for uniformity in all dimensions. Different criteria have been proposed to measure designs' space-filling properties, including the maximin and minimax distance criteria (Johnson et al. 1990; Morris and Mitchell 1995), the discrepancy criteria (Hickernell 1998; Fang et al. 2002, 2005) and the entropy criterion (Shewry and Wynn 1987). Since there are as many as \((n!)^{k}\) candidate LHDs for a given design size, it is nearly impossible to find the space-filling one by enumeration when \(n\) and \(k\) are moderate or large. In the current literature, both search algorithms (Morris and Mitchell 1995; Ye et al. 2000; Leary et al. 2003; Jin et al. 2005; Liefvendahl and Stocki 2006; Joseph and Hung 2008; Grosso et al. 2009; Chen et al. 2013; Ba et al. 2015) and algebraic constructions (Zhou and Xu 2015; Xiao and Xu 2017; Wang et al. 2018) are used to construct space-filling LHDs.
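The definition above can be sketched in a few lines of base R. The helper `random_lhd()` below is illustrative, not a function of the LHD package (which provides its own random LHD generator):

``` r
# Generate a random n x k LHD: each column is an independent random
# permutation of 1, ..., n (illustrative helper, not from the LHD package).
random_lhd <- function(n, k) sapply(seq_len(k), function(j) sample(n))
set.seed(2025)
X <- random_lhd(5, 3)
# Each column is a permutation of 1:5, so every one-dimensional
# projection covers all 5 levels exactly once.
all(apply(X, 2, sort) == 1:5)  # TRUE
```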
Space-filling designs often focus on the full-dimensional space. To further improve the space-filling properties of all possible sub-spaces, (Joseph et al. 2015) proposed maximum projection designs. Considering all sub-spaces from two to \(k-1\) dimensions, maximum projection LHDs (MaxPro LHDs) are generally more space-filling than classic maximin distance LHDs. Constructing MaxPro LHDs is also challenging, especially for large ones, and (Joseph et al. 2015) proposed a simulated annealing (SA) based algorithm. In the LHD package, we combined the MaxPro criterion with other algorithms, such as particle swarm optimization (PSO) and the genetic algorithm (GA) framework, leading to many better MaxPro LHDs; see Section 3 for examples.
Unlike space-filling LHDs, which minimize the similarities among rows, orthogonal LHDs (OLHDs) are another popular type of optimal design that considers similarities among columns: OLHDs have zero column-wise correlations. Algebraic constructions are available for certain design sizes (Tang 1993; Ye 1998; Butler 2001; Steinberg and Lin 2006; Cioppa and Lucas 2007; Lin et al. 2009; Sun et al. 2009, 2010; Yang and Liu 2012; Georgiou and Efthimiou 2014), but there are many design sizes for which theoretical results are not available. In the LHD package, we implemented the average absolute correlation criterion and the maximum absolute correlation criterion (Georgiou 2009) with SA, PSO, and GA to identify both OLHDs and nearly orthogonal LHDs (NOLHDs) for almost all design sizes.
This paper introduces the R package LHD, available on the Comprehensive R Archive Network (https://cran.r-project.org/web/packages/LHD/index.html), which implements some currently popular search algorithms and algebraic constructions for constructing maximin distance LHDs, MaxPro LHDs, OLHDs and NOLHDs. We embedded different optimality criteria, including the maximin distance criterion, the MaxPro criterion, the average absolute correlation criterion, and the maximum absolute correlation criterion, in each of the search algorithms, which were originally invented to construct maximin distance LHDs only (Morris and Mitchell 1995; Leary et al. 2003; Liefvendahl and Stocki 2006; Joseph and Hung 2008; Chen et al. 2013), and each of them is capable of constructing different types of optimal LHDs through the package. To let users flexibly allocate their computational resources, we also embedded an input argument that limits the maximum CPU time for each of the algorithms, where users can easily define how and when they want the algorithms to stop. An algorithm can stop in one of two ways: either when the user-defined maximum number of iterations is reached or when the user-defined maximum CPU time is exceeded. For example, users can either allow the algorithm to run for a specified number of iterations without restricting the maximum CPU time, or set a maximum CPU time limit to stop the algorithm regardless of the number of iterations completed. After an algorithm is completed or stopped, the number of iterations completed along with the average CPU time per iteration will be presented to users for their information. The R package LHD is an integrated tool for users with little or no background in design theory, and they can easily find optimal LHDs with desired sizes. Many new designs that are better than the existing ones have been discovered; see Section 3.
The remainder of the paper is organized as follows. Section 2 +illustrates different optimality criteria for LHDs. Section 3 +demonstrates some popular search algorithms and their implementation +details in the LHD package along with examples. Section 4 discusses +some useful algebraic constructions as well as examples of how to +implement them via the developed package. Section 5 reviews other R packages for Latin hypercubes and provides a comparative discussion. Section 6 concludes with a +summary.
## Optimality Criteria for LHDs
Various criteria have been proposed to measure designs' space-filling properties (Johnson et al. 1990; Hickernell 1998; Fang et al. 2002). In this paper, we focus on the currently popular maximin distance criterion (Johnson et al. 1990), which seeks to scatter design points over experimental domains so that the minimum distances between points are maximized. Let \(\textbf{X}\) denote an LHD matrix throughout this paper. Define the \(L_q\)-distance between two runs \(x_i\) and \(x_j\) of \(\textbf{X}\) as \(d_q(x_i, x_j) = \left\{ \sum_{m=1}^{k} \vert x_{im}-x_{jm}\vert ^q \right\}^{1/q}\), where \(q\) is an integer. Define the \(L_q\) distance of the design \(\textbf{X}\) as \(d_q(\textbf{X}) = \text{min} \{d_q(x_i, x_j), 1 \leq i<j \leq n \}\). In this paper, we consider \(q=1\) and \(q=2\), i.e., the Manhattan (\(L_1\)) and Euclidean (\(L_2\)) distances. A design \(\textbf{X}\) is called a maximin \(L_q\) distance design if it has the unique largest \(d_q(\textbf{X})\) value among all designs of the same size. When more than one design has the same largest \(d_q(\textbf{X})\), the maximin distance design sequentially maximizes the next minimum inter-site distances. To evaluate the maximin distance criterion in a more convenient way, (Morris and Mitchell 1995) and (Jin et al. 2005) proposed to minimize a scalar value:
\[\label{E2}
 \phi_{p}= \bigg\{\sum_{i=1}^{n-1}\sum_{j=i+1}^{n}d_q(x_i, x_j)^{-p} \bigg\} ^{1/p}, \tag{1}\]
where \(p\) is a tuning parameter. This \(\phi_{p}\) criterion in Equation (1) is asymptotically equivalent to the maximin distance criterion as \(p \to \infty\). In practice, \(p=15\) often suffices (Morris and Mitchell 1995). In the LHD package, the function phi_p() implements this criterion.
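As a sanity check, Equation (1) can be evaluated directly in base R; the helper below is illustrative (the package's own implementation is phi_p()). The matrix is the 5 by 3 LHD try.SA shown in the Section 3 examples, whose reported \(\phi_p\) value (with the defaults \(p=15\), \(q=1\)) is 0.2169567:

``` r
# Direct base-R evaluation of Equation (1) (illustrative; the LHD package
# provides phi_p() for this).
phi_p_manual <- function(X, p = 15, q = 1) {
  d <- dist(X, method = if (q == 1) "manhattan" else "euclidean")
  sum(d^(-p))^(1/p)
}
# The 5 x 3 LHD try.SA from the Section 3 examples:
X <- matrix(c(2, 2, 1,
              5, 3, 2,
              4, 5, 5,
              3, 1, 4,
              1, 4, 3), nrow = 5, byrow = TRUE)
phi_p_manual(X)  # 0.2169567, matching the value reported by phi_p()
```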

Maximin distance LHDs focus on the space-filling properties in the full-dimensional space, but their space-filling properties in subspaces are not guaranteed. (Joseph et al. 2015) proposed the maximum projection criterion, which considers designs' space-filling properties in all possible dimensional subspaces. An LHD \(\textbf{X}\) is called a maximum projection LHD (MaxPro LHD) if it minimizes the maximum projection criterion such that
\[\label{E3}
 \mathop{\mathrm{min}}\limits_{\textbf{X}} \psi (\textbf{X}) = \Bigg\{ \frac{1}{{n \choose 2}} \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \frac{1}{\Pi_{l=1}^{k}(x_{il}-x_{jl})^2} \Bigg\}^{1/k}. \tag{2}\]
From Equation (2), we can see that any two design points should be apart from each other in every projection to minimize the value of \(\psi (\textbf{X})\). Thus, maximum projection LHDs consider the space-filling properties in all possible subspaces. Note that this criterion was originally defined using design points scaled to the unit hypercube \([0,1]^{k}\) in (Joseph et al. 2015), whereas our design points are represented as integer levels. A simple transformation, \(\textbf{X}_{Scaled} \times n + 0.5\), can be applied to revert the scaling: each element of every design point in the scaled unit hypercube is multiplied by the run size \(n\), and then \(0.5\) is added. The illustrative example at the end of Section 3 applies this transformation to ensure a fair comparison of performance. In the LHD package, the function MaxProCriterion() implements this criterion.
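As a sanity check on Equation (2), a direct base-R computation can be written in a few lines (an illustration of ours; maxpro_manual is not a package function):

```r
# From-scratch evaluation of Equation (2) for an integer-level design X
maxpro_manual <- function(X) {
  n <- nrow(X); k <- ncol(X)
  total <- 0
  for (i in 1:(n - 1)) {
    for (j in (i + 1):n) {
      # product over all k columns of the squared coordinate differences
      total <- total + 1 / prod((X[i, ] - X[j, ])^2)
    }
  }
  (total / choose(n, 2))^(1 / k)
}

X <- rbind(c(2, 1, 4), c(4, 3, 3), c(3, 2, 2), c(1, 4, 5), c(5, 5, 1))
maxpro_manual(X)   # approximately 0.5375482, matching MaxProCriterion(X) below
```

A pair of runs that coincide in even one coordinate would make a term infinite, so small \(\psi(\textbf{X})\) forces separation in every one-dimensional projection.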

Orthogonal and nearly orthogonal designs that aim to minimize the correlations between factors are widely used in experiments (Steinberg and Lin 2006; Georgiou 2009; Sun and Tang 2017). Two major correlation-based criteria to measure designs' orthogonality are the average absolute correlation criterion and the maximum absolute correlation criterion (Georgiou 2009), denoted as ave\((|q|)\) and max\(|q|\), respectively:
\[\label{E4}
 \mathop{\mathrm{ave}}\limits(|q|) = \frac{2 \sum_{i=1}^{k-1} \sum_{j=i+1}^{k}|q_{ij}|}{k(k-1)} \text{ and } \mathop{\mathrm{max}}\limits|q| = \mathop{\mathrm{max}}\limits_{i,j} |q_{ij}|, \tag{3}\]
where \(q_{ij}\) is the correlation between the \(i\)th and \(j\)th columns of the design matrix \(\textbf{X}\). Orthogonal designs have ave\((|q|)=0\) and max\(|q|=0\), which may not exist for all design sizes. Designs with a smaller ave\((|q|)\) or max\(|q|\) are generally preferred in practice. In the LHD package, the functions AvgAbsCor() and MaxAbsCor() implement the criteria ave\((|q|)\) and max\(|q|\), respectively.
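Both quantities in Equation (3) can be read off the off-diagonal entries of the column correlation matrix. A small sketch of ours (the helper names are hypothetical, not package functions):

```r
# Equation (3) via the upper triangle of the column correlation matrix
avg_abs_cor_manual <- function(X) {
  qm <- cor(X)                    # qm[i, j] = correlation of columns i and j
  mean(abs(qm[upper.tri(qm)]))    # equals 2 * sum_{i<j} |q_ij| / (k(k-1))
}
max_abs_cor_manual <- function(X) {
  qm <- cor(X)
  max(abs(qm[upper.tri(qm)]))
}

X <- rbind(c(2, 1, 4), c(4, 3, 3), c(3, 2, 2), c(1, 4, 5), c(5, 5, 1))
c(avg_abs_cor_manual(X), max_abs_cor_manual(X))  # approx. 0.5333333 and 0.9
```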

Illustrating Examples for the Introduced Optimality Criteria

This subsection demonstrates some examples of how to use the optimality criteria introduced above from the developed LHD package. To generate a random LHD matrix, the function rLHD can be used. For example,

> X = rLHD(n = 5, k = 3); X  #This generates a 5 by 3 random LHD, denoted as X
     [,1] [,2] [,3]
[1,]    2    1    4
[2,]    4    3    3
[3,]    3    2    2
[4,]    1    4    5
[5,]    5    5    1

The input arguments of the function rLHD are the run size n and the factor size k. Continuing with the above randomly generated LHD X, we evaluate it with respect to different optimality criteria. For example,

> phi_p(X)                #The maximin L1-distance criterion.
[1] 0.3336608
> phi_p(X, p = 10, q = 2) #The maximin L2-distance criterion.
[1] 0.5797347
> MaxProCriterion(X)      #The maximum projection criterion.
[1] 0.5375482
> AvgAbsCor(X)            #The average absolute correlation criterion.
[1] 0.5333333
> MaxAbsCor(X)            #The maximum absolute correlation criterion.
[1] 0.9

The input arguments of the function phi_p are an LHD matrix X, p, and q, where p and q come directly from Equation (1). Note that the default settings within phi_p are \(p=15\) and \(q=1\) (the Manhattan distance), and users can change these settings. The functions MaxProCriterion, AvgAbsCor, and MaxAbsCor take only one input argument, an LHD matrix X.

3 Search Algorithms for Optimal LHDs with Flexible Sizes

Simulated Annealing Based Algorithms

Simulated annealing (SA, (Kirkpatrick et al. 1983)) is a probabilistic optimization algorithm whose name comes from the annealing process in metallurgy. (Morris and Mitchell 1995) proposed a modified SA that randomly exchanges elements in an LHD to seek potential improvements. If such an exchange leads to a better LHD under a given optimality criterion, the exchange is kept. Otherwise, it is kept with probability \(\hbox{exp}[-(\Phi(\textbf{X}_{new})-\Phi(\textbf{X}))/T]\), where \(\Phi\) is a given optimality criterion, \(\textbf{X}\) is the original LHD, \(\textbf{X}_{new}\) is the LHD after the exchange, and \(T\) is the current temperature. In this article, we focus on minimizing the optimality criteria outlined in Section 2, meaning only minimization problems are considered. This acceptance probability guarantees that an exchange leading to a slightly worse LHD has a higher chance of being kept than one leading to a significantly worse LHD, because a slightly worse LHD yields a smaller value of \(\Phi(\textbf{X}_{new})-\Phi(\textbf{X})\). This exchange procedure is implemented iteratively to improve the LHD. When there are no improvements after a certain number of attempts, the current temperature \(T\) is annealed. Note that a large value of \(\Phi(\textbf{X}_{new})-\Phi(\textbf{X})\) (an exchange leading to a significantly worse LHD) is more likely to be kept during the early phase of the search when \(T\) is relatively high, and less likely to be kept later when \(T\) decreases (is annealed). The best LHD is identified after the algorithm converges or the budget constraint is reached. In the LHD package, the function SA() implements this algorithm:

SA(n, k, N = 10, T0 = 10, rate = 0.1, Tmin = 1, Imax = 5, OC = "phi_p",
   p = 15, q = 1, maxtime = 5)
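The acceptance rule described above can be sketched in a few lines of base R. This is an illustration of ours of the Metropolis-style step, not the package's internal code, and accept_move is a hypothetical name:

```r
# Accept/reject step for a minimization problem: always keep improvements;
# keep a worse design with probability exp(-(new - old)/Temp), which shrinks
# both as the move gets worse and as Temp is annealed toward Tmin.
accept_move <- function(crit_old, crit_new, Temp) {
  if (crit_new <= crit_old) return(TRUE)          # improvement: always keep
  runif(1) < exp(-(crit_new - crit_old) / Temp)   # worse: keep with decaying prob.
}

accept_move(0.30, 0.25, Temp = 10)   # an improvement is always accepted (TRUE)
```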

Table 1 provides an overview of all the input arguments in SA(). n and k are the desired run size and factor size. T0 is the initial temperature, rate is the temperature decrease rate, and Tmin is the minimum temperature. If the current temperature becomes smaller than Tmin, the current loop in the algorithm stops and the current number of iterations increases by one. There are two stopping criteria for the entire function: when the current number of iterations reaches the maximum (denoted as N in the function) or when the cumulative CPU time reaches the maximum (denoted as maxtime in the function). Whichever is reached earlier triggers the stop of the function. For the input argument OC (optimality criterion), "phi_p" returns maximin distance LHDs, "MaxProCriterion" returns MaxPro LHDs, and "AvgAbsCor" or "MaxAbsCor" returns orthogonal LHDs.

Table 1: Overview of Input Arguments of the SA Function

Argument  Description
n         A positive integer that defines the number of rows (or run size) of the output LHD.
k         A positive integer that defines the number of columns (or factor size) of the
          output LHD.
N         A positive integer that defines the maximum number of iterations in the algorithm.
          A large value of N will result in a high CPU time, and it is recommended to be no
          greater than 500. The default is set to be 10.
T0        A positive number that defines the initial temperature. The default is set to be 10,
          which means the temperature anneals from 10 in the algorithm.
rate      A positive percentage that defines the temperature decrease rate, and it should be
          in (0,1). For example, rate = 0.25 means the temperature decreases by 25% each
          time. The default is set to be 10%.
Tmin      A positive number that defines the minimum temperature allowed. When the current
          temperature becomes smaller than or equal to Tmin, the stopping criterion for the
          current loop is met. The default is set to be 1.
Imax      A positive integer that defines the maximum number of perturbations the algorithm
          will try without improvements before the temperature is reduced. The default is set
          to be 5. For CPU time considerations, Imax is recommended to be no greater than 5.
OC        An optimality criterion. The default setting is "phi_p", and it can be one of the
          following: "phi_p", "AvgAbsCor", "MaxAbsCor", "MaxProCriterion".
p         A positive integer, which is one parameter in the \(\phi_{p}\) formula; p is
          preferred to be large. The default is set to be 15.
q         A positive integer, which is one parameter in the \(\phi_{p}\) formula; q can be
          either 1 or 2. If q is 1, the Manhattan (rectangular) distance will be calculated.
          If q is 2, the Euclidean distance will be calculated.
maxtime   A positive number, which indicates the expected maximum CPU time measured in
          minutes. For example, maxtime = 3.5 indicates the CPU time will be no greater
          than three and a half minutes. The default is set to be 5.

(Leary et al. 2003) modified the SA algorithm in (Morris and Mitchell 1995) to search for optimal orthogonal array-based LHDs (OALHDs). (Tang 1993) showed that OALHDs tend to have better space-filling properties than random LHDs. The SA in (Leary et al. 2003) starts with a random OALHD rather than a random LHD; the remaining steps are the same as the SA in (Morris and Mitchell 1995). Note that the existence of OALHDs is determined by the existence of the corresponding initial OAs. In the LHD package, the function OASA() implements this modified SA algorithm:

OASA(OA, N = 10, T0 = 10, rate = 0.1, Tmin = 1, Imax = 5, OC = "phi_p",
     p = 15, q = 1, maxtime = 5),

where all the input arguments are the same as in SA, except that OA must be an orthogonal array.

(Joseph and Hung 2008) proposed another modified SA to identify orthogonal-maximin LHDs, which considers both the orthogonality and the maximin distance criteria. The algorithm starts by generating a random LHD and then chooses the column that has the largest average pairwise correlation with all other columns. Next, the algorithm selects the row that has the largest total row-wise distance to all other rows. Then, the element at the selected row and column is exchanged with a random element from the same column. The remaining steps are the same as the SA in (Morris and Mitchell 1995). In the LHD package, the function SA2008() implements this algorithm:

SA2008(n, k, N = 10, T0 = 10, rate = 0.1, Tmin = 1, Imax = 5, OC = "phi_p",
       p = 15, q = 1, maxtime = 5),

where all the input arguments are the same as in SA.

Particle Swarm Optimization Algorithms

Particle swarm optimization (PSO, (Kennedy and Eberhart 1995)) is a metaheuristic optimization algorithm inspired by the social behaviors of animals. Recent research (Chen et al. 2013) adapted the classic PSO algorithm and proposed LaPSO to identify maximin distance LHDs. Since this is a discrete optimization task, LaPSO redefines the steps in which each particle updates its velocity and position in the general PSO framework. In the LHD package, the function LaPSO() implements this algorithm:

LaPSO(n, k, m = 10, N = 10, SameNumP = 0, SameNumG = n/4, p0 = 1/(k - 1),
      OC = "phi_p", p = 15, q = 1, maxtime = 5)

Table 2 provides an overview of all the input arguments in LaPSO(), where n, k, N, OC, p, q, and maxtime are exactly the same as the input arguments of the function SA(). m is the number of particles, each of which represents a candidate solution in the PSO framework. SameNumP and SameNumG are two tuning parameters that denote how many exchanges are performed to reduce the Hamming distance towards the personal best and the global best, respectively. p0 is a tuning parameter that denotes the probability of a random swap of two elements in the current column of the current particle, which prevents the algorithm from getting stuck at a local optimum. (Chen et al. 2013) provided the following suggestions: SameNumP is approximately \(n/2\) when SameNumG is \(0\), SameNumG is approximately \(n/4\) when SameNumP is \(0\), and p0 should be between \(1/(k-1)\) and \(2/(k-1)\). The stopping criterion of the function is the same as that of the function SA.
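To picture what "reducing the Hamming distance towards the personal or global best" means for a column that must remain a permutation, consider the following sketch. It is our own illustration of the idea, not the package's internal code, and move_toward_best is a hypothetical name:

```r
# Make one more entry of column x agree with a best column b, via a swap
# that keeps x a permutation of 1:n (so the LHD structure is preserved).
move_toward_best <- function(x, b) {
  diff_pos <- which(x != b)
  if (length(diff_pos) == 0) return(x)   # already identical to the best
  i <- diff_pos[1]                       # a position to fix
  j <- which(x == b[i])                  # where the needed level currently sits
  x[c(i, j)] <- x[c(j, i)]               # swap: Hamming distance to b decreases
  x
}

move_toward_best(c(1, 2, 3, 4), c(2, 1, 3, 4))   # returns 2 1 3 4
```

Repeating such swaps SameNumP (or SameNumG) times per column pulls the particle towards its personal (or global) best while every column stays a valid LHD column.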

Table 2: Overview of Input Arguments of the LaPSO Function

Argument  Description
n         A positive integer that defines the number of rows (or run size) of the output LHD.
k         A positive integer that defines the number of columns (or factor size) of the
          output LHD.
m         A positive integer that defines the number of particles, where each particle is a
          candidate solution. A large value of m will result in a high CPU time, and it is
          recommended to be no greater than 100. The default is set to be 10.
N         A positive integer that defines the maximum number of iterations in the algorithm.
          A large value of N will result in a high CPU time, and it is recommended to be no
          greater than 500. The default is set to be 10.
SameNumP  A non-negative integer that defines how many elements in the current column of
          the current particle should be the same as in the corresponding column of the
          personal best. SameNumP can be 0, 1, 2, ..., n, where 0 means the element
          exchange is skipped, which is the default setting.
SameNumG  A non-negative integer that defines how many elements in the current column of
          the current particle should be the same as in the corresponding column of the
          global best. SameNumG can be 0, 1, 2, ..., n, where 0 means the element exchange
          is skipped. The default setting is n/4. Note that SameNumP and SameNumG cannot
          both be 0 at the same time.
p0        The probability of exchanging two randomly selected elements in the current column
          of the current particle LHD. The default is set to be 1/(k - 1).
OC        An optimality criterion. The default setting is "phi_p", and it can be one of the
          following: "phi_p", "AvgAbsCor", "MaxAbsCor", "MaxProCriterion".
p         A positive integer, which is one parameter in the \(\phi_{p}\) formula; p is
          preferred to be large. The default is set to be 15.
q         A positive integer, which is one parameter in the \(\phi_{p}\) formula; q can be
          either 1 or 2. If q is 1, the Manhattan (rectangular) distance will be calculated.
          If q is 2, the Euclidean distance will be calculated.
maxtime   A positive number, which indicates the expected maximum CPU time measured in
          minutes. For example, maxtime = 3.5 indicates the CPU time will be no greater
          than three and a half minutes. The default is set to be 5.

Genetic Algorithms

The genetic algorithm (GA) is a nature-inspired metaheuristic optimization algorithm that mimics Charles Darwin's idea of natural selection (Goldberg 1989; Holland et al. 1992). (Liefvendahl and Stocki 2006) proposed a version of the GA for identifying maximin distance LHDs, implementing a column exchange technique to solve the discrete optimization task. In the LHD package, the function GA() implements this algorithm:

GA(n, k, m = 10, N = 10, pmut = 1/(k - 1), OC = "phi_p", p = 15, q = 1,
   maxtime = 5)

Table 3 provides an overview of all the input arguments in GA(), where n, k, N, OC, p, q, and maxtime are exactly the same as the input arguments of the function SA(). m is the population size, i.e., the number of candidate solutions in each iteration, and it must be an even number. pmut is a tuning parameter that controls how likely a mutation is to occur. When a mutation occurs, two randomly selected elements are exchanged in the current column of the current LHD. pmut serves the same purpose as p0 in LaPSO(), preventing the algorithm from getting stuck at a local optimum, and it is recommended to be \(1/(k-1)\). The stopping criterion of the function is the same as that of the function SA.
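The column-wise mutation described above is simple to sketch. The following is our own illustration (mutate_column is a hypothetical name, not a package function):

```r
# With probability pmut, exchange two randomly chosen entries of column x.
# A swap keeps the column a permutation, so the result is still a valid
# LHD column; this mirrors the mutation move described in the text.
mutate_column <- function(x, pmut) {
  if (runif(1) < pmut) {
    idx <- sample(seq_along(x), 2)   # two distinct random positions
    x[idx] <- x[rev(idx)]            # exchange them
  }
  x
}

sort(mutate_column(1:5, pmut = 1))   # the levels 1..5 survive any mutation
```

Because mutation only permutes levels within a column, every candidate in the population remains an LHD throughout the search.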

Table 3: Overview of Input Arguments of the GA Function

Argument  Description
n         A positive integer that defines the number of rows (or run size) of the output LHD.
k         A positive integer that defines the number of columns (or factor size) of the
          output LHD.
m         A positive even integer that defines the population size. The default is set to
          be 10. A large value of m will result in a high CPU time, and it is recommended
          to be no greater than 100.
N         A positive integer that defines the maximum number of iterations in the algorithm.
          A large value of N will result in a high CPU time, and it is recommended to be no
          greater than 500. The default is set to be 10.
pmut      The probability of mutation. When a mutation happens, two randomly selected
          elements in the current column of the current LHD are exchanged. The default is
          set to be 1/(k - 1).
OC        An optimality criterion. The default setting is "phi_p", and it can be one of the
          following: "phi_p", "AvgAbsCor", "MaxAbsCor", "MaxProCriterion".
p         A positive integer, which is one parameter in the \(\phi_{p}\) formula; p is
          preferred to be large. The default is set to be 15.
q         A positive integer, which is one parameter in the \(\phi_{p}\) formula; q can be
          either 1 or 2. If q is 1, the Manhattan (rectangular) distance will be calculated.
          If q is 2, the Euclidean distance will be calculated.
maxtime   A positive number, which indicates the expected maximum CPU time measured in
          minutes. For example, maxtime = 3.5 indicates the CPU time will be no greater
          than three and a half minutes. The default is set to be 5.

Illustrating Examples for the Implemented Search Algorithms

This subsection demonstrates some examples of how to use the search algorithms in the developed LHD package. In Table 4, we summarize the R functions of the algorithms discussed in the previous subsections, which can be used to identify different types of optimal LHDs. Users who seek fast solutions can use the default settings of the input arguments after specifying the design sizes. See the following examples.

Table 4: Search algorithm functions in the LHD package

Function  Description
SA        Returns an LHD via the simulated annealing algorithm (Morris and Mitchell 1995).
OASA      Returns an LHD via the orthogonal-array-based simulated annealing algorithm
          (Leary et al. 2003), where an OA of the required design size must exist.
SA2008    Returns an LHD via the simulated annealing algorithm with the multi-objective
          optimization approach (Joseph and Hung 2008).
LaPSO     Returns an LHD via the particle swarm optimization algorithm (Chen et al. 2013).
GA        Returns an LHD via the genetic algorithm (Liefvendahl and Stocki 2006).
#Generate a 5 by 3 maximin distance LHD by the SA function.
> try.SA = SA(n = 5, k = 3); try.SA
     [,1] [,2] [,3]
[1,]    2    2    1
[2,]    5    3    2
[3,]    4    5    5
[4,]    3    1    4
[5,]    1    4    3
> phi_p(try.SA)   #\phi_p is smaller than that of a random LHD (0.3336608).
[1] 0.2169567

#Similarly, generate 5 by 3 maximin distance LHDs by the SA2008, LaPSO and GA functions.
> try.SA2008 = SA2008(n = 5, k = 3)
> try.LaPSO = LaPSO(n = 5, k = 3)
> try.GA = GA(n = 5, k = 3)

#Generate an OA(9,2,3,2), an orthogonal array with 9 runs, 2 factors, 3 levels, and strength 2.
> OA = matrix(c(rep(1:3, each = 3), rep(1:3, times = 3)),
+           ncol = 2, nrow = 9, byrow = FALSE)
#Generate a maximin distance LHD with the same design size as the input OA
#by the orthogonal-array-based simulated annealing algorithm.
> try.OASA = OASA(OA)
> OA; try.OASA
      [,1] [,2]         [,1] [,2]
[1,]    1    1    [1,]    1    2
[2,]    1    2    [2,]    2    6
[3,]    1    3    [3,]    3    9
[4,]    2    1    [4,]    4    3
[5,]    2    2    [5,]    6    5
[6,]    2    3    [6,]    5    7
[7,]    3    1    [7,]    7    1
[8,]    3    2    [8,]    9    4
[9,]    3    3    [9,]    8    8

Note that the default optimality criterion embedded in all search algorithms is "phi_p" (that is, the maximin distance criterion) with \(q=1\), leading to maximin \(L_1\)-distance LHDs. For other optimality criteria, users should change the setting of the input argument OC (with options "phi_p", "MaxProCriterion", "AvgAbsCor" and "MaxAbsCor"). The following examples illustrate some details of different argument settings.

#Below try.SA is a 5 by 3 maximin distance LHD generated by the SA with 30 iterations (N = 30).
#The temperature starts at 10 (T0 = 10) and decreases 10% (rate = 0.1) each time.
#The minimum temperature allowed is 1 (Tmin = 1) and the maximum number of perturbations
#that the algorithm will try without improvements is 5 (Imax = 5). The optimality criterion
#used is the maximin distance criterion (OC = "phi_p") with p = 15 and q = 1, and the
#maximum CPU time is 5 minutes (maxtime = 5).
> try.SA = SA(n = 5, k = 3, N = 30, T0 = 10, rate = 0.1, Tmin = 1, Imax = 5, OC = "phi_p",
+           p = 15, q = 1, maxtime = 5); try.SA
     [,1] [,2] [,3]
[1,]    1    3    4
[2,]    2    5    2
[3,]    5    4    3
[4,]    4    1    5
[5,]    3    2    1
> phi_p(try.SA)
[1] 0.2169567

#Below try.SA2008 is a 5 by 3 maximin distance LHD generated by the SA with
#the multi-objective optimization approach. The input arguments are interpreted
#the same as for the design try.SA above.
> try.SA2008 = SA2008(n = 5, k = 3, N = 30, T0 = 10, rate = 0.1, Tmin = 1, Imax = 5,
+                   OC = "phi_p", p = 15, q = 1, maxtime = 5)

#Below try.OASA is a 9 by 2 maximin distance LHD generated by the
#orthogonal-array-based simulated annealing algorithm with the input
#OA (defined previously), and the rest of the input arguments are interpreted
#the same as for the design try.SA above.
> try.OASA = OASA(OA, N = 30, T0 = 10, rate = 0.1, Tmin = 1, Imax = 5,
+               OC = "phi_p", p = 15, q = 1, maxtime = 5)

#Below try.LaPSO is a 5 by 3 maximum projection LHD generated by the particle swarm
#optimization algorithm with 20 particles (m = 20) and 30 iterations (N = 30).
#Zero (or two) elements in any column of the current particle should be the same as
#the elements of the corresponding column from the personal best (or global best),
#because of SameNumP = 0 (or SameNumG = 2).
#The probability of exchanging two randomly selected elements is 0.5 (p0 = 0.5).
#The optimality criterion is the maximum projection criterion (OC = "MaxProCriterion").
#The maximum CPU time is 5 minutes (maxtime = 5).
> try.LaPSO = LaPSO(n = 5, k = 3, m = 20, N = 30, SameNumP = 0, SameNumG = 2,
+                 p0 = 0.5, OC = "MaxProCriterion", maxtime = 5); try.LaPSO
     [,1] [,2] [,3]
[1,]    4    5    4
[2,]    3    1    3
[3,]    5    2    1
[4,]    2    3    5
[5,]    1    4    2
#Recall the value is 0.5375482 from the random LHD in Section 2.
> MaxProCriterion(try.LaPSO)
[1] 0.3561056

#Below try.GA is a 5 by 3 OLHD generated by the genetic algorithm with
#population size 20 (m = 20), number of iterations 30 (N = 30), mutation
#probability 0.5 (pmut = 0.5), the maximum absolute correlation criterion
#(OC = "MaxAbsCor"), and maximum CPU time 5 minutes (maxtime = 5).
> try.GA = GA(n = 5, k = 3, m = 20, N = 30, pmut = 0.5, OC = "MaxAbsCor",
+                 maxtime = 5); try.GA
     [,1] [,2] [,3]
[1,]    2    1    2
[2,]    4    4    5
[3,]    3    5    1
[4,]    5    2    3
[5,]    1    3    4
#Recall the value is 0.9 from the random LHD in Section 2.
> MaxAbsCor(try.GA)
[1] 0.1     #The maximum absolute correlation between columns is 0.1

Next, we discuss some details of the implementation. In the SA-based algorithms (SA, SA2008, and OASA), the number of iterations N is recommended to be no greater than 500 for computing time considerations. The input rate determines the percentage decrease of the current temperature (for example, \(0.1\) means a decrease of \(10\%\) each time). A high rate makes the temperature drop rapidly, which leads to an early stop of the algorithm. It is recommended to set rate between \(0.1\) and \(0.15\). Imax indicates the maximum number of perturbations the algorithm will attempt without improvements before the temperature is reduced, and it is recommended to be no greater than 5 for computing time considerations. OC chooses the optimality criterion, and the "phi_p" criterion in (1) is set as the default. OC has other options, including "MaxProCriterion", "AvgAbsCor" and "MaxAbsCor". Our algorithms support both the \(L_1\) and \(L_2\) distances.

For every algorithm, we incorporate a progress bar to visualize the computing time used. After an algorithm completes, information on the "average CPU time per iteration" and the "number of iterations completed" is presented. Users can set a limit on the CPU time used by each algorithm with the argument maxtime, according to their practical needs.

We also provide some illustrative code to demonstrate that the designs found by the LHD package are better than existing ones; the code below can be easily modified to construct other design sizes or other LHD types. Out of 100 trials, the code below shows that the GA in the LHD package constructed better MaxPro LHDs 99 times compared to the algorithm in the MaxPro package, with 500 iterations set for both algorithms. We did not compare the CPU time between the two packages, since one is written in R and the other in C++, but with the same number of iterations, the GA in the LHD package almost always constructs better MaxPro LHDs.

#Make sure both packages are properly installed before loading them.
> library(LHD)
> library(MaxPro)

> count = 0 #Define a variable for counting purposes

> k = 5       #Factor size 5
> n = 10*k    #Run size = 10*factor size

#Setting 500 iterations for both algorithms, the loop below counts
#how many times the GA from the LHD package outperforms the algorithm
#from the MaxPro package out of 100 trials.
> for (i in 1:100) {

  LHD = LHD::GA(n = n, k = k, m = 100, N = 500)
  MaxPro = MaxPro::MaxProLHD(n = n, p = k, total_iter = 500)$Design

  #MaxPro * n + 0.5 applies the transformation mentioned in Section 2
  #to revert the scaling.
  Result.LHD = LHD::MaxProCriterion(LHD)
  Result.MaxPro = LHD::MaxProCriterion(MaxPro * n + 0.5)

  if (Result.LHD < Result.MaxPro) {count = count + 1}

}

> count
[1] 99

4 Algebraic Constructions for Optimal LHDs with Certain Sizes

There are algebraic constructions available for certain design sizes, and theoretical results have been developed to guarantee the efficiency of such designs. Algebraic constructions require almost no searching, which makes them especially attractive for large designs. In this section, we present the algebraic constructions available in the LHD package for maximin distance LHDs and orthogonal LHDs.

Algebraic Constructions for Maximin Distance LHDs

(Wang et al. 2018) proposed to generate maximin distance LHDs via good lattice point (GLP) sets (Zhou and Xu 2015) and the Williams transformation (Williams 1949). In practice, their method can lead to space-filling designs with relatively flexible sizes: the run size \(n\) is flexible, but the factor size \(k\) must be no greater than the number of positive integers that are co-prime to \(n\). They proved that the resulting designs of sizes \(n \times (n-1)\) (with \(n\) being any odd prime) and \(n \times n\) (with \(2n+1\) or \(n+1\) being an odd prime) are optimal under the maximin \(L_1\)-distance criterion. This construction method by (Wang et al. 2018) is very attractive for constructing large maximin distance LHDs. In the LHD package, the function FastMmLHD() implements this method:

FastMmLHD(n, k, method = "manhattan", t1 = 10),

where n and k are the desired run size and factor size. method specifies the distance measure and can be one of the following: "euclidean", "maximum", "manhattan", "canberra", "binary" or "minkowski"; any unambiguous substring can be given. t1 is a tuning parameter that determines how many repeats will be implemented to search for the optimal design. The default is set to be 10.
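To give a feel for the building blocks named above, here is a small sketch of ours of a GLP set and one commonly used form of the Williams transformation. The helper names and the exact transformation form are our assumptions for illustration; FastMmLHD() is the package's full, optimized implementation:

```r
gcd <- function(a, b) if (b == 0) a else gcd(b, a %% b)

# GLP set: column j holds (i * h_j) mod n for i = 1..n, where each generator
# h_j is co-prime to n, so every column is a permutation of 0..n-1.
glp_set <- function(n) {
  h <- Filter(function(j) gcd(j, n) == 1, 1:(n - 1))
  outer(1:n, h, function(i, j) (i * j) %% n)
}

# One common form of the Williams transformation on levels 0..n-1 (odd n):
# x -> 2x if 2x < n, else 2(n - x) - 1; this is a bijection on 0..n-1,
# so the transformed columns remain LHD columns.
williams <- function(x, n) ifelse(2 * x < n, 2 * x, 2 * (n - x) - 1)

D <- glp_set(7)       # 7 runs, 6 columns (all of 1..6 are co-prime to 7)
W <- williams(D, 7)   # every column is still a permutation of 0..6
```

The transformation folds the linear structure of the GLP set, which is what improves the minimum \(L_1\) distance in the constructions of (Wang et al. 2018).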

(Tang 1993) proposed to construct orthogonal array-based LHDs (OALHDs) from existing orthogonal arrays (OAs) and showed that OALHDs can have better space-filling properties than general LHDs. In the LHD package, the function OA2LHD() implements this method:

OA2LHD(OA),

where OA is an orthogonal array matrix. Users only need to input an OA, and the function returns an OALHD with the same design size as the input OA.

Algebraic Constructions for Orthogonal LHDs

Orthogonal LHDs (OLHDs) have zero pairwise correlation between any two columns and are widely used by practitioners. There is a rich literature on the construction of OLHDs with various design sizes, but the constructions are often too hard for practitioners to replicate in practice. The LHD package implements some currently popular methods (Tang 1993; Ye 1998; Butler 2001; Cioppa and Lucas 2007; Lin et al. 2009; Sun et al. 2010) for practitioners, and the functions are easy to use.

(Ye 1998) proposed a construction for OLHDs with run sizes \(n=2^m+1\) and factor sizes \(k=2m-2\), where \(m\) is any integer bigger than 2. In the LHD package, the function OLHD.Y1998() implements this algebraic construction:

OLHD.Y1998(m),

where the input argument m is the \(m\) in the construction of (Ye 1998). (Cioppa and Lucas 2007) extended (Ye 1998)'s method to construct OLHDs with run size \(n=2^m+1\) and factor size \(k=m+ {m-1 \choose 2}\), where \(m\) is any integer bigger than 2. In the LHD package, the function OLHD.C2007() implements this algebraic construction, with the input argument m remaining the same:

OLHD.C2007(m)

(Sun et al. 2010) extended their earlier work (Sun et al. 2009) to construct OLHDs with \(n=r2^{c+1}+1\) or \(n=r2^{c+1}\) and \(k=2^c\), where \(r\) and \(c\) are positive integers. In the LHD package, the function OLHD.S2010() implements this algebraic construction:

OLHD.S2010(C, r, type = "odd"),

where the input arguments C and r are \(c\) and \(r\) in the construction. When the input argument type is "odd", the output design size is \(n=r2^{c+1}+1\) by \(k=2^c\). When type is "even", the output design size is \(n=r2^{c+1}\) by \(k=2^c\).

(Lin et al. 2009) constructed OLHDs or NOLHDs with \(n^2\) runs and \(2fp\) factors by coupling an OLHD(\(n\), \(p\)) or NOLHD(\(n\), \(p\)) with an OA(\(n^2,2f,n,2\)). For example, an OLHD(11, 7) coupled with an OA(121, 12, 11, 2) yields an OLHD(121, 84). The design size of the output OLHD or NOLHD highly depends on the existence of the OAs. In the LHD package, the function OLHD.L2009() implements this algebraic construction:

OLHD.L2009(OLHD, OA),

where the input arguments OLHD and OA are the OLHD and OA to be coupled, and their design sizes need to be aligned with the designated pattern of the construction.

(Butler 2001) proposed a method to construct OLHDs with the run size \(n\) being an odd prime and the factor size \(k\) being less than or equal to \(n-1\), via the Williams transformation (Williams 1949). In the LHD package, the function OLHD.B2001() implements this algebraic construction, with the input arguments n and k exactly matching those in the construction:

OLHD.B2001(n, k)

Illustrating Examples for the Implemented Algebraic Constructions

In Table 5, we summarize the algebraic constructions implemented by the developed LHD package, where FastMmLHD and OA2LHD are for maximin distance LHDs and OLHD.Y1998, OLHD.C2007, OLHD.S2010, OLHD.L2009 and OLHD.B2001 are for orthogonal LHDs. The following examples illustrate how to use them.

Table 5: Algebraic constructions in the LHD package

Function     Description
FastMmLHD    Returns a maximin distance LHD matrix (Wang et al. 2018).
OA2LHD       Expands an orthogonal array to an LHD (Tang 1993).
OLHD.Y1998   Returns a \(2^m+1\) by \(2m-2\) orthogonal LHD matrix (Ye 1998),
             where \(m\) is an integer and \(m \geq 2\).
OLHD.C2007   Returns a \(2^m+1\) by \(m+{m-1 \choose 2}\) orthogonal LHD matrix
             (Cioppa and Lucas 2007), where \(m\) is an integer and \(m \geq 2\).
OLHD.S2010   Returns an \(r2^{c+1}+1\) or \(r2^{c+1}\) by \(2^c\) orthogonal LHD
             matrix (Sun et al. 2010), where \(r\) and \(c\) are positive integers.
OLHD.L2009   Couples an \(n\) by \(p\) orthogonal LHD with an \(n^2\) by \(2f\),
             strength-2, level-\(n\) orthogonal array to generate an \(n^2\) by
             \(2fp\) orthogonal LHD (Lin et al. 2009).
OLHD.B2001   Returns an orthogonal LHD (Butler 2001) whose run size \(n\) is an
             odd prime and whose factor size \(k\) is at most \(n-1\).
#FastMmLHD(8, 8) generates an optimal 8 by 8 maximin L_1 distance LHD.
> try.FastMm = FastMmLHD(n = 8, k = 8); try.FastMm
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,]    0    1    2    3    4    5    6    7
[2,]    1    3    5    7    6    4    2    0
[3,]    2    5    7    4    1    0    3    6
[4,]    3    7    4    0    2    6    5    1
[5,]    4    6    1    2    7    3    0    5
[6,]    5    4    0    6    3    1    7    2
[7,]    6    2    3    5    0    7    1    4
[8,]    7    0    6    1    5    2    4    3
#OA2LHD(OA) expands an input OA to an LHD of the same run size.
> try.OA2LHD = OA2LHD(OA)
> OA; try.OA2LHD
     [,1] [,2]          [,1] [,2]
[1,]    1    1     [1,]    1    2
[2,]    1    2     [2,]    2    4
[3,]    1    3     [3,]    3    9
[4,]    2    1     [4,]    4    3
[5,]    2    2     [5,]    5    5
[6,]    2    3     [6,]    6    7
[7,]    3    1     [7,]    9    1
[8,]    3    2     [8,]    8    6
[9,]    3    3     [9,]    7    8
#OLHD.Y1998(m = 3) generates a 9 by 4 orthogonal LHD.
#Note that 2^m+1 = 9 and 2*m-2 = 4.
> try.Y1998 = OLHD.Y1998(m = 3); try.Y1998
     [,1] [,2] [,3] [,4]
[1,]    4   -3   -2    1
[2,]    3    4   -1   -2
[3,]    1   -2    3   -4
[4,]    2    1    4    3
[5,]    0    0    0    0
[6,]   -4    3    2   -1
[7,]   -3   -4    1    2
[8,]   -1    2   -3    4
[9,]   -2   -1   -4   -3
> MaxAbsCor(try.Y1998)    #column-wise correlations are 0.
[1] 0
#OLHD.C2007(m = 4) generates a 17 by 7 orthogonal LHD.
#Note that 2^m+1 = 17 and m + choose(m-1, 2) = 4 + 3 = 7.
> try.C2007 = OLHD.C2007(m = 4); dim(try.C2007)
[1] 17  7
> MaxAbsCor(try.C2007)    #column-wise correlations are 0
[1] 0
#OLHD.S2010(C = 3, r = 3, type = "odd") generates a 49 by 8 orthogonal LHD.
#Note that 3*2^4+1 = 49 and 2^3 = 8.
> dim(OLHD.S2010(C = 3, r = 3, type = "odd"))
[1] 49  8
> MaxAbsCor(OLHD.S2010(C = 3, r = 3, type = "odd")) #column-wise correlations are 0
[1] 0

#OLHD.S2010(C = 3, r = 3, type = "even") generates a 48 by 8 orthogonal LHD.
#Note that 3*2^4 = 48 and 2^3 = 8.
> dim(OLHD.S2010(C = 3, r = 3, type = "even"))
[1] 48  8
> MaxAbsCor(OLHD.S2010(C = 3, r = 3, type = "even")) #column-wise correlations are 0
[1] 0
#Create a 5 by 2 OLHD.
> OLHD = OLHD.C2007(m = 2)

#Create an OA(25, 6, 5, 2).
> OA = matrix(c(2,2,2,2,2,1,2,1,5,4,3,5,3,2,1,5,4,5,1,5,4,3,2,5,4,1,3,5,2,3,
1,2,3,4,5,2,1,3,5,2,4,3,1,1,1,1,1,1,4,3,2,1,5,5,5,5,5,5,5,1,4,4,4,4,4,1,
3,1,4,2,5,4,3,3,3,3,3,1,3,5,2,4,1,3,3,4,5,1,2,2,5,4,3,2,1,5,2,3,4,5,1,2,
2,5,3,1,4,4,1,4,2,5,3,4,4,2,5,3,1,4,2,4,1,3,5,3,5,3,1,4,2,4,5,2,4,1,3,3,
5,1,2,3,4,2,4,5,1,2,3,2), ncol = 6, nrow = 25, byrow = TRUE)

#OLHD.L2009(OLHD, OA) generates a 25 by 12 orthogonal LHD.
#Note that n = 5 so n^2 = 25. p = 2 and f = 3 so 2fp = 12.
> dim(OLHD.L2009(OLHD, OA))
[1] 25 12
> MaxAbsCor(OLHD.L2009(OLHD, OA))    #column-wise correlations are 0.
[1] 0
#OLHD.B2001(n = 11, k = 5) generates an 11 by 5 orthogonal LHD.
> dim(OLHD.B2001(n = 11, k = 5))
[1] 11  5

5 Other R Packages for Latin Hypercubes and Comparative Discussion

Several R packages facilitate Latin hypercube sampling and design construction for computer experiments. Among these, the lhs package (Carnell 2024) is widely recognized for its utility. It provides functions for generating both random and optimized Latin hypercube samples (but not designs), and its methods are particularly useful for simulation studies where space-filling properties are desired but design optimality is not the primary focus. The SLHD package (Ba 2015) was originally developed for generating sliced LHDs (Ba et al. 2015), though practitioners can set the number of slices to one to generate maximin LHDs. The MaxPro package (Ba and Joseph 2018) focuses on constructing designs that maximize projection properties. One of its functions, MaxProLHD, generates MaxPro LHDs using a simulated annealing algorithm (Joseph et al. 2015).

While we acknowledge the contributions of these related R packages, the LHD package has several distinguishing features. It embeds multiple optimality criteria, enabling the construction of various types of optimal LHDs, whereas lhs and SLHD focus primarily on space-filling Latin hypercube samples and designs, and MaxPro focuses primarily on maximum projection LHDs. The LHD package implements various search algorithms as well as algebraic constructions; the other three packages do not implement algebraic constructions, and SLHD and MaxPro each implement only one algorithm for constructing LHDs. The primary application of LHD is the design of computer experiments, whereas lhs is mainly used for sampling and simulation studies. In short, LHD emphasizes design optimality, while lhs emphasizes the space-filling properties of samples.
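The contrast can be seen in how a 10-run, 3-factor design would be requested from each package. The sketch below is illustrative only: it assumes the current CRAN interfaces of lhs, SLHD, MaxPro and LHD (function names and argument conventions should be checked against each package's manual), and it is not a benchmark.

```r
# A sketch of generating a 10-run, 3-factor design with each package
# discussed above (CRAN signatures assumed; not a benchmark).
library(lhs);    X1 <- maximinLHS(n = 10, k = 3)           # space-filling sample in [0,1]^3
library(SLHD);   X2 <- maximinSLHD(t = 1, m = 10, k = 3)   # one slice, i.e. a plain maximin LHD
library(MaxPro); X3 <- MaxProLHD(n = 10, p = 3)$Design     # maximum projection LHD
library(LHD);    X4 <- SA(n = 10, k = 3)                   # maximin distance LHD via SA
```

Note that lhs returns a sample on the continuous unit cube, while LHD returns an integer-level design matrix, reflecting the sampling-versus-design distinction drawn above.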


6 Conclusion and Recommendation

The LHD package implements popular search algorithms, including SA (Morris and Mitchell 1995), OASA (Leary et al. 2003), SA2008 (Joseph and Hung 2008), LaPSO (Chen et al. 2013) and GA (Liefvendahl and Stocki 2006), along with widely used algebraic constructions (Tang 1993; Ye 1998; Butler 2001; Cioppa and Lucas 2007; Lin et al. 2009; Sun et al. 2010; Wang et al. 2018), for constructing three types of commonly used optimal LHDs: maximin distance LHDs, maximum projection LHDs and (nearly) orthogonal LHDs. We aim to provide guidance and an easy-to-use tool for practitioners to find appropriate experimental designs. Algebraic constructions are preferred when available, especially for large designs; search algorithms generate optimal LHDs with flexible sizes.

Among the few R packages devoted to LHDs, LHD is comprehensive and self-contained: it offers not only search algorithms and algebraic constructions but also other useful functions for LHD research and development, such as computing different optimality criteria, generating random LHDs, exchanging two random elements in a matrix, and calculating intersite distances between matrix rows. The help manual in the package documentation contains further details and illustrative examples for users who want to explore more of the functions in the package.


Acknowledgments

This research was partially supported by the National Science Foundation (NSF) grant DMS-2311186 and the National Key R&D Program of China 2024YFA1016200. The authors appreciate the reviewers' constructive comments and suggestions.


7 Supplementary materials

Supplementary materials are available in addition to this article. They can be downloaded at RJ-2025-033.zip


8 Note

This article is converted from a Legacy LaTeX article using the texor package. The pdf version is the official version. To report a problem with the html, refer to CONTRIBUTE on the R Journal homepage.

References

S. Ba. SLHD: Maximin-distance (sliced) Latin hypercube designs. 2015. URL https://CRAN.R-project.org/package=SLHD. R package version 2.1-1.

S. Ba and V. R. Joseph. MaxPro: Maximum projection designs. 2018. URL https://CRAN.R-project.org/package=MaxPro. R package version 4.1-2.

S. Ba, W. R. Myers and W. A. Brenneman. Optimal sliced Latin hypercube designs. Technometrics, 57(4): 479–487, 2015.

N. A. Butler. Optimal and orthogonal Latin hypercube designs for computer experiments. Biometrika, 88(3): 847–857, 2001.

R. Carnell. lhs: Latin hypercube samples. 2024. URL https://CRAN.R-project.org/package=lhs. R package version 1.2.0.

R.-B. Chen, D.-N. Hsieh, Y. Hung and W. Wang. Optimizing Latin hypercube designs by particle swarm. Statistics and Computing, 23(5): 663–676, 2013.

T. M. Cioppa and T. W. Lucas. Efficient nearly orthogonal and space-filling Latin hypercubes. Technometrics, 49(1): 45–55, 2007.

K.-T. Fang, R. Li and A. Sudjianto. Design and modeling for computer experiments. CRC Press, 2005.

K.-T. Fang, C.-X. Ma and P. Winker. Centered \({L}_{2}\)-discrepancy of random sampling and Latin hypercube design, and construction of uniform designs. Mathematics of Computation, 71(237): 275–296, 2002.

S. D. Georgiou. Orthogonal Latin hypercube designs from generalized orthogonal designs. Journal of Statistical Planning and Inference, 139(4): 1530–1540, 2009.

S. D. Georgiou and I. Efthimiou. Some classes of orthogonal Latin hypercube designs. Statistica Sinica, 24(1): 101–120, 2014.

D. E. Goldberg. Genetic algorithms in search, optimization, and machine learning. Addison-Wesley, 1989.

A. Grosso, A. Jamali and M. Locatelli. Finding maximin Latin hypercube designs by iterated local search heuristics. European Journal of Operational Research, 197(2): 541–547, 2009.

F. Hickernell. A generalized discrepancy and quadrature error bound. Mathematics of Computation, 67(221): 299–322, 1998.

J. H. Holland. Adaptation in natural and artificial systems: An introductory analysis with applications to biology, control, and artificial intelligence. MIT Press, 1992.

R. Jin, W. Chen and A. Sudjianto. An efficient algorithm for constructing optimal design of computer experiments. Journal of Statistical Planning and Inference, 134(1): 268–287, 2005.

M. E. Johnson, L. M. Moore and D. Ylvisaker. Minimax and maximin distance designs. Journal of Statistical Planning and Inference, 26(2): 131–148, 1990.

V. R. Joseph, E. Gul and S. Ba. Maximum projection designs for computer experiments. Biometrika, 102(2): 371–380, 2015.

V. R. Joseph and Y. Hung. Orthogonal-maximin Latin hypercube designs. Statistica Sinica, 171–186, 2008.

J. Kennedy and R. Eberhart. Particle swarm optimization. In Proceedings of ICNN'95 – International Conference on Neural Networks, pages 1942–1948, 1995. IEEE.

S. Kirkpatrick, C. D. Gelatt and M. P. Vecchi. Optimization by simulated annealing. Science, 220(4598): 671–680, 1983.

S. Leary, A. Bhaskar and A. Keane. Optimal orthogonal-array-based Latin hypercubes. Journal of Applied Statistics, 30(5): 585–598, 2003.

M. Liefvendahl and R. Stocki. A study on algorithms for optimization of Latin hypercubes. Journal of Statistical Planning and Inference, 136(9): 3231–3247, 2006.

C. D. Lin, R. Mukerjee and B. Tang. Construction of orthogonal and nearly orthogonal Latin hypercubes. Biometrika, 96(1): 243–247, 2009.

M. D. McKay, R. J. Beckman and W. J. Conover. Comparison of three methods for selecting values of input variables in the analysis of output from a computer code. Technometrics, 21(2): 239–245, 1979.

M. D. Morris and T. J. Mitchell. Exploratory designs for computational experiments. Journal of Statistical Planning and Inference, 43(3): 381–402, 1995.

J. Sacks, S. B. Schiller and W. J. Welch. Designs for computer experiments. Technometrics, 31(1): 41–47, 1989.

M. C. Shewry and H. P. Wynn. Maximum entropy sampling. Journal of Applied Statistics, 14(2): 165–170, 1987.

D. M. Steinberg and D. K. J. Lin. A construction method for orthogonal Latin hypercube designs. Biometrika, 93(2): 279–288, 2006.

F. Sun, M.-Q. Liu and D. K. J. Lin. Construction of orthogonal Latin hypercube designs. Biometrika, 96(4): 971–974, 2009.

F. Sun, M.-Q. Liu and D. K. J. Lin. Construction of orthogonal Latin hypercube designs with flexible run sizes. Journal of Statistical Planning and Inference, 140(11): 3236–3242, 2010.

F. Sun and B. Tang. A general rotation method for orthogonal Latin hypercubes. Biometrika, 104(2): 465–472, 2017.

B. Tang. Orthogonal array-based Latin hypercubes. Journal of the American Statistical Association, 88(424): 1392–1397, 1993.

L. Wang, Q. Xiao and H. Xu. Optimal maximin \({L}_{1}\)-distance Latin hypercube designs based on good lattice point designs. The Annals of Statistics, 46(6B): 3741–3766, 2018.

E. Williams. Experimental designs balanced for the estimation of residual effects of treatments. Australian Journal of Chemistry, 2(2): 149–168, 1949.

Q. Xiao and H. Xu. Construction of maximin distance designs via level permutation and expansion. Statistica Sinica, 28(3): 1395–1414, 2018.

Q. Xiao and H. Xu. Construction of maximin distance Latin squares and related Latin hypercube designs. Biometrika, 104(2): 455–464, 2017.

J. Yang and M.-Q. Liu. Construction of orthogonal and nearly orthogonal Latin hypercube designs from orthogonal designs. Statistica Sinica, 433–442, 2012.

K. Q. Ye. Orthogonal column Latin hypercubes and their application in computer experiments. Journal of the American Statistical Association, 93(444): 1430–1439, 1998.

K. Q. Ye, W. Li and A. Sudjianto. Algorithmic construction of optimal symmetric Latin hypercube designs. Journal of Statistical Planning and Inference, 90(1): 145–159, 2000.

Y. Zhou and H. Xu. Space-filling properties of good lattice point sets. Biometrika, 102(4): 959–966, 2015.

Reuse


Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".


Citation


For attribution, please cite this work as

Wang, et al., "LHD: An All-encompassing R Package for Constructing Optimal Latin Hypercube Designs", The R Journal, 2026

BibTeX citation

@article{RJ-2025-033,
  author = {Wang, Hongzhi and Xiao, Qian and Mandal, Abhyuday},
  title = {LHD: An All-encompassing R Package for Constructing Optimal Latin Hypercube Designs},
  journal = {The R Journal},
  year = {2026},
  note = {https://doi.org/10.32614/RJ-2025-033},
  doi = {10.32614/RJ-2025-033},
  volume = {17},
  issue = {4},
  issn = {2073-4859},
  pages = {20-36}
}
+ + + + + + + diff --git a/_articles/RJ-2025-033/RJ-2025-033.pdf b/_articles/RJ-2025-033/RJ-2025-033.pdf new file mode 100644 index 0000000000..040cc6554f Binary files /dev/null and b/_articles/RJ-2025-033/RJ-2025-033.pdf differ diff --git a/_articles/RJ-2025-033/RJ-2025-033.zip b/_articles/RJ-2025-033/RJ-2025-033.zip new file mode 100644 index 0000000000..a6fd9d186d Binary files /dev/null and b/_articles/RJ-2025-033/RJ-2025-033.zip differ diff --git a/_articles/RJ-2025-033/RJournal.sty b/_articles/RJ-2025-033/RJournal.sty new file mode 100644 index 0000000000..351990be38 --- /dev/null +++ b/_articles/RJ-2025-033/RJournal.sty @@ -0,0 +1,358 @@ +% Package `RJournal' to use with LaTeX2e +% Copyright (C) 2010 by the R Foundation +% Copyright (C) 2013 by the R Journal +% +% Originally written by Kurt Hornik and Friedrich Leisch with subsequent +% edits by the editorial board +% +% CAUTION: +% Do not modify this style file. Any changes to this file will be reset when your +% article is submitted. +% If you must modify the style or add LaTeX packages to the article, these +% should be specified in RJwrapper.tex + +\NeedsTeXFormat{LaTeX2e}[1995/12/01] +\ProvidesPackage{RJournal}[2025/10/05 v0.17 RJournal package] + +\RequirePackage{tikz} + +% Overall page layout, fonts etc ----------------------------------------------- + +% Issues of of \emph{The R Journal} are created from the standard \LaTeX{} +% document class \pkg{report}. + +\RequirePackage{geometry} +\geometry{a4paper, + textwidth=14cm, top=1cm, bottom=1cm, + includehead,includefoot,centering, + footskip=1.5cm} +\raggedbottom +\sloppy +\clubpenalty = 10000 +\widowpenalty = 10000 +\brokenpenalty = 10000 +\usepackage{microtype} + + +\RequirePackage{fancyhdr} +\fancyhead{} +\fancyheadoffset{2cm} +\fancyhead[L]{\textsc{\RJ@sectionhead}} +\fancyhead[R]{\thepage} +\fancyfoot{} +\fancyfoot[L]{The R Journal Vol. 
\RJ@volume/\RJ@number, \RJ@month~\RJ@year} +\fancyfoot[R]{ISSN 2073-4859} +\pagestyle{fancy} + +% We use the following fonts (all with T1 encoding): +% +% rm & palatino +% tt & inconsolata +% sf & helvetica +% math & palatino + +\RequirePackage{microtype} + +\RequirePackage[scaled=0.92]{helvet} +\RequirePackage{palatino,mathpazo} +\RequirePackage[scaled=1.02]{inconsolata} +\RequirePackage[T1]{fontenc} + +\RequirePackage[hyphens]{url} +\RequirePackage[pagebackref]{hyperref} +\renewcommand{\backref}[1]{[p#1]} + +% Dark blue colour for all links +\RequirePackage{color} +\definecolor{link}{rgb}{0.45,0.51,0.67} +\hypersetup{ + colorlinks,% + citecolor=link,% + filecolor=link,% + linkcolor=link,% + urlcolor=link +} + +% Give the text a little room to breath +\setlength{\parskip}{3pt} +\RequirePackage{setspace} +\setstretch{1.05} + +% Issue and article metadata --------------------------------------------------- + +% Basic front matter information about the issue: volume, number, and +% date. + +\newcommand{\volume}[1]{\def\RJ@volume{#1}} +\newcommand{\volnumber}[1]{\def\RJ@number{#1}} +\renewcommand{\month}[1]{\def\RJ@month{#1}} +\renewcommand{\year}[1]{\def\RJ@year{#1}} + + +% Individual articles correspond to +% chapters, and are contained in |article| environments. This makes it +% easy to have figures counted within articles and hence hyperlinked +% correctly. + +% An article has an author, a title, and optionally a subtitle. We use +% the obvious commands for specifying these. Articles will be put in certain +% journal sections, named by \sectionhead. + +\newcommand {\sectionhead} [1]{\def\RJ@sectionhead{#1}} +\renewcommand{\author} [1]{\def\RJ@author{#1}} +\renewcommand{\title} [1]{\def\RJ@title{#1}} +\newcommand {\subtitle} [1]{\def\RJ@subtitle{#1}} + +% Control appearance of titles: make slightly smaller than usual, and +% suppress section numbering. 
See http://tex.stackexchange.com/questions/69749 +% for why we don't use \setcounter{secnumdepth}{-1} + +\usepackage[medium]{titlesec} +\usepackage{titletoc} +\titleformat{\section} {\normalfont\large\bfseries}{\arabic{section}}{1em}{} +\titleformat{\subsection}{\normalfont\normalsize\bfseries}{\arabic{section}.\arabic{subsection}}{0.5em}{} +\titlecontents{chapter} [0em]{}{}{}{\titlerule*[1em]{.}\contentspage} + +% Article layout --------------------------------------------------------------- + +% Environment |article| clears the article header information at its beginning. +% We use |\FloatBarrier| from the placeins package to keep floats within +% the article. +\RequirePackage{placeins} +\newenvironment{article}{\author{}\title{}\subtitle{}\FloatBarrier}{\FloatBarrier} + +% Refereed articles should have an abstract, so we redefine |\abstract| to +% give the desired style + +\renewcommand{\abstract}[1]{\noindent\textbf{Abstract} #1} +\renewenvironment{abstract}{\noindent\textbf{Abstract}~}{} + +% The real work is done by a redefined version of |\maketitle|. Note +% that even though we do not want chapters (articles) numbered, we +% need to increment the chapter counter, so that figures get correct +% labelling. + +\renewcommand{\maketitle}{% +\noindent + \chapter{\RJ@title}\refstepcounter{chapter} + \ifx\empty\RJ@subtitle + \else + \noindent\textbf{\RJ@subtitle} + \par\nobreak\addvspace{\baselineskip} + \fi + \ifx\empty\RJ@author + \else + \noindent\textit{\RJ@author} + \par\nobreak\addvspace{\baselineskip} + \fi + \@afterindentfalse\@nobreaktrue\@afterheading +} + +% Now for some ugly redefinitions. We do not want articles to start a +% new page. 
(Actually, we do, but this is handled via explicit +% \newpage +% +% The name@of@eq is a hack to get hyperlinks to equations to work +% within each article, even though there may be multiple eq.(1) +% \begin{macrocode} +\renewcommand\chapter{\secdef\RJ@chapter\@schapter} +\providecommand{\nohyphens}{% + \hyphenpenalty=10000\exhyphenpenalty=10000\relax} +\newcommand{\RJ@chapter}{% + \edef\name@of@eq{equation.\@arabic{\c@chapter}}% + \renewcommand{\@seccntformat}[1]{}% + \@startsection{chapter}{0}{0mm}{% + -2\baselineskip \@plus -\baselineskip \@minus -.2ex}{\p@}{% + \phantomsection\normalfont\huge\bfseries\raggedright}} + +% Book reviews should appear as sections in the text and in the pdf bookmarks, +% however we wish them to appear as chapters in the TOC. Thus we define an +% alternative to |\maketitle| for reviews. +\newcommand{\review}[1]{ + \pdfbookmark[1]{#1}{#1} + \section*{#1} + \addtocontents{toc}{\protect\contentsline{chapter}{#1}{\thepage}{#1.1}} +} + +% We want bibliographies as starred sections within articles. +% +\RequirePackage[sectionbib,round]{natbib} +\bibliographystyle{abbrvnat} +\renewcommand{\bibsection}{\section*{References}} + +% Equations, figures and tables are counted within articles, but we do +% not show the article number. For equations it becomes a bit messy to avoid +% having hyperref getting it wrong. + +% \numberwithin{equation}{chapter} +\renewcommand{\theequation}{\@arabic\c@equation} +\renewcommand{\thefigure}{\@arabic\c@figure} +\renewcommand{\thetable}{\@arabic\c@table} + +% Issue layout ----------------------------------------------------------------- + +% Need to provide our own version of |\tableofcontents|. We use the +% tikz package to get the rounded rectangle. Notice that |\section*| +% is really the same as |\chapter*|. 
+\renewcommand{\contentsname}{Contents} +\renewcommand\tableofcontents{% + \vspace{1cm} + \section*{\contentsname} + { \@starttoc{toc} } +} + +\renewcommand{\titlepage}{% + \thispagestyle{empty} + \hypersetup{ + pdftitle={The R Journal Volume \RJ@volume/\RJ@number, \RJ@month \RJ@year},% + pdfauthor={R Foundation for Statistical Computing},% + } + \noindent + \begin{center} + \fontsize{50pt}{50pt}\selectfont + The \raisebox{-8pt}{\includegraphics[height=77pt]{Rlogo-5}}\hspace{10pt} + Journal + + \end{center} + {\large \hfill Volume \RJ@volume/\RJ@number, \RJ@month{} \RJ@year \quad} + + \rule{\textwidth}{1pt} + \begin{center} + {\Large A peer-reviewed, open-access publication of the \\ + R Foundation for Statistical Computing} + \end{center} + + % And finally, put in the TOC box. Note the way |tocdepth| is adjusted + % before and after producing the TOC: thus, we can ensure that only + % articles show up in the printed TOC, but that in the PDF version, + % bookmarks are created for sections and subsections as well (provided + % that the non-starred forms are used). + \setcounter{tocdepth}{0} + \tableofcontents + \setcounter{tocdepth}{2} + \clearpage +} + +% Text formatting -------------------------------------------------------------- + +\newcommand{\R}{R} +\newcommand{\address}[1]{\addvspace{\baselineskip}\noindent\emph{#1}} +\newcommand{\email}[1]{\href{mailto:#1}{\normalfont\texttt{#1}}} + +% Simple font selection is not good enough. For example, |\texttt{--}| +% gives `\texttt{--}', i.e., an endash in typewriter font. Hence, we +% need to turn off ligatures, which currently only happens for commands +% |\code| and |\samp| and the ones derived from them. Hyphenation is +% another issue; it should really be turned off inside |\samp|. And +% most importantly, \LaTeX{} special characters are a nightmare. E.g., +% one needs |\~{}| to produce a tilde in a file name marked by |\file|. 
+% Perhaps a few years ago, most users would have agreed that this may be +% unfortunate but should not be changed to ensure consistency. But with +% the advent of the WWW and the need for getting `|~|' and `|#|' into +% URLs, commands which only treat the escape and grouping characters +% specially have gained acceptance + +\DeclareRobustCommand\code{\bgroup\@noligs\@codex} +\def\@codex#1{\texorpdfstring% +{{\normalfont\ttfamily\hyphenchar\font=-1 #1}}% +{#1}\egroup} +\newcommand{\kbd}[1]{{\normalfont\texttt{#1}}} +\newcommand{\key}[1]{{\normalfont\texttt{\uppercase{#1}}}} +\DeclareRobustCommand\samp{`\bgroup\@noligs\@sampx} +\def\@sampx#1{{\normalfont\texttt{#1}}\egroup'} +\newcommand{\var}[1]{{\normalfont\textsl{#1}}} +\let\env=\code +\newcommand{\file}[1]{{`\normalfont\textsf{#1}'}} +\let\command=\code +\let\option=\samp +\newcommand{\dfn}[1]{{\normalfont\textsl{#1}}} +% \acronym is effectively disabled since not used consistently +\newcommand{\acronym}[1]{#1} +\newcommand{\strong}[1]{\texorpdfstring% +{{\normalfont\fontseries{b}\selectfont #1}}% +{#1}} +\let\pkg=\strong +\newcommand{\CRANpkg}[1]{\href{https://CRAN.R-project.org/package=#1}{\pkg{#1}}}% +\let\cpkg=\CRANpkg +\newcommand{\ctv}[1]{\href{https://CRAN.R-project.org/view=#1}{\emph{#1}}} +\newcommand{\BIOpkg}[1]{\href{https://www.bioconductor.org/packages/release/bioc/html/#1.html}{\pkg{#1}}} + +% Example environments --------------------------------------------------------- +\RequirePackage{fancyvrb} +\RequirePackage{alltt} + +\DefineVerbatimEnvironment{example}{Verbatim}{} +\renewenvironment{example*}{\begin{alltt}}{\end{alltt}} + +% Support for output from Sweave, and generic session style code +% These used to have fontshape=sl for Sinput/Scode/Sin, but pslatex +% won't use a condensed font in that case. 
+ +% Update (2015-05-28 by DS): remove fontsize=\small to match example environment + +\DefineVerbatimEnvironment{Sinput}{Verbatim}{} +\DefineVerbatimEnvironment{Soutput}{Verbatim}{} +\DefineVerbatimEnvironment{Scode}{Verbatim}{} +\DefineVerbatimEnvironment{Sin}{Verbatim}{} +\DefineVerbatimEnvironment{Sout}{Verbatim}{} +\newenvironment{Schunk}{}{} + +% Mathematics ------------------------------------------------------------------ + +% The implementation of |\operatorname| is similar to the mechanism +% \LaTeXe{} uses for functions like sin and cos, and simpler than the +% one of \AmSLaTeX{}. We use |\providecommand| for the definition in +% order to keep the one of the \pkg{amstex} if this package has +% already been loaded. +% \begin{macrocode} +\providecommand{\operatorname}[1]{% + \mathop{\operator@font#1}\nolimits} +\RequirePackage{amsfonts} + +\renewcommand{\P}{% + \mathop{\operator@font I\hspace{-1.5pt}P\hspace{.13pt}}} +\newcommand{\E}{% + \mathop{\operator@font I\hspace{-1.5pt}E\hspace{.13pt}}} +\newcommand{\VAR}{\operatorname{var}} +\newcommand{\COV}{\operatorname{cov}} +\newcommand{\COR}{\operatorname{cor}} + +% Figures ---------------------------------------------------------------------- + +% For use with pandoc > 3.2.1 +\newsavebox\pandoc@box +\newcommand*\pandocbounded[1]{% scales image to fit in text height/width + \sbox\pandoc@box{#1}% + \Gscale@div\@tempa{\textheight}{\dimexpr\ht\pandoc@box+\dp\pandoc@box\relax}% + \Gscale@div\@tempb{\linewidth}{\wd\pandoc@box}% + \ifdim\@tempb\p@<\@tempa\p@\let\@tempa\@tempb\fi% select the smaller of both + \ifdim\@tempa\p@<\p@\scalebox{\@tempa}{\usebox\pandoc@box}% + \else\usebox{\pandoc@box}% + \fi% +} + +\RequirePackage[font=small,labelfont=bf]{caption} + +% Wide environments for figures and tables ------------------------------------- +\RequirePackage{environ} + +% An easy way to make a figure span the full width of the page +\NewEnviron{widefigure}[1][]{ +\begin{figure}[#1] +\advance\leftskip-2cm 
+\begin{minipage}{\dimexpr\textwidth+4cm\relax}% + \captionsetup{margin=2cm} + \BODY +\end{minipage}% +\end{figure} +} + +\NewEnviron{widetable}[1][]{ +\begin{table}[#1] +\advance\leftskip-2cm +\begin{minipage}{\dimexpr\textwidth+4cm\relax}% + \captionsetup{margin=2cm} + \BODY +\end{minipage}% +\end{table} +} diff --git a/_articles/RJ-2025-033/RJwrapper.tex b/_articles/RJ-2025-033/RJwrapper.tex new file mode 100644 index 0000000000..0b69174306 --- /dev/null +++ b/_articles/RJ-2025-033/RJwrapper.tex @@ -0,0 +1,36 @@ +\documentclass[a4paper]{report} +\usepackage[utf8]{inputenc} +\usepackage[T1]{fontenc} +\usepackage{RJournal} +\usepackage{amsmath,amssymb,array} +\usepackage{booktabs} + +\usepackage{multirow} +\usepackage[]{algorithm, algorithmic} + + +\newcommand{\argmax}{\mathop{\mathrm{argmax}}\limits} +\newcommand{\argmin}{\mathop{\mathrm{argmin}}\limits} +\newcommand{\minF}{\mathop{\mathrm{min}}\limits} +\newcommand{\maxF}{\mathop{\mathrm{max}}\limits} +\newcommand{\aveF}{\mathop{\mathrm{ave}}\limits} + +\newcommand{\RR}[1]{\texttt{#1}} + + +\begin{document} + +%% do not edit, for illustration only +\sectionhead{Contributed research article} +\volume{17} +\volnumber{4} +\year{2025} +\month{December} +\setcounter{page}{20} + +%% replace RJtemplate with your article +\begin{article} + \input{wang-xiao-mandal.tex} +\end{article} + +\end{document} diff --git a/_articles/RJ-2025-033/wang-xiao-mandal.R b/_articles/RJ-2025-033/wang-xiao-mandal.R new file mode 100644 index 0000000000..c28b51b1a0 --- /dev/null +++ b/_articles/RJ-2025-033/wang-xiao-mandal.R @@ -0,0 +1,166 @@ +library(LHD) +X = rLHD(n = 5, k = 3); X #This generates a 5 by 3 random LHD, denoted as X + +phi_p(X) #The maximin L1-distance criterion. +phi_p(X, p = 10, q = 2) #The maximin L2-distance criterion. +MaxProCriterion(X) #The maximum projection criterion. +AvgAbsCor(X) #The average absolute correlation criterion. +MaxAbsCor(X) #The maximum absolute correlation criterion. 
+ + +#Generate a 5 by 3 maximin distance LHD by the SA function. +try.SA = SA(n = 5, k = 3); try.SA + +phi_p(try.SA) #\phi_p is smaller than that of a random LHD (0.3336608). + + +#Similarly, generations of 5 by 3 maximin distance LHD by the SA2008, LaPSO and GA functions. +try.SA2008 = SA2008(n = 5, k = 3) +try.LaPSO = LaPSO(n = 5, k = 3) +try.GA = GA(n = 5, k = 3) + +#Generate an OA(9,2,3,2), an orthogonal array with 9 runs, 2 factors, 3 levels, and 2 strength. +OA = matrix(c(rep(1:3, each = 3), rep(1:3, times = 3)), + ncol = 2, nrow = 9, byrow = FALSE) +#Generates a maximin distance LHD with the same design size as the input OA +#by the orthogonal-array-based simulated annealing algorithm. +try.OASA = OASA(OA) +OA; try.OASA + +#Below try.SA is a 5 by 3 maximin distance LHD generated by the SA with 30 iterations (N = 30). +#The temperature starts at 10 (T0 = 10) and decreases 10% (rate = 0.1) each time. +#The minimium temperature allowed is 1 (Tmin = 1) and the maximum perturbations that +#the algorithm will try without improvements is 5 (Imax = 5). The optimality criterion +#used is maximin distance criterion (OC = "phi_p") with p = 15 and q = 1, and the +#maximum CPU time is 5 minutes (maxtime = 5). +try.SA = SA(n = 5, k = 3, N = 30, T0 = 10, rate = 0.1, Tmin = 1, Imax = 5, OC = "phi_p", + p = 15, q = 1, maxtime = 5); try.SA + +phi_p(try.SA) + + +#Below try.SA2008 is a 5 by 3 maximin distance LHD generated by SA with +#the multi-objective optimization approach. The input arguments are interpreted +#the same as the design try.SA above. +try.SA2008=SA2008(n=5,k=3,N=30,T0=10,rate=0.1,Tmin=1,Imax=5, + OC="phi_p",p=15,q=1,maxtime=5) + +#Below try.OASA is a 9 by 2 maximin distance LHD generated by the +#orthogonal-array-based simulated annealing algorithm with the input +#OA (defined previously), and the rest input arguments are interpreted the +#same as the design try.SA above. 
+try.OASA = OASA(OA, N = 30, T0 = 10, rate = 0.1, Tmin = 1, Imax = 5, + OC = "phi_p", p = 15, q = 1, maxtime = 5) + +#Below try.LaPSO is a 5 by 3 maximum projection LHD generated by the particle swarm +#optimization algorithm with 20 particles (m = 20) and 30 iterations (N = 30). +#Zero (or two) elements in any column of the current particle should be the same as +#the elements of corresponding column from personal best (or global best), because +#of SameNumP = 0 (or SameNumG = 2). +#The probability of exchanging two randomly selected elements is 0.5 (p0 = 0.5). +#The optimality criterion is maximum projection criterion (OC = "MaxProCriterion"). +#The maximum CPU time is 5 minutes (maxtime = 5). +try.LaPSO = LaPSO(n = 5, k = 3, m = 20, N = 30, SameNumP = 0, SameNumG = 2, + p0 = 0.5, OC = "MaxProCriterion", maxtime = 5); try.LaPSO + +#Recall the value is 0.5375482 from the random LHD in Section 2. +MaxProCriterion(try.LaPSO) + + +#Below try.GA is a 5 by 3 OLHD generated by the genetic algorithm with the +#population size 20 (m = 20), number of iterations 30 (N = 30), mutation +#probability 0.5 (pmut = 0.5), maximum absolute correlation criterion +#(OC = "MaxAbsCor"), and maximum CPU time 5 minutes (maxtime = 5). +try.GA = GA(n = 5, k = 3, m = 20, N = 30, pmut = 0.5, OC = "MaxAbsCor", + maxtime = 5); try.GA + +#Recall the value is 0.9 from the random LHD in Section 2. 
+MaxAbsCor(try.GA)
+
+#Make sure both packages are properly installed before loading them.
+library(LHD)
+library(MaxPro)
+
+count = 0 #Define a variable for counting purposes
+
+k = 5 #Factor size 5
+n = 10*k #Run size = 10*factor size
+
+#With 500 iterations for both algorithms, the loop below counts
+#how many times the GA from the LHD package outperforms the algorithm
+#from the MaxPro package out of 100 replications.
+
+for (i in 1:100) {
+
+  LHD = LHD::GA(n = n, k = k, m = 100, N = 500)
+  MaxPro = MaxPro::MaxProLHD(n = n, p = k, total_iter = 500)$Design
+
+  #MaxPro * n + 0.5 applies the transformation mentioned in Section 2
+  #to revert the scaling.
+  Result.LHD = LHD::MaxProCriterion(LHD)
+  Result.MaxPro = LHD::MaxProCriterion(MaxPro * n + 0.5)
+
+  if (Result.LHD < Result.MaxPro) {count = count + 1}
+
+}
+
+count
+
+#FastMmLHD(n = 8, k = 8) generates an optimal 8 by 8 maximin L_1 distance LHD.
+try.FastMm = FastMmLHD(n = 8, k = 8); try.FastMm
+
+#OA2LHD(OA) expands an input OA to an LHD of the same run size.
+try.OA2LHD = OA2LHD(OA)
+OA; try.OA2LHD
+
+#OLHD.Y1998(m = 3) generates a 9 by 4 orthogonal LHD.
+#Note that 2^m+1 = 9 and 2*m-2 = 4.
+try.Y1998 = OLHD.Y1998(m = 3); try.Y1998
+
+MaxAbsCor(try.Y1998) #column-wise correlations are 0.
+
+#OLHD.C2007(m = 4) generates a 17 by 7 orthogonal LHD.
+#Note that 2^m+1 = 17 and 4 + choose(4-1, 2) = 7.
+try.C2007 = OLHD.C2007(m = 4); dim(try.C2007)
+
+MaxAbsCor(try.C2007) #column-wise correlations are 0.
+
+
+#OLHD.S2010(C = 3, r = 3, type = "odd") generates a 49 by 8 orthogonal LHD.
+#Note that 3*2^4+1 = 49 and 2^3 = 8.
+dim(OLHD.S2010(C = 3, r = 3, type = "odd"))
+
+MaxAbsCor(OLHD.S2010(C = 3, r = 3, type = "odd")) #column-wise correlations are 0.
+
+
+#OLHD.S2010(C = 3, r = 3, type = "even") generates a 48 by 8 orthogonal LHD.
+#Note that 3*2^4 = 48 and 2^3 = 8.
+dim(OLHD.S2010(C = 3, r = 3, type = "even"))
+
+MaxAbsCor(OLHD.S2010(C = 3, r = 3, type = "even")) #column-wise correlations are 0.
+
+
+#Create a 5 by 2 OLHD.
+OLHD = OLHD.C2007(m = 2)
+
+#Create an OA(25, 6, 5, 2).
+OA = matrix(c(2,2,2,2,2,1,2,1,5,4,3,5,3,2,1,5,4,5,1,5,4,3,2,5,4,1,3,5,2,3,
+              1,2,3,4,5,2,1,3,5,2,4,3,1,1,1,1,1,1,4,3,2,1,5,5,5,5,5,5,5,1,4,4,4,4,4,1,
+              3,1,4,2,5,4,3,3,3,3,3,1,3,5,2,4,1,3,3,4,5,1,2,2,5,4,3,2,1,5,2,3,4,5,1,2,
+              2,5,3,1,4,4,1,4,2,5,3,4,4,2,5,3,1,4,2,4,1,3,5,3,5,3,1,4,2,4,5,2,4,1,3,3,
+              5,1,2,3,4,2,4,5,1,2,3,2), ncol = 6, nrow = 25, byrow = TRUE)
+
+#OLHD.L2009(OLHD, OA) generates a 25 by 12 orthogonal LHD.
+#Note that n = 5 so n^2 = 25, and p = 2 and f = 3 so 2fp = 12.
+dim(OLHD.L2009(OLHD, OA))
+
+MaxAbsCor(OLHD.L2009(OLHD, OA)) #column-wise correlations are 0.
+
+
+#OLHD.B2001(n = 11, k = 5) generates an 11 by 5 orthogonal LHD.
+dim(OLHD.B2001(n = 11, k = 5))
+
+
+
+
+
diff --git a/_articles/RJ-2025-033/wang-xiao-mandal.bib b/_articles/RJ-2025-033/wang-xiao-mandal.bib
new file mode 100644
index 0000000000..6fe166255b
--- /dev/null
+++ b/_articles/RJ-2025-033/wang-xiao-mandal.bib
@@ -0,0 +1,797 @@
+@article{ye1998orthogonal,
+  title={Orthogonal column {L}atin hypercubes and their application in computer experiments},
+  author={Ye, K. Qian},
+  journal={Journal of the American Statistical Association},
+  volume={93},
+  number={444},
+  pages={1430--1439},
+  year={1998},
+  publisher={Taylor \& Francis}
+}
+@article{cioppa2007efficient,
+  title={Efficient nearly orthogonal and space-filling {L}atin hypercubes},
+  author={Cioppa, Thomas M and Lucas, Thomas W},
+  journal={Technometrics},
+  volume={49},
+  number={1},
+  pages={45--55},
+  year={2007},
+  publisher={Taylor \& Francis}
+}
+@article{tang1993orthogonal,
+  title={Orthogonal array-based {L}atin hypercubes},
+  author={Tang, Boxin},
+  journal={Journal of the American Statistical Association},
+  volume={88},
+  number={424},
+  pages={1392--1397},
+  year={1993},
+  publisher={Taylor \& Francis}
+}
+@article{lin2009construction,
+  title={Construction of orthogonal and nearly orthogonal {L}atin hypercubes},
+  author={Lin, C.
Devon and Mukerjee, Rahul and Tang, Boxin}, + journal={Biometrika}, + volume={96}, + number={1}, + pages={243--247}, + year={2009}, + publisher={Oxford University Press} +} +@article{sun2010construction, + title={Construction of orthogonal {L}atin hypercube designs with flexible run sizes}, + author={Sun, Fasheng and Liu, Min-Qian and Lin, Dennis K. J.}, + journal={Journal of Statistical Planning and Inference}, + volume={140}, + number={11}, + pages={3236--3242}, + year={2010}, + publisher={Elsevier} +} +@article{sun2009construction, + title={Construction of orthogonal {L}atin hypercube designs}, + author={Sun, Fasheng and Liu, Min-Qian and Lin, Dennis K. J.}, + journal={Biometrika}, + volume={96}, + number={4}, + pages={971--974}, + year={2009}, + publisher={Oxford University Press} +} +@article{williams1949experimental, + title={Experimental designs balanced for the estimation of residual effects of treatments}, + author={Williams, EJ}, + journal={Australian Journal of Chemistry}, + volume={2}, + number={2}, + pages={149--168}, + year={1949}, + publisher={CSIRO} +} +@article{butler2001optimal, + title={Optimal and orthogonal {L}atin hypercube designs for computer experiments}, + author={Butler, Neil A}, + journal={Biometrika}, + volume={88}, + number={3}, + pages={847--857}, + year={2001}, + publisher={Oxford University Press} +} +@article{johnson1990minimax, + title={Minimax and maximin distance designs}, + author={Johnson, Mark E and Moore, Leslie M and Ylvisaker, Donald}, + journal={Journal of statistical planning and inference}, + volume={26}, + number={2}, + pages={131--148}, + year={1990}, + publisher={Elsevier} +} +@article{morris1995exploratory, + title={Exploratory designs for computational experiments}, + author={Morris, Max D and Mitchell, Toby J}, + journal={Journal of statistical planning and inference}, + volume={43}, + number={3}, + pages={381--402}, + year={1995}, + publisher={Elsevier} +} +@article{hickernell1998generalized, + title={A 
generalized discrepancy and quadrature error bound}, + author={Hickernell, Fred}, + journal={Mathematics of computation}, + volume={67}, + number={221}, + pages={299--322}, + year={1998} +} +@article{joseph2008orthogonal, + title={Orthogonal-maximin {L}atin hypercube designs}, + author={Joseph, V Roshan and Hung, Ying}, + journal={Statistica Sinica}, + pages={171--186}, + year={2008}, + publisher={JSTOR} +} +@article{jin2005efficient, + title={An efficient algorithm for constructing optimal design of computer experiments}, + author={Jin, Ruichen and Chen, Wei and Sudjianto, Agus}, + journal={Journal of statistical planning and inference}, + volume={134}, + number={1}, + pages={268--287}, + year={2005}, + publisher={Elsevier} +} +@article{wang2018optimal, + title={Optimal maximin ${L}_{1}$-distance Latin hypercube designs based on good lattice point designs}, + author={Wang, Lin and Xiao, Qian and Xu, Hongquan}, + journal={The Annals of Statistics}, + volume={46}, + number={6B}, + pages={3741--3766}, + year={2018}, + publisher={Institute of Mathematical Statistics} +} +@article{xiao2018construction, + title={Construction of maximin distance designs via level permutation and expansion}, + author={Xiao, Qian and Xu, Hongquan}, + journal={Statistica Sinica}, + volume={28}, + number={3}, + pages={1395--1414}, + year={2018}, + publisher={JSTOR} +} +@article{xiao2017construction, + title={Construction of maximin distance {L}atin squares and related {L}atin hypercube designs}, + author={Xiao, Qian and Xu, Hongquan}, + journal={Biometrika}, + volume={104}, + number={2}, + pages={455--464}, + year={2017}, + publisher={Oxford University Press} +} +@article{fang2002centered, + title={Centered ${L}_{2}$-discrepancy of random sampling and {L}atin hypercube design, and construction of uniform designs}, + author={Fang, Kai-Tai and Ma, Chang-Xing and Winker, Peter}, + journal={Mathematics of Computation}, + volume={71}, + number={237}, + pages={275--296}, + year={2002} +} 
+@inproceedings{korobov1959approximate, + title={The approximate computation of multiple integrals}, + author={Korobov, AN}, + booktitle={Dokl. Akad. Nauk SSSR}, + volume={124}, + pages={1207--1210}, + year={1959} +} +@article{zhou2015space, + title={Space-filling properties of good lattice point sets}, + author={Zhou, Yongdao and Xu, Hongquan}, + journal={Biometrika}, + volume={102}, + number={4}, + pages={959--966}, + year={2015}, + publisher={Oxford University Press} +} +@article{leary2003optimal, + title={Optimal orthogonal-array-based {L}atin hypercubes}, + author={Leary, Stephen and Bhaskar, Atul and Keane, Andy}, + journal={Journal of Applied Statistics}, + volume={30}, + number={5}, + pages={585--598}, + year={2003}, + publisher={Taylor \& Francis} +} +@article{ba2015optimal, + title={Optimal sliced {L}atin hypercube designs}, + author={Ba, Shan and Myers, William R and Brenneman, William A}, + journal={Technometrics}, + volume={57}, + number={4}, + pages={479--487}, + year={2015}, + publisher={Taylor \& Francis} +} +@article{qian2012sliced, + title={Sliced {L}atin hypercube designs}, + author={Qian, Peter ZG}, + journal={Journal of the American Statistical Association}, + volume={107}, + number={497}, + pages={393--399}, + year={2012}, + publisher={Taylor \& Francis Group} +} +@article{chen2013optimizing, + title={Optimizing {L}atin hypercube designs by particle swarm}, + author={Chen, Ray-Bing and Hsieh, Dai-Ni and Hung, Ying and Wang, Weichung}, + journal={Statistics and computing}, + volume={23}, + number={5}, + pages={663--676}, + year={2013}, + publisher={Springer} +} +@article{liefvendahl2006study, + title={A study on algorithms for optimization of {L}atin hypercubes}, + author={Liefvendahl, Mattias and Stocki, Rafa{\l}}, + journal={Journal of statistical planning and inference}, + volume={136}, + number={9}, + pages={3231--3247}, + year={2006}, + publisher={Elsevier} +} +@article{holland1975efficient, + title={An efficient genetic algorithm for the 
traveling salesman problem}, + author={Holland, JH}, + journal={European Journal of Operational Research}, + volume={145}, + pages={606--617}, + year={1975} +} +@article{joseph2015maximum, + title={Maximum projection designs for computer experiments}, + author={Joseph, V Roshan and Gul, Evren and Ba, Shan}, + journal={Biometrika}, + volume={102}, + number={2}, + pages={371--380}, + year={2015}, + publisher={Oxford University Press} +} +@Manual{team2013r, + title = {{R}: A Language and Environment for Statistical Computing}, + author = {{R Core Team}}, + organization = {R Foundation for Statistical Computing}, + address = {Vienna, Austria}, + year = {2013}, + url = {http://www.R-project.org/} +} +@Manual{MaxPro, + title = {MaxPro: Maximum Projection Designs}, + author = {Shan Ba and V. Roshan Joseph}, + year = {2018}, + note = {R package version 4.1-2}, + url = {https://CRAN.R-project.org/package=MaxPro} +} +@Manual{LHD, + title = {LHD: Latin Hypercube Designs (LHDs)}, + author = {Hongzhi Wang and Qian Xiao and Abhyuday Mandal}, + year = {2025}, + note = {R package version 1.4.1}, + url = {https://CRAN.R-project.org/package=LHD} +} +@article{sacks1989designs, + title={Designs for computer experiments}, + author={Sacks, Jerome and Schiller, Susannah B and Welch, William J}, + journal={Technometrics}, + volume={31}, + number={1}, + pages={41--47}, + year={1989}, + publisher={Taylor \& Francis Group} +} +@book{santner2003design, + title={The design and analysis of computer experiments}, + author={Santner, Thomas J and Williams, Brian J and Notz, William and Williams, Brain J}, + volume={1}, + year={2003}, + publisher={Springer} +} +@book{fang2005design, + title={Design and modeling for computer experiments}, + author={Fang, Kai-Tai and Li, Runze and Sudjianto, Agus}, + year={2005}, + publisher={CRC press} +} +@article{mckay1979comparison, + title={Comparison of three methods for selecting values of input variables in the analysis of output from a computer code}, + 
author={McKay, Michael D and Beckman, Richard J and Conover, William J}, + journal={Technometrics}, + volume={21}, + number={2}, + pages={239--245}, + year={1979}, + publisher={Taylor \& Francis} +} +@article{kenny2000algorithmic, + title={Algorithmic construction of optimal symmetric {L}atin hypercube designs}, + author={Ye, K. Qian and Li, William and Sudjianto, Agus}, + journal={Journal of statistical planning and inference}, + volume={90}, + number={1}, + pages={145--159}, + year={2000}, + publisher={Elsevier} +} +@article{grosso2009finding, + title={Finding maximin {L}atin hypercube designs by iterated local search heuristics}, + author={Grosso, Andrea and Jamali, ARMJU and Locatelli, Marco}, + journal={European Journal of Operational Research}, + volume={197}, + number={2}, + pages={541--547}, + year={2009}, + publisher={Elsevier} +} +@article{viana2010algorithm, + title={An algorithm for fast optimal {L}atin hypercube design of experiments}, + author={Viana, Felipe AC and Venter, Gerhard and Balabanov, Vladimir}, + journal={International journal for numerical methods in engineering}, + volume={82}, + number={2}, + pages={135--156}, + year={2010}, + publisher={Wiley Online Library} +} +@article{van2007maximin, + title={Maximin {L}atin hypercube designs in two dimensions}, + author={Van Dam, Edwin R and Husslage, Bart and Den Hertog, Dick and Melissen, Hans}, + journal={Operations Research}, + volume={55}, + number={1}, + pages={158--169}, + year={2007}, + publisher={INFORMS} +} +@article{van2009bounds, + title={Bounds for maximin {L}atin hypercube designs}, + author={van Dam, Edwin R and Rennen, Gijs and Husslage, Bart}, + journal={Operations Research}, + volume={57}, + number={3}, + pages={595--608}, + year={2009}, + publisher={INFORMS} +} +@article{steinberg2006construction, + title={A construction method for orthogonal {L}atin hypercube designs}, + author={Steinberg, David M and Lin, Dennis K. 
J.},
+  journal={Biometrika},
+  volume={93},
+  number={2},
+  pages={279--288},
+  year={2006},
+  publisher={Oxford University Press}
+}
+@article{yang2012construction,
+  title={Construction of orthogonal and nearly orthogonal {L}atin hypercube designs from orthogonal designs},
+  author={Yang, Jinyu and Liu, Min-Qian},
+  journal={Statistica Sinica},
+  pages={433--442},
+  year={2012},
+  publisher={JSTOR}
+}
+@article{georgiou2014some,
+  title={Some classes of orthogonal {L}atin hypercube designs},
+  author={Georgiou, Stelios D and Efthimiou, Ifigenia},
+  journal={Statistica Sinica},
+  volume={24},
+  number={1},
+  pages={101--120},
+  year={2014},
+  publisher={JSTOR}
+}
+@article{sun2017general,
+  title={A general rotation method for orthogonal {L}atin hypercubes},
+  author={Sun, Fasheng and Tang, Boxin},
+  journal={Biometrika},
+  volume={104},
+  number={2},
+  pages={465--472},
+  year={2017},
+  publisher={Oxford University Press}
+}
+@article{georgiou2009orthogonal,
+  title={Orthogonal {L}atin hypercube designs from generalized orthogonal designs},
+  author={Georgiou, Stelios D},
+  journal={Journal of Statistical Planning and Inference},
+  volume={139},
+  number={4},
+  pages={1530--1540},
+  year={2009},
+  publisher={Elsevier}
+}
+@inproceedings{kennedy1995particle,
+  title={Particle swarm optimization},
+  author={Kennedy, James and Eberhart, Russell},
+  booktitle={Proceedings of {ICNN}'95-International Conference on Neural Networks},
+  volume={4},
+  pages={1942--1948},
+  year={1995},
+  organization={IEEE}
+}
+@book{holland1992adaptation,
+  title={Adaptation in natural and artificial systems: an introductory analysis with applications to biology, control, and artificial intelligence},
+  author={Holland, John Henry and others},
+  year={1992},
+  publisher={MIT Press}
+}
+@book{goldberg1989genetic,
+  title={Genetic Algorithms in Search, Optimization, and Machine Learning},
+  author={Goldberg, David E},
+  year={1989},
+  publisher={Addison-Wesley}
+}
+@article{chen2015minimax,
+  title={Minimax optimal designs via particle swarm optimization methods},
+  author={Chen, Ray-Bing and Chang, Shin-Perng and Wang, Weichung and Tung, Heng-Chih and Wong, Weng Kee},
+  journal={Statistics and Computing},
+  volume={25},
+  number={5},
+  pages={975--988},
+  year={2015},
+  publisher={Springer}
+}
+@article{wong2015modified,
+  title={A modified particle swarm optimization technique for finding optimal designs for mixture models},
+  author={Wong, Weng Kee and Chen, Ray-Bing and Huang, Chien-Chih and Wang, Weichung},
+  journal={PLoS One},
+  volume={10},
+  number={6},
+  pages={e0124720},
+  year={2015},
+  publisher={Public Library of Science}
+}
+@article{kirkpatrick1983optimization,
+  title={Optimization by simulated annealing},
+  author={Kirkpatrick, Scott and Gelatt, C Daniel and Vecchi, Mario P},
+  journal={Science},
+  volume={220},
+  number={4598},
+  pages={671--680},
+  year={1983},
+  publisher={American Association for the Advancement of Science}
+}
+@article{loeppky2009choosing,
+  title={Choosing the sample size of a computer experiment: A practical guide},
+  author={Loeppky, Jason L and Sacks, Jerome and Welch, William J},
+  journal={Technometrics},
+  volume={51},
+  number={4},
+  pages={366--376},
+  year={2009},
+  publisher={Taylor \& Francis}
+}
+@article{chapman1994arctic,
+  title={Arctic sea ice variability: Model sensitivities and a multidecadal simulation},
+  author={Chapman, William L and Welch, William J and Bowman, Kenneth P and Sacks, Jerome and Walsh, John E},
+  journal={Journal of Geophysical Research: Oceans},
+  volume={99},
+  number={C1},
+  pages={919--935},
+  year={1994},
+  publisher={Wiley Online Library}
+}
+@article{jones1998efficient,
+  title={Efficient global optimization of expensive black-box functions},
+  author={Jones, Donald R and Schonlau, Matthias and Welch, William J},
+  journal={Journal of Global Optimization},
+  volume={13},
+  number={4},
+  pages={455--492},
year={1998}, + publisher={Springer} +} +@article{deng2015design, + title={Design for computer experiments with qualitative and quantitative factors}, + author={Deng, Xinwei and Hung, Ying and Lin, C Devon}, + journal={Statistica Sinica}, + pages={1567--1581}, + year={2015}, + publisher={JSTOR} +} +@article{lin2019design, + title={Design of order-of-addition experiments}, + author={Peng, Jiayu and Mukerjee, Rahul and Lin, Dennis K. J.}, + journal={Biometrika}, + volume={106}, + number={3}, + pages={683--694}, + year={2019}, + publisher={Oxford University Press} +} +@article{lin1993new, + title={A new class of supersaturated designs}, + author={Lin, Dennis K. J.}, + journal={Technometrics}, + volume={35}, + number={1}, + pages={28--31}, + year={1993}, + publisher={Taylor \& Francis} +} +@book{dean1999design, + title={Design and analysis of experiments}, + author={Dean, Angela and Voss, Daniel and Dragulji{\'c}, Danel and others}, + year={2017}, + publisher={Springer International Publishing} +} +@article{stander1992cooperative, + title={Cooperative hunting in lions: the role of the individual}, + author={Stander, Philip E}, + journal={Behavioral ecology and sociobiology}, + volume={29}, + number={6}, + pages={445--454}, + year={1992}, + publisher={Springer} +} +@article{whitacre2011recent, + title={Recent trends indicate rapid growth of nature-inspired optimization in academia and industry}, + author={Whitacre, James M}, + journal={Computing}, + volume={93}, + number={2-4}, + pages={121--133}, + year={2011}, + publisher={Springer} +} +@misc{wang2021musings, + title={Musings about Constructions of Efficient Latin Hypercube Designs with Flexible Run-sizes}, + author={Hongzhi Wang and Qian Xiao and Abhyuday Mandal}, + year={2021}, + eprint={2010.09154}, + archivePrefix={arXiv}, + primaryClass={stat.ME} +} +@article{lin2015using, + title={Using genetic algorithms to design experiments: a review}, + author={Lin, C. 
Devon and Anderson-Cook, Christine M and Hamada, Michael S and Moore, Leslie M and Sitter, Randy R}, + journal={Quality and Reliability Engineering International}, + volume={31}, + number={2}, + pages={155--167}, + year={2015}, + publisher={Wiley Online Library} +} +@book{boyd2004convex, + title={Convex optimization}, + author={Boyd, Stephen P and Vandenberghe, Lieven}, + year={2004}, + publisher={Cambridge university press} +} +@article{byrd1987trust, + title={A trust region algorithm for nonlinearly constrained optimization}, + author={Byrd, Richard H and Schnabel, Robert B and Shultz, Gerald A}, + journal={SIAM Journal on Numerical Analysis}, + volume={24}, + number={5}, + pages={1152--1170}, + year={1987}, + publisher={SIAM} +} +@incollection{beni1993swarm, + title={Swarm intelligence in cellular robotic systems}, + author={Beni, Gerardo and Wang, Jing}, + booktitle={Robots and biological systems: towards a new bionics?}, + pages={703--712}, + year={1993}, + publisher={Springer} +} +@article{dorigo2006ant, + title={Ant colony optimization}, + author={Dorigo, Marco and Birattari, Mauro and Stutzle, Thomas}, + journal={IEEE computational intelligence magazine}, + volume={1}, + number={4}, + pages={28--39}, + year={2006}, + publisher={IEEE} +} +@incollection{yang2010new, + title={A new metaheuristic bat-inspired algorithm}, + author={Yang, Xin-She}, + booktitle={Nature inspired cooperative strategies for optimization (NICSO 2010)}, + pages={65--74}, + year={2010}, + publisher={Springer} +} +@inproceedings{basturk2006artificial, + title={An artificial bee colony ({ABC}) algorithm for numeric function optimization}, + author={Basturk, Bahriye}, + booktitle={IEEE Swarm Intelligence Symposium, Indianapolis, IN, {USA}, 2006}, + year={2006} +} +@inproceedings{yang2009cuckoo, + title={Cuckoo search via L{\'e}vy flights}, + author={Yang, Xin-She and Deb, Suash}, + booktitle={2009 World congress on nature \& biologically inspired computing ({NaBIC})}, + pages={210--214}, + 
year={2009}, + organization={Ieee} +} +@inproceedings{yang2009firefly, + title={Firefly algorithms for multimodal optimization}, + author={Yang, Xin-She}, + booktitle={International symposium on stochastic algorithms}, + pages={169--178}, + year={2009}, + organization={Springer} +} +@article{mirjalili2014grey, + title={Grey wolf optimizer}, + author={Mirjalili, Seyedali and Mirjalili, Seyed Mohammad and Lewis, Andrew}, + journal={Advances in engineering software}, + volume={69}, + pages={46--61}, + year={2014}, + publisher={Elsevier} +} +@article{garcia2008jumping, + title={Jumping frogs optimization: a new swarm method for discrete optimization}, + author={Garcia, F Javier Martinez and P{\'e}rez, Jos{\'e} A Moreno}, + journal={Documentos de Trabajo del DEIOC}, + volume={3}, + year={2008} +} +@book{yang2020nature, + title={Nature-inspired optimization algorithms}, + author={Yang, Xin-She}, + year={2020}, + publisher={Academic Press} +} +@incollection{kennedy2006swarm, + title={Swarm intelligence}, + author={Kennedy, James}, + booktitle={Handbook of nature-inspired and innovative computing}, + pages={187--219}, + year={2006}, + publisher={Springer} +} +@incollection{lourencco2019iterated, + title={Iterated local search: Framework and applications}, + author={Louren{\c{c}}o, Helena Ramalhinho and Martin, Olivier C and St{\"u}tzle, Thomas}, + booktitle={Handbook of metaheuristics}, + pages={129--168}, + year={2019}, + publisher={Springer} +} +@article{hansen2010variable, + title={Variable neighbourhood search: methods and applications}, + author={Hansen, Pierre and Mladenovi{\'c}, Nenad and P{\'e}rez, Jos{\'e} A Moreno}, + journal={Annals of Operations Research}, + volume={175}, + number={1}, + pages={367--407}, + year={2010}, + publisher={Springer} +} +@article{stokes2020using, + title={Using Differential Evolution to design optimal experiments}, + author={Stokes, Zack and Mandal, Abhyuday and Wong, Weng Kee}, + journal={Chemometrics and Intelligent Laboratory 
Systems}, + volume={199}, + pages={103955}, + year={2020}, + publisher={Elsevier} +} +@book{antony2014design, + title={Design of experiments for engineers and scientists}, + author={Antony, Jiju}, + year={2014}, + publisher={Elsevier} +} +@article{storn1997differential, + title={Differential evolution--a simple and efficient heuristic for global optimization over continuous spaces}, + author={Storn, Rainer and Price, Kenneth}, + journal={Journal of global optimization}, + volume={11}, + number={4}, + pages={341--359}, + year={1997}, + publisher={Springer} +} +@book{price2006differential, + title={Differential evolution: a practical approach to global optimization}, + author={Price, Kenneth and Storn, Rainer M and Lampinen, Jouni A}, + year={2006}, + publisher={Springer Science \& Business Media} +} +@article{emich1900explosive, + title={{\"U}ber explosive Gasgemenge}, + author={Emich, F}, + journal={Monatshefte f{\"u}r Chemie und verwandte Teile anderer Wissenschaften}, + volume={21}, + number={10}, + pages={1061--1078}, + year={1900}, + publisher={Springer} +} +@article{lin2019order, + title={Order-of-addition experiments: A review and some new thoughts}, + author={Lin, Dennis KJ and Peng, Jiayu}, + journal={Quality Engineering}, + volume={31}, + number={1}, + pages={49--59}, + year={2019}, + publisher={Taylor \& Francis} +} +@book{gramacy2020surrogates, + title = {Surrogates: {G}aussian Process Modeling, Design and Optimization for the Applied Sciences}, + author = {Robert B. 
Gramacy}, + publisher = {Chapman Hall/CRC}, + address = {Boca Raton, Florida}, + note = {\url{http://bobby.gramacy.com/surrogates/}}, + year = {2020} +} +@inproceedings{van1995design, + title={Design of experiments where the order of addition is important}, + author={Van Nostrand, RC}, + booktitle={{ASA} proceedings of the Section on Physical and Engineering Sciences}, + pages={155--160}, + year={1995}, + organization={American Statistical Association Alexandria, Virginia} +} +@article{voelkel2019design, + title={The design of order-of-addition experiments}, + author={Voelkel, Joseph G}, + journal={Journal of Quality Technology}, + volume={51}, + number={3}, + pages={230--241}, + year={2019}, + publisher={Taylor \& Francis} +} +@book{fedorov2013theory, + title={Theory of optimal experiments}, + author={Fedorov, Valerii Vadimovich}, + year={2013}, + publisher={Elsevier} +} +@article{shewry1987maximum, + title={Maximum entropy sampling}, + author={Shewry, Michael C and Wynn, Henry P}, + journal={Journal of applied statistics}, + volume={14}, + number={2}, + pages={165--170}, + year={1987}, + publisher={Taylor \& Francis} +} +@article{viana2016tutorial, + title={A tutorial on {Latin} hypercube design of experiments}, + author={Viana, Felipe AC}, + journal={Quality and reliability engineering international}, + volume={32}, + number={5}, + pages={1975--1985}, + year={2016}, + publisher={Wiley Online Library} +} +@article{harari2018computer, + title={Computer experiments: Prediction accuracy, sample size and model complexity revisited}, + author={Harari, Ofir and Bingham, Derek and Dean, Angela and Higdon, Dave}, + journal={Statistica Sinica}, + pages={899--919}, + year={2018}, + publisher={JSTOR} +} +@Manual{lhs, + title = {lhs: {Latin} Hypercube Samples}, + author = {Rob Carnell}, + year = {2024}, + note = {R package version 1.2.0}, + url = {https://CRAN.R-project.org/package=lhs} +} +@Manual{SLHD, + title = {SLHD: Maximin-Distance (Sliced) {Latin} Hypercube Designs}, 
+ author = {Shan Ba}, + year = {2015}, + note = {R package version 2.1-1}, + url = {https://CRAN.R-project.org/package=SLHD} +} + diff --git a/_articles/RJ-2025-033/wang-xiao-mandal.tex b/_articles/RJ-2025-033/wang-xiao-mandal.tex new file mode 100644 index 0000000000..35a16f56a6 --- /dev/null +++ b/_articles/RJ-2025-033/wang-xiao-mandal.tex @@ -0,0 +1,650 @@ +% !TeX root = RJwrapper.tex + +\title{LHD: An All-encompassing R Package for Constructing Optimal Latin Hypercube Designs} +\author{by Hongzhi Wang, Qian Xiao and Abhyuday Mandal} + +\maketitle + +\begin{abstract} +Optimal Latin hypercube designs (LHDs), including maximin distance LHDs, maximum projection LHDs and orthogonal LHDs, are widely used in computer experiments. It is challenging to construct such designs with flexible sizes, especially for large ones, for two main reasons. +One reason is that theoretical results, such as algebraic constructions ensuring the maximin distance property or orthogonality, are only available for certain design sizes. For design sizes where theoretical results are unavailable, search algorithms can generate designs. However, their numerical performance is not guaranteed to be optimal. Another reason is that when design sizes increase, the number of permutations grows exponentially. Constructing optimal LHDs is a discrete optimization process, and enumeration is nearly impossible for large or moderate design sizes. Various search algorithms and algebraic constructions have been proposed to identify optimal LHDs, each having its own pros and cons. We develop the R package LHD which implements various search algorithms and algebraic constructions. We embedded different optimality criteria into each of the search algorithms, and they are capable of constructing different types of optimal LHDs even though they were originally invented to construct maximin distance LHDs only. 
+Another input argument that controls the maximum CPU time is added to each of the search algorithms
+to let users flexibly allocate their computational resources. We demonstrate the functionalities of
+the package using various examples, and we provide guidance for experimenters on finding suitable
+optimal designs. The LHD package is easy for practitioners to use and may serve as a benchmark
+for future developments in LHDs.
+\end{abstract}
+
+\section{Introduction}
+
+Computer experiments are widely used in scientific research and industrial production, where
+complex computer codes, commonly high-fidelity simulators, generate data instead of real physical systems \citep{sacks1989designs, fang2005design}. The outputs of computer experiments are deterministic (that is, free of random errors), and therefore replications are not needed \citep{butler2001optimal, joseph2008orthogonal, ba2015optimal}. Latin hypercube designs (LHDs, \cite{mckay1979comparison}) are perhaps the most popular type of experimental design for computer experiments \citep{fang2005design, xiao2018construction}; they avoid replications in every dimension and have uniform one-dimensional projections. Depending on practical needs, there are various types of optimal LHDs, including space-filling LHDs, maximum projection LHDs, and orthogonal LHDs. There is a rich literature on the construction of such designs, but it is still very challenging to find good ones for moderate to large design sizes \citep{ye1998orthogonal, fang2005design, joseph2015maximum, xiao2018construction}. One key reason is that theoretical results, such as algebraic constructions which guarantee the maximin distance property or orthogonality, are only established for specific design sizes. These constructions provide theoretical guarantees on the design quality but are limited in their applicability. For design sizes where such theoretical guarantees do not exist, search algorithms can generate designs.
+However, the performance of search-based designs depends on the algorithm employed, the search space explored, and the computational resources allocated, so they cannot be guaranteed to be optimal. Constructing optimal LHDs is a discrete optimization problem, where enumerating all possible solutions guarantees the optimal design for a given size. This approach, however, becomes computationally infeasible as the number of permutations grows exponentially with the design size, which is the second key reason for the challenge.
+
+An LHD with $n$ runs and $k$ factors is an $n \times k$ matrix in which each column is a random permutation of $1, \ldots, n$. Throughout this paper, $n$ denotes the run size and $k$ the factor size. A space-filling LHD spreads its design points over the design region as evenly as possible, minimizing the unsampled region and thus promoting uniformity across all dimensions. Different criteria have been proposed to measure a design's space-filling properties, including the maximin and minimax distance criteria \citep{johnson1990minimax, morris1995exploratory}, the discrepancy criteria \citep{hickernell1998generalized, fang2002centered, fang2005design} and the entropy criterion \citep{shewry1987maximum}. Since there are as many as $(n!)^{k}$ candidate LHDs for a given design size, it is nearly impossible to find the most space-filling one by enumeration when $n$ and $k$ are moderate or large. In the current literature, both search algorithms \citep{morris1995exploratory, leary2003optimal, joseph2008orthogonal, ba2015optimal, kenny2000algorithmic, jin2005efficient, liefvendahl2006study, grosso2009finding, chen2013optimizing} and algebraic constructions \citep{zhou2015space, xiao2017construction, wang2018optimal} are used to construct space-filling LHDs.
+
+Space-filling designs often focus on the full-dimensional space.
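As a concrete illustration of the maximin distance criterion mentioned above, the $\phi_p$ value of \cite{morris1995exploratory} can be computed in a few lines of base R. This is only a sketch (the function name `phi_p_sketch` is ours; the package's own `phi_p` function is the reference implementation): minimizing $\phi_p = \big(\sum_{i<j} d_{ij}^{-p}\big)^{1/p}$ for large $p$ approximates maximizing the minimum inter-site distance.

```r
# Sketch of the phi_p criterion of Morris and Mitchell (1995):
# phi_p(D) = (sum_{i<j} d_ij^(-p))^(1/p), where d_ij is the L_q
# distance between rows i and j of the design matrix D.
# Smaller phi_p values indicate better maximin distance designs.
phi_p_sketch <- function(D, p = 15, q = 1) {
  d <- dist(D, method = "minkowski", p = q)  # all pairwise L_q distances
  sum(d^(-p))^(1 / p)
}

# A random (non-optimized) 5 by 3 LHD: each column is a permutation of 1:5.
set.seed(1)
X <- sapply(1:3, function(j) sample(5))
phi_p_sketch(X)
```

Any of the search algorithms below can then be viewed as heuristics that move through the space of column permutations while driving this value down.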
To further improve the space-filling properties of all possible sub-spaces, \cite{joseph2015maximum} proposed maximum projection designs. Considering all sub-spaces from two to $k-1$ dimensions, maximum projection LHDs (MaxPro LHDs) are generally more space-filling than classic maximin distance LHDs. The construction of MaxPro LHDs is also challenging, especially for large ones, and \cite{joseph2015maximum} proposed a simulated annealing (SA) based algorithm. In the \RR{LHD} package, we incorporated the MaxPro criterion into other algorithms such as the particle swarm optimization (PSO) and genetic algorithm (GA) frameworks, leading to many better MaxPro LHDs; see Section 3 for examples. + +Unlike space-filling LHDs, which minimize the similarities among rows, orthogonal LHDs (OLHDs) are another popular type of optimal design that instead controls the similarities among columns: OLHDs have zero column-wise correlations. Algebraic constructions are available for certain design sizes \citep{ye1998orthogonal, cioppa2007efficient, steinberg2006construction, sun2010construction, sun2009construction, yang2012construction, georgiou2014some, butler2001optimal, tang1993orthogonal, lin2009construction}, but there are many design sizes for which theoretical results are not available. In the \RR{LHD} package, we implemented the average absolute correlation criterion and the maximum absolute correlation criterion \citep{georgiou2009orthogonal} with SA, PSO, and GA to identify both OLHDs and nearly orthogonal LHDs (NOLHDs) for almost all design sizes. + +This paper introduces the R package \RR{LHD}, available on the Comprehensive R Archive Network (\url{https://cran.r-project.org/web/packages/LHD/index.html}), which implements some currently popular search algorithms and algebraic constructions for maximin distance LHDs, MaxPro LHDs, OLHDs, and NOLHDs.
We embedded different optimality criteria, including the maximin distance criterion, the MaxPro criterion, the average absolute correlation criterion, and the maximum absolute correlation criterion, into each of the search algorithms, which were originally proposed to construct maximin distance LHDs only \citep{morris1995exploratory, leary2003optimal, joseph2008orthogonal, liefvendahl2006study, chen2013optimizing}; through the package, each algorithm can therefore construct different types of optimal LHDs. To let users flexibly allocate their computational resources, we also embedded an input argument that limits the maximum CPU time of each algorithm, so users can easily define how and when they want the algorithms to stop. An algorithm stops in one of two ways: when the user-defined maximum number of iterations is reached or when the user-defined maximum CPU time is exceeded. For example, users can allow the algorithm to run for a specified number of iterations without restricting the CPU time, or set a maximum CPU time limit that stops the algorithm regardless of the number of iterations completed. After an algorithm is completed or stopped, the number of iterations completed and the average CPU time per iteration are reported to users. The R package \RR{LHD} is an integrated tool for users with little or no background in design theory, who can easily find optimal LHDs of the desired sizes. Many new designs that are better than existing ones have been discovered; see Section 3. + +The remainder of the paper is organized as follows. Section 2 describes different optimality criteria for LHDs. Section 3 demonstrates some popular search algorithms and their implementation details in the \RR{LHD} package, along with examples. Section 4 discusses some useful algebraic constructions as well as examples of how to implement them via the developed package.
Section 5 reviews other R packages for Latin hypercubes and provides a comparative discussion. Section 6 concludes with a summary. + +\section{Optimality Criteria for LHDs} +\label{OC} + +Various criteria have been proposed to measure designs' space-filling properties \citep{johnson1990minimax, hickernell1998generalized, fang2002centered}. In this paper, we focus on the currently popular maximin distance criterion \citep{johnson1990minimax}, which seeks to scatter design points over the experimental domain so that the minimum distance between points is maximized. Let $\textbf{X}$ denote an LHD matrix throughout this paper. Define the $L_q$-distance between two runs $x_i$ and $x_j$ of $\textbf{X}$ as $d_q(x_i, x_j) = \left\{ \sum_{m=1}^{k} \vert x_{im}-x_{jm}\vert ^q \right\}^{1/q}$, where $q$ is an integer. Define the $L_q$-distance of the design $\textbf{X}$ as $d_q(\textbf{X}) = \min \{d_q(x_i, x_j),\, 1 \leq i < j \leq n\}$; a maximin distance design maximizes $d_q(\textbf{X})$. In practice, the maximin distance criterion is commonly evaluated through the $\phi_p$ criterion of \cite{morris1995exploratory}, +\begin{equation}\label{E2} +\phi_{p} = \left\{ \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} d_q(x_i, x_j)^{-p} \right\}^{1/p}, +\end{equation} +where $p$ is a positive integer; for large $p$, minimizing $\phi_{p}$ is asymptotically equivalent to maximizing $d_q(\textbf{X})$, so a smaller $\phi_{p}$ value indicates a more space-filling design. The package also implements the maximum projection (MaxPro) criterion of \cite{joseph2015maximum}, which is minimized by designs that are space-filling in all projections, and two orthogonality measures \citep{georgiou2009orthogonal}: the average absolute correlation and the maximum absolute correlation among columns, both of which are zero for an OLHD. In the \RR{LHD} package, the function \RR{rLHD} generates a random LHD. For example, +\begin{verbatim} +> X = rLHD(n = 5, k = 3); X #This generates a 5 by 3 random LHD, denoted as X + [,1] [,2] [,3] +[1,] 2 1 4 +[2,] 4 3 3 +[3,] 3 2 2 +[4,] 1 4 5 +[5,] 5 5 1 +\end{verbatim} +The input arguments of the function \RR{rLHD} are the run size \RR{n} and the factor size \RR{k}. Continuing with the randomly generated LHD X above, we evaluate it with respect to the different optimality criteria. For example, +\begin{verbatim} +> phi_p(X) #The maximin L1-distance criterion. +[1] 0.3336608 +> phi_p(X, p = 10, q = 2) #The maximin L2-distance criterion. +[1] 0.5797347 +> MaxProCriterion(X) #The maximum projection criterion. +[1] 0.5375482 +> AvgAbsCor(X) #The average absolute correlation criterion. +[1] 0.5333333 +> MaxAbsCor(X) #The maximum absolute correlation criterion. +[1] 0.9 +\end{verbatim} +The input arguments of the function \RR{phi\_p} are an LHD matrix \RR{X}, \RR{p}, and \RR{q}, where \RR{p} and \RR{q} come directly from Equation~\ref{E2}.
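To make the $\phi_p$ computation concrete, the criterion can be reproduced outside the package in a few lines of base R. The sketch below is ours, not package code; the helper names \RR{rand\_lhd} and \RR{phi\_p\_manual} are illustrative.

```r
# Illustrative base-R sketch (not the LHD package's code): generate a random
# LHD and evaluate the phi_p criterion directly from its definition.
rand_lhd <- function(n, k) sapply(1:k, function(j) sample(1:n))

phi_p_manual <- function(X, p = 15, q = 1) {
  # all pairwise L_q inter-point distances of the design
  d <- dist(X, method = "minkowski", p = q)
  sum(d^(-p))^(1 / p)   # smaller values indicate a more space-filling design
}

set.seed(1)
X <- rand_lhd(5, 3)     # every column is a permutation of 1..5
phi_p_manual(X)         # compare with phi_p(X) from the LHD package
```

With $p$ large, the closest pair of points dominates the sum, which is why minimizing $\phi_p$ approximates maximizing the minimum inter-point distance.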
Note that the default settings of the function \RR{phi\_p} are $p=15$ and $q=1$ (the Manhattan distance), and users can change these settings. The functions \RR{MaxProCriterion}, \RR{AvgAbsCor}, and \RR{MaxAbsCor} each take only one input argument, an LHD matrix \RR{X}. + +\section{Search Algorithms for Optimal LHDs with Flexible Sizes}\label{Algs} + +\subsection{Simulated Annealing Based Algorithms} + +Simulated annealing (SA, \cite{kirkpatrick1983optimization}) is a probabilistic optimization algorithm whose name comes from the annealing process in metallurgy. \cite{morris1995exploratory} proposed a modified SA that randomly exchanges elements in an LHD to seek potential improvements. If such an exchange leads to a better LHD under a given optimality criterion, the exchange is kept. Otherwise, it is kept with probability $\exp[-(\Phi(\textbf{X}_{new})-\Phi(\textbf{X}))/T]$, where $\Phi$ is the optimality criterion, $\textbf{X}$ is the original LHD, $\textbf{X}_{new}$ is the LHD after the exchange, and $T$ is the current temperature. In this article, we focus on minimizing the optimality criteria outlined in Section 2, that is, only minimization problems are considered. This acceptance probability guarantees that an exchange leading to a slightly worse LHD has a higher chance of being kept than one leading to a significantly worse LHD, because the former has a smaller value of $\Phi(\textbf{X}_{new})-\Phi(\textbf{X})$. The exchange procedure is applied iteratively to improve the LHD. When there are no improvements after a certain number of attempts, the current temperature $T$ is annealed (reduced). Note that an exchange with a large value of $\Phi(\textbf{X}_{new})-\Phi(\textbf{X})$ (leading to a significantly worse LHD) is more likely to be accepted during the early phase of the search, when $T$ is relatively high, and less likely later, when $T$ has decreased.
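The acceptance rule can be written down in a couple of lines. The following base-R fragment is a sketch of that single step, not the package's \RR{SA} implementation; \RR{accept\_exchange} is a hypothetical helper name.

```r
# Metropolis-style acceptance step for a minimization criterion Phi:
# an improving exchange is always kept; a worsening exchange is kept
# with probability exp(-(Phi_new - Phi_old)/T), which shrinks as the
# move gets worse or as the temperature T decreases.
accept_exchange <- function(Phi_old, Phi_new, Temp) {
  if (Phi_new <= Phi_old) return(TRUE)
  runif(1) < exp(-(Phi_new - Phi_old) / Temp)
}

# A slightly worse move survives far more often than a much worse one:
set.seed(42)
mean(replicate(1e4, accept_exchange(0.20, 0.21, Temp = 2)))  # high rate
mean(replicate(1e4, accept_exchange(0.20, 1.20, Temp = 2)))  # low rate
```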
The best LHD is identified after the algorithm converges or the budget constraint is reached. In the \RR{LHD} package, the function \RR{SA()} implements this algorithm: +\begin{verbatim} +SA(n, k, N = 10, T0 = 10, rate = 0.1, Tmin = 1, Imax = 5, OC = "phi_p", +p = 15, q = 1, maxtime = 5) +\end{verbatim} + +Table~\ref{T1} provides an overview of all the input arguments in \RR{SA()}. \RR{n} and \RR{k} are the desired run size and factor size. \RR{T0} is an initial temperature, \RR{rate} is the temperature decreasing rate, and \RR{Tmin} is the minimum temperature. If the current temperature is smaller than \RR{Tmin}, the current loop in the algorithm will stop and the current number of iterations will increase by one. There are two stopping criteria for the entire function: when the current number of iterations reaches the maximum (denoted as \RR{N} in the function) or when the cumulative CPU time reaches the maximum (denoted as \RR{maxtime} in the function), respectively. Either of those will trigger the stop of the function, whichever is earlier. For input argument \RR{OC} (optimality criterion), ``phi\_p" returns maximin distance LHDs, ``MaxProCriterion" returns MaxPro LHDs, and ``AvgAbsCor" or ``MaxAbsCor" returns orthogonal LHDs. + +\begin{table}[!htb] + \caption{Overview of Input Arguments of the \RR{SA} Function}\label{T1} + \begin{center} + \begin{tabular}{ll} + \hline + Argument & Description\\ + \hline + \RR{n} & A positive integer that defines the number of rows (or run size) of output LHD.\\ + \RR{k} & A positive integer that defines the number of columns (or factor size) of output LHD.\\ + \RR{N} & A positive integer that defines the maximum number of iterations in the algorithm. \\ + & A large value of \RR{N} will result in a high CPU time, and it is recommended to be no \\ + & greater than 500. The default is set to be 10.\\ + \hline + \RR{T0} & A positive number that defines the initial temperature. 
The default is set to be 10, \\ + & which means the temperature anneals from 10 in the algorithm.\\ + \RR{rate} & A positive percentage that defines the temperature decrease rate, and it should be \\ + & in (0,1). For example, rate=0.25 means the temperature decreases by 25\% each time. \\ + & The default is set to be 10\%.\\ + \RR{Tmin} & A positive number that defines the minimum temperature allowed. When current \\ + & temperature becomes smaller or equal to \RR{Tmin}, the stopping criterion for current \\ + & loop is met. The default is set to be 1.\\ + \RR{Imax} & A positive integer that defines the maximum perturbations the algorithm will try \\ + & without improvements before temperature is reduced. The default is set to be 5. \\ + & For CPU time consideration, \RR{Imax} is recommended to be no greater than 5.\\ + \hline + \RR{OC} & An optimality criterion. The default setting is ``phi\_p", and it could be one of \\ + & the following: ``phi\_p", ``AvgAbsCor", ``MaxAbsCor", ``MaxProCriterion".\\ + \RR{p} & A positive integer, which is one parameter in the $\phi_{p}$ formula, and \RR{p} is preferred \\ + & to be large. The default is set to be 15.\\ + \RR{q} & A positive integer, which is one parameter in the $\phi_{p}$ formula, and \RR{q} could be\\ + & either 1 or 2. If \RR{q} is 1, the Manhattan (rectangular) distance will be calculated. \\ + & If \RR{q} is 2, the Euclidean distance will be calculated.\\ + \RR{maxtime} & A positive number, which indicates the expected maximum CPU time, and it is\\ + & measured by minutes. For example, \RR{maxtime}=3.5 indicates the CPU time will \\ + & be no greater than three and a half minutes. The default is set to be 5.\\ + \hline + \end{tabular} + \end{center} +\end{table} + +\cite{leary2003optimal} modified the SA algorithm in \cite{morris1995exploratory} to search for optimal orthogonal array-based LHDs (OALHDs). 
\cite{tang1993orthogonal} showed that OALHDs tend to have better space-filling properties than random LHDs. The SA in \cite{leary2003optimal} starts with a random OALHD rather than a random LHD. %Two random elements that share the same entry in the original orthogonal array (OA) are exchanged. For example, in an OALHD with 9 runs, suppose that a randomly chosen column has the elements: $(1,2,3,4,5,6,7,8,9)$ and its original OA has the entries: $({\it 1,1,1},{\it 2,2,2},{\it 3,3,3})$. Elements $(1,2,3)$ share the same original OA entry ${\it 1}$, $(4,5,6)$ share the same original OA entry ${\it 2}$, and $(7,8,9)$ share the same original OA entry ${\it 3}$. +The remaining steps are the same as the SA in \cite{morris1995exploratory}. Note that the existence of OALHDs is determined by the existence of the corresponding initial OAs. In the \RR{LHD} package, the function \RR{OASA()} implements the modified SA algorithm: +\begin{verbatim} +OASA(OA, N = 10, T0 = 10, rate = 0.1, Tmin = 1, Imax = 5, OC = "phi_p", +p = 15, q = 1, maxtime = 5), +\end{verbatim} +\noindent where all the input arguments are the same as in \RR{SA} except that \RR{OA} must be an orthogonal array. + +\cite{joseph2008orthogonal} proposed another modified SA to identify orthogonal-maximin LHDs, which considers both the orthogonality and the maximin distance criteria. The algorithm starts by generating a random LHD and then chooses the column that has the largest average pairwise correlation with all other columns. %As an illustration, the algorithm selects the $l^{*}$th column where $l^{*} = \argmax \rho_{l}^2$ and $\rho_{l}^2=\frac{1}{k-1}\sum_{j\neq l}\rho_{lj}^2$ is the average pairwise correlation between the $l$th and all other columns. +Next, the algorithm selects the row that has the largest total distance to all other rows.
%As an illustration, the algorithm selects the $i^{*}$th row where $i^{*}=\argmax (\sum_{j\neq i}d(x_i, x_j)^{-p})^{1/p}$ and $d(x_i, x_j)$ is the distance between the $i$th and $j$th rows. +Then, the element at the selected row and column will be exchanged with a random element from the same column. The remaining steps are the same as the SA in \cite{morris1995exploratory}. In the \RR{LHD} package, the function \RR{SA2008()} implements this algorithm: +{\small + \begin{verbatim} +SA2008(n, k, N = 10, T0 = 10, rate = 0.1, Tmin = 1, Imax = 5, OC = "phi_p", +p = 15, q = 1, maxtime = 5), + \end{verbatim} +} +\noindent where all the input arguments are the same as in \RR{SA}. + +\subsection{Particle Swarm Optimization Algorithms} + +Particle swarm optimization (PSO, \cite{kennedy1995particle}) is a metaheuristic optimization algorithm inspired by the social behaviors of animals. Recent research \citep{chen2013optimizing} adapted the classic PSO algorithm and proposed LaPSO to identify maximin distance LHDs. Since this is a discrete optimization task, LaPSO redefines the steps in which each particle updates its velocity and position in the general PSO framework. In the \RR{LHD} package, the function \RR{LaPSO()} implements this algorithm: +{\small + \begin{verbatim} +LaPSO(n, k, m = 10, N = 10, SameNumP = 0, SameNumG = n/4, p0 = 1/(k - 1), +OC = "phi_p", p = 15, q = 1, maxtime = 5) + \end{verbatim} +} + +Table~\ref{T2} provides an overview of all the input arguments in \RR{LaPSO()}, where \RR{n}, \RR{k}, \RR{N}, \RR{OC}, \RR{p}, \RR{q}, and \RR{maxtime} are exactly the same as the input arguments in the function \RR{SA()}. \RR{m} is the number of particles, which represents candidate solutions in the PSO framework. \RR{SameNumP} and \RR{SameNumG} are two tuning parameters that denote how many exchanges would be performed to reduce the Hamming distance towards the personal best and the global best. 
\RR{p0} is the tuning parameter that denotes the probability of a random swap for two elements in the current column of the current particle to prevent the algorithm from being stuck at the local optimum. In \cite{chen2013optimizing}, they provided the following suggestions: \RR{SameNumP} is approximately $n/2$ when \RR{SameNumG} is $0$, \RR{SameNumG} is approximately $n/4$ when \RR{SameNumP} is $0$, and \RR{p0} should be between $1/(k-1)$ and $2/(k-1)$. The stopping criterion of the function is the same as that of the function \RR{SA}. + +\begin{table}[!htb] + \caption{Overview of Input Arguments of the \RR{LaPSO} Function}\label{T2} + \begin{center} + \begin{tabular}{ll} + \hline + Argument & Description\\ + \hline + \RR{n} & A positive integer that defines the number of rows (or run size) of output LHD.\\ + \RR{k} & A positive integer that defines the number of columns (or factor size) of output LHD.\\ + \RR{m} & A positive integer that defines the number of particles, where each particle is a\\ + & candidate solution. A large value of \RR{N} will result in a high CPU time, and it is \\ + & recommended to be no greater than 100. The default is set to be 10.\\ + \RR{N} & A positive integer that defines the maximum number of iterations in the algorithm. \\ + & A large value of \RR{N} will result in a high CPU time, and it is recommended to be no \\ + & greater than 500. The default is set to be 10.\\ + \hline + \RR{SameNumP} & A non-negative integer that defines how many elements in current column of \\ + & current particle should be the same as corresponding Personal Best. SameNumP \\ + & can be 0, 1, 2, \ldots , n, where 0 means to skip the element exchange, which is the \\ + & default setting.\\ + \RR{SameNumG} & A non-negative integer that defines how many elements in current column of \\ + & current particle should be the same as corresponding Global Best. SameNumG can \\ + & be 0, 1, 2, \ldots , n, where 0 means to skip the element exchange. 
The default setting\\ + & is n/4. Note that SameNumP and SameNumG cannot be 0 at the same time.\\ + \RR{p0} & A probability of exchanging two randomly selected elements in current column of \\ + & current particle LHD. The default is set to be 1/(k - 1). \\ + \hline + \RR{OC} & An optimality criterion. The default setting is ``phi\_p", and it could be one of \\ + & the following: ``phi\_p", ``AvgAbsCor", ``MaxAbsCor", ``MaxProCriterion".\\ + \RR{p} & A positive integer, which is one parameter in the $\phi_{p}$ formula, and \RR{p} is preferred \\ + & to be large. The default is set to be 15.\\ + \RR{q} & A positive integer, which is one parameter in the $\phi_{p}$ formula, and \RR{q} could be\\ + & either 1 or 2. If \RR{q} is 1, the Manhattan (rectangular) distance will be calculated. \\ + & If \RR{q} is 2, the Euclidean distance will be calculated.\\ + \RR{maxtime} & A positive number, which indicates the expected maximum CPU time, and it is\\ + & measured by minutes. For example, \RR{maxtime}=3.5 indicates the CPU time will \\ + & be no greater than three and a half minutes. The default is set to be 5.\\ + \hline + \end{tabular} + \end{center} +\end{table} + +\subsection{Genetic Algorithms} + +The genetic algorithm (GA) is a nature-inspired metaheuristic optimization algorithm that mimics Charles Darwin's idea of natural selection \citep{holland1992adaptation, goldberg1989genetic}. \cite{liefvendahl2006study} proposed a version of GA for identifying maximin distance LHDs. They implement the column exchange technique to solve the discrete optimization task. 
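To illustrate the column-exchange idea, two GA-style operators for LHDs can be sketched in base R. This is our illustration of why such operators preserve the Latin structure, not the package's internal \RR{GA} code; \RR{crossover\_columns} and \RR{mutate\_column} are hypothetical names.

```r
# Illustrative GA-style operators for LHDs (our sketch, not package code).
# Exchanging whole columns between two parent LHDs and swapping two
# elements within a column both preserve the Latin property, since every
# column remains a permutation of 1..n.
crossover_columns <- function(parent1, parent2, cols) {
  child <- parent1
  child[, cols] <- parent2[, cols]   # take the selected columns from parent 2
  child
}

mutate_column <- function(X, pmut = 1 / (ncol(X) - 1)) {
  for (j in seq_len(ncol(X))) {
    if (runif(1) < pmut) {           # mutate this column with probability pmut
      idx <- sample(nrow(X), 2)      # pick two positions to swap
      X[idx, j] <- X[rev(idx), j]
    }
  }
  X
}
```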
In the \RR{LHD} package, the function \RR{GA()} implements this algorithm: +{\small + \begin{verbatim} +GA(n, k, m = 10, N = 10, pmut = 1/(k - 1), OC = "phi_p", p = 15, q = 1, +maxtime = 5) + \end{verbatim} +} + +Table~\ref{T3} provides an overview of all the input arguments in \RR{GA()}, where \RR{n}, \RR{k}, \RR{N}, \RR{OC}, \RR{p}, \RR{q}, and \RR{maxtime} are exactly the same as the input arguments of the function \RR{SA()}. \RR{m} is the population size, that is, the number of candidate solutions in each iteration, and it must be an even number. \RR{pmut} is the tuning parameter that controls how likely a mutation is to happen. When a mutation occurs, two randomly selected elements are exchanged in the current column of the current LHD. \RR{pmut} serves the same purpose as \RR{p0} in \RR{LaPSO()}, preventing the algorithm from getting stuck at a local optimum, and it is recommended to be $1/(k-1)$. The stopping criterion of the function is the same as that of the function \RR{SA}. + +\begin{table}[!htb] + \caption{Overview of Input Arguments of the \RR{GA} Function}\label{T3} + \begin{center} + \begin{tabular}{ll} + \hline + Argument & Description\\ + \hline + \RR{n} & A positive integer that defines the number of rows (or run size) of output LHD.\\ + \RR{k} & A positive integer that defines the number of columns (or factor size) of output LHD.\\ + \RR{m} & A positive even integer that stands for the population size. The default is set to \\ + & be 10. A large value of \RR{m} will result in a high CPU time, and it is recommended \\ + & to be no greater than 100.\\ + \RR{N} & A positive integer that defines the maximum number of iterations in the algorithm. \\ + & A large value of \RR{N} will result in a high CPU time, and it is recommended to be no \\ + & greater than 500. The default is set to be 10.\\ + \hline + \RR{pmut} & A probability for mutation.
When the mutation happens, two randomly selected \\ + & elements in current column of current LHD will be exchanged. The default is \\ + & set to be 1/(k - 1). \\ + \hline + \RR{OC} & An optimality criterion. The default setting is ``phi\_p", and it could be one of \\ + & the following: ``phi\_p", ``AvgAbsCor", ``MaxAbsCor", ``MaxProCriterion".\\ + \RR{p} & A positive integer, which is one parameter in the $\phi_{p}$ formula, and \RR{p} is preferred \\ + & to be large. The default is set to be 15.\\ + \RR{q} & A positive integer, which is one parameter in the $\phi_{p}$ formula, and \RR{q} could be\\ + & either 1 or 2. If \RR{q} is 1, the Manhattan (rectangular) distance will be calculated. \\ + & If \RR{q} is 2, the Euclidean distance will be calculated.\\ + \RR{maxtime} & A positive number, which indicates the expected maximum CPU time, and it is\\ + & measured by minutes. For example, \RR{maxtime}=3.5 indicates the CPU time will \\ + & be no greater than three and a half minutes. The default is set to be 5.\\ + \hline + \end{tabular} + \end{center} +\end{table} + +\subsection{Illustrating Examples for the Implemented Search Algorithms} + +This subsection demonstrates some examples on how to use the search algorithms in the developed \RR{LHD} package. In Table~\ref{A1}, we summarize the R functions of the algorithms discussed in the previous subsections, which can be used to identify different types of optimal LHDs. Users who seek fast solutions can use the default settings of the input arguments after specifying the design sizes. See the following examples. 
+ +\begin{table}[!htb] + \caption{Search algorithm functions in the \RR{LHD} package}\label{A1} + \begin{center} + \begin{tabular}{ll} + \hline + Function & Description\\ + \hline + SA & Returns an LHD via the simulated annealing algorithm \citep{morris1995exploratory}.\\ + OASA & Returns an LHD via the orthogonal-array-based simulated annealing algorithm \\ + & \citep{leary2003optimal}, where an OA of the required design size must exist.\\ + SA2008 & Returns an LHD via the simulated annealing algorithm with the multi-objective \\ + & optimization approach \citep{joseph2008orthogonal}.\\ + %SLHD & Returns an LHD via the improved two-stage algorithm by \cite{ba2015optimal}.\\ + LaPSO & Returns an LHD via particle swarm optimization \citep{chen2013optimizing}.\\ + GA & Returns an LHD via the genetic algorithm \citep{liefvendahl2006study}. \\ + \hline + \end{tabular} + \end{center} +\end{table} + +{\small + \begin{verbatim} +#Generate a 5 by 3 maximin distance LHD by the SA function. +> try.SA = SA(n = 5, k = 3); try.SA + [,1] [,2] [,3] +[1,] 2 2 1 +[2,] 5 3 2 +[3,] 4 5 5 +[4,] 3 1 4 +[5,] 1 4 3 +> phi_p(try.SA) #\phi_p is smaller than that of a random LHD (0.3336608). +[1] 0.2169567 + +#Similarly, generate 5 by 3 maximin distance LHDs by the SA2008, LaPSO, and GA functions. +> try.SA2008 = SA2008(n = 5, k = 3) +> try.LaPSO = LaPSO(n = 5, k = 3) +> try.GA = GA(n = 5, k = 3) + +#Generate an OA(9,2,3,2), an orthogonal array with 9 runs, 2 factors, 3 levels, and strength 2. +> OA = matrix(c(rep(1:3, each = 3), rep(1:3, times = 3)), ++ ncol = 2, nrow = 9, byrow = FALSE) +#Generates a maximin distance LHD with the same design size as the input OA +#by the orthogonal-array-based simulated annealing algorithm.
+> try.OASA = OASA(OA) +> OA; try.OASA + [,1] [,2] [,1] [,2] +[1,] 1 1 [1,] 1 2 +[2,] 1 2 [2,] 2 6 +[3,] 1 3 [3,] 3 9 +[4,] 2 1 [4,] 4 3 +[5,] 2 2 [5,] 6 5 +[6,] 2 3 [6,] 5 7 +[7,] 3 1 [7,] 7 1 +[8,] 3 2 [8,] 9 4 +[9,] 3 3 [9,] 8 8 + \end{verbatim} +} + +Note that the default optimality criterion embedded in all search algorithms is ``phi\_p" (that is, the maximin distance criterion) with $q=1$, leading to maximin $L_1$-distance LHDs. For other optimality criteria, users should change the setting of the input argument \RR{OC} (with options ``phi\_p", ``MaxProCriterion", ``AvgAbsCor", and ``MaxAbsCor"). The following examples illustrate some details of different argument settings. + +{\small + \begin{verbatim} +#Below try.SA is a 5 by 3 maximin distance LHD generated by the SA with 30 iterations (N = 30). +#The temperature starts at 10 (T0 = 10) and decreases 10% (rate = 0.1) each time. +#The minimum temperature allowed is 1 (Tmin = 1) and the maximum perturbations that +#the algorithm will try without improvements is 5 (Imax = 5). The optimality criterion +#used is the maximin distance criterion (OC = "phi_p") with p = 15 and q = 1, and the +#maximum CPU time is 5 minutes (maxtime = 5). +> try.SA = SA(n = 5, k = 3, N = 30, T0 = 10, rate = 0.1, Tmin = 1, Imax = 5, OC = "phi_p", ++ p = 15, q = 1, maxtime = 5); try.SA + [,1] [,2] [,3] +[1,] 1 3 4 +[2,] 2 5 2 +[3,] 5 4 3 +[4,] 4 1 5 +[5,] 3 2 1 +> phi_p(try.SA) +[1] 0.2169567 + +#Below try.SA2008 is a 5 by 3 maximin distance LHD generated by the SA with +#the multi-objective optimization approach. The input arguments are interpreted +#the same as the design try.SA above.
+> try.SA2008 = SA2008(n = 5, k = 3, N = 30, T0 = 10, rate = 0.1, Tmin = 1, Imax = 5, ++ OC = "phi_p", p = 15, q = 1, maxtime = 5) + +#Below try.OASA is a 9 by 2 maximin distance LHD generated by the +#orthogonal-array-based simulated annealing algorithm with the input +#OA (defined previously), and the rest input arguments are interpreted the +#same as the design try.SA above. +> try.OASA = OASA(OA, N = 30, T0 = 10, rate = 0.1, Tmin = 1, Imax = 5, ++ OC = "phi_p", p = 15, q = 1, maxtime = 5) + +#Below try.LaPSO is a 5 by 3 maximum projection LHD generated by the particle swarm +#optimization algorithm with 20 particles (m = 20) and 30 iterations (N = 30). +#Zero (or two) elements in any column of the current particle should be the same as +#the elements of corresponding column from personal best (or global best), because +#of SameNumP = 0 (or SameNumG = 2). +#The probability of exchanging two randomly selected elements is 0.5 (p0 = 0.5). +#The optimality criterion is maximum projection criterion (OC = "MaxProCriterion"). +#The maximum CPU time is 5 minutes (maxtime = 5). +> try.LaPSO = LaPSO(n = 5, k = 3, m = 20, N = 30, SameNumP = 0, SameNumG = 2, ++ p0 = 0.5, OC = "MaxProCriterion", maxtime = 5); try.LaPSO + [,1] [,2] [,3] +[1,] 4 5 4 +[2,] 3 1 3 +[3,] 5 2 1 +[4,] 2 3 5 +[5,] 1 4 2 +#Recall the value is 0.5375482 from the random LHD in Section 2. +> MaxProCriterion(try.LaPSO) +[1] 0.3561056 + +#Below try.GA is a 5 by 3 OLHD generated by the genetic algorithm with the +#population size 20 (m = 20), number of iterations 30 (N = 30), mutation +#probability 0.5 (pmut = 0.5), maximum absolute correlation criterion +#(OC = "MaxAbsCor"), and maximum CPU time 5 minutes (maxtime = 5). +> try.GA = GA(n = 5, k = 3, m = 20, N = 30, pmut = 0.5, OC = "MaxAbsCor", ++ maxtime = 5); try.GA + [,1] [,2] [,3] +[1,] 2 1 2 +[2,] 4 4 5 +[3,] 3 5 1 +[4,] 5 2 3 +[5,] 1 3 4 +#Recall the value is 0.9 from the random LHD in Section 2. 
+> MaxAbsCor(try.GA) +[1] 0.1 #The maximum absolute correlation between columns is 0.1 + \end{verbatim} +} + +Next, we discuss some details of the implementation. In SA based algorithms (\RR{SA}, \RR{SA2008}, and \RR{OASA}), the number of iterations \RR{N} is recommended to be no greater than 500 for computing time considerations. The input \RR{rate} determines the percentage of the decrease in current temperature (for example, $0.1$ means a decrease of $10\%$ each time). A high rate would make the temperature rapidly drop, which leads to a fast stop of the algorithm. It is recommended to set \RR{rate} from $0.1$ to $0.15$. \RR{Imax} indicates the maximum perturbations that the algorithm will attempt without improvements before the temperature reduces, and it is recommended to be no greater than 5 for computing time considerations. \RR{OC} chooses the optimality criterion, and the \RR{"phi\_p"} criterion in \eqref{E2} is set as default. \RR{OC} has other options, including \RR{"MaxProCriterion"}, \RR{"AvgAbsCor"} and \RR{"MaxAbsCor"}. Our algorithms support both the $L_1$ and $L_2$ distances. + +For every algorithm, we incorporate a progress bar to visualize the computing time used. After an algorithm is completed, information on ``average CPU time per iteration" and ``numbers of iterations completed" will be presented. Users can set the limit for the CPU time used for each algorithm using the argument \RR{maxtime}, according to their practical needs. + +We also provide some illustrative code to demonstrate that the designs found in the \RR{LHD} package are better than the existing ones, and the code in the following can be easily modified to construct other design sizes or other LHD types. Out of 100 trials, the code below shows the GA in the \RR{LHD} package constructed better MaxPro LHDs 99 times compared to the algorithm in the \RR{MaxPro} package, when 500 iterations are set for both algorithms. 
We did not compare the CPU time between these two packages since one is written in R and the other in C++, but with the same number of iterations, the GA in the \RR{LHD} package almost always constructs better MaxPro LHDs. + +{\small + \begin{verbatim} +#Make sure both packages are properly installed before loading them +> library(LHD) +> library(MaxPro) + +> count = 0 #Define a variable for counting purposes + +> k = 5 #Factor size 5 +> n = 10*k #Run size = 10*factor size + +#Setting 500 iterations for both algorithms, the loop below counts +#how many times the GA from the LHD package outperforms the algorithm +#from the MaxPro package out of 100 trials +> for (i in 1:100) { + + LHD = LHD::GA(n = n, k = k, m = 100, N = 500) + MaxPro = MaxPro::MaxProLHD(n = n, p = k, total_iter = 500)$Design + + #MaxPro * n + 0.5 applies the transformation mentioned in Section 2 + #to revert the scaling. + Result.LHD = LHD::MaxProCriterion(LHD) + Result.MaxPro = LHD::MaxProCriterion(MaxPro * n + 0.5) + + if (Result.LHD < Result.MaxPro) {count = count + 1} + +} + +> count +[1] 99 + \end{verbatim} +} + +\section{Algebraic Constructions for Optimal LHDs with Certain Sizes}\label{Constr} + +There are algebraic constructions available for certain design sizes, and theoretical results have been developed to guarantee the efficiency of such designs. Algebraic constructions require almost no searching, which makes them especially attractive for large designs. In this section, we present the algebraic constructions that are available in the \RR{LHD} package for maximin distance LHDs and orthogonal LHDs. + +\subsection{Algebraic Constructions for Maximin Distance LHDs}\label{WT} + +\cite{wang2018optimal} proposed to generate maximin distance LHDs via good lattice point (GLP) sets \citep{zhou2015space} and the Williams transformation \citep{williams1949experimental}.
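As background, the two ingredients of this construction can be sketched in a few lines of base R. This is our illustration of the standard definitions of a GLP set and the Williams transformation, not the package's \RR{FastMmLHD} code; \RR{glp\_set} and \RR{williams} are illustrative names.

```r
# Good lattice point (GLP) set: entry (i, j) is (i * h_j) mod n for
# i = 1..n, where h_1, ..., h_k are positive integers co-prime to n;
# each column is then a permutation, i.e., the matrix is an LHD.
glp_set <- function(n, h) {
  D <- outer(1:n, h) %% n
  D[D == 0] <- n                 # use levels 1..n instead of 0..n-1
  D
}

# Williams transformation of x in {0, 1, ..., n-1}:
# W(x) = 2x if 2x < n, and 2(n - x) - 1 otherwise.
williams <- function(x, n) ifelse(2 * x < n, 2 * x, 2 * (n - x) - 1)

n <- 7
h <- c(1, 2, 3)                  # all co-prime to 7
D <- glp_set(n, h)               # each column is a permutation of 1..n
W <- williams(D %% n, n)         # transformed design, levels 0..n-1
```

Applying the Williams transformation column-wise keeps every column a permutation, so the result is again an LHD.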
In practice, their method can lead to space-filling designs with relatively flexible sizes, where the run size $n$ is flexible but the factor size $k$ must be no greater than the number of positive integers that are co-prime to $n$. They proved that the resulting designs of sizes $n \times (n-1)$ (with $n$ being any odd prime) and $n \times n$ (with $2n+1$ or $n+1$ being an odd prime) are optimal under the maximin $L_1$-distance criterion. This construction method by \cite{wang2018optimal} is very attractive for constructing large maximin distance LHDs. In the \RR{LHD} package, function \RR{FastMmLHD()} implements this method: + +\begin{verbatim} +FastMmLHD(n, k, method = "manhattan", t1 = 10), +\end{verbatim} + +\noindent where \RR{n} and \RR{k} are the desired run size and factor size. \RR{method} specifies the distance measure, which can be one of the following: ``euclidean", ``maximum", ``manhattan", ``canberra", ``binary" or ``minkowski". Any unambiguous substring can be given. \RR{t1} is a tuning parameter that determines how many repeats will be implemented to search for the optimal design. The default is set to be 10. + +\cite{tang1993orthogonal} proposed to construct orthogonal array-based LHDs (OALHDs) from existing orthogonal arrays (OAs), and showed that OALHDs can have better space-filling properties than general random LHDs. In the \RR{LHD} package, function \RR{OA2LHD()} implements this method: +{\small + \begin{verbatim} +OA2LHD(OA), + \end{verbatim} +} + +\noindent where \RR{OA} is an orthogonal array matrix. Users only need to input an OA, and the function will return an OALHD with the same design size as the input OA. + +\subsection{Algebraic Constructions for Orthogonal LHDs} +\label{sec:olhd} + +Orthogonal LHDs (OLHDs) have zero pairwise correlation between any two columns and are widely used by practitioners.
There is a rich literature on the construction of OLHDs with various design sizes, but these constructions are often difficult for practitioners to replicate in practice. The \RR{LHD} package implements several popular methods \citep{ye1998orthogonal, cioppa2007efficient, sun2010construction, tang1993orthogonal, lin2009construction, butler2001optimal}, and the corresponding functions are easy to use.
+
+\cite{ye1998orthogonal} proposed a construction for OLHDs with run sizes $n=2^m+1$ and factor sizes $k=2m-2$, where $m$ is any integer bigger than 2. In the \RR{LHD} package, function \RR{OLHD.Y1998()} implements this algebraic construction:
+{\small
+  \begin{verbatim}
+OLHD.Y1998(m),
+  \end{verbatim}
+}
+\noindent where the input argument \RR{m} is the $m$ in the construction of \cite{ye1998orthogonal}. \cite{cioppa2007efficient} extended \cite{ye1998orthogonal}'s method to construct OLHDs with run size $n=2^m+1$ and factor size $k=m+ {m-1 \choose 2}$, where $m$ is any integer bigger than 2. In the \RR{LHD} package, function \RR{OLHD.C2007()} implements this algebraic construction, with the input argument \RR{m} remaining the same:
+{\small
+  \begin{verbatim}
+OLHD.C2007(m)
+  \end{verbatim}
+}
+
+\cite{sun2010construction} extended their earlier work \citep{sun2009construction} to construct OLHDs with $n=r2^{c+1}+1$ or $n=r2^{c+1}$ and $k=2^c$, where $r$ and $c$ are positive integers. In the \RR{LHD} package, function \RR{OLHD.S2010()} implements this algebraic construction:
+{\small
+  \begin{verbatim}
+OLHD.S2010(C, r, type = "odd"),
+  \end{verbatim}
+}
+
+\noindent where the input arguments \RR{C} and \RR{r} are $c$ and $r$ in the construction. When the input argument \RR{type} is \RR{"odd"}, the output design size is $n=r2^{c+1}+1$ by $k=2^c$; when it is \RR{"even"}, the output design size is $n=r2^{c+1}$ by $k=2^c$.
+
+\cite{lin2009construction} constructed OLHDs or NOLHDs with $n^2$ runs and $2fp$ factors by coupling an OLHD($n$, $p$) or NOLHD($n$, $p$) with an OA($n^2,2f,n,2$). For example, an OLHD(11, 7), coupled with an OA(121,12,11,2), would yield an OLHD(121, 84). The design size of the output OLHD or NOLHD depends heavily on the existence of suitable OAs. In the \RR{LHD} package, function \RR{OLHD.L2009()} implements this algebraic construction:
+{\small
+  \begin{verbatim}
+OLHD.L2009(OLHD, OA),
+  \end{verbatim}
+}
+
+\noindent where the input arguments \RR{OLHD} and \RR{OA} are the OLHD and OA to be coupled; their design sizes need to be aligned with the designated pattern of the construction.
+
+\cite{butler2001optimal} proposed a method to construct OLHDs, via the Williams transformation \citep{williams1949experimental}, with the run size $n$ being an odd prime and the factor size $k$ being less than or equal to $n-1$. In the \RR{LHD} package, function \RR{OLHD.B2001()} implements this algebraic construction, with input arguments \RR{n} and \RR{k} exactly matching those in the construction:
+{\small
+  \begin{verbatim}
+OLHD.B2001(n, k)
+  \end{verbatim}
+}
+
+\subsection{Illustrating Examples for the Implemented Algebraic Constructions}
+
+In Table~\ref{A2}, we summarize the algebraic constructions implemented in the \RR{LHD} package, where \RR{FastMmLHD} and \RR{OA2LHD} are for maximin distance LHDs, and \RR{OLHD.Y1998}, \RR{OLHD.C2007}, \RR{OLHD.S2010}, \RR{OLHD.L2009} and \RR{OLHD.B2001} are for orthogonal LHDs. The following examples illustrate how to use them.
+
+\begin{table}[!htb]
+  \caption{Algebraic constructions in the \RR{LHD} package}\label{A2}
+  \begin{center}
+   \small \begin{tabular}{ll}
+   \hline
+   Function & Description\\
+   \hline
+   FastMmLHD & Returns a maximin distance LHD matrix \citep{wang2018optimal}.\\
+
+   OA2LHD & Expands an orthogonal array to an LHD \citep{tang1993orthogonal}.\\
+
+   OLHD.Y1998 & Returns a $2^m+1$ by $2m-2$ orthogonal LHD matrix \citep{ye1998orthogonal} \\
+   & where $m$ is an integer and $m \geq 2$.\\
+
+   OLHD.C2007 & Returns a $2^m+1$ by $m+{m-1 \choose 2}$ orthogonal LHD matrix \\
+   & \citep{cioppa2007efficient} where $m$ is an integer and $m \geq 2$.\\
+
+   OLHD.S2010 & Returns an $r2^{c+1}+1$ or $r2^{c+1}$ by $2^c$ orthogonal LHD matrix \\
+   & \citep{sun2010construction} where $r$ and $c$ are positive integers.\\
+
+   OLHD.L2009 & Couples an $n$ by $p$ orthogonal LHD with an $n^2$ by $2f$ strength $2$ and \\
+   & level $n$ orthogonal array to generate an $n^2$ by $2fp$ orthogonal LHD \\
+   & \citep{lin2009construction}.\\
+
+   OLHD.B2001 & Returns an orthogonal LHD \citep{butler2001optimal} with the run size $n$\\
+   & being an odd prime and the factor size $k$ being less than or equal to $n-1$.\\
+
+   \hline
+  \end{tabular}
+  \end{center}
+\end{table}
+
+{\small
+  \begin{verbatim}
+#FastMmLHD(8, 8) generates an optimal 8 by 8 maximin L_1 distance LHD.
+> try.FastMm = FastMmLHD(n = 8, k = 8); try.FastMm
+     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
+[1,]    0    1    2    3    4    5    6    7
+[2,]    1    3    5    7    6    4    2    0
+[3,]    2    5    7    4    1    0    3    6
+[4,]    3    7    4    0    2    6    5    1
+[5,]    4    6    1    2    7    3    0    5
+[6,]    5    4    0    6    3    1    7    2
+[7,]    6    2    3    5    0    7    1    4
+[8,]    7    0    6    1    5    2    4    3
+
+#OA2LHD(OA) expands an input OA to an LHD of the same run size.
+> try.OA2LHD = OA2LHD(OA)
+> OA; try.OA2LHD
+     [,1] [,2]        [,1] [,2]
+[1,]    1    1   [1,]    1    2
+[2,]    1    2   [2,]    2    4
+[3,]    1    3   [3,]    3    9
+[4,]    2    1   [4,]    4    3
+[5,]    2    2   [5,]    5    5
+[6,]    2    3   [6,]    6    7
+[7,]    3    1   [7,]    9    1
+[8,]    3    2   [8,]    8    6
+[9,]    3    3   [9,]    7    8
+  \end{verbatim}
+}
+
+{\small
+  \begin{verbatim}
+#OLHD.Y1998(m = 3) generates a 9 by 4 orthogonal LHD.
+#Note that 2^m+1 = 9 and 2*m-2 = 4.
+> try.Y1998 = OLHD.Y1998(m = 3); try.Y1998
+     [,1] [,2] [,3] [,4]
+[1,]    4   -3   -2    1
+[2,]    3    4   -1   -2
+[3,]    1   -2    3   -4
+[4,]    2    1    4    3
+[5,]    0    0    0    0
+[6,]   -4    3    2   -1
+[7,]   -3   -4    1    2
+[8,]   -1    2   -3    4
+[9,]   -2   -1   -4   -3
+> MaxAbsCor(try.Y1998) #column-wise correlations are 0
+[1] 0
+
+#OLHD.C2007(m = 4) generates a 17 by 7 orthogonal LHD.
+#Note that 2^m+1 = 17 and 4 + choose(3, 2) = 7.
+> try.C2007 = OLHD.C2007(m = 4); dim(try.C2007)
+[1] 17  7
+> MaxAbsCor(try.C2007) #column-wise correlations are 0
+[1] 0
+
+#OLHD.S2010(C = 3, r = 3, type = "odd") generates a 49 by 8 orthogonal LHD.
+#Note that 3*2^4+1 = 49 and 2^3 = 8.
+> dim(OLHD.S2010(C = 3, r = 3, type = "odd"))
+[1] 49  8
+> MaxAbsCor(OLHD.S2010(C = 3, r = 3, type = "odd")) #column-wise correlations are 0
+[1] 0
+
+#OLHD.S2010(C = 3, r = 3, type = "even") generates a 48 by 8 orthogonal LHD.
+#Note that 3*2^4 = 48 and 2^3 = 8.
+> dim(OLHD.S2010(C = 3, r = 3, type = "even"))
+[1] 48  8
+> MaxAbsCor(OLHD.S2010(C = 3, r = 3, type = "even")) #column-wise correlations are 0
+[1] 0
+
+#Create a 5 by 2 OLHD.
+> OLHD = OLHD.C2007(m = 2)
+
+#Create an OA(25, 6, 5, 2).
+> OA = matrix(c(2,2,2,2,2,1,2,1,5,4,3,5,3,2,1,5,4,5,1,5,4,3,2,5,4,1,3,5,2,3,
+1,2,3,4,5,2,1,3,5,2,4,3,1,1,1,1,1,1,4,3,2,1,5,5,5,5,5,5,5,1,4,4,4,4,4,1,
+3,1,4,2,5,4,3,3,3,3,3,1,3,5,2,4,1,3,3,4,5,1,2,2,5,4,3,2,1,5,2,3,4,5,1,2,
+2,5,3,1,4,4,1,4,2,5,3,4,4,2,5,3,1,4,2,4,1,3,5,3,5,3,1,4,2,4,5,2,4,1,3,3,
+5,1,2,3,4,2,4,5,1,2,3,2), ncol = 6, nrow = 25, byrow = TRUE)
+
+#OLHD.L2009(OLHD, OA) generates a 25 by 12 orthogonal LHD.
+#Note that n = 5 so n^2 = 25. p = 2 and f = 3 so 2fp = 12.
+> dim(OLHD.L2009(OLHD, OA))
+[1] 25 12
+> MaxAbsCor(OLHD.L2009(OLHD, OA)) #column-wise correlations are 0
+[1] 0
+
+#OLHD.B2001(n = 11, k = 5) generates an 11 by 5 orthogonal LHD.
+> dim(OLHD.B2001(n = 11, k = 5))
+[1] 11  5
+  \end{verbatim}
+}
+
+\section{Other R Packages for Latin Hypercube Designs and a Comparative Discussion}
+
+Several R packages have been developed to facilitate the construction of Latin hypercube samples and designs for computer experiments. Among these, the \RR{lhs} package \citep{lhs} is widely recognized for its utility. It provides functions for generating both random and optimized Latin hypercube samples (but not designs), and its methods are particularly useful for simulation studies where space-filling properties are desired but design optimality is not the primary focus. The \RR{SLHD} package \citep{SLHD} was originally developed for generating sliced LHDs \citep{ba2015optimal}, but practitioners can set the number of slices to one to generate maximin LHDs. The \RR{MaxPro} package \citep{MaxPro} focuses on constructing designs that maximize projection properties. One of its functions, \RR{MaxProLHD}, generates MaxPro LHDs using a simulated annealing algorithm \citep{joseph2015maximum}.
+
+While we acknowledge the contributions of other relevant R packages, we emphasize the distinguishing features of our package. The \RR{LHD} package embeds multiple optimality criteria, enabling the construction of various types of optimal LHDs. In contrast, \RR{lhs} and \RR{SLHD} primarily focus on space-filling Latin hypercube samples and designs, while \RR{MaxPro} primarily focuses on maximum projection LHDs. The \RR{LHD} package implements various search algorithms and algebraic constructions, whereas the other three packages do not implement algebraic constructions, and both \RR{SLHD} and \RR{MaxPro} implement only one algorithm to construct LHDs.
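The Latin property shared by all of these packages, and the maximin criterion targeted by several of them, can both be stated in a few lines of base R. This is a schematic sketch for orientation only; `min_l1()` is an illustrative helper and does not reproduce code from any of the packages discussed.

```r
# A random LHD on levels 0:(n - 1) is simply k independent column
# permutations; min_l1() computes the minimum pairwise L_1 (Manhattan)
# intersite distance that maximin designs seek to maximize.
set.seed(1)
n <- 10
k <- 5
D <- sapply(seq_len(k), function(j) sample(0:(n - 1)))
min_l1 <- function(X) min(dist(X, method = "manhattan"))
min_l1(D)
```

Since every column of an LHD takes each level exactly once, any two rows differ in every coordinate, so the minimum $L_1$ intersite distance of any $n \times k$ LHD is at least $k$; search algorithms and algebraic constructions aim to push it well above this floor.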
The primary application of \RR{LHD} is in the design of computer experiments, whereas \RR{lhs} is mainly used for sampling and simulation studies. Therefore, \RR{LHD} emphasizes design optimality, while \RR{lhs} emphasizes the space-filling properties of samples.
+
+\section{Conclusion and Recommendation}\label{Con}
+
+The \RR{LHD} package implements popular search algorithms, including the SA \citep{morris1995exploratory}, OASA \citep{leary2003optimal}, SA2008 \citep{joseph2008orthogonal}, LaPSO \citep{chen2013optimizing} and GA \citep{liefvendahl2006study}, along with several widely used algebraic constructions \citep{wang2018optimal, ye1998orthogonal, cioppa2007efficient, sun2010construction, tang1993orthogonal, lin2009construction, butler2001optimal}, for constructing three types of commonly used optimal LHDs: maximin distance LHDs, maximum projection LHDs and (nearly) orthogonal LHDs. We aim to provide guidance and an easy-to-use tool for practitioners to find appropriate experimental designs. Algebraic constructions are preferred when available, especially for large designs; search algorithms are used to generate optimal LHDs with flexible sizes.
+
+Among the few R packages dedicated to LHDs, \RR{LHD} is comprehensive and self-contained: it not only offers search algorithms and algebraic constructions, but also other useful functions for LHD research and development, such as calculating different optimality criteria, generating random LHDs, exchanging two random elements in a matrix, and calculating intersite distances between matrix rows. The help manual in the package documentation contains further details and illustrative examples for users who want to explore more of the functions in the package.
+
+\section*{Acknowledgments}
+
+This research was partially supported by the National Science Foundation (NSF) grant DMS-2311186 and the National Key R\&D Program of China 2024YFA1016200.
The authors appreciate the reviewers' constructive comments and suggestions. + +\bibliography{wang-xiao-mandal} + +\address{Hongzhi Wang\\ + \email{wanghongzhi.ut@gmail.com}} + +\address{Qian Xiao\\ + Department of Statistics, School of Mathematical Sciences, Shanghai Jiao Tong University\\ + 800 Dongchuan Road, Minhang, Shanghai, 200240\\ + China\\ + \email{qian.xiao@sjtu.edu.cn}} + +\address{Abhyuday Mandal\\ + Department of Statistics, University of Georgia\\ + 310 Herty Drive, Athens, GA 30602\\ + USA\\ + \email{amandal@stat.uga.edu}} diff --git a/_articles/RJ-2025-034/RJ-2025-034.R b/_articles/RJ-2025-034/RJ-2025-034.R new file mode 100644 index 0000000000..7dcf8dbcd6 --- /dev/null +++ b/_articles/RJ-2025-034/RJ-2025-034.R @@ -0,0 +1,596 @@ +# Generated by `rjournal_pdf_article()` using `knitr::purl()`: do not edit by hand +# Please edit RJ-2025-034.Rmd to modify this file + +## ----setup, include=FALSE----------------------------------------------------- +knitr::opts_chunk$set( + collapse = TRUE, + comment = "#>", + cache = TRUE +) +par(pch = 20, bty = "l") + + +## ----tbl-japanese-women, eval=TRUE, echo=FALSE-------------------------------- +library(longevity) +library(ggplot2) +data(japanese2, package = "longevity") +female_japanese <- japanese2 |> + dplyr::filter(gender == "female") |> + dplyr::select(!gender) +knitr::kable( + tidyr::pivot_wider(female_japanese, + names_from = bcohort, + values_from = count), + booktabs = TRUE, + longtable = FALSE, + centering = TRUE, + row.names = FALSE, + linesep = "", + caption = "Death count by birth cohort and age band for female Japanese.") + + +## ----dutch-setup, eval=TRUE, echo=FALSE, message=FALSE, warning=FALSE--------- +library(longevity) +library(lubridate) +library(dplyr, warn.conflicts = FALSE) +data(dutch, package = "longevity") +# Extract sampling frame from attributes of data set +yr_samp <- year(attr(x = dutch, which = "sampling_frame")) +# Preprocess data for analysis +dutch1 <- dutch |> + 
subset(!is.na(ndays)) |> + # Remove interval censored data for the time being + mutate(time = ndays / 365.25, # age at death + time2 = time, + # min/max age to be included in sampling frame + ltrunc = ltrunc / 365.25, + rtrunc = rtrunc / 365.25, + event = 1) |> # observed failure time (event=1) + subset(time > 98) |> + select(time, time2, ltrunc, rtrunc, event, gender, byear) +# Subset all interval censored, interval truncated records +dutch2 <- dutch |> + subset(is.na(ndays)) |> + mutate(time2 = ceiling_date( + dmy(paste("01-", dmonth, "-", dyear)), unit = "month") - 1 - + dmy(paste("01-01-", byear)), + time = dmy(paste("01-", dmonth, "-", dyear)) - + dmy(paste("31-12-", byear)), + ltrunc = dmy(paste("01-01-1986")) - dmy(paste("31-12-", byear)), + rtrunc = dmy(paste("31-12-2015")) - dmy(paste("01-01-", byear)) + ) |> + select(time, time2, ltrunc, rtrunc, gender, byear) |> + # Transform data from days to years for interpretability + mutate(time = as.numeric(time) / 365.25, # lower censoring limit + time2 = as.numeric(time2) / 365.25, # upper censoring limit + ltrunc = as.numeric(ltrunc) / 365.25, # min age to be included + rtrunc = as.numeric(rtrunc) / 365.25, # max age to be included + event = 3) |> # interval censoring (event=3) + subset(time > 98) # subset exceedances above 98 years +# Combine databases +dutch_data <- rbind(dutch1, dutch2) + + +## ----tbl-dutch-preview, eval=TRUE, echo=FALSE--------------------------------- +set.seed(2025) +# Sample some observations +icens_d <- sample(which(dutch_data$event == 1), 3) +obs_d <- sample(which(dutch_data$event == 3), 2) +knitr::kable( + dutch_data[c(obs_d, icens_d),], + digits = 2, + booktabs = TRUE, + longtable = FALSE, + centering = TRUE, + row.names = FALSE, + linesep = "", + caption = "Sample of five Dutch records, formatted so that the inputs match the function arguments used by the package. 
Columns give the age in years at death (or plausible interval), lower and upper truncation bounds giving minimum and maximum age for inclusion, an integer indicating the type of censoring, gender and birth year.") + + +## ----longevity-setup, echo=TRUE, eval=TRUE------------------------------------ +data(japanese2, package = "longevity") +# Keep only non-empty cells +japanese2 <- japanese2[japanese2$count > 0, ] +# Define arguments that are recycled +japanese2$rtrunc <- 2020 - + as.integer(substr(japanese2$bcohort, 1, 4)) +# The line above extracts the earliest year of the birth cohort +# Create a list with all arguments common to package functions +args_japan <- with(japanese2, + list( + time = age, # lower censoring bound + time2 = age + 1L, # upper censoring bound + event = 3, # define interval censoring + type = "interval2", + rtrunc = rtrunc, # right truncation limit + weights = count)) # counts as weights + + +## ----tab-parametric, eval=TRUE, echo=FALSE------------------------------------ +df_parametric_html <- data.frame(rbind( +c("`exp`", "\\(\\sigma^{-1}\\)", "\\(\\sigma > 0\\)"), +c("`gomp`", "\\(\\sigma^{-1}\\exp(\\beta t/\\sigma)\\)", +"\\(\\sigma > 0, \\beta \\ge 0\\)"), +c("`gp`", "\\((\\sigma + \\xi t)_{+}^{-1}\\)", "\\(\\sigma > 0, \\xi \\in \\mathbb{R}\\)"), +c("`weibull`", "\\(\\sigma^{-\\alpha} \\alpha t^{\\alpha-1}\\)", "\\(\\sigma > 0, \\alpha > 0\\)"), +c("`extgp`", "\\(\\beta\\sigma^{-1}\\exp(\\beta t/\\sigma)[\\beta+\\xi\\{\\exp(\\beta t/\\sigma) -1\\}]^{-1}\\)", "\\(\\sigma > 0, \\beta \\ge 0, \\xi \\in \\mathbb{R}\\)"), +c("`extweibull`", "\\(\\alpha\\sigma^{-\\alpha}t^{\\alpha-1}\\{1+\\xi(t/\\sigma)^{\\alpha}\\}_{+}\\)", "\\(\\sigma > 0, \\alpha > 0, \\xi \\in \\mathbb{R}\\)"), +c("`perks`", "\\(\\alpha\\exp(\\nu x)/\\{1+\\alpha\\exp(\\nu x)\\}\\)", "\\(\\nu \\ge 0, \\alpha >0\\)"), +c("`beard`", "\\(\\alpha\\exp(\\nu x)/\\{1+\\alpha\\beta\\exp(\\nu x)\\}\\)", "\\(\\nu \\ge 0, \\alpha >0, \\beta \\ge 0\\)"), +c("`gompmake`", "\\(\\lambda 
+ \\sigma^{-1}\\exp(\\beta t/\\sigma)\\)", +"\\(\\lambda \\ge 0, \\sigma > 0, \\beta \\ge 0\\)"), +c("`perksmake`", "\\(\\lambda + \\alpha\\exp(\\nu x)/\\{1+\\alpha\\exp(\\nu x)\\}\\)", "\\(\\lambda \\ge 0, \\nu \\ge 0, \\alpha > 0\\)"), +c("`beardmake`", "\\(\\lambda + \\alpha\\exp(\\nu x)/\\{1+\\alpha\\beta\\exp(\\nu x)\\}\\)", "\\( \\lambda \\ge 0, \\nu \\ge 0, \\alpha > 0, \\beta \\ge 0\\)") +)) +colnames(df_parametric_html) <- + c("model", "hazard function", "constraints") +df_parametric_tex <- data.frame(rbind( +c("\\texttt{exp}", "\\(\\sigma^{-1}\\)", "\\(\\sigma > 0\\)"), +c("\\texttt{gomp}", "\\(\\sigma^{-1}\\exp(\\beta t/\\sigma)\\)", +"\\(\\sigma > 0, \\beta \\ge 0\\)"), +c("\\texttt{gp}", "\\((\\sigma + \\xi t)_{+}^{-1}\\)", "\\(\\sigma > 0, \\xi \\in \\mathbb{R}\\)"), +c("\\texttt{weibull}", "\\(\\sigma^{-\\alpha} \\alpha t^{\\alpha-1}\\)", "\\(\\sigma > 0, \\alpha > 0\\)"), +c("\\texttt{extgp}", "\\(\\beta\\sigma^{-1}\\exp(\\beta t/\\sigma)[\\beta+\\xi\\{\\exp(\\beta t/\\sigma) -1\\}]^{-1}\\)", "\\(\\sigma > 0, \\beta \\ge 0, \\xi \\in \\mathbb{R}\\)"), +c("\\texttt{extweibull}", "\\(\\alpha\\sigma^{-\\alpha}t^{\\alpha-1}\\{1+\\xi(t/\\sigma)^{\\alpha}\\}_{+}\\)", "\\(\\sigma > 0, \\alpha > 0, \\xi \\in \\mathbb{R}\\)"), +c("\\texttt{perks}", "\\(\\alpha\\exp(\\nu x)/\\{1+\\alpha\\exp(\\nu x)\\}\\)", "\\(\\nu \\ge 0, \\alpha >0\\)"), +c("\\texttt{beard}", "\\(\\alpha\\exp(\\nu x)/\\{1+\\alpha\\beta\\exp(\\nu x)\\}\\)", "\\(\\nu \\ge 0, \\alpha >0, \\beta \\ge 0\\)"), +c("\\texttt{gompmake}", "\\(\\lambda + \\sigma^{-1}\\exp(\\beta t/\\sigma)\\)", +"\\(\\lambda \\ge 0, \\sigma > 0, \\beta \\ge 0\\)"), +c("\\texttt{perksmake}", "\\(\\lambda + \\alpha\\exp(\\nu x)/\\{1+\\alpha\\exp(\\nu x)\\}\\)", "\\(\\lambda \\ge 0, \\nu \\ge 0, \\alpha > 0\\)"), +c("\\texttt{beardmake}", "\\(\\lambda + \\alpha\\exp(\\nu x)/\\{1+\\alpha\\beta\\exp(\\nu x)\\}\\)", "\\( \\lambda \\ge 0, \\nu \\ge 0, \\alpha > 0, \\beta \\ge 0\\)") +)) +colnames(df_parametric_html) <- + 
colnames(df_parametric_tex) <- + c("model", "hazard function", "constraints") + + +## ----tbl-parametric-models, eval=TRUE, echo=FALSE----------------------------- +if(knitr::is_html_output()){ +knitr::kable(df_parametric_html, + format = "html", + escape = FALSE, + linesep = "", + caption = "List of parametric models for excess lifetime supported by the package, with parametrization and hazard functions. The models are expressed in terms of scale parameter $\\sigma$, rate parameters $\\lambda$ or $\\nu$, and shape parameters $\\xi$, $\\alpha$ or $\\beta$.") +} else if(knitr::is_latex_output()){ +knitr::kable(df_parametric_tex, + format = "latex", + caption = "List of parametric models for excess lifetime supported by the package, with parametrization and hazard functions.", + escape = FALSE, + linesep = "", + booktabs = TRUE) |> + kableExtra::kable_styling() +} + + +## ----model-comparison--------------------------------------------------------- +thresh <- 108 +model0 <- fit_elife(arguments = args_japan, + thresh = thresh, + family = "exp") +(model1 <- fit_elife(arguments = args_japan, + thresh = thresh, + family = "gomp")) + + +## ----fig-nesting, eval=TRUE, echo=FALSE, fig.cap="Relationship between parametric models showing nested relations. Dashed arrows represent restrictions that lead to nonregular asymptotic null distribution for comparison of nested models. Comparisons between models with Makeham components and exponential are not permitted by the software because of nonidentifiability issues.", fig.alt="Graph with parametric model names, and arrows indicating the relationship between these. 
Dashed arrows indicate non-regular comparisons between nested models, and the expression indicates which parameter to fix to obtain the submodel.", fig.align='center', out.width='100%'---- +if(knitr::is_latex_output()){ + knitr::include_graphics("fig/nesting_graph.pdf") +} else if(knitr::is_html_output()){ + knitr::include_graphics("fig/nesting_graph.png") +} + + +## ----eval=FALSE, echo=TRUE---------------------------------------------------- +# # Model comparison +# anova(model1, model0) +# # Information criteria +# c("exponential" = BIC(model0), "Gompertz" = BIC(model1)) + + +## ----eval=TRUE, echo=FALSE---------------------------------------------------- +options(knitr.kable.NA = '') +knitr::kable(anova(model1, model0), booktabs = TRUE, digits=2) +c("exponential" = BIC(model0), "Gompertz" = BIC(model1)) + + +## ----sim-EW, echo=FALSE, eval=FALSE------------------------------------------- +# nEW <- 179 +# library(lubridate) +# set.seed(2023) +# # First observation from 1856, so maximum age for truncation is 55 years +# ub <- pgamma(q = 55*365.25, scale = 9.945*365.25, shape = 1.615) +# # Sample right truncated record +# bdate_EW <- lubridate::ymd("1910-12-31") - +# qgamma(p = runif(n = nEW)*ub, +# scale = 9.945*365.25, +# shape = 1.615) +# # Obtain truncation bounds given sampling frame +# ltrunc_EW <- pmax(0, (ymd("1968-01-01") - bdate_EW) / 365.25 - 110) +# rtrunc_EW <- as.numeric(ymd("2020-12-31") - bdate_EW) / 365.25 - 110 +# sim_EW <- longevity::samp_elife( +# n = nEW, +# scale = 1.2709, # parameters obtained from fitting IDL data for E&W +# shape = -0.0234, +# lower = ltrunc_EW, # smallest age for left truncation limit +# upper = rtrunc_EW, # maximum age +# family = "gp", # generalized Pareto +# type2 = "ltrt") # left truncated right truncated +# ewsim4 <- data.frame( +# time = sim_EW, +# ltrunc = ltrunc_EW, +# rtrunc = rtrunc_EW) + + +## ----bootstrap-comparison, eval=TRUE, echo=TRUE------------------------------- +set.seed(2022) +# Count of unique right 
truncation limit
+db_rtrunc <- aggregate(count ~ rtrunc,
+                       FUN = "sum",
+                       data = japanese2,
+                       subset = age >= thresh)
+B <- 1000L # Number of bootstrap replications
+boot_anova <- numeric(length = B)
+boot_gof <- numeric(length = B)
+for(b in seq_len(B - 1L)){
+  boot_samp <- # Generate bootstrap sample
+    do.call(rbind, #merge data frames
+            apply(db_rtrunc, 1, function(x){ # for each rtrunc and count
+              count <- table( #tabulate count
+                floor( #round down
+                  samp_elife( # sample right truncated exponential
+                    n = x["count"],
+                    scale = model0$par,
+                    family = "exp", #null model
+                    upper = x["rtrunc"] - thresh,
+                    type2 = "ltrt")))
+              data.frame( # return data frame
+                count = as.integer(count),
+                rtrunc = as.numeric(x["rtrunc"]) - thresh,
+                eage = as.integer(names(count)))
+            }))
+  boot_mod0 <- # Fit null model to bootstrap sample
+    with(boot_samp,
+         fit_elife(time = eage,
+                   time2 = eage + 1L,
+                   rtrunc = rtrunc,
+                   type = "interval",
+                   event = 3,
+                   family = "exp",
+                   weights = count))
+  boot_mod1 <- # Fit alternative model to bootstrap sample
+    with(boot_samp,
+         fit_elife(time = eage,
+                   time2 = eage + 1L,
+                   rtrunc = rtrunc,
+                   type = "interval",
+                   event = 3,
+                   family = "gomp",
+                   weights = count))
+  # Deviance difference, signed to match the observed statistic below
+  boot_anova[b] <- deviance(boot_mod1) -
+    deviance(boot_mod0)
+}
+# Add original statistic
+boot_anova[B] <- deviance(model1) - deviance(model0)
+# Bootstrap p-value
+(pval <- rank(boot_anova)[B] / B)
+
+
+## ----fig-parameterstab, fig.cap="Threshold diagnostic tools: parameter stability plots for the generalized Pareto model (left) and Northrop--Coleman \\(p\\)-value path (right) for the Japanese centenarian dataset. Both suggest that a threshold as low as 100 may be suitable for peaks-over-threshold analysis.", fig.alt="Threshold stability plots. The left panel shows shape parameter estimates with 95\\% confidence intervals as a function of the threshold value from 100 until 110 years.
The right panel shows p-values from a score test for nested models as a function of the same thresholds.", echo=TRUE, fig.show='hold', out.width='100%', fig.width=8.5, fig.height=4, fig.align='center'----
+par(mfrow = c(1, 2), mar = c(4, 4, 1, 1))
+# Threshold sequence
+u <- 100:110
+# Threshold stability plot
+tstab(arguments = args_japan,
+      family = "gp",
+      method = "profile",
+      which.plot = "shape",
+      thresh = u)
+# Northrop-Coleman diagnostic based on score tests
+nu <- length(u) - 1L
+nc_score <- nc_test(arguments = c(args_japan, list(thresh = u)))
+score_plot <- plot(nc_score)
+graphics.off()
+
+
+## ----fig-qqplots, fig.cap="Probability-probability and quantile-quantile plots for the generalized Pareto model fitted above age 105 years to the Dutch data. The plots indicate broadly good agreement with the observations, except for some individuals who died at age 109, for whom too many deaths occur close to their birthdates.", eval=TRUE, echo=TRUE, fig.width=8.5, fig.height=4, out.width='100%', fig.align='center'----
+fit_dutch <- fit_elife(
+  arguments = dutch_data,
+  event = 3,
+  type = "interval2",
+  family = "gp",
+  thresh = 105,
+  export = TRUE)
+par(mfrow = c(1, 2))
+plot(fit_dutch,
+     which.plot = c("pp","qq"))
+
+
+## ----fig-EW-diag-plots, eval=FALSE, echo=FALSE, fig.width=8.5, fig.height=4, fig.align='center', out.width='100%', fig.cap="Probability-probability (left) and generalized Pareto quantile-quantile (right) plots for the simulated England and Wales supercentenarian data."----
+# fit_EW <- with(ewsim4,
+#                longevity::fit_elife(
+#                  time = time,
+#                  ltrunc = ltrunc,
+#                  rtrunc = rtrunc,
+#                  family = "gp",
+#                  export = TRUE))
+# plots <- plot(fit_EW,
+#               which.plot = c("pp","qq"),
+#               plot.type = "ggplot",
+#               plot = FALSE)
+# library(patchwork)
+# plots[[1]] + plots[[2]]
+
+
+## ----bootstrap-gof, eval=TRUE, echo=FALSE-------------------------------------
+set.seed(2022)
+# Create contingency table with observations
+# grouping all above ubound
+get_observed_table <-
+ function(data, ubound){ + # Generate all combinations + df_combo <- + expand.grid( + eage = 0:ubound, + rtrunc = unique(data$rtrunc) + ) + df_combo$count <- 0 + # Merge data frame + # (ensures some empty category appear) + df_count <- data |> + dplyr::select(eage, rtrunc, count) |> + dplyr::full_join(df_combo, + by = c("eage", + "rtrunc", + "count")) |> + dplyr::mutate(eage = pmin(eage, ubound)) |> + # Regroup observations above ubound + dplyr::count(eage, + rtrunc, + wt = count, + name = "count") + } +# Compute expected counts, conditioning +# on number per right truncation limits +get_expected_count <- + function(model, data, ubound){ + data$prob <- + (pelife(q = ifelse(data$eage == ubound, + data$rtrunc, + data$eage + 1), + family = model$family, + scale = model$par) - + pelife(q = data$eage, + family = model$family, + scale = model$par)) / + pelife(q = data$rtrunc, + family = model$family, + scale = model$par) + count_rtrunc <- data |> + dplyr::group_by(rtrunc) |> + dplyr::summarize(tcount = sum(count), + .groups = "keep") + merge(data, count_rtrunc, by = "rtrunc") |> + dplyr::transmute(observed = count, + expected = tcount * prob) + } +# Compute chi-square statistic +chisquare_stat <- function(data){ + with(data, + sum((observed - expected)^2/expected)) +} +# Upper bound for pooling +ubound <- 5L +boot_gof <- numeric(length = B) +for(b in seq_len(B - 1L)){ + # Generate bootstrap sample + boot_samp <- + do.call(rbind, #merge data frames + apply(db_rtrunc, 1, function(x){ + # for each rtrunc and count + count <- table( #tabulate count + floor( #round down + # sample right truncated exponential + samp_elife(n = x["count"], + scale = model0$par, + family = "exp", #null model + upper = x["rtrunc"] - thresh, + type2 = "ltrt"))) + # return data frame + data.frame(count = as.integer(count), + rtrunc = as.numeric(x["rtrunc"]) - thresh, + eage = as.integer(names(count))) + })) + # Fit null model to bootstrap sample + boot_mod0 <- + with(boot_samp, + fit_elife(time = 
eage, + time2 = eage + 1L, + rtrunc = rtrunc, + type = "interval", + event = 3, + family = "exp", + weights = count)) + ctab <- get_observed_table(boot_samp, + ubound = ubound) + boot_gof[b] <- + chisquare_stat( + data = get_expected_count(model = boot_mod0, + data = ctab, + ubound = ubound) + ) +} +# Add original statistic +db_origin <- + aggregate(count ~ rtrunc + age, + FUN = "sum", + data = japanese2, + subset = age >= thresh) |> + dplyr::mutate(eage = age - thresh) +ctab <- get_observed_table(db_origin, + ubound = ubound) +boot_gof[B] <- + chisquare_stat( + data = get_expected_count(model = model0, + data = ctab, + ubound = ubound)) +# Bootstrap p-value +boot_gof_pval <- rank(boot_gof)[B] / B + + +## ----covariate-test, eval=TRUE, echo=TRUE------------------------------------- +print( + test_elife( + arguments = args_japan, + thresh = 110, + family = "gp", + covariate = japanese2$gender) +) + + +## ----fig-endpoint-confint, eval=TRUE, echo=TRUE, fig.cap="Maximum likelihood estimates with 95\\% confidence intervals as a function of threshold (left) and profile likelihood for exceedances above 110 years (right) for Japanese centenarian data. As the threshold increases, the number of exceedances decreases and the intervals for the upper bound become wider. 
At 110, the right endpoint of the interval would go until infinity.", fig.width=8.5, fig.height=4, out.width='100%', fig.align='center'---- +# Create grid of threshold values +thresholds <- 105:110 +# Grid of values at which to evaluate profile +psi <- seq(120, 200, length.out = 101) +# Calculate the profile for the endpoint +# of the generalized Pareto at each threshold +endpt_tstab <- do.call( + endpoint.tstab, + args = c( + args_japan, + list(psi = psi, + thresh = thresholds, + plot = FALSE))) +# Compute corresponding confidence intervals +profile <- endpoint.profile( + arguments = c(args_japan, list(thresh = 110, psi = psi))) +# Plot point estimates and confidence intervals +g1 <- autoplot(endpt_tstab, plot = FALSE, ylab = "lifespan (in years)") +# Plot the profile curve with cutoffs for conf. int. for 110 +g2 <- autoplot(profile, plot = FALSE) +patchwork::wrap_plots(g1, g2) + + +## ----fig-hazard, eval=TRUE, echo=FALSE, fig.width=8.5, fig.height=4, out.width='100%', fig.cap="Left: scatterplot of 1000 independent posterior samples from generalized Pareto model with maximum data information prior; the contour curves give the percentiles of credible intervals, and show approximate normality of the posterior. 
Right: functional boxplots for the corresponding hazard curves, with increasing width at higher ages."---- +par(mfrow = c(1,2), mar = c(4,4,1,1)) +threshold <- 108 +# Note that we cannot have an argument 'arguments' in 'ru' +post_samp <- rust::ru( + logf = lpost_elife, + weights = args_japan$weights, + rtrunc = args_japan$rtrunc, + event= 3, + type = "interval2", + time = args_japan$time, + time2 = args_japan$time2, + thresh = threshold, + family = "gp", + trans = "BC", + n = 1000, + d = 2, + init = c(1.67, -0.08), + lower = c(0, -1)) +plot(post_samp, + xlab = "scale", + ylab = "shape", + bty = "l") +age <- threshold + seq(0.1, 10, length.out = 101) +haz_samp <- apply(X = post_samp$sim_vals, + MARG = 1, + FUN = function(par){ + helife(x = age - threshold, scale = par[1], shape = par[2], family = "gp") + }) +# Functional boxplots +fbox <- fda::fbplot( + fit = haz_samp, + x = age, + xlim = range(age), + ylim = c(0, 2.8), + xaxs = "i", + yaxs = "i", + xlab = "age", + ylab = "hazard (in years)", + outliercol = "gray90", + color = "gray60", + barcol = "black", + bty = "l" +) + + +## ----fig-turnbull, echo=FALSE, eval=TRUE, out.width="100%", fig.width=8.5, fig.height=6, fig.align='center', fig.cap="Illustration of the truncation (pale grey) and censoring intervals (dark grey) equivalence classes based on Turnbull's algorithm. 
Observations must fall within equivalence classes defined by the former."---- +library(longevity) +n <- 100L +ltrunc <- runif(n = n, min = 0, max = 1) +rtrunc <- TruncatedNormal::rtnorm( + n = 1, + sd = 4, + mu = 5, + lb = ltrunc, + ub = rep(Inf, n)) +time <- TruncatedNormal::rtnorm( + n = 1, + sd = 2, + lb = ltrunc, + ub = rtrunc) +time2 <- ifelse(runif(n) < 0.5, + time, + TruncatedNormal::rtnorm( + n = 1, + sd = 2, + mu = (time + rtrunc)/2, + lb = time, + ub = rtrunc)) +status <- ifelse(time == time2, 1L, 3L) +unex <- turnbull_intervals(time = time, + time2 = time2, + status = status, + ltrunc = ltrunc, + rtrunc = rtrunc) +dummy_cens <- + dummy_trunc <- + matrix(NA, + nrow = nrow(unex), + ncol = length(time)) +for(i in seq_len(nrow(unex))){ + dummy_cens[i,] <- unex[i,1] >= time & unex[i,2] <= time2 + dummy_trunc[i,] <- unex[i,1] >= ltrunc & unex[i,2] <= rtrunc +} +intervals_cens <- apply(dummy_cens, 2, function(x){range(which(x))}) +intervals_trunc <- apply(dummy_trunc, 2, function(x){range(which(x))}) +melted_cens <- reshape2::melt(dummy_cens) +melted_trunc <- reshape2::melt(dummy_trunc) +ggplot() + + geom_tile(data = melted_trunc, + mapping = aes(x = Var2, y = Var1, fill = factor(ifelse(value, 2L, 0L), 0:2)), + ) + + geom_tile(data = melted_cens, + mapping = aes(x = Var2, y = Var1, fill = factor(ifelse(value, 1L, 0L), 0:2)), + alpha = 0.5) + + scale_fill_manual(values = c("white", "black", "grey")) + + labs(y = "Turnbull's intervals", + x = "observation identifier") + + scale_y_continuous(expand = c(0,0)) + + scale_x_continuous(expand = c(0,0)) + + theme(legend.position = "none") + + +## ----fig-ecdf, eval=TRUE, echo=TRUE, fig.align='center', out.width='100%', fig.width=8.5, fig.height=4, fig.cap="Nonparametric maximum likelihood estimate of the density (bar plot, left) and distribution function (staircase function, right), with superimposed generalized Pareto fit for excess lifetimes above 108 years. 
Except for the discreteness inherent to the nonparametric estimates, the two representations broadly agree at year marks."---- +ecdf <- np_elife(arguments = args_japan, thresh = 108) +# Summary statistics, accounting for censoring +round(summary(ecdf), digits = 2) +# Plots of fitted parametric model and nonparametric CDF +model_gp <- fit_elife( + arguments = args_japan, + thresh = 108, + family = "gp", + export = TRUE) +# ggplot2 plots, wrapped to display side by side +patchwork::wrap_plots( + autoplot( + model_gp, # fitted model + plot = FALSE, # return list of ggplots + which.plot = c("dens", "cdf"), + breaks = seq(0L, 8L, by = 1L) # set bins for histogram + ) +) + diff --git a/_articles/RJ-2025-034/RJ-2025-034.Rmd b/_articles/RJ-2025-034/RJ-2025-034.Rmd new file mode 100644 index 0000000000..aec24e3029 --- /dev/null +++ b/_articles/RJ-2025-034/RJ-2025-034.Rmd @@ -0,0 +1,810 @@ +--- +title: 'longevity: An R Package for Modelling Excess Lifetimes' +date: '2026-01-06' +author: +- name: Léo R. Belzile + affiliation: Department of Decision Sciences, HEC Montréal + address: + - 3000, chemin de la Côte-Sainte-Catherine + - Montréal (Québec), Canada + - H3T 2A7 + url: https://lbelzile.bitbucket.io + orcid: 0000-0002-9135-014X + email: leo.belzile@hec.ca +abstract: | + The longevity R package provides a maximum likelihood estimation routine for modelling survival data that are subject to non-informative censoring or truncation. It includes a selection of 12 parametric models of varying complexity, with a focus on tools for extreme value analysis and, more specifically, univariate peaks over threshold modelling. The package provides utilities for univariate threshold selection, parametric and nonparametric maximum likelihood estimation, goodness of fit diagnostics and model comparison tools. These different methods are illustrated using individual Dutch records and aggregated Japanese human lifetime data.
+draft: no +preamble: | + \usepackage{amsfonts} +type: package +output: + rjtools::rjournal_web_article: + self_contained: yes + toc: no + mathjax: https://cdn.jsdelivr.net/npm/mathjax@4/tex-mml-chtml.js + rjtools::rjournal_pdf_article: + toc: no +bibliography: longevity.bib +nocite: '@survival-package, @Icens-package, @DTDA-package, @muhaz-package, @mev-package, + @Rsolnp-pkg, @demography-package, @MortCast-package, @MortalityLaws-package, @vitality-package, + @prodlim-package, @dblcens-package, @dplyr-package, @tranSurv-package, @rust-package, + @fda-package, @lubridate-package, @tidyverse' +date_received: '2024-01-11' +volume: 17 +issue: 4 +slug: RJ-2025-034 +journal: + lastpage: 58 + firstpage: 37 + +--- + + +```{r setup, include=FALSE} +knitr::opts_chunk$set( + collapse = TRUE, + comment = "#>", + cache = TRUE +) +par(pch = 20, bty = "l") +``` + +# Introduction + +Many data sets collected by demographers for the analysis of human longevity have unusual features for which limited software implementations exist. The \CRANpkg{longevity} package was initially built for dealing with human longevity records and data from the International Database on Longevity (IDL), which provides the age at death of supercentenarians, i.e., people who died above age 110. Data for the statistical analysis of (human) longevity can take the form of death counts aggregated per age at death or per age band or, most commonly, life trajectories of individuals with both birth and death dates. Such lifetimes are often interval truncated (only ages at death of individuals dying between two calendar dates are recorded) or left truncated and right censored (when data on individuals still alive at the end of the collection period are also included). Censoring and truncation are typically of an administrative nature and thus non-informative about death. + +Supercentenarians are extremely rare and records are sparse.
The most popular parametric models used by practitioners are justified by asymptotic arguments and have their roots in extreme value theory. Univariate extreme value distributions are well implemented in software and @Belzile.Dutang.Northrop.Opitz:2022 provides a recent review of existing implementations. While there are many standard R packages for the analysis of univariate extremes using likelihood-based inference, such as \CRANpkg{evd} [@evd], \CRANpkg{mev} and \CRANpkg{extRemes} [@extRemes], only the \CRANpkg{evgam} package includes functionalities to fit threshold exceedance models with censoring, as showcased in @evgam with rounded rainfall measurements. Support for survival data in extreme value software is, in general, wholly lacking, which motivated the development of \CRANpkg{longevity}. + +The \CRANpkg{longevity} package also includes several parametric models commonly used in demography. Many existing packages that focus on tools for modelling mortality rates, typically through life tables, are listed in the CRAN Task View \ctv{ActuarialScience} in the Life insurance section. They do not, however, allow for truncation or more general survival mechanisms, as the aggregated data used are typically complete except for potential right censoring for the oldest age group. The \CRANpkg{survival} package [@survival-book] includes utilities for accelerated failure time models with 10 parametric distributions. The \CRANpkg{MortCast} package can be used to estimate age-specific mortality rates using the Kannisto and Lee--Carter approaches, among others [@Sevcikova:2016]. The \CRANpkg{demography} package provides forecasting methods for death rates based on constructed life tables using the Lee--Carter or ARIMA models.
\CRANpkg{MortalityLaws} includes utilities to download data from the Human Mortality Database (HMD), and fit a total of 27 parametric models for life table data (death counts and population at risk per age group) using Poisson, binomial or alternative loss functions. The \CRANpkg{vitality} package fits the family of vitality models [@Li.Anderson:2009;@Anderson:2000] via maximum likelihood based on empirical survival data. The \CRANpkg{fitdistrplus} package [@fitdistrplus-package] allows generic parametric distributions to be fitted to interval censored data via maximum likelihood, with various S3 methods for model assessment. The package also allows user-specified models, thereby permitting custom definitions for truncated distributions whose truncation bounds are passed as fixed vector parameters. Parameter uncertainty can be assessed via additional functions using the nonparametric bootstrap. Many parametric distributions also appear in the \CRANpkg{VGAM} package [@Yee.Wild:1996;@VGAMbook], which allows for vector generalized linear modelling. The \CRANpkg{longevity} package is less general and offers support only for selected parametric distributions, but, contrary to the aforementioned packages, allows for truncation and general censoring patterns. One strength of \CRANpkg{longevity} is that it also includes goodness of fit diagnostics and model comparison tools that account for non-regular asymptotics. + +Nonparametric methods are popular tools for the analysis of survival data with large samples, owing to their limited set of assumptions. They also serve for the validation of parametric models. Without explanatory variables, a closed-form expression for the nonparametric maximum likelihood estimator of the survival function can be derived in particular instances, including the product limit estimator [@Kaplan.Meier:1958] for the case of random or non-informative right censoring and an extension allowing for left truncation [@Tsai.Jewell.Wang:1987].
In general, the nonparametric maximum likelihood estimator of the survival function needs to be computed using an expectation-maximization (EM) algorithm [@Turnbull:1976]. Nonparametric estimators only assign probability mass to observed failure times and intervals, and so cannot be used for extrapolation beyond the range of the data, limiting their utility in extreme value analysis. + +The CRAN Task View on \ctv{Survival} Analysis lists various implementations of nonparametric maximum likelihood estimators of the survival or hazard functions: \CRANpkg{survival} implements the Kaplan--Meier and Nelson--Aalen estimators. Many packages focus on the case of interval censoring [@Groeneboom.Wellner:1992, \S 3.2], including \CRANpkg{prodlim}; @Anderson-Bergman:2017 reviews the performance of the implementations in \CRANpkg{icenReg} [@icensReg] and \BIOpkg{Icens}. The latter uses the incidence matrix as input data. Routines for doubly censored data are provided by \CRANpkg{dblcens}. The \CRANpkg{interval} package [@interval] implements Turnbull's EM algorithm for interval censored data. The case of left truncated and right censored data is handled by \CRANpkg{survival}, and \CRANpkg{tranSurv} provides transformation models with the potential to account for truncation dependent on survival times. For interval truncated data, dedicated algorithms that use gradient-based steps [@Efron.Petrosian:1999] or inverse probability weighting [@Shen:2010] exist and can be more efficient than the EM algorithm of @Turnbull:1976. Many of these are implemented in the \CRANpkg{DTDA} package. The \CRANpkg{longevity} package includes a C++ implementation of the corrected Turnbull algorithm [@Turnbull:1976], returning the nonparametric maximum likelihood estimator for arbitrary censoring and truncation patterns, unlike the aforementioned implementations, which each focus on a specific subcase.
With a small number of observations, it is also relatively straightforward to maximize the log likelihood for the concave program subject to linear constraints using constrained optimization algorithms; for this, \CRANpkg{longevity} relies on \CRANpkg{Rsolnp}, which uses augmented Lagrangian methods and sequential quadratic programming [@Ye:1987]. + +## Motivating examples + +To showcase the functionality of the package, and particularly the modelling of threshold exceedances, we consider Dutch and Japanese life lengths. The `dutch` database contains the age at death (in days) of Dutch people who died above age 92 between 1986 and 2015; these data were obtained from Statistics Netherlands and analyzed in @Einmahl:2019 and @ARSIA:2022. Records are interval truncated, as people are included in the database only if they died during the collection period. In addition, there are 226 interval censored and interval truncated records for which only the month and year of birth and death are known, as opposed to exact dates. + +The second database we consider is drawn from @ExceptionalLifespans. The data frame `japanese2` consists of counts of Japanese above age 100 by age band, stratified by both birth cohort and sex. To illustrate the format of the data, counts for female Japanese are reproduced in Table \@ref(tab:tbl-japanese-women). The data were constructed using the extinct cohort method and are interval censored between $\texttt{age}$ and $\texttt{age} + 1$ and right truncated at the age reached by the oldest individuals of their birth cohort in 2020. The `count` variable lists the number of instances in the contingency table, and serves as a weight for likelihood contributions.
+ +```{r tbl-japanese-women, eval=TRUE, echo=FALSE} +library(longevity) +library(ggplot2) +data(japanese2, package = "longevity") +female_japanese <- japanese2 |> + dplyr::filter(gender == "female") |> + dplyr::select(!gender) +knitr::kable( + tidyr::pivot_wider(female_japanese, + names_from = bcohort, + values_from = count), + booktabs = TRUE, + longtable = FALSE, + centering = TRUE, + row.names = FALSE, + linesep = "", + caption = "Death count by birth cohort and age band for female Japanese.") +``` + +# Package functionalities + +The \CRANpkg{longevity} package uses the S3 object oriented system and provides a series of functions with common arguments. The syntax used by the \CRANpkg{longevity} package purposely mimics that of the \CRANpkg{survival} package [@survival-book], except that it does not specify models using a formula. Users must provide vectors for the time or age (or bounds in case of intervals) via arguments `time` and `time2`, as well as lower and upper truncation bounds (`ltrunc` and `rtrunc`) if applicable. The integer vector `event` is used to indicate the type of event, where, following \CRANpkg{survival}, `0` indicates right censoring, `1` an observed event, `2` left censoring and `3` interval censoring. Together, these five vectors characterize the data and the survival mechanisms at play. Depending on the sampling scheme, not all arguments are required or relevant, and they need not all be of the same length; these arguments are common to most functions. Users can also pass a named list through the `arguments` argument: as illustrated below, this is convenient to avoid repeatedly specifying the common arguments in each function call. Default values are overridden by elements of this list, with the exception of those that are passed by the user directly in the call.
Relative to \CRANpkg{survival}, functions have additional arguments `ltrunc` and `rtrunc` for left and right truncation limits; these may also be matrices in the case of double interval truncation [@ARSIA:2022], since both censoring and truncation can be present simultaneously. + +We can manipulate the data set to build the time vectors and truncation bounds for the Dutch data. We re-scale observations to years for interpretability and keep only records above age 98 for simplicity. We split the data to handle the observed ages at death first: these are treated as observed (uncensored) whenever `time` and `time2` coincide. When exact dates are not available, we compute the range of possible ages at which individuals may have died, given their birth and death years and months. The truncation bounds for each individual can be obtained by subtracting the birth dates from the endpoints of the sampling frame, giving left and right truncation bounds +\begin{align*} +\texttt{ltrunc}=\max\{92 \text{ years}, 1986.01.01 - \texttt{bdate}\}, \qquad \texttt{rtrunc} = 2015.12.31 - \texttt{bdate}. +\end{align*} +Table \@ref(tab:tbl-dutch-preview) shows a sample of five individuals, two of whom are interval-censored, and the corresponding vectors of arguments along with two covariates, gender (`gender`) and birth year (`byear`).
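The argument-list mechanism can be sketched as follows; the input vectors are placeholders and the snippet is not evaluated here:

```r
# Hypothetical sketch (not run): bundle the common survival arguments once
args_common <- list(time = time, time2 = time2, event = event,
                    ltrunc = ltrunc, rtrunc = rtrunc)
# Reuse the list across calls; arguments passed directly in the call
# take precedence over elements of the list
fit_gp  <- fit_elife(arguments = args_common, thresh = 98, family = "gp")
fit_exp <- fit_elife(arguments = args_common, thresh = 98, family = "exp")
```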
+ +```{r dutch-setup, eval=TRUE, echo=FALSE, message=FALSE, warning=FALSE} +library(longevity) +library(lubridate) +library(dplyr, warn.conflicts = FALSE) +data(dutch, package = "longevity") +# Extract sampling frame from attributes of data set +yr_samp <- year(attr(x = dutch, which = "sampling_frame")) +# Preprocess data for analysis +dutch1 <- dutch |> + subset(!is.na(ndays)) |> + # Remove interval censored data for the time being + mutate(time = ndays / 365.25, # age at death + time2 = time, + # min/max age to be included in sampling frame + ltrunc = ltrunc / 365.25, + rtrunc = rtrunc / 365.25, + event = 1) |> # observed failure time (event=1) + subset(time > 98) |> + select(time, time2, ltrunc, rtrunc, event, gender, byear) +# Subset all interval censored, interval truncated records +dutch2 <- dutch |> + subset(is.na(ndays)) |> + mutate(time2 = ceiling_date( + dmy(paste("01-", dmonth, "-", dyear)), unit = "month") - 1 - + dmy(paste("01-01-", byear)), + time = dmy(paste("01-", dmonth, "-", dyear)) - + dmy(paste("31-12-", byear)), + ltrunc = dmy(paste("01-01-1986")) - dmy(paste("31-12-", byear)), + rtrunc = dmy(paste("31-12-2015")) - dmy(paste("01-01-", byear)) + ) |> + select(time, time2, ltrunc, rtrunc, gender, byear) |> + # Transform data from days to years for interpretability + mutate(time = as.numeric(time) / 365.25, # lower censoring limit + time2 = as.numeric(time2) / 365.25, # upper censoring limit + ltrunc = as.numeric(ltrunc) / 365.25, # min age to be included + rtrunc = as.numeric(rtrunc) / 365.25, # max age to be included + event = 3) |> # interval censoring (event=3) + subset(time > 98) # subset exceedances above 98 years +# Combine databases +dutch_data <- rbind(dutch1, dutch2) +``` + +```{r tbl-dutch-preview, eval=TRUE, echo=FALSE} +set.seed(2025) +# Sample some observations +icens_d <- sample(which(dutch_data$event == 1), 3) +obs_d <- sample(which(dutch_data$event == 3), 2) +knitr::kable( + dutch_data[c(obs_d, icens_d),], + digits = 2, + booktabs 
= TRUE, + longtable = FALSE, + centering = TRUE, + row.names = FALSE, + linesep = "", + caption = "Sample of five Dutch records, formatted so that the inputs match the function arguments used by the package. Columns give the age in years at death (or plausible interval), lower and upper truncation bounds giving minimum and maximum age for inclusion, an integer indicating the type of censoring, gender and birth year.") +``` + +We can proceed similarly for the Japanese data. Ages of centenarians are rounded down to the nearest year, so all observations are interval censored within one-year intervals. Assuming that the ages at death are independent and identically distributed with distribution function $F(\cdot; \boldsymbol{\theta})$, the log likelihood for exceedances $y_i = \texttt{age}_i - u$ above age $u$ is +\begin{align*} +\ell(\boldsymbol{\theta}) = \sum_{i: \texttt{age}_i > u}n_i \left[\log \{F(y_i+1; \boldsymbol{\theta}) - F(y_i; \boldsymbol{\theta})\} - \log F(r_i - u; \boldsymbol{\theta})\right] +\end{align*} +where $n_i$ is the number of individuals in cell $i$ and $r_i > \texttt{age}_i+1$ is the right truncation limit for that cell, i.e., the maximum age that could have been achieved for that birth cohort by the end of the data collection period.
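As a small numerical illustration of the cell contributions above, the following sketch evaluates one term of the sum, taking $F$ to be an exponential distribution function; the function and values are illustrative only:

```r
# Sketch: log-likelihood contribution of one aggregated cell, matching the
# displayed formula, with F taken to be an exponential distribution function
cell_loglik <- function(age, count, rtrunc, u, scale) {
  y <- age - u                          # excess at the lower bound of the cell
  Fexc <- function(x) pexp(x, rate = 1 / scale)
  # interval-censored term, corrected for right truncation at rtrunc - u
  count * (log(Fexc(y + 1) - Fexc(y)) - log(Fexc(rtrunc - u)))
}
cell_loglik(age = 110, count = 3, rtrunc = 115, u = 108, scale = 1.5)
```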
+ +```{r longevity-setup, echo=TRUE, eval=TRUE} +data(japanese2, package = "longevity") +# Keep only non-empty cells +japanese2 <- japanese2[japanese2$count > 0, ] +# Define arguments that are recycled +japanese2$rtrunc <- 2020 - + as.integer(substr(japanese2$bcohort, 1, 4)) +# The line above extracts the earliest year of the birth cohort +# Create a list with all arguments common to package functions +args_japan <- with(japanese2, + list( + time = age, # lower censoring bound + time2 = age + 1L, # upper censoring bound + event = 3, # define interval censoring + type = "interval2", + rtrunc = rtrunc, # right truncation limit + weights = count)) # counts as weights +``` + +## Parametric models and maximum likelihood estimation + +Various models are implemented in \CRANpkg{longevity}: their hazard functions are reported in Table \@ref(tab:tbl-parametric-models). Two of those models, labelled `perks` and `beard` are logistic-type hazard functions proposed in @Perks:1932 that have been used by @Beard:1963, and popularized in work of Kannisto and Thatcher; we use the parametrization of @Richards:2012, from which we also adopt the nomenclature. Users can compare the models with those available in \CRANpkg{MortalityLaws}; see `?availableLaws` for the list of hazard functions and parametrizations. 
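As a quick check of the parametrizations of Table \@ref(tab:tbl-parametric-models), the exponential and Gompertz hazards can be coded directly; this standalone sketch is independent of the package implementation:

```r
# Sketch of two hazard functions in the parametrization of the table
haz_exp  <- function(t, sigma) rep(1 / sigma, length(t))
haz_gomp <- function(t, sigma, beta) exp(beta * t / sigma) / sigma
# The Gompertz hazard reduces to the exponential one when beta = 0
all.equal(haz_gomp(0:5, sigma = 1.5, beta = 0), haz_exp(0:5, sigma = 1.5))
```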
+ +```{r tab-parametric, eval=TRUE, echo=FALSE} +df_parametric_html <- data.frame(rbind( +c("`exp`", "\\(\\sigma^{-1}\\)", "\\(\\sigma > 0\\)"), +c("`gomp`", "\\(\\sigma^{-1}\\exp(\\beta t/\\sigma)\\)", +"\\(\\sigma > 0, \\beta \\ge 0\\)"), +c("`gp`", "\\((\\sigma + \\xi t)_{+}^{-1}\\)", "\\(\\sigma > 0, \\xi \\in \\mathbb{R}\\)"), +c("`weibull`", "\\(\\sigma^{-\\alpha} \\alpha t^{\\alpha-1}\\)", "\\(\\sigma > 0, \\alpha > 0\\)"), +c("`extgp`", "\\(\\beta\\sigma^{-1}\\exp(\\beta t/\\sigma)[\\beta+\\xi\\{\\exp(\\beta t/\\sigma) -1\\}]^{-1}\\)", "\\(\\sigma > 0, \\beta \\ge 0, \\xi \\in \\mathbb{R}\\)"), +c("`extweibull`", "\\(\\alpha\\sigma^{-\\alpha}t^{\\alpha-1}\\{1+\\xi(t/\\sigma)^{\\alpha}\\}_{+}\\)", "\\(\\sigma > 0, \\alpha > 0, \\xi \\in \\mathbb{R}\\)"), +c("`perks`", "\\(\\alpha\\exp(\\nu x)/\\{1+\\alpha\\exp(\\nu x)\\}\\)", "\\(\\nu \\ge 0, \\alpha >0\\)"), +c("`beard`", "\\(\\alpha\\exp(\\nu x)/\\{1+\\alpha\\beta\\exp(\\nu x)\\}\\)", "\\(\\nu \\ge 0, \\alpha >0, \\beta \\ge 0\\)"), +c("`gompmake`", "\\(\\lambda + \\sigma^{-1}\\exp(\\beta t/\\sigma)\\)", +"\\(\\lambda \\ge 0, \\sigma > 0, \\beta \\ge 0\\)"), +c("`perksmake`", "\\(\\lambda + \\alpha\\exp(\\nu x)/\\{1+\\alpha\\exp(\\nu x)\\}\\)", "\\(\\lambda \\ge 0, \\nu \\ge 0, \\alpha > 0\\)"), +c("`beardmake`", "\\(\\lambda + \\alpha\\exp(\\nu x)/\\{1+\\alpha\\beta\\exp(\\nu x)\\}\\)", "\\( \\lambda \\ge 0, \\nu \\ge 0, \\alpha > 0, \\beta \\ge 0\\)") +)) +colnames(df_parametric_html) <- + c("model", "hazard function", "constraints") +df_parametric_tex <- data.frame(rbind( +c("\\texttt{exp}", "\\(\\sigma^{-1}\\)", "\\(\\sigma > 0\\)"), +c("\\texttt{gomp}", "\\(\\sigma^{-1}\\exp(\\beta t/\\sigma)\\)", +"\\(\\sigma > 0, \\beta \\ge 0\\)"), +c("\\texttt{gp}", "\\((\\sigma + \\xi t)_{+}^{-1}\\)", "\\(\\sigma > 0, \\xi \\in \\mathbb{R}\\)"), +c("\\texttt{weibull}", "\\(\\sigma^{-\\alpha} \\alpha t^{\\alpha-1}\\)", "\\(\\sigma > 0, \\alpha > 0\\)"), +c("\\texttt{extgp}", "\\(\\beta\\sigma^{-1}\\exp(\\beta 
t/\\sigma)[\\beta+\\xi\\{\\exp(\\beta t/\\sigma) -1\\}]^{-1}\\)", "\\(\\sigma > 0, \\beta \\ge 0, \\xi \\in \\mathbb{R}\\)"), +c("\\texttt{extweibull}", "\\(\\alpha\\sigma^{-\\alpha}t^{\\alpha-1}\\{1+\\xi(t/\\sigma)^{\\alpha}\\}_{+}\\)", "\\(\\sigma > 0, \\alpha > 0, \\xi \\in \\mathbb{R}\\)"), +c("\\texttt{perks}", "\\(\\alpha\\exp(\\nu x)/\\{1+\\alpha\\exp(\\nu x)\\}\\)", "\\(\\nu \\ge 0, \\alpha >0\\)"), +c("\\texttt{beard}", "\\(\\alpha\\exp(\\nu x)/\\{1+\\alpha\\beta\\exp(\\nu x)\\}\\)", "\\(\\nu \\ge 0, \\alpha >0, \\beta \\ge 0\\)"), +c("\\texttt{gompmake}", "\\(\\lambda + \\sigma^{-1}\\exp(\\beta t/\\sigma)\\)", +"\\(\\lambda \\ge 0, \\sigma > 0, \\beta \\ge 0\\)"), +c("\\texttt{perksmake}", "\\(\\lambda + \\alpha\\exp(\\nu x)/\\{1+\\alpha\\exp(\\nu x)\\}\\)", "\\(\\lambda \\ge 0, \\nu \\ge 0, \\alpha > 0\\)"), +c("\\texttt{beardmake}", "\\(\\lambda + \\alpha\\exp(\\nu x)/\\{1+\\alpha\\beta\\exp(\\nu x)\\}\\)", "\\( \\lambda \\ge 0, \\nu \\ge 0, \\alpha > 0, \\beta \\ge 0\\)") +)) +colnames(df_parametric_html) <- + colnames(df_parametric_tex) <- + c("model", "hazard function", "constraints") +``` + +```{r tbl-parametric-models, eval=TRUE, echo=FALSE} +if(knitr::is_html_output()){ +knitr::kable(df_parametric_html, + format = "html", + escape = FALSE, + linesep = "", + caption = "List of parametric models for excess lifetime supported by the package, with parametrization and hazard functions. 
The models are expressed in terms of scale parameter $\\sigma$, rate parameters $\\lambda$ or $\\nu$, and shape parameters $\\xi$, $\\alpha$ or $\\beta$.") +} else if(knitr::is_latex_output()){ +knitr::kable(df_parametric_tex, + format = "latex", + caption = "List of parametric models for excess lifetime supported by the package, with parametrization and hazard functions.", + escape = FALSE, + linesep = "", + booktabs = TRUE) |> + kableExtra::kable_styling() +} +``` + +Many of the models are nested and Figure \@ref(fig:fig-nesting) shows the logical relation between the various families. The function `fit_elife` allows users to fit all of the parametric models of Table \@ref(tab:tbl-parametric-models): the `print` method returns a summary of the sampling mechanism, the number of observations, the maximum log likelihood and parameter estimates with standard errors. Depending on the data, some models may be overparametrized and parameters need not be numerically identifiable. To palliate such issues, the optimization routine, which uses \CRANpkg{Rsolnp}, can try multiple starting values or fit various sub-models to ensure that the parameter values returned are indeed the maximum likelihood estimates. If one tries to compare nested models and the fit of the simpler model is better than the alternative, the `anova` function will return an error message. + +The `fit_elife` function handles arbitrary censoring patterns over single intervals, along with single interval truncation and interval censoring. To accommodate the sampling scheme of the International Database on Longevity (IDL), an option also allows for double interval truncation [@ARSIA:2022], whereby observations are included only if the person dies between time intervals, potentially overlapping, which defines the observation window over which dead individuals are recorded. 
+ +```{r model-comparison} +thresh <- 108 +model0 <- fit_elife(arguments = args_japan, + thresh = thresh, + family = "exp") +(model1 <- fit_elife(arguments = args_japan, + thresh = thresh, + family = "gomp")) +``` + +## Model comparisons + +Goodness of fit of nested models can be compared using likelihood ratio tests via the `anova` method. Most of the interrelations between models yield non-regular model comparisons since, to recover the simpler model, one must often fix parameters to values that lie on the boundary of the parameter space. For example, if we compare a Gompertz model with the exponential, the limiting null distribution is a mixture of a point mass at zero and a $\chi^2_1$ variable, both with probability half [@Chernoff:1954]. Many authors [e.g., @Camarda:2022] fail to recognize this fact. The case becomes more complicated with more than one boundary constraint: for example, the deviance statistic comparing the Beard--Makeham and the Gompertz model, which constrains two parameters on the boundary of the parameter space, has as null distribution the mixture $\chi^2_2/4 + \chi^2_1/2 + \chi^2_0/4$ [@Self.Liang:1987]. + +Nonidentifiability impacts testing: for example, if the rate parameter $\nu$ of the Perks--Makeham model (`perksmake`) tends to zero, the limiting hazard, $\lambda + \alpha/(1+\alpha)$, is constant (exponential model), but neither $\alpha$ nor $\lambda$ is separately identifiable. The usual asymptotics for the likelihood ratio test break down as the information matrix is singular [@Rotnitzky:2000]. As such, all three families that include a Makeham component cannot be directly compared to the exponential in \CRANpkg{longevity} and the call to `anova` returns an error message. + +Users can also access information criteria, `AIC` and `BIC`.
The correction factors implemented depend on the number of parameters of the distribution, but do not account for singular fit, non-identifiable parameters or singular models for which the usual corrections $2p$ and $\ln(n)p$ are inadequate [@Watanabe:2010]. + +```{r fig-nesting, eval=TRUE, echo=FALSE, fig.cap="Relationship between parametric models showing nested relations. Dashed arrows represent restrictions that lead to nonregular asymptotic null distribution for comparison of nested models. Comparisons between models with Makeham components and exponential are not permitted by the software because of nonidentifiability issues.", fig.alt="Graph with parametric model names, and arrows indicating the relationship between these. Dashed arrows indicate non-regular comparisons between nested models, and the expression indicates which parameter to fix to obtain the submodel.", fig.align='center', out.width='100%'} +if(knitr::is_latex_output()){ + knitr::include_graphics("fig/nesting_graph.pdf") +} else if(knitr::is_html_output()){ + knitr::include_graphics("fig/nesting_graph.png") +} +``` + +To showcase how hypothesis testing is performed, we consider a simple example with two nested models. We test whether the exponential model is an adequate simplification of the Gompertz model for exceedances above 108 years --- an irregular testing problem since $\beta=0$ is a restriction on the boundary of the parameter space. The drop in log likelihood is quite large, indicating the exponential model is not an adequate simplification of the Gompertz fit. This is also what is suggested by the Bayesian information criterion, which is much lower for the Gompertz model than for the exponential. 
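For a single boundary constraint, the $p$-value under the half-and-half mixture can be computed by hand; the sketch below illustrates the calculation, without any claim that the `anova` method uses this exact form internally:

```r
# p-value under the mixture 0.5 * chi^2_0 + 0.5 * chi^2_1, appropriate when
# a single parameter lies on the boundary under the null hypothesis
pval_boundary <- function(deviance_stat) {
  0.5 * pchisq(deviance_stat, df = 1, lower.tail = FALSE)
}
pval_boundary(3.84)  # half of the naive chi^2_1 p-value of about 0.05
```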
+ +```{r, eval=FALSE, echo=TRUE} +# Model comparison +anova(model1, model0) +# Information criteria +c("exponential" = BIC(model0), "Gompertz" = BIC(model1)) +``` + +```{r, eval=TRUE, echo=FALSE} +options(knitr.kable.NA = '') +knitr::kable(anova(model1, model0), booktabs = TRUE, digits=2) +c("exponential" = BIC(model0), "Gompertz" = BIC(model1)) +``` + +## Simulation-based inference + +Given the poor finite sample properties of the aforementioned tests, it may be preferable to rely on a parametric bootstrap rather than on the asymptotic distribution of the test statistic [@ARSIA:2022] for model comparison. +Simulation-based inference requires the capability to draw new data sets whose features match those of the original one. For example, the [International Database on Longevity](https://www.supercentenarians.org) (IDL) [@IDL:2021] features data that are interval truncated above 110 years, and doubly interval truncated since the sampling periods for semisupercentenarians (who died aged 105 to 110) and supercentenarians (who died above 110) are not always the same [@ARSIA:2022]. The 2018 Istat database of Italian semisupercentenarians analyzed by @Barbi:2018 includes left truncated and right censored records. + +To mimic the postulated data generating mechanism while accounting for the sampling scheme, we could use the observed birth dates, or simulate new birth dates (possibly through a kernel estimator of the empirical distribution of birth dates) while keeping the sampling frame with the first and last date of data collection to define the truncation interval. In other settings, one could obtain the nonparametric maximum likelihood estimator of the distribution of the upper truncation bound [@Shen:2010] using an inverse probability weighted estimator, which for fixed data collection windows is equivalent to setting the birth date.
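Simulating from a truncated distribution is straightforward by inversion; the base-R sketch below uses an exponential parent distribution, chosen purely for illustration:

```r
# Inversion sampling from a distribution truncated to the interval [a, b]
rtrunc_exp <- function(n, rate, a, b) {
  u <- runif(n)
  qexp(pexp(a, rate = rate) + u * (pexp(b, rate = rate) - pexp(a, rate = rate)),
       rate = rate)
}
set.seed(1)
x <- rtrunc_exp(n = 1000, rate = 0.5, a = 1, b = 4)
range(x)  # all draws fall inside the truncation interval (1, 4)
```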
For interval truncated data (`type2="ltrt"`), it uses the inversion method (Section 2 of @Devroye:1986): for $F$ an absolutely continuous distribution function and $F^{-1}$ the corresponding quantile function, a random variable distributed according to $F$ truncated on $[a,b]$ is generated as $X \sim F^{-1}[F(a) + U\{F(b)-F(a)\}]$ +where $U \sim \mathsf{U}(0,1)$ is standard uniform. + +The function `samp_elife` also has an argument `upper` which serves for both right truncation, and right censoring. For the latter, any record simulated that exceeds `upper` is capped at that upper bound and declared partially observed. This is useful for simulating administrative censoring, whereby the birth date and the upper bound of the collection window fully determine whether an observation is right censored or not. An illustrative example is provided in the next section. + + + +```{r sim-EW, echo=FALSE, eval=FALSE} +nEW <- 179 +library(lubridate) +set.seed(2023) +# First observation from 1856, so maximum age for truncation is 55 years +ub <- pgamma(q = 55*365.25, scale = 9.945*365.25, shape = 1.615) +# Sample right truncated record +bdate_EW <- lubridate::ymd("1910-12-31") - + qgamma(p = runif(n = nEW)*ub, + scale = 9.945*365.25, + shape = 1.615) +# Obtain truncation bounds given sampling frame +ltrunc_EW <- pmax(0, (ymd("1968-01-01") - bdate_EW) / 365.25 - 110) +rtrunc_EW <- as.numeric(ymd("2020-12-31") - bdate_EW) / 365.25 - 110 +sim_EW <- longevity::samp_elife( + n = nEW, + scale = 1.2709, # parameters obtained from fitting IDL data for E&W + shape = -0.0234, + lower = ltrunc_EW, # smallest age for left truncation limit + upper = rtrunc_EW, # maximum age + family = "gp", # generalized Pareto + type2 = "ltrt") # left truncated right truncated +ewsim4 <- data.frame( + time = sim_EW, + ltrunc = ltrunc_EW, + rtrunc = rtrunc_EW) +``` + +The `anova` method call uses the asymptotic null distribution for comparison of nested parametric distributions $\mathcal{F}_0 \subseteq 
+\mathcal{F}_1$. We could use the bootstrap to assess the quality of this approximation to the null distribution. To mimic the data generating mechanism as closely as possible, as is customary in most scenarios, we condition on the sampling frame and the number of individuals in each birth cohort. The number dying at each age is random, but the right truncation limits will be the same for everyone in that cohort. We simulate excess lifetimes, then interval censor the observations by keeping only the corresponding age bracket. Under the null hypothesis, the data are drawn from $\widehat{F}_0 \in \mathcal{F}_0$, and we generate observations from this right truncated distribution using the `samp_elife` utility, which also supports double interval truncation and left truncation with right censoring. This must be done within a for loop, since a count is attached to each upper bound, but the function is vectorized should we use a single vector containing all of the right truncation limits.
+
+The bootstrap $p$-value for comparing models $M_0 \subset M_1$ is obtained by repeating the following steps $B$ times and computing the rank of the observed test statistic among the simulated ones:
+
+1. Simulate new birth dates $d_i$ $(i=1, \ldots, n)$ (e.g., drawing from a smoothed empirical distribution of birth dates); the latest admissible birth date is one which ensures the person reached at least the threshold by the end of the period.
+2. Subtract the birth dates from the endpoints of the sampling period, say $c_1$ and $c_2$, to get the minimum and maximum ages at death, $c_1 - d_i$ and $c_2 - d_i$ days, which define the truncation bounds.
+3. Use the function `samp_elife` to simulate new observations from a parametric interval truncated distribution under the null model $M_0$.
+4. Use the optimization procedure in `fit_elife` to fit both $M_0$ and $M_1$, and calculate the deviances and, from them, the likelihood ratio statistic.
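+Step 3 relies on the inversion method described earlier. The following toy sketch samples from a truncated exponential distribution directly (illustrative only; `samp_elife` is the package implementation):
+
+```{r trunc-inversion-sketch, eval=FALSE, echo=TRUE}
+# Inversion sampling from F truncated to [a, b]:
+#   X = F^{-1}[F(a) + U{F(b) - F(a)}],  U ~ U(0, 1);
+# illustrated with an exponential F (toy example, not package code)
+rtrunc_exp <- function(n, rate, a, b) {
+  u <- runif(n)
+  qexp(pexp(a, rate) + u * (pexp(b, rate) - pexp(a, rate)), rate)
+}
+x <- rtrunc_exp(n = 1000, rate = 0.5, a = 1, b = 4)
+all(x >= 1 & x <= 4) # draws fall inside the truncation interval
+```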
+
+The algorithm is implemented below for comparing the Gompertz and the exponential models. Since the procedure is computationally intensive, users must trade off the precision of the bootstrap $p$-value estimate against the computational cost, which is determined by the number of replications, $B$.
+
+```{r bootstrap-comparison, eval=TRUE, echo=TRUE}
+set.seed(2022)
+# Counts per unique right truncation limit
+db_rtrunc <- aggregate(count ~ rtrunc,
+                       FUN = "sum",
+                       data = japanese2,
+                       subset = age >= thresh)
+B <- 1000L # Number of bootstrap replications
+boot_anova <- numeric(length = B)
+for(b in seq_len(B - 1L)){
+  boot_samp <- # Generate bootstrap sample
+    do.call(rbind, # merge data frames
+      apply(db_rtrunc, 1, function(x){ # for each rtrunc and count
+        count <- table( # tabulate count
+          floor( # round down
+            samp_elife( # sample right truncated exponential
+              n = x["count"],
+              scale = model0$par,
+              family = "exp", # null model
+              upper = x["rtrunc"] - thresh,
+              type2 = "ltrt")))
+        data.frame( # return data frame
+          count = as.integer(count),
+          rtrunc = as.numeric(x["rtrunc"]) - thresh,
+          eage = as.integer(names(count)))
+      }))
+  boot_mod0 <- # Fit null model to bootstrap sample
+    with(boot_samp,
+         fit_elife(time = eage,
+                   time2 = eage + 1L,
+                   rtrunc = rtrunc,
+                   type = "interval",
+                   event = 3,
+                   family = "exp",
+                   weights = count))
+  boot_mod1 <- # Fit alternative model to bootstrap sample
+    with(boot_samp,
+         fit_elife(time = eage,
+                   time2 = eage + 1L,
+                   rtrunc = rtrunc,
+                   type = "interval",
+                   event = 3,
+                   family = "gomp",
+                   weights = count))
+  boot_anova[b] <- deviance(boot_mod0) -
+    deviance(boot_mod1)
+}
+# Add original statistic (null minus alternative deviance)
+boot_anova[B] <- deviance(model0) - deviance(model1)
+# Bootstrap p-value: proportion of statistics at least as extreme
+(pval <- mean(boot_anova >= boot_anova[B]))
+```
+
+The asymptotic approximation is of similar magnitude to the bootstrap $p$-value. Both suggest that the more complex Gompertz model provides a significantly better fit.
+
+## Extreme value analysis
+
+Extreme value theory suggests that, in many instances, the limiting conditional distribution of exceedances of a random variable $Y$ with distribution function $F$ is generalized Pareto, meaning
+\begin{align}
+\lim_{u \to x^*}\Pr(Y-u > y \mid Y > u)= \begin{cases}
+\left(1+\xi y/\sigma\right)_{+}^{-1/\xi}, & \xi \neq 0;\\
+\exp(-y/\sigma), & \xi = 0;
+\end{cases}
+(\#eq:gpd)
+\end{align}
+with $x_{+} = \max\{x, 0\}$ and $x^*=\sup\{x: F(x) < 1\}$. This justifies the use of Equation \@ref(eq:gpd) for the survival function of threshold exceedances when dealing with rare events. The model has two parameters: a scale $\sigma$ and a shape $\xi$, the latter determining the behavior of the upper tail. Negative shape parameters correspond to bounded upper tails and a finite right endpoint for the support.
+
+The study of population dynamics and mortality generally requires knowledge of the total population from which observations are drawn in order to derive rates. By contrast, the peaks over threshold method, by which one models the $k$ largest observations of a sample, is a conditional analysis (e.g., given survival until a certain age) and is therefore free of denominator specification, since we only model exceedances above a high threshold $u$. For modelling purposes, we need to pick a threshold $u$ smaller than the upper endpoint $x^*$ in order to have a sufficient number of observations to estimate the parameters. The threshold selection problem is a classical instance of the bias-variance trade-off: the parameter estimators may be biased if the threshold is too low, because the generalized Pareto approximation is not good enough, whereas choosing a larger threshold, to ensure we are closer to the asymptotic regime, leads to a reduced sample size and increased parameter uncertainty.
+
+To aid threshold selection, users commonly resort to parameter stability plots.
+These diagnostics consist of a plot of the estimated shape parameter $\widehat{\xi}$ (with confidence or credible intervals), based on sample exceedances, over a range of thresholds $u_1, \ldots, u_K$. If the data were drawn from a generalized Pareto distribution, the conditional distribution above a higher threshold $v > u$ would also be generalized Pareto with the same shape: this threshold stability property is the basis for extrapolation beyond the range of observed records. If the estimates of $\xi$ are nearly constant across thresholds, this provides reassurance that the approximation can be used for extrapolation. The only difference with survival data, relative to the classical setting, is that the likelihood must account for censoring and truncation. Note that, with a nonzero threshold (argument `thresh`), it is not always possible to determine unambiguously whether left censored observations are exceedances: such cases yield errors in the functions.
+
+Theory on penultimate extremes suggests that, for finite levels and a general distribution function $F$ for which \@ref(eq:gpd) holds, the shape parameter varies as a function of the threshold $u$, behaving like the derivative of the reciprocal hazard $r(x) = \{1-F(x)\}/f(x)$. We can thus model the shape as piece-wise constant by fitting the piece-wise generalized Pareto model of @Northrop.Coleman:2014, adapted to survival data in @ARSIA:2022. The latter can be viewed as a mixture of generalized Pareto distributions over $K$ disjoint intervals, with continuity constraints to ensure a smooth hazard; it reduces to a single generalized Pareto if we force the shape parameters to be equal. We can use a likelihood ratio test to compare the models, or a score test if the former is too computationally intensive, and plot the $p$-values for each of the $K$ thresholds, corresponding to the null hypotheses $\mathrm{H}_k: \xi_k = \cdots = \xi_{K}$ ($k=1, \ldots, K-1$).
+As the model quickly becomes overparametrized, optimization is difficult, and the score test may be a safer option as it only requires estimating the null model, a single generalized Pareto over the whole range.
+
+To illustrate these diagnostic tools, Figure \@ref(fig:fig-parameterstab) shows a threshold stability plot, which features a small increase in the shape parameter as the threshold increases, corresponding to a stabilization or even a slight decrease of the hazard at higher ages. A threshold of 108 years appears reasonable: the Northrop--Coleman diagnostic plot suggests that lower thresholds are compatible with a constant shape above 100. Additional goodness-of-fit diagnostics are necessary to determine whether the generalized Pareto model fits well.
+
+```{r fig-parameterstab, fig.cap="Threshold diagnostic tools: parameter stability plots for the generalized Pareto model (left) and Northrop--Coleman \\(p\\)-value path (right) for the Japanese centenarian dataset. Both suggest that a threshold as low as 100 may be suitable for peaks-over-threshold analysis.", fig.alt="Threshold stability plots. The left panel shows shape parameter estimates with 95\\% confidence intervals as a function of the threshold value from 100 to 110 years.
+The right panel shows p-values from a score test for nested models as a function of the same thresholds.", echo=TRUE, fig.show='hold', out.width='100%', fig.width=8.5, fig.height=4, fig.align='center'}
+par(mfrow = c(1, 2), mar = c(4, 4, 1, 1))
+# Threshold sequence
+u <- 100:110
+# Threshold stability plot
+tstab(arguments = args_japan,
+      family = "gp",
+      method = "profile",
+      which.plot = "shape",
+      thresh = u)
+# Northrop-Coleman diagnostic based on score tests
+nu <- length(u) - 1L
+nc_score <- nc_test(arguments = c(args_japan, list(thresh = u)))
+score_plot <- plot(nc_score)
+graphics.off()
+```
+
+Each plot in the package can be produced using base R or \CRANpkg{ggplot2} [@ggplot2], which implements the grammar of graphics. To keep the list of package dependencies lean and adhere to the [`tinyverse` principle](https://cran.r-project.org/web/packages/pacs/vignettes/tinyverse.html), the \CRANpkg{ggplot2} versions are obtained via the argument `plot.type` of the generic S3 method `plot`, or via `autoplot`, provided the \CRANpkg{ggplot2} package is already installed.
+
+## Graphical goodness of fit diagnostics
+
+Determining whether a parametric model fits survival data well is no easy task, owing to the difficulty of specifying the null distribution of many goodness of fit statistics, such as the Cramér--von Mises statistic, for survival data. As such, the \CRANpkg{longevity} package relies mostly on visual diagnostic tools. @Waller.Turnbull:1992 discuss how classical visual diagnostics can be adapted in the presence of censoring. Most notably, only observed failure times are displayed on the $y$-axis against their empirical plotting positions on the $x$-axis. Contrary to the independent and identically distributed case, the uniform plotting positions $F_n(y_i)$ are based on the nonparametric maximum likelihood estimator discussed in Section \@ref(sec-nonparametric).
+
+The situation is more complicated with truncated data [@ARSIA:2022], since the data are not identically distributed: indeed, the distribution function of observation $Y_i$ truncated to the interval $[a_i, b_i]$ is $F_i(y) = \{F(y)-F(a_i)\}/\{F(b_i) - F(a_i)\}$, so the data arise from different distributions even if these share common parameters. One way out of this conundrum is to use the probability integral transform and the quantile transform to map observations to the uniform scale and back onto the data scale. Writing $F_n(y_i)=\mathrm{rank}(y_i)/(n+1)$ for the empirical distribution function estimator, a probability-probability plot displays $F_n(y_i)$ on the $x$-axis against $F_i(y_i)$ on the $y$-axis, the latter forming an approximately uniform sample if the parametric distribution $F$ is suitable. Another option is to standardize the observations, taking the collection $\widetilde{y}_i=F^{-1}\{F_i(y_i)\}$ of rescaled exceedances and comparing them to the usual plotting positions $\{i/(n+1)\}$. The drawback of the latter approach is that the quantities displayed on the $y$-axis are not raw observations, and the ranking of the empirical quantiles may change, a somewhat counter-intuitive feature. On the other hand, since the sample $\{F_i(y_i)\}$ should be uniform under the null hypothesis, one can use the methods of @Sailynoja.Burkner.Vehtari:2021 to obtain point-wise and simultaneous confidence intervals.
+
+\CRANpkg{longevity} offers users the choice among three types of quantile-quantile plots: regular (Q-Q, `"qq"`), Tukey's mean detrended Q-Q plots (`"tmd"`) and exponential Q-Q plots (`"exp"`). Other options on the uniform scale are probability-probability (P-P, `"pp"`) plots and empirically rescaled plots (ERP, `"erp"`) [@Waller.Turnbull:1992], designed to ease interpretation with censored observations by rescaling the axes.
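+The mapping to and from the uniform scale can be sketched directly. A toy illustration with an exponential $F$ and doubly truncated observations (the numerical values are made up for illustration; this is not package code):
+
+```{r pit-sketch, eval=FALSE, echo=TRUE}
+# Probability integral transform for doubly truncated data:
+# u_i = F_i(y_i) is approximately uniform under the model, and
+# ytilde_i = F^{-1}(u_i) are the rescaled observations
+scale <- 2
+a <- c(0, 1, 2)       # lower truncation bounds
+b <- c(5, 6, 7)       # upper truncation bounds
+y <- c(1.2, 3.4, 2.5) # observed lifetimes, a_i <= y_i <= b_i
+u <- (pexp(y, 1/scale) - pexp(a, 1/scale)) /
+  (pexp(b, 1/scale) - pexp(a, 1/scale))
+ytilde <- qexp(u, 1/scale)
+```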
+We illustrate the graphical tools with the `dutch` data above age 105 in Figure \@ref(fig:fig-qqplots). The fit is adequate above 110, but there is a notable dip due to excess mortality around 109.
+
+```{r fig-qqplots, fig.cap="Probability-probability and quantile-quantile plots for the generalized Pareto model fitted above age 105 years to Dutch data. The plots indicate broadly good agreement with the observations, except for individuals who died aged 109, for whom too many deaths occur close to their birthday.", eval=TRUE, echo=TRUE, fig.width=8.5, fig.height=4, out.width='100%', fig.align='center'}
+fit_dutch <- fit_elife(
+  arguments = dutch_data,
+  event = 3,
+  type = "interval2",
+  family = "gp",
+  thresh = 105,
+  export = TRUE)
+par(mfrow = c(1, 2))
+plot(fit_dutch,
+     which.plot = c("pp","qq"))
+```
+
+```{r fig-EW-diag-plots, eval=FALSE, echo=FALSE, fig.width=8.5, fig.height=4, fig.align='center', out.width='100%', fig.cap="Probability-probability (left) and generalized Pareto quantile-quantile (right) plots for the simulated England and Wales supercentenarian data."}
+fit_EW <- with(ewsim4,
+               longevity::fit_elife(
+                 time = time,
+                 ltrunc = ltrunc,
+                 rtrunc = rtrunc,
+                 family = "gp",
+                 export = TRUE))
+plots <- plot(fit_EW,
+              which.plot = c("pp","qq"),
+              plot.type = "ggplot",
+              plot = FALSE)
+library(patchwork)
+plots[[1]] + plots[[2]]
+```
+
+```{r bootstrap-gof, eval=TRUE, echo=FALSE}
+set.seed(2022)
+# Create contingency table with observations,
+# grouping all above ubound
+get_observed_table <-
+  function(data, ubound){
+    # Generate all combinations
+    df_combo <-
+      expand.grid(
+        eage = 0:ubound,
+        rtrunc = unique(data$rtrunc)
+      )
+    df_combo$count <- 0
+    # Merge data frames
+    # (ensures empty categories appear)
+    df_count <- data |>
+      dplyr::select(eage, rtrunc, count) |>
+      dplyr::full_join(df_combo,
+                       by = c("eage",
+                              "rtrunc",
+                              "count")) |>
+      dplyr::mutate(eage = pmin(eage, ubound)) |>
+      # Regroup observations above ubound
+      dplyr::count(eage,
rtrunc, + wt = count, + name = "count") + } +# Compute expected counts, conditioning +# on number per right truncation limits +get_expected_count <- + function(model, data, ubound){ + data$prob <- + (pelife(q = ifelse(data$eage == ubound, + data$rtrunc, + data$eage + 1), + family = model$family, + scale = model$par) - + pelife(q = data$eage, + family = model$family, + scale = model$par)) / + pelife(q = data$rtrunc, + family = model$family, + scale = model$par) + count_rtrunc <- data |> + dplyr::group_by(rtrunc) |> + dplyr::summarize(tcount = sum(count), + .groups = "keep") + merge(data, count_rtrunc, by = "rtrunc") |> + dplyr::transmute(observed = count, + expected = tcount * prob) + } +# Compute chi-square statistic +chisquare_stat <- function(data){ + with(data, + sum((observed - expected)^2/expected)) +} +# Upper bound for pooling +ubound <- 5L +boot_gof <- numeric(length = B) +for(b in seq_len(B - 1L)){ + # Generate bootstrap sample + boot_samp <- + do.call(rbind, #merge data frames + apply(db_rtrunc, 1, function(x){ + # for each rtrunc and count + count <- table( #tabulate count + floor( #round down + # sample right truncated exponential + samp_elife(n = x["count"], + scale = model0$par, + family = "exp", #null model + upper = x["rtrunc"] - thresh, + type2 = "ltrt"))) + # return data frame + data.frame(count = as.integer(count), + rtrunc = as.numeric(x["rtrunc"]) - thresh, + eage = as.integer(names(count))) + })) + # Fit null model to bootstrap sample + boot_mod0 <- + with(boot_samp, + fit_elife(time = eage, + time2 = eage + 1L, + rtrunc = rtrunc, + type = "interval", + event = 3, + family = "exp", + weights = count)) + ctab <- get_observed_table(boot_samp, + ubound = ubound) + boot_gof[b] <- + chisquare_stat( + data = get_expected_count(model = boot_mod0, + data = ctab, + ubound = ubound) + ) +} +# Add original statistic +db_origin <- + aggregate(count ~ rtrunc + age, + FUN = "sum", + data = japanese2, + subset = age >= thresh) |> + dplyr::mutate(eage = age - 
+              thresh)
+ctab <- get_observed_table(db_origin,
+                           ubound = ubound)
+boot_gof[B] <-
+  chisquare_stat(
+    data = get_expected_count(model = model0,
+                              data = ctab,
+                              ubound = ubound))
+# Bootstrap p-value: proportion of statistics at least as large
+boot_gof_pval <- mean(boot_gof >= boot_gof[B])
+```
+
+Censored observations are used to compute the plotting positions, but are not displayed. As such, we cannot use graphical goodness of fit diagnostics for the Japanese interval censored data. An alternative, given that the data are tabulated in a contingency table, is to use a chi-squared test, conditioning on the number of individuals per birth cohort. The expected number in each cell (birth cohort and age band) can be obtained by computing the conditional probability of falling in that age band. The asymptotic null distribution should be $\chi^2$ with $(k-1)(p-1)$ degrees of freedom, where $k$ is the number of age bands and $p$ the number of birth cohorts. In finite samples, the expected counts for large excess lifetimes are very low, so the $\chi^2$ approximation can be expected to be poor. To mitigate this, we can pool observations and resort to simulation to approximate the null distribution of the test statistic. The bootstrap $p$-value for the exponential model above 108 years, pooling observations with excess lifetimes of 5 years and above, is `r sprintf(boot_gof_pval, fmt = "%.3f")`, indicating no evidence that the model is inadequate, although the test may have low power here.
+
+## Stratification
+
+Demographers may suspect differences between individuals of different sexes, from different countries or geographic areas, or by birth cohort. All of these are instances of categorical covariates. One possibility is to incorporate such covariates through the parameters with a suitable link function, but we consider instead stratification. We can split the data by the levels of a factor `covariate` into strata and compare the goodness of fit of the $K$ stratum-specific models relative to the model that pools all observations.
+The `test_elife` function performs likelihood ratio tests for these comparisons. The test statistic is $-2\{\ell(\widehat{\boldsymbol{\theta}}_0) - \ell(\widehat{\boldsymbol{\theta}})\}$, where $\widehat{\boldsymbol{\theta}}_0$ is the maximum likelihood estimator under the null model with common parameters, and $\widehat{\boldsymbol{\theta}}$ is the unrestricted maximum likelihood estimator for the alternative model with the same distribution, but with stratum-specific parameters. We illustrate this with a generalized Pareto model for the excess lifetime. The null hypothesis is $\mathrm{H}_0: \sigma_{\texttt{f}} = \sigma_{\texttt{m}}, \xi_{\texttt{f}}=\xi_{\texttt{m}}$ against the alternative that at least one equality fails, in which case the hazards and endpoints differ.
+
+```{r covariate-test, eval=TRUE, echo=TRUE}
+print(
+  test_elife(
+    arguments = args_japan,
+    thresh = 110,
+    family = "gp",
+    covariate = japanese2$gender)
+)
+```
+
+In the present example, there is no evidence of a difference in lifetime distribution between males and females; this is unsurprising given the large imbalance between counts for each covariate level, with far fewer males than females.
+
+## Extrapolation
+
+If the maximum likelihood estimate of the shape $\xi$ of the generalized Pareto model is negative, then the distribution has a finite upper endpoint; otherwise, the latter is infinite. With $\xi < 0$, we can look at the profile log likelihood of the endpoint $\eta = -\sigma/\xi$, using the function `prof_gp_endpt`, to draw the curve and obtain confidence intervals. The argument `psi` specifies a grid of values over which to compute the profile log likelihood.
+The bounds of the $(1-\alpha)$ confidence intervals are obtained by fitting a cubic smoothing spline for $y=\eta$ as a function of the shifted profile curve $x = 2\{\ell_p(\eta)-\ell_p(\widehat{\eta})\}$ on each side of the maximum likelihood estimate, and predicting the value of $y$ at $x = -\chi^2_1(1-\alpha)$. This technique works well unless the profile is nearly flat or the bounds lie beyond the range of values of `psi` provided; the user may wish to enlarge the grid in that case. If $\widehat{\xi} \approx 0$, the upper bound of the confidence interval may be infinite, and the profile log likelihood may never reach the cutoff value of the asymptotic $\chi^2_1$ distribution.
+
+The profile log likelihood curve for the endpoint, shifted vertically so that its value is zero at the maximum likelihood estimate, highlights the marked asymmetry of the distribution of $\eta$, shown in Figure \@ref(fig:fig-endpoint-confint), with the horizontal dashed lines showing the limits of the 95% profile likelihood confidence intervals. These suggest that the endpoint, or a potential finite lifespan, could lie well beyond observed records. The routine used to calculate the upper bound fits the smoothing spline with the roles of the $y$ and $x$ axes reversed and predicts the value of $\eta$ at the cutoff. In this example, the upper confidence limit is extrapolated from the model: more accurate bounds can be obtained by specifying a longer and finer sequence of values of `psi`, so that the profile log likelihood drops below the $\chi^2_1$ quantile cutoff.
+
+```{r fig-endpoint-confint, eval=TRUE, echo=TRUE, fig.cap="Maximum likelihood estimates with 95\\% confidence intervals as a function of threshold (left) and profile likelihood for exceedances above 110 years (right) for Japanese centenarian data. As the threshold increases, the number of exceedances decreases and the intervals for the upper bound become wider.
+At 110, the right endpoint of the interval extends to infinity.", fig.width=8.5, fig.height=4, out.width='100%', fig.align='center'}
+# Create grid of threshold values
+thresholds <- 105:110
+# Grid of values at which to evaluate the profile
+psi <- seq(120, 200, length.out = 101)
+# Calculate the profile for the endpoint
+# of the generalized Pareto at each threshold
+endpt_tstab <- do.call(
+  endpoint.tstab,
+  args = c(
+    args_japan,
+    list(psi = psi,
+         thresh = thresholds,
+         plot = FALSE)))
+# Compute corresponding confidence intervals
+profile <- endpoint.profile(
+  arguments = c(args_japan, list(thresh = 110, psi = psi)))
+# Plot point estimates and confidence intervals
+g1 <- autoplot(endpt_tstab, plot = FALSE, ylab = "lifespan (in years)")
+# Plot the profile curve with cutoffs for conf. int. at 110
+g2 <- autoplot(profile, plot = FALSE)
+patchwork::wrap_plots(g1, g2)
+```
+
+Depending on the model, the conclusions about the risk of mortality change drastically: the Gompertz model implies an ever increasing hazard, but no finite endpoint for the distribution of exceedances, whereas the exponential model implies a constant hazard and no endpoint. By contrast, the generalized Pareto can accommodate both finite and infinite endpoints. The marked asymmetry of the distribution of the lifespan implied by the generalized Pareto shows that inference based on symmetric (i.e., Wald-type) confidence intervals is likely very misleading: the drop in fit from a zero or positive shape parameter $\xi$ is smaller than the cutoff for a 95% confidence interval, so that while the best point estimate is around 128 years, the upper bound is so large (and extrapolated) that no finite limit can be ruled out. The model nevertheless also implies a very high probability of dying in any given year, regardless of whether the hazard is constant, decreasing or increasing.
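+The spline inversion used to read off the profile confidence bounds can be sketched in a few lines; `psi` and `ll` denote the grid and the corresponding profile log likelihood values (a simplified version of the idea, not the package's internal routine):
+
+```{r profile-spline-sketch, eval=FALSE, echo=TRUE}
+# Given profile log likelihood values ll on a grid psi, locate where the
+# shifted deviance 2{l_p(eta) - l_p(etahat)} crosses -qchisq(level, 1)
+profile_upper_bound <- function(psi, ll, level = 0.95) {
+  x <- 2 * (ll - max(ll))              # shifted profile curve, <= 0
+  right <- psi >= psi[which.max(ll)]   # right branch of the curve
+  fit <- smooth.spline(x = x[right], y = psi[right])
+  predict(fit, x = -qchisq(level, df = 1))$y
+}
+```
+
+The sketch assumes the profile is monotone on each side of the maximum, which is why the two branches are handled separately.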
+
+## Hazard
+
+The parameters of the models are seldom of interest in themselves: rather, we may be interested in a summary such as the hazard function. At present, \CRANpkg{longevity} does not allow general linear modelling of the parameters or time-varying covariates, but other software can tackle this task. For example, \CRANpkg{casebase} [@casebase] fits flexible hazard models using logistic or multinomial regression, with optional penalties on the parameters associated with covariate and spline effects. Another alternative is \CRANpkg{flexsurv} [@flexsurv], which offers 10 parametric models and allows for user-specified models.
+The \CRANpkg{bshazard} package [@bshazard] provides nonparametric smoothing via $B$-splines, whereas \CRANpkg{muhaz} handles kernel-based hazard estimation for right censored data; both could be used to validate the parametric models in the case of right censoring. The \CRANpkg{rstpm2} package [@rstpm2] handles generalized survival modelling under censoring with the Royston--Parmar model built from natural cubic splines [@Royston.Parmar:2002]. In contrast with all of the aforementioned approaches, we focus on parametric models: this is partly because there are few observations in the use case we consider, and covariates, except perhaps gender and birth year, are not available.
+
+The hazard changes over time; the only notable exception is the exponential hazard, which is constant. \CRANpkg{longevity} includes utilities for computing the hazard function from a fitted model object and for computing point-wise confidence intervals using either symmetric Wald intervals or the profile likelihood.
+Specifically, the `hazard_elife` function calculates the hazard $h(t; \boldsymbol{\theta})$ point-wise at times $t=$`x`; Wald-based confidence intervals are obtained using the delta method, whereas profile likelihood intervals are obtained by reparametrizing the model in terms of $h(t)$ for each time $t$.
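+For reference, the generalized Pareto hazard has a simple closed form, $h(t) = 1/(\sigma + \xi t)$ on its support, obtained by dividing the density by the survival function in \@ref(eq:gpd). A minimal sketch of the computation (the package's own `helife` function provides this; the parameter values are illustrative):
+
+```{r gp-hazard-sketch, eval=FALSE, echo=TRUE}
+# Generalized Pareto hazard h(t) = 1/(sigma + xi * t),
+# defined while 1 + xi * t / sigma > 0
+gp_hazard <- function(t, scale, shape) {
+  ifelse(1 + shape * t / scale > 0, 1 / (scale + shape * t), NA_real_)
+}
+gp_hazard(t = 0:5, scale = 1.67, shape = -0.08)
+```
+
+With $\xi < 0$, the hazard increases without bound as $t$ approaches the endpoint $-\sigma/\xi$; with $\xi = 0$, it is constant, recovering the exponential model.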
+More naturally perhaps, we can consider a Bayesian analysis of the Japanese excess lifetimes above 108 years. Using the log posterior encoded in `lpost_elife`, we obtained independent samples from the posterior of the generalized Pareto parameters $(\sigma, \xi)$ under the maximal data information prior using the \CRANpkg{rust} package. Each parameter combination was then fed to `helife` and the hazard evaluated over a range of values. Figure \@ref(fig:fig-hazard) shows the posterior samples and functional boxplots [@Sun.Genton:2011] of the hazard curves, obtained using the \CRANpkg{fda} package. The risk of dying increases with age, but comes with substantial uncertainty, as evidenced by the increasing width of the boxes and of the interquartile range.
+
+```{r fig-hazard, eval=TRUE, echo=FALSE, fig.width=8.5, fig.height=4, out.width='100%', fig.cap="Left: scatterplot of 1000 independent posterior samples from the generalized Pareto model with maximal data information prior; the contour curves give the percentiles of credible intervals, and show approximate normality of the posterior.
+Right: functional boxplots for the corresponding hazard curves, whose width increases at higher ages."}
+par(mfrow = c(1, 2), mar = c(4, 4, 1, 1))
+threshold <- 108
+# Note that we cannot have an argument 'arguments' in 'ru'
+post_samp <- rust::ru(
+  logf = lpost_elife,
+  weights = args_japan$weights,
+  rtrunc = args_japan$rtrunc,
+  event = 3,
+  type = "interval2",
+  time = args_japan$time,
+  time2 = args_japan$time2,
+  thresh = threshold,
+  family = "gp",
+  trans = "BC",
+  n = 1000,
+  d = 2,
+  init = c(1.67, -0.08),
+  lower = c(0, -1))
+plot(post_samp,
+     xlab = "scale",
+     ylab = "shape",
+     bty = "l")
+age <- threshold + seq(0.1, 10, length.out = 101)
+haz_samp <- apply(X = post_samp$sim_vals,
+                  MARGIN = 1,
+                  FUN = function(par){
+                    helife(x = age - threshold, scale = par[1], shape = par[2], family = "gp")
+                  })
+# Functional boxplots
+fbox <- fda::fbplot(
+  fit = haz_samp,
+  x = age,
+  xlim = range(age),
+  ylim = c(0, 2.8),
+  xaxs = "i",
+  yaxs = "i",
+  xlab = "age",
+  ylab = "hazard (in years)",
+  outliercol = "gray90",
+  color = "gray60",
+  barcol = "black",
+  bty = "l"
+)
+```
+
+## Nonparametric maximum likelihood estimation {#sec-nonparametric}
+
+The nonparametric maximum likelihood estimator is unique only up to equivalence classes. The data for individual $i$ consist of the tuple $\{L_i, R_i, V_i, U_i\}$, where the censoring interval is $[L_i, R_i]$ and the truncation interval is $[V_i, U_i]$, with $0 \leq V_i \leq L_i \leq R_i \leq U_i \leq \infty$. @Turnbull:1976 shows how one can build disjoint intervals $C = \bigsqcup_{j=1}^m [a_j, b_j]$, where $a_j \in \mathcal{L} = \{L_1, \ldots, L_n\}$ and $b_j \in \mathcal{R} = \{R_1, \ldots, R_n\}$ satisfy $a_1 \leq b_1 < \cdots < a_m \leq b_m$ and the intervals $[a_j, b_j]$ contain no other members of $\mathcal{L}$ or $\mathcal{R}$ except at the endpoints. This last condition notably ensures that the intervals created include all observed failure times as singleton sets in the absence of truncation.
+Other authors [@Lindsey.Ryan:1998] have treated interval censored data as semi-open intervals $(L_i, R_i]$, a convention we adopt here for numerical reasons. For interval censored and truncated data, @Frydman:1994 shows that this construction must be amended by taking instead $a_j \in \mathcal{L} \cup \{U_1, \ldots, U_n\}$ and $b_j \in \mathcal{R} \cup \{V_1, \ldots, V_n\}$.
+
+We assign probability $p_j = F(b_j^{+}) - F(a_j^{-})$ to each of the resulting $m$ intervals, under the constraints $\sum_{j=1}^m p_j = 1$ and $p_j \ge 0$ $(j=1, \ldots, m)$. The nonparametric maximum likelihood estimator of the distribution function $F$ is then
+\begin{align*}
+\widehat{F}(t) = \begin{cases} 0, & t < a_1;\\
+\widehat{p}_1 + \cdots + \widehat{p}_j, & b_j < t < a_{j+1} \quad (1 \leq j \leq m-1);\\
+1, & t > b_m;
+\end{cases}
+\end{align*}
+and is undefined for $t \in [a_j, b_j]$ $(j=1, \ldots, m)$.
+
+```{r fig-turnbull, echo=FALSE, eval=TRUE, out.width="100%", fig.width=8.5, fig.height=6, fig.align='center', fig.cap="Illustration of the equivalence classes for the truncation (pale grey) and censoring (dark grey) intervals, based on Turnbull's algorithm.
+Observations must fall within equivalence classes defined by the former."}
+library(longevity)
+n <- 100L
+ltrunc <- runif(n = n, min = 0, max = 1)
+rtrunc <- TruncatedNormal::rtnorm(
+  n = 1,
+  sd = 4,
+  mu = 5,
+  lb = ltrunc,
+  ub = rep(Inf, n))
+time <- TruncatedNormal::rtnorm(
+  n = 1,
+  sd = 2,
+  lb = ltrunc,
+  ub = rtrunc)
+time2 <- ifelse(runif(n) < 0.5,
+                time,
+                TruncatedNormal::rtnorm(
+                  n = 1,
+                  sd = 2,
+                  mu = (time + rtrunc)/2,
+                  lb = time,
+                  ub = rtrunc))
+status <- ifelse(time == time2, 1L, 3L)
+unex <- turnbull_intervals(time = time,
+                           time2 = time2,
+                           status = status,
+                           ltrunc = ltrunc,
+                           rtrunc = rtrunc)
+dummy_cens <-
+  dummy_trunc <-
+  matrix(NA,
+         nrow = nrow(unex),
+         ncol = length(time))
+for(i in seq_len(nrow(unex))){
+  dummy_cens[i,] <- unex[i,1] >= time & unex[i,2] <= time2
+  dummy_trunc[i,] <- unex[i,1] >= ltrunc & unex[i,2] <= rtrunc
+}
+intervals_cens <- apply(dummy_cens, 2, function(x){range(which(x))})
+intervals_trunc <- apply(dummy_trunc, 2, function(x){range(which(x))})
+melted_cens <- reshape2::melt(dummy_cens)
+melted_trunc <- reshape2::melt(dummy_trunc)
+ggplot() +
+  geom_tile(data = melted_trunc,
+            mapping = aes(x = Var2, y = Var1, fill = factor(ifelse(value, 2L, 0L), 0:2))) +
+  geom_tile(data = melted_cens,
+            mapping = aes(x = Var2, y = Var1, fill = factor(ifelse(value, 1L, 0L), 0:2)),
+            alpha = 0.5) +
+  scale_fill_manual(values = c("white", "black", "grey")) +
+  labs(y = "Turnbull's intervals",
+       x = "observation identifier") +
+  scale_y_continuous(expand = c(0,0)) +
+  scale_x_continuous(expand = c(0,0)) +
+  theme(legend.position = "none")
+```
+
+The procedure of Turnbull can be encoded using $n \times m$ matrices. For censoring, we build $\mathbf{A}$ whose $(i,j)$th entry is $\alpha_{ij}=1$ if $[a_j, b_j] \subseteq A_i$ and zero otherwise.
Since the intervals forming the set $C$ are disjoint and in increasing order, a more storage-efficient way of keeping track of the intervals is to find, for each observation, the smallest integer $j$ such that $L_i \leq a_j$ and the largest $j$ such that $R_i \ge b_j$ $(j=1, \ldots, m)$. The same idea applies for the truncation sets $B_i = (V_i, U_i)$ and matrix $\mathbf{B}$ with $(i,j)$ element $\beta_{ij}$. + +The log likelihood function is +\begin{align*} +\ell(\boldsymbol{p}) = \sum_{i=1}^n w_i \left\{ \log \left( \sum_{j=1}^m \alpha_{ij}p_j\right) - \log \left( \sum_{j=1}^m \beta_{ij}p_j\right)\right\}. +\end{align*} + +The numerical implementation of the EM is in principle straightforward: first identify the equivalence classes $C$, next calculate the entries of $A_i$ and $B_i$ (or the vectors of ranges) and finally run the EM algorithm. In the second step, we need to account for potential ties in the presence of (interval) censoring and treat the intervals as open on the left for censored data. For concreteness, consider the toy example $\boldsymbol{T} =(1,1,2)$ and $\boldsymbol{\delta} = (0,1,1)$, where $\delta_i = 1$ if the observation is a failure time and $\delta_i=0$ in case of right censoring. +The left and right censoring bounds are $\mathcal{L} = \{1, 2\}$ and $\mathcal{R} = \{1, 2, \infty\}$ with $A_1 = (1, \infty)$, $A_2 = \{1\}$ and $A_3 = \{2\}$, and $C_1=\{1\}, C_2=\{2\}$. If we were to treat instead $A_1$ as a semi-closed interval $[1, \infty)$, direct maximization of the log likelihood in eq. 2.2 of @Turnbull:1976 would give probability half to each observed failure time. By contrast, the Kaplan--Meier estimator, under the convention that right censored observations at time $t$ were at risk up to and until $t$, assigns probability 1/3 to the first failure. To retrieve this solution with Turnbull's EM estimator, we need the convention that $C_1 \not\subset A_1$, but this requires comparing the bound with itself.
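The role of this convention can be checked numerically. The sketch below (illustrative only, not the package's EM routine) maximizes the toy example's likelihood directly over the probability simplex, with $A_1$ treated as the open interval $(1, \infty)$ so that only $C_2$ is compatible with the censored observation:

```r
# Toy example T = (1, 1, 2), delta = (0, 1, 1): rows of A flag which
# equivalence classes C_1 = {1}, C_2 = {2} are compatible with each observation
A <- rbind(c(0, 1),  # right censored at 1, open interval (1, Inf): only C_2
           c(1, 0),  # failure at 1: C_1
           c(0, 1))  # failure at 2: C_2
nll <- function(theta) {     # negative log likelihood over the simplex,
  p <- exp(c(theta, 0))      # via a softmax reparametrization
  p <- p / sum(p)
  -sum(log(A %*% p))
}
opt <- optim(par = 0, fn = nll, method = "BFGS")
p_hat <- exp(c(opt$par, 0))
p_hat <- p_hat / sum(p_hat)
round(p_hat, 3)  # approximately (0.333, 0.667)
```

The first weight approaches 1/3, agreeing with the Kaplan--Meier solution described above.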
The numerical tolerance in the implementation is taken to be the square root of the machine epsilon for doubles. + +The maximum likelihood estimator (MLE) need not be unique, and the EM algorithm is only guaranteed to converge to a local maximum. For interval censored data, @Gentleman.Geyer:1994 consider using the Karush--Kuhn--Tucker conditions to determine whether the probability in some intervals is exactly zero and whether the returned value is indeed the MLE. + +Due to data scarcity, statistical inference for human lifespan is best conducted using parametric models supported by asymptotic theory, reserving nonparametric estimators to assess goodness of fit. The empirical cumulative hazard for the Japanese data is very close to linear from early ages, suggesting that the hazard may not be very far from exponential, even if more complex models are likely to be favored given the large sample size. + +The function `np_elife` returns a list with Turnbull's intervals $[a_j, b_j]$ and the probability weights assigned to each, provided the latter are positive. It also contains an object of class `stepfun` with a weighting argument that defines a cumulative distribution function. + + +We can use the nonparametric maximum likelihood estimator of the distribution function to assess a fitted parametric model by comparing either the density with a binned distribution, or the cumulative distribution functions. The distribution function is defined over equivalence classes, which may be isolated observations or intervals. The data interval over which there is non-zero probability of events is broken down into equispaced bins, and the probability of failing in each bin is estimated nonparametrically from the distribution function. Alternatively, users can also provide a set of `breaks`, or the number of bins.
+ +```{r fig-ecdf, eval=TRUE, echo=TRUE, fig.align='center', out.width='100%', fig.width=8.5, fig.height=4, fig.cap="Nonparametric maximum likelihood estimate of the density (bar plot, left) and distribution function (staircase function, right), with superimposed generalized Pareto fit for excess lifetimes above 108 years. Except for the discreteness inherent to the nonparametric estimates, the two representations broadly agree at year marks."} +ecdf <- np_elife(arguments = args_japan, thresh = 108) +# Summary statistics, accounting for censoring +round(summary(ecdf), digits = 2) +# Plots of fitted parametric model and nonparametric CDF +model_gp <- fit_elife( + arguments = args_japan, + thresh = 108, + family = "gp", + export = TRUE) +# ggplot2 plots, wrapped to display side by side +patchwork::wrap_plots( + autoplot( + model_gp, # fitted model + plot = FALSE, # return list of ggplots + which.plot = c("dens", "cdf"), + breaks = seq(0L, 8L, by = 1L) # set bins for histogram + ) +) +``` + +# Conclusion + +This paper describes the salient features of \CRANpkg{longevity}, explaining the theoretical underpinning of the methods and the design considerations followed when writing the package. While \CRANpkg{longevity} was conceived for modelling lifetimes, the package could be used for applications outside of demography. Survival data in extreme value theory is infrequent yet hardly absent. For example, rainfall observations can be viewed as rounded due to the limited instrumental precision of rain gauges and treated as interval-censored. Some historical records, which are often lower bounds on the real magnitude of natural catastrophes, can be added as right-censored observations in a peaks-over-threshold analysis. In insurance, losses incurred by the company due to liability claims may be right-censored if they exceed the policy cap and are covered by a reinsurance company, or rounded [@Belzile.Neslehova:2025]. 
In climate science, attribution studies often focus on data just after a record-breaking event, and the stopping rule leads to truncation [@Miralles.Davison:2023], which biases results if ignored. + +The package has features that are interesting in their own right, including adapted quantile-quantile plots and other visual goodness of fit diagnostics for model validation. The testing procedures correctly handle tests for restrictions lying on the boundary of the parameter space. Parametric bootstrap procedures for such tests are not straightforward to implement, given the heavy reliance on the data generating mechanism and the diversity of possible scenarios; this paper, however, shows how the utilities of the package can be combined to ease such estimation, and the code in the supplementary material illustrates how this can be extended for goodness of fit testing. + +# Acknowledgements {-} + +The author thanks four anonymous reviewers for valuable feedback. This research was supported financially by the Natural Sciences and Engineering Research Council of Canada (NSERC) via Discovery Grant RGPIN-2022-05001. diff --git a/_articles/RJ-2025-034/RJ-2025-034.html b/_articles/RJ-2025-034/RJ-2025-034.html new file mode 100644 index 0000000000..08c9221991 --- /dev/null +++ b/_articles/RJ-2025-034/RJ-2025-034.html @@ -0,0 +1,2904 @@ + longevity: An R Package for Modelling Excess Lifetimes
+

longevity: An R Package for Modelling Excess Lifetimes

+ + + +

The longevity R package provides a maximum likelihood estimation routine for modelling survival data that are subject to non-informative censoring or truncation. It includes a selection of 12 parametric models of varying complexity, with a focus on tools for extreme value analysis and, more specifically, univariate peaks over threshold modelling. The package provides utilities for univariate threshold selection, parametric and nonparametric maximum likelihood estimation, goodness of fit diagnostics and model comparison. These different methods are illustrated using individual Dutch records and aggregated Japanese human lifetime data.

+
+ + + +
+

1 Introduction

+

Many data sets collected by demographers for the analysis of human longevity have unusual features for which limited software implementations exist. The longevity package was initially built to deal with human longevity records and data from the International Database on Longevity (IDL), which provides the age at death of supercentenarians, i.e., people who died above age 110. Data for the statistical analysis of (human) longevity can take the form of death counts aggregated per age band or, most commonly, life trajectories of individuals with both birth and death dates. Such lifetimes are often interval truncated (only the ages at death of individuals dying between two calendar dates are recorded) or left truncated and right censored (when individuals still alive at the end of the collection period are also included). Censoring and truncation are typically administrative in nature and thus non-informative about death.

+

Supercentenarians are extremely rare and records are sparse. The most popular parametric models used by practitioners are justified by asymptotic arguments and have their roots in extreme value theory. Univariate extreme value distributions are well implemented in software and Belzile et al. (2023b) provides a recent review of existing implementations. While there are many standard R packages for the analysis of univariate extremes using likelihood-based inference, such as evd (Stephenson 2002), mev and extRemes (Gilleland and Katz 2016), only the evgam package includes functionalities to fit threshold exceedance models with censoring, as showcased in Youngman (2022) with rounded rainfall measurements. Support for survival data for extreme value models is in general wholly lacking, which motivated the development of longevity.

+

The longevity package also includes several parametric models commonly used in demography. Many existing packages that focus on tools for modelling mortality rates, typically through life tables, are listed in the CRAN Task View ActuarialScience in the Life insurance section. They do not, however, allow for truncation or more general survival mechanisms, as the aggregated data used are typically complete except for potential right censoring of the oldest age group. The survival package (Therneau and Grambsch 2000) includes utilities for accelerated failure time models with 10 parametric distributions. The MortCast package can be used to estimate age-specific mortality rates using the Kannisto and Lee–Carter approaches, among others (Ševčíková et al. 2016). The demography package provides forecasting methods for death rates based on constructed life tables using the Lee–Carter or ARIMA models. MortalityLaws includes utilities to download data from the Human Mortality Database (HMD), and to fit a total of 27 parametric models for life table data (death counts and population at risk per age group) using Poisson, binomial or alternative loss functions. The vitality package fits the family of vitality models (Anderson 2000; Li and Anderson 2009) via maximum likelihood based on empirical survival data. The fitdistrplus package (Delignette-Muller and Dutang 2015) allows generic parametric distributions to be fitted to interval censored data via maximum likelihood, with various S3 methods for model assessment. The package also allows user-specified models, thereby permitting custom definitions for truncated distributions whose truncation bounds are passed as fixed vector parameters. Parameter uncertainty can be assessed via additional functions using the nonparametric bootstrap. Many parametric distributions also appear in the VGAM package (Yee and Wild 1996; Yee 2015), which allows for vector generalized linear modelling. 
The longevity package is less general and offers support only for selected parametric distributions, but contrary to the aforementioned packages allows for truncation and general patterns. One strength of longevity is that it also includes model comparison tools that account for non-regular asymptotics and goodness of fit diagnostics.

+

Nonparametric methods are popular tools for the analysis of survival data with large samples, owing to their limited set of assumptions. They also serve for the validation of parametric models. Without explanatory variables, a closed-form expression for the nonparametric maximum likelihood estimator of the survival function can be derived in particular instances, including the product limit estimator (Kaplan and Meier 1958) for random or non-informative right censoring and an extension allowing for left truncation (Tsai et al. 1987). In general, the nonparametric maximum likelihood estimator of the survival function must be computed using an expectation-maximization (EM) algorithm (Turnbull 1976). Nonparametric estimators assign probability mass only to observed failure times and intervals, and so cannot be used for extrapolation beyond the range of the data, limiting their utility in extreme value analysis.

+

The CRAN Task View on Survival Analysis lists various implementations of nonparametric maximum likelihood estimators of the survival or hazard functions: survival implements the Kaplan–Meier and Nelson–Aalen estimators. Many packages focus on the case of interval censoring (Groeneboom and Wellner 1992, Section 3.2), including prodlim; Anderson-Bergman (2017a) reviews the performance of the implementations in icenReg (Anderson-Bergman 2017b) and Icens, the latter of which takes the incidence matrix as input data. Routines for doubly censored data are provided by dblcens. The interval package (Fay and Shaw 2010) implements Turnbull’s EM algorithm for interval censored data. The case of left truncated and right censored data is handled by survival, and tranSurv provides transformation models with the potential to account for truncation dependent on survival times. For interval truncated data, dedicated algorithms that use gradient-based steps (Efron and Petrosian 1999) or inverse probability weighting (Shen 2010) exist and can be more efficient than the EM algorithm of Turnbull (1976); many of these are implemented in the DTDA package. The longevity package includes a C\(^{++}\) implementation of the corrected Turnbull algorithm (Turnbull 1976), returning the nonparametric maximum likelihood estimator for arbitrary censoring and truncation patterns, as opposed to the aforementioned implementations, which focus on specific subcases. With a small number of observations, it is also relatively straightforward to maximize the log likelihood for the concave program subject to linear constraints using constrained optimization algorithms; longevity relies for this on Rsolnp, which uses augmented Lagrangian methods and sequential quadratic programming (Ye 1987).

+

1.1 Motivating examples

+

To showcase the functionality of the package, particularly for the modelling of threshold exceedances, we consider Dutch and Japanese life lengths. The dutch database contains the age at death (in days) of Dutch people who died above age 92 between 1986 and 2015; these data were obtained from Statistics Netherlands and analyzed in Einmahl et al. (2019) and Belzile et al. (2022). Records are interval truncated, as people are included in the database only if they died during the collection period. In addition, there are 226 interval censored and interval truncated records for which only the month and year of birth and death are known, as opposed to exact dates.

+

The second database we consider is drawn from Maier et al. (2021). The data frame japanese2 consists of counts of Japanese above age 100 by age band and are stratified by both birth cohort and sex. To illustrate the format of the data, counts for female Japanese are reproduced in Table 1. The data were constructed using the extinct cohort method and are interval censored between \(\texttt{age}\) and \(\texttt{age} + 1\) and right truncated at the age reached by the oldest individuals of their birth cohort in 2020. The count variable lists the number of instances in the contingency table, and serves as a weight for likelihood contributions.

+
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Table 1: Death count by birth cohort and age band for female Japanese.
age  1874-1878  1879-1883  1884-1888  1889-1893  1894-1898  1899-1903
1001648251344138079160369858
101975159629215376110477091
1025979871864344674874869
1033455971153223050143293
104191351662140332422133
10512119738185520841357
106641222104951284836
1073474120274774521
108164166152433297
10912303983252167
110617214913092
11141015266947
1123311152922
1132285159
114124372
115011032
116011011
117000011
+
+

2 Package functionalities

+

The longevity package uses the S3 object oriented system and provides a series of functions with common arguments. The syntax used by the longevity package purposely mimics that of the survival package (Therneau and Grambsch 2000), except that it does not specify models using a formula. Users must provide vectors for the time or age (or bounds in case of intervals) via arguments time and time2, as well as lower and upper truncation bounds (ltrunc and rtrunc) if applicable. The integer vector event is used to indicate the type of event, where, following survival, 0 indicates right censoring, 1 an observed event, 2 left censoring and 3 interval censoring. Together, these five vectors characterize the data and the survival mechanisms at play. Depending on the sampling scheme, not all arguments are required or relevant and they need not be of the same length, but they are common to most functions. Users can also pass arguments through a named list args: as illustrated below, this is convenient to avoid repeatedly specifying the common arguments in each function call. Default values are overridden by elements of args, with the exception of those passed by the user directly in the call. Relative to survival, functions have the additional arguments ltrunc and rtrunc for the left and right truncation limits, since both censoring and truncation can be present simultaneously; these may also be matrices in the case of double interval truncation (Belzile et al. 2022).
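The precedence rule for arguments (defaults, then elements of args, then arguments passed directly) can be sketched in a few lines of base R. The helper name merge_args and the default values below are hypothetical; the package's internal mechanics differ:

```r
# Illustrative sketch of the precedence rule: defaults < args list < direct call
merge_args <- function(args = list(), ...) {
  defaults <- list(thresh = 0, family = "exp")  # hypothetical defaults
  direct <- list(...)                           # arguments passed directly
  utils::modifyList(utils::modifyList(defaults, args), direct)
}
# 'family' comes from args, but the directly passed 'thresh' wins
res <- merge_args(args = list(family = "gomp", thresh = 105), thresh = 108)
str(res)
```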

+

We can manipulate the data set to build the time vectors and truncation bounds for the Dutch data. We re-scale observations to years for interpretability and keep only records above age 98 for simplicity. We split the data to handle the observed age at death first: these are treated as observed (uncensored) whenever time and time2 coincide. When exact dates are not available, we compute the range of possible age at which individuals may have died, given their birth and death years and months. The truncation bounds for each individual can be obtained by subtracting from the endpoints of the sampling frame the birth dates, with left and right truncation bounds +\[\begin{align*} +\texttt{ltrunc}=\min\{92 \text{ years}, 1986.01.01 - \texttt{bdate}\}, \qquad \texttt{rtrunc} = 2015.12.31- \texttt{bdate}. +\end{align*}\] +Table 2 shows a sample of five individuals, two of whom are interval-censored, and the corresponding vectors of arguments along with two covariates, gender (gender) and birth year (byear).

+
+ +
+
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Table 2: Sample of five Dutch records, formatted so that the inputs match the function arguments used by the package. Columns give the age in years at death (or plausible interval), lower and upper truncation bounds giving minimum and maximum age for inclusion, an integer indicating the type of censoring, gender and birth year.
time    time2   ltrunc  rtrunc  event  gender  byear
104.67  105.74   80.00  111.00      3  female   1905
103.50  104.58   78.00  109.00      3  female   1907
100.28  100.28   92.01  104.50      1  female   1911
100.46  100.46   92.01  102.00      1    male   1913
100.51  100.51   92.01  104.35      1  female   1911
+
+

We can proceed similarly for the Japanese data. Ages of centenarians are rounded down to the nearest year, so all observations are interval censored within one-year intervals. Assuming that the ages at death are independent and identically distributed with distribution function \(F(\cdot; \boldsymbol{\theta})\), the log likelihood for exceedances \(y_i = \texttt{age}_i - u\) above age \(u\) is +\[\begin{align*} +\ell(\boldsymbol{\theta}) = \sum_{i: \texttt{age}_i > u}n_i \left[\log \{F(y_i+1; \boldsymbol{\theta}) - F(y_i; \boldsymbol{\theta})\} - \log F(r_i - u; \boldsymbol{\theta})\right] +\end{align*}\] +where \(n_i\) is the count of the number of individuals in cell \(i\) and \(r_i > \texttt{age}_i+1\) is the right truncation limit for that cell, i.e., the maximum age that could have been achieved for that birth cohort by the end of the data collection period.
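As an illustration, the log likelihood above can be coded directly for an exponential model with scale \(\sigma\); the counts, truncation ages and threshold below are hypothetical toy numbers, and in practice the real data enter through the package's fitting routines instead:

```r
# Weighted log likelihood for interval censored (one-year bins), right
# truncated exceedances y = age - u under an exponential model (toy sketch)
loglik_exc <- function(sigma, y, n, r, u) {
  Fexc <- function(x) pexp(x, rate = 1 / sigma)  # distribution of excesses
  sum(n * (log(Fexc(y + 1) - Fexc(y)) - log(Fexc(r - u))))
}
ll <- loglik_exc(sigma = 1.5, y = 0:3, n = c(50, 20, 8, 2),
                 r = rep(118, 4), u = 108)
ll  # finite, negative log likelihood for these toy counts
```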

+
+
+
data(japanese2, package = "longevity")
+# Keep only non-empty cells
+japanese2 <- japanese2[japanese2$count > 0, ]
+# Define arguments that are recycled
+japanese2$rtrunc <- 2020 - 
+ as.integer(substr(japanese2$bcohort, 1, 4))
+# The line above extracts the earliest year of the birth cohort
+# Create a list with all arguments common to package functions
+args_japan <- with(japanese2, 
+       list(
+        time = age, # lower censoring bound
+        time2 = age + 1L, # upper censoring bound
+        event = 3, # define interval censoring
+        type = "interval2",
+        rtrunc = rtrunc, # right truncation limit
+        weights = count)) # counts as weights
+
+
+

2.1 Parametric models and maximum likelihood estimation

+

Various models are implemented in longevity: their hazard functions are reported in Table 3. Two of those models, labelled perks and beard, have logistic-type hazard functions proposed by Perks (1932), used by Beard (1963), and popularized in the work of Kannisto and Thatcher; we use the parametrization of Richards (2012), from which we also adopt the nomenclature. Users can compare the models with those available in MortalityLaws; see ?availableLaws for the list of hazard functions and parametrizations.

+
+ +
+
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+Table 3: List of parametric models for excess lifetime supported by the package, with parametrization and hazard functions. The models are expressed in terms of scale parameter \(\sigma\), rate parameters \(\lambda\) or \(\nu\), and shape parameters \(\xi\), \(\alpha\) or \(\beta\). +
+model + +hazard function + +constraints +
+exp + +\(\sigma^{-1}\) + +\(\sigma > 0\) +
+gomp + +\(\sigma^{-1}\exp(\beta t/\sigma)\) + +\(\sigma > 0, \beta \ge 0\) +
+gp + +\((\sigma + \xi t)_{+}^{-1}\) + +\(\sigma > 0, \xi \in \mathbb{R}\) +
+weibull + +\(\sigma^{-\alpha} \alpha t^{\alpha-1}\) + +\(\sigma > 0, \alpha > 0\) +
+extgp + +\(\beta\sigma^{-1}\exp(\beta t/\sigma)[\beta+\xi\{\exp(\beta t/\sigma) -1\}]^{-1}\) + +\(\sigma > 0, \beta \ge 0, \xi \in \mathbb{R}\) +
+extweibull + +\(\alpha\sigma^{-\alpha}t^{\alpha-1}\{1+\xi(t/\sigma)^{\alpha}\}_{+}^{-1}\) + +\(\sigma > 0, \alpha > 0, \xi \in \mathbb{R}\) +
+perks + +\(\alpha\exp(\nu t)/\{1+\alpha\exp(\nu t)\}\) + +\(\nu \ge 0, \alpha >0\) +
+beard + +\(\alpha\exp(\nu t)/\{1+\alpha\beta\exp(\nu t)\}\) + +\(\nu \ge 0, \alpha >0, \beta \ge 0\) +
+gompmake + +\(\lambda + \sigma^{-1}\exp(\beta t/\sigma)\) + +\(\lambda \ge 0, \sigma > 0, \beta \ge 0\) +
+perksmake + +\(\lambda + \alpha\exp(\nu t)/\{1+\alpha\exp(\nu t)\}\) + +\(\lambda \ge 0, \nu \ge 0, \alpha > 0\) +
+beardmake + +\(\lambda + \alpha\exp(\nu t)/\{1+\alpha\beta\exp(\nu t)\}\) + +\(\lambda \ge 0, \nu \ge 0, \alpha > 0, \beta \ge 0\) +
+
+

Many of the models are nested, and Figure 1 shows the logical relations between the various families. The function fit_elife allows users to fit all of the parametric models of Table 3: the print method returns a summary of the sampling mechanism, the number of observations, the maximum log likelihood and parameter estimates with standard errors. Depending on the data, some models may be overparametrized and parameters need not be numerically identifiable. To mitigate such issues, the optimization routine, which uses Rsolnp, can try multiple starting values or fit various sub-models to ensure that the parameter values returned are indeed the maximum likelihood estimates. If one compares nested models and the fit of the simpler model is better than that of the alternative, the anova function returns an error message.
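The nesting relations can also be verified numerically. The sketch below (with arbitrary parameter values) checks that the extgp hazard of Table 3 approaches the generalized Pareto hazard \((\sigma + \xi t)^{-1}\) as \(\beta \to 0\):

```r
# Hazards from Table 3: extended generalized Pareto and generalized Pareto
h_extgp <- function(t, sigma, beta, xi) {
  beta / sigma * exp(beta * t / sigma) /
    (beta + xi * (exp(beta * t / sigma) - 1))
}
h_gp <- function(t, sigma, xi) 1 / (sigma + xi * t)
# For very small beta the two hazards agree to high accuracy
h1 <- h_extgp(t = 2, sigma = 1.5, beta = 1e-8, xi = 0.1)
h2 <- h_gp(t = 2, sigma = 1.5, xi = 0.1)
c(h1, h2)
```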

+

The fit_elife function handles arbitrary censoring patterns over single intervals, along with single interval truncation and interval censoring. To accommodate the sampling scheme of the International Database on Longevity (IDL), an option also allows for double interval truncation (Belzile et al. 2022), whereby observations are included only if the person dies between time intervals, potentially overlapping, which defines the observation window over which dead individuals are recorded.

+
+
+
thresh <- 108
+model0 <- fit_elife(arguments = args_japan,
+          thresh = thresh, 
+          family = "exp")
+(model1 <- fit_elife(arguments = args_japan,
+           thresh = thresh, 
+           family = "gomp"))
+
+
#> Model: Gompertz distribution. 
+#> Sampling: interval censored, right truncated
+#> Log-likelihood: -3599.037 
+#> 
+#> Threshold: 108 
+#> Number of exceedances: 2489 
+#> 
+#> Estimates
+#>  scale   shape  
+#> 1.6855  0.0991  
+#> 
+#> Standard Errors
+#>  scale   shape  
+#> 0.0523  0.0273  
+#> 
+#> Optimization Information
+#>   Convergence: TRUE
+
+

2.2 Model comparisons

+

Goodness of fit of nested models can be compared using likelihood ratio tests via the anova method. Most of the interrelations between models yield non-regular model comparisons since, to recover the simpler model, one must often fix parameters to values that lie on the boundary of the parameter space. For example, if we compare a Gompertz model with the exponential, the limiting null distribution is a mixture of a point mass at zero and a \(\chi^2_1\) variable, both with probability half (Chernoff 1954). Many authors (e.g., Camarda 2022) fail to recognize this fact. The case becomes more complicated with more than one boundary constraint: for example, the deviance statistic comparing the Beard–Makeham and the Gompertz model, which constrains two parameters on the boundary of the parameter space, has as null distribution the mixture \(\tfrac{1}{4}\chi^2_0 + \tfrac{1}{2}\chi^2_1 + \tfrac{1}{4}\chi^2_2\) (Self and Liang 1987).
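These chi-bar-square null distributions are simple to evaluate; a sketch of the corresponding p-value functions follows, where the point mass at zero contributes nothing for an observed statistic \(w > 0\) (function names are ours, for illustration):

```r
# One parameter on the boundary: mixture 0.5 chi^2_0 + 0.5 chi^2_1
pval_one_boundary <- function(w) {
  0.5 * pchisq(w, df = 1, lower.tail = FALSE)
}
# Two parameters on the boundary (e.g., Beard-Makeham vs Gompertz):
# mixture 0.25 chi^2_0 + 0.5 chi^2_1 + 0.25 chi^2_2
pval_two_boundary <- function(w) {
  0.5 * pchisq(w, df = 1, lower.tail = FALSE) +
    0.25 * pchisq(w, df = 2, lower.tail = FALSE)
}
pval_one_boundary(3.84)  # roughly half the naive chi^2_1 p-value
```

Using the naive \(\chi^2\) reference distribution instead makes such boundary tests conservative.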

+

Nonidentifiability also impacts testing: for example, as the rate parameter \(\nu\) of the Perks–Makeham model (perksmake) tends to zero, the limiting hazard, \(\lambda + \alpha/(1+\alpha)\), is constant (exponential model), but neither \(\alpha\) nor \(\lambda\) is separately identifiable. The usual asymptotics for the likelihood ratio test break down because the information matrix is singular (Rotnitzky et al. 2000). As such, the three families that include a Makeham component cannot be compared directly to the exponential in longevity, and the call to anova returns an error message.

+

Users can also access information criteria, AIC and BIC. The correction factors implemented depend on the number of parameters of the distribution, but do not account for singular fit, non-identifiable parameters or singular models for which the usual corrections \(2p\) and \(\ln(n)p\) are inadequate (Watanabe 2010).

+
+
+Graph with parametric model names, and arrows indicating the relationship between these. Dashed arrows indicate non-regular comparisons between nested models, and the expression indicates which parameter to fix to obtain the submodel. +

+Figure 1: Relationship between parametric models showing nested relations. Dashed arrows represent restrictions that lead to nonregular asymptotic null distribution for comparison of nested models. Comparisons between models with Makeham components and exponential are not permitted by the software because of nonidentifiability issues. +

+
+
+

To showcase how hypothesis testing is performed, we consider a simple example with two nested models. We test whether the exponential model is an adequate simplification of the Gompertz model for exceedances above 108 years — an irregular testing problem since \(\beta=0\) is a restriction on the boundary of the parameter space. The drop in log likelihood is quite large, indicating the exponential model is not an adequate simplification of the Gompertz fit. This is also what is suggested by the Bayesian information criterion, which is much lower for the Gompertz model than for the exponential.

+
+
+
# Model comparison
+anova(model1, model0)
+# Information criteria
+c("exponential" = BIC(model0), "Gompertz" = BIC(model1))
+
+
+
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
     npar  Deviance  Df  Chisq  Pr(>Chisq)
gomp    2   7198.07
exp     1   7213.25   1  15.17           0
+
#> exponential    Gompertz 
+#>    7221.065    7213.713
+
+

2.3 Simulation-based inference

+

Given the poor finite sample properties of the aforementioned tests, it may be preferable to rely on a parametric bootstrap rather than on the asymptotic distribution of the test statistic (Belzile et al. 2022) for model comparison. +Simulation-based inference requires capabilities for drawing new data sets whose features match those of the original one. For example, the International Database on Longevity (IDL) (Jdanov et al. 2021) features data that are interval truncated above 110 years, and doubly interval truncated since the sampling periods for semisupercentenarians (who died aged 105 to 110) and supercentenarians (who died above 110) are not always the same (Belzile et al. 2022). The 2018 Istat database of Italian semisupercentenarians analyzed by Barbi et al. (2018) includes left truncated and right censored records.

+

To mimic the postulated data generating mechanism while accounting for the sampling scheme, we could use the observed birth dates, or simulate new birth dates (possibly through a kernel estimator of the empirical distribution of birth dates) while keeping the sampling frame with the first and last date of data collection to define the truncation interval. In other settings, one could obtain the nonparametric maximum likelihood estimator of the distribution of the upper truncation bound (Shen 2010) using an inverse probability weighted estimator, which for fixed data collection windows is equivalent to setting the birth date.

+

The samp_elife function includes a type2 argument with multiple options to handle these schemes. For interval truncated data (type2="ltrt"), it uses the inversion method (Section 2 of Devroye (1986)): for \(F\) an absolutely continuous distribution function and \(F^{-1}\) the corresponding quantile function, a random variable distributed according to \(F\) truncated on \([a,b]\) is generated as \(X = F^{-1}[F(a) + U\{F(b)-F(a)\}]\), +where \(U \sim \mathsf{U}(0,1)\) is standard uniform.
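The inversion step described above can be sketched in a few lines for an exponential \(F\); this is an illustration of the method, not the package's implementation, which supports the other families of Table 3:

```r
# Inversion sampler for F truncated to [a, b], here with F exponential
r_trunc_inv <- function(n, a, b, rate = 1) {
  Fa <- pexp(a, rate = rate)
  Fb <- pexp(b, rate = rate)
  qexp(Fa + runif(n) * (Fb - Fa), rate = rate)
}
set.seed(2022)
x <- r_trunc_inv(n = 1000, a = 1, b = 3)
range(x)  # all draws fall inside (1, 3)
```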

+

The function samp_elife also has an argument upper which serves for both right truncation, and right censoring. For the latter, any record simulated that exceeds upper is capped at that upper bound and declared partially observed. This is useful for simulating administrative censoring, whereby the birth date and the upper bound of the collection window fully determine whether an observation is right censored or not. An illustrative example is provided in the next section.

+ +
+ +
+

The anova method call uses the asymptotic null distribution for comparison of nested parametric distributions \(\mathcal{F}_0 \subseteq \mathcal{F}_1\). We could use the bootstrap to see how good this approximation to the null distribution is. To mimic as closely as possible the data generating mechanism, which is specific to each scenario, we condition on the sampling frame and the number of individuals in each birth cohort. The number dying at each age is random, but the right truncation limits will be the same for anyone in that cohort. We simulate excess lifetimes, then interval censor observations by keeping only the corresponding age bracket. Under the null hypothesis, the data are drawn from \(\widehat{F}_0 \in \mathcal{F}_0\) and we generate observations from this right truncated distribution using the samp_elife utility, which also supports double interval truncation and left truncation with right censoring. This must be done within a for loop, since each upper bound has a count attached to it, but the function is vectorized should we instead supply a single vector containing all of the right truncation limits.


The bootstrap \(p\)-value for comparing models \(M_0 \subset M_1\) is obtained by repeating the following steps \(B\) times and computing the rank of the observed test statistic among the bootstrap replicates:

  1. Simulate new birth dates \(d_i\) \((i=1, \ldots, n)\) (e.g., drawing from a smoothed empirical distribution of birth dates); the latest possible birth date is the one which ensures the person reached at least the threshold by the end of the period.
  2. Subtract the birth dates from the endpoints of the sampling period, say \(c_1\) and \(c_2\), to get the minimum and maximum ages at death, \(c_1 - d_i\) and \(c_2 - d_i\) days respectively, which define the truncation bounds.
  3. Use the function samp_elife to simulate new observations from a parametric interval truncated distribution under the null model \(M_0\).
  4. Use the optimization procedure in fit_elife to fit both models \(M_0\) and \(M_1\), calculate their deviances, and from them the likelihood ratio statistic.

The algorithm is implemented below for comparing the Gompertz and the exponential model. Since the procedure is computationally intensive, users must trade off between the precision of the bootstrap \(p\)-value estimate and the number of replications, \(B\).

set.seed(2022)
# Counts for each unique right truncation limit
db_rtrunc <- aggregate(count ~ rtrunc,
        FUN = "sum",
        data = japanese2,
        subset = age >= thresh)
B <- 1000L # Number of bootstrap replications
boot_anova <- numeric(length = B)
boot_gof <- numeric(length = B) # goodness-of-fit statistics (supplementary material)
for(b in seq_len(B - 1L)){
 boot_samp <- # Generate bootstrap sample
  do.call(rbind, # merge data frames
   apply(db_rtrunc, 1, function(x){ # for each rtrunc and count
   count <- table( # tabulate counts
   floor( # round down
    samp_elife( # sample right truncated exponential
     n = x["count"],
       scale = model0$par,
       family = "exp", # null model
       upper = x["rtrunc"] - thresh,
       type2 = "ltrt")))
  data.frame( # return data frame
    count = as.integer(count),
    rtrunc = as.numeric(x["rtrunc"]) - thresh,
    eage = as.integer(names(count)))
 }))
 boot_mod0 <- # Fit null model to bootstrap sample
  with(boot_samp,
     fit_elife(time = eage,
      time2 = eage + 1L,
      rtrunc = rtrunc,
      type = "interval",
      event = 3,
      family = "exp",
      weights = count))
 boot_mod1 <- # Fit alternative model to bootstrap sample
  with(boot_samp,
     fit_elife(time = eage,
      time2 = eage + 1L,
      rtrunc = rtrunc,
      type = "interval",
      event = 3,
      family = "gomp",
      weights = count))
 boot_anova[b] <- deviance(boot_mod0) -
  deviance(boot_mod1)
}
# Add observed statistic: null deviance minus alternative deviance
boot_anova[B] <- deviance(model0) - deviance(model1)
# Bootstrap p-value: proportion of statistics at least as large as observed
(pval <- mean(boot_anova >= boot_anova[B]))

#> [1] 0.001

The asymptotic approximation is of similar magnitude as the bootstrap \(p\)-value. Both suggest that the more complex Gompertz model provides a significantly better fit.


2.4 Extreme value analysis


Extreme value theory suggests that, in many instances, the limiting conditional distribution of exceedances of a random variable \(Y\) with distribution function \(F\) is generalized Pareto, meaning
\[\begin{align}
\lim_{u \to x^*}\Pr(Y-u > y \mid Y > u)= \begin{cases}
\left(1+\xi y/\sigma\right)_{+}^{-1/\xi}, & \xi \neq 0;\\
\exp(-y/\sigma), & \xi = 0;
\end{cases}
\tag{1}
\end{align}\]
with \(x_{+} = \max\{x, 0\}\) and \(x^*=\sup\{x: F(x) < 1\}\). This justifies the use of Equation (1) for the survival function of threshold exceedances when dealing with rare events. The model has two parameters: a scale \(\sigma\) and a shape \(\xi\), which determines the behavior of the upper tail. Negative shape parameters correspond to bounded upper tails and a finite right endpoint for the support.
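Equation (1) translates directly into code; the following base R helper (an illustration, not the package's internal routine) evaluates the generalized Pareto survival function:

```r
# Survival function of the generalized Pareto in Equation (1),
# using the convention x_+ = max(x, 0)
gp_surv <- function(y, scale, shape) {
  stopifnot(scale > 0, all(y >= 0))
  if (abs(shape) < 1e-8) { # exponential limit as the shape goes to zero
    exp(-y / scale)
  } else {
    pmax(1 + shape * y / scale, 0)^(-1 / shape)
  }
}
# A negative shape yields a finite endpoint at -scale/shape (here 10):
# the survival probability drops to exactly zero at that age
gp_surv(c(0, 5, 9.99, 10), scale = 2, shape = -0.2)
```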


Study of population dynamics and mortality generally requires knowledge of the total population from which observations are drawn to derive rates. By contrast, the peaks over threshold method, by which one models the \(k\) largest observations of a sample, is a conditional analysis (e.g., given survival until a certain age), and is therefore free of denominator specification since we only model exceedances above a high threshold \(u\). For modelling purposes, we need to pick a threshold \(u\) that is smaller than the upper endpoint \(x^*\) in order to have a sufficient number of observations to estimate parameters. The threshold selection problem is a classical instance of bias-variance trade-off: the parameter estimators are possibly biased if the threshold is too low because the generalized Pareto approximation is not good enough, whereas choosing a larger threshold to ensure we are closer to the asymptotic regime leads to reduced sample size and increased parameter uncertainty.


To aid threshold selection, users commonly resort to parameter stability plots. These are common visual diagnostics consisting of a plot of estimates of the shape parameter \(\widehat{\xi}\) (with confidence or credible intervals) based on sample exceedances over a range of thresholds \(u_1, \ldots, u_K\). If the data were drawn from a generalized Pareto distribution, the conditional distribution above a higher threshold \(v > u\) is also generalized Pareto with the same shape: this threshold stability property is the basis for extrapolation beyond the range of observed records. Indeed, if the estimates of \(\xi\) are nearly constant across thresholds, this provides reassurance that the approximation can be used for extrapolation. The only difference with survival data, relative to the classical setting, is that the likelihood must account for censoring and truncation. Note that, with a nonzero threshold (argument thresh), it is not always possible to unambiguously determine whether left censored observations are exceedances: such cases yield errors in the functions.


Theory on penultimate extremes suggests that, for finite levels and general distribution function \(F\) for which (1) holds, the shape parameter varies as a function of the threshold \(u\), behaving like the derivative of the reciprocal hazard \(r(x) = \{1-F(x)\}/f(x)\). We can thus model the shape as piece-wise constant by fitting a piece-wise generalized Pareto model due to Northrop and Coleman (2014) and adapted in Belzile et al. (2022) for survival data. The latter can be viewed as a mixture of generalized Pareto over \(K\) disjoint intervals with continuity constraints to ensure a smooth hazard, which reduces to the generalized Pareto if we force the \(K+1\) shape parameters to be equal. We can use a likelihood ratio test to compare the models, or a score test if the likelihood ratio test is too computationally intensive, and plot the \(p\)-values for each of the \(K\) thresholds, corresponding to the null hypotheses \(\mathrm{H}_k: \xi_k = \cdots = \xi_{K}\) (\(k=1, \ldots, K-1\)). As the model quickly becomes overparametrized, optimization is difficult and the score test may be a safer option as it only requires estimation of the null model of a single generalized Pareto over the whole range.


To illustrate these diagnostic tools, Figure 2 shows a threshold stability plot, which features a small increase in the shape parameters as the threshold increases, corresponding to a stabilization or even a slight decrease of the hazard at higher ages. We can envision a threshold of 108 years as being reasonable: the Northrop–Coleman diagnostic plot suggests lower thresholds are compatible with a constant shape above 100. Additional goodness-of-fit diagnostics are necessary to determine if the generalized Pareto model fits well.

par(mfrow = c(1, 2), mar = c(4, 4, 1, 1))
# Threshold sequence
u <- 100:110
# Threshold stability plot
tstab(arguments = args_japan,
      family = "gp",
      method = "profile",
      which.plot = "shape",
      thresh = u)
# Northrop-Coleman diagnostic based on score tests
nc_score <- nc_test(arguments = c(args_japan, list(thresh = u)))
score_plot <- plot(nc_score)
graphics.off()

Figure 2: Threshold diagnostic tools: parameter stability plots for the generalized Pareto model (left) and Northrop–Coleman \(p\)-value path (right) for the Japanese centenarian dataset. Both suggest that a threshold as low as 100 may be suitable for peaks-over-threshold analysis.


Each plot in the package can be produced using base R or using ggplot2 (Wickham 2016), which implements the grammar of graphics. To keep the list of package dependencies lean and adhere to the tinyverse principle, the latter can be obtained by using the argument plot.type with the generic S3 method plot, or via autoplot, provided the ggplot2 package is already installed.


2.5 Graphical goodness of fit diagnostics


Determining whether a parametric model fits survival data well is no easy task, due to the difficulty of specifying the null distribution of many goodness of fit statistics, such as the Cramér–von Mises statistic, which differ for survival data. As such, the longevity package relies mostly on visual diagnostic tools. Waller and Turnbull (1992) discuss how classical visual diagnostics can be adapted in the presence of censoring. Most notably, only observed failure times are displayed on the \(y\)-axis against their empirical plotting positions on the \(x\)-axis. Contrary to the independent and identically distributed case, the uniform plotting positions \(F_n(y_i)\) are based on the nonparametric maximum likelihood estimator discussed in Section 2.9.


The situation is more complicated with truncated data (Belzile et al. 2022), since the data are not identically distributed: indeed, the distribution function of observation \(Y_i\) truncated on the interval \([a_i, b_i]\) is \(F_i(y_i) = \{F(y_i)-F(a_i)\}/\{F(b_i) - F(a_i)\}\), so the data arise from different distributions even if these share common parameters. One way out of this conundrum is using the probability integral transform and the quantile transform to map observations to the uniform scale and back onto the data scale. Taking \(\widetilde{F}(y_i) = F_n(y_i)=\mathrm{rank}(y_i)/(n+1)\) to denote the empirical distribution function estimator, a probability-probability plot would show \(x_i = \widetilde{F}_{i}(y_i)\) against \(y_i = F_i(y_i)\), leading to approximately uniform samples if the parametric distribution \(F\) is suitable. Another option is to standardize the observation, taking the collection \(\widetilde{y}_i=F^{-1}\{F_i(y_i)\}\) of rescaled exceedances and comparing them to the usual plotting positions \(x_{(i)} =\{i/(n+1)\}\). The drawback of the latter approach is that the quantities displayed on the \(y\)-axis are not raw observations and the ranking of the empirical quantiles may change, a somewhat counter-intuitive feature. However, this means that the sample \(\{F_i(y_i)\}\) should be uniform under the null hypothesis, and this allows one to use methods from Säilynoja et al. (2022) to obtain point-wise and simultaneous confidence intervals.
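Both constructions can be made concrete with a short base R sketch, here with simulated doubly truncated exponential data and known parameters (the package itself handles estimation and the other censoring schemes):

```r
set.seed(42)
n <- 250; rate <- 0.5
a <- runif(n, 0, 2); b <- a + 3 # truncation bounds [a_i, b_i]
# Simulate doubly truncated exponential lifetimes by inversion
y <- qexp(pexp(a, rate) + runif(n) * (pexp(b, rate) - pexp(a, rate)), rate)
# Distribution function of Y_i truncated on [a_i, b_i]
Fi <- (pexp(y, rate) - pexp(a, rate)) / (pexp(b, rate) - pexp(a, rate))
# P-P plot: the F_i(y_i) should be approximately uniform under the null
plot(ppoints(n), sort(Fi), xlab = "uniform plotting positions",
     ylab = "probability integral transform")
abline(0, 1)
# Q-Q plot of standardized observations ytilde_i = F^{-1}{F_i(y_i)}
ytilde <- qexp(Fi, rate)
plot(qexp(ppoints(n), rate), sort(ytilde),
     xlab = "theoretical quantiles", ylab = "standardized observations")
abline(0, 1)
```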


longevity offers users the choice between three types of quantile-quantile plots: regular (Q-Q, "qq"), Tukey’s mean detrended Q-Q plots ("tmd") and exponential Q-Q plots ("exp"). Other options on the uniform scale are probability-probability (P-P, "pp") plots and empirically rescaled plots (ERP, "erp") (Waller and Turnbull 1992), designed to ease interpretation with censored observations by rescaling axes. We illustrate the graphical tools with the Dutch data above age 105 in Figure 3. The fit is adequate above 110, but there is a notable dip due to excess mortality around 109.

fit_dutch <- fit_elife(
  arguments = dutch_data,
  event = 3,
  type = "interval2",
  family = "gp",
  thresh = 105,
  export = TRUE)
par(mfrow = c(1, 2))
plot(fit_dutch,
   which.plot = c("pp", "qq"))
+

Figure 3: Probability-probability and quantile-quantile plots for the generalized Pareto model fitted above age 105 years to the Dutch data. The plots indicate broadly good agreement with the observations, except for some individuals who died aged 109, too many of whom have deaths close to their birthdates.


Censored observations are used to compute the plotting positions, but are not displayed. As such, we cannot use graphical goodness of fit diagnostics for the Japanese interval censored data. An alternative, given that the data are tabulated in a contingency table, is to use a chi-squared test for independence, conditioning on the number of individuals per birth cohort. The expected number in each cell (birth cohort and age band) can be obtained by computing the conditional probability of falling in that age band. The asymptotic null distribution should be \(\chi^2\) with \((k-1)(p-1)\) degrees of freedom, where \(k\) is the number of age bands and \(p\) the number of birth cohorts. In finite samples, the expected counts for large excess lifetimes are very low, so one can expect the \(\chi^2\) approximation to be poor. To mitigate this, we can pool observations and resort to simulation to approximate the null distribution of the test statistic. The bootstrap \(p\)-value for the exponential model above 108 years, pooling observations with excess lifetime of 5 years and above, is 0.872, indicating no evidence that the model is inadequate, but the test here may have low power.
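For a generic analogue in base R, `chisq.test` with `simulate.p.value = TRUE` replaces the \(\chi^2\) approximation by a Monte Carlo null distribution; the table below is made up for illustration, and note that this variant conditions on both margins, whereas the procedure described above uses model-based cell probabilities:

```r
# Hypothetical counts by birth cohort (rows) and excess-lifetime band
# (columns), after pooling sparse cells for large excess lifetimes
tab <- matrix(c(120, 45, 12, 5,
                100, 50, 15, 4), nrow = 2, byrow = TRUE,
              dimnames = list(cohort = c("1900-09", "1910-19"),
                              band = c("0-1", "1-2", "2-4", "5+")))
# Monte Carlo null distribution instead of the chi-squared approximation
chisq.test(tab, simulate.p.value = TRUE, B = 9999)
```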


2.6 Stratification


Demographers may suspect differences between individuals of different sex, from different countries or geographic areas, or by birth cohort. All of these are instances of categorical covariates. One possibility is to incorporate these covariates through the parameters with a suitable link function, but we consider instead stratification. We can split the data by levels of the covariate (encoded as factors) into subsamples and compare the goodness of fit of the \(K\) stratum-specific models relative to the model that pools all observations. The test_elife function performs likelihood ratio tests for these comparisons. The test statistic is \(-2\{\ell(\widehat{\boldsymbol{\theta}}_0) - \ell(\widehat{\boldsymbol{\theta}})\}\), where \(\widehat{\boldsymbol{\theta}}_0\) is the maximum likelihood estimator under the null model with common parameters, and \(\widehat{\boldsymbol{\theta}}\) is the unrestricted maximum likelihood estimator for the alternative model with the same distribution, but which allows for stratum-specific parameters. We illustrate this with a generalized Pareto model for the excess lifetime. The null hypothesis is \(\mathrm{H}_0: \sigma_{\texttt{f}} = \sigma_{\texttt{m}}, \xi_{\texttt{f}}=\xi_{\texttt{m}}\) against the alternative that at least one equality doesn’t hold, so that the hazards and endpoints differ.

print(
  test_elife(
    arguments = args_japan,
    thresh = 110,
    family = "gp",
    covariate = japanese2$gender)
)

#> Model: generalized Pareto distribution. 
#> Threshold: 110 
#> Number of exceedances per covariate level:
#> female   male 
#>    642     61 
#> 
#> Likelihood ratio statistic: 0.364
#> Null distribution: chi-square (2)
#> Asymptotic p-value:  0.833

In the present example, there is no evidence of a difference in the lifetime distributions of males and females; this is unsurprising given the large imbalance between counts for each covariate level, with far fewer males than females.


2.7 Extrapolation


If the maximum likelihood estimator of the shape \(\xi\) for the generalized Pareto model is negative, then the distribution has a finite upper endpoint; otherwise, the latter is infinite. With \(\xi < 0\), we can look at the profile log likelihood for the endpoint \(\eta = -\sigma/\xi\), using the function prof_gp_endpt, to draw the curve and obtain confidence intervals. The argument psi is used to give a grid of values over which to compute the profile log likelihood. The bounds of the \((1-\alpha)\) confidence intervals are obtained by fitting a cubic smoothing spline for \(y=\eta\) as a function of the shifted profile curve \(x = 2\{\ell_p(\eta)-\ell_p(\widehat{\eta})\}\) on both sides of the maximum likelihood estimator and predicting the value of \(y\) when \(x = -\chi^2_1(1-\alpha)\). This technique works well unless the profile is nearly flat or the bounds lie beyond the range of values of psi provided; in the latter case, the user may wish to extend the grid. If \(\widehat{\xi} \approx 0\), then the upper bound of the confidence interval may be infinite and the profile log likelihood may never reach the cutoff value of the asymptotic \(\chi^2_1\) distribution.
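The spline-based inversion of the profile can be sketched in base R as follows, with a stand-in quadratic profile and illustrative endpoint values in place of the true curve:

```r
eta <- seq(115, 200, length.out = 101)  # grid of endpoint values (psi)
ll <- -0.5 * ((eta - 128) / 15)^2       # stand-in for the profile log likelihood
x <- 2 * (ll - max(ll))                 # shifted profile, zero at the MLE
right <- eta >= eta[which.max(ll)]      # branch above the MLE
# Fit eta as a smooth function of the shifted profile, then predict
# the endpoint value at the chi-square(1) cutoff to get the upper bound
fit <- smooth.spline(x = x[right], y = eta[right])
upper <- predict(fit, x = -qchisq(0.95, df = 1))$y
```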


The profile log likelihood curve for the endpoint, shifted vertically so that its value is zero at the maximum likelihood estimator, highlights the marked asymmetry of the distribution of \(\eta\), shown in Figure 4, with the horizontal dashed lines showing the limits for the 95% profile likelihood confidence intervals. These suggest that the endpoint, or a potential finite lifespan, could lie very much beyond observed records. The routine used to calculate the upper bound computes the cutoff value by fitting a smoothing spline with the role of the \(y\) and \(x\) axes reversed and by predicting the value of \(\eta\) at the cutoff. In this example, the upper confidence limit is extrapolated from the model: more accurate measures can be obtained by specifying a longer and finer sequence of values of psi such that the profile log likelihood drops below the \(\chi^2_1\) quantile cutoff.

# Create grid of threshold values
thresholds <- 105:110
# Grid of values at which to evaluate profile
psi <- seq(120, 200, length.out = 101)
# Calculate the profile for the endpoint
# of the generalized Pareto at each threshold
endpt_tstab <- do.call(
  endpoint.tstab,
  args = c(
    args_japan,
    list(psi = psi,
         thresh = thresholds,
         plot = FALSE)))
# Compute corresponding confidence intervals
profile <- endpoint.profile(
  arguments = c(args_japan, list(thresh = 110, psi = psi)))
# Plot point estimates and confidence intervals
g1 <- autoplot(endpt_tstab, plot = FALSE, ylab = "lifespan (in years)")
# Plot the profile curve with cutoffs for conf. int. for 110
g2 <- autoplot(profile, plot = FALSE)
patchwork::wrap_plots(g1, g2)

Figure 4: Maximum likelihood estimates with 95% confidence intervals as a function of threshold (left) and profile likelihood for exceedances above 110 years (right) for Japanese centenarian data. As the threshold increases, the number of exceedances decreases and the intervals for the upper bound become wider. At 110, the right endpoint of the confidence interval extends to infinity.


Depending on the model, the conclusions about the risk of mortality change drastically: the Gompertz model implies an ever increasing hazard, but no finite endpoint for the distribution of exceedances. The exponential model implies a constant hazard and no endpoint. By contrast, the generalized Pareto can accommodate both finite and infinite endpoints. The marked asymmetry of the distribution of lifespan defined by the generalized Pareto shows that inference obtained using symmetric confidence intervals (i.e., Wald-based) is likely very misleading: the drop in fit from having a zero or positive shape parameter \(\xi\) is seemingly smaller than the cutoffs for a 95% confidence interval, suggesting that while the best point estimate is around 128 years, the upper bound is so large (and extrapolated) that everything is possible. The model however also suggests a very high probability of dying in any given year, regardless of whether the hazard is constant, decreasing or increasing.


2.8 Hazard


The parameters of the models are seldom of interest in themselves: rather, we may be interested in a summary such as the hazard function. At present, longevity does not allow general linear modelling of model parameters or time-varying covariates, but other software implementations can tackle this task. For example, casebase (Bhatnagar et al. 2022) fits flexible hazard models using logistic or multinomial regression, with potential inclusion of penalties for the parameters associated to covariates and spline effects. Another alternative is flexsurv (Jackson 2016), which offers 10 parametric models and allows for user-specified models. The bshazard package (Rebora et al. 2014) provides nonparametric smoothing via \(B\)-splines, whereas muhaz handles kernel-based hazard estimation for right censored data; both could be used for validation of the parametric models in the case of right censoring. The rstpm2 package (Liu et al. 2018) handles generalized modelling for censoring with the Royston–Parmar model built from natural cubic splines (Royston and Parmar 2002). Contrasting with all of the aforementioned approaches, we focus on parametric models: this is partly because there are few observations for the use case we consider and covariates, except perhaps for gender and birth year, are not available.


The hazard changes over time; the only notable exception is the exponential hazard, which is constant. longevity includes utilities for computing the hazard function from a fitted model object and computing point-wise confidence intervals using symmetric Wald intervals or the profile likelihood. Specifically, the hazard_elife function calculates the hazard \(h(t; \boldsymbol{\theta})\) point-wise at the times \(t\) supplied via argument x; Wald-based confidence intervals are obtained using the delta method, whereas profile likelihood intervals are obtained by reparametrizing the model in terms of \(h(t)\) for each time \(t\). More naturally perhaps, we can consider a Bayesian analysis of the Japanese excess lifetime above 108 years. Using the likelihood and the encoded log posterior provided in logpost_elife, we obtained independent samples from the posterior of the generalized Pareto parameters \((\sigma, \xi)\) with maximal data information prior using the rust package. Each parameter combination was then fed into helife and the hazard evaluated over a range of values. Figure 5 shows the posterior samples and functional boxplots (Sun and Genton 2011) of the hazard curves, obtained using the fda package. The risk of dying increases with age, but comes with substantial uncertainty, as evidenced by the increasing width of the boxes and interquartile range.
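For reference, the generalized Pareto hazard has the simple closed form \(h(t) = (\sigma + \xi t)^{-1}\) on its support, which can be coded directly; this small helper is for illustration only and is not the package's hazard_elife:

```r
# Hazard of the generalized Pareto: h(t) = 1 / (scale + shape * t),
# defined for t below the endpoint -scale/shape when shape < 0;
# shape = 0 recovers the constant exponential hazard 1/scale
gp_hazard <- function(t, scale, shape) {
  stopifnot(scale > 0, all(t >= 0))
  h <- 1 / (scale + shape * t)
  h[1 + shape * t / scale <= 0] <- NA # beyond the support
  h
}
# With shape < 0, the hazard increases without bound near the endpoint
gp_hazard(0:5, scale = 1.5, shape = -0.1)
```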


Figure 5: Left: scatterplot of 1000 independent posterior samples from the generalized Pareto model with maximal data information prior; the contour curves give the percentiles of credible intervals, and show approximate normality of the posterior. Right: functional boxplots for the corresponding hazard curves, with increasing width at higher ages.


2.9 Nonparametric maximum likelihood estimation


The nonparametric maximum likelihood estimator is unique only up to equivalence classes. The data for individual \(i\) consist of the tuple \(\{L_i, R_i, V_i, U_i\}\), where the censoring interval is \([L_i, R_i]\) and the truncation interval is \([V_i, U_i]\), with \(0 \leq V_i \leq L_i \leq R_i \leq U_i \leq \infty\). Turnbull (1976) shows how one can build disjoint intervals \(C = \bigsqcup_{j=1}^m [a_j, b_j]\), where \(a_j \in \mathcal{L} = \{L_1, \ldots, L_n\}\) and \(b_j \in \mathcal{R} = \{R_1, \ldots, R_n\}\) satisfy \(a_1 \leq b_1 < \cdots < a_m \leq b_m\), and the intervals \([a_j, b_j]\) contain no other members of \(\mathcal{L}\) or \(\mathcal{R}\) except at the endpoints. This last condition notably ensures that, in the absence of truncation, the intervals created include all observed failure times as singleton sets. Other authors (Lindsey and Ryan 1998) have taken interval censored data as semi-open intervals \((L_i, R_i]\), a convention we adopt here for numerical reasons. For interval censored and truncated data, Frydman (1994) shows that this construction must be amended by taking instead \(a_j \in \mathcal{L} \cup \{U_1, \ldots, U_n\}\) and \(b_j \in \mathcal{R} \cup \{V_1, \ldots, V_n\}\).


We assign probability \(p_j = F(b_j^{+}) - F(a_j^{-}) \ge 0\) to each of the resulting \(m\) intervals under the constraints \(\sum_{j=1}^m p_j = 1\) and \(p_j \ge 0\) \((j=1, \ldots, m)\). The nonparametric maximum likelihood estimator of the distribution function \(F\) is then
\[\begin{align*}
\widehat{F}(t) = \begin{cases} 0, & t < a_1;\\
\widehat{p}_1 + \cdots + \widehat{p}_j, & b_j < t < a_{j+1} \quad (1 \leq j \leq m-1);\\
1, & t > b_m;
\end{cases}
\end{align*}\]
and is undefined for \(t \in [a_j, b_j]\) \((j=1, \ldots, m)\).


Figure 6: Illustration of the truncation (pale grey) and censoring (dark grey) interval equivalence classes based on Turnbull’s algorithm. Observations must fall within equivalence classes defined by the former.


The procedure of Turnbull can be encoded using \(n \times m\) matrices. For censoring, we build \(\mathbf{A}\) whose \((i,j)\)th entry is \(\alpha_{ij}=1\) if \([a_j, b_j] \subseteq A_i\), the censoring interval of observation \(i\), and zero otherwise. Since the intervals forming the set \(C\) are disjoint and in increasing order, a more storage efficient way of keeping track of the intervals is to find, for each observation, the smallest integer \(j\) such that \(L_i \leq a_j\) and the largest \(j\) such that \(b_j \leq R_i\). The same idea applies for the truncation sets \(B_i = (V_i, U_i)\) and the matrix \(\mathbf{B}\) with \((i,j)\) element \(\beta_{ij}\).


The log likelihood function is
\[\begin{align*}
\ell(\boldsymbol{p}) = \sum_{i=1}^n w_i \log \left( \sum_{j=1}^m \alpha_{ij}p_j\right) - w_i\log \left( \sum_{j=1}^m \beta_{ij}p_j\right),
\end{align*}\]
where \(w_i\) denotes the weight attached to observation \(i\).


The numerical implementation of the EM is in principle straightforward: first identify the equivalence classes \(C\), next calculate the entries of \(A_i\) and \(B_i\) (or the vectors of ranges), and finally run the EM algorithm. In the second step, we need to account for potential ties in the presence of (interval) censoring and treat the intervals as open on the left for censored data. For concreteness, consider the toy example \(\boldsymbol{T} =(1,1,2)\) and \(\boldsymbol{\delta} = (0,1,1)\), where \(\delta_i = 1\) if the observation is a failure time and \(\delta_i=0\) in case of right censoring. The left and right censoring bounds are \(\mathcal{L} = \{1, 2\}\) and \(\mathcal{R} = \{1, 2, \infty\}\) with \(A_1 = (1, \infty)\), \(A_2 = \{1\}\) and \(A_3 = \{2\}\), and \(C_1=\{1\}, C_2=\{2\}\). If we were to treat instead \(A_1\) as a semi-closed interval \([1, \infty)\), direct maximization of the log likelihood in eq. 2.2 of Turnbull (1976) would give probability half to each observed failure time. By contrast, the Kaplan–Meier estimator, under the convention that right censored observations at time \(t\) were at risk up to and until \(t\), assigns probability 1/3 to the first failure. To retrieve this solution with Turnbull’s EM estimator, we need the convention that \(C_1 \notin A_1\), but this requires comparing the bound with itself. The numerical tolerance in the implementation is taken to be the square root of the machine epsilon for doubles.
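A minimal self-consistency (EM) update for the censoring-only case (all \(\beta_{ij} = 1\)) can be written in a few lines of base R; applied to the toy example with \(A_1\) treated as the closed interval \([1, \infty)\), it recovers the equal-probability solution mentioned above:

```r
# One EM step for interval censored data without truncation:
# A is the n x m membership matrix (alpha), p the current probabilities
em_step <- function(A, p) {
  denom <- as.vector(A %*% p) # P(Y_i in A_i) under current p
  mu <- A * rep(p, each = nrow(A)) / denom # conditional class memberships
  colMeans(mu) # updated probability vector
}
# Toy data: A_1 = [1, Inf) (closed), A_2 = {1}, A_3 = {2};
# equivalence classes C_1 = {1}, C_2 = {2}
A <- rbind(c(1, 1), c(1, 0), c(0, 1))
p <- c(0.5, 0.5)
for (i in 1:100) p <- em_step(A, p)
p # converges to (1/2, 1/2), the closed-interval solution
```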


The maximum likelihood estimator (MLE) need not be unique, and the EM algorithm is only guaranteed to converge to a local maximum. For interval censored data, Gentleman and Geyer (1994) consider using the Karush–Kuhn–Tucker conditions to determine whether the probability in some intervals is exactly zero and whether the returned value is indeed the MLE.


Due to data scarcity, statistical inference for human lifespan is best conducted using parametric models supported by asymptotic theory, reserving nonparametric estimators to assess goodness of fit. The empirical cumulative hazard for the Japanese data is very close to linear from early ages, suggesting that the hazard may not be very far from exponential, even if more complex models are likely to be favored given the large sample size.


The function np_elife returns a list with Turnbull’s intervals \([a_j, b_j]\) and the probability weights assigned to each, provided the latter are positive. It also contains an object of class stepfun with a weighting argument that defines a cumulative distribution function.


We can use the nonparametric maximum likelihood estimator of the distribution function to assess a fitted parametric model by comparing the density with a binned distribution, or the cumulative distribution function. The distribution function is defined over equivalence classes, which may be isolated observations or intervals. The data interval over which there is non-zero probability of events is broken down into equispaced bins, and the probability of failing in each bin is estimated nonparametrically from the distribution function. Alternatively, users can also provide a set of breaks, or the number of bins.

ecdf <- np_elife(arguments = args_japan, thresh = 108)
# Summary statistics, accounting for censoring
round(summary(ecdf), digits = 2)

#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>  109.00  110.00  111.00  110.01  111.00  118.00

# Plots of fitted parametric model and nonparametric CDF
model_gp <- fit_elife(
  arguments = args_japan,
  thresh = 108,
  family = "gp",
  export = TRUE)
# ggplot2 plots, wrapped to display side by side
patchwork::wrap_plots(
  autoplot(
    model_gp, # fitted model
    plot = FALSE, # return list of ggplots
    which.plot = c("dens", "cdf"),
    breaks = seq(0L, 8L, by = 1L) # set bins for histogram
  )
)
+

Figure 7: Nonparametric maximum likelihood estimate of the density (bar plot, left) and distribution function (staircase function, right), with superimposed generalized Pareto fit for excess lifetimes above 108 years. Except for the discreteness inherent to the nonparametric estimates, the two representations broadly agree at year marks.


3 Conclusion


This paper describes the salient features of longevity, explaining the theoretical underpinning of the methods and the design considerations followed when writing the package. While longevity was conceived for modelling lifetimes, the package could be used for applications outside of demography. Censored and truncated data are infrequent in extreme value theory, yet hardly absent. For example, rainfall observations can be viewed as rounded due to the limited instrumental precision of rain gauges and treated as interval-censored. Some historical records, which are often lower bounds on the real magnitude of natural catastrophes, can be added as right-censored observations in a peaks-over-threshold analysis. In insurance, losses incurred by the company due to liability claims may be right-censored if they exceed the policy cap and are covered by a reinsurance company, or rounded (Belzile and Nešlehová 2025). In climate science, attribution studies often focus on data just after a record-breaking event, and the stopping rule leads to truncation (Miralles and Davison 2023), which biases results if ignored.

+

The package has features that are interesting in their own right, including adapted quantile-quantile plots and other visual goodness-of-fit diagnostics for model validation. The testing procedures correctly handle tests for restrictions lying on the boundary of the parameter space. Parametric bootstrap procedures for such tests are not straightforward to implement, given the heavy reliance on the data generating mechanism and the diversity of possible scenarios: this paper, however, shows how the utilities of the package can be coupled to ease such estimation, and the code in the supplementary material illustrates how this can be extended to goodness-of-fit testing.
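To make the bootstrap idea above concrete, the following self-contained base-R sketch (not the package's own code; the helper names `nll_gomp` and `lrt_stat` are hypothetical) bootstraps the likelihood ratio test of an exponential null against a Gompertz alternative, whose shape parameter sits on the boundary of the parameter space under the null:

```r
# Sketch of a parametric bootstrap for a boundary likelihood ratio test:
# exponential null versus Gompertz alternative with shape beta >= 0.
set.seed(2025)

# Negative log likelihood of the Gompertz model with hazard
# h(t) = exp(beta * t / sigma) / sigma; the limit beta -> 0 is exponential.
nll_gomp <- function(par, y) {
  sigma <- par[1]; beta <- par[2]
  if (sigma <= 0 || beta < 0) return(1e10)
  if (beta < 1e-8) return(length(y) * log(sigma) + sum(y) / sigma)
  -sum(-log(sigma) + beta * y / sigma - expm1(beta * y / sigma) / beta)
}

lrt_stat <- function(y) {
  # Exponential MLE is the sample mean, with closed-form max log likelihood.
  ll0 <- -length(y) * (log(mean(y)) + 1)
  fit <- optim(c(mean(y), 0.1), nll_gomp, y = y, method = "L-BFGS-B",
               lower = c(1e-4, 0))
  max(0, 2 * (-fit$value - ll0))  # statistic is zero when beta-hat = 0
}

y <- rexp(250, rate = 1 / 1.5)   # simulated excess lifetimes
obs <- lrt_stat(y)
# Simulate the null distribution from the fitted exponential and refit both.
boot <- replicate(199, lrt_stat(rexp(length(y), rate = 1 / mean(y))))
p_boot <- (1 + sum(boot >= obs)) / 200
```

The same recipe extends to any nested pair in the package: simulate from the fitted null model, refit both models, and compare the observed statistic to the bootstrap replicates.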

+

Acknowledgements

+

The author thanks four anonymous reviewers for valuable feedback. This research was supported financially by the Natural Sciences and Engineering Research Council of Canada (NSERC) via Discovery Grant RGPIN-2022-05001.

+
+

3.1 Supplementary materials

+

Supplementary materials are available in addition to this article. They can be downloaded at RJ-2025-034.zip

+

3.2 CRAN packages used

+

longevity, evd, mev, extRemes, evgam, survival, MortCast, demography, MortalityLaws, vitality, fitdistrplus, VGAM, prodlim, icenReg, dblcens, interval, tranSurv, DTDA, Rsolnp, ggplot2, casebase, flexsurv, bshazard, muhaz, rstpm2, rust, fda

+

3.3 CRAN Task Views implied by cited packages

+

ActuarialScience, ChemPhys, ClinicalTrials, Distributions, Econometrics, Environmetrics, ExtremeValue, FunctionalData, MissingData, NetworkAnalysis, Optimization, Phylogenetics, Psychometrics, Spatial, Survival, TeachingStatistics

+

3.4 Bioconductor packages used

+

Icens

+
+
+J. J. Anderson. A vitality-based model relating stressors and environmental properties to organism survival. Ecological Monographs, 70(3): 445–470, 2000. URL https://doi.org/10.1890/0012-9615(2000)070[0445:AVBMRS]2.0.CO;2. +
+
+C. Anderson-Bergman. An efficient implementation of the EMICM algorithm for the interval censored NPMLE. Journal of Computational and Graphical Statistics, 26(2): 463–467, 2017a. URL https://doi.org/10.1080/10618600.2016.1208616. +
+
+C. Anderson-Bergman. icenReg: Regression models for interval censored data in R. Journal of Statistical Software, 81(12): 1–23, 2017b. URL https://doi.org/10.18637/jss.v081.i12. +
+
+E. Barbi, F. Lagona, M. Marsili, J. W. Vaupel and K. W. Wachter. The plateau of human mortality: Demography of longevity pioneers. Science, 360(6396): 1459–1461, 2018. URL https://doi.org/10.1126/science.aat3119. +
+
+R. E. Beard. A theory of mortality based on actuarial, biological and medical considerations. In Proceedings of the International Population Conference, New York, 1963. London: International Union for the Scientific Study of Population.
+
+L. R. Belzile et al. mev: Modelling extreme values. 2023a. URL https://CRAN.R-project.org/package=mev. R package version 1.15. +
+
+L. R. Belzile, A. C. Davison, J. Gampe, H. Rootzén and D. Zholud. Is there a cap on longevity? A statistical review. Annual Review of Statistics and its Application, 9: 22–45, 2022. URL https://doi.org/10.1146/annurev-statistics-040120-025426. +
+
+L. R. Belzile, C. Dutang, P. J. Northrop and T. Opitz. A modeler’s guide to extreme value software. Extremes, 26: 595–638, 2023b. URL https://doi.org/10.1007/s10687-023-00475-9. +
+
+L. R. Belzile and J. G. Nešlehová. Statistics of extremes for incomplete data, with application to lifetime and liability claim modeling. In Handbook on statistics of extremes, Eds M. de Carvalho, R. Huser, P. Naveau and B. J. Reich. In press, 2025. CRC Press.
+
+S. R. Bhatnagar, M. Turgeon, J. Islam, J. A. Hanley and O. Saarela. casebase: An alternative framework for survival analysis and comparison of event rates. The R Journal, 14: 59–79, 2022. URL https://doi.org/10.32614/RJ-2022-052. +
+
+C. G. Camarda. The curse of the plateau. Measuring confidence in human mortality estimates at extreme ages. Theoretical Population Biology, 144: 24–36, 2022. URL https://doi.org/10.1016/j.tpb.2022.01.002. +
+
+H. Chernoff. On the distribution of the likelihood ratio. The Annals of Mathematical Statistics, 25(3): 573–578, 1954. URL https://doi.org/10.1214/aoms/1177728725. +
+
+S. H. (Steven) Chiou and J. Qian. tranSurv: Transformation model based estimation of survival and regression under dependent truncation and independent censoring. 2021. URL https://CRAN.R-project.org/package=tranSurv. R package version 1.2.2. +
+
+M. L. Delignette-Muller and C. Dutang. fitdistrplus: An R package for fitting distributions. Journal of Statistical Software, 64(4): 1–34, 2015. DOI 10.18637/jss.v064.i04. +
+
+L. Devroye. Non-uniform random variate generation. New York: Springer, 1986. URL http://www.nrbook.com/devroye/. +
+
+B. Efron and V. Petrosian. Nonparametric methods for doubly truncated data. Journal of the American Statistical Association, 94(447): 824–834, 1999. URL https://doi.org/10.1080/01621459.1999.10474187. +
+
+J. J. Einmahl, J. H. J. Einmahl and L. de Haan. Limits to human life span through extreme value theory. Journal of the American Statistical Association, 114(527): 1075–1080, 2019. URL https://doi.org/10.1080/01621459.2018.1537912. +
+
+M. P. Fay and P. A. Shaw. Exact and asymptotic weighted logrank tests for interval censored data: The interval R package. Journal of Statistical Software, 36(2): 1–34, 2010. URL https://doi.org/10.18637/jss.v036.i02. +
+
+H. Frydman. A note on nonparametric estimation of the distribution function from interval-censored and truncated observations. Journal of the Royal Statistical Society. Series B (Methodological), 56(1): 71–74, 1994. URL https://doi.org/10.1111/j.2517-6161.1994.tb01960.x. +
+
+R. Gentleman and C. J. Geyer. Maximum likelihood for interval censored data: Consistency and computation. Biometrika, 81(3): 618–623, 1994. URL https://doi.org/10.1093/biomet/81.3.618. +
+
+R. Gentleman and A. Vandal. Icens: NPMLE for censored and truncated data. 2024. URL https://bioconductor.org/packages/Icens/. R package version 1.76.0.
+
+T. A. Gerds. prodlim: Product-limit estimation for censored event history analysis. 2024. URL https://CRAN.R-project.org/package=prodlim. R package version 2024.06.25. +
+
+A. Ghalanos and S. Theussl. Rsolnp: General non-linear optimization using augmented Lagrange multiplier method. 2015. URL https://CRAN.R-project.org/package=Rsolnp. R package version 1.16. +
+
+E. Gilleland and R. W. Katz. extRemes 2.0: An extreme value analysis package in R. Journal of Statistical Software, 72(8): 1–39, 2016. DOI 10.18637/jss.v072.i08. +
+
+P. Groeneboom and J. A. Wellner. Information bounds and nonparametric maximum likelihood estimation. Basel, Switzerland: Birkhäuser, 1992. URL https://doi.org/10.1007/978-3-0348-8621-5.
+
+G. Grolemund and H. Wickham. Dates and times made easy with lubridate. Journal of Statistical Software, 40(3): 1–25, 2011. URL https://www.jstatsoft.org/v40/i03/. +
+
+K. Hess and R. Gentleman. muhaz: Hazard function estimation in survival analysis. 2021. URL https://CRAN.R-project.org/package=muhaz. R package version 1.2.6.4. +
+
+R. Hyndman. demography: Forecasting mortality, fertility, migration and population data. 2023. URL https://pkg.robjhyndman.com/demography/. R package version 2.0. +
+
+C. Jackson. flexsurv: A platform for parametric survival modeling in R. Journal of Statistical Software, 70(8): 1–33, 2016. URL https://www.jstatsoft.org/index.php/jss/article/view/v070i08. +
+
+D. A. Jdanov, V. M. Shkolnikov and S. Gellers-Barkmann. The international database on longevity: Data resource profile. In Exceptional lifespans, Eds H. Maier, B. Jeune and J. W. Vaupel, pages 22–24, 2021. Cham, Switzerland: Springer. URL https://doi.org/10.1007/978-3-030-49970-9_2.
+
+E. L. Kaplan and P. Meier. Nonparametric estimation from incomplete observations. Journal of the American Statistical Association, 53: 457–481, 1958. URL https://doi.org/10.1080/01621459.1958.10501452. +
+
+T. Li and J. J. Anderson. The vitality model: A way to understand population survival and demographic heterogeneity. Theoretical Population Biology, 76(2): 118–131, 2009. URL https://doi.org/10.1016/j.tpb.2009.05.004. +
+
+J. C. Lindsey and L. M. Ryan. Methods for interval-censored data. Statistics in Medicine, 17(2): 219–238, 1998. URL https://doi.org/10.1002/(SICI)1097-0258(19980130)17:2<219::AID-SIM735>3.0.CO;2-O. +
+
+X.-R. Liu, Y. Pawitan and M. Clements. Parametric and penalized generalized survival models. Statistical Methods in Medical Research, 27(5): 1531–1546, 2018. URL https://doi.org/10.1177/0962280216664760. +
+
+H. Maier, B. Jeune and J. W. Vaupel, eds. Exceptional lifespans. Cham, Switzerland: Springer, 2021. URL https://doi.org/10.1007/978-3-030-49970-9. +
+
+O. Miralles and A. C. Davison. Timing and spatial selection bias in rapid extreme event attribution. Weather and Climate Extremes, 41: 100584, 2023. URL https://doi.org/10.1016/j.wace.2023.100584. +
+
+C. Moreira, J. de Uña-Álvarez and R. Crujeiras. DTDA: Doubly truncated data analysis. 2022. URL https://CRAN.R-project.org/package=DTDA. R package version 3.0.1.
+
+P. J. Northrop. rust: Ratio-of-uniforms simulation with transformation. 2023. URL https://paulnorthrop.github.io/rust/. R package version 1.4.2, https://github.com/paulnorthrop/rust. +
+
+P. J. Northrop and C. L. Coleman. Improved diagnostic plots for extreme value analyses. Extremes, 17: 289–303, 2014. URL https://doi.org/10.1007/s10687-014-0183-z. +
+
+M. D. Pascariu. MortalityLaws: Parametric mortality models, life tables and HMD. 2024. URL https://CRAN.R-project.org/package=MortalityLaws. R package version 2.1.0. +
+
+G. Passolt, J. J. Anderson, T. Li, D. H. Salinger and D. J. Sharrow. vitality: Fitting routines for the vitality family of mortality models. 2018. URL https://CRAN.R-project.org/package=vitality. R package version 1.3. +
+
+W. Perks. On some experiments in the graduation of mortality statistics. Journal of the Institute of Actuaries, 63(1): 12–57, 1932. DOI 10.1017/s0020268100046680. +
+
+J. Ramsay. fda: Functional data analysis. 2024. URL http://www.functionaldata.org. R package version 6.1.8. +
+
+P. Rebora, A. Salim and M. Reilly. bshazard: A flexible tool for nonparametric smoothing of the hazard function. The R Journal, 6(2): 114–122, 2014. URL https://doi.org/10.32614/RJ-2014-028. +
+
+S. J. Richards. A handbook of parametric survival models for actuarial use. Scandinavian Actuarial Journal, 2012(4): 233–257, 2012. URL https://doi.org/10.1080/03461238.2010.506688. +
+
+A. Rotnitzky, D. R. Cox, M. Bottai and J. Robins. Likelihood-based inference with singular information matrix. Bernoulli, 6(2): 243–284, 2000. URL https://doi.org/10.2307/3318576. +
+
+P. Royston and M. K. B. Parmar. Flexible parametric proportional-hazards and proportional-odds models for censored survival data, with application to prognostic modelling and estimation of treatment effects. Statistics in Medicine, 21(15): 2175–2197, 2002. URL https://doi.org/10.1002/sim.1203. +
+
+T. Säilynoja, P.-C. Bürkner and A. Vehtari. Graphical test for discrete uniformity and its applications in goodness of fit evaluation and multiple sample comparison. Statistics and Computing, 32(2): 32, 2022. URL https://doi.org/10.1007/s11222-022-10090-6. +
+
+S. G. Self and K.-Y. Liang. Asymptotic properties of maximum likelihood estimators and likelihood ratio tests under nonstandard conditions. Journal of the American Statistical Association, 82(398): 605–610, 1987. URL https://doi.org/10.1080/01621459.1987.10478472. +
+
+H. Ševčíková, N. Li, V. Kantorová, P. Gerland and A. E. Raftery. Age-specific mortality and fertility rates for probabilistic population projections. In Dynamic demographic analysis, Ed R. Schoen, pages 285–310, 2016. Cham, Switzerland: Springer. ISBN 978-3-319-26603-9. URL https://doi.org/10.1007/978-3-319-26603-9_15.
+
+H. Ševčı́ková, N. Li and P. Gerland. MortCast: Estimation and projection of age-specific mortality rates. 2022. URL https://CRAN.R-project.org/package=MortCast. R package version 2.7-0. +
+
+P.-S. Shen. Nonparametric analysis of doubly truncated data. Annals of the Institute of Statistical Mathematics, 62(5): 835–853, 2010. URL https://doi.org/10.1007/s10463-008-0192-2. +
+
+A. G. Stephenson. evd: Extreme value distributions. R News, 2(2), 2002. URL https://cran.r-project.org/doc/Rnews/Rnews_2002-2.pdf.
+
+Y. Sun and M. G. Genton. Functional boxplots. Journal of Computational and Graphical Statistics, 20(2): 316–334, 2011. DOI 10.1198/jcgs.2011.09224. +
+
+T. M. Therneau. A package for survival analysis in R. 2022. URL https://CRAN.R-project.org/package=survival. R package version 3.3-1. +
+
+T. M. Therneau and P. M. Grambsch. Modeling survival data: Extending the Cox model. New York: Springer, 2000. URL https://doi.org/10.1007/978-1-4757-3294-8. +
+
+W.-Y. Tsai, N. P. Jewell and M.-C. Wang. A note on the product-limit estimator under right censoring and left truncation. Biometrika, 74(4): 883–886, 1987. URL https://doi.org/10.1093/biomet/74.4.883. +
+
+B. W. Turnbull. The empirical distribution function with arbitrarily grouped, censored and truncated data. Journal of the Royal Statistical Society, Series B, 38: 290–295, 1976. URL https://doi.org/10.1111/j.2517-6161.1976.tb01597.x. +
+
+L. A. Waller and B. W. Turnbull. Probability plotting with censored data. American Statistician, 46(1): 5–12, 1992. URL https://doi.org/10.1080/00031305.1992.10475837. +
+
+S. Watanabe. Asymptotic equivalence of Bayes cross validation and widely applicable information criterion in singular learning theory. Journal of Machine Learning Research, 11: 3571–3594, 2010. URL https://dl.acm.org/doi/10.5555/1756006.1953045. +
+
+H. Wickham. ggplot2: Elegant graphics for data analysis. Springer-Verlag New York, 2016. URL https://ggplot2.tidyverse.org. +
+
+H. Wickham, M. Averick, J. Bryan, W. Chang, L. D. McGowan, R. François, G. Grolemund, A. Hayes, L. Henry, J. Hester, et al. Welcome to the tidyverse. Journal of Open Source Software, 4(43): 1686, 2019. URL https://doi.org/10.21105/joss.01686. +
+
+H. Wickham, R. François, L. Henry, K. Müller and D. Vaughan. dplyr: A grammar of data manipulation. 2023. URL https://CRAN.R-project.org/package=dplyr. R package version 1.1.4. +
+
+Y. Ye. Interior algorithms for linear, quadratic, and linearly constrained non-linear programming. PhD thesis, Stanford University, 1987.
+
+T. W. Yee. Vector generalized linear and additive models: With an implementation in R. New York, NY: Springer, 2015. URL https://doi.org/10.1007/978-1-4939-2818-7. +
+
+T. W. Yee and C. J. Wild. Vector generalized additive models. Journal of the Royal Statistical Society: Series B (Methodological), 58(3): 481–493, 1996. URL https://doi.org/10.1111/j.2517-6161.1996.tb02095.x. +
+
+B. D. Youngman. evgam: An R package for generalized additive extreme value models. Journal of Statistical Software, 103(3): 1–26, 2022. URL https://www.jstatsoft.org/index.php/jss/article/view/v103i03. +
+
+M. Zhou, L. Lee, K. Chen and Y. Yang. dblcens: Compute the NPMLE of distribution function from doubly censored data, plus the empirical likelihood ratio for \(F(T)\). 2023. URL https://CRAN.R-project.org/package=dblcens. R package version 1.1.9. +
+
+

References

+
+

Reuse

+

Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

+

Citation

+

For attribution, please cite this work as

+
Belzile, "longevity: An R Package for Modelling Excess Lifetimes", The R Journal, 2026
+

BibTeX citation

+
@article{RJ-2025-034,
+  author = {Belzile, Léo R.},
+  title = {longevity: An R Package for Modelling Excess Lifetimes},
+  journal = {The R Journal},
+  year = {2026},
+  note = {https://doi.org/10.32614/RJ-2025-034},
+  doi = {10.32614/RJ-2025-034},
+  volume = {17},
+  issue = {4},
+  issn = {2073-4859},
+  pages = {37-58}
+}
+
+ + + + + + + diff --git a/_articles/RJ-2025-034/RJ-2025-034.pdf b/_articles/RJ-2025-034/RJ-2025-034.pdf new file mode 100644 index 0000000000..7ba35a40a4 Binary files /dev/null and b/_articles/RJ-2025-034/RJ-2025-034.pdf differ diff --git a/_articles/RJ-2025-034/RJ-2025-034.tex b/_articles/RJ-2025-034/RJ-2025-034.tex new file mode 100644 index 0000000000..597231e36e --- /dev/null +++ b/_articles/RJ-2025-034/RJ-2025-034.tex @@ -0,0 +1,550 @@ +% !TeX root = RJwrapper.tex +\title{longevity: An R Package for Modelling Excess Lifetimes} + + +\author{by Léo R. Belzile} + +\maketitle + +\abstract{% +The longevity R package provides a maximum likelihood estimation routine for modelling of survival data that are subject to non-informative censoring or truncation. It includes a selection of 12 parametric models of varying complexity, with a focus on tools for extreme value analysis and more specifically univariate peaks over threshold modelling. The package provides utilities for univariate threshold selection, parametric and nonparametric maximum likelihood estimation, goodness of fit diagnostics and model comparison tools. These different methods are illustrated using individual Dutch records and aggregated Japanese human lifetime data. +} + +\section{Introduction}\label{introduction} + +Many data sets collected by demographers for the analysis of human longevity have unusual features and for which limited software implementations exist. The \CRANpkg{longevity} package was initially built for dealing with human longevity records and data from the International Database on Longevity (IDL), which provides age at death of supercentenarians, i.e., people who died above age 110. Data for the statistical analysis of (human) longevity can take the form of aggregated counts per age at death, or most commonly life trajectory of individuals with both birth and death dates. 
Such lifetimes are often interval truncated (only age at death of individuals dying between two calendar dates are recorded) or left truncated and right censored (when data of individuals still alive at the end of the collection period are also included). Another frequent format is death counts, aggregated per age band. Censoring and truncation are typically of administrative nature and thus non-informative about death. + +Supercentenarians are extremely rare and records are sparse. The most popular parametric models used by practitioners are justified by asymptotic arguments and have their roots in extreme value theory. Univariate extreme value distributions are well implemented in software and \citet{Belzile.Dutang.Northrop.Opitz:2022} provides a recent review of existing implementations. While there are many standard R packages for the analysis of univariate extremes using likelihood-based inference, such as \CRANpkg{evd} \citep{evd}, \CRANpkg{mev} and \CRANpkg{extRemes} \citep{extRemes}, only the \CRANpkg{evgam} package includes functionalities to fit threshold exceedance models with censoring, as showcased in \citet{evgam} with rounded rainfall measurements. Support for survival data for extreme value models is in general wholly lacking, which motivated the development of \CRANpkg{longevity}. + +The \CRANpkg{longevity} package also includes several parametric models commonly used in demography. Many existing packages that focus on tools for modelling mortality rates, typically through life tables, are listed in the CRAN Task View \ctv{ActuarialScience} in the Life insurance section. They do not however allow for truncation or more general survival mechanisms, as the aggregated data used are typically complete except for potential right censoring for the oldest age group. The \CRANpkg{survival} package \citep{survival-book} includes utilities for accelerated failure time models for 10 parametric models. 
The \CRANpkg{MortCast} package can be used to estimate age-specific mortality rates using the Kannisto and Lee--Carter approaches, among others \citep{Sevcikova:2016}. The \CRANpkg{demography} package provides forecasting methods for death rates based on constructed life tables using the Lee--Carter or ARIMA models. \CRANpkg{MortalityLaws} includes utilities to download data from the Human Mortality Database (HMD), and fit a total of 27 parametric models for life table data (death counts and population at risk per age group) using Poisson, binomial or alternative loss functions. The \CRANpkg{vitality} package fits the family of vitality models \citep{Li.Anderson:2009, Anderson:2000} via maximum likelihood based on empirical survival data. The \CRANpkg{fitdistrplus} \citep{fitdistrplus-package} allows for generic parametric distributions to be fitted to interval censored data via maximum likelihood, with various S3 methods for model assessment. The package also allows user-specified models, thereby permitting custom definitions for truncated distributions whose truncation bounds are passed as fixed vector parameters. Parameter uncertainty can be obtained via additional functions using nonparametric bootstrap. Many parametric distributions also appear in the \CRANpkg{VGAM} package \citep{Yee.Wild:1996, VGAMbook}, which allows for vector generalized linear modelling. The \CRANpkg{longevity} package is less general and offers support only for selected parametric distributions, but contrary to the aforementioned packages allows for truncation and general patterns. One strength of \CRANpkg{longevity} is that it also includes model comparison tools that account for non-regular asymptotics and goodness of fit diagnostics. + +Nonparametric methods are popular tools for the analysis of survival data with large samples, owing to their limited set of assumptions. They also serve for the validation of parametric models. 
Without explanatory variables, a closed-form expression for the nonparametric maximum likelihood estimator of the survival function can be derived in particular instances, including the product limit estimator \citep{Kaplan.Meier:1958} for the case of random or non-informative right censoring and an extension allowing for left truncation \citep{Tsai.Jewell.Wang:1987}. In general, the nonparametric maximum likelihood estimator of the survival function needs to be computed using an expectation-maximization (EM) algorithm \citep{Turnbull:1976}. Nonparametric estimators only assign probability mass to observed failure times and intervals, and so cannot be used for extrapolation beyond the range of the data, limiting their utility in extreme value analysis. + +The CRAN Task View on \ctv{Survival} Analysis lists various implementations of nonparametric maximum likelihood estimators of the survival or hazard functions: \CRANpkg{survival} implements the Kaplan--Meier and Nelson--Aalen estimators. Many packages focus on the case of interval censoring \citep[\S 3.2]{Groeneboom.Wellner:1992}, including \CRANpkg{prodlim}; \citet{Anderson-Bergman:2017} reviews the performance of the implementations in \CRANpkg{icenReg} \citep{icensReg} and \BIOpkg{Icens}. The latter uses the incidence matrix as input data. Routines for doubly censored data are provided by \CRANpkg{dblcens}. The \CRANpkg{interval} package \citep{interval} implements Turnbull's EM algorithm for interval censored data. The case of left truncated and right censored data is handled by \CRANpkg{survival}, and \CRANpkg{tranSurv} provides transformation models with the potential to account for truncation dependent on survival times. For interval truncated data, dedicated algorithms that use gradient-based steps \citep{Efron.Petrosian:1999} or inverse probability weighting \citep{Shen:2010} exist and can be more efficient than the EM algorithm of \citet{Turnbull:1976}. Many of these are implemented in the \CRANpkg{DTDA} package.
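To fix ideas, the product limit estimator cited above can be written from scratch in a few lines of base R for the simplest case of right censoring only; this is a sketch for intuition (the helper name `km` is illustrative), not the algorithm used by these packages.

```r
# Product-limit (Kaplan--Meier) estimate: S(t) = prod over failure times
# t_k <= t of (1 - d_k / n_k), with d_k deaths among n_k still at risk.
km <- function(time, event) {
  # time: follow-up times; event: 1 = observed death, 0 = right-censored
  ts <- sort(unique(time[event == 1]))  # distinct observed failure times
  surv <- numeric(length(ts))
  S <- 1
  for (k in seq_along(ts)) {
    n_risk <- sum(time >= ts[k])            # at risk just before ts[k]
    d_k <- sum(time == ts[k] & event == 1)  # deaths at ts[k]
    S <- S * (1 - d_k / n_risk)
    surv[k] <- S
  }
  data.frame(time = ts, surv = surv)
}
# Four subjects, the second censored at time 2:
# survival steps down to 0.75, 0.375 and 0 at times 1, 3 and 4.
km(time = c(1, 2, 3, 4), event = c(1, 0, 1, 1))
```

The censored subject leaves the risk set without contributing a factor, which is exactly why the estimator requires censoring to be non-informative.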
The \CRANpkg{longevity} package includes a C\(^{++}\) implementation of the corrected Turnbull's algorithm \citep{Turnbull:1976}, returning the nonparametric maximum likelihood estimator for arbitrary censoring and truncation patterns, as opposed to the existing implementations mentioned above, which focus on specific subcases. With a small number of observations, it is also relatively straightforward to maximize the log likelihood for the concave program subject to linear constraints using constrained optimization algorithms; \CRANpkg{longevity} relies for this on \CRANpkg{Rsolnp}, which uses augmented Lagrangian methods and sequential quadratic programming \citep{Ye:1987}. + +\subsection{Motivating examples}\label{motivating-examples} + +To showcase the functionality of the package, and particularly the modelling of threshold exceedances, we consider Dutch and Japanese life lengths. The \texttt{dutch} database contains the age at death (in days) of Dutch people who died above age 92 between 1986 and 2015; these data were obtained from Statistics Netherlands and analyzed in \citet{Einmahl:2019} and \citet{ARSIA:2022}. Records are interval truncated, as people are included in the database only if they died during the collection period. In addition, there are 226 interval censored and interval truncated records for which only the month and year of birth and death are known, as opposed to exact dates. + +The second database we consider is drawn from \citet{ExceptionalLifespans}. The data frame \texttt{japanese2} consists of counts of Japanese above age 100 by age band, stratified by both birth cohort and sex. To illustrate the format of the data, counts for female Japanese are reproduced in Table \ref{tab:tbl-japanese-women}.
The data were constructed using the extinct cohort method and are interval censored between \(\texttt{age}\) and \(\texttt{age} + 1\) and right truncated at the age reached by the oldest individuals of their birth cohort in 2020. The \texttt{count} variable lists the number of instances in the contingency table, and serves as a weight for likelihood contributions. + +\begin{table} + +\caption{\label{tab:tbl-japanese-women}Death count by birth cohort and age band for female Japanese.} +\centering +\begin{tabular}[t]{rrrrrrr} +\toprule +age & 1874-1878 & 1879-1883 & 1884-1888 & 1889-1893 & 1894-1898 & 1899-1900\\ +\midrule +100 & 1648 & 2513 & 4413 & 8079 & 16036 & 9858\\ +101 & 975 & 1596 & 2921 & 5376 & 11047 & 7091\\ +102 & 597 & 987 & 1864 & 3446 & 7487 & 4869\\ +103 & 345 & 597 & 1153 & 2230 & 5014 & 3293\\ +104 & 191 & 351 & 662 & 1403 & 3242 & 2133\\ +105 & 121 & 197 & 381 & 855 & 2084 & 1357\\ +106 & 64 & 122 & 210 & 495 & 1284 & 836\\ +107 & 34 & 74 & 120 & 274 & 774 & 521\\ +108 & 16 & 41 & 66 & 152 & 433 & 297\\ +109 & 12 & 30 & 39 & 83 & 252 & 167\\ +110 & 6 & 17 & 21 & 49 & 130 & 92\\ +111 & 4 & 10 & 15 & 26 & 69 & 47\\ +112 & 3 & 3 & 11 & 15 & 29 & 22\\ +113 & 2 & 2 & 8 & 5 & 15 & 9\\ +114 & 1 & 2 & 4 & 3 & 7 & 2\\ +115 & 0 & 1 & 1 & 0 & 3 & 2\\ +116 & 0 & 1 & 1 & 0 & 1 & 1\\ +117 & 0 & 0 & 0 & 0 & 1 & 1\\ +\bottomrule +\end{tabular} +\end{table} + +\section{Package functionalities}\label{package-functionalities} + +The \CRANpkg{longevity} package uses the S3 object oriented system and provides a series of functions with common arguments. The syntax used by the \CRANpkg{longevity} package purposely mimics that of the \CRANpkg{survival} package \citep{survival-book}, except that it does not specify models using a formula. Users must provide vectors for the time or age (or bounds in case of intervals) via arguments \texttt{time} and \texttt{time2}, as well as lower and upper truncation bounds (\texttt{ltrunc} and \texttt{rtrunc}) if applicable. 
The integer vector \texttt{event} is used to indicate the type of event, where following \CRANpkg{survival} \texttt{0} indicates right-censoring, \texttt{1} observed events, \texttt{2} left-censoring and \texttt{3} interval censoring. Together, these five vectors characterize the data and the survival mechanisms at play. Depending on the sampling scheme, not all arguments are required or relevant and they need not be of the same length, but are common to most functions. Users can also specify a named list \texttt{args} to pass arguments: as illustrated below, this is convenient to avoid specifying repeatedly the common arguments in each function call. Default values are overridden by elements in \texttt{args}, with the exception of those that are passed by the user directly in the call. Relative to \CRANpkg{survival}, functions have additional arguments \texttt{ltrunc} and \texttt{rtrunc} for left and right truncation limits, as these are also possibly matrices for the case of double interval truncation \citep{ARSIA:2022}, since both censoring and truncation can be present simultaneously. + +We can manipulate the data set to build the time vectors and truncation bounds for the Dutch data. We re-scale observations to years for interpretability and keep only records above age 98 for simplicity. We split the data to handle the observed age at death first: these are treated as observed (uncensored) whenever \texttt{time} and \texttt{time2} coincide. When exact dates are not available, we compute the range of possible age at which individuals may have died, given their birth and death years and months. The truncation bounds for each individual can be obtained by subtracting from the endpoints of the sampling frame the birth dates, with left and right truncation bounds +\begin{align*} +\texttt{ltrunc}=\min\{92 \text{ years}, 1986.01.01 - \texttt{bdate}\}, \qquad \texttt{rtrunc} = 2015.12.31- \texttt{bdate}. 
+\end{align*} +Table \ref{tab:tbl-dutch-preview} shows a sample of five individuals, two of whom are interval-censored, and the corresponding vectors of arguments along with two covariates, gender (\texttt{gender}) and birth year (\texttt{byear}). + +\begin{table} + +\caption{\label{tab:tbl-dutch-preview}Sample of five Dutch records, formatted so that the inputs match the function arguments used by the package. Columns give the age in years at death (or plausible interval), lower and upper truncation bounds giving minimum and maximum age for inclusion, an integer indicating the type of censoring, gender and birth year.} +\centering +\begin{tabular}[t]{rrrrrlr} +\toprule +time & time2 & ltrunc & rtrunc & event & gender & byear\\ +\midrule +104.67 & 105.74 & 80.00 & 111.00 & 3 & female & 1905\\ +103.50 & 104.58 & 78.00 & 109.00 & 3 & female & 1907\\ +100.28 & 100.28 & 92.01 & 104.50 & 1 & female & 1911\\ +100.46 & 100.46 & 92.01 & 102.00 & 1 & male & 1913\\ +100.51 & 100.51 & 92.01 & 104.35 & 1 & female & 1911\\ +\bottomrule +\end{tabular} +\end{table} + +We can proceed similarly for the Japanese data. Ages of centenarians are rounded down to the nearest year, so all observations are interval censored within one-year intervals. Assuming that the ages at death are independent and identically distributed with distribution function \(F(\cdot; \boldsymbol{\theta})\), the log likelihood for exceedances \(y_i = \texttt{age}_i - u\) above age \(u\) is +\begin{align*} +\ell(\boldsymbol{\theta}) = \sum_{i: \texttt{age}_i > u}n_i \left[\log \{F(y_i+1; \boldsymbol{\theta}) - F(y_i; \boldsymbol{\theta})\} - \log F(r_i - u; \boldsymbol{\theta})\right] +\end{align*} +where \(n_i\) is the count of the number of individuals in cell \(i\) and \(r_i > \texttt{age}_i+1\) is the right truncation limit for that cell, i.e., the maximum age that could have been achieved for that birth cohort by the end of the data collection period. 
+ +\begin{verbatim} +data(japanese2, package = "longevity") +# Keep only non-empty cells +japanese2 <- japanese2[japanese2$count > 0, ] +# Define arguments that are recycled +japanese2$rtrunc <- 2020 - + as.integer(substr(japanese2$bcohort, 1, 4)) +# The line above extracts the earliest year of the birth cohort +# Create a list with all arguments common to package functions +args_japan <- with(japanese2, + list( + time = age, # lower censoring bound + time2 = age + 1L, # upper censoring bound + event = 3, # define interval censoring + type = "interval2", + rtrunc = rtrunc, # right truncation limit + weights = count)) # counts as weights +\end{verbatim} + +\subsection{Parametric models and maximum likelihood estimation}\label{parametric-models-and-maximum-likelihood-estimation} + +Various models are implemented in \CRANpkg{longevity}: their hazard functions are reported in Table \ref{tab:tbl-parametric-models}. Two of those models, labelled \texttt{perks} and \texttt{beard} are logistic-type hazard functions proposed in \citet{Perks:1932} that have been used by \citet{Beard:1963}, and popularized in work of Kannisto and Thatcher; we use the parametrization of \citet{Richards:2012}, from which we also adopt the nomenclature. Users can compare the models with those available in \CRANpkg{MortalityLaws}; see \texttt{?availableLaws} for the list of hazard functions and parametrizations. 
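The censored-and-truncated log likelihood displayed earlier can also be evaluated directly, which is useful for checking one's understanding of the likelihood contributions. The sketch below (the helper name `ll_trunc` and the toy cell counts are illustrative, with an exponential distribution standing in for a generic \(F\)) is not the package's internal code.

```r
# Log likelihood for interval-censored (one-year bands), right-truncated
# counts: sum of n_i * [log{F(y_i + 1) - F(y_i)} - log F(r_i - u)],
# here with an exponential F(y; sigma) as a stand-in parametric model.
ll_trunc <- function(sigma, age, count, rtrunc, u) {
  y <- age - u                        # excess lifetime above threshold u
  pF <- function(x) pexp(x, rate = 1 / sigma)
  # censoring contribution within [y, y + 1), minus truncation correction
  sum(count * (log(pF(y + 1) - pF(y)) - log(pF(rtrunc - u))))
}
# Toy cells: deaths in age bands 108, 109, 110 with truncation at age 115
age <- c(108, 109, 110)
count <- c(40, 25, 13)
rtrunc <- c(115, 115, 115)
# Profile over sigma to locate the maximum likelihood estimate
opt <- optimize(ll_trunc, interval = c(0.1, 10), age = age, count = count,
                rtrunc = rtrunc, u = 108, maximum = TRUE)
```

Omitting the `log F(r_i - u)` term would treat the sample as if every cohort had been followed indefinitely, biasing the scale estimate downwards.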
+
+\begin{table}
+\centering
+\caption{\label{tab:tbl-parametric-models}List of parametric models for excess lifetime supported by the package, with parametrization and hazard functions.}
+\begin{tabular}[t]{lll}
+\toprule
+model & hazard function & constraints\\
+\midrule
+\texttt{exp} & \(\sigma^{-1}\) & \(\sigma > 0\)\\
+\texttt{gomp} & \(\sigma^{-1}\exp(\beta t/\sigma)\) & \(\sigma > 0, \beta \ge 0\)\\
+\texttt{gp} & \((\sigma + \xi t)_{+}^{-1}\) & \(\sigma > 0, \xi \in \mathbb{R}\)\\
+\texttt{weibull} & \(\sigma^{-\alpha} \alpha t^{\alpha-1}\) & \(\sigma > 0, \alpha > 0\)\\
+\texttt{extgp} & \(\beta\sigma^{-1}\exp(\beta t/\sigma)[\beta+\xi\{\exp(\beta t/\sigma) -1\}]^{-1}\) & \(\sigma > 0, \beta \ge 0, \xi \in \mathbb{R}\)\\
+\texttt{extweibull} & \(\alpha\sigma^{-\alpha}t^{\alpha-1}\{1+\xi(t/\sigma)^{\alpha}\}_{+}\) & \(\sigma > 0, \alpha > 0, \xi \in \mathbb{R}\)\\
+\texttt{perks} & \(\alpha\exp(\nu t)/\{1+\alpha\exp(\nu t)\}\) & \(\nu \ge 0, \alpha >0\)\\
+\texttt{beard} & \(\alpha\exp(\nu t)/\{1+\alpha\beta\exp(\nu t)\}\) & \(\nu \ge 0, \alpha >0, \beta \ge 0\)\\
+\texttt{gompmake} & \(\lambda + \sigma^{-1}\exp(\beta t/\sigma)\) & \(\lambda \ge 0, \sigma > 0, \beta \ge 0\)\\
+\texttt{perksmake} & \(\lambda + \alpha\exp(\nu t)/\{1+\alpha\exp(\nu t)\}\) & \(\lambda \ge 0, \nu \ge 0, \alpha > 0\)\\
+\texttt{beardmake} & \(\lambda + \alpha\exp(\nu t)/\{1+\alpha\beta\exp(\nu t)\}\) & \(\lambda \ge 0, \nu \ge 0, \alpha > 0, \beta \ge 0\)\\
+\bottomrule
+\end{tabular}
+\end{table}
+
+Many of the models are nested and Figure \ref{fig:fig-nesting} shows the logical relation between the various families. The function \texttt{fit\_elife} allows users to fit all of the parametric models of Table \ref{tab:tbl-parametric-models}: the \texttt{print} method returns a summary of the sampling mechanism, the number of observations, the maximum log likelihood and parameter estimates with standard errors.
Depending on the data, some models may be overparametrized and parameters need not be numerically identifiable. To mitigate such issues, the optimization routine, which uses \CRANpkg{Rsolnp}, can try multiple starting values or fit various sub-models to ensure that the parameter values returned are indeed the maximum likelihood estimates. If one compares nested models and the fit of the simpler model is better than that of the alternative, the \texttt{anova} function will return an error message.
+
+The \texttt{fit\_elife} function handles arbitrary censoring patterns over single intervals, along with single-interval truncation and interval censoring. To accommodate the sampling scheme of the International Database on Longevity (IDL), an option also allows for double interval truncation \citep{ARSIA:2022}, whereby observations are included only if the person dies within one of two, potentially overlapping, time intervals that define the observation window over which deaths are recorded.
+
+\begin{verbatim}
+thresh <- 108
+model0 <- fit_elife(arguments = args_japan,
+  thresh = thresh,
+  family = "exp")
+(model1 <- fit_elife(arguments = args_japan,
+  thresh = thresh,
+  family = "gomp"))
+#> Model: Gompertz distribution.
+#> Sampling: interval censored, right truncated
+#> Log-likelihood: -3599.037
+#>
+#> Threshold: 108
+#> Number of exceedances: 2489
+#>
+#> Estimates
+#>  scale  shape
+#> 1.6855 0.0991
+#>
+#> Standard Errors
+#>  scale  shape
+#> 0.0523 0.0273
+#>
+#> Optimization Information
+#>   Convergence: TRUE
+\end{verbatim}
+
+\subsection{Model comparisons}\label{model-comparisons}
+
+Goodness of fit of nested models can be compared using likelihood ratio tests via the \texttt{anova} method. Most of the interrelations between models yield non-regular model comparisons since, to recover the simpler model, one must often fix parameters to values that lie on the boundary of the parameter space.
For example, if we compare the Gompertz model with the exponential, the limiting null distribution is a mixture of a point mass at zero and a \(\chi^2_1\) variable, each with probability one half \citep{Chernoff:1954}. Many authors \citep[e.g.,][]{Camarda:2022} fail to recognize this fact. The case becomes more complicated with more than one boundary constraint: for example, the deviance statistic comparing the Beard--Makeham and the Gompertz model, which constrains two parameters to the boundary of the parameter space, has as null distribution the mixture \(\tfrac{1}{4}\chi^2_2 + \tfrac{1}{2}\chi^2_1 + \tfrac{1}{4}\chi^2_0\) \citep{Self.Liang:1987}.
+
+Nonidentifiability also impacts testing: for example, if the rate parameter \(\nu\) of the Perks--Makeham model (\texttt{perksmake}) tends to zero, the limiting hazard, \(\lambda + \alpha/(1+\alpha)\), is constant (exponential model), but neither \(\alpha\) nor \(\lambda\) is identifiable. The usual asymptotics for the likelihood ratio test break down as the information matrix is singular \citep{Rotnitzky:2000}. As such, none of the three families that include a Makeham component can be directly compared to the exponential in \CRANpkg{longevity}, and the call to \texttt{anova} returns an error message.
+
+Users can also access information criteria via \texttt{AIC} and \texttt{BIC}. The correction factors implemented depend on the number of parameters of the distribution, but do not account for singular fits, non-identifiable parameters or singular models for which the usual corrections \(2p\) and \(\ln(n)p\) are inadequate \citep{Watanabe:2010}.
+
+\begin{figure}
+
+{\centering \includegraphics[width=1\linewidth,alt={Graph with parametric model names, and arrows indicating the relationship between these. Dashed arrows indicate non-regular comparisons between nested models, and the expression indicates which parameter to fix to obtain the submodel.}]{fig/nesting_graph}
+
+}
+
+\caption{Relationship between parametric models showing nested relations.
Dashed arrows represent restrictions that lead to nonregular asymptotic null distribution for comparison of nested models. Comparisons between models with Makeham components and exponential are not permitted by the software because of nonidentifiability issues.}\label{fig:fig-nesting} +\end{figure} + +To showcase how hypothesis testing is performed, we consider a simple example with two nested models. We test whether the exponential model is an adequate simplification of the Gompertz model for exceedances above 108 years --- an irregular testing problem since \(\beta=0\) is a restriction on the boundary of the parameter space. The drop in log likelihood is quite large, indicating the exponential model is not an adequate simplification of the Gompertz fit. This is also what is suggested by the Bayesian information criterion, which is much lower for the Gompertz model than for the exponential. + +\begin{verbatim} +# Model comparison +anova(model1, model0) +# Information criteria +c("exponential" = BIC(model0), "Gompertz" = BIC(model1)) +\end{verbatim} + +\begin{tabular}{lrrrrr} +\toprule + & npar & Deviance & Df & Chisq & Pr(>Chisq)\\ +\midrule +gomp & 2 & 7198.07 & & & \\ +exp & 1 & 7213.25 & 1 & 15.17 & 0\\ +\bottomrule +\end{tabular} + +\begin{verbatim} +#> exponential Gompertz +#> 7221.065 7213.713 +\end{verbatim} + +\subsection{Simulation-based inference}\label{simulation-based-inference} + +Given the poor finite sample properties of the aforementioned tests, it may be preferable to rely on a parametric bootstrap rather than on the asymptotic distribution of the test statistic \citep{ARSIA:2022} for model comparison. +Simulation-based inference requires capabilities for drawing new data sets whose features match those of the original one. 
For example, the \href{supercentenarians.org}{International Database on Longevity} (IDL) \citep{IDL:2021} features data that are interval truncated above 110 years, but doubly interval truncated since the sampling periods for semisupercentenarians (who died aged 105 to 110) and supercentenarians (who died above 110) are not always the same \citep{ARSIA:2022}. The 2018 Istat database of Italian semisupercentenarians analyzed by \citet{Barbi:2018} includes left truncated, right censored records.
+
+To mimic the postulated data generating mechanism while accounting for the sampling scheme, we could use the observed birth dates, or simulate new birth dates (possibly through a kernel estimator of the empirical distribution of birth dates) while keeping the sampling frame with the first and last dates of data collection to define the truncation interval. In other settings, one could obtain the nonparametric maximum likelihood estimator of the distribution of the upper truncation bound \citep{Shen:2010} using an inverse probability weighted estimator, which for fixed data collection windows is equivalent to setting the birth date.
+
+The \texttt{samp\_elife} function handles these sampling schemes through the options of its \texttt{type2} argument. For interval truncated data (\texttt{type2="ltrt"}), it uses the inversion method (Section 2 of \citet{Devroye:1986}): for \(F\) an absolutely continuous distribution function and \(F^{-1}\) the corresponding quantile function, a random variable distributed according to \(F\) truncated on \([a,b]\) is generated as \(X = F^{-1}[F(a) + U\{F(b)-F(a)\}]\),
+where \(U \sim \mathsf{U}(0,1)\) is standard uniform.
+
+The function \texttt{samp\_elife} also has an argument \texttt{upper}, which serves for both right truncation and right censoring. For the latter, any simulated record that exceeds \texttt{upper} is capped at that upper bound and declared partially observed.
This is useful for simulating administrative censoring, whereby the birth date and the upper bound of the collection window fully determine whether an observation is right censored or not. An illustrative example is provided in the next section.
+
+The \texttt{anova} method uses the asymptotic null distribution for comparison of nested parametric distributions \(\mathcal{F}_0 \subseteq \mathcal{F}_1\). We could use the bootstrap to see how good this approximation to the null distribution is. To mimic as closely as possible the data generating mechanism, which is application-specific in most scenarios, we condition on the sampling frame and the number of individuals in each birth cohort. The number dying at each age is random, but the right truncation limits will be the same for everyone in a given cohort. We simulate excess lifetimes, then interval censor observations by keeping only the corresponding age bracket. Under the null hypothesis, the data are drawn from \(\widehat{F}_0 \in \mathcal{F}_0\) and we generate observations from this right truncated distribution using the \texttt{samp\_elife} utility, which also supports double interval truncation and left truncation with right censoring. This must be done within a for loop since a count is attached to each upper bound, but the function is vectorized should one supply a single vector containing all of the right truncation limits.
+
+The bootstrap \(p\)-value for comparing models \(M_0 \subset M_1\) would be obtained by repeating the following steps \(B\) times and computing the rank of the observed test statistic among the bootstrap replicates:
+
+\begin{enumerate}
+\def\labelenumi{\arabic{enumi}.}
+\tightlist
+\item
+  Simulate new birth dates \(d_i\) \((i=1, \ldots, n)\) (e.g., drawing from a smoothed empirical distribution of birth dates); the latest admissible birth date is the one that ensures the person reaches at least the threshold age by the end of the sampling period.
+\item
+  Subtract the birth dates from the endpoints of the sampling period, say \(c_1\) and \(c_2\), to get the minimum and maximum ages at death, \(c_1 - d_i\) and \(c_2 - d_i\) days respectively, which define the truncation bounds.
+\item
+  Use the function \texttt{samp\_elife} to simulate new observations from a parametric interval truncated distribution under the null model \(M_0\).
+\item
+  Use the optimization procedure in \texttt{fit\_elife} to fit both models \(M_0\) and \(M_1\), calculate their deviances and, from these, the likelihood ratio statistic.
+\end{enumerate}
+
+The algorithm is implemented below for comparing the Gompertz and the exponential model. Since the procedure is computationally intensive, users must trade off the precision of the bootstrap \(p\)-value estimate against the number of replications, \(B\).
+
+\begin{verbatim}
+set.seed(2022)
+# Count of unique right truncation limit
+db_rtrunc <- aggregate(count ~ rtrunc,
+  FUN = "sum",
+  data = japanese2,
+  subset = age >= thresh)
+B <- 1000L # Number of bootstrap replications
+boot_anova <- numeric(length = B)
+boot_gof <- numeric(length = B)
+for(b in seq_len(B - 1L)){
+ boot_samp <- # Generate bootstrap sample
+  do.call(rbind, # merge data frames
+   apply(db_rtrunc, 1, function(x){ # for each rtrunc and count
+    count <- table( # tabulate count
+     floor( # round down
+      samp_elife( # sample right truncated exponential
+       n = x["count"],
+       scale = model0$par,
+       family = "exp", # null model
+       upper = x["rtrunc"] - thresh,
+       type2 = "ltrt")))
+    data.frame( # return data frame
+     count = as.integer(count),
+     rtrunc = as.numeric(x["rtrunc"]) - thresh,
+     eage = as.integer(names(count)))
+   }))
+ boot_mod0 <- # Fit null model to bootstrap sample
+  with(boot_samp,
+   fit_elife(time = eage,
+    time2 = eage + 1L,
+    rtrunc = rtrunc,
+    type = "interval",
+    event = 3,
+    family = "exp",
+    weights = count))
+ boot_mod1 <- # Fit alternative model to bootstrap sample
+  with(boot_samp,
+   fit_elife(time = eage,
+    time2 = eage + 1L,
+
rtrunc = rtrunc,
+    type = "interval",
+    event = 3,
+    family = "gomp",
+    weights = count))
+ boot_anova[b] <- deviance(boot_mod0) -
+  deviance(boot_mod1)
+}
+# Add original statistic
+boot_anova[B] <- deviance(model0) - deviance(model1)
+# Bootstrap p-value: proportion of statistics at
+# least as large as the observed one
+(pval <- (B - rank(boot_anova)[B] + 1) / B)
+#> [1] 0.001
+\end{verbatim}
+
+The asymptotic approximation is of similar magnitude to the bootstrap \(p\)-value. Both suggest that the more complex Gompertz model provides a significantly better fit.
+
+\subsection{Extreme value analysis}\label{extreme-value-analysis}
+
+Extreme value theory suggests that, in many instances, the limiting conditional distribution of exceedances of a random variable \(Y\) with distribution function \(F\) is generalized Pareto, meaning
+\begin{align}
+\lim_{u \to x^*}\Pr(Y-u > y \mid Y > u)= \begin{cases}
+\left(1+\xi y/\sigma\right)_{+}^{-1/\xi}, & \xi \neq 0;\\
+\exp(-y/\sigma), & \xi = 0;
+\end{cases}
+\label{eq:gpd}
+\end{align}
+with \(x_{+} = \max\{x, 0\}\) and \(x^*=\sup\{x: F(x) < 1\}\). This justifies the use of Equation \eqref{eq:gpd} for the survival function of threshold exceedances when dealing with rare events. The model has two parameters: a scale \(\sigma\) and a shape \(\xi\), which determines the behavior of the upper tail. Negative shape parameters correspond to bounded upper tails and a finite right endpoint for the support.
+
+Study of population dynamics and mortality generally requires knowledge of the total population from which observations are drawn to derive rates. By contrast, the peaks over threshold method, by which one models the \(k\) largest observations of a sample, is a conditional analysis (e.g., given survival until a certain age), and is therefore free of denominator specification since we only model exceedances above a high threshold \(u\).
For modelling purposes, we need to pick a threshold \(u\) that is smaller than the upper endpoint \(x^*\) in order to have a sufficient number of observations to estimate parameters. The threshold selection problem is a classical instance of bias-variance trade-off: the parameter estimators are possibly biased if the threshold is too low because the generalized Pareto approximation is not good enough, whereas choosing a larger threshold to ensure we are closer to the asymptotic regime leads to reduced sample size and increased parameter uncertainty.
+
+To aid threshold selection, users commonly resort to parameter stability plots. These visual diagnostics consist of a plot of estimates of the shape parameter \(\widehat{\xi}\) (with confidence or credible intervals) based on sample exceedances over a range of thresholds \(u_1, \ldots, u_K\). If the data were drawn from a generalized Pareto distribution, the conditional distribution above a higher threshold \(v > u\) is also generalized Pareto with the same shape: this threshold stability property is the basis for extrapolation beyond the range of observed records. Indeed, if the estimates of \(\xi\) are nearly constant across thresholds, this provides reassurance that the approximation can be used for extrapolation. The only difference with survival data, relative to the classical setting, is that the likelihood must account for censoring and truncation. Note that, when we use exceedances of a nonzero threshold (argument \texttt{thresh}), it is not always possible to determine unambiguously whether left censored observations are exceedances: such cases yield errors in the functions.
+
+Theory on penultimate extremes suggests that, for finite levels and a general distribution function \(F\) for which \eqref{eq:gpd} holds, the shape parameter varies as a function of the threshold \(u\), behaving like the derivative of the reciprocal hazard \(r(x) = \{1-F(x)\}/f(x)\).
We can thus model the shape as piecewise constant by fitting the piecewise generalized Pareto model of \citet{Northrop.Coleman:2014}, adapted in \citet{ARSIA:2022} for survival data. The latter can be viewed as a mixture of generalized Pareto distributions over \(K\) disjoint intervals, with continuity constraints to ensure a smooth hazard, and reduces to the generalized Pareto if we force the \(K\) shape parameters to be equal. We can use a likelihood ratio test to compare the models, or a score test if the former is too computationally intensive, and plot the \(p\)-values of the null hypotheses \(\mathrm{H}_k: \xi_k = \cdots = \xi_{K}\) \((k=1, \ldots, K-1)\) against the corresponding thresholds. As the model quickly becomes overparametrized, optimization is difficult and the score test may be a safer option, as it only requires estimation of the null model, a single generalized Pareto over the whole range.
+
+To illustrate these diagnostic tools, Figure \ref{fig:fig-parameterstab} shows a threshold stability plot, which features a small increase in the shape estimates as the threshold increases, corresponding to a stabilization, or even a slight decrease, of the hazard at higher ages. A threshold of 108 years appears reasonable: the Northrop--Coleman diagnostic plot suggests lower thresholds are compatible with a constant shape above 100. Additional goodness-of-fit diagnostics are necessary to determine whether the generalized Pareto model fits well.
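The threshold stability property invoked above is easy to verify numerically; the sketch below (Python, illustration only) checks that exceedances of a generalized Pareto variable above a higher threshold remain generalized Pareto with the same shape \(\xi\) and shifted scale \(\sigma_v = \sigma_u + \xi(v - u)\).

```python
import numpy as np

def gpd_sf(y, scale, shape):
    # generalized Pareto survival function of eq. (gpd), shape != 0
    return np.clip(1.0 + shape * y / scale, 0.0, None) ** (-1.0 / shape)

sigma_u, xi, v = 1.4, -0.2, 2.0   # scale above u, shape, higher threshold v - u
y = np.linspace(0.0, 3.0, 50)
# conditional survival of exceedances of v: S(v + y) / S(v)
cond = gpd_sf(v + y, sigma_u, xi) / gpd_sf(v, sigma_u, xi)
# threshold stability: same shape, scale shifted to sigma_u + xi * v
stab = gpd_sf(y, sigma_u + xi * v, xi)
assert np.allclose(cond, stab)
```

The identity follows by cancelling the common factor in the ratio of survival functions, and is exactly what the parameter stability plot probes empirically.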
+
+\begin{verbatim}
+par(mfrow = c(1, 2), mar = c(4, 4, 1, 1))
+# Threshold sequence
+u <- 100:110
+# Threshold stability plot
+tstab(arguments = args_japan,
+  family = "gp",
+  method = "profile",
+  which.plot = "shape",
+  thresh = u)
+# Northrop-Coleman diagnostic based on score tests
+nu <- length(u) - 1L
+nc_score <- nc_test(arguments = c(args_japan, list(thresh = u)))
+score_plot <- plot(nc_score)
+graphics.off()
+\end{verbatim}
+
+\begin{figure}
+
+{\centering \includegraphics[width=1\linewidth,alt={Threshold stability plots. The left panel shows shape parameter estimates with 95\% confidence intervals as a function of the threshold value from 100 to 110 years. The right panel shows p-values from a score test for nested models as a function of the same thresholds.}]{RJ-2025-034_files/figure-latex/fig-parameterstab-1}
+
+}
+
+\caption{Threshold diagnostic tools: parameter stability plots for the generalized Pareto model (left) and Northrop--Coleman \(p\)-value path (right) for the Japanese centenarian dataset. Both suggest that a threshold as low as 100 may be suitable for peaks-over-threshold analysis.}\label{fig:fig-parameterstab}
+\end{figure}
+
+Each plot in the package can be produced using base R graphics or \CRANpkg{ggplot2} \citep{ggplot2}, which implements the grammar of graphics. To keep the list of package dependencies lean and adhere to the \href{https://cran.r-project.org/web/packages/pacs/vignettes/tinyverse.html}{\texttt{tinyverse} principle}, the latter can be obtained by using the argument \texttt{plot.type} with the generic S3 method \texttt{plot}, or via \texttt{autoplot}, provided the \CRANpkg{ggplot2} package is already installed.
+
+\subsection{Graphical goodness of fit diagnostics}\label{graphical-goodness-of-fit-diagnostics}
+
+Determining whether a parametric model fits survival data well is no easy task, owing to the difficulty of characterizing the null distribution of many goodness of fit statistics, such as the Cramér--von Mises statistic, whose null distributions differ for survival data. As such, the \CRANpkg{longevity} package relies mostly on visual diagnostic tools. \citet{Waller.Turnbull:1992} discuss how classical visual diagnostics can be adapted in the presence of censoring. Most notably, only observed failure times are displayed on the \(y\)-axis against their empirical plotting positions on the \(x\)-axis. Contrary to the independent and identically distributed case, the uniform plotting positions \(F_n(y_i)\) are based on the nonparametric maximum likelihood estimator discussed in Section 3.
+
+The situation is more complicated with truncated data \citep{ARSIA:2022}, since the data are not identically distributed: indeed, the distribution function of observation \(Y_i\) truncated on the interval \([a_i, b_i]\) is \(F_i(y_i) = \{F(y_i)-F(a_i)\}/\{F(b_i) - F(a_i)\}\), so the data arise from different distributions even if these share common parameters. One way out of this conundrum is to use the probability integral transform and the quantile transform to map observations to the uniform scale and back onto the data scale. Taking \(\widetilde{F}(y_i) = F_n(y_i)=\mathrm{rank}(y_i)/(n+1)\) to denote the empirical distribution function estimator, a probability-probability plot would show the empirical values \(\widetilde{F}(y_i)\) against the postulated values \(F_i(y_i)\); the sample \(\{F_i(y_i)\}\) is approximately uniform if the parametric distribution \(F\) is suitable. Another option is to standardize the observations, taking the collection \(\widetilde{y}_i=F^{-1}\{F_i(y_i)\}\) of rescaled exceedances and comparing them to the usual plotting positions \(x_{(i)} = i/(n+1)\).
The drawback of the latter approach is that the quantities displayed on the \(y\)-axis are not raw observations and the ranking of the empirical quantiles may change, a somewhat counterintuitive feature. However, this means that the sample \(\{F_i(y_i)\}\) should be uniform under the null hypothesis, which allows one to use the methods of \citet{Sailynoja.Burkner.Vehtari:2021} to obtain pointwise and simultaneous confidence intervals.
+
+\CRANpkg{longevity} offers users the choice between three types of quantile-quantile plots: regular (Q-Q, \texttt{"qq"}), Tukey's mean difference detrended Q-Q plots (\texttt{"tmd"}) and exponential Q-Q plots (\texttt{"exp"}). Other options on the uniform scale are probability-probability (P-P, \texttt{"pp"}) plots and empirically rescaled plots (ERP, \texttt{"erp"}) \citep{Waller.Turnbull:1992}, designed to ease interpretation with censored observations by rescaling the axes. We illustrate the graphical tools with the \texttt{dutch} data above age 105 in Figure \ref{fig:fig-qqplots}. The fit is adequate above 110, but there is a notable dip, due to excess mortality, around age 109.
+
+\begin{verbatim}
+fit_dutch <- fit_elife(
+  arguments = dutch_data,
+  event = 3,
+  type = "interval2",
+  family = "gp",
+  thresh = 105,
+  export = TRUE)
+par(mfrow = c(1, 2))
+plot(fit_dutch,
+  which.plot = c("pp", "qq"))
+\end{verbatim}
+
+\begin{figure}
+
+{\centering \includegraphics[width=1\linewidth]{RJ-2025-034_files/figure-latex/fig-qqplots-1}
+
+}
+
+\caption{Probability-probability and quantile-quantile plots for the generalized Pareto model fitted above age 105 years to the Dutch data. The plots indicate broadly good agreement with the observations, except for individuals who died aged 109, too many of whom have deaths close to their birthdays.}\label{fig:fig-qqplots}
+\end{figure}
+
+Censored observations are used to compute the plotting positions, but are not displayed.
As such, we cannot use graphical goodness of fit diagnostics for the Japanese interval censored data. An alternative, given that the data are tabulated in a contingency table, is to use a chi-squared test for independence, conditioning on the number of individuals per birth cohort. The expected number in each cell (birth cohort and age band) can be obtained by computing the conditional probability of falling in that age band. The asymptotic null distribution should be \(\chi^2\) with \((k-1)(p-1)\) degrees of freedom, where \(k\) is the number of age bands and \(p\) the number of birth cohorts. In finite samples, the expected counts for large excess lifetimes are very low, so one can expect the \(\chi^2\) approximation to be poor. To mitigate this, we can pool observations and resort to simulation to approximate the null distribution of the test statistic. The bootstrap \(p\)-value for the exponential model above 108 years, pooling observations with excess lifetimes of 5 years and above, is 0.872, indicating no evidence that the model is inadequate; the test may however have low power.
+
+\subsection{Stratification}\label{stratification}
+
+Demographers may suspect differences between individuals of different sexes, from different countries or geographic areas, or from different birth cohorts. All of these are instances of categorical covariates. One possibility is to incorporate these covariates through parameters with suitable link functions, but we consider instead stratification. We can split the data by the levels of \texttt{covariate} (a factor) into strata and compare the goodness of fit of the \(K\) stratum-specific models relative to the model that pools all observations. The \texttt{test\_elife} function performs likelihood ratio tests for these comparisons.
The test statistic is \(-2\{\ell(\widehat{\boldsymbol{\theta}}_0) - \ell(\widehat{\boldsymbol{\theta}})\}\), where \(\widehat{\boldsymbol{\theta}}_0\) is the maximum likelihood estimator under the null model with common parameters, and \(\widehat{\boldsymbol{\theta}}\) is the unrestricted maximum likelihood estimator for the alternative model with the same distribution, but which allows for stratum-specific parameters. We illustrate this with a generalized Pareto model for the excess lifetime. The null hypothesis is \(\mathrm{H}_0: \sigma_{\texttt{f}} = \sigma_{\texttt{m}}, \xi_{\texttt{f}}=\xi_{\texttt{m}}\), against the alternative that at least one equality does not hold, in which case the hazards and endpoints differ.
+
+\begin{verbatim}
+print(
+  test_elife(
+   arguments = args_japan,
+   thresh = 110,
+   family = "gp",
+   covariate = japanese2$gender)
+)
+#> Model: generalized Pareto distribution.
+#> Threshold: 110
+#> Number of exceedances per covariate level:
+#> female   male
+#>    642     61
+#>
+#> Likelihood ratio statistic: 0.364
+#> Null distribution: chi-square (2)
+#> Asymptotic p-value: 0.833
+\end{verbatim}
+
+In the present example, there is no evidence of any difference in lifetime distribution between males and females; this is perhaps unsurprising given the large imbalance between the counts for each covariate level, with far fewer males than females.
+
+\subsection{Extrapolation}\label{extrapolation}
+
+If the maximum likelihood estimate of the shape \(\xi\) for the generalized Pareto model is negative, then the fitted distribution has a finite upper endpoint; otherwise, the latter is infinite. With \(\widehat{\xi} < 0\), we can look at the profile log likelihood for the endpoint \(\eta = -\sigma/\xi\), using the function \texttt{prof\_gp\_endpt}, to draw the curve and obtain confidence intervals. The argument \texttt{psi} is used to give a grid of values over which to compute the profile log likelihood.
The bounds of the \((1-\alpha)\) confidence interval are obtained by fitting a cubic smoothing spline to \(y=\eta\) as a function of the shifted profile curve \(x = 2\{\ell_p(\eta)-\ell_p(\widehat{\eta})\}\) on both sides of the maximum likelihood estimate, and predicting the value of \(y\) when \(x = -\chi^2_1(1-\alpha)\). This technique works well unless the profile is nearly flat or the bounds lie beyond the range of values of \texttt{psi} provided; in the latter case, the user may wish to extend the grid. If \(\widehat{\xi} \approx 0\), then the upper bound of the confidence interval may be infinite and the profile log likelihood may never reach the cutoff value derived from the asymptotic \(\chi^2_1\) distribution.
+
+The profile log likelihood curve for the endpoint, shifted vertically so that its value is zero at the maximum likelihood estimate, highlights the marked asymmetry of the distribution of \(\eta\), shown in Figure \ref{fig:fig-endpoint-confint}, with the horizontal dashed lines showing the limits of the 95\% profile likelihood confidence interval. These suggest that the endpoint, or a potential finite lifespan, could lie very much beyond observed records. The routine used to calculate the upper bound computes the cutoff value by fitting a smoothing spline with the roles of the \(y\) and \(x\) axes reversed and by predicting the value of \(\eta\) at \(y=0\). In this example, the upper confidence limit is extrapolated from the model: more accurate measures can be obtained by specifying a longer and finer sequence of values of \texttt{psi}, such that the profile log likelihood drops below the \(\chi^2_1\) quantile cutoff.
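The inversion of the profile curve can be sketched generically; in the fragment below (Python, illustrative only), linear interpolation stands in for the smoothing spline used by the package, and the quadratic profile is a made-up stand-in for which the profile interval should match the Wald interval \(\widehat{\eta} \pm 1.96\,\mathrm{se}\).

```python
import numpy as np
from scipy.stats import chi2

def profile_confint(psi, prof_ll, level=0.95):
    """Invert a profile log likelihood evaluated on a grid `psi`:
    find where 2*(prof_ll - max) crosses minus the chi2(1) quantile."""
    shift = 2.0 * (prof_ll - prof_ll.max())
    cut = -chi2.ppf(level, df=1)
    k = int(np.argmax(prof_ll))
    lo = np.interp(cut, shift[: k + 1], psi[: k + 1])    # increasing branch
    hi = np.interp(cut, shift[k:][::-1], psi[k:][::-1])  # decreasing branch, reversed
    return float(lo), float(hi)

# Quadratic (Gaussian-like) toy profile: bounds should match eta_hat +/- 1.96 * se
eta_hat, se = 128.0, 5.0
psi = np.linspace(100.0, 160.0, 401)
prof_ll = -0.5 * ((psi - eta_hat) / se) ** 2
lo, hi = profile_confint(psi, prof_ll)
```

For asymmetric profiles such as the one for the endpoint \(\eta\), the two bounds are no longer equidistant from the estimate, which is precisely why the profile interval is preferred to the Wald one here.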
+
+\begin{verbatim}
+# Create grid of threshold values
+thresholds <- 105:110
+# Grid of values at which to evaluate profile
+psi <- seq(120, 200, length.out = 101)
+# Calculate the profile for the endpoint
+# of the generalized Pareto at each threshold
+endpt_tstab <- do.call(
+  endpoint.tstab,
+  args = c(
+   args_japan,
+   list(psi = psi,
+    thresh = thresholds,
+    plot = FALSE)))
+# Compute corresponding confidence intervals
+profile <- endpoint.profile(
+  arguments = c(args_japan, list(thresh = 110, psi = psi)))
+# Plot point estimates and confidence intervals
+g1 <- autoplot(endpt_tstab, plot = FALSE, ylab = "lifespan (in years)")
+# Plot the profile curve with cutoffs for conf. int. for 110
+g2 <- autoplot(profile, plot = FALSE)
+patchwork::wrap_plots(g1, g2)
+\end{verbatim}
+
+\begin{figure}
+
+{\centering \includegraphics[width=1\linewidth]{RJ-2025-034_files/figure-latex/fig-endpoint-confint-1}
+
+}
+
+\caption{Maximum likelihood estimates with 95\% confidence intervals as a function of the threshold (left) and profile likelihood for exceedances above 110 years (right) for the Japanese centenarian data. As the threshold increases, the number of exceedances decreases and the intervals for the upper bound become wider. At 110, the right endpoint of the interval extends to infinity.}\label{fig:fig-endpoint-confint}
+\end{figure}
+
+Depending on the model, the conclusions about the risk of mortality change drastically: the Gompertz model implies an ever-increasing hazard, but no finite endpoint for the distribution of exceedances. The exponential model implies a constant hazard and no endpoint. By contrast, the generalized Pareto can accommodate both finite and infinite endpoints.
The marked asymmetry of the distribution of lifespan defined by the generalized Pareto shows that inference obtained using symmetric (i.e., Wald-based) confidence intervals is likely very misleading: the drop in fit from having a zero or positive shape parameter \(\xi\) is smaller than the cutoff for a 95\% confidence interval, suggesting that, while the best point estimate is around 128 years, the upper bound is so large (and extrapolated) that little can be ruled out. The model nevertheless suggests a very high probability of dying in any given year, regardless of whether the hazard is constant, decreasing or increasing.
+
+\subsection{Hazard}\label{hazard}
+
+The parameters of the models are seldom of interest in themselves: rather, we may be interested in a summary such as the hazard function. At present, \CRANpkg{longevity} does not allow general linear modelling of model parameters or time-varying covariates, but other software implementations can tackle this task. For example, \CRANpkg{casebase} \citep{casebase} fits flexible hazard models using logistic or multinomial regression, with the option of penalizing the parameters associated with covariate and spline effects. Another alternative is \CRANpkg{flexsurv} \citep{flexsurv}, which offers 10 parametric models and allows for user-specified models.
+The \CRANpkg{bshazard} package \citep{bshazard} provides nonparametric smoothing via \(B\)-splines, whereas \CRANpkg{muhaz} handles kernel-based hazard estimation for right censored data; both could be used to validate the parametric models in the case of right censoring. The \CRANpkg{rstpm2} package \citep{rstpm2} handles generalized survival modelling for censored data with the Royston--Parmar model built from natural cubic splines \citep{Royston.Parmar:2002}.
In contrast with the aforementioned approaches, we focus on parametric models: this is partly because there are few observations in the use case we consider, and covariates, except perhaps gender and birth year, are not available. + +The hazard changes over time; the only notable exception is the exponential model, whose hazard is constant. \CRANpkg{longevity} includes utilities for computing the hazard function from a fitted model object and for obtaining point-wise confidence intervals using symmetric Wald intervals or the profile likelihood. +Specifically, the \texttt{hazard\_elife} function calculates the hazard \(h(t; \boldsymbol{\theta})\) point-wise at times \(t=\)\texttt{x}; Wald-based confidence intervals are obtained using the delta method, whereas profile likelihood intervals are obtained by reparametrizing the model in terms of \(h(t)\) for each time \(t\). Perhaps more naturally, we can consider a Bayesian analysis of the Japanese excess lifetimes above 108 years. Using the likelihood and the log posterior encoded in \texttt{logpost\_elife}, we obtained independent samples from the posterior of the generalized Pareto parameters \((\sigma, \xi)\) with the maximal data information prior using the \CRANpkg{rust} package. Each parameter combination was then fed into \texttt{helife} and the hazard evaluated over a range of values. Figure \ref{fig:fig-hazard} shows the posterior samples and functional boxplots \citep{Sun.Genton:2011} of the hazard curves, obtained using the \CRANpkg{fda} package. The risk of dying increases with age, but comes with substantial uncertainty, as evidenced by the increasing width of the boxes and interquartile range.
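The delta-method calculation underlying Wald intervals for the hazard can be illustrated in a stripped-down setting. The sketch below shows the general technique, not the internals of \texttt{hazard\_elife}: for an exponential model with scale \(\sigma\), the hazard is constant, \(h = 1/\sigma\), the maximum likelihood estimate of \(\sigma\) is the mean of the exceedances with standard error \(\widehat{\sigma}/\sqrt{n}\), and the delta method gives \(\mathrm{se}(\widehat{h}) = \mathrm{se}(\widehat{\sigma})/\widehat{\sigma}^2\).

```python
import math

def exp_hazard_wald_ci(exceedances, z=1.959963984540054):
    """Wald confidence interval for the constant hazard h = 1/sigma
    of an exponential model, via the delta method."""
    n = len(exceedances)
    sigma_hat = sum(exceedances) / n      # MLE of the scale parameter
    se_sigma = sigma_hat / math.sqrt(n)   # asymptotic standard error
    h_hat = 1.0 / sigma_hat               # plug-in hazard estimate
    # delta method: se(h) = |dh/dsigma| * se(sigma) = se(sigma) / sigma^2
    se_h = se_sigma / sigma_hat ** 2
    return h_hat, (h_hat - z * se_h, h_hat + z * se_h)

# toy exceedances (hypothetical, in years above a threshold)
h, (lo, hi) = exp_hazard_wald_ci([1.0, 2.0, 3.0, 4.0])
```

The symmetry of this interval around \(\widehat{h}\) is exactly the limitation that motivates the profile likelihood alternative for skewed sampling distributions.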
+ +\begin{figure} +\includegraphics[width=1\linewidth]{RJ-2025-034_files/figure-latex/fig-hazard-1} \caption{Left: scatterplot of 1000 independent posterior samples from the generalized Pareto model with the maximal data information prior; the contour curves give the percentiles of credible intervals, and show approximate normality of the posterior. Right: functional boxplots for the corresponding hazard curves, with increasing width at higher ages.}\label{fig:fig-hazard} +\end{figure} + +\subsection{Nonparametric maximum likelihood estimation}\label{sec-nonparametric} + +The nonparametric maximum likelihood estimator is unique only up to equivalence classes. The data for individual \(i\) consist of the tuple \(\{L_i, R_i, V_i, U_i\}\), where the censoring interval is \([L_i, R_i]\) and the truncation interval is \([V_i, U_i]\), with \(0 \leq V_i \leq L_i \leq R_i \leq U_i \leq \infty\). \citet{Turnbull:1976} shows how one can build disjoint intervals \(C = \bigsqcup_{j=1}^m [a_j, b_j]\), where \(a_j \in \mathcal{L} = \{L_1, \ldots, L_n\}\) and \(b_j \in \mathcal{R} = \{R_1, \ldots, R_n\}\) satisfy \(a_1 \leq b_1 < \cdots < a_m \leq b_m\) and the intervals \([a_j, b_j]\) contain no other members of \(\mathcal{L}\) or \(\mathcal{R}\) except at the endpoints. This last condition notably ensures that, in the absence of truncation, the intervals created include all observed failure times as singleton sets. Other authors \citep{Lindsey.Ryan:1998} have taken interval censored data as semi-open intervals \((L_i, R_i]\), a convention we adopt here for numerical reasons. For interval censored and truncated data, \citet{Frydman:1994} shows that this construction must be amended by taking instead \(a_j \in \mathcal{L} \cup \{U_1, \ldots, U_n\}\) and \(b_j \in \mathcal{R} \cup \{V_1, \ldots, V_n\}\). + +We assign probability \(p_j = F(b_j^{+}) - F(a_j^{-})\) to each of the resulting \(m\) intervals, under the constraints \(\sum_{j=1}^m p_j = 1\) and \(p_j \ge 0\) \((j=1, \ldots, m)\).
The nonparametric maximum likelihood estimator of the distribution function \(F\) is then +\begin{align*} +\widehat{F}(t) = \begin{cases} 0, & t < a_1;\\ +\widehat{p}_1 + \cdots + \widehat{p}_j, & b_j < t < a_{j+1} \quad (1 \leq j \leq m-1);\\ +1, & t > b_m; +\end{cases} +\end{align*} +and is undefined for \(t \in [a_j, b_j]\) \((j=1, \ldots, m)\). + +\begin{figure} + +{\centering \includegraphics[width=1\linewidth]{RJ-2025-034_files/figure-latex/fig-turnbull-1} + +} + +\caption{Illustration of the equivalence classes based on Turnbull's algorithm, with truncation (pale grey) and censoring intervals (dark grey). Observations must fall within the equivalence classes defined by the truncation sets.}\label{fig:fig-turnbull} +\end{figure} + +The procedure of Turnbull can be encoded using \(n \times m\) matrices. For censoring, we build \(\mathbf{A}\) whose \((i,j)\)th entry \(\alpha_{ij}=1\) if \([a_j, b_j] \subseteq A_i\) and zero otherwise. Since the intervals forming the set \(C\) are disjoint and in increasing order, a more storage-efficient manner of keeping track of the intervals is to record, for each observation, the smallest index \(j \in \{1, \ldots, m\}\) such that \(L_i \leq a_j\) and the largest index such that \(R_i \ge b_j\). The same idea applies for the truncation sets \(B_i = (V_i, U_i)\) and the matrix \(\mathbf{B}\) with \((i,j)\) element \(\beta_{ij}\). + +The log likelihood function is +\begin{align*} +\ell(\boldsymbol{p}) = \sum_{i=1}^n w_i\left\{ \log \left( \sum_{j=1}^m \alpha_{ij}p_j\right) - \log \left( \sum_{j=1}^m \beta_{ij}p_j\right)\right\}. +\end{align*} + +The numerical implementation of the EM algorithm is in principle straightforward: first identify the equivalence classes \(C\), next calculate the entries of \(\mathbf{A}\) and \(\mathbf{B}\) (or the vectors of index ranges), and finally run the EM iterations. In the second step, we need to account for potential ties in the presence of (interval) censoring and treat the intervals as open on the left for censored data.
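Identifying the equivalence classes in the censoring-only case amounts to scanning the sorted censoring bounds and keeping each left bound that is immediately followed by a right bound. A minimal sketch of this construction (an illustration of the idea, not the package's implementation):

```python
def turnbull_intervals(left, right):
    """Turnbull equivalence classes [a_j, b_j] from the left and right
    censoring bounds: sort all bounds, tagging left bounds so that they
    precede right bounds at ties; each left bound directly followed by
    a right bound delimits a class containing no other bound."""
    # tag 0 for left bounds, 1 for right bounds (ties: left sorts first)
    events = sorted([(l, 0) for l in left] + [(r, 1) for r in right])
    classes = []
    for (x, tx), (y, ty) in zip(events, events[1:]):
        if tx == 0 and ty == 1:  # left bound immediately followed by a right bound
            classes.append((x, y))
    return classes

# bounds for three observations: right-censored at 1, deaths at 1 and 2
cls = turnbull_intervals(left=[1, 1, 2], right=[float("inf"), 1, 2])
```

For these bounds, \(\mathcal{L} = \{1, 2\}\) and \(\mathcal{R} = \{1, 2, \infty\}\), the classes are the singletons \(\{1\}\) and \(\{2\}\): the observed failure times, as the construction guarantees in the absence of truncation.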
For concreteness, consider the toy example \(\boldsymbol{T} =(1,1,2)\) and \(\boldsymbol{\delta} = (0,1,1)\), where \(\delta_i = 1\) if the observation is a failure time and \(\delta_i=0\) in case of right censoring. +The left and right censoring bounds are \(\mathcal{L} = \{1, 2\}\) and \(\mathcal{R} = \{1, 2, \infty\}\), with \(A_1 = (1, \infty)\), \(A_2 = \{1\}\) and \(A_3 = \{2\}\); the equivalence classes are \(C_1=\{1\}\) and \(C_2=\{2\}\). If we were to treat instead \(A_1\) as a semi-closed interval \([1, \infty)\), direct maximization of the log likelihood in eq. 2.2 of \citet{Turnbull:1976} would give probability one half to each observed failure time. By contrast, the Kaplan--Meier estimator, under the convention that right censored observations at time \(t\) remain at risk up to and including \(t\), assigns probability 1/3 to the first failure. To retrieve this solution with Turnbull's EM estimator, we need the convention that \(C_1 \not\subseteq A_1\), but enforcing this requires comparing the bound with itself. The numerical tolerance in the implementation is taken to be the square root of the machine epsilon for doubles. + +The maximum likelihood estimator (MLE) need not be unique, and the EM algorithm is only guaranteed to converge to a local maximum. For interval censored data, \citet{Gentleman.Geyer:1994} consider using the Karush--Kuhn--Tucker conditions to determine whether the probability in some intervals is exactly zero and whether the returned value is indeed the MLE. + +Due to data scarcity, statistical inference for human lifespan is best conducted using parametric models supported by asymptotic theory, reserving nonparametric estimators for assessing goodness of fit. The empirical cumulative hazard for the Japanese data is very close to linear from early ages, suggesting that the hazard may not be very far from exponential, even if more complex models are likely to be favored given the large sample size.
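The EM iteration itself is short in the truncation-free case: the E-step allocates each observation among the equivalence classes contained in its censoring set, proportionally to the current weights, and the M-step averages these allocations over observations. The following sketch (an illustration under the open-left convention discussed above, not the package's code) reproduces the Kaplan--Meier weights for the toy example:

```python
def turnbull_em(alpha, n_iter=200):
    """EM algorithm for censored data without truncation.
    alpha[i][j] = 1 if equivalence class j is contained in the
    censoring set A_i of observation i, and 0 otherwise."""
    n, m = len(alpha), len(alpha[0])
    p = [1.0 / m] * m                   # start from the uniform distribution
    for _ in range(n_iter):
        new = [0.0] * m
        for i in range(n):
            denom = sum(alpha[i][j] * p[j] for j in range(m))
            for j in range(m):
                # E-step: posterior probability that observation i fell
                # in class j; M-step: average these over observations
                new[j] += alpha[i][j] * p[j] / denom / n
        p = new
    return p

# toy example: A_1 = (1, infty) excludes C_1 = {1} under the open-left
# convention, A_2 = {1}, A_3 = {2}; classes C_1 = {1}, C_2 = {2}
p = turnbull_em([[0, 1], [1, 0], [0, 1]])
```

The estimator assigns probability 1/3 to \(\{1\}\) and 2/3 to \(\{2\}\), matching the Kaplan--Meier solution; replacing the first row by \([1, 1]\) (the semi-closed convention) yields the half-half split mentioned above instead.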
+ +The function \texttt{np\_elife} returns a list with Turnbull's intervals \([a_j, b_j]\) and the probability weights assigned to each, provided these weights are positive. It also contains an object of class \texttt{stepfun} with a weighting argument that defines a cumulative distribution function. + +We can use the nonparametric maximum likelihood estimator of the distribution function to assess a fitted parametric model by comparing the density with a binned distribution, or the cumulative distribution function. The distribution function is defined over equivalence classes, which may be isolated observations or intervals. The data interval over which there is non-zero probability of events is broken down into equispaced bins, and the probability of failing in each bin is estimated nonparametrically from the distribution function. Alternatively, users can provide a set of \texttt{breaks}, or the number of bins. + +\begin{verbatim} +ecdf <- np_elife(arguments = args_japan, thresh = 108) +# Summary statistics, accounting for censoring +round(summary(ecdf), digits = 2) +#> Min. 1st Qu. Median Mean 3rd Qu. Max. +#> 109.00 110.00 111.00 110.01 111.00 118.00 +# Plots of fitted parametric model and nonparametric CDF +model_gp <- fit_elife( + arguments = args_japan, + thresh = 108, + family = "gp", + export = TRUE) +# ggplot2 plots, wrapped to display side by side +patchwork::wrap_plots( + autoplot( + model_gp, # fitted model + plot = FALSE, # return list of ggplots + which.plot = c("dens", "cdf"), + breaks = seq(0L, 8L, by = 1L) # set bins for histogram + ) +) +\end{verbatim} + +\begin{figure} + +{\centering \includegraphics[width=1\linewidth]{RJ-2025-034_files/figure-latex/fig-ecdf-1} + +} + +\caption{Nonparametric maximum likelihood estimate of the density (bar plot, left) and distribution function (staircase function, right), with superimposed generalized Pareto fit for excess lifetimes above 108 years.
Except for the discreteness inherent to the nonparametric estimates, the two representations broadly agree at year marks.}\label{fig:fig-ecdf} +\end{figure} + +\section{Conclusion}\label{conclusion} + +This paper describes the salient features of \CRANpkg{longevity}, explaining the theoretical underpinnings of the methods and the design considerations followed when writing the package. While \CRANpkg{longevity} was conceived for modelling lifetimes, the package could be used for applications outside of demography. Survival data are infrequent in extreme value theory, yet hardly absent. For example, rainfall observations can be viewed as rounded due to the limited instrumental precision of rain gauges and treated as interval-censored. Some historical records, which are often lower bounds on the real magnitude of natural catastrophes, can be added as right-censored observations in a peaks-over-threshold analysis. In insurance, losses incurred due to liability claims may be right-censored if they exceed the policy cap and are covered by a reinsurance company, or rounded \citep{Belzile.Neslehova:2025}. In climate science, attribution studies often focus on data just after a record-breaking event, and the stopping rule leads to truncation \citep{Miralles.Davison:2023}, which biases results if ignored. + +The package has features that are interesting in their own right, including adapted quantile-quantile plots and other visual goodness of fit diagnostics for model validation. The testing procedures correctly handle tests for restrictions lying on the boundary of the parameter space. Parametric bootstrap procedures for such tests are not straightforward to implement, given the heavy reliance on the data generating mechanism and the diversity of possible scenarios. This paper, however, shows how the utilities of the package can be combined to ease such estimation, and the code in the supplementary material illustrates how this can be extended to goodness of fit testing.
+ +\section*{Acknowledgements}\label{acknowledgements} +\addcontentsline{toc}{section}{Acknowledgements} + +The author thanks four anonymous reviewers for valuable feedback. This research was supported financially by the Natural Sciences and Engineering Research Council of Canada (NSERC) via Discovery Grant RGPIN-2022-05001. + +\bibliography{longevity.bib} + +\address{% +Léo R. Belzile\\ +Department of Decision Sciences, HEC Montréal\\% +3000, chemin de la Côte-Sainte-Catherine\\ Montréal (Québec), Canada\\ H3T 2A7\\ +% +\url{https://lbelzile.bitbucket.io}\\% +\textit{ORCiD: \href{https://orcid.org/0000-0002-9135-014X}{0000-0002-9135-014X}}\\% +\href{mailto:leo.belzile@hec.ca}{\nolinkurl{leo.belzile@hec.ca}}% +} diff --git a/_articles/RJ-2025-034/RJ-2025-034.zip b/_articles/RJ-2025-034/RJ-2025-034.zip new file mode 100644 index 0000000000..ca3580d058 Binary files /dev/null and b/_articles/RJ-2025-034/RJ-2025-034.zip differ diff --git a/_articles/RJ-2025-034/RJ-2025-034_files/figure-html5/fig-ecdf-1.png b/_articles/RJ-2025-034/RJ-2025-034_files/figure-html5/fig-ecdf-1.png new file mode 100644 index 0000000000..44f70cdfca Binary files /dev/null and b/_articles/RJ-2025-034/RJ-2025-034_files/figure-html5/fig-ecdf-1.png differ diff --git a/_articles/RJ-2025-034/RJ-2025-034_files/figure-html5/fig-endpoint-confint-1.png b/_articles/RJ-2025-034/RJ-2025-034_files/figure-html5/fig-endpoint-confint-1.png new file mode 100644 index 0000000000..47b44cba86 Binary files /dev/null and b/_articles/RJ-2025-034/RJ-2025-034_files/figure-html5/fig-endpoint-confint-1.png differ diff --git a/_articles/RJ-2025-034/RJ-2025-034_files/figure-html5/fig-hazard-1.png b/_articles/RJ-2025-034/RJ-2025-034_files/figure-html5/fig-hazard-1.png new file mode 100644 index 0000000000..d5bc33c9ac Binary files /dev/null and b/_articles/RJ-2025-034/RJ-2025-034_files/figure-html5/fig-hazard-1.png differ diff --git a/_articles/RJ-2025-034/RJ-2025-034_files/figure-html5/fig-parameterstab-1.png 
b/_articles/RJ-2025-034/RJ-2025-034_files/figure-html5/fig-parameterstab-1.png new file mode 100644 index 0000000000..16d522bec1 Binary files /dev/null and b/_articles/RJ-2025-034/RJ-2025-034_files/figure-html5/fig-parameterstab-1.png differ diff --git a/_articles/RJ-2025-034/RJ-2025-034_files/figure-html5/fig-qqplots-1.png b/_articles/RJ-2025-034/RJ-2025-034_files/figure-html5/fig-qqplots-1.png new file mode 100644 index 0000000000..3b7e729359 Binary files /dev/null and b/_articles/RJ-2025-034/RJ-2025-034_files/figure-html5/fig-qqplots-1.png differ diff --git a/_articles/RJ-2025-034/RJ-2025-034_files/figure-html5/fig-turnbull-1.png b/_articles/RJ-2025-034/RJ-2025-034_files/figure-html5/fig-turnbull-1.png new file mode 100644 index 0000000000..99292b1711 Binary files /dev/null and b/_articles/RJ-2025-034/RJ-2025-034_files/figure-html5/fig-turnbull-1.png differ diff --git a/_articles/RJ-2025-034/RJ-2025-034_files/figure-latex/fig-ecdf-1.pdf b/_articles/RJ-2025-034/RJ-2025-034_files/figure-latex/fig-ecdf-1.pdf new file mode 100644 index 0000000000..d4361874de Binary files /dev/null and b/_articles/RJ-2025-034/RJ-2025-034_files/figure-latex/fig-ecdf-1.pdf differ diff --git a/_articles/RJ-2025-034/RJ-2025-034_files/figure-latex/fig-endpoint-confint-1.pdf b/_articles/RJ-2025-034/RJ-2025-034_files/figure-latex/fig-endpoint-confint-1.pdf new file mode 100644 index 0000000000..b77014293a Binary files /dev/null and b/_articles/RJ-2025-034/RJ-2025-034_files/figure-latex/fig-endpoint-confint-1.pdf differ diff --git a/_articles/RJ-2025-034/RJ-2025-034_files/figure-latex/fig-hazard-1.pdf b/_articles/RJ-2025-034/RJ-2025-034_files/figure-latex/fig-hazard-1.pdf new file mode 100644 index 0000000000..da46f6a2e8 Binary files /dev/null and b/_articles/RJ-2025-034/RJ-2025-034_files/figure-latex/fig-hazard-1.pdf differ diff --git a/_articles/RJ-2025-034/RJ-2025-034_files/figure-latex/fig-parameterstab-1.pdf b/_articles/RJ-2025-034/RJ-2025-034_files/figure-latex/fig-parameterstab-1.pdf 
new file mode 100644 index 0000000000..b87cffac23 Binary files /dev/null and b/_articles/RJ-2025-034/RJ-2025-034_files/figure-latex/fig-parameterstab-1.pdf differ diff --git a/_articles/RJ-2025-034/RJ-2025-034_files/figure-latex/fig-qqplots-1.pdf b/_articles/RJ-2025-034/RJ-2025-034_files/figure-latex/fig-qqplots-1.pdf new file mode 100644 index 0000000000..991420134d Binary files /dev/null and b/_articles/RJ-2025-034/RJ-2025-034_files/figure-latex/fig-qqplots-1.pdf differ diff --git a/_articles/RJ-2025-034/RJ-2025-034_files/figure-latex/fig-turnbull-1.pdf b/_articles/RJ-2025-034/RJ-2025-034_files/figure-latex/fig-turnbull-1.pdf new file mode 100644 index 0000000000..0b7aa34d5a Binary files /dev/null and b/_articles/RJ-2025-034/RJ-2025-034_files/figure-latex/fig-turnbull-1.pdf differ diff --git a/_articles/RJ-2025-034/RJournal.sty b/_articles/RJ-2025-034/RJournal.sty new file mode 100644 index 0000000000..351990be38 --- /dev/null +++ b/_articles/RJ-2025-034/RJournal.sty @@ -0,0 +1,358 @@ +% Package `RJournal' to use with LaTeX2e +% Copyright (C) 2010 by the R Foundation +% Copyright (C) 2013 by the R Journal +% +% Originally written by Kurt Hornik and Friedrich Leisch with subsequent +% edits by the editorial board +% +% CAUTION: +% Do not modify this style file. Any changes to this file will be reset when your +% article is submitted. +% If you must modify the style or add LaTeX packages to the article, these +% should be specified in RJwrapper.tex + +\NeedsTeXFormat{LaTeX2e}[1995/12/01] +\ProvidesPackage{RJournal}[2025/10/05 v0.17 RJournal package] + +\RequirePackage{tikz} + +% Overall page layout, fonts etc ----------------------------------------------- + +% Issues of of \emph{The R Journal} are created from the standard \LaTeX{} +% document class \pkg{report}. 
+ +\RequirePackage{geometry} +\geometry{a4paper, + textwidth=14cm, top=1cm, bottom=1cm, + includehead,includefoot,centering, + footskip=1.5cm} +\raggedbottom +\sloppy +\clubpenalty = 10000 +\widowpenalty = 10000 +\brokenpenalty = 10000 +\usepackage{microtype} + + +\RequirePackage{fancyhdr} +\fancyhead{} +\fancyheadoffset{2cm} +\fancyhead[L]{\textsc{\RJ@sectionhead}} +\fancyhead[R]{\thepage} +\fancyfoot{} +\fancyfoot[L]{The R Journal Vol. \RJ@volume/\RJ@number, \RJ@month~\RJ@year} +\fancyfoot[R]{ISSN 2073-4859} +\pagestyle{fancy} + +% We use the following fonts (all with T1 encoding): +% +% rm & palatino +% tt & inconsolata +% sf & helvetica +% math & palatino + +\RequirePackage{microtype} + +\RequirePackage[scaled=0.92]{helvet} +\RequirePackage{palatino,mathpazo} +\RequirePackage[scaled=1.02]{inconsolata} +\RequirePackage[T1]{fontenc} + +\RequirePackage[hyphens]{url} +\RequirePackage[pagebackref]{hyperref} +\renewcommand{\backref}[1]{[p#1]} + +% Dark blue colour for all links +\RequirePackage{color} +\definecolor{link}{rgb}{0.45,0.51,0.67} +\hypersetup{ + colorlinks,% + citecolor=link,% + filecolor=link,% + linkcolor=link,% + urlcolor=link +} + +% Give the text a little room to breath +\setlength{\parskip}{3pt} +\RequirePackage{setspace} +\setstretch{1.05} + +% Issue and article metadata --------------------------------------------------- + +% Basic front matter information about the issue: volume, number, and +% date. + +\newcommand{\volume}[1]{\def\RJ@volume{#1}} +\newcommand{\volnumber}[1]{\def\RJ@number{#1}} +\renewcommand{\month}[1]{\def\RJ@month{#1}} +\renewcommand{\year}[1]{\def\RJ@year{#1}} + + +% Individual articles correspond to +% chapters, and are contained in |article| environments. This makes it +% easy to have figures counted within articles and hence hyperlinked +% correctly. + +% An article has an author, a title, and optionally a subtitle. We use +% the obvious commands for specifying these. 
Articles will be put in certain +% journal sections, named by \sectionhead. + +\newcommand {\sectionhead} [1]{\def\RJ@sectionhead{#1}} +\renewcommand{\author} [1]{\def\RJ@author{#1}} +\renewcommand{\title} [1]{\def\RJ@title{#1}} +\newcommand {\subtitle} [1]{\def\RJ@subtitle{#1}} + +% Control appearance of titles: make slightly smaller than usual, and +% suppress section numbering. See http://tex.stackexchange.com/questions/69749 +% for why we don't use \setcounter{secnumdepth}{-1} + +\usepackage[medium]{titlesec} +\usepackage{titletoc} +\titleformat{\section} {\normalfont\large\bfseries}{\arabic{section}}{1em}{} +\titleformat{\subsection}{\normalfont\normalsize\bfseries}{\arabic{section}.\arabic{subsection}}{0.5em}{} +\titlecontents{chapter} [0em]{}{}{}{\titlerule*[1em]{.}\contentspage} + +% Article layout --------------------------------------------------------------- + +% Environment |article| clears the article header information at its beginning. +% We use |\FloatBarrier| from the placeins package to keep floats within +% the article. +\RequirePackage{placeins} +\newenvironment{article}{\author{}\title{}\subtitle{}\FloatBarrier}{\FloatBarrier} + +% Refereed articles should have an abstract, so we redefine |\abstract| to +% give the desired style + +\renewcommand{\abstract}[1]{\noindent\textbf{Abstract} #1} +\renewenvironment{abstract}{\noindent\textbf{Abstract}~}{} + +% The real work is done by a redefined version of |\maketitle|. Note +% that even though we do not want chapters (articles) numbered, we +% need to increment the chapter counter, so that figures get correct +% labelling. 
+ +\renewcommand{\maketitle}{% +\noindent + \chapter{\RJ@title}\refstepcounter{chapter} + \ifx\empty\RJ@subtitle + \else + \noindent\textbf{\RJ@subtitle} + \par\nobreak\addvspace{\baselineskip} + \fi + \ifx\empty\RJ@author + \else + \noindent\textit{\RJ@author} + \par\nobreak\addvspace{\baselineskip} + \fi + \@afterindentfalse\@nobreaktrue\@afterheading +} + +% Now for some ugly redefinitions. We do not want articles to start a +% new page. (Actually, we do, but this is handled via explicit +% \newpage +% +% The name@of@eq is a hack to get hyperlinks to equations to work +% within each article, even though there may be multiple eq.(1) +% \begin{macrocode} +\renewcommand\chapter{\secdef\RJ@chapter\@schapter} +\providecommand{\nohyphens}{% + \hyphenpenalty=10000\exhyphenpenalty=10000\relax} +\newcommand{\RJ@chapter}{% + \edef\name@of@eq{equation.\@arabic{\c@chapter}}% + \renewcommand{\@seccntformat}[1]{}% + \@startsection{chapter}{0}{0mm}{% + -2\baselineskip \@plus -\baselineskip \@minus -.2ex}{\p@}{% + \phantomsection\normalfont\huge\bfseries\raggedright}} + +% Book reviews should appear as sections in the text and in the pdf bookmarks, +% however we wish them to appear as chapters in the TOC. Thus we define an +% alternative to |\maketitle| for reviews. +\newcommand{\review}[1]{ + \pdfbookmark[1]{#1}{#1} + \section*{#1} + \addtocontents{toc}{\protect\contentsline{chapter}{#1}{\thepage}{#1.1}} +} + +% We want bibliographies as starred sections within articles. +% +\RequirePackage[sectionbib,round]{natbib} +\bibliographystyle{abbrvnat} +\renewcommand{\bibsection}{\section*{References}} + +% Equations, figures and tables are counted within articles, but we do +% not show the article number. For equations it becomes a bit messy to avoid +% having hyperref getting it wrong. 
+ +% \numberwithin{equation}{chapter} +\renewcommand{\theequation}{\@arabic\c@equation} +\renewcommand{\thefigure}{\@arabic\c@figure} +\renewcommand{\thetable}{\@arabic\c@table} + +% Issue layout ----------------------------------------------------------------- + +% Need to provide our own version of |\tableofcontents|. We use the +% tikz package to get the rounded rectangle. Notice that |\section*| +% is really the same as |\chapter*|. +\renewcommand{\contentsname}{Contents} +\renewcommand\tableofcontents{% + \vspace{1cm} + \section*{\contentsname} + { \@starttoc{toc} } +} + +\renewcommand{\titlepage}{% + \thispagestyle{empty} + \hypersetup{ + pdftitle={The R Journal Volume \RJ@volume/\RJ@number, \RJ@month \RJ@year},% + pdfauthor={R Foundation for Statistical Computing},% + } + \noindent + \begin{center} + \fontsize{50pt}{50pt}\selectfont + The \raisebox{-8pt}{\includegraphics[height=77pt]{Rlogo-5}}\hspace{10pt} + Journal + + \end{center} + {\large \hfill Volume \RJ@volume/\RJ@number, \RJ@month{} \RJ@year \quad} + + \rule{\textwidth}{1pt} + \begin{center} + {\Large A peer-reviewed, open-access publication of the \\ + R Foundation for Statistical Computing} + \end{center} + + % And finally, put in the TOC box. Note the way |tocdepth| is adjusted + % before and after producing the TOC: thus, we can ensure that only + % articles show up in the printed TOC, but that in the PDF version, + % bookmarks are created for sections and subsections as well (provided + % that the non-starred forms are used). + \setcounter{tocdepth}{0} + \tableofcontents + \setcounter{tocdepth}{2} + \clearpage +} + +% Text formatting -------------------------------------------------------------- + +\newcommand{\R}{R} +\newcommand{\address}[1]{\addvspace{\baselineskip}\noindent\emph{#1}} +\newcommand{\email}[1]{\href{mailto:#1}{\normalfont\texttt{#1}}} + +% Simple font selection is not good enough. For example, |\texttt{--}| +% gives `\texttt{--}', i.e., an endash in typewriter font. 
Hence, we +% need to turn off ligatures, which currently only happens for commands +% |\code| and |\samp| and the ones derived from them. Hyphenation is +% another issue; it should really be turned off inside |\samp|. And +% most importantly, \LaTeX{} special characters are a nightmare. E.g., +% one needs |\~{}| to produce a tilde in a file name marked by |\file|. +% Perhaps a few years ago, most users would have agreed that this may be +% unfortunate but should not be changed to ensure consistency. But with +% the advent of the WWW and the need for getting `|~|' and `|#|' into +% URLs, commands which only treat the escape and grouping characters +% specially have gained acceptance + +\DeclareRobustCommand\code{\bgroup\@noligs\@codex} +\def\@codex#1{\texorpdfstring% +{{\normalfont\ttfamily\hyphenchar\font=-1 #1}}% +{#1}\egroup} +\newcommand{\kbd}[1]{{\normalfont\texttt{#1}}} +\newcommand{\key}[1]{{\normalfont\texttt{\uppercase{#1}}}} +\DeclareRobustCommand\samp{`\bgroup\@noligs\@sampx} +\def\@sampx#1{{\normalfont\texttt{#1}}\egroup'} +\newcommand{\var}[1]{{\normalfont\textsl{#1}}} +\let\env=\code +\newcommand{\file}[1]{{`\normalfont\textsf{#1}'}} +\let\command=\code +\let\option=\samp +\newcommand{\dfn}[1]{{\normalfont\textsl{#1}}} +% \acronym is effectively disabled since not used consistently +\newcommand{\acronym}[1]{#1} +\newcommand{\strong}[1]{\texorpdfstring% +{{\normalfont\fontseries{b}\selectfont #1}}% +{#1}} +\let\pkg=\strong +\newcommand{\CRANpkg}[1]{\href{https://CRAN.R-project.org/package=#1}{\pkg{#1}}}% +\let\cpkg=\CRANpkg +\newcommand{\ctv}[1]{\href{https://CRAN.R-project.org/view=#1}{\emph{#1}}} +\newcommand{\BIOpkg}[1]{\href{https://www.bioconductor.org/packages/release/bioc/html/#1.html}{\pkg{#1}}} + +% Example environments --------------------------------------------------------- +\RequirePackage{fancyvrb} +\RequirePackage{alltt} + +\DefineVerbatimEnvironment{example}{Verbatim}{} +\renewenvironment{example*}{\begin{alltt}}{\end{alltt}} + +% 
Support for output from Sweave, and generic session style code +% These used to have fontshape=sl for Sinput/Scode/Sin, but pslatex +% won't use a condensed font in that case. + +% Update (2015-05-28 by DS): remove fontsize=\small to match example environment + +\DefineVerbatimEnvironment{Sinput}{Verbatim}{} +\DefineVerbatimEnvironment{Soutput}{Verbatim}{} +\DefineVerbatimEnvironment{Scode}{Verbatim}{} +\DefineVerbatimEnvironment{Sin}{Verbatim}{} +\DefineVerbatimEnvironment{Sout}{Verbatim}{} +\newenvironment{Schunk}{}{} + +% Mathematics ------------------------------------------------------------------ + +% The implementation of |\operatorname| is similar to the mechanism +% \LaTeXe{} uses for functions like sin and cos, and simpler than the +% one of \AmSLaTeX{}. We use |\providecommand| for the definition in +% order to keep the one of the \pkg{amstex} if this package has +% already been loaded. +% \begin{macrocode} +\providecommand{\operatorname}[1]{% + \mathop{\operator@font#1}\nolimits} +\RequirePackage{amsfonts} + +\renewcommand{\P}{% + \mathop{\operator@font I\hspace{-1.5pt}P\hspace{.13pt}}} +\newcommand{\E}{% + \mathop{\operator@font I\hspace{-1.5pt}E\hspace{.13pt}}} +\newcommand{\VAR}{\operatorname{var}} +\newcommand{\COV}{\operatorname{cov}} +\newcommand{\COR}{\operatorname{cor}} + +% Figures ---------------------------------------------------------------------- + +% For use with pandoc > 3.2.1 +\newsavebox\pandoc@box +\newcommand*\pandocbounded[1]{% scales image to fit in text height/width + \sbox\pandoc@box{#1}% + \Gscale@div\@tempa{\textheight}{\dimexpr\ht\pandoc@box+\dp\pandoc@box\relax}% + \Gscale@div\@tempb{\linewidth}{\wd\pandoc@box}% + \ifdim\@tempb\p@<\@tempa\p@\let\@tempa\@tempb\fi% select the smaller of both + \ifdim\@tempa\p@<\p@\scalebox{\@tempa}{\usebox\pandoc@box}% + \else\usebox{\pandoc@box}% + \fi% +} + +\RequirePackage[font=small,labelfont=bf]{caption} + +% Wide environments for figures and tables ------------------------------------- 
+\RequirePackage{environ} + +% An easy way to make a figure span the full width of the page +\NewEnviron{widefigure}[1][]{ +\begin{figure}[#1] +\advance\leftskip-2cm +\begin{minipage}{\dimexpr\textwidth+4cm\relax}% + \captionsetup{margin=2cm} + \BODY +\end{minipage}% +\end{figure} +} + +\NewEnviron{widetable}[1][]{ +\begin{table}[#1] +\advance\leftskip-2cm +\begin{minipage}{\dimexpr\textwidth+4cm\relax}% + \captionsetup{margin=2cm} + \BODY +\end{minipage}% +\end{table} +} diff --git a/_articles/RJ-2025-034/RJwrapper.tex b/_articles/RJ-2025-034/RJwrapper.tex new file mode 100644 index 0000000000..831fed17b8 --- /dev/null +++ b/_articles/RJ-2025-034/RJwrapper.tex @@ -0,0 +1,72 @@ +\documentclass[a4paper]{report} +\usepackage[utf8]{inputenc} +\usepackage[T1]{fontenc} +\usepackage{RJournal} +\usepackage{amsmath,amssymb,array} +\usepackage{booktabs} + + +% tightlist command for lists without linebreak +\providecommand{\tightlist}{% + \setlength{\itemsep}{0pt}\setlength{\parskip}{0pt}} + +\usepackage{longtable} + +% Always define CSL refs as bib entries are contained in separate doc +% Pandoc citation processing +%From Pandoc 3.1.8 +% definitions for citeproc citations +\NewDocumentCommand\citeproctext{}{} +\NewDocumentCommand\citeproc{mm}{% + \begingroup\def\citeproctext{#2}\cite{#1}\endgroup} +\makeatletter + % allow citations to break across lines + \let\@cite@ofmt\@firstofone + % avoid brackets around text for \cite: + \def\@biblabel#1{} + \def\@cite#1#2{{#1\if@tempswa , #2\fi}} +\makeatother +\newlength{\cslhangindent} +\setlength{\cslhangindent}{1.5em} +\newlength{\csllabelwidth} +\setlength{\csllabelwidth}{3em} +\newenvironment{CSLReferences}[2] % #1 hanging-indent, #2 entry-spacing + {\begin{list}{}{% + \setlength{\itemindent}{0pt} + \setlength{\leftmargin}{0pt} + \setlength{\parsep}{0pt} + % turn on hanging indent if param 1 is 1 + \ifodd #1 + \setlength{\leftmargin}{\cslhangindent} + \setlength{\itemindent}{-1\cslhangindent} + \fi + % set entry spacing + 
\setlength{\itemsep}{#2\baselineskip}}} + {\end{list}} +\usepackage{calc} +\newcommand{\CSLBlock}[1]{#1\hfill\break} +\newcommand{\CSLLeftMargin}[1]{\parbox[t]{\csllabelwidth}{#1}} +\newcommand{\CSLRightInline}[1]{\parbox[t]{\linewidth - \csllabelwidth}{#1}\break} +\newcommand{\CSLIndent}[1]{\hspace{\cslhangindent}#1} + + +\usepackage{amsfonts} + + +\begin{document} + + +%% do not edit, for illustration only +\sectionhead{Contributed research article} +\volume{17} +\volnumber{4} +\year{2025} +\month{December} +\setcounter{page}{37} + +\begin{article} + \input{RJ-2025-034} +\end{article} + + +\end{document} diff --git a/_articles/RJ-2025-034/Rlogo-5.png b/_articles/RJ-2025-034/Rlogo-5.png new file mode 100644 index 0000000000..1a04f27478 Binary files /dev/null and b/_articles/RJ-2025-034/Rlogo-5.png differ diff --git a/_articles/RJ-2025-034/fig/Lexis_local_hazard.pdf b/_articles/RJ-2025-034/fig/Lexis_local_hazard.pdf new file mode 100644 index 0000000000..88fb61cc9a Binary files /dev/null and b/_articles/RJ-2025-034/fig/Lexis_local_hazard.pdf differ diff --git a/_articles/RJ-2025-034/fig/Lexis_local_hazard.png b/_articles/RJ-2025-034/fig/Lexis_local_hazard.png new file mode 100644 index 0000000000..daea63f675 Binary files /dev/null and b/_articles/RJ-2025-034/fig/Lexis_local_hazard.png differ diff --git a/_articles/RJ-2025-034/fig/Lexis_local_hazard.tex b/_articles/RJ-2025-034/fig/Lexis_local_hazard.tex new file mode 100644 index 0000000000..ec71ce4538 --- /dev/null +++ b/_articles/RJ-2025-034/fig/Lexis_local_hazard.tex @@ -0,0 +1,99 @@ +\documentclass{standalone} + +\usepackage{tikz} + \usetikzlibrary{shapes,shapes.misc,arrows} + \tikzset{cross/.style={cross out, draw=black, minimum size=2*(#1-\pgflinewidth), inner sep=0pt, outer sep=0pt}, +% %default radius will be 1pt. 
+ cross/.default={2pt}} +\usepackage{amsmath} +\usepackage{fourier} + +\begin{document} + +\begin{tikzpicture} +\fill [gray!10] (0,0) rectangle (3,5); +\fill [gray!30] (0,0) rectangle (3,1); +\fill [gray!30] (0,2) rectangle (3,3); +\fill [gray!30] (0,4) rectangle (3,5); +\draw[dashed](3,-0.1)--(3,5); %vertical right bar +% \draw[dotted, very thick, color=white](0,3.7)--(3,3.7); +% \draw[dotted, very thick, color=white](0,1.2)--(3,1.2); +% \draw(3.8,1.2)--(3.9,1.2) node[right] at (3.9,1.2) {\small $t_B$}; +% \draw(3.8,0.8)--(3.9,0.8) node[right] at (3.9,0.8) {\small $c_2-x_D$}; +% \draw(3.8,1.6)--(3.9,1.6) node[right] at (3.9,1.6) {\small $c_1-x_C$}; +% \draw(3.8,3.2)--(3.9,3.2) node[right] at (3.9,3.2) {\small $c_1-x_A$}; +% \draw(3.8,4.6)--(3.9,4.6) node[right] at (3.9,4.6) {\small $c_2-x_C$}; +% \draw(3.8,3.7)--(3.9,3.7) node[right] at (3.9,3.7) {\small $t_A$}; +% \draw (3.7,0)--(3.9,0) node[right]{\small $0$}; + +\node[anchor=base,inner sep=0pt, outer sep=0pt] at (0,5.1) {Excess lifetime}; +\draw (0,0)--(3.2,0); +\draw (0,-0.1)--(0,5); +\node[below] at(0,-0.1) {$c_1$}; +\node[below] at(3,-0.1) {$c_2$}; +\node[left] at (0, 0) {$0$}; +\draw (-0.1,0)--(0,0); +\draw (-0.1,1)--(0,1); +\draw (-0.1,2)--(0,2); +\draw (-0.1,3)--(0,3); +\draw (-0.1,4)--(0,4); +\node[left] at (0, 1) {$a_1$}; +\node[left] at (0, 2) {$a_2$}; +\node[left] at (0, 3) {$a_3$}; +\node[left] at (0, 4) {$a_4$}; +% \draw (2.2,0)--(2.2,-0.1) ; +% \draw (0.4,0)--(0.4,-0.1); +% \draw (-3.2,0)--(-3.2,-0.1); +% Draw C +\draw (0,1.6)-- (3,4.6); +\draw[dashed] (3,4.6) -- (3.3,4.9); +\draw[fill=black] (3,4.6) circle (1pt) ; +\node[right] at (3,4.6) {$C$}; + +% Draw D +\node[above left] at (-0.2,0.8) {}; +\draw (2.2,0)-- (3,0.8); +\draw[fill=black] (3,0.8) circle (1pt); +\draw[dashed] (3,0.8) -- (3.3,1.1); +\draw (0,3.2)-- (0.5,3.7); +\draw (0.5,3.7) node[cross,rotate=45] {}; +\node[right] at (0.5,3.7) {$A$}; +\draw (0.4,0)-- (1.6,1.2); +\draw (1.6,1.2) node[cross,rotate=45] {}; +\node[right] at (1.6,1.2) 
{$B$};
+
+\draw (1.1,0) -- (1.8, 0.7);
+\draw (1.8,0.7) node[cross,rotate=45] {};
+
+
+
+\draw (0,3) -- (0.2, 3.2);
+\draw (0.2, 3.2) node[cross,rotate=45] {};
+
+\draw (2.8,0) -- (2.95, 0.15);
+\draw (2.95, 0.15) node[cross,rotate=45] {};
+
+
+\node [below] at (3,-0.8){calendar time};
+% \draw[->,>=stealth](-0.5,0)--(-0.5,5);
+\end{tikzpicture}
+
+\end{document}
diff --git a/_articles/RJ-2025-034/fig/nesting_graph.pdf b/_articles/RJ-2025-034/fig/nesting_graph.pdf
new file mode 100644
index 0000000000..d05790ba43
Binary files /dev/null and b/_articles/RJ-2025-034/fig/nesting_graph.pdf differ
diff --git a/_articles/RJ-2025-034/fig/nesting_graph.png b/_articles/RJ-2025-034/fig/nesting_graph.png
new file mode 100644
index 0000000000..e813e0d8b7
Binary files /dev/null and b/_articles/RJ-2025-034/fig/nesting_graph.png differ
diff --git a/_articles/RJ-2025-034/fig/nesting_graph.tex b/_articles/RJ-2025-034/fig/nesting_graph.tex
new file mode 100644
index 0000000000..e93cb76713
--- /dev/null
+++ b/_articles/RJ-2025-034/fig/nesting_graph.tex
@@ -0,0 +1,75 @@
+\documentclass{standalone}
+\usepackage{tikz}
+\RequirePackage{palatino,mathpazo}
+\RequirePackage[scaled=1.02]{inconsolata}
+\RequirePackage[T1]{fontenc}
+\begin{document}
+% This code uses the tikz package
+\begin{tikzpicture}[scale=2]
+\node (v0) at (0.00,0.00) {\texttt{exp}}; +\node (v1) at (-1.00,-1.00) {\texttt{gomp}}; +\node (v2) at (1.50,0.00) {\texttt{weibull}}; +\node (v3) at (1.00,-1.00) {\texttt{gp}}; +\node (v4) at (0.00,-2.00) {\texttt{extgp}}; +\node (v5) at (-2.00,-2.00) {\texttt{gompmake}}; +\node (v6) at (2,-2.00) {\texttt{gppiece}}; +\node(v7) at (-1.5, 0) {\texttt{perks}}; +\node(v8) at (-3.5, -0.5) {\texttt{perksmake}}; +\node(v9) at (-2.5, -1) {\texttt{beard}}; +\node[left](v10) at (-3.5, -2) {\texttt{beardmake}}; +\node [right](v11) at (2.0,-1) {\texttt{extweibull}}; +\draw [-stealth, dashed] (v1) edge (v0); +\draw [-stealth] (v2) edge (v0); +\draw [-stealth] (v3) edge (v0); +\draw [-stealth, dashed] (v8) edge (v7); +\draw [-stealth, dashed] (v10) edge (v9); +\draw [-stealth, dashed] (v10) edge (v5); +\draw [-stealth, dashed] (v9) edge (v1); +\draw [-stealth, dashed] (v7) edge (v0); +\draw [-stealth] (v10) edge (v8); +\draw [-stealth] (v9) edge (v7); +\draw [-stealth] (v4) edge (v1); +\draw [-stealth] (v11) edge (v2); +\draw [-stealth] (v11) edge (v3); +\draw [-stealth, dashed] (v5) edge (v1); +\draw [-stealth, dashed] (v4) edge (v3); +\draw [-stealth] (v6) edge (v3); +\node[scale=0.75,fill=white] at (-0.5,-0.5) {$\beta=0$}; %Gomp vs exp +\node[scale=0.75,fill=white] at (-0.75,0) {$\nu=0$}; %Perk vs exp +\node[scale=0.75,fill=white] at (-2,-0.5) {$\beta=1$}; %beard vs perks +\node[scale=0.75,fill=white] at (-3.75,-1.25) {$\beta=1$}; +\node[scale=0.75,fill=white] at (-3.25,-1.5) {$\lambda=0$}; +\node[scale=0.75,fill=white] at (-1.75,-1) {$\beta=0$}; +\node[scale=0.75,fill=white] at (0.75,0) {$\alpha=1$}; +\node[scale=0.75,fill=white] at (1.6,-1) {$\alpha=1$}; +\node[scale=0.75,fill=white] at (0.5,-0.5) {$\xi=0$}; +\node[scale=0.75,fill=white] at (2,-0.5) {$\xi=0$}; +\node[scale=0.75,fill=white] at (-1.5,-1.5) {$\lambda=0$}; +\node[scale=0.75,fill=white] at (-3,-2) {$\beta=0$}; +\node[scale=0.75,fill=white] at (-2.5,-0.25) {$\lambda=0$}; +\node[scale=0.75,fill=white] at 
(1.5,-1.5) {$\xi_1=\cdots =\xi_K$};
+\node[scale=0.75,fill=white] at (-0.5,-1.5) {$\xi=0$};
+\node[scale=0.75,fill=white] at (0.5,-1.5) {$\beta=0$};
+\end{tikzpicture}
+\end{document}
diff --git a/_articles/RJ-2025-034/longevity.bib b/_articles/RJ-2025-034/longevity.bib
new file mode 100644
index 0000000000..fd3a9203a0
--- /dev/null
+++ b/_articles/RJ-2025-034/longevity.bib
@@ -0,0 +1,780 @@
+@article{Perks:1932,
+ title = {On Some Experiments in the Graduation of Mortality Statistics},
+ author = {Perks, Wilfred},
+ year = 1932,
+ journal = {Journal of the Institute of Actuaries},
+ publisher = {Cambridge University Press},
+ volume = 63,
+ number = 1,
+ pages = {12--57},
+ doi = {10.1017/s0020268100046680}
+}
+@article{Chernoff:1954,
+ title = {On the Distribution of the Likelihood Ratio},
+ author = {Herman Chernoff},
+ year = 1954,
+ journal = {The Annals of Mathematical Statistics},
+ publisher = {Institute of Mathematical Statistics},
+ volume = 25,
+ number = 3,
+ pages = {573--578},
+ doi = {10.1214/aoms/1177728725},
+ url = {https://doi.org/10.1214/aoms/1177728725}
+}
+@article{Kaplan.Meier:1958,
+ title = {Nonparametric estimation from incomplete observations},
+ author = {Edward L. 
Kaplan and Paul Meier}, + year = 1958, + journal = {Journal of the American Statistical Association}, + volume = 53, + pages = {457--481}, + doi = {10.1080/01621459.1958.10501452}, + url = {https://doi.org/10.1080/01621459.1958.10501452} +} +@article{Lynden-Bell:1971, + title = {A Method of Allowing for Known Observational Selection in Small Samples Applied to {3CR} Quasars}, + author = {Lynden-Bell, D.}, + year = 1971, + journal = {Monthly Notices of the Royal Astronomical Society}, + volume = 155, + number = 1, + pages = {95--118}, + doi = {10.1093/mnras/155.1.95}, + issn = {0035-8711}, + url = {https://doi.org/10.1093/mnras/155.1.95} +} +@article{Turnbull:1976, + title = {The empirical distribution function with arbitrarily grouped, censored and truncated data}, + author = {Bruce W. Turnbull}, + year = 1976, + journal = {Journal of the Royal Statistical Society, Series B}, + volume = 38, + pages = {290--295}, + doi = {10.1111/j.2517-6161.1976.tb01597.x}, + url = {https://doi.org/10.1111/j.2517-6161.1976.tb01597.x} +} +@article{Self.Liang:1987, + title = {Asymptotic Properties of Maximum Likelihood Estimators and Likelihood Ratio Tests under Nonstandard Conditions}, + author = {Steven G. Self and Kung-Yee Liang}, + year = 1987, + journal = {Journal of the American Statistical Association}, + publisher = {Taylor \& Francis}, + volume = 82, + number = 398, + pages = {605--610}, + doi = {10.1080/01621459.1987.10478472}, + url = {https://doi.org/10.1080/01621459.1987.10478472} +} +@article{Tsai.Jewell.Wang:1987, + title = {A Note on the Product-Limit Estimator Under Right Censoring and Left Truncation}, + author = {Wei-Yann Tsai and Nicholas P. 
Jewell and Mei-Cheng Wang}, + year = 1987, + journal = {Biometrika}, + volume = 74, + number = 4, + pages = {883--886}, + doi = {10.1093/biomet/74.4.883}, + url = {https://doi.org/10.1093/biomet/74.4.883} +} +@article{Aragon.Eberly:1992, + title = {On Convergence of Convex Minorant Algorithms for Distribution Estimation with Interval-Censored Data}, + author = {Jorge Arag\'{o}n and David Eberly}, + year = 1992, + journal = {Journal of Computational and Graphical Statistics}, + publisher = {Taylor \& Francis}, + volume = 1, + number = 2, + pages = {129--140}, + doi = {10.1080/10618600.1992.10477009}, + url = {https://doi.org/10.1080/10618600.1992.10477009} +} +@article{Waller.Turnbull:1992, + title = {Probability plotting with censored data}, + author = {Lance A. Waller and Bruce W. Turnbull}, + year = 1992, + journal = {American Statistician}, + volume = 46, + number = 1, + pages = {5--12}, + doi = {10.1080/00031305.1992.10475837}, + url = {https://doi.org/10.1080/00031305.1992.10475837} +} +@article{Gentleman.Geyer:1994, + title = {Maximum likelihood for interval censored data: Consistency and computation}, + author = {Gentleman, Robert and Charles J. Geyer}, + year = 1994, + journal = {Biometrika}, + volume = 81, + number = 3, + pages = {618--623}, + doi = {10.1093/biomet/81.3.618}, + issn = {0006-3444}, + url = {https://doi.org/10.1093/biomet/81.3.618} +} +@article{Frydman:1994, + title = {A Note on Nonparametric Estimation of the Distribution Function from Interval-Censored and Truncated Observations}, + author = {Halina Frydman}, + year = 1994, + journal = {Journal of the Royal Statistical Society. 
Series B (Methodological)}, + publisher = {[Royal Statistical Society, Wiley]}, + volume = 56, + number = 1, + pages = {71--74}, + doi = {10.1111/j.2517-6161.1994.tb01960.x}, + issn = {00359246}, + url = {https://doi.org/10.1111/j.2517-6161.1994.tb01960.x} +} +@article{Ihaka.Gentleman:1996, + title = {{R}: A Language for Data Analysis and Graphics}, + author = {Ihaka, Ross and Gentleman, Robert}, + year = 1996, + journal = {Journal of Computational and Graphical Statistics}, + volume = 5, + number = 3, + pages = {299--314}, + doi = {10.1080/10618600.1996.10474713}, + url = {https://doi.org/10.1080/10618600.1996.10474713} +} +@article{Yee.Wild:1996, + title = {Vector Generalized Additive Models}, + author = {Yee, Thomas W. and Wild, C. J.}, + year = 1996, + journal = {Journal of the Royal Statistical Society: Series B (Methodological)}, + volume = 58, + number = 3, + pages = {481--493}, + doi = {10.1111/j.2517-6161.1996.tb02095.x}, + url = {https://doi.org/10.1111/j.2517-6161.1996.tb02095.x} +} +@article{Lindsey.Ryan:1998, + title = {Methods for interval-censored data}, + author = {Lindsey, Jane C. 
and Ryan, Louise M.}, + year = 1998, + journal = {Statistics in Medicine}, + volume = 17, + number = 2, + pages = {219--238}, + doi = {10.1002/(sici)1097-0258(19980130)17:2<219::aid-sim735>3.0.co;2-o}, + url = {https://doi.org/10.1002/(SICI)1097-0258(19980130)17:2<219::AID-SIM735>3.0.CO;2-O} +} +@article{Efron.Petrosian:1999, + title = {Nonparametric Methods for Doubly Truncated Data}, + author = {Bradley Efron and Vahe Petrosian}, + year = 1999, + journal = {Journal of the American Statistical Association}, + volume = 94, + number = 447, + pages = {824--834}, + doi = {10.1080/01621459.1999.10474187}, + url = {https://doi.org/10.1080/01621459.1999.10474187} +} +@article{Anderson:2000, + title = {A vitality-based model relating stressors and environmental properties to organism survival}, + author = {Anderson, James J.}, + year = 2000, + journal = {Ecological Monographs}, + volume = 70, + number = 3, + pages = {445--470}, + doi = {10.1890/0012-9615(2000)070[0445:avbmrs]2.0.co;2}, + url = {https://doi.org/10.1890/0012-9615(2000)070[0445:AVBMRS]2.0.CO;2} +} +@article{Rotnitzky:2000, + title = {Likelihood-based inference with singular information matrix}, + author = {Andrea Rotnitzky and David R. Cox and Matteo Bottai and James Robins}, + year = 2000, + journal = {Bernoulli}, + publisher = {Bernoulli Society for Mathematical Statistics and Probability}, + volume = 6, + number = 2, + pages = {243 -- 284}, + doi = {10.2307/3318576}, + url = {https://doi.org/10.2307/3318576} +} +@article{evd, + title = {{evd}: Extreme Value Distributions}, + author = {A. G. Stephenson}, + year = 2002, + journal = {R News}, + volume = 2, + number = 2, + url = {https://cran.r-project.org/doc/Rnews/Rnews_2002-2.pdf} +} +@article{Royston.Parmar:2002, + title = {Flexible parametric proportional-hazards and proportional-odds models for censored survival data, with application to prognostic modelling and estimation of treatment effects}, + author = {Royston, Patrick and Parmar, Mahesh K. 
B.}, + year = 2002, + journal = {Statistics in Medicine}, + volume = 21, + number = 15, + pages = {2175--2197}, + doi = {10.1002/sim.1203}, + url = {https://doi.org/10.1002/sim.1203} +} +@article{Li.Anderson:2009, + title = {The vitality model: A way to understand population survival and demographic heterogeneity}, + author = {Ting Li and James J. Anderson}, + year = 2009, + journal = {Theoretical Population Biology}, + volume = 76, + number = 2, + pages = {118--131}, + doi = {10.1016/j.tpb.2009.05.004}, + issn = {0040-5809}, + url = {https://doi.org/10.1016/j.tpb.2009.05.004} +} +@article{interval, + title = {Exact and Asymptotic Weighted Logrank Tests for Interval Censored Data: The {interval} {R} Package}, + author = {Fay, Michael P. and Shaw, Pamela A.}, + year = 2010, + journal = {Journal of Statistical Software}, + volume = 36, + number = 2, + pages = {1--34}, + doi = {10.18637/jss.v036.i02}, + url = {https://doi.org/10.18637/jss.v036.i02} +} +@article{Shen:2010, + title = {Nonparametric analysis of doubly truncated data}, + author = {Shen, Pao-Sheng}, + year = 2010, + journal = {Annals of the Institute of Statistical Mathematics}, + volume = 62, + number = 5, + pages = {835--853}, + doi = {10.1007/s10463-008-0192-2}, + issn = {1572-9052}, + url = {https://doi.org/10.1007/s10463-008-0192-2} +} +@article{Watanabe:2010, + title = {Asymptotic Equivalence of {B}ayes Cross Validation and Widely Applicable Information Criterion in Singular Learning Theory}, + author = {Watanabe, Sumio}, + year = 2010, + journal = {Journal of Machine Learning Research}, + publisher = {JMLR.org}, + volume = 11, + pages = {3571--3594}, + issn = {1532-4435}, + url = {https://dl.acm.org/doi/10.5555/1756006.1953045} +} +@article{lubridate-package, + title = {Dates and Times Made Easy with {lubridate}}, + author = {Garrett Grolemund and Hadley Wickham}, + year = 2011, + journal = {Journal of Statistical Software}, + volume = 40, + number = 3, + pages = {1--25}, + doi = 
{10.18637/jss.v040.i03},
+ url = {https://www.jstatsoft.org/v40/i03/}
+}
+@article{Sun.Genton:2011,
+ title = {Functional Boxplots},
+ author = {Ying Sun and Marc G. Genton},
+ year = 2011,
+ journal = {Journal of Computational and Graphical Statistics},
+ publisher = {Taylor \& Francis},
+ volume = 20,
+ number = 2,
+ pages = {316--334},
+ doi = {10.1198/jcgs.2011.09224}
+}
+@article{Richards:2012,
+ title = {A handbook of parametric survival models for actuarial use},
+ author = {Stephen J. Richards},
+ year = 2012,
+ journal = {Scandinavian Actuarial Journal},
+ publisher = {Taylor \& Francis},
+ volume = 2012,
+ number = 4,
+ pages = {233--257},
+ doi = {10.1080/03461238.2010.506688},
+ url = {https://doi.org/10.1080/03461238.2010.506688}
+}
+@article{Austin.Simon.Betensky:2014,
+ title = {Computationally simple estimation and improved efficiency for special cases of double truncation},
+ author = {Austin, Matthew D. and Simon, David K. and Betensky, Rebecca A.},
+ year = 2014,
+ day = {01},
+ journal = {Lifetime Data Analysis},
+ volume = 20,
+ number = 3,
+ pages = {335--354},
+ doi = {10.1007/s10985-013-9287-z},
+ url = {https://doi.org/10.1007/s10985-013-9287-z}
+}
+@article{bshazard,
+ title = {{bshazard}: A Flexible Tool for Nonparametric Smoothing of the Hazard Function},
+ author = {Paola Rebora and Agus Salim and Marie Reilly},
+ year = 2014,
+ journal = {The R Journal},
+ volume = 6,
+ number = 2,
+ pages = {114--122},
+ doi = {10.32614/rj-2014-028},
+ url = {https://doi.org/10.32614/RJ-2014-028}
+}
+@article{Northrop.Coleman:2014,
+ title = {Improved diagnostic plots for extreme value analyses},
+ author = {Paul J. Northrop and Claire L. 
Coleman}, + year = 2014, + journal = {Extremes}, + volume = 17, + pages = {289--303}, + doi = {10.1007/s10687-014-0183-z}, + url = {https://doi.org/10.1007/s10687-014-0183-z} +} +@article{fitdistrplus-package, + title = {{fitdistrplus}: An {R} Package for Fitting Distributions}, + author = {Marie Laure Delignette-Muller and Christophe Dutang}, + year = 2015, + journal = {Journal of Statistical Software}, + volume = 64, + number = 4, + pages = {1--34}, + doi = {10.18637/jss.v064.i04} +} +@article{extRemes, + title = {{extRemes} 2.0: An Extreme Value Analysis Package in {R}}, + author = {Eric Gilleland and Richard W. Katz}, + year = 2016, + journal = {Journal of Statistical Software}, + volume = 72, + number = 8, + pages = {1--39}, + doi = {10.18637/jss.v072.i08} +} +@article{flexsurv, + title = {{flexsurv}: A Platform for Parametric Survival Modeling in {R}}, + author = {Jackson, Christopher}, + year = 2016, + journal = {Journal of Statistical Software}, + volume = 70, + number = 8, + pages = {1--33}, + doi = {10.18637/jss.v070.i08}, + url = {https://www.jstatsoft.org/index.php/jss/article/view/v070i08} +} +@article{Anderson-Bergman:2017, + title = {An Efficient Implementation of the {EMICM} Algorithm for the Interval Censored {NPMLE}}, + author = {Clifford Anderson-Bergman}, + year = 2017, + journal = {Journal of Computational and Graphical Statistics}, + publisher = {Taylor \& Francis}, + volume = 26, + number = 2, + pages = {463--467}, + doi = {10.1080/10618600.2016.1208616}, + url = {https://doi.org/10.1080/10618600.2016.1208616} +} +@article{icensReg, + title = {{icenReg}: Regression Models for Interval Censored Data in {R}}, + author = {Clifford Anderson-Bergman}, + year = 2017, + journal = {Journal of Statistical Software}, + volume = 81, + number = 12, + pages = {1--23}, + doi = {10.18637/jss.v081.i12}, + url = {https://doi.org/10.18637/jss.v081.i12} +} +@article{Barbi:2018, + title = {The plateau of human mortality: Demography of longevity pioneers}, + 
author = {Barbi, Elisabetta and Lagona, Francesco and Marsili, Marco and Vaupel, James W. and Wachter, Kenneth W.},
+ year = 2018,
+ journal = {Science},
+ publisher = {American Association for the Advancement of Science},
+ volume = 360,
+ number = 6396,
+ pages = {1459--1461},
+ doi = {10.1126/science.aat3119},
+ issn = {0036-8075},
+ url = {https://doi.org/10.1126/science.aat3119}
+}
+@article{Peto:1973,
+ title = {Experimental Survival Curves for Interval-Censored Data},
+ author = {Peto, Richard},
+ year = 1973,
+ journal = {Journal of the Royal Statistical Society Series C: Applied Statistics},
+ volume = 22,
+ number = 1,
+ pages = {86--91},
+ doi = {10.2307/2346307},
+ issn = {0035-9254},
+ url = {https://doi.org/10.2307/2346307}
+}
+@article{rstpm2,
+ title = {Parametric and penalized generalized survival models},
+ author = {Xing-Rong Liu and Yudi Pawitan and Mark Clements},
+ year = 2018,
+ journal = {Statistical Methods in Medical Research},
+ volume = 27,
+ number = 5,
+ pages = {1531--1546},
+ doi = {10.1177/0962280216664760},
+ url = {https://doi.org/10.1177/0962280216664760}
+}
+@article{Einmahl:2019,
+ title = {Limits to Human Life Span Through Extreme Value Theory},
+ author = {Einmahl, Jesson J. and Einmahl, John H. J. and de Haan, Laurens},
+ year = 2019,
+ journal = {Journal of the American Statistical Association},
+ publisher = {Taylor \& Francis},
+ volume = 114,
+ number = 527,
+ pages = {1075--1080},
+ doi = {10.1080/01621459.2018.1537912},
+ url = {https://doi.org/10.1080/01621459.2018.1537912}
+}
+@article{Belzile:2021,
+ title = {Human mortality at extreme age},
+ author = {Belzile, L\'{e}o R. and Anthony C. Davison and Holger Rootz\'en and Dmitrii Zholud},
+ year = 2021,
+ journal = {Royal Society Open Science},
+ volume = 8,
+ pages = 202097,
+ doi = {10.1098/rsos.202097},
+ url = {https://doi.org/10.1098/rsos.202097}
+}
+@article{ARSIA:2022,
+ title = {Is there a cap on longevity? {A} statistical review},
+ author = {Belzile, L\'{e}o R. 
and Anthony C. Davison and Jutta Gampe and Holger Rootz\'en and Dmitrii Zholud}, + year = 2022, + journal = {Annual Review of Statistics and its Application}, + volume = 9, + pages = {22--45}, + doi = {10.1146/annurev-statistics-040120-025426}, + url = {https://doi.org/10.1146/annurev-statistics-040120-025426} +} +@article{casebase, + title = {{casebase}: An Alternative Framework for Survival Analysis and Comparison of Event Rates}, + author = {Bhatnagar, Sahir Rai and Turgeon, Maxime and Islam, Jesse and Hanley, James A. and Saarela, Olli}, + year = 2022, + journal = {The R Journal}, + volume = 14, + pages = {59--79}, + doi = {10.32614/rj-2022-052}, + issn = {2073-4859}, + url = {https://doi.org/10.32614/RJ-2022-052}, + issue = 3 +} +@article{Camarda:2022, + title = {The curse of the plateau. Measuring confidence in human mortality estimates at extreme ages}, + author = {Carlo Giovanni Camarda}, + year = 2022, + journal = {Theoretical Population Biology}, + volume = 144, + pages = {24--36}, + doi = {10.1016/j.tpb.2022.01.002}, + url = {https://doi.org/10.1016/j.tpb.2022.01.002} +} +@article{Sailynoja.Burkner.Vehtari:2021, + title = {Graphical Test for Discrete Uniformity and its Applications in Goodness of Fit Evaluation and Multiple Sample Comparison}, + author = {S\"{a}ilynoja, Teemu and B\"{u}rkner, Paul-Christian and Vehtari, Aki}, + year = 2022, + day = 24, + journal = {Statistics and Computing}, + volume = 32, + number = 2, + pages = 32, + doi = {10.1007/s11222-022-10090-6}, + url = {https://doi.org/10.1007/s11222-022-10090-6} +} +@article{evgam, + title = {{evgam}: An {R} Package for Generalized Additive Extreme Value Models}, + author = {Youngman, Benjamin D.}, + year = 2022, + journal = {Journal of Statistical Software}, + volume = 103, + number = 3, + pages = {1--26}, + doi = {10.18637/jss.v103.i03}, + url = {https://www.jstatsoft.org/index.php/jss/article/view/v103i03} +} +@article{Belzile.Dutang.Northrop.Opitz:2022, + title = {A modeler's guide to 
extreme value software},
+ author = {Belzile, L\'{e}o R. and Dutang, Christophe and Northrop, Paul J. and Opitz, Thomas},
+ year = 2023,
+ journal = {Extremes},
+ publisher = {Springer},
+ volume = 26,
+ pages = {595--638},
+ doi = {10.1007/s10687-023-00475-9},
+ url = {https://doi.org/10.1007/s10687-023-00475-9}
+}
+@book{Devroye:1986,
+ title = {Non-Uniform Random Variate Generation},
+ author = {Devroye, L.},
+ year = 1986,
+ publisher = {Springer},
+ address = {New York},
+ url = {http://www.nrbook.com/devroye/},
+ pages = 843
+}
+@book{Groeneboom.Wellner:1992,
+ title = {Information Bounds and Nonparametric Maximum Likelihood Estimation},
+ author = {Piet Groeneboom and Jon A. Wellner},
+ year = 1992,
+ address = {Basel, Switzerland},
+ pages = 128,
+ doi = {10.1007/978-3-0348-8621-5},
+ isbn = {978-3-7643-2794-1},
+ url = {https://doi.org/10.1007/978-3-0348-8621-5},
+ publisher = {Birkh\"{a}user}
+}
+@book{Davison.Hinkley:1997,
+ title = {Bootstrap Methods and Their Application},
+ author = {Davison, A. C. and Hinkley, D. V.},
+ year = 1997,
+ publisher = {Cambridge University Press},
+ address = {New York},
+ pages = 582,
+ doi = {10.1017/cbo9780511802843},
+ url = {https://doi.org/10.1017/CBO9780511802843}
+}
+@book{survival-book,
+ title = {Modeling Survival Data: Extending the {C}ox Model},
+ author = {Terry M. Therneau and Patricia M. Grambsch},
+ year = 2000,
+ publisher = {Springer},
+ address = {New York},
+ doi = {10.1007/978-1-4757-3294-8},
+ isbn = {0-387-98784-3},
+ url = {https://doi.org/10.1007/978-1-4757-3294-8}
+}
+@book{VGAMbook,
+ title = {Vector Generalized Linear and Additive Models},
+ author = {Thomas W. 
Yee}, + year = 2015, + location = {New York, NY}, + publisher = {Springer}, + doi = {10.1007/978-1-4939-2818-7}, + url = {https://doi.org/10.1007/978-1-4939-2818-7}, + subtitle = {With an Implementation in {R}} +} +@book{ggplot2, + title = {{ggplot2}: Elegant Graphics for Data Analysis}, + author = {Hadley Wickham}, + year = 2016, + publisher = {Springer-Verlag New York}, + isbn = {978-3-319-24277-4}, + url = {https://ggplot2.tidyverse.org} +} +@book{ExceptionalLifespans, + title = {Exceptional Lifespans}, + year = 2021, + location = {Cham, Switzerland}, + publisher = {Springer}, + series = {Demographic Research Monographs}, + doi = {10.1007/978-3-030-49970-9}, + url = {https://doi.org/10.1007/978-3-030-49970-9}, + editor = {Maier, Heiner and Jeune, Bernard and Vaupel, James W.} +} +@inbook{Sevcikova:2016, + title = {Age-Specific Mortality and Fertility Rates for Probabilistic Population Projections}, + author = {{\v{S}}ev{\v{c}}{\'i}kov{\'a}, Hana and Li, Nan and Kantorov{\'a}, Vladim{\'i}ra and Gerland, Patrick and Raftery, Adrian E.}, + year = 2016, + booktitle = {Dynamic Demographic Analysis}, + publisher = {Springer}, + address = {Cham, Switzerland}, + pages = {285--310}, + doi = {10.1007/978-3-319-26603-9_15}, + isbn = {978-3-319-26603-9}, + url = {https://doi.org/10.1007/978-3-319-26603-9_15}, + editor = {Schoen, Robert} +} +@incollection{IDL:2021, + title = {The International Database on Longevity: Data Resource Profile}, + author = {Dmitri A. Jdanov and Vladimir M. 
Shkolnikov and Sigrid Gellers-Barkmann}, + year = 2021, + booktitle = {Exceptional Lifespans}, + publisher = {Springer}, + address = {Cham, Switzerland}, + series = {Demographic Research Monographs}, + pages = {22--24}, + doi = {10.1007/978-3-030-49970-9_2}, + url = {https://doi.org/10.1007/978-3-030-49970-9_2}, + editor = {Maier, Heiner and Jeune, Bernard and Vaupel, James W.} +} +@inproceedings{Beard:1963, + title = {A theory of mortality based on actuarial, biological and medical considerations}, + author = {Beard, Robert E.}, + year = 1963, + booktitle = {Proceedings of the International Population Conference, New York}, + publisher = {International Union for the Scientific Study of Population}, + address = {London}, + volume = 1 +} +@manual{Rsolnp-pkg, + title = {{Rsolnp}: General Non-linear Optimization Using Augmented {L}agrange Multiplier Method}, + author = {Alexios Ghalanos and Stefan Theussl}, + year = 2015, + doi = {10.32614/CRAN.package.Rsolnp}, + url = {https://CRAN.R-project.org/package=Rsolnp}, + note = {R package version 1.16.} +} +@manual{vitality-package, + title = {{vitality}: Fitting Routines for the Vitality Family of Mortality Models}, + author = {Gregor Passolt and James J. Anderson and Ting Li and David H. Salinger and David J. 
Sharrow},
+ year = 2018,
+ doi = {10.32614/CRAN.package.vitality},
+ url = {https://CRAN.R-project.org/package=vitality},
+ note = {R package version 1.3}
+}
+@manual{ReIns,
+ title = {{ReIns}: Functions from "Reinsurance: Actuarial and Statistical Aspects"},
+ author = {Tom Reynkens and Roel Verbelen},
+ year = 2020,
+ doi = {10.32614/CRAN.package.ReIns},
+ url = {https://CRAN.R-project.org/package=ReIns},
+ note = {R package version 1.0.10}
+}
+@manual{muhaz-package,
+ title = {{muhaz}: Hazard Function Estimation in Survival Analysis},
+ author = {Kenneth Hess and Robert Gentleman},
+ year = 2021,
+ doi = {10.32614/CRAN.package.muhaz},
+ url = {https://CRAN.R-project.org/package=muhaz},
+ note = {R package version 1.2.6.4}
+}
+@manual{tranSurv-package,
+ title = {{tranSurv}: Transformation Model Based Estimation of Survival and Regression Under Dependent Truncation and Independent Censoring},
+ author = {Sy Han (Steven) Chiou and Jing Qian},
+ year = 2021,
+ doi = {10.32614/CRAN.package.tranSurv},
+ url = {https://CRAN.R-project.org/package=tranSurv},
+ note = {R package version 1.2.2}
+}
+@manual{DTDA-package,
+ title = {{DTDA}: Doubly Truncated Data Analysis},
+ author = {Carla Moreira and Jacobo {de U\~{n}a-\'{A}lvarez} and Rosa Crujeiras},
+ year = 2022,
+ doi = {10.32614/CRAN.package.DTDA},
+ url = {https://CRAN.R-project.org/package=DTDA},
+ note = {R package version 3.0.1}
+}
+@manual{MortCast-package,
+ title = {{MortCast}: Estimation and Projection of Age-Specific Mortality Rates},
+ author = {Hana \v{S}ev\v{c}\'{\i}kov\'{a} and Nan Li and Patrick Gerland},
+ year = 2022,
+ doi = {10.32614/CRAN.package.MortCast},
+ url = {https://CRAN.R-project.org/package=MortCast},
+ note = {R package version 2.7-0}
+}
+@manual{survival-package,
+ title = {A Package for Survival Analysis in {R}},
+ author = {Terry M Therneau},
+ year = 2022,
+ doi = {10.32614/CRAN.package.survival},
+ url = {https://CRAN.R-project.org/package=survival},
+ note = {R package version 
3.3-1} +} +@manual{mev-package, + title = {{mev}: Modelling Extreme Values}, + author = {Belzile, L\'{e}o R. and others}, + year = 2023, + doi = {10.32614/CRAN.package.mev}, + url = {https://CRAN.R-project.org/package=mev}, + note = {R package version 1.15} +} +@manual{dplyr-package, + title = {{dplyr}: A Grammar of Data Manipulation}, + author = {Hadley Wickham and Romain Fran\c{c}ois and Lionel Henry and Kirill M\"{u}ller and Davis Vaughan}, + year = 2023, + doi = {10.32614/CRAN.package.dplyr}, + url = {https://CRAN.R-project.org/package=dplyr}, + note = {R package version 1.1.4} +} +@manual{dblcens-package, + title = {{dblcens}: Compute the {NPMLE} of Distribution Function from Doubly Censored Data, Plus the Empirical Likelihood Ratio for $F(T)$}, + author = {Mai Zhou and Li Lee and Kun Chen and Yifan Yang}, + year = 2023, + doi = {10.32614/CRAN.package.dblcens}, + url = {https://CRAN.R-project.org/package=dblcens}, + note = {R package version 1.1.9} +} +@manual{rust-package, + title = {{rust}: Ratio-of-Uniforms Simulation with Transformation}, + author = {Paul J. Northrop}, + year = 2023, + doi = {10.32614/CRAN.package.rust}, + url = {https://paulnorthrop.github.io/rust/}, + note = {R package version 1.4.2, https://github.com/paulnorthrop/rust} +} +@manual{demography-package, + title = {{demography}: Forecasting Mortality, Fertility, Migration and Population Data}, + author = {Rob Hyndman}, + year = 2023, + doi = {10.32614/CRAN.package.demography}, + url = {https://pkg.robjhyndman.com/demography/}, + note = {R package version 2.0} +} +@manual{fda-package, + title = {{fda}: Functional Data Analysis}, + author = {James Ramsay}, + year = 2024, + doi = {10.32614/CRAN.package.fda}, + url = {http://www.functionaldata.org}, + note = {R package version 6.1.8} +} +@manual{MortalityLaws-package, + title = {{MortalityLaws}: Parametric Mortality Models, Life Tables and {HMD}}, + author = {Marius D. 
Pascariu}, + year = 2024, + doi = {10.32614/CRAN.package.MortalityLaws}, + url = {https://CRAN.R-project.org/package=MortalityLaws}, + note = {R package version 2.1.0} +} +@manual{Icens-package, + title = {{Icens}: {NPMLE} for Censored and Truncated Data}, + author = {Robert Gentleman and Alain Vandal}, + year = 2024, + doi = {10.18129/B9.bioc.Icens}, + url = {https://bioconductor.org/packages/Icens/}, + note = {R package version 1.76.0} +} +@manual{prodlim-package, + title = {{prodlim}: Product-Limit Estimation for Censored Event History Analysis}, + author = {Thomas A. Gerds}, + year = 2024, + doi = {10.32614/CRAN.package.prodlim}, + url = {https://CRAN.R-project.org/package=prodlim}, + note = {R package version 2024.06.25} +} +@incollection{Belzile.Neslehova:2025, + title = {Statistics of extremes for incomplete data, with application to lifetime and liability claim modeling}, + author = {Belzile, L\'eo R. and Ne\v{s}lehov\'a, Johanna G.}, + year = 2025, + booktitle = {Handbook on Statistics of Extremes}, + publisher = {CRC Press}, + pages = {in press}, + chapter = 31, + editor = {Miguel de Carvalho and Rapha\"el Huser and Philippe Naveau and Reich, Brian J.} +} +@phdthesis{Ye:1987, + title = {Interior Algorithms for Linear, Quadratic, and Linearly Constrained Non-Linear Programming}, + author = {Yinyu Ye}, + year = 1987, + school = {Department of {ESS}, Stanford University} +} +@article{Miralles.Davison:2023, + title = {Timing and spatial selection bias in rapid extreme event attribution}, + author = {Oph{\'e}lia Miralles and Anthony C.
Davison}, + year = 2023, + journal = {Weather and Climate Extremes}, + volume = 41, + pages = 100584, + doi = {10.1016/j.wace.2023.100584}, + url = {https://doi.org/10.1016/j.wace.2023.100584} +} +@article{tidyverse, + title = {Welcome to the {tidyverse}}, + author = {Hadley Wickham and Mara Averick and Jennifer Bryan and Winston Chang and Lucy D'Agostino McGowan and Romain François and Garrett Grolemund and Alex Hayes and Lionel Henry and Jim Hester and Max Kuhn and Thomas Lin Pedersen and Evan Miller and Stephan Milton Bache and Kirill Müller and Jeroen Ooms and David Robinson and Dana Paige Seidel and Vitalie Spinu and Kohske Takahashi and Davis Vaughan and Claus Wilke and Kara Woo and Hiroaki Yutani}, + year = 2019, + journal = {Journal of Open Source Software}, + volume = 4, + number = 43, + pages = 1686, + doi = {10.21105/joss.01686}, + url = {https://doi.org/10.21105/joss.01686} +} + diff --git a/_articles/RJ-2025-035/PerFer-MarCam_movieROC.R b/_articles/RJ-2025-035/PerFer-MarCam_movieROC.R new file mode 100644 index 0000000000..30b09494d8 --- /dev/null +++ b/_articles/RJ-2025-035/PerFer-MarCam_movieROC.R @@ -0,0 +1,128 @@ +# R script to reproduce the results presented in the paper ---- +# "movieROC: Visualizing the Decision Rules Underlying Binary Classification" + +library(movieROC) +data(HCC) +str(HCC) +table(HCC$tumor) + + +## Figure 2 ---- + +par(mfrow = c(1,3)) +for(gene in c("20202438", "18384097", "03515901")){ + roc <- gROC(X = HCC[,paste0("cg",gene)], D = HCC$tumor) + plot_densities(roc, histogram = TRUE, lwd = 3, main = paste("Gene", gene), + legend = (gene == "03515901"), pos.legend = "topleft", + xlim = c(0.4*(gene == "20202438"),1)) + plot_densities(roc, lwd = 3, new = FALSE, + col = adjustcolor(c('#485C99','#8F3D52'), alpha.f = 0.8))} + +par(mfrow = c(1,1)) + + +## Figure 3 ---- + +for(gene in c("20202438", "18384097", "03515901")){ + roc <- gROC(X = HCC[,paste0("cg",gene)], D = HCC$tumor, + side = ifelse(gene == "03515901", "left", "right")) 
+ plot(roc, col = "gray50", main = paste("Gene", gene), lwd = 2) + groc <- gROC(X = HCC[,paste0("cg",gene)], D = HCC$tumor, side = "both") + plot(groc, new = FALSE, lwd = 2) + legend("bottomright", paste(c("AUC =", "gAUC ="), + format(c(roc$auc, groc$auc), digits = 3)), + col = c("gray50", "black"), lwd = 2, bty = "n", inset = .01) +} + + +## Figure extra HTML ---- + +roc_selg1 <- gROC(X = HCC$cg20202438, D = HCC$tumor, side = "right") +roc_selg1 +predict(roc_selg1, FPR = .1) + +plot_densityROC(roc_selg1, C = .77, build.process = TRUE) +plot_buildROC(roc_selg1, C = .77, build.process = TRUE, reduce = FALSE) +movieROC(roc_selg1, reduce = FALSE, file = "StandardROC_gene20202438.gif") + + +## Figure 4 ---- + +X <- HCC[ ,c("cg20202438", "cg18384097")]; D <- HCC$tumor +biroc_12_PT <- multiROC(X, D, method = "fixedLinear", methodLinear = "PepeThompson") +biroc_12_Meis <- multiROC(X, D, method = "dynamicMeisner", verbose = TRUE) +biroc_12_lrm <- multiROC(X, D) +biroc_12_kernel <- multiROC(X, D, method = "kernelOptimal") +list_biroc <- list(PepeTh = biroc_12_PT, Meisner = biroc_12_Meis, + LRM = biroc_12_lrm, KernelDens = biroc_12_kernel) +lapply(names(list_biroc), function(x) movieROC(list_biroc[[x]], display.method = "OV", + xlab = "Gene 20202438", ylab = "Gene 18384097", + lwd.curve = 4, cex = 1.2, alpha.points = 1, + file = paste0(x, ".gif"))) + + +## Figure 5 ---- + +multiroc_PT <- multiROC(X = HCC[,c("cg20202438", "cg18384097", "cg03515901")], + D = HCC$tumor, method = "fixedLinear", methodLinear = "PepeThompson") +multiroc_PT +plot_buildROC(multiroc_PT, cex = 1.2, lwd.curve = 4, alpha.points = 1) +plot_buildROC(multiroc_PT, display.method = "OV", displayOV = c(1,3), cex = 1.2, + xlab = "Gene 20202438", ylab = "Gene 03515901", lwd.curve = 4, alpha.points = 1) + + +## Figure 6 ---- + +groc_selg1 <- gROC(X = HCC$cg20202438, D = HCC$tumor, side = "both") +groc_selg1 +predict(groc_selg1, FPR = .1) + +groc_selg1_C <- gROC(X = HCC$cg20202438, D = HCC$tumor, side = "both", + 
restric = TRUE, optim = TRUE) + +plot_regions(roc_selg1, cex.legend = 1.5, plot.auc = TRUE, + main = "Standard right-sided assumption [Classification subsets]") +plot_regions(groc_selg1, plot.auc = TRUE, legend = FALSE, main.plotroc = "gROC curve", + main = "General approach [Classification subsets]") +plot_regions(groc_selg1_C, plot.auc = TRUE, legend = FALSE, main.plotroc = "gROC curve", + main = "General approach with restriction (C) [Classific. subsets]", + xlab = "Gene 20202438 expression intensity") + + +## Figure 7 ---- + +X <- HCC$cg18384097; D <- HCC$tumor +hroc_cubic_selg2 <- hROC(X, D) +hroc_cubic_selg2 +hroc_rcs_selg2 <- hROC(X, D, formula.lrm = "D ~ rcs(X,8)") +hroc_lkr1_selg2 <- hROC(X, D, type = "kernel") +hroc_lkr3_selg2 <- hROC(X, D, type = "kernel", kernel.h = 3) +hroc_overfit_selg2 <- hROC(X, D, type = "overfitting") + +groc_selg2_C <- gROC(X, D, side = "both", restric = TRUE, optim = TRUE) + +list_hroc <- list(Cubic = hroc_cubic_selg2, Splines = hroc_rcs_selg2, + Overfit = hroc_overfit_selg2, LikRatioEst_h3 = hroc_lkr3_selg2, + LikRatioEst_h1 = hroc_lkr1_selg2, gAUC_restC = groc_selg2_C) +AUCs <- sapply(list_hroc, function(x) x$auc) +round(AUCs, 3) + +par(mfrow = c(2,3)) +lapply(list_hroc, function(x) plot_funregions(x, FPR = .15, FPR2 = .5)) + +### To modify titles: +for(i in seq_along(list_hroc)){ + main <- NULL + if(i == 4) main <- "Likelihood ratio estimation \n(bandwidth = 3)" + if(i == 5) main <- "Likelihood ratio estimation \n(bandwidth = 1)" + if(i == 6) main <- "General approach \n under restriction (C)" + plot_funregions(list_hroc[[i]], FPR = .15, FPR2 = .5, main = main) +} + + +### Figure 8 ---- + +plot_regions(hroc_rcs_selg2, FPR = .5, cex.legend = 1.5, plot.auc = TRUE) +plot_regions(groc_selg2_C, FPR = .5, legend = FALSE, plot.auc = TRUE, + main = "Classification subsets: General approach with restriction (C)", + main.plotroc = "gROC curve", xlab = "Gene 18384097 expression intensity") diff --git
a/_articles/RJ-2025-035/PerFer-MarCam_movieROC.bib b/_articles/RJ-2025-035/PerFer-MarCam_movieROC.bib new file mode 100644 index 0000000000..6b5575ca5b --- /dev/null +++ b/_articles/RJ-2025-035/PerFer-MarCam_movieROC.bib @@ -0,0 +1,491 @@ +@manual{animationCRAN2021, + Author = {Yihui Xie and Christian Mueller and Lijia Yu and Weicheng Zhu}, + Note = {R package version 2.7}, + Title = {animation: A Gallery of Animations in Statistics and Utilities to Create Animations}, + Url = {https://yihui.org/animation/}, + Year = {2021} +} +@manual{ksCRAN2023, + Author = {Tarn Duong}, + Note = {R package version 1.14.1}, + Title = {ks: Kernel Smoothing}, + Url = {https://CRAN.R-project.org/package=ks}, + Year = {2023} +} +@manual{e1071CRAN2023, + Author = {David Meyer and Evgenia Dimitriadou and Kurt Hornik and Andreas Weingessel and Friedrich Leisch}, + Note = {R package version 1.7-13}, + Title = {e1071: Misc Functions of the Department of Statistics, Probability Theory Group (Formerly: E1071), TU Wien}, + Url = {https://CRAN.R-project.org/package=e1071}, + Year = {2023} +} +@article{Floyd1962, + Author = {Robert W. Floyd}, + Journal = {Communications of the ACM}, + Pages = {345--345}, + Title = {Algorithm 97: Shortest path}, + Url = {https://doi.org/10.1145/367766.368168}, + Volume = {5}, + Year = {1962} +} +@article{Hanley1988, + Author = {James A. 
Hanley}, + Journal = {Medical Decision Making}, + Note = {PMID: 3398748}, + Number = {3}, + Pages = {197-203}, + Title = {The Robustness of the "Binormal" Assumptions Used in Fitting {ROC} Curves}, + Url = {https://doi.org/10.1177/0272989X8800800308}, + Volume = {8}, + Year = {1988} +} +@article{DiazCoto2020, + Author = {Susana D{\'\i}az-Coto}, + Journal = {Computational Statistics}, + Pages = {1231--1251}, + Title = {{smoothROCtime: an {R} package for time-dependent {ROC} curve estimation}}, + Url = {https://doi.org/10.1007/s00180-020-00955-7}, + Volume = {35}, + Year = {2020} +} +@article{Hotelling1933, + Author = {Harold Hotelling}, + Journal = {Journal of Educational Psychology}, + Number = {6}, + Pages = {417--441}, + Publisher = {Warwick \& York}, + Title = {Analysis of a complex of statistical variables into principal components}, + Url = {https://doi.org/10.1037/h0071325}, + Volume = {24}, + Year = {1933} +} +@manual{DiazCoto2023, + Author = {Susana D{\'\i}az-Coto}, + Note = {R package version 0.1.1}, + Title = {sMSROC: Assessment of Diagnostic and Prognostic Markers}, + Url = {https://CRAN.R-project.org/package=sMSROC}, + Year = {2023} +} +@article{Dorfman1997, + Author = {D.D. Dorfman and K.S. Berbaum and C.E. Metz and R.V. Lenth and J.A. Hanley and H.A. Dagga}, + Journal = {Academic Radiology}, + Number = {2}, + Pages = {138--149}, + Title = {Proper receiver operating characteristic analysis: The bigamma model}, + Url = {https://doi.org/10.1016/s1076-6332(97)80013-x}, + Volume = {4}, + Year = {1997} +} +@book{Nakas2023, + Author = {Nakas, C.T. and Bantis, L.E. 
and Gatsonis, C.}, + Isbn = {9781032480220}, + Lccn = {2022055093}, + Publisher = {CRC Press}, + Series = {Chapman \& Hall/CRC biostatistics series}, + Title = {ROC Analysis for Classification and Prediction in Practice}, + Url = {https://www.routledge.com/ROC-Analysis-for-Classification-and-Prediction-in-Practice/Nakas-Bantis-Gatsonis/p/book/9781482233704}, + Year = {2023} +} +@article{Duong2007, + Author = {Duong, Tarn}, + Journal = {Journal of Statistical Software}, + Number = {7}, + Pages = {1--16}, + Title = {ks: Kernel Density Estimation and Kernel Discriminant Analysis for Multivariate Data in {R}}, + Url = {https://doi.org/10.18637/jss.v021.i07}, + Volume = {21}, + Year = {2007} +} +@article{Camblor2021b, + Author = {Mart{\'\i}nez-Camblor, Pablo and P{\'e}rez-Fern{\'a}ndez, Sonia and D{\'\i}az-Coto, Susana}, + Journal = {AStA Advances in Statistical Analysis}, + Number = {4}, + Pages = {581--599}, + Title = {Optimal classification scores based on multivariate marker transformations}, + Url = {https://doi.org/10.1007/s10182-020-00388-z}, + Volume = {105}, + Year = {2021} +} +@article{Camblor2021a, + Author = {Mart{\'\i}nez-Camblor, Pablo and P{\'e}rez-Fern{\'a}ndez, Sonia and D{\'\i}az-Coto, Susana}, + Journal = {The International Journal of Biostatistics}, + Number = {1}, + Pages = {293--306}, + Publisher = {De Gruyter}, + Title = {The area under the generalized receiver-operating characteristic curve}, + Url = {https://doi.org/10.1515/ijb-2020-0091}, + Volume = {18}, + Year = {2021} +} +@article{Bantis2021a, + Author = {Bantis, Leonidas E. and Tsimikas, John V. and Chambers, Gregory R. 
and Capello, Michela and Hanash, Samir and Feng, Ziding}, + Journal = {Statistics in Medicine}, + Number = {7}, + Pages = {1767--1789}, + Title = {The length of the receiver operating characteristic curve and the two cutoff {Youden} index within a robust framework for discovery, evaluation, and cutoff estimation in biomarker studies involving improper receiver operating characteristic curves}, + Url = {https://doi.org/10.1002/sim.8869}, + Volume = {40}, + Year = {2021} +} +@article{Liu2011, + Author = {Liu, Chunling and Liu, Aiyi and Halabi, Susan}, + Journal = {Statistics in Medicine}, + Number = {16}, + Pages = {2005--2014}, + Title = {A min--max combination of biomarkers to improve diagnostic accuracy}, + Url = {https://doi.org/10.1002/sim.4238}, + Volume = {30}, + Year = {2011} +} +@article{Zou1997, + Author = {Zou, Kelly H. and Hall, W. J. and Shapiro, David E.}, + Journal = {Statistics in Medicine}, + Number = {19}, + Pages = {2143--2156}, + Title = {Smooth non-parametric receiver operating characteristic ({ROC}) curves for continuous diagnostic tests}, + Url = {https://doi.org/10.1002/(SICI)1097-0258(19971015)16:19<2143::AID-SIM655>3.0.CO;2-3}, + Volume = {16}, + Year = {1997} +} +@article{Hsieh1996, + Author = {Hsieh, Fushing and Turnbull, Bruce W.}, + Journal = {The Annals of Statistics}, + Number = {1}, + Pages = {25--40}, + Publisher = {The Institute of Mathematical Statistics}, + Title = {Nonparametric and semiparametric estimation of the receiver operating characteristic curve}, + Url = {https://doi.org/10.1214/aos/1033066197}, + Volume = {24}, + Year = {1996} +} +@article{Fluss2005, + Author = {Ronen Fluss and David Faraggi and Benjamin Reiser}, + Journal = {Biometrical Journal}, + Number = {4}, + Pages = {458--472}, + Title = {Estimation of the {Youden} Index and its Associated Cutoff Point}, + Url = {https://doi.org/10.1002/bimj.200410135}, + Volume = {47}, + Year = {2005} +} +@article{Goncalves2014, + Author = {Luzia Gon{\c c}alves and Ana Subtil 
and Mar{\'\i}a Ros{\'a}rio Oliveira and Patricia De Zea Bermudez}, + Journal = {REVSTAT Statistical Journal}, + Number = {1}, + Pages = {1--20}, + Title = {{ROC} curve estimation: An overview}, + Url = {https://doi.org/10.57805/revstat.v12i1.141}, + Volume = {12}, + Year = {2014} +} +@article{Camblor2018a, + Author = {Mart{\'\i}nez-Camblor, P.}, + Journal = {Statistics in Medicine}, + Number = {7}, + Pages = {1222--1224}, + Title = {{On the paper ``Notes on the overlap measure as an alternative to the {Youden} index''}}, + Url = {https://doi.org/10.1002/sim.7517}, + Volume = {37}, + Year = {2018} +} +@article{Murtaugh1995, + Author = {Paul A. Murtaugh}, + Journal = {Biometrics}, + Number = {4}, + Pages = {1514--1522}, + Title = {{ROC} Curves with Multiple Marker Measurements}, + Url = {https://doi.org/10.2307/2533281}, + Volume = {51}, + Year = {1995} +} +@article{Mayeux2004, + Author = {Mayeux, R.}, + Journal = {NeuroRx}, + Number = {2}, + Pages = {182--188}, + Title = {Biomarkers: Potential uses and limitations}, + Url = {https://doi.org/10.1602/neurorx.1.2.182}, + Volume = {1}, + Year = {2004} +} +@misc{ROCnReg:CRAN2023, + Author = {Maria Xos{\'e} Rodr{\'\i}guez-{\'A}lvarez and Vanda Inacio}, + Note = {R package version 1.0-8}, + Title = {{ROCnReg}: {ROC} Curve Inference with and without Covariates}, + Url = {https://CRAN.R-project.org/package=ROCnReg}, + Year = {2023} +} +@article{ROCnReg2021, + Author = {Mar{\'\i}a Xos{\'e} Rodr{\'\i}guez-{\'A}lvarez and Vanda In{\'a}cio}, + Journal = {The R Journal}, + Number = {1}, + Pages = {525--555}, + Title = {{ROCnReg}: An {R} Package for Receiver Operating Characteristic Curve Inference With and Without Covariates}, + Url = {https://doi.org/10.32614/RJ-2021-066}, + Volume = {13}, + Year = {2021} +} +@article{nsROC2018, + Author = {Sonia P{\'e}rez-Fern{\'a}ndez and Pablo Mart{\'\i}nez-Camblor and Peter Filzmoser and Norberto Corral}, + Journal = {{The R Journal}}, + Number = {2}, + Pages = {55--74}, + Title = {{nsROC}: 
An {R} package for Non-Standard {ROC} Curve Analysis}, + Url = {https://doi.org/10.32614/RJ-2018-043}, + Volume = {10}, + Year = {2018} +} +@article{OptimalCutpoints2014, + Author = {M{\'o}nica L{\'o}pez-Rat{\'o}n and Mar{\'\i}a Xos{\'e} Rodr{\'\i}guez-{\'A}lvarez and Carmen Cadarso Su{\'a}rez and Francisco Gude Sampedro}, + Journal = {Journal of Statistical Software}, + Number = {8}, + Pages = {1--36}, + Title = {{OptimalCutpoints}: An {R} Package for Selecting Optimal Cutpoints in Diagnostic Tests}, + Url = {https://doi.org/10.18637/jss.v061.i08}, + Volume = {61}, + Year = {2014} +} +@misc{OptimalCutpoints:CRAN2021, + Author = {M{\'o}nica L{\'o}pez-Rat{\'o}n and Mar{\'\i}a Xos{\'e} Rodr{\'\i}guez-{\'A}lvarez}, + Note = {R package version 1.1-5}, + Title = {{OptimalCutpoints: Computing Optimal Cutpoints in Diagnostic Tests}}, + Url = {https://CRAN.R-project.org/package=OptimalCutpoints}, + Year = {2021} +} +@article{pROC2011, + Author = {Xavier Robin and Natacha Turck and Alexandre Hainard and Natalia Tiberti and Fr{\'e}d{\'e}rique Lisacek and Jean-Charles Sanchez and Markus M{\"u}ller}, + Journal = {BMC Bioinformatics}, + Number = {77}, + Pages = {1--8}, + Title = {p{ROC}: an open-source package for {R} and {S+} to analyze and compare {ROC} curves}, + Url = {https://doi.org/10.1186/1471-2105-12-77}, + Volume = {12}, + Year = {2011} +} +@misc{pROC:CRAN2023, + Author = {Xavier Robin and Natacha Turck and Alexandre Hainard and Natalia Tiberti and Fr{\'e}d{\'e}rique Lisacek and Jean-Charles Sanchez and Markus M{\"u}ller and Stefan Siegert and Matthias Doering and Zane Billings}, + Note = {R package version 1.18.4}, + Title = {p{ROC}: Display and Analyze {ROC} Curves}, + Url = {https://CRAN.R-project.org/package=pROC}, + Year = {2023} +} +@manual{nsROC:CRAN2018, + Author = {Sonia P\'erez-Fern\'andez}, + Note = {R package version 1.1}, + Title = {ns{ROC}: Non-Standard {ROC} Curve Analysis}, + Url = {https://CRAN.R-project.org/package=nsROC}, + Year = {2018} +} 
+@article{Shen2012, + Author = {Shen, Jing and Wang, Shuang and Zhang, Yu-Jing and Kappil, Maya and Wu, Hui-Chen and Kibriya, Muhammad G and Wang, Qiao and Jasmine, Farzana and Ahsan, Habib and Lee, Po-Huang and others}, + Journal = {Hepatology}, + Number = {6}, + Pages = {1799--1808}, + Publisher = {Wiley Online Library}, + Title = {Genome-wide {DNA} methylation profiles in hepatocellular carcinoma}, + Url = {https://doi.org/10.1002/hep.25569}, + Volume = {55}, + Year = {2012} +} +@techreport{Kauppi2016, + Author = {Heikki Kauppi}, + Institution = {Aboa Centre for Economics}, + Number = {114}, + Title = {The Generalized Receiver Operating Characteristic Curve}, + Type = {Discussion paper}, + Url = {https://www.econstor.eu/bitstream/10419/233329/1/aboa-ce-dp114.pdf}, + Year = {2016} +} +@article{Inacio2021, + Author = {In\'{a}cio, Vanda and Rodr\'{\i}guez-\'{A}lvarez, Mar\'{\i}a Xos\'{e} and Gayoso-Diz, Pilar}, + Journal = {Annual Review of Statistics and Its Application}, + Number = {1}, + Pages = {41--67}, + Title = {Statistical Evaluation of Medical Tests}, + Url = {https://doi.org/10.1146/annurev-statistics-040720-022432}, + Volume = {8}, + Year = {2021} +} +@article{Meisner2021, + Author = {Allison Meisner and Marco Carone and Margaret Sullivan Pepe and Kathleen F. 
Kerr}, + Journal = {Biometrical Journal}, + Number = {6}, + Pages = {1223--1240}, + Title = {Combining Biomarkers by Maximizing the True Positive Rate for a Fixed False Positive Rate}, + Url = {https://doi.org/10.1002/bimj.202000210}, + Volume = {63}, + Year = {2021} +} +@book{Pepe2003a, + Author = {Margaret Sullivan Pepe}, + Isbn = {9780198509844}, + Publisher = {Oxford University Press}, + Series = {Oxford Statistical Science Series}, + Title = {The Statistical Evaluation of Medical Tests for Classification and Prediction}, + Url = {https://global.oup.com/academic/product/the-statistical-evaluation-of-medical-tests-for-classification-and-prediction-9780198565826?cc=es&lang=en&}, + Year = {2003} +} +@book{Zhou2002, + Author = {Xiao-Hua Zhou and Donna K. McClish and Nancy A. Obuchowski}, + Publisher = {Wiley}, + Series = {Wiley Series in Probability and Statistics}, + Title = {Statistical Methods in Diagnostic Medicine}, + Url = {https://doi.org/10.1002/9780470317082}, + Volume = {414}, + Year = {2002} +} +@article{Su1993, + Author = {John Q. Su and Jun S. Liu}, + Journal = {Journal of the American Statistical Association}, + Number = {424}, + Pages = {1350--1355}, + Title = {Linear combinations of multiple diagnostic markers}, + Url = {https://doi.org/10.1080/01621459.1993.10476417}, + Volume = {88}, + Year = {1993} +} +@article{Hanley1982, + Author = {James A. 
Hanley and Barbara J. McNeil}, + Journal = {Radiology}, + Number = {1}, + Pages = {29--36}, + Title = {The Meaning and Use of the Area Under a Receiver Operating Characteristic ({ROC}) Curve}, + Url = {https://doi.org/10.1148/radiology.143.1.7063747}, + Volume = {143}, + Year = {1982} +} +@article{Bamber1975, + Author = {Donald Bamber}, + Journal = {Journal of Mathematical Psychology}, + Number = {4}, + Pages = {387--415}, + Title = {The area above the ordinal dominance graph and the area below the receiver operating characteristic graph}, + Url = {https://doi.org/10.1016/0022-2496(75)90001-2}, + Volume = {12}, + Year = {1975} +} +@article{Youden1950, + Author = {Youden, W.J.}, + Journal = {Cancer}, + Number = {1}, + Pages = {32--35}, + Title = {Index for rating diagnostic tests}, + Url = {https://doi.org/10.1002/1097-0142(1950)3:1<32::AID-CNCR2820030106>3.0.CO;2-3}, + Volume = {3}, + Year = {1950} +} +@article{PerezFernandez2020, + Author = {Sonia P{\'e}rez-Fern{\'a}ndez and Pablo Mart{\'\i}nez-Camblor and Peter Filzmoser and Norberto Corral}, + Journal = {{AStA Advances in Statistical Analysis}}, + Number = {1}, + Pages = {135--161}, + Title = {Visualizing the decision rules behind the {ROC} curves: understanding the classification process}, + Url = {https://doi.org/10.1007/s10182-020-00385-2}, + Volume = {105}, + Year = {2021} +} +@article{Pepe2000, + Author = {Margaret Sullivan Pepe and Mary Lou Thompson}, + Journal = {Biostatistics}, + Number = {2}, + Pages = {123--140}, + Title = {Combining diagnostic test results to increase accuracy}, + Url = {https://doi.org/10.1093/biostatistics/1.2.123}, + Volume = {1}, + Year = {2000} +} +@article{Camblor2019c, + Author = {Pablo Mart{\'\i}nez-Camblor and Juan Carlos Pardo-Fern{\'a}ndez}, + Journal = {The International Journal of Biostatistics}, + Number = {1}, + Title = {The {Youden} Index in the Generalized Receiver Operating Characteristic Curve Context}, + Url = {https://doi.org/10.1515/ijb-2018-0060}, + Volume =
{15}, + Year = {2019} +} +@article{Camblor2019a, + Author = {Pablo Mart{\'\i}nez-Camblor and Sonia P{\'e}rez-Fern{\'a}ndez and Susana D{\'\i}az-Coto}, + Journal = {Journal of Applied Statistics}, + Number = {9}, + Pages = {1550--1566}, + Title = {Improving the biomarker diagnostic capacity via functional transformations}, + Url = {https://doi.org/10.1080/02664763.2018.1554628}, + Volume = {46}, + Year = {2019} +} +@article{Camblor2019b, + Author = {Pablo Mart{\'\i}nez-Camblor and Juan Carlos Pardo-Fern{\'a}ndez}, + Journal = {Statistical Methods in Medical Research}, + Number = {7}, + Pages = {2032--2048}, + Title = {Parametric estimates for the receiver operating characteristic curve generalization for non-monotone relationships}, + Url = {https://doi.org/10.1177/0962280217747009}, + Volume = {28}, + Year = {2019} +} +@article{Camblor2017a, + Author = {Pablo Mart{\'\i}nez-Camblor and Norberto Corral and Corsino Rey and Julio Pascual and Eva Cernuda-Moroll{\'o}n}, + Journal = {Statistical Methods in Medical Research}, + Number = {1}, + Pages = {113--123}, + Title = {Receiver operating characteristic curve generalization for non-monotone relationships}, + Url = {https://doi.org/10.1177/0962280214541095}, + Volume = {26}, + Year = {2017} +} +@article{Pepe2003b, + Author = {Pepe, {M.S.} and G. Longton and Anderson, {G.L.} and M. Schummer}, + Journal = {Biometrics}, + Number = {1}, + Pages = {133--142}, + Title = {Selecting differentially expressed genes from microarray experiments}, + Url = {https://doi.org/10.1111/1541-0420.00016}, + Volume = {59}, + Year = {2003} +} +@article{Kang2016, + Author = {Le Kang and Aiyi Liu and Lili Tian}, + Journal = {Statistical Methods in Medical Research}, + Number = {4}, + Pages = {1359--1380}, + Title = {Linear combination methods to improve diagnostic/prognostic accuracy on future observations}, + Url = {https://doi.org/10.1177/0962280213481053}, + Volume = {25}, + Year = {2016} +} +@article{McIntosh2002, + Author = {Martin W. 
McIntosh and Margaret Sullivan Pepe}, + Journal = {Biometrics}, + Number = {3}, + Pages = {657--664}, + Title = {Combining Several Screening Tests: Optimality of the Risk Score}, + Url = {https://doi.org/10.1111/j.0006-341X.2002.00657.x}, + Volume = {58}, + Year = {2002} +} +@manual{R, + Address = {Vienna, Austria}, + Author = {{R Core Team}}, + Note = {{ISBN} 3-900051-07-0}, + Organization = {R Foundation for Statistical Computing}, + Title = {{R}: A Language and Environment for Statistical Computing}, + Url = {https://www.R-project.org/}, + Year = {2016} +} +@article{ihaka:1996, + Author = {Ihaka, Ross and Gentleman, Robert}, + Journal = {Journal of Computational and Graphical Statistics}, + Number = {3}, + Pages = {299--314}, + Title = {{R}: A Language for Data Analysis and Graphics}, + Url = {https://doi.org/10.1080/10618600.1996.10474713}, + Volume = {5}, + Year = {1996} +} +@manual{rmsCRAN2023, + Author = {Frank E {Harrell Jr}}, + Note = {R package version 6.7-1}, + Title = {rms: Regression Modeling Strategies}, + Url = {https://CRAN.R-project.org/package=rms}, + Year = {2023} +} diff --git a/_articles/RJ-2025-035/PerFer-MarCam_movieROC.tex b/_articles/RJ-2025-035/PerFer-MarCam_movieROC.tex new file mode 100644 index 0000000000..b47c042f49 --- /dev/null +++ b/_articles/RJ-2025-035/PerFer-MarCam_movieROC.tex @@ -0,0 +1,708 @@ +% !TeX root = RJwrapper.tex +% !TEX spellcheck = en_US + +\title{movieROC: Visualizing the Decision Rules Underlying Binary Classification} +\author{by Sonia P\'erez-Fern\'andez, Pablo Mart\'inez-Camblor and Norberto Corral-Blanco} + +\maketitle + +\abstract{ +The receiver operating characteristic (ROC) curve is a graphical tool commonly used to depict the binary classification accuracy of a continuous marker in terms of its sensitivity and specificity. The standard ROC curve assumes a monotone relationship between the marker and the response, inducing classification subsets of the form $(c,\infty)$ with $c \in \mathbb R$. 
+However, in non-standard cases, the involved classification regions are not so clear, highlighting the importance of tracking the decision rules. +This paper introduces the R package movieROC, which provides visualization tools for understanding the ability of markers to identify a characteristic of interest, complementing the ROC curve representation. This tool accommodates multivariate scenarios and generalizations involving different decision rules. +The main contribution of this package is the visualization of the underlying classification regions, with the associated gain in interpretability. Adding time (videos) as a third dimension, this package facilitates the visualization of binary classification in multivariate problems. It also constitutes a convenient tool for generating graphical material for presentations. +} + +\section{Introduction} \label{introduction} + +The use of data to detect a characteristic of interest is a cornerstone of many disciplines such as medicine (to diagnose a pathology or to predict a patient outcome), finance (to detect fraud), or machine learning (to evaluate a classification algorithm), among others. Continuous markers are surrogate measures for the characteristic under study, or predictors of a potential subsequent event. They are measured in subjects, some of them with the characteristic (\dfn{positive}), and some without it (\dfn{negative}). In addition to reliability and feasibility, a good marker must have two relevant properties: interpretability and accuracy \citep{Mayeux2004}. High binary classification \dfn{accuracy} can be achieved if there exists a strong relationship between the marker and the \dfn{response}. The latter is assessed by a \dfn{gold standard} for the presence or absence of the characteristic of interest. \dfn{Interpretability} refers to the \dfn{decision rules} or \dfn{subsets} considered in the classification process.
This piece of research seeks to elucidate both desirable properties for a marker by the implementation of a graphical tool in the R language. We propose a novel approach involving the generation of videos as a solution to effectively capture the classification procedure for univariate and multivariate markers. Graphical analysis plays a pivotal role in data exploration, interpretation, and communication. Its burgeoning potential is underscored by the fast pace of technological advances, which empower the creation of insightful graphical representations. + +A usual practice when the binary classification accuracy of a marker is of interest involves the representation of the \dfn{Receiver Operating Characteristic (ROC) curve}, summarized by the \dfn{Area Under the Curve} (\dfn{AUC}) \citep{Hanley1982}. The resulting plot reflects the trade-off between the sensitivity and the complement of the specificity. \dfn{Sensitivity} and \dfn{specificity} are probabilities of correctly classifying subjects, either positive or negative, respectively. Mathematically, let $\xi$ and $\chi$ be the random variables modeling the marker values in the positive and the negative population, respectively, with $F_\xi(\cdot)$ and $F_\chi(\cdot)$ their associated cumulative distribution functions. Assuming that the expected value of the marker is larger in the positive than in the negative population, the standard ROC curve is based on \dfn{classification subsets} of the form $s = (c, \infty)$, where $c$ is the so-called \dfn{cut-off value or threshold} in the support of the marker $X$, $\mathcal{S}(X)$. A subject is classified as positive if its marker value falls within this region, and as negative otherwise. This type of subset has two important advantages: first, their interpretability is clear; second, for each specificity $1-t \in [0,1]$, the corresponding $s_t = (c_t, \infty)$ is univocally defined by $c_t = F_\chi^{-1}(1-t)$ for absolutely continuous markers.
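The one-to-one link between a fixed specificity and the cut-off $c_t = F_\chi^{-1}(1-t)$ can be sketched empirically. The following is an illustrative sketch only (simulated data, base R, not code from the package), where the empirical quantile of the negative sample plays the role of $F_\chi^{-1}$:

```r
# Toy sketch: cut-off achieving a target specificity for a standard
# one-sided rule s_t = (c_t, Inf), using simulated marker values.
set.seed(1)
chi <- rnorm(500, mean = 0)   # negative population
xi  <- rnorm(500, mean = 1)   # positive population (larger on average)

t   <- 0.10                           # allowed false-positive rate
c_t <- quantile(chi, probs = 1 - t)   # empirical F_chi^{-1}(1 - t)

spec <- mean(chi <= c_t)   # close to 1 - t by construction
sens <- mean(xi  >  c_t)   # sensitivity of the subset (c_t, Inf)
```

Sweeping $t$ over $[0,1]$ and plotting `sens` against $t$ traces the empirical ROC curve.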
+ +When differences in marker distribution between the negative and the positive population are only in location but not in shape, then $F_\chi(\cdot) < F_\xi(\cdot)$, and the classification is direct by using these decision rules. However, when this is not the case, the standard ROC curve may cross the main diagonal, resulting in an \dfn{improper} curve \citep{Dorfman1997}. This may be due to three different scenarios: +\begin{itemize} +\item[i)] the behavior of the marker in the two studied populations is different but it is not possible to determine the decision rules. Notice that the binary classification problem goes further than distinguishing between the two populations: the classification subsets should be highly likely in one population and highly unlikely in the other one \citep{Camblor2018a}; +\item[ii)] there exists a relationship between the marker and the response with a potential classification use, but this is not monotone; +\item[iii)] there is no relationship between the marker and the response at all (main diagonal ROC curve). +\end{itemize} + +In the second case, we have to define classification subsets different from the standard $s_t=(c_t,\infty)$. +Therefore, the use of the marker becomes more complex. With the aim of accommodating scenarios where both higher and lower values of the marker are associated with a higher risk of having the characteristic, \citet{Camblor2017a} proposed the so-called \dfn{generalized ROC (gROC) curve}. This curve tracks the highest sensitivity for every specificity in the unit interval resulting from subsets of the form $s_t=(-\infty, x_t^L] \cup (x_t^U, \infty)$ with $x_t^L \leq x_t^U \in \mathcal{S}(X)$. + +Although final decisions are based on the underlying classification subsets, they are typically not depicted. +This omission is not a shortcoming in standard cases, as for each specificity $1-t \in [0,1]$, there is only one rule of the form $s_t = (c_t, \infty)$ with such a specificity.
Particularly, $s_t$ is univocally defined by $c_t = F_\chi^{-1}(1-t)$; and the same applies if we fix a sensitivity. Nevertheless, if the gROC curve is taken, there are infinitely many subsets of the form $s_t=(-\infty, x_t^L] \cup (x_t^U, \infty)$ resulting in $\P(\chi \in s_t) = t$. +This loss of univocity underlines the importance of reporting (numerically and/or graphically) the decision rules actually proposed for classifying. This gap is covered in the presented package. + +An alternative approach to assess the classification performance of a marker involves considering a transformation of it. This transformation $h(\cdot)$ aims to capture differences in distribution between the two populations in the ROC sense. Once $h(\cdot)$ is identified, the standard ROC curve for $h(X)$ is represented, resulting in the \dfn{efficient ROC (eROC) curve} \citep{Kauppi2016}. Arguing as before, for a fixed specificity, the classification subsets $s_t=(c_t, \infty)$ in the transformed space are univocally defined, where a subject is classified as positive if $h(x) \in s_t$ and negative otherwise (with $x$ representing its marker value). However, they may have any shape in the original space, depending on the monotonicity of the functional transformation $h(\cdot)$ \citep{Camblor2019a}. Emphasizing the importance of tracking the decision rules underlying the eROC curve, this monitoring process enables an assessment of whether the improved accuracy of the marker justifies the potential loss in interpretability. + +The ROC curve is defined for classification accuracy evaluation of univariate markers. To deal with multivariate markers, the usual practice is to consider a transformation $\boldsymbol{h}(\cdot)$ to reduce them to a univariate one, and then to construct the standard ROC curve. The same considerations as before apply when a functional transformation is taken.
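The loss of univocity for two-sided rules can be seen directly. The base R sketch below (simulated data, variable names of our own, not package code) evaluates one particular pair of cut-offs $(x^L, x^U)$; many other pairs attain the same false-positive rate, and the gROC curve retains, for each rate, the pair with the highest sensitivity:

```r
# Sketch (simulated data): operating point of a gROC-type subset
# s = (-Inf, xL] U (xU, Inf), where extreme marker values suggest positives.
set.seed(1)
chi <- rnorm(200, mean = 0)               # negative population, centered
xi  <- c(rnorm(100, -2), rnorm(100, 2))   # positives take extreme values

xL <- -1.5; xU <- 1.5                     # one choice among infinitely many
in_s <- function(x) x <= xL | x > xU      # membership in the subset

fpr <- mean(in_s(chi))   # t = P(chi in s_t)
tpr <- mean(in_s(xi))    # sensitivity attained by this particular (xL, xU)
```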
In the proposed R library, we consider methods from the literature to define and estimate $\boldsymbol{h}(\cdot)$ in the multivariate case \citep{Kang2016, Meisner2021}. + + +Focusing on the classification subsets underlying the decision rules, the \CRANpkg{movieROC} package incorporates methods to visualize the construction process of ROC curves by presenting the classification accuracy of these subsets. +For univariate markers, the library includes both the classical (standard ROC curve) and the generalized (gROC curve) approaches. Besides, it enables the display of decision rules for various transformations of the marker, seeking to maximize performance while allowing for flexibility in the final shape of the subsets (eROC curve). For multidimensional markers, the proposed tool visualizes the evolution of decision subsets when different objective functions are employed for optimization, even imposing restrictions on the underlying regions. In this case, displaying the decision rules associated with every specificity in a single static image is no longer feasible. Therefore, \dfn{dynamic representations} (videos) are implemented, drawing on time as an extra dimension to capture the variation in specificity.
+ +A wide range of R software covering diverse topics related to ROC curves could be discussed here: +the \CRANpkg{pROC} package is a main reference including tools to visualize, estimate and compare ROC curves \citep{pROC2011}; +\CRANpkg{ROCnReg} explicitly considers covariate information to estimate the covariate-specific and the covariate-adjusted ROC curves \citep{ROCnReg2021}; +\CRANpkg{smoothROCtime} implements smooth estimation of time-dependent ROC curves based on the bivariate kernel density estimator for $(X, \textit{time-to-event})$ \citep{DiazCoto2020}; +\CRANpkg{OptimalCutpoints} includes point and interval estimation methods for optimal thresholds \citep{OptimalCutpoints2014}; +and \CRANpkg{nsROC} performs non-standard analyses such as gROC estimation \citep{nsROC2018}; among others. + +This paper introduces and elucidates the diverse functionalities of the newly developed \CRANpkg{movieROC} package, aimed at facilitating the visualization and comprehension of the decision rules underlying the binary classification process, encompassing various generalizations. Despite the availability of numerous R packages implementing related analyses, we have identified the main gaps covered by this library: tracking the decision rules underlying the ROC curve, including multivariate markers and non-standard (i.e., non-monotone) scenarios. The rest of the paper is structured as follows. In Section~\ref{section:functionality}, we introduce the main R functions and objects implemented, and briefly explain the dataset employed throughout this manuscript to demonstrate the utility of the R library. Section~\ref{section:regularroc} is devoted to reconsidering the definition of the standard ROC curve from the perspective of classification subsets, including an extension to multivariate scenarios.
Sections~\ref{section:groc} and \ref{section:efficientroc} revisit the gROC curve and the eROC curve, respectively, covering various methods to capture the potential classification accuracy of the marker under study. +Each of these sections begins with a state-of-the-art overview, followed by the main syntax of the corresponding R functions. In addition, examples of implementation using the dataset presented in Section \ref{subsection:dataset} are provided. Finally, the paper concludes with a concise summary and computational details regarding the implemented tool. + + + + +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% + +\section{Main functions of the movieROC package and illustrative dataset}\label{section:functionality} + +Sections~\ref{subsec:functionality} and \ref{subsec:functions} provide a detailed description of the main objectives of the implemented R functions. To reflect the practical usage of the developed R package, we employ a real dataset throughout this manuscript, which is introduced in Section~\ref{subsection:dataset}. + + + +\subsection{Functionality of the movieROC package} \label{subsec:functionality} + +A graphical tool was developed to showcase static and dynamic graphics displaying the classification subsets derived from maximizing diagnostic accuracy under certain assumptions, ensuring the preservation of interpretability. The R package facilitates the construction of the ROC curve across various specificities, providing visualizations of the resulting classification regions. The proposed tool comprises multiple R functions that generate objects with distinct class attributes (see the function names from which red arrows depart and the red nodes in Figure~\ref{figure:functionality}, respectively).
Once the object of interest is created, different methods may be used to plot the underlying regions (\code{plot\_regions()}, \code{plot\_funregions()}), to track the resulting ROC curve (\code{plot\_buildROC()}, \code{plot()}), to \code{predict} decision rules for a particular specificity, and to \code{print} relevant information, among others. The main function of the package, \code{movieROC()}, produces videos to exhibit the classification procedure. + +\begin{figure}[htbp] + \centering + \includegraphics[width=\textwidth]{figures/movieROC_mainFunctions.pdf} + \includegraphics[width=.8\textwidth]{figures/movieROC_extraFunctions.pdf} + \caption{R functions of the \CRANpkg{movieROC} package. The blue nodes include the names of the R functions and the red nodes indicate the different R objects that can be created and worked with. The red arrows depart from those R functions engaged in creating R objects and the black arrows indicate which R functions can be applied to which R objects. The grey dashed arrows show internal dependencies.} + \label{figure:functionality} +\end{figure} + +The package includes algorithms to visualize the regions that underlie the binary classification problem, considering different approaches: +\begin{itemize} +\item make the classification subsets flexible in order to cover non-standard scenarios, by considering two cut-off values (\code{gROC()} function); explained in Section~\ref{section:groc}; +\item transform the marker by a proper function $h(\cdot)$ (\code{hROC()} function); introduced in Section~\ref{section:efficientroc}; +\item when dealing with multivariate markers, consider a functional transformation with some fixed or dynamic parameters resulting from different methods available in the literature (\code{multiROC()} function); covered in Section~\ref{section:multiroc}.
+\end{itemize} + + + + +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% + +\subsection{Class methods for \pkg{movieROC} objects} \label{subsec:functions} + +By using the \code{gROC()}, the \code{multiROC()} or the \code{hROC()} function, the user obtains an R object of class `\code{groc}', `\code{multiroc}' or `\code{hroc}', respectively. These will be called \CRANpkg{movieROC} objects. Once the object of interest is created, the implemented package includes many functions (methods) that can be applied to it. Some of them are generic methods (\code{print()}, \code{plot()} and \code{predict()}), commonly used in the R language on different objects according to their class attributes. The rest of the functions are specific to this library and therefore only applicable to \CRANpkg{movieROC} objects. Table~\ref{table:funcmovieROC} summarizes all these functions and provides their target and main syntax (with default input parameters). + +{\footnotesize +\begin{longtable}{@{}p{2.5cm}p{11.1cm}@{}} +\caption{Methods for a \CRANpkg{movieROC} object: `\code{groc}', `\code{multiroc}' or `\code{hroc}' object (output of the \code{gROC()}, \code{multiROC()} or \code{hROC()} function, respectively). The main input parameters are displayed.\label{table:funcmovieROC}}\\ +\toprule +\multicolumn{2}{c}{Generic functions}\\ +\hline +\centering \code{print()} & Print some relevant information.\\ +\centering \code{plot()} & Plot the ROC curve estimate.\\ +\centering \code{predict()} & Print the classification subsets corresponding to a particular false-positive rate (\code{FPR}) introduced by the user. For a `\code{groc}' object, the user may specify a cut-off value \code{C} (for the standard ROC curve) or two cut-off values \code{XL} and \code{XU} (for the gROC curve).\\ +\midrule +\multicolumn{2}{c}{Specific functions}\\ +\hline +\centering \code{plot\_regions()} & Applicable to a `\code{groc}' or a `\code{hroc}' object.
\par \vspace{1mm} +Plot two graphics in the same figure: left, classification subsets for each false-positive rate (grey color by default); right, $90^\circ$ rotated ROC curve. \par \vspace{1mm} +\textsc{Main syntax:} \par \hspace{3mm} +\code{plot\_regions(obj, plot.roc = TRUE, plot.auc = FALSE, FPR = 0.15, ...)} \par \vspace{1mm} +If the input parameter \code{FPR} is specified, the corresponding classification region reporting such false-positive rate and the point in the ROC curve are highlighted in blue color.\\ +\midrule +\centering \code{plot\_funregions()} & Applicable to a `\code{groc}' or a `\code{hroc}' object. \par \vspace{1mm} +Plot the transforming function and the classification subsets reporting the false-positive rate(s) indicated in the input parameter(s) \code{FPR} and \code{FPR2}. \par \vspace{1mm} +\textsc{Main syntax:} \par \hspace{3mm} +\code{plot\_funregions(obj, FPR = 0.15, FPR2 = NULL, plot.subsets = TRUE, ...)}\\ +\midrule +\centering \code{plot\_buildROC()} & Applicable to a `\code{groc}' or a `\code{multiroc}' object. \par \vspace{2mm} +- For a `\code{groc}' object: \par \vspace{1mm} +Plot four (if input \code{reduce} is FALSE) or two (if \code{reduce} is TRUE, only those on the top) graphics in the same figure: top-left, density function estimates for the marker in both populations with the areas corresponding to FPR and TPR colored (blue and red, respectively) for the optional input parameter \code{FPR}, \code{C} or \code{XL, XU}; top-right, the empirical ROC curve estimate; bottom-left, boxplots in both groups; bottom-right, classification subsets for every FPR (grey color). 
\par \vspace{1mm} +\textsc{Main syntax:} \par \hspace{3mm} +\code{plot\_buildROC(obj, FPR = 0.15, C, XL, XU, h = c(1,1), histogram = FALSE, breaks = 15,} \linebreak \hspace*{2.3cm} \code{reduce = TRUE, build.process = FALSE, completeROC = FALSE, ...)} \par \vspace{1mm} +If \code{build.process} is FALSE, the whole ROC curve is displayed; otherwise, if \code{completeROC} is TRUE, the portion of the ROC curve until the fixed FPR is highlighted in black and the rest is shown in gray, while if \code{completeROC} is FALSE, only the first portion of the curve is displayed.\par \vspace{2mm} +- For a `\code{multiroc}' object: \par \vspace{1mm} +Plot two graphics in the same figure: right, the ROC curve highlighting the point and the threshold for the resulting univariate marker; left, scatterplot with the marker values in both positive (red color) and negative (blue color) subjects +\begin{itemize} +\item for $p=2$: over the original/feature bivariate space; +\item for $p>2$: projected over two selected components of the marker (if \code{display.method = "OV"} with components selection in \code{displayOV}, \code{c(1,2)} by default) or the first two principal components from PCA (if \code{display.method = "PCA"}, default); +\end{itemize} +and the classification subset (gold color) reporting the \code{FPR} selected by the user (\code{FPR} $\neq$ \code{NULL}). 
\par \vspace{1mm} +\textsc{Main syntax:} +\begin{itemize} +\item for $p=2$: +\item[] \code{plot\_buildROC(obj, FPR = 0.15, build.process = FALSE, completeROC = TRUE, ...)} +\item for $p>2$: +\item[] \code{plot\_buildROC(obj, FPR = 0.15, display.method = c("PCA","OV"), displayOV = } \linebreak \hspace*{2cm} \code{c(1,2), build.process = FALSE, completeROC = TRUE, ...)} +\end{itemize} +If \code{build.process} is FALSE, the whole ROC curve is displayed; otherwise, if \code{completeROC} is TRUE, the portion of the ROC curve until the fixed FPR is highlighted in black and the rest is shown in gray, while if \code{completeROC} is FALSE, only the first portion of the curve is shown.\\ +\midrule\\\pagebreak\midrule +\centering \code{movieROC()} & Applicable to a `\code{groc}' or a `\code{multiroc}' object. \par \vspace{2mm} +Save a video as a GIF illustrating the construction of the ROC curve. \par \vspace{1mm} +- For a `\code{groc}' object: \par \vspace{1mm} +\textsc{Main syntax:} \par \hspace{3mm} +\code{movieROC(obj, fpr=NULL, h=c(1,1), histogram = FALSE, breaks = 15, reduce = TRUE,} \linebreak \hspace*{1.6cm} \code{completeROC = FALSE, videobar = TRUE, file = "animation1.gif", ...)} \par \vspace{1mm} +For each element in vector \code{fpr} (optional input parameter), the function executed is \code{plot\_buildROC(obj, FPR = fpr[i], build.process = TRUE, ...)}. 
The vector of false-positive rates illustrated in the video is \code{NULL} by default: if the length of the output parameter \code{t} of the \code{gROC()} function is lower than 150, such vector is taken as \code{fpr}; otherwise, an equally-spaced vector of length 100 covering the range of the marker values is considered.\\ +\phantom{\centering \code{plot\_buildROC()}} & +- For a `\code{multiroc}' object: \par \vspace{1mm} +\textsc{Main syntax:} +\begin{itemize} +\item for $p=2$: +\item[] \code{movieROC(obj, fpr = NULL, file = "animation1.gif", save = TRUE, border = TRUE,} \par \hspace*{1.2cm} \code{completeROC = FALSE, ...)}; +\item for $p>2$: +\item[] \code{movieROC(obj, fpr = NULL, display.method = c("PCA","OV"), displayOV = c(1,2),} \par +\hspace*{1.2cm} \code{file = "animation1.gif", save = TRUE, border = TRUE, } \par +\hspace*{1.2cm} \code{completeROC = FALSE, ...)}. +\end{itemize} +The video is \code{save}d by default as a GIF with the name indicated in argument \code{file} (extension \code{.gif} should be added). A \code{border} for the classification subsets is drawn by default.\par \vspace*{1mm} +For each element in vector \code{fpr} (optional input parameter), the function executed is +\begin{itemize} +\item for $p=2$: +\item[] \code{plot\_buildROC(obj, FPR = fpr[i], build.process = TRUE, completeROC, ...)}; +\item for $p>2$: +\item[] \code{plot\_buildROC(obj, FPR = fpr[i], build.process = TRUE, display.method,} \par \hspace*{1.9cm} \code{displayOV, completeROC, ...)} +\end{itemize} + +The same considerations about the input \code{fpr} apply as for \code{movieROC()} over a `\code{groc}' object.\\ +\bottomrule +\end{longtable}} + + + + + + + +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% + +\subsection{Illustrative dataset} \label{subsection:dataset} + +In order to illustrate the functionality of our R package, we consider the \code{HCC} data.
This dataset is derived from methylation arrays of tumor and adjacent non-tumor tissues of 62 Taiwanese cases of hepatocellular carcinoma (HCC). The goal of the original study \citep{Shen2012} was to identify, with a genome-wide approach, additional genes hypermethylated in HCC that could be used for more accurate analysis of plasma DNA for early diagnosis, by using Illumina methylation arrays (Illumina, Inc., San Diego, CA) that screen 27,578 autosomal CpG sites. The complete dataset was deposited in NCBI’s Gene Expression Omnibus (GEO) and it is available through series accession number GSE37988 (\url{www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE37988}). It is included in the presented package (\code{HCC} dataset), restricted to the 948 genes with complete information. + +The following code loads the R package and the \code{HCC} dataset (see the \href{https://cran.r-project.org/web/packages/movieROC/vignettes/movieROC_vignette.pdf}{vignette} for main structure). +\begin{example} +R> library(movieROC) +R> data(HCC) +\end{example} + +We selected the genes 20202438, 18384097, and 03515901. On the one hand, we chose the gene 03515901 as an example of a monotone relationship between the marker and the response, reporting a good ROC curve. On the other hand, relative gene expression intensities of the genes 20202438 and 18384097 tend to be more extreme in tissues with tumor than in those without it. These are non-standard cases, so if we limited ourselves to detecting ``appropriate'' genes on the basis of the standard ROC curve, they would not be chosen. However, extending the decision rules by means of the gROC curve, those genes may be considered as potential biomarkers (locations) to distinguish between the two groups.
The R code estimating and displaying the probability density function for gene expression intensities of the selected genes in each group (Figure~\ref{fig:densities}) is included in the \href{https://cran.r-project.org/web/packages/movieROC/vignettes/movieROC_vignette.pdf}{vignette}. + + +\begin{figure}[h!] +\includegraphics[width=\textwidth,trim={0 4mm 0 2mm},clip]{figures/Histogram_3genes.pdf} + \caption{Density histograms and kernel density estimations (lighter) for gene expression intensities of the genes 20202438, 18384097 and 03515901 in negative (non-tumor) and positive (tumor) tissues.} + \label{fig:densities} +\end{figure} + + + +\newpage + +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% + +\section{Regular ROC curve} \label{section:regularroc} + +Assuming that there exists a monotone relationship between the marker and the response, the \dfn{regular, right-sided} or \dfn{standard ROC curve} associated with the marker $X$ considers classification subsets of the form $s_t=(c_t,\infty)$. +For each specificity $1-t=\P(\chi \notin s_t) \in [0,1]$, also called \dfn{true-negative rate}, there exists only one subset $s_t$ reporting such specificity and thus a particular sensitivity, also called \dfn{true-positive rate}, $\P(\xi \in s_t)$. +This results in a simple correspondence between each point of the ROC curve $\mathcal{R}_r(t) = 1-F_\xi \big(F_\chi^{-1}(1-t)\big)$ and its associated classification region $s_t \in \mathcal{I}_r(t)$, where +$$\mathcal{I}_r(t) = \Big\{ s_t = (c_t, \infty) \mbox{ : } c_t \in \mathcal{S}(X) \mbox{ , } \P(\chi \in s_t) = t \Big\}$$ +is the \dfn{right-sided family of eligible classification subsets}. The definition of this family captures the shape of the decision rules and the target specificity.
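The expression $\mathcal{R}_r(t) = 1-F_\xi \big(F_\chi^{-1}(1-t)\big)$ translates directly into an empirical estimator by plugging in the empirical distribution functions. The following is a minimal base R sketch on simulated data (illustrative only; it is not the \code{gROC()} implementation):

```r
# Empirical right-sided ROC curve R_r(t) = 1 - F_xi(F_chi^{-1}(1 - t)),
# with the AUC computed as the Mann-Whitney statistic (simulated data).
set.seed(1)
chi <- rnorm(100, mean = 0)   # negatives
xi  <- rnorm(100, mean = 1)   # positives

t   <- seq(0, 1, length.out = 201)             # grid of false-positive rates
c_t <- quantile(chi, probs = 1 - t, type = 1)  # empirical F_chi^{-1}(1 - t)
roc <- 1 - ecdf(xi)(c_t)                       # sensitivities R_r(t)

auc <- mean(outer(xi, chi, ">"))  # proportion of pairs with xi_i > chi_j
# plot(t, roc, type = "s") would display the empirical curve
```

The empirical AUC equals the Mann-Whitney statistic, i.e., the proportion of positive-negative pairs in which the positive marker value is the larger one.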
+ +If higher values of the marker are associated with a higher probability of not having the characteristic (see gene 03515901 in Figure~\ref{fig:densities}), the ROC curve would be defined by the \dfn{left-sided family of eligible classification subsets} \citep{Camblor2017a}, $\mathcal{I}_l(t)$, similarly to $\mathcal{I}_r(t)$ but with the form $s_t = (-\infty, c_t]$. +%$$\mathcal{I}_l(t) = \Big\{ s_t = (-\infty, c_t] \mbox{ : } c_t \in \mathcal{S}(X) \mbox{ , } \P(\chi \in s_t) = t \Big\},$$ +It results in $\mathcal{R}_l(t) = F_\xi \big(F_\chi^{-1}(t) \big)$, $t\in[0,1]$, and the decision rules are also univocally defined in this case. + +The ROC curve and related problems have been widely studied in the literature; the interested reader is referred to the monographs of \citet{Zhou2002}, \citet{Pepe2003a}, and \citet{Nakas2023}, as well as the review by \citet{Inacio2021}. +By definition, the ROC curve is confined within the unit square, with optimal performance achieved when it approaches the upper-left corner (AUC closer to 1). Conversely, proximity to the main diagonal (AUC closer to 0.5) indicates diminished discriminatory ability, resembling a random classifier. + +In practice, let $(\xi_1, \xi_2, \dots, \xi_n)$ and $(\chi_1, \chi_2, \dots, \chi_m)$ be two independent and identically distributed (i.i.d.) samples from the positive and the negative population, respectively. Different estimation procedures are implemented in the \CRANpkg{movieROC} package, such as the empirical estimator \citep{Hsieh1996} (by default in the \code{gROC()} function), accompanied by its summary indices: the AUC and the Youden index \citep{Youden1950}. Alternatively, semiparametric approaches based on kernel density estimation for the involved distributions may be considered \citep{Zou1997}. The \code{plot\_densityROC()} function provides plots for both right- and left-sided ROC curves estimated by this method.
On the other hand, assuming that the marker follows a Gaussian distribution in both populations, that is, $\xi \sim \mathcal{N}(\mu_\xi, \sigma_\xi)$ and $\chi \sim \mathcal{N}(\mu_\chi, \sigma_\chi)$, parametric approaches propose plug-in estimators by estimating the unknown parameters while using the known distributions \citep{Hanley1988}. This parametric estimation is included in the \code{gROC\_param()} function, which works similarly to \code{gROC()}. + +\textsc{Main syntax:} \par \hspace{3mm} +\code{gROC(X, D, side = "right", ...)} \hspace{8mm} \code{gROC\_param(X, D, side = "right", ...)} + +\noindent Table 1 in the \href{https://cran.r-project.org/web/packages/movieROC/vignettes/movieROC_vignette.pdf}{vignette} provides the main input and output parameters of these R functions, which estimate the regular ROC curve (right-sided or left-sided with \code{side = "right"} or \code{"left"}, respectively) and associated decision rules. Its output is an R object of class `\code{groc}', to which the functions listed in Table~\ref{table:funcmovieROC} above can be applied. Most of them are visualization tools, but the user may also \code{print()} summary information and \code{predict()} classification regions for a particular specificity. + +Figure~\ref{fig:roccurves} graphically represents the empirical estimation of the standard (gray line) and generalized (black line) ROC curves for each gene in Figure~\ref{fig:densities}. To construct the standard ROC curve for the first two genes (20202438 and 18384097), the right-sided ROC curve is considered; and the left-sided curve for the third one (03515901). As expected from the discussion of Figure~\ref{fig:densities}, the standard and gROC curves are similar for the third gene because there exists a monotone relationship between the marker and the response. However, these curves differ for the first two genes due to the lack of monotonicity in those scenarios.
The empirical gROC curve estimator is explained in detail in Section~\ref{section:groc}. + +\noindent The next chunk of code generates the figure, providing an example of the use of the \code{gROC()} function, the \code{plot()} method, and how to access the AUC. + +\begin{example} +R> for(gene in c("20202438", "18384097", "03515901")){ ++ roc <- gROC(X = HCC[,paste0("cg",gene)], D = HCC$tumor, ++ side = ifelse(gene == "03515901", "left", "right")) ++ plot(roc, col = "gray50", main = paste("Gene", gene), lwd = 3) ++ groc <- gROC(X = HCC[,paste0("cg",gene)], D = HCC$tumor, side = "both") ++ plot(groc, new = FALSE, lwd = 3) ++ legend("bottomright", paste(c("AUC =", "gAUC ="), format(c(roc$auc, groc$auc), ++ digits = 3)), col = c("gray50", "black"), lwd = 3, bty = "n", inset = .01)} +\end{example} + +\begin{figure}[h!] + \centering + \includegraphics[width=.95\textwidth,trim={0 3mm 0 5mm},clip]{figures/ROCcurves_3genes.pdf} + \caption{Standard ROC curve (in gray) and gROC curve (in black) empirical estimation for the capacity of genes 20202438, 18384097 and 03515901 to distinguish between the tumor and non-tumor groups.} + \label{fig:roccurves} +\end{figure} + + +The following code snippet estimates the standard ROC curve for gene 20202438, prints its basic information, and predicts the classification region and sensitivity resulting in a specificity of 0.9. It provides an illustrative example of utilizing the \code{print()} and \code{predict()} functions. + + +\begin{example} +R> roc_selg1 <- gROC(X = HCC$cg20202438, D = HCC$tumor, side = "right") +R> roc_selg1 +\end{example} +\vspace*{-3mm} + +\begin{example} +Data was encoded with nontumor (controls) and tumor (cases). +It is assumed that larger values of the marker indicate larger confidence that a + given subject is a case. +There are 62 controls and 62 cases. +The specificity and sensitivity reported by the Youden index are 0.855 and 0.403, + respectively, corresponding to the following classification subset: (0.799, Inf).
+The area under the right-sided ROC curve (AUC) is 0.547. +\end{example} + +\begin{example} +R> predict(roc_selg1, FPR = .1) +\end{example} +\vspace*{-3mm} +\begin{example} +$ClassSubsets +[1] 0.8063487       Inf + +$Specificity +[1] 0.9032258 + +$Sensitivity +[1] 0.3064516 +\end{example} + +%Figure~\ref{fig:classproc_roc} was generated by \code{plot\_densityROC()} and \code{plot\_buildROC()} functions as follows. These R functions compute the kernel density estimation and the empirical ROC curve estimate, respectively. The second graphic can be seen as a snapshot from the videos generated by the \code{movieROC()} function. + +%\begin{figure}[h!] +% \centering +% \includegraphics[width=.45\textwidth]{figures/plot_densityROC_gene20202438.pdf} \hspace*{4mm} +% \includegraphics[width=.46\textwidth,trim={0 4mm 0 0mm},clip]{figures/plotbuildROC_gene20202438.pdf} +% \caption{Classification procedure until cut-off value 0.77 for the gene 20202438 by using the \code{plot\_densityROC()} (left) and the \code{plot\_buildROC()} (right) function.} +% \label{fig:classproc_roc} +%\end{figure} + +%\begin{example} +%R> plot_densityROC(roc_selg1, C = .77, build.process = TRUE) +%R> plot_buildROC(roc_selg1, C = .77, build.process = TRUE, reduce = FALSE) +%\end{example} + + +\noindent The following line of code displays the whole construction of the empirical standard ROC curve for gene 20202438. The video is saved by default as a GIF with the name provided. +\begin{example} +R> movieROC(roc_selg1, reduce = FALSE, file = "StandardROC_gene20202438.gif") +\end{example} + + + + +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% + +\subsection{Multivariate ROC curve} \label{section:multiroc} + +In practice, many cases may benefit from combining information from different markers to enhance classification accuracy. Rather than assessing univariate markers separately, taking the multivariate marker resulting from merging them can yield a relevant gain.
However, note that the ROC curve and related indices are defined only for univariate markers, as they require the existence of a total order. To address this limitation, a common approach involves transforming the $p$-dimensional multivariate marker $\boldsymbol{X}$ into a univariate one through a functional transformation $\boldsymbol{h}: \mathbb{R}^p \longrightarrow \mathbb{R}$. This transformation $\boldsymbol{h}(\cdot)$ seeks to optimize an objective function related to the classification accuracy, usually the AUC \citep{Su1993, McIntosh2002, Camblor2019a}. + +We enumerate the methods included in the proposed R tool through the \code{multiROC()} function (input parameter \code{method}), listed according to the objective function to be optimized. Recall that the output of \code{multiROC()} is an object of class `\code{multiroc}', containing information about the estimation of the ROC curve and subsets for multivariate scenarios. Table 2 in the \href{https://cran.r-project.org/web/packages/movieROC/vignettes/movieROC_vignette.pdf}{vignette} includes the usage of this function. + +\textsc{Main syntax:} \par \hspace{3mm} +\code{multiROC(X, D, method = "lrm", } \par \hspace{1.8cm} +\verb"formula = 'D ~ X.1 + I(X.1^2) + X.2 + I(X.2^2) + I(X.1*X.2)'" \code{, ...)} + + +\begin{itemize} +\item[1.-] AUC: Different procedures to estimate the $\boldsymbol{h}(\cdot)$ maximizing the AUC in the multidimensional case have been studied in the literature. Among all families of functions, linear combinations ($\mathcal{L}_{\boldsymbol{\beta}}(\boldsymbol{X}) = \beta_1 X_1 + \dots + \beta_p X_p$) are widely used due to their simplicity; an extensive review of the existing methods was conducted by \citet{Kang2016}. +\item[] Computation: In the \code{multiROC()} function, fixing input parameters \code{method = "fixedLinear"} and \code{methodLinear} to one from \code{"SuLiu"} \citep{Su1993}, \code{"PepeThompson"} \citep{Pepe2000}, or \code{"minmax"} \citep{Liu2011}.
The R function also admits quadratic combinations when $p=2$, i.e., $\mathcal{Q}_{\boldsymbol{\beta}}(\boldsymbol{X}) = \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2 + \beta_4 X_1^2 + \beta_5 X_2^2$, by fixing \code{method = "fixedQuadratic"} for a particular \code{coefQuadratic} $= \boldsymbol{\beta} = (\beta_1, \dots, \beta_5)$. +\item[2.-] The risk score function $\mbox{logit} \left\{ \P(D = 1 \, | \, \boldsymbol{X}) \right\}$: Our package allows the user to fit a logistic regression model (\code{method = "lrm"}) considering any family of functions (linear, quadratic, with or without interactions, etc.) by means of the input parameter \code{formula.lrm}. A stepwise regression model is fitted if \code{stepModel = TRUE}. Details are explained in Section~\ref{section:efficientroc}. +\item[3.-] The sensitivity for a particular specificity: +\begin{itemize} +\item[a)] Considering the theoretical discussion about the search for the optimal transformation $\boldsymbol{h}(\cdot)$ pointed out in Section~\ref{section:efficientroc}, \citet{Camblor2021b} proposed to estimate it by multivariate kernel density estimation for the positive and negative groups separately. +\item[] Computation: The \code{multiROC()} function integrates the estimation procedures for the bandwidth matrix developed by \citet{Duong2007}, by fixing \code{method = "kernelOptimal"} and choosing a proper method to estimate the bandwidth (\code{"kernelOptimal.H"}). +\item[b)] To date, mainly linear combinations have been explored in the scientific literature \citep{Meisner2021,PerezFernandez2020}. For a fixed specificity $t\in[0,1]$, we seek the linear combination $\mathcal{L}_{\boldsymbol{\beta}(t)}(\boldsymbol{X}) = \beta_1(t) X_1 + \dots + \beta_p(t) X_p$ maximizing the true-positive rate by considering standard subsets for the transformed marker. +The coefficients $\boldsymbol{\beta}(t)$ are called `dynamic parameters' because they may be different for each $t \in [0,1]$.
+\item[] Computation: Since our objective is to display the ROC curve, $\mathcal{L}_{\boldsymbol{\beta}(t)}(\boldsymbol{X})$ is estimated for every $t$ in a grid of the unit interval, resulting in one $\boldsymbol{\hat{\beta}}(t)$ for each $t$. This approach is time-consuming, especially when it is based on the plug-in empirical estimators involved (\code{method = "dynamicEmpirical"}, only implemented for $p=2$), and may result in overfitting. Instead, the method by \citet{Meisner2021} is recommended (\code{method = "dynamicMeisner"}). +\end{itemize} +\end{itemize} + + +Once the classification subsets for a multivariate marker are constructed by the \code{multiROC()} function, several R methods may be used for the output object (see Table~\ref{table:funcmovieROC}). These include \code{print()}, to display relevant information, and \code{plot()}, to draw the resulting ROC curve. The main contribution of the package is to plot the construction of the ROC curve together with the classification subsets in a static figure for a particular FPR (\code{plot\_buildROC()} function), or in a video tracking the whole process (\code{movieROC()} function). + +Figure~\ref{fig:biroc} illustrates snapshots of videos resulting from the \code{movieROC()} function for two FPRs: $0.1$ and $0.55$. In particular, the classification accuracy of the bivariate marker \code{(cg20202438, cg18384097)} was studied by using four different approaches indicated in the captions, considering linear combinations (top) and nonlinear transformations (bottom). This figure was generated by the code below, integrating the \code{multiROC()} and \code{movieROC()} functions. Four videos are saved as GIF files with names \code{"PepeTh.gif"} (a), \code{"Meisner.gif"} (b), \code{"LRM.gif"} (c), and \code{"KernelDens.gif"} (d). + +\begin{figure}[h!]
+ \begin{subfigure}{0.49\textwidth} + \includegraphics[width=\linewidth,page=1]{figures/movieROC_g1g2_PT.pdf} + \includegraphics[width=\linewidth,page=2]{figures/movieROC_g1g2_PT.pdf} + \caption{Linear combinations with fixed parameters by \citet{Pepe2000}.} + \end{subfigure} \hspace{0.01\textwidth} + \begin{subfigure}{0.49\textwidth} + \includegraphics[width=\linewidth,page=1]{figures/movieROC_g1g2_Meis.pdf} + \includegraphics[width=\linewidth,page=2]{figures/movieROC_g1g2_Meis.pdf} + \caption{Linear combinations with dynamic parameters by \citet{Meisner2021}.} + \end{subfigure} + +\end{figure} + +\begin{figure}[h!]\ContinuedFloat + \begin{subfigure}{0.49\textwidth} + \includegraphics[width=\linewidth,page=1]{figures/movieROC_g1g2_lrm.pdf} + \includegraphics[width=\linewidth,page=2]{figures/movieROC_g1g2_lrm.pdf} + \caption{Logistic regression model with quadratic formula by default (see \code{formula.lrm} in Table 2 of the \href{https://cran.r-project.org/web/packages/movieROC/vignettes/movieROC_vignette.pdf}{vignette}). \hspace*{2cm}} + \end{subfigure} \hspace{0.01\textwidth} + \begin{subfigure}{0.49\textwidth} + \includegraphics[width=\linewidth,page=1]{figures/movieROC_g1g2_optimalT.pdf} + \includegraphics[width=\linewidth,page=2]{figures/movieROC_g1g2_optimalT.pdf} + \caption{Optimal transformation by multivariate kernel density estimation with \code{"Hbcv"} method by default \citep{Camblor2021b}.} + \end{subfigure} + + \vspace*{-1mm} + +\caption{Snapshots (from \code{movieROC()} videos) of the classification procedure and ROC curve for the bivariate marker \code{(cg20202438, cg18384097)} when false-positive rate equals 0.1 (top of each subfigure) and 0.55 (bottom of each subfigure). 
Four different methods for classification are displayed.}
+\label{fig:biroc}
+\end{figure}
+
+\begin{example}
+R> X <- HCC[ ,c("cg20202438", "cg18384097")]; D <- HCC$tumor
+R> biroc_12_PT <- multiROC(X, D, method = "fixedLinear", methodLinear = "PepeThompson")
+R> biroc_12_Meis <- multiROC(X, D, method = "dynamicMeisner", verbose = TRUE)
+R> biroc_12_lrm <- multiROC(X, D)
+R> biroc_12_kernel <- multiROC(X, D, method = "kernelOptimal")
+R> list_biroc <- list(PepeTh = biroc_12_PT, Meisner = biroc_12_Meis,
++ LRM = biroc_12_lrm, KernelDens = biroc_12_kernel)
+R> lapply(names(list_biroc), function(x) movieROC(list_biroc[[x]],
++ display.method = "OV", xlab = "Gene 20202438", ylab = "Gene 18384097",
++ cex = 1.2, alpha.points = 1, lwd.curve = 4, file = paste0(x, ".gif")))
+\end{example}
+
+
+\bigskip
+
+When the marker has a dimension higher than two, it is difficult to visualize the data and the classification regions. Therefore, the \code{movieROC()} function offers two options for showing the results, both on a two-dimensional space. On the one hand, to choose two of the components of the multivariate marker and project the classification subsets on the plane defined by them (Figure~\ref{fig:multiroccurve}, middle). On the other hand, to project the classification regions on the plane defined by the first two principal components (Figure~\ref{fig:multiroccurve}, left).
+The R function \code{prcomp()} from \pkg{stats} is used to perform Principal Components Analysis (PCA) \citep{Hotelling1933}.
+
+Figure~\ref{fig:multiroccurve} shows the difficulty in displaying the decision rules when $p>2$ (the 3 genes used throughout this manuscript), even with the two options implemented in our package.
It was generated using \code{multiROC()} and \code{plot\_buildROC()}:
+
+\begin{example}
+R> multiroc_PT <- multiROC(X = HCC[ ,c("cg20202438", "cg18384097", "cg03515901")],
++ D = HCC$tumor, method = "fixedLinear", methodLinear = "PepeThompson")
+R> multiroc_PT
+\end{example}
+
+\begin{example}
+Data was encoded with nontumor (controls) and tumor (cases).
+There are 62 controls and 62 cases.
+A total of 3 variables have been considered.
+A linear combination with fixed parameters estimated by PepeThompson approach has
+ been considered.
+The specificity and sensitivity reported by the Youden index are 0.855 and 0.742,
+ respectively, corresponding to the cut-off point -0.0755 for the transformation
+ h(X) = 0.81*cg20202438 - 0.1*cg18384097 - 1*cg03515901.
+The area under the ROC curve (AUC) is 0.811.
+\end{example}
+
+\begin{example}
+R> plot_buildROC(multiroc_PT, cex = 1.2, lwd.curve = 4)
+R> plot_buildROC(multiroc_PT, display.method = "OV", displayOV = c(1,3), cex = 1.2,
++ xlab = "Gene 20202438", ylab = "Gene 03515901", lwd.curve = 4)
+\end{example}
+\begin{figure}[h!]
+ \centering
+ \includegraphics[height=4.6cm,page=1,trim={0 4mm 12.5cm 0},clip]{figures/buildROC_multi3.pdf}
+ \includegraphics[height=4.6cm,page=2,trim={0 4mm 0 0},clip]{figures/buildROC_multi3.pdf}
+ \caption{Multivariate ROC curve estimation for the simultaneous diagnostic accuracy of genes 20202438, 18384097 and 03515901. The \citet{Pepe2000} approach was used to estimate the linear coefficients; the classification rules (yellow and gray borders for the positive and negative class, respectively) for an FPR of 0.15 are displayed.
Left, projected over the first two principal components from PCA; middle, over the first and third selected genes.}
+ \label{fig:multiroccurve}
+\end{figure}
+
+
+
+
+
+
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+
+\section{Generalized ROC curve} \label{section:groc}
+
+There are scenarios in which the standard ROC curve is not concave (first two genes in Figure~\ref{fig:roccurves}, gray solid line), reflecting that the standard initial assumption of a monotone relationship between the marker and the response is misleading. In Figure~\ref{fig:densities}, we may see that the difference in the distribution of gene 20202438 between the tissues that have the characteristic and those that do not is mainly in dispersion. To accommodate this common type of scenario, \citet{Camblor2017a} extended the ROC curve definition to the case where both extremes for marker values are associated with a higher risk of having the characteristic of interest, by considering the \dfn{both-sided family of eligible classification subsets}:
+$$\mathcal{I}_g(t) = \Big\{ s_t = (-\infty,x_t^L] \cup (x_t^U, \infty) \mbox{ : } x_t^L \leq x_t^U \in \mathcal{S}(X) \mbox{ , } \P(\chi \in s_t) = t \Big\}.$$
+It becomes crucial to consider the supremum in the definition of the generalized ROC curve because the decision rule for each $t \in [0,1]$ is not univocally defined: there exist infinitely many pairs $x_t^L \leq x_t^U$ reporting a specificity $1-t$ (i.e. $\mathcal{I}_g(t)$ is uncountably infinite).
+Computationally, this optimization process results in a time-consuming estimation, depending on the number of different marker values in the sample.
+
+After the introduction of this extension, several studies followed up regarding the estimation of the gROC curve \citep{Camblor2017a, Camblor2019b} and related measures such as its area (gAUC) \citep{Camblor2021a} and the Youden index \citep{Camblor2019c, Bantis2021a}.
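+
+For intuition only (this is \textit{not} the algorithm implemented in the package), the following brute-force sketch on simulated data approximates one point of the empirical gROC curve: for a fixed FPR $t$, it searches for the pair $x_t^L \leq x_t^U$ whose both-sided subset keeps the empirical FPR below $t$ while maximizing the empirical sensitivity. All object names are arbitrary.
+
+\begin{example}
+## Illustration only: brute-force search of (-Inf, xL] U (xU, Inf)
+## keeping the empirical FPR <= t and maximizing the empirical sensitivity
+set.seed(1)
+chi <- rnorm(100, sd = 1)   # negative group
+xi  <- rnorm(100, sd = 2)   # positive group (difference in dispersion)
+t <- 0.2; cand <- sort(c(chi, xi)); best <- c(sens = 0, xL = -Inf, xU = Inf)
+for (xL in cand) for (xU in cand[cand >= xL]) {
+  if (mean(chi <= xL | chi > xU) <= t) {
+    sens <- mean(xi <= xL | xi > xU)
+    if (sens > best["sens"]) best <- c(sens = sens, xL = xL, xU = xU)
+  }
+}
+best
+\end{example}
+
+Even for this small sample, the quadratic number of candidate pairs hints at the computational burden mentioned above; the package relies on a more efficient search.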
+By considering this generalization, another property of the classification subsets may be lost: the regions may not be self-contained as the false-positive rate increases. It may happen that a subject is classified as a positive for a particular FPR $t_1$, but as a negative for a higher FPR $t_2$. Therefore, it is natural to establish a restriction \textit{(C)} on the classification subsets, ensuring that any subject classified as a positive for a fixed specificity (or sensitivity) will also be classified as a positive for any classification subset with lower specificity (higher sensitivity). \citet{PerezFernandez2020} proposed an algorithm to estimate the gROC curve under restriction \textit{(C)}, included in the \code{gROC()} function of the presented R package. See the final section for computational details about the implementation of this algorithm.
+
+\textsc{Main syntax:} \par \hspace{3mm}
+\code{gROC(X, D, side = "both", ...)} \hspace{8mm} \code{gROC\_param(X, D, side = "both", ...)}
+
+\noindent Table 1 in the \href{https://cran.r-project.org/web/packages/movieROC/vignettes/movieROC_vignette.pdf}{vignette} collects the input and output parameters of the \code{gROC()} function, which estimates the gROC curve, both in the mentioned direction (\code{side = "both"}) and in the opposite one, i.e. when classification subsets of the form $s_t=(x_t^L, x_t^U]$ are considered (\code{side = "both2"}). In addition, all the particular methods for a `\code{groc}' object collected in Table~\ref{table:funcmovieROC} above may be used in this general scenario.
+
+
+
+In the following, the diagnostic accuracy of the gene 20202438 expression intensity is evaluated by the gROC curve without restrictions (\code{groc\_selg1} object) and under restriction \textit{(C)} (\code{groc\_selg1\_C} object). The classification subsets and the sensitivity for a specificity of $0.9$ are displayed with the \code{predict()} function.
+
+\begin{example}
+R> groc_selg1 <- gROC(X = HCC$cg20202438, D = HCC$tumor, side = "both")
+R> predict(groc_selg1, FPR = .1)
+\end{example}
+
+\begin{example}
+$ClassSubsets
+          [,1]      [,2]
+[1,]      -Inf 0.7180623
+[2,] 0.8296072       Inf
+
+$Specificity
+[1] 0.9032258
+
+$Sensitivity
+[1] 0.4032258
+\end{example}
+
+\begin{example}
+R> groc_selg1_C <- gROC(X = HCC$cg20202438, D = HCC$tumor, side = "both",
++ restric = TRUE, optim = TRUE)
+\end{example}
+
+All the classification regions underlying the standard and the generalized ROC curves without and with restrictions are represented in Figure~\ref{fig:regionsgroc}. The following code was used to generate the figure, illustrating the usage and output of the \code{plot\_regions()} function. Besides displaying all the classification regions underlying every specificity (in gray), the one chosen by the user (FPR = 0.15 by default) is highlighted in blue. Note that the ROC curves are rotated $90^\circ$ to the right, in order to use the vertical axis for FPR in both plots.
+
+\begin{example}
+R> plot_regions(roc_selg1, cex.legend = 1.5, plot.auc = TRUE,
++ main = "Standard right-sided assumption [Classification subsets]")
+R> plot_regions(groc_selg1, plot.auc = TRUE, legend = F, main.plotroc = "gROC curve",
++ main = "General approach [Classification subsets]")
+R> plot_regions(groc_selg1_C, plot.auc = TRUE, legend = F, main.plotroc = "gROC curve",
++ main = "General approach with restriction (C) [Classific. subsets]",
++ xlab = "Gene 20202438 expression intensity")
+\end{example}
+
+\begin{figure}[h!]
+\centering
+ \includegraphics[width=.91\linewidth,page=1,trim={5mm 1cm 1.2cm 3mm},clip]{figures/plotregions_gene20202438.pdf}
+ \includegraphics[width=.91\linewidth,page=2,trim={5mm 1cm 1.2cm 0},clip]{figures/plotregions_gene20202438.pdf}
+ \includegraphics[width=.91\linewidth,page=3,trim={5mm 2mm 1.2cm 0},clip]{figures/plotregions_gene20202438.pdf}
+ \caption{Classification regions and the ROC curve ($90^\circ$ rotated) for the evaluation of gene 20202438 expression intensity assuming i) the standard scenario (top), ii) the generalized scenario without restrictions (middle), iii) the generalized scenario under restriction \textit{(C)} over the subsets (bottom).}
+ \label{fig:regionsgroc}
+\end{figure}
+
+The gain achieved by considering the generalized scenario for this marker, which better fits its distribution in each group, is clear. The standard estimated AUC is 0.547, while the gAUC increases to 0.765. The gAUC is not especially affected by imposing the restriction \textit{(C)}, resulting in 0.762.
+
+
+
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+
+\section{Efficient ROC curve: pursuing an optimal transformation} \label{section:efficientroc}
+
+By keeping classification subsets of the form $s_t = (c_t, \infty)$, an alternative approach can be explored: transforming the univariate marker through a suitable function $h: \mathbb{R} \longrightarrow \mathbb{R}$ to enhance its accuracy.
+Henceforth, the transformation $h^*(\cdot)$ reporting the dominant ROC curve compared to the one from any other function (i.e. $\mathcal{R}_{h^*}(\cdot) \geq \mathcal{R}_h(\cdot)$) will be referred to as the \dfn{optimal transformation} (in the ROC sense), and the resulting ROC curve is called the eROC curve \citep{Kauppi2016}. Following the well-known Neyman–Pearson lemma, \citet{McIntosh2002} proved that $h^*(\cdot)$ is the likelihood ratio.
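+
+A minimal sketch of this idea (illustration only, using two known normal densities rather than the estimators implemented in the package): when the two groups differ mainly in dispersion, the transformation $f_\xi(\cdot)/\big(f_\xi(\cdot)+f_\chi(\cdot)\big)$, a monotone increasing transformation of the likelihood ratio, is U-shaped rather than monotone, so the induced classification subsets are not one-sided intervals.
+
+\begin{example}
+## Illustration only: h*(x) = f_xi(x) / (f_xi(x) + f_chi(x))
+x <- seq(-6, 6, length.out = 201)
+f_chi <- dnorm(x, sd = 1)       # negative group density
+f_xi  <- dnorm(x, sd = 2)       # positive group density
+h_star <- f_xi / (f_xi + f_chi)
+plot(x, h_star, type = "l")     # U-shaped, not monotone
+\end{example}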
+
+We enumerate the methods included in the \code{hROC()} function of the proposed R tool (selected with the input parameter \code{type}), listed according to the procedure used to estimate $h^*(\cdot)$. The output of this function is an object of class `\code{hroc}'. See Table 3 in the \href{https://cran.r-project.org/web/packages/movieROC/vignettes/movieROC_vignette.pdf}{vignette} for function usage and output details.
+
+\textsc{Main syntax:} \hspace{3mm}
+\code{hROC(X, D, type = "lrm", formula.lrm = } \verb"'D ~ pol(X,3)'" \code{, ...)}
+
+\begin{itemize}
+\item[1.-] \citet{Camblor2019a} exploited the result proved by \citet{McIntosh2002}, suggesting the estimation of the logit of the risk function by logistic regression, since it is a monotone increasing transformation of the likelihood ratio.
+\item[] Computation: With the proposed R tool, the user can define any transformation $h(\cdot)$ for the right-hand side of the logistic regression model to be fitted, $\mbox{logit} \big\{ \P(D = 1 \, | \, x) \big\} = h(x)$. In particular, by fixing the input parameter \code{type = "lrm"} and defining the function $h(\cdot)$ through \code{formula.lrm}.
+
+\item[2.-] Arguing as in \citet{Camblor2021b}, but for univariate instead of multivariate markers, the optimal transformation in the ROC sense is equivalent to $h^*(\cdot)=f_\xi(\cdot)/\big(f_\xi(\cdot) + f_\chi(\cdot)\big)$, where $f_\xi(\cdot)$ and $f_\chi(\cdot)$ denote the density functions of the positive and negative groups, respectively.
+In order to estimate $h^*(\cdot)$, the density functions may be estimated separately by different procedures, such as the kernel density estimator.
+\item[] Computation: In the \code{hROC()} function, the user may fix \code{type = "kernel"} and choose a proper bandwidth for the kernel estimation via \code{kernel.h}.
+
+\item[3.-] \citet{Camblor2019a} also included the estimation of the \dfn{overfitting function}, $h_{of}(\cdot)$, defined as the optimal one when no restrictions on the shape of $h^*(\cdot)$ are imposed.
It takes the value 1 for the positive marker values and 0 for the negative ones, reporting an estimated AUC of 1, but totally depending on the available sample (the resulting rules cannot be extrapolated).
+\item[] Computation: $h_{of}(\cdot)$ may be estimated by fixing the input parameter \code{type = "overfitting"}.
+\end{itemize}
+
+
+
+The following code and figures study the capacity to improve the classification performance of the gene 18384097 expression intensity via the above functional transformations, and its impact on the final decision rules. The first approach considers an ordinary cubic polynomial formula (\code{hroc\_cubic\_selg2}) and linear tail-restricted cubic splines (\code{hroc\_rcs\_selg2}) for the right-hand side of the logistic regression model. The second one uses two different bandwidths ($h=1$ and $h=3$) for the density function estimation. For comparative purposes, the last one estimates the gROC curve under restriction \textit{(C)}.
+
+\begin{example}
+R> X <- HCC$cg18384097; D <- HCC$tumor
+\end{example}
+\begin{example}
+R> hroc_cubic_selg2 <- hROC(X, D); hroc_cubic_selg2
+\end{example}
+\vspace*{-3mm}
+
+\begin{example}
+Data was encoded with nontumor (controls) and tumor (cases).
+There are 62 controls and 62 cases.
+A logistic regression model of the form D ~ pol(X,3) has been performed.
+The estimated parameters of the model are the following:
+ Intercept X X^2 X^3
+ "1.551" "32.054" "-120.713" "100.449"
+The specificity and sensitivity reported by the Youden index are 0.935 and 0.532,
+ respectively, corresponding to the following classification subset:
+ (-Inf, 0.442) U (0.78, Inf).
+The area under the ROC curve (AUC) is 0.759.
+\end{example}
+
+\begin{example}
+R> hroc_rcs_selg2 <- hROC(X, D, formula.lrm = "D ~ rcs(X,8)")
+R> hroc_lkr1_selg2 <- hROC(X, D, type = "kernel")
+R> hroc_lkr3_selg2 <- hROC(X, D, type = "kernel", kernel.h = 3)
+R> hroc_overfit_selg2 <- hROC(X, D, type = "overfitting")
+
+R> groc_selg2_C <- gROC(X, D, side = "both", restric = TRUE, optim = TRUE)
+\end{example}
+
+The following code snippet compares the AUC achieved by each approach considered above:
+\begin{example}
+R> list_hroc <- list(Cubic = hroc_cubic_selg2, Splines = hroc_rcs_selg2,
++ Overfit = hroc_overfit_selg2, LikRatioEst_h3 = hroc_lkr3_selg2,
++ LikRatioEst_h1 = hroc_lkr1_selg2, gAUC_restC = groc_selg2_C)
+\end{example}
+
+\begin{example}
+R> AUCs <- sapply(list_hroc, function(x) x$auc)
+R> round(AUCs, 3)
+\end{example}
+\vspace*{-3mm}
+
+\begin{example}
+Cubic Splines Overfit LikRatioEst_h3 LikRatioEst_h1 gAUC_restC
+0.759 0.807 1.000 0.781 0.799 0.836
+\end{example}
+
+
+The shape of the classification regions over the original space $\mathcal{S}(X)$ depends on the monotonicity of $h^*(\cdot)$, which may be graphically studied by the \code{plot\_funregions()} function (see Figure~\ref{fig:transformations}). These regions can be visualized by the R function \code{plot\_regions()} (see Figure~\ref{fig:regionshroc}). Both are explained in Table~\ref{table:funcmovieROC} and illustrated below. The next chunk of code produced Figure~\ref{fig:transformations}, representing the different functional transformations estimated previously:
+\begin{example}
+R> lapply(list_hroc, function(x) plot_funregions(x, FPR = .15, FPR2 = .5))
+\end{example}
+
+\vspace*{-3mm}
+
+\begin{figure}[H]
+\centering
+\includegraphics[width=.85\textwidth]{figures/plotfunregions_hroc_lrm_gene18384097.pdf}
+ \caption{Different functional transformations and resulting classification subsets for gene 18384097. Rules for FPR 0.15 (blue) and 0.50 (red) are highlighted.
Top, from left to right: cubic polynomial function, restricted cubic splines (with 8 knots), and overfitted transformation. Bottom: likelihood ratio estimation with bandwidths 3 (left) and 1 (middle), and the transformation resulting in the gROC curve under restriction \textit{(C)}.}
+ \label{fig:transformations}
+\end{figure}
+
+
+
+Finally, using the \code{plot\_regions()} function, Figure~\ref{fig:regionshroc} shows the resulting classification subsets over the original space for the best two of the six methods above. The first method (fitting a logistic regression model with restricted cubic splines with 8 knots) reports an AUC of 0.804 (compared to 0.684 by the standard ROC curve), but the shape of some classification rules is complex, such as $s_t=(-\infty,a_t] \cup (b_t,c_t] \cup (d_t,\infty)$. This area increases to 0.836 when considering subsets of the form $s_t=(-\infty,x_t^L] \cup (x_t^U,\infty)$, even when imposing restriction \textit{(C)} to obtain a functional transformation $h(\cdot)$.
+
+
+\begin{figure}[H]
+\centering
+ \includegraphics[width=.91\linewidth,page=1,trim={5mm 1cm 1.2cm 4mm},clip]{figures/plotregions_gene18384097.pdf}
+ \includegraphics[width=.91\linewidth,page=2,trim={5mm 3mm 1.2cm 0},clip]{figures/plotregions_gene18384097.pdf}
+ \caption{Classification regions and the resulting ROC curve ($90^\circ$ rotated) for gene 18384097. Top, ROC curve for the restricted cubic splines transformation with 8 knots; bottom, gROC curve under restriction \textit{(C)} for the original marker.}
+ \label{fig:regionshroc}
+\end{figure}
+
+
+
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+
+\section{Summary and conclusion} \label{section:conclusion}
+
+Conducting binary classification using continuous markers requires the establishment of decision rules.
In the standard case, each specificity $1-t \in [0,1]$ entails a univocally defined classification subset of the form $s_t = (c_t,\infty)$. However, in more complex situations, such as when there is a non-monotone relationship between the marker and the response, or in multivariate scenarios, the appropriate subsets are no longer clear. Visualization of the decision rules becomes crucial in these cases. To address this, the \CRANpkg{movieROC} package incorporates novel visualization tools complementing the ROC curve representation.
+
+\noindent This R package offers a user-friendly and easily comprehensible software solution tailored for practical researchers. It implements statistical techniques to estimate, compare, and graphically represent different classification procedures. While several R packages address ROC curve estimation, the proposed one emphasizes the classification process, tracking the decision rules underlying the studied binary classification problem. This tool incorporates different considerations and transformations which may be useful to capture the potential of the marker to classify in non-standard scenarios. Nevertheless, this library is also useful in standard cases, as well as when the marker itself comes from a classification or regression method (such as support vector machines), because it provides informative visuals and additional information not usually reported with the ROC curve.
+
+\noindent The main function of the package, \code{movieROC()}, allows the user to monitor how the resulting classification subsets change across different specificities, thereby building the corresponding ROC curve. Notably, it introduces time as a third dimension to track those specificities, generating informative videos. For interested readers or potential users of \CRANpkg{movieROC}, the \href{https://cran.r-project.org/web/packages/movieROC/movieROC.pdf}{manual} available on CRAN provides complete information about the implemented functions and their parameters.
In addition, a \href{https://cran.r-project.org/web/packages/movieROC/vignettes/movieROC_vignette.pdf}{vignette} is accessible, including mathematical formalism and details about the algorithms implemented. + + +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% + +\section*{Computational considerations} \label{sec:complim} + +\paragraph{Dependencies} Some functions of our package depend on other libraries available on CRAN: +\begin{itemize} +\item \code{gROC(X, D, side = "both", restric = TRUE, optim = TRUE, ...)} uses the \code{allShortestPaths()} function in the \CRANpkg{e1071} package \citep{e1071CRAN2023}. +\item \code{hROC(X, D, type = "lrm", ...)} and \code{multiROC(X, D, method = "lrm", ...)} use the \code{lrm()} function in the \CRANpkg{rms} package \citep{rmsCRAN2023}. +\item \code{multiROC(X, D, method = "kernelOptimal", ...)} uses the \code{kde()} function in the \CRANpkg{ks} package \citep{ksCRAN2023}. +\item \code{multiROC(X, D, method = "dynamicMeisner", ...)} uses the \code{maxTPR()} function in the \pkg{maxTPR} package \citep{Meisner2021}. This package was removed from the CRAN repository, so we integrated the code of the \code{maxTPR()} function into our package. This function uses \code{Rsolnp::solnp()} and \code{robustbase::BYlogreg()}. +\item \code{multiROC(X, D, method = "fixedLinear", methodLinear, ...)} uses the R functions included in \citet{Kang2016} (Appendix). We integrated this code into our package. +\item \code{movieROC(obj, save = TRUE, ...)} uses the \code{saveGIF()} function in the \CRANpkg{animation} package \citep{animationCRAN2021}. 
+\end{itemize}
+
+
+\paragraph{Limitations}
+
+Users should be aware of certain limitations while working with this package:
+\begin{itemize}
+\item Some methods are potentially time-consuming, especially with medium to large sample sizes:
+
+\item[] The estimation of the gROC curve under restriction \textit{(C)} can be computationally intensive, especially when considering different FPRs to locally optimize the search using \code{gROC(X, D, side = "both", restric = TRUE, optim = TRUE, t0max = TRUE)}. Note that this method involves a fairly exhaustive search of the self-contained classification subsets leading to the optimal gROC curve estimate. However, even selecting different false-positive rates $t_0$ to start from, it may not result in the optimal achievable estimate under restriction \textit{(C)}. The input parameters \code{restric}, \code{optim}, \code{t0} and \code{t0max} of the \code{gROC()} function, included in Table 1 of the \href{https://cran.r-project.org/web/packages/movieROC/vignettes/movieROC_vignette.pdf}{vignette}, serve to control this search.
+
+\item[] The same occurs for multivariate markers when considering linear frontiers with dynamic parameters (by using \code{multiROC(X, D, method = "dynamicMeisner" | "dynamicEmpirical")}).
+\item Most implemented R functions consider empirical estimation for the resulting ROC curve, even if the procedure to estimate the decision rules is semi-parametric. An exception is the \code{gROC\_param()} function, which accommodates the binormal scenario.
+\item When visualizing classification regions for multivariate markers with high dimension \linebreak (\code{plot\_buildROC()} and \code{movieROC()} functions for a `\code{multiroc}' object), our package provides some alternatives, but additional improvements could provide further aid in interpretation.
+\end{itemize} + + +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% + +\section*{Acknowledgements} + +The authors acknowledge support by the Grants PID2019-104486GB-I00 and PID2020-118101GB-I00 from Ministerio de Ciencia e Innovación (Spanish Government), and by a financial Grant for Excellence Mobility for lecturers and researchers subsidized by the University of Oviedo in collaboration with Banco Santander. + + + +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% + +\bibliography{PerFer-MarCam_movieROC} + +\address{Sonia P\' erez Fern\'andez\\ + Department of Statistics and Operations Research and Mathematics Didactics\\ + University of Oviedo, Asturias, Spain\\ + (ORCiD: \href{https://orcid.org/0000-0002-2767-6399}{0000-0002-2767-6399})\\ + \email{perezsonia@uniovi.es}} + +\address{Pablo Mart\'inez Camblor\\ + Department of Anesthesiology\\ + Geisel School of Medicine at Dartmouth, New Hampshire, USA\\ + and\\ + Faculty of Health Sciences\\ + Universidad Autónoma de Chile, Chile\\ + (ORCiD: \href{https://orcid.org/0000-0001-7845-3905}{0000-0001-7845-3905})\\ + \email{Pablo.Martinez-Camblor@hitchcock.org}} + +\address{Norberto Corral Blanco\\ + Department of Statistics and Operations Research and Mathematics Didactics\\ + University of Oviedo, Asturias, Spain\\ + (ORCiD: \href{https://orcid.org/0000-0002-6962-6154}{0000-0002-6962-6154})\\ + \email{norbert@uniovi.es}} diff --git a/_articles/RJ-2025-035/RJ-2025-035.Rmd b/_articles/RJ-2025-035/RJ-2025-035.Rmd new file mode 100644 index 0000000000..a5b5693e6e --- /dev/null +++ b/_articles/RJ-2025-035/RJ-2025-035.Rmd @@ -0,0 +1,1227 @@ +--- +title: 'movieROC: Visualizing the Decision Rules Underlying Binary Classification' +abstract: | + The receiver operating characteristic (ROC) curve is a graphical tool + commonly used to depict the binary classification accuracy of a + continuous marker in terms of its sensitivity and specificity. 
The + standard ROC curve assumes a monotone relationship between the marker + and the response, inducing classification subsets of the form + $(c,\infty)$ with $c \in \mathbb{R}$. However, in non-standard cases, + the involved classification regions are not so clear, highlighting the + importance of tracking the decision rules. This paper introduces the R + package movieROC, + which provides visualization tools for understanding the ability of + markers to identify a characteristic of interest, complementing the + ROC curve representation. This tool accommodates multivariate + scenarios and generalizations involving different decision rules. The + main contribution of this package is the visualization of the + underlying classification regions, with the associated gain in + interpretability. Adding the time (videos) as a third dimension, this + package facilitates the visualization of binary classification in + multivariate problems. It constitutes a good tool to generate + graphical material for presentations. 
+author: +- name: Sonia Pérez-Fernández + affiliation: Department of Statistics and Operations Research and Mathematics Didactics, + University of Oviedo (Asturias, Spain) + orcid: 0000-0002-2767-6399 + email: perezsonia@uniovi.es +- name: Pablo Martínez-Camblor + affiliation: Department of Anesthesiology, Geisel School of Medicine at Dartmouth + (NH, USA); Faculty of Health Sciences, Universidad Autónoma de Chile (Chile) + orcid: 0000-0001-7845-3905 + email: Pablo.Martinez-Camblor@hitchcock.org +- name: Norberto Corral-Blanco + affiliation: Department of Statistics and Operations Research and Mathematics Didactics, + University of Oviedo (Asturias, Spain) + orcid: 0000-0002-6962-6154 + email: norbert@uniovi.es +date: '2026-02-18' +date_received: '2024-02-09' +journal: + firstpage: 59 + lastpage: 79 +volume: 17 +issue: 4 +slug: RJ-2025-035 +packages: + cran: + - movieROC + - pROC + - ROCnReg + - OptimalCutpoints + - nsROC + - rms + - ks + - e1071 + - animation + bioc: [] +preview: preview.png +bibliography: PerFer-MarCam_movieROC.bib +CTV: ~ +legacy_pdf: yes +legacy_converted: yes +output: + rjtools::rjournal_web_article: + self_contained: yes + toc: no + mathjax: https://cdn.jsdelivr.net/npm/mathjax@4/tex-mml-chtml.js + md_extension: -tex_math_single_backslash +draft: no + +--- + + +::: article +# Introduction + +The use of data to detect a characteristic of interest is a cornerstone +of many disciplines such as medicine (to diagnose a pathology or to +predict a patient outcome), finance (to detect fraud) or machine +learning (to evaluate a classification algorithm), among others. +Continuous markers are surrogate measures for the characteristic under +study, or predictors of a potential subsequent event. They are measured +in subjects, some of whom have the characteristic (_positive_), and some +without it (_negative_). In addition to reliability and feasibility, a +good marker must have two relevant properties: interpretability and +accuracy [@Mayeux2004]. 
High binary classification _accuracy_ can be
achieved if there exists a strong relationship between the marker and
the _response_. The latter is assessed by a _gold standard_ for the presence
or absence of the characteristic of interest. _Interpretability_ refers to
the _decision rules_ or _subsets_ considered in the classification process.
This piece of research seeks to elucidate both desirable properties for
a marker through the implementation of a graphical tool in the R language.
We propose a novel approach involving the generation of videos as a
solution to effectively capture the classification procedure for
univariate and multivariate markers. Graphical analysis plays a pivotal
role in data exploration, interpretation, and communication. Its
burgeoning potential is underscored by the fast pace of technological
advances, which empower the creation of insightful graphical
representations.

A usual practice when the binary classification accuracy of a marker is
of interest involves the representation of the _Receiver Operating
Characteristic (ROC) curve_, summarized by the _Area Under the Curve_ (_AUC_)
[@Hanley1982]. The resulting plot reflects the trade-off between the
sensitivity and the complement of the specificity. _Sensitivity_ and
_specificity_ are the probabilities of correctly classifying positive and
negative subjects, respectively. Mathematically, let $\xi$ and $\chi$
be the random variables modeling the marker values in the positive and
the negative population, respectively, with $F_\xi(\cdot)$ and
$F_\chi(\cdot)$ their associated cumulative distribution functions.
Assuming that the expected value of the marker is larger in the positive
than in the negative population, the standard ROC curve is based on
_classification subsets_ of the form $s = (c, \infty)$, where $c$ is the
so-called _cut-off value or threshold_ in the support of the marker $X$,
$\mathcal{S}(X)$.
A subject is classified as positive if its marker
value lies within this region, and as negative otherwise. This type of
subset has two important advantages: first, its interpretability is
clear; second, for each specificity $1-t \in [0,1]$, the corresponding
$s_t = (c_t, \infty)$ is univocally defined by $c_t = F_\chi^{-1}(1-t)$
for absolutely continuous markers.

When differences in marker distribution between the negative and the
positive population are in location only, not in shape, then
$F_\xi(\cdot) < F_\chi(\cdot)$, and the classification is direct by
using these decision rules. However, when this is not the case, the
standard ROC curve may cross the main diagonal, resulting in an
_improper_ curve [@Dorfman1997]. This may be due to three different
scenarios:

i. the behavior of the marker in the two studied populations is
   different, but it is not possible to determine the decision rules.
   Notice that the binary classification problem goes further than the
   distinction between the two populations: the classification subsets
   should be highly likely in one population and highly unlikely in the
   other [@Camblor2018a];

ii. there exists a relationship between the marker and the response with
    a potential classification use, but it is not monotone;

iii. there is no relationship between the marker and the response at all
     (main-diagonal ROC curve).

In the second case, we have to define classification subsets different
from the standard $s_t=(c_t,\infty)$. Therefore, the use of the marker
becomes more complex. With the aim of accommodating scenarios where both
higher and lower values of the marker are associated with a higher risk
of having the characteristic, @Camblor2017a proposed the so-called
_generalized ROC (gROC) curve_.
This curve tracks the highest sensitivity
for every specificity in the unit interval resulting from subsets of the
form $s_t=(-\infty, x_t^L] \cup (x_t^U, \infty)$ with
$x_t^L \leq x_t^U \in \mathcal{S}(X)$.

Although final decisions are based on the underlying classification
subsets, these are typically not depicted. This omission is not a
shortcoming in standard cases since, for each specificity
$1-t \in [0,1]$, there is only one rule of the form $s_t = (c_t, \infty)$
with such specificity. In particular, $s_t$ is univocally defined by
$c_t = F_\chi^{-1}(1-t)$; and the same applies if we fix a
sensitivity. Nevertheless, if the gROC curve is taken, there are
infinitely many subsets of the form $s_t=(-\infty, x_t^L] \cup (x_t^U, \infty)$
resulting in $\mathbb{P}(\chi \in s_t) = t$. This loss of univocity underlines
the importance of reporting (numerically and/or graphically) the
decision rules actually proposed for classifying. This gap is covered by
the presented package.

An alternative approach to assessing the classification performance of a
marker involves considering a transformation of it. This transformation
$h(\cdot)$ aims to capture differences in distribution between the two
populations in the ROC sense. Once $h(\cdot)$ is identified, the
standard ROC curve for $h(X)$ is represented, resulting in the _efficient
ROC (eROC) curve_ [@Kauppi2016]. Arguing as before, for a fixed
specificity, the classification subsets $s_t=(c_t, \infty)$ in the
transformed space are univocally defined, where a subject is classified
as positive if $h(x) \in s_t$ and negative otherwise (with $x$
representing its marker value). However, they may have any shape in the
original space, depending on the monotonicity of the functional
transformation $h(\cdot)$ [@Camblor2019a].
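
To make the eROC idea concrete, the following toy sketch (simulated
data and a hand-picked transformation; both are assumptions of this
example, not package code) shows a marker whose standard ROC curve is
uninformative while a simple $h(\cdot)$ recovers the discrimination:

``` r
## Toy eROC sketch: the positive population lies in both tails, so the
## standard ROC curve is improper, but h(x) = |x| separates the groups.
set.seed(1)
chi <- rnorm(200)                        # negative population
xi  <- c(rnorm(100, -3), rnorm(100, 3))  # positive population, both tails
## Empirical AUC via the Mann-Whitney statistic (ties handled as 1/2)
emp_auc <- function(pos, neg)
  mean(outer(pos, neg, ">")) + 0.5 * mean(outer(pos, neg, "=="))
emp_auc(xi, chi)         # close to 0.5: standard ROC curve is improper
h <- function(x) abs(x)  # hand-picked transformation for this example
emp_auc(h(xi), h(chi))   # close to 1: eROC in the transformed space
```

Since this $h(\cdot)$ is non-monotone, the standard subset
$(c_t, \infty)$ in the transformed space maps back to a two-sided region
in the original space, illustrating the interpretability trade-off
discussed above.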
Tracking the decision rules underlying the eROC curve is therefore
important: this monitoring enables an assessment of whether the improved
accuracy of the marker justifies the potential loss of interpretability.

The ROC curve is defined for the evaluation of the classification
accuracy of univariate markers. To deal with multivariate markers, the
usual practice is to consider a transformation $\boldsymbol{h}(\cdot)$
that reduces the marker to a univariate one, and then to construct the
standard ROC curve. The same considerations as before apply when a
functional transformation is taken. In the proposed R library, we
consider methods from the literature to define and estimate
$\boldsymbol{h}(\cdot)$ in the multivariate case
[@Kang2016; @Meisner2021].

Focusing on the classification subsets underlying the decision rules,
the [**movieROC**](https://CRAN.R-project.org/package=movieROC) package
incorporates methods to visualize the construction process of ROC curves
by presenting the classification accuracy of these subsets. For
univariate markers, the library includes both the classical (standard
ROC curve) and the generalized (gROC curve) approach. Besides, it enables
the display of decision rules for various transformations of the marker,
seeking to maximize performance and allowing for flexibility in the final
shape of the subsets (eROC curve). For multidimensional markers, the
proposed tool visualizes the evolution of decision subsets when
different objective functions are employed for optimization, even
imposing restrictions on the underlying regions. In this case,
displaying the decision rules associated with every specificity in a
single static image is no longer feasible. Therefore, _dynamic
representations_ (videos) are implemented, drawing on time as an extra
dimension to capture the variation in specificity.
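
The dynamic-representation idea can be sketched in a few lines of plain
R; this is a simplified stand-in for what `movieROC()` automates
(simulated data; the **animation** package, listed among the
dependencies, and an ImageMagick installation are assumed):

``` r
## Sketch of a dynamic representation: one frame per false-positive rate,
## marking the current point as the ROC curve is traversed.
library(animation)
set.seed(1)
chi <- rnorm(100)          # negative population
xi  <- rnorm(100, 1.5)     # positive population
fpr_grid <- seq(0.01, 0.99, length.out = 50)
tpr_grid <- sapply(fpr_grid, function(u) mean(xi > quantile(chi, 1 - u)))
saveGIF({
  for (i in seq_along(fpr_grid)) {
    plot(fpr_grid, tpr_grid, type = "l", xlab = "FPR", ylab = "TPR",
         main = sprintf("FPR = %.2f", fpr_grid[i]))
    points(fpr_grid[i], tpr_grid[i], pch = 19, col = "blue")
    abline(v = fpr_grid[i], lty = 2, col = "gray")  # current specificity
  }
}, movie.name = "roc_construction.gif", interval = 0.1)
```

In the package, each frame additionally draws the classification region
attaining that specificity, which is the part a static ROC plot cannot
convey.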

A wealth of R software covering diverse ROC-related topics could be
discussed here: the
[**pROC**](https://CRAN.R-project.org/package=pROC) package is a main
reference, including tools to visualize, estimate and compare ROC curves
[@pROC2011]; [**ROCnReg**](https://CRAN.R-project.org/package=ROCnReg)
explicitly considers covariate information to estimate the
covariate-specific and the covariate-adjusted ROC curves [@ROCnReg2021];
[**smoothROCtime**](https://CRAN.R-project.org/package=smoothROCtime)
implements smooth estimation of time-dependent ROC curves based on the
bivariate kernel density estimator for $(X, \textit{time-to-event})$
[@DiazCoto2020];
[**OptimalCutpoints**](https://CRAN.R-project.org/package=OptimalCutpoints)
includes point and interval estimation methods for optimal thresholds
[@OptimalCutpoints2014]; and
[**nsROC**](https://CRAN.R-project.org/package=nsROC) performs
non-standard analyses such as gROC estimation [@nsROC2018]; among
others.

This paper introduces and elucidates the diverse functionalities of the
newly developed
[**movieROC**](https://CRAN.R-project.org/package=movieROC) package,
aimed at facilitating the visualization and comprehension of the
decision rules underlying the binary classification process,
encompassing various generalizations. Despite the availability of
numerous R packages implementing related analyses, we have identified
the main gaps covered by this library: tracking the decision rules
underlying the ROC curve, including multivariate markers and
non-standard (i.e., non-monotone) scenarios. The rest of the paper is
structured as follows. In
Section [2](#section:functionality){reference-type="ref"
reference="section:functionality"}, we introduce the main R functions
and objects implemented, and briefly describe the dataset employed
throughout this manuscript to demonstrate the utility of the R library.
Section [3](#section:regularroc){reference-type="ref"
reference="section:regularroc"} is devoted to reconsidering the
definition of the standard ROC curve from the perspective of
classification subsets, including an extension to multivariate
scenarios. Sections [4](#section:groc){reference-type="ref"
reference="section:groc"} and
[5](#section:efficientroc){reference-type="ref"
reference="section:efficientroc"} revisit the gROC curve and the eROC
curve, respectively, covering various methods to capture the potential
classification accuracy of the marker under study. Each of these
sections begins with a state-of-the-art overview, followed by the main
syntax of the corresponding R functions. In addition, examples of
implementation using the dataset presented in Section
[2.3](#subsection:dataset){reference-type="ref"
reference="subsection:dataset"} are provided. Finally, the paper
concludes with a concise summary and computational details regarding the
implemented tool.

# Main functions of the movieROC package and illustrative dataset {#section:functionality}

Sections [2.1](#subsec:functionality){reference-type="ref"
reference="subsec:functionality"} and
[2.2](#subsec:functions){reference-type="ref"
reference="subsec:functions"} provide a detailed description of the main
objectives of the implemented R functions. To reflect the practical
usage of the developed R package, we employ a real dataset throughout
this manuscript, which is introduced in
Section [2.3](#subsection:dataset){reference-type="ref"
reference="subsection:dataset"}.

## Functionality of the movieROC package {#subsec:functionality}

A graphical tool was developed to showcase static and dynamic graphics
displaying the classification subsets derived from maximizing diagnostic
accuracy under certain assumptions, while preserving interpretability.
The R package facilitates the construction of the ROC
curve across various specificities, providing visualizations of the
resulting classification regions. The proposed tool comprises multiple R
functions that generate objects with distinct class attributes (see the
function names from which red arrows depart, and the red nodes, in
Figure [1](#figure:functionality){reference-type="ref"
reference="figure:functionality"}). Once the object of interest is
created, different methods may be used to plot the underlying regions
(`plot_regions()`, `plot_funregions()`), to track the resulting ROC
curve (`plot_buildROC()`, `plot()`), to `predict` decision rules for a
particular specificity, and to `print` relevant information, among
others. The main function of the package, `movieROC()`, produces videos
that exhibit the classification procedure.

```{r functionality, fig.cap="R functions of the movieROC package. The blue nodes include the names of the R functions and the red nodes indicate the different R objects that can be created and worked with. The red arrows depart from those R functions engaged in creating R objects and the black arrows indicate which R functions can be applied to which R objects. The grey dashed arrows show internal dependencies.", out.width="100%", echo=FALSE, fig.show="hold"}
knitr::include_graphics(c(
  "figures/movieROC_mainFunctions.png",
  "figures/movieROC_extraFunctions.png"
))
```

The package includes algorithms to visualize the regions that underlie
the binary classification problem, considering different approaches:

- making the classification subsets flexible in order to cover
  non-standard scenarios by considering two cut-off values (`gROC()`
  function), explained in
  Section [4](#section:groc){reference-type="ref"
  reference="section:groc"};

- transforming the marker by a proper function $h(\cdot)$ (`hROC()`
  function), introduced in
  Section [5](#section:efficientroc){reference-type="ref"
  reference="section:efficientroc"};

- when dealing with multivariate markers, considering a functional
  transformation with some fixed or dynamic parameters resulting from
  different methods available in the literature (`multiROC()`
  function), covered in
  Section [3.1](#section:multiroc){reference-type="ref"
  reference="section:multiroc"}.

## Class methods for movieROC objects {#subsec:functions}

By using the `gROC()`, the `multiROC()` or the `hROC()` function, the
user obtains an R object of class '`groc`', '`multiroc`' or '`hroc`',
respectively. These will be called
[**movieROC**](https://CRAN.R-project.org/package=movieROC) objects.
Once the object of interest is created, the implemented package includes
many functions (methods) that can be applied to it. Some of them are
generic methods (`print()`, `plot()` and `predict()`), commonly used in
the R language on different objects according to their class attributes.
The rest of the functions are specific to this library and therefore
only applicable to
[**movieROC**](https://CRAN.R-project.org/package=movieROC) objects.
The following outline summarizes all these functions and
provides their purpose and main syntax (with default input parameters).

### Generic functions

+ **`print()`**: Print some relevant information.
+ **`plot()`**: Plot the ROC curve estimate.
+ **`predict()`**: Print the classification subsets corresponding to a
particular false-positive rate (`FPR`) introduced by the user. For a
'`groc`' object, the user may specify a cut-off value `C` (for the
standard ROC curve) or two cut-off values `XL` and `XU` (for the gROC
curve).

### Specific functions

* **`plot_regions()`**

Applicable to a '`groc`' or a '`hroc`' object. Plot two graphics in the
same figure: left, the classification subsets for each false-positive
rate (grey color by default); right, the $90^\circ$ rotated ROC curve.

_Main syntax:_
```{r eval = FALSE}
plot_regions(obj, plot.roc = TRUE, plot.auc = FALSE, FPR = 0.15, ...)
```

If the input parameter `FPR` is specified, the classification region
reporting such a false-positive rate and the corresponding point on the
ROC curve are highlighted in blue.

* **`plot_funregions()`**

Applicable to a '`groc`' or a '`hroc`' object. Plot the transforming
function and the classification subsets reporting the false-positive
rate(s) indicated in the input parameter(s) `FPR` and `FPR2`.

_Main syntax:_
```{r eval = FALSE}
plot_funregions(obj, FPR = 0.15, FPR2 = NULL, plot.subsets = TRUE, ...)
```

* **`plot_buildROC()`**

Applicable to a '`groc`' or a '`multiroc`' object.

- For a '`groc`' object: Plot four (if input `reduce` is FALSE) or two
  (if `reduce` is TRUE, only those on the top) graphics in the same
  figure: top-left, density function estimates for the marker in both
  populations, with the areas corresponding to the FPR and TPR colored
  (blue and red, respectively) for the optional input parameter `FPR`,
  `C` or `XL, XU`; top-right, the empirical ROC curve estimate;
  bottom-left, boxplots for both groups; bottom-right, the
  classification subsets for every FPR (grey color).

_Main syntax:_
```{r eval = FALSE}
plot_buildROC(obj, FPR = 0.15, C, XL, XU, h = c(1,1),
              histogram = FALSE, breaks = 15, reduce = TRUE,
              build.process = FALSE, completeROC = FALSE, ...)
```

If `build.process` is FALSE, the whole ROC curve is displayed.
Otherwise, if `completeROC` is TRUE, the portion of the ROC curve up to
the fixed FPR is highlighted in black and the rest is shown in gray,
while if `completeROC` is FALSE, only the first portion of the curve is
displayed.

- For a '`multiroc`' object: Plot two graphics in the same figure:
  right, the ROC curve highlighting the point and the threshold for the
  resulting univariate marker; left, a scatterplot of the marker values
  of both positive (red color) and negative (blue color) subjects.
  About the left graphic: for $p=2$, it is drawn over the
  original/feature bivariate space; for $p>2$, it is projected over two
  selected components of the marker (if `display.method = "OV"`, with
  component selection in `displayOV`, `c(1,2)` by default) or over the
  first two principal components from PCA (if
  `display.method = "PCA"`, the default). The classification subset
  reporting the `FPR` selected by the user (`FPR` $\neq$ `NULL`) is
  displayed in gold color.

_Main syntax:_

for $p=2$:
```{r eval = FALSE}
plot_buildROC(obj, FPR = 0.15,
              build.process = FALSE, completeROC = TRUE, ...)
```

for $p>2$:
```{r eval = FALSE}
plot_buildROC(obj, FPR = 0.15,
              display.method = c("PCA","OV"), displayOV = c(1,2),
              build.process = FALSE, completeROC = TRUE, ...)
```

If `build.process` is FALSE, the whole ROC curve is displayed.
Otherwise, if `completeROC` is TRUE, the portion of the ROC curve up to
the fixed FPR is highlighted in black and the rest is shown in gray,
while if `completeROC` is FALSE, only the first portion of the curve is
shown.

* **`movieROC()`**

Applicable to a '`groc`' or a '`multiroc`' object. Save a video as a GIF
illustrating the construction of the ROC curve.

+ For a '`groc`' object:

_Main syntax:_
```{r eval = FALSE}
movieROC(obj, fpr = NULL,
         h = c(1,1), histogram = FALSE, breaks = 15,
         reduce = TRUE, completeROC = FALSE, videobar = TRUE,
         file = "animation1.gif", ...)
```

For each element of the vector `fpr` (an optional input parameter), the
function executed is
`plot_buildROC(obj, FPR = fpr[i], build.process = TRUE, ...)`. The
vector of false-positive rates illustrated in the video is `NULL` by
default: if the length of the output parameter `t` of the `gROC()`
function is lower than 150, that vector is taken as `fpr`; otherwise, an
equally spaced vector of length 100 covering the range of the marker
values is considered.

+ For a '`multiroc`' object:

_Main syntax:_

for $p=2$:
```{r eval = FALSE}
movieROC(obj, fpr = NULL,
         file = "animation1.gif", save = TRUE,
         border = TRUE, completeROC = FALSE, ...)
```

for $p>2$:
```{r eval = FALSE}
movieROC(obj, fpr = NULL,
         display.method = c("PCA","OV"), displayOV = c(1,2),
         file = "animation1.gif", save = TRUE,
         border = TRUE, completeROC = FALSE, ...)
```

The video is `save`d by default as a GIF with the name indicated in the
argument `file` (the extension `.gif` should be included). A `border`
for the classification subsets is drawn by default.

For each element of the vector `fpr` (an optional input parameter), the
function executed is

for $p=2$:
```{r eval = FALSE}
plot_buildROC(obj, FPR = fpr[i], build.process = TRUE, completeROC, ...)
```

for $p>2$:
```{r eval = FALSE}
plot_buildROC(obj, FPR = fpr[i], build.process = TRUE, completeROC,
              display.method, displayOV, ...)
```

The same considerations about the input `fpr` as those described for
`movieROC()` on a '`groc`' object apply.

## Illustrative dataset {#subsection:dataset}

In order to illustrate the functionality of our R package, we consider
the `HCC` data.
This dataset is derived from gene expression arrays of
tumor and adjacent non-tumor tissues of 62 Taiwanese cases of
hepatocellular carcinoma (HCC). The goal of the original study
[@Shen2012] was to identify, with a genome-wide approach, additional
genes hypermethylated in HCC that could be used for a more accurate
analysis of plasma DNA for early diagnosis, using Illumina
methylation arrays (Illumina, Inc., San Diego, CA) that screen 27,578
autosomal CpG sites. The complete dataset was deposited in NCBI's Gene
Expression Omnibus (GEO) and is available through series accession
number GSE37988
([www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE37988](http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE37988){.uri}).
It is included in the presented package (`HCC` dataset), selecting the
948 genes with complete information.

The following code loads the R package and the `HCC` dataset (see the
[vignette](https://cran.r-project.org/web/packages/movieROC/vignettes/movieROC_vignette.pdf)
for its main structure).

``` r
R> library(movieROC)
R> data(HCC)
```

We selected the genes 20202438, 18384097, and 03515901. On the one hand,
we chose the gene 03515901 as an example of a monotone relationship
between the marker and the response, reporting a good ROC curve. On the
other hand, relative gene expression intensities of the genes 20202438
and 18384097 tend to be more extreme in tissues with tumor than in those
without it. These are non-standard cases, so if we limited ourselves to
detecting "appropriate" genes on the basis of the standard ROC curve,
they would not be chosen. However, extending the decision rules by means
of the gROC curve, those genes may be considered potential biomarkers
(locations) for differentiating between the two groups.
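
The gain from such two-sided rules can be checked by hand. The sketch
below (simulated data mimicking the "extreme intensities" pattern, not
the HCC genes; hand-picked cut-offs) evaluates a subset of the form
$(-\infty, x^L] \cup (x^U, \infty)$:

``` r
## Hand-rolled two-cutoff rule: positives tend to extreme values, so a
## two-sided subset classifies well even if the standard ROC curve fails.
set.seed(42)
chi <- rnorm(200)                            # negative group: central values
xi  <- c(rnorm(100, -2.5), rnorm(100, 2.5))  # positive group: extreme values
spec_sens <- function(xL, xU) {
  positive <- function(x) x <= xL | x > xU   # classified as positive
  c(specificity = mean(!positive(chi)), sensitivity = mean(positive(xi)))
}
spec_sens(-1.5, 1.5)  # both close to 1 despite an improper standard ROC
```

The package's `gROC()` function searches over such pairs of cut-offs
systematically rather than relying on hand-picked values.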
The R code estimating and
displaying the probability density function for the gene expression
intensities of the selected genes in each group
(Figure [2](#fig:densities){reference-type="ref"
reference="fig:densities"}) is included in the
[vignette](https://cran.r-project.org/web/packages/movieROC/vignettes/movieROC_vignette.pdf).

```{r densities, fig.cap="Density histograms and kernel density estimations (lighter) for gene expression intensities of the genes 20202438, 18384097 and 03515901 in negative (non-tumor) and positive (tumor) tissues.", fig.width=10, fig.height=4, echo=FALSE}
knitr::include_graphics("figures/Histogram_3genes.png")
```

# Regular ROC curve {#section:regularroc}

Assuming that there exists a monotone relationship between the marker
and the response, the _regular, right-sided or standard ROC curve_
associated with the marker $X$ considers classification subsets of the
form $s_t=(c_t,\infty)$. For each specificity
$1-t=\mathbb{P}(\chi \notin s_t) \in [0,1]$, also called the _true-negative
rate_, there exists only one subset $s_t$ reporting such specificity and
thus a particular sensitivity, also called the _true-positive rate_,
$\mathbb{P}(\xi \in s_t)$. This results in a simple correspondence between each
point of the ROC curve
$\mathcal{R}_r(t) = 1-F_\xi \big(F_\chi^{-1}(1-t)\big)$ and its
associated classification region $s_t \in \mathcal{I}_r(t)$, where
$$\mathcal{I}_r(t) = \Big\{ s_t = (c_t, \infty) : c_t \in \mathcal{S}(X) , \ \mathbb{P}(\chi \in s_t) = t \Big\}$$
is the _right-sided family of eligible classification subsets_. The
definition of this family captures the shape of the decision rules and
the target specificity.
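
For intuition, $\mathcal{R}_r(t)$ can be estimated in a few lines of
base R by plugging the empirical distribution functions into the formula
above (simulated data; this is a didactic sketch, not the package's
estimator):

``` r
## Empirical right-sided ROC curve: R_r(t) = 1 - F_xi(F_chi^{-1}(1 - t))
set.seed(123)
chi <- rnorm(100)       # negative population
xi  <- rnorm(100, 1.2)  # positive population, larger on average
t_grid <- seq(0, 1, by = 0.01)
c_t <- quantile(chi, 1 - t_grid, type = 1)     # empirical F_chi^{-1}(1 - t)
roc <- sapply(c_t, function(c) mean(xi > c))   # empirical 1 - F_xi(c_t)
plot(t_grid, roc, type = "s", xlab = "FPR (t)", ylab = "TPR",
     main = "Empirical right-sided ROC curve")
abline(0, 1, lty = 2)
## Empirical AUC via the Mann-Whitney statistic
mean(outer(xi, chi, ">"))
```

Each point $(t, \mathcal{R}_r(t))$ of this curve corresponds to exactly
one subset $s_t = (c_t, \infty)$, which is the univocity property the
generalized curves below give up.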

If higher values of the marker are associated with a higher probability
of not having the characteristic (see gene 03515901 in
Figure [2](#fig:densities){reference-type="ref"
reference="fig:densities"}), the ROC curve would be defined by the
_left-sided family of eligible classification subsets_ [@Camblor2017a],
$\mathcal{I}_l(t)$, defined similarly to $\mathcal{I}_r(t)$ but with
subsets of the form $s_t = (-\infty, c_t]$. It results in
$\mathcal{R}_l(t) = F_\xi \big(F_\chi^{-1}(t) \big)$, $t\in[0,1]$, and
the decision rules are also univocally defined in this case.

The ROC curve and related problems have been widely studied in the
literature; interested readers are referred to the monographs of
@Zhou2002, @Pepe2003a, and @Nakas2023, as well as the review by
@Inacio2021. By definition, the ROC curve is confined within the unit
square, with optimal performance achieved when it approaches the
upper-left corner (AUC close to 1). Conversely, proximity to the main
diagonal (AUC close to 0.5) means diminished discriminatory ability,
resembling a random classifier.

In practice, let $(\xi_1, \xi_2, \dots, \xi_n)$ and
$(\chi_1, \chi_2, \dots, \chi_m)$ be two independent and identically
distributed (i.i.d.) samples from the positive and the negative
population, respectively. Different estimation procedures are
implemented in the
[**movieROC**](https://CRAN.R-project.org/package=movieROC) package,
such as the empirical estimator [@Hsieh1996] (the default in the
`gROC()` function), accompanied by its summary indices: the AUC and the
Youden index [@Youden1950]. Alternatively, semiparametric approaches
based on kernel density estimation of the involved distributions may be
considered [@Zou1997]. The `plot_densityROC()` function provides plots
of both right- and left-sided ROC curves estimated by this method.
On
the other hand, assuming that the marker follows a Gaussian distribution
in both populations, that is,
$\xi \sim \mathcal{N}(\mu_\xi, \sigma_\xi)$ and
$\chi \sim \mathcal{N}(\mu_\chi, \sigma_\chi)$, parametric approaches
propose plug-in estimators obtained by estimating the unknown parameters
of the known distributions [@Hanley1988]. This parametric estimation
is included in the `gROC_param()` function, which works similarly to
`gROC()`.

_Main syntax:_

```{r eval = FALSE}
gROC(X, D, side = "right", ...)
gROC_param(X, D, side = "right", ...)
```

Table 1 in the
[vignette](https://cran.r-project.org/web/packages/movieROC/vignettes/movieROC_vignette.pdf)
provides the main input and output parameters of these R functions,
which estimate the regular ROC curve (right-sided or left-sided with
`side = "right"` or `"left"`, respectively) and the associated decision
rules. Their output is an R object of class '`groc`', to which the
functions listed in Section [2.2](#subsec:functions){reference-type="ref"
reference="subsec:functions"} can be applied. Most of them are
visualization tools, but the user may also `print()` summary information
and `predict()` classification regions for a particular specificity.

Figure [3](#fig:roccurves){reference-type="ref"
reference="fig:roccurves"} graphically represents the empirical
estimation of the standard (gray line) and generalized (black line) ROC
curves for each gene in Figure [2](#fig:densities){reference-type="ref"
reference="fig:densities"}. To construct the standard ROC curve, the
right-sided curve is considered for the first two genes (20202438 and
18384097), and the left-sided curve for the third one (03515901). As
expected from the discussion of
Figure [2](#fig:densities){reference-type="ref"
reference="fig:densities"}, the standard and gROC curves are similar for
the third gene because there exists a monotone relationship between the
marker and the response.
However, these curves differ for the first two
genes due to the lack of monotonicity in those scenarios. The empirical
gROC curve estimator is explained in detail in
Section [4](#section:groc){reference-type="ref"
reference="section:groc"}.

The next chunk of code generates the figure, providing an example of the
use of the `gROC()` function and `plot()`, and showing how to access the
AUC.

``` r
R> for(gene in c("20202438", "18384097", "03515901")){
+    roc <- gROC(X = HCC[,paste0("cg",gene)], D = HCC$tumor,
+      side = ifelse(gene == "03515901", "left", "right"))
+    plot(roc, col = "gray50", main = paste("Gene", gene), lwd = 3)
+    groc <- gROC(X = HCC[,paste0("cg",gene)], D = HCC$tumor, side = "both")
+    plot(groc, new = FALSE, lwd = 3)
+    legend("bottomright", paste(c("AUC =", "gAUC ="), format(c(roc$auc, groc$auc),
+      digits = 3)), col = c("gray50", "black"), lwd = 3, bty = "n", inset = .01)}
```
```{r roccurves, fig.cap="Standard ROC curve (in gray) and gROC curve (in black) empirical estimation for the capacity of genes 20202438, 18384097 and 03515901 to differ between the tumor and non-tumor groups.", out.width="100%", echo=FALSE}
knitr::include_graphics("figures/ROCcurves_3genes.png")
```

The following code snippet estimates the standard ROC curve for gene
20202438, prints its basic information, and predicts the classification
region and sensitivity resulting in a specificity of 0.9. It provides an
illustrative example of the use of the `print()` and `predict()`
functions.

``` r
R> roc_selg1 <- gROC(X = HCC$cg20202438, D = HCC$tumor, side = "right")
R> roc_selg1
```

``` r
Data was encoded with nontumor (controls) and tumor (cases).
It is assumed that larger values of the marker indicate larger confidence that a
  given subject is a case.
There are 62 controls and 62 cases.
The specificity and sensitivity reported by the Youden index are 0.855 and 0.403,
  respectively, corresponding to the following classification subset: (0.799, Inf).
The area under the right-sided ROC curve (AUC) is 0.547.
```

``` r
R> predict(roc_selg1, FPR = .1)
```

``` r
$ClassSubsets
[1] 0.8063487       Inf

$Specificity
[1] 0.9032258

$Sensitivity
[1] 0.3064516
```

The following line of code displays the whole construction of the
empirical standard ROC curve for gene 20202438. The video is saved by
default as a GIF with the name provided.

``` r
R> movieROC(roc_selg1, reduce = FALSE, file = "StandardROC_gene20202438.gif")
```
![](figures/StandardROC_gene20202438.gif){width=60%}

## Multivariate ROC curve {#section:multiroc}

In practice, many cases may benefit from combining information from
different markers to enhance classification accuracy. Rather than
assessing univariate markers separately, taking the multivariate marker
resulting from merging them can yield a relevant gain. However, note that
the ROC curve and related indices are defined only for univariate
markers, as they require the existence of a total order. To address this
limitation, a common approach involves transforming the $p$-dimensional
multivariate marker $\boldsymbol{X}$ into a univariate one through a
functional transformation
$\boldsymbol{h}: \mathbb{R}^p \longrightarrow \mathbb{R}$. This
transformation $\boldsymbol{h}(\cdot)$ seeks to optimize an objective
function related to the classification accuracy, usually the AUC
[@Su1993; @McIntosh2002; @Camblor2019a].

We enumerate the methods included in the proposed R tool through the
`multiROC()` function (with the input parameter `method`), listed
according to the objective function to optimize. Recall that the output
of `multiROC()` is an object of class '`multiroc`', containing
information about the estimation of the ROC curve and subsets for
multivariate scenarios. Table 2 in the
[vignette](https://cran.r-project.org/web/packages/movieROC/vignettes/movieROC_vignette.pdf)
includes the usage of this function.
+ +_Main syntax:_ + +```{r eval = FALSE} +multiROC(X, D, method = "lrm", + formula = 'D ~ X.1 + I(X.1^2) + X.2 + I(X.2^2) + I(X.1*X.2)', ...) +``` + +1. AUC: Different procedures to estimate the $\boldsymbol{h}(\cdot)$ + maximizing the AUC in the multidimensional case have been studied in + the literature. Among all families of functions, linear combinations + ($\mathcal{L}_{\boldsymbol{\beta}}(\boldsymbol{X}) = \beta_1 X_1 + \dots + \beta_p X_p$) + are widely used due to their simplicity; an extensive review of the + existing methods was conducted by @Kang2016. + +- Computation: In the `multiROC()` function, fixing input parameters + `method = "fixedLinear"` and `methodLinear` to one from `"SuLiu"` + [@Su1993], `"PepeThompson"` [@Pepe2000], or `"minmax"` [@Liu2011]. + The R function also admits quadratic combinations when $p=2$, i.e. + $\mathcal{Q}_{\boldsymbol{\beta}}(\boldsymbol{X}) = \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2 + \beta_4 X_1^2 + \beta_5 X_2^2$, + by fixing `method = "fixedQuadratic"` for a particular + `coefQuadratic` $= \boldsymbol{\beta} = (\beta_1, \dots, \beta_5)$. + +2. The risk score function + logit$\left\{ \mathbb{P}(D = 1 \, | \, \boldsymbol{X}) \right\}$: Our + package allows the user to fit a logistic regression model + (`method = "lrm"`) considering any family of functions (linear, + quadratic, whether considering interactions or not\...) by means of + the input parameter `formula.lrm`. A stepwise regression model is + fitted if `stepModel = TRUE`. Details are explained in + Section [5](#section:efficientroc){reference-type="ref" + reference="section:efficientroc"}. + +3. The sensitivity for a particular specificity: + +a. 
Considering the theoretical discussion about the search for the
+    optimal transformation $\boldsymbol{h}(\cdot)$ pointed out in
+    Section [5](#section:efficientroc){reference-type="ref"
+    reference="section:efficientroc"}, @Camblor2021b proposed to
+    estimate it by multivariate kernel density estimation for
+    positive and negative groups separately.
+
+    - Computation: The `multiROC()` function integrates the estimation
+      procedures for the bandwidth matrix developed by @Duong2007, by
+      fixing `method = "kernelOptimal"` and choosing a proper method
+      to estimate the bandwidth (`"kernelOptimal.H"`).
+
+b.  To date, mainly linear combinations have been explored in the
+    scientific literature [@Meisner2021; @PerezFernandez2020]. For a
+    fixed specificity $t\in[0,1]$, we seek the linear combination
+    $\mathcal{L}_{\boldsymbol{\beta}(t)}(\boldsymbol{X}) = \beta_1(t) X_1 + \dots + \beta_p(t) X_p$
+    maximizing the true-positive rate by considering standard
+    subsets for the transformed marker. The coefficients
+    $\boldsymbol{\beta}(t)$ are called 'dynamic parameters' because
+    they may be different for each $t \in [0,1]$.
+
+    - Computation: Since our objective is to display the ROC curve,
+      $\mathcal{L}_{\boldsymbol{\beta}(t)}(\boldsymbol{X})$ is
+      estimated for every $t$ in a grid of the unit interval,
+      resulting in one $\boldsymbol{\hat{\beta}}(t)$ for each $t$.
+      This approach is time-consuming, especially when it is based on
+      the plug-in empirical estimators involved
+      (`method = "dynamicEmpirical"`, only implemented for $p=2$), and
+      may result in overfitting. Instead, the method of @Meisner2021 is
+      recommended (`method = "dynamicMeisner"`).
+
+Once the classification subsets for a multivariate marker are
+constructed by the `multiROC()` function, several R methods may be used
+for the output object (see Section [2.2](#subsec:functions){reference-type="ref"
+reference="subsec:functions"}). These include `print()`, which displays
+relevant information, and `plot()`, which represents the resulting ROC curve. 
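
As a complement to item 1 above, the following standalone sketch (plain base R on simulated data; not code from the package, and all object names such as `emp_auc` are hypothetical) shows how a Su and Liu-style linear combination can be computed and then evaluated with the empirical AUC, assuming roughly binormal markers.

``` r
## Standalone sketch (base R, simulated data; not movieROC code).
## Su & Liu (1993)-style linear combination: under binormality, the
## AUC-maximizing coefficients are proportional to
## (Sigma0 + Sigma1)^{-1} (mu1 - mu0).
set.seed(1)
n  <- 100
X0 <- cbind(rnorm(n, 0, 1), rnorm(n, 0.0, 1))   # negative group
X1 <- cbind(rnorm(n, 1, 1), rnorm(n, 0.5, 1))   # positive group

beta <- solve(cov(X0) + cov(X1), colMeans(X1) - colMeans(X0))

## Transformed univariate marker h(X) = X %*% beta for each group
h0 <- drop(X0 %*% beta)
h1 <- drop(X1 %*% beta)

## Empirical AUC = P(h1 > h0) + 0.5 * P(h1 == h0) (Mann-Whitney form)
emp_auc <- mean(outer(h1, h0, ">")) + 0.5 * mean(outer(h1, h0, "=="))
round(emp_auc, 3)
```

Any monotone rescaling of `beta` leaves the empirical AUC unchanged, which is why only the direction of the coefficient vector matters.
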
The main contribution of
+the package is to plot the construction of the ROC curve together with
+the classification subsets in a static figure for a particular FPR
+(`plot_buildROC()` function), or in a video for tracking the whole
+process (`movieROC()` function).
+
+Figure \@ref(fig:biroc)
+illustrates the videos resulting from the `movieROC()` function.
+In particular, the classification accuracy of the
+bivariate marker `(cg20202438, cg18384097)` was studied using four
+different approaches indicated in the captions, considering linear
+combinations (top) and nonlinear transformations (bottom). This figure
+was generated by the code below, integrating the `multiROC()` and
+`movieROC()` functions. Four videos are saved as GIF files with names
+`"PepeTh.gif"` (a), `"Meisner.gif"` (b), `"LRM.gif"` (c), and
+`"KernelDens.gif"` (d).
+
+``` r
+R> X <- HCC[ ,c("cg20202438", "cg18384097")]; D <- HCC$tumor
+R> biroc_12_PT <- multiROC(X, D, method = "fixedLinear", methodLinear = "PepeThompson")
+R> biroc_12_Meis <- multiROC(X, D, method = "dynamicMeisner", verbose = TRUE)
+R> biroc_12_lrm <- multiROC(X, D)
+R> biroc_12_kernel <- multiROC(X, D, method = "kernelOptimal")
+R> list_biroc <- list(PepeTh = biroc_12_PT, Meisner = biroc_12_Meis,
++ LRM = biroc_12_lrm, KernelDens = biroc_12_kernel)
+R> lapply(names(list_biroc), function(x) movieROC(list_biroc[[x]],
++ display.method = "OV", xlab = "Gene 20202438", ylab = "Gene 18384097",
++ cex = 1.2, alpha.points = 1, lwd.curve = 4, file = paste0(x, ".gif")))
+```
+
+::: {.figure}
+
+| ![a) Pepe and Thompson](figures/PepeTh.gif){width=100%} | ![b) Meisner et al.](figures/Meisner.gif){width=100%} |
+|:--------------------------------------------------------:|:-----------------------------------------------------:|
+| **(a)** Linear combinations with fixed parameters by @Pepe2000. | **(b)** Linear combinations with dynamic parameters by @Meisner2021. 
|
+
+| ![c) Logistic Regression](figures/LRM.gif){width=100%} | ![d) Kernel Density](figures/KernelDens.gif){width=100%} |
+|:--------------------------------------------------------:|:---------------------------------------------------------:|
+| **(c)** Logistic regression model with quadratic formula by default (see `formula.lrm` in Table 2 of the [vignette](https://cran.r-project.org/web/packages/movieROC/vignettes/movieROC_vignette.pdf)). | **(d)** Optimal transformation by multivariate kernel density estimation with `"Hbcv"` method by default [@Camblor2021b]. |
+:::
+
+```{r biroc, echo=FALSE, out.width="1px", out.height="1px", fig.cap="Videos (from the `movieROC()` function) of the classification procedure and ROC curve for the bivariate marker (`cg20202438`, `cg18384097`). Four different methods for classification are displayed."}
+knitr::include_graphics("transparent.png")
+```
+
+When the marker has dimension higher than two, it is difficult to
+visualize the data and the classification regions. Therefore, the
+`movieROC()` function offers two options for showing the results, both
+in a two-dimensional space. The first is to choose two of the
+components of the multivariate marker and project the classification
+subsets on the plane defined by them
+(Figure [5](#fig:multiroccurve){reference-type="ref"
+reference="fig:multiroccurve"}, middle). The second is to project the
+classification regions on the plane defined by the first two principal
+components (Figure [5](#fig:multiroccurve){reference-type="ref"
+reference="fig:multiroccurve"}, left). The R function `prcomp()` from `stats`
+is used to perform Principal Components Analysis (PCA) [@Hotelling1933].
+
+Figure [5](#fig:multiroccurve){reference-type="ref"
+reference="fig:multiroccurve"} shows the difficulty in displaying the
+decision rules when $p>2$ (the three genes used throughout this manuscript), even
+with the two options implemented in our package. 
It was generated using
+`multiROC()` and `plot_buildROC()`:
+
+``` r
+R> multiroc_PT <- multiROC(X = HCC[ ,c("cg20202438", "cg18384097", "cg03515901")],
++ D = HCC$tumor, method = "fixedLinear", methodLinear = "PepeThompson")
+R> multiroc_PT
+```
+
+``` r
+Data was encoded with nontumor (controls) and tumor (cases).
+There are 62 controls and 62 cases.
+A total of 3 variables have been considered.
+A linear combination with fixed parameters estimated by PepeThompson approach has
+ been considered.
+The specificity and sensitivity reported by the Youden index are 0.855 and 0.742,
+ respectively, corresponding to the cut-off point -0.0755 for the transformation
+ h(X) = 0.81*cg20202438 - 0.1*cg18384097 - 1*cg03515901.
+The area under the ROC curve (AUC) is 0.811.
+```
+
+``` r
+R> plot_buildROC(multiroc_PT, cex = 1.2, lwd.curve = 4)
+R> plot_buildROC(multiroc_PT, display.method = "OV", displayOV = c(1,3), cex = 1.2,
++ xlab = "Gene 20202438", ylab = "Gene 03515901", lwd.curve = 4)
+```
+
+```{r multiroccurve, fig.cap="Multivariate ROC curve estimation for the simultaneous diagnostic accuracy of genes 20202438, 18384097 and 03515901. The Pepe and Thompson (2000) approach was used to estimate the linear coefficients; classification rules (yellow and gray border for the positive and negative class, respectively) for a FPR of 0.15 are displayed. Left, projected over the 2 principal components from PCA; middle, over the 1st and the 3rd selected genes.", out.width="100%", echo=FALSE}
+knitr::include_graphics("figures/buildROC_multi3_full.png")
+```
+
+# Generalized ROC curve {#section:groc}
+
+There are scenarios in which the standard ROC curve is not concave (first two
+genes in Figure [3](#fig:roccurves){reference-type="ref"
+reference="fig:roccurves"}, gray solid line), reflecting that the
+initial assumption of a monotone relationship
+between the marker and the response is misleading. 
In
+Figure [2](#fig:densities){reference-type="ref"
+reference="fig:densities"}, we may see that the difference in the
+distribution of gene 20202438 between tissues with and without the
+characteristic is mainly one of dispersion. To accommodate this common type of
+scenario, @Camblor2017a extended the ROC curve definition to the case
+where both extremes of the marker values are associated with a higher risk
+of having the characteristic of interest, by considering the _both-sided
+family of eligible classification subsets_:
+$$\mathcal{I}_g(t) = \Big\{ s_t = (-\infty,x_t^L] \cup (x_t^U, \infty) : x_t^L \leq x_t^U \in \mathcal{S}(X) , \mathbb{P}(\chi \in s_t) = t \Big\}.$$
+
+It becomes crucial to consider the supremum in the definition of the
+generalized ROC curve because the decision rule for each $t \in [0,1]$
+is not univocally defined: there exist infinitely many pairs $x_t^L \leq x_t^U$
+reporting a specificity $1-t$ (i.e. $\mathcal{I}_g(t)$ is uncountably
+infinite). Computationally, this optimization process makes the
+estimation time-consuming, depending on the number of different marker
+values in the sample.
+
+After the introduction of this extension, several studies followed up
+regarding estimation of the gROC curve [@Camblor2017a; @Camblor2019b]
+and related measures such as its area (gAUC) [@Camblor2021a] and the
+Youden index [@Camblor2019c; @Bantis2021a]. By considering this
+generalization, another property of the classification subsets may be
+lost: the regions may not be self-contained as the
+false-positive rate increases. It may happen that a subject is classified as a
+positive for a particular FPR $t_1$, but as a negative for a higher FPR
+$t_2$. 
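
To make the both-sided search concrete, the following toy sketch (plain base R on simulated data; the function `groc_point()` is hypothetical, not part of the package) scans candidate pairs $(x_t^L, x_t^U)$ for a target FPR; its double loop over the distinct marker values shows why the exact search is time-consuming. Nothing in this unrestricted search forces the subsets for increasing FPR to be nested.

``` r
## Toy sketch (base R, simulated data; not the movieROC algorithm).
## Brute-force search of the both-sided subset (-Inf, xL] U (xU, Inf)
## maximizing the empirical sensitivity for a target FPR.
groc_point <- function(x_neg, x_pos, fpr = 0.1) {
  cand <- sort(unique(c(-Inf, x_neg, x_pos)))
  best <- c(sens = 0, xL = -Inf, xU = Inf)
  for (xL in cand) {
    for (xU in cand[cand >= xL]) {
      if (mean(x_neg <= xL | x_neg > xU) <= fpr) {   # empirical FPR
        sens <- mean(x_pos <= xL | x_pos > xU)       # empirical TPR
        if (sens > best[["sens"]]) best <- c(sens = sens, xL = xL, xU = xU)
      }
    }
  }
  best
}

set.seed(1)
x_neg <- rnorm(50, 0, 1)   # difference in dispersion only,
x_pos <- rnorm(50, 0, 3)   # so both tails are informative
groc_point(x_neg, x_pos, fpr = 0.1)
```

With $m$ distinct marker values the loop visits $O(m^2)$ pairs, which is negligible here but grows quickly with the sample size.
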
Therefore, it is natural to establish a restriction *(C)* on the
+classification subsets, ensuring that any subject classified as a
+positive for a fixed specificity (or sensitivity) will also be
+classified as a positive for any classification subset with lower
+specificity (higher sensitivity). @PerezFernandez2020 proposed an
+algorithm to estimate the gROC curve under restriction *(C)*, included
+in the `gROC()` function of the presented R package. See the final
+section for computational details about the implementation of this algorithm.
+
+_Main syntax:_
+
+```{r eval = FALSE}
+gROC(X, D, side = "both", ...)
+gROC_param(X, D, side = "both", ...)
+```
+
+Table 1 in the
+[vignette](https://cran.r-project.org/web/packages/movieROC/vignettes/movieROC_vignette.pdf)
+collects the input and output parameters of the `gROC()` function, which
+estimates the gROC curve, both in the mentioned direction
+(`side = "both"`) and in the opposite, i.e. when classification subsets
+of the form $s_t=(x_t^L, x_t^U]$ are considered (`side = "both2"`). In
+addition, all the particular methods for a '`groc`' object collected in
+Section [2.2](#subsec:functions){reference-type="ref"
+reference="subsec:functions"} may be used in this general
+scenario.
+
+Next, the diagnostic accuracy of the gene 20202438 expression intensity is
+evaluated by the gROC curve without restrictions (`groc_selg1` object)
+and under the restriction *(C)* (`groc_selg1_C` object). The
+classification subsets and sensitivity for a specificity of $0.9$ are
+displayed with the `predict()` function. 
+
+``` r
+R> groc_selg1 <- gROC(X = HCC$cg20202438, D = HCC$tumor, side = "both")
+R> predict(groc_selg1, FPR = .1)
+```
+
+``` r
+$ClassSubsets
+          [,1]      [,2]
+[1,]      -Inf 0.7180623
+[2,] 0.8296072       Inf
+
+$Specificity
+[1] 0.9032258
+
+$Sensitivity
+[1] 0.4032258
+```
+
+``` r
+R> groc_selg1_C <- gROC(X = HCC$cg20202438, D = HCC$tumor, side = "both",
++ restric = TRUE, optim = TRUE)
+```
+
+All the classification regions underlying the standard and the
+generalized ROC curves without and with restrictions are represented in
+Figure [6](#fig:regionsgroc){reference-type="ref"
+reference="fig:regionsgroc"}. The following code was used to generate
+the figure, illustrating the usage and output of the `plot_regions()`
+function. Besides displaying all the classification regions underlying
+every specificity (in gray), the one chosen by the user (FPR = 0.15 by
+default) is highlighted in blue. Note that the ROC curves are rotated
+$90^\circ$ to the right, in order to use the vertical axis for FPR in
+both plots.
+
+``` r
+R> plot_regions(roc_selg1, cex.legend = 1.5, plot.auc = TRUE,
++ main = "Standard right-sided assumption [Classification subsets]")
+R> plot_regions(groc_selg1, plot.auc = TRUE, legend = F,
++ main.plotroc = "gROC curve",
++ main = "General approach [Classification subsets]")
+R> plot_regions(groc_selg1_C, plot.auc = TRUE, legend = F,
++ main.plotroc = "gROC curve",
++ main = "General approach with restriction (C) [Classific. 
subsets]",
++ xlab = "Gene 20202438 expression intensity")
+```
+
+```{r regionsgroc, fig.cap="Classification regions and the ROC curve (90º rotated) for evaluation of gene 20202438 expression intensity assuming i) standard scenario (top), ii) generalized scenario without restrictions (middle), iii) generalized scenario under restriction *(C)* over the subsets (bottom).", out.width="100%", echo=FALSE, fig.show="hold"}
+knitr::include_graphics(c("figures/plotregions_gene20202438_1.png", "figures/plotregions_gene20202438_2.png", "figures/plotregions_gene20202438_3.png"))
+```
+
+
+The gain achieved by considering the generalized scenario for this
+marker, which better fits its distribution in each group, is clear:
+the standard estimated AUC is 0.547, while the gAUC increases to 0.765. The
+gAUC is not especially affected by imposing the restriction *(C)*,
+resulting in 0.762.
+
+# Efficient ROC curve: pursuing an optimal transformation {#section:efficientroc}
+
+By keeping classification subsets of the form $s_t = (c_t, \infty)$, an
+alternative approach can be explored: transforming the univariate marker
+through a suitable function $h: \mathbb{R} \longrightarrow \mathbb{R}$
+to enhance its accuracy. Henceforth, the transformation $h^*(\cdot)$
+whose ROC curve dominates that of any other
+function (i.e. $\mathcal{R}_{h^*}(\cdot) \geq \mathcal{R}_h(\cdot)$)
+will be referred to as the _optimal transformation_ (in the ROC sense), and
+the resulting ROC curve is called eROC [@Kauppi2016]. Following the
+well-known Neyman--Pearson lemma, @McIntosh2002 proved that $h^*(\cdot)$
+is the likelihood ratio.
+
+We enumerate the methods implemented in the `hROC()`
+function (selected through the input parameter `type`), listed according to the
+procedure considered to estimate $h^*(\cdot)$. The output of this
+function is an object of class '`hroc`'. 
See Table 3 in the
+[vignette](https://cran.r-project.org/web/packages/movieROC/vignettes/movieROC_vignette.pdf)
+for function usage and output details.
+
+_Main syntax:_
+
+```{r eval = FALSE}
+hROC(X, D, type = "lrm", formula.lrm = 'D ~ pol(X,3)', ...)
+```
+
+1. @Camblor2019a exploited the result proved by @McIntosh2002,
+   proposing to estimate the logit of the risk function by logistic
+   regression, since it is a monotone increasing transformation of the
+   likelihood ratio.
+
+- Computation: With the proposed R tool, the user can define any
+  transformation $h(\cdot)$ for the right-hand side of the logistic
+  regression model to be fitted,
+  logit$\big\{ \mathbb{P}(D = 1 \, | \, x) \big\} = h(x)$. In particular,
+  this is done by fixing `type = "lrm"` and defining the
+  function $h(\cdot)$ through `formula.lrm`.
+
+2. Arguing as in @Camblor2021b, but for univariate instead of
+   multivariate markers, the optimal transformation in the ROC sense is
+   equivalent to
+   $h^*(\cdot)=f_\xi(\cdot)/\big(f_\xi(\cdot) + f_\chi(\cdot)\big)$,
+   where $f(\cdot)$ denotes the density function. In order to estimate
+   $h^*(\cdot)$, the two density functions may be estimated separately
+   by different procedures, such as the kernel density
+   estimator.
+
+- Computation: In the `hROC()` function, the user may fix
+  `type = "kernel"` and choose a proper bandwidth for the kernel
+  estimation through `kernel.h` to apply this method.
+
+3. @Camblor2019a also included the estimation of the _overfitting
+   function_, $h_{of}(\cdot)$, defined as the optimal one when no
+   restrictions on the shape of $h^*(\cdot)$ are imposed. It takes the
+   value 1 for the positive marker values and 0 for the negative ones,
+   reporting an estimated AUC of 1, but totally depending on the
+   available sample (the resulting rules cannot be extrapolated beyond it).
+
+- Computation: $h_{of}(\cdot)$ may be estimated by fixing the input
+  parameter `type = "overfitting"`. 
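
As an illustration of item 2 above, the optimal transformation can be approximated in a few lines of base R with one kernel density estimate per group. This is a standalone sketch on simulated data, not the `hROC()` internals; the helper `h_star()` is hypothetical.

``` r
## Standalone sketch (base R, simulated data; not the hROC() internals).
## Kernel approximation of the optimal transformation
## h*(x) = f_xi(x) / (f_xi(x) + f_chi(x)).
set.seed(1)
x_neg <- rnorm(60, 0, 1)   # chi: negative group
x_pos <- rnorm(60, 0, 3)   # xi: positive group (difference in dispersion)

## Linear interpolators of the two kernel density estimates
## (rule = 2 extends them constantly beyond the estimation grid)
f_chi <- approxfun(density(x_neg, bw = 1), rule = 2)
f_xi  <- approxfun(density(x_pos, bw = 1), rule = 2)

h_star <- function(x) f_xi(x) / (f_xi(x) + f_chi(x))

## Non-monotone here: h* is large in both tails and small in the center,
## so the induced subsets {x : h*(x) > c} are both-sided
round(h_star(c(-5, 0, 5)), 2)
```

Since any classification subset induced by $h^*$ is of the form $\{x : h^*(x) > c\}$, the shape of $h^*$ directly determines whether the rules are one-sided, both-sided, or more complex.
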
+ +The following code and figures study the capacity of improving the +classification performance of the gene 18384097 expression intensity via +the above functional transformations and its impact on the final +decision rules. The first one considers an ordinary cubic polynomial +formula (`hroc_cubic_selg2`), and a linear tail-restricted cubic spline +(`hroc_rcs_selg2`) for the right-hand side of the logistic regression model. +The second one uses two different bandwidths ($h=1$ and $h=3$) for +density function estimation. For a comparative purpose, the last one +estimates the gROC curve under restriction *(C)*. + +``` r +R> X <- HCC$cg18384097; D <- HCC$tumor +``` + +``` r +R> hroc_cubic_selg2 <- hROC(X, D); hroc_cubic_selg2 +``` + +``` r +Data was encoded with nontumor (controls) and tumor (cases). +There are 62 controls and 62 cases. +A logistic regression model of the form D ~ pol(X,3) has been performed. +The estimated parameters of the model are the following: + Intercept X X^2 X^3 + "1.551" "32.054" "-120.713" "100.449" +The specificity and sensitivity reported by the Youden index are 0.935 and 0.532, + respectively, corresponding to the following classification subset: + (-Inf, 0.442) U (0.78, Inf). +The area under the ROC curve (AUC) is 0.759. 
+``` + +``` r +R> hroc_rcs_selg2 <- hROC(X, D, formula.lrm = "D ~ rcs(X,8)") +R> hroc_lkr1_selg2 <- hROC(X, D, type = "kernel") +R> hroc_lkr3_selg2 <- hROC(X, D, type = "kernel", kernel.h = 3) +R> hroc_overfit_selg2 <- hROC(X, D, type = "overfitting") + +R> groc_selg2_C <- gROC(X, D, side = "both", restric = TRUE, optim = TRUE) +``` + +The following code snippet compares the AUC achieved from each approach +considered above: + +``` r +R> list_hroc <- list(Cubic = hroc_cubic_selg2, Splines = hroc_rcs_selg2, ++ Overfit = hroc_overfit_selg2, LikRatioEst_h3 = hroc_lkr3_selg2, ++ LikRatioEst_h1 = hroc_lkr1_selg2, gAUC_restC = groc_selg2_C) +``` + +``` r +R> AUCs <- sapply(list_hroc, function(x) x$auc) +R> round(AUCs, 3) +``` + +``` r +Cubic Splines Overfit LikRatioEst_h3 LikRatioEst_h1 gAUC_restC +0.759 0.807 1.000 0.781 0.799 0.836 +``` + +The shape of the classification regions over the original space +$\mathcal{S}(X)$ depends on the monotonicity of $h^*(\cdot)$, which may +be graphically studied by the `plot_funregions()` function (see +Figure [7](#fig:transformations){reference-type="ref" +reference="fig:transformations"}). These regions can be visualized by +the R function `plot_regions()` (see +Figure [8](#fig:regionshroc){reference-type="ref" +reference="fig:regionshroc"}). Both are explained in +Section [2.2](#subsec:functions){reference-type="ref" +reference="subsec:functions"} and illustrated below. The next chunk of +code produced Figure [7](#fig:transformations){reference-type="ref" +reference="fig:transformations"}, representing the different functional +transformations estimated previously: + +``` r +R> lapply(list_hroc, function(x) plot_funregions(x, FPR = .15, FPR2 = .5)) +``` + +```{r transformations, fig.cap="Different functional transformations and resulting classification subsets for gene 18384097. Rules for FPR 0.15 (blue) and 0.50 (red) are remarked. 
Top, from left to right: cubic polynomial function, restricted cubic splines (with 8 knots), and overfitted transformation. Bottom: likelihood ratio estimation with bandwidths 3 (left) and 1 (middle), and transformation resulting in the gROC curve under restriction *(C)*.", out.width="100%", echo=FALSE} +knitr::include_graphics("figures/plotfunregions_hroc_lrm_gene18384097.png") +``` + + +Finally, using the `plot_regions()` function, +Figure [8](#fig:regionshroc){reference-type="ref" +reference="fig:regionshroc"} shows the resulting classification subsets +over the original space for the best two of the six methods above. The first +method (fitting a logistic regression model with restricted cubic +splines with 8 knots) reports an AUC of 0.804 (compared to 0.684 by the +standard ROC curve), but the shape of some classification rules is +complex, such as $s_t=(-\infty,a_t] \cup (b_t,c_t] \cup (d_t,\infty)$. +This area increases to 0.836 by considering subsets of the form +$s_t=(-\infty,x_t^L] \cup (x_t^U,\infty)$, even imposing the restriction +*(C)* to get a functional transformation $h(\cdot)$. + +```{r regionshroc, fig.cap="Classification regions and the resulting ROC curve (90º rotated) for the gene 18384097. Top, ROC curve for restricted cubic splines transformation with 8 knots; bottom, gROC curve under restriction *(C)* for the original marker.", out.width="100%", echo=FALSE, fig.show="hold"} +knitr::include_graphics(c("figures/plotregions_gene18384097_1.png", "figures/plotregions_gene18384097_2.png")) +``` + + +# Summary and conclusion {#section:conclusion} + +Conducting binary classification using continuous markers requires +establishment of decision rules. In the standard case, each specificity +$t \in [0,1]$ entails a classification subset of the form +$s_t = (c_t,\infty)$ univocally defined. 
However, in more complex
+situations -- such as a non-monotone relationship between the
+marker and the response, or multivariate scenarios -- these subsets are no
+longer clearly defined. Visualization of the decision rules becomes crucial in these
+cases. To address this, the
+[**movieROC**](https://CRAN.R-project.org/package=movieROC) package
+incorporates novel visualization tools complementing the ROC curve
+representation.
+
+This R package offers a user-friendly and easily comprehensible software
+solution tailored for practical researchers. It implements statistical
+techniques to estimate, compare, and graphically represent
+different classification procedures. While several R packages address
+ROC curve estimation, the proposed one emphasizes the classification
+process, tracking the decision rules underlying the studied binary
+classification problem. This tool incorporates different considerations
+and transformations which may be useful to capture the potential of the
+marker to classify in non-standard scenarios. Nevertheless, this library
+is also useful in standard cases, as well as when the marker itself
+comes from a classification or regression method (such as support vector
+machines), because it provides informative visuals and additional information
+not usually reported with the ROC curve.
+
+The main function of the package, `movieROC()`, allows users to monitor how
+the resulting classification subsets change across different
+specificities, thereby building the corresponding ROC curve. Notably, it
+introduces time as a third dimension to track those specificities,
+generating informative videos. For interested readers or potential users
+of [**movieROC**](https://CRAN.R-project.org/package=movieROC), the
+[manual](https://cran.r-project.org/web/packages/movieROC/movieROC.pdf)
+available on CRAN provides complete information about the implemented
+functions and their parameters. 
In addition, a +[vignette](https://cran.r-project.org/web/packages/movieROC/vignettes/movieROC_vignette.pdf) +is accessible, including mathematical formalism and details about the +algorithms implemented. + +# Computational considerations {#sec:complim .unnumbered} + +### Dependencies + +Some functions of our package depend on other libraries available on +CRAN: + +- `gROC(X, D, side = "both", restric = TRUE, optim = TRUE, ...)` uses + the `allShortestPaths()` function in the + [**e1071**](https://CRAN.R-project.org/package=e1071) package + [@e1071CRAN2023]. + +- `hROC(X, D, type = "lrm", ...)` and + `multiROC(X, D, method = "lrm", ...)` use the `lrm()` function in + the [**rms**](https://CRAN.R-project.org/package=rms) package + [@rmsCRAN2023]. + +- `multiROC(X, D, method = "kernelOptimal", ...)` uses the `kde()` + function in the [**ks**](https://CRAN.R-project.org/package=ks) + package [@ksCRAN2023]. + +- `multiROC(X, D, method = "dynamicMeisner", ...)` uses the `maxTPR()` + function in the **maxTPR** package [@Meisner2021]. This package was + removed from the CRAN repository, so we integrated the code of the + `maxTPR()` function into our package. This function uses + `Rsolnp::solnp()` and `robustbase::BYlogreg()`. + +- `multiROC(X, D, method = "fixedLinear", methodLinear, ...)` uses the + R functions included in @Kang2016 (Appendix). We integrated this + code into our package. + +- `movieROC(obj, save = TRUE, ...)` uses the `saveGIF()` function in + the [**animation**](https://CRAN.R-project.org/package=animation) + package [@animationCRAN2021]. 
+
+### Limitations
+
+Users should be aware of certain limitations while working with this
+package:
+
+* Some methods are potentially time-consuming, especially with medium
+  to large sample sizes:
+
+  + The estimation of the gROC curve under restriction *(C)* can be
+    computationally intensive, especially when considering different FPRs
+    to locally optimize the search using
+    `gROC(X, D, side = "both", restric = TRUE, optim = TRUE, t0max = TRUE)`.
+    Note that this method involves a fairly exhaustive search of the
+    self-contained classification subsets leading to the optimal gROC
+    curve estimate. However, even when selecting different false-positive
+    rates $t_0$ to start from, it may not yield the optimal
+    achievable estimate under restriction *(C)*. The input parameters
+    `restric`, `optim`, `t0` and `t0max` of the `gROC()` function, included
+    in Table 1 of the
+    [vignette](https://cran.r-project.org/web/packages/movieROC/vignettes/movieROC_vignette.pdf),
+    serve to control this search.
+
+  + The same occurs for multivariate markers when considering
+    linear frontiers with dynamic parameters (by using
+    `multiROC(X, D, method = "dynamicMeisner" | "dynamicEmpirical")`).
+
+* Most implemented R functions consider empirical estimation for the
+  resulting ROC curve, even if the procedure to estimate the decision
+  rules is semi-parametric. An exception is the `gROC_param()`
+  function, which accommodates the binormal scenario.
+
+* When visualizing classification regions for multivariate markers
+  with high dimension (`plot_buildROC()` and `movieROC()` functions
+  for a '`multiroc`' object), our package provides some alternatives,
+  but additional improvements could provide further aid in
+  interpretation.
+
+# Acknowledgements {#acknowledgements .unnumbered}
+
+The authors acknowledge support by the Grants PID2019-104486GB-I00 and
+PID2020-118101GB-I00 from Ministerio de Ciencia e Innovación (Spanish
+Government), and by a financial Grant for Excellence Mobility for
+lecturers and researchers subsidized by the University of Oviedo in
+collaboration with Banco Santander.
+:::
diff --git a/_articles/RJ-2025-035/RJ-2025-035.html b/_articles/RJ-2025-035/RJ-2025-035.html
new file mode 100644
index 0000000000..a75dfc757f
--- /dev/null
+++ b/_articles/RJ-2025-035/RJ-2025-035.html
@@ -0,0 +1,3078 @@
+movieROC: Visualizing the Decision Rules Underlying Binary Classification

movieROC: Visualizing the Decision Rules Underlying Binary Classification


The receiver operating characteristic (ROC) curve is a graphical tool
commonly used to depict the binary classification accuracy of a
continuous marker in terms of its sensitivity and specificity. The
standard ROC curve assumes a monotone relationship between the marker
and the response, inducing classification subsets of the form
\((c,\infty)\) with \(c \in \mathbb{R}\). However, in non-standard cases,
the involved classification regions are not so clear, highlighting the
importance of tracking the decision rules. This paper introduces the R
package movieROC,
which provides visualization tools for understanding the ability of
markers to identify a characteristic of interest, complementing the
ROC curve representation. This tool accommodates multivariate
scenarios and generalizations involving different decision rules. The
main contribution of this package is the visualization of the
underlying classification regions, with the associated gain in
interpretability. Adding time (videos) as a third dimension, this
package facilitates the visualization of binary classification in
multivariate problems. It constitutes a good tool for generating
graphical material for presentations.


1 Introduction


The use of data to detect a characteristic of interest is a cornerstone +of many disciplines such as medicine (to diagnose a pathology or to +predict a patient outcome), finance (to detect fraud) or machine +learning (to evaluate a classification algorithm), among others. +Continuous markers are surrogate measures for the characteristic under +study, or predictors of a potential subsequent event. They are measured +in subjects, some of whom have the characteristic (positive), and some +without it (negative). In addition to reliability and feasibility, a +good marker must have two relevant properties: interpretability and +accuracy (Mayeux 2004). High binary classification accuracy can be +achieved if there exists a strong relationship between the marker and +the response. The latter is assessed by a gold standard for the presence +or absence of the characteristic of interest. Interpretability refers to +the decision rules or subsets considered in the classification process. +This piece of research seeks to elucidate both desirable properties for +a marker by the implementation of a graphical tool in R language. We +propose a novel approach involving the generation of videos as a +solution to effectively capture the classification procedure for +univariate and multivariate markers. Graphical analysis plays a pivotal +role in data exploration, interpretation, and communication. Its +burgeoning potential is underscored by the fast pace of technological +advances, which empower the creation of insightful graphical +representations.


A usual practice when the binary classification accuracy of a marker is +of interest involves the representation of the Receiver Operating +Characteristic (ROC) curve, summarized by the Area Under the Curve (AUC) +(Hanley and McNeil 1982). The resulting plot reflects the trade-off between the +sensitivity and the complement of the specificity. Sensitivity and +specificity are probabilities of correctly classifying subjects, either +positive or negative, respectively. Mathematically, let \(\xi\) and \(\chi\) +be the random variables modeling the marker values in the positive and +the negative population, respectively, with \(F_\xi(\cdot)\) and +\(F_\chi(\cdot)\) their associated cumulative distribution functions. +Assuming that the expected value of the marker is larger in the positive +than in the negative population, the standard ROC curve is based on +classification subsets of the form \(s = (c, \infty)\), where \(c\) is the +so-called cut-off value or threshold in the support of the marker \(X\), +\(\mathcal{S}(X)\). One subject is classified as a positive if its marker +value is within this region, and as a negative otherwise. This type of +subsets has two important advantages: first, their interpretability is +clear; second, for each specificity \(1-t \in [0,1]\), the corresponding +\(s_t = (c_t, \infty)\) is univocally defined by \(c_t = F_\chi^{-1}(1-t)\) +for absolutely continuous markers.


When the differences in marker distribution between the negative and the positive population are only in location but not in shape, then \(F_\xi(\cdot) < F_\chi(\cdot)\), and the classification is direct by using these decision rules. However, when this is not the case, the standard ROC curve may cross the main diagonal, resulting in an improper curve (Dorfman et al. 1997). This may be due to three different scenarios:

  1. the behavior of the marker in the two studied populations is different, but it is not possible to determine the decision rules. Notice that the binary classification problem goes further than distinguishing between the two populations: the classification subsets should be highly likely in one population and highly unlikely in the other (Martínez-Camblor 2018);

  2. there exists a relationship between the marker and the response with a potential classification use, but this is not monotone;

  3. there is no relationship between the marker and the response at all (main diagonal ROC curve).

In the second case, we have to define classification subsets different from the standard \(s_t=(c_t,\infty)\). Therefore, the use of the marker becomes more complex. With the aim of accommodating scenarios where both higher and lower values of the marker are associated with a higher risk of having the characteristic, Martínez-Camblor et al. (2017) proposed the so-called generalized ROC (gROC) curve. This curve tracks the highest sensitivity for every specificity in the unit interval resulting from subsets of the form \(s_t=(-\infty, x_t^L] \cup (x_t^U, \infty)\) with \(x_t^L \leq x_t^U \in \mathcal{S}(X)\).
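The sensitivity/specificity pair attached to one such two-sided subset can be computed analogously to the one-cut-off case. Again a Python sketch on made-up data, not the package's estimator:

```python
import numpy as np

def sens_spec_groc(pos, neg, xl, xu):
    """Empirical sensitivity and specificity for a gROC-style subset
    s = (-Inf, xl] U (xu, Inf), with xl <= xu: both low and high
    marker values are classified as positive."""
    pos, neg = np.asarray(pos, float), np.asarray(neg, float)
    in_s = lambda x: (x <= xl) | (x > xu)
    sensitivity = float(np.mean(in_s(pos)))      # P(xi in s)
    specificity = float(1 - np.mean(in_s(neg)))  # P(chi not in s)
    return sensitivity, specificity

# made-up data: positives take extreme values, negatives central ones
pos = [0.1, 0.2, 3.6, 3.9, 4.1]
neg = [1.5, 1.8, 2.0, 2.3, 2.7]
print(sens_spec_groc(pos, neg, xl=0.5, xu=3.0))  # -> (1.0, 1.0)
```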

Although final decisions are based on the underlying classification subsets, these are typically not depicted. This omission is not a shortcoming in standard cases since, for each specificity \(1-t \in [0,1]\), there is only one rule of the form \(s_t = (c_t, \infty)\) with such specificity. Particularly, \(s_t\) is univocally defined by \(c_t = F_\chi^{-1}(1-t)\); and the same applies if we fix a sensitivity. Nevertheless, if the gROC curve is taken, there are infinitely many subsets of the form \(s_t=(-\infty, x_t^L] \cup (x_t^U, \infty)\) resulting in \(\mathbb{P}(\chi \in s_t) = t\). This loss of univocity underlines the importance of reporting (numerically and/or graphically) the decision rules actually proposed for classifying. This gap is covered by the presented package.

An alternative approach to assessing the classification performance of a marker involves considering a transformation of it. This transformation \(h(\cdot)\) aims to capture differences in distribution between the two populations in the ROC sense. Once \(h(\cdot)\) is identified, the standard ROC curve for \(h(X)\) is represented, resulting in the efficient ROC (eROC) curve (Kauppi 2016). Arguing as before, for a fixed specificity, the classification subsets \(s_t=(c_t, \infty)\) in the transformed space are univocally defined, where a subject is classified as positive if \(h(x) \in s_t\) and negative otherwise (with \(x\) representing its marker value). However, they may have any shape in the original space, depending on the monotonicity of the functional transformation \(h(\cdot)\) (Martínez-Camblor et al. 2019). Tracking the decision rules underlying the eROC curve therefore enables an assessment of whether the improved accuracy of the marker justifies the potential loss in interpretability.

The ROC curve is defined for the evaluation of the classification accuracy of univariate markers. To deal with multivariate markers, the usual practice is to consider a transformation \(\boldsymbol{h}(\cdot)\) that reduces them to a univariate one, and then to construct the standard ROC curve. The same considerations as before apply when a functional transformation is taken. In the proposed R library, we consider methods from the literature to define and estimate \(\boldsymbol{h}(\cdot)\) in the multivariate case (Kang et al. 2016; Meisner et al. 2021).

Focusing on the classification subsets underlying the decision rules, the movieROC package incorporates methods to visualize the construction process of ROC curves by presenting the classification accuracy of these subsets. For univariate markers, the library includes both the classical (standard ROC curve) and the generalized (gROC curve) approach. In addition, it enables the display of decision rules for various transformations of the marker, seeking to maximize performance and allowing for flexibility in the final shape of the subsets (eROC curve). For multidimensional markers, the proposed tool visualizes the evolution of decision subsets when different objective functions are employed for optimization, even imposing restrictions on the underlying regions. In this case, displaying the decision rules associated with every specificity in a single static image is no longer feasible. Therefore, dynamic representations (videos) are implemented, drawing on time as an extra dimension to capture the variation in specificity.

Much software available in R could be discussed here, covering diverse topics related to ROC curves: the pROC package is a main reference, including tools to visualize, estimate and compare ROC curves (Robin et al. 2011); ROCnReg explicitly considers covariate information to estimate the covariate-specific and the covariate-adjusted ROC curves (Rodríguez-Álvarez and Inácio 2021); smoothROCtime implements smooth estimation of time-dependent ROC curves based on the bivariate kernel density estimator for \((X, \textit{time-to-event})\) (Díaz-Coto 2020); OptimalCutpoints includes point and interval estimation methods for optimal thresholds (López-Ratón et al. 2014); and nsROC performs non-standard analyses such as gROC estimation (Pérez-Fernández et al. 2018); among others.

This paper introduces and elucidates the diverse functionalities of the newly developed movieROC package, aimed at facilitating the visualization and comprehension of the decision rules underlying the binary classification process, encompassing various generalizations. Despite the availability of numerous R packages implementing related analyses, we have identified the main gaps covered by this library: tracking the decision rules underlying the ROC curve, including multivariate markers and non-standard (i.e., non-monotone) scenarios. The rest of the paper is structured as follows. In Section 2, we introduce the main R functions and objects implemented, and briefly explain the dataset employed throughout this manuscript to demonstrate the utility of the R library. Section 3 is devoted to reconsidering the definition of the standard ROC curve from the perspective of classification subsets, including an extension to multivariate scenarios. Sections 4 and 5 revisit the gROC curve and the eROC curve, respectively, covering various methods to capture the potential classification accuracy of the marker under study. Each of these sections begins with a state-of-the-art overview, followed by the main syntax of the corresponding R functions. In addition, examples of implementation using the dataset presented in Section 2.3 are provided. Finally, the paper concludes with a concise summary and computational details regarding the implemented tool.

2 Main functions of the movieROC package and illustrative dataset

Sections 2.1 and 2.2 provide a detailed description of the main objectives of the implemented R functions. To reflect the practical usage of the developed R package, we employ a real dataset throughout this manuscript, which is introduced in Section 2.3.

2.1 Functionality of the movieROC package

A graphical tool was developed to showcase static and dynamic graphics displaying the classification subsets derived from maximizing diagnostic accuracy under certain assumptions, while preserving interpretability. The R package facilitates the construction of the ROC curve across various specificities, providing visualizations of the resulting classification regions. The proposed tool comprises multiple R functions that generate objects with distinct class attributes (see the function names from which red arrows depart and the red nodes in Figure 1, respectively). Once the object of interest is created, different methods may be used in order to plot the underlying regions (plot_regions(), plot_funregions()), to track the resulting ROC curve (plot_buildROC(), plot()), to predict decision rules for a particular specificity, and to print relevant information, among others. The main function of the package, movieROC(), produces videos to exhibit the classification procedure.

Figure 1: R functions of the movieROC package. The blue nodes include the names of the R functions and the red nodes indicate the different R objects that can be created and worked with. The red arrows depart from those R functions engaged in creating R objects and the black arrows indicate which R functions can be applied to which R objects. The grey dashed arrows show internal dependencies.

The package includes algorithms to visualize the regions that underlie the binary classification problem, considering different approaches:

  • make the classification subsets flexible in order to cover non-standard scenarios, by considering two cut-off values (gROC() function); explained in Section 4;

  • transform the marker by a proper function \(h(\cdot)\) (hROC() function); introduced in Section 5;

  • when dealing with multivariate markers, consider a functional transformation with some fixed or dynamic parameters resulting from different methods available in the literature (multiROC() function); covered in Section 3.1.

2.2 Class methods for movieROC objects

By using the gROC(), the multiROC() or the hROC() function, the user obtains an R object of class ‘groc’, ‘multiroc’ or ‘hroc’, respectively. These will be called movieROC objects. Once the object of interest is created, the implemented package includes many functions (methods) that can be applied to it. Some of them are generic methods (print(), plot() and predict()), commonly used in the R language on different objects according to their class attributes. The rest of the functions are specific to this library and therefore only applicable to movieROC objects. The following outline summarizes all these functions and provides their purpose and main syntax (with default input parameters).

Generic functions

  • print(): Print some relevant information.

  • plot(): Plot the ROC curve estimate.

  • predict(): Print the classification subsets corresponding to a particular false-positive rate (FPR) introduced by the user. For a ‘groc’ object, the user may specify a cut-off value C (for the standard ROC curve) or two cut-off values XL and XU (for the gROC curve).

Specific functions

  • plot_regions()

    Applicable to a ‘groc’ or a ‘hroc’ object. Plot two graphics in the same figure: left, the classification subsets for each false-positive rate (grey by default); right, the \(90^\circ\) rotated ROC curve.

    Main syntax:

      plot_regions(obj, plot.roc = TRUE, plot.auc = FALSE, FPR = 0.15, ...)

    If the input parameter FPR is specified, the corresponding classification region reporting such a false-positive rate and the associated point on the ROC curve are highlighted in blue.

  • plot_funregions()

    Applicable to a ‘groc’ or a ‘hroc’ object. Plot the transforming function and the classification subsets reporting the false-positive rate(s) indicated in the input parameter(s) FPR and FPR2.

    Main syntax:

      plot_funregions(obj, FPR = 0.15, FPR2 = NULL, plot.subsets = TRUE, ...)
  • plot_buildROC()

    Applicable to a ‘groc’ or a ‘multiroc’ object.

    For a ‘groc’ object: Plot four (if the input reduce is FALSE) or two (if reduce is TRUE, only those on the top) graphics in the same figure: top-left, density function estimates for the marker in both populations, with the areas corresponding to the FPR and TPR colored (blue and red, respectively) for the optional input parameter FPR, C or XL, XU; top-right, the empirical ROC curve estimate; bottom-left, boxplots for both groups; bottom-right, the classification subsets for every FPR (grey).

    Main syntax:

      plot_buildROC(obj, FPR = 0.15, C, XL, XU, h = c(1,1),
                    histogram = FALSE, breaks = 15, reduce = TRUE,
                    build.process = FALSE, completeROC = FALSE,  ...)

    If build.process is FALSE, the whole ROC curve is displayed. Otherwise, if completeROC is TRUE, the portion of the ROC curve up to the fixed FPR is highlighted in black and the rest is shown in gray, while if completeROC is FALSE, only the first portion of the curve is displayed.

    For a ‘multiroc’ object: Plot two graphics in the same figure: right, the ROC curve highlighting the point and the threshold for the resulting univariate marker; left, a scatterplot with the marker values for both positive (red) and negative (blue) subjects. About the left graphic: for \(p=2\), it is drawn over the original/feature bivariate space; for \(p>2\), it is projected over two selected components of the marker (if display.method = "OV", with component selection in displayOV, c(1,2) by default) or over the first two principal components from PCA (if display.method = "PCA", the default). The classification subset reporting the FPR selected by the user (FPR \(\neq\) NULL) is displayed in gold.

    Main syntax:

    for \(p=2\):

      plot_buildROC(obj, FPR = 0.15,
                    build.process = FALSE, completeROC = TRUE,  ...)

    for \(p>2\):

      plot_buildROC(obj, FPR = 0.15,
                    display.method = c("PCA","OV"), displayOV = c(1,2),
                    build.process = FALSE, completeROC = TRUE,  ...)

    If build.process is FALSE, the whole ROC curve is displayed. Otherwise, if completeROC is TRUE, the portion of the ROC curve up to the fixed FPR is highlighted in black and the rest is shown in gray, while if completeROC is FALSE, only the first portion of the curve is shown.

  • movieROC()

    Applicable to a ‘groc’ or a ‘multiroc’ object. Save a video as a GIF illustrating the construction of the ROC curve.

    For a ‘groc’ object:

    Main syntax:

      movieROC(obj, fpr = NULL,
               h = c(1,1), histogram = FALSE, breaks = 15,
               reduce = TRUE, completeROC = FALSE, videobar = TRUE,
               file = "animation1.gif", ...)

    For each element in the vector fpr (an optional input parameter), the function executed is plot_buildROC(obj, FPR = fpr[i], build.process = TRUE, ...). The vector of false-positive rates illustrated in the video is NULL by default: if the length of the output parameter t of the gROC() function is lower than 150, that vector is taken as fpr; otherwise, an equally-spaced vector of length 100 covering the range of the marker values is considered.

    For a ‘multiroc’ object:

    Main syntax:

    for \(p=2\):

      movieROC(obj, fpr = NULL,
               file = "animation1.gif", save = TRUE,
               border = TRUE, completeROC = FALSE, ...)

    for \(p>2\):

      movieROC(obj, fpr = NULL,
               display.method = c("PCA","OV"), displayOV = c(1,2),
               file = "animation1.gif", save = TRUE,
               border = TRUE, completeROC = FALSE, ...)

    The video is saved by default as a GIF with the name indicated in the argument file (the extension .gif should be added). A border for the classification subsets is drawn by default.

    For each element in the vector fpr (an optional input parameter), the function executed is

    for \(p=2\):

      plot_buildROC(obj, FPR = fpr[i], build.process = TRUE, completeROC, ...)

    for \(p>2\):

      plot_buildROC(obj, FPR = fpr[i], build.process = TRUE, completeROC,
                    display.method, displayOV, ...)

    The same considerations about the input fpr apply as those for movieROC() over a ‘groc’ object.


2.3 Illustrative dataset

In order to illustrate the functionality of our R package, we consider the HCC data. This dataset is derived from gene expression arrays of tumor and adjacent non-tumor tissues of 62 Taiwanese cases of hepatocellular carcinoma (HCC). The goal of the original study (Shen et al. 2012) was to identify, with a genome-wide approach, additional genes hypermethylated in HCC that could be used for a more accurate analysis of plasma DNA for early diagnosis, by using Illumina methylation arrays (Illumina, Inc., San Diego, CA) that screen 27,578 autosomal CpG sites. The complete dataset was deposited in NCBI’s Gene Expression Omnibus (GEO) and is available through series accession number GSE37988 (www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE37988). It is included in the presented package (HCC dataset), selecting the 948 genes with complete information.

The following code loads the R package and the HCC dataset (see the vignette for the main structure).

R> library(movieROC)
R> data(HCC)

We selected the genes 20202438, 18384097, and 03515901. On the one hand, we chose the gene 03515901 as an example of a monotone relationship between the marker and the response, reporting a good ROC curve. On the other hand, relative gene expression intensities of the genes 20202438 and 18384097 tend to be more extreme in tissues with tumor than in those without it. These are non-standard cases, so if we limit ourselves to detecting “appropriate” genes on the basis of the standard ROC curve, they would not be chosen. However, extending the decision rules by means of the gROC curve, those genes may be considered as potential biomarkers to discriminate between the two groups. The R code estimating and displaying the probability density function for the gene expression intensities of the selected genes in each group (Figure 2) is included in the vignette.

Figure 2: Density histograms and kernel density estimates (lighter) for gene expression intensities of the genes 20202438, 18384097 and 03515901 in negative (non-tumor) and positive (tumor) tissues.

3 Regular ROC curve

Assuming that there exists a monotone relationship between the marker and the response, the regular, right-sided or standard ROC curve associated with the marker \(X\) considers classification subsets of the form \(s_t=(c_t,\infty)\). For each specificity \(1-t=\mathbb{P}(\chi \notin s_t) \in [0,1]\), also called the true-negative rate, there exists only one subset \(s_t\) reporting such specificity, and thus a particular sensitivity, also called the true-positive rate, \(\mathbb{P}(\xi \in s_t)\). This results in a simple correspondence between each point of the ROC curve \(\mathcal{R}_r(t) = 1-F_\xi \big(F_\chi^{-1}(1-t)\big)\) and its associated classification region \(s_t \in \mathcal{I}_r(t)\), where
\[\mathcal{I}_r(t) = \Big\{ s_t = (c_t, \infty) : c_t \in \mathcal{S}(X) , \mathbb{P}(\chi \in s_t) = t \Big\}\]
is the right-sided family of eligible classification subsets. The definition of this family captures the shape of the decision rules and the target specificity.

If higher values of the marker are associated with a higher probability of not having the characteristic (see gene 03515901 in Figure 2), the ROC curve would be defined by the left-sided family of eligible classification subsets (Martínez-Camblor et al. 2017), \(\mathcal{I}_l(t)\), analogous to \(\mathcal{I}_r(t)\) but with subsets of the form \(s_t = (-\infty, c_t]\). It results in \(\mathcal{R}_l(t) = F_\xi \big(F_\chi^{-1}(t) \big)\), \(t\in[0,1]\), and the decision rules are also univocally defined in this case.

The ROC curve and related problems have been widely studied in the literature; interested readers are referred to the monographs by Zhou et al. (2002), Pepe (2003), and Nakas et al. (2023), as well as the review by Inácio et al. (2021). By definition, the ROC curve is confined within the unit square, with optimal performance achieved when it approaches the upper-left corner (AUC closer to 1). Conversely, proximity to the main diagonal (AUC closer to 0.5) means diminished discriminatory ability, resembling a random classifier.

In practice, let \((\xi_1, \xi_2, \dots, \xi_n)\) and \((\chi_1, \chi_2, \dots, \chi_m)\) be two independent and identically distributed (i.i.d.) samples from the positive and the negative population, respectively. Different estimation procedures are implemented in the movieROC package, such as the empirical estimator (Hsieh and Turnbull 1996) (the default in the gROC() function), accompanied by its summary indices: the AUC and the Youden index (Youden 1950). Alternatively, semiparametric approaches based on kernel density estimation for the involved distributions may be considered (Zou et al. 1997). The plot_densityROC() function provides plots for both right- and left-sided ROC curves estimated by this method. On the other hand, assuming that the marker follows a Gaussian distribution in both populations, that is, \(\xi \sim \mathcal{N}(\mu_\xi, \sigma_\xi)\) and \(\chi \sim \mathcal{N}(\mu_\chi, \sigma_\chi)\), parametric approaches propose plug-in estimators by estimating the unknown parameters while using the known distributions (Hanley 1988). This parametric estimation is included in the gROC_param() function, which works similarly to gROC().

Main syntax:

  gROC(X, D, side = "right", ...)
  gROC_param(X, D, side = "right", ...)
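For reference, the empirical AUC admits the well-known Mann-Whitney form, and the Youden index is the maximum of sensitivity + specificity − 1 over cut-offs. A Python sketch of both quantities on made-up data (the package computes them in R through gROC()):

```python
import numpy as np

def empirical_auc(pos, neg):
    """Mann-Whitney form of the empirical AUC:
    P(xi > chi) + 0.5 * P(xi = chi) over all positive/negative pairs."""
    pos, neg = np.asarray(pos, float), np.asarray(neg, float)
    diff = pos[:, None] - neg[None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))

def youden(pos, neg):
    """Youden index: max over cut-offs of sensitivity + specificity - 1,
    scanning the observed marker values as candidate thresholds."""
    pos, neg = np.asarray(pos, float), np.asarray(neg, float)
    cuts = np.unique(np.concatenate([pos, neg]))
    j = [np.mean(pos > c) + np.mean(neg <= c) - 1 for c in cuts]
    return float(max(j))

pos = [2.1, 2.8, 3.0, 1.9, 2.5]
neg = [1.0, 1.4, 2.2, 0.8, 1.6]
print(round(empirical_auc(pos, neg), 3))  # -> 0.92
print(round(youden(pos, neg), 3))         # -> 0.8
```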

Table 1 in the vignette provides the main input and output parameters of these R functions, which estimate the regular ROC curve (right-sided or left-sided with side = "right" or "left", respectively) and the associated decision rules. Their output is an R object of class ‘groc’, to which the functions listed in Section 2.2 can be applied. Most of them are visualization tools, but the user may also print() summary information and predict() classification regions for a particular specificity.

Figure 3 graphically represents the empirical estimates of the standard (gray line) and generalized (black line) ROC curves for each gene in Figure 2. To construct the standard ROC curve, the right-sided curve is considered for the first two genes (20202438 and 18384097) and the left-sided curve for the third one (03515901). As expected from the discussion of Figure 2, the standard and gROC curves are similar for the third gene because there exists a monotone relationship between the marker and the response. However, these curves differ for the first two genes due to the lack of monotonicity in those scenarios. The empirical gROC curve estimator is explained in detail in Section 4.

The next chunk of code generates the figure, providing an example of the use of the gROC() function and plot(), and of how to access the AUC.

R> for(gene in c("20202438", "18384097", "03515901")){
+   roc <- gROC(X = HCC[,paste0("cg",gene)], D = HCC$tumor,
+               side = ifelse(gene == "03515901", "left", "right"))
+   plot(roc, col = "gray50", main = paste("Gene", gene), lwd = 3)
+   groc <- gROC(X = HCC[,paste0("cg",gene)], D = HCC$tumor, side = "both")
+   plot(groc, new = FALSE, lwd = 3)
+   legend("bottomright", paste(c("AUC =", "gAUC ="), format(c(roc$auc, groc$auc),
+          digits = 3)), col = c("gray50", "black"), lwd = 3, bty = "n", inset = .01)}

Figure 3: Standard ROC curve (in gray) and gROC curve (in black), empirically estimated, for the capacity of the genes 20202438, 18384097 and 03515901 to discriminate between the tumor and non-tumor groups.

The following code snippet estimates the standard ROC curve for gene 20202438, prints its basic information, and predicts the classification region and sensitivity resulting in a specificity of 0.9. It provides an illustrative example of the print() and predict() functions.

R> roc_selg1 <- gROC(X = HCC$cg20202438, D = HCC$tumor, side = "right")
R> roc_selg1
Data was encoded with nontumor (controls) and tumor (cases).
It is assumed that larger values of the marker indicate larger confidence that a
 given subject is a case.
There are 62 controls and 62 cases.
The specificity and sensitivity reported by the Youden index are 0.855 and 0.403,
 respectively, corresponding to the following classification subset: (0.799, Inf).
The area under the right-sided ROC curve (AUC) is 0.547.

R> predict(roc_selg1, FPR = .1)

$ClassSubsets           $Specificity        $Sensitivity
[1] 0.8063487   Inf     [1] 0.9032258       [1] 0.3064516

The following line of code displays the whole construction of the empirical standard ROC curve for gene 20202438. The video is saved by default as a GIF with the name provided.

R> movieROC(roc_selg1, reduce = FALSE, file = "StandardROC_gene20202438.gif")


3.1 Multivariate ROC curve

In practice, many cases may benefit from combining information from different markers to enhance classification accuracy. Rather than assessing univariate markers separately, taking the multivariate marker resulting from merging them can yield a relevant gain. However, note that the ROC curve and related indices are defined only for univariate markers, as they require the existence of a total order. To address this limitation, a common approach involves transforming the \(p\)-dimensional multivariate marker \(\boldsymbol{X}\) into a univariate one through a functional transformation \(\boldsymbol{h}: \mathbb{R}^p \longrightarrow \mathbb{R}\). This transformation \(\boldsymbol{h}(\cdot)\) seeks to optimize an objective function related to the classification accuracy, usually the AUC (Su and Liu 1993; McIntosh and Pepe 2002; Martínez-Camblor et al. 2019).

We enumerate the methods included in the proposed R tool through the multiROC() function (with the input parameter method), listed according to the objective function to optimize. Recall that the output of multiROC() is an object of class ‘multiroc’, containing information about the estimation of the ROC curve and subsets for multivariate scenarios. Table 2 in the vignette includes the usage of this function.

Main syntax:

  multiROC(X, D, method = "lrm",
           formula = 'D ~ X.1 + I(X.1^2) + X.2 + I(X.2^2) + I(X.1*X.2)', ...)

  1. The AUC: Different procedures to estimate the \(\boldsymbol{h}(\cdot)\) maximizing the AUC in the multidimensional case have been studied in the literature. Among all families of functions, linear combinations (\(\mathcal{L}_{\boldsymbol{\beta}}(\boldsymbol{X}) = \beta_1 X_1 + \dots + \beta_p X_p\)) are widely used due to their simplicity; an extensive review of the existing methods was conducted by Kang et al. (2016).

     • Computation: In the multiROC() function, fix the input parameter method = "fixedLinear" and set methodLinear to one of "SuLiu" (Su and Liu 1993), "PepeThompson" (Pepe and Thompson 2000), or "minmax" (Liu et al. 2011). The R function also admits quadratic combinations when \(p=2\), i.e. \(\mathcal{Q}_{\boldsymbol{\beta}}(\boldsymbol{X}) = \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2 + \beta_4 X_1^2 + \beta_5 X_2^2\), by fixing method = "fixedQuadratic" for a particular coefQuadratic \(= \boldsymbol{\beta} = (\beta_1, \dots, \beta_5)\).

  2. The risk score function logit\(\left\{ \mathbb{P}(D = 1 \, | \, \boldsymbol{X}) \right\}\): Our package allows the user to fit a logistic regression model (method = "lrm") considering any family of functions (linear or quadratic, with or without interactions, ...) by means of the input parameter formula.lrm. A stepwise regression model is fitted if stepModel = TRUE. Details are explained in Section 5.

  3. The sensitivity for a particular specificity:

     a. Considering the theoretical discussion about the search for the optimal transformation \(\boldsymbol{h}(\cdot)\) pointed out in Section 5, Martínez-Camblor et al. (2021a) proposed to estimate it by multivariate kernel density estimation for the positive and negative groups separately.

        • Computation: The multiROC() function integrates the estimation procedures for the bandwidth matrix developed by Duong (2007), by fixing method = "kernelOptimal" and choosing a proper method to estimate the bandwidth ("kernelOptimal.H").

     b. Mainly linear combinations have been explored to date in the scientific literature (Meisner et al. 2021; Pérez-Fernández et al. 2021). For a fixed specificity \(t\in[0,1]\), we seek the linear combination \(\mathcal{L}_{\boldsymbol{\beta}(t)}(\boldsymbol{X}) = \beta_1(t) X_1 + \dots + \beta_p(t) X_p\) maximizing the true-positive rate by considering standard subsets for the transformed marker. The coefficients \(\boldsymbol{\beta}(t)\) are called ‘dynamic parameters’ because they may be different for each \(t \in [0,1]\).

        • Computation: Since our objective is to display the ROC curve, \(\mathcal{L}_{\boldsymbol{\beta}(t)}(\boldsymbol{X})\) is estimated for every \(t\) in a grid of the unit interval, resulting in one \(\boldsymbol{\hat{\beta}}(t)\) for each \(t\). This approach is time-consuming, especially when it is based on the plug-in empirical estimators involved (method = "dynamicEmpirical", only implemented for \(p=2\)), and may result in overfitting. Instead, the Meisner et al. (2021) method is recommended (method = "dynamicMeisner").
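As a rough illustration of the fixed-linear idea, under multivariate normality the combination studied by Su and Liu (1993) weights the markers by the inverse of the summed covariance matrices times the mean difference. The sketch below (Python, simulated data) follows that formula; it illustrates the principle and is not the package's implementation:

```python
import numpy as np

def best_linear_coefficients(X_pos, X_neg):
    """Normal-theory best linear combination (in the spirit of Su and Liu 1993):
    beta proportional to (Sigma_pos + Sigma_neg)^{-1} (mu_pos - mu_neg).
    The combined univariate marker is then h(x) = beta @ x."""
    X_pos, X_neg = np.asarray(X_pos, float), np.asarray(X_neg, float)
    S = np.cov(X_pos, rowvar=False) + np.cov(X_neg, rowvar=False)
    return np.linalg.solve(S, X_pos.mean(axis=0) - X_neg.mean(axis=0))

rng = np.random.default_rng(0)
X_neg = rng.normal(0.0, 1.0, size=(60, 2))  # simulated controls
X_pos = rng.normal(1.0, 1.0, size=(60, 2))  # simulated cases, shifted upwards
beta = best_linear_coefficients(X_pos, X_neg)
scores_pos, scores_neg = X_pos @ beta, X_neg @ beta  # combined univariate marker
print(scores_pos.mean() > scores_neg.mean())  # -> True
```

Note that the separation of the combined scores is guaranteed in-sample, since the mean difference along beta equals a positive-definite quadratic form in the estimated mean difference.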

Once the classification subsets for a multivariate marker are constructed by the multiROC() function, several R methods may be applied to the output object (see Section 2.2). These include printing relevant information or plotting the resulting ROC curve. The main contribution of the package is to plot the construction of the ROC curve together with the classification subsets, either in a static figure for a particular FPR (plot_buildROC() function) or in a video tracking the whole process (movieROC() function).

Figure 4 illustrates the videos resulting from the movieROC() function. In particular, the classification accuracy of the bivariate marker (cg20202438, cg18384097) was studied by using the four different approaches indicated in the captions, considering linear combinations (top) and nonlinear transformations (bottom). This figure was generated by the code below, integrating the multiROC() and movieROC() functions. Four videos are saved as GIF files with the names "PepeTh.gif" (a), "Meisner.gif" (b), "LRM.gif" (c), and "KernelDens.gif" (d).

R> X <- HCC[ ,c("cg20202438", "cg18384097")]; D <- HCC$tumor
R> biroc_12_PT <- multiROC(X, D, method = "fixedLinear", methodLinear = "PepeThompson")
R> biroc_12_Meis <- multiROC(X, D, method = "dynamicMeisner", verbose = TRUE)
R> biroc_12_lrm <- multiROC(X, D)
R> biroc_12_kernel <- multiROC(X, D, method = "kernelOptimal")
R> list_biroc <- list(PepeTh = biroc_12_PT, Meisner = biroc_12_Meis,
+           LRM = biroc_12_lrm, KernelDens = biroc_12_kernel)
R> lapply(names(list_biroc), function(x) movieROC(list_biroc[[x]],
+           display.method = "OV", xlab = "Gene 20202438", ylab = "Gene 18384097",
+           cex = 1.2, alpha.points = 1, lwd.curve = 4, file = paste0(x, ".gif")))
(a) Linear combinations with fixed parameters by Pepe and Thompson (2000).
(b) Linear combinations with dynamic parameters by Meisner et al. (2021).
(c) Logistic regression model with quadratic formula by default (see formula.lrm in Table 2 of the vignette).
(d) Optimal transformation by multivariate kernel density estimation with "Hbcv" method by default (Martı́nez-Camblor et al. 2021a).

Figure 4: Videos (from the movieROC() function) of the classification procedure and ROC curve for the bivariate marker (cg20202438, cg18384097). Four different methods for classification are displayed.


When the marker has a dimension higher than two, it is difficult to visualize the data and the classification regions. Therefore, the movieROC() function offers two options for showing the results, both on a two-dimensional space. On the one hand, the user may choose two components of the multivariate marker and project the classification subsets onto the plane they define (Figure 5, middle). On the other hand, the classification regions may be projected onto the plane defined by the first two principal components (Figure 5, left). The R function prcomp() from stats is used to perform Principal Component Analysis (PCA) (Hotelling 1933).
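To make the PCA projection concrete, the following sketch (written in Python purely for illustration; the data are hypothetical stand-ins for a three-gene marker matrix) reproduces the computation behind prcomp(): center the columns and project onto the first two right singular vectors.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(124, 3))      # hypothetical n x p marker matrix (p = 3)

Xc = X - X.mean(axis=0)            # column-centering, as prcomp() does by default
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt[:2].T             # coordinates of each subject on the first two PCs

# Singular values are returned in decreasing order, so these two
# components carry the largest share of the total variance.
print(scores.shape)
```

In the package itself this projection is handled internally by plot_buildROC() and movieROC(); the sketch only clarifies what the left panel of Figure 5 represents.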


Figure 5 shows the difficulty of displaying the decision rules when \(p>2\) (the three genes used throughout this manuscript), even with the two options implemented in our package. It was generated using multiROC() and plot_buildROC():

R> multiroc_PT <- multiROC(X = HCC[ ,c("cg20202438", "cg18384097", "cg03515901")],
+                 D = HCC$tumor, method = "fixedLinear", methodLinear = "PepeThompson")
R> multiroc_PT
Data was encoded with nontumor (controls) and tumor (cases).
There are 62 controls and 62 cases.
A total of 3 variables have been considered.
A linear combination with fixed parameters estimated by PepeThompson approach has
 been considered.
The specificity and sensitivity reported by the Youden index are 0.855 and 0.742,
 respectively, corresponding to the cut-off point -0.0755 for the transformation
 h(X) =  0.81*cg20202438 - 0.1*cg18384097 - 1*cg03515901.
The area under the ROC curve (AUC) is 0.811.
R> plot_buildROC(multiroc_PT, cex = 1.2, lwd.curve = 4)
R> plot_buildROC(multiroc_PT, display.method = "OV", displayOV = c(1,3), cex = 1.2,
+                xlab = "Gene 20202438", ylab = "Gene 03515901", lwd.curve = 4)

Figure 5: Multivariate ROC curve estimation for the simultaneous diagnostic accuracy of genes 20202438, 18384097 and 03515901. The Pepe and Thompson (2000) approach was used to estimate the linear coefficients, and the classification rules (yellow and gray border for the positive and negative class, respectively) for an FPR of 0.15 are displayed. Left, projected onto the first two principal components from PCA; middle, onto the first and third selected genes.


4 Generalized ROC curve


There are scenarios in which the standard ROC curve is not concave (first two genes in Figure 3, gray solid line), reflecting that the standard initial assumption of a monotone relationship between the marker and the response is misleading. In Figure 2, we can see that the difference in the gene 20202438 distribution between tissues with and without the characteristic lies mainly in the dispersion. To accommodate this common type of scenario, Martı́nez-Camblor et al. (2017) extended the ROC curve definition to the case where both extremes of the marker values are associated with a higher risk of having the characteristic of interest, by considering the both-sided family of eligible classification subsets:
\[\mathcal{I}_g(t) = \Big\{ s_t = (-\infty,x_t^L] \cup (x_t^U, \infty) : x_t^L \leq x_t^U \in \mathcal{S}(X), \; \mathbb{P}(\chi \in s_t) = t \Big\}.\]


It is crucial to consider the supremum in the definition of the generalized ROC curve because the decision rule for each \(t \in [0,1]\) is not univocally defined: there exist infinitely many pairs \(x_t^L \leq x_t^U\) reporting a specificity of \(1-t\) (i.e., \(\mathcal{I}_g(t)\) is uncountably infinite). Computationally, this optimization makes the estimation time-consuming, depending on the number of distinct marker values in the sample.
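The supremum in this definition can be made concrete with a brute-force sketch (Python for illustration, synthetic data): for a fixed FPR \(t\), scan all pairs of candidate cutoffs \(x_t^L \leq x_t^U\) whose two-sided subset has empirical FPR at most \(t\), and keep the pair with the largest sensitivity. This only mirrors the definition; the package implements a much more efficient search.

```python
import numpy as np

def groc_point(neg, pos, t):
    """Empirical gROC value at FPR t for subsets (-inf, xL] U (xU, inf)."""
    cuts = np.unique(np.concatenate([neg, pos, [-np.inf, np.inf]]))
    best = 0.0
    for i, xl in enumerate(cuts):
        for xu in cuts[i:]:                              # enforce xL <= xU
            if np.mean((neg <= xl) | (neg > xu)) <= t:   # empirical FPR
                best = max(best, np.mean((pos <= xl) | (pos > xu)))
    return best

rng = np.random.default_rng(1)
neg = rng.normal(0, 1, 100)   # controls
pos = rng.normal(0, 3, 100)   # cases differ from controls mainly in dispersion
print(groc_point(neg, pos, 0.1))
```

Because the two-sided family contains all one-sided subsets \((x_t^U, \infty)\) as the special case \(x_t^L = -\infty\), the gROC curve always lies above the standard empirical ROC curve.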


After the introduction of this extension, several studies followed up on the estimation of the gROC curve (Martı́nez-Camblor et al. 2017; Martı́nez-Camblor and Pardo-Fernández 2019a) and related measures such as its area (gAUC) (Martı́nez-Camblor et al. 2021b) and the Youden index (Martı́nez-Camblor and Pardo-Fernández 2019b; Bantis et al. 2021). Under this generalization, another property of the classification subsets may be lost: the regions may no longer be nested as the false-positive rate increases. It may happen that a subject is classified as positive for a particular FPR \(t_1\) but as negative for a higher FPR \(t_2\). Therefore, it is natural to establish a restriction (C) on the classification subsets, ensuring that any subject classified as positive for a fixed specificity (or sensitivity) will also be classified as positive for any classification subset with lower specificity (higher sensitivity). Pérez-Fernández et al. (2021) proposed an algorithm to estimate the gROC curve under restriction (C), included in the gROC() function of the presented R package. See the final section for computational details about the implementation of this algorithm.
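The nestedness required by restriction (C) can be illustrated with a toy post-processing step (Python, hypothetical cutoffs; this is not the estimation algorithm of Pérez-Fernández et al. (2021), only a demonstration of the constraint itself): for subsets \(s_t=(-\infty,x_t^L] \cup (x_t^U,\infty)\) indexed by increasing FPR, nestedness means \(x_t^L\) may never decrease and \(x_t^U\) may never increase.

```python
import numpy as np

def enforce_nested(x_lower, x_upper):
    """Force subsets (-inf, x_lower[i]] U (x_upper[i], inf), ordered by
    increasing FPR, to be nested: the region may only grow."""
    return np.maximum.accumulate(x_lower), np.minimum.accumulate(x_upper)

# Hypothetical unconstrained cutoffs for four increasing FPR values
xl = np.array([0.2, 0.1, 0.4, 0.3])
xu = np.array([0.9, 1.0, 0.7, 0.8])
xl_c, xu_c = enforce_nested(xl, xu)
print(xl_c, xu_c)
```

Running max/min only enforces the constraint on given cutoffs; gROC() instead searches for the nested family that maximizes the curve.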


Main syntax:

gROC(X, D, side = "both", ...)
gROC_param(X, D, side = "both", ...)

Table 1 in the vignette collects the input and output parameters of the gROC() function, which estimates the gROC curve both in the direction mentioned above (side = "both") and in the opposite one, i.e., when classification subsets of the form \(s_t=(x_t^L, x_t^U]\) are considered (side = "both2"). In addition, all the methods for a ‘groc’ object collected in Section 2.2 may be used in this general scenario.


In the following, the diagnostic accuracy of the gene 20202438 expression intensity is evaluated by the gROC curve without restrictions (groc_selg1 object) and under restriction (C) (groc_selg1_C object). The classification subsets and the sensitivity for a specificity of \(0.9\) are displayed with the predict() function.

R> groc_selg1 <- gROC(X = HCC$cg20202438, D = HCC$tumor, side = "both")
R> predict(groc_selg1, FPR = .1)

$ClassSubsets                     $Specificity      $Sensitivity
          [,1]      [,2]        [1] 0.9032258       [1] 0.4032258
[1,]      -Inf 0.7180623
[2,] 0.8296072       Inf
R> groc_selg1_C <- gROC(X = HCC$cg20202438, D = HCC$tumor, side = "both",
+              restric = TRUE, optim = TRUE)

All the classification regions underlying the standard and generalized ROC curves, without and with restrictions, are represented in Figure 6. The following code was used to generate the figure, illustrating the usage and output of the plot_regions() function. Besides displaying the classification regions underlying every specificity (in gray), the one chosen by the user (FPR = 0.15 by default) is highlighted in blue. Note that the ROC curves are rotated \(90^\circ\) to the right in order to use the vertical axis for the FPR in both plots.

R> plot_regions(roc_selg1, cex.legend = 1.5, plot.auc = TRUE,
+       main = "Standard right-sided assumption [Classification subsets]")
R> plot_regions(groc_selg1, plot.auc = TRUE, legend = F,
+       main.plotroc = "gROC curve",
+       main = "General approach [Classification subsets]")
R> plot_regions(groc_selg1_C, plot.auc = TRUE, legend = F,
+       main.plotroc = "gROC curve",
+       main = "General approach with restriction (C) [Classific. subsets]",
+       xlab = "Gene 20202438 expression intensity")

Figure 6: Classification regions and the ROC curve (rotated 90º) for the evaluation of the gene 20202438 expression intensity assuming i) the standard scenario (top), ii) the generalized scenario without restrictions (middle), and iii) the generalized scenario under restriction (C) on the subsets (bottom).


The gain achieved by considering the generalized scenario for this marker, which better fits its distribution in each group, is clear. The standard estimated AUC is 0.547, while the gAUC increases to 0.765. The gAUC is not especially affected by imposing restriction (C), resulting in 0.762.


5 Efficient ROC curve: pursuing an optimal transformation


By keeping classification subsets of the form \(s_t = (c_t, \infty)\), an alternative approach can be explored: transforming the univariate marker through a suitable function \(h: \mathbb{R} \longrightarrow \mathbb{R}\) to enhance its accuracy. Henceforth, the transformation \(h^*(\cdot)\) whose ROC curve dominates that of any other function (i.e., \(\mathcal{R}_{h^*}(\cdot) \geq \mathcal{R}_h(\cdot)\)) will be referred to as the optimal transformation (in the ROC sense), and the resulting ROC curve is called the eROC curve (Kauppi 2016). Following the well-known Neyman–Pearson lemma, McIntosh and Pepe (2002) proved that \(h^*(\cdot)\) is the likelihood ratio.
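This optimality can be checked numerically. The sketch below (Python, synthetic data; not the package's implementation) estimates the likelihood ratio with plain Gaussian kernel density estimates and compares the AUC of the raw marker with that of the transformed marker in a dispersion-only scenario, where the raw ROC curve is nearly useless.

```python
import numpy as np

def kde(sample, x, h=0.5):
    """Plain Gaussian kernel density estimate of `sample`, evaluated at `x`."""
    z = (x[:, None] - sample[None, :]) / h
    return np.exp(-0.5 * z**2).mean(axis=1) / (h * np.sqrt(2 * np.pi))

def auc(neg, pos):
    """Empirical AUC via the Mann-Whitney statistic."""
    return float((pos[:, None] > neg[None, :]).mean()
                 + 0.5 * (pos[:, None] == neg[None, :]).mean())

rng = np.random.default_rng(2)
neg = rng.normal(0, 1, 400)      # controls
pos = rng.normal(0, 3, 400)      # cases: same mean, larger dispersion

x = np.concatenate([neg, pos])
lr = kde(pos, x) / kde(neg, x)   # estimated likelihood ratio h*(x)

auc_raw = auc(neg, pos)          # close to 0.5: the raw marker barely separates
auc_lr = auc(lr[:400], lr[400:]) # markedly higher after the transformation
print(round(auc_raw, 3), round(auc_lr, 3))
```

Any monotone increasing function of the likelihood ratio (for instance, the logit of the risk function) yields the same eROC curve, which is what the logistic-regression approach below exploits.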


We enumerate the methods included in the proposed R tool via the hROC() function (with the input parameter type), listed according to the procedure considered to estimate \(h^*(\cdot)\). The output of this function is an object of class ‘hroc’. See Table 3 in the vignette for details on function usage and output.


Main syntax:

hROC(X, D, type = "lrm", formula.lrm = 'D ~ pol(X,3)', ...)
  1. Martı́nez-Camblor et al. (2019) exploited the result proved by McIntosh and Pepe (2002), suggesting to estimate the logit of the risk function by logistic regression, since it is a monotone increasing transformation of the likelihood ratio.

  • Computation: With the proposed R tool, the user can define any transformation \(h(\cdot)\) for the right-hand side of the logistic regression model to be fitted, logit\(\big\{ \mathbb{P}(D = 1 \, | \, x) \big\} = h(x)\), by fixing the input parameter type = "lrm" and providing the function \(h(\cdot)\) as formula.lrm.

  2. Arguing as in Martı́nez-Camblor et al. (2021a), for univariate rather than multivariate markers, the optimal transformation in the ROC sense is equivalent to \(h^*(\cdot)=f_\xi(\cdot)/\big(f_\xi(\cdot) + f_\chi(\cdot)\big)\), where \(f(\cdot)\) denotes the density function. In order to estimate \(h^*(\cdot)\), different estimation procedures for the two density functions may be used separately, such as the kernel density estimator.

  • Computation: In the hROC() function, the user may fix type = "kernel" and choose a proper bandwidth for the kernel estimation via kernel.h in order to compute this method.

  3. Martı́nez-Camblor et al. (2019) also included the estimation of the overfitting function, \(h_{of}(\cdot)\), defined as the optimal transformation when no restrictions on the shape of \(h^*(\cdot)\) are imposed. It takes the value 1 for the positive marker values and 0 for the negative ones, reporting an estimated AUC of 1, but depending entirely on the available sample (the resulting rules cannot be extrapolated).

  • Computation: \(h_{of}(\cdot)\) may be estimated by fixing the input parameter type = "overfitting".

The following code and figures study the capacity to improve the classification performance of the gene 18384097 expression intensity via the above functional transformations, and its impact on the final decision rules. The first approach considers an ordinary cubic polynomial formula (hroc_cubic_selg2) and a linear tail-restricted cubic spline (hroc_rcs_selg2) for the right-hand side of the logistic regression model. The second uses two different bandwidths (\(h=1\) and \(h=3\)) for the density function estimation. For comparison, the last one estimates the gROC curve under restriction (C).

R> X <- HCC$cg18384097; D <- HCC$tumor
R> hroc_cubic_selg2 <- hROC(X, D); hroc_cubic_selg2
Data was encoded with nontumor (controls) and tumor (cases).
There are 62 controls and 62 cases.
A logistic regression model of the form D ~ pol(X,3) has been performed.
The estimated parameters of the model are the following:
 Intercept          X        X^2        X^3
   "1.551"   "32.054" "-120.713"  "100.449"
The specificity and sensitivity reported by the Youden index are 0.935 and 0.532,
 respectively, corresponding to the following classification subset:
 (-Inf, 0.442) U (0.78, Inf).
The area under the ROC curve (AUC) is 0.759.
R> hroc_rcs_selg2 <- hROC(X, D, formula.lrm = "D ~ rcs(X,8)")
R> hroc_lkr1_selg2 <- hROC(X, D, type = "kernel")
R> hroc_lkr3_selg2 <- hROC(X, D, type = "kernel", kernel.h = 3)
R> hroc_overfit_selg2 <- hROC(X, D, type = "overfitting")

R> groc_selg2_C <- gROC(X, D, side = "both", restric = TRUE, optim = TRUE)

The following code snippet compares the AUC achieved by each approach considered above:

R> list_hroc <- list(Cubic = hroc_cubic_selg2, Splines = hroc_rcs_selg2,
+               Overfit = hroc_overfit_selg2, LikRatioEst_h3 = hroc_lkr3_selg2,
+               LikRatioEst_h1 = hroc_lkr1_selg2, gAUC_restC = groc_selg2_C)

R> AUCs <- sapply(list_hroc, function(x) x$auc)
R> round(AUCs, 3)

         Cubic        Splines        Overfit LikRatioEst_h3 LikRatioEst_h1     gAUC_restC
         0.759          0.807          1.000          0.781          0.799          0.836

The shape of the classification regions over the original space \(\mathcal{S}(X)\) depends on the monotonicity of \(h^*(\cdot)\), which may be graphically studied with the plot_funregions() function (see Figure 7). The regions themselves can be visualized with the plot_regions() function (see Figure 8). Both are explained in Section 2.2 and illustrated below. The next chunk of code produced Figure 7, representing the different functional transformations estimated previously:

R> lapply(list_hroc, function(x) plot_funregions(x, FPR = .15, FPR2 = .5))

Figure 7: Different functional transformations and the resulting classification subsets for gene 18384097. Rules for FPR 0.15 (blue) and 0.50 (red) are highlighted. Top, from left to right: cubic polynomial function, restricted cubic splines (with 8 knots), and overfitted transformation. Bottom: likelihood ratio estimation with bandwidths 3 (left) and 1 (middle), and the transformation resulting in the gROC curve under restriction (C).


Finally, using the plot_regions() function, Figure 8 shows the resulting classification subsets over the original space for the best two of the six methods above. The first method (fitting a logistic regression model with restricted cubic splines with 8 knots) reports an AUC of 0.807 (compared to 0.684 for the standard ROC curve), but the shape of some classification rules is complex, such as \(s_t=(-\infty,a_t] \cup (b_t,c_t] \cup (d_t,\infty)\). This area increases to 0.836 when considering subsets of the form \(s_t=(-\infty,x_t^L] \cup (x_t^U,\infty)\), even imposing restriction (C) to obtain a functional transformation \(h(\cdot)\).


Figure 8: Classification regions and the resulting ROC curve (rotated 90º) for gene 18384097. Top, ROC curve for the restricted cubic splines transformation with 8 knots; bottom, gROC curve under restriction (C) for the original marker.


6 Summary and conclusion


Conducting binary classification using continuous markers requires the establishment of decision rules. In the standard case, each specificity \(t \in [0,1]\) entails a univocally defined classification subset of the form \(s_t = (c_t,\infty)\). However, in more complex situations, such as a non-monotone relationship between the marker and the response or multivariate scenarios, the decision rules are no longer clear-cut, and their visualization becomes crucial. To address this, the movieROC package incorporates novel visualization tools complementing the ROC curve representation.


This R package offers a user-friendly and easily comprehensible software solution tailored to practical researchers. It implements statistical techniques to estimate, compare, and graphically represent different classification procedures. While several R packages address ROC curve estimation, the proposed one emphasizes the classification process, tracking the decision rules underlying the binary classification problem under study. The tool incorporates different considerations and transformations which may help capture the potential of the marker to classify in non-standard scenarios. Nevertheless, the library is also useful in standard cases, as well as when the marker itself comes from a classification or regression method (such as support vector machines), because it provides informative visualizations and additional information not usually reported with the ROC curve.


The main function of the package, movieROC(), allows users to monitor how the resulting classification subsets change across specificities, thereby building the corresponding ROC curve. Notably, it introduces time as a third dimension to track those specificities, generating informative videos. For interested readers or potential users of movieROC, the manual available on CRAN provides complete information about the implemented functions and their parameters. In addition, a vignette is available, including the mathematical formalism and details about the implemented algorithms.


Computational considerations


Dependencies


Some functions of our package depend on other libraries available on CRAN:

  • gROC(X, D, side = "both", restric = TRUE, optim = TRUE, ...) uses the allShortestPaths() function in the e1071 package (Meyer et al. 2023).

  • hROC(X, D, type = "lrm", ...) and multiROC(X, D, method = "lrm", ...) use the lrm() function in the rms package (Harrell Jr 2023).

  • multiROC(X, D, method = "kernelOptimal", ...) uses the kde() function in the ks package (Duong 2023).

  • multiROC(X, D, method = "dynamicMeisner", ...) uses the maxTPR() function in the maxTPR package (Meisner et al. 2021). This package was removed from the CRAN repository, so we integrated the code of the maxTPR() function into our package. This function uses Rsolnp::solnp() and robustbase::BYlogreg().

  • multiROC(X, D, method = "fixedLinear", methodLinear, ...) uses the R functions included in Kang et al. (2016) (Appendix). We integrated this code into our package.

  • movieROC(obj, save = TRUE, ...) uses the saveGIF() function in the animation package (Xie et al. 2021).

Limitations


Users should be aware of certain limitations while working with this package:

  • Some methods are potentially time-consuming, especially with medium to large sample sizes:

    • The estimation of the gROC curve under restriction (C) can be computationally intensive, especially when considering different FPRs to locally optimize the search using gROC(X, D, side = "both", restric = TRUE, optim = TRUE, t0max = TRUE). Note that this method involves a fairly exhaustive search of the self-contained classification subsets leading to the optimal gROC curve estimate. However, even when selecting different false-positive rates \(t_0\) to start from, it may not reach the optimal achievable estimate under restriction (C). The input parameters restric, optim, t0 and t0max of the gROC() function, included in Table 1 of the vignette, control this search.

    • The same applies to multivariate markers when considering linear frontiers with dynamic parameters (multiROC(X, D, method = "dynamicMeisner") or multiROC(X, D, method = "dynamicEmpirical")).

  • Most implemented R functions consider empirical estimation for the resulting ROC curve, even if the procedure to estimate the decision rules is semi-parametric. An exception is the gROC_param() function, which accommodates the binormal scenario.

  • When visualizing classification regions for multivariate markers of high dimension (plot_buildROC() and movieROC() functions for a ‘multiroc’ object), our package provides some alternatives, but additional improvements could further aid interpretation.

Acknowledgements


The authors acknowledge support from Grants PID2019-104486GB-I00 and PID2020-118101GB-I00 from the Ministerio de Ciencia e Innovación (Spanish Government), and from an excellence mobility grant for lecturers and researchers funded by the University of Oviedo in collaboration with Banco Santander.


6.1 CRAN packages used


movieROC, pROC, ROCnReg, OptimalCutpoints, nsROC, rms, ks, e1071, animation


6.2 CRAN Task Views implied by cited packages


Cluster, Distributions, DynamicVisualizations, Econometrics, Environmetrics, MachineLearning, Psychometrics, ReproducibleResearch, Survival, TeachingStatistics


6.3 Note


This article is converted from a Legacy LaTeX article using the texor package. The pdf version is the official version. To report a problem with the html, refer to CONTRIBUTE on the R Journal homepage.

L. E. Bantis, J. V. Tsimikas, G. R. Chambers, M. Capello, S. Hanash and Z. Feng. The length of the receiver operating characteristic curve and the two cutoff Youden index within a robust framework for discovery, evaluation, and cutoff estimation in biomarker studies involving improper receiver operating characteristic curves. Statistics in Medicine, 40(7): 1767–1789, 2021. URL https://doi.org/10.1002/sim.8869.

S. Dı́az-Coto. smoothROCtime: an R package for time-dependent ROC curve estimation. Computational Statistics, 35: 1231–1251, 2020. URL https://doi.org/10.1007/s00180-020-00955-7.

D. D. Dorfman, K. S. Berbaum, C. E. Metz, R. V. Lenth, J. A. Hanley and H. A. Dagga. Proper receiver operating characteristic analysis: The bigamma model. Academic Radiology, 4(2): 138–149, 1997. URL https://doi.org/10.1016/s1076-6332(97)80013-x.

T. Duong. ks: Kernel density estimation and kernel discriminant analysis for multivariate data in R. Journal of Statistical Software, 21(7): 1–16, 2007. URL https://doi.org/10.18637/jss.v021.i07.

T. Duong. ks: Kernel smoothing. 2023. URL https://CRAN.R-project.org/package=ks. R package version 1.14.1.

J. A. Hanley. The robustness of the "binormal" assumptions used in fitting ROC curves. Medical Decision Making, 8(3): 197–203, 1988. URL https://doi.org/10.1177/0272989X8800800308. PMID: 3398748.

J. A. Hanley and B. J. McNeil. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology, 143(1): 29–36, 1982. URL https://doi.org/10.1148/radiology.143.1.7063747.

F. E. Harrell Jr. rms: Regression modeling strategies. 2023. URL https://CRAN.R-project.org/package=rms. R package version 6.7-1.

H. Hotelling. Analysis of a complex of statistical variables into principal components. Journal of Educational Psychology, 24(6): 417–441, 1933. URL https://doi.org/10.1037/h0071325.

F. Hsieh and B. W. Turnbull. Nonparametric and semiparametric estimation of the receiver operating characteristic curve. The Annals of Statistics, 24(1): 25–40, 1996. URL https://doi.org/10.1214/aos/1033066197.

V. Inácio, M. X. Rodrı́guez-Álvarez and P. Gayoso-Diz. Statistical evaluation of medical tests. Annual Review of Statistics and Its Application, 8(1): 41–67, 2021. URL https://doi.org/10.1146/annurev-statistics-040720-022432.

L. Kang, A. Liu and L. Tian. Linear combination methods to improve diagnostic/prognostic accuracy on future observations. Statistical Methods in Medical Research, 25(4): 1359–1380, 2016. URL https://doi.org/10.1177/0962280213481053.

H. Kauppi. The generalized receiver operating characteristic curve. Aboa Centre for Economics Discussion Paper 114, 2016. URL https://www.econstor.eu/bitstream/10419/233329/1/aboa-ce-dp114.pdf.

C. Liu, A. Liu and S. Halabi. A min–max combination of biomarkers to improve diagnostic accuracy. Statistics in Medicine, 30(16): 2005–2014, 2011. URL https://doi.org/10.1002/sim.4238.

M. López-Ratón, M. X. Rodrı́guez-Álvarez, C. C. Suárez and F. G. Sampedro. OptimalCutpoints: An R package for selecting optimal cutpoints in diagnostic tests. Journal of Statistical Software, 61(8): 1–36, 2014. URL https://doi.org/10.18637/jss.v061.i08.

P. Martı́nez-Camblor. On the paper "Notes on the overlap measure as an alternative to the Youden index". Statistics in Medicine, 37(7): 1222–1224, 2018. URL https://doi.org/10.1002/sim.7517.

P. Martı́nez-Camblor, N. Corral, C. Rey, J. Pascual and E. Cernuda-Morollón. Receiver operating characteristic curve generalization for non-monotone relationships. Statistical Methods in Medical Research, 26(1): 113–123, 2017. URL https://doi.org/10.1177/0962280214541095.

P. Martı́nez-Camblor and J. C. Pardo-Fernández. Parametric estimates for the receiver operating characteristic curve generalization for non-monotone relationships. Statistical Methods in Medical Research, 28(7): 2032–2048, 2019a. URL https://doi.org/10.1177/0962280217747009.

P. Martı́nez-Camblor and J. C. Pardo-Fernández. The Youden index in the generalized receiver operating characteristic curve context. The International Journal of Biostatistics, 15(1): 2019b. URL https://doi.org/10.1515/ijb-2018-0060.

P. Martı́nez-Camblor, S. Pérez-Fernández and S. Dı́az-Coto. Improving the biomarker diagnostic capacity via functional transformations. Journal of Applied Statistics, 46(9): 1550–1566, 2019. URL https://doi.org/10.1080/02664763.2018.1554628.

P. Martı́nez-Camblor, S. Pérez-Fernández and S. Dı́az-Coto. Optimal classification scores based on multivariate marker transformations. AStA Advances in Statistical Analysis, 105(4): 581–599, 2021a. URL https://doi.org/10.1007/s10182-020-00388-z.

P. Martı́nez-Camblor, S. Pérez-Fernández and S. Dı́az-Coto. The area under the generalized receiver-operating characteristic curve. The International Journal of Biostatistics, 18(1): 293–306, 2021b. URL https://doi.org/10.1515/ijb-2020-0091.

R. Mayeux. Biomarkers: Potential uses and limitations. NeuroRx, 1(2): 182–188, 2004. URL https://doi.org/10.1602/neurorx.1.2.182.

M. W. McIntosh and M. S. Pepe. Combining several screening tests: Optimality of the risk score. Biometrics, 58(3): 657–664, 2002. URL https://doi.org/10.1111/j.0006-341X.2002.00657.x.

A. Meisner, M. Carone, M. S. Pepe and K. F. Kerr. Combining biomarkers by maximizing the true positive rate for a fixed false positive rate. Biometrical Journal, 63(6): 1223–1240, 2021. URL https://doi.org/10.1002/bimj.202000210.

D. Meyer, E. Dimitriadou, K. Hornik, A. Weingessel and F. Leisch. e1071: Misc functions of the Department of Statistics, Probability Theory Group (formerly: E1071), TU Wien. 2023. URL https://CRAN.R-project.org/package=e1071. R package version 1.7-13.

C. T. Nakas, L. E. Bantis and C. Gatsonis. ROC analysis for classification and prediction in practice. CRC Press, 2023. URL https://www.routledge.com/ROC-Analysis-for-Classification-and-Prediction-in-Practice/Nakas-Bantis-Gatsonis/p/book/9781482233704.

M. S. Pepe. The statistical evaluation of medical tests for classification and prediction. Oxford University Press, 2003. URL https://global.oup.com/academic/product/the-statistical-evaluation-of-medical-tests-for-classification-and-prediction-9780198565826?cc=es&lang=en&.

M. S. Pepe and M. L. Thompson. Combining diagnostic test results to increase accuracy. Biostatistics, 1(2): 123–140, 2000. URL https://doi.org/10.1093/biostatistics/1.2.123.

S. Pérez-Fernández, P. Martı́nez-Camblor, P. Filzmoser and N. Corral. nsROC: An R package for non-standard ROC curve analysis. The R Journal, 10(2): 55–74, 2018. URL https://doi.org/10.32614/RJ-2018-043.

S. Pérez-Fernández, P. Martı́nez-Camblor, P. Filzmoser and N. Corral. Visualizing the decision rules behind the ROC curves: Understanding the classification process. AStA Advances in Statistical Analysis, 105(1): 135–161, 2021. URL https://doi.org/10.1007/s10182-020-00385-2.

X. Robin, N. Turck, A. Hainard, N. Tiberti, F. Lisacek, J.-C. Sanchez and M. Müller. pROC: An open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinformatics, 12(77): 1–8, 2011. URL https://doi.org/10.1186/1471-2105-12-77.

M. X. Rodrı́guez-Álvarez and V. Inácio. ROCnReg: An R package for receiver operating characteristic curve inference with and without covariates. The R Journal, 13(1): 525–555, 2021. URL https://doi.org/10.32614/RJ-2021-066.

J. Shen, S. Wang, Y.-J. Zhang, M. Kappil, H.-C. Wu, M. G. Kibriya, Q. Wang, F. Jasmine, H. Ahsan, P.-H. Lee, et al. Genome-wide DNA methylation profiles in hepatocellular carcinoma. Hepatology, 55(6): 1799–1808, 2012. URL https://doi.org/10.1002/hep.25569.

J. Q. Su and J. S. Liu. Linear combinations of multiple diagnostic markers. Journal of the American Statistical Association, 88(424): 1350–1355, 1993. URL https://doi.org/10.1080/01621459.1993.10476417.

Y. Xie, C. Mueller, L. Yu and W. Zhu. animation: A gallery of animations in statistics and utilities to create animations. 2021. URL https://yihui.org/animation/. R package version 2.7.

W. J. Youden. Index for rating diagnostic tests. Cancer, 3(1): 32–35, 1950. URL https://doi.org/10.1002/1097-0142(1950)3:1<32::AID-CNCR2820030106>3.0.CO;2-3.

X.-H. Zhou, D. K. McClish and N. A. Obuchowski. Statistical methods in diagnostic medicine. Wiley, 2002. URL https://doi.org/10.1002/9780470317082.

K. H. Zou, W. J. Hall and D. E. Shapiro. Smooth non-parametric receiver operating characteristic (ROC) curves for continuous diagnostic tests. Statistics in Medicine, 16(19): 2143–2156, 1997. URL https://doi.org/10.1002/(SICI)1097-0258(19971015)16:19<2143::AID-SIM655>3.0.CO;2-3.

References

+
+

Reuse

+

Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

+

Citation

+

For attribution, please cite this work as

+
Pérez-Fernández, et al., "movieROC: Visualizing the Decision Rules Underlying Binary Classification", The R Journal, 2026
+

BibTeX citation

+
@article{RJ-2025-035,
+  author = {Pérez-Fernández, Sonia and Martínez-Camblor, Pablo and Corral-Blanco, Norberto},
+  title = {movieROC: Visualizing the Decision Rules Underlying Binary Classification},
+  journal = {The R Journal},
+  year = {2026},
+  note = {https://doi.org/10.32614/RJ-2025-035},
+  doi = {10.32614/RJ-2025-035},
+  volume = {17},
+  issue = {4},
+  issn = {2073-4859},
+  pages = {59-79}
+}
+
+ + + + + + + diff --git a/_articles/RJ-2025-035/RJ-2025-035.pdf b/_articles/RJ-2025-035/RJ-2025-035.pdf new file mode 100644 index 0000000000..ca90a4f6f1 Binary files /dev/null and b/_articles/RJ-2025-035/RJ-2025-035.pdf differ diff --git a/_articles/RJ-2025-035/RJ-2025-035_files/anchor-4.2.2/anchor.min.js b/_articles/RJ-2025-035/RJ-2025-035_files/anchor-4.2.2/anchor.min.js new file mode 100644 index 0000000000..1342f5f6f9 --- /dev/null +++ b/_articles/RJ-2025-035/RJ-2025-035_files/anchor-4.2.2/anchor.min.js @@ -0,0 +1,9 @@ +// @license magnet:?xt=urn:btih:d3d9a9a6595521f9666a5e94cc830dab83b65699&dn=expat.txt Expat +// +// AnchorJS - v4.2.2 - 2019-11-14 +// https://www.bryanbraun.com/anchorjs/ +// Copyright (c) 2019 Bryan Braun; Licensed MIT +// +// @license magnet:?xt=urn:btih:d3d9a9a6595521f9666a5e94cc830dab83b65699&dn=expat.txt Expat +!function(A,e){"use strict";"function"==typeof define&&define.amd?define([],e):"object"==typeof module&&module.exports?module.exports=e():(A.AnchorJS=e(),A.anchors=new A.AnchorJS)}(this,function(){"use strict";return function(A){function f(A){A.icon=A.hasOwnProperty("icon")?A.icon:"",A.visible=A.hasOwnProperty("visible")?A.visible:"hover",A.placement=A.hasOwnProperty("placement")?A.placement:"right",A.ariaLabel=A.hasOwnProperty("ariaLabel")?A.ariaLabel:"Anchor",A.class=A.hasOwnProperty("class")?A.class:"",A.base=A.hasOwnProperty("base")?A.base:"",A.truncate=A.hasOwnProperty("truncate")?Math.floor(A.truncate):64,A.titleText=A.hasOwnProperty("titleText")?A.titleText:""}function p(A){var e;if("string"==typeof A||A instanceof String)e=[].slice.call(document.querySelectorAll(A));else{if(!(Array.isArray(A)||A instanceof NodeList))throw new Error("The selector provided to AnchorJS was invalid.");e=[].slice.call(A)}return e}this.options=A||{},this.elements=[],f(this.options),this.isTouchDevice=function(){return!!("ontouchstart"in window||window.DocumentTouch&&document instanceof DocumentTouch)},this.add=function(A){var 
e,t,i,n,o,s,a,r,c,h,l,u,d=[];if(f(this.options),"touch"===(l=this.options.visible)&&(l=this.isTouchDevice()?"always":"hover"),0===(e=p(A=A||"h2, h3, h4, h5, h6")).length)return this;for(!function(){if(null!==document.head.querySelector("style.anchorjs"))return;var A,e=document.createElement("style");e.className="anchorjs",e.appendChild(document.createTextNode("")),void 0===(A=document.head.querySelector('[rel="stylesheet"], style'))?document.head.appendChild(e):document.head.insertBefore(e,A);e.sheet.insertRule(" .anchorjs-link { opacity: 0; text-decoration: none; -webkit-font-smoothing: antialiased; -moz-osx-font-smoothing: grayscale; }",e.sheet.cssRules.length),e.sheet.insertRule(" *:hover > .anchorjs-link, .anchorjs-link:focus { opacity: 1; }",e.sheet.cssRules.length),e.sheet.insertRule(" [data-anchorjs-icon]::after { content: attr(data-anchorjs-icon); }",e.sheet.cssRules.length),e.sheet.insertRule(' @font-face { font-family: "anchorjs-icons"; src: url(data:n/a;base64,AAEAAAALAIAAAwAwT1MvMg8yG2cAAAE4AAAAYGNtYXDp3gC3AAABpAAAAExnYXNwAAAAEAAAA9wAAAAIZ2x5ZlQCcfwAAAH4AAABCGhlYWQHFvHyAAAAvAAAADZoaGVhBnACFwAAAPQAAAAkaG10eASAADEAAAGYAAAADGxvY2EACACEAAAB8AAAAAhtYXhwAAYAVwAAARgAAAAgbmFtZQGOH9cAAAMAAAAAunBvc3QAAwAAAAADvAAAACAAAQAAAAEAAHzE2p9fDzz1AAkEAAAAAADRecUWAAAAANQA6R8AAAAAAoACwAAAAAgAAgAAAAAAAAABAAADwP/AAAACgAAA/9MCrQABAAAAAAAAAAAAAAAAAAAAAwABAAAAAwBVAAIAAAAAAAIAAAAAAAAAAAAAAAAAAAAAAAMCQAGQAAUAAAKZAswAAACPApkCzAAAAesAMwEJAAAAAAAAAAAAAAAAAAAAARAAAAAAAAAAAAAAAAAAAAAAQAAg//0DwP/AAEADwABAAAAAAQAAAAAAAAAAAAAAIAAAAAAAAAIAAAACgAAxAAAAAwAAAAMAAAAcAAEAAwAAABwAAwABAAAAHAAEADAAAAAIAAgAAgAAACDpy//9//8AAAAg6cv//f///+EWNwADAAEAAAAAAAAAAAAAAAAACACEAAEAAAAAAAAAAAAAAAAxAAACAAQARAKAAsAAKwBUAAABIiYnJjQ3NzY2MzIWFxYUBwcGIicmNDc3NjQnJiYjIgYHBwYUFxYUBwYGIwciJicmNDc3NjIXFhQHBwYUFxYWMzI2Nzc2NCcmNDc2MhcWFAcHBgYjARQGDAUtLXoWOR8fORYtLTgKGwoKCjgaGg0gEhIgDXoaGgkJBQwHdR85Fi0tOAobCgoKOBoaDSASEiANehoaCQkKGwotLXoWOR8BMwUFLYEuehYXFxYugC44CQkKGwo4GkoaDQ0NDXoaShoKGwoFBe8XFi6ALjgJCQobCjgaShoNDQ0NehpKGgobCg
oKLYEuehYXAAAADACWAAEAAAAAAAEACAAAAAEAAAAAAAIAAwAIAAEAAAAAAAMACAAAAAEAAAAAAAQACAAAAAEAAAAAAAUAAQALAAEAAAAAAAYACAAAAAMAAQQJAAEAEAAMAAMAAQQJAAIABgAcAAMAAQQJAAMAEAAMAAMAAQQJAAQAEAAMAAMAAQQJAAUAAgAiAAMAAQQJAAYAEAAMYW5jaG9yanM0MDBAAGEAbgBjAGgAbwByAGoAcwA0ADAAMABAAAAAAwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABAAH//wAP) format("truetype"); }',e.sheet.cssRules.length)}(),t=document.querySelectorAll("[id]"),i=[].map.call(t,function(A){return A.id}),o=0;o\]\.\/\(\)\*\\\n\t\b\v]/g,"-").replace(/-{2,}/g,"-").substring(0,this.options.truncate).replace(/^-+|-+$/gm,"").toLowerCase()},this.hasAnchorJSLink=function(A){var e=A.firstChild&&-1<(" "+A.firstChild.className+" ").indexOf(" anchorjs-link "),t=A.lastChild&&-1<(" "+A.lastChild.className+" ").indexOf(" anchorjs-link ");return e||t||!1}}}); +// @license-end \ No newline at end of file diff --git a/_articles/RJ-2025-035/RJ-2025-035_files/bowser-1.9.3/bowser.min.js b/_articles/RJ-2025-035/RJ-2025-035_files/bowser-1.9.3/bowser.min.js new file mode 100644 index 0000000000..3da61049a4 --- /dev/null +++ b/_articles/RJ-2025-035/RJ-2025-035_files/bowser-1.9.3/bowser.min.js @@ -0,0 +1,6 @@ +/*! 
+ * Bowser - a browser detector + * https://github.com/ded/bowser + * MIT License | (c) Dustin Diaz 2015 + */ +!function(e,t,n){typeof module!="undefined"&&module.exports?module.exports=n():typeof define=="function"&&define.amd?define(t,n):e[t]=n()}(this,"bowser",function(){function t(t){function n(e){var n=t.match(e);return n&&n.length>1&&n[1]||""}function r(e){var n=t.match(e);return n&&n.length>1&&n[2]||""}function N(e){switch(e){case"NT":return"NT";case"XP":return"XP";case"NT 5.0":return"2000";case"NT 5.1":return"XP";case"NT 5.2":return"2003";case"NT 6.0":return"Vista";case"NT 6.1":return"7";case"NT 6.2":return"8";case"NT 6.3":return"8.1";case"NT 10.0":return"10";default:return undefined}}var i=n(/(ipod|iphone|ipad)/i).toLowerCase(),s=/like android/i.test(t),o=!s&&/android/i.test(t),u=/nexus\s*[0-6]\s*/i.test(t),a=!u&&/nexus\s*[0-9]+/i.test(t),f=/CrOS/.test(t),l=/silk/i.test(t),c=/sailfish/i.test(t),h=/tizen/i.test(t),p=/(web|hpw)os/i.test(t),d=/windows phone/i.test(t),v=/SamsungBrowser/i.test(t),m=!d&&/windows/i.test(t),g=!i&&!l&&/macintosh/i.test(t),y=!o&&!c&&!h&&!p&&/linux/i.test(t),b=r(/edg([ea]|ios)\/(\d+(\.\d+)?)/i),w=n(/version\/(\d+(\.\d+)?)/i),E=/tablet/i.test(t)&&!/tablet pc/i.test(t),S=!E&&/[^-]mobi/i.test(t),x=/xbox/i.test(t),T;/opera/i.test(t)?T={name:"Opera",opera:e,version:w||n(/(?:opera|opr|opios)[\s\/](\d+(\.\d+)?)/i)}:/opr\/|opios/i.test(t)?T={name:"Opera",opera:e,version:n(/(?:opr|opios)[\s\/](\d+(\.\d+)?)/i)||w}:/SamsungBrowser/i.test(t)?T={name:"Samsung Internet for Android",samsungBrowser:e,version:w||n(/(?:SamsungBrowser)[\s\/](\d+(\.\d+)?)/i)}:/coast/i.test(t)?T={name:"Opera Coast",coast:e,version:w||n(/(?:coast)[\s\/](\d+(\.\d+)?)/i)}:/yabrowser/i.test(t)?T={name:"Yandex Browser",yandexbrowser:e,version:w||n(/(?:yabrowser)[\s\/](\d+(\.\d+)?)/i)}:/ucbrowser/i.test(t)?T={name:"UC 
Browser",ucbrowser:e,version:n(/(?:ucbrowser)[\s\/](\d+(?:\.\d+)+)/i)}:/mxios/i.test(t)?T={name:"Maxthon",maxthon:e,version:n(/(?:mxios)[\s\/](\d+(?:\.\d+)+)/i)}:/epiphany/i.test(t)?T={name:"Epiphany",epiphany:e,version:n(/(?:epiphany)[\s\/](\d+(?:\.\d+)+)/i)}:/puffin/i.test(t)?T={name:"Puffin",puffin:e,version:n(/(?:puffin)[\s\/](\d+(?:\.\d+)?)/i)}:/sleipnir/i.test(t)?T={name:"Sleipnir",sleipnir:e,version:n(/(?:sleipnir)[\s\/](\d+(?:\.\d+)+)/i)}:/k-meleon/i.test(t)?T={name:"K-Meleon",kMeleon:e,version:n(/(?:k-meleon)[\s\/](\d+(?:\.\d+)+)/i)}:d?(T={name:"Windows Phone",osname:"Windows Phone",windowsphone:e},b?(T.msedge=e,T.version=b):(T.msie=e,T.version=n(/iemobile\/(\d+(\.\d+)?)/i))):/msie|trident/i.test(t)?T={name:"Internet Explorer",msie:e,version:n(/(?:msie |rv:)(\d+(\.\d+)?)/i)}:f?T={name:"Chrome",osname:"Chrome OS",chromeos:e,chromeBook:e,chrome:e,version:n(/(?:chrome|crios|crmo)\/(\d+(\.\d+)?)/i)}:/edg([ea]|ios)/i.test(t)?T={name:"Microsoft Edge",msedge:e,version:b}:/vivaldi/i.test(t)?T={name:"Vivaldi",vivaldi:e,version:n(/vivaldi\/(\d+(\.\d+)?)/i)||w}:c?T={name:"Sailfish",osname:"Sailfish OS",sailfish:e,version:n(/sailfish\s?browser\/(\d+(\.\d+)?)/i)}:/seamonkey\//i.test(t)?T={name:"SeaMonkey",seamonkey:e,version:n(/seamonkey\/(\d+(\.\d+)?)/i)}:/firefox|iceweasel|fxios/i.test(t)?(T={name:"Firefox",firefox:e,version:n(/(?:firefox|iceweasel|fxios)[ \/](\d+(\.\d+)?)/i)},/\((mobile|tablet);[^\)]*rv:[\d\.]+\)/i.test(t)&&(T.firefoxos=e,T.osname="Firefox OS")):l?T={name:"Amazon Silk",silk:e,version:n(/silk\/(\d+(\.\d+)?)/i)}:/phantom/i.test(t)?T={name:"PhantomJS",phantom:e,version:n(/phantomjs\/(\d+(\.\d+)?)/i)}:/slimerjs/i.test(t)?T={name:"SlimerJS",slimer:e,version:n(/slimerjs\/(\d+(\.\d+)?)/i)}:/blackberry|\bbb\d+/i.test(t)||/rim\stablet/i.test(t)?T={name:"BlackBerry",osname:"BlackBerry 
OS",blackberry:e,version:w||n(/blackberry[\d]+\/(\d+(\.\d+)?)/i)}:p?(T={name:"WebOS",osname:"WebOS",webos:e,version:w||n(/w(?:eb)?osbrowser\/(\d+(\.\d+)?)/i)},/touchpad\//i.test(t)&&(T.touchpad=e)):/bada/i.test(t)?T={name:"Bada",osname:"Bada",bada:e,version:n(/dolfin\/(\d+(\.\d+)?)/i)}:h?T={name:"Tizen",osname:"Tizen",tizen:e,version:n(/(?:tizen\s?)?browser\/(\d+(\.\d+)?)/i)||w}:/qupzilla/i.test(t)?T={name:"QupZilla",qupzilla:e,version:n(/(?:qupzilla)[\s\/](\d+(?:\.\d+)+)/i)||w}:/chromium/i.test(t)?T={name:"Chromium",chromium:e,version:n(/(?:chromium)[\s\/](\d+(?:\.\d+)?)/i)||w}:/chrome|crios|crmo/i.test(t)?T={name:"Chrome",chrome:e,version:n(/(?:chrome|crios|crmo)\/(\d+(\.\d+)?)/i)}:o?T={name:"Android",version:w}:/safari|applewebkit/i.test(t)?(T={name:"Safari",safari:e},w&&(T.version=w)):i?(T={name:i=="iphone"?"iPhone":i=="ipad"?"iPad":"iPod"},w&&(T.version=w)):/googlebot/i.test(t)?T={name:"Googlebot",googlebot:e,version:n(/googlebot\/(\d+(\.\d+))/i)||w}:T={name:n(/^(.*)\/(.*) /),version:r(/^(.*)\/(.*) /)},!T.msedge&&/(apple)?webkit/i.test(t)?(/(apple)?webkit\/537\.36/i.test(t)?(T.name=T.name||"Blink",T.blink=e):(T.name=T.name||"Webkit",T.webkit=e),!T.version&&w&&(T.version=w)):!T.opera&&/gecko\//i.test(t)&&(T.name=T.name||"Gecko",T.gecko=e,T.version=T.version||n(/gecko\/(\d+(\.\d+)?)/i)),!T.windowsphone&&(o||T.silk)?(T.android=e,T.osname="Android"):!T.windowsphone&&i?(T[i]=e,T.ios=e,T.osname="iOS"):g?(T.mac=e,T.osname="macOS"):x?(T.xbox=e,T.osname="Xbox"):m?(T.windows=e,T.osname="Windows"):y&&(T.linux=e,T.osname="Linux");var C="";T.windows?C=N(n(/Windows ((NT|XP)( \d\d?.\d)?)/i)):T.windowsphone?C=n(/windows phone (?:os)?\s?(\d+(\.\d+)*)/i):T.mac?(C=n(/Mac OS X (\d+([_\.\s]\d+)*)/i),C=C.replace(/[_\s]/g,".")):i?(C=n(/os (\d+([_\s]\d+)*) like mac os x/i),C=C.replace(/[_\s]/g,".")):o?C=n(/android[ 
\/-](\d+(\.\d+)*)/i):T.webos?C=n(/(?:web|hpw)os\/(\d+(\.\d+)*)/i):T.blackberry?C=n(/rim\stablet\sos\s(\d+(\.\d+)*)/i):T.bada?C=n(/bada\/(\d+(\.\d+)*)/i):T.tizen&&(C=n(/tizen[\/\s](\d+(\.\d+)*)/i)),C&&(T.osversion=C);var k=!T.windows&&C.split(".")[0];if(E||a||i=="ipad"||o&&(k==3||k>=4&&!S)||T.silk)T.tablet=e;else if(S||i=="iphone"||i=="ipod"||o||u||T.blackberry||T.webos||T.bada)T.mobile=e;return T.msedge||T.msie&&T.version>=10||T.yandexbrowser&&T.version>=15||T.vivaldi&&T.version>=1||T.chrome&&T.version>=20||T.samsungBrowser&&T.version>=4||T.firefox&&T.version>=20||T.safari&&T.version>=6||T.opera&&T.version>=10||T.ios&&T.osversion&&T.osversion.split(".")[0]>=6||T.blackberry&&T.version>=10.1||T.chromium&&T.version>=20?T.a=e:T.msie&&T.version<10||T.chrome&&T.version<20||T.firefox&&T.version<20||T.safari&&T.version<6||T.opera&&T.version<10||T.ios&&T.osversion&&T.osversion.split(".")[0]<6||T.chromium&&T.version<20?T.c=e:T.x=e,T}function r(e){return e.split(".").length}function i(e,t){var n=[],r;if(Array.prototype.map)return Array.prototype.map.call(e,t);for(r=0;r=0){if(n[0][t]>n[1][t])return 1;if(n[0][t]!==n[1][t])return-1;if(t===0)return 0}}function o(e,r,i){var o=n;typeof r=="string"&&(i=r,r=void 0),r===void 0&&(r=!1),i&&(o=t(i));var u=""+o.version;for(var a in e)if(e.hasOwnProperty(a)&&o[a]){if(typeof e[a]!="string")throw new Error("Browser version in the minVersion map should be a string: "+a+": "+String(e));return s([u,e[a]])<0}return r}function u(e,t,n){return!o(e,t,n)}var e=!0,n=t(typeof navigator!="undefined"?navigator.userAgent||"":"");return n.test=function(e){for(var t=0;tnew Qn(e)),e.katex=t.katex,e.password=t.password}function t(e=document){const t=new Set,n=e.querySelectorAll('d-cite');for(const i of n){const e=i.getAttribute('key').split(',');for(const n of e)t.add(n)}return[...t]}function n(e,t,n,i){if(null==e.author)return'';var a=e.author.split(' and ');let d=a.map((e)=>{if(e=e.trim(),e.match(/\{.+\}/)){var n=/\{([^}]+)\}/,i=n.exec(e);return 
i[1]}if(-1!=e.indexOf(','))var a=e.split(',')[0].trim(),d=e.split(',')[1];else var a=e.split(' ').slice(-1)[0].trim(),d=e.split(' ').slice(0,-1).join(' ');var r='';return void 0!=d&&(r=d.trim().split(' ').map((e)=>e.trim()[0]),r=r.join('.')+'.'),t.replace('${F}',d).replace('${L}',a).replace('${I}',r)});if(1[${i||'link'}]`}return''}function d(e,t){return'doi'in e?`${t?'
':''} DOI: ${e.doi}`:''}function r(e){return''+e.title+' '}function o(e){if(e){var t=r(e);return t+=a(e)+'
',e.author&&(t+=n(e,'${L}, ${I}',', ',' and '),(e.year||e.date)&&(t+=', ')),t+=e.year||e.date?(e.year||e.date)+'. ':'. ',t+=i(e),t+=d(e),t}return'?'}function l(e){if(e){var t='';t+=''+e.title+'',t+=a(e),t+='
';var r=n(e,'${I} ${L}',', ')+'.',o=i(e).trim()+' '+e.year+'. '+d(e,!0);return t+=(r+o).length'+o,t}return'?'}function s(e){for(let t of e.authors){const e=!!t.affiliation,n=!!t.affiliations;if(e)if(n)console.warn(`Author ${t.author} has both old-style ("affiliation" & "affiliationURL") and new style ("affiliations") affiliation information!`);else{let e={name:t.affiliation};t.affiliationURL&&(e.url=t.affiliationURL),t.affiliations=[e]}}return console.log(e),e}function c(e){const t=e.querySelector('script');if(t){const e=t.getAttribute('type');if('json'==e.split('/')[1]){const e=t.textContent,n=JSON.parse(e);return s(n)}console.error('Distill only supports JSON frontmatter tags anymore; no more YAML.')}else console.error('You added a frontmatter tag but did not provide a script tag with front matter data in it. Please take a look at our templates.');return{}}function u(){return-1!==['interactive','complete'].indexOf(document.readyState)}function p(e){const t='distill-prerendered-styles',n=e.getElementById(t);if(!n){const n=e.createElement('style');n.id=t,n.type='text/css';const i=e.createTextNode(bi);n.appendChild(i);const a=e.head.querySelector('script');e.head.insertBefore(n,a)}}function g(e,t){console.info('Runlevel 0: Polyfill required: '+e.name);const n=document.createElement('script');n.src=e.url,n.async=!1,t&&(n.onload=function(){t(e)}),n.onerror=function(){new Error('Runlevel 0: Polyfills failed to load script '+e.name)},document.head.appendChild(n)}function f(e,t){return t={exports:{}},e(t,t.exports),t.exports}function h(e){return e.replace(/[\t\n ]+/g,' ').replace(/{\\["^`.'acu~Hvs]( )?([a-zA-Z])}/g,(e,t,n)=>n).replace(/{\\([a-zA-Z])}/g,(e,t)=>t)}function b(e){const t=new Map,n=_i.toJSON(e);for(const i of n){for(const[e,t]of Object.entries(i.entryTags))i.entryTags[e.toLowerCase()]=h(t);i.entryTags.type=i.entryType,t.set(i.citationKey,i.entryTags)}return t}function m(e){return`@article{${e.slug}, + author = {${e.bibtexAuthors}}, + title = {${e.title}}, + 
journal = {${e.journal.title}}, + year = {${e.publishedYear}}, + note = {${e.url}}, + doi = {${e.doi}} +}`}function y(e){return` + +`}function x(e,t,n=document){if(0 + + d-toc { + contain: layout style; + display: block; + } + + d-toc ul { + padding-left: 0; + } + + d-toc ul > ul { + padding-left: 24px; + } + + d-toc a { + border-bottom: none; + text-decoration: none; + } + + + +

Table of contents

+
    `;for(const i of t){const e='D-TITLE'==i.parentElement.tagName,t=i.getAttribute('no-toc');if(e||t)continue;const a=i.textContent,d='#'+i.getAttribute('id');let r='
  • '+a+'
  • ';'H3'==i.tagName?r='
      '+r+'
    ':r+='
    ',n+=r}n+='
',e.innerHTML=n}function v(e){return function(t,n){return Xi(e(t),n)}}function w(e,t,n){var i=(t-e)/Rn(0,n),a=Fn(jn(i)/Nn),d=i/In(10,a);return 0<=a?(d>=Gi?10:d>=ea?5:d>=ta?2:1)*In(10,a):-In(10,-a)/(d>=Gi?10:d>=ea?5:d>=ta?2:1)}function S(e,t,n){var i=Un(t-e)/Rn(0,n),a=In(10,Fn(jn(i)/Nn)),d=i/a;return d>=Gi?a*=10:d>=ea?a*=5:d>=ta&&(a*=2),t>8|240&t>>4,15&t>>4|240&t,(15&t)<<4|15&t,1)):(t=ca.exec(e))?O(parseInt(t[1],16)):(t=ua.exec(e))?new j(t[1],t[2],t[3],1):(t=pa.exec(e))?new j(255*t[1]/100,255*t[2]/100,255*t[3]/100,1):(t=ga.exec(e))?U(t[1],t[2],t[3],t[4]):(t=fa.exec(e))?U(255*t[1]/100,255*t[2]/100,255*t[3]/100,t[4]):(t=ha.exec(e))?R(t[1],t[2]/100,t[3]/100,1):(t=ba.exec(e))?R(t[1],t[2]/100,t[3]/100,t[4]):ma.hasOwnProperty(e)?O(ma[e]):'transparent'===e?new j(NaN,NaN,NaN,0):null}function O(e){return new j(255&e>>16,255&e>>8,255&e,1)}function U(e,t,n,i){return 0>=i&&(e=t=n=NaN),new j(e,t,n,i)}function I(e){return(e instanceof L||(e=M(e)),!e)?new j:(e=e.rgb(),new j(e.r,e.g,e.b,e.opacity))}function N(e,t,n,i){return 1===arguments.length?I(e):new j(e,t,n,null==i?1:i)}function j(e,t,n,i){this.r=+e,this.g=+t,this.b=+n,this.opacity=+i}function R(e,t,n,i){return 0>=i?e=t=n=NaN:0>=n||1<=n?e=t=NaN:0>=t&&(e=NaN),new F(e,t,n,i)}function q(e){if(e instanceof F)return new F(e.h,e.s,e.l,e.opacity);if(e instanceof L||(e=M(e)),!e)return new F;if(e instanceof F)return e;e=e.rgb();var t=e.r/255,n=e.g/255,i=e.b/255,a=Hn(t,n,i),d=Rn(t,n,i),r=NaN,c=d-a,s=(d+a)/2;return c?(r=t===d?(n-i)/c+6*(ns?d+a:2-d-a,r*=60):c=0s?0:r,new F(r,c,s,e.opacity)}function F(e,t,n,i){this.h=+e,this.s=+t,this.l=+n,this.opacity=+i}function P(e,t,n){return 255*(60>e?t+(n-t)*e/60:180>e?n:240>e?t+(n-t)*(240-e)/60:t)}function H(e){if(e instanceof Y)return new Y(e.l,e.a,e.b,e.opacity);if(e instanceof X){var t=e.h*ya;return new Y(e.l,Mn(t)*e.c,Dn(t)*e.c,e.opacity)}e instanceof j||(e=I(e));var 
n=$(e.r),i=$(e.g),a=$(e.b),d=W((0.4124564*n+0.3575761*i+0.1804375*a)/Kn),r=W((0.2126729*n+0.7151522*i+0.072175*a)/Xn),o=W((0.0193339*n+0.119192*i+0.9503041*a)/Yn);return new Y(116*r-16,500*(d-r),200*(r-o),e.opacity)}function Y(e,t,n,i){this.l=+e,this.a=+t,this.b=+n,this.opacity=+i}function W(e){return e>Sa?In(e,1/3):e/wa+Zn}function V(e){return e>va?e*e*e:wa*(e-Zn)}function K(e){return 255*(0.0031308>=e?12.92*e:1.055*In(e,1/2.4)-0.055)}function $(e){return 0.04045>=(e/=255)?e/12.92:In((e+0.055)/1.055,2.4)}function z(e){if(e instanceof X)return new X(e.h,e.c,e.l,e.opacity);e instanceof Y||(e=H(e));var t=En(e.b,e.a)*xa;return new X(0>t?t+360:t,An(e.a*e.a+e.b*e.b),e.l,e.opacity)}function X(e,t,n,i){this.h=+e,this.c=+t,this.l=+n,this.opacity=+i}function J(e){if(e instanceof Z)return new Z(e.h,e.s,e.l,e.opacity);e instanceof j||(e=I(e));var t=e.r/255,n=e.g/255,i=e.b/255,a=(_a*i+E*t-Ta*n)/(_a+E-Ta),d=i-a,r=(D*(n-a)-B*d)/C,o=An(r*r+d*d)/(D*a*(1-a)),l=o?En(r,d)*xa-120:NaN;return new Z(0>l?l+360:l,o,a,e.opacity)}function Q(e,t,n,i){return 1===arguments.length?J(e):new Z(e,t,n,null==i?1:i)}function Z(e,t,n,i){this.h=+e,this.s=+t,this.l=+n,this.opacity=+i}function G(e,n){return function(i){return e+i*n}}function ee(e,n,i){return e=In(e,i),n=In(n,i)-e,i=1/i,function(a){return In(e+a*n,i)}}function te(e){return 1==(e=+e)?ne:function(t,n){return n-t?ee(t,n,e):La(isNaN(t)?n:t)}}function ne(e,t){var n=t-e;return n?G(e,n):La(isNaN(e)?t:e)}function ie(e){return function(){return e}}function ae(e){return function(n){return e(n)+''}}function de(e){return function t(n){function i(i,t){var a=e((i=Q(i)).h,(t=Q(t)).h),d=ne(i.s,t.s),r=ne(i.l,t.l),o=ne(i.opacity,t.opacity);return function(e){return i.h=a(e),i.s=d(e),i.l=r(In(e,n)),i.opacity=o(e),i+''}}return n=+n,i.gamma=t,i}(1)}function oe(e,t){return(t-=e=+e)?function(n){return(n-e)/t}:Pa(t)}function le(e){return function(t,n){var i=e(t=+t,n=+n);return function(e){return e<=t?0:e>=n?1:i(e)}}}function se(e){return function(n,i){var 
d=e(n=+n,i=+i);return function(e){return 0>=e?n:1<=e?i:d(e)}}}function ce(e,t,n,i){var a=e[0],d=e[1],r=t[0],o=t[1];return d',a=t[3]||'-',d=t[4]||'',r=!!t[5],o=t[6]&&+t[6],l=!!t[7],s=t[8]&&+t[8].slice(1),c=t[9]||'';'n'===c?(l=!0,c='g'):!$a[c]&&(c=''),(r||'0'===n&&'='===i)&&(r=!0,n='0',i='='),this.fill=n,this.align=i,this.sign=a,this.symbol=d,this.zero=r,this.width=o,this.comma=l,this.precision=s,this.type=c}function be(e){var t=e.domain;return e.ticks=function(e){var n=t();return na(n[0],n[n.length-1],null==e?10:e)},e.tickFormat=function(e,n){return ad(t(),e,n)},e.nice=function(n){null==n&&(n=10);var i,a=t(),d=0,r=a.length-1,o=a[d],l=a[r];return li&&(o=qn(o*i)/i,l=Fn(l*i)/i,i=w(o,l,n)),0i&&(a[d]=qn(o*i)/i,a[r]=Fn(l*i)/i,t(a)),e},e}function me(){var e=ge(oe,Ma);return e.copy=function(){return pe(e,me())},be(e)}function ye(e,t,n,i){function a(t){return e(t=new Date(+t)),t}return a.floor=a,a.ceil=function(n){return e(n=new Date(n-1)),t(n,1),e(n),n},a.round=function(e){var t=a(e),n=a.ceil(e);return e-t=t)for(;e(t),!n(t);)t.setTime(t-1)},function(e,i){if(e>=e)if(0>i)for(;0>=++i;)for(;t(e,-1),!n(e););else for(;0<=--i;)for(;t(e,1),!n(e););})},n&&(a.count=function(t,i){return dd.setTime(+t),rd.setTime(+i),e(dd),e(rd),Fn(n(dd,rd))},a.every=function(e){return e=Fn(e),isFinite(e)&&0e.y){var t=new Date(-1,e.m,e.d,e.H,e.M,e.S,e.L);return t.setFullYear(e.y),t}return new Date(e.y,e.m,e.d,e.H,e.M,e.S,e.L)}function we(e){if(0<=e.y&&100>e.y){var t=new Date(Date.UTC(-1,e.m,e.d,e.H,e.M,e.S,e.L));return t.setUTCFullYear(e.y),t}return new Date(Date.UTC(e.y,e.m,e.d,e.H,e.M,e.S,e.L))}function Se(e){return{y:e,m:0,d:1,H:0,M:0,S:0,L:0}}function Ce(e){function t(e,t){return function(a){var d,r,o,l=[],s=-1,i=0,c=e.length;for(a instanceof Date||(a=new Date(+a));++s=n)return-1;if(r=t.charCodeAt(l++),37===r){if(r=t.charAt(l++),o=C[r in Hd?t.charAt(l++):r],!o||0>(d=o(e,a,d)))return-1;}else if(r!=a.charCodeAt(d++))return-1}return d}var 
r=e.dateTime,o=e.date,l=e.time,i=e.periods,s=e.days,c=e.shortDays,u=e.months,p=e.shortMonths,g=Le(i),f=Ae(i),h=Le(s),b=Ae(s),m=Le(c),y=Ae(c),x=Le(u),k=Ae(u),v=Le(p),w=Ae(p),d={a:function(e){return c[e.getDay()]},A:function(e){return s[e.getDay()]},b:function(e){return p[e.getMonth()]},B:function(e){return u[e.getMonth()]},c:null,d:Ye,e:Ye,H:Be,I:We,j:Ve,L:Ke,m:$e,M:Xe,p:function(e){return i[+(12<=e.getHours())]},S:Je,U:Qe,w:Ze,W:Ge,x:null,X:null,y:et,Y:tt,Z:nt,"%":mt},S={a:function(e){return c[e.getUTCDay()]},A:function(e){return s[e.getUTCDay()]},b:function(e){return p[e.getUTCMonth()]},B:function(e){return u[e.getUTCMonth()]},c:null,d:it,e:it,H:at,I:dt,j:rt,L:ot,m:lt,M:st,p:function(e){return i[+(12<=e.getUTCHours())]},S:ct,U:ut,w:pt,W:gt,x:null,X:null,y:ft,Y:ht,Z:bt,"%":mt},C={a:function(e,t,a){var i=m.exec(t.slice(a));return i?(e.w=y[i[0].toLowerCase()],a+i[0].length):-1},A:function(e,t,a){var i=h.exec(t.slice(a));return i?(e.w=b[i[0].toLowerCase()],a+i[0].length):-1},b:function(e,t,a){var i=v.exec(t.slice(a));return i?(e.m=w[i[0].toLowerCase()],a+i[0].length):-1},B:function(e,t,a){var i=x.exec(t.slice(a));return i?(e.m=k[i[0].toLowerCase()],a+i[0].length):-1},c:function(e,t,n){return a(e,r,t,n)},d:je,e:je,H:qe,I:qe,j:Re,L:He,m:Ne,M:Fe,p:function(e,t,a){var i=g.exec(t.slice(a));return i?(e.p=f[i[0].toLowerCase()],a+i[0].length):-1},S:Pe,U:De,w:Ee,W:Me,x:function(e,t,n){return a(e,o,t,n)},X:function(e,t,n){return a(e,l,t,n)},y:Ue,Y:Oe,Z:Ie,"%":ze};return d.x=t(o,d),d.X=t(l,d),d.c=t(r,d),S.x=t(o,S),S.X=t(l,S),S.c=t(r,S),{format:function(e){var n=t(e+='',d);return n.toString=function(){return e},n},parse:function(e){var t=n(e+='',ve);return t.toString=function(){return e},t},utcFormat:function(e){var n=t(e+='',S);return n.toString=function(){return e},n},utcParse:function(e){var t=n(e,we);return t.toString=function(){return e},t}}}function Te(e,t,n){var i=0>e?'-':'',a=(i?-e:e)+'',d=a.length;return i+(dt?1:e>=t?0:NaN}function qt(e){return 
function(){this.removeAttribute(e)}}function Ft(e){return function(){this.removeAttributeNS(e.space,e.local)}}function Pt(e,t){return function(){this.setAttribute(e,t)}}function Ht(e,t){return function(){this.setAttributeNS(e.space,e.local,t)}}function zt(e,t){return function(){var n=t.apply(this,arguments);null==n?this.removeAttribute(e):this.setAttribute(e,n)}}function Yt(e,t){return function(){var n=t.apply(this,arguments);null==n?this.removeAttributeNS(e.space,e.local):this.setAttributeNS(e.space,e.local,n)}}function Bt(e){return function(){this.style.removeProperty(e)}}function Wt(e,t,n){return function(){this.style.setProperty(e,t,n)}}function Vt(e,t,n){return function(){var i=t.apply(this,arguments);null==i?this.style.removeProperty(e):this.style.setProperty(e,i,n)}}function Kt(e,t){return e.style.getPropertyValue(t)||vr(e).getComputedStyle(e,null).getPropertyValue(t)}function $t(e){return function(){delete this[e]}}function Xt(e,t){return function(){this[e]=t}}function Jt(e,t){return function(){var n=t.apply(this,arguments);null==n?delete this[e]:this[e]=n}}function Qt(e){return e.trim().split(/^|\s+/)}function Zt(e){return e.classList||new Gt(e)}function Gt(e){this._node=e,this._names=Qt(e.getAttribute('class')||'')}function en(e,t){for(var a=Zt(e),d=-1,i=t.length;++dUpdates and Corrections +

`,e.githubCompareUpdatesUrl&&(t+=`View all changes to this article since it was first published.`),t+=` + If you see mistakes or want to suggest changes, please create an issue on GitHub.

+ `);const n=e.journal;return'undefined'!=typeof n&&'Distill'===n.title&&(t+=` +

Reuse

+

Diagrams and text are licensed under Creative Commons Attribution CC-BY 4.0 with the source available on GitHub, unless noted otherwise. The figures that have been reused from other sources don’t fall under this license and can be recognized by a note in their caption: “Figure from …”.

+ `),'undefined'!=typeof e.publishedDate&&(t+=` +

Citation

+

For attribution in academic contexts, please cite this work as

+
${e.concatenatedAuthors}, "${e.title}", Distill, ${e.publishedYear}.
+

BibTeX citation

+
${m(e)}
+ `),t}var An=Math.sqrt,En=Math.atan2,Dn=Math.sin,Mn=Math.cos,On=Math.PI,Un=Math.abs,In=Math.pow,Nn=Math.LN10,jn=Math.log,Rn=Math.max,qn=Math.ceil,Fn=Math.floor,Pn=Math.round,Hn=Math.min;const zn=['Sunday','Monday','Tuesday','Wednesday','Thursday','Friday','Saturday'],Bn=['Jan.','Feb.','March','April','May','June','July','Aug.','Sept.','Oct.','Nov.','Dec.'],Wn=(e)=>10>e?'0'+e:e,Vn=function(e){const t=zn[e.getDay()].substring(0,3),n=Wn(e.getDate()),i=Bn[e.getMonth()].substring(0,3),a=e.getFullYear().toString(),d=e.getUTCHours().toString(),r=e.getUTCMinutes().toString(),o=e.getUTCSeconds().toString();return`${t}, ${n} ${i} ${a} ${d}:${r}:${o} Z`},$n=function(e){const t=Array.from(e).reduce((e,[t,n])=>Object.assign(e,{[t]:n}),{});return t},Jn=function(e){const t=new Map;for(var n in e)e.hasOwnProperty(n)&&t.set(n,e[n]);return t};class Qn{constructor(e){this.name=e.author,this.personalURL=e.authorURL,this.affiliation=e.affiliation,this.affiliationURL=e.affiliationURL,this.affiliations=e.affiliations||[]}get firstName(){const e=this.name.split(' ');return e.slice(0,e.length-1).join(' ')}get lastName(){const e=this.name.split(' ');return e[e.length-1]}}class Gn{constructor(){this.title='unnamed article',this.description='',this.authors=[],this.bibliography=new Map,this.bibliographyParsed=!1,this.citations=[],this.citationsCollected=!1,this.journal={},this.katex={},this.publishedDate=void 0}set url(e){this._url=e}get url(){if(this._url)return this._url;return this.distillPath&&this.journal.url?this.journal.url+'/'+this.distillPath:this.journal.url?this.journal.url:void 0}get githubUrl(){return this.githubPath?'https://github.com/'+this.githubPath:void 0}set previewURL(e){this._previewURL=e}get previewURL(){return this._previewURL?this._previewURL:this.url+'/thumbnail.jpg'}get publishedDateRFC(){return Vn(this.publishedDate)}get updatedDateRFC(){return Vn(this.updatedDate)}get publishedYear(){return this.publishedDate.getFullYear()}get publishedMonth(){return 
Bn[this.publishedDate.getMonth()]}get publishedDay(){return this.publishedDate.getDate()}get publishedMonthPadded(){return Wn(this.publishedDate.getMonth()+1)}get publishedDayPadded(){return Wn(this.publishedDate.getDate())}get publishedISODateOnly(){return this.publishedDate.toISOString().split('T')[0]}get volume(){const e=this.publishedYear-2015;if(1>e)throw new Error('Invalid publish date detected during computing volume');return e}get issue(){return this.publishedDate.getMonth()+1}get concatenatedAuthors(){if(2{return e.lastName+', '+e.firstName}).join(' and ')}get slug(){let e='';return this.authors.length&&(e+=this.authors[0].lastName.toLowerCase(),e+=this.publishedYear,e+=this.title.split(' ')[0].toLowerCase()),e||'Untitled'}get bibliographyEntries(){return new Map(this.citations.map((e)=>{const t=this.bibliography.get(e);return[e,t]}))}set bibliography(e){e instanceof Map?this._bibliography=e:'object'==typeof e&&(this._bibliography=Jn(e))}get bibliography(){return this._bibliography}static fromObject(e){const t=new Gn;return Object.assign(t,e),t}assignToObject(e){Object.assign(e,this),e.bibliography=$n(this.bibliographyEntries),e.url=this.url,e.githubUrl=this.githubUrl,e.previewURL=this.previewURL,this.publishedDate&&(e.volume=this.volume,e.issue=this.issue,e.publishedDateRFC=this.publishedDateRFC,e.publishedYear=this.publishedYear,e.publishedMonth=this.publishedMonth,e.publishedDay=this.publishedDay,e.publishedMonthPadded=this.publishedMonthPadded,e.publishedDayPadded=this.publishedDayPadded),this.updatedDate&&(e.updatedDateRFC=this.updatedDateRFC),e.concatenatedAuthors=this.concatenatedAuthors,e.bibtexAuthors=this.bibtexAuthors,e.slug=this.slug}}const ei=(e)=>{return class extends e{constructor(){super();const e={childList:!0,characterData:!0,subtree:!0},t=new 
MutationObserver(()=>{t.disconnect(),this.renderIfPossible(),t.observe(this,e)});t.observe(this,e)}connectedCallback(){super.connectedCallback(),this.renderIfPossible()}renderIfPossible(){this.textContent&&this.root&&this.renderContent()}renderContent(){console.error(`Your class ${this.constructor.name} must provide a custom renderContent() method!`)}}},ti=(e,t,n=!0)=>{return(i)=>{const a=document.createElement('template');return a.innerHTML=t,n&&'ShadyCSS'in window&&ShadyCSS.prepareTemplate(a,e),class extends i{static get is(){return e}constructor(){super(),this.clone=document.importNode(a.content,!0),n&&(this.attachShadow({mode:'open'}),this.shadowRoot.appendChild(this.clone))}connectedCallback(){n?'ShadyCSS'in window&&ShadyCSS.styleElement(this):this.insertBefore(this.clone,this.firstChild)}get root(){return n?this.shadowRoot:this}$(e){return this.root.querySelector(e)}$$(e){return this.root.querySelectorAll(e)}}}};var ni='/*\n * Copyright 2018 The Distill Template Authors\n *\n * Licensed under the Apache License, Version 2.0 (the "License");\n * you may not use this file except in compliance with the License.\n * You may obtain a copy of the License at\n *\n * http://www.apache.org/licenses/LICENSE-2.0\n *\n * Unless required by applicable law or agreed to in writing, software\n * distributed under the License is distributed on an "AS IS" BASIS,\n * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n * See the License for the specific language governing permissions and\n * limitations under the License.\n */\n\nspan.katex-display {\n text-align: left;\n padding: 8px 0 8px 0;\n margin: 0.5em 0 0.5em 1em;\n}\n\nspan.katex {\n -webkit-font-smoothing: antialiased;\n color: rgba(0, 0, 0, 0.8);\n font-size: 1.18em;\n}\n';const ii=function(e,t,n){let i=n,a=0;for(const d=e.length;i=a&&t.slice(i,i+d)===e)return i;'\\'===n?i++:'{'===n?a++:'}'===n&&a--;i++}return-1},ai=function(e,t,n,i){const a=[];for(let d=0;d',ui=ti('d-math',` +${ci} + + 
+`);class T extends ei(ui(HTMLElement)){static set katexOptions(e){T._katexOptions=e,T.katexOptions.delimiters&&(T.katexAdded?T.katexLoadedCallback():T.addKatex())}static get katexOptions(){return T._katexOptions||(T._katexOptions={delimiters:[{left:'$$',right:'$$',display:!1}]}),T._katexOptions}static katexLoadedCallback(){const e=document.querySelectorAll('d-math');for(const t of e)t.renderContent();if(T.katexOptions.delimiters){const e=document.querySelector('d-article');si(e,T.katexOptions)}}static addKatex(){document.head.insertAdjacentHTML('beforeend',ci);const e=document.createElement('script');e.src='https://distill.pub/third-party/katex/katex.min.js',e.async=!0,e.onload=T.katexLoadedCallback,e.crossorigin='anonymous',document.head.appendChild(e),T.katexAdded=!0}get options(){const e={displayMode:this.hasAttribute('block')};return Object.assign(e,T.katexOptions)}connectedCallback(){super.connectedCallback(),T.katexAdded||T.addKatex()}renderContent(){if('undefined'!=typeof katex){const e=this.root.querySelector('#katex-container');katex.render(this.textContent,e,this.options)}}}T.katexAdded=!1,T.inlineMathRendered=!1,window.DMath=T;class pi extends HTMLElement{static get is(){return'd-front-matter'}constructor(){super();const e=new MutationObserver((e)=>{for(const t of e)if('SCRIPT'===t.target.nodeName||'characterData'===t.type){const e=c(this);this.notify(e)}});e.observe(this,{childList:!0,characterData:!0,subtree:!0})}notify(e){const t=new CustomEvent('onFrontMatterChanged',{detail:e,bubbles:!0});document.dispatchEvent(t)}}var gi=function(e,t){const n=e.body,i=n.querySelector('d-article');if(!i)return void console.warn('No d-article tag found; skipping adding optional components!');let a=e.querySelector('d-byline');a||(t.authors?(a=e.createElement('d-byline'),n.insertBefore(a,i)):console.warn('No authors found in front matter; please add them before submission!'));let d=e.querySelector('d-title');d||(d=e.createElement('d-title'),n.insertBefore(d,a));let 
r=d.querySelector('h1');r||(r=e.createElement('h1'),r.textContent=t.title,d.insertBefore(r,d.firstChild));const o='undefined'!=typeof t.password;let l=n.querySelector('d-interstitial');if(o&&!l){const i='undefined'!=typeof window,a=i&&window.location.hostname.includes('localhost');i&&a||(l=e.createElement('d-interstitial'),l.password=t.password,n.insertBefore(l,n.firstChild))}else!o&&l&&l.parentElement.removeChild(this);let s=e.querySelector('d-appendix');s||(s=e.createElement('d-appendix'),e.body.appendChild(s));let c=e.querySelector('d-footnote-list');c||(c=e.createElement('d-footnote-list'),s.appendChild(c));let u=e.querySelector('d-citation-list');u||(u=e.createElement('d-citation-list'),s.appendChild(u))};const fi=new Gn,hi={frontMatter:fi,waitingOn:{bibliography:[],citations:[]},listeners:{onCiteKeyCreated(e){const[t,n]=e.detail;if(!fi.citationsCollected)return void hi.waitingOn.citations.push(()=>hi.listeners.onCiteKeyCreated(e));if(!fi.bibliographyParsed)return void hi.waitingOn.bibliography.push(()=>hi.listeners.onCiteKeyCreated(e));const i=n.map((e)=>fi.citations.indexOf(e));t.numbers=i;const a=n.map((e)=>fi.bibliography.get(e));t.entries=a},onCiteKeyChanged(){fi.citations=t(),fi.citationsCollected=!0;for(const e of hi.waitingOn.citations.slice())e();const e=document.querySelector('d-citation-list'),n=new Map(fi.citations.map((e)=>{return[e,fi.bibliography.get(e)]}));e.citations=n;const i=document.querySelectorAll('d-cite');for(const e of i){const t=e.keys,n=t.map((e)=>fi.citations.indexOf(e));e.numbers=n;const i=t.map((e)=>fi.bibliography.get(e));e.entries=i}},onCiteKeyRemoved(e){hi.listeners.onCiteKeyChanged(e)},onBibliographyChanged(e){const t=document.querySelector('d-citation-list'),n=e.detail;fi.bibliography=n,fi.bibliographyParsed=!0;for(const t of hi.waitingOn.bibliography.slice())t();if(!fi.citationsCollected)return void 
hi.waitingOn.citations.push(function(){hi.listeners.onBibliographyChanged({target:e.target,detail:e.detail})});if(t.hasAttribute('distill-prerendered'))console.info('Citation list was prerendered; not updating it.');else{const e=new Map(fi.citations.map((e)=>{return[e,fi.bibliography.get(e)]}));t.citations=e}},onFootnoteChanged(){const e=document.querySelector('d-footnote-list');if(e){const t=document.querySelectorAll('d-footnote');e.footnotes=t}},onFrontMatterChanged(t){const n=t.detail;e(fi,n);const i=document.querySelector('d-interstitial');i&&('undefined'==typeof fi.password?i.parentElement.removeChild(i):i.password=fi.password);const a=document.body.hasAttribute('distill-prerendered');if(!a&&u()){gi(document,fi);const e=document.querySelector('distill-appendix');e&&(e.frontMatter=fi);const t=document.querySelector('d-byline');t&&(t.frontMatter=fi),n.katex&&(T.katexOptions=n.katex)}},DOMContentLoaded(){if(hi.loaded)return void console.warn('Controller received DOMContentLoaded but was already loaded!');if(!u())return void console.warn('Controller received DOMContentLoaded before appropriate document.readyState!');hi.loaded=!0,console.log('Runlevel 4: Controller running DOMContentLoaded');const e=document.querySelector('d-front-matter'),n=c(e);hi.listeners.onFrontMatterChanged({detail:n}),fi.citations=t(),fi.citationsCollected=!0;for(const e of hi.waitingOn.citations.slice())e();if(fi.bibliographyParsed)for(const e of hi.waitingOn.bibliography.slice())e();const i=document.querySelector('d-footnote-list');if(i){const e=document.querySelectorAll('d-footnote');i.footnotes=e}}}};const bi='/*\n * Copyright 2018 The Distill Template Authors\n *\n * Licensed under the Apache License, Version 2.0 (the "License");\n * you may not use this file except in compliance with the License.\n * You may obtain a copy of the License at\n *\n * http://www.apache.org/licenses/LICENSE-2.0\n *\n * Unless required by applicable law or agreed to in writing, software\n * distributed under 
the License is distributed on an "AS IS" BASIS,\n * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n * See the License for the specific language governing permissions and\n * limitations under the License.\n */\n\nhtml {\n font-size: 14px;\n\tline-height: 1.6em;\n /* font-family: "Libre Franklin", "Helvetica Neue", sans-serif; */\n font-family: -apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, Oxygen, Ubuntu, Cantarell, "Fira Sans", "Droid Sans", "Helvetica Neue", Arial, sans-serif;\n /*, "Apple Color Emoji", "Segoe UI Emoji", "Segoe UI Symbol";*/\n text-size-adjust: 100%;\n -ms-text-size-adjust: 100%;\n -webkit-text-size-adjust: 100%;\n}\n\n@media(min-width: 768px) {\n html {\n font-size: 16px;\n }\n}\n\nbody {\n margin: 0;\n}\n\na {\n color: #004276;\n}\n\nfigure {\n margin: 0;\n}\n\ntable {\n\tborder-collapse: collapse;\n\tborder-spacing: 0;\n}\n\ntable th {\n\ttext-align: left;\n}\n\ntable thead {\n border-bottom: 1px solid rgba(0, 0, 0, 0.05);\n}\n\ntable thead th {\n padding-bottom: 0.5em;\n}\n\ntable tbody :first-child td {\n padding-top: 0.5em;\n}\n\npre {\n overflow: auto;\n max-width: 100%;\n}\n\np {\n margin-top: 0;\n margin-bottom: 1em;\n}\n\nsup, sub {\n vertical-align: baseline;\n position: relative;\n top: -0.4em;\n line-height: 1em;\n}\n\nsub {\n top: 0.4em;\n}\n\n.kicker,\n.marker {\n font-size: 15px;\n font-weight: 600;\n color: rgba(0, 0, 0, 0.5);\n}\n\n\n/* Headline */\n\n@media(min-width: 1024px) {\n d-title h1 span {\n display: block;\n }\n}\n\n/* Figure */\n\nfigure {\n position: relative;\n margin-bottom: 2.5em;\n margin-top: 1.5em;\n}\n\nfigcaption+figure {\n\n}\n\nfigure img {\n width: 100%;\n}\n\nfigure svg text,\nfigure svg tspan {\n}\n\nfigcaption,\n.figcaption {\n color: rgba(0, 0, 0, 0.6);\n font-size: 12px;\n line-height: 1.5em;\n}\n\n@media(min-width: 1024px) {\nfigcaption,\n.figcaption {\n font-size: 13px;\n }\n}\n\nfigure.external img {\n background: white;\n border: 1px solid rgba(0, 0, 0, 0.1);\n 
box-shadow: 0 1px 8px rgba(0, 0, 0, 0.1);\n padding: 18px;\n box-sizing: border-box;\n}\n\nfigcaption a {\n color: rgba(0, 0, 0, 0.6);\n}\n\nfigcaption b,\nfigcaption strong, {\n font-weight: 600;\n color: rgba(0, 0, 0, 1.0);\n}\n'+'/*\n * Copyright 2018 The Distill Template Authors\n *\n * Licensed under the Apache License, Version 2.0 (the "License");\n * you may not use this file except in compliance with the License.\n * You may obtain a copy of the License at\n *\n * http://www.apache.org/licenses/LICENSE-2.0\n *\n * Unless required by applicable law or agreed to in writing, software\n * distributed under the License is distributed on an "AS IS" BASIS,\n * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n * See the License for the specific language governing permissions and\n * limitations under the License.\n */\n\n@supports not (display: grid) {\n .base-grid,\n distill-header,\n d-title,\n d-abstract,\n d-article,\n d-appendix,\n distill-appendix,\n d-byline,\n d-footnote-list,\n d-citation-list,\n distill-footer {\n display: block;\n padding: 8px;\n }\n}\n\n.base-grid,\ndistill-header,\nd-title,\nd-abstract,\nd-article,\nd-appendix,\ndistill-appendix,\nd-byline,\nd-footnote-list,\nd-citation-list,\ndistill-footer {\n display: grid;\n justify-items: stretch;\n grid-template-columns: [screen-start] 8px [page-start kicker-start text-start gutter-start middle-start] 1fr 1fr 1fr 1fr 1fr 1fr 1fr 1fr [text-end page-end gutter-end kicker-end middle-end] 8px [screen-end];\n grid-column-gap: 8px;\n}\n\n.grid {\n display: grid;\n grid-column-gap: 8px;\n}\n\n@media(min-width: 768px) {\n .base-grid,\n distill-header,\n d-title,\n d-abstract,\n d-article,\n d-appendix,\n distill-appendix,\n d-byline,\n d-footnote-list,\n d-citation-list,\n distill-footer {\n grid-template-columns: [screen-start] 1fr [page-start kicker-start middle-start text-start] 45px 45px 45px 45px 45px 45px 45px 45px [ kicker-end text-end gutter-start] 45px [middle-end] 45px 
[page-end gutter-end] 1fr [screen-end];\n grid-column-gap: 16px;\n }\n\n .grid {\n grid-column-gap: 16px;\n }\n}\n\n@media(min-width: 1000px) {\n .base-grid,\n distill-header,\n d-title,\n d-abstract,\n d-article,\n d-appendix,\n distill-appendix,\n d-byline,\n d-footnote-list,\n d-citation-list,\n distill-footer {\n grid-template-columns: [screen-start] 1fr [page-start kicker-start] 50px [middle-start] 50px [text-start kicker-end] 50px 50px 50px 50px 50px 50px 50px 50px [text-end gutter-start] 50px [middle-end] 50px [page-end gutter-end] 1fr [screen-end];\n grid-column-gap: 16px;\n }\n\n .grid {\n grid-column-gap: 16px;\n }\n}\n\n@media(min-width: 1180px) {\n .base-grid,\n distill-header,\n d-title,\n d-abstract,\n d-article,\n d-appendix,\n distill-appendix,\n d-byline,\n d-footnote-list,\n d-citation-list,\n distill-footer {\n grid-template-columns: [screen-start] 1fr [page-start kicker-start] 60px [middle-start] 60px [text-start kicker-end] 60px 60px 60px 60px 60px 60px 60px 60px [text-end gutter-start] 60px [middle-end] 60px [page-end gutter-end] 1fr [screen-end];\n grid-column-gap: 32px;\n }\n\n .grid {\n grid-column-gap: 32px;\n }\n}\n\n\n\n\n.base-grid {\n grid-column: screen;\n}\n\n/* .l-body,\nd-article > * {\n grid-column: text;\n}\n\n.l-page,\nd-title > *,\nd-figure {\n grid-column: page;\n} */\n\n.l-gutter {\n grid-column: gutter;\n}\n\n.l-text,\n.l-body {\n grid-column: text;\n}\n\n.l-page {\n grid-column: page;\n}\n\n.l-body-outset {\n grid-column: middle;\n}\n\n.l-page-outset {\n grid-column: page;\n}\n\n.l-screen {\n grid-column: screen;\n}\n\n.l-screen-inset {\n grid-column: screen;\n padding-left: 16px;\n padding-left: 16px;\n}\n\n\n/* Aside */\n\nd-article aside {\n grid-column: gutter;\n font-size: 12px;\n line-height: 1.6em;\n color: rgba(0, 0, 0, 0.6)\n}\n\n@media(min-width: 768px) {\n aside {\n grid-column: gutter;\n }\n\n .side {\n grid-column: gutter;\n }\n}\n'+'/*\n * Copyright 2018 The Distill Template Authors\n *\n * Licensed under the 
Apache License, Version 2.0 (the "License");\n * you may not use this file except in compliance with the License.\n * You may obtain a copy of the License at\n *\n * http://www.apache.org/licenses/LICENSE-2.0\n *\n * Unless required by applicable law or agreed to in writing, software\n * distributed under the License is distributed on an "AS IS" BASIS,\n * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n * See the License for the specific language governing permissions and\n * limitations under the License.\n */\n\nd-title {\n padding: 2rem 0 1.5rem;\n contain: layout style;\n overflow-x: hidden;\n}\n\n@media(min-width: 768px) {\n d-title {\n padding: 4rem 0 1.5rem;\n }\n}\n\nd-title h1 {\n grid-column: text;\n font-size: 40px;\n font-weight: 700;\n line-height: 1.1em;\n margin: 0 0 0.5rem;\n}\n\n@media(min-width: 768px) {\n d-title h1 {\n font-size: 50px;\n }\n}\n\nd-title p {\n font-weight: 300;\n font-size: 1.2rem;\n line-height: 1.55em;\n grid-column: text;\n}\n\nd-title .status {\n margin-top: 0px;\n font-size: 12px;\n color: #009688;\n opacity: 0.8;\n grid-column: kicker;\n}\n\nd-title .status span {\n line-height: 1;\n display: inline-block;\n padding: 6px 0;\n border-bottom: 1px solid #80cbc4;\n font-size: 11px;\n text-transform: uppercase;\n}\n'+'/*\n * Copyright 2018 The Distill Template Authors\n *\n * Licensed under the Apache License, Version 2.0 (the "License");\n * you may not use this file except in compliance with the License.\n * You may obtain a copy of the License at\n *\n * http://www.apache.org/licenses/LICENSE-2.0\n *\n * Unless required by applicable law or agreed to in writing, software\n * distributed under the License is distributed on an "AS IS" BASIS,\n * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n * See the License for the specific language governing permissions and\n * limitations under the License.\n */\n\nd-byline {\n contain: content;\n overflow: hidden;\n border-top: 1px 
solid rgba(0, 0, 0, 0.1);\n font-size: 0.8rem;\n line-height: 1.8em;\n padding: 1.5rem 0;\n min-height: 1.8em;\n}\n\n\nd-byline .byline {\n grid-template-columns: 1fr 1fr;\n grid-column: text;\n}\n\n@media(min-width: 768px) {\n d-byline .byline {\n grid-template-columns: 1fr 1fr 1fr 1fr;\n }\n}\n\nd-byline .authors-affiliations {\n grid-column-end: span 2;\n grid-template-columns: 1fr 1fr;\n margin-bottom: 1em;\n}\n\n@media(min-width: 768px) {\n d-byline .authors-affiliations {\n margin-bottom: 0;\n }\n}\n\nd-byline h3 {\n font-size: 0.6rem;\n font-weight: 400;\n color: rgba(0, 0, 0, 0.5);\n margin: 0;\n text-transform: uppercase;\n}\n\nd-byline p {\n margin: 0;\n}\n\nd-byline a,\nd-article d-byline a {\n color: rgba(0, 0, 0, 0.8);\n text-decoration: none;\n border-bottom: none;\n}\n\nd-article d-byline a:hover {\n text-decoration: underline;\n border-bottom: none;\n}\n\nd-byline p.author {\n font-weight: 500;\n}\n\nd-byline .affiliations {\n\n}\n'+'/*\n * Copyright 2018 The Distill Template Authors\n *\n * Licensed under the Apache License, Version 2.0 (the "License");\n * you may not use this file except in compliance with the License.\n * You may obtain a copy of the License at\n *\n * http://www.apache.org/licenses/LICENSE-2.0\n *\n * Unless required by applicable law or agreed to in writing, software\n * distributed under the License is distributed on an "AS IS" BASIS,\n * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n * See the License for the specific language governing permissions and\n * limitations under the License.\n */\n\nd-article {\n contain: layout style;\n overflow-x: hidden;\n border-top: 1px solid rgba(0, 0, 0, 0.1);\n padding-top: 2rem;\n color: rgba(0, 0, 0, 0.8);\n}\n\nd-article > * {\n grid-column: text;\n}\n\n@media(min-width: 768px) {\n d-article {\n font-size: 16px;\n }\n}\n\n@media(min-width: 1024px) {\n d-article {\n font-size: 1.06rem;\n line-height: 1.7em;\n }\n}\n\n\n/* H2 */\n\n\nd-article .marker {\n 
text-decoration: none;\n border: none;\n counter-reset: section;\n grid-column: kicker;\n line-height: 1.7em;\n}\n\nd-article .marker:hover {\n border: none;\n}\n\nd-article .marker span {\n padding: 0 3px 4px;\n border-bottom: 1px solid rgba(0, 0, 0, 0.2);\n position: relative;\n top: 4px;\n}\n\nd-article .marker:hover span {\n color: rgba(0, 0, 0, 0.7);\n border-bottom: 1px solid rgba(0, 0, 0, 0.7);\n}\n\nd-article h2 {\n font-weight: 600;\n font-size: 24px;\n line-height: 1.25em;\n margin: 2rem 0 1.5rem 0;\n border-bottom: 1px solid rgba(0, 0, 0, 0.1);\n padding-bottom: 1rem;\n}\n\n@media(min-width: 1024px) {\n d-article h2 {\n font-size: 36px;\n }\n}\n\n/* H3 */\n\nd-article h3 {\n font-weight: 700;\n font-size: 18px;\n line-height: 1.4em;\n margin-bottom: 1em;\n margin-top: 2em;\n}\n\n@media(min-width: 1024px) {\n d-article h3 {\n font-size: 20px;\n }\n}\n\n/* H4 */\n\nd-article h4 {\n font-weight: 600;\n text-transform: uppercase;\n font-size: 14px;\n line-height: 1.4em;\n}\n\nd-article a {\n color: inherit;\n}\n\nd-article p,\nd-article ul,\nd-article ol,\nd-article blockquote {\n margin-top: 0;\n margin-bottom: 1em;\n margin-left: 0;\n margin-right: 0;\n}\n\nd-article blockquote {\n border-left: 2px solid rgba(0, 0, 0, 0.2);\n padding-left: 2em;\n font-style: italic;\n color: rgba(0, 0, 0, 0.6);\n}\n\nd-article a {\n border-bottom: 1px solid rgba(0, 0, 0, 0.4);\n text-decoration: none;\n}\n\nd-article a:hover {\n border-bottom: 1px solid rgba(0, 0, 0, 0.8);\n}\n\nd-article .link {\n text-decoration: underline;\n cursor: pointer;\n}\n\nd-article ul,\nd-article ol {\n padding-left: 24px;\n}\n\nd-article li {\n margin-bottom: 1em;\n margin-left: 0;\n padding-left: 0;\n}\n\nd-article li:last-child {\n margin-bottom: 0;\n}\n\nd-article pre {\n font-size: 14px;\n margin-bottom: 20px;\n}\n\nd-article hr {\n grid-column: screen;\n width: 100%;\n border: none;\n border-bottom: 1px solid rgba(0, 0, 0, 0.1);\n margin-top: 60px;\n margin-bottom: 60px;\n}\n\nd-article 
section {\n margin-top: 60px;\n margin-bottom: 60px;\n}\n\nd-article span.equation-mimic {\n font-family: georgia;\n font-size: 115%;\n font-style: italic;\n}\n\nd-article > d-code,\nd-article section > d-code {\n display: block;\n}\n\nd-article > d-math[block],\nd-article section > d-math[block] {\n display: block;\n}\n\n@media (max-width: 768px) {\n d-article > d-code,\n d-article section > d-code,\n d-article > d-math[block],\n d-article section > d-math[block] {\n overflow-x: scroll;\n -ms-overflow-style: none; // IE 10+\n overflow: -moz-scrollbars-none; // Firefox\n }\n\n d-article > d-code::-webkit-scrollbar,\n d-article section > d-code::-webkit-scrollbar,\n d-article > d-math[block]::-webkit-scrollbar,\n d-article section > d-math[block]::-webkit-scrollbar {\n display: none; // Safari and Chrome\n }\n}\n\nd-article .citation {\n color: #668;\n cursor: pointer;\n}\n\nd-include {\n width: auto;\n display: block;\n}\n\nd-figure {\n contain: layout style;\n}\n\n/* KaTeX */\n\n.katex, .katex-prerendered {\n contain: style;\n display: inline-block;\n}\n\n/* Tables */\n\nd-article table {\n border-collapse: collapse;\n margin-bottom: 1.5rem;\n border-bottom: 1px solid rgba(0, 0, 0, 0.2);\n}\n\nd-article table th {\n border-bottom: 1px solid rgba(0, 0, 0, 0.2);\n}\n\nd-article table td {\n border-bottom: 1px solid rgba(0, 0, 0, 0.05);\n}\n\nd-article table tr:last-of-type td {\n border-bottom: none;\n}\n\nd-article table th,\nd-article table td {\n font-size: 15px;\n padding: 2px 8px;\n}\n\nd-article table tbody :first-child td {\n padding-top: 2px;\n}\n'+ni+'/*\n * Copyright 2018 The Distill Template Authors\n *\n * Licensed under the Apache License, Version 2.0 (the "License");\n * you may not use this file except in compliance with the License.\n * You may obtain a copy of the License at\n *\n * http://www.apache.org/licenses/LICENSE-2.0\n *\n * Unless required by applicable law or agreed to in writing, software\n * distributed under the License is distributed 
on an "AS IS" BASIS,\n * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n * See the License for the specific language governing permissions and\n * limitations under the License.\n */\n\n@media print {\n\n @page {\n size: 8in 11in;\n @bottom-right {\n content: counter(page) " of " counter(pages);\n }\n }\n\n html {\n /* no general margins -- CSS Grid takes care of those */\n }\n\n p, code {\n page-break-inside: avoid;\n }\n\n h2, h3 {\n page-break-after: avoid;\n }\n\n d-header {\n visibility: hidden;\n }\n\n d-footer {\n display: none!important;\n }\n\n}\n',mi=[{name:'WebComponents',support:function(){return'customElements'in window&&'attachShadow'in Element.prototype&&'getRootNode'in Element.prototype&&'content'in document.createElement('template')&&'Promise'in window&&'from'in Array},url:'https://distill.pub/third-party/polyfills/webcomponents-lite.js'},{name:'IntersectionObserver',support:function(){return'IntersectionObserver'in window&&'IntersectionObserverEntry'in window},url:'https://distill.pub/third-party/polyfills/intersection-observer.js'}];class yi{static browserSupportsAllFeatures(){return mi.every((e)=>e.support())}static load(e){const t=function(t){t.loaded=!0,console.info('Runlevel 0: Polyfill has finished loading: '+t.name),yi.neededPolyfills.every((e)=>e.loaded)&&(console.info('Runlevel 0: All required polyfills have finished loading.'),console.info('Runlevel 0->1.'),window.distillRunlevel=1,e())};for(const n of yi.neededPolyfills)g(n,t)}static get neededPolyfills(){return yi._neededPolyfills||(yi._neededPolyfills=mi.filter((e)=>!e.support())),yi._neededPolyfills}}const xi=ti('d-abstract',` + + + +`);class ki extends xi(HTMLElement){}const vi=ti('d-appendix',` + + +`,!1);class wi extends vi(HTMLElement){}const Si=/^\s*$/;class Ci extends HTMLElement{static get is(){return'd-article'}constructor(){super(),new MutationObserver((e)=>{for(const t of e)for(const e of t.addedNodes)switch(e.nodeName){case'#text':{const 
t=e.nodeValue;if(!Si.test(t)){console.warn('Use of unwrapped text in distill articles is discouraged as it breaks layout! Please wrap any text in a <span> or <p>
tag. We found the following text: '+t);const n=document.createElement('span');n.innerHTML=e.nodeValue,e.parentNode.insertBefore(n,e),e.parentNode.removeChild(e)}}}}).observe(this,{childList:!0})}}var Ti='undefined'==typeof window?'undefined'==typeof global?'undefined'==typeof self?{}:self:global:window,_i=f(function(e,t){(function(e){function t(){this.months=['jan','feb','mar','apr','may','jun','jul','aug','sep','oct','nov','dec'],this.notKey=[',','{','}',' ','='],this.pos=0,this.input='',this.entries=[],this.currentEntry='',this.setInput=function(e){this.input=e},this.getEntries=function(){return this.entries},this.isWhitespace=function(e){return' '==e||'\r'==e||'\t'==e||'\n'==e},this.match=function(e,t){if((void 0==t||null==t)&&(t=!0),this.skipWhitespace(t),this.input.substring(this.pos,this.pos+e.length)==e)this.pos+=e.length;else throw'Token mismatch, expected '+e+', found '+this.input.substring(this.pos);this.skipWhitespace(t)},this.tryMatch=function(e,t){return(void 0==t||null==t)&&(t=!0),this.skipWhitespace(t),this.input.substring(this.pos,this.pos+e.length)==e},this.matchAt=function(){for(;this.input.length>this.pos&&'@'!=this.input[this.pos];)this.pos++;return!('@'!=this.input[this.pos])},this.skipWhitespace=function(e){for(;this.isWhitespace(this.input[this.pos]);)this.pos++;if('%'==this.input[this.pos]&&!0==e){for(;'\n'!=this.input[this.pos];)this.pos++;this.skipWhitespace(e)}},this.value_braces=function(){var e=0;this.match('{',!1);for(var t=this.pos,n=!1;;){if(!n)if('}'==this.input[this.pos]){if(0=this.input.length-1)throw'Unterminated value';n='\\'==this.input[this.pos]&&!1==n,this.pos++}},this.value_comment=function(){for(var e='',t=0;!(this.tryMatch('}',!1)&&0==t);){if(e+=this.input[this.pos],'{'==this.input[this.pos]&&t++,'}'==this.input[this.pos]&&t--,this.pos>=this.input.length-1)throw'Unterminated value:'+this.input.substring(start);this.pos++}return e},this.value_quotes=function(){this.match('"',!1);for(var 
e=this.pos,t=!1;;){if(!t){if('"'==this.input[this.pos]){var n=this.pos;return this.match('"',!1),this.input.substring(e,n)}if(this.pos>=this.input.length-1)throw'Unterminated value:'+this.input.substring(e)}t='\\'==this.input[this.pos]&&!1==t,this.pos++}},this.single_value=function(){var e=this.pos;if(this.tryMatch('{'))return this.value_braces();if(this.tryMatch('"'))return this.value_quotes();var t=this.key();if(t.match('^[0-9]+$'))return t;if(0<=this.months.indexOf(t.toLowerCase()))return t.toLowerCase();throw'Value expected:'+this.input.substring(e)+' for key: '+t},this.value=function(){for(var e=[this.single_value()];this.tryMatch('#');)this.match('#'),e.push(this.single_value());return e.join('')},this.key=function(){for(var e=this.pos;;){if(this.pos>=this.input.length)throw'Runaway key';if(0<=this.notKey.indexOf(this.input[this.pos]))return this.input.substring(e,this.pos);this.pos++}},this.key_equals_value=function(){var e=this.key();if(this.tryMatch('=')){this.match('=');var t=this.value();return[e,t]}throw'... 
= value expected, equals sign missing:'+this.input.substring(this.pos)},this.key_value_list=function(){var e=this.key_equals_value();for(this.currentEntry.entryTags={},this.currentEntry.entryTags[e[0]]=e[1];this.tryMatch(',')&&(this.match(','),!this.tryMatch('}'));)e=this.key_equals_value(),this.currentEntry.entryTags[e[0]]=e[1]},this.entry_body=function(e){this.currentEntry={},this.currentEntry.citationKey=this.key(),this.currentEntry.entryType=e.substring(1),this.match(','),this.key_value_list(),this.entries.push(this.currentEntry)},this.directive=function(){return this.match('@'),'@'+this.key()},this.preamble=function(){this.currentEntry={},this.currentEntry.entryType='PREAMBLE',this.currentEntry.entry=this.value_comment(),this.entries.push(this.currentEntry)},this.comment=function(){this.currentEntry={},this.currentEntry.entryType='COMMENT',this.currentEntry.entry=this.value_comment(),this.entries.push(this.currentEntry)},this.entry=function(e){this.entry_body(e)},this.bibtex=function(){for(;this.matchAt();){var e=this.directive();this.match('{'),'@STRING'==e?this.string():'@PREAMBLE'==e?this.preamble():'@COMMENT'==e?this.comment():this.entry(e),this.match('}')}}}e.toJSON=function(e){var n=new t;return n.setInput(e),n.bibtex(),n.entries},e.toBibtex=function(e){var t='';for(var n in e){if(t+='@'+e[n].entryType,t+='{',e[n].citationKey&&(t+=e[n].citationKey+', '),e[n].entry&&(t+=e[n].entry),e[n].entryTags){var i='';for(var a in e[n].entryTags)0!=i.length&&(i+=', '),i+=a+'= {'+e[n].entryTags[a]+'}';t+=i}t+='}\n\n'}return t}})(t)});class Li extends HTMLElement{static get is(){return'd-bibliography'}constructor(){super();const e=new MutationObserver((e)=>{for(const t of e)('SCRIPT'===t.target.nodeName||'characterData'===t.type)&&this.parseIfPossible()});e.observe(this,{childList:!0,characterData:!0,subtree:!0})}connectedCallback(){requestAnimationFrame(()=>{this.parseIfPossible()})}parseIfPossible(){const 
e=this.querySelector('script');if(e)if('text/bibtex'==e.type){const t=e.textContent;if(this.bibtex!==t){this.bibtex=t;const e=b(this.bibtex);this.notify(e)}}else if('text/json'==e.type){const t=new Map(JSON.parse(e.textContent));this.notify(t)}else console.warn('Unsupported bibliography script tag type: '+e.type)}notify(e){const t=new CustomEvent('onBibliographyChanged',{detail:e,bubbles:!0});this.dispatchEvent(t)}static get observedAttributes(){return['src']}receivedBibtex(e){const t=b(e.target.response);this.notify(t)}attributeChangedCallback(e,t,n){var i=new XMLHttpRequest;i.onload=(t)=>this.receivedBibtex(t),i.onerror=()=>console.warn(`Could not load Bibtex! (tried ${n})`),i.responseType='text',i.open('GET',n,!0),i.send()}}class Ai extends HTMLElement{static get is(){return'd-byline'}set frontMatter(e){this.innerHTML=y(e)}}const Ei=ti('d-cite',` + + + + +

+ + +
+`);class Di extends Ei(HTMLElement){connectedCallback(){this.outerSpan=this.root.querySelector('#citation-'),this.innerSpan=this.root.querySelector('.citation-number'),this.hoverBox=this.root.querySelector('d-hover-box'),window.customElements.whenDefined('d-hover-box').then(()=>{this.hoverBox.listen(this)})}static get observedAttributes(){return['key']}attributeChangedCallback(e,t,n){const i=t?'onCiteKeyChanged':'onCiteKeyCreated',a=n.split(','),d={detail:[this,a],bubbles:!0},r=new CustomEvent(i,d);document.dispatchEvent(r)}set key(e){this.setAttribute('key',e)}get key(){return this.getAttribute('key')}get keys(){return this.getAttribute('key').split(',')}set numbers(e){const t=e.map((e)=>{return-1==e?'?':e+1+''}),n='['+t.join(', ')+']';this.innerSpan&&(this.innerSpan.textContent=n)}set entries(e){this.hoverBox&&(this.hoverBox.innerHTML=`
    + ${e.map(l).map((e)=>`
  • ${e}
  • `).join('\n')} +
`)}}const Mi=` +d-citation-list { + contain: layout style; +} + +d-citation-list .references { + grid-column: text; +} + +d-citation-list .references .title { + font-weight: 500; +} +`;class Oi extends HTMLElement{static get is(){return'd-citation-list'}connectedCallback(){this.hasAttribute('distill-prerendered')||(this.style.display='none')}set citations(e){x(this,e)}}var Ui=f(function(e){var t='undefined'==typeof window?'undefined'!=typeof WorkerGlobalScope&&self instanceof WorkerGlobalScope?self:{}:window,n=function(){var e=/\blang(?:uage)?-(\w+)\b/i,n=0,a=t.Prism={util:{encode:function(e){return e instanceof i?new i(e.type,a.util.encode(e.content),e.alias):'Array'===a.util.type(e)?e.map(a.util.encode):e.replace(/&/g,'&').replace(/e.length)break tokenloop;if(!(y instanceof n)){c.lastIndex=0;var v=c.exec(y),w=1;if(!v&&f&&x!=d.length-1){if(c.lastIndex=i,v=c.exec(e),!v)break;for(var S=v.index+(g?v[1].length:0),C=v.index+v[0].length,T=x,k=i,p=d.length;T=k&&(++x,i=k);if(d[x]instanceof n||d[T-1].greedy)continue;w=T-x,y=e.slice(i,k),v.index-=i}if(v){g&&(h=v[1].length);var S=v.index+h,v=v[0].slice(h),C=S+v.length,_=y.slice(0,S),L=y.slice(C),A=[x,w];_&&A.push(_);var E=new n(o,u?a.tokenize(v,u):v,b,v,f);A.push(E),L&&A.push(L),Array.prototype.splice.apply(d,A)}}}}}return d},hooks:{all:{},add:function(e,t){var n=a.hooks.all;n[e]=n[e]||[],n[e].push(t)},run:function(e,t){var n=a.hooks.all[e];if(n&&n.length)for(var d,r=0;d=n[r++];)d(t)}}},i=a.Token=function(e,t,n,i,a){this.type=e,this.content=t,this.alias=n,this.length=0|(i||'').length,this.greedy=!!a};if(i.stringify=function(e,t,n){if('string'==typeof e)return e;if('Array'===a.util.type(e))return e.map(function(n){return i.stringify(n,t,e)}).join('');var d={type:e.type,content:i.stringify(e.content,t,n),tag:'span',classes:['token',e.type],attributes:{},language:t,parent:n};if('comment'==d.type&&(d.attributes.spellcheck='true'),e.alias){var 
r='Array'===a.util.type(e.alias)?e.alias:[e.alias];Array.prototype.push.apply(d.classes,r)}a.hooks.run('wrap',d);var l=Object.keys(d.attributes).map(function(e){return e+'="'+(d.attributes[e]||'').replace(/"/g,'"')+'"'}).join(' ');return'<'+d.tag+' class="'+d.classes.join(' ')+'"'+(l?' '+l:'')+'>'+d.content+''},!t.document)return t.addEventListener?(t.addEventListener('message',function(e){var n=JSON.parse(e.data),i=n.language,d=n.code,r=n.immediateClose;t.postMessage(a.highlight(d,a.languages[i],i)),r&&t.close()},!1),t.Prism):t.Prism;var d=document.currentScript||[].slice.call(document.getElementsByTagName('script')).pop();return d&&(a.filename=d.src,document.addEventListener&&!d.hasAttribute('data-manual')&&('loading'===document.readyState?document.addEventListener('DOMContentLoaded',a.highlightAll):window.requestAnimationFrame?window.requestAnimationFrame(a.highlightAll):window.setTimeout(a.highlightAll,16))),t.Prism}();e.exports&&(e.exports=n),'undefined'!=typeof Ti&&(Ti.Prism=n),n.languages.markup={comment://,prolog:/<\?[\w\W]+?\?>/,doctype://i,cdata://i,tag:{pattern:/<\/?(?!\d)[^\s>\/=$<]+(?:\s+[^\s>\/=]+(?:=(?:("|')(?:\\\1|\\?(?!\1)[\w\W])*\1|[^\s'">=]+))?)*\s*\/?>/i,inside:{tag:{pattern:/^<\/?[^\s>\/]+/i,inside:{punctuation:/^<\/?/,namespace:/^[^\s>\/:]+:/}},"attr-value":{pattern:/=(?:('|")[\w\W]*?(\1)|[^\s>]+)/i,inside:{punctuation:/[=>"']/}},punctuation:/\/?>/,"attr-name":{pattern:/[^\s>\/]+/,inside:{namespace:/^[^\s>\/:]+:/}}}},entity:/&#?[\da-z]{1,8};/i},n.hooks.add('wrap',function(e){'entity'===e.type&&(e.attributes.title=e.content.replace(/&/,'&'))}),n.languages.xml=n.languages.markup,n.languages.html=n.languages.markup,n.languages.mathml=n.languages.markup,n.languages.svg=n.languages.markup,n.languages.css={comment:/\/\*[\w\W]*?\*\//,atrule:{pattern:/@[\w-]+?.*?(;|(?=\s*\{))/i,inside:{rule:/@[\w-]+/}},url:/url\((?:(["'])(\\(?:\r\n|[\w\W])|(?!\1)[^\\\r\n])*\1|.*?)\)/i,selector:/[^\{\}\s][^\{\};]*?(?=\s*\{)/,string:{pattern:/("|')(\\(?:\r\n|[\w\W])|(?!\
1)[^\\\r\n])*\1/,greedy:!0},property:/(\b|\B)[\w-]+(?=\s*:)/i,important:/\B!important\b/i,function:/[-a-z0-9]+(?=\()/i,punctuation:/[(){};:]/},n.languages.css.atrule.inside.rest=n.util.clone(n.languages.css),n.languages.markup&&(n.languages.insertBefore('markup','tag',{style:{pattern:/(<style[\w\W]*?>)[\w\W]*?(?=<\/style>)/i,lookbehind:!0,inside:n.languages.css,alias:'language-css'}}),n.languages.insertBefore('inside','attr-value',{"style-attr":{pattern:/\s*style=("|').*?\1/i,inside:{"attr-name":{pattern:/^\s*style/i,inside:n.languages.markup.tag.inside},punctuation:/^\s*=\s*['"]|['"]\s*$/,"attr-value":{pattern:/.+/i,inside:n.languages.css}},alias:'language-css'}},n.languages.markup.tag)),n.languages.clike={comment:[{pattern:/(^|[^\\])#.*/,lookbehind:!0},{pattern:/(^|[^\\])\/\*[\w\W]*?\*\//,lookbehind:!0},{pattern:/(^|[^\\:])\/\/.*/,lookbehind:!0}],string:{pattern:/(["'])(\\(?:\r\n|[\s\S])|(?!\1)[^\\\r\n])*\1/,greedy:!0},"class-name":{pattern:/((?:\b(?:class|interface|extends|implements|trait|instanceof|new)\s+)|(?:catch\s+\())[a-z0-9_\.\\]+/i,lookbehind:!0,inside:{punctuation:/(\.|\\)/}},keyword:/\b(if|else|while|do|for|return|in|instanceof|function|new|try|throw|catch|finally|null|break|continue)\b/,boolean:/\b(true|false)\b/,function:/[a-z\.0-9_]+(?=\()/i,number:/\b-?(?:0x[\da-f]+|\d*\.?\d+(?:e[+-]?\d+)?)\b/i,operator:/--?|\+\+?|!=?=?|<=?|>=?|==?=?|&&?|\|\|?|\?|\*|\/|~|\^|%/,punctuation:/[{}[\];(),.:]/},n.languages.javascript=n.languages.extend('clike',{keyword:/\b(as|async|await|break|case|catch|class|const|continue|debugger|default|delete|do|else|enum|export|extends|finally|for|from|function|get|if|implements|import|in|instanceof|interface|let|new|null|of|package|private|protected|public|return|set|static|super|switch|this|throw|try|typeof|var|void|while|with|yield)\b/,number:/\b-?(0x[\dA-Fa-f]+|0b[01]+|0o[0-7]+|\d*\.?\d+([Ee][+-]?\d+)?|NaN|Infinity)\b/,function:/[_$a-zA-Z\xA0-\uFFFF][_$a-zA-Z0-9\xA0-\uFFFF]*(?=\()/i,operator:/--?|\+\+?|!=?=?|<=?|>=?|==?=?|&&?|\|\|?|\?|\*\*?|
\/|~|\^|%|\.{3}/}),n.languages.insertBefore('javascript','keyword',{regex:{pattern:/(^|[^/])\/(?!\/)(\[.+?]|\\.|[^/\\\r\n])+\/[gimyu]{0,5}(?=\s*($|[\r\n,.;})]))/,lookbehind:!0,greedy:!0}}),n.languages.insertBefore('javascript','string',{"template-string":{pattern:/`(?:\\\\|\\?[^\\])*?`/,greedy:!0,inside:{interpolation:{pattern:/\$\{[^}]+\}/,inside:{"interpolation-punctuation":{pattern:/^\$\{|\}$/,alias:'punctuation'},rest:n.languages.javascript}},string:/[\s\S]+/}}}),n.languages.markup&&n.languages.insertBefore('markup','tag',{script:{pattern:/(<script[\w\W]*?>)[\w\W]*?(?=<\/script>)/i,lookbehind:!0,inside:n.languages.javascript,alias:'language-javascript'}}),n.languages.js=n.languages.javascript,function(){'undefined'!=typeof self&&self.Prism&&self.document&&document.querySelector&&(self.Prism.fileHighlight=function(){var e={js:'javascript',py:'python',rb:'ruby',ps1:'powershell',psm1:'powershell',sh:'bash',bat:'batch',h:'c',tex:'latex'};Array.prototype.forEach&&Array.prototype.slice.call(document.querySelectorAll('pre[data-src]')).forEach(function(t){for(var i,a=t.getAttribute('data-src'),d=t,r=/\blang(?:uage)?-(?!\*)(\w+)\b/i;d&&!r.test(d.className);)d=d.parentNode;if(d&&(i=(t.className.match(r)||[,''])[1]),!i){var o=(a.match(/\.(\w+)$/)||[,''])[1];i=e[o]||o}var l=document.createElement('code');l.className='language-'+i,t.textContent='',l.textContent='Loading\u2026',t.appendChild(l);var s=new XMLHttpRequest;s.open('GET',a,!0),s.onreadystatechange=function(){4==s.readyState&&(400>s.status&&s.responseText?(l.textContent=s.responseText,n.highlightElement(l)):400<=s.status?l.textContent='\u2716 Error '+s.status+' while fetching file: '+s.statusText:l.textContent='\u2716 Error: File does not exist or is 
empty')},s.send(null)})},document.addEventListener('DOMContentLoaded',self.Prism.fileHighlight))}()});Prism.languages.python={"triple-quoted-string":{pattern:/"""[\s\S]+?"""|'''[\s\S]+?'''/,alias:'string'},comment:{pattern:/(^|[^\\])#.*/,lookbehind:!0},string:{pattern:/("|')(?:\\\\|\\?[^\\\r\n])*?\1/,greedy:!0},function:{pattern:/((?:^|\s)def[ \t]+)[a-zA-Z_][a-zA-Z0-9_]*(?=\()/g,lookbehind:!0},"class-name":{pattern:/(\bclass\s+)[a-z0-9_]+/i,lookbehind:!0},keyword:/\b(?:as|assert|async|await|break|class|continue|def|del|elif|else|except|exec|finally|for|from|global|if|import|in|is|lambda|pass|print|raise|return|try|while|with|yield)\b/,boolean:/\b(?:True|False)\b/,number:/\b-?(?:0[bo])?(?:(?:\d|0x[\da-f])[\da-f]*\.?\d*|\.\d+)(?:e[+-]?\d+)?j?\b/i,operator:/[-+%=]=?|!=|\*\*?=?|\/\/?=?|<[<=>]?|>[=>]?|[&|^~]|\b(?:or|and|not)\b/,punctuation:/[{}[\];(),.:]/},Prism.languages.clike={comment:[{pattern:/(^|[^\\])#.*/,lookbehind:!0},{pattern:/(^|[^\\])\/\*[\w\W]*?\*\//,lookbehind:!0},{pattern:/(^|[^\\:])\/\/.*/,lookbehind:!0}],string:{pattern:/(["'])(\\(?:\r\n|[\s\S])|(?!\1)[^\\\r\n])*\1/,greedy:!0},"class-name":{pattern:/((?:\b(?:class|interface|extends|implements|trait|instanceof|new)\s+)|(?:catch\s+\())[a-z0-9_\.\\]+/i,lookbehind:!0,inside:{punctuation:/(\.|\\)/}},keyword:/\b(if|else|while|do|for|return|in|instanceof|function|new|try|throw|catch|finally|null|break|continue)\b/,boolean:/\b(true|false)\b/,function:/[a-z\.0-9_]+(?=\()/i,number:/\b-?(?:0x[\da-f]+|\d*\.?\d+(?:e[+-]?\d+)?)\b/i,operator:/--?|\+\+?|!=?=?|<=?|>=?|==?=?|&&?|\|\|?|\?|\*|\/|~|\^|%/,punctuation:/[{}[\];(),.:]/},Prism.languages.lua={comment:/^#!.+|--(?:\[(=*)\[[\s\S]*?\]\1\]|.*)/m,string:{pattern:/(["'])(?:(?!\1)[^\\\r\n]|\\z(?:\r\n|\s)|\\(?:\r\n|[\s\S]))*\1|\[(=*)\[[\s\S]*?\]\2\]/,greedy:!0},number:/\b0x[a-f\d]+\.?[a-f\d]*(?:p[+-]?\d+)?\b|\b\d+(?:\.\B|\.?\d*(?:e[+-]?\d+)?\b)|\B\.\d+(?:e[+-]?\d+)?\b/i,keyword:/\b(?:and|break|do|else|elseif|end|false|for|function|goto|if|in|local|nil|not|or|repeat|return|t
hen|true|until|while)\b/,function:/(?!\d)\w+(?=\s*(?:[({]))/,operator:[/[-+*%^&|#]|\/\/?|<[<=]?|>[>=]?|[=~]=?/,{pattern:/(^|[^.])\.\.(?!\.)/,lookbehind:!0}],punctuation:/[\[\](){},;]|\.+|:+/},function(e){var t={variable:[{pattern:/\$?\(\([\w\W]+?\)\)/,inside:{variable:[{pattern:/(^\$\(\([\w\W]+)\)\)/,lookbehind:!0},/^\$\(\(/],number:/\b-?(?:0x[\dA-Fa-f]+|\d*\.?\d+(?:[Ee]-?\d+)?)\b/,operator:/--?|-=|\+\+?|\+=|!=?|~|\*\*?|\*=|\/=?|%=?|<<=?|>>=?|<=?|>=?|==?|&&?|&=|\^=?|\|\|?|\|=|\?|:/,punctuation:/\(\(?|\)\)?|,|;/}},{pattern:/\$\([^)]+\)|`[^`]+`/,inside:{variable:/^\$\(|^`|\)$|`$/}},/\$(?:[a-z0-9_#\?\*!@]+|\{[^}]+\})/i]};e.languages.bash={shebang:{pattern:/^#!\s*\/bin\/bash|^#!\s*\/bin\/sh/,alias:'important'},comment:{pattern:/(^|[^"{\\])#.*/,lookbehind:!0},string:[{pattern:/((?:^|[^<])<<\s*)(?:"|')?(\w+?)(?:"|')?\s*\r?\n(?:[\s\S])*?\r?\n\2/g,lookbehind:!0,greedy:!0,inside:t},{pattern:/(["'])(?:\\\\|\\?[^\\])*?\1/g,greedy:!0,inside:t}],variable:t.variable,function:{pattern:/(^|\s|;|\||&)(?:alias|apropos|apt-get|aptitude|aspell|awk|basename|bash|bc|bg|builtin|bzip2|cal|cat|cd|cfdisk|chgrp|chmod|chown|chroot|chkconfig|cksum|clear|cmp|comm|command|cp|cron|crontab|csplit|cut|date|dc|dd|ddrescue|df|diff|diff3|dig|dir|dircolors|dirname|dirs|dmesg|du|egrep|eject|enable|env|ethtool|eval|exec|expand|expect|export|expr|fdformat|fdisk|fg|fgrep|file|find|fmt|fold|format|free|fsck|ftp|fuser|gawk|getopts|git|grep|groupadd|groupdel|groupmod|groups|gzip|hash|head|help|hg|history|hostname|htop|iconv|id|ifconfig|ifdown|ifup|import|install|jobs|join|kill|killall|less|link|ln|locate|logname|logout|look|lpc|lpr|lprint|lprintd|lprintq|lprm|ls|lsof|make|man|mkdir|mkfifo|mkisofs|mknod|more|most|mount|mtools|mtr|mv|mmv|nano|netstat|nice|nl|nohup|notify-send|npm|nslookup|open|op|passwd|paste|pathchk|ping|pkill|popd|pr|printcap|printenv|printf|ps|pushd|pv|pwd|quota|quotacheck|quotactl|ram|rar|rcp|read|readarray|readonly|reboot|rename|renice|remsync|rev|rm|rmdir|rsync|screen|scp|sdiff|sed|seq|ser
vice|sftp|shift|shopt|shutdown|sleep|slocate|sort|source|split|ssh|stat|strace|su|sudo|sum|suspend|sync|tail|tar|tee|test|time|timeout|times|touch|top|traceroute|trap|tr|tsort|tty|type|ulimit|umask|umount|unalias|uname|unexpand|uniq|units|unrar|unshar|uptime|useradd|userdel|usermod|users|uuencode|uudecode|v|vdir|vi|vmstat|wait|watch|wc|wget|whereis|which|who|whoami|write|xargs|xdg-open|yes|zip)(?=$|\s|;|\||&)/,lookbehind:!0},keyword:{pattern:/(^|\s|;|\||&)(?:let|:|\.|if|then|else|elif|fi|for|break|continue|while|in|case|function|select|do|done|until|echo|exit|return|set|declare)(?=$|\s|;|\||&)/,lookbehind:!0},boolean:{pattern:/(^|\s|;|\||&)(?:true|false)(?=$|\s|;|\||&)/,lookbehind:!0},operator:/&&?|\|\|?|==?|!=?|<<>|<=?|>=?|=~/,punctuation:/\$?\(\(?|\)\)?|\.\.|[{}[\];]/};var n=t.variable[1].inside;n['function']=e.languages.bash['function'],n.keyword=e.languages.bash.keyword,n.boolean=e.languages.bash.boolean,n.operator=e.languages.bash.operator,n.punctuation=e.languages.bash.punctuation}(Prism),Prism.languages.go=Prism.languages.extend('clike',{keyword:/\b(break|case|chan|const|continue|default|defer|else|fallthrough|for|func|go(to)?|if|import|interface|map|package|range|return|select|struct|switch|type|var)\b/,builtin:/\b(bool|byte|complex(64|128)|error|float(32|64)|rune|string|u?int(8|16|32|64|)|uintptr|append|cap|close|complex|copy|delete|imag|len|make|new|panic|print(ln)?|real|recover)\b/,boolean:/\b(_|iota|nil|true|false)\b/,operator:/[*\/%^!=]=?|\+[=+]?|-[=-]?|\|[=|]?|&(?:=|&|\^=?)?|>(?:>=?|=)?|<(?:<=?|=|-)?|:=|\.\.\./,number:/\b(-?(0x[a-f\d]+|(\d+\.?\d*|\.\d+)(e[-+]?\d+)?)i?)\b/i,string:/("|'|`)(\\?.|\r|\n)*?\1/}),delete Prism.languages.go['class-name'],Prism.languages.markdown=Prism.languages.extend('markup',{}),Prism.languages.insertBefore('markdown','prolog',{blockquote:{pattern:/^>(?:[\t ]*>)*/m,alias:'punctuation'},code:[{pattern:/^(?: 
{4}|\t).+/m,alias:'keyword'},{pattern:/``.+?``|`[^`\n]+`/,alias:'keyword'}],title:[{pattern:/\w+.*(?:\r?\n|\r)(?:==+|--+)/,alias:'important',inside:{punctuation:/==+$|--+$/}},{pattern:/(^\s*)#+.+/m,lookbehind:!0,alias:'important',inside:{punctuation:/^#+|#+$/}}],hr:{pattern:/(^\s*)([*-])([\t ]*\2){2,}(?=\s*$)/m,lookbehind:!0,alias:'punctuation'},list:{pattern:/(^\s*)(?:[*+-]|\d+\.)(?=[\t ].)/m,lookbehind:!0,alias:'punctuation'},"url-reference":{pattern:/!?\[[^\]]+\]:[\t ]+(?:\S+|<(?:\\.|[^>\\])+>)(?:[\t ]+(?:"(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*'|\((?:\\.|[^)\\])*\)))?/,inside:{variable:{pattern:/^(!?\[)[^\]]+/,lookbehind:!0},string:/(?:"(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*'|\((?:\\.|[^)\\])*\))$/,punctuation:/^[\[\]!:]|[<>]/},alias:'url'},bold:{pattern:/(^|[^\\])(\*\*|__)(?:(?:\r?\n|\r)(?!\r?\n|\r)|.)+?\2/,lookbehind:!0,inside:{punctuation:/^\*\*|^__|\*\*$|__$/}},italic:{pattern:/(^|[^\\])([*_])(?:(?:\r?\n|\r)(?!\r?\n|\r)|.)+?\2/,lookbehind:!0,inside:{punctuation:/^[*_]|[*_]$/}},url:{pattern:/!?\[[^\]]+\](?:\([^\s)]+(?:[\t ]+"(?:\\.|[^"\\])*")?\)| 
?\[[^\]\n]*\])/,inside:{variable:{pattern:/(!?\[)[^\]]+(?=\]$)/,lookbehind:!0},string:{pattern:/"(?:\\.|[^"\\])*"(?=\)$)/}}}}),Prism.languages.markdown.bold.inside.url=Prism.util.clone(Prism.languages.markdown.url),Prism.languages.markdown.italic.inside.url=Prism.util.clone(Prism.languages.markdown.url),Prism.languages.markdown.bold.inside.italic=Prism.util.clone(Prism.languages.markdown.italic),Prism.languages.markdown.italic.inside.bold=Prism.util.clone(Prism.languages.markdown.bold),Prism.languages.julia={comment:{pattern:/(^|[^\\])#.*/,lookbehind:!0},string:/"""[\s\S]+?"""|'''[\s\S]+?'''|("|')(\\?.)*?\1/,keyword:/\b(abstract|baremodule|begin|bitstype|break|catch|ccall|const|continue|do|else|elseif|end|export|finally|for|function|global|if|immutable|import|importall|let|local|macro|module|print|println|quote|return|try|type|typealias|using|while)\b/,boolean:/\b(true|false)\b/,number:/\b-?(0[box])?(?:[\da-f]+\.?\d*|\.\d+)(?:[efp][+-]?\d+)?j?\b/i,operator:/\+=?|-=?|\*=?|\/[\/=]?|\\=?|\^=?|%=?|÷=?|!=?=?|&=?|\|[=>]?|\$=?|<(?:<=?|[=:])?|>(?:=|>>?=?)?|==?=?|[~≠≤≥]/,punctuation:/[{}[\];(),.:]/};const Ii=ti('d-code',` + + + + +`);class Ni extends ei(Ii(HTMLElement)){renderContent(){if(this.languageName=this.getAttribute('language'),!this.languageName)return void console.warn('You need to provide a language attribute to your block to let us know how to highlight your code; e.g.:\n zeros = np.zeros(shape).');const e=Ui.languages[this.languageName];if(void 0==e)return void console.warn(`Distill does not yet support highlighting your code block in "${this.languageName}'.`);let t=this.textContent;const n=this.shadowRoot.querySelector('#code-container');if(this.hasAttribute('block')){t=t.replace(/\n/,'');const e=t.match(/\s*/);if(t=t.replace(new RegExp('\n'+e,'g'),'\n'),t=t.trim(),n.parentNode instanceof ShadowRoot){const 
e=document.createElement('pre');this.shadowRoot.removeChild(n),e.appendChild(n),this.shadowRoot.appendChild(e)}}n.className=`language-${this.languageName}`,n.innerHTML=Ui.highlight(t,e)}}const ji=ti('d-footnote',` + + + +
+ +
+
+ + + + + +`);class Ri extends ji(HTMLElement){constructor(){super();const e=new MutationObserver(this.notify);e.observe(this,{childList:!0,characterData:!0,subtree:!0})}notify(){const e={detail:this,bubbles:!0},t=new CustomEvent('onFootnoteChanged',e);document.dispatchEvent(t)}connectedCallback(){this.hoverBox=this.root.querySelector('d-hover-box'),window.customElements.whenDefined('d-hover-box').then(()=>{this.hoverBox.listen(this)}),Ri.currentFootnoteId+=1;const e=Ri.currentFootnoteId.toString();this.root.host.id='d-footnote-'+e;const t='dt-fn-hover-box-'+e;this.hoverBox.id=t;const n=this.root.querySelector('#fn-');n.setAttribute('id','fn-'+e),n.setAttribute('data-hover-ref',t),n.textContent=e}}Ri.currentFootnoteId=0;const qi=ti('d-footnote-list',` + + +

Footnotes

+
    +`,!1);class Fi extends qi(HTMLElement){connectedCallback(){super.connectedCallback(),this.list=this.root.querySelector('ol'),this.root.style.display='none'}set footnotes(e){if(this.list.innerHTML='',e.length){this.root.style.display='';for(const t of e){const e=document.createElement('li');e.id=t.id+'-listing',e.innerHTML=t.innerHTML;const n=document.createElement('a');n.setAttribute('class','footnote-backlink'),n.textContent='[\u21A9]',n.href='#'+t.id,e.appendChild(n),this.list.appendChild(e)}}else this.root.style.display='none'}}const Pi=ti('d-hover-box',` + + +
    +
    + +
    +
    +`);class Hi extends Pi(HTMLElement){constructor(){super()}connectedCallback(){}listen(e){this.bindDivEvents(this),this.bindTriggerEvents(e)}bindDivEvents(e){e.addEventListener('mouseover',()=>{this.visible||this.showAtNode(e),this.stopTimeout()}),e.addEventListener('mouseout',()=>{this.extendTimeout(500)}),e.addEventListener('touchstart',(e)=>{e.stopPropagation()},{passive:!0}),document.body.addEventListener('touchstart',()=>{this.hide()},{passive:!0})}bindTriggerEvents(e){e.addEventListener('mouseover',()=>{this.visible||this.showAtNode(e),this.stopTimeout()}),e.addEventListener('mouseout',()=>{this.extendTimeout(300)}),e.addEventListener('touchstart',(t)=>{this.visible?this.hide():this.showAtNode(e),t.stopPropagation()},{passive:!0})}show(e){this.visible=!0,this.style.display='block',this.style.top=Pn(e[1]+10)+'px'}showAtNode(e){const t=e.getBoundingClientRect();this.show([e.offsetLeft+t.width,e.offsetTop+t.height])}hide(){this.visible=!1,this.style.display='none',this.stopTimeout()}stopTimeout(){this.timeout&&clearTimeout(this.timeout)}extendTimeout(e){this.stopTimeout(),this.timeout=setTimeout(()=>{this.hide()},e)}}class zi extends HTMLElement{static get is(){return'd-title'}}const Yi=ti('d-references',` + +`,!1);class Bi extends Yi(HTMLElement){}class Wi extends HTMLElement{static get is(){return'd-toc'}connectedCallback(){this.getAttribute('prerendered')||(window.onload=()=>{const e=document.querySelector('d-article'),t=e.querySelectorAll('h2, h3');k(this,t)})}}class Vi extends HTMLElement{static get is(){return'd-figure'}static get readyQueue(){return Vi._readyQueue||(Vi._readyQueue=[]),Vi._readyQueue}static addToReadyQueue(e){-1===Vi.readyQueue.indexOf(e)&&(Vi.readyQueue.push(e),Vi.runReadyQueue())}static runReadyQueue(){const 
e=Vi.readyQueue.sort((e,t)=>e._seenOnScreen-t._seenOnScreen).filter((e)=>!e._ready).pop();e&&(e.ready(),requestAnimationFrame(Vi.runReadyQueue))}constructor(){super(),this._ready=!1,this._onscreen=!1,this._offscreen=!0}connectedCallback(){this.loadsWhileScrolling=this.hasAttribute('loadsWhileScrolling'),Vi.marginObserver.observe(this),Vi.directObserver.observe(this)}disconnectedCallback(){Vi.marginObserver.unobserve(this),Vi.directObserver.unobserve(this)}static get marginObserver(){if(!Vi._marginObserver){const e=window.innerHeight,t=Fn(2*e),n=Vi.didObserveMarginIntersection,i=new IntersectionObserver(n,{rootMargin:t+'px 0px '+t+'px 0px',threshold:0.01});Vi._marginObserver=i}return Vi._marginObserver}static didObserveMarginIntersection(e){for(const t of e){const e=t.target;t.isIntersecting&&!e._ready&&Vi.addToReadyQueue(e)}}static get directObserver(){return Vi._directObserver||(Vi._directObserver=new IntersectionObserver(Vi.didObserveDirectIntersection,{rootMargin:'0px',threshold:[0,1]})),Vi._directObserver}static didObserveDirectIntersection(e){for(const t of e){const e=t.target;t.isIntersecting?(e._seenOnScreen=new Date,e._offscreen&&e.onscreen()):e._onscreen&&e.offscreen()}}addEventListener(e,t){super.addEventListener(e,t),'ready'===e&&-1!==Vi.readyQueue.indexOf(this)&&(this._ready=!1,Vi.runReadyQueue()),'onscreen'===e&&this.onscreen()}ready(){this._ready=!0,Vi.marginObserver.unobserve(this);const e=new CustomEvent('ready');this.dispatchEvent(e)}onscreen(){this._onscreen=!0,this._offscreen=!1;const e=new CustomEvent('onscreen');this.dispatchEvent(e)}offscreen(){this._onscreen=!1,this._offscreen=!0;const e=new CustomEvent('offscreen');this.dispatchEvent(e)}}if('undefined'!=typeof window){Vi.isScrolling=!1;let e;window.addEventListener('scroll',()=>{Vi.isScrolling=!0,clearTimeout(e),e=setTimeout(()=>{Vi.isScrolling=!1,Vi.runReadyQueue()},500)},!0)}const Ki=ti('d-interstitial',` + + +
    +
    +

    This article is in review.

    +

    Do not share this URL or the contents of this article. Thank you!

    + +

    Enter the password we shared with you as part of the review process to view the article.

    +
    +
    +`);class $i extends Ki(HTMLElement){connectedCallback(){if(this.shouldRemoveSelf())this.parentElement.removeChild(this);else{const e=this.root.querySelector('#interstitial-password-input');e.oninput=(e)=>this.passwordChanged(e)}}passwordChanged(e){const t=e.target.value;t===this.password&&(console.log('Correct password entered.'),this.parentElement.removeChild(this),'undefined'!=typeof Storage&&(console.log('Saved that correct password was entered.'),localStorage.setItem(this.localStorageIdentifier(),'true')))}shouldRemoveSelf(){return window&&window.location.hostname==='distill.pub'?(console.warn('Interstitial found on production, hiding it.'),!0):'undefined'!=typeof Storage&&'true'===localStorage.getItem(this.localStorageIdentifier())&&(console.log('Loaded that correct password was entered before; skipping interstitial.'),!0)}localStorageIdentifier(){return'distill-drafts'+(window?window.location.pathname:'-')+'interstitial-password-correct'}}var Xi=function(e,t){return et?1:e>=t?0:NaN},Ji=function(e){return 1===e.length&&(e=v(e)),{left:function(t,n,i,a){for(null==i&&(i=0),null==a&&(a=t.length);i>>1;0>e(t[d],n)?i=d+1:a=d}return i},right:function(t,n,i,a){for(null==i&&(i=0),null==a&&(a=t.length);i>>1;0(i=arguments.length)?(t=e,e=0,1):3>i?1:+a;for(var d=-1,i=0|Rn(0,qn((t-e)/a)),n=Array(i);++d=this.r&&0<=this.g&&255>=this.g&&0<=this.b&&255>=this.b&&0<=this.opacity&&1>=this.opacity},toString:function(){var e=this.opacity;return e=isNaN(e)?1:Rn(0,Hn(1,e)),(1===e?'rgb(':'rgba(')+Rn(0,Hn(255,Pn(this.r)||0))+', '+Rn(0,Hn(255,Pn(this.g)||0))+', '+Rn(0,Hn(255,Pn(this.b)||0))+(1===e?')':', '+e+')')}})),ra(F,function(e,t,n,i){return 1===arguments.length?q(e):new F(e,t,n,null==i?1:i)},_(L,{brighter:function(e){return e=null==e?la:In(la,e),new F(this.h,this.s,this.l*e,this.opacity)},darker:function(e){return e=null==e?oa:In(oa,e),new F(this.h,this.s,this.l*e,this.opacity)},rgb:function(){var 
e=this.h%360+360*(0>this.h),t=isNaN(e)||isNaN(this.s)?0:this.s,n=this.l,i=n+(0.5>n?n:1-n)*t,a=2*n-i;return new j(P(240<=e?e-240:e+120,a,i),P(e,a,i),P(120>e?e+240:e-120,a,i),this.opacity)},displayable:function(){return(0<=this.s&&1>=this.s||isNaN(this.s))&&0<=this.l&&1>=this.l&&0<=this.opacity&&1>=this.opacity}}));var ya=On/180,xa=180/On,ka=18,Kn=0.95047,Xn=1,Yn=1.08883,Zn=4/29,va=6/29,wa=3*va*va,Sa=va*va*va;ra(Y,function(e,t,n,i){return 1===arguments.length?H(e):new Y(e,t,n,null==i?1:i)},_(L,{brighter:function(e){return new Y(this.l+ka*(null==e?1:e),this.a,this.b,this.opacity)},darker:function(e){return new Y(this.l-ka*(null==e?1:e),this.a,this.b,this.opacity)},rgb:function(){var e=(this.l+16)/116,t=isNaN(this.a)?e:e+this.a/500,n=isNaN(this.b)?e:e-this.b/200;return e=Xn*V(e),t=Kn*V(t),n=Yn*V(n),new j(K(3.2404542*t-1.5371385*e-0.4985314*n),K(-0.969266*t+1.8760108*e+0.041556*n),K(0.0556434*t-0.2040259*e+1.0572252*n),this.opacity)}})),ra(X,function(e,t,n,i){return 1===arguments.length?z(e):new X(e,t,n,null==i?1:i)},_(L,{brighter:function(e){return new X(this.h,this.c,this.l+ka*(null==e?1:e),this.opacity)},darker:function(e){return new X(this.h,this.c,this.l-ka*(null==e?1:e),this.opacity)},rgb:function(){return H(this).rgb()}}));var Ca=-0.14861,A=+1.78277,B=-0.29227,C=-0.90649,D=+1.97294,E=D*C,Ta=D*A,_a=A*B-C*Ca;ra(Z,Q,_(L,{brighter:function(e){return e=null==e?la:In(la,e),new Z(this.h,this.s,this.l*e,this.opacity)},darker:function(e){return e=null==e?oa:In(oa,e),new Z(this.h,this.s,this.l*e,this.opacity)},rgb:function(){var e=isNaN(this.h)?0:(this.h+120)*ya,t=+this.l,n=isNaN(this.s)?0:this.s*t*(1-t),i=Mn(e),a=Dn(e);return new j(255*(t+n*(Ca*i+A*a)),255*(t+n*(B*i+C*a)),255*(t+n*(D*i)),this.opacity)}}));var La=function(e){return function(){return e}},Aa=function e(t){function n(e,t){var n=i((e=N(e)).r,(t=N(t)).r),a=i(e.g,t.g),d=i(e.b,t.b),r=ne(e.opacity,t.opacity);return function(i){return e.r=n(i),e.g=a(i),e.b=d(i),e.opacity=r(i),e+''}}var i=te(t);return 
n.gamma=e,n}(1),Ea=function(e,t){var n,i=t?t.length:0,a=e?Hn(i,e.length):0,d=Array(i),r=Array(i);for(n=0;nr&&(d=n.slice(r,d),l[o]?l[o]+=d:l[++o]=d),(t=t[0])===(a=a[0])?l[o]?l[o]+=a:l[++o]=a:(l[++o]=null,s.push({i:o,x:Ma(t,a)})),r=Ia.lastIndex;return rl.length?s[0]?ae(s[0].x):ie(n):(n=s.length,function(e){for(var t,a=0;an?n-360*Pn(n/360):n):La(isNaN(e)?t:e)});var qa,Fa=de(ne),Pa=function(e){return function(){return e}},Ha=function(e){return+e},za=[0,1],Ya=function(e,t){if(0>(n=(e=t?e.toExponential(t-1):e.toExponential()).indexOf('e')))return null;var n,i=e.slice(0,n);return[1d&&(o=Rn(1,d-l)),i.push(a.substring(r-=o,r+o)),!((l+=o+1)>d));)o=e[t=(t+1)%e.length];return i.reverse().join(n)}},Va=function(e){return function(t){return t.replace(/[0-9]/g,function(t){return e[+t]})}},Ka=function(e,t){var n=Ya(e,t);if(!n)return e+'';var i=n[0],a=n[1];return 0>a?'0.'+Array(-a).join('0')+i:i.length>a+1?i.slice(0,a+1)+'.'+i.slice(a+1):i+Array(a-i.length+2).join('0')},$a={"":function(e,t){e=e.toPrecision(t);out:for(var a,d=e.length,n=1,i=-1;ni?r+Array(l-i+1).join('0'):0=^]))?([+\-\( ])?([$#])?(0)?(\d+)?(,)?(\.\d+)?([a-z%])?$/i;fe.prototype=he.prototype,he.prototype.toString=function(){return this.fill+this.align+this.sign+this.symbol+(this.zero?'0':'')+(null==this.width?'':Rn(1,0|this.width))+(this.comma?',':'')+(null==this.precision?'':'.'+Rn(0,0|this.precision))+this.type};var re,Ja,Qa,Za=function(e){return e},Ga=['y','z','a','f','p','n','\xB5','m','','k','M','G','T','P','E','Z','Y'],ed=function(e){function t(e){function t(e){var t,i,n,c=b,k=m;if('c'===h)k=y(e)+k,e='';else{e=+e;var v=0>e;if(e=y(Un(e),f),v&&0==+e&&(v=!1),c=(v?'('===s?s:'-':'-'===s||'('===s?'':s)+c,k=k+('s'===h?Ga[8+qa/3]:'')+(v&&'('===s?')':''),x)for(t=-1,i=e.length;++tn||57>1)+c+e+k+S.slice(w);break;default:e=S+c+e+k;}return r(e)}e=fe(e);var 
o=e.fill,l=e.align,s=e.sign,c=e.symbol,u=e.zero,p=e.width,g=e.comma,f=e.precision,h=e.type,b='$'===c?n[0]:'#'===c&&/[boxX]/.test(h)?'0'+h.toLowerCase():'',m='$'===c?n[1]:/[%p]/.test(h)?i:'',y=$a[h],x=!h||/[defgprs%]/.test(h);return f=null==f?h?6:12:/[gprs]/.test(h)?Rn(1,Hn(21,f)):Rn(0,Hn(20,f)),t.toString=function(){return e+''},t}var a=e.grouping&&e.thousands?Wa(e.grouping,e.thousands):Za,n=e.currency,d=e.decimal,r=e.numerals?Va(e.numerals):Za,i=e.percent||'%';return{format:t,formatPrefix:function(n,i){var a=t((n=fe(n),n.type='f',n)),d=3*Rn(-8,Hn(8,Fn(Ba(i)/3))),r=In(10,-d),o=Ga[8+d/3];return function(e){return a(r*e)+o}}}};(function(e){return re=ed(e),Ja=re.format,Qa=re.formatPrefix,re})({decimal:'.',thousands:',',grouping:[3],currency:['$','']});var td=function(e){return Rn(0,-Ba(Un(e)))},nd=function(e,t){return Rn(0,3*Rn(-8,Hn(8,Fn(Ba(t)/3)))-Ba(Un(e)))},id=function(e,t){return e=Un(e),t=Un(t)-e,Rn(0,Ba(t)-Ba(e))+1},ad=function(e,t,n){var i,a=e[0],d=e[e.length-1],r=S(a,d,null==t?10:t);switch(n=fe(null==n?',f':n),n.type){case's':{var o=Rn(Un(a),Un(d));return null!=n.precision||isNaN(i=nd(r,o))||(n.precision=i),Qa(n,o)}case'':case'e':case'g':case'p':case'r':{null!=n.precision||isNaN(i=id(r,Rn(Un(a),Un(d))))||(n.precision=i-('e'===n.type));break}case'f':case'%':{null!=n.precision||isNaN(i=td(r))||(n.precision=i-2*('%'===n.type));break}}return Ja(n)},dd=new Date,rd=new Date,od=ye(function(){},function(e,t){e.setTime(+e+t)},function(e,t){return t-e});od.every=function(e){return e=Fn(e),isFinite(e)&&0t&&(t+=cd),e.setTime(Fn((+e-t)/cd)*cd+t)},function(e,t){e.setTime(+e+t*cd)},function(e,t){return(t-e)/cd},function(e){return e.getHours()}),bd=ye(function(e){e.setHours(0,0,0,0)},function(e,t){e.setDate(e.getDate()+t)},function(e,t){return(t-e-(t.getTimezoneOffset()-e.getTimezoneOffset())*sd)/ud},function(e){return 
e.getDate()-1}),md=xe(0),yd=xe(1),xd=xe(2),kd=xe(3),vd=xe(4),wd=xe(5),Sd=xe(6),Cd=ye(function(e){e.setDate(1),e.setHours(0,0,0,0)},function(e,t){e.setMonth(e.getMonth()+t)},function(e,t){return t.getMonth()-e.getMonth()+12*(t.getFullYear()-e.getFullYear())},function(e){return e.getMonth()}),Td=ye(function(e){e.setMonth(0,1),e.setHours(0,0,0,0)},function(e,t){e.setFullYear(e.getFullYear()+t)},function(e,t){return t.getFullYear()-e.getFullYear()},function(e){return e.getFullYear()});Td.every=function(e){return isFinite(e=Fn(e))&&0arguments.length){for(;++ot&&(this._names.push(e),this._node.setAttribute('class',this._names.join(' ')))},remove:function(e){var t=this._names.indexOf(e);0<=t&&(this._names.splice(t,1),this._node.setAttribute('class',this._names.join(' ')))},contains:function(e){return 0<=this._names.indexOf(e)}};var wr=[null];xn.prototype=function(){return new xn([[document.documentElement]],wr)}.prototype={constructor:xn,select:function(e){'function'!=typeof e&&(e=br(e));for(var t=this._groups,a=t.length,d=Array(a),r=0;r=v&&(v=k+1);!(x=b[v])&&++varguments.length){var i=this.node();return n.local?i.getAttributeNS(n.space,n.local):i.getAttribute(n)}return this.each((null==t?n.local?Ft:qt:'function'==typeof t?n.local?Yt:zt:n.local?Ht:Pt)(n,t))},style:function(e,t,n){return 1arguments.length){for(var d=Zt(this.node()),r=-1,i=a.length;++rarguments.length){var n=this.node().__on;if(n)for(var s,o=0,c=n.length;oarguments.length&&(a=t,t=gr().changedTouches);for(var d,r=0,i=t?t.length:0;rx}b.mouse('drag')}function i(){Sr(ur.view).on('mousemove.drag mouseup.drag',null),vn(ur.view,c),Tr(),b.mouse('end')}function a(){if(p.apply(this,arguments)){var e,t,i=ur.changedTouches,a=g.apply(this,arguments),d=i.length;for(e=0;e + :host { + position: relative; + display: inline-block; + } + + :host(:focus) { + outline: none; + } + + .background { + padding: 9px 0; + color: white; + position: relative; + } + + .track { + height: 3px; + width: 100%; + border-radius: 2px; + 
background-color: hsla(0, 0%, 0%, 0.2); + } + + .track-fill { + position: absolute; + top: 9px; + height: 3px; + border-radius: 4px; + background-color: hsl(24, 100%, 50%); + } + + .knob-container { + position: absolute; + top: 10px; + } + + .knob { + position: absolute; + top: -6px; + left: -6px; + width: 13px; + height: 13px; + background-color: hsl(24, 100%, 50%); + border-radius: 50%; + transition-property: transform; + transition-duration: 0.18s; + transition-timing-function: ease; + } + .mousedown .knob { + transform: scale(1.5); + } + + .knob-highlight { + position: absolute; + top: -6px; + left: -6px; + width: 13px; + height: 13px; + background-color: hsla(0, 0%, 0%, 0.1); + border-radius: 50%; + transition-property: transform; + transition-duration: 0.18s; + transition-timing-function: ease; + } + + .focus .knob-highlight { + transform: scale(2); + } + + .ticks { + position: absolute; + top: 16px; + height: 4px; + width: 100%; + z-index: -1; + } + + .ticks .tick { + position: absolute; + height: 100%; + border-left: 1px solid hsla(0, 0%, 0%, 0.2); + } + + + +
    +
    +
    +
    +
    +
    +
    +
    +
    +`),Dr={left:37,up:38,right:39,down:40,pageUp:33,pageDown:34,end:35,home:36};class Mr extends Er(HTMLElement){connectedCallback(){this.connected=!0,this.setAttribute('role','slider'),this.hasAttribute('tabindex')||this.setAttribute('tabindex',0),this.mouseEvent=!1,this.knob=this.root.querySelector('.knob-container'),this.background=this.root.querySelector('.background'),this.trackFill=this.root.querySelector('.track-fill'),this.track=this.root.querySelector('.track'),this.min=this.min?this.min:0,this.max=this.max?this.max:100,this.scale=me().domain([this.min,this.max]).range([0,1]).clamp(!0),this.origin=this.origin===void 0?this.min:this.origin,this.step=this.step?this.step:1,this.update(this.value?this.value:0),this.ticks=!!this.ticks&&this.ticks,this.renderTicks(),this.drag=Ar().container(this.background).on('start',()=>{this.mouseEvent=!0,this.background.classList.add('mousedown'),this.changeValue=this.value,this.dragUpdate()}).on('drag',()=>{this.dragUpdate()}).on('end',()=>{this.mouseEvent=!1,this.background.classList.remove('mousedown'),this.dragUpdate(),this.changeValue!==this.value&&this.dispatchChange(),this.changeValue=this.value}),this.drag(Sr(this.background)),this.addEventListener('focusin',()=>{this.mouseEvent||this.background.classList.add('focus')}),this.addEventListener('focusout',()=>{this.background.classList.remove('focus')}),this.addEventListener('keydown',this.onKeyDown)}static get observedAttributes(){return['min','max','value','step','ticks','origin','tickValues','tickLabels']}attributeChangedCallback(e,t,n){isNaN(n)||void 0===n||null===n||('min'==e&&(this.min=+n,this.setAttribute('aria-valuemin',this.min)),'max'==e&&(this.max=+n,this.setAttribute('aria-valuemax',this.max)),'value'==e&&this.update(+n),'origin'==e&&(this.origin=+n),'step'==e&&0{const n=document.createElement('div');n.classList.add('tick'),n.style.left=100*this.scale(t)+'%',e.appendChild(n)})}else e.style.display='none'}}var Or='\n \n\n';const Ur=ti('distill-header',` + + 
+`,!1);class Ir extends Ur(HTMLElement){}const Nr=` + +`;class jr extends HTMLElement{static get is(){return'distill-appendix'}set frontMatter(e){this.innerHTML=Ln(e)}}const Rr=ti('distill-footer',` + + +
    + + is dedicated to clear explanations of machine learning + + + +
    + +`);class qr extends Rr(HTMLElement){}const Fr=function(){if(1>window.distillRunlevel)throw new Error('Insufficient Runlevel for Distill Template!');if('distillTemplateIsLoading'in window&&window.distillTemplateIsLoading)throw new Error('Runlevel 1: Distill Template is getting loaded more than once, aborting!');else window.distillTemplateIsLoading=!0,console.info('Runlevel 1: Distill Template has started loading.');p(document),console.info('Runlevel 1: Static Distill styles have been added.'),console.info('Runlevel 1->2.'),window.distillRunlevel+=1;for(const[e,t]of Object.entries(hi.listeners))'function'==typeof t?document.addEventListener(e,t):console.error('Runlevel 2: Controller listeners need to be functions!');console.info('Runlevel 2: We can now listen to controller events.'),console.info('Runlevel 2->3.'),window.distillRunlevel+=1;if(2>window.distillRunlevel)throw new Error('Insufficient Runlevel for adding custom elements!');const e=[ki,wi,Ci,Li,Ai,Di,Oi,Ni,Ri,Fi,pi,Hi,zi,T,Bi,Wi,Vi,Mr,$i].concat([Ir,jr,qr]);for(const t of e)console.info('Runlevel 2: Registering custom element: '+t.is),customElements.define(t.is,t);console.info('Runlevel 3: Distill Template finished registering custom elements.'),console.info('Runlevel 3->4.'),window.distillRunlevel+=1,hi.listeners.DOMContentLoaded(),console.info('Runlevel 4: Distill Template initialisation complete.')};window.distillRunlevel=0,yi.browserSupportsAllFeatures()?(console.info('Runlevel 0: No need for polyfills.'),console.info('Runlevel 0->1.'),window.distillRunlevel+=1,Fr()):(console.info('Runlevel 0: Distill Template is loading polyfills.'),yi.load(Fr))}); +//# sourceMappingURL=template.v2.js.map +} diff --git a/_articles/RJ-2025-035/RJ-2025-035_files/header-attrs-2.30/header-attrs.js b/_articles/RJ-2025-035/RJ-2025-035_files/header-attrs-2.30/header-attrs.js new file mode 100644 index 0000000000..dd57d92e02 --- /dev/null +++ b/_articles/RJ-2025-035/RJ-2025-035_files/header-attrs-2.30/header-attrs.js @@ 
-0,0 +1,12 @@ +// Pandoc 2.9 adds attributes on both header and div. We remove the former (to +// be compatible with the behavior of Pandoc < 2.8). +document.addEventListener('DOMContentLoaded', function(e) { + var hs = document.querySelectorAll("div.section[class*='level'] > :first-child"); + var i, h, a; + for (i = 0; i < hs.length; i++) { + h = hs[i]; + if (!/^h[1-6]$/i.test(h.tagName)) continue; // it should be a header h1-h6 + a = h.attributes; + while (a.length > 0) h.removeAttribute(a[0].name); + } +}); diff --git a/_articles/RJ-2025-035/RJ-2025-035_files/jquery-3.6.0/jquery-3.6.0.js b/_articles/RJ-2025-035/RJ-2025-035_files/jquery-3.6.0/jquery-3.6.0.js new file mode 100644 index 0000000000..fc6c299b73 --- /dev/null +++ b/_articles/RJ-2025-035/RJ-2025-035_files/jquery-3.6.0/jquery-3.6.0.js @@ -0,0 +1,10881 @@ +/*! + * jQuery JavaScript Library v3.6.0 + * https://jquery.com/ + * + * Includes Sizzle.js + * https://sizzlejs.com/ + * + * Copyright OpenJS Foundation and other contributors + * Released under the MIT license + * https://jquery.org/license + * + * Date: 2021-03-02T17:08Z + */ +( function( global, factory ) { + + "use strict"; + + if ( typeof module === "object" && typeof module.exports === "object" ) { + + // For CommonJS and CommonJS-like environments where a proper `window` + // is present, execute the factory and get jQuery. + // For environments that do not have a `window` with a `document` + // (such as Node.js), expose a factory as module.exports. + // This accentuates the need for the creation of a real `window`. + // e.g. var jQuery = require("jquery")(window); + // See ticket #14549 for more info. + module.exports = global.document ? + factory( global, true ) : + function( w ) { + if ( !w.document ) { + throw new Error( "jQuery requires a window with a document" ); + } + return factory( w ); + }; + } else { + factory( global ); + } + +// Pass this if window is not defined yet +} )( typeof window !== "undefined" ? 
window : this, function( window, noGlobal ) { + +// Edge <= 12 - 13+, Firefox <=18 - 45+, IE 10 - 11, Safari 5.1 - 9+, iOS 6 - 9.1 +// throw exceptions when non-strict code (e.g., ASP.NET 4.5) accesses strict mode +// arguments.callee.caller (trac-13335). But as of jQuery 3.0 (2016), strict mode should be common +// enough that all such attempts are guarded in a try block. +"use strict"; + +var arr = []; + +var getProto = Object.getPrototypeOf; + +var slice = arr.slice; + +var flat = arr.flat ? function( array ) { + return arr.flat.call( array ); +} : function( array ) { + return arr.concat.apply( [], array ); +}; + + +var push = arr.push; + +var indexOf = arr.indexOf; + +var class2type = {}; + +var toString = class2type.toString; + +var hasOwn = class2type.hasOwnProperty; + +var fnToString = hasOwn.toString; + +var ObjectFunctionString = fnToString.call( Object ); + +var support = {}; + +var isFunction = function isFunction( obj ) { + + // Support: Chrome <=57, Firefox <=52 + // In some browsers, typeof returns "function" for HTML elements + // (i.e., `typeof document.createElement( "object" ) === "function"`). + // We don't want to classify *any* DOM node as a function. + // Support: QtWeb <=3.8.5, WebKit <=534.34, wkhtmltopdf tool <=0.12.5 + // Plus for old WebKit, typeof returns "function" for HTML collections + // (e.g., `typeof document.getElementsByTagName("div") === "function"`). 
(gh-4756) + return typeof obj === "function" && typeof obj.nodeType !== "number" && + typeof obj.item !== "function"; + }; + + +var isWindow = function isWindow( obj ) { + return obj != null && obj === obj.window; + }; + + +var document = window.document; + + + + var preservedScriptAttributes = { + type: true, + src: true, + nonce: true, + noModule: true + }; + + function DOMEval( code, node, doc ) { + doc = doc || document; + + var i, val, + script = doc.createElement( "script" ); + + script.text = code; + if ( node ) { + for ( i in preservedScriptAttributes ) { + + // Support: Firefox 64+, Edge 18+ + // Some browsers don't support the "nonce" property on scripts. + // On the other hand, just using `getAttribute` is not enough as + // the `nonce` attribute is reset to an empty string whenever it + // becomes browsing-context connected. + // See https://github.com/whatwg/html/issues/2369 + // See https://html.spec.whatwg.org/#nonce-attributes + // The `node.getAttribute` check was added for the sake of + // `jQuery.globalEval` so that it can fake a nonce-containing node + // via an object. + val = node[ i ] || node.getAttribute && node.getAttribute( i ); + if ( val ) { + script.setAttribute( i, val ); + } + } + } + doc.head.appendChild( script ).parentNode.removeChild( script ); + } + + +function toType( obj ) { + if ( obj == null ) { + return obj + ""; + } + + // Support: Android <=2.3 only (functionish RegExp) + return typeof obj === "object" || typeof obj === "function" ? 
+ class2type[ toString.call( obj ) ] || "object" : + typeof obj; +} +/* global Symbol */ +// Defining this global in .eslintrc.json would create a danger of using the global +// unguarded in another place, it seems safer to define global only for this module + + + +var + version = "3.6.0", + + // Define a local copy of jQuery + jQuery = function( selector, context ) { + + // The jQuery object is actually just the init constructor 'enhanced' + // Need init if jQuery is called (just allow error to be thrown if not included) + return new jQuery.fn.init( selector, context ); + }; + +jQuery.fn = jQuery.prototype = { + + // The current version of jQuery being used + jquery: version, + + constructor: jQuery, + + // The default length of a jQuery object is 0 + length: 0, + + toArray: function() { + return slice.call( this ); + }, + + // Get the Nth element in the matched element set OR + // Get the whole matched element set as a clean array + get: function( num ) { + + // Return all the elements in a clean array + if ( num == null ) { + return slice.call( this ); + } + + // Return just the one element from the set + return num < 0 ? this[ num + this.length ] : this[ num ]; + }, + + // Take an array of elements and push it onto the stack + // (returning the new matched element set) + pushStack: function( elems ) { + + // Build a new jQuery matched element set + var ret = jQuery.merge( this.constructor(), elems ); + + // Add the old object onto the stack (as a reference) + ret.prevObject = this; + + // Return the newly-formed element set + return ret; + }, + + // Execute a callback for every element in the matched set. 
+ each: function( callback ) { + return jQuery.each( this, callback ); + }, + + map: function( callback ) { + return this.pushStack( jQuery.map( this, function( elem, i ) { + return callback.call( elem, i, elem ); + } ) ); + }, + + slice: function() { + return this.pushStack( slice.apply( this, arguments ) ); + }, + + first: function() { + return this.eq( 0 ); + }, + + last: function() { + return this.eq( -1 ); + }, + + even: function() { + return this.pushStack( jQuery.grep( this, function( _elem, i ) { + return ( i + 1 ) % 2; + } ) ); + }, + + odd: function() { + return this.pushStack( jQuery.grep( this, function( _elem, i ) { + return i % 2; + } ) ); + }, + + eq: function( i ) { + var len = this.length, + j = +i + ( i < 0 ? len : 0 ); + return this.pushStack( j >= 0 && j < len ? [ this[ j ] ] : [] ); + }, + + end: function() { + return this.prevObject || this.constructor(); + }, + + // For internal use only. + // Behaves like an Array's method, not like a jQuery method. + push: push, + sort: arr.sort, + splice: arr.splice +}; + +jQuery.extend = jQuery.fn.extend = function() { + var options, name, src, copy, copyIsArray, clone, + target = arguments[ 0 ] || {}, + i = 1, + length = arguments.length, + deep = false; + + // Handle a deep copy situation + if ( typeof target === "boolean" ) { + deep = target; + + // Skip the boolean and the target + target = arguments[ i ] || {}; + i++; + } + + // Handle case when target is a string or something (possible in deep copy) + if ( typeof target !== "object" && !isFunction( target ) ) { + target = {}; + } + + // Extend jQuery itself if only one argument is passed + if ( i === length ) { + target = this; + i--; + } + + for ( ; i < length; i++ ) { + + // Only deal with non-null/undefined values + if ( ( options = arguments[ i ] ) != null ) { + + // Extend the base object + for ( name in options ) { + copy = options[ name ]; + + // Prevent Object.prototype pollution + // Prevent never-ending loop + if ( name === "__proto__" || 
target === copy ) { + continue; + } + + // Recurse if we're merging plain objects or arrays + if ( deep && copy && ( jQuery.isPlainObject( copy ) || + ( copyIsArray = Array.isArray( copy ) ) ) ) { + src = target[ name ]; + + // Ensure proper type for the source value + if ( copyIsArray && !Array.isArray( src ) ) { + clone = []; + } else if ( !copyIsArray && !jQuery.isPlainObject( src ) ) { + clone = {}; + } else { + clone = src; + } + copyIsArray = false; + + // Never move original objects, clone them + target[ name ] = jQuery.extend( deep, clone, copy ); + + // Don't bring in undefined values + } else if ( copy !== undefined ) { + target[ name ] = copy; + } + } + } + } + + // Return the modified object + return target; +}; + +jQuery.extend( { + + // Unique for each copy of jQuery on the page + expando: "jQuery" + ( version + Math.random() ).replace( /\D/g, "" ), + + // Assume jQuery is ready without the ready module + isReady: true, + + error: function( msg ) { + throw new Error( msg ); + }, + + noop: function() {}, + + isPlainObject: function( obj ) { + var proto, Ctor; + + // Detect obvious negatives + // Use toString instead of jQuery.type to catch host objects + if ( !obj || toString.call( obj ) !== "[object Object]" ) { + return false; + } + + proto = getProto( obj ); + + // Objects with no prototype (e.g., `Object.create( null )`) are plain + if ( !proto ) { + return true; + } + + // Objects with prototype are plain iff they were constructed by a global Object function + Ctor = hasOwn.call( proto, "constructor" ) && proto.constructor; + return typeof Ctor === "function" && fnToString.call( Ctor ) === ObjectFunctionString; + }, + + isEmptyObject: function( obj ) { + var name; + + for ( name in obj ) { + return false; + } + return true; + }, + + // Evaluates a script in a provided context; falls back to the global one + // if not specified. 
+ globalEval: function( code, options, doc ) { + DOMEval( code, { nonce: options && options.nonce }, doc ); + }, + + each: function( obj, callback ) { + var length, i = 0; + + if ( isArrayLike( obj ) ) { + length = obj.length; + for ( ; i < length; i++ ) { + if ( callback.call( obj[ i ], i, obj[ i ] ) === false ) { + break; + } + } + } else { + for ( i in obj ) { + if ( callback.call( obj[ i ], i, obj[ i ] ) === false ) { + break; + } + } + } + + return obj; + }, + + // results is for internal usage only + makeArray: function( arr, results ) { + var ret = results || []; + + if ( arr != null ) { + if ( isArrayLike( Object( arr ) ) ) { + jQuery.merge( ret, + typeof arr === "string" ? + [ arr ] : arr + ); + } else { + push.call( ret, arr ); + } + } + + return ret; + }, + + inArray: function( elem, arr, i ) { + return arr == null ? -1 : indexOf.call( arr, elem, i ); + }, + + // Support: Android <=4.0 only, PhantomJS 1 only + // push.apply(_, arraylike) throws on ancient WebKit + merge: function( first, second ) { + var len = +second.length, + j = 0, + i = first.length; + + for ( ; j < len; j++ ) { + first[ i++ ] = second[ j ]; + } + + first.length = i; + + return first; + }, + + grep: function( elems, callback, invert ) { + var callbackInverse, + matches = [], + i = 0, + length = elems.length, + callbackExpect = !invert; + + // Go through the array, only saving the items + // that pass the validator function + for ( ; i < length; i++ ) { + callbackInverse = !callback( elems[ i ], i ); + if ( callbackInverse !== callbackExpect ) { + matches.push( elems[ i ] ); + } + } + + return matches; + }, + + // arg is for internal usage only + map: function( elems, callback, arg ) { + var length, value, + i = 0, + ret = []; + + // Go through the array, translating each of the items to their new values + if ( isArrayLike( elems ) ) { + length = elems.length; + for ( ; i < length; i++ ) { + value = callback( elems[ i ], i, arg ); + + if ( value != null ) { + ret.push( value ); + } + 
} + + // Go through every key on the object, + } else { + for ( i in elems ) { + value = callback( elems[ i ], i, arg ); + + if ( value != null ) { + ret.push( value ); + } + } + } + + // Flatten any nested arrays + return flat( ret ); + }, + + // A global GUID counter for objects + guid: 1, + + // jQuery.support is not used in Core but other projects attach their + // properties to it so it needs to exist. + support: support +} ); + +if ( typeof Symbol === "function" ) { + jQuery.fn[ Symbol.iterator ] = arr[ Symbol.iterator ]; +} + +// Populate the class2type map +jQuery.each( "Boolean Number String Function Array Date RegExp Object Error Symbol".split( " " ), + function( _i, name ) { + class2type[ "[object " + name + "]" ] = name.toLowerCase(); + } ); + +function isArrayLike( obj ) { + + // Support: real iOS 8.2 only (not reproducible in simulator) + // `in` check used to prevent JIT error (gh-2145) + // hasOwn isn't used here due to false negatives + // regarding Nodelist length in IE + var length = !!obj && "length" in obj && obj.length, + type = toType( obj ); + + if ( isFunction( obj ) || isWindow( obj ) ) { + return false; + } + + return type === "array" || length === 0 || + typeof length === "number" && length > 0 && ( length - 1 ) in obj; +} +var Sizzle = +/*! 
+ * Sizzle CSS Selector Engine v2.3.6 + * https://sizzlejs.com/ + * + * Copyright JS Foundation and other contributors + * Released under the MIT license + * https://js.foundation/ + * + * Date: 2021-02-16 + */ +( function( window ) { +var i, + support, + Expr, + getText, + isXML, + tokenize, + compile, + select, + outermostContext, + sortInput, + hasDuplicate, + + // Local document vars + setDocument, + document, + docElem, + documentIsHTML, + rbuggyQSA, + rbuggyMatches, + matches, + contains, + + // Instance-specific data + expando = "sizzle" + 1 * new Date(), + preferredDoc = window.document, + dirruns = 0, + done = 0, + classCache = createCache(), + tokenCache = createCache(), + compilerCache = createCache(), + nonnativeSelectorCache = createCache(), + sortOrder = function( a, b ) { + if ( a === b ) { + hasDuplicate = true; + } + return 0; + }, + + // Instance methods + hasOwn = ( {} ).hasOwnProperty, + arr = [], + pop = arr.pop, + pushNative = arr.push, + push = arr.push, + slice = arr.slice, + + // Use a stripped-down indexOf as it's faster than native + // https://jsperf.com/thor-indexof-vs-for/5 + indexOf = function( list, elem ) { + var i = 0, + len = list.length; + for ( ; i < len; i++ ) { + if ( list[ i ] === elem ) { + return i; + } + } + return -1; + }, + + booleans = "checked|selected|async|autofocus|autoplay|controls|defer|disabled|hidden|" + + "ismap|loop|multiple|open|readonly|required|scoped", + + // Regular expressions + + // http://www.w3.org/TR/css3-selectors/#whitespace + whitespace = "[\\x20\\t\\r\\n\\f]", + + // https://www.w3.org/TR/css-syntax-3/#ident-token-diagram + identifier = "(?:\\\\[\\da-fA-F]{1,6}" + whitespace + + "?|\\\\[^\\r\\n\\f]|[\\w-]|[^\0-\\x7f])+", + + // Attribute selectors: http://www.w3.org/TR/selectors/#attribute-selectors + attributes = "\\[" + whitespace + "*(" + identifier + ")(?:" + whitespace + + + // Operator (capture 2) + "*([*^$|!~]?=)" + whitespace + + + // "Attribute values must be CSS identifiers [capture 5] 
+ // or strings [capture 3 or capture 4]" + "*(?:'((?:\\\\.|[^\\\\'])*)'|\"((?:\\\\.|[^\\\\\"])*)\"|(" + identifier + "))|)" + + whitespace + "*\\]", + + pseudos = ":(" + identifier + ")(?:\\((" + + + // To reduce the number of selectors needing tokenize in the preFilter, prefer arguments: + // 1. quoted (capture 3; capture 4 or capture 5) + "('((?:\\\\.|[^\\\\'])*)'|\"((?:\\\\.|[^\\\\\"])*)\")|" + + + // 2. simple (capture 6) + "((?:\\\\.|[^\\\\()[\\]]|" + attributes + ")*)|" + + + // 3. anything else (capture 2) + ".*" + + ")\\)|)", + + // Leading and non-escaped trailing whitespace, capturing some non-whitespace characters preceding the latter + rwhitespace = new RegExp( whitespace + "+", "g" ), + rtrim = new RegExp( "^" + whitespace + "+|((?:^|[^\\\\])(?:\\\\.)*)" + + whitespace + "+$", "g" ), + + rcomma = new RegExp( "^" + whitespace + "*," + whitespace + "*" ), + rcombinators = new RegExp( "^" + whitespace + "*([>+~]|" + whitespace + ")" + whitespace + + "*" ), + rdescend = new RegExp( whitespace + "|>" ), + + rpseudo = new RegExp( pseudos ), + ridentifier = new RegExp( "^" + identifier + "$" ), + + matchExpr = { + "ID": new RegExp( "^#(" + identifier + ")" ), + "CLASS": new RegExp( "^\\.(" + identifier + ")" ), + "TAG": new RegExp( "^(" + identifier + "|[*])" ), + "ATTR": new RegExp( "^" + attributes ), + "PSEUDO": new RegExp( "^" + pseudos ), + "CHILD": new RegExp( "^:(only|first|last|nth|nth-last)-(child|of-type)(?:\\(" + + whitespace + "*(even|odd|(([+-]|)(\\d*)n|)" + whitespace + "*(?:([+-]|)" + + whitespace + "*(\\d+)|))" + whitespace + "*\\)|)", "i" ), + "bool": new RegExp( "^(?:" + booleans + ")$", "i" ), + + // For use in libraries implementing .is() + // We use this for POS matching in `select` + "needsContext": new RegExp( "^" + whitespace + + "*[>+~]|:(even|odd|eq|gt|lt|nth|first|last)(?:\\(" + whitespace + + "*((?:-\\d)?\\d*)" + whitespace + "*\\)|)(?=[^-]|$)", "i" ) + }, + + rhtml = /HTML$/i, + rinputs = /^(?:input|select|textarea|button)$/i, + 
rheader = /^h\d$/i, + + rnative = /^[^{]+\{\s*\[native \w/, + + // Easily-parseable/retrievable ID or TAG or CLASS selectors + rquickExpr = /^(?:#([\w-]+)|(\w+)|\.([\w-]+))$/, + + rsibling = /[+~]/, + + // CSS escapes + // http://www.w3.org/TR/CSS21/syndata.html#escaped-characters + runescape = new RegExp( "\\\\[\\da-fA-F]{1,6}" + whitespace + "?|\\\\([^\\r\\n\\f])", "g" ), + funescape = function( escape, nonHex ) { + var high = "0x" + escape.slice( 1 ) - 0x10000; + + return nonHex ? + + // Strip the backslash prefix from a non-hex escape sequence + nonHex : + + // Replace a hexadecimal escape sequence with the encoded Unicode code point + // Support: IE <=11+ + // For values outside the Basic Multilingual Plane (BMP), manually construct a + // surrogate pair + high < 0 ? + String.fromCharCode( high + 0x10000 ) : + String.fromCharCode( high >> 10 | 0xD800, high & 0x3FF | 0xDC00 ); + }, + + // CSS string/identifier serialization + // https://drafts.csswg.org/cssom/#common-serializing-idioms + rcssescape = /([\0-\x1f\x7f]|^-?\d)|^-$|[^\0-\x1f\x7f-\uFFFF\w-]/g, + fcssescape = function( ch, asCodePoint ) { + if ( asCodePoint ) { + + // U+0000 NULL becomes U+FFFD REPLACEMENT CHARACTER + if ( ch === "\0" ) { + return "\uFFFD"; + } + + // Control characters and (dependent upon position) numbers get escaped as code points + return ch.slice( 0, -1 ) + "\\" + + ch.charCodeAt( ch.length - 1 ).toString( 16 ) + " "; + } + + // Other potentially-special ASCII characters get backslash-escaped + return "\\" + ch; + }, + + // Used for iframes + // See setDocument() + // Removing the function wrapper causes a "Permission Denied" + // error in IE + unloadHandler = function() { + setDocument(); + }, + + inDisabledFieldset = addCombinator( + function( elem ) { + return elem.disabled === true && elem.nodeName.toLowerCase() === "fieldset"; + }, + { dir: "parentNode", next: "legend" } + ); + +// Optimize for push.apply( _, NodeList ) +try { + push.apply( + ( arr = slice.call( 
preferredDoc.childNodes ) ), + preferredDoc.childNodes + ); + + // Support: Android<4.0 + // Detect silently failing push.apply + // eslint-disable-next-line no-unused-expressions + arr[ preferredDoc.childNodes.length ].nodeType; +} catch ( e ) { + push = { apply: arr.length ? + + // Leverage slice if possible + function( target, els ) { + pushNative.apply( target, slice.call( els ) ); + } : + + // Support: IE<9 + // Otherwise append directly + function( target, els ) { + var j = target.length, + i = 0; + + // Can't trust NodeList.length + while ( ( target[ j++ ] = els[ i++ ] ) ) {} + target.length = j - 1; + } + }; +} + +function Sizzle( selector, context, results, seed ) { + var m, i, elem, nid, match, groups, newSelector, + newContext = context && context.ownerDocument, + + // nodeType defaults to 9, since context defaults to document + nodeType = context ? context.nodeType : 9; + + results = results || []; + + // Return early from calls with invalid selector or context + if ( typeof selector !== "string" || !selector || + nodeType !== 1 && nodeType !== 9 && nodeType !== 11 ) { + + return results; + } + + // Try to shortcut find operations (as opposed to filters) in HTML documents + if ( !seed ) { + setDocument( context ); + context = context || document; + + if ( documentIsHTML ) { + + // If the selector is sufficiently simple, try using a "get*By*" DOM method + // (excepting DocumentFragment context, where the methods don't exist) + if ( nodeType !== 11 && ( match = rquickExpr.exec( selector ) ) ) { + + // ID selector + if ( ( m = match[ 1 ] ) ) { + + // Document context + if ( nodeType === 9 ) { + if ( ( elem = context.getElementById( m ) ) ) { + + // Support: IE, Opera, Webkit + // TODO: identify versions + // getElementById can match elements by name instead of ID + if ( elem.id === m ) { + results.push( elem ); + return results; + } + } else { + return results; + } + + // Element context + } else { + + // Support: IE, Opera, Webkit + // TODO: identify 
versions + // getElementById can match elements by name instead of ID + if ( newContext && ( elem = newContext.getElementById( m ) ) && + contains( context, elem ) && + elem.id === m ) { + + results.push( elem ); + return results; + } + } + + // Type selector + } else if ( match[ 2 ] ) { + push.apply( results, context.getElementsByTagName( selector ) ); + return results; + + // Class selector + } else if ( ( m = match[ 3 ] ) && support.getElementsByClassName && + context.getElementsByClassName ) { + + push.apply( results, context.getElementsByClassName( m ) ); + return results; + } + } + + // Take advantage of querySelectorAll + if ( support.qsa && + !nonnativeSelectorCache[ selector + " " ] && + ( !rbuggyQSA || !rbuggyQSA.test( selector ) ) && + + // Support: IE 8 only + // Exclude object elements + ( nodeType !== 1 || context.nodeName.toLowerCase() !== "object" ) ) { + + newSelector = selector; + newContext = context; + + // qSA considers elements outside a scoping root when evaluating child or + // descendant combinators, which is not what we want. + // In such cases, we work around the behavior by prefixing every selector in the + // list with an ID selector referencing the scope context. + // The technique has to be used as well when a leading combinator is used + // as such selectors are not recognized by querySelectorAll. + // Thanks to Andrew Dupont for this technique. + if ( nodeType === 1 && + ( rdescend.test( selector ) || rcombinators.test( selector ) ) ) { + + // Expand context for sibling selectors + newContext = rsibling.test( selector ) && testContext( context.parentNode ) || + context; + + // We can use :scope instead of the ID hack if the browser + // supports it & if we're not changing the context. 
+ if ( newContext !== context || !support.scope ) { + + // Capture the context ID, setting it first if necessary + if ( ( nid = context.getAttribute( "id" ) ) ) { + nid = nid.replace( rcssescape, fcssescape ); + } else { + context.setAttribute( "id", ( nid = expando ) ); + } + } + + // Prefix every selector in the list + groups = tokenize( selector ); + i = groups.length; + while ( i-- ) { + groups[ i ] = ( nid ? "#" + nid : ":scope" ) + " " + + toSelector( groups[ i ] ); + } + newSelector = groups.join( "," ); + } + + try { + push.apply( results, + newContext.querySelectorAll( newSelector ) + ); + return results; + } catch ( qsaError ) { + nonnativeSelectorCache( selector, true ); + } finally { + if ( nid === expando ) { + context.removeAttribute( "id" ); + } + } + } + } + } + + // All others + return select( selector.replace( rtrim, "$1" ), context, results, seed ); +} + +/** + * Create key-value caches of limited size + * @returns {function(string, object)} Returns the Object data after storing it on itself with + * property name the (space-suffixed) string and (if the cache is larger than Expr.cacheLength) + * deleting the oldest entry + */ +function createCache() { + var keys = []; + + function cache( key, value ) { + + // Use (key + " ") to avoid collision with native prototype properties (see Issue #157) + if ( keys.push( key + " " ) > Expr.cacheLength ) { + + // Only keep the most recent entries + delete cache[ keys.shift() ]; + } + return ( cache[ key + " " ] = value ); + } + return cache; +} + +/** + * Mark a function for special use by Sizzle + * @param {Function} fn The function to mark + */ +function markFunction( fn ) { + fn[ expando ] = true; + return fn; +} + +/** + * Support testing using an element + * @param {Function} fn Passed the created element and returns a boolean result + */ +function assert( fn ) { + var el = document.createElement( "fieldset" ); + + try { + return !!fn( el ); + } catch ( e ) { + return false; + } finally { + + // Remove 
from its parent by default + if ( el.parentNode ) { + el.parentNode.removeChild( el ); + } + + // release memory in IE + el = null; + } +} + +/** + * Adds the same handler for all of the specified attrs + * @param {String} attrs Pipe-separated list of attributes + * @param {Function} handler The method that will be applied + */ +function addHandle( attrs, handler ) { + var arr = attrs.split( "|" ), + i = arr.length; + + while ( i-- ) { + Expr.attrHandle[ arr[ i ] ] = handler; + } +} + +/** + * Checks document order of two siblings + * @param {Element} a + * @param {Element} b + * @returns {Number} Returns less than 0 if a precedes b, greater than 0 if a follows b + */ +function siblingCheck( a, b ) { + var cur = b && a, + diff = cur && a.nodeType === 1 && b.nodeType === 1 && + a.sourceIndex - b.sourceIndex; + + // Use IE sourceIndex if available on both nodes + if ( diff ) { + return diff; + } + + // Check if b follows a + if ( cur ) { + while ( ( cur = cur.nextSibling ) ) { + if ( cur === b ) { + return -1; + } + } + } + + return a ? 
1 : -1; +} + +/** + * Returns a function to use in pseudos for input types + * @param {String} type + */ +function createInputPseudo( type ) { + return function( elem ) { + var name = elem.nodeName.toLowerCase(); + return name === "input" && elem.type === type; + }; +} + +/** + * Returns a function to use in pseudos for buttons + * @param {String} type + */ +function createButtonPseudo( type ) { + return function( elem ) { + var name = elem.nodeName.toLowerCase(); + return ( name === "input" || name === "button" ) && elem.type === type; + }; +} + +/** + * Returns a function to use in pseudos for :enabled/:disabled + * @param {Boolean} disabled true for :disabled; false for :enabled + */ +function createDisabledPseudo( disabled ) { + + // Known :disabled false positives: fieldset[disabled] > legend:nth-of-type(n+2) :can-disable + return function( elem ) { + + // Only certain elements can match :enabled or :disabled + // https://html.spec.whatwg.org/multipage/scripting.html#selector-enabled + // https://html.spec.whatwg.org/multipage/scripting.html#selector-disabled + if ( "form" in elem ) { + + // Check for inherited disabledness on relevant non-disabled elements: + // * listed form-associated elements in a disabled fieldset + // https://html.spec.whatwg.org/multipage/forms.html#category-listed + // https://html.spec.whatwg.org/multipage/forms.html#concept-fe-disabled + // * option elements in a disabled optgroup + // https://html.spec.whatwg.org/multipage/forms.html#concept-option-disabled + // All such elements have a "form" property. 
+ if ( elem.parentNode && elem.disabled === false ) { + + // Option elements defer to a parent optgroup if present + if ( "label" in elem ) { + if ( "label" in elem.parentNode ) { + return elem.parentNode.disabled === disabled; + } else { + return elem.disabled === disabled; + } + } + + // Support: IE 6 - 11 + // Use the isDisabled shortcut property to check for disabled fieldset ancestors + return elem.isDisabled === disabled || + + // Where there is no isDisabled, check manually + /* jshint -W018 */ + elem.isDisabled !== !disabled && + inDisabledFieldset( elem ) === disabled; + } + + return elem.disabled === disabled; + + // Try to winnow out elements that can't be disabled before trusting the disabled property. + // Some victims get caught in our net (label, legend, menu, track), but it shouldn't + // even exist on them, let alone have a boolean value. + } else if ( "label" in elem ) { + return elem.disabled === disabled; + } + + // Remaining elements are neither :enabled nor :disabled + return false; + }; +} + +/** + * Returns a function to use in pseudos for positionals + * @param {Function} fn + */ +function createPositionalPseudo( fn ) { + return markFunction( function( argument ) { + argument = +argument; + return markFunction( function( seed, matches ) { + var j, + matchIndexes = fn( [], seed.length, argument ), + i = matchIndexes.length; + + // Match elements found at the specified indexes + while ( i-- ) { + if ( seed[ ( j = matchIndexes[ i ] ) ] ) { + seed[ j ] = !( matches[ j ] = seed[ j ] ); + } + } + } ); + } ); +} + +/** + * Checks a node for validity as a Sizzle context + * @param {Element|Object=} context + * @returns {Element|Object|Boolean} The input node if acceptable, otherwise a falsy value + */ +function testContext( context ) { + return context && typeof context.getElementsByTagName !== "undefined" && context; +} + +// Expose support vars for convenience +support = Sizzle.support = {}; + +/** + * Detects XML nodes + * @param 
{Element|Object} elem An element or a document + * @returns {Boolean} True iff elem is a non-HTML XML node + */ +isXML = Sizzle.isXML = function( elem ) { + var namespace = elem && elem.namespaceURI, + docElem = elem && ( elem.ownerDocument || elem ).documentElement; + + // Support: IE <=8 + // Assume HTML when documentElement doesn't yet exist, such as inside loading iframes + // https://bugs.jquery.com/ticket/4833 + return !rhtml.test( namespace || docElem && docElem.nodeName || "HTML" ); +}; + +/** + * Sets document-related variables once based on the current document + * @param {Element|Object} [doc] An element or document object to use to set the document + * @returns {Object} Returns the current document + */ +setDocument = Sizzle.setDocument = function( node ) { + var hasCompare, subWindow, + doc = node ? node.ownerDocument || node : preferredDoc; + + // Return early if doc is invalid or already selected + // Support: IE 11+, Edge 17 - 18+ + // IE/Edge sometimes throw a "Permission denied" error when strict-comparing + // two documents; shallow comparisons work. + // eslint-disable-next-line eqeqeq + if ( doc == document || doc.nodeType !== 9 || !doc.documentElement ) { + return document; + } + + // Update global variables + document = doc; + docElem = document.documentElement; + documentIsHTML = !isXML( document ); + + // Support: IE 9 - 11+, Edge 12 - 18+ + // Accessing iframe documents after unload throws "permission denied" errors (jQuery #13936) + // Support: IE 11+, Edge 17 - 18+ + // IE/Edge sometimes throw a "Permission denied" error when strict-comparing + // two documents; shallow comparisons work. 
+ // eslint-disable-next-line eqeqeq + if ( preferredDoc != document && + ( subWindow = document.defaultView ) && subWindow.top !== subWindow ) { + + // Support: IE 11, Edge + if ( subWindow.addEventListener ) { + subWindow.addEventListener( "unload", unloadHandler, false ); + + // Support: IE 9 - 10 only + } else if ( subWindow.attachEvent ) { + subWindow.attachEvent( "onunload", unloadHandler ); + } + } + + // Support: IE 8 - 11+, Edge 12 - 18+, Chrome <=16 - 25 only, Firefox <=3.6 - 31 only, + // Safari 4 - 5 only, Opera <=11.6 - 12.x only + // IE/Edge & older browsers don't support the :scope pseudo-class. + // Support: Safari 6.0 only + // Safari 6.0 supports :scope but it's an alias of :root there. + support.scope = assert( function( el ) { + docElem.appendChild( el ).appendChild( document.createElement( "div" ) ); + return typeof el.querySelectorAll !== "undefined" && + !el.querySelectorAll( ":scope fieldset div" ).length; + } ); + + /* Attributes + ---------------------------------------------------------------------- */ + + // Support: IE<8 + // Verify that getAttribute really returns attributes and not properties + // (excepting IE8 booleans) + support.attributes = assert( function( el ) { + el.className = "i"; + return !el.getAttribute( "className" ); + } ); + + /* getElement(s)By* + ---------------------------------------------------------------------- */ + + // Check if getElementsByTagName("*") returns only elements + support.getElementsByTagName = assert( function( el ) { + el.appendChild( document.createComment( "" ) ); + return !el.getElementsByTagName( "*" ).length; + } ); + + // Support: IE<9 + support.getElementsByClassName = rnative.test( document.getElementsByClassName ); + + // Support: IE<10 + // Check if getElementById returns elements by name + // The broken getElementById methods don't pick up programmatically-set names, + // so use a roundabout getElementsByName test + support.getById = assert( function( el ) { + docElem.appendChild( el 
).id = expando; + return !document.getElementsByName || !document.getElementsByName( expando ).length; + } ); + + // ID filter and find + if ( support.getById ) { + Expr.filter[ "ID" ] = function( id ) { + var attrId = id.replace( runescape, funescape ); + return function( elem ) { + return elem.getAttribute( "id" ) === attrId; + }; + }; + Expr.find[ "ID" ] = function( id, context ) { + if ( typeof context.getElementById !== "undefined" && documentIsHTML ) { + var elem = context.getElementById( id ); + return elem ? [ elem ] : []; + } + }; + } else { + Expr.filter[ "ID" ] = function( id ) { + var attrId = id.replace( runescape, funescape ); + return function( elem ) { + var node = typeof elem.getAttributeNode !== "undefined" && + elem.getAttributeNode( "id" ); + return node && node.value === attrId; + }; + }; + + // Support: IE 6 - 7 only + // getElementById is not reliable as a find shortcut + Expr.find[ "ID" ] = function( id, context ) { + if ( typeof context.getElementById !== "undefined" && documentIsHTML ) { + var node, i, elems, + elem = context.getElementById( id ); + + if ( elem ) { + + // Verify the id attribute + node = elem.getAttributeNode( "id" ); + if ( node && node.value === id ) { + return [ elem ]; + } + + // Fall back on getElementsByName + elems = context.getElementsByName( id ); + i = 0; + while ( ( elem = elems[ i++ ] ) ) { + node = elem.getAttributeNode( "id" ); + if ( node && node.value === id ) { + return [ elem ]; + } + } + } + + return []; + } + }; + } + + // Tag + Expr.find[ "TAG" ] = support.getElementsByTagName ? 
+	function( tag, context ) {
+		if ( typeof context.getElementsByTagName !== "undefined" ) {
+			return context.getElementsByTagName( tag );
+
+		// DocumentFragment nodes don't have gEBTN
+		} else if ( support.qsa ) {
+			return context.querySelectorAll( tag );
+		}
+	} :
+
+	function( tag, context ) {
+		var elem,
+			tmp = [],
+			i = 0,
+
+			// By happy coincidence, a (broken) gEBTN appears on DocumentFragment nodes too
+			results = context.getElementsByTagName( tag );
+
+		// Filter out possible comments
+		if ( tag === "*" ) {
+			while ( ( elem = results[ i++ ] ) ) {
+				if ( elem.nodeType === 1 ) {
+					tmp.push( elem );
+				}
+			}
+
+			return tmp;
+		}
+		return results;
+	};
+
+	// Class
+	Expr.find[ "CLASS" ] = support.getElementsByClassName && function( className, context ) {
+		if ( typeof context.getElementsByClassName !== "undefined" && documentIsHTML ) {
+			return context.getElementsByClassName( className );
+		}
+	};
+
+	/* QSA/matchesSelector
+	---------------------------------------------------------------------- */
+
+	// QSA and matchesSelector support
+
+	// matchesSelector(:active) reports false when true (IE9/Opera 11.5)
+	rbuggyMatches = [];
+
+	// qSa(:focus) reports false when true (Chrome 21)
+	// We allow this because of a bug in IE8/9 that throws an error
+	// whenever `document.activeElement` is accessed on an iframe
+	// So, we allow :focus to pass through QSA all the time to avoid the IE error
+	// See https://bugs.jquery.com/ticket/13378
+	rbuggyQSA = [];
+
+	if ( ( support.qsa = rnative.test( document.querySelectorAll ) ) ) {
+
+		// Build QSA regex
+		// Regex strategy adopted from Diego Perini
+		assert( function( el ) {
+
+			var input;
+
+			// Select is set to empty string on purpose
+			// This is to test IE's treatment of not explicitly
+			// setting a boolean content attribute,
+			// since its presence should be enough
+			// https://bugs.jquery.com/ticket/12359
+			docElem.appendChild( el ).innerHTML = "<a id='" + expando + "'></a>" +
+				"<select id='" + expando + "-\r\\' msallowcapture=''>" +
+				"<option selected=''></option></select>";
+
+			// Support: IE8, Opera 11-12.16
+			// Nothing should be selected when empty strings follow ^= or $= or *=
+			// The test attribute must be unknown in Opera but "safe" for WinRT
+			// https://msdn.microsoft.com/en-us/library/ie/hh465388.aspx#attribute_section
+			if ( el.querySelectorAll( "[msallowcapture^='']" ).length ) {
+				rbuggyQSA.push( "[*^$]=" + whitespace + "*(?:''|\"\")" );
+			}
+
+			// Support: IE8
+			// Boolean attributes and "value" are not treated correctly
+			if ( !el.querySelectorAll( "[selected]" ).length ) {
+				rbuggyQSA.push( "\\[" + whitespace + "*(?:value|" + booleans + ")" );
+			}
+
+			// Support: Chrome<29, Android<4.4, Safari<7.0+, iOS<7.0+, PhantomJS<1.9.8+
+			if ( !el.querySelectorAll( "[id~=" + expando + "-]" ).length ) {
+				rbuggyQSA.push( "~=" );
+			}
+
+			// Support: IE 11+, Edge 15 - 18+
+			// IE 11/Edge don't find elements on a `[name='']` query in some cases.
+			// Adding a temporary attribute to the document before the selection works
+			// around the issue.
+			// Interestingly, IE 10 & older don't seem to have the issue.
+			input = document.createElement( "input" );
+			input.setAttribute( "name", "" );
+			el.appendChild( input );
+			if ( !el.querySelectorAll( "[name='']" ).length ) {
+				rbuggyQSA.push( "\\[" + whitespace + "*name" + whitespace + "*=" +
+					whitespace + "*(?:''|\"\")" );
+			}
+
+			// Webkit/Opera - :checked should return selected option elements
+			// http://www.w3.org/TR/2011/REC-css3-selectors-20110929/#checked
+			// IE8 throws error here and will not see later tests
+			if ( !el.querySelectorAll( ":checked" ).length ) {
+				rbuggyQSA.push( ":checked" );
+			}
+
+			// Support: Safari 8+, iOS 8+
+			// https://bugs.webkit.org/show_bug.cgi?id=136851
+			// In-page `selector#id sibling-combinator selector` fails
+			if ( !el.querySelectorAll( "a#" + expando + "+*" ).length ) {
+				rbuggyQSA.push( ".#.+[+~]" );
+			}
+
+			// Support: Firefox <=3.6 - 5 only
+			// Old Firefox doesn't throw on a badly-escaped identifier.
+			el.querySelectorAll( "\\\f" );
+			rbuggyQSA.push( "[\\r\\n\\f]" );
+		} );
+
+		assert( function( el ) {
+			el.innerHTML = "<a href='' disabled='disabled'></a>" +
+				"<select disabled='disabled'><option/></select>";
+
+			// Support: Windows 8 Native Apps
+			// The type and name attributes are restricted during .innerHTML assignment
+			var input = document.createElement( "input" );
+			input.setAttribute( "type", "hidden" );
+			el.appendChild( input ).setAttribute( "name", "D" );
+
+			// Support: IE8
+			// Enforce case-sensitivity of name attribute
+			if ( el.querySelectorAll( "[name=d]" ).length ) {
+				rbuggyQSA.push( "name" + whitespace + "*[*^$|!~]?=" );
+			}
+
+			// FF 3.5 - :enabled/:disabled and hidden elements (hidden elements are still enabled)
+			// IE8 throws error here and will not see later tests
+			if ( el.querySelectorAll( ":enabled" ).length !== 2 ) {
+				rbuggyQSA.push( ":enabled", ":disabled" );
+			}
+
+			// Support: IE9-11+
+			// IE's :disabled selector does not pick up the children of disabled fieldsets
+			docElem.appendChild( el ).disabled = true;
+			if ( el.querySelectorAll( ":disabled" ).length !== 2 ) {
+				rbuggyQSA.push( ":enabled", ":disabled" );
+			}
+
+			// Support: Opera 10 - 11 only
+			// Opera 10-11 does not throw on post-comma invalid pseudos
+			el.querySelectorAll( "*,:x" );
+			rbuggyQSA.push( ",.*:" );
+		} );
+	}
+
+	if ( ( support.matchesSelector = rnative.test( ( matches = docElem.matches ||
+		docElem.webkitMatchesSelector ||
+		docElem.mozMatchesSelector ||
+		docElem.oMatchesSelector ||
+		docElem.msMatchesSelector ) ) ) ) {
+
+		assert( function( el ) {
+
+			// Check to see if it's possible to do matchesSelector
+			// on a disconnected node (IE 9)
+			support.disconnectedMatch = matches.call( el, "*" );
+
+			// This should fail with an exception
+			// Gecko does not error, returns false instead
+			matches.call( el, "[s!='']:x" );
+			rbuggyMatches.push( "!=", pseudos );
+		} );
+	}
+
+	rbuggyQSA = rbuggyQSA.length && new RegExp( rbuggyQSA.join( "|" ) );
+	rbuggyMatches = rbuggyMatches.length && new RegExp( rbuggyMatches.join( "|" ) );
+
+	/* Contains
+ ---------------------------------------------------------------------- */ + hasCompare = rnative.test( docElem.compareDocumentPosition ); + + // Element contains another + // Purposefully self-exclusive + // As in, an element does not contain itself + contains = hasCompare || rnative.test( docElem.contains ) ? + function( a, b ) { + var adown = a.nodeType === 9 ? a.documentElement : a, + bup = b && b.parentNode; + return a === bup || !!( bup && bup.nodeType === 1 && ( + adown.contains ? + adown.contains( bup ) : + a.compareDocumentPosition && a.compareDocumentPosition( bup ) & 16 + ) ); + } : + function( a, b ) { + if ( b ) { + while ( ( b = b.parentNode ) ) { + if ( b === a ) { + return true; + } + } + } + return false; + }; + + /* Sorting + ---------------------------------------------------------------------- */ + + // Document order sorting + sortOrder = hasCompare ? + function( a, b ) { + + // Flag for duplicate removal + if ( a === b ) { + hasDuplicate = true; + return 0; + } + + // Sort on method existence if only one input has compareDocumentPosition + var compare = !a.compareDocumentPosition - !b.compareDocumentPosition; + if ( compare ) { + return compare; + } + + // Calculate position if both inputs belong to the same document + // Support: IE 11+, Edge 17 - 18+ + // IE/Edge sometimes throw a "Permission denied" error when strict-comparing + // two documents; shallow comparisons work. + // eslint-disable-next-line eqeqeq + compare = ( a.ownerDocument || a ) == ( b.ownerDocument || b ) ? + a.compareDocumentPosition( b ) : + + // Otherwise we know they are disconnected + 1; + + // Disconnected nodes + if ( compare & 1 || + ( !support.sortDetached && b.compareDocumentPosition( a ) === compare ) ) { + + // Choose the first element that is related to our preferred document + // Support: IE 11+, Edge 17 - 18+ + // IE/Edge sometimes throw a "Permission denied" error when strict-comparing + // two documents; shallow comparisons work. 
+ // eslint-disable-next-line eqeqeq + if ( a == document || a.ownerDocument == preferredDoc && + contains( preferredDoc, a ) ) { + return -1; + } + + // Support: IE 11+, Edge 17 - 18+ + // IE/Edge sometimes throw a "Permission denied" error when strict-comparing + // two documents; shallow comparisons work. + // eslint-disable-next-line eqeqeq + if ( b == document || b.ownerDocument == preferredDoc && + contains( preferredDoc, b ) ) { + return 1; + } + + // Maintain original order + return sortInput ? + ( indexOf( sortInput, a ) - indexOf( sortInput, b ) ) : + 0; + } + + return compare & 4 ? -1 : 1; + } : + function( a, b ) { + + // Exit early if the nodes are identical + if ( a === b ) { + hasDuplicate = true; + return 0; + } + + var cur, + i = 0, + aup = a.parentNode, + bup = b.parentNode, + ap = [ a ], + bp = [ b ]; + + // Parentless nodes are either documents or disconnected + if ( !aup || !bup ) { + + // Support: IE 11+, Edge 17 - 18+ + // IE/Edge sometimes throw a "Permission denied" error when strict-comparing + // two documents; shallow comparisons work. + /* eslint-disable eqeqeq */ + return a == document ? -1 : + b == document ? 1 : + /* eslint-enable eqeqeq */ + aup ? -1 : + bup ? 1 : + sortInput ? + ( indexOf( sortInput, a ) - indexOf( sortInput, b ) ) : + 0; + + // If the nodes are siblings, we can do a quick check + } else if ( aup === bup ) { + return siblingCheck( a, b ); + } + + // Otherwise we need full lists of their ancestors for comparison + cur = a; + while ( ( cur = cur.parentNode ) ) { + ap.unshift( cur ); + } + cur = b; + while ( ( cur = cur.parentNode ) ) { + bp.unshift( cur ); + } + + // Walk down the tree looking for a discrepancy + while ( ap[ i ] === bp[ i ] ) { + i++; + } + + return i ? 
+ + // Do a sibling check if the nodes have a common ancestor + siblingCheck( ap[ i ], bp[ i ] ) : + + // Otherwise nodes in our document sort first + // Support: IE 11+, Edge 17 - 18+ + // IE/Edge sometimes throw a "Permission denied" error when strict-comparing + // two documents; shallow comparisons work. + /* eslint-disable eqeqeq */ + ap[ i ] == preferredDoc ? -1 : + bp[ i ] == preferredDoc ? 1 : + /* eslint-enable eqeqeq */ + 0; + }; + + return document; +}; + +Sizzle.matches = function( expr, elements ) { + return Sizzle( expr, null, null, elements ); +}; + +Sizzle.matchesSelector = function( elem, expr ) { + setDocument( elem ); + + if ( support.matchesSelector && documentIsHTML && + !nonnativeSelectorCache[ expr + " " ] && + ( !rbuggyMatches || !rbuggyMatches.test( expr ) ) && + ( !rbuggyQSA || !rbuggyQSA.test( expr ) ) ) { + + try { + var ret = matches.call( elem, expr ); + + // IE 9's matchesSelector returns false on disconnected nodes + if ( ret || support.disconnectedMatch || + + // As well, disconnected nodes are said to be in a document + // fragment in IE 9 + elem.document && elem.document.nodeType !== 11 ) { + return ret; + } + } catch ( e ) { + nonnativeSelectorCache( expr, true ); + } + } + + return Sizzle( expr, document, null, [ elem ] ).length > 0; +}; + +Sizzle.contains = function( context, elem ) { + + // Set document vars if needed + // Support: IE 11+, Edge 17 - 18+ + // IE/Edge sometimes throw a "Permission denied" error when strict-comparing + // two documents; shallow comparisons work. + // eslint-disable-next-line eqeqeq + if ( ( context.ownerDocument || context ) != document ) { + setDocument( context ); + } + return contains( context, elem ); +}; + +Sizzle.attr = function( elem, name ) { + + // Set document vars if needed + // Support: IE 11+, Edge 17 - 18+ + // IE/Edge sometimes throw a "Permission denied" error when strict-comparing + // two documents; shallow comparisons work. 
+ // eslint-disable-next-line eqeqeq + if ( ( elem.ownerDocument || elem ) != document ) { + setDocument( elem ); + } + + var fn = Expr.attrHandle[ name.toLowerCase() ], + + // Don't get fooled by Object.prototype properties (jQuery #13807) + val = fn && hasOwn.call( Expr.attrHandle, name.toLowerCase() ) ? + fn( elem, name, !documentIsHTML ) : + undefined; + + return val !== undefined ? + val : + support.attributes || !documentIsHTML ? + elem.getAttribute( name ) : + ( val = elem.getAttributeNode( name ) ) && val.specified ? + val.value : + null; +}; + +Sizzle.escape = function( sel ) { + return ( sel + "" ).replace( rcssescape, fcssescape ); +}; + +Sizzle.error = function( msg ) { + throw new Error( "Syntax error, unrecognized expression: " + msg ); +}; + +/** + * Document sorting and removing duplicates + * @param {ArrayLike} results + */ +Sizzle.uniqueSort = function( results ) { + var elem, + duplicates = [], + j = 0, + i = 0; + + // Unless we *know* we can detect duplicates, assume their presence + hasDuplicate = !support.detectDuplicates; + sortInput = !support.sortStable && results.slice( 0 ); + results.sort( sortOrder ); + + if ( hasDuplicate ) { + while ( ( elem = results[ i++ ] ) ) { + if ( elem === results[ i ] ) { + j = duplicates.push( i ); + } + } + while ( j-- ) { + results.splice( duplicates[ j ], 1 ); + } + } + + // Clear input after sorting to release objects + // See https://github.com/jquery/sizzle/pull/225 + sortInput = null; + + return results; +}; + +/** + * Utility function for retrieving the text value of an array of DOM nodes + * @param {Array|Element} elem + */ +getText = Sizzle.getText = function( elem ) { + var node, + ret = "", + i = 0, + nodeType = elem.nodeType; + + if ( !nodeType ) { + + // If no nodeType, this is expected to be an array + while ( ( node = elem[ i++ ] ) ) { + + // Do not traverse comment nodes + ret += getText( node ); + } + } else if ( nodeType === 1 || nodeType === 9 || nodeType === 11 ) { + + // Use textContent 
for elements + // innerText usage removed for consistency of new lines (jQuery #11153) + if ( typeof elem.textContent === "string" ) { + return elem.textContent; + } else { + + // Traverse its children + for ( elem = elem.firstChild; elem; elem = elem.nextSibling ) { + ret += getText( elem ); + } + } + } else if ( nodeType === 3 || nodeType === 4 ) { + return elem.nodeValue; + } + + // Do not include comment or processing instruction nodes + + return ret; +}; + +Expr = Sizzle.selectors = { + + // Can be adjusted by the user + cacheLength: 50, + + createPseudo: markFunction, + + match: matchExpr, + + attrHandle: {}, + + find: {}, + + relative: { + ">": { dir: "parentNode", first: true }, + " ": { dir: "parentNode" }, + "+": { dir: "previousSibling", first: true }, + "~": { dir: "previousSibling" } + }, + + preFilter: { + "ATTR": function( match ) { + match[ 1 ] = match[ 1 ].replace( runescape, funescape ); + + // Move the given value to match[3] whether quoted or unquoted + match[ 3 ] = ( match[ 3 ] || match[ 4 ] || + match[ 5 ] || "" ).replace( runescape, funescape ); + + if ( match[ 2 ] === "~=" ) { + match[ 3 ] = " " + match[ 3 ] + " "; + } + + return match.slice( 0, 4 ); + }, + + "CHILD": function( match ) { + + /* matches from matchExpr["CHILD"] + 1 type (only|nth|...) + 2 what (child|of-type) + 3 argument (even|odd|\d*|\d*n([+-]\d+)?|...) + 4 xn-component of xn+y argument ([+-]?\d*n|) + 5 sign of xn-component + 6 x of xn-component + 7 sign of y-component + 8 y of y-component + */ + match[ 1 ] = match[ 1 ].toLowerCase(); + + if ( match[ 1 ].slice( 0, 3 ) === "nth" ) { + + // nth-* requires argument + if ( !match[ 3 ] ) { + Sizzle.error( match[ 0 ] ); + } + + // numeric x and y parameters for Expr.filter.CHILD + // remember that false/true cast respectively to 0/1 + match[ 4 ] = +( match[ 4 ] ? 
+ match[ 5 ] + ( match[ 6 ] || 1 ) : + 2 * ( match[ 3 ] === "even" || match[ 3 ] === "odd" ) ); + match[ 5 ] = +( ( match[ 7 ] + match[ 8 ] ) || match[ 3 ] === "odd" ); + + // other types prohibit arguments + } else if ( match[ 3 ] ) { + Sizzle.error( match[ 0 ] ); + } + + return match; + }, + + "PSEUDO": function( match ) { + var excess, + unquoted = !match[ 6 ] && match[ 2 ]; + + if ( matchExpr[ "CHILD" ].test( match[ 0 ] ) ) { + return null; + } + + // Accept quoted arguments as-is + if ( match[ 3 ] ) { + match[ 2 ] = match[ 4 ] || match[ 5 ] || ""; + + // Strip excess characters from unquoted arguments + } else if ( unquoted && rpseudo.test( unquoted ) && + + // Get excess from tokenize (recursively) + ( excess = tokenize( unquoted, true ) ) && + + // advance to the next closing parenthesis + ( excess = unquoted.indexOf( ")", unquoted.length - excess ) - unquoted.length ) ) { + + // excess is a negative index + match[ 0 ] = match[ 0 ].slice( 0, excess ); + match[ 2 ] = unquoted.slice( 0, excess ); + } + + // Return only captures needed by the pseudo filter method (type and argument) + return match.slice( 0, 3 ); + } + }, + + filter: { + + "TAG": function( nodeNameSelector ) { + var nodeName = nodeNameSelector.replace( runescape, funescape ).toLowerCase(); + return nodeNameSelector === "*" ? 
+ function() { + return true; + } : + function( elem ) { + return elem.nodeName && elem.nodeName.toLowerCase() === nodeName; + }; + }, + + "CLASS": function( className ) { + var pattern = classCache[ className + " " ]; + + return pattern || + ( pattern = new RegExp( "(^|" + whitespace + + ")" + className + "(" + whitespace + "|$)" ) ) && classCache( + className, function( elem ) { + return pattern.test( + typeof elem.className === "string" && elem.className || + typeof elem.getAttribute !== "undefined" && + elem.getAttribute( "class" ) || + "" + ); + } ); + }, + + "ATTR": function( name, operator, check ) { + return function( elem ) { + var result = Sizzle.attr( elem, name ); + + if ( result == null ) { + return operator === "!="; + } + if ( !operator ) { + return true; + } + + result += ""; + + /* eslint-disable max-len */ + + return operator === "=" ? result === check : + operator === "!=" ? result !== check : + operator === "^=" ? check && result.indexOf( check ) === 0 : + operator === "*=" ? check && result.indexOf( check ) > -1 : + operator === "$=" ? check && result.slice( -check.length ) === check : + operator === "~=" ? ( " " + result.replace( rwhitespace, " " ) + " " ).indexOf( check ) > -1 : + operator === "|=" ? result === check || result.slice( 0, check.length + 1 ) === check + "-" : + false; + /* eslint-enable max-len */ + + }; + }, + + "CHILD": function( type, what, _argument, first, last ) { + var simple = type.slice( 0, 3 ) !== "nth", + forward = type.slice( -4 ) !== "last", + ofType = what === "of-type"; + + return first === 1 && last === 0 ? + + // Shortcut for :nth-*(n) + function( elem ) { + return !!elem.parentNode; + } : + + function( elem, _context, xml ) { + var cache, uniqueCache, outerCache, node, nodeIndex, start, + dir = simple !== forward ? 
"nextSibling" : "previousSibling", + parent = elem.parentNode, + name = ofType && elem.nodeName.toLowerCase(), + useCache = !xml && !ofType, + diff = false; + + if ( parent ) { + + // :(first|last|only)-(child|of-type) + if ( simple ) { + while ( dir ) { + node = elem; + while ( ( node = node[ dir ] ) ) { + if ( ofType ? + node.nodeName.toLowerCase() === name : + node.nodeType === 1 ) { + + return false; + } + } + + // Reverse direction for :only-* (if we haven't yet done so) + start = dir = type === "only" && !start && "nextSibling"; + } + return true; + } + + start = [ forward ? parent.firstChild : parent.lastChild ]; + + // non-xml :nth-child(...) stores cache data on `parent` + if ( forward && useCache ) { + + // Seek `elem` from a previously-cached index + + // ...in a gzip-friendly way + node = parent; + outerCache = node[ expando ] || ( node[ expando ] = {} ); + + // Support: IE <9 only + // Defend against cloned attroperties (jQuery gh-1709) + uniqueCache = outerCache[ node.uniqueID ] || + ( outerCache[ node.uniqueID ] = {} ); + + cache = uniqueCache[ type ] || []; + nodeIndex = cache[ 0 ] === dirruns && cache[ 1 ]; + diff = nodeIndex && cache[ 2 ]; + node = nodeIndex && parent.childNodes[ nodeIndex ]; + + while ( ( node = ++nodeIndex && node && node[ dir ] || + + // Fallback to seeking `elem` from the start + ( diff = nodeIndex = 0 ) || start.pop() ) ) { + + // When found, cache indexes on `parent` and break + if ( node.nodeType === 1 && ++diff && node === elem ) { + uniqueCache[ type ] = [ dirruns, nodeIndex, diff ]; + break; + } + } + + } else { + + // Use previously-cached element index if available + if ( useCache ) { + + // ...in a gzip-friendly way + node = elem; + outerCache = node[ expando ] || ( node[ expando ] = {} ); + + // Support: IE <9 only + // Defend against cloned attroperties (jQuery gh-1709) + uniqueCache = outerCache[ node.uniqueID ] || + ( outerCache[ node.uniqueID ] = {} ); + + cache = uniqueCache[ type ] || []; + nodeIndex = cache[ 0 
] === dirruns && cache[ 1 ]; + diff = nodeIndex; + } + + // xml :nth-child(...) + // or :nth-last-child(...) or :nth(-last)?-of-type(...) + if ( diff === false ) { + + // Use the same loop as above to seek `elem` from the start + while ( ( node = ++nodeIndex && node && node[ dir ] || + ( diff = nodeIndex = 0 ) || start.pop() ) ) { + + if ( ( ofType ? + node.nodeName.toLowerCase() === name : + node.nodeType === 1 ) && + ++diff ) { + + // Cache the index of each encountered element + if ( useCache ) { + outerCache = node[ expando ] || + ( node[ expando ] = {} ); + + // Support: IE <9 only + // Defend against cloned attroperties (jQuery gh-1709) + uniqueCache = outerCache[ node.uniqueID ] || + ( outerCache[ node.uniqueID ] = {} ); + + uniqueCache[ type ] = [ dirruns, diff ]; + } + + if ( node === elem ) { + break; + } + } + } + } + } + + // Incorporate the offset, then check against cycle size + diff -= last; + return diff === first || ( diff % first === 0 && diff / first >= 0 ); + } + }; + }, + + "PSEUDO": function( pseudo, argument ) { + + // pseudo-class names are case-insensitive + // http://www.w3.org/TR/selectors/#pseudo-classes + // Prioritize by case sensitivity in case custom pseudos are added with uppercase letters + // Remember that setFilters inherits from pseudos + var args, + fn = Expr.pseudos[ pseudo ] || Expr.setFilters[ pseudo.toLowerCase() ] || + Sizzle.error( "unsupported pseudo: " + pseudo ); + + // The user may use createPseudo to indicate that + // arguments are needed to create the filter function + // just as Sizzle does + if ( fn[ expando ] ) { + return fn( argument ); + } + + // But maintain support for old signatures + if ( fn.length > 1 ) { + args = [ pseudo, pseudo, "", argument ]; + return Expr.setFilters.hasOwnProperty( pseudo.toLowerCase() ) ? 
+ markFunction( function( seed, matches ) { + var idx, + matched = fn( seed, argument ), + i = matched.length; + while ( i-- ) { + idx = indexOf( seed, matched[ i ] ); + seed[ idx ] = !( matches[ idx ] = matched[ i ] ); + } + } ) : + function( elem ) { + return fn( elem, 0, args ); + }; + } + + return fn; + } + }, + + pseudos: { + + // Potentially complex pseudos + "not": markFunction( function( selector ) { + + // Trim the selector passed to compile + // to avoid treating leading and trailing + // spaces as combinators + var input = [], + results = [], + matcher = compile( selector.replace( rtrim, "$1" ) ); + + return matcher[ expando ] ? + markFunction( function( seed, matches, _context, xml ) { + var elem, + unmatched = matcher( seed, null, xml, [] ), + i = seed.length; + + // Match elements unmatched by `matcher` + while ( i-- ) { + if ( ( elem = unmatched[ i ] ) ) { + seed[ i ] = !( matches[ i ] = elem ); + } + } + } ) : + function( elem, _context, xml ) { + input[ 0 ] = elem; + matcher( input, null, xml, results ); + + // Don't keep the element (issue #299) + input[ 0 ] = null; + return !results.pop(); + }; + } ), + + "has": markFunction( function( selector ) { + return function( elem ) { + return Sizzle( selector, elem ).length > 0; + }; + } ), + + "contains": markFunction( function( text ) { + text = text.replace( runescape, funescape ); + return function( elem ) { + return ( elem.textContent || getText( elem ) ).indexOf( text ) > -1; + }; + } ), + + // "Whether an element is represented by a :lang() selector + // is based solely on the element's language value + // being equal to the identifier C, + // or beginning with the identifier C immediately followed by "-". + // The matching of C against the element's language value is performed case-insensitively. + // The identifier C does not have to be a valid language name." 
+ // http://www.w3.org/TR/selectors/#lang-pseudo + "lang": markFunction( function( lang ) { + + // lang value must be a valid identifier + if ( !ridentifier.test( lang || "" ) ) { + Sizzle.error( "unsupported lang: " + lang ); + } + lang = lang.replace( runescape, funescape ).toLowerCase(); + return function( elem ) { + var elemLang; + do { + if ( ( elemLang = documentIsHTML ? + elem.lang : + elem.getAttribute( "xml:lang" ) || elem.getAttribute( "lang" ) ) ) { + + elemLang = elemLang.toLowerCase(); + return elemLang === lang || elemLang.indexOf( lang + "-" ) === 0; + } + } while ( ( elem = elem.parentNode ) && elem.nodeType === 1 ); + return false; + }; + } ), + + // Miscellaneous + "target": function( elem ) { + var hash = window.location && window.location.hash; + return hash && hash.slice( 1 ) === elem.id; + }, + + "root": function( elem ) { + return elem === docElem; + }, + + "focus": function( elem ) { + return elem === document.activeElement && + ( !document.hasFocus || document.hasFocus() ) && + !!( elem.type || elem.href || ~elem.tabIndex ); + }, + + // Boolean properties + "enabled": createDisabledPseudo( false ), + "disabled": createDisabledPseudo( true ), + + "checked": function( elem ) { + + // In CSS3, :checked should return both checked and selected elements + // http://www.w3.org/TR/2011/REC-css3-selectors-20110929/#checked + var nodeName = elem.nodeName.toLowerCase(); + return ( nodeName === "input" && !!elem.checked ) || + ( nodeName === "option" && !!elem.selected ); + }, + + "selected": function( elem ) { + + // Accessing this property makes selected-by-default + // options in Safari work properly + if ( elem.parentNode ) { + // eslint-disable-next-line no-unused-expressions + elem.parentNode.selectedIndex; + } + + return elem.selected === true; + }, + + // Contents + "empty": function( elem ) { + + // http://www.w3.org/TR/selectors/#empty-pseudo + // :empty is negated by element (1) or content nodes (text: 3; cdata: 4; entity ref: 5), + // but 
not by others (comment: 8; processing instruction: 7; etc.) + // nodeType < 6 works because attributes (2) do not appear as children + for ( elem = elem.firstChild; elem; elem = elem.nextSibling ) { + if ( elem.nodeType < 6 ) { + return false; + } + } + return true; + }, + + "parent": function( elem ) { + return !Expr.pseudos[ "empty" ]( elem ); + }, + + // Element/input types + "header": function( elem ) { + return rheader.test( elem.nodeName ); + }, + + "input": function( elem ) { + return rinputs.test( elem.nodeName ); + }, + + "button": function( elem ) { + var name = elem.nodeName.toLowerCase(); + return name === "input" && elem.type === "button" || name === "button"; + }, + + "text": function( elem ) { + var attr; + return elem.nodeName.toLowerCase() === "input" && + elem.type === "text" && + + // Support: IE<8 + // New HTML5 attribute values (e.g., "search") appear with elem.type === "text" + ( ( attr = elem.getAttribute( "type" ) ) == null || + attr.toLowerCase() === "text" ); + }, + + // Position-in-collection + "first": createPositionalPseudo( function() { + return [ 0 ]; + } ), + + "last": createPositionalPseudo( function( _matchIndexes, length ) { + return [ length - 1 ]; + } ), + + "eq": createPositionalPseudo( function( _matchIndexes, length, argument ) { + return [ argument < 0 ? argument + length : argument ]; + } ), + + "even": createPositionalPseudo( function( matchIndexes, length ) { + var i = 0; + for ( ; i < length; i += 2 ) { + matchIndexes.push( i ); + } + return matchIndexes; + } ), + + "odd": createPositionalPseudo( function( matchIndexes, length ) { + var i = 1; + for ( ; i < length; i += 2 ) { + matchIndexes.push( i ); + } + return matchIndexes; + } ), + + "lt": createPositionalPseudo( function( matchIndexes, length, argument ) { + var i = argument < 0 ? + argument + length : + argument > length ? 
+ length : + argument; + for ( ; --i >= 0; ) { + matchIndexes.push( i ); + } + return matchIndexes; + } ), + + "gt": createPositionalPseudo( function( matchIndexes, length, argument ) { + var i = argument < 0 ? argument + length : argument; + for ( ; ++i < length; ) { + matchIndexes.push( i ); + } + return matchIndexes; + } ) + } +}; + +Expr.pseudos[ "nth" ] = Expr.pseudos[ "eq" ]; + +// Add button/input type pseudos +for ( i in { radio: true, checkbox: true, file: true, password: true, image: true } ) { + Expr.pseudos[ i ] = createInputPseudo( i ); +} +for ( i in { submit: true, reset: true } ) { + Expr.pseudos[ i ] = createButtonPseudo( i ); +} + +// Easy API for creating new setFilters +function setFilters() {} +setFilters.prototype = Expr.filters = Expr.pseudos; +Expr.setFilters = new setFilters(); + +tokenize = Sizzle.tokenize = function( selector, parseOnly ) { + var matched, match, tokens, type, + soFar, groups, preFilters, + cached = tokenCache[ selector + " " ]; + + if ( cached ) { + return parseOnly ? 
0 : cached.slice( 0 ); + } + + soFar = selector; + groups = []; + preFilters = Expr.preFilter; + + while ( soFar ) { + + // Comma and first run + if ( !matched || ( match = rcomma.exec( soFar ) ) ) { + if ( match ) { + + // Don't consume trailing commas as valid + soFar = soFar.slice( match[ 0 ].length ) || soFar; + } + groups.push( ( tokens = [] ) ); + } + + matched = false; + + // Combinators + if ( ( match = rcombinators.exec( soFar ) ) ) { + matched = match.shift(); + tokens.push( { + value: matched, + + // Cast descendant combinators to space + type: match[ 0 ].replace( rtrim, " " ) + } ); + soFar = soFar.slice( matched.length ); + } + + // Filters + for ( type in Expr.filter ) { + if ( ( match = matchExpr[ type ].exec( soFar ) ) && ( !preFilters[ type ] || + ( match = preFilters[ type ]( match ) ) ) ) { + matched = match.shift(); + tokens.push( { + value: matched, + type: type, + matches: match + } ); + soFar = soFar.slice( matched.length ); + } + } + + if ( !matched ) { + break; + } + } + + // Return the length of the invalid excess + // if we're just parsing + // Otherwise, throw an error or return tokens + return parseOnly ? + soFar.length : + soFar ? + Sizzle.error( selector ) : + + // Cache the tokens + tokenCache( selector, groups ).slice( 0 ); +}; + +function toSelector( tokens ) { + var i = 0, + len = tokens.length, + selector = ""; + for ( ; i < len; i++ ) { + selector += tokens[ i ].value; + } + return selector; +} + +function addCombinator( matcher, combinator, base ) { + var dir = combinator.dir, + skip = combinator.next, + key = skip || dir, + checkNonElements = base && key === "parentNode", + doneName = done++; + + return combinator.first ? 
+ + // Check against closest ancestor/preceding element + function( elem, context, xml ) { + while ( ( elem = elem[ dir ] ) ) { + if ( elem.nodeType === 1 || checkNonElements ) { + return matcher( elem, context, xml ); + } + } + return false; + } : + + // Check against all ancestor/preceding elements + function( elem, context, xml ) { + var oldCache, uniqueCache, outerCache, + newCache = [ dirruns, doneName ]; + + // We can't set arbitrary data on XML nodes, so they don't benefit from combinator caching + if ( xml ) { + while ( ( elem = elem[ dir ] ) ) { + if ( elem.nodeType === 1 || checkNonElements ) { + if ( matcher( elem, context, xml ) ) { + return true; + } + } + } + } else { + while ( ( elem = elem[ dir ] ) ) { + if ( elem.nodeType === 1 || checkNonElements ) { + outerCache = elem[ expando ] || ( elem[ expando ] = {} ); + + // Support: IE <9 only + // Defend against cloned attroperties (jQuery gh-1709) + uniqueCache = outerCache[ elem.uniqueID ] || + ( outerCache[ elem.uniqueID ] = {} ); + + if ( skip && skip === elem.nodeName.toLowerCase() ) { + elem = elem[ dir ] || elem; + } else if ( ( oldCache = uniqueCache[ key ] ) && + oldCache[ 0 ] === dirruns && oldCache[ 1 ] === doneName ) { + + // Assign to newCache so results back-propagate to previous elements + return ( newCache[ 2 ] = oldCache[ 2 ] ); + } else { + + // Reuse newcache so results back-propagate to previous elements + uniqueCache[ key ] = newCache; + + // A match means we're done; a fail means we have to keep checking + if ( ( newCache[ 2 ] = matcher( elem, context, xml ) ) ) { + return true; + } + } + } + } + } + return false; + }; +} + +function elementMatcher( matchers ) { + return matchers.length > 1 ? 
+ function( elem, context, xml ) { + var i = matchers.length; + while ( i-- ) { + if ( !matchers[ i ]( elem, context, xml ) ) { + return false; + } + } + return true; + } : + matchers[ 0 ]; +} + +function multipleContexts( selector, contexts, results ) { + var i = 0, + len = contexts.length; + for ( ; i < len; i++ ) { + Sizzle( selector, contexts[ i ], results ); + } + return results; +} + +function condense( unmatched, map, filter, context, xml ) { + var elem, + newUnmatched = [], + i = 0, + len = unmatched.length, + mapped = map != null; + + for ( ; i < len; i++ ) { + if ( ( elem = unmatched[ i ] ) ) { + if ( !filter || filter( elem, context, xml ) ) { + newUnmatched.push( elem ); + if ( mapped ) { + map.push( i ); + } + } + } + } + + return newUnmatched; +} + +function setMatcher( preFilter, selector, matcher, postFilter, postFinder, postSelector ) { + if ( postFilter && !postFilter[ expando ] ) { + postFilter = setMatcher( postFilter ); + } + if ( postFinder && !postFinder[ expando ] ) { + postFinder = setMatcher( postFinder, postSelector ); + } + return markFunction( function( seed, results, context, xml ) { + var temp, i, elem, + preMap = [], + postMap = [], + preexisting = results.length, + + // Get initial elements from seed or context + elems = seed || multipleContexts( + selector || "*", + context.nodeType ? [ context ] : context, + [] + ), + + // Prefilter to get matcher input, preserving a map for seed-results synchronization + matcherIn = preFilter && ( seed || !selector ) ? + condense( elems, preMap, preFilter, context, xml ) : + elems, + + matcherOut = matcher ? + + // If we have a postFinder, or filtered seed, or non-seed postFilter or preexisting results, + postFinder || ( seed ? preFilter : preexisting || postFilter ) ? 
+ + // ...intermediate processing is necessary + [] : + + // ...otherwise use results directly + results : + matcherIn; + + // Find primary matches + if ( matcher ) { + matcher( matcherIn, matcherOut, context, xml ); + } + + // Apply postFilter + if ( postFilter ) { + temp = condense( matcherOut, postMap ); + postFilter( temp, [], context, xml ); + + // Un-match failing elements by moving them back to matcherIn + i = temp.length; + while ( i-- ) { + if ( ( elem = temp[ i ] ) ) { + matcherOut[ postMap[ i ] ] = !( matcherIn[ postMap[ i ] ] = elem ); + } + } + } + + if ( seed ) { + if ( postFinder || preFilter ) { + if ( postFinder ) { + + // Get the final matcherOut by condensing this intermediate into postFinder contexts + temp = []; + i = matcherOut.length; + while ( i-- ) { + if ( ( elem = matcherOut[ i ] ) ) { + + // Restore matcherIn since elem is not yet a final match + temp.push( ( matcherIn[ i ] = elem ) ); + } + } + postFinder( null, ( matcherOut = [] ), temp, xml ); + } + + // Move matched elements from seed to results to keep them synchronized + i = matcherOut.length; + while ( i-- ) { + if ( ( elem = matcherOut[ i ] ) && + ( temp = postFinder ? indexOf( seed, elem ) : preMap[ i ] ) > -1 ) { + + seed[ temp ] = !( results[ temp ] = elem ); + } + } + } + + // Add elements to results, through postFinder if defined + } else { + matcherOut = condense( + matcherOut === results ? + matcherOut.splice( preexisting, matcherOut.length ) : + matcherOut + ); + if ( postFinder ) { + postFinder( null, results, matcherOut, xml ); + } else { + push.apply( results, matcherOut ); + } + } + } ); +} + +function matcherFromTokens( tokens ) { + var checkContext, matcher, j, + len = tokens.length, + leadingRelative = Expr.relative[ tokens[ 0 ].type ], + implicitRelative = leadingRelative || Expr.relative[ " " ], + i = leadingRelative ? 
1 : 0, + + // The foundational matcher ensures that elements are reachable from top-level context(s) + matchContext = addCombinator( function( elem ) { + return elem === checkContext; + }, implicitRelative, true ), + matchAnyContext = addCombinator( function( elem ) { + return indexOf( checkContext, elem ) > -1; + }, implicitRelative, true ), + matchers = [ function( elem, context, xml ) { + var ret = ( !leadingRelative && ( xml || context !== outermostContext ) ) || ( + ( checkContext = context ).nodeType ? + matchContext( elem, context, xml ) : + matchAnyContext( elem, context, xml ) ); + + // Avoid hanging onto element (issue #299) + checkContext = null; + return ret; + } ]; + + for ( ; i < len; i++ ) { + if ( ( matcher = Expr.relative[ tokens[ i ].type ] ) ) { + matchers = [ addCombinator( elementMatcher( matchers ), matcher ) ]; + } else { + matcher = Expr.filter[ tokens[ i ].type ].apply( null, tokens[ i ].matches ); + + // Return special upon seeing a positional matcher + if ( matcher[ expando ] ) { + + // Find the next relative operator (if any) for proper handling + j = ++i; + for ( ; j < len; j++ ) { + if ( Expr.relative[ tokens[ j ].type ] ) { + break; + } + } + return setMatcher( + i > 1 && elementMatcher( matchers ), + i > 1 && toSelector( + + // If the preceding token was a descendant combinator, insert an implicit any-element `*` + tokens + .slice( 0, i - 1 ) + .concat( { value: tokens[ i - 2 ].type === " " ? 
"*" : "" } ) + ).replace( rtrim, "$1" ), + matcher, + i < j && matcherFromTokens( tokens.slice( i, j ) ), + j < len && matcherFromTokens( ( tokens = tokens.slice( j ) ) ), + j < len && toSelector( tokens ) + ); + } + matchers.push( matcher ); + } + } + + return elementMatcher( matchers ); +} + +function matcherFromGroupMatchers( elementMatchers, setMatchers ) { + var bySet = setMatchers.length > 0, + byElement = elementMatchers.length > 0, + superMatcher = function( seed, context, xml, results, outermost ) { + var elem, j, matcher, + matchedCount = 0, + i = "0", + unmatched = seed && [], + setMatched = [], + contextBackup = outermostContext, + + // We must always have either seed elements or outermost context + elems = seed || byElement && Expr.find[ "TAG" ]( "*", outermost ), + + // Use integer dirruns iff this is the outermost matcher + dirrunsUnique = ( dirruns += contextBackup == null ? 1 : Math.random() || 0.1 ), + len = elems.length; + + if ( outermost ) { + + // Support: IE 11+, Edge 17 - 18+ + // IE/Edge sometimes throw a "Permission denied" error when strict-comparing + // two documents; shallow comparisons work. + // eslint-disable-next-line eqeqeq + outermostContext = context == document || context || outermost; + } + + // Add elements passing elementMatchers directly to results + // Support: IE<9, Safari + // Tolerate NodeList properties (IE: "length"; Safari: ) matching elements by id + for ( ; i !== len && ( elem = elems[ i ] ) != null; i++ ) { + if ( byElement && elem ) { + j = 0; + + // Support: IE 11+, Edge 17 - 18+ + // IE/Edge sometimes throw a "Permission denied" error when strict-comparing + // two documents; shallow comparisons work. 
+ // eslint-disable-next-line eqeqeq + if ( !context && elem.ownerDocument != document ) { + setDocument( elem ); + xml = !documentIsHTML; + } + while ( ( matcher = elementMatchers[ j++ ] ) ) { + if ( matcher( elem, context || document, xml ) ) { + results.push( elem ); + break; + } + } + if ( outermost ) { + dirruns = dirrunsUnique; + } + } + + // Track unmatched elements for set filters + if ( bySet ) { + + // They will have gone through all possible matchers + if ( ( elem = !matcher && elem ) ) { + matchedCount--; + } + + // Lengthen the array for every element, matched or not + if ( seed ) { + unmatched.push( elem ); + } + } + } + + // `i` is now the count of elements visited above, and adding it to `matchedCount` + // makes the latter nonnegative. + matchedCount += i; + + // Apply set filters to unmatched elements + // NOTE: This can be skipped if there are no unmatched elements (i.e., `matchedCount` + // equals `i`), unless we didn't visit _any_ elements in the above loop because we have + // no element matchers and no seed. + // Incrementing an initially-string "0" `i` allows `i` to remain a string only in that + // case, which will result in a "00" `matchedCount` that differs from `i` but is also + // numerically zero. 
+ if ( bySet && i !== matchedCount ) { + j = 0; + while ( ( matcher = setMatchers[ j++ ] ) ) { + matcher( unmatched, setMatched, context, xml ); + } + + if ( seed ) { + + // Reintegrate element matches to eliminate the need for sorting + if ( matchedCount > 0 ) { + while ( i-- ) { + if ( !( unmatched[ i ] || setMatched[ i ] ) ) { + setMatched[ i ] = pop.call( results ); + } + } + } + + // Discard index placeholder values to get only actual matches + setMatched = condense( setMatched ); + } + + // Add matches to results + push.apply( results, setMatched ); + + // Seedless set matches succeeding multiple successful matchers stipulate sorting + if ( outermost && !seed && setMatched.length > 0 && + ( matchedCount + setMatchers.length ) > 1 ) { + + Sizzle.uniqueSort( results ); + } + } + + // Override manipulation of globals by nested matchers + if ( outermost ) { + dirruns = dirrunsUnique; + outermostContext = contextBackup; + } + + return unmatched; + }; + + return bySet ? + markFunction( superMatcher ) : + superMatcher; +} + +compile = Sizzle.compile = function( selector, match /* Internal Use Only */ ) { + var i, + setMatchers = [], + elementMatchers = [], + cached = compilerCache[ selector + " " ]; + + if ( !cached ) { + + // Generate a function of recursive functions that can be used to check each element + if ( !match ) { + match = tokenize( selector ); + } + i = match.length; + while ( i-- ) { + cached = matcherFromTokens( match[ i ] ); + if ( cached[ expando ] ) { + setMatchers.push( cached ); + } else { + elementMatchers.push( cached ); + } + } + + // Cache the compiled function + cached = compilerCache( + selector, + matcherFromGroupMatchers( elementMatchers, setMatchers ) + ); + + // Save selector and tokenization + cached.selector = selector; + } + return cached; +}; + +/** + * A low-level selection function that works with Sizzle's compiled + * selector functions + * @param {String|Function} selector A selector or a pre-compiled + * selector function built 
with Sizzle.compile + * @param {Element} context + * @param {Array} [results] + * @param {Array} [seed] A set of elements to match against + */ +select = Sizzle.select = function( selector, context, results, seed ) { + var i, tokens, token, type, find, + compiled = typeof selector === "function" && selector, + match = !seed && tokenize( ( selector = compiled.selector || selector ) ); + + results = results || []; + + // Try to minimize operations if there is only one selector in the list and no seed + // (the latter of which guarantees us context) + if ( match.length === 1 ) { + + // Reduce context if the leading compound selector is an ID + tokens = match[ 0 ] = match[ 0 ].slice( 0 ); + if ( tokens.length > 2 && ( token = tokens[ 0 ] ).type === "ID" && + context.nodeType === 9 && documentIsHTML && Expr.relative[ tokens[ 1 ].type ] ) { + + context = ( Expr.find[ "ID" ]( token.matches[ 0 ] + .replace( runescape, funescape ), context ) || [] )[ 0 ]; + if ( !context ) { + return results; + + // Precompiled matchers will still verify ancestry, so step up a level + } else if ( compiled ) { + context = context.parentNode; + } + + selector = selector.slice( tokens.shift().value.length ); + } + + // Fetch a seed set for right-to-left matching + i = matchExpr[ "needsContext" ].test( selector ) ? 
0 : tokens.length; + while ( i-- ) { + token = tokens[ i ]; + + // Abort if we hit a combinator + if ( Expr.relative[ ( type = token.type ) ] ) { + break; + } + if ( ( find = Expr.find[ type ] ) ) { + + // Search, expanding context for leading sibling combinators + if ( ( seed = find( + token.matches[ 0 ].replace( runescape, funescape ), + rsibling.test( tokens[ 0 ].type ) && testContext( context.parentNode ) || + context + ) ) ) { + + // If seed is empty or no tokens remain, we can return early + tokens.splice( i, 1 ); + selector = seed.length && toSelector( tokens ); + if ( !selector ) { + push.apply( results, seed ); + return results; + } + + break; + } + } + } + } + + // Compile and execute a filtering function if one is not provided + // Provide `match` to avoid retokenization if we modified the selector above + ( compiled || compile( selector, match ) )( + seed, + context, + !documentIsHTML, + results, + !context || rsibling.test( selector ) && testContext( context.parentNode ) || context + ); + return results; +}; + +// One-time assignments + +// Sort stability +support.sortStable = expando.split( "" ).sort( sortOrder ).join( "" ) === expando; + +// Support: Chrome 14-35+ +// Always assume duplicates if they aren't passed to the comparison function +support.detectDuplicates = !!hasDuplicate; + +// Initialize against the default document +setDocument(); + +// Support: Webkit<537.32 - Safari 6.0.3/Chrome 25 (fixed in Chrome 27) +// Detached nodes confoundingly follow *each other* +support.sortDetached = assert( function( el ) { + + // Should return 1, but returns 4 (following) + return el.compareDocumentPosition( document.createElement( "fieldset" ) ) & 1; +} ); + +// Support: IE<8 +// Prevent attribute/property "interpolation" +// https://msdn.microsoft.com/en-us/library/ms536429%28VS.85%29.aspx +if ( !assert( function( el ) { + el.innerHTML = "<a href='#'></a>"; + return el.firstChild.getAttribute( "href" ) === "#"; +} ) ) { + addHandle( "type|href|height|width", function(
elem, name, isXML ) { + if ( !isXML ) { + return elem.getAttribute( name, name.toLowerCase() === "type" ? 1 : 2 ); + } + } ); +} + +// Support: IE<9 +// Use defaultValue in place of getAttribute("value") +if ( !support.attributes || !assert( function( el ) { + el.innerHTML = "<input/>"; + el.firstChild.setAttribute( "value", "" ); + return el.firstChild.getAttribute( "value" ) === ""; +} ) ) { + addHandle( "value", function( elem, _name, isXML ) { + if ( !isXML && elem.nodeName.toLowerCase() === "input" ) { + return elem.defaultValue; + } + } ); +} + +// Support: IE<9 +// Use getAttributeNode to fetch booleans when getAttribute lies +if ( !assert( function( el ) { + return el.getAttribute( "disabled" ) == null; +} ) ) { + addHandle( booleans, function( elem, name, isXML ) { + var val; + if ( !isXML ) { + return elem[ name ] === true ? name.toLowerCase() : + ( val = elem.getAttributeNode( name ) ) && val.specified ? + val.value : + null; + } + } ); +} + +return Sizzle; + +} )( window ); + + + +jQuery.find = Sizzle; +jQuery.expr = Sizzle.selectors; + +// Deprecated +jQuery.expr[ ":" ] = jQuery.expr.pseudos; +jQuery.uniqueSort = jQuery.unique = Sizzle.uniqueSort; +jQuery.text = Sizzle.getText; +jQuery.isXMLDoc = Sizzle.isXML; +jQuery.contains = Sizzle.contains; +jQuery.escapeSelector = Sizzle.escape; + + + + +var dir = function( elem, dir, until ) { + var matched = [], + truncate = until !== undefined; + + while ( ( elem = elem[ dir ] ) && elem.nodeType !== 9 ) { + if ( elem.nodeType === 1 ) { + if ( truncate && jQuery( elem ).is( until ) ) { + break; + } + matched.push( elem ); + } + } + return matched; +}; + + +var siblings = function( n, elem ) { + var matched = []; + + for ( ; n; n = n.nextSibling ) { + if ( n.nodeType === 1 && n !== elem ) { + matched.push( n ); + } + } + + return matched; +}; + + +var rneedsContext = jQuery.expr.match.needsContext; + + + +function nodeName( elem, name ) { + + return elem.nodeName && elem.nodeName.toLowerCase() === name.toLowerCase(); +
+} +var rsingleTag = ( /^<([a-z][^\/\0>:\x20\t\r\n\f]*)[\x20\t\r\n\f]*\/?>(?:<\/\1>|)$/i ); + + + +// Implement the identical functionality for filter and not +function winnow( elements, qualifier, not ) { + if ( isFunction( qualifier ) ) { + return jQuery.grep( elements, function( elem, i ) { + return !!qualifier.call( elem, i, elem ) !== not; + } ); + } + + // Single element + if ( qualifier.nodeType ) { + return jQuery.grep( elements, function( elem ) { + return ( elem === qualifier ) !== not; + } ); + } + + // Arraylike of elements (jQuery, arguments, Array) + if ( typeof qualifier !== "string" ) { + return jQuery.grep( elements, function( elem ) { + return ( indexOf.call( qualifier, elem ) > -1 ) !== not; + } ); + } + + // Filtered directly for both simple and complex selectors + return jQuery.filter( qualifier, elements, not ); +} + +jQuery.filter = function( expr, elems, not ) { + var elem = elems[ 0 ]; + + if ( not ) { + expr = ":not(" + expr + ")"; + } + + if ( elems.length === 1 && elem.nodeType === 1 ) { + return jQuery.find.matchesSelector( elem, expr ) ? [ elem ] : []; + } + + return jQuery.find.matches( expr, jQuery.grep( elems, function( elem ) { + return elem.nodeType === 1; + } ) ); +}; + +jQuery.fn.extend( { + find: function( selector ) { + var i, ret, + len = this.length, + self = this; + + if ( typeof selector !== "string" ) { + return this.pushStack( jQuery( selector ).filter( function() { + for ( i = 0; i < len; i++ ) { + if ( jQuery.contains( self[ i ], this ) ) { + return true; + } + } + } ) ); + } + + ret = this.pushStack( [] ); + + for ( i = 0; i < len; i++ ) { + jQuery.find( selector, self[ i ], ret ); + } + + return len > 1 ? 
jQuery.uniqueSort( ret ) : ret; + }, + filter: function( selector ) { + return this.pushStack( winnow( this, selector || [], false ) ); + }, + not: function( selector ) { + return this.pushStack( winnow( this, selector || [], true ) ); + }, + is: function( selector ) { + return !!winnow( + this, + + // If this is a positional/relative selector, check membership in the returned set + // so $("p:first").is("p:last") won't return true for a doc with two "p". + typeof selector === "string" && rneedsContext.test( selector ) ? + jQuery( selector ) : + selector || [], + false + ).length; + } +} ); + + +// Initialize a jQuery object + + +// A central reference to the root jQuery(document) +var rootjQuery, + + // A simple way to check for HTML strings + // Prioritize #id over <tag> to avoid XSS via location.hash (#9521) + // Strict HTML recognition (#11290: must start with <) + // Shortcut simple #id case for speed + rquickExpr = /^(?:\s*(<[\w\W]+>)[^>]*|#([\w-]+))$/, + + init = jQuery.fn.init = function( selector, context, root ) { + var match, elem; + + // HANDLE: $(""), $(null), $(undefined), $(false) + if ( !selector ) { + return this; + } + + // Method init() accepts an alternate rootjQuery + // so migrate can support jQuery.sub (gh-2101) + root = root || rootjQuery; + + // Handle HTML strings + if ( typeof selector === "string" ) { + if ( selector[ 0 ] === "<" && + selector[ selector.length - 1 ] === ">" && + selector.length >= 3 ) { + + // Assume that strings that start and end with <> are HTML and skip the regex check + match = [ null, selector, null ]; + + } else { + match = rquickExpr.exec( selector ); + } + + // Match html or make sure no context is specified for #id + if ( match && ( match[ 1 ] || !context ) ) { + + // HANDLE: $(html) -> $(array) + if ( match[ 1 ] ) { + context = context instanceof jQuery ?
context[ 0 ] : context; + + // Option to run scripts is true for back-compat + // Intentionally let the error be thrown if parseHTML is not present + jQuery.merge( this, jQuery.parseHTML( + match[ 1 ], + context && context.nodeType ? context.ownerDocument || context : document, + true + ) ); + + // HANDLE: $(html, props) + if ( rsingleTag.test( match[ 1 ] ) && jQuery.isPlainObject( context ) ) { + for ( match in context ) { + + // Properties of context are called as methods if possible + if ( isFunction( this[ match ] ) ) { + this[ match ]( context[ match ] ); + + // ...and otherwise set as attributes + } else { + this.attr( match, context[ match ] ); + } + } + } + + return this; + + // HANDLE: $(#id) + } else { + elem = document.getElementById( match[ 2 ] ); + + if ( elem ) { + + // Inject the element directly into the jQuery object + this[ 0 ] = elem; + this.length = 1; + } + return this; + } + + // HANDLE: $(expr, $(...)) + } else if ( !context || context.jquery ) { + return ( context || root ).find( selector ); + + // HANDLE: $(expr, context) + // (which is just equivalent to: $(context).find(expr) + } else { + return this.constructor( context ).find( selector ); + } + + // HANDLE: $(DOMElement) + } else if ( selector.nodeType ) { + this[ 0 ] = selector; + this.length = 1; + return this; + + // HANDLE: $(function) + // Shortcut for document ready + } else if ( isFunction( selector ) ) { + return root.ready !== undefined ? 
+ root.ready( selector ) : + + // Execute immediately if ready is not present + selector( jQuery ); + } + + return jQuery.makeArray( selector, this ); + }; + +// Give the init function the jQuery prototype for later instantiation +init.prototype = jQuery.fn; + +// Initialize central reference +rootjQuery = jQuery( document ); + + +var rparentsprev = /^(?:parents|prev(?:Until|All))/, + + // Methods guaranteed to produce a unique set when starting from a unique set + guaranteedUnique = { + children: true, + contents: true, + next: true, + prev: true + }; + +jQuery.fn.extend( { + has: function( target ) { + var targets = jQuery( target, this ), + l = targets.length; + + return this.filter( function() { + var i = 0; + for ( ; i < l; i++ ) { + if ( jQuery.contains( this, targets[ i ] ) ) { + return true; + } + } + } ); + }, + + closest: function( selectors, context ) { + var cur, + i = 0, + l = this.length, + matched = [], + targets = typeof selectors !== "string" && jQuery( selectors ); + + // Positional selectors never match, since there's no _selection_ context + if ( !rneedsContext.test( selectors ) ) { + for ( ; i < l; i++ ) { + for ( cur = this[ i ]; cur && cur !== context; cur = cur.parentNode ) { + + // Always skip document fragments + if ( cur.nodeType < 11 && ( targets ? + targets.index( cur ) > -1 : + + // Don't pass non-elements to Sizzle + cur.nodeType === 1 && + jQuery.find.matchesSelector( cur, selectors ) ) ) { + + matched.push( cur ); + break; + } + } + } + } + + return this.pushStack( matched.length > 1 ? jQuery.uniqueSort( matched ) : matched ); + }, + + // Determine the position of an element within the set + index: function( elem ) { + + // No argument, return index in parent + if ( !elem ) { + return ( this[ 0 ] && this[ 0 ].parentNode ) ? 
this.first().prevAll().length : -1; + } + + // Index in selector + if ( typeof elem === "string" ) { + return indexOf.call( jQuery( elem ), this[ 0 ] ); + } + + // Locate the position of the desired element + return indexOf.call( this, + + // If it receives a jQuery object, the first element is used + elem.jquery ? elem[ 0 ] : elem + ); + }, + + add: function( selector, context ) { + return this.pushStack( + jQuery.uniqueSort( + jQuery.merge( this.get(), jQuery( selector, context ) ) + ) + ); + }, + + addBack: function( selector ) { + return this.add( selector == null ? + this.prevObject : this.prevObject.filter( selector ) + ); + } +} ); + +function sibling( cur, dir ) { + while ( ( cur = cur[ dir ] ) && cur.nodeType !== 1 ) {} + return cur; +} + +jQuery.each( { + parent: function( elem ) { + var parent = elem.parentNode; + return parent && parent.nodeType !== 11 ? parent : null; + }, + parents: function( elem ) { + return dir( elem, "parentNode" ); + }, + parentsUntil: function( elem, _i, until ) { + return dir( elem, "parentNode", until ); + }, + next: function( elem ) { + return sibling( elem, "nextSibling" ); + }, + prev: function( elem ) { + return sibling( elem, "previousSibling" ); + }, + nextAll: function( elem ) { + return dir( elem, "nextSibling" ); + }, + prevAll: function( elem ) { + return dir( elem, "previousSibling" ); + }, + nextUntil: function( elem, _i, until ) { + return dir( elem, "nextSibling", until ); + }, + prevUntil: function( elem, _i, until ) { + return dir( elem, "previousSibling", until ); + }, + siblings: function( elem ) { + return siblings( ( elem.parentNode || {} ).firstChild, elem ); + }, + children: function( elem ) { + return siblings( elem.firstChild ); + }, + contents: function( elem ) { + if ( elem.contentDocument != null && + + // Support: IE 11+ + // <object> elements with no `data` attribute has an object + // `contentDocument` with a `null` prototype.
+ getProto( elem.contentDocument ) ) { + + return elem.contentDocument; + } + + // Support: IE 9 - 11 only, iOS 7 only, Android Browser <=4.3 only + // Treat the template element as a regular one in browsers that + // don't support it. + if ( nodeName( elem, "template" ) ) { + elem = elem.content || elem; + } + + return jQuery.merge( [], elem.childNodes ); + } +}, function( name, fn ) { + jQuery.fn[ name ] = function( until, selector ) { + var matched = jQuery.map( this, fn, until ); + + if ( name.slice( -5 ) !== "Until" ) { + selector = until; + } + + if ( selector && typeof selector === "string" ) { + matched = jQuery.filter( selector, matched ); + } + + if ( this.length > 1 ) { + + // Remove duplicates + if ( !guaranteedUnique[ name ] ) { + jQuery.uniqueSort( matched ); + } + + // Reverse order for parents* and prev-derivatives + if ( rparentsprev.test( name ) ) { + matched.reverse(); + } + } + + return this.pushStack( matched ); + }; +} ); +var rnothtmlwhite = ( /[^\x20\t\r\n\f]+/g ); + + + +// Convert String-formatted options into Object-formatted ones +function createOptions( options ) { + var object = {}; + jQuery.each( options.match( rnothtmlwhite ) || [], function( _, flag ) { + object[ flag ] = true; + } ); + return object; +} + +/* + * Create a callback list using the following parameters: + * + * options: an optional list of space-separated options that will change how + * the callback list behaves or a more traditional option object + * + * By default a callback list will act like an event callback list and can be + * "fired" multiple times. 
+ * + * Possible options: + * + * once: will ensure the callback list can only be fired once (like a Deferred) + * + * memory: will keep track of previous values and will call any callback added + * after the list has been fired right away with the latest "memorized" + * values (like a Deferred) + * + * unique: will ensure a callback can only be added once (no duplicate in the list) + * + * stopOnFalse: interrupt callings when a callback returns false + * + */ +jQuery.Callbacks = function( options ) { + + // Convert options from String-formatted to Object-formatted if needed + // (we check in cache first) + options = typeof options === "string" ? + createOptions( options ) : + jQuery.extend( {}, options ); + + var // Flag to know if list is currently firing + firing, + + // Last fire value for non-forgettable lists + memory, + + // Flag to know if list was already fired + fired, + + // Flag to prevent firing + locked, + + // Actual callback list + list = [], + + // Queue of execution data for repeatable lists + queue = [], + + // Index of currently firing callback (modified by add/remove as needed) + firingIndex = -1, + + // Fire callbacks + fire = function() { + + // Enforce single-firing + locked = locked || options.once; + + // Execute callbacks for all pending executions, + // respecting firingIndex overrides and runtime changes + fired = firing = true; + for ( ; queue.length; firingIndex = -1 ) { + memory = queue.shift(); + while ( ++firingIndex < list.length ) { + + // Run callback and check for early termination + if ( list[ firingIndex ].apply( memory[ 0 ], memory[ 1 ] ) === false && + options.stopOnFalse ) { + + // Jump to end and forget the data so .add doesn't re-fire + firingIndex = list.length; + memory = false; + } + } + } + + // Forget the data if we're done with it + if ( !options.memory ) { + memory = false; + } + + firing = false; + + // Clean up if we're done firing for good + if ( locked ) { + + // Keep an empty list if we have data for future 
add calls + if ( memory ) { + list = []; + + // Otherwise, this object is spent + } else { + list = ""; + } + } + }, + + // Actual Callbacks object + self = { + + // Add a callback or a collection of callbacks to the list + add: function() { + if ( list ) { + + // If we have memory from a past run, we should fire after adding + if ( memory && !firing ) { + firingIndex = list.length - 1; + queue.push( memory ); + } + + ( function add( args ) { + jQuery.each( args, function( _, arg ) { + if ( isFunction( arg ) ) { + if ( !options.unique || !self.has( arg ) ) { + list.push( arg ); + } + } else if ( arg && arg.length && toType( arg ) !== "string" ) { + + // Inspect recursively + add( arg ); + } + } ); + } )( arguments ); + + if ( memory && !firing ) { + fire(); + } + } + return this; + }, + + // Remove a callback from the list + remove: function() { + jQuery.each( arguments, function( _, arg ) { + var index; + while ( ( index = jQuery.inArray( arg, list, index ) ) > -1 ) { + list.splice( index, 1 ); + + // Handle firing indexes + if ( index <= firingIndex ) { + firingIndex--; + } + } + } ); + return this; + }, + + // Check if a given callback is in the list. + // If no argument is given, return whether or not list has callbacks attached. + has: function( fn ) { + return fn ? 
+ jQuery.inArray( fn, list ) > -1 : + list.length > 0; + }, + + // Remove all callbacks from the list + empty: function() { + if ( list ) { + list = []; + } + return this; + }, + + // Disable .fire and .add + // Abort any current/pending executions + // Clear all callbacks and values + disable: function() { + locked = queue = []; + list = memory = ""; + return this; + }, + disabled: function() { + return !list; + }, + + // Disable .fire + // Also disable .add unless we have memory (since it would have no effect) + // Abort any pending executions + lock: function() { + locked = queue = []; + if ( !memory && !firing ) { + list = memory = ""; + } + return this; + }, + locked: function() { + return !!locked; + }, + + // Call all callbacks with the given context and arguments + fireWith: function( context, args ) { + if ( !locked ) { + args = args || []; + args = [ context, args.slice ? args.slice() : args ]; + queue.push( args ); + if ( !firing ) { + fire(); + } + } + return this; + }, + + // Call all the callbacks with the given arguments + fire: function() { + self.fireWith( this, arguments ); + return this; + }, + + // To know if the callbacks have already been called at least once + fired: function() { + return !!fired; + } + }; + + return self; +}; + + +function Identity( v ) { + return v; +} +function Thrower( ex ) { + throw ex; +} + +function adoptValue( value, resolve, reject, noValue ) { + var method; + + try { + + // Check for promise aspect first to privilege synchronous behavior + if ( value && isFunction( ( method = value.promise ) ) ) { + method.call( value ).done( resolve ).fail( reject ); + + // Other thenables + } else if ( value && isFunction( ( method = value.then ) ) ) { + method.call( value, resolve, reject ); + + // Other non-thenables + } else { + + // Control `resolve` arguments by letting Array#slice cast boolean `noValue` to integer: + // * false: [ value ].slice( 0 ) => resolve( value ) + // * true: [ value ].slice( 1 ) => resolve() + 
resolve.apply( undefined, [ value ].slice( noValue ) ); + } + + // For Promises/A+, convert exceptions into rejections + // Since jQuery.when doesn't unwrap thenables, we can skip the extra checks appearing in + // Deferred#then to conditionally suppress rejection. + } catch ( value ) { + + // Support: Android 4.0 only + // Strict mode functions invoked without .call/.apply get global-object context + reject.apply( undefined, [ value ] ); + } +} + +jQuery.extend( { + + Deferred: function( func ) { + var tuples = [ + + // action, add listener, callbacks, + // ... .then handlers, argument index, [final state] + [ "notify", "progress", jQuery.Callbacks( "memory" ), + jQuery.Callbacks( "memory" ), 2 ], + [ "resolve", "done", jQuery.Callbacks( "once memory" ), + jQuery.Callbacks( "once memory" ), 0, "resolved" ], + [ "reject", "fail", jQuery.Callbacks( "once memory" ), + jQuery.Callbacks( "once memory" ), 1, "rejected" ] + ], + state = "pending", + promise = { + state: function() { + return state; + }, + always: function() { + deferred.done( arguments ).fail( arguments ); + return this; + }, + "catch": function( fn ) { + return promise.then( null, fn ); + }, + + // Keep pipe for back-compat + pipe: function( /* fnDone, fnFail, fnProgress */ ) { + var fns = arguments; + + return jQuery.Deferred( function( newDefer ) { + jQuery.each( tuples, function( _i, tuple ) { + + // Map tuples (progress, done, fail) to arguments (done, fail, progress) + var fn = isFunction( fns[ tuple[ 4 ] ] ) && fns[ tuple[ 4 ] ]; + + // deferred.progress(function() { bind to newDefer or newDefer.notify }) + // deferred.done(function() { bind to newDefer or newDefer.resolve }) + // deferred.fail(function() { bind to newDefer or newDefer.reject }) + deferred[ tuple[ 1 ] ]( function() { + var returned = fn && fn.apply( this, arguments ); + if ( returned && isFunction( returned.promise ) ) { + returned.promise() + .progress( newDefer.notify ) + .done( newDefer.resolve ) + .fail( newDefer.reject ); + } 
else { + newDefer[ tuple[ 0 ] + "With" ]( + this, + fn ? [ returned ] : arguments + ); + } + } ); + } ); + fns = null; + } ).promise(); + }, + then: function( onFulfilled, onRejected, onProgress ) { + var maxDepth = 0; + function resolve( depth, deferred, handler, special ) { + return function() { + var that = this, + args = arguments, + mightThrow = function() { + var returned, then; + + // Support: Promises/A+ section 2.3.3.3.3 + // https://promisesaplus.com/#point-59 + // Ignore double-resolution attempts + if ( depth < maxDepth ) { + return; + } + + returned = handler.apply( that, args ); + + // Support: Promises/A+ section 2.3.1 + // https://promisesaplus.com/#point-48 + if ( returned === deferred.promise() ) { + throw new TypeError( "Thenable self-resolution" ); + } + + // Support: Promises/A+ sections 2.3.3.1, 3.5 + // https://promisesaplus.com/#point-54 + // https://promisesaplus.com/#point-75 + // Retrieve `then` only once + then = returned && + + // Support: Promises/A+ section 2.3.4 + // https://promisesaplus.com/#point-64 + // Only check objects and functions for thenability + ( typeof returned === "object" || + typeof returned === "function" ) && + returned.then; + + // Handle a returned thenable + if ( isFunction( then ) ) { + + // Special processors (notify) just wait for resolution + if ( special ) { + then.call( + returned, + resolve( maxDepth, deferred, Identity, special ), + resolve( maxDepth, deferred, Thrower, special ) + ); + + // Normal processors (resolve) also hook into progress + } else { + + // ...and disregard older resolution values + maxDepth++; + + then.call( + returned, + resolve( maxDepth, deferred, Identity, special ), + resolve( maxDepth, deferred, Thrower, special ), + resolve( maxDepth, deferred, Identity, + deferred.notifyWith ) + ); + } + + // Handle all other returned values + } else { + + // Only substitute handlers pass on context + // and multiple values (non-spec behavior) + if ( handler !== Identity ) { + that = 
undefined; + args = [ returned ]; + } + + // Process the value(s) + // Default process is resolve + ( special || deferred.resolveWith )( that, args ); + } + }, + + // Only normal processors (resolve) catch and reject exceptions + process = special ? + mightThrow : + function() { + try { + mightThrow(); + } catch ( e ) { + + if ( jQuery.Deferred.exceptionHook ) { + jQuery.Deferred.exceptionHook( e, + process.stackTrace ); + } + + // Support: Promises/A+ section 2.3.3.3.4.1 + // https://promisesaplus.com/#point-61 + // Ignore post-resolution exceptions + if ( depth + 1 >= maxDepth ) { + + // Only substitute handlers pass on context + // and multiple values (non-spec behavior) + if ( handler !== Thrower ) { + that = undefined; + args = [ e ]; + } + + deferred.rejectWith( that, args ); + } + } + }; + + // Support: Promises/A+ section 2.3.3.3.1 + // https://promisesaplus.com/#point-57 + // Re-resolve promises immediately to dodge false rejection from + // subsequent errors + if ( depth ) { + process(); + } else { + + // Call an optional hook to record the stack, in case of exception + // since it's otherwise lost when execution goes async + if ( jQuery.Deferred.getStackHook ) { + process.stackTrace = jQuery.Deferred.getStackHook(); + } + window.setTimeout( process ); + } + }; + } + + return jQuery.Deferred( function( newDefer ) { + + // progress_handlers.add( ... ) + tuples[ 0 ][ 3 ].add( + resolve( + 0, + newDefer, + isFunction( onProgress ) ? + onProgress : + Identity, + newDefer.notifyWith + ) + ); + + // fulfilled_handlers.add( ... ) + tuples[ 1 ][ 3 ].add( + resolve( + 0, + newDefer, + isFunction( onFulfilled ) ? + onFulfilled : + Identity + ) + ); + + // rejected_handlers.add( ... ) + tuples[ 2 ][ 3 ].add( + resolve( + 0, + newDefer, + isFunction( onRejected ) ? 
+ onRejected : + Thrower + ) + ); + } ).promise(); + }, + + // Get a promise for this deferred + // If obj is provided, the promise aspect is added to the object + promise: function( obj ) { + return obj != null ? jQuery.extend( obj, promise ) : promise; + } + }, + deferred = {}; + + // Add list-specific methods + jQuery.each( tuples, function( i, tuple ) { + var list = tuple[ 2 ], + stateString = tuple[ 5 ]; + + // promise.progress = list.add + // promise.done = list.add + // promise.fail = list.add + promise[ tuple[ 1 ] ] = list.add; + + // Handle state + if ( stateString ) { + list.add( + function() { + + // state = "resolved" (i.e., fulfilled) + // state = "rejected" + state = stateString; + }, + + // rejected_callbacks.disable + // fulfilled_callbacks.disable + tuples[ 3 - i ][ 2 ].disable, + + // rejected_handlers.disable + // fulfilled_handlers.disable + tuples[ 3 - i ][ 3 ].disable, + + // progress_callbacks.lock + tuples[ 0 ][ 2 ].lock, + + // progress_handlers.lock + tuples[ 0 ][ 3 ].lock + ); + } + + // progress_handlers.fire + // fulfilled_handlers.fire + // rejected_handlers.fire + list.add( tuple[ 3 ].fire ); + + // deferred.notify = function() { deferred.notifyWith(...) } + // deferred.resolve = function() { deferred.resolveWith(...) } + // deferred.reject = function() { deferred.rejectWith(...) } + deferred[ tuple[ 0 ] ] = function() { + deferred[ tuple[ 0 ] + "With" ]( this === deferred ? undefined : this, arguments ); + return this; + }; + + // deferred.notifyWith = list.fireWith + // deferred.resolveWith = list.fireWith + // deferred.rejectWith = list.fireWith + deferred[ tuple[ 0 ] + "With" ] = list.fireWith; + } ); + + // Make the deferred a promise + promise.promise( deferred ); + + // Call given func if any + if ( func ) { + func.call( deferred, deferred ); + } + + // All done! 
+ return deferred; + }, + + // Deferred helper + when: function( singleValue ) { + var + + // count of uncompleted subordinates + remaining = arguments.length, + + // count of unprocessed arguments + i = remaining, + + // subordinate fulfillment data + resolveContexts = Array( i ), + resolveValues = slice.call( arguments ), + + // the primary Deferred + primary = jQuery.Deferred(), + + // subordinate callback factory + updateFunc = function( i ) { + return function( value ) { + resolveContexts[ i ] = this; + resolveValues[ i ] = arguments.length > 1 ? slice.call( arguments ) : value; + if ( !( --remaining ) ) { + primary.resolveWith( resolveContexts, resolveValues ); + } + }; + }; + + // Single- and empty arguments are adopted like Promise.resolve + if ( remaining <= 1 ) { + adoptValue( singleValue, primary.done( updateFunc( i ) ).resolve, primary.reject, + !remaining ); + + // Use .then() to unwrap secondary thenables (cf. gh-3000) + if ( primary.state() === "pending" || + isFunction( resolveValues[ i ] && resolveValues[ i ].then ) ) { + + return primary.then(); + } + } + + // Multiple arguments are aggregated like Promise.all array elements + while ( i-- ) { + adoptValue( resolveValues[ i ], updateFunc( i ), primary.reject ); + } + + return primary.promise(); + } +} ); + + +// These usually indicate a programmer mistake during development, +// warn about them ASAP rather than swallowing them by default. 
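The counting scheme `jQuery.when()` uses above — one `updateFunc` per argument decrementing a shared `remaining` counter — can be sketched standalone. `whenAll` is a hypothetical name for illustration; jQuery itself resolves a Deferred with the collected contexts/values and treats the 0/1-argument cases separately, which this sketch omits.

```javascript
// Minimal standalone sketch of the jQuery.when() counting scheme (assumed
// helper name `whenAll`; not part of jQuery's API).
function whenAll( values, onResolved ) {
	let remaining = values.length;
	const results = new Array( remaining );

	// One updateFunc per subordinate; the last one to report in resolves
	const updateFunc = ( i ) => ( value ) => {
		results[ i ] = value;
		if ( !( --remaining ) ) {
			onResolved( results );
		}
	};

	values.forEach( ( v, i ) => {
		// Adopt thenables; pass plain values straight through
		if ( v && typeof v.then === "function" ) {
			v.then( updateFunc( i ) );
		} else {
			updateFunc( i )( v );
		}
	} );
}

let out;
whenAll( [ 1, { then: ( resolve ) => resolve( 2 ) }, 3 ], ( r ) => { out = r; } );
console.log( out ); // [ 1, 2, 3 ]
```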
+var rerrorNames = /^(Eval|Internal|Range|Reference|Syntax|Type|URI)Error$/; + +jQuery.Deferred.exceptionHook = function( error, stack ) { + + // Support: IE 8 - 9 only + // Console exists when dev tools are open, which can happen at any time + if ( window.console && window.console.warn && error && rerrorNames.test( error.name ) ) { + window.console.warn( "jQuery.Deferred exception: " + error.message, error.stack, stack ); + } +}; + + + + +jQuery.readyException = function( error ) { + window.setTimeout( function() { + throw error; + } ); +}; + + + + +// The deferred used on DOM ready +var readyList = jQuery.Deferred(); + +jQuery.fn.ready = function( fn ) { + + readyList + .then( fn ) + + // Wrap jQuery.readyException in a function so that the lookup + // happens at the time of error handling instead of callback + // registration. + .catch( function( error ) { + jQuery.readyException( error ); + } ); + + return this; +}; + +jQuery.extend( { + + // Is the DOM ready to be used? Set to true once it occurs. + isReady: false, + + // A counter to track how many items to wait for before + // the ready event fires. See #6781 + readyWait: 1, + + // Handle when the DOM is ready + ready: function( wait ) { + + // Abort if there are pending holds or we're already ready + if ( wait === true ? 
--jQuery.readyWait : jQuery.isReady ) { + return; + } + + // Remember that the DOM is ready + jQuery.isReady = true; + + // If a normal DOM Ready event fired, decrement, and wait if need be + if ( wait !== true && --jQuery.readyWait > 0 ) { + return; + } + + // If there are functions bound, to execute + readyList.resolveWith( document, [ jQuery ] ); + } +} ); + +jQuery.ready.then = readyList.then; + +// The ready event handler and self cleanup method +function completed() { + document.removeEventListener( "DOMContentLoaded", completed ); + window.removeEventListener( "load", completed ); + jQuery.ready(); +} + +// Catch cases where $(document).ready() is called +// after the browser event has already occurred. +// Support: IE <=9 - 10 only +// Older IE sometimes signals "interactive" too soon +if ( document.readyState === "complete" || + ( document.readyState !== "loading" && !document.documentElement.doScroll ) ) { + + // Handle it asynchronously to allow scripts the opportunity to delay ready + window.setTimeout( jQuery.ready ); + +} else { + + // Use the handy event callback + document.addEventListener( "DOMContentLoaded", completed ); + + // A fallback to window.onload, that will always work + window.addEventListener( "load", completed ); +} + + + + +// Multifunctional method to get and set values of a collection +// The value/s can optionally be executed if it's a function +var access = function( elems, fn, key, value, chainable, emptyGet, raw ) { + var i = 0, + len = elems.length, + bulk = key == null; + + // Sets many values + if ( toType( key ) === "object" ) { + chainable = true; + for ( i in key ) { + access( elems, fn, i, key[ i ], true, emptyGet, raw ); + } + + // Sets one value + } else if ( value !== undefined ) { + chainable = true; + + if ( !isFunction( value ) ) { + raw = true; + } + + if ( bulk ) { + + // Bulk operations run against the entire set + if ( raw ) { + fn.call( elems, value ); + fn = null; + + // ...except when executing function 
values + } else { + bulk = fn; + fn = function( elem, _key, value ) { + return bulk.call( jQuery( elem ), value ); + }; + } + } + + if ( fn ) { + for ( ; i < len; i++ ) { + fn( + elems[ i ], key, raw ? + value : + value.call( elems[ i ], i, fn( elems[ i ], key ) ) + ); + } + } + } + + if ( chainable ) { + return elems; + } + + // Gets + if ( bulk ) { + return fn.call( elems ); + } + + return len ? fn( elems[ 0 ], key ) : emptyGet; +}; + + +// Matches dashed string for camelizing +var rmsPrefix = /^-ms-/, + rdashAlpha = /-([a-z])/g; + +// Used by camelCase as callback to replace() +function fcamelCase( _all, letter ) { + return letter.toUpperCase(); +} + +// Convert dashed to camelCase; used by the css and data modules +// Support: IE <=9 - 11, Edge 12 - 15 +// Microsoft forgot to hump their vendor prefix (#9572) +function camelCase( string ) { + return string.replace( rmsPrefix, "ms-" ).replace( rdashAlpha, fcamelCase ); +} +var acceptData = function( owner ) { + + // Accepts only: + // - Node + // - Node.ELEMENT_NODE + // - Node.DOCUMENT_NODE + // - Object + // - Any + return owner.nodeType === 1 || owner.nodeType === 9 || !( +owner.nodeType ); +}; + + + + +function Data() { + this.expando = jQuery.expando + Data.uid++; +} + +Data.uid = 1; + +Data.prototype = { + + cache: function( owner ) { + + // Check if the owner object already has a cache + var value = owner[ this.expando ]; + + // If not, create one + if ( !value ) { + value = {}; + + // We can accept data for non-element nodes in modern browsers, + // but we should not, see #8335. + // Always return an empty object. 
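The `camelCase()` helper defined above can be exercised on its own. A self-contained sketch of the same two-step conversion, showing why `-ms-` is rewritten to lowercase `ms-` first (Microsoft's vendor prefix camelizes without a leading capital, per #9572):

```javascript
// Standalone sketch of jQuery's internal camelCase() conversion shown above.
const rmsPrefix = /^-ms-/;
const rdashAlpha = /-([a-z])/g;

function camelCase( string ) {
	return string
		.replace( rmsPrefix, "ms-" )
		.replace( rdashAlpha, ( _all, letter ) => letter.toUpperCase() );
}

console.log( camelCase( "background-color" ) );  // "backgroundColor"
console.log( camelCase( "-ms-transform" ) );     // "msTransform"
console.log( camelCase( "-webkit-transform" ) ); // "WebkitTransform"
```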
+ if ( acceptData( owner ) ) { + + // If it is a node unlikely to be stringify-ed or looped over + // use plain assignment + if ( owner.nodeType ) { + owner[ this.expando ] = value; + + // Otherwise secure it in a non-enumerable property + // configurable must be true to allow the property to be + // deleted when data is removed + } else { + Object.defineProperty( owner, this.expando, { + value: value, + configurable: true + } ); + } + } + } + + return value; + }, + set: function( owner, data, value ) { + var prop, + cache = this.cache( owner ); + + // Handle: [ owner, key, value ] args + // Always use camelCase key (gh-2257) + if ( typeof data === "string" ) { + cache[ camelCase( data ) ] = value; + + // Handle: [ owner, { properties } ] args + } else { + + // Copy the properties one-by-one to the cache object + for ( prop in data ) { + cache[ camelCase( prop ) ] = data[ prop ]; + } + } + return cache; + }, + get: function( owner, key ) { + return key === undefined ? + this.cache( owner ) : + + // Always use camelCase key (gh-2257) + owner[ this.expando ] && owner[ this.expando ][ camelCase( key ) ]; + }, + access: function( owner, key, value ) { + + // In cases where either: + // + // 1. No key was specified + // 2. A string key was specified, but no value provided + // + // Take the "read" path and allow the get method to determine + // which value to return, respectively either: + // + // 1. The entire cache object + // 2. The data stored at the key + // + if ( key === undefined || + ( ( key && typeof key === "string" ) && value === undefined ) ) { + + return this.get( owner, key ); + } + + // When the key is not a string, or both a key and value + // are specified, set or extend (existing objects) with either: + // + // 1. An object of properties + // 2. 
A key and value + // + this.set( owner, key, value ); + + // Since the "set" path can have two possible entry points + // return the expected data based on which path was taken[*] + return value !== undefined ? value : key; + }, + remove: function( owner, key ) { + var i, + cache = owner[ this.expando ]; + + if ( cache === undefined ) { + return; + } + + if ( key !== undefined ) { + + // Support array or space separated string of keys + if ( Array.isArray( key ) ) { + + // If key is an array of keys... + // We always set camelCase keys, so remove that. + key = key.map( camelCase ); + } else { + key = camelCase( key ); + + // If a key with the spaces exists, use it. + // Otherwise, create an array by matching non-whitespace + key = key in cache ? + [ key ] : + ( key.match( rnothtmlwhite ) || [] ); + } + + i = key.length; + + while ( i-- ) { + delete cache[ key[ i ] ]; + } + } + + // Remove the expando if there's no more data + if ( key === undefined || jQuery.isEmptyObject( cache ) ) { + + // Support: Chrome <=35 - 45 + // Webkit & Blink performance suffers when deleting properties + // from DOM nodes, so set to undefined instead + // https://bugs.chromium.org/p/chromium/issues/detail?id=378607 (bug restricted) + if ( owner.nodeType ) { + owner[ this.expando ] = undefined; + } else { + delete owner[ this.expando ]; + } + } + }, + hasData: function( owner ) { + var cache = owner[ this.expando ]; + return cache !== undefined && !jQuery.isEmptyObject( cache ); + } +}; +var dataPriv = new Data(); + +var dataUser = new Data(); + + + +// Implementation Summary +// +// 1. Enforce API surface and semantic compatibility with 1.9.x branch +// 2. Improve the module's maintainability by reducing the storage +// paths to a single mechanism. +// 3. Use the same single mechanism to support "private" and "user" data. +// 4. _Never_ expose "private" data to user code (TODO: Drop _data, _removeData) +// 5. Avoid exposing implementation details on user objects (eg. 
expando properties) +// 6. Provide a clear path for implementation upgrade to WeakMap in 2014 + +var rbrace = /^(?:\{[\w\W]*\}|\[[\w\W]*\])$/, + rmultiDash = /[A-Z]/g; + +function getData( data ) { + if ( data === "true" ) { + return true; + } + + if ( data === "false" ) { + return false; + } + + if ( data === "null" ) { + return null; + } + + // Only convert to a number if it doesn't change the string + if ( data === +data + "" ) { + return +data; + } + + if ( rbrace.test( data ) ) { + return JSON.parse( data ); + } + + return data; +} + +function dataAttr( elem, key, data ) { + var name; + + // If nothing was found internally, try to fetch any + // data from the HTML5 data-* attribute + if ( data === undefined && elem.nodeType === 1 ) { + name = "data-" + key.replace( rmultiDash, "-$&" ).toLowerCase(); + data = elem.getAttribute( name ); + + if ( typeof data === "string" ) { + try { + data = getData( data ); + } catch ( e ) {} + + // Make sure we set the data so it isn't changed later + dataUser.set( elem, key, data ); + } else { + data = undefined; + } + } + return data; +} + +jQuery.extend( { + hasData: function( elem ) { + return dataUser.hasData( elem ) || dataPriv.hasData( elem ); + }, + + data: function( elem, name, data ) { + return dataUser.access( elem, name, data ); + }, + + removeData: function( elem, name ) { + dataUser.remove( elem, name ); + }, + + // TODO: Now that all calls to _data and _removeData have been replaced + // with direct calls to dataPriv methods, these can be deprecated. 
+ _data: function( elem, name, data ) { + return dataPriv.access( elem, name, data ); + }, + + _removeData: function( elem, name ) { + dataPriv.remove( elem, name ); + } +} ); + +jQuery.fn.extend( { + data: function( key, value ) { + var i, name, data, + elem = this[ 0 ], + attrs = elem && elem.attributes; + + // Gets all values + if ( key === undefined ) { + if ( this.length ) { + data = dataUser.get( elem ); + + if ( elem.nodeType === 1 && !dataPriv.get( elem, "hasDataAttrs" ) ) { + i = attrs.length; + while ( i-- ) { + + // Support: IE 11 only + // The attrs elements can be null (#14894) + if ( attrs[ i ] ) { + name = attrs[ i ].name; + if ( name.indexOf( "data-" ) === 0 ) { + name = camelCase( name.slice( 5 ) ); + dataAttr( elem, name, data[ name ] ); + } + } + } + dataPriv.set( elem, "hasDataAttrs", true ); + } + } + + return data; + } + + // Sets multiple values + if ( typeof key === "object" ) { + return this.each( function() { + dataUser.set( this, key ); + } ); + } + + return access( this, function( value ) { + var data; + + // The calling jQuery object (element matches) is not empty + // (and therefore has an element appears at this[ 0 ]) and the + // `value` parameter was not undefined. An empty jQuery object + // will result in `undefined` for elem = this[ 0 ] which will + // throw an exception if an attempt to read a data cache is made. + if ( elem && value === undefined ) { + + // Attempt to get data from the cache + // The key will always be camelCased in Data + data = dataUser.get( elem, key ); + if ( data !== undefined ) { + return data; + } + + // Attempt to "discover" the data in + // HTML5 custom data-* attrs + data = dataAttr( elem, key ); + if ( data !== undefined ) { + return data; + } + + // We tried really hard, but the data doesn't exist. + return; + } + + // Set the data... 
+ this.each( function() { + + // We always store the camelCased key + dataUser.set( this, key, value ); + } ); + }, null, value, arguments.length > 1, null, true ); + }, + + removeData: function( key ) { + return this.each( function() { + dataUser.remove( this, key ); + } ); + } +} ); + + +jQuery.extend( { + queue: function( elem, type, data ) { + var queue; + + if ( elem ) { + type = ( type || "fx" ) + "queue"; + queue = dataPriv.get( elem, type ); + + // Speed up dequeue by getting out quickly if this is just a lookup + if ( data ) { + if ( !queue || Array.isArray( data ) ) { + queue = dataPriv.access( elem, type, jQuery.makeArray( data ) ); + } else { + queue.push( data ); + } + } + return queue || []; + } + }, + + dequeue: function( elem, type ) { + type = type || "fx"; + + var queue = jQuery.queue( elem, type ), + startLength = queue.length, + fn = queue.shift(), + hooks = jQuery._queueHooks( elem, type ), + next = function() { + jQuery.dequeue( elem, type ); + }; + + // If the fx queue is dequeued, always remove the progress sentinel + if ( fn === "inprogress" ) { + fn = queue.shift(); + startLength--; + } + + if ( fn ) { + + // Add a progress sentinel to prevent the fx queue from being + // automatically dequeued + if ( type === "fx" ) { + queue.unshift( "inprogress" ); + } + + // Clear up the last queue stop function + delete hooks.stop; + fn.call( elem, next, hooks ); + } + + if ( !startLength && hooks ) { + hooks.empty.fire(); + } + }, + + // Not public - generate a queueHooks object, or return the current one + _queueHooks: function( elem, type ) { + var key = type + "queueHooks"; + return dataPriv.get( elem, key ) || dataPriv.access( elem, key, { + empty: jQuery.Callbacks( "once memory" ).add( function() { + dataPriv.remove( elem, [ type + "queue", key ] ); + } ) + } ); + } +} ); + +jQuery.fn.extend( { + queue: function( type, data ) { + var setter = 2; + + if ( typeof type !== "string" ) { + data = type; + type = "fx"; + setter--; + } + + if ( 
arguments.length < setter ) { + return jQuery.queue( this[ 0 ], type ); + } + + return data === undefined ? + this : + this.each( function() { + var queue = jQuery.queue( this, type, data ); + + // Ensure a hooks for this queue + jQuery._queueHooks( this, type ); + + if ( type === "fx" && queue[ 0 ] !== "inprogress" ) { + jQuery.dequeue( this, type ); + } + } ); + }, + dequeue: function( type ) { + return this.each( function() { + jQuery.dequeue( this, type ); + } ); + }, + clearQueue: function( type ) { + return this.queue( type || "fx", [] ); + }, + + // Get a promise resolved when queues of a certain type + // are emptied (fx is the type by default) + promise: function( type, obj ) { + var tmp, + count = 1, + defer = jQuery.Deferred(), + elements = this, + i = this.length, + resolve = function() { + if ( !( --count ) ) { + defer.resolveWith( elements, [ elements ] ); + } + }; + + if ( typeof type !== "string" ) { + obj = type; + type = undefined; + } + type = type || "fx"; + + while ( i-- ) { + tmp = dataPriv.get( elements[ i ], type + "queueHooks" ); + if ( tmp && tmp.empty ) { + count++; + tmp.empty.add( resolve ); + } + } + resolve(); + return defer.promise( obj ); + } +} ); +var pnum = ( /[+-]?(?:\d*\.|)\d+(?:[eE][+-]?\d+|)/ ).source; + +var rcssNum = new RegExp( "^(?:([+-])=|)(" + pnum + ")([a-z%]*)$", "i" ); + + +var cssExpand = [ "Top", "Right", "Bottom", "Left" ]; + +var documentElement = document.documentElement; + + + + var isAttached = function( elem ) { + return jQuery.contains( elem.ownerDocument, elem ); + }, + composed = { composed: true }; + + // Support: IE 9 - 11+, Edge 12 - 18+, iOS 10.0 - 10.2 only + // Check attachment across shadow DOM boundaries when possible (gh-3504) + // Support: iOS 10.0-10.2 only + // Early iOS 10 versions support `attachShadow` but not `getRootNode`, + // leading to errors. We need to check for `getRootNode`. 
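The `"inprogress"` sentinel used by the fx queue above can be modeled in miniature. `makeQueue` is a hypothetical helper, not jQuery API: while a job runs, the sentinel sits at `queue[ 0 ]` (which is why `queue()` above checks `queue[ 0 ] !== "inprogress"` before auto-dequeuing), and each job receives a `next` callback that advances the queue.

```javascript
// Tiny standalone model of the fx-queue progress sentinel (assumed helper
// name `makeQueue`; illustrative only).
function makeQueue() {
	const queue = [];

	function dequeue() {
		let fn = queue.shift();

		// Always remove the progress sentinel before picking the next job
		if ( fn === "inprogress" ) {
			fn = queue.shift();
		}
		if ( fn ) {
			queue.unshift( "inprogress" ); // mark a job in flight
			fn( dequeue );                 // job calls next() when done
		}
	}
	return { queue, dequeue };
}

const ran = [];
const { queue, dequeue } = makeQueue();
queue.push( ( next ) => { ran.push( "a" ); next(); } );
queue.push( ( next ) => { ran.push( "b" ); next(); } );
dequeue();
console.log( ran ); // [ "a", "b" ]
```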
+ if ( documentElement.getRootNode ) { + isAttached = function( elem ) { + return jQuery.contains( elem.ownerDocument, elem ) || + elem.getRootNode( composed ) === elem.ownerDocument; + }; + } +var isHiddenWithinTree = function( elem, el ) { + + // isHiddenWithinTree might be called from jQuery#filter function; + // in that case, element will be second argument + elem = el || elem; + + // Inline style trumps all + return elem.style.display === "none" || + elem.style.display === "" && + + // Otherwise, check computed style + // Support: Firefox <=43 - 45 + // Disconnected elements can have computed display: none, so first confirm that elem is + // in the document. + isAttached( elem ) && + + jQuery.css( elem, "display" ) === "none"; + }; + + + +function adjustCSS( elem, prop, valueParts, tween ) { + var adjusted, scale, + maxIterations = 20, + currentValue = tween ? + function() { + return tween.cur(); + } : + function() { + return jQuery.css( elem, prop, "" ); + }, + initial = currentValue(), + unit = valueParts && valueParts[ 3 ] || ( jQuery.cssNumber[ prop ] ? "" : "px" ), + + // Starting value computation is required for potential unit mismatches + initialInUnit = elem.nodeType && + ( jQuery.cssNumber[ prop ] || unit !== "px" && +initial ) && + rcssNum.exec( jQuery.css( elem, prop ) ); + + if ( initialInUnit && initialInUnit[ 3 ] !== unit ) { + + // Support: Firefox <=54 + // Halve the iteration target value to prevent interference from CSS upper bounds (gh-2144) + initial = initial / 2; + + // Trust units reported by jQuery.css + unit = unit || initialInUnit[ 3 ]; + + // Iteratively approximate from a nonzero starting point + initialInUnit = +initial || 1; + + while ( maxIterations-- ) { + + // Evaluate and update our best guess (doubling guesses that zero out). + // Finish if the scale equals or crosses 1 (making the old*new product non-positive). 
+ jQuery.style( elem, prop, initialInUnit + unit ); + if ( ( 1 - scale ) * ( 1 - ( scale = currentValue() / initial || 0.5 ) ) <= 0 ) { + maxIterations = 0; + } + initialInUnit = initialInUnit / scale; + + } + + initialInUnit = initialInUnit * 2; + jQuery.style( elem, prop, initialInUnit + unit ); + + // Make sure we update the tween properties later on + valueParts = valueParts || []; + } + + if ( valueParts ) { + initialInUnit = +initialInUnit || +initial || 0; + + // Apply relative offset (+=/-=) if specified + adjusted = valueParts[ 1 ] ? + initialInUnit + ( valueParts[ 1 ] + 1 ) * valueParts[ 2 ] : + +valueParts[ 2 ]; + if ( tween ) { + tween.unit = unit; + tween.start = initialInUnit; + tween.end = adjusted; + } + } + return adjusted; +} + + +var defaultDisplayMap = {}; + +function getDefaultDisplay( elem ) { + var temp, + doc = elem.ownerDocument, + nodeName = elem.nodeName, + display = defaultDisplayMap[ nodeName ]; + + if ( display ) { + return display; + } + + temp = doc.body.appendChild( doc.createElement( nodeName ) ); + display = jQuery.css( temp, "display" ); + + temp.parentNode.removeChild( temp ); + + if ( display === "none" ) { + display = "block"; + } + defaultDisplayMap[ nodeName ] = display; + + return display; +} + +function showHide( elements, show ) { + var display, elem, + values = [], + index = 0, + length = elements.length; + + // Determine new display value for elements that need to change + for ( ; index < length; index++ ) { + elem = elements[ index ]; + if ( !elem.style ) { + continue; + } + + display = elem.style.display; + if ( show ) { + + // Since we force visibility upon cascade-hidden elements, an immediate (and slow) + // check is required in this first loop unless we have a nonempty display value (either + // inline or about-to-be-restored) + if ( display === "none" ) { + values[ index ] = dataPriv.get( elem, "display" ) || null; + if ( !values[ index ] ) { + elem.style.display = ""; + } + } + if ( elem.style.display === "" && 
isHiddenWithinTree( elem ) ) { + values[ index ] = getDefaultDisplay( elem ); + } + } else { + if ( display !== "none" ) { + values[ index ] = "none"; + + // Remember what we're overwriting + dataPriv.set( elem, "display", display ); + } + } + } + + // Set the display of the elements in a second loop to avoid constant reflow + for ( index = 0; index < length; index++ ) { + if ( values[ index ] != null ) { + elements[ index ].style.display = values[ index ]; + } + } + + return elements; +} + +jQuery.fn.extend( { + show: function() { + return showHide( this, true ); + }, + hide: function() { + return showHide( this ); + }, + toggle: function( state ) { + if ( typeof state === "boolean" ) { + return state ? this.show() : this.hide(); + } + + return this.each( function() { + if ( isHiddenWithinTree( this ) ) { + jQuery( this ).show(); + } else { + jQuery( this ).hide(); + } + } ); + } +} ); +var rcheckableType = ( /^(?:checkbox|radio)$/i ); + +var rtagName = ( /<([a-z][^\/\0>\x20\t\r\n\f]*)/i ); + +var rscriptType = ( /^$|^module$|\/(?:java|ecma)script/i ); + + + +( function() { + var fragment = document.createDocumentFragment(), + div = fragment.appendChild( document.createElement( "div" ) ), + input = document.createElement( "input" ); + + // Support: Android 4.0 - 4.3 only + // Check state lost if the name is set (#11217) + // Support: Windows Web Apps (WWA) + // `name` and `type` must use .setAttribute for WWA (#14901) + input.setAttribute( "type", "radio" ); + input.setAttribute( "checked", "checked" ); + input.setAttribute( "name", "t" ); + + div.appendChild( input ); + + // Support: Android <=4.1 only + // Older WebKit doesn't clone checked state correctly in fragments + support.checkClone = div.cloneNode( true ).cloneNode( true ).lastChild.checked; + + // Support: IE <=11 only + // Make sure textarea (and checkbox) defaultValue is properly cloned + div.innerHTML = "<textarea>x</textarea>"; + support.noCloneChecked = !!div.cloneNode( true ).lastChild.defaultValue; + + // Support: IE
<=9 only + // IE <=9 replaces <option> tags with their contents when inserted outside of + // the select element. + div.innerHTML = "<option></option>"; + support.option = !!div.lastChild; +} )(); + + +// We have to close these tags to support XHTML (#13200) +var wrapMap = { + + // XHTML parsers do not magically insert elements in the + // same way that tag soup parsers do. So we cannot shorten + // this by omitting <tbody> or other required elements. + thead: [ 1, "<table>", "</table>" ],
+ col: [ 2, "<table><colgroup>", "</colgroup></table>" ],
+ tr: [ 2, "<table><tbody>", "</tbody></table>" ],
+ td: [ 3, "<table><tbody><tr>", "</tr></tbody></table>" ],
+ + _default: [ 0, "", "" ] +}; + +wrapMap.tbody = wrapMap.tfoot = wrapMap.colgroup = wrapMap.caption = wrapMap.thead; +wrapMap.th = wrapMap.td; + +// Support: IE <=9 only +if ( !support.option ) { + wrapMap.optgroup = wrapMap.option = [ 1, "<select multiple='multiple'>", "</select>" ]; +} + + +function getAll( context, tag ) { + + // Support: IE <=9 - 11 only + // Use typeof to avoid zero-argument method invocation on host objects (#15151) + var ret; + + if ( typeof context.getElementsByTagName !== "undefined" ) { + ret = context.getElementsByTagName( tag || "*" ); + + } else if ( typeof context.querySelectorAll !== "undefined" ) { + ret = context.querySelectorAll( tag || "*" ); + + } else { + ret = []; + } + + if ( tag === undefined || tag && nodeName( context, tag ) ) { + return jQuery.merge( [ context ], ret ); + } + + return ret; +} + + +// Mark scripts as having already been evaluated +function setGlobalEval( elems, refElements ) { + var i = 0, + l = elems.length; + + for ( ; i < l; i++ ) { + dataPriv.set( + elems[ i ], + "globalEval", + !refElements || dataPriv.get( refElements[ i ], "globalEval" ) + ); + } +} + + +var rhtml = /<|&#?\w+;/; + +function buildFragment( elems, context, scripts, selection, ignored ) { + var elem, tmp, tag, wrap, attached, j, + fragment = context.createDocumentFragment(), + nodes = [], + i = 0, + l = elems.length; + + for ( ; i < l; i++ ) { + elem = elems[ i ]; + + if ( elem || elem === 0 ) { + + // Add nodes directly + if ( toType( elem ) === "object" ) { + + // Support: Android <=4.0 only, PhantomJS 1 only + // push.apply(_, arraylike) throws on ancient WebKit + jQuery.merge( nodes, elem.nodeType ?
[ elem ] : elem ); + + // Convert non-html into a text node + } else if ( !rhtml.test( elem ) ) { + nodes.push( context.createTextNode( elem ) ); + + // Convert html into DOM nodes + } else { + tmp = tmp || fragment.appendChild( context.createElement( "div" ) ); + + // Deserialize a standard representation + tag = ( rtagName.exec( elem ) || [ "", "" ] )[ 1 ].toLowerCase(); + wrap = wrapMap[ tag ] || wrapMap._default; + tmp.innerHTML = wrap[ 1 ] + jQuery.htmlPrefilter( elem ) + wrap[ 2 ]; + + // Descend through wrappers to the right content + j = wrap[ 0 ]; + while ( j-- ) { + tmp = tmp.lastChild; + } + + // Support: Android <=4.0 only, PhantomJS 1 only + // push.apply(_, arraylike) throws on ancient WebKit + jQuery.merge( nodes, tmp.childNodes ); + + // Remember the top-level container + tmp = fragment.firstChild; + + // Ensure the created nodes are orphaned (#12392) + tmp.textContent = ""; + } + } + } + + // Remove wrapper from fragment + fragment.textContent = ""; + + i = 0; + while ( ( elem = nodes[ i++ ] ) ) { + + // Skip elements already in the context collection (trac-4087) + if ( selection && jQuery.inArray( elem, selection ) > -1 ) { + if ( ignored ) { + ignored.push( elem ); + } + continue; + } + + attached = isAttached( elem ); + + // Append to fragment + tmp = getAll( fragment.appendChild( elem ), "script" ); + + // Preserve script evaluation history + if ( attached ) { + setGlobalEval( tmp ); + } + + // Capture executables + if ( scripts ) { + j = 0; + while ( ( elem = tmp[ j++ ] ) ) { + if ( rscriptType.test( elem.type || "" ) ) { + scripts.push( elem ); + } + } + } + } + + return fragment; +} + + +var rtypenamespace = /^([^.]*)(?:\.(.+)|)/; + +function returnTrue() { + return true; +} + +function returnFalse() { + return false; +} + +// Support: IE <=9 - 11+ +// focus() and blur() are asynchronous, except when they are no-op. 
+// So expect focus to be synchronous when the element is already active, +// and blur to be synchronous when the element is not already active. +// (focus and blur are always synchronous in other supported browsers, +// this just defines when we can count on it). +function expectSync( elem, type ) { + return ( elem === safeActiveElement() ) === ( type === "focus" ); +} + +// Support: IE <=9 only +// Accessing document.activeElement can throw unexpectedly +// https://bugs.jquery.com/ticket/13393 +function safeActiveElement() { + try { + return document.activeElement; + } catch ( err ) { } +} + +function on( elem, types, selector, data, fn, one ) { + var origFn, type; + + // Types can be a map of types/handlers + if ( typeof types === "object" ) { + + // ( types-Object, selector, data ) + if ( typeof selector !== "string" ) { + + // ( types-Object, data ) + data = data || selector; + selector = undefined; + } + for ( type in types ) { + on( elem, type, selector, data, types[ type ], one ); + } + return elem; + } + + if ( data == null && fn == null ) { + + // ( types, fn ) + fn = selector; + data = selector = undefined; + } else if ( fn == null ) { + if ( typeof selector === "string" ) { + + // ( types, selector, fn ) + fn = data; + data = undefined; + } else { + + // ( types, data, fn ) + fn = data; + data = selector; + selector = undefined; + } + } + if ( fn === false ) { + fn = returnFalse; + } else if ( !fn ) { + return elem; + } + + if ( one === 1 ) { + origFn = fn; + fn = function( event ) { + + // Can use an empty set, since event contains the info + jQuery().off( event ); + return origFn.apply( this, arguments ); + }; + + // Use same guid so caller can remove using origFn + fn.guid = origFn.guid || ( origFn.guid = jQuery.guid++ ); + } + return elem.each( function() { + jQuery.event.add( this, types, fn, data, selector ); + } ); +} + +/* + * Helper functions for managing events -- not part of the public interface. 
+ * Props to Dean Edwards' addEvent library for many of the ideas. + */ +jQuery.event = { + + global: {}, + + add: function( elem, types, handler, data, selector ) { + + var handleObjIn, eventHandle, tmp, + events, t, handleObj, + special, handlers, type, namespaces, origType, + elemData = dataPriv.get( elem ); + + // Only attach events to objects that accept data + if ( !acceptData( elem ) ) { + return; + } + + // Caller can pass in an object of custom data in lieu of the handler + if ( handler.handler ) { + handleObjIn = handler; + handler = handleObjIn.handler; + selector = handleObjIn.selector; + } + + // Ensure that invalid selectors throw exceptions at attach time + // Evaluate against documentElement in case elem is a non-element node (e.g., document) + if ( selector ) { + jQuery.find.matchesSelector( documentElement, selector ); + } + + // Make sure that the handler has a unique ID, used to find/remove it later + if ( !handler.guid ) { + handler.guid = jQuery.guid++; + } + + // Init the element's event structure and main handler, if this is the first + if ( !( events = elemData.events ) ) { + events = elemData.events = Object.create( null ); + } + if ( !( eventHandle = elemData.handle ) ) { + eventHandle = elemData.handle = function( e ) { + + // Discard the second event of a jQuery.event.trigger() and + // when an event is called after a page has unloaded + return typeof jQuery !== "undefined" && jQuery.event.triggered !== e.type ? + jQuery.event.dispatch.apply( elem, arguments ) : undefined; + }; + } + + // Handle multiple events separated by a space + types = ( types || "" ).match( rnothtmlwhite ) || [ "" ]; + t = types.length; + while ( t-- ) { + tmp = rtypenamespace.exec( types[ t ] ) || []; + type = origType = tmp[ 1 ]; + namespaces = ( tmp[ 2 ] || "" ).split( "." 
).sort(); + + // There *must* be a type, no attaching namespace-only handlers + if ( !type ) { + continue; + } + + // If event changes its type, use the special event handlers for the changed type + special = jQuery.event.special[ type ] || {}; + + // If selector defined, determine special event api type, otherwise given type + type = ( selector ? special.delegateType : special.bindType ) || type; + + // Update special based on newly reset type + special = jQuery.event.special[ type ] || {}; + + // handleObj is passed to all event handlers + handleObj = jQuery.extend( { + type: type, + origType: origType, + data: data, + handler: handler, + guid: handler.guid, + selector: selector, + needsContext: selector && jQuery.expr.match.needsContext.test( selector ), + namespace: namespaces.join( "." ) + }, handleObjIn ); + + // Init the event handler queue if we're the first + if ( !( handlers = events[ type ] ) ) { + handlers = events[ type ] = []; + handlers.delegateCount = 0; + + // Only use addEventListener if the special events handler returns false + if ( !special.setup || + special.setup.call( elem, data, namespaces, eventHandle ) === false ) { + + if ( elem.addEventListener ) { + elem.addEventListener( type, eventHandle ); + } + } + } + + if ( special.add ) { + special.add.call( elem, handleObj ); + + if ( !handleObj.handler.guid ) { + handleObj.handler.guid = handler.guid; + } + } + + // Add to the element's handler list, delegates in front + if ( selector ) { + handlers.splice( handlers.delegateCount++, 0, handleObj ); + } else { + handlers.push( handleObj ); + } + + // Keep track of which events have ever been used, for event optimization + jQuery.event.global[ type ] = true; + } + + }, + + // Detach an event or set of events from an element + remove: function( elem, types, handler, selector, mappedTypes ) { + + var j, origCount, tmp, + events, t, handleObj, + special, handlers, type, namespaces, origType, + elemData = dataPriv.hasData( elem ) && dataPriv.get( 
elem ); + + if ( !elemData || !( events = elemData.events ) ) { + return; + } + + // Once for each type.namespace in types; type may be omitted + types = ( types || "" ).match( rnothtmlwhite ) || [ "" ]; + t = types.length; + while ( t-- ) { + tmp = rtypenamespace.exec( types[ t ] ) || []; + type = origType = tmp[ 1 ]; + namespaces = ( tmp[ 2 ] || "" ).split( "." ).sort(); + + // Unbind all events (on this namespace, if provided) for the element + if ( !type ) { + for ( type in events ) { + jQuery.event.remove( elem, type + types[ t ], handler, selector, true ); + } + continue; + } + + special = jQuery.event.special[ type ] || {}; + type = ( selector ? special.delegateType : special.bindType ) || type; + handlers = events[ type ] || []; + tmp = tmp[ 2 ] && + new RegExp( "(^|\\.)" + namespaces.join( "\\.(?:.*\\.|)" ) + "(\\.|$)" ); + + // Remove matching events + origCount = j = handlers.length; + while ( j-- ) { + handleObj = handlers[ j ]; + + if ( ( mappedTypes || origType === handleObj.origType ) && + ( !handler || handler.guid === handleObj.guid ) && + ( !tmp || tmp.test( handleObj.namespace ) ) && + ( !selector || selector === handleObj.selector || + selector === "**" && handleObj.selector ) ) { + handlers.splice( j, 1 ); + + if ( handleObj.selector ) { + handlers.delegateCount--; + } + if ( special.remove ) { + special.remove.call( elem, handleObj ); + } + } + } + + // Remove generic event handler if we removed something and no more handlers exist + // (avoids potential for endless recursion during removal of special event handlers) + if ( origCount && !handlers.length ) { + if ( !special.teardown || + special.teardown.call( elem, namespaces, elemData.handle ) === false ) { + + jQuery.removeEvent( elem, type, elemData.handle ); + } + + delete events[ type ]; + } + } + + // Remove data and the expando if it's no longer used + if ( jQuery.isEmptyObject( events ) ) { + dataPriv.remove( elem, "handle events" ); + } + }, + + dispatch: function( nativeEvent ) { + + 
var i, j, ret, matched, handleObj, handlerQueue, + args = new Array( arguments.length ), + + // Make a writable jQuery.Event from the native event object + event = jQuery.event.fix( nativeEvent ), + + handlers = ( + dataPriv.get( this, "events" ) || Object.create( null ) + )[ event.type ] || [], + special = jQuery.event.special[ event.type ] || {}; + + // Use the fix-ed jQuery.Event rather than the (read-only) native event + args[ 0 ] = event; + + for ( i = 1; i < arguments.length; i++ ) { + args[ i ] = arguments[ i ]; + } + + event.delegateTarget = this; + + // Call the preDispatch hook for the mapped type, and let it bail if desired + if ( special.preDispatch && special.preDispatch.call( this, event ) === false ) { + return; + } + + // Determine handlers + handlerQueue = jQuery.event.handlers.call( this, event, handlers ); + + // Run delegates first; they may want to stop propagation beneath us + i = 0; + while ( ( matched = handlerQueue[ i++ ] ) && !event.isPropagationStopped() ) { + event.currentTarget = matched.elem; + + j = 0; + while ( ( handleObj = matched.handlers[ j++ ] ) && + !event.isImmediatePropagationStopped() ) { + + // If the event is namespaced, then each handler is only invoked if it is + // specially universal or its namespaces are a superset of the event's. 
+ if ( !event.rnamespace || handleObj.namespace === false || + event.rnamespace.test( handleObj.namespace ) ) { + + event.handleObj = handleObj; + event.data = handleObj.data; + + ret = ( ( jQuery.event.special[ handleObj.origType ] || {} ).handle || + handleObj.handler ).apply( matched.elem, args ); + + if ( ret !== undefined ) { + if ( ( event.result = ret ) === false ) { + event.preventDefault(); + event.stopPropagation(); + } + } + } + } + } + + // Call the postDispatch hook for the mapped type + if ( special.postDispatch ) { + special.postDispatch.call( this, event ); + } + + return event.result; + }, + + handlers: function( event, handlers ) { + var i, handleObj, sel, matchedHandlers, matchedSelectors, + handlerQueue = [], + delegateCount = handlers.delegateCount, + cur = event.target; + + // Find delegate handlers + if ( delegateCount && + + // Support: IE <=9 + // Black-hole SVG instance trees (trac-13180) + cur.nodeType && + + // Support: Firefox <=42 + // Suppress spec-violating clicks indicating a non-primary pointer button (trac-3861) + // https://www.w3.org/TR/DOM-Level-3-Events/#event-type-click + // Support: IE 11 only + // ...but not arrow key "clicks" of radio inputs, which can have `button` -1 (gh-2343) + !( event.type === "click" && event.button >= 1 ) ) { + + for ( ; cur !== this; cur = cur.parentNode || this ) { + + // Don't check non-elements (#13208) + // Don't process clicks on disabled elements (#6911, #8165, #11382, #11764) + if ( cur.nodeType === 1 && !( event.type === "click" && cur.disabled === true ) ) { + matchedHandlers = []; + matchedSelectors = {}; + for ( i = 0; i < delegateCount; i++ ) { + handleObj = handlers[ i ]; + + // Don't conflict with Object.prototype properties (#13203) + sel = handleObj.selector + " "; + + if ( matchedSelectors[ sel ] === undefined ) { + matchedSelectors[ sel ] = handleObj.needsContext ? 
+ jQuery( sel, this ).index( cur ) > -1 : + jQuery.find( sel, this, null, [ cur ] ).length; + } + if ( matchedSelectors[ sel ] ) { + matchedHandlers.push( handleObj ); + } + } + if ( matchedHandlers.length ) { + handlerQueue.push( { elem: cur, handlers: matchedHandlers } ); + } + } + } + } + + // Add the remaining (directly-bound) handlers + cur = this; + if ( delegateCount < handlers.length ) { + handlerQueue.push( { elem: cur, handlers: handlers.slice( delegateCount ) } ); + } + + return handlerQueue; + }, + + addProp: function( name, hook ) { + Object.defineProperty( jQuery.Event.prototype, name, { + enumerable: true, + configurable: true, + + get: isFunction( hook ) ? + function() { + if ( this.originalEvent ) { + return hook( this.originalEvent ); + } + } : + function() { + if ( this.originalEvent ) { + return this.originalEvent[ name ]; + } + }, + + set: function( value ) { + Object.defineProperty( this, name, { + enumerable: true, + configurable: true, + writable: true, + value: value + } ); + } + } ); + }, + + fix: function( originalEvent ) { + return originalEvent[ jQuery.expando ] ? + originalEvent : + new jQuery.Event( originalEvent ); + }, + + special: { + load: { + + // Prevent triggered image.load events from bubbling to window.load + noBubble: true + }, + click: { + + // Utilize native event to ensure correct state for checkable inputs + setup: function( data ) { + + // For mutual compressibility with _default, replace `this` access with a local var. + // `|| data` is dead code meant only to preserve the variable through minification. + var el = this || data; + + // Claim the first handler + if ( rcheckableType.test( el.type ) && + el.click && nodeName( el, "input" ) ) { + + // dataPriv.set( el, "click", ... 
) + leverageNative( el, "click", returnTrue ); + } + + // Return false to allow normal processing in the caller + return false; + }, + trigger: function( data ) { + + // For mutual compressibility with _default, replace `this` access with a local var. + // `|| data` is dead code meant only to preserve the variable through minification. + var el = this || data; + + // Force setup before triggering a click + if ( rcheckableType.test( el.type ) && + el.click && nodeName( el, "input" ) ) { + + leverageNative( el, "click" ); + } + + // Return non-false to allow normal event-path propagation + return true; + }, + + // For cross-browser consistency, suppress native .click() on links + // Also prevent it if we're currently inside a leveraged native-event stack + _default: function( event ) { + var target = event.target; + return rcheckableType.test( target.type ) && + target.click && nodeName( target, "input" ) && + dataPriv.get( target, "click" ) || + nodeName( target, "a" ); + } + }, + + beforeunload: { + postDispatch: function( event ) { + + // Support: Firefox 20+ + // Firefox doesn't alert if the returnValue field is not set. + if ( event.result !== undefined && event.originalEvent ) { + event.originalEvent.returnValue = event.result; + } + } + } + } +}; + +// Ensure the presence of an event listener that handles manually-triggered +// synthetic events by interrupting progress until reinvoked in response to +// *native* events that it fires directly, ensuring that state changes have +// already occurred before other listeners are invoked. 
+function leverageNative( el, type, expectSync ) { + + // Missing expectSync indicates a trigger call, which must force setup through jQuery.event.add + if ( !expectSync ) { + if ( dataPriv.get( el, type ) === undefined ) { + jQuery.event.add( el, type, returnTrue ); + } + return; + } + + // Register the controller as a special universal handler for all event namespaces + dataPriv.set( el, type, false ); + jQuery.event.add( el, type, { + namespace: false, + handler: function( event ) { + var notAsync, result, + saved = dataPriv.get( this, type ); + + if ( ( event.isTrigger & 1 ) && this[ type ] ) { + + // Interrupt processing of the outer synthetic .trigger()ed event + // Saved data should be false in such cases, but might be a leftover capture object + // from an async native handler (gh-4350) + if ( !saved.length ) { + + // Store arguments for use when handling the inner native event + // There will always be at least one argument (an event object), so this array + // will not be confused with a leftover capture object. + saved = slice.call( arguments ); + dataPriv.set( this, type, saved ); + + // Trigger the native event and capture its result + // Support: IE <=9 - 11+ + // focus() and blur() are asynchronous + notAsync = expectSync( this, type ); + this[ type ](); + result = dataPriv.get( this, type ); + if ( saved !== result || notAsync ) { + dataPriv.set( this, type, false ); + } else { + result = {}; + } + if ( saved !== result ) { + + // Cancel the outer synthetic event + event.stopImmediatePropagation(); + event.preventDefault(); + + // Support: Chrome 86+ + // In Chrome, if an element having a focusout handler is blurred by + // clicking outside of it, it invokes the handler synchronously. If + // that handler calls `.remove()` on the element, the data is cleared, + // leaving `result` undefined. We need to guard against this. 
+ return result && result.value; + } + + // If this is an inner synthetic event for an event with a bubbling surrogate + // (focus or blur), assume that the surrogate already propagated from triggering the + // native event and prevent that from happening again here. + // This technically gets the ordering wrong w.r.t. to `.trigger()` (in which the + // bubbling surrogate propagates *after* the non-bubbling base), but that seems + // less bad than duplication. + } else if ( ( jQuery.event.special[ type ] || {} ).delegateType ) { + event.stopPropagation(); + } + + // If this is a native event triggered above, everything is now in order + // Fire an inner synthetic event with the original arguments + } else if ( saved.length ) { + + // ...and capture the result + dataPriv.set( this, type, { + value: jQuery.event.trigger( + + // Support: IE <=9 - 11+ + // Extend with the prototype to reset the above stopImmediatePropagation() + jQuery.extend( saved[ 0 ], jQuery.Event.prototype ), + saved.slice( 1 ), + this + ) + } ); + + // Abort handling of the native event + event.stopImmediatePropagation(); + } + } + } ); +} + +jQuery.removeEvent = function( elem, type, handle ) { + + // This "if" is needed for plain objects + if ( elem.removeEventListener ) { + elem.removeEventListener( type, handle ); + } +}; + +jQuery.Event = function( src, props ) { + + // Allow instantiation without the 'new' keyword + if ( !( this instanceof jQuery.Event ) ) { + return new jQuery.Event( src, props ); + } + + // Event object + if ( src && src.type ) { + this.originalEvent = src; + this.type = src.type; + + // Events bubbling up the document may have been marked as prevented + // by a handler lower down the tree; reflect the correct value. + this.isDefaultPrevented = src.defaultPrevented || + src.defaultPrevented === undefined && + + // Support: Android <=2.3 only + src.returnValue === false ? 
+ returnTrue : + returnFalse; + + // Create target properties + // Support: Safari <=6 - 7 only + // Target should not be a text node (#504, #13143) + this.target = ( src.target && src.target.nodeType === 3 ) ? + src.target.parentNode : + src.target; + + this.currentTarget = src.currentTarget; + this.relatedTarget = src.relatedTarget; + + // Event type + } else { + this.type = src; + } + + // Put explicitly provided properties onto the event object + if ( props ) { + jQuery.extend( this, props ); + } + + // Create a timestamp if incoming event doesn't have one + this.timeStamp = src && src.timeStamp || Date.now(); + + // Mark it as fixed + this[ jQuery.expando ] = true; +}; + +// jQuery.Event is based on DOM3 Events as specified by the ECMAScript Language Binding +// https://www.w3.org/TR/2003/WD-DOM-Level-3-Events-20030331/ecma-script-binding.html +jQuery.Event.prototype = { + constructor: jQuery.Event, + isDefaultPrevented: returnFalse, + isPropagationStopped: returnFalse, + isImmediatePropagationStopped: returnFalse, + isSimulated: false, + + preventDefault: function() { + var e = this.originalEvent; + + this.isDefaultPrevented = returnTrue; + + if ( e && !this.isSimulated ) { + e.preventDefault(); + } + }, + stopPropagation: function() { + var e = this.originalEvent; + + this.isPropagationStopped = returnTrue; + + if ( e && !this.isSimulated ) { + e.stopPropagation(); + } + }, + stopImmediatePropagation: function() { + var e = this.originalEvent; + + this.isImmediatePropagationStopped = returnTrue; + + if ( e && !this.isSimulated ) { + e.stopImmediatePropagation(); + } + + this.stopPropagation(); + } +}; + +// Includes all common event props including KeyEvent and MouseEvent specific props +jQuery.each( { + altKey: true, + bubbles: true, + cancelable: true, + changedTouches: true, + ctrlKey: true, + detail: true, + eventPhase: true, + metaKey: true, + pageX: true, + pageY: true, + shiftKey: true, + view: true, + "char": true, + code: true, + charCode: true, + 
key: true, + keyCode: true, + button: true, + buttons: true, + clientX: true, + clientY: true, + offsetX: true, + offsetY: true, + pointerId: true, + pointerType: true, + screenX: true, + screenY: true, + targetTouches: true, + toElement: true, + touches: true, + which: true +}, jQuery.event.addProp ); + +jQuery.each( { focus: "focusin", blur: "focusout" }, function( type, delegateType ) { + jQuery.event.special[ type ] = { + + // Utilize native event if possible so blur/focus sequence is correct + setup: function() { + + // Claim the first handler + // dataPriv.set( this, "focus", ... ) + // dataPriv.set( this, "blur", ... ) + leverageNative( this, type, expectSync ); + + // Return false to allow normal processing in the caller + return false; + }, + trigger: function() { + + // Force setup before trigger + leverageNative( this, type ); + + // Return non-false to allow normal event-path propagation + return true; + }, + + // Suppress native focus or blur as it's already being fired + // in leverageNative. + _default: function() { + return true; + }, + + delegateType: delegateType + }; +} ); + +// Create mouseenter/leave events using mouseover/out and event-time checks +// so that event delegation works in jQuery. +// Do the same for pointerenter/pointerleave and pointerover/pointerout +// +// Support: Safari 7 only +// Safari sends mouseenter too often; see: +// https://bugs.chromium.org/p/chromium/issues/detail?id=470258 +// for the description of the bug (it existed in older Chrome versions as well). +jQuery.each( { + mouseenter: "mouseover", + mouseleave: "mouseout", + pointerenter: "pointerover", + pointerleave: "pointerout" +}, function( orig, fix ) { + jQuery.event.special[ orig ] = { + delegateType: fix, + bindType: fix, + + handle: function( event ) { + var ret, + target = this, + related = event.relatedTarget, + handleObj = event.handleObj; + + // For mouseenter/leave call the handler if related is outside the target. 
+ // NB: No relatedTarget if the mouse left/entered the browser window + if ( !related || ( related !== target && !jQuery.contains( target, related ) ) ) { + event.type = handleObj.origType; + ret = handleObj.handler.apply( this, arguments ); + event.type = fix; + } + return ret; + } + }; +} ); + +jQuery.fn.extend( { + + on: function( types, selector, data, fn ) { + return on( this, types, selector, data, fn ); + }, + one: function( types, selector, data, fn ) { + return on( this, types, selector, data, fn, 1 ); + }, + off: function( types, selector, fn ) { + var handleObj, type; + if ( types && types.preventDefault && types.handleObj ) { + + // ( event ) dispatched jQuery.Event + handleObj = types.handleObj; + jQuery( types.delegateTarget ).off( + handleObj.namespace ? + handleObj.origType + "." + handleObj.namespace : + handleObj.origType, + handleObj.selector, + handleObj.handler + ); + return this; + } + if ( typeof types === "object" ) { + + // ( types-object [, selector] ) + for ( type in types ) { + this.off( type, selector, types[ type ] ); + } + return this; + } + if ( selector === false || typeof selector === "function" ) { + + // ( types [, fn] ) + fn = selector; + selector = undefined; + } + if ( fn === false ) { + fn = returnFalse; + } + return this.each( function() { + jQuery.event.remove( this, types, fn, selector ); + } ); + } +} );
+ + +var + + // Support: IE <=10 - 11, Edge 12 - 13 only + // In IE/Edge using regex groups here causes severe slowdowns. + // See https://connect.microsoft.com/IE/feedback/details/1736512/ + rnoInnerhtml = /<script|<style|<link/i, + + // checked="checked" or checked + rchecked = /checked\s*(?:[^=]|=\s*.checked.)/i, + rcleanScript = /^\s*<!(?:\[CDATA\[|--)|(?:\]\]|--)>\s*$/g;
+ +// Prefer a tbody over its parent table for containing new rows +function manipulationTarget( elem, content ) { + if ( nodeName( elem, "table" ) && + nodeName( content.nodeType !== 11 ?
content : content.firstChild, "tr" ) ) { + + return jQuery( elem ).children( "tbody" )[ 0 ] || elem; + } + + return elem; +} + +// Replace/restore the type attribute of script elements for safe DOM manipulation +function disableScript( elem ) { + elem.type = ( elem.getAttribute( "type" ) !== null ) + "/" + elem.type; + return elem; +} +function restoreScript( elem ) { + if ( ( elem.type || "" ).slice( 0, 5 ) === "true/" ) { + elem.type = elem.type.slice( 5 ); + } else { + elem.removeAttribute( "type" ); + } + + return elem; +} + +function cloneCopyEvent( src, dest ) { + var i, l, type, pdataOld, udataOld, udataCur, events; + + if ( dest.nodeType !== 1 ) { + return; + } + + // 1. Copy private data: events, handlers, etc. + if ( dataPriv.hasData( src ) ) { + pdataOld = dataPriv.get( src ); + events = pdataOld.events; + + if ( events ) { + dataPriv.remove( dest, "handle events" ); + + for ( type in events ) { + for ( i = 0, l = events[ type ].length; i < l; i++ ) { + jQuery.event.add( dest, type, events[ type ][ i ] ); + } + } + } + } + + // 2. Copy user data + if ( dataUser.hasData( src ) ) { + udataOld = dataUser.access( src ); + udataCur = jQuery.extend( {}, udataOld ); + + dataUser.set( dest, udataCur ); + } +} + +// Fix IE bugs, see support tests +function fixInput( src, dest ) { + var nodeName = dest.nodeName.toLowerCase(); + + // Fails to persist the checked state of a cloned checkbox or radio button. 
+ if ( nodeName === "input" && rcheckableType.test( src.type ) ) { + dest.checked = src.checked; + + // Fails to return the selected option to the default selected state when cloning options + } else if ( nodeName === "input" || nodeName === "textarea" ) { + dest.defaultValue = src.defaultValue; + } +} + +function domManip( collection, args, callback, ignored ) { + + // Flatten any nested arrays + args = flat( args ); + + var fragment, first, scripts, hasScripts, node, doc, + i = 0, + l = collection.length, + iNoClone = l - 1, + value = args[ 0 ], + valueIsFunction = isFunction( value ); + + // We can't cloneNode fragments that contain checked, in WebKit + if ( valueIsFunction || + ( l > 1 && typeof value === "string" && + !support.checkClone && rchecked.test( value ) ) ) { + return collection.each( function( index ) { + var self = collection.eq( index ); + if ( valueIsFunction ) { + args[ 0 ] = value.call( this, index, self.html() ); + } + domManip( self, args, callback, ignored ); + } ); + } + + if ( l ) { + fragment = buildFragment( args, collection[ 0 ].ownerDocument, false, collection, ignored ); + first = fragment.firstChild; + + if ( fragment.childNodes.length === 1 ) { + fragment = first; + } + + // Require either new content or an interest in ignored elements to invoke the callback + if ( first || ignored ) { + scripts = jQuery.map( getAll( fragment, "script" ), disableScript ); + hasScripts = scripts.length; + + // Use the original fragment for the last item + // instead of the first because it can end up + // being emptied incorrectly in certain situations (#8070). 
+ for ( ; i < l; i++ ) { + node = fragment; + + if ( i !== iNoClone ) { + node = jQuery.clone( node, true, true ); + + // Keep references to cloned scripts for later restoration + if ( hasScripts ) { + + // Support: Android <=4.0 only, PhantomJS 1 only + // push.apply(_, arraylike) throws on ancient WebKit + jQuery.merge( scripts, getAll( node, "script" ) ); + } + } + + callback.call( collection[ i ], node, i ); + } + + if ( hasScripts ) { + doc = scripts[ scripts.length - 1 ].ownerDocument; + + // Reenable scripts + jQuery.map( scripts, restoreScript ); + + // Evaluate executable scripts on first document insertion + for ( i = 0; i < hasScripts; i++ ) { + node = scripts[ i ]; + if ( rscriptType.test( node.type || "" ) && + !dataPriv.access( node, "globalEval" ) && + jQuery.contains( doc, node ) ) { + + if ( node.src && ( node.type || "" ).toLowerCase() !== "module" ) { + + // Optional AJAX dependency, but won't run scripts if not present + if ( jQuery._evalUrl && !node.noModule ) { + jQuery._evalUrl( node.src, { + nonce: node.nonce || node.getAttribute( "nonce" ) + }, doc ); + } + } else { + DOMEval( node.textContent.replace( rcleanScript, "" ), node, doc ); + } + } + } + } + } + } + + return collection; +} + +function remove( elem, selector, keepData ) { + var node, + nodes = selector ? 
jQuery.filter( selector, elem ) : elem, + i = 0; + + for ( ; ( node = nodes[ i ] ) != null; i++ ) { + if ( !keepData && node.nodeType === 1 ) { + jQuery.cleanData( getAll( node ) ); + } + + if ( node.parentNode ) { + if ( keepData && isAttached( node ) ) { + setGlobalEval( getAll( node, "script" ) ); + } + node.parentNode.removeChild( node ); + } + } + + return elem; +} + +jQuery.extend( { + htmlPrefilter: function( html ) { + return html; + }, + + clone: function( elem, dataAndEvents, deepDataAndEvents ) { + var i, l, srcElements, destElements, + clone = elem.cloneNode( true ), + inPage = isAttached( elem ); + + // Fix IE cloning issues + if ( !support.noCloneChecked && ( elem.nodeType === 1 || elem.nodeType === 11 ) && + !jQuery.isXMLDoc( elem ) ) { + + // We eschew Sizzle here for performance reasons: https://jsperf.com/getall-vs-sizzle/2 + destElements = getAll( clone ); + srcElements = getAll( elem ); + + for ( i = 0, l = srcElements.length; i < l; i++ ) { + fixInput( srcElements[ i ], destElements[ i ] ); + } + } + + // Copy the events from the original to the clone + if ( dataAndEvents ) { + if ( deepDataAndEvents ) { + srcElements = srcElements || getAll( elem ); + destElements = destElements || getAll( clone ); + + for ( i = 0, l = srcElements.length; i < l; i++ ) { + cloneCopyEvent( srcElements[ i ], destElements[ i ] ); + } + } else { + cloneCopyEvent( elem, clone ); + } + } + + // Preserve script evaluation history + destElements = getAll( clone, "script" ); + if ( destElements.length > 0 ) { + setGlobalEval( destElements, !inPage && getAll( elem, "script" ) ); + } + + // Return the cloned set + return clone; + }, + + cleanData: function( elems ) { + var data, elem, type, + special = jQuery.event.special, + i = 0; + + for ( ; ( elem = elems[ i ] ) !== undefined; i++ ) { + if ( acceptData( elem ) ) { + if ( ( data = elem[ dataPriv.expando ] ) ) { + if ( data.events ) { + for ( type in data.events ) { + if ( special[ type ] ) { + jQuery.event.remove( 
elem, type ); + + // This is a shortcut to avoid jQuery.event.remove's overhead + } else { + jQuery.removeEvent( elem, type, data.handle ); + } + } + } + + // Support: Chrome <=35 - 45+ + // Assign undefined instead of using delete, see Data#remove + elem[ dataPriv.expando ] = undefined; + } + if ( elem[ dataUser.expando ] ) { + + // Support: Chrome <=35 - 45+ + // Assign undefined instead of using delete, see Data#remove + elem[ dataUser.expando ] = undefined; + } + } + } + } +} ); + +jQuery.fn.extend( { + detach: function( selector ) { + return remove( this, selector, true ); + }, + + remove: function( selector ) { + return remove( this, selector ); + }, + + text: function( value ) { + return access( this, function( value ) { + return value === undefined ? + jQuery.text( this ) : + this.empty().each( function() { + if ( this.nodeType === 1 || this.nodeType === 11 || this.nodeType === 9 ) { + this.textContent = value; + } + } ); + }, null, value, arguments.length ); + }, + + append: function() { + return domManip( this, arguments, function( elem ) { + if ( this.nodeType === 1 || this.nodeType === 11 || this.nodeType === 9 ) { + var target = manipulationTarget( this, elem ); + target.appendChild( elem ); + } + } ); + }, + + prepend: function() { + return domManip( this, arguments, function( elem ) { + if ( this.nodeType === 1 || this.nodeType === 11 || this.nodeType === 9 ) { + var target = manipulationTarget( this, elem ); + target.insertBefore( elem, target.firstChild ); + } + } ); + }, + + before: function() { + return domManip( this, arguments, function( elem ) { + if ( this.parentNode ) { + this.parentNode.insertBefore( elem, this ); + } + } ); + }, + + after: function() { + return domManip( this, arguments, function( elem ) { + if ( this.parentNode ) { + this.parentNode.insertBefore( elem, this.nextSibling ); + } + } ); + }, + + empty: function() { + var elem, + i = 0; + + for ( ; ( elem = this[ i ] ) != null; i++ ) { + if ( elem.nodeType === 1 ) { + + // 
Prevent memory leaks + jQuery.cleanData( getAll( elem, false ) ); + + // Remove any remaining nodes + elem.textContent = ""; + } + } + + return this; + }, + + clone: function( dataAndEvents, deepDataAndEvents ) { + dataAndEvents = dataAndEvents == null ? false : dataAndEvents; + deepDataAndEvents = deepDataAndEvents == null ? dataAndEvents : deepDataAndEvents; + + return this.map( function() { + return jQuery.clone( this, dataAndEvents, deepDataAndEvents ); + } ); + }, + + html: function( value ) { + return access( this, function( value ) { + var elem = this[ 0 ] || {}, + i = 0, + l = this.length; + + if ( value === undefined && elem.nodeType === 1 ) { + return elem.innerHTML; + } + + // See if we can take a shortcut and just use innerHTML + if ( typeof value === "string" && !rnoInnerhtml.test( value ) && + !wrapMap[ ( rtagName.exec( value ) || [ "", "" ] )[ 1 ].toLowerCase() ] ) { + + value = jQuery.htmlPrefilter( value ); + + try { + for ( ; i < l; i++ ) { + elem = this[ i ] || {}; + + // Remove element nodes and prevent memory leaks + if ( elem.nodeType === 1 ) { + jQuery.cleanData( getAll( elem, false ) ); + elem.innerHTML = value; + } + } + + elem = 0; + + // If using innerHTML throws an exception, use the fallback method + } catch ( e ) {} + } + + if ( elem ) { + this.empty().append( value ); + } + }, null, value, arguments.length ); + }, + + replaceWith: function() { + var ignored = []; + + // Make the changes, replacing each non-ignored context element with the new content + return domManip( this, arguments, function( elem ) { + var parent = this.parentNode; + + if ( jQuery.inArray( this, ignored ) < 0 ) { + jQuery.cleanData( getAll( this ) ); + if ( parent ) { + parent.replaceChild( elem, this ); + } + } + + // Force callback invocation + }, ignored ); + } +} ); + +jQuery.each( { + appendTo: "append", + prependTo: "prepend", + insertBefore: "before", + insertAfter: "after", + replaceAll: "replaceWith" +}, function( name, original ) { + jQuery.fn[ name ] = 
function( selector ) { + var elems, + ret = [], + insert = jQuery( selector ), + last = insert.length - 1, + i = 0; + + for ( ; i <= last; i++ ) { + elems = i === last ? this : this.clone( true ); + jQuery( insert[ i ] )[ original ]( elems ); + + // Support: Android <=4.0 only, PhantomJS 1 only + // .get() because push.apply(_, arraylike) throws on ancient WebKit + push.apply( ret, elems.get() ); + } + + return this.pushStack( ret ); + }; +} ); +var rnumnonpx = new RegExp( "^(" + pnum + ")(?!px)[a-z%]+$", "i" ); + +var getStyles = function( elem ) { + + // Support: IE <=11 only, Firefox <=30 (#15098, #14150) + // IE throws on elements created in popups + // FF meanwhile throws on frame elements through "defaultView.getComputedStyle" + var view = elem.ownerDocument.defaultView; + + if ( !view || !view.opener ) { + view = window; + } + + return view.getComputedStyle( elem ); + }; + +var swap = function( elem, options, callback ) { + var ret, name, + old = {}; + + // Remember the old values, and insert the new ones + for ( name in options ) { + old[ name ] = elem.style[ name ]; + elem.style[ name ] = options[ name ]; + } + + ret = callback.call( elem ); + + // Revert the old values + for ( name in options ) { + elem.style[ name ] = old[ name ]; + } + + return ret; +}; + + +var rboxStyle = new RegExp( cssExpand.join( "|" ), "i" ); + + + +( function() { + + // Executing both pixelPosition & boxSizingReliable tests require only one layout + // so they're executed at the same time to save the second computation. 
+ function computeStyleTests() { + + // This is a singleton, we need to execute it only once + if ( !div ) { + return; + } + + container.style.cssText = "position:absolute;left:-11111px;width:60px;" + + "margin-top:1px;padding:0;border:0"; + div.style.cssText = + "position:relative;display:block;box-sizing:border-box;overflow:scroll;" + + "margin:auto;border:1px;padding:1px;" + + "width:60%;top:1%"; + documentElement.appendChild( container ).appendChild( div ); + + var divStyle = window.getComputedStyle( div ); + pixelPositionVal = divStyle.top !== "1%"; + + // Support: Android 4.0 - 4.3 only, Firefox <=3 - 44 + reliableMarginLeftVal = roundPixelMeasures( divStyle.marginLeft ) === 12; + + // Support: Android 4.0 - 4.3 only, Safari <=9.1 - 10.1, iOS <=7.0 - 9.3 + // Some styles come back with percentage values, even though they shouldn't + div.style.right = "60%"; + pixelBoxStylesVal = roundPixelMeasures( divStyle.right ) === 36; + + // Support: IE 9 - 11 only + // Detect misreporting of content dimensions for box-sizing:border-box elements + boxSizingReliableVal = roundPixelMeasures( divStyle.width ) === 36; + + // Support: IE 9 only + // Detect overflow:scroll screwiness (gh-3699) + // Support: Chrome <=64 + // Don't get tricked when zoom affects offsetWidth (gh-4029) + div.style.position = "absolute"; + scrollboxSizeVal = roundPixelMeasures( div.offsetWidth / 3 ) === 12; + + documentElement.removeChild( container ); + + // Nullify the div so it wouldn't be stored in the memory and + // it will also be a sign that checks already performed + div = null; + } + + function roundPixelMeasures( measure ) { + return Math.round( parseFloat( measure ) ); + } + + var pixelPositionVal, boxSizingReliableVal, scrollboxSizeVal, pixelBoxStylesVal, + reliableTrDimensionsVal, reliableMarginLeftVal, + container = document.createElement( "div" ), + div = document.createElement( "div" ); + + // Finish early in limited (non-browser) environments + if ( !div.style ) { + return; + } + + 
// Support: IE <=9 - 11 only + // Style of cloned element affects source element cloned (#8908) + div.style.backgroundClip = "content-box"; + div.cloneNode( true ).style.backgroundClip = ""; + support.clearCloneStyle = div.style.backgroundClip === "content-box"; + + jQuery.extend( support, { + boxSizingReliable: function() { + computeStyleTests(); + return boxSizingReliableVal; + }, + pixelBoxStyles: function() { + computeStyleTests(); + return pixelBoxStylesVal; + }, + pixelPosition: function() { + computeStyleTests(); + return pixelPositionVal; + }, + reliableMarginLeft: function() { + computeStyleTests(); + return reliableMarginLeftVal; + }, + scrollboxSize: function() { + computeStyleTests(); + return scrollboxSizeVal; + }, + + // Support: IE 9 - 11+, Edge 15 - 18+ + // IE/Edge misreport `getComputedStyle` of table rows with width/height + // set in CSS while `offset*` properties report correct values. + // Behavior in IE 9 is more subtle than in newer versions & it passes + // some versions of this test; make sure not to make it pass there! + // + // Support: Firefox 70+ + // Only Firefox includes border widths + // in computed dimensions. (gh-4529) + reliableTrDimensions: function() { + var table, tr, trChild, trStyle; + if ( reliableTrDimensionsVal == null ) { + table = document.createElement( "table" ); + tr = document.createElement( "tr" ); + trChild = document.createElement( "div" ); + + table.style.cssText = "position:absolute;left:-11111px;border-collapse:separate"; + tr.style.cssText = "border:1px solid"; + + // Support: Chrome 86+ + // Height set through cssText does not get applied. + // Computed height then comes back as 0. + tr.style.height = "1px"; + trChild.style.height = "9px"; + + // Support: Android 8 Chrome 86+ + // In our bodyBackground.html iframe, + // display for all div elements is set to "inline", + // which causes a problem only in Android 8 Chrome 86. + // Ensuring the div is display: block + // gets around this issue. 
+ trChild.style.display = "block"; + + documentElement + .appendChild( table ) + .appendChild( tr ) + .appendChild( trChild ); + + trStyle = window.getComputedStyle( tr ); + reliableTrDimensionsVal = ( parseInt( trStyle.height, 10 ) + + parseInt( trStyle.borderTopWidth, 10 ) + + parseInt( trStyle.borderBottomWidth, 10 ) ) === tr.offsetHeight; + + documentElement.removeChild( table ); + } + return reliableTrDimensionsVal; + } + } ); +} )(); + + +function curCSS( elem, name, computed ) { + var width, minWidth, maxWidth, ret, + + // Support: Firefox 51+ + // Retrieving style before computed somehow + // fixes an issue with getting wrong values + // on detached elements + style = elem.style; + + computed = computed || getStyles( elem ); + + // getPropertyValue is needed for: + // .css('filter') (IE 9 only, #12537) + // .css('--customProperty) (#3144) + if ( computed ) { + ret = computed.getPropertyValue( name ) || computed[ name ]; + + if ( ret === "" && !isAttached( elem ) ) { + ret = jQuery.style( elem, name ); + } + + // A tribute to the "awesome hack by Dean Edwards" + // Android Browser returns percentage for some values, + // but width seems to be reliably pixels. + // This is against the CSSOM draft spec: + // https://drafts.csswg.org/cssom/#resolved-values + if ( !support.pixelBoxStyles() && rnumnonpx.test( ret ) && rboxStyle.test( name ) ) { + + // Remember the original values + width = style.width; + minWidth = style.minWidth; + maxWidth = style.maxWidth; + + // Put in the new values to get a computed value out + style.minWidth = style.maxWidth = style.width = ret; + ret = computed.width; + + // Revert the changed values + style.width = width; + style.minWidth = minWidth; + style.maxWidth = maxWidth; + } + } + + return ret !== undefined ? + + // Support: IE <=9 - 11 only + // IE returns zIndex value as an integer. 
+ ret + "" : + ret; +} + + +function addGetHookIf( conditionFn, hookFn ) { + + // Define the hook, we'll check on the first run if it's really needed. + return { + get: function() { + if ( conditionFn() ) { + + // Hook not needed (or it's not possible to use it due + // to missing dependency), remove it. + delete this.get; + return; + } + + // Hook needed; redefine it so that the support test is not executed again. + return ( this.get = hookFn ).apply( this, arguments ); + } + }; +} + + +var cssPrefixes = [ "Webkit", "Moz", "ms" ], + emptyStyle = document.createElement( "div" ).style, + vendorProps = {}; + +// Return a vendor-prefixed property or undefined +function vendorPropName( name ) { + + // Check for vendor prefixed names + var capName = name[ 0 ].toUpperCase() + name.slice( 1 ), + i = cssPrefixes.length; + + while ( i-- ) { + name = cssPrefixes[ i ] + capName; + if ( name in emptyStyle ) { + return name; + } + } +} + +// Return a potentially-mapped jQuery.cssProps or vendor prefixed property +function finalPropName( name ) { + var final = jQuery.cssProps[ name ] || vendorProps[ name ]; + + if ( final ) { + return final; + } + if ( name in emptyStyle ) { + return name; + } + return vendorProps[ name ] = vendorPropName( name ) || name; +} + + +var + + // Swappable if display is none or starts with table + // except "table", "table-cell", or "table-caption" + // See here for display values: https://developer.mozilla.org/en-US/docs/CSS/display + rdisplayswap = /^(none|table(?!-c[ea]).+)/, + rcustomProp = /^--/, + cssShow = { position: "absolute", visibility: "hidden", display: "block" }, + cssNormalTransform = { + letterSpacing: "0", + fontWeight: "400" + }; + +function setPositiveNumber( _elem, value, subtract ) { + + // Any relative (+/-) values have already been + // normalized at this point + var matches = rcssNum.exec( value ); + return matches ? 
+ + // Guard against undefined "subtract", e.g., when used as in cssHooks + Math.max( 0, matches[ 2 ] - ( subtract || 0 ) ) + ( matches[ 3 ] || "px" ) : + value; +} + +function boxModelAdjustment( elem, dimension, box, isBorderBox, styles, computedVal ) { + var i = dimension === "width" ? 1 : 0, + extra = 0, + delta = 0; + + // Adjustment may not be necessary + if ( box === ( isBorderBox ? "border" : "content" ) ) { + return 0; + } + + for ( ; i < 4; i += 2 ) { + + // Both box models exclude margin + if ( box === "margin" ) { + delta += jQuery.css( elem, box + cssExpand[ i ], true, styles ); + } + + // If we get here with a content-box, we're seeking "padding" or "border" or "margin" + if ( !isBorderBox ) { + + // Add padding + delta += jQuery.css( elem, "padding" + cssExpand[ i ], true, styles ); + + // For "border" or "margin", add border + if ( box !== "padding" ) { + delta += jQuery.css( elem, "border" + cssExpand[ i ] + "Width", true, styles ); + + // But still keep track of it otherwise + } else { + extra += jQuery.css( elem, "border" + cssExpand[ i ] + "Width", true, styles ); + } + + // If we get here with a border-box (content + padding + border), we're seeking "content" or + // "padding" or "margin" + } else { + + // For "content", subtract padding + if ( box === "content" ) { + delta -= jQuery.css( elem, "padding" + cssExpand[ i ], true, styles ); + } + + // For "content" or "padding", subtract border + if ( box !== "margin" ) { + delta -= jQuery.css( elem, "border" + cssExpand[ i ] + "Width", true, styles ); + } + } + } + + // Account for positive content-box scroll gutter when requested by providing computedVal + if ( !isBorderBox && computedVal >= 0 ) { + + // offsetWidth/offsetHeight is a rounded sum of content, padding, scroll gutter, and border + // Assuming integer scroll gutter, subtract the rest and round down + delta += Math.max( 0, Math.ceil( + elem[ "offset" + dimension[ 0 ].toUpperCase() + dimension.slice( 1 ) ] - + computedVal - + delta - + 
extra - + 0.5 + + // If offsetWidth/offsetHeight is unknown, then we can't determine content-box scroll gutter + // Use an explicit zero to avoid NaN (gh-3964) + ) ) || 0; + } + + return delta; +} + +function getWidthOrHeight( elem, dimension, extra ) { + + // Start with computed style + var styles = getStyles( elem ), + + // To avoid forcing a reflow, only fetch boxSizing if we need it (gh-4322). + // Fake content-box until we know it's needed to know the true value. + boxSizingNeeded = !support.boxSizingReliable() || extra, + isBorderBox = boxSizingNeeded && + jQuery.css( elem, "boxSizing", false, styles ) === "border-box", + valueIsBorderBox = isBorderBox, + + val = curCSS( elem, dimension, styles ), + offsetProp = "offset" + dimension[ 0 ].toUpperCase() + dimension.slice( 1 ); + + // Support: Firefox <=54 + // Return a confounding non-pixel value or feign ignorance, as appropriate. + if ( rnumnonpx.test( val ) ) { + if ( !extra ) { + return val; + } + val = "auto"; + } + + + // Support: IE 9 - 11 only + // Use offsetWidth/offsetHeight for when box sizing is unreliable. + // In those cases, the computed value can be trusted to be border-box. + if ( ( !support.boxSizingReliable() && isBorderBox || + + // Support: IE 10 - 11+, Edge 15 - 18+ + // IE/Edge misreport `getComputedStyle` of table rows with width/height + // set in CSS while `offset*` properties report correct values. + // Interestingly, in some cases IE 9 doesn't suffer from this issue. 
+ !support.reliableTrDimensions() && nodeName( elem, "tr" ) || + + // Fall back to offsetWidth/offsetHeight when value is "auto" + // This happens for inline elements with no explicit setting (gh-3571) + val === "auto" || + + // Support: Android <=4.1 - 4.3 only + // Also use offsetWidth/offsetHeight for misreported inline dimensions (gh-3602) + !parseFloat( val ) && jQuery.css( elem, "display", false, styles ) === "inline" ) && + + // Make sure the element is visible & connected + elem.getClientRects().length ) { + + isBorderBox = jQuery.css( elem, "boxSizing", false, styles ) === "border-box"; + + // Where available, offsetWidth/offsetHeight approximate border box dimensions. + // Where not available (e.g., SVG), assume unreliable box-sizing and interpret the + // retrieved value as a content box dimension. + valueIsBorderBox = offsetProp in elem; + if ( valueIsBorderBox ) { + val = elem[ offsetProp ]; + } + } + + // Normalize "" and auto + val = parseFloat( val ) || 0; + + // Adjust for the element's box model + return ( val + + boxModelAdjustment( + elem, + dimension, + extra || ( isBorderBox ? "border" : "content" ), + valueIsBorderBox, + styles, + + // Provide the current computed size to request scroll gutter calculation (gh-3589) + val + ) + ) + "px"; +} + +jQuery.extend( { + + // Add in style property hooks for overriding the default + // behavior of getting and setting a style property + cssHooks: { + opacity: { + get: function( elem, computed ) { + if ( computed ) { + + // We should always get a number back from opacity + var ret = curCSS( elem, "opacity" ); + return ret === "" ? 
"1" : ret; + } + } + } + }, + + // Don't automatically add "px" to these possibly-unitless properties + cssNumber: { + "animationIterationCount": true, + "columnCount": true, + "fillOpacity": true, + "flexGrow": true, + "flexShrink": true, + "fontWeight": true, + "gridArea": true, + "gridColumn": true, + "gridColumnEnd": true, + "gridColumnStart": true, + "gridRow": true, + "gridRowEnd": true, + "gridRowStart": true, + "lineHeight": true, + "opacity": true, + "order": true, + "orphans": true, + "widows": true, + "zIndex": true, + "zoom": true + }, + + // Add in properties whose names you wish to fix before + // setting or getting the value + cssProps: {}, + + // Get and set the style property on a DOM Node + style: function( elem, name, value, extra ) { + + // Don't set styles on text and comment nodes + if ( !elem || elem.nodeType === 3 || elem.nodeType === 8 || !elem.style ) { + return; + } + + // Make sure that we're working with the right name + var ret, type, hooks, + origName = camelCase( name ), + isCustomProp = rcustomProp.test( name ), + style = elem.style; + + // Make sure that we're working with the right name. We don't + // want to query the value if it is a CSS custom property + // since they are user-defined. 
+ if ( !isCustomProp ) { + name = finalPropName( origName ); + } + + // Gets hook for the prefixed version, then unprefixed version + hooks = jQuery.cssHooks[ name ] || jQuery.cssHooks[ origName ]; + + // Check if we're setting a value + if ( value !== undefined ) { + type = typeof value; + + // Convert "+=" or "-=" to relative numbers (#7345) + if ( type === "string" && ( ret = rcssNum.exec( value ) ) && ret[ 1 ] ) { + value = adjustCSS( elem, name, ret ); + + // Fixes bug #9237 + type = "number"; + } + + // Make sure that null and NaN values aren't set (#7116) + if ( value == null || value !== value ) { + return; + } + + // If a number was passed in, add the unit (except for certain CSS properties) + // The isCustomProp check can be removed in jQuery 4.0 when we only auto-append + // "px" to a few hardcoded values. + if ( type === "number" && !isCustomProp ) { + value += ret && ret[ 3 ] || ( jQuery.cssNumber[ origName ] ? "" : "px" ); + } + + // background-* props affect original clone's values + if ( !support.clearCloneStyle && value === "" && name.indexOf( "background" ) === 0 ) { + style[ name ] = "inherit"; + } + + // If a hook was provided, use that value, otherwise just set the specified value + if ( !hooks || !( "set" in hooks ) || + ( value = hooks.set( elem, value, extra ) ) !== undefined ) { + + if ( isCustomProp ) { + style.setProperty( name, value ); + } else { + style[ name ] = value; + } + } + + } else { + + // If a hook was provided get the non-computed value from there + if ( hooks && "get" in hooks && + ( ret = hooks.get( elem, false, extra ) ) !== undefined ) { + + return ret; + } + + // Otherwise just get the value from the style object + return style[ name ]; + } + }, + + css: function( elem, name, extra, styles ) { + var val, num, hooks, + origName = camelCase( name ), + isCustomProp = rcustomProp.test( name ); + + // Make sure that we're working with the right name. 
We don't + // want to modify the value if it is a CSS custom property + // since they are user-defined. + if ( !isCustomProp ) { + name = finalPropName( origName ); + } + + // Try prefixed name followed by the unprefixed name + hooks = jQuery.cssHooks[ name ] || jQuery.cssHooks[ origName ]; + + // If a hook was provided get the computed value from there + if ( hooks && "get" in hooks ) { + val = hooks.get( elem, true, extra ); + } + + // Otherwise, if a way to get the computed value exists, use that + if ( val === undefined ) { + val = curCSS( elem, name, styles ); + } + + // Convert "normal" to computed value + if ( val === "normal" && name in cssNormalTransform ) { + val = cssNormalTransform[ name ]; + } + + // Make numeric if forced or a qualifier was provided and val looks numeric + if ( extra === "" || extra ) { + num = parseFloat( val ); + return extra === true || isFinite( num ) ? num || 0 : val; + } + + return val; + } +} ); + +jQuery.each( [ "height", "width" ], function( _i, dimension ) { + jQuery.cssHooks[ dimension ] = { + get: function( elem, computed, extra ) { + if ( computed ) { + + // Certain elements can have dimension info if we invisibly show them + // but it must have a current display style that would benefit + return rdisplayswap.test( jQuery.css( elem, "display" ) ) && + + // Support: Safari 8+ + // Table columns in Safari have non-zero offsetWidth & zero + // getBoundingClientRect().width unless display is changed. + // Support: IE <=11 only + // Running getBoundingClientRect on a disconnected node + // in IE throws an error. + ( !elem.getClientRects().length || !elem.getBoundingClientRect().width ) ? + swap( elem, cssShow, function() { + return getWidthOrHeight( elem, dimension, extra ); + } ) : + getWidthOrHeight( elem, dimension, extra ); + } + }, + + set: function( elem, value, extra ) { + var matches, + styles = getStyles( elem ), + + // Only read styles.position if the test has a chance to fail + // to avoid forcing a reflow. 
+ scrollboxSizeBuggy = !support.scrollboxSize() && + styles.position === "absolute", + + // To avoid forcing a reflow, only fetch boxSizing if we need it (gh-3991) + boxSizingNeeded = scrollboxSizeBuggy || extra, + isBorderBox = boxSizingNeeded && + jQuery.css( elem, "boxSizing", false, styles ) === "border-box", + subtract = extra ? + boxModelAdjustment( + elem, + dimension, + extra, + isBorderBox, + styles + ) : + 0; + + // Account for unreliable border-box dimensions by comparing offset* to computed and + // faking a content-box to get border and padding (gh-3699) + if ( isBorderBox && scrollboxSizeBuggy ) { + subtract -= Math.ceil( + elem[ "offset" + dimension[ 0 ].toUpperCase() + dimension.slice( 1 ) ] - + parseFloat( styles[ dimension ] ) - + boxModelAdjustment( elem, dimension, "border", false, styles ) - + 0.5 + ); + } + + // Convert to pixels if value adjustment is needed + if ( subtract && ( matches = rcssNum.exec( value ) ) && + ( matches[ 3 ] || "px" ) !== "px" ) { + + elem.style[ dimension ] = value; + value = jQuery.css( elem, dimension ); + } + + return setPositiveNumber( elem, value, subtract ); + } + }; +} ); + +jQuery.cssHooks.marginLeft = addGetHookIf( support.reliableMarginLeft, + function( elem, computed ) { + if ( computed ) { + return ( parseFloat( curCSS( elem, "marginLeft" ) ) || + elem.getBoundingClientRect().left - + swap( elem, { marginLeft: 0 }, function() { + return elem.getBoundingClientRect().left; + } ) + ) + "px"; + } + } +); + +// These hooks are used by animate to expand properties +jQuery.each( { + margin: "", + padding: "", + border: "Width" +}, function( prefix, suffix ) { + jQuery.cssHooks[ prefix + suffix ] = { + expand: function( value ) { + var i = 0, + expanded = {}, + + // Assumes a single number if not a string + parts = typeof value === "string" ? 
value.split( " " ) : [ value ]; + + for ( ; i < 4; i++ ) { + expanded[ prefix + cssExpand[ i ] + suffix ] = + parts[ i ] || parts[ i - 2 ] || parts[ 0 ]; + } + + return expanded; + } + }; + + if ( prefix !== "margin" ) { + jQuery.cssHooks[ prefix + suffix ].set = setPositiveNumber; + } +} ); + +jQuery.fn.extend( { + css: function( name, value ) { + return access( this, function( elem, name, value ) { + var styles, len, + map = {}, + i = 0; + + if ( Array.isArray( name ) ) { + styles = getStyles( elem ); + len = name.length; + + for ( ; i < len; i++ ) { + map[ name[ i ] ] = jQuery.css( elem, name[ i ], false, styles ); + } + + return map; + } + + return value !== undefined ? + jQuery.style( elem, name, value ) : + jQuery.css( elem, name ); + }, name, value, arguments.length > 1 ); + } +} ); + + +function Tween( elem, options, prop, end, easing ) { + return new Tween.prototype.init( elem, options, prop, end, easing ); +} +jQuery.Tween = Tween; + +Tween.prototype = { + constructor: Tween, + init: function( elem, options, prop, end, easing, unit ) { + this.elem = elem; + this.prop = prop; + this.easing = easing || jQuery.easing._default; + this.options = options; + this.start = this.now = this.cur(); + this.end = end; + this.unit = unit || ( jQuery.cssNumber[ prop ] ? "" : "px" ); + }, + cur: function() { + var hooks = Tween.propHooks[ this.prop ]; + + return hooks && hooks.get ? 
+ hooks.get( this ) : + Tween.propHooks._default.get( this ); + }, + run: function( percent ) { + var eased, + hooks = Tween.propHooks[ this.prop ]; + + if ( this.options.duration ) { + this.pos = eased = jQuery.easing[ this.easing ]( + percent, this.options.duration * percent, 0, 1, this.options.duration + ); + } else { + this.pos = eased = percent; + } + this.now = ( this.end - this.start ) * eased + this.start; + + if ( this.options.step ) { + this.options.step.call( this.elem, this.now, this ); + } + + if ( hooks && hooks.set ) { + hooks.set( this ); + } else { + Tween.propHooks._default.set( this ); + } + return this; + } +}; + +Tween.prototype.init.prototype = Tween.prototype; + +Tween.propHooks = { + _default: { + get: function( tween ) { + var result; + + // Use a property on the element directly when it is not a DOM element, + // or when there is no matching style property that exists. + if ( tween.elem.nodeType !== 1 || + tween.elem[ tween.prop ] != null && tween.elem.style[ tween.prop ] == null ) { + return tween.elem[ tween.prop ]; + } + + // Passing an empty string as a 3rd parameter to .css will automatically + // attempt a parseFloat and fallback to a string if the parse fails. + // Simple values such as "10px" are parsed to Float; + // complex values such as "rotate(1rad)" are returned as-is. + result = jQuery.css( tween.elem, tween.prop, "" ); + + // Empty strings, null, undefined and "auto" are converted to 0. + return !result || result === "auto" ? 0 : result; + }, + set: function( tween ) { + + // Use step hook for back compat. + // Use cssHook if its there. + // Use .style if available and use plain properties where available. 
+ if ( jQuery.fx.step[ tween.prop ] ) { + jQuery.fx.step[ tween.prop ]( tween ); + } else if ( tween.elem.nodeType === 1 && ( + jQuery.cssHooks[ tween.prop ] || + tween.elem.style[ finalPropName( tween.prop ) ] != null ) ) { + jQuery.style( tween.elem, tween.prop, tween.now + tween.unit ); + } else { + tween.elem[ tween.prop ] = tween.now; + } + } + } +}; + +// Support: IE <=9 only +// Panic based approach to setting things on disconnected nodes +Tween.propHooks.scrollTop = Tween.propHooks.scrollLeft = { + set: function( tween ) { + if ( tween.elem.nodeType && tween.elem.parentNode ) { + tween.elem[ tween.prop ] = tween.now; + } + } +}; + +jQuery.easing = { + linear: function( p ) { + return p; + }, + swing: function( p ) { + return 0.5 - Math.cos( p * Math.PI ) / 2; + }, + _default: "swing" +}; + +jQuery.fx = Tween.prototype.init; + +// Back compat <1.8 extension point +jQuery.fx.step = {}; + + + + +var + fxNow, inProgress, + rfxtypes = /^(?:toggle|show|hide)$/, + rrun = /queueHooks$/; + +function schedule() { + if ( inProgress ) { + if ( document.hidden === false && window.requestAnimationFrame ) { + window.requestAnimationFrame( schedule ); + } else { + window.setTimeout( schedule, jQuery.fx.interval ); + } + + jQuery.fx.tick(); + } +} + +// Animations created synchronously will run synchronously +function createFxNow() { + window.setTimeout( function() { + fxNow = undefined; + } ); + return ( fxNow = Date.now() ); +} + +// Generate parameters to create a standard animation +function genFx( type, includeWidth ) { + var which, + i = 0, + attrs = { height: type }; + + // If we include width, step value is 1 to do all cssExpand values, + // otherwise step value is 2 to skip over Left and Right + includeWidth = includeWidth ? 
1 : 0; + for ( ; i < 4; i += 2 - includeWidth ) { + which = cssExpand[ i ]; + attrs[ "margin" + which ] = attrs[ "padding" + which ] = type; + } + + if ( includeWidth ) { + attrs.opacity = attrs.width = type; + } + + return attrs; +} + +function createTween( value, prop, animation ) { + var tween, + collection = ( Animation.tweeners[ prop ] || [] ).concat( Animation.tweeners[ "*" ] ), + index = 0, + length = collection.length; + for ( ; index < length; index++ ) { + if ( ( tween = collection[ index ].call( animation, prop, value ) ) ) { + + // We're done with this property + return tween; + } + } +} + +function defaultPrefilter( elem, props, opts ) { + var prop, value, toggle, hooks, oldfire, propTween, restoreDisplay, display, + isBox = "width" in props || "height" in props, + anim = this, + orig = {}, + style = elem.style, + hidden = elem.nodeType && isHiddenWithinTree( elem ), + dataShow = dataPriv.get( elem, "fxshow" ); + + // Queue-skipping animations hijack the fx hooks + if ( !opts.queue ) { + hooks = jQuery._queueHooks( elem, "fx" ); + if ( hooks.unqueued == null ) { + hooks.unqueued = 0; + oldfire = hooks.empty.fire; + hooks.empty.fire = function() { + if ( !hooks.unqueued ) { + oldfire(); + } + }; + } + hooks.unqueued++; + + anim.always( function() { + + // Ensure the complete handler is called before this completes + anim.always( function() { + hooks.unqueued--; + if ( !jQuery.queue( elem, "fx" ).length ) { + hooks.empty.fire(); + } + } ); + } ); + } + + // Detect show/hide animations + for ( prop in props ) { + value = props[ prop ]; + if ( rfxtypes.test( value ) ) { + delete props[ prop ]; + toggle = toggle || value === "toggle"; + if ( value === ( hidden ? 
"hide" : "show" ) ) { + + // Pretend to be hidden if this is a "show" and + // there is still data from a stopped show/hide + if ( value === "show" && dataShow && dataShow[ prop ] !== undefined ) { + hidden = true; + + // Ignore all other no-op show/hide data + } else { + continue; + } + } + orig[ prop ] = dataShow && dataShow[ prop ] || jQuery.style( elem, prop ); + } + } + + // Bail out if this is a no-op like .hide().hide() + propTween = !jQuery.isEmptyObject( props ); + if ( !propTween && jQuery.isEmptyObject( orig ) ) { + return; + } + + // Restrict "overflow" and "display" styles during box animations + if ( isBox && elem.nodeType === 1 ) { + + // Support: IE <=9 - 11, Edge 12 - 15 + // Record all 3 overflow attributes because IE does not infer the shorthand + // from identically-valued overflowX and overflowY and Edge just mirrors + // the overflowX value there. + opts.overflow = [ style.overflow, style.overflowX, style.overflowY ]; + + // Identify a display type, preferring old show/hide data over the CSS cascade + restoreDisplay = dataShow && dataShow.display; + if ( restoreDisplay == null ) { + restoreDisplay = dataPriv.get( elem, "display" ); + } + display = jQuery.css( elem, "display" ); + if ( display === "none" ) { + if ( restoreDisplay ) { + display = restoreDisplay; + } else { + + // Get nonempty value(s) by temporarily forcing visibility + showHide( [ elem ], true ); + restoreDisplay = elem.style.display || restoreDisplay; + display = jQuery.css( elem, "display" ); + showHide( [ elem ] ); + } + } + + // Animate inline elements as inline-block + if ( display === "inline" || display === "inline-block" && restoreDisplay != null ) { + if ( jQuery.css( elem, "float" ) === "none" ) { + + // Restore the original display value at the end of pure show/hide animations + if ( !propTween ) { + anim.done( function() { + style.display = restoreDisplay; + } ); + if ( restoreDisplay == null ) { + display = style.display; + restoreDisplay = display === "none" ? 
"" : display; + } + } + style.display = "inline-block"; + } + } + } + + if ( opts.overflow ) { + style.overflow = "hidden"; + anim.always( function() { + style.overflow = opts.overflow[ 0 ]; + style.overflowX = opts.overflow[ 1 ]; + style.overflowY = opts.overflow[ 2 ]; + } ); + } + + // Implement show/hide animations + propTween = false; + for ( prop in orig ) { + + // General show/hide setup for this element animation + if ( !propTween ) { + if ( dataShow ) { + if ( "hidden" in dataShow ) { + hidden = dataShow.hidden; + } + } else { + dataShow = dataPriv.access( elem, "fxshow", { display: restoreDisplay } ); + } + + // Store hidden/visible for toggle so `.stop().toggle()` "reverses" + if ( toggle ) { + dataShow.hidden = !hidden; + } + + // Show elements before animating them + if ( hidden ) { + showHide( [ elem ], true ); + } + + /* eslint-disable no-loop-func */ + + anim.done( function() { + + /* eslint-enable no-loop-func */ + + // The final step of a "hide" animation is actually hiding the element + if ( !hidden ) { + showHide( [ elem ] ); + } + dataPriv.remove( elem, "fxshow" ); + for ( prop in orig ) { + jQuery.style( elem, prop, orig[ prop ] ); + } + } ); + } + + // Per-property setup + propTween = createTween( hidden ? 
dataShow[ prop ] : 0, prop, anim ); + if ( !( prop in dataShow ) ) { + dataShow[ prop ] = propTween.start; + if ( hidden ) { + propTween.end = propTween.start; + propTween.start = 0; + } + } + } +} + +function propFilter( props, specialEasing ) { + var index, name, easing, value, hooks; + + // camelCase, specialEasing and expand cssHook pass + for ( index in props ) { + name = camelCase( index ); + easing = specialEasing[ name ]; + value = props[ index ]; + if ( Array.isArray( value ) ) { + easing = value[ 1 ]; + value = props[ index ] = value[ 0 ]; + } + + if ( index !== name ) { + props[ name ] = value; + delete props[ index ]; + } + + hooks = jQuery.cssHooks[ name ]; + if ( hooks && "expand" in hooks ) { + value = hooks.expand( value ); + delete props[ name ]; + + // Not quite $.extend, this won't overwrite existing keys. + // Reusing 'index' because we have the correct "name" + for ( index in value ) { + if ( !( index in props ) ) { + props[ index ] = value[ index ]; + specialEasing[ index ] = easing; + } + } + } else { + specialEasing[ name ] = easing; + } + } +} + +function Animation( elem, properties, options ) { + var result, + stopped, + index = 0, + length = Animation.prefilters.length, + deferred = jQuery.Deferred().always( function() { + + // Don't match elem in the :animated selector + delete tick.elem; + } ), + tick = function() { + if ( stopped ) { + return false; + } + var currentTime = fxNow || createFxNow(), + remaining = Math.max( 0, animation.startTime + animation.duration - currentTime ), + + // Support: Android 2.3 only + // Archaic crash bug won't allow us to use `1 - ( 0.5 || 0 )` (#12497) + temp = remaining / animation.duration || 0, + percent = 1 - temp, + index = 0, + length = animation.tweens.length; + + for ( ; index < length; index++ ) { + animation.tweens[ index ].run( percent ); + } + + deferred.notifyWith( elem, [ animation, percent, remaining ] ); + + // If there's more to do, yield + if ( percent < 1 && length ) { + return 
remaining; + } + + // If this was an empty animation, synthesize a final progress notification + if ( !length ) { + deferred.notifyWith( elem, [ animation, 1, 0 ] ); + } + + // Resolve the animation and report its conclusion + deferred.resolveWith( elem, [ animation ] ); + return false; + }, + animation = deferred.promise( { + elem: elem, + props: jQuery.extend( {}, properties ), + opts: jQuery.extend( true, { + specialEasing: {}, + easing: jQuery.easing._default + }, options ), + originalProperties: properties, + originalOptions: options, + startTime: fxNow || createFxNow(), + duration: options.duration, + tweens: [], + createTween: function( prop, end ) { + var tween = jQuery.Tween( elem, animation.opts, prop, end, + animation.opts.specialEasing[ prop ] || animation.opts.easing ); + animation.tweens.push( tween ); + return tween; + }, + stop: function( gotoEnd ) { + var index = 0, + + // If we are going to the end, we want to run all the tweens + // otherwise we skip this part + length = gotoEnd ? 
animation.tweens.length : 0; + if ( stopped ) { + return this; + } + stopped = true; + for ( ; index < length; index++ ) { + animation.tweens[ index ].run( 1 ); + } + + // Resolve when we played the last frame; otherwise, reject + if ( gotoEnd ) { + deferred.notifyWith( elem, [ animation, 1, 0 ] ); + deferred.resolveWith( elem, [ animation, gotoEnd ] ); + } else { + deferred.rejectWith( elem, [ animation, gotoEnd ] ); + } + return this; + } + } ), + props = animation.props; + + propFilter( props, animation.opts.specialEasing ); + + for ( ; index < length; index++ ) { + result = Animation.prefilters[ index ].call( animation, elem, props, animation.opts ); + if ( result ) { + if ( isFunction( result.stop ) ) { + jQuery._queueHooks( animation.elem, animation.opts.queue ).stop = + result.stop.bind( result ); + } + return result; + } + } + + jQuery.map( props, createTween, animation ); + + if ( isFunction( animation.opts.start ) ) { + animation.opts.start.call( elem, animation ); + } + + // Attach callbacks from options + animation + .progress( animation.opts.progress ) + .done( animation.opts.done, animation.opts.complete ) + .fail( animation.opts.fail ) + .always( animation.opts.always ); + + jQuery.fx.timer( + jQuery.extend( tick, { + elem: elem, + anim: animation, + queue: animation.opts.queue + } ) + ); + + return animation; +} + +jQuery.Animation = jQuery.extend( Animation, { + + tweeners: { + "*": [ function( prop, value ) { + var tween = this.createTween( prop, value ); + adjustCSS( tween.elem, prop, rcssNum.exec( value ), tween ); + return tween; + } ] + }, + + tweener: function( props, callback ) { + if ( isFunction( props ) ) { + callback = props; + props = [ "*" ]; + } else { + props = props.match( rnothtmlwhite ); + } + + var prop, + index = 0, + length = props.length; + + for ( ; index < length; index++ ) { + prop = props[ index ]; + Animation.tweeners[ prop ] = Animation.tweeners[ prop ] || []; + Animation.tweeners[ prop ].unshift( callback ); + } + }, + 
+ prefilters: [ defaultPrefilter ], + + prefilter: function( callback, prepend ) { + if ( prepend ) { + Animation.prefilters.unshift( callback ); + } else { + Animation.prefilters.push( callback ); + } + } +} ); + +jQuery.speed = function( speed, easing, fn ) { + var opt = speed && typeof speed === "object" ? jQuery.extend( {}, speed ) : { + complete: fn || !fn && easing || + isFunction( speed ) && speed, + duration: speed, + easing: fn && easing || easing && !isFunction( easing ) && easing + }; + + // Go to the end state if fx are off + if ( jQuery.fx.off ) { + opt.duration = 0; + + } else { + if ( typeof opt.duration !== "number" ) { + if ( opt.duration in jQuery.fx.speeds ) { + opt.duration = jQuery.fx.speeds[ opt.duration ]; + + } else { + opt.duration = jQuery.fx.speeds._default; + } + } + } + + // Normalize opt.queue - true/undefined/null -> "fx" + if ( opt.queue == null || opt.queue === true ) { + opt.queue = "fx"; + } + + // Queueing + opt.old = opt.complete; + + opt.complete = function() { + if ( isFunction( opt.old ) ) { + opt.old.call( this ); + } + + if ( opt.queue ) { + jQuery.dequeue( this, opt.queue ); + } + }; + + return opt; +}; + +jQuery.fn.extend( { + fadeTo: function( speed, to, easing, callback ) { + + // Show any hidden elements after setting opacity to 0 + return this.filter( isHiddenWithinTree ).css( "opacity", 0 ).show() + + // Animate to the value specified + .end().animate( { opacity: to }, speed, easing, callback ); + }, + animate: function( prop, speed, easing, callback ) { + var empty = jQuery.isEmptyObject( prop ), + optall = jQuery.speed( speed, easing, callback ), + doAnimation = function() { + + // Operate on a copy of prop so per-property easing won't be lost + var anim = Animation( this, jQuery.extend( {}, prop ), optall ); + + // Empty animations, or finishing resolves immediately + if ( empty || dataPriv.get( this, "finish" ) ) { + anim.stop( true ); + } + }; + + doAnimation.finish = doAnimation; + + return empty || 
optall.queue === false ? + this.each( doAnimation ) : + this.queue( optall.queue, doAnimation ); + }, + stop: function( type, clearQueue, gotoEnd ) { + var stopQueue = function( hooks ) { + var stop = hooks.stop; + delete hooks.stop; + stop( gotoEnd ); + }; + + if ( typeof type !== "string" ) { + gotoEnd = clearQueue; + clearQueue = type; + type = undefined; + } + if ( clearQueue ) { + this.queue( type || "fx", [] ); + } + + return this.each( function() { + var dequeue = true, + index = type != null && type + "queueHooks", + timers = jQuery.timers, + data = dataPriv.get( this ); + + if ( index ) { + if ( data[ index ] && data[ index ].stop ) { + stopQueue( data[ index ] ); + } + } else { + for ( index in data ) { + if ( data[ index ] && data[ index ].stop && rrun.test( index ) ) { + stopQueue( data[ index ] ); + } + } + } + + for ( index = timers.length; index--; ) { + if ( timers[ index ].elem === this && + ( type == null || timers[ index ].queue === type ) ) { + + timers[ index ].anim.stop( gotoEnd ); + dequeue = false; + timers.splice( index, 1 ); + } + } + + // Start the next in the queue if the last step wasn't forced. + // Timers currently will call their complete callbacks, which + // will dequeue but only if they were gotoEnd. + if ( dequeue || !gotoEnd ) { + jQuery.dequeue( this, type ); + } + } ); + }, + finish: function( type ) { + if ( type !== false ) { + type = type || "fx"; + } + return this.each( function() { + var index, + data = dataPriv.get( this ), + queue = data[ type + "queue" ], + hooks = data[ type + "queueHooks" ], + timers = jQuery.timers, + length = queue ? 
queue.length : 0; + + // Enable finishing flag on private data + data.finish = true; + + // Empty the queue first + jQuery.queue( this, type, [] ); + + if ( hooks && hooks.stop ) { + hooks.stop.call( this, true ); + } + + // Look for any active animations, and finish them + for ( index = timers.length; index--; ) { + if ( timers[ index ].elem === this && timers[ index ].queue === type ) { + timers[ index ].anim.stop( true ); + timers.splice( index, 1 ); + } + } + + // Look for any animations in the old queue and finish them + for ( index = 0; index < length; index++ ) { + if ( queue[ index ] && queue[ index ].finish ) { + queue[ index ].finish.call( this ); + } + } + + // Turn off finishing flag + delete data.finish; + } ); + } +} ); + +jQuery.each( [ "toggle", "show", "hide" ], function( _i, name ) { + var cssFn = jQuery.fn[ name ]; + jQuery.fn[ name ] = function( speed, easing, callback ) { + return speed == null || typeof speed === "boolean" ? + cssFn.apply( this, arguments ) : + this.animate( genFx( name, true ), speed, easing, callback ); + }; +} ); + +// Generate shortcuts for custom animations +jQuery.each( { + slideDown: genFx( "show" ), + slideUp: genFx( "hide" ), + slideToggle: genFx( "toggle" ), + fadeIn: { opacity: "show" }, + fadeOut: { opacity: "hide" }, + fadeToggle: { opacity: "toggle" } +}, function( name, props ) { + jQuery.fn[ name ] = function( speed, easing, callback ) { + return this.animate( props, speed, easing, callback ); + }; +} ); + +jQuery.timers = []; +jQuery.fx.tick = function() { + var timer, + i = 0, + timers = jQuery.timers; + + fxNow = Date.now(); + + for ( ; i < timers.length; i++ ) { + timer = timers[ i ]; + + // Run the timer and safely remove it when done (allowing for external removal) + if ( !timer() && timers[ i ] === timer ) { + timers.splice( i--, 1 ); + } + } + + if ( !timers.length ) { + jQuery.fx.stop(); + } + fxNow = undefined; +}; + +jQuery.fx.timer = function( timer ) { + jQuery.timers.push( timer ); + 
jQuery.fx.start(); +}; + +jQuery.fx.interval = 13; +jQuery.fx.start = function() { + if ( inProgress ) { + return; + } + + inProgress = true; + schedule(); +}; + +jQuery.fx.stop = function() { + inProgress = null; +}; + +jQuery.fx.speeds = { + slow: 600, + fast: 200, + + // Default speed + _default: 400 +}; + + +// Based off of the plugin by Clint Helfers, with permission. +// https://web.archive.org/web/20100324014747/http://blindsignals.com/index.php/2009/07/jquery-delay/ +jQuery.fn.delay = function( time, type ) { + time = jQuery.fx ? jQuery.fx.speeds[ time ] || time : time; + type = type || "fx"; + + return this.queue( type, function( next, hooks ) { + var timeout = window.setTimeout( next, time ); + hooks.stop = function() { + window.clearTimeout( timeout ); + }; + } ); +}; + + +( function() { + var input = document.createElement( "input" ), + select = document.createElement( "select" ), + opt = select.appendChild( document.createElement( "option" ) ); + + input.type = "checkbox"; + + // Support: Android <=4.3 only + // Default value for a checkbox should be "on" + support.checkOn = input.value !== ""; + + // Support: IE <=11 only + // Must access selectedIndex to make default options select + support.optSelected = opt.selected; + + // Support: IE <=11 only + // An input loses its value after becoming a radio + input = document.createElement( "input" ); + input.value = "t"; + input.type = "radio"; + support.radioValue = input.value === "t"; +} )(); + + +var boolHook, + attrHandle = jQuery.expr.attrHandle; + +jQuery.fn.extend( { + attr: function( name, value ) { + return access( this, jQuery.attr, name, value, arguments.length > 1 ); + }, + + removeAttr: function( name ) { + return this.each( function() { + jQuery.removeAttr( this, name ); + } ); + } +} ); + +jQuery.extend( { + attr: function( elem, name, value ) { + var ret, hooks, + nType = elem.nodeType; + + // Don't get/set attributes on text, comment and attribute nodes + if ( nType === 3 || nType === 8 || 
nType === 2 ) { + return; + } + + // Fallback to prop when attributes are not supported + if ( typeof elem.getAttribute === "undefined" ) { + return jQuery.prop( elem, name, value ); + } + + // Attribute hooks are determined by the lowercase version + // Grab necessary hook if one is defined + if ( nType !== 1 || !jQuery.isXMLDoc( elem ) ) { + hooks = jQuery.attrHooks[ name.toLowerCase() ] || + ( jQuery.expr.match.bool.test( name ) ? boolHook : undefined ); + } + + if ( value !== undefined ) { + if ( value === null ) { + jQuery.removeAttr( elem, name ); + return; + } + + if ( hooks && "set" in hooks && + ( ret = hooks.set( elem, value, name ) ) !== undefined ) { + return ret; + } + + elem.setAttribute( name, value + "" ); + return value; + } + + if ( hooks && "get" in hooks && ( ret = hooks.get( elem, name ) ) !== null ) { + return ret; + } + + ret = jQuery.find.attr( elem, name ); + + // Non-existent attributes return null, we normalize to undefined + return ret == null ? undefined : ret; + }, + + attrHooks: { + type: { + set: function( elem, value ) { + if ( !support.radioValue && value === "radio" && + nodeName( elem, "input" ) ) { + var val = elem.value; + elem.setAttribute( "type", value ); + if ( val ) { + elem.value = val; + } + return value; + } + } + } + }, + + removeAttr: function( elem, value ) { + var name, + i = 0, + + // Attribute names can contain non-HTML whitespace characters + // https://html.spec.whatwg.org/multipage/syntax.html#attributes-2 + attrNames = value && value.match( rnothtmlwhite ); + + if ( attrNames && elem.nodeType === 1 ) { + while ( ( name = attrNames[ i++ ] ) ) { + elem.removeAttribute( name ); + } + } + } +} ); + +// Hooks for boolean attributes +boolHook = { + set: function( elem, value, name ) { + if ( value === false ) { + + // Remove boolean attributes when set to false + jQuery.removeAttr( elem, name ); + } else { + elem.setAttribute( name, name ); + } + return name; + } +}; + +jQuery.each( 
jQuery.expr.match.bool.source.match( /\w+/g ), function( _i, name ) { + var getter = attrHandle[ name ] || jQuery.find.attr; + + attrHandle[ name ] = function( elem, name, isXML ) { + var ret, handle, + lowercaseName = name.toLowerCase(); + + if ( !isXML ) { + + // Avoid an infinite loop by temporarily removing this function from the getter + handle = attrHandle[ lowercaseName ]; + attrHandle[ lowercaseName ] = ret; + ret = getter( elem, name, isXML ) != null ? + lowercaseName : + null; + attrHandle[ lowercaseName ] = handle; + } + return ret; + }; +} ); + + + + +var rfocusable = /^(?:input|select|textarea|button)$/i, + rclickable = /^(?:a|area)$/i; + +jQuery.fn.extend( { + prop: function( name, value ) { + return access( this, jQuery.prop, name, value, arguments.length > 1 ); + }, + + removeProp: function( name ) { + return this.each( function() { + delete this[ jQuery.propFix[ name ] || name ]; + } ); + } +} ); + +jQuery.extend( { + prop: function( elem, name, value ) { + var ret, hooks, + nType = elem.nodeType; + + // Don't get/set properties on text, comment and attribute nodes + if ( nType === 3 || nType === 8 || nType === 2 ) { + return; + } + + if ( nType !== 1 || !jQuery.isXMLDoc( elem ) ) { + + // Fix name and attach hooks + name = jQuery.propFix[ name ] || name; + hooks = jQuery.propHooks[ name ]; + } + + if ( value !== undefined ) { + if ( hooks && "set" in hooks && + ( ret = hooks.set( elem, value, name ) ) !== undefined ) { + return ret; + } + + return ( elem[ name ] = value ); + } + + if ( hooks && "get" in hooks && ( ret = hooks.get( elem, name ) ) !== null ) { + return ret; + } + + return elem[ name ]; + }, + + propHooks: { + tabIndex: { + get: function( elem ) { + + // Support: IE <=9 - 11 only + // elem.tabIndex doesn't always return the + // correct value when it hasn't been explicitly set + // https://web.archive.org/web/20141116233347/http://fluidproject.org/blog/2008/01/09/getting-setting-and-removing-tabindex-values-with-javascript/ + // Use 
proper attribute retrieval(#12072) + var tabindex = jQuery.find.attr( elem, "tabindex" ); + + if ( tabindex ) { + return parseInt( tabindex, 10 ); + } + + if ( + rfocusable.test( elem.nodeName ) || + rclickable.test( elem.nodeName ) && + elem.href + ) { + return 0; + } + + return -1; + } + } + }, + + propFix: { + "for": "htmlFor", + "class": "className" + } +} ); + +// Support: IE <=11 only +// Accessing the selectedIndex property +// forces the browser to respect setting selected +// on the option +// The getter ensures a default option is selected +// when in an optgroup +// eslint rule "no-unused-expressions" is disabled for this code +// since it considers such accessions noop +if ( !support.optSelected ) { + jQuery.propHooks.selected = { + get: function( elem ) { + + /* eslint no-unused-expressions: "off" */ + + var parent = elem.parentNode; + if ( parent && parent.parentNode ) { + parent.parentNode.selectedIndex; + } + return null; + }, + set: function( elem ) { + + /* eslint no-unused-expressions: "off" */ + + var parent = elem.parentNode; + if ( parent ) { + parent.selectedIndex; + + if ( parent.parentNode ) { + parent.parentNode.selectedIndex; + } + } + } + }; +} + +jQuery.each( [ + "tabIndex", + "readOnly", + "maxLength", + "cellSpacing", + "cellPadding", + "rowSpan", + "colSpan", + "useMap", + "frameBorder", + "contentEditable" +], function() { + jQuery.propFix[ this.toLowerCase() ] = this; +} ); + + + + + // Strip and collapse whitespace according to HTML spec + // https://infra.spec.whatwg.org/#strip-and-collapse-ascii-whitespace + function stripAndCollapse( value ) { + var tokens = value.match( rnothtmlwhite ) || []; + return tokens.join( " " ); + } + + +function getClass( elem ) { + return elem.getAttribute && elem.getAttribute( "class" ) || ""; +} + +function classesToArray( value ) { + if ( Array.isArray( value ) ) { + return value; + } + if ( typeof value === "string" ) { + return value.match( rnothtmlwhite ) || []; + } + return []; +} + 
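Reviewer note: the class-manipulation methods below all funnel their input through the two helpers just defined. A minimal standalone sketch (not part of the patch, runnable outside a browser) of that whitespace handling, reusing jQuery's internal `rnothtmlwhite` pattern:

```javascript
// Pattern matching runs of characters that are not HTML whitespace
// (space, tab, CR, LF, FF) — same as jQuery's internal rnothtmlwhite.
var rnothtmlwhite = /[^\x20\t\r\n\f]+/g;

// Strip leading/trailing and collapse inner ASCII whitespace,
// per https://infra.spec.whatwg.org/#strip-and-collapse-ascii-whitespace
function stripAndCollapse( value ) {
	var tokens = value.match( rnothtmlwhite ) || [];
	return tokens.join( " " );
}

// Normalize a class argument (array or space-separated string) to an array
function classesToArray( value ) {
	if ( Array.isArray( value ) ) {
		return value;
	}
	if ( typeof value === "string" ) {
		return value.match( rnothtmlwhite ) || [];
	}
	return [];
}

console.log( stripAndCollapse( "  foo \t bar\n" ) ); // "foo bar"
console.log( classesToArray( " a  b " ) );           // [ "a", "b" ]
```

This is why `addClass`/`removeClass` can compare `curValue !== finalValue` cheaply: both sides are already in collapsed, single-space-separated form.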
+jQuery.fn.extend( { + addClass: function( value ) { + var classes, elem, cur, curValue, clazz, j, finalValue, + i = 0; + + if ( isFunction( value ) ) { + return this.each( function( j ) { + jQuery( this ).addClass( value.call( this, j, getClass( this ) ) ); + } ); + } + + classes = classesToArray( value ); + + if ( classes.length ) { + while ( ( elem = this[ i++ ] ) ) { + curValue = getClass( elem ); + cur = elem.nodeType === 1 && ( " " + stripAndCollapse( curValue ) + " " ); + + if ( cur ) { + j = 0; + while ( ( clazz = classes[ j++ ] ) ) { + if ( cur.indexOf( " " + clazz + " " ) < 0 ) { + cur += clazz + " "; + } + } + + // Only assign if different to avoid unneeded rendering. + finalValue = stripAndCollapse( cur ); + if ( curValue !== finalValue ) { + elem.setAttribute( "class", finalValue ); + } + } + } + } + + return this; + }, + + removeClass: function( value ) { + var classes, elem, cur, curValue, clazz, j, finalValue, + i = 0; + + if ( isFunction( value ) ) { + return this.each( function( j ) { + jQuery( this ).removeClass( value.call( this, j, getClass( this ) ) ); + } ); + } + + if ( !arguments.length ) { + return this.attr( "class", "" ); + } + + classes = classesToArray( value ); + + if ( classes.length ) { + while ( ( elem = this[ i++ ] ) ) { + curValue = getClass( elem ); + + // This expression is here for better compressibility (see addClass) + cur = elem.nodeType === 1 && ( " " + stripAndCollapse( curValue ) + " " ); + + if ( cur ) { + j = 0; + while ( ( clazz = classes[ j++ ] ) ) { + + // Remove *all* instances + while ( cur.indexOf( " " + clazz + " " ) > -1 ) { + cur = cur.replace( " " + clazz + " ", " " ); + } + } + + // Only assign if different to avoid unneeded rendering. 
+ finalValue = stripAndCollapse( cur ); + if ( curValue !== finalValue ) { + elem.setAttribute( "class", finalValue ); + } + } + } + } + + return this; + }, + + toggleClass: function( value, stateVal ) { + var type = typeof value, + isValidValue = type === "string" || Array.isArray( value ); + + if ( typeof stateVal === "boolean" && isValidValue ) { + return stateVal ? this.addClass( value ) : this.removeClass( value ); + } + + if ( isFunction( value ) ) { + return this.each( function( i ) { + jQuery( this ).toggleClass( + value.call( this, i, getClass( this ), stateVal ), + stateVal + ); + } ); + } + + return this.each( function() { + var className, i, self, classNames; + + if ( isValidValue ) { + + // Toggle individual class names + i = 0; + self = jQuery( this ); + classNames = classesToArray( value ); + + while ( ( className = classNames[ i++ ] ) ) { + + // Check each className given, space separated list + if ( self.hasClass( className ) ) { + self.removeClass( className ); + } else { + self.addClass( className ); + } + } + + // Toggle whole class name + } else if ( value === undefined || type === "boolean" ) { + className = getClass( this ); + if ( className ) { + + // Store className if set + dataPriv.set( this, "__className__", className ); + } + + // If the element has a class name or if we're passed `false`, + // then remove the whole classname (if there was one, the above saved it). + // Otherwise bring back whatever was previously saved (if anything), + // falling back to the empty string if nothing was stored. + if ( this.setAttribute ) { + this.setAttribute( "class", + className || value === false ? 
+ "" : + dataPriv.get( this, "__className__" ) || "" + ); + } + } + } ); + }, + + hasClass: function( selector ) { + var className, elem, + i = 0; + + className = " " + selector + " "; + while ( ( elem = this[ i++ ] ) ) { + if ( elem.nodeType === 1 && + ( " " + stripAndCollapse( getClass( elem ) ) + " " ).indexOf( className ) > -1 ) { + return true; + } + } + + return false; + } +} ); + + + + +var rreturn = /\r/g; + +jQuery.fn.extend( { + val: function( value ) { + var hooks, ret, valueIsFunction, + elem = this[ 0 ]; + + if ( !arguments.length ) { + if ( elem ) { + hooks = jQuery.valHooks[ elem.type ] || + jQuery.valHooks[ elem.nodeName.toLowerCase() ]; + + if ( hooks && + "get" in hooks && + ( ret = hooks.get( elem, "value" ) ) !== undefined + ) { + return ret; + } + + ret = elem.value; + + // Handle most common string cases + if ( typeof ret === "string" ) { + return ret.replace( rreturn, "" ); + } + + // Handle cases where value is null/undef or number + return ret == null ? "" : ret; + } + + return; + } + + valueIsFunction = isFunction( value ); + + return this.each( function( i ) { + var val; + + if ( this.nodeType !== 1 ) { + return; + } + + if ( valueIsFunction ) { + val = value.call( this, i, jQuery( this ).val() ); + } else { + val = value; + } + + // Treat null/undefined as ""; convert numbers to string + if ( val == null ) { + val = ""; + + } else if ( typeof val === "number" ) { + val += ""; + + } else if ( Array.isArray( val ) ) { + val = jQuery.map( val, function( value ) { + return value == null ? "" : value + ""; + } ); + } + + hooks = jQuery.valHooks[ this.type ] || jQuery.valHooks[ this.nodeName.toLowerCase() ]; + + // If set returns undefined, fall back to normal setting + if ( !hooks || !( "set" in hooks ) || hooks.set( this, val, "value" ) === undefined ) { + this.value = val; + } + } ); + } +} ); + +jQuery.extend( { + valHooks: { + option: { + get: function( elem ) { + + var val = jQuery.find.attr( elem, "value" ); + return val != null ? 
+ val : + + // Support: IE <=10 - 11 only + // option.text throws exceptions (#14686, #14858) + // Strip and collapse whitespace + // https://html.spec.whatwg.org/#strip-and-collapse-whitespace + stripAndCollapse( jQuery.text( elem ) ); + } + }, + select: { + get: function( elem ) { + var value, option, i, + options = elem.options, + index = elem.selectedIndex, + one = elem.type === "select-one", + values = one ? null : [], + max = one ? index + 1 : options.length; + + if ( index < 0 ) { + i = max; + + } else { + i = one ? index : 0; + } + + // Loop through all the selected options + for ( ; i < max; i++ ) { + option = options[ i ]; + + // Support: IE <=9 only + // IE8-9 doesn't update selected after form reset (#2551) + if ( ( option.selected || i === index ) && + + // Don't return options that are disabled or in a disabled optgroup + !option.disabled && + ( !option.parentNode.disabled || + !nodeName( option.parentNode, "optgroup" ) ) ) { + + // Get the specific value for the option + value = jQuery( option ).val(); + + // We don't need an array for one selects + if ( one ) { + return value; + } + + // Multi-Selects return an array + values.push( value ); + } + } + + return values; + }, + + set: function( elem, value ) { + var optionSet, option, + options = elem.options, + values = jQuery.makeArray( value ), + i = options.length; + + while ( i-- ) { + option = options[ i ]; + + /* eslint-disable no-cond-assign */ + + if ( option.selected = + jQuery.inArray( jQuery.valHooks.option.get( option ), values ) > -1 + ) { + optionSet = true; + } + + /* eslint-enable no-cond-assign */ + } + + // Force browsers to behave consistently when non-matching value is set + if ( !optionSet ) { + elem.selectedIndex = -1; + } + return values; + } + } + } +} ); + +// Radios and checkboxes getter/setter +jQuery.each( [ "radio", "checkbox" ], function() { + jQuery.valHooks[ this ] = { + set: function( elem, value ) { + if ( Array.isArray( value ) ) { + return ( elem.checked = 
jQuery.inArray( jQuery( elem ).val(), value ) > -1 ); + } + } + }; + if ( !support.checkOn ) { + jQuery.valHooks[ this ].get = function( elem ) { + return elem.getAttribute( "value" ) === null ? "on" : elem.value; + }; + } +} ); + + + + +// Return jQuery for attributes-only inclusion + + +support.focusin = "onfocusin" in window; + + +var rfocusMorph = /^(?:focusinfocus|focusoutblur)$/, + stopPropagationCallback = function( e ) { + e.stopPropagation(); + }; + +jQuery.extend( jQuery.event, { + + trigger: function( event, data, elem, onlyHandlers ) { + + var i, cur, tmp, bubbleType, ontype, handle, special, lastElement, + eventPath = [ elem || document ], + type = hasOwn.call( event, "type" ) ? event.type : event, + namespaces = hasOwn.call( event, "namespace" ) ? event.namespace.split( "." ) : []; + + cur = lastElement = tmp = elem = elem || document; + + // Don't do events on text and comment nodes + if ( elem.nodeType === 3 || elem.nodeType === 8 ) { + return; + } + + // focus/blur morphs to focusin/out; ensure we're not firing them right now + if ( rfocusMorph.test( type + jQuery.event.triggered ) ) { + return; + } + + if ( type.indexOf( "." ) > -1 ) { + + // Namespaced trigger; create a regexp to match event type in handle() + namespaces = type.split( "." ); + type = namespaces.shift(); + namespaces.sort(); + } + ontype = type.indexOf( ":" ) < 0 && "on" + type; + + // Caller can pass in a jQuery.Event object, Object, or just an event type string + event = event[ jQuery.expando ] ? + event : + new jQuery.Event( type, typeof event === "object" && event ); + + // Trigger bitmask: & 1 for native handlers; & 2 for jQuery (always true) + event.isTrigger = onlyHandlers ? 2 : 3; + event.namespace = namespaces.join( "." ); + event.rnamespace = event.namespace ? 
+ new RegExp( "(^|\\.)" + namespaces.join( "\\.(?:.*\\.|)" ) + "(\\.|$)" ) : + null; + + // Clean up the event in case it is being reused + event.result = undefined; + if ( !event.target ) { + event.target = elem; + } + + // Clone any incoming data and prepend the event, creating the handler arg list + data = data == null ? + [ event ] : + jQuery.makeArray( data, [ event ] ); + + // Allow special events to draw outside the lines + special = jQuery.event.special[ type ] || {}; + if ( !onlyHandlers && special.trigger && special.trigger.apply( elem, data ) === false ) { + return; + } + + // Determine event propagation path in advance, per W3C events spec (#9951) + // Bubble up to document, then to window; watch for a global ownerDocument var (#9724) + if ( !onlyHandlers && !special.noBubble && !isWindow( elem ) ) { + + bubbleType = special.delegateType || type; + if ( !rfocusMorph.test( bubbleType + type ) ) { + cur = cur.parentNode; + } + for ( ; cur; cur = cur.parentNode ) { + eventPath.push( cur ); + tmp = cur; + } + + // Only add window if we got to document (e.g., not plain obj or detached DOM) + if ( tmp === ( elem.ownerDocument || document ) ) { + eventPath.push( tmp.defaultView || tmp.parentWindow || window ); + } + } + + // Fire handlers on the event path + i = 0; + while ( ( cur = eventPath[ i++ ] ) && !event.isPropagationStopped() ) { + lastElement = cur; + event.type = i > 1 ? 
+ bubbleType : + special.bindType || type; + + // jQuery handler + handle = ( dataPriv.get( cur, "events" ) || Object.create( null ) )[ event.type ] && + dataPriv.get( cur, "handle" ); + if ( handle ) { + handle.apply( cur, data ); + } + + // Native handler + handle = ontype && cur[ ontype ]; + if ( handle && handle.apply && acceptData( cur ) ) { + event.result = handle.apply( cur, data ); + if ( event.result === false ) { + event.preventDefault(); + } + } + } + event.type = type; + + // If nobody prevented the default action, do it now + if ( !onlyHandlers && !event.isDefaultPrevented() ) { + + if ( ( !special._default || + special._default.apply( eventPath.pop(), data ) === false ) && + acceptData( elem ) ) { + + // Call a native DOM method on the target with the same name as the event. + // Don't do default actions on window, that's where global variables be (#6170) + if ( ontype && isFunction( elem[ type ] ) && !isWindow( elem ) ) { + + // Don't re-trigger an onFOO event when we call its FOO() method + tmp = elem[ ontype ]; + + if ( tmp ) { + elem[ ontype ] = null; + } + + // Prevent re-triggering of the same event, since we already bubbled it above + jQuery.event.triggered = type; + + if ( event.isPropagationStopped() ) { + lastElement.addEventListener( type, stopPropagationCallback ); + } + + elem[ type ](); + + if ( event.isPropagationStopped() ) { + lastElement.removeEventListener( type, stopPropagationCallback ); + } + + jQuery.event.triggered = undefined; + + if ( tmp ) { + elem[ ontype ] = tmp; + } + } + } + } + + return event.result; + }, + + // Piggyback on a donor event to simulate a different one + // Used only for `focus(in | out)` events + simulate: function( type, elem, event ) { + var e = jQuery.extend( + new jQuery.Event(), + event, + { + type: type, + isSimulated: true + } + ); + + jQuery.event.trigger( e, null, elem ); + } + +} ); + +jQuery.fn.extend( { + + trigger: function( type, data ) { + return this.each( function() { + 
jQuery.event.trigger( type, data, this ); + } ); + }, + triggerHandler: function( type, data ) { + var elem = this[ 0 ]; + if ( elem ) { + return jQuery.event.trigger( type, data, elem, true ); + } + } +} ); + + +// Support: Firefox <=44 +// Firefox doesn't have focus(in | out) events +// Related ticket - https://bugzilla.mozilla.org/show_bug.cgi?id=687787 +// +// Support: Chrome <=48 - 49, Safari <=9.0 - 9.1 +// focus(in | out) events fire after focus & blur events, +// which is spec violation - http://www.w3.org/TR/DOM-Level-3-Events/#events-focusevent-event-order +// Related ticket - https://bugs.chromium.org/p/chromium/issues/detail?id=449857 +if ( !support.focusin ) { + jQuery.each( { focus: "focusin", blur: "focusout" }, function( orig, fix ) { + + // Attach a single capturing handler on the document while someone wants focusin/focusout + var handler = function( event ) { + jQuery.event.simulate( fix, event.target, jQuery.event.fix( event ) ); + }; + + jQuery.event.special[ fix ] = { + setup: function() { + + // Handle: regular nodes (via `this.ownerDocument`), window + // (via `this.document`) & document (via `this`). 
+ var doc = this.ownerDocument || this.document || this, + attaches = dataPriv.access( doc, fix ); + + if ( !attaches ) { + doc.addEventListener( orig, handler, true ); + } + dataPriv.access( doc, fix, ( attaches || 0 ) + 1 ); + }, + teardown: function() { + var doc = this.ownerDocument || this.document || this, + attaches = dataPriv.access( doc, fix ) - 1; + + if ( !attaches ) { + doc.removeEventListener( orig, handler, true ); + dataPriv.remove( doc, fix ); + + } else { + dataPriv.access( doc, fix, attaches ); + } + } + }; + } ); +} +var location = window.location; + +var nonce = { guid: Date.now() }; + +var rquery = ( /\?/ ); + + + +// Cross-browser xml parsing +jQuery.parseXML = function( data ) { + var xml, parserErrorElem; + if ( !data || typeof data !== "string" ) { + return null; + } + + // Support: IE 9 - 11 only + // IE throws on parseFromString with invalid input. + try { + xml = ( new window.DOMParser() ).parseFromString( data, "text/xml" ); + } catch ( e ) {} + + parserErrorElem = xml && xml.getElementsByTagName( "parsererror" )[ 0 ]; + if ( !xml || parserErrorElem ) { + jQuery.error( "Invalid XML: " + ( + parserErrorElem ? + jQuery.map( parserErrorElem.childNodes, function( el ) { + return el.textContent; + } ).join( "\n" ) : + data + ) ); + } + return xml; +}; + + +var + rbracket = /\[\]$/, + rCRLF = /\r?\n/g, + rsubmitterTypes = /^(?:submit|button|image|reset|file)$/i, + rsubmittable = /^(?:input|select|textarea|keygen)/i; + +function buildParams( prefix, obj, traditional, add ) { + var name; + + if ( Array.isArray( obj ) ) { + + // Serialize array item. + jQuery.each( obj, function( i, v ) { + if ( traditional || rbracket.test( prefix ) ) { + + // Treat each array item as a scalar. + add( prefix, v ); + + } else { + + // Item is non-scalar (array or object), encode its numeric index. + buildParams( + prefix + "[" + ( typeof v === "object" && v != null ? 
i : "" ) + "]", + v, + traditional, + add + ); + } + } ); + + } else if ( !traditional && toType( obj ) === "object" ) { + + // Serialize object item. + for ( name in obj ) { + buildParams( prefix + "[" + name + "]", obj[ name ], traditional, add ); + } + + } else { + + // Serialize scalar item. + add( prefix, obj ); + } +} + +// Serialize an array of form elements or a set of +// key/values into a query string +jQuery.param = function( a, traditional ) { + var prefix, + s = [], + add = function( key, valueOrFunction ) { + + // If value is a function, invoke it and use its return value + var value = isFunction( valueOrFunction ) ? + valueOrFunction() : + valueOrFunction; + + s[ s.length ] = encodeURIComponent( key ) + "=" + + encodeURIComponent( value == null ? "" : value ); + }; + + if ( a == null ) { + return ""; + } + + // If an array was passed in, assume that it is an array of form elements. + if ( Array.isArray( a ) || ( a.jquery && !jQuery.isPlainObject( a ) ) ) { + + // Serialize the form elements + jQuery.each( a, function() { + add( this.name, this.value ); + } ); + + } else { + + // If traditional, encode the "old" way (the way 1.3.2 or older + // did it), otherwise encode params recursively. + for ( prefix in a ) { + buildParams( prefix, a[ prefix ], traditional, add ); + } + } + + // Return the resulting serialization + return s.join( "&" ); +}; + +jQuery.fn.extend( { + serialize: function() { + return jQuery.param( this.serializeArray() ); + }, + serializeArray: function() { + return this.map( function() { + + // Can add propHook for "elements" to filter or add form elements + var elements = jQuery.prop( this, "elements" ); + return elements ? 
jQuery.makeArray( elements ) : this; + } ).filter( function() { + var type = this.type; + + // Use .is( ":disabled" ) so that fieldset[disabled] works + return this.name && !jQuery( this ).is( ":disabled" ) && + rsubmittable.test( this.nodeName ) && !rsubmitterTypes.test( type ) && + ( this.checked || !rcheckableType.test( type ) ); + } ).map( function( _i, elem ) { + var val = jQuery( this ).val(); + + if ( val == null ) { + return null; + } + + if ( Array.isArray( val ) ) { + return jQuery.map( val, function( val ) { + return { name: elem.name, value: val.replace( rCRLF, "\r\n" ) }; + } ); + } + + return { name: elem.name, value: val.replace( rCRLF, "\r\n" ) }; + } ).get(); + } +} ); + + +var + r20 = /%20/g, + rhash = /#.*$/, + rantiCache = /([?&])_=[^&]*/, + rheaders = /^(.*?):[ \t]*([^\r\n]*)$/mg, + + // #7653, #8125, #8152: local protocol detection + rlocalProtocol = /^(?:about|app|app-storage|.+-extension|file|res|widget):$/, + rnoContent = /^(?:GET|HEAD)$/, + rprotocol = /^\/\//, + + /* Prefilters + * 1) They are useful to introduce custom dataTypes (see ajax/jsonp.js for an example) + * 2) These are called: + * - BEFORE asking for a transport + * - AFTER param serialization (s.data is a string if s.processData is true) + * 3) key is the dataType + * 4) the catchall symbol "*" can be used + * 5) execution will start with transport dataType and THEN continue down to "*" if needed + */ + prefilters = {}, + + /* Transports bindings + * 1) key is the dataType + * 2) the catchall symbol "*" can be used + * 3) selection will start with transport dataType and THEN go to "*" if needed + */ + transports = {}, + + // Avoid comment-prolog char sequence (#10098); must appease lint and evade compression + allTypes = "*/".concat( "*" ), + + // Anchor tag for parsing the document origin + originAnchor = document.createElement( "a" ); + +originAnchor.href = location.href; + +// Base "constructor" for jQuery.ajaxPrefilter and jQuery.ajaxTransport +function 
addToPrefiltersOrTransports( structure ) { + + // dataTypeExpression is optional and defaults to "*" + return function( dataTypeExpression, func ) { + + if ( typeof dataTypeExpression !== "string" ) { + func = dataTypeExpression; + dataTypeExpression = "*"; + } + + var dataType, + i = 0, + dataTypes = dataTypeExpression.toLowerCase().match( rnothtmlwhite ) || []; + + if ( isFunction( func ) ) { + + // For each dataType in the dataTypeExpression + while ( ( dataType = dataTypes[ i++ ] ) ) { + + // Prepend if requested + if ( dataType[ 0 ] === "+" ) { + dataType = dataType.slice( 1 ) || "*"; + ( structure[ dataType ] = structure[ dataType ] || [] ).unshift( func ); + + // Otherwise append + } else { + ( structure[ dataType ] = structure[ dataType ] || [] ).push( func ); + } + } + } + }; +} + +// Base inspection function for prefilters and transports +function inspectPrefiltersOrTransports( structure, options, originalOptions, jqXHR ) { + + var inspected = {}, + seekingTransport = ( structure === transports ); + + function inspect( dataType ) { + var selected; + inspected[ dataType ] = true; + jQuery.each( structure[ dataType ] || [], function( _, prefilterOrFactory ) { + var dataTypeOrTransport = prefilterOrFactory( options, originalOptions, jqXHR ); + if ( typeof dataTypeOrTransport === "string" && + !seekingTransport && !inspected[ dataTypeOrTransport ] ) { + + options.dataTypes.unshift( dataTypeOrTransport ); + inspect( dataTypeOrTransport ); + return false; + } else if ( seekingTransport ) { + return !( selected = dataTypeOrTransport ); + } + } ); + return selected; + } + + return inspect( options.dataTypes[ 0 ] ) || !inspected[ "*" ] && inspect( "*" ); +} + +// A special extend for ajax options +// that takes "flat" options (not to be deep extended) +// Fixes #9887 +function ajaxExtend( target, src ) { + var key, deep, + flatOptions = jQuery.ajaxSettings.flatOptions || {}; + + for ( key in src ) { + if ( src[ key ] !== undefined ) { + ( flatOptions[ key ] ? 
target : ( deep || ( deep = {} ) ) )[ key ] = src[ key ]; + } + } + if ( deep ) { + jQuery.extend( true, target, deep ); + } + + return target; +} + +/* Handles responses to an ajax request: + * - finds the right dataType (mediates between content-type and expected dataType) + * - returns the corresponding response + */ +function ajaxHandleResponses( s, jqXHR, responses ) { + + var ct, type, finalDataType, firstDataType, + contents = s.contents, + dataTypes = s.dataTypes; + + // Remove auto dataType and get content-type in the process + while ( dataTypes[ 0 ] === "*" ) { + dataTypes.shift(); + if ( ct === undefined ) { + ct = s.mimeType || jqXHR.getResponseHeader( "Content-Type" ); + } + } + + // Check if we're dealing with a known content-type + if ( ct ) { + for ( type in contents ) { + if ( contents[ type ] && contents[ type ].test( ct ) ) { + dataTypes.unshift( type ); + break; + } + } + } + + // Check to see if we have a response for the expected dataType + if ( dataTypes[ 0 ] in responses ) { + finalDataType = dataTypes[ 0 ]; + } else { + + // Try convertible dataTypes + for ( type in responses ) { + if ( !dataTypes[ 0 ] || s.converters[ type + " " + dataTypes[ 0 ] ] ) { + finalDataType = type; + break; + } + if ( !firstDataType ) { + firstDataType = type; + } + } + + // Or just use first one + finalDataType = finalDataType || firstDataType; + } + + // If we found a dataType + // We add the dataType to the list if needed + // and return the corresponding response + if ( finalDataType ) { + if ( finalDataType !== dataTypes[ 0 ] ) { + dataTypes.unshift( finalDataType ); + } + return responses[ finalDataType ]; + } +} + +/* Chain conversions given the request and the original response + * Also sets the responseXXX fields on the jqXHR instance + */ +function ajaxConvert( s, response, jqXHR, isSuccess ) { + var conv2, current, conv, tmp, prev, + converters = {}, + + // Work with a copy of dataTypes in case we need to modify it for conversion + dataTypes = 
s.dataTypes.slice(); + + // Create converters map with lowercased keys + if ( dataTypes[ 1 ] ) { + for ( conv in s.converters ) { + converters[ conv.toLowerCase() ] = s.converters[ conv ]; + } + } + + current = dataTypes.shift(); + + // Convert to each sequential dataType + while ( current ) { + + if ( s.responseFields[ current ] ) { + jqXHR[ s.responseFields[ current ] ] = response; + } + + // Apply the dataFilter if provided + if ( !prev && isSuccess && s.dataFilter ) { + response = s.dataFilter( response, s.dataType ); + } + + prev = current; + current = dataTypes.shift(); + + if ( current ) { + + // There's only work to do if current dataType is non-auto + if ( current === "*" ) { + + current = prev; + + // Convert response if prev dataType is non-auto and differs from current + } else if ( prev !== "*" && prev !== current ) { + + // Seek a direct converter + conv = converters[ prev + " " + current ] || converters[ "* " + current ]; + + // If none found, seek a pair + if ( !conv ) { + for ( conv2 in converters ) { + + // If conv2 outputs current + tmp = conv2.split( " " ); + if ( tmp[ 1 ] === current ) { + + // If prev can be converted to accepted input + conv = converters[ prev + " " + tmp[ 0 ] ] || + converters[ "* " + tmp[ 0 ] ]; + if ( conv ) { + + // Condense equivalence converters + if ( conv === true ) { + conv = converters[ conv2 ]; + + // Otherwise, insert the intermediate dataType + } else if ( converters[ conv2 ] !== true ) { + current = tmp[ 0 ]; + dataTypes.unshift( tmp[ 1 ] ); + } + break; + } + } + } + } + + // Apply converter (if not an equivalence) + if ( conv !== true ) { + + // Unless errors are allowed to bubble, catch and return them + if ( conv && s.throws ) { + response = conv( response ); + } else { + try { + response = conv( response ); + } catch ( e ) { + return { + state: "parsererror", + error: conv ? 
e : "No conversion from " + prev + " to " + current + }; + } + } + } + } + } + } + + return { state: "success", data: response }; +} + +jQuery.extend( { + + // Counter for holding the number of active queries + active: 0, + + // Last-Modified header cache for next request + lastModified: {}, + etag: {}, + + ajaxSettings: { + url: location.href, + type: "GET", + isLocal: rlocalProtocol.test( location.protocol ), + global: true, + processData: true, + async: true, + contentType: "application/x-www-form-urlencoded; charset=UTF-8", + + /* + timeout: 0, + data: null, + dataType: null, + username: null, + password: null, + cache: null, + throws: false, + traditional: false, + headers: {}, + */ + + accepts: { + "*": allTypes, + text: "text/plain", + html: "text/html", + xml: "application/xml, text/xml", + json: "application/json, text/javascript" + }, + + contents: { + xml: /\bxml\b/, + html: /\bhtml/, + json: /\bjson\b/ + }, + + responseFields: { + xml: "responseXML", + text: "responseText", + json: "responseJSON" + }, + + // Data converters + // Keys separate source (or catchall "*") and destination types with a single space + converters: { + + // Convert anything to text + "* text": String, + + // Text to html (true = no transformation) + "text html": true, + + // Evaluate text as a json expression + "text json": JSON.parse, + + // Parse text as xml + "text xml": jQuery.parseXML + }, + + // For options that shouldn't be deep extended: + // you can add your own custom options here if + // and when you create one that shouldn't be + // deep extended (see ajaxExtend) + flatOptions: { + url: true, + context: true + } + }, + + // Creates a full fledged settings object into target + // with both ajaxSettings and settings fields. + // If target is omitted, writes into ajaxSettings. + ajaxSetup: function( target, settings ) { + return settings ? 
+ + // Building a settings object + ajaxExtend( ajaxExtend( target, jQuery.ajaxSettings ), settings ) : + + // Extending ajaxSettings + ajaxExtend( jQuery.ajaxSettings, target ); + }, + + ajaxPrefilter: addToPrefiltersOrTransports( prefilters ), + ajaxTransport: addToPrefiltersOrTransports( transports ), + + // Main method + ajax: function( url, options ) { + + // If url is an object, simulate pre-1.5 signature + if ( typeof url === "object" ) { + options = url; + url = undefined; + } + + // Force options to be an object + options = options || {}; + + var transport, + + // URL without anti-cache param + cacheURL, + + // Response headers + responseHeadersString, + responseHeaders, + + // timeout handle + timeoutTimer, + + // Url cleanup var + urlAnchor, + + // Request state (becomes false upon send and true upon completion) + completed, + + // To know if global events are to be dispatched + fireGlobals, + + // Loop variable + i, + + // uncached part of the url + uncached, + + // Create the final options object + s = jQuery.ajaxSetup( {}, options ), + + // Callbacks context + callbackContext = s.context || s, + + // Context for global events is callbackContext if it is a DOM node or jQuery collection + globalEventContext = s.context && + ( callbackContext.nodeType || callbackContext.jquery ) ? 
+ jQuery( callbackContext ) : + jQuery.event, + + // Deferreds + deferred = jQuery.Deferred(), + completeDeferred = jQuery.Callbacks( "once memory" ), + + // Status-dependent callbacks + statusCode = s.statusCode || {}, + + // Headers (they are sent all at once) + requestHeaders = {}, + requestHeadersNames = {}, + + // Default abort message + strAbort = "canceled", + + // Fake xhr + jqXHR = { + readyState: 0, + + // Builds headers hashtable if needed + getResponseHeader: function( key ) { + var match; + if ( completed ) { + if ( !responseHeaders ) { + responseHeaders = {}; + while ( ( match = rheaders.exec( responseHeadersString ) ) ) { + responseHeaders[ match[ 1 ].toLowerCase() + " " ] = + ( responseHeaders[ match[ 1 ].toLowerCase() + " " ] || [] ) + .concat( match[ 2 ] ); + } + } + match = responseHeaders[ key.toLowerCase() + " " ]; + } + return match == null ? null : match.join( ", " ); + }, + + // Raw string + getAllResponseHeaders: function() { + return completed ? responseHeadersString : null; + }, + + // Caches the header + setRequestHeader: function( name, value ) { + if ( completed == null ) { + name = requestHeadersNames[ name.toLowerCase() ] = + requestHeadersNames[ name.toLowerCase() ] || name; + requestHeaders[ name ] = value; + } + return this; + }, + + // Overrides response content-type header + overrideMimeType: function( type ) { + if ( completed == null ) { + s.mimeType = type; + } + return this; + }, + + // Status-dependent callbacks + statusCode: function( map ) { + var code; + if ( map ) { + if ( completed ) { + + // Execute the appropriate callbacks + jqXHR.always( map[ jqXHR.status ] ); + } else { + + // Lazy-add the new callbacks in a way that preserves old ones + for ( code in map ) { + statusCode[ code ] = [ statusCode[ code ], map[ code ] ]; + } + } + } + return this; + }, + + // Cancel the request + abort: function( statusText ) { + var finalText = statusText || strAbort; + if ( transport ) { + transport.abort( finalText ); + } + done( 
0, finalText ); + return this; + } + }; + + // Attach deferreds + deferred.promise( jqXHR ); + + // Add protocol if not provided (prefilters might expect it) + // Handle falsy url in the settings object (#10093: consistency with old signature) + // We also use the url parameter if available + s.url = ( ( url || s.url || location.href ) + "" ) + .replace( rprotocol, location.protocol + "//" ); + + // Alias method option to type as per ticket #12004 + s.type = options.method || options.type || s.method || s.type; + + // Extract dataTypes list + s.dataTypes = ( s.dataType || "*" ).toLowerCase().match( rnothtmlwhite ) || [ "" ]; + + // A cross-domain request is in order when the origin doesn't match the current origin. + if ( s.crossDomain == null ) { + urlAnchor = document.createElement( "a" ); + + // Support: IE <=8 - 11, Edge 12 - 15 + // IE throws exception on accessing the href property if url is malformed, + // e.g. http://example.com:80x/ + try { + urlAnchor.href = s.url; + + // Support: IE <=8 - 11 only + // Anchor's host property isn't correctly set when s.url is relative + urlAnchor.href = urlAnchor.href; + s.crossDomain = originAnchor.protocol + "//" + originAnchor.host !== + urlAnchor.protocol + "//" + urlAnchor.host; + } catch ( e ) { + + // If there is an error parsing the URL, assume it is crossDomain, + // it can be rejected by the transport if it is invalid + s.crossDomain = true; + } + } + + // Convert data if not already a string + if ( s.data && s.processData && typeof s.data !== "string" ) { + s.data = jQuery.param( s.data, s.traditional ); + } + + // Apply prefilters + inspectPrefiltersOrTransports( prefilters, s, options, jqXHR ); + + // If request was aborted inside a prefilter, stop there + if ( completed ) { + return jqXHR; + } + + // We can fire global events as of now if asked to + // Don't fire events if jQuery.event is undefined in an AMD-usage scenario (#15118) + fireGlobals = jQuery.event && s.global; + + // Watch for a new set of 
requests + if ( fireGlobals && jQuery.active++ === 0 ) { + jQuery.event.trigger( "ajaxStart" ); + } + + // Uppercase the type + s.type = s.type.toUpperCase(); + + // Determine if request has content + s.hasContent = !rnoContent.test( s.type ); + + // Save the URL in case we're toying with the If-Modified-Since + // and/or If-None-Match header later on + // Remove hash to simplify url manipulation + cacheURL = s.url.replace( rhash, "" ); + + // More options handling for requests with no content + if ( !s.hasContent ) { + + // Remember the hash so we can put it back + uncached = s.url.slice( cacheURL.length ); + + // If data is available and should be processed, append data to url + if ( s.data && ( s.processData || typeof s.data === "string" ) ) { + cacheURL += ( rquery.test( cacheURL ) ? "&" : "?" ) + s.data; + + // #9682: remove data so that it's not used in an eventual retry + delete s.data; + } + + // Add or update anti-cache param if needed + if ( s.cache === false ) { + cacheURL = cacheURL.replace( rantiCache, "$1" ); + uncached = ( rquery.test( cacheURL ) ? "&" : "?" ) + "_=" + ( nonce.guid++ ) + + uncached; + } + + // Put hash and anti-cache on the URL that will be requested (gh-1732) + s.url = cacheURL + uncached; + + // Change '%20' to '+' if this is encoded form body content (gh-2658) + } else if ( s.data && s.processData && + ( s.contentType || "" ).indexOf( "application/x-www-form-urlencoded" ) === 0 ) { + s.data = s.data.replace( r20, "+" ); + } + + // Set the If-Modified-Since and/or If-None-Match header, if in ifModified mode. 
+ if ( s.ifModified ) { + if ( jQuery.lastModified[ cacheURL ] ) { + jqXHR.setRequestHeader( "If-Modified-Since", jQuery.lastModified[ cacheURL ] ); + } + if ( jQuery.etag[ cacheURL ] ) { + jqXHR.setRequestHeader( "If-None-Match", jQuery.etag[ cacheURL ] ); + } + } + + // Set the correct header, if data is being sent + if ( s.data && s.hasContent && s.contentType !== false || options.contentType ) { + jqXHR.setRequestHeader( "Content-Type", s.contentType ); + } + + // Set the Accepts header for the server, depending on the dataType + jqXHR.setRequestHeader( + "Accept", + s.dataTypes[ 0 ] && s.accepts[ s.dataTypes[ 0 ] ] ? + s.accepts[ s.dataTypes[ 0 ] ] + + ( s.dataTypes[ 0 ] !== "*" ? ", " + allTypes + "; q=0.01" : "" ) : + s.accepts[ "*" ] + ); + + // Check for headers option + for ( i in s.headers ) { + jqXHR.setRequestHeader( i, s.headers[ i ] ); + } + + // Allow custom headers/mimetypes and early abort + if ( s.beforeSend && + ( s.beforeSend.call( callbackContext, jqXHR, s ) === false || completed ) ) { + + // Abort if not done already and return + return jqXHR.abort(); + } + + // Aborting is no longer a cancellation + strAbort = "abort"; + + // Install callbacks on deferreds + completeDeferred.add( s.complete ); + jqXHR.done( s.success ); + jqXHR.fail( s.error ); + + // Get transport + transport = inspectPrefiltersOrTransports( transports, s, options, jqXHR ); + + // If no transport, we auto-abort + if ( !transport ) { + done( -1, "No Transport" ); + } else { + jqXHR.readyState = 1; + + // Send global event + if ( fireGlobals ) { + globalEventContext.trigger( "ajaxSend", [ jqXHR, s ] ); + } + + // If request was aborted inside ajaxSend, stop there + if ( completed ) { + return jqXHR; + } + + // Timeout + if ( s.async && s.timeout > 0 ) { + timeoutTimer = window.setTimeout( function() { + jqXHR.abort( "timeout" ); + }, s.timeout ); + } + + try { + completed = false; + transport.send( requestHeaders, done ); + } catch ( e ) { + + // Rethrow post-completion 
exceptions + if ( completed ) { + throw e; + } + + // Propagate others as results + done( -1, e ); + } + } + + // Callback for when everything is done + function done( status, nativeStatusText, responses, headers ) { + var isSuccess, success, error, response, modified, + statusText = nativeStatusText; + + // Ignore repeat invocations + if ( completed ) { + return; + } + + completed = true; + + // Clear timeout if it exists + if ( timeoutTimer ) { + window.clearTimeout( timeoutTimer ); + } + + // Dereference transport for early garbage collection + // (no matter how long the jqXHR object will be used) + transport = undefined; + + // Cache response headers + responseHeadersString = headers || ""; + + // Set readyState + jqXHR.readyState = status > 0 ? 4 : 0; + + // Determine if successful + isSuccess = status >= 200 && status < 300 || status === 304; + + // Get response data + if ( responses ) { + response = ajaxHandleResponses( s, jqXHR, responses ); + } + + // Use a noop converter for missing script but not if jsonp + if ( !isSuccess && + jQuery.inArray( "script", s.dataTypes ) > -1 && + jQuery.inArray( "json", s.dataTypes ) < 0 ) { + s.converters[ "text script" ] = function() {}; + } + + // Convert no matter what (that way responseXXX fields are always set) + response = ajaxConvert( s, response, jqXHR, isSuccess ); + + // If successful, handle type chaining + if ( isSuccess ) { + + // Set the If-Modified-Since and/or If-None-Match header, if in ifModified mode. 
+ if ( s.ifModified ) { + modified = jqXHR.getResponseHeader( "Last-Modified" ); + if ( modified ) { + jQuery.lastModified[ cacheURL ] = modified; + } + modified = jqXHR.getResponseHeader( "etag" ); + if ( modified ) { + jQuery.etag[ cacheURL ] = modified; + } + } + + // if no content + if ( status === 204 || s.type === "HEAD" ) { + statusText = "nocontent"; + + // if not modified + } else if ( status === 304 ) { + statusText = "notmodified"; + + // If we have data, let's convert it + } else { + statusText = response.state; + success = response.data; + error = response.error; + isSuccess = !error; + } + } else { + + // Extract error from statusText and normalize for non-aborts + error = statusText; + if ( status || !statusText ) { + statusText = "error"; + if ( status < 0 ) { + status = 0; + } + } + } + + // Set data for the fake xhr object + jqXHR.status = status; + jqXHR.statusText = ( nativeStatusText || statusText ) + ""; + + // Success/Error + if ( isSuccess ) { + deferred.resolveWith( callbackContext, [ success, statusText, jqXHR ] ); + } else { + deferred.rejectWith( callbackContext, [ jqXHR, statusText, error ] ); + } + + // Status-dependent callbacks + jqXHR.statusCode( statusCode ); + statusCode = undefined; + + if ( fireGlobals ) { + globalEventContext.trigger( isSuccess ? "ajaxSuccess" : "ajaxError", + [ jqXHR, s, isSuccess ? 
success : error ] ); + } + + // Complete + completeDeferred.fireWith( callbackContext, [ jqXHR, statusText ] ); + + if ( fireGlobals ) { + globalEventContext.trigger( "ajaxComplete", [ jqXHR, s ] ); + + // Handle the global AJAX counter + if ( !( --jQuery.active ) ) { + jQuery.event.trigger( "ajaxStop" ); + } + } + } + + return jqXHR; + }, + + getJSON: function( url, data, callback ) { + return jQuery.get( url, data, callback, "json" ); + }, + + getScript: function( url, callback ) { + return jQuery.get( url, undefined, callback, "script" ); + } +} ); + +jQuery.each( [ "get", "post" ], function( _i, method ) { + jQuery[ method ] = function( url, data, callback, type ) { + + // Shift arguments if data argument was omitted + if ( isFunction( data ) ) { + type = type || callback; + callback = data; + data = undefined; + } + + // The url can be an options object (which then must have .url) + return jQuery.ajax( jQuery.extend( { + url: url, + type: method, + dataType: type, + data: data, + success: callback + }, jQuery.isPlainObject( url ) && url ) ); + }; +} ); + +jQuery.ajaxPrefilter( function( s ) { + var i; + for ( i in s.headers ) { + if ( i.toLowerCase() === "content-type" ) { + s.contentType = s.headers[ i ] || ""; + } + } +} ); + + +jQuery._evalUrl = function( url, options, doc ) { + return jQuery.ajax( { + url: url, + + // Make this explicit, since user can override this through ajaxSetup (#11264) + type: "GET", + dataType: "script", + cache: true, + async: false, + global: false, + + // Only evaluate the response if it is successful (gh-4126) + // dataFilter is not invoked for failure responses, so using it instead + // of the default converter is kludgy but it works. 
+ converters: { + "text script": function() {} + }, + dataFilter: function( response ) { + jQuery.globalEval( response, options, doc ); + } + } ); +}; + + +jQuery.fn.extend( { + wrapAll: function( html ) { + var wrap; + + if ( this[ 0 ] ) { + if ( isFunction( html ) ) { + html = html.call( this[ 0 ] ); + } + + // The elements to wrap the target around + wrap = jQuery( html, this[ 0 ].ownerDocument ).eq( 0 ).clone( true ); + + if ( this[ 0 ].parentNode ) { + wrap.insertBefore( this[ 0 ] ); + } + + wrap.map( function() { + var elem = this; + + while ( elem.firstElementChild ) { + elem = elem.firstElementChild; + } + + return elem; + } ).append( this ); + } + + return this; + }, + + wrapInner: function( html ) { + if ( isFunction( html ) ) { + return this.each( function( i ) { + jQuery( this ).wrapInner( html.call( this, i ) ); + } ); + } + + return this.each( function() { + var self = jQuery( this ), + contents = self.contents(); + + if ( contents.length ) { + contents.wrapAll( html ); + + } else { + self.append( html ); + } + } ); + }, + + wrap: function( html ) { + var htmlIsFunction = isFunction( html ); + + return this.each( function( i ) { + jQuery( this ).wrapAll( htmlIsFunction ? 
html.call( this, i ) : html ); + } ); + }, + + unwrap: function( selector ) { + this.parent( selector ).not( "body" ).each( function() { + jQuery( this ).replaceWith( this.childNodes ); + } ); + return this; + } +} ); + + +jQuery.expr.pseudos.hidden = function( elem ) { + return !jQuery.expr.pseudos.visible( elem ); +}; +jQuery.expr.pseudos.visible = function( elem ) { + return !!( elem.offsetWidth || elem.offsetHeight || elem.getClientRects().length ); +}; + + + + +jQuery.ajaxSettings.xhr = function() { + try { + return new window.XMLHttpRequest(); + } catch ( e ) {} +}; + +var xhrSuccessStatus = { + + // File protocol always yields status code 0, assume 200 + 0: 200, + + // Support: IE <=9 only + // #1450: sometimes IE returns 1223 when it should be 204 + 1223: 204 + }, + xhrSupported = jQuery.ajaxSettings.xhr(); + +support.cors = !!xhrSupported && ( "withCredentials" in xhrSupported ); +support.ajax = xhrSupported = !!xhrSupported; + +jQuery.ajaxTransport( function( options ) { + var callback, errorCallback; + + // Cross domain only allowed if supported through XMLHttpRequest + if ( support.cors || xhrSupported && !options.crossDomain ) { + return { + send: function( headers, complete ) { + var i, + xhr = options.xhr(); + + xhr.open( + options.type, + options.url, + options.async, + options.username, + options.password + ); + + // Apply custom fields if provided + if ( options.xhrFields ) { + for ( i in options.xhrFields ) { + xhr[ i ] = options.xhrFields[ i ]; + } + } + + // Override mime type if needed + if ( options.mimeType && xhr.overrideMimeType ) { + xhr.overrideMimeType( options.mimeType ); + } + + // X-Requested-With header + // For cross-domain requests, seeing as conditions for a preflight are + // akin to a jigsaw puzzle, we simply never set it to be sure. + // (it can always be set on a per-request basis or even using ajaxSetup) + // For same-domain requests, won't change header if already provided. 

    dtComb: A Comprehensive R Library and Web Tool for Combining Diagnostic Tests


The combination of diagnostic tests has become a crucial area of research, aiming to improve the accuracy and robustness of medical diagnostics. While existing tools focus primarily on linear combination methods, there is a lack of comprehensive tools that integrate diverse methodologies. In this study, we present dtComb, a comprehensive R package and web tool designed to address the limitations of existing diagnostic test combination platforms. One of the unique contributions of dtComb is offering a range of 142 methods to combine two diagnostic tests, including linear and non-linear methods, machine-learning algorithms, and mathematical operators. Another significant contribution of dtComb is its inclusion of advanced tools for ROC analysis, diagnostic performance metrics, and visual outputs such as sensitivity-specificity curves. Furthermore, dtComb offers classification functions for new observations, making it an easy-to-use tool for clinicians and researchers. The web-based version is also available at https://biotools.erciyes.edu.tr/dtComb/ for non-R users, providing an intuitive interface for test combination and model training.


    1 Introduction


A scenario often encountered in combining diagnostic tests is one in which the gold standard is binary and two continuous diagnostic tests are available. In such cases, clinicians usually compare the two diagnostic tests and try to improve diagnostic performance by combining their results as a simple ratio (Nyblom et al. 2006; Faria et al. 2016; Müller et al. 2019). However, this technique is straightforward and may not fully capture all potential interactions and relationships between the diagnostic tests. Linear combination methods have been developed to overcome such problems (Ertürk Zararsız 2023).
Linear methods combine two diagnostic tests into a single score/index by assigning weights to each test, optimizing their performance in diagnosing the condition of interest (Neumann et al. 2023). Such methods improve accuracy by leveraging the strengths of both tests (Bansal and Sullivan Pepe 2013; Aznar-Gimeno et al. 2022). For instance, Su and Liu (Su and Liu 1993) found that Fisher's linear discriminant function generates a linear combination of markers with either proportional or disproportional covariance matrices, aiming to maximize sensitivity consistently across the entire specificity spectrum under a multivariate normal distribution model. In contrast, another approach, introduced by Pepe and Thompson (Pepe and Thompson 2000), relies on ranking scores, eliminating the need for distributional assumptions when combining diagnostic tests. Despite these theoretical advances, existing tools contain only a limited number of methods. For instance, Kramar et al. developed a computer program called mROC that includes only the Su and Liu method (Kramar et al. 2001). Pérez-Fernández et al. presented the movieROC R package, which includes the Su and Liu, min-max, and logistic regression methods (Pérez-Fernández et al. 2021). An R package called maxmzpAUC that includes similar methods was developed by Yu and Park (Yu and Park 2015).


On the other hand, non-linear approaches incorporating the non-linearity between the diagnostic tests have been developed and employed to integrate the diagnostic tests (Ghosh and Chinnaiyan 2005; Du et al. 2024). These approaches incorporate the non-linear structure of the tests into the model, which might improve the accuracy and reliability of the diagnosis. Although some existing packages permit the use of non-linear approaches such as splines, lasso, and ridge regression, there is currently no package that employs these methods directly for combination and reports diagnostic performance. Machine-learning (ML) algorithms have recently been adopted to combine diagnostic tests (Agarwal et al. 2023; Prinzi et al. 2023; Ahsan et al. 2024; Sewak et al. 2024). Many studies focus on implementing ML algorithms for diagnostic tests (Zararsiz et al. 2016; Salvetat et al. 2022, 2024; Ganapathy et al. 2023; Alzyoud et al. 2024). For instance, DeGroat et al. applied four classification algorithms (Random Forest, Support Vector Machine, Extreme Gradient Boosting Decision Trees, and k-Nearest Neighbors) to combine markers for the diagnosis of cardiovascular disease (DeGroat et al. 2024). The results showed that patients with cardiovascular disease can be diagnosed with up to 96% accuracy using these ML techniques. There are numerous libraries in which ML methods are implemented (scikit-learn (Pedregosa et al. 2011), TensorFlow (Abadi et al. 2015), caret (Kuhn 2008)). The caret library is one of the most comprehensive tools developed in the R language (Kuhn 2008). However, these are general-purpose ML tools and do not directly combine two diagnostic tests or provide diagnostic performance measures.


Apart from the aforementioned methods, several basic mathematical operations such as addition, multiplication, subtraction, and division can also be used to combine markers (Luo et al. 2024; Serban et al. 2024; Svart et al. 2024). For instance, addition can enhance diagnostic sensitivity by combining the effects of markers, whereas subtraction can more distinctly differentiate disease states by highlighting the difference between markers. On the other hand, there are several commercial (e.g., IBM SPSS, MedCalc, Stata) and open-source R packages (ROCR (Sing et al. 2005), pROC (Robin et al. 2011), PRROC (Grau et al. 2015), plotROC (Sachs 2017)) that researchers can use for receiver operating characteristic (ROC) curve analysis. However, these tools are designed to perform single-marker ROC analysis. As a result, there is currently no software tool that covers almost all combination methods.


In this study, we developed dtComb, an R package encompassing nearly all existing combination approaches in the literature. dtComb has two key advantages that make it easy to apply and superior to other packages: (1) it provides users with a comprehensive set of 142 methods, including linear and non-linear approaches, ML approaches, and mathematical operators; (2) it offers turnkey solutions, from uploading data through analysis, performance evaluation, and reporting. Furthermore, it is the only package that implements linear approaches such as Minimax and Todor & Saplacan (Todor et al. 2014; Sameera et al. 2016). In addition, it allows for the classification of new, previously unseen observations using trained models. To our knowledge, no other tool was designed to combine two diagnostic tests on a single platform with 142 different methods. In other words, dtComb makes more effective and robust combination methods ready for application in place of traditional approaches such as simple ratio-based methods. First, we review the theoretical basis of the combination methods; then, we present an example implementation to demonstrate the applicability of the package. Finally, we present a user-friendly, up-to-date, and comprehensive web tool developed to make dtComb accessible to physicians and healthcare professionals who do not use the R programming language. The dtComb package is freely available on CRAN, the web application is freely available at https://biotools.erciyes.edu.tr/dtComb/, and all source code is available on GitHub.


    2 Material and methods


This section will provide an overview of the combination methods implemented in the literature. Before applying these methods, we will also discuss the standardization techniques available for the markers, the resampling methods used during model training, and, ultimately, the metrics used to evaluate the model's performance.


    Combination approaches

Linear combination methods

The dtComb package comprises eight distinct linear combination methods, which will be elaborated in this section. Before investigating these methods, we briefly introduce some notations that will be used throughout this section.
Notations: Let \(D_{i}, i = 1, 2, \ldots, n_1\) be the marker values of the \(i\)th individual in the diseased group, where \(D_i=(D_{i1},D_{i2})\), and let \(H_j, j=1,2,\ldots,n_2\) be the marker values of the \(j\)th individual in the healthy group, where \(H_j=(H_{j1},H_{j2})\). Let \(x_{i1}=c(D_{i1},H_{j1})\) denote the values of the first marker and \(x_{i2}=c(D_{i2},H_{j2})\) the values of the second marker for the \(i\)th individual, \(i=1,2,\ldots,n\). Let \(D_{i,min}=\min(D_{i1},D_{i2})\), \(D_{i,max}=\max(D_{i1},D_{i2})\), \(H_{j,min}=\min(H_{j1},H_{j2})\), \(H_{j,max}=\max(H_{j1},H_{j2})\), and let \(c_i\) be the resulting combination score of the \(i\)th individual.

• Logistic regression: Logistic regression is a statistical method used for binary classification. The logistic regression model estimates the probability of the binary outcome occurring based on the values of the independent variables. It is one of the most commonly applied methods in diagnostic testing, and it generates a linear combination of markers that can distinguish between control and diseased individuals. Logistic regression is generally less effective than normality-based discriminant analysis, such as Su and Liu's multivariate normality-based method, when the normality assumption is met (Efron 1975; Ruiz-Velasco 1991). On the other hand, others have argued that logistic regression is more robust because it does not require any assumptions about the joint distribution of the markers (Cox and Snell 1989). Therefore, it is essential to investigate the performance of linear combination methods derived from the logistic regression approach with non-normally distributed data. The logistic regression coefficients are estimated by maximizing the logistic likelihood function.


\[\label{eq:1}
c=\frac{\exp(\beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2})}{1+\exp(\beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2})} \tag{1}\]
The logistic regression coefficients are obtained by maximum likelihood estimation, producing an easily interpretable score for distinguishing between the two groups.

• Scoring based on logistic regression: The method primarily uses a binary logistic regression model, with slight modifications to enhance the combination score. The regression coefficients, as predicted in Eq (1), are rounded to a user-specified number of decimal places and subsequently used to calculate the combination score (León et al. 2006):
\[c= \beta_1 x_{i1}+\beta_2 x_{i2}\]

• Pepe & Thompson's method: Pepe & Thompson aimed to maximize the AUC or partial AUC when combining diagnostic tests, regardless of the distribution of the markers (Pepe and Thompson 2000). They developed an empirical solution for the optimal linear combination that maximizes the Mann-Whitney U statistic, an empirical estimate of the area under the ROC curve. Notably, this approach is distribution-free. Mathematically, they maximized the following objective function:
\[\text{maximize} \; U(\alpha)= \frac{1}{n_1 n_2} \sum_{i=1}^{n_1} \sum_{j=1}^{n_2} I\left[D_{i1}+\alpha D_{i2}\geq H_{j1}+\alpha H_{j2}\right]\]


\[c= x_{i1}+\alpha x_{i2}
\label{eq:4} \tag{2}\]
where \(\alpha \in [-1,1]\) is interpreted as the relative weight of \(x_{i2}\) to \(x_{i1}\) in the combination, i.e., the weight of the second marker. The aim is to find the \(\alpha\) that maximizes \(U(\alpha)\). Readers are referred to Pepe and Thompson (Pepe and Thompson 2000) for details.

• Pepe, Cai & Longton's method: Pepe et al. observed that when the disease status and the marker levels conform to a generalized linear model, the regression coefficients give the optimal linear combination that maximizes the area under the ROC curve (Pepe et al. 2006). The following objective function is maximized to achieve a higher AUC value:
\[\text{maximize} \; U(\alpha)= \frac{1}{n_1 n_2} \sum_{i=1}^{n_1} \sum_{j=1}^{n_2} I\left[D_{i1}+\alpha D_{i2}> H_{j1}+\alpha H_{j2}\right] + \frac{1}{2}I\left[D_{i1}+\alpha D_{i2} = H_{j1} + \alpha H_{j2}\right]\]
Before calculating the combination score using Eq (2), the marker values are normalized or scaled to lie between 0 and 1. In addition, the estimate obtained by maximizing the empirical AUC can be viewed as a special case of the maximum rank correlation estimator, for which a general asymptotic distribution theory has been developed. Readers are referred to Pepe (2003, Chapters 4–6) for a review of the ROC curve approach and further details (Pepe 2003).

• Min-Max method: The Pepe & Thompson method is straightforward when there are two markers but computationally challenging when more than two markers are to be combined. To overcome this computational complexity, Liu et al. (Liu et al. 2011) proposed a non-parametric approach that linearly combines the minimum and maximum values of the observed markers of each subject. This approach, which does not rely on any assumption about the distribution of the data (i.e., it is distribution-free), is known as the Min-Max method and may provide higher sensitivity than any single marker. The objective function of the Min-Max method is as follows:
\[\text{maximize} \; U(\alpha)= \frac{1}{n_1 n_2} \sum_{i=1}^{n_1} \sum_{j=1}^{n_2} I[D_{i,max}+\alpha D_{i,min}> H_{j,max}+\alpha H_{j,min}]\]


\[c= x_{i,max}+\alpha x_{i,min}\]
where \(x_{i,max}=\max(x_{i1},x_{i2})\) and \(x_{i,min}=\min(x_{i1},x_{i2})\).
The Min-Max method aims to combine repeated measurements of a single marker over time or multiple markers that are measured in the same unit. While the Min-Max method is relatively simple to implement, it has some limitations. For example, markers may have different units of measurement, so standardization can be needed to ensure uniformity during the combination process. Furthermore, it is unclear whether all available information is fully utilized when combining markers, as this method incorporates only the markers' minimum and maximum values into the model (Kang et al. 2016).

• Su & Liu's method: Su and Liu examined the combination score separately under the assumption of two multivariate normal distributions with proportional or disproportional covariance matrices (Su and Liu 1993). Multivariate normal distributions with different covariances were first utilized in classification problems (Anderson and Bahadur 1962). Su and Liu then extended the idea of using multivariate distributions to the AUC, showing that the coefficients maximizing the AUC are Fisher's discriminant coefficients. Assume that \(D\sim N(\mu_D, \Sigma_D)\) and \(H\sim N(\mu_H, \Sigma_H)\) represent the multivariate normal distributions for the diseased and non-diseased groups, respectively. Fisher's coefficients are as follows:


\[\begin{equation}
 (\alpha, \beta) = (\Sigma_{D} + \Sigma_{H})^{-1} \mu
 \tag{3}
\end{equation}\]


where \(\mu=\mu_D-\mu_H\). The combination score in this case is:
\[c= \alpha x_{i1}+ \beta x_{i2}
\label{eq:9} \tag{4}\]

• The Minimax method: The Minimax method is an extension of Su & Liu's method (Sameera et al. 2016). Suppose that \(D\sim N(\mu_D, \Sigma_D)\) represents the diseased group and \(H\sim N(\mu_H, \Sigma_H)\) the non-diseased group. Then Fisher's coefficients are as follows:


\[\begin{equation}
 (\alpha, \beta) = \left[t\Sigma_{D} + (1-t)\Sigma_{H}\right]^{-1} (\mu_D - \mu_H)
 \tag{5}
\end{equation}\]


Given these coefficients, the combination score is calculated using Eq (4). In this formula, \(t\) is a constant with values ranging from 0 to 1, which can be tuned by maximizing the AUC.

• Todor & Saplacan's method: Todor and Saplacan's method uses the sine and cosine trigonometric functions to calculate the combination score (Todor et al. 2014). The combination score is calculated using \(\theta \in[-\frac{\pi}{2},\frac{\pi}{2}]\), chosen to maximize the AUC within this interval. The formula for the combination score is given as follows:
\[c= \sin{(\theta)}x_{i1}+\cos{(\theta)}x_{i2}\]
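To make the grid-search idea behind several of these methods concrete, the following sketch (illustrative only, not dtComb's implementation; the package itself is written in R, and the marker values used in testing are made up) finds the Pepe & Thompson weight \(\alpha\) by maximizing the empirical (Mann-Whitney) AUC over a grid:

```python
# Illustrative sketch (not dtComb's implementation): Pepe & Thompson's
# combination c = x1 + alpha * x2, with alpha chosen on a grid over [-1, 1]
# to maximize the Mann-Whitney (empirical AUC) objective.
import numpy as np

def empirical_auc(diseased_scores, healthy_scores):
    """P(score_D > score_H), counting ties as 1/2 -- the Mann-Whitney AUC."""
    d = np.asarray(diseased_scores)[:, None]
    h = np.asarray(healthy_scores)[None, :]
    return (d > h).mean() + 0.5 * (d == h).mean()

def pepe_thompson(d1, d2, h1, h2, grid=np.linspace(-1, 1, 201)):
    """Return (alpha, AUC) maximizing the empirical AUC of x1 + alpha*x2."""
    best_alpha, best_auc = grid[0], -1.0
    for a in grid:
        auc = empirical_auc(d1 + a * d2, h1 + a * h2)
        if auc > best_auc:
            best_alpha, best_auc = a, auc
    return best_alpha, best_auc
```

The same scaffold covers the Min-Max variant by replacing the two markers with each subject's maximum and minimum marker values.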

Non-linear combination methods

In addition to linear combination methods, the dtComb package includes seven non-linear approaches, which are discussed in this subsection. Here we use the following notations: \(x_{ij}\) denotes the value of the \(j\)th marker for the \(i\)th individual, \(i=1,2,\ldots,n\), \(j=1,2\); \(d\) denotes the degree of the polynomial regressions and splines, \(d = 1,2,\ldots,p\).

• Logistic Regression with Polynomial Feature Space: This approach extends the logistic regression model by adding extra predictors created by raising the original predictor variables to a certain power. This transformation enables the model to capture non-linear relationships in the data by including polynomial terms in the feature space (James et al. 2013). The combination score is calculated as follows:
\[c=\frac{\exp\left(\beta_0 + \beta_1 x_{ij} + \beta_2 x_{ij}^2+\ldots+\beta_p x_{ij}^p\right)}{1+\exp\left(\beta_0 + \beta_1 x_{ij} + \beta_2 x_{ij}^2+\ldots+\beta_p x_{ij}^p\right)}\]
where \(c\) is the combination score for the \(i\)th individual and represents the posterior probability.

• Ridge Regression with Polynomial Feature Space: This method combines Ridge regression with an expanded feature space created by adding polynomial terms to the original predictor variables. Ridge regression is a widely used shrinkage method for handling multicollinearity between variables, which can be an issue for least squares regression. It estimates the coefficients of these correlated variables by minimizing the residual sum of squares (RSS) plus a regularization term based on the L2 norm of the coefficient vector, which prevents overfitting (Eq (6)). The Ridge estimate is defined as follows:


\[\begin{equation}
 \hat{\beta}^R = \text{argmin}_{\beta}\; \text{RSS} + \lambda \sum_{j=1}^{2} \sum_{d=1}^{p} \left(\beta_j^{d}\right)^2
 \tag{6}
\end{equation}\]


where
\[RSS=\sum_{i=1}^{n}\left(y_i-\beta_0-\sum_{j=1}^{2}\sum_{d=1}^{p} \beta_j^d x_{ij}^d\right)^2\]
and \(\hat{\beta}^R\) denotes the estimated Ridge regression coefficients; the second term is a penalty term with shrinkage parameter \(\lambda \geq 0\). The shrinkage parameter \(\lambda\) controls the amount of shrinkage applied to the regression coefficients and is chosen by cross-validation. We used the glmnet package (Friedman et al. 2010) to implement Ridge regression for combining the diagnostic tests.

• Lasso Regression with Polynomial Feature Space: Like Ridge regression, Lasso regression is a shrinkage method that adds a penalty term to the least squares objective function. The penalty, in this case, is based on the L1 norm of the coefficient vector, which leads to sparsity in the model: some regression coefficients are exactly zero when the tuning parameter \(\lambda\) is sufficiently large. This property allows the model to automatically identify and remove less relevant variables and reduce model complexity. The Lasso estimate is defined as follows:


\[\begin{equation}
 \hat{\beta}^L = \text{argmin}_{\beta}\; \text{RSS} + \lambda \sum_{j=1}^{2} \sum_{d=1}^{p} \left| \beta_j^d \right|
 \tag{7}
\end{equation}\]


To implement Lasso regression for combining the diagnostic tests, we used the glmnet package (Friedman et al. 2010).

• Elastic-Net Regression with Polynomial Feature Space: Elastic-Net regression combines the Lasso (L1) and Ridge (L2) penalties to address some of the limitations of each technique. The mix of the two penalties is controlled by two hyperparameters, \(\alpha\in[0,1]\) and \(\lambda\), which adjust the trade-off between the L1 and L2 regularization terms (James et al. 2013). The glmnet package is used for the implementation (Friedman et al. 2010).

• Splines: Another non-linear combination technique frequently applied in diagnostic testing is splines. Splines are a versatile mathematical and computational technique with a wide range of applications. They are piecewise functions that make interpolating or approximating data points possible. There are several types of splines, such as cubic splines, in which smooth curves are created by approximating a set of control points with cubic polynomial functions. When implementing splines, two critical user-adjustable parameters come into play: the degrees of freedom and the degree of the fitted polynomials. These parameters influence the flexibility and smoothness of the resulting curve and are critical for controlling the behavior of splines. We used the splines package in R to implement this method.

• Generalized Additive Models with Smoothing Splines and with Natural Cubic Splines: Regression models are of great interest in many fields for understanding the importance of different inputs. Although widely used, traditional linear models often fail in practice because effects may not be linear. Generalized additive models (GAMs) were introduced to identify and characterize such non-linear effects (James et al. 2013). Smoothing splines and natural cubic splines are two standard methods used within GAMs to model non-linear relationships; we used the gam package in R to implement both (Hastie 2015). GAMs with smoothing splines are a more data-driven and adaptive approach: smoothing splines can capture non-linear relationships without specifying in advance the number of knots (the points where two or more polynomial segments join to form a piecewise-defined curve or surface) or the shape of the spline. Natural cubic splines, on the other hand, are preferred when there is prior knowledge or an assumption about the shape of the non-linear relationship; they are more interpretable and can be controlled through the number of knots (Elhakeem et al. 2022).
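As an illustration of the polynomial-feature-space idea shared by the approaches above, the sketch below (an assumed setup, not dtComb's code; glmnet's cross-validated \(\lambda\) is replaced by a fixed value for brevity) fits a closed-form ridge regression on degree-2 polynomial features and returns the fitted values as combination scores:

```python
# Assumed setup, not dtComb's code: ridge regression on a degree-2
# polynomial feature space, solved in closed form; fitted values serve as
# combination scores. A fixed lambda stands in for cross-validation.
import numpy as np

def poly_features(x1, x2, degree=2):
    """Stack [x1, x1^2, ..., x2, x2^2, ...] as columns."""
    cols = [x1 ** d for d in range(1, degree + 1)]
    cols += [x2 ** d for d in range(1, degree + 1)]
    return np.column_stack(cols)

def ridge_scores(x1, x2, y, lam=0.1, degree=2):
    """Fitted values of a ridge fit, used as combination scores."""
    X = np.column_stack([np.ones(len(y)), poly_features(x1, x2, degree)])
    penalty = lam * np.eye(X.shape[1])
    penalty[0, 0] = 0.0                      # leave the intercept unpenalized
    beta = np.linalg.solve(X.T @ X + penalty, X.T @ y)
    return X @ beta
```

The Lasso and Elastic-Net variants change only the penalty term; they have no closed form and are solved iteratively (as glmnet does).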

Mathematical Operators

This section covers four arithmetic operators, eight distance measures, and the exponential approach. In addition, unlike the other approaches, users can apply logarithmic, exponential, and trigonometric (sine and cosine) transformations to the markers in this section. Let \(x_{ij}\) represent the value of the \(j\)th marker for the \(i\)th observation, with \(i=1,2,\ldots,n\) and \(j=1,2\), and let \(c_i\) be the resulting combination score for the \(i\)th individual.

• Arithmetic Operators: Arithmetic operators such as addition, multiplication, division, and subtraction can also be used in diagnostic testing to optimize the AUC, a measure of diagnostic test performance. These mathematical operations can potentially increase the AUC and improve the efficacy of diagnostic tests by combining markers in specific ways. For example, if high values of one test indicate risk while low values of the other indicate risk, subtraction or division can effectively combine these markers.

• Distance Measurements: While combining markers with mathematical operators, a distance measure is used to evaluate the relationships or similarities between marker values. It is worth noting that, to our knowledge, no previous studies have integrated distinct distance measures with arithmetic operators in this context. Euclidean distance is the most commonly used distance measure, but it may not accurately reflect the relationship between markers. Therefore, we incorporated a variety of distances into the package, given as follows (Cha 2007; Pandit et al. 2011; Minaev et al. 2018):
Euclidean:
\[\begin{equation}
 c = \sqrt{(x_{i1} - 0)^2 + (x_{i2} - 0)^2}
 \tag{8}
\end{equation}\]
Manhattan:
\[\begin{equation}
 c = |x_{i1} - 0| + |x_{i2} - 0|
 \tag{9}
\end{equation}\]
Chebyshev:
\[\begin{equation}
 c = \max\{|x_{i1} - 0|, |x_{i2} - 0|\}
 \tag{10}
\end{equation}\]
Kulczynski d:
\[\begin{equation}
 c = \frac{|x_{i1} - 0| + |x_{i2} - 0|}{\min\{x_{i1}, x_{i2}\}}
 \tag{11}
\end{equation}\]
Lorentzian:
\[\begin{equation}
 c = \ln(1 + |x_{i1} - 0|) + \ln(1 + |x_{i2} - 0|)
 \tag{12}
\end{equation}\]
Taneja:
\[\begin{equation}
 c = z_1 \log\left(\frac{z_1}{\sqrt{x_{i1}\,\epsilon}}\right) + z_2 \log\left(\frac{z_2}{\sqrt{x_{i2}\,\epsilon}}\right)
 \tag{13}
\end{equation}\]
where \(z_1 = \frac{x_{i1} - 0}{2}\) and \(z_2 = \frac{x_{i2} - 0}{2}\).
Kumar-Johnson:
\[\begin{equation}
 c = \frac{(x_{i1}^2 - 0)^2}{2(x_{i1}\,\epsilon)^{3/2}} + \frac{(x_{i2}^2 - 0)^2}{2(x_{i2}\,\epsilon)^{3/2}}
 \tag{14}
\end{equation}\]
where \(\epsilon\) is a small positive constant.
Avg:
\[\begin{equation}
 c = \frac{|x_{i1} - 0| + |x_{i2} - 0| + \max\{|x_{i1} - 0|, |x_{i2} - 0|\}}{2}
 \tag{15}
\end{equation}\]

• Exponential approach: The exponential approach is another technique for exploring different relationships between the diagnostic measurements. One of the two diagnostic tests is taken as the base and the other as the exponent, giving \(x_{i1}^{(x_{i2})}\) or \(x_{i2}^{(x_{i1})}\). The specific goals or hypotheses of the analysis, as well as the characteristics of the diagnostic tests, determine which form to use.
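A few of the listed distance-based scores, written out directly (an illustrative sketch, not the package's R code; each formula treats the marker pair as a point and measures its distance from the origin, which is what the \(x - 0\) terms above express):

```python
# Sketch of a few of the distance-based combination scores listed above;
# each treats the marker pair (x1, x2) as a point and scores its distance
# from the origin (the "x - 0" terms in the formulas).
import numpy as np

def combine(x1, x2, method="euclidean"):
    x1, x2 = np.asarray(x1, float), np.asarray(x2, float)
    if method == "euclidean":
        return np.sqrt(x1 ** 2 + x2 ** 2)
    if method == "manhattan":
        return np.abs(x1) + np.abs(x2)
    if method == "chebyshev":
        return np.maximum(np.abs(x1), np.abs(x2))
    if method == "lorentzian":
        return np.log1p(np.abs(x1)) + np.log1p(np.abs(x2))
    raise ValueError(f"unknown method: {method}")
```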

Machine-Learning algorithms

Machine-learning algorithms have been increasingly implemented in various fields, including medicine, to combine diagnostic tests. Integrating diagnostic tests through ML can lead to more accurate, timely, and personalized diagnoses, which is particularly valuable in complex medical cases where multiple factors must be considered. In this study, we aimed to incorporate almost all ML algorithms into the package we developed, and we took advantage of the caret package in R (Kuhn 2008) to achieve this goal. This package includes 190 classification algorithms that can be used to train models and make predictions. Our study focused on models that use numerical inputs and produce binary responses, which resulted in 113 models. We classified these 113 models into five classes using the same idea given in (Zararsiz et al. 2016): (i) discriminant classifiers, (ii) decision tree models, (iii) kernel-based classifiers, (iv) ensemble classifiers, and (v) others. As in the caret package, mlComb() sets up a grid of tuning parameters for a number of classification routines, fits each model, and calculates a performance measure based on resampling. After model fitting, it uses the predict() function to calculate the probability of the "event" occurring for each observation. Finally, it performs ROC analysis based on the probabilities obtained from the prediction step.
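The final ROC step of this workflow can be sketched as follows: given predicted event probabilities from any fitted classifier, the AUC is the rank-based probability that an event receives a higher score than a non-event (illustrative Python, not the caret/mlComb() code):

```python
# Illustrative sketch (not the caret/mlComb() code): ROC-AUC computed
# directly from predicted event probabilities, as the rank-based
# probability that an event outscores a non-event (ties count 1/2).
import numpy as np

def roc_auc(probs, labels):
    p = np.asarray(probs, float)
    y = np.asarray(labels)
    pos = p[y == 1][:, None]
    neg = p[y == 0][None, :]
    return (pos > neg).mean() + 0.5 * (pos == neg).mean()
```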


    Standardization


Standardization is the process of transforming data onto a common scale to facilitate meaningful comparisons and statistical inference. Many statistical techniques employ standardization to improve the interpretability and comparability of data. We implemented five different standardization methods that can be applied to each marker; their formulas are listed below:

• Z-score: \(\frac{x - \text{mean}(x)}{\text{sd}(x)}\)

• T-score: \(\left(\frac{x - \text{mean}(x)}{\text{sd}(x)} \times 10\right) + 50\)

• min_max_scale: \(\frac{x - \min(x)}{\max(x) - \min(x)}\)

• scale_mean_to_one: \(\frac{x}{\text{mean}(x)}\)

• scale_sd_to_one: \(\frac{x}{\text{sd}(x)}\)
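The five formulas can be written as one small function (a sketch, not the package's R code; the sample standard deviation, ddof = 1, is an assumption — the package's convention may differ):

```python
# The five standardizations as one function. The sample standard deviation
# (ddof=1) is an assumption; the package's convention may differ.
import numpy as np

def standardize(x, method="zscore"):
    x = np.asarray(x, float)
    m, s = x.mean(), x.std(ddof=1)
    if method == "zscore":
        return (x - m) / s
    if method == "tscore":
        return (x - m) / s * 10 + 50
    if method == "min_max_scale":
        return (x - x.min()) / (x.max() - x.min())
    if method == "scale_mean_to_one":
        return x / m
    if method == "scale_sd_to_one":
        return x / s
    raise ValueError(f"unknown method: {method}")
```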

    Model building


After specifying a combination method from the dtComb package, users can build models and optimize model parameters using the functions mlComb(), linComb(), nonlinComb(), and mathComb(), depending on the selected method. For the linear and non-linear approaches (i.e., linComb() and nonlinComb()), parameter optimization is carried out with n-fold cross-validation, repeated n-fold cross-validation, or bootstrapping. For the machine-learning approaches (i.e., mlComb()), all of the resampling methods from the caret package are available for optimizing the model parameters. The number of parameters being optimized varies across models, and these parameters are tuned to maximize the AUC. The returned object stores the input data, the preprocessed and transformed data, the trained model, and the resampling results.
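A minimal sketch of an n-fold cross-validation tuning loop of the kind described (hypothetical stand-ins throughout: auc_fn is any scoring function and the grid ranges over a single combination weight; a full implementation would also estimate auxiliary quantities, such as standardization parameters, on the training folds only):

```python
# Minimal sketch of an n-fold cross-validation tuning loop over a single
# combination weight. auc_fn is a hypothetical stand-in for any scoring
# function; a full implementation would also estimate auxiliary quantities
# (e.g. standardization parameters) on the training folds only.
import numpy as np

def kfold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and split into k (nearly) equal folds."""
    idx = np.random.default_rng(seed).permutation(n)
    return np.array_split(idx, k)

def tune_alpha(x1, x2, y, grid, auc_fn, k=5):
    """Pick the grid value with the best fold-averaged score."""
    folds = kfold_indices(len(y), k)
    return max(grid, key=lambda a: np.mean(
        [auc_fn(x1[f] + a * x2[f], y[f]) for f in folds]))
```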


    Evaluation of model performances


A confusion matrix, as shown in Table 1, is a table used to evaluate the performance of a classification model; it shows the number of correct and incorrect predictions. It compares predicted and actual

Table 1: Confusion Matrix

| Predicted labels | Actual positive | Actual negative | Total |
| Positive | TP | FP | TP+FP |
| Negative | FN | TN | FN+TN |
| Total | TP+FN | FP+TN | n |

TP: True Positive, TN: True Negative, FP: False Positive, FN: False Negative, n: Sample size


class labels, with diagonal elements representing correct predictions and off-diagonal elements representing incorrect predictions. The dtComb package uses the OptimalCutpoints package (Yin and Tian 2014) to generate the confusion matrix and then epiR (Stevenson et al. 2017) to compute the performance metrics. The available metrics are the accuracy rate (ACC), Kappa statistic (\(\kappa\)), sensitivity (SE), specificity (SP), apparent and true prevalence (AP, TP), positive and negative predictive values (PPV, NPV), positive and negative likelihood ratios (PLR, NLR), the proportion of true outcome negative subjects that test positive, the proportion of true outcome positive subjects that test negative, the proportion of test-positive subjects that are outcome negative, and the proportion of test-negative subjects that are outcome positive. These metrics are summarized in Table 2.

Table 2: Performance Metrics and Formulas

| Performance Metric | Formula |
|---|---|
| Accuracy | \(\text{ACC} = \frac{\text{TP} + \text{TN}}{n}\) |
| Kappa | \(\kappa = \frac{\text{ACC} - P_e}{1 - P_e}\), where \(P_e = \frac{(\text{TP} + \text{FN})(\text{TP} + \text{FP}) + (\text{FP} + \text{TN})(\text{FN} + \text{TN})}{n^2}\) |
| Sensitivity (Recall) | \(\text{SE} = \frac{\text{TP}}{\text{TP} + \text{FN}}\) |
| Specificity | \(\text{SP} = \frac{\text{TN}}{\text{TN} + \text{FP}}\) |
| Apparent Prevalence | \(\text{AP} = \frac{\text{TP} + \text{FP}}{n}\) |
| True Prevalence | \(\text{TP} = \frac{\text{AP} + \text{SP} - 1}{\text{SE} + \text{SP} - 1}\) |
| Positive Predictive Value (Precision) | \(\text{PPV} = \frac{\text{TP}}{\text{TP} + \text{FP}}\) |
| Negative Predictive Value | \(\text{NPV} = \frac{\text{TN}}{\text{TN} + \text{FN}}\) |
| Positive Likelihood Ratio | \(\text{PLR} = \frac{\text{SE}}{1 - \text{SP}}\) |
| Negative Likelihood Ratio | \(\text{NLR} = \frac{1 - \text{SE}}{\text{SP}}\) |
| Proportion of true outcome-negative subjects that test positive | \(\frac{\text{FP}}{\text{FP} + \text{TN}}\) |
| Proportion of true outcome-positive subjects that test negative | \(\frac{\text{FN}}{\text{TP} + \text{FN}}\) |
| Proportion of test-positive subjects that are outcome negative | \(\frac{\text{FP}}{\text{TP} + \text{FP}}\) |
| Proportion of test-negative subjects that are outcome positive | \(\frac{\text{FN}}{\text{FN} + \text{TN}}\) |
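Several of the formulas in Table 2 can be checked directly in base R. The helper below is an illustrative reconstruction, not a dtComb function; the example counts are the splines-model confusion matrix reported later in the case study (TP = 66, FP = 13, FN = 11, TN = 68).

```r
# Compute a few Table 2 metrics from raw confusion-matrix counts
conf_metrics <- function(TP, FP, FN, TN) {
  n   <- TP + FP + FN + TN
  acc <- (TP + TN) / n                       # accuracy
  se  <- TP / (TP + FN)                      # sensitivity
  sp  <- TN / (TN + FP)                      # specificity
  pe  <- ((TP + FN) * (TP + FP) +            # expected agreement for kappa
          (FP + TN) * (FN + TN)) / n^2
  c(ACC   = acc, SE = se, SP = sp,
    PPV   = TP / (TP + FP), NPV = TN / (TN + FN),
    PLR   = se / (1 - sp),  NLR = (1 - se) / sp,
    Kappa = (acc - pe) / (1 - pe))
}

round(conf_metrics(TP = 66, FP = 13, FN = 11, TN = 68), 2)
```

With these counts the helper reproduces the point estimates reported in the case study: sensitivity 0.86, specificity 0.84, and a correctly classified proportion (accuracy) of 0.85.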

    Prediction of the test cases


The class labels of the observations in the test set are predicted with the model parameters derived from the training phase. It is critical to emphasize that the same analytical procedures employed during training, such as normalization, transformation, or standardization, are also applied to the test set. More specifically, if the training set underwent Z-standardization, the test set is standardized using the mean and standard deviation derived from the training set. The class labels of the test set are then estimated using the cut-off value and the model parameters established during the training phase.
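The point about reusing training statistics can be made concrete with a small base-R sketch (the marker values below are hypothetical):

```r
# Z-standardize a test set with statistics computed from the TRAINING set only
train_x <- c(1.2, 3.4, 2.2, 5.1, 4.0)   # hypothetical training marker values
test_x  <- c(2.0, 4.5)                  # hypothetical test marker values

mu  <- mean(train_x)                    # training mean
sdv <- sd(train_x)                      # training standard deviation
test_z <- (test_x - mu) / sdv           # never recompute mean/sd on the test set
```

Recomputing the mean and standard deviation on the test set itself would leak test-set information into the scores and make them incomparable with the training cut-off.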


    Technical details and the structure of dtComb


The dtComb package is implemented in the R programming language (https://www.r-project.org/) version 4.2.0. Package development was facilitated with devtools (Wickham et al. 2022) and documented with roxygen2 (Wickham et al. 2024). Package testing was performed using 271 unit tests (Wickham 2024). Double programming was performed in Python (https://www.python.org/) to validate the implemented functions (Shiralkar 2010).

To combine diagnostic tests, the dtComb package integrates eight linear combination methods, seven non-linear combination methods, arithmetic operators together with eight distance metrics within the scope of mathematical operators, and a total of 113 machine-learning algorithms from the caret package (Kuhn 2008). These are summarized in Table 3.

Table 3: Features of dtComb

- Combination Methods
  - Linear combination approach (8 different methods)
  - Non-linear combination approach (7 different methods)
  - Mathematical operators (14 different methods)
  - Machine-learning algorithms (113 different methods) (Kuhn 2008)
- Five standardization methods applicable to the linear, non-linear, and mathematical methods
- 16 preprocessing methods applicable to the ML algorithms (Kuhn 2008)
- Three resampling methods for the linear and non-linear combination methods:
  - Bootstrapping
  - Cross-validation
  - Repeated cross-validation
- 12 resampling methods for the ML algorithms (Kuhn 2008)

    3 Results


Table 4 summarizes the existing packages and programs, including dtComb, along with the number of combination methods included in each. While mROC offers only one linear combination method, maxmzpAUC and movieROC provide five linear combination techniques each, and SLModels includes four. These existing packages primarily focus on linear combination approaches. In contrast, dtComb goes beyond these limitations by integrating not only linear methods but also non-linear approaches, machine-learning algorithms, and mathematical operators.

Table 4: Comparison of dtComb vs. existing packages and programs

| Packages & Programs | Linear Comb. | Non-linear Comb. | Math. Operators | ML algorithms |
|---|---|---|---|---|
| mROC (Kramar et al. 2001) | 1 | - | - | - |
| maxmzpAUC (Yu and Park 2015) | 5 | - | - | - |
| movieROC (Pérez-Fernández et al. 2021) | 5 | - | - | - |
| SLModels (Aznar-Gimeno et al. 2023) | 4 | - | - | - |
| dtComb | 8 | 7 | 14 | 113 |

    Dataset


To demonstrate the functionality of the dtComb package, we conduct a case study using four different combination methods. The data used in this study were obtained from patients who presented at the Erciyes University Faculty of Medicine, Department of General Surgery, with complaints of abdominal pain (Akyildiz et al. 2010; Zararsiz et al. 2016). The dataset comprises the D-dimer levels (D_dimer) and leukocyte counts (log_leukocyte) of 225 patients divided into two groups (Group): the first group consists of 110 patients who required an immediate laparotomy (needed), and the second group comprises 115 patients who did not (not_needed). After evaluation against conventional treatment, patients who underwent surgery because of their postoperative pathologies were placed in the first group, while those with a negative laparotomy result were assigned to the second group. All analyses were performed following the workflow given in Fig. 1. First, the dtComb package must be loaded.


Figure 1: Combination steps of two diagnostic tests. The figure presents a schematic representation of the sequential steps involved in combining two diagnostic tests using a combination method.

# load dtComb package
library(dtComb)

Similarly, the laparotomy data shipped with the package can be loaded using the following R code:

# load laparotomy data
data(laparotomy)

    Implementation of the dtComb package


To demonstrate the applicability of the dtComb package, we implement an arbitrarily chosen method from each of the linear, non-linear, mathematical-operator, and machine-learning approaches and compare their performance. These methods are the Pepe, Cai & Langton (PCL) method for the linear combination, Splines for the non-linear combination, Addition for the mathematical operator, and SVM for machine learning. Before applying the methods, we split the data into two parts: a training set comprising 70% of the data and a test set comprising the remaining 30%.

# Splitting the data set into train and test (70%-30%)
set.seed(2128)
inTrain <- caret::createDataPartition(laparotomy$group, p = 0.7, list = FALSE)
trainData <- laparotomy[inTrain, ]
colnames(trainData) <- c("Group", "D_dimer", "log_leukocyte")
testData <- laparotomy[-inTrain, -1]

# define markers and status for the combination functions
markers <- trainData[, -1]
status <- factor(trainData$Group, levels = c("not_needed", "needed"))

The model is trained on trainData; the resampling scheme used in the training phase is five-fold cross-validation repeated ten times. direction = "<" is chosen because higher marker values indicate higher risk, and the Youden index is chosen among the cut-off methods. We note that the markers are not standardized and results are presented with 95% confidence intervals. The four main combination functions are run with the selected methods as follows.

# PCL method
fit.lin.PCL <- linComb(markers = markers, status = status, event = "needed",
                       method = "PCL", resample = "repeatedcv", nfolds = 5,
                       nrepeats = 10, direction = "<", cutoff.method = "Youden")

# splines method (degree = 3 and degrees of freedom = 3)
fit.nonlin.splines <- nonlinComb(markers = markers, status = status, event = "needed",
                                 method = "splines", resample = "repeatedcv", nfolds = 5,
                                 nrepeats = 10, cutoff.method = "Youden", direction = "<",
                                 df1 = 3, df2 = 3)

# add operator
fit.add <- mathComb(markers = markers, status = status, event = "needed",
                    method = "add", direction = "<", cutoff.method = "Youden")

# SVM
fit.svm <- mlComb(markers = markers, status = status, event = "needed", method = "svmLinear",
                  resample = "repeatedcv", nfolds = 5, nrepeats = 10, direction = "<",
                  cutoff.method = "Youden")

Various measures were considered to compare model performances, including AUC, ACC, SEN, SPE, PPV, and NPV. AUC statistics with 95% CIs were calculated for each marker and method. The resulting values are 0.816 (0.751–0.880), 0.802 (0.728–0.877), 0.888 (0.825–0.930), 0.911 (0.868–0.954), 0.877 (0.824–0.929), and 0.875 (0.821–0.930) for D-dimer, log(leukocyte), the Pepe, Cai & Langton (PCL) method, Splines, Addition, and the support vector machine (SVM), respectively. The results reveal that the predictive performances of the markers and of the marker combinations are significantly higher than random chance in determining the need for laparotomy (\(p<0.05\)). The highest sensitivity and NPV were observed with the Addition method, while the highest specificity and PPV were observed with the Splines method. According to the overall AUC and accuracy, the combined approach fitted with the Splines method performed better than the other methods (Fig. 2). Therefore, the Splines method is used in the subsequent analysis.


Figure 2: Radar plots of trained models and performance measures of two markers. Radar plots summarize the diagnostic performances of the two markers and the various combination methods in the training dataset, showing the AUC, ACC, SEN, SPE, PPV, and NPV measurements. The area of the polygon formed by connecting the points reflects the model's overall performance on these metrics; the polygon for the Splines method occupies the largest area, meaning that the Splines method performed better than the other methods.


The AUC values for the individual markers and the splines model can be displayed as follows:

fit.nonlin.splines$AUC_table
                    AUC     SE.AUC LowerLimit UpperLimit         z      p.value
D_dimer       0.8156966 0.03303310  0.7509530  0.8804403  9.556979 1.212446e-21
log_leukocyte 0.8022286 0.03791768  0.7279113  0.8765459  7.970652 1.578391e-15
Combination   0.9111752 0.02189588  0.8682601  0.9540904 18.778659 1.128958e-78

Here, SE denotes the standard error. The areas under the ROC curves for the D-dimer level, the log-scale leukocyte count, and the combination score were 0.816, 0.802, and 0.911, respectively. The ROC curves generated from the splines-model combination score, the D-dimer level, and the leukocyte count are given in Fig. 3, showing that the combination score has the highest AUC. The splines method improved the AUC by 9.5 and 10.9 percentage points compared with the D-dimer level and the leukocyte count, respectively.
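The quoted gains are absolute differences in AUC, expressed in percentage points, and can be checked directly from the AUC values in the table above:

```r
# Absolute AUC gains of the splines combination over the single markers
auc_comb <- 0.9111752   # Combination (from AUC_table)
auc_dd   <- 0.8156966   # D_dimer
auc_ll   <- 0.8022286   # log_leukocyte

gains <- round(c(vs_D_dimer       = auc_comb - auc_dd,
                 vs_log_leukocyte = auc_comb - auc_ll), 3)
# gains of 0.095 and 0.109, i.e., 9.5 and 10.9 percentage points
```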


Figure 3: ROC curves. ROC curves for the combined diagnostic tests, with sensitivity on the y-axis and 1-specificity on the x-axis. The combination score produced the highest AUC value, indicating that the combined strategy performs best overall.


To see the results of the pairwise comparisons between the combination score and the markers:

fit.nonlin.splines$MultComp_table

  Marker1 (A)   Marker2 (B)   AUC (A)   AUC (B)      |A-B|  SE(|A-B|)         z      p-value
1 Combination       D_dimer 0.9079686 0.8156966 0.09227193 0.02223904 4.1490971 3.337893e-05
2 Combination log_leukocyte 0.9079686 0.8022286 0.10573994 0.03466544 3.0502981 2.286144e-03
3     D_dimer log_leukocyte 0.8156966 0.8022286 0.01346801 0.04847560 0.2778308 7.811423e-01
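The Bonferroni control applied to these three pairwise comparisons can be reproduced with base R's p.adjust() (p-values copied from the table above):

```r
# Bonferroni-adjust the three pairwise AUC-comparison p-values
p_raw <- c(3.337893e-05, 2.286144e-03, 7.811423e-01)
p_adj <- p.adjust(p_raw, method = "bonferroni")   # multiply by 3, cap at 1
signif_flags <- p_adj < 0.05
# both combination-vs-marker comparisons remain significant after adjustment;
# the D_dimer-vs-log_leukocyte comparison does not
```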

Controlling the Type I error with a Bonferroni correction, the comparisons of the combination score with the individual markers yielded significant results (\(p<0.05\)).

To display the diagnostic test results and performance measures for the non-linear combination approach, the following code can be used:

fit.nonlin.splines$DiagStatCombined
          Outcome +    Outcome -      Total
Test +           66           13         79
Test -           11           68         79
Total            77           81        158

Point estimates and 95% CIs:
--------------------------------------------------------------
Apparent prevalence *                  0.50 (0.42, 0.58)
True prevalence *                      0.49 (0.41, 0.57)
Sensitivity *                          0.86 (0.76, 0.93)
Specificity *                          0.84 (0.74, 0.91)
Positive predictive value *            0.84 (0.74, 0.91)
Negative predictive value *            0.86 (0.76, 0.93)
Positive likelihood ratio              5.34 (3.22, 8.86)
Negative likelihood ratio              0.17 (0.10, 0.30)
False T+ proportion for true D- *      0.16 (0.09, 0.26)
False T- proportion for true D+ *      0.14 (0.07, 0.24)
False T+ proportion for T+ *           0.16 (0.09, 0.26)
False T- proportion for T- *           0.14 (0.07, 0.24)
Correctly classified proportion *      0.85 (0.78, 0.90)
--------------------------------------------------------------
* Exact CIs

Furthermore, if the diagnostic test results and performance measures of the combination score are compared with those of the single markers, the TN count of the combination score is higher than that of the single markers, and the combination has higher specificity and higher positive and negative predictive values than the log-transformed leukocyte count and the D-dimer level (Table 5). Conversely, D-dimer has the highest sensitivity. Optimal cut-off values for both markers and the combined approach are also given in the table.

Table 5: Statistical diagnostic measures with 95% confidence intervals for each marker and the combination score

| Diagnostic Measures (95% CI) | D-dimer level (\(>1.6\)) | Log(leukocyte count) (\(>4.16\)) | Combination score (\(>0.448\)) |
|---|---|---|---|
| TP | 66 | 61 | 65 |
| TN | 53 | 60 | 69 |
| FP | 28 | 21 | 12 |
| FN | 11 | 16 | 12 |
| Apparent prevalence | 0.59 (0.51-0.67) | 0.52 (0.44-0.60) | 0.49 (0.41-0.57) |
| True prevalence | 0.49 (0.41-0.57) | 0.49 (0.41-0.57) | 0.49 (0.41-0.57) |
| Sensitivity | 0.86 (0.76-0.93) | 0.79 (0.68-0.88) | 0.84 (0.74-0.92) |
| Specificity | 0.65 (0.54-0.76) | 0.74 (0.63-0.83) | 0.85 (0.76-0.92) |
| Positive predictive value | 0.70 (0.60-0.79) | 0.74 (0.64-0.83) | 0.84 (0.74-0.92) |
| Negative predictive value | 0.83 (0.71-0.91) | 0.79 (0.68-0.87) | 0.85 (0.76-0.92) |
| Positive likelihood ratio | 2.48 (1.81-3.39) | 3.06 (2.08-4.49) | 5.70 (3.35-9.69) |
| Negative likelihood ratio | 0.22 (0.12-0.39) | 0.28 (0.18-0.44) | 0.18 (0.11-0.31) |
| False T+ proportion for true D- | 0.35 (0.24-0.46) | 0.26 (0.17-0.37) | 0.15 (0.08-0.24) |
| False T- proportion for true D+ | 0.14 (0.07-0.24) | 0.21 (0.12-0.32) | 0.16 (0.08-0.26) |
| False T+ proportion for T+ | 0.30 (0.21-0.40) | 0.26 (0.17-0.36) | 0.16 (0.08-0.26) |
| False T- proportion for T- | 0.17 (0.09-0.29) | 0.21 (0.13-0.32) | 0.15 (0.08-0.24) |
| Accuracy | 0.75 (0.68-0.82) | 0.77 (0.69-0.83) | 0.85 (0.78-0.90) |

For a comprehensive analysis, the plotComb function in dtComb can be used to generate kernel-density and individual-value plots of the combination scores for each group, as well as the specificity and sensitivity corresponding to different cut-off values (Fig. 4). The function requires the result of the nonlinComb function, which is an object of class "dtComb", and the status factor.

# draw distribution, dispersion, and specificity and sensitivity plots
plotComb(fit.nonlin.splines, status)

Figure 4: Kernel density, individual-value, and sensitivity-specificity plots of the combination score obtained with the training model. (a) Kernel density of the combination score for the two groups, needed and not_needed. (b) Individual-value plot with the classes on the x-axis and the combination score on the y-axis. (c) Sensitivity and specificity of the combination score across cut-off values. Colors indicate the classes in panels (a) and (b), and the sensitivity and specificity curves in panel (c).


To test the model trained with the Splines method, the generic predict function is used. It requires the test set and the result of the nonlinComb function, which is an object of class "dtComb". For each observation, the prediction output consists of the combination score and the predicted label determined by the cut-off value derived from the model.

# predict the test set
pred <- predict(fit.nonlin.splines, testData)
head(pred)

   comb.score labels
1   0.6133884 needed
7   0.9946474 needed
10  0.9972347 needed
11  0.9925040 needed
13  0.9257699 needed
14  0.9847090 needed

Above, it can be seen that the estimated combination scores for the first six observations in the test set are labelled needed because they exceed the cut-off value of 0.448.
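The labelling rule itself is just a threshold at the training cut-off; a minimal base-R sketch (scores copied from the output above):

```r
# Predicted labels follow from thresholding the combination score at the
# training-derived cut-off of 0.448
score  <- c(0.6133884, 0.9946474, 0.9972347)   # first scores from the output
labels <- ifelse(score > 0.448, "needed", "not_needed")
```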


    Web interface for the dtComb package


The primary goal in developing the dtComb package is to gather numerous distinct combination methods and make them easily accessible to researchers. The package also includes diagnostic statistics and visualization tools for the diagnostic tests and for the combination score generated by the chosen method. Nevertheless, R code may pose challenges for physicians and others unfamiliar with R programming. To address this, we have also developed a user-friendly web application for dtComb using Shiny (Chang et al. 2024). This web-based tool is publicly accessible and provides an interactive interface to all the functionality of the dtComb package.
To initiate an analysis, users upload their data following the instructions in the "Data upload" tab of the web tool. For convenience, three example datasets are provided on this page to help researchers practice with the tool and to guide them in formatting their own data (Fig. 5a). We also note that ROC analysis for a single marker can be performed from the "ROC Analysis for Single Marker(s)" tab in the data upload section of the web interface.


The "Analysis" tab contains two main subpanels:

- Plots (Fig. 5b): This section offers various visual representations, such as ROC curves, kernel-density plots, individual-value plots, and sensitivity and specificity plots. These visualizations help users assess the single diagnostic tests and the combination score generated with the user-defined combination method.

- Results (Fig. 5c): This subpanel provides a range of statistics: AUC statistics for the combination score and the single diagnostic tests, comparisons of the combination score against the individual tests, and various diagnostic measures. One can also predict new data with the previously fitted model from the "Predict" tab (Fig. 5d). If needed, the model created during the analysis can be downloaded to preserve the fitted parameters, which lets users make new predictions later by reloading the model in the "Predict" tab. Additionally, all results can be downloaded using the dedicated download buttons in their respective tabs.

Figure 5: Web interface of the dtComb package. The figure illustrates the web interface of the dtComb package, which walks through the steps involved in combining two diagnostic tests. a) Data Upload: the user uploads the dataset and selects the relevant markers, the gold-standard test, and the event factor for the analysis. b) Combination Analysis: this panel allows the selection of the combination method, method-specific parameters, and resampling options. c) Combination Analysis Output: displays the results generated by the selected combination method, providing key metrics and visualizations for interpretation. d) Predict: displays the prediction results of the trained model applied to the test set.


    4 Summary and further research


In clinical practice, multiple diagnostic tests may be available for diagnosing a disease (Yu and Park 2015). Combining these tests to enhance diagnostic accuracy is a widely accepted approach (Su and Liu 1993; Pepe and Thompson 2000; Pepe et al. 2006; Liu et al. 2011; Todor et al. 2014; Sameera et al. 2016). To our knowledge, the existing tools listed in Table 4 were designed to combine diagnostic tests but contain at most five combination methods each. Despite the existence of numerous advanced combination methods, no extensive tool for integrating diagnostic tests has been available.

In this study, we presented dtComb, a comprehensive R package designed to combine diagnostic tests using various methods, including linear and non-linear combinations, mathematical operators, and machine-learning algorithms. The package integrates 142 different methods for combining two diagnostic markers to improve diagnostic accuracy. It also provides ROC curve analysis, various graphical approaches, diagnostic performance scores, and pairwise comparison results. In the given example, one can determine whether patients with abdominal pain require laparotomy by combining their D-dimer levels and white blood cell counts. Several methods, including linear and non-linear combinations, were tested, and the Splines method performed better than the others, particularly in terms of AUC and accuracy compared with the single tests. This shows that diagnostic accuracy can be improved with combination methods.

Future work can focus on extending the capabilities of the dtComb package. While some studies focus on combining multiple markers, our study aimed to combine two markers using nearly all existing methods and to develop a tool and package for clinical practice (Kang et al. 2016).


    R Software


The R package dtComb is available on CRAN at https://cran.r-project.org/web/packages/dtComb/index.html.


    Acknowledgment


We would like to thank the Proofreading & Editing Office of the Dean for Research at Erciyes University for the copyediting and proofreading service for this manuscript.


    5 Note


This article is converted from a Legacy LaTeX article using the texor package. The pdf version is the official version. To report a problem with the html, refer to CONTRIBUTE on the R Journal homepage.

    +
    +
    +M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin, et al. TensorFlow: Large-scale machine learning on heterogeneous systems. 2015. URL https://www.tensorflow.org/. Software available from tensorflow.org. +
    +
    +S. Agarwal, A. S. Yadav, V. Dinesh, K. S. S. Vatsav, K. S. S. Prakash and S. Jaiswal. By artificial intelligence algorithms and machine learning models to diagnosis cancer. Materials Today: Proceedings, 80: 2969–2975, 2023. +
    +
    +M. Ahsan, A. Khan, K. R. Khan, B. B. Sinha and A. Sharma. Advancements in medical diagnosis and treatment through machine learning: A review. Expert Systems, 41(3): e13499, 2024. +
    +
    +H. Y. Akyildiz, E. Sozuer, A. Akcan, C. Kuçuk, T. Artis, İ. Biri and N. Yılmaz. The value of D-dimer test in the diagnosis of patients with nontraumatic acute abdomen. Turkish Journal of Trauma and Emergency Surgery, 16(1): 22–26, 2010. +
    +
    +M. Alzyoud, R. Alazaidah, M. Aljaidi, G. Samara, M. Qasem, M. Khalid and N. Al-Shanableh. Diagnosing diabetes mellitus using machine learning techniques. International Journal of Data and Network Science, 8(1): 179–188, 2024. +
    +
    +T. W. Anderson and R. R. Bahadur. Classification into two multivariate normal distributions with different covariance matrices. The annals of mathematical statistics, 420–431, 1962. +
    +
    +R. Aznar-Gimeno, L. M. Esteban, R. del-Hoyo-Alonso, Á. Borque-Fernando and G. Sanz. A stepwise algorithm for linearly combining biomarkers under Youden index maximization. Mathematics, 10(8): 1221, 2022. +
    +
    +R. Aznar-Gimeno, L. M. Esteban, G. Sanz and R. del-Hoyo-Alonso. Comparing the min–max–median/IQR approach with the min–max approach, logistic regression and XGBoost, maximising the Youden index. Symmetry (Basel), 15: 2023. URL https://doi.org/10.3390/sym15030756. +
    +
    +A. Bansal and M. Sullivan Pepe. When does combining markers improve classification performance and what are implications for practice? Statistics in medicine, 32(11): 1877–1892, 2013. +
    +
    +S.-H. Cha. Comprehensive survey on distance/similarity measures between probability density functions. City, 1(2): 1, 2007. +
    +
    +W. Chang, J. Cheng, J. Allaire, C. Sievert, B. Schloerke, Y. Xie, J. Allen, J. McPherson, A. Dipert and B. Borges. Shiny: Web application framework for R. 2024. URL https://shiny.posit.co/. R package version 1.9.1.9000, https://github.com/rstudio/shiny. +
    +
    +D. R. Cox and E. J. Snell. Analysis of binary data. 2nd ed London: Chapman; Hall/CRC, 1989. +
    +
    +W. DeGroat, H. Abdelhalim, K. Patel, D. Mendhe, S. Zeeshan and Z. Ahmed. Discovering biomarkers associated and predicting cardiovascular disease with high accuracy using a novel nexus of machine learning techniques for precision medicine. Scientific reports, 14(1): 1, 2024. +
    +
    +Z. Du, P. Du and A. Liu. Likelihood ratio combination of multiple biomarkers via smoothing spline estimated densities. Statistics in Medicine, 43(7): 1372–1383, 2024. +
    +
    +B. Efron. The efficiency of logistic regression compared to normal discriminant analysis. Journal of the American Statistical Association, 70(352): 892–898, 1975. +
    +
    +A. Elhakeem, R. A. Hughes, K. Tilling, D. L. Cousminer, S. A. Jackowski, T. J. Cole, A. S. Kwong, Z. Li, S. F. Grant, A. D. Baxter-Jones, et al. Using linear and natural cubic splines, SITAR, and latent trajectory models to characterise nonlinear longitudinal growth trajectories in cohort studies. BMC Medical Research Methodology, 22(1): 68, 2022. +
    +
    +G. Ertürk Zararsız. Linear combination of leukocyte count and D-Dimer levels in the diagnosis of patients with non-traumatic acute abdomen. Med. Rec., 5: 84–90, 2023. URL https://doi.org/10.37990/medr.1166531. +
    +
    +S. S. Faria, P. C. Fernandes Jr, M. J. B. Silva, V. C. Lima, W. Fontes, R. Freitas-Junior, A. K. Eterovic and P. Forget. The neutrophil-to-lymphocyte ratio: A narrative review. ecancermedicalscience, 10: 2016. +
    +
    +J. Friedman, T. Hastie and R. Tibshirani. Regularization paths for generalized linear models via coordinate descent. Journal of statistical software, 33(1): 1, 2010. +
    +
    +S. Ganapathy, H. KT, B. Jindal, P. S. Naik and S. Nair N. Comparison of diagnostic accuracy of models combining the renal biomarkers in predicting renal scarring in pediatric population with vesicoureteral reflux (VUR). Irish Journal of Medical Science (1971-), 192(5): 2521–2526, 2023. +
    +
    +D. Ghosh and A. M. Chinnaiyan. Classification and selection of biomarkers in genomic data using LASSO. BioMed Research International, 2005(2): 147–154, 2005. +
    +
    +J. Grau, I. Grosse and J. Keilwagen. PRROC: Computing and visualizing precision-recall and receiver operating characteristic curves in R. Bioinformatics, 31(15): 2595–2597, 2015. +
    +
    +T. Hastie. Gam: Generalized additive models. 2015. URL https://CRAN.R-project.org/package=gam. R package version 1.22-5. +
    +
    +G. James, D. Witten, T. Hastie, R. Tibshirani, et al. An introduction to statistical learning. Springer, 2013. +
    +
    +L. Kang, A. Liu and L. Tian. Linear combination methods to improve diagnostic/prognostic accuracy on future observations. Statistical methods in medical research, 25(4): 1359–1380, 2016. +
    +
    +A. Kramar, D. Faraggi, A. Fortuné and B. Reiser. mROC: A computer program for combining tumour markers in predicting disease states. Computer methods and programs in biomedicine, 66(2-3): 199–207, 2001. +
    +
    +M. Kuhn. Building predictive models in R using the caret package. Journal of statistical software, 28: 1–26, 2008. +
    +
    +C. León, S. Ruiz-Santana, P. Saavedra, B. Almirante, J. Nolla-Salas, F. Álvarez-Lerma, J. Garnacho-Montero, M. Á. León, E. S. Group, et al. A bedside scoring system (“Candida score”) for early antifungal treatment in nonneutropenic critically ill patients with Candida colonization. Critical care medicine, 34(3): 730–737, 2006. +
    +
    +C. Liu, A. Liu and S. Halabi. A min–max combination of biomarkers to improve diagnostic accuracy. Statistics in medicine, 30(16): 2005–2014, 2011. +
    +
    +J. Luo, F. Yu, H. Zhou, X. Wu, Q. Zhou, Q. Liu and S. Gan. AST/ALT ratio is an independent risk factor for diabetic retinopathy: A cross-sectional study. Medicine, 103(26): e38583, 2024. +
    +
    +G. Minaev, R. Piché and A. Visa. Distance measures for classification of numerical features. 2018. URL https://trepo.tuni.fi/handle/10024/124353. +
    +
    +E. G. Müller, T. H. Edwin, C. Stokke, S. S. Navelsaker, A. Babovic, N. Bogdanovic, A. B. Knapskog and M. E. Revheim. Amyloid-β PET—correlation with cerebrospinal fluid biomarkers and prediction of Alzheimer's disease diagnosis in a memory clinic. PloS one, 14(8): e0221365, 2019. +
    +
    +M. Neumann, H. Kothare and V. Ramanarayanan. Combining multiple multimodal speech features into an interpretable index score for capturing disease progression in Amyotrophic Lateral Sclerosis. Interspeech, 2353: 2023. +
    +
    +H. Nyblom, E. Björnsson, M. Simrén, F. Aldenborg, S. Almer and R. Olsson. The AST/ALT ratio as an indicator of cirrhosis in patients with PBC. Liver International, 26(7): 840–845, 2006. +
    +
    +S. Pandit, S. Gupta, et al. A comparative study on distance measuring approaches for clustering. International journal of research in computer science, 2(1): 29–31, 2011. +
    +
    +F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, et al. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12: 2825–2830, 2011. +
    +
    +M. S. Pepe. The statistical evaluation of medical tests for classification and prediction. Oxford university press, 2003. +
    +
    +M. S. Pepe, T. Cai and G. Longton. Combining predictors for classification using the area under the receiver operating characteristic curve. Biometrics, 62(1): 221–229, 2006. +
    +
    +M. S. Pepe and M. L. Thompson. Combining diagnostic test results to increase accuracy. Biostatistics, 1(2): 123–140, 2000. +
    +
    +S. Pérez-Fernández, P. Martínez-Camblor, P. Filzmoser and N. Corral. Visualizing the decision rules behind the ROC curves: Understanding the classification process. AStA Advances in Statistical Analysis, 105(1): 135–161, 2021. +
    +
    +F. Prinzi, C. Militello, N. Scichilone, S. Gaglio and S. Vitabile. Explainable machine-learning models for COVID-19 prognosis prediction using clinical, laboratory and radiomic features. IEEE Access, 11: 121492–121510, 2023. +
    +
    +X. Robin, N. Turck, A. Hainard, N. Tiberti, F. Lisacek, J.-C. Sanchez and M. Müller. pROC: An open-source package for R and S+ to analyze and compare ROC curves. BMC bioinformatics, 12: 1–8, 2011. +
    +
    +S. Ruiz-Velasco. Asymptotic efficiency of logistic regression relative to linear discriminant analysis. Biometrika, 78(2): 235–243, 1991. +
    +
    +M. C. Sachs. plotROC: A tool for plotting ROC curves. Journal of statistical software, 79: 2017. +
    +
    +N. Salvetat, F. J. Checa-Robles, A. Delacrétaz, C. Cayzac, B. Dubuc, D. Vetter, J. Dainat, J.-P. Lang, F. Gamma and D. Weissmann. AI algorithm combined with RNA editing-based blood biomarkers to discriminate bipolar from major depressive disorders in an external validation multicentric cohort. Journal of Affective Disorders, 356: 385–393, 2024. +
    +
    +N. Salvetat, F. J. Checa-Robles, V. Patel, C. Cayzac, B. Dubuc, F. Chimienti, J.-D. Abraham, P. Dupré, D. Vetter, S. Méreuze, et al. A game changer for bipolar disorder diagnosis using RNA editing-based biomarkers. Translational Psychiatry, 12(1): 182, 2022. +
    +
    +G. Sameera, R. V. Vardhan and K. Sarma. Binary classification using multivariate receiver operating characteristic curve for continuous data. Journal of biopharmaceutical statistics, 26(3): 421–431, 2016. +
    +
    +D. Serban, N. Papanas, A. M. Dascalu, P. Kempler, I. Raz, A. A. Rizvi, M. Rizzo, C. Tudor, M. Silviu Tudosie, D. Tanasescu, et al. Significance of neutrophil to lymphocyte ratio (NLR) and platelet lymphocyte ratio (PLR) in diabetic foot ulcer and potential new therapeutic targets. The International Journal of Lower Extremity Wounds, 23(2): 205–216, 2024. +
    +
    +A. Sewak, S. Siegfried and T. Hothorn. Construction and evaluation of optimal diagnostic tests with application to hepatocellular carcinoma diagnosis. arXiv preprint arXiv:2402.03004, 2024. +
    +
    +P. Shiralkar. Programming validation: Perspectives and strategies. PharmaSUG 2010—paper IB09, 2010. +
    +
    +T. Sing, O. Sander, N. Beerenwinkel and T. Lengauer. ROCR: Visualizing classifier performance in R. Bioinformatics, 21(20): 3940–3941, 2005. +
    +
    +M. Stevenson, T. Nunes, C. Heuer, J. Marshall, J. Sanchez, R. Thornton, J. Reiczigel, J. Robison-Cox, P. Sebastiani, P. Solymos, et al. epiR: Tools for the analysis of epidemiological data. 2017. URL https://cran.r-project.org/web/packages/epiR/index.html. R package version 2.0.76. +
    +
    +J. Q. Su and J. S. Liu. Linear combinations of multiple diagnostic markers. Journal of the American Statistical Association, 88(424): 1350–1355, 1993. +
    +
    +K. Svart, J. J. Korsbæk, R. H. Jensen, T. Parkner, C. S. Knudsen, S. G. Hasselbalch, S. M. Hagen, E. A. Wibroe, L. D. Molander and D. Beier. Neurofilament light chain is elevated in patients with newly diagnosed idiopathic intracranial hypertension: A prospective study. Cephalalgia, 44(5): 03331024241248203, 2024. +
    +
    +N. Todor, I. Todor and G. Săplăcan. Tools to identify linear combination of prognostic factors which maximizes area under receiver operator curve. Journal of clinical bioinformatics, 4: 1–7, 2014. +
    +
    +H. Wickham. Testthat: Get started with testing. 2024. URL https://cran.r-project.org/web/packages/testthat/index.html. R package version 3.2.1.1. +
    +
    +H. Wickham, P. Danenberg and M. Eugster. roxygen2: In-source documentation for R. 2024. URL https://cran.r-project.org/web/packages/roxygen2/index.html. R package version 7.3.2. +
    +
    +H. Wickham, J. Hester, W. Chang and J. Bryan. Devtools: Tools to make developing R packages easier. 2022. URL https://cran.r-project.org/web/packages/devtools/index.html. R package version 2.4.5. +
    +
    +J. Yin and L. Tian. Optimal linear combinations of multiple diagnostic biomarkers based on Youden index. Statistics in medicine, 33(8): 1426–1440, 2014. +
    +
    +W. Yu and T. Park. Two simple algorithms on linear combination of multiple biomarkers to maximize partial area under the ROC curve. Computational Statistics & Data Analysis, 88: 15–27, 2015. +
    +
    +G. Zararsiz, H. Y. Akyildiz, D. Göksülük, S. Korkmaz and A. Öztürk. Statistical learning approaches in diagnosing patients with nontraumatic acute abdomen. Turkish Journal of Electrical Engineering and Computer Sciences, 24(5): 3685–3697, 2016. +
    +
    +
    +
    +
    1. https://cran.r-project.org/web/packages/splines/index.html↩︎

    2. https://cran.r-project.org/web/packages/glmnet/index.html↩︎

    3. https://github.com/gokmenzararsiz/dtComb,
    https://github.com/gokmenzararsiz/dtComb_Shiny↩︎

    References

    +
    +

    Reuse

    +

    Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

    +

    Citation

    +

    For attribution, please cite this work as

    +
    Taştan, et al., "dtComb: A Comprehensive R Library and Web Tool for Combining Diagnostic Tests", The R Journal, 2026
    +

    BibTeX citation

    +
    @article{RJ-2025-036,
    +  author = {Taştan, S. Ilayda Yerlitaş and Gengeç, Serra Bersan and Koçhan, Necla and Zararsız, Ertürk and Korkmaz, Selçuk and Zararsız, },
    +  title = {dtComb: A Comprehensive R Library and Web Tool for Combining Diagnostic Tests},
    +  journal = {The R Journal},
    +  year = {2026},
    +  note = {https://doi.org/10.32614/RJ-2025-036},
    +  doi = {10.32614/RJ-2025-036},
    +  volume = {17},
    +  issue = {4},
    +  issn = {2073-4859},
    +  pages = {80-102}
    +}
    +
    + + + + + + + diff --git a/_articles/RJ-2025-036/RJ-2025-036.pdf b/_articles/RJ-2025-036/RJ-2025-036.pdf new file mode 100644 index 0000000000..216b1da249 Binary files /dev/null and b/_articles/RJ-2025-036/RJ-2025-036.pdf differ diff --git a/_articles/RJ-2025-036/RJournal.sty b/_articles/RJ-2025-036/RJournal.sty new file mode 100644 index 0000000000..351990be38 --- /dev/null +++ b/_articles/RJ-2025-036/RJournal.sty @@ -0,0 +1,358 @@ +% Package `RJournal' to use with LaTeX2e +% Copyright (C) 2010 by the R Foundation +% Copyright (C) 2013 by the R Journal +% +% Originally written by Kurt Hornik and Friedrich Leisch with subsequent +% edits by the editorial board +% +% CAUTION: +% Do not modify this style file. Any changes to this file will be reset when your +% article is submitted. +% If you must modify the style or add LaTeX packages to the article, these +% should be specified in RJwrapper.tex + +\NeedsTeXFormat{LaTeX2e}[1995/12/01] +\ProvidesPackage{RJournal}[2025/10/05 v0.17 RJournal package] + +\RequirePackage{tikz} + +% Overall page layout, fonts etc ----------------------------------------------- + +% Issues of of \emph{The R Journal} are created from the standard \LaTeX{} +% document class \pkg{report}. + +\RequirePackage{geometry} +\geometry{a4paper, + textwidth=14cm, top=1cm, bottom=1cm, + includehead,includefoot,centering, + footskip=1.5cm} +\raggedbottom +\sloppy +\clubpenalty = 10000 +\widowpenalty = 10000 +\brokenpenalty = 10000 +\usepackage{microtype} + + +\RequirePackage{fancyhdr} +\fancyhead{} +\fancyheadoffset{2cm} +\fancyhead[L]{\textsc{\RJ@sectionhead}} +\fancyhead[R]{\thepage} +\fancyfoot{} +\fancyfoot[L]{The R Journal Vol. 
\RJ@volume/\RJ@number, \RJ@month~\RJ@year} +\fancyfoot[R]{ISSN 2073-4859} +\pagestyle{fancy} + +% We use the following fonts (all with T1 encoding): +% +% rm & palatino +% tt & inconsolata +% sf & helvetica +% math & palatino + +\RequirePackage{microtype} + +\RequirePackage[scaled=0.92]{helvet} +\RequirePackage{palatino,mathpazo} +\RequirePackage[scaled=1.02]{inconsolata} +\RequirePackage[T1]{fontenc} + +\RequirePackage[hyphens]{url} +\RequirePackage[pagebackref]{hyperref} +\renewcommand{\backref}[1]{[p#1]} + +% Dark blue colour for all links +\RequirePackage{color} +\definecolor{link}{rgb}{0.45,0.51,0.67} +\hypersetup{ + colorlinks,% + citecolor=link,% + filecolor=link,% + linkcolor=link,% + urlcolor=link +} + +% Give the text a little room to breath +\setlength{\parskip}{3pt} +\RequirePackage{setspace} +\setstretch{1.05} + +% Issue and article metadata --------------------------------------------------- + +% Basic front matter information about the issue: volume, number, and +% date. + +\newcommand{\volume}[1]{\def\RJ@volume{#1}} +\newcommand{\volnumber}[1]{\def\RJ@number{#1}} +\renewcommand{\month}[1]{\def\RJ@month{#1}} +\renewcommand{\year}[1]{\def\RJ@year{#1}} + + +% Individual articles correspond to +% chapters, and are contained in |article| environments. This makes it +% easy to have figures counted within articles and hence hyperlinked +% correctly. + +% An article has an author, a title, and optionally a subtitle. We use +% the obvious commands for specifying these. Articles will be put in certain +% journal sections, named by \sectionhead. + +\newcommand {\sectionhead} [1]{\def\RJ@sectionhead{#1}} +\renewcommand{\author} [1]{\def\RJ@author{#1}} +\renewcommand{\title} [1]{\def\RJ@title{#1}} +\newcommand {\subtitle} [1]{\def\RJ@subtitle{#1}} + +% Control appearance of titles: make slightly smaller than usual, and +% suppress section numbering. 
See http://tex.stackexchange.com/questions/69749 +% for why we don't use \setcounter{secnumdepth}{-1} + +\usepackage[medium]{titlesec} +\usepackage{titletoc} +\titleformat{\section} {\normalfont\large\bfseries}{\arabic{section}}{1em}{} +\titleformat{\subsection}{\normalfont\normalsize\bfseries}{\arabic{section}.\arabic{subsection}}{0.5em}{} +\titlecontents{chapter} [0em]{}{}{}{\titlerule*[1em]{.}\contentspage} + +% Article layout --------------------------------------------------------------- + +% Environment |article| clears the article header information at its beginning. +% We use |\FloatBarrier| from the placeins package to keep floats within +% the article. +\RequirePackage{placeins} +\newenvironment{article}{\author{}\title{}\subtitle{}\FloatBarrier}{\FloatBarrier} + +% Refereed articles should have an abstract, so we redefine |\abstract| to +% give the desired style + +\renewcommand{\abstract}[1]{\noindent\textbf{Abstract} #1} +\renewenvironment{abstract}{\noindent\textbf{Abstract}~}{} + +% The real work is done by a redefined version of |\maketitle|. Note +% that even though we do not want chapters (articles) numbered, we +% need to increment the chapter counter, so that figures get correct +% labelling. + +\renewcommand{\maketitle}{% +\noindent + \chapter{\RJ@title}\refstepcounter{chapter} + \ifx\empty\RJ@subtitle + \else + \noindent\textbf{\RJ@subtitle} + \par\nobreak\addvspace{\baselineskip} + \fi + \ifx\empty\RJ@author + \else + \noindent\textit{\RJ@author} + \par\nobreak\addvspace{\baselineskip} + \fi + \@afterindentfalse\@nobreaktrue\@afterheading +} + +% Now for some ugly redefinitions. We do not want articles to start a +% new page. 
(Actually, we do, but this is handled via explicit +% \newpage +% +% The name@of@eq is a hack to get hyperlinks to equations to work +% within each article, even though there may be multiple eq.(1) +% \begin{macrocode} +\renewcommand\chapter{\secdef\RJ@chapter\@schapter} +\providecommand{\nohyphens}{% + \hyphenpenalty=10000\exhyphenpenalty=10000\relax} +\newcommand{\RJ@chapter}{% + \edef\name@of@eq{equation.\@arabic{\c@chapter}}% + \renewcommand{\@seccntformat}[1]{}% + \@startsection{chapter}{0}{0mm}{% + -2\baselineskip \@plus -\baselineskip \@minus -.2ex}{\p@}{% + \phantomsection\normalfont\huge\bfseries\raggedright}} + +% Book reviews should appear as sections in the text and in the pdf bookmarks, +% however we wish them to appear as chapters in the TOC. Thus we define an +% alternative to |\maketitle| for reviews. +\newcommand{\review}[1]{ + \pdfbookmark[1]{#1}{#1} + \section*{#1} + \addtocontents{toc}{\protect\contentsline{chapter}{#1}{\thepage}{#1.1}} +} + +% We want bibliographies as starred sections within articles. +% +\RequirePackage[sectionbib,round]{natbib} +\bibliographystyle{abbrvnat} +\renewcommand{\bibsection}{\section*{References}} + +% Equations, figures and tables are counted within articles, but we do +% not show the article number. For equations it becomes a bit messy to avoid +% having hyperref getting it wrong. + +% \numberwithin{equation}{chapter} +\renewcommand{\theequation}{\@arabic\c@equation} +\renewcommand{\thefigure}{\@arabic\c@figure} +\renewcommand{\thetable}{\@arabic\c@table} + +% Issue layout ----------------------------------------------------------------- + +% Need to provide our own version of |\tableofcontents|. We use the +% tikz package to get the rounded rectangle. Notice that |\section*| +% is really the same as |\chapter*|. 
+\renewcommand{\contentsname}{Contents} +\renewcommand\tableofcontents{% + \vspace{1cm} + \section*{\contentsname} + { \@starttoc{toc} } +} + +\renewcommand{\titlepage}{% + \thispagestyle{empty} + \hypersetup{ + pdftitle={The R Journal Volume \RJ@volume/\RJ@number, \RJ@month \RJ@year},% + pdfauthor={R Foundation for Statistical Computing},% + } + \noindent + \begin{center} + \fontsize{50pt}{50pt}\selectfont + The \raisebox{-8pt}{\includegraphics[height=77pt]{Rlogo-5}}\hspace{10pt} + Journal + + \end{center} + {\large \hfill Volume \RJ@volume/\RJ@number, \RJ@month{} \RJ@year \quad} + + \rule{\textwidth}{1pt} + \begin{center} + {\Large A peer-reviewed, open-access publication of the \\ + R Foundation for Statistical Computing} + \end{center} + + % And finally, put in the TOC box. Note the way |tocdepth| is adjusted + % before and after producing the TOC: thus, we can ensure that only + % articles show up in the printed TOC, but that in the PDF version, + % bookmarks are created for sections and subsections as well (provided + % that the non-starred forms are used). + \setcounter{tocdepth}{0} + \tableofcontents + \setcounter{tocdepth}{2} + \clearpage +} + +% Text formatting -------------------------------------------------------------- + +\newcommand{\R}{R} +\newcommand{\address}[1]{\addvspace{\baselineskip}\noindent\emph{#1}} +\newcommand{\email}[1]{\href{mailto:#1}{\normalfont\texttt{#1}}} + +% Simple font selection is not good enough. For example, |\texttt{--}| +% gives `\texttt{--}', i.e., an endash in typewriter font. Hence, we +% need to turn off ligatures, which currently only happens for commands +% |\code| and |\samp| and the ones derived from them. Hyphenation is +% another issue; it should really be turned off inside |\samp|. And +% most importantly, \LaTeX{} special characters are a nightmare. E.g., +% one needs |\~{}| to produce a tilde in a file name marked by |\file|. 
+% Perhaps a few years ago, most users would have agreed that this may be +% unfortunate but should not be changed to ensure consistency. But with +% the advent of the WWW and the need for getting `|~|' and `|#|' into +% URLs, commands which only treat the escape and grouping characters +% specially have gained acceptance + +\DeclareRobustCommand\code{\bgroup\@noligs\@codex} +\def\@codex#1{\texorpdfstring% +{{\normalfont\ttfamily\hyphenchar\font=-1 #1}}% +{#1}\egroup} +\newcommand{\kbd}[1]{{\normalfont\texttt{#1}}} +\newcommand{\key}[1]{{\normalfont\texttt{\uppercase{#1}}}} +\DeclareRobustCommand\samp{`\bgroup\@noligs\@sampx} +\def\@sampx#1{{\normalfont\texttt{#1}}\egroup'} +\newcommand{\var}[1]{{\normalfont\textsl{#1}}} +\let\env=\code +\newcommand{\file}[1]{{`\normalfont\textsf{#1}'}} +\let\command=\code +\let\option=\samp +\newcommand{\dfn}[1]{{\normalfont\textsl{#1}}} +% \acronym is effectively disabled since not used consistently +\newcommand{\acronym}[1]{#1} +\newcommand{\strong}[1]{\texorpdfstring% +{{\normalfont\fontseries{b}\selectfont #1}}% +{#1}} +\let\pkg=\strong +\newcommand{\CRANpkg}[1]{\href{https://CRAN.R-project.org/package=#1}{\pkg{#1}}}% +\let\cpkg=\CRANpkg +\newcommand{\ctv}[1]{\href{https://CRAN.R-project.org/view=#1}{\emph{#1}}} +\newcommand{\BIOpkg}[1]{\href{https://www.bioconductor.org/packages/release/bioc/html/#1.html}{\pkg{#1}}} + +% Example environments --------------------------------------------------------- +\RequirePackage{fancyvrb} +\RequirePackage{alltt} + +\DefineVerbatimEnvironment{example}{Verbatim}{} +\renewenvironment{example*}{\begin{alltt}}{\end{alltt}} + +% Support for output from Sweave, and generic session style code +% These used to have fontshape=sl for Sinput/Scode/Sin, but pslatex +% won't use a condensed font in that case. 
+ +% Update (2015-05-28 by DS): remove fontsize=\small to match example environment + +\DefineVerbatimEnvironment{Sinput}{Verbatim}{} +\DefineVerbatimEnvironment{Soutput}{Verbatim}{} +\DefineVerbatimEnvironment{Scode}{Verbatim}{} +\DefineVerbatimEnvironment{Sin}{Verbatim}{} +\DefineVerbatimEnvironment{Sout}{Verbatim}{} +\newenvironment{Schunk}{}{} + +% Mathematics ------------------------------------------------------------------ + +% The implementation of |\operatorname| is similar to the mechanism +% \LaTeXe{} uses for functions like sin and cos, and simpler than the +% one of \AmSLaTeX{}. We use |\providecommand| for the definition in +% order to keep the one of the \pkg{amstex} if this package has +% already been loaded. +% \begin{macrocode} +\providecommand{\operatorname}[1]{% + \mathop{\operator@font#1}\nolimits} +\RequirePackage{amsfonts} + +\renewcommand{\P}{% + \mathop{\operator@font I\hspace{-1.5pt}P\hspace{.13pt}}} +\newcommand{\E}{% + \mathop{\operator@font I\hspace{-1.5pt}E\hspace{.13pt}}} +\newcommand{\VAR}{\operatorname{var}} +\newcommand{\COV}{\operatorname{cov}} +\newcommand{\COR}{\operatorname{cor}} + +% Figures ---------------------------------------------------------------------- + +% For use with pandoc > 3.2.1 +\newsavebox\pandoc@box +\newcommand*\pandocbounded[1]{% scales image to fit in text height/width + \sbox\pandoc@box{#1}% + \Gscale@div\@tempa{\textheight}{\dimexpr\ht\pandoc@box+\dp\pandoc@box\relax}% + \Gscale@div\@tempb{\linewidth}{\wd\pandoc@box}% + \ifdim\@tempb\p@<\@tempa\p@\let\@tempa\@tempb\fi% select the smaller of both + \ifdim\@tempa\p@<\p@\scalebox{\@tempa}{\usebox\pandoc@box}% + \else\usebox{\pandoc@box}% + \fi% +} + +\RequirePackage[font=small,labelfont=bf]{caption} + +% Wide environments for figures and tables ------------------------------------- +\RequirePackage{environ} + +% An easy way to make a figure span the full width of the page +\NewEnviron{widefigure}[1][]{ +\begin{figure}[#1] +\advance\leftskip-2cm 
+\begin{minipage}{\dimexpr\textwidth+4cm\relax}% + \captionsetup{margin=2cm} + \BODY +\end{minipage}% +\end{figure} +} + +\NewEnviron{widetable}[1][]{ +\begin{table}[#1] +\advance\leftskip-2cm +\begin{minipage}{\dimexpr\textwidth+4cm\relax}% + \captionsetup{margin=2cm} + \BODY +\end{minipage}% +\end{table} +} diff --git a/_articles/RJ-2025-036/RJwrapper.md b/_articles/RJ-2025-036/RJwrapper.md new file mode 100644 index 0000000000..abc8d0facf --- /dev/null +++ b/_articles/RJ-2025-036/RJwrapper.md @@ -0,0 +1,1245 @@ +--- +abstract: | + The combination of diagnostic tests has become a crucial area of + research, aiming to improve the accuracy and robustness of medical + diagnostics. While existing tools focus primarily on linear + combination methods, there is a lack of comprehensive tools that + integrate diverse methodologies. In this study, we present + [`dtComb`](https://cran.r-project.org/web/packages/dtComb/index.html), + a comprehensive R package and web tool designed to address the + limitations of existing diagnostic test combination platforms. One of + the unique contributions of + [`dtComb`](https://cran.r-project.org/web/packages/dtComb/index.html) + is offering a range of 142 methods to combine two diagnostic tests, + including linear, non-linear, machine learning algorithms, and + mathematical operators. Another significant contribution of + [`dtComb`](https://cran.r-project.org/web/packages/dtComb/index.html) + is its inclusion of advanced tools for ROC analysis, diagnostic + performance metrics, and visual outputs such as + sensitivity-specificity curves. Furthermore, + [`dtComb`](https://cran.r-project.org/web/packages/dtComb/index.html) + offers classification functions for new observations, making it an + easy-to-use tool for clinicians and researchers. The web-based version + is also available at for + non-R users, providing an intuitive interface for test combination and + model training. +address: +- | + S. 
Ilayda Yerlitaş Taştan\
+ Department of Biostatistics\
+ Erciyes University\
+ Türkiye\
+ (ORCiD: 0000-0003-2830-3006)\
+ [ilaydayerlitas340@gmail.com](ilaydayerlitas340@gmail.com){.uri}
+- |
+ Serra Bersan Gengeç\
+ Department of Biostatistics\
+ Erciyes University\
+ Türkiye\
+ [serrabersan@gmail.com](serrabersan@gmail.com){.uri}
+- |
+ Necla Koçhan\
+ Department of Mathematics\
+ Izmir University of Economics\
+ Türkiye\
+ (ORCiD: 0000-0003-2355-4826)\
+ [necla.kayaalp@gmail.com](necla.kayaalp@gmail.com){.uri}
+- |
+ Ertürk Zararsız\
+ Department of Biostatistics\
+ Erciyes University\
+ Türkiye\
+ [gozdeerturk9@gmail.com](gozdeerturk9@gmail.com){.uri}
+- |
+ Selçuk Korkmaz\
+ Department of Biostatistics\
+ Trakya University\
+ Türkiye\
+ [selcukorkmaz@gmail.com](selcukorkmaz@gmail.com){.uri}
+- |
+ Zararsız\
+ Department of Biostatistics\
+ Erciyes University\
+ Türkiye\
+ (ORCiD: 0000-0001-5801-1835)\
+ [gokmen.zararsiz@gmail.com](gokmen.zararsiz@gmail.com){.uri}
+author:
+- S. Ilayda Yerlitaş Taştan, Serra Bersan Gengeç, Necla Koçhan, Ertürk
+ Zararsız, Selçuk Korkmaz and Zararsız
+bibliography:
+- dtCombreferences.bib
+title: "dtComb: A Comprehensive R Library and Web Tool for Combining
+ Diagnostic Tests"
+---
+
+::::::::: article
+## Introduction
+
+A typical scenario encountered in combining diagnostic tests is when
+the gold standard has two categories and two continuous diagnostic
+tests are available. In such cases, clinicians usually compare the two
+diagnostic tests and try to improve performance by reducing the two
+results to a simple ratio
+[@muller2019amyloid; @faria2016neutrophil; @nyblom2006ast].
+However, this technique is simplistic and may not fully capture all
+potential interactions and relationships between the diagnostic tests.
+Linear combination methods have been developed to overcome such problems
+[@erturkzararsiz2023linear].\
+Linear methods combine two diagnostic tests into a single score/index by
+assigning weights to each test, optimizing their performance in
+diagnosing the condition of interest [@neumann2023combining]. Such
+methods improve accuracy by leveraging the strengths of both tests
+[@aznar2022stepwise; @bansal2013does]. For instance, Su and Liu
+[@su1993linear] found that Fisher's linear discriminant function
+generates a linear combination of markers with either proportional or
+disproportional covariance matrices, aiming to maximize sensitivity
+consistently across the entire specificity spectrum under a multivariate
+normal distribution model. In contrast, another approach introduced by
+Pepe and Thompson [@pepe2000combining] relies on ranking scores,
+eliminating the need for distributional assumptions when
+combining diagnostic tests. Despite these theoretical advances, existing
+tools implement only a limited number of methods. For instance, Kramar
+et al. developed a computer program called **mROC** that includes only
+the Su and Liu method [@kramar2001mroc]. Pérez-Fernández et al. presented the
+[`movieROC`](https://cran.r-project.org/web/packages/movieROC/index.html)
+R package, which includes methods such as Su and Liu, min-max, and
+logistic regression [@perez2021visualizing]. An R package called
+[`maxmzpAUC`](https://github.com/wbaopaul/MaxmzpAUC-R) that includes
+similar methods was developed by Yu and Park [@yu2015two].
+
+On the other hand, non-linear approaches that incorporate the
+non-linearity between the diagnostic tests have been developed to
+integrate them [@du2024likelihood; @ghosh2005classification]. These
+approaches build the non-linear structure of the tests into the model,
+which might improve the accuracy and reliability of the diagnosis.
In contrast
+to some existing packages, which provide non-linear modelling tools
+such as splines[^1], lasso[^2] and ridge regression, there is currently
+no package that employs these methods directly for combination and
+reports diagnostic performance. Machine-learning (ML) algorithms have
+recently been adopted to combine diagnostic tests
+[@ahsan2024advancements; @sewak2024construction; @agarwal2023artificial; @prinzi2023explainable].
+Many studies focus on applying ML algorithms to diagnostic tests
+[@salvetat2022game; @salvetat2024ai; @ganapathy2023comparison; @alzyoud2024diagnosing; @zararsiz2016statistical].
+For instance, DeGroat et al. applied four different classification
+algorithms (Random Forest, Support Vector Machine, Extreme Gradient
+Boosting Decision Trees, and k-Nearest Neighbors) to combine markers for
+the diagnosis of cardiovascular disease [@degroat2024discovering]. The
+results showed that patients with cardiovascular disease can be
+diagnosed with up to 96% accuracy using these ML techniques. There are
+numerous tools in which ML methods are implemented
+([`scikit-learn`](https://scikit-learn.org/stable/)
+[@pedregosa2011scikit],
+[`TensorFlow`](https://www.tensorflow.org/learn?hl=tr)
+[@tensorflow2015-whitepaper],
+[`caret`](https://cran.r-project.org/web/packages/caret/index.html)
+[@kuhn2008building]). The
+[`caret`](https://cran.r-project.org/web/packages/caret/index.html)
+library is one of the most comprehensive tools developed in the R
+language [@kuhn2008building]. However, these are general tools developed
+only for ML algorithms and do not directly combine two diagnostic tests
+or provide diagnostic performance measures.
+
+Apart from the aforementioned methods, several basic mathematical
+operations such as addition, multiplication, subtraction, and division
+can also be used to combine markers
+[@svart2024neurofilament; @luo2024ast; @serban2024significance].
For
+instance, addition can enhance diagnostic sensitivity by combining the
+effects of markers, whereas subtraction can differentiate disease states
+more distinctly by highlighting the difference between markers. On the
+other hand, there are several commercial (e.g., IBM SPSS, MedCalc,
+Stata) and open-source (R) software packages
+([`ROCR`](https://cran.r-project.org/web/packages/ROCR/index.html)
+[@sing2005rocr],
+[`pROC`](https://cran.r-project.org/web/packages/pROC/index.html)
+[@robin2011proc],
+[`PRROC`](https://cran.r-project.org/web/packages/PRROC/index.html)
+[@grau2015prroc],
+[`plotROC`](https://cran.r-project.org/web/packages/plotROC/index.html)
+[@sachs2017plotroc]) that researchers can use for receiver operating
+characteristic (ROC) curve analysis. However, these tools are designed
+to perform single-marker ROC analysis. As a result, there is currently
+no software tool that covers almost all combination methods.
+
+In this study, we developed
+[`dtComb`](https://cran.r-project.org/web/packages/dtComb/index.html),
+an R package encompassing nearly all existing combination approaches in
+the literature.
+[`dtComb`](https://cran.r-project.org/web/packages/dtComb/index.html)
+has two key advantages that make it easy to apply and superior to
+other packages: (1) it provides users with a comprehensive set of 142
+methods, including linear and non-linear approaches, ML approaches and
+mathematical operators; (2) it offers a turnkey workflow, from uploading
+data through analysis, performance evaluation and reporting.
+Furthermore, it is the only package that implements linear approaches
+such as Minimax and Todor & Saplacan
+[@sameera2016binary; @todor2014tools]. In addition, it allows
+for the classification of new, previously unseen observations using
+trained models. To our knowledge, no other tool has been designed to
+combine two diagnostic tests on a single platform with 142
+different methods.
In other words,
+[`dtComb`](https://cran.r-project.org/web/packages/dtComb/index.html)
+makes more effective and robust combination methods ready for use as
+alternatives to traditional approaches such as simple ratio-based
+methods. First, we review the theoretical basis of the related
+combination methods; then, we present an example implementation to
+demonstrate the applicability of the package. Finally, we present a
+user-friendly, up-to-date, and comprehensive web tool developed to
+make
+[`dtComb`](https://cran.r-project.org/web/packages/dtComb/index.html)
+accessible to physicians and healthcare professionals who do not use the R
+programming language. The
+[`dtComb`](https://cran.r-project.org/web/packages/dtComb/index.html)
+package is freely available on CRAN, the web application is
+freely available at , and all
+source code is available on GitHub[^3].
+
+## Material and methods
+
+This section provides an overview of the combination methods
+implemented in the literature. Before applying these methods, we
+also discuss the standardization techniques available for the markers,
+the resampling methods used during model training, and, ultimately, the
+metrics used to evaluate the model's performance.
+
+### Combination approaches
+
+#### Linear combination methods
+
+The
+[`dtComb`](https://cran.r-project.org/web/packages/dtComb/index.html)
+package comprises eight distinct linear combination methods, which are
+elaborated in this section. Before investigating these methods, we
+briefly introduce the notation used throughout this
+section.\
+Notations:\
+Let $D_{i}, i = 1, 2, \ldots, n_1$ be the marker values of the $i$th
+individual in the diseased group, where $D_i=(D_{i1},D_{i2})$, and
+$H_j, j=1,2,\ldots,n_2$ be the marker values of the $j$th individual in the
+healthy group, where $H_j=(H_{j1},H_{j2})$.
Let $x_{i1}=c(D_{i1},H_{j1})$ be the values of the first marker, and $x_{i2}=c(D_{i2},H_{j2})$ be the values of the second marker for the $i$th individual, $i=1,2,...,n$. Let $D_{i,min}=\min(D_{i1},D_{i2})$, $D_{i,max}=\max(D_{i1},D_{i2})$, $H_{j,min}=\min(H_{j1},H_{j2})$, $H_{j,max}=\max(H_{j1},H_{j2})$, and let $c_i$ be the resulting combination score of the $i$th individual.

- *Logistic regression:* Logistic regression is a statistical method used for binary classification. The logistic regression model estimates the probability of the binary outcome based on the values of the independent variables. It is one of the most commonly applied methods in diagnostic testing, and it generates a linear combination of markers that can distinguish between control and diseased individuals. Logistic regression is generally less effective than normal-based discriminant analysis, such as Su and Liu's multivariate normality-based method, when the normality assumption is met [@ruiz1991asymptotic; @efron1975efficiency]. On the other hand, others have argued that logistic regression is more robust because it does not require any assumptions about the joint distribution of multiple markers [@cox1989analysis]. Therefore, it is essential to investigate the performance of linear combination methods derived from the logistic regression approach with non-normally distributed data.\
    The objective of the logistic regression model is to maximize the logistic likelihood function; that is, the likelihood is maximized to estimate the logistic regression coefficients.\

    $$\label{eq:1}
    c=\frac{\exp(\beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2})}{1+\exp(\beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2})} (\#eq:1)$$
    The logistic regression coefficients provide the maximum likelihood estimate of the model, producing an easily interpretable score for distinguishing between the two groups.
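
    As an illustration of Eq \@ref(eq:1), the combination score is simply the fitted probability from a binomial `glm()`. The following is a minimal base-R sketch on simulated markers; it is for illustration only and is not the internal implementation of the package:

    ``` r
    # Illustrative sketch: logistic-regression combination of two
    # simulated markers; the score is the fitted probability in Eq (1).
    set.seed(1)
    x1 <- c(rnorm(50, mean = 1), rnorm(50, mean = 0))  # marker 1
    x2 <- c(rnorm(50, mean = 1), rnorm(50, mean = 0))  # marker 2
    y  <- rep(c(1, 0), each = 50)                      # 1 = diseased

    fit   <- glm(y ~ x1 + x2, family = binomial)  # maximizes the logistic likelihood
    score <- predict(fit, type = "response")      # combination score c
    ```
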

- *Scoring based on logistic regression:* This method primarily uses a binary logistic regression model, with slight modifications to enhance the combination score. The regression coefficients, as estimated in Eq \@ref(eq:1), are rounded to a user-specified number of decimal places and subsequently used to calculate the combination score [@leon2006bedside]:
    $$c= \beta_1 x_{i1}+\beta_2 x_{i2}$$

- *Pepe & Thompson's method:* Pepe & Thompson aimed to maximize the AUC or partial AUC when combining diagnostic tests, regardless of the distribution of the markers [@pepe2000combining]. They developed an empirical solution for the optimal linear combination that maximizes the Mann-Whitney U statistic, an empirical estimate of the area under the ROC curve. Notably, this approach is distribution-free. Mathematically, they maximized the following objective function:
    $$\text{maximize} \; U(\alpha)= \frac{1}{n_1 n_2} \sum_{i=1}^{n_1} \sum_{j=1}^{n_2} I\left[D_{i1}+\alpha D_{i2}\geq H_{j1}+\alpha H_{j2}\right]$$

    $$c= x_{i1}+\alpha x_{i2}
    \label{eq:4} (\#eq:4)$$
    where $\alpha \in [-1,1]$ is interpreted as the relative weight of $x_{i2}$ to $x_{i1}$ in the combination, i.e., the weight of the second marker. The goal is to find the $\alpha$ that maximizes $U(\alpha)$. Readers are referred to Pepe and Thompson [@pepe2000combining] for details.

- *Pepe, Cai & Langton's method:* Pepe et al. observed that when the disease status and the marker levels conform to a generalized linear model, the regression coefficients yield the optimal linear combination that maximizes the area under the ROC curve [@pepe2006combining].
    The following objective function is maximized to achieve a higher AUC value:
    $$\text{maximize} \; U(\alpha)= \frac{1}{n_1 n_2} \sum_{i=1}^{n_1} \sum_{j=1}^{n_2} I\left[D_{i1}+\alpha D_{i2}> H_{j1}+\alpha H_{j2}\right] + \frac{1}{2}I\left[D_{i1}+\alpha D_{i2} = H_{j1} + \alpha H_{j2}\right]$$
    Before calculating the combination score using Eq \@ref(eq:4), the marker values are normalized, i.e., scaled to lie between 0 and 1. In addition, the estimate obtained by maximizing the empirical AUC can be considered a special case of the maximum rank correlation estimator, for which a general asymptotic distribution theory has been developed. Readers are referred to Pepe (2003, Chapters 4--6) for a review of the ROC curve approach and further details [@pepe2003statistical].

- *Min-Max method:* The Pepe & Thompson method is straightforward when there are two markers, but it becomes computationally challenging when more than two markers are combined. To overcome this computational complexity, Liu et al. [@liu2011min] proposed a non-parametric approach that linearly combines the minimum and maximum values of the observed markers of each subject. This approach, which does not rely on any assumption about the data distributions (i.e., it is distribution-free), is known as the Min-Max method and may provide higher sensitivity than any single marker. Its objective function is as follows:
    $$\text{maximize} \; U(\alpha)= \frac{1}{n_1 n_2} \sum_{i=1}^{n_1} \sum_{j=1}^{n_2} I[D_{i,max}+\alpha D_{i,min}> H_{j,max}+\alpha H_{j,min}]$$

    $$c= x_{i,max}+\alpha x_{i,min}$$
    where $x_{i,max}=\max(x_{i1},x_{i2})$ and $x_{i,min}=\min(x_{i1},x_{i2})$.\
    The Min-Max method is designed to combine repeated measurements of a single marker over time, or multiple markers measured in the same unit. While it is relatively simple to implement, it has some limitations.
    For example, markers may have different units of measurement, so standardization may be needed to ensure uniformity during the combination process. Furthermore, it is unclear whether all available information is fully utilized, as the method incorporates only the markers' minimum and maximum values into the model [@kang2016linear].

- *Su & Liu's method:* Su and Liu examined the combination score under the assumption of two multivariate normal distributions, both when the covariance matrices are proportional and when they are not [@su1993linear]. Multivariate normal distributions with different covariances were first utilized in classification problems [@anderson1962classification]. Su and Liu then developed a linear combination method by extending the idea of using multivariate distributions to the AUC, showing that the coefficients that maximize the AUC are Fisher's discriminant coefficients. Assume that $D \sim N(\mu_D, \Sigma_D)$ and $H \sim N(\mu_H, \Sigma_H)$ represent the multivariate normal distributions for the diseased and non-diseased groups, respectively. Fisher's coefficients are then:
    $$(\alpha, \beta) = (\Sigma_{D} + \Sigma_{H})^{-1} \mu \label{eq:alpha_beta} (\#eq:alpha-beta)$$
    where $\mu=\mu_D-\mu_H$. The combination score in this case is:
    $$c= \alpha x_{i1}+ \beta x_{i2}
    \label{eq:9} (\#eq:9)$$

- *The Minimax method:* The Minimax method is an extension of Su & Liu's method [@sameera2016binary]. Suppose that $D$ follows a multivariate normal distribution $D\sim N(\mu_D, \Sigma_D)$, representing the diseased group, and $H$ follows a multivariate normal distribution $H\sim N(\mu_H, \Sigma_H)$, representing the non-diseased group.
    Then Fisher's coefficients are as follows:
    $$(\alpha, \beta) = \left[t\Sigma_{D} + (1-t)\Sigma_{H}\right]^{-1} (\mu_D - \mu_H) \label{eq:alpha_beta_expression} (\#eq:alpha-beta-expression)$$

    Given these coefficients, the combination score is calculated using Eq \@ref(eq:9). In this formula, $t$ is a constant ranging from 0 to 1, which can be tuned by maximizing the AUC.

- *Todor & Saplacan's method:* Todor and Saplacan's method uses the sine and cosine trigonometric functions to calculate the combination score [@todor2014tools]. The score is computed using a $\theta \in[-\frac{\pi}{2},\frac{\pi}{2}]$ that maximizes the AUC within this interval. The formula for the combination score is:
    $$c= \sin{(\theta)}x_{i1}+\cos{(\theta)}x_{i2}$$

#### Non-linear combination methods

In addition to the linear combination methods, the [`dtComb`](https://cran.r-project.org/web/packages/dtComb/index.html) package includes seven non-linear approaches, which are discussed in this subsection using the following notation: $x_{ij}$ is the value of the $j$th marker for the $i$th individual, $i=1,2,...,n$ and $j=1,2$; $d$ is the degree of the polynomial regressions and splines, $d = 1,2,\ldots,p$.

- *Logistic Regression with Polynomial Feature Space:* This approach extends the logistic regression model by adding extra predictors created by raising the original predictor variables to a certain power. Including polynomial terms in the feature space enables the model to capture non-linear relationships in the data [@james2013introduction].
    The combination score is calculated as follows:
    $$c=\frac{\exp\left(\beta_0 + \beta_1 x_{ij} + \beta_2 x_{ij}^2+...+\beta_p x_{ij}^p\right)}{1+\exp\left(\beta_0 + \beta_1 x_{ij} + \beta_2 x_{ij}^2+...+\beta_p x_{ij}^p\right)}$$
    where $c$ is the combination score for the $i$th individual and represents the posterior probability.

- *Ridge Regression with Polynomial Feature Space:* This method combines Ridge regression with a feature space expanded by adding polynomial terms to the original predictor variables. Ridge regression is a widely used shrinkage method when there is multicollinearity between the variables, which can be an issue for least squares regression. It estimates the coefficients of these correlated variables by minimizing the residual sum of squares (RSS) plus an additional term (the regularization term) that prevents overfitting. The objective function is based on the L2 norm of the coefficient vector (Eq \@ref(eq:beta-hat-r)). The Ridge estimate is defined as follows:
    $$\hat{\beta}^R = \text{argmin}_{\beta} \, \text{RSS} + \lambda \sum_{j=1}^{2} \sum_{d=1}^{p} \left(\beta_j^{d}\right)^2 \label{eq:beta_hat_r} (\#eq:beta-hat-r)$$

    where
    $$RSS=\sum_{i=1}^{n}\left(y_i-\beta_0-\sum_{j=1}^{2}\sum_{d=1}^{p} \beta_j^d x_{ij}^d\right)^2$$
    and $\hat{\beta}^R$ denotes the estimated Ridge regression coefficients. The second term is the penalty term, where $\lambda \geq 0$ is a shrinkage parameter that controls the amount of shrinkage applied to the regression coefficients; it is selected by cross-validation. We used the [`glmnet`](https://cran.r-project.org/web/packages/glmnet/index.html) package [@friedman2010regularization] to implement Ridge regression for combining the diagnostic tests.
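
    As a sketch of how such a fit can be obtained with [`glmnet`](https://cran.r-project.org/web/packages/glmnet/index.html), the following illustration on simulated markers (not the package's internal code) uses `alpha = 0` to select the ridge penalty and `cv.glmnet()` to pick $\lambda$ by cross-validation:

    ``` r
    # Ridge-penalized logistic regression on a degree-3 polynomial
    # feature space for two simulated markers (illustrative sketch).
    library(glmnet)
    set.seed(1)
    x1 <- c(rnorm(50, 1), rnorm(50, 0))
    x2 <- c(rnorm(50, 1), rnorm(50, 0))
    y  <- rep(c(1, 0), each = 50)

    X     <- cbind(poly(x1, 3, raw = TRUE), poly(x2, 3, raw = TRUE))
    cvfit <- cv.glmnet(X, y, family = "binomial", alpha = 0)  # alpha = 0 -> ridge
    score <- predict(cvfit, newx = X, s = "lambda.min", type = "response")
    ```
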

- *Lasso Regression with Polynomial Feature Space:* Similar to Ridge regression, Lasso regression is a shrinkage method that adds a penalty term to the objective function of least squares regression. The objective function in this case is based on the L1 norm of the coefficient vector, which induces sparsity in the model: some regression coefficients are exactly zero when the tuning parameter $\lambda$ is sufficiently large. This property allows the model to automatically identify and remove less relevant variables and reduces the model's complexity. The Lasso estimate is defined as follows:

    $$\hat{\beta}^L = \text{argmin}_{\beta} \, \text{RSS} + \lambda \sum_{j=1}^{2} \sum_{d=1}^{p} | \beta_j^d | \label{eq:beta_hat_l} (\#eq:beta-hat-l)$$

    To implement Lasso regression for combining the diagnostic tests, we used the [`glmnet`](https://cran.r-project.org/web/packages/glmnet/index.html) package [@friedman2010regularization].

- *Elastic-Net Regression with Polynomial Feature Space:* Elastic-Net regression combines the Lasso (L1 regularization) and Ridge (L2 regularization) penalties to address some of the limitations of each technique. The mixture of the two penalties is controlled by two hyperparameters, $\alpha\in[0,1]$ and $\lambda$, which allow users to adjust the trade-off between the L1 and L2 regularization terms [@james2013introduction]. For the implementation of this method, the [`glmnet`](https://cran.r-project.org/web/packages/glmnet/index.html) package is used [@friedman2010regularization].

- *Splines:* Another non-linear combination technique frequently applied in diagnostic tests is splines. Splines are a versatile mathematical and computational technique with a wide range of applications: piecewise functions that make it possible to interpolate or approximate data points.
    There are several types of splines, such as cubic splines, in which smooth curves are created by approximating a set of control points with cubic polynomial functions. When implementing splines, two critical parameters come into play: the degrees of freedom and the choice of polynomial degree (i.e., the degree of the fitted polynomials). These user-adjustable parameters influence the flexibility and smoothness of the resulting curve and are critical for controlling the behavior of the spline. We used the [`splines`](https://rdocumentation.org/packages/splines/versions/3.6.2) package in R to implement splines.

- *Generalized Additive Models with Smoothing Splines and Generalized Additive Models with Natural Cubic Splines:* Regression models are of great interest in many fields for understanding the importance of different inputs. Although regression is widely used, traditional linear models often fail in practice because effects may not be linear. Generalized additive models (GAMs) were introduced to identify and characterize such non-linear effects [@james2013introduction]. Smoothing splines and natural cubic splines are two standard methods used within GAMs to model non-linear relationships. To implement these two methods, we used the [`gam`](https://cran.r-project.org/web/packages/gam/index.html) package in R [@Trevor2015gam]. GAMs with smoothing splines are a more data-driven and adaptive approach: smoothing splines can automatically capture non-linear relationships without specifying in advance the number of knots (the points where two or more polynomial segments are joined to create a piecewise-defined curve or surface) or the shape of the spline. On the other hand, natural cubic splines are preferred when we have prior knowledge or assumptions about the shape of the non-linear relationship.
    Natural cubic splines are more interpretable, and their flexibility is controlled by the number of knots [@elhakeem2022using].

#### Mathematical Operators

This section covers four arithmetic operators, eight distance measures, and the exponential approach. In addition, unlike the other approaches, users can apply logarithmic, exponential, and trigonometric (sine and cosine) transformations to the markers. Let $x_{ij}$ represent the value of the $j$th variable for the $i$th observation, with $i=1,2,...,n$ and $j=1,2$, and let the resulting combination score for the $i$th individual be $c_i$.

- *Arithmetic Operators:* Arithmetic operators such as addition, multiplication, division, and subtraction can also be used in diagnostic tests to optimize the AUC, a measure of diagnostic test performance. These mathematical operations can increase the AUC and improve the efficacy of diagnostic tests by combining markers in specific ways. For example, if high values indicate risk in one test while low values indicate risk in the other, subtraction or division can combine these markers effectively.

- *Distance Measurements:* When combining markers with mathematical operators, a distance measure is used to evaluate the relationships or similarities between marker values. To our knowledge, no previous study has integrated various distance measures with arithmetic operators in this context. Euclidean distance is the most commonly used distance measure, but it may not always accurately reflect the relationship between markers; therefore, we incorporated a variety of distances into the package.

    These distances, measured from the origin, are given as follows [@minaev2018distance; @pandit2011comparative; @cha2007comprehensive]:\
    *Euclidean:*
    $$c = \sqrt{(x_{i1} - 0)^2 + (x_{i2} - 0)^2} \label{eq:euclidean_distance} (\#eq:euclidean-distance)$$
    \
    *Manhattan:*
    $$c = |x_{i1} - 0| + |x_{i2} - 0| \label{eq:manhattan_distance} (\#eq:manhattan-distance)$$
    \
    *Chebyshev:*
    $$c = \max\{|x_{i1} - 0|, |x_{i2} - 0|\} \label{eq:max_absolute} (\#eq:max-absolute)$$
    \
    *Kulczynski d:*
    $$c = \frac{|x_{i1} - 0| + |x_{i2} - 0|}{\min\{x_{i1}, x_{i2}\}} \label{eq:custom_expression} (\#eq:custom-expression)$$
    \
    *Lorentzian:*
    $$c = \ln(1 + |x_{i1} - 0|) + \ln(1 + |x_{i2} - 0|) \label{eq:ln_expression} (\#eq:ln-expression)$$
    \
    *Taneja:*
    $$c = z_1 \log \left( \frac{z_1}{\sqrt{x_{i1} \epsilon}} \right) + z_2 \log \left( \frac{z_2}{\sqrt{x_{i2} \epsilon}} \right) \label{eq:log_expression} (\#eq:log-expression)$$
    \
    where $z_1 = \frac{x_{i1} - 0}{2}$ and $z_2 = \frac{x_{i2} - 0}{2}$\
    *Kumar-Johnson:*
    $$c = \frac{(x_{i1}^2 - 0)^2}{2(x_{i1} \epsilon)^{3/2}} + \frac{(x_{i2}^2 - 0)^2}{2(x_{i2} \epsilon)^{3/2}} \label{eq:kumar_johnson} (\#eq:kumar-johnson)$$
    where $\epsilon$ is a small positive constant that replaces the zero reference value to avoid division by zero.\
    *Avg:*
    $$c = \frac{|x_{i1} - 0| + |x_{i2} - 0| + \max\{(x_{i1} - 0),(x_{i2} - 0)\}}{2} \label{eq:avg_expression} (\#eq:avg-expression)$$

- *Exponential approach:* The exponential approach is another technique for exploring different relationships between the diagnostic measurements. One of the two diagnostic tests is taken as the base and the other as the exponent, giving $x_{i1}^{(x_{i2})}$ and $x_{i2}^{(x_{i1})}$. The specific goals or hypotheses of the analysis, as well as the characteristics of the diagnostic tests, determine which form to use.
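
The distance-based scores above are direct elementwise formulas and can be transcribed in a few lines of R. The following sketch covers three of the distances, taking the origin as the reference point; it is an illustration of the formulas, not the `mathComb()` implementation:

``` r
# Direct transcription of the Euclidean, Manhattan, and Lorentzian
# distance scores for two marker vectors (reference point 0).
x1 <- c(0.8, 1.4, 0.3)
x2 <- c(1.1, 0.7, 2.0)

c_euclidean  <- sqrt(x1^2 + x2^2)
c_manhattan  <- abs(x1) + abs(x2)
c_lorentzian <- log(1 + abs(x1)) + log(1 + abs(x2))
```
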

#### Machine-Learning algorithms

Machine-learning algorithms have been increasingly implemented in various fields, including medicine, to combine diagnostic tests. Integrating diagnostic tests through ML can lead to more accurate, timely, and personalized diagnoses, which are particularly valuable in complex medical cases where multiple factors must be considered. In this study, we aimed to incorporate almost all ML algorithms into the package, taking advantage of the [`caret`](https://cran.r-project.org/web/packages/caret/index.html) package in R [@kuhn2008building]. This package includes 190 classification algorithms that can be used to train models and make predictions. We focused on models that take numerical inputs and produce binary responses depending on the variables/features and the desired outcome. This selection process resulted in 113 models, which we implemented in our study and grouped into five classes, following [@zararsiz2016statistical]: (i) discriminant classifiers, (ii) decision tree models, (iii) kernel-based classifiers, (iv) ensemble classifiers, and (v) others. As in the [`caret`](https://cran.r-project.org/web/packages/caret/index.html) package, `mlComb()` sets up a grid of tuning parameters for a number of classification routines, fits each model, and calculates a performance measure based on resampling. After model fitting, it uses the `predict()` function to calculate the probability of the "event" occurring for each observation. Finally, it performs ROC analysis based on the probabilities obtained in the prediction step.

### Standardization

Standardization converts or transforms data onto a common scale to facilitate meaningful comparisons and statistical inference. Many statistical techniques employ standardization to improve the interpretability and comparability of data.
We implemented five different standardization methods that can be applied to each marker; their formulas are listed below:

- Z-score: $\frac{x - \text{mean}(x)}{\text{sd}(x)}$

- T-score: $\left( \frac{x - \text{mean}(x)}{\text{sd}(x)} \times 10 \right) + 50$

- min_max_scale: $\frac{x - \min(x)}{\max(x) - \min(x)}$

- scale_mean_to_one: $\frac{x}{\text{mean}(x)}$

- scale_sd_to_one: $\frac{x}{\text{sd}(x)}$

### Model building

After specifying a combination method from the [`dtComb`](https://cran.r-project.org/web/packages/dtComb/index.html) package, users can build models and optimize model parameters using the functions `mlComb()`, `linComb()`, `nonlinComb()`, and `mathComb()`, depending on the chosen approach. For the linear and non-linear approaches (i.e., `linComb()`, `nonlinComb()`), parameter optimization uses n-fold cross-validation, repeated n-fold cross-validation, or bootstrapping. For the machine-learning approaches (i.e., `mlComb()`), all of the resampling methods from the [`caret`](https://cran.r-project.org/web/packages/caret/index.html) package are available for optimizing the model parameters. The number of parameters being optimized varies across models, and these parameters are fine-tuned to maximize the AUC. The returned object stores the input data, the preprocessed and transformed data, the trained model, and the resampling results.

### Evaluation of model performances

A confusion matrix, as shown in Table [1](#tab:T1){reference-type="ref" reference="tab:confusion_matrix"}, is a table used to evaluate the performance of a classification model and shows the number of correct and incorrect predictions.
It compares predicted and actual class labels: the diagonal elements represent correct predictions, and the off-diagonal elements represent incorrect predictions.

::: {#tab:confusion_matrix}
  -----------------------------------------------------------
  Predicted labels    Actual class labels             Total
  ------------------ --------------------- ---------- -------
                      Positive              Negative

  Positive            TP                    FP         TP+FP

  Negative            FN                    TN         FN+TN

  Total               TP+FN                 FP+TN      n
  -----------------------------------------------------------

  : (#tab:T1) Confusion Matrix
:::

::: flush
TP: True Positive, TN: True Negative, FP: False Positive, FN: False Negative, n: Sample size
:::

The [`dtComb`](https://cran.r-project.org/web/packages/dtComb/index.html) package uses the [`OptimalCutpoints`](https://cran.r-project.org/web/packages/OptimalCutpoints/index.html) package [@yin2014optimal] to generate the confusion matrix and then the [`epiR`](https://cran.r-project.org/web/packages/epiR/index.html) package [@stevenson2017epir] to evaluate performance with a range of metrics. The measures available in [`dtComb`](https://cran.r-project.org/web/packages/dtComb/index.html) are: accuracy rate (ACC), the Kappa statistic ($\kappa$), sensitivity (SE), specificity (SP), apparent and true prevalence (AP, TP), positive and negative predictive values (PPV, NPV), positive and negative likelihood ratios (PLR, NLR), the proportion of true outcome-negative subjects that test positive (False T+ proportion for true D-), the proportion of true outcome-positive subjects that test negative (False T- proportion for true D+), the proportion of test-positive subjects that are outcome negative (False T+ proportion for T+), and the proportion of test-negative subjects that are outcome positive (False T- proportion for T-). These metrics are summarized in Table [2](#tab:T2){reference-type="ref" reference="tab:performance_metrics"}.

::: {#tab:performance_metrics}
  -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  **Performance Metric**                                                **Formula**
  --------------------------------------------------------------------- -------------------------------------------------------------------------------------------------------------------------
  Accuracy                                                              $\text{ACC} = \frac{{\text{TP} + \text{TN}}}{n}$

  Kappa                                                                 $\kappa = \frac{{\text{ACC} - P_e}}{{1 - P_e}}$

                                                                        $P_e = \frac{{(\text{TP} + \text{FN})(\text{TP} + \text{FP}) + (\text{FP} + \text{TN})(\text{FN} + \text{TN})}}{{n^2}}$

  Sensitivity (Recall)                                                  $\text{SE} = \frac{{\text{TP}}}{{\text{TP} + \text{FN}}}$

  Specificity                                                           $\text{SP} = \frac{{\text{TN}}}{{\text{TN} + \text{FP}}}$

  Apparent Prevalence                                                   $\text{AP} = \frac{{\text{TP} + \text{FP}}}{{n}}$

  True Prevalence                                                       $\text{TP} = \frac{{\text{AP} + \text{SP} - 1}}{{\text{SE} + \text{SP} - 1}}$

  Positive Predictive Value (Precision)                                 $\text{PPV} = \frac{{\text{TP}}}{{\text{TP} + \text{FP}}}$

  Negative Predictive Value                                             $\text{NPV} = \frac{{\text{TN}}}{{\text{TN} + \text{FN}}}$

  Positive Likelihood Ratio                                             $\text{PLR} = \frac{{\text{SE}}}{{1 - \text{SP}}}$

  Negative Likelihood Ratio                                             $\text{NLR} = \frac{{1 - \text{SE}}}{{\text{SP}}}$

  The Proportion of True Outcome Negative Subjects That Test Positive   $\frac{{\text{FP}}}{{\text{FP} + \text{TN}}}$

  The Proportion of True Outcome Positive Subjects That Test Negative   $\frac{{\text{FN}}}{{\text{TP} + \text{FN}}}$

  The Proportion of Test Positive Subjects That Are Outcome Negative    $\frac{{\text{FP}}}{{\text{TP} + \text{FP}}}$

  The Proportion of Test Negative Subjects That Are Outcome Positive    $\frac{{\text{FN}}}{{\text{FN} + \text{TN}}}$
  -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

  : (#tab:T2) Performance Metrics and Formulas
:::

### Prediction of the test cases

The class labels of the observations in the test set are predicted with the model parameters derived from the training phase. It is critical to emphasize that the same analytical procedures employed during training, such as normalization, transformation, or standardization, are also applied to the test set. More specifically, if the training set underwent Z-standardization, the test set is standardized using the mean and standard deviation derived from the training set. The class labels of the test set are then estimated from the trained model using the cut-off value established during the training phase.

### Technical details and the structure of dtComb

The [`dtComb`](https://cran.r-project.org/web/packages/dtComb/index.html) package is implemented in the R programming language () version 4.2.0. Package development was facilitated with [`devtools`](https://cran.r-project.org/web/packages/devtools/index.html) [@wickham2016devtools] and documented with [`roxygen2`](https://cran.r-project.org/web/packages/roxygen2/index.html) [@wickham2013roxygen2]. Package testing was performed using 271 unit tests [@wickham2011testthat].
Double programming was performed using +Python () to validate the implemented functions +[@shiralkarprogramming].\ +To combine diagnostic tests, the +[`dtComb`](https://cran.r-project.org/web/packages/dtComb/index.html) +package allows the integration of eight linear combination methods, +seven non-linear combination methods, arithmetic operators, and, in +addition to these, eight distance metrics within the scope of +mathematical operators and a total of 113 machine-learning algorithms +from the +[`caret`](https://cran.r-project.org/web/packages/caret/index.html) +package [@kuhn2008building]. These are summarized in Table +[3](#tab:T3){reference-type="ref" reference="tab:dtComb_features"}. + +::: {#tab:dtComb_features} ++--------------------------+------------------------------------------+ +| **Modules (Tab Panels)** | **Features** | ++:=========================+:=========================================+ +| Combination Methods | - Linear Combination Approach (8 | +| | Different methods) | +| | | +| | - Non-linear Combination Approach (7 | +| | Different Methods) | +| | | +| | - Mathematical Operators (14 Different | +| | methods) | +| | | +| | - Machine-Learning Algorithms (113 | +| | Different Methods) | +| | [@kuhn2008building] | ++--------------------------+------------------------------------------+ +| | - Five standardization methods | +| | applicable to linear, non-linear, | +| | mathematical methods | +| | | +| | - 16 preprocessing methods applicable | +| | to ML [@kuhn2008building] | ++--------------------------+------------------------------------------+ +| | - Three different methods for linear | +| | and non-linear combination methods | +| | | +| | - Bootstrapping | +| | | +| | - Cross-validation | +| | | +| | - Repeated cross-validation | +| | | +| | - 12 different resampling methods for | +| | ML [@kuhn2008building] | ++--------------------------+------------------------------------------+ +| | - 34 different methods for optimum | +| | cutpoints 
[@yin2014optimal] | ++--------------------------+------------------------------------------+ + +: (#tab:T3) Features of dtComb +::: + +## Results + +Table [4](#tab:T4){reference-type="ref" reference="tab:exist_pck"} +summarizes the existing packages and programs, including +[`dtComb`](https://cran.r-project.org/web/packages/dtComb/index.html), +along with the number of combination methods included in each package. +While **mROC** offers only one linear combination method, +[`maxmzpAUC`](https://github.com/wbaopaul/MaxmzpAUC-R) and +[`movieROC`](https://cran.r-project.org/web/packages/movieROC/index.html) +provide five linear combination techniques each, and +[`SLModels`](https://cran.r-project.org/web/packages/SLModels/index.html) +includes four. However, these existing packages primarily focus on +linear combination approaches. In contrast, +[`dtComb`](https://cran.r-project.org/web/packages/dtComb/index.html) +goes beyond these limitations by integrating not only linear methods but +also non-linear approaches, machine learning algorithms, and +mathematical operators. + +::: {#tab:exist_pck} + -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- + **Packages&Programs** **Linear Comb.** **Non-linear Comb.** **Math. 
Operators**   **ML algorithms**
  -------------------------------------------------------------------------------------------------------- ------------------ ---------------------- --------------------- -------------------
  **mROC** [@kramar2001mroc]                                                                                1                  \-                     \-                    \-

  [`maxmzpAUC`](https://github.com/wbaopaul/MaxmzpAUC-R) [@yu2015two]                                       5                  \-                     \-                    \-

  [`movieROC`](https://cran.r-project.org/web/packages/movieROC/index.html) [@perez2021visualizing]         5                  \-                     \-                    \-

  [`SLModels`](https://cran.r-project.org/web/packages/SLModels/index.html) [@aznar-gimeno2023comparing]    4                  \-                     \-                    \-

  [`dtComb`](https://cran.r-project.org/web/packages/dtComb/index.html)                                     8                  7                      14                    113
  --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

  : (#tab:T4) Comparison of dtComb vs. existing packages and programs
:::

### Dataset

To demonstrate the functionality of the [`dtComb`](https://cran.r-project.org/web/packages/dtComb/index.html) package, we conducted a case study using four different combination methods. The data used in this study were obtained from patients who presented to the Department of General Surgery at Erciyes University Faculty of Medicine with complaints of abdominal pain [@zararsiz2016statistical; @akyildiz2010value]. The dataset comprises the D-dimer levels (*D_dimer*) and leukocyte counts (*log_leukocyte*) of 225 patients, divided into two groups (*Group*): 110 patients who required an immediate laparotomy (*needed*) and 115 patients who did not (*not_needed*). After evaluation under conventional treatment, patients who underwent surgery and whose postoperative pathology confirmed the need were placed in the first group, while those with a negative laparotomy result were assigned to the second group.
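
Before running the workflow, the dataset can be inspected directly. A brief sketch (the first column holds the group labels used in the splitting code further below; the group sizes are those reported above):

``` r
# Inspect the laparotomy data: structure and group sizes
library(dtComb)
data(laparotomy)
str(laparotomy)
table(laparotomy$group)  # 110 needed, 115 not_needed, as reported above
```
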
All the +analyses were performed following the workflow given in Fig. +[1](#figure:workflow){reference-type="ref" reference="figure:workflow"}. +First, the +[`dtComb`](https://cran.r-project.org/web/packages/dtComb/index.html) +package must be loaded in order to use the related functions. + +![Figure 1: **Combination steps of two diagnostic tests.** The figure +presents a schematic representation of the sequential steps involved in +combining two diagnostic tests using a combination +method.](Figure/Figure_1.png){#figure:workflow width="81.0%" +alt="graphic without alt text"} + +``` r +# load dtComb package +library(dtComb) +``` + +Similarly, the laparotomy data shipped with the package can be loaded +with the following R code: + +``` r +# load laparotomy data +data(laparotomy) +``` + +### Implementation of the dtComb package + +To demonstrate the applicability of the +[`dtComb`](https://cran.r-project.org/web/packages/dtComb/index.html) +package, we implement one arbitrarily chosen method from each of the +linear, non-linear, mathematical-operator, and machine-learning +approaches and compare their performance. These methods are PCL (Pepe, +Cai & Langton) for the linear combination, Splines for the non-linear +combination, Addition for the mathematical operators, and SVM for +machine learning. Before applying the methods, we split the data into +two parts: a training set comprising 70% of the data and a test set +comprising the remaining 30%.
+ +``` r +# Split the data set into train and test sets (70%-30%) +set.seed(2128) +inTrain <- caret::createDataPartition(laparotomy$group, p = 0.7, list = FALSE) +trainData <- laparotomy[inTrain, ] +colnames(trainData) <- c("Group", "D_dimer", "log_leukocyte") +testData <- laparotomy[-inTrain, -1] + +# define markers and status for the combination functions +markers <- trainData[, -1] +status <- factor(trainData$Group, levels = c("not_needed", "needed")) +``` + +The model is trained on `trainData`, and the resampling scheme used in +the training phase is five-fold cross-validation repeated ten times. +`direction = "<"` is chosen because higher marker values indicate higher +risk. The Youden index was selected among the cut-off methods. We note +that the markers are not standardized and that results are reported with +95% confidence intervals (CI). The four main combination functions are +run with the selected methods as follows. + +``` r +# PCL method +fit.lin.PCL <- linComb(markers = markers, status = status, event = "needed", + method = "PCL", resample = "repeatedcv", nfolds = 5, + nrepeats = 10, direction = "<", cutoff.method = "Youden") + +# splines method (degree = 3 and degrees of freedom = 3) +fit.nonlin.splines <- nonlinComb(markers = markers, status = status, event = "needed", + method = "splines", resample = "repeatedcv", nfolds = 5, + nrepeats = 10, cutoff.method = "Youden", direction = "<", + df1 = 3, df2 = 3) + +# addition operator +fit.add <- mathComb(markers = markers, status = status, event = "needed", + method = "add", direction = "<", cutoff.method = "Youden") + +# SVM (linear kernel) +fit.svm <- mlComb(markers = markers, status = status, event = "needed", method = "svmLinear", + resample = "repeatedcv", nfolds = 5, nrepeats = 10, direction = "<", + cutoff.method = "Youden") +``` + +Various measures were considered to compare model performances, +including AUC, ACC, SEN, SPE, PPV, and NPV. AUC statistics, with 95% +CIs, were calculated for each marker and method.
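Each call above returns an object of class "dtComb" whose training-set summaries can be inspected directly. A sketch, assuming all four fits expose the same `AUC_table` slot that is printed for the splines fit later in this article:

``` r
# training-set AUC (with 95% CI) for each fitted combination model
fit.lin.PCL$AUC_table
fit.nonlin.splines$AUC_table
fit.add$AUC_table
fit.svm$AUC_table
```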
The resulting +statistics are as follows: 0.816 (0.751--0.880), 0.802 (0.728--0.877), +0.888 (0.825--0.930), 0.911 (0.868--0.954), 0.877 (0.824--0.929), and +0.875 (0.821--0.930) for D-dimer, Log(leukocyte), PCL (Pepe, Cai & +Langton), Splines, Addition, and Support Vector Machine (SVM), +respectively. The results revealed that the predictive performances of +the markers and of the combinations of markers are significantly higher +than random chance in determining the use of laparotomy ($p<0.05$). The +highest sensitivity and NPV were observed with the Addition method, +while the highest specificity and PPV were observed with the Splines +method. According to the overall AUC and accuracies, the combined +approach fitted with the Splines method performed better than the other +methods (Fig. [2](#figure:radar){reference-type="ref" +reference="figure:radar"}). Therefore, the Splines method will be used +in the subsequent analysis of the findings. + +![Figure 2: **Radar plots of trained models and performance measures of +two markers.** Radar plots summarize the diagnostic performances of two +markers and various combination methods in the training dataset. These +plots illustrate performance metrics such as AUC, ACC, SEN, SPE, PPV, +and NPV. In these plots, the area of the polygon formed by connecting +the points reflects the model's performance in terms of the AUC, ACC, +SEN, SPE, PPV, and NPV metrics.
It can be observed +that the polygon associated with the Splines method occupies the most +extensive area, which means that the Splines method performed better +than the other methods.](Figure/Figure_4.png){#figure:radar width="100%" +alt="graphic without alt text"} + +For the AUC of the markers and the splines model: + +``` r +fit.nonlin.splines$AUC_table + AUC SE.AUC LowerLimit UpperLimit z p.value +D_dimer 0.8156966 0.03303310 0.7509530 0.8804403 9.556979 1.212446e-21 +log_leukocyte 0.8022286 0.03791768 0.7279113 0.8765459 7.970652 1.578391e-15 +Combination 0.9111752 0.02189588 0.8682601 0.9540904 18.778659 1.128958e-78 +``` + +Here, `SE` denotes the standard error.\ +The areas under the ROC curves for the D-dimer level, the leukocyte +count on the logarithmic scale, and the combination score were 0.816, +0.802, and 0.911, respectively. The ROC curves generated with the +combination score from the splines model, the D-dimer level, and the +leukocyte count are also given in Fig. +[3](#figure:roc){reference-type="ref" reference="figure:roc"}, showing +that the combination score has the highest AUC. The splines method thus +significantly improved the AUC by 9.5% and 10.9% compared with the +D-dimer level and the leukocyte count, respectively. + +![Figure 3: **ROC curves.** ROC curves for combined diagnostic tests, +with sensitivity displayed on the y-axis and 1-specificity displayed on +the x-axis.
As can be observed, the combination score produced the +highest AUC value, indicating that the combined strategy performs the +best overall.](Figure/Figure_2.png){#figure:roc width="70.0%" +alt="graphic without alt text"} + +To see the results of the pairwise comparisons between the combination +score and the markers: + +``` r +fit.nonlin.splines$MultComp_table + +Marker1 (A) Marker2 (B) AUC (A) AUC (B) |A-B| SE(|A-B|) z p-value +1 Combination D_dimer 0.9079686 0.8156966 0.09227193 0.02223904 4.1490971 3.337893e-05 +2 Combination log_leukocyte 0.9079686 0.8022286 0.10573994 0.03466544 3.0502981 2.286144e-03 +3 D_dimer log_leukocyte 0.8156966 0.8022286 0.01346801 0.04847560 0.2778308 7.811423e-01 +``` + +After controlling the Type I error rate with a Bonferroni correction, +the comparisons of the combination score with the single markers +remained significant ($p<0.05$).\ +To obtain the diagnostic test results and performance measures for the +non-linear combination approach, the following code can be used: + +``` r +fit.nonlin.splines$DiagStatCombined + Outcome + Outcome - Total +Test + 66 13 79 +Test - 11 68 79 +Total 77 81 158 + +Point estimates and 95% CIs: +-------------------------------------------------------------- +Apparent prevalence * 0.50 (0.42, 0.58) +True prevalence * 0.49 (0.41, 0.57) +Sensitivity * 0.86 (0.76, 0.93) +Specificity * 0.84 (0.74, 0.91) +Positive predictive value * 0.84 (0.74, 0.91) +Negative predictive value * 0.86 (0.76, 0.93) +Positive likelihood ratio 5.34 (3.22, 8.86) +Negative likelihood ratio 0.17 (0.10, 0.30) +False T+ proportion for true D- * 0.16 (0.09, 0.26) +False T- proportion for true D+ * 0.14 (0.07, 0.24) +False T+ proportion for T+ * 0.16 (0.09, 0.26) +False T- proportion for T- * 0.14 (0.07, 0.24) +Correctly classified proportion * 0.85 (0.78, 0.90) +-------------------------------------------------------------- +* Exact CIs +``` + +Furthermore, if the diagnostic test results and performance measures of +the combination score are compared with the
results of the single +markers, it can be observed that the TN value of the combination score +is higher than that of the single markers, and the combination of +markers has higher specificity and positive-negative predictive value +than the log-transformed leukocyte counts and D-dimer level (Table +[5](#tab:T5){reference-type="ref" reference="tab:diagnostic_measures"}). +Conversely, D-dimer has a higher sensitivity than the others. Optimal +cut-off values for both markers and the combined approach are also given +in this table. + +::: {#tab:diagnostic_measures} + --------------------------------------------------------------------------------------------------------------------------------------- + **Diagnostic Measures (95% CI)** **D-dimer level ($>1.6$)** **Log(leukocyte count) ($>4.16$)** **Combination score ($>0.448$)** + ---------------------------------- ---------------------------- ------------------------------------ ---------------------------------- + TP 66 61 65 + + TN 53 60 69 + + FP 28 21 12 + + FN 11 16 12 + + Apparent prevalence 0.59 (0.51-0.67) 0.52 (0.44-0.60) 0.49 (0.41-0.57) + + True prevalence 0.49 (0.41-0.57) 0.49 (0.41-0.57) 0.49 (0.41-0.57) + + Sensitivity 0.86 (0.76-0.93) 0.79 (0.68-0.88) 0.84 (0.74-0.92) + + Specificity 0.65 (0.54-0.76) 0.74 (0.63-0.83) 0.85 (0.76-0.92) + + Positive predictive value 0.70 (0.60-0.79) 0.74 (0.64-0.83) 0.84 (0.74-0.92) + + Negative predictive value 0.83 (0.71-0.91) 0.79 (0.68-0.87) 0.85 (0.76-0.92) + + Positive likelihood ratio 2.48 (1.81-3.39) 3.06 (2.08-4.49) 5.70 (3.35-9.69) + + Negative likelihood ratio 0.22 (0.12-0.39) 0.28 (0.18-0.44) 0.18 (0.11-0.31) + + False T+ proportion for true D- 0.35 (0.24-0.46) 0.26 (0.17-0.37) 0.15 (0.08-0.24) + + False T- proportion for true D+ 0.14 (0.07-0.24) 0.21 (0.12-0.32) 0.16 (0.08-0.26) + + False T+ proportion for T+ 0.30 (0.21-0.40) 0.26 (0.17-0.36) 0.16 (0.08-0.26) + + False T- proportion for T- 0.17 (0.09-0.29) 0.21 (0.13-0.32) 0.15 (0.08-0.24) + + Accuracy 0.75 
(0.68-0.82) 0.77 (0.69-0.83) 0.85 (0.78-0.90) + --------------------------------------------------------------------------------------------------------------------------------------- + + : (#tab:T5) Statistical diagnostic measures with 95% confidence + intervals for each marker and the combination score +::: + +For a comprehensive analysis, the `plotComb` function in +[`dtComb`](https://cran.r-project.org/web/packages/dtComb/index.html) +can be used to generate kernel density and individual-value plots of the +combination scores of each group, together with the specificity and +sensitivity corresponding to different cut-off values (Fig. +[4](#figure:scatter){reference-type="ref" +reference="figure:scatter"}). This function requires the result of the +`nonlinComb` function, which is an object of the "dtComb" class, and +`status`, which must be a factor. + +``` r +# draw distribution, dispersion, and specificity and sensitivity plots +plotComb(fit.nonlin.splines, status) +``` + +![Figure 4: **Kernel density, individual-value, and sens&spe plots of +the combination score acquired with the training model.** Kernel density +of the combination score for two groups: needed and not needed (a). +Individual-value graph with classes on the x-axis and combination score +on the y-axis (b). Sensitivity and specificity graph of the combination +score (c). While colors show each class in Figures (a) and (b), in +Figure (c), the colors represent the sensitivity and specificity of the +combination score.](Figure/Figure_3.png){#figure:scatter width="100%" +alt="graphic without alt text"} + +To test the model trained with Splines, the generic `predict` function +is used. This function requires the test set and the result of the +`nonlinComb` function, which is an object of the "dtComb" class. For +each observation, the prediction output consists of the combination +score and the predicted label determined by the cut-off value derived +from the model.
+ +``` r +# To predict the test set +pred <- predict(fit.nonlin.splines, testData) +head(pred) + + comb.score labels +1 0.6133884 needed +7 0.9946474 needed +10 0.9972347 needed +11 0.9925040 needed +13 0.9257699 needed +14 0.9847090 needed +``` + +Above, it can be seen that the estimated combination scores for the +first six observations in the test set were labelled as **needed** +because they were higher than the cut-off value of 0.448. + +### Web interface for the dtComb package + +The primary goal of developing the +[`dtComb`](https://cran.r-project.org/web/packages/dtComb/index.html) +package is to combine numerous distinct combination methods and make +them easily accessible to researchers. Furthermore, the package includes +diagnostic statistics and visualization tools for diagnostic tests and +the combination score generated by the chosen method. Nevertheless, it +is worth noting that using R code may pose challenges for physicians and +those unfamiliar with R programming. We have also developed a +user-friendly web application for +[`dtComb`](https://cran.r-project.org/web/packages/dtComb/index.html) +using +[`Shiny`](https://cran.r-project.org/web/packages/shiny/index.html) +[@chang2017shiny] to address this. This web-based tool is publicly +accessible and provides an interactive interface with all the +functionalities found in the +[`dtComb`](https://cran.r-project.org/web/packages/dtComb/index.html) +package.\ +To initiate the analysis, users must upload their data by following the +instructions outlined in the \"Data upload\" tab of the web tool. For +convenience, we have provided three example datasets on this page to +assist researchers in practicing the tool's functionality and to guide +them in formatting their own data (as illustrated in Fig. +[5](#figure:web){reference-type="ref" reference="figure:web"}a). 
We also +note that ROC analysis for a single marker can be performed within the +'ROC Analysis for Single Marker(s)' tab in the data upload section of +the web interface. + +In the \"Analysis\" tab, one can find two crucial subpanels: + +- Plots (Fig. [5](#figure:web){reference-type="ref" + reference="figure:web"}b): This section offers various visual + representations, such as ROC curves, kernel density plots, + individual-value plots, and sensitivity and specificity plots. These + visualizations help users assess single diagnostic tests and the + combination score generated using user-defined combination methods. + +- Results (Fig. [5](#figure:web){reference-type="ref" + reference="figure:web"}c): In this subpanel, one can access a range + of statistics. It provides insights into the combination score and + single diagnostic tests, AUC statistics, and comparisons to evaluate + how the combination score fares against individual diagnostic tests, + and various diagnostic measures. One can also predict new data based + on the model parameters set previously and stored in the \"Predict\" + tab (Fig. [5](#figure:web){reference-type="ref" + reference="figure:web"}d). If needed, one can download the model + created during the analysis to keep the parameters of the fitted + model. This lets users make new predictions by reloading the model + from the \"Predict\" tab. Additionally, all the results can easily + be downloaded using the dedicated download buttons in their + respective tabs. + +![Figure 5: **Web interface of the dtComb package.** The figure +illustrates the web interface of the +[`dtComb`](https://cran.r-project.org/web/packages/dtComb/index.html) +package, which demonstrates the steps involved in combining two +diagnostic tests. 
a) Data Upload: The user is able to upload the dataset +and select relevant markers, a gold standard test, and an event factor +for analysis. b) Combination Analysis: This panel allows the selection +of the combination method, method-specific parameters, and resampling +options to refine the analysis. c) Combination Analysis Output: Displays +the results generated by the selected combination method, providing the +user with key metrics and visualizations for interpretation. d) Predict: +Displays the prediction results of the trained model when applied to the +test set.](Figure/Figure_5.png){#figure:web width="100%" +alt="graphic without alt text"} + +## Summary and further research + +In clinical practice, multiple diagnostic tests are often available for +diagnosing a disease [@yu2015two]. Combining these tests to enhance +diagnostic accuracy is a widely accepted approach +[@su1993linear; @pepe2000combining; @liu2011min; @sameera2016binary; @pepe2006combining; @todor2014tools]. +As far as we know, the tools in Table [4](#tab:T4){reference-type="ref" +reference="tab:exist_pck"} have been designed to combine diagnostic +tests but contain at most five different combination methods. As a +result, despite the existence of numerous advanced combination methods, +there has been no extensive tool available for integrating diagnostic +tests.\ +In this study, we presented +[`dtComb`](https://cran.r-project.org/web/packages/dtComb/index.html), a +comprehensive R package designed to combine diagnostic tests using +various methods, including linear, non-linear, mathematical operators, +and machine learning algorithms. The package integrates 142 different +methods for combining two diagnostic markers to improve the accuracy of +diagnosis. The package also provides ROC curve analysis, various +graphical approaches, diagnostic performance scores, and binary +comparison results.
In the given example, one can determine whether +patients with abdominal pain require laparotomy by combining the D-dimer +levels and white blood cell counts of those patients. Various methods, +including linear and non-linear combinations, were tested, and the +results showed that the Splines method performed better than the others, +particularly in terms of AUC and accuracy compared to the single tests. +This shows that diagnostic accuracy can be improved with combination +methods.\ +Future work can focus on extending the capabilities of the +[`dtComb`](https://cran.r-project.org/web/packages/dtComb/index.html) +package. While some studies focus on combining multiple markers +[@kang2016linear], our study aimed to combine two markers using nearly +all existing methods and to develop a tool and package for clinical +practice. + +### R Software + +The R package +[`dtComb`](https://cran.r-project.org/web/packages/dtComb/index.html) is +available on CRAN at +<https://cran.r-project.org/web/packages/dtComb/index.html>. + +### Acknowledgment + +We would like to thank the Proofreading & Editing Office of the Dean for +Research at Erciyes University for the copyediting and proofreading +service for this manuscript.
+::::::::: + +[^1]: [https://cran.r-project.org/web/packages/splines/index.html](https://cran.r-project.org/web/packages/splines/index.html){.uri} + +[^2]: []{#note2 label="note2"} <https://cran.r-project.org/web/packages/glmnet/index.html> + +[^3]: <https://github.com/gokmenzararsiz/dtComb>, <https://github.com/gokmenzararsiz/dtComb_Shiny> + diff --git a/_articles/RJ-2025-036/RJwrapper.tex b/_articles/RJ-2025-036/RJwrapper.tex new file mode 100644 index 0000000000..b8f01a8e4a --- /dev/null +++ b/_articles/RJ-2025-036/RJwrapper.tex @@ -0,0 +1,35 @@ +\documentclass[a4paper]{report} +\usepackage[utf8]{inputenc} +\usepackage[T1]{fontenc} +\usepackage{RJournal} +\usepackage{amsmath,amssymb,array} +\usepackage{booktabs} +\usepackage{multirow} +\usepackage{array} +\usepackage{ulem} +\usepackage{cleveref} +\usepackage{hyperref} +\usepackage{float} + + +%% load any required packages FOLLOWING this line +\renewcommand{\arraystretch}{1.5} % Adjust the value as needed +\DeclareUnicodeCharacter{2047}{\ensuremath{\max}} +\DeclareUnicodeCharacter{2061}{} + +\begin{document} + +%% do not edit, for illustration only +\sectionhead{Contributed research article} +\volume{17} +\volnumber{4} +\year{2025} +\month{December} +\setcounter{page}{80} + +%% replace RJtemplate with your article +\begin{article} + \input{dtComb3} +\end{article} + +\end{document} diff --git a/_articles/RJ-2025-036/Rlogo-5.png b/_articles/RJ-2025-036/Rlogo-5.png new file mode 100644 index 0000000000..077505788a Binary files /dev/null and b/_articles/RJ-2025-036/Rlogo-5.png differ diff --git a/_articles/RJ-2025-036/diff.pdf b/_articles/RJ-2025-036/diff.pdf new file mode 100644 index 0000000000..06115a4ef6 Binary files /dev/null and b/_articles/RJ-2025-036/diff.pdf differ diff --git a/_articles/RJ-2025-036/diff.tex b/_articles/RJ-2025-036/diff.tex new file mode 100644 index 0000000000..b6396b0a89 --- /dev/null +++ b/_articles/RJ-2025-036/diff.tex @@ -0,0 +1,652 @@ + +%% Include all macros below + +\newcommand{\lorem}{{\bf LOREM}} +\newcommand{\ipsum}{{\bf IPSUM}} + +%% END MACROS SECTION +%DIF PREAMBLE EXTENSION ADDED BY LATEXDIFF +%DIF UNDERLINE PREAMBLE
%DIF PREAMBLE +%\RequirePackage[]{ulem} %DIF PREAMBLE +\RequirePackage{}\definecolor{RED}{rgb}{1,0,0}\definecolor{BLUE}{rgb}{0,0,1} %DIF PREAMBLE +\providecommand{\DIFadd}[1]{{\protect\color{blue}\uwave{#1}}} %DIF PREAMBLE +\providecommand{\DIFdel}[1]{{\protect\color{red}\sout{#1}}} %DIF PREAMBLE +%DIF SAFE PREAMBLE %DIF PREAMBLE +\providecommand{\DIFaddbegin}{} %DIF PREAMBLE +\providecommand{\DIFaddend}{} %DIF PREAMBLE +\providecommand{\DIFdelbegin}{} %DIF PREAMBLE +\providecommand{\DIFdelend}{} %DIF PREAMBLE +%DIF FLOATSAFE PREAMBLE %DIF PREAMBLE +\providecommand{\DIFaddFL}[1]{\DIFadd{#1}} %DIF PREAMBLE +\providecommand{\DIFdelFL}[1]{\DIFdel{#1}} %DIF PREAMBLE +\providecommand{\DIFaddbeginFL}{} %DIF PREAMBLE +\providecommand{\DIFaddendFL}{} %DIF PREAMBLE +\providecommand{\DIFdelbeginFL}{} %DIF PREAMBLE +\providecommand{\DIFdelendFL}{} %DIF PREAMBLE +%DIF END PREAMBLE EXTENSION ADDED BY LATEXDIFF + +% !TeX root = RJwrapper.tex +\title{dtComb: A Comprehensive R Library and Web Tool for Combining Diagnostic Tests} + +%\author{Serra Ilayda Yerlitas, Serra Bersan Gengec, Necla Kochan, Gozde Erturk Zararsiz, Selcuk Korkmaz and Gokmen Zararsiz} + +\author{Serra Ilayda Yerlitaş Taştan, Serra Bersan Gengeç, Necla Koçhan, Gözde Ertürk Zararsız, Selçuk Korkmaz and Gökmen Zararsız} +\DIFaddbegin + + \DIFaddend + + + +\maketitle + +\abstract{ +The combination of diagnostic tests has become a crucial area of research, aiming to improve the accuracy and robustness of medical diagnostics. While existing tools focus primarily on linear combination methods, there is a lack of comprehensive tools that integrate diverse methodologies. In this study, we present \href{https://cran.r-project.org/web/packages/dtComb/index.html}{\texttt{dtComb}}, a \DIFdelbegin \DIFdel{genuinely }\DIFdelend comprehensive R package and web tool designed to address the limitations of existing diagnostic test combination platforms. 
One of the unique contributions of \href{https://cran.r-project.org/web/packages/dtComb/index.html}{\texttt{dtComb}} is offering a range of 142 methods to combine two diagnostic tests, including linear and non-linear methods, machine learning algorithms, and mathematical operators. Another significant contribution of \href{https://cran.r-project.org/web/packages/dtComb/index.html}{\texttt{dtComb}} is its inclusion of advanced tools for ROC analysis, diagnostic performance metrics, and visual outputs such as sensitivity-specificity curves. Furthermore, \href{https://cran.r-project.org/web/packages/dtComb/index.html}{\texttt{dtComb}} offers classification functions for new observations, making it an easy-to-use tool for clinicians and researchers. The web-based version is also available at \url{https://biotools.erciyes.edu.tr/dtComb/} for non-R users, providing an intuitive interface for test combination and model training.} + +\section{Introduction} +A typical scenario often encountered in combining diagnostic tests is when a two-category (binary) gold standard is used together with two continuous diagnostic tests. In such cases, clinicians usually seek to compare the two diagnostic tests and to improve their performance by taking the ratio of the two test results \citep{muller2019amyloid, faria2016neutrophil, nyblom2006ast}. However, this ratio-based technique is simple and may not fully capture all potential interactions and relationships between the diagnostic tests. Linear combination methods have been developed to overcome such problems \citep{erturkzararsiz2023linear}.\\ +Linear methods combine two diagnostic tests into a single score/index by assigning weights to each test, optimizing their performance in diagnosing the condition of interest \citep{neumann2023combining}. Such methods improve accuracy by leveraging the strengths of both tests \citep{aznar2022stepwise, bansal2013does}.
For instance, Su and Liu \citep{su1993linear} showed that Fisher’s linear discriminant function generates a linear combination of markers with either proportional or disproportional covariance matrices, aiming to maximize sensitivity consistently across the entire specificity spectrum under a multivariate normal distribution model. In contrast, another approach introduced by Pepe and Thompson \citep{pepe2000combining} relies on ranking scores, eliminating the need for distributional assumptions when combining diagnostic tests. Despite these theoretical advances, an examination of existing tools shows that they contain only a limited number of methods. For instance, Kramar et al. developed a computer program called \pkg{mROC} that includes only the Su and Liu method \citep{kramar2001mroc}. Pérez-Fernández et al. presented the \href{https://cran.r-project.org/web/packages/movieROC/index.html}{\texttt{movieROC}} R package, which includes methods such as Su and Liu, min-max, and logistic regression \citep{perez2021visualizing}. An R package called \href{https://github.com/wbaopaul/MaxmzpAUC-R}{\texttt{maxmzpAUC}} that includes similar methods was developed by Yu and Park \citep{yu2015two}. + + +On the other hand, non-linear approaches, which account for the non-linearity between the diagnostic tests, have been developed and employed to integrate diagnostic tests \citep{du2024likelihood, ghosh2005classification}. These approaches incorporate the non-linear structure of the tests into the model, which might improve the accuracy and reliability of the diagnosis.
Although some existing packages permit the use of non-linear approaches such as splines\footnote{\url{https://cran.r-project.org/web/packages/splines/index.html}}, lasso\footnote{\label{note2}\url{https://cran.r-project.org/web/packages/glmnet/index.html}} and ridge\footref{note2} regression, there is currently no package that employs these methods directly for combining diagnostic tests and reports diagnostic performance. Machine-learning (ML) algorithms have recently been adopted to combine diagnostic tests \citep{ahsan2024advancements, sewak2024construction, agarwal2023artificial, prinzi2023explainable}. Many publications/studies focus on implementing ML algorithms in diagnostic tests \citep{salvetat2022game, salvetat2024ai, ganapathy2023comparison, alzyoud2024diagnosing, zararsiz2016statistical}. For instance, DeGroat et al. applied four different classification algorithms (Random Forest, Support Vector Machine, Extreme Gradient Boosting Decision Trees, and k-Nearest Neighbors) to combine markers for the diagnosis of cardiovascular disease \citep{degroat2024discovering}. The results showed that patients with cardiovascular disease can be diagnosed with up to 96\% accuracy using these ML techniques. There are numerous applications where ML methods can be implemented (\href{https://scikit-learn.org/stable/}{\texttt{scikit-learn}} \citep{pedregosa2011scikit}, \href{https://www.tensorflow.org/learn?hl=tr}{\texttt{TensorFlow}} \citep{tensorflow2015-whitepaper}, \href{https://cran.r-project.org/web/packages/caret/index.html}{\texttt{caret}} \citep{kuhn2008building}). The \href{https://cran.r-project.org/web/packages/caret/index.html}{\texttt{caret}} library is one of the most comprehensive tools developed in the R language \citep{kuhn2008building}. However, these are general tools developed only for ML algorithms; they do not directly combine two diagnostic tests or provide diagnostic performance measures.
+ +Apart from the aforementioned methods, several basic mathematical operations such as addition, multiplication, subtraction, and division can also be used to combine markers \citep{svart2024neurofilament, luo2024ast, serban2024significance}. For instance, addition can enhance diagnostic sensitivity by combining the effects of markers, whereas subtraction can more distinctly differentiate disease states by illustrating the variance across markers. On the other hand, there are several commercial (e.g. IBM SPSS, MedCalc, Stata, etc.) and open-source (R) software packages (\href{https://cran.r-project.org/web/packages/ROCR/index.html}{\texttt{ROCR}} \citep{sing2005rocr}, \href{https://cran.r-project.org/web/packages/pROC/index.html}{\texttt{pROC}} \citep{robin2011proc}, \href{https://cran.r-project.org/web/packages/PRROC/index.html}{\texttt{PRROC}} \citep{grau2015prroc}, \href{https://cran.r-project.org/web/packages/plotROC/index.html}{\texttt{plotROC}} \citep{sachs2017plotroc}) that researchers can use for receiver operating characteristic (ROC) curve analysis. However, these tools are designed to perform single-marker ROC analysis. As a result, there is currently no software tool that covers almost all combination methods. + +In this study, we developed \href{https://cran.r-project.org/web/packages/dtComb/index.html}{\texttt{dtComb}}, an \DIFdelbegin \DIFdel{innovative }\DIFdelend R package encompassing nearly all existing combination approaches in the literature. \href{https://cran.r-project.org/web/packages/dtComb/index.html}{\texttt{dtComb}} has two key advantages, making it easy to apply and superior to the other packages: (1) it provides users with a comprehensive set of 142 methods, including linear and non-linear approaches, ML approaches, and mathematical operators; (2) it offers users a turnkey workflow, from data upload through analysis, performance evaluation, and reporting.
Furthermore, it is the only package that implements linear approaches such as Minimax and Todor \& Saplacan \citep{sameera2016binary,todor2014tools}. In addition, it allows for the classification of new, previously unseen observations using trained models. To our knowledge, no other tool has been designed and developed to combine two diagnostic tests on a single platform with 142 different methods. In other words, \href{https://cran.r-project.org/web/packages/dtComb/index.html}{\texttt{dtComb}} makes more effective and robust combination methods ready for application in place of traditional approaches such as simple ratio-based methods. First, we review the theoretical basis of the related combination methods; then, we present an example implementation to demonstrate the applicability of the package. Finally, we present a user-friendly, up-to-date, and comprehensive web tool developed to make \href{https://cran.r-project.org/web/packages/dtComb/index.html}{\texttt{dtComb}} accessible to physicians and healthcare professionals who do not use the R programming language. The \href{https://cran.r-project.org/web/packages/dtComb/index.html}{\texttt{dtComb}} package is freely available on the CRAN network, the web application is freely available at \url{https://biotools.erciyes.edu.tr/dtComb/}, and all source code is available on GitHub\footnote{\url{https://github.com/gokmenzararsiz/dtComb}, \url{https://github.com/gokmenzararsiz/dtComb_Shiny}}. +\section{Material and methods} +This section provides an overview of the combination methods described in the literature. Before applying these methods, we also discuss the standardization techniques available for the markers, the resampling methods used during model training, and, finally, the metrics used to evaluate the model’s performance.
+ +\subsection{Combination approaches} +\subsubsection{Linear combination methods} +The \href{https://cran.r-project.org/web/packages/dtComb/index.html}{\texttt{dtComb}} package comprises eight distinct linear combination methods, which are elaborated in this section. Before investigating these methods, we briefly introduce some notation that will be used throughout this section. \\ +Notations: \\ +Let $D_{i}, i = 1, 2, \ldots, n_1$ be the marker values of the $i$th individual in the diseased group, where $D_i=(D_{i1},D_{i2})$, and $H_j, j=1,2,\ldots,n_2$ be the marker values of the $j$th individual in the healthy group, where $H_j=(H_{j1},H_{j2})$. Let $x_{i1}=c(D_{i1},H_{j1})$ be the pooled values of the first marker and $x_{i2}=c(D_{i2},H_{j2})$ the pooled values of the second marker for the $i$th individual, $i=1,2,\ldots,n$, where $n=n_1+n_2$. Let $D_{i,min}=\min(D_{i1},D_{i2})$, $D_{i,max}=\max(D_{i1},D_{i2})$, $H_{j,min}=\min(H_{j1},H_{j2})$, $H_{j,max}=\max(H_{j1},H_{j2})$, and let $c_i$ be the resulting combination score of the $i$th individual. +\begin{itemize} + \item \textit{Logistic regression:} Logistic regression is a statistical method used for binary classification. The logistic regression model estimates the probability of the binary outcome based on the values of the independent variables. It is one of the most commonly applied methods in diagnostic testing, and it generates a linear combination of markers that can distinguish between control and diseased individuals. Logistic regression is generally less effective than normality-based discriminant analysis, such as Su and Liu's multivariate normality-based method, when the normality assumption is met \citep{ruiz1991asymptotic,efron1975efficiency}. On the other hand, others have argued that logistic regression is more robust because it does not require any assumptions about the joint distribution of the markers \citep{cox1989analysis}.
Therefore, it is essential to investigate the performance of linear combination methods derived from the logistic regression approach with non-normally distributed data.\\ + The logistic likelihood function is maximized to estimate the logistic regression coefficients, which yield the combination score\\ + \begin{equation} + \label{eq:1} +c=\frac{\exp\left(\beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2}\right)}{1+\exp\left(\beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2}\right)} +\end{equation} + The maximum likelihood estimates of the coefficients produce an easily interpretable value for distinguishing between the two groups. + \item \textit{Scoring based on logistic regression:} The method primarily uses a binary logistic regression model, with slight modifications to enhance the combination score. The regression coefficients estimated in Eq \ref{eq:1} are rounded to a user-specified number of decimal places and subsequently used to calculate the combination score \citep{leon2006bedside}. + \begin{equation} +c= \beta_1 x_{i1}+\beta_2 x_{i2} +\end{equation} + \item \textit{Pepe \& Thompson's method:} Pepe \& Thompson aimed to maximize the AUC or partial AUC when combining diagnostic tests, regardless of the distribution of the markers \citep{pepe2000combining}. They developed an empirical solution for the optimal linear combination that maximizes the Mann-Whitney U statistic, an empirical estimate of the area under the ROC curve. Notably, this approach is distribution-free.
Mathematically, they maximized the following objective function: + \begin{equation} +\text{maximize} \; U(\alpha)= \frac{1}{n_1 n_2} \sum_{i=1}^{n_1} \sum_{j=1}^{n_2} I\left[D_{i1}+\alpha D_{i2}\geq H_{j1}+\alpha H_{j2}\right] +\end{equation} + \begin{equation} +c= x_{i1}+\alpha x_{i2} +\label{eq:4} +\end{equation} + where $\alpha \in [-1,1]$ is interpreted as the relative weight of $x_{i2}$ to $x_{i1}$ in the combination, i.e., the weight of the second marker. The goal is to find the $\alpha$ that maximizes $U(\alpha)$. Readers are referred to Pepe and Thompson \citep{pepe2000combining}. + \item \textit{Pepe, Cai \& Langton's method:} Pepe et al. observed that when the disease status and the levels of markers conform to a generalized linear model, the regression coefficients represent the optimal linear combination that maximizes the area under the ROC curve \citep{pepe2006combining}. The following objective function is maximized to achieve a higher AUC value: +\begin{equation} +\text{maximize} \; U(\alpha)= \frac{1}{n_1 n_2} \sum_{i=1}^{n_1} \sum_{j=1}^{n_2} I\left[D_{i1}+\alpha D_{i2}> H_{j1}+\alpha H_{j2}\right] + \frac{1}{2}I\left[D_{i1}+\alpha D_{i2} = H_{j1} + \alpha H_{j2}\right] +\end{equation} + Before calculating the combination score using Eq \ref{eq:4}, the marker values are normalized to lie within the scale of 0 to 1. In addition, the estimate obtained by maximizing the empirical AUC can be considered a special case of the maximum rank correlation estimator, for which a general asymptotic distribution theory has been developed. Readers are referred to Pepe (2003, Chapters 4–6) for a review of the ROC curve approach and further details \citep{pepe2003statistical}. + + \item \textit{Min-Max method:} The Pepe \& Thompson method is straightforward when there are two markers, but it becomes computationally challenging when more than two markers are to be combined.
To overcome the computational complexity of that approach, Liu et al. \citep{liu2011min} proposed a non-parametric method that linearly combines the minimum and maximum values of the observed markers of each subject. This approach, which does not rely on any assumption about the data distribution (i.e., it is distribution-free), is known as the Min-Max method and may provide higher sensitivity than any single marker. The objective function of the Min-Max method is as follows: +\begin{equation} +\text{maximize} \; U(\alpha)= \frac{1}{n_1 n_2} \sum_{i=1}^{n_1} \sum_{j=1}^{n_2} I[D_{i,max}+\alpha D_{i,min}> H_{j,max}+\alpha H_{j,min}] \end{equation} +\begin{equation} + c= x_{i,max}+\alpha x_{i,min} +\end{equation}\\ + where $x_{i,max}=\max(x_{i1},x_{i2})$ and $x_{i,min}=\min(x_{i1},x_{i2})$.\\ + +The Min-Max method is intended for combining repeated measurements of a single marker over time or multiple markers measured in the same unit. While the method is relatively simple to implement, it has some limitations. For example, markers may have different units of measurement, so standardization may be needed to ensure uniformity during the combination process. Furthermore, it is unclear whether all available information is fully utilized, as this method incorporates only the markers' minimum and maximum values into the model \citep{kang2016linear}. + + \item \textit{Su \& Liu's method:} Su and Liu examined the combination score separately, under the assumption of multivariate normality, for the cases where the covariance matrices are proportional or disproportionate \citep{su1993linear}. Multivariate normal distributions with different covariances were first utilized in classification problems \citep{anderson1962classification}.
Then, Su and Liu developed a linear combination method by extending the idea of using multivariate normal distributions to the AUC, showing that the coefficients that maximize the AUC are Fisher's discriminant coefficients. Assume that $D \sim N(\mu_D, \Sigma_D)$ and $H \sim N(\mu_H, \Sigma_H)$ represent the multivariate normal distributions for the diseased and non-diseased groups, respectively. Fisher's coefficients are as follows: +\begin{equation} +(\alpha, \beta) = (\Sigma_{D} + \Sigma_{H})^{-1} \mu \label{eq:alpha_beta} +\end{equation} + where $\mu=\mu_D-\mu_H$. The combination score in this case is: +\begin{equation} +c= \alpha x_{i1}+ \beta x_{i2} +\label{eq:9} +\end{equation} + \item \textit{The Minimax method:} The Minimax method is an extension of Su \& Liu's method \citep{sameera2016binary}. Suppose that $D$ follows a multivariate normal distribution $D\sim N(\mu_D, \Sigma_D)$, representing the diseased group, and $H$ follows a multivariate normal distribution $H\sim N(\mu_H, \Sigma_H)$, representing the non-diseased group. Then Fisher's coefficients are as follows: +\begin{equation} +(\alpha, \beta) = \left[t\Sigma_{D} + (1-t)\Sigma_{H}\right]^{-1} (\mu_D - \mu_H) \label{eq:alpha_beta_expression} +\end{equation} + + Given these coefficients, the combination score is calculated using Eq \ref{eq:9}. In this formula, \textit{t} is a constant ranging from 0 to 1, which can be tuned by maximizing the AUC. + + \item \textit{Todor \& Saplacan's method:} Todor and Saplacan's method uses the sine and cosine trigonometric functions to calculate the combination score \citep{todor2014tools}. The combination score is calculated using the $\theta \in[-\frac{\pi}{2},\frac{\pi}{2}]$ that maximizes the AUC within this interval.
The formula for the combination score is given as follows: +\begin{equation} +c= \sin{(\theta)}x_{i1}+\cos{(\theta)}x_{i2} +\end{equation} +\end{itemize} + +\subsubsection{Non-linear combination methods} +In addition to linear combination methods, the \href{https://cran.r-project.org/web/packages/dtComb/index.html}{\texttt{dtComb}} package includes seven non-linear approaches, which are discussed in this subsection using the following notation: $x_{ij}$ denotes the value of the \textit{j}th marker for the \textit{i}th individual, $i=1,2,\ldots,n$ and $j=1,2$; \textit{d} denotes the degree of the polynomial regressions and splines, $d = 1,2,\ldots,p$. + +\begin{itemize} + \item \textit{Logistic Regression with Polynomial Feature Space:} This approach extends the logistic regression model by adding extra predictors created by raising the original predictor variables to a certain power. Including these polynomial terms in the feature space enables the model to capture non-linear relationships in the data \citep{james2013introduction}. The combination score is calculated as follows: +\begin{equation} +c=\frac{\exp\left(\beta_0 + \beta_1 x_{ij} + \beta_2 x_{ij}^2+\ldots+\beta_p x_{ij}^p\right)}{1+\exp\left(\beta_0 + \beta_1 x_{ij} + \beta_2 x_{ij}^2+\ldots+\beta_p x_{ij}^p\right)} +\end{equation} + where $c_i$ is the combination score for the \textit{i}th individual and represents the posterior probability. + + \item \textit{Ridge Regression with Polynomial Feature Space:} This method combines Ridge regression with an expanded feature space created by adding polynomial terms to the original predictor variables. It is a widely used shrinkage method when there is multicollinearity between the variables, which may be an issue for least squares regression.
This method estimates the coefficients of these correlated variables by minimizing the residual sum of squares (RSS) while adding a regularization term to prevent overfitting. The penalty is based on the L2 norm of the coefficient vector (Eq \ref{eq:beta_hat_r}). The Ridge estimate is defined as follows: +\begin{equation} +\hat{\beta}^R = \text{argmin}_{\beta} \, \text{RSS} + \lambda \sum_{j=1}^{2} \sum_{d=1}^{p} \left(\beta_j^d\right)^2 \label{eq:beta_hat_r} +\end{equation} + +where +\begin{equation} +RSS=\sum_{i=1}^{n}\left(y_i-\beta_0-\sum_{j=1}^{2}\sum_{d=1}^{p} \beta_j^d x_{ij}^d\right)^2 +\end{equation} + and $\hat{\beta}^R$ denotes the estimated Ridge regression coefficients; the second term is called the penalty term, where $\lambda \geq 0$ is a shrinkage parameter that controls the amount of shrinkage applied to the regression coefficients. Cross-validation is used to select the shrinkage parameter. We used the \href{https://cran.r-project.org/web/packages/glmnet/index.html}{\texttt{glmnet}} package \citep{friedman2010regularization} to implement Ridge regression for combining the diagnostic tests. + + \item \textit{Lasso Regression with Polynomial Feature Space:} Similar to Ridge regression, Lasso regression is a shrinkage method that adds a penalty term to the objective function of least squares regression. The penalty in this case is based on the L1 norm of the coefficient vector, which leads to sparsity in the model: some of the regression coefficients are exactly zero when the tuning parameter $\lambda$ is sufficiently large. This property allows the model to automatically identify and remove less relevant variables and reduces the model's complexity.
The Lasso estimates are defined as follows: + + \begin{equation} +\hat{\beta}^L = \text{argmin}_{\beta} \, \text{RSS} + \lambda \sum_{j=1}^{2} \sum_{d=1}^{p} | \beta_j^d | \label{eq:beta_hat_l} +\end{equation} + + + To implement the Lasso regression for combining the diagnostic tests, we used the \href{https://cran.r-project.org/web/packages/glmnet/index.html}{\texttt{glmnet}} package \citep{friedman2010regularization}. + + \item \textit{Elastic-Net Regression with Polynomial Feature Space:} Elastic-Net regression combines the Lasso (L1 regularization) and Ridge (L2 regularization) penalties to address some of the limitations of each technique. The combination of the two penalties is controlled by two hyperparameters, $\alpha\in[0,1]$ and $\lambda$, which adjust the trade-off between the L1 and L2 regularization terms \citep{james2013introduction}. For the implementation of the method, the \href{https://cran.r-project.org/web/packages/glmnet/index.html}{\texttt{glmnet}} package is used \citep{friedman2010regularization}. + \item \textit{Splines:} Another non-linear combination technique frequently applied in diagnostic testing is splines. Splines are a versatile mathematical and computational technique with a wide range of applications: they are piecewise functions that make it possible to interpolate or approximate data points. There are several types of splines, such as cubic splines, in which smooth curves are created by approximating a set of control points with cubic polynomial functions. When implementing splines, two critical parameters come into play: the degrees of freedom and the degrees of the fitted polynomials. These user-adjustable parameters influence the flexibility and smoothness of the resulting curve and are critical for controlling the behavior of splines.
We used the \href{https://rdocumentation.org/packages/splines/versions/3.6.2}{\texttt{splines}} package in the R programming language to implement splines. + + \item \textit{Generalized Additive Models with Smoothing Splines and Generalized Additive Models with Natural Cubic Splines:} Regression models are of great interest in many fields for understanding the importance of different inputs. Even though linear regression is widely used, it often fails in real-life applications because effects may not be linear. Generalized additive models (GAMs) were introduced to identify and characterize such non-linear effects \citep{james2013introduction}. Smoothing splines and natural cubic splines are two standard methods used within GAMs to model non-linear relationships. To implement these two methods, we used the \href{https://cran.r-project.org/web/packages/gam/index.html}{\texttt{gam}} package in R \citep{Trevor2015gam}. GAMs with smoothing splines are a more data-driven and adaptive approach, in which smoothing splines can automatically capture non-linear relationships without specifying in advance the number of knots (specific points where two or more polynomial segments are joined to create a piecewise-defined curve or surface) or the shape of the spline. On the other hand, natural cubic splines are preferred when we have prior knowledge or assumptions about the shape of the non-linear relationship; they are more interpretable and can be controlled through the number of knots \citep{elhakeem2022using}. +\end{itemize} + +\subsubsection{Mathematical Operators} +This section covers four arithmetic operators, eight distance measures, and the exponential approach. Unlike the other approaches, the methods in this section also allow users to apply logarithmic, exponential, and trigonometric (sine and cosine) transformations to the markers.
Let $x_{ij}$ represent the value of the \textit{j}th marker for the \textit{i}th observation, with $i=1,2,...,n$ and $j=1,2$, and let $c_i$ be the resulting combination score for the \textit{i}th individual. +\begin{itemize} + \item \textit{Arithmetic Operators:} Arithmetic operators such as addition, multiplication, division, and subtraction can also be used in diagnostic tests to optimize the AUC, a measure of diagnostic test performance. These operations can potentially increase the AUC and improve the efficacy of diagnostic tests by combining markers in specific ways. For example, if high values of one test indicate risk while low values of the other do, subtraction or division can effectively combine these markers. + \item \textit{Distance Measurements:} When combining markers with mathematical operators, a distance measure can be used to evaluate the relationships or similarities between marker values. To our knowledge, no previous studies have integrated distinct distance measures with arithmetic operators in this context. Euclidean distance is the most commonly used distance measure, but it may not accurately reflect the relationship between markers. Therefore, we incorporated a variety of distances into the package.
These distances are given as follows \citep{minaev2018distance,pandit2011comparative,cha2007comprehensive}:\\ + +\textit{Euclidean:} +\begin{equation} +c = \sqrt{(x_{i1} - 0)^2 + (x_{i2} - 0)^2} \label{eq:euclidean_distance} +\end{equation} +\\ +\textit{Manhattan:} +\begin{equation} +c = |x_{i1} - 0| + |x_{i2} - 0| \label{eq:manhattan_distance} +\end{equation} +\\ +\textit{Chebyshev:} +\begin{equation} +c = \max\{|x_{i1} - 0|, |x_{i2} - 0|\} \label{eq:max_absolute} +\end{equation} +\\ +\textit{Kulczynskid:} +\begin{equation} +c = \frac{|x_{i1} - 0| + |x_{i2} - 0|}{\min\{x_{i1}, x_{i2}\}} \label{eq:custom_expression} +\end{equation} +\\ +\textit{Lorentzian:} +\begin{equation} +c = \ln(1 + |x_{i1} - 0|) + \ln(1 + |x_{i2} - 0|) \label{eq:ln_expression} +\end{equation} +\\ + \textit{Taneja:} +\begin{equation} +c = z_1 \left( \log \left( \frac{z_1}{\sqrt{x_{i1} \epsilon}} \right) \right) + z_2 \left( \log \left( \frac{z_2}{\sqrt{x_{i2} \epsilon}} \right) \right) \label{eq:log_expression} +\end{equation} +\\ +where $z_1 = \frac{x_{i1} - 0}{2}, \quad z_2 = \frac{x_{i2} - 0}{2}$ \\ + +\textit{Kumar-Johnson:} +\begin{equation} +c = \frac{{(x_{i1}^2 - 0)^2}}{{2(x_{i1} \epsilon)^{\frac{3}{2}}}} + \frac{{(x_{i2}^2 - 0)^2}}{{2(x_{i2} \epsilon)^{\frac{3}{2}}}}, \quad \epsilon=0.0000 \label{eq:c_expression} +\end{equation} +\\ +\textit{Avg:} +\begin{equation} +c = \frac{{|x_{i1} - 0| + |x_{i2} - 0| + \max\{|x_{i1} - 0|, |x_{i2} - 0|\}}}{2} \label{eq:avg_distance} +\end{equation}\\ + + \item \textit{Exponential approach:} The exponential approach is another technique to explore different relationships between the diagnostic measurements. The forms in which one of the two diagnostic tests is taken as the base and the other as the exponent can be represented as $x_{i1}^{(x_{i2})}$ and $x_{i2}^{(x_{i1})}$. The specific goals or hypotheses of the analysis, as well as the characteristics of the diagnostic tests, determine which form to use.
+ +\end{itemize} +\subsubsection{Machine-Learning algorithms} +Machine-learning algorithms have been increasingly used in various fields, including medicine, to combine diagnostic tests. Integrating diagnostic tests through ML can lead to more accurate, timely, and personalized diagnoses, which is particularly valuable in complex medical cases where multiple factors must be considered. In this study, we aimed to incorporate almost all ML algorithms into the package, taking advantage of the \href{https://cran.r-project.org/web/packages/caret/index.html}{\texttt{caret}} package in R \citep{kuhn2008building}. This package includes 190 classification algorithms that can be used to train models and make predictions. We focused on models that use numerical inputs and produce binary responses, a selection process that resulted in the 113 models implemented in our study. We then classified these 113 models into five classes using the same idea as in \citep{zararsiz2016statistical}: (i) discriminant classifiers, (ii) decision tree models, (iii) kernel-based classifiers, (iv) ensemble classifiers, and (v) others. As in the \href{https://cran.r-project.org/web/packages/caret/index.html}{\texttt{caret}} package, \code{mlComb()} sets up a grid of tuning parameters for a number of classification routines, fits each model, and calculates a performance measure based on resampling. After model fitting, it uses the \code{predict()} function to calculate the probability of the ``event'' occurring for each observation. Finally, it performs ROC analysis based on the probabilities obtained from the prediction step. + +\subsection{Standardization} +Standardization is the process of transforming data onto a common scale to facilitate meaningful comparisons and statistical inference.
Many statistical techniques employ standardization to improve the interpretability and comparability of data. We implemented five standardization methods that can be applied to each marker, with the following formulas: + +\begin{itemize} + \item Z-score: \( \frac{{x - \text{mean}(x)}}{{\text{sd}(x)}} \) + \item T-score: \( \left( \frac{{x - \text{mean}(x)}}{{\text{sd}(x)}} \times 10 \right) + 50 \) + \item min\_max\_scale: \( \frac{{x - \min(x)}}{{\max(x) - \min(x)}} \) + \item scale\_mean\_to\_one: \( \frac{x}{{\text{mean}(x)}} \) + \item scale\_sd\_to\_one: \( \frac{x}{{\text{sd}(x)}} \) +\end{itemize} + + +\subsection{Model building} +After specifying a combination method from the \href{https://cran.r-project.org/web/packages/dtComb/index.html}{\texttt{dtComb}} package, users can build models and optimize model parameters using the \code{mlComb()}, \code{linComb()}, \code{nonlinComb()}, and \code{mathComb()} functions, depending on the specific model selected. Parameter optimization is performed using n-fold cross-validation, repeated n-fold cross-validation, or bootstrapping for the linear and non-linear approaches (i.e., \code{linComb()}, \code{nonlinComb()}). For the machine-learning approaches (i.e., \code{mlComb()}), all of the resampling methods from the \href{https://cran.r-project.org/web/packages/caret/index.html}{\texttt{caret}} package are available for optimizing the model parameters. The total number of parameters being optimized varies across models, and these parameters are fine-tuned to maximize the AUC. The returned object stores the input data, the preprocessed and transformed data, the trained model, and the resampling results.
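As a quick numerical illustration of the five standardization formulas above, the following sketch mirrors them in Python (NumPy) rather than R; the function names are illustrative only and are not the \texttt{dtComb} API, and sd is taken here as the sample standard deviation (\texttt{ddof=1}), matching R's \code{sd()}.

```python
# Illustrative re-implementations of the five standardization formulas.
# NOTE: function names are hypothetical, not the dtComb API.
import numpy as np

def z_score(x):
    # (x - mean(x)) / sd(x); sd with ddof=1 matches R's sd()
    return (x - x.mean()) / x.std(ddof=1)

def t_score(x):
    # z-score rescaled to mean 50, sd 10
    return (x - x.mean()) / x.std(ddof=1) * 10 + 50

def min_max_scale(x):
    # maps the observed range onto [0, 1]
    return (x - x.min()) / (x.max() - x.min())

def scale_mean_to_one(x):
    return x / x.mean()

def scale_sd_to_one(x):
    return x / x.std(ddof=1)

x = np.array([2.0, 4.0, 6.0, 8.0])
print(min_max_scale(x))       # end points map to 0 and 1
print(scale_mean_to_one(x))   # rescaled values average to 1
```

After any of these transformations the two markers share a common scale, which the distance-based combinations in particular rely on.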
+ +\subsection{Evaluation of model performances} + +A confusion matrix, as shown in Table \ref{tab:confusion_matrix}, is a table used to evaluate the performance of a classification model; it shows the numbers of correct and incorrect predictions. It compares predicted and actual + +\begin{table}[h] +\centering +\caption{Confusion Matrix} +\label{tab:confusion_matrix} +\begin{tabular}{llll} +\hline +\multirow{2}{*}{Predicted labels} & \multicolumn{2}{l}{Actual class labels} & Total \\ \cline{2-4} + & Positive & Negative & \\ \hline +Positive & TP & FP & TP+FP \\ +Negative & FN & TN & FN+TN \\ +Total & TP+FN & FP+TN & n \\ \hline +\end{tabular} + + \begin{flushleft} +\tiny TP: True Positive, TN: True Negative, FP: False Positive, FN: False Negative, n: Sample size + \end{flushleft} +\end{table} +\noindent +class labels, with diagonal elements representing the numbers of correct predictions and off-diagonal elements representing the numbers of incorrect predictions. The \href{https://cran.r-project.org/web/packages/dtComb/index.html}{\texttt{dtComb}} package uses the \href{https://cran.r-project.org/web/packages/OptimalCutpoints/index.html}{\texttt{OptimalCutpoints}} package \citep{yin2014optimal} to generate the confusion matrix and then the \href{https://cran.r-project.org/web/packages/epiR/index.html}{\texttt{epiR}} package \citep{stevenson2017epir}, which includes different performance metrics, to evaluate the performances.
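To make the mapping from confusion-matrix cells to summary metrics concrete, here is a small numerical sketch (in Python rather than R, with made-up counts) using the standard definitions of accuracy, sensitivity, specificity, predictive values, likelihood ratios, and Cohen's kappa.

```python
# Made-up confusion-matrix counts for illustration only.
tp, fp, fn, tn = 40, 10, 5, 45
n = tp + fp + fn + tn                    # sample size

acc = (tp + tn) / n                      # accuracy
se = tp / (tp + fn)                      # sensitivity (recall)
sp = tn / (tn + fp)                      # specificity
ppv = tp / (tp + fp)                     # positive predictive value
npv = tn / (tn + fn)                     # negative predictive value
plr = se / (1 - sp)                      # positive likelihood ratio
nlr = (1 - se) / sp                      # negative likelihood ratio
# expected agreement by chance, from the marginal totals
pe = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2
kappa = (acc - pe) / (1 - pe)            # Cohen's kappa

print(round(acc, 3), round(se, 3), round(sp, 3), round(kappa, 3))
# → 0.85 0.889 0.818 0.7
```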
Various performance metrics are available in the \href{https://cran.r-project.org/web/packages/dtComb/index.html}{\texttt{dtComb}} package: the accuracy rate (ACC), the Kappa statistic ($\kappa$), sensitivity (SE), specificity (SP), apparent and true prevalence (AP, TP), positive and negative predictive values (PPV, NPV), positive and negative likelihood ratios (PLR, NLR), the proportion of true outcome-negative subjects that test positive (False T+ proportion for true D-), the proportion of true outcome-positive subjects that test negative (False T- proportion for true D+), the proportion of test-positive subjects that are outcome negative (False T+ proportion for T+), and the proportion of test-negative subjects that are outcome positive (False T- proportion for T-). These metrics are summarized in Table \ref{tab:performance_metrics}. + +\begin{table}[htbp] + \centering \small + \caption{Performance Metrics and Formulas} + \label{tab:performance_metrics} + \begin{tabular}{ll} + \hline + \textbf{Performance Metric} & \textbf{Formula} \\ + \hline + Accuracy & $\text{ACC} = \frac{{\text{TP} + \text{TN}}}{n}$ \\ + Kappa & $\kappa = \frac{{\text{ACC} - P_e}}{{1 - P_e}}$ \\ + & $P_e = \frac{{(\text{TP} + \text{FP})(\text{TP} + \text{FN}) + (\text{FN} + \text{TN})(\text{FP} + \text{TN})}}{{n^2}}$ \\ + Sensitivity (Recall) & $\text{SE} = \frac{{\text{TP}}}{{\text{TP} + \text{FN}}}$ \\ + Specificity & $\text{SP} = \frac{{\text{TN}}}{{\text{TN} + \text{FP}}}$ \\ + Apparent Prevalence & $\text{AP} = \frac{{\text{TP} + \text{FP}}}{{n}}$ \\ + True Prevalence & $\text{TP} = \frac{{\text{AP} + \text{SP} - 1}}{{\text{SE} + \text{SP} - 1}}$ \\ + Positive Predictive Value (Precision) & $\text{PPV} = \frac{{\text{TP}}}{{\text{TP} + \text{FP}}}$ \\ + Negative Predictive Value & $\text{NPV} = \frac{{\text{TN}}}{{\text{TN} + \text{FN}}}$ \\ + Positive Likelihood Ratio & $\text{PLR} = \frac{{\text{SE}}}{{1 - \text{SP}}}$ \\ + Negative Likelihood Ratio & $\text{NLR} = \frac{{1 - \text{SE}}}{{\text{SP}}}$ \\ + The Proportion of True Outcome Negative Subjects That Test Positive & $\frac{{\text{FP}}}{{\text{FP} + \text{TN}}}$ \\ + The Proportion of True Outcome Positive Subjects That Test Negative & $\frac{{\text{FN}}}{{\text{TP} + \text{FN}}}$ \\ + The Proportion of Test Positive Subjects That Are Outcome Negative & $\frac{{\text{FP}}}{{\text{TP} + \text{FP}}}$ \\ + The Proportion of Test Negative Subjects That Are Outcome Positive & $\frac{{\text{FN}}}{{\text{FN} + \text{TN}}}$ \\ + \hline + \end{tabular} +\end{table} + + +\subsection{Prediction of the test cases} +The class labels of the observations in the test set are predicted with the model parameters derived from the training phase. It is critical to emphasize that the same analytical procedures employed during the training phase, such as normalization, transformation, or standardization, are also applied to the test set. More specifically, if the training set underwent Z-standardization, the test set is standardized using the mean and standard deviation derived from the training set. The class labels of the test set are then estimated based on the cut-off value established during the training phase and the model parameters trained on the training set. + +\subsection{Technical details and the structure of dtComb} +The \href{https://cran.r-project.org/web/packages/dtComb/index.html}{\texttt{dtComb}} package is implemented in the R programming language (\url{https://www.r-project.org/}) version 4.2.0. Package development was facilitated with \href{https://cran.r-project.org/web/packages/devtools/index.html}{\texttt{devtools}} \citep{wickham2016devtools} and documented with \href{https://cran.r-project.org/web/packages/roxygen2/index.html}{\texttt{roxygen2}} \citep{wickham2013roxygen2}. Package testing was performed using 271 unit tests \citep{wickham2011testthat}.
Double programming was performed using Python (\url{https://www.python.org/}) to validate the implemented functions \citep{shiralkarprogramming}.\\ + +\newpage +To combine diagnostic tests, the \href{https://cran.r-project.org/web/packages/dtComb/index.html}{\texttt{dtComb}} package allows the integration of eight linear combination methods, seven non-linear combination methods, arithmetic operators and eight distance metrics within the scope of mathematical operators, and a total of 113 machine-learning algorithms from the \href{https://cran.r-project.org/web/packages/caret/index.html}{\texttt{caret}} package \citep{kuhn2008building}. These are summarized in Table \ref{tab:dtComb_features}. +%Table 3 + +\begin{table}[htbp] + \centering \small + \caption{Features of dtComb} + \label{tab:dtComb_features} + \begin{tabular}{l p{10cm}} + \hline + \textbf{Modules (Tab Panels)} & \textbf{Features} \\ +\hline + \multirow{4}{*}{Combination Methods} & + \begin{itemize} + \item Linear Combination Approach (8 different methods) + \item Non-linear Combination Approach (7 different methods) + \item Mathematical Operators (14 different methods) + \item Machine-Learning Algorithms (113 different methods) \citep{kuhn2008building} + \end{itemize} \\ + \multirow{2}{*}{Preprocessing} & + \begin{itemize} + \item Five standardization methods applicable to linear, non-linear, and mathematical methods + \item 16 preprocessing methods applicable to ML \citep{kuhn2008building} + \end{itemize} \\ + \multirow{2}{*}{Resampling} & + \begin{itemize} + \item Three different methods for linear and non-linear combination methods + \begin{itemize} + \item Bootstrapping + \item Cross-validation + \item Repeated cross-validation + \end{itemize} + \item 12 different resampling methods for ML \citep{kuhn2008building} + \end{itemize} \\ + {Cutpoints} & + \begin{itemize} + \item 34 different methods for optimum cutpoints \citep{yin2014optimal} + \end{itemize} \\ + \hline + \end{tabular} +\end{table} +\section{Results} + +Table \ref{tab:exist_pck} summarizes the existing packages and programs, including \href{https://cran.r-project.org/web/packages/dtComb/index.html}{\texttt{dtComb}}, along with the number of combination methods included in each package. While \pkg{mROC} offers only one linear combination method, \href{https://github.com/wbaopaul/MaxmzpAUC-R}{\texttt{maxmzpAUC}} and \href{https://cran.r-project.org/web/packages/movieROC/index.html}{\texttt{movieROC}} provide five linear combination techniques each, and \href{https://cran.r-project.org/web/packages/SLModels/index.html}{\texttt{SLModels}} includes four. These existing packages focus primarily on linear combination approaches. In contrast, \href{https://cran.r-project.org/web/packages/dtComb/index.html}{\texttt{dtComb}} goes beyond these limitations by integrating not only linear methods but also non-linear approaches, machine-learning algorithms, and mathematical operators. + +\begin{table}[htbp] + \centering \small + \caption{Comparison of dtComb vs.
existing packages and programs}
+    \label{tab:exist_pck}
+    \begin{tabular}{@{}lcccc@{}}
+    \toprule
+    \textbf{Packages \& Programs} & \textbf{Linear Comb.} & \textbf{Non-linear Comb.} & \textbf{Math. Operators} & \textbf{ML algorithms} \\
+    \midrule
+    \textbf{mROC} \citep{kramar2001mroc} & 1 & - & - & - \\
+    \href{https://github.com/wbaopaul/MaxmzpAUC-R}{\texttt{maxmzpAUC}} \citep{yu2015two} & 5 & - & - & - \\
+    \href{https://cran.r-project.org/web/packages/movieROC/index.html}{\texttt{movieROC}} \citep{perez2021visualizing} & 5 & - & - & - \\
+    \href{https://cran.r-project.org/web/packages/SLModels/index.html}{\texttt{SLModels}} \citep{aznar-gimeno2023comparing} & 4 & - & - & - \\
+    \href{https://cran.r-project.org/web/packages/dtComb/index.html}{\texttt{dtComb}} & 8 & 7 & 14 & 113 \\
+    \bottomrule
+    \end{tabular}
+\end{table}
+\subsection{Dataset}
+To demonstrate the functionality of the \href{https://cran.r-project.org/web/packages/dtComb/index.html}{\texttt{dtComb}} package, we conduct a case study using four different combination methods. The data used in this study were obtained from patients who presented at Erciyes University Faculty of Medicine, Department of General Surgery, with complaints of abdominal pain \citep{zararsiz2016statistical,akyildiz2010value}. The dataset comprises the D-dimer levels (\textit{D\_dimer}) and leukocyte counts (\textit{log\_leukocyte}) of 225 patients, divided into two groups (\textit{Group}): the first group consists of 110 patients who required an immediate laparotomy (\textit{needed}), while the second group comprises 115 patients who did not (\textit{not\_needed}). After the evaluation of conventional treatment, patients who underwent surgery because of their postoperative pathologies were placed in the first group, whereas those with a negative laparotomy result were assigned to the second group. All the analyses were performed by following the workflow given in Fig. \ref{figure:workflow}. 
First of all, the \href{https://cran.r-project.org/web/packages/dtComb/index.html}{\texttt{dtComb}} package should be loaded in order to use the related functions.
+
+\begin{figure}[H]
+    \centering
+    \includegraphics[width=0.81\textwidth]{dtComb/Figure/Figure_1.pdf}
+    \caption{\textbf{Combination steps of two diagnostic tests.} The figure presents a schematic representation of the sequential steps involved in combining two diagnostic tests using a combination method.}
+    \label{figure:workflow}
+\end{figure}
+
+\begin{example}
+# load dtComb package
+library(dtComb)
+\end{example}
+Similarly, the laparotomy data can be loaded from the R database by using the following R code:
+\begin{example}
+
+# load laparotomy data
+data(laparotomy)
+\end{example}
+
+\subsection{Implementation of the dtComb package}
+To demonstrate the applicability of the \href{https://cran.r-project.org/web/packages/dtComb/index.html}{\texttt{dtComb}} package, we implement an arbitrarily chosen method from each of the linear, non-linear, mathematical-operator, and machine-learning approaches and compare their performances. These methods are Pepe, Cai \& Langton (PCL) for the linear combination, Splines for the non-linear combination, Addition for the mathematical operators, and SVM for machine learning. Before applying the methods, we split the data into two parts: a training set comprising 70\% of the data and a test set comprising the remaining 30\%. 
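The 70/30 split sizes can be sanity-checked with a quick back-of-the-envelope calculation (a sketch only, assuming the stratified partition rounds up within each class, which is not necessarily how \code{createDataPartition} resolves fractional group sizes):

```r
# Stratified 70/30 split of the 225 laparotomy patients
# (110 "needed", 115 "not_needed"): take ~70% of each class.
n_needed <- 110
n_not_needed <- 115
n_train <- ceiling(0.7 * n_needed) + ceiling(0.7 * n_not_needed)
n_test <- (n_needed + n_not_needed) - n_train
print(c(train = n_train, test = n_test))  # train 158, test 67
```

These totals (77 plus 81, i.e., 158 training patients) match the training-set group sizes that appear in the diagnostic output later in this section.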
+
+\begin{example}
+# Splitting the data set into train and test (70%-30%)
+set.seed(2128)
+inTrain <- caret::createDataPartition(laparotomy$group, p = 0.7, list = FALSE)
+trainData <- laparotomy[inTrain, ]
+colnames(trainData) <- c("Group", "D_dimer", "log_leukocyte")
+testData <- laparotomy[-inTrain, -1]
+
+# define marker and status for combination function
+markers <- trainData[, -1]
+status <- factor(trainData$Group, levels = c("not_needed", "needed"))
+\end{example}
+
+The model is trained on \code{trainData}, and the resampling scheme used in the training phase is five-fold cross-validation repeated ten times. \code{direction = "<"} is chosen, as higher marker values indicate higher risk. The Youden index is chosen among the cut-off methods. We note that the markers are not standardized, and the results are presented with 95\% confidence intervals. The four main combination functions are run with the selected methods as follows. 
+\begin{example}
+
+# PCL method
+fit.lin.PCL <- linComb(markers = markers, status = status, event = "needed",
+                       method = "PCL", resample = "repeatedcv", nfolds = 5,
+                       nrepeats = 10, direction = "<", cutoff.method = "Youden")
+
+# splines method (degree = 3 and degrees of freedom = 3)
+fit.nonlin.splines <- nonlinComb(markers = markers, status = status, event = "needed",
+                                 method = "splines", resample = "repeatedcv", nfolds = 5,
+                                 nrepeats = 10, cutoff.method = "Youden", direction = "<",
+                                 df1 = 3, df2 = 3)
+
+# add operator
+fit.add <- mathComb(markers = markers, status = status, event = "needed",
+                    method = "add", direction = "<", cutoff.method = "Youden")
+
+# SVM
+fit.svm <- mlComb(markers = markers, status = status, event = "needed",
+                  method = "svmLinear", resample = "repeatedcv", nfolds = 5,
+                  nrepeats = 10, direction = "<", cutoff.method = "Youden")
+
+\end{example}
+
+Various measures were considered to compare model performances, including AUC, ACC, SEN, SPE, PPV, and NPV. AUC statistics, with 95\% CIs, were calculated for each marker and method. The resulting statistics are 0.816 (0.751--0.880), 0.802 (0.728--0.877), 0.888 (0.825--0.930), 0.911 (0.868--0.954), 0.877 (0.824--0.929), and 0.875 (0.821--0.930) for D-dimer, Log(leukocyte), the Pepe, Cai \& Langton (PCL) method, Splines, Addition, and Support Vector Machine (SVM), respectively. The results revealed that the predictive performances of the markers and of the marker combinations are significantly higher than random chance in determining the need for laparotomy ($p<0.05$). The highest sensitivity and NPV were observed with the Addition method, while the highest specificity and PPV were observed with the Splines method. According to the overall AUC and accuracies, the combined approach fitted with the Splines method performed better than the other methods (Fig. \ref{figure:radar}). 
Therefore, the Splines method will be used in the subsequent analysis of the findings.
+
+\begin{figure}[H]
+    \centering
+    \includegraphics[width=1\textwidth]{dtComb/Figure/Figure_4.pdf}
+    \caption{\textbf{Radar plots of trained models and performance measures of two markers.} Radar plots summarize the diagnostic performances of two markers and various combination methods in the training dataset. The plots illustrate the AUC, ACC, SEN, SPE, PPV, and NPV metrics; the area of the polygon formed by connecting each point indicates the model's overall performance on these metrics. The polygon associated with the Splines method occupies the largest area, which means that the Splines method performed better than the other methods.}
+    \label{figure:radar}
+\end{figure}
+For the AUC of the markers and the splines model:
+\begin{example}
+fit.nonlin.splines$AUC_table
+                    AUC     SE.AUC LowerLimit UpperLimit         z      p.value
+D_dimer       0.8156966 0.03303310  0.7509530  0.8804403  9.556979 1.212446e-21
+log_leukocyte 0.8022286 0.03791768  0.7279113  0.8765459  7.970652 1.578391e-15
+Combination   0.9111752 0.02189588  0.8682601  0.9540904 18.778659 1.128958e-78
+\end{example}
+Here, \code{SE} denotes the standard error.\\
+
+The areas under the ROC curves for D-dimer levels, leukocyte counts on the logarithmic scale, and the combination score were 0.816, 0.802, and 0.911, respectively. The ROC curves generated with the combination score from the splines model, the D-dimer levels, and the leukocyte counts are given in Fig. \ref{figure:roc}, showing that the combination score has the highest AUC. The splines method improved the AUC by 9.5 and 10.9 percentage points over the D-dimer levels and the leukocyte counts, respectively. 
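The reported AUC gains of the splines combination over the single markers are absolute differences of the AUC estimates; a one-line check (values taken from the AUC table above, rounded to three decimals):

```r
# AUCs from fit.nonlin.splines$AUC_table, rounded to three decimals
auc <- c(D_dimer = 0.816, log_leukocyte = 0.802, combination = 0.911)
gains <- round(auc["combination"] - auc[c("D_dimer", "log_leukocyte")], 3)
print(gains)  # 0.095 and 0.109, i.e., 9.5 and 10.9 percentage points
```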
+\begin{figure}[H]
+    \centering
+    \includegraphics[width=0.7\textwidth]{dtComb/Figure/Figure_2.pdf}
+    \caption{\textbf{ROC curves.} ROC curves for the combined diagnostic tests, with sensitivity displayed on the y-axis and 1-specificity on the x-axis. As can be observed, the combination score produced the highest AUC value, indicating that the combined strategy performs best overall.}
+    \label{figure:roc}
+\end{figure}
+
+To see the results of the pairwise comparisons between the combination score and the markers:
+\begin{example}
+fit.nonlin.splines$MultComp_table
+
+  Marker1 (A)   Marker2 (B)   AUC (A)   AUC (B)      |A-B|  SE(|A-B|)         z      p-value
+1 Combination       D_dimer 0.9079686 0.8156966 0.09227193 0.02223904 4.1490971 3.337893e-05
+2 Combination log_leukocyte 0.9079686 0.8022286 0.10573994 0.03466544 3.0502981 2.286144e-03
+3     D_dimer log_leukocyte 0.8156966 0.8022286 0.01346801 0.04847560 0.2778308 7.811423e-01
+\end{example}
+
+Controlling the Type I error with a Bonferroni correction, the comparisons of the combination score with the individual markers yielded significant results ($p<0.05$).\\
+
+To display the diagnostic test results and performance measures for the non-linear combination approach, the following code can be used:
+
+\begin{example}
+fit.nonlin.splines$DiagStatCombined
+          Outcome +   Outcome -     Total
+Test +           66          13        79
+Test -           11          68        79
+Total            77          81       158
+
+Point estimates and 95% CIs:
+--------------------------------------------------------------
+Apparent prevalence *                  0.50 (0.42, 0.58)
+True prevalence *                      0.49 (0.41, 0.57)
+Sensitivity *                          0.86 (0.76, 0.93)
+Specificity *                          0.84 (0.74, 0.91)
+Positive predictive value *            0.84 (0.74, 0.91)
+Negative predictive value *            0.86 (0.76, 0.93)
+Positive likelihood ratio              5.34 (3.22, 8.86)
+Negative likelihood ratio              0.17 (0.10, 0.30)
+False T+ proportion for true D- *      0.16 (0.09, 0.26)
+False T- proportion for true D+ *      0.14 (0.07, 0.24)
+False T+ proportion for T+ *           0.16 (0.09, 0.26)
+False T- proportion for T- *           0.14 (0.07, 0.24)
+Correctly classified 
proportion *               0.85 (0.78, 0.90)
+--------------------------------------------------------------
+* Exact CIs
+\end{example}
+
+Furthermore, when the diagnostic test results and performance measures of the combination score are compared with those of the single markers, it can be observed that the TN value of the combination score is higher than that of the single markers, and the combination of markers has higher specificity and higher positive and negative predictive values than the log-transformed leukocyte counts and the D-dimer level (Table \ref{tab:diagnostic_measures}). Conversely, D-dimer has a higher sensitivity than the others. Optimal cut-off values for both markers and for the combined approach are also given in this table.
+
+\begin{table}[htbp]
+    \centering \small
+    \caption{Statistical diagnostic measures with 95\% confidence intervals for each marker and the combination score}
+    \label{tab:diagnostic_measures}
+    \begin{tabular}{@{}lccc@{}}
+    \toprule
+    \textbf{Diagnostic Measures (95\% CI)} & \textbf{D-dimer level ($>1.6$)} & \textbf{Log(leukocyte count) ($>4.16$)} & \textbf{Combination score ($>0.448$)} \\
+    \midrule
+    TP & 66 & 61 & 65 \\
+    TN & 53 & 60 & 69 \\
+    FP & 28 & 21 & 12 \\
+    FN & 11 & 16 & 12 \\
+    Apparent prevalence & 0.59 (0.51-0.67) & 0.52 (0.44-0.60) & 0.49 (0.41-0.57) \\
+    True prevalence & 0.49 (0.41-0.57) & 0.49 (0.41-0.57) & 0.49 (0.41-0.57) \\
+    Sensitivity & 0.86 (0.76-0.93) & 0.79 (0.68-0.88) & 0.84 (0.74-0.92) \\
+    Specificity & 0.65 (0.54-0.76) & 0.74 (0.63-0.83) & 0.85 (0.76-0.92) \\
+    Positive predictive value & 0.70 (0.60-0.79) & 0.74 (0.64-0.83) & 0.84 (0.74-0.92) \\
+    Negative predictive value & 0.83 (0.71-0.91) & 0.79 (0.68-0.87) & 0.85 (0.76-0.92) \\
+    Positive likelihood ratio & 2.48 (1.81-3.39) & 3.06 (2.08-4.49) & 5.70 (3.35-9.69) \\
+    Negative likelihood ratio & 0.22 (0.12-0.39) & 0.28 (0.18-0.44) & 0.18 (0.11-0.31) \\
+    False T+ proportion for true D- & 0.35 (0.24-0.46) & 0.26 (0.17-0.37) & 0.15 (0.08-0.24) \\
+    False T- 
proportion for true D+ & 0.14 (0.07-0.24) & 0.21 (0.12-0.32) & 0.16 (0.08-0.26) \\
+    False T+ proportion for T+ & 0.30 (0.21-0.40) & 0.26 (0.17-0.36) & 0.16 (0.08-0.26) \\
+    False T- proportion for T- & 0.17 (0.09-0.29) & 0.21 (0.13-0.32) & 0.15 (0.08-0.24) \\
+    Accuracy & 0.75 (0.68-0.82) & 0.77 (0.69-0.83) & 0.85 (0.78-0.90) \\
+    \bottomrule
+    \end{tabular}
+\end{table}
+
+For a comprehensive analysis, the \code{plotComb} function in \href{https://cran.r-project.org/web/packages/dtComb/index.html}{\texttt{dtComb}} can be used to generate kernel density and individual-value plots of the combination scores of each group, together with the specificity and sensitivity corresponding to different cut-off values (Fig. \ref{figure:scatter}). This function requires the result of the \code{nonlinComb} function, which is an object of the ``dtComb'' class, and \code{status}, which is of factor type.
+\begin{example}
+# draw kernel density, individual-value, and specificity and sensitivity plots
+plotComb(fit.nonlin.splines, status)
+\end{example}
+
+\begin{figure}[htbp]
+    \centering
+    \includegraphics[width=1\textwidth]{dtComb/Figure/Figure_3.pdf}
+    \caption{\textbf{Kernel density, individual-value, and sens\&spe plots of the combination score acquired with the training model.} Kernel density of the combination score for the two groups, needed and not needed (a). Individual-value graph with the classes on the x-axis and the combination score on the y-axis (b). Sensitivity and specificity graph of the combination score (c). 
While colors distinguish the classes in panels (a) and (b), in panel (c) the colors represent the sensitivity and specificity of the combination score.}
+    \label{figure:scatter}
+\end{figure}
+
+To test the model trained with Splines, the generic \code{predict} function is used. This function requires the test set and the result of the \code{nonlinComb} function, which is an object of the ``dtComb'' class. As a result of the prediction, the output for each observation consists of the combination score and the predicted label determined by the cut-off value derived from the model.
+\begin{example}
+# To predict the test set
+pred <- predict(fit.nonlin.splines, testData)
+head(pred)
+
+   comb.score labels
+1   0.6133884 needed
+7   0.9946474 needed
+10  0.9972347 needed
+11  0.9925040 needed
+13  0.9257699 needed
+14  0.9847090 needed
+\end{example}
+Above, it can be seen that the estimated combination scores for the first six observations in the test set were labelled as \textbf{needed} because they were higher than the cut-off value of 0.448.
+
+\subsection{Web interface for the dtComb package}
+The primary goal of developing the \href{https://cran.r-project.org/web/packages/dtComb/index.html}{\texttt{dtComb}} package is to combine numerous distinct combination methods and make them easily accessible to researchers. Furthermore, the package includes diagnostic statistics and visualization tools for diagnostic tests and for the combination score generated by the chosen method. Nevertheless, it is worth noting that using R code may pose challenges for physicians and others unfamiliar with R programming. To address this, we have also developed a user-friendly web application for \href{https://cran.r-project.org/web/packages/dtComb/index.html}{\texttt{dtComb}} using \href{https://cran.r-project.org/web/packages/shiny/index.html}{\texttt{Shiny}} \citep{chang2017shiny}. 
This web-based tool is publicly accessible and provides an interactive interface with all the functionalities found in the \href{https://cran.r-project.org/web/packages/dtComb/index.html}{\texttt{dtComb}} package. \\
+
+To initiate the analysis, users must upload their data by following the instructions outlined in the ``Data upload'' tab of the web tool. For convenience, we have provided three example datasets on this page to assist researchers in practicing the tool's functionality and to guide them in formatting their own data (as illustrated in Fig. \ref{figure:web}a). We also note that ROC analysis for a single marker can be performed within the ``ROC Analysis for Single Marker(s)'' tab in the data upload section of the web interface.
+
+In the ``Analysis'' tab, one can find two crucial subpanels:
+\begin{itemize}
+    \item Plots (Fig. \ref{figure:web}b): This section offers various visual representations, such as ROC curves, kernel density plots, individual-value plots, and sensitivity and specificity plots. These visualizations help users assess single diagnostic tests and the combination score generated using user-defined combination methods.
+    \item Results (Fig. \ref{figure:web}c): In this subpanel, one can access a range of statistics. It provides insights into the combination score and the single diagnostic tests, AUC statistics, comparisons that evaluate how the combination score fares against the individual diagnostic tests, and various diagnostic measures. One can also predict new data based on the model parameters set previously and stored in the ``Predict'' tab (Fig. \ref{figure:web}d). If needed, one can download the model created during the analysis to keep the parameters of the fitted model. This lets users make new predictions by reloading the model from the ``Predict'' tab. 
Additionally, all the results can easily be downloaded using the dedicated download buttons in their respective tabs.
+\end{itemize}
+\begin{figure}[H]
+    \centering
+    \includegraphics[width=1\textwidth]{dtComb/Figure/Figure_5.pdf}
+    \caption{\textbf{Web interface of the dtComb package.} The figure illustrates the web interface of the \href{https://cran.r-project.org/web/packages/dtComb/index.html}{\texttt{dtComb}} package, which demonstrates the steps involved in combining two diagnostic tests. a) Data Upload: The user is able to upload the dataset and select the relevant markers, a gold-standard test, and an event factor for the analysis. b) Combination Analysis: This panel allows the selection of the combination method, method-specific parameters, and resampling options to refine the analysis. c) Combination Analysis Output: Displays the results generated by the selected combination method, providing the user with key metrics and visualizations for interpretation. d) Predict: Displays the prediction results of the trained model when applied to the test set.}
+    \label{figure:web}
+\end{figure}
+\section{Summary and further research}
+In clinical practice, multiple diagnostic tests are available for disease diagnosis \citep{yu2015two}. Combining these tests to enhance diagnostic accuracy is a widely accepted approach \citep{su1993linear,pepe2000combining,liu2011min,sameera2016binary,pepe2006combining,todor2014tools}. As far as we know, the tools in Table \ref{tab:exist_pck} have been designed to combine diagnostic tests but contain at most five different combination methods. 
As a result, despite the existence of numerous advanced combination methods, there has been no comprehensive tool available for integrating diagnostic tests.\\
+In this study, we presented \href{https://cran.r-project.org/web/packages/dtComb/index.html}{\texttt{dtComb}}, a comprehensive R package designed to combine diagnostic tests using various methods, including linear and non-linear methods, mathematical operators, and machine-learning algorithms. The package integrates 142 different methods for combining two diagnostic markers to improve the accuracy of diagnosis. It also provides ROC curve analysis, various graphical approaches, diagnostic performance scores, and pairwise comparison results. In the given example, one can determine whether patients with abdominal pain require laparotomy by combining the D-dimer levels and white blood cell counts of those patients. Various methods, such as linear and non-linear combinations, were tested, and the results showed that the Splines method performed better than the others, particularly in terms of AUC and accuracy compared with the single tests. This shows that diagnostic accuracy can be improved with combination methods.\\
+Future work can focus on extending the capabilities of the \href{https://cran.r-project.org/web/packages/dtComb/index.html}{\texttt{dtComb}} package. While some studies focus on combining multiple markers, our study aimed to combine two markers using nearly all existing methods and to develop a tool and package for clinical practice \citep{kang2016linear}.
+
+\subsection{R Software}
+
+The R package \href{https://cran.r-project.org/web/packages/dtComb/index.html}{\texttt{dtComb}} is available on CRAN at \url{https://cran.r-project.org/web/packages/dtComb/index.html}.
+
+\subsection{Acknowledgment}
+We would like to thank the Proofreading \& Editing Office of the Dean for Research at Erciyes University for the copyediting and proofreading service for this manuscript.
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+
+\bibliography{dtCombreferences}
+
+\address{S. Ilayda Yerlitaş Taştan\\
+    Department of Biostatistics \\
+    Erciyes University\\
+    Türkiye\\
+    (ORCiD: 0000-0003-2830-3006)\\
+    \email{ilaydayerlitas340@gmail.com}}
+
+\address{Serra Bersan Gengeç\\
+    Department of Biostatistics \\
+    Erciyes University\\
+    Türkiye\\
+    \email{serrabersan@gmail.com}}
+
+\address{Necla Koçhan\\
+    Department of Mathematics\\
+    Izmir University of Economics\\
+    Türkiye\\
+    (ORCiD: 0000-0003-2355-4826)\\
+    \email{necla.kayaalp@gmail.com}}
+
+\address{Gözde Ertürk Zararsız\\
+    Department of Biostatistics \\
+    Erciyes University\\
+    Türkiye\\
+    \email{gozdeerturk9@gmail.com}}
+
+\address{Selçuk Korkmaz\\
+    Department of Biostatistics \\
+    Trakya University\\
+    Türkiye\\
+    \email{selcukorkmaz@gmail.com}}
+
+\address{Gökmen Zararsız\\
+    Department of Biostatistics 
\\
+    Erciyes University\\
+    Türkiye\\
+    (ORCiD: 0000-0001-5801-1835)\\
+    \email{gokmen.zararsiz@gmail.com}}
diff --git a/_articles/RJ-2025-036/dtComb.tex b/_articles/RJ-2025-036/dtComb.tex
new file mode 100644
index 0000000000..355012f7ce
--- /dev/null
+++ b/_articles/RJ-2025-036/dtComb.tex
@@ -0,0 +1,503 @@
+% !TeX root = RJwrapper.tex
+\title{dtComb: A comprehensive R library and web tool for combining diagnostic tests}
+
+\author{Serra İlayda Yerlitaş, Serra Bersan Gengeç, Necla Koçhan, Gözde Ertürk Zararsız, Selçuk Korkmaz and Gökmen Zararsız}
+
+\maketitle
+
+\abstract{
+Background and Objective:
+The development of diagnostic tests for diagnosing and differentiating diseases is a vibrant field of research. Numerous diagnostic tests have been developed, and their diagnostic accuracy and reliability play a pivotal role in their widespread usage. Researchers have therefore focused on integrating diverse diagnostic tests to improve diagnostic accuracy and reliability. While many methods for integrating diagnostic tests have been proposed in the literature, there has been no comprehensive software for implementing these methods. Therefore, we developed an R package, \pkg{dtComb}, to apply these combination methods on a single platform.
+Methods and Materials:
+We employed a total of 142 methods, categorized under four main headings: (i) linear methods (8), (ii) non-linear methods (7), (iii) mathematical operators (14), and (iv) machine learning algorithms (113), for implementation in a wide range of diagnostic test combinations. We also used various standardization methods before the analysis and resampling methods to tune the hyperparameters (i.e., parameter optimization). The \pkg{dtComb} package encompasses implementations of the 142 approaches, realized through 18 distinct R functions. 
The R package development of \pkg{dtComb} was facilitated with \pkg{devtools} and documented with \pkg{roxygen2}. Software tests were conducted with 271 unit tests using the \pkg{testthat} library of the R programming language.
+Results:
+The \pkg{dtComb} package allows combination methods to calculate a combined score to diagnose the disease and increase diagnostic accuracy within a targeted study population. We demonstrated the capabilities of the \pkg{dtComb} package by analyzing a real dataset, the abdominal pain dataset. The results showed that the package effectively calculates the combined scores using various existing methods.
+Conclusion:
+\pkg{dtComb} is a comprehensive R package developed to combine two markers. Additionally, a web tool has been created to facilitate ease of use for healthcare professionals and non-R users. It allows researchers to combine markers on a single platform. In this regard, \pkg{dtComb} can be viewed as a pioneering tool offering this unique combination feature. The web tool is freely available at \url{https://biotools.erciyes.edu.tr/dtComb/}.
+}
+
+\section{Introduction}
+Correct diagnosis is a fundamental element of effective treatment in medicine and healthcare. Accurate diagnosis helps prevent unnecessary and potentially harmful treatments or procedures. Without it, patients might undergo ineffective or harmful treatments, increasing medical costs and potential complications. Reference tests are often considered the gold standard for diagnosing a disease or condition. However, these tests can be expensive and sometimes invasive (i.e., risky) or uncomfortable for the patient. On the other hand, despite their lower accuracy compared with reference tests, diagnostic markers play a crucial role in diagnosing diseases and have gained significant importance in medical research \citep{novielli2013meta}. 
Their role in the medical field continues to grow, providing opportunities for earlier and more effective disease management and treatment.\\
+Studies have shown that a single diagnostic test or marker cannot accurately diagnose diseases, particularly complex diseases such as autoimmune diseases, certain cancers, or neurological disorders. Clinicians conduct multiple tests separately on the same person to collect as much information as possible, to ensure a thorough and accurate assessment, or to diagnose specific conditions \citep{kang2016linear}. Numerous studies, including this one, have shown that effectively combining the available information can improve the sensitivity and specificity of diagnostics, i.e., the diagnostic accuracy. Several approaches have been developed to combine diagnostic tests or markers. For instance, Su and Liu \citep{su1993linear} showed that Fisher's linear discriminant function generates a linear combination of markers with either proportional or disproportional covariance matrices, aiming to maximize sensitivity consistently across the entire specificity spectrum under a multivariate normal distribution model. In contrast, another approach, introduced by Pepe and Thompson \citep{pepe2000combining}, relies on ranking scores, eliminating the need for distributional assumptions when combining diagnostic tests. Liu et al. \citep{liu2011min} introduced a computationally efficient semi-linear min-max combination approach. It simplifies the process by focusing on finding a single $\lambda$ value that maximizes the Mann--Whitney U statistic, an empirical estimate of the area under the curve (AUC). Similarly, Sameera et al. \citep{sameera2016binary} developed a best linear combination method using minimax, asserting its superior effectiveness over previously proposed methods. These methods either rely on the assumption of an underlying distribution, such as the normal distribution, or adopt an estimation-based approach. 
In this context, the focus often shifts towards prediction, as the primary concern is typically the diagnostic test's performance (accuracy) in detecting specific conditions in future patients \citep{wang2013predicting}.\\
+Besides linear combination methods, non-linear approaches can also be employed to integrate diagnostic tests. These approaches mainly build on two statistical concepts: polynomial regression models and splines. Polynomial regression models address two critical factors: interactions between markers and the creation of a polynomial feature space with degree parameters. This leads to models such as polynomial logistic regression, polynomial ridge regression, and polynomial lasso regression. Spline-based methods involve determining the number of knots and the degrees of the polynomials positioned between these knots, leading to models such as B-splines, generalized additive models (GAMs) with smoothing splines, and natural cubic splines. Apart from the linear and non-linear combinations described in the literature, various other combinations, such as the ratio of two diagnostic tests, can also be defined using mathematical operators. These diverse approaches allow flexibility in combining diagnostic tests for improved accuracy and effectiveness.\\
+Apart from the aforementioned methods, a number of basic mathematical operations, such as addition, multiplication, subtraction, and division, have also been used to combine markers. These mathematical operators can potentially improve the efficacy of diagnostic tests by combining markers in particular ways \citep{fagan2007cerebrospinal, nyblom2004high, balta2016relation}.\\
+Machine learning (ML) algorithms have recently been adopted to combine diagnostic tests. 
These advanced algorithms offer a robust and data-driven approach to enhance the integration and analysis of diagnostic data, potentially leading to more accurate and effective diagnostic solutions. More than 100 studies focus on the implementation of ML algorithms in diagnostic tests \citep{bardella1991iga, bozkurt2014comparison, chen2015diagnosis}. Zararsiz et al., for instance, aimed to improve diagnostic accuracy by combining D-dimer and leukocyte markers with various machine (statistical) learning algorithms to differentiate between surgical and non-surgical pathologies in patients with acute abdominal pain \citep{zararsiz2016statistical}. Another application is the study in which Abate et al. combined markers for diagnosing Alzheimer's disease using a regression tree; the findings reported an increase in diagnostic accuracy \citep{abate2020conformation}. \\
+Despite the numerous studies that have applied machine learning algorithms and various combination methods to diagnostic tests, an easy-to-use, up-to-date, and comprehensive tool for implementing the existing combination approaches has not yet been developed. Therefore, in this study, we introduce \pkg{dtComb}, an R package designed to implement the existing combination approaches. First, we examine the theoretical background of the related combination methods. Subsequently, we provide an illustrative example to demonstrate the viability of the package. Finally, we present an enhanced web application of the \pkg{dtComb} package, available at \url{https://biotools.erciyes.edu.tr/dtComb/}, which will be particularly beneficial for non-R users.
+
+\section{Material and methods}
+This section provides an overview of the combination methods implemented in the literature. 
Before applying these methods, we also discuss the standardization techniques available for the markers, the resampling methods used during model training, and, finally, the metrics used to evaluate model performance. + +\subsection{Combination approaches} +\subsubsection{Linear combination methods} +The \pkg{dtComb} package comprises eight distinct linear combination methods, which are elaborated in this section. Before investigating these methods, we briefly introduce some notation that will be used throughout this section. \\ +Notation: \\ +Let $D_{i}, i = 1, 2, \ldots, n_1$ be the marker values of the $i$th individual in the diseased group, where $D_i=(D_{i1},D_{i2})$, and $H_j, j=1,2,\ldots,n_2$ be the marker values of the $j$th individual in the healthy group, where $H_j=(H_{j1},H_{j2})$. Let $x_{i1}=c(D_{i1},H_{j1})$ be the values of the first marker, and $x_{i2}=c(D_{i2},H_{j2})$ be the values of the second marker for the $i$th individual, $i=1,2,\ldots,n$. Let $D_{i,min}=\min(D_{i1},D_{i2})$, $D_{i,max}=\max(D_{i1},D_{i2})$, $H_{j,min}=\min(H_{j1},H_{j2})$, $H_{j,max}=\max(H_{j1},H_{j2})$, and let $c_i$ be the resulting combination score of the $i$th individual. +\begin{itemize} + \item \textit{Logistic regression:} Logistic regression is a statistical method used for binary classification. The logistic regression model estimates the probability of the binary outcome occurring based on the values of the independent variables. It is one of the most commonly applied methods in diagnostic tests, and it generates a linear combination of markers that can distinguish between control and diseased individuals. Logistic regression is generally less effective than normality-based discriminant analysis, such as Su and Liu's multivariate normality-based method, when the normality assumption is met \citep{ruiz1991asymptotic,efron1975efficiency}.
On the other hand, others have argued that logistic regression is more robust because it does not require any assumptions about the joint distribution of the markers \citep{cox1989analysis}. Therefore, it is essential to investigate the performance of linear combination methods derived from the logistic regression approach with non-normally distributed data.\\ + The objective of the logistic regression model is to maximize the logistic likelihood function. In other words, the logistic likelihood function is maximized to estimate the logistic regression model coefficients.\\ + \begin{equation} + \label{eq:1} +c=\frac{\exp(\beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2})}{1+\exp(\beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2})} +\end{equation} + The logistic regression coefficients are obtained by maximum likelihood estimation, producing an easily interpretable score for distinguishing between the two groups. + \item \textit{Scoring based on logistic regression:} This method primarily uses a binary logistic regression model, with slight modifications to enhance the combination score. The regression coefficients, as estimated in Eq \ref{eq:1}, are rounded to a user-specified number of decimal places and subsequently used to calculate the combination score \citep{leon2006bedside}. + \begin{equation} +c= \beta_1 x_{i1}+\beta_2 x_{i2} +\end{equation} + \item \textit{Pepe \& Thompson's method:} Pepe and Thompson aimed to maximize the AUC or partial AUC when combining diagnostic tests, regardless of the distribution of the markers \citep{pepe2000combining}. They developed an empirical solution for the optimal linear combination that maximizes the Mann-Whitney U statistic, an empirical estimate of the AUC. Notably, this approach is distribution-free.
Mathematically, they maximized the following objective function + \begin{equation} +\text{maximize} \; U(\alpha)= \frac{1}{n_1 n_2} \sum_{i=1}^{n_1} \sum_{j=1}^{n_2} I\left[D_{i1}+\alpha D_{i2}\geq H_{j1}+\alpha H_{j2}\right] +\end{equation} + \begin{equation} +c= x_{i1}+\alpha x_{i2} +\label{eq:4} +\end{equation} + where $\alpha \in [-1,1]$ is interpreted as the relative weight of $x_{i2}$ to $x_{i1}$ in the combination, i.e., the weight of the second marker. This formulation seeks the $\alpha$ that maximizes $U(\alpha)$. Readers are referred to Pepe and Thompson \citep{pepe2000combining} for details. + \item \textit{Pepe, Cai \& Langton's method:} Pepe et al. observed that when the disease status and the levels of markers conform to a generalized linear model, the regression coefficients yield the optimal linear combination that maximizes the area under the ROC curve \citep{pepe2006combining}. The following objective function is maximized to achieve a higher AUC value: +\begin{equation} +\text{maximize} \; U(\alpha)= \frac{1}{n_1 n_2} \sum_{i=1}^{n_1} \sum_{j=1}^{n_2} I\left[D_{i1}+\alpha D_{i2}> H_{j1}+\alpha H_{j2}\right] + \frac{1}{2}I\left[D_{i1}+\alpha D_{i2} = H_{j1} + \alpha H_{j2}\right] +\end{equation} + Before calculating the combination score using Eq \ref{eq:4}, the marker values are normalized or scaled to lie within the range of 0 to 1. In addition, the estimate obtained by maximizing the empirical AUC can be considered a special case of the maximum rank correlation estimator, for which a general asymptotic distribution theory has been developed. Readers are referred to Pepe (2003, Chapters 4--6) for a review of the ROC curve approach and further details \citep{pepe2003statistical}. + + \item \textit{Min-Max method:} The Pepe \& Thompson method is straightforward when there are two markers, but it is computationally challenging when more than two markers are to be combined. To overcome this computational complexity, Liu et al.
\citep{liu2011min} proposed a non-parametric approach that linearly combines the minimum and maximum values of the observed markers of each subject. This approach, which does not rely on any assumption about the data distribution (i.e., it is distribution-free), is known as the Min-Max method and may provide higher sensitivity than any single marker. The objective function of the Min-Max method is as follows: +\begin{equation} +\text{maximize} \; U(\alpha)= \frac{1}{n_1 n_2} \sum_{i=1}^{n_1} \sum_{j=1}^{n_2} I[D_{i,max}+\alpha D_{i,min}> H_{j,max}+\alpha H_{j,min}] \end{equation} +\begin{equation} + c= x_{i,max}+\alpha x_{i,min} +\end{equation}\\ + where $x_{i,max}=\max(x_{i1},x_{i2})$ and $x_{i,min}=\min(x_{i1},x_{i2})$.\\ + +The Min-Max method aims to combine repeated measurements of a single marker over time or multiple markers measured in the same unit. While the Min-Max method is relatively simple to implement, it has some limitations. For example, markers may have different units of measurement, so standardization may be needed to ensure uniformity during the combination process. Furthermore, it is unclear whether all available information is fully utilized, as this method incorporates only the markers' minimum and maximum values into the model \citep{kang2016linear}. + + \item \textit{Su \& Liu's method:} Su and Liu examined the combination score separately under the assumption of two multivariate normal distributions with proportional or disproportional covariance matrices \citep{su1993linear}. Multivariate normal distributions with different covariances were first utilized in classification problems \citep{anderson1961classification}. Su and Liu then developed a linear combination method by extending the idea of using multivariate distributions to the AUC, showing that the coefficients that maximize the AUC are Fisher's discriminant coefficients.
Assume that $D \sim N(\mu_D, \Sigma_D)$ and $H \sim N(\mu_H, \Sigma_H)$ represent the multivariate normal distributions for the diseased and non-diseased groups, respectively. Fisher's coefficients are then: +\begin{equation} +(\alpha, \beta) = (\Sigma_{D} + \Sigma_{H})^{-1} \mu \label{eq:alpha_beta} +\end{equation} + where $\mu=\mu_D-\mu_H$. The combination score in this case is: +\begin{equation} +c= \alpha x_{i1}+ \beta x_{i2} +\label{eq:9} +\end{equation} + \item \textit{The Minimax method:} The Minimax method is an extension of Su \& Liu's method \citep{sameera2016binary}. Suppose that $D$ follows a multivariate normal distribution $D\sim N(\mu_D, \Sigma_D)$, representing the diseased group, and $H$ follows a multivariate normal distribution $H\sim N(\mu_H, \Sigma_H)$, representing the non-diseased group. Fisher's coefficients are then: +\begin{equation} +(\alpha, \beta) = \left[t\Sigma_{D} + (1-t)\Sigma_{H}\right]^{-1} (\mu_D - \mu_H) \label{eq:alpha_beta_expression} +\end{equation} + + Given these coefficients, the combination score is calculated using Eq \ref{eq:9}. In this formula, \textit{t} is a constant ranging from 0 to 1, which can be tuned by maximizing the AUC. + + \item \textit{Todor \& Saplacan's method:} Todor and Saplacan's method uses the sine and cosine trigonometric functions to calculate the combination score \citep{todor2014tools}. The combination score is calculated using $\theta \in[-\frac{\pi}{2},\frac{\pi}{2}]$, which maximizes the AUC within this interval. The formula for the combination score is given as follows: +\begin{equation} +c= \sin{(\theta)}x_{i1}+\cos{(\theta)}x_{i2} +\end{equation} +\end{itemize} + +\subsubsection{Non-linear combination methods} +In addition to linear combination methods, the \pkg{dtComb} package includes seven non-linear approaches, which are discussed in this subsection.
In this subsection, we use the following notation: +$x_{ij}$: the value of the \textit{j}th marker for the \textit{i}th individual, $i=1,2,\ldots,n$ and $j=1,2$; $d$: degree of the polynomial regressions and splines, $d = 1,2,\ldots,p$. + +\begin{itemize} + \item \textit{Logistic Regression with Polynomial Feature Space:} This approach extends the logistic regression model by adding extra predictors created by raising the original predictor variables to a certain power. This transformation enables the model to capture non-linear relationships in the data by including polynomial terms in the feature space \citep{james2021introduction}. The combination score is calculated as follows: +\begin{equation} +c=\frac{\exp\left(\beta_0 + \beta_1 x_{ij} + \beta_2 x_{ij}^2+\ldots+\beta_p x_{ij}^p\right)}{1+\exp\left(\beta_0 + \beta_1 x_{ij} + \beta_2 x_{ij}^2+\ldots+\beta_p x_{ij}^p\right)} +\end{equation} + where $c$ is the combination score for the \textit{i}th individual and represents the posterior probability. + + \item \textit{Ridge Regression with Polynomial Feature Space:} This method combines Ridge regression with an expanded feature space created by adding polynomial terms to the original predictor variables. It is a widely used shrinkage method when there is multicollinearity between the variables, which may be an issue for least squares regression. This method estimates the coefficients of these correlated variables by minimizing the residual sum of squares (RSS) while adding a regularization term to prevent overfitting. The objective function is based on the L2 norm of the coefficient vector (Eq \ref{eq:beta_hat_r}).
The Ridge estimate is defined as follows: +\begin{equation} +\hat{\beta}^R = \text{argmin}_{\beta} \, \text{RSS} + \lambda \sum_{j=1}^{2} \sum_{d=1}^{p} \left(\beta_j^d\right)^2 \label{eq:beta_hat_r} +\end{equation} + +where +\begin{equation} +\text{RSS}=\sum_{i=1}^{n}\left(y_i-\beta_0-\sum_{j=1}^{2}\sum_{d=1}^{p} \beta_j^d x_{ij}^d\right)^2 +\end{equation} + and $\hat{\beta}^R$ denotes the estimates of the Ridge regression coefficients; the second term is called the penalty term, where $\lambda \geq 0$ is a shrinkage parameter. The shrinkage parameter $\lambda$ controls the amount of shrinkage applied to the regression coefficients and is selected via cross-validation. To implement Ridge regression for combining diagnostic tests, we used the \pkg{glmnet} package \citep{friedman2010regularization}. + + \item \textit{Lasso Regression with Polynomial Feature Space:} Similar to Ridge regression, Lasso regression is a shrinkage method that adds a penalty term to the objective function of least squares regression. The objective function in this case is based on the L1 norm of the coefficient vector, which leads to sparsity in the model: some of the regression coefficients are shrunk to exactly zero when the tuning parameter $\lambda$ is sufficiently large. This property allows the model to automatically identify and remove less relevant variables and reduce model complexity. The Lasso estimate is defined as follows: + + \begin{equation} +\hat{\beta}^L = \text{argmin}_{\beta} \, \text{RSS} + \lambda \sum_{j=1}^{2} \sum_{d=1}^{p} | \beta_j^d | \label{eq:beta_hat_l} +\end{equation} + + + To implement Lasso regression for combining diagnostic tests, we used the \pkg{glmnet} package \citep{friedman2010regularization}.
+ + \item \textit{Elastic-Net Regression with Polynomial Feature Space:} Elastic-Net regression combines the Lasso (L1) and Ridge (L2) penalties to address some of the limitations of each technique. The mix of the two penalties is controlled by two hyperparameters, $\alpha \in [0,1]$ and $\lambda$, which allow the user to adjust the trade-off between the L1 and L2 regularization terms \citep{james2021introduction}. The \pkg{glmnet} package is used to implement this method \citep{friedman2010regularization}. + \item \textit{Splines:} Another non-linear combination technique frequently applied in diagnostic tests is splines. Splines are a versatile mathematical and computational technique with a wide range of applications. They are piecewise functions that make it possible to interpolate or approximate data points. There are several types of splines, such as cubic splines, in which smooth curves are created by approximating a set of control points with cubic polynomial functions. When implementing splines, two critical parameters come into play: the degrees of freedom and the degree of the fitted polynomials. These user-adjustable parameters, which influence the flexibility and smoothness of the resulting curve, are critical for controlling the behavior of splines. We used the \pkg{splines} package \citep{venable2016splines} in R to implement splines. + + \item \textit{Generalized Additive Models with Smoothing Splines and Generalized Additive Models with Natural Cubic Splines:} Regression models are of great interest in many fields for understanding the importance of different inputs. Although linear regression is widely used, effects are often non-linear in practice, so traditional linear models may fail. Generalized additive models (GAMs) were introduced to identify and characterize such non-linear effects \citep{sameera2016binary}.
Smoothing splines and natural cubic splines are two standard methods used within GAMs to model non-linear relationships. To implement these two methods, we used the \pkg{gam} package in R \citep{hastie2023gam}. GAMs with smoothing splines provide a more data-driven and adaptive approach, in which smoothing splines can automatically capture non-linear relationships without specifying the number of knots (the points where two or more polynomial segments are joined to create a piecewise-defined curve or surface) or the shape of the spline in advance. On the other hand, natural cubic splines are preferred when there is prior knowledge or assumptions about the shape of the non-linear relationship; they are more interpretable and can be controlled through the number of knots \citep{elhakeem2022using}. +\end{itemize} + +\subsubsection{Mathematical Operators} +This section covers four arithmetic operators, eight distance measures, and the exponential approach. In addition, unlike the other approaches, users can apply logarithmic, exponential, and trigonometric (sine and cosine) transformations to the markers. Let $x_{ij}$ represent the value of the \textit{j}th marker for the \textit{i}th observation, with $i=1,2,\ldots,n$ and $j=1,2$, and let $c_i$ be the resulting combination score for the \textit{i}th individual. +\begin{itemize} + \item \textit{Arithmetic Operators:} Arithmetic operators such as addition, multiplication, division, and subtraction can also be used in diagnostic tests to optimize the AUC, a measure of diagnostic test performance. These operations can increase the AUC and improve the efficacy of diagnostic tests by combining markers in specific ways. For example, if high values indicate risk in one test while low values indicate risk in the other, subtraction or division can combine these markers effectively.
+ \item \textit{Distance Measurements:} While combining markers with mathematical operators, a distance measure is used to evaluate the relationships or similarities between marker values \citep{minaev2018distance,pandit2011comparative,cha2007comprehensive}. To our knowledge, no previous study has integrated distinct distance measures with arithmetic operators in this context. Euclidean distance is the most commonly used distance measure but may not always accurately reflect the relationship between markers. Therefore, we incorporated a variety of distances into the package. These distances are given as follows:\\ + +\textit{Euclidean:} +\begin{equation} +c = \sqrt{(x_{i1} - 0)^2 + (x_{i2} - 0)^2} \label{eq:euclidean_distance} +\end{equation} +\\ +\textit{Manhattan:} +\begin{equation} +c = |x_{i1} - 0| + |x_{i2} - 0| \label{eq:manhattan_distance} +\end{equation} +\\ +\textit{Chebyshev:} +\begin{equation} +c = \max\{|x_{i1} - 0|, |x_{i2} - 0|\} \label{eq:max_absolute} +\end{equation} +\\ +\textit{Kulczynski d:} +\begin{equation} +c = \frac{|x_{i1} - 0| + |x_{i2} - 0|}{\min\{x_{i1}, x_{i2}\}} \label{eq:custom_expression} +\end{equation} +\\ +\textit{Lorentzian:} +\begin{equation} +c = \ln(1 + |x_{i1} - 0|) + \ln(1 + |x_{i2} - 0|) \label{eq:ln_expression} +\end{equation} +\\ + \textit{Taneja:} +\begin{equation} +c = z_1 \left( \log \left( \frac{z_1}{\sqrt{x_{i1} \epsilon}} \right) \right) + z_2 \left( \log \left( \frac{z_2}{\sqrt{x_{i2} \epsilon}} \right) \right) \label{eq:log_expression} +\end{equation} +\\ +where $z_1 = \frac{x_{i1} - 0}{2}, \quad z_2 = \frac{x_{i2} - 0}{2}$ \\ + +\textit{Kumar-Johnson:} +\begin{equation} +c = \frac{{(x_{i1}^2 - 0)^2}}{{2(x_{i1} \epsilon)^{\frac{3}{2}}}} + \frac{{(x_{i2}^2 - 0)^2}}{{2(x_{i2} \epsilon)^{\frac{3}{2}}}}, \quad \epsilon=0.0000 \label{eq:c_expression} +\end{equation} +\\ +\textit{Avg:} +\begin{equation} +c = \frac{{|x_{i1} - 0| + |x_{i2} - 0| + \max\{(x_{i1} - 0),(x_{i2} -
0)\}}}{2} \label{eq:avg_distance} +\end{equation}\\ + + \item \textit{Exponential approach:} The exponential approach is another technique for exploring different relationships between the diagnostic measurements. The methods in which one of the two diagnostic tests is treated as the base and the other as an exponent can be represented as $x_{i1}^{(x_{i2})}$ and $x_{i2}^{(x_{i1})}$. The specific goals or hypotheses of the analysis, as well as the characteristics of the diagnostic tests, determine which form to use. +\end{itemize} +\subsubsection{Machine-Learning algorithms} +Machine learning algorithms have been increasingly used in various fields, including medicine, to combine diagnostic tests. Integrating diagnostic tests through ML can lead to more accurate, timely, and personalized diagnoses, which is particularly valuable in complex medical cases where multiple factors must be considered. In this study, we aimed to incorporate as many ML algorithms as possible in the package we developed. To achieve this goal, we took advantage of the \pkg{caret} package in R \citep{kuhn2008caret}. This package includes 190 classification algorithms that can be used to train models and make predictions. We focused on models that take numerical inputs and produce binary responses, which resulted in 113 models implemented in our study. We then grouped these 113 models into five classes following the idea given in \citep{zararsiz2016statistical}: (i) discriminant classifiers, (ii) decision tree models, (iii) kernel-based classifiers, (iv) ensemble classifiers, and (v) others. As in the \pkg{caret} package, \code{mlComb()} sets up a grid of tuning parameters for a number of classification routines, fits each model, and calculates a performance measure based on resampling.
After the model is fitted, \code{mlComb()} uses the \code{predict()} function to calculate the probability of the ``event'' occurring for each observation. Finally, it performs ROC analysis based on the probabilities obtained in the prediction step. + +\subsection{Standardization} +Standardization is the process of transforming data to a common scale to facilitate meaningful comparisons and statistical inference. It is frequently employed to improve the interpretability and comparability of data. We implemented five different standardization methods that can be applied to each marker; their formulas are listed below: + +\begin{itemize} + \item Z-score: \( \frac{{x - \text{mean}(x)}}{{\text{sd}(x)}} \) + \item T-score: \( \left( \frac{{x - \text{mean}(x)}}{{\text{sd}(x)}} \times 10 \right) + 50 \) + \item Range: \( \frac{{x - \min(x)}}{{\max(x) - \min(x)}} \) + \item Mean: \( \frac{x}{{\text{mean}(x)}} \) + \item Deviance: \( \frac{x}{{\text{sd}(x)}} \) +\end{itemize} + + +\subsection{Model building} +After specifying a combination method from the \pkg{dtComb} package, users can build models and optimize model parameters using the \code{mlComb()}, \code{linComb()}, \code{nonlinComb()}, and \code{mathComb()} functions, depending on the selected method. Parameter optimization is performed using n-fold cross-validation, repeated n-fold cross-validation, and bootstrapping for the linear and non-linear approaches (i.e., \code{linComb()}, \code{nonlinComb()}). For the machine learning approaches (i.e., \code{mlComb()}), all of the resampling methods from the \pkg{caret} package are available. The number of parameters being optimized varies across models, and these parameters are tuned to maximize the AUC. The returned object stores the input data, the preprocessed and transformed data, the trained model, and the resampling results.
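As an illustration only (not part of \pkg{dtComb}, which implements these methods in R), the five standardization formulas listed above can be sketched in a few lines, assuming the sample standard deviation:

```python
import statistics

def standardize(x, method="zscore"):
    """Apply one of the five marker standardizations (illustrative sketch)."""
    m, s = statistics.mean(x), statistics.stdev(x)  # sample sd (n - 1)
    lo, hi = min(x), max(x)
    if method == "zscore":
        return [(v - m) / s for v in x]
    if method == "tscore":
        return [((v - m) / s) * 10 + 50 for v in x]
    if method == "range":
        return [(v - lo) / (hi - lo) for v in x]
    if method == "mean":
        return [v / m for v in x]
    if method == "deviance":
        return [v / s for v in x]
    raise ValueError(f"unknown method: {method}")
```

For example, range standardization maps the smallest marker value to 0 and the largest to 1, which is the scaling required before applying Pepe, Cai \& Langton's method.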
+\subsection{Evaluation of model performances} + +A confusion matrix, as shown in Table \ref{tab:confusion_matrix}, is a table used to evaluate the performance of a classification model, showing the numbers of correct and incorrect predictions. It compares predicted and actual + +\begin{table}[h] +\centering +\caption{Confusion Matrix} +\label{tab:confusion_matrix} +\begin{tabular}{llll} +\hline +\multirow{2}{*}{Predicted labels} & \multicolumn{2}{l}{Actual class labels} & Total \\ \cline{2-4} + & Positive & Negative & \\ \hline +Positive & TP & FP & TP+FP \\ +Negative & FN & TN & FN+TN \\ +Total & TP+FN & FP+TN & n \\ \hline +\end{tabular} + + \begin{flushleft} +\tiny TP: True Positive, TN: True Negative, FP: False Positive, FN: False Negative, n: Sample size + \end{flushleft} +\end{table} +\noindent +class labels, with diagonal elements representing correct predictions and off-diagonal elements representing incorrect predictions. The \pkg{dtComb} package uses the \pkg{epiR} package [32], which includes different performance metrics. The following performance metrics are available in the \pkg{dtComb} package: accuracy (ACC), Kappa statistic ($\kappa$), sensitivity (SE), specificity (SP), apparent and true prevalence (AP, TP), positive and negative predictive values (PPV, NPV), positive and negative likelihood ratios (PLR, NLR), the proportion of true outcome-negative subjects that test positive, the proportion of true outcome-positive subjects that test negative, the proportion of test-positive subjects that are outcome negative, and the proportion of test-negative subjects that are outcome positive. These metrics are summarized in Table \ref{tab:performance_metrics}.
+ +\begin{table}[htbp] + \centering \small + \caption{Performance Metrics and Formulas} + \label{tab:performance_metrics} + \begin{tabular}{ll} + \hline + \textbf{Performance Metric} & \textbf{Formula} \\ + \hline + Accuracy & $\text{ACC} = \frac{{\text{TP} + \text{TN}}}{n}$ \\ + Kappa & $\kappa = \frac{{\text{ACC} - P_e}}{{1 - P_e}}$ \\ + & $P_e = \frac{{(\text{TP} + \text{FN})(\text{TP} + \text{FP}) + (\text{FP} + \text{TN})(\text{FN} + \text{TN})}}{{n^2}}$ \\ + Sensitivity (Recall) & $\text{SE} = \frac{{\text{TP}}}{{\text{TP} + \text{FN}}}$ \\ + Specificity & $\text{SP} = \frac{{\text{TN}}}{{\text{TN} + \text{FP}}}$ \\ + Apparent Prevalence & $\text{AP} = \frac{{\text{TP}}}{{n}} + \frac{{\text{FP}}}{{n}}$ \\ + True Prevalence & $\text{TP} = \frac{{\text{AP} + \text{SP} - 1}}{{\text{SE} + \text{SP} - 1}}$ \\ + Positive Predictive Value (Precision) & $\text{PPV} = \frac{{\text{TP}}}{{\text{TP} + \text{FP}}}$ \\ + Negative Predictive Value & $\text{NPV} = \frac{{\text{TN}}}{{\text{TN} + \text{FN}}}$ \\ + Positive Likelihood Ratio & $\text{PLR} = \frac{{\text{SE}}}{{1 - \text{SP}}}$ \\ + Negative Likelihood Ratio & $\text{NLR} = \frac{{1 - \text{SE}}}{{\text{SP}}}$ \\ + The Proportion of True Outcome Negative Subjects That Test Positive & $\frac{{\text{FP}}}{{\text{FP} + \text{TN}}}$ \\ + The Proportion of True Outcome Positive Subjects That Test Negative & $\frac{{\text{FN}}}{{\text{TP} + \text{FN}}}$ \\ + The Proportion of Test Positive Subjects That Are Outcome Negative & $\frac{{\text{FP}}}{{\text{TP} + \text{FP}}}$ \\ + The Proportion of Test Negative Subjects That Are Outcome Positive & $\frac{{\text{FN}}}{{\text{FN} + \text{TN}}}$ \\ + \hline + \end{tabular} +\end{table} + + +\subsection{Prediction of the test cases} +The class labels of the observations in the test set are predicted with the model parameters derived from the training phase.
It is critical to emphasize that the same analytical procedures employed during the training phase are also applied to the test set, such as normalization, transformation, or standardization. More specifically, if the training set underwent Z-score standardization, the test set is standardized using the mean and standard deviation derived from the training set. The class labels of the test set are then estimated based on the cut-off value established during the training phase, using the parameters of the model trained on the training set. +\subsection{Technical details and the structure of dtComb} +The \pkg{dtComb} package is implemented in the R programming language (\url{https://www.r-project.org/}) version 4.2.0. Package development was facilitated with \pkg{devtools} \citep{wickham2016devtools} and documentation with \pkg{roxygen2} \citep{wickham2013roxygen2}. Package testing was performed using 271 unit tests \citep{wickham2011testthat}. Double programming was performed using Python (\url{https://www.python.org/}) to validate the implemented functions \citep{meszaros2007xunit}.\\ + +To combine diagnostic tests, the \pkg{dtComb} package offers eight linear combination methods, seven non-linear combination methods, mathematical operators (including four arithmetic operators and eight distance measures), and a total of 113 machine learning algorithms from the \pkg{caret} package \citep{kuhn2008caret}. These are summarized in Table \ref{tab:dtComb_features}.
+%Table 3 + +\begin{table}[htbp] + \centering + \caption{Features of dtComb} + \label{tab:dtComb_features} + \begin{tabular}{l p{10cm}} + \hline + \textbf{Modules (Tab Panels)} & \textbf{Features} \\ +\hline + \multirow{4}{*}{Combination Methods} & + \begin{itemize} + \item Linear Combination Approach (8 different methods) + \item Non-linear Combination Approach (7 different methods) + \item Mathematical Operators (14 different methods) + \item Machine-Learning Algorithms (113 different methods) + \end{itemize} \\ + + \multirow{2}{*}{Standardization Methods} & + \begin{itemize} + \item Linear, non-linear, and mathematical methods + \begin{itemize} + \item Z-score + \item T-score + \item Range + \item Mean + \item Deviance + \end{itemize} + \item 16 different preprocessing methods for ML \citep{kuhn2008caret} + \end{itemize} \\ + \multirow{2}{*}{Resampling} & + \begin{itemize} + \item 3 different methods for linear and non-linear combination methods + \begin{itemize} + \item Bootstrapping + \item Cross-validation + \item Repeated cross-validation + \end{itemize} + \item 12 different resampling methods for ML \citep{kuhn2008caret} + \end{itemize} \\ + {Cutpoints} & + \begin{itemize} + \item 34 different methods for optimum cutpoints \citep{yin2014optimal} + \end{itemize} \\ + \hline + \end{tabular} +\end{table} + + +\section{Results} + +\subsection{Dataset} +To demonstrate the functionality of the \pkg{dtComb} package, we conduct a case study using four different combination methods. The data used in this study were obtained from patients who presented at Erciyes University Faculty of Medicine, Department of General Surgery, with complaints of abdominal pain \citep{zararsiz2016statistical,akyildiz2010value}. The dataset comprised D-dimer levels and leukocyte counts of 225 patients, divided into two groups: the first group consisted of 110 patients who required an immediate laparotomy, while the second group comprised 115 patients who did not.
After evaluation of conventional treatment, patients who underwent surgery because of their postoperative pathologies were placed in the first group, whereas those with a negative laparotomy result were assigned to the second group. + +\begin{figure}[htbp] + \centering + \includegraphics[width=1\textwidth]{dtComb/Figure_1.pdf} + \caption{Combination steps of two diagnostic tests} + \label{figure:rlogo} +\end{figure} + + +\subsection{Implementation of the \pkg{dtComb} package} +To illustrate the use of the \pkg{dtComb} package, we applied the Splines method from the non-linear combination approaches to the abdominal pain data. The parameters that must be predefined for the Splines method are the degree and the degrees of freedom, both set to 3. We split the data into two parts: a training set comprising 70\% of the data and a test set comprising the remaining 30\%. The model is trained on the training set, and the model parameters used in the prediction phase are fine-tuned using five-fold cross-validation with 10 repeats. Since higher values indicate higher risk, the Youden index is chosen among the cut-off methods with direction ``<''. The markers were not standardized, and the results are presented with 95\% confidence intervals. + +The areas under the ROC curves for D-dimer levels, leukocyte counts on the logarithmic scale, and the combination score were 0.816, 0.802, and 0.911, respectively (Table \ref{tab:AUC_markers_combination_score}). The ROC curves generated with the combination score from the splines model, D-dimer levels, and leukocyte counts are given in Fig. \ref{figure:roc}, showing that the combination score has the highest AUC. The splines method improved the AUC by 9.5\% and 10.9\% compared to D-dimer levels and leukocyte counts, respectively.
After controlling the Type I error rate using a Bonferroni correction, comparisons of the combination score with the individual markers yielded significant results ($p<0.05$) (Table \ref{tab:AUC_comparison}). Table \ref{tab:diagnostic_test_results} summarizes the diagnostic test results for each marker and the non-linear combination approach. Optimal cut-off values for both markers and the combined approach are also given in this table, which shows that the TP value of the combination score is higher than that of the single markers. + +\begin{table}[htbp] + \centering + \caption{Area Under the Curves of Markers and Combination Score} + \label{tab:AUC_markers_combination_score} + \begin{tabular}{@{}lllll@{}} + \toprule + \textbf{Variable} & \textbf{AUC (\%95 CI)} & \textbf{SE (AUC)} & \textbf{z} & $ \textbf{p}$ \\ + \midrule + D-dimer level & 0.816 (0.751-0.880) & 0.033 & 9.557 & \textbf{$<$0.001} \\ + Log(leukocyte count) & 0.802 (0.728-0.877) & 0.038 & 7.971 & \textbf{$<$0.001} \\ + Combination score & 0.911 (0.868-0.954) & 0.022 & 18.803 & \textbf{$<$0.001} \\ + \bottomrule + \end{tabular} + \small + + \begin{flushleft} + \small SE - Standard Error. Statistically significant p-values are shown in bold (p $<$ 0.05).% + + \end{flushleft} +\end{table} + +\begin{figure}[htbp] + \centering + \includegraphics[width=1\textwidth]{dtComb/Figure_2.pdf} + \caption{\textbf{ROC curves.} ROC curves for combined diagnostic tests, with sensitivity displayed on the y-axis and 1-specificity displayed on the x-axis.
As can be observed, the combination score produced the highest AUC value, indicating that the combined strategy performs best overall.} + \label{figure:roc} +\end{figure} + + + +%Table 5 +\begin{table}[htbp] + \centering + \caption{Area Under the Curve Comparison of Markers and Combination Score} + \label{tab:AUC_comparison} + \begin{tabular}{@{}llllll@{}} + \toprule + \textbf{Marker 1 (A)} & \textbf{Marker 2 (B)} & \textbf{|A-B|} & \textbf{SE(|A-B|)} & \textbf{z} & $\textbf{p}$ \\ + \midrule + Combination score & D-dimer & 0.095 & 0.024 & 4.007 & \textbf{$<$0.001} \\ + Combination score & Log(leukocyte count) & 0.108 & 0.034 & 3.201 & \textbf{0.001} \\ + D-dimer & Log(leukocyte count) & 0.013 & 0.048 & 0.278 & 0.781 \\ + \bottomrule + \end{tabular} +\end{table} + +%Table 6 +\begin{table}[htbp] + \centering + \caption{Diagnostic test results for each marker and non-linear combination model} + \label{tab:diagnostic_test_results} + \begin{tabular}{@{}lllll@{}} + \toprule + \textbf{Variable} & \textbf{TP} & \textbf{TN} & \textbf{FP} & \textbf{FN} \\ + \midrule + D-dimer ($>1.6 \; \mu g$ FEU/mL) & 66 & 53 & 28 & 11 \\ + Log(leukocyte count) ($>4.16$) & 61 & 60 & 21 & 16 \\ + Combination score ($>0.448$) & 65 & 69 & 12 & 12 \\ + \bottomrule + \end{tabular} +\end{table} + +\newpage +Table \ref{tab:diagnostic_measures} summarizes the performance metrics used to assess the effectiveness of the diagnostic tests, i.e., the single markers (D-dimer level and leukocyte count) and the combined approach. These measures are reported together with their 95\% confidence intervals. The combination of markers had higher specificity and higher positive and negative predictive values than the log-transformed leukocyte count and the D-dimer level. Conversely, D-dimer had greater sensitivity than the others.
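These measures follow directly from the 2$\times$2 counts in Table \ref{tab:diagnostic_test_results}; for example, for the combination score (TP = 65, TN = 69, FP = 12, FN = 12), a short base-R check reproduces the point estimates:

```r
# Diagnostic measures from the 2x2 counts of the combination score
# (TP = 65, TN = 69, FP = 12, FN = 12).
tp <- 65; tn <- 69; fp <- 12; fn <- 12

sens <- tp / (tp + fn)                   # sensitivity
spec <- tn / (tn + fp)                   # specificity
ppv  <- tp / (tp + fp)                   # positive predictive value
npv  <- tn / (tn + fn)                   # negative predictive value
acc  <- (tp + tn) / (tp + tn + fp + fn)  # accuracy

round(c(sens = sens, spec = spec, ppv = ppv, npv = npv, acc = acc), 2)
# sens 0.84, spec 0.85, ppv 0.84, npv 0.85, acc 0.85
```

The confidence intervals reported alongside these point estimates require interval methods for binomial proportions and are not reproduced in this sketch.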
+%Table 7 +\begin{table}[htbp] + \centering + \caption{Statistical diagnostic measures with 95\% confidence intervals for each marker and the combination score} + \label{tab:diagnostic_measures} + \begin{tabular}{@{}lccc@{}} + \toprule + \textbf{Diagnostic Measures (95\% CI)} & \textbf{D-dimer level} & \textbf{Log(leukocyte count)} & \textbf{Combination score} \\ + \midrule + Apparent prevalence & 0.59 (0.51-0.67) & 0.52 (0.44-0.60) & 0.49 (0.41-0.57) \\ + True prevalence & 0.49 (0.41-0.57) & 0.49 (0.41-0.57) & 0.49 (0.41-0.57) \\ + Sensitivity & 0.86 (0.76-0.93) & 0.79 (0.68-0.88) & 0.84 (0.74-0.92) \\ + Specificity & 0.65 (0.54-0.76) & 0.74 (0.63-0.83) & 0.85 (0.76-0.92) \\ + Positive predictive value & 0.70 (0.60-0.79) & 0.74 (0.64-0.83) & 0.84 (0.74-0.92) \\ + Negative predictive value & 0.83 (0.71-0.91) & 0.79 (0.68-0.87) & 0.85 (0.76-0.92) \\ + Positive likelihood ratio & 2.48 (1.81-3.39) & 3.06 (2.08-4.49) & 5.70 (3.35-9.69) \\ + Negative likelihood ratio & 0.22 (0.12-0.39) & 0.28 (0.18-0.44) & 0.18 (0.11-0.31) \\ + False T+ proportion for true D- & 0.35 (0.24-0.46) & 0.26 (0.17-0.37) & 0.15 (0.08-0.24) \\ + False T- proportion for true D+ & 0.14 (0.07-0.24) & 0.21 (0.12-0.32) & 0.16 (0.08-0.26) \\ + False T+ proportion for T+ & 0.30 (0.21-0.40) & 0.26 (0.17-0.36) & 0.16 (0.08-0.26) \\ + False T- proportion for T- & 0.17 (0.09-0.29) & 0.21 (0.13-0.32) & 0.15 (0.08-0.24) \\ + Accuracy & 0.75 (0.68-0.82) & 0.77 (0.69-0.83) & 0.85 (0.78-0.90) \\ + \bottomrule + \end{tabular} +\end{table} + +For a comprehensive analysis, distribution and scatter plots are generated to visualize the density and distribution of the combination scores in each group (Fig. \ref{figure:scatter}a, \ref{figure:scatter}b). The variation of specificity and sensitivity across different cut-off values is displayed in Fig. \ref{figure:scatter}c.
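The cut-off values in Table \ref{tab:diagnostic_test_results} were selected with the Youden index. A minimal base-R sketch of that selection rule, on simulated scores (the function name and data are ours, not the \pkg{dtComb} API):

```r
# Youden index: pick the cut-off maximizing sensitivity + specificity - 1.
# status is TRUE for diseased subjects; higher scores indicate higher risk.
youden_cutoff <- function(score, status) {
  cuts <- sort(unique(score))
  j <- vapply(cuts, function(ct) {
    sens <- mean(score[status] > ct)    # sensitivity at this cut-off
    spec <- mean(score[!status] <= ct)  # specificity at this cut-off
    sens + spec - 1
  }, numeric(1))
  cuts[which.max(j)]
}

set.seed(2)
score  <- c(rnorm(40, mean = 1), rnorm(40, mean = 0))  # hypothetical scores
status <- rep(c(TRUE, FALSE), each = 40)
youden_cutoff(score, status)
```

In practice one would search over a finer grid of candidate cut-offs, but the maximization rule is the same.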
+\begin{figure}[htbp] + \centering + \includegraphics[width=1\textwidth]{dtComb/Figure_3.pdf} + \caption{\textbf{Distribution, scatter, and sens\&spe plots of the combination score acquired with the training model.} Distribution of the combination score for two groups: needed and not needed (a). Scatter graph with classes on the x-axis and combination score on the y-axis (b). Sensitivity and specificity graph of the combination score (c). While the colors show each class in panels (a) and (b), in panel (c) the colors represent the sensitivity and specificity of the combination score.} + \label{figure:scatter} +\end{figure} +\subsection{Comparison of classifiers} +In this section, we discuss and compare the performance of the fitted models in detail. Various measures were considered to compare model performances, including AUC, ACC, SEN, SPE, PPV, and NPV. AUC statistics, with 95\% CIs, were calculated for each marker and method: 0.816 (0.751–0.880) for D-dimer, 0.802 (0.728–0.877) for Log(leukocyte count), 0.888 (0.825–0.930) for the Pepe, Cai \& Langton method, 0.911 (0.868–0.954) for the Splines method, 0.877 (0.824–0.929) for the Addition method, and 0.875 (0.821–0.930) for the Support Vector Machine (SVM). The results revealed that the predictive performances of the markers and their combinations are significantly higher than random chance in determining the use of laparoscopy ($p<0.05$). According to the overall AUC and accuracies, the combined approach fitted with the Splines method performed better than the other methods (Fig. \ref{figure:radar}). The highest sensitivity and NPV were observed with the Addition method, while the highest specificity and PPV were observed with the Splines method.
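The pairwise comparisons behind these conclusions (Table \ref{tab:AUC_comparison}) use a z statistic of the form $z = |A-B| / SE(|A-B|)$. A small sketch of this back-calculation (our own helper, not a \pkg{dtComb} function; the result differs slightly from the table because the displayed AUCs and SEs are rounded):

```r
# Two-sided z test for the difference of two AUCs, given the
# standard error of the difference (as in DeLong-type comparisons).
auc_diff_test <- function(auc_a, auc_b, se_diff) {
  z <- abs(auc_a - auc_b) / se_diff
  c(z = z, p = 2 * pnorm(-z))  # two-sided p-value
}

# Combination score vs. D-dimer, rounded values from the comparison table
auc_diff_test(0.911, 0.816, 0.024)
```

The resulting p-value is well below the Bonferroni-adjusted threshold, consistent with the significance reported in the table.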
+ +\begin{figure}[h] + \centering + \includegraphics[width=1\textwidth]{dtComb/Figure_4.pdf} + \caption{\textbf{Radar plots of trained models and performance measures of two markers.} Radar plots summarize the diagnostic performances of two markers and various combination methods in the training dataset. These plots illustrate performance metrics such as AUC, ACC, SEN, SPE, PPV, and NPV. The area of the polygon formed by connecting each point indicates the model's performance in terms of these metrics. The polygon associated with the Splines method covers the largest area, which means that the Splines method performed better than the other methods.} + \label{figure:radar} +\end{figure} +During the prediction phase, 67 observations were tested using the fitted Splines method. The output for each observation consisted of the predicted label, determined based on the combination score and the model-derived cut-off value. Table 8 displays the estimated combination scores and labels for the first ten observations in the test set. +%Table 8 +\subsection{Web interface for the \pkg{dtComb} package} +The primary goal of developing the \pkg{dtComb} package is to bring numerous distinct combination methods together and make them easily accessible to researchers. Furthermore, the package includes diagnostic statistics and visualization tools for the diagnostic tests and the combination score generated by the chosen method. Nevertheless, using R code may pose challenges for physicians and those unfamiliar with R programming. To address this, we have also developed a user-friendly web application for \pkg{dtComb} using "Shiny" \citep{chang2017shiny}. This web-based tool is publicly accessible and provides an interactive interface with all the functionalities found in the \pkg{dtComb} package.
\\ + +To initiate the analysis, users must upload their data by following the instructions outlined in the "Data upload" tab of the web tool. For convenience, we have provided three example datasets on this page to assist researchers in practicing the tool's functionality and to guide them in formatting their own data (as illustrated in Fig. \ref{figure:web}a). We also note that ROC analysis for a single marker can be performed within the ‘ROC Analysis for Single Marker(s)’ tab in the data upload section of the web interface. +\begin{figure}[h] + \centering + \includegraphics[width=1\textwidth]{dtComb/Figure_5_new.pdf} + \caption{\textbf{Web interface of the \pkg{dtComb} package.}} + \label{figure:web} +\end{figure} +In the "Analysis" tab, one can find two crucial subpanels: +\begin{itemize} + \item Plots (Fig. \ref{figure:web}b): This section offers various visual representations, such as ROC curves, distribution plots, scatter plots, and sensitivity and specificity plots. These visualizations help users assess single diagnostic tests and the combination score, which is generated using user-defined combination methods. + \item Results (Fig. \ref{figure:web}c): In this subpanel, one can access a range of statistics. It provides insights into the combination score and single diagnostic tests, AUC statistics and comparisons to evaluate how the combination score fares against individual diagnostic tests, and various diagnostic measures. One can also predict new data based on the model parameters set previously and stored in the "Predict" tab (Fig. \ref{figure:web}d). If needed, one can download the model created during the analysis to keep the parameters of the fitted model. This lets users make new predictions by reloading the model from the "Predict" tab. Additionally, all the results can easily be downloaded using the dedicated download buttons in their respective tabs.
+\end{itemize} +\section{Discussion} +The accuracy and reliability of diagnostic tests are critical factors in determining their widespread adoption and use in clinical practice. Researchers have therefore focused on integrating various diagnostic approaches to achieve higher levels of accuracy. While numerous diagnostic tests exist, combining their results coherently and effectively has been challenging. Several methods, each with its strengths and limitations, have been developed for combining diagnostic tests or markers \citep{su1993linear,pepe2000combining,liu2011min,sameera2016binary,pepe2006combining,todor2014tools}. +Linear combination methods, including Su \& Liu, Min-Max, and Pepe \& Thompson, are frequently used in medicine and diagnostics to improve the performance of diagnostic tests for various medical problems \citep{erturkzararsiz2023linear,ma2020combination,aznar-gimeno2023comparing}. These methods linearly combine multiple diagnostic markers or variables to generate a new combined score, improving the diagnostic test's accuracy and reliability. Non-linear combination methods such as Lasso regression and splines have also been proposed and utilized in medical diagnostics to incorporate complexities and interactions of variables that linear methods might not capture effectively. The application to real data showed that a combination of markers predicts the need for laparoscopy more accurately than any single marker alone. Given the overall AUC and accuracies, the combined approach fitted with the Splines method performed better than the other methods (Fig. \ref{figure:radar}). Recently, others have integrated machine-learning algorithms to enhance the accuracy and reliability of diagnostic tests \citep{chang2022artificial,alkayyali2023systematic,ghazal2022intelligent}.
+Despite the numerous combination methods proposed in the literature, the method most commonly used in clinical practice is simply dividing one diagnostic test by the other, owing to its ease of implementation \citep{fagan2007cerebrospinal,nyblom2004high,balta2016relation,klemt2023complete,ji2017monocyte}. Although this simple approach improves diagnostic scores compared to using a single marker, more complex yet potentially effective combination methods have been neglected or underrated. Therefore, a significant need exists to integrate these combination methods into clinical practice. Additionally, while packages like \pkg{ROCR}, \pkg{pROC}, \pkg{PRROC}, \pkg{plotROC}, \pkg{precrec}, and \pkg{ROCit} focus on a single marker, a comprehensive software tool implementing these combination methods has been unavailable \citep{sing2005rocr,turck2011proc,grau2015prroc}. To remedy this, we introduced \pkg{dtComb}, a user-friendly software package that includes almost all existing approaches from the diverse literature on diagnostic test combinations. The \pkg{dtComb} package, developed within the R language environment, represents a significant step forward in diagnostics. One of the critical strengths of \pkg{dtComb} lies in its ability to accommodate a wide range of diagnostic test methodologies, providing users with a comprehensive toolkit to merge and analyze diverse test results. This ensures that clinicians can tailor their diagnostic approach to the specific requirements of individual cases, leading to more personalized and precise healthcare interventions. On the other hand, the \pkg{dtComb} package can be challenging to use, especially for physicians, healthcare professionals, and those unfamiliar with R programming. Therefore, we also developed a user-friendly web application for \pkg{dtComb}, which is publicly accessible and offers an interactive interface with all the functionalities of the \pkg{dtComb} package.
+\section{Conclusions} +In conclusion, we developed an R package, \pkg{dtComb}, to combine markers for diagnostic tests. The package executes combination methods through distinct functions, each specifically designed to generate combined scores for various markers. These functions enable researchers to proficiently combine diagnostic tests, catering to a diverse and extensive research population. Moreover, a user-friendly web tool has been developed to enhance accessibility for researchers and clinicians who are non-R users. The tool is freely available through \url{https://biotools.erciyes.edu.tr/dtComb/}, and the source code is available on the GitHub repository at \url{https://github.com/gokmenzararsiz/dtComb}. Its simple interface enables researchers to easily combine two markers on a single platform, simplifying the analysis process and increasing efficiency. The tool addresses a critical need in marker discovery research while providing a straightforward and time-saving process. +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+ +\bibliography{dtCombreferences} + +\address{Serra Ilayda Yerlitas\\ + Department of Biostatistics \\ + Erciyes University\\ + Türkiye\\ + (ORCiD: 0000-0003-2830-3006)\\ + \email{ilaydayerlitas340@gmail.com}} + +\address{Serra Bersan Gengec\\ + Drug Application and Research Center (ERFARMA)\\ + Erciyes University\\ + Türkiye\\ + \email{serrabersan@gmail.com}} + +\address{Necla Koçhan\\ + Department of Mathematics\\ + Izmir University of Economics\\ + Türkiye\\ + (ORCiD: 0000-0003-2355-4826)\\ + \email{necla.kayaalp@gmail.com}} + + \address{Gözde Ertürk Zararsız\\ + Department of Biostatistics \\ + Erciyes University\\ + Türkiye\\ + \email{gozdeerturk9@gmail.com}} + + \address{Selçuk Korkmaz\\ + Department of Biostatistics \\ + Trakya University\\ + Türkiye\\ + \email{selcukorkmaz@gmail.com}} + + \address{Gökmen Zararsız\\ + Department of Biostatistics \\ + Erciyes University\\ + Türkiye\\ + (ORCiD: 0000-0001-5801-1835)\\ + \email{gokmen.zararsiz@gmail.com}} diff --git a/_articles/RJ-2025-036/dtComb2.tex b/_articles/RJ-2025-036/dtComb2.tex new file mode 100644 index 0000000000..21cf8f9689 --- /dev/null +++ b/_articles/RJ-2025-036/dtComb2.tex @@ -0,0 +1,587 @@ +% !TeX root = RJwrapper.tex +\title{dtComb: A Comprehensive R Library and Web Tool for Combining Diagnostic Tests} + +%\author{Serra Ilayda Yerlitas, Serra Bersan Gengec, Necla Kochan, Gozde Erturk Zararsiz, Selcuk Korkmaz and Gokmen Zararsiz} + +\author{Serra Ilayda Yerlitaş Taştan, Serra Bersan Gengeç, Necla Koçhan, Gözde Ertürk Zararsız, Selçuk Korkmaz and Gökmen Zararsız} + + + +\maketitle + +\abstract{ + +The combination of diagnostic tests has become a crucial area of research, aiming to improve the accuracy and robustness of medical diagnostics. While existing tools focus primarily on linear combination methods, there is a lack of comprehensive tools that integrate diverse methodologies.
In this study, we present \href{https://cran.r-project.org/web/packages/dtComb/index.html}{\texttt{dtComb}}, a comprehensive R package and web tool designed to address the limitations of existing diagnostic test combination platforms. One of the unique contributions of \href{https://cran.r-project.org/web/packages/dtComb/index.html}{\texttt{dtComb}} is that it offers 142 methods for combining two diagnostic tests, including linear and non-linear approaches, machine-learning algorithms, and mathematical operators. Another significant contribution of \href{https://cran.r-project.org/web/packages/dtComb/index.html}{\texttt{dtComb}} is its inclusion of advanced tools for ROC analysis, diagnostic performance metrics, and visual outputs such as sensitivity-specificity curves. Furthermore, \href{https://cran.r-project.org/web/packages/dtComb/index.html}{\texttt{dtComb}} offers classification functions for new observations, making it an easy-to-use tool for clinicians and researchers. The web-based version is also available at \url{https://biotools.erciyes.edu.tr/dtComb/} for non-R users, providing an intuitive interface for test combination and model training.} + +\section{Introduction} +A typical scenario in combining diagnostic tests arises when the gold standard is binary (two categories) and two continuous diagnostic tests are available. In such cases, clinicians usually seek to compare the two diagnostic tests and to improve their performance by taking the ratio of the two test results \citep{muller2019amyloid, faria2016neutrophil, nyblom2006ast}. However, this technique is simplistic and may not fully capture all potential interactions and relationships between the diagnostic tests.
Linear combination methods have been developed to overcome such problems \citep{erturkzararsiz2023linear}.\\ +Linear methods combine two diagnostic tests into a single score/index by assigning weights to each test, optimizing their performance in diagnosing the condition of interest \citep{neumann2023combining}. Such methods improve accuracy by leveraging the strengths of both tests \citep{aznar2022stepwise, bansal2013does}. For instance, Su and Liu \citep{su1993linear} showed that Fisher’s linear discriminant function generates a linear combination of markers with either proportional or disproportional covariance matrices, maximizing sensitivity uniformly over the entire specificity range under a multivariate normal distribution model. In contrast, another approach, introduced by Pepe and Thompson \citep{pepe2000combining}, relies on ranking scores, eliminating the need for distributional assumptions when combining diagnostic tests. Despite these theoretical advances, existing tools implement only a limited number of methods. For instance, Kramar et al. developed a computer program called \pkg{mROC} that includes only the Su and Liu method \citep{kramar2001mroc}. Pérez-Fernández et al. presented the \href{https://cran.r-project.org/web/packages/movieROC/index.html}{\texttt{movieROC}} R package, which includes methods such as Su and Liu, min-max, and logistic regression \citep{perez2021visualizing}. An R package called \href{https://github.com/wbaopaul/MaxmzpAUC-R}{\texttt{maxmzpAUC}} that includes similar methods was developed by Yu and Park \citep{yu2015two}. + + +On the other hand, non-linear approaches incorporating the non-linearity between the diagnostic tests have been developed and employed to integrate diagnostic tests \citep{du2024likelihood, ghosh2005classification}.
These approaches incorporate the non-linear structure of tests into the model, which might improve the accuracy and reliability of the diagnosis. Although some existing packages provide non-linear tools such as splines\footnote{\url{https://cran.r-project.org/web/packages/splines/index.html}}, lasso\footnote{\label{note2}\url{https://cran.r-project.org/web/packages/glmnet/index.html}} and ridge\footref{note2} regression, there is currently no package that employs these methods directly for combination and reports diagnostic performance. Machine-learning (ML) algorithms have recently been adopted to combine diagnostic tests \citep{ahsan2024advancements, sewak2024construction, agarwal2023artificial, prinzi2023explainable}. Many studies focus on implementing ML algorithms in diagnostic tests \citep{salvetat2022game, salvetat2024ai, ganapathy2023comparison, alzyoud2024diagnosing, zararsiz2016statistical}. For instance, DeGroat et al. applied four classification algorithms (Random Forest, Support Vector Machine, Extreme Gradient Boosting Decision Trees, and k-Nearest Neighbors) to combine markers for the diagnosis of cardiovascular disease \citep{degroat2024discovering}. The results showed that patients with cardiovascular disease can be diagnosed with up to 96\% accuracy using these ML techniques. There are numerous tools in which ML methods are implemented (\href{https://scikit-learn.org/stable/}{\texttt{scikit-learn}} \citep{pedregosa2011scikit}, \href{https://www.tensorflow.org/learn?hl=tr}{\texttt{TensorFlow}} \citep{tensorflow2015-whitepaper}, \href{https://cran.r-project.org/web/packages/caret/index.html}{\texttt{caret}} \citep{kuhn2008building}). The \href{https://cran.r-project.org/web/packages/caret/index.html}{\texttt{caret}} library is one of the most comprehensive tools developed in the R language \citep{kuhn2008building}.
However, these are general tools developed only for ML algorithms; they do not directly combine two diagnostic tests or provide diagnostic performance measures. + +Apart from the aforementioned methods, several basic mathematical operations such as addition, multiplication, subtraction, and division can also be used to combine markers \citep{svart2024neurofilament, luo2024ast, serban2024significance}. For instance, addition can enhance diagnostic sensitivity by combining the effects of markers, whereas subtraction can more distinctly differentiate disease states by illustrating the variance across markers. On the other hand, there are several commercial (e.g. IBM SPSS, MedCalc, Stata, etc.) and open-source (R) software packages (\href{https://cran.r-project.org/web/packages/ROCR/index.html}{\texttt{ROCR}} \citep{sing2005rocr}, \href{https://cran.r-project.org/web/packages/pROC/index.html}{\texttt{pROC}} \citep{robin2011proc}, \href{https://cran.r-project.org/web/packages/PRROC/index.html}{\texttt{PRROC}} \citep{grau2015prroc}, \href{https://cran.r-project.org/web/packages/plotROC/index.html}{\texttt{plotROC}} \citep{sachs2017plotroc}) that researchers can use for receiver operating characteristic (ROC) curve analysis. However, these tools are designed to perform single-marker ROC analysis. As a result, there is currently no software tool that covers almost all combination methods. + +In this study, we developed \href{https://cran.r-project.org/web/packages/dtComb/index.html}{\texttt{dtComb}}, an innovative R package encompassing nearly all existing combination approaches in the literature.
\href{https://cran.r-project.org/web/packages/dtComb/index.html}{\texttt{dtComb}} has two key advantages, making it easy to apply and superior to the other packages: (1) it provides users with a comprehensive set of 142 methods, including linear and non-linear approaches, ML approaches, and mathematical operators; (2) it provides a turnkey workflow, from data upload through analysis, performance evaluation, and reporting. Furthermore, it is the only package that implements linear approaches such as Minimax and Todor \& Saplacan \citep{sameera2016binary,todor2014tools}. In addition, it allows for the classification of new, previously unseen observations using trained models. To our knowledge, no other tool has been designed to combine two diagnostic tests on a single platform with 142 different methods. In other words, \href{https://cran.r-project.org/web/packages/dtComb/index.html}{\texttt{dtComb}} makes more effective and robust combination methods ready for application in place of traditional approaches such as simple ratio-based methods. First, we review the theoretical basis of the related combination methods; then, we present an example implementation to demonstrate the applicability of the package. Finally, we present a user-friendly, up-to-date, and comprehensive web tool developed to make \href{https://cran.r-project.org/web/packages/dtComb/index.html}{\texttt{dtComb}} accessible to physicians and healthcare professionals who do not use the R programming language. The \href{https://cran.r-project.org/web/packages/dtComb/index.html}{\texttt{dtComb}} package is freely available on CRAN, the web application is freely available at \url{https://biotools.erciyes.edu.tr/dtComb/}, and all source code is available on GitHub\footnote{\url{https://github.com/gokmenzararsiz/dtComb}, \url{https://github.com/gokmenzararsiz/dtComb_Shiny}}.
+\section{Material and methods} +This section provides an overview of the combination methods implemented in the literature. We also discuss the standardization techniques available for the markers, the resampling methods used during model training, and, ultimately, the metrics used to evaluate the model’s performance. + +\subsection{Combination approaches} +\subsubsection{Linear combination methods} +The \href{https://cran.r-project.org/web/packages/dtComb/index.html}{\texttt{dtComb}} package comprises eight distinct linear combination methods, which will be elaborated in this section. Before investigating these methods, we briefly introduce the notation used throughout this section. \\ +Notation: \\ +Let $D_{i}, i = 1, 2, \ldots, n_1$ be the marker values of the $i$th individual in the diseased group, where $D_i=(D_{i1},D_{i2})$, and $H_j, j=1,2,\ldots,n_2$ be the marker values of the $j$th individual in the healthy group, where $H_j=(H_{j1},H_{j2})$. Let $x_{i1}=c(D_{i1},H_{j1})$ be the values of the first marker, and $x_{i2}=c(D_{i2},H_{j2})$ be the values of the second marker for the $i$th individual, $i=1,2,\ldots,n$. Let $D_{i,min}=\min(D_{i1},D_{i2})$, $D_{i,max}=\max(D_{i1},D_{i2})$, $H_{j,min}=\min(H_{j1},H_{j2})$, $H_{j,max}=\max(H_{j1},H_{j2})$, and let $c_i$ be the resulting combination score of the $i$th individual. +\begin{itemize} + \item \textit{Logistic regression:} Logistic regression is a statistical method used for binary classification. The logistic regression model estimates the probability of the binary outcome occurring based on the values of the independent variables. It is one of the most commonly applied methods in diagnostic tests, and it generates a linear combination of markers that can distinguish between control and diseased individuals.
Logistic regression is generally less effective than normality-based discriminant analysis, such as Su and Liu's multivariate normality-based method, when the normality assumption is met \citep{ruiz1991asymptotic,efron1975efficiency}. On the other hand, others have argued that logistic regression is more robust because it does not require any assumptions about the joint distribution of the markers \citep{cox1989analysis}. Therefore, it is essential to investigate the performance of linear combination methods derived from the logistic regression approach with non-normally distributed data.\\ + The coefficients of the logistic regression model are estimated by maximizing the logistic likelihood function.\\ + \begin{equation} + \label{eq:1} +c=\frac{\exp\left(\beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2}\right)}{1+\exp\left(\beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2}\right)} +\end{equation} + The maximum likelihood estimates of the coefficients yield an easily interpretable score for distinguishing between the two groups. + \item \textit{Scoring based on logistic regression:} The method primarily uses a binary logistic regression model, with slight modifications to enhance the combination score. The regression coefficients estimated for the model in Eq \ref{eq:1} are rounded to a user-specified number of decimal places and subsequently used to calculate the combination score \citep{leon2006bedside}. + \begin{equation} +c= \beta_1 x_{i1}+\beta_2 x_{i2} +\end{equation} + \item \textit{Pepe \& Thompson's method:} Pepe \& Thompson aimed to maximize the AUC or partial AUC when combining diagnostic tests, regardless of the distribution of the markers \citep{pepe2000combining}. They developed an empirical solution for the optimal linear combination that maximizes the Mann-Whitney U statistic, an empirical estimate of the area under the ROC curve. Notably, this approach is distribution-free.
Mathematically, they maximized the following objective function: + \begin{equation} +\text{maximize} \; U(\alpha)= \frac{1}{n_1 n_2} \sum_{i=1}^{n_1} \sum_{j=1}^{n_2} I\left[D_{i1}+\alpha D_{i2}\geq H_{j1}+\alpha H_{j2}\right] +\end{equation} + \begin{equation} +c= x_{i1}+\alpha x_{i2} +\label{eq:4} +\end{equation} + where $\alpha \in [-1,1]$ is interpreted as the relative weight of $x_{i2}$ to $x_{i1}$ in the combination, i.e., the weight of the second marker. The aim is to find the $\alpha$ that maximizes $U(\alpha)$. Readers are referred to Pepe and Thompson \citep{pepe2000combining} for details. + \item \textit{Pepe, Cai \& Langton's method:} Pepe et al. observed that when the disease status and the levels of the markers conform to a generalized linear model, the regression coefficients give the optimal linear combination that maximizes the area under the ROC curve \citep{pepe2006combining}. The following objective function is maximized to achieve a higher AUC value: +\begin{equation} +\text{maximize} \; U(\alpha)= \frac{1}{n_1 n_2} \sum_{i=1}^{n_1} \sum_{j=1}^{n_2} I\left[D_{i1}+\alpha D_{i2}> H_{j1}+\alpha H_{j2}\right] + \frac{1}{2}I\left[D_{i1}+\alpha D_{i2} = H_{j1} + \alpha H_{j2}\right] +\end{equation} + Before calculating the combination score using Eq \ref{eq:4}, the marker values are normalized or scaled to lie within the range of 0 to 1. In addition, the estimate obtained by maximizing the empirical AUC can be considered a particular case of the maximum rank correlation estimator, for which the general asymptotic distribution theory was developed. Readers are referred to Pepe (2003, Chapters 4–6) for a review of the ROC curve approach and more details \citep{pepe2003statistical}. + + \item \textit{Min-Max method:} The Pepe \& Thompson method is straightforward when there are two markers but computationally challenging when more than two markers are to be combined. To overcome the computational complexity of this method, Liu et al.
\citep{liu2011min} proposed a non-parametric approach that linearly combines the minimum and maximum values of the observed markers of each subject. This approach, which does not rely on any assumption about the data distributions (i.e., it is distribution-free), is known as the Min-Max method and may provide higher sensitivity than any single marker. The objective function of the Min-Max method is as follows: +\begin{equation} +\text{maximize} \; U(\alpha)= \frac{1}{n_1 n_2} \sum_{i=1}^{n_1} \sum_{j=1}^{n_2} I[D_{i,max}+\alpha D_{i,min}> H_{j,max}+\alpha H_{j,min}] \end{equation} +\begin{equation} + c= x_{i,max}+\alpha x_{i,min} +\end{equation}\\ + where $x_{i,max}=\max(x_{i1},x_{i2})$ and $x_{i,min}=\min(x_{i1},x_{i2})$.\\ + +The Min-Max method aims to combine repeated measurements of a single marker over time or multiple markers that are measured in the same unit. While the Min-Max method is relatively simple to implement, it has some limitations. For example, markers may have different units of measurement, so standardization may be needed to ensure uniformity during the combination process. Furthermore, it is unclear whether all available information is fully utilized, as this method incorporates only the markers' minimum and maximum values into the model \citep{kang2016linear}. + + \item \textit{Su \& Liu's method:} Su and Liu examined the combination score under the assumption of two multivariate normal distributions with proportional or disproportional covariance matrices \citep{su1993linear}. Multivariate normal distributions with different covariances were first utilized in classification problems \citep{anderson1962classification}. Su and Liu then extended the idea of using multivariate distributions to the AUC, showing that the coefficients that maximize the AUC are Fisher's discriminant coefficients.
Assume that $D\sim N(\mu_D, \Sigma_D)$ and $H\sim N(\mu_H, \Sigma_H)$ represent the multivariate normal distributions for the diseased and non-diseased groups, respectively. Fisher's coefficients are then:
+\begin{equation}
+(\alpha, \beta) = (\Sigma_{D} + \Sigma_{H})^{-1} \mu \label{eq:alpha_beta}
+\end{equation}
+ where $\mu=\mu_D-\mu_H$. The combination score in this case is:
+\begin{equation}
+c= \alpha x_{i1}+ \beta x_{i2}
+\label{eq:9}
+\end{equation}
+ \item \textit{The Minimax method:} The Minimax method is an extension of Su \& Liu's method \citep{sameera2016binary}. Suppose that $D\sim N(\mu_D, \Sigma_D)$ represents the diseased group and $H\sim N(\mu_H, \Sigma_H)$ the non-diseased group. Then Fisher's coefficients are as follows:
+\begin{equation}
+(\alpha, \beta) = \left[t\,\Sigma_{D} + (1-t)\,\Sigma_{H}\right]^{-1} (\mu_D - \mu_H) \label{eq:alpha_beta_expression}
+\end{equation}
+
+ Given these coefficients, the combination score is calculated using Eq \ref{eq:9}. Here, \textit{t} is a constant ranging from 0 to 1, which can be tuned by maximizing the AUC.
+
+ \item \textit{Todor \& Saplacan's method:} Todor and Saplacan's method uses the sine and cosine trigonometric functions to calculate the combination score \citep{todor2014tools}. The combination score is calculated using the $\theta \in[-\frac{\pi}{2},\frac{\pi}{2}]$ that maximizes the AUC within this interval:
+\begin{equation}
+c= \sin{(\theta)}x_{i1}+\cos{(\theta)}x_{i2}
+\end{equation}
+\end{itemize}
+
+\subsubsection{Non-linear combination methods}
+In addition to linear combination methods, the \href{https://cran.r-project.org/web/packages/dtComb/index.html}{\texttt{dtComb}} package includes seven non-linear approaches, which are discussed in this subsection.
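Several of the linear methods above share the same distribution-free core: search over a weight that maximizes the empirical AUC of the combined score. The following is a minimal illustrative sketch in Python; the function names and the grid resolution are our own choices, not part of the package:

```python
def empirical_auc(diseased, healthy):
    """Empirical AUC: fraction of (diseased, healthy) pairs ordered correctly,
    with ties counted as 1/2 (as in the objective functions above)."""
    pairs = [(d, h) for d in diseased for h in healthy]
    wins = sum(1.0 if d > h else 0.5 if d == h else 0.0 for d, h in pairs)
    return wins / len(pairs)

def best_alpha(d1, d2, h1, h2):
    """Grid search for the alpha in [-1, 1] that maximizes the empirical AUC
    of the combination score x1 + alpha * x2. A sketch of the idea only; the
    package's own optimizer may differ."""
    grid = [i / 100.0 for i in range(-100, 101)]
    def auc_at(alpha):
        comb_d = [a + alpha * b for a, b in zip(d1, d2)]
        comb_h = [a + alpha * b for a, b in zip(h1, h2)]
        return empirical_auc(comb_d, comb_h)
    return max(grid, key=auc_at)
```

Because the grid contains $\alpha = 0$, the best combined score can never have a lower empirical AUC than the first marker alone.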
In this subsection, we will use the following notations:
+$x_{ij}$: the value of the \textit{j}th marker for the \textit{i}th individual, $i=1,2,...,n$ and $j=1,2$; \textit{d}: the degree of polynomial regressions and splines, $d = 1,2,\dots,p$.
+
+\begin{itemize}
+ \item \textit{Logistic Regression with Polynomial Feature Space:} This approach extends the logistic regression model by adding extra predictors created by raising the original predictor variables to a certain power. Including polynomial terms in the feature space enables the model to capture non-linear relationships in the data \citep{james2013introduction}. The combination score is calculated as follows:
+\begin{equation}
+c_i=\frac{\exp\left(\beta_0 + \beta_1 x_{ij} + \beta_2 x_{ij}^2+...+\beta_p x_{ij}^p\right)}{1+\exp\left(\beta_0 + \beta_1 x_{ij} + \beta_2 x_{ij}^2+...+\beta_p x_{ij}^p\right)}
+\end{equation}
+ where $c_i$ is the combination score for the \textit{i}th individual and represents the posterior probability.
+
+ \item \textit{Ridge Regression with Polynomial Feature Space:} This method combines Ridge regression with an expanded feature space created by adding polynomial terms to the original predictor variables. Ridge regression is a widely used shrinkage method for handling multicollinearity between variables, which can be an issue for least squares regression. It estimates the coefficients of these correlated variables by minimizing the residual sum of squares (RSS) plus a regularization term that prevents overfitting. The penalty is based on the L2 norm of the coefficient vector (Eq \ref{eq:beta_hat_r}).
The Ridge estimate is defined as follows:
+\begin{equation}
+\hat{\beta}^R = \text{argmin}_{\beta} \, \text{RSS} + \lambda \sum_{j=1}^{2} \sum_{d=1}^{p} \left(\beta_j^{d}\right)^2 \label{eq:beta_hat_r}
+\end{equation}
+
+where
+\begin{equation}
+RSS=\sum_{i=1}^{n}\left(y_i-\beta_0-\sum_{j=1}^{2}\sum_{d=1}^{p} \beta_j^d x_{ij}^d\right)^2
+\end{equation}
+ and $\hat{\beta}^R$ denotes the estimates of the Ridge regression coefficients; the second term is called the penalty term, where $\lambda \geq 0$ is a shrinkage parameter. The shrinkage parameter $\lambda$ controls the amount of shrinkage applied to the regression coefficients and is selected via cross-validation. We used the \href{https://cran.r-project.org/web/packages/glmnet/index.html}{\texttt{glmnet}} package \citep{friedman2010regularization} to implement Ridge regression in combining the diagnostic tests.
+
+ \item \textit{Lasso Regression with Polynomial Feature Space:} Similar to Ridge regression, Lasso regression is a shrinkage method that adds a penalty term to the objective function of least squares regression. The penalty, in this case, is based on the L1 norm of the coefficient vector, which leads to sparsity in the model: some regression coefficients are exactly zero when the tuning parameter $\lambda$ is sufficiently large. This property allows the model to automatically identify and remove less relevant variables and reduce the model's complexity. The Lasso estimates are defined as follows:
+
+ \begin{equation}
+\hat{\beta}^L = \text{argmin}_{\beta} \, \text{RSS} + \lambda \sum_{j=1}^{2} \sum_{d=1}^{p} | \beta_j^d | \label{eq:beta_hat_l}
+\end{equation}
+
+
+ To implement Lasso regression in combining the diagnostic tests, we used the \href{https://cran.r-project.org/web/packages/glmnet/index.html}{\texttt{glmnet}} package \citep{friedman2010regularization}.
+
+
+ \item \textit{Elastic-Net Regression with Polynomial Feature Space:} Elastic-Net regression combines the Lasso (L1 regularization) and Ridge (L2 regularization) penalties to address some of the limitations of each technique. The combination of the two penalties is controlled by two hyperparameters, $\alpha\in[0,1]$ and $\lambda$, which allow the trade-off between the L1 and L2 regularization terms to be adjusted \citep{james2013introduction}. For the implementation of the method, the \href{https://cran.r-project.org/web/packages/glmnet/index.html}{\texttt{glmnet}} package is used \citep{friedman2010regularization}.
+ \item \textit{Splines:} Another non-linear combination technique frequently applied in diagnostic tests is splines. Splines are a versatile mathematical and computational technique with a wide range of applications. They are piecewise functions that make interpolating or approximating data points possible. There are several types of splines, such as cubic splines, which create smooth curves by approximating a set of control points with cubic polynomial functions. When implementing splines, two critical parameters come into play: the degrees of freedom and the degrees of the fitted polynomials. These user-adjustable parameters influence the flexibility and smoothness of the resulting curve and are critical for controlling the behavior of splines. We used the \href{https://rdocumentation.org/packages/splines/versions/3.6.2}{\texttt{splines}} package in the R programming language to implement splines.
+
+ \item \textit{Generalized Additive Models with Smoothing Splines and Generalized Additive Models with Natural Cubic Splines:} Regression models are of great interest in many fields for understanding the importance of different inputs. Although widely used, traditional linear models often fall short in practice because effects may not be linear.
Another method, generalized additive models (GAMs), was introduced to identify and characterize non-linear regression effects \citep{james2013introduction}. Smoothing splines and natural cubic splines are two standard methods used within GAMs to model non-linear relationships. To implement these two methods, we used the \href{https://cran.r-project.org/web/packages/gam/index.html}{\texttt{gam}} package in R \citep{Trevor2015gam}. GAMs with smoothing splines are a more data-driven and adaptive approach, since smoothing splines can automatically capture non-linear relationships without specifying in advance the shape of the spline or the number of knots (specific points where two or more polynomial segments are joined to create a piecewise-defined curve or surface). Natural cubic splines, on the other hand, are preferred when we have prior knowledge or assumptions about the shape of the non-linear relationship; they are more interpretable and can be controlled through the number of knots \citep{elhakeem2022using}.
+\end{itemize}
+
+\subsubsection{Mathematical Operators}
+This section covers four arithmetic operators, eight distance measures, and the exponential approach. In addition, unlike the other approaches, users can apply logarithmic, exponential, and trigonometric (sine and cosine) transformations to the markers. Let $x_{ij}$ represent the value of the \textit{j}th marker for the \textit{i}th observation, with $i=1,2,...,n$ and $j=1,2$, and let $c_i$ denote the resulting combination score for the \textit{i}th individual.
+\begin{itemize}
+ \item \textit{Arithmetic Operators:} Arithmetic operators such as addition, multiplication, division, and subtraction can also be used in diagnostic tests to optimize the AUC, a measure of diagnostic test performance. These operations can potentially increase the AUC and improve the efficacy of diagnostic tests by combining markers in specific ways.
For example, if high values in one test indicate risk while low values in the other do, subtraction or division can effectively combine these markers.
+ \item \textit{Distance Measurements:} While combining markers with mathematical operators, a distance measure can be used to evaluate the relationships or similarities between marker values. It is worth noting that, as far as we know, no studies have integrated distinct distance measures with arithmetic operators in this context. Euclidean distance is the most commonly used distance measure, but it may not accurately reflect the relationship between markers. Therefore, we incorporated a variety of distances into the package. These distances are given as follows \citep{minaev2018distance,pandit2011comparative,cha2007comprehensive}:\\
+
+\textit{Euclidean:}
+\begin{equation}
+c = \sqrt{(x_{i1} - 0)^2 + (x_{i2} - 0)^2} \label{eq:euclidean_distance}
+\end{equation}
+\\
+\textit{Manhattan:}
+\begin{equation}
+c = |x_{i1} - 0| + |x_{i2} - 0| \label{eq:manhattan_distance}
+\end{equation}
+\\
+\textit{Chebyshev:}
+\begin{equation}
+c = \max\{|x_{i1} - 0|, |x_{i2} - 0|\} \label{eq:max_absolute}
+\end{equation}
+\\
+\textit{Kulczynskid:}
+\begin{equation}
+c = \frac{|x_{i1} - 0| + |x_{i2} - 0|}{\min\{x_{i1}, x_{i2}\}} \label{eq:custom_expression}
+\end{equation}
+\\
+\textit{Lorentzian:}
+\begin{equation}
+c = \ln(1 + |x_{i1} - 0|) + \ln(1 + |x_{i2} - 0|) \label{eq:ln_expression}
+\end{equation}
+\\
+ \textit{Taneja:}
+\begin{equation}
+c = z_1 \left( \log \left( \frac{z_1}{\sqrt{x_{i1} \epsilon}} \right) \right) + z_2 \left( \log \left( \frac{z_2}{\sqrt{x_{i2} \epsilon}} \right) \right) \label{eq:log_expression}
+\end{equation}
+\\
+where $z_1 = \frac{x_{i1} - 0}{2}, \quad z_2 = \frac{x_{i2} - 0}{2}$ \\
+
+\textit{Kumar-Johnson:}
+\begin{equation}
+c = \frac{{(x_{i1}^2 - 0)^2}}{{2(x_{i1} \epsilon)^{\frac{3}{2}}}} + \frac{{(x_{i2}^2 - 0)^2}}{{2(x_{i2} \epsilon)^{\frac{3}{2}}}}, \quad
\text{where } \epsilon \text{ is a small positive constant} \label{eq:kumar_johnson}
+\end{equation}
+\\
+\textit{Avg:}
+\begin{equation}
+c = \frac{{|x_{i1} - 0| + |x_{i2} - 0| + \max\{|x_{i1} - 0|,|x_{i2} - 0|\}}}{2} \label{eq:avg_distance}
+\end{equation}\\
+
+ \item \textit{Exponential approach:} The exponential approach is another technique for exploring different relationships between the diagnostic measurements. Taking one of the two diagnostic tests as the base and the other as the exponent yields the scores $x_{i1}^{(x_{i2})}$ and $x_{i2}^{(x_{i1})}$. The specific goals or hypotheses of the analysis, as well as the characteristics of the diagnostic tests, determine which form to use.
+\end{itemize}
+\subsubsection{Machine-Learning algorithms}
+Machine-learning (ML) algorithms have been increasingly applied in various fields, including medicine, to combine diagnostic tests. Integrating diagnostic tests through ML can lead to more accurate, timely, and personalized diagnoses, which is particularly valuable in complex medical cases where multiple factors must be considered. In this study, we aimed to incorporate a broad range of ML algorithms in the package and took advantage of the \href{https://cran.r-project.org/web/packages/caret/index.html}{\texttt{caret}} package in R \citep{kuhn2008building} to achieve this goal. This package includes 190 classification algorithms that can be used to train models and make predictions. We focused on models that take numerical inputs and produce binary responses, a selection process that resulted in 113 models implemented in our study. We then classified these 113 models into five classes following the idea given in \citep{zararsiz2016statistical}: (i) discriminant classifiers, (ii) decision tree models, (iii) kernel-based classifiers, (iv) ensemble classifiers, and (v) others.
As in the \href{https://cran.r-project.org/web/packages/caret/index.html}{\texttt{caret}} package, \code{mlComb()} sets up a grid of tuning parameters for a number of classification routines, fits each model, and calculates a performance measure based on resampling. After the model fitting, it uses the \code{predict()} function to calculate the probability of the "event" occurring for each observation. Finally, it performs ROC analysis based on the probabilities obtained from the prediction step.
+
+\subsection{Standardization}
+Standardization is the process of transforming data onto a common scale to facilitate meaningful comparisons and statistical inference. Many statistical techniques employ standardization to improve the interpretability and comparability of data. We implemented five standardization methods that can be applied to each marker; their formulas are listed below:
+
+\begin{itemize}
+ \item Z-score: \( \frac{{x - \text{mean}(x)}}{{\text{sd}(x)}} \)
+ \item T-score: \( \left( \frac{{x - \text{mean}(x)}}{{\text{sd}(x)}} \times 10 \right) + 50 \)
+ \item Range: \( \frac{{x - \min(x)}}{{\max(x) - \min(x)}} \)
+ \item Mean: \( \frac{x}{{\text{mean}(x)}} \)
+ \item Deviance: \( \frac{x}{{\text{sd}(x)}} \)
+\end{itemize}
+
+
+\subsection{Model building}
+After specifying a combination method from the \href{https://cran.r-project.org/web/packages/dtComb/index.html}{\texttt{dtComb}} package, users can build models and optimize their parameters with the \code{mlComb()}, \code{linComb()}, \code{nonlinComb()}, and \code{mathComb()} functions, depending on the selected approach. For the linear and non-linear approaches (i.e., \code{linComb()} and \code{nonlinComb()}), parameter optimization is done using n-fold cross-validation, repeated n-fold cross-validation, or bootstrapping.
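As a quick aside, the five standardization formulas listed above are easy to write out directly; the following is an illustrative sketch in Python (the method labels are our own), not the package's R implementation:

```python
import statistics

def standardize(x, method="zScore"):
    """Standardize a numeric list x with one of the five formulas above.

    The method labels are illustrative and may not match the package's
    argument values exactly.
    """
    m, s = statistics.mean(x), statistics.stdev(x)
    lo, hi = min(x), max(x)
    if method == "zScore":    # (x - mean) / sd
        return [(v - m) / s for v in x]
    if method == "tScore":    # 10 * z-score + 50
        return [10 * (v - m) / s + 50 for v in x]
    if method == "range":     # (x - min) / (max - min)
        return [(v - lo) / (hi - lo) for v in x]
    if method == "mean":      # x / mean
        return [v / m for v in x]
    if method == "deviance":  # x / sd
        return [v / s for v in x]
    raise ValueError(f"unknown method: {method}")
```

Note that a T-score is simply an affine rescaling of the Z-score, so both leave the ROC curve of a single marker unchanged; standardization matters only when markers on different scales are combined.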
Additionally, for the machine-learning approach (i.e., \code{mlComb()}), all of the resampling methods from the \href{https://cran.r-project.org/web/packages/caret/index.html}{\texttt{caret}} package are available to optimize the model parameters. The total number of parameters being optimized varies across models, and these parameters are fine-tuned to maximize the AUC. The returned object stores the input data, the preprocessed and transformed data, the trained model, and the resampling results.
+\subsection{Evaluation of model performances}
+
+A confusion matrix, as shown in Table \ref{tab:confusion_matrix}, is a table used to evaluate the performance of a classification model and shows the number of correct and incorrect predictions. It compares predicted and actual
+
+\begin{table}[h]
+\centering
+\caption{Confusion Matrix}
+\label{tab:confusion_matrix}
+\begin{tabular}{llll}
+\hline
+\multirow{2}{*}{Predicted labels} & \multicolumn{2}{l}{Actual class labels} & Total \\ \cline{2-4}
+ & Positive & Negative & \\ \hline
+Positive & TP & FP & TP+FP \\
+Negative & FN & TN & FN+TN \\
+Total & TP+FN & FP+TN & n \\ \hline
+\end{tabular}
+
+ \begin{flushleft}
+\tiny TP: True Positive, TN: True Negative, FP: False Positive, FN: False Negative, n: Sample size
+ \end{flushleft}
+\end{table}
+\noindent
+class labels, with diagonal elements representing the correct predictions and off-diagonal elements representing the number of incorrect predictions. The \href{https://cran.r-project.org/web/packages/dtComb/index.html}{\texttt{dtComb}} package uses the \href{https://cran.r-project.org/web/packages/OptimalCutpoints/index.html}{\texttt{OptimalCutpoints}} package \citep{yin2014optimal} to generate the confusion matrix and then the \href{https://cran.r-project.org/web/packages/epiR/index.html}{\texttt{epiR}} package \citep{stevenson2017epir}, which provides a range of performance metrics, to evaluate the performances.
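The confusion-matrix counts determine all of the performance metrics reported by the package. A minimal sketch of the standard formulas follows (our own helper function, shown in Python for illustration; \texttt{dtComb} itself delegates these computations to \texttt{OptimalCutpoints} and \texttt{epiR} in R):

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Compute common diagnostic metrics from confusion-matrix counts,
    using the standard textbook formulas."""
    n = tp + fp + fn + tn
    se = tp / (tp + fn)   # sensitivity (recall)
    sp = tn / (tn + fp)   # specificity
    acc = (tp + tn) / n
    # Expected agreement by chance, for Cohen's kappa
    pe = ((tp + fn) * (tp + fp) + (fp + tn) * (fn + tn)) / n**2
    return {
        "ACC": acc,
        "kappa": (acc - pe) / (1 - pe),
        "SE": se,
        "SP": sp,
        "PPV": tp / (tp + fp),       # positive predictive value
        "NPV": tn / (tn + fn),       # negative predictive value
        "PLR": se / (1 - sp),        # positive likelihood ratio
        "NLR": (1 - se) / sp,        # negative likelihood ratio
    }
```

Sensitivity and specificity are column-wise proportions of the confusion matrix, while the predictive values are row-wise proportions, which is why they respond differently to class imbalance.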
Various performance metrics are available in the \href{https://cran.r-project.org/web/packages/dtComb/index.html}{\texttt{dtComb}} package: the accuracy rate (ACC), Kappa statistic ($\kappa$), sensitivity (SE), specificity (SP), apparent and true prevalence (AP, TP), positive and negative predictive values (PPV, NPV), positive and negative likelihood ratios (PLR, NLR), the proportion of true outcome-negative subjects that test positive (False T+ proportion for true D-), the proportion of true outcome-positive subjects that test negative (False T- proportion for true D+), the proportion of test-positive subjects that are outcome negative (False T+ proportion for T+), and the proportion of test-negative subjects that are outcome positive (False T- proportion for T-). These metrics are summarized in Table \ref{tab:performance_metrics}.
+
+\begin{table}[htbp]
+ \centering \small
+ \caption{Performance Metrics and Formulas}
+ \label{tab:performance_metrics}
+ \begin{tabular}{ll}
+ \hline
+ \textbf{Performance Metric} & \textbf{Formula} \\
+ \hline
+ Accuracy & $\text{ACC} = \frac{{\text{TP} + \text{TN}}}{{n}}$ \\
+ Kappa & $\kappa = \frac{{\text{ACC} - P_e}}{{1 - P_e}}$ \\
+ & $P_e = \frac{{(\text{TP} + \text{FN})(\text{TP} + \text{FP}) + (\text{FP} + \text{TN})(\text{FN} + \text{TN})}}{{n^2}}$ \\
+ Sensitivity (Recall) & $\text{SE} = \frac{{\text{TP}}}{{\text{TP} + \text{FN}}}$ \\
+ Specificity & $\text{SP} = \frac{{\text{TN}}}{{\text{TN} + \text{FP}}}$ \\
+ Apparent Prevalence & $\text{AP} = \frac{{\text{TP} + \text{FP}}}{{n}}$ \\
+ True Prevalence & $\text{TP} = \frac{{\text{AP} + \text{SP} - 1}}{{\text{SE} + \text{SP} - 1}}$ \\
+ Positive Predictive Value (Precision) & $\text{PPV} = \frac{{\text{TP}}}{{\text{TP} + \text{FP}}}$ \\
+ Negative Predictive Value & $\text{NPV} = \frac{{\text{TN}}}{{\text{TN} + \text{FN}}}$ \\
+ Positive Likelihood Ratio & $\text{PLR} = \frac{{\text{SE}}}{{1 - \text{SP}}}$ \\
+ Negative Likelihood Ratio & $\text{NLR} = \frac{{1
- \text{SE}}}{{\text{SP}}}$ \\
+ The Proportion of True Outcome Negative Subjects That Test Positive & $\frac{{\text{FP}}}{{\text{FP} + \text{TN}}}$ \\
+ The Proportion of True Outcome Positive Subjects That Test Negative & $\frac{{\text{FN}}}{{\text{TP} + \text{FN}}}$ \\
+ The Proportion of Test Positive Subjects That Are Outcome Negative & $\frac{{\text{FP}}}{{\text{TP} + \text{FP}}}$ \\
+ The Proportion of Test Negative Subjects That Are Outcome Positive & $\frac{{\text{FN}}}{{\text{FN} + \text{TN}}}$ \\
+ \hline
+ \end{tabular}
+\end{table}
+
+
+\subsection{Prediction of the test cases}
+The class labels of the observations in the test set are predicted with the model parameters derived from the training phase. It is critical to emphasize that the same analytical procedures employed during the training phase, such as normalization, transformation, or standardization, are also applied to the test set. More specifically, if the training set underwent Z-standardization, the test set is standardized using the mean and standard deviation derived from the training set. The class labels of the test set are then estimated based on the cut-off value established during the training phase and the model parameters trained on the training set.
+
+\subsection{Technical details and the structure of dtComb}
+The \href{https://cran.r-project.org/web/packages/dtComb/index.html}{\texttt{dtComb}} package is implemented in the R programming language (\url{https://www.r-project.org/}) version 4.2.0. Package development was facilitated with \href{https://cran.r-project.org/web/packages/devtools/index.html}{\texttt{devtools}} \citep{wickham2016devtools} and documented with \href{https://cran.r-project.org/web/packages/roxygen2/index.html}{\texttt{roxygen2}} \citep{wickham2013roxygen2}. Package testing was performed using 271 unit tests \citep{wickham2011testthat}.
Double programming was performed using Python (\url{https://www.python.org/}) to validate the implemented functions \citep{shiralkarprogramming}.\\
+
+\newpage
+To combine diagnostic tests, the \href{https://cran.r-project.org/web/packages/dtComb/index.html}{\texttt{dtComb}} package offers eight linear combination methods, seven non-linear combination methods, mathematical operators (including arithmetic operators and eight distance metrics), and a total of 113 machine-learning algorithms from the \href{https://cran.r-project.org/web/packages/caret/index.html}{\texttt{caret}} package \citep{kuhn2008building}. These are summarized in Table \ref{tab:dtComb_features}.
+%Table 3
+
+\begin{table}[htbp]
+ \centering \small
+ \caption{Features of dtComb}
+ \label{tab:dtComb_features}
+ \begin{tabular}{l p{10cm}}
+ \hline
+ \textbf{Modules (Tab Panels)} & \textbf{Features} \\
+\hline
+ \multirow{4}{*}{Combination Methods} &
+ \begin{itemize}
+ \item Linear Combination Approach (8 different methods)
+ \item Non-linear Combination Approach (7 different methods)
+ \item Mathematical Operators (14 different methods)
+ \item Machine-Learning Algorithms (113 different methods) \citep{kuhn2008building}
+ \end{itemize} \\
+
+ \multirow{2}{*}{Standardization Methods} &
+ \begin{itemize}
+ \item Linear, non-linear, and mathematical methods
+ \begin{itemize}
+ \item Z-score
+ \item T-score
+ \item Range
+ \item Mean
+ \item Deviance
+ \end{itemize}
+ \item 16 different preprocessing methods for ML \citep{kuhn2008building}
+ \end{itemize} \\
+ \multirow{2}{*}{Resampling} &
+ \begin{itemize}
+ \item 3 different methods for linear and non-linear combination methods
+ \begin{itemize}
+ \item Bootstrapping
+ \item Cross-validation
+ \item Repeated cross-validation
+ \end{itemize}
+ \item 12 different resampling methods for ML \citep{kuhn2008building}
+ \end{itemize} \\
+ {Cutpoints} &
+ \begin{itemize}
+ \item 34 different methods for
optimum cutpoints \citep{yin2014optimal}
+ \end{itemize} \\
+ \hline
+ \end{tabular}
+\end{table}
+\section{Results}
+
+Table \ref{tab:exist_pck} summarizes the existing packages and programs, including \href{https://cran.r-project.org/web/packages/dtComb/index.html}{\texttt{dtComb}}, along with the number of combination methods included in each. While \pkg{mROC} offers only one linear combination method, \href{https://github.com/wbaopaul/MaxmzpAUC-R}{\texttt{maxmzpAUC}} and \href{https://cran.r-project.org/web/packages/movieROC/index.html}{\texttt{movieROC}} provide five linear combination techniques each, and \href{https://cran.r-project.org/web/packages/SLModels/index.html}{\texttt{SLModels}} includes four. These existing packages primarily focus on linear combination approaches. In contrast, \href{https://cran.r-project.org/web/packages/dtComb/index.html}{\texttt{dtComb}} goes beyond these limitations by integrating not only linear methods but also non-linear approaches, machine-learning algorithms, and mathematical operators.
+
+\begin{table}[htbp]
+ \centering \small
+ \caption{Comparison of dtComb vs. existing packages and programs}
+ \label{tab:exist_pck}
+ \begin{tabular}{@{}lcccc@{}}
+ \toprule
+ \textbf{Packages \& Programs} & \textbf{Linear Comb.} & \textbf{Non-linear Comb.} & \textbf{Math.
Operators} & \textbf{ML algorithms} \\
+ \midrule
+ \textbf{mROC} \citep{kramar2001mroc} & 1 & - & - & - \\
+ \href{https://github.com/wbaopaul/MaxmzpAUC-R}{\texttt{maxmzpAUC}} \citep{yu2015two} & 5 & - & - & - \\
+ \href{https://cran.r-project.org/web/packages/movieROC/index.html}{\texttt{movieROC}} \citep{perez2021visualizing} & 5 & - & - & - \\
+ \href{https://cran.r-project.org/web/packages/SLModels/index.html}{\texttt{SLModels}} \citep{aznar-gimeno2023comparing} & 4 & - & - & - \\
+ \href{https://cran.r-project.org/web/packages/dtComb/index.html}{\texttt{dtComb}} & 8 & 7 & 14 & 113 \\
+ \bottomrule
+ \end{tabular}
+\end{table}
+\subsection{Dataset}
+To demonstrate the functionality of the \href{https://cran.r-project.org/web/packages/dtComb/index.html}{\texttt{dtComb}} package, we conduct a case study using four different combination methods. The data used in this study were obtained from patients who presented at Erciyes University Faculty of Medicine, Department of General Surgery, with complaints of abdominal pain \citep{zararsiz2016statistical,akyildiz2010value}. The dataset comprises the D-dimer levels (\textit{D\_dimer}) and leukocyte counts (\textit{log\_leukocyte}) of 225 patients, divided into two groups (\textit{Group}): the first group consisted of 110 patients who required an immediate laparotomy (\textit{needed}), while the second group comprised 115 patients who did not (\textit{not\_needed}). Patients who, after evaluation of conventional treatment, underwent surgery because of their postoperative pathologies were placed in the first group, whereas those with a negative laparotomy result were assigned to the second group. All the analyses were performed following the workflow given in Fig. \ref{figure:workflow}. First of all, the \href{https://cran.r-project.org/web/packages/dtComb/index.html}{\texttt{dtComb}} package should be loaded in order to use the related functions.
+
+\begin{figure}[H]
+ \centering
+ \includegraphics[width=0.81\textwidth]{dtComb/Figure/Figure_1.pdf}
+ \caption{\textbf{Combination steps of two diagnostic tests.} The figure presents a schematic representation of the sequential steps involved in combining two diagnostic tests using a combination method.}
+ \label{figure:workflow}
+\end{figure}
+
+\begin{example}
+# load dtComb package
+library(dtComb)
+\end{example}
+Similarly, the exampleData1 data can be loaded from the package by using the following R code:
+\begin{example}
+
+# load exampleData1 data
+data(exampleData1)
+\end{example}
+
+\subsection{Implementation of the dtComb package}
+To demonstrate the applicability of the \href{https://cran.r-project.org/web/packages/dtComb/index.html}{\texttt{dtComb}} package, we implement an arbitrarily chosen method from each of the linear, non-linear, mathematical-operator, and machine-learning approaches and compare their performance. These methods are Pepe, Cai \& Longton for the linear combination, splines for the non-linear combination, addition for the mathematical operator, and SVM for machine learning. Before applying the methods, we split the data into two parts: a training set comprising 70\% of the data and a test set comprising the remaining 30\%.
+
+\begin{example}
+# Splitting the data set into train and test (70%-30%)
+set.seed(2128)
+inTrain <- caret::createDataPartition(exampleData1$group, p = 0.7, list = FALSE)
+trainData <- exampleData1[inTrain, ]
+colnames(trainData) <- c("Group", "D_dimer", "log_leukocyte")
+testData <- exampleData1[-inTrain, -1]
+
+# define marker and status for combination function
+markers <- trainData[, -1]
+status <- factor(trainData$Group, levels = c("not_needed", "needed"))
+\end{example}
+
+The model is trained on \code{trainData}, and the resampling scheme used in the training phase is five-fold cross-validation repeated ten times.
\code{direction = "<"} is chosen, as higher marker values indicate higher risk. The Youden index was chosen among the cut-off methods. We note that the markers are not standardized, and results are presented with 95\% confidence intervals (CI). The four main combination functions are run with the selected methods as follows.
+\begin{example}
+
+# PCL method
+fit.lin.PCL <- linComb(markers = markers, status = status, event = "needed",
+                       method = "PCL", resample = "repeatedcv", nfolds = 5,
+                       nrepeats = 10, direction = "<", cutoff.method = "Youden")
+
+# splines method (degree = 3 and degrees of freedom = 3)
+fit.nonlin.splines <- nonlinComb(markers = markers, status = status, event = "needed",
+                       method = "splines", resample = "repeatedcv", nfolds = 5,
+                       nrepeats = 10, cutoff.method = "Youden", direction = "<",
+                       df1 = 3, df2 = 3)
+
+# add operator
+fit.add <- mathComb(markers = markers, status = status, event = "needed",
+                    method = "add", direction = "<", cutoff.method = "Youden")
+
+# SVM
+fit.svm <- mlComb(markers = markers, status = status, event = "needed",
+                  method = "svmLinear", resample = "repeatedcv", nfolds = 5,
+                  nrepeats = 10, direction = "<", cutoff.method = "Youden")
+
+\end{example}
+
+Various measures were considered to compare model performances, including the AUC, ACC, SEN, SPE, PPV, and NPV. AUC statistics, with 95\% CIs, were calculated for each marker and method: 0.816 (0.751–0.880), 0.802 (0.728–0.877), 0.888 (0.825–0.930), 0.911 (0.868–0.954), 0.877 (0.824–0.929), and 0.875 (0.821–0.930) for D-dimer, Log(leukocyte), Pepe, Cai \& Longton, Splines, Addition, and Support Vector Machine (SVM), respectively. The results revealed that the predictive performances of the markers and the combinations of markers are significantly higher than random chance in determining the need for laparotomy ($p<0.05$). The highest sensitivity and NPV were observed with the Addition method, while the highest specificity and PPV were observed with the Splines method.
According to the overall AUC and accuracy, the combined approach fitted with the Splines method performed better than the other methods (Fig. \ref{figure:radar}). Therefore, the Splines method is used in the subsequent analysis of the findings.
+
+\begin{figure}[H]
+ \centering
+ \includegraphics[width=1\textwidth]{dtComb/Figure/Figure_4.pdf}
+ \caption{\textbf{Radar plots of trained models and performance measures of two markers.} Radar plots summarize the diagnostic performances of two markers and various combination methods in the training dataset. These plots illustrate performance metrics such as the AUC, ACC, SEN, SPE, PPV, and NPV. The area of the polygon formed by connecting the points indicates the model's performance across these metrics. The polygon associated with the Splines method covers the largest area, which means that the Splines method performed better than the other methods.}
+ \label{figure:radar}
+\end{figure}
+For the AUC of the markers and the splines model:
+\begin{example}
+fit.nonlin.splines$AUC_table
+ AUC SE.AUC LowerLimit UpperLimit z p.value
+D_dimer 0.8156966 0.03303310 0.7509530 0.8804403 9.556979 1.212446e-21
+log_leukocyte 0.8022286 0.03791768 0.7279113 0.8765459 7.970652 1.578391e-15
+Combination 0.9111752 0.02189588 0.8682601 0.9540904 18.778659 1.128958e-78
+\end{example}
+Here: \\
+\code{SE}: Standard Error.\\
+
+
+The areas under the ROC curves for the D-dimer levels, the leukocyte counts on the logarithmic scale, and the combination score were 0.816, 0.802, and 0.911, respectively. The ROC curves generated with the combination score from the splines model, the D-dimer levels, and the leukocyte counts are also given in Fig. \ref{figure:roc}, showing that the combination score has the highest AUC.
It is observed that the splines method significantly improved the AUC, by 9.5\% and 10.9\% compared to the D-dimer level and leukocyte counts, respectively.
+\begin{figure}[H]
+  \centering
+  \includegraphics[width=0.7\textwidth]{dtComb/Figure/Figure_2.pdf}
+  \caption{\textbf{ROC curves.} ROC curves for combined diagnostic tests, with sensitivity displayed on the y-axis and 1-specificity displayed on the x-axis. As can be observed, the combination score produced the highest AUC value, indicating that the combined strategy performs the best overall.}
+  \label{figure:roc}
+\end{figure}
+
+To see the results of the binary comparisons between the combination score and the markers:
+\begin{example}
+fit.nonlin.splines$MultComp_table
+
+  Marker1 (A)   Marker2 (B)   AUC (A)   AUC (B)      |A-B|  SE(|A-B|)         z      p-value
+1 Combination       D_dimer 0.9079686 0.8156966 0.09227193 0.02223904 4.1490971 3.337893e-05
+2 Combination log_leukocyte 0.9079686 0.8022286 0.10573994 0.03466544 3.0502981 2.286144e-03
+3     D_dimer log_leukocyte 0.8156966 0.8022286 0.01346801 0.04847560 0.2778308 7.811423e-01
+\end{example}
+
+Controlling the Type I error with the Bonferroni correction, the comparisons of the combination score with the markers yielded significant results ($p<0.05$).\\
+
+To demonstrate the diagnostic test results and performance measures for the non-linear combination approach, the following code can be used:
+
+\begin{example}
+fit.nonlin.splines$DiagStatCombined
+          Outcome +   Outcome -   Total
+Test +           66          13      79
+Test -           11          68      79
+Total            77          81     158
+
+Point estimates and 95% CIs:
+--------------------------------------------------------------
+Apparent prevalence *                  0.50 (0.42, 0.58)
+True prevalence *                      0.49 (0.41, 0.57)
+Sensitivity *                          0.86 (0.76, 0.93)
+Specificity *                          0.84 (0.74, 0.91)
+Positive predictive value *            0.84 (0.74, 0.91)
+Negative predictive value *            0.86 (0.76, 0.93)
+Positive likelihood ratio              5.34 (3.22, 8.86)
+Negative likelihood ratio              0.17 (0.10, 0.30)
+False T+ proportion for true D- *      0.16 (0.09, 0.26)
+False T- proportion for true D+ *      0.14 (0.07, 0.24)
+False T+ proportion for T+ *           0.16 (0.09, 0.26)
+False T- proportion for T- *           0.14 (0.07, 0.24)
+Correctly classified proportion *      0.85 (0.78, 0.90)
+--------------------------------------------------------------
+* Exact CIs
+\end{example}
+
+Furthermore, if the diagnostic test results and performance measures of the combination score are compared with those of the single markers, it can be observed that the TN value of the combination score is higher than that of the single markers, and that the combination has higher specificity and higher positive and negative predictive values than the log-transformed leukocyte counts and the D-dimer level (Table \ref{tab:diagnostic_measures}). Conversely, D-dimer has a higher sensitivity than the others. Optimal cut-off values for both markers and the combined approach are also given in this table.
+
+\begin{table}[htbp]
+  \centering \small
+  \caption{Statistical diagnostic measures with 95\% confidence intervals for each marker and the combination score}
+  \label{tab:diagnostic_measures}
+  \begin{tabular}{@{}lccc@{}}
+    \toprule
+    \textbf{Diagnostic Measures (95\% CI)} & \textbf{D-dimer level ($>1.6$)} & \textbf{Log(leukocyte count) ($>4.16$)} & \textbf{Combination score ($>0.448$)} \\
+    \midrule
+    TP & 66 & 61 & 65 \\
+    TN & 53 & 60 & 69 \\
+    FP & 28 & 21 & 12 \\
+    FN & 11 & 16 & 12 \\
+    Apparent prevalence & 0.59 (0.51-0.67) & 0.52 (0.44-0.60) & 0.49 (0.41-0.57) \\
+    True prevalence & 0.49 (0.41-0.57) & 0.49 (0.41-0.57) & 0.49 (0.41-0.57) \\
+    Sensitivity & 0.86 (0.76-0.93) & 0.79 (0.68-0.88) & 0.84 (0.74-0.92) \\
+    Specificity & 0.65 (0.54-0.76) & 0.74 (0.63-0.83) & 0.85 (0.76-0.92) \\
+    Positive predictive value & 0.70 (0.60-0.79) & 0.74 (0.64-0.83) & 0.84 (0.74-0.92) \\
+    Negative predictive value & 0.83 (0.71-0.91) & 0.79 (0.68-0.87) & 0.85 (0.76-0.92) \\
+    Positive likelihood ratio & 2.48 (1.81-3.39) & 3.06 (2.08-4.49) & 5.70 (3.35-9.69) \\
+    Negative likelihood ratio & 0.22 (0.12-0.39) & 0.28 (0.18-0.44) & 0.18 (0.11-0.31) \\
+    False T+ proportion for true D- & 0.35 (0.24-0.46) & 0.26 (0.17-0.37) & 0.15 (0.08-0.24) \\
+    False T- proportion for true D+ & 0.14 (0.07-0.24) & 0.21 (0.12-0.32) & 0.16 (0.08-0.26) \\
+    False T+ proportion for T+ & 0.30 (0.21-0.40) & 0.26 (0.17-0.36) & 0.16 (0.08-0.26) \\
+    False T- proportion for T- & 0.17 (0.09-0.29) & 0.21 (0.13-0.32) & 0.15 (0.08-0.24) \\
+    Accuracy & 0.75 (0.68-0.82) & 0.77 (0.69-0.83) & 0.85 (0.78-0.90) \\
+    \bottomrule
+  \end{tabular}
+\end{table}
+
+For a comprehensive analysis, the \code{plotComb} function in \href{https://cran.r-project.org/web/packages/dtComb/index.html}{\texttt{dtComb}} can be used to generate plots of the distribution and scatter of the combination scores of each group, as well as the specificity and sensitivity corresponding to different cut-off values (Fig. \ref{figure:scatter}). This function requires the result of the \code{nonlinComb} function, which is an object of the “dtComb” class, and \code{status}, which is of factor type.
+\begin{example}
+# draw distribution, dispersion, and specificity and sensitivity plots
+plotComb(fit.nonlin.splines, status)
+\end{example}
+
+\begin{figure}[htbp]
+  \centering
+  \includegraphics[width=1\textwidth]{dtComb/Figure/Figure_3.pdf}
+  \caption{\textbf{Distribution, scatter, and sens\&spe plots of the combination score acquired with the training model.} Distribution of the combination score for two groups: needed and not needed (a). Scatter graph with classes on the x-axis and combination score on the y-axis (b). Sensitivity and specificity graph of the combination score (c). While colors show each class in panels (a) and (b), in panel (c) the colors represent the sensitivity and specificity of the combination score.}
+  \label{figure:scatter}
+\end{figure}
+
+If the model trained with Splines is to be tested, the generic \code{predict} function is used.
This function requires the test set and the result of the \code{nonlinComb} function, which is an object of the “dtComb” class. As a result of prediction, the output for each observation consists of the combination score and the predicted label determined by the cut-off value derived from the model.
+\begin{example}
+# To predict the test set
+pred <- predict(fit.nonlin.splines, testData)
+head(pred)
+
+   comb.score labels
+1   0.6133884 needed
+7   0.9946474 needed
+10  0.9972347 needed
+11  0.9925040 needed
+13  0.9257699 needed
+14  0.9847090 needed
+\end{example}
+Above, it can be seen that the estimated combination scores for the first six observations in the test set were labelled as \textbf{needed} because they were higher than the cut-off value of 0.448.
+
+\subsection{Web interface for the dtComb package}
+The primary goal of developing the \href{https://cran.r-project.org/web/packages/dtComb/index.html}{\texttt{dtComb}} package is to bring numerous distinct combination methods together and make them easily accessible to researchers. Furthermore, the package includes diagnostic statistics and visualization tools for diagnostic tests and for the combination score generated by the chosen method. Nevertheless, it is worth noting that using R code may pose challenges for physicians and others unfamiliar with R programming. To address this, we have also developed a user-friendly web application for \href{https://cran.r-project.org/web/packages/dtComb/index.html}{\texttt{dtComb}} using \href{https://cran.r-project.org/web/packages/shiny/index.html}{\texttt{Shiny}} \citep{chang2017shiny}. This web-based tool is publicly accessible and provides an interactive interface with all the functionalities found in the \href{https://cran.r-project.org/web/packages/dtComb/index.html}{\texttt{dtComb}} package. \\
+
+To initiate the analysis, users must upload their data by following the instructions outlined in the "Data upload" tab of the web tool.
For convenience, we have provided three example datasets on this page to assist researchers in practicing the tool's functionality and to guide them in formatting their own data (as illustrated in Fig. \ref{figure:web}a). We also note that ROC analysis for a single marker can be performed within the ‘ROC Analysis for Single Marker(s)’ tab in the data upload section of the web interface.
+
+The "Analysis" tab contains two key subpanels:
+\begin{itemize}
+  \item Plots (Fig. \ref{figure:web}b): This section offers various visual representations, such as ROC curves, distribution plots, scatter plots, and sensitivity and specificity plots. These visualizations help users assess single diagnostic tests and the combination score generated using user-defined combination methods.
+  \item Results (Fig. \ref{figure:web}c): This subpanel provides AUC statistics for the combination score and the single diagnostic tests, comparisons evaluating how the combination score fares against the individual tests, and various diagnostic measures. One can also predict new data based on the model parameters set previously and stored in the "Predict" tab (Fig. \ref{figure:web}d). If needed, one can download the model created during the analysis to keep the parameters of the fitted model; this lets users make new predictions by reloading the model from the "Predict" tab. Additionally, all the results can easily be downloaded using the dedicated download buttons in their respective tabs.
+\end{itemize}
+\begin{figure}[H]
+  \centering
+  \includegraphics[width=1\textwidth]{dtComb/Figure/Figure_5.pdf}
+  \caption{\textbf{Web interface of the dtComb package.} The figure illustrates the web interface of the \href{https://cran.r-project.org/web/packages/dtComb/index.html}{\texttt{dtComb}} package, which demonstrates the steps involved in combining two diagnostic tests.
a) Data Upload: The user is able to upload the dataset and select relevant markers, a gold standard test, and an event factor for analysis. b) Combination Analysis: This panel allows the selection of the combination method, method-specific parameters, and resampling options to refine the analysis. c) Combination Analysis Output: Displays the results generated by the selected combination method, providing the user with key metrics and visualizations for interpretation. d) Predict: Displays the prediction results of the trained model when applied to the test set.}
+  \label{figure:web}
+\end{figure}
+\section{Summary and further research}
+In clinical practice, multiple diagnostic tests are possible for disease diagnosis \citep{yu2015two}. Combining these tests to enhance diagnostic accuracy is a widely accepted approach \citep{su1993linear,pepe2000combining,liu2011min,sameera2016binary,pepe2006combining,todor2014tools}. As far as we know, the tools in Table \ref{tab:exist_pck} have been designed to combine diagnostic tests, but each contains at most five different combination methods. As a result, despite the existence of numerous advanced combination methods, there has been no comprehensive tool available for integrating diagnostic tests.\\
+In this study, we presented \href{https://cran.r-project.org/web/packages/dtComb/index.html}{\texttt{dtComb}}, a comprehensive R package designed to combine diagnostic tests using various methods, including linear and non-linear methods, mathematical operators, and machine learning algorithms. The package integrates 142 different methods for combining two diagnostic markers to improve the accuracy of diagnosis. The package also provides ROC curve analysis, various graphical approaches, diagnostic performance scores, and binary comparison results. In the given example, one can determine whether patients with abdominal pain require laparoscopy by combining the D-dimer levels and white blood cell counts of those patients.
Various methods, such as linear and non-linear combinations, were tested, and the results showed that the Splines method performed better than the others, particularly in terms of AUC and accuracy, compared to the single tests. This shows that diagnostic accuracy can be improved with combination methods.\\
+Future work can focus on extending the capabilities of the \href{https://cran.r-project.org/web/packages/dtComb/index.html}{\texttt{dtComb}} package. While some studies focus on combining multiple markers, our study aimed to combine two markers using nearly all existing methods and to develop a tool and package for clinical practice \citep{kang2016linear}. However, \href{https://cran.r-project.org/web/packages/dtComb/index.html}{\texttt{dtComb}} can be further enhanced to combine more than two markers, broadening its applicability and utility in clinical settings.
+
+\subsection{R Software}
+
+The R package \href{https://cran.r-project.org/web/packages/dtComb/index.html}{\texttt{dtComb}} is available on CRAN at \url{https://cran.r-project.org/web/packages/dtComb/index.html}.
+
+\bibliography{dtCombreferences}
+
+\address{S.
Ilayda Yerlitaş Taştan\\
+  Department of Biostatistics \\
+  Erciyes University\\
+  Türkiye\\
+  (ORCiD: 0000-0003-2830-3006)\\
+  \email{ilaydayerlitas340@gmail.com}}
+
+\address{Serra Bersan Gengeç\\
+  Drug Application and Research Center (ERFARMA)\\
+  Erciyes University\\
+  Türkiye\\
+  \email{serrabersan@gmail.com}}
+
+\address{Necla Koçhan\\
+  Department of Mathematics\\
+  Izmir University of Economics\\
+  Türkiye\\
+  (ORCiD: 0000-0003-2355-4826)\\
+  \email{necla.kayaalp@gmail.com}}
+
+\address{Gözde Ertürk Zararsız\\
+  Department of Biostatistics \\
+  Erciyes University\\
+  Türkiye\\
+  \email{gozdeerturk9@gmail.com}}
+
+\address{Selçuk Korkmaz\\
+  Department of Biostatistics \\
+  Trakya University\\
+  Türkiye\\
+  \email{selcukorkmaz@gmail.com}}
+
+\address{Gökmen Zararsız\\
+  Department of Biostatistics \\
+  Erciyes University\\
+  Türkiye\\
+  (ORCiD: 0000-0001-5801-1835)\\
+  \email{gokmen.zararsiz@gmail.com}}
diff --git a/_articles/RJ-2025-036/dtComb3.tex b/_articles/RJ-2025-036/dtComb3.tex
new file mode 100644
index 0000000000..c513e1fcae
--- /dev/null
+++ b/_articles/RJ-2025-036/dtComb3.tex
@@ -0,0 +1,570 @@
+% !TeX root = RJwrapper.tex
+\title{dtComb: A Comprehensive R Library and Web Tool for Combining Diagnostic Tests}
+
+\author{S. Ilayda Yerlitaş Taştan, Serra Bersan Gengeç, Necla Koçhan, Gözde Ertürk Zararsız, Selçuk Korkmaz and Gökmen Zararsız}
+
+\maketitle
+
+\begin{abstract}
+
+The combination of diagnostic tests has become a crucial area of research, aiming to improve the accuracy and robustness of medical diagnostics. While existing tools focus primarily on linear combination methods, there is a lack of comprehensive tools that integrate diverse methodologies.
In this study, we present dtComb, a comprehensive R package and web tool designed to address the limitations of existing diagnostic test combination platforms. One of the unique contributions of dtComb is offering a range of 142 methods to combine two diagnostic tests, including linear and non-linear methods, machine learning algorithms, and mathematical operators. Another significant contribution of dtComb is its inclusion of advanced tools for ROC analysis, diagnostic performance metrics, and visual outputs such as sensitivity-specificity curves. Furthermore, dtComb offers classification functions for new observations, making it an easy-to-use tool for clinicians and researchers. The web-based version is also available at \url{https://biotools.erciyes.edu.tr/dtComb/} for non-R users, providing an intuitive interface for test combination and model training.\end{abstract}
+
+\section{Introduction}
+A typical scenario often encountered in combining diagnostic tests is one in which the gold standard is two-category (e.g., diseased versus healthy) and two continuous diagnostic tests are available. In such cases, clinicians usually seek to compare these two diagnostic tests and to improve on their individual performance by proportioning the results, for instance by taking their ratio \citep{muller2019amyloid, faria2016neutrophil, nyblom2006ast}. However, this technique is simplistic and may not fully capture all potential interactions and relationships between the diagnostic tests. Linear combination methods have been developed to overcome such problems \citep{erturkzararsiz2023linear}.\\
+Linear methods combine two diagnostic tests into a single score/index by assigning weights to each test, optimizing their performance in diagnosing the condition of interest \citep{neumann2023combining}. Such methods improve accuracy by leveraging the strengths of both tests \citep{aznar2022stepwise, bansal2013does}.
For instance, Su and Liu \citep{su1993linear} showed that Fisher's linear discriminant function generates a linear combination of markers, with either proportional or disproportional covariance matrices, that maximizes sensitivity uniformly across the entire specificity spectrum under a multivariate normal distribution model. In contrast, another approach, introduced by Pepe and Thompson \citep{pepe2000combining}, relies on ranking scores, eliminating the need for distributional assumptions when combining diagnostic tests linearly. Despite these theoretical advances, existing tools implement only a limited number of methods. For instance, Kramar et al. developed a computer program called \pkg{mROC} that includes only the Su and Liu method \citep{kramar2001mroc}. Pérez-Fernández et al. presented the \href{https://cran.r-project.org/web/packages/movieROC/index.html}{\texttt{movieROC}} R package, which includes methods such as Su and Liu, min-max, and logistic regression \citep{perez2021visualizing}. An R package called \href{https://github.com/wbaopaul/MaxmzpAUC-R}{\texttt{maxmzpAUC}} that includes similar methods was developed by Yu and Park \citep{yu2015two}.
+
+On the other hand, non-linear approaches incorporating the non-linearity between the diagnostic tests have been developed and employed to integrate diagnostic tests \citep{du2024likelihood, ghosh2005classification}. These approaches incorporate the non-linear structure of the tests into the model, which might improve the accuracy and reliability of the diagnosis.
In contrast, although some existing packages permit the use of non-linear approaches such as splines\footnote{\url{https://cran.r-project.org/web/packages/splines/index.html}}, lasso\footnote{\label{note2}\url{https://cran.r-project.org/web/packages/glmnet/index.html}} and ridge\footref{note2} regression, there is currently no package that employs these methods directly for combination and reports diagnostic performance. Machine-learning (ML) algorithms have recently been adopted to combine diagnostic tests \citep{ahsan2024advancements, sewak2024construction, agarwal2023artificial, prinzi2023explainable}. Many studies focus on implementing ML algorithms for diagnostic tests \citep{salvetat2022game, salvetat2024ai, ganapathy2023comparison, alzyoud2024diagnosing, zararsiz2016statistical}. For instance, DeGroat et al. applied four different classification algorithms (Random Forest, Support Vector Machine, Extreme Gradient Boosting Decision Trees, and k-Nearest Neighbors) to combine markers for the diagnosis of cardiovascular disease \citep{degroat2024discovering}. The results showed that patients with cardiovascular disease can be diagnosed with up to 96\% accuracy using these ML techniques. There are numerous platforms in which ML methods are implemented (\href{https://scikit-learn.org/stable/}{\texttt{scikit-learn}} \citep{pedregosa2011scikit}, \href{https://www.tensorflow.org/learn?hl=tr}{\texttt{TensorFlow}} \citep{tensorflow2015-whitepaper}, \href{https://cran.r-project.org/web/packages/caret/index.html}{\texttt{caret}} \citep{kuhn2008building}). The \href{https://cran.r-project.org/web/packages/caret/index.html}{\texttt{caret}} library is one of the most comprehensive tools developed in the R language \citep{kuhn2008building}. However, these are general ML tools and do not directly combine two diagnostic tests or provide diagnostic performance measures.
+
+Apart from the aforementioned methods, several basic mathematical operations, such as addition, multiplication, subtraction, and division, can also be used to combine markers \citep{svart2024neurofilament, luo2024ast, serban2024significance}. For instance, addition can enhance diagnostic sensitivity by combining the effects of markers, whereas subtraction can more distinctly differentiate disease states by illustrating the variance across markers. In addition, there are several commercial (e.g., IBM SPSS, MedCalc, Stata) and open-source (R) software packages (\href{https://cran.r-project.org/web/packages/ROCR/index.html}{\texttt{ROCR}} \citep{sing2005rocr}, \href{https://cran.r-project.org/web/packages/pROC/index.html}{\texttt{pROC}} \citep{robin2011proc}, \href{https://cran.r-project.org/web/packages/PRROC/index.html}{\texttt{PRROC}} \citep{grau2015prroc}, \href{https://cran.r-project.org/web/packages/plotROC/index.html}{\texttt{plotROC}} \citep{sachs2017plotroc}) that researchers can use for receiver operating characteristic (ROC) curve analysis. However, these tools are designed to perform single-marker ROC analysis. As a result, there is currently no software tool that covers almost all combination methods.
+
+In this study, we developed \href{https://cran.r-project.org/web/packages/dtComb/index.html}{\texttt{dtComb}}, an R package encompassing nearly all existing combination approaches in the literature. \href{https://cran.r-project.org/web/packages/dtComb/index.html}{\texttt{dtComb}} has two key advantages that make it easy to apply and superior to the other packages: (1) it provides users with a comprehensive set of 142 methods, including linear and non-linear approaches, ML approaches, and mathematical operators; (2) it offers a turnkey workflow, from data upload through analysis, performance evaluation, and reporting.
Furthermore, it is the only package that implements linear approaches such as Minimax and Todor \& Saplacan \citep{sameera2016binary,todor2014tools}. In addition, it allows for the classification of new, previously unseen observations using trained models. To our knowledge, no other tool has been designed to combine two diagnostic tests on a single platform with 142 different methods. In other words, \href{https://cran.r-project.org/web/packages/dtComb/index.html}{\texttt{dtComb}} makes more effective and robust combination methods ready for application in place of traditional approaches such as simple ratio-based methods. First, we review the theoretical basis of the related combination methods; then, we present an example implementation to demonstrate the applicability of the package. Finally, we present a user-friendly, up-to-date, and comprehensive web tool developed to make \href{https://cran.r-project.org/web/packages/dtComb/index.html}{\texttt{dtComb}} accessible to physicians and healthcare professionals who do not use the R programming language. The \href{https://cran.r-project.org/web/packages/dtComb/index.html}{\texttt{dtComb}} package is freely available on the CRAN network, the web application is freely available at \url{https://biotools.erciyes.edu.tr/dtComb/}, and all source code is available on GitHub\footnote{\url{https://github.com/gokmenzararsiz/dtComb}, \url{https://github.com/gokmenzararsiz/dtComb_Shiny}}.
+\section{Material and methods}
+This section provides an overview of the combination methods proposed in the literature. We also discuss the standardization techniques available for the markers before these methods are applied, the resampling methods used during model training, and, ultimately, the metrics used to evaluate the model's performance.
+
+\subsection{Combination approaches}
+\subsubsection{Linear combination methods}
+The \href{https://cran.r-project.org/web/packages/dtComb/index.html}{\texttt{dtComb}} package comprises eight distinct linear combination methods, which are elaborated in this section. Before investigating these methods, we briefly introduce some notation that will be used throughout this section. \\
+Notations: \\
+Let $D_{i}, i = 1, 2, \ldots, n_1$ be the marker values of the $i$th individual in the diseased group, where $D_i=(D_{i1},D_{i2})$, and $H_j, j=1,2,\ldots,n_2$ be the marker values of the $j$th individual in the healthy group, where $H_j=(H_{j1},H_{j2})$. Let $x_{i1}=c(D_{i1},H_{j1})$ be the values of the first marker and $x_{i2}=c(D_{i2},H_{j2})$ the values of the second marker for the $i$th individual, $i=1,2,\ldots,n$. Let $D_{i,min}=\min(D_{i1},D_{i2})$, $D_{i,max}=\max(D_{i1},D_{i2})$, $H_{j,min}=\min(H_{j1},H_{j2})$, $H_{j,max}=\max(H_{j1},H_{j2})$, and let $c_i$ be the resulting combination score of the $i$th individual.
+\begin{itemize}
+  \item \textit{Logistic regression:} Logistic regression is a statistical method used for binary classification. The logistic regression model estimates the probability of the binary outcome occurring based on the values of the independent variables. It is one of the most commonly applied methods in diagnostic testing, and it generates a linear combination of markers that can distinguish between control and diseased individuals. Logistic regression is generally less effective than normality-based discriminant analysis, such as Su and Liu's multivariate normality-based method, when the normality assumption is met \citep{ruiz1991asymptotic,efron1975efficiency}. On the other hand, others have argued that logistic regression is more robust because it does not require any assumptions about the joint distribution of the markers \citep{cox1989analysis}.
Therefore, it is essential to investigate the performance of linear combination methods derived from the logistic regression approach with non-normally distributed data.\\
+  The objective of the logistic regression model is to maximize the logistic likelihood function. In other words, the logistic likelihood function is maximized to estimate the logistic regression model coefficients.\\
+  \begin{equation}
+    \label{eq:1}
+    c=\frac{\exp\left(\beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2}\right)}{1+\exp\left(\beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2}\right)}
+  \end{equation}
+  The logistic regression coefficients are obtained by maximum likelihood estimation, producing an easily interpretable score for distinguishing between the two groups.
+  \item \textit{Scoring based on logistic regression:} The method primarily uses a binary logistic regression model, with slight modifications to enhance the combination score. The regression coefficients, as estimated in Eq \ref{eq:1}, are rounded to a user-specified number of decimal places and subsequently used to calculate the combination score \citep{leon2006bedside}.
+  \begin{equation}
+    c= \beta_1 x_{i1}+\beta_2 x_{i2}
+  \end{equation}
+  \item \textit{Pepe \& Thompson's method:} Pepe \& Thompson aimed to maximize the AUC or partial AUC when combining diagnostic tests, regardless of the distribution of the markers \citep{pepe2000combining}. They developed an empirical solution for the optimal linear combination that maximizes the Mann-Whitney U statistic, an empirical estimate of the area under the ROC curve. Notably, this approach is distribution-free.
Mathematically, they maximized the following objective function:
+  \begin{equation}
+    \text{maximize} \; U(\alpha)= \frac{1}{n_1 n_2} \sum_{i=1}^{n_1} \sum_{j=1}^{n_2} I\left[D_{i1}+\alpha D_{i2}\geq H_{j1}+\alpha H_{j2}\right]
+  \end{equation}
+  \begin{equation}
+    c= x_{i1}+\alpha x_{i2}
+    \label{eq:4}
+  \end{equation}
+  where $\alpha \in [-1,1]$ is interpreted as the relative weight of $x_{i2}$ to $x_{i1}$ in the combination, i.e., the weight of the second marker. The aim is to find the $\alpha$ that maximizes $U(\alpha)$. Readers are referred to Pepe and Thompson \citep{pepe2000combining}.
+  \item \textit{Pepe, Cai \& Langton's method:} Pepe et al. observed that when the disease status and the levels of the markers conform to a generalized linear model, the regression coefficients yield the optimal linear combination that maximizes the area under the ROC curve \citep{pepe2006combining}. The following objective function is maximized to achieve a higher AUC value:
+  \begin{equation}
+    \text{maximize} \; U(\alpha)= \frac{1}{n_1 n_2} \sum_{i=1}^{n_1} \sum_{j=1}^{n_2} I\left[D_{i1}+\alpha D_{i2}> H_{j1}+\alpha H_{j2}\right] + \frac{1}{2}I\left[D_{i1}+\alpha D_{i2} = H_{j1} + \alpha H_{j2}\right]
+  \end{equation}
+  Before the combination score is calculated using Eq \ref{eq:4}, the marker values are normalized, i.e., scaled to lie between 0 and 1. In addition, it is noted that the estimate obtained by maximizing the empirical AUC can be considered a particular case of the maximum rank correlation estimator, for which a general asymptotic distribution theory has been developed. Readers are referred to Pepe (2003, Chapters 4–6) for a review of the ROC curve approach and more details \citep{pepe2003statistical}.
+
+  \item \textit{Min-Max method:} The Pepe \& Thompson method is straightforward when there are two markers, but computationally challenging when more than two markers are to be combined. To overcome this computational complexity, Liu et al.
\citep{liu2011min} proposed a non-parametric approach that linearly combines the minimum and maximum values of the observed markers of each subject. This approach, which does not rely on the normality assumption (i.e., it is distribution-free), is known as the Min-Max method and may provide higher sensitivity than any single marker. The objective function of the Min-Max method is as follows:
+  \begin{equation}
+    \text{maximize} \; U(\alpha)= \frac{1}{n_1 n_2} \sum_{i=1}^{n_1} \sum_{j=1}^{n_2} I[D_{i,max}+\alpha D_{i,min}> H_{j,max}+\alpha H_{j,min}]
+  \end{equation}
+  \begin{equation}
+    c= x_{i,max}+\alpha x_{i,min}
+  \end{equation}
+  where $x_{i,max}=\max(x_{i1},x_{i2})$ and $x_{i,min}=\min(x_{i1},x_{i2})$.\\
+
+  The Min-Max method aims to combine repeated measurements of a single marker over time, or multiple markers measured in the same unit. While the Min-Max method is relatively simple to implement, it has some limitations. For example, markers may have different units of measurement, so standardization may be needed to ensure uniformity during the combination process. Furthermore, it is unclear whether all available information is fully utilized when combining markers, as this method incorporates only the markers' minimum and maximum values into the model \citep{kang2016linear}.
+
+  \item \textit{Su \& Liu's method:} Su and Liu examined the combination score under the assumption of two multivariate normal distributions whose covariance matrices are either proportional or disproportionate \citep{su1993linear}. Multivariate normal distributions with different covariances were first utilized in classification problems \citep{anderson1962classification}. Su and Liu then developed a linear combination method by extending the idea of using multivariate distributions to the AUC, showing that the coefficients that maximize the AUC are Fisher's discriminant coefficients.
Assume that $D\sim N(\mu_D, \Sigma_D)$ and $H\sim N(\mu_H, \Sigma_H)$ represent the multivariate normal distributions of the diseased and non-diseased groups, respectively. Fisher's coefficients are then:
+  \begin{equation}
+    (\alpha, \beta) = (\Sigma_{D} + \Sigma_{H})^{-1} \mu \label{eq:alpha_beta}
+  \end{equation}
+  where $\mu=\mu_D-\mu_H$. The combination score in this case is:
+  \begin{equation}
+    c= \alpha x_{i1}+ \beta x_{i2}
+    \label{eq:9}
+  \end{equation}
+  \item \textit{The Minimax method:} The Minimax method is an extension of Su \& Liu's method \citep{sameera2016binary}. Suppose that $D\sim N(\mu_D, \Sigma_D)$ represents the diseased group and $H\sim N(\mu_H, \Sigma_H)$ represents the non-diseased group. Then Fisher's coefficients are as follows:
+  \begin{equation}
+    (\alpha, \beta) = \left[t\Sigma_{D} + (1-t)\Sigma_{H}\right]^{-1} (\mu_D - \mu_H) \label{eq:alpha_beta_expression}
+  \end{equation}
+
+  Given these coefficients, the combination score is calculated using Eq \ref{eq:9}. In this formula, \textit{t} is a constant ranging from 0 to 1, which can be tuned by maximizing the AUC.
+
+  \item \textit{Todor \& Saplacan's method:} Todor and Saplacan's method uses the sine and cosine trigonometric functions to calculate the combination score \citep{todor2014tools}. The combination score is calculated using the $\theta \in[-\frac{\pi}{2},\frac{\pi}{2}]$ that maximizes the AUC within this interval. The formula for the combination score is given as follows:
+  \begin{equation}
+    c= \sin{(\theta)}x_{i1}+\cos{(\theta)}x_{i2}
+  \end{equation}
+\end{itemize}
+
+\subsubsection{Non-linear combination methods}
+In addition to the linear combination methods, the \href{https://cran.r-project.org/web/packages/dtComb/index.html}{\texttt{dtComb}} package includes seven non-linear approaches, which are discussed in this subsection.
In this subsection, we use the following notation:
+$x_{ij}$: the value of the \textit{j}th marker for the \textit{i}th individual, $i=1,2,\ldots,n$ and $j=1,2$; \textit{d}: the degree of the polynomial regressions and splines, $d = 1,2,\ldots,p$.
+
+\begin{itemize}
+ \item \textit{Logistic Regression with Polynomial Feature Space:} This approach extends the logistic regression model by adding extra predictors created by raising the original predictor variables to a certain power. Including polynomial terms in the feature space enables the model to capture non-linear relationships in the data \citep{james2013introduction}. The combination score is calculated as follows:
+\begin{equation}
+c_i=\frac{\exp\left(\beta_0 + \beta_1 x_{ij} + \beta_2 x_{ij}^2+\ldots+\beta_p x_{ij}^p\right)}{1+\exp\left(\beta_0 + \beta_1 x_{ij} + \beta_2 x_{ij}^2+\ldots+\beta_p x_{ij}^p\right)}
+\end{equation}
+ where $c_i$ is the combination score for the \textit{i}th individual and represents the posterior probability.
+
+ \item \textit{Ridge Regression with Polynomial Feature Space:} This method combines Ridge regression with an expanded feature space created by adding polynomial terms to the original predictor variables. Ridge regression is a widely used shrinkage method for handling multicollinearity between variables, which can be an issue for least squares regression. It estimates the coefficients of these correlated variables by minimizing the residual sum of squares (RSS) while adding a regularization term to prevent overfitting. The objective function is based on the L2 norm of the coefficient vector (Eq \ref{eq:beta_hat_r}).
The Ridge estimate is defined as follows:
+\begin{equation}
+\hat{\beta}^R = \text{argmin}_{\beta} \: \text{RSS} + \lambda \sum_{j=1}^{2} \sum_{d=1}^{p} \left(\beta_j^d\right)^2 \label{eq:beta_hat_r}
+\end{equation}
+
+where
+\begin{equation}
+RSS=\sum_{i=1}^{n}\left(y_i-\beta_0-\sum_{j=1}^{2}\sum_{d=1}^{p} \beta_j^d x_{ij}^d\right)^2
+\end{equation}
+ Here $\hat{\beta}^R$ denotes the estimated coefficients of the Ridge regression, and the second term is the penalty term, where $\lambda \geq 0$ is a shrinkage parameter controlling the amount of shrinkage applied to the regression coefficients. Cross-validation is used to select the shrinkage parameter. We used the \href{https://cran.r-project.org/web/packages/glmnet/index.html}{\texttt{glmnet}} package \citep{friedman2010regularization} to implement Ridge regression for combining the diagnostic tests.
+
+ \item \textit{Lasso Regression with Polynomial Feature Space:} Similar to Ridge regression, Lasso regression is a shrinkage method that adds a penalty term to the objective function of least squares regression. The objective function in this case is based on the L1 norm of the coefficient vector, which leads to sparsity in the model: some of the regression coefficients are exactly zero when the tuning parameter $\lambda$ is sufficiently large. This property allows the model to automatically identify and remove less relevant variables and reduces the model's complexity. The Lasso estimates are defined as follows:
+
+ \begin{equation}
+\hat{\beta}^L = \text{argmin}_{\beta} \: \text{RSS} + \lambda \sum_{j=1}^{2} \sum_{d=1}^{p} | \beta_j^d | \label{eq:beta_hat_l}
+\end{equation}
+
+ To implement the Lasso regression for combining the diagnostic tests, we used the \href{https://cran.r-project.org/web/packages/glmnet/index.html}{\texttt{glmnet}} package \citep{friedman2010regularization}.
+
+ \item \textit{Elastic-Net Regression with Polynomial Feature Space:} Elastic-Net regression combines the Lasso (L1) and Ridge (L2) penalties to address some of the limitations of each technique. The balance between the two penalties is controlled by two hyperparameters, $\alpha\in[0,1]$ and $\lambda$, which adjust the trade-off between the L1 and L2 regularization terms \citep{james2013introduction}. For the implementation of the method, the \href{https://cran.r-project.org/web/packages/glmnet/index.html}{\texttt{glmnet}} package is used \citep{friedman2010regularization}.
+ \item \textit{Splines:} Another non-linear combination technique frequently applied in diagnostic testing is splines. Splines are a versatile mathematical and computational technique with a wide range of applications: they are piecewise functions that make it possible to interpolate or approximate data points. There are several types of splines; cubic splines, for instance, create smooth curves by approximating a set of control points with cubic polynomial functions. When implementing splines, two critical parameters come into play: the degrees of freedom and the degrees of the fitted polynomials. These user-adjustable parameters influence the flexibility and smoothness of the resulting curve and are critical for controlling the behavior of the splines. We used the \href{https://rdocumentation.org/packages/splines/versions/3.6.2}{\texttt{splines}} package in the R programming language to implement splines.
+
+ \item \textit{Generalized Additive Models with Smoothing Splines and Generalized Additive Models with Natural Cubic Splines:} Regression models are of great interest in many fields for understanding the importance of different inputs. Although regression is widely used, traditional linear models often fail in practice because effects may not be linear.
Generalized additive models (GAMs) were introduced to identify and characterize such non-linear regression relationships \citep{james2013introduction}. Smoothing splines and natural cubic splines are two standard methods used within GAMs to model non-linear relationships. To implement these two methods, we used the \href{https://cran.r-project.org/web/packages/gam/index.html}{\texttt{gam}} package in R \citep{Trevor2015gam}. GAMs with smoothing splines are the more data-driven and adaptive approach: smoothing splines can automatically capture non-linear relationships without specifying in advance the number of knots (the points where two or more polynomial segments are joined to create a piecewise-defined curve or surface) or the shape of the spline. Natural cubic splines, on the other hand, are preferred when we have prior knowledge or assumptions about the shape of the non-linear relationship; they are more interpretable and can be controlled through the number of knots \citep{elhakeem2022using}.
+\end{itemize}
+
+\subsubsection{Mathematical Operators}
+This section covers four arithmetic operators, eight distance measures, and the exponential approach. In addition, unlike the other approaches, users can here apply logarithmic, exponential, and trigonometric (sine and cosine) transformations to the markers. Let $x_{ij}$ represent the value of the \textit{j}th marker for the \textit{i}th observation, with $i=1,2,\ldots,n$ and $j=1,2$, and let $c_i$ denote the resulting combination score for the \textit{i}th individual.
+\begin{itemize}
+ \item \textit{Arithmetic Operators:} Arithmetic operators such as addition, multiplication, division, and subtraction can also be used to combine diagnostic tests to optimize the AUC, a measure of diagnostic test performance. By combining markers in specific ways, these operations can potentially increase the AUC and improve the efficacy of the diagnostic tests.
For example, if high values in one test indicate risk, while low values in the other indicate risk, subtraction or division can effectively combine these markers. + \item \textit{Distance Measurements:} While combining markers with mathematical operators, a distance measure is used to evaluate the relationships or similarities between marker values. It's worth noting that, as far as we know, no studies have integrated various distinct distance measures with arithmetic operators in this context. Euclidean distance is the most commonly used distance measure, which may not accurately reflect the relationship between markers. Therefore, we incorporated a variety of distances into the package we developed. These distances are given as follows \citep{minaev2018distance,pandit2011comparative,cha2007comprehensive}:\\ + +\textit{Euclidean:} +\begin{equation} +c = \sqrt{(x_{i1} - 0)^2 + (x_{i2} - 0)^2} \label{eq:euclidean_distance} +\end{equation} +\\ +\textit{Manhattan:} +\begin{equation} +c = |x_{i1} - 0| + |x_{i2} - 0| \label{eq:manhattan_distance} +\end{equation} +\\ +\textit{Chebyshev:} +\begin{equation} +c = \max\{|x_{i1} - 0|, |x_{i2} - 0|\} \label{eq:max_absolute} +\end{equation} +\\ +\textit{Kulczynskid:} +\begin{equation} +c = \frac{|x_{i1} - 0| + |x_{i2} - 0|}{\min\{x_{i1}, x_{i2}\}} \label{eq:custom_expression} +\end{equation} +\\ +\textit{Lorentzian:} +\begin{equation} +c = \ln(1 + |x_{i1} - 0|) + \ln(1 + |x_{i2} - 0|) \label{eq:ln_expression} +\end{equation} +\\ + \textit{Taneja:} +\begin{equation} +c = z_1 \left( \log \left( \frac{z_1}{\sqrt{x_{i1} \epsilon}} \right) \right) + z_2 \left( \log \left( \frac{z_2}{\sqrt{x_{i2} \epsilon}} \right) \right) \label{eq:log_expression} +\end{equation} +\\ +where $z_1 = \frac{x_{i1} - 0}{2}, \quad z_2 = \frac{x_{i2} - 0}{2}$ \\ + +\textit{Kumar-Johnson:} +\begin{equation} +c = \frac{{(x_{i1}^2 - 0)^2}}{{2(x_{i1} \epsilon)^{\frac{3}{2}}}} + \frac{{(x_{i2}^2 - 0)^2}}{{2(x_{i2} \epsilon)^{\frac{3}{2}}}}, \quad 
\text{where } \epsilon \text{ is a small positive constant} \label{eq:kumar_johnson}
+\end{equation}
+\\
+\textit{Avg:}
+\begin{equation}
+c = \frac{{|x_{i1} - 0| + |x_{i2} - 0| + \max\{|x_{i1} - 0|, |x_{i2} - 0|\}}}{2} \label{eq:avg_distance}
+\end{equation}\\
+
+ \item \textit{Exponential approach:} The exponential approach is another technique for exploring different relationships between the diagnostic measurements. Taking one of the two diagnostic tests as the base and the other as the exponent yields the scores $x_{i1}^{(x_{i2})}$ and $x_{i2}^{(x_{i1})}$. The specific goals or hypotheses of the analysis, as well as the characteristics of the diagnostic tests, determine which form to use.
+\end{itemize}
+\subsubsection{Machine-Learning algorithms}
+Machine-learning (ML) algorithms have been increasingly implemented in various fields, including medicine, to combine diagnostic tests. Integrating diagnostic tests through ML can lead to more accurate, timely, and personalized diagnoses, which is particularly valuable in complex medical cases where multiple factors must be considered. In this study, we aimed to incorporate almost all applicable ML algorithms into the package we developed, taking advantage of the \href{https://cran.r-project.org/web/packages/caret/index.html}{\texttt{caret}} package in R \citep{kuhn2008building} to achieve this goal. The \texttt{caret} package includes 190 classification algorithms that can be used to train models and make predictions. We focused on models that take numerical inputs and produce binary responses, which resulted in 113 models being implemented in our study. We then classified these 113 models into five classes, following the same idea given in \citep{zararsiz2016statistical}: (i) discriminant classifiers, (ii) decision tree models, (iii) kernel-based classifiers, (iv) ensemble classifiers, and (v) others.
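Returning briefly to the distance measures of the previous subsection, their arithmetic is straightforward. The following minimal Python sketch (purely illustrative and independent of the \texttt{dtComb} implementation; the marker values are made up) computes the Euclidean, Manhattan, and Chebyshev combination scores for a single subject:

```python
import math

# Illustrative sketch of three distance-based combination scores for one
# subject with markers x1 and x2; not the dtComb package code.

def euclidean(x1, x2):
    """c = sqrt((x1 - 0)^2 + (x2 - 0)^2), the distance to the origin."""
    return math.sqrt((x1 - 0) ** 2 + (x2 - 0) ** 2)

def manhattan(x1, x2):
    """c = |x1 - 0| + |x2 - 0|."""
    return abs(x1 - 0) + abs(x2 - 0)

def chebyshev(x1, x2):
    """c = max(|x1 - 0|, |x2 - 0|)."""
    return max(abs(x1 - 0), abs(x2 - 0))

x1, x2 = 3.0, 4.0          # made-up marker values for one subject
print(euclidean(x1, x2))   # 5.0
print(manhattan(x1, x2))   # 7.0
print(chebyshev(x1, x2))   # 4.0
```

As in the formulas above, each distance is taken with respect to the origin, so the score grows with the magnitude of both markers.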
As in the \href{https://cran.r-project.org/web/packages/caret/index.html}{\texttt{caret}} package, \code{mlComb()} sets up a grid of tuning parameters for a number of classification routines, fits each model, and calculates a performance measure based on resampling. After model fitting, it uses the \code{predict()} function to calculate the probability of the "event" occurring for each observation. Finally, it performs ROC analysis based on the probabilities obtained in the prediction step.
+
+\subsection{Standardization}
+Standardization is the process of transforming data onto a common scale to facilitate meaningful comparisons and statistical inference. Many statistical techniques employ standardization to improve the interpretability and comparability of data. We implemented five standardization methods that can be applied to each marker, with the following formulas:
+
+\begin{itemize}
+ \item Z-score: \( \frac{{x - \text{mean}(x)}}{{\text{sd}(x)}} \)
+ \item T-score: \( \left( \frac{{x - \text{mean}(x)}}{{\text{sd}(x)}} \times 10 \right) + 50 \)
+ \item min\_max\_scale: \( \frac{{x - \min(x)}}{{\max(x) - \min(x)}} \)
+ \item scale\_mean\_to\_one: \( \frac{x}{{\text{mean}(x)}} \)
+ \item scale\_sd\_to\_one: \( \frac{x}{{\text{sd}(x)}} \)
+\end{itemize}
+
+\subsection{Model building}
+After specifying a combination method from the \href{https://cran.r-project.org/web/packages/dtComb/index.html}{\texttt{dtComb}} package, users can build models and optimize model parameters using the functions \code{mlComb()}, \code{linComb()}, \code{nonlinComb()}, and \code{mathComb()}, depending on the selected method. For the linear and non-linear approaches (i.e., \code{linComb()}, \code{nonlinComb()}), parameter optimization is performed using n-fold cross-validation, repeated n-fold cross-validation, or bootstrapping.
Additionally, for the machine-learning approach (i.e., \code{mlComb()}), all of the resampling methods from the \href{https://cran.r-project.org/web/packages/caret/index.html}{\texttt{caret}} package are available for optimizing the model parameters. The total number of parameters being optimized varies across models, and these parameters are fine-tuned to maximize the AUC. The returned object stores the input data, the preprocessed and transformed data, the trained model, and the resampling results.
+\subsection{Evaluation of model performances}
+
+A confusion matrix, as shown in Table \ref{tab:confusion_matrix}, is used to evaluate the performance of a classification model: it shows the numbers of correct and incorrect predictions. It compares the predicted and actual class labels, with the diagonal elements representing correct predictions and the off-diagonal elements representing incorrect predictions. The \href{https://cran.r-project.org/web/packages/dtComb/index.html}{\texttt{dtComb}} package uses the \href{https://cran.r-project.org/web/packages/OptimalCutpoints/index.html}{\texttt{OptimalCutpoints}} package \citep{yin2014optimal} to generate the confusion matrix and then the \href{https://cran.r-project.org/web/packages/epiR/index.html}{\texttt{epiR}} package \citep{stevenson2017epir}, which provides a range of performance metrics, to evaluate the performances.
+
+\begin{table}[h]
+\centering
+\caption{Confusion Matrix}
+\label{tab:confusion_matrix}
+\begin{tabular}{llll}
+\hline
+\multirow{2}{*}{Predicted labels} & \multicolumn{2}{l}{Actual class labels} & Total \\ \cline{2-4}
+ & Positive & Negative & \\ \hline
+Positive & TP & FP & TP+FP \\
+Negative & FN & TN & FN+TN \\
+Total & TP+FN & FP+TN & n \\ \hline
+\end{tabular}
+
+{\centering\tiny TP: True Positive, TN: True Negative, FP: False Positive, FN: False Negative, n: Sample size}
+\end{table}
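The metrics derived from a confusion matrix are simple functions of the four cell counts. As an illustration (independent of the \texttt{epiR} implementation), the following Python sketch reproduces a few of them using the confusion-matrix counts reported for the splines model in the case study later in the article (TP=66, FP=13, FN=11, TN=68):

```python
# Illustrative computation of selected confusion-matrix metrics; the counts
# come from the splines example reported later in the article, and the
# formulas mirror the standard definitions (not the epiR source code).
TP, FP, FN, TN = 66, 13, 11, 68
n = TP + FP + FN + TN

acc = (TP + TN) / n              # accuracy
se = TP / (TP + FN)              # sensitivity (recall)
sp = TN / (TN + FP)              # specificity
ppv = TP / (TP + FP)             # positive predictive value
npv = TN / (TN + FN)             # negative predictive value
# Cohen's kappa via the expected agreement P_e
pe = ((TP + FN) * (TP + FP) + (FP + TN) * (FN + TN)) / n ** 2
kappa = (acc - pe) / (1 - pe)

print(round(acc, 3), round(se, 3), round(sp, 3), round(kappa, 3))
# 0.848 0.857 0.84 0.696
```

The accuracy of 0.848 matches the "correctly classified proportion" of 0.85 reported for the splines model in the Results section.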
Various performance metrics are available in the \href{https://cran.r-project.org/web/packages/dtComb/index.html}{\texttt{dtComb}} package: the accuracy rate (ACC), the Kappa statistic ($\kappa$), sensitivity (SE), specificity (SP), apparent and true prevalence (AP, TP), positive and negative predictive values (PPV, NPV), positive and negative likelihood ratios (PLR, NLR), the proportion of true outcome-negative subjects that test positive (False T+ proportion for true D-), the proportion of true outcome-positive subjects that test negative (False T- proportion for true D+), the proportion of test-positive subjects that are outcome negative (False T+ proportion for T+), and the proportion of test-negative subjects that are outcome positive (False T- proportion for T-). These metrics are summarized in Table \ref{tab:performance_metrics}.
+
+\begin{table}[htbp]
+ \centering \small
+ \caption{Performance Metrics and Formulas}
+ \label{tab:performance_metrics}
+ \begin{tabular}{ll}
+ \hline
+ \textbf{Performance Metric} & \textbf{Formula} \\
+ \hline
+ Accuracy & $\text{ACC} = \frac{{\text{TP} + \text{TN}}}{n}$ \\
+ Kappa & $\kappa = \frac{{\text{ACC} - P_e}}{{1 - P_e}}$ \\
+ & $P_e = \frac{{(\text{TP} + \text{FN})(\text{TP} + \text{FP}) + (\text{FP} + \text{TN})(\text{FN} + \text{TN})}}{{n^2}}$ \\
+ Sensitivity (Recall) & $\text{SE} = \frac{{\text{TP}}}{{\text{TP} + \text{FN}}}$ \\
+ Specificity & $\text{SP} = \frac{{\text{TN}}}{{\text{TN} + \text{FP}}}$ \\
+ Apparent Prevalence & $\text{AP} = \frac{{\text{TP} + \text{FP}}}{{n}}$ \\
+ True Prevalence & $\text{TP} = \frac{{\text{AP} + \text{SP} - 1}}{{\text{SE} + \text{SP} - 1}}$ \\
+ Positive Predictive Value (Precision) & $\text{PPV} = \frac{{\text{TP}}}{{\text{TP} + \text{FP}}}$ \\
+ Negative Predictive Value & $\text{NPV} = \frac{{\text{TN}}}{{\text{TN} + \text{FN}}}$ \\
+ Positive Likelihood Ratio & $\text{PLR} = \frac{{\text{SE}}}{{1 - \text{SP}}}$ \\
+ Negative Likelihood Ratio & $\text{NLR} = \frac{{1
- \text{SE}}}{{\text{SP}}}$ \\
+ The Proportion of True Outcome Negative Subjects That Test Positive & $\frac{{\text{FP}}}{{\text{FP} + \text{TN}}}$ \\
+ The Proportion of True Outcome Positive Subjects That Test Negative & $\frac{{\text{FN}}}{{\text{TP} + \text{FN}}}$ \\
+ The Proportion of Test Positive Subjects That Are Outcome Negative & $\frac{{\text{FP}}}{{\text{TP} + \text{FP}}}$ \\
+ The Proportion of Test Negative Subjects That Are Outcome Positive & $\frac{{\text{FN}}}{{\text{FN} + \text{TN}}}$ \\
+ \hline
+ \end{tabular}
+\end{table}
+
+\subsection{Prediction of the test cases}
+The class labels of the observations in the test set are predicted with the model parameters derived from the training phase. It is critical to emphasize that the same preprocessing steps employed during the training phase, such as normalization, transformation, or standardization, are also applied to the test set. More specifically, if the training set underwent Z-standardization, the test set is standardized using the mean and standard deviation derived from the training set. The class labels of the test set are then estimated using the trained model's parameters and the cut-off value established during the training phase.
+
+\subsection{Technical details and the structure of dtComb}
+The \href{https://cran.r-project.org/web/packages/dtComb/index.html}{\texttt{dtComb}} package is implemented in the R programming language (\url{https://www.r-project.org/}), version 4.2.0. Package development was facilitated with \href{https://cran.r-project.org/web/packages/devtools/index.html}{\texttt{devtools}} \citep{wickham2016devtools}, and the package was documented with \href{https://cran.r-project.org/web/packages/roxygen2/index.html}{\texttt{roxygen2}} \citep{wickham2013roxygen2}. Package testing was performed using 271 unit tests \citep{wickham2011testthat}.
Double programming was performed using Python (\url{https://www.python.org/}) to validate the implemented functions \citep{shiralkarprogramming}.\\
+
+\newpage
+To combine diagnostic tests, the \href{https://cran.r-project.org/web/packages/dtComb/index.html}{\texttt{dtComb}} package offers eight linear combination methods, seven non-linear combination methods, 14 mathematical operators (arithmetic operators, eight distance measures, and the exponential approach), and 113 machine-learning algorithms from the \href{https://cran.r-project.org/web/packages/caret/index.html}{\texttt{caret}} package \citep{kuhn2008building}. These features are summarized in Table \ref{tab:dtComb_features}.
+%Table 3
+
+\begin{table}[htbp]
+ \centering \small
+ \caption{Features of dtComb}
+ \label{tab:dtComb_features}
+ \begin{tabular}{l p{10cm}}
+ \hline
+ \textbf{Modules (Tab Panels)} & \textbf{Features} \\
+\hline
+ \multirow{4}{*}{Combination Methods} &
+ \begin{itemize}
+ \item Linear Combination Approach (8 different methods)
+ \item Non-linear Combination Approach (7 different methods)
+ \item Mathematical Operators (14 different methods)
+ \item Machine-Learning Algorithms (113 different methods) \citep{kuhn2008building}
+ \end{itemize} \\
+
+ \multirow{2}{*}{Preprocessing} &
+ \begin{itemize}
+ \item Five standardization methods applicable to the linear, non-linear, and mathematical methods
+ \item 16 preprocessing methods applicable to ML \citep{kuhn2008building}
+ \end{itemize} \\
+ \multirow{2}{*}{Resampling} &
+ \begin{itemize}
+ \item Three different methods for the linear and non-linear combination methods
+ \begin{itemize}
+ \item Bootstrapping
+ \item Cross-validation
+ \item Repeated cross-validation
+ \end{itemize}
+ \item 12 different resampling methods for ML \citep{kuhn2008building}
+ \end{itemize} \\
+ {Cutpoints} &
+ \begin{itemize}
+ \item 34 different methods for optimum cutpoints \citep{yin2014optimal}
+ \end{itemize} \\
+ \hline
+
\end{tabular} +\end{table} +\section{Results} + +Table \ref{tab:exist_pck} summarizes the existing packages and programs, including \href{https://cran.r-project.org/web/packages/dtComb/index.html}{\texttt{dtComb}}, along with the number of combination methods included in each package. While \pkg{mROC} offers only one linear combination method, \href{https://github.com/wbaopaul/MaxmzpAUC-R}{\texttt{maxmzpAUC}} and \href{https://cran.r-project.org/web/packages/movieROC/index.html}{\texttt{movieROC}} provide five linear combination techniques each, and \href{https://cran.r-project.org/web/packages/SLModels/index.html}{\texttt{SLModels}} includes four. However, these existing packages primarily focus on linear combination approaches. In contrast, \href{https://cran.r-project.org/web/packages/dtComb/index.html}{\texttt{dtComb}} goes beyond these limitations by integrating not only linear methods but also non-linear approaches, machine learning algorithms, and mathematical operators. + +\begin{table}[htbp] + \centering \small + \caption{Comparison of dtComb vs. existing packages and programs} + \label{tab:exist_pck} + \begin{tabular}{@{}lcccc@{}} + \toprule + \textbf{Packages\&Programs} & \textbf{Linear Comb.} & \textbf{Non-linear Comb.} & \textbf{Math. 
Operators} & \textbf{ML algorithms} \\
+ \midrule
+ \textbf{mROC} \citep{kramar2001mroc} & 1 & - & - & - \\
+ \href{https://github.com/wbaopaul/MaxmzpAUC-R}{\texttt{maxmzpAUC}} \citep{yu2015two} & 5 & - & - & - \\
+ \href{https://cran.r-project.org/web/packages/movieROC/index.html}{\texttt{movieROC}} \citep{perez2021visualizing} & 5 & - & - & - \\
+ \href{https://cran.r-project.org/web/packages/SLModels/index.html}{\texttt{SLModels}} \citep{aznar-gimeno2023comparing} & 4 & - & - & - \\
+ \href{https://cran.r-project.org/web/packages/dtComb/index.html}{\texttt{dtComb}} & 8 & 7 & 14 & 113 \\
+ \bottomrule
+ \end{tabular}
+\end{table}
+\subsection{Dataset}
+To demonstrate the functionality of the \href{https://cran.r-project.org/web/packages/dtComb/index.html}{\texttt{dtComb}} package, we conduct a case study using four different combination methods. The data used in this study were obtained from patients who presented to the Department of General Surgery, Erciyes University Faculty of Medicine, with complaints of abdominal pain \citep{zararsiz2016statistical,akyildiz2010value}. The dataset comprises the D-dimer levels (\textit{D\_dimer}) and leukocyte counts (\textit{log\_leukocyte}) of 225 patients, divided into two groups (\textit{Group}): 110 patients who required an immediate laparotomy (\textit{needed}) and 115 patients who did not (\textit{not\_needed}). After evaluation of conventional treatment, patients who underwent surgery, with the need confirmed by their postoperative pathologies, were placed in the first group, whereas those with a negative laparotomy result were assigned to the second group. All analyses were performed following the workflow given in Fig. \ref{figure:workflow}. First of all, the \href{https://cran.r-project.org/web/packages/dtComb/index.html}{\texttt{dtComb}} package should be loaded in order to use the related functions.
+
+\begin{figure}[H]
+ \centering
+ \includegraphics[width=0.81\textwidth]{Figure/Figure_1.pdf}
+ \caption{\textbf{Combination steps of two diagnostic tests.} The figure presents a schematic representation of the sequential steps involved in combining two diagnostic tests using a combination method.}
+ \label{figure:workflow}
+\end{figure}
+
+\begin{example}
+# load dtComb package
+library(dtComb)
+\end{example}
+Similarly, the laparotomy data can be loaded from the R database using the following code:
+\begin{example}
+
+# load laparotomy data
+data(laparotomy)
+\end{example}
+
+\subsection{Implementation of the dtComb package}
+To demonstrate the applicability of the \href{https://cran.r-project.org/web/packages/dtComb/index.html}{\texttt{dtComb}} package, we implement an arbitrarily chosen method from each of the linear, non-linear, mathematical-operator, and machine-learning approaches and compare their performances. The selected methods are Pepe, Cai \& Langton (PCL) for linear combination, splines for non-linear combination, addition for the mathematical operators, and a support vector machine (SVM) for machine learning. Before applying the methods, we split the data into two parts: a training set comprising 70\% of the data and a test set comprising the remaining 30\%.
+
+\begin{example}
+# Splitting the data set into train and test (70%-30%)
+set.seed(2128)
+inTrain <- caret::createDataPartition(laparotomy$group, p = 0.7, list = FALSE)
+trainData <- laparotomy[inTrain, ]
+colnames(trainData) <- c("Group", "D_dimer", "log_leukocyte")
+testData <- laparotomy[-inTrain, -1]
+
+# define marker and status for combination function
+markers <- trainData[, -1]
+status <- factor(trainData$Group, levels = c("not_needed", "needed"))
+\end{example}
+
+The model is trained on \code{trainData}, and the resampling scheme used in the training phase is five-fold cross-validation repeated ten times. \code{direction = "<"} is chosen, as higher marker values indicate higher risk.
The Youden index was chosen as the cut-off method. Note that the markers are not standardized, and the results are presented with 95\% confidence intervals (CIs). The four main combination functions are run with the selected methods as follows.
+\begin{example}
+
+# PCL method
+fit.lin.PCL <- linComb(markers = markers, status = status, event = "needed",
+ method = "PCL", resample = "repeatedcv", nfolds = 5,
+ nrepeats = 10, direction = "<", cutoff.method = "Youden")
+
+# splines method (degree = 3 and degrees of freedom = 3)
+fit.nonlin.splines <- nonlinComb(markers = markers, status = status, event = "needed",
+ method = "splines", resample = "repeatedcv", nfolds = 5,
+ nrepeats = 10, cutoff.method = "Youden", direction = "<",
+ df1 = 3, df2 = 3)
+
+# add operator
+fit.add <- mathComb(markers = markers, status = status, event = "needed",
+ method = "add", direction = "<", cutoff.method = "Youden")
+
+# SVM
+fit.svm <- mlComb(markers = markers, status = status, event = "needed", method = "svmLinear",
+ resample = "repeatedcv", nfolds = 5, nrepeats = 10, direction = "<",
+ cutoff.method = "Youden")
+
+\end{example}
+
+Various measures were considered to compare model performances, including the AUC, ACC, SEN, SPE, PPV, and NPV. AUC statistics with 95\% CIs were calculated for each marker and method: 0.816 (0.751–0.880) for D-dimer, 0.802 (0.728–0.877) for log(leukocyte), 0.888 (0.825–0.930) for PCL, 0.911 (0.868–0.954) for Splines, 0.877 (0.824–0.929) for Addition, and 0.875 (0.821–0.930) for the support vector machine (SVM). The results revealed that the predictive performances of the markers and of the marker combinations are significantly better than random chance in determining the need for laparotomy ($p<0.05$). The highest sensitivity and NPV were observed with the Addition method, while the highest specificity and PPV were observed with the Splines method.
According to the overall AUC and accuracies, the combined approach fitted with the Splines method performed better than the other methods (Fig. \ref{figure:radar}). Therefore, the Splines method will be used in the subsequent analysis of the findings. + +\begin{figure}[H] + \centering + \includegraphics[width=1\textwidth]{Figure/Figure_4.pdf} + \caption{\textbf{Radar plots of trained models and performance measures of two markers.} Radar plots summarize the diagnostic performances of two markers and various combination methods in the training dataset. These plots illustrate the performance metrics such as AUC, ACC, SEN, SPE, PPV, and NPV measurements. In these plots, the width of the polygon formed by connecting each point indicates the model's performance in terms of AUC, ACC, SEN, SPE, PPV, and NPV metrics. It can be observed that the polygon associated with the Splines method occupies the most extensive area, which means that the Splines method performed better than the other methods.} + \label{figure:radar} +\end{figure} +For the AUC of markers and the spline model: +\begin{example} +fit.nonlin.splines$AUC_table + AUC SE.AUC LowerLimit UpperLimit z p.value +D_dimer 0.8156966 0.03303310 0.7509530 0.8804403 9.556979 1.212446e-21 +log_leukocyte 0.8022286 0.03791768 0.7279113 0.8765459 7.970652 1.578391e-15 +Combination 0.9111752 0.02189588 0.8682601 0.9540904 18.778659 1.128958e-78 +\end{example} +Here: \\ +\code{SE}: Standard Error.\\ + +The area under ROC curves for D-dimer levels and leukocyte counts on the logarithmic scale and combination score were 0.816, 0.802, and 0.911, respectively. The ROC curves generated with the combination score from the splines model, D-dimer levels, and leukocyte count markers are also given in Fig. \ref{figure:roc}, showing that the combination score has the highest AUC. 
The Splines method significantly improved the AUC, by 9.5 and 10.9 percentage points relative to the D-dimer level and the leukocyte count, respectively.
+\begin{figure}[H]
+ \centering
+ \includegraphics[width=0.7\textwidth]{Figure/Figure_2.pdf}
+ \caption{\textbf{ROC curves.} ROC curves for the combined diagnostic tests, with sensitivity displayed on the y-axis and 1-specificity displayed on the x-axis. As can be observed, the combination score produced the highest AUC value, indicating that the combined strategy performs best overall.}
+ \label{figure:roc}
+\end{figure}
+
+To see the results of the pairwise comparisons between the combination score and the individual markers:
+\begin{example}
+fit.nonlin.splines$MultComp_table
+
+Marker1 (A) Marker2 (B) AUC (A) AUC (B) |A-B| SE(|A-B|) z p-value
+1 Combination D_dimer 0.9079686 0.8156966 0.09227193 0.02223904 4.1490971 3.337893e-05
+2 Combination log_leukocyte 0.9079686 0.8022286 0.10573994 0.03466544 3.0502981 2.286144e-03
+3 D_dimer log_leukocyte 0.8156966 0.8022286 0.01346801 0.04847560 0.2778308 7.811423e-01
+\end{example}
+
+After controlling the Type I error rate with a Bonferroni correction, the comparisons of the combination score with the individual markers remained significant ($p<0.05$).\\
+
+To display the diagnostic test results and performance measures for the non-linear combination approach, the following code can be used:
+
+\begin{example}
+fit.nonlin.splines$DiagStatCombined
+ Outcome
+ Outcome - Total
+Test + 66 13 79
+Test - 11 68 79
+Total 77 81 158
+
+Point estimates and 95% CIs:
+--------------------------------------------------------------
+Apparent prevalence * 0.50 (0.42, 0.58)
+True prevalence * 0.49 (0.41, 0.57)
+Sensitivity * 0.86 (0.76, 0.93)
+Specificity * 0.84 (0.74, 0.91)
+Positive predictive value * 0.84 (0.74, 0.91)
+Negative predictive value * 0.86 (0.76, 0.93)
+Positive likelihood ratio 5.34 (3.22, 8.86)
+Negative likelihood ratio 0.17 (0.10, 0.30)
+False T+ proportion for true D- * 0.16 (0.09, 0.26)
+False T-
proportion for true D+ * 0.14 (0.07, 0.24) +False T+ proportion for T+ * 0.16 (0.09, 0.26) +False T- proportion for T- * 0.14 (0.07, 0.24) +Correctly classified proportion * 0.85 (0.78, 0.90) +-------------------------------------------------------------- +* Exact CIs +\end{example} + +Furthermore, if the diagnostic test results and performance measures of the combination score are compared with the results of the single markers, it can be observed that the TN value of the combination score is higher than that of the single markers, and the combination of markers has higher specificity and positive-negative predictive value than the log-transformed leukocyte counts and D-dimer level (Table \ref{tab:diagnostic_measures}). Conversely, D-dimer has a higher sensitivity than the others. Optimal cut-off values for both markers and the combined approach are also given in this table. + +\begin{table}[htbp] + \centering \small + \caption{Statistical diagnostic measures with 95\% confidence intervals for each marker and the combination score} + \label{tab:diagnostic_measures} + \begin{tabular}{@{}lccc@{}} + \toprule + \textbf{Diagnostic Measures (95\% CI)} & \textbf{D-dimer level ($>1.6$)} & \textbf{Log(leukocyte count) ($>4.16$)} & \textbf{Combination score ($>0.448$)} \\ + \midrule + TP & 66 & 61 & 65 \\ + TN & 53 & 60 & 69 \\ + FP & 28 & 21 & 12 \\ + FN & 11 & 16 & 12 \\ + Apparent prevalence & 0.59 (0.51-0.67) & 0.52 (0.44-0.60) & 0.49 (0.41-0.57) \\ + True prevalence & 0.49 (0.41-0.57) & 0.49 (0.41-0.57) & 0.49 (0.41-0.57) \\ + Sensitivity & 0.86 (0.76-0.93) & 0.79 (0.68-0.88) & 0.84 (0.74-0.92) \\ + Specificity & 0.65 (0.54-0.76) & 0.74 (0.63-0.83) & 0.85 (0.76-0.92) \\ + Positive predictive value & 0.70 (0.60-0.79) & 0.74 (0.64-0.83) & 0.84 (0.74-0.92) \\ + Negative predictive value & 0.83 (0.71-0.91) & 0.79 (0.68-0.87) & 0.85 (0.76-0.92) \\ + Positive likelihood ratio & 2.48 (1.81-3.39) & 3.06 (2.08-4.49) & 5.70 (3.35-9.69) \\ + Negative likelihood ratio & 0.22 
(0.12-0.39) & 0.28 (0.18-0.44) & 0.18 (0.11-0.31) \\ + False T+ proportion for true D- & 0.35 (0.24-0.46) & 0.26 (0.17-0.37) & 0.15 (0.08-0.24) \\ + False T- proportion for true D+ & 0.14 (0.07-0.24) & 0.21 (0.12-0.32) & 0.16 (0.08-0.26) \\ + False T+ proportion for T+ & 0.30 (0.21-0.40) & 0.26 (0.17-0.36) & 0.16 (0.08-0.26) \\ + False T- proportion for T- & 0.17 (0.09-0.29) & 0.21 (0.13-0.32) & 0.15 (0.08-0.24) \\ + Accuracy & 0.75 (0.68-0.82) & 0.77 (0.69-0.83) & 0.85 (0.78-0.90) \\ + \bottomrule + \end{tabular} +\end{table} + +For a comprehensive analysis, the \code{plotComb} function in \href{https://cran.r-project.org/web/packages/dtComb/index.html}{\texttt{dtComb}} can be used to generate plots of the kernel density and individual-value of combination scores of each group and the specificity and sensitivity corresponding to different cut-off point values Fig. \ref{figure:scatter}. This function requires the result of the \code{nonlinComb} function, which is an object of the “dtComb” class and \code{status} which is of factor type. +\begin{example} +# draw distribution, dispersion, and specificity and sensitivity plots +plotComb(fit.nonlin.splines, status) +\end{example} + +\begin{figure}[htbp] + \centering + \includegraphics[width=1\textwidth]{Figure/Figure_3.pdf} + \caption{\textbf{Kernel density, individual-value, and sens\&spe plots of the combination score acquired with the training model.} Kernel density of the combination score for two groups: needed and not needed (a). Individual-value graph with classes on the x-axis and combination score on the y-axis (b). Sensitivity and specificity graph of the combination score c. While colors show each class in Figures (a) and (b), in Figure (c), the colors represent the sensitivity and specificity of the combination score.} + \label{figure:scatter} +\end{figure} + +If the model trained with Splines is to be tested, the generically written predict function is used. 
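The point estimates for the combination score in Table \ref{tab:diagnostic_measures} follow directly from its $2\times 2$ counts. A minimal base-R sketch of the underlying definitions (the exact confidence intervals reported by the package are not recomputed here):

```r
# Diagnostic measures for the combination score from its 2x2 counts
# (TP, FN among diseased; TN, FP among non-diseased; values from the table)
tp <- 65; fn <- 12
tn <- 69; fp <- 12

sens   <- tp / (tp + fn)                   # sensitivity, ~0.84
spec   <- tn / (tn + fp)                   # specificity, ~0.85
ppv    <- tp / (tp + fp)                   # positive predictive value, ~0.84
npv    <- tn / (tn + fn)                   # negative predictive value, ~0.85
lr_pos <- sens / (1 - spec)                # positive likelihood ratio, ~5.70
lr_neg <- (1 - sens) / spec                # negative likelihood ratio, ~0.18
acc    <- (tp + tn) / (tp + tn + fp + fn)  # accuracy, ~0.85
```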
This function requires the test set and the result of the \code{nonlinComb} function, an object of class ``dtComb''. For each observation, the prediction output consists of the combination score and the predicted label determined by the cut-off value derived from the model. +\begin{example} +# To predict the test set +pred <- predict(fit.nonlin.splines, testData) +head(pred) + + comb.score labels +1 0.6133884 needed +7 0.9946474 needed +10 0.9972347 needed +11 0.9925040 needed +13 0.9257699 needed +14 0.9847090 needed +\end{example} +Above, the estimated combination scores for the first six observations in the test set are labelled \textbf{needed} because they exceed the cut-off value of 0.448. + +\subsection{Web interface for the dtComb package} +The primary goal of developing the \href{https://cran.r-project.org/web/packages/dtComb/index.html}{\texttt{dtComb}} package is to bring together numerous distinct combination methods and make them easily accessible to researchers. Furthermore, the package includes diagnostic statistics and visualization tools for the diagnostic tests and for the combination score generated by the chosen method. Nevertheless, using R code may pose challenges for physicians and others unfamiliar with R programming. To address this, we have also developed a user-friendly web application for \href{https://cran.r-project.org/web/packages/dtComb/index.html}{\texttt{dtComb}} using \href{https://cran.r-project.org/web/packages/shiny/index.html}{\texttt{Shiny}} \citep{chang2017shiny}. This web-based tool is publicly accessible and provides an interactive interface with all the functionalities found in the \href{https://cran.r-project.org/web/packages/dtComb/index.html}{\texttt{dtComb}} package. \\ + +To initiate the analysis, users must upload their data by following the instructions outlined in the "Data upload" tab of the web tool.
For convenience, we have provided three example datasets on this page to assist researchers in practicing the tool's functionality and to guide them in formatting their own data (as illustrated in Fig. \ref{figure:web}a). We also note that ROC analysis for a single marker can be performed within the ‘ROC Analysis for Single Marker(s)’ tab in the data upload section of the web interface. + +The "Analysis" tab contains two main subpanels: +\begin{itemize} + \item Plots (Fig. \ref{figure:web}b): This section offers various visual representations, such as ROC curves, kernel density plots, individual-value plots, and sensitivity and specificity plots. These visualizations help users assess the single diagnostic tests and the combination score generated with the user-defined combination method. + \item Results (Fig. \ref{figure:web}c): This subpanel reports AUC statistics for the combination score and the single diagnostic tests, comparisons that evaluate how the combination score fares against the individual tests, and various diagnostic measures. One can also predict new data based on the previously set model parameters from the "Predict" tab (Fig. \ref{figure:web}d). If needed, the model created during the analysis can be downloaded to preserve the parameters of the fitted model; this lets users make new predictions later by reloading the model in the "Predict" tab. Additionally, all results can easily be downloaded using the dedicated download buttons in the respective tabs. +\end{itemize} +\begin{figure}[H] + \centering + \includegraphics[width=1\textwidth]{Figure/Figure_5.pdf} + \caption{\textbf{Web interface of the dtComb package.} The figure illustrates the web interface of the \href{https://cran.r-project.org/web/packages/dtComb/index.html}{\texttt{dtComb}} package, which demonstrates the steps involved in combining two diagnostic tests.
a) Data Upload: The user uploads the dataset and selects the relevant markers, the gold-standard test, and the event factor for the analysis. b) Combination Analysis: This panel allows selection of the combination method, method-specific parameters, and resampling options to refine the analysis. c) Combination Analysis Output: Displays the results generated by the selected combination method, providing the user with key metrics and visualizations for interpretation. d) Predict: Displays the prediction results of the trained model when applied to the test set.} + \label{figure:web} +\end{figure} +\section{Summary and further research} +In clinical practice, multiple diagnostic tests are often available for diagnosing a disease \citep{yu2015two}. Combining these tests to enhance diagnostic accuracy is a widely accepted approach \citep{su1993linear,pepe2000combining,liu2011min,sameera2016binary,pepe2006combining,todor2014tools}. To the best of our knowledge, the existing tools listed in Table \ref{tab:exist_pck} combine diagnostic tests but offer at most five different combination methods each; despite the existence of numerous advanced combination methods, no comprehensive tool for integrating diagnostic tests has been available.\\ +In this study, we presented \href{https://cran.r-project.org/web/packages/dtComb/index.html}{\texttt{dtComb}}, a comprehensive R package designed to combine diagnostic tests using various methods, including linear and non-linear combinations, mathematical operators, and machine learning algorithms. The package integrates 142 different methods for combining two diagnostic markers to improve the accuracy of diagnosis. It also provides ROC curve analysis, various graphical approaches, diagnostic performance scores, and binary comparison results. In the given example, one can determine whether patients with abdominal pain require laparotomy by combining the D-dimer levels and white blood cell counts of those patients.
Various methods, including linear and non-linear combinations, were tested, and the results showed that the Splines method performed better than the others, particularly in terms of AUC and accuracy, compared with the single tests. This demonstrates that diagnostic accuracy can be improved with combination methods.\\ +While some studies focus on combining more than two markers \citep{kang2016linear}, our study aimed to combine two markers using nearly all existing methods and to provide a tool and package for clinical practice. Future work can focus on extending the capabilities of the \href{https://cran.r-project.org/web/packages/dtComb/index.html}{\texttt{dtComb}} package in this direction. + +\subsection{R Software} + +The R package \href{https://cran.r-project.org/web/packages/dtComb/index.html}{\texttt{dtComb}} is available on CRAN at \url{https://cran.r-project.org/web/packages/dtComb/index.html}. + +\subsection{Acknowledgment} +We would like to thank the Proofreading \& Editing Office of the Dean for Research at Erciyes University for the copyediting and proofreading of this manuscript. +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% + +%\section{Summary} +%This file is only a basic article template. For full details of \emph{The R Journal} style and information on how to prepare your article for submission, see the \href{https://journal.r-project.org/share/author-guide.pdf}{Instructions for Authors}. + +\bibliography{dtCombreferences} + +\address{S.
Ilayda Yerlitaş Taştan\\ + Department of Biostatistics \\ + Erciyes University\\ + Türkiye\\ + (ORCiD: 0000-0003-2830-3006)\\ + \email{ilaydayerlitas340@gmail.com}} + +\address{Serra Bersan Gengeç\\ + Department of Biostatistics \\ + Erciyes University\\ + Türkiye\\ + \email{serrabersan@gmail.com}} + +\address{Necla Koçhan\\ + Department of Mathematics\\ + Izmir University of Economics\\ + Türkiye\\ + (ORCiD: 0000-0003-2355-4826)\\ + \email{necla.kayaalp@gmail.com}} + + \address{Gözde Ertürk Zararsız\\ + Department of Biostatistics \\ + Erciyes University\\ + Türkiye\\ + \email{gozdeerturk9@gmail.com}} + + \address{Selçuk Korkmaz\\ + Department of Biostatistics \\ + Trakya University\\ + Türkiye\\ + \email{selcukorkmaz@gmail.com}} + + \address{Gökmen Zararsız\\ + Department of Biostatistics \\ + Erciyes University\\ + Türkiye\\ + (ORCiD: 0000-0001-5801-1835)\\ + \email{gokmen.zararsiz@gmail.com}} diff --git a/_articles/RJ-2025-036/dtCombreferences.bib b/_articles/RJ-2025-036/dtCombreferences.bib new file mode 100644 index 0000000000..71640734a9 --- /dev/null +++ b/_articles/RJ-2025-036/dtCombreferences.bib @@ -0,0 +1,872 @@ +@article{pepe2005evaluating, + title={Evaluating technologies for classification and prediction in medicine}, + author={Pepe, MS}, + journal={Statistics in medicine}, + volume={24}, + number={24}, + pages={3687--3696}, + year={2005}, + publisher={Wiley Online Library} +} +@article{baranski2022accuracy, + title={The Accuracy of a Screening Tool in Epidemiological Studies—An Example of Exhaled Nitric Oxide in Paediatric Asthma}, + author={Bara{\'n}ski, Kamil and Schl{\"u}nssen, Vivi}, + journal={International Journal of Environmental Research and Public Health}, + volume={19}, + number={22}, + pages={14746}, + year={2022}, + publisher={MDPI} +} +@article{califf2018biomarker, + title={Biomarker definitions and their applications}, + author={Califf, Robert M}, + journal={Experimental biology and
medicine}, + volume={243}, + number={3}, + pages={213--221}, + year={2018}, + publisher={SAGE Publications Sage UK: London, England} +} +@article{sewak2024construction, + title={Construction and evaluation of optimal diagnostic tests with application to hepatocellular carcinoma diagnosis}, + author={Sewak, Ainesh and Siegfried, Sandra and Hothorn, Torsten}, + journal={arXiv preprint arXiv:2402.03004}, + year={2024} +} +@article{du2024likelihood, + title={Likelihood ratio combination of multiple biomarkers via smoothing spline estimated densities}, + author={Du, Zhiyuan and Du, Pang and Liu, Aiyi}, + journal={Statistics in Medicine}, + volume={43}, + number={7}, + pages={1372--1383}, + year={2024}, + publisher={Wiley Online Library} +} +@article{inacio2021statistical, + title={Statistical evaluation of medical tests}, + author={In{\'a}cio, Vanda and Rodr{\'\i}guez-{\'A}lvarez, Mar{\'\i}a Xos{\'e} and Gayoso-Diz, Pilar}, + journal={Annual Review of Statistics and Its Application}, + volume={8}, + number={1}, + pages={41--67}, + year={2021}, + publisher={Annual Reviews} +} +@article{salvetat2024ai, + title={AI algorithm combined with RNA editing-based blood biomarkers to discriminate bipolar from major depressive disorders in an external validation multicentric cohort}, + author={Salvetat, Nicolas and Checa-Robles, Francisco Jesus and Delacr{\'e}taz, Aur{\'e}lie and Cayzac, Christopher and Dubuc, Benjamin and Vetter, Diana and Dainat, Jacques and Lang, Jean-Philippe and Gamma, Franziska and Weissmann, Dinah}, + journal={Journal of Affective Disorders}, + volume={356}, + pages={385--393}, + year={2024}, + publisher={Elsevier} +} +@article{prinzi2023explainable, + title={Explainable machine-learning models for {COVID-19} prognosis prediction using clinical, laboratory and radiomic features}, + author={Prinzi, Francesco and Militello, Carmelo and Scichilone, Nicola and Gaglio, Salvatore and Vitabile, Salvatore}, + journal={IEEE Access}, + volume={11}, + 
pages={121492--121510}, + year={2023}, + publisher={IEEE} +} +@article{mann2024blood, + title={Blood RNA biomarkers for tuberculosis screening in people living with {HIV} before antiretroviral therapy initiation: a diagnostic accuracy study}, + author={Mann, Tiffeney and Gupta, Rishi K and Reeve, Byron WP and Ndlangalavu, Gcobisa and Chandran, Aneesh and Krishna, Amirtha P and Calderwood, Claire J and Tshivhula, Happy and Palmer, Zaida and Naidoo, Selisha and others}, + journal={The Lancet Global Health}, + volume={12}, + number={5}, + pages={e783--e792}, + year={2024}, + publisher={Elsevier} +} +@article{liu2011min, + title={A min--max combination of biomarkers to improve diagnostic accuracy}, + author={Liu, Chunling and Liu, Aiyi and Halabi, Susan}, + journal={Statistics in medicine}, + volume={30}, + number={16}, + pages={2005--2014}, + year={2011}, + publisher={Wiley Online Library} +} +@article{sameera2016binary, + title={Binary classification using multivariate receiver operating characteristic curve for continuous data}, + author={Sameera, G and Vardhan, R Vishnu and Sarma, KVS}, + journal={Journal of biopharmaceutical statistics}, + volume={26}, + number={3}, + pages={421--431}, + year={2016}, + publisher={Taylor \& Francis} +} +@article{su1993linear, + title={Linear combinations of multiple diagnostic markers}, + author={Su, John Q and Liu, Jun S}, + journal={Journal of the American Statistical Association}, + volume={88}, + number={424}, + pages={1350--1355}, + year={1993}, + publisher={Taylor \& Francis} +} +@article{pepe2000combining, + title={Combining diagnostic test results to increase accuracy}, + author={Pepe, Margaret Sullivan and Thompson, Mary Lou}, + journal={Biostatistics}, + volume={1}, + number={2}, + pages={123--140}, + year={2000}, + publisher={Oxford University Press} +} +@article{ghosh2005classification, + title={Classification and selection of biomarkers in genomic data using {LASSO}}, + author={Ghosh, Debashis and Chinnaiyan, Arul M}, 
+ journal={BioMed Research International}, + volume={2005}, + number={2}, + pages={147--154}, + year={2005}, + publisher={Wiley Online Library} +} +@article{svart2024neurofilament, + title={Neurofilament light chain is elevated in patients with newly diagnosed idiopathic intracranial hypertension: A prospective study}, + author={Svart, Katrine and Korsb{\ae}k, Johanne Juhl and Jensen, Rigmor H{\o}jland and Parkner, Tina and Knudsen, Cindy S{\o}nders{\o} and Hasselbalch, Steen Gregers and Hagen, Snorre Malm and Wibroe, Elisabeth Arnberg and Molander, Laleh Dehghani and Beier, Dagmar}, + journal={Cephalalgia}, + volume={44}, + number={5}, + pages={03331024241248203}, + year={2024}, + publisher={SAGE Publications Sage UK: London, England} +} +@article{luo2024ast, + title={{AST}/{ALT} ratio is an independent risk factor for diabetic retinopathy: A cross-sectional study}, + author={Luo, Jian and Yu, Fang and Zhou, Haifeng and Wu, Xueyan and Zhou, Quan and Liu, Qin and Gan, Shenglian}, + journal={Medicine}, + volume={103}, + number={26}, + pages={e38583}, + year={2024}, + publisher={LWW} +} +@article{serban2024significance, + title={Significance of neutrophil to lymphocyte ratio ({NLR}) and platelet lymphocyte ratio ({PLR}) in diabetic foot ulcer and potential new therapeutic targets}, + author={Serban, Dragos and Papanas, Nikolaos and Dascalu, Ana Maria and Kempler, Peter and Raz, Itamar and Rizvi, Ali A and Rizzo, Manfredi and Tudor, Corneliu and Silviu Tudosie, Mihail and Tanasescu, Denisa and others}, + journal={The International Journal of Lower Extremity Wounds}, + volume={23}, + number={2}, + pages={205--216}, + year={2024}, + publisher={SAGE Publications Sage CA: Los Angeles, CA} +} +@article{ahsan2024advancements, + title={Advancements in medical diagnosis and treatment through machine learning: A review}, + author={Ahsan, Mohammad and Khan, Anam and Khan, Kaif Rehman and Sinha, Bam Bahadur and Sharma, Anamika}, + journal={Expert Systems}, + volume={41}, + 
number={3}, + pages={e13499}, + year={2024}, + publisher={Wiley Online Library} +} +@article{agarwal2023artificial, + title={By artificial intelligence algorithms and machine learning models to diagnosis cancer}, + author={Agarwal, Seema and Yadav, Ajay Singh and Dinesh, Vennapoosa and Vatsav, Kolluru Sai Sri and Prakash, Kolluru Sai Surya and Jaiswal, Sushma}, + journal={Materials Today: Proceedings}, + volume={80}, + pages={2969--2975}, + year={2023}, + publisher={Elsevier} +} +@article{salvetat2022game, + title={A game changer for bipolar disorder diagnosis using RNA editing-based biomarkers}, + author={Salvetat, Nicolas and Checa-Robles, Francisco Jesus and Patel, Vipul and Cayzac, Christopher and Dubuc, Benjamin and Chimienti, Fabrice and Abraham, Jean-Daniel and Dupr{\'e}, Pierrick and Vetter, Diana and M{\'e}reuze, Sandie and others}, + journal={Translational Psychiatry}, + volume={12}, + number={1}, + pages={182}, + year={2022}, + publisher={Nature Publishing Group UK London} +} +@article{ganapathy2023comparison, + title={Comparison of diagnostic accuracy of models combining the renal biomarkers in predicting renal scarring in pediatric population with vesicoureteral reflux ({VUR})}, + author={Ganapathy, Sachit and KT, Harichandrakumar and Jindal, Bibekanand and Naik, Prathibha S and Nair N, Sreekumaran}, + journal={Irish Journal of Medical Science (1971-)}, + volume={192}, + number={5}, + pages={2521--2526}, + year={2023}, + publisher={Springer} +} +@article{alzyoud2024diagnosing, + title={Diagnosing diabetes mellitus using machine learning techniques}, + author={Alzyoud, Mazen and Alazaidah, Raed and Aljaidi, Mohammad and Samara, Ghassan and Qasem, M and Khalid, Muhammad and Al-Shanableh, Najah}, + journal={International Journal of Data and Network Science}, + volume={8}, + number={1}, + pages={179--188}, + year={2024} +} +@article{zararsiz2016statistical, + title={Statistical learning approaches in diagnosing patients with nontraumatic acute abdomen}, + 
author={Zararsiz, G{\"o}kmen and Akyildiz, Hizir Yakup and G{\"o}ks{\"u}l{\"u}k, Din{\c{c}}er and Korkmaz, Selcuk and {\"O}zt{\"u}rk, Ahmet}, + journal={Turkish Journal of Electrical Engineering and Computer Sciences}, + volume={24}, + number={5}, + pages={3685--3697}, + year={2016} +} +@article{degroat2024discovering, + title={Discovering biomarkers associated and predicting cardiovascular disease with high accuracy using a novel nexus of machine learning techniques for precision medicine}, + author={DeGroat, William and Abdelhalim, Habiba and Patel, Kush and Mendhe, Dinesh and Zeeshan, Saman and Ahmed, Zeeshan}, + journal={Scientific reports}, + volume={14}, + number={1}, + pages={1}, + year={2024}, + publisher={Nature Publishing Group UK London} +} +@article{kramar2001mroc, + title={mROC: a computer program for combining tumour markers in predicting disease states}, + author={Kramar, Andrew and Faraggi, David and Fortun{\'e}, Antoine and Reiser, Benjamin}, + journal={Computer methods and programs in biomedicine}, + volume={66}, + number={2-3}, + pages={199--207}, + year={2001}, + publisher={Elsevier} +} +@article{perez2021visualizing, + title={Visualizing the decision rules behind the {ROC} curves: understanding the classification process}, + author={P{\'e}rez-Fern{\'a}ndez, Sonia and Mart{\'\i}nez-Camblor, Pablo and Filzmoser, Peter and Corral, Norberto}, + journal={AStA Advances in Statistical Analysis}, + volume={105}, + number={1}, + pages={135--161}, + year={2021}, + publisher={Springer} +} +@article{yu2015two, + title={Two simple algorithms on linear combination of multiple biomarkers to maximize partial area under the {ROC} curve}, + author={Yu, Wenbao and Park, Taesung}, + journal={Computational Statistics \& Data Analysis}, + volume={88}, + pages={15--27}, + year={2015}, + publisher={Elsevier} +} +@article{kuhn2008building, + title={Building predictive models in {R} using the caret package}, + author={Kuhn, Max}, + journal={Journal of statistical
software}, + volume={28}, + pages={1--26}, + year={2008} +} +@article{kuhn2020tidymodels, + title={Tidymodels: a collection of packages for modeling and machine learning using tidyverse principles}, + author={Kuhn, Max and Wickham, Hadley}, + year={2020}, + url={https://www.tidymodels.org}, + publisher={Boston, MA, USA} +} +@inproceedings{chen2016xgboost, + title={{Xgboost}: A scalable tree boosting system}, + author={Chen, Tianqi and Guestrin, Carlos}, + booktitle={Proceedings of the 22nd acm sigkdd international conference on knowledge discovery and data mining}, + pages={785--794}, + year={2016} +} +@article{Trevor2015gam, + title={gam: Generalized Additive Models.}, + author={Trevor Hastie}, + year={2015}, + url={https://CRAN.R-project.org/package=gam}, + note={R package version 1.22-5} +} +@article{ruiz1991asymptotic, + title={Asymptotic efficiency of logistic regression relative to linear discriminant analysis}, + author={Ruiz-Velasco, S}, + journal={Biometrika}, + volume={78}, + number={2}, + pages={235--243}, + year={1991}, + publisher={Oxford University Press} +} +@article{efron1975efficiency, + title={The efficiency of logistic regression compared to normal discriminant analysis}, + author={Efron, Bradley}, + journal={Journal of the American Statistical Association}, + volume={70}, + number={352}, + pages={892--898}, + year={1975}, + publisher={Taylor \& Francis} +} +@book{cox1989analysis, + title={Analysis of Binary Data}, + author={Cox, D.R. 
and Snell, E.J.}, + edition={2nd}, + year={1989}, + publisher={Chapman and Hall/CRC}, + address={London} +} +@article{leon2006bedside, + title={A bedside scoring system ({``Candida score''}) for early antifungal treatment in nonneutropenic critically ill patients with {Candida} colonization}, + author={Le{\'o}n, Crist{\'o}bal and Ruiz-Santana, Sergio and Saavedra, Pedro and Almirante, Benito and Nolla-Salas, Juan and {\'A}lvarez-Lerma, Francisco and Garnacho-Montero, Jos{\'e} and Le{\'o}n, Mar{\'\i}a {\'A}ngeles and EPCAN Study Group and others}, + journal={Critical care medicine}, + volume={34}, + number={3}, + pages={730--737}, + year={2006}, + publisher={LWW} +} +@article{pepe2006combining, + title={Combining predictors for classification using the area under the receiver operating characteristic curve}, + author={Pepe, Margaret Sullivan and Cai, Tianxi and Longton, Gary}, + journal={Biometrics}, + volume={62}, + number={1}, + pages={221--229}, + year={2006}, + publisher={Oxford University Press} +} +@book{pepe2003statistical, + title={The statistical evaluation of medical tests for classification and prediction}, + author={Pepe, Margaret Sullivan}, + year={2003}, + publisher={Oxford University Press} +} +@article{kuhn2008caret, + author = {Kuhn, Max}, + title = {Building Predictive Models in {R} Using the {caret} Package}, + journal = {Journal of Statistical Software}, + volume = {28}, + number = {5}, + pages = {1--26}, + year = {2008}, + doi = {10.18637/jss.v028.i05}, + url = {https://doi.org/10.18637/jss.v028.i05} +} +@manual{hastie2023gam, + title={Package 'gam'}, + author={Hastie, Trevor}, + year={2023}, + organization={CRAN}, + url={https://CRAN.R-project.org/package=gam} +} +@manual{venable2016splines, + title={Package 'splines'}, + author={Venables, W. N.}, + year={2016}, + organization={CRAN}, + url={https://cran.r-project.org/web/packages/splines/index.html} +} +@book{james2021introduction, + title={An Introduction to Statistical Learning}, + author={James, G. and Witten, D. and Hastie, T. and Tibshirani, R.}, + year={2021}, + publisher={Springer}, + address={New York} +} +@article{turck2011proc, + title={pROC: an open-source package for {R} and {S+} to analyze and compare {ROC} curves}, + author={Turck, N. and Vutskits, L. and Sanchez-Pena, P. and Robin, X. and Hainard, A. and Gex-Fabry, M. and Fouda, C. and Bassem, H. and Mueller, M. and Lisacek, F. and Puybasset, L. and Sanchez, J.-C.}, + journal={BMC Bioinformatics}, + volume={12}, + year={2011}, + pages={77}, + doi={10.1186/1471-2105-12-77} +} +@article{anderson1961classification, + title={Classification into two multivariate normal distributions with different covariance matrices}, + author={Anderson, T.W. and Bahadur, R.R.}, + journal={Ann. Math. Stat.}, + volume={32}, + number={1}, + pages={32--35}, + year={1961}, + doi={10.1214/aoms/1177733256}, + url={https://doi.org/10.1214/aoms/1177733256} +} +@article{kang2016linear, + title={Linear combination methods to improve diagnostic/prognostic accuracy on future observations}, + author={Kang, Le and Liu, Aiyi and Tian, Lili}, + journal={Statistical methods in medical research}, + volume={25}, + number={4}, + pages={1359--1380}, + year={2016}, + publisher={SAGE Publications Sage UK: London, England} +} +@article{anderson1962classification, + title={Classification into two multivariate normal distributions with different covariance matrices}, + author={Anderson, Theodore W and Bahadur, Raghu Raj}, + journal={The annals of mathematical statistics}, + volume={33}, + number={2}, + pages={420--431}, + year={1962}, + publisher={JSTOR} +} +@article{todor2014tools, + title={Tools to identify linear combination of prognostic factors which maximizes area under receiver operator curve}, + author={Todor, Nicolae and Todor, Irina and
S{\u{a}}pl{\u{a}}can, Gavril}, + journal={Journal of clinical bioinformatics}, + volume={4}, + pages={1--7}, + year={2014}, + publisher={Springer} +} +@book{james2013introduction, + title={An introduction to statistical learning}, + author={James, Gareth and Witten, Daniela and Hastie, Trevor and Tibshirani, Robert and others}, + volume={112}, + year={2013}, + publisher={Springer} +} +@article{friedman2010regularization, + title={Regularization paths for generalized linear models via coordinate descent}, + author={Friedman, Jerome and Hastie, Trevor and Tibshirani, Rob}, + journal={Journal of statistical software}, + volume={33}, + number={1}, + pages={1}, + year={2010}, + publisher={NIH Public Access} +} +@article{elhakeem2022using, + title={Using linear and natural cubic splines, {SITAR}, and latent trajectory models to characterise nonlinear longitudinal growth trajectories in cohort studies}, + author={Elhakeem, Ahmed and Hughes, Rachael A and Tilling, Kate and Cousminer, Diana L and Jackowski, Stefan A and Cole, Tim J and Kwong, Alex SF and Li, Zheyuan and Grant, Struan FA and Baxter-Jones, Adam DG and others}, + journal={BMC Medical Research Methodology}, + volume={22}, + number={1}, + pages={68}, + year={2022}, + publisher={Springer} +} +@article{minaev2018distance, + title={Distance measures for classification of numerical features}, + author={Minaev, Georgy and Pich{\'e}, Robert and Visa, Ari}, + year={2018}, + publisher={Tampere University of Technology}, + url={https://trepo.tuni.fi/handle/10024/124353} +} +@article{pandit2011comparative, + title={A comparative study on distance measuring approaches for clustering}, + author={Pandit, Shraddha and Gupta, Suchita and others}, + journal={International journal of research in computer science}, + volume={2}, + number={1}, + pages={29--31}, + year={2011}, + publisher={Citeseer} +} +@article{cha2007comprehensive, + title={Comprehensive survey on distance/similarity measures between probability density 
functions}, + author={Cha, Sung-Hyuk}, + journal={City}, + volume={1}, + number={2}, + pages={1}, + year={2007} +} +@article{yin2014optimal, + title={Optimal linear combinations of multiple diagnostic biomarkers based on {Youden} index}, + author={Yin, Jingjing and Tian, Lili}, + journal={Statistics in medicine}, + volume={33}, + number={8}, + pages={1426--1440}, + year={2014}, + publisher={Wiley Online Library} +} +@article{stevenson2017epir, + title={epiR: tools for the analysis of epidemiological data.}, + author={Stevenson, Mark and Nunes, Telmo and Heuer, Cord and Marshall, Jonathon and Sanchez, Javier and Thornton, Ron and Reiczigel, Jeno and Robison-Cox, Jim and Sebastiani, Paola and Solymos, Peter and others}, + url={https://cran.r-project.org/web/packages/epiR/index.html}, + year={2017}, + note={R package version 2.0.76} +} +@article{wickham2016devtools, + title={Devtools: Tools to make developing {R} packages easier}, + author={Wickham, H. and Hester, J. and Chang, W. and Bryan, J.}, + year={2022}, + note={R package version 2.4.5}, + url={https://cran.r-project.org/web/packages/devtools/index.html} +} +@article{wickham2013roxygen2, + title={roxygen2: In-source documentation for {R}}, + author={Wickham, H. and Danenberg, P. 
and Eugster, M.}, + year={2024}, + note={R package version 7.3.2}, + url={https://cran.r-project.org/web/packages/roxygen2/index.html} +} +@article{wickham2011testthat, + title={Testthat: Get started with testing}, + author={Wickham, H.}, + year={2024}, + url={https://cran.r-project.org/web/packages/testthat/index.html}, + note={R package version 3.2.1.1} +} +@article{shiralkarprogramming, + title={Programming Validation: Perspectives and Strategies}, + journal={PharmaSUG 2010—paper IB09}, + year={2010}, + author={Shiralkar, Parag} +} +@article{akyildiz2010value, + title={The value of {D-dimer} test in the diagnosis of patients with nontraumatic acute abdomen}, + author={Akyildiz, Hizir Yakup and Sozuer, Erdogan and Akcan, Alper and Ku{\c{c}}uk, Can and Artis, Tarik and Biri, {\.I}smail and Y{\i}lmaz, Nam{\i}k}, + journal={Turkish Journal of Trauma and Emergency Surgery}, + volume={16}, + number={1}, + pages={22--26}, + year={2010} +} +@article{chang2017shiny, + title={Shiny: Web Application Framework for {R}}, + author={Winston Chang and Joe Cheng and JJ Allaire and Carson Sievert and Barret Schloerke and Yihui Xie and Jeff Allen and Jonathan McPherson and Alan Dipert and Barbara Borges}, + year = {2024}, + note = {R package version 1.9.1.9000, https://github.com/rstudio/shiny}, + url = {https://shiny.posit.co/} +} +@article{condurache2023min, + title={Min-Max, Min-Max-Median, and Min-Max-IQR in Deciding Optimal Diagnostic Thresholds: Performances of a Logistic Regression Approach on Simulated and Real Data}, + author={Condurache, Ilie-Andrei and Bolboac{\u{a}}, Sorana D}, + journal={Applied Medical Informatics}, + volume={45}, + number={3}, + year={2023} +} +@article{chang2022artificial, + title={An artificial intelligence model for heart disease detection using machine learning algorithms}, + author={Chang, Victor and Bhavani, Vallabhanent Rupa and Xu, Ariel Qianwen and Hossain, MA}, + journal={Healthcare Analytics}, + volume={2}, + pages={100016}, + 
year={2022}, + publisher={Elsevier} +} +@article{alkayyali2023systematic, + title={A systematic literature review of deep and machine learning algorithms in cardiovascular diseases diagnosis}, + author={Alkayyali, ZK and Idris, S Anuar Bin and Abu-Naser, Samy S}, + journal={Journal of Theoretical and Applied Information Technology}, + volume={101}, + number={4}, + pages={1353--1365}, + year={2023} +} +@inproceedings{ghazal2022intelligent, + title={Intelligent model to predict early liver disease using machine learning technique}, + author={Ghazal, Taher M and Rehman, Aziz Ur and Saleem, Muhammad and Ahmad, Munir and Ahmad, Shabir and Mehmood, Faisal}, + booktitle={2022 International Conference on Business Analytics for Technology and Security ({ICBATS})}, + pages={1--5}, + year={2022}, + organization={IEEE} +} +@article{sing2005rocr, + title={ROCR: visualizing classifier performance in {R}}, + author={Sing, Tobias and Sander, Oliver and Beerenwinkel, Niko and Lengauer, Thomas}, + journal={Bioinformatics}, + volume={21}, + number={20}, + pages={3940--3941}, + year={2005}, + publisher={Oxford University Press} +} +@article{robin2011proc, + title={pROC: an open-source package for {R} and {S+} to analyze and compare {ROC} curves}, + author={Robin, Xavier and Turck, Natacha and Hainard, Alexandre and Tiberti, Natalia and Lisacek, Fr{\'e}d{\'e}rique and Sanchez, Jean-Charles and M{\"u}ller, Markus}, + journal={BMC bioinformatics}, + volume={12}, + pages={1--8}, + year={2011}, + publisher={Springer} +} +@article{grau2015prroc, + title={PRROC: computing and visualizing precision-recall and receiver operating characteristic curves in {R}}, + author={Grau, Jan and Grosse, Ivo and Keilwagen, Jens}, + journal={Bioinformatics}, + volume={31}, + number={15}, + pages={2595--2597}, + year={2015}, + publisher={Oxford University Press} +} +@article{sachs2017plotroc, + title={plotROC: a tool for plotting {ROC} curves}, + author={Sachs, Michael C}, + journal={Journal of statistical 
software}, + volume={79}, + year={2017}, + publisher={NIH Public Access} +} +@article{saito2017precrec, + title={Precrec: fast and accurate precision--recall and {ROC} curve calculations in {R}}, + author={Saito, Takaya and Rehmsmeier, Marc}, + journal={Bioinformatics}, + volume={33}, + number={1}, + pages={145--147}, + year={2017}, + publisher={Oxford University Press} +} +@article{khan2019rocit, + title={{ROCit}: an {R} package for performance assessment of binary classifier with visualization}, + author={Khan, Md Riaz Ahmed}, + year={2019}, + url={https://CRAN.R-project.org/package=ROCit} +} +@article{novielli2013meta, + title={Meta-analysis of the accuracy of two diagnostic tests used in combination: application to the {D-dimer} test and the {Wells} score for the diagnosis of deep vein thrombosis}, + author={Novielli, Nicola and Sutton, Alexander J and Cooper, Nicola J}, + journal={Value in health}, + volume={16}, + number={4}, + pages={619--628}, + year={2013}, + publisher={Elsevier} +} +@article{wang2013predicting, + title={Predicting drug-target interactions using restricted {Boltzmann} machines}, + author={Wang, Yuhao and Zeng, Jianyang}, + journal={Bioinformatics}, + volume={29}, + number={13}, + pages={i126--i134}, + year={2013}, + publisher={Oxford University Press} +} +@article{fagan2007cerebrospinal, + title={Cerebrospinal fluid tau/$\beta$-amyloid42 ratio as a prediction of cognitive decline in nondemented older adults}, + author={Fagan, Anne M and Roe, Catherine M and Xiong, Chengjie and Mintun, Mark A and Morris, John C and Holtzman, David M}, + journal={Archives of neurology}, + volume={64}, + number={3}, + pages={343--349}, + year={2007}, + publisher={American Medical Association} +} +@article{nyblom2004high, + title={High {AST}/{ALT} ratio may indicate advanced alcoholic liver disease rather than heavy drinking}, + author={Nyblom, HBUBJ and Berggren, Ulf and Balldin, Jan and Olsson, Rolf}, + journal={Alcohol and alcoholism}, + volume={39}, + 
number={4}, + pages={336--339}, + year={2004}, + publisher={Oxford University Press} +} +@article{balta2016relation, + title={The relation between atherosclerosis and the neutrophil--lymphocyte ratio}, + author={Balta, Sevket and Celik, Turgay and Mikhailidis, Dimitri P and Ozturk, Cengiz and Demirkol, Sait and Aparci, Mustafa and Iyisoy, Atila}, + journal={Clinical and applied thrombosis/hemostasis}, + volume={22}, + number={5}, + pages={405--411}, + year={2016}, + publisher={SAGE Publications Sage CA: Los Angeles, CA} +} +@article{bardella1991iga, + title={{IgA} antigliadin antibodies, cellobiose/mannitol sugar test, and carotenemia in the diagnosis of and screening for celiac disease.}, + author={Bardella, MT and Molteni, N and Cesana, B and Baldassarri, AR and Bianchi, PA}, + journal={American Journal of Gastroenterology (Springer Nature)}, + volume={86}, + number={3}, + year={1991} +} +@article{bozkurt2014comparison, + title={Comparison of different methods for determining diabetes}, + author={Bozkurt, Mehmet Recep and Yurtay, Nil{\"u}fer and Yilmaz, Ziynet and Sertkaya, Cengiz}, + journal={Turkish Journal of Electrical Engineering and Computer Sciences}, + volume={22}, + number={4}, + pages={1044--1055}, + year={2014} +} +@article{chen2015diagnosis, + title={Diagnosis of colorectal cancer by near-infrared optical fiber spectroscopy and random forest}, + author={Chen, Hui and Lin, Zan and Wu, Hegang and Wang, Li and Wu, Tong and Tan, Chao}, + journal={Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy}, + volume={135}, + pages={185--191}, + year={2015}, + publisher={Elsevier} +} +@article{abate2020conformation, + title={A conformation variant of {p53} combined with machine learning identifies {Alzheimer} disease in preclinical and prodromal stages}, + author={Abate, Giulia and Vezzoli, Marika and Polito, Letizia and Guaita, Antonio and Albani, Diego and Marizzoni, Moira and Garrafa, Emirena and Marengoni, Alessandra and Forloni, Gianluigi and 
Frisoni, Giovanni B and others}, + journal={Journal of personalized medicine}, + volume={11}, + number={1}, + pages={14}, + year={2020}, + publisher={MDPI} +} +@book{meszaros2007xunit, + title={{xUnit} Test Patterns: Refactoring Test Code}, + author={Meszaros, G.}, + year={2007}, + publisher={Addison-Wesley Professional} +} +@article{erturkzararsiz2023linear, + title={Linear Combination of Leukocyte Count and {D-Dimer} Levels in the Diagnosis of Patients with Non-traumatic Acute Abdomen}, + author={Ert{\"u}rk Zarars{\i}z, G.}, + journal={Med. Rec.}, + volume={5}, + year={2023}, + pages={84--90}, + doi={10.37990/medr.1166531}, + url={https://doi.org/10.37990/medr.1166531} +} +@article{ma2020combination, + title={Combination of multiple functional markers to improve diagnostic accuracy}, + author={Ma, H. and Yang, J. and Xu, S. and Liu, C. and Zhang, Q.}, + journal={J. Appl. Stat.}, + year={2020}, + pages={1--20}, + doi={10.1080/02664763.2020.1796945}, + url={https://doi.org/10.1080/02664763.2020.1796945} +} +@article{aznar-gimeno2023comparing, + title={Comparing the Min--Max--Median/{IQR} Approach with the Min--Max Approach, Logistic Regression and {XGBoost}, Maximising the {Youden} Index}, + author={Aznar-Gimeno, R. and Esteban, L.M. and Sanz, G. and del-Hoyo-Alonso, R.}, + journal={Symmetry (Basel)}, + volume={15}, + year={2023}, + doi={10.3390/sym15030756}, + url={https://doi.org/10.3390/sym15030756} +} +@article{klemt2023complete, + title={Complete blood platelet and lymphocyte ratios increase diagnostic accuracy of periprosthetic joint infection following total hip arthroplasty}, + author={Klemt, C. and Tirumala, V. and Smith, E.J. and Xiong, L. and Kwon, Y.M.}, + journal={Arch. Orthop. Trauma Surg.}, + volume={143}, + year={2023}, + pages={1441--1449}, + doi={10.1007/s00402-021-04309-w} +} +@article{ji2017monocyte, + title={Monocyte/lymphocyte ratio predicts the severity of coronary artery disease: A syntax score assessment}, + author={Ji, H. and Li, Y. 
and Fan, Z. and Zuo, B. and Jian, X. and Li, L. and Liu, T.}, + journal={BMC Cardiovasc. Disord.}, + volume={17}, + year={2017}, + pages={1--8}, + doi={10.1186/s12872-017-0507-4} +} +@article{muller2019amyloid, + title={Amyloid-$\beta$ {PET}—Correlation with cerebrospinal fluid biomarkers and prediction of {Alzheimer{'}s} disease diagnosis in a memory clinic}, + author={M{\"u}ller, Ebba Gl{\o}ersen and Edwin, Trine Holt and Stokke, Caroline and Navelsaker, Sigrid Stensby and Babovic, Almira and Bogdanovic, Nenad and Knapskog, Anne Brita and Revheim, Mona Elisabeth}, + journal={PloS one}, + volume={14}, + number={8}, + pages={e0221365}, + year={2019}, + publisher={Public Library of Science San Francisco, CA USA} +} +@article{neumann2023combining, + title={Combining Multiple Multimodal Speech Features into an Interpretable Index Score for Capturing Disease Progression in {Amyotrophic Lateral Sclerosis}}, + author={Neumann, Michael and Kothare, Hardik and Ramanarayanan, Vikram}, + volume={2353}, + year={2023}, + journal={Interspeech} +} +@article{aznar2022stepwise, + title={A Stepwise Algorithm for Linearly Combining Biomarkers under {Youden} Index Maximization}, + author={Aznar-Gimeno, Roc{\'\i}o and Esteban, Luis M and del-Hoyo-Alonso, Rafael and Borque-Fernando, {\'A}ngel and Sanz, Gerardo}, + journal={Mathematics}, + volume={10}, + number={8}, + pages={1221}, + year={2022}, + publisher={MDPI} +} +@article{bansal2013does, + title={When does combining markers improve classification performance and what are implications for practice?}, + author={Bansal, Aasthaa and Sullivan Pepe, Margaret}, + journal={Statistics in medicine}, + volume={32}, + number={11}, + pages={1877--1892}, + year={2013}, + publisher={Wiley Online Library} +} +@article{faria2016neutrophil, + title={The neutrophil-to-lymphocyte ratio: a narrative review}, + author={Faria, Sara Socorro and Fernandes Jr, Paulo C{\'e}sar and Silva, Marcelo Jos{\'e} Barbosa and Lima, Vladmir C and Fontes, Wagner and 
Freitas-Junior, Ruffo and Eterovic, Agda Karina and Forget, Patrice}, + journal={ecancermedicalscience}, + volume={10}, + year={2016}, + publisher={ecancer Global Foundation} +} +@article{nyblom2006ast, + title={The {AST}/{ALT} ratio as an indicator of cirrhosis in patients with {PBC}}, + author={Nyblom, Helena and Bj{\"o}rnsson, Einar and Simr{\'e}n, Magnus and Aldenborg, Frank and Almer, Sven and Olsson, Rolf}, + journal={Liver International}, + volume={26}, + number={7}, + pages={840--845}, + year={2006}, + publisher={Wiley Online Library} +} +@article{pedregosa2011scikit, + title={Scikit-learn: Machine learning in {Python}}, + author={Pedregosa, Fabian and Varoquaux, Ga{\"e}l and Gramfort, Alexandre and Michel, Vincent and Thirion, Bertrand and Grisel, Olivier and Blondel, Mathieu and Prettenhofer, Peter and Weiss, Ron and Dubourg, Vincent and others}, + journal={the Journal of machine Learning research}, + volume={12}, + pages={2825--2830}, + year={2011}, + publisher={JMLR. org} +} +@misc{tensorflow2015-whitepaper, + title={ {TensorFlow}: Large-Scale Machine Learning on Heterogeneous Systems}, + url={https://www.tensorflow.org/}, + note={Software available from tensorflow.org}, + author={ + Mart\'{i}n~Abadi and + Ashish~Agarwal and + Paul~Barham and + Eugene~Brevdo and + Zhifeng~Chen and + Craig~Citro and + Greg~S.~Corrado and + Andy~Davis and + Jeffrey~Dean and + Matthieu~Devin and + Sanjay~Ghemawat and + Ian~Goodfellow and + Andrew~Harp and + Geoffrey~Irving and + Michael~Isard and + Yangqing~Jia and + Rafal~Jozefowicz and + Lukasz~Kaiser and + Manjunath~Kudlur and + Josh~Levenberg and + Dandelion~Man\'{e} and + Rajat~Monga and + Sherry~Moore and + Derek~Murray and + Chris~Olah and + Mike~Schuster and + Jonathon~Shlens and + Benoit~Steiner and + Ilya~Sutskever and + Kunal~Talwar and + Paul~Tucker and + Vincent~Vanhoucke and + Vijay~Vasudevan and + Fernanda~Vi\'{e}gas and + Oriol~Vinyals and + Pete~Warden and + Martin~Wattenberg and + Martin~Wicke and + 
Yuan~Yu and + Xiaoqiang~Zheng}, + year={2015} +} + diff --git a/_articles/RJ-2025-036/manuscript.pdf b/_articles/RJ-2025-036/manuscript.pdf new file mode 100644 index 0000000000..bf9753e06a Binary files /dev/null and b/_articles/RJ-2025-036/manuscript.pdf differ diff --git a/_articles/RJ-2025-037/RJ-2025-037.R b/_articles/RJ-2025-037/RJ-2025-037.R new file mode 100644 index 0000000000..4cb40646ff --- /dev/null +++ b/_articles/RJ-2025-037/RJ-2025-037.R @@ -0,0 +1,619 @@ +# Generated by `rjournal_pdf_article()` using `knitr::purl()`: do not edit by hand +# Please edit RJ-2025-037.Rmd to modify this file + +## ----setup, include=FALSE----------------------------------------------------- +knitr::opts_chunk$set(echo = FALSE, warning = FALSE, message = FALSE, cache = TRUE) +library(AcceptReject) +library(numDeriv) +library(ggplot2) +library(parallel) +library(bench) + + +## ----"fig-inspect-1", out.width="50%", fig.align='center', fig.cap = "Inspection of the probability density function of the random variable of interest with the base probability density function, with $c = 1$ (default) (a) and $c = 4.3$ (b).", eval = TRUE, echo = FALSE, fig.pos="H"---- +#| fig-subcap: +#| - "For $c = 1$ (default)." +#| - "For $c = 4.3$." 
+#| layout-ncol: 2 +library(AcceptReject) + +# Considering c = 1 (default) + inspect( + f = dweibull, + f_base = dunif, + xlim = c(0, 5), + args_f = list(shape = 2, scale = 1), + args_f_base = list(min = 0, max = 5), + c = 1 +) + +# Considering c = 4.3 + inspect( + f = dweibull, + f_base = dunif, + xlim = c(0, 5), + args_f = list(shape = 2, scale = 1), + args_f_base = list(min = 0, max = 5), + c = 4.3 +) + + +## ----eval = FALSE, echo = TRUE------------------------------------------------ +# # Install CRAN version +# install.packages("remotes") +# +# # Installing the development version from GitHub +# # or install.packages("remotes") +# remotes::install_github("prdm0/AcceptReject", force = TRUE) +# +# # Load the package +# library(AcceptReject) + + +## ----logo, out.width = "25%", fig.align = "center", fig.cap="Logo of the package.", eval=TRUE, echo = FALSE, fig.pos="H", echo=FALSE---- +knitr::include_graphics("figures/logo.png") + + +## ----echo = TRUE, eval = FALSE------------------------------------------------ +# inspect( +# f, +# args_f, +# f_base, +# args_f_base, +# xlim, +# c = 1, +# alpha = 0.4, +# color_intersection = "#BB9FC9", +# color_f = "#FE4F0E", +# color_f_base = "#7BBDB3" +# ) + + +## ----------------------------------------------------------------------------- +#| eval: FALSE +#| echo: TRUE +# library(AcceptReject) +# +# # Considering c = 1 (default) +# inspect( +# f = dweibull, +# f_base = dunif, +# xlim = c(0, 5), +# args_f = list(shape = 2, scale = 1), +# args_f_base = list(min = 0, max = 5), +# c = 1 +# ) + + +## ----eval = FALSE, echo = TRUE------------------------------------------------ +# accept_reject( +# n = 1L, +# continuous = TRUE, +# f = NULL, +# args_f = NULL, +# f_base = NULL, +# random_base = NULL, +# args_f_base = NULL, +# xlim = NULL, +# c = NULL, +# parallel = FALSE, +# cores = NULL, +# warning = TRUE, +# ... 
+# ) + + +## ----eval = TRUE, echo = TRUE------------------------------------------------- +set.seed(0) + +# Generate 100 observations from a random variable X with +# f_X(x) = 2x, 0 <= x <= 1. +x <- accept_reject( + n = 100L, + f = function(x) 2 * x, + args_f = list(), + xlim = c(0, 1), + warning = FALSE +) +print(x[1L:8L]) + + +## ----------------------------------------------------------------------------- +#| echo: TRUE +# setting a seed for reproducibility +set.seed(0) +x <- accept_reject( + n = 2000L, + f = dbinom, + continuous = FALSE, + args_f = list(size = 5, prob = 0.5), + xlim = c(0, 10) +) + +# Printing the first 10 (default) observations +print(x) + +# Printing the first 20 observations +print(x, n_min = 20L) + +# Summary +summary(x) + + +## ----eval = FALSE, echo = TRUE------------------------------------------------ +#| echo: TRUE +#| eval: FALSE +# ## S3 method for class 'accept_reject' +# plot( +# x, +# color_observed_density = "#BB9FC9", +# color_true_density = "#FE4F0E", +# color_bar = "#BB9FC9", +# color_observable_point = "#7BBDB3", +# color_real_point = "#FE4F0E", +# alpha = 0.3, +# hist = TRUE, +# ... +# ) + + +## ----"fig-plotfunc-1", fig.cap="Plotting the theoretical density function (a) and the probability mass function (b), with details of the respective parameters in the code.", echo = TRUE, out.width="50%", fig.align='center', fig.pos="H"---- +#| fig-subcap: +#| - "Weibull with $n = 2000$ observations." +#| - "Binomial with $n = 1000$ observations." +#| layout-ncol: 2 +library(AcceptReject) + +# Generating and plotting the theoretical density with the +# observed density. 
+ +# setting a seed for reproducibility +set.seed(0) + +# Continuous case +accept_reject( + n = 2000L, + continuous = TRUE, + f = dweibull, + args_f = list(shape = 2.1, scale = 2.2), + xlim = c(0, 10) +) |> + plot( + hist = FALSE, + color_true_density = "#2B8b99", + color_observed_density = "#F4DDB3", + alpha = 0.6 + ) # Changing some arguments in plot() + +# Discrete case +accept_reject( + n = 1000L, + f = dbinom, + continuous = FALSE, + args_f = list(size = 5, prob = 0.5), + xlim = c(0, 10) +) |> plot() + + +## ----"fig-poisson-1", echo = TRUE, fig.cap="Generating observations from a Poisson distribution using the acceptance-rejection method, with $n = 25$ (a) and $n = 2500$ (b), respectively.", fig.align='center', out.width="50%", fig.pos="H"---- +#| fig-subcap: +#| - "n = 25 observations." +#| - "n = 2500 observations." +#| layout-ncol: 2 +library(AcceptReject) +library(parallel) +library(cowplot) # install.packages("cowplot") + +# Ensuring reproducibility in parallel computing +RNGkind("L'Ecuyer-CMRG") +set.seed(0) +mc.reset.stream() + +# Simulation +simulation <- function(n, lambda = 0.7) + accept_reject( + n = n, + f = dpois, + continuous = FALSE, # discrete case + args_f = list(lambda = lambda), + xlim = c(0, 20), + parallel = TRUE # Parallelizing the code in Unix-based systems +) + +# Generating observations +# n = 25 observations +system.time({x <- simulation(25L)}) +plot(x) + +# n = 2500 observations +system.time({y <- simulation(2500L)}) +plot(y) + + +## ----echo = TRUE, fig.cap="Generating observations from a continuous random variable with a Standard Normal distribution.", fig.align='center', out.width="50%", fig.pos="H"---- +#| label: fig-normal +#| fig-cap: "Generating observations from a continuous random variable with a Standard Normal distribution, with $n = 50$ and $n = 500$ observations, respectively." +#| fig-subcap: +#| - "n = 50 observations." +#| - "n = 500 observations." 
+#| layout-ncol: 2 + +library(AcceptReject) +library(parallel) + +# Ensuring reproducibility in parallel computing +RNGkind("L'Ecuyer-CMRG") +set.seed(0) +mc.reset.stream() + +# Generating observations +accept_reject( + n = 50L, + f = dnorm, + continuous = TRUE, + args_f = list(mean = 0, sd = 1), + xlim = c(-4, 4), + parallel = TRUE +) |> plot() + +accept_reject( + n = 500L, + f = dnorm, + continuous = TRUE, + args_f = list(mean = 0, sd = 1), + xlim = c(-4, 4), + parallel = TRUE +) |> plot() + + +## ----echo = TRUE-------------------------------------------------------------- +#| label: modified_beta +#| echo: TRUE +#| eval: TRUE +#| +library(numDeriv) + +pdf <- function(x, G, ...){ + numDeriv::grad( + func = \(x) G(x, ...), + x = x + ) +} + +# Modified Beta Distributions +# Link: https://link.springer.com/article/10.1007/s13571-013-0077-0 +generator <- function(x, G, a, b, beta, ...){ + g <- pdf(x = x, G = G, ...) + numerator <- beta^a * g * G(x, ...)^(a - 1) * (1 - G(x, ...))^(b - 1) + denominator <- beta(a, b) * (1 - (1 - beta) * G(x, ...))^(a + b) + numerator/denominator +} + +# Probability density function - Modified Beta Weibull +pdf_mbw <- function(x, a, b, beta, shape, scale) + generator( + x = x, + G = pweibull, + a = a, + b = b, + beta = beta, + shape = shape, + scale = scale + ) + +# Checking the value of the integral +integrate( + f = \(x) pdf_mbw(x, 1, 1, 1, 1, 1), + lower = 0, + upper = Inf +) + + +## ----echo = TRUE, out.width="50%", fig.align='center', fig.pos="H"------------ +#| label: fig-nadarajah +#| fig.cap: Inspecting the Weibull distribution with shape = 2, scale = 1.2, with the support xlim = c(0, 4) and c = 1 (default) (a) and c = 2.2 (b), respectively. +#| fig.subcap: +#| - "For c = 1." +#| - "For c = 2.2." 
+#| layout-ncol: 2 + +library(AcceptReject) + +# True parameters +a <- 10.5 +b <- 4.2 +beta <- 5.9 +shape <- 1.5 +scale <- 1.7 + +# c = 1 (default) +inspect( + f = pdf_mbw, + f_base = dweibull, + xlim = c(0, 4), + args_f = list( + a = a, + b = b, + beta = beta, + shape = shape, + scale = scale + ), + args_f_base = list(shape = 2, scale = 1.2), + c = 1 +) + +# c = 2.2 +inspect( + f = pdf_mbw, + f_base = dweibull, + xlim = c(0, 4), + args_f = list( + a = a, + b = b, + beta = beta, + shape = shape, + scale = scale + ), + args_f_base = list(shape = 2, scale = 1.2), + c = 2.2 +) + + +## ----out.width="50%", fig.align='center', echo = FALSE, fig.pos="H"----------- +#| label: fig-benchmarking +#| fig.cap: Benchmarking for different sample sizes, considering the Weibull distribution and the uniform distribution as the base density, with Weibull distribution and Uniform distribution (default), respectively. +#| fig-subcap: +#| - "Weibull distribution." +#| - "Uniform distribution (default)." +#| layout-ncol: 2 + +library(AcceptReject) +library(numDeriv) # install.packages("numDeriv") +library(bench) # install.packages("bench") +library(ggplot2) # install.packages("ggplot2") +library(parallel) + +simulation <- function(n, parallel = TRUE, base = TRUE) { + # True parameters + a <- 10.5 + b <- 4.2 + beta <- 5.9 + shape <- 1.5 + scale <- 1.7 + c <- 2.2 + + # Generate data with the true parameters using + # the AcceptReject package. 
+ if (base) { + # Using the Weibull distribution as the base distribution + accept_reject( + n = n, + f = pdf_mbw, + args_f = list( + a = a, + b = b, + beta = beta, + shape = shape, + scale = scale + ), + f_base = dweibull, + args_f_base = list(shape = 2, scale = 1.2), + random_base = rweibull, + xlim = c(0, 4), + c = c, + parallel = parallel + ) + } else { + # Using the uniform distribution as the base distribution + accept_reject( + n = n, + f = pdf_mbw, + args_f = list( + a = a, + b = b, + beta = beta, + shape = shape, + scale = scale + ), + xlim = c(0, 4), + parallel = parallel + ) + } +} + +benchmark <- function(n_values, + time_unit = 's', + base = TRUE) { + # Initialize an empty data frame to store the results + results_df <- data.frame() + + # Run benchmarks for each sample size and each type of code + for (n in n_values) { + for (parallel in c(TRUE, FALSE)) { + results <- bench::mark( + simulation( + n = n, + parallel = parallel, + base = base + ), + time_unit = time_unit, + memory = FALSE, + check = FALSE, + filter_gc = FALSE + ) + + # Convert results to data frame and add columns for the sample + # size and type of code + results_df_temp <- as.data.frame(results) + results_df_temp$n <- n + results_df_temp + results_df_temp$Code <- ifelse(parallel, "Parallel", "Serial") + + # Append the results to the results data frame + results_df <- rbind(results_df, results_df_temp) + } + } + + # Create a scatter plot of the median time vs the sample size, + # colored by the type of code + ggplot(results_df, aes(x = n, y = median, color = Code)) + + geom_point() + + scale_x_log10() + + scale_y_log10() + + labs(x = "Sample Size (n)", y = "Median Time (s)", color = "Code Type") + + ggtitle("Benchmark Results") + + ggplot2::theme( + axis.title = ggplot2::element_text(face = "bold"), + title = ggplot2::element_text(face = "bold"), + legend.title = ggplot2::element_text(face = "bold"), + plot.subtitle = ggplot2::element_text(face = "plain") + ) +} + +# Sample sizes +n <- 
c(50, + 250, + 500, + 1e3, + 5e3, + 10e3, + 15e3, + 25e3, + 50e3, + 100e3, + 150e3, + 250e3, + 500e3, + 1e6) + +# Ensuring reproducibility in parallel computing +RNGkind("L'Ecuyer-CMRG") +set.seed(0) +mc.reset.stream() + +# Run the benchmark function for multiple sample sizes +n |> benchmark(n_values = _, base = TRUE) +n |> benchmark(n_values = _, base = FALSE) + + +## ----out.width="50%", fig.align='center', fig.pos='H'------------------------- +#| label: fig-bench +#| fig.cap: "Comparison between the AcceptReject and SimDesign package for different sample sizes, considering the generation of observations from a random variable with a Modified Beta Weibull distribution, serial processing with AcceptReject (a) and parallel processing with AcceptReject package (b), respectively." +#| fig-subcap: +#| - "Weibull distribution." +#| - "Uniform distribution (default)." +#| layout-ncol: 2 +#| warning: false +#| echo: false + +library(AcceptReject) +library(SimDesign) +library(numDeriv) +library(bench) +library(parallel) + +simulation_1 <- function(n, parallel = TRUE, base = TRUE) { + accept_reject( + n = n, + f = pdf_mbw, + args_f = list( + a = 10, + b = 1, + beta = 20.5, + shape = 2, + scale = 0.3 + ), + xlim = c(0, 1), + parallel = parallel + ) +} + +simulation_2 <- function(n) { + df = \(x) pdf_mbw( + x = x, + a = 10, + b = 1, + beta = 20.5, + shape = 2, + scale = 0.3 + ) + dg = \(x) dunif(x = x, min = 0, max = 1) + rg = \(n) runif(n = n, min = 0, max = 1) + + # when df and dg both integrate to 1, acceptance probability = 1/M + M <- + rejectionSampling(df = df, dg = dg, rg = rg) + rejectionSampling(n, + df = df, + dg = dg, + rg = rg, + M = M) +} + +benchmark <- function(n_values, parallel = TRUE) { + # Initialize an empty data frame to store the results + results_df <- data.frame() + + # Run benchmarks for each sample size and each type of code + filter_gc <- ifelse(parallel, FALSE, TRUE) + for (n in n_values) { + results <- bench::mark( + AcceptReject = simulation_1(n 
= n, parallel = parallel), + SimDesign = simulation_2(n = n), + time_unit = 's', + memory = FALSE, + check = FALSE, + filter_gc = filter_gc + ) + + # Convert results to data frame and add columns for the sample + # size and type of code + results_df_temp <- results + results_df_temp$n <- n + + # Append the results to the results data frame + results_df <- rbind(results_df, results_df_temp) + } + + # Create a scatter plot of the median time vs the sample size, + # colored by the type of code + ggplot(results_df, aes(x = n, y = median, color = expression)) + + geom_point() + + scale_x_log10() + + scale_y_log10() + + labs(x = "Sample Size (n)", y = "Median Time (s)", color = "Packages") + + ggtitle("Benchmark Results") + + theme( + axis.title = element_text(face = "bold"), + title = element_text(face = "bold"), + legend.title = element_text(face = "bold"), + plot.subtitle = element_text(face = "plain") + ) +} + +small_and_moderate_sample <- c(100, + 150, + 250, + 500, + 1e3, + 1500, + 2000, + 2500, + 3500, + 4500, + 5500, + 7500, + 10e3, + 25e3) +big_sample <- c(50e3, 75e3, 100e3, 150e3, 250e3, 500e3, 750e3, 1e6) +# Ensuring reproducibility in parallel computing +RNGkind("L'Ecuyer-CMRG") +set.seed(0) +mc.reset.stream() + +# Serial +benchmark(n_values = small_and_moderate_sample, parallel = FALSE) + +# Parallel +benchmark(n_values = big_sample, parallel = TRUE) + diff --git a/_articles/RJ-2025-037/RJ-2025-037.Rmd b/_articles/RJ-2025-037/RJ-2025-037.Rmd new file mode 100644 index 0000000000..72994b4d70 --- /dev/null +++ b/_articles/RJ-2025-037/RJ-2025-037.Rmd @@ -0,0 +1,899 @@ +--- +title: 'AcceptReject: An R Package for Acceptance-Rejection Method' +date: '2026-02-04' +abstract: | + The AcceptReject package, available for the R programming language on the Comprehensive R Archive Network (CRAN), versioned and maintained on GitHub, offers a simple and efficient solution for generating pseudo-random observations of discrete or continuous random variables using the 
acceptance-rejection method. This method provides a viable alternative for generating pseudo-random observations in univariate distributions when the inverse of the cumulative distribution function is not in closed form or when suitable transformations involving random variables that we know how to generate are unknown, thereby facilitating the generation of observations for the variable of interest. The package is designed to be simple, intuitive, and efficient, allowing for the rapid generation of observations and supporting multicore parallelism on Unix-based operating systems. Some components are written using C++, and the package maximizes the acceptance probability of the generated observations, resulting in even more efficient execution. The package also allows users to explore the generated pseudo-random observations by comparing them with the theoretical probability mass function or probability density function and to inspect the underlying probability density functions that can be used in the method for generating observations of continuous random variables. This article explores the package in detail, discussing its functionalities, benefits, and practical applications, and provides various benchmarks in several scenarios. +draft: no +author: +- name: Pedro Rafael Diniz Marinho + affiliation: Federal University of Paraíba + address: + - Department of Statistics + - Cidade Universitária, s/n, Departamento de Estatística - UFPB + orcid: 0000-0003-1591-8300 + email: pedro.rafael.mariho@gmail.com + url: https://github.com/prdm0 +- name: Vera L. D. 
Tomazella + affiliation: Federal University of São Carlos + address: + - Department of Statistics + - Rodovia Washington Luiz, Km 235, Monjolinho + orcid: 0000-0002-6780-2089 + email: vera@ufscar.br + url: https://www.servidores.ufscar.br/vera/ +type: package +output: + rjtools::rjournal_pdf_article: + toc: no + rjtools::rjournal_web_article: + self_contained: yes + toc: no + mathjax: https://cdn.jsdelivr.net/npm/mathjax@4/tex-mml-chtml.js +header-includes: +- \usepackage{subfig} +- \usepackage{float} +bibliography: RJreferences.bib +date_received: '2024-07-16' +volume: 17 +issue: 4 +slug: RJ-2025-037 +journal: + lastpage: 155 + firstpage: 133 + +--- + + +```{r setup, include=FALSE} +knitr::opts_chunk$set(echo = FALSE, warning = FALSE, message = FALSE, cache = TRUE) +library(AcceptReject) +library(numDeriv) +library(ggplot2) +library(parallel) +library(bench) +``` + + + + + + + + + +# Introduction + +The class of Monte Carlo methods and algorithms is a powerful and versatile computational technique that has been widely used in a variety of fields, from physics, statistics, and economics to biology and areas such as social sciences. Through the modeling of complex systems and the performance of stochastic experiments, Monte Carlo simulation allows researchers to explore scenarios that would be impractical or impossible to investigate through traditional experimental methods. + +A critical component of any Monte Carlo simulation is the ability to generate pseudo-random observations of a sequence of random variables that follow a given probability distribution. In many cases, these variables can be discrete or continuous, and the ability to generate observations from a sequence of random variables efficiently and accurately is fundamental in simulation studies. 
+ +There are several techniques for generating observations of a random variable, many of which require either a closed-form quantile function for the distribution of interest or a known transformation of random variables that we already know how to generate. For many distributions, however, the inverse of the cumulative distribution function (the quantile function) has no closed form, and no suitable transformation is known. In such situations, computational methods such as the Acceptance-Rejection Method (ARM), proposed by John von Neumann in 1951 [see @neumann1951various], are a viable alternative for generating observations in the univariate context. This is especially relevant today, when many new distributions and distribution generators (functions that produce a new probability distribution from a known one) are being proposed: the ARM is widely used because it applies to random variables of either a discrete or a continuous nature. + +In the case of univariate distributions, the ARM is an effective technique for generating pseudo-random observations and has several advantages over methods such as the Metropolis-Hastings (MH) algorithm. The ARM, often also called the Rejection Method, is easily parallelized. Moreover, it is not sensitive to initial parameters, as the MH algorithm is, and it produces independent observations. Additionally, the ARM requires no "burn-in" period, and there is no need to wait for the algorithm to reach a stationary regime, as there is with MH. 
The ARM, when well implemented, can be a great alternative for generating observations of random variables in the univariate case and is worth considering in many situations before opting for computationally more expensive methods such as MH or Gibbs sampling. + +The \CRANpkg{AcceptReject} package [@AcceptReject] for the R programming language, also available and maintained at , was developed specifically to handle these challenges. It offers a simple and efficient solution for generating pseudo-random observations of discrete or continuous random variables from univariate distributions, using the ARM for sequences of random variables whose probability mass function (discrete case) or probability density function (continuous case) is complex or poorly explored. The library has detailed documentation and vignettes to assist user understanding. On the package's website, you can find usage examples and the complete documentation of the package, in addition to the vignettes. + +The design of the \CRANpkg{AcceptReject} package is simple and intuitive, allowing users to generate observations quickly, efficiently, and in a parallelized manner, using multicore parallelism on Unix-based operating systems. The package also performs well on Windows, since much of the library's performance comes from optimizing the probability of accepting observations of the auxiliary random variable as observations of the random variable of interest. Additionally, some routines have been further optimized using the \CRANpkg{Rcpp} [@eddelbuettel2024package] and \CRANpkg{RcppArmadillo} [@eddelbuettel2014rcpparmadillo] libraries.
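As a first glimpse of the interface detailed in the following sections, a minimal sketch of a call to `accept_reject()` generating observations from a Weibull target (the same distribution used in the article's examples); the chunk is illustrative only (`eval = FALSE`), mirroring the argument list documented below:

```{r, eval = FALSE, echo = TRUE}
library(AcceptReject)

set.seed(0)
# Sketch: generate 1000 observations of X ~ Weibull(shape = 2, scale = 1),
# letting the package optimize the constant c over the support [0, 5].
x <- accept_reject(
  n = 1000L,
  f = dweibull,
  args_f = list(shape = 2, scale = 1),
  xlim = c(0, 5)
)
```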
+ +Through the `accept_reject()` function, the library only requires the user to provide the probability mass function (for discrete variables) or the probability density function (for continuous variables) from which they wish to generate observations, together with the list of arguments of the distribution of interest. Other optional arguments can be passed, such as the normalization constant (obtained automatically by default), arguments controlling the optimization method, and a base probability mass function (discrete case) or probability density function (continuous case), should the user wish to specify a base distribution, which can be useful for more complex distributions. + +Using simple functions such as [`inspect()`](https://prdm0.github.io/AcceptReject/reference/) and [`plot()`](https://prdm0.github.io/AcceptReject/reference/plot.accept_reject.html), the user can quickly inspect the base probability density function and the quality of the generated pseudo-random observations without spending much time on plotting, making the analysis process faster, which is as important as computational efficiency. + +In this article, we explore the \pkg{AcceptReject} package in detail. We discuss how it can be used, the functionalities it offers, and the benefits it brings to users. Through examples and discussions, we hope to demonstrate the value of the \pkg{AcceptReject} package as a useful addition to the toolbox of any researcher or professional working with Monte Carlo simulations. + +Section 2 discusses the ARM and how it can be used to generate pseudo-random observations of random variables. Section 3 lists and references the dependencies of the current version of the package and presents some computational details considered in the examples and simulations. Section 4 shows how to install and load the package from CRAN and GitHub.
Section 5 discusses each of the functions exported by the \pkg{AcceptReject} package, with examples that help expose the details. Section 6 presents further examples of the package's use, generating pseudo-random observations of discrete and continuous random variables. Section 7 is dedicated to a more complex problem, in a more realistic scenario, in which a probability distribution generator proposed by @nadarajah2014modified is used to obtain the Modified Beta Weibull (MBW) probability density function, whose quantile function is unknown; the ARM, applied through the \pkg{AcceptReject} package, solves the problem of generating observations of a random variable with the MBW distribution. Section 8 presents benchmarks in different scenarios to demonstrate the computational efficiency of the package. Section 9 compares similar works with the \CRANpkg{AcceptReject} package, presenting design and performance advantages. Finally, Section 10 concludes with suggestions for future improvements to the package. + +# Acceptance-Rejection Method - ARM + +The acceptance-rejection method (ARM), proposed by @neumann1951various and often simply called the rejection method, is useful for generating pseudo-random observations of discrete or continuous random variables, mainly in the context of univariate distributions, at which the \pkg{AcceptReject} package is aimed. The method is based on the idea that if we can find a probability distribution $G_Y$ of a random variable $Y$ whose density, multiplied by a suitable constant, envelops (bounds) the density of the distribution $F_X$ of the random variable $X$ of interest, and $G_Y$ and $F_X$ share the same support, then we can generate observations of $X$.
In addition to sharing the same support and having $c\,g_Y$ bound $f_X$, it is necessary that we can generate observations of $Y$; that is, in practice, we have a function that generates observations of this random variable. Through the generator of $Y$, we can then generate observations of the random variable of interest $X$, accepting or rejecting the observations produced by the generator of $Y$, based on the probability density functions or probability mass functions of $X$ and $Y$, in the continuous or discrete case, respectively. + +Consider a hypothetical example, with $x,y \in [0, 5]$, where $x$ and $y$ are observations of the random variables $X$ (variable of interest) and $Y$ (variable from which we can generate observations), respectively, and assume we do not know how to generate observations of $X$, say, from the Weibull distribution (in practice, this is not true), where $X \sim \text{Weibull}(\alpha = 2, \beta = 1)$, with $Y \sim \mathcal{U}(a = 0, b = 5)$ and a constant $c \geq 1$ multiplying the density of $Y$. In this example, given a density function used as a base (the density of $Y$), the initial idea of ARM is to make the base density, denoted by $g_Y$ and multiplied by $c$, envelop (bound) the probability density function of interest, that is, envelop $f_X$. Figure \@ref(fig:fig-inspect-1) (a) and Figure \@ref(fig:fig-inspect-1) (b) illustrate the procedure for choosing the constant $c$ for ARM, with $c = 1$ and $c = 4.3$, respectively. + +```{r "fig-inspect-1", out.width="50%", fig.align='center', fig.cap = "Inspection of the probability density function of the random variable of interest with the base probability density function, with $c = 1$ (default) (a) and $c = 4.3$ (b).", eval = TRUE, echo = FALSE, fig.pos="H"} +#| fig-subcap: +#| - "For $c = 1$ (default)." +#| - "For $c = 4.3$."
+#| layout-ncol: 2 +library(AcceptReject) + +# Considering c = 1 (default) + inspect( + f = dweibull, + f_base = dunif, + xlim = c(0, 5), + args_f = list(shape = 2, scale = 1), + args_f_base = list(min = 0, max = 5), + c = 1 +) + +# Considering c = 4.3 + inspect( + f = dweibull, + f_base = dunif, + xlim = c(0, 5), + args_f = list(shape = 2, scale = 1), + args_f_base = list(min = 0, max = 5), + c = 4.3 +) +``` + +Notice that the support of the base probability density function $g_Y$ must be the same as $f_X$, since the observations generated from $Y$ are accepted or rejected as observations of $X$ (the random variable of interest). The interval $x,y \in [0, 5]$ was chosen because the density near the upper limit of this interval quickly drops to values close to zero, eliminating the need to consider a broader interval, although it could be considered. It is possible to see, with the help of Figure \@ref(fig:fig-inspect-1) (a), that $c = 1$ is not a good choice, as much of the density $f_X$ would be left out and not captured by $g_Y$, i.e., it would not be bounded by $g_Y$. Therefore, increasing the value of $c$ in this hypothetical example is necessary for good generation of observations of $X$. Note in Figure \@ref(fig:fig-inspect-1) (a) that the intersection area is approximately 0.39, and in the optimal case, we should have $c \geq 1$, such that the intersection area between $f_X$ and $g_Y$ is 1, and $c$ is the smallest possible value. A small value of $c$ will always imply a higher probability of accepting observations of $Y$ as observations of $X$. Thus, the choice of $c$ will be a trade-off between computational efficiency (higher acceptance probability) and ensuring that the base density bounds the density of interest. + +Note now that for $c = 4.3$, in Figure \@ref(fig:fig-inspect-1) (b), the intersection area is equal to $1$ and $g_Y$ does not excessively bound $f_X$, thus becoming a convenient value. 
In this way, $c = 4.3$ could be an appropriate value for the given example. However, a larger value of $c$ could be used, which would decrease the computational efficiency of ARM, for reasons that will be presented later. + +What makes the method interesting is that the iterations of ARM are statistically independent, making it easily parallelizable. Although it can be extended to the bivariate or multivariate case, the method is most commonly used to generate univariate observations. Moreover, if the distribution of the random variable from which pseudo-random observations are to be generated is indexed by many parameters, ARM can be applied without major impacts, as the number of parameters usually does not affect the efficiency of the method. + +Suppose $X$ and $Y$ are random variables with probability density function (pdf) or probability function (pf) $f$ and $g$, respectively. Furthermore, suppose there exists a constant $c$ such that + +$$\frac{f(x)}{g(x)} \leq c,$$ for every value of $x$ with $f(x) > 0$. To use the acceptance-rejection method to generate observations of the random variable $X$ using the algorithm below, first find a random variable $Y$ with pdf or pf $g$ that satisfies the above condition. + +It is important that the chosen random variable $Y$ is one whose observations can be generated easily. This is because the acceptance-rejection method is computationally more intensive than more direct methods such as the transformation method or the inversion method, which only requires the generation of pseudo-random numbers with a uniform distribution.
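The envelope construction can be sketched in base R for the Weibull/uniform example above, independently of the package: the smallest admissible $c$ is the maximum of the ratio $f_X(x)/g_Y(x)$ over the support, and proposals from $g_Y$ are then accepted with probability $f_X(y)/(c\,g_Y(y))$. A plain sketch (`eval = FALSE`, illustrative only):

```{r, eval = FALSE, echo = TRUE}
# Base-R sketch of ARM for X ~ Weibull(2, 1) with base Y ~ U(0, 5)
f <- function(x) dweibull(x, shape = 2, scale = 1)
g <- function(x) dunif(x, min = 0, max = 5)

# Smallest c with f(x) <= c * g(x): maximize the ratio f/g over the support
c_opt <- optimize(function(x) f(x) / g(x),
                  interval = c(0, 5), maximum = TRUE)$objective
c_opt # close to the value c = 4.3 used in the inspection above

# Accept a proposal y when u < f(y) / (c * g(y)); about 1/c are accepted
set.seed(0)
n <- 5000L
out <- numeric(0)
while (length(out) < n) {
  y <- runif(n, min = 0, max = 5) # proposals from the base distribution
  u <- runif(n)
  out <- c(out, y[u < f(y) / (c_opt * g(y))])
}
out <- out[seq_len(n)]
```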
+ +**Algorithm of the Acceptance-Rejection Method**: + +1 - Generate an observation $y$ from a random variable $Y$ with pdf/pf $g$; + +2 - Generate an observation $u$ from a random variable $U\sim \mathcal{U} (0, 1)$; + +3 - If $u < \frac{f(y)}{cg(y)}$, accept $x = y$; otherwise, reject $y$ as an observation of the random variable $X$ and go back to step 1. + +**Proof**: Consider the discrete case, that is, $X$ and $Y$ are random variables with pfs $f$ and $g$, respectively. By step 3 of the algorithm above, we have $\{accept\} = \{x = y\} = \left\{u < \frac{f(y)}{cg(y)}\right\}$. That is, + +$$P(accept | Y = y) = \frac{P(accept \cap \{Y = y\})}{P(Y = y)} = \frac{P(U < f(y)/cg(y)) \times g(y)}{g(y)} = \frac{f(y)}{cg(y)}.$$ Hence, by the [**Law of Total Probability**](https://en.wikipedia.org/wiki/Law_of_total_probability), we have: + +$$P(accept) = \sum_y P(accept|Y=y)\times P(Y=y) = \sum_y \frac{f(y)}{cg(y)}\times g(y) = \frac{1}{c}.$$ Therefore, by the acceptance-rejection method, we accept an occurrence of $Y$ as an occurrence of $X$ with probability $1/c$. Moreover, by Bayes' Theorem, we have + +$$P(Y = y | accept) = \frac{P(accept|Y = y)\times g(y)}{P(accept)} = \frac{[f(y)/cg(y)] \times g(y)}{1/c} = f(y).$$ The result above shows that accepting $x = y$ by the algorithm's procedure is equivalent to accepting a value from $X$ that has pf $f$. For the continuous case, the proof is similar. + +Notice that to reduce the computational cost of the method, we should choose $c$ so as to maximize $P(accept) = 1/c$, that is, choose the smallest admissible $c$. Choosing an excessively large value of the constant $c$ will reduce the probability of accepting an observation from $Y$ as an observation of the random variable $X$. + +Computationally, it is convenient to consider $Y$ as a random variable with a uniform distribution on the support of $f$, since generating observations from a uniform distribution is straightforward on any computer.
For the discrete case, considering $Y$ with a discrete uniform distribution might be a good alternative. + +Choosing a large value for $c$ increases the chances of generating valid observations of $X$, since a large $c$ will, in many situations, allow the base distribution $g$ to bound $f$. The problem, however, is the computational cost. Selecting an appropriate value for $c$ is therefore an optimization problem, and one that can be automated. It is also worth noting that an excessively large value of $c$, although it leads to slow code, is a less serious problem than an excessively small value of $c$, which leads to the generation of poor observations of $X$. Since this is an optimization problem, it is possible to choose a convenient value for $c$ that is neither too large nor too small, generating good observations of $X$ at a very acceptable computational cost. + +The \pkg{AcceptReject} package therefore aims to automate some of these tasks, including the optimization of the value of $c$. Given a probability density function (continuous case) or probability mass function (discrete case) $g$, obtaining the smallest value of $c$ such that $c \geq 1$, combined with the possibility of code parallelism, is an excellent way to reduce the computational cost of ARM. This is, among other things, what the \pkg{AcceptReject} package does through its [`accept_reject()`](https://prdm0.github.io/AcceptReject/reference/accept_reject.html) function and other available functions. In the following sections, we will discuss each of them in detail. + +The first article describing the acceptance-rejection method is by von Neumann, titled "Various techniques used in connection with random digits" from 1951 [@neumann1951various].
Several common books in the field of stochastic simulations and Monte Carlo methods also detail the method, such as [@kroese2013handbook], [@asmussen2007stochastic], [@kemp2003discrete], and [@gentle2003random]. + +# Dependencies and Some Computational Details + +The \pkg{AcceptReject} package [@AcceptReject] has several dependencies, which, in its most current version, can be observed in the [DESCRIPTION](https://github.com/prdm0/AcceptReject/blob/main/DESCRIPTION) file. In the current version on GitHub, version 0.1.2, the dependencies are \CRANpkg{assertthat} [@assertthat], \CRANpkg{cli} [@cli], \CRANpkg{ggplot2} [@ggplot2], \CRANpkg{glue} [@glue], \CRANpkg{numDeriv} [@numDeriv], \CRANpkg{purrr} [@purrr], \CRANpkg{Rcpp} [@eddelbuettel2024package], \CRANpkg{RcppArmadillo} [@eddelbuettel2014rcpparmadillo], \CRANpkg{rlang} [@rlang], \CRANpkg{scales} [@scales], and \CRANpkg{scattermore} [@scattermore]. Suggested libraries include \CRANpkg{knitr} [@knitr], \CRANpkg{rmarkdown} [@rmarkdown], \CRANpkg{cowplot} [@cowplot], and \CRANpkg{testthat} [@testthat]. + +Some examples use the pipe operator `|>`, native to the R language, which requires version 4.1.0 or higher of the language. However, this operator is not essential in the examples and can be easily omitted. The `%>%` operator from the \CRANpkg{magrittr} package [@magrittr] can also be used, as the \pkg{purrr} package exports it. In fact, the `|>` operator is not used internally in any function of the \pkg{AcceptReject} package, in the current version on GitHub, so that the package can pass the automatic checks of GitHub Actions, whose tests also consider older versions of the R language. + +Additionally, some examples load the \CRANpkg{parallel} library, included in the R language since version 2.14 in 2011.
In these examples, where ARM executions are performed in parallel, it is necessary to call the instructions `RNGkind("L'Ecuyer-CMRG")` and `mc.reset.stream()` to ensure the reproducibility of the results. The `RNGkind("L'Ecuyer-CMRG")` instruction sets the L'Ecuyer-CMRG pseudo-random number generator [@l1999good; @l2002object], which is safe to use in parallel computations, as it provides multiple independent streams. The `mc.reset.stream()` function resets the stream of random numbers, ensuring that a subsequent execution generates the same sequence again. + +To take advantage of the parallelism of ARM implemented in the \CRANpkg{AcceptReject} package, which works on Unix-based operating systems, it is not necessary to load the \CRANpkg{parallel} library, as the [`accept_reject()`](https://prdm0.github.io/AcceptReject/reference/accept_reject.html) function of the \pkg{AcceptReject} package already makes use of parallelism. Loading the \pkg{parallel} library is only needed if reproducibility is desired when execution is done in parallel using multiple processes, that is, when there is interest in executing the instructions `RNGkind("L'Ecuyer-CMRG")` and `mc.reset.stream()`. + +Simulations for benchmarking in parallel and non-parallel scenarios were performed on a computer with the Arch Linux operating system, an Intel Core(TM) i7-1260P processor with 16 threads, a maximum processor frequency of 4.70 GHz, and 16 GB of RAM. The version of the R language was 4.4.0, and the computational times reported are on a base-10 logarithmic scale. + +The \pkg{AcceptReject} package makes use of the S3 object-oriented system in R. Methods such as `plot()`, `qqplot()`, and `print()` were exported to dispatch on the `accept_reject` class of the \pkg{AcceptReject} package, making it easier to use.
Other object-oriented systems, such as the R6 system [@r6], were not considered, as this would make the use of the package non-idiomatic, and some users might feel discouraged from using the library. Historical details about the programming paradigms of the R language can be found in [@chambers2014object]. + +# Installation and loading the package + +The \pkg{AcceptReject} package is available on CRAN and GitHub and can be installed using the following commands: + +```{r, eval = FALSE, echo = TRUE} +# Install the CRAN version +install.packages("AcceptReject") + +# Installing the development version from GitHub +# (requires the remotes package: install.packages("remotes")) +remotes::install_github("prdm0/AcceptReject", force = TRUE) + +# Load the package +library(AcceptReject) +``` + +```{r logo, out.width = "25%", fig.align = "center", fig.cap="Logo of the package.", eval=TRUE, echo=FALSE, fig.pos="H"} +knitr::include_graphics("figures/logo.png") +``` + +To access the latest updates of the \pkg{AcceptReject} package versions, check the [changelog](https://prdm0.github.io/AcceptReject/news/). For suggestions, questions, or to report issues and bugs, please open an [issue](https://github.com/prdm0/AcceptReject/issues). For a more general and quick overview of the package, read the [README.md](https://github.com/prdm0/AcceptReject) file or visit the package's website at to explore usage examples. + +# Function details and package usage + +The \pkg{AcceptReject} package provides functions not only to generate observations using ARM in an optimized manner, but also auxiliary functions that allow inspecting the base density function $g_Y$ as well as the generated pseudo-random observations, making the generation and inspection work efficient, easy, and intuitive.
It is efficient from a computational point of view because it automatically optimizes the constant $c$, thereby maximizing the probability $1/c$ of accepting observations of $Y$ as observations of $X$, in addition to allowing multicore parallelism on Unix-based operating systems. The computational efficiency combined with the ease of inspecting $g_Y$ and the generated observations makes the package pleasant to use. The \pkg{AcceptReject} library provides the following functions: + +- [`inspect()`](https://prdm0.github.io/AcceptReject/reference/inspect.html): a useful function for inspecting the probability density function $f_X$ together with the base probability density function $g_Y$, highlighting the intersection between the two functions and the value of the areas, allowing experimentation with different values of $c$. This function does not perform any optimization; it simply facilitates the inspection of the proposed $g_Y$ as a base probability density function, returning an object of the secondary class `gg` and the primary class `ggplot` for graphs created with the \pkg{ggplot2} library, as shown in Figure \@ref(fig:fig-inspect-1) (a) and Figure \@ref(fig:fig-inspect-1) (b); + +- [`accept_reject()`](https://prdm0.github.io/AcceptReject/reference/accept_reject.html): implements ARM, optimizes the constant $c$, and performs parallelism if specified by the user. The user can also control details of the optimization process, or fix a value for $c$, in which case the optimization step is skipped. Additionally, it is possible to specify (or not) a base probability mass function or probability density function $g_Y$, depending on whether $X$ is a discrete or continuous random variable, respectively. If omitted, a discrete or continuous uniform distribution will be used for $Y$, depending on the nature of $X$.
Moreover, when the value of the constant $c$ is not specified, the user can provide an initial guess for the optimization process. A good guess can be obtained by graphically inspecting the relationship between $f_X$ and $g_Y$. In most cases, the `accept_reject()` function will achieve good acceptance rates without a $g_Y$ different from the default discrete or continuous uniform distribution, and will produce a good estimate of $c$; + +- [`print.accept_reject()`](https://prdm0.github.io/AcceptReject/reference/print.accept_reject.html): a function responsible for printing useful information on the screen about objects of the `accept_reject` class returned by the [`accept_reject()`](https://prdm0.github.io/AcceptReject/reference/accept_reject.html) function, such as the number of generated observations, the estimated constant $c$, and the estimated probability of accepting observations of $Y$ as observations of $X$; + +- [`plot.accept_reject()`](https://prdm0.github.io/AcceptReject/reference/plot.accept_reject.html): operates on objects of the `accept_reject` class returned by the [`accept_reject()`](https://prdm0.github.io/AcceptReject/reference/accept_reject.html) function. The [`plot.accept_reject()`](https://prdm0.github.io/AcceptReject/reference/plot.accept_reject.html) function returns an object of the secondary class `gg` and the primary class `ggplot` from the \pkg{ggplot2} package, allowing easy graphical comparison of the probability density function or probability mass function $f_X$ with the observed probability density function or mass function from the generated data; + +- [`qqplot()`](https://prdm0.github.io/AcceptReject/reference/qqplot.html): constructs a Quantile-Quantile plot (QQ-plot) to compare the distribution of data generated by ARM (observed distribution) with the distribution of the random variable $X$ (theoretical distribution, denoted by $f_X$).
In a very simple way, the QQ-plot is produced by passing an object of the `accept_reject` class, returned by the `accept_reject()` function, to the `qqplot()` function. + +In the following subsections, more details about the functions exported by the \pkg{AcceptReject} package will be presented. Examples are provided to facilitate understanding. Additional details about the functions can be found in the function documentation, which can be accessed using `help(package = "AcceptReject")` or on the package's website, hosted in the GitHub repository that versions the development of the library at . + +## Inspecting Density Functions + +The [`inspect()`](https://prdm0.github.io/AcceptReject/reference/inspect.html) function of the \pkg{AcceptReject} package is useful when we want to generate pseudo-random observations of a continuous random variable using ARM. It is possible to skip this inspection since the `accept_reject()` function already automatically considers the continuous uniform distribution as the base density and optimizes, based on this base distribution, the best value for the constant $c$. Additionally, other base densities $g_Y$ can be specified to the `accept_reject()` function, where the search for the constant $c$ will be done automatically, optimizing the value of $c$ and its relationship with $f_X$ and $g_Y$, given $g_Y$. + +To specify other base density functions $g_Y$, it is prudent to perform a graphical inspection of the relationship between $f_X$ and $g_Y$ to get an idea of the reasonableness of the candidate base probability density function $g_Y$. Thus, the `inspect()` function will automatically plot a graph with some useful information, as well as the functions $f_X$ and $g_Y$. 
The `inspect()` function will return an object with the secondary class `gg` and the primary class `ggplot` (a graph made with the \pkg{ggplot2} library), highlighting the intersection area between $f_X$ and $g_Y$ and the value of this area for a given $c$, specified in the `c` argument of the `inspect()` function. + +Theoretically, you can use any function $g_Y$ whose support is equivalent to that of the function $f_X$, finding the appropriate value of $c$ that will make $g_Y$ envelop $f_X$, that is, so that the value of the intersection area integrates to 1. The `inspect()` function has the following form: + +```{r, echo = TRUE, eval = FALSE} +inspect( + f, + args_f, + f_base, + args_f_base, + xlim, + c = 1, + alpha = 0.4, + color_intersection = "#BB9FC9", + color_f = "#FE4F0E", + color_f_base = "#7BBDB3" +) +``` + +where: + +1. `f`: is the probability density function $f_X$ of the random variable $X$ of interest; +2. `args_f`: is a list of arguments that will be passed to the probability density function $f_X$ and that specify the parameters of the density of $X$; +3. `f_base`: is the probability density function that will be used as the candidate base density $g_Y$; +4. `args_f_base`: is a list of arguments that will be passed to the base probability density function $g_Y$ and that specify the parameters of the density of $Y$; +5. `xlim`: is a vector of size two that specifies the support of the functions $f_X$ and $g_Y$ (they must be equivalent); +6. `c`: is the constant $c$ that will be used to multiply the base probability density function $g_Y$, so it envelops (bounds) $f_X$. The default is `c = 1`; +7. `alpha`: is the transparency of the intersection area (default is `alpha = 0.4`); +8. `color_intersection`: is the color of the intersection area between $f_X$ and $g_Y$, where the default is `color_intersection = "#BB9FC9"`; +9. `color_f`: is the color of the curve of the probability density function $f_X$ (default is `color_f = "#FE4F0E"`); +10. 
`color_f_base`: is the color of the curve of the base probability density function $g_Y$, where the default is `color_f_base = "#7BBDB3"`. + +Figure \@ref(fig:fig-inspect-1) (a) and Figure \@ref(fig:fig-inspect-1) (b), presented at the beginning of this paper, were automatically created with the [`inspect()`](https://prdm0.github.io/AcceptReject/reference/inspect.html) function. As an example, in addition to those available in the package documentation and vignettes, here is the code that generated the graph shown in Figure \@ref(fig:fig-inspect-1) (a): + +```{r} +#| eval: FALSE +#| echo: TRUE +library(AcceptReject) + +# Considering c = 1 (default) +inspect( + f = dweibull, + f_base = dunif, + xlim = c(0, 5), + args_f = list(shape = 2, scale = 1), + args_f_base = list(min = 0, max = 5), + c = 1 +) +``` + +## Generating Observations with ARM + +The most important function of the \pkg{AcceptReject} package is the `accept_reject()` function, as it is the function that implements ARM and all the optimizations needed to generate good observations of $X$. The `accept_reject()` function has the following signature: + +```{r, eval = FALSE, echo = TRUE} +accept_reject( + n = 1L, + continuous = TRUE, + f = NULL, + args_f = NULL, + f_base = NULL, + random_base = NULL, + args_f_base = NULL, + xlim = NULL, + c = NULL, + parallel = FALSE, + cores = NULL, + warning = TRUE, + ... +) +``` + +where: + +1. `n`: is the number of observations to be generated (default `n = 1L`); +2. `continuous`: is a logical value that indicates whether the probability density function $f_X$ is continuous (default `continuous = TRUE`) or discrete (`continuous = FALSE`); +3. `f`: is the probability density function $f_X$ of the random variable $X$ of interest; +4. `args_f`: is a list of arguments that will be passed to the probability density function $f_X$ and that specify the parameters of the density of $X$. No matter how many parameters there are in `f`, they should be passed as a list to `args_f`; +5. 
`f_base`: is the probability density function that will be used as the candidate base density $g_Y$. It is important to note that this argument is only useful when `continuous = TRUE`, as for the discrete case the package already has quite satisfactory computational performance. Additionally, visualizing a probability mass function that bounds `f_X` is more complicated than in the continuous case. In the discrete case, when `continuous = FALSE` and `f_base = NULL`, the discrete uniform distribution will be used; +6. `random_base`: is a function that generates observations from the base distribution $Y$ (default `random_base = NULL`). If `random_base = NULL`, the base distribution $Y$ will be considered uniform over the interval specified in the `xlim` argument; +7. `args_f_base`: is a list of arguments that will be passed to the base probability density function $g_Y$ and that specify the parameters of the density of $Y$. No matter how many parameters there are in `f_base`, they should be passed as a list to `args_f_base`; +8. `xlim`: is a vector of size two that specifies the support of the functions $f_X$ and $g_Y$. It is important to remember that the support of $f_X$ and $g_Y$ must be equivalent and is informed by a single vector of size two passed as an argument to `xlim`; +9. `c`: is the constant $c$ that will be used to multiply the base probability density function $g_Y$, so it envelops (bounds) $f_X$. The default is `c = NULL`, which is a good choice unless there is a very strong reason to fix $c$, because the `accept_reject()` function will then attempt to find an optimal value for the constant $c$; +10. `parallel`: is a logical value that indicates whether the generation of observations will be done in parallel (default `parallel = FALSE`). If `parallel = TRUE`, the generation of observations will be done in parallel on Unix-based systems, using the total number of cores available on the system. 
If `parallel = TRUE` and the operating system is Windows, the code will run serially, and no error will be returned even if `parallel = TRUE`; +11. `cores`: is the number of cores that will be used for parallel observation generation. If `parallel = TRUE` and `cores = NULL`, the number of cores used will be the total number of cores available on the system. If the user wishes to use a smaller number of cores, they can define it in the `cores` argument; +12. `warning`: is a logical value that indicates whether warning messages will be printed during the execution of the function (default `warning = TRUE`). If the user specifies a very small domain in `xlim`, the `accept_reject()` function will issue a warning informing that the specified domain is too small and that the generation of observations may be compromised; +13. `...`: additional arguments that the user might want to pass to the `optimize` function used to optimize the value of `c`. + +In a simpler use case, where the user does not wish to specify the base density function $g_Y$ passed as an argument to `f_base`, useful for generating observations of a sequence of continuous random variables ($x_1, \cdots, x_n$), the use of the `accept_reject()` function is quite straightforward. For example, to generate 100 observations of a random variable $X$ with a probability density function $f_X(x) = 2x$, $0 \leq x \leq 1$, the user could do: + +```{r, eval = TRUE, echo = TRUE} +set.seed(0) + +# Generate 100 observations from a random variable X with +# f_X(x) = 2x, 0 <= x <= 1. +x <- accept_reject( + n = 100L, + f = function(x) 2 * x, + args_f = list(), + xlim = c(0, 1), + warning = FALSE +) +print(x[1L:8L]) +``` + +Note that in this case, if `warning = TRUE` (default), the `accept_reject()` function cannot know that `xlim = c(0, 1)` specifies the entire support of the probability density function $f_X(x) = 2x$, $0 \leq x \leq 1$, and that is why it issues a warning. 
Typically, one chooses the support passed to the `xlim` argument so that, below the lower limit (first element of `xlim`) and above the upper limit (second element of `xlim`), the probability mass (in the discrete case) or density (in the continuous case) is close to zero or not defined. In this case, we have deliberately set `warning = FALSE`.
+
+## Printing the Object with Generated Observations
+
+The `accept_reject()` function returns an object of class `accept_reject`: essentially an atomic vector with the observations generated by ARM, carrying attributes and marked with a class specific to the \pkg{AcceptReject} package. These attributes carry information used internally by the package, for example by the `plot.accept_reject()` method.
+
+Using R's S3 object-oriented system, the `print()` function dispatches on objects of the `accept_reject` class, invoking the `print.accept_reject()` method from the \pkg{AcceptReject} package. On an object of the `accept_reject` class, it is also possible to use methods that operate on atomic vectors, such as `summary()`, `mean()`, and `var()`, among others. The `print()` function prints useful information about the object, such as the number of observations generated, the value of `c` used, the acceptance probability, the first generated observations, and the considered `xlim` interval. The `summary()` function returns descriptive statistics about the generated observations: the mean, median, variance, standard deviation, minimum, maximum, and first and third quartiles.
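The S3 dispatch just described can be illustrated with a toy class; `my_class` and its `print` method below are hypothetical and not part of \pkg{AcceptReject}:

```r
# Minimal illustration of the S3 dispatch mechanism: a classed atomic
# vector whose print() call reaches a class-specific method, while the
# underlying vector remains available to base methods.
x <- structure(1:5, class = "my_class")

print.my_class <- function(x, ...) {
  cat("Object of class my_class with", length(unclass(x)), "values\n")
  invisible(x)
}

print(x)          # dispatches to print.my_class()
mean(unclass(x))  # base methods still work on the underlying vector
```

In the same way, calling `print()` on an `accept_reject` object reaches `print.accept_reject()`, while the generated observations remain an ordinary atomic vector underneath.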
+ +```{r} +#| echo: TRUE +# setting a seed for reproducibility +set.seed(0) +x <- accept_reject( + n = 2000L, + f = dbinom, + continuous = FALSE, + args_f = list(size = 5, prob = 0.5), + xlim = c(0, 10) +) + +# Printing the first 10 (default) observations +print(x) + +# Printing the first 20 observations +print(x, n_min = 20L) + +# Summary +summary(x) +``` + +## Plotting Data Generated by ARM + +Often, when generating observations using ARM, we are interested in visualizing the generated observations and graphically comparing them with the probability mass function (discrete case) or probability density function (continuous case) to get an idea of the quality of the data generation. Constructing a graph with your favorite library for each generation can be time-consuming. The idea of the `plot.accept_reject()` method is to facilitate this task, allowing you to easily and quickly generate this type of graph. In fact, since the S3 object-oriented system is used, you only need to use the `plot()` function on an `accept_reject` class object. + +The `plot()` function applied to an `accept_reject` class object will return an object with the secondary class `gg` and the primary class `ggplot` (returns a graph). The returned object is a graph constructed with the \pkg{ggplot2} library and can be modified to your preferred \pkg{ggplot2} standards, such as its theme. However, the `plot` function has some specific arguments that allow you to modify some elements. The general usage form is: + +```{r, eval = FALSE, echo = TRUE} +#| echo: TRUE +#| eval: FALSE +## S3 method for class 'accept_reject' +plot( + x, + color_observed_density = "#BB9FC9", + color_true_density = "#FE4F0E", + color_bar = "#BB9FC9", + color_observable_point = "#7BBDB3", + color_real_point = "#FE4F0E", + alpha = 0.3, + hist = TRUE, + ... +) +``` + +1. `x`: An object of the `accept_reject` class; +2. `color_observed_density`: observed density color (continuous case); +3. 
`color_true_density`: theoretical density color (continuous case); +4. `color_bar`: bar chart fill color (discrete case); +5. `color_observable_point`: color of generated points (discrete case); +6. `color_real_point`: color of the real points (discrete case); +7. `alpha`: transparency of the bars (discrete case); +8. `hist`: if `TRUE`, a histogram will be plotted in the continuous case, comparing the theoretical density with the observed one. If `FALSE`, `ggplot2::geom_density()` will be used instead of the histogram; +9. `...`: additional arguments. + +The following code demonstrates the use of the `plot()` function. In Figure \@ref(fig:fig-plotfunc-1) (a), an example of a graph is presented with the histogram replaced by the observed density, if this type of representation is more useful to the user. To do this, simply pass the argument `hist = FALSE`. To illustrate the use of the arguments, the parameters `color_true_density`, `color_observed_density`, and `alpha` were also changed. In Figure \@ref(fig:fig-plotfunc-1) (b), an example of generating a graph for the discrete case is presented. The simple use of the `plot()` function without passing arguments should be sufficient to meet the needs of most users. + +```{r "fig-plotfunc-1", fig.cap="Plotting the theoretical density function (a) and the probability mass function (b), with details of the respective parameters in the code.", echo = TRUE, out.width="50%", fig.align='center', fig.pos="H"} +#| fig-subcap: +#| - "Weibull with $n = 2000$ observations." +#| - "Binomial with $n = 1000$ observations." +#| layout-ncol: 2 +library(AcceptReject) + +# Generating and plotting the theoretical density with the +# observed density. 
+
+# setting a seed for reproducibility
+set.seed(0)
+
+# Continuous case
+accept_reject(
+  n = 2000L,
+  continuous = TRUE,
+  f = dweibull,
+  args_f = list(shape = 2.1, scale = 2.2),
+  xlim = c(0, 10)
+) |>
+  plot(
+    hist = FALSE,
+    color_true_density = "#2B8b99",
+    color_observed_density = "#F4DDB3",
+    alpha = 0.6
+  ) # Changing some arguments in plot()
+
+# Discrete case
+accept_reject(
+  n = 1000L,
+  f = dbinom,
+  continuous = FALSE,
+  args_f = list(size = 5, prob = 0.5),
+  xlim = c(0, 10)
+) |> plot()
+```
+
+# Examples
+
+Below are some examples of using the `accept_reject()` function to generate pseudo-random observations of discrete and continuous random variables. Note that when $X$ is a discrete random variable, the argument `continuous = FALSE` must be provided, whereas when $X$ is continuous, `continuous = TRUE` (the default) must be used.
+
+## Generating discrete observations
+
+As an example, let $X \sim Poisson(\lambda = 0.7)$. We will generate observations of $X$ with the `accept_reject()` function for two sample sizes, $n = 25$ and $n = 2500$. Note that it is necessary to provide the `xlim` argument; try to set an upper limit for which the probability of $X$ assuming that value is zero or very close to zero. In this case, we choose `xlim = c(0, 20)`, where `dpois(x = 20, lambda = 0.7)` is very close to zero (`r dpois(x = 20, lambda = 0.7)`). Comparing Figure \@ref(fig:fig-poisson-1) (a) with Figure \@ref(fig:fig-poisson-1) (b), the observed probabilities in (b) are close to the theoretical probabilities, indicating that ARM generates observations that approximate the true probability mass function as the sample size increases.
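Conceptually, what `accept_reject()` automates in the discrete case can be sketched in a few lines of base R. The helper below (`rpois_ar()`, a hypothetical name, not part of the package) uses a discrete-uniform proposal on `0:20` and the smallest constant $c$ such that $c\,g$ bounds the Poisson mass function; each proposal $y$ is accepted with probability $f(y)/(c\,g(y))$:

```r
# Minimal base-R sketch of the acceptance-rejection method for a
# discrete target, here Poisson(lambda = 0.7), with a discrete-uniform
# proposal on 0:20. Illustration only; not AcceptReject's implementation.
rpois_ar <- function(n, lambda = 0.7, support = 0:20) {
  f <- dpois(support, lambda)
  g <- 1 / length(support)  # discrete-uniform proposal mass
  c_const <- max(f) / g     # smallest c with c * g >= f on the support
  out <- integer(0)
  while (length(out) < n) {
    y <- sample(support, size = n, replace = TRUE)            # proposals from g
    u <- runif(n)
    out <- c(out, y[u <= dpois(y, lambda) / (c_const * g)])   # acceptance step
  }
  out[seq_len(n)]
}

set.seed(1)
x <- rpois_ar(1000L)
mean(x)  # should be close to lambda = 0.7
```

The overall acceptance rate is $1/c$, which is why the package's optimization of the constant $c$ matters for performance.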
+
+```{r "fig-poisson-1", echo = TRUE, fig.cap="Generating observations from a Poisson distribution using the acceptance-rejection method, with $n = 25$ (a) and $n = 2500$ (b), respectively.", fig.align='center', out.width="50%", fig.pos="H"}
+#| fig-subcap:
+#| - "n = 25 observations."
+#| - "n = 2500 observations."
+#| layout-ncol: 2
+library(AcceptReject)
+library(parallel)
+library(cowplot) # install.packages("cowplot")
+
+# Ensuring reproducibility in parallel computing
+RNGkind("L'Ecuyer-CMRG")
+set.seed(0)
+mc.reset.stream()
+
+# Simulation
+simulation <- function(n, lambda = 0.7)
+  accept_reject(
+    n = n,
+    f = dpois,
+    continuous = FALSE, # discrete case
+    args_f = list(lambda = lambda),
+    xlim = c(0, 20),
+    parallel = TRUE # Parallelizing the code on Unix-based systems
+  )
+
+# Generating observations
+# n = 25 observations
+system.time({x <- simulation(25L)})
+plot(x)
+
+# n = 2500 observations
+system.time({y <- simulation(2500L)})
+plot(y)
+```
+
+Note that it is necessary to specify the nature of the random variable whose observations are to be generated: for discrete variables, the argument `continuous = FALSE` must be passed. The following subsection presents examples of generating continuous observations.
+
+## Generating continuous observations
+
+Considering the default base distribution, the uniform, which does not need to be specified, the code below exemplifies the continuous case, where $X \sim \mathcal{N}(\mu = 0, \sigma^2 = 1)$. Not specifying a base probability density function implies `f_base = NULL`, `random_base = NULL`, and `args_f_base = NULL`, which are the defaults of the `accept_reject()` function. If some of these are left as `NULL` while another is, by mistake, specified, no error occurs: in this situation, the `accept_reject()` function assumes the base probability density function to be that of a uniform distribution over `xlim`.
Note also the use of the `plot()` function, which generates a graph with the theoretical density function and the histogram of the generated observations, allowing a quick visual check of the quality of the observations generated by ARM. + +```{r, echo = TRUE, fig.cap="Generating observations from a continuous random variable with a Standard Normal distribution.", fig.align='center', out.width="50%", fig.pos="H"} +#| label: fig-normal +#| fig-cap: "Generating observations from a continuous random variable with a Standard Normal distribution, with $n = 50$ and $n = 500$ observations, respectively." +#| fig-subcap: +#| - "n = 50 observations." +#| - "n = 500 observations." +#| layout-ncol: 2 + +library(AcceptReject) +library(parallel) + +# Ensuring reproducibility in parallel computing +RNGkind("L'Ecuyer-CMRG") +set.seed(0) +mc.reset.stream() + +# Generating observations +accept_reject( + n = 50L, + f = dnorm, + continuous = TRUE, + args_f = list(mean = 0, sd = 1), + xlim = c(-4, 4), + parallel = TRUE +) |> plot() + +accept_reject( + n = 500L, + f = dnorm, + continuous = TRUE, + args_f = list(mean = 0, sd = 1), + xlim = c(-4, 4), + parallel = TRUE +) |> plot() +``` + +# A practical scenario for the use of the package + +So far, the use of the package to generate distributions that are relatively simple and involve few parameters has been demonstrated. In this section, the idea is to demonstrate the use of the \pkg{AcceptReject} package in a practical situation where the use of ARM becomes necessary. Let's consider the generator of Modified Beta Distributions, proposed by [@nadarajah2014modified]. 
It is a family of probability distributions, since various probability density functions can be generated through the proposed density generator, whose general density function is defined by:
+
+$$f_X(x) = \frac{\beta^a}{B(a,b)} \times \frac{g(x)G(x)^{a - 1}(1 - G(x))^{b - 1}}{[1 - (1 - \beta)G(x)]^{a + b}},$$ with $x \geq 0$ and $\beta, a, b > 0$, where $g(x)$ is a probability density function, $G(x)$ is the cumulative distribution function of $g(x)$, and $B(a,b)$ is the beta function.
+
+Notice that $f_{X}(x)$ has a certain complexity, since it depends on another probability density function $g(x)$ and its cumulative distribution function $G(x)$. Thus, $f_X(x)$ has three parameters, $\beta$, $a$, and $b$, plus additional parameters inherited from $g(x)$. Here, we will consider $g(x)$ to be the Weibull probability density function. Therefore, $f_X(x)$ is the modified beta Weibull density function, with five parameters. The implementation of the Modified Beta Distributions generator is presented below:
+
+```{r, echo = TRUE}
+#| label: modified_beta
+#| echo: TRUE
+#| eval: TRUE
+library(numDeriv)
+
+pdf <- function(x, G, ...){
+  numDeriv::grad(
+    func = \(x) G(x, ...),
+    x = x
+  )
+}
+
+# Modified Beta Distributions
+# Link: https://link.springer.com/article/10.1007/s13571-013-0077-0
+generator <- function(x, G, a, b, beta, ...){
+  g <- pdf(x = x, G = G, ...)
+  numerator <- beta^a * g * G(x, ...)^(a - 1) * (1 - G(x, ...))^(b - 1)
+  denominator <- beta(a, b) * (1 - (1 - beta) * G(x, ...))^(a + b)
+  numerator/denominator
+}
+
+# Probability density function - Modified Beta Weibull
+pdf_mbw <- function(x, a, b, beta, shape, scale)
+  generator(
+    x = x,
+    G = pweibull,
+    a = a,
+    b = b,
+    beta = beta,
+    shape = shape,
+    scale = scale
+  )
+
+# Checking the value of the integral
+integrate(
+  f = \(x) pdf_mbw(x, 1, 1, 1, 1, 1),
+  lower = 0,
+  upper = Inf
+)
+```
+
+Notice that `pdf_mbw()` integrates to 1, being a probability density function.
Thus, the `generator()` function generates probability density functions based on another distribution $G_X(x)$. In the case of the code above, the cumulative distribution function of the Weibull distribution is passed to the `generator()` function, but it could be any other. + +In the following code, we will adopt the strategy of investigating (inspecting) a coherent proposal for a base density function to be passed as an argument to `f_base` in the `accept_reject()` function. The investigation could be skipped, in which case the `accept_reject()` function would assume the uniform distribution as the base. + +We will consider the Weibull distribution since it is a particular case of the Modified Beta Weibull distribution. As we know how to generate observations from the Weibull distribution using the `rweibull()` function, the Weibull distribution is a viable candidate for the base density $g_Y(y)$. Consider the true parameters `a = 10.5`, `b = 4.2`, `beta = 5.9`, `shape = 1.5`, and `scale = 1.7`. Thus, using the `inspect()` function, we can quickly inspect by doing: + +```{r, echo = TRUE, out.width="50%", fig.align='center', fig.pos="H"} +#| label: fig-nadarajah +#| fig.cap: Inspecting the Weibull distribution with shape = 2, scale = 1.2, with the support xlim = c(0, 4) and c = 1 (default) (a) and c = 2.2 (b), respectively. +#| fig.subcap: +#| - "For c = 1." +#| - "For c = 2.2." 
+#| layout-ncol: 2 + +library(AcceptReject) + +# True parameters +a <- 10.5 +b <- 4.2 +beta <- 5.9 +shape <- 1.5 +scale <- 1.7 + +# c = 1 (default) +inspect( + f = pdf_mbw, + f_base = dweibull, + xlim = c(0, 4), + args_f = list( + a = a, + b = b, + beta = beta, + shape = shape, + scale = scale + ), + args_f_base = list(shape = 2, scale = 1.2), + c = 1 +) + +# c = 2.2 +inspect( + f = pdf_mbw, + f_base = dweibull, + xlim = c(0, 4), + args_f = list( + a = a, + b = b, + beta = beta, + shape = shape, + scale = scale + ), + args_f_base = list(shape = 2, scale = 1.2), + c = 2.2 +) +``` + +Notice that in Figure \@ref(fig:fig-nadarajah) (b), when $c = 2.2$, the density $g_Y$ bounds the density $f_X$, which is the Modified Beta Weibull density of $X$ from which we want to generate observations. Thus, the density $g_Y$ used as a base is a viable candidate to be passed to the `f_base` argument of the `accept_reject()` function in the \pkg{AcceptReject} package. Also, note that the area between $f_X$ and $g_Y$ is smaller than it would be if $g_Y$ were considered the uniform probability density function in the `xlim` support. In the following section, we will discuss the computational cost for different sample sizes, considering the base density $g_Y$ as the Weibull density or the default (uniform density). + +# Benchmarking + +The benchmarks considered in this section were performed on a computer with Arch Linux operating system, Intel Core(TM) i7-1260P processor with 16 threads, maximum frequency of 4.70 GHz, with computational times in logarithmic scale, base ten. More specifications were presented in Section 3. 
+ +Considering the case of the Modified Beta Weibull probability density function (a probability density function with five parameters) implemented in the `pdf_mbw()` function presented in Section 7, several benchmarks were conducted to evaluate the computational cost of the `accept_reject()` function for various sample sizes, considering both the parallelized and non-parallelized scenarios. Additionally, the benchmarks took into account the specification of $g_Y$ considering the continuous uniform distribution in the `xlim` interval (default of the `accept_reject()` function) and the Weibull distribution with parameters `shape = 2` and `scale = 1.2`, which bounds the probability density function of a random variable with Modified Beta Weibull with true parameters as in Figure \@ref(fig:fig-nadarajah). The sample sizes considered were $n = 50$, $250$, $500$, $1000$, $5000$, $10000$, $15000$, $25000$, $50000$, $100000$, $150000$, $250000$, $500000$, and $1000000$. + +```{r, out.width="50%", fig.align='center', echo = FALSE, fig.pos="H"} +#| label: fig-benchmarking +#| fig.cap: Benchmarking for different sample sizes, considering the Weibull distribution and the uniform distribution as the base density, with Weibull distribution and Uniform distribution (default), respectively. +#| fig-subcap: +#| - "Weibull distribution." +#| - "Uniform distribution (default)." +#| layout-ncol: 2 + +library(AcceptReject) +library(numDeriv) # install.packages("numDeriv") +library(bench) # install.packages("bench") +library(ggplot2) # install.packages("ggplot2") +library(parallel) + +simulation <- function(n, parallel = TRUE, base = TRUE) { + # True parameters + a <- 10.5 + b <- 4.2 + beta <- 5.9 + shape <- 1.5 + scale <- 1.7 + c <- 2.2 + + # Generate data with the true parameters using + # the AcceptReject package. 
+  if (base) {
+    # Using the Weibull distribution as the base distribution
+    accept_reject(
+      n = n,
+      f = pdf_mbw,
+      args_f = list(
+        a = a,
+        b = b,
+        beta = beta,
+        shape = shape,
+        scale = scale
+      ),
+      f_base = dweibull,
+      args_f_base = list(shape = 2, scale = 1.2),
+      random_base = rweibull,
+      xlim = c(0, 4),
+      c = c,
+      parallel = parallel
+    )
+  } else {
+    # Using the uniform distribution as the base distribution
+    accept_reject(
+      n = n,
+      f = pdf_mbw,
+      args_f = list(
+        a = a,
+        b = b,
+        beta = beta,
+        shape = shape,
+        scale = scale
+      ),
+      xlim = c(0, 4),
+      parallel = parallel
+    )
+  }
+}
+
+benchmark <- function(n_values,
+                      time_unit = 's',
+                      base = TRUE) {
+  # Initialize an empty data frame to store the results
+  results_df <- data.frame()
+
+  # Run benchmarks for each sample size and each type of code
+  for (n in n_values) {
+    for (parallel in c(TRUE, FALSE)) {
+      results <- bench::mark(
+        simulation(
+          n = n,
+          parallel = parallel,
+          base = base
+        ),
+        time_unit = time_unit,
+        memory = FALSE,
+        check = FALSE,
+        filter_gc = FALSE
+      )
+
+      # Convert results to data frame and add columns for the sample
+      # size and type of code
+      results_df_temp <- as.data.frame(results)
+      results_df_temp$n <- n
+      results_df_temp$Code <- ifelse(parallel, "Parallel", "Serial")
+
+      # Append the results to the results data frame
+      results_df <- rbind(results_df, results_df_temp)
+    }
+  }
+
+  # Create a scatter plot of the median time vs the sample size,
+  # colored by the type of code
+  ggplot(results_df, aes(x = n, y = median, color = Code)) +
+    geom_point() +
+    scale_x_log10() +
+    scale_y_log10() +
+    labs(x = "Sample Size (n)", y = "Median Time (s)", color = "Code Type") +
+    ggtitle("Benchmark Results") +
+    ggplot2::theme(
+      axis.title = ggplot2::element_text(face = "bold"),
+      title = ggplot2::element_text(face = "bold"),
+      legend.title = ggplot2::element_text(face = "bold"),
+      plot.subtitle = ggplot2::element_text(face = "plain")
+    )
+}
+
+# Sample sizes
+n <-
c(50,
+     250,
+     500,
+     1e3,
+     5e3,
+     10e3,
+     15e3,
+     25e3,
+     50e3,
+     100e3,
+     150e3,
+     250e3,
+     500e3,
+     1e6)
+
+# Ensuring reproducibility in parallel computing
+RNGkind("L'Ecuyer-CMRG")
+set.seed(0)
+mc.reset.stream()
+
+# Run the benchmark function for multiple sample sizes
+n |> benchmark(n_values = _, base = TRUE)
+n |> benchmark(n_values = _, base = FALSE)
+```
+
+Observing Figure \@ref(fig:fig-benchmarking), it is possible to see that the serial code, the default for the `accept_reject()` function, already performs excellently even with large samples, whether a base function $g_Y$ is specified or the uniform distribution is used as the base. The parallelized code only performs better for large samples (here, above $n = 10000$), since parallelism imposes a thread-creation overhead that is not justified for small samples. However, depending on the complexity of the probability distributions involved, there may be situations where parallelizing with moderate samples is a good alternative. In practice, the package user should conduct tests and decide whether to use `parallel = TRUE` or `parallel = FALSE`.
+
+It can be observed in Figure \@ref(fig:fig-benchmarking) that the choice of the base distribution, for the simulated case, did not significantly influence the computational performance of the `accept_reject()` function. Often, depending on the complexity of $f_X$, the user need not worry about choosing a $g_Y$ to pass to the `f_base` argument. Additionally, it is not very common in Monte Carlo simulation studies to consider sample sizes much larger than those considered here. Therefore, users of non-Unix-based systems, where generation runs serially, will not experience significant issues regarding the computational cost of the `accept_reject()` function.
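Following that advice, a user can run a quick timing check before committing to `parallel = TRUE`. The sketch below is generic base R (not the package's implementation): it generates from a $\mathcal{N}(0, 1)$ target with a uniform proposal on $[-4, 4]$, serially and then split across cores with `parallel::mclapply()`:

```r
library(parallel)

# Plain base-R acceptance-rejection loop for N(0, 1) with a uniform
# proposal on [-4, 4]; used only to compare serial vs. parallel timing.
gen_chunk <- function(n) {
  c_const <- 8 * dnorm(0)  # smallest c with c * (1/8) >= dnorm(y) on [-4, 4]
  out <- numeric(0)
  while (length(out) < n) {
    y <- runif(n, -4, 4)                             # proposals from g
    u <- runif(n)
    out <- c(out, y[u <= dnorm(y) / (c_const / 8)])  # acceptance step
  }
  out[seq_len(n)]
}

n <- 2e5
cores <- if (.Platform$OS.type == "unix") 2L else 1L  # mclapply forks on Unix only
t_serial <- system.time(gen_chunk(n))["elapsed"]
t_parallel <- system.time(
  unlist(mclapply(rep(n %/% cores, cores), gen_chunk, mc.cores = cores))
)["elapsed"]
c(serial = unname(t_serial), parallel = unname(t_parallel))
```

On Windows, `mclapply()` with more than one core is not supported, hence the guard on `.Platform$OS.type`; mirroring the package's behavior, the check then falls back to serial execution.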
+
+# Related works
+
+An implementation of ARM is provided by the \CRANpkg{AR} library [@AR]. This library exports the `AR.Sim()` function, which serves mainly an educational purpose, demonstrating the functioning of ARM. Additionally, the design of the \CRANpkg{AcceptReject} package allows, in the continuous case, the use of base probability density functions that need not be implemented in the R language or in specific packages, unlike the \CRANpkg{AR} library, which is a significant advantage. The specifications of the base densities in the \CRANpkg{AR} package are made considering only the densities implemented in the \CRANpkg{DISTRIB} package [@DISTRIB].
+
+Another library that implements ARM is \CRANpkg{SimDesign} [@SimDesign], through the `rejectionSampling()` function. The `rejectionSampling()` function is more efficient than the `AR.Sim()` function from the \CRANpkg{AR} library, but its efficiency still does not surpass that of the `accept_reject()` function from the \CRANpkg{AcceptReject} package. The `rejectionSampling()` function also does not support parallelism, which is a disadvantage compared to the `accept_reject()` function. Furthermore, the design of the \CRANpkg{AcceptReject} package uses the S3 object-oriented system, exporting simple functions that allow the inspection and analysis of the generated data.
+
+The \CRANpkg{AcceptReject} library presents several advantages over the aforementioned libraries, particularly in its design, which facilitates the use of its functions. For example, the library provides functions that allow the inspection of $f_X$ against $g_Y$ when $X$ and $Y$ are continuous random variables, making it easy to create highly informative graphs that visually convey the quality of the generated observations.
It is a common interest for those using ARM to observe the generated data and check whether they conform to the desired probability distribution.
+
+Figure \@ref(fig:fig-bench) (a) presents a simulation study comparing the `accept_reject()` function from the \CRANpkg{AcceptReject} package with the `rejectionSampling()` function from the \CRANpkg{SimDesign} package, considering small to moderate sample sizes. In this scenario, the `accept_reject()` function was executed serially, just like the `rejectionSampling()` function, which does not support parallelism, and equivalent performance was observed between the two functions. Figure \@ref(fig:fig-bench) (b) shows that, for large samples with parallel execution, the `accept_reject()` function surpasses the performance of the `rejectionSampling()` function from the \CRANpkg{SimDesign} package.
+
+```{r, out.width="50%", fig.align='center', fig.pos='H'}
+#| label: fig-bench
+#| fig.cap: "Comparison between the AcceptReject and SimDesign packages for different sample sizes, considering the generation of observations from a random variable with a Modified Beta Weibull distribution, serial processing with AcceptReject (a) and parallel processing with the AcceptReject package (b), respectively."
+#| fig-subcap:
+#| - "Serial processing with AcceptReject."
+#| - "Parallel processing with AcceptReject."
+#| layout-ncol: 2 +#| warning: false +#| echo: false + +library(AcceptReject) +library(SimDesign) +library(numDeriv) +library(bench) +library(parallel) + +simulation_1 <- function(n, parallel = TRUE, base = TRUE) { + accept_reject( + n = n, + f = pdf_mbw, + args_f = list( + a = 10, + b = 1, + beta = 20.5, + shape = 2, + scale = 0.3 + ), + xlim = c(0, 1), + parallel = parallel + ) +} + +simulation_2 <- function(n) { + df = \(x) pdf_mbw( + x = x, + a = 10, + b = 1, + beta = 20.5, + shape = 2, + scale = 0.3 + ) + dg = \(x) dunif(x = x, min = 0, max = 1) + rg = \(n) runif(n = n, min = 0, max = 1) + + # when df and dg both integrate to 1, acceptance probability = 1/M + M <- + rejectionSampling(df = df, dg = dg, rg = rg) + rejectionSampling(n, + df = df, + dg = dg, + rg = rg, + M = M) +} + +benchmark <- function(n_values, parallel = TRUE) { + # Initialize an empty data frame to store the results + results_df <- data.frame() + + # Run benchmarks for each sample size and each type of code + filter_gc <- ifelse(parallel, FALSE, TRUE) + for (n in n_values) { + results <- bench::mark( + AcceptReject = simulation_1(n = n, parallel = parallel), + SimDesign = simulation_2(n = n), + time_unit = 's', + memory = FALSE, + check = FALSE, + filter_gc = filter_gc + ) + + # Convert results to data frame and add columns for the sample + # size and type of code + results_df_temp <- results + results_df_temp$n <- n + + # Append the results to the results data frame + results_df <- rbind(results_df, results_df_temp) + } + + # Create a scatter plot of the median time vs the sample size, + # colored by the type of code + ggplot(results_df, aes(x = n, y = median, color = expression)) + + geom_point() + + scale_x_log10() + + scale_y_log10() + + labs(x = "Sample Size (n)", y = "Median Time (s)", color = "Packages") + + ggtitle("Benchmark Results") + + theme( + axis.title = element_text(face = "bold"), + title = element_text(face = "bold"), + legend.title = element_text(face = "bold"), + 
plot.subtitle = element_text(face = "plain")
+  )
+}
+
+small_and_moderate_sample <- c(100,
+                               150,
+                               250,
+                               500,
+                               1e3,
+                               1500,
+                               2000,
+                               2500,
+                               3500,
+                               4500,
+                               5500,
+                               7500,
+                               10e3,
+                               25e3)
+big_sample <- c(50e3, 75e3, 100e3, 150e3, 250e3, 500e3, 750e3, 1e6)
+
+# Ensuring reproducibility in parallel computing
+RNGkind("L'Ecuyer-CMRG")
+set.seed(0)
+mc.reset.stream()
+
+# Serial
+benchmark(n_values = small_and_moderate_sample, parallel = FALSE)
+
+# Parallel
+benchmark(n_values = big_sample, parallel = TRUE)
+```
+
+# Conclusion and future developments
+
+The \CRANpkg{AcceptReject} package is an efficient tool for generating pseudo-random numbers from univariate distributions of discrete and continuous random variables using the acceptance-rejection method (ARM). The library is built with the aim of delivering ease of use and computational efficiency: it can generate pseudo-random numbers both serially and in parallel, efficiently in both cases, in addition to allowing easy inspection and analysis of the generated data.
+
+For future developments, the \CRANpkg{AcceptReject} library can be extended with functions that allow the visualization of ARM in an interactive Shiny application for educational purposes. The most important development step for future versions will be to seek even more performance. Additionally, the project is open to contributions from the community.
diff --git a/_articles/RJ-2025-037/RJ-2025-037.html b/_articles/RJ-2025-037/RJ-2025-037.html
new file mode 100644
index 0000000000..24466e4403
--- /dev/null
+++ b/_articles/RJ-2025-037/RJ-2025-037.html
@@ -0,0 +1,2464 @@
+    AcceptReject: An R Package for Acceptance-Rejection Method
    +

    AcceptReject: An R Package for Acceptance-Rejection Method


    The AcceptReject package, available for the R programming language on the Comprehensive R Archive Network (CRAN), versioned and maintained on GitHub, offers a simple and efficient solution for generating pseudo-random observations of discrete or continuous random variables using the acceptance-rejection method. This method provides a viable alternative for generating pseudo-random observations in univariate distributions when the inverse of the cumulative distribution function is not in closed form or when suitable transformations involving random variables that we know how to generate are unknown, thereby facilitating the generation of observations for the variable of interest. The package is designed to be simple, intuitive, and efficient, allowing for the rapid generation of observations and supporting multicore parallelism on Unix-based operating systems. Some components are written using C++, and the package maximizes the acceptance probability of the generated observations, resulting in even more efficient execution. The package also allows users to explore the generated pseudo-random observations by comparing them with the theoretical probability mass function or probability density function and to inspect the underlying probability density functions that can be used in the method for generating observations of continuous random variables. This article explores the package in detail, discussing its functionalities, benefits, and practical applications, and provides various benchmarks in several scenarios.

    +

    1 Introduction

    +

The class of Monte Carlo methods and algorithms is a powerful and versatile family of computational techniques widely used across fields, from physics, statistics, and economics to biology and the social sciences. Through the modeling of complex systems and the execution of stochastic experiments, Monte Carlo simulation allows researchers to explore scenarios that would be impractical or impossible to investigate through traditional experimental methods.

    +

    A critical component of any Monte Carlo simulation is the ability to generate pseudo-random observations of a sequence of random variables that follow a given probability distribution. In many cases, these variables can be discrete or continuous, and the ability to generate observations from a sequence of random variables efficiently and accurately is fundamental in simulation studies.

    +

There are several techniques to generate observations from a sequence of random variables, many of which depend on the availability of a closed-form quantile function of the distribution of interest or on knowledge of some transformation involving random variables that we already know how to generate. For some distributions, however, the inverse of the cumulative distribution function (quantile function) has no closed form, and no suitable transformation of variables we know how to generate is available with which to produce observations of the variable of interest. In such cases, computational methods such as the Acceptance-Rejection Method (ARM), proposed by John von Neumann (von Neumann 1951), are a viable alternative for generating observations in the univariate context. For example, in the current scenario, where various distributions and probability distribution generators are being proposed (generators being functions that, from a known probability distribution, produce a new probability distribution), the ARM is widely used, since it generates pseudo-random observations of both discrete and continuous random variables.


    In the case of univariate distributions, the ARM is an effective technique for generating pseudo-random observations of random variables and has several advantages over other methods, such as the Metropolis-Hastings (MH) algorithm. The ARM, often also called the Rejection Method, is an algorithm that can be easily parallelized. Moreover, the method is not sensitive to initial parameters, as the MH algorithm is, and it produces independent observations. Additionally, the ARM requires no “burn-in” period: there is no need to wait for the algorithm to reach a stationary regime, as in the case of MH. When well implemented, the ARM can be a great alternative for generating observations of random variables in the univariate case and can be considered in many situations before opting for computationally more expensive methods such as MH or Gibbs sampling.


    The AcceptReject package (Marinho and Tomazella 2024) for the R programming language, also available and maintained at https://github.com/prdm0/AcceptReject/, was developed specifically to handle these challenges. It offers a simple and efficient solution for generating pseudo-random observations of discrete or continuous univariate random variables, using the ARM for sequences of random variables whose probability mass function (discrete case) or probability density function (continuous case) is complex or poorly explored. The library has detailed documentation and vignettes to assist user understanding. On the package’s website, https://prdm0.github.io/AcceptReject/, you can find usage examples and the complete documentation of the package, in addition to the vignettes.


    The design of the AcceptReject package is simple and intuitive, allowing users to generate observations of random variables quickly and efficiently, and in a parallelized manner, using multicore parallelism on Unix-based operating systems. The package also performs excellently on Windows, as much of the library’s performance comes from optimizing the probability of accepting observations of an auxiliary random variable as observations of the random variable of interest. Additionally, some points have been further optimized using the Rcpp (Eddelbuettel et al. 2024) and RcppArmadillo (Eddelbuettel and Sanderson 2014) libraries.


    Through the accept_reject() function, https://prdm0.github.io/AcceptReject/reference/accept_reject.html, the library only requires the user to provide the probability mass function (for discrete variables) or the probability density function (for continuous variables) they wish to generate observations from, in addition to the list of arguments of the distribution of interest. Other, optional arguments can also be passed: the normalization constant value (by default obtained automatically), arguments controlling the optimization method, and a base probability mass function (discrete case) or probability density function (continuous case), should the user wish to specify a base distribution, which can be useful for more complex distributions.


    With simple functions such as inspect() and plot(), the user can quickly inspect the base probability density function and the quality of the generated pseudo-random observations without spending much time on plotting, making the analysis process faster, which is as important as computational efficiency.


    In this article, we will explore the AcceptReject package in detail. We will discuss how it can be used, the functionalities it offers, and the benefits it brings to users. Through examples and discussions, we hope to demonstrate the value of the AcceptReject package as a useful addition to the toolbox of any researcher or professional working with Monte Carlo simulations.


    Initially, in Section 2, the article discusses the ARM and how it can be used to generate pseudo-random observations of random variables. In Section 3, the dependencies used in the current version of the package are listed and referenced, and some computational details considered in the examples and simulations are presented. Section 4 shows how to install and load the package from CRAN and GitHub. In Section 5, each of the functions exported by the AcceptReject package is discussed, with examples to help expose the details. In Section 6, more examples of the package’s use are presented, covering the generation of pseudo-random observations of discrete and continuous random variables. Section 7 is dedicated to a more complex problem, in a more realistic scenario, in which a probability distribution generator proposed by Nadarajah et al. (2014) is used to obtain the Modified Beta Weibull (MBW) probability density function, whose quantile function is unknown; the ARM, applied through the AcceptReject package, can solve the problem of generating observations of a random variable with the MBW distribution. In Section 8, benchmarks are presented in different scenarios to demonstrate the computational efficiency of the package. In Section 9, similar works are compared with the AcceptReject package, and layout and performance advantages are presented. Finally, Section 10 concludes with suggestions for future improvements to the package.


    2 Acceptance-Rejection Method - ARM


    The acceptance-rejection method (ARM), proposed by von Neumann (1951) and often simply called the rejection method, is a useful method for generating pseudo-random observations of discrete or continuous random variables, mainly in the context of univariate distributions, for which the AcceptReject package is intended. The method is based on the idea that if we can find a probability distribution \(G_Y\) of a random variable \(Y\) whose density, suitably scaled, envelops (bounds) the density \(f_X\) of the distribution \(F_X\) of the random variable \(X\) of interest, and \(G_Y\) and \(F_X\) share the same support, then we can generate observations of \(X\). In addition to sharing the same support, it is necessary that we can actually generate observations of \(Y\), that is, in practice, we have a function that generates observations of this random variable. Through the generator of \(Y\), we can then generate observations of the random variable of interest \(X\), accepting or rejecting the observations generated from \(Y\) based on the probability density functions or probability mass functions of \(X\) and \(Y\), in the continuous or discrete case, respectively.


    Consider a hypothetical example, with \(x, y \in [0, 5]\), where \(x\) and \(y\) are observations of the random variables \(X\) (the variable of interest) and \(Y\) (a variable from which we can generate observations), respectively. Assume we do not know how to generate observations of \(X\) directly (for the Weibull distribution, in practice, this is not true), where \(X \sim \text{Weibull}(\alpha = 2, \beta = 1)\) and \(Y \sim \mathcal{U}(a = 0, b = 5)\), whose density is multiplied by a constant \(c \geq 1\). In this example, given a density function used as a base (the density of \(Y\)), the initial idea of the ARM is to make the scaled base density, denoted by \(g_Y\), envelop (bound) the probability density function of interest, that is, envelop \(f_X\). Figure 1 (a) and Figure 1 (b) illustrate the procedure for choosing the constant \(c\) for the ARM, with \(c = 1\) and \(c = 4.3\), respectively.

    Figure 1: Inspection of the probability density function of the random variable of interest with the base probability density function, with \(c = 1\) (default) (a) and \(c = 4.3\) (b).

    Notice that the support of the base probability density function \(g_Y\) must be the same as that of \(f_X\), since the observations generated from \(Y\) are accepted or rejected as observations of \(X\) (the random variable of interest). The interval \(x, y \in [0, 5]\) was chosen because the density near the upper limit of this interval quickly drops to values close to zero, eliminating the need to consider a broader interval, although one could be considered. It is possible to see, with the help of Figure 1 (a), that \(c = 1\) is not a good choice, as much of the density \(f_X\) would be left out and not captured by \(g_Y\), i.e., it would not be bounded by \(g_Y\). Therefore, increasing the value of \(c\) in this hypothetical example is necessary for good generation of observations of \(X\). Note in Figure 1 (a) that the intersection area is approximately 0.39; in the optimal case, we should have the smallest \(c \geq 1\) such that the intersection area between \(f_X\) and \(g_Y\) is 1. A small value of \(c\) will always imply a higher probability of accepting observations of \(Y\) as observations of \(X\). Thus, the choice of \(c\) is a trade-off between computational efficiency (higher acceptance probability) and ensuring that the base density bounds the density of interest.


    Note now that for \(c = 4.3\), in Figure 1 (b), the intersection area is equal to \(1\) and \(g_Y\) does not excessively bound \(f_X\), thus becoming a convenient value. In this way, \(c = 4.3\) could be an appropriate value for the given example. However, a larger value of \(c\) could be used, which would decrease the computational efficiency of ARM, for reasons that will be presented later.


    What makes the method interesting is that the iterations of ARM are not mathematically dependent, making it easily parallelizable. Although it can be extended to the bivariate or multivariate case, the method is most commonly used to generate univariate observations. Moreover, if the distribution of the random variable from which pseudo-random observations are to be generated is indexed by many parameters, ARM can be applied without major impacts, as the number of parameters usually does not affect the efficiency of the method.


    Considering either the discrete or the continuous case, suppose \(X\) and \(Y\) are random variables with probability density function (pdf) or probability function (pf) \(f\) and \(g\), respectively. Furthermore, suppose there exists a constant \(c\) such that


    \[\frac{f(x)}{g(x)} \leq c,\] for every value of \(x\) with \(f(x) > 0\). To use the acceptance-rejection method to generate observations of the random variable \(X\) using the algorithm below, first find a random variable \(Y\) with pdf or pf \(g\) that satisfies the above condition.


    It is important that the chosen random variable \(Y\) is such that its observations can be generated easily. This is because the acceptance-rejection method is computationally more intensive than more direct methods, such as the transformation method or the inversion method, the latter of which only requires the generation of pseudo-random numbers with a uniform distribution.


    Algorithm of the Acceptance-Rejection Method:


    1 - Generate an observation \(y\) from a random variable \(Y\) with pdf/pf \(g\);


    2 - Generate an observation \(u\) from a random variable \(U\sim \mathcal{U} (0, 1)\);


    3 - If \(u < \frac{f(y)}{cg(y)}\) accept \(x = y\); otherwise, reject \(y\) as an observation of the random variable \(X\) and go back to step 1.
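    The three steps above can be sketched in base R. This is a minimal illustration for a toy target density \(f_X(x) = 2x\) on \([0, 1]\), with a \(\mathcal{U}(0, 1)\) base density and \(c = 2\) (since \(\max_x f(x)/g(x) = 2\)); it is not the AcceptReject package's internal implementation:

```r
# Minimal acceptance-rejection sketch (illustrative only).
# Target: f(x) = 2x on [0, 1]; base: g = uniform density on [0, 1]; c = 2.
f <- function(x) 2 * x
g <- function(x) dunif(x, min = 0, max = 1)
c_const <- 2

arm_one <- function() {
  repeat {
    y <- runif(1)                      # Step 1: draw y from the base g
    u <- runif(1)                      # Step 2: draw u from U(0, 1)
    if (u < f(y) / (c_const * g(y))) { # Step 3: accept, else return to step 1
      return(y)
    }
  }
}

set.seed(1)
x <- replicate(10000, arm_one())
mean(x) # should be close to E[X] = 2/3
```

    Here each iteration is accepted with probability \(1/c = 0.5\), so on average two base draws are needed per accepted observation.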


    Proof: Consider the discrete case, that is, \(X\) and \(Y\) are random variables with pfs \(f\) and \(g\), respectively. By step 3 of the algorithm above, we have \(\{accept\} = \{x = y\} = \left\{u < \frac{f(y)}{cg(y)}\right\}\). That is,


    \[P(accept | Y = y) = \frac{P(accept \cap \{Y = y\})}{g(y)} = \frac{P(U \leq f(y)/cg(y)) \times g(y)}{g(y)} = \frac{f(y)}{cg(y)}.\] Hence, by the Law of Total Probability, we have:


    \[P(accept) = \sum_y P(accept|Y=y)\times P(Y=y) = \sum_y \frac{f(y)}{cg(y)}\times g(y) = \frac{1}{c}.\] Therefore, by the acceptance-rejection method, we accept the occurrence of \(Y\) as an occurrence of \(X\) with probability \(1/c\). Moreover, by Bayes’ Theorem, we have


    \[P(Y = y | accept) = \frac{P(accept|Y = y)\times g(y)}{P(accept)} = \frac{[f(y)/cg(y)] \times g(y)}{1/c} = f(y).\] The result above shows that accepting \(x = y\) by the algorithm’s procedure is equivalent to accepting a value from \(X\) that has pf \(f\). For the continuous case, the proof is similar.


    Notice that to reduce the computational cost of the method, we should choose \(c\) in such a way that we can maximize \(P(accept)\). Therefore, choosing an excessively large value of the constant \(c\) will reduce the probability of accepting an observation from \(Y\) as an observation of the random variable \(X\).


    Computationally, it is convenient to consider \(Y\) as a random variable with a uniform distribution on the support of \(f\), since generating observations from a uniform distribution is straightforward on any computer. For the discrete case, considering \(Y\) with a discrete uniform distribution might be a good alternative.
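    As an illustration of the discrete case with a discrete uniform base (a hypothetical sketch, not the package's implementation), consider generating from a Binomial(5, 0.3) pmf with \(Y\) uniform on \(\{0, 1, \ldots, 5\}\):

```r
# Illustrative discrete acceptance-rejection.
# Target: Binomial(5, 0.3); base: Y ~ discrete uniform on {0, ..., 5},
# so g(y) = 1/6 for every y in the support.
f <- function(x) dbinom(x, size = 5, prob = 0.3)
g <- function(x) rep(1 / 6, length(x))
support <- 0:5
c_const <- max(f(support) / g(support)) # smallest valid c over the support

arm_discrete <- function() {
  repeat {
    y <- sample(support, 1) # draw from the discrete uniform base
    u <- runif(1)
    if (u < f(y) / (c_const * g(y))) return(y)
  }
}

set.seed(1)
x <- replicate(5000, arm_discrete())
mean(x) # should be close to E[X] = 5 * 0.3 = 1.5
```

    In the discrete case with finite support, the smallest valid \(c\) can be found by a direct maximum of \(f/g\) over the support, as above.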


    Choosing a large value for \(c\) increases the chances that the base distribution \(g\), scaled by \(c\), bounds \(f\), which is necessary for generating valid observations of \(X\). The problem, however, is the computational cost. Thus, the choice of an appropriate value for \(c\) is an optimization problem, and selecting it is a task that can be automated. Additionally, it is important to note that choosing an excessively large value for the constant \(c\), although it leads to slow code, is not as serious a problem as choosing an excessively small value for \(c\), which leads to the generation of poor observations of \(X\). Since it is an optimization problem, it is possible to choose a convenient value for \(c\) that is neither too large nor too small, generating good observations of \(X\) at a very acceptable computational cost.
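    The optimization problem just described amounts to maximizing the ratio \(f(x)/g(x)\) over the support. A rough base-R sketch for the toy density \(f_X(x) = 2x\) on \([0, 1]\) with a uniform base, using stats::optimize() (the package automates this internally; this code is only illustrative):

```r
# Illustrative search for the smallest c with f(x)/g(x) <= c:
# maximize the ratio f/g over the support [0, 1].
f <- function(x) 2 * x
g <- function(x) dunif(x, min = 0, max = 1)

ratio <- function(x) f(x) / g(x)
opt <- optimize(ratio, interval = c(0, 1), maximum = TRUE)
c_hat <- opt$objective

c_hat     # approximately 2
1 / c_hat # acceptance probability P(accept), approximately 0.5
```

    Once \(c\) is fixed at this value, the acceptance probability \(1/c\) is as large as the chosen base density allows.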


    Therefore, the AcceptReject package aims to automate some tasks, including the optimization of the value of \(c\). Thus, it becomes clear that, given a probability density function (continuous case) or probability mass function (discrete case) \(g\), obtaining the smallest value of \(c\), such that \(c \geq 1\), combined with the possibility of code parallelism, are excellent ways to reduce the computational cost of ARM. This is, among other things, what the AcceptReject package does through its accept_reject() function and other available functions. In the following sections, we will discuss each of them in detail.


    The first article describing the acceptance-rejection method is by von Neumann, titled “Various techniques used in connection with random digits,” from 1951 (von Neumann 1951). Several common books in the field of stochastic simulations and Monte Carlo methods also detail the method, such as Kroese et al. (2013), Asmussen and Glynn (2007), Kemp (2003), and Gentle (2003).


    3 Dependencies and Some Computational Details


    The AcceptReject package (Marinho and Tomazella 2024) has several dependencies, which, in its most current version, can be observed in the DESCRIPTION file. In the current version on GitHub, version 0.1.2, the dependencies are assertthat (Wickham 2019), cli (Csárdi 2023), ggplot2 (Wickham 2016), glue (Hester and Bryan 2024), numDeriv (Gilbert and Varadhan 2019), purrr (Wickham and Henry 2023), Rcpp (Eddelbuettel et al. 2024), RcppArmadillo (Eddelbuettel and Sanderson 2014), rlang (Henry and Wickham 2024), scales (Wickham et al. 2023), and scattermore (Kulichova and Kratochvil 2023). Suggested libraries include knitr (Xie 2024), rmarkdown (Allaire et al. 2024), cowplot (Wilke 2024), and testthat (Wickham 2011).


    In some examples that use the pipe operator |>, internal to the R language, it is necessary to have version 4.1.0 or higher of the language. However, this operator is not essential in the examples and can be easily omitted. The %>% operator from the magrittr package (Bache and Wickham 2022) can also be used, as the purrr package exports it. In fact, the |> operator has not been used internally in any function of the AcceptReject package, in the current version on GitHub, so that it can pass the automatic checks of GitHub Actions, whose tests also consider older versions of the R language.


    Additionally, some examples load the parallel library, included in the R language since version 2.14, in 2011. In these examples, where ARM executions are performed in parallel, ensuring the reproducibility of the results requires calling the instructions RNGkind("L'Ecuyer-CMRG") and mc.reset.stream(). The RNGkind("L'Ecuyer-CMRG") instruction sets the L’Ecuyer-CMRG pseudo-random number generator (L’Ecuyer 1999; L’Ecuyer et al. 2002), a type of generator that is safe to use in parallel computations because it provides multiple independent streams. The mc.reset.stream() function resets the stream of random numbers, ensuring that a subsequent execution generates the same sequence again.
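    A reproducible parallel setup, as described above, can be sketched as follows. This is an illustrative workload using only the base parallel package; mclapply() forks worker processes only on Unix-like systems:

```r
library(parallel)

# L'Ecuyer-CMRG provides multiple independent streams, which is safe
# for parallel computation.
RNGkind("L'Ecuyer-CMRG")
set.seed(2024)
mc.reset.stream()

# Draw uniforms in two forked processes (illustrative workload).
run1 <- mclapply(1:2, function(i) runif(3), mc.cores = 2)

# Resetting the stream with the same seed reproduces the same draws.
set.seed(2024)
mc.reset.stream()
run2 <- mclapply(1:2, function(i) runif(3), mc.cores = 2)

identical(run1, run2) # TRUE on Unix-like systems
```

    The same pattern applies when calling accept_reject() with parallel = TRUE, since reproducibility depends only on the generator and the stream reset, not on the workload.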


    To take advantage of parallelism in ARM implemented in the AcceptReject package, which works on Unix-based operating systems, it is not necessary to load the parallel library, as the accept_reject() function of the AcceptReject package already makes use of parallelism. Loading the parallel library is only interesting if reproducibility is desired when execution is done in parallel using multiple processes, meaning when there is interest in executing the instructions RNGkind("L'Ecuyer-CMRG") and mc.reset.stream().


    Simulations for benchmarking in parallel and non-parallel scenarios were performed on a computer with the Arch Linux operating system, an Intel Core(TM) i7-1260P processor with 16 threads, a maximum processor frequency of 4.70 GHz, and 16 GB of RAM. The version of the R language was 4.4.0, and the computational times considered are in logarithmic scale, base 10.


    The AcceptReject package makes use of R’s S3 object-oriented system. Simple functions like plot(), qqplot(), and print() were exported to dispatch on the accept_reject class of the AcceptReject package, making it easier to use. Other object-oriented systems, such as R6 (Chang 2021), were not considered, as this would make the use of the package non-idiomatic, and some users might feel discouraged from using the library. Historical details about the programming paradigms of the R language can be found in Chambers (2014).


    4 Installation and loading the package


    The AcceptReject package is available on CRAN and GitHub and can be installed using the following command:

    # Install the CRAN version
    install.packages("AcceptReject")

    # Installing the development version from GitHub
    # (requires the remotes package: install.packages("remotes"))
    remotes::install_github("prdm0/AcceptReject", force = TRUE)

    # Load the package
    library(AcceptReject)
    Figure 2: Logo of the package.

    To access the latest updates of the AcceptReject package versions, check the changelog. For suggestions, questions, or to report issues and bugs, please open an issue. For a more general and quick overview of the package, read the README.md file or visit the package’s website at https://prdm0.github.io/AcceptReject/ to explore usage examples.


    5 Function details and package usage


    The AcceptReject package provides functions not only to generate observations using the ARM in an optimized manner, but also exports auxiliary functions that allow inspecting the base density function \(g_Y\) as well as the generated pseudo-random observations, making the generation and inspection work efficient, easy, and intuitive. It is efficient from a computational point of view because it automatically optimizes the constant \(c\), thereby maximizing the probability \(1/c\) of accepting observations of \(Y\) as observations of \(X\), and because it allows multicore parallelism on Unix-based operating systems. The computational efficiency, combined with the ease of inspecting \(g_Y\) and the generated observations, makes the package pleasant to use. The AcceptReject library provides the following functions:

    • inspect(): a useful function for inspecting the probability density function \(f_X\) together with the base probability density function \(g_Y\), highlighting the intersection between the two functions and the value of the areas, allowing experimentation with different values of \(c\). This function does not perform any optimization; it simply facilitates the inspection of the proposed \(g_Y\) as a base probability density function, returning an object of the secondary class gg and the primary class ggplot for graphs created with the ggplot2 library, as shown in Figure 1 (a) and Figure 1 (b);

    • accept_reject(): implements the ARM, optimizes the constant \(c\), and performs parallelism if specified by the user. The user can also specify details of the optimization process, or define a value for \(c\) to be assumed, omitting the optimization process. Additionally, it is possible to specify a base probability mass function or probability density function \(g_Y\), depending on whether \(X\) is a discrete or continuous random variable, respectively. If omitted, \(Y \sim \mathcal{U}\), discrete or continuous, will be considered, depending on the nature of \(X\). Moreover, when not specifying the value of the constant \(c\), the user can provide an initial guess for the optimization of \(c\); a good guess can be obtained by graphically inspecting the relationship between \(f_X\) and \(g_Y\). In most cases, the accept_reject() function will provide good acceptance rates without a \(g_Y\) different from the default discrete or continuous uniform distribution and will make a good estimation of \(c\);

    • print.accept_reject(): a function responsible for printing useful information on the screen about objects of the accept_reject class returned by the accept_reject() function, such as the number of generated observations, the estimated constant \(c\), and the estimated probability of accepting observations of \(Y\) as observations of \(X\);

    • plot.accept_reject(): operates on objects of the accept_reject class returned by the accept_reject() function. The plot.accept_reject() function returns an object of the secondary class gg and the primary class ggplot from the ggplot2 package, allowing easy graphical comparison of the probability density function or probability mass function \(f_X\) with the observed density or mass function of the generated data;

    • qqplot(): constructs a Quantile-Quantile plot (QQ-plot) to compare the distribution of data generated by the ARM (observed distribution) with the distribution of the random variable \(X\) (theoretical distribution, denoted by \(f_X\)). In a very simple way, the QQ-plot is produced by passing an object of the accept_reject class, returned by the accept_reject() function, to the qqplot() function.

    In the following subsections, more details about the functions exported by the AcceptReject package will be presented. Examples are provided to facilitate understanding. Additional details about the functions can be found in the function documentation, which can be accessed using help(package = "AcceptReject") or on the package’s website, hosted in the GitHub repository that versions the development of the library at https://prdm0.github.io/AcceptReject/.


    5.1 Inspecting Density Functions


    The inspect() function of the AcceptReject package is useful when we want to generate pseudo-random observations of a continuous random variable using ARM. It is possible to skip this inspection since the accept_reject() function already automatically considers the continuous uniform distribution as the base density and optimizes, based on this base distribution, the best value for the constant \(c\). Additionally, other base densities \(g_Y\) can be specified to the accept_reject() function, where the search for the constant \(c\) will be done automatically, optimizing the value of \(c\) and its relationship with \(f_X\) and \(g_Y\), given \(g_Y\).


    To specify other base density functions \(g_Y\), it is prudent to perform a graphical inspection of the relationship between \(f_X\) and \(g_Y\) to get an idea of the reasonableness of the candidate base probability density function \(g_Y\). Thus, the inspect() function will automatically plot a graph with some useful information, as well as the functions \(f_X\) and \(g_Y\). The inspect() function will return an object with the secondary class gg and the primary class ggplot (a graph made with the ggplot2 library) highlighting the intersection area between \(f_X\) and \(g_Y\) and the value of this area for a given \(c\), specified in the c argument of the inspect() function.


    Theoretically, you can use any function \(g_Y\) whose support is equivalent to that of \(f_X\), finding the appropriate value of \(c\) that makes \(g_Y\) envelop \(f_X\), that is, so that the intersection area is equal to 1. The inspect() function has the following form:

    inspect(
      f,
      args_f,
      f_base,
      args_f_base,
      xlim,
      c = 1,
      alpha = 0.4,
      color_intersection = "#BB9FC9",
      color_f = "#FE4F0E",
      color_f_base = "#7BBDB3"
    )

    where:

    1. f: is the probability density function \(f_X\) of the random variable \(X\) of interest;

    2. args_f: is a list of arguments that will be passed to the probability density function \(f_X\) and that specify the parameters of the density of \(X\);

    3. f_base: is the probability density function that will supposedly be used as the base density \(g_Y\);

    4. args_f_base: is a list of arguments that will be passed to the base probability density function \(g_Y\) and that specify the parameters of the density of \(Y\);

    5. xlim: is a vector of size two that specifies the support of the functions \(f_X\) and \(g_Y\) (they must be equivalent);

    6. c: is the constant \(c\) that will be used to multiply the base probability density function \(g_Y\) so that it envelops (bounds) \(f_X\). The default is c = 1;

    7. alpha: is the transparency of the intersection area (default is alpha = 0.4);

    8. color_intersection: is the color of the intersection area between \(f_X\) and \(g_Y\) (default is color_intersection = "#BB9FC9");

    9. color_f: is the color of the curve of the probability density function \(f_X\) (default is color_f = "#FE4F0E");

    10. color_f_base: is the color of the curve of the base probability density function \(g_Y\) (default is color_f_base = "#7BBDB3").

    Figure 1 (a) and Figure 1 (b), presented at the beginning of this paper, were automatically created with the inspect() function. As an example, in addition to those available in the package documentation and vignettes, here is the code that generated the graph shown in Figure 1 (a):

    library(AcceptReject)

    # Considering c = 1 (default)
    inspect(
      f = dweibull,
      f_base = dunif,
      xlim = c(0, 5),
      args_f = list(shape = 2, scale = 1),
      args_f_base = list(min = 0, max = 5),
      c = 1
    )

    5.2 Generating Observations with ARM


    The most important function of the AcceptReject package is the accept_reject() function, as it is the function that implements ARM and all the optimizations to obtain a good generation of \(X\). The accept_reject() function has the following signature:

    accept_reject(
      n = 1L,
      continuous = TRUE,
      f = NULL,
      args_f = NULL,
      f_base = NULL,
      random_base = NULL,
      args_f_base = NULL,
      xlim = NULL,
      c = NULL,
      parallel = FALSE,
      cores = NULL,
      warning = TRUE,
      ...
    )

    where:

    +
      +
    1. n: is the number of observations to be generated (default n = 1L);
    2. +
    3. continuous: is a logical value that indicates whether the probability density function \(f_X\) is continuous (default continuous = TRUE) or discrete, if continuous = FALSE;
    4. +
    5. f: is the probability density function \(f_X\) of the random variable \(X\) of interest;
    6. +
    7. args_f: is a list of arguments that will be passed to the probability density function \(f_X\) and that specify the parameters of the density of \(X\). No matter how many parameters there are in f, they should be passed as a list to args_f;
    8. +
    5. f_base: the probability density function used as the base density \(g_Y\). Note that this argument is only useful when continuous = TRUE, since in the discrete case the package already achieves quite satisfactory computational performance, and visualizing a probability mass function that bounds f_X is more complicated than in the continuous case. In the discrete case (continuous = FALSE), f_base = NULL is treated as the discrete uniform distribution;
    6. random_base: a function that generates observations from the base distribution \(Y\) (default random_base = NULL). If random_base = NULL, the base distribution \(Y\) is taken to be uniform over the interval specified in the xlim argument;
    7. args_f_base: a list of arguments passed to the base probability density function \(g_Y\), specifying the parameters of the density of \(Y\). However many parameters f_base has, they must be passed as a list to args_f_base;
    8. xlim: a vector of length two specifying the support of the functions \(f_X\) and \(g_Y\). It is important to remember that the supports of \(f_X\) and \(g_Y\) must be equivalent and are given by this single vector of length two;
    9. c: the constant \(c\) that multiplies the base probability density function \(g_Y\) so that it envelops (bounds) \(f_X\). The default is c = 1. Unless there is a very strong reason otherwise, c = NULL is a good choice, since the accept_reject() function will then attempt to find an optimal value for the constant \(c\);
    10. parallel: a logical value indicating whether observations are generated in parallel (default parallel = FALSE). If parallel = TRUE, generation runs in parallel on Unix-based systems, using all cores available on the system. If parallel = TRUE on Windows, the code runs serially and no error is raised;
    11. cores: the number of cores used for parallel generation. If parallel = TRUE and cores = NULL, all cores available on the system are used; to use fewer cores, set the cores argument;
    12. warning: a logical value indicating whether warning messages are printed during execution (default warning = TRUE). If the user specifies a very small domain in xlim, the accept_reject() function issues a warning that the specified domain is too small and that the generation of observations may be compromised;
    13. ...: additional arguments passed to the optimize() function used to optimize the value of \(c\).
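    To make the roles of f, c, and the default uniform base concrete, here is a minimal acceptance-rejection sketch in plain base R. The function ar_sketch and its const argument (playing the role of the package's c) are hypothetical, written for illustration only; this is not the package's implementation.

    ```r
    # Hand-rolled acceptance-rejection sketch (illustrative only).
    # Target: f_X(x) = 2x on [0, 1]; base density g_Y: uniform on
    # xlim; const = 2 guarantees const * g_Y(x) >= f_X(x) everywhere.
    ar_sketch <- function(n, f, const, xlim) {
      g <- 1 / (xlim[2] - xlim[1])  # uniform base density
      out <- numeric(0)
      while (length(out) < n) {
        y <- runif(n, xlim[1], xlim[2])          # proposals from the base
        u <- runif(n)
        out <- c(out, y[u <= f(y) / (const * g)]) # accept with prob f/(c*g)
      }
      out[seq_len(n)]
    }

    set.seed(0)
    x <- ar_sketch(100L, f = function(x) 2 * x, const = 2, xlim = c(0, 1))
    length(x)  # 100
    range(x)   # all values inside [0, 1]
    ```

    The while loop keeps proposing from the base until n proposals have been accepted, which is exactly the mechanism the arguments above parameterize.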

    In the simplest use case, where the user does not wish to specify the base density function \(g_Y\) via f_base (useful for generating a sequence of observations \(x_1, \cdots, x_n\) of a continuous random variable), the use of the accept_reject() function is quite straightforward. For example, to generate 100 observations of a random variable \(X\) with probability density function \(f_X(x) = 2x\), \(0 \leq x \leq 1\), the user could do:

    set.seed(0)

    # Generate 100 observations from a random variable X with
    # f_X(x) = 2x, 0 <= x <= 1.
    x <- accept_reject(
      n = 100L,
      f = function(x) 2 * x,
      args_f = list(),
      xlim = c(0, 1),
      warning = FALSE
    )
    print(x[1L:8L])

    [1] 0.8966972 0.3721239 0.5728534 0.8983897 0.9446753 0.6607978
    [7] 0.6870228 0.7698414

    Note that if warning = TRUE (the default), the accept_reject() function cannot know that xlim = c(0, 1) covers the entire support of the probability density function \(f_X(x) = 2x\), \(0 \leq x \leq 1\), and that is why it issues a warning. Typically, one chooses an interval for xlim such that, below its lower limit and above its upper limit, the probability mass (discrete case) or density (continuous case) is close to zero or not defined. In this case, we have deliberately set warning = FALSE.

    5.3 Printing the Object with Generated Observations


    The accept_reject() function returns an object of class accept_reject: essentially an atomic vector with the observations generated by ARM, carrying attributes and marked with the accept_reject class from the AcceptReject package. These attributes are used internally by the package and carry the information consumed by the plot.accept_reject() function.


    Using R's S3 object-oriented system, print() dispatches on objects of the accept_reject class, invoking the print.accept_reject() method from the AcceptReject package. Methods that operate on atomic vectors, such as summary(), mean(), and var(), can also be applied to an accept_reject object. print() shows useful information about the object, such as the number of observations generated, the value of c used, the acceptance probability, the first generated observations, and the xlim interval considered. summary() returns descriptive statistics of the generated observations: mean, median, variance, standard deviation, minimum, maximum, and first and third quartiles.
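    The dispatch mechanism described above can be illustrated with a self-contained base-R toy. The class name accept_reject_toy and its constructor are hypothetical, written only to show how S3 dispatch works; they are not code from the AcceptReject package.

    ```r
    # Toy S3 class mimicking the structure described above: an atomic
    # vector carrying attributes and a class tag (hypothetical example).
    new_toy <- function(values, xlim) {
      structure(values, xlim = xlim, class = "accept_reject_toy")
    }

    # print() dispatches here for objects of class accept_reject_toy
    print.accept_reject_toy <- function(x, n_min = 10L, ...) {
      cat("Number of observations:", length(unclass(x)), "\n")
      cat("xlim:", attr(x, "xlim"), "\n")
      print(utils::head(as.vector(x), n_min))
      invisible(x)
    }

    obj <- new_toy(c(0.2, 0.7, 0.9), xlim = c(0, 1))
    print(obj)   # invokes print.accept_reject_toy()
    mean(obj)    # methods for atomic vectors still work
    ```

    Because the object remains an atomic vector underneath, generic functions without a specific method, such as mean(), fall back to their default implementations.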

    # setting a seed for reproducibility
    set.seed(0)
    x <- accept_reject(
      n = 2000L,
      f = dbinom,
      continuous = FALSE,
      args_f = list(size = 5, prob = 0.5),
      xlim = c(0, 10)
    )

    # Printing the first 10 (default) observations
    print(x)

    # Printing the first 20 observations
    print(x, n_min = 20L)

    # Summary
    summary(x)

           V1       
     Min.   :0.000  
     1st Qu.:2.000  
     Median :3.000  
     Mean   :2.538  
     3rd Qu.:3.000  
     Max.   :5.000  

    5.4 Plotting Data Generated by ARM


    Often, when generating observations using ARM, we are interested in visualizing the generated observations and graphically comparing them with the probability mass function (discrete case) or probability density function (continuous case) to get an idea of the quality of the data generation. Constructing a graph with your favorite library for each generation can be time-consuming. The idea of the plot.accept_reject() method is to facilitate this task, allowing you to easily and quickly generate this type of graph. In fact, since the S3 object-oriented system is used, you only need to use the plot() function on an accept_reject class object.


    The plot() function applied to an accept_reject object returns a graph: an object of classes gg and ggplot, constructed with the ggplot2 library, which can be modified to your preferred ggplot2 standards, such as its theme. In addition, the plot() method has some specific arguments that allow you to modify certain elements. The general usage form is:

    ## S3 method for class 'accept_reject'
    plot(
      x,
      color_observed_density = "#BB9FC9",
      color_true_density = "#FE4F0E",
      color_bar = "#BB9FC9",
      color_observable_point = "#7BBDB3",
      color_real_point = "#FE4F0E",
      alpha = 0.3,
      hist = TRUE,
      ...
    )
    1. x: an object of the accept_reject class;
    2. color_observed_density: observed density color (continuous case);
    3. color_true_density: theoretical density color (continuous case);
    4. color_bar: bar chart fill color (discrete case);
    5. color_observable_point: color of the generated points (discrete case);
    6. color_real_point: color of the real points (discrete case);
    7. alpha: transparency of the bars (discrete case);
    8. hist: if TRUE, a histogram is plotted in the continuous case, comparing the theoretical density with the observed one; if FALSE, ggplot2::geom_density() is used instead of the histogram;
    9. ...: additional arguments.

    The following code demonstrates the use of the plot() function. In Figure 3 (a), an example of a graph is presented with the histogram replaced by the observed density, if this type of representation is more useful to the user. To do this, simply pass the argument hist = FALSE. To illustrate the use of the arguments, the parameters color_true_density, color_observed_density, and alpha were also changed. In Figure 3 (b), an example of generating a graph for the discrete case is presented. The simple use of the plot() function without passing arguments should be sufficient to meet the needs of most users.

    library(AcceptReject)

    # Generating and plotting the theoretical density with the
    # observed density.

    # setting a seed for reproducibility
    set.seed(0)

    # Continuous case
    accept_reject(
      n = 2000L,
      continuous = TRUE,
      f = dweibull,
      args_f = list(shape = 2.1, scale = 2.2),
      xlim = c(0, 10)
    ) |>
      plot(
        hist = FALSE,
        color_true_density = "#2B8b99",
        color_observed_density = "#F4DDB3",
        alpha = 0.6
      ) # Changing some arguments in plot()

    # Discrete case
    accept_reject(
      n = 1000L,
      f = dbinom,
      continuous = FALSE,
      args_f = list(size = 5, prob = 0.5),
      xlim = c(0, 10)
    ) |> plot()
    Figure 3: Plotting the theoretical density function (a) and the probability mass function (b), with details of the respective parameters in the code.

    6 Examples


    Below are some examples of using the accept_reject() function to generate pseudo-random observations of discrete and continuous random variables. Note that when \(X\) is a discrete random variable, the argument continuous = FALSE must be provided, whereas when \(X\) is continuous, continuous = TRUE (the default) must be used.


    6.1 Generating discrete observations


    As an example, let \(X \sim Poisson(\lambda = 0.7)\). We will generate \(n = 1000\) observations of \(X\) by the acceptance-rejection method, using the accept_reject() function. Note that it is necessary to provide the xlim argument; try to set an upper limit for which the probability of \(X\) assuming that value is zero or very close to zero. Here we choose xlim = c(0, 20), where dpois(x = 20, lambda = 0.7) is very close to zero. See Figure 4 (a) and Figure 4 (b): in Figure 4 (b), the observed probabilities are close to the theoretical probabilities, indicating that ARM generates observations that approximate the true probability mass function as the sample size increases.
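    The suitability of the chosen upper limit can be checked directly in base R before generating anything, as a quick sketch of the recommendation above:

    ```r
    # How much probability mass lies at and beyond the chosen upper
    # limit? For lambda = 0.7 both quantities are numerically negligible,
    # so xlim = c(0, 20) loses essentially nothing.
    lambda <- 0.7
    dpois(20L, lambda)                       # P(X = 20)
    ppois(19L, lambda, lower.tail = FALSE)   # P(X > 19) = P(X >= 20)
    ```

    The same check works for any discrete target: evaluate the mass function (or the upper tail) at the candidate limit and enlarge xlim until the tail is negligible.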

    library(AcceptReject)
    library(parallel)
    library(cowplot) # install.packages("cowplot")

    # Ensuring reproducibility in parallel computing
    RNGkind("L'Ecuyer-CMRG")
    set.seed(0)
    mc.reset.stream()

    # Simulation
    simulation <- function(n, lambda = 0.7)
      accept_reject(
        n = n,
        f = dpois,
        continuous = FALSE, # discrete case
        args_f = list(lambda = lambda),
        xlim = c(0, 20),
        parallel = TRUE # Parallelizing the code in Unix-based systems
      )

    # Generating observations
    # n = 25 observations
    system.time({x <- simulation(25L)})

       user  system elapsed 
      0.004   0.038   0.047 

    plot(x)

    # n = 2500 observations
    system.time({y <- simulation(2500L)})

       user  system elapsed 
      0.008   0.066   0.052 

    plot(y)
    Figure 4: Generating observations from a Poisson distribution using the acceptance-rejection method, with \(n = 25\) (a) and \(n = 2500\) (b), respectively.

    Note that it is necessary to specify the nature of the random variable from which observations are to be generated. For discrete variables, the argument continuous = FALSE must be passed. Examples of how to generate continuous observations are presented in the next subsection.


    6.2 Generating continuous observations


    Considering the default base function, the uniform distribution, which need not be specified, the code below exemplifies the continuous case, where \(X \sim \mathcal{N}(\mu = 0, \sigma^2 = 1)\). Not specifying a base probability density function implies f_base = NULL, random_base = NULL, and args_f_base = NULL, the defaults of the accept_reject() function. If at least one of them is left NULL and, by mistake, another is specified, no error occurs: in this situation, the accept_reject() function assumes the base probability density function is that of a uniform distribution over xlim. Note also the use of the plot() function, which draws the theoretical density function together with the histogram of the generated observations, allowing a quick visual check of the quality of the observations generated by ARM.
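    For intuition on what the uniform default implies here, the envelope condition \(c\,g_Y(y) \ge f_X(y)\) can be checked in base R. This is a sketch under the assumption that the package's internal optimization targets this same condition; the constant below is computed by hand, not by the package.

    ```r
    # Uniform base on xlim = c(-4, 4): g_Y(y) = 1 / 8. The smallest
    # valid envelope constant is c = max f_X(y) / g_Y(y) = 8 * dnorm(0),
    # about 3.19 for the standard normal target.
    g <- 1 / 8
    c_min <- dnorm(0) / g
    c_min

    # Envelope check on a grid: c_min * g_Y dominates dnorm everywhere
    y <- seq(-4, 4, by = 0.01)
    all(c_min * g >= dnorm(y))
    ```

    Any c at or above this bound yields a valid envelope; smaller values of c would let the target density poke above c * g_Y near zero.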

    library(AcceptReject)
    library(parallel)

    # Ensuring reproducibility in parallel computing
    RNGkind("L'Ecuyer-CMRG")
    set.seed(0)
    mc.reset.stream()

    # Generating observations
    accept_reject(
      n = 50L,
      f = dnorm,
      continuous = TRUE,
      args_f = list(mean = 0, sd = 1),
      xlim = c(-4, 4),
      parallel = TRUE
    ) |> plot()

    accept_reject(
      n = 500L,
      f = dnorm,
      continuous = TRUE,
      args_f = list(mean = 0, sd = 1),
      xlim = c(-4, 4),
      parallel = TRUE
    ) |> plot()
    Figure 5: Generating observations from a continuous random variable with a Standard Normal distribution, with \(n = 50\) and \(n = 500\) observations, respectively.

    7 A practical scenario for the use of the package


    So far, the use of the package has been demonstrated on relatively simple distributions involving few parameters. In this section, the idea is to demonstrate the use of the AcceptReject package in a practical situation where ARM becomes necessary. Let us consider the generator of Modified Beta Distributions proposed by Nadarajah et al. (2014). It defines a family of probability distributions, since various probability density functions can be produced through the proposed density generator, whose general density function is:


    \[f_X(x) = \frac{\beta^a}{B(a,b)} \times \frac{g(x)G(x)^{a - 1}(1 - G(x))^{b - 1}}{[1 - (1 - \beta)G(x)]^{a + b}},\] with \(x \geq 0\) and \(\beta, a, b > 0\), where \(g(x)\) is a probability density function, \(G(x)\) is the cumulative distribution function of \(g(x)\), and \(B(a,b)\) is the beta function.
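    A quick sanity check of this generator in base R: with \(a = b = \beta = 1\) every factor except \(g(x)\) collapses to one, so \(f_X\) must reduce to \(g\) itself. The helper mb_pdf below is hypothetical, written only for this check with a closed-form \(g\) (the article's implementation, shown later, obtains \(g\) numerically):

    ```r
    # Modified Beta density written directly from the formula above;
    # base::beta() is the beta function B(a, b), qualified to avoid
    # confusion with the parameter named beta.
    mb_pdf <- function(x, a, b, beta, g, G, ...) {
      beta^a / base::beta(a, b) *
        g(x, ...) * G(x, ...)^(a - 1) * (1 - G(x, ...))^(b - 1) /
        (1 - (1 - beta) * G(x, ...))^(a + b)
    }

    # With a = b = beta = 1, f_X(x) must equal g(x):
    x <- c(0.3, 1, 2.5)
    mb_pdf(x, a = 1, b = 1, beta = 1, g = dweibull, G = pweibull,
           shape = 1.5, scale = 1.7)
    dweibull(x, shape = 1.5, scale = 1.7)  # same values
    ```

    Checks of this kind are useful whenever implementing a density generator, since a special case with a known closed form exposes transcription errors immediately.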


    Notice that \(f_{X}(x)\) has a certain complexity, since it depends on another probability density function \(g(x)\) and its cumulative distribution function \(G(x)\). Thus, \(f_X(x)\) has three parameters, \(\beta, a, b\), plus additional parameters inherited from \(g(x)\). Here, we will consider \(g(x)\) to be the Weibull probability density function, so that \(f_X(x)\) is the Modified Beta Weibull density function with five parameters. The implementation of the Modified Beta Distributions generator is presented below:

    library(numDeriv)

    pdf <- function(x, G, ...){
      numDeriv::grad(
        func = \(x) G(x, ...),
        x = x
      )
    }

    # Modified Beta Distributions
    # Link: https://link.springer.com/article/10.1007/s13571-013-0077-0
    generator <- function(x, G, a, b, beta, ...){
      g <- pdf(x = x, G = G, ...)
      numerator <- beta^a * g * G(x, ...)^(a - 1) * (1 - G(x, ...))^(b - 1)
      denominator <- beta(a, b) * (1 - (1 - beta) * G(x, ...))^(a + b)
      numerator/denominator
    }

    # Probability density function - Modified Beta Weibull
    pdf_mbw <- function(x, a, b, beta, shape, scale)
      generator(
        x = x,
        G = pweibull,
        a = a,
        b = b,
        beta = beta,
        shape = shape,
        scale = scale
      )

    # Checking the value of the integral
    integrate(
      f = \(x) pdf_mbw(x, 1, 1, 1, 1, 1),
      lower = 0,
      upper = Inf
    )

    1 with absolute error < 5.7e-05

    Notice that pdf_mbw() integrates to 1, so it is a probability density function. Thus, the generator() function produces probability density functions based on another distribution function \(G(x)\). In the code above, the cumulative distribution function of the Weibull distribution is passed to generator(), but it could be any other.


    In the following code, we will adopt the strategy of investigating (inspecting) a coherent proposal for a base density function to be passed as an argument to f_base in the accept_reject() function. The investigation could be skipped, in which case the accept_reject() function would assume the uniform distribution as the base.


    We will consider the Weibull distribution since it is a particular case of the Modified Beta Weibull distribution. As we know how to generate observations from the Weibull distribution using the rweibull() function, the Weibull distribution is a viable candidate for the base density \(g_Y(y)\). Consider the true parameters a = 10.5, b = 4.2, beta = 5.9, shape = 1.5, and scale = 1.7. Thus, using the inspect() function, we can quickly inspect by doing:

    library(AcceptReject)

    # True parameters
    a <- 10.5
    b <- 4.2
    beta <- 5.9
    shape <- 1.5
    scale <- 1.7

    # c = 1 (default)
    inspect(
      f = pdf_mbw,
      f_base = dweibull,
      xlim = c(0, 4),
      args_f = list(
        a = a,
        b = b,
        beta = beta,
        shape = shape,
        scale = scale
      ),
      args_f_base = list(shape = 2, scale = 1.2),
      c = 1
    )

    # c = 2.2
    inspect(
      f = pdf_mbw,
      f_base = dweibull,
      xlim = c(0, 4),
      args_f = list(
        a = a,
        b = b,
        beta = beta,
        shape = shape,
        scale = scale
      ),
      args_f_base = list(shape = 2, scale = 1.2),
      c = 2.2
    )
    Figure 6: Inspecting the Weibull distribution with shape = 2, scale = 1.2, with the support xlim = c(0, 4) and c = 1 (default) (a) and c = 2.2 (b), respectively.

    Notice in Figure 6 (b) that when \(c = 2.2\), the function \(c\,g_Y\) bounds the density \(f_X\), the Modified Beta Weibull density of \(X\) from which we want to generate observations. Thus, the density \(g_Y\) used as a base is a viable candidate to be passed to the f_base argument of the accept_reject() function in the AcceptReject package. Also note that the area between \(f_X\) and \(c\,g_Y\) is smaller than it would be if \(g_Y\) were the uniform probability density function on the xlim support. In the following section, we discuss the computational cost for different sample sizes, considering the base density \(g_Y\) as the Weibull density or the default (uniform density).
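    The practical payoff of a tight envelope is easy to quantify: in ARM the acceptance probability is \(1/c\), so on average \(c\) proposals are needed per accepted observation. A small base-R check, an illustrative sketch independent of the package:

    ```r
    # Acceptance probability equals 1/c. Example with f_X(x) = 2x on
    # [0, 1], uniform base g_Y(y) = 1 and c = 2: accept when u <= f/(c*g).
    set.seed(123)
    n_prop <- 1e5L
    y <- runif(n_prop)                 # proposals from the base
    u <- runif(n_prop)
    accepted <- u <= (2 * y) / (2 * 1) # f(y) / (c * g(y))
    mean(accepted)                     # close to 1 / c = 0.5
    ```

    A smaller valid c therefore translates directly into fewer wasted proposals, which is why a base density that hugs \(f_X\) is preferable to a loose uniform envelope.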


    8 Benchmarking


    The benchmarks considered in this section were performed on a computer with Arch Linux operating system, Intel Core(TM) i7-1260P processor with 16 threads, maximum frequency of 4.70 GHz, with computational times in logarithmic scale, base ten. More specifications were presented in Section 3.


    Considering the case of the Modified Beta Weibull probability density function (a probability density function with five parameters) implemented in the pdf_mbw() function presented in Section 7, several benchmarks were conducted to evaluate the computational cost of the accept_reject() function for various sample sizes, considering both the parallelized and non-parallelized scenarios. Additionally, the benchmarks took into account the specification of \(g_Y\) considering the continuous uniform distribution in the xlim interval (default of the accept_reject() function) and the Weibull distribution with parameters shape = 2 and scale = 1.2, which bounds the probability density function of a random variable with Modified Beta Weibull with true parameters as in Figure 6. The sample sizes considered were \(n = 50\), \(250\), \(500\), \(1000\), \(5000\), \(10000\), \(15000\), \(25000\), \(50000\), \(100000\), \(150000\), \(250000\), \(500000\), and \(1000000\).

    Figure 7: Benchmarking for different sample sizes, considering the Weibull distribution and the uniform distribution as the base density, with Weibull distribution and Uniform distribution (default), respectively.

    Observing Figure 7, it can be seen that the serial code, the default for the accept_reject() function, already performs excellently even with large samples, whether a base function \(g_Y\) is specified or the uniform distribution is used as the base. The parallel code imposes an overhead for thread creation that is not justified with small samples; only for sample sizes above 10,000 does the parallelized code show better performance. However, depending on the complexity of the probability distributions involved, there may be situations where parallelizing with moderate samples is a good alternative. In practice, the package user should run tests and decide between parallel = TRUE and parallel = FALSE.


    It can be observed in Figure 7 that the choice of the base distribution, for the simulated case, did not significantly influence the computational performance of the accept_reject() function. Often, depending on the complexity of \(f_X\), the user need not worry about choosing a \(g_Y\) to pass to the f_base argument. Additionally, it is not very common in Monte Carlo simulation studies to consider sample sizes much larger than those considered here. Therefore, users of non-Unix-based systems will not experience significant issues regarding the computational cost of the accept_reject() function.

    9 Comparison with other packages

    An implementation of ARM is provided by the AR library (Parchami 2018). This library exports the AR.Sim() function, which has educational value in demonstrating the functioning of ARM. The design of the AcceptReject package allows, in the continuous case, the use of base probability density functions that need not be implemented in the R language or in specific packages, which is a significant advantage over the AR library, whose base densities must be among those implemented in the DISTRIB package (Parchami 2016).


    Another library that implements ARM is SimDesign (Chalmers and Adkins 2020), through the rejectionSampling() function. The rejectionSampling() function is more efficient than the AR.Sim() function from the AR library, but its efficiency still does not surpass that of the accept_reject() function from the AcceptReject package. The rejectionSampling() function from the SimDesign library also does not support parallelism, which is a disadvantage compared to the accept_reject() function from the AcceptReject package. Furthermore, the design of the AcceptReject package uses the S3 object-oriented system, with exports of simple functions that allow the inspection and analysis of the generated data.


    The AcceptReject library presents several advantages over the mentioned libraries, particularly in the way the package is designed, facilitating the use of functions. The library, for example, provides functions that allow the inspection of \(f_X\) with \(g_Y\) when \(X\) and \(Y\) are continuous random variables, making it easy to create highly informative graphs for the user, visually informing about the quality of the generated observations. It is a common interest for those using ARM to observe the generated data and check if they conform to the desired probability distribution.


    Figure 8 (a) presents a simulation study comparing the accept_reject() function from the AcceptReject package with the rejectionSampling() function from the SimDesign package, considering small to moderate sample sizes. In this scenario, the accept_reject() function was executed serially, just like the rejectionSampling() function, which does not support parallelism. Equivalent performance was observed between the functions. In Figure 8 (b), it is observed that the performance of the accept_reject() function on large samples, considering the parallel execution of the accept_reject() function, surpasses the performance of the rejectionSampling() function from the SimDesign package.

    Figure 8: Comparison between the AcceptReject and SimDesign packages for different sample sizes, considering the generation of observations from a random variable with a Modified Beta Weibull distribution, with serial processing (a) and parallel processing (b) in the AcceptReject package, respectively.

    10 Conclusion and future developments


    The AcceptReject package is an efficient tool for generating pseudo-random numbers in univariate distributions of discrete and continuous random variables using the acceptance-rejection method (ARM). The library is built with the aim of delivering ease of use and computational efficiency. The AcceptReject library can generate pseudo-random numbers both serially and in parallel, with computational efficiency in both cases, in addition to allowing easy inspection and analysis of the generated data.


    For future developments, the AcceptReject library can be extended with functions that visualize ARM in an interactive Shiny application for educational purposes. The most important development step for future versions will be to pursue even more performance. Additionally, the project is open to contributions from the community.


    10.1 Supplementary materials


    Supplementary materials are available in addition to this article. They can be downloaded at RJ-2025-037.zip.


    10.2 CRAN packages used


    AcceptReject, Rcpp, RcppArmadillo, assertthat, cli, ggplot2, glue, numDeriv, purrr, rlang, scales, scattermore, knitr, rmarkdown, cowplot, testthat, magrittr, parallel, AR, DISTRIB, SimDesign


    10.3 CRAN Task Views implied by cited packages


    ChemPhys, HighPerformanceComputing, NetworkAnalysis, NumericalMathematics, Phylogenetics, ReproducibleResearch, Spatial, TeachingStatistics

    +J. Allaire, Y. Xie, C. Dervieux, J. McPherson, J. Luraschi, K. Ushey, A. Atkins, H. Wickham, J. Cheng, W. Chang, et al. rmarkdown: Dynamic documents for R. 2024. URL https://github.com/rstudio/rmarkdown. R package version 2.26. +
    +
    +S. Asmussen and P. W. Glynn. Stochastic simulation: Algorithms and analysis. Springer, 2007. +
    +
    +S. M. Bache and H. Wickham. magrittr: A forward-pipe operator for R. 2022. URL https://CRAN.R-project.org/package=magrittr. R package version 2.0.3. +
    +
    +R. P. Chalmers and M. C. Adkins. Writing effective and reliable Monte Carlo simulations with the SimDesign package. The Quantitative Methods for Psychology, 16(4): 248–280, 2020. DOI 10.20982/tqmp.16.4.p248. +
    +
    +J. M. Chambers. Object-oriented programming, functional programming and R. 2014. +
    +
    +W. Chang. R6: Encapsulated classes with reference semantics. 2021. URL https://CRAN.R-project.org/package=R6. R package version 2.5.1. +
    +
    +G. Csárdi. cli: Helpers for developing command line interfaces. 2023. URL https://CRAN.R-project.org/package=cli. R package version 3.6.2. +
    +
    +D. Eddelbuettel, R. Francois, J. Allaire, K. Ushey, Q. Kou, N. Russell, D. Bates, J. Chambers and M. D. Eddelbuettel. Package “Rcpp”. 2024. +
    +
    +D. Eddelbuettel and C. Sanderson. RcppArmadillo: Accelerating R with high-performance C++ linear algebra. Computational statistics & data analysis, 71: 1054–1063, 2014. +
    +
    +J. E. Gentle. Random number generation and Monte Carlo methods. Springer, 2003. +
    +
    +P. Gilbert and R. Varadhan. numDeriv: Accurate numerical derivatives. 2019. URL https://CRAN.R-project.org/package=numDeriv. R package version 2016.8-1.1. +
    +
    +L. Henry and H. Wickham. rlang: Functions for base types and core R and ’Tidyverse’ features. 2024. URL https://CRAN.R-project.org/package=rlang. R package version 1.1.3. +
    +
    +J. Hester and J. Bryan. glue: Interpreted string literals. 2024. URL https://CRAN.R-project.org/package=glue. R package version 1.7.0. +
    +
    +D. Kemp. Discrete-event simulation: Modeling, programming, and analysis. 2003. +
    +
    +D. P. Kroese, T. Taimre and Z. I. Botev. Handbook of monte carlo methods. John Wiley & Sons, 2013. +
    +
    +T. Kulichova and M. Kratochvil. scattermore: Scatterplots with more points. 2023. URL https://CRAN.R-project.org/package=scattermore. R package version 1.2. +
    +
P. L’Ecuyer. Good parameters and implementations for combined multiple recursive random number generators. Operations Research, 47(1): 159–164, 1999.

P. L’Ecuyer, R. Simard, E. J. Chen and W. D. Kelton. An object-oriented random-number package with many long streams and substreams. Operations Research, 50(6): 1073–1075, 2002.

P. R. D. Marinho and V. L. D. Tomazella. AcceptReject: Acceptance-rejection method for generating pseudo-random observations. 2024. URL https://prdm0.github.io/AcceptReject/. R package version 0.1.1.

S. Nadarajah, M. Teimouri and S. H. Shih. Modified beta distributions. Sankhya B, 76: 19–48, 2014.

A. Parchami. AR: Another look at the acceptance-rejection method. 2018. URL https://CRAN.R-project.org/package=AR. R package version 1.1.

A. Parchami. DISTRIB: Four essential functions for statistical distributions analysis: A new functional approach. 2016. URL https://CRAN.R-project.org/package=DISTRIB. R package version 1.0.

J. von Neumann. Various techniques used in connection with random digits. Notes by G. E. Forsythe, 36–38, 1951.

H. Wickham. assertthat: Easy pre and post assertions. 2019. URL https://CRAN.R-project.org/package=assertthat. R package version 0.2.1.

H. Wickham. ggplot2: Elegant graphics for data analysis. Springer-Verlag New York, 2016. URL https://ggplot2.tidyverse.org.

H. Wickham. testthat: Get started with testing. The R Journal, 3: 5–10, 2011. URL https://journal.r-project.org/archive/2011-1/RJournal_2011-1_Wickham.pdf.

H. Wickham and L. Henry. purrr: Functional programming tools. 2023. URL https://CRAN.R-project.org/package=purrr. R package version 1.0.2.

H. Wickham, T. L. Pedersen and D. Seidel. scales: Scale functions for visualization. 2023. URL https://CRAN.R-project.org/package=scales. R package version 1.3.0.

C. O. Wilke. cowplot: Streamlined plot theme and plot annotations for ’ggplot2’. 2024. URL https://CRAN.R-project.org/package=cowplot. R package version 1.1.3.

Y. Xie. knitr: A general-purpose package for dynamic report generation in R. 2024. URL https://yihui.org/knitr/. R package version 1.46.

    References


    Reuse


    Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".


    Citation


    For attribution, please cite this work as

Marinho & Tomazella, "AcceptReject: An R Package for Acceptance-Rejection Method", The R Journal, 2026

    BibTeX citation

@article{RJ-2025-037,
  author = {Marinho, Pedro Rafael Diniz and Tomazella, Vera L. D.},
  title = {AcceptReject: An R Package for Acceptance-Rejection Method},
  journal = {The R Journal},
  year = {2026},
  note = {https://doi.org/10.32614/RJ-2025-037},
  doi = {10.32614/RJ-2025-037},
  volume = {17},
  issue = {4},
  issn = {2073-4859},
  pages = {133-155}
}
diff --git a/_articles/RJ-2025-037/RJ-2025-037.pdf b/_articles/RJ-2025-037/RJ-2025-037.pdf
new file mode 100644
index 0000000000..209b3abe92
Binary files /dev/null and b/_articles/RJ-2025-037/RJ-2025-037.pdf differ
diff --git a/_articles/RJ-2025-037/RJ-2025-037.tex b/_articles/RJ-2025-037/RJ-2025-037.tex
new file mode 100644
index 0000000000..ae8dc00f6b
--- /dev/null
+++ b/_articles/RJ-2025-037/RJ-2025-037.tex
@@ -0,0 +1,707 @@
% !TeX root = RJwrapper.tex
\title{AcceptReject: An R Package for Acceptance-Rejection Method}

\author{by Pedro Rafael Diniz Marinho and Vera L. D. Tomazella}

\maketitle

\abstract{%
The AcceptReject package, available for the R programming language on the Comprehensive R Archive Network (CRAN) and versioned and maintained on GitHub, offers a simple and efficient solution for generating pseudo-random observations of discrete or continuous random variables using the acceptance-rejection method. This method is a viable alternative for generating pseudo-random observations from univariate distributions when the inverse of the cumulative distribution function is not available in closed form, or when no suitable transformation of random variables that we already know how to generate is known, thereby facilitating the generation of observations for the variable of interest. The package is designed to be simple, intuitive, and efficient, allowing for the rapid generation of observations and supporting multicore parallelism on Unix-based operating systems. Some components are written in C++, and the package maximizes the acceptance probability of the generated observations, resulting in even more efficient execution.
The package also allows users to explore the generated pseudo-random observations by comparing them with the theoretical probability mass or density function, and to inspect the base probability density functions that can be used in the method for generating observations of continuous random variables. This article explores the package in detail, discussing its functionalities, benefits, and practical applications, and provides benchmarks for several scenarios.
}

\section{Introduction}\label{introduction}

The class of Monte Carlo methods and algorithms is a powerful and versatile computational technique that has been widely used in a variety of fields, from physics, statistics, and economics to biology and the social sciences. Through the modeling of complex systems and the performance of stochastic experiments, Monte Carlo simulation allows researchers to explore scenarios that would be impractical or impossible to investigate through traditional experimental methods.

A critical component of any Monte Carlo simulation is the ability to generate pseudo-random observations of a sequence of random variables that follow a given probability distribution. These variables can be discrete or continuous, and the ability to generate such observations efficiently and accurately is fundamental in simulation studies.

There are several techniques for generating observations from a sequence of random variables, many of which depend on the availability of a closed-form quantile function for the distribution of interest, or on knowledge of some transformation involving random variables that we already know how to generate.
For many distributions, however, the inverse of the cumulative distribution function (the quantile function) has no closed form, and no transformation of random variables that we already know how to generate is available that would yield observations from the variable of interest. In such cases, computational methods such as the Acceptance-Rejection Method (ARM), proposed by John von Neumann in 1951 \citep{neumann1951various}, are a viable alternative for generating observations in the univariate context. This is particularly relevant today, when many new distributions and probability distribution generators are being proposed (the latter being functions that construct a new probability distribution from a known one), and the ARM remains widely used because it is useful for generating pseudo-random observations of random variables of either discrete or continuous nature.

In the case of univariate distributions, the ARM is an effective technique for generating pseudo-random observations and has several advantages over methods such as the Metropolis-Hastings (MH) algorithm. The ARM, often also called the rejection method, is easily parallelized. Moreover, it is not sensitive to initial parameters, as the MH algorithm is, and it produces independent observations. Additionally, the ARM requires no ``burn-in'' period: there is no need to wait for the algorithm to reach stationarity, as there is with MH. When well implemented, the ARM can be an excellent alternative for generating observations of univariate random variables and can be considered in many situations before opting for computationally more expensive methods such as MH or Gibbs sampling.
The \CRANpkg{AcceptReject} package \citep{AcceptReject} for the R programming language, also maintained at \url{https://github.com/prdm0/AcceptReject/}, was developed specifically to handle these challenges. It offers a simple and efficient solution for generating pseudo-random observations of discrete or continuous univariate random variables, using the ARM for variables whose probability mass function (discrete case) or probability density function (continuous case) is complex or little explored. The library has detailed documentation and vignettes to assist users; usage examples and the complete documentation are available on the package's website at \url{https://prdm0.github.io/AcceptReject/}.

The design of the \CRANpkg{AcceptReject} package is simple and intuitive, allowing users to generate observations quickly, efficiently, and in parallel, using multicore parallelism on Unix-based operating systems. The package also performs well on Windows, since much of its performance comes from optimizing the probability of accepting observations of an auxiliary random variable as observations of the random variable of interest. Additionally, some routines have been further optimized using the \CRANpkg{Rcpp} \citep{eddelbuettel2024package} and \CRANpkg{RcppArmadillo} \citep{eddelbuettel2014rcpparmadillo} libraries.

Through the \texttt{accept\_reject()} function (\url{https://prdm0.github.io/AcceptReject/reference/accept_reject.html}), the library only requires the user to provide the probability mass function (for discrete variables) or the probability density function (for continuous variables) from which they wish to generate observations, together with the list of arguments of the distribution of interest.
Other, optional arguments can also be passed: the value of the normalization constant (obtained automatically by default), arguments controlling the optimization method, and a base probability mass function (discrete case) or probability density function (continuous case), should the user wish to specify a base distribution, which can be useful for more complex target distributions.

With simple functions such as \href{https://prdm0.github.io/AcceptReject/reference/}{\texttt{inspect()}} and \href{https://prdm0.github.io/AcceptReject/reference/plot.accept_reject.html}{\texttt{plot()}}, the user can quickly inspect the base probability density function and the quality of the generated pseudo-random observations, without spending much time on plotting; this makes the analysis process faster, which is as important as computational efficiency.

In this article, we explore the \pkg{AcceptReject} package in detail. We discuss how it can be used, the functionalities it offers, and the benefits it brings to users. Through examples and discussions, we hope to demonstrate the value of the \pkg{AcceptReject} package as a useful addition to the toolbox of any researcher or professional working with Monte Carlo simulations.

Section 2 discusses the ARM and how it can be used to generate pseudo-random observations of random variables. Section 3 lists and references the dependencies of the current version of the package and presents some computational details considered in the examples and simulations. Section 4 shows how to install and load the package from CRAN and GitHub. Section 5 discusses each of the functions exported by the \pkg{AcceptReject} package, with examples illustrating the details.
Section 6 presents further examples of the package's use for generating pseudo-random observations of discrete and continuous random variables, and Section 7 is dedicated to a more complex, realistic problem: a probability distribution generator proposed by \citet{nadarajah2014modified} is used to obtain the Modified Beta Weibull (MBW) probability density function, whose quantile function is unknown, so the ARM as implemented in the \pkg{AcceptReject} package solves the problem of generating observations of a random variable with the MBW distribution. Section 8 presents benchmarks in different scenarios to demonstrate the computational efficiency of the package. Section 9 compares similar works with the \CRANpkg{AcceptReject} package and presents its design and performance advantages. Finally, Section 10 concludes with suggestions for future improvements to the package.

\section{Acceptance-Rejection Method - ARM}\label{acceptance-rejection-method---arm}

The acceptance-rejection method (ARM), proposed by \citet{neumann1951various} and often simply called the rejection method, is useful for generating pseudo-random observations of discrete or continuous random variables, mainly in the context of univariate distributions, for which the \pkg{AcceptReject} package is intended. The method is based on the following idea: if we can find a random variable \(Y\), with density (or mass) function \(g_Y\) sharing the same support as the density (or mass) function \(f_X\) of the random variable \(X\) of interest, such that a scaled version of \(g_Y\) envelops (bounds) \(f_X\), then we can generate observations of \(X\). Besides sharing the same support and bounding \(f_X\), it is necessary that we can generate observations of \(Y\); that is, in practice, we have a function that generates observations of this random variable.
Through the generator of \(Y\), we can then generate observations of the random variable of interest \(X\), accepting or rejecting each observation generated from \(Y\) based on the probability density functions or probability mass functions of \(X\) and \(Y\), in the continuous or discrete case, respectively.

Consider a hypothetical example with \(x, y \in [0, 5]\), where \(x\) and \(y\) are observations of the random variables \(X\) (the variable of interest) and \(Y\) (a variable from which we can generate observations), respectively. Assume we do not know how to generate observations of \(X\) (in practice, for the Weibull distribution, this is not true), where \(X \sim \text{Weibull}(\alpha = 2, \beta = 1)\), and take \(Y \sim \mathcal{U}(a = 0, b = 5)\) together with a constant \(c \geq 1\). In this example, given a density function used as a base (the density of \(Y\)), the initial idea of the ARM is to make the base density, denoted by \(g_Y\), once scaled by \(c\), envelop (bound) the probability density function of interest, \(f_X\). Figures \ref{fig:fig-inspect-1} (a) and \ref{fig:fig-inspect-1} (b) illustrate the procedure for choosing the constant \(c\) for the ARM, with \(c = 1\) and \(c = 4.3\), respectively.
\begin{figure}[H]

{\centering \subfloat[For $c = 1$ (default).\label{fig:fig-inspect-1-1}]{\includegraphics[width=0.5\linewidth]{RJ-2025-037_files/figure-latex/fig-inspect-1-1} }\subfloat[For $c = 4.3$.\label{fig:fig-inspect-1-2}]{\includegraphics[width=0.5\linewidth]{RJ-2025-037_files/figure-latex/fig-inspect-1-2} }

}

\caption{Inspection of the probability density function of the random variable of interest with the base probability density function, with $c = 1$ (default) (a) and $c = 4.3$ (b).}\label{fig:fig-inspect-1}
\end{figure}

Notice that the support of the base probability density function \(g_Y\) must be the same as that of \(f_X\), since the observations generated from \(Y\) are accepted or rejected as observations of \(X\) (the random variable of interest). The interval \(x, y \in [0, 5]\) was chosen because the density quickly drops to values close to zero near the upper limit of this interval, eliminating the need to consider a broader interval, although one could be considered. Figure \ref{fig:fig-inspect-1} (a) shows that \(c = 1\) is not a good choice, as much of the density \(f_X\) would not be captured, i.e., not bounded, by \(g_Y\). Therefore, increasing the value of \(c\) is necessary in this hypothetical example for good generation of observations of \(X\). Note in Figure \ref{fig:fig-inspect-1} (a) that the intersection area is approximately 0.39; ideally, \(c\) should be the smallest value, with \(c \geq 1\), such that the intersection area between \(f_X\) and \(g_Y\) is 1. A smaller value of \(c\) always implies a higher probability of accepting observations of \(Y\) as observations of \(X\). Thus, the choice of \(c\) is a trade-off between computational efficiency (a higher acceptance probability) and ensuring that the base density bounds the density of interest.

Note now that for \(c = 4.3\), in Figure \ref{fig:fig-inspect-1} (b), the intersection area is equal to \(1\) and \(g_Y\) does not excessively bound \(f_X\), making \(c = 4.3\) a convenient value for the given example. A larger value of \(c\) could be used, but it would decrease the computational efficiency of the ARM, for reasons presented later.

What makes the method interesting is that the iterations of the ARM are not mathematically dependent, making it easily parallelizable. Although it can be extended to the bivariate or multivariate case, the method is most commonly used to generate univariate observations. Moreover, if the distribution of the random variable from which pseudo-random observations are to be generated is indexed by many parameters, the ARM can be applied without major impact, since the number of parameters usually does not affect the efficiency of the method.

Suppose \(X\) and \(Y\) are random variables with probability density function (pdf), in the continuous case, or probability mass function (pmf), in the discrete case, \(f\) and \(g\), respectively. Furthermore, suppose there exists a constant \(c\) such that

\[\frac{f(x)}{g(x)} \leq c,\] for every value of \(x\) with \(f(x) > 0\). To use the acceptance-rejection method to generate observations of the random variable \(X\) via the algorithm below, first find a random variable \(Y\) with pdf or pmf \(g\) that satisfies the above condition.

It is important to choose a random variable \(Y\) whose observations can be generated easily, because the acceptance-rejection method is computationally more intensive than more direct methods, such as the transformation method or the inversion method, which only require the generation of pseudo-random numbers with a uniform distribution.
\textbf{Algorithm of the Acceptance-Rejection Method}:

1 - Generate an observation \(y\) from a random variable \(Y\) with pdf/pmf \(g\);

2 - Generate an observation \(u\) from a random variable \(U \sim \mathcal{U}(0, 1)\);

3 - If \(u < \frac{f(y)}{cg(y)}\), accept \(x = y\); otherwise, reject \(y\) as an observation of the random variable \(X\) and go back to step 1.

\textbf{Proof}: Consider the discrete case, that is, \(X\) and \(Y\) are random variables with pmfs \(f\) and \(g\), respectively. By step 3 of the algorithm above, the event \(\{\text{accept}\} = \{x = y\}\) occurs exactly when \(u < \frac{f(y)}{cg(y)}\). That is,

\[P(\text{accept} \mid Y = y) = \frac{P(\text{accept} \cap \{Y = y\})}{P(Y = y)} = \frac{P\left(U < f(y)/cg(y)\right) \times g(y)}{g(y)} = \frac{f(y)}{cg(y)}.\] Hence, by the \href{https://en.wikipedia.org/wiki/Law_of_total_probability}{\textbf{Law of Total Probability}}, we have:

\[P(\text{accept}) = \sum_y P(\text{accept} \mid Y = y)\times P(Y = y) = \sum_y \frac{f(y)}{cg(y)}\times g(y) = \frac{1}{c}.\] Therefore, by the acceptance-rejection method, we accept an occurrence of \(Y\) as an occurrence of \(X\) with probability \(1/c\). Moreover, by Bayes' theorem, we have

\[P(Y = y \mid \text{accept}) = \frac{P(\text{accept} \mid Y = y)\times g(y)}{P(\text{accept})} = \frac{[f(y)/cg(y)] \times g(y)}{1/c} = f(y).\] This result shows that accepting \(x = y\) by the algorithm's procedure is equivalent to accepting a value from \(X\), which has pmf \(f\). For the continuous case, the proof is similar.

Notice that to reduce the computational cost of the method, we should choose \(c\) so as to maximize \(P(\text{accept})\). Therefore, choosing an excessively large value of the constant \(c\) will reduce the probability of accepting an observation from \(Y\) as an observation of the random variable \(X\).
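As an illustration of the algorithm above, consider the following didactic sketch in base R (the function name \texttt{arm\_sample()} is ours, and this is not the package's optimized implementation), applied to the running Weibull example with a uniform base density on \([0, 5]\):

\begin{verbatim}
# Didactic base-R sketch of the three steps of the ARM.
arm_sample <- function(n, f, g, rg, c) {
  out <- numeric(n)
  i <- 1L
  while (i <= n) {
    y <- rg(1L)    # step 1: generate y from the base distribution g
    u <- runif(1L) # step 2: generate u from U(0, 1)
    if (u < f(y) / (c * g(y))) { # step 3: accept with prob. f(y) / (c g(y))
      out[i] <- y
      i <- i + 1L
    }
  }
  out
}

set.seed(1L)
x <- arm_sample(
  n = 1000L,
  f = function(x) dweibull(x, shape = 2, scale = 1),
  g = function(x) dunif(x, min = 0, max = 5),
  rg = function(n) runif(n, min = 0, max = 5),
  c = 4.3
)
\end{verbatim}

On average, only about \(1/c \approx 23\%\) of the proposals are accepted here, which is why minimizing \(c\) matters.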
Computationally, it is convenient to take \(Y\) as a random variable with a uniform distribution on the support of \(f\), since generating observations from a uniform distribution is straightforward on any computer. In the discrete case, a discrete uniform distribution for \(Y\) can be a good alternative.

Choosing a large value for \(c\) increases the chances of generating good observations of \(X\), since in many situations a large \(c\) is what allows the base distribution \(g\) to bound \(f\); the problem is the computational cost. Choosing an appropriate value for \(c\) is therefore an optimization problem, and one that can be automated. Note that an excessively large value of \(c\), although it leads to slow code, is a less serious problem than an excessively small value of \(c\), which leads to the generation of poor observations of \(X\). Since this is an optimization problem, it is possible to choose a convenient value of \(c\), neither too large nor too small, that generates good observations of \(X\) at a very acceptable computational cost.

The \pkg{AcceptReject} package therefore aims to automate some of these tasks, including the optimization of the value of \(c\). Given a probability density function (continuous case) or probability mass function (discrete case) \(g\), obtaining the smallest value of \(c\) such that \(c \geq 1\), combined with the possibility of parallelism, is an excellent way to reduce the computational cost of the ARM. This is, among other things, what the \pkg{AcceptReject} package does through its \href{https://prdm0.github.io/AcceptReject/reference/accept_reject.html}{\texttt{accept\_reject()}} function and the other available functions.
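The optimization just described can be sketched numerically: the smallest valid constant is \(c^{*} = \max_x f(x)/g(x)\) over the common support. The snippet below uses base R's \texttt{optimize()} for the running Weibull example; the object names are ours and this is not the package's internal routine:

\begin{verbatim}
# Smallest valid constant: c* = max_x f(x) / g(x) over the support.
ratio <- function(x) {
  dweibull(x, shape = 2, scale = 1) / dunif(x, min = 0, max = 5)
}
opt <- optimize(ratio, interval = c(0, 5), maximum = TRUE)
c_star <- opt$objective # about 4.29, in line with c = 4.3 used earlier
1 / c_star              # maximized acceptance probability, about 0.23
\end{verbatim}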
In the following sections, we will discuss each of them in detail.

The first article describing the acceptance-rejection method is by von Neumann, titled ``Various techniques used in connection with random digits,'' from 1951 \citep{neumann1951various}. Several well-known books in the field of stochastic simulation and Monte Carlo methods also detail the method, such as \citet{kroese2013handbook}, \citet{asmussen2007stochastic}, \citet{kemp2003discrete} and \citet{gentle2003random}.

\section{Dependencies and Some Computational Details}\label{dependencies-and-some-computational-details}

The \pkg{AcceptReject} package \citep{AcceptReject} has several dependencies, which, in its most current version, can be seen in the \href{https://github.com/prdm0/AcceptReject/blob/main/DESCRIPTION}{DESCRIPTION} file. In the current version on GitHub, version 0.1.2, the dependencies are \CRANpkg{assertthat} \citep{assertthat}, \CRANpkg{cli} \citep{cli}, \CRANpkg{ggplot2} \citep{ggplot2}, \CRANpkg{glue} \citep{glue}, \CRANpkg{numDeriv} \citep{numDeriv}, \CRANpkg{purrr} \citep{purrr}, \CRANpkg{Rcpp} \citep{eddelbuettel2024package}, \CRANpkg{RcppArmadillo} \citep{eddelbuettel2014rcpparmadillo}, \CRANpkg{rlang} \citep{rlang}, \CRANpkg{scales} \citep{scales}, and \CRANpkg{scattermore} \citep{scattermore}. Suggested libraries include \CRANpkg{knitr} \citep{knitr}, \CRANpkg{rmarkdown} \citep{rmarkdown}, \CRANpkg{cowplot} \citep{cowplot}, and \CRANpkg{testthat} \citep{testthat}.

In some examples that use the pipe operator \texttt{\textbar{}\textgreater{}}, native to the R language, version 4.1.0 or higher of the language is required. However, this operator is not essential in the examples and can easily be omitted. The \texttt{\%\textgreater{}\%} operator from the \CRANpkg{magrittr} package \citep{magrittr} can also be used, as the \pkg{purrr} package exports it.
In fact, the \texttt{\textbar{}\textgreater{}} operator has not been used internally in any function of the \pkg{AcceptReject} package, in the current version on GitHub, so that the package can pass the automatic checks of GitHub Actions, whose tests also consider older versions of the R language.

Additionally, some examples load the \CRANpkg{parallel} library, included in the R language since version 2.14, released in 2011. In these examples, where ARM executions are performed in parallel, it is necessary to call the instructions \texttt{RNGkind("L\textquotesingle{}Ecuyer-CMRG")} and \texttt{mc.reset.stream()} to ensure the reproducibility of the results. The \texttt{RNGkind("L\textquotesingle{}Ecuyer-CMRG")} instruction sets the L'Ecuyer-CMRG pseudo-random number generator \citep{l1999good, l2002object}, which is safe to use in parallel computations, as it provides multiple independent streams. The \texttt{mc.reset.stream()} function resets the stream of random numbers, ensuring that a subsequent execution generates the same sequence again.

To take advantage of the parallelism of the ARM implemented in the \CRANpkg{AcceptReject} package, which works on Unix-based operating systems, it is not necessary to load the \CRANpkg{parallel} library, as the \href{https://prdm0.github.io/AcceptReject/reference/accept_reject.html}{\texttt{accept\_reject()}} function already makes use of parallelism. Loading the \pkg{parallel} library is only needed if reproducibility is desired when execution is done in parallel using multiple processes, that is, when there is interest in executing the instructions \texttt{RNGkind("L\textquotesingle{}Ecuyer-CMRG")} and \texttt{mc.reset.stream()}.
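The reproducibility setup just described can be sketched as follows (the seed and the commented call are illustrative, not taken from the article's benchmarks):

\begin{verbatim}
library(parallel)

# Multiple-stream generator, safe for parallel computations
RNGkind("L'Ecuyer-CMRG")
set.seed(2024L)      # illustrative seed
mc.reset.stream()    # reset streams so a later run reproduces the results

# Illustrative parallel call (Unix-based systems):
# x <- accept_reject(
#   n = 100000L, f = dweibull, args_f = list(shape = 2, scale = 1),
#   xlim = c(0, 5), parallel = TRUE, cores = 4L
# )
\end{verbatim}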
Simulations for benchmarking in parallel and non-parallel scenarios were performed on a computer running the Arch Linux operating system, with an Intel Core(TM) i7-1260P processor with 16 threads, a maximum processor frequency of 4.70 GHz, and 16 GB of RAM. The version of the R language was 4.4.0, and the computational times reported are on a logarithmic scale, base 10.

The \pkg{AcceptReject} package makes use of the S3 object-oriented system in R. Simple functions like \texttt{plot()}, \texttt{qqplot()}, and \texttt{print()} were exported to dispatch on the \texttt{accept\_reject} class of the \pkg{AcceptReject} package, making it easier to use. Other object-oriented systems, such as the R6 system \citep{r6}, were not considered, as this would make the use of the package non-idiomatic, and some users might feel discouraged from using the library. Historical details about the programming paradigms of the R language can be found in \citet{chambers2014object}.

\section{Installation and loading the package}\label{installation-and-loading-the-package}

The \pkg{AcceptReject} package is available on CRAN and GitHub and can be installed using the following commands:

\begin{verbatim}
# Install the CRAN version
install.packages("AcceptReject")

# Install the development version from GitHub
# (requires remotes: install.packages("remotes"))
remotes::install_github("prdm0/AcceptReject", force = TRUE)

# Load the package
library(AcceptReject)
\end{verbatim}

\begin{figure}[H]

{\centering \includegraphics[width=0.25\linewidth]{figures/logo}

}

\caption{Logo of the package.}\label{fig:logo}
\end{figure}

To access the latest updates of the \pkg{AcceptReject} package versions, check the \href{https://prdm0.github.io/AcceptReject/news/}{changelog}. For suggestions, questions, or to report issues and bugs, please open an \href{https://github.com/prdm0/AcceptReject/issues}{issue}.
For a more general and quick overview of the package, read the \href{https://github.com/prdm0/AcceptReject}{README.md} file or visit the package's website at \url{https://prdm0.github.io/AcceptReject/} to explore usage examples.

\section{Function details and package usage}\label{function-details-and-package-usage}

The \pkg{AcceptReject} package not only generates observations using the ARM in an optimized manner, but also exports auxiliary functions for inspecting the base density function \(g_Y\) and the generated pseudo-random observations, making the generation and inspection work efficient, easy, and intuitive. It is computationally efficient because it automatically optimizes the constant \(c\), thereby maximizing the probability \(1/c\) of accepting observations of \(Y\) as observations of \(X\), and because it allows multicore parallelism on Unix-based operating systems. This efficiency, combined with the ease of inspecting \(g_Y\) and the generated observations, makes the package pleasant to use. The \pkg{AcceptReject} library provides the following functions:

\begin{itemize}
\item
  \href{https://prdm0.github.io/AcceptReject/reference/inspect.html}{\texttt{inspect()}}: a useful function for comparing the probability density function \(f_X\) with the base probability density function \(g_Y\), highlighting the intersection between the two functions and the value of the areas, allowing experimentation with different values of \(c\).
This function does not perform any optimization; it simply facilitates the inspection of the proposed \(g_Y\) as a base probability density function, returning an object of secondary class \texttt{gg} and primary class \texttt{ggplot} for graphs created with the \pkg{ggplot2} library, as shown in Figure \ref{fig:fig-inspect-1} (a) and Figure \ref{fig:fig-inspect-1} (b);
\item
  \href{https://prdm0.github.io/AcceptReject/reference/accept_reject.html}{\texttt{accept\_reject()}}: implements the ARM, optimizes the constant \(c\), and performs parallelism if specified by the user. The user can also control details of the optimization process, or define a value for \(c\), in which case that value is assumed and the optimization step is omitted. Additionally, it is possible to specify a base probability mass function or probability density function \(g_Y\), depending on whether \(X\) is a discrete or continuous random variable, respectively. If omitted, a discrete or continuous uniform distribution for \(Y\) is considered, depending on the nature of \(X\). Moreover, when not fixing the value of the constant \(c\), the user can provide an initial guess used in its optimization. A good guess can be obtained by graphically inspecting the relationship between \(f_X\) and \(g_Y\).
In most cases, the \texttt{accept\_reject()} function will provide good acceptance rates without specifying a \(g_Y\) different from the default discrete or continuous uniform distribution, and will make a good estimate of \(c\);
\item
  \href{https://prdm0.github.io/AcceptReject/reference/print.accept_reject.html}{\texttt{print.accept\_reject()}}: prints useful information about objects of the \texttt{accept\_reject} class returned by the \href{https://prdm0.github.io/AcceptReject/reference/accept_reject.html}{\texttt{accept\_reject()}} function, such as the number of generated observations, the estimated constant \(c\), and the estimated probability of accepting observations of \(Y\) as observations of \(X\);
\item
  \href{https://prdm0.github.io/AcceptReject/reference/plot.accept_reject.html}{\texttt{plot.accept\_reject()}}: operates on objects of the \texttt{accept\_reject} class returned by the \href{https://prdm0.github.io/AcceptReject/reference/accept_reject.html}{\texttt{accept\_reject()}} function and returns an object of secondary class \texttt{gg} and primary class \texttt{ggplot} from the \pkg{ggplot2} package, allowing easy graphical comparison of the theoretical probability density or mass function \(f_X\) with the one observed from the generated data;
\item
  \href{https://prdm0.github.io/AcceptReject/reference/qqplot.html}{\texttt{qqplot()}}: constructs a Quantile-Quantile plot (QQ-plot) to compare the distribution of the data generated by the ARM (observed distribution) with the distribution of the random variable \(X\) (theoretical distribution, denoted by \(f_X\)).
The QQ-plot is produced simply by passing an object of the \texttt{accept\_reject} class, returned by the \texttt{accept\_reject()} function, to the \texttt{qqplot()} function.
\end{itemize}

In the following subsections, more details about the functions exported by the \pkg{AcceptReject} package are presented, with examples to facilitate understanding. Additional details can be found in the function documentation, accessible with \texttt{help(package\ =\ "AcceptReject")} or on the package's website, hosted in the GitHub repository that versions the development of the library, at \url{https://prdm0.github.io/AcceptReject/}.

\subsection{Inspecting Density Functions}\label{inspecting-density-functions}

The \href{https://prdm0.github.io/AcceptReject/reference/inspect.html}{\texttt{inspect()}} function of the \pkg{AcceptReject} package is useful when we want to generate pseudo-random observations of a continuous random variable using the ARM. This inspection can be skipped, since the \texttt{accept\_reject()} function already considers the continuous uniform distribution as the base density by default and, based on it, optimizes the value of the constant \(c\). Other base densities \(g_Y\) can also be specified to the \texttt{accept\_reject()} function, in which case the search for the constant \(c\) is still done automatically, optimizing \(c\) given \(g_Y\) and its relationship with \(f_X\).

When specifying another base density function \(g_Y\), it is prudent to perform a graphical inspection of the relationship between \(f_X\) and \(g_Y\) to assess the reasonableness of the candidate base probability density function. To that end, the \texttt{inspect()} function automatically plots the functions \(f_X\) and \(g_Y\) together with some useful information.
The \texttt{inspect()} function will return an object with the secondary class \texttt{gg} and the primary class \texttt{ggplot} (a graph made with the \pkg{ggplot2} library), highlighting the intersection area between \(f_X\) and \(g_Y\) and the value of this area for a given \(c\), specified in the \texttt{c} argument of the \texttt{inspect()} function.
+
+Theoretically, you can use any function \(g_Y\) whose support is equivalent to that of \(f_X\), finding an appropriate value of \(c\) that makes \(c \cdot g_Y\) envelop \(f_X\), that is, so that the intersection area equals 1 (the entire area under \(f_X\)). The \texttt{inspect()} function has the following form:
+
+\begin{verbatim}
+inspect(
+  f,
+  args_f,
+  f_base,
+  args_f_base,
+  xlim,
+  c = 1,
+  alpha = 0.4,
+  color_intersection = "#BB9FC9",
+  color_f = "#FE4F0E",
+  color_f_base = "#7BBDB3"
+)
+\end{verbatim}
+
+where:
+
+\begin{enumerate}
+\def\labelenumi{\arabic{enumi}.}
+\tightlist
+\item
+  \texttt{f}: is the probability density function \(f_X\) of the random variable \(X\) of interest;
+\item
+  \texttt{args\_f}: is a list of arguments that will be passed to the probability density function \(f_X\) and that specify the parameters of the density of \(X\);
+\item
+  \texttt{f\_base}: is the probability density function proposed as the base density \(g_Y\);
+\item
+  \texttt{args\_f\_base}: is a list of arguments that will be passed to the base probability density function \(g_Y\) and that specify the parameters of the density of \(Y\);
+\item
+  \texttt{xlim}: is a vector of size two that specifies the support of the functions \(f_X\) and \(g_Y\) (they must be equivalent);
+\item
+  \texttt{c}: is the constant \(c\) that will be used to multiply the base probability density function \(g_Y\) so that it envelops (bounds) \(f_X\).
The default is \texttt{c\ =\ 1};
+\item
+  \texttt{alpha}: is the transparency of the intersection area (default is \texttt{alpha\ =\ 0.4});
+\item
+  \texttt{color\_intersection}: is the color of the intersection area between \(f_X\) and \(g_Y\) (default is \texttt{color\_intersection\ =\ "\#BB9FC9"});
+\item
+  \texttt{color\_f}: is the color of the curve of the probability density function \(f_X\) (default is \texttt{color\_f\ =\ "\#FE4F0E"});
+\item
+  \texttt{color\_f\_base}: is the color of the curve of the base probability density function \(g_Y\) (default is \texttt{color\_f\_base\ =\ "\#7BBDB3"}).
+\end{enumerate}
+
+Figures \ref{fig:fig-inspect-1} (a) and \ref{fig:fig-inspect-1} (b), presented at the beginning of this paper, were created with the \href{https://prdm0.github.io/AcceptReject/reference/inspect.html}{\texttt{inspect()}} function. As an example, in addition to those available in the package documentation and vignettes, here is the code that generated the graph shown in Figure \ref{fig:fig-inspect-1} (a):
+
+\begin{verbatim}
+library(AcceptReject)
+
+# Considering c = 1 (default)
+inspect(
+  f = dweibull,
+  f_base = dunif,
+  xlim = c(0, 5),
+  args_f = list(shape = 2, scale = 1),
+  args_f_base = list(min = 0, max = 5),
+  c = 1
+)
+\end{verbatim}
+
+\subsection{Generating Observations with ARM}\label{generating-observations-with-arm}
+
+The most important function of the \pkg{AcceptReject} package is the \texttt{accept\_reject()} function, as it implements ARM and all the optimizations needed for good generation of \(X\). The \texttt{accept\_reject()} function has the following signature:
+
+\begin{verbatim}
+accept_reject(
+  n = 1L,
+  continuous = TRUE,
+  f = NULL,
+  args_f = NULL,
+  f_base = NULL,
+  random_base = NULL,
+  args_f_base = NULL,
+  xlim = NULL,
+  c = NULL,
+  parallel = FALSE,
+  cores = NULL,
+  warning = TRUE,
+  ...
+)
+\end{verbatim}
+
+where:
+
+\begin{enumerate}
+\def\labelenumi{\arabic{enumi}.}
+\tightlist
+\item
+  \texttt{n}: is the number of observations to be generated (default \texttt{n\ =\ 1L});
+\item
+  \texttt{continuous}: is a logical value that indicates whether the probability density function \(f_X\) is continuous (default \texttt{continuous\ =\ TRUE}) or discrete (\texttt{continuous\ =\ FALSE});
+\item
+  \texttt{f}: is the probability density function \(f_X\) of the random variable \(X\) of interest;
+\item
+  \texttt{args\_f}: is a list of arguments that will be passed to the probability density function \(f_X\) and that specify the parameters of the density of \(X\). No matter how many parameters there are in \texttt{f}, they should be passed as a list to \texttt{args\_f};
+\item
+  \texttt{f\_base}: is the probability density function proposed as the base density \(g_Y\). Note that this argument is only useful when \texttt{continuous\ =\ TRUE}, as in the discrete case the package already has quite satisfactory computational performance. Additionally, visualizing a probability mass function that bounds \(f_X\) is more complicated than in the continuous case. In the discrete case, when \texttt{continuous\ =\ FALSE} and \texttt{f\_base\ =\ NULL}, the discrete uniform distribution will be used;
+\item
+  \texttt{random\_base}: is a function that generates observations from the base distribution \(Y\) (default \texttt{random\_base\ =\ NULL}). If \texttt{random\_base\ =\ NULL}, the base distribution \(Y\) will be considered uniform over the interval specified in the \texttt{xlim} argument;
+\item
+  \texttt{args\_f\_base}: is a list of arguments that will be passed to the base probability density function \(g_Y\) and that specify the parameters of the density of \(Y\).
No matter how many parameters there are in \texttt{f\_base}, they should be passed as a list to \texttt{args\_f\_base};
+\item
+  \texttt{xlim}: is a vector of size two that specifies the support of the functions \(f_X\) and \(g_Y\). It is important to remember that the supports of \(f_X\) and \(g_Y\) must be equivalent and are informed by a single vector of size two passed to \texttt{xlim};
+\item
+  \texttt{c}: is the constant \(c\) that will be used to multiply the base probability density function \(g_Y\) so that it envelops (bounds) \(f_X\). The default is \texttt{c\ =\ NULL}, in which case the \texttt{accept\_reject()} function will attempt to find an optimal value for the constant \(c\); unless there is a very strong reason to fix \(c\), keeping this default is a good choice;
+\item
+  \texttt{parallel}: is a logical value that indicates whether the generation of observations will be done in parallel (default \texttt{parallel\ =\ FALSE}). If \texttt{parallel\ =\ TRUE}, the generation of observations will be done in parallel on Unix-based systems, using the total number of cores available on the system. If \texttt{parallel\ =\ TRUE} and the operating system is Windows, the code will run serially, and no error will be returned;
+\item
+  \texttt{cores}: is the number of cores that will be used for parallel observation generation. If \texttt{parallel\ =\ TRUE} and \texttt{cores\ =\ NULL}, all cores available on the system will be used. A smaller number of cores can be set through the \texttt{cores} argument;
+\item
+  \texttt{warning}: is a logical value that indicates whether warning messages will be printed during the execution of the function (default \texttt{warning\ =\ TRUE}).
If the user specifies a very small domain in \texttt{xlim}, the \texttt{accept\_reject()} function will issue a warning informing that the specified domain is too small and that the generation of observations may be compromised;
+\item
+  \texttt{...}: additional arguments that the user might want to pass to the \texttt{optimize} function used to optimize the value of \texttt{c}.
+\end{enumerate}
+
+In the simplest use case, where the user does not wish to specify the base density function \(g_Y\) passed to \texttt{f\_base}, generating a sample (\(x_1, \ldots, x_n\)) from a continuous random variable with the \texttt{accept\_reject()} function is quite straightforward. For example, to generate 100 observations of a random variable \(X\) with probability density function \(f_X(x) = 2x\), \(0 \leq x \leq 1\), the user could do:
+
+\begin{verbatim}
+set.seed(0)
+
+# Generate 100 observations from a random variable X with
+# f_X(x) = 2x, 0 <= x <= 1.
+x <- accept_reject(
+  n = 100L,
+  f = function(x) 2 * x,
+  args_f = list(),
+  xlim = c(0, 1),
+  warning = FALSE
+)
+print(x[1L:8L])
+\end{verbatim}
+
+\begin{verbatim}
+#> [1] 0.8966972 0.3721239 0.5728534 0.8983897 0.9446753 0.6607978 0.6870228
+#> [8] 0.7698414
+\end{verbatim}
+
+Note that, with \texttt{warning\ =\ TRUE} (the default), the \texttt{accept\_reject()} function cannot know that \texttt{xlim\ =\ c(0,\ 1)} specifies the entire support of the probability density function \(f_X(x) = 2x\), \(0 \leq x \leq 1\), which is why it issues a warning. Typically, one passes to \texttt{xlim} a support such that, below its lower limit (first element of \texttt{xlim}) and above its upper limit (second element), the probability mass (in the discrete case) or density (in the continuous case) is close to zero or not defined. In this case, we have deliberately set \texttt{warning\ =\ FALSE}.
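+
+The \texttt{qqplot()} function introduced earlier offers a quick check of the generated sample. As a sketch (reusing the vector \texttt{x} generated above; the additional checks via \texttt{mean()} and \texttt{ks.test()} are our own illustration, not part of the package), one could do:
+
+\begin{verbatim}
+# QQ-plot comparing observed and theoretical quantiles
+qqplot(x)
+
+# f_X(x) = 2x on [0, 1] has CDF F_X(x) = x^2 and E[X] = 2/3
+mean(x)              # should be close to 2/3
+ks.test(x, \(q) q^2) # Kolmogorov-Smirnov test against F_X
+\end{verbatim}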
+
+\subsection{Printing the Object with Generated Observations}\label{printing-the-object-with-generated-observations}
+
+The \texttt{accept\_reject()} function returns an object of class \texttt{accept\_reject}: essentially an atomic vector with the observations generated by ARM, carrying attributes and marked with the \texttt{accept\_reject} class from the \pkg{AcceptReject} package. These attributes are useful for the package and carry information used by the \texttt{plot.accept\_reject()} function.
+
+Using the S3 object-oriented system in R, the \texttt{print()} function dispatches on objects of the \texttt{accept\_reject} class, invoking the \texttt{print.accept\_reject()} method from the \pkg{AcceptReject} package. To an object of the \texttt{accept\_reject} class returned by the \texttt{accept\_reject()} function, it is also possible to apply functions that operate on atomic vectors, such as \texttt{summary()}, \texttt{mean()}, and \texttt{var()}. The \texttt{print()} function will print useful information about the object, such as the number of observations generated, the value of \texttt{c} used, the acceptance probability, the first generated observations, and the considered \texttt{xlim} interval. The \texttt{summary()} function returns some descriptive statistics about the generated observations, such as the mean, median, variance, standard deviation, minimum, maximum, first quartile, and third quartile.
+ +\begin{verbatim} +# setting a seed for reproducibility +set.seed(0) +x <- accept_reject( + n = 2000L, + f = dbinom, + continuous = FALSE, + args_f = list(size = 5, prob = 0.5), + xlim = c(0, 10) +) + +# Printing the first 10 (default) observations +print(x) +\end{verbatim} + +\begin{verbatim} +# Printing the first 20 observations +print(x, n_min = 20L) +\end{verbatim} + +\begin{verbatim} +# Summary +summary(x) +\end{verbatim} + +\begin{verbatim} +#> V1 +#> Min. :0.000 +#> 1st Qu.:2.000 +#> Median :3.000 +#> Mean :2.538 +#> 3rd Qu.:3.000 +#> Max. :5.000 +\end{verbatim} + +\subsection{Plotting Data Generated by ARM}\label{plotting-data-generated-by-arm} + +Often, when generating observations using ARM, we are interested in visualizing the generated observations and graphically comparing them with the probability mass function (discrete case) or probability density function (continuous case) to get an idea of the quality of the data generation. Constructing a graph with your favorite library for each generation can be time-consuming. The idea of the \texttt{plot.accept\_reject()} method is to facilitate this task, allowing you to easily and quickly generate this type of graph. In fact, since the S3 object-oriented system is used, you only need to use the \texttt{plot()} function on an \texttt{accept\_reject} class object. + +The \texttt{plot()} function applied to an \texttt{accept\_reject} class object will return an object with the secondary class \texttt{gg} and the primary class \texttt{ggplot} (returns a graph). The returned object is a graph constructed with the \pkg{ggplot2} library and can be modified to your preferred \pkg{ggplot2} standards, such as its theme. However, the \texttt{plot} function has some specific arguments that allow you to modify some elements. 
The general usage form is:
+
+\begin{verbatim}
+## S3 method for class 'accept_reject'
+plot(
+  x,
+  color_observed_density = "#BB9FC9",
+  color_true_density = "#FE4F0E",
+  color_bar = "#BB9FC9",
+  color_observable_point = "#7BBDB3",
+  color_real_point = "#FE4F0E",
+  alpha = 0.3,
+  hist = TRUE,
+  ...
+)
+\end{verbatim}
+
+where:
+
+\begin{enumerate}
+\def\labelenumi{\arabic{enumi}.}
+\tightlist
+\item
+  \texttt{x}: an object of the \texttt{accept\_reject} class;
+\item
+  \texttt{color\_observed\_density}: observed density color (continuous case);
+\item
+  \texttt{color\_true\_density}: theoretical density color (continuous case);
+\item
+  \texttt{color\_bar}: bar chart fill color (discrete case);
+\item
+  \texttt{color\_observable\_point}: color of the generated points (discrete case);
+\item
+  \texttt{color\_real\_point}: color of the real points (discrete case);
+\item
+  \texttt{alpha}: transparency of the bars (discrete case);
+\item
+  \texttt{hist}: if \texttt{TRUE}, a histogram will be plotted in the continuous case, comparing the theoretical density with the observed one. If \texttt{FALSE}, \texttt{ggplot2::geom\_density()} will be used instead of the histogram;
+\item
+  \texttt{...}: additional arguments.
+\end{enumerate}
+
+The following code demonstrates the use of the \texttt{plot()} function. In Figure \ref{fig:fig-plotfunc-1} (a), an example is presented with the histogram replaced by the observed density, if this type of representation is more useful to the user; to do this, simply pass the argument \texttt{hist\ =\ FALSE}. To illustrate the use of the arguments, the parameters \texttt{color\_true\_density}, \texttt{color\_observed\_density}, and \texttt{alpha} were also changed. In Figure \ref{fig:fig-plotfunc-1} (b), an example of a graph for the discrete case is presented. The simple use of the \texttt{plot()} function without additional arguments should be sufficient for most users.
+ +\begin{verbatim} +library(AcceptReject) + +# Generating and plotting the theoretical density with the +# observed density. + +# setting a seed for reproducibility +set.seed(0) + +# Continuous case +accept_reject( + n = 2000L, + continuous = TRUE, + f = dweibull, + args_f = list(shape = 2.1, scale = 2.2), + xlim = c(0, 10) +) |> + plot( + hist = FALSE, + color_true_density = "#2B8b99", + color_observed_density = "#F4DDB3", + alpha = 0.6 + ) # Changing some arguments in plot() + +# Discrete case +accept_reject( + n = 1000L, + f = dbinom, + continuous = FALSE, + args_f = list(size = 5, prob = 0.5), + xlim = c(0, 10) +) |> plot() +\end{verbatim} + +\begin{figure}[H] + +{\centering \subfloat[Weibull with $n = 2000$ observations.\label{fig:fig-plotfunc-1-1}]{\includegraphics[width=0.5\linewidth]{RJ-2025-037_files/figure-latex/fig-plotfunc-1-1} }\subfloat[Binomial with $n = 1000$ observations.\label{fig:fig-plotfunc-1-2}]{\includegraphics[width=0.5\linewidth]{RJ-2025-037_files/figure-latex/fig-plotfunc-1-2} } + +} + +\caption{Plotting the theoretical density function (a) and the probability mass function (b), with details of the respective parameters in the code.}\label{fig:fig-plotfunc-1} +\end{figure} + +\section{Examples}\label{examples} + +Below are some examples of using the \texttt{accept\_reject()} function to generate pseudo-random observations of discrete and continuous random variables. It should be noted that in the case of \(X\) being a discrete random variable, it is necessary to provide the argument \texttt{continuous\ =\ FALSE}, whereas in the case of \(X\) being continuous (the default), you must consider \texttt{continuous\ =\ TRUE}. + +\subsection{Generating discrete observations}\label{generating-discrete-observations} + +As an example, let \(X \sim Poisson(\lambda = 0.7)\). We will generate \(n = 1000\) observations of \(X\) using the acceptance-rejection method, using the \texttt{accept\_reject()} function. 
Note that it is necessary to provide the \texttt{xlim} argument. Try to set an upper limit value for which the probability of \(X\) assuming that value is zero or very close to zero. In this case, we choose \texttt{xlim\ =\ c(0,\ 20)}, where \texttt{dpois(x\ =\ 20,\ lambda\ =\ 0.7)} is very close to zero (\ensuremath{1.6286586\times 10^{-22}}). Compare Figure \ref{fig:fig-poisson-1} (a) and Figure \ref{fig:fig-poisson-1} (b): in Figure \ref{fig:fig-poisson-1} (b), the observed probabilities are close to the theoretical probabilities, indicating that ARM generates observations that approximate the true probability mass function as the sample size increases.
+
+\begin{verbatim}
+library(AcceptReject)
+library(parallel)
+library(cowplot) # install.packages("cowplot")
+
+# Ensuring reproducibility in parallel computing
+RNGkind("L'Ecuyer-CMRG")
+set.seed(0)
+mc.reset.stream()
+
+# Simulation
+simulation <- function(n, lambda = 0.7)
+  accept_reject(
+    n = n,
+    f = dpois,
+    continuous = FALSE, # discrete case
+    args_f = list(lambda = lambda),
+    xlim = c(0, 20),
+    parallel = TRUE # Parallelizing the code in Unix-based systems
+)
+
+# Generating observations
+# n = 25 observations
+system.time({x <- simulation(25L)})
+\end{verbatim}
+
+\begin{verbatim}
+#> user system elapsed
+#> 0.006 0.038 0.047
+\end{verbatim}
+
+\begin{verbatim}
+plot(x)
+
+# n = 2500 observations
+system.time({y <- simulation(2500L)})
+\end{verbatim}
+
+\begin{verbatim}
+#> user system elapsed
+#> 0.006 0.033 0.043
+\end{verbatim}
+
+\begin{verbatim}
+plot(y)
+\end{verbatim}
+
+\begin{figure}[H]
+
+{\centering \subfloat[n = 25 observations.\label{fig:fig-poisson-1-1}]{\includegraphics[width=0.5\linewidth]{RJ-2025-037_files/figure-latex/fig-poisson-1-1} }\subfloat[n = 2500 observations.\label{fig:fig-poisson-1-2}]{\includegraphics[width=0.5\linewidth]{RJ-2025-037_files/figure-latex/fig-poisson-1-2} }
+
+}
+
+\caption{Generating observations from a Poisson
distribution using the acceptance-rejection method, with $n = 25$ (a) and $n = 2500$ (b), respectively.}\label{fig:fig-poisson-1}
+\end{figure}
+
+Note that it is necessary to specify the nature of the random variable from which observations are to be generated: in the case of discrete variables, the argument \texttt{continuous\ =\ FALSE} must be passed. The following subsection presents examples of how to generate continuous observations.
+
+\subsection{Generating continuous observations}\label{generating-continuous-observations}
+
+Considering the default base distribution, the uniform, which does not need to be specified, the code below exemplifies the continuous case, where \(X \sim \mathcal{N}(\mu = 0, \sigma^2 = 1)\). Not specifying a base probability density function implies \texttt{f\_base\ =\ NULL}, \texttt{random\_base\ =\ NULL}, and \texttt{args\_f\_base\ =\ NULL}, which are the defaults of the \texttt{accept\_reject()} function. If some of these are left as \texttt{NULL} and, by mistake, another is specified, no error will occur: in this situation, the \texttt{accept\_reject()} function will assume that the base probability density function is the density of a uniform distribution over \texttt{xlim}. Note also the use of the \texttt{plot()} function, which generates a graph with the theoretical density function and the histogram of the generated observations, allowing a quick visual check of the quality of the observations generated by ARM.
+
+\begin{verbatim}
+library(AcceptReject)
+library(parallel)
+
+# Ensuring reproducibility in parallel computing
+RNGkind("L'Ecuyer-CMRG")
+set.seed(0)
+mc.reset.stream()
+
+# Generating observations
+accept_reject(
+  n = 50L,
+  f = dnorm,
+  continuous = TRUE,
+  args_f = list(mean = 0, sd = 1),
+  xlim = c(-4, 4),
+  parallel = TRUE
+) |> plot()
+
+accept_reject(
+  n = 500L,
+  f = dnorm,
+  continuous = TRUE,
+  args_f = list(mean = 0, sd = 1),
+  xlim = c(-4, 4),
+  parallel = TRUE
+) |> plot()
+\end{verbatim}
+
+\begin{figure}[H]
+
+{\centering \subfloat[n = 50 observations.\label{fig:fig-normal-1}]{\includegraphics[width=0.5\linewidth]{RJ-2025-037_files/figure-latex/fig-normal-1} }\subfloat[n = 500 observations.\label{fig:fig-normal-2}]{\includegraphics[width=0.5\linewidth]{RJ-2025-037_files/figure-latex/fig-normal-2} }
+
+}
+
+\caption{Generating observations from a continuous random variable with a Standard Normal distribution, with $n = 50$ and $n = 500$ observations, respectively.}\label{fig:fig-normal}
+\end{figure}
+
+\section{A practical scenario for the use of the package}\label{a-practical-scenario-for-the-use-of-the-package}
+
+So far, the use of the package to generate distributions that are relatively simple and involve few parameters has been demonstrated. In this section, the idea is to demonstrate the use of the \pkg{AcceptReject} package in a practical situation where the use of ARM becomes necessary. Let's consider the generator of Modified Beta Distributions, proposed by \citet{nadarajah2014modified}.
It is a family of probability distributions, since various probability density functions can be generated through the proposed density generator, whose general density function is defined by:
+
+\[f_X(x) = \frac{\beta^a}{B(a,b)} \times \frac{g(x)G(x)^{a - 1}(1 - G(x))^{b - 1}}{[1 - (1 - \beta)G(x)]^{a + b}},\] with \(x \geq 0\) and \(\beta, a, b > 0\), where \(g(x)\) is a probability density function, \(G(x)\) is the cumulative distribution function of \(g(x)\), and \(B(a,b)\) is the beta function.
+
+Notice that \(f_{X}(x)\) has a certain complexity, since it depends on another probability density function \(g(x)\) and its cumulative distribution function \(G(x)\). Thus, \(f_X(x)\) has three parameters, \(\beta, a, b\), plus additional parameters inherited from \(g(x)\). Here, we will consider \(g(x)\) as the Weibull probability density function; therefore, \(f_X(x)\) is the Modified Beta Weibull density function, with five parameters. The implementation of the Modified Beta Distributions generator is presented below:
+
+\begin{verbatim}
+library(numDeriv)
+
+pdf <- function(x, G, ...){
+  numDeriv::grad(
+    func = \(x) G(x, ...),
+    x = x
+  )
+}
+
+# Modified Beta Distributions
+# Link: https://link.springer.com/article/10.1007/s13571-013-0077-0
+generator <- function(x, G, a, b, beta, ...){
+  g <- pdf(x = x, G = G, ...)
+  numerator <- beta^a * g * G(x, ...)^(a - 1) * (1 - G(x, ...))^(b - 1)
+  denominator <- beta(a, b) * (1 - (1 - beta) * G(x, ...))^(a + b)
+  numerator/denominator
+}
+
+# Probability density function - Modified Beta Weibull
+pdf_mbw <- function(x, a, b, beta, shape, scale)
+  generator(
+    x = x,
+    G = pweibull,
+    a = a,
+    b = b,
+    beta = beta,
+    shape = shape,
+    scale = scale
+  )
+
+# Checking the value of the integral
+integrate(
+  f = \(x) pdf_mbw(x, 1, 1, 1, 1, 1),
+  lower = 0,
+  upper = Inf
+)
+\end{verbatim}
+
+\begin{verbatim}
+#> 1 with absolute error < 5.7e-05
+\end{verbatim}
+
+Notice that \texttt{pdf\_mbw()} integrates to 1, so it is a probability density function. Thus, the \texttt{generator()} function generates probability density functions based on another distribution function \(G(x)\). In the code above, the cumulative distribution function of the Weibull distribution is passed to the \texttt{generator()} function, but it could be any other.
+
+In the following code, we will adopt the strategy of investigating (inspecting) a coherent proposal for a base density function to be passed to the \texttt{f\_base} argument of the \texttt{accept\_reject()} function. The investigation could be skipped, in which case the \texttt{accept\_reject()} function would assume the uniform distribution as the base.
+
+We will consider the Weibull distribution, since it is a particular case of the Modified Beta Weibull distribution. As we know how to generate observations from the Weibull distribution using the \texttt{rweibull()} function, the Weibull distribution is a viable candidate for the base density \(g_Y(y)\). Consider the true parameters \texttt{a\ =\ 10.5}, \texttt{b\ =\ 4.2}, \texttt{beta\ =\ 5.9}, \texttt{shape\ =\ 1.5}, and \texttt{scale\ =\ 1.7}.
Thus, using the \texttt{inspect()} function, we can quickly inspect by doing:
+
+\begin{verbatim}
+library(AcceptReject)
+
+# True parameters
+a <- 10.5
+b <- 4.2
+beta <- 5.9
+shape <- 1.5
+scale <- 1.7
+
+# c = 1 (default)
+inspect(
+  f = pdf_mbw,
+  f_base = dweibull,
+  xlim = c(0, 4),
+  args_f = list(
+    a = a,
+    b = b,
+    beta = beta,
+    shape = shape,
+    scale = scale
+  ),
+  args_f_base = list(shape = 2, scale = 1.2),
+  c = 1
+)
+
+# c = 2.2
+inspect(
+  f = pdf_mbw,
+  f_base = dweibull,
+  xlim = c(0, 4),
+  args_f = list(
+    a = a,
+    b = b,
+    beta = beta,
+    shape = shape,
+    scale = scale
+  ),
+  args_f_base = list(shape = 2, scale = 1.2),
+  c = 2.2
+)
+\end{verbatim}
+
+\begin{figure}[H]
+
+{\centering \subfloat[For c = 1.\label{fig:fig-nadarajah-1}]{\includegraphics[width=0.5\linewidth]{RJ-2025-037_files/figure-latex/fig-nadarajah-1} }\subfloat[For c = 2.2.\label{fig:fig-nadarajah-2}]{\includegraphics[width=0.5\linewidth]{RJ-2025-037_files/figure-latex/fig-nadarajah-2} }
+
+}
+
+\caption{Inspecting the Weibull distribution with shape = 2, scale = 1.2, with the support xlim = c(0, 4) and c = 1 (default) (a) and c = 2.2 (b), respectively.}\label{fig:fig-nadarajah}
+\end{figure}
+
+Notice in Figure \ref{fig:fig-nadarajah} (b) that, when \(c = 2.2\), the function \(c \cdot g_Y\) bounds the density \(f_X\), the Modified Beta Weibull density of \(X\) from which we want to generate observations. Thus, the density \(g_Y\) used as a base is a viable candidate to be passed to the \texttt{f\_base} argument of the \texttt{accept\_reject()} function in the \pkg{AcceptReject} package. Also, note that the area between \(c \cdot g_Y\) and \(f_X\) is smaller than it would be if \(g_Y\) were the uniform probability density function over the \texttt{xlim} support. In the following section, we will discuss the computational cost for different sample sizes, considering the base density \(g_Y\) as the Weibull density or the default (uniform density).
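+
+Before turning to the benchmarks, note that the generation step itself can be sketched as follows (a sketch reusing \texttt{pdf\_mbw} and the true parameters defined above, with the inspected value \(c = 2.2\)):
+
+\begin{verbatim}
+set.seed(0)
+
+# Generating observations from the Modified Beta Weibull
+# distribution, with the Weibull as the base density
+x <- accept_reject(
+  n = 1000L,
+  f = pdf_mbw,
+  args_f = list(a = a, b = b, beta = beta, shape = shape, scale = scale),
+  f_base = dweibull,
+  random_base = rweibull,
+  args_f_base = list(shape = 2, scale = 1.2),
+  xlim = c(0, 4),
+  c = 2.2
+)
+plot(x)
+\end{verbatim}
+
+Omitting \texttt{f\_base}, \texttt{random\_base}, and \texttt{args\_f\_base} would instead use the uniform base density over \texttt{xlim}, as discussed earlier.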
+
+\section{Benchmarking}\label{benchmarking}
+
+The benchmarks in this section were performed on a computer running the Arch Linux operating system, with an Intel Core(TM) i7-1260P processor with 16 threads and a maximum frequency of 4.70 GHz; further specifications were presented in Section 3. Computational times are reported on a base-10 logarithmic scale.
+
+Considering the case of the Modified Beta Weibull probability density function (a probability density function with five parameters) implemented in the \texttt{pdf\_mbw()} function presented in Section 7, several benchmarks were conducted to evaluate the computational cost of the \texttt{accept\_reject()} function for various sample sizes, in both the parallelized and non-parallelized scenarios. Additionally, the benchmarks took into account the specification of \(g_Y\) as the continuous uniform distribution over the \texttt{xlim} interval (the default of the \texttt{accept\_reject()} function) and as the Weibull distribution with parameters \texttt{shape\ =\ 2} and \texttt{scale\ =\ 1.2}, which bounds the probability density function of a random variable with a Modified Beta Weibull distribution with true parameters as in Figure \ref{fig:fig-nadarajah}. The sample sizes considered were \(n = 50\), \(250\), \(500\), \(1000\), \(5000\), \(10000\), \(15000\), \(25000\), \(50000\), \(100000\), \(150000\), \(250000\), \(500000\), and \(1000000\).
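+
+A single point of such a benchmark can be reproduced, for instance, with the \CRANpkg{microbenchmark} package (a sketch under the assumption that \texttt{pdf\_mbw} and the true parameters from the previous section are in scope; the exact setup behind the figures below is not reproduced here):
+
+\begin{verbatim}
+library(microbenchmark)
+
+# Serial versus parallel generation of n = 10000 observations
+bench <- microbenchmark(
+  serial = accept_reject(
+    n = 10000L,
+    f = pdf_mbw,
+    args_f = list(a = a, b = b, beta = beta, shape = shape, scale = scale),
+    xlim = c(0, 4)
+  ),
+  parallel = accept_reject(
+    n = 10000L,
+    f = pdf_mbw,
+    args_f = list(a = a, b = b, beta = beta, shape = shape, scale = scale),
+    xlim = c(0, 4),
+    parallel = TRUE
+  ),
+  times = 10L
+)
+print(bench)
+\end{verbatim}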
+
+\begin{figure}[H]
+
+{\centering \subfloat[Weibull distribution.\label{fig:fig-benchmarking-1}]{\includegraphics[width=0.5\linewidth]{RJ-2025-037_files/figure-latex/fig-benchmarking-1} }\subfloat[Uniform distribution (default).\label{fig:fig-benchmarking-2}]{\includegraphics[width=0.5\linewidth]{RJ-2025-037_files/figure-latex/fig-benchmarking-2} }
+
+}
+
+\caption{Benchmarking for different sample sizes, considering the Weibull distribution (a) and the uniform distribution (default) (b) as the base density, respectively.}\label{fig:fig-benchmarking}
+\end{figure}
+
+Observing Figure \ref{fig:fig-benchmarking}, it is possible to see that the serial code, which is the default for the \texttt{accept\_reject()} function, already performs excellently even with large samples, whether a base function \(g_Y\) is specified or the uniform distribution is used as the base. The parallelized code performs better only with large samples, as parallel execution imposes a thread-creation overhead that is not justified for small samples; in these benchmarks, the parallelized code shows better performance for sample sizes above 10,000. However, depending on the complexity of the probability distributions involved, there may be situations where parallelizing with moderate samples could be a good alternative. In practice, the package user should conduct tests and decide whether to use \texttt{parallel\ =\ TRUE} or \texttt{parallel\ =\ FALSE}.
+
+It can be observed in Figure \ref{fig:fig-benchmarking} that the choice of the base distribution, for the simulated case, did not significantly influence the computational performance of the \texttt{accept\_reject()} function. Often, depending on the complexity of \(f_X\), the user might not need to worry about choosing a \(g_Y\) to pass to the \texttt{f\_base} argument of the \texttt{accept\_reject()} function.
Additionally, it is not very common in Monte Carlo simulation studies to consider sample sizes much larger than those considered here. Therefore, users of non-Unix-based systems will not experience significant issues regarding the computational cost of the \texttt{accept\_reject()} function.
+
+\section{Related works}\label{related-works}
+
+An implementation of ARM is provided by the \CRANpkg{AR} library \citep{AR}. This library exports the \texttt{AR.Sim()} function, which has educational utility in demonstrating the functioning of ARM. Additionally, the design of the \CRANpkg{AcceptReject} package allows, in the continuous case, the use of base probability density functions that do not necessarily need to be implemented in the R language or in specific packages, as in the case of the \CRANpkg{AR} library, which is a significant advantage. The specifications of the base densities in the \CRANpkg{AR} package are made considering only the densities implemented in the \CRANpkg{DISTRIB} package \citep{DISTRIB}.
+
+Another library that implements ARM is \CRANpkg{SimDesign} \citep{SimDesign}, through the \texttt{rejectionSampling()} function. The \texttt{rejectionSampling()} function is more efficient than the \texttt{AR.Sim()} function from the \CRANpkg{AR} library, but its efficiency still does not surpass that of the \texttt{accept\_reject()} function from the \CRANpkg{AcceptReject} package. The \texttt{rejectionSampling()} function from the \CRANpkg{SimDesign} library also does not support parallelism, which is a disadvantage compared to the \texttt{accept\_reject()} function from the \CRANpkg{AcceptReject} package. Furthermore, the design of the \CRANpkg{AcceptReject} package uses the S3 object-oriented system, exporting simple functions that allow the inspection and analysis of the generated data.
+ +The \CRANpkg{AcceptReject} package offers several advantages over the aforementioned packages, particularly in its design, which makes its functions easy to use. For example, when \(X\) and \(Y\) are continuous random variables, the package provides functions to inspect \(f_X\) against \(g_Y\), producing highly informative plots that show the user the quality of the generated observations. Users of ARM commonly want to examine the generated data and check that they conform to the desired probability distribution. + +Figure \ref{fig:fig-bench} (a) presents a simulation study comparing the \texttt{accept\_reject()} function from the \CRANpkg{AcceptReject} package with the \texttt{rejectionSampling()} function from the \CRANpkg{SimDesign} package for small to moderate sample sizes. In this scenario, \texttt{accept\_reject()} was executed serially, like \texttt{rejectionSampling()}, which does not support parallelism; the two functions performed equivalently. Figure \ref{fig:fig-bench} (b) shows that, on large samples, \texttt{accept\_reject()} executed in parallel surpasses the performance of \texttt{rejectionSampling()}. 
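The inspection workflow described above can be sketched with the S3 generics dispatched on the object returned by \texttt{accept\_reject()}. The \texttt{print()} and \texttt{plot()} methods are assumptions based on the package documentation, not reproduced from the article:

```r
library(AcceptReject)
set.seed(7)

# Generate observations from a discrete target (binomial pmf).
x <- accept_reject(
  n = 2000L,
  f = dbinom,
  continuous = FALSE,
  args_f = list(size = 5, prob = 0.5),
  xlim = c(0, 5)
)

print(x)  # textual summary of the generated sample
plot(x)   # sample distribution overlaid with the true probability function
```

A quick visual check like this is usually enough to confirm that the generated observations follow the desired distribution.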
+ +\begin{figure}[H] + +{\centering \subfloat[Serial processing with AcceptReject.\label{fig:fig-bench-1}]{\includegraphics[width=0.5\linewidth]{RJ-2025-037_files/figure-latex/fig-bench-1} }\subfloat[Parallel processing with AcceptReject.\label{fig:fig-bench-2}]{\includegraphics[width=0.5\linewidth]{RJ-2025-037_files/figure-latex/fig-bench-2} } + +} + +\caption{Comparison between the AcceptReject and SimDesign packages for different sample sizes, generating observations from a random variable with a Modified Beta Weibull distribution: serial processing with AcceptReject (a) and parallel processing with AcceptReject (b).}\label{fig:fig-bench} +\end{figure} + +\section{Conclusion and future developments}\label{conclusion-and-future-developments} + +The \CRANpkg{AcceptReject} package is an efficient tool for generating pseudo-random observations from univariate distributions of discrete and continuous random variables using the acceptance-rejection method (ARM). The package is built to deliver both ease of use and computational efficiency: it can generate pseudo-random observations serially or in parallel, efficiently in either case, and it allows easy inspection and analysis of the generated data. + +In future developments, the \CRANpkg{AcceptReject} package could be extended with functions that visualize ARM in an interactive Shiny application for educational purposes. The most important step for future versions will be to pursue even greater performance. The project is also open to contributions from the community. 
+ +\bibliography{RJreferences.bib} + +\address{% +Pedro Rafael Diniz Marinho\\ +Federal University of Paraíba\\% +Department of Statistics\\ Cidade Universitária, s/n, Departamento de Estatística - UFPB\\ +% +\url{https://github.com/prdm0}\\% +\textit{ORCiD: \href{https://orcid.org/0000-0003-1591-8300}{0000-0003-1591-8300}}\\% +\href{mailto:pedro.rafael.mariho@gmail.com}{\nolinkurl{pedro.rafael.mariho@gmail.com}}% +} + +\address{% +Vera L. D. Tomazella\\ +Federal University of São Carlos\\% +Department of Statistics\\ Rodovia Washington Luiz, Km 235, Monjolinho\\ +% +\url{https://www.servidores.ufscar.br/vera/}\\% +\textit{ORCiD: \href{https://orcid.org/0000-0002-6780-2089}{0000-0002-6780-2089}}\\% +\href{mailto:vera@ufscar.br}{\nolinkurl{vera@ufscar.br}}% +} diff --git a/_articles/RJ-2025-037/RJ-2025-037.zip b/_articles/RJ-2025-037/RJ-2025-037.zip new file mode 100644 index 0000000000..252dde0adf Binary files /dev/null and b/_articles/RJ-2025-037/RJ-2025-037.zip differ diff --git a/_articles/RJ-2025-037/RJ-2025-037_files/figure-html5/fig-bench-1.png b/_articles/RJ-2025-037/RJ-2025-037_files/figure-html5/fig-bench-1.png new file mode 100644 index 0000000000..77adbfe4d7 Binary files /dev/null and b/_articles/RJ-2025-037/RJ-2025-037_files/figure-html5/fig-bench-1.png differ diff --git a/_articles/RJ-2025-037/RJ-2025-037_files/figure-html5/fig-bench-2.png b/_articles/RJ-2025-037/RJ-2025-037_files/figure-html5/fig-bench-2.png new file mode 100644 index 0000000000..a91b95c7b4 Binary files /dev/null and b/_articles/RJ-2025-037/RJ-2025-037_files/figure-html5/fig-bench-2.png differ diff --git a/_articles/RJ-2025-037/RJ-2025-037_files/figure-html5/fig-benchmarking-1.png b/_articles/RJ-2025-037/RJ-2025-037_files/figure-html5/fig-benchmarking-1.png new file mode 100644 index 0000000000..67d4a2f7cd Binary files /dev/null and b/_articles/RJ-2025-037/RJ-2025-037_files/figure-html5/fig-benchmarking-1.png differ diff --git 
a/_articles/RJ-2025-037/RJ-2025-037_files/figure-html5/fig-benchmarking-2.png b/_articles/RJ-2025-037/RJ-2025-037_files/figure-html5/fig-benchmarking-2.png new file mode 100644 index 0000000000..60d882c747 Binary files /dev/null and b/_articles/RJ-2025-037/RJ-2025-037_files/figure-html5/fig-benchmarking-2.png differ diff --git a/_articles/RJ-2025-037/RJ-2025-037_files/figure-html5/fig-inspect-1-1.png b/_articles/RJ-2025-037/RJ-2025-037_files/figure-html5/fig-inspect-1-1.png new file mode 100644 index 0000000000..34166b2bdb Binary files /dev/null and b/_articles/RJ-2025-037/RJ-2025-037_files/figure-html5/fig-inspect-1-1.png differ diff --git a/_articles/RJ-2025-037/RJ-2025-037_files/figure-html5/fig-inspect-1-2.png b/_articles/RJ-2025-037/RJ-2025-037_files/figure-html5/fig-inspect-1-2.png new file mode 100644 index 0000000000..070818999d Binary files /dev/null and b/_articles/RJ-2025-037/RJ-2025-037_files/figure-html5/fig-inspect-1-2.png differ diff --git a/_articles/RJ-2025-037/RJ-2025-037_files/figure-html5/fig-nadarajah-1.png b/_articles/RJ-2025-037/RJ-2025-037_files/figure-html5/fig-nadarajah-1.png new file mode 100644 index 0000000000..e6e1eee676 Binary files /dev/null and b/_articles/RJ-2025-037/RJ-2025-037_files/figure-html5/fig-nadarajah-1.png differ diff --git a/_articles/RJ-2025-037/RJ-2025-037_files/figure-html5/fig-nadarajah-2.png b/_articles/RJ-2025-037/RJ-2025-037_files/figure-html5/fig-nadarajah-2.png new file mode 100644 index 0000000000..6382f07f0f Binary files /dev/null and b/_articles/RJ-2025-037/RJ-2025-037_files/figure-html5/fig-nadarajah-2.png differ diff --git a/_articles/RJ-2025-037/RJ-2025-037_files/figure-html5/fig-normal-1.png b/_articles/RJ-2025-037/RJ-2025-037_files/figure-html5/fig-normal-1.png new file mode 100644 index 0000000000..1216bd66aa Binary files /dev/null and b/_articles/RJ-2025-037/RJ-2025-037_files/figure-html5/fig-normal-1.png differ diff --git a/_articles/RJ-2025-037/RJ-2025-037_files/figure-html5/fig-normal-2.png 
b/_articles/RJ-2025-037/RJ-2025-037_files/figure-html5/fig-normal-2.png new file mode 100644 index 0000000000..e0a9ff1c08 Binary files /dev/null and b/_articles/RJ-2025-037/RJ-2025-037_files/figure-html5/fig-normal-2.png differ diff --git a/_articles/RJ-2025-037/RJ-2025-037_files/figure-html5/fig-plotfunc-1-1.png b/_articles/RJ-2025-037/RJ-2025-037_files/figure-html5/fig-plotfunc-1-1.png new file mode 100644 index 0000000000..68b165951c Binary files /dev/null and b/_articles/RJ-2025-037/RJ-2025-037_files/figure-html5/fig-plotfunc-1-1.png differ diff --git a/_articles/RJ-2025-037/RJ-2025-037_files/figure-html5/fig-plotfunc-1-2.png b/_articles/RJ-2025-037/RJ-2025-037_files/figure-html5/fig-plotfunc-1-2.png new file mode 100644 index 0000000000..cc83d4cadd Binary files /dev/null and b/_articles/RJ-2025-037/RJ-2025-037_files/figure-html5/fig-plotfunc-1-2.png differ diff --git a/_articles/RJ-2025-037/RJ-2025-037_files/figure-html5/fig-poisson-1-1.png b/_articles/RJ-2025-037/RJ-2025-037_files/figure-html5/fig-poisson-1-1.png new file mode 100644 index 0000000000..8afd9fe281 Binary files /dev/null and b/_articles/RJ-2025-037/RJ-2025-037_files/figure-html5/fig-poisson-1-1.png differ diff --git a/_articles/RJ-2025-037/RJ-2025-037_files/figure-html5/fig-poisson-1-2.png b/_articles/RJ-2025-037/RJ-2025-037_files/figure-html5/fig-poisson-1-2.png new file mode 100644 index 0000000000..5051f466c3 Binary files /dev/null and b/_articles/RJ-2025-037/RJ-2025-037_files/figure-html5/fig-poisson-1-2.png differ diff --git a/_articles/RJ-2025-037/RJ-2025-037_files/figure-latex/fig-bench-1.pdf b/_articles/RJ-2025-037/RJ-2025-037_files/figure-latex/fig-bench-1.pdf new file mode 100644 index 0000000000..06006aee11 Binary files /dev/null and b/_articles/RJ-2025-037/RJ-2025-037_files/figure-latex/fig-bench-1.pdf differ diff --git a/_articles/RJ-2025-037/RJ-2025-037_files/figure-latex/fig-bench-2.pdf b/_articles/RJ-2025-037/RJ-2025-037_files/figure-latex/fig-bench-2.pdf new file mode 100644 
index 0000000000..b014f77404 Binary files /dev/null and b/_articles/RJ-2025-037/RJ-2025-037_files/figure-latex/fig-bench-2.pdf differ diff --git a/_articles/RJ-2025-037/RJ-2025-037_files/figure-latex/fig-benchmarking-1.pdf b/_articles/RJ-2025-037/RJ-2025-037_files/figure-latex/fig-benchmarking-1.pdf new file mode 100644 index 0000000000..6b4fbcc839 Binary files /dev/null and b/_articles/RJ-2025-037/RJ-2025-037_files/figure-latex/fig-benchmarking-1.pdf differ diff --git a/_articles/RJ-2025-037/RJ-2025-037_files/figure-latex/fig-benchmarking-2.pdf b/_articles/RJ-2025-037/RJ-2025-037_files/figure-latex/fig-benchmarking-2.pdf new file mode 100644 index 0000000000..c1ee0c0053 Binary files /dev/null and b/_articles/RJ-2025-037/RJ-2025-037_files/figure-latex/fig-benchmarking-2.pdf differ diff --git a/_articles/RJ-2025-037/RJ-2025-037_files/figure-latex/fig-inspect-1-1.pdf b/_articles/RJ-2025-037/RJ-2025-037_files/figure-latex/fig-inspect-1-1.pdf new file mode 100644 index 0000000000..15ac36897a Binary files /dev/null and b/_articles/RJ-2025-037/RJ-2025-037_files/figure-latex/fig-inspect-1-1.pdf differ diff --git a/_articles/RJ-2025-037/RJ-2025-037_files/figure-latex/fig-inspect-1-2.pdf b/_articles/RJ-2025-037/RJ-2025-037_files/figure-latex/fig-inspect-1-2.pdf new file mode 100644 index 0000000000..cfc5262a3b Binary files /dev/null and b/_articles/RJ-2025-037/RJ-2025-037_files/figure-latex/fig-inspect-1-2.pdf differ diff --git a/_articles/RJ-2025-037/RJ-2025-037_files/figure-latex/fig-nadarajah-1.pdf b/_articles/RJ-2025-037/RJ-2025-037_files/figure-latex/fig-nadarajah-1.pdf new file mode 100644 index 0000000000..968a1533d9 Binary files /dev/null and b/_articles/RJ-2025-037/RJ-2025-037_files/figure-latex/fig-nadarajah-1.pdf differ diff --git a/_articles/RJ-2025-037/RJ-2025-037_files/figure-latex/fig-nadarajah-2.pdf b/_articles/RJ-2025-037/RJ-2025-037_files/figure-latex/fig-nadarajah-2.pdf new file mode 100644 index 0000000000..c990051b60 Binary files /dev/null and 
b/_articles/RJ-2025-037/RJ-2025-037_files/figure-latex/fig-nadarajah-2.pdf differ diff --git a/_articles/RJ-2025-037/RJ-2025-037_files/figure-latex/fig-normal-1.pdf b/_articles/RJ-2025-037/RJ-2025-037_files/figure-latex/fig-normal-1.pdf new file mode 100644 index 0000000000..d4c7bba592 Binary files /dev/null and b/_articles/RJ-2025-037/RJ-2025-037_files/figure-latex/fig-normal-1.pdf differ diff --git a/_articles/RJ-2025-037/RJ-2025-037_files/figure-latex/fig-normal-2.pdf b/_articles/RJ-2025-037/RJ-2025-037_files/figure-latex/fig-normal-2.pdf new file mode 100644 index 0000000000..32129f158c Binary files /dev/null and b/_articles/RJ-2025-037/RJ-2025-037_files/figure-latex/fig-normal-2.pdf differ diff --git a/_articles/RJ-2025-037/RJ-2025-037_files/figure-latex/fig-plotfunc-1-1.pdf b/_articles/RJ-2025-037/RJ-2025-037_files/figure-latex/fig-plotfunc-1-1.pdf new file mode 100644 index 0000000000..d926206d00 Binary files /dev/null and b/_articles/RJ-2025-037/RJ-2025-037_files/figure-latex/fig-plotfunc-1-1.pdf differ diff --git a/_articles/RJ-2025-037/RJ-2025-037_files/figure-latex/fig-plotfunc-1-2.pdf b/_articles/RJ-2025-037/RJ-2025-037_files/figure-latex/fig-plotfunc-1-2.pdf new file mode 100644 index 0000000000..ceacf28e0b Binary files /dev/null and b/_articles/RJ-2025-037/RJ-2025-037_files/figure-latex/fig-plotfunc-1-2.pdf differ diff --git a/_articles/RJ-2025-037/RJ-2025-037_files/figure-latex/fig-poisson-1-1.pdf b/_articles/RJ-2025-037/RJ-2025-037_files/figure-latex/fig-poisson-1-1.pdf new file mode 100644 index 0000000000..c16990264d Binary files /dev/null and b/_articles/RJ-2025-037/RJ-2025-037_files/figure-latex/fig-poisson-1-1.pdf differ diff --git a/_articles/RJ-2025-037/RJ-2025-037_files/figure-latex/fig-poisson-1-2.pdf b/_articles/RJ-2025-037/RJ-2025-037_files/figure-latex/fig-poisson-1-2.pdf new file mode 100644 index 0000000000..19471a5f99 Binary files /dev/null and b/_articles/RJ-2025-037/RJ-2025-037_files/figure-latex/fig-poisson-1-2.pdf differ diff 
--git a/_articles/RJ-2025-037/RJournal.sty b/_articles/RJ-2025-037/RJournal.sty new file mode 100644 index 0000000000..351990be38 --- /dev/null +++ b/_articles/RJ-2025-037/RJournal.sty @@ -0,0 +1,358 @@ +% Package `RJournal' to use with LaTeX2e +% Copyright (C) 2010 by the R Foundation +% Copyright (C) 2013 by the R Journal +% +% Originally written by Kurt Hornik and Friedrich Leisch with subsequent +% edits by the editorial board +% +% CAUTION: +% Do not modify this style file. Any changes to this file will be reset when your +% article is submitted. +% If you must modify the style or add LaTeX packages to the article, these +% should be specified in RJwrapper.tex + +\NeedsTeXFormat{LaTeX2e}[1995/12/01] +\ProvidesPackage{RJournal}[2025/10/05 v0.17 RJournal package] + +\RequirePackage{tikz} + +% Overall page layout, fonts etc ----------------------------------------------- + +% Issues of of \emph{The R Journal} are created from the standard \LaTeX{} +% document class \pkg{report}. + +\RequirePackage{geometry} +\geometry{a4paper, + textwidth=14cm, top=1cm, bottom=1cm, + includehead,includefoot,centering, + footskip=1.5cm} +\raggedbottom +\sloppy +\clubpenalty = 10000 +\widowpenalty = 10000 +\brokenpenalty = 10000 +\usepackage{microtype} + + +\RequirePackage{fancyhdr} +\fancyhead{} +\fancyheadoffset{2cm} +\fancyhead[L]{\textsc{\RJ@sectionhead}} +\fancyhead[R]{\thepage} +\fancyfoot{} +\fancyfoot[L]{The R Journal Vol. 
\RJ@volume/\RJ@number, \RJ@month~\RJ@year} +\fancyfoot[R]{ISSN 2073-4859} +\pagestyle{fancy} + +% We use the following fonts (all with T1 encoding): +% +% rm & palatino +% tt & inconsolata +% sf & helvetica +% math & palatino + +\RequirePackage{microtype} + +\RequirePackage[scaled=0.92]{helvet} +\RequirePackage{palatino,mathpazo} +\RequirePackage[scaled=1.02]{inconsolata} +\RequirePackage[T1]{fontenc} + +\RequirePackage[hyphens]{url} +\RequirePackage[pagebackref]{hyperref} +\renewcommand{\backref}[1]{[p#1]} + +% Dark blue colour for all links +\RequirePackage{color} +\definecolor{link}{rgb}{0.45,0.51,0.67} +\hypersetup{ + colorlinks,% + citecolor=link,% + filecolor=link,% + linkcolor=link,% + urlcolor=link +} + +% Give the text a little room to breath +\setlength{\parskip}{3pt} +\RequirePackage{setspace} +\setstretch{1.05} + +% Issue and article metadata --------------------------------------------------- + +% Basic front matter information about the issue: volume, number, and +% date. + +\newcommand{\volume}[1]{\def\RJ@volume{#1}} +\newcommand{\volnumber}[1]{\def\RJ@number{#1}} +\renewcommand{\month}[1]{\def\RJ@month{#1}} +\renewcommand{\year}[1]{\def\RJ@year{#1}} + + +% Individual articles correspond to +% chapters, and are contained in |article| environments. This makes it +% easy to have figures counted within articles and hence hyperlinked +% correctly. + +% An article has an author, a title, and optionally a subtitle. We use +% the obvious commands for specifying these. Articles will be put in certain +% journal sections, named by \sectionhead. + +\newcommand {\sectionhead} [1]{\def\RJ@sectionhead{#1}} +\renewcommand{\author} [1]{\def\RJ@author{#1}} +\renewcommand{\title} [1]{\def\RJ@title{#1}} +\newcommand {\subtitle} [1]{\def\RJ@subtitle{#1}} + +% Control appearance of titles: make slightly smaller than usual, and +% suppress section numbering. 
See http://tex.stackexchange.com/questions/69749 +% for why we don't use \setcounter{secnumdepth}{-1} + +\usepackage[medium]{titlesec} +\usepackage{titletoc} +\titleformat{\section} {\normalfont\large\bfseries}{\arabic{section}}{1em}{} +\titleformat{\subsection}{\normalfont\normalsize\bfseries}{\arabic{section}.\arabic{subsection}}{0.5em}{} +\titlecontents{chapter} [0em]{}{}{}{\titlerule*[1em]{.}\contentspage} + +% Article layout --------------------------------------------------------------- + +% Environment |article| clears the article header information at its beginning. +% We use |\FloatBarrier| from the placeins package to keep floats within +% the article. +\RequirePackage{placeins} +\newenvironment{article}{\author{}\title{}\subtitle{}\FloatBarrier}{\FloatBarrier} + +% Refereed articles should have an abstract, so we redefine |\abstract| to +% give the desired style + +\renewcommand{\abstract}[1]{\noindent\textbf{Abstract} #1} +\renewenvironment{abstract}{\noindent\textbf{Abstract}~}{} + +% The real work is done by a redefined version of |\maketitle|. Note +% that even though we do not want chapters (articles) numbered, we +% need to increment the chapter counter, so that figures get correct +% labelling. + +\renewcommand{\maketitle}{% +\noindent + \chapter{\RJ@title}\refstepcounter{chapter} + \ifx\empty\RJ@subtitle + \else + \noindent\textbf{\RJ@subtitle} + \par\nobreak\addvspace{\baselineskip} + \fi + \ifx\empty\RJ@author + \else + \noindent\textit{\RJ@author} + \par\nobreak\addvspace{\baselineskip} + \fi + \@afterindentfalse\@nobreaktrue\@afterheading +} + +% Now for some ugly redefinitions. We do not want articles to start a +% new page. 
(Actually, we do, but this is handled via explicit +% \newpage +% +% The name@of@eq is a hack to get hyperlinks to equations to work +% within each article, even though there may be multiple eq.(1) +% \begin{macrocode} +\renewcommand\chapter{\secdef\RJ@chapter\@schapter} +\providecommand{\nohyphens}{% + \hyphenpenalty=10000\exhyphenpenalty=10000\relax} +\newcommand{\RJ@chapter}{% + \edef\name@of@eq{equation.\@arabic{\c@chapter}}% + \renewcommand{\@seccntformat}[1]{}% + \@startsection{chapter}{0}{0mm}{% + -2\baselineskip \@plus -\baselineskip \@minus -.2ex}{\p@}{% + \phantomsection\normalfont\huge\bfseries\raggedright}} + +% Book reviews should appear as sections in the text and in the pdf bookmarks, +% however we wish them to appear as chapters in the TOC. Thus we define an +% alternative to |\maketitle| for reviews. +\newcommand{\review}[1]{ + \pdfbookmark[1]{#1}{#1} + \section*{#1} + \addtocontents{toc}{\protect\contentsline{chapter}{#1}{\thepage}{#1.1}} +} + +% We want bibliographies as starred sections within articles. +% +\RequirePackage[sectionbib,round]{natbib} +\bibliographystyle{abbrvnat} +\renewcommand{\bibsection}{\section*{References}} + +% Equations, figures and tables are counted within articles, but we do +% not show the article number. For equations it becomes a bit messy to avoid +% having hyperref getting it wrong. + +% \numberwithin{equation}{chapter} +\renewcommand{\theequation}{\@arabic\c@equation} +\renewcommand{\thefigure}{\@arabic\c@figure} +\renewcommand{\thetable}{\@arabic\c@table} + +% Issue layout ----------------------------------------------------------------- + +% Need to provide our own version of |\tableofcontents|. We use the +% tikz package to get the rounded rectangle. Notice that |\section*| +% is really the same as |\chapter*|. 
+\renewcommand{\contentsname}{Contents} +\renewcommand\tableofcontents{% + \vspace{1cm} + \section*{\contentsname} + { \@starttoc{toc} } +} + +\renewcommand{\titlepage}{% + \thispagestyle{empty} + \hypersetup{ + pdftitle={The R Journal Volume \RJ@volume/\RJ@number, \RJ@month \RJ@year},% + pdfauthor={R Foundation for Statistical Computing},% + } + \noindent + \begin{center} + \fontsize{50pt}{50pt}\selectfont + The \raisebox{-8pt}{\includegraphics[height=77pt]{Rlogo-5}}\hspace{10pt} + Journal + + \end{center} + {\large \hfill Volume \RJ@volume/\RJ@number, \RJ@month{} \RJ@year \quad} + + \rule{\textwidth}{1pt} + \begin{center} + {\Large A peer-reviewed, open-access publication of the \\ + R Foundation for Statistical Computing} + \end{center} + + % And finally, put in the TOC box. Note the way |tocdepth| is adjusted + % before and after producing the TOC: thus, we can ensure that only + % articles show up in the printed TOC, but that in the PDF version, + % bookmarks are created for sections and subsections as well (provided + % that the non-starred forms are used). + \setcounter{tocdepth}{0} + \tableofcontents + \setcounter{tocdepth}{2} + \clearpage +} + +% Text formatting -------------------------------------------------------------- + +\newcommand{\R}{R} +\newcommand{\address}[1]{\addvspace{\baselineskip}\noindent\emph{#1}} +\newcommand{\email}[1]{\href{mailto:#1}{\normalfont\texttt{#1}}} + +% Simple font selection is not good enough. For example, |\texttt{--}| +% gives `\texttt{--}', i.e., an endash in typewriter font. Hence, we +% need to turn off ligatures, which currently only happens for commands +% |\code| and |\samp| and the ones derived from them. Hyphenation is +% another issue; it should really be turned off inside |\samp|. And +% most importantly, \LaTeX{} special characters are a nightmare. E.g., +% one needs |\~{}| to produce a tilde in a file name marked by |\file|. 
+% Perhaps a few years ago, most users would have agreed that this may be +% unfortunate but should not be changed to ensure consistency. But with +% the advent of the WWW and the need for getting `|~|' and `|#|' into +% URLs, commands which only treat the escape and grouping characters +% specially have gained acceptance + +\DeclareRobustCommand\code{\bgroup\@noligs\@codex} +\def\@codex#1{\texorpdfstring% +{{\normalfont\ttfamily\hyphenchar\font=-1 #1}}% +{#1}\egroup} +\newcommand{\kbd}[1]{{\normalfont\texttt{#1}}} +\newcommand{\key}[1]{{\normalfont\texttt{\uppercase{#1}}}} +\DeclareRobustCommand\samp{`\bgroup\@noligs\@sampx} +\def\@sampx#1{{\normalfont\texttt{#1}}\egroup'} +\newcommand{\var}[1]{{\normalfont\textsl{#1}}} +\let\env=\code +\newcommand{\file}[1]{{`\normalfont\textsf{#1}'}} +\let\command=\code +\let\option=\samp +\newcommand{\dfn}[1]{{\normalfont\textsl{#1}}} +% \acronym is effectively disabled since not used consistently +\newcommand{\acronym}[1]{#1} +\newcommand{\strong}[1]{\texorpdfstring% +{{\normalfont\fontseries{b}\selectfont #1}}% +{#1}} +\let\pkg=\strong +\newcommand{\CRANpkg}[1]{\href{https://CRAN.R-project.org/package=#1}{\pkg{#1}}}% +\let\cpkg=\CRANpkg +\newcommand{\ctv}[1]{\href{https://CRAN.R-project.org/view=#1}{\emph{#1}}} +\newcommand{\BIOpkg}[1]{\href{https://www.bioconductor.org/packages/release/bioc/html/#1.html}{\pkg{#1}}} + +% Example environments --------------------------------------------------------- +\RequirePackage{fancyvrb} +\RequirePackage{alltt} + +\DefineVerbatimEnvironment{example}{Verbatim}{} +\renewenvironment{example*}{\begin{alltt}}{\end{alltt}} + +% Support for output from Sweave, and generic session style code +% These used to have fontshape=sl for Sinput/Scode/Sin, but pslatex +% won't use a condensed font in that case. 
+ +% Update (2015-05-28 by DS): remove fontsize=\small to match example environment + +\DefineVerbatimEnvironment{Sinput}{Verbatim}{} +\DefineVerbatimEnvironment{Soutput}{Verbatim}{} +\DefineVerbatimEnvironment{Scode}{Verbatim}{} +\DefineVerbatimEnvironment{Sin}{Verbatim}{} +\DefineVerbatimEnvironment{Sout}{Verbatim}{} +\newenvironment{Schunk}{}{} + +% Mathematics ------------------------------------------------------------------ + +% The implementation of |\operatorname| is similar to the mechanism +% \LaTeXe{} uses for functions like sin and cos, and simpler than the +% one of \AmSLaTeX{}. We use |\providecommand| for the definition in +% order to keep the one of the \pkg{amstex} if this package has +% already been loaded. +% \begin{macrocode} +\providecommand{\operatorname}[1]{% + \mathop{\operator@font#1}\nolimits} +\RequirePackage{amsfonts} + +\renewcommand{\P}{% + \mathop{\operator@font I\hspace{-1.5pt}P\hspace{.13pt}}} +\newcommand{\E}{% + \mathop{\operator@font I\hspace{-1.5pt}E\hspace{.13pt}}} +\newcommand{\VAR}{\operatorname{var}} +\newcommand{\COV}{\operatorname{cov}} +\newcommand{\COR}{\operatorname{cor}} + +% Figures ---------------------------------------------------------------------- + +% For use with pandoc > 3.2.1 +\newsavebox\pandoc@box +\newcommand*\pandocbounded[1]{% scales image to fit in text height/width + \sbox\pandoc@box{#1}% + \Gscale@div\@tempa{\textheight}{\dimexpr\ht\pandoc@box+\dp\pandoc@box\relax}% + \Gscale@div\@tempb{\linewidth}{\wd\pandoc@box}% + \ifdim\@tempb\p@<\@tempa\p@\let\@tempa\@tempb\fi% select the smaller of both + \ifdim\@tempa\p@<\p@\scalebox{\@tempa}{\usebox\pandoc@box}% + \else\usebox{\pandoc@box}% + \fi% +} + +\RequirePackage[font=small,labelfont=bf]{caption} + +% Wide environments for figures and tables ------------------------------------- +\RequirePackage{environ} + +% An easy way to make a figure span the full width of the page +\NewEnviron{widefigure}[1][]{ +\begin{figure}[#1] +\advance\leftskip-2cm 
+\begin{minipage}{\dimexpr\textwidth+4cm\relax}% + \captionsetup{margin=2cm} + \BODY +\end{minipage}% +\end{figure} +} + +\NewEnviron{widetable}[1][]{ +\begin{table}[#1] +\advance\leftskip-2cm +\begin{minipage}{\dimexpr\textwidth+4cm\relax}% + \captionsetup{margin=2cm} + \BODY +\end{minipage}% +\end{table} +} diff --git a/_articles/RJ-2025-037/RJreferences.bib b/_articles/RJ-2025-037/RJreferences.bib new file mode 100644 index 0000000000..2375cbdf62 --- /dev/null +++ b/_articles/RJ-2025-037/RJreferences.bib @@ -0,0 +1,249 @@ +@article{neumann1951various, + title={{Various techniques used in connection with random digits}}, + author={John {von Neumann}}, + journal={Notes by GE Forsythe}, + pages={36--38}, + year={1951} +} +@Manual{rlanguage, + title = {{R}: A Language and Environment for Statistical Computing}, + author = {{R Core Team}}, + organization = {R Foundation for Statistical Computing}, + address = {Vienna, Austria}, + year = {2024}, + url = {https://www.R-project.org/} +} +@Manual{AcceptReject, + title = {{AcceptReject}: Acceptance-Rejection Method for Generating Pseudo-Random Observations}, + author = {Pedro Rafael D. 
{Marinho} and Vera Lucia Damasceno Tomazella}, + year = {2024}, + note = {R package version 0.1.1}, + url = {https://prdm0.github.io/AcceptReject/} +} +@Manual{assertthat, + title = {{assertthat}: Easy Pre and Post Assertions}, + author = {Hadley Wickham}, + year = {2019}, + note = {R package version 0.2.1}, + url = {https://CRAN.R-project.org/package=assertthat} +} +@Manual{cli, + title = {{cli}: Helpers for Developing Command Line Interfaces}, + author = {Gábor Csárdi}, + year = {2023}, + note = {R package version 3.6.2}, + url = {https://CRAN.R-project.org/package=cli} +} +@Book{ggplot2, + author = {Hadley Wickham}, + title = {{ggplot2}: Elegant Graphics for Data Analysis}, + publisher = {Springer-Verlag New York}, + year = {2016}, + isbn = {978-3-319-24277-4}, + url = {https://ggplot2.tidyverse.org} +} +@Manual{glue, + title = {{glue}: Interpreted String Literals}, + author = {Jim Hester and Jennifer Bryan}, + year = {2024}, + note = {R package version 1.7.0}, + url = {https://CRAN.R-project.org/package=glue} +} +@Manual{numDeriv, + title = {{numDeriv}: Accurate Numerical Derivatives}, + author = {Paul Gilbert and Ravi Varadhan}, + year = {2019}, + note = {R package version 2016.8-1.1}, + url = {https://CRAN.R-project.org/package=numDeriv} +} +@Manual{pbmcapply, + title = {{pbmcapply}: Tracking the Progress of {Mc*pply} with Progress Bar}, + author = {Kevin Kuang and Quyu Kong and Francesco Napolitano}, + year = {2022}, + note = {R package version 1.5.1}, + url = {https://CRAN.R-project.org/package=pbmcapply} +} +@Manual{purrr, + title = {{purrr}: Functional Programming Tools}, + author = {Hadley Wickham and Lionel Henry}, + year = {2023}, + note = {R package version 1.0.2}, + url = {https://CRAN.R-project.org/package=purrr} +} +@Manual{rlang, + title = {{rlang}: Functions for Base Types and Core {R} and '{Tidyverse}' Features}, + author = {Lionel Henry and Hadley Wickham}, + year = {2024}, + note = {R package version 1.1.3}, + url = 
{https://CRAN.R-project.org/package=rlang} +} +@Manual{scales, + title = {{scales}: Scale Functions for Visualization}, + author = {Hadley Wickham and Thomas Lin Pedersen and Dana Seidel}, + year = {2023}, + note = {R package version 1.3.0}, + url = {https://CRAN.R-project.org/package=scales} +} +@Manual{knitr, + title = {{knitr}: A General-Purpose Package for Dynamic Report Generation in {R}}, + author = {Yihui Xie}, + year = {2024}, + note = {R package version 1.46}, + url = {https://yihui.org/knitr/} +} +@Manual{rmarkdown, + title = {{rmarkdown}: Dynamic Documents for {R}}, + author = {JJ Allaire and Yihui Xie and Christophe Dervieux and Jonathan McPherson and Javier Luraschi and Kevin Ushey and Aron Atkins and Hadley Wickham and Joe Cheng and Winston Chang and Richard Iannone}, + year = {2024}, + note = {R package version 2.26}, + url = {https://github.com/rstudio/rmarkdown} +} +@Manual{cowplot, + title = {{cowplot}: Streamlined Plot Theme and Plot Annotations for '{ggplot2}'}, + author = {Claus O. 
Wilke}, + year = {2024}, + note = {R package version 1.1.3}, + url = {https://CRAN.R-project.org/package=cowplot} +} +@Manual{tictoc, + title = {{tictoc}: Functions for Timing {R} Scripts, as Well as Implementations of "Stack" and "StackList" Structures}, + author = {Sergei Izrailev}, + year = {2023}, + note = {R package version 1.2}, + url = {https://CRAN.R-project.org/package=tictoc} +} +@Manual{lbfgs, + title = {{lbfgs}: Limited-memory {BFGS} Optimization}, + author = {Antonio Coppola and Brandon Stewart and Naoaki Okazaki}, + year = {2022}, + note = {R package version 1.2.1.2}, + url = {https://CRAN.R-project.org/package=lbfgs} +} +@Manual{magrittr, + title = {{magrittr}: A Forward-Pipe Operator for {R}}, + author = {Stefan Milton Bache and Hadley Wickham}, + year = {2022}, + note = {R package version 2.0.3}, + url = {https://CRAN.R-project.org/package=magrittr} +} +@article{nadarajah2014modified, + title={{Modified beta distributions}}, + author={Nadarajah, Saralees and Teimouri, Mahdi and Shih, Shou Hsing}, + journal={Sankhya B}, + volume={76}, + pages={19--48}, + year={2014}, + publisher={Springer} +} +@article{l1999good, + title={{Good parameters and implementations for combined multiple recursive random number generators}}, + author={L'ecuyer, Pierre}, + journal={Operations Research}, + volume={47}, + number={1}, + pages={159--164}, + year={1999}, + publisher={INFORMS} +} +@article{l2002object, + title={{An object-oriented random-number package with many long streams and substreams}}, + author={L'ecuyer, Pierre and Simard, Richard and Chen, E Jack and Kelton, W David}, + journal={Operations research}, + volume={50}, + number={6}, + pages={1073--1075}, + year={2002}, + publisher={INFORMS} +} +@book{kroese2013handbook, + title={{Handbook of monte carlo methods}}, + author={Kroese, Dirk P and Taimre, Thomas and Botev, Zdravko I}, + year={2013}, + publisher={John Wiley \& Sons} +} +@book{asmussen2007stochastic, + title={{Stochastic simulation}: algorithms and 
analysis}, + author={Asmussen, S{\o}ren and Glynn, Peter W}, + volume={57}, + year={2007}, + publisher={Springer} +} +@misc{kemp2003discrete, + title={{Discrete-event simulation}: modeling, programming, and analysis}, + author={Kemp, David}, + year={2003}, + publisher={Oxford University Press} +} +@book{gentle2003random, + title={{Random number generation and {Monte Carlo} methods}}, + author={Gentle, James E}, + volume={381}, + year={2003}, + publisher={Springer} +} +@Manual{AR, + title = {{AR}: Another Look at the Acceptance-Rejection Method}, + author = {Abbas Parchami}, + year = {2018}, + note = {R package version 1.1}, + url = {https://CRAN.R-project.org/package=AR} +} +@Article{SimDesign, + title = {{Writing effective and reliable {Monte Carlo} simulations with the {SimDesign} package}}, + author = {R. Philip Chalmers and Mark C. Adkins}, + journal = {The Quantitative Methods for Psychology}, + year = {2020}, + volume = {16}, + number = {4}, + pages = {248--280}, + doi = {10.20982/tqmp.16.4.p248} +} +@article{chambers2014object, + title={{Object-oriented programming, functional programming and {R}}}, + author={Chambers, John M}, + year={2014} +} +@Manual{r6, + title = {{R6}: Encapsulated Classes with Reference Semantics}, + author = {Winston Chang}, + year = {2021}, + note = {R package version 2.5.1}, + url = {https://CRAN.R-project.org/package=R6} +} +@article{eddelbuettel2024package, + title={{Package ‘Rcpp’}}, + author={Eddelbuettel, Dirk and Francois, Romain and Allaire, JJ and Ushey, Kevin and Kou, Qiang and Russell, Nathan and Bates, Douglas and Chambers, John and Eddelbuettel, Maintainer Dirk}, + year={2024} +} +@article{eddelbuettel2014rcpparmadillo, + title={{RcppArmadillo}: Accelerating {R} with high-performance {C++} linear algebra}, + author={Eddelbuettel, Dirk and Sanderson, Conrad}, + journal={Computational statistics \& data analysis}, + volume={71}, + pages={1054--1063}, + year={2014}, + publisher={Elsevier} +} +@Manual{scattermore, + title = 
{{scattermore}: Scatterplots with More Points}, + author = {Tereza Kulichova and Mirek Kratochvil}, + year = {2023}, + note = {R package version 1.2}, + url = {https://CRAN.R-project.org/package=scattermore} +} +@Article{testthat, + author = {Hadley Wickham}, + title = {{testthat}: Get Started with Testing}, + journal = {The R Journal}, + year = {2011}, + volume = {3}, + pages = {5--10}, + url = {https://journal.r-project.org/archive/2011-1/RJournal_2011-1_Wickham.pdf} +} +@Manual{DISTRIB, + title = {{DISTRIB}: Four Essential Functions for Statistical Distributions Analysis: A New Functional Approach}, + author = {Abbas Parchami}, + year = {2016}, + note = {R package version 1.0}, + url = {https://CRAN.R-project.org/package=DISTRIB} +} diff --git a/_articles/RJ-2025-037/RJwrapper.tex b/_articles/RJ-2025-037/RJwrapper.tex new file mode 100644 index 0000000000..a26f382970 --- /dev/null +++ b/_articles/RJ-2025-037/RJwrapper.tex @@ -0,0 +1,72 @@ +\documentclass[a4paper]{report} +\usepackage[utf8]{inputenc} +\usepackage[T1]{fontenc} +\usepackage{RJournal} +\usepackage{amsmath,amssymb,array} +\usepackage{booktabs} + + +% tightlist command for lists without linebreak +\providecommand{\tightlist}{% + \setlength{\itemsep}{0pt}\setlength{\parskip}{0pt}} + +\usepackage{longtable} + +% Always define CSL refs as bib entries are contained in separate doc +% Pandoc citation processing +%From Pandoc 3.1.8 +% definitions for citeproc citations +\NewDocumentCommand\citeproctext{}{} +\NewDocumentCommand\citeproc{mm}{% + \begingroup\def\citeproctext{#2}\cite{#1}\endgroup} +\makeatletter + % allow citations to break across lines + \let\@cite@ofmt\@firstofone + % avoid brackets around text for \cite: + \def\@biblabel#1{} + \def\@cite#1#2{{#1\if@tempswa , #2\fi}} +\makeatother +\newlength{\cslhangindent} +\setlength{\cslhangindent}{1.5em} +\newlength{\csllabelwidth} +\setlength{\csllabelwidth}{3em} +\newenvironment{CSLReferences}[2] % #1 hanging-indent, #2 entry-spacing + 
{\begin{list}{}{% + \setlength{\itemindent}{0pt} + \setlength{\leftmargin}{0pt} + \setlength{\parsep}{0pt} + % turn on hanging indent if param 1 is 1 + \ifodd #1 + \setlength{\leftmargin}{\cslhangindent} + \setlength{\itemindent}{-1\cslhangindent} + \fi + % set entry spacing + \setlength{\itemsep}{#2\baselineskip}}} + {\end{list}} +\usepackage{calc} +\newcommand{\CSLBlock}[1]{#1\hfill\break} +\newcommand{\CSLLeftMargin}[1]{\parbox[t]{\csllabelwidth}{#1}} +\newcommand{\CSLRightInline}[1]{\parbox[t]{\linewidth - \csllabelwidth}{#1}\break} +\newcommand{\CSLIndent}[1]{\hspace{\cslhangindent}#1} + + +\usepackage{subfig} +\usepackage{float} + +\begin{document} + + +%% do not edit, for illustration only +\sectionhead{Contributed research article} +\volume{17} +\volnumber{4} +\year{2025} +\month{December} +\setcounter{page}{133} + +\begin{article} + \input{RJ-2025-037} +\end{article} + + +\end{document} diff --git a/_articles/RJ-2025-037/bibliography.bib b/_articles/RJ-2025-037/bibliography.bib new file mode 100644 index 0000000000..47242c616f --- /dev/null +++ b/_articles/RJ-2025-037/bibliography.bib @@ -0,0 +1,285 @@ +@article{neumann1951various, + title={Various techniques used in connection with random digits}, + author={Neumann, Von}, + journal={Notes by GE Forsythe}, + pages={36--38}, + year={1951} +} + + +@Manual{rlanguage, + title = {R: A Language and Environment for Statistical Computing}, + author = {{R Core Team}}, + organization = {R Foundation for Statistical Computing}, + address = {Vienna, Austria}, + year = {2024}, + url = {https://www.R-project.org/}, +} + +@Manual{AcceptReject, + title = {AcceptReject: Acceptance-Rejection Method for Generating Pseudo-Random +Observations}, + author = {Pedro Rafael D. 
{Marinho} and Vera Lucia Damasceno Tomazella}, + year = {2024}, + note = {R package version 0.1.1}, + url = {https://prdm0.github.io/AcceptReject/}, +} + + @Manual{assertthat, + title = {assertthat: Easy Pre and Post Assertions}, + author = {Hadley Wickham}, + year = {2019}, + note = {R package version 0.2.1}, + url = {https://CRAN.R-project.org/package=assertthat}, +} + + @Manual{cli, + title = {cli: Helpers for Developing Command Line Interfaces}, + author = {Gábor Csárdi}, + year = {2023}, + note = {R package version 3.6.2}, + url = {https://CRAN.R-project.org/package=cli}, +} + +@Book{ggplot2, + author = {Hadley Wickham}, + title = {ggplot2: Elegant Graphics for Data Analysis}, + publisher = {Springer-Verlag New York}, + year = {2016}, + isbn = {978-3-319-24277-4}, + url = {https://ggplot2.tidyverse.org}, +} + +@Manual{glue, + title = {glue: Interpreted String Literals}, + author = {Jim Hester and Jennifer Bryan}, + year = {2024}, + note = {R package version 1.7.0}, + url = {https://CRAN.R-project.org/package=glue}, +} + +@Manual{numDeriv, + title = {numDeriv: Accurate Numerical Derivatives}, + author = {Paul Gilbert and Ravi Varadhan}, + year = {2019}, + note = {R package version 2016.8-1.1}, + url = {https://CRAN.R-project.org/package=numDeriv}, +} + +@Manual{pbmcapply, + title = {pbmcapply: Tracking the Progress of Mc*pply with Progress Bar}, + author = {Kevin Kuang and Quyu Kong and Francesco Napolitano}, + year = {2022}, + note = {R package version 1.5.1}, + url = {https://CRAN.R-project.org/package=pbmcapply}, +} + +@Manual{purrr, + title = {purrr: Functional Programming Tools}, + author = {Hadley Wickham and Lionel Henry}, + year = {2023}, + note = {R package version 1.0.2}, + url = {https://CRAN.R-project.org/package=purrr}, +} + +@Manual{rlang, + title = {rlang: Functions for Base Types and Core R and 'Tidyverse' Features}, + author = {Lionel Henry and Hadley Wickham}, + year = {2024}, + note = {R package version 1.1.3}, + url = 
{https://CRAN.R-project.org/package=rlang}, +} + +@Manual{scales, + title = {scales: Scale Functions for Visualization}, + author = {Hadley Wickham and Thomas Lin Pedersen and Dana Seidel}, + year = {2023}, + note = {R package version 1.3.0}, + url = {https://CRAN.R-project.org/package=scales}, +} + +@Manual{knitr, + title = {knitr: A General-Purpose Package for Dynamic Report Generation in R}, + author = {Yihui Xie}, + year = {2024}, + note = {R package version 1.46}, + url = {https://yihui.org/knitr/}, +} + +@Manual{rmarkdown, + title = {rmarkdown: Dynamic Documents for R}, + author = {JJ Allaire and Yihui Xie and Christophe Dervieux and Jonathan McPherson and Javier Luraschi and Kevin Ushey and Aron Atkins and Hadley Wickham and Joe Cheng and Winston Chang and Richard Iannone}, + year = {2024}, + note = {R package version 2.26}, + url = {https://github.com/rstudio/rmarkdown}, +} + +@Manual{cowplot, + title = {cowplot: Streamlined Plot Theme and Plot Annotations for 'ggplot2'}, + author = {Claus O. 
Wilke}, + year = {2024}, + note = {R package version 1.1.3}, + url = {https://CRAN.R-project.org/package=cowplot}, +} + +@Manual{tictoc, + title = {tictoc: Functions for Timing R Scripts, as Well as Implementations of +"Stack" and "StackList" Structures}, + author = {Sergei Izrailev}, + year = {2023}, + note = {R package version 1.2}, + url = {https://CRAN.R-project.org/package=tictoc}, +} + +@Manual{lbfgs, + title = {lbfgs: Limited-memory BFGS Optimization}, + author = {Antonio Coppola and Brandon Stewart and Naoaki Okazaki}, + year = {2022}, + note = {R package version 1.2.1.2}, + url = {https://CRAN.R-project.org/package=lbfgs}, +} + +@Manual{magrittr, + title = {magrittr: A Forward-Pipe Operator for R}, + author = {Stefan Milton Bache and Hadley Wickham}, + year = {2022}, + note = {R package version 2.0.3}, + url = {https://CRAN.R-project.org/package=magrittr}, +} + +@article{nadarajah2014modified, + title={Modified beta distributions}, + author={Nadarajah, Saralees and Teimouri, Mahdi and Shih, Shou Hsing}, + journal={Sankhya B}, + volume={76}, + pages={19--48}, + year={2014}, + publisher={Springer} +} + +@article{l1999good, + title={Good parameters and implementations for combined multiple recursive random number generators}, + author={L'ecuyer, Pierre}, + journal={Operations Research}, + volume={47}, + number={1}, + pages={159--164}, + year={1999}, + publisher={INFORMS} +} + +@article{l2002object, + title={An object-oriented random-number package with many long streams and substreams}, + author={L'ecuyer, Pierre and Simard, Richard and Chen, E Jack and Kelton, W David}, + journal={Operations research}, + volume={50}, + number={6}, + pages={1073--1075}, + year={2002}, + publisher={INFORMS} +} + +@book{kroese2013handbook, + title={Handbook of monte carlo methods}, + author={Kroese, Dirk P and Taimre, Thomas and Botev, Zdravko I}, + year={2013}, + publisher={John Wiley \& Sons} +} + +@book{asmussen2007stochastic, + title={Stochastic simulation: algorithms and 
analysis}, + author={Asmussen, S{\o}ren and Glynn, Peter W}, + volume={57}, + year={2007}, + publisher={Springer} +} + +@misc{kemp2003discrete, + title={Discrete-event simulation: modeling, programming, and analysis}, + author={Kemp, David}, + year={2003}, + publisher={Oxford University Press} +} + +@book{gentle2003random, + title={Random number generation and Monte Carlo methods}, + author={Gentle, James E}, + volume={381}, + year={2003}, + publisher={Springer} +} + +@Manual{AR, + title = {AR: Another Look at the Acceptance-Rejection Method}, + author = {Abbas Parchami}, + year = {2018}, + note = {R package version 1.1}, + url = {https://CRAN.R-project.org/package=AR}, +} + +@Article{SimDesign, + title = {Writing effective and reliable {Monte Carlo} simulations with the {SimDesign} package}, + author = {R. Philip Chalmers and Mark C. Adkins}, + journal = {The Quantitative Methods for Psychology}, + year = {2020}, + volume = {16}, + number = {4}, + pages = {248--280}, + doi = {10.20982/tqmp.16.4.p248}, +} + +@article{chambers2014object, + title={Object-oriented programming, functional programming and R}, + author={Chambers, John M}, + year={2014} +} + +@Manual{r6, + title = {R6: Encapsulated Classes with Reference Semantics}, + author = {Winston Chang}, + year = {2021}, + note = {R package version 2.5.1}, + url = {https://CRAN.R-project.org/package=R6}, +} + +@article{eddelbuettel2024package, + title={Package ‘Rcpp’}, + author={Eddelbuettel, Dirk and Francois, Romain and Allaire, JJ and Ushey, Kevin and Kou, Qiang and Russell, Nathan and Bates, Douglas and Chambers, John and Eddelbuettel, Maintainer Dirk}, + year={2024} +} + +@article{eddelbuettel2014rcpparmadillo, + title={RcppArmadillo: Accelerating R with high-performance C++ linear algebra}, + author={Eddelbuettel, Dirk and Sanderson, Conrad}, + journal={Computational statistics \& data analysis}, + volume={71}, + pages={1054--1063}, + year={2014}, + publisher={Elsevier} +} + +@Manual{scattermore, + title = 
{scattermore: Scatterplots with More Points}, + author = {Tereza Kulichova and Mirek Kratochvil}, + year = {2023}, + note = {R package version 1.2}, + url = {https://CRAN.R-project.org/package=scattermore}, + } + +@Article{testthat, + author = {Hadley Wickham}, + title = {testthat: Get Started with Testing}, + journal = {The R Journal}, + year = {2011}, + volume = {3}, + pages = {5--10}, + url = {https://journal.r-project.org/archive/2011-1/RJournal_2011-1_Wickham.pdf}, + } + + @Manual{DISTRIB, + title = {DISTRIB: Four Essential Functions for Statistical Distributions Analysis: A New Functional Approach}, + author = {Abbas Parchami}, + year = {2016}, + note = {R package version 1.0}, + url = {https://CRAN.R-project.org/package=DISTRIB}, + } diff --git a/_articles/RJ-2025-037/figures/logo.png b/_articles/RJ-2025-037/figures/logo.png new file mode 100644 index 0000000000..66d44f85e8 Binary files /dev/null and b/_articles/RJ-2025-037/figures/logo.png differ diff --git a/_articles/RJ-2025-037/supplement.R b/_articles/RJ-2025-037/supplement.R new file mode 100644 index 0000000000..561085b666 --- /dev/null +++ b/_articles/RJ-2025-037/supplement.R @@ -0,0 +1,397 @@ +# Install packages +# install.packages("AcceptReject") +# install.packages("tictoc") +# install.packages("SimDesign") +# install.packages("numDeriv") +# install.packages("bench") +# install.packages("ggplot2") + +# Load packages +library(AcceptReject) +library(parallel) +library(tictoc) +library(SimDesign) +library(numDeriv) +library(bench) +library(parallel) +library(ggplot2) + +# Example 1 --------------------------------------------------------------- + +# Considering c = 1 (default) +inspect( + f = dweibull, + f_base = dunif, + xlim = c(0, 5), + args_f = list(shape = 2, scale = 1), + args_f_base = list(min = 0, max = 5), + c = 1 +) + +# Considering c = 4.3 +inspect( + f = dweibull, + f_base = dunif, + xlim = c(0, 5), + args_f = list(shape = 2, scale = 1), + args_f_base = list(min = 0, max = 5), + c = 4.3 
+) + +# Example 2 --------------------------------------------------------------- + +set.seed(0) + +# Generate 100 observations from a random variable X with +# f_X(x) = 2x, 0 <= x <= 1. +x <- accept_reject( + n = 100L, + f = function(x) 2 * x, + args_f = list(), + xlim = c(0, 1), + warning = FALSE +) +print(x) + +# Example 3 --------------------------------------------------------------- + +set.seed(0) +x <- accept_reject( + n = 2000L, + f = dbinom, + continuous = FALSE, + args_f = list(size = 5, prob = 0.5), + xlim = c(0, 10) +) + +# Printing the first 10 (default) observations +print(x) + +# Printing the first 20 observations +print(x, n_min = 20L) + +# Summary +summary(x) + +# Example 4 --------------------------------------------------------------- + +# Generating and plotting the theoretical density with the +# observed density. + +# setting a seed for reproducibility +set.seed(0) + +# Continuous case +accept_reject( + n = 2000L, + continuous = TRUE, + f = dweibull, + args_f = list(shape = 2.1, scale = 2.2), + xlim = c(0, 10) +) |> + plot( + hist = FALSE, + color_true_density = "#2B8b99", + color_observed_density = "#F4DDB3", + alpha = 0.6 + ) # Changing some arguments in plot() + +# Discrete case +accept_reject( + n = 1000L, + f = dbinom, + continuous = FALSE, + args_f = list(size = 5, prob = 0.5), + xlim = c(0, 10) +) |> plot() + +# Example 5 --------------------------------------------------------------- + +# Ensuring reproducibility in parallel computing +RNGkind("L'Ecuyer-CMRG") +set.seed(0) +mc.reset.stream() + +# Simulation +simulation <- function(n, lambda = 0.7) + accept_reject( + n = n, + f = dpois, + continuous = FALSE, # discrete case + args_f = list(lambda = lambda), + xlim = c(0, 20), + parallel = TRUE # Parallelizing the code in Unix-based systems + ) + +# Generating observations +# n = 25 observations +tic() +simulation(25) |> plot() +toc() + +# n = 2500 observations +tic() +simulation(2500) |> plot() +toc() + +# Example 6 
--------------------------------------------------------------- + +# Ensuring reproducibility in parallel computing +RNGkind("L'Ecuyer-CMRG") +set.seed(0) +mc.reset.stream() + +# Generating observations +accept_reject( + n = 50L, + f = dnorm, + continuous = TRUE, + args_f = list(mean = 0, sd = 1), + xlim = c(-4, 4), + parallel = TRUE +) |> plot() + +accept_reject( + n = 500L, + f = dnorm, + continuous = TRUE, + args_f = list(mean = 0, sd = 1), + xlim = c(-4, 4), + parallel = TRUE +) |> plot() + +# Example 7 --------------------------------------------------------------- + +pdf <- function(x, G, ...){ + numDeriv::grad( + func = \(x) G(x, ...), + x = x + ) +} + +# Modified Beta Distributions +# Link: https://link.springer.com/article/10.1007/s13571-013-0077-0 +generator <- function(x, G, a, b, beta, ...){ + g <- pdf(x = x, G = G, ...) + numerator <- beta^a * g * G(x, ...)^(a - 1) * (1 - G(x, ...))^(b - 1) + denominator <- beta(a, b) * (1 - (1 - beta) * G(x, ...))^(a + b) + numerator/denominator +} + +# Probability density function - Modified Beta Weibull +pdf_mbw <- function(x, a, b, beta, shape, scale) + generator( + x = x, + G = pweibull, + a = a, + b = b, + beta = beta, + shape = shape, + scale = scale + ) + +# Checking the value of the integral +integrate( + f = \(x) pdf_mbw(x, 1, 1, 1, 1, 1), + lower = 0, + upper = Inf +) + +# Example 8 --------------------------------------------------------------- + +# True parameters +a <- 10.5 +b <- 4.2 +beta <- 5.9 +shape <- 1.5 +scale <- 1.7 + +# c = 1 (default) +inspect( + f = pdf_mbw, + f_base = dweibull, + xlim = c(0, 4), + args_f = list(a = a, b = b, beta = beta, shape = shape, scale = scale), + args_f_base = list(shape = 2, scale = 1.2), + c = 1 +) + +# c = 2.2 +inspect( + f = pdf_mbw, + f_base = dweibull, + xlim = c(0, 4), + args_f = list(a = a, b = b, beta = beta, shape = shape, scale = scale), + args_f_base = list(shape = 2, scale = 1.2), + c = 2.2 +) + +# Example 9 
--------------------------------------------------------------- + +simulation <- function(n, parallel = TRUE, base = TRUE){ + # True parameters + a <- 10.5 + b <- 4.2 + beta <- 5.9 + shape <- 1.5 + scale <- 1.7 + c <- 2.2 + + # Generate data with the true parameters using + # the AcceptReject package. + if(base){ # Using the Weibull distribution as the base distribution + accept_reject( + n = n, + f = pdf_mbw, + args_f = list(a = a, b = b, beta = beta, shape = shape, scale = scale), + f_base = dweibull, + args_f_base = list(shape = 2, scale = 1.2), + random_base = rweibull, + xlim = c(0, 4), + c = c, + parallel = parallel + ) + } else { # Using the uniform distribution as the base distribution + accept_reject( + n = n, + f = pdf_mbw, + args_f = list(a = a, b = b, beta = beta, shape = shape, scale = scale), + xlim = c(0, 4), + parallel = parallel + ) + } +} + +benchmark <- function(n_values, time_unit = 's', base = TRUE){ + # Initialize an empty data frame to store the results + results_df <- data.frame() + + # Run benchmarks for each sample size and each type of code + for(n in n_values) { + for(parallel in c(TRUE, FALSE)) { + results <- bench::mark( + simulation(n = n, parallel = parallel, base = base), + time_unit = time_unit, + memory = FALSE, + check = FALSE + ) + + # Convert results to data frame and add columns for the sample + # size and type of code + results_df_temp <- as.data.frame(results) + results_df_temp$n <- n + results_df_temp$Code <- ifelse(parallel, "Parallel", "Serial") + + # Append the results to the results data frame + results_df <- rbind(results_df, results_df_temp) + } + } + + # Create a scatter plot of the median time vs the sample size, + # colored by the type of code + ggplot(results_df, aes(x = n, y = median, color = Code)) + + geom_point() + + scale_x_log10() + + scale_y_log10() + + labs(x = "Sample Size (n)", y = "Median Time (s)", color = "Code Type") + + ggtitle("Benchmark Results") + + ggplot2::theme( + axis.title
= ggplot2::element_text(face = "bold"), + title = ggplot2::element_text(face = "bold"), + legend.title = ggplot2::element_text(face = "bold"), + plot.subtitle = ggplot2::element_text(face = "plain") + ) +} + +# Sample sizes +n <- c(50, 250, 500, 1e3, 5e3, 10e3, 15e3, 25e3, 50e3, 100e3, 150e3, 250e3, 500e3, 1e6) + +# Ensuring reproducibility in parallel computing +RNGkind("L'Ecuyer-CMRG") +set.seed(0) +mc.reset.stream() + +# Run the benchmark function for multiple sample sizes +n |> benchmark(n_values = _, base = TRUE) +n |> benchmark(n_values = _, base = FALSE) + +# Example 10 -------------------------------------------------------------- + +simulation_1 <- function(n, parallel = TRUE, base = TRUE){ + accept_reject( + n = n, + f = pdf_mbw, + args_f = list(a = 10, b = 1, beta = 20.5, shape = 2, scale = 0.3), + xlim = c(0, 1), + parallel = parallel + ) +} + +simulation_2 <- function(n){ + + df = \(x) pdf_mbw(x = x, a = 10, b = 1, beta = 20.5, shape = 2, scale = 0.3) + dg = \(x) dunif(x = x, min = 0, max = 1) + rg = \(n) runif(n = n, min = 0, max = 1) + + # when df and dg both integrate to 1, acceptance probability = 1/M + M <- + rejectionSampling( + df = df, + dg = dg, + rg = rg + ) + rejectionSampling(n, df = df, dg = dg, rg = rg, M = M) +} + +benchmark <- function(n_values, parallel = TRUE){ + # Initialize an empty data frame to store the results + results_df <- data.frame() + + # Run benchmarks for each sample size and each type of code + for(n in n_values) { + results <- bench::mark( + AcceptReject = simulation_1(n = n, parallel = parallel), + SimDesign = simulation_2(n = n), + time_unit = 's', + memory = FALSE, + check = FALSE + ) + + # Convert results to data frame and add columns for the sample + # size and type of code + results_df_temp <- results + results_df_temp$n <- n + + # Append the results to the results data frame + results_df <- rbind(results_df, results_df_temp) + } + + # Create a scatter plot of the median time vs the sample size, + # colored by 
the type of code + ggplot(results_df, aes(x = n, y = median, color = expression)) + + geom_point() + + scale_x_log10() + + scale_y_log10() + + labs(x = "Sample Size (n)", y = "Median Time (s)", color = "Packages") + + ggtitle("Benchmark Results") + + theme( + axis.title = element_text(face = "bold"), + title = element_text(face = "bold"), + legend.title = element_text(face = "bold"), + plot.subtitle = element_text(face = "plain") + ) +} + +small_and_moderate_sample <- c(100, 150, 250, 500, 1e3, 1500, 2000, 2500, 3500, 4500, 5500, 7500, 10e3, 25e3) +big_sample <- c(50e3, 75e3, 100e3, 150e3, 250e3, 500e3, 750e3, 1e6) +# Ensuring reproducibility in parallel computing +RNGkind("L'Ecuyer-CMRG") +set.seed(0) +mc.reset.stream() + +# Serial +benchmark(n_values = small_and_moderate_sample, parallel = FALSE) + +# Parallel +benchmark(n_values = big_sample, parallel = TRUE) \ No newline at end of file diff --git a/_articles/RJ-2025-038/RJ-2025-038.Rmd b/_articles/RJ-2025-038/RJ-2025-038.Rmd new file mode 100644 index 0000000000..c501e06d57 --- /dev/null +++ b/_articles/RJ-2025-038/RJ-2025-038.Rmd @@ -0,0 +1,1268 @@ +--- +title: 'qCBA: An R Package for Postoptimization of Rule Models Learnt on Quantized + Data' +abstract: | + A popular approach to building rule models is association rule + classification. However, these algorithms often produce larger models than + most other rule learners, impeding the comprehensibility of the created + classifiers. Also, these algorithms decouple discretization from model + learning, often leading to a loss of predictive performance. This + paper presents an implementation of Quantitative Classification + based on Associations (QCBA), which is a collection of postprocessing + algorithms for rule models built over discretized data. The QCBA + method improves the fit of the bins originally produced by + discretization and performs additional pruning, resulting in models + that are typically smaller and often more accurate.
The qCBA package + supports models created with multiple packages for rule-based + classification available on CRAN, including arc, arulesCBA, rCBA and + sbrl. +author: +- name: Tomas Kliegr + affiliation: Prague University of Economics and Business + address: + - Winston Churchill Sq. 4, Prague + - Czech Republic + - + - '*ORCiD: [0000-0002-7261-0380](https://orcid.org/0000-0002-7261-0380)*' + - | + [`tomas.kliegr@vse.cz`](mailto:tomas.kliegr@vse.cz) +date: '2026-02-04' +date_received: '2024-10-01' +journal: + firstpage: 156 + lastpage: 172 +volume: 17 +issue: 4 +slug: RJ-2025-038 +packages: + cran: + - qCBA + - rCBA + - arc + - arulesCBA + - arules + - rJava + - datasets + - discretization + - sbrl + bioc: [] +preview: preview.png +bibliography: qcba.bib +CTV: ~ +legacy_pdf: yes +legacy_converted: yes +output: + rjtools::rjournal_web_article: + self_contained: yes + toc: no + mathjax: https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-chtml-full.js + md_extension: -tex_math_single_backslash +draft: no + +--- + + +::::: article +## Introduction {#sec:introduction} + +There is a resurgence of interest in interpretable machine learning +models, with rule learning providing an appealing combination of intrinsic +comprehensibility (humans are naturally used to working with rules) with +well-documented predictive performance and scalability. Association rule +classification (ARC) is a subclass of rule-learning algorithms that can +quickly generate a large number of candidate rules, a subset of which is +subsequently chosen for the final classifier. The first such algorithm, +and with at least three packages [@hahsler2019associative] still the most +popular, was Classification Based on Associations (CBA) +[@Liu98integratingclassification]. There are also multiple newer +approaches, such as SBRL [@yang2017scalable] or RCAR [@azmi2020rcar], +both available in R [@sbrl; @arulesCBA].
+ +A major limitation of ARC approaches is that they typically trade the +ability to process numerical data directly for speed of rule generation. As +input, these approaches require categorical data. If there are numerical +attributes, these need to be converted to categories, typically through +some discretization (quantization) approach, such as MDLP [@FayyadI93]. +Association rule learning then operates on prediscretized datasets, which +results in a loss of predictive performance and larger rule sets. + +The Quantitative Classification Based on Associations method (QCBA) +[@kliegr2023qcba] is a collection of several algorithms that postoptimize +rule-based classifiers learnt on prediscretized data with respect to the +original raw dataset with numerical attributes. As experimentally shown in +@kliegr2023qcba, this often makes the models more accurate and +consistently smaller, and thus more interpretable. + +This paper presents the +[**qCBA**](https://CRAN.R-project.org/package=qCBA) R package, which +implements QCBA and is available on CRAN. The +[**qCBA**](https://CRAN.R-project.org/package=qCBA) R package was +initially developed to postprocess the results of CBA implementations, as +these were the most common rule learning systems in R, but it can now also +handle the results of other rule learning approaches such as SBRL. The +three CBA implementations on CRAN -- +[**rCBA**](https://CRAN.R-project.org/package=rCBA) [@rcba], +[**arc**](https://CRAN.R-project.org/package=arc) [@arcPackage] and +[**arulesCBA**](https://CRAN.R-project.org/package=arulesCBA) +[@arulesCBA], introduced in @hahsler2019associative -- rely on the fast and +proven [**arules**](https://CRAN.R-project.org/package=arules) package +[@hahsler2011arules] to mine association rules, which is also the main +dependency of the [**qCBA**](https://CRAN.R-project.org/package=qCBA) R +package.
+ +## Primer on building rule-based classifiers for QCBA in R {#sec:primer} + +This primer shows how to use QCBA with CBA as the base rule learner. Out +of the rule learners supported by QCBA, CBA is the most widely supported +on CRAN and also the most cited in the scientific literature (as of +writing). This primer is self-contained and shows the main concepts +through code-based examples. However, it uses the same dataset and setting +as the running example in the open-access publication [@kliegr2023qcba], +which contains graphical illustrations as well as formal definitions of +the algorithms (as opposed to the R code examples given here). + +### Brief introduction to association rule classification + +Before we present the details of the QCBA algorithm, we start by covering +the foundational methods of association rule learning and classification. + +**Input data** Association rules are historically mined on +*transactions*, which is also the format used by the most commonly used R +package [**arules**](https://CRAN.R-project.org/package=arules). A +standard data table (data frame) can be converted to a transaction matrix. +This can be viewed as a binary incidence matrix, with one dummy variable +for each attribute-value pair (called an *item*). In the case of numerical +variables, discretization (quantization) is a standard preprocessing step. +It is required to ensure that the search for rules is fast and that the +conditions in the discovered rules are sufficiently broad. + +**Association rules** Algorithms such as Apriori [@agrawal94fast] are used +for association rule mining. An association rule has the form +$r: antecedent \rightarrow consequent$, where both $antecedent$ and +$consequent$ are sets of items. The +[**arules**](https://CRAN.R-project.org/package=arules) package, which is +the most popular Apriori implementation on CRAN, calls the antecedent the +left-hand side of the rule ($lhs$) and the consequent the right-hand +side ($rhs$).
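The coercion from a data frame to the transaction (incidence-matrix) representation described above can be sketched with **arules**; the tiny data frame below is invented for illustration and is not part of the article's running example:

``` r
library(arules)

# Tiny categorical data frame (invented for illustration)
df <- data.frame(
  Temperature = factor(c("(35;40]", "(40;45]", "(35;40]")),
  Humidity    = factor(c("(0;40]", "(0;40]", "(60;80]")),
  Class       = factor(c("3", "2", "1"))
)

# Coerce to transactions: one item (dummy variable) per attribute-value pair
trans <- as(df, "transactions")
itemLabels(trans)    # labels such as "Temperature=(35;40]"
as(trans, "matrix")  # the binary incidence matrix view
```

Each row of the resulting matrix corresponds to one transaction; `inspect(trans)` prints the itemsets themselves.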
Each set is interpreted as a conjunction of conditions +corresponding to individual items. When all these conditions are met, the +rule predicts that its consequent is true for a given input data point. +Formally, we say that a rule *covers* a transaction if all items in the +rule's antecedent are contained in the transaction. A rule *correctly +classifies* the transaction if the transaction is covered and, at the same +time, the items in the rule consequent are contained in the transaction. + +**Rule interest measures** Each rule is associated with quality metrics +(also called rule interest measures). The two most important ones are the +confidence and support of rule $r$. Confidence is calculated as +$conf(r)=a/(a+b)$, where $a$ is the number of transactions matching both +the antecedent and consequent and $b$ is the number of transactions +matching the antecedent but not the consequent. Support is calculated +either as $a$ (absolute support) or as $a$ divided by the total number of +transactions (relative support). In the rule generation step, minimum +thresholds on confidence and support act as constraints. Another important +constraint, which prevents combinatorial explosion, is a restriction on +the length of the rule (the `minlen` and `maxlen` parameters in +[**arules**](https://CRAN.R-project.org/package=arules)), defined as +thresholds on the minimum and maximum count of items in the rule +antecedent and consequent. + +**Association Rule Classification models** ARC algorithms process +candidate association rules into a classifier. The input to an ARC +algorithm is a set of pre-mined *class association rules* (CARs). A class +association rule is an association rule that contains exactly one item, +corresponding to one of the target classes, in the consequent. The +classifier-building process typically amounts to a selection of a subset +of CARs [@arcReview].
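To make these measures concrete, here is a minimal base-R sketch; the incidence matrix and the rule are invented for illustration:

``` r
# Toy binary incidence matrix: 4 transactions over 3 items (invented data)
trans <- matrix(
  c(1, 1, 1,
    1, 0, 1,
    1, 1, 0,
    0, 1, 1),
  ncol = 3, byrow = TRUE,
  dimnames = list(NULL, c("Temp=(35;40]", "Hum=(0;40]", "Class=3"))
)

# Rule r: {Temp=(35;40], Hum=(0;40]} -> {Class=3}
covered <- trans[, "Temp=(35;40]"] == 1 & trans[, "Hum=(0;40]"] == 1
a <- sum(covered & trans[, "Class=3"] == 1)  # antecedent and consequent match
b <- sum(covered & trans[, "Class=3"] == 0)  # antecedent matches, consequent does not

conf <- a / (a + b)      # confidence: 1 / (1 + 1) = 0.5
supp <- a / nrow(trans)  # relative support: 1 / 4 = 0.25
```

With minimum thresholds of, say, 0.5 on confidence and 0.2 on relative support, this rule would pass both constraints.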
Some ARC algorithms, such as CBA, use rule +interest measures for this selection: rules are first ordered and then some +of them are removed using *data coverage pruning* and *default rule +pruning* (steps 2 and 3 of the CBA-CB algorithm as described in +@Liu98integratingclassification). Some ARC algorithms also include a rule +with an empty antecedent (called the *default rule*) at the end of the rule +list to ensure that the classifier can always provide a prediction, even +if no specific rule matches a particular transaction. + +**Types of ARC models** CBA produces *rule lists*: the rule with the +highest priority in the pruned rule list is used for classification. Other +recent algorithms producing rule lists include SBRL [@sbrl]. The other +common ARC approach is based on *rule sets*, where rules are not ordered +and multiple rules can be used for classification, e.g., through voting. +This second group includes algorithms such as CMAR [@cmar] or CPAR +[@yin2003cpar]. + +**Postprocessing with Quantitative CBA** QCBA is a postprocessing +algorithm for ARC models built over quantized data. As input, it can take +either rule lists or rule sets; however, it always outputs a rule list. +While association rule learning often faces issues with combinatorial +explosion [@kliegr2019tuning], the postprocessing by QCBA is performed on +a relatively small number of rules that passed the pruning steps within +the specific ARC algorithm. QCBA is a modular approach with six steps that +can be performed independently of each other. The only exception is the +initial refit tuning step, which processes all items in a given rule that +are the result of quantization. QCBA adjusts the item boundaries so that +they correspond to actual values appearing in the original training data +before discretization. Benchmarks in @kliegr2023qcba have shown that, on +average, the postprocessing with QCBA took a similar amount of time as +building the input CBA model.
The most expensive step can be extension, with its complexity depending on the number of unique numerical values.

**Comparison with decision tree induction** A common question is the relationship between association rule classifiers and decision trees. Trees can be decomposed into rules that are similar to association rules, as each path from the root to a leaf corresponds to one rule. However, individual trees in algorithms such as C4.5 [@quinlan2014c4] are built in such a way that the resulting rules are non-overlapping, while association rule learning outputs overlapping rules. Algorithms such as Apriori will output all rules that are valid in the data, given the user-specified thresholds. In contrast, decision tree induction algorithms use a heuristic, such as information gain, to prioritize splits; as a result, a large number of otherwise valid patterns are skipped.

### Example {#sec:example}

#### Dataset

Let's look at the `humtemp` synthetic data set, which we will use throughout this tutorial and which is bundled with the [**arc**](https://CRAN.R-project.org/package=arc) package. There are two quantitative explanatory attributes (Temperature and Humidity). The target attribute, `Class`, expresses preference (a subjective comfort level).

The first six rows of `humtemp` obtained with `head(humtemp)`:

``` r
##   Temperature Humidity Class
## 1          45       33     2
## 2          27       29     3
## 3          40       48     2
## 4          40       65     1
## 5          38       82     1
## 6          37       30     3
```

#### Quantization {#sec:quantiz}

The essence of QCBA is the optimization of literals (conditions) created over numerical attributes in rules. QCBA thus needs to translate the bins created during discretization back to the continuous space.

For clarity, we will perform the quantization manually using equidistant binning and user-defined cut points.
An alternative approach using automatic supervised discretization is shown in Section [4](#sec:demo){reference-type="ref" reference="sec:demo"}.

``` r
library(qCBA)

# Equidistant binning for Temperature
temp_breaks <- seq(from = 15, to = 45, by = 5)
# User-defined cut points for Humidity
hum_breaks <- c(0, 40, 60, 80, 100)

data_discr <- arc::applyCuts(
  df = humtemp,
  cutp = list(temp_breaks, hum_breaks, NULL),
  infinite_bounds = TRUE,
  labels = TRUE
)
head(data_discr)
```

The result of the quantization is:

``` r
##   Temperature Humidity Class
## 1     (40;45]   (0;40]     2
## 2     (25;30]   (0;40]     3
## 3     (35;40]  (40;60]     2
## 4     (35;40]  (60;80]     1
## 5     (35;40] (80;100]     1
## 6     (35;40]   (0;40]     3
```

Note that in the interval labels produced by `applyCuts()`, a semicolon is used as the separator instead of the more common comma. A semicolon is the standard interval separator in some countries, such as Czechia, but the main reason is that a comma is already used within rules for another purpose -- to separate conditions. Using a different separator avoids ambiguity, for example, when a rule set to be optimized is read from a file.

#### Discovery of candidate class association rules

ARC algorithms typically first generate a large number of association rules or frequent itemsets. This step is usually handled internally by the ARC library (as shown in Section [Demonstration of individual QCBA optimization steps](#sec:demo)).

For better clarity, in the example below, the list of CARs is generated by manually invoking the Apriori algorithm.
``` r
txns <- as(data_discr, "transactions")
appearance <- list(rhs = c("Class=1", "Class=2", "Class=3", "Class=4"))
rules <- arules::apriori(
  data = txns,
  parameter = list(
    confidence = 0.5,
    support = 3 / nrow(data_discr),
    minlen = 1,
    maxlen = 3
  ),
  appearance = appearance
)
```

The first line converts the input data into *items*: attribute-value pairs such as `Temperature=(40;45]` or `Class=3`. This data format is required for association rule learning. The `appearance` argument defines which items can appear in the consequent of the rules (right-hand side, *rhs*). The `parameter` list sets several standard parameters typically used for the extraction of association rules. A confidence threshold of 0.5 and a support threshold of 1% are recommended [@Liu98integratingclassification]. However, since the `humtemp` dataset has fewer than 100 rows, a 1% threshold would correspond to the support of just one transaction, which would defeat the purpose of this threshold -- to address overfitting by eliminating rules that are backed by only a small number of instances. We therefore set the minimum support threshold to 3 transactions (expressed as a proportion in the code snippet). The `minlen` and `maxlen` parameters bound the total count of items in a rule (antecedent plus consequent); with `maxlen = 3`, a rule can contain at most two items in the condition part (antecedent).
The discovered rules, displayed with `inspect(rules)`, are shown below:

``` r
lhs                                        rhs        support    confidence coverage  lift     #
{Humidity=(80;100]}                     => {Class=1} 0.11111111 0.8000000  0.1388889 3.600000 4
{Temperature=(15;20]}                   => {Class=2} 0.11111111 0.5714286  0.1944444 2.057143 4
{Temperature=(30;35]}                   => {Class=4} 0.13888889 0.6250000  0.2222222 2.045455 5
{Temperature=(25;30]}                   => {Class=4} 0.13888889 0.5000000  0.2777778 1.636364 5
{Temperature=(25;30], Humidity=(40;60]} => {Class=4} 0.08333333 0.6000000  0.1388889 1.963636 3
```

In this listing, coverage and lift, as defined in the @arules documentation, are additional rule quality measures not used by the [**qCBA**](https://CRAN.R-project.org/package=qCBA) package. Count (abbreviated with \# in the listing) corresponds to the support of the rule represented as an integer value rather than a proportion.

#### Learn classifier from candidate CARs

Out of the five discovered rules, we create a CBA classifier. The following lists the conceptual steps performed by CBA to transform the input rule list into a CBA model:

1.  Rule *precedence* is established: rules are sorted according to
    confidence, support and length.

2.  Rules are subject to *pruning*:

    -   *data coverage pruning*: the algorithm iterates through the
        rules in the order of precedence, removing any rule that does
        not correctly classify at least one of the remaining instances.
        If a rule correctly classifies at least one instance, it is
        retained and the instances it covers are removed (only for the
        purpose of the subsequent steps of data coverage pruning).

    -   *default rule pruning*: the algorithm iterates through the
        rules in the sort order and cuts off the list at the point
        where keeping the current rule would result in worse model
        accuracy than inserting a default rule in its place and
        removing the rules below it.

3.  A *default rule* is inserted at the bottom of the list.
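The data coverage pruning step above can be sketched as follows. This is a hypothetical illustration of the idea written for this text, not the implementation used by the **arc** or **qCBA** packages:

``` r
# Sketch of CBA data coverage pruning (step 2 above) -- illustrative only.
# covers / correct: logical matrices with one row per rule (already sorted
# by precedence) and one column per training transaction.
data_coverage_prune <- function(covers, correct) {
  keep <- logical(nrow(covers))
  remaining <- rep(TRUE, ncol(covers))    # transactions not yet covered
  for (i in seq_len(nrow(covers))) {
    if (any(correct[i, ] & remaining)) {
      keep[i] <- TRUE                       # rule correctly classifies a
      remaining <- remaining & !covers[i, ] # remaining instance; remove
    }                                       # the instances it covers
  }
  keep  # logical vector: which rules are retained
}

covers  <- rbind(c(TRUE, TRUE, FALSE), c(TRUE, FALSE, FALSE), c(FALSE, FALSE, TRUE))
correct <- rbind(c(TRUE, FALSE, FALSE), c(TRUE, FALSE, FALSE), c(FALSE, FALSE, TRUE))
data_coverage_prune(covers, correct)  # TRUE FALSE TRUE: the second rule is pruned
```

The second rule is pruned because the only transaction it correctly classifies was already covered by the first rule.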
To supply our own list of candidate rules instead of using one generated with `cba()`, we call:

``` r
classAtt <- "Class"
rmCBA <- cba_manual(
  datadf_raw = humtemp,
  rules = rules,
  txns = txns,
  rhs = appearance$rhs,
  classAtt = classAtt,
  cutp = list()
)
inspect(rmCBA@rules)
```

We invoke `cba_manual()` from the [**arc**](https://CRAN.R-project.org/package=arc) package rather than `cba()` because it allows us to supply a custom list of rules from which the CBA model will be built. This function could also perform the quantization, but since we already did this as part of the preprocessing, we pass `cutp = list()` to indicate that no cutpoints are specified.

In this toy example, the CBA model, displayed by the `inspect(rmCBA@rules)` call above, is almost identical to the candidate list of rules shown earlier. The main differences are the reordering of rules by confidence and support (higher is better) and the addition of the *default rule* -- a rule with an empty antecedent -- at the end. Note that for brevity, the conditions of the rules in the printout below were replaced by {\...}, as they are the same as in the earlier printout (although mind the different order of the rules). The values of support, confidence, coverage and lift are also the same and were omitted.

``` r
    lhs      rhs       {...} count lhs_length orderedConf orderedSupp cumulativeConf
[1] {...} => {Class=1} {...}     4          1   0.8000000           4      0.8000000
[2] {...} => {Class=4} {...}     5          1   0.7142857           5      0.7500000
[3] {...} => {Class=4} {...}     3          2   0.6000000           3      0.7058824
[4] {...} => {Class=2} {...}     4          1   0.5000000           3      0.6521739
[5] {...} => {Class=4} {...}     5          1   0.5000000           2      0.6296296
[6] {}    => {Class=2} {...}     0          0   0.5555556           5      0.6111111
```

The default rule ensures that the rule list covers every possible instance. The rule list is visualized in Figure \@ref(fig:postpr), where the green background corresponds to Class 2, predicted by the default rule.
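The `orderedConf` and `orderedSupp` columns in the printout above can be illustrated with a small sketch. This is hypothetical code written for this explanation, not taken from any of the packages:

``` r
# Sketch: ordered confidence/support use only the training instances that
# reach a rule, i.e. instances not covered by rules higher in the list.
# covers / correct: logical matrices, rows = rules in list order,
# columns = training instances.
ordered_measures <- function(covers, correct) {
  remaining <- rep(TRUE, ncol(covers))
  out <- data.frame(orderedSupp = integer(nrow(covers)),
                    orderedConf = numeric(nrow(covers)))
  for (i in seq_len(nrow(covers))) {
    fired <- covers[i, ] & remaining       # instances on which the rule fires
    out$orderedSupp[i] <- sum(correct[i, ] & remaining)
    out$orderedConf[i] <- if (any(fired)) out$orderedSupp[i] / sum(fired) else NA
    remaining <- remaining & !covers[i, ]  # remaining instances move down the list
  }
  out
}
```

For the first rule in the list, nothing has been removed yet, so the ordered confidence equals the standard confidence.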
The CBA output contains several additional statistics for each rule. The *ordered* versions of confidence and support are computed only from the training instances that reach the given rule. For the first rule, the ordered confidence is identical to the standard confidence. The ordered support is also semantically the same but is expressed as an absolute count rather than a proportion. To compute these values for the subsequent rules, the instances covered by rules higher in the list are removed. The cumulative confidence is an experimental measure described in the [**arc**](https://CRAN.R-project.org/package=arc) documentation.


```{r postpr, echo=FALSE, fig.cap="Illustration of postpruning algorithm (HumTemp dataset). Left: CBA model; right: QCBA model.", fig.show='hold', out.width="48%"}
knitr::include_graphics("figures/figure1a.png")
knitr::include_graphics("figures/figure1b.png")
```


#### Applying the classifier

The toy example does not contain enough data for a meaningful train/test split. Therefore, we will evaluate on the training data. The accuracy of the CBA model on the training data is computed as follows:

``` r
prediction_cba <- predict(rmCBA, data_discr)
acc_cba <- CBARuleModelAccuracy(
  prediction = prediction_cba,
  groundtruth = data_discr[[classAtt]]
)
```

The accuracy is 0.61, and the contents of `prediction_cba` are the predicted values of comfort:

``` r
 [1] 2 4 2 2 1 2 2 4 4 4 4 4 4 1 4 1 4 4 4 4 4 4 4 4 1 2 2 2 2 1 2 2 2 2 2 2
Levels: 1 2 4
```

#### Explaining the prediction

The `predict()` function from the [**arc**](https://CRAN.R-project.org/package=arc) library allows for additional output to enhance the explainability of the result. By setting `outputFiringRuleIDs=TRUE`, we can obtain the ID of the rule that was used to classify each instance in the passed dataset.
``` r
ruleIDs <- predict(rmCBA, data_discr, outputFiringRuleIDs = TRUE)
```

For example, we may now explain the classification of the first row of `data_discr`, which is, for convenience, reproduced below:

``` r
##   Temperature Humidity Class
## 1     (40;45]   (0;40]     2
```

To do so, we invoke:

``` r
inspect(rmCBA@rules[ruleIDs[1]])
```

This returns the default rule (number 6). The reason is that the values in this instance are out of the bounds of the conditions in all other rules. Now, we have a rule list ready for postoptimization with QCBA.

#### Postprocessing with QCBA

By working with the original continuous data, the QCBA algorithm can improve the fit of the rules and consequently reduce their count.

We will use the `rmCBA` model built previously:

``` r
rmqCBA <- qcba(cbaRuleModel = rmCBA, datadf = humtemp)
```

This runs QCBA with the default set of optimization steps enabled, which corresponds to the best-performing configuration #5 from [@kliegr2023qcba].

The resulting rule list is:

``` r
  rules                                              support   confidence #
1 {Humidity=[82;95]} => {Class=1}                    0.1111111 0.8000000  1
2 {Temperature=[22;31],Humidity=[33;53]} => {Class=4} 0.1666667 0.7500000 2
3 {Temperature=[31;34]} => {Class=4}                 0.1388889 0.6250000  1
4 {} => {Class=2}                                    0.2777778 0.2777778  0
```

Note that the `condition_count` column was abbreviated as \# in the listing, and the orderedConf and orderedSupp columns were omitted for brevity.

Figure \@ref(fig:postpr) (right) shows the QCBA model. Compared to the CBA model in Figure \@ref(fig:postpr) (left), QCBA removed two rules and refined the boundaries of the remaining rules.

Predictive performance is computed in the same way as for CBA:

``` r
prediction <- predict(rmqCBA, humtemp)
acc <- CBARuleModelAccuracy(prediction, humtemp[[rmqCBA@classAtt]])
```

The accuracy is unchanged at 0.61, but we obtained a smaller model. As with CBA, we could use the `outputFiringRuleIDs` argument.
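The explanation above relies on the fact that the highest-priority matching rule fires. First-match classification over a rule list can be sketched as follows; this is a hypothetical illustration with items written as plain strings, not the package implementation:

``` r
# First-match classification over a rule list -- illustrative sketch only.
# antecedents: list of character vectors of items; consequents: list of
# predicted class items; both in precedence order.
predict_rule_list <- function(antecedents, consequents, instance) {
  for (i in seq_along(antecedents)) {
    if (all(antecedents[[i]] %in% instance)) {
      return(consequents[[i]])  # first covering rule fires
    }
  }
}

rules_lhs <- list("Humidity=(80;100]", "Temperature=(15;20]", character(0))
rules_rhs <- list("Class=1", "Class=2", "Class=2")  # last rule = default rule
# No specific rule covers this instance, so the default rule fires:
predict_rule_list(rules_lhs, rules_rhs, c("Temperature=(40;45]", "Humidity=(0;40]"))
# "Class=2"
```

Because the default rule has an empty antecedent, it covers every instance, so the function always returns a prediction.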
Note that the QCBA algorithm does not introduce any mandatory thresholds or meta-parameters for the user to set or optimize, although it does allow the user to enable or disable the individual optimizations, as shown in the next section.

## Detailed description of package [**qCBA**](https://CRAN.R-project.org/package=qCBA) with examples in R

### Overview of `qcba()` arguments

To build a model, `qcba()` needs a set of rules, which it takes as its first mandatory argument in the form of a rule model. As the second mandatory argument, `qcba()` takes the *raw* data frame, which can contain nominal as well as numerical columns. The remaining arguments are optional. The most important ones, relating to the optimizations performed by QCBA, are described in the following two subsections.

The rule model can be an instance of either the `customCBARuleModel` or the `CBARuleModel` class. The difference is in the `rules` slot: in the former class, rules are represented as string objects in a data frame. This universal data frame format is convenient for loading rules from other sources or from R packages that export rules as strings. The latter class uses an instance of the `rules` class from [**arules**](https://CRAN.R-project.org/package=arules), which is more efficient, especially when the number of rules or items is larger. An important slot shared by both classes is `cutp`, which contains information on the cutpoints used to quantize the data on which the rules were learnt.

### Optimizations on individual rules

Optimizations can be divided into two groups depending on whether they are performed on individual rules or on the entire model. We first describe the former group. Since these operations are independent of the other rules, they can also be parallelized.

**Refitting rules**. This step processes all items derived by quantization in the antecedent of a given rule. These items have boundaries that stick to a grid that corresponds to the result of discretization.
The grid used by QCBA corresponds to all unique values appearing in the training data. *This is the only mandatory step.*

**Attribute pruning** (`attributePruning`). For each rule, this step evaluates whether each item in its antecedent is needed. An item is removed if the rule created without it has at least the confidence of the original rule. *Enabled (`TRUE`) by default.*

**Trimming** (`trim_literal_boundaries`). Boundary parts of intervals into which no instances correctly classified by the rule fall are removed. *Enabled (`TRUE`) by default.*

**Extension** (`extendType`). The algorithm attempts to enlarge the ranges of intervals in the antecedent of each rule. In the current version, the extension applies only to numerical attributes (`extendType="numericOnly"`, which is also the default value); other extension types may be added in future versions. By default, an extension is accepted if it does not decrease rule confidence, but this behaviour can be controlled by setting the `minImprovement` parameter (default is 0). To overcome local minima, the extension process can provisionally accept a drop in confidence in an intermediate result. How much the confidence can temporarily decrease before the extension process stops is controlled by `minCondImprovement`.

### Optimizations on rule list {#ss:rulelistoptimization}

The second group of optimizations aims at removing rules, considering the context of other rules in the list.

**Data coverage pruning** (`postpruning`). The purpose of this step is to remove rules that were made redundant by the previous QCBA optimizations. Possible values are `none`, `cba` and `greedy`.
The `cba` option is identical to CBA's data coverage pruning: a rule is removed if it does not correctly classify any transaction in the training data after all transactions covered by retained rules with higher precedence have been removed. The `greedy` option is an experimental modification of data coverage pruning described in the [**qCBA**](https://CRAN.R-project.org/package=qCBA) documentation. *Enabled by default* (the default value is `postpruning="cba"`).

**Default rule overlap pruning** (`defaultRuleOverlapPruning`). Let $R_p$ denote the set of rules that classify into the same class as the default rule $r_d: \{\} \rightarrow cons_1$. To determine whether a pruning candidate, denoted as $r_p \in R_p: ant \rightarrow cons_1$, can be removed, all rules with lower precedence that have a nonempty antecedent and a different consequent $cons_2$ ($cons_1\neq cons_2$) are identified. Let us denote the set of these rules as $R_c$. If the antecedents of the rules in $R_c$ do not cover any of the transactions covered by $r_p$ in the training data, $r_p$ is removed. The removal of $r_p$ will not affect the classification of the training data, since the instances originally covered by $r_p: ant \rightarrow cons_1$ will be classified into the same class by $r_d: \{\} \rightarrow cons_1$. This is called the transaction-based version. In the alternative range-based version, the checks on the rules in $R_c$ involve the boundaries of their intervals rather than the overlap in matched transactions. This parameter has three possible values: `transactionBased`, `rangeBased` and `none`. According to the analysis and benchmarks in [@kliegr2023qcba], the transaction-based version removes more rules than the range-based version, although it can sometimes affect predictive performance on unseen data.
*Transaction-based pruning is the default.*

### Effects on classification performance and model size

An overview of the expected properties of the QCBA steps is presented in Table \@ref(tab:T1). The first three metric columns (*conf*, *supp*, *length*) refer to an individual rule (the local classifier); the remaining columns ($acc_{train}$, $acc_{test}$, rule count) refer to the entire rule list. The entries denote the effect of applying the algorithm named in the first column on the input rule list: $\geq$ means the value of the given metric will increase or stay unchanged, $=$ that it will not change, $\leq$ that it will decrease or stay unchanged, and $na$ that it can increase, decrease or stay unchanged. The (+) symbol denotes that an increase in the given value is considered a favourable property, and (-) a negative one. For example, according to the table, applying the refit algorithm leaves rule confidence (*conf*), rule support (*supp*) and rule length (*length*) unaffected. Considering the entire rule list, the refit operation will not affect the rule count or the accuracy on training data ($acc_{train}$); however, the accuracy on unseen data ($acc_{test}$) may change.

:::: center
::: {#tbl:guarantees}
| algorithm       | conf       | supp       | length     | $acc_{train}$ | $acc_{test}$ | rule count   |
|-----------------|------------|------------|------------|---------------|--------------|--------------|
| refit           | =          | =          | =          | =             | $na$         | =            |
| literal pruning | $\geq$     | $\geq$ (+) | $\leq$ (+) | $na$          | $na$         | $\leq$\* (+) |
| trimming        | $\geq$ (+) | =          | =          | $\geq$ (+)    | $na$         | =            |
| extension       | $\geq$ (+) | $\geq$ (+) | =          | $na$          | $na$         | =            |
| postpruning     | $na$       | $na$       | $na$       | $\geq$        | $na$         | $\leq$ (+)   |
| drop - trans.   | $na$       | $na$       | $na$       | =             | $na$         | $\leq$ (+)   |
| drop - range    | $na$       | $na$       | $na$       | =             | =            | $\leq$ (+)   |

: (#tab:T1) Hypothesized properties of the proposed rule tuning algorithms. *drop* denotes default rule pruning (*trans.* transaction-based; *range* range-based).
\* the number of rules is not directly reduced, but literal pruning can make a rule redundant (identical to another rule). The value *na* expresses that there is no unanimous effect of the preprocessing algorithm on the quality measure.
:::
::::

### Handling missing data

Association rule learning approaches are resilient to the presence of missing data in the input data frame. The reason is that rows are viewed as transactions and combinations of column names and their values as items. A missing value (`NA`) in a given column is interpreted as an item not present in the given transaction and is skipped when a data frame is converted to the `transactions` data structure from [**arules**](https://CRAN.R-project.org/package=arules) using the `as()` function; as a result, `NA` values are not present in the learnt rules. Note that `qcba()` treats an empty string in the input data frame in the same way as an `NA` value: the `NA` value is first converted to an empty string, represented using `.jarray()` from [**rJava**](https://CRAN.R-project.org/package=rJava), and then passed to the Java-based core of [**qCBA**](https://CRAN.R-project.org/package=qCBA), where it is treated as a `Float.NaN` value and generally skipped.

### Computational costs for large datasets {#ss:costs}

The individual postprocessing operations have distinct computational costs. These depend on several main factors: the number of rows and columns of the input data, the number of unique numerical values, and the number of input rules. A detailed experimental study of the influence of these factors is presented in @kliegr2023qcba. It was performed on subsets of the KDD'99 Anomaly Detection dataset; the largest dataset size processed using all QCBA optimizations reached about 40,000 rows and about the same number of unique numerical values.
The results show that the most computationally intensive step is the extension step. For large datasets, the user may therefore consider disabling it or tuning the `minCondImprovement` parameter, which can influence the runtime. An important factor is also the number of input rules for optimization, which is often related to the chosen base learning algorithm: for example, CPAR tends to produce a large number of rules, while SBRL outputs condensed rule models. For details, please refer to @kliegr2023qcba.

## Demonstration of individual QCBA optimization steps {#sec:demo}

Compared to the example in Section [Example](#sec:example), this part uses the larger iris dataset, which allows for the demonstration of all QCBA steps. To show QCBA with a different base rule learner than CBA from the [**arc**](https://CRAN.R-project.org/package=arc) package used in the previous section, we will use CPAR from the [**arulesCBA**](https://CRAN.R-project.org/package=arulesCBA) package.

### Data {#ss:data}

We use the `iris` dataset from the [**datasets**](https://CRAN.R-project.org/package=datasets) R package. The dataset was shuffled and randomly split into a train set (100 rows) and a test set (50 rows). The data was then automatically discretized using the MDLP algorithm, wrapped by `arc::discrNumeric()` and originally available from the R [**discretization**](https://CRAN.R-project.org/package=discretization) package.
``` r
library(arulesCBA) # version 1.2.7
set.seed(12) # chosen for demonstration purposes
allDataShuffled <- datasets::iris[sample(nrow(datasets::iris)), ]
trainFold <- allDataShuffled[1:100, ]
testFold <- allDataShuffled[101:nrow(datasets::iris), ]
classAtt <- "Species"

discrModel <- discrNumeric(df = trainFold, classAtt = classAtt)
train_disc <- as.data.frame(lapply(discrModel$Disc.data, as.factor))
cutPoints <- discrModel$cutp
test_disc <- applyCuts(
  testFold,
  cutPoints,
  infinite_bounds = TRUE,
  labels = TRUE
)
y_true <- testFold[[classAtt]]
```

### Learn base ARC model with CPAR

In the following, we learn an ARC model with the CPAR (Classification based on Predictive Association Rules) algorithm, using its default settings.

``` r
rmBASE <- CPAR(train_disc, formula = as.formula(paste(classAtt, "~ .")))
predictionBASE <- predict(rmBASE, test_disc)
inspect(rmBASE$rules)
cat("Number of rules: ", length(rmBASE$rules))
cat("Total conditions: ", sum(rmBASE$rules@lhs@data))
cat("Accuracy on test data: ", mean(predictionBASE == y_true))
```

In this case, the rule model is composed of seven rules. Note that the last field of the output, containing the Laplace accuracy statistic, was omitted for brevity.
``` r
    lhs                            rhs                  support confidence     lift
[1] {Petal.Length=[-Inf;2.6]}   => {Species=setosa}        0.32  1.0000000 3.125000
[2] {Petal.Width=[-Inf;0.8]}    => {Species=setosa}        0.32  1.0000000 3.125000
[3] {Petal.Length=(5.15; Inf],
     Petal.Width=(1.75; Inf]}   => {Species=virginica}     0.25  1.0000000 2.777778
[4] {Petal.Width=(1.75; Inf]}   => {Species=virginica}     0.33  0.9705882 2.696078
[5] {Sepal.Length=(5.55; Inf],
     Petal.Length=(2.6;4.75],
     Petal.Width=(0.8;1.75]}    => {Species=versicolor}    0.21  1.0000000 3.125000
[6] {Petal.Length=(2.6;4.75]}   => {Species=versicolor}    0.26  0.9629630 3.009259
[7] {Petal.Width=(0.8;1.75]}    => {Species=versicolor}    0.31  0.9117647 2.849265
```

The statistics are:

``` r
"Number of rules: 7 Total conditions: 10 Accuracy on test data: 0.96"
```

### Configuring QCBA optimizations and printing statistics

For demonstration purposes, we first convert the CPAR model into the format expected by `qcba()` (variable `baseModel_arc`) and then set up a generic [**qCBA**](https://CRAN.R-project.org/package=qCBA) configuration (variable `qcbaParams`), which initially disables all optimizations. To avoid repeating code for passing long argument lists to `qcba()` and for printing model statistics such as model size and accuracy on test data, we also introduce the helper function `qcba_with_summary()`.
``` r
baseModel_arc <- arulesCBA2arcCBAModel(
  arulesCBAModel = rmBASE,
  cutPoints = cutPoints,
  rawDataset = trainFold,
  classAtt = classAtt
)
qcbaParams <- list(
  cbaRuleModel = baseModel_arc,
  datadf = trainFold,
  extendType = "noExtend",
  attributePruning = FALSE,
  continuousPruning = FALSE,
  postpruning = "none",
  trim_literal_boundaries = FALSE,
  defaultRuleOverlapPruning = "noPruning",
  minImprovement = 0,
  minCondImprovement = 0
)

qcba_with_summary <- function(params) {
  rmQCBA <- do.call(qcba, params)
  cat("Number of rules: ", nrow(rmQCBA@rules), " ")
  cat("Total conditions: ", sum(rmQCBA@rules$condition_count), " ")
  accuracy <- CBARuleModelAccuracy(predict(rmQCBA, testFold), testFold[[classAtt]])
  cat("Accuracy on test data: ", round(accuracy, 2))
  print(rmQCBA@rules)
}
```

### QCBA Refit

The following code will postoptimize the previously learnt CPAR model using only the refit optimization:

``` r
qcba_with_summary(qcbaParams)
```

This will output the following list of eight rules (note that in this and the following printouts, the columns with rule measures are omitted for brevity):

``` r
1 {Petal.Length=[-Inf;1.9]} => {Species=setosa}
2 {Petal.Width=[-Inf;0.6]} => {Species=setosa}
3 {Petal.Length=[5.2;Inf],Petal.Width=[1.8;Inf]} => {Species=virginica}
4 {Sepal.Length=[5.6;Inf],Petal.Length=[3.3;4.7],Petal.Width=[1;1.7]} => {Species=versicolor}
5 {Petal.Width=[1.8;Inf]} => {Species=virginica}
6 {Petal.Length=[3.3;4.7]} => {Species=versicolor}
7 {Petal.Width=[1;1.7]} => {Species=versicolor}
8 {} => {Species=virginica}
```

The intervals (except for boundaries set to infinity) were shortened. For example, for the first rule, the training data does not contain any data point with Petal.Length = 2.6 (the original boundary), but it does contain the value 1.9 (the new boundary).
``` r
any(trainFold$Petal.Length == 1.9)
any(trainFold$Petal.Length == 2.6)
```

The first returns `TRUE` and the second `FALSE`.

The statistics are:

``` r
[1] "Number of rules: 8 Total conditions: 10 Accuracy on test data: 0.96"
```

While this is one rule more than the CPAR model, the extra rule is the explicitly included default rule (rule #8). In the [**arulesCBA**](https://CRAN.R-project.org/package=arulesCBA) CPAR model, the default rule is stored in a separate slot (`rmBASE$default`) and predicts the same class (virginica) as the default rule in the QCBA rule list.

Recall that rules are applied from top to bottom. A careful inspection of the rule list shows that it contains rule 4, which is a special case of rule 7 (both predict the versicolor class). However, neither of these rules can be removed without an impact on the predictions. If the more specific rule 4 were removed, some instances would be classified differently, as more instances would reach rule 5, which would classify them as virginica. Similarly, the classification would change if we replaced rule 4 with rule 7, or rule 7 with rule 4.

### Adjusting boundaries and attribute pruning

We will demonstrate the extension, trimming and attribute pruning steps simultaneously for a more compact presentation.
``` r
qcbaParams$attributePruning <- TRUE
qcbaParams$trim_literal_boundaries <- TRUE
qcbaParams$extendType <- "numericOnly"
qcba_with_summary(qcbaParams)
```

The list of resulting rules is:

``` r
1 {Petal.Length=[-Inf;1.9]} => {Species=setosa}
2 {Petal.Width=[-Inf;0.6]} => {Species=setosa}
3 {Petal.Length=[5.2;Inf]} => {Species=virginica}
4 {Sepal.Length=[5;Inf],Petal.Length=[3.3;4.7]} => {Species=versicolor}
5 {Petal.Width=[1.8;Inf]} => {Species=virginica}
6 {Petal.Length=[3.3;4.7]} => {Species=versicolor}
7 {Petal.Width=[1;1.7]} => {Species=versicolor}
8 {} => {Species=virginica}
```

As can be seen, the adjustment of intervals resulted in a change of the boundary for Sepal.Length in rule #4. Attribute pruning removed extra conditions from rules #3 and #4, resulting in a smaller model with improved test accuracy:

``` r
"Number of rules: 8 Total conditions: 8 Accuracy on test data: 1"
```

### Postpruning

Postpruning is performed on the model resulting from all previous steps.

``` r
qcbaParams$postpruning <- "cba"
qcba_with_summary(qcbaParams)
```

The result is a substantially reduced rule list:

``` r
1 {Petal.Length=[1;1.9]} => {Species=setosa}
2 {Petal.Length=[5.2;Inf]} => {Species=virginica}
3 {Sepal.Length=[5;Inf],Petal.Length=[3.3;4.7]} => {Species=versicolor}
4 {Petal.Width=[1.8;Inf]} => {Species=virginica}
5 {} => {Species=versicolor}
```

The statistics are:

``` r
"Number of rules: 5 Total conditions: 5 Accuracy on test data: 1.0"
```

As can be seen, postpruning reduced the model size significantly, and in this case the model even has better accuracy than the base CPAR model. However, in other cases, the benchmarks in @kliegr2023qcba have shown a small average decrease in accuracy as a result of pruning.

### Default rule pruning

The following code demonstrates the standard strategy for default rule pruning.
As outlined earlier, this often provides an effective way to reduce the rule count, although sometimes at the expense of slightly lower accuracy. []{#final:cpar label="final:cpar"}

``` r
qcbaParams$defaultRuleOverlapPruning <- "transactionBased"
qcba_with_summary(qcbaParams)
```

The result is the final reduced rule list, with rule #3 from the previous printout removed. This rule classifies into the versicolor class. Since versicolor is the default class, instances that were covered by this rule will now be covered by the default rule. The original rule #4 covers -- considering its position in the rule list -- different instances of the training data and therefore does not interfere with this.

``` r
1 {Petal.Length=[1;1.9]} => {Species=setosa}
2 {Petal.Length=[5.2;Inf]} => {Species=virginica}
3 {Petal.Width=[1.8;Inf]} => {Species=virginica}
4 {} => {Species=versicolor}
```

The statistics for the final model are:

``` r
Number of rules: 4 Total conditions: 3 Accuracy on test data: 1.0
```

Compared to the original CPAR model, the number of rules dropped from 8 (including the default rule) to 4, the number of conditions dropped from 10 to 3, and the accuracy increased by 0.04 points. This is an illustrative example, and actual results may vary for a particular dataset. On average, the improvement over CPAR measured on 22 benchmark datasets was a 2% increase in accuracy, a 40% reduction in the number of rules and a 29% reduction in the number of conditions [@kliegr2023qcba]. More details on the benchmarks are covered in Section [Built-in benchmark support](#sec:benchmark).
+
+## Interoperability with Rule Learning Packages in CRAN
+
+The [**qCBA**](https://CRAN.R-project.org/package=qCBA) package is able
+to process CBA models produced by all three CBA implementations in CRAN:
+[**arc**](https://CRAN.R-project.org/package=arc),
+[**arulesCBA**](https://CRAN.R-project.org/package=arulesCBA) and
+[**rCBA**](https://CRAN.R-project.org/package=rCBA). Additionally, it
+can process other rule models generated by
+[**arulesCBA**](https://CRAN.R-project.org/package=arulesCBA), such as
+CPAR models, as well as SBRL models generated by the
+[**sbrl**](https://CRAN.R-project.org/package=sbrl) package.
+
+On the input, `qcba()` requires an instance of `CBARuleModel`, which has
+the following slots:
+
+- `rules`: the list of rules in the model (an instance of `rules` from
+  the `arules` package),
+
+- `cutp`: the specification of cutpoints used to discretize numerical
+  attributes,
+
+- `classAtt`: the name of the target attribute,
+
+- `attTypes`: the types of the attributes (numeric or factor).
+
+As shown below, the instance of this class is created automatically when
+used with [**arc**](https://CRAN.R-project.org/package=arc) and through
+prepared helper functions for other libraries. The code examples that
+follow build on the data preparation described in
+Subsection [Data](#{ss:data}).
+
+### [**arc**](https://CRAN.R-project.org/package=arc) package
+
+As the [**arc**](https://CRAN.R-project.org/package=arc) package was
+specifically designed for
+[**qCBA**](https://CRAN.R-project.org/package=qCBA), it outputs an
+instance of the `CBARuleModel` class, which is accepted by the `qcba()`
+function. The postprocessing can thus be directly applied to the result
+of `cba()`.
+
+``` r
+rmCBA <- cba(datadf = trainFold, classAtt = "Species")
+rmqCBA <- qcba(cbaRuleModel = rmCBA, datadf = trainFold)
+```
+
+By default, the function `cba()` from the
+[**arc**](https://CRAN.R-project.org/package=arc) package learns
+candidate association rules with automatic threshold detection based on
+the heuristic algorithm from [@kliegr2019tuning]. Therefore, no support
+or confidence thresholds have to be passed. A more complex case
+involving custom discretization and thresholds was demonstrated in
+Section [Primer on building rule-based classifiers for QCBA in
+R](#{sec:primer}).
+
+### [**arulesCBA**](https://CRAN.R-project.org/package=arulesCBA) package
+
+Compared to the previous example, there is an extra line with a call to
+a helper function.
+
+``` r
+library(arulesCBA)
+arulesCBAModel <- arulesCBA::CBA(Species ~ ., data = train_disc, supp = 0.1, conf = 0.9)
+CBAmodel <- arulesCBA2arcCBARuleModel(arulesCBAModel, discrModel$cutp, iris, classAtt)
+qCBAmodel <- qcba(cbaRuleModel = CBAmodel, datadf = iris)
+```
+
+Note that we passed prediscretized data in `train_disc` to the package
+[**arulesCBA**](https://CRAN.R-project.org/package=arulesCBA). While the
+[**arulesCBA**](https://CRAN.R-project.org/package=arulesCBA) package
+allows for supervised discretization using the
+`arulesCBA::discretizeDF.supervised()` method, the cutpoints determined
+during the discretization are not exposed in a machine-readable way.
+Therefore, when
+[**arulesCBA**](https://CRAN.R-project.org/package=arulesCBA) is used in
+conjunction with [**qCBA**](https://CRAN.R-project.org/package=qCBA),
+the discretization should be performed using `arc::discrNumeric()`.
+Since both [**arc**](https://CRAN.R-project.org/package=arc) and
+[**arulesCBA**](https://CRAN.R-project.org/package=arulesCBA) internally
+use the MDLP method from the
+[**discretization**](https://CRAN.R-project.org/package=discretization)
+package, this should not influence the results.
+
+### [**rCBA**](https://CRAN.R-project.org/package=rCBA) package
+
+Using the [**rCBA**](https://CRAN.R-project.org/package=rCBA) package
+follows the same logic as the previously covered packages; it is only
+necessary to use a different conversion function:
+
+``` r
+library(rCBA)
+rCBAmodel <- rCBA::build(train_disc)
+CBAmodel <- rcbaModel2CBARuleModel(rCBAmodel, discrModel$cutp, iris, "Species")
+qCBAmodel <- qcba(CBAmodel, iris)
+```
+
+Confidence and support thresholds are not specified, as
+[**rCBA**](https://CRAN.R-project.org/package=rCBA) tunes them
+automatically using the simulated annealing algorithm from
+[@kliegr2019tuning]. The
+[**rCBA**](https://CRAN.R-project.org/package=rCBA) package internally
+uses a Java implementation of CBA, which may result in faster
+performance on some datasets than
+[**arulesCBA**](https://CRAN.R-project.org/package=arulesCBA)
+[@hahsler2019associative].
+
+### [**sbrl**](https://CRAN.R-project.org/package=sbrl) and other packages
+
+Using the [**qCBA**](https://CRAN.R-project.org/package=qCBA) package
+with [**sbrl**](https://CRAN.R-project.org/package=sbrl) is similar;
+again, it is only necessary to use the dedicated conversion function
+`sbrlModel2arcCBARuleModel()`.
+
+An example for [**sbrl**](https://CRAN.R-project.org/package=sbrl) is
+contained in the [**qCBA**](https://CRAN.R-project.org/package=qCBA)
+package documentation, as
+[**sbrl**](https://CRAN.R-project.org/package=sbrl) requires additional
+preprocessing and postprocessing:
+[**sbrl**](https://CRAN.R-project.org/package=sbrl) requires a specially
+named target attribute, allows only for binary targets, and outputs
+probabilities rather than specific class predictions.
+
+For compatibility with packages that do not use the
+[**arules**](https://CRAN.R-project.org/package=arules) data structures,
+there is also the `customCBARuleModel` class, which takes rules as a
+data frame conforming to the format used in
+[**arules**](https://CRAN.R-project.org/package=arules), which can be
+obtained with `as(rules, "data.frame")`.
+
+## Built-in benchmark support {#sec:benchmark}
+
+The [**qCBA**](https://CRAN.R-project.org/package=qCBA) package has
+built-in support for benchmarking over all supported types of algorithms
+covered in the previous section. This includes
+[**arulesCBA**](https://CRAN.R-project.org/package=arulesCBA)
+implementations of CBA, CMAR, CPAR, PRM and FOIL2 [@quinlan1993foil].
+
+By default, an average of two runs of each algorithm is performed.
+
+``` r
+# learn with default metaparameter values
+stats <- benchmarkQCBA(train = trainFold, test = testFold, classAtt = classAtt)
+print(stats)
+```
+
+The result of the last printout is
+
+``` r
+ CBA CMAR CPAR PRM FOIL2 CBA_QCBA CMAR_QCBA CPAR_QCBA PRM_QCBA FOIL2_QCBA
+accuracy 1.00 0.960 0.960 0.960 0.960 1.000 1.000 1.000 1.000 0.960
+rulecount 5.00 25.000 7.000 6.000 8.000 4.000 4.000 4.000 4.000 5.000
+modelsize 5.00 52.000 10.000 9.000 13.000 3.000 3.000 3.000 3.000 5.000
+buildtime 0.05 0.535 0.215 0.205 0.215 0.058 0.138 0.059 0.059 0.081
+```
+
+The output can easily be turned into a relative comparison with
+`round((stats[, 6:10] / stats[, 1:5] - 1), 3)`; the result is then:
+
+``` r
+ CBA_QCBA CMAR_QCBA CPAR_QCBA PRM_QCBA FOIL2_QCBA
+accuracy 0.000 0.042 0.042 0.042 0.000
+rulecount -0.200 -0.840 -0.429 -0.333 -0.375
+modelsize -0.400 -0.942 -0.700 -0.667 -0.615
+buildtime 0.204 -0.737 -0.681 -0.722 -0.665
+```
+
+This shows that on the iris dataset, depending on the base algorithm,
+QCBA decreased the rule count by between 20% and 84%, while accuracy
+remained unchanged or increased by about 4%.
The last row shows that the time +required by QCBA is, for four out of five studied reference algorithms, +lower than what it takes to train the input model by the corresponding +algorithm. The `benchmarkQCBA()` function can also accept custom +metaparameters and selected base rule learners. The user can also choose +the number of runs (`iterations` parameter) and obtain the resulting +models from the last iteration. + +Since the outputs of some learners may depend on chance, the function +also allows setting the random seed through the optional `seed` +argument. Note that the provided seed is not used for splitting data, +which needs to be performed externally. This approach provides the most +control for the user, avoids replicating code in other R packages and +functions aimed at splitting data, and also improves reproducibility. + +``` r +output <- benchmarkQCBA( + trainFold, + testFold, + classAtt, + train_disc, + test_disc, + discrModel$cutp, + CBA = list(support = 0.05, confidence = 0.5), + algs = c("CPAR"), + iterations = 10, + return_models = TRUE, + seed = 1 +) + +message("Evaluation statistics") +print(output$stats) +message("CPAR model") +inspect(output$CPAR[[1]]) +message("QCBA model") +print(output$CPAR_QCBA[[1]]) +``` + +This will produce output with a list of rules similar to the final CPAR +model presented in Section [Demonstration of individual QCBA +optimization steps](#{sec:demo}). A more complex benchmark of +computational costs is presented in @kliegr2023qcba, which also includes +the study of various data sizes and the effect of varying the number of +unique values in the dataset. A brief overview of the main results was +presented in Subsection [Computational costs for large +datasets](#{ss:costs}). + +A GitHub repository contains +scripts that extend this workflow into automation across multiple +datasets and materialized splits in each dataset. 
It also includes +support for benchmarking additional rule learning algorithms, including +SBRL, Python packages producing IDS models [@lakkarajuinterpretable] and +Weka libraries for RIPPER [@cohen1995fast] and FURIA [@huhn2009furia]. +Detailed benchmarking results are included in [@kliegr2023qcba]. + +## Conclusions + +Quantitative CBA ameliorates one of the major drawbacks of association +rule classification, the adherence of rules comprising the classifier to +the multidimensional grid created by discretization of numerical +attributes. By working with the original continuous data, the algorithm +can improve the fit of the rules and consequently reduce their count. +The QCBA algorithm does not introduce any mandatory thresholds or +meta-parameters for the user to set or optimize, although it does allow +disabling the individual optimizations. The +[**qCBA**](https://CRAN.R-project.org/package=qCBA) package implements +QCBA, allowing the postprocessing of the output of all three CBA +implementations currently in CRAN. The package can also be used in +conjunction with other association rule-based models, including those +producing rule sets and using multi-rule classification. + +The QCBA algorithm is described in detail in @kliegr2023qcba, the +package documentation is available in @qcbaPackage, and additional +information is available at , which also +features an interactive RMarkdown tutorial supplementing this paper. + +## Acknowledgment + +The author thanks the Faculty of Informatics and Statistics, Prague +University of Economics and Business, for long-term institutional +support of research activities. 
+
+:::::
diff --git a/_articles/RJ-2025-038/RJ-2025-038.html b/_articles/RJ-2025-038/RJ-2025-038.html
new file mode 100644
index 0000000000..b3f9f81de2
--- /dev/null
+++ b/_articles/RJ-2025-038/RJ-2025-038.html
@@ -0,0 +1,2969 @@
    +

    qCBA: An R Package for Postoptimization of Rule Models Learnt on Quantized Data


A popular approach to building rule models is association rule
+classification. However, such algorithms often produce larger models
+than most other rule learners, impeding the comprehensibility of the
+created classifiers. Also, these algorithms decouple discretization from
+model learning, often leading to a loss of predictive performance. This
+paper presents an implementation of Quantitative Classification
+based on Associations (QCBA), which is a collection of postprocessing
+algorithms for rule models built over discretized data. The QCBA
+method improves the fit of the bins originally produced by
+discretization and performs additional pruning, resulting in models
+that are typically smaller and often more accurate. The qCBA package
+supports models created with multiple packages for rule-based
+classification available in CRAN, including arc, arulesCBA, rCBA and
+sbrl.

    +
    +
    +

    1 Introduction

    +

    There is a resurgence of interest in interpretable machine learning +models, with rule learning providing an appealing combination of +intrinsic comprehensibility, as humans are naturally used to working +with rules, with well-documented predictive performance and scalability. +Association rule classification (ARC) is a subclass of rule-learning +algorithms that can quickly generate a large number of candidate rules, +a subset of which is subsequently chosen for the final classifier. The +first, and with at least three packages (Hahsler et al. 2019), the +most popular such algorithm was Classification Based on Associations +(CBA) (Liu et al. 1998). There are also multiple newer +approaches, such as SBRL (Yang et al. 2017) or RCAR (Azmi and Berrado 2020), +both available in R (Michael Hahsler 2024; Yang et al. 2024).

    +

    A major limitation of ARC approaches is that they typically trade off +the ability to process numerical data for the speed of generation. On +the input, these approaches require categorical data. If there are +numerical attributes, these need to be converted to categories, +typically through some discretization (quantization) approach, such as +MDLP (Fayyad and Irani 1993). Association rule learning then operates on +prediscretized datasets, which results in the loss of predictive +performance and larger rule sets.

    +

    The Quantitative Classification Based on Associations method (QCBA) +(Kliegr and Izquierdo 2023) is a collection of several algorithms that +postoptimize rule-based classifiers learnt on prediscretized data with +respect to the original raw dataset with numerical attributes. As was +experimentally shown in Kliegr and Izquierdo (2023), this makes the models often +more accurate and consistently smaller and thus more interpretable.

    +

This paper presents the
+qCBA R package, which
+implements QCBA and is available on CRAN. The
+qCBA R package was
+initially developed to postprocess results of CBA implementations, as
+these were the most common rule learning systems in R, but it can now
+also handle results of other rule learning approaches such as SBRL. The
+three CBA implementations in CRAN –
+rCBA (Kuchař and Kliegr 2019),
+arc (Kliegr 2016) and
+arulesCBA
+(Michael Hahsler 2024), introduced in Hahsler et al. (2019) – rely on the
+fast and proven arules package
+(Hahsler et al. 2011) to mine association rules, which is also the main
+dependency of the qCBA R
+package.

    +

    2 Primer on building rule-based classifiers for QCBA in R

    +

This primer will show how to use QCBA with CBA as the base rule learner.
+Out of the rule learners supported by QCBA, CBA is the most widely
+supported in CRAN and also the most scientifically cited (as of
+writing). This primer is standalone and intends to show the main
+concepts through code-based examples. However, it uses the same dataset
+and setting as the running example in the open-access publication
+(Kliegr and Izquierdo 2023), which contains graphical illustrations as
+well as formal definitions of the algorithms (as opposed to the R-code
+examples here).

    +

    Brief introduction to association rule classification

    +

    Before we present the details of the QCBA algorithm, we start by +covering the foundational methods of association rule learning and +classification.

    +

    Input data Association rules are historically mined on +transactions, which is also the format used by the most commonly used +R package arules. A +standard data table (data frame) can be converted to a transaction +matrix. This can be viewed as a binary incidence matrix, with one dummy +variable for each attribute-value pair (called an item). In the case +of numerical variables, discretization (quantization) is a standard +preprocessing step. It is required to ensure that the search for rules +is fast and the conditions in the discovered rules are sufficiently +broad.
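The incidence-matrix view described above can be sketched in a few lines of base R. This is an illustration only (the attribute and value names are made up); in practice, the arules package performs this conversion when a data frame of factors is coerced to transactions.

``` r
# Minimal sketch: a data frame of categorical attributes turned into a
# binary incidence matrix with one column (item) per attribute-value pair.
df <- data.frame(Temp  = c("high", "low", "high"),
                 Class = c("c1", "c2", "c1"))
items <- unlist(lapply(names(df), function(a)
  paste(a, unique(df[[a]]), sep = "=")))
inc <- sapply(items, function(it) {
  kv <- strsplit(it, "=", fixed = TRUE)[[1]]
  as.integer(df[[kv[1]]] == kv[2])
})
inc  # columns: Temp=high, Temp=low, Class=c1, Class=c2
```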

    +

    Association rules Algorithms such as Apriori (Agrawal and Srikant 1994) are +used for association rule mining. An association rule has the form +\(r: antecedent \rightarrow consequent\), where both \(antecedent\) and +\(consequent\) are sets of items. The +arules package, which +is the most popular Apriori implementation in CRAN, calls the antecedent +the left-hand side of the rule (\(lhs\)) and the consequent the right-hand +side (\(rhs\)). Each set is interpreted as a conjunction of conditions +corresponding to individual items. When all these conditions are met, +then the rule predicts that its consequent is true for a given input +data point. Formally, we say that a rule covers a transaction if all +items in the rule’s antecedent are contained in the transaction. A rule +correctly classifies the transaction if the transaction is covered +and, at the same time, the items in the rule consequent are contained in +the transaction.

    +

    Rule interest measures Each rule is associated with some quality +metrics (also called rule interest measures). The two most important +ones are the confidence and support of rule \(r\). Confidence is +calculated as \(conf(r)=a/(a+b)\), where \(a\) is the number of transactions +matching both the antecedent and consequent and \(b\) is the number of +transactions matching the antecedent but not the consequent. Support is +calculated either as \(a\) (absolute support) or \(a\) divided by the total +number of transactions (relative support). In the association rule +generation step, confidence and support are used as constraints in +association rule learning. Another important constraint to prevent a +combinatorial explosion is a restriction on the length of the rule +(minLen and maxLen parameters in +arules), defined as the +threshold on the minimum and maximum count of items in the rule +antecedent and consequent.
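The two measures just defined can be checked with a toy computation (the counts are chosen for illustration only):

``` r
# a: transactions matching both antecedent and consequent
# b: transactions matching the antecedent but not the consequent
# n: total number of transactions
conf <- function(a, b) a / (a + b)
supp_rel <- function(a, n) a / n

conf(4, 1)       # confidence 0.8
supp_rel(4, 36)  # relative support ~0.111; absolute support is simply a = 4
```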

    +

Association Rule Classification models ARC algorithms process
+candidate association rules into a classifier. An input to the ARC
+algorithm is a set of pre-mined class association rules (CARs). A
+class association rule is an association rule that contains exactly one
+item corresponding to one of the target classes in the consequent. The
+classifier-building process typically amounts to a selection of a subset
+of CARs (Vanhoof and Depaire 2010). Some ARC algorithms, such as CBA, use rule
+interest measures for this: rules are first ordered and then some of
+them are removed using data coverage pruning and default rule
+pruning (step 2 and step 3 of the CBA-CB algorithm as described in
+Liu et al. (1998)). Some ARC algorithms also include a rule
+with an empty antecedent (called the default rule) at the end of the rule
+list to ensure that the classifier can always provide a prediction even
+if there is no specific rule matching that particular transaction.

    +

    Types of ARC models CBA produces rule lists: the rule with the +highest priority in the pruned rule list is used for classification. +Other recent algorithms producing rule lists include SBRL (Yang et al. 2024). The +other common ARC approach is based on rule sets, where rules are not +ordered, and multiple rules can be used for classification, e.g., +through voting. This second group includes algorithms such as CMAR +(Li et al. 2001) or CPAR (Yin and Han 2003).

    +

    Postprocessing with Quantitative CBA QCBA is a postprocessing +algorithm for ARC models built over quantized data. On the input, it can +take both rule lists or rule sets. However, it always outputs a rule +list. While association rule learning often faces issues with a +combinatorial explosion (Kliegr and Kuchar 2019), the postprocessing by QCBA +is performed on a relatively small number of rules that passed the +pruning steps within the specific ARC algorithm. QCBA is a modular +approach with six steps that can be performed independently of each +other. The only exception is the initial refit tuning step, which +processes all items in a given rule that are the result of quantization. +QCBA adjusts the item boundaries so that they correspond to actual +values appearing in the original training data before discretization. +Benchmarks in Kliegr and Izquierdo (2023) have shown that, on average, the +postprocessing with QCBA took a similar time as building the input CBA +model. The most expensive step can be extension, with its complexity +depending on the number of unique numerical values.
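The refit idea mentioned above can be illustrated with a small base-R sketch. This is a simplification for intuition, not the package's actual implementation: the boundaries of a discretized interval are snapped to the extreme raw values observed inside it.

``` r
# Tighten an interval (lo; hi] so that its boundaries coincide with the
# smallest and largest raw values of the attribute falling inside it.
refit_boundaries <- function(values, lo, hi) {
  inside <- values[values > lo & values <= hi]
  if (length(inside) == 0) return(c(lo, hi))
  c(min(inside), max(inside))
}

temps <- c(27, 37, 38, 40, 40, 45)
refit_boundaries(temps, 35, 40)  # (35;40] tightens to [37;40]
```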

    +

    Comparison with decision tree induction A common question is the +relationship between association rule classifiers and decision trees. +Trees can be decomposed into rules that are similar to association +rules, as each path from the root to the leaf corresponds to one rule. +However, individual trees in algorithms such as C4.5 (Quinlan 1993) +are built in such a way that the resulting rules are non-overlapping, +while association rule learning outputs overlapping rules. Algorithms +such as Apriori will output all rules that are valid in the data, given +the user-specified thresholds. In contrast, decision tree induction +algorithms use a heuristic, such as information gain, which results in a +large number of otherwise valid patterns being skipped as rules +prioritize splits that maximize the chosen heuristic.

    +

    Example

    +
    Dataset
    +

Let’s look at the humtemp synthetic data set, which we will use
+throughout this tutorial and which is bundled with the
+arc package. There
+are two quantitative explanatory attributes (Temperature and Humidity).
+The target attribute is preference (subjective comfort level).

    +

    The first six rows of humtemp obtained with head(humtemp):

    +
    ##   Temperature Humidity Class
    +## 1          45       33     2
    +## 2          27       29     3
    +## 3          40       48     2
    +## 4          40       65     1
    +## 5          38       82     1
    +## 6          37       30     3
    +
    Quantization
    +

    The essence of QCBA is the optimization of literals (conditions) created +over numerical attributes in rules. QCBA thus needs to translate back +the bins created during the discretization to the continuous space.

    +

    For clarity, we will perform the quantization manually using equidistant +binning and user-defined cut points. An alternative approach using +automatic supervised discretization is shown in +Section 4.

    +
    library(qCBA)
    +
    +temp_breaks <- seq(from = 15, to = 45, by = 5)
    +# Another possibility with user-defined cutpoints
    +hum_breaks <- c(0, 40, 60, 80, 100) 
    +
    +data_discr <- arc::applyCuts(
    +  df = humtemp,
    +  cutp = list(temp_breaks, hum_breaks, NULL),
    +  infinite_bounds = TRUE,
    +  labels = TRUE
    +)
    +head(data_discr)
    +

    The result of quantization is:

    +
    ##   Temperature Humidity Class
    +## 1     (40;45]   (0;40]     2
    +## 2     (25;30]   (0;40]     3
    +## 3     (35;40]  (40;60]     2
    +## 4     (35;40]  (60;80]     1
    +## 5     (35;40] (80;100]     1
    +## 6     (35;40]   (0;40]     3
    +

The purpose of the applyCuts() function is to ensure that within
+intervals, a semicolon is used as a separator instead of the more common
+comma. A semicolon is used as the standard interval separator in some
+countries, such as Czechia. However, the main reason is that a comma is
+already used within rules for another purpose – to separate conditions.
+The use of a different separator avoids ambiguity, for example, when a
+rule set that is to be optimized is read from a file.
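For intuition, the effect of the cut-point application on the Temperature column can be approximated in base R with cut() followed by a separator substitution. This is only a sketch; applyCuts() additionally handles infinite bounds and labels for whole data frames.

``` r
temp_breaks <- seq(from = 15, to = 45, by = 5)
# -Inf/Inf mimic the infinite_bounds = TRUE behaviour
binned <- cut(c(45, 27, 40), breaks = c(-Inf, temp_breaks, Inf))
labs <- gsub(",", ";", as.character(binned), fixed = TRUE)
labs  # "(40;45]" "(25;30]" "(35;40]"
```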

    +
    Discovery of candidate class association rules
    +

    ARC algorithms typically first generate a large number of association +rules or frequent itemsets. Typically, this step is handled internally +by the ARC library (as shown in Section Demonstration of individual +QCBA optimization steps).

    +

    For better clarity, in the example below, the list of CARs is generated +by manually invoking the apriori algorithm.

    +
    txns <- as(data_discr, "transactions")
    +appearance <- list(rhs = c("Class=1", "Class=2", "Class=3", "Class=4"))
    +rules <- arules::apriori(
    +  data = txns,
    +  parameter = list(
    +    confidence = 0.5,
    +    support = 3 / nrow(data_discr),
    +    minlen = 1,
    +    maxlen = 3
    +  ),
    +  appearance = appearance
    +)
    +

The first line converts the input data into items, attribute-value
+pairs such as Temperature=(40;45] or Class=3. The appearance
+defines which items can appear in the consequent of the rules
+(right-hand side, rhs). This data format is required for association
+rule learning. On the second line, there are several standard additional
+parameters typically used for the extraction of association rules. The
+confidence threshold of 0.5 and support of 1% are recommended
+(Liu et al. 1998). However, since the humtemp dataset
+has fewer than 100 rows, a 1% support threshold would correspond to the
+support of just 1 transaction, which would defeat the purpose of this
+threshold: to address overfitting by eliminating rules that are backed
+by only a small number of instances. We, therefore, set the minimum
+support threshold to 3 transactions (expressed as a fraction in the code
+snippet). The minlen and maxlen parameters express that rules must
+contain at most two items in the condition part (antecedent).
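The conversion between the absolute and relative support threshold in the call above is simple arithmetic (assuming the 36-row humtemp dataset, consistent with the rule supports reported in this section):

``` r
n <- 36            # nrow(data_discr)
min_support <- 3 / n
min_support        # ~0.083, i.e. about 8.3% instead of the usual 1%
min_support * n    # back to the absolute count of 3 transactions
```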

    +

    The discovered rules, shown with inspect(rules), are shown below:

    +
    lhs                                        rhs       support    confidence coverage  lift     #
    +{Humidity=(80;100]}                     => {Class=1} 0.11111111 0.8000000  0.1388889 3.600000 4
    +{Temperature=(15;20]}                   => {Class=2} 0.11111111 0.5714286  0.1944444 2.057143 4
    +{Temperature=(30;35]}                   => {Class=4} 0.13888889 0.6250000  0.2222222 2.045455 5
    +{Temperature=(25;30]}                   => {Class=4} 0.13888889 0.5000000  0.2777778 1.636364 5
    +{Temperature=(25;30], Humidity=(40;60]} => {Class=4} 0.08333333 0.6000000  0.1388889 1.963636 3
    +

    In this listing, coverage and lift, as defined in the Hahsler et al. (2007) +documentation, are additional rule quality measures not used by the +qCBA package. Count +(abbreviated with # in the listing) corresponds to the support of the +rule represented as an integer value rather than a proportion.
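The listed measures are mutually consistent and easy to verify by hand: coverage is the support of the antecedent, confidence is supp(r)/cov(r), and lift additionally divides confidence by the consequent frequency. Checking the first rule of the listing (with the consequent frequency inferred from the listed values):

``` r
supp <- 4 / 36           # support of the rule (count 4 out of 36 rows)
cov  <- 5 / 36           # coverage: support of the antecedent
conf <- supp / cov
conf                     # 0.8, matching the listing
lift <- conf / (8 / 36)  # 8/36: consequent frequency implied by the listing
lift                     # 3.6, matching the listing
```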

    +
    Learn classifier from candidate CARs
    +

    Out of the five discovered rules, we create a CBA classifier. The +following lists the conceptual steps performed by CBA to transform the +input rule list into a CBA model:

    +
      +
1. Rule precedence is established: rules are sorted according to
confidence, support and length.

2. Rules are subject to pruning:

  • data coverage pruning: the algorithm iterates through the
rules in the order of precedence, removing any rule which does
not correctly classify at least one instance. If the rule
correctly classifies at least one instance, it is retained and
the covered instances are removed (only for the purpose of the
subsequent steps of data coverage pruning).

  • default rule pruning: the algorithm iterates through the rules
in the sort order and cuts off the list once keeping the
current rule would result in worse accuracy of the model than if
a default rule were inserted in place of the current rule and
the rules below it were removed.

3. The default rule is inserted at the bottom of the list.
    +
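The data coverage pruning step can be sketched in base R. This is an illustrative simplification over hypothetical cover/correct matrices, not the arc package's code:

``` r
# covers, correct: logical matrices (rules x instances); rules are
# assumed to be ordered by precedence.
coverage_prune <- function(covers, correct) {
  alive <- rep(TRUE, nrow(covers))
  uncovered <- rep(TRUE, ncol(covers))
  for (r in seq_len(nrow(covers))) {
    hits <- covers[r, ] & correct[r, ] & uncovered
    if (any(hits)) {
      uncovered <- uncovered & !covers[r, ]  # remove covered instances
    } else {
      alive[r] <- FALSE                      # rule contributes nothing
    }
  }
  alive
}

covers <- rbind(c(TRUE, TRUE, FALSE),
                c(TRUE, FALSE, FALSE),   # covers only already-covered data
                c(FALSE, FALSE, TRUE))
coverage_prune(covers, correct = covers)  # TRUE FALSE TRUE
```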

    To supply our own list of candidate rules instead of using one generated +with cba(), we will call:

    +
    classAtt <- "Class"
    +rmCBA <- cba_manual(
    +  datadf_raw = humtemp,
    +  rules = rules,
    +  txns = txns,
    +  rhs = appearance$rhs,
    +  classAtt = classAtt,
    +  cutp = list()
    +)
    +inspect(rmCBA@rules)
    +

We invoke cba_manual() from the arc package instead of cba() because
+cba_manual() allows us to supply a custom list of rules from which the
+CBA model will be built. This function would also perform the
+quantization, but since we already did this as part of the
+preprocessing, we use cutp = list() to express that no cutpoints are
+specified.

    +

    In this toy example, the CBA model, which could be displayed through the +function call inspect(rmCBA@rules), is almost identical to the +candidate list of rules shown above. The main difference is the +reordering of rules by confidence and support (higher is better) and the +addition of the default rule – a rule with an empty antecedent to the +end. Note that for brevity, the conditions in the rules in the printout +below were replaced by {...} as they are the same as in the printout on +the previous page (although mind the different order of rules). The +values of support, confidence, coverage and lift are also the same and +were omitted.

    +
    lhs       rhs        {...}  count lhs_length orderedConf orderedSupp cumulativeConf
    +[1] {...} => {Class=1}   {...}  4     1          0.8000000   4           0.8000000         
    +[2] {...} => {Class=4}   {...}  5     1          0.7142857   5           0.7500000        
    +[3] {...} => {Class=4}   {...}  3     2          0.6000000   3           0.7058824          
    +[4] {...} => {Class=2}   {...}  4     1          0.5000000   3           0.6521739        
    +[5] {...} => {Class=4}   {...}  5     1          0.5000000   2           0.6296296         
    +[6] {}    => {Class=2}   {...}  0     0          0.5555556   5           0.6111111        
    +

    The default rule ensures that the rule list covers every possible +instance. The rule list is visualized in +Figure 1, +where the green background corresponds to Class 2 predicted by the +default rule. The CBA output contains several additional statistics for +each rule. The ordered versions of confidence and support are computed +only from those training instances reaching the given rule. For the +first rule, the ordered confidence is identical to standard confidence. +The ordered support is also semantically the same but is expressed as an +absolute count rather than a proportion. To compute these values for the +subsequent rules, instances covered by rules higher in the list have +been removed. The cumulative confidence is an experimental measure +described in arc +documentation.
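The ordered variants of the measures can be reproduced with a short sketch. This reflects our reading of the description above; the arc package's internal computation may differ in details.

``` r
# Recompute each rule's support/confidence only on the instances that
# reach it, i.e. after removing instances covered by rules higher up.
ordered_stats <- function(covers, correct) {
  remaining <- rep(TRUE, ncol(covers))
  t(sapply(seq_len(nrow(covers)), function(r) {
    reached <- covers[r, ] & remaining
    remaining <<- remaining & !covers[r, ]
    c(orderedSupp = sum(reached & correct[r, ]),
      orderedConf = if (any(reached)) mean(correct[r, reached]) else NA_real_)
  }))
}
```

Here orderedSupp is an absolute count and orderedConf is undefined (NA) for rules reached by no instances.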

    +
    +
+

    +Figure 1: Illustration of postpruning algorithm (HumTemp dataset). Left: CBA model; right: QCBA model. +

    +
    +
    +
    Applying the classifier
    +

The toy example does not contain enough data for a meaningfully large
+train/test split. Therefore, we will evaluate on the training data. The
+accuracy of the CBA model on training data:

    +
    prediction_cba <- predict(rmCBA, data_discr)
    +acc_cba <- CBARuleModelAccuracy(
    +  prediction = prediction_cba,
    +  groundtruth = data_discr[[classAtt]]
    +)
    +

    The accuracy is 0.61, and the contents of prediction_cba are the +predicted values of comfort:

    +
     [1] 2 4 2 2 1 2 2 4 4 4 4 4 4 1 4 1 4 4 4 4 4 4 4 4 1 2 2 2 2 1 2 2 2 2 2 2
    +Levels: 1 2 4
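CBARuleModelAccuracy used above is conceptually just the share of matching predictions; a minimal base-R equivalent (a sketch, not the package's source):

``` r
accuracy <- function(prediction, groundtruth) mean(prediction == groundtruth)
accuracy(c(2, 4, 2, 2), c(2, 3, 2, 1))  # 0.5
```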
    +
    Explaining the prediction
    +

The predict() function from the arc library allows for additional output to enhance the explainability of the result. By setting outputFiringRuleIDs=TRUE, we can obtain the ID of the rule that was used to classify each instance in the passed dataset.

ruleIDs <- predict(rmCBA, data_discr, outputFiringRuleIDs = TRUE)

For example, we may now explain the classification of the first row of data_discr, which is, for convenience, reproduced below:

##   Temperature Humidity Class
## 1     (40;45]   (0;40]     2

    To do so, we invoke:

inspect(rmCBA@rules[ruleIDs[1]])

This returns the default rule (number 6). The reason is that the values in this instance are out of bounds of the conditions in all other rules. Now, we have a rule list ready for postoptimization with QCBA.

Postprocessing with QCBA

By working with the original continuous data, the QCBA algorithm can improve the fit of the rules and consequently reduce their count.

We will use the rmCBA model built previously:

rmqCBA <- qcba(cbaRuleModel = rmCBA, datadf = humtemp)

This runs QCBA with the default set of optimization steps enabled, which corresponds to the best-performing configuration #5 from (Kliegr and Izquierdo 2023).

The resulting rule list is

  rules                                               support    confidence # 
1 {Humidity=[82;95]} => {Class=1}                     0.1111111  0.8000000  1 
2 {Temperature=[22;31],Humidity=[33;53]} => {Class=4} 0.1666667  0.7500000  2 
3 {Temperature=[31;34]} => {Class=4}                  0.1388889  0.6250000  1 
4 {} => {Class=2}                                     0.2777778  0.2777778  0 

Note that the ‘condition_count’ column was abbreviated as # in the listing, and the orderedConf and orderedSupp columns were omitted for brevity.

Figure 1 (right) shows the QCBA model. Compared to the CBA model in Figure 1 (left), QCBA removed two rules and refined the boundaries of the remaining rules.

Predictive performance is computed in the same way as for CBA:

prediction <- predict(rmqCBA, humtemp)
acc <- CBARuleModelAccuracy(prediction, humtemp[[rmqCBA@classAtt]])

The accuracy is unchanged at 0.61, but we obtained a smaller model. As with CBA, we could use the outputFiringRuleIDs argument.

Note that the QCBA algorithm does not introduce any mandatory thresholds or meta-parameters for the user to set or optimize, although it does allow the user to enable or disable the individual optimizations, as shown in the next section.

3 Detailed description of package qCBA with examples in R

Overview of qcba() arguments

To build a model, qcba() needs a set of rules. As its second mandatory argument, the qcba() function takes the raw data frame, which can contain nominal as well as numerical columns. The remaining arguments are optional. The most important ones, relating to the optimizations performed by QCBA, are described in the following two subsections.

The rule model can be an instance of either the customCBARuleModel or the CBARuleModel class. The difference is that in the \(rules\) slot, the former class represents rules as string objects in a data frame. This universal data frame format is convenient for loading rules from other sources or from R packages that export rules as strings. The latter uses an instance of the rules class from arules, which is more efficient, especially when the number of rules or items is larger. An important slot shared by both classes is cutp, which contains information on the cutpoints used to quantize the data on which the rules were learnt.

Optimizations on individual rules

Optimizations can be divided into two groups depending on whether they are performed on individual rules or on the entire model. We first describe the former. Since these operations are independent of the other rules, they can also be parallelized.

Refitting rules. This step processes all items derived by quantization in the antecedent of a given rule. These items have boundaries that stick to a grid corresponding to the result of discretization. The grid used by QCBA corresponds to all unique values appearing in the training data. This is the only mandatory step.
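The refitting step can be illustrated with a small base-R sketch; refit_interval() is a hypothetical helper written for this illustration, not a function of the package:

```r
# Illustrative sketch (not the package internals): snap the boundaries of an
# interval produced by discretization to the finest grid, i.e. the closest
# values actually observed in the training data.
refit_interval <- function(lo, hi, values) {
  inside <- values[values >= lo & values <= hi]
  if (length(inside) == 0) return(c(NA, NA))  # no observed value in interval
  c(min(inside), max(inside))                 # both boundaries move inward
}

petal <- c(1.0, 1.4, 1.9, 3.3, 4.7, 5.2, 6.1)
# A cut point such as 2.6 need not occur in the data; refitting replaces it
# by the closest observed value inside the interval.
refit_interval(-Inf, 2.6, petal)  # c(1.0, 1.9)
```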

Attribute pruning (attributePruning). Attribute pruning evaluates, for each rule and each item in its antecedent, whether the item is needed. The item is removed if the rule created without it has at least the confidence of the original rule. Enabled (TRUE) by default.
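The pruning test can be sketched in a few lines of base R (an illustration with a hypothetical conf() helper, not the package code):

```r
# Confidence of a rule: fraction of antecedent-matching rows with correct class.
conf <- function(ant, correct) sum(ant & correct) / sum(ant)

df <- data.frame(A = c(1, 1, 1, 0, 0), B = c(1, 1, 0, 1, 0), y = c(1, 1, 1, 1, 0))
full   <- conf(df$A == 1 & df$B == 1, df$y == 1)  # {A=1, B=1} => y=1
pruned <- conf(df$A == 1,             df$y == 1)  # {A=1}      => y=1
pruned >= full  # TRUE: the item B can be dropped without losing confidence
```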

Trimming (trim_literal_boundaries). Boundary parts of intervals into which no instances correctly classified by the rule fall are removed. Enabled (TRUE) by default.

Extension (extendType). QCBA attempts to enlarge the ranges of the intervals in the antecedent of each rule. In the current version, the extension applies only to numerical attributes (extendType="numericOnly", which is also the default value); other extension types may be added in future versions. By default, an extension is accepted if it does not decrease rule confidence, but this behaviour can be controlled by setting the minImprovement parameter (default is 0). To overcome local minima, the extension process can provisionally accept a drop in confidence in an intermediate result of the extension. How much the confidence can temporarily decrease before the extension process stops is controlled by minCondImprovement.
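One extension direction can be sketched in base R as follows (a simplification with minImprovement = 0; the actual algorithm also handles the lower boundary and the provisional acceptance governed by minCondImprovement):

```r
# Grow the upper boundary of the item {x <= hi} to the next observed value as
# long as the confidence of predicting class "a" does not drop.
x <- c(1.2, 1.5, 1.9, 2.4, 2.8, 3.1)
y <- c("a", "a", "a", "a", "b", "a")
conf_upto <- function(hi) mean(y[x <= hi] == "a")

hi <- 1.9                              # current interval boundary
for (cand in sort(unique(x[x > hi]))) {
  if (conf_upto(cand) >= conf_upto(hi)) hi <- cand  # accept the extension
  else break                                        # stop at the first drop
}
hi  # 2.4
```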

Optimizations on rule list

The second group of optimizations aims at removing rules, considering the context of the other rules in the list.

Data coverage pruning (postpruning). The purpose of this step is to remove rules that were made redundant by the previous QCBA optimizations. Possible values are none, cba and greedy. The cba option is identical to CBA's data coverage pruning: a rule is removed if it does not correctly classify any transaction in the training data after all transactions covered by retained rules with higher precedence have been removed. The greedy option is an experimental modification of data coverage pruning described in the qCBA documentation. Enabled by default (the default value is postpruning=cba).
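The cba option can be sketched as follows (an illustrative re-implementation on a toy rule representation, not the package code):

```r
# A rule is kept only if it correctly classifies at least one training
# instance that is not already covered by a higher-precedence kept rule.
prune <- function(rules, df, y) {
  alive <- rep(TRUE, nrow(df))
  keep  <- logical(length(rules))
  for (i in seq_along(rules)) {
    m <- rules[[i]]$match(df) & alive
    keep[i] <- any(m & y == rules[[i]]$class)
    if (keep[i]) alive[m] <- FALSE   # remove the transactions it covers
  }
  rules[keep]
}

df <- data.frame(x = c(1, 2, 3, 4)); y <- c("a", "a", "b", "b")
rules <- list(
  list(match = function(d) d$x <= 2, class = "a"),
  list(match = function(d) d$x <= 2, class = "a"),  # redundant duplicate
  list(match = function(d) d$x >  2, class = "b")
)
length(prune(rules, df, y))  # 2: the duplicate covers no fresh instance
```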

Default rule overlap pruning (defaultRuleOverlapPruning). Let \(R_p\) denote the set of rules that classify into the same class as the default rule \(r_d: \{\} \rightarrow cons_1\). To determine whether a pruning candidate, denoted as \(r_p \in R_p: ant \rightarrow cons_1\), can be removed, all rules with lower precedence that have a nonempty antecedent and a different consequent \(cons_2\) (\(cons_1\neq cons_2\)) are identified. Let us denote the set of these rules as \(R_c\). If the antecedents of all rules in \(R_c\) do not cover any of the transactions covered by \(r_p\) in the training data, \(r_p\) is removed. The removal of \(r_p\) will not affect the classification of the training data, since the instances originally covered by \(r_p: ant \rightarrow cons_1\) will be classified to the same class by \(r_d: \{\} \rightarrow cons_1\). This is called the transaction-based version. In the alternative range-based version, the checks on rules in \(R_p\) involve checking the boundaries of intervals rather than the overlap in matched transactions. This parameter has three possible values: transactionBased, rangeBased and none. According to the analysis and benchmarks in (Kliegr and Izquierdo 2023), the transaction-based version removes more rules than the range-based version, although it can sometimes affect predictive performance on unseen data. Transaction-based pruning is the default.
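The transaction-based check reduces to a single disjointness test, sketched below with hypothetical names; the covers are logical vectors over the training transactions:

```r
# r_p can be dropped when no lower-precedence rule with a different consequent
# matches any transaction covered by r_p; those transactions then fall through
# to the default rule, which predicts the same class as r_p.
can_drop <- function(rp_cover, conflicting_covers) {
  !any(vapply(conflicting_covers, function(cv) any(cv & rp_cover), logical(1)))
}

rp  <- c(TRUE, TRUE, FALSE, FALSE)   # transactions covered by the candidate
rc1 <- c(FALSE, FALSE, TRUE, FALSE)  # cover of a conflicting lower rule
can_drop(rp, list(rc1))  # TRUE: the covers are disjoint
```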

Effects on classification performance and model size

An overview of the observed properties of the QCBA steps is presented in Table 1. The entries denote the effect of applying the algorithm specified in the first column on the input rule list: \(\geq\) denotes that the value of the given metric will increase or not change, = that the value will not change, \(\leq\) that it will decrease or not change, and \(na\) that it can increase, decrease or not change. The (+) symbol denotes that an increase in the given value is considered a favourable property, and (-) an unfavourable one. For example, applying the refit algorithm to a rule can have the following effects according to the table: the density of the rule will improve or remain the same, while rule confidence (conf), rule support (supp) and rule length (length) will remain unaffected. Considering the entire rule list, the refit operation will not affect the rule count or the accuracy on training data (\(acc_{train}\)). However, the accuracy on unseen data (\(acc_{test}\)) may change.

Table 1: Hypothesized properties of the proposed rule tuning algorithms. drop denotes default rule pruning (trans. transaction-based; range range-based). * the number of rules is not directly reduced, but literal pruning can make a rule redundant (identical to another rule). The value na expresses that there is no unanimous effect of the preprocessing algorithm on the quality measure.

                 | rule (local classifier)   | rule list
algorithm        | conf    supp     length   | acc_train  acc_test  rule count
refit            | =       =        =        | =          na        =
literal pruning  | ≥       ≥ (+)    ≤ (+)    | na         na        ≤* (+)
trimming         | ≥ (+)   =        =        | ≥ (+)      na        =
extension        | ≥ (+)   ≥ (+)    =        | na         na        =
postpruning      | na      na       na       | ≥          na        ≤ (+)
drop - trans.    | na      na       na       | =          na        ≤ (+)
drop - range     | na      na       na       | =          =         ≤ (+)

Handling missing data

Association rule learning approaches are resilient to missing data in the input data frame. The reason is that rows are viewed as transactions, and combinations of column names and their values as items. A missing value (NA) in a given column is interpreted as an item not present in the given transaction and is skipped when the data frame is converted to the transactions data structure from arules using the as() function, which results in NA values not being present in the learnt rules. Note that qcba() treats an empty string in the input data frame in the same way as the NA value: the NA value is first converted to an empty string, represented using .jarray() from rJava, and then passed to the Java-based core of qCBA, where it is treated as a Float.NaN value and likewise generally skipped.
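Assuming the arules package is installed, this behaviour can be observed directly: a row with an NA simply yields a transaction with fewer items.

```r
library(arules)

# Each factor level becomes an item; an NA produces no item for that column.
df <- data.frame(Temp = factor(c("low", NA, "high")),
                 Hum  = factor(c("wet", "dry", NA)))
trans <- as(df, "transactions")
size(trans)  # items per transaction: the NA rows contain one item, not two
```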

Computational costs for large datasets

The individual postprocessing operations have distinct computational costs. These depend on several main factors: the number of rows and columns of the input data, the number of unique numerical values, and the number of input rules. A detailed experimental study of the influence of these factors is presented in Kliegr and Izquierdo (2023). It was performed on subsets of the KDD’99 Anomaly Detection dataset, with the largest dataset processed using all QCBA optimizations reaching about 40,000 rows and about the same number of unique numerical values. The results show that the most computationally intensive step is the extension step. For large datasets, the user may therefore consider disabling it or tuning its minCI parameter, which can influence the runtime. An important factor is also the number of input rules for optimization, which is often related to the chosen base learning algorithm. For example, CPAR tends to produce a large number of rules, while SBRL outputs condensed rule models. For details, please refer to Kliegr and Izquierdo (2023).

4 Demonstration of individual QCBA optimization steps

Compared to the example in Subsection Example, this part uses the larger iris dataset, which allows for the demonstration of all QCBA steps. To show QCBA with a different base rule learner than CBA from the arc package used in the previous section, we will use CPAR from the arulesCBA package.

Data

We use the iris dataset from the datasets R package. The dataset was shuffled and randomly split into a train set (100 rows) and a test set (50 rows). The data was then automatically discretized using the MDLP algorithm wrapped by arc::discrNumeric() and originally available from the R discretization library.

library(arulesCBA)  # version 1.2.7
set.seed(12)  # chosen for demonstration purposes
allDataShuffled <- datasets::iris[sample(nrow(datasets::iris)), ]
trainFold <- allDataShuffled[1:100, ]
testFold <- allDataShuffled[101:nrow(datasets::iris), ]
classAtt <- "Species"

discrModel <- discrNumeric(df = trainFold, classAtt = classAtt)
train_disc <- as.data.frame(lapply(discrModel$Disc.data, as.factor))
cutPoints <- discrModel$cutp
test_disc <- applyCuts(
  testFold,
  cutPoints,
  infinite_bounds = TRUE,
  labels = TRUE
)
y_true <- testFold[[classAtt]]

Learn base ARC model with CPAR

In the following, we learn an ARC model using the CPAR (Classification based on Predictive Association Rules) algorithm with its default settings.

rmBASE <- CPAR(train_disc, formula = as.formula(paste(classAtt, "~ .")))
predictionBASE <- predict(rmBASE, test_disc)
inspect(rmBASE$rules)
cat("Number of rules: ", length(rmBASE$rules))
cat("Total conditions: ", sum(rmBASE$rules@lhs@data))
cat("Accuracy on test data: ", mean(predictionBASE == y_true))

In this case, the rule model is composed of seven rules. Note that the last field of the output, containing the Laplace statistic, is omitted for brevity.

    lhs                            rhs                     support confidence     lift
[1] {Petal.Length=[-Inf;2.6]}   => {Species=setosa}        0.32    1.0000000      3.125000
[2] {Petal.Width=[-Inf;0.8]}    => {Species=setosa}        0.32    1.0000000      3.125000
[3] {Petal.Length=(5.15; Inf],
     Petal.Width=(1.75; Inf]}   => {Species=virginica}     0.25    1.0000000      2.777778
[4] {Petal.Width=(1.75; Inf]}   => {Species=virginica}     0.33    0.9705882      2.696078
[5] {Sepal.Length=(5.55; Inf],
     Petal.Length=(2.6;4.75],
     Petal.Width=(0.8;1.75]}    => {Species=versicolor}    0.21    1.0000000      3.125000
[6] {Petal.Length=(2.6;4.75]}   => {Species=versicolor}    0.26    0.9629630      3.009259
[7] {Petal.Width=(0.8;1.75]}    => {Species=versicolor}    0.31    0.9117647      2.849265

The statistics are:

"Number of rules: 7 Total conditions: 10 Accuracy on test data: 0.96"

Configuring QCBA optimizations and printing statistics

For our demonstration purposes, we will set up a generic qCBA configuration (variable baseModel_arc), which initially disables all optimizations. To avoid repeating code for passing long argument lists to qcba() and for printing model statistics such as model size and accuracy on test data, we will also introduce the helper function qcba_with_summary().

baseModel_arc <- arulesCBA2arcCBAModel(
  arulesCBAModel = rmBASE,
  cutPoints = cutPoints,
  rawDataset = trainFold,
  classAtt = classAtt
)
qcbaParams <- list(
  cbaRuleModel = baseModel_arc,
  datadf = trainFold,
  extend = "noExtend",
  attributePruning = FALSE,
  continuousPruning = FALSE,
  postpruning = "none",
  trim_literal_boundaries = FALSE,
  defaultRuleOverlapPruning = "noPruning",
  minImprovement = 0,
  minCondImprovement = 0
)

qcba_with_summary <- function(params) {
  rmQCBA <- do.call(qcba, params)
  cat("Number of rules: ", nrow(rmQCBA@rules), " ")
  cat("Total conditions: ", sum(rmQCBA@rules$condition_count), " ")
  accuracy <- CBARuleModelAccuracy(predict(rmQCBA, testFold), testFold[[classAtt]])
  cat("Accuracy on test data: ", round(accuracy, 2))
  print(rmQCBA@rules)
}

QCBA Refit

The following code will postoptimize the previously learnt CBA model using the refit optimization:

qcba_with_summary(qcbaParams)

This will output the following list of eight rules (note that in this and the following printouts, the columns with rule measures are omitted for brevity):

1  {Petal.Length=[-Inf;1.9]}                                            => {Species=setosa}
2  {Petal.Width=[-Inf;0.6]}                                             => {Species=setosa}
3  {Petal.Length=[5.2;Inf],Petal.Width=[1.8;Inf]}                       => {Species=virginica}
4  {Sepal.Length=[5.6;Inf],Petal.Length=[3.3;4.7],Petal.Width=[1;1.7]}  => {Species=versicolor}
5  {Petal.Width=[1.8;Inf]}                                              => {Species=virginica}
6  {Petal.Length=[3.3;4.7]}                                             => {Species=versicolor}
7  {Petal.Width=[1;1.7]}                                                => {Species=versicolor}
8  {}                                                                   => {Species=virginica}

The intervals (except for boundaries set to infinity) were shortened. For example, for the first rule, the training data does not contain any data point with Petal.Length = 2.6 (the original boundary), but it does contain the value 1.9 (the new boundary):

any(trainFold$Petal.Length == 1.9)
any(trainFold$Petal.Length == 2.6)

The first call returns TRUE and the second FALSE.

The statistics are:

[1] "Number of rules: 8 Total conditions: 10 Accuracy on test data: 0.96"

While this is one rule more than in the CPAR model, the extra rule is the explicitly included default rule (rule #8). In the arulesCBA CPAR model, the default rule is stored in a separate slot (rmBASE$default) and predicts the same virginica class as the one in the QCBA rule set.

Recall that rules are applied from top to bottom. A careful inspection of the rule list shows that it contains rule 4, which is a special case of rule 7 (both predicting the versicolor class). However, neither of these rules can be removed without an impact on the predictions. If the more specific rule 4 were removed, some instances would be classified differently, as more instances would reach rule 5, which would classify them as virginica. Similarly, the classification would change if we replaced rule 4 with rule 7 or rule 7 with rule 4.

Adjusting boundaries and attribute pruning

We will demonstrate the extension, trimming and attribute pruning steps simultaneously for efficient presentation.

qcbaParams$attributePruning <- TRUE
qcbaParams$trim_literal_boundaries <- TRUE
qcbaParams$extend <- "numericOnly"
qcba_with_summary(qcbaParams)

The list of resulting rules is

1  {Petal.Length=[-Inf;1.9]}                     => {Species=setosa}
2  {Petal.Width=[-Inf;0.6]}                      => {Species=setosa}
3  {Petal.Length=[5.2;Inf]}                      => {Species=virginica}
4  {Sepal.Length=[5;Inf],Petal.Length=[3.3;4.7]} => {Species=versicolor}
5  {Petal.Width=[1.8;Inf]}                       => {Species=virginica}
6  {Petal.Length=[3.3;4.7]}                      => {Species=versicolor}
7  {Petal.Width=[1;1.7]}                         => {Species=versicolor}
8  {}                                            => {Species=virginica}

As can be seen, the adjustment of intervals resulted in a change in the boundary for Sepal.Length in rule #4. The attribute pruning removed extra conditions from rules #3 and #4, resulting in a smaller model with overall improved test accuracy:

"Number of rules: 8 Total conditions: 8 Accuracy on test data: 1"

Postpruning

Postpruning is performed on the model resulting from all previous steps.

qcbaParams$postpruning <- "cba"
qcba_with_summary(qcbaParams)

The result is a substantially reduced rule list:

1  {Petal.Length=[1;1.9]}                        => {Species=setosa}
2  {Petal.Length=[5.2;Inf]}                      => {Species=virginica}
3  {Sepal.Length=[5;Inf],Petal.Length=[3.3;4.7]} => {Species=versicolor}
4  {Petal.Width=[1.8;Inf]}                       => {Species=virginica}
5  {}                                            => {Species=versicolor}

The statistics are:

"Number of rules: 5 Total conditions: 5 Accuracy on test data: 1.0"

As can be seen, postpruning reduced the model size significantly, and in this case the model even has better accuracy than the base CPAR model. However, in some other cases the benchmarks in Kliegr and Izquierdo (2023) have shown a small average decrease in accuracy as a result of pruning.

Default rule pruning

The following code demonstrates the standard strategy for default rule pruning. As outlined earlier, this often provides an effective way to reduce the rule count, although sometimes at the expense of slightly lower accuracy.

qcbaParams$defaultRuleOverlapPruning <- "transactionBased"
qcba_with_summary(qcbaParams)

The result is the final reduced rule list, with rule #3 from the previous printout removed. This rule classifies to the versicolor class. Since this is the default class, instances that were covered by this rule will be covered by the default rule. The original rule #4 covers, considering its position in the rule list, different instances of the training data and will therefore not interfere with this.

1  {Petal.Length=[1;1.9]}   => {Species=setosa}
2  {Petal.Length=[5.2;Inf]} => {Species=virginica}
3  {Petal.Width=[1.8;Inf]}  => {Species=virginica}
4  {}                       => {Species=versicolor}

The statistics for the final model are:

Number of rules: 4 Total conditions: 3 Accuracy on test data: 1.0

Compared to the original CPAR model, the number of rules dropped from 8 (including the default rule) to 4, the number of conditions dropped from 10 to 3, and the accuracy increased by 0.04 points. This is an illustrative example, and actual results may vary for a particular dataset. On average, the improvement reported over CPAR as measured on 22 benchmark datasets was a 2% improvement in accuracy, a 40% reduction in the number of rules and a 29% reduction in the number of conditions (Kliegr and Izquierdo 2023). More details on the benchmarks are covered in Section Built-in benchmark support.

5 Interoperability with Rule Learning Packages in CRAN

The qCBA package is able to process CBA models produced by all three CBA implementations in CRAN: arc, arulesCBA and rCBA. Additionally, it can process other rule models generated by arulesCBA, such as CPAR, as well as SBRL models generated by the package sbrl.

On the input, qcba() requires an instance of CBARuleModel, which has the following slots:

• rules: list of rules in the model (an instance of rules from the arules package),
• cutp: specification of the cutpoints used to discretize numerical attributes,
• classAtt: name of the target attribute,
• attTypes: types of the attributes (numeric or factor).

As shown below, an instance of this class is created automatically when used with arc, and through prepared helper functions for other libraries. The code examples in the following are built on the data preparation described in Subsection Data.

arc package

As the arc package was specifically designed for qCBA, it outputs an instance of the CBARuleModel class, which is accepted by the qcba() function. The postprocessing can thus be directly applied to the result of cba().

rmCBA <- cba(datadf = trainFold, classAtt = "Species")
rmqCBA <- qcba(cbaRuleModel = rmCBA, datadf = trainFold)

By default, the cba() function from the arc package learns candidate association rules with automatic threshold detection based on the heuristic algorithm from (Kliegr and Kuchar 2019). Therefore, no support and confidence thresholds have to be passed. A more complex case involving custom discretization and thresholds was demonstrated in Section Primer on building rule-based classifiers for QCBA in R.

arulesCBA package

Compared to the previous example, there is an extra line with a call to a helper function.

library(arulesCBA)
arulesCBAModel <- arulesCBA::CBA(Species ~ ., data = train_disc, supp = 0.1, conf = 0.9)
CBAmodel <- arulesCBA2arcCBARuleModel(arulesCBAModel, discrModel$cutp, iris, classAtt)
qCBAmodel <- qcba(cbaRuleModel = CBAmodel, datadf = iris)

Note that we passed prediscretized data in train_disc to the arulesCBA package. While the arulesCBA package allows for supervised discretization using the arulesCBA::discretizeDF.supervised() method, the cutpoints determined during the discretization are not exposed in a machine-readable way. Therefore, when arulesCBA is used in conjunction with qCBA, the discretization should be performed using arc::discrNumeric(). Since both arc and arulesCBA internally use the MDLP method from the discretization package, this should not influence the results.

rCBA package

The use of the rCBA package is similar to that of the previously covered package; it is only necessary to use a different conversion function:

library(rCBA)
rCBAmodel <- rCBA::build(train_disc)
CBAmodel <- rcbaModel2CBARuleModel(rCBAmodel, discrModel$cutp, iris, "Species")
qCBAmodel <- qcba(CBAmodel, iris)

Confidence and support thresholds are not specified, as rCBA performs automatic threshold tuning using the simulated annealing algorithm from (Kliegr and Kuchar 2019). The rCBA package internally uses a Java implementation of CBA, which may result in faster performance on some datasets than arulesCBA (Hahsler et al. 2019).

sbrl and other packages

The use of the qCBA package for sbrl is similar; it is again only necessary to use the dedicated conversion function sbrlModel2arcCBARuleModel().

An example for sbrl is contained in the qCBA package documentation, as sbrl requires additional preprocessing and postprocessing: sbrl requires a specially named target attribute, allows only binary targets, and outputs probabilities rather than specific class predictions.

For compatibility with packages that do not use the arules data structures, there is also the customCBARuleModel class, which takes rules as a data frame conforming to the format used in arules, which can be obtained with as(rules, "data.frame").
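For instance, rules mined directly with arules can be flattened into this data frame format (a sketch assuming the arules package with its bundled Groceries dataset):

```r
library(arules)
data(Groceries)

# Mine a small rule set and coerce it to the universal data frame format,
# where each rule is represented as a string in the 'rules' column.
rules <- apriori(Groceries,
                 parameter = list(supp = 0.01, conf = 0.5),
                 control = list(verbose = FALSE))
rulesDF <- as(rules, "data.frame")
head(rulesDF$rules)  # rule strings alongside support/confidence/lift columns
```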

6 Built-in benchmark support

The qCBA package has built-in support for benchmarking over all supported types of algorithms covered in the previous section. This includes the arulesCBA implementations of CBA, CMAR, CPAR, PRM and FOIL2 (Quinlan and Cameron-Jones 1993).

By default, an average of two runs of each algorithm is performed.

# learn with default metaparameter values
stats <- benchmarkQCBA(train = trainFold, test = testFold, classAtt = classAtt)
print(stats)

The result of the last printout is

           CBA   CMAR   CPAR   PRM  FOIL2 CBA_QCBA CMAR_QCBA CPAR_QCBA PRM_QCBA FOIL2_QCBA
accuracy  1.00  0.960  0.960 0.960  0.960    1.000     1.000     1.000    1.000      0.960
rulecount 5.00 25.000  7.000 6.000  8.000    4.000     4.000     4.000    4.000      5.000
modelsize 5.00 52.000 10.000 9.000 13.000    3.000     3.000     3.000    3.000      5.000
buildtime 0.05  0.535  0.215 0.205  0.215    0.058     0.138     0.059    0.059      0.081

The output can easily be turned into a relative comparison with round((stats[, 6:10] / stats[, 1:5] - 1), 3); the result is then:

          CBA_QCBA CMAR_QCBA CPAR_QCBA PRM_QCBA FOIL2_QCBA
accuracy     0.000     0.042     0.042    0.042      0.000
rulecount   -0.200    -0.840    -0.429   -0.333     -0.375
modelsize   -0.400    -0.942    -0.700   -0.667     -0.615
buildtime    0.204    -0.737    -0.681   -0.722     -0.665

This shows that on the iris dataset, depending on the base algorithm, QCBA decreased the rule count by between 20% and 84%, while accuracy remained unchanged or increased by about 4%. The last row shows that for four out of five studied reference algorithms, the time required by QCBA is lower than the time it takes to train the input model with the corresponding algorithm. The benchmarkQCBA() function can also accept custom metaparameters and selected base rule learners. The user can also choose the number of runs (iterations parameter) and obtain the resulting models from the last iteration.

Since the outputs of some learners may depend on chance, the function also allows setting the random seed through the optional seed argument. Note that the provided seed is not used for splitting data, which needs to be performed externally. This approach provides the most control for the user, avoids replicating code from other R packages and functions aimed at splitting data, and also improves reproducibility.

output <- benchmarkQCBA(
  trainFold,
  testFold,
  classAtt,
  train_disc,
  test_disc,
  discrModel$cutp,
  CBA = list(support = 0.05, confidence = 0.5),
  algs = c("CPAR"),
  iterations = 10,
  return_models = TRUE,
  seed = 1
)

message("Evaluation statistics")
print(output$stats)
message("CPAR model")
inspect(output$CPAR[[1]])
message("QCBA model")
print(output$CPAR_QCBA[[1]])

This will produce output with a list of rules similar to the final CPAR model presented in Section Demonstration of individual QCBA optimization steps. A more complex benchmark of computational costs is presented in Kliegr and Izquierdo (2023), which also includes a study of various data sizes and of the effect of varying the number of unique values in the dataset. A brief overview of the main results was presented in Subsection Computational costs for large datasets.

A GitHub repository, https://github.com/kliegr/arcBench, contains scripts that extend this workflow into automation across multiple datasets and materialized splits in each dataset. It also includes support for benchmarking additional rule learning algorithms, including SBRL, Python packages producing IDS models (Lakkaraju et al. 2016) and Weka libraries for RIPPER (Cohen 1995) and FURIA (Hühn and Hüllermeier 2009). Detailed benchmarking results are included in (Kliegr and Izquierdo 2023).

    +

    7 Conclusions

    +

    Quantitative CBA ameliorates one of the major drawbacks of association +rule classification, the adherence of rules comprising the classifier to +the multidimensional grid created by discretization of numerical +attributes. By working with the original continuous data, the algorithm +can improve the fit of the rules and consequently reduce their count. +The QCBA algorithm does not introduce any mandatory thresholds or +meta-parameters for the user to set or optimize, although it does allow +disabling the individual optimizations. The +qCBA package implements +QCBA, allowing the postprocessing of the output of all three CBA +implementations currently in CRAN. The package can also be used in +conjunction with other association rule-based models, including those +producing rule sets and using multi-rule classification.

    +

    The QCBA algorithm is described in detail in Kliegr and Izquierdo (2023), the package documentation is available in Kliegr (2024), and additional information is available at https://github.com/kliegr/qcba, which also features an interactive RMarkdown tutorial supplementing this paper.

    +

    8 Acknowledgment

    +

    The author thanks the Faculty of Informatics and Statistics, Prague University of Economics and Business, for long-term institutional support of research activities.

    +
    +
    +

    9 Supplementary materials

    +

    Supplementary materials are available in addition to this article. They can be downloaded at RJ-2025-038.zip

    +

    10 CRAN packages used

    +

    qCBA, rCBA, arc, arulesCBA, arules, rJava, datasets, discretization, sbrl

    +

    11 CRAN Task Views implied by cited packages

    +

    HighPerformanceComputing, MachineLearning, ModelDeployment

    +

    12 Note

    +

    This article is converted from a legacy LaTeX article using the texor package. The PDF version is the official version. To report a problem with the HTML, refer to CONTRIBUTE on the R Journal homepage.

    +
    +
    R. Agrawal and R. Srikant. Fast algorithms for mining association rules in large databases. In Proceedings of the 20th international conference on very large data bases, pages 487–499, 1994. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc. ISBN 1-55860-153-8.

    M. Azmi and A. Berrado. RCAR framework: Building a regularized class association rules model in a categorical data space. In Proceedings of the 13th international conference on intelligent systems: Theories and applications, pages 1–6, 2020.

    W. W. Cohen. Fast effective rule induction. In Twelfth international conference on machine learning, pages 115–123, 1995. Morgan Kaufmann.

    U. M. Fayyad and K. B. Irani. Multi-interval discretization of continuous-valued attributes for classification learning. In 13th international joint conference on artificial intelligence (IJCAI93), pages 1022–1029, 1993.

    M. Hahsler, S. Chelluboina, K. Hornik and C. Buchta. The arules R-package ecosystem: Analyzing interesting patterns from large transaction data sets. Journal of Machine Learning Research, 12(Jun): 2021–2025, 2011.

    M. Hahsler, B. Grün and K. Hornik. Introduction to arules – mining association rules and frequent item sets. SIGKDD Explorations, 2(4): 1–28, 2007.

    M. Hahsler, I. Johnson, T. Kliegr and J. Kuchař. Associative classification in R: arc, arulesCBA, and rCBA. R Journal, 9(2), 2019.

    M. Hahsler, I. Johnson and T. Giallanza. arulesCBA: Classification based on association rules. 2024. URL https://CRAN.R-project.org/package=arulesCBA. R package version 1.2.7.

    J. Hühn and E. Hüllermeier. FURIA: An algorithm for unordered fuzzy rule induction. Data Mining and Knowledge Discovery, 19(3): 293–319, 2009.

    T. Kliegr. Association rule classification. 2016. URL https://CRAN.R-project.org/package=arc. R package version 1.1.3.

    T. Kliegr. qCBA: Quantitative classification by association rules. 2024. URL https://CRAN.R-project.org/package=qCBA. R package version 1.0.

    T. Kliegr and E. Izquierdo. QCBA: Improving rule classifiers learned from quantitative data by recovering information lost by discretisation. Applied Intelligence, 53(18): 20797–20827, 2023.

    T. Kliegr and J. Kuchar. Tuning hyperparameters of classification based on associations (CBA). In ITAT, pages 9–16, 2019.

    J. Kuchař and T. Kliegr. rCBA: CBA classifier for R. 2019. URL https://CRAN.R-project.org/package=rCBA. R package version 0.4.3.

    H. Lakkaraju, S. H. Bach and J. Leskovec. Interpretable decision sets: A joint framework for description and prediction. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1675–1684, 2016. New York, NY, USA: ACM. ISBN 978-1-4503-4232-2.

    W. Li, J. Han and J. Pei. CMAR: Accurate and efficient classification based on multiple class-association rules. In Proceedings of the 2001 IEEE international conference on data mining, pages 369–376, 2001. Washington, DC, USA: IEEE Computer Society. ISBN 0-7695-1119-8.

    B. Liu, W. Hsu and Y. Ma. Integrating classification and association rule mining. In Proceedings of the fourth international conference on knowledge discovery and data mining, pages 80–86, 1998. New York, NY: AAAI Press.

    J. R. Quinlan. C4.5: Programs for machine learning. Morgan Kaufmann, 1993.

    J. R. Quinlan and R. M. Cameron-Jones. FOIL: A midterm report. In European conference on machine learning, pages 1–20, 1993. Springer.

    K. Vanhoof and B. Depaire. Structure of association rule classifiers: A review. In 2010 international conference on intelligent systems and knowledge engineering (ISKE), pages 9–12, 2010.

    H. Yang, C. Rudin and M. Seltzer. sbrl: Scalable Bayesian rule lists model. 2024. URL https://CRAN.R-project.org/package=sbrl. R package version 1.4.

    H. Yang, C. Rudin and M. Seltzer. Scalable Bayesian rule lists. In International conference on machine learning, pages 3921–3930, 2017. PMLR.

    X. Yin and J. Han. CPAR: Classification based on predictive association rules. In Proceedings of the 2003 SIAM international conference on data mining, pages 331–335, 2003. SIAM.
    +

    References

    +
    +

    Reuse

    +

    Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

    +

    Citation

    +

    For attribution, please cite this work as

    +
    Kliegr, "qCBA: An R Package for Postoptimization of Rule Models Learnt on Quantized Data", The R Journal, 2026
    +

    BibTeX citation

    +
    @article{RJ-2025-038,
    +  author = {Kliegr, Tomas},
    +  title = {qCBA: An R Package for Postoptimization of Rule Models Learnt on Quantized Data},
    +  journal = {The R Journal},
    +  year = {2026},
    +  note = {https://doi.org/10.32614/RJ-2025-038},
    +  doi = {10.32614/RJ-2025-038},
    +  volume = {17},
    +  issue = {4},
    +  issn = {2073-4859},
    +  pages = {156-172}
    +}
    +
    + + + + + + + diff --git a/_articles/RJ-2025-038/RJ-2025-038.pdf b/_articles/RJ-2025-038/RJ-2025-038.pdf new file mode 100644 index 0000000000..c7088f7aaa Binary files /dev/null and b/_articles/RJ-2025-038/RJ-2025-038.pdf differ diff --git a/_articles/RJ-2025-038/RJ-2025-038.zip b/_articles/RJ-2025-038/RJ-2025-038.zip new file mode 100644 index 0000000000..50ec5508de Binary files /dev/null and b/_articles/RJ-2025-038/RJ-2025-038.zip differ diff --git a/_articles/RJ-2025-038/RJournal.sty b/_articles/RJ-2025-038/RJournal.sty new file mode 100644 index 0000000000..351990be38 --- /dev/null +++ b/_articles/RJ-2025-038/RJournal.sty @@ -0,0 +1,358 @@ +% Package `RJournal' to use with LaTeX2e +% Copyright (C) 2010 by the R Foundation +% Copyright (C) 2013 by the R Journal +% +% Originally written by Kurt Hornik and Friedrich Leisch with subsequent +% edits by the editorial board +% +% CAUTION: +% Do not modify this style file. Any changes to this file will be reset when your +% article is submitted. +% If you must modify the style or add LaTeX packages to the article, these +% should be specified in RJwrapper.tex + +\NeedsTeXFormat{LaTeX2e}[1995/12/01] +\ProvidesPackage{RJournal}[2025/10/05 v0.17 RJournal package] + +\RequirePackage{tikz} + +% Overall page layout, fonts etc ----------------------------------------------- + +% Issues of of \emph{The R Journal} are created from the standard \LaTeX{} +% document class \pkg{report}. + +\RequirePackage{geometry} +\geometry{a4paper, + textwidth=14cm, top=1cm, bottom=1cm, + includehead,includefoot,centering, + footskip=1.5cm} +\raggedbottom +\sloppy +\clubpenalty = 10000 +\widowpenalty = 10000 +\brokenpenalty = 10000 +\usepackage{microtype} + + +\RequirePackage{fancyhdr} +\fancyhead{} +\fancyheadoffset{2cm} +\fancyhead[L]{\textsc{\RJ@sectionhead}} +\fancyhead[R]{\thepage} +\fancyfoot{} +\fancyfoot[L]{The R Journal Vol. 
\RJ@volume/\RJ@number, \RJ@month~\RJ@year} +\fancyfoot[R]{ISSN 2073-4859} +\pagestyle{fancy} + +% We use the following fonts (all with T1 encoding): +% +% rm & palatino +% tt & inconsolata +% sf & helvetica +% math & palatino + +\RequirePackage{microtype} + +\RequirePackage[scaled=0.92]{helvet} +\RequirePackage{palatino,mathpazo} +\RequirePackage[scaled=1.02]{inconsolata} +\RequirePackage[T1]{fontenc} + +\RequirePackage[hyphens]{url} +\RequirePackage[pagebackref]{hyperref} +\renewcommand{\backref}[1]{[p#1]} + +% Dark blue colour for all links +\RequirePackage{color} +\definecolor{link}{rgb}{0.45,0.51,0.67} +\hypersetup{ + colorlinks,% + citecolor=link,% + filecolor=link,% + linkcolor=link,% + urlcolor=link +} + +% Give the text a little room to breath +\setlength{\parskip}{3pt} +\RequirePackage{setspace} +\setstretch{1.05} + +% Issue and article metadata --------------------------------------------------- + +% Basic front matter information about the issue: volume, number, and +% date. + +\newcommand{\volume}[1]{\def\RJ@volume{#1}} +\newcommand{\volnumber}[1]{\def\RJ@number{#1}} +\renewcommand{\month}[1]{\def\RJ@month{#1}} +\renewcommand{\year}[1]{\def\RJ@year{#1}} + + +% Individual articles correspond to +% chapters, and are contained in |article| environments. This makes it +% easy to have figures counted within articles and hence hyperlinked +% correctly. + +% An article has an author, a title, and optionally a subtitle. We use +% the obvious commands for specifying these. Articles will be put in certain +% journal sections, named by \sectionhead. + +\newcommand {\sectionhead} [1]{\def\RJ@sectionhead{#1}} +\renewcommand{\author} [1]{\def\RJ@author{#1}} +\renewcommand{\title} [1]{\def\RJ@title{#1}} +\newcommand {\subtitle} [1]{\def\RJ@subtitle{#1}} + +% Control appearance of titles: make slightly smaller than usual, and +% suppress section numbering. 
See http://tex.stackexchange.com/questions/69749 +% for why we don't use \setcounter{secnumdepth}{-1} + +\usepackage[medium]{titlesec} +\usepackage{titletoc} +\titleformat{\section} {\normalfont\large\bfseries}{\arabic{section}}{1em}{} +\titleformat{\subsection}{\normalfont\normalsize\bfseries}{\arabic{section}.\arabic{subsection}}{0.5em}{} +\titlecontents{chapter} [0em]{}{}{}{\titlerule*[1em]{.}\contentspage} + +% Article layout --------------------------------------------------------------- + +% Environment |article| clears the article header information at its beginning. +% We use |\FloatBarrier| from the placeins package to keep floats within +% the article. +\RequirePackage{placeins} +\newenvironment{article}{\author{}\title{}\subtitle{}\FloatBarrier}{\FloatBarrier} + +% Refereed articles should have an abstract, so we redefine |\abstract| to +% give the desired style + +\renewcommand{\abstract}[1]{\noindent\textbf{Abstract} #1} +\renewenvironment{abstract}{\noindent\textbf{Abstract}~}{} + +% The real work is done by a redefined version of |\maketitle|. Note +% that even though we do not want chapters (articles) numbered, we +% need to increment the chapter counter, so that figures get correct +% labelling. + +\renewcommand{\maketitle}{% +\noindent + \chapter{\RJ@title}\refstepcounter{chapter} + \ifx\empty\RJ@subtitle + \else + \noindent\textbf{\RJ@subtitle} + \par\nobreak\addvspace{\baselineskip} + \fi + \ifx\empty\RJ@author + \else + \noindent\textit{\RJ@author} + \par\nobreak\addvspace{\baselineskip} + \fi + \@afterindentfalse\@nobreaktrue\@afterheading +} + +% Now for some ugly redefinitions. We do not want articles to start a +% new page. 
(Actually, we do, but this is handled via explicit +% \newpage +% +% The name@of@eq is a hack to get hyperlinks to equations to work +% within each article, even though there may be multiple eq.(1) +% \begin{macrocode} +\renewcommand\chapter{\secdef\RJ@chapter\@schapter} +\providecommand{\nohyphens}{% + \hyphenpenalty=10000\exhyphenpenalty=10000\relax} +\newcommand{\RJ@chapter}{% + \edef\name@of@eq{equation.\@arabic{\c@chapter}}% + \renewcommand{\@seccntformat}[1]{}% + \@startsection{chapter}{0}{0mm}{% + -2\baselineskip \@plus -\baselineskip \@minus -.2ex}{\p@}{% + \phantomsection\normalfont\huge\bfseries\raggedright}} + +% Book reviews should appear as sections in the text and in the pdf bookmarks, +% however we wish them to appear as chapters in the TOC. Thus we define an +% alternative to |\maketitle| for reviews. +\newcommand{\review}[1]{ + \pdfbookmark[1]{#1}{#1} + \section*{#1} + \addtocontents{toc}{\protect\contentsline{chapter}{#1}{\thepage}{#1.1}} +} + +% We want bibliographies as starred sections within articles. +% +\RequirePackage[sectionbib,round]{natbib} +\bibliographystyle{abbrvnat} +\renewcommand{\bibsection}{\section*{References}} + +% Equations, figures and tables are counted within articles, but we do +% not show the article number. For equations it becomes a bit messy to avoid +% having hyperref getting it wrong. + +% \numberwithin{equation}{chapter} +\renewcommand{\theequation}{\@arabic\c@equation} +\renewcommand{\thefigure}{\@arabic\c@figure} +\renewcommand{\thetable}{\@arabic\c@table} + +% Issue layout ----------------------------------------------------------------- + +% Need to provide our own version of |\tableofcontents|. We use the +% tikz package to get the rounded rectangle. Notice that |\section*| +% is really the same as |\chapter*|. 
+\renewcommand{\contentsname}{Contents} +\renewcommand\tableofcontents{% + \vspace{1cm} + \section*{\contentsname} + { \@starttoc{toc} } +} + +\renewcommand{\titlepage}{% + \thispagestyle{empty} + \hypersetup{ + pdftitle={The R Journal Volume \RJ@volume/\RJ@number, \RJ@month \RJ@year},% + pdfauthor={R Foundation for Statistical Computing},% + } + \noindent + \begin{center} + \fontsize{50pt}{50pt}\selectfont + The \raisebox{-8pt}{\includegraphics[height=77pt]{Rlogo-5}}\hspace{10pt} + Journal + + \end{center} + {\large \hfill Volume \RJ@volume/\RJ@number, \RJ@month{} \RJ@year \quad} + + \rule{\textwidth}{1pt} + \begin{center} + {\Large A peer-reviewed, open-access publication of the \\ + R Foundation for Statistical Computing} + \end{center} + + % And finally, put in the TOC box. Note the way |tocdepth| is adjusted + % before and after producing the TOC: thus, we can ensure that only + % articles show up in the printed TOC, but that in the PDF version, + % bookmarks are created for sections and subsections as well (provided + % that the non-starred forms are used). + \setcounter{tocdepth}{0} + \tableofcontents + \setcounter{tocdepth}{2} + \clearpage +} + +% Text formatting -------------------------------------------------------------- + +\newcommand{\R}{R} +\newcommand{\address}[1]{\addvspace{\baselineskip}\noindent\emph{#1}} +\newcommand{\email}[1]{\href{mailto:#1}{\normalfont\texttt{#1}}} + +% Simple font selection is not good enough. For example, |\texttt{--}| +% gives `\texttt{--}', i.e., an endash in typewriter font. Hence, we +% need to turn off ligatures, which currently only happens for commands +% |\code| and |\samp| and the ones derived from them. Hyphenation is +% another issue; it should really be turned off inside |\samp|. And +% most importantly, \LaTeX{} special characters are a nightmare. E.g., +% one needs |\~{}| to produce a tilde in a file name marked by |\file|. 
+% Perhaps a few years ago, most users would have agreed that this may be +% unfortunate but should not be changed to ensure consistency. But with +% the advent of the WWW and the need for getting `|~|' and `|#|' into +% URLs, commands which only treat the escape and grouping characters +% specially have gained acceptance + +\DeclareRobustCommand\code{\bgroup\@noligs\@codex} +\def\@codex#1{\texorpdfstring% +{{\normalfont\ttfamily\hyphenchar\font=-1 #1}}% +{#1}\egroup} +\newcommand{\kbd}[1]{{\normalfont\texttt{#1}}} +\newcommand{\key}[1]{{\normalfont\texttt{\uppercase{#1}}}} +\DeclareRobustCommand\samp{`\bgroup\@noligs\@sampx} +\def\@sampx#1{{\normalfont\texttt{#1}}\egroup'} +\newcommand{\var}[1]{{\normalfont\textsl{#1}}} +\let\env=\code +\newcommand{\file}[1]{{`\normalfont\textsf{#1}'}} +\let\command=\code +\let\option=\samp +\newcommand{\dfn}[1]{{\normalfont\textsl{#1}}} +% \acronym is effectively disabled since not used consistently +\newcommand{\acronym}[1]{#1} +\newcommand{\strong}[1]{\texorpdfstring% +{{\normalfont\fontseries{b}\selectfont #1}}% +{#1}} +\let\pkg=\strong +\newcommand{\CRANpkg}[1]{\href{https://CRAN.R-project.org/package=#1}{\pkg{#1}}}% +\let\cpkg=\CRANpkg +\newcommand{\ctv}[1]{\href{https://CRAN.R-project.org/view=#1}{\emph{#1}}} +\newcommand{\BIOpkg}[1]{\href{https://www.bioconductor.org/packages/release/bioc/html/#1.html}{\pkg{#1}}} + +% Example environments --------------------------------------------------------- +\RequirePackage{fancyvrb} +\RequirePackage{alltt} + +\DefineVerbatimEnvironment{example}{Verbatim}{} +\renewenvironment{example*}{\begin{alltt}}{\end{alltt}} + +% Support for output from Sweave, and generic session style code +% These used to have fontshape=sl for Sinput/Scode/Sin, but pslatex +% won't use a condensed font in that case. 
+ +% Update (2015-05-28 by DS): remove fontsize=\small to match example environment + +\DefineVerbatimEnvironment{Sinput}{Verbatim}{} +\DefineVerbatimEnvironment{Soutput}{Verbatim}{} +\DefineVerbatimEnvironment{Scode}{Verbatim}{} +\DefineVerbatimEnvironment{Sin}{Verbatim}{} +\DefineVerbatimEnvironment{Sout}{Verbatim}{} +\newenvironment{Schunk}{}{} + +% Mathematics ------------------------------------------------------------------ + +% The implementation of |\operatorname| is similar to the mechanism +% \LaTeXe{} uses for functions like sin and cos, and simpler than the +% one of \AmSLaTeX{}. We use |\providecommand| for the definition in +% order to keep the one of the \pkg{amstex} if this package has +% already been loaded. +% \begin{macrocode} +\providecommand{\operatorname}[1]{% + \mathop{\operator@font#1}\nolimits} +\RequirePackage{amsfonts} + +\renewcommand{\P}{% + \mathop{\operator@font I\hspace{-1.5pt}P\hspace{.13pt}}} +\newcommand{\E}{% + \mathop{\operator@font I\hspace{-1.5pt}E\hspace{.13pt}}} +\newcommand{\VAR}{\operatorname{var}} +\newcommand{\COV}{\operatorname{cov}} +\newcommand{\COR}{\operatorname{cor}} + +% Figures ---------------------------------------------------------------------- + +% For use with pandoc > 3.2.1 +\newsavebox\pandoc@box +\newcommand*\pandocbounded[1]{% scales image to fit in text height/width + \sbox\pandoc@box{#1}% + \Gscale@div\@tempa{\textheight}{\dimexpr\ht\pandoc@box+\dp\pandoc@box\relax}% + \Gscale@div\@tempb{\linewidth}{\wd\pandoc@box}% + \ifdim\@tempb\p@<\@tempa\p@\let\@tempa\@tempb\fi% select the smaller of both + \ifdim\@tempa\p@<\p@\scalebox{\@tempa}{\usebox\pandoc@box}% + \else\usebox{\pandoc@box}% + \fi% +} + +\RequirePackage[font=small,labelfont=bf]{caption} + +% Wide environments for figures and tables ------------------------------------- +\RequirePackage{environ} + +% An easy way to make a figure span the full width of the page +\NewEnviron{widefigure}[1][]{ +\begin{figure}[#1] +\advance\leftskip-2cm 
+\begin{minipage}{\dimexpr\textwidth+4cm\relax}% + \captionsetup{margin=2cm} + \BODY +\end{minipage}% +\end{figure} +} + +\NewEnviron{widetable}[1][]{ +\begin{table}[#1] +\advance\leftskip-2cm +\begin{minipage}{\dimexpr\textwidth+4cm\relax}% + \captionsetup{margin=2cm} + \BODY +\end{minipage}% +\end{table} +} diff --git a/_articles/RJ-2025-038/RJwrapper.tex b/_articles/RJ-2025-038/RJwrapper.tex new file mode 100644 index 0000000000..6617425cd1 --- /dev/null +++ b/_articles/RJ-2025-038/RJwrapper.tex @@ -0,0 +1,46 @@ +\documentclass[a4paper]{report} +\usepackage[utf8]{inputenc} +\usepackage[T1]{fontenc} +\usepackage{RJournal} +\usepackage{amsmath,amssymb,array} +\usepackage{booktabs} + + +% tightlist command for lists without linebreak +\providecommand{\tightlist}{% + \setlength{\itemsep}{0pt}\setlength{\parskip}{0pt}} + +%\usepackage{framed} +\usepackage{algorithm,algcompatible} +\usepackage{subcaption} +% insert here the call for the packages your document requires + +\usepackage{listings} +%\usepackage{tipa} +%\usepackage{fancyvrb} + +\lstset{ + basicstyle=\ttfamily\footnotesize, + columns=fullflexible, + keepspaces=true, + breaklines=true, +} + +%% load any required packages here + +\begin{document} + +%% do not edit, for illustration only +\sectionhead{Contributed research article} +\volume{17} +\volnumber{4} +\year{2025} +\month{December} +\setcounter{page}{156} + +%% replace RJtemplate with your article +\begin{article} + \input{qcba} +\end{article} + +\end{document} diff --git a/_articles/RJ-2025-038/figures/figure1a.png b/_articles/RJ-2025-038/figures/figure1a.png new file mode 100644 index 0000000000..8f03ee9a6b Binary files /dev/null and b/_articles/RJ-2025-038/figures/figure1a.png differ diff --git a/_articles/RJ-2025-038/figures/figure1b.png b/_articles/RJ-2025-038/figures/figure1b.png new file mode 100644 index 0000000000..4a5e6355bc Binary files /dev/null and b/_articles/RJ-2025-038/figures/figure1b.png differ diff --git a/_articles/RJ-2025-038/qcba.R 
b/_articles/RJ-2025-038/qcba.R new file mode 100644 index 0000000000..6cb3f9540c --- /dev/null +++ b/_articles/RJ-2025-038/qcba.R @@ -0,0 +1,328 @@ +# Section 2 +library(qCBA) + +# Define cut points for quantization +temp_breaks <- seq(from = 15, to = 45, by = 5) +hum_breaks <- c(0, 40, 60, 80, 100) + +data_discr <- arc::applyCuts( + df = humtemp, + cutp = list(temp_breaks, hum_breaks, NULL), + infinite_bounds = TRUE, + labels = TRUE +) +head(data_discr) + +# Discover candidate class association rules (CARs) +txns <- as(data_discr, "transactions") +appearance <- list(rhs = c("Class=1", "Class=2", "Class=3", "Class=4")) +rules <- arules::apriori( + data = txns, + parameter = list( + confidence = 0.5, + support = 3 / nrow(data_discr), + minlen = 1, + maxlen = 3 + ), + appearance = appearance +) +inspect(rules) + +# Learn CBA classifier from the candidate CARs +classAtt <- "Class" +rmCBA <- cba_manual( + datadf_raw = humtemp, + rules = rules, + txns = txns, + rhs = appearance$rhs, + classAtt = classAtt, + cutp = list() +) +inspect(rmCBA@rules) + +# Evaluate the CBA model +prediction_cba <- predict(rmCBA, data_discr) +acc_cba <- CBARuleModelAccuracy( + prediction = prediction_cba, + groundtruth = data_discr[[classAtt]] +) +prediction_cba + +# Explain a prediction by finding the firing rule +ruleIDs <- predict(rmCBA, data_discr, outputFiringRuleIDs = TRUE) +inspect(rmCBA@rules[ruleIDs[1]]) + +# Post-process the model with QCBA +rmqCBA <- qcba(cbaRuleModel = rmCBA, datadf = humtemp) +rmqCBA@rules + +# Evaluate the QCBA model +prediction <- predict(rmqCBA, humtemp) +acc <- CBARuleModelAccuracy(prediction, humtemp[[rmqCBA@classAtt]]) +acc + + +# Section 4 +library(arulesCBA) # version 1.2.7 + +# Prepare data: shuffle, split, and discretize +set.seed(12) # Chosen for demonstration purposes +allDataShuffled <- datasets::iris[sample(nrow(datasets::iris)), ] +trainFold <- allDataShuffled[1:100, ] +testFold <- allDataShuffled[101:nrow(datasets::iris), ] +classAtt <- "Species" + 
+discrModel <- discrNumeric(df = trainFold, classAtt = classAtt) +train_disc <- as.data.frame(lapply(discrModel$Disc.data, as.factor)) +cutPoints <- discrModel$cutp +test_disc <- applyCuts( + testFold, + cutPoints, + infinite_bounds = TRUE, + labels = TRUE +) +y_true <- testFold[[classAtt]] + +# Learn a base CPAR model +rmBASE <- CPAR(train_disc, formula = as.formula(paste(classAtt, "~ ."))) +predictionBASE <- predict(rmBASE, test_disc) +inspect(rmBASE$rules) +cat("Number of rules: ", length(rmBASE$rules)) +cat("Total conditions: ", sum(rmBASE$rules@lhs@data)) +cat("Accuracy on test data: ", mean(predictionBASE == y_true)) + +# Configure QCBA parameters, initially disabling optimizations +baseModel_arc <- arulesCBA2arcCBAModel( + arulesCBAModel = rmBASE, + cutPoints = cutPoints, + rawDataset = trainFold, + classAtt = classAtt +) +qcbaParams <- list( + cbaRuleModel = baseModel_arc, + datadf = trainFold, + extend = "noExtend", + attributePruning = FALSE, + continuousPruning = FALSE, + postpruning = "none", + trim_literal_boundaries = FALSE, + defaultRuleOverlapPruning = "noPruning", + minImprovement = 0, + minCondImprovement = 0 +) + +# Helper function to run QCBA and print a summary +qcba_with_summary <- function(params) { + rmQCBA <- do.call(qcba, params) + cat("Number of rules: ", nrow(rmQCBA@rules), " ") + cat("Total conditions: ", sum(rmQCBA@rules$condition_count), " ") + accuracy <- CBARuleModelAccuracy(predict(rmQCBA, testFold), testFold[[classAtt]]) + cat("Accuracy on test data: ", round(accuracy, 2)) + print(rmQCBA@rules) +} + +# Run QCBA with only the mandatory refit step +qcba_with_summary(qcbaParams) + +# Check for presence of specific values in the training data +any(trainFold$Petal.Length == 1.9) +any(trainFold$Petal.Length == 2.6) + +# Enable boundary adjustments and attribute pruning +qcbaParams$attributePruning <- TRUE +qcbaParams$trim_literal_boundaries <- TRUE +qcbaParams$extend <- "numericOnly" +qcba_with_summary(qcbaParams) + +# Enable 
post-pruning +qcbaParams$postpruning <- "cba" +qcba_with_summary(qcbaParams) + +# Enable default rule pruning +qcbaParams$defaultRuleOverlapPruning <- "transactionBased" +qcba_with_summary(qcbaParams) + + +# Section 5 +# With arc package +rmCBA <- cba(datadf = trainFold, classAtt = "Species") +rmqCBA <- qcba(cbaRuleModel = rmCBA, datadf = trainFold) + +# With arulesCBA package +library(arulesCBA) +arulesCBAModel <- arulesCBA::CBA(Species ~ ., data = train_disc, supp = 0.1, conf = 0.9) +CBAmodel <- arulesCBA2arcCBAModel(arulesCBAModel, discrModel$cutp, iris, classAtt) +qCBAmodel <- qcba(cbaRuleModel = CBAmodel, datadf = iris) + +# With rCBA package +# Note: This may produce a warning due to a compatibility issue between rCBA +# and newer versions of the arules package. +library(rCBA) +rCBAmodel <- rCBA::build(train_disc) +CBAmodel <- rcbaModel2CBARuleModel(rCBAmodel, discrModel$cutp, iris, "Species") +qCBAmodel <- qcba(CBAmodel, iris) + +# Section 6 + +# Learn with default metaparameter values +stats <- benchmarkQCBA(train = trainFold, test = testFold, classAtt = classAtt) +print(stats) + +# Calculate relative improvement +round((stats[, 6:10] / stats[, 1:5] - 1), 3) + +# Run a more complex benchmark with custom parameters +output <- benchmarkQCBA( + trainFold, + testFold, + classAtt, + train_disc, + test_disc, + discrModel$cutp, + CBA = list(support = 0.05, confidence = 0.5), + algs = c("CPAR"), + iterations = 10, + return_models = TRUE, + seed = 1 +) + +message("Evaluation statistics") +print(output$stats) +message("CPAR model") +inspect(output$CPAR[[1]]) +message("QCBA model") +print(output$CPAR_QCBA[[1]]) + +# Plot figure 1 When run in R studio, the two plots will appear under the Plots tab. 
THIS CODE IS INCLUDED FOR +# REPLICABILITY OF THE VISUALIZATION BUT WILL NOT APPEAR IN THE ARTICLE + +attach(humtemp) +# custom discretization +data_raw <- humtemp +data_discr <- humtemp +temp_breaks <- seq(from = 15, to = 45, by = 5) +hum_breaks <- c(0, 40, 60, 80, 100) +temp_unique_vals <- setdiff(unique(Temperature), temp_breaks) +hum_unique_vals <- setdiff(unique(Humidity), hum_breaks) +data_discr[, 1] <- cut(Temperature, breaks = temp_breaks) +data_discr[, 2] <- cut(Humidity, breaks = hum_breaks) +# change interval syntax from (15,20] to (15;20], which is required by QCBA R package +data_discr[, 1] <- as.factor(unlist(lapply(data_discr[, 1], function(x) { + gsub(",", ";", x) +}))) +data_discr[, 2] <- as.factor(unlist(lapply(data_discr[, 2], function(x) { + gsub(",", ";", x) +}))) + +head(data_discr) +plotGrid <- function(plotFineGrid = TRUE, plotDiscrGrid = TRUE) { + if (plotDiscrGrid) { + + for (i in temp_breaks[-1]) { + abline(h = i, lty = 2) + } + for (i in hum_breaks[-1]) { + abline(v = i, lty = 2) + } + } + if (plotFineGrid) { + for (i in temp_unique_vals[-1]) { + abline(h = i, lty = 3, col = "grey") + } + for (i in hum_unique_vals[-1]) { + abline(v = i, lty = 3, col = "grey") + } + } +} + +classAtt <- "Class" +appearance <- getAppearance(data_discr, classAtt) +txns <- as(data_discr, "transactions") +rules <- apriori(txns, parameter = list(confidence = 0.5, support = 3/nrow(data_discr), minlen = 1, maxlen = 3), appearance = appearance) +plot(Humidity, Temperature, pch = as.character(Class), main = "Discovered asociation rules", cex.lab = 1.5, cex.axis = 1.5, cex.main = 1.5, + cex.sub = 1.5) + +plotGrid(FALSE) + + +plotHumTempRule <- function(rules, ruleIndex) { + if (typeof(rules) == "S4") { + # rules is a arules rule model + r <- inspect(rules)[ruleIndex, ] + rule <- paste(unlist(r$lhs[1]), collapse = "") + rhs <- paste(unlist(r$rhs[1]), collapse = "") + } else { + # rules is a list of rules output by qCBA + rule <- rules[ruleIndex, 1] + rhs <- 
regmatches(rule, regexec("\\{Class=.*\\}", rule)) + } + # get color + if (rhs == "{Class=1}") { + border = "red" + col = rgb(1, 0.2, 0.2, alpha = 0.3) + } else if (rhs == "{Class=2}") { + border = "green" + col = rgb(0, 1, 0, alpha = 0.3) + } else if (rhs == "{Class=3}") { + border = "black" + col = rgb(0.4, 0.4, 0.4, alpha = 0.3) + } else if (rhs == "{Class=4}") { + border = "blue" + col = rgb(0, 0, 1, alpha = 0.3) + } + + + temp_coordinates <- unlist(regmatches(rule, regexec("Temperature=.([0-9]+);([0-9]+).", rule))) + if (length(temp_coordinates) == 0) { + # if the temperature literal is missing in the rule, use the following coordinates + temp_coordinates = c(0, 0, 50) + } + hum_coordinates <- unlist(regmatches(rule, regexec("Humidity=.([0-9]+);([0-9]+).", rule))) + if (length(hum_coordinates) == 0) { + # if the humidity literal is missing in the rule, use the following coordinates + hum_coordinates = c(0, 0, 100) + } + m <- rect(hum_coordinates[2], temp_coordinates[2], hum_coordinates[3], temp_coordinates[3], border = border, col = col) +} + +plotRules <- function(rules) { + if (typeof(rules) == "S4") { + # for arules/cba + rule_count <- length(rules) + } else { + # for qcba + rule_count <- nrow(rules) + } + for (i in 1:rule_count) { + plotHumTempRule(rules, i) + } +} + +classAtt <- "Class" +appearance <- getAppearance(data_discr, classAtt) +# Note that we are calling `cba_manual()` instead of cba() because we want - for demonstration purposes - to construct the +# classifier from a externally-generated rule list. 
+rmCBA <- cba_manual(data_raw, rules, txns, appearance$rhs, classAtt, cutp = list(), pruning_options = list(default_rule_pruning = FALSE)) + +plot(Humidity, Temperature, pch = as.character(Class), main = "CBA model", cex.lab = 1.5, cex.axis = 1.5, cex.main = 1.5, cex.sub = 1.5) +plotGrid(FALSE) +plotRules(rmCBA@rules) + +# To save the figure to a file uncomment the following two lines +# dev.copy(png,filename='figures/figure1a.png'); +# dev.off (); + +rmqCBA <- qcba(cbaRuleModel = rmCBA, datadf = data_raw, extendType = "numericOnly", trim_literal_boundaries = TRUE, postpruning = "cba", + attributePruning = TRUE, defaultRuleOverlapPruning = "transactionBased", createHistorySlot = TRUE, loglevel = "WARNING") + +plot(Humidity, Temperature, pch = as.character(Class), main = "QCBA model", cex.lab = 1.5, cex.axis = 1.5, cex.main = 1.5, cex.sub = 1.5) +plotGrid(FALSE) +plotRules(rmqCBA@rules) + +# To save the figure to a file uncomment the following two lines +# dev.copy(png,filename='figures/figure1b.png'); +# dev.off (); + +message("To save the generated figures to a file uncomment the commented out code.") + +message("Note that the code may produce a not logical or factor warning, which is caused by a compatibility issue between rCBA and newer versions of arules.") diff --git a/_articles/RJ-2025-038/qcba.bib b/_articles/RJ-2025-038/qcba.bib new file mode 100644 index 0000000000..9450a64113 --- /dev/null +++ b/_articles/RJ-2025-038/qcba.bib @@ -0,0 +1,9623 @@ +@inproceedings{quinlan1993foil, + title={{FOIL}: A midterm report}, + author={Quinlan, J Ross and Cameron-Jones, R Mike}, + booktitle={European conference on machine learning}, + pages={1--20}, + year={1993}, + organization={Springer} +} +@inproceedings{kliegr2019tuning, + title={Tuning Hyperparameters of Classification Based on Associations ({CBA})}, + author={Kliegr, Tom{\'a}s and Kuchar, Jaroslav}, + booktitle={ITAT}, + pages={9--16}, + year={2019} +} +@Manual{sbrl, + title = {sbrl: Scalable {Bayesian} Rule 
Lists Model},
+  author = {Hongyu Yang and Cynthia Rudin and Margo Seltzer},
+  year = {2024},
+  note = {{R} package version 1.4},
+  url = {https://CRAN.R-project.org/package=sbrl}
+}
+@inproceedings{yang2017scalable,
+  title={Scalable {Bayesian} rule lists},
+  author={Yang, Hongyu and Rudin, Cynthia and Seltzer, Margo},
+  booktitle={International Conference on Machine Learning},
+  pages={3921--3930},
+  year={2017},
+  organization={PMLR}
+}
+@inproceedings{azmi2020rcar,
+  title={{RCAR} framework: building a regularized class association rules model in a categorical data space},
+  author={Azmi, Mohamed and Berrado, Abdelaziz},
+  booktitle={Proceedings of the 13th International Conference on Intelligent Systems: Theories and Applications},
+  pages={1--6},
+  year={2020}
+}
+@article{hahsler2019associative,
+  title={Associative Classification in {R}: arc, arulesCBA, and rCBA},
+  author={Hahsler, Michael and Johnson, Ian and Kliegr, Tom{\'a}{\v{s}} and Kucha{\v{r}}, Jaroslav},
+  journal={The {R} Journal},
+  volume={11},
+  number={2},
+  year={2019}
+}
+@article{kliegr2023qcba,
+  title={{QCBA}: improving rule classifiers learned from quantitative data by recovering information lost by discretisation},
+  author={Kliegr, Tom{\'a}{\v{s}} and Izquierdo, Ebroul},
+  journal={Applied Intelligence},
+  volume={53},
+  number={18},
+  pages={20797--20827},
+  year={2023},
+  publisher={Springer}
+}
+@article{thabtah2006improving,
+  title={Improving rule sorting, predictive accuracy and training time in associative classification},
+  author={Thabtah, Fadi and Cowling, Peter and Hammoud, Suhel},
+  journal={Expert Systems with Applications},
+  volume={31},
+  number={2},
+  pages={414--426},
+  year={2006},
+  publisher={Elsevier}
+}
+@Manual{qcbaPackage,
+  title = {{qCBA}: Quantitative Classification by Association Rules},
+  author = {Tom{\'a}{\v{s}} Kliegr},
+  year = {2024},
+  note = {{R} package version 1.0},
+  url = {https://CRAN.R-project.org/package=qCBA}
+}
+@article{guazzelli2009pmml,
+
title={{PMML}: An open standard for sharing models},
+  author={Guazzelli, Alex and Zeller, Michael and Lin, Wen-Ching and Williams, Graham and others},
+  journal={The {R} Journal},
+  volume={1},
+  number={1},
+  pages={60--65},
+  year={2009}
+}
+@article{arxiv:qcba,
+  title={Quantitative {CBA}: Small and Comprehensible Association Rule Classification Models},
+  author={Tom{\'a}{\v{s}} Kliegr},
+  journal={arXiv preprint arXiv:1711.10166},
+  year={2017}
+}
+@book{quinlan2014c4,
+  title={{C4.5}: programs for machine learning},
+  author={Quinlan, J Ross},
+  year={1993},
+  publisher={Morgan Kaufmann}
+}
+@Manual{arcPackage,
+  title = {arc: Association Rule Classification},
+  author = {Tom{\'{a}}{\v{s}} Kliegr},
+  year = {2016},
+  note = {{R} package version 1.1.3},
+  url = {https://CRAN.R-project.org/package=arc}
+}
+@Manual{arulesCBA,
+  title = {arulesCBA: Classification Based on Association Rules},
+  author = {Michael Hahsler and Ian Johnson and Tyler Giallanza},
+  year = {2024},
+  note = {{R} package version 1.2.7},
+  url = {https://CRAN.R-project.org/package=arulesCBA}
+}
+@Manual{rcba,
+  title = {rCBA: {CBA} Classifier for {R}},
+  author = {Jaroslav Kucha{\v{r}} and Tom{\'a}{\v{s}} Kliegr},
+  year = {2019},
+  note = {{R} package version 0.4.3},
+  url = {https://CRAN.R-project.org/package=rCBA}
+}
+@Inbook{Toivonen2010,
+  author="Toivonen, Hannu",
+  editor="Sammut, Claude and Webb, Geoffrey I.",
+  title="Frequent Itemset",
+  booktitle="Encyclopedia of Machine Learning",
+  year="2010",
+  publisher="Springer US",
+  address="Boston, MA",
+  pages="418--418",
+  isbn="978-0-387-30164-8",
+  doi="10.1007/978-0-387-30164-8_317",
+  url="https://doi.org/10.1007/978-0-387-30164-8_317"
+}
+
+@article{hertwig1997reiteration,
+  title={The reiteration effect in hindsight bias},
+  author={Hertwig, Ralph and Gigerenzer, Gerd and Hoffrage, Ulrich},
+  journal={Psychological Review},
+  volume={104},
+  number={1},
+  pages={194},
+  year={1997},
+  publisher={American Psychological Association}
+}
+
+@article{fisher1936use, + title={The use of multiple measurements in taxonomic problems}, + author={Fisher, Ronald A}, + journal={Annals of human genetics}, + volume={7}, + number={2}, + pages={179--188}, + year={1936}, + publisher={Wiley Online Library} +} + +@article{ghaderi2017linear, + title={A linear programming approach for learning non-monotonic additive value functions in multiple criteria decision aiding}, + author={Ghaderi, Mohammad and Ruiz, Francisco and Agell, N{\'u}ria}, + journal={European Journal of Operational Research}, + volume={259}, + number={3}, + pages={1073--1084}, + year={2017}, + publisher={Elsevier} +} + +@inproceedings{giraud2005toward, + title={Toward a justification of meta-learning: Is the no free lunch theorem a show-stopper}, + author={Giraud-Carrier, Christophe and Provost, Foster}, + booktitle={Proceedings of the ICML-2005 Workshop on Meta-learning}, + pages={12--19}, + year={2005} +} + +@inproceedings{schaffer1994conservation, + title={A conservation law for generalization performance}, + author={Schaffer, Cullen}, + booktitle={Proceedings of the 11th international conference on machine learning}, + pages={259--265}, + year={1994} +} + +@article{mcnicholas2008standardising, + title={Standardising the lift of an association rule}, + author={McNicholas, Paul David and Murphy, Thomas Brendan and O’Regan, M}, + journal={Computational Statistics \& Data Analysis}, + volume={52}, + number={10}, + pages={4712--4721}, + year={2008}, + publisher={Elsevier} +} + +@inproceedings{brzezinski2016bayesian, + title={Bayesian Confirmation Measures in Rule-based Classification}, + author={Brzezinski, Dariusz and Grudzi{\'n}ski, Zbigniew and Szcz{\k{e}}ch, Izabela}, + booktitle={International Workshop on New Frontiers in Mining Complex Patterns}, + pages={39--53}, + year={2016}, + organization={Springer} +} +@article{crump2013evaluating, + title={Evaluating {Amazon's Mechanical Turk} as a tool for experimental behavioral research}, + 
author={Crump, Matthew JC and McDonnell, John V and Gureckis, Todd M}, + journal={PloS one}, + volume={8}, + number={3}, + pages={e57410}, + year={2013}, + publisher={Public Library of Science} +} + +@article{paolacci2010running, + title={Running experiments on {Amazon Mechanical Turk}}, + author={Paolacci, Gabriele and Chandler, Jesse and Ipeirotis, Panagiotis G}, + year={2010}, + journal={Judgment and Decision Making}, + volume=5, + number=5 +} + + +@misc{crowdflower, + title={Dropping Mechanical Turk Helps Our Customers Get the Best Results}, + author={Ashleigh Harris}, + year={2014}, + publisher={CrowdFlower}, + url={https://www.crowdflower.com/crowdflower-drops-mechanical-turk-to-ensure-the-best-results-for-its-customers/} + +} + + +@article{paolacci2014inside, + title={Inside the {Turk}: Understanding {Mechanical Turk} as a participant pool}, + author={Paolacci, Gabriele and Chandler, Jesse}, + journal={Current Directions in Psychological Science}, + volume={23}, + number={3}, + pages={184--188}, + year={2014}, + publisher={Sage Publications Sage CA: Los Angeles, CA} +} + +@ARTICLE{gigerenzer1996narrow, + author = {Gerd Gigerenzer}, + title = {On narrow norms and vague heuristics: A reply to {Kahneman and Tversky}}, + journal = {Psychological Review}, + year = {1996}, + pages = {592--596} +} + +@article{brown2014crowdsourcing, + title={Crowdsourcing for cognitive science--the utility of smartphones}, + author={Brown, Harriet R and Zeidman, Peter and Smittenaar, Peter and Adams, Rick A and McNab, Fiona and Rutledge, Robb B and Dolan, Raymond J}, + journal={PloS one}, + volume={9}, + number={7}, + pages={e100662}, + year={2014}, + publisher={Public Library of Science} +} + +@incollection{serfas2011cognitive, + title={Cognitive Biases in the Capital Investment Context}, + author={Serfas, Sebastian}, + booktitle={Cognitive Biases in the Capital Investment Context}, + pages={95--189}, + year={2011}, + publisher={Springer} +} + + 
+@inproceedings{DBLP:conf/dis/StecherJF16, + author = {Julius Stecher and + Frederik Janssen and + Johannes F{\"{u}}rnkranz}, + title = {Shorter Rules Are Better, Aren't They?}, + booktitle = {Discovery Science - 19th International Conference, {DS} 2016, Bari, + Italy, October 19-21, 2016, Proceedings}, + pages = {279--294}, + year = {2016}, + crossref = {DBLP:conf/dis/2016}, + url = {https://doi.org/10.1007/978-3-319-46307-0_18}, + doi = {10.1007/978-3-319-46307-0_18}, + timestamp = {Tue, 23 May 2017 01:11:57 +0200}, + biburl = {http://dblp.uni-trier.de/rec/bib/conf/dis/StecherJF16}, + bibsource = {dblp computer science bibliography, http://dblp.org} +} + +@proceedings{DBLP:conf/dis/2016, + editor = {Toon Calders and + Michelangelo Ceci and + Donato Malerba}, + title = {Discovery Science - 19th International Conference, {DS} 2016, Bari, + Italy, October 19-21, 2016, Proceedings}, + series = {Lecture Notes in Computer Science}, + volume = {9956}, + year = {2016}, + url = {https://doi.org/10.1007/978-3-319-46307-0}, + doi = {10.1007/978-3-319-46307-0}, + isbn = {978-3-319-46306-3}, + timestamp = {Tue, 23 May 2017 01:11:57 +0200}, + biburl = {http://dblp.uni-trier.de/rec/bib/conf/dis/2016}, + bibsource = {dblp computer science bibliography, http://dblp.org} +} + +@book{han2012data, + title={Data mining: concepts and techniques}, + edition={3rd}, + author={Han, Jiawei and Pei, Jian and Kamber, Micheline}, + year={2012}, + publisher={Elsevier} +} + +@inproceedings{goethals2003fimi, + title={{FIMI}’03: Workshop on frequent itemset mining implementations}, + author={Goethals, Bart and Zaki, Mohammed J}, + booktitle={3rd IEEE ICDM Workshop on Frequent Itemset Mining Implementations}, + pages={1--13}, + year={2003} +} +@article{mansoori2008sgerd, + title={SGERD: A steady-state genetic algorithm for extracting fuzzy classification rules from data}, + author={Mansoori, Eghbal G and Zolghadri, Mansoor J and Katebi, Seraj D}, + journal={IEEE Transactions on Fuzzy Systems}, + 
volume={16},
+  number={4},
+  pages={1061--1071},
+  year={2008},
+  publisher={IEEE}
+}
+@article{chen2008building,
+  title={Building an associative classifier based on fuzzy association rules},
+  author={Chen, Zuoliang and Chen, Guoqing},
+  journal={International Journal of Computational Intelligence Systems},
+  volume={1},
+  number={3},
+  pages={262--273},
+  year={2008},
+  publisher={Taylor \& Francis}
+}
+@article{ishibuchi2005hybridization,
+  title={Hybridization of fuzzy {GBML} approaches for pattern classification problems},
+  author={Ishibuchi, Hisao and Yamamoto, Takashi and Nakashima, Tomoharu},
+  journal={IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics)},
+  volume={35},
+  number={2},
+  pages={359--365},
+  year={2005},
+  publisher={IEEE}
+}
+@article{hu2003finding,
+  title={Finding fuzzy classification rules using data mining techniques},
+  author={Hu, Yi-Chung and Chen, Ruey-Shun and Tzeng, Gwo-Hshiung},
+  journal={Pattern Recognition Letters},
+  volume={24},
+  number={1},
+  pages={509--519},
+  year={2003},
+  publisher={Elsevier}
+}
+@inproceedings{Yin03cpar:classification,
+  author = {Xiaoxin Yin and Jiawei Han},
+  title = {{CPAR}: Classification based on Predictive Association Rules},
+  year = {2003},
+  booktitle={Proceedings of the SIAM International Conference on Data Mining},
+  pages={369--376},
+  publisher={SIAM Press},
+  address={San Francisco}
+}
+
+
+
+@article{gonzalez2001selection,
+  title={Selection of relevant features in a fuzzy genetic learning algorithm},
+  author={Gonz{\'a}lez, Antonio and P{\'e}rez, Ra{\'u}l},
+  journal={IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics)},
+  volume={31},
+  number={3},
+  pages={417--425},
+  year={2001},
+  publisher={IEEE}
+}
+
+@article{liu2001classification,
+  title={Classification using association rules: weaknesses and enhancements},
+  author={Liu, Bing and Ma, Yiming and Wong, Ching-Kian},
+  journal={Data mining for scientific applications},
+  volume={591},
+
year={2001}, + publisher={Citeseer} +} + +@inproceedings{yin2003cpar, + title={{CPAR}: Classification based on predictive association rules}, + author={Yin, Xiaoxin and Han, Jiawei}, + booktitle={Proceedings of the 2003 SIAM International Conference on Data Mining}, + pages={331--335}, + year={2003}, + organization={SIAM} +} + +@inproceedings{Li:2001:CAE:645496.657866, + author = {Li, Wenmin and Han, Jiawei and Pei, Jian}, + title = {{CMAR}: Accurate and Efficient Classification Based on Multiple Class-Association Rules}, + booktitle = {ICDM '01 Proceedings}, + year = {2001}, + isbn = {0-7695-1119-8}, + pages = {369--376}, + numpages = {8}, + acmid = {657866}, + publisher = {IEEE}, + address = {Washington, DC, USA}, +} +@article{nguyen2015novel, + title={A novel method for constrained class association rule mining}, + author={Nguyen, Dang and Nguyen, Loan TT and Vo, Bay and Hong, Tzung-Pei}, + journal={Information Sciences}, + volume={320}, + pages={107--125}, + year={2015}, + publisher={Elsevier} +} +@ARTICLE{farchd2, +author={M. Elkano and M. Galar and J. A. Sanz and A. Fernández and E. Barrenechea and F. Herrera and H. 
Bustince}, +journal={IEEE Transactions on Fuzzy Systems}, +title={Enhancing Multiclass Classification in FARC-HD Fuzzy Classifier: On the Synergy Between $n$-Dimensional Overlap Functions and Decomposition Strategies}, +year={2015}, +volume={23}, +number={5}, +pages={1562-1580}, +keywords={fuzzy reasoning;fuzzy set theory;pattern classification;statistical analysis;FARC-HD fuzzy classifier;OVA;OVO;association degrees;binary counterparts;bioinformatics;computer vision;decomposition strategies;fuzzy association rule-based classification model for high-dimensional problems fuzzy classifier;fuzzy reasoning method;medicine;multiclass classification;multiclass classification problems;n-dimensional overlap functions;one-versus-all strategies;one-versus-one strategies;product t-norm;real-world classification problems;state-of-the-art fuzzy classifiers;statistical analysis;weighted voting;Computational modeling;Electronic mail;Fuzzy reasoning;Pragmatics;Proposals;Training;Vectors;Aggregations;Multi-classification;aggregations;fuzzy rule-based classification systems;multiclassification;one-versus-one;one-vs-one;overlaps}, +doi={10.1109/TFUZZ.2014.2370677}, +ISSN={1063-6706}, +month={Oct},} + +@article{oleson2011programmatic, + title={Programmatic Gold: Targeted and Scalable Quality Assurance in Crowdsourcing}, + author={Oleson, David and Sorokin, Alexander and Laughlin, Greg P and Hester, Vaughn and Le, John and Biewald, Lukas}, + journal={Human computation}, + volume={11}, + number={11}, + year={2011} +} + +@article{slowinski2006application, + + title={Application of Bayesian confirmation measures for mining rules from support-confidence Pareto-optimal set}, + + author={Slowinski, Roman and Brzezinska, Izabela and Greco, Salvatore}, + + journal={Artificial Intelligence and Soft Computing--ICAISC 2006}, + + pages={1018--1026}, + + year={2006}, + + publisher={Springer} + +} +@book{kunda1999social, + title={Social cognition: Making sense of people}, + author={Kunda, Ziva}, + 
year={1999},
+  publisher={MIT Press}
+}
+
+@Inbook{Ristoski2016,
+author="Ristoski, Petar
+and de Vries, Gerben Klaas Dirk
+and Paulheim, Heiko",
+editor="Groth, Paul
+and Simperl, Elena
+and Gray, Alasdair
+and Sabou, Marta
+and Kr{\"o}tzsch, Markus
+and Lecue, Freddy
+and Fl{\"o}ck, Fabian
+and Gil, Yolanda",
+title="A Collection of Benchmark Datasets for Systematic Evaluations of Machine Learning on the Semantic Web",
+bookTitle="The Semantic Web -- ISWC 2016: 15th International Semantic Web Conference, Kobe, Japan, October 17--21, 2016, Proceedings, Part II",
+year="2016",
+publisher="Springer International Publishing",
+address="Cham",
+pages="186--194",
+abstract="In the recent years, several approaches for machine learning on the Semantic Web have been proposed. However, no extensive comparisons between those approaches have been undertaken, in particular due to a lack of publicly available, acknowledged benchmark datasets. In this paper, we present a collection of 22 benchmark datasets of different sizes.
Such a collection of datasets can be used to conduct quantitative performance testing and systematic comparisons of approaches.",
+isbn="978-3-319-46547-0",
+doi="10.1007/978-3-319-46547-0_20",
+url="https://doi.org/10.1007/978-3-319-46547-0_20"
+}
+
+@article{furnkranz2005roc,
+  title={{ROC} 'n' rule learning---towards a better understanding of covering algorithms},
+  author={F{\"u}rnkranz, Johannes and Flach, Peter A},
+  journal={Machine Learning},
+  volume={58},
+  number={1},
+  pages={39--77},
+  year={2005},
+  publisher={Springer}
+}
+
+@inproceedings{jalali2010study,
+  title={A study on interestingness measures for associative classifiers},
+  author={Jalali-Heravi, Mojdeh and Za{\"\i}ane, Osmar R},
+  booktitle={Proceedings of the 2010 ACM Symposium on Applied Computing},
+  pages={1039--1046},
+  year={2010},
+  organization={ACM}
+}
+
+@article{clinton2004proxy,
+  title={Proxy variable},
+  author={Clinton, Joshua D},
+  journal={The SAGE Encyclopedia of Social Science Research Methods. PLACE: Sage},
+  year={2004}
+}
+
+@article{pratto2005automatic,
+  title={Automatic Vigilance: The Attention-Grabbing Power of Negative Social Information},
+  author={Pratto, Felicia and John, Oliver P},
+  journal={Social cognition: key readings},
+  volume={250},
+  year={2005}
+}
+
+@article{rozin2001negativity,
+  title={Negativity bias, negativity dominance, and contagion},
+  author={Rozin, Paul and Royzman, Edward B},
+  journal={Personality and Social Psychology Review},
+  volume={5},
+  number={4},
+  pages={296--320},
+  year={2001},
+  publisher={Sage Publications Sage CA: Los Angeles, CA}
+}
+
+@article{curley1984investigation,
+  title={An investigation of patient's reactions to therapeutic uncertainty},
+  author={Curley, Shawn P and Eraker, Stephen A and Yates, J Frank},
+  journal={Medical Decision Making},
+  volume={4},
+  number={4},
+  pages={501--511},
+  year={1984},
+  publisher={Sage Publications Sage CA: Thousand Oaks, CA}
+}
+
+
+@inproceedings{beaman2006does,
+  title={When
does ignorance make us smart? Additional factors guiding heuristic inference}, + author={Beaman, C Philip and McCloy, Rachel and Smith, Philip T}, + booktitle={Proceedings of the Cognitive Science Society}, + volume={28}, + number={28}, + year={2006} +} + +@article{bornstein1989exposure, + title={Exposure and affect: overview and meta-analysis of research, 1968--1987}, + journal={Psychological Bulletin}, + pages={265-289}, + number=106, + volume=2, + author={Bornstein, Robert F}, + year={1989}, + publisher={American Psychological Association} +} + +@article{monahan2000subliminal, + title={Subliminal mere exposure: Specific, general, and diffuse effects}, + author={Monahan, Jennifer L and Murphy, Sheila T and Zajonc, Robert B}, + journal={Psychological Science}, + volume={11}, + number={6}, + pages={462--466}, + year={2000}, + publisher={SAGE Publications Sage CA: Los Angeles, CA} +} + +@article{schwarz1991ease, + title={Ease of retrieval as information: Another look at the availability heuristic}, + author={Schwarz, Norbert and Bless, Herbert and Strack, Fritz and Klumpp, Gisela and Rittenauer-Schatka, Helga and Simons, Annette}, + journal={Journal of Personality and Social psychology}, + volume={61}, + number={2}, + pages={195}, + year={1991}, + publisher={American Psychological Association} +} + +@article{mynatt1977confirmation, + title={Confirmation bias in a simulated research environment: An experimental study of scientific inference}, + author={Mynatt, Clifford R and Doherty, Michael E and Tweney, Ryan D}, + journal={The quarterly journal of experimental psychology}, + volume={29}, + number={1}, + pages={85--95}, + year={1977}, + publisher={Taylor \& Francis} +} + +@article{tversky1971belief, + title={Belief in the law of small numbers}, + author={Tversky, Amos and Kahneman, Daniel}, + journal={Psychological bulletin}, + volume={76}, + number={2}, + pages={105}, + year={1971}, + publisher={American Psychological Association} +} + 
+@inproceedings{azevedo2007comparing,
+  title={Comparing rule measures for predictive association rules},
+  author={Azevedo, Paulo J and Jorge, Al{\'\i}pio M{\'a}rio},
+  booktitle={ECML},
+  volume={7},
+  pages={510--517},
+  year={2007},
+  organization={Springer}
+}
+
+@article{stanovich2009distinguishing,
+  title={Distinguishing the reflective, algorithmic, and autonomous minds: Is it time for a tri-process theory},
+  author={Stanovich, Keith E},
+  journal={In two minds: Dual processes and beyond},
+  pages={55--88},
+  year={2009}
+}
+
+@book{evans2007hypothetical,
+  title={Hypothetical thinking: Dual processes in reasoning and judgement},
+  author={Evans, Jonathan St BT and others},
+  volume={3},
+  year={2007},
+  publisher={Psychology Press}
+}
+
+@book{pohl2017cognitive,
+  title={Cognitive illusions: A handbook on fallacies and biases in thinking, judgement and memory},
+  author={Pohl, R{\"u}diger},
+  year={2017},
+  publisher={Psychology Press},
+  note={2nd ed.}
+}
+
+@article{quinlan1996improved,
+  title={Improved use of continuous attributes in {C4.5}},
+  author={Quinlan, J Ross},
+  journal={Journal of Artificial Intelligence Research},
+  volume={4},
+  pages={77--90},
+  year={1996}
+}
+@article{alcala2011fuzzy,
+  title={A fuzzy association rule-based classification model for high-dimensional problems with genetic rule selection and lateral tuning},
+  author={Alcala-Fdez, Jes{\'u}s and Alcala, Rafael and Herrera, Francisco},
+  journal={IEEE Transactions on Fuzzy Systems},
+  volume={19},
+  number={5},
+  pages={857--872},
+  year={2011},
+  publisher={IEEE}
+}
+@article{Kuchar2017,
+  title = {InBeat: JavaScript recommender system supporting sensor input and linked data},
+  journal = {Knowledge-Based Systems},
+  volume = {135},
+  pages = {40--43},
+  year = {2017},
+  issn = {0950-7051},
+  doi = {https://doi.org/10.1016/j.knosys.2017.07.026},
+  url = {http://www.sciencedirect.com/science/article/pii/S0950705117303428},
+  author = {Jaroslav Kucha{\v{r}} and Tom{\'a}{\v{s}} Kliegr},
+  keywords = {Recommender system, Semantic web, Association rules, Sensors}
+}
+
+@inproceedings{zaki2003fast,
+  title={Fast vertical mining using diffsets},
+  author={Zaki, Mohammed J and Gouda, Karam},
+  booktitle={Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining},
+  pages={326--335},
+  year={2003},
+  organization={ACM}
+}
+
+@book{aggarwal2014frequent,
+  title={Frequent pattern mining},
+  author={Aggarwal, Charu C and Han, Jiawei},
+  year={2014},
+  publisher={Springer}
+}
+
+@article{paraminer,
+year={2014},
+issn={1384-5810},
+journal={Data Mining and Knowledge Discovery},
+volume={28},
+number={3},
+title={{ParaMiner}: a generic pattern mining algorithm for multi-core architectures},
+publisher={Springer US},
+keywords={Data mining; Closed pattern mining; Parallel pattern mining; Multi-core architectures},
+author={Negrevergne, Benjamin and Termier, Alexandre and Rousset, Marie-Christine and
Méhaut, Jean-François}, +pages={593-633}, +language={English} +} +@INPROCEEDINGS{PLCM, +author={Negrevergne, Benjamin and Termier, Alexandre and Rousset, Marie-Christine and Méhaut, Jean-François and Uno, Takeaki}, +booktitle={Proc. of the International Conference on High Performance Computing and Simulation}, +series ={HPCS 2010}, +title={Discovering closed frequent itemsets on multicore: Parallelizing computations and optimizing memory accesses}, +year={2010}, +pages={521-528}, +keywords={data mining;parallel algorithms;PLCMQS;Tuple space;closed frequent itemsets;data mining;multicore processors;optimizing memory accesses;parallel algorithms;Data mining;Itemsets;Multicore processing;Program processors;frequent closed itemset;memory accesses;multicore;pattern mining} +} + +@INPROCEEDINGS{mtclosed, +author={Lucchese, C. and Orlando, S. and Perego, R.}, +booktitle={Proc. of the 7th IEEE International Conference on Data Mining }, +series={ICDM 2007}, +title={Parallel Mining of Frequent Closed Patterns: Harnessing Modern Computer Architectures}, +year={2007}, +pages={242-251}, +keywords={computer architecture;data mining;multi-threading;parallel algorithms;scheduling;MT_CLOSED;central processing unit;duplicate checking;frequent closed itemset mining;frequent closed pattern;multicore computer architecture;multithreaded algorithm;parallel algorithm;parallel mining;parallelization;task scheduling;Algorithm design and analysis;Computer architecture;Coprocessors;Data mining;Data structures;Dynamic scheduling;Itemsets;Parallel algorithms;Parallel processing;Yarn}, +ISSN={1550-4786},} + +@INPROCEEDINGS{Lucchese04dciclosed, + author = {Claudio Lucchese}, + title = {{DCI Closed}: A Fast and Memory Efficient Algorithm to Mine Frequent Closed Itemsets}, + booktitle = {Proc. 
of the IEEE ICDM 2004 Workshop on Frequent Itemset Mining Implementations (FIMI'04)},
+  year = {2004}
+}
+
+@INPROCEEDINGS{lcmv2,
+  author = {Takeaki Uno and Masashi Kiyomi and Hiroki Arimura},
+  title = {{LCM} ver. 2: Efficient Mining Algorithms for Frequent/Closed/Maximal Itemsets},
+  booktitle = {Proc. of the IEEE ICDM 2004 Workshop on Frequent Itemset Mining Implementations (FIMI'04)},
+  year = {2004}
+}
+
+@incollection{pasquier,
+year={1999},
+isbn={978-3-540-65452-0},
+booktitle={Proc. of the 7th International Conference on Database Theory},
+volume={1540},
+series={ICDT'99},
+editor={Beeri, Catriel and Buneman, Peter},
+title={Discovering Frequent Closed Itemsets for Association Rules},
+publisher={Springer},
+author={Pasquier, Nicolas and Bastide, Yves and Taouil, Rafik and Lakhal, Lotfi},
+pages={398--416},
+language={English}
+}
+
+@article{BDM:BDM584,
+author = {Newell, Ben R. and Mitchell, Chris J. and Hayes, Brett K.},
+title = {Getting scarred and winning lotteries: effects of exemplar cuing and statistical format on imagining low-probability events},
+journal = {Journal of Behavioral Decision Making},
+volume = {21},
+number = {3},
+publisher = {John Wiley \& Sons, Ltd.},
+issn = {1099-0771},
+url = {http://dx.doi.org/10.1002/bdm.584},
+doi = {10.1002/bdm.584},
+pages = {317--335},
+keywords = {imaginability, exemplar cuing, frequency format, probability judgment},
+year = {2008},
+}
+
+@Inbook{Hullermeier2013,
+author="H{\"u}llermeier, Eyke
+and Fober, Thomas
+and Mernberger, Marco",
+editor="Dubitzky, Werner
+and Wolkenhauer, Olaf
+and Cho, Kwang-Hyun
+and Yokota, Hiroki",
+title="Inductive Bias",
+bookTitle="Encyclopedia of Systems Biology",
+year="2013",
+publisher="Springer New York",
+address="New York, NY",
+pages="1018--1018",
+isbn="978-1-4419-9863-7",
+doi="10.1007/978-1-4419-9863-7_927",
+url="http://dx.doi.org/10.1007/978-1-4419-9863-7_927"
+}
+
+@book{becker2007economic,
+  title={Economic theory},
+  author={Becker, Gary
Stanley}, + year={2007}, + publisher={Transaction Publishers} +} + +@techreport{wolpert1995no, + title={No free lunch theorems for search}, + author={Wolpert, David H and Macready, William G and others}, + year={1995}, + institution={Technical Report SFI-TR-95-02-010, Santa Fe Institute} +} + + + +@article{mata2012cognitive, + title={Cognitive bias}, + author={Mata, R}, + journal={Encyclopedia of human behaviour}, + volume={1}, + pages={531--535}, + year={2012} +} + +@book{utgoff2012machine, + title={Machine learning of inductive bias}, + author={Utgoff, Paul E}, + volume={15}, + year={2012}, + publisher={Springer Science \& Business Media} +} + +@book{mitchell1980need, + title={The need for biases in learning generalizations}, + author={Mitchell, Tom M}, + year={1980}, + publisher={Department of Computer Science, Laboratory for Computer Science Research, Rutgers Univ. New Jersey} +} + +@article{haussler1988quantifying, + title={Quantifying inductive bias: AI learning algorithms and Valiant's learning framework}, + author={Haussler, David}, + journal={Artificial intelligence}, + volume={36}, + number={2}, + pages={177--221}, + year={1988}, + publisher={Elsevier} +} + +@article{ullman2012simple, + title={From simple innate biases to complex visual concepts}, + author={Ullman, Shimon and Harari, Daniel and Dorfman, Nimrod}, + journal={Proceedings of the National Academy of Sciences}, + volume={109}, + number={44}, + pages={18215--18220}, + year={2012}, + publisher={National Acad Sciences} +} + +@article{lake2016building, + title={Building machines that learn and think like people}, + author={Lake, Brenden M and Ullman, Tomer D and Tenenbaum, Joshua B and Gershman, Samuel J}, + journal={arXiv preprint arXiv:1604.00289}, + year={2016} +} + +@article{Demsar:2006:SCC, + author = {Dem\v{s}ar, Janez}, + title = {Statistical Comparisons of Classifiers over Multiple Data Sets}, + journal = {J. Mach. Learn. 
Res.}, + issue_date = {12/1/2006}, + volume = {7}, + month = dec, + year = {2006}, + issn = {1532-4435}, + pages = {1--30}, + numpages = {30}, + url = {http://dl.acm.org/citation.cfm?id=1248547.1248548}, + acmid = {1248548}, + publisher = {JMLR.org} +} + +@inproceedings{furnkranz2015brief, + title={A brief overview of rule learning}, + author={F{\"u}rnkranz, Johannes and Kliegr, Tom{\'a}{\v{s}}}, + booktitle={International Symposium on Rules and Rule Markup Languages for the Semantic Web}, + pages={54--69}, + year={2015}, + organization={Springer} +} + + +@inproceedings{wilson1996value, + title={Value difference metrics for continuously valued attributes}, + author={Wilson, D Randall and Martinez, Tony R}, + booktitle={Proceedings of the International Conference on Artificial Intelligence, Expert Systems and Neural Networks}, + pages={11--14}, + year={1996} +} + + +@Book{ panda, + publisher = "Università Ca' Foscari Venezia", + address = "Venezia", + author = "Cristian De Zotti", + title = "Mining Top-K Classification Rules", + year = "2016", + note = "Master Thesis" +} + @inproceedings{Frank1998, + author = {Eibe Frank and Ian H. Witten}, + booktitle = {Fifteenth International Conference on Machine Learning}, + editor = {J. 
Shavlik}, + pages = {144-151}, + publisher = {Morgan Kaufmann}, + title = {Generating Accurate Rule Sets Without Global Optimization}, + year = {1998} + } + @Manual{matrixPackage, + title = {Matrix: Sparse and Dense Matrix Classes and Methods}, + author = {Douglas Bates and Martin Maechler}, + year = {2017}, + note = {R package version 1.2-8}, + url = {https://CRAN.R-project.org/package=Matrix}, + } + + + +@Manual{discrPackage, + title = {discretization: Data preprocessing, discretization for classification}, + author = {HyunJi Kim}, + year = {2012}, + note = {R package version 1.0-1}, + url = {https://CRAN.R-project.org/package=discretization}, + } + +@inproceedings{furnkranz1994incremental, + title={Incremental reduced error pruning}, + author={F{\"u}rnkranz, Johannes and Widmer, Gerhard}, + booktitle={Proceedings of the 11th International Conference on Machine Learning (ML-94)}, + pages={70--77}, + year={1994} +} + +@inproceedings{cohen1995fast, + title={Fast effective rule induction}, + author={Cohen, William W}, + booktitle = {Twelfth International Conference on Machine Learning}, + publisher = {Morgan Kaufmann}, + pages={115--123}, + year={1995} +} + + + + +@article{huhn2009furia, + title={{FURIA}: an algorithm for unordered fuzzy rule induction}, + author={H{\"u}hn, Jens and H{\"u}llermeier, Eyke}, + journal={Data Mining and Knowledge Discovery}, + volume={19}, + number={3}, + pages={293--319}, + year={2009}, + publisher={Springer} +} + + + +@book{langley1996elements, + title={Elements of machine learning}, + author={Langley, Pat}, + year={1996}, + publisher={Morgan Kaufmann} +} +@inproceedings{feelders2000prior, + title={Prior knowledge in economic applications of data mining}, + author={Feelders, Ad J}, + booktitle={European Conference on Principles of Data Mining and Knowledge Discovery}, + pages={395--400}, + year={2000}, + organization={Springer} +} + +@Inbook{Rauch2009, +author="Rauch, Jan", +title="Considerations on Logical Calculi for Dealing with 
Knowledge in Data Mining", +bookTitle="Advances in Data Management", +year="2009", +publisher="Springer Berlin Heidelberg", +address="Berlin, Heidelberg", +pages="177--199", +isbn="978-3-642-02190-9" +} + +@inproceedings{rauch2011applying, + title={Applying domain knowledge in association rules mining process--first experience}, + author={Rauch, Jan and \v{S}im\r{u}nek, Milan}, + booktitle={International Symposium on Methodologies for Intelligent Systems}, + pages={113--122}, + year={2011}, + organization={Springer} +} + +@inproceedings{gabriel2014learning, + title={Learning Semantically Coherent Rules}, + author={Gabriel, Alexander and Paulheim, Heiko and Janssen, Frederik}, + booktitle={Proceedings of the 1st International Workshop on Interactions between Data Mining and Natural Language Processing co-located with The European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (DMNLP@ PKDD/ECML)}, + publisher={CEUR Workshop Proceedings}, + issn={1613-0073}, + address={Nancy, France}, + pages={49--63}, + year={2014} +} + +@article{CIS-312066, + Author = {Kim, Seongho}, + Title = {ppcor: An {R} Package for a Fast Calculation to Semi-partial Correlation Coefficients}, + Journal = {Communications for Statistical Applications and Methods}, + Volume = {22}, + Number = {6}, + Year = {2015}, + Pages = {665--674}, + Keywords = {part correlation, partial correlation} +} +@article{hertwig2005judgments, + title={Judgments of risk frequencies: tests of possible cognitive mechanisms}, + author={Hertwig, Ralph and Pachur, Thorsten and Kurzenh{\"a}user, Stephanie}, + journal={Journal of Experimental Psychology: Learning, Memory, and Cognition}, + volume={31}, + number={4}, + pages={621}, + year={2005}, + publisher={American Psychological Association} +} +@ARTICLE{850821, +author={M. J. 
Pazzani}, +journal={IEEE Intelligent Systems and their Applications}, +title={Knowledge discovery from data?}, +year={2000}, +volume={15}, +number={2}, +pages={10-12}, +keywords={cognitive systems;data mining;psychology;user modelling;very large databases;KDD goal;US laws;business practice;cognitive factors;commercial packages;data mining;electronic greeting cards;general-purpose KDD tools;knowledge discovery from data;knowledge discovery tools;lawsuit;learned decision criteria;loan application;mail filter;marketing database;massive data sets;medical database;medical journal;spam mail;Artificial intelligence;Business;Data mining;Databases;Financial management;Government;Knowledge management;Packaging;Postal services;Statistics}, +ISSN={1094-7167}, +month={March}} + +@article{quinlan1986induction, + title={Induction of decision trees}, + author={Quinlan, J. Ross}, + journal={Machine learning}, + volume={1}, + number={1}, + pages={81--106}, + year={1986}, + publisher={Springer} +} + +@article{tweney1980strategies, + title={Strategies of rule discovery in an inference task}, + author={Tweney, Ryan D and Doherty, Michael E and Worner, Winifred J and Pliske, Daniel B and Mynatt, Clifford R and Gross, Kimberly A and Arkkelin, Daniel L}, + journal={Quarterly Journal of Experimental Psychology}, + volume={32}, + number={1}, + pages={109--123}, + year={1980}, + publisher={Taylor \& Francis} +} + + +@article{edwards1965optimal, + title={Optimal strategies for seeking information: Models for statistics, choice reaction times, and human information processing}, + author={Edwards, Ward}, + journal={Journal of Mathematical Psychology}, + volume={2}, + number={2}, + pages={312--329}, + year={1965}, + publisher={Elsevier} +} + +@book{rips1994psychology, + title={The psychology of proof}, + author={Rips, Lance J}, + year={1994}, + publisher={MIT Press Cambridge} +} + +@article{wolpert2005coevolutionary, + title={Coevolutionary free lunches}, + author={Wolpert, David H and Macready, 
William G}, + journal={IEEE Transactions on Evolutionary Computation}, + volume={9}, + number={6}, + pages={721--735}, + year={2005}, + publisher={IEEE} +} + +@article{wolpert1996lack, + title={The lack of a priori distinctions between learning algorithms}, + author={Wolpert, David H}, + journal={Neural computation}, + volume={8}, + number={7}, + pages={1341--1390}, + year={1996}, + publisher={MIT Press} +} + +@inbook{gilovich2002like, + title={Like goes with like: The role of representativeness in erroneous and pseudo-scientific beliefs}, + author={Gilovich, Thomas and Savitsky, Kenneth}, + year={2002}, + publisher={Cambridge University Press}, + booktitle={Heuristics and Biases: +The Psychology of Intuitive Judgment} +} + + + +@book{nisbett1993rules, + title={Rules for reasoning}, + author={Nisbett, Richard E}, + year={1993}, + publisher={Psychology Press} +} + +@book{pinker2015words, + title={Words and rules: The ingredients of language}, + author={Pinker, Steven}, + year={2015}, + publisher={Basic Books} +} + +@article{smith1992case, + title={The case for rules in reasoning}, + author={Smith, Edward E and Langston, Christopher and Nisbett, Richard E}, + journal={Cognitive science}, + volume={16}, + number={1}, + pages={1--40}, + year={1992}, + publisher={Wiley Online Library} +} + +@article{gentner1998similarity, + title={Similarity and the development of rules}, + author={Gentner, Dedre and Medina, Jose}, + journal={Cognition}, + volume={65}, + number={2}, + pages={263--297}, + year={1998}, + publisher={Elsevier} +} + +@article{cheng1985pragmatic, + title={Pragmatic reasoning schemas}, + author={Cheng, Patricia W and Holyoak, Keith J}, + journal={Cognitive psychology}, + volume={17}, + number={4}, + pages={391--416}, + year={1985}, + publisher={Elsevier} +} + +@article{sloman1996empirical, + title={The empirical case for two systems of reasoning}, + author={Sloman, Steven A}, + journal={Psychological bulletin}, + volume={119}, + number={1}, + pages={3}, + 
year={1996}, + publisher={American Psychological Association} +} + +@article{stanovich2002individual, + title={Individual differences in reasoning: Implications for the rationality debate?}, + author={Stanovich, Keith E and West, Richard F}, + journal={Behavioral and Brain Sciences}, + volume={23}, + number={5}, + year={2000}, + pages={645--726} +} + +@book{oaksford2007bayesian, + title={Bayesian rationality: The probabilistic approach to human reasoning}, + author={Oaksford, Mike and Chater, Nick}, + year={2007}, + publisher={Oxford University Press} +} + +@article{anderson1991adaptive, + title={The adaptive nature of human categorization}, + author={Anderson, John R}, + journal={Psychological Review}, + volume={98}, + number={3}, + pages={409}, + year={1991}, + publisher={American Psychological Association} +} + +@article{watkins1992q, + title={Q-learning}, + author={Watkins, Christopher JCH and Dayan, Peter}, + journal={Machine learning}, + volume={8}, + number={3-4}, + pages={279--292}, + year={1992}, + publisher={Springer} +} + +@article{thorndike1927influence, + title={The influence of primacy}, + author={Thorndike, Edward L}, + journal={Journal of Experimental Psychology}, + volume={10}, + number={1}, + pages={18}, + year={1927}, + publisher={Psychological Review Company} + } +@article{shteingart2013role, + title={The role of first impression in operant learning}, + author={Shteingart, Hanan and Neiman, Tal and Loewenstein, Yonatan}, + journal={Journal of Experimental Psychology: General}, + volume={142}, + number={2}, + pages={476}, + year={2013}, + publisher={American Psychological Association} +} + +@article{kaheman2003perspective, + title={A perspective on judgment and choice: Mapping bounded rationality}, + author={Kahneman, Daniel}, + journal={American Psychologist}, + volume={58}, + number={9}, + pages={697--720}, + year={2003} +} + +@article{grether1992testing, + title={Testing {Bayes} rule and the representativeness heuristic: Some experimental evidence}, + author={Grether, David M}, + journal={Journal of Economic Behavior \&
Organization}, + volume={17}, + number={1}, + pages={31--57}, + year={1992}, + publisher={Elsevier} +} + +@article{bar1991commentary, + title={Commentary on {Wolford, Taylor, and B}eck: The Conjunction Fallacy?}, + author={Bar-Hillel, Maya}, + journal={Memory \& cognition}, + volume={19}, + number={4}, + pages={412--414}, + year={1991}, + publisher={Springer} +} + +@article{hilton1995social, + title={The social context of reasoning: Conversational inference and rational judgment}, + author={Hilton, Denis J}, + journal={Psychological Bulletin}, + volume={118}, + number={2}, + pages={248}, + year={1995}, + publisher={American Psychological Association} +} + +@article{sides2002reality, + title={On the reality of the conjunction fallacy}, + author={Sides, Ashley and Osherson, Daniel and Bonini, Nicolao and Viale, Riccardo}, + journal={Memory \& Cognition}, + volume={30}, + number={2}, + pages={191--198}, + year={2002}, + publisher={Springer} +} + +@article{rossi2001hypothesis, + title={Hypothesis testing in a rule discovery problem: When a focused procedure is effective}, + author={Rossi, Sandrine and Caverni, Jean Paul and Girotto, Vittorio}, + journal={The Quarterly Journal of Experimental Psychology: Section A}, + volume={54}, + number={1}, + pages={263--267}, + year={2001}, + publisher={Taylor \& Francis} +} + +@inproceedings{vallee2008goal, + title={Goal-driven hypothesis testing in a rule discovery task}, + author={Vall{\'e}e-Tourangeau, Fr{\'e}d{\'e}ric and Payton, Teresa}, + booktitle={Proceedings of the 30th Annual Conference of the Cognitive Science Society}, + pages={2122--2127}, + year={2008}, + organization={Cognitive Science Society Austin, TX} +} + +@article{gopnik2007bayesian, + title={Bayesian networks, {Bayesian} learning and cognitive development}, + author={Gopnik, Alison and Tenenbaum, Joshua B}, + journal={Developmental science}, + volume={10}, + number={3}, + pages={281--287}, + year={2007}, + publisher={Wiley Online Library} +} + 
+@article{kemeny1953use, + title={The use of simplicity in induction}, + author={Kemeny, John G}, + journal={The Philosophical Review}, + volume={62}, + number={3}, + pages={391--408}, + year={1953}, + publisher={JSTOR} +} + +@book{popper1935logik, + title={Logik der Forschung: Zur Erkenntnistheorie der modernen Naturwissenschaft}, + author={Popper, Karl Raimund}, + year={1935}, + publisher={Verlag von Julius Springer} +} + + +@book{popper2005logic, + title={The logic of scientific discovery}, + author={Popper, Karl}, + year={2005}, + publisher={Routledge} +} + + +@article{post1960simplicity, + title={Simplicity in scientific theories}, + author={Post, HR}, + journal={The British Journal for the Philosophy of Science}, + volume={11}, + number={41}, + pages={32--41}, + year={1960}, + publisher={JSTOR} +} +@book{hintzman1978psychology, + title={The psychology of learning and memory}, + author={Hintzman, Douglas L}, + year={1978}, + publisher={Freeman} +} + +@article{furnkranz1999separate, + title={Separate-and-conquer rule learning}, + author={F{\"u}rnkranz, Johannes}, + journal={Artificial Intelligence Review}, + volume={13}, + number={1}, + pages={3--54}, + year={1999}, + publisher={Springer} +} + +@inproceedings{michalski1969quasi, + title={On the quasi-minimal solution of the general covering problem}, + author={Michalski, Ryszard S}, + booktitle={Proceedings of the V International Symposium on Information Processing (FCIP 69) (Switching Circuits)}, + address={Bled, Yugoslavia}, + pages={125--128}, + year={1969} +} + +@article{yu2016learning, + title={Learning Interpretable Musical Compositional Rules and Traces}, + author={Yu, Haizi and Varshney, Lav R and Garnett, Guy E and Kumar, Ranjitha}, + journal={arXiv preprint arXiv:1606.05572}, + note = {Presented at 2016 ICML Workshop on Human Interpretability in Machine Learning (WHI 2016), New York, NY}, + year={2016} +} + +@article{gallego2016interpreting, + title={Interpreting extracted rules from ensemble of trees:
Application to computer-aided diagnosis of breast {MRI}}, + author={Gallego-Ortiz, Cristina and Martel, Anne L}, + journal={arXiv preprint arXiv:1606.08288}, + note = {Presented at 2016 ICML Workshop on Human Interpretability in Machine Learning (WHI 2016), New York, NY }, + year={2016} +} + +@article{su2015interpretable, + title={Interpretable Two-level Boolean Rule Learning for Classification}, + author={Su, Guolong and Wei, Dennis and Varshney, Kush R and Malioutov, Dmitry M}, + journal={arXiv preprint arXiv:1511.07361}, + note = {Presented at 2016 ICML Workshop on Human Interpretability in Machine Learning (WHI 2016), New York, NY}, + year={2016} +} + +@article{lipton2016mythos, + title={The Mythos of Model Interpretability}, + author={Lipton, Zachary}, + journal={arXiv preprint arXiv:1606.03490}, + note = {Presented at 2016 ICML Workshop on Human Interpretability in Machine Learning (WHI 2016), New York, NY}, + year={2016} +} + +@article{goodman2016european, + title={European Union regulations on algorithmic decision-making and a" right to explanation"}, + author={Goodman, Bryce and Flaxman, Seth}, + journal={arXiv preprint arXiv:1606.08813}, + year={2016} +} + +@article{wason1960failure, + title={On the failure to eliminate hypotheses in a conceptual task}, + author={Wason, Peter C}, + journal={Quarterly journal of experimental psychology}, + volume={12}, + number={3}, + pages={129--140}, + year={1960}, + publisher={Taylor \& Francis} +} + +@article{bond2007information, + title={Information distortion in the evaluation of a single option}, + author={Bond, Samuel D and Carlson, Kurt A and Meloy, Margaret G and Russo, J Edward and Tanner, Robin J}, + journal={Organizational Behavior and Human Decision Processes}, + volume={102}, + number={2}, + pages={240--254}, + year={2007}, + publisher={Elsevier} +} + +@article{trope1997wishful, + title={Wishful thinking from a pragmatic hypothesis-testing perspective}, + author={Trope, Yaacov and Gervey, Benjamin and 
Liberman, Nira}, + journal={The mythomanias: The nature of deception and self-deception}, + pages={105--31}, + year={1997}, + publisher={Lawrence Erlbaum Mahway, NJ} +} + + + +@inbook{trope1996social, +title = "Social hypothesis-testing: Cognitive and motivational mechanisms", +author = "Yaacov Trope and A. Liberman", +year = "1996", +booktitle = "Social psychology", +publisher = "Guilford Press", + +} + + +@book{pohl2004cognitive, + title={Cognitive illusions: A handbook on fallacies and biases in thinking, judgement and memory}, + author={Pohl, R{\"u}diger}, + year={2004}, + publisher={Psychology Press} +} + +@article{westen2006neural, + title={Neural bases of motivated reasoning: An {fMRI} study of emotional constraints on partisan political judgment in the 2004 {US} presidential election}, + author={Westen, Drew and Blagov, Pavel S and Harenski, Keith and Kilts, Clint and Hamann, Stephan}, + journal={Journal of cognitive neuroscience}, + volume={18}, + number={11}, + pages={1947--1958}, + year={2006}, + publisher={MIT Press} +} + +@article{klayman1987confirmation, + title={Confirmation, disconfirmation, and information in hypothesis testing}, + author={Klayman, Joshua and Ha, Young-Won}, + journal={Psychological review}, + volume={94}, + number={2}, + pages={211}, + year={1987}, + publisher={American Psychological Association} +} + +@article{wolfe2008locus, + title={The locus of the myside bias in written argumentation}, + author={Wolfe, Christopher R and Britt, M Anne}, + journal={Thinking \& Reasoning}, + volume={14}, + number={1}, + pages={1--27}, + year={2008}, + publisher={Taylor \& Francis} +} + +@article{stanovich2013myside, + title={Myside bias, rational thinking, and intelligence}, + author={Stanovich, Keith E and West, Richard F and Toplak, Maggie E}, + journal={Current Directions in Psychological Science}, + volume={22}, + number={4}, + pages={259--264}, + year={2013}, + publisher={Sage Publications} +} + +@article{albarracin2004role, + title={The 
role of defensive confidence in preference for proattitudinal information: How believing that one is strong can sometimes be a defensive weakness}, + author={Albarrac{\'\i}n, Dolores and Mitchell, Amy L}, + journal={Personality and Social Psychology Bulletin}, + volume={30}, + number={12}, + pages={1565--1584}, + year={2004}, + publisher={Sage Publications} +} + + + +@article{nickerson1998confirmation, + title={Confirmation bias: A ubiquitous phenomenon in many guises}, + author={Nickerson, Raymond S}, + journal={Review of general psychology}, + volume={2}, + number={2}, + pages={175}, + year={1998}, + publisher={Educational Publishing Foundation} +} + +@book{evans1989bias, + title={Bias in human reasoning: Causes and consequences}, + author={Evans, Jonathan St BT}, + year={1989}, + publisher={Lawrence Erlbaum Associates, Inc} +} + +@article{kahneman1979prospect, + title={Prospect theory: An analysis of decision under risk}, + author={Kahneman, Daniel and Tversky, Amos}, + journal={Econometrica: Journal of the econometric society}, + pages={263--291}, + year={1979}, + publisher={JSTOR} +} + +@article{rozin2001negativity, + title={Negativity bias, negativity dominance, and contagion}, + author={Rozin, Paul and Royzman, Edward B}, + journal={Personality and social psychology review}, + volume={5}, + number={4}, + pages={296--320}, + year={2001}, + publisher={Sage Publications} +} + +@article{robinson1996role, + title={The role of conscious recollection in recognition of affective material: Evidence for positive-negative asymmetry}, + author={Robinson-Riegler, Gregory L and Winton, Ward M}, + journal={The Journal of General Psychology}, + volume={123}, + number={2}, + pages={93--104}, + year={1996}, + publisher={Taylor \& Francis} +} + +@article{ohira1998effects, + title={Effects of stimulus valence on recognition memory and endogenous eyeblinks: Further evidence for positive-negative asymmetry}, + author={Ohira, Hideki and Winton, Ward M and Oyama, Makiko}, + 
journal={Personality and Social Psychology Bulletin}, + volume={24}, + number={9}, + pages={986--993}, + year={1998}, + publisher={Sage Publications} +} + +@article{fiske1980attention, + title={Attention and weight in person perception: The impact of negative and extreme behavior}, + author={Fiske, Susan T}, + journal={Journal of Personality and Social Psychology}, + volume={38}, + number={6}, + pages={889}, + year={1980}, + publisher={American Psychological Association} +} + +@article{wasserstein2016asa, + title={The {ASA}'s statement on p-values: context, process, and purpose}, + author={Wasserstein, Ronald L and Lazar, Nicole A}, + journal={The American Statistician}, + volume={70}, + number={2}, + pages={129--133}, + year={2016}, + publisher={Taylor \& Francis} +} + +@article{griffiths2010probabilistic, + title={Probabilistic models of cognition: Exploring representations and inductive biases}, + author={Griffiths, Thomas L and Chater, Nick and Kemp, Charles and Perfors, Amy and Tenenbaum, Joshua B}, + journal={Trends in cognitive sciences}, + volume={14}, + number={8}, + pages={357--364}, + year={2010}, + publisher={Elsevier} +} + +@article{hoffrage2015natural, + title={Natural frequencies improve {Bayesian} reasoning in simple and complex inference tasks}, + author={Hoffrage, Ulrich and Krauss, Stefan and Martignon, Laura and Gigerenzer, Gerd}, + journal={Frontiers in psychology}, + volume={6}, + year={2015}, + publisher={Frontiers Media SA} +} + +@article{ellsberg1961risk, + ISSN = {00335533, 15314650}, + URL = {http://www.jstor.org/stable/1884324}, + abstract = {I. Are there uncertainties that are not risks? 643.--II. Uncertainties that are not risks, 647.--III.
Why are some uncertainties not risks?--656.}, + author = {Daniel Ellsberg}, + journal = {The Quarterly Journal of Economics}, + number = {4}, + pages = {643--669}, + publisher = {Oxford University Press}, + title = {Risk, Ambiguity, and the {S}avage Axioms}, + volume = {75}, + year = {1961} +} + + +@inproceedings{martens2006ant, + title={Ant-based approach to the knowledge fusion problem}, + author={Martens, David and De Backer, Manu and Haesen, Raf and Baesens, Bart and Mues, Christophe and Vanthienen, Jan}, + booktitle={International Workshop on Ant Colony Optimization and Swarm Intelligence}, + pages={84--95}, + year={2006}, + organization={Springer} +} + +@article{martens2011performance, + title={Performance of classification models from a user perspective}, + author={Martens, David and Vanthienen, Jan and Verbeke, Wouter and Baesens, Bart}, + journal={Decision Support Systems}, + volume={51}, + number={4}, + pages={782--793}, + year={2011}, + publisher={Elsevier} +} + +@article{miller1956magical, + title={The magical number seven, plus or minus two: Some limits on our capacity for processing information}, + author={Miller, George A}, + journal={Psychological review}, + volume={63}, + number={2}, + pages={81}, + year={1956}, + publisher={American Psychological Association} +} + +@inproceedings{bibal2016interpretability, + title={Interpretability of Machine Learning Models and Representations: an Introduction}, + author={Bibal, Adrien and Fr{\'e}nay, Beno{\^\i}t}, + booktitle={Proceedings of the 24th European Symposium on Artificial Neural Networks (ESANN)}, + address={Bruges, Belgium}, + pages={77--82}, + year={2016} +} + +@inproceedings{elomaa2014defense, + title={In defense of {C4.5}: Notes on learning one-level decision trees}, + author={Elomaa, Tapio}, + booktitle={Proceedings of the 11th International Conference on Machine Learning}, + volume={254}, + pages={62--69}, + year={1994}, + publisher={Morgan Kaufmann} +} + +@article{garcia2009study, + title={A
study of statistical techniques and performance measures for genetics-based machine learning: accuracy and interpretability}, + author={Garc{\'\i}a, Salvador and Fern{\'a}ndez, Alberto and Luengo, Juli{\'a}n and Herrera, Francisco}, + journal={Soft Computing}, + volume={13}, + number={10}, + pages={959--977}, + year={2009}, + publisher={Springer} +} + +@article{Piltaver2016333, +title = "What makes classification trees comprehensible?", +journal = "Expert Systems with Applications", +volume = "62", +pages = "333--346", +year = "2016", +issn = "0957-4174", +author = "Rok Piltaver and Mitja Lustrek and Matjaz Gams and Sanda Martincic-Ipsic" +} + + +@inproceedings{pazzani1997comprehensible, + title={Comprehensible knowledge discovery: gaining insight from data}, + author={Pazzani, M}, + booktitle={First Federal Data Mining Conference and Exposition}, + pages={73--82}, + year={1997}, + address={Washington, DC} +} + +@article{freitas2014comprehensible, + title={Comprehensible classification models: a position paper}, + author={Freitas, Alex A}, + journal={ACM SIGKDD explorations newsletter}, + volume={15}, + number={1}, + pages={1--10}, + year={2014}, + publisher={ACM} +} + +@article{lavravc1998data, + title={Data mining in medicine: Selected techniques and applications}, + author={Lavra{\v{c}}, Nada}, + year={1998}, + journal={Artificial Intelligence in Medicine}, + volume={16}, + number={1}, + pages={3--23} +} + +@article{huysmans2011empirical, + title={An empirical evaluation of the comprehensibility of decision table, tree and rule based predictive models}, + author={Huysmans, Johan and Dejaeger, Karel and Mues, Christophe and Vanthienen, Jan and Baesens, Bart}, + journal={Decision Support Systems}, + volume={51}, + number={1}, + pages={141--154}, + year={2011}, + publisher={Elsevier} +} +@inproceedings{allahyari2011user, + title={User-oriented assessment of classification model understandability}, + author={Allahyari, Hiva and Lavesson, Niklas}, + booktitle={11th
Scandinavian Conference on Artificial Intelligence}, + year={2011}, + organization={IOS Press} +} + +@inproceedings{otero2013improving, + title={Improving the interpretability of classification rules discovered by an ant colony algorithm}, + author={Otero, Fernando EB and Freitas, Alex A}, + booktitle={Proceedings of the 15th Annual Conference on Genetic and Evolutionary Computation}, + pages={73--80}, + year={2013}, + organization={ACM} +} + +@article{hertwig2008conjunction, + title={The conjunction fallacy and the many meanings of and}, + author={Hertwig, Ralph and Benz, Bj{\"o}rn and Krauss, Stefan}, + journal={Cognition}, + volume={108}, + number={3}, + pages={740--753}, + year={2008}, + publisher={Elsevier} +} + + + +@article{guhaOverview, +author = {Petr H{\'a}jek and Martin Hole\v{n}a and Jan Rauch}, +title={The {GUHA} method and its meaning for data mining}, +journal={Journal of Computer and System Sciences}, +publisher={Elsevier}, +volume={76}, +number={1}, +pages={34--48}, +year={2010} +} +@article{schaffer1993overfitting, + title={Overfitting avoidance as bias}, + author={Schaffer, Cullen}, + journal={Machine learning}, + volume={10}, + number={2}, + pages={153--178}, + year={1993}, + publisher={Springer} +} + +@inproceedings{schaffer1994conservation, + title={A conservation law for generalization performance}, + author={Schaffer, Cullen}, + booktitle={Proceedings of the 11th international conference on machine learning}, + pages={259--265}, + year={1994} +} + +@article{baxter2000model, + title={A model of inductive bias learning}, + author={Baxter, Jonathan}, + journal={J. Artif. Intell.
Res. (JAIR)}, + volume={12}, + pages={149--198}, + year={2000} +} + +@inproceedings{brighton2006robust, + title={Robust Inference with Simple Cognitive Models}, + author={Brighton, Henry}, + booktitle={AAAI Spring Symposium: Between a Rock and a Hard Place: Cognitive Science Principles Meet AI-Hard Problems}, + pages={17--22}, + year={2006} +} + +@article{chater2003fast, + title={Fast, frugal, and rational: How rational norms explain behavior}, + author={Chater, Nick and Oaksford, Mike and Nakisa, Ramin and Redington, Martin}, + journal={Organizational behavior and human decision processes}, + volume={90}, + number={1}, + pages={63--86}, + year={2003}, + publisher={Elsevier} +} +@incollection{gigerenzer1999betting, + title={Betting on one good reason: The take the best heuristic}, + author={Gigerenzer, Gerd and Goldstein, Daniel G}, + booktitle={Simple heuristics that make us smart}, + pages={75--95}, + year={1999}, + publisher={Oxford University Press} +} + + +@incollection{gigerenzer1999fastandfrugal, + title={Fast and Frugal Heuristics}, + author={Gigerenzer, Gerd and Goldstein, Daniel G}, + booktitle={Simple heuristics that make us smart}, + pages={75--95}, + year={1999}, + publisher={Oxford University Press} +} + + + +@book{payne1993adaptive, + title={The adaptive decision maker}, + author={Payne, John W and Bettman, James R and Johnson, Eric J}, + year={1993}, + publisher={Cambridge University Press} +} + +@article{gigerenzer2009homo, + title={Homo heuristicus: Why biased minds make better inferences}, + author={Gigerenzer, Gerd and Brighton, Henry}, + journal={Topics in Cognitive Science}, + volume={1}, + number={1}, + pages={107--143}, + year={2009}, + publisher={Wiley Online Library} +} + +@article{brunswik1955representative, + title={Representative design and probabilistic theory in a functional psychology}, + author={Brunswik, Egon}, + journal={Psychological review}, + volume={62}, + number={3}, + pages={193}, + year={1955}, +
publisher={American Psychological Association} +} + +@article{gigerenzer1991probabilistic, + title={Probabilistic mental models: a {B}runswikian theory of confidence}, + author={Gigerenzer, Gerd and Hoffrage, Ulrich and Kleinb{\"o}lting, Heinz}, + journal={Psychological review}, + volume={98}, + number={4}, + pages={506}, + year={1991}, + publisher={American Psychological Association} +} + +@book{simon1982models, + title={Models of bounded rationality: Empirically grounded economic reason}, + author={Simon, Herbert Alexander}, + volume={3}, + year={1982}, + publisher={MIT press} +} + +@article{simon1956rational, + title={Rational choice and the structure of the environment}, + author={Simon, Herbert A}, + journal={Psychological review}, + volume={63}, + number={2}, + pages={129}, + year={1956}, + publisher={American Psychological Association} +} + +@article{haselton2006paranoid, + title={The paranoid optimist: An integrative evolutionary model of cognitive biases}, + author={Haselton, Martie G and Nettle, Daniel}, + journal={Personality and social psychology Review}, + volume={10}, + number={1}, + pages={47--66}, + year={2006}, + publisher={Sage Publications} +} +@article{haselton2000error, + title={Error management theory: a new perspective on biases in cross-sex mind reading}, + author={Haselton, Martie G and Buss, David M}, + journal={Journal of personality and social psychology}, + volume={78}, + number={1}, + pages={81}, + year={2000}, + publisher={American Psychological Association} +} + +@book{mitchell1997machine, + title={Machine Learning}, + author={Mitchell, Tom M}, + publisher={McGraw-Hill}, + address={Burr Ridge, IL}, + year={1997} +} + + +@article{Fayyad:1996:KPE:240455.240464, + author = {Fayyad, Usama and Piatetsky-Shapiro, Gregory and Smyth, Padhraic}, + title = {The {KDD} Process for Extracting Useful Knowledge from Volumes of Data}, + journal = {Commun. ACM}, + issue_date = {Nov.
1996}, + volume = {39}, + number = {11}, + month = nov, + year = {1996}, + issn = {0001-0782}, + pages = {27--34}, + numpages = {8}, + doi = {10.1145/240455.240464}, + acmid = {240464}, + publisher = {ACM}, + address = {New York, NY, USA}, +} + +@article{kahneman1972subjective, + title={Subjective probability: A judgment of representativeness}, + author={Kahneman, Daniel and Tversky, Amos}, + journal={Cognitive psychology}, + volume={3}, + number={3}, + pages={430--454}, + year={1972}, + publisher={Elsevier} +} + +@article{cosmides1996humans, + title={Are humans good intuitive statisticians after all? Rethinking some conclusions from the literature on judgment under uncertainty}, + author={Cosmides, Leda and Tooby, John}, + journal={Cognition}, + volume={58}, + number={1}, + pages={1--73}, + year={1996}, + publisher={Elsevier} +} + +@incollection{michalski1983theory, + title={A theory and methodology of inductive learning}, + author={Michalski, Ryszard S}, + booktitle={Machine learning}, + pages={83--134}, + year={1983}, + publisher={Springer} +} + +@inproceedings{stecher2016shorter, + title={Shorter Rules Are Better, Aren't They?}, + author={Stecher, Julius and Janssen, Frederik and F{\"u}rnkranz, Johannes}, + booktitle={International Conference on Discovery Science}, + pages={279--294}, + year={2016}, + organization={Springer} +} + + + +@inproceedings{lakkarajuinterpretable, + author = {Lakkaraju, Himabindu and Bach, Stephen H. 
and Leskovec, Jure}, + title = {Interpretable Decision Sets: A Joint Framework for Description and Prediction}, + booktitle = {Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining}, + series = {KDD '16}, + year = {2016}, + isbn = {978-1-4503-4232-2}, + location = {San Francisco, California, USA}, + pages = {1675--1684}, + numpages = {10}, + acmid = {2939874}, + publisher = {ACM}, + address = {New York, NY, USA}, + keywords = {classification, decision sets, interpretable machine learning, submodularity}, +} + + + +@article{griffin1992weighing, + title={The weighing of evidence and the determinants of confidence}, + author={Griffin, Dale and Tversky, Amos}, + journal={Cognitive psychology}, + volume={24}, + number={3}, + pages={411--435}, + year={1992}, + publisher={Elsevier} +} + +@article{tversky1993context, + title={Context-dependent preference}, + author={Tversky, Amos and Simonson, Itamar}, + journal={Management science}, + volume={39}, + number={10}, + pages={1179--1189}, + year={1993}, + publisher={INFORMS} +} + +@article{fantino1997conjunction, + title={The conjunction fallacy: A test of averaging hypotheses}, + author={Fantino, Edmund and Kulik, James and Stolarz-Fantino, Stephanie and Wright, William}, + journal={Psychonomic Bulletin \& Review}, + volume={4}, + number={1}, + pages={96--101}, + year={1997}, + publisher={Springer} +} + +@article{zizzo2000violation, + title={A violation of the monotonicity axiom: Experimental evidence on the conjunction fallacy}, + author={Zizzo, Daniel John and Stolarz-Fantino, Stephanie and Wen, Julie and Fantino, Edmund}, + journal={Journal of Economic Behavior \& Organization}, + volume={41}, + number={3}, + pages={263--276}, + year={2000}, + publisher={Elsevier} +} + +@article{stolarz1996conjunction, + title={The conjunction fallacy: Differential incidence as a function of descriptive frames and educational context}, + author={Stolarz-Fantino, Stephanie and Fantino, Edmund and
Kulik, James}, + journal={Contemporary Educational Psychology}, + volume={21}, + number={2}, + pages={208--218}, + year={1996}, + publisher={Elsevier} +} + +@article{gigerenzer1996reasoning, + title={Reasoning the fast and frugal way: models of bounded rationality}, + author={Gigerenzer, Gerd and Goldstein, Daniel G}, + journal={Psychological review}, + volume={103}, + number={4}, + pages={650}, + year={1996}, + publisher={American Psychological Association} +} + +@article{al2009ambiguity, + title={The ambiguity aversion literature: a critical assessment}, + author={Al-Najjar, Nabil I and Weinstein, Jonathan}, + journal={Economics and Philosophy}, + volume={25}, + number={3}, + pages={249--284}, + year={2009}, + publisher={Cambridge Univ Press} +} + +@incollection{kahneman2002representativeness, + title={Representativeness revisited: Attribute substitution in intuitive judgment}, + author={Kahneman, Daniel and Frederick, Shane}, + booktitle={Heuristics and biases: The psychology of intuitive judgment}, + pages={49--81}, + year={2002}, + publisher={Cambridge University Press} +} + +@article{bar1993alike, + title={How alike is it versus how likely is it: A disjunction fallacy in probability judgments}, + author={Bar-Hillel, Maya and Neter, Efrat}, + journal={Journal of Personality and Social Psychology}, + volume={65}, + number={6}, + pages={1119}, + year={1993}, + publisher={American Psychological Association} +} + + +@article{mosconi2001role, + title={The role of pragmatic rules in the conjunction fallacy}, + author={Mosconi, Giuseppe and Macchi, Laura}, + journal={Mind \& Society}, + volume={2}, + number={1}, + pages={31--57}, + year={2001}, + publisher={Springer} +} + + +@article{sloman2003frequency, + title={Frequency illusions and other fallacies}, + author={Sloman, Steven A and Over, David and Slovak, Lila and Stibel, Jeffrey M}, + journal={Organizational Behavior and Human Decision Processes}, + volume={91}, + number={2}, + pages={296--309}, + year={2003}, + publisher={Elsevier} +} + 
+@article{gigerenzer1999overcoming, + title={Overcoming difficulties in {Bayesian} reasoning: A reply to {Lewis and Keren (1999) and Mellers and McGraw} (1999).}, + author={Gigerenzer, Gerd and Hoffrage, Ulrich}, + year={1999}, + publisher={American Psychological Association}, + journal={Psychological Review}, + pages={425--430}, + volume={106} +} + +@Article{Chew2012, +author="Chew, Soo Hong +and Ebstein, Richard P. +and Zhong, Songfa", +title="Ambiguity aversion and familiarity bias: Evidence from behavioral and gene association studies", +journal="Journal of Risk and Uncertainty", +year="2012", +volume="44", +number="1", +pages="1--18", +abstract="It is increasingly recognized that decision making under uncertainty depends not only on probabilities, but also on psychological factors such as ambiguity and familiarity. Using 325 Beijing subjects, we conduct a neurogenetic study of ambiguity aversion and familiarity bias in an incentivized laboratory setting. For ambiguity aversion, 49.4{\%} of the subjects choose to bet on the 50--50 deck despite the unknown deck paying 20{\%} more. For familiarity bias, 39.6{\%} choose the bet on Beijing's temperature rather than the corresponding bet with Tokyo even though the latter pays 20{\%} more. We genotype subjects for anxiety-related candidate genes and find a serotonin transporter polymorphism being associated with familiarity bias, but not ambiguity aversion, while the dopamine D5 receptor gene and estrogen receptor beta gene are associated with ambiguity aversion only among female subjects. 
Our findings contribute to understanding of decision making under uncertainty beyond revealed preference.", +issn="1573-0476" +} + + +@book{keynes1922treatise, + title={A Treatise on Probability}, + author={Keynes, John Maynard}, + year={1922}, + publisher={Macmillan \& Co} +} + +@Article{Camerer1992, +author="Camerer, Colin +and Weber, Martin", +title="Recent developments in modeling preferences: Uncertainty and ambiguity", +journal="Journal of Risk and Uncertainty", +year="1992", +volume="5", +number="4", +pages="325--370", +abstract="In subjective expected utility (SEU), the decision weights people attach to events are their beliefs about the likelihood of events. Much empirical evidence, inspired by Ellsberg (1961) and others, shows that people prefer to bet on events they know more about, even when their beliefs are held constant. (They are averse to ambiguity, or uncertainty about probability.) We review evidence, recent theoretical explanations, and applications of research on ambiguity and SEU.", +issn="1573-0476" +} + +@article{geier2006unit, + title={Unit bias: A new heuristic that helps explain the effect of portion size on food intake}, + author={Geier, Andrew B and Rozin, Paul and Doros, Gheorghe}, + journal={Psychological Science}, + volume={17}, + number={6}, + pages={521--525}, + year={2006}, + publisher={SAGE Publications} +} + +@article{baron1988heuristics, + title={Heuristics and biases in diagnostic reasoning: {II.} Congruence, information, and certainty}, + author={Baron, Jonathan and Beattie, Jane and Hershey, John C}, + journal={Organizational Behavior and Human Decision Processes}, + volume={42}, + number={1}, + pages={88--110}, + year={1988}, + publisher={Elsevier} +} + + +@article{penney1985elimination, + title={Elimination of the suffix effect on preterminal list items with unpredictable list length: Evidence for a dual model of suffix effects}, + author={Penney, Catherine G}, + journal={Journal of Experimental Psychology: Learning, 
Memory, and Cognition}, + volume={11}, + number={2}, + pages={229}, + year={1985}, + publisher={American Psychological Association} +} + + +@inproceedings{kanouse1987negativity, + title={Negativity in evaluations}, + author={Kanouse, David E and Hanson Jr, L Reid}, + booktitle={Attribution: Perceiving the causes of behavior}, + year={1987}, + organization={Lawrence Erlbaum Associates, Inc} +} + +@book{lincoff1989audubon, + title={{The Audubon society field guide to North American mushrooms}}, + author={Lincoff, Gary H}, + year={1981}, + publisher={Knopf} +} + +@article{hahsler2011arules, + title={The arules R-package ecosystem: analyzing interesting patterns from large transaction data sets}, + author={Hahsler, Michael and Chelluboina, Sudheer and Hornik, Kurt and Buchta, Christian}, + journal={Journal of Machine Learning Research}, + volume={12}, + number={Jun}, + pages={2021--2025}, + year={2011} +} + + +@incollection{goldstein1999recognition, + title={The recognition heuristic: How ignorance makes us smart}, + author={Goldstein, Daniel G and Gigerenzer, Gerd}, + booktitle={Simple heuristics that make us smart}, + pages={37--58}, + year={1999}, + publisher={Oxford University Press} +} + +@article{furnkranz1997pruning, + title={Pruning algorithms for rule learning}, + author={F{\"u}rnkranz, Johannes}, + journal={Machine Learning}, + volume={27}, + number={2}, + pages={139--172}, + year={1997}, + publisher={Springer} +} + +@article{hasher1977frequency, + title={Frequency and the conference of referential validity}, + author={Hasher, Lynn and Goldstein, David and Toppino, Thomas}, + journal={Journal of Verbal Learning and Verbal Behavior}, + volume={16}, + number={1}, + pages={107--112}, + year={1977}, + publisher={Elsevier} +} + + +@article{pachur2006psychology, + title={On the psychology of the recognition heuristic: Retrieval primacy as a key determinant of its use}, + author={Pachur, Thorsten and Hertwig, Ralph}, + journal={Journal of Experimental Psychology: 
Learning, Memory, and Cognition}, + volume={32}, + number={5}, + pages={983}, + year={2006}, + publisher={American Psychological Association} +} + + +@article{zajonc1968attitudinal, + title={Attitudinal effects of mere exposure}, + author={Zajonc, Robert B}, + journal={Journal of personality and social psychology}, + volume={9}, + number={2, Pt. 2}, + pages={1}, + year={1968}, + publisher={American Psychological Association} +} + +@article{pachur2011recognition, + title={The recognition heuristic: A review of theory and tests}, + author={Pachur, Thorsten and Todd, Peter M and Gigerenzer, Gerd and Schooler, Lael and Goldstein, Daniel G}, + journal={Frontiers in psychology}, + volume={2}, + pages={147}, + year={2011}, + publisher={Frontiers} +} + + + +@article{sloman2011human, + title={Human representation and reasoning about complex causal systems}, + author={Sloman, Steven A and Fernbach, Philip M}, + journal={Information Knowledge Systems Management}, + volume={10}, + number={1-4}, + pages={85--99}, + year={2011}, + publisher={IOS Press} +} + +@article{martire2013expression, + title={The expression and interpretation of uncertain forensic science evidence: verbal equivalence, evidence strength, and the weak evidence effect}, + author={Martire, Kristy A and Kemp, Richard I and Watkins, Ian and Sayle, Malindi A and Newell, Ben R}, + journal={Law and human behavior}, + volume={37}, + number={3}, + pages={197}, + year={2013}, + publisher={Educational Publishing Foundation} +} + + +@article{edgell2004learned, + title={What is learned from experience in a probabilistic environment?}, + author={Edgell, Stephen E and Harbison, J and Neace, William P and Nahinsky, Irwin D and Lajoie, A Scott}, + journal={Journal of Behavioral Decision Making}, + volume={17}, + number={3}, + pages={213--229}, + year={2004}, + publisher={Wiley Online Library} +} + +@article{gigerenzer1995improve, + title={How to improve {Bayesian} reasoning without instruction: frequency formats}, + 
author={Gigerenzer, Gerd and Hoffrage, Ulrich}, + journal={Psychological review}, + volume={102}, + number={4}, + pages={684}, + year={1995}, + publisher={American Psychological Association} +} + +@article{tversky1973availability, + title={Availability: A heuristic for judging frequency and probability}, + author={Tversky, Amos and Kahneman, Daniel}, + journal={Cognitive psychology}, + volume={5}, + number={2}, + pages={207--232}, + year={1973}, + publisher={Elsevier} +} + +@INPROCEEDINGS{ballinuse, + title={The use of information from experts for agricultural official statistics}, + author={Ballin, Marco and Carbini, Riccardo and Loporcaro, Maria Francesca and Lori, Massimo and Moro, Roberto and Olivieri, Valeria and Scanu, Mauro}, + booktitle={European Conference on Quality in Official Statistics (Q2008)}, + year = {2008} +} + + +@book{plous1993psychology, + title={The psychology of judgment and decision making}, + author={Plous, Scott}, + year={1993}, + publisher={McGraw-Hill Book Company} +} + +@article{fernbach2011good, + title={When good evidence goes bad: The weak evidence effect in judgment and decision-making}, + author={Fernbach, Philip M and Darlow, Adam and Sloman, Steven A}, + journal={Cognition}, + volume={119}, + number={3}, + pages={459--467}, + year={2011}, + publisher={Elsevier} +} + + + +@Inbook{kahneman1999economic, +author="Kahneman, Daniel +and Ritov, Ilana +and Schkade, David +and Sherman, Steven J. 
+and Varian, Hal R.", +editor="Fischhoff, Baruch +and Manski, Charles F.", +title="Economic Preferences or Attitude Expressions?: An Analysis of Dollar Responses to Public Issues", +bookTitle="Elicitation of Preferences", +year="2000", +publisher="Springer Netherlands", +address="Dordrecht", +pages="203--242", +isbn="978-94-017-1406-8" +} + + +@article{marewski2010five, + title={Five principles for studying people's use of heuristics}, + author={Marewski, Julian N and Schooler, Lael J and Gigerenzer, Gerd}, + journal={Acta Psychologica Sinica}, + volume={42}, + number={1}, + pages={72--87}, + year={2010} +} + + +@article{domingos1999role, + title={The role of {Occam}'s razor in knowledge discovery}, + author={Domingos, Pedro}, + journal={Data mining and knowledge discovery}, + volume={3}, + number={4}, + pages={409--425}, + year={1999}, + publisher={Springer} +} + +@article{bar1980base, + title={The base-rate fallacy in probability judgments}, + author={Bar-Hillel, Maya}, + journal={Acta Psychologica}, + volume={44}, + number={3}, + pages={211--233}, + year={1980}, + publisher={Elsevier} +} + + +@article{goldstein2002models, + title={Models of ecological rationality: the recognition heuristic}, + author={Goldstein, Daniel G and Gigerenzer, Gerd}, + journal={Psychological review}, + volume={109}, + number={1}, + pages={75}, + year={2002}, + publisher={American Psychological Association} +} + +@article{hertwig2008fluency, + title={Fluency heuristic: a model of how the mind exploits a by-product of information retrieval}, + author={Hertwig, Ralph and Herzog, Stefan M and Schooler, Lael J and Reimer, Torsten}, + journal={Journal of Experimental Psychology: Learning, memory, and cognition}, + volume={34}, + number={5}, + pages={1191}, + year={2008}, + publisher={American Psychological Association} +} + +@article{clark1976effects, + title={The effects of data aggregation in statistical analysis}, + author={Clark, William AV and Avery, Karen L}, + journal={Geographical 
Analysis}, + volume={8}, + number={4}, + pages={428--438}, + year={1976}, + publisher={Wiley Online Library} +} + +@article{robinson-ecological, + title={Ecological correlations and the behavior of individuals}, + author={Robinson, William S}, + journal={American Sociological Review}, + volume={15}, + number={3}, + pages={351--357}, + year={1950} +} + + +@article{newson2002parameters, + title={Parameters behind "nonparametric" statistics: {Kendall}'s tau, {Somers}' D and median differences}, + author={Newson, Roger}, + year={2002}, + journal={Stata Journal}, + volume={2}, + number={1}, + pages={45--64} +} + + +@book{gibbons1990rank, + title={Rank correlation methods}, + author={Gibbons, Jean D and Kendall, MG}, + publisher={Edward Arnold}, + year={1990} +} + + +@article{kuh1955correlation, + title={Correlation and regression estimates when the data are ratios}, + author={Kuh, Edwin and Meyer, John R}, + journal={Econometrica: Journal of the Econometric Society}, + pages={400--416}, + year={1955}, + publisher={JSTOR} +} + +@article{Tu2004143, + title = "Ratio variables in regression analysis can give rise to spurious results: illustration from two studies in periodontology", + journal = "Journal of Dentistry", + volume = "32", + number = "2", + pages = "143--151", + year = "2004", + issn = "0300-5712", + author = "Yu-Kang Tu and Valerie Clerehugh and Mark S. 
Gilthorpe", + +keywords = "Guided tissue regeneration", + +keywords = "Ratio variables", + +keywords = "Root coverage", + +keywords = "Mathematical coupling " + +} + + +@article{schnoebelen2010using, + title={Using {Amazon Mechanical Turk} for linguistic research}, + author={Schnoebelen, Tyler and Kuperman, Victor}, + journal={Psihologija}, + volume={43}, + number={4}, + pages={441--464}, + year={2010} +} + +@article{kahneman2000evaluation, + title={Evaluation by moments: Past and future}, + author={Kahneman, Daniel}, + journal={Choices, values, and frames}, + pages={693--708}, + year={2000} +} + + +@article{1974-02325-00119730701, +Abstract = {Considers that intuitive predictions follow a judgmental heuristic-representativeness. By this heuristic, people predict the outcome that appears most representative of the evidence. Consequently, intuitive predictions are insensitive to the reliability of the evidence or to the prior probability of the outcome, in violation of the logic of statistical prediction. The hypothesis that people predict by representativeness was supported in a series of studies with both naive and sophisticated university students (N = 871). The ranking of outcomes by likelihood coincided with the ranking by representativeness, and Ss erroneously predicted rare events and extreme values if these happened to be representative. The experience of unjustified confidence in predictions and the prevalence of fallacious intuitions concerning statistical regression are traced to the representativeness heuristic. 
}, +Author = {Kahneman, Daniel and Tversky, Amos}, +ISSN = {0033-295X}, +Journal = {Psychological Review}, +Keywords = {rules determining intuitive predictions \& judgments of confidence, contrast to normative principles of statistical prediction, Intuition, Judgment, Prediction, Statistical Probability}, +Number = {4}, +Pages = {237--251}, +Title = {On the psychology of prediction}, +Volume = {80}, +Year = {1973}, +} +@article{Tversky27091974, +author = {Tversky, Amos and Kahneman, Daniel}, +title = {Judgment under Uncertainty: Heuristics and Biases}, +volume = {185}, +number = {4157}, +pages = {1124--1131}, +year = {1974}, +abstract = {This article described three heuristics that are employed in making judgements under uncertainty: (i) representativeness, which is usually employed when people are asked to judge the probability that an object or event A belongs to class or process B; (ii) availability of instances or scenarios, which is often employed when people are asked to assess the frequency of a class or the plausibility of a particular development; and (iii) adjustment from an anchor, which is usually employed in numerical prediction when a relevant value is available. These heuristics are highly economical and usually effective, but they lead to systematic and predictable errors. 
A better understanding of these heuristics and of the biases to which they lead could improve judgements and decisions in situations of uncertainty}, +journal = {Science} +} + +@article{hertwig1999conjunction, + title={The "conjunction fallacy" revisited: How intelligent inferences look like reasoning errors}, + author={Hertwig, Ralph and Gigerenzer, Gerd}, + journal={Journal of Behavioral Decision Making}, + volume={12}, + number={4}, + pages={275--305}, + year={1999} +} + +@article{charness2010conjunction, +title = "On the conjunction fallacy in probability judgment: New experimental evidence regarding {L}inda", +journal = "Games and Economic Behavior ", +volume = "68", +number = "2", +pages = "551 - 556", +year = "2010", +note = "", +issn = "0899-8256", +author = "Gary Charness and Edi Karni and Dan Levin", +keywords = "Conjunction fallacy", +keywords = "Representativeness bias", +keywords = "Group consultation", +keywords = "Incentives ", +abstract = "This paper reports the results of a series of experiments designed to test whether and to what extent individuals succumb to the conjunction fallacy. Using an experimental design of Tversky and Kahneman (1983), it finds that given mild incentives, the proportion of individuals who violate the conjunction principle is significantly lower than that reported by Kahneman and Tversky. Moreover, when subjects are allowed to consult with other subjects, these proportions fall dramatically, particularly when the size of the group rises from two to three. These findings cast serious doubts about the importance and robustness of such violations for the understanding of real-life economic decisions. 
" +} + + +@article{tentori2004conjunction, + title={The conjunction fallacy: a misunderstanding about conjunction?}, + author={Tentori, Katya and Bonini, Nicolao and Osherson, Daniel}, + journal={Cognitive Science}, + volume={28}, + number={3}, + pages={467--477}, + year={2004}, + publisher={Elsevier} +} + + +@article{personalitytype, +year={2013}, +issn={0947-3602}, +journal={Requirements Engineering}, +volume={18}, +number={3}, +title={Effect of personality type on structured tool comprehension performance}, +publisher={Springer London}, +keywords={Structured tools; Personality type; Comprehension accuracy; Comprehension speed; Systems analysis}, +author={Gorla, Narasimhaiah and Chiravuri, Ananth and Meso, Peter}, +pages={281-292} +} + + +@inproceedings{comprehensibility, + title={Comprehensibility of Classification Trees -- Survey Design Validation}, + author={Piltaver, Rok and Lu{\v{s}}trek, Mitja and Gams, Matja{\v{z}} and Martin{\v{c}}i{\'c}-Ip{\v{s}}i{\'c}, Sanda}, + booktitle={6th International Conference on Information Technologies and Information Society - ITIS2014}, + year={2014} +} + + + +@inproceedings{Otero:2013:IIC:2463372.2463382, + author = {Otero, Fernando E.B. 
and Freitas, Alex A.}, + title = {Improving the Interpretability of Classification Rules Discovered by an Ant Colony Algorithm}, + booktitle = {Proceedings of the 15th Annual Conference on Genetic and Evolutionary Computation}, + series = {GECCO '13}, + year = {2013}, + isbn = {978-1-4503-1963-8}, + location = {Amsterdam, The Netherlands}, + pages = {73--80}, + numpages = {8}, + acmid = {2463382}, + publisher = {ACM}, + address = {New York, NY, USA}, + keywords = {ant colony optimization, classification, data mining, sequential covering, unordered rules}, +} + +@article{tversky1983extensional, + title={Extensional versus intuitive reasoning: the conjunction fallacy in probability judgment}, + author={Tversky, Amos and Kahneman, Daniel}, + journal={Psychological review}, + volume={90}, + number={4}, + pages={293}, + year={1983}, + publisher={American Psychological Association} +} + +@article{tentori2012conjunction, + title={On the conjunction fallacy and the meaning of and, yet again: A reply to {Hertwig, Benz, and Krauss} (2008)}, + author={Tentori, Katya and Crupi, Vincenzo}, + journal={Cognition}, + volume={122}, + number={2}, + pages={123--134}, + year={2012}, + publisher={Elsevier} +} + +@INPROCEEDINGS{DBLP:conf/sigmod/AgrawalIS93, + author = "Rakesh Agrawal and Tomasz Imielinski and Arun N. 
Swami", + title = "Mining Association Rules between Sets of Items in Large Databases", + booktitle = "SIGMOD", + year = "1993", + pages = "207--216", + crossref = "DBLP:conf/sigmod/93", + bibsource = "DBLP, http://dblp.uni-trier.de" +} + +@TechReport{ page98pagerank, + author = "Lawrence Page and Sergey Brin and Rajeev Motwani and Terry Winograd", + institution = "Stanford Digital Library Technologies Project", + title = "The {PageRank} Citation Ranking: Bringing Order to the Web", + year = "1998" +} + + + + +@article{journals/jmlr/HahslerCHB11, + added-at = {2011-12-01T00:00:00.000+0100}, + author = {Hahsler, Michael and Chelluboina, Sudheer and Hornik, Kurt and Buchta, Christian}, + biburl = {http://www.bibsonomy.org/bibtex/2d504938973442b3fb46de4ecaf6cef7d/dblp}, + ee = {http://dl.acm.org/citation.cfm?id=2021064}, + interhash = {d1bc11bd100d4954c12466b6f2db8df8}, + intrahash = {d504938973442b3fb46de4ecaf6cef7d}, + journal = {Journal of Machine Learning Research}, + keywords = {dblp}, + pages = {2021-2025}, + timestamp = {2011-12-02T11:38:47.000+0100}, + title = {The arules R-Package Ecosystem: Analyzing Interesting Patterns from Large Transaction Data Sets}, + url = {http://dblp.uni-trier.de/db/journals/jmlr/jmlr12.html#HahslerCHB11}, + volume = 12, + year = 2011 +} + + +@ARTICLE{Hahsler05acomputational, + author = {Michael Hahsler and Bettina Grün and Kurt Hornik}, + title = {A computational environment for mining association rules and frequent item sets}, + journal = {Journal of Statistical Software}, + year = {2005}, + volume = {14}, + number = {15}, + pages = {1--25} +} + +@inproceedings{DBLP:conf/clef/KliegrK15, + author = {Tom{\'{a}}\v{s} Kliegr and + Jaroslav Kucha\v{r}}, + title = {Benchmark of Rule-Based Classifiers in the News Recommendation Task}, + booktitle = {Experimental {IR} Meets Multilinguality, Multimodality, and Interaction + - 6th International Conference of the {CLEF} Association, {CLEF} 2015, + Toulouse, France, September 8-11, 2015, Proceedings}, + pages = {130--141}, + 
year = {2015}, + crossref = {DBLP:conf/clef/2015}, + timestamp = {Mon, 31 Aug 2015 13:37:28 +0200}, + biburl = {http://dblp.uni-trier.de/rec/bib/conf/clef/KliegrK15}, + bibsource = {dblp computer science bibliography, http://dblp.org} +} + +@proceedings{DBLP:conf/clef/2015, + editor = {Josiane Mothe and + Jacques Savoy and + Jaap Kamps and + Karen Pinel{-}Sauvagnat and + Gareth J. F. Jones and + Eric SanJuan and + Linda Cappellato and + Nicola Ferro}, + title = {Experimental {IR} Meets Multilinguality, Multimodality, and Interaction + - 6th International Conference of the {CLEF} Association, {CLEF} 2015, + Toulouse, France, September 8-11, 2015, Proceedings}, + series = {Lecture Notes in Computer Science}, + volume = {9283}, + publisher = {Springer}, + year = {2015}, + isbn = {978-3-319-24026-8}, + timestamp = {Mon, 31 Aug 2015 13:36:49 +0200}, + biburl = {http://dblp.uni-trier.de/rec/bib/conf/clef/2015}, + bibsource = {dblp computer science bibliography, http://dblp.org} +} + + +@misc{lucskdd, +author={Frans Coenen}, +year=2004, +title={{LUCS KDD implementation of CBA (Classification Based on Associations)}}, +url={http://www.csc.liv.ac.uk/~frans/KDD/Software/CMAR/cba.html}, +note={Institute of Computer Science, The University of Liverpool, UK.} +} + +@article{IJSP37872, + author = {Ida Moghimipour and Malihe Ebrahimpour}, + title = {Comparing Decision Tree Method Over Three Data Mining Software}, + journal = {International Journal of Statistics and Probability}, + volume = {3}, + number = {3}, + year = {2014}, + keywords = {}, + abstract = {As a result of the growing IT and producing methods and collecting data, it is admitted that today the data can be warehoused faster in comparison with the past. Therefore, knowledge discovery tools are required in order to make use of data mining. Data mining is typically employed as an advanced tool for analyzing the data and knowledge discovery. Indeed, the purpose of data mining is to establish models for decision. 
These models have the ability to predict the future treatments according to the past analysis and are of the exciting areas of machine learning and adaptive computation. Statistical analysis of the data uses a combination of techniques and artificial intelligence algorithms and data quality information. To utilize the data mining applications, including the commercial and open source applications, numerous programs are currently available. In this research, we introduce data mining and principal concepts of the decision tree method which are the most effective and widely used +classification methods. In addition, a succinct description of the three data mining software, namely \textit{SPSS-Clementine}, \textit{RapidMiner} and \textit{Weka} is also provided. Afterwards, a comparison was performed on 3515 real datasets in terms of classification accuracy between the three different decision tree algorithms in order to illustrate the procedure of this research. The most accurate decision tree algorithm is \emph{Decision Tree} by 92.49\% in \emph{Rapidminer}.}, + issn = {1927-7040}, url = {http://www.ccsenet.org/journal/index.php/ijsp/article/view/37872} +} +@article{id3, +year={1986}, +issn={0885-6125}, +journal={Machine Learning}, +volume={1}, +number={1}, +title={Induction of decision trees}, +publisher={Kluwer Academic Publishers}, +keywords={classification; induction; decision trees; information theory; knowledge acquisition; expert systems}, +author={Quinlan, J.R.}, +pages={81--106}, +language={English} +} + +@inproceedings{Dumais:2000:HCW:345508.345593, + author = {Dumais, Susan and Chen, Hao}, + title = {Hierarchical Classification of Web Content}, + booktitle = {Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval}, + series = {SIGIR '00}, + year = {2000}, + isbn = {1-58113-226-3}, + location = {Athens, Greece}, + pages = {256--263}, + numpages = {8}, + url = 
{http://doi.acm.org/10.1145/345508.345593}, + doi = {10.1145/345508.345593}, + acmid = {345593}, + publisher = {ACM}, + address = {New York, NY, USA}, + keywords = {Web hierarchies, classification, hierarchical models, machine learning, support vector machines, text catergorization, text classification}, +} + + + +@article{kliegrMphil, +title={Monotonicity in Rule Classifiers}, + journal={Research Report (MPhil Progression)}, +publisher={Queen Mary, University of London}, +author={Tom{\'a}\v{s} Kliegr}, +year={2014}} + + +@incollection{stecher2014separating, + title={Separating Rule Refinement and Rule Selection Heuristics in Inductive Rule Learning}, + author={Stecher, Julius and Janssen, Frederik and F{\"u}rnkranz, Johannes}, + booktitle={Machine Learning and Knowledge Discovery in Databases}, + pages={114--129}, + year={2014}, + publisher={Springer Berlin Heidelberg} +} + +@inproceedings{conf/semweb/DojchinovskiK13, + added-at = {2013-12-09T00:00:00.000+0100}, + author = {Dojchinovski, Milan and Kliegr, Tomás}, + biburl = {http://www.bibsonomy.org/bibtex/201752d4b2a78fc47cf087960915012c9/dblp}, + booktitle = {NLP-DBPEDIA@ISWC}, + crossref = {conf/semweb/2013nlp}, + editor = {Hellmann, Sebastian and Filipowska, Agata and Barrière, Caroline and Mendes, Pablo N. 
and Kontokostas, Dimitris}, + ee = {http://ceur-ws.org/Vol-1064/Dojchinovski_Datasets.pdf}, + interhash = {58c7cb988e150c8a3194b9587db9c81c}, + intrahash = {01752d4b2a78fc47cf087960915012c9}, + keywords = {dblp}, + publisher = {CEUR-WS.org}, + series = {CEUR Workshop Proceedings}, + timestamp = {2013-12-09T00:00:00.000+0100}, + title = {Datasets, GATE Evaluation Framework for Benchmarking Wikipedia-Based NER Systems.}, + url = {http://dblp.uni-trier.de/db/conf/semweb/nlp2013.html#DojchinovskiK13}, + volume = 1064, + year = 2013 +} + + +@TECHREPORT{bbcClientSide, + author = {Chris Newell and Libby Miller}, + title = {Design and Evaluation of a Client-side Recommender System}, + institution = {University of Ottawa}, + year = {2014}, + note={Whitepaper, RecSys'13} +} + +@incollection{zaiane95, +year={2005}, +isbn={978-3-540-26076-9}, +booktitle={Advances in Knowledge Discovery and Data Mining}, +volume={3518}, +series={Lecture Notes in Computer Science}, +editor={Ho, TuBao and Cheung, David and Liu, Huan}, +title={Considering Re-occurring Features in Associative Classifiers}, +publisher={Springer Berlin Heidelberg}, +author={Rak, Rafal and Stach, Wojciech and Zaïane, Osmar R. 
and Antonie, Maria-Luiza}, +pages={240-248}, +language={English} +} + +@inproceedings{Bekkerman:2001:FDC:383952.383976, + author = {Bekkerman, Ron and El-Yaniv, Ran and Tishby, Naftali and Winter, Yoad}, + title = {On Feature Distributional Clustering for Text Categorization}, + booktitle = {Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval}, + series = {SIGIR '01}, + year = {2001}, + isbn = {1-58113-331-6}, + location = {New Orleans, Louisiana, USA}, + pages = {146--153}, + numpages = {8}, + url = {http://doi.acm.org/10.1145/383952.383976}, + doi = {10.1145/383952.383976}, + acmid = {383976}, + publisher = {ACM}, + address = {New York, NY, USA}, +} + + + +@inproceedings{Antonie:2002:TDC:844380.844745, + author = {Antonie, Maria-Luiza and Za\"{\i}ane, Osmar R.}, + title = {Text Document Categorization by Term Association}, + booktitle = {Proceedings of the 2002 IEEE International Conference on Data Mining}, + series = {ICDM '02}, + year = {2002}, + isbn = {0-7695-1754-4}, + pages = {19--}, + url = {http://dl.acm.org/citation.cfm?id=844380.844745}, + acmid = {844745}, + publisher = {IEEE Computer Society}, + address = {Washington, DC, USA}, +} + + +@article{greco3ejor01, +author="S. Greco and B. Matarazzo and R. Slowinski", +year=2001, +title="Rough sets theory for multicriteria decision analysis", +journal="European Journal of Operational Research", +volume=129, +pages="1-47" +} + + +@inproceedings{Shashua02rankingwith, + author = {Shashua, Amnon and Levin, Anat}, + title = {Ranking with Large Margin Principle: Two Approaches}, + booktitle = {Proceedings of the 15th International Conference on Neural Information Processing Systems}, + series = {NIPS'02}, + year = {2002}, + pages = {961--968}, + numpages = {8}, + acmid = {2968738}, + publisher = {MIT Press}, + address = {Cambridge, MA, USA}, +} + + + +@article{Boutilier:2004:CTR:1622467.1622473, + author = {Boutilier, Craig and Brafman, Ronen I. 
and Domshlak, Carmel and Hoos, Holger H. and Poole, David}, + title = {CP-nets: A Tool for Representing and Reasoning with Conditional Ceteris Paribus Preference Statements}, + journal = {Journal of Artificial Intelligence Research}, + issue_date = {January 2004}, + volume = {21}, + number = {1}, + month = feb, + year = {2004}, + issn = {1076-9757}, + pages = {135--191}, + numpages = {57}, + acmid = {1622473}, + publisher = {AI Access Foundation}, + address = {USA}, +} + +@inproceedings{UTANM, +author = "Tom{\'a}\v{s} Kliegr", + title = {{UTA - NM}: Explaining Stated Preferences with Additive Non-Monotonic Utility Functions}, + booktitle = {Proceedings of the ECML'09 Preference Learning Workshop}, + location = {Bled, Slovenia}, + year=2009 +} + + +@article{greco, +title = {Ordinal regression revisited: Multiple criteria ranking using a set of additive value functions}, +author = {Salvatore Greco and Vincent Mousseau and Roman Slowinski}, +journal = {European Journal of Operational Research}, +year = {2008}, +volume = {191}, +number = {2}, +pages = {416--436} +} + + +@Article{b:LagrezeUTA82, + author={Jacquet-Lagreze, E.
and Siskos, J.}, + title={Assessing a set of additive utility functions for multicriteria decision-making, the UTA method}, + journal={European Journal of Operational Research}, + year=1982, + volume={10}, + number={2}, + pages={151-164}, + month={June} +} + +@inproceedings{kinteresttv, + author = {Julien Leroy and Francois Rocca and Matei Mancas and Radhwan Ben Madhkour and Fabien Grisard and Tom{\'a}\v{s} Kliegr and Jaroslav Kucha\v{r} and Jakub Vit and Ivan Pirner and Petr Zimmermann}, + editor= { Yves Rybarczyk and Tiago Cardoso and Joao Rosas }, + publisher = {Springer}, + title = {{KINterestTV} - Can we measure, in a non-invasive way, the interest that a user has in front of his television displaying its content?}, + booktitle = {Innovative and Creative Developments in Multimodal Interaction Systems}, + year = 2014 +} + + +@incollection{3dHeadPose, +year={2013}, +isbn={978-3-319-03891-9}, +booktitle={Intelligent Technologies for Interactive Entertainment}, +volume={124}, +series={Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering}, +editor={Mancas, Matei and d'Alessandro, Nicolas and Siebert, Xavier and Gosselin, Bernard and Valderrama, Carlos and Dutoit, Thierry}, +title={{3D} Head Pose Estimation for {TV} Setups}, +publisher={Springer International Publishing}, +keywords={attention; head pose estimation; second screen interaction; eye tracking; Facelab; future TV; personalization}, +author={Leroy, Julien and Rocca, Francois and Mancas, Matei and Gosselin, Bernard}, +pages={55-64} +} + +@inproceedings{Joachims:2006:TLS:1150402.1150429, + author = {Joachims, Thorsten}, + title = {Training Linear SVMs in Linear Time}, + booktitle = {Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining}, + series = {KDD '06}, + year = {2006}, + isbn = {1-59593-339-5}, + location = {Philadelphia, PA, USA}, + pages = {217--226}, + numpages = {10}, + url =
{http://doi.acm.org/10.1145/1150402.1150429}, + doi = {10.1145/1150402.1150429}, + acmid = {1150429}, + publisher = {ACM}, + address = {New York, NY, USA}, + keywords = {ROC-area, large-scale problems, ordinal regression, support vector machines (SVM), training algorithms}, +} + + + +@book{hastie01statisticallearning, + added-at = {2008-05-16T16:17:42.000+0200}, + address = {New York, NY, USA}, + author = {Hastie, Trevor and Tibshirani, Robert and Friedman, Jerome}, + biburl = {http://www.bibsonomy.org/bibtex/2f58afc5c9793fcc8ad8389824e57984c/sb3000}, + interhash = {d585aea274f2b9b228fc1629bc273644}, + intrahash = {f58afc5c9793fcc8ad8389824e57984c}, + keywords = {ml statistics}, + publisher = {Springer New York Inc.}, + series = {Springer Series in Statistics}, + timestamp = {2008-05-16T16:17:42.000+0200}, + title = {The Elements of Statistical Learning}, + year = 2001 +} + + +@INPROCEEDINGS{Cohen95fasteffective, + author = {William W. Cohen}, + title = {Fast Effective Rule Induction}, + booktitle = {Proceedings of the Twelfth International Conference on Machine Learning}, + year = {1995}, + pages = {115--123}, + publisher = {Morgan Kaufmann} +} + + +@inproceedings{DBLP:conf/clef/14, + author = {Jaroslav Kucha\v{r} and Tom{\'a}\v{s} Kliegr }, + title = {{InBeat}: News Recommender System as a Service @ {CLEF NEWSREEL'14}}, + booktitle = {CEUR-WS Proceedings Vol-1180: CLEF-2014}, + year = {2014} +} + + + +@inproceedings{DBLP:conf/ruleml/KliegrKSV14, +author="Kliegr, Tom{\'a}{\v{s}} +and Kucha{\v{r}}, Jaroslav +and Sottara, Davide +and Voj{\'i}{\v{r}}, Stanislav", +editor="Bikakis, Antonis +and Fodor, Paul +and Roman, Dumitru", +title="Learning Business Rules with Association Rule Classifiers", +bookTitle="Rules on the Web. From Theory to Applications: 8th International Symposium, RuleML 2014, Co-located with the 21st European Conference on Artificial Intelligence, ECAI 2014, Prague, Czech Republic, August 18-20, 2014. 
Proceedings", +year="2014", +publisher="Springer International Publishing", +address="Cham", +pages="236--250", +isbn="978-3-319-09870-8" +} + + + +@inproceedings{Kille:ThePlistaDataset:2013, +author = {Kille, Benjamin and Hopfgartner, Frank and Brodt, Torben and Heintz, Tobias}, +title = {The plista Dataset}, +booktitle = {NRS'13: Proceedings of the International Workshop and Challenge on News Recommender Systems}, +year = {2013}, +month = oct, +pages = {14--22}, +location = {Hong Kong, China}, +publisher = {ACM}, +series = {ICPS} +} + +@inproceedings{Kliegr:InBeat14, + AUTHOR = "Tom{\'a}\v{s} Kliegr and Jaroslav Kucha\v{r}", + TITLE = "{Orwellian Eye}: Video Recommendation with {Microsoft Kinect}", + BOOKTITLE = "Conference on Prestigious Applications of Intelligent Systems (PAIS'14) co-located with European Conference on Artificial Intelligence (ECAI'14)", + publisher = "IOS Press", + MONTH = "August", + YEAR = {2014} +} + + + +@inproceedings{Kuchar:GAIN12, + AUTHOR = "Jaroslav Kucha\v{r} and Tom{\'a}\v{s} Kliegr", + TITLE = "{GAIN}: Analysis of Implicit Feedback on Semantically Annotated Content", + BOOKTITLE = "7th Workshop on Intelligent and Knowledge Oriented Technologies", + PAGES = "75-78", + ORGANIZATION = "STU FIIT", + MONTH = "November", + YEAR = {2012} +} + +@inproceedings{Kuchar:GAIN13, + author = {Jaroslav Kucha\v{r} and Tom{\'a}\v{s} Kliegr}, + title = {{GAIN}: web service for user tracking and preference learning - a {SMART TV} use case}, + booktitle = {7th ACM Conference on Recommender Systems, RecSys '13, Hong Kong, China, October 12-16, 2013}, + year = {2013} +} + +@article{recSysOverview, +title = "Recommender systems", +journal = "Physics Reports", +volume = "519", +number = "1", +pages = "1--49", +year = "2012", +issn = "0370-1573", +author = "Linyuan Lu and Mat{\'u}\v{s} Medo and Chi Ho Yeung and Yi-Cheng Zhang and Zi-Ke Zhang and Tao Zhou", +keywords = "Recommender systems, Information filtering, Networks" +}
+ +@inproceedings{Schafer:1999:RSE:336992.337035, + author = {Schafer, J. Ben and Konstan, Joseph and Riedl, John}, + title = {Recommender systems in {E-commerce}}, + booktitle = {Proceedings of the 1st ACM Conference on Electronic commerce}, + series = {EC '99}, + year = {1999}, + isbn = {1-58113-176-3}, + location = {Denver, Colorado, USA}, + pages = {158--166}, + numpages = {9}, + acmid = {337035}, + publisher = {ACM}, + address = {New York, NY, USA}, + keywords = {cross-sell, customer loyalty, electronic commerce, interface, mass customization, recommender systems, up-sell}, +} + +@article{Bizer:2009:DCP:1640541.1640848, + author = {Bizer, Christian and Lehmann, Jens and Kobilarov, Georgi and Auer, S\"{o}ren and Becker, Christian and Cyganiak, Richard and Hellmann, Sebastian}, + title = {{DB}pedia - A crystallization point for the Web of Data}, + journal = {Web Semant.}, + issue_date = {September, 2009}, + volume = {7}, + number = {3}, + month = sep, + year = {2009}, + issn = {1570-8268}, + pages = {154--165}, + numpages = {12}, + acmid = {1640848}, + publisher = {Elsevier Science Publishers B. V.}, + address = {Amsterdam, The Netherlands}, + keywords = {Knowledge extraction, Linked Data, RDF, Web of Data, Wikipedia}, +} + + +@ARTICLE{yago2, +AUTHOR = {Hoffart, Johannes and Suchanek, Fabian M. and Berberich, Klaus and Weikum, Gerhard}, +TITLE = {{YAGO2}: A spatially and temporally enhanced knowledge base from {W}ikipedia}, +JOURNAL = {Artificial Intelligence}, +PUBLISHER = {Elsevier}, +YEAR = {2013}, +VOLUME = {194}, +PAGES = {28--61}, +ADDRESS = {Amsterdam}, +} + +@Article{ rauch, + author = "Jan Rauch and Milan \v{S}im\r{u}nek", + title = "An Alternative Approach to Mining Association Rules", + journal = "Foundations of Data Mining and Knowledge Discovery", + publisher = "Springer", + year = "2005", + volume = "6", + isbn = "978-3-540-26257-2", + pages = "211--231", + place = "Berlin" +} + +@Book{ kliegrdp, + publisher = "University of Economics in Prague, Faculty of Informatics and Statistics", + address = "Prague", + author = "Tom{\'a}\v{s} Kliegr", + title = "Clickstream Analysis", + year = "2007", + note = "Master's thesis" +} +@InProceedings{ RuleMLChallenge13, + author = "Voj{\'i}\v{r}, Stanislav and Kliegr, Tom{\'a}\v{s} and Hazucha, Andrej and \v{S}krabal, Radek and \v{S}im\r{u}nek, Milan", + booktitle = "RuleML-2013 Challenge", + editor = "Paul Fodor and Dumitru Roman and Darko Anicic and Adam Wyner and Monica Palmirani and Davide Sottara and Francois L{\'e}vy", + publisher = "CEUR-WS.org", + series = "CEUR Workshop Proceedings", + title = "Transforming Association Rules to Business Rules: {EasyMiner} meets {Drools}", + volume = "1004", + year = "2013", + ee = "http://ceur-ws.org/Vol-1004/paper13.pdf", +} +@InProceedings{ RuleMLChallenge13_short, + author = "Voj{\'i}\v{r}, Stanislav and Kliegr, Tom{\'a}\v{s} and Hazucha, Andrej and \v{S}krabal, Radek and \v{S}im\r{u}nek, Milan", + booktitle = "RuleML-2013 Challenge", + publisher = "CEUR-WS.org", + title = "Transforming Association Rules to Business Rules: {EasyMiner} meets {Drools}", + year = "2013", +} + +@inproceedings{cmar, + author = {Li, Wenmin and Han, Jiawei and Pei, Jian}, + title = {{CMAR}: Accurate and Efficient Classification Based on Multiple Class-Association Rules}, + booktitle = {Proceedings of the 2001 IEEE International Conference on Data Mining}, + series = {ICDM '01}, + year = {2001}, + isbn = {0-7695-1119-8}, + pages = {369--376}, + numpages = {8}, + acmid = {657866}, + publisher = {IEEE Computer Society}, + address = {Washington, DC, USA}, +} + +@inproceedings{Antonie:2004:MPN:1053072.1053078, + author = {Antonie, Maria-Luiza and Za\"{\i}ane, Osmar R.}, + title = {Mining Positive and Negative Association Rules: An
Approach for Confined Rules}, + booktitle = {Proceedings of the 8th European Conference on Principles and Practice of Knowledge Discovery in Databases}, + series = {PKDD '04}, + year = {2004}, + isbn = {3-540-23108-0}, + location = {Pisa, Italy}, + pages = {27--38}, + numpages = {12}, + acmid = {1053078}, + publisher = {Springer-Verlag New York, Inc.}, + address = {New York, NY, USA}, +} + + + +@book{DBLP:books/mk/Quinlan93, + author = {J. Ross Quinlan}, + title = {{C4.5}: Programs for Machine Learning}, + publisher = {Morgan Kaufmann}, + year = {1993}, + isbn = {1-55860-238-0}, + bibsource = {DBLP, http://dblp.uni-trier.de} +} + +@article{Han:2004:MFP:954514.954525, + author = {Han, Jiawei and Pei, Jian and Yin, Yiwen and Mao, Runying}, + title = {Mining Frequent Patterns Without Candidate Generation: A Frequent-Pattern Tree Approach}, + journal = {Data Mining and Knowledge Discovery}, + issue_date = {January 2004}, + volume = {8}, + number = {1}, + month = jan, + year = {2004}, + issn = {1384-5810}, + pages = {53--87}, + numpages = {35}, + acmid = {954525}, + publisher = {Kluwer Academic Publishers}, + address = {Hingham, MA, USA}, + keywords = {algorithm, association mining, data structure, frequent pattern mining, performance improvements}, +} + +@article{thabtah2006pruning, + title={Pruning Techniques in Associative Classification: Survey and Comparison.}, + author={Thabtah, Fadi}, + journal={Journal of Digital Information Management}, + volume={4}, + number={3}, + year={2006} +} + + +@article{DBLP:journals/ker/Thabtah07, + author = {Fadi A. Thabtah}, + title = {A review of associative classification mining}, + journal = {Knowledge Eng. 
Review}, + volume = {22}, + number = {1}, + year = {2007}, + pages = {37-65}, + ee = {http://dx.doi.org/10.1017/S0269888907001026}, + bibsource = {DBLP, http://dblp.uni-trier.de} +} + + +@inproceedings{DBLP:conf/ruleml/VojirKHSS13, + author = {Stanislav Voj\'{\i}\v{r} and + Tom{\'a}\v{s} Kliegr and + Andrej Hazucha and + Radek Skrabal and + Milan \v{S}imunek}, + title = {Transforming Association Rules to Business Rules: EasyMiner + meets Drools}, + booktitle = {RuleML (2)}, + year = {2013}, + ee = {http://ceur-ws.org/Vol-1004/paper13.pdf}, + crossref = {DBLP:conf/ruleml/2013-2}, + bibsource = {DBLP, http://dblp.uni-trier.de} +} + +@proceedings{DBLP:conf/ruleml/2013-2, + editor = {Paul Fodor and + Dumitru Roman and + Darko Anicic and + Adam Wyner and + Monica Palmirani and + Davide Sottara and + Fran\c{c}ois L{\'e}vy}, + title = {Joint Proceedings of the 7th International Rule Challenge, + the Special Track on Human Language Technology and the 3rd + RuleML Doctoral Consortium, Seattle, USA, July 11 -13, 2013}, + booktitle = {RuleML (2)}, + publisher = {CEUR-WS.org}, + series = {CEUR Workshop Proceedings}, + volume = {1004}, + year = {2013}, + ee = {http://ceur-ws.org/Vol-1004}, + bibsource = {DBLP, http://dblp.uni-trier.de} +} + + + +@Proceedings{ DBLP:conf/sigmod/93, + title = "Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data, Washington, D.C., May 26-28, 1993", + publisher = "ACM Press", + year = "1993", + bibsource = "DBLP, http://dblp.uni-trier.de" +} +@INPROCEEDINGS{arcReview, +author={Vanhoof, K. 
and Depaire, B.}, +booktitle={2010 International Conference on Intelligent Systems and Knowledge Engineering (ISKE)}, +title={Structure of association rule classifiers: a review}, +year={2010}, +month={November}, +pages={9-12}, +keywords={data mining;pattern classification;ARC;association rule classifiers;classification schemes;pruning schemes;Accuracy;Association rules;Classification algorithms;Error analysis;Itemsets;Transforms;association rules;classification;pruning} +} +@inproceedings{Toivonen95pruningand, + author = {H. Toivonen and M. Klemettinen and P. Ronkainen and K. Hätönen and H. Mannila}, + title = {Pruning and Grouping Discovered Association Rules}, + booktitle = {ECML'95 Workshop on Statistics, Machine Learning and Knowledge Discovery in Databases}, + pages={47-52}, + year = {1995} +} + +@article{thabtah, +year={2006}, +issn={0219-1377}, +journal={Knowledge and Information Systems}, +volume={9}, +number={1}, +title={Multiple labels associative classification}, +publisher={Springer-Verlag}, +keywords={Data mining; Association rule; Classification; Multi-label classification; Frequent itemset; Hyperheuristic; Scheduling}, +author={Fadi Thabtah and Peter Cowling and Yonghong Peng}, +pages={109-129}, +language={English} +} + + +@inproceedings{Liu98integratingclassification, + author = {Liu, Bing and Hsu, Wynne and Ma, Yiming}, + title = {Integrating Classification and Association Rule Mining}, + booktitle = {Proceedings of the Fourth International Conference on Knowledge Discovery and Data Mining}, + series = {KDD'98}, + year = {1998}, + location = {New York, NY}, + pages = {80--86}, + numpages = {7}, + acmid = {3000305}, + publisher = {AAAI Press} +} + + +@incollection{impactofranking, +year={2006}, +isbn={978-1-84628-225-6}, +booktitle={Research and Development in Intelligent Systems XXII}, +editor={Bramer, Max and Coenen, Frans and Allen, Tony}, +title={The Impact of Rule Ranking on the Quality of Associative Classifiers}, +publisher={Springer London},
+author={Thabtah, Fadi and Cowling, Peter and Peng, Yonghong}, +pages={277-287}, +language={English} +} + +@INPROCEEDINGS{Ali97partialclassification, + author = {Kamal Ali and Stefanos Manganaris and Ramakrishnan Srikant}, + title = {Partial classification using association rules}, + booktitle = {Proc. 3rd Int. Conf. on KDD}, + year = {1997}, + pages = {115--118} +} + +@article{THABTAH:2007:RAC:1294755.1294758, + author = {Thabtah, Fadi}, + title = {A Review of Associative Classification Mining}, + journal = {Knowl. Eng. Rev.}, + issue_date = {March 2007}, + volume = {22}, + number = {1}, + month = mar, + year = {2007}, + issn = {0269-8889}, + pages = {37--65}, + numpages = {29}, + acmid = {1294758}, + publisher = {Cambridge University Press}, + address = {New York, NY, USA}, +} + +@inproceedings{FayyadI93, + author = {Fayyad, U. M. and Irani, K. B.}, + booktitle = {Proceedings of the 13th International Joint Conference on Artificial Intelligence (IJCAI-93)}, + citeulike-article-id = {6584254}, + keywords = {discretization}, + pages = {1022--1029}, + posted-at = {2010-01-24 22:34:23}, + priority = {1}, + title = {Multi-Interval Discretization of Continuous-Valued Attributes for Classification Learning}, + year = {1993} +} + +@article{hu2008falcon, + title={Falcon-{AO}: A practical ontology matching system}, + author={Hu, Wei and Qu, Yuzhong}, + journal={Web Semantics: Science, Services and Agents on the World Wide Web}, + volume={6}, + number={3}, + pages={237--239}, + year={2008}, + publisher={Elsevier} +} + +@incollection{ngo2012yam++, + title={{YAM}++: a multi-strategy based approach for ontology matching task}, + author={Ngo, DuyHoa and Bellahsene, Zohra}, + booktitle={Knowledge Engineering and Knowledge Management}, + pages={421--425}, + year={2012}, + publisher={Springer} +} + +@article{jimenez2013logmap, + title={LogMap and LogMapLt results for {OAEI} 2012}, + author={Jim{\'e}nez-Ruiz, Ernesto and Grau, Bernardo Cuenca and Horrocks, Ian}, + journal={Ontology Matching}, + pages={131}, + year={2013} +} + +@misc{pradeep2013methods, + title={Methods and apparatus for providing personalized media in video}, + author={Pradeep, A. and Knight, R.T.
and Gurumoorthy, R.}, + url={https://www.google.com/patents/US8464288}, + year={2013}, + month=jun # "~11", + publisher={Google Patents}, + note={US Patent 8,464,288} +} +@book{orwell20041984, + title={1984}, + author={Orwell, G.}, + isbn={9781595404329}, + lccn={2003195107}, + url={http://books.google.cz/books?id=w-rb62wiFAwC}, + year={2004}, + publisher={1st World Library - Literary Society} +} + +@incollection{3dHeadPose_short, +year={2013}, +isbn={978-3-319-03891-9}, +booktitle={Intelligent Technologies for Interactive Entertainment}, +volume={124}, +title={{3D} Head Pose Estimation for {TV} Setups}, +publisher={Springer}, +keywords={attention; head pose estimation; second screen interaction; eye tracking; Facelab; future TV; personalization}, +author={Leroy, Julien and Rocca, Francois and Mancas, Matei and Gosselin, Bernard}, +pages={55-64} +} + +@inproceedings{Kuchar:2013:GWS:2507157.2508217, + author = {Kucha\v{r}, Jaroslav and Kliegr, Tom\'{a}\v{s}}, + title = {{GAIN}: Web Service for User Tracking and Preference Learning - a {Smart TV} Use Case}, + booktitle = {Proceedings of the 7th ACM Conference on Recommender Systems}, + series = {RecSys '13}, + year = {2013}, + isbn = {978-1-4503-2409-0}, + location = {Hong Kong, China}, + pages =
{467--468}, + numpages = {2}, + acmid = {2508217}, + publisher = {ACM}, + address = {New York, NY, USA}, + keywords = {association rules, entity classification, preference learning}, +} + +@inproceedings{Martinez-Gomez:2012:QAI:2166966.2167055, + author = {Martinez-Gomez, Pascual}, + title = {Quantitative Analysis and Inference on Gaze Data Using Natural Language Processing Techniques}, + booktitle = {Proceedings of the 2012 ACM International Conference on Intelligent User Interfaces}, + series = {IUI '12}, + year = {2012}, + isbn = {978-1-4503-1048-2}, + location = {Lisbon, Portugal}, + pages = {389--392}, + numpages = {4}, + doi = {10.1145/2166966.2167055}, + acmid = {2167055}, + publisher = {ACM}, + address = {New York, NY, USA}, + keywords = {eye-tracking, natural language processing, quantitative approaches, user-centered systems}, +} + + + +@inproceedings{Xu:2008:POD:1454008.1454023, + author = {Xu, Songhua and Jiang, Hao and Lau, Francis C.M.}, + title = {Personalized Online Document, Image and Video Recommendation via Commodity Eye-tracking}, + booktitle = {RecSys '08}, + year = {2008}, + isbn = {978-1-60558-093-7}, + location = {Lausanne, Switzerland}, + pages = {83--90}, + numpages = {8}, + acmid = {1454023}, + publisher = {ACM}, + address = {New York, NY, USA}, + keywords = {commodity eye-tracking, document, image and video recommendation, implicit user feedback, personalized recommendation and ranking, user attention, web search}, +} + +@incollection{recomreview, +year={2007}, +isbn={978-3-540-72078-2}, +booktitle={The Adaptive Web}, +series={LNCS}, +title={Content-Based Recommendation Systems}, +publisher={Springer}, +author={Pazzani, Michael J. 
and Billsus, Daniel}, +pages={325-341} +} + +@inproceedings{tac13, + author = {Milan Dojchinovski and Tom{\'a}\v{s} Kliegr and Ivo La\v{s}ek and Ond\v{r}ej Zamazal}, + title = {Wikipedia Search as Effective Entity Linking Algorithm}, + booktitle = {Text Analysis Conference (TAC) 2013 Proceedings}, + year = {2013}, + publisher = {NIST}, + note={Accepted} +} + +@inproceedings{DBLP:conf/esws/Gangemi13, + author = {Aldo Gangemi}, + title = {A Comparison of Knowledge Extraction Tools for the Semantic Web}, + booktitle = {ESWC}, + year = {2013}, + pages = {351-366}, + crossref = {DBLP:conf/esws/2013}, + bibsource = {DBLP, http://dblp.uni-trier.de} +} + +@proceedings{DBLP:conf/esws/2013, + editor = {Philipp Cimiano and {\'O}scar Corcho and Valentina Presutti and Laura Hollink and Sebastian Rudolph}, + title = {The Semantic Web: Semantics and Big Data, 10th International Conference, ESWC 2013, Montpellier, France, May 26-30, 2013. Proceedings}, + booktitle = {ESWC}, + publisher = {Springer}, + series = {Lecture Notes in Computer Science}, + volume = {7882}, + year = {2013}, + isbn = {978-3-642-38287-1, 978-3-642-38288-8}, + bibsource = {DBLP, http://dblp.uni-trier.de} +} + +@INPROCEEDINGS{Ritter09whatis, + author = {Alan Ritter and Stephen Soderland and Oren Etzioni}, + title = {What is this, anyway: Automatic hypernym discovery}, + booktitle = {Proceedings of the AAAI 2009 Spring Symposium on Learning by Reading and Learning to Read}, + year = {2009}, + pages = {88--93} +} + +@InProceedings{KASSNER08.544, + author = {Laura Kassner and Vivi Nastase and Michael Strube}, + title = {Acquiring a Taxonomy from the German Wikipedia}, + booktitle = {Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)}, + year = {2008}, + month = {may}, + date = {28-30}, + address = {Marrakech, Morocco}, + editor = {Nicoletta Calzolari (Conference Chair), Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Daniel Tapias}, + publisher =
{European Language Resources Association (ELRA)}, + isbn = {2-9517408-4-0}, + note = {http://www.lrec-conf.org/proceedings/lrec2008/}, + language = {english} + } + +@inproceedings{Ponzetto:2007:DLS:1619797.1619876, + author = {Ponzetto, Simone Paolo and Strube, Michael}, + title = {Deriving a Large Scale Taxonomy from Wikipedia}, + booktitle = {Proceedings of the 22nd National Conference on Artificial Intelligence - Volume 2}, + series = {AAAI'07}, + year = {2007}, + isbn = {978-1-57735-323-2}, + location = {Vancouver, British Columbia, Canada}, + pages = {1440--1445}, + numpages = {6}, + acmid = {1619876}, + publisher = {AAAI Press}, +} + +@inproceedings{Leroy:2013:SSI:2487788.2487969, + author = {Leroy, Julien and Rocca, Fran\c{c}ois and Mancas, Matei and Gosselin, Bernard}, + title = {Second screen interaction: an approach to infer {TV} watcher's interest using {3D} head pose estimation}, + booktitle = {Proceedings of the 22nd international conference on World Wide Web companion}, + series = {WWW '13 Companion}, + year = {2013}, + isbn = {978-1-4503-2038-2}, + location = {Rio de Janeiro, Brazil}, + pages = {465--468}, + numpages = {4}, + acmid = {2487969}, + publisher = {International World Wide Web Conferences Steering Committee}, + address = {Republic and Canton of Geneva, Switzerland}, + keywords = {attention, head pose estimation, second screen interaction}, +} +@inproceedings{kinect_short, + author = {Leroy, Julien and Rocca, Fran\c{c}ois and Mancas, Matei and Gosselin, Bernard}, + title = {Second screen interaction: an approach to infer {TV} watcher's interest using {3D} head pose estimation}, + booktitle = {LiME-2013}, + series = {WWW '13 Companion, ACM}, + year = {2013}, + pages = {465--468} +} +@inproceedings{kinect, + author = {Leroy, Julien and Rocca, Fran\c{c}ois and Mancas, Matei and Gosselin, Bernard}, + title = {Second screen interaction: an approach to infer {TV} watcher's interest using {3D} head pose estimation}, + booktitle =
{Proceedings of the 22nd international conference on World Wide Web companion}, + series = {WWW '13 Companion}, + year = {2013}, + isbn = {978-1-4503-2038-2}, + location = {Rio de Janeiro, Brazil}, + pages = {465--468}, + numpages = {4}, + acmid = {2487969}, + publisher = {International World Wide Web Conferences Steering Committee}, + address = {Republic and Canton of Geneva, Switzerland}, + keywords = {attention, head pose estimation, second screen interaction}, +} + +@inproceedings{rizzo2011, + added-at = {2011-10-27T10:08:58.000+0200}, + author = {Rizzo, Giuseppe and Troncy, Raphael}, + biburl = {http://www.bibsonomy.org/bibtex/278c92d60b24e401a7065e4ac2f227615/maxirichter}, + interhash = {bc2d2dc358b0b4048f507156cfe54880}, + intrahash = {78c92d60b24e401a7065e4ac2f227615}, + keywords = {iswc2011 ner semantic_web}, + timestamp = {2011-10-27T10:08:58.000+0200}, + title = {{NERD}: a Framework for Evaluating Named Entity Recognition Tools in the Web of Data}, + url =
{http://iswc2011.semanticweb.org/fileadmin/iswc/Papers/PostersDemos/iswc11pd_submission_35.pdf}, + year = 2011 +} + + +@book{rauch13, + author = {Jan Rauch}, + publisher = {Springer-Verlag}, + title = {{Observational Calculi and Association Rules}}, + series = {Studies in Computational Intelligence}, + year = {2013}, + address={Berlin} +} + +@misc{sbvr, + author = {OMG ({Object Management Group})}, + timestamp = {2008-01-27T23:43:29.000+0100}, + title = {{Semantics of Business Vocabulary and Business Rules (SBVR)}, v1.0}, + url = {http://www.omg.org/spec/SBVR/1.0/PDF}, + year = 2008 +} +@InProceedings{ RuleMLChallenge, + author = "Tom\'{a}\v{s} Kliegr and David Chud{\'a}n and Andrej Hazucha and Jan Rauch", + booktitle = "RuleML-2010 Challenge", + editor = "Monica Palmirani and M. Omair Shafiq and Enrico Francesconi and Fabio Vitali", + publisher = "CEUR-WS.org", + series = "CEUR Workshop Proceedings", + title = "{SEWEBAR-CMS}: A System for Postprocessing Data Mining Models", + volume = "649", + year = "2010", + ee = "http://ceur-ws.org/Vol-649/paper9.pdf", +} + +@inproceedings{conf/w3c/Chapin05, + added-at = {2005-07-06T00:00:00.000+0200}, + author = {Chapin, Donald}, + biburl = {http://www.bibsonomy.org/bibtex/213140d794f3a383fc9d8d787e97ca8ee/dblp}, + booktitle = {Rule Languages for Interoperability}, + crossref = {conf/w3c/2005r}, + date = {2005-07-06}, + description = {dblp}, + ee = {http://www.w3.org/2004/12/rules-ws/paper/85}, + interhash = {565c7aff79c89835d30fbf9c46d8ffd4}, + intrahash = {13140d794f3a383fc9d8d787e97ca8ee}, + keywords = {dblp}, + publisher = {W3C}, + timestamp = {2005-07-06T00:00:00.000+0200}, + title = {Semantics of Business Vocabulary \& Business Rules (SBVR).}, + url = {http://dblp.uni-trier.de/db/conf/w3c/rules2005.html#Chapin05}, + year = 2005 +} + + + +@incollection{ar2nl, +year={2005}, +isbn={978-3-540-26257-2}, +booktitle={Foundations of Data Mining and Knowledge Discovery}, +volume={6}, +series={Studies in Computational
Intelligence}, +editor={Lin, Tsau Young and Ohsuga, Setsuo and Liau, Churn-Jung and Hu, Xiaohua and Tsumoto, Shusaku}, +title={Reporting Data Mining Results in a Natural Language}, +publisher={Springer Berlin Heidelberg}, +author={Strossa, Petr and Černý, Zdeněk and Rauch, Jan}, +pages={347-361} +} + +@INPROCEEDINGS{Goethals00onsupporting, + author = {Bart Goethals and Jan Van den Bussche}, + title = {On Supporting Interactive Association Rule Mining}, + booktitle = {Proceedings of the 2nd International Conference on Data Warehousing and Knowledge Discovery}, + year = {2000}, + pages = {307--316}, + publisher = {Springer} +} + +@article{Liu:2003:SDU:608371.608409, + author = {Liu, Bing and Ma, Yiming and Wong, Ching Kian and Yu, Philip S.}, + title = {Scoring the Data Using Association Rules}, + journal = {Applied Intelligence}, + issue_date = {March-April 2003}, + volume = {18}, + number = {2}, + month = mar, + year = {2003}, + issn = {0924-669X}, + pages = {119--135}, + numpages = {17}, + url = {http://dx.doi.org/10.1023/A:1021931008240}, + doi = {10.1023/A:1021931008240}, + acmid = {608409}, + publisher = {Kluwer Academic Publishers}, + address = {Hingham, MA, USA}, + keywords = {association rules, classifications, data mining, scoring, target selection}, +} + + +@inproceedings{thdecml13, +author="Dojchinovski, Milan and Kliegr, Tom{\'a}{\v{s}}", +editor="Blockeel, Hendrik and Kersting, Kristian and Nijssen, Siegfried and {\v{Z}}elezn{\'y}, Filip", +title="Entityclassifier.eu: Real-Time Classification of Entities in Text with Wikipedia", +bookTitle="Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2013, Prague, Czech Republic, September 23-27, 2013, Proceedings, Part III", +year="2013", +publisher="Springer Berlin Heidelberg", +address="Berlin, Heidelberg", +pages="654--658", +isbn="978-3-642-40994-3", +} + + +@incollection{easyminer12, +year={2012}, +isbn={978-3-642-33485-6}, +booktitle={Machine Learning and Knowledge
Discovery in Databases}, +volume={7524}, +series={Lecture Notes in Computer Science}, +editor={Flach, Peter A. and De Bie, Tijl and Cristianini, Nello}, +title={Association Rule Mining Following the Web Search Paradigm}, +publisher={Springer Berlin Heidelberg}, +author={\v{S}krabal, Radek and \v{S}im\r{u}nek, Milan and Voj{\'i}\v{r}, Stanislav and Hazucha, Andrej and Marek, Tom{\'a}\v{s} and Chud{\'a}n, David and Kliegr, Tom{\'a}\v{s}}, +pages={808--811} +} + +@inproceedings{easyminer12_short, +booktitle={ECML'12}, +title={Association Rule Mining Following the Web Search Paradigm}, +publisher={Springer}, +author={\v{S}krabal, Radek and \v{S}im\r{u}nek, Milan and Voj{\'i}\v{r}, Stanislav and Hazucha, Andrej and Marek, Tom{\'a}\v{s} and Chud{\'a}n, David and Kliegr, Tom{\'a}\v{s}}, +pages={808--811}, +year={2012} +} + +@inproceedings{isem2011mendesetal, + title = {{DB}pedia Spotlight: Shedding Light on the Web of Documents}, + author = {Pablo N. Mendes and Max Jakob and Andres Garcia-Silva and Christian Bizer}, + year = {2011}, + booktitle = {Proceedings of the 7th International Conference on Semantic Systems (I-Semantics)}, + abstract = {Interlinking text documents with Linked Open Data enables the Web of Data to be used as background knowledge within document-oriented applications such as search and faceted browsing. As a step towards interconnecting the Web of Documents with the Web of Data, we developed DBpedia Spotlight, a system for automatically annotating text documents with DBpedia URIs. DBpedia Spotlight allows users to configure the annotations to their specific needs through the DBpedia Ontology and quality measures such as prominence, topical pertinence, contextual ambiguity and disambiguation confidence. We compare our approach with the state of the art in disambiguation, and evaluate our results in light of three baselines and six publicly available annotation systems, demonstrating the competitiveness of our system.
DBpedia Spotlight is shared as open source and deployed as a Web Service freely available for public use.} +} + +@inproceedings{isem2011mendesetal_short, + title = {{DB}pedia Spotlight: Shedding Light on the Web of Documents}, + author = {Pablo N. Mendes and Max Jakob and Andres Garcia-Silva and Christian Bizer}, + year = {2011}, + booktitle = {I-Semantics}, + abstract = {Interlinking text documents with Linked Open Data enables the Web of Data to be used as background knowledge within document-oriented applications such as search and faceted browsing. As a step towards interconnecting the Web of Documents with the Web of Data, we developed DBpedia Spotlight, a system for automatically annotating text documents with DBpedia URIs. DBpedia Spotlight allows users to configure the annotations to their specific needs through the DBpedia Ontology and quality measures such as prominence, topical pertinence, contextual ambiguity and disambiguation confidence. We compare our approach with the state of the art in disambiguation, and evaluate our results in light of three baselines and six publicly available annotation systems, demonstrating the competitiveness of our system. 
DBpedia Spotlight is shared as open source and deployed as a Web Service freely available for public use.} +} + +@article{DBLP:journals/pvldb/YosefHBSW11, + author = {Mohamed Amir Yosef and + Johannes Hoffart and + Ilaria Bordino and + Marc Spaniol and + Gerhard Weikum}, + title = {{AIDA}: An Online Tool for Accurate Disambiguation of Named + Entities in Text and Tables}, + journal = {PVLDB}, + volume = {4}, + number = {12}, + year = {2011}, + pages = {1450--1453}, + ee = {http://www.vldb.org/pvldb/vol4/p1450-yosef.pdf}, + bibsource = {DBLP, http://dblp.uni-trier.de} +} + +@article{lhd, +title = "Linked hypernyms: Enriching {DBpedia} with Targeted Hypernym Discovery", +journal = "Web Semantics: Science, Services and Agents on the World Wide Web", +volume = "31", +pages = "59--69", +year = "2015", +issn = "1570-8268", +author = "Tom{\'a}\v{s} Kliegr", +keywords = "DBpedia, Hearst patterns, Hypernym, Linked data, YAGO, Wikipedia, Type inference" +} + + +@article{lhd2, + author = {Tomáš Kliegr and Ondřej Zamazal}, + title = {{LHD} 2.0: A text mining approach to typing entities in knowledge graphs}, + journal = {Web Semantics: Science, Services and Agents on the World Wide Web}, + volume = {39}, + year = {2016}, + publisher = {Elsevier}, + keywords = {Type inference; Support Vector Machines; Entity classification; DBpedia}, + issn = {1570-8268} +} + + +@inproceedings{lhdCH, + author = {Tom{\'a}\v{s} Kliegr and V{\'a}clav Zeman and Milan Dojchinovski}, + title = {Linked Hypernyms Dataset - Generation Framework and Use Cases}, + booktitle = {The 3rd Workshop on Linked Data in Linguistics: Multilingual Knowledge Resources and Natural Language Processing, co-located with LREC 2014}, + note = {To appear} +} + +@article{Rubenstein:1965:CCS:365628.365657, + author = {Rubenstein, Herbert and Goodenough, John B.}, +
title = {Contextual correlates of synonymy}, + journal = {Commun. ACM}, + issue_date = {Oct. 1965}, + volume = {8}, + number = {10}, + month = oct, + year = {1965}, + issn = {0001-0782}, + pages = {627--633}, + numpages = {7}, + acmid = {365657}, + publisher = {ACM}, + address = {New York, NY, USA}, +} + + + +@inproceedings{Pirro:2010:FIT:1940281.1940321, + author = {Pirr\'{o}, Giuseppe and Euzenat, J{\'e}r\^{o}me}, + title = {A feature and information theoretic framework for semantic similarity and relatedness}, + booktitle = {Proceedings of the 9th international semantic web conference on The semantic web - Volume Part I}, + series = {ISWC'10}, + year = {2010}, + isbn = {3-642-17745-X, 978-3-642-17745-3}, + location = {Shanghai, China}, + pages = {615--630}, + numpages = {16}, + acmid = {1940321}, + publisher = {Springer-Verlag}, + address = {Berlin, Heidelberg}, + keywords = {feature based similarity, ontologies, semantic similarity}, +} + + + +@inproceedings{Radinsky:2011:WTC:1963405.1963455, + author = {Radinsky, Kira and Agichtein, Eugene and Gabrilovich, Evgeniy and Markovitch, Shaul}, + title = {A word at a time: computing word relatedness using temporal semantic analysis}, + booktitle = {Proceedings of the 20th international conference on World wide web}, + series = {WWW '11}, + year = {2011}, + isbn = {978-1-4503-0632-4}, + location = {Hyderabad, India}, + pages = {337--346}, + numpages = {10}, + acmid = {1963455}, + publisher = {ACM}, + address = {New York, NY, USA}, + keywords = {semantic analysis, semantic similarity, temporal dynamics, temporal semantics, word relatedness}, +} + + +@InProceedings { eswc2013, + author = { Alessio Palmero Aprosio and Claudio Giuliano and Alberto Lavelli }, + title = { Automatic expansion of {DB}pedia exploiting {W}ikipedia cross-language information }, + booktitle = { Proceedings of the European Semantic Web Conference, ESWC 2013 }, + address = {Montpellier, France}, + month = { February }, + year = { 2013 }, +} + + + + 
+@InProceedings { iswc2012paper-semantic-web-challenge-open-12, + author = { Heiko Paulheim }, + title = { Browsing {Linked Open Data} with Auto Complete }, + booktitle = { Proceedings of the Semantic Web Challenge co-located with ISWC2012 }, + address = {Boston, US}, + publisher={Univ., Mannheim}, + month = { November }, + year = { 2012 }, +} +@article{Bizer:2009:DCP:1640541.1640848, + author = {Bizer, Christian and Lehmann, Jens and Kobilarov, Georgi and Auer, S\"{o}ren and Becker, Christian and Cyganiak, Richard and Hellmann, Sebastian}, + title = {{DB}pedia - A crystallization point for the Web of Data}, + journal = {Web Semant.}, + issue_date = {September, 2009}, + volume = {7}, + number = {3}, + month = sep, + year = {2009}, + issn = {1570-8268}, + pages = {154--165}, + numpages = {12}, + acmid = {1640848}, + publisher = {Elsevier Science Publishers B. V.}, + address = {Amsterdam, The Netherlands, The Netherlands}, + keywords = {Knowledge extraction, Linked Data, RDF, Web of Data, Wikipedia}, +} + + + +@misc{rdf-n-triples, + added-at = {2008-01-27T23:43:29.000+0100}, + author = {RDF-Core working group}, + biburl = {http://www.bibsonomy.org/bibtex/2dbc5c44c37313a8fc9f1c67fa1980d56/bergo}, + interhash = {66c9ac1265935d623df96122db785592}, + intrahash = {dbc5c44c37313a8fc9f1c67fa1980d56}, + keywords = {imported}, + owner = {Administrator}, + timestamp = {2008-01-27T23:43:29.000+0100}, + title = {N-Triples}, + url = {http://www.w3.org/2001/sw/RDFCore/ntriples/}, + year = 2001 +} + +@ARTICLE{yago2, +AUTHOR = {Hoffart, Johannes and Suchanek, Fabian M. 
and Berberich, Klaus and Weikum, Gerhard}, +TITLE = {{YAGO2}: A spatially and temporally enhanced knowledge base from {W}ikipedia}, +JOURNAL = {Artificial Intelligence}, +PUBLISHER = {Elsevier}, +YEAR = {2013}, +VOLUME = {194}, +PAGES = {28--61}, +ADDRESS = {Amsterdam}, +} +@incollection{automaticTypingDBpediaEntities, +year={2012}, +isbn={978-3-642-35175-4}, +booktitle={The Semantic Web - ISWC 2012}, +series={Lecture Notes in Computer Science}, +editor={Cudre-Mauroux, Philippe and Heflin, Jeff and Sirin, Evren and Tudorache, Tania and Euzenat, Jerome and Hauswirth, Manfred and Parreira, Josiane Xavier and Hendler, Jim and Schreiber, Guus and Bernstein, Abraham and Blomqvist, Eva}, +title={Automatic Typing of {DB}pedia Entities}, +publisher={Springer Berlin Heidelberg}, +author={Gangemi, Aldo and Nuzzolese, Andrea Giovanni and Presutti, Valentina and Draicchio, Francesco and Musetti, Alberto and Ciancarini, Paolo}, +pages={65-81} +} + +@INPROCEEDINGS{wiktDojch, + author = {Milan Dojchinovski and Tom{\'a}\v{s} Kliegr}, + title = {Recognizing, Classifying and Linking Entities with {W}ikipedia and {DB}pedia}, + booktitle = {Proceedings of the 7th Workshop on Intelligent and Knowledge Oriented Technologies }, + series={WIKT'12}, + year = {2012}, + pages = {41--44}, + url={http://wikt2012.fiit.stuba.sk/data/wikt2012-proceedings.pdf} +} + + + +@ARTICLE{Kilgarriff00frameworkand, + author = {Adam Kilgarriff and Joseph Rosenzweig}, + title = {Framework and Results for {E}nglish {SENSEVAL}}, + journal = {Special Issue on {SENSEVAL}. 
Computers and the Humanities}, + year = {2000}, + pages = {15--48} +} + +@INPROCEEDINGS{Evans03aframework, + author = {Richard Evans}, + title = {A framework for named entity recognition in the open domain}, + booktitle = {Proceedings of the Recent Advances in Natural Language Processing}, + series={RANLP'03}, + year = {2003}, + pages = {137--144} +} +@conference{alfonseca_manandhar_2002, + added-at = {2012-03-20T05:00:52.000+0100}, + address = {India}, + author = {Alfonseca, Enrique and Manandhar, Suresh}, + biburl = {http://www.bibsonomy.org/bibtex/2ad637aa4a3daedcb2a25821889c2f8fc/wyswilson}, + booktitle = {Proceedings of the 1st International Conference on General {W}ord{N}et}, + interhash = {4ec4b41cb59e30761c65ede32fbfbca8}, + intrahash = {ad637aa4a3daedcb2a25821889c2f8fc}, + keywords = {imported}, + timestamp = {2012-03-20T05:00:52.000+0100}, + title = {An Unsupervised Method for General Named Entity Recognition and Automated Concept Discovery}, + year = {2002} +} + + + +@article{ner-sekine2007, + added-at = {2009-11-06T17:24:59.000+0100}, + author = {David Nadeau and Satoshi Sekine}, + biburl = {http://www.bibsonomy.org/bibtex/231a3c28acfe2301ba810b2fccd1ea392/mirian}, + interhash = {2d9a1a5440885a8741a1686f344a9494}, + intrahash = {31a3c28acfe2301ba810b2fccd1ea392}, + journal = {Linguisticae Investigationes}, + keywords = {ie ner nlp survey}, + month = {January}, +publisher={John Benjamins Publishing Company}, + number = 1, + pages = {3--26}, + timestamp = {2009-11-06T17:24:59.000+0100}, + title = {A survey of named entity recognition and classification}, + url = {http://www.ingentaconnect.com/content/jbp/li/2007/00000030/00000001/art00002}, + volume = 30, + year = 2007 +} + + +@misc{gatedoc, +author={ + Hamish Cunningham and Diana Maynard and Kalina Bontcheva and Valentin Tablan and Niraj Aswani and Ian Roberts and Genevieve Gorrell and Adam Funk and Angus Roberts and Danica Damljanovic and Thomas Heitz and Mark A.
Greenwood and Horacio Saggion and Johann Petrak and Yaoyong Li and Wim Peters and others}, +institution={University of Sheffield, Department of Computer Science}, +year={2012}, +title={Developing Language Processing +Components with {GATE} +Version 7 (a User Guide)}, +url={http://gate.ac.uk/sale/tao/split.html} +} + +@INPROCEEDINGS{Crammer01prankingwith, + author = {Koby Crammer and Yoram Singer}, + title = {Pranking with Ranking}, + booktitle = {Advances in Neural Information Processing Systems 14}, + year = {2001}, + pages = {641--647}, + publisher = {MIT Press} +} +@article{Kramer_Widmer_Pfahringer, title={Prediction of ordinal classes using regression trees}, volume={47}, url={http://iospress.metapress.com/index/VP1P87EH3GBX319R.pdf}, number={1}, journal={Fundamenta Informaticae}, publisher={IOS Press}, author={Kramer, Stefan and Widmer, Gerhard and Pfahringer, Bernhard and De Groeve, Michael}, year={2001}, pages={1--13}} + +@ARTICLE{Chu04gaussianprocesses, + author = {Wei Chu and Zoubin Ghahramani}, + title = {Gaussian processes for ordinal regression}, + journal = {Journal of Machine Learning Research}, + year = {2005}, + volume = {6}, + pages = {1019--1041} +} +@article{DBLP:journals/neco/ChuK07, + author = {Wei Chu and + S. Sathiya Keerthi}, + title = {Support Vector Ordinal Regression}, + journal = {Neural Computation}, + volume = {19}, + number = {3}, + year = {2007}, + pages = {792--815}, + ee = {http://dx.doi.org/10.1162/neco.2007.19.3.792}, + bibsource = {DBLP, http://dblp.uni-trier.de} +} + +@book{agresti, + author = {A. Agresti}, + publisher = {Wiley-Interscience}, + title = {{Categorical Data Analysis}}, + series = {Wiley Series in Probability and Statistics}, + edition = {2nd}, + year = {2002} +} + +@article{siskos, +author="Y. Siskos and D.
Yanacopoulos", +year=1985, +title="{UTA STAR} - an ordinal regression method for building additive value functions", +journal="Investiga\c{c}\~{a}o Operacional", +volume=5, +pages="39--53" +} + +@book{pemberton, + title={Mathematics For Economists: An Introductory Textbook, Second Edition}, + author={Pemberton, M. and Rau, N.}, + isbn={9780719075391}, + series={G - Reference, Information and Interdisciplinary Subjects Series}, + url={http://books.google.co.uk/books?id=zS8itLDS36sC}, + year={2006}, + publisher={Manchester University Press} +} + +@phdthesis{ sill, + TITLE = "Monotonicity and Connectedness in Learning Systems", + AUTHOR = "Sill, Joseph", + SCHOOL = "California Institute of Technology", + YEAR = "1998" +} + + +@phdthesis{ kotlowski, + TITLE = "Statistical Approach to Ordinal Classification with Monotonicity Constraints", + AUTHOR = "Kot{\l}owski, Wojciech", + SCHOOL = "Poznan University of Technology", + YEAR = "2008" +} + +@inproceedings{ pl12, + TITLE = "Preprocessing Algorithm for Handling Non-Monotone Attributes in the {UTA} method", + AUTHOR = "Alan Eckhardt and Tom{\'a}\v{s} Kliegr", + booktitle="Preference Learning: Problems and Applications in AI (PL-12)", + YEAR = "2012" +} + + + + + + +@phdthesis{ eckhardt, + TITLE = "Induction of User Preferences For Semantic Web", + AUTHOR = "Alan Eckhardt", + SCHOOL = "Faculty of Mathematics and Physics, Charles University", + YEAR = "2010" +} +@BOOK{jf:Book-Nada, + author = {F{\"{u}}rnkranz, Johannes and Gamberger, Dragan and Lavra{\v c}, Nada}, + title = {Foundations of Rule Learning}, + year = {2012}, + publisher = {Springer-Verlag}, + isbn = {978-3-540-75196-0} +} + + @ARTICLE{Fürnkranz05roc'n', + author = {Johannes Fürnkranz and Peter A. Flach}, + title = {ROC 'n' Rule Learning - Towards a Better Understanding of Covering Algorithms}, + journal = {Machine Learning}, + volume = {58}, + number = {1}, + year = {2005}, + pages = {39--77} +} + +@article{DBLP:journals/ml/JanssenF10, + author = {Frederik Janssen and + Johannes F{\"{u}}rnkranz}, + title = {On the quest for optimal rule learning heuristics}, + journal = {Machine Learning}, + year = {2010}, + volume = {78}, + number = {3}, + pages = {343--379}, + url = {http://springerlink.metapress.com/content/5133634885171258/}, + timestamp = {Thu, 13 Nov 2014 16:54:00 +0100}, + biburl = {http://dblp.uni-trier.de/rec/bib/journals/ml/JanssenF10}, + bibsource = {dblp computer science bibliography, http://dblp.org} +} + +@incollection{plbook:preflearn, + author = {Dembczy{\'n}ski, K. and Kot{\l}owski, W. and S{\l}owi{\'n}ski, R. and Szel{\c{a}}g, M.}, + title = {Learning of Rule Ensembles for Multiple Attribute Ranking Problems}, + booktitle = {Preference Learning}, + editor = {F{\"{u}}rnkranz, Johannes and H{\"{u}}llermeier, Eyke}, + publisher = {Springer-Verlag}, + year = {2010}, + pages = {217--247} +} + + +@incollection{plbook:rulessuvey, + author = {Zhang, J.
and Bala, J. and Hadjarian, A. and Han, B.}, + title = {Ranking Cases with Classification Rules}, + booktitle = {Preference Learning}, + editor = {F{\"{u}}rnkranz, Johannes and H{\"{u}}llermeier, Eyke}, + publisher = {Springer-Verlag}, + year = {2010}, + pages = {155--177} +} + + +@book{mathutilitytheory, +title = {Mathematical Utility Theory: +Utility Functions, Models, and Applications in the Social Sciences}, +author={Gerhard Herden}, +publisher={Springer}, +year={1999} +} + +@incollection{plbook:ordinalregrsuvey, + author = {Waegeman, W. and De Baets, B.}, + title = {A Survey on ROC-based Ordinal Regression}, + booktitle = {Preference Learning}, + editor = {F{\"{u}}rnkranz, Johannes and H{\"{u}}llermeier, Eyke}, + publisher = {Springer-Verlag}, + year = {2010}, + pages = {127--153} +} + + +@incollection{ handbookmathpsy, + author = "Luce, R. and Suppes, P.", + title = "Preference, Utility and Subjective Probability", + booktitle = "Handbook of Mathematical Psychology", + publisher= "Wiley", + pages = "249--410", + year = 1965 +} + + +@article{Dembczynski:2009:LRE:1609998.1610003, + author = {Dembczynski, Krzysztof and Kotlowski, Wojciech and Slowinski, Roman}, + title = {Learning Rule Ensembles for Ordinal Classification with Monotonicity Constraints}, + journal = {Fundam. Inf.}, + issue_date = {April 2009}, + volume = {94}, + number = {2}, + month = apr, + year = {2009}, + issn = {0169-2968}, + pages = {163--178}, + numpages = {16}, + acmid = {1610003}, + publisher = {IOS Press}, + address = {Amsterdam, The Netherlands} +} + +@article{greco3ejor01, +author="S. Greco and B. Matarazzo and R.
Slowinski", +year=2001, +title="Rough sets theory for multicriteria decision analysis", +journal="European Journal of Operational Research", +volume=129, +pages="1--47" +} + +@incollection{plbook:intro, + author = {F{\"{u}}rnkranz, Johannes and H{\"{u}}llermeier, Eyke}, + title = {Preference Learning: An Introduction}, + booktitle = {Preference Learning}, + editor = {F{\"{u}}rnkranz, Johannes and H{\"{u}}llermeier, Eyke}, + publisher = {Springer-Verlag}, + year = {2010}, + pages = {19--42} +} + +@inproceedings{Gottron:2011:IES:2063576.2063865, + author = {Gottron, Thomas and Anderka, Maik and Stein, Benno}, + title = {Insights into explicit semantic analysis}, + booktitle = {Proceedings of the 20th ACM international conference on Information and knowledge management}, + series = {CIKM '11}, + year = {2011}, + isbn = {978-1-4503-0717-8}, + location = {Glasgow, Scotland, UK}, + pages = {1961--1964}, + numpages = {4}, + acmid = {2063865}, + publisher = {ACM}, + address = {New York, NY, USA}, + keywords = {explicit semantic analysis, semantic relatedness}, +} + + +@inproceedings{Goethals:2011:MFI:2034161.2034208, + author = {Goethals, Bart and Moens, Sandy and Vreeken, Jilles}, + title = {{MIME}: a framework for interactive visual pattern mining}, + booktitle = {Machine Learning and Knowledge Discovery in Databases}, + series = {ECML PKDD'11}, + year = {2011}, + isbn = {978-3-642-23807-9}, + location = {Athens, Greece}, + pages = {634--637}, + numpages = {4}, + acmid = {2034208}, + publisher = {Springer}, + address = {Berlin}, + keywords = {MIME, interactive visual mining, pattern exploration}, +} +@inproceedings{Kliegr:2011:BKP:2023598.2023606, + author = {Kliegr, Tom\'{a}\v{s} and Voj\'{\i}\v{r}, Stanislav and Rauch, Jan}, + title = {Background knowledge and {PMML}: first considerations}, + booktitle = {Proceedings of the 2011 workshop on Predictive markup language modeling}, + series = {PMML '11}, + year = {2011}, + isbn = {978-1-4503-0837-3}, + location = {San Diego, California, USA}, + pages = {54--62}, + numpages = {9}, + acmid =
{2023606}, + publisher = {ACM}, + address = {New York, NY, USA}, + keywords = {background knowledge, pmml}, +} + +@inproceedings{techila, +author = { Milan \v{S}im\r{u}nek and Teppo Tammisto}, +title={Distributed Data-Mining in the {LISp-Miner} System Using {Techila} Grid}, +booktitle={Networked Digital Technologies'10}, +address={Berlin}, +publisher={Springer}, +year= 2010, +pages={15--21} +} + + + +@inproceedings{dimRedIR, + author = {Peng Wu and Bangalore S. Manjunath and Hyundoo D. Shin}, + title = {Dimensionality Reduction for Image Retrieval}, + booktitle = {Proceedings of International Conference on Image Processing}, + year = {2000}, + pages={726--729}, +volume={3}, +location={Vancouver, BC}, + publisher={IEEE} +} + + + + +@Book{ featuresinimageretrieval, + TITLE = "Features for Image Retrieval", + AUTHOR = "Thomas Deselaers", + PUBLISHER = "Rheinisch-Westf{\"a}lische Technische Hochschule Aachen", + YEAR = "2003", + NOTE = "Diploma thesis" +} + +@book{tucker1989unified, + title={A Unified Introduction to Linear Algebra: Models, Methods and Theory}, + author={Tucker, Allan}, + year={1989}, + publisher={Maxwell Macmillan} +} + +@inproceedings{DBLP:conf/eacl/BunescuP06, +title={Using Encyclopedic Knowledge for Named Entity Disambiguation}, +author={Razvan Bunescu and Marius Pasca}, +booktitle={Proceedings of the 11th Conference of the European Chapter of the Association for Computational Linguistics}, +publisher={Association for Computational Linguistics}, +series={EACL '06}, +address={Trento, Italy}, +pages={9--16}, +url="http://www.cs.utexas.edu/users/ai-lab/pub-view.php?PubID=51468", +year={2006} +} + +@proceedings{DBLP:conf/eacl/2006, + title = {EACL 2006, 11th Conference of the European Chapter of the + Association for Computational Linguistics, Proceedings of + the Conference, April 3-7, 2006, Trento, Italy}, + booktitle = {EACL}, + publisher = {The Association for Computer Linguistics}, + year = {2006}, + isbn = {1-932432-59-0}, + bibsource =
{DBLP, http://dblp.uni-trier.de} +} + + + + +@inproceedings{DBLP:conf/cikm/Kliegr10, + author = {Kliegr, Tom\'{a}\v{s}}, + title = {Entity classification by bag of {W}ikipedia articles}, + booktitle = {Proceedings of the 3rd workshop on Ph.D. students in information and knowledge management}, + series = {PIKM '10}, + year = {2010}, + isbn = {978-1-4503-0385-9}, + location = {Toronto, ON, Canada}, + pages = {67--74}, + numpages = {8}, + acmid = {1871914}, + publisher = {ACM}, + address = {New York, NY, USA}, + keywords = {named entity classification, wikipedia, word sense disambiguation}, +} + + +@proceedings{DBLP:conf/cikm/2010pikm, + editor = {Anisoara Nica and + Aparna S. Varde}, + title = {Proceedings of the Third Ph.D. Workshop on Information and + Knowledge Management, PIKM 2010, Toronto, Ontario, Canada, + October 30, 2010}, + booktitle = {PIKM}, + publisher = {ACM}, + year = {2010}, + bibsource = {DBLP, http://dblp.uni-trier.de} +} + + +@ARTICLE{Salton88term-weightingapproaches, + author = {Gerard Salton and Christopher Buckley}, + title = {Term-weighting approaches in automatic text retrieval}, + journal = {Information Processing and Management}, + volume = {24}, + number = {5}, + year = {1988}, + pages = {513--523}, + publisher = {Pergamon Press} +} + +@Article{ genkin2005slr, + author = "Alexander Genkin and David D.
Lewis and David Madigan", + note = "Project Report, DIMACS Working Group on Monitoring Message Streams", + keywords = "learning", + title = "Sparse Logistic Regression for Text Categorization", + year = "2005" +} + + + +@book{utilityLearning, + author = "Friedman, Craig", + title = "Utility-Based Learning from Data", + publisher = "CRC Press", + address = "Hoboken", + series = "Chapman and Hall/CRC Machine Learning and Pattern Recognition", + year = "2010", +} + +@article{doccatbasque, +title ={Analyzing the Effect of Dimensionality Reduction in Document Categorization for {B}asque}, +author ={Ana Zelaia and I{\~n}aki Alegria and Olatz Arregi and Basilio Sierra}, +journal={Archives of Control Science}, +year={2005}, +publisher={Silesian University of Technology} +} + + +@inproceedings{DBLP:conf/ercimdl/BelKV03, + author = {N{\'u}ria Bel and + Cornelis H. A. Koster and + Marta Villegas}, + title = {Cross-Lingual Text Categorization}, + booktitle = {Proceedings of the 7th European Conference on Research and Advanced Technology for Digital Libraries}, +series={ECDL '03}, +publisher={Springer-Verlag}, + year = {2003}, + pages = {126--139}, + ee = {http://dx.doi.org/10.1007/b11967}, + crossref = {DBLP:conf/ercimdl/2003}, + bibsource = {DBLP, http://dblp.uni-trier.de} +} + +@proceedings{DBLP:conf/ercimdl/2003, + editor = {Traugott Koch and + Ingeborg S{\o}lvberg}, + title = {Research and Advanced Technology for Digital Libraries, + 7th European Conference, ECDL 2003, Trondheim, Norway, August + 17-22, 2003, Proceedings}, + booktitle = {ECDL}, + publisher = {Springer}, + series = {Lecture Notes in Computer Science}, + volume = {2769}, + year = {2003}, + isbn = {3-540-40726-X}, + bibsource = {DBLP, http://dblp.uni-trier.de} +} + + +@inproceedings{Joachims96aprobabilistic, + author = {Joachims, Thorsten}, + title = {A Probabilistic Analysis of the {R}occhio Algorithm with {TFIDF} for
Text Categorization}, + booktitle = {Proceedings of the Fourteenth International Conference on Machine Learning}, + series = {ICML '97}, + year = {1997}, + isbn = {1-55860-486-3}, + pages = {143--151}, + numpages = {9}, + acmid = {657278}, + publisher = {Morgan Kaufmann Publishers Inc.}, + address = {San Francisco, CA, USA}, +} + +@incollection{rocchio, + author = {Rocchio, J.}, + title = {Relevance Feedback in Information Retrieval}, + booktitle = {The SMART Retrieval System}, + publisher = {Prentice-Hall}, + pages = {313--323}, + year = {1971} +} + + +@book{Manning:2008:IIR:1394399, + author = {Manning, Christopher D. and Raghavan, Prabhakar and Sch{\"u}tze, Hinrich}, + title = {Introduction to Information Retrieval}, + year = {2008}, + isbn = {0521865719, 9780521865715}, + publisher = {Cambridge University Press}, + address = {New York, NY, USA}, +} + +@article{Fleming:1986:LSC:5666.5673, + author = {Fleming, Philip J. and Wallace, John J.}, + title = {How not to lie with statistics: the correct way to summarize benchmark results}, + journal = {Commun.
ACM}, + volume = {29}, + issue = {3}, + month = {March}, + year = {1986}, + issn = {0001-0782}, + pages = {218--221}, + numpages = {4}, + acmid = {5673}, + publisher = {ACM}, + address = {New York, NY, USA}, +} + + + +@inproceedings{wordnetWikiMapping, +title={Aligning {W}ord{N}et Synsets and {W}ikipedia Articles}, +author={Samuel Fernando and Mark Stevenson}, +booktitle = {Proceedings of the Collaboratively-Built Knowledge Sources and Artificial Intelligence Workshop at 22nd international conference on Machine learning}, +publisher={Association for Computational Linguistics}, +year = {2010} +} + +@inproceedings{Daume:2005:LSO:1102351.1102373, + author = {Daum{\'e} III, Hal and Marcu, Daniel}, + title = {Learning as search optimization: approximate large margin methods for structured prediction}, + booktitle = {Proceedings of the 22nd international conference on Machine learning}, + series = {ICML '05}, + year = {2005}, + isbn = {1-59593-180-5}, + location = {Bonn, Germany}, + pages = {169--176}, + numpages = {8}, + acmid = {1102373}, + publisher = {ACM}, + address = {New York, NY, USA}, +} + +@inproceedings{ KaTo07, + author = {Jun'ichi Kazama and Kentaro Torisawa}, + pages = {698--707}, + title = {Exploiting {W}ikipedia as External Knowledge for Named Entity Recognition}, + booktitle = {Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning}, +series={EMNLP-CoNLL'07}, + year = {2007} +} +@inproceedings{Agirre:2009:SSR:1620754.1620758, + author = {Agirre, Eneko and Alfonseca, Enrique and Hall, Keith and Kravalov\'{a}, Jana and Pa\c{s}ca, Marius and Soroa, Aitor}, + title = {A study on similarity and relatedness using distributional and {W}ord{N}et-based approaches}, + booktitle = {Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational
Linguistics}, + series= {NAACL '09}, + year = {2009}, + isbn = {978-1-932432-41-1}, + location = {Boulder, Colorado}, + pages = {19--27}, + numpages = {9}, + acmid = {1620758}, + publisher = {Association for Computational Linguistics}, + address = {Stroudsburg, PA, USA}, +} + +@inproceedings{Lesk:1986:ASD:318723.318728, + author = {Lesk, Michael}, + title = {Automatic sense disambiguation using machine readable dictionaries: how to tell a pine cone from an ice cream cone}, + booktitle = {Proceedings of the 5th annual international conference on systems documentation}, + series = {SIGDOC '86}, + year = {1986}, + isbn = {0-89791-224-1}, + location = {Toronto, Ontario, Canada}, + pages = {24--26}, + numpages = {3}, + acmid = {318728}, + publisher = {ACM}, + address = {New York, NY, USA}, +} + + + +@article{Tversky77, +author = {Amos Tversky}, +title = {Features of similarity}, +journal = {Psychological Review}, +volume = {84}, +pages = {327--352}, +year = {1977} +} + + +@inproceedings{Pirro:2008:DIE:1483848.1483883, + author = {Pirr\'{o}, Giuseppe and Seco, Nuno}, + title = {Design, Implementation and Evaluation of a New Semantic Similarity Metric Combining Features and Intrinsic Information Content}, + booktitle = {Proceedings of the OTM 2008 Confederated International Conferences, CoopIS, DOA, GADA, IS, and ODBASE 2008. 
Part II on On the Move to Meaningful Internet Systems}, + series = {OTM '08}, + year = {2008}, + isbn = {978-3-540-88872-7}, + location = {Monterrey, Mexico}, + pages = {1271--1288}, + numpages = {18}, + acmid = {1483883}, + publisher = {Springer}, + address = {Berlin, Heidelberg}, + keywords = {Feature Based Similarity, Intrinsic Information Content, Java {W}ord{N}et Similarity Library, Semantic Similarity}, +} + + +@article{rada, + author={Roy Rada and Ellen Bicknell }, + title={Ranking Documents with a Thesaurus}, + journal={Journal of the American Society for Information Science}, + pages={304-310}, + year =1989 +} + +@inproceedings{Lin:1998:IDS:645527.657297, + author = {Lin, Dekang}, + title = {An Information-Theoretic Definition of Similarity}, + booktitle = {Proceedings of the Fifteenth International Conference on Machine Learning}, + series = {ICML '98}, + year = {1998}, + isbn = {1-55860-556-8}, + pages = {296--304}, + numpages = {9}, + url = {http://portal.acm.org/citation.cfm?id=645527.657297}, + acmid = {657297}, + publisher = {Morgan Kaufmann Publishers Inc.}, + address = {San Francisco, CA, USA}, +} + + +@inproceedings{Jiang97taxonomySimilarity, + author = {Jay J. Jiang and + David W. 
Conrath}, + booktitle = {Proceedings of the International Conference on Research in Computational Linguistics}, + interhash = {175ec03ee8c47d4b2d0a083609a78e05}, + intrahash = {c4ffc507dafc908eab62fde53f7e4f7a}, + pages = {19--33}, + title = {Semantic similarity based on corpus statistics and lexical taxonomy}, + url = {http://www.cse.iitb.ac.in/~cs626-449/Papers/WordSimilarity/4.pdf}, + year = 1997, + keywords = {1997 Conrath Jiang JiangConrath folksonomy lexical measure semantic similarity taxonomy}, + added-at = {2010-03-12T16:18:27.000+0100}, + description = {Jiang Conrath Maß}, + file = {jiang1997ssb.pdf:jiang1997ssb.pdf:PDF}, + biburl = {http://www.bibsonomy.org/bibtex/2c4ffc507dafc908eab62fde53f7e4f7a/sdo}, + abstract = {This paper presents a new approach for measuring semantic similarity/distance between words and concepts. It combines a lexical taxonomy structure with corpus statistical information so that the semantic distance between nodes in the semantic space constructed by the taxonomy can be better quantified with the computational evidence derived from a distributional analysis of corpus data. Specifically, the proposed measure is a combined approach that inherits the edge-based approach of the edge counting scheme, which is then enhanced by the node-based approach of the information content calculation. When tested on a common data set of word pair similarity ratings, the proposed approach outperforms other computational models. It gives the highest correlation value (r = 0.828) with a benchmark based on human similarity judgements, whereas an upper bound (r = 0.885) is observed when human subjects replicate the same task.} +} + + + +@article{wordnetWeighting, +author = {Mohammed M. Sakre and Mohammed M. Kouta and Ali M. N. 
Allam}, +title = {Weighting Query Terms Using {W}ord{N}et Ontology}, +year = {2009}, +masid = {6907992}, +journal = {International Journal of Computer Science and Network Security}, +volume = {9}, +pages = {349--358}, +issue = {4}, +} + + @article{Harman:1992:DTP:146565.146567, + author = {Harman, Donna}, + title = {The {DARPA TIPSTER} project}, + journal = {SIGIR Forum}, + volume = {26}, + issue = {2}, + month = {October}, + year = {1992}, + issn = {0163-5840}, + pages = {26--28}, + numpages = {3}, + url = {http://doi.acm.org/10.1145/146565.146567}, + doi = {http://doi.acm.org/10.1145/146565.146567}, + acmid = {146567}, + publisher = {ACM}, + address = {New York, NY, USA}, +} + + + +@article{ner-sekine2007, + author = {David Nadeau and Satoshi Sekine}, + interhash = {2d9a1a5440885a8741a1686f344a9494}, + intrahash = {31a3c28acfe2301ba810b2fccd1ea392}, + journal = {Linguisticae Investigationes}, + number = 1, + pages = {3--26}, + title = {A survey of named entity recognition and classification}, + url = {http://www.ingentaconnect.com/content/jbp/li/2007/00000030/00000001/art00002}, + volume = 30, + year = 2007, + keywords = {ie ner nlp survey}, + added-at = {2009-11-06T17:24:59.000+0100}, + biburl = {http://www.bibsonomy.org/bibtex/231a3c28acfe2301ba810b2fccd1ea392/mirian}, + month = {January} +} + +@inproceedings{ruleml, + author = {Tom\'{a}\v{s} Kliegr and Jan Rauch}, + title = {An {XML} Format for Association Rule Models Based on the {GUHA} Method}, + booktitle = {RuleML-2010, 4th International Web Rule Symposium}, + year = {2010}, + location = {Washington, DC}, + publisher = {Springer-Verlag}, + address = {Berlin, Heidelberg}, + } + +@InProceedings{BekkermanJ07, + author = "Ron Bekkerman and Jiwoon Jeon", + title = "Multi-modal Clustering for Multimedia Collections", + booktitle = "Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition", + year = "2007", + series = "CVPR '07", + location = "Minneapolis, Minnesota" +} + 
+@inproceedings{kulczynski, + author = {Wu, Tianyi and Chen, Yuguo and Han, Jiawei}, + title = {Association Mining in Large Databases: A Re-examination of Its Measures}, + booktitle = {PKDD 2007: Proceedings of the 11th European conference on Principles and Practice of Knowledge Discovery in Databases}, + year = {2007}, + isbn = {978-3-540-74975-2}, + pages = {621--628}, + location = {Warsaw, Poland}, + publisher = {Springer-Verlag}, + address = {Berlin, Heidelberg}, + } + + + +@inproceedings{lispminerpaper, + author = {Milan \v{S}im\r{u}nek}, + title = {Academic {KDD} Project {LIS}p-{M}iner}, + booktitle = {Advances in Soft Computing - Intelligent Systems Design and Applications}, + year = {2003}, + pages = {263--272}, + publisher = {Springer}, + editor = {Ajith Abraham and Katrin Franke and Mario K\"{o}ppen} +} + +@book{wordnet, + day = {15}, + editor = {Fellbaum, Christiane}, + howpublished = {Hardcover}, + isbn = {026206197X}, + keywords = {lexical\_resources, lexicography, semantics, wordnet}, + month = {May}, + posted-at = {2007-11-15 20:28:18}, + priority = {0}, + publisher = {The MIT Press}, + title = {{WordNet}: An Electronic Lexical Database (Language, Speech, and Communication)}, + year = {1998} +} + + + +@INPROCEEDINGS{wittenwiki, + author = {David Milne and Ian H. Witten}, + title = {An Effective, Low-Cost Measure of Semantic Relatedness Obtained from {W}ikipedia Links}, + booktitle = {Proceedings of the first AAAI Workshop on Wikipedia and Artificial Intelligence}, + year = {2008}, + pages = {25--30} +} +@TECHREPORT{Jarmasz03rogetsthesaurus, + author = {Mario Jarmasz}, + title = {Roget's thesaurus as a lexical resource for natural language processing}, + institution = {University of Ottawa}, + year = {2003} +} + +@article{Finkelstein:02, + abstract = {Keyword-based search engines are in widespread use today as a popular means for Web-based information retrieval. 
Although such systems seem deceptively simple, a considerable amount of skill is required in order to satisfy non-trivial information needs. This paper presents a new conceptual paradigm for performing search in context, that largely automates the search process, providing even non-professional users with highly relevant results. This paradigm is implemented in practice in the IntelliZap system, where search is initiated from a text query marked by the user in a document she views, and is guided by the text surrounding the marked query in that document ("the context"). The context-driven information retrieval process involves semantic keyword extraction and clustering to automatically generate new, augmented queries. The latter are submitted to a host of general and domain-specific search engines. Search results are then semantically reranked, using context. Experimental results +testify that using context to guide search, effectively offers even inexperienced users an advanced search tool on the Web.}, + address = {New York, NY, USA}, + author={Lev Finkelstein and Evgeniy Gabrilovich and Yossi Matias and Ehud Rivlin and Zach Solan and Gadi Wolfman and Eytan Ruppin}, + citeulike-article-id = {379844}, + citeulike-linkout-0 = {http://portal.acm.org/citation.cfm?id=503110}, + citeulike-linkout-1 = {http://dx.doi.org/10.1145/503104.503110}, + doi = {10.1145/503104.503110}, + issn = {1046-8188}, + journal = {ACM Transactions on Information Systems}, + keywords = {context}, + month = {January}, + number = {1}, + pages = {116--131}, + posted-at = {2008-02-25 15:54:13}, + priority = {2}, + publisher = {ACM}, + title = {Placing search in context: the concept revisited}, + url = {http://dx.doi.org/10.1145/503104.503110}, + volume = {20}, + year = {2002} +} + +@inproceedings{wikirelate, + author = {Strube, Michael and Ponzetto, Simone Paolo}, + title = {Wiki{R}elate! 
computing semantic relatedness using {W}ikipedia}, + booktitle = {Proceedings of the 21st national conference on Artificial intelligence - Volume 2}, + series = {AAAI'06}, + year = {2006}, + isbn = {978-1-57735-281-5}, + location = {Boston, Massachusetts}, + pages = {1419--1424}, + numpages = {6}, + acmid = {1597414}, + publisher = {AAAI Press}, +} + + + + +@InProceedings{Cucerzan07large-scalenamed, + author = {Cucerzan, Silviu}, + title = {Large-Scale Named Entity Disambiguation Based on {Wikipedia} Data}, + booktitle = {Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning}, +series={EMNLP-CoNLL '07}, + month = {June}, + year = {2007}, + address = {Prague, Czech Republic}, + publisher = {Association for Computational Linguistics}, + pages = {708--716}, + url = {http://www.aclweb.org/anthology/D/D07/D07-1074} +} + +@inproceedings{Gabrilovich07computingsemantic, + author = {Gabrilovich, Evgeniy and Markovitch, Shaul}, + title = {Computing semantic relatedness using {W}ikipedia-based explicit semantic analysis}, + booktitle = {Proceedings of the 20th international joint conference on Artifical intelligence}, + series = {IJCAI'07}, + year = {2007}, + location = {Hyderabad, India}, + pages = {1606--1611}, + numpages = {6}, + acmid = {1625535}, + publisher = {Morgan Kaufmann Publishers Inc.}, + address = {San Francisco, CA, USA}, +} + +@inproceedings{Jiline:2009:BKE:1560466.1560513, + author = {Jiline, Mikhail}, + title = {Background Knowledge Enriched Data Mining for Interactome Analysis}, + booktitle = {Proceedings of the 22nd Canadian Conference on Artificial Intelligence: Advances in Artificial Intelligence}, + series = {Canadian AI '09}, + year = {2009}, + isbn = {978-3-642-01817-6}, + location = {Kelowna, Canada}, + pages = {283--286}, + numpages = {4}, + acmid = {1560513}, + publisher = {Springer-Verlag}, + address = {Berlin, Heidelberg}, +} + + 
+@INPROCEEDINGS{Feldman96miningassociations, + author = {Ronen Feldman and Haym Hirsh}, + title = {Mining Associations in Text in the Presence of Background Knowledge}, + booktitle = {Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD-96)}, + year = {1996}, + pages = {343--346} +} + +@Book{ feldman, + abstract = "{Text mining tries to solve the crisis of information overload by combining techniques from data mining, machine learning, natural language processing, information retrieval, and knowledge management. In addition to providing an in-depth examination of core text mining and link detection algorithms and operations, this book examines advanced pre-processing techniques, knowledge representation considerations, and visualization approaches. Finally, it explores current real-world, mission-critical applications of text mining and link detection in such varied fields as M\&A business intelligence, genomics research and counter-terrorism activities.}", + author = "Ronen Feldman and James Sanger", + day = "11", + howpublished = "Hardcover", + isbn = "0521836573", + keywords = "text\_mining", + month = dec, + posted-at = "2008-05-30 11:38:34", + priority = "2", + publisher = "Cambridge University Press", + title = "The Text Mining Handbook: Advanced Approaches in Analyzing Unstructured Data", + url = "http://www.amazon.com/exec/obidos/redirect?tag=citeulike07-20&path=ASIN/0521836573", + year = "2006" +} + + +@inproceedings{Sekine02, + address = {Canary Islands, Spain}, + author = {Satoshi Sekine and Kiyoshi Sudo and Chikashi Nobata}, + booktitle = {Proceedings of $3^{rd}$ International Conference on Language Resources and Evaluation}, +series={LREC'02}, + editor = {M. Gonz\'ales Rodr\'iguez and C. 
Paz Su\'arez Araujo}, + interhash = {fa75343b70dace4b0fe9f507ab44112b}, + intrahash = {d6a05ce6d3f987f489b673c312773f02}, + pages = {1818--1824}, + title = {Extended Named Entity Hierarchy}, + year = 2002, + timestamp = {2008-03-15T16:50:05.000+0100}, + keywords = {dipl_literatur named_entity_recognition}, + added-at = {2008-03-15T16:50:05.000+0100}, + description = {The big one}, + biburl = {http://www.bibsonomy.org/bibtex/2d6a05ce6d3f987f489b673c312773f02/danielt}, + month = May +} + + + +@article{googledistance, + author = {Cilibrasi, Rudi L. and Vitanyi, Paul M. B.}, + title = {The {G}oogle Similarity Distance}, + journal = {IEEE Transactions on Knowledge and Data Engineering}, + volume = {19}, + issue = {3}, + month = {March}, + year = {2007}, + issn = {1041-4347}, + pages = {370--383}, + numpages = {14}, + url = {http://portal.acm.org/citation.cfm?id=1263132.1263333}, + doi = {10.1109/TKDE.2007.48}, + acmid = {1263333}, + publisher = {IEEE Educational Activities Department}, + address = {Piscataway, NJ, USA}, +} + + +@misc{esaimpl, +author = {Henning Jacobs}, +year= 2007, +title={ Explicit Semantic Analysis ({ESA}) using +{W}ikipedia}, +note={ Retrieved June 7, 2011 from +http://www.srcco.de/v/wikipedia-esa} +} +@INPROCEEDINGS{Milne_anopen-source, + author = {David Milne}, + title = {An open-source toolkit for mining {W}ikipedia}, + booktitle = {Proceedings of the New Zealand Computer Science Research Student Conference}, + year = {2009}, + pages = {} +} + +@inproceedings{miApp, +author = { Jan Rauch and Milan \v{S}im\r{u}nek}, +title={Applying Domain +Knowledge in Association Rules Mining Process -- First Experience.}, +booktitle={ISMIS'11}, +address={Berlin}, +publisher={Springer}, +year= 2011, +pages={113-122} +} + + + + +@Inbook{miMain, +author="Rauch, Jan", +editor="Ras, Zbigniew W. 
+and Dardzinska, Agnieszka", +title="Considerations on Logical Calculi for Dealing with Knowledge in Data Mining", +bookTitle="Advances in Data Management", +year="2009", +publisher="Springer Berlin Heidelberg", +address="Berlin, Heidelberg", +pages="177--199", +isbn="978-3-642-02190-9" +} + + +@inproceedings{rrsubm, + author = {Tom\'{a}\v{s} Kliegr and Andrej Hazucha and Tom\'{a}\v{s} Marek}, + title = {Instant feedback on discovered association rules with {PMML}-based Query-by-Example}, +booktitle={Web Reasoning and Rule Systems}, +publisher={Springer}, +year ={ 2011} + } + @inproceedings{DBLP:conf/pkdd/SkrabalSVHMCK12, + author = {\v{S}krabal, Radek and + \v{S}im\r{u}nek, Milan and + Stanislav Voj\'{\i}\v{r} and + Andrej Hazucha and + Tom{\'a}\v{s} Marek and + David Chud{\'a}n and + Tom{\'a}\v{s} Kliegr}, + title = {Association Rule Mining Following the Web Search Paradigm}, + booktitle = {ECML/PKDD (2)}, + year = {2012}, + pages = {808-811}, + ee = {http://dx.doi.org/10.1007/978-3-642-33486-3_52}, + crossref = {DBLP:conf/pkdd/2012-2}, + bibsource = {DBLP, http://dblp.uni-trier.de} +} + +@proceedings{DBLP:conf/pkdd/2012-2, + editor = {Peter A. Flach and + Tijl De Bie and + Nello Cristianini}, + title = {Machine Learning and Knowledge Discovery in Databases - + European Conference, ECML PKDD 2012, Bristol, UK, September + 24-28, 2012. 
Proceedings, Part II}, + booktitle = {ECML/PKDD (2)}, + publisher = {Springer}, + series = {Lecture Notes in Computer Science}, + volume = {7524}, + year = {2012}, + isbn = {978-3-642-33485-6}, + ee = {http://dx.doi.org/10.1007/978-3-642-33486-3}, + bibsource = {DBLP, http://dblp.uni-trier.de} +} + + +@inproceedings{ecmlsubm, + author = {Andrej Hazucha and Milan \v{S}im\r{u}nek and Tom\'{a}\v{s} Kliegr}, + title = { Instant Feedback Visual Association Rule Mining based on the {GUHA} Method }, + } + +@inproceedings{zn11ch, + author = {Tom\'{a}\v{s} Kliegr and Andrej Hazucha and David Chudan and Jan Rauch}, + title = {{SEWEBAR-CMS} as a support for teaching data mining}, + booktitle = {Proceedings of Znalosti'11}, + year = {2011}, + location = {Stara Lesna, Slovakia}, + note = {In Czech.}, + } + + + + +@inproceedings{Lesk:1986:ASD:318723.318728, + author = {Lesk, Michael}, + title = {Automatic sense disambiguation using machine readable dictionaries: how to tell a pine cone from an ice cream cone}, + booktitle = {Proceedings of the 5th annual international conference on Systems documentation}, + series = {SIGDOC '86}, + year = {1986}, + isbn = {0-89791-224-1}, + location = {Toronto, Ontario, Canada}, + pages = {24--26}, + numpages = {3}, + url = {http://doi.acm.org/10.1145/318723.318728}, + doi = {http://doi.acm.org/10.1145/318723.318728}, + acmid = {318728}, + publisher = {ACM}, + address = {New York, NY, USA}, +} + +@inproceedings{semcor, + author = {Miller, George A. 
and Leacock, Claudia and Tengi, Randee and Bunker, Ross T.}, + title = {A semantic concordance}, + booktitle = {Proceedings of the workshop on Human Language Technology}, + series = {HLT '93}, + year = {1993}, + isbn = {1-55860-324-7}, + location = {Princeton, New Jersey}, + pages = {303--308}, + numpages = {6}, + url = {http://dx.doi.org/10.3115/1075671.1075742}, + doi = {http://dx.doi.org/10.3115/1075671.1075742}, + acmid = {1075742}, + publisher = {Association for Computational Linguistics}, + address = {Stroudsburg, PA, USA}, +} + +@inproceedings{Lin98aninformation, + author = {Dekang Lin}, + title = {An Information-Theoretic Definition of Similarity}, + booktitle = {Proceedings of the 15th International Conference on Machine Learning}, + year = {1998}, + pages = {296--304}, + publisher = {Morgan Kaufmann} +} + + +@book{kucera, + added-at = {2007-11-01T22:51:30.000+0100}, + author = {Francis, Winthrop Nelson and Ku\v{c}era, Henry}, + biburl = {http://www.bibsonomy.org/bibtex/2c1fc73284ae676d7b3855bc41b31c567/stumme}, + interhash = {0c62e7ae7fdcda5c360550ec149b79c6}, + intrahash = {c1fc73284ae676d7b3855bc41b31c567}, + keywords = {brown corpus}, + publisher = {Houghton Mifflin}, + timestamp = {2007-11-01T22:51:30.000+0100}, + title = {Frequency Analysis of {E}nglish Usage: Lexicon and Grammar}, + year = 1983 +} +@inproceedings{agirreDisambiguation, +author = {Eneko Agirre and German Rigau and Pau Gargallo}, +title = {A Proposal for Word Sense Disambiguation using Conceptual Distance}, +booktitle = {Proceedings of the Recent Advances in Natural Language Processing}, +year = {1995}, +series={RANLP '95}, +masid = {2363681} +} + + +@inproceedings{WuPalmer, + author = {Wu, Zhibiao and Palmer, Martha}, + title = {Verb semantics and lexical selection}, + booktitle = {Proceedings of the 32nd annual meeting on Association for Computational Linguistics}, + series = {ACL '94}, + year = {1994}, + location = {Las Cruces, New Mexico}, + pages = 
{133--138}, + numpages = {6}, + url = {http://dx.doi.org/10.3115/981732.981751}, + doi = {http://dx.doi.org/10.3115/981732.981751}, + acmid = {981751}, + publisher = {Association for Computational Linguistics}, + address = {Stroudsburg, PA, USA}, +} + + +@inproceedings{leacockChodorov, + author = {Leacock, Claudia and Chodorow, Martin}, + booktitle = {{W}ord{N}et: A Lexical Reference System and its Application}, + citeulike-article-id = {1259480}, + keywords = {bibtex-import, phd-reading}, + posted-at = {2007-04-27 10:03:10}, + priority = {2}, + title = {Combining local context with {WordNet} similarity for word sense identification}, + year = {1998} +} + +@article{rada, + abstract = {Motivated by the properties of spreading activation and conceptual +distance, the authors propose a metric, called distance, on the power +set of nodes in a semantic net. Distance is the average minimum path +length over all pairwise combinations of nodes between two subsets of +nodes. Distance can be successfully used to assess the conceptual +distance between sets of concepts when used on a semantic net of +hierarchical relations. When other kinds of relationships, like `cause', +are used, distance must be amended but then can again be effective. The +judgements of distance significantly correlate with the distance +judgements that people make and help to determine whether one semantic +net is better or worse than another. The authors focus on the +mathematical characteristics of distance that presents novel cases and +interpretations. Experiments in which distance is applied to pairs of +concepts and to sets of concepts in a hierarchical knowledge base show +the power of hierarchical relations in representing information about +the conceptual distance between concepts}, + author = {Rada, R. and Mili, H. and Bicknell, E. 
and Blettner, M.}, + citeulike-article-id = {1607377}, + citeulike-linkout-0 = {http://dx.doi.org/10.1109/21.24528}, + citeulike-linkout-1 = {http://ieeexplore.ieee.org/xpls/abs\_all.jsp?arnumber=24528}, + doi = {10.1109/21.24528}, + issn = {00189472}, + journal = {IEEE Transactions on Systems, Man, and Cybernetics}, + keywords = {mesh, semantic\_similarity}, + month = jan, + number = {1}, + pages = {17--30}, + posted-at = {2007-08-30 15:38:11}, + priority = {2}, + title = {Development and application of a metric on semantic nets}, + url = {http://dx.doi.org/10.1109/21.24528}, + volume = {19}, + year = {1989} +} + +@inproceedings{deerwesterLSI, + address = {Atlanta, Georgia}, + author = {Deerwester, Scott}, + booktitle = {Proceedings of the 51st ASIS Annual Meeting}, + citeulike-article-id = {973254}, +series={ASIS '88}, + editor = {Borgman, Christine L. and Pai, Edward Y. H.}, + keywords = {bibtex-import}, + month = oct, + organization = {American Society for Information Science}, + posted-at = {2006-12-04 15:07:47}, + priority = {2}, + title = {Improving Information Retrieval with {L}atent {S}emantic {I}ndexing}, + volume = {25}, + year = {1988} +} + + +@inproceedings{linkgrammar, + abstract = {{We define a new formal grammatical system called a link grammar . A sequence of words is in the language of a link grammar if there is a way to draw links between words in such a way that (1) the local requirements of each word are satisfied, (2) the links do not cross, and (3) the words form a connected graph. We have encoded English grammar into such a system, and written a program (based on new algorithms) for efficiently parsing with a link grammar. The formalism is lexical and makes no...}}, + author = {Daniel D. K. 
Sleator and Davy Temperley}, + booktitle = {Proceedings of the 3rd International Workshop on Parsing Technologies}, + citeulike-article-id = {1576584}, + citeulike-linkout-0 = {http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.12.1238}, + keywords = {nlp, parsing}, + posted-at = {2007-08-20 11:15:57}, + priority = {2}, + title = {Parsing {E}nglish with a link grammar}, + url = {http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.12.1238}, + year = {1993} +} + +@ARTICLE{Miller90wordnet:an, + author = {George A. Miller and Richard Beckwith and Christiane Fellbaum and Derek Gross and Katherine Miller}, + title = {{W}ord{N}et: An on-line lexical database}, + journal = {International Journal of Lexicography}, + year = {1990}, + volume = {3}, + pages = {235--244} +} + +@article{DBLP:journals/cacm/Miller95, + author = {George A. Miller}, + title = {{W}ord{N}et: A Lexical Database for English}, + journal = {Commun. ACM}, + volume = {38}, + number = {11}, + year = {1995}, + pages = {39-41}, + bibsource = {DBLP, http://dblp.uni-trier.de} +} + + + + +@article{citeulike:160044, +title={Maximizing semantic relatedness to perform word sense disambiguation}, +volume={25}, url={http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.122.6537&rep=rep1&type=pdf}, + journal={Research Report}, +publisher={University of Minnesota}, +author={Pedersen, Ted and Banerjee, Satanjeev and Patwardhan, Siddharth}, +year={2005}, +pages={2005–25}} + +@article{DBLP:journals/dke/OlivaSCI11, + author = {Jesus Oliva and + Jose Ignacio Serrano and + Mar\'{\i}a Dolores del Castillo and + {\'A}ngel Iglesias}, + title = {SyMSS: A syntax-based measure for short-text semantic similarity}, + journal = {Data \& Knowledge Engineering}, + volume = {70}, + number = {4}, + year = {2011}, + pages = {390-405}, + ee = {http://dx.doi.org/10.1016/j.datak.2011.01.002}, + bibsource = {DBLP, http://dblp.uni-trier.de} +} + +@incollection {THDgermanRecent_short, + author = {Litz, Berenike and Langer, Hagen and 
Malaka, Rainer}, + affiliation = {TZI, University of Bremen, Bremen, Germany}, + title = {Sequential Supervised Learning for Hypernym Discovery from {W}ikipedia}, + booktitle = {Knowledge Discovery, Knowledge Engineering and Knowledge Management}, + publisher = {Springer-Verlag}, + address={Berlin}, + isbn = {978-3-642-19032-2}, + keyword = {Computer Science}, + pages = {68--80}, + volume = {128}, + year = {2011} +} +@incollection {THDgermanRecent, + author = {Litz, Berenike and Langer, Hagen and Malaka, Rainer}, + affiliation = {TZI, University of Bremen, Bremen, Germany}, + title = {Sequential Supervised Learning for Hypernym Discovery from {W}ikipedia}, + booktitle = {Knowledge Discovery, Knowledge Engineering and Knowledge Management}, + series = {Communications in Computer and Information Science}, + editor = {Fred, Ana and Dietz, Jan L. G. and Liu, Kecheng and Filipe, Joaquim}, + publisher = {Springer-Verlag}, + address={Berlin Heidelberg}, + isbn = {978-3-642-19032-2}, + keyword = {Computer Science}, + pages = {68--80}, + volume = {128}, + year = {2011} +} + +@book{introtoSVMs, + abstract = {This is the first comprehensive introduction to Support Vector Machines ({SVMs}), a new generation learning system based on recent advances in statistical learning theory. Students will find the book both stimulating and accessible, while practitioners will be guided smoothly through the material required for a good grasp of the theory and its applications. The concepts are introduced gradually in accessible and self-contained stages, while the presentation is rigorous and thorough. 
Pointers to relevant literature and web sites containing software make it an ideal starting point for further study.}, + author = {Cristianini, Nello and Shawe-Taylor, John}, + day = {28}, + edition = {1st}, + howpublished = {Hardcover}, + isbn = {0521780195}, + keywords = {book, regularization, statistical-learning-theory, svm}, + month = mar, + posted-at = {2005-03-05 09:49:31}, + priority = {2}, + publisher = {Cambridge University Press}, + title = {An introduction to support vector machines: and other kernel-based learning methods}, + year = {2000} +} + +@inproceedings{Wang:2008:BSK:1401890.1401976, + author = {Wang, Pu and Domeniconi, Carlotta}, + title = {Building semantic kernels for text classification using {W}ikipedia}, + booktitle = {Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining}, + series = {KDD '08}, + year = {2008}, + isbn = {978-1-60558-193-4}, + location = {Las Vegas, Nevada, USA}, + pages = {713--721}, + numpages = {9}, + url = {http://doi.acm.org/10.1145/1401890.1401976}, + doi = {http://doi.acm.org/10.1145/1401890.1401976}, + acmid = {1401976}, + publisher = {ACM}, + address = {New York, NY, USA}, + keywords = {kernel methods, semantic kernels, text classification, wikipedia}, +} + + +@inproceedings{Wang:2007:ITC:1441428.1442085, + author = {Wang, Pu and Hu, Jian and Zeng, Hua-Jun and Chen, Lijun and Chen, Zheng}, + title = {Improving Text Classification by Using Encyclopedia Knowledge}, + booktitle = {Proceedings of the 7th IEEE International Conference on Data Mining}, + year = {2007}, + isbn = {0-7695-3018-4}, + pages = {332--341}, + numpages = {10}, + url = {http://portal.acm.org/citation.cfm?id=1441428.1442085}, + doi = {10.1109/ICDM.2007.77}, + acmid = {1442085}, + publisher = {IEEE Computer Society}, + address = {Washington, DC, USA}, +} + + +@incollection {gridmining, + 
author = {\v{S}im\r{u}nek, Milan and Tammisto, Teppo}, + affiliation = {University of Economics Prague Czech Republic}, + title = {Distributed Data-Mining in the LISp-Miner System Using Techila Grid}, + booktitle = {Networked Digital Technologies}, + series = {Communications in Computer and Information Science}, + editor = {Zavoral, Filip and Yaghob, Jakub and Pichappan, Pit and El-Qawasmeh, Eyas}, + publisher = {Springer Berlin Heidelberg}, + isbn = {978-3-642-14292-5}, + keyword = {Computer Science}, + pages = {15--20}, + volume = {87}, + year = {2010} +} + +@InProceedings{ znalosti11, + author = "Luk{\'a}\v{s} Beranek and Andrej Hazucha and Tom{\'a}\v{s} Marek and Tom{\'a}\v{s} Kliegr", + title = "Searching {A}ssociation {R}ules - fulltext, structured or semantic search?", + booktitle = "Znalosti 2011 Proceedings", + year = "2011", + location = "Star{\'a} Lesn{\'a}", + publisher = "V\v{S}B-TU Ostrava", + note = "In Czech" +} + +@InProceedings{ tmra10, + author = "Andrej Hazucha and Jakub Balhar and Tom\'{a}\v{s} Kliegr", + title = "A {PHP} library for {Ontopia-CMS} Integration", + booktitle = "TMRA 2010", + year = "2010", + publisher = "University of Leipzig" +} + + +@article{Hajek201034, +title = "The {GUHA} method and its meaning for data mining", +journal = "Journal of Computer and System Sciences", +volume = "76", +number = "1", +pages = "34--48", +year = "2010", +note = "Special Issue on Intelligent Data Analysis", +issn = "0022-0000", +author = "Petr H\'{a}jek and Martin Hole\v{n}a and Jan Rauch", +keywords = "GUHA method, Data mining, LISp-Miner, Fuzzy hypotheses" +} +@article{Geng:2006:IMD:1132960.1132963, + author = {Geng, Liqiang and Hamilton, Howard J.}, + title = {Interestingness measures for data mining: A survey}, + journal = {ACM Comput. 
Surv.}, + volume = {38}, + issue = {3}, + month = {September}, + year = {2006}, + issn = {0360-0300}, + articleno = {9}, + url = {http://doi.acm.org/10.1145/1132960.1132963}, + doi = {http://doi.acm.org/10.1145/1132960.1132963}, + acmid = {1132963}, + publisher = {ACM}, + address = {New York, NY, USA}, + keywords = {Knowledge discovery, association rules, classification rules, interest measures, interestingness measures, summaries}, +} + + +@article{citeulike:964200, + abstract = {'{ReMoDiscovery}' is an intuitive algorithm to correlate regulatory programs with regulators and corresponding motifs to a set of co-expressed genes. It exploits in a concurrent way three independent data sources: {ChIP}-chip data, motif information and gene expression profiles. When compared to published module discovery algorithms, {ReMoDiscovery} is fast and easily tunable. We evaluated our method on yeast data, where it was shown to generate biologically meaningful findings and allowed the prediction of potential novel roles of transcriptional regulators.}, + address = {BIOI@SCD, Department of Electrical Engineering, KU Leuven, Kasteelpark Arenberg, B-3001 Heverlee, Belgium.}, + author = {Lemmens, Karen and Dhollander, Thomas and De Bie, Tijl and Monsieurs, Pieter and Engelen, Kristof and Smets, Bart and Winderickx, Joris and De Moor, Bart and Marchal, Kathleen}, + citeulike-article-id = {964200}, + citeulike-linkout-0 = {http://dx.doi.org/10.1186/gb-2006-7-5-r37}, + citeulike-linkout-1 = {http://view.ncbi.nlm.nih.gov/pubmed/16677396}, + citeulike-linkout-2 = {http://www.hubmed.org/display.cgi?uids=16677396}, + doi = {10.1186/gb-2006-7-5-r37}, + issn = {1465-6906}, + journal = {Genome Biology}, + keywords = {hyper-clustering}, + number = {5}, + pages = {R37+}, + pmid = {16677396}, + posted-at = {2008-10-15 21:32:29}, + priority = {2}, + title = {Inferring transcriptional modules from {ChIP}-chip, motif and microarray data}, + url = {http://dx.doi.org/10.1186/gb-2006-7-5-r37}, + volume 
= {7}, + year = {2006} +} + +@inproceedings{Bie:2010:FMI:1816112.1816117, + author = {De Bie, Tijl and Kontonasios, Kleanthis-Nikolaos and Spyropoulou, Eirini}, + title = {A framework for mining interesting pattern sets}, + booktitle = {Useful Patterns}, + series = {UP '10}, + year = {2010}, + isbn = {978-1-4503-0216-6}, + location = {Washington, DC}, + pages = {27--35}, + numpages = {9}, + url = {http://doi.acm.org/10.1145/1816112.1816117}, + doi = {http://doi.acm.org/10.1145/1816112.1816117}, + acmid = {1816117}, + publisher = {ACM}, + address = {New York, NY, USA}, + keywords = {maximum entropy, pattern set mining, prior information, subjective interestingness measures}, +} + + +@INPROCEEDINGS{Tuzhilin95onsubjective, + author = {Alexander Tuzhilin}, + title = {On subjective measures of interestingness in knowledge discovery}, + booktitle = {Proceedings of the First International Conference on Knowledge Discovery and Data Mining}, + year = {1995}, + pages = {275--281}, + publisher = {AAAI Press} +} +@Article{ doi:10.1175/WAF-D-10-05029.1, + author = "Ruixin Yang and Jiang Tang and Donglian Sun", + title = "Association Rule Data Mining Applications for Atlantic Tropical Cyclone Intensity Changes", + journal = "Weather and Forecasting", + volume = "0", + number = "0", + pages = "null", + year = "0", + doi = "10.1175/WAF-D-10-05029.1", + URL = "http://journals.ametsoc.org/doi/abs/10.1175/WAF-D-10-05029.1", + eprint = "http://journals.ametsoc.org/doi/pdf/10.1175/WAF-D-10-05029.1" +} + + + +@Book{ citeulike:2162869, + author = "Ronen Feldman and James Sanger", + day = "11", + howpublished = "Hardcover", + isbn = "0521836573", + keywords = "text\_mining", + month = dec, + posted-at = "2008-05-30 11:38:34", + priority = "2", + publisher = "Cambridge University Press", + title = "{The Text Mining Handbook: Advanced Approaches in Analyzing Unstructured Data}", + url = "http://www.amazon.com/exec/obidos/redirect?tag=citeulike07-20&path=ASIN/0521836573", + year = "2006" +} 
+ + + +@Book{ citeulike:975464, + author = "Bing Liu", + day = "21", + edition = "1st ed. 2007. Corr. 2nd printing", + howpublished = "Hardcover", + isbn = "3540378812", + keywords = "data-mining, text-mining", + month = jan, + posted-at = "2008-02-07 19:59:15", + priority = "2", + publisher = "Springer", + title = "{Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data (Data-Centric Systems and Applications)}", + url = "http://www.amazon.com/exec/obidos/redirect?tag=citeulike07-20&path=ASIN/3540378812", + year = "2009" +} + + + +@article{Wu:2007:TAD:1327434.1327436, + author = {Wu, Xindong and Kumar, Vipin and Ross Quinlan, J. and Ghosh, Joydeep and Yang, Qiang and Motoda, Hiroshi and McLachlan, Geoffrey J. and Ng, Angus and Liu, Bing and Yu, Philip S. and Zhou, Zhi-Hua and Steinbach, Michael and Hand, David J. and Steinberg, Dan}, + title = {Top 10 algorithms in data mining}, + journal = {Knowl. Inf. Syst.}, + volume = {14}, + issue = {1}, + month = {December}, + year = {2007}, + issn = {0219-1377}, + pages = {1--37}, + numpages = {37}, + acmid = {1327436}, + publisher = {Springer-Verlag New York, Inc.}, + address = {New York, NY, USA}, +} + +@inproceedings{postprocessingAROntologies, +author = {Claudia Marinica and Fabrice Guillet and Henri Briand}, +title = {Post-Processing of Discovered Association Rules Using Ontologies}, +booktitle = {IEEE International Conference on Data Mining}, +year = {2008}, +pages = {126--133}, +doi = {10.1109/ICDMW.2008.87}, +masid = {4716217} +} + +@inproceedings{DBLP:conf/dawak/BerkaR10, + author = {Petr Berka and + Jan Rauch}, + title = {Meta-learning for Post-processing of Association Rules}, + booktitle = {DaWak}, + year = {2010}, + pages = {251-262}, + ee = {http://dx.doi.org/10.1007/978-3-642-15105-7_20}, + crossref = {DBLP:conf/dawak/2010}, + bibsource = {DBLP, http://dblp.uni-trier.de} +} + +@inproceedings{rulemldemo, + author = {Tom\'{a}\v{s} Kliegr and Andrej Hazucha and David Chudan and Jan Rauch}, + title = { 
{SEWEBAR-CMS}: A System for Postprocessing Data Mining Models }, + booktitle = {RuleML-2010, 4th International Web Rule Symposium}, + year = {2010}, + location = {Washington, DC}, + publisher = {CEUR-WS}, + } + + + +@inproceedings{ruleml, + author = {Tom\'{a}\v{s} Kliegr and Jan Rauch}, + title = {An {XML} Format for Association Rule Models Based on {GUHA} Method}, + booktitle = {RuleML-2010, 4th International Web Rule Symposium}, + year = {2010}, + location = {Washington, DC}, + publisher = {Springer-Verlag}, + address = {Berlin, Heidelberg}, + } + +@InProceedings{BekkermanJ07, + + author = "R. Bekkerman and J. Jeon", + + title = "Multi-modal Clustering for Multimedia Collections", + + booktitle = "CVPR-07: Proceedings of the IEEE Computer Society Conference + on Computer Vision and Pattern Recognition", + + year = "2007", + + location = "Minneapolis, Minnesota" + +} + +@inproceedings{kulczynski, + author = {Wu, Tianyi and Chen, Yuguo and Han, Jiawei}, + title = {Association Mining in Large Databases: A Re-examination of Its Measures}, + booktitle = {PKDD 2007: Proceedings of the 11th European conference on Principles and Practice of Knowledge Discovery in Databases}, + year = {2007}, + isbn = {978-3-540-74975-2}, + pages = {621--628}, + location = {Warsaw, Poland}, + publisher = {Springer-Verlag}, + address = {Berlin, Heidelberg}, + } + + + +@inproceedings{lispminerpaper, + author = {Milan \v{S}im\r{u}nek}, + title = {Academic {KDD} Project {LIS}p-{M}iner}, + booktitle = {Advances in Soft Computing - Intelligent Systems Design and Applications}, + year = {2003}, + pages = {263-272}, + publisher={Springer}, + editor={Ajith Abraham and Katrin Franke and Mario K{\"o}ppen} +} + +@book{wordnet, + day = {15}, + edition = {illustrated edition}, + editor = {Fellbaum, Christiane}, + howpublished = {Hardcover}, + isbn = {026206197X}, + keywords = {lexical\_resources, lexicography, semantics, wordnet}, + month = {May}, + posted-at = {2007-11-15 20:28:18}, + priority =
{0}, + publisher = {The MIT Press}, + title = {{W}ord{N}et: An Electronic Lexical Database (Language, Speech, and Communication)}, + url = {http://www.amazon.com/exec/obidos/redirect?tag=citeulike07-20&path=ASIN/026206197X}, + year = {1998} +} + + +@inproceedings{refiningtheMFSbaseline, + author = {Preiss, Judita and Dehdari, Jon and King, Josh and Mehay, Dennis}, + title = {Refining the most frequent sense baseline}, + booktitle = {Proceedings of the Workshop on Semantic Evaluations: Recent Achievements and Future Directions}, + series = {DEW '09}, + year = {2009}, + isbn = {978-1-932432-31-2}, + location = {Boulder, Colorado}, + pages = {10--18}, + numpages = {9}, + url = {http://dl.acm.org/citation.cfm?id=1621969.1621973}, + acmid = {1621973}, + publisher = {Association for Computational Linguistics}, + address = {Stroudsburg, PA, USA}, +} + +@INPROCEEDINGS{wittenwiki, + author = {David Milne and Ian H. Witten}, + title = {An Effective, Low-Cost Measure of Semantic Relatedness Obtained from {W}ikipedia Links}, + booktitle = {Proceedings of AAAI'08: the 23rd conference on + Artificial intelligence}, + year = {2008} +} +@TECHREPORT{Jarmasz03rogetsthesaurus, + author = {Mario Jarmasz}, + title = {Roget's thesaurus as a lexical resource for natural language processing}, + institution = {University of Ottawa}, +note ={Master's thesis}, + year = {2003} +} + +@article{Finkelstein:02, + abstract = {Keyword-based search engines are in widespread use today as a popular means for Web-based information retrieval. Although such systems seem deceptively simple, a considerable amount of skill is required in order to satisfy non-trivial information needs. This paper presents a new conceptual paradigm for performing search in context, that largely automates the search process, providing even non-professional users with highly relevant results.
This paradigm is implemented in practice in the IntelliZap system, where search is initiated from a text query marked by the user in a document she views, and is guided by the text surrounding the marked query in that document ("the context"). The context-driven information retrieval process involves semantic keyword extraction and clustering to automatically generate new, augmented queries. The latter are submitted to a host of general and domain-specific search engines. Search results are then semantically reranked, using context. Experimental results +testify that using context to guide search, effectively offers even inexperienced users an advanced search tool on the Web.}, + address = {New York, NY, USA}, + author={Lev Finkelstein and Evgeniy Gabrilovich and Yossi Matias and Ehud Rivlin and Zach Solan and Gadi Wolfman and Eytan Ruppin}, + citeulike-article-id = {379844}, + citeulike-linkout-0 = {http://portal.acm.org/citation.cfm?id=503110}, + citeulike-linkout-1 = {http://dx.doi.org/10.1145/503104.503110}, + doi = {10.1145/503104.503110}, + issn = {1046-8188}, + journal = {ACM Trans. Inf. Syst.}, + keywords = {context}, + month = {January}, + number = {1}, + pages = {116--131}, + posted-at = {2008-02-25 15:54:13}, + priority = {2}, + publisher = {ACM}, + title = {Placing search in context: the concept revisited}, + url = {http://dx.doi.org/10.1145/503104.503110}, + volume = {20}, + year = {2002} +} + +@InProceedings{ wikirelate, + abstract = "Wikipedia provides a knowledge base for computing word relatedness in a more structured fashion than a search engine and with more coverage than WordNet. In this work we present experiments on using Wikipedia for computing semantic relatedness and compare it to WordNet on various benchmarking datasets. Existing relatedness measures perform better using Wikipedia than a baseline given by Google counts, and we show that Wikipedia outperforms WordNet when applied to the largest available dataset designed for that purpose. 
The best results on this dataset are obtained by integrating Google, WordNet and Wikipedia based measures. We also show that including Wikipedia improves the performance of an NLP application processing naturally occurring texts.", + author = "Michael Strube and Simone P. Ponzetto", + booktitle = "Proceedings of the 21st conference on + Artificial intelligence (AAAI'06)", + citeulike-article-id = "7466320", + citeulike-linkout-0 = "http://portal.acm.org/citation.cfm?id=1597414", + isbn = "978-1-57735-281-5", + keywords = "bioinformatics, michael-strube, simone-ponzetto, wikipedia", + location = "Boston, Massachusetts", + pages = "1419--1424", + posted-at = "2010-07-12 16:48:40", + priority = "2", + publisher = "AAAI Press", + title = "WikiRelate! computing semantic relatedness using {W}ikipedia", + url = "http://portal.acm.org/citation.cfm?id=1597414", + year = "2006" +} + + +@INPROCEEDINGS{Cucerzan07large-scalenamed, + author = {Silviu Cucerzan}, + title = {Large-scale named entity disambiguation based on {W}ikipedia data}, + booktitle = {EMNLP-CoNLL'07: Proceedings of Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning}, + year = {2007}, + pages = {708--716} +} + +@INPROCEEDINGS{Gabrilovich07computingsemantic, + author = {Evgeniy Gabrilovich and Shaul Markovitch}, + title = {Computing semantic relatedness using {W}ikipedia-based explicit semantic analysis}, + booktitle = {IJCAI'07: Proceedings of the 20th International Joint Conference on + Artificial Intelligence}, + year = {2007}, + pages = {1606--1611} +} + +@INPROCEEDINGS{pmmlExtensionAttemptXMLSchema, +author={Dietrich Wettschereck and Stefan Mueller}, +title={Exchanging data mining models with the {P}redictive {M}odel {M}arkup {L}anguage}, +booktitle={Proceedings of the ECML/PKDD-01 Worksh. on Integr. of DM Decision Supp.
and Meta-Learning}, +pages={55-66} +} + + +@INPROCEEDINGS{pmmlExtensionAttemptDTD, + author = {Dietrich Wettschereck}, + title = {A KDDSE-independent PMML Visualizer}, + booktitle = {Proceedings of IDDM-02, workshop on Integration aspects of Decision Support and Data Mining, (Eds.) Bohanec}, + year = {2002} +} + +@misc{pkdd:99, +title={Workshop Notes on {PKDD}'99 Discovery Challenge}, + note={Prague, Czech Republic}, +year={1999} +} + +@inproceedings{DBLP:conf/grc/RauchS05, + author = {Jan Rauch and + Milan \v{S}im\r{u}nek}, + title = {GUHA method and granular computing}, + booktitle = {GrC}, + year = {2005}, + pages = {630-635}, + ee = {http://doi.ieeecomputersociety.org/10.1109/GRC.2005.1547368}, + crossref = {DBLP:conf/grc/2005}, + bibsource = {DBLP, http://dblp.uni-trier.de} +} + +@proceedings{DBLP:conf/grc/2005, + editor = {Xiaohua Hu and + Qing Liu and + Andrzej Skowron and + Tsau Young Lin and + Ronald R. Yager and + Bo Zhang}, + title = {2005 IEEE International Conference on Granular Computing, + Beijing, China, July 25-27, 2005}, + booktitle = {GrC}, + publisher = {IEEE}, + year = {2005}, + isbn = {0-7803-9017-2}, + bibsource = {DBLP, http://dblp.uni-trier.de} + } + +@misc{kiwi, +title={Knowledge in a wiki}, +note={www.kiwi-project.eu} +} + +@misc{ontopia, +title={Ontopia Knowledge Suite}, +note={http://ontopia.net} +} + +@misc{iks, +title={Interactive Knowledge Stack}, +note={http://www.iks-project.eu} +} + +@misc{itms, +author={Lars Marius Garshol}, +title={Topic Maps in Content Management}, +note={http://ontopia.net/topicmaps/materials/itms.html} +} + +@InCollection{ DBLP:series/sci/Rauch08, + author = "Jan Rauch", + title = "Classes of Association Rules: An Overview", + booktitle = "Data Mining: Foundations and Practice", + year = "2008", + pages = "315--337", + ee = "http://dx.doi.org/10.1007/978-3-540-78488-3\_19", + crossref = "DBLP:series/sci/2008-118", + bibsource = "DBLP, http://dblp.uni-trier.de" +} + +@InProceedings{ DBLP:conf/cidm/OlaruMG09,
+ author = "Andrei Olaru and Claudia Marinica and Fabrice Guillet", + title = "Local mining of Association Rules with Rule Schemas", + booktitle = "CIDM", + year = "2009", + pages = "118--124", + ee = "http://dx.doi.org/10.1109/CIDM.2009.4938638", + crossref = "DBLP:conf/cidm/2009", + bibsource = "DBLP, http://dblp.uni-trier.de" +} + +@InProceedings{ DBLP:conf/dasfaa/ZhangJHNZ07, + author = "Xiaodan Zhang and Liping Jing and Xiaohua Hu and Michael Ng and Xiaohua Zhou", + title = "A Comparative Study of Ontology Based Term Similarity Measures on PubMed Document Clustering", + booktitle = "DASFAA", + year = "2007", + pages = "115--126", + ee = "http://dx.doi.org/10.1007/978-3-540-71703-4\_12", + crossref = "DBLP:conf/dasfaa/2007", + bibsource = "DBLP, http://dblp.uni-trier.de" +} + +@InProceedings{ DBLP:conf/kdd/Berendt05, + author = "Bettina Berendt", + title = "Using and Learning Semantics in Frequent Subgraph Mining", + booktitle = "WEBKDD", + year = "2005", + pages = "18--38", + ee = "http://dx.doi.org/10.1007/11891321\_2", + crossref = "DBLP:conf/kdd/2005web", + bibsource = "DBLP, http://dblp.uni-trier.de" +} + +@inproceedings{conf/pkdd/BlohmC07, + author = {Blohm, Sebastian and Cimiano, Philipp}, + title = {Using the Web to Reduce Data Sparseness in Pattern-Based Information Extraction}, + booktitle = {Proceedings of the 11th European conference on Principles and Practice of Knowledge Discovery in Databases}, + series = {PKDD 2007}, + year = {2007}, + isbn = {978-3-540-74975-2}, + location = {Warsaw, Poland}, + pages = {18--29}, + numpages = {12}, + acmid = {1421759}, + publisher = {Springer-Verlag}, + address = {Berlin, Heidelberg}, +} + +@InProceedings{ conf/grc/RauchS05, + title = "GUHA method and granular computing.", + author = "Jan Rauch and Milan \v{S}im\r{u}nek", + booktitle = "GrC", + crossref = "conf/grc/2005", + editor = "Xiaohua Hu and Qing Liu and Andrzej Skowron and Tsau Young Lin and Ronald R.
Yager and Bo Zhang", + pages = "630--635", + publisher = "IEEE", + year = "2005" +} + +@InProceedings{ DBLP:conf/wise/WoonNL02a, + author = "Yew Kwong Woon and Wee Keong Ng and Ee Peng Lim", + title = "Evaluating Web Access Log Mining Algorithms: A Cognitive Approach.", + booktitle = "WISE Workshops", + year = "2002", + pages = "217--222", + ee = "http://computer.org/proceedings/wisew/1813/18130217abs.htm", + crossref = "DBLP:conf/wise/2002-2", + bibsource = "DBLP, http://dblp.uni-trier.de" +} + +@InProceedings{ DBLP:conf/ewmf/VanzinB05, + author = "Mari{\^a}ngela Vanzin and Karin Becker", + title = "Ontology-Based Rummaging Mechanisms for the Interpretation of Web Usage Patterns", + booktitle = "EWMF/KDO", + year = "2005", + pages = "180--195", + ee = "http://dx.doi.org/10.1007/11908678\_12", + crossref = "DBLP:conf/ewmf/2005", + bibsource = "DBLP, http://dblp.uni-trier.de" +} + +@InBook{ mobasherCastIdeaWUM, + pages = "276--306", + chapter = "XIII", + author = "Bamshad Mobasher and Honghua Dai", + title = "Integrating Semantic Knowledge with Web Usage Mining for Personalization", + crossref = "IdeaWebMining" +} + +@InProceedings{ DBLP:conf/iccsa/KimCP06, + author = "Il Kim and Bong Joon Choi and Kyoo Seok Park", + title = "Design and Implementation of Web Usage Mining System Using Page Scroll.", + booktitle = "ICCSA (5)", + year = "2006", + pages = "912--921", + ee = "http://dx.doi.org/10.1007/11751649\_100", + crossref = "DBLP:conf/iccsa/2006-5", + bibsource = "DBLP, http://dblp.uni-trier.de" +} + +@InProceedings{ DBLP:conf/dmin/KimCP06, + author = "Il Kim and Bong Joon Choi and Kyoo Seok Park", + title = "A Study on Web-Usage Mining Control System of using Page Scroll.", + booktitle = "DMIN", + year = "2006", + pages = "337--342", + crossref = "DBLP:conf/dmin/2006", + bibsource = "DBLP, http://dblp.uni-trier.de" +} + +@InProceedings{ DBLP:conf/ah/Barla06, + author = "Michal Barla", + title = "Interception of User's Interests on the Web.", + booktitle = "AH", + 
year = "2006", + pages = "435--439", + ee = "http://dx.doi.org/10.1007/11768012\_65", + crossref = "DBLP:conf/ah/2006", + bibsource = "DBLP, http://dblp.uni-trier.de" +} + +@InProceedings{ HajekHR03, + author = "Petr H{\'a}jek and Martin Hole\v{n}a and Jan Rauch", + title = "The {GUHA} Method and Foundations of (Relational) Data Mining.", + booktitle = "Theory and Applications of Relational Structures as Knowledge Instruments", + year = "2003", + pages = "17--37", + ee = "http://springerlink.metapress.com/openurl.asp?genre=article{\&}issn=0302-9743{\&}volume=2929{\&}spage=17", + crossref = "DBLP:conf/RelMiCS/2003", + bibsource = "DBLP, http://dblp.uni-trier.de" +} + +@Article{ guhemeaning, + author = "Petr H{\'a}jek and Martin Hole\v{n}a and Jan Rauch", + title = "The {GUHA} method and its meaning for data mining", + journal = "Journal of Computer and System Sciences", + publisher = "Springer Verlag", + year = 2010, + issue = "76/1", + pages = "34--38" +} + +@InCollection{ imLogic, + author = "Jan Rauch", + title = "Logical Aspects of the Measures of Interestingness of Association Rules", + booktitle = "Advances in Machine Learning II", + address = "Berlin", + publisher = "Springer", + isbn = "978-3-642-05178-4", + year = "2010", + pages = "175--203" +} + +@InCollection{ sewebarPrelim, + author = "Jan Rauch and Milan \v{S}im\r{u}nek", + title = "Semantic Web Presentation of Analytical Reports from Data Mining -- Preliminary Considerations", + booktitle = "WEB INTELLIGENCE", + address = "Los Alamitos", + publisher = "IEEE", + isbn = "0-7695-3026-5", + year = "2010", + pages = "3--7" +} + +@article{ jiis, + author = "Tom\'{a}\v{s} Kliegr and Vojt\v{e}ch Sv\'{a}tek and Milan \v{S}im\r{u}nek and Martin Ralbovsk\'{y}", + title = "Semantic Analytical Reports: A Framework for Post-Processing of Data Mining Results", + journal = "Journal of Intelligent Information Systems", + publisher = "Springer Verlag", + volume =
"37", + year = "2011", + pages= "371-395", + number= "3" +} + +@InProceedings{ ardisj, + author = "Martin Ralbovsk{\'y} and Tom\'{a}\v{s} Kucha\v{r}", + title = "Using Disjunctions in Association Mining", + editor = "P. Perner", + booktitle = "Advances in Data Mining - Theoretical Aspects and Applications, LNAI", + publisher = "Springer Verlag", + year = 2007, + pages = "339--351", + address = "Heidelberg" +} + +@Book{ pmmlinactionbook, + author = "Alex Guazzelli and Wen-Ching Lin and Tridivesh Jena", + title = "{PMML} in {A}ction", + isbn = "978-1452858265", + publisher = "CreateSpace", + year = "2010" +} + +@InCollection{ BRT:09, + author = "Petr Berka and Jan Rauch and Marie Tome\v{c}kov\'{a}", + title = "Data Mining in Atherosclerosis Risk Factor Data", + booktitle = "Data Mining and Medical Knowledge Management: Cases and Applications", + volume = "15", + year = "2009", + publisher = "IGI Global", + pages = "376--397" +} + +@Book{ Ha:81, + editor = "Petr H{\'a}jek", + title = "International Journal of Man-Machine Studies, second special issue on GUHA", + volume = "15", + year = "1981" +} + +@InCollection{ ConsiderationsRauch, + author = "Jan Rauch", + title = "Considerations on Logical Calculi for Dealing with Knowledge in Data Mining", + booktitle = "Data Mining: Foundations and Practice", + publisher = "Springer", + isbn = "978-3-642-02189-3", + year = "2009", + pages = "177--199" +} + +@InCollection{ DealingSewebar, + author = "Jan Rauch and Milan \v{S}im\r{u}nek", + title = "Dealing with Background Knowledge in the {SEWEBAR} Project", + booktitle = "Data Mining: Foundations and Practice", + publisher = "Springer", + isbn = "978-3-642-01890-9", + year = "2009", + pages = "89--106" +} + +@InProceedings{ acMiner, + author = "Jan Rauch and Milan \v{S}im\r{u}nek", + title = "Action Rules and the {GUHA} Method: Preliminary Considerations and Results", + booktitle = "ISMIS '09: Proceedings of the 18th International Symposium on Foundations of Intelligent Systems",
+ year = "2009", + isbn = "978-3-642-04124-2", + pages = "76--87", + location = "Prague, Czech Republic", + publisher = "Springer-Verlag", + address = "Berlin, Heidelberg" +} +@InProceedings{ acMinerShort, + author = "Jan Rauch and Milan \v{S}im\r{u}nek", + title = "Action Rules and the {GUHA} Method: Preliminary Considerations and Results", + booktitle = "ISMIS '09", + year = "2009", + isbn = "978-3-642-04124-2", + pages = "76--87", + location = "Prague, Czech Republic", + publisher = "Springer-Verlag", + address = "Berlin, Heidelberg" +} +@Article{ KLMiner, + title = "The KL-Miner procedure for datamining", + author = "V. L{\'i}n and P. Dolej\v{s}{\'i} and J. Rauch and M. \v{S}im\r{u}nek", + journal = "Neural Network World", + pages = "411--420", + volume = "14", + number = "5", + year = "2004" +} + +@Article{ ARSurvey, + author = {Jochen Hipp and Ulrich G{\"u}ntzer and Gholamreza Nakhaeizadeh}, + title = "Algorithms for association rule mining --- a general survey and comparison", + journal = "SIGKDD Explor. Newsl.", + volume = "2", + number = "1", + year = "2000", + issn = "1931-0145", + pages = "58--64", + doi = "http://doi.acm.org/10.1145/360402.360421", + publisher = "ACM", + address = "New York, NY, USA" +} + +@InCollection{ irmles10, + author = "Tom\'{a}\v{s} Kliegr and Vojt\v{e}ch Sv\'{a}tek and Milan \v{S}im\r{u}nek and Daniel \v{S}tastn\'y and Andrej Hazucha", + title = "{XML} Schema and Topic Map Ontology for Background Knowledge in Data Mining", + booktitle = "The 2nd IRMLES ESWC Workshop", + year = "2010" +} + +@Book{ DBLP:series/sci/2008-118, + title = "Data Mining: Foundations and Practice", + booktitle = "Data Mining: Foundations and Practice", + publisher = "Springer", + series = "Studies in Comp. Int.", + volume = "118", + year = "2008", + isbn = "978-3-540-78487-6", + bibsource = "DBLP, http://dblp.uni-trier.de" +} + +@InProceedings{ negativeARs, + author = {Maria-Luiza Antonie and Osmar R.
Za\"{\i}ane}, + title = "Mining positive and negative association rules: an approach for confined rules", + booktitle = "PKDD '04", + year = "2004", + isbn = "3-540-23108-0", + pages = "27--38", + location = "Pisa, Italy", + publisher = "Springer-Verlag New York, Inc.", + address = "New York, NY, USA" +} + +@InProceedings{ tmra09, + author = "Tom\'{a}\v{s} Kliegr and Marek Ove\v{c}ka and Jan Zem\'{a}nek ", + title = "Topic Maps for Association Rule Mining", + booktitle = "Proceedings of TMRA 2009", + year = "2009", + publisher = "University of Leipzig" +} + +@InProceedings{ Jan02content-basedretrieval, + author = "V\'{a}clav L\'{i}n and Jan Rauch and Vojt\v{e}ch Sv\'{a}tek", + title = "Content-based Retrieval of Analytic Reports", + booktitle = "Rule Markup Languages for Business Rules on the Semantic Web, Sardinia 2002", + year = "2002", + pages = "219--224" +} + +@Article{ arulesbetter, + author = {Michael Hahsler and Bettina Gr{\"u}n and Kurt Hornik}, + title = "arules - A Computational Environment for Mining Association Rules and Frequent Item Sets", + journal = "Journal of Statistical Software", + volume = "14", + number = "15", + pages = "1--25", + day = "29", + month = "9", + year = "2005", + CODEN = "JSSOBK", + ISSN = "1548-7660", + bibdate = "2005-09-29", + URL = "http://www.jstatsoft.org/v14/i15", + accepted = "2005-09-29", + acknowledgement = "", + keywords = "", + submitted = "2005-04-15" +} + +@Article{ SurveyofInterestingnessMeasures, + title = "A Survey of Interestingness Measures for Association Rules", + author = "Yuejin Zhang and Lingling Zhang and Guangli Nie and Yong Shi", + publisher = "IEEE", + address = "Los Alamitos, CA, USA", + journal = "Business Intelligence and Financial Engineering, International Conference on", + pages = "460--463", + volume = "0", + year = "2009", + isbn = "978-0-7695-3705-4", + doi = "http://doi.ieeecomputersociety.org/10.1109/BIFE.2009.110" +} + +@Proceedings{ DBLP:conf/cidm/2009, + title = "Proceedings of the IEEE 
Symposium on Computational Intelligence and Data Mining, CIDM 2009, part of the IEEE Symposium Series on Computational Intelligence 2009, Nashville, TN, USA, March 30, 2009 - April 2, 2009", + booktitle = "CIDM", + publisher = "IEEE", + year = "2009", + bibsource = "DBLP, http://dblp.uni-trier.de" +} + +@InProceedings{ mininggenearlizeddisjunctivears, + author = "Amit A. Nanavati and Krishna P. Chitrapura and Sachindra Joshi and Raghu Krishnapuram", + title = "Mining generalised disjunctive association rules", + booktitle = "CIKM '01", + year = "2001", + isbn = "1-58113-436-3", + pages = "482--489", + location = "Atlanta, Georgia, USA", + doi = "http://doi.acm.org/10.1145/502585.502666", + publisher = "ACM", + address = "New York, NY, USA" +} + +@InProceedings{ gtm, + AUTHOR = "Hendrik Thomas and Tobias Redmann and Maik Pressler and Bernd Markscheffel", + TITLE = "{GTMalpha} - Towards a Graphical Notation for Topic Maps", + YEAR = "2008", + Booktitle = "TMRA 2008", + ADDRESS = "Leipzig", + pages = "56--66" +} + +@Book{ nla.cat-vn667167, + author = "David W. Stephens and John R. Krebs", + title = "Foraging Theory", + isbn = "0691084416", + publisher = "Princeton University Press", + address = "Princeton, N.J.", + pages = "xiv, 247", + year = "1986", + catalogue-url = "http://nla.gov.au/nla.cat-vn667167" +} + +@Article{ DBLP:journals/dke/FaccaL05, + author = "Federico Michele Facca and Pier Luca Lanzi", + title = "Mining interesting knowledge from weblogs: a survey", + journal = "Data Knowl. Eng.", + volume = "53", + number = "3", + year = "2005", + pages = "225--241", + ee = "http://dx.doi.org/10.1016/j.datak.2004.08.001", + bibsource = "DBLP, http://dblp.uni-trier.de" +} + +@Article{ Simon1955, + abstract = "Introduction, 99.--I. Some general features of rational choice, 100.--II.
The essential simplifications, 103.--III. Existence and uniqueness of solutions, 111.--IV. Further comments on dynamics, 113.--V. Conclusion, 114.--Appendix, 115.", + author = "Herbert A. Simon", + citeulike-article-id = "996808", + citeulike-linkout-0 = "http://dx.doi.org/10.2307/1884852", + citeulike-linkout-1 = "http://www.jstor.org/stable/1884852", + doi = "10.2307/1884852", + issn = "00335533", + journal = "The Quarterly Journal of Economics", + keywords = "bounded-rationality, rational-choice, satisficing", + number = "1", + pages = "99--118", + posted-at = "2006-12-15 09:59:20", + priority = "3", + publisher = "The MIT Press", + title = "A Behavioral Model of Rational Choice", + url = "http://dx.doi.org/10.2307/1884852", + volume = "69", + year = "1955" +} + +@Book{ citeulike:1771049, + author = "J. R. Anderson", + citeulike-article-id = "1771049", + keywords = "adaptive-rationality, rationality", + posted-at = "2007-10-15 19:39:06", + priority = "2", + title = "The Adaptive Character of Thought", + year = "1990" +} + +@Book{ mitchell:machine-learning, + author = "T. 
Mitchell", + edition = "1st", + howpublished = "Paperback", + isbn = "0071154671", + keywords = "ai, textbooks", + month = "October", + publisher = "McGraw-Hill Education (ISE Editions)", + title = "Machine Learning (McGraw-Hill International Editions)", + url = "http://www.amazon.com/exec/obidos/redirect?tag=citeulike07-20&path=ASIN/0071154671", + year = "1997" +} + +@InBook{ simdabook, + title = "System Integration using Model Driven Engineering", + booktitle = "System Integration via Model Composition, Designing Software-Intensive Systems: Methods and Principles", + author = "Krishnakumar Balasubramanian and Douglas C.
Schmidt and Zolt\'{a}n Moln\'{a}r and {{\'A}kos L\'{e}deczi}", + note = "Submitted to publication, www.dre.vanderbilt.edu/~kitty/pubs/bookchapter-final.pdf", + url = "www.dre.vanderbilt.edu/~kitty/pubs/bookchapter-final.pdf" +} + +@Book{ vorisek, + title = "Aplika\v{c}n\'{i} slu\v{z}by IS/ICT formou ASP", + author = "Ji\v{r}\'{i} Vo\v{r}\'{i}\v{s}ek and Jan Pavelka and Miroslav V\'{i}t", + publisher = "Grada", + year = "2004" +} + +@InProceedings{ Favre, + author = "Jean-Marie Favre", + booktitle = "Transformation Techniques in Software Engineering", + editor = "James R. Cordy and Ralf L{\"a}mmel and Andreas Winter", + interHash = "e72483c32ca8e3691a821aaad9d74018", + intraHash = "d6e5eb8aca94d84624e689aa17706394", + publisher = "Internationales Begegnungs- und Forschungszentrum f{\"u}r Informatik (IBFI), Schloss Dagstuhl, Germany", + series = "Dagstuhl Seminar Proceedings", + title = "Megamodelling and Etymology.", + url = "http://dblp.uni-trier.de/db/conf/dagstuhl/P5161.html#Favre05", + volume = "05161", + year = "2005", + ee = "http://drops.dagstuhl.de/opus/volltexte/2006/427", + date = "2006-05-10" +} + +@TechReport{ swprimer, + title = "A Semantic Web Primer for Object-Oriented Software Developers", + institution = "W3C", + year = "2006", + url = "http://www.w3.org/2001/sw/BestPractices/SE/ODSD/", + note = "W3C Editor's Draft" +} + +@Book{ mdabook, + title = "{Model Driven Architecture: Applying MDA to Enterprise Computing}", + author = "David S.
Frankel", + publisher = "{John Wiley \& Sons}", + year = 2003, + isbn = "0471319201", + address = "New York, NY, USA" +} + +@Misc{ smolikdis, + author = "Petr Smol\'{i}k", + title = "Mambo Metamodeling Environment", + note = "Doctoral dissertation, FIT VUT Brno", + institution = "FIT VUT Brno", + year = 2006 +} + +@Misc{ smolikkeg, + author = "Petr Smol\'{i}k", + title = "Model-Driven Engineering of Enterprise Information Systems: An Overview", + institution = "Metada s.r.o", + note = "Lecture at the KEG seminar, VSE, 16 Apr 2009", + date = "2009" +} + +@TechReport{ odm, + institution = "IBM and Sandpiper Software", + title = "Ontology Definition Metamodel Third Revised Submission to {OMG/ RFP ad/2003-03-40}", + date = "2005", + url = "http://www.omg.org/docs/ad/05-08-01.pdf" +} + +@InProceedings{ AMW, + title = "{AMW: a generic model weaver}", + author = "Marcos Didonet Del Fabro and Jean Bezivin and Frederic Jouault and Erwan Breton and Guillaume Gueltas", + booktitle = "Proceedings of the 1\`{e}re Journ\'{e}e sur l'Ing\'{e}nierie Dirig\'{e}e par les Mod\`{e}les (IDM05)", + year = 2005, + url = "http://www.sciences.univ-nantes.fr/lina/atl/www/papers/IDM_2005_weaver.pdf", + description = "http://www.sciences.univ-nantes.fr/lina/atl/bibliography/IDM2005/", + biburl = "http://www.bibsonomy.org/bibtex/2a1b26c0445e2c417ec142a47c3021c30/msn", + keywords = "cites.ref state.unclassified", + authorurls = "http://www.sciences.univ-nantes.fr/lina/atl/contrib/fabro and http://www.sciences.univ-nantes.fr/lina/atl/contrib/bezivin and http://www.sciences.univ-nantes.fr/lina/atl/contrib/jouault" +} + +@InProceedings{ Friesen, + author = "Andreas Friesen", + booktitle = "TWOMD", + editor = "Fernando Silva Parreiras and Jeff Z.
Pan and Uwe A{\ss}mann and Jakob Henriksson", + interHash = "58eff2d48c95dad34aa83ab634a999a3", + intraHash = "6dc6aaba948082b0287b2344defdd7c5", + pages = "1--4", + publisher = "CEUR-WS.org", + series = "CEUR Workshop Proceedings", + title = "Potential Applications of Ontologies and Reasoning for Modeling and Software Engineering.", + url = "http://dblp.uni-trier.de/db/conf/models/twomd2008.html#Friesen08", + volume = "395", + year = "2008", + ee = "http://ceur-ws.org/Vol-395/paper00.pdf", + date = "2008-12-22" +} + +@InCollection{ Falkovych, + author = "Kateryna Falkovych and Martha Sabou and Heiner Stuckenschmidt", + booktitle = "Knowledge Transformation for the Semantic Web", + editor = "B. Omelayenko and M. Klein", + interHash = "366684f34c7d6dce877b5405765df680", + intraHash = "6fd16ed56713b5265dc414acb232ead8", + pages = "92--106", + publisher = "IOS Press", + title = "{UML} for the {S}emantic {W}eb: {T}ransformation-{B}ased {A}pproaches", + url = "http://www.cwi.nl/~media/publications/UML_for_SW.pdf", + year = "2003", + isbn = "1-58603-325-5", + abstract = "The perspective role of UML as a conceptual modelling language for the Semantic Web has become an important research topic. We argue that UML could be a key technology for overcoming the ontology development bottleneck thanks to its wide acceptance and sophisticated tool support. Transformational approaches are a promising way of establishing a connection between UML and web-based ontology languages. We compare some proposals for defining transformations between UML and web ontology languages and discuss the different ways they handle the conceptual differences between these languages. We identify commonalities and differences of the approaches and point out open questions that have not or not satisfyingly been addressed by existing approaches.
" +} + +@InProceedings{ Kappel, + author = "Gerti Kappel and Elisabeth Kapsammer and Horst Kargl and Gerhard Kramler and Thomas Reiter and Werner Retschitzegger and Wieland Schwinger and Manuel Wimmer", + booktitle = "MoDELS", + editor = "Oscar Nierstrasz and Jon Whittle and David Harel and Gianna Reggio", + interHash = "a61c0d05029006217e9e9fbd1d6d2809", + intraHash = "336431b3c724ecab7b6f6501fa25cbfe", + pages = "528--542", + publisher = "Springer", + series = "Lecture Notes in Computer Science", + title = "Lifting Metamodels to Ontologies: A Step to the Semantic Integration of Modeling Languages.", + url = "http://dblp.uni-trier.de/db/conf/models/models2006.html#KappelKKKRRSW06", + volume = "4199", + year = "2006", + ee = "http://dx.doi.org/10.1007/11880240\_37", + isbn = "3-540-45772-0", + date = "2006-12-07" +} + +@InProceedings{ Parreiras, + author = "Fernando Silva Parreiras and Steffen Staab and Andreas Winter", + booktitle = "ESEC/SIGSOFT FSE (Companion)", + editor = "Ivica Crnkovic and Antonia Bertolino", + interHash = "37e729c1e1101ad1112337c4b2571e33", + intraHash = "223e5ffd0c99df7d145fff70823ba99d", + pages = "439--448", + publisher = "ACM", + title = "On marrying ontological and metamodeling technical spaces.", + url = "http://dblp.uni-trier.de/db/conf/sigsoft/fse2007c.html#ParreirasSW07a", + year = "2007", + ee = "http://doi.acm.org/10.1145/1295014.1295017", + isbn = "978-1-59593-812-1", + date = "2008-11-03" +} + +@InProceedings{ RelationalOWL, + author = "Cristian Pérez de Laborda and Stefan Conrad", + booktitle = "ICDE Workshops", + editor = "Roger S.
Barga and Xiaofang Zhou", + interHash = "eab3ca43e59c0b65757f9011805d7941", + intraHash = "72916db1a569b439b718efd7b4380d63", + pages = "55", + publisher = "IEEE Computer Society", + title = "{Bringing Relational Data into the Semantic Web using SPARQL and Relational.OWL.}", + url = "http://dblp.uni-trier.de/db/conf/icde/icdew2006.html#LabordaC06", + year = "2006", + ee = "http://doi.ieeecomputersociety.org/10.1109/ICDEW.2006.37", + date = "2006-05-15" +} + +@InProceedings{ ATL, + address = "Montego Bay, Jamaica", + author = "Frédéric Jouault and Ivan Kurtev", + booktitle = "Proceedings of the Model Transformations in Practice Workshop at MoDELS 2005", + interHash = "d2e2787bcf9ef42c2349f34e200153e6", + intraHash = "4c1416fc2d68fda4a4058752ea40cc1a", + title = "{Transforming Models with ATL}", + url = "http://sosym.dcs.kcl.ac.uk/events/mtip/submissions/jouault_kurtev__transforming_models_with_atl.pdf", + year = "2005", + authorurls = "http://www.sciences.univ-nantes.fr/lina/atl/contrib/jouault and http://www.sciences.univ-nantes.fr/lina/atl/contrib/kurtev" +} + +@InProceedings{ Hillairet, + author = "Guillaume Hillairet and Frédéric Bertrand and Jean-Yves Lafaye", + booktitle = "TWOMD", + editor = "Fernando Silva Parreiras and Jeff Z. Pan and Uwe Aßmann and Jakob Henriksson", + interHash = "3f492019ea833be28c843c3e5dcfc128", + intraHash = "607043b9df6fc056dad3f682001c5bba", + pages = "32--46", + publisher = "CEUR-WS.org", + series = "CEUR Workshop Proceedings", + title = "{MDE for Publishing Data on the Semantic Web.}", + url = "http://dblp.uni-trier.de/db/conf/models/twomd2008.html#HillairetBL08", + volume = "395", + year = "2008", + ee = "http://ceur-ws.org/Vol-395/paper03.pdf", + date = "2008-12-22" +} + +@InProceedings{ Zivkovic, + author = "Srdjan Zivkovic and Marion Murzek and Harald Kühn", + booktitle = "TWOMD", + editor = "Fernando Silva Parreiras and Jeff Z.
Pan and Uwe Aßmann and Jakob Henriksson", + interHash = "9fa564522a2a1c5d6de54941d4765a94", + intraHash = "a0ca60d76191af225747f049c0370861", + pages = "47--54", + publisher = "CEUR-WS.org", + series = "CEUR Workshop Proceedings", + title = "Bringing Ontology Awareness into Model Driven Engineering Platforms.", + url = "http://dblp.uni-trier.de/db/conf/models/twomd2008.html#ZivkovicMK08", + volume = "395", + year = "2008", + ee = "http://ceur-ws.org/Vol-395/paper04.pdf", + date = "2008-12-22" +} + + +@Article{ AI, + author = "Jan Rauch", + title = "Logic of Association Rules", + journal = "Applied Intelligence", + volume = 22, + year = 2005, + pages = "9--28" +} + +@InProceedings{ sewebar, + author = "Jan Rauch and Milan \v{S}im\accent23unek", + title = "Dealing with Background Knowledge in the {SEWEBAR} Project", + booktitle = "PRICKL ECML/PKDD Workshop", + address = "Warsaw", + year = "2007", + pages = "97--108" +} + +@InProceedings{ ismis09, + title = "Semantic Analytical Reports: A Framework for Post-Processing Data Mining Results", + author = "Tom{\'a}\v{s} Kliegr and Martin Ralbovsk{\'y} and Vojt\v{e}ch Sv{\'a}tek and Milan \v{S}im\accent23unek and Vojt\v{e}ch Jirkovsk{\'y} and Jan Nemrava and Jan Zem{\'a}nek", + booktitle = "ISMIS'09: 18th International Symposium on Methodologies for Intelligent Systems", + year = 2009, + publisher = "Springer", + pages = "453--458" +} + +@InProceedings{ ismis09short, + title = "Semantic Analytical Reports: A Framework for Post-Processing Data Mining Results", + author = "Tom{\'a}\v{s} Kliegr and Martin Ralbovsk{\'y} and Vojt\v{e}ch Sv{\'a}tek and Milan \v{S}im\accent23unek and Vojt\v{e}ch Jirkovsk{\'y} and Jan Nemrava and Jan Zem{\'a}nek", + booktitle = "ISMIS'09", + address = "Prague", + year = 2009, + publisher = "Springer", + pages = "453--458" +} + +@TechReport{ ltm, + url = "http://www.ontopia.net/download/ltm.htm", + title = "The Linear Topic Map Notation", + author = "Lars Marius Garshol", + institution =
"Ontopia", + year = 2006 +} + +@Article{ KotsiatnisARSurvey, + abstract = "In this paper, we provide the preliminaries of basic concepts about association rule mining and survey the list of existing association rule mining techniques. Of course, a single article cannot be a complete review of all the algorithms, yet we hope that the references cited will cover the major theoretical issues, guiding the researcher in interesting research directions that have yet to be explored.", + author = "Sotiris Kotsiantis and Dimitris Kanellopoulos", + citeulike-article-id = "4186352", + citeulike-linkout-0 = "http://scholar.google.es/scholar?hl=es\&\#38;rlz=1B3GGGL\_esAR313AR313\&\#38;q=author:\%22Kotsiantis\%22+intitle:\%22Association+Rules+Mining:+A+Recent+Overview\%22+\&\#38;um=1\&\#38;ie=UTF-8\&\#38;oi=scholarr", + journal = "International Transactions on Computer Science and Engineering", + keywords = "association, rules, survey", + number = "1", + pages = "71--82", + posted-at = "2009-03-17 12:53:19", + priority = "0", + publisher = "Global Engineering, Science, and Technology Society (GESTS)", + title = "Association Rules Mining: A Recent Overview", + url = "http://scholar.google.es/scholar?hl=es&rlz=1B3GGGL_esAR313AR313&q=author:%22Kotsiantis%22+intitle:%22Association+Rules+Mining:+A+Recent+Overview%22+&um=1&ie=UTF-8&oi=scholarr", + volume = "32", + year = "2006" +} + +@Article{ veblen2, + jstor_articletype = "primary\_article", + title = "Relative Preferences", + author = "Richard H. McAdams", + journal = "The Yale Law Journal", + jstor_issuetitle = "", + volume = "102", + number = "1", + jstor_formatteddate = "Oct., 1992", + pages = "1--104", + url = "http://www.jstor.org/stable/796772", + ISSN = "00440094", + abstract = "Neoclassical economics traditionally has assumed that consumer preferences are independent of each other and has neglected the desire of individuals for relative position. 
To the extent that legal scholarship has addressed this interdependence of preferences, it has focused entirely on altruism and envy. Professor McAdams focuses on a different manner in which preferences may be interdependent: people often desire, as an end in itself, to equal or surpass the consumption level of others. Drawing upon social science literature, the Article discusses the importance of relative preferences to descriptive and normative legal theories. Under certain conditions, unregulated competition to satisfy these relative preferences will produce outcomes inferior to those made possible by regulation. Professor McAdams discusses the factors that create this market failure, the rare conditions under which satisfaction of relative preferences is desirable, and the implications for law and legal theory. In particular, the Article considers taxation and antidiscrimination laws as ways of limiting wasteful individual and group status competition.", + publisher = "The Yale Law Journal Company, Inc.", + copyright = "Copyright © 1992 The Yale Law Journal Company, Inc.", + year = "1992" +} + +@TechReport{ getlittleworkalot, + author = "Alwine Mohnen and Kathrin Pokorny", + title = "Reference Dependency and Loss Aversion in Employer-Employee-Relationships: A Real Effort Experiment on the Incentive Impact of the Fixed Wage", + year = 2004, + note = "Working Paper Series", + institution = "University of Cologne" +} + +@TechReport{ whyrejectleftright, + author = "Heike Hennig-Schmidt and Zhu-Yu Li and Chaoliang Yang", + title = "Why People Reject Advantageous Offers - Non-monotone Strategies in Ultimatum Bargaining", + year = 2004, + month = Dec, + institution = "University of Bonn, Germany", + type = "Bonn Econ Discussion Papers" +} + +@Book{ handbookutility, + title = "Handbook of utility theory", + editor = "Salvador Barbera and Peter J Hammond and Christian Seidl", + publisher = "Springer", + year = "1998", + volume = {1}, + ISBN =
0792381742 +} + +@Book{ handbookutilityVol2, + title = "Handbook of utility theory - Extensions", + editor = "Salvador Barbera and Peter J Hammond and Christian Seidl", + publisher = "Springer", + year = "1999", + volume = {2}, + ISBN = 0792381742 +} + +@Article{ mustard, + author = "Yannis Siskos", + journal = "Journal of Behavioral Decision Making", + title = "MUSTARD: Multicriteria utility-based stochastic aid for ranking decisions", + note = "Software Review", + volume = 15, + number = 5, + pages = "461--465", + year = 2002 +} + +@InCollection{ Despotis93, + author = "D.K. Despotis and Constantin Zopounidis", + editor = "P.M. Pardalos and Y. Siskos and C. Zopounidis", + year = 1993, + title = "Building additive utilities in the presence of non-monotonic preference", + booktitle = "Advances in Multicriteria Analysis", + publisher = "Kluwer Academic Publishers", + address = "Dordrecht", + pages = "101--114" +} + +@Article{ ZopounidisDompos2000, + author = "Constantin Zopounidis and Michael Doumpos", + month = "June", + year = 2000, + title = "{PREFDIS}: {A} multicriteria decision support system for sorting decision problems", + journal = "Computers \& Operations Research", + volume = 27, + number = "7--8", + pages = "779--797" +} + +@InBook{ uta, + BOOKTITLE = "Multiple Criteria Decision Analysis: State of the Art Surveys", + AUTHOR = "Yannis Siskos", + ADDRESS = "New York", + Publisher = "Springer", + pages = "297--334", + CHAPTER = "XVIII", + Title = "UTA Methods", + ISBN = "978-0-387-23081-8" +} + +@Article{ lamsade_ejor, + author = "Salvatore Greco and Vincent Mousseau and Roman Slowinski", + title = "Ordinal regression revisited: Multiple criteria ranking using a set of additive value functions", + journal = "European Journal of Operational Research", + year = 2008, + volume = "191", + number = "2", + pages = "416--436", + month = "December" +} + +@InProceedings{ lamsade, + title = "Ordinal regression revisited: multiple criteria ranking with a set of additive value functions", +
author = "Salvatore Greco and Vincent Mousseau and Roman Slowinski", + year = "2007", + booktitle = "Cahier du LAMSADE no. 240", + institution = "Université de Paris-Dauphine" +} + +@article{ kendall, + author = "James E. De Muth", + title = "Basic Statistics and Pharmaceutical Statistical Applications, 2nd edition", + journal = "Journal of the Royal Statistical Society: Series A", + ISBN-13 = "978-0824719678", + year = 1999, + publisher = "CRC" +} + + +@InProceedings{ kliegr08WBBT_short, + author = "Tom{\'a}\v{s} Kliegr and Krishna Chandramouli and Jan Nemrava and Vojt\v{e}ch Sv{\'a}tek and Ebroul Izquierdo", + title = "{W}ikipedia as the Premiere source for Targeted Hypernym Discovery", + booktitle = "WBBT/ECML'08: Wikis, Blogs and Bookmarking Tools - Mining the Web 2.0 Workshop", + place = "Antwerp", + year = "2008" +} + + +@InProceedings{ kliegr08WBBT, + author = "Tom{\'a}\v{s} Kliegr and Krishna Chandramouli and Jan Nemrava and Vojt\v{e}ch Sv{\'a}tek and Ebroul Izquierdo", + title = "{W}ikipedia as the Premiere source for Targeted Hypernym Discovery", + booktitle = "Proceedings of the Wiki's, Blogs and Bookmarking tools - Mining the Web 2.0 Workshop at ECML'08", + place = "Antwerp", + year = "2008" +} + +@InProceedings{ simAn_short, + author = "Jim Cowie and Joe Guthrie and Louise Guthrie", + title = "Lexical disambiguation using simulated annealing", + booktitle = "COLING", + year = "1992", + pages = "359--365", + location = "Nantes, France", + publisher = "ACL", + address = "Morristown, NJ, USA" +} + +@InProceedings{ simAn, + author = "Jim Cowie and Joe Guthrie and Louise Guthrie", + title = "Lexical disambiguation using simulated annealing", + booktitle = "Proceedings of the 14th conference on Computational linguistics", + year = "1992", + pages = "359--365", + location = "Nantes, France", + publisher = "Association for Computational Linguistics", + address = "Morristown, NJ, USA" +} + +@InProceedings{ finegrainedcl_short, + author = "Michael
Fleischman and Eduard Hovy", + title = "Fine grained classification of named entities", + booktitle = "COLING", + year = "2002", + publisher = "ACL" +} + +@InProceedings{ finegrainedcl, + author = "Michael Fleischman and Eduard Hovy", + title = "Fine grained classification of named entities", + booktitle = "Proceedings of the 19th international conference on Computational linguistics", + year = "2002", + pages = "1--7", + location = "Taipei, Taiwan", + doi = "http://dx.doi.org/10.3115/1072228.1072358", + publisher = "Association for Computational Linguistics", + address = "Morristown, NJ, USA" +} + +@InProceedings{ cimiano05_NEC_short, + abstract = "Named entity recognition and classification research has so far mainly focused on supervised techniques and has typically considered only small sets of classes with regard to which to classify the recognized entities.", + author = {Philipp Cimiano and Johanna V{\"o}lker}, + booktitle = "RANLP", + citeulike-article-id = "933535", + pages = "166--172", + posted-at = "2008-04-13 16:46:21", + priority = "2", + title = "Towards Large-Scale, Open-Domain and Ontology-Based Named Entity Classification", + url = "http://citeseer.ist.psu.edu/cimiano05towards.html", + year = "2005" +} + +@InProceedings{ cimiano05_NEC, + abstract = "Named entity recognition and classification research has so far mainly focused on supervised techniques and has typically considered only small sets of classes with regard to which to classify the recognized entities.", + author = {Philipp Cimiano and Johanna V{\"o}lker}, + booktitle = "Proceedings of Recent Advances in Natural Language Processing", + citeulike-article-id = "933535", + pages = "166--172", + posted-at = "2008-04-13 16:46:21", + priority = "2", +series={RANLP'05}, + title = "Towards Large-Scale, Open-Domain and Ontology-Based Named Entity Classification", + url = "http://citeseer.ist.psu.edu/cimiano05towards.html", + year = "2005" +} + + + +@INPROCEEDINGS{resnik95using_short, + author = 
{Philip Resnik}, + title = {Using Information Content to Evaluate Semantic Similarity in a Taxonomy}, + booktitle = {Proceedings of the 14th International Joint Conference on Artificial Intelligence}, + year = {1995}, + pages = {448--453} +} + + +@inproceedings{resnik95using, + author = {Resnik, Philip}, + title = {Using information content to evaluate semantic similarity in a taxonomy}, + booktitle = {Proceedings of the 14th international joint conference on Artificial intelligence - Volume 1}, + series = {IJCAI'95}, + year = {1995}, + isbn = {1-55860-363-8, 978-1-558-60363-9}, + location = {Montreal, Quebec, Canada}, + pages = {448--453}, + numpages = {6}, + url = {http://dl.acm.org/citation.cfm?id=1625855.1625914}, + acmid = {1625914}, + publisher = {Morgan Kaufmann Publishers Inc.}, + address = {San Francisco, CA, USA}, +} + + +@Article{ fusion-overview, + author = "Dymitr Ruta and Bogdan Gabrys", + title = "An Overview of Classifier Fusion Methods", + journal = "Computing and Information Systems", + year = "2000", + volume = "7" +} + +@InProceedings{ graphBasedWSD, + author = "Ravi Sinha and Rada Mihalcea", + title = "Unsupervised Graph-based Word Sense Disambiguation Using Measures of Word Semantic Similarity", + booktitle = "ICSC '07: Proceedings of the International Conference on Semantic Computing", + year = "2007", + isbn = "0-7695-2997-6", + pages = "363--369", + doi = "http://dx.doi.org/10.1109/ICSC.2007.107", + publisher = "IEEE Computer Society", + address = "Washington, DC, USA" +} + +@inproceedings{banerjee03extended, + author = {Banerjee, Satanjeev and Pedersen, Ted}, + title = {Extended gloss overlaps as a measure of semantic relatedness}, + booktitle = {Proceedings of the 18th international joint conference on Artificial intelligence}, + series = {IJCAI'03}, + year = {2003}, + location = {Acapulco, Mexico}, + pages = {805--810}, + numpages = {6}, + url = {http://dl.acm.org/citation.cfm?id=1630659.1630775}, + acmid = {1630775}, + publisher =
{Morgan Kaufmann Publishers Inc.}, + address = {San Francisco, CA, USA}, +} + + +@InProceedings{ cucerzan:2007:EMNLP-CoNLL2007_short, + author = "Silviu Cucerzan", + title = "Large-Scale Named Entity Disambiguation Based on {Wikipedia} Data", + booktitle = "EMNLP-CoNLL", + year = "2007", + pages = "708--716", + url = "http://www.aclweb.org/anthology/D/D07/D07-1074" +} + +@InProceedings{ cucerzan:2007:EMNLP-CoNLL2007, + author = "Silviu Cucerzan", + title = "Large-Scale Named Entity Disambiguation Based on {Wikipedia} Data", + booktitle = "EMNLP-CoNLL'07: Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning", + year = "2007", + pages = "708--716", + url = "http://www.aclweb.org/anthology/D/D07/D07-1074" +} + + +@inproceedings{kliegr08MDM, + author = {Kliegr, Tom{\'a}\v{s} and Krishna Chandramouli and Jan Nemrava and Vojt\v{e}ch Sv{\'a}tek and Ebroul Izquierdo}, + title = {Combining image captions and visual analysis for image concept classification}, + booktitle = {Proceedings of the 9th International Workshop on Multimedia Data Mining: held in conjunction with the ACM SIGKDD 2008}, + series = {MDM '08}, + year = {2008}, + isbn = {978-1-60558-261-0}, + location = {Las Vegas, Nevada}, + pages = {8--17}, + numpages = {10}, + url = {http://doi.acm.org/10.1145/1509212.1509214}, + doi = {10.1145/1509212.1509214}, + acmid = {1509214}, + publisher = {ACM}, + address = {New York, NY, USA}, +} + + +@InProceedings{ kliegr08MDM_short, + author = "Tom{\'a}\v{s} Kliegr and Krishna Chandramouli and Jan Nemrava and Vojt\v{e}ch Sv{\'a}tek and Ebroul Izquierdo", + title = "Combining Captions and Visual Analysis for Image Concept Classification", + booktitle = "MDM/KDD'08", + place = "Las Vegas, USA", + publisher = "ACM", + year = "2008" +} + +@InProceedings{ KaTo07, + author = "Jun'ichi Kazama and Kentaro Torisawa", + pages = "698--707", + school = "Japan Advanced Institute of Science and
Technology", + title = "Exploiting {W}ikipedia as External Knowledge for Named Entity Recognition", + booktitle = "EMNLP-CoNLL'07: Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning", + year = "2007" +} + +@InProceedings{ multiClassLabeling, + title = "Improved Image Annotation and Labelling through Multi-Label Boosting", + author = "Matthew Johnson and Roberto Cipolla", + booktitle = "Proceedings of British Machine Vision Conference 2005", + year = "2005", + publisher = "Oxford University Press" +} + +@InProceedings{ stanfordNER, + author = "Jenny Rose Finkel and Trond Grenager and Christopher Manning", + title = "Incorporating non-local information into information extraction systems by Gibbs sampling", + booktitle = "ACL '05: Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics", + year = "2005", + pages = "363--370", + location = "Ann Arbor, Michigan", + doi = "http://dx.doi.org/10.3115/1219840.1219885", + publisher = "ACL", + address = "Morristown, NJ, USA" +} + +@Article{ multiclassNN, + author = "Guobin Ou and Yi Lu Murphey", + title = "Multi-class pattern classification using neural networks", + journal = "Pattern Recogn.", + volume = "40", + number = "1", + year = "2007", + issn = "0031-3203", + pages = "4--18", + publisher = "Elsevier Science Inc.", + address = "New York, NY, USA" +} + +@InProceedings{ joachims98text, + author = "Thorsten Joachims", + title = "Text categorization with support vector machines: learning with many relevant features", + booktitle = "{ECML}-98: Proceedings of the 10th European Conference on Machine Learning", + number = "1398", + publisher = "Springer Verlag, Heidelberg, DE", + address = "Chemnitz, DE", + pages = "137--142", + year = "1998", + url = "citeseer.ist.psu.edu/joachims97text.html" +} + +@InProceedings{ ClasifInternet, + author = "Theo Gevers and Frank Aldershoff and Arnold W.M.
Smeulders", + title = "Classification of images on the Internet by visual and textual information", + booktitle = "Proceedings of SPIE Conference on Internet Imaging", + year = 1999, + month = dec, + pages = "16--27" +} + +@InProceedings{ maynard-named, + author = "Diana Maynard and Valentin Tablan and Cristian Ursu and Hamish Cunningham and Yorick Wilks", + title = "Named Entity Recognition from Diverse Text Types", + booktitle = "Recent Advances in Natural Language Processing", + year = 2001 +} + +@InProceedings{ whoisonpic, + author = "Tamara L. Berg and Alexander C. Berg and Jaety Edwards and D.A. Forsyth", + title = "Who's in the Picture?", + booktitle = "Neural Information Processing Systems Conference", + year = 2004 +} + + +@INPROCEEDINGS{westerveld00image, + author = {Thijs Westerveld}, + title = {Image Retrieval: Content versus Context}, + booktitle = {Proceedings of RIAO 2000 Conference on Content-Based Multimedia Information Access}, + year = {2000}, + pages = {276--284} +} + + +@article{satoh99name, + author = {Satoh, Shin'ichi and Nakamura, Yuichi and Kanade, Takeo}, + title = {Name-It: Naming and Detecting Faces in News Videos}, + journal = {IEEE MultiMedia}, + issue_date = {January 1999}, + volume = {6}, + number = {1}, + month = jan, + year = {1999}, + issn = {1070-986X}, + pages = {22--35}, + numpages = {14}, + url = {http://dx.doi.org/10.1109/93.752960}, + doi = {10.1109/93.752960}, + acmid = {614883}, + publisher = {IEEE Computer Society Press}, + address = {Los Alamitos, CA, USA}, +} +@InProceedings{ deschacht-moens:2007:ACLMain, + author = "Koen Deschacht and Marie-Francine Moens", + title = "Text Analysis for Automatic Image Annotation", + booktitle = "Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics", + month = "June", + series="ACL'07", + year = "2007", + address = "Prague, Czech Republic", + publisher = "Association for Computational Linguistics", +
pages = "1000--1007" +} + +@InProceedings{ sidhu-klavans-lin:2007:LaTeCH, + author = "Tandeep Sidhu and Judith Klavans and Jimmy Lin", + title = "Concept Disambiguation for Improved Subject Access Using Multiple Knowledge Sources", + booktitle = "LaTeCH 2007: Proceedings of the Workshop on Language Technology for Cultural Heritage Data", + month = "June", + year = "2007", + address = "Prague, Czech Republic", + publisher = "ACL", + pages = "25--32", + url = "http://www.aclweb.org/anthology/W/W07/W07-0904" +} + +\%------------------------------Krishna Bib------------------------------ + +@Article{ smeulders00, + title = "{Content-based image retrieval at the end of the early years}", + author = "AWM Smeulders and M. Worring and S. Santini and A. Gupta and R. Jain", + journal = "IEEE Transactions on Pattern Analysis and Machine Intelligence", + volume = "22", + number = "12", + pages = "1349--1380", + year = "2000" +} + +@Article{ xu05, + author = {R. Xu and D. Wunsch II}, + title = {Survey of Clustering Algorithms}, + journal = {IEEE Transactions on Neural Networks}, + volume = {16}, + year = {2005}, + number = {3}, + month = {May}, + pages = {645--678} +} + +@Article{ fogel94, + author = {D. Fogel}, + title = {{An Introduction to Simulated Evolutionary Optimization}}, + journal = {IEEE Transactions on Neural Networks}, + volume = {5}, + year = {1994} +} + +@InProceedings{ eberhart01, + author = "R. C. Eberhart and Y. Shi", + title = "{Tracking and optimizing dynamic systems with particle swarms}", + booktitle = "Proceedings of the 2001 Congress on Evolutionary Computation", + volume = "1", + year = "2001" +} + +@InProceedings{ reynolds87, + author = "C.W. Reynolds", + title = "Flocks, herds and schools: a distributed behavioural model", + booktitle = "Computer Graphics", + year = "1987", + pages = "25--34" +} + +@InProceedings{ chandramouli06, + author = {K. Chandramouli and E.
Izquierdo}, + title = {Image Classification using Self Organising Feature Maps and Particle Swarm Optimisation}, + booktitle = {WIAMIS'06: Proceedings of the 7th International Workshop on Image Analysis for Multimedia Interactive Services}, + year = {2006}, + pages = {313--316} +} + +@InProceedings{ izquierdo07, + author = {E. Izquierdo and K. Chandramouli and M. Grzegorzek and T. Piatrik}, + title = {K-Space Content Management and Retrieval System}, + booktitle = {Proceedings of the 14th International Conference on Image Analysis and Processing}, + year = {2007} +} + +@InProceedings{ chandramouli07, + author = {K. Chandramouli}, + title = {Image Classification using Self Organising Feature Maps and Particle Swarm Optimisation}, + booktitle = {Doctoral Consortium of SMAP'07: Proceedings of the 2nd International Workshop on Semantic Media Adaptation and Personalization}, + year = {2007}, + pages = {212--216} +} + +@InProceedings{ chandramouli08, + author = {K. Chandramouli and E. Izquierdo}, + title = {A study on Relevance Feedback using Particle Swarm Optimization}, + note = {Submitted}, + year = {2008} +} + +@InProceedings{ papadopoulos08, + author = {G. Th. Papadopoulos and K. Chandramouli and V. Mezaris and I. Kompatsiaris and E. Izquierdo and M.G. Strintzis}, + title = {A Comparative Study of Classification Techniques for Knowledge-Assisted Image Analysis}, + booktitle = {WIAMIS'08: Proceedings of the 9th International Workshop on Image Analysis for Multimedia Interactive Services}, + year = {2008} +} + +@Article{ hollink07, + author = {L. Hollink and G. Schreiber and B. Wielinga}, + title = {Patterns of semantic relations to improve image content search}, + journal = {Web Semant.}, + year = {2007}, + pages = {195--203} +} + +@InProceedings{ gong05, + author = {Z. Gong and C. W. Cheang and U. L. Hou}, + title = {Web Query Expansion By WordNet}, + booktitle = {Proceedings of DEXA 2005}, + publisher = {LNCS}, + year = {2005}, + pages = {166--175} +} + +@Book{ wilson75, + author = {E. O.
Wilson}, + title = {Sociobiology: The new synthesis}, + publisher = {Belknap Press}, + address = {Cambridge, MA}, + year = {1975} +} + +@Article{ kohonen90, + author = {T. Kohonen}, + title = {The Self-Organizing Map}, + journal = {Proceedings of the IEEE}, + volume = {78}, + year = {1990}, + number = {9}, + month = {September}, + pages = {1464--1480} +} + +@Article{ djordjevic07, + author = {D. Djordjevic and E. Izquierdo}, + title = {An Object- and User-Driven System for Semantic-Based Image Annotation and Retrieval}, + journal = {IEEE Transactions on Circuits and Systems for Video Technology}, + volume = {17}, + year = {2007}, + number = {3}, + month = {March}, + pages = {313--323} +} + +@Article{ swain91, + author = {M. Swain and D. Ballard}, + title = {Color Indexing}, + journal = {International Journal of Computer Vision}, + volume = {7}, + year = {1991}, + number = {1}, + month = {June}, + pages = {11--32} +} + +@InProceedings{ blohm07, + author = {S. Blohm and P. Cimiano and E. Stemle}, + title = {Harvesting Relations from the Web - Quantifying the Impact of Filtering Functions}, + booktitle = {AAAI}, + year = {2007}, + pages = {1316--1321}, + bibsource = {DBLP, http://dblp.uni-trier.de} +} + +@Article{ papadopoulos07, + title={{Combining Global and Local Information for Knowledge-Assisted Image Analysis and Classification}}, + author={Papadopoulos, Georgios Th. and Mezaris, V. and Kompatsiaris, I. and Strintzis, M. G.}, + journal={EURASIP Journal on Advances in Signal Processing}, + volume={2007}, + pages={1--15}, + year={2007}, + publisher={Hindawi Publishing Corporation} +} + +@InCollection{ heppner90, + title={{A Stochastic nonlinear model for coordinated bird flocks}}, + author={F. Heppner and U. Grenander}, + booktitle={The Ubiquity of Chaos}, + editor={S. Krasner}, + publisher={AAAS Publications}, + address={Washington}, + year={1990} +} + +@Article{ Adamake05, + title = "{Region-based segmentation of images using syntactic visual features}", + author = "T. Adamek and N. {O Connor} and N.
Murphy", + journal = "WIAMIS'05: Workshop on Image Analysis for Multimedia Interactive Services, Montreux, Switzerland", + year = "2005" +} + +@Article{ manjunath96, + author = {B. S. Manjunath and W. Y. Ma}, + title = {Texture features for browsing and retrieval of image data}, + journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence}, + volume = {18}, + year = {1996}, + number = {8}, + month = {August}, + pages = {837--842} +} + +@InCollection{ tuceryan98, + author = {M. Tuceryan and A. K. Jain}, + title = {Texture Analysis}, + booktitle = {The Handbook of Pattern Recognition and Computer Vision}, + year = {1998}, + pages = {207--248} +} + +@article{semrelatedness, + author = {Budanitsky, Alexander and Hirst, Graeme}, + title = {Evaluating {WordNet}-based Measures of Lexical Semantic Relatedness}, + journal = {Computational Linguistics}, + issue_date = {March 2006}, + volume = {32}, + number = {1}, + month = mar, + year = {2006}, + issn = {0891-2017}, + pages = {13--47}, + numpages = {35}, + url = {http://dx.doi.org/10.1162/coli.2006.32.1.13}, + doi = {10.1162/coli.2006.32.1.13}, + acmid = {1168108}, + publisher = {MIT Press}, + address = {Cambridge, MA, USA}, +} + +@InProceedings{ pirro, + booktitle = "SWAP 2007: Fourth Italian Workshop on Semantic Web Applications and Perspectives", + year = "2007", + isbn = "978-88-902981-1-0", + publisher = "Universita di Bari", + address = "Italy" +} + +@Book{ agirre, + title = "Word Sense Disambiguation: Algorithms and applications", + author = "Eneko Agirre", + year = 2007, + ISBN = "978-1-4020-6870-6", + publisher = "Springer-Verlag" +} + +@TechReport{ cunningham00jape, + author = "Hamish Cunningham and Diana
Maynard and Valentin Tablan", + title = "{JAPE} - a {J}ava {A}nnotation {P}atterns {E}ngine ({S}econd edition)", + institution = "Department of Computer Science, University of Sheffield", + year = 2000, + note = "Technical Report" +} + +@TechReport{ winkler, + author = "William E. Winkler and Yves Thibaudeau", + title = "An Application of the {Fellegi-Sunter} Model of Record Linkage to the 1990 {U.S.} Decennial Census", + note = "Statistical Research Report Series RR91/09", + institution = "U.S. Bureau of the Census", + year = 1991, + address = "Washington, D.C." +} + +@inproceedings{suchanek2007WWW, + author = {Suchanek, Fabian M. and Kasneci, Gjergji and Weikum, Gerhard}, + title = {Yago: a core of semantic knowledge}, + booktitle = {Proceedings of the 16th international conference on World Wide Web}, + series = {WWW '07}, + year = {2007}, + isbn = {978-1-59593-654-7}, + location = {Banff, Alberta, Canada}, + pages = {697--706}, + numpages = {10}, + url = {http://doi.acm.org/10.1145/1242572.1242667}, + doi = {10.1145/1242572.1242667}, + acmid = {1242667}, + publisher = {ACM}, + address = {New York, NY, USA}, + keywords = {WordNet, wikipedia}, +} + + +@Article{ PersIR, + author = "Ph. Mylonas and D. Vallet and P. Castells and M. Fernandez and Y. Avrithis", + title = "Personalized information retrieval based on context and ontological knowledge", + journal = "Knowledge Engineering Review", + publisher = "Cambridge University Press", + volume = "23", + number = "1", + pages = "73--100", + month = "March", + year = "2008", + url = "http://www.image.ece.ntua.gr/publications.php" +} + + +@inproceedings{bast06, + author = {Bast, Holger and Dupret, Georges and Majumdar, Debapriyo and Piwowarski, Benjamin}, + title = {Discovering a term taxonomy from term similarities using principal component analysis}, + booktitle = {Proceedings of the 2005 joint international conference on Semantics, Web and Mining}, + series = {EWMF'05/KDO'05}, + year = {2006}, + isbn = {3-540-47697-0, 978-3-540-47697-9}, + location = {Porto, Portugal}, + pages = {103--120}, + numpages = {18}, + acmid = {2172948}, + publisher = {Springer-Verlag}, + address = {Berlin, Heidelberg}, + keywords = {eigenvector decomposition, latent semantic indexing, ontology extraction, principal component analysis, semantic tagging, taxonomy extraction}, +} +@Article{ manjunath01, + author = "B. S. Manjunath and J-R. Ohm and V. V. Vinod and A. Yamada", + title = "Color and Texture descriptors", + journal = "IEEE Transactions on Circuits and Systems for Video Technology", + volume = "11", + number = "6", + year = "2001", + pages = "703--715" +} + +@Book{ manjunath03, + AUTHOR = "B. S. Manjunath and P. Salembier and T.
Sikora",
  TITLE = "Introduction to MPEG-7: Multimedia Content Description Interface",
  PUBLISHER = "Wiley",
  ADDRESS = "New York",
  YEAR = "2003"
}

@inproceedings{cimiano05,
  author = {Cimiano, Philipp and V\"{o}lker, Johanna},
  title = {Text2Onto: a framework for ontology learning and data-driven change discovery},
  booktitle = {Proceedings of the 10th international conference on Natural Language Processing and Information Systems},
  series = {NLDB'05},
  year = {2005},
  isbn = {3-540-26031-5, 978-3-540-26031-8},
  location = {Alicante, Spain},
  pages = {227--238},
  numpages = {12},
  acmid = {2129816},
  publisher = {Springer-Verlag},
  address = {Berlin, Heidelberg},
}

@inproceedings{cunningham02,
  author = {Cunningham, Hamish and Maynard, Diana and Bontcheva, Kalina and Tablan, Valentin},
  title = {GATE: an architecture for development of robust HLT applications},
  booktitle = {Proceedings of the 40th Annual Meeting on Association for Computational Linguistics},
  series = {ACL '02},
  year = {2002},
  location = {Philadelphia, Pennsylvania},
  pages = {168--175},
  numpages = {8},
  url = {http://dx.doi.org/10.3115/1073083.1073112},
  doi = {10.3115/1073083.1073112},
  acmid = {1073112},
  publisher = {Association for Computational Linguistics},
  address = {Stroudsburg, PA, USA},
}

@InProceedings{ hepple00,
  author = "Mark Hepple",
  title = "Independence and commitment: Assumptions for rapid training and execution of rule-based POS taggers",
  booktitle = "ACL-2000",
  year = "2000"
}

@Book{ manning99,
  AUTHOR = "Christopher D. Manning and Hinrich Sch{\"u}tze",
  TITLE = "Foundations of Statistical Natural Language Processing",
  PUBLISHER = "MIT Press",
  YEAR = "1999"
}

@InProceedings{ nemeth04,
  author = "Y. Nemeth and B. Shapira and M.
Taieb-Maimon",
  title = "Evaluation of the real and perceived value of automatic and interactive query expansion",
  booktitle = "SIGIR '04",
  year = "2004",
  pages = "526--527"
}

@InProceedings{ ramshaw95,
  author = "Lance A. Ramshaw and Mitchell P. Marcus",
  title = "Text Chunking using Transformation-Based Learning",
  booktitle = "Third Workshop on Very Large Corpora",
  year = "1995",
  editor = {Yarowsky, David and Church, Kenneth},
  address = {Somerset, New Jersey},
  publisher = {Association for Computational Linguistics},
  pages = "82--94"
}

@InProceedings{ snow05,
  author = "Rion Snow and Daniel Jurafsky and Andrew Y. Ng",
  title = "Learning syntactic patterns for automatic hypernym discovery",
  booktitle = "Advances in Neural Information Processing Systems",
  year = "2005",
  address = "Cambridge, MA",
  number = "17",
  publisher = "MIT Press",
  pages = "1297--1304"
}

@inproceedings{snow06,
  author = {Snow, Rion and Jurafsky, Daniel and Ng, Andrew Y.},
  title = {Semantic taxonomy induction from heterogenous evidence},
  booktitle = {Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics},
  series = {ACL-44},
  year = {2006},
  location = {Sydney, Australia},
  pages = {801--808},
  numpages = {8},
  url = {http://dx.doi.org/10.3115/1220175.1220276},
  doi = {10.3115/1220175.1220276},
  acmid = {1220276},
  publisher = {Association for Computational Linguistics},
  address = {Stroudsburg, PA, USA},
}

@inproceedings{snow06_short,
  author = {Snow, Rion and Jurafsky, Daniel and Ng, Andrew Y.},
  title = {Semantic taxonomy induction from heterogenous evidence},
  series = {ACL-44},
  year = {2006},
  location = {Sydney, Australia},
  pages = {801--808},
  numpages = {8},
  url = {http://dx.doi.org/10.3115/1220175.1220276},
  doi =
{10.3115/1220175.1220276},
  acmid = {1220276},
  publisher = {ACL},
  address = {Stroudsburg, PA, USA},
}

@Article{ shapira05,
  author = "B. Shapira and M. Taieb-Maimon and Y. Nemeth",
  title = "Subjective and objective evaluation of interactive and automatic query expansion",
  journal = "Online Information Review",
  year = "2005",
  pages = "374--390"
}

@Proceedings{ DBLP:conf/dasfaa/2007,
  editor = "Kotagiri Ramamohanarao and P. Radha Krishna and Mukesh K. Mohania and Ekawit Nantajeewarawat",
  title = "Advances in Databases: Concepts, Systems and Applications, 12th International Conference on Database Systems for Advanced Applications, DASFAA 2007, Bangkok, Thailand, April 9-12, 2007, Proceedings",
  booktitle = "DASFAA",
  publisher = "Springer",
  series = "Lecture Notes in Computer Science",
  volume = "4443",
  year = "2007",
  isbn = "978-3-540-71702-7",
  bibsource = "DBLP, http://dblp.uni-trier.de"
}

@InProceedings{ chandramoulivie08_short,
  author = "Krishna Chandramouli and Tom{\'a}\v{s} Kliegr and Jan Nemrava and Vojt\v{e}ch Sv{\'a}tek and Ebroul Izquierdo",
  title = "Query Refinement and User Relevance Feedback for Contextualized Image Retrieval",
  booktitle = "Proceedings of 5th International Conference on Visual Information Engineering",
  series = "VIE '08",
  pages = "453--458",
  publisher = "IET",
  address = "Xi'an, China",
  year = "2008"
}

@InProceedings{ chandramoulivie08,
  author = "Krishna Chandramouli and Tom{\'a}\v{s} Kliegr and Jan Nemrava and Vojt\v{e}ch Sv{\'a}tek and Ebroul Izquierdo",
  title = "Query Refinement and User Relevance Feedback for Contextualized Image Retrieval",
  booktitle = "Proceedings of 5th International Conference on Visual Information Engineering",
  series = "VIE '08",
  pages = "453--458",
  publisher = "IET",
  address = "Xi'an, China",
  year = "2008"
}

@InProceedings{ gunduz03web,
  author = "S. Gunduz and M.
Ozsu",
  title = "A Web Page Prediction Model Based on Click-Stream Tree Representation of User Behavior",
  booktitle = "Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining",
  pages = "535--540",
  year = "2003"
}

@InProceedings{ oberle-conceptual,
  author = "Daniel Oberle and Bettina Berendt and Andreas Hotho and Jorge Gonzalez",
  title = "Conceptual User Tracking",
  booktitle = "Advances in Web Intelligence",
  isbn = "978-3-540-40124-7",
  year = 2003,
  volume = "2663",
  publisher = "Springer",
  series = "LNCS"
}

@Book{ infforaging,
  author = "Peter Pirolli",
  title = "Information foraging theory: adaptive interaction with information",
  year = "2007",
  publisher = "Oxford University Press"
}

@Article{ huberman98strong,
  author = "Bernardo A. Huberman and Peter L. T. Pirolli and James E. Pitkow and Rajan M. Lukose",
  title = "Strong Regularities in {W}orld {W}ide {W}eb Surfing",
  journal = "Science",
  volume = "280",
  number = "5360",
  pages = "95--97",
  year = "1998"
}

@InProceedings{ agrawal_constraint_based_mining,
  author = "Roberto J. Bayardo and Rakesh Agrawal and Dimitrios Gunopulos",
  title = "Constraint-Based Rule Mining in Large, Dense Databases",
  booktitle = "ICDE '99: Proceedings of the 15th International Conference on Data Engineering",
  year = "1999",
  isbn = "0-7695-0071-4",
  pages = "188",
  publisher = "IEEE",
  address = "Washington, DC, USA"
}

@InProceedings{ wumrepr,
  author = "Bamshad Mobasher and Honghua Dai and Tao Luo and Yuqing Sun and Jiang Zhu",
  title = "Integrating Web Usage and Content Mining for More Effective Personalization",
  booktitle = "EC-WEB '00: Proceedings of the First International Conference on Electronic Commerce and Web Technologies",
  year = "2000",
  isbn = "3-540-67981-2",
  pages = "165--176",
  publisher = "Springer-Verlag",
  address = "London, UK"
}

@InProceedings{ dai02using,
  author = "H. Dai and B.
Mobasher",
  title = "Using ontologies to discover domain-level web usage profiles",
  booktitle = "Proceedings of the 2nd Semantic Web Mining Workshop at ECML/PKDD",
  year = "2002",
  location = "Helsinki, Finland"
}

@Proceedings{ DBLP:conf/kdd/2005web,
  editor = "Olfa Nasraoui and Osmar R. Zaiane and Myra Spiliopoulou and Bamshad Mobasher and Brij M. Masand and Philip S. Yu",
  title = "Advances in Web Mining and Web Usage Analysis, 7th International Workshop on Knowledge Discovery on the Web, WebKDD 2005, Chicago, IL, USA, August 21, 2005. Revised Papers",
  booktitle = "WebKDD",
  publisher = "Springer",
  series = "Lecture Notes in Computer Science",
  volume = "4198",
  year = "2006",
  isbn = "3-540-46346-1",
  bibsource = "DBLP, http://dblp.uni-trier.de"
}

@InProceedings{ usage_mnining_SW,
  author = "Gerd Stumme and Bettina Berendt and Andreas Hotho",
  title = "Usage Mining for and on the Semantic Web",
  booktitle = "NGDM02: Proceedings of NSF Workshop on Next Generation Data Mining",
  year = "2002",
  pages = "77--86",
  location = "Baltimore, USA"
}

@InProceedings{ plsa,
  author = "Xin Jin and Yanzan Zhou and Bamshad Mobasher",
  title = "Web usage mining based on probabilistic latent semantic analysis",
  booktitle = "KDD '04: Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining",
  year = "2004",
  isbn = "1-58113-888-1",
  pages = "197--205",
  location = "Seattle, WA, USA",
  publisher = "ACM",
  address = "New York, NY, USA"
}

@Article{ assoc_rulesWSD,
  author = "Min Song and Il-Yeol Song and Xiaohua Hu and Robert B.
Allen",
  title = "Integration of Association Rules and Ontology for Semantic-based Query Expansion",
  journal = "Data \& Knowledge Engineering",
  volume = "63",
  year = "2007",
  issn = "0169-023X",
  publisher = "Elsevier",
  address = "Amsterdam, The Netherlands"
}

@InProceedings{ BastDMP06,
  AUTHOR = "Holger Bast and Georges Dupret and Debapriyo Majumdar and Benjamin Piwowarski",
  TITLE = "Discovering a Term Taxonomy from Term Similarities Using Principal Component Analysis",
  BOOKTITLE = "Semantics, Web and Mining: Joint International Workshops, EWMF 2005 and KDO 2005",
  PUBLISHER = "Springer",
  YEAR = "2006",
  VOLUME = "4289",
  PAGES = "103--120",
  SERIES = "LNCS",
  ADDRESS = "Porto, Portugal",
  ISBN = "978-3-540-47697-9"
}

@InProceedings{ cimiano05learning,
  title = "Learning Taxonomic Relations from Heterogeneous Sources of Evidence",
  author = "Philipp Cimiano and Aleksander Pivk and Lars Schmidt-Thieme and Steffen Staab",
  editor = "Paul Buitelaar and Philipp Cimiano and Bernardo Magnini",
  booktitle = "Ontology Learning from Text: Methods, Evaluation and Applications",
  volume = 123,
  series = "Frontiers in Artificial Intelligence",
  pages = "59--73",
  publisher = "IOS Press",
  year = 2005
}

@InProceedings{ cimiano04learning,
  author = "P. Cimiano and A. Pivk and L. Schmidt-Thieme and S.
Staab",
  title = "Learning taxonomic relations from heterogeneous sources",
  booktitle = "Proceedings of the ECAI 2004 Ontology Learning and Population Workshop",
  year = "2004"
}

@InProceedings{ nemravaekaw,
  title = "Refining search queries using {W}ord{N}et glosses",
  author = "Jan Nemrava",
  booktitle = "Poster and Demo Proceedings of EKAW 2006",
  year = "2006",
  editor = "Helena Sofia Pinto and Martin Labsky",
  isbn = "80-86742-15-6",
  location = "Pod\v{e}brady"
}

@Misc{ MOGP,
  title = "A Generic Optimal Feature Extraction Method using Multiobjective Genetic Programming: Methodology and Applications",
  author = "Yang Zhang and Peter Rockett",
  year = 2006,
  note = "{U}niversity of {S}heffield, {T}echnical {R}eport"
}

@Article{ GPoutperformsPCAandFLDA,
  author = "Hong Guo and Asoke K. Nandi",
  title = "Breast cancer diagnosis using genetic programming generated feature",
  journal = "Pattern Recognition",
  volume = "39",
  number = "5",
  year = "2006",
  issn = "0031-3203",
  pages = "980--987",
  publisher = "Elsevier Science Inc.",
  address = "New York, NY, USA"
}

@Article{ statedpreference,
  author = "Els Vandaele and Frank Witlox",
  title = "Determining the Monetary Value of Quality Attributes in Freight Transportation Using a Stated Preference Approach",
  journal = "Transportation Planning and Technology",
  month = "April",
  year = 2005,
  volume = 28
}

@TechReport{ uta_transport,
  AUTHOR = "Michel Beuthe and Christophe Bouffioux and Jan De Maeyer",
  TITLE = "A multicriteria analysis of stated preferences among freight transport alternatives",
  YEAR = 2003,
  MONTH = "August",
  INSTITUTION = "European Regional Science Association",
  TYPE = "ERSA conference papers",
  NOTE = "Available at http://ideas.repec.org/p/wiw/wiwrsa/ersa03p173.html",
  NUMBER = "ersa03p173"
}

@InProceedings{ kliegr_edbt08,
  author = "Tom{\'a}\v{s} Kliegr",
  title = "Representation and dimensionality reduction of semantically enriched
clickstreams",
  booktitle = "Ph.D. '08: Proceedings of the 2008 EDBT Ph.D. workshop",
  year = "2008",
  isbn = "978-1-59593-968-5",
  pages = "29--38",
  location = "Nantes, France",
  doi = "http://doi.acm.org/10.1145/1387150.1387156",
  publisher = "ACM",
  address = "New York, NY, USA"
}

@InCollection{ lpcomplexity,
  author = "Nimrod Megiddo",
  title = "On the complexity of linear programming",
  booktitle = "Advances in Economic Theory: Fifth World Congress",
  publisher = "Cambridge University Press",
  year = 1987,
  pages = "225--268"
}

@TechReport{ reversesimplex,
  title = "Post-Optimality Analysis via the Reverse Simplex Method and the Tarry Method",
  author = "van de Panne",
  institution = "University of Virginia, Department of Economics",
  year = 1996,
  note = "Technical Report",
  abstract = "In practical applications of linear programming, not only the optimal solution but also solutions with a somewhat lower value of the objective function are of interest. It is therefore desirable to generate all extreme-point solutions that satisfy the constraints and give a value of the objective function differing by at most a given amount from the optimal value. Two methods are considered for generating these extreme points. The first is called the reverse Simplex method, since it reverses the Simplex method for linear programming; the second is based on the Tarry method for traversing a network such that all nodes are visited. The two methods are explained in detail, applied to an example and compared with each other."
}

@InProceedings{ ar-fraud-detection,
  author = "Ahmed Metwally and Divyakant Agrawal and Amr El Abbadi",
  title = "Using Association Rules for Fraud Detection in Web Advertising Networks",
  booktitle = "Proceedings of the 31st VLDB Conference",
  address = "Trondheim, Norway",
  year = "2005"
}

@InProceedings{ wum-clustering-survey,
  author = "Athena Vakali and Jaroslav Pokorny and Theodore Dalamagas",
  booktitle = "Proceedings of the EDBT 2004 Workshops",
  year = "2004",
  title = "An Overview of Web Data Clustering Practices",
  publisher = "Springer",
  pages = "597--606"
}

@InProceedings{ banerjee01clickstream,
  author = "Arindam Banerjee and Joydeep Ghosh",
  title = "Clickstream clustering using weighted longest common subsequences",
  booktitle = "Proceedings of the Web Mining Workshop at the 1st SIAM Conference on Data Mining",
  address = "Chicago",
  year = "2001",
  month = "April",
  pages = "33--40"
}

@Article{ invalid_clicks,
  author = "Andy Beal",
  title = "Google's Click Fraud Rate is Less than 2\%",
  month = "December",
  year = "2006",
  journal = "Marketing Pilgrim"
}

@InProceedings{ ppcSpend,
  title = "NetTravel.CZ a virtu{\'a}ln{\'i} sv{\v{e}}t PPC",
  booktitle = "Prezentace na konferenci PPC 2007",
  year = 2007,
  note = "[In Czech]",
  url = "konference.dobryweb.cz/ppc/ke-stazeni/3-petr-stanek.ppt"
}

@InProceedings{ ppc07,
  title = "Inzerce placen{\'a} za proklik v {\v{C}}R k roku 2007",
  booktitle = "Sborn{\'i}k konference PPC 2007",
  year = 2007,
  note = "[In Czech]",
  url = "http://konference.dobryweb.cz/ppc/ke-stazeni/ppc-2007.pdf"
}

@Article{ onlineAdSpend,
  title = "Online Ad Spend Growth Is Historic",
  month = "March",
  day = 20,
  year = 2007,
  url = "http://www.emarketer.com/Article.aspx?id=1004695"
}

@Misc{ nettravel,
  author = "Petr Stanek",
  title = "Net Travel a virtu{\'a}ln{\'i} sv{\v{e}}t PPC",
  note = "Prezentace na konferenci Dobr{\'y} web [In Czech]",
  url =
"konference.dobryweb.cz/ppc/ke-stazeni/3-petr-stanek.ppt"
}

@Article{ srivastava00web,
  author = "Jaideep Srivastava and Robert Cooley and Mukund Deshpande and Pang-Ning Tan",
  title = "Web Usage Mining: Discovery and Applications of Usage Patterns from Web Data",
  journal = "{SIGKDD} Explorations",
  publisher = "ACM",
  volume = "1",
  number = "2",
  pages = "12--23",
  year = "2000"
}

@Book{ kliegrdp,
  PUBLISHER = "University of Economics in Prague, Faculty of Informatics and Statistics",
  ADDRESS = "Prague",
  Author = "Tom{\'a}\v{s} Kliegr",
  TITLE = "Clickstream Analysis",
  YEAR = "2007",
  NOTE = "Master Thesis"
}

@Article{ kosala00web,
  author = "Raymond Kosala and Hendrik Blockeel",
  title = "Web Mining Research: {A} Survey",
  journal = "{SIGKDD} Explorations: Newsletter of the SIGKDD",
  volume = "2",
  publisher = "ACM",
  year = "2000"
}

@Article{ watrends,
  author = "Arun Sen and Peter A. Dacin and Christos Pattichis",
  title = "Current trends in web data analysis",
  journal = "Communications of the ACM",
  volume = "49",
  number = "11",
  year = "2006",
  issn = "0001-0782",
  pages = "85--91",
  publisher = "ACM Press",
  address = "New York, NY, USA"
}

@InProceedings{ kohaviecommercegoodbadugly,
  author = "Ron Kohavi",
  title = "Mining E-Commerce Data: The Good, the Bad, and the Ugly",
  booktitle = "Proceedings of the Seventh ACM SIGKDD",
  editorDISABLED = "Foster Provost and Ramakrishnan Srikant",
  pages = "8--13",
  year = "2001"
}

@Article{ businessWeekClickFraud,
  title = {Click Fraud: The dark side of online advertising},
  journal = {BusinessWeek},
  month = {October},
  day = {2},
  year = {2006},
  note = {ISSN 0007-7135}
}

@Article{ forresterq32007,
  month = "September",
  year = 2007,
  day = 11,
  title = "{W}eb {A}nalytics, {Q}3 2007",
  author = "Megan Burns",
  journal = "The {F}orrester {W}ave"
}

@Article{ forresterq12006,
  year = 2006,
  title = "{W}eb {A}nalytics, {Q}1 2006",
  author = "Nate L.
Root",
  journal = "The {F}orrester {W}ave"
}

@InProceedings{ webmetricscase,
  author = "Birgit Weischedel and Eelko K. R. E. Huizingh",
  title = "Website optimization with web metrics: a case study",
  booktitle = "ICEC '06: Proceedings of the 8th international conference on Electronic commerce",
  year = "2006",
  isbn = "1-59593-392-1",
  pages = "463--470",
  location = "Fredericton, New Brunswick, Canada",
  publisher = "ACM Press",
  address = "New York, NY, USA"
}

"

    Usability is becoming an increasingly critical issue for commercial websites. Even big brand names are losing substantial revenues because their sites are too complicated for customers. Traditionally, usability was simply about making life easier for customers rather than impacting on the bottom line. When combined with appropriate measuring tools and metrics, usability has a return on investment (ROI) demonstrably greater than any other initiative. Making a site easier to use means that it is easier for customers to do business with the company.

    This paper proposes an approach whereby an Internet strategy based on performance measurement and a focus on usability offers an unequivocal ROI. The demonstrable ROI lends credibility to the business justification for an Internet strategy or augmentation of an existing strategy.

    This approach involves determining key business metrics and developing a solution employing usability methods to benchmark, analyse, resolve and remeasure to position a site that delivers on commercial objectives. This strategy is not for the faint of heart, but it does help to increase revenues and reduce costs while offering a sustainable competitive advantage.

    ",
  pages = "223--234"
}

@Article{ aberdeen,
  author = "{Aberdeen Group}",
  year = "2002",
  title = "Web Analytics: Making Business Sense of Online Behavior",
  address = "Boston, MA"
}

@InProceedings{ tensupplmeasures,
  booktitle = "Proceedings of the Fifth WEBKDD Workshop",
  year = "2003",
  author = "Ron Kohavi and Rajesh Parekh",
  title = "Ten Supplementary Analyses to Improve E-commerce Web Sites"
}

@Article{ forrester_forecast,
  title = "Web Analytics Market: Continued Growth In 2005",
  author = "Bob Chatham",
  year = "2004",
  month = "November",
  journal = "Forrester"
}

@Article{ forrester_3winningKPIs,
  title = "Three Winning Web Analytics KPIs",
  author = "Craig Menzies",
  abstract = "Web analysts want to use their toolbox to improve the customer experience of their Web sites. To be successful, they should use more advanced metrics like single access ratio, internal search results, and segmented conversion rates. By working with marketing and IT colleagues, Web analysts can then focus the site on the visitors that really matter, and save on expensive pay-per-click campaigns.",
  year = "2007"
}

@Misc{ whynotga,
  publisher = "Webtraffiq",
  title = "Why `Not' Google Web Analytics?",
  year = 2007,
  month = "July",
  note = "White paper",
  url = "http://www.webtrafficiq.com/home/white_paper_google_web_analytics.pdf"
}

@Article{ bradford,
  author = "S. C. Bradford",
  title = "Sources of information on specific subjects",
  journal = "J. Inf. Sci.",
  volume = "10",
  number = "4",
  year = "1985",
  issn = "0165-5515",
  pages = "173--180",
  publisher = "Sage Publications, Inc.",
  address = "Thousand Oaks, CA, USA"
}

@Article{ kleinberg99authoritative,
  author = "Jon M.
Kleinberg",
  title = "Authoritative sources in a hyperlinked environment",
  journal = "Journal of the ACM",
  volume = "46",
  number = "5",
  pages = "604--632",
  year = "1999"
}

@Article{ porter80-algorithm,
  title = "An algorithm for suffix stripping",
  author = "M. F. Porter",
  journal = "Program",
  month = "July",
  number = "3",
  pages = "130--137",
  volume = "14",
  year = "1980"
}

@InProceedings{ guhaprefix,
  author = "Luk{\'a}\v{s} Vl\v{c}ek",
  title = "A GUHA Style Association Rule Mining Using Prefix Tree Data Structure",
  booktitle = "Znalosti 2006 Proceedings",
  year = "2006",
  editor = "J{\'a}n Parali\v{c} and Ji{\v{r}}{\'i} Dvorsk{\'y} and Michal Kr{\'a}tk{\'y}",
  isbn = "80-248-1001-8",
  location = "Ostrava, \v{C}R",
  publisher = "V\v{S}B-TU Ostrava"
}

@Misc{ customizegoogle,
  url = "http://customizegoogle.blogspot.com/",
  title = "Customize Google",
  publisher = "Official blog of customizegoogle.com",
  note = "[Online; accessed 30-July-2007]"
}

@Proceedings{ DBLP:conf/wise/2002-2,
  title = "3rd International Conference on Web Information Systems Engineering Workshops (WISE 2002 Workshops), 11 December 2002, Singapore, Proceedings",
  booktitle = "WISE Workshops",
  publisher = "IEEE Computer Society",
  year = "2002",
  isbn = "0-7695-1813-3",
  bibsource = "DBLP, http://dblp.uni-trier.de"
}

@InProceedings{ 1244037,
  author = "Hisham Al-Mubaid and Hoa A.
Nguyen",
  title = "Semantic distance of concepts within a unified framework in the biomedical domain",
  booktitle = "SAC '07: Proceedings of the 2007 ACM symposium on Applied computing",
  year = "2007",
  isbn = "1-59593-480-4",
  pages = "142--143",
  location = "Seoul, Korea",
  doi = "http://doi.acm.org/10.1145/1244002.1244037",
  publisher = "ACM Press",
  address = "New York, NY, USA"
}

@Book{ pospichalea,
  title = "Evolu\v{c}n{\'e} algoritmy",
  author = "Vladim{\'i}r Kvas{\v{n}}i\v{c}ka and Ji{\v{r}}{\'i} Posp{\'i}chal and Peter Ti{\v{n}}o",
  publisher = "STU Bratislava",
  year = "2000",
  isbn = "80-227-1377-5"
}

@InProceedings{ randomprojection,
  author = "Ella Bingham and Heikki Mannila",
  title = "Random projection in dimensionality reduction: applications to image and text data",
  booktitle = "KDD '01: Proceedings of the 7th ACM SIGKDD international conference on Knowledge discovery and data mining",
  year = "2001",
  isbn = "1-58113-391-X",
  pages = "245--250",
  location = "San Francisco, California",
  doi = "http://doi.acm.org/10.1145/502512.502546",
  publisher = "ACM Press",
  address = "New York, NY, USA"
}

@Misc{ higherorderstat,
  url = "http://www.maths.leeds.ac.uk/applied/news.dir/issue2/hos_intro.html",
  author = "Achilleas Stogioglou and Steve McLaughlin and Justin Fackrell",
  title = "Introducing Higher Order Statistics (HOS) for the Detection of Nonlinearities",
  institution = "Department of Electrical Engineering, University of Edinburgh"
}

@Misc{ lecturedimred,
  title = "Lecture 5: Dimensionality reduction (PCA)",
  institution = "Texas A\&M University",
  note = "University course material",
  url = "http://courses.cs.tamu.edu/rgutier/cs790_w02/l5.pdf"
}

@TechReport{ Fodor02DRSurvey,
  title = "A Survey of Dimension Reduction Techniques",
  author = "Imola Fodor",
  institution = "Lawrence Livermore National Lab., CA (US)",
  url =
"http://www.osti.gov/energycitations/product.biblio.jsp?osti_id=15002155",
  year = "2002"
}

@InCollection{ algorithmsforARs,
  author = "Markus Hegland",
  title = "Algorithms for association rules",
  booktitle = "Advanced Lectures on Machine Learning",
  year = "2003",
  isbn = "3-540-00529-3",
  pages = "226--234",
  publisher = "Springer-Verlag New York, Inc.",
  address = "New York, NY, USA"
}

@Book{ statsoftetextbook,
  author = "{StatSoft}",
  title = "StatSoft Electronic Textbook",
  url = "http://www.statsoft.com/textbook/glosc.html",
  note = "Available online"
}

@Misc{ stateng,
  author = "Charles Annis",
  title = "Statistical Engineering",
  url = "http://www.statisticalengineering.com/curse_of_dimensionality.htm",
  note = "Web article"
}

@Book{ curseofdimensionality,
  TITLE = "Relationship-based Clustering and Cluster Ensembles for High-dimensional Data Mining",
  AUTHOR = "Alexander Strehl",
  PUBLISHER = "The University of Texas at Austin",
  YEAR = "2002",
  NOTE = "PhD Dissertation"
}

@Misc{ seqpat,
  author = "Jirachai Buddhakulsomsiri",
  title = "Sequential pattern analysis for automotive warranty data mining",
  booktitle = "Industrial Engineering Research Conference (IERC)",
  location = "Orlando, FL",
  month = "May",
  year = "2006",
  url = "http://www.engin.umd.umich.edu/hpceep/tech_day/download.php?year=2006&dept=IMS&target=2006_IMS_slideshow_pdf_00296.pdf"
}

@InProceedings{ agrawal95mining,
  author = "Rakesh Agrawal and Ramakrishnan Srikant",
  title = "Mining sequential patterns",
  booktitle = "Eleventh International Conference on Data Engineering",
  publisher = "IEEE Computer Society Press",
  address = "Taipei, Taiwan",
  editor = "Philip S. Yu and Arbee S. P.
Chen",
  pages = "3--14",
  year = "1995"
}

@Misc{ arbenchmark,
  title = "FIMI'03 -- Frequent Itemset Mining Implementations",
  url = "http://fimi.cs.helsinki.fi/",
  author = "Bart Goethals"
}

@Misc{ borgelt02induction,
  author = "Christian Borgelt and Rudolf Kruse",
  title = "Induction of association rules: Apriori implementation",
  text = "C. Borgelt and R. Kruse. Induction of association rules: Apriori implementation. In 15th Conference on Computational Statistics, 2002.",
  year = "2002"
}

@InProceedings{ structuremining,
  author = "Miguel Gomes da Costa Junior and Zhiguo Gong",
  title = "Web structure mining: an introduction",
  booktitle = "Proceedings of the 2005 IEEE International Conference on Information Acquisition",
  month = "June",
  year = "2005",
  isbn = "0-7803-9303-1"
}

@Proceedings{ DBLP:conf/ewmf/2005,
  editorDISABLED = "Markus Ackermann and Bettina Berendt and Marko Grobelnik and Andreas Hotho and Dunja Mladenic and Giovanni Semeraro and Myra Spiliopoulou and Gerd Stumme and Vojtech Sv{\'a}tek and Maarten van Someren",
  title = "Semantics, Web and Mining, Joint International Workshops, EWMF 2005 and KDO 2005, Porto, Portugal, October 3 and 7, 2005, Revised Selected Papers",
  booktitle = "EWMF/KDO",
  publisher = "Springer",
  series = "LNCS",
  volume = "4289",
  year = "2006",
  isbn = "3-540-47697-0",
  bibsource = "DBLP, http://dblp.uni-trier.de"
}

@Misc{ googlefirefox,
  title = "Google Gets Closer To Firefox",
  author = "Antone Gonsalves",
  publisher = "Information Week",
  month = "October",
  day = "9",
  year = "2005",
  url = "http://www.informationweek.com/showArticle.jhtml?articleID=173601284"
}

@Article{ tan02discovery,
  author = "P. Tan and V. Kumar",
  title = "Discovery of web robot sessions based on their navigational patterns",
  journal = "Data Mining and Knowl.
Discovery",
  volume = "6",
  pages = "9--35",
  year = "2002"
}

@Book{ salton,
  author = "Gerard Salton",
  title = "Introduction to Modern Information Retrieval",
  series = "McGraw-Hill Computer Science Series",
  publisher = "{McGraw-Hill Companies}",
  month = "September",
  year = "1983",
  isbn = "0070544840"
}

@Book{ Principlesofdatamining,
  author = "David J. Hand and Heikki Mannila and Padhraic Smyth",
  title = "Principles of data mining",
  year = "2001",
  isbn = "0-262-08290-X",
  publisher = "MIT Press",
  address = "Cambridge, MA, USA"
}

@Article{ frawley92knowledge,
  author = "W. J. Frawley and G. Piatetsky-Shapiro and C. J. Matheus",
  title = "Knowledge discovery in databases: an overview",
  journal = "AI Magazine",
  volume = "13",
  pages = "57--70",
  year = "1992"
}

@Misc{ cooley97web,
  author = "Robert Cooley and Bamshad Mobasher and Jaideep Srivastava",
  title = "Web Mining: Information and Pattern Discovery on the World Wide Web",
  text = "COOLEY, R., SRIVASTAVA, J., MOBASHER, B., Web Mining: Information and Pattern Discovery on the World Wide Web, Proceedings of the 9th IEEE International Conference on Tools with Artificial Intelligence (ICTAI'97), November 1997.",
  year = "1997"
}

@Misc{ conceptlattice,
  title = "The Agile Virtual Enterprise: Cases, Metrics, Tools",
  author = "Ted Goranson",
  url = "http://www.sirius-beta.com/ALICEsupp/FarOutIdeas/conceptlattices.html"
}

@InProceedings{ stumme00fast,
  author = "Gerd Stumme and Rafik Taouil and Yves Bastide and Nicolas
Pasquier and Lotfi Lakhal",
  title = "Fast Computation of Concept Lattices Using Data Mining Techniques",
  booktitle = "Knowledge Representation Meets Databases",
  pages = "129--139",
  year = "2000"
}

@Book{ IdeaWebMining,
  author = "Anthony Scime",
  title = "Web Mining: Applications and Techniques",
  year = "2004",
  isbn = "1591404150",
  publisher = "IGI Publishing",
  address = "Hershey, PA, USA"
}

@Misc{ wiki:ontology,
  author = "Wikipedia",
  title = "Ontology (computer science) - Wikipedia, The Free Encyclopedia",
  year = "2007",
  url = "http://en.wikipedia.org/w/index.php?title=Ontology_%28computer_science%29&oldid=144091862",
  note = "[Online; accessed 19-July-2007]"
}

@Misc{ wet,
  author = "Michael Etgen and Judy Cantor",
  title = "What does getting wet (web event-logging tool) mean for web usability?",
  booktitle = "5th Conference on Human Factors and The Web",
  year = "1999"
}

@Misc{ uar,
  author = "Richard Thomas and Gregor Kennedy and Steve Draper and Rebecca Mancy and Murray Crease and Huw Evans and Phil Gray",
  title = "Generic usage monitoring of programming students",
  text = "Crisp, G., Thiele, D., Scholten, I., Barker, S., Baron, J., eds.: 20th Annual Conference of the Australasian Society for Computers in Learning in Tertiary Education (ASCILITE).",
  year = "2003"
}

@Misc{ vips,
  author = "Deng Cai and Shipeng Yu and Ji-Rong Wen and Wei-Ying Ma",
  title = "VIPS: a vision-based page segmentation algorithm",
  text = "Microsoft Technical Report, MSR-TR-2003-79",
  year = "2003"
}

@Misc{ kovacevic-recognition,
  author = "Milos Kovacevic and Michelangelo Diligenti and Marco Gori and Marco Maggini and Veljko Milutinovic",
  title = "Recognition of Common Areas in a Web Page Using Visual Information: a possible application in a page classification",
  year = "2002"
}

@InProceedings{ eyetracking,
  author = "Laura A.
Granka and Thorsten Joachims and Geri Gay", + title = "Eye-tracking analysis of user behavior in WWW search", + booktitle = "SIGIR '04: Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval", + year = "2004", + isbn = "1-58113-881-4", + pages = "478--479", + location = "Sheffield, United Kingdom", + publisher = "ACM Press", + address = "New York, NY, USA" +} + +@Proceedings{ DBLP:conf/iccsa/2006-5, + editor = "Marina L. Gavrilova and Osvaldo Gervasi and Vipin Kumar and Chih Jeng Kenneth Tan and David Taniar and Antonio Lagan{\`a} and Youngsong Mun and Hyunseung Choo", + title = "Computational Science and Its Applications - ICCSA 2006, International Conference, Glasgow, UK, May 8-11, 2006, Proceedings, Part V", + booktitle = "ICCSA (5)", + publisher = "Springer", + series = "Lecture Notes in Computer Science", + volume = "3984", + year = "2006", + isbn = "3-540-34079-3", + bibsource = "DBLP, http://dblp.uni-trier.de" +} + +@Proceedings{ DBLP:conf/dmin/2006, + editor = "Sven F. Crone and Stefan Lessmann and Robert Stahlbock", + title = "Proceedings of the 2006 International Conference on Data Mining, DMIN 2006, Las Vegas, Nevada, USA, June 26-29, 2006", + booktitle = "DMIN", + publisher = "CSREA Press", + year = "2006", + isbn = "1-60132-004-3", + bibsource = "DBLP, http://dblp.uni-trier.de" +} + +@Proceedings{ DBLP:conf/ah/2006, + editor = "Vincent P. 
Wade and Helen Ashman and Barry Smyth", + title = "Adaptive Hypermedia and Adaptive Web-Based Systems, 4th International Conference, AH 2006, Dublin, Ireland, June 21-23, 2006, Proceedings", + booktitle = "AH", + publisher = "Springer", + series = "Lecture Notes in Computer Science", + volume = "4018", + year = "2006", + isbn = "3-540-34696-1", + bibsource = "DBLP, http://dblp.uni-trier.de" +} + +@InProceedings{ shahabi00insite, + author = "Cyrus Shahabi and Adil Faisal and Farnoush Banaei Kashani and Jabed Faruque", + title = "{INSITE}: A Tool for Interpreting Users' Interaction with a Web Space", + booktitle = "The {VLDB} Journal", + pages = "635--638", + year = "2000" +} + +@Article{ cooley99data, + author = "Robert Cooley and Bamshad Mobasher and Jaideep Srivastava", + title = "Data Preparation for Mining World Wide Web Browsing Patterns", + journal = "Knowledge and Information Systems", + volume = "1", + number = "1", + pages = "5--32", + year = "1999" +} + +@Book{ kliegrbp, + PUBLISHER = "University of Economics in Prague", + ADDRESS = "Prague", + Author = "Tom{\'a}\v{s} Kliegr", + TITLE = "Search Engine Optimization And Web Metrics", + YEAR = "2006", + NOTE = "Bachelor Thesis [In Czech]" +} + +@InProceedings{ Paganelli, + author = "Laila Paganelli and Fabio Patern{\`o}", + title = "Intelligent analysis of user interactions with web applications", + booktitle = "IUI '02: Proceedings of the 7th international conference on Intelligent user interfaces", + year = "2002", + isbn = "1-58113-459-2", + pages = "111--118", + location = "San Francisco, California, USA", + doi = "http://doi.acm.org/10.1145/502716.502735", + publisher = "ACM Press", + address = "New York, NY, USA" +} + +@Article{ googlefortune500, + title = "20 percent of the Fortune 500 may be insane", + author = "Matthew Roche", + url = "http://www.landingpageoptimization.com/2006/07/20_of_the_fortu.html", + note = "Blog post. 
July 20, 2006 " +} + +@Misc{ comscore, + title = "Cookie-Based Counting Overstates Size of Web Site Audiences", + author = "ComScore", + url = "http://www.comscore.com/press/release.asp?press=1389", + note = "Press Release. April 16, 2007 " +} + +@Article{ cnet, + title = "Bumpy start for Google analytics giveaway", + author = "Elinor Mills", + url = "http://news.com.com/Bumpy+start+for+Google+analytics+giveaway/2100-1032_3-5956308.html", + journal = "CNET news.com", + year = "2005", + note = "Published: November 16, 2005" +} + +@Article{ army, + title = "India's secret army of online ad 'clickers'", + author = "N. Vidyasagar", + JOURNAL = "Times of India", + year = "2004", + url = "http://timesofindia.indiatimes.com/articleshow/msid-654822,curpg-1.cms", + note = "[3.5.2004]" +} + +@Misc{ clickingagent, + title = "Clicking agent", + author = "Lotesoft co. co.", + url = "http://www.clickingagent.com/softcaca.html", + note = "[8.5.2007]" +} + +@Misc{ CSSexplanation, + title = "Phishing for Clues - Inferring Context Using Cascading Style Sheets and Browser History", + author = "Markus Jakobsson and Tom N. Jagatic and Sid Stamm", + url = "https://www.indiana.edu/~phishing/browser-recon/", + note = "[8.5.2007]" +} + +@Misc{ CSSdemo, + title = "CSS Exploit information", + author = "Henrik Gemal", + url = "http://gemal.dk/browserspy/css.html", + note = "[8.5.2007]" +} + +@InProceedings{ Microsoft2, + author = "Joshua Goodman", + title = "Pay-per-percentage of Impressions: An Advertising Method that is Highly Robust to Fraud", + booktitle = "Presented at the ACM E-Commerce Workshop on Sponsored Search Auctions", + year = "2005", + text = "In this paper, we describe a simple method for selling advertising, pay-per-percentage of impressions, that is immune to both click fraud and impression fraud. We describe assumptions required to guarantee the immunity, which impact the design of the system. 
In particular, ads must be shown in a truly random way, across the percentage of impressions purchased. We describe prefix-match: a system that is similar to broad-match, but more compatible with pay-per-percentage. We show how to auction pay-per-percentage matches, including prefix matches in a revenue maximizing way. Finally, we describe variations on the technique that may make it easier to sell to advertisers. " +} + +@Article{ anupam99security, + author = "Vinod Anupam and Alain Mayer and Kobbi Nissim and Benny Pinkas and Michael K. Reiter", + title = "On the security of pay-per-click and other {Web} advertising schemes", + journal = "Computer Networks (Amsterdam, Netherlands: 1999)", + volume = "31", + number = "11--16", + pages = "1091--1100", + year = "1999" +} + +@InProceedings{ Microsoft1, + author = "N. Immorlica and K. Jain and M. Mahdian and K. Talwar", + title = "Click Fraud Resistant Methods for Learning Click-Through Rates", + booktitle = "Presented at the Workshop on Internet and Network Economics (WINE)", + year = "2005", + location = "Hong Kong", + text = "Most of today's online advertising business (such as Google and Overture) works based on pay-per-click auctions. Such systems often learn a parameter called click-through rate (CTR), and rank bidders based on their bid times their CTR. In this way, in expectation, the system behaves like a pay-per-impression system, where each advertiser pays an amount equal to her bid times her CTR every time her ad is displayed. It is well-known that pay-per-impression systems are less prone to fraud. Therefore, one might expect a pay-per-click system to be resistant to fraud in expectation. In this paper, we observe that this conclusion is not necessarily true, since an adversary can create fluctuation in the CTR and take advantage of the fact that the CTR learning algorithm does not adapt quickly to such fluctuations to cause harm to the advertiser. 
Then, we define a class of CTR learning algorithms called click-based algorithms and prove that click-based learning algorithms are resistant to such attacks. We +give examples demonstrating that many natural non-click-based algorithms are not resistant to such attacks." +} + +@Misc{ metwally05using, + author = "A. Metwally and D. Agrawal and A. Abbadi", + title = "Using Association Rules for Fraud Detection in Web Advertising Networks", + publisher = "University of California", + place = "Santa Barbara", + year = "2005" +} + +@InProceedings{ reiterdetecting, + author = "Michael K. Reiter and Vinod Anupam and Alain Mayer", + title = "Detecting Hit Shaving in Click-Through Payment Schemes", + booktitle = "Proceedings of the 3rd USENIX Workshop on Electronic Commerce", + pages = "155--166", + year = "1998" +} + +@Misc{ tuzhilin, + author = "Alexander Tuzhilin", + title = "The {L}ane's {G}ifts v. {G}oogle {R}eport", + note = "Court report", + year = "2006", + url = "http://googleblog.blogspot.com/pdf/Tuzhilin_Report.pdf" +} + +@Misc{ blogonthuzhilin, + author = "Rimm-Kaufman Group", + title = "Comments on The Lane's Gifts v. Google Report by Alexander Tuzhilin", + url = "http://www.rimmkaufman.com/rkgblog/2006/08/06/comments-on-the-lanes-gifts-v-google-report-by-alexander-tuzhilin/" +} + +@Misc{ googlelinkquality, + author = "Google Inc {Click Quality Team}", + title = "How Fictitious Clicks Occur in Third-Party Click Fraud Audit Reports", + url = "www.google.com/adwords/ReportonThird-PartyClickFraudAuditing.pdf", + year = "2006" +} + +@Book{ guhabook, + PUBLISHER = "Springer-Verlag", + Author = "Petr H{\'a}jek and Tom{\'a}\v{s} Havr{\'a}nek", + TITLE = "Mechanizing Hypothesis Formation", + YEAR = "1978", + ISBN = "3-540-08738-9" +} + +@Proceedings{ DBLP:conf/RelMiCS/2003, + editor = "Harrie C. M. 
de Swart and Ewa Orlowska and Gunther Schmidt and Marc Roubens", + title = "Theory and Applications of Relational Structures as Knowledge Instruments, COST Action 274, TARSKI, Revised Papers", + booktitle = "Theory and Applications of Relational Structures as Knowledge Instruments", + publisher = "Springer", + series = "Lecture Notes in Computer Science", + volume = "2929", + year = "2003", + isbn = "3-540-20780-5", + bibsource = "DBLP, http://dblp.uni-trier.de" +} + +@InCollection{ DBLP:books/mit/PF91/Piatetsky91, + author = "Gregory Piatetsky-Shapiro", + title = "Discovery, Analysis, and Presentation of Strong Rules", + booktitle = "Knowledge Discovery in Databases", + publisher = "AAAI/MIT Press", + year = "1991", + isbn = "0-262-62080-4", + pages = "229--248", + bibsource = "DBLP, http://dblp.uni-trier.de" +} + +@inproceedings{agrawal94fast, + author = {Agrawal, Rakesh and Srikant, Ramakrishnan}, + title = {Fast Algorithms for Mining Association Rules in Large Databases}, + booktitle = {Proceedings of the 20th International Conference on Very Large Data Bases}, + series = {VLDB '94}, + year = {1994}, + isbn = {1-55860-153-8}, + pages = {487--499}, + numpages = {13}, + acmid = {672836}, + publisher = {Morgan Kaufmann Publishers Inc.}, + address = {San Francisco, CA, USA}, +} + + + + +@InProceedings{ tan02selecting, + author = "P. Tan and V. Kumar and J. 
Srivastava", + title = "Selecting the right interestingness measure for association patterns", + booktitle = "Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining", + Month = "July", + volume = "183", + year = "2002" +} + +@InProceedings{ PruningandSummarizingtheDiscoveredAssociations, + author = "Bing Liu and Wynne Hsu and Yiming Ma", + title = "Pruning and summarizing the discovered associations", + booktitle = "KDD '99: Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining", + year = "1999", + isbn = "1-58113-143-7", + pages = "125--134", + location = "San Diego, California, United States", + doi = "http://doi.acm.org/10.1145/312129.312216", + publisher = "ACM Press", + address = "New York, NY, USA" +} + +@Manual{ apriorib, + TITLE = "Apriori documentation", + author = "Christian Borgelt", + INSTITUTION = "University of Magdeburg", + URL = "http://fuzzy.cs.uni-magdeburg.de/\~borgelt/doc/apriori/apriori.html", + NOTE = "[4.5.2007]" +} +@article{arules, + title={Introduction to arules--mining association rules and frequent item sets}, + author={Hahsler, Michael and Gr{\"u}n, Bettina and Hornik, Kurt}, + journal={SIGKDD Explor}, + volume={2}, + number={4}, + pages={1--28}, + year={2007}, + publisher={Citeseer} +} + +@InProceedings{ chi02lumberjack, + author = "Ed H. Chi and Adam Rosien and Jeffrey Heer", + title = "LumberJack: Intelligent Discovery and Analysis of Web User Traffic Composition", + booktitle = "Proceedings of the ACM SIGKDD Workshop on Web Mining for Usage Patterns and User Profiles", + year = "2002", + publisher = "ACM Press", + location = "Canada" +} + +@InProceedings{ relevanceoftime, + author = "Peter I. 
Hofgesang", + title = "Relevance of Time Spent on Web Pages", + booktitle = "Proceedings of the KDD Workshop on Web Mining and Web Usage Analysis at WebKDD 2006", + year = "2006", + isbn = "978-80-248-1279-3", + location = "Philadelphia", + publisher = "ACM Press" +} + +@InProceedings{ kliegr_ekaw06, + author = "Tom{\'a}\v{s} Kliegr", + title = "Clickstream analysis - the semantic approach", + booktitle = "Poster and Demo Proceedings of EKAW 2006 - 15th International Conference on Knowledge Engineering and Knowledge Management Managing Knowledge in a World of Networks", + year = "2006", + editor = "Helena Sofia Pinto and Martin Labsky", + isbn = "80-86742-15-6", + location = "Pod\v{e}brady" +} + +@InProceedings{ kliegr_znalosti07, + author = "Tom{\'a}\v{s} Kliegr", + title = "Mining conversion patterns from click streams", + booktitle = "Sborn{\'i}k konference Znalosti 2007", + year = "2007", + isbn = "978-80-248-1279-3", + editor = "Mikuleck{\'y} Peter and Ji{\v{r}}{\'i} Dvorsk{\'y} and Michal Kr{\'a}tk{\'y}", + location = "Ostrava \v{C}R", + publisher = "V\v{S}B-TU Ostrava" +} + +@InProceedings{ kliegr_znalosti06, + author = "Tom{\'a}\v{s} Kliegr", + title = "P{\v{r}}{\'i}prava dat pro clickstream anal{\'y}zu", + booktitle = "Znalosti 2006 Proceedings", + year = "2006", + editor = "J{\'a}n Parali\v{c} and Ji{\v{r}}{\'i} Dvorsk{\'y} and Michal Kr{\'a}tk{\'y}", + isbn = "80-248-1001-8", + location = "Ostrava \v{C}R", + publisher = "V\v{S}B-TU Ostrava" +} + +@Misc{ antcolonyGP, + author = "A. Abraham and V. Ramos", + title = "Web Usage Mining using Artificial Ant Colony Clustering and Genetic Programming", + text = "Abraham A. and Ramos V., 2003, Web Usage Mining using Artificial Ant Colony Clustering and Genetic Programming. CEC'03 -- Proceedings Congress on Evolutionary Computation, IEEE Press, Australia, pp. 
1384-1391.", + year = "2003" +} + +@Book{ barla, + PUBLISHER = "Fakulta informatiky a informa\v{c}n{\'y}ch technol{\'o}gi{\'i} - Slovensk{\'a} technick{\'a} univerzita v Bratislave ", + Author = "Michal Barla", + TITLE = "Zachytenie z{\'a}ujmov pou\v{z}{\'i}vate\v{l}a na webe", + YEAR = "2006", + NOTE = "Diplomov{\'a} pr{\'a}ce" +} + +@Book{ witten, + abstract = "As with any burgeoning technology that enjoys commercial attention, the use of data mining is surrounded by a great deal of hype. Exaggerated reports tell of secrets that can be uncovered by setting algorithms loose on oceans of data. But there is no magic in machine learning, no hidden power, no alchemy. Instead there is an identifiable body of practical techniques that can extract useful information from raw data. This book describes these techniques and shows how they work. The book is a major revision of the first edition that appeared in 1999. While the basic core remains the same, it has been updated to reflect the changes that have taken place over five years, and now has nearly double the references. The highlights for the new edition include thirty new technique sections; an enhanced Weka machine learning workbench, which now features an interactive interface; comprehensive information on neural networks; a new section on Bayesian networks; plus much more. + Authors, Ian Witten and Eibe +Frank, recipients of the 2005 ACM SIGKDD Service Award. + Algorithmic methods at the heart of successful data mining, including tried and true techniques as well as leading edge methods; + Performance improvement techniques that work by transforming the input or output; + Downloadable Weka, a collection of machine learning algorithms for data mining tasks, including tools for data pre-processing, classification, regression, clustering, association rules, and visualization, in a new, interactive interface.", + author = "Ian H. 
Witten and Eibe Frank", + citeulike-article-id = "340715", + edition = "Second", + howpublished = "Paperback", + isbn = "0120884070", + keywords = "classification machine-learning ml", + month = "June", + priority = "0", + publisher = "Morgan Kaufmann", + series = "Morgan Kaufmann Series in Data Management Systems", + title = "Data Mining: Practical Machine Learning Tools and Techniques", + year = "2005" +} + +@Book{ ashlock, + PUBLISHER = "Springer", + Author = "Daniel Ashlock", + TITLE = "Evolutionary Computation for Modeling and Optimization", + YEAR = "2005", + ISBN = "978-0387221960" +} + +@Book{ mogotsiDis, + PUBLISHER = "University of Stellenbosch", + Author = "I.C. Mogotsi", + TITLE = "What did they cover? ", + YEAR = "2006", + NOTE = "Diserta\v{c}n{\'i} pr{\'a}ce" +} + +@InProceedings{ kmeanskomplexity, + author = "David Arthur and Sergei Vassilvitskii", + title = "How slow is the k-means method?", + booktitle = "SCG '06: Proceedings of the twenty-second annual symposium on Computational geometry", + year = "2006", + isbn = "1-59593-340-9", + pages = "144--153", + location = "Sedona, Arizona, USA", + doi = "http://doi.acm.org/10.1145/1137856.1137880", + publisher = "ACM Press", + address = "New York, NY, USA" +} + +@InProceedings{ SpericalKmeansJeLepsi, + author = "Alexander Strehl and Joydeep Ghosh and Raymond Mooney", + booktitle = "Proceedings of Workshop of Artificial Intelligence for Web Search", + citeulike-article-id = "945520", + keywords = "clustering similarity web", + month = "July", + address = "Texas", + pages = "58--64", + priority = "3", + publisher = "AAAI", + title = "Impact of Similarity Measures on Web-page Clustering", + year = "2000" +} + +@InProceedings{ SpericalKmeans, + author = "Shi Zhong", + abstract = "The spherical k-means algorithm, i.e., the k-means algorithm with cosine similarity, is a popular method for clustering high-dimensional text data. 
In this algorithm, each document as well as each cluster mean is represented as a high-dimensional unit-length vector. However, it has been mainly used in batch mode. That is, each cluster mean vector is updated only after all document vectors have been assigned, as the (normalized) average of all the document vectors assigned to that cluster. This paper investigates an online version of the spherical k-means algorithm based on the well-known winner-take-all competitive learning. In this online algorithm, each cluster centroid is incrementally updated given a document. We demonstrate that the online spherical k-means algorithm can achieve significantly better clustering results than the batch version, especially when an annealing-type learning rate schedule is used. We also present heuristics to improve the speed, yet almost without loss of clustering +quality.", + booktitle = "Proceedings of the Neural Networks IJCNN '05", + month = "July", + pages = "3180--3185", + volume = "5", + publisher = "IEEE", + title = "Efficient online spherical k-means clustering", + year = "2005" +} + +@Article{ kmeansinitcomparison, + author = "J. M. Pena and J. A. Lozano and P. Larranaga", + title = "An empirical comparison of four initialization methods for the K-Means algorithm", + journal = "Pattern Recogn. Lett.", + volume = "20", + number = "10", + year = "1999", + issn = "0167-8655", + pages = "1027--1040", + publisher = "Elsevier Science Inc.", + address = "New York, NY, USA" +} + +@Book{ smicka, + author = "Radim Smi\v{c}ka", + TITLE = "Optimalizace pro vyhled{\'a}va\v{c}e -- SEO", + PUBLISHER = "Jaroslava Smi\v{c}kov{\'a}", + ADDRESS = "Dubany", + YEAR = "2004", + ISBN = "80-239-2961-5" +} + +@Article{ prweek, + title = "Firms look to search engines to manage reputations", + author = "Rob Key", + JOURNAL = "PRweek. (U.S. 
ed.)", + ADDRESS = "New York", + ISSUE = "31", + VOLUME = "7", + Month = "October", + YEAR = "2004" +} + +@Article{ chainstoreage, + title = "Driving Site Traffic", + author = "Lee Ann Prescott", + JOURNAL = "Chain store age", + ADDRESS = "New York", + ISSUE = "4", + VOLUME = "81", + Month = "April", + YEAR = "2005" +} + +@Article{ newsweek, + title = "Hotwiring Your Search Engine", + author = "Brad Stone", + JOURNAL = "Newsweek. (U.S. ed.)", + ADDRESS = "New York", + ISSUE = "25", + VOLUME = "146", + Month = "December", + YEAR = "2005" +} + +@Article{ prnews, + title = "Companies missing windows of opportunity", + JOURNAL = "PR News", + ADDRESS = "Potomac", + ISSUE = "60", + VOLUME = "36", + Month = "September", + YEAR = "2004" +} + +@Article{ wikikontext, + title = "Context", + JOURNAL = "{W}ikipedia, the free encyclopedia", + URL = "http://en.wikipedia.org/wiki/Context", + YEAR = "2005", + NOTE = "[1.1.2006]" +} + +@Book{ torok, + PUBLISHER = "Budapesti Business School", + ADDRESS = "Budapest", + Author = "Katalin T{\"o}r{\"o}k", + TITLE = "Pay-for-performance marketing on the Internet", + YEAR = "2003", + NOTE = "Diplomov{\'a} pr{\'a}ce" +} + +@Article{ murray, + title = "PPC v Natural Search -- A Cost Comparison Case Study", + Author = "Glen Murray", + JOURNAL = "OZArticles.com", + URL = "http://www.ozarticles.com/natural-search.html", + NOTE = "[1.1.2006]" +} + +@Article{ wikiseo, + title = "SEO", + JOURNAL = "{W}ikipedia, the free encyclopedia", + URL = "http://en.wikipedia.org/wiki/SEO", + NOTE = "[1.1.2006]" +} + +@Article{ tkacikova, + AUTHOR = "Daniela Tka\v{c}{\'i}kov{\'a}", + TITLE = "Kvalitn{\'i} dokument jako z{\'a}klad {\'u}\v{c}inn{\'e}ho vyhled{\'a}v{\'a}n{\'i} informac{\'i}", + JOURNAL = "Sborn{\'i}k konference INFORUM", + YEAR = 2004, + URL = "http://www.inforum.cz/inforum2004/pdf/Tkacikova_Daniela.pdf", + NOTE = "[1.1.2006]" +} + +@Article{ kosek, + AUTHOR = "Ji{\v{r}}{\'i} Kosek", + TITLE = "Pro\v{c} nepou\v{z}{\'i}v{\'a}m XHTML", + JOURNAL = "Interval.cz", + YEAR = 2004, + ISSN = "1212-8651", + URL = 
"http://interval.cz/clanek.asp?article=3600", + NOTE = "[1.1.2006]" +} + +@Article{ stanicek, + AUTHOR = "Petr Stan{\'i}\v{c}ek", + TITLE = "Pro\v{c} pou\v{z}{\'i}v{\'a}m XHTML", + JOURNAL = "Interval.cz", + YEAR = 2004, + ISSN = "1212-8651", + URL = "http://interval.cz/clanek.asp?article=3609", + NOTE = "[1.1.2006]" +} + +@Article{ sej, + TITLE = "SEO Benefits Of CSS", + JOURNAL = "Search Engine Journal", + YEAR = 2005, + MONTH = "September", + URL = "http://www.searchenginejournal.com/index.php?p=2211", + NOTE = "[1.1.2006]" +} + +@Article{ blazek, + AUTHOR = "Vratislav Bla\v{z}ek", + TITLE = "SEO obr{\'a}zky", + JOURNAL = "Interval.cz", + YEAR = 2005, + ISSN = "1212-8651", + URL = "http://css.interval.cz/clanky/seo-obrazky/", + NOTE = "[1.1.2006]" +} + +@Article{ weida, + AUTHOR = "Petr Weida", + TITLE = "SEO - v{\'y}b\v{e}r dom{\'e}ny a hostingu", + JOURNAL = "Interval.cz", + YEAR = 2004, + ISSN = "1212-8651", + URL = "http://interval.cz/clanek.asp?article=3719", + NOTE = "[1.1.2006]" +} + +@Article{ illich, + AUTHOR = "Michal Illich", + TITLE = "PageRank a jeho roz\v{s}{\'i}\v{r}en{\'i}", + JOURNAL = "Lupa.cz", + YEAR = 2003, + ISSN = "1213-0702", + URL = "http://www.lupa.cz/clanky/pagerank-a-jeho-rozsireni/", + NOTE = "[1.1.2006]" +} + +@Article{ tabke, + title = "Successful Site in 12 Months with Google Alone", + AUTHOR = "Brett Tabke", + JOURNAL = "Google News Archive", + URL = "http://www.webmasterworld.com/forum3/2010.htm", + YEAR = "2002", + NOTE = "[1.1.2006]" +} + +@Article{ smickablog, + title = "Vlastn{\'i} IP adresa", + AUTHOR = "Radim Smi\v{c}ka", + JOURNAL = "Radim Smi\v{c}ka Blog", + URL = "http://smicka.blog.cz/0508/vlastni-ip-adresa", + YEAR = "2005", + NOTE = "[1.1.2006]" +} + +@TechReport{ googlepatent, + Title = "United States Patent Application 20050071741", + AUTHOR = "Anurag Acharya et al. 
", + Institution = "US Patent and Trademark Office", + URL = "http://appft1.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&Sect2=HITOFF&d=PG01&p=1&u=%2Fnetahtml%2FPTO%2Fsrchnum.html&r=1&f=G&l=50&s1=%2220050071741%22.PGNR.&OS=DN/20050071741&RS=DN/20050071741", + YEAR = "2005", + NOTE = "[1.1.2006]" +} + +@Article{ seochat, + title = "Google Optimization", + JOURNAL = "SEOChat.com", + URL = "http://forums.seochat.com/google-optimization-7/proof-of-search-position-based-on-age-of-domain-17923.html", + YEAR = "2004", + NOTE = "[1.1.2005]" +} + +@Article{ webpronews, + author = "Garrett French", + title = "Google SandBox Effect Revealed", + JOURNAL = "webpronews.com", + URL = "http://www.webpronews.com/insiderreports/searchinsider/wpn-49-20040506GoogleSandBoxEffectRevealed.html", + YEAR = "2004", + NOTE = "[1.1.2005]" +} + +@Article{ boldis, + AUTHOR = "Petr Boldi\v{s}", + TITLE = "Content management: organizace informac{\'i} na webov{\'y}ch str{\'a}nk{\'a}ch", + JOURNAL = "Sborn{\'i}k konference INFORUM", + YEAR = 2004, + URL = "http://www.inforum.cz/inforum2004/pdf/Boldis_Petr.pdf", + NOTE = "[1.1.2006]" +} + +@Article{ prokop, + AUTHOR = "Marek Prokop", + TITLE = "Role informa\v{c}n{\'i} architektury a optimalizace pro vyhledava\v{c}e v online publikov{\'a}n{\'i}", + JOURNAL = "Sborn{\'i}k konference INFORUM", + YEAR = 2004, + URL = "http://www.inforum.cz/inforum2004/pdf/Prokop_Marek.pdf", + NOTE = "[1.1.2006]" +} + +@Article{ schlesinger, + AUTHOR = "Vojtech Schlesinger", + TITLE = "Mod\_rewrite pro hezk{\'a} URL - RewriteEngine a RewriteRule", + JOURNAL = "Interval.cz", + YEAR = 2005, + ISSN = "1212-8651", + URL = "http://interval.cz/clanek.asp?article=3950", + NOTE = "[1.1.2006]" +} + +@Article{ ruzicka, + AUTHOR = "Pavel R\r{u}\v{z}i\v{c}ka", + TITLE = "Vlastn{\'i} p\v{r}esm\v{e}rovac{\'i} slu\v{z}ba", + JOURNAL = "Interval.cz", + YEAR = 2002, + ISSN = "1212-8651", + URL = "http://interval.cz/clanek.asp?article=990", + NOTE = "[1.1.2006]" +} + +@Manual{ googlewebguide, + TITLE = "Google Information for Webmasters - Webmaster 
Guidelines", + INSTITUTION = "Google", + URL = "http://www.google.com/intl/en/webmasters/guidelines.html", + YEAR = "2005", + NOTE = "[1.1.2006]" +} + +@Article{ curly, + title = "Current SEO Best Practices", + JOURNAL = "WebmasterWorld.COM", + URL = "http://www.webmasterworld.com/forum5/3106.htm", + YEAR = "2001", + NOTE = "[1.1.2006]" +} + +@Article{ kilkelly, + title = "META tags explained and how to use them for ranking", + AUTHOR = "Frank Kilkelly", + URL = "http://www.webmasterworld.com/forum5/3106.htm", + NOTE = "[1.1.2006]" +} + +@Article{ seoconsultants, + title = "Title Element - Page Titles", + YEAR = "2006", + URL = "http://www.seoconsultants.com/meta-tags/title-element.asp", + NOTE = "[1.1.2006]" +} + +@Article{ simonyi, + title = "SEO Tips \& Tricks", + author = "Estevan Simonyi", + JOURNAL = "support4sites", + YEAR = "2005", + URL = "http://www.support4sites.net/article6.htm", + NOTE = "[1.1.2006]" +} + +@Manual{ nkp, + TITLE = "{\v{C}}esk{\'a} terminologick{\'a} datab{\'a}ze knihovnictv{\'i} a informa\v{c}n{\'i} v\v{e}dy", + INSTITUTION = "N{\'a}rodn{\'i} knihovna {\v{C}}R", + URL = "http://sigma.nkp.cz", + NOTE = "[1.1.2006]" +} + +@Manual{ googletech, + TITLE = "Google Technology - Why Use Google", + INSTITUTION = "Google", + URL = "http://www.google.com/technology/", + YEAR = "2004", + NOTE = "[1.1.2006]" +} + +@Article{ mizoch, + AUTHOR = "Luk{\'a}\v{s} Mi\v{z}och", + TITLE = "Open Directory Project neboli DMOZ", + JOURNAL = "Interval.cz", + YEAR = 2003, + ISSN = "1212-8651", + URL = "http://interval.cz/clanek.asp?article=2896", + NOTE = "[1.1.2006]" +} + +@Article{ craven, + title = "Google's PageRank Explained and how to make the most of it", + author = "Phil Craven", + URL = "http://www.webworkshop.net/pagerank.html", + NOTE = "[1.1.2006]" +} + +@Book{ nemravadp, + PUBLISHER = "Vysok{\'a} \v{s}kola ekonomick{\'a} - Fakulta managementu", + ADDRESS = "Praha", + Author = "Jan Nemrava", + TITLE = "Optimalizace WWW str{\'a}nek pro vyhled{\'a}vac{\'i} a indexovac{\'i} katalogy", + YEAR = 
"2004", + NOTE = "Diplomov{\'a} pr{\'a}ce" +} + +@Article{ seomoz, + title = "Google's Patent: Information Retrieval Based on Historical Data", + JOURNAL = "seomoz.org", + URL = "http://www.seomoz.org/articles/google-historical-data-patent.php#linkageofindependentpeers", + YEAR = "2005", + NOTE = "[1.1.2006]" +} + +@Article{ kryl, + title = "Atribut rel=nofollow a koment{\'a}\v{r}ov{\'y} spam", + AUTHOR = "Milan Kryl", + JOURNAL = "Kryl Blog", + URL = "http://kryl.info/clanek/219-atribut-relnofollow-a-komentarovy-spam", + YEAR = "2005", + NOTE = "[1.1.2006]" +} + +@Manual{ googlebot, + TITLE = "Googlebot: Google's Web Crawler", + INSTITUTION = "Google", + URL = "http://www.google.com/webmasters/bot.html", + YEAR = "2005", + NOTE = "[1.1.2006]" +} + +@Article{ lee, + title = "The Semantic Web", + AUTHOR = "Tim Berners-Lee and James Hendler and Ora Lassila", + INSTITUTION = "W3C", + JOURNAL = "Scientific American", + YEAR = "2001", + MONTH = "May" +} + +@Manual{ csumb, + TITLE = "Data Warehouse Glossary", + INSTITUTION = "California State University - Monterey Bay", + URL = "http://it.csumb.edu/departments/data/glossary.html", + NOTE = "[1.1.2006]" +} + +@Book{ sterne, + author = "Jim Sterne", + TITLE = "Web Metrics: Proven Methods for Measuring Web Site Success", + PUBLISHER = "Wiley", + ADDRESS = "New York", + YEAR = "2002", + ISBN = "0-471-22072-8" +} + +@Book{ fletcher, + author = "Peter Fletcher and Alex Poon and Ben Pearce and Peter Comber", + TITLE = "Practical Web Traffic Analysis: Standards, Privacy, Techniques, Results", + PUBLISHER = "Glasshaus", + ADDRESS = "Birmingham", + YEAR = "2002", + ISBN = "1-903151-18-3" +} + +@Manual{ navrcholu, + TITLE = "U\v{z}ivatelsk{\'a} p\v{r}{\'i}ru\v{c}ka Navrcholu", + INSTITUTION = "IINFO", + YEAR = "2004" +} + +@Article{ webtrends, + title = "WebTrends Advises Sites to Move to First-Party Cookies Based on Four-Fold Increase in Third-Party Cookie Rejection Rates", + INSTITUTION = "WebTrends", + YEAR = "2005", + URL = 
"http://www.webtrends.com/AboutWebTrends/NewsRoom/NewsRoomArchive/2005/CookieRejection.aspx", + NOTE = "[1.1.2006]" +} + +@Manual{ msdn, + TITLE = "Introduction to ASP.NET and Web Forms", + AUTHOR = "Paul D. Sheriff", + INSTITUTION = "PDSA, Inc.", + URL = "msdn.microsoft.com/library/en-us/dndotnet/html/introwebforms.asp", + YEAR = "2001", + NOTE = "[1.1.2006]" +} + +@Article{ wsje, + title = "Yahoo to Track Impact of Internet Ads", + AUTHOR = "Aaron O. Patrick", + JOURNAL = "The Wall Street Journal Europe", + YEAR = "2005", + VOLUME = "23", + ISSUE = "226", + MONTH = "December" +} + +@Article{ wumsurvey_facca, + author = "Federico Michele Facca and Pier Luca Lanzi", + title = "Mining interesting knowledge from weblogs: a survey", + journal = "Data \& Knowledge Engineering", + volume = "53", + number = "3", + year = "2005", + issn = "0169-023X", + pages = "225--241", + doi = "http://dx.doi.org/10.1016/j.datak.2004.08.001", + publisher = "Elsevier Science Publishers B. V.", + address = "Amsterdam, The Netherlands, The Netherlands" +} + +@Article{ sem, + title = "Search Engine Marketing", + AUTHOR = "Carol Krol", + JOURNAL = "B to B", + YEAR = "2004", + ADDRESS = "Chicago", + ISSUE = "9", + VOLUME = "89", + MONTH = "August" +} + +@Article{ presswire, + title = "Searchspell announces free hosted typo optimization site service", + JOURNAL = "M2 Presswire", + YEAR = "2004", + ADDRESS = "Coventry", + MONTH = "September" +} + +@Article{ cooley, + author = "Robert Cooley", + title = "The use of web structure and content to identify subjectively interesting web usage patterns", + journal = "ACM Trans. Inter. 
Tech.", + volume = "3", + number = "2", + year = "2003", + issn = "1533-5399", + pages = "93--116", + publisher = "ACM Press", + address = "New York, NY, USA" +} + +@Article{ plourde, + title = "Tracking Visitors with ASP.NET", + author = "Wayne Plourde", + JOURNAL = "15 seconds", + URL = "http://www.15seconds.com/issue/021119.htm", + NOTE = "[1.1.2006]" +} + +@InProceedings{ eirinaki, + author = "Magdalini Eirinaki and Charalampos Lampos and Stratos Paulakis and Michalis Vazirgiannis", + title = "Web personalization integrating content semantics and navigational patterns", + booktitle = "WIDM '04: Proceedings of the 6th annual ACM international workshop on Web information and data management", + year = "2004", + isbn = "1-58113-978-0", + pages = "72--79", + location = "Washington DC, USA", + doi = "http://doi.acm.org/10.1145/1031453.1031468", + publisher = "ACM Press", + address = "New York, NY, USA" +} + +@InProceedings{ hofgesang, + author = "Peter I. Hofgesang and Wojtek Kowalczyk", + title = "Analysing Clickstream Data: From Anomaly Detection to Visitor Profiling", + booktitle = "Proceedings of Discovery Challenge 2005", + year = "2005", + pages = "21--30", + location = "Porto, Portugal", + address = "New York, NY, USA" +} + +@Book{ berka, + author = "Petr Berka", + TITLE = "Dob{\'y}v{\'a}n{\'i} znalost{\'i} z datab{\'a}z{\'i}", + PUBLISHER = "Academia", + ADDRESS = "Praha", + YEAR = "2003", + ISBN = "80-200-1062-9" +} + +@Manual{ lispminer, + TITLE = "LISp-Miner", + INSTITUTION = "Laboratory of Intelligent Systems", + URL = "http://lispminer.vse.cz", + NOTE = "[1.1.2006]" +} + +@Manual{ crispdm, + TITLE = "CRISP-DM 1.0 Step-by-step Data Mining Guide", + AUTHOR = "Pete Chapman and Julian Clinton and Randy Kerber and Thomas Khabaza and Thomas Reinartz and Colin Shearer and Rudiger Wirth", + YEAR = "2000", + MONTH = "June", + PUBLISHER = "CRISP-DM Consortium", + URL = "http://www.crisp-dm.org/CRISPWP-0800.pdf", + NOTE = "[1.1.2006]" +} + +@InProceedings{ ferda, + Title = 
"Ferda, novᅵ vizu{\'a}lnᅵ prostᅵedᅵ pro dobᅵv{\'a}nᅵ znalostᅵ", + Authors = "Michal Kov{\'a}ᅵ AND Tom{\'a}\v{s} Kuchaᅵ AND Alexander Kuzmin AND Martin Ralbovskᅵ", + booktitle = "Sborn{\'i}k konference Znalosti 2006", + year = "2006", + editor = "J{\'e}n Parali\v{c} and Ji{\v{r}}{\'i} Dvorsk{\'y} and Michal Kr{\'a}tk{\'y}", + isbn = "80-248-1001-8", + location = "Ostrava \v{C}R", + publisher = "V\v{S}B-TU Ostrava" +} + +@Book{ guhaBook, + Author = "Petr H{\'a}jek AND Tom{\'a}\v{s} Havr{\'a}nek AND Metodᅵj Karel Chytil", + TITLE = "Metoda GUHA. Automatick{\'a} tvorba hypotᅵz", + ADDRESS = "Praha", + PUBLISHER = "Academia", + YEAR = "1983" +} + +@Article{ rauch, + author = "Jan Rauch AND Milan \v{S}im\r{u}nek", + title = "An Alternative Approach to Mining Association Rules", + journal = "Foundation of Data Mining and Knowl. Discovery", + publisher = "Springer", + year = "2005", + volume = "6", + isbn = "978-3-540-26257-2", + pages = "211--231", + place = "Berlin" +} + +@Book{ kejkula, + PUBLISHER = "Vysok{\'a} \v{s}kola ekonomick{\'a} - Fakulta informatiky a statistiky", + ADDRESS = "Praha", + Author = "Martin Kejkula", + TITLE = "Atypickᅵ aplikace procedury 4ft-Miner", + YEAR = "2002", + NOTE = "Diplomov{\'a} pr{\'a}ce" +} diff --git a/_articles/RJ-2025-038/qcba.pdf b/_articles/RJ-2025-038/qcba.pdf new file mode 100644 index 0000000000..d38380139f Binary files /dev/null and b/_articles/RJ-2025-038/qcba.pdf differ diff --git a/_articles/RJ-2025-038/qcba.tex b/_articles/RJ-2025-038/qcba.tex new file mode 100644 index 0000000000..95427ad708 --- /dev/null +++ b/_articles/RJ-2025-038/qcba.tex @@ -0,0 +1,653 @@ +% !TeX root = RJwrapper.tex + +\title{qCBA: An R Package for Postoptimization of Rule Models Learnt on Quantized Data} +\author{Tomas Kliegr} + +\maketitle + +\begin{abstract}A popular approach to building rule models is association rule classification. 
However, association rule classifiers often produce larger models than most other rule learners, which impedes the comprehensibility of the resulting classifiers. These algorithms also decouple discretization from model learning, often leading to a loss of predictive performance. This paper presents an implementation of Quantitative Classification Based on Associations (QCBA), a collection of postprocessing algorithms for rule models built over discretized data. The QCBA method improves the fit of the bins originally produced by discretization and performs additional pruning, resulting in models that are typically smaller and often more accurate. The qCBA package supports models created with multiple packages for rule-based classification available on CRAN, including arc, arulesCBA, rCBA and sbrl. +\end{abstract} + +\section{Introduction}\label{sec:introduction} +There is a resurgence of interest in interpretable machine learning models. Rule learning provides an appealing combination of intrinsic comprehensibility (humans are naturally used to working with rules) with well-documented predictive performance and scalability. Association rule classification (ARC) is a subclass of rule-learning algorithms that quickly generate a large number of candidate rules, a subset of which is subsequently chosen for the final classifier. The first such algorithm, and, with at least three R packages \citep{hahsler2019associative}, still the most popular one, was Classification Based on Associations (CBA) \citep{Liu98integratingclassification}. There are also multiple newer approaches, such as SBRL \citep{yang2017scalable} and RCAR \citep{azmi2020rcar}, both available in R \citep{sbrl,arulesCBA}. + +A major limitation of ARC approaches is that they typically trade off the ability to process numerical data for the speed of rule generation. On the input, these approaches require categorical data.
If there are numerical attributes, these need to be converted to categories, typically through some discretization (quantization) approach, such as MDLP \citep{FayyadI93}. +Association rule learning then operates on prediscretized datasets, which results in a loss of predictive performance and larger rule sets. + +The \dfn{Quantitative Classification Based on Associations} method (QCBA) \citep{kliegr2023qcba} is a collection of several algorithms that postoptimize rule-based classifiers learnt on prediscretized data with respect to the original raw dataset with numerical attributes. As was experimentally shown in \citet{kliegr2023qcba}, this often makes the models more accurate and consistently smaller, and thus more interpretable. + +This paper presents the \CRANpkg{qCBA} R package, which implements QCBA and is available on CRAN. +The \CRANpkg{qCBA} package was initially developed to postprocess the results of CBA implementations, as these were the most common rule learning systems in R, but it can now also handle the results of other rule learning approaches such as SBRL. The three CBA implementations on CRAN -- \CRANpkg{rCBA} \citep{rcba}, \CRANpkg{arc} \citep{arcPackage} and \CRANpkg{arulesCBA} \citep{arulesCBA}, the last introduced in \citet{hahsler2019associative} -- rely on the fast and proven \CRANpkg{arules} package \citep{hahsler2011arules} to mine association rules; \CRANpkg{arules} is also the main dependency of the \CRANpkg{qCBA} package. + +\section{Primer on building rule-based classifiers for QCBA in R} +\label{sec:primer} +This primer shows how to use QCBA with CBA as the base rule learner. Out of the rule learners supported by QCBA, CBA is the most widely supported on CRAN and also the most cited in the scientific literature (as of this writing). This primer is self-contained and shows the main concepts through code-based examples.
However, it uses the same dataset and setting as the running example in the open-access publication by \citet{kliegr2023qcba}, which contains graphical illustrations as well as formal definitions of the algorithms (as opposed to the R code examples given here). + +\subsection{Brief introduction to association rule classification} +Before we present the details of the QCBA algorithm, we start by covering the foundational methods of association rule learning and classification. + +\textbf{Input data} Association rules are historically mined on \emph{transactions}, which is also the format used by the most popular R package for association rule mining, \CRANpkg{arules}. +A standard data table (data frame) can be converted to a transaction matrix. This can be viewed as a binary incidence matrix, with one dummy variable for each attribute-value pair (called an \emph{item}). In the case of numerical variables, discretization (quantization) is a standard preprocessing step. It is required to ensure that the search for rules is fast and that the conditions in the discovered rules are sufficiently broad. + +\textbf{Association rules} +Algorithms such as Apriori \citep{agrawal94fast} are used for association rule mining. +An association rule has the form $r: antecedent \rightarrow consequent$, where both $antecedent$ and $consequent$ are sets of items. The \CRANpkg{arules} package, which provides the most popular Apriori implementation on CRAN, calls the antecedent the left-hand side of the rule ($lhs$) and the consequent the right-hand side ($rhs$). +Each set is interpreted as a conjunction of conditions corresponding to individual items. +When all these conditions are met, the rule predicts that its consequent is true for a given input data point. Formally, we say that a rule \emph{covers} a transaction if all items in the rule's antecedent are contained in the transaction.
A rule \emph{correctly classifies} a transaction if the transaction is covered and, at the same time, the items in the rule's consequent are contained in the transaction. + +\textbf{Rule interest measures} +Each rule is associated with quality metrics (also called rule interest measures). The two most important ones are the confidence and support of rule $r$. Confidence is calculated as $conf(r)=a/(a+b)$, where $a$ is the number of transactions matching both the antecedent and the consequent and $b$ is the number of transactions matching the antecedent but not the consequent. Support is calculated either as $a$ (absolute support) or as $a$ divided by the total number of transactions (relative support). +In the rule generation step, minimum confidence and support thresholds are used as constraints. Another important constraint, used to prevent a combinatorial explosion, is a restriction on the length of the rule (the \texttt{minlen} and \texttt{maxlen} parameters in \CRANpkg{arules}), defined as thresholds on the minimum and maximum count of items in the rule's antecedent and consequent combined. + +\textbf{Association Rule Classification models} +ARC algorithms process candidate association rules into a classifier. The input to an ARC algorithm is a set of pre-mined \emph{class association rules} (CARs). A class association rule is an association rule that contains in its consequent exactly one item, corresponding to one of the target classes. The classifier-building process typically amounts to a selection of a subset of CARs \citep{arcReview}. Some ARC algorithms, such as CBA, use rule interest measures for this: rules are first ordered and then some of them are removed using \emph{data coverage pruning} and \emph{default rule pruning} (steps 2 and 3 of the CBA-CB algorithm as described in \citet{Liu98integratingclassification}).
Some ARC algorithms also include a rule with an empty antecedent (called the \emph{default rule}) at the end of the rule list to ensure that the classifier can always provide a prediction, even if no specific rule matches a particular transaction. + +\textbf{Types of ARC models} +CBA produces \emph{rule lists}: the matching rule with the highest priority in the pruned rule list is used for classification. Other, more recent algorithms producing rule lists include SBRL \citep{sbrl}. The other common ARC approach is based on \emph{rule sets}, where rules are not ordered and multiple rules can be used for classification, e.g., through voting. This second group includes algorithms such as CMAR \citep{cmar} and CPAR \citep{yin2003cpar}. + +\textbf{Postprocessing with Quantitative CBA} QCBA is a postprocessing algorithm for ARC models built over quantized data. On the input, it can take both rule lists and rule sets; however, it always outputs a rule list. While association rule learning often faces issues with combinatorial explosion \citep{kliegr2019tuning}, the postprocessing by QCBA is performed on the relatively small number of rules that passed the pruning steps of the specific ARC algorithm. QCBA is a modular approach with six steps that can be performed independently of each other. The only exception is the initial refit tuning step, which is mandatory: it processes all items in a given rule that are the result of quantization, adjusting the item boundaries so that they correspond to actual values appearing in the original training data before discretization. +Benchmarks in \citet{kliegr2023qcba} have shown that, on average, the postprocessing with QCBA takes about as long as building the input CBA model. The most expensive step can be extension, whose complexity depends on the number of unique numerical values. + +\textbf{Comparison with decision tree induction} +A common question is the relationship between association rule classifiers and decision trees.
Trees can be decomposed into rules that are similar to association rules, as each path from the root to a leaf corresponds to one rule. However, trees built by algorithms such as C4.5 \citep{quinlan2014c4} are constructed in such a way that the resulting rules are non-overlapping, while association rule learning outputs overlapping rules. Algorithms such as Apriori will output all rules that are valid in the data given the user-specified thresholds. In contrast, decision tree induction algorithms use a heuristic, such as information gain, to prioritize splits; as a result, a large number of otherwise valid patterns are skipped. + +\subsection{Example} +\label{sec:example} +\subsubsection{Dataset} +Let's look at the synthetic \code{humtemp} data set, which we will use throughout this tutorial and which is bundled with the \CRANpkg{arc} package. +There are two quantitative explanatory attributes (Temperature and Humidity). The target attribute is preference (subjective comfort level). + +The first six rows of \code{humtemp} obtained with \code{head(humtemp)}: +\begin{verbatim} +## Temperature Humidity Class +## 1 45 33 2 +## 2 27 29 3 +## 3 40 48 2 +## 4 40 65 1 +## 5 38 82 1 +## 6 37 30 3 +\end{verbatim} + +\subsubsection{Quantization} +\label{sec:quantiz} +The essence of QCBA is the optimization of literals (conditions) created over numerical attributes in rules. QCBA thus needs to translate the bins created during discretization back to the continuous space. + +For clarity, we will perform the quantization manually using equidistant binning and user-defined cut points. An alternative approach using automatic supervised discretization is shown in Section~\ref{sec:demo}.
+ +\begin{lstlisting} +library(qCBA) + +temp_breaks <- seq(from = 15, to = 45, by = 5) +# Another possibility with user-defined cutpoints +hum_breaks <- c(0, 40, 60, 80, 100) + +data_discr <- arc::applyCuts( + df = humtemp, + cutp = list(temp_breaks, hum_breaks, NULL), + infinite_bounds = TRUE, + labels = TRUE +) +head(data_discr) +\end{lstlisting} + +\noindent The result of quantization is: +\begin{verbatim} +## Temperature Humidity Class +## 1 (40;45] (0;40] 2 +## 2 (25;30] (0;40] 3 +## 3 (35;40] (40;60] 2 +## 4 (35;40] (60;80] 1 +## 5 (35;40] (80;100] 1 +## 6 (35;40] (0;40] 3 +\end{verbatim} +The \code{applyCuts()} function ensures that, within intervals, a semicolon is used as a separator instead of the more common comma. A semicolon is the standard interval separator in some countries, such as Czechia; however, the main reason is that a comma is already used within rules for another purpose -- to separate conditions. The use of a different separator ensures unambiguity, for example, when a rule set to be optimized is read from a file. + +\subsubsection{Discovery of candidate class association rules} +ARC algorithms typically first generate a large number of association rules or frequent itemsets. This step is usually handled internally by the ARC library + (as shown in Section~\hyperref[sec:demo]{Demonstration of individual QCBA optimization steps}). + +For better clarity, in the example below, the list of CARs is generated by manually invoking the Apriori algorithm.
+ +\begin{lstlisting} +txns <- as(data_discr, "transactions") +appearance <- list(rhs = c("Class=1", "Class=2", "Class=3", "Class=4")) +rules <- arules::apriori( + data = txns, + parameter = list( + confidence = 0.5, + support = 3 / nrow(data_discr), + minlen = 1, + maxlen = 3 + ), + appearance = appearance +) +\end{lstlisting} +The first line converts the input data into transactions composed of \emph{items}: attribute-value pairs such as \texttt{Temperature=(40;45]} or \texttt{Class=3}. This data format is required for association rule learning. The \code{appearance} argument defines which items can appear in the consequent of the rules (right-hand side, \emph{rhs}). The \code{apriori()} call further sets several standard parameters typically used for the extraction of association rules. A confidence threshold of 0.5 and a support threshold of 1\% are recommended \citep{Liu98integratingclassification}. +However, since the \code{humtemp} dataset has fewer than 100 rows, a 1\% support threshold would correspond to the support of just one transaction, which would defeat the purpose of this threshold: to address overfitting by eliminating rules that are backed by only a small number of instances. We therefore set the minimum support threshold to 3 transactions (expressed as a proportion in the code snippet). Given \code{maxlen = 3} and the single item always present in the consequent, the rules can contain at most two items in the condition part (antecedent).
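To make the relationship between transaction counts and the reported measures concrete, the following self-contained toy check computes confidence and support with \CRANpkg{arules}. The items \code{a}, \code{b} and \code{y} are illustrative only and unrelated to the \code{humtemp} example; \code{y} plays the role of the class item.

\begin{lstlisting}
library(arules)
# Four toy transactions
txns_toy <- as(list(c("a", "y"), c("a", "y"), "a", c("b", "y")),
               "transactions")
rules_toy <- arules::apriori(
  txns_toy,
  parameter = list(confidence = 0.5, support = 0.25, minlen = 2),
  appearance = list(rhs = "y", default = "lhs"),
  control = list(verbose = FALSE)
)
# For the rule {a} => {y}: a = 2 transactions match both sides and
# b = 1 matches only the antecedent, so conf(r) = a/(a+b) = 2/3 and
# relative support = 2/4
inspect(rules_toy)
\end{lstlisting}

Checking the \code{quality()} slot of the mined rules against the hand-computed $a$ and $b$ counts is a useful sanity check when choosing the thresholds.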
The discovered rules, displayed with \code{inspect(rules)}, are shown below: + +\begin{Verbatim}[formatcom=\footnotesize\ttfamily] +lhs rhs support confidence coverage lift # +{Humidity=(80;100]} => {Class=1} 0.11111111 0.8000000 0.1388889 3.600000 4 +{Temperature=(15;20]} => {Class=2} 0.11111111 0.5714286 0.1944444 2.057143 4 +{Temperature=(30;35]} => {Class=4} 0.13888889 0.6250000 0.2222222 2.045455 5 +{Temperature=(25;30]} => {Class=4} 0.13888889 0.5000000 0.2777778 1.636364 5 +{Temperature=(25;30], Humidity=(40;60]} => {Class=4} 0.08333333 0.6000000 0.1388889 1.963636 3 +\end{Verbatim} + +In this listing, coverage and lift, as defined in the \CRANpkg{arules} documentation, are additional rule quality measures not used by the \CRANpkg{qCBA} package. Count (abbreviated as \# in the listing) corresponds to the support of the rule expressed as an absolute count rather than a proportion. + +\subsubsection{Learn classifier from candidate CARs} +Out of the five discovered rules, we create a CBA classifier. The following lists the conceptual steps performed by CBA to transform the input rule list into a CBA model: + +\begin{enumerate} + \item Rule \emph{precedence} is established: rules are sorted according to confidence, support and length. + \item Rules are subject to \emph{pruning}: + \begin{itemize} + \item \emph{data coverage pruning}: the algorithm iterates through the rules in the order of precedence, removing any rule that does not correctly classify at least one of the remaining instances. If a rule correctly classifies at least one instance, it is retained and the instances it covers are removed (only for the purposes of the subsequent steps of data coverage pruning). + \item \emph{default rule pruning}: the algorithm iterates through the rules in the sort order and cuts off the list at the point where keeping the current rule would result in worse accuracy of the model than inserting a default rule in its place and removing the rules below it.
+ \end{itemize} +\item The \emph{default rule} is inserted at the bottom of the list. +\end{enumerate} + +To supply our own list of candidate rules instead of using one generated with \code{cba()}, we call: + +\begin{lstlisting} +classAtt <- "Class" +rmCBA <- cba_manual( + datadf_raw = humtemp, + rules = rules, + txns = txns, + rhs = appearance$rhs, + classAtt = classAtt, + cutp = list() +) +inspect(rmCBA@rules) +\end{lstlisting} +We invoke \code{cba\_manual()} from the \CRANpkg{arc} package rather than \code{cba()} because it allows us to supply a custom list of rules from which the CBA model will be built. This function could also perform the quantization, but since we already did this as part of preprocessing, we pass \code{cutp = list()} to express that no cutpoints are specified. + +In this toy example, the CBA model, displayed with the call \code{inspect(rmCBA@rules)}, is almost identical to the candidate list of rules shown above. The main differences are the reordering of the rules by confidence and support (higher is better) and the addition of the \emph{default rule} -- a rule with an empty antecedent -- at the end. Note that for brevity, the conditions in the rules in the printout below were replaced by \{...\}, as they are the same as in the previous printout (although mind the different order of the rules). The values of support, confidence, coverage and lift are also the same and were omitted.
+ +\begin{Verbatim}[formatcom=\footnotesize\ttfamily] + lhs rhs {...} count lhs_length orderedConf orderedSupp cumulativeConf +[1] {...} => {Class=1} {...} 4 1 0.8000000 4 0.8000000 +[2] {...} => {Class=4} {...} 5 1 0.7142857 5 0.7500000 +[3] {...} => {Class=4} {...} 3 2 0.6000000 3 0.7058824 +[4] {...} => {Class=2} {...} 4 1 0.5000000 3 0.6521739 +[5] {...} => {Class=4} {...} 5 1 0.5000000 2 0.6296296 +[6] {} => {Class=2} {...} 0 0 0.5555556 5 0.6111111 +\end{Verbatim} + +The default rule ensures that the rule list covers every possible instance. The rule list is visualized in Figure~\ref{fig:postpr}, where the green background corresponds to Class 2, predicted by the default rule. +The CBA output contains several additional statistics for each rule. The \emph{ordered} versions of confidence and support are computed only from the training instances reaching the given rule. For the first rule, the ordered confidence is identical to standard confidence. The ordered support is also semantically the same, but it is expressed as an absolute count rather than a proportion. To compute these values for the subsequent rules, instances covered by rules higher in the list are removed. The cumulative confidence is an experimental measure described in the \CRANpkg{arc} documentation. + +\begin{figure} +\centering + \includegraphics[width=0.48\textwidth]{figures/figure1a.png} + \includegraphics[width=0.48\textwidth]{figures/figure1b.png} + \caption{Illustration of the postprocessing algorithm (HumTemp dataset). Left: CBA model; right: QCBA model.} + \label{fig:postpr} +\end{figure} + +\subsubsection{Applying the classifier} +The toy example does not contain enough data for a meaningfully large train/test split. Therefore, we will evaluate on the training data.
+\noindent The accuracy of the CBA model on training data: +\begin{lstlisting} +prediction_cba <- predict(rmCBA, data_discr) +acc_cba <- CBARuleModelAccuracy( + prediction = prediction_cba, + groundtruth = data_discr[[classAtt]] +) +\end{lstlisting} +\noindent The accuracy is 0.61, and the contents of \code{prediction\_cba} are the predicted values of comfort: +\begin{lstlisting} + [1] 2 4 2 2 1 2 2 4 4 4 4 4 4 1 4 1 4 4 4 4 4 4 4 4 1 2 2 2 2 1 2 2 2 2 2 2 +Levels: 1 2 4 +\end{lstlisting} + +\subsubsection{Explaining the prediction} +The \code{predict()} function from the \CRANpkg{arc} package allows for additional output to enhance the explainability of the result. +By setting \code{outputFiringRuleIDs = TRUE}, we can obtain the ID of the rule that was used to classify each instance in the passed dataset. +\begin{lstlisting} +ruleIDs <- predict(rmCBA, data_discr, outputFiringRuleIDs = TRUE) +\end{lstlisting} + +For example, we may now explain the classification of the first row of \code{data\_discr}, which is, for convenience, reproduced below: +\begin{lstlisting} +## Temperature Humidity Class +## 1 (40;45] (0;40] 2 +\end{lstlisting} +To do so, we invoke: +\begin{lstlisting} +inspect(rmCBA@rules[ruleIDs[1]]) +\end{lstlisting} +This returns the default rule (number 6). The reason is that the values of this instance fall outside the conditions of all other rules. Now, we have a rule list ready for postoptimization with QCBA. +\subsubsection{Postprocessing with QCBA} +By working with the original continuous data, the QCBA algorithm can improve the fit of the rules and consequently reduce their count. + +\noindent We will use the \code{rmCBA} model built previously: + +\begin{lstlisting} +rmqCBA <- qcba(cbaRuleModel = rmCBA, datadf = humtemp) +\end{lstlisting} +This runs QCBA with the default set of optimization steps enabled, which corresponds to the best-performing configuration \#5 from \citet{kliegr2023qcba}.
+ +The resulting rule list is: +\begin{lstlisting} + rules support confidence # +1 {Humidity=[82;95]} => {Class=1} 0.1111111 0.8000000 1 +2 {Temperature=[22;31],Humidity=[33;53]} => {Class=4} 0.1666667 0.7500000 2 +3 {Temperature=[31;34]} => {Class=4} 0.1388889 0.6250000 1 +4 {} => {Class=2} 0.2777778 0.2777778 0 +\end{lstlisting} +Note that the `condition\_count' column was abbreviated as \# in the listing, and the orderedConf and orderedSupp columns were omitted for brevity. + +Figure~\ref{fig:postpr} (right) shows the QCBA model. Compared to the CBA model in Figure~\ref{fig:postpr} (left), QCBA removed two rules and refined the boundaries of the remaining rules. + +Predictive performance is computed in the same way as for CBA: +\begin{lstlisting} +prediction <- predict(rmqCBA, humtemp) +acc <- CBARuleModelAccuracy(prediction, humtemp[[rmqCBA@classAtt]]) +\end{lstlisting} +The accuracy is unchanged at 0.61, but the model is smaller. As with CBA, we could use the \code{outputFiringRuleIDs} argument. + +Note that the QCBA algorithm does not introduce any mandatory thresholds or meta-parameters for the user to set or optimize, although it does allow the user to enable or disable the individual optimizations, as shown in the next section. + +\section{Detailed description of the \CRANpkg{qCBA} package with examples in R} + +\subsection{Overview of \code{qcba()} arguments} +To build a model, \code{qcba()} needs a rule model containing a set of rules. As its second mandatory argument, the \code{qcba()} function takes the \emph{raw} data frame, which can contain nominal as well as numerical columns. The remaining arguments are optional. The most important ones, relating to the optimizations performed by QCBA, are described in the following two subsections. + +The rule model can be an instance of either the \code{customCBARuleModel} or the \code{CBARuleModel} class. The difference is that in the former class, the rules in the \code{rules} slot are represented as string objects in a data frame.
This universal data frame format is convenient for loading rules from other sources or from R packages that export rules as strings. The latter class uses an instance of the \code{rules} class from \CRANpkg{arules}, which is more efficient, especially when the number of rules or items is large. +An important slot shared by both classes is \code{cutp}, which contains information on the cutpoints used to quantize the data on which the rules were learnt. + +\subsection{Optimizations on individual rules} +Optimizations can be divided into two groups depending on whether they are performed on individual rules or on the entire model. We first describe the former group. +Since these operations are independent of the other rules, they can also be parallelized. + +\textbf{Refitting rules}. This step processes all items in the antecedent of a given rule that were derived through quantization. Such items have boundaries that stick to a grid corresponding to the result of discretization; refitting snaps them to a finer grid formed by all unique values appearing in the training data. \emph{This is the only mandatory step.} + +\textbf{Attribute pruning} (\code{attributePruning}). For each rule and each item in its antecedent, this step evaluates whether the item is needed. The item is removed if the rule created without it has at least the confidence of the original rule. +\emph{Enabled (\code{TRUE}) by default.} + +\textbf{Trimming} (\code{trim\_literal\_boundaries}). Boundary parts of intervals into which no instances correctly classified by the rule fall are removed. \emph{Enabled (\code{TRUE}) by default.} + +\textbf{Extension} (\code{extendType}). The algorithm attempts to enlarge the ranges of intervals in the antecedent of each rule. By default, an extension is accepted if it does not decrease rule confidence, but this behaviour can be controlled by setting the \code{minImprovement} parameter (default is 0).
To overcome local minima, the extension process can provisionally accept a drop in confidence in an intermediate result of the extension. How much the confidence can temporarily decrease before the extension process stops is controlled by the \code{minCondImprovement} parameter. +In the current version, the extension applies only to numerical attributes (\code{extendType = "numericOnly"}, which is also the default value); other extension types may be added in future versions. + +\subsection{Optimizations on rule list} +\label{ss:rulelistoptimization} +The second group of optimizations aims at removing rules, considering the context of the other rules in the list. + +\textbf{Data coverage pruning} (\code{postpruning}). The purpose of this step is to remove rules that were made redundant by the previous QCBA optimizations. Possible values are \code{none}, \code{cba} and \code{greedy}. The \code{cba} option is identical to CBA's data coverage pruning: a rule is removed if it does not correctly classify any transaction in the training data after all transactions covered by retained rules with higher precedence have been removed. The \code{greedy} option is an experimental modification of data coverage pruning described in the \CRANpkg{qCBA} documentation. \emph{Enabled by default} (the default value is \code{postpruning = "cba"}). + +\textbf{Default rule overlap pruning} (\code{defaultRuleOverlapPruning}). Let $R_p$ denote the set of rules that classify into the same class as the default rule $r_d: \{\} \rightarrow cons_1$. +To determine whether a pruning candidate $r_p \in R_p: ant \rightarrow cons_1$ can be removed, all rules with lower precedence that have a nonempty antecedent and a different consequent $cons_2$ ($cons_1 \neq cons_2$) are identified. Let us denote the set of these rules as $R_c$. +If the antecedents of the rules in $R_c$ do not cover any of the transactions covered by $r_p$ in the training data, $r_p$ is removed.
The removal of $r_p$ will not affect the classification of the training data, since the instances originally covered by $r_p: ant \rightarrow cons_1$ will be classified into the same class by $r_d: \{\} \rightarrow cons_1$. This is called the transaction-based version. In the alternative range-based version, the checks involve comparing the boundaries of intervals rather than the overlap in matched transactions. +This parameter has three possible values: \code{transactionBased}, \code{rangeBased} and \code{none}. According to the analysis and benchmarks in \citet{kliegr2023qcba}, the transaction-based version removes more rules than the range-based version, although it can sometimes affect predictive performance on unseen data. \emph{Transaction-based pruning is the default.} + +\subsection{Effects on classification performance and model size} +An overview of the observed properties of the QCBA steps is presented in Table~\ref{tbl:guarantees}. +The entries denote the effect of applying the algorithm specified in the first column: $\geq$ means that the value of the given metric will increase or stay unchanged, $=$ that it will not change, $\leq$ that it will decrease or stay unchanged, and $na$ that it can increase, decrease or stay unchanged. The (+) symbol denotes that the indicated change is considered a favourable property, and (-) an unfavourable one. For example, according to the table, applying the refit algorithm leaves rule confidence (\emph{conf}), rule support (\emph{supp}) and rule length (\emph{length}) unaffected. Considering the entire rule list, the refit operation affects neither the rule count nor the accuracy on training data ($acc_{train}$); however, the accuracy on unseen data ($acc_{test}$) may change. + +\begin{table}[h!]
{\footnotesize +\begin{center} +\begin{tabular}{llll|lll} +\toprule +algorithm & \multicolumn{3}{c|}{rule (local classifier)} & \multicolumn{3}{c}{rule list} \\ +measure & conf & supp & length & $acc_{train}$ & $acc_{test}$ & rule count \\ +\midrule +refit & = & = & = & = & $na$ & =\\ +literal pruning & $\geq$ & $\geq (+)$ & $\leq (+)$ & $na$ & $na$ & $\leq$* (+) \\ +trimming & $\geq (+)$ & = & = & $\geq (+)$ & $na$ &=\\ +extension & $\geq (+)$ & $\geq (+)$ & = & $na$ & $na$ & =\\ +postpruning & na & na & na & $\geq$ & $na$ &$\leq$ (+) \\ +drop - trans. & na & na & na & = &$na$ & $\leq$ (+) \\ +drop - range & na & na & na & = & =& $\leq$ (+) \\ +\bottomrule +\end{tabular} +\end{center} +} + +\caption{Hypothesized properties of the proposed rule tuning algorithms. \emph{drop} denotes default rule pruning (\emph{trans.}: transaction-based; \emph{range}: range-based). *: the number of rules is not directly reduced, but literal pruning can make a rule redundant (identical to another rule). The value \emph{na} expresses that there is no unanimous effect of the algorithm on the quality measure. +} +\label{tbl:guarantees} +\end{table} + +\subsection{Handling missing data} +Association rule learning approaches are resilient to the presence of missing values in the input data frame. The reason is that rows are viewed as transactions and combinations of column names and their values as items. A missing value (\code{NA}) in a given column is interpreted as an item not present in the given transaction and is skipped when the data frame is converted to the \code{transactions} data structure from \CRANpkg{arules} using the \code{as()} function; as a result, \code{NA} values are not present in the learnt rules.
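This behaviour can be illustrated with a small example (the toy data frame below is ours, not part of any package):

\begin{lstlisting}
library(arules)
# Toy data frame with one missing value
df_na <- data.frame(
  Temperature = factor(c("low", NA, "high")),
  Class       = factor(c("1", "2", "1"))
)
txns_na <- as(df_na, "transactions")
# The second transaction holds only the Class=2 item; the NA value
# produced no Temperature item, so it can never appear in a rule
inspect(txns_na)
\end{lstlisting}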
+Note that \code{qcba()} treats empty strings in the input data frame in the same way as \code{NA} values: an \code{NA} value is first converted to an empty string, represented using \code{.jarray()} from \CRANpkg{rJava}, and then passed to the Java-based core of \CRANpkg{qCBA}, where it is treated as a \code{Float.NaN} value and generally skipped.
+
+\subsection{Computational costs for large datasets}
+\label{ss:costs}
+The individual postprocessing operations have distinct computational costs. These depend on several main factors: the number of rows and columns of the input data, the number of unique numerical values, and the number of input rules. A detailed experimental study of the influence of these factors is presented in \citet{kliegr2023qcba}. The study was performed on subsets of the KDD'99 Anomaly Detection dataset; the largest dataset processed with all QCBA optimizations enabled had about 40,000 rows and about the same number of unique numerical values. The results show that the most computationally intensive step is the extension step. For large datasets, the user may, therefore, consider disabling it or tuning its \code{minCI} parameter, which can influence the runtime.
+The number of input rules passed for optimization is also an important factor. This is often related to the chosen base learning algorithm: for example, CPAR tends to produce a large number of rules, while SBRL outputs condensed rule models. For details, please refer to \citet{kliegr2023qcba}.
+
+\section{Demonstration of individual QCBA optimization steps}
+\label{sec:demo}
+
+Compared to the example in Subsection~\hyperref[{sec:example}]{Example}, this part uses the larger iris dataset, which allows for the demonstration of all QCBA steps. To show QCBA with a different base rule learner than CBA from the \CRANpkg{arc} package used in the previous section, we will use CPAR from the \CRANpkg{arulesCBA} package.
+
+\subsection{Data}
+\label{ss:data}
+We use the \code{iris} dataset from the \CRANpkg{datasets} R package. The dataset was shuffled and randomly split into a train set (100 rows) and a test set (50 rows). The data was then automatically discretized using the MDLP algorithm wrapped by \code{arc::discrNumeric()} and originally available from the \CRANpkg{discretization} R package.
+
+\begin{lstlisting}
+library(arulesCBA) #version 1.2.7
+set.seed(12) # Chosen for demonstration purposes
+allDataShuffled <- datasets::iris[sample(nrow(datasets::iris)), ]
+trainFold <- allDataShuffled[1:100, ]
+testFold <- allDataShuffled[101:nrow(datasets::iris), ]
+classAtt <- "Species"
+
+discrModel <- discrNumeric(df = trainFold, classAtt = classAtt)
+train_disc <- as.data.frame(lapply(discrModel$Disc.data, as.factor))
+cutPoints <- discrModel$cutp
+test_disc <- applyCuts(
+  testFold,
+  cutPoints,
+  infinite_bounds = TRUE,
+  labels = TRUE
+)
+y_true <- testFold[[classAtt]]
+\end{lstlisting}
+
+\subsection{Learn base ARC model with CPAR}
+In the following, we learn an ARC model using the CPAR (Classification based on Predictive Association Rules) algorithm with its default settings.
+\begin{lstlisting}
+rmBASE <- CPAR(train_disc, formula = as.formula(paste(classAtt, "~ .")))
+predictionBASE <- predict(rmBASE, test_disc)
+inspect(rmBASE$rules)
+cat("Number of rules: ", length(rmBASE$rules))
+cat("Total conditions: ", sum(rmBASE$rules@lhs@data))
+cat("Accuracy on test data: ", mean(predictionBASE == y_true))
+\end{lstlisting}
+In this case, the rule model is composed of seven rules. Note that the last field of the output, containing the Laplace statistic, was omitted for brevity.
+\begin{lstlisting}
+    lhs                           rhs                  support confidence     lift
+[1] {Petal.Length=[-Inf;2.6]}  => {Species=setosa}        0.32  1.0000000 3.125000
+[2] {Petal.Width=[-Inf;0.8]}   => {Species=setosa}        0.32  1.0000000 3.125000
+[3] {Petal.Length=(5.15; Inf],
+     Petal.Width=(1.75; Inf]}  => {Species=virginica}     0.25  1.0000000 2.777778
+[4] {Petal.Width=(1.75; Inf]}  => {Species=virginica}     0.33  0.9705882 2.696078
+[5] {Sepal.Length=(5.55; Inf],
+     Petal.Length=(2.6;4.75],
+     Petal.Width=(0.8;1.75]}   => {Species=versicolor}    0.21  1.0000000 3.125000
+[6] {Petal.Length=(2.6;4.75]}  => {Species=versicolor}    0.26  0.9629630 3.009259
+[7] {Petal.Width=(0.8;1.75]}   => {Species=versicolor}    0.31  0.9117647 2.849265
+\end{lstlisting}
+The statistics are:
+\begin{lstlisting}
+"Number of rules: 7 Total conditions: 10 Accuracy on test data: 0.96"
+\end{lstlisting}
+
+\subsection{Configuring QCBA optimizations and printing statistics}
+For our demonstration purposes, we will set up a generic \CRANpkg{qCBA} configuration (variable \code{qcbaParams}), which initially disables all optimizations. To avoid repeating code for passing long argument lists to \code{qcba()} and for printing model statistics such as model size and accuracy on test data, we will also introduce the helper function \code{qcba\_with\_summary()}.
+
+\begin{lstlisting}
+baseModel_arc <- arulesCBA2arcCBAModel(
+  arulesCBAModel = rmBASE,
+  cutPoints = cutPoints,
+  rawDataset = trainFold,
+  classAtt = classAtt
+)
+qcbaParams <- list(
+  cbaRuleModel = baseModel_arc,
+  datadf = trainFold,
+  extend = "noExtend",
+  attributePruning = FALSE,
+  continuousPruning = FALSE,
+  postpruning = "none",
+  trim_literal_boundaries = FALSE,
+  defaultRuleOverlapPruning = "noPruning",
+  minImprovement = 0,
+  minCondImprovement = 0
+)
+
+qcba_with_summary <- function(params) {
+  rmQCBA <- do.call(qcba, params)
+  cat("Number of rules: ", nrow(rmQCBA@rules), " ")
+  cat("Total conditions: ", sum(rmQCBA@rules$condition_count), " ")
+  accuracy <- CBARuleModelAccuracy(predict(rmQCBA, testFold), testFold[[classAtt]])
+  cat("Accuracy on test data: ", round(accuracy, 2))
+  print(rmQCBA@rules)
+}
+\end{lstlisting}
+
+\subsection{QCBA Refit}
+The following code will postoptimize the previously learnt CPAR model using the refit optimization:
+\begin{lstlisting}
+qcba_with_summary(qcbaParams)
+\end{lstlisting}
+This will output the following list of eight rules (note that in this and the following printouts, the columns with rule measures are omitted for brevity):
+\begin{lstlisting}
+1 {Petal.Length=[-Inf;1.9]} => {Species=setosa}
+2 {Petal.Width=[-Inf;0.6]} => {Species=setosa}
+3 {Petal.Length=[5.2;Inf],Petal.Width=[1.8;Inf]} => {Species=virginica}
+4 {Sepal.Length=[5.6;Inf],Petal.Length=[3.3;4.7],Petal.Width=[1;1.7]} => {Species=versicolor}
+5 {Petal.Width=[1.8;Inf]} => {Species=virginica}
+6 {Petal.Length=[3.3;4.7]} => {Species=versicolor}
+7 {Petal.Width=[1;1.7]} => {Species=versicolor}
+8 {} => {Species=virginica}
+\end{lstlisting}
+The intervals (except for boundaries set to infinity) were shortened. For example, for the first rule, the training data does not contain any data point with Petal.Length=2.6 (the original boundary), but it does contain the value 1.9 (the new boundary).
+\begin{lstlisting}
+any(trainFold$Petal.Length == 1.9)
+any(trainFold$Petal.Length == 2.6)
+\end{lstlisting}
+The first returns \code{TRUE} and the second \code{FALSE}.
+
+The statistics are:
+\begin{lstlisting}
+[1] "Number of rules: 8 Total conditions: 10 Accuracy on test data: 0.96"
+\end{lstlisting}
+While this is one rule more than the CPAR model, the extra rule is the explicitly included default rule (rule \#8). In the \CRANpkg{arulesCBA} CPAR model, the default rule is stored in a separate slot (\code{rmBASE\$default}) and predicts the same virginica class as the one in the QCBA rule set.
+
+Recall that rules are applied from top to bottom. A careful inspection of the rule list shows that it contains rule 4, which is a special case of rule 7 (both predicting the versicolor class). However, neither of these rules can be removed without an impact on the predictions. If the more specific rule 4 were removed, some instances would be classified differently, as more instances would reach rule 5, which would classify them as virginica. Similarly, the classification would change if we replaced rule 4 with rule 7 or rule 7 with rule 4.
+
+\subsection{Adjusting boundaries and attribute pruning}
+We will demonstrate the extension, trimming and attribute pruning steps simultaneously for efficient presentation.
+\begin{lstlisting}
+qcbaParams$attributePruning <- TRUE
+qcbaParams$trim_literal_boundaries <- TRUE
+qcbaParams$extend <- "numericOnly"
+qcba_with_summary(qcbaParams)
+\end{lstlisting}
+
+The list of resulting rules is
+\begin{lstlisting}
+1 {Petal.Length=[-Inf;1.9]} => {Species=setosa}
+2 {Petal.Width=[-Inf;0.6]} => {Species=setosa}
+3 {Petal.Length=[5.2;Inf]} => {Species=virginica}
+4 {Sepal.Length=[5;Inf],Petal.Length=[3.3;4.7]} => {Species=versicolor}
+5 {Petal.Width=[1.8;Inf]} => {Species=virginica}
+6 {Petal.Length=[3.3;4.7]} => {Species=versicolor}
+7 {Petal.Width=[1;1.7]} => {Species=versicolor}
+8 {} => {Species=virginica}
+\end{lstlisting}
+As can be seen, the adjustment of intervals resulted in a change of the boundary for Sepal.Length in rule \#4. The attribute pruning removed extra conditions from rules \#3 and \#4, resulting in a smaller model with improved test accuracy:
+\begin{lstlisting}
+"Number of rules: 8 Total conditions: 8 Accuracy on test data: 1"
+\end{lstlisting}
+
+\subsection{Postpruning}
+Postpruning is performed on the model resulting from all previous steps.
+\begin{lstlisting}
+qcbaParams$postpruning <- "cba"
+qcba_with_summary(qcbaParams)
+\end{lstlisting}
+The result is a substantially reduced rule list:
+\begin{lstlisting}
+1 {Petal.Length=[1;1.9]} => {Species=setosa}
+2 {Petal.Length=[5.2;Inf]} => {Species=virginica}
+3 {Sepal.Length=[5;Inf],Petal.Length=[3.3;4.7]} => {Species=versicolor}
+4 {Petal.Width=[1.8;Inf]} => {Species=virginica}
+5 {} => {Species=versicolor}
+\end{lstlisting}
+The statistics are:
+\begin{lstlisting}
+"Number of rules: 5 Total conditions: 5 Accuracy on test data: 1.0"
+\end{lstlisting}
+As can be seen, postpruning reduced the model size significantly, and in this case, the model even has better accuracy than the base CPAR model. However, in other cases, the benchmarks in \citet{kliegr2023qcba} have shown a small average decrease in accuracy as a result of pruning.
+
+\subsection{Default rule pruning}
+The following code demonstrates the standard strategy for default rule pruning. As outlined earlier, this often provides an effective way to reduce the rule count, although sometimes at the expense of slightly lower accuracy.
+\label{final:cpar}
+\begin{lstlisting}
+qcbaParams$defaultRuleOverlapPruning <- "transactionBased"
+qcba_with_summary(qcbaParams)
+\end{lstlisting}
+The result is the final reduced rule list, with rule \#3 from the previous printout removed. That rule predicted the versicolor class; since versicolor is also the default class, the instances that were covered by it will now be covered by the default rule. The original rule \#4 covers -- considering its position in the rule list -- different instances of the training data and will therefore not interfere with this.
+\begin{lstlisting}
+1 {Petal.Length=[1;1.9]} => {Species=setosa}
+2 {Petal.Length=[5.2;Inf]} => {Species=virginica}
+3 {Petal.Width=[1.8;Inf]} => {Species=virginica}
+4 {} => {Species=versicolor}
+\end{lstlisting}
+The statistics for the final model are:
+\begin{lstlisting}
+Number of rules: 4 Total conditions: 3 Accuracy on test data: 1.0
+\end{lstlisting}
+Compared to the original CPAR model, the number of rules dropped from 8 (including the default rule) to 4, the number of conditions dropped from 10 to 3, and the accuracy increased by 0.04 points. This is an illustrative example, and actual results may vary for a particular dataset. On average, as measured on 22 benchmark datasets, QCBA provided a 2\% improvement in accuracy over CPAR, a 40\% reduction in the number of rules and a 29\% reduction in the number of conditions \citep{kliegr2023qcba}. More details on the benchmarks are covered in Section~\hyperref[{sec:benchmark}]{Built-in benchmark support}.
+
+\section{Interoperability with Rule Learning Packages in CRAN}
+The \CRANpkg{qCBA} package is able to process CBA models produced by all three CBA implementations in CRAN: \CRANpkg{arc}, \CRANpkg{arulesCBA}, \CRANpkg{rCBA}. Additionally, it can process other rule models generated by \CRANpkg{arulesCBA}, such as CPAR, as well as SBRL models generated by the \CRANpkg{sbrl} package.
+
+On the input, \code{qcba()} requires an instance of \code{CBARuleModel}, which has the following slots:
+\begin{itemize}
+  \item \code{rules}: list of rules in the model (an instance of \code{rules} from the \CRANpkg{arules} package),
+  \item \code{cutp}: specification of cutpoints used to discretize numerical attributes,
+  \item \code{classAtt}: name of the target attribute,
+  \item \code{attTypes}: types of the attributes (numeric or factor).
+\end{itemize}
+
+As shown below, an instance of this class is created automatically when used with \CRANpkg{arc} and through prepared helper functions for the other libraries.
+The code examples in the following are built on the data preparation described in Subsection~\hyperref[{ss:data}]{Data}.
+
+\subsection{\CRANpkg{arc} package}
+As the \CRANpkg{arc} package was specifically designed for \CRANpkg{qCBA}, it outputs an instance of the \code{CBARuleModel} class, which is accepted by the \code{qcba()} function. The postprocessing can thus be directly applied to the result of \code{cba()}.
+
+\begin{lstlisting}
+rmCBA <- cba(datadf = trainFold, classAtt = "Species")
+rmqCBA <- qcba(cbaRuleModel = rmCBA, datadf = trainFold)
+\end{lstlisting}
+By default, the function \code{cba()} from the \CRANpkg{arc} package learns candidate association rules with automatic threshold detection based on the heuristic algorithm from \citet{kliegr2019tuning}. Therefore, no support and confidence thresholds have to be passed.
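+For illustration, the slots of the returned model can be inspected directly (a sketch based on the \code{CBARuleModel} slot description above; output omitted):
+\begin{lstlisting}
+inspect(rmCBA@rules) # rules as an arules 'rules' instance
+rmCBA@cutp           # discretization cutpoints per attribute
+rmCBA@classAtt       # name of the target attribute
+rmCBA@attTypes       # attribute types (numeric or factor)
+\end{lstlisting}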
+A more complex case involving custom discretization and thresholds was demonstrated in Section~\hyperref[{sec:primer}]{Primer on building rule-based classifiers for QCBA in R}.
+
+\subsection{\CRANpkg{arulesCBA} package}
+Compared to the previous example, there is an extra line with a call to a helper function.
+%To combine \CRANpkg{qCBA} \CRANpkg{arulesCBA} and other packages, it is necessary that external logic is applied to retain the cutpoints generated by the discretization
+\begin{lstlisting}
+library(arulesCBA)
+arulesCBAModel <- arulesCBA::CBA(Species ~ ., data = train_disc, supp = 0.1, conf = 0.9)
+CBAmodel <- arulesCBA2arcCBARuleModel(arulesCBAModel, discrModel$cutp, iris, classAtt)
+qCBAmodel <- qcba(cbaRuleModel = CBAmodel, datadf = iris)
+\end{lstlisting}
+Note that we passed prediscretized data in \code{train\_disc} to the \CRANpkg{arulesCBA} package.
+While the \CRANpkg{arulesCBA} package allows for supervised discretization using the
+\code{arulesCBA::discretizeDF.supervised()} method, the cutpoints determined during the discretization are not exposed in a machine-readable way. Therefore, when \CRANpkg{arulesCBA} is used in conjunction with \CRANpkg{qCBA}, the discretization should be performed using \code{arc::discrNumeric()}. Since both \CRANpkg{arc} and \CRANpkg{arulesCBA} internally use the MDLP method from the \CRANpkg{discretization} package, this should not influence the results.
+
+\subsection{\CRANpkg{rCBA} package}
+The use of the \CRANpkg{rCBA} package follows the same logic as the previously covered packages; it is only necessary to use a different conversion function:
+\begin{lstlisting}
+library(rCBA)
+rCBAmodel <- rCBA::build(train_disc)
+CBAmodel <- rcbaModel2CBARuleModel(rCBAmodel, discrModel$cutp, iris, "Species")
+qCBAmodel <- qcba(CBAmodel, iris)
+\end{lstlisting}
+Confidence and support thresholds are not specified, as \CRANpkg{rCBA} performs automatic threshold tuning using the simulated annealing algorithm from \citet{kliegr2019tuning}. The \CRANpkg{rCBA} package internally uses a Java implementation of CBA, which may result in faster performance on some datasets than \CRANpkg{arulesCBA} \citep{hahsler2019associative}.
+
+\subsection{\CRANpkg{sbrl} and other packages}
+The use of the \CRANpkg{qCBA} package with \CRANpkg{sbrl} is similar; it is again only necessary to use a dedicated conversion function, \code{sbrlModel2arcCBARuleModel()}.
+
+An example for \CRANpkg{sbrl} is contained in the \CRANpkg{qCBA} package documentation, as \CRANpkg{sbrl} requires additional preprocessing and postprocessing: \CRANpkg{sbrl} requires a specially named target attribute, allows only binary targets, and outputs probabilities rather than specific class predictions.
+
+For compatibility with packages that do not use the \CRANpkg{arules} data structures, there is also the \code{customCBARuleModel} class, which takes rules as a data frame conforming to the format used in \CRANpkg{arules}, which can be obtained with \code{as(rules, "data.frame")}.
+
+\section{Built-in benchmark support}
+\label{sec:benchmark}
+The \CRANpkg{qCBA} package has built-in support for benchmarking over all supported types of algorithms covered in the previous section. This includes the \CRANpkg{arulesCBA} implementations of CBA, CMAR, CPAR, PRM and FOIL2 \citep{quinlan1993foil}.
+
+By default, an average of two runs of each algorithm is performed.
+
+\begin{lstlisting}
+# learn with default metaparameter values
+stats <- benchmarkQCBA(train = trainFold, test = testFold, classAtt = classAtt)
+print(stats)
+\end{lstlisting}
+The result of the last printout is
+\begin{lstlisting}
+           CBA   CMAR   CPAR   PRM  FOIL2 CBA_QCBA CMAR_QCBA CPAR_QCBA PRM_QCBA FOIL2_QCBA
+accuracy  1.00  0.960  0.960 0.960  0.960    1.000     1.000     1.000    1.000      0.960
+rulecount 5.00 25.000  7.000 6.000  8.000    4.000     4.000     4.000    4.000      5.000
+modelsize 5.00 52.000 10.000 9.000 13.000    3.000     3.000     3.000    3.000      5.000
+buildtime 0.05  0.535  0.215 0.205  0.215    0.058     0.138     0.059    0.059      0.081
+\end{lstlisting}
+The output can easily be turned into a relative comparison with \code{round((stats[, 6:10] / stats[, 1:5] - 1), 3)}; the result is then:
+
+\begin{lstlisting}
+          CBA_QCBA CMAR_QCBA CPAR_QCBA PRM_QCBA FOIL2_QCBA
+accuracy     0.000     0.042     0.042    0.042      0.000
+rulecount   -0.200    -0.840    -0.429   -0.333     -0.375
+modelsize   -0.400    -0.942    -0.700   -0.667     -0.615
+buildtime    0.204    -0.737    -0.681   -0.722     -0.665
+\end{lstlisting}
+This shows that on the iris dataset, depending on the base algorithm, QCBA reduced the rule count by between 20\% and 84\%, while accuracy remained unchanged or increased by about 4\%. The last row shows that, for four out of the five studied reference algorithms, the time required by QCBA is lower than the time it takes to train the input model with the corresponding algorithm.
+The \code{benchmarkQCBA()} function can also accept custom metaparameters and a selection of base rule learners. The user can also choose the number of runs (the \code{iterations} parameter) and obtain the resulting models from the last iteration.
+
+Since the outputs of some learners may depend on chance, the function also allows setting the random seed through the optional \code{seed} argument. Note that the provided seed is not used for splitting data, which needs to be performed externally.
This approach provides the most control for the user, avoids replicating code in other R packages and functions aimed at splitting data, and also improves reproducibility. +\begin{lstlisting} +output <- benchmarkQCBA( + trainFold, + testFold, + classAtt, + train_disc, + test_disc, + discrModel$cutp, + CBA = list(support = 0.05, confidence = 0.5), + algs = c("CPAR"), + iterations = 10, + return_models = TRUE, + seed = 1 +) + +message("Evaluation statistics") +print(output$stats) +message("CPAR model") +inspect(output$CPAR[[1]]) +message("QCBA model") +print(output$CPAR_QCBA[[1]]) +\end{lstlisting} +This will produce output with a list of rules similar to the final CPAR model presented in Section~\hyperref[{sec:demo}]{Demonstration of individual QCBA optimization steps}. A more complex benchmark of computational costs is presented in \citet{kliegr2023qcba}, which also includes the study of various data sizes and the effect of varying the number of unique values in the dataset. A brief overview of the main results was presented in Subsection~\hyperref[{ss:costs}]{Computational costs for large datasets}. + +A GitHub repository \url{https://github.com/kliegr/arcBench} contains scripts that extend this workflow into automation across multiple datasets and materialized splits in each dataset. It also includes support for benchmarking additional rule learning algorithms, including SBRL, Python packages producing IDS models \citep{lakkarajuinterpretable} and Weka libraries for RIPPER \citep{cohen1995fast} and FURIA \citep{huhn2009furia}. Detailed benchmarking results are included in \citep{kliegr2023qcba}. + +\section{Conclusions} +Quantitative CBA ameliorates one of the major drawbacks of association rule classification, the adherence of rules comprising the classifier to the multidimensional grid created by discretization of numerical attributes. 
By working with the original continuous data, the algorithm can improve the fit of the rules and consequently reduce their count. +The QCBA algorithm does not introduce any mandatory thresholds or meta-parameters for the user to set or optimize, although it does allow disabling the individual optimizations. The \CRANpkg{qCBA} package implements QCBA, allowing the postprocessing of the output of all three CBA implementations currently in CRAN. The package can also be used in conjunction with other association rule-based models, including those producing rule sets and using multi-rule classification. + +The QCBA algorithm is described in detail in \citet{kliegr2023qcba}, the package documentation is available in \citet{qcbaPackage}, and additional information is available at \url{https://github.com/kliegr/qcba}, which also features an interactive RMarkdown tutorial supplementing this paper. + +\section{Acknowledgment} +The author thanks the Faculty of Informatics and Statistics, Prague University of Economics and Business, for long-term institutional support of research activities. + +\bibliography{qcba} + +\address{Tomas Kliegr\\ + Prague University of Economics and Business\\ + Winston Churchill Sq. 
4, Prague\\ + Czech Republic\\ + % + \url{https://nb.vse.cz/~klit01}\\% +\textit{ORCiD: \href{https://orcid.org/0000-0002-7261-0380}{0000-0002-7261-0380}}\\% +\href{mailto:tomas.kliegr@vse.cz}{\nolinkurl{tomas.kliegr@vse.cz}}% + \email{tomas.kliegr@vse.cz}} diff --git a/_articles/RJ-2025-039/RJ-2025-039.R b/_articles/RJ-2025-039/RJ-2025-039.R new file mode 100644 index 0000000000..fa72ea8db2 --- /dev/null +++ b/_articles/RJ-2025-039/RJ-2025-039.R @@ -0,0 +1,346 @@ +# Generated by `rjournal_pdf_article()` using `knitr::purl()`: do not edit by hand +# Please edit RJ-2025-039.Rmd to modify this file + +## ----echo=FALSE--------------------------------------------------------------- +knitr::opts_chunk$set(tidy=TRUE, tidy.opts=list(width.cutoff=70), cache=FALSE) +RNGkind("L'Ecuyer-CMRG") + + +## ----metaphor, fig.show="hold", out.width="100%", fig.cap="Illustration of the ball-and-landscape metaphor commonly used in the field of psychopathology.",fig.alt='Three landscape plots, each with a ball resting in one of two basins. The left basin is labeled maladaptive, the right basin healthy. In the first plot, the maladaptive basin is deeper; in the second, both basins are equally deep; in the third, the healthy basin is deeper.',echo=FALSE,message=FALSE---- +knitr::include_graphics("figures/metaphor.png") + + +## ----diagram, fig.show="hold", out.width="100%", fig.cap="The structure and workflow of simlandr.", fig.alt='A flow chart showing the analysis steps in simlandr, with functions listed under each step.', echo=FALSE,message=FALSE---- +knitr::include_graphics("figures/diagram.png") + + +## ----e1, fig.show="hold", out.width="50%", fig.cap="A graphical illustration of the relationship between the activation levels of the two genes. Solid arrows represent positive relationships (i.e., activation) and dashed arrows represent negative relationships (i.e., inhibition).", fig.alt='Two large circles labeled X1 and X2. 
Dashed arrows labeled b connect the circles in both directions. Each circle has a solid self-loop labeled a and a dashed self-loop labeled k.', echo=FALSE,message=FALSE---- +knitr::include_graphics("figures/diagram-e1.png") + + +## ----------------------------------------------------------------------------- +# Load the package. +library(simlandr) + +# Specify the simulation function. +b <- 1 +k <- 1 +S <- 0.5 +n <- 4 +lambda <- 0.01 + +drift_gene <- c( + rlang::expr(z * x^(!!n) / ((!!S)^(!!n) + x^(!!n)) + (!!b) * (!!S)^(!!n) / ((!!S)^(!!n) + y^(!!n)) - (!!k) * x), + rlang::expr(z * y^(!!n) / ((!!S)^(!!n) + y^(!!n)) + (!!b) * (!!S)^(!!n) / ((!!S)^(!!n) + x^(!!n)) - (!!k) * y), + rlang::expr(-(!!lambda) * z) +) |> as.expression() + +diffusion_gene <- expression( + 0.2, + 0.2, + 0.2 +) + + +## ----eval=FALSE--------------------------------------------------------------- +# # Perform a simulation and save the output. +# set.seed(1614) +# single_output_gene <- sim_SDE(drift = drift_gene, diffusion = diffusion_gene, N = 1e6, M = 10, Dt = 0.1, x0 = c(0, 0, 1), keep_full = FALSE) + + +## ----echo=FALSE--------------------------------------------------------------- +# To save time for building the document, we save the output in a file. The same applies to the following examples. 
+if (!file.exists("data/single_output_gene.RDS")) { + set.seed(1614) + single_output_gene <- sim_SDE(drift = drift_gene, diffusion = diffusion_gene, N = 1e6, M = 10, Dt = 0.1, x0 = c(0, 0, 1), keep_full = FALSE) + xfun::dir_create("data") + saveRDS(single_output_gene, "data/single_output_gene.RDS") +} else { + single_output_gene <- readRDS("data/single_output_gene.RDS") +} + + +## ----------------------------------------------------------------------------- +single_output_gene2 <- do.call(rbind, single_output_gene) +single_output_gene2 <- cbind(single_output_gene2[, "X"] - single_output_gene2[, "Y"], single_output_gene2[, "Z"]) +colnames(single_output_gene2) <- c("delta_x", "a") + + +## ----------------------------------------------------------------------------- +single_output_gene_mcmc_thin <- as.mcmc.list(lapply(single_output_gene, function(x) x[seq(1, nrow(x), by = 100), ])) + + +## ----converge-gene, message=FALSE, warning=FALSE, fig.cap="The convergence check result for the simulation of the gene expression model. 
The variables in different simulation stages did not show distributional differences, indicating that the simulation is long enough to provide a reliable estimation of the steady-state distribution.", fig.alt='Trace and density plots for variables x, y, and z.', out.width="100%"----
+plot(single_output_gene_mcmc_thin)
+
+
+## ----warning=FALSE------------------------------------------------------------
+l_single_gene_3d <-
+  make_3d_single(single_output_gene2,
+    x = "delta_x", y = "a",
+    lims = c(-1.5, 1.5, 0, 1.5),
+    Umax = 8)
+
+
+## ----eval=FALSE---------------------------------------------------------------
+# plot(l_single_gene_3d)
+
+
+## ----echo=FALSE,message=FALSE,warning=FALSE,results='hide'--------------------
+if(!file.exists("figures/3dstatic_gene.png")) {
+  plotly::orca(plot(l_single_gene_3d) |>
+    plotly::layout(scene = list(
+      aspectmode = "manual", aspectratio = list(x = 1.1, y = 1.1, z = 0.66),
+      xaxis = list(range = list(-2, 2)),
+      yaxis = list(range = list(0, 1.5)),
+      camera = list(
+        eye = list(
+          x = 0.3, y = -1.5, z = 1.5
+        )
+      )
+    )), file = "figures/3dstatic_gene.png", height = 650, width = 750)
+}
+
+
+## ----3dstaticgene, fig.show="hold", out.width="50%", fig.cap="The 3D landscape (potential value as z-axis) for the gene expression model. The left panel is the plot produced by simlandr; the right panel is the potential landscape obtained analytically by Wang et al.
(2008), reproduced with the permission of the authors and in accordance with the journal policy.", fig.alt='Two similar landscape plots, each with three basins.', echo=FALSE,message=FALSE,warning=FALSE---- +knitr::include_graphics("figures/3dstatic_gene.png") +knitr::include_graphics("figures/wang2011.png") + + +## ----results='markup', fig.show='hide', warning=FALSE------------------------- +b_single_gene_3d <- calculate_barrier(l_single_gene_3d, + start_location_value = c(0, 1.2), end_location_value = c(1, 0.2), + start_r = 0.3, end_r = 0.3 +) + +get_barrier_height(b_single_gene_3d) + + +## ----bsingle3dgene, out.width="50%", fig.cap="The landscape for the gene expression model. The local minima are marked as white dots, the saddle points are marked as red dots, and the MEPs are marked as white lines.", fig.alt='A landscape plot with a white line connecting two white dots, passing through a red dot in the middle.', message=FALSE,warning=FALSE---- + +plot(l_single_gene_3d, 2) + autolayer(b_single_gene_3d) + + +## ----------------------------------------------------------------------------- +batch_arg_set_gene <- new_arg_set() +batch_arg_set_gene <- batch_arg_set_gene |> + add_arg_ele( + arg_name = "parameter", ele_name = "b", + start = 0.5, end = 1.5, by = 0.5 + ) |> + add_arg_ele( + arg_name = "parameter", ele_name = "k", + start = 0.5, end = 1.5, by = 0.5 + ) +batch_grid_gene <- make_arg_grid(batch_arg_set_gene) + + +## ----eval=FALSE--------------------------------------------------------------- +# batch_output_gene <- batch_simulation(batch_grid_gene, +# sim_fun = function(parameter) { +# b <- parameter[["b"]] +# k <- parameter[["k"]] +# drift_gene <- c( +# rlang::expr(z * x^(!!n) / ((!!S)^(!!n) + x^(!!n)) + (!!b) * (!!S)^(!!n) / ((!!S)^(!!n) + y^(!!n)) - (!!k) * x), +# rlang::expr(z * y^(!!n) / ((!!S)^(!!n) + y^(!!n)) + (!!b) * (!!S)^(!!n) / ((!!S)^(!!n) + x^(!!n)) - (!!k) * y), +# rlang::expr(-(!!lambda) * z) +# ) |> as.expression() +# set.seed(1614) +# 
single_output_gene <- sim_SDE(drift = drift_gene, diffusion = diffusion_gene, N = 1e6, M = 10, Dt = 0.1, x0 = c(0, 0, 1), keep_full = FALSE) +# single_output_gene2 <- do.call(rbind, single_output_gene) +# single_output_gene2 <- cbind(single_output_gene2[, "X"] - single_output_gene2[, "Y"], single_output_gene2[, "Z"]) +# colnames(single_output_gene2) <- c("delta_x", "a") +# single_output_gene2 +# }, +# bigmemory = TRUE +# ) + + +## ----eval=FALSE--------------------------------------------------------------- +# saveRDS(batch_output_gene, "batch_output_gene.RDS") +# batch_output_gene <- readRDS("batch_output_gene.RDS") |> attach_all_matrices() + + +## ----echo=FALSE--------------------------------------------------------------- +if (file.exists("data/batch_output_gene.RDS")) { + batch_output_gene <- readRDS("data/batch_output_gene.RDS") |> attach_all_matrices()} else { + batch_output_gene <- batch_simulation(batch_grid_gene, + sim_fun = function(parameter) { + b <- parameter[["b"]] + k <- parameter[["k"]] + drift_gene <- c( + rlang::expr(z * x^(!!n) / ((!!S)^(!!n) + x^(!!n)) + (!!b) * (!!S)^(!!n) / ((!!S)^(!!n) + y^(!!n)) - (!!k) * x), + rlang::expr(z * y^(!!n) / ((!!S)^(!!n) + y^(!!n)) + (!!b) * (!!S)^(!!n) / ((!!S)^(!!n) + x^(!!n)) - (!!k) * y), + rlang::expr(-(!!lambda) * z) + ) |> as.expression() + set.seed(1614) + single_output_gene <- sim_SDE(drift = drift_gene, diffusion = diffusion_gene, N = 1e6, M = 10, Dt = 0.1, x0 = c(0, 0, 1), keep_full = FALSE) + single_output_gene2 <- do.call(rbind, single_output_gene) + single_output_gene2 <- cbind(single_output_gene2[, "X"] - single_output_gene2[, "Y"], single_output_gene2[, "Z"]) + colnames(single_output_gene2) <- c("delta_x", "a") + single_output_gene2 + }, + bigmemory = TRUE + ) + saveRDS(batch_output_gene, "data/batch_output_gene.RDS") + } + + +## ----warning=FALSE, message=FALSE--------------------------------------------- +l_batch_gene_3d <- make_3d_matrix(batch_output_gene, + x = "delta_x", y = "a", cols = "b", 
rows = "k", + lims = c(-5, 5, -0.5, 2), h = 0.005, + Umax = 8, + kde_fun = "ks", individual_landscape = TRUE +) + + +## ----results='markup', fig.show='hide', message=FALSE,warning=FALSE----------- +make_barrier_grid_3d(batch_grid_gene, + start_location_value = c(0, 1.5), end_location_value = c(1, -0.5), + start_r = 1, end_r = 1, print_template = TRUE +) + +bg_gene <- make_barrier_grid_3d(batch_grid_gene, df = structure(list(start_location_value = list(c(0, 1.5), c(0, 1.5), c(0, 1.5), c(0, 1.5), c(0, 1.5), c(0, 1.5), c(0, 1.5), c(0, 1.5), c(0, 1.5)), start_r = list(c(0.2, 1), c(0.2, 1), c(0.2, 1), c(0.2, 0.5), c(0.2, 0.5), c(0.2, 0.5), c(0.2, 0.3), c(0.2, 0.3), c(0.2, 0.3)), end_location_value = list( + c(2, 0), c(2, 0), c(2, 0), c(1, 0), c(1, 0), c(1, 0), c(1, 0), c(1, 0), c(1, 0) +), end_r = list( + c(1, 1), c(1, 1), c(1, 1), c(1, 1), c(1, 1), c(1, 1), c(1, 1), c(1, 1), c(1, 1) +)), row.names = c(NA, -9L), class = c( + "arg_grid", + "data.frame" +))) + + +## ----------------------------------------------------------------------------- +b_batch_gene_3d <- calculate_barrier(l_batch_gene_3d, + bg = bg_gene +) + + +## ----eval=FALSE--------------------------------------------------------------- +# b_batch_gene_3d <- calculate_barrier(l_batch_gene_3d, start_location_value = c(0, 1.5), end_location_value = c(1, 0), start_r = 1, end_r = 1) + + +## ----bbatch3dgene, out.width="100%", fig.cap="The landscape for the gene expression model of different \\( b \\) and \\( k \\) values. The local minima are marked as white dots, the saddle points are marked as red dots, and the MEPs are marked as white lines.",fig.alt='Nine landscape plots arranged in a 3×3 grid. The x-axis is labeled delta x, the y-axis is labeled a. 
Rows correspond to k values (0.5, 1, 1.5), and columns to b values (0.5, 1, 1.5).', message=FALSE,warning=FALSE---- +plot(l_batch_gene_3d) + autolayer(b_batch_gene_3d) + + +## ----e2, fig.show="hold", out.width="100%", fig.cap="A graphical illustration of the relationships between several important psychological variables in the panic disorder model. Solid arrows represent positive relationships and dashed arrows represent negative relationships.", fig.alt='Four large circles labeled H, A, PT, and E from left to right. Solid arrows go from A to H, A to PT, PT to A, and PT to E. Dashed arrows go from H to A and from E to PT. The solid arrow from A to PT is labeled AS.', echo=FALSE,message=FALSE---- +knitr::include_graphics("figures/diagram-e2.png") + + +## ----------------------------------------------------------------------------- +library(PanicModel) + +sim_fun_panic <- function(x0, par) { + + # Change several default parameters + pars <- pars_default + # Increase the noise strength to improve sampling efficiency + pars$N$lambda_N <- 200 + # Make S constant through the simulation + pars$TS$r_S_a <- 0 + pars$TS$r_S_e <- 0 + + # Specify the initial values of A and PT according to the format requirement by `multi_init_simulation()`, while the other variables use the default initial values. + initial <- initial_default + initial$A <- x0[1] + initial$PT <- x0[2] + + # Specify the value of S according to the format requirement by `batch_simulation()`. + initial$S <- par$S + + # Extract the simulation output from the result by simPanic(). Only keep the core variables. 
+ return( + as.matrix( + simPanic(1:5000, initial = initial, parameters = pars)$outmat[, c("A", "PT", "E")] + ) + ) +} + + +## ----eval=FALSE--------------------------------------------------------------- +# future::plan("multisession") +# set.seed(1614, kind = "L'Ecuyer-CMRG") +# single_output_panic <- multi_init_simulation( +# sim_fun = sim_fun_panic, +# range_x0 = c(0, 1, 0, 1), +# R = 4, +# par = list(S = 0.5) +# ) + + +## ----echo=FALSE--------------------------------------------------------------- +if (file.exists("data/single_output_panic.RDS")) { + single_output_panic <- readRDS("data/single_output_panic.RDS") +} else { + future::plan("multisession") + set.seed(1614, kind = "L'Ecuyer-CMRG") + single_output_panic <- multi_init_simulation( + sim_fun = sim_fun_panic, + range_x0 = c(0, 1, 0, 1), + R = 4, + par = list(S = 0.5) + ) + saveRDS(single_output_panic, "data/single_output_panic.RDS") +} + + +## ----converge-panic, message=FALSE, warning=FALSE, fig.cap="The convergence check result for the simulation of the panic disorder model.", fig.alt='Trace and density plots for variables x, y, and z.', out.width="100%"---- +plot(single_output_panic) + + +## ----3dstaticpanic, fig.show="hold", out.width="50%", fig.cap="The 3D landscape (potential value as color) for the panic disorder model", fig.alt='A landscape plot with two basins.', echo=FALSE,message=FALSE,warning=FALSE---- +l_single_panic_3d <- make_3d_single( + single_output_panic |> window(start = 100), + x = "A", y = "PT", h = 0.005, lims = c(-1, 1.5, -0.5, 1.5)) +plot(l_single_panic_3d, 2) + + +## ----------------------------------------------------------------------------- +batch_arg_grid_panic <- new_arg_set() |> + add_arg_ele(arg_name = "par", ele_name = "S", start = 0, end = 1, by = 0.5) |> + make_arg_grid() + + +## ----eval=FALSE--------------------------------------------------------------- +# future::plan("multisession") +# set.seed(1614, kind = "L'Ecuyer-CMRG") +# batch_output_panic <- 
batch_simulation( +# batch_arg_grid_panic, +# sim_fun = function(par) { +# multi_init_simulation( +# sim_fun_panic, +# range_x0 = c(0, 1, 0, 1), +# R = 4, +# par = par +# ) |> window(start = 100) +# } +# ) + + +## ----echo=FALSE,message=FALSE------------------------------------------------- +if (file.exists("data/batch_output_panic.RDS")) { + batch_output_panic <- readRDS("data/batch_output_panic.RDS") +} else { + future::plan("multisession") + set.seed(1614, kind = "L'Ecuyer-CMRG") + batch_output_panic <- batch_simulation( + batch_arg_grid_panic, + sim_fun = function(par) { + multi_init_simulation( + sim_fun_panic, + range_x0 = c(0, 1, 0, 1), + R = 4, + par = par + ) |> window(start = 100) + } + ) + saveRDS(batch_output_panic, "data/batch_output_panic.RDS") +} + + +## ----lbatch3dpanic, out.width="100%", fig.cap="The landscape for the panic disorder model of different \\( S \\) values. Two landscapes are shown for different variable combinations, \\( A \\) and \\( PT \\), or \\( A \\) and \\( E \\).", fig.alt='Three landscape plots arranged in a row. The x-axis is labeled A, the y-axis is labeled PT. Columns correspond to S values of 0, 0.5, and 1.', message=FALSE,warning=FALSE---- +l_batch_panic_3d <- make_3d_matrix(batch_output_panic, x = "A", y = "PT", cols = "S", h = 0.005, lims = c(-1, 1.5, -0.5, 1.5)) +plot(l_batch_panic_3d) + diff --git a/_articles/RJ-2025-039/RJ-2025-039.Rmd b/_articles/RJ-2025-039/RJ-2025-039.Rmd new file mode 100644 index 0000000000..0ac7c29346 --- /dev/null +++ b/_articles/RJ-2025-039/RJ-2025-039.Rmd @@ -0,0 +1,594 @@ +--- +title: 'simlandr: Simulation-Based Landscape Construction for Dynamical Systems' +date: '2026-02-13' +abstract: | + We present the simlandr package for R, which provides a set of tools for constructing potential landscapes for dynamical systems using Monte Carlo simulation. Potential landscapes can be used to quantify the stability of system states. 
While the canonical form of a potential function is defined for gradient systems, generalized potential functions can also be defined for non-gradient dynamical systems. Our method is based on the potential landscape definition from the steady-state distribution, and can be used for a large variety of models. To facilitate simulation and computation, we introduce several novel features, including data structures optimized for batch simulations under varying conditions, an out-of-memory computation tool with integrated hash-based file-saving systems, and an algorithm for efficiently searching the minimum energy path. Using a multistable cell differentiation model as an example, we illustrate how simlandr can be used for model simulation, landscape construction, and barrier height calculation. The simlandr package is available at https://CRAN.R-project.org/package=simlandr, under GPL-3 license. +draft: no +author: +- name: Jingmeng Cui + affiliation: + - University of Groningen, + - Radboud University + address: + - Faculty of Behavioural and Social Sciences, Groningen, the Netherlands + - Behavioural Science Institute, Nijmegen, the Netherlands + email: jingmeng.cui@rug.nl + orcid: 0000-0003-3421-8457 +- name: Merlijn Olthof + affiliation: + - University of Groningen, + - Radboud University + address: + - Faculty of Behavioural and Social Sciences, Groningen, the Netherlands + - Behavioural Science Institute, Nijmegen, the Netherlands + email: merlijn.olthof@ru.nl + orcid: 0000-0002-5975-6588 +- name: Anna Lichtwarck-Aschoff + affiliation: + - University of Groningen, + - Radboud University + address: + - Faculty of Behavioural and Social Sciences, Groningen, the Netherlands + - Behavioural Science Institute, Nijmegen, the Netherlands + email: a.lichtwarck-aschoff@rug.nl + orcid: 0000-0002-4365-1538 +- name: Tiejun Li + affiliation: Peking University + address: \textit{(corresponding author)} LMAM and School of Mathematical Sciences, + Beijing, China + email: 
tieli@pku.edu.cn + orcid: 0000-0002-2086-1620 +- name: Fred Hasselman + affiliation: Radboud University + address: \textit{(corresponding author)} Behavioural Science Institute, Nijmegen, + the Netherlands + email: fred.hasselman@ru.nl + orcid: 0000-0003-1384-8361 +type: package +output: + rjtools::rjournal_pdf_article: + toc: no + rjtools::rjournal_web_article: + self_contained: yes + toc: no +bibliography: RJreferences.bib +date_received: '2024-10-13' +volume: 17 +issue: 4 +slug: RJ-2025-039 +journal: + lastpage: 191 + firstpage: 173 + +--- + + +```{r echo=FALSE} +knitr::opts_chunk$set(tidy=TRUE, tidy.opts=list(width.cutoff=70), cache=FALSE) +RNGkind("L'Ecuyer-CMRG") +``` + +# Introduction {#sec:intro} + +To better understand a dynamical system, it is often important to know the stability of different states. The metaphor of a potential landscape consisting of hills and valleys has been used to illustrate differences in stability in many fields, including genetics [@wangQuantifyingWaddingtonLandscape2011; @WaddingtonPrinciplesDevelopmentDifferentiation1966], ecology [@lamotheLinkingBallandcupAnalogy2019], and psychology [@olthofComplexityTheoryPsychopathology2020]. In such a landscape, the stable states of the system correspond to the lowest points (minima) in the valleys of the landscape. Just like a ball that is thrown in such a landscape will eventually gravitate towards such a minimum, the dynamical system is conceptually more likely to visit its stable states in which the system is also more resilient to noise. For example, in the landscape metaphor of psychopathology (Figure \@ref(fig:metaphor)), the valleys represent different mental health states, their relative depth represents the relative stability of the states, and the barriers between valleys represent the difficulty of transitioning between these states [@olthofComplexityTheoryPsychopathology2020,@hayesComplexSystemsApproach2020]. 
When the healthy state is more stable, the person is more likely to stay mentally healthy, whereas when the maladaptive state is more stable, the person is more likely to suffer from mental disorders. + +```{r metaphor, fig.show="hold", out.width="100%", fig.cap="Illustration of the ball-and-landscape metaphor commonly used in the field of psychopathology.",fig.alt='Three landscape plots, each with a ball resting in one of two basins. The left basin is labeled maladaptive, the right basin healthy. In the first plot, the maladaptive basin is deeper; in the second, both basins are equally deep; in the third, the healthy basin is deeper.',echo=FALSE,message=FALSE} +knitr::include_graphics("figures/metaphor.png") +``` + +Yet, formally quantifying the stability of states is a nontrivial question. Here we present an R package, \CRANpkg{simlandr}, that can quantify the stability of various kinds of systems without many mathematical restrictions. + +Dynamical systems are usually modeled by stochastic differential equations, which may depend on the past history (i.e., may be non-Markovian, @StumpfEtAlModelingStemCell2021). They take the general form of +\begin{equation} +\mathrm{d} \boldsymbol{X}_t = \boldsymbol{b}(\boldsymbol{X}_t, \boldsymbol{H}_t){\mathrm{d}t} + \boldsymbol{\sigma}( \boldsymbol{X}_t, \boldsymbol{H}_t)\mathrm{d}\boldsymbol{W}, +(\#eq:sde) +\end{equation} +where $\boldsymbol{X}_t$ is the random variable representing the current state of the system and $\boldsymbol{H_t}$ represents the past history of the system $\boldsymbol{H}_t=\{\boldsymbol{X_s} | s \in [0, t)\}$\footnote{The corresponding variable representing positions in the state space is not a random variable, so we use lowercase \(\boldsymbol{x}\) for it. 
This convention will be followed throughout this article.}. The first term on the right-hand side of Eq \@ref(eq:sde) represents the deterministic part of the dynamics, given by the function $\boldsymbol{b}(\boldsymbol{X}_t, \boldsymbol{H}_t)$ of the system's current state and history. The second term represents the stochastic part, which is standard white noise $\mathrm{d}\boldsymbol{W}$ multiplied by the noise strength $\boldsymbol{\sigma}(\boldsymbol{X}_t, \boldsymbol{H}_t)$. + +If the dynamical equation (Eq \@ref(eq:sde)) can be written in the following form +\begin{equation} +\mathrm{d} \boldsymbol{X} = - \nabla U {\mathrm{d} t} +\sqrt{2}\mathrm{d}\boldsymbol{W}, +(\#eq:canon) +\end{equation} +then $U$ is the potential function of the system.\footnote{Under the zero-inertia approximation.} However, this is not possible for general dynamical systems. The trajectory of such a system may contain loops that cannot be represented by a gradient system (this issue was compared to Escher's stairs by @rodriguez-sanchezClimbingEscherStairs2020). In this case, further generalization is needed. The theoretical background of \CRANpkg{simlandr} is the generalized potential landscape by @wangPotentialLandscapeFlux2008, which is based on the Boltzmann distribution and the steady-state distribution of the system. The Boltzmann distribution is a distribution law in physics, which states that the distribution of classical particles depends on the energy level they occupy. When the energy is higher, a particle is exponentially less likely to occupy such states: +\begin{equation} +P(\boldsymbol{x}) \propto \exp (-U). +(\#eq:Boltzmann) +\end{equation} + +This is then linked to dynamical systems by the steady-state distribution. The steady-state distribution of stochastic differential equations is the distribution that does not change over time, denoted by $P_{\mathrm{SS}}$, which satisfies +\begin{equation} +\frac{\partial P_{\mathrm{SS}} (\boldsymbol{x},t)}{\partial t} = 0.
+(\#eq:ss) +\end{equation} +The steady-state distribution is important because it extracts time-invariant information from a set of stochastic differential equations. Substituting the steady-state distribution into Eq \@ref(eq:Boltzmann) gives Wang's generalized potential landscape function [@wangPotentialLandscapeFlux2008] +\begin{equation} +U(\boldsymbol{x}) = - \ln P_{\mathrm{SS}}(\boldsymbol{x}). +(\#eq:Wang) +\end{equation} +If the system is ergodic (i.e., given sufficient time it can travel to all possible states in the state space), the long-term sample distribution can be used to estimate the steady-state distribution, and the generalized potential function can be calculated. + +Our approach is not the only possible way to construct potential landscapes. Many other theoretical approaches are available, including the SDE decomposition method by @AoPotentialStochasticDifferential2004 and the quasi-potential by @FreidlinWentzellRandomPerturbationsDynamical2012. Various strategies to numerically compute these landscapes have been proposed (see @zhouConstructionLandscapeMultistable2016 for a review). However, available software implementations are still scarce. To our knowledge, besides \CRANpkg{simlandr}, there are two existing packages specifically for computing potential landscapes: the \CRANpkg{waydown} package [@rodriguez-sanchezPabRodRolldownPostpublication2020] and the \pkg{QPot} package [@MooreEtAlQPotPackageStochastic2016; @dahiyaOrderedLineIntegral2018]. The \CRANpkg{waydown} package uses the skew-symmetric decomposition of the Jacobian, which theoretically produces landscapes similar to those of @wangPotentialLandscapeFlux2008 (but see @CuiEtAlCommentsClimbingEscher2023 for a potential technical issue with this package). The \pkg{QPot} package uses a path integral method that produces quasi-landscapes following the definition by @FreidlinWentzellRandomPerturbationsDynamical2012.
Because of the analytical methods used by \CRANpkg{waydown} and \pkg{QPot}, they both require the dynamic function to be Markovian and differentiable in the whole state space. Moreover, they can only be used for systems of up to two dimensions. \CRANpkg{simlandr}, in contrast, is based on Monte Carlo simulation and the steady-state distribution. It does not have specific requirements for the model. Even for models that are not globally differentiable, have history-dependence, and are defined in a high-dimensional space, \CRANpkg{simlandr} is still applicable (e.g., @CuiEtAlMetaphorComputationConstructing2021). Therefore, \CRANpkg{simlandr} can be applied to a much wider range of dynamical systems, illustrate a big picture of the dominant attractors, and investigate how the stability of different attractors may be influenced by model parameters. As a trade-off, \CRANpkg{simlandr} is not designed for rare-event sampling in which the noise strength $\boldsymbol{\sigma}(\boldsymbol{X})$ is extremely small, nor for the precise calculation of tail probabilities and transition paths. Instead, it is better to view \CRANpkg{simlandr} as a semi-quantitative tool that provides a broad overview of key attractors in dynamic systems, allowing for comparisons of their relative stability and the investigation of system parameter influences. We will show some typical use cases of \CRANpkg{simlandr} in later sections. Some key terms used in this article are summarized in Table \@ref(tab:terms). + +| Term | Explanation | |------|-------------| | Potential landscape metaphor | A conceptual metaphor representing the stability of a complex dynamic system as an uneven landscape, with a ball on it representing the system's state. This can be quantitatively realized in various ways [@zhouConstructionLandscapeMultistable2016]. | | Gradient system | A system whose deterministic motion can be described solely by the gradient of a potential function.
| +| Non-Markovian system | A system whose future evolution depends not only on its current state but also on its past history. | +| Steady-state distribution | The probability distribution of a dynamic system that remains unchanged over time. | +| Ergodic system | A dynamic system that, given enough time, will eventually pass through all possible states. | +| Minimum energy path (MEP) | A transition path linking two local minima and passing through a saddle point [@WanEtAlFindingTransitionState2024]. It is always parallel to the gradient of the energy landscape, representing an efficient transition route. | + +: (#tab:terms) Summary of key terms used in this article. + +# Design and implementation + +The general workflow of \CRANpkg{simlandr} involves three steps: model simulation, landscape construction, and barrier height computation. See Figure \@ref(fig:diagram) for a summary. + +```{r diagram, fig.show="hold", out.width="100%", fig.cap="The structure and workflow of simlandr.", fig.alt='A flow chart showing the analysis steps in simlandr, with functions listed under each step.', echo=FALSE,message=FALSE} +knitr::include_graphics("figures/diagram.png") +``` + +## Step 1: model simulation + +For the first step, a simulation function should be created by the user. This function should be able to simulate how the dynamical system of interest evolves over time and record its state at every time step. This can often be done with the Euler-Maruyama method. If the SDEs are Markovian and involve at most three dimensions, a helper function from \CRANpkg{simlandr}, `sim_SDE()`, can also be used. This function is based on the simulation utilities from the \CRANpkg{Sim.DiffProc} package [@GuidoumBoukhetalaPerformingParallelMonte2020], and its output can be directly used in later steps. Moreover, the `multi_init_simulation()` function can be used to simulate trajectories from various starting points, thus reducing the possibility that the system becomes trapped in a local minimum.
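The Euler-Maruyama scheme mentioned above amounts to repeatedly adding the drift multiplied by the step size, plus Gaussian noise scaled by the square root of the step size. A minimal base-R sketch for a hypothetical one-dimensional double-well drift (illustration only; not part of \CRANpkg{simlandr}):

```r
# Euler-Maruyama scheme for dX = b(X) dt + sigma dW (hypothetical 1D example)
euler_maruyama <- function(drift, sigma, x0, N, dt) {
  x <- numeric(N)
  x[1] <- x0
  for (i in 2:N) {
    # deterministic drift step plus white noise scaled by sqrt(dt)
    x[i] <- x[i - 1] + drift(x[i - 1]) * dt + sigma * sqrt(dt) * rnorm(1)
  }
  x
}

set.seed(1)
# drift of a double-well potential U(x) = x^4/4 - x^2/2, i.e., b(x) = -U'(x)
traj <- euler_maruyama(function(x) x - x^3, sigma = 0.5, x0 = 0, N = 1e4, dt = 0.01)
```

The long-run samples in `traj` concentrate around the two stable states at $x = \pm 1$, which is exactly the kind of output a user-written simulation function should return.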
The `multi_init_simulation()` function also supports parallel simulation based on the \CRANpkg{future} framework to improve time efficiency [@RJ-2021-048]. + +For Monte Carlo methods, it is important that the simulation *converges*, which means the distribution of the system is roughly stable. According to Eq \@ref(eq:Wang), the precision of the steady-state distribution estimation determines the precision of the potential landscape estimation. \CRANpkg{simlandr} provides a visual tool to compare the sample distributions in different stages (`check_conv()`), whereas the \CRANpkg{coda} package [@PlummerEtAlCODAConvergenceDiagnosis2006] can be used for more advanced diagnostics. The output of the `sim_SDE()` and `multi_init_simulation()` functions also uses the classes from the \CRANpkg{coda} package to enable easy convergence diagnosis. To achieve ergodicity in reasonable time, stronger noise sometimes needs to be added to the system. + +A simulation function is sufficient if the user is only interested in a single model setting. If the model is parameterized and the user wants to investigate the influence of parameters on the stability of the system, then multiple simulations need to be run with different parameter settings. \CRANpkg{simlandr} provides functions to perform batch simulations and store the outputs for landscape construction as one object. This can later be used to compare the stability under different parameter settings or produce animations to show how a model parameter influences the stability of the model. + +In many cases, the output of the simulation is so large that it cannot be properly stored in memory. \CRANpkg{simlandr} provides a `hash_big_matrix` class, which is a modification of the `big.matrix` class from the \CRANpkg{bigmemory} package [@KaneEtAlScalableStrategiesComputing2013], that can perform out-of-memory computation and organize the data files on disk.
In an out-of-memory computation, only the small subset of data needed for the current computation step is loaded into memory, rather than the full dataset. Therefore, memory usage is dramatically reduced. The `big.matrix` class in the \CRANpkg{bigmemory} package provides a powerful tool for out-of-memory computation. It, however, requires an explicit file name for each matrix, which can be cumbersome if there are many matrices to be handled, and this is likely to be the case in a batch simulation. The `hash_big_matrix` class automatically generates the file names using the MD5 values of the matrices with the \CRANpkg{digest} package [@EddelbuettelEtAlPkgdigestCreateCompact2021] and stores them within the object. Therefore, the file links can also be restored automatically. + +## Step 2: landscape construction + +\CRANpkg{simlandr} provides a set of tools to construct 2D, 3D, and 4D\footnote{In this package, we use the number of dimensions in landscape plots (including \(U\)) to define the dimension of landscapes. The x-, y-, z-, and color axes can all be regarded as a dimension. Therefore, the dimension of a landscape can be one more than the dimension of the kernel smoothing function.} landscapes from single or multiple simulation results. The steady-state distribution for selected variables of the system is first estimated using kernel density estimation (KDE). The base-R `density()` function is used for 2D landscapes, whereas the \CRANpkg{ks} package [@ChaconDuongMultivariateKernelSmoothing2018] is used by default for 3D and 4D landscapes because of its higher efficiency. Then the potential function $U$ is calculated from Eq \@ref(eq:Wang). The landscape plots without a z-axis are created with \CRANpkg{ggplot2} [@WickhamGgplot2ElegantGraphics2016], and those with a z-axis are created with \CRANpkg{plotly} [@SievertInteractiveWebbasedData2020]. These plots can be further refined using the standard \CRANpkg{ggplot2} or \CRANpkg{plotly} methods.
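The core of Eq \@ref(eq:Wang) can be illustrated outside the package: estimate the steady-state density from long-run samples and take its negative logarithm. A schematic base-R sketch with stand-in bimodal samples (not the package's internal code):

```r
# Generalized potential from samples: U(x) = -ln P_ss(x)
set.seed(1)
# stand-in samples from a bimodal "steady-state" distribution
samples <- c(rnorm(5000, mean = -1, sd = 0.4), rnorm(5000, mean = 1, sd = 0.4))
dens <- density(samples, n = 512)  # kernel density estimate of P_ss
U <- -log(dens$y)                  # generalized potential
# the barrier between the two modes shows up as a local maximum of U near x = 0
```

The two modes of the density become the two valleys of `U`, and the low-density region between them becomes the barrier.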
See Table \@ref(tab:overview) for an overview of the family of landscape functions. + +| Type of Input | Function | Dimensions | |-------------------|-------------------|----------------------------------| | Single simulation data | `make_2d_static()` | x, **y** | | | `make_3d_static()` | (1) x, y, **z+color**; (2) x, y, **color** | | | `make_4d_static()` | x, y, z, **color** | | Multiple simulation data | `make_2d_matrix()` | x, **y**, *cols*, *(rows)* | | | `make_3d_matrix()` | x, y, **z+color**, *cols*, *(rows)* | | | `make_3d_animation()` | (1) x, y, **z+color**, *fr*; (2) x, y, **color**, *fr*; (3) x, y, **z+color**, *cols* | + +: (#tab:overview) Overview of various landscape functions provided by `simlandr`. Dimensions in bold represent the potential $U$ calculated by the function. Dimensions in italic represent model parameters. Dimensions in parentheses are optional. + +## Step 3: barrier height calculation {#sec:singlel} + +An important property of the states in a landscape is their stability, which can be indicated by the barrier height that the system has to overcome when it transitions from one stable state to another adjacent state (see @CuiEtAlMetaphorComputationConstructing2021 for further discussions about different stability indicators). The barrier height is also related to the escape time for the system to transition from one valley to another, which can be tested empirically [@wangPotentialLandscapeFlux2008]. \CRANpkg{simlandr} provides tools to calculate the barrier heights from landscapes. These functions look for the local minima in given regions and try to find the saddle point between two local minima. The potential differences between the saddle point and the local minima are calculated as barrier heights. + +In 2D cases, there is only one possible path connecting two local minima. The point on the path with the highest $U$ is identified as the saddle point. For 3D landscapes, there are multiple paths between two local minima.
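Before turning to the 3D case, the simpler 2D procedure can be sketched in a few lines of base R: find the two minima, take the highest point on the single connecting path as the saddle, and subtract (schematic only; not the package's implementation):

```r
# Barrier height on a 1D potential curve (schematic double-well example)
x <- seq(-2, 2, length.out = 401)
U <- x^4 / 4 - x^2 / 2                       # minima at x = -1 and x = 1, saddle at x = 0
i_min1 <- which.min(ifelse(x < 0, U, Inf))   # index of the left local minimum
i_min2 <- which.min(ifelse(x > 0, U, Inf))   # index of the right local minimum
path <- i_min1:i_min2                        # the only connecting path in 1D
i_saddle <- path[which.max(U[path])]         # highest point between the minima
barrier <- U[i_saddle] - U[i_min1]           # barrier height seen from the left minimum
```

Here `barrier` is $U(0) - U(-1) = 0.25$, the potential difference the system must overcome to leave the left valley.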
If we treat the system *as if* it were a gradient system with Brownian noise, then the most probable transition path (termed the *minimum energy path*, MEP) first follows the steepest *ascent* path from the starting point, and then follows the steepest *descent* path to the end point [@EVanden-EijndenTransitionpathTheoryPathfinding2010]. We find this path by minimizing the following action using Dijkstra's algorithm [@dijkstra1959note; @HeymannVanden-EijndenPathwaysMaximumLikelihood2008] +\begin{equation} +\varphi_{\mathrm{MEP}} = \arg\min_\varphi \int_A^B |\nabla U||\mathrm{d}\varphi| \left( \approx \arg\min_\varphi \sum_i |\nabla U_i||\Delta \varphi_i|\right), +(\#eq:optim) +\end{equation} +where $A$ and $B$ are the start and end points and $\varphi$ is the path starting at $A$ and ending at $B$. After that, the point with the maximum potential value on the MEP is identified as the saddle point. Note that while the barrier height still indicates the stability of local minima, the MEP may not be the true most probable path for a non-gradient system to transition between stable states. + +# Examples + +We use two dynamical systems to illustrate the usage of the \CRANpkg{simlandr} package. The first one is a two-dimensional stochastic non-gradient gene expression model, which was used by @wangQuantifyingWaddingtonLandscape2011 to represent cell development and differentiation. The second example is a dynamic model of panic disorder [@robinaugh2024], which contains many more variables and parameters, a non-Markovian structure, and non-differentiable formulas. We mainly use the first example to show the agreement of the results from \CRANpkg{simlandr} with previous analytic results, and the second example as a typical use case of a complex dynamic model that is not treatable with other methods (also see @CuiEtAlMetaphorComputationConstructing2021 for more substantive discussions).
Note that both systems include more than two variables, making it impossible to perform the landscape analysis with other available R packages. + +## Example 1: the gene expression model + +This model is built on the mutual regulation of two genes, in which $X_1$ and $X_2$ represent the expression levels of two genes that activate themselves and inhibit each other. A graphical illustration is shown in Figure \@ref(fig:e1) (adapted from @wangQuantifyingWaddingtonLandscape2011). Their dynamic functions can be written as +\begin{align} +\frac{\mathrm{d}X_1}{\mathrm{d}t} &= \frac{a x_1^n}{S^n + x_1^n} + \frac{b S^n}{S^n + x_2^n} - k x_1 + \sigma_1 \frac{\mathrm{d}W_1}{\mathrm{d}t}, +(\#eq:sim2x1) +\\ +\frac{\mathrm{d}X_2}{\mathrm{d}t} &= \frac{a x_2^n}{S^n + x_2^n} + \frac{b S^n}{S^n + x_1^n} - k x_2 + \sigma_2 \frac{\mathrm{d}W_2}{\mathrm{d}t}, +(\#eq:sim2x2) +\\ +\frac{\mathrm{d}a}{\mathrm{d}t} &= -\lambda a + \sigma_3 \frac{\mathrm{d}W_3}{\mathrm{d}t}, +(\#eq:sim2a) +\end{align} +where $a$ represents the strength of self-activation, $b$ represents the strength of mutual inhibition, and $k$ represents the speed of degradation. The development of an organism is modeled as $a$ decreasing at a certain rate $\lambda$. In the beginning, there is only one possible state for the cell. After a certain milestone, the cell differentiates into one of the two possible states. + +```{r e1, fig.show="hold", out.width="50%", fig.cap="A graphical illustration of the relationship between the activation levels of the two genes. Solid arrows represent positive relationships (i.e., activation) and dashed arrows represent negative relationships (i.e., inhibition).", fig.alt='Two large circles labeled X1 and X2. Dashed arrows labeled b connect the circles in both directions.
Each circle has a solid self-loop labeled a and a dashed self-loop labeled k.', echo=FALSE,message=FALSE} +knitr::include_graphics("figures/diagram-e1.png") +``` + +This model can be simulated using the `sim_SDE()` function in \CRANpkg{simlandr}, with the default parameter setting $b = 1, k = 1, S = 0.5, n = 4, \lambda = 0.01$, and $\sigma_1 = \sigma_2 = \sigma_3 = 0.2$. + +```{r} +# Load the package. +library(simlandr) + +# Specify the simulation function. +b <- 1 +k <- 1 +S <- 0.5 +n <- 4 +lambda <- 0.01 + +drift_gene <- c( + rlang::expr(z * x^(!!n) / ((!!S)^(!!n) + x^(!!n)) + (!!b) * (!!S)^(!!n) / ((!!S)^(!!n) + y^(!!n)) - (!!k) * x), + rlang::expr(z * y^(!!n) / ((!!S)^(!!n) + y^(!!n)) + (!!b) * (!!S)^(!!n) / ((!!S)^(!!n) + x^(!!n)) - (!!k) * y), + rlang::expr(-(!!lambda) * z) +) |> as.expression() + +diffusion_gene <- expression( + 0.2, + 0.2, + 0.2 +) +``` + +```{r eval=FALSE} +# Perform a simulation and save the output. +set.seed(1614) +single_output_gene <- sim_SDE(drift = drift_gene, diffusion = diffusion_gene, N = 1e6, M = 10, Dt = 0.1, x0 = c(0, 0, 1), keep_full = FALSE) +``` + +```{r echo=FALSE} +# To save time for building the document, we save the output in a file. The same applies to the following examples. +if (!file.exists("data/single_output_gene.RDS")) { + set.seed(1614) + single_output_gene <- sim_SDE(drift = drift_gene, diffusion = diffusion_gene, N = 1e6, M = 10, Dt = 0.1, x0 = c(0, 0, 1), keep_full = FALSE) + xfun::dir_create("data") + saveRDS(single_output_gene, "data/single_output_gene.RDS") +} else { + single_output_gene <- readRDS("data/single_output_gene.RDS") +} +``` + +After the simulation, we perform some basic data wrangling to produce a dataset that can be used for further analysis. We create a new variable `delta_x` as the difference between X1 (X) and X2 (Y), and we rename the variable Z as `a`. 
+ +```{r} +single_output_gene2 <- do.call(rbind, single_output_gene) +single_output_gene2 <- cbind(single_output_gene2[, "X"] - single_output_gene2[, "Y"], single_output_gene2[, "Z"]) +colnames(single_output_gene2) <- c("delta_x", "a") +``` + +We then perform the convergence check on the simulation result. First, we convert the simulation output to the format for the `coda` package, and thin the output to speed up the convergence check. + +```{r} +single_output_gene_mcmc_thin <- as.mcmc.list(lapply(single_output_gene, function(x) x[seq(1, nrow(x), by = 100), ])) +``` + +We then inspect the convergence diagnosis plot in Figure \@ref(fig:converge-gene). The distributions of the two key variables in different simulation stages are converging, indicating that the simulation is long enough to provide a reliable estimation of the steady-state distribution. Other convergence checks can also be readily performed using the \CRANpkg{coda} package. + +```{r converge-gene, message=FALSE, warning=FALSE, fig.cap="The convergence check result for the simulation of the gene expression model. The variables in different simulation stages did not show distributional differences, indicating that the simulation is long enough to provide a reliable estimation of the steady-state distribution.", fig.alt='Trace and density plots for variables x, y, and z.', out.width="100%"} +plot(single_output_gene_mcmc_thin) +``` + +We generate the 3D landscape for this model with `make_3d_single()`. Here, we use `x` and `y` to specify the variables of interest, and use `lims` to specify the limits of the x and y axes for the landscape. The `lims` argument can be left blank, in which case the limits will be calculated automatically. + +```{r warning=FALSE} +l_single_gene_3d <- + make_3d_single(single_output_gene2, + x = "delta_x", y = "a", + lims = c(-1.5, 1.5, 0, 1.5), + Umax = 8) +``` + +The resulting landscape is shown in the left panel of Figure \@ref(fig:3dstaticgene).
In this plot, the x-axis represents $\Delta x (= x_1-x_2)$, and the y-axis represents $a$. For comparison, the potential landscape obtained analytically by @wangQuantifyingWaddingtonLandscape2011 is shown in the right panel of Figure \@ref(fig:3dstaticgene). The result of \CRANpkg{simlandr} appears to be very close to the result based on the analytical derivation. Note that because different normalization methods were used, the $U$ values of the two landscapes are not directly comparable. Here, we are mainly interested in their relative shape.
+
+```{r eval=FALSE}
+plot(l_single_gene_3d)
+```
+
+```{r echo=FALSE,message=FALSE,warning=FALSE,results='hide'}
+if(!file.exists("figures/3dstatic_gene.png")) {
+  plotly::orca(plot(l_single_gene_3d) |>
+    plotly::layout(scene = list(
+      aspectmode = "manual", aspectratio = list(x = 1.1, y = 1.1, z = 0.66),
+      xaxis = list(range = list(-2, 2)),
+      yaxis = list(range = list(0, 1.5)),
+      camera = list(
+        eye = list(
+          x = 0.3, y = -1.5, z = 1.5
+        )
+      )
+    )), file = "figures/3dstatic_gene.png", height = 650, width = 750)
+}
+```
+
+```{r 3dstaticgene, fig.show="hold", out.width="50%", fig.cap="The 3D landscape (potential value as z-axis) for the gene expression model. The left panel is the plot produced by simlandr; the right panel is the potential landscape obtained analytically by Wang et al. (2011), reproduced with the permission of the authors and in accordance with the journal policy.", fig.alt='Two similar landscape plots, each with three basins.', echo=FALSE,message=FALSE,warning=FALSE}
+knitr::include_graphics("figures/3dstatic_gene.png")
+knitr::include_graphics("figures/wang2011.png")
+```
+
+We then calculate the barrier for the landscape using `calculate_barrier()`. The barrier is calculated by specifying the start and end locations and their radii. The height of the barrier from the two sides can then be obtained with `get_barrier_height()`.
+
+```{r results='markup', fig.show='hide', warning=FALSE}
+b_single_gene_3d <- calculate_barrier(l_single_gene_3d,
+  start_location_value = c(0, 1.2), end_location_value = c(1, 0.2),
+  start_r = 0.3, end_r = 0.3
+)
+
+get_barrier_height(b_single_gene_3d)
+```
+
+The local minima, the saddle point, and the MEP can be added to the landscape with `autolayer()`, as shown in Figure \@ref(fig:bsingle3dgene).
+
+```{r bsingle3dgene, out.width="50%", fig.cap="The landscape for the gene expression model. The local minima are marked as white dots, the saddle points are marked as red dots, and the MEPs are marked as white lines.", fig.alt='A landscape plot with a white line connecting two white dots, passing through a red dot in the middle.', message=FALSE,warning=FALSE}
+plot(l_single_gene_3d, 2) + autolayer(b_single_gene_3d)
+```
+
+Next, we use multiple simulations to investigate the influence of two parameters, $k$ and $b$, on the stability of the system. As explained above, the parameter $b$ represents the strength of the mutual inhibition between the two genes. Therefore, as $b$ increases, we expect the differentiation to be more extreme; that is, the cell is more likely to develop into one of the two cell types with very different gene expression levels. The valleys in the landscape representing the two types will move further apart, and the barrier will become clearer. The parameter $k$ represents the speed of degradation of the gene products. As $k$ increases, the gene products degrade faster, and this effect is more pronounced when the gene products are at high levels. Therefore, the dominant gene will be expressed at a less extreme level, and we expect the two valleys to move closer together and the barrier to become less clear as $k$ increases.
+
+We use the batch simulation functions of \CRANpkg{simlandr}. First, we create the argument set for the batch simulation, which specifies the parameters to be varied.
We examine three $b$ values (0.5, 1, 1.5) and three $k$ values (0.5, 1, 1.5), forming nine combinations.
+
+```{r}
+batch_arg_set_gene <- new_arg_set()
+batch_arg_set_gene <- batch_arg_set_gene |>
+  add_arg_ele(
+    arg_name = "parameter", ele_name = "b",
+    start = 0.5, end = 1.5, by = 0.5
+  ) |>
+  add_arg_ele(
+    arg_name = "parameter", ele_name = "k",
+    start = 0.5, end = 1.5, by = 0.5
+  )
+batch_grid_gene <- make_arg_grid(batch_arg_set_gene)
+```
+
+We then perform the batch simulation with the `batch_simulation()` function. The simulation function, passed via the `sim_fun` argument, is similar to the single simulation shown above and includes the data wrangling procedure. We also use `bigmemory = TRUE` to store the simulation results in the `hash_big_matrix` format, which is more memory-efficient.
+
+```{r eval=FALSE}
+batch_output_gene <- batch_simulation(batch_grid_gene,
+  sim_fun = function(parameter) {
+    b <- parameter[["b"]]
+    k <- parameter[["k"]]
+    drift_gene <- c(
+      rlang::expr(z * x^(!!n) / ((!!S)^(!!n) + x^(!!n)) + (!!b) * (!!S)^(!!n) / ((!!S)^(!!n) + y^(!!n)) - (!!k) * x),
+      rlang::expr(z * y^(!!n) / ((!!S)^(!!n) + y^(!!n)) + (!!b) * (!!S)^(!!n) / ((!!S)^(!!n) + x^(!!n)) - (!!k) * y),
+      rlang::expr(-(!!lambda) * z)
+    ) |> as.expression()
+    set.seed(1614)
+    single_output_gene <- sim_SDE(drift = drift_gene, diffusion = diffusion_gene, N = 1e6, M = 10, Dt = 0.1, x0 = c(0, 0, 1), keep_full = FALSE)
+    single_output_gene2 <- do.call(rbind, single_output_gene)
+    single_output_gene2 <- cbind(single_output_gene2[, "X"] - single_output_gene2[, "Y"], single_output_gene2[, "Z"])
+    colnames(single_output_gene2) <- c("delta_x", "a")
+    single_output_gene2
+  },
+  bigmemory = TRUE
+)
+```
+
+If the output is saved in an RDS file, it can be read back on subsequent uses as follows.
+ +```{r eval=FALSE} +saveRDS(batch_output_gene, "batch_output_gene.RDS") +batch_output_gene <- readRDS("batch_output_gene.RDS") |> attach_all_matrices() +``` + +```{r echo=FALSE} +if (file.exists("data/batch_output_gene.RDS")) { + batch_output_gene <- readRDS("data/batch_output_gene.RDS") |> attach_all_matrices()} else { + batch_output_gene <- batch_simulation(batch_grid_gene, + sim_fun = function(parameter) { + b <- parameter[["b"]] + k <- parameter[["k"]] + drift_gene <- c( + rlang::expr(z * x^(!!n) / ((!!S)^(!!n) + x^(!!n)) + (!!b) * (!!S)^(!!n) / ((!!S)^(!!n) + y^(!!n)) - (!!k) * x), + rlang::expr(z * y^(!!n) / ((!!S)^(!!n) + y^(!!n)) + (!!b) * (!!S)^(!!n) / ((!!S)^(!!n) + x^(!!n)) - (!!k) * y), + rlang::expr(-(!!lambda) * z) + ) |> as.expression() + set.seed(1614) + single_output_gene <- sim_SDE(drift = drift_gene, diffusion = diffusion_gene, N = 1e6, M = 10, Dt = 0.1, x0 = c(0, 0, 1), keep_full = FALSE) + single_output_gene2 <- do.call(rbind, single_output_gene) + single_output_gene2 <- cbind(single_output_gene2[, "X"] - single_output_gene2[, "Y"], single_output_gene2[, "Z"]) + colnames(single_output_gene2) <- c("delta_x", "a") + single_output_gene2 + }, + bigmemory = TRUE + ) + saveRDS(batch_output_gene, "data/batch_output_gene.RDS") + } +``` + +We then make the 3D matrix for the batch output, using `make_3d_matrix()`. + +```{r warning=FALSE, message=FALSE} +l_batch_gene_3d <- make_3d_matrix(batch_output_gene, + x = "delta_x", y = "a", cols = "b", rows = "k", + lims = c(-5, 5, -0.5, 2), h = 0.005, + Umax = 8, + kde_fun = "ks", individual_landscape = TRUE +) +``` + +For the barrier calculation step, the start and end points of the barrier may be different for each landscape plot. The following code shows how to create a barrier grid for each landscape plot. First, we create a barrier grid template using the function `make_barrier_grid_3d()`. Next, we modify the barrier grid template to create a barrier grid for the landscape plot. 
+
+```{r results='markup', fig.show='hide', message=FALSE,warning=FALSE}
+make_barrier_grid_3d(batch_grid_gene,
+  start_location_value = c(0, 1.5), end_location_value = c(1, -0.5),
+  start_r = 1, end_r = 1, print_template = TRUE
+)
+
+bg_gene <- make_barrier_grid_3d(batch_grid_gene, df = structure(list(start_location_value = list(c(0, 1.5), c(0, 1.5), c(0, 1.5), c(0, 1.5), c(0, 1.5), c(0, 1.5), c(0, 1.5), c(0, 1.5), c(0, 1.5)), start_r = list(c(0.2, 1), c(0.2, 1), c(0.2, 1), c(0.2, 0.5), c(0.2, 0.5), c(0.2, 0.5), c(0.2, 0.3), c(0.2, 0.3), c(0.2, 0.3)), end_location_value = list(
+  c(2, 0), c(2, 0), c(2, 0), c(1, 0), c(1, 0), c(1, 0), c(1, 0), c(1, 0), c(1, 0)
+), end_r = list(
+  c(1, 1), c(1, 1), c(1, 1), c(1, 1), c(1, 1), c(1, 1), c(1, 1), c(1, 1), c(1, 1)
+)), row.names = c(NA, -9L), class = c(
+  "arg_grid",
+  "data.frame"
+)))
+```
+
+With the barrier grid in place, we can calculate the barrier for each landscape plot.
+
+```{r}
+b_batch_gene_3d <- calculate_barrier(l_batch_gene_3d,
+  bg = bg_gene
+)
+```
+
+If a customized barrier grid is not needed, the following code can be used to calculate the barriers with common settings for all plots.
+
+```{r eval=FALSE}
+b_batch_gene_3d <- calculate_barrier(l_batch_gene_3d, start_location_value = c(0, 1.5), end_location_value = c(1, 0), start_r = 1, end_r = 1)
+```
+
+The resulting landscapes and the MEPs between states are shown in Figure \@ref(fig:bbatch3dgene).
+
+```{r bbatch3dgene, out.width="100%", fig.cap="The landscapes for the gene expression model for different \\( b \\) and \\( k \\) values. The local minima are marked as white dots, the saddle points are marked as red dots, and the MEPs are marked as white lines.",fig.alt='Nine landscape plots arranged in a 3×3 grid. The x-axis is labeled delta x, the y-axis is labeled a.
Rows correspond to k values (0.5, 1, 1.5), and columns to b values (0.5, 1, 1.5).', message=FALSE,warning=FALSE}
+plot(l_batch_gene_3d) + autolayer(b_batch_gene_3d)
+```
+
+From the landscapes, it is clear that increasing $b$, which represents stronger mutual inhibition between the genes, moves the two differentiated states further apart. Increasing $k$, which represents faster degradation, makes the undifferentiated state disappear earlier, and thus makes differentiation occur earlier. When $b$ is low enough and $k$ is high enough, the two differentiated states merge and form a single, more stable state at $\Delta x = 0$; in this case, there is no actual differentiation in the system, only a one-to-one conversion of cell types. Only when $b$ is high enough and $k$ is low enough is it possible for the cell to differentiate into two types.
+
+## Example 2: panic disorder model
+
+The second example we use is the panic disorder model proposed by @robinaugh2024. The model is implemented in the \pkg{PanicModel} package (). It contains 12 variables and 33 parameters, and it involves history dependency and non-differentiable formulas (such as if-else conditions) to model the complex interplay of individual and environmental elements on different time scales. The most important variables of the model are the physical arousal ($A$) of a person (e.g., heartbeat, muscle tension, sweating), the person's perceived threat ($PT$; how dangerous the person cognitively evaluates the environment to be), and the person's tendency to escape from the situation ($E$). The core theoretical idea of the model is that a person's physical arousal and perceived threat may strengthen each other in certain circumstances, leading to sudden increases in both variables that manifest as panic attacks. The tendency of a person to use physical arousal as cognitive evidence of threat is represented by another variable, the arousal schema ($S$).
A comprehensive introduction to the model is beyond the scope of the current article; we refer interested readers to @robinaugh2024. Here, to simplify the context, we assume that $S$ does not change over time and that no psychotherapy is being administered. We focus on the influence of $S$ on the system's stability as represented by $A$ and $PT$. A graphical illustration of several core variables of this model is shown in Figure \@ref(fig:e2) (adapted from @CuiEtAlMetaphorComputationConstructing2021).
+
+```{r e2, fig.show="hold", out.width="100%", fig.cap="A graphical illustration of the relationships between several important psychological variables in the panic disorder model. Solid arrows represent positive relationships and dashed arrows represent negative relationships.", fig.alt='Four large circles labeled H, A, PT, and E from left to right. Solid arrows go from A to H, A to PT, PT to A, and PT to E. Dashed arrows go from H to A and from E to PT. The solid arrow from A to PT is labeled AS.', echo=FALSE,message=FALSE}
+knitr::include_graphics("figures/diagram-e2.png")
+```
+
+To construct the potential landscapes for this model, we first create a function that performs a simulation using the `simPanic()` function from \pkg{PanicModel}. This is needed because we modify some default options for illustration.
+
+```{r}
+library(PanicModel)
+
+sim_fun_panic <- function(x0, par) {
+
+  # Change several default parameters
+  pars <- pars_default
+  # Increase the noise strength to improve sampling efficiency
+  pars$N$lambda_N <- 200
+  # Make S constant throughout the simulation
+  pars$TS$r_S_a <- 0
+  pars$TS$r_S_e <- 0
+
+  # Specify the initial values of A and PT in the format required by `multi_init_simulation()`; the other variables use the default initial values.
+  initial <- initial_default
+  initial$A <- x0[1]
+  initial$PT <- x0[2]
+
+  # Specify the value of S in the format required by `batch_simulation()`.
+ initial$S <- par$S + + # Extract the simulation output from the result by simPanic(). Only keep the core variables. + return( + as.matrix( + simPanic(1:5000, initial = initial, parameters = pars)$outmat[, c("A", "PT", "E")] + ) + ) +} +``` + +We then perform a single simulation from multiple starting points. To speed up the simulation, we use parallel computing. + +```{r eval=FALSE} +future::plan("multisession") +set.seed(1614, kind = "L'Ecuyer-CMRG") +single_output_panic <- multi_init_simulation( + sim_fun = sim_fun_panic, + range_x0 = c(0, 1, 0, 1), + R = 4, + par = list(S = 0.5) +) +``` + +```{r echo=FALSE} +if (file.exists("data/single_output_panic.RDS")) { + single_output_panic <- readRDS("data/single_output_panic.RDS") +} else { + future::plan("multisession") + set.seed(1614, kind = "L'Ecuyer-CMRG") + single_output_panic <- multi_init_simulation( + sim_fun = sim_fun_panic, + range_x0 = c(0, 1, 0, 1), + R = 4, + par = list(S = 0.5) + ) + saveRDS(single_output_panic, "data/single_output_panic.RDS") +} +``` + +The convergence check results of the simulation, shown in Figure \@ref(fig:converge-panic), indicate that the time series of the first 100 data points are strongly influenced by the choice of initial value. Therefore, we remove the first 100 data points in the following analysis. + +```{r converge-panic, message=FALSE, warning=FALSE, fig.cap="The convergence check result for the simulation of the panic disorder model.", fig.alt='Trace and density plots for variables x, y, and z.', out.width="100%"} +plot(single_output_panic) +``` + +We then create the 3D landscape for the panic disorder model, shown in Figure \@ref(fig:3dstaticpanic). The landscape shows that the system has two stable states, which are represented by the valleys in the landscape. The system can be trapped in these valleys, leading to different levels of physical arousal and perceived threat. 
The valley at higher levels of physical arousal and perceived threat corresponds to panic attacks, whereas the valley at lower levels of physical arousal and perceived threat corresponds to a healthy state.
+
+```{r 3dstaticpanic, fig.show="hold", out.width="50%", fig.cap="The 3D landscape (potential value as color) for the panic disorder model.", fig.alt='A landscape plot with two basins.', echo=FALSE,message=FALSE,warning=FALSE}
+l_single_panic_3d <- make_3d_single(
+  single_output_panic |> window(start = 100),
+  x = "A", y = "PT", h = 0.005, lims = c(-1, 1.5, -0.5, 1.5))
+plot(l_single_panic_3d, 2)
+```
+
+We now investigate the effect of the parameter $S$ on the potential landscape. This parameter represents the tendency of a person to interpret physical arousal as a sign of danger. We therefore expect that a higher $S$ will stabilize the panic state and destabilize the healthy state.
+
+We perform a batch simulation with varying $S$ values to construct the corresponding potential landscapes. This, again, starts with the creation of a grid of parameter values.
+
+```{r}
+batch_arg_grid_panic <- new_arg_set() |>
+  add_arg_ele(arg_name = "par", ele_name = "S", start = 0, end = 1, by = 0.5) |>
+  make_arg_grid()
+```
+
+We then perform the batch simulation using parallel computing.
+
+```{r eval=FALSE}
+future::plan("multisession")
+set.seed(1614, kind = "L'Ecuyer-CMRG")
+batch_output_panic <- batch_simulation(
+  batch_arg_grid_panic,
+  sim_fun = function(par) {
+    multi_init_simulation(
+      sim_fun_panic,
+      range_x0 = c(0, 1, 0, 1),
+      R = 4,
+      par = par
+    ) |> window(start = 100)
+  }
+)
+```
+
+```{r echo=FALSE,message=FALSE}
+if (file.exists("data/batch_output_panic.RDS")) {
+  batch_output_panic <- readRDS("data/batch_output_panic.RDS")
+} else {
+  future::plan("multisession")
+  set.seed(1614, kind = "L'Ecuyer-CMRG")
+  batch_output_panic <- batch_simulation(
+    batch_arg_grid_panic,
+    sim_fun = function(par) {
+      multi_init_simulation(
+        sim_fun_panic,
+        range_x0 = c(0, 1, 0, 1),
+        R = 4,
+        par = par
+      ) |> window(start = 100)
+    }
+  )
+  saveRDS(batch_output_panic, "data/batch_output_panic.RDS")
+}
+```
+
+The 3D landscapes for different $S$ values are shown in Figure \@ref(fig:lbatch3dpanic). The landscapes show that the system has only one stable state when $S$ is low, but two stable states when $S$ is high. The stability of the panic state also increases as $S$ increases. This indicates that a higher $S$ value corresponds to a higher risk of panic attacks.
+
+```{r lbatch3dpanic, out.width="100%", fig.cap="The landscapes for the panic disorder model for different \\( S \\) values.", fig.alt='Three landscape plots arranged in a row. The x-axis is labeled A, the y-axis is labeled PT. Columns correspond to S values of 0, 0.5, and 1.', message=FALSE,warning=FALSE}
+l_batch_panic_3d <- make_3d_matrix(batch_output_panic, x = "A", y = "PT", cols = "S", h = 0.005, lims = c(-1, 1.5, -0.5, 1.5))
+plot(l_batch_panic_3d)
+```
+
+# Discussion
+
+Potential landscapes can show the stability of the states of a dynamical system in an intuitive and quantitative way. They are especially informative for multistable systems.
In this article, we illustrated how to construct potential landscapes using \CRANpkg{simlandr}. The potential landscapes generated by \CRANpkg{simlandr} are based on the steady-state distribution of the system, which is in turn estimated using Monte Carlo simulation. Compared to analytic methods, Monte Carlo estimation is more flexible and thus applicable to more complex models. This flexibility comes at the cost of higher time and storage demands, which are necessary to make the estimation sufficiently precise. The `hash_big_matrix` class partly alleviates this problem by offloading storage from memory to disk. It is also important that the simulation function itself is efficient enough. The functions `sim_SDE()` and `multi_init_simulation()` make use of the efficient simulations provided by \CRANpkg{Sim.DiffProc} [@GuidoumBoukhetalaPerformingParallelMonte2020] and parallel computing with the \CRANpkg{future} framework [@RJ-2021-048]. For customized simulation functions, there are also multiple approaches that can be used to improve performance, for which we refer interested readers to @WickhamImprovingPerformance2019a. In Supplementary Materials A, we provide a benchmark of the typical time and memory usage of the procedures in \CRANpkg{simlandr}, which shows that time and memory usage are acceptable in most cases on a personal computer. When transitions between attractors are rare, the `multi_init_simulation()` function may help to speed up convergence, and more advanced sampling methods such as importance sampling or rare event sampling may be needed in more complex situations. The detailed ways to implement such methods are highly dependent on the specific model and are beyond the scope of this package. We direct interested readers to @RubinoTuffinRareEventSimulation2009 and @KloekvanDijkBayesianEstimatesEquation1978 for more on rare event simulation and importance sampling methods.
Nevertheless, the landscape construction functions in \CRANpkg{simlandr} allow users to provide weights for the simulation results, which can be used to adjust the sampling distribution.
+
+In addition, the length of the simulation and the choice of noise strength may also have an important influence on the results. If the simulation is too short, the density estimation will be inaccurate, resulting in rugged landscapes. If it is too long, the simulation will require more computational resources, which is not always realistic. If the noise is too weak, the system may not converge in a reasonable time, resulting in problems in convergence checks, overly noisy landscapes, or a failure to show valleys that are theoretically present. If the noise is too strong, the simulation may be unstable and the boundaries between valleys may become blurry. In Supplementary Materials B, we show the influence of simulation length and noise strength on the landscape output. With some theoretical expectations about the system's behavior, it is usually not difficult to spot that a simulation is too short or that the noise level is unsuitable. In that case, some adjustments are required before the landscape can be well constructed.
+
+All landscape construction and barrier calculation functions in \CRANpkg{simlandr} produce both visual aids and numerical data that can be used for further processing. The HTML plots based on \CRANpkg{plotly} are more suitable for interactive illustrations, but it is also possible to export them to static plots using `plotly::orca()`. The \CRANpkg{ggplot2} plots are readily usable for print.
+
+We also want to note some limitations of the potential landscapes generated by \CRANpkg{simlandr}. First, the generalized potential landscape is not a complete description of all dynamics in a system. It emphasizes the stability of different states by filtering out other dynamical information.
Some behaviors are not possible in gradient systems (e.g., oscillations and loops) and thus cannot be shown in a potential landscape [@zhouConstructionLandscapeMultistable2016]. Second, since the steady-state distribution is estimated using a kernel smoothing method, which depends on stochastic simulations, the resulting potential function may not be highly accurate. Its accuracy is further affected by the choice of kernel bandwidth and noise strength. This issue is particularly pronounced at valley edges, where fewer samples are available for estimation. Similar limitations apply to MEP calculations, as they are derived from the generalized landscape rather than the original dynamics. Therefore, we do not recommend directly interpreting the potential function or barrier height results for applications requiring high precision. Instead, the potential landscape is best used as a semi-quantitative tool to gain insights into the system's overall behavior, guide further analysis, and compare system behavior under different parameter settings, provided the same simulation and kernel estimation conditions are used. The examples in this article illustrated some typical use cases we recommend.
+
+# Availability and future directions
+
+This package is publicly available from the Comprehensive R Archive Network (CRAN) at \url{https://CRAN.R-project.org/package=simlandr}, under the GPL-3 license. The results in the current article were generated with \CRANpkg{simlandr} version 0.4.0. The R script to replicate all the results in this article can be found in the supporting information.
+
+The barrier height data calculated by \CRANpkg{simlandr} can also be further analyzed and visualized. For example, it is sometimes helpful to look into how the barrier height changes with varying parameters (e.g., @CuiEtAlMetaphorComputationConstructing2021). We encourage users to explore other ways of analyzing and visualizing the various results provided by \CRANpkg{simlandr}.
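+
+As one illustration of such downstream analysis, the following sketch plots barrier height against $b$ for each $k$ in the gene expression example. This is only a sketch under assumptions: it treats the batch barrier object as a list whose elements can be passed to `get_barrier_height()`, and it assumes the parameter grid exposes the values as columns named `b` and `k`; both assumptions may need adapting to the actual object structures.
+
+```{r eval=FALSE}
+library(ggplot2)
+
+# Hypothetical extraction: one barrier height per parameter combination.
+barrier_df <- data.frame(
+  b = batch_grid_gene$b, # assumed column name
+  k = batch_grid_gene$k, # assumed column name
+  height = vapply(
+    b_batch_gene_3d, # assumed to be a list of barrier objects
+    function(bar) get_barrier_height(bar)[1],
+    numeric(1)
+  )
+)
+
+# Barrier height as a function of b, with one line per k value.
+ggplot(barrier_df, aes(x = b, y = height, color = factor(k))) +
+  geom_line() +
+  geom_point() +
+  labs(y = "Barrier height", color = "k")
+```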
+
+The method we chose for \CRANpkg{simlandr} is not the only possible one. The generalized landscape by @wangPotentialLandscapeFlux2008, which we implemented, is more flexible and emphasizes the probability that the system is in a specific state, while other methods may have other strengths (e.g., the method by @rodriguez-sanchezPabRodRolldownPostpublication2020 emphasizes the gradient part of the vector field, and the method by @MooreEtAlQPotPackageStochastic2016 emphasizes the probability of transition processes under small noise). We look forward to future theoretical and methodological developments in this direction.
+
+# Acknowledgments
+
+TL was supported by the NSFC under Grant No. 11825102 and the Beijing Academy of Artificial Intelligence (BAAI). ALA and MO were supported by an NWO VIDI grant, Grant No. VI.Vidi.191.178.
diff --git a/_articles/RJ-2025-039/RJ-2025-039.html b/_articles/RJ-2025-039/RJ-2025-039.html
new file mode 100644
index 0000000000..0825e00ad4
--- /dev/null
+++ b/_articles/RJ-2025-039/RJ-2025-039.html
@@ -0,0 +1,2524 @@
+
+    simlandr: Simulation-Based Landscape Construction for Dynamical Systems
+
    simlandr: Simulation-Based Landscape Construction for Dynamical Systems

    We present the simlandr package for R, which provides a set of tools for constructing potential landscapes for dynamical systems using Monte Carlo simulation. Potential landscapes can be used to quantify the stability of system states. While the canonical form of a potential function is defined for gradient systems, generalized potential functions can also be defined for non-gradient dynamical systems. Our method is based on the potential landscape definition from the steady-state distribution, and can be used for a large variety of models. To facilitate simulation and computation, we introduce several novel features, including data structures optimized for batch simulations under varying conditions, an out-of-memory computation tool with integrated hash-based file-saving systems, and an algorithm for efficiently searching the minimum energy path. Using a multistable cell differentiation model as an example, we illustrate how simlandr can be used for model simulation, landscape construction, and barrier height calculation. The simlandr package is available at https://CRAN.R-project.org/package=simlandr, under GPL-3 license.

    1 Introduction

To better understand a dynamical system, it is often important to know the stability of its different states. The metaphor of a potential landscape consisting of hills and valleys has been used to illustrate differences in stability in many fields, including genetics (Waddington 1966; Wang et al. 2011), ecology (Lamothe et al. 2019), and psychology (Olthof et al. 2023). In such a landscape, the stable states of the system correspond to the lowest points (minima) in the valleys of the landscape. Just as a ball thrown onto such a landscape will eventually gravitate towards a minimum, the dynamical system is conceptually more likely to visit its stable states, in which it is also more resilient to noise. For example, in the landscape metaphor of psychopathology (Figure 1), the valleys represent different mental health states, their relative depth represents the relative stability of the states, and the barriers between valleys represent the difficulty of transitioning between these states (Hayes and Andrews 2020). When the healthy state is more stable, the person is more likely to stay mentally healthy, whereas when the maladaptive state is more stable, the person is more likely to suffer from mental disorders.

    +Three landscape plots, each with a ball resting in one of two basins. The left basin is labeled maladaptive, the right basin healthy. In the first plot, the maladaptive basin is deeper; in the second, both basins are equally deep; in the third, the healthy basin is deeper. +

    +Figure 1: Illustration of the ball-and-landscape metaphor commonly used in the field of psychopathology. +

Yet formally quantifying the stability of states is a nontrivial task. Here we present an R package, simlandr, that can quantify the stability of various kinds of systems without many mathematical restrictions.

Dynamical systems are usually modeled by stochastic differential equations, which may depend on the past history (i.e., may be non-Markovian; Stumpf et al. 2021). They take the general form of
+\[\begin{equation}
+\mathrm{d} \boldsymbol{X}_t = \boldsymbol{b}(\boldsymbol{X}_t, \boldsymbol{H}_t)\mathrm{d}t + \boldsymbol{\sigma}(\boldsymbol{X}_t, \boldsymbol{H}_t)\mathrm{d}\boldsymbol{W},
+\tag{1}
+\end{equation}\]
+where \(\boldsymbol{X}_t\) is the random variable representing the current state of the system and \(\boldsymbol{H}_t\) represents the past history of the system, \(\boldsymbol{H}_t=\{\boldsymbol{X}_s \mid s \in [0, t)\}\). The first term on the right-hand side of Eq. (1), \(\boldsymbol{b}(\boldsymbol{X}_t, \boldsymbol{H}_t)\), represents the deterministic part of the dynamics, which is a function of the system's current state and history. The second term represents the stochastic part: standard white noise \(\mathrm{d}\boldsymbol{W}\) multiplied by the noise strength \(\boldsymbol{\sigma}(\boldsymbol{X}_t, \boldsymbol{H}_t)\).
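+
+To make the notation concrete, the following minimal sketch (ours, not part of simlandr) simulates a one-dimensional, Markovian special case of Eq. (1) with the Euler–Maruyama scheme, using a hypothetical double-well drift \(b(x) = x - x^3\) and a constant noise strength:
+
+```r
+# Euler–Maruyama simulation of dX = (X - X^3) dt + sigma dW,
+# a one-dimensional, Markovian special case of Eq. (1).
+set.seed(1)
+N <- 1e5; dt <- 0.01; sigma <- 0.5
+x <- numeric(N)
+for (t in 2:N) {
+  x[t] <- x[t - 1] + (x[t - 1] - x[t - 1]^3) * dt + sigma * sqrt(dt) * rnorm(1)
+}
+```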

If the dynamical equation (Eq. (1)) can be written in the following form
+\[\begin{equation}
+\mathrm{d} \boldsymbol{X} = - \nabla U \mathrm{d}t + \sqrt{2}\mathrm{d}\boldsymbol{W},
+\tag{2}
+\end{equation}\]
+then \(U\) is the potential function of the system. However, this is not possible for general dynamical systems. The trajectory of such a system may contain loops, which cannot be represented by a gradient system (an issue compared to Escher's stairs by Rodríguez-Sánchez et al. (2020)). In this case, further generalization is needed. The theoretical background of simlandr is the generalized potential landscape by Wang et al. (2008), which is based on the Boltzmann distribution and the steady-state distribution of the system. The Boltzmann distribution is a distribution law in physics which states that the probability of a classical particle occupying a state decreases exponentially with the energy of that state:
+\[\begin{equation}
+P(\boldsymbol{x}) \propto \exp (-U).
+\tag{3}
+\end{equation}\]

This is then linked to dynamical systems through the steady-state distribution. The steady-state distribution of a set of stochastic differential equations is the distribution that does not change over time, denoted by \(P_{\mathrm{SS}}\), which satisfies
+\[\begin{equation}
+\frac{\partial P_{\mathrm{SS}} (\boldsymbol{x},t)}{\partial t} = 0.
+\tag{4}
+\end{equation}\]
+The steady-state distribution is important because it extracts time-invariant information from a set of stochastic differential equations. Substituting the steady-state distribution into Eq. (3) gives Wang's generalized potential landscape function (Wang et al. 2008)
+\[\begin{equation}
+U(\boldsymbol{x}) = - \ln P_{\mathrm{SS}}(\boldsymbol{x}).
+\tag{5}
+\end{equation}\]
+If the system is ergodic (i.e., given sufficient time, it can visit all possible states in the state space), the long-term sample distribution can be used to estimate the steady-state distribution, and the generalized potential function can be calculated.
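+
+As a minimal illustration of Eq. (5) (ours, not part of the package), one can apply a kernel density estimate to a long, converged trajectory and take the negative logarithm of the estimated density. In the sketch below, `x` is a placeholder vector of samples; in practice it would be one variable of a long stochastic simulation of the system.
+
+```r
+# Sketch: estimate a generalized potential from simulated samples via Eq. (5).
+x <- rnorm(1e5)              # placeholder; use a long, converged trajectory
+d <- density(x)              # kernel estimate of the steady-state density P_SS
+U <- -log(pmax(d$y, 1e-12))  # U(x) = -ln P_SS(x), guarding against log(0)
+plot(d$x, U, type = "l", xlab = "x", ylab = "U")
+```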


Our approach is not the only possible way to construct potential landscapes. Many other theoretical approaches are available, including the SDE decomposition method by Ao (2004) and the quasi-potential by Freidlin and Wentzell (2012). Various strategies to numerically compute these landscapes have been proposed (see Zhou and Li (2016) for a review). However, available implementations are still scarce. To our knowledge, besides simlandr, there are two existing packages specifically for computing potential landscapes: the waydown package (Rodríguez-Sánchez 2020) and the QPot package (Moore et al. 2016; Dahiya and Cameron 2018). The waydown package uses a skew-symmetric decomposition of the Jacobian, which theoretically produces landscapes similar to those of Wang et al. (2008) (but see Cui et al. (2023a) for a potential technical issue with this package). The QPot package uses a path-integral method that produces quasi-potential landscapes following the definition by Freidlin and Wentzell (2012). Because of the analytical methods they use, waydown and QPot both require the dynamic function to be Markovian and differentiable in the whole state space, and both are limited to systems of up to two dimensions. simlandr, in contrast, is based on Monte Carlo simulation and the steady-state distribution and places no such requirements on the model. Even for models that are not globally differentiable, are history-dependent, or are defined in a high-dimensional space, simlandr is still applicable (e.g., Cui et al. (2023b)). Therefore, simlandr can be applied to a much wider range of dynamical systems, give a big-picture view of the dominant attractors, and show how the stability of different attractors is influenced by model parameters.
As a trade-off, simlandr is not designed for rare-event sampling in which the noise strength \(\boldsymbol{\sigma}(\boldsymbol{X})\) is extremely small, nor for the precise calculation of tail probabilities and transition paths. Instead, it is better viewed as a semi-quantitative tool that provides a broad overview of the key attractors in a dynamic system, allowing comparisons of their relative stability and investigation of the influence of system parameters. We show some typical use cases of simlandr in later sections. Key terms used in this article are summarized in Table 1.

Table 1: Summary of key terms used in this article.

Potential landscape metaphor: A conceptual metaphor representing the stability of a complex dynamic system as an uneven landscape, with a ball on it representing the system's state. This can be quantitatively realized in various ways (Zhou and Li 2016).
Gradient system: A system whose deterministic motion can be described solely by the gradient of a potential function.
Non-Markovian system: A system whose future evolution depends not only on its current state but also on its past history.
Steady-state distribution: The probability distribution of a dynamic system that remains unchanged over time.
Ergodic system: A dynamic system that, given enough time, will eventually pass through all possible states.
Minimum energy path (MEP): A transition path linking two local minima and passing through a saddle point (Wan et al. 2024). It is always parallel to the gradient of the energy landscape, representing an efficient transition route.

    2 Design and implementation


    The general workflow of simlandr involves three steps: model simulation, landscape construction, and barrier height computation. See Figure 2 for a summary.

[Figure: A flow chart showing the analysis steps in simlandr, with functions listed under each step.]

Figure 2: The structure and workflow of simlandr.

    2.1 Step 1: model simulation


For the first step, a simulation function should be created by the user. This function should simulate how the dynamical system of interest evolves over time and record its state at every time step. This can often be done with the Euler-Maruyama method. If the SDEs are at most three-dimensional and Markovian, a helper function from simlandr, sim_SDE(), can be used instead. This function is based on the simulation utilities from the Sim.DiffProc package (Guidoum and Boukhetala 2020), and its output can be directly used in later steps. Moreover, the multi_init_simulation() function can be used to simulate trajectories from various starting points, reducing the chance that the system gets trapped in a local minimum. The multi_init_simulation() function also supports parallel simulation based on the future framework (Bengtsson 2021) to improve time efficiency.


For Monte Carlo methods, it is important that the simulation converges, meaning the distribution of the system is roughly stable. According to Eq (5), the precision of the steady-state distribution estimation determines the precision of the potential landscape estimation. simlandr provides a visual tool to compare the sample distributions at different stages of the simulation (check_conv()), whereas the coda package (Plummer et al. 2006) can be used for more advanced diagnostics. The output of the sim_SDE() and multi_init_simulation() functions also uses the classes from the coda package to enable easy convergence diagnosis. To achieve ergodicity in reasonable time, stronger noise sometimes needs to be added to the system.
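As an illustration of this kind of diagnostic, the following sketch (our own toy example, not part of simlandr's workflow) applies the Gelman-Rubin diagnostic from coda to two independently simulated chains; a potential scale reduction factor close to 1 suggests convergence:

```r
library(coda)

# Two toy "chains": autoregressive processes started far apart,
# standing in for two simulation runs from different initial values.
set.seed(1)
sim_chain <- function(x0, n = 5000) {
  x <- numeric(n)
  x[1] <- x0
  for (i in 2:n) x[i] <- 0.9 * x[i - 1] + rnorm(1)
  x
}
chains <- mcmc.list(mcmc(sim_chain(-10)), mcmc(sim_chain(10)))

# Gelman-Rubin diagnostic: values near 1 indicate the chains have
# converged to the same distribution.
gelman.diag(chains)
```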


A simulation function is sufficient if the user is only interested in a single model setting. If the model is parameterized and the user wants to investigate the influence of the parameters on the stability of the system, multiple simulations need to be run with different parameter settings. simlandr provides functions to perform such batch simulations and to store the outputs for landscape construction as a single object. This object can later be used to compare stability under different parameter settings or to produce animations showing how a model parameter influences the stability of the model.


In many cases, the output of the simulation is too large to be stored in memory. simlandr provides a hash_big_matrix class, a modification of the big.matrix class from the bigmemory package (Kane et al. 2013), which performs out-of-memory computation and organizes the data files on disk. In an out-of-memory computation, the bulk of the data is not loaded into memory; only the small subset needed for the current computation step is. The memory occupation is therefore dramatically reduced. The big.matrix class in the bigmemory package provides a powerful tool for out-of-memory computation. It, however, requires an explicit file name for each matrix, which can be cumbersome when many matrices have to be handled, as is likely the case in a batch simulation. The hash_big_matrix class automatically generates the file names from the MD5 values of the matrices using the digest package (Eddelbuettel et al. 2021) and stores them within the object, so the file links can also be restored automatically.
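The naming scheme can be illustrated with the digest package alone (a simplified sketch of the idea, not simlandr's actual implementation):

```r
library(digest)

# Derive a stable, content-based file name for a matrix from its MD5
# hash, so the user does not have to supply an explicit name.
m <- matrix(rnorm(20), nrow = 5)
backing_file <- paste0(digest(m, algo = "md5"), ".bk")

# The same matrix always maps to the same file name, so the link
# between an object and its on-disk backing file can be restored.
stopifnot(backing_file == paste0(digest(m, algo = "md5"), ".bk"))
```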


    2.2 Step 2: landscape construction


simlandr provides a set of tools to construct 2D, 3D, and 4D landscapes from single or multiple simulation results. The steady-state distribution of the selected variables of the system is first estimated using kernel density estimation (KDE). The density() function in base R is used for 2D landscapes, whereas the ks package (Chacón and Duong 2018) is used by default for 3D and 4D landscapes because of its higher efficiency. The potential function \(U\) is then calculated from Eq (5). Landscape plots without a z-axis are created with ggplot2 (Wickham 2016), and those with a z-axis with plotly (Sievert 2020). These plots can be further refined using standard ggplot2 or plotly methods. See Table 2 for an overview of the family of landscape functions.

Table 2: Overview of the landscape functions provided by simlandr, grouped by type of input. Dimensions in parentheses are optional.

Single simulation data:
  make_2d_static(): x, y
  make_3d_static(): (1) x, y, z+color; (2) x, y, color
  make_4d_static(): x, y, z, color
Multiple simulation data:
  make_2d_matrix(): x, y, cols, (rows)
  make_3d_matrix(): x, y, z+color, cols, (rows)
  make_3d_animation(): (1) x, y, z+color, fr; (2) x, y, color, fr; (3) x, y, z+color, cols
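The core computation behind these functions can be sketched with the ks package directly (our own minimal illustration, independent of simlandr's internals):

```r
library(ks)

# Two correlated toy variables standing in for a simulated trajectory.
set.seed(1)
x <- rnorm(10000)
y <- 0.5 * x + rnorm(10000, sd = 0.8)

# Bivariate kernel density estimate on a grid, then U = -ln(P_SS)
# following Eq (5); very low densities give very high potentials.
fit <- kde(cbind(x, y))
U <- -log(fit$estimate)
# U is a matrix over the evaluation grid (fit$eval.points) and could
# be drawn as a surface or filled-contour landscape.
```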

    2.3 Step 3: barrier height calculation


An important property of the states in a landscape is their stability, which can be indicated by the barrier height that the system has to overcome when it transitions from one stable state to an adjacent one (see Cui et al. (2023b) for further discussion of different stability indicators). The barrier height is also related to the escape time of the transition from one valley to another, which can be tested empirically (Wang et al. 2008). simlandr provides tools to calculate barrier heights from landscapes. These functions look for the local minima in given regions and try to find the saddle point between two local minima. The potential differences between the saddle point and the local minima are the barrier heights.


In 2D cases (one state variable plus the potential), there is only one possible path connecting two local minima, and the point on this path with the highest \(U\) is identified as the saddle point. For 3D landscapes, there are multiple paths between two local minima. If we treat the system as if it were a gradient system with Brownian noise, then the most probable transition path (termed the minimum energy path, MEP) first follows the steepest ascent path from the starting point and then the steepest descent path to the end point (E and Vanden-Eijnden 2010). We find this path by minimizing the following action using the Dijkstra (1959) algorithm (Heymann and Vanden-Eijnden 2008)
\[\begin{equation}
\varphi_{\mathrm{MEP}} = \arg\min_\varphi \int_A^B |\nabla U||\mathrm{d}\varphi| \left(\approx \arg\min_\varphi \sum_i |\nabla U_i||\Delta \varphi_i|\right),
\tag{6}
\end{equation}\]
where \(A\) and \(B\) are the starting and end points and \(\varphi\) is a path starting at \(A\) and ending at \(B\). The point with the maximum potential value on the MEP is then identified as the saddle point. Note that while the barrier height still indicates the stability of the local minima, the MEP may not be the true most probable transition path for a non-gradient system.
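The discretized action in Eq (6) can be illustrated on a gridded toy potential; a full MEP search would minimize this quantity over candidate paths (e.g., with Dijkstra's algorithm). This is an illustrative sketch of ours, not simlandr's implementation:

```r
# Gridded double-well potential with minima near (-1, 0) and (1, 0).
xs <- seq(-2, 2, length.out = 81)
ys <- seq(-2, 2, length.out = 81)
U <- outer(xs, ys, function(x, y) (x^2 - 1)^2 + y^2)

# Numerical gradient magnitude |grad U| on the grid (forward differences).
gx <- apply(U, 2, function(col) c(diff(col), 0)) / diff(xs)[1]
gy <- t(apply(U, 1, function(row) c(diff(row), 0))) / diff(ys)[1]
gmag <- sqrt(gx^2 + gy^2)

# A straight candidate path between the two minima, along y = 0.
path_idx <- cbind(seq(21, 61), 41)  # x indices from -1 to 1, y index at 0

# Discrete action: sum_i |grad U_i| * |delta phi_i| along the path.
step_len <- diff(xs)[1]
action <- sum(gmag[path_idx]) * step_len
```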


    3 Examples


We use two dynamical systems to illustrate the usage of the simlandr package. The first is a stochastic non-gradient gene expression model, which was used by Wang et al. (2011) to represent cell development and differentiation. The second is a dynamic model of panic disorder (Robinaugh et al. 2024), which contains many more variables and parameters, is non-Markovian, and involves non-differentiable formulas. We mainly use the first example to show the agreement of the results from simlandr with previous analytical results, and the second as a typical use case of a complex dynamic model that cannot be handled with other methods (also see Cui et al. (2023b) for more substantive discussion). Note that both systems include more than two variables, making it impossible to perform the landscape analysis with the other available R packages.


    3.1 Example 1: the gene expression model


This model is built on the mutual regulation of the expression of two genes, where \(X_1\) and \(X_2\) represent the expression levels of two genes that activate themselves and inhibit each other. A graphical illustration is shown in Figure 3 (adapted from Wang et al. (2011)). The dynamic functions can be written as
\[\begin{align}
\frac{\mathrm{d}X_1}{\mathrm{d}t} &= \frac{a x_1^n}{S^n + x_1^n} + \frac{b S^n}{S^n + x_2^n} - k x_1 + \sigma_1 \frac{\mathrm{d}W_1}{\mathrm{d}t},
\tag{7}
\\
\frac{\mathrm{d}X_2}{\mathrm{d}t} &= \frac{a x_2^n}{S^n + x_2^n} + \frac{b S^n}{S^n + x_1^n} - k x_2 + \sigma_2 \frac{\mathrm{d}W_2}{\mathrm{d}t},
\tag{8}
\\
\frac{\mathrm{d}a}{\mathrm{d}t} &= -\lambda a + \sigma_3 \frac{\mathrm{d}W_3}{\mathrm{d}t},
\tag{9}
\end{align}\]
where \(a\) represents the strength of self-activation, \(b\) the strength of mutual inhibition, and \(k\) the speed of degradation. The development of an organism is modeled as \(a\) decreasing at a certain speed \(\lambda\). In the beginning, there is only one possible state for the cell. After a certain milestone, the cell differentiates into one of two possible states.

[Figure: Two large circles labeled X1 and X2. Dashed arrows labeled b connect the circles in both directions. Each circle has a solid self-loop labeled a and a dashed self-loop labeled k.]

Figure 3: A graphical illustration of the relationship between the activation levels of the two genes. Solid arrows represent positive relationships (i.e., activation) and dashed arrows represent negative relationships (i.e., inhibition).

    This model can be simulated using the sim_SDE() function in simlandr, with the default parameter setting \(b = 1, k = 1, S = 0.5, n = 4, \lambda = 0.01\), and \(\sigma_1 = \sigma_2 = \sigma_3 = 0.2\).

# Load the package.
library(simlandr)

# Specify the simulation function.
b <- 1
k <- 1
S <- 0.5
n <- 4
lambda <- 0.01

drift_gene <- c(rlang::expr(z * x^(!!n)/((!!S)^(!!n) + x^(!!n)) + (!!b) *
    (!!S)^(!!n)/((!!S)^(!!n) + y^(!!n)) - (!!k) * x), rlang::expr(z * y^(!!n)/((!!S)^(!!n) +
    y^(!!n)) + (!!b) * (!!S)^(!!n)/((!!S)^(!!n) + x^(!!n)) - (!!k) * y),
    rlang::expr(-(!!lambda) * z)) |>
    as.expression()

diffusion_gene <- expression(0.2, 0.2, 0.2)

# Perform a simulation and save the output.
set.seed(1614)
single_output_gene <- sim_SDE(drift = drift_gene, diffusion = diffusion_gene,
    N = 1000000, M = 10, Dt = 0.1, x0 = c(0, 0, 1), keep_full = FALSE)

    After the simulation, we perform some basic data wrangling to produce a dataset that can be used for further analysis. We create a new variable delta_x as the difference between X1 (X) and X2 (Y), and we rename the variable Z as a.

single_output_gene2 <- do.call(rbind, single_output_gene)
single_output_gene2 <- cbind(single_output_gene2[, "X"] - single_output_gene2[,
    "Y"], single_output_gene2[, "Z"])
colnames(single_output_gene2) <- c("delta_x", "a")

    We then perform the convergence check on the simulation result. First, we convert the simulation output to the format for the coda package, and thin the output to speed up the convergence check.

library(coda)
single_output_gene_mcmc_thin <- as.mcmc.list(lapply(single_output_gene,
    function(x) x[seq(1, nrow(x), by = 100), ]))

We then inspect the convergence diagnosis plot in Figure 4. The distributions of the key variables in different simulation stages are converging, indicating that the simulation is long enough to provide a reliable estimation of the steady-state distribution. Other convergence checks can also be readily performed with the coda package.

plot(single_output_gene_mcmc_thin)

[Figure: Trace and density plots for variables x, y, and z.]

Figure 4: The convergence check result for the simulation of the gene expression model. The variables in different simulation stages did not show distributional differences, indicating that the simulation is long enough to provide a reliable estimation of the steady-state distribution.

We generate the 3D landscape for this model with make_3d_single(). Here, we use x and y to specify the variables of interest, and lims to specify the limits of the x and y axes for the landscape. If the lims argument is left blank, the limits are calculated automatically.

l_single_gene_3d <- make_3d_single(single_output_gene2, x = "delta_x",
    y = "a", lims = c(-1.5, 1.5, 0, 1.5), Umax = 8)

The resulting landscape is shown in the left panel of Figure 5. In this plot, the x-axis represents \(\Delta x\,(= x_1 - x_2)\) and the y-axis represents \(a\). For comparison, the potential landscape obtained analytically by Wang et al. (2011) is shown in the right panel of Figure 5. The result of simlandr appears very close to the result based on the analytical derivation. Note that because different normalization methods were used, the \(U\) values of the two landscapes are not directly comparable; here, we are mainly interested in their relative shape.

plot(l_single_gene_3d)

[Figure: Two similar landscape plots, each with three basins.]

Figure 5: The 3D landscape (potential value as z-axis) for the gene expression model. The left panel is the plot produced by simlandr; the right panel is the potential landscape obtained analytically by Wang et al. (2011), reproduced with the permission of the authors and in accordance with the journal policy.

We then calculate the barrier for the landscape using calculate_barrier(). The barrier is found by specifying approximate start and end locations and search radii around them, within which the two local minima are located. The height of the barrier from the two sides can then be obtained with get_barrier_height().

b_single_gene_3d <- calculate_barrier(l_single_gene_3d, start_location_value = c(0,
    1.2), end_location_value = c(1, 0.2), start_r = 0.3, end_r = 0.3)

get_barrier_height(b_single_gene_3d)

delta_U_start   delta_U_end 
     2.506326      2.810604 
    The local minima, the saddle point, and the MEP can be added to the landscape with autolayer(), shown in Figure 6.

plot(l_single_gene_3d, 2) + autolayer(b_single_gene_3d)

[Figure: A landscape plot with a white line connecting two white dots, passing through a red dot in the middle.]

Figure 6: The landscape for the gene expression model. The local minima are marked as white dots, the saddle points as red dots, and the MEPs as white lines.

Next, we use multiple simulations to investigate the influence of two parameters, \(b\) and \(k\), on the stability of the system. As explained above, the parameter \(b\) represents the strength of mutual inhibition between the two genes. As \(b\) increases, we therefore expect the differentiation to be more extreme; that is, the cell is more likely to develop into one of two cell types with very different gene expression levels, the valleys representing the two types will move further apart, and the barrier will become clearer. The parameter \(k\) represents the speed of degradation of the gene products. As \(k\) increases, the gene products degrade faster, and this effect is more pronounced when the gene products are at high levels. The dominant gene will therefore express at a less extreme level, and we expect the two valleys to move closer together and the barrier to become less clear as \(k\) increases.


We use the batch simulation functions of simlandr. First, we create the argument set for the batch simulation, which specifies the parameters to be varied. We examine three \(b\) values (0.5, 1, 1.5) and three \(k\) values (0.5, 1, 1.5), which form nine possible combinations.

batch_arg_set_gene <- new_arg_set()
batch_arg_set_gene <- batch_arg_set_gene |>
    add_arg_ele(arg_name = "parameter", ele_name = "b", start = 0.5, end = 1.5,
        by = 0.5) |>
    add_arg_ele(arg_name = "parameter", ele_name = "k", start = 0.5, end = 1.5,
        by = 0.5)
batch_grid_gene <- make_arg_grid(batch_arg_set_gene)

We then perform the batch simulation with the batch_simulation() function. The simulation function, supplied through the sim_fun argument, is similar to the single simulation shown above, with the data wrangling included. We also use bigmemory = TRUE to store the simulation results in the hash_big_matrix format, which is more memory-efficient.

batch_output_gene <- batch_simulation(batch_grid_gene, sim_fun = function(parameter) {
    b <- parameter[["b"]]
    k <- parameter[["k"]]
    drift_gene <- c(rlang::expr(z * x^(!!n)/((!!S)^(!!n) + x^(!!n)) + (!!b) *
        (!!S)^(!!n)/((!!S)^(!!n) + y^(!!n)) - (!!k) * x), rlang::expr(z *
        y^(!!n)/((!!S)^(!!n) + y^(!!n)) + (!!b) * (!!S)^(!!n)/((!!S)^(!!n) +
        x^(!!n)) - (!!k) * y), rlang::expr(-(!!lambda) * z)) |>
        as.expression()
    set.seed(1614)
    single_output_gene <- sim_SDE(drift = drift_gene, diffusion = diffusion_gene,
        N = 1000000, M = 10, Dt = 0.1, x0 = c(0, 0, 1), keep_full = FALSE)
    single_output_gene2 <- do.call(rbind, single_output_gene)
    single_output_gene2 <- cbind(single_output_gene2[, "X"] - single_output_gene2[,
        "Y"], single_output_gene2[, "Z"])
    colnames(single_output_gene2) <- c("delta_x", "a")
    single_output_gene2
}, bigmemory = TRUE)

    If the output is saved in an RDS file, upon next use, it can be read as follows.

saveRDS(batch_output_gene, "batch_output_gene.RDS")
batch_output_gene <- readRDS("batch_output_gene.RDS") |>
    attach_all_matrices()

    We then make the 3D matrix for the batch output, using make_3d_matrix().

l_batch_gene_3d <- make_3d_matrix(batch_output_gene, x = "delta_x", y = "a",
    cols = "b", rows = "k", lims = c(-5, 5, -0.5, 2), h = 0.005, Umax = 8,
    kde_fun = "ks", individual_landscape = TRUE)

For the barrier calculation step, the start and end points may differ for each landscape plot. The following code shows how to create a barrier grid that supplies these settings per landscape. First, we create a barrier grid template using make_barrier_grid_3d(); next, we modify the template to create the barrier grid for the landscape plots.

make_barrier_grid_3d(batch_grid_gene, start_location_value = c(0, 1.5),
    end_location_value = c(1, -0.5), start_r = 1, end_r = 1, print_template = TRUE)

structure(list(start_location_value = list(c(0, 1.5), c(0, 1.5
), c(0, 1.5), c(0, 1.5), c(0, 1.5), c(0, 1.5), c(0, 1.5), c(0, 
1.5), c(0, 1.5)), start_r = list(c(1, 1), c(1, 1), c(1, 1), c(1, 
1), c(1, 1), c(1, 1), c(1, 1), c(1, 1), c(1, 1)), end_location_value = list(
    c(1, -0.5), c(1, -0.5), c(1, -0.5), c(1, -0.5), c(1, -0.5
    ), c(1, -0.5), c(1, -0.5), c(1, -0.5), c(1, -0.5)), end_r = list(
    c(1, 1), c(1, 1), c(1, 1), c(1, 1), c(1, 1), c(1, 1), c(1, 
    1), c(1, 1), c(1, 1))), row.names = c(NA, -9L), class = c("arg_grid", 
"data.frame"))

      ele_list   b   k start_location_value start_r
1 list(b =.... 0.5 0.5               0, 1.5    1, 1
2 list(b =.... 1.0 0.5               0, 1.5    1, 1
3 list(b =.... 1.5 0.5               0, 1.5    1, 1
4 list(b =.... 0.5 1.0               0, 1.5    1, 1
5 list(b =.... 1.0 1.0               0, 1.5    1, 1
6 list(b =.... 1.5 1.0               0, 1.5    1, 1
7 list(b =.... 0.5 1.5               0, 1.5    1, 1
8 list(b =.... 1.0 1.5               0, 1.5    1, 1
9 list(b =.... 1.5 1.5               0, 1.5    1, 1
  end_location_value end_r
1            1, -0.5  1, 1
2            1, -0.5  1, 1
3            1, -0.5  1, 1
4            1, -0.5  1, 1
5            1, -0.5  1, 1
6            1, -0.5  1, 1
7            1, -0.5  1, 1
8            1, -0.5  1, 1
9            1, -0.5  1, 1

bg_gene <- make_barrier_grid_3d(batch_grid_gene, df = structure(list(start_location_value = list(c(0,
    1.5), c(0, 1.5), c(0, 1.5), c(0, 1.5), c(0, 1.5), c(0, 1.5), c(0, 1.5),
    c(0, 1.5), c(0, 1.5)), start_r = list(c(0.2, 1), c(0.2, 1), c(0.2,
    1), c(0.2, 0.5), c(0.2, 0.5), c(0.2, 0.5), c(0.2, 0.3), c(0.2, 0.3),
    c(0.2, 0.3)), end_location_value = list(c(2, 0), c(2, 0), c(2, 0),
    c(1, 0), c(1, 0), c(1, 0), c(1, 0), c(1, 0), c(1, 0)), end_r = list(c(1,
    1), c(1, 1), c(1, 1), c(1, 1), c(1, 1), c(1, 1), c(1, 1), c(1, 1),
    c(1, 1))), row.names = c(NA, -9L), class = c("arg_grid", "data.frame")))

    With the barrier grid template, we can calculate the barrier for each landscape plot.

b_batch_gene_3d <- calculate_barrier(l_batch_gene_3d, bg = bg_gene)

If a barrier grid is not needed (i.e., the same settings work for all landscapes), the following code can be used to calculate the barrier.

b_batch_gene_3d <- calculate_barrier(l_batch_gene_3d, start_location_value = c(0,
    1.5), end_location_value = c(1, 0), start_r = 1, end_r = 1)

    The resulting landscapes and the MEPs between states are shown in Figure 7.

plot(l_batch_gene_3d) + autolayer(b_batch_gene_3d)

[Figure: Nine landscape plots arranged in a 3×3 grid. The x-axis is labeled delta x, the y-axis is labeled a. Rows correspond to k values (0.5, 1, 1.5), and columns to b values (0.5, 1, 1.5).]

Figure 7: The landscapes for the gene expression model with different \(b\) and \(k\) values. The local minima are marked as white dots, the saddle points as red dots, and the MEPs as white lines.

From the landscapes, it is clear that increasing \(b\), which corresponds to stronger mutual inhibition between the genes, moves the two differentiated states further apart. Increasing \(k\), which corresponds to faster degradation, makes the undifferentiated state disappear earlier and thus makes the differentiation occur earlier. When \(b\) is low enough and \(k\) is high enough, the two differentiated states merge into a single, more stable state at \(\Delta x = 0\); in this case, there is no actual differentiation in the system, but only a one-to-one conversion of cell types. Only when \(b\) is high enough and \(k\) is low enough is it possible for the cell to differentiate into two types.


    3.2 Example 2: panic disorder model


The second example is the panic disorder model proposed by Robinaugh et al. (2024), implemented in the PanicModel package (https://github.com/jmbh/PanicModel/). It contains 12 variables and 33 parameters, and it involves history dependency and non-differentiable formulas (such as if-else conditions) to model the complex interplay of individual and environmental elements at different time scales. The most important variables of the model are a person's physical arousal (\(A\); e.g., heart rate, muscle tension, sweating), perceived threat (\(PT\); how dangerous the person cognitively evaluates the environment to be), and the tendency to escape from the situation (\(E\)). The core theoretical idea of the model is that physical arousal and perceived threat may strengthen each other in certain circumstances, leading to sudden increases in both variables that manifest as panic attacks. The tendency of a person to use physical arousal as cognitive evidence of threat is represented by another variable, the arousal schema (\(AS\); denoted \(S\) in the code below). A comprehensive introduction to the model is beyond the scope of this article; we refer interested readers to Robinaugh et al. (2024). Here, to simplify the context, we assume that \(AS\) does not change over time and that no psychotherapy is being administered. We focus on the influence of \(AS\) on the system's stability as represented by \(A\) and \(PT\). A graphical illustration of several core variables of this model is shown in Figure 8 (adapted from Cui et al. (2023b)).

[Figure: Four large circles labeled H, A, PT, and E from left to right. Solid arrows go from A to H, A to PT, PT to A, and PT to E. Dashed arrows go from H to A and from E to PT. The solid arrow from A to PT is labeled AS.]

Figure 8: A graphical illustration of the relationships between several important psychological variables in the panic disorder model. Solid arrows represent positive relationships and dashed arrows represent negative relationships.

To construct the potential landscapes for this model, we first create a function that performs a simulation using the simPanic() function from PanicModel. This is required because we need to modify some default options for this illustration.

library(PanicModel)

sim_fun_panic <- function(x0, par) {

    # Change several default parameters
    pars <- pars_default
    # Increase the noise strength to improve sampling efficiency
    pars$N$lambda_N <- 200
    # Make S constant through the simulation
    pars$TS$r_S_a <- 0
    pars$TS$r_S_e <- 0

    # Specify the initial values of A and PT according to the format
    # required by `multi_init_simulation()`, while the other
    # variables use the default initial values.
    initial <- initial_default
    initial$A <- x0[1]
    initial$PT <- x0[2]

    # Specify the value of S according to the format required by
    # `batch_simulation()`.
    initial$S <- par$S

    # Extract the simulation output from the result of simPanic().
    # Only keep the core variables.
    return(as.matrix(simPanic(1:5000, initial = initial, parameters = pars)$outmat[,
        c("A", "PT", "E")]))
}

    We then perform a single simulation from multiple starting points. To speed up the simulation, we use parallel computing.

future::plan("multisession")
set.seed(1614, kind = "L'Ecuyer-CMRG")
single_output_panic <- multi_init_simulation(sim_fun = sim_fun_panic, range_x0 = c(0,
    1, 0, 1), R = 4, par = list(S = 0.5))

    The convergence check results of the simulation, shown in Figure 9, indicate that the time series of the first 100 data points are strongly influenced by the choice of initial value. Therefore, we remove the first 100 data points in the following analysis.

plot(single_output_panic)

[Figure: Trace and density plots for the simulated variables.]

Figure 9: The convergence check result for the simulation of the panic disorder model.

We then create the 3D landscape for the panic disorder model, shown in Figure 10. The landscape shows that the system has two stable states, represented by the valleys, in which the system can be trapped. The valley at higher levels of physical arousal and perceived threat corresponds to a panic attack state, whereas the valley at lower levels corresponds to a healthy state.

A landscape plot with two basins.

Figure 10: The 3D landscape (potential value as color) for the panic disorder model.

    We now investigate the effect of the parameter \(S\) on the potential landscape. This parameter represents the tendency of a person to interpret physical arousal as a sign of danger. Therefore, we expect that a higher \(S\) will stabilize the panic state and destabilize the healthy state.

    +

    We perform a batch simulation with varying \(S\) values to construct the potential landscapes for different \(S\) values. This, again, starts with the creation of a grid of parameter values.

batch_arg_grid_panic <- new_arg_set() |>
    add_arg_ele(arg_name = "par", ele_name = "S", start = 0, end = 1, by = 0.5) |>
    make_arg_grid()
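Conceptually, the argument grid built above enumerates one simulation condition per parameter value. In base R, the same set of conditions could be listed with `expand.grid()` (a sketch of the idea, not simlandr's internal representation):

```r
# S runs from 0 to 1 in steps of 0.5, giving three conditions
conditions <- expand.grid(S = seq(0, 1, by = 0.5))
nrow(conditions)  # 3
```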

    We then perform the batch simulation using parallel computing.

future::plan("multisession")
set.seed(1614, kind = "L'Ecuyer-CMRG")
batch_output_panic <- batch_simulation(batch_arg_grid_panic, sim_fun = function(par) {
    multi_init_simulation(sim_fun_panic, range_x0 = c(0, 1, 0, 1), R = 4,
        par = par) |>
        window(start = 100)
})

    The 3D landscapes for different \(S\) values are shown in Figure 11. The landscapes show that the system has only one stable state when \(S\) is low, but two stable states when \(S\) is high. The stability of the panic state also increases with \(S\). This indicates that a higher \(S\) value corresponds to a higher risk of panic attacks.

l_batch_panic_3d <- make_3d_matrix(batch_output_panic, x = "A", y = "PT",
    cols = "S", h = 0.005, lims = c(-1, 1.5, -0.5, 1.5))
plot(l_batch_panic_3d)

Three landscape plots arranged in a row. The x-axis is labeled A, the y-axis is labeled PT. Columns correspond to S values of 0, 0.5, and 1.

Figure 11: The landscapes for the panic disorder model with different \(S\) values, shown for the variables \(A\) and \(PT\).

    4 Discussion

    +

    Potential landscapes can show the stability of states of a dynamical system in an intuitive and quantitative way. They are especially informative for multistable systems. In this article, we illustrated how to construct potential landscapes using simlandr. The potential landscapes generated by simlandr are based on the steady-state distribution of the system, which is in turn estimated using Monte Carlo simulation. Compared to analytic methods, Monte Carlo estimation is more flexible and thus applicable to more complex models. This flexibility comes at the cost of higher demands on time and storage, which are necessary to make the estimation sufficiently precise. The hash_big_matrix class partly addresses this problem by dumping memory storage to hard disk space. It is also important that the simulation function itself is efficient. The functions sim_SDE() and multi_init_simulation() make use of the efficient simulations provided by Sim.DiffProc (Guidoum and Boukhetala 2020) and of parallel computing within the future framework (Bengtsson 2021). For customized simulation functions, there are also multiple approaches to improve performance, for which we refer interested readers to Wickham (2019). In Supplementary Materials A, we provide a benchmark of the typical time and memory usage of the procedures in simlandr, which shows that both are acceptable in most cases on a personal computer. When transitions between attractors are rare, the multi_init_simulation() function may help to speed up convergence, and more advanced sampling methods, such as importance sampling or rare event sampling, may be needed in more complex situations. The details of implementing such methods depend strongly on the specific model and are beyond the scope of this package; we direct interested readers to Rubino and Tuffin (2009) and Kloek and van Dijk (1978) for overviews of rare event simulation and importance sampling methods. Nevertheless, the landscape construction functions in simlandr allow users to provide weights for the simulation results, which can be used to adjust the sampling distribution.
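The weighting mechanism can be sketched with base R's `density()`, which accepts observation weights: reweighting the sample changes the estimated steady-state distribution, and hence the landscape. A toy illustration (not the simlandr API; the sample and weights here are arbitrary):

```r
set.seed(1)
# A sample that over-represents the region around -1
x <- c(rnorm(800, mean = -1), rnorm(200, mean = 2))

# Upweight the under-sampled region; density() expects weights summing to 1
w <- ifelse(x > 0.5, 4, 1)
w <- w / sum(w)

p_raw <- density(x)
p_adj <- density(x, weights = w)

# Corresponding potential landscapes, U = -ln(P)
U_raw <- -log(p_raw$y)
U_adj <- -log(p_adj$y)
```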

    +

    In addition, the length of the simulation and the choice of noise strength may also have an important influence on the results. If the simulation is too short, the density estimation will be inaccurate, resulting in rugged landscapes. If it is too long, the simulation requires more computational resources than is often realistic. If the noise is too weak, the system may not converge in a reasonable time, resulting in problems in the convergence checks, overly noisy landscapes, or failure to show valleys that are theoretically present. If the noise is too strong, the simulation may be unstable and the boundaries between valleys may become blurry. In Supplementary Materials B, we show the influence of simulation length and noise strength on the landscape output. With some theoretical expectation of the system's behavior, it is usually easy to spot that a simulation is too short or that the noise level is unsuitable; in that case, some adjustments are required before the landscape can be well constructed.

    +

    All landscape construction and barrier calculation functions in simlandr produce both visual output and numerical data that can be used for further processing. The HTML plots based on plotly are most suitable for interactive illustrations, although they can also be exported to static images using plotly::orca(). The ggplot2 plots are readily usable for flat printing.

    +

    We also want to note some limitations of the potential landscape generated by simlandr. First, the generalized potential landscape is not a complete description of all dynamics in a system. It emphasizes the stability of different states by filtering out other dynamical information. Some behaviors (e.g., oscillations and loops) are not possible in gradient systems and thus cannot be shown in a potential landscape (Zhou and Li 2016). Second, since the steady-state distribution is estimated using a kernel smoothing method, which depends on stochastic simulations, the resulting potential function may not be highly accurate. Its accuracy is further affected by the choice of kernel bandwidth and noise strength. This issue is particularly pronounced at valley edges, where fewer samples are available for estimation. Similar limitations apply to MEP calculations, as they are derived from the generalized landscape rather than from the original dynamics. Therefore, we do not recommend directly interpreting the potential function or barrier height results for applications requiring high precision. Instead, the potential landscape is best used as a semi-quantitative tool to gain insights into the system's overall behavior, guide further analysis, and compare system behavior under different parameter settings, provided the same simulation and kernel estimation conditions are used. The examples in this article illustrated some typical use cases we recommend.

    +

    5 Availability and future directions

    +

    This package is publicly available from the Comprehensive R Archive Network (CRAN) at https://CRAN.R-project.org/package=simlandr, under the GPL-3 license. The results in this article were generated with simlandr version 0.4.0. An R script to replicate all the results in this article can be found in the supporting information.

    +

    The barrier height data calculated by simlandr can also be further analyzed and visualized. For example, sometimes it is helpful to look into how the barrier height changes with varying parameters (e.g., Cui et al. (2023b)). We encourage users to explore other ways of analyzing and visualizing the various results provided by simlandr.

    +

    The method we chose for simlandr is not the only possible one. The generalized landscape by Wang et al. (2008), which we implemented, is more flexible and emphasizes the possibility that the system is in a specific state, while other methods may have other strengths (e.g., the method by Rodríguez-Sánchez (2020) emphasizes the gradient part of the vector field, and the method by Moore et al. (2016) emphasizes the possibility of transition processes under small noise). We look forward to future theoretical and methodological developments in this direction.

    +

    6 Acknowledgments

    +

    TL was supported by the NSFC under Grant No. 11825102 and the Beijing Academy of Artificial Intelligence (BAAI). ALA and MO were supported by an NWO VIDI grant, Grant No. VI.Vidi.191.178.

    +
    +

    6.1 Supplementary materials

    +

    Supplementary materials are available in addition to this article. They can be downloaded as RJ-2025-039.zip.

    +

    6.2 CRAN packages used

    +

    simlandr, waydown, Sim.DiffProc, future, coda, bigmemory, digest, ks, ggplot2, plotly

    +

    6.3 CRAN Task Views implied by cited packages

    +

    Bayesian, ChemPhys, DifferentialEquations, DynamicVisualizations, Finance, GraphicalModels, HighPerformanceComputing, NetworkAnalysis, Phylogenetics, Psychometrics, Spatial, TeachingStatistics, TimeSeries, WebTechnologies

    +
    +
P. Ao. Potential in stochastic differential equations: Novel construction. Journal of Physics A: Mathematical and General, 37(3): L25–L30, 2004. DOI 10.1088/0305-4470/37/3/L01.

H. Bengtsson. A unifying framework for parallel and distributed processing in R using futures. The R Journal, 13(2): 208–227, 2021. DOI 10.32614/RJ-2021-048.

J. E. Chacón and T. Duong. Multivariate kernel smoothing and its applications. New York: Chapman and Hall/CRC, 2018. DOI 10.1201/9780429485572.

J. Cui, A. Lichtwarck-Aschoff and F. Hasselman. Comments on "Climbing Escher's Stairs: A way to approximate stability landscapes in multidimensional systems". 2023a. DOI 10.48550/arXiv.2312.09690.

J. Cui, A. Lichtwarck-Aschoff, M. Olthof, T. Li and F. Hasselman. From metaphor to computation: Constructing the potential landscape for multivariate psychological formal models. Multivariate Behavioral Research, 58(4): 743–761, 2023b. DOI 10.1080/00273171.2022.2119927.

D. Dahiya and M. Cameron. Ordered line integral methods for computing the quasi-potential. Journal of Scientific Computing, 75(3): 1351–1384, 2018. DOI 10.1007/s10915-017-0590-9.

E. W. Dijkstra. A note on two problems in connexion with graphs. Numerische Mathematik, 1(1): 269–271, 1959. DOI 10.1007/BF01386390.

W. E and E. Vanden-Eijnden. Transition-path theory and path-finding algorithms for the study of rare events. Annual Review of Physical Chemistry, 61(1): 391–420, 2010. DOI 10.1146/annurev.physchem.040808.090412.

D. Eddelbuettel, A. Lucas, J. Tuszynski, H. Bengtsson, S. Urbanek, M. Frasca, B. Lewis, M. Stokely, H. Muehleisen, D. Murdoch, et al. digest: Create compact hash digests of R objects. 2021. DOI 10.32614/CRAN.package.digest.

M. I. Freidlin and A. D. Wentzell. Random perturbations of dynamical systems. 3rd ed. Berlin: Springer, 2012. DOI 10.1007/978-3-642-25847-3.

A. C. Guidoum and K. Boukhetala. Performing parallel Monte Carlo and moment equations methods for Itô and Stratonovich stochastic differential systems: R package Sim.DiffProc. Journal of Statistical Software, 96: 1–82, 2020. DOI 10.18637/jss.v096.i02.

A. M. Hayes and L. A. Andrews. A complex systems approach to the study of change in psychotherapy. BMC Medicine, 18(1): 197, 2020. DOI 10.1186/s12916-020-01662-2.

M. Heymann and E. Vanden-Eijnden. Pathways of maximum likelihood for rare events in nonequilibrium systems: Application to nucleation in the presence of shear. Physical Review Letters, 100(14): 140601, 2008. DOI 10.1103/PhysRevLett.100.140601.

M. J. Kane, J. Emerson and S. Weston. Scalable strategies for computing with massive data. Journal of Statistical Software, 55(14): 1–19, 2013. DOI 10.18637/jss.v055.i14.

T. Kloek and H. K. van Dijk. Bayesian estimates of equation system parameters: An application of integration by Monte Carlo. Econometrica, 46(1): 1–19, 1978. DOI 10.2307/1913641.

K. A. Lamothe, K. M. Somers and D. A. Jackson. Linking the ball-and-cup analogy and ordination trajectories to describe ecosystem stability, resistance, and resilience. Ecosphere, 10(3): e02629, 2019. DOI 10.1002/ecs2.2629.

C. M. Moore, C. R. Stieha, B. C. Nolting, M. K. Cameron and K. C. Abbott. QPot: An R package for stochastic differential equation quasi-potential analysis. The R Journal, 8(2): 19–38, 2016. DOI 10.32614/RJ-2016-031.

M. Olthof, F. Hasselman, F. Oude Maatman, A. M. T. Bosman and A. Lichtwarck-Aschoff. Complexity theory of psychopathology. Journal of Psychopathology and Clinical Science, 132(3): 314–323, 2023. DOI 10.1037/abn0000740.

M. Plummer, N. Best, K. Cowles and K. Vines. CODA: Convergence diagnosis and output analysis for MCMC. R News, 6(1): 7–11, 2006. URL https://journal.r-project.org/articles/RN-2006-002/RN-2006-002.pdf.

D. J. Robinaugh, J. M. B. Haslbeck, L. J. Waldorp, J. J. Kossakowski, E. I. Fried, A. J. Millner, R. J. McNally, O. Ryan, J. de Ron, H. L. J. van der Maas, et al. Advancing the network theory of mental disorders: A computational model of panic disorder. Psychological Review, 131(6): 1482–1508, 2024. DOI 10.1037/rev0000515.

P. Rodríguez-Sánchez. PabRod/rolldown: Post-publication update. 2020. DOI 10.5281/zenodo.3763038.

P. Rodríguez-Sánchez, E. H. van Nes and M. Scheffer. Climbing Escher's stairs: A way to approximate stability landscapes in multidimensional systems. PLOS Computational Biology, 16(4): e1007788, 2020. DOI 10.1371/journal.pcbi.1007788.

G. Rubino and B. Tuffin. Rare event simulation using Monte Carlo methods. John Wiley & Sons, 2009. DOI 10.1002/9780470745403.

C. Sievert. Interactive web-based data visualization with R, plotly, and shiny. Chapman and Hall/CRC, 2020. URL https://plotly-r.com.

P. S. Stumpf, F. Arai and B. D. MacArthur. Modeling stem cell fates using non-Markov processes. Cell Stem Cell, 28(2): 187–190, 2021. DOI 10.1016/j.stem.2021.01.009.

C. H. Waddington. Principles of development and differentiation. New York: Macmillan, 1966.

G. Wan, S. J. Avis, Z. Wang, X. Wang, H. Kusumaatmaja and T. Zhang. Finding transition state and minimum energy path of bistable elastic continua through energy landscape explorations. Journal of the Mechanics and Physics of Solids, 183: 105503, 2024. DOI 10.1016/j.jmps.2023.105503.

J. Wang, L. Xu and E. Wang. Potential landscape and flux framework of nonequilibrium networks: Robustness, dissipation, and coherence of biochemical oscillations. Proceedings of the National Academy of Sciences, 105(34): 12271–12276, 2008. DOI 10.1073/pnas.0800579105.

J. Wang, K. Zhang, L. Xu and E. Wang. Quantifying the Waddington landscape and biological paths for development and differentiation. Proceedings of the National Academy of Sciences, 108(20): 8257–8262, 2011. DOI 10.1073/pnas.1017017108.

H. Wickham. ggplot2: Elegant graphics for data analysis. Springer-Verlag New York, 2016. URL https://ggplot2.tidyverse.org.

H. Wickham. Improving performance. In Advanced R, 2nd ed., pages 531–546. Boca Raton: Chapman and Hall/CRC, 2019. URL https://adv-r.hadley.nz/.

P. Zhou and T. Li. Construction of the landscape for multi-stable systems: Potential landscape, quasi-potential, A-type integral and beyond. The Journal of Chemical Physics, 144(9): 094109, 2016. DOI 10.1063/1.4943096.
    +

    References

    +
    +

    Reuse

    +

    Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

    +

    Citation

    +

    For attribution, please cite this work as

    +
    Cui, et al., "simlandr: Simulation-Based Landscape Construction for Dynamical Systems", The R Journal, 2026
    +

    BibTeX citation

    +
@article{RJ-2025-039,
  author = {Cui, Jingmeng and Olthof, Merlijn and Lichtwarck-Aschoff, Anna and Li, Tiejun and Hasselman, Fred},
  title = {simlandr: Simulation-Based Landscape Construction for Dynamical Systems},
  journal = {The R Journal},
  year = {2026},
  note = {https://doi.org/10.32614/RJ-2025-039},
  doi = {10.32614/RJ-2025-039},
  volume = {17},
  issue = {4},
  issn = {2073-4859},
  pages = {173-191}
}
    + + + + + + + diff --git a/_articles/RJ-2025-039/RJ-2025-039.pdf b/_articles/RJ-2025-039/RJ-2025-039.pdf new file mode 100644 index 0000000000..b9544b9816 Binary files /dev/null and b/_articles/RJ-2025-039/RJ-2025-039.pdf differ diff --git a/_articles/RJ-2025-039/RJ-2025-039.tex b/_articles/RJ-2025-039/RJ-2025-039.tex new file mode 100644 index 0000000000..991ac8a68b --- /dev/null +++ b/_articles/RJ-2025-039/RJ-2025-039.tex @@ -0,0 +1,564 @@ +% !TeX root = RJwrapper.tex +\title{simlandr: Simulation-Based Landscape Construction for Dynamical Systems} + + +\author{by Jingmeng Cui, Merlijn Olthof, Anna Lichtwarck-Aschoff, Tiejun Li, and Fred Hasselman} + +\maketitle + +\abstract{% +We present the simlandr package for R, which provides a set of tools for constructing potential landscapes for dynamical systems using Monte Carlo simulation. Potential landscapes can be used to quantify the stability of system states. While the canonical form of a potential function is defined for gradient systems, generalized potential functions can also be defined for non-gradient dynamical systems. Our method is based on the potential landscape definition from the steady-state distribution, and can be used for a large variety of models. To facilitate simulation and computation, we introduce several novel features, including data structures optimized for batch simulations under varying conditions, an out-of-memory computation tool with integrated hash-based file-saving systems, and an algorithm for efficiently searching the minimum energy path. Using a multistable cell differentiation model as an example, we illustrate how simlandr can be used for model simulation, landscape construction, and barrier height calculation. The simlandr package is available at \url{https://CRAN.R-project.org/package=simlandr}, under GPL-3 license. +} + +\section{Introduction}\label{sec:intro} + +To better understand a dynamical system, it is often important to know the stability of different states. 
The metaphor of a potential landscape consisting of hills and valleys has been used to illustrate differences in stability in many fields, including genetics \citep{wangQuantifyingWaddingtonLandscape2011, WaddingtonPrinciplesDevelopmentDifferentiation1966}, ecology \citep{lamotheLinkingBallandcupAnalogy2019}, and psychology \citep{olthofComplexityTheoryPsychopathology2020}. In such a landscape, the stable states of the system correspond to the lowest points (minima) in the valleys of the landscape. Just as a ball thrown into such a landscape will eventually gravitate towards a minimum, the dynamical system is conceptually more likely to visit its stable states, in which it is also more resilient to noise. For example, in the landscape metaphor of psychopathology (Figure \ref{fig:metaphor}), the valleys represent different mental health states, their relative depth represents the relative stability of the states, and the barriers between valleys represent the difficulty of transitioning between these states \citep{olthofComplexityTheoryPsychopathology2020, hayesComplexSystemsApproach2020}. When the healthy state is more stable, the person is more likely to stay mentally healthy, whereas when the maladaptive state is more stable, the person is more likely to suffer from mental disorders.

\begin{figure}
\includegraphics[width=1\linewidth,alt={Three landscape plots, each with a ball resting in one of two basins. The left basin is labeled maladaptive, the right basin healthy. In the first plot, the maladaptive basin is deeper; in the second, both basins are equally deep; in the third, the healthy basin is deeper.}]{figures/metaphor} \caption{Illustration of the ball-and-landscape metaphor commonly used in the field of psychopathology.}\label{fig:metaphor}
\end{figure}

Yet, formally quantifying the stability of states is a nontrivial task.
Here we present an R package, \CRANpkg{simlandr}, that can quantify the stability of various kinds of systems without many mathematical restrictions. + +Dynamical systems are usually modeled by stochastic differential equations, which may depend on the past history (i.e., may be non-Markovian, \citet{StumpfEtAlModelingStemCell2021}). They take the general form of +\begin{equation} +\mathrm{d} \boldsymbol{X}_t = \boldsymbol{b}(\boldsymbol{X}_t, \boldsymbol{H}_t){\mathrm{d}t} + \boldsymbol{\sigma}( \boldsymbol{X}_t, \boldsymbol{H}_t)\mathrm{d}\boldsymbol{W}, +\label{eq:sde} +\end{equation} +where \(\boldsymbol{X}_t\) is the random variable representing the current state of the system and \(\boldsymbol{H_t}\) represents the past history of the system \(\boldsymbol{H}_t=\{\boldsymbol{X_s} | s \in [0, t)\}\)\footnote{The corresponding variable representing positions in the state space is not a random variable, so we use lowercase \(\boldsymbol{x}\) for it. This convention will be followed throughout this article.} The first term on the right-hand side of Eq \eqref{eq:sde} represents the deterministic part of the dynamics, which is a function of the system's current state \(\boldsymbol{b}(\boldsymbol{X}_t, \boldsymbol{H}_t)\). The second term represents the stochastic part, which is standard white noise \(\mathrm{d}\boldsymbol{W}\) multiplied by the noise strength \(\boldsymbol{\sigma}( \boldsymbol{X}_t, \boldsymbol{H}_t)\). + +If the dynamical equation (Eq \eqref{eq:sde}) can be written in the following form +\begin{equation} +\mathrm{d} \boldsymbol{X} = - \nabla U {\mathrm{d} t} +\sqrt{2}\mathrm{d}\boldsymbol{W}, +\label{eq:canon} +\end{equation} +then \(U\) is the potential function of the system.\footnote{Under zero inertia approximation.} However, this is not possible for general dynamical systems. 
The trajectory of such a system may contain loops, which cannot be represented by a gradient system (this issue was compared to Escher's stairs by \citet{rodriguez-sanchezClimbingEscherStairs2020}). In this case, further generalization is needed. The theoretical background of \CRANpkg{simlandr} is the generalized potential landscape by \citet{wangPotentialLandscapeFlux2008}, which is based on the Boltzmann distribution and the steady-state distribution of the system. The Boltzmann distribution is a distribution law in physics, which states that the distribution of classical particles depends on the energy level they occupy: the higher the energy of a state, the exponentially less likely a particle is to occupy it,
\begin{equation}
P(\boldsymbol{x}) \propto \exp (-U).
\label{eq:Boltzmann}
\end{equation}

This is then linked to dynamical systems by the steady-state distribution. The steady-state distribution of stochastic differential equations is the distribution that does not change over time, denoted by \(P_{\mathrm{SS}}\), which satisfies
\begin{equation}
\frac{\partial P_{\mathrm{SS}} (\boldsymbol{x},t)}{\partial t} = 0.
\label{eq:ss}
\end{equation}
The steady-state distribution is important because it extracts time-invariant information from a set of stochastic differential equations. Substituting the steady-state distribution into Eq \eqref{eq:Boltzmann} gives Wang's generalized potential landscape function \citep{wangPotentialLandscapeFlux2008}
\begin{equation}
U(\boldsymbol{x}) = - \ln P_{\mathrm{SS}}(\boldsymbol{x}).
\label{eq:Wang}
\end{equation}
If the system is ergodic (i.e., after sufficient time it can travel to all possible states in the state space), the long-term sample distribution can be used to estimate the steady-state distribution, and the generalized potential function can be calculated.

Our approach is not the only possible way of constructing potential landscapes.
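Before turning to alternatives, note that the construction from Eq \eqref{eq:Wang} can be tried out directly in base R: take a long sample standing in for an ergodic trajectory, estimate \(P_{\mathrm{SS}}\) by kernel density estimation, and take the negative logarithm. A self-contained sketch with an illustrative bimodal sample (not \CRANpkg{simlandr}'s implementation):

```r
set.seed(2025)
# Long sample standing in for an ergodic trajectory whose
# steady-state distribution has two stable states
x <- c(rnorm(5000, mean = -1.5, sd = 0.5),
       rnorm(5000, mean =  1.5, sd = 0.5))

p_ss <- density(x)   # kernel estimate of the steady-state distribution
U    <- -log(p_ss$y) # generalized potential, U = -ln(P_SS)

# The two stable states appear as local minima (valleys) of U
valleys <- which(diff(sign(diff(U))) == 2) + 1
p_ss$x[valleys]      # close to the two modes of the sample
```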
Many other theoretical approaches are available, including the SDE decomposition method by \citet{AoPotentialStochasticDifferential2004} and the quasi-potential by \citet{FreidlinWentzellRandomPerturbationsDynamical2012}. Various strategies to numerically compute these landscapes have been proposed (see \citet{zhouConstructionLandscapeMultistable2016} for a review). However, available realizations are still scarce. To our knowledge, besides \CRANpkg{simlandr}, there are two existing packages specifically for computing potential landscapes: the \CRANpkg{waydown} package \citep{rodriguez-sanchezPabRodRolldownPostpublication2020} and the \pkg{QPot} package \citep{MooreEtAlQPotPackageStochastic2016, dahiyaOrderedLineIntegral2018}. The \CRANpkg{waydown} package uses the skew-symmetric decomposition of the Jacobian, which theoretically produces landscapes that are similar to \citet{wangPotentialLandscapeFlux2008} (but see \citet{CuiEtAlCommentsClimbingEscher2023} for a potential technical issue with this package). The \pkg{QPot} package uses a path integral method that produces quasi-landscapes following the definition by \citet{FreidlinWentzellRandomPerturbationsDynamical2012}. Because of the analytical methods used by \CRANpkg{waydown} and \pkg{QPot}, they both require the dynamic function to be Markovian and differentiable in the whole state space. Moreover, they can only be used for systems of up to two dimensions. \CRANpkg{simlandr}, in contrast, is based on Monte Carlo simulation and the steady-state distribution. It does not have specific requirements for the model. Even for models that are not globally differentiable, have history-dependence, and are defined in a high-dimensional space, \CRANpkg{simlandr} is still applicable (e.g., \citet{CuiEtAlMetaphorComputationConstructing2021}).
Therefore, \CRANpkg{simlandr} can be applied to a much wider range of dynamical systems, illustrate a big picture of the dominant attractors, and investigate how the stability of different attractors may be influenced by model parameters. As a trade-off, \CRANpkg{simlandr} is not designed for rare event sampling, in which the noise strength \(\boldsymbol{\sigma}(\boldsymbol{X})\) is extremely small, nor for the precise calculation of tail probabilities and transition paths. Instead, it is better to view \CRANpkg{simlandr} as a semi-quantitative tool that provides a broad overview of key attractors in dynamical systems, allowing for comparisons of their relative stability and the investigation of the influence of system parameters. We will show some typical use cases of \CRANpkg{simlandr} in later sections. Some key terms used in this article are summarized in Table \ref{tab:terms}.

\begin{longtable}[]{@{}
  >{\raggedright\arraybackslash}p{(\linewidth - 2\tabcolsep) * \real{0.3158}}
  >{\raggedright\arraybackslash}p{(\linewidth - 2\tabcolsep) * \real{0.6842}}@{}}
\caption{\label{tab:terms} Summary of key terms used in this article.}\tabularnewline
\toprule\noalign{}
\begin{minipage}[b]{\linewidth}\raggedright
Term
\end{minipage} & \begin{minipage}[b]{\linewidth}\raggedright
Explanation
\end{minipage} \\
\midrule\noalign{}
\endfirsthead
\toprule\noalign{}
\begin{minipage}[b]{\linewidth}\raggedright
Term
\end{minipage} & \begin{minipage}[b]{\linewidth}\raggedright
Explanation
\end{minipage} \\
\midrule\noalign{}
\endhead
\bottomrule\noalign{}
\endlastfoot
Potential landscape metaphor & A conceptual metaphor representing the stability of a complex dynamic system as an uneven landscape, with a ball on it representing the system's state. This can be quantitatively realized in various ways \citep{zhouConstructionLandscapeMultistable2016}. \\
Gradient system & A system whose deterministic motion can be described solely by the gradient of a potential function.
\\ +Non-Markovian system & A system whose future evolution depends not only on its current state but also on its past history. \\ +Steady-state distribution & The probability distribution of a dynamic system that remains unchanged over time. \\ +Ergodic system & A dynamic system that, given enough time, will eventually pass through all possible states. \\ +Minimum energy path (MEP) & A transition path linking two local minima and passing through a saddle point \citep{WanEtAlFindingTransitionState2024}. It is always parallel to the gradient of the energy landscape, representing an efficient transition route. \\ +\end{longtable} + +\section{Design and implementation}\label{design-and-implementation} + +The general workflow of \CRANpkg{simlandr} involves three steps: model simulation, landscape construction, and barrier height computation. See Figure \ref{fig:diagram} for a summary. + +\begin{figure} +\includegraphics[width=1\linewidth,alt={A flow chart showing the analysis steps in simlandr, with functions listed under each step.}]{figures/diagram} \caption{The structure and workflow of simlandr.}\label{fig:diagram} +\end{figure} + +\subsection{Step 1: model simulation}\label{step-1-model-simulation} + +For the first step, a simulation function should be created by the user. This function should be able to simulate how the dynamical system of interest evolves over time and record its state in every time step. This can often be done with the Euler-Maruyama method. If the SDEs are up to three dimensions and Markovian, a helper function from \CRANpkg{simlandr}, \texttt{sim\_SDE()}, can also be used. This function is based on the simulation utilities from the \CRANpkg{Sim.DiffProc} \citep{GuidoumBoukhetalaPerformingParallelMonte2020} and the output can be directly used for later steps. 
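As a concrete illustration of Step 1, a user-written simulation function based on the Euler--Maruyama method can be only a few lines of base R. The drift and noise level below are illustrative (a generic double-well system), not one of the models discussed in this article:

```r
# Euler-Maruyama scheme: X[i+1] = X[i] + b(X[i]) * dt + sigma * sqrt(dt) * Z
sim_em <- function(b, sigma, x0, dt, n_steps) {
  x <- numeric(n_steps)
  x[1] <- x0
  for (i in seq_len(n_steps - 1)) {
    x[i + 1] <- x[i] + b(x[i]) * dt + sigma * sqrt(dt) * rnorm(1)
  }
  x
}

set.seed(1)
# Double-well drift b(x) = x - x^3 with moderate noise
traj <- sim_em(b = function(x) x - x^3, sigma = 0.5,
               x0 = 0, dt = 0.01, n_steps = 10000)
```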
Moreover, the \texttt{multi\_init\_simulation()} function can be used to simulate trajectories from various starting points, thus reducing the possibility that the system gets trapped in a local minimum. The \texttt{multi\_init\_simulation()} function also supports parallel simulation based on the \CRANpkg{future} framework to improve time efficiency \citep{RJ-2021-048}.

For Monte Carlo methods, it is important that the simulation \emph{converges}, which means that the distribution of the system is roughly stable. According to Eq \eqref{eq:Wang}, the precision of the steady-state distribution estimation determines the precision of the potential landscape estimation. \CRANpkg{simlandr} provides a visual tool to compare the sample distributions in different stages (\texttt{check\_conv()}), whereas the \CRANpkg{coda} package \citep{PlummerEtAlCODAConvergenceDiagnosis2006} can be used for more advanced diagnostics. The output of the \texttt{sim\_SDE()} and \texttt{multi\_init\_simulation()} functions also uses the classes from the \CRANpkg{coda} package to enable easy convergence diagnosis. To achieve ergodicity in reasonable time, stronger noise sometimes needs to be added to the system.

A simulation function is sufficient if the user is only interested in a single model setting. If the model is parameterized and the user wants to investigate the influence of parameters on the stability of the system, then multiple simulations need to be run with different parameter settings. \CRANpkg{simlandr} provides functions to perform batch simulations and store the outputs for landscape construction as one object. This can later be used to compare the stability under different parameter settings or to produce animations showing how a model parameter influences the stability of the model.
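The convergence idea mentioned above (\texttt{check\_conv()} compares sample distributions at different stages) can be sketched in base R by splitting a trajectory into segments and comparing them, for example with a two-sample Kolmogorov--Smirnov test. This is a toy diagnostic, not the \texttt{simlandr} or \CRANpkg{coda} implementation:

```r
set.seed(1)
x <- rnorm(6000)  # stand-in for a simulated trajectory after burn-in

# Compare an early and a late segment; for a converged simulation
# their distributions should be similar
early <- x[1:2000]
late  <- x[4001:6000]
res <- ks.test(early, late)
# A small p-value would suggest the distribution is still drifting
res$p.value
```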
\CRANpkg{simlandr} provides a \texttt{hash\_big\_matrix} class, a modification of the \texttt{big.matrix} class from the \CRANpkg{bigmemory} package \citep{KaneEtAlScalableStrategiesComputing2013}, that can perform out-of-memory computation and organize the data files on the disk. In an out-of-memory computation, the majority of the data is kept on the disk, and only the small subset of data used for the current computation step is loaded into memory. Therefore, the memory occupation is dramatically reduced. The \texttt{big.matrix} class in the \CRANpkg{bigmemory} package provides a powerful tool for out-of-memory computation. It, however, requires an explicit file name for each matrix, which can be cumbersome if there are many matrices to be handled, as is likely in a batch simulation. The \texttt{hash\_big\_matrix} class automatically generates the file names using the MD5 values of the matrices with the \CRANpkg{digest} package \citep{EddelbuettelEtAlPkgdigestCreateCompact2021} and stores them within the object. Therefore, the file links can also be restored automatically. + +\subsection{Step 2: landscape construction}\label{step-2-landscape-construction} + +\CRANpkg{simlandr} provides a set of tools to construct 2D, 3D, and 4D\footnote{In this package, we use the number of dimensions in landscape plots (including \(U\)) to define the dimension of landscapes. The x-, y-, z-, and color axes can all be regarded as a dimension. Therefore, the dimension of a landscape can be one more than the dimension of the kernel smoothing function.} landscapes from single or multiple simulation results. The steady-state distribution for selected variables of the system is first estimated using kernel density estimation (KDE). The \texttt{density()} function in base R is used for 2D landscapes, whereas the \CRANpkg{ks} package \citep{ChaconDuongMultivariateKernelSmoothing2018} is used by default for 3D and 4D landscapes because of its higher efficiency.
Then the potential function \(U\) is calculated from Eq \eqref{eq:Wang}. The landscape plots without a z-axis are created with \CRANpkg{ggplot2} \citep{WickhamGgplot2ElegantGraphics2016}, and those with a z-axis are created with \CRANpkg{plotly} \citep{SievertInteractiveWebbasedData2020}. These plots can be further refined using the standard \CRANpkg{ggplot2} or \CRANpkg{plotly} methods. See Table \ref{tab:overview} for an overview of the family of landscape functions. + +\begin{longtable}[]{@{} + >{\raggedright\arraybackslash}p{(\linewidth - 4\tabcolsep) * \real{0.2639}} + >{\raggedright\arraybackslash}p{(\linewidth - 4\tabcolsep) * \real{0.2639}} + >{\raggedright\arraybackslash}p{(\linewidth - 4\tabcolsep) * \real{0.4722}}@{}} +\caption{\label{tab:overview} Overview of various landscape functions provided by \texttt{simlandr}. Dimensions in bold represent the potential \(U\) calculated by the function. Dimensions in italic represent model parameters. Dimensions in parentheses are optional.}\tabularnewline +\toprule\noalign{} +\begin{minipage}[b]{\linewidth}\raggedright +Type of Input +\end{minipage} & \begin{minipage}[b]{\linewidth}\raggedright +Function +\end{minipage} & \begin{minipage}[b]{\linewidth}\raggedright +Dimensions +\end{minipage} \\ +\midrule\noalign{} +\endfirsthead +\toprule\noalign{} +\begin{minipage}[b]{\linewidth}\raggedright +Type of Input +\end{minipage} & \begin{minipage}[b]{\linewidth}\raggedright +Function +\end{minipage} & \begin{minipage}[b]{\linewidth}\raggedright +Dimensions +\end{minipage} \\ +\midrule\noalign{} +\endhead +\bottomrule\noalign{} +\endlastfoot +Single simulation data & \texttt{make\_2d\_single()} & x, \textbf{y} \\ +& \texttt{make\_3d\_single()} & (1) x, y, \textbf{z+color}; (2) x, y, \textbf{color} \\ +& \texttt{make\_4d\_single()} & x, y, z, \textbf{color} \\ +Multiple simulation data & \texttt{make\_2d\_matrix()} & x, \textbf{y}, \emph{cols}, \emph{(rows)} \\ +& \texttt{make\_3d\_matrix()} & x, y, \textbf{z+color},
\emph{cols}, \emph{(rows)} \\ +& \texttt{make\_3d\_animation()} & (1) x, y, \textbf{z+color}, \emph{fr}; (2) x, y, \textbf{color}, \emph{fr}; (3) x, y, \textbf{z+color}, \emph{cols} \\ +\end{longtable} + +\subsection{Step 3: barrier height calculation}\label{sec:singlel} + +An important property of the states in a landscape is their stability, which can be indicated by the barrier height that the system has to overcome when it transitions from one stable state to another adjacent state (see \citet{CuiEtAlMetaphorComputationConstructing2021} for further discussions about different stability indicators). The barrier height is also related to the escape time for the system to transition from one valley to another, which can be tested empirically \citep{wangPotentialLandscapeFlux2008}. \CRANpkg{simlandr} provides tools to calculate the barrier heights from landscapes. These functions look for the local minima in given regions and try to find the saddle point between two local minima. The potential differences between the saddle point and the local minima are calculated as barrier heights. + +In 2D cases, there is only one possible path connecting two local minima. The point on the path with the highest \(U\) is identified as the saddle point. For 3D landscapes, there are multiple paths between two local minima. If we treat the system \emph{as if} it is a gradient system with Brownian noise, then the most probable transition path (termed the \emph{minimum energy path}, MEP) is the one that first follows the steepest \emph{ascent} path from the starting point and then the steepest \emph{descent} path to the end point \citep{EVanden-EijndenTransitionpathTheoryPathfinding2010}.
We find this path by minimizing the following action using Dijkstra's algorithm \citep{dijkstra1959note, HeymannVanden-EijndenPathwaysMaximumLikelihood2008} +\begin{equation} +\varphi_{\mathrm{MEP}} = \arg\min_\varphi \int_A^B |\nabla U|\,|\mathrm{d}\varphi| \left( \approx \arg\min_\varphi \sum_i |\nabla U_i|\,|\Delta \varphi_i|\right), +\label{eq:optim} +\end{equation} +where \(A\) and \(B\) are the starting and end points and \(\varphi\) is the path starting at \(A\) and ending at \(B\). After that, the point with the maximum potential value on the MEP is identified as the saddle point. Note that while the barrier height still indicates the stability of local minima, the MEP may not be the true most probable path for a non-gradient system to transition between stable states. + +\section{Examples}\label{examples} + +We use two dynamical systems to illustrate the usage of the \CRANpkg{simlandr} package. The first one is a stochastic non-gradient model of the expression of two genes, which was used by \citet{wangQuantifyingWaddingtonLandscape2011} to represent cell development and differentiation. The second example is a dynamic model of panic disorder \citep{robinaugh2024}, which contains many more variables and parameters and involves non-Markovian properties and non-differentiable formulas. We mainly use the first example to show the agreement of the results from \CRANpkg{simlandr} with previous analytic results, and the second example as a typical use case of a complex dynamic model which is not treatable with other methods (also see \citet{CuiEtAlMetaphorComputationConstructing2021} for more substantive discussions). Note that both systems include more than two variables, making it impossible to perform the landscape analysis with other available R packages.
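 + +Both examples follow the same three-step workflow, which can be sketched schematically as follows (a sketch with placeholder arguments only, not runnable as is; complete versions appear in the examples): + +\begin{verbatim} +# Step 1: simulate the model (sim_SDE() or a custom simulation function). +output <- sim_SDE(drift = ..., diffusion = ..., N = ..., x0 = ...) +# Step 2: construct a landscape from the simulation output. +l <- make_3d_single(output, x = "...", y = "...") +# Step 3: compute the barrier height between two states. +b <- calculate_barrier(l, start_location_value = ..., end_location_value = ...) +get_barrier_height(b) +\end{verbatim} + +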
+ +\subsection{Example 1: the gene expression model}\label{example-1-the-gene-expression-model} + +This model is built on the mutual regulation of two genes, in which \(X_1\) and \(X_2\) represent their expression levels; each gene activates itself and inhibits the other. A graphical illustration is shown in Figure \ref{fig:e1} (adapted from \citet{wangQuantifyingWaddingtonLandscape2011}). Their dynamic functions can be written as +\begin{align} +\frac{\mathrm{d}X_1}{\mathrm{d}t} &= \frac{aX_1^n}{S^n+X_1^n} + \frac{bS^n}{S^n+X_2^n} - kX_1 + \sigma_1 \frac{\mathrm{d}W_1}{\mathrm{d}t}, +\label{eq:sim2x1} +\\ +\frac{\mathrm{d}X_2}{\mathrm{d}t} &= \frac{aX_2^n}{S^n+X_2^n} + \frac{bS^n}{S^n+X_1^n} - kX_2 + \sigma_2 \frac{\mathrm{d}W_2}{\mathrm{d}t}, +\label{eq:sim2x2} +\\ +\frac{\mathrm{d}a}{\mathrm{d}t} &= -\lambda a + \sigma_3 \frac{\mathrm{d}W_3}{\mathrm{d}t}, +\label{eq:sim2a} +\end{align} +where \(a\) represents the strength of self-activation, \(b\) represents the strength of mutual inhibition, and \(k\) represents the speed of degradation. The development of an organism is modeled as \(a\) decreasing at a certain speed \(\lambda\). In the beginning, there is only one possible state for the cell. After a certain milestone, the cell differentiates into one of the two possible states. + +\begin{figure} +\includegraphics[width=0.5\linewidth,alt={Two large circles labeled X1 and X2. Dashed arrows labeled b connect the circles in both directions. Each circle has a solid self-loop labeled a and a dashed self-loop labeled k.}]{figures/diagram-e1} \caption{A graphical illustration of the relationship between the activation levels of the two genes.
Solid arrows represent positive relationships (i.e., activation) and dashed arrows represent negative relationships (i.e., inhibition).}\label{fig:e1} +\end{figure} + +This model can be simulated using the \texttt{sim\_SDE()} function in \CRANpkg{simlandr}, with the default parameter setting \(b = 1, k = 1, S = 0.5, n = 4, \lambda = 0.01\), and \(\sigma_1 = \sigma_2 = \sigma_3 = 0.2\). + +\begin{verbatim} +# Load the package. +library(simlandr) + +# Specify the simulation function. +b <- 1 +k <- 1 +S <- 0.5 +n <- 4 +lambda <- 0.01 + +drift_gene <- c(rlang::expr(z * x^(!!n)/((!!S)^(!!n) + x^(!!n)) + (!!b) * + (!!S)^(!!n)/((!!S)^(!!n) + y^(!!n)) - (!!k) * x), rlang::expr(z * y^(!!n)/((!!S)^(!!n) + + y^(!!n)) + (!!b) * (!!S)^(!!n)/((!!S)^(!!n) + x^(!!n)) - (!!k) * y), + rlang::expr(-(!!lambda) * z)) |> + as.expression() + +diffusion_gene <- expression(0.2, 0.2, 0.2) +\end{verbatim} + +\begin{verbatim} +# Perform a simulation and save the output. +set.seed(1614) +single_output_gene <- sim_SDE(drift = drift_gene, diffusion = diffusion_gene, + N = 1e+06, M = 10, Dt = 0.1, x0 = c(0, 0, 1), keep_full = FALSE) +\end{verbatim} + +After the simulation, we perform some basic data wrangling to produce a dataset that can be used for further analysis. We create a new variable \texttt{delta\_x} as the difference between X1 (X) and X2 (Y), and we rename the variable Z as \texttt{a}. + +\begin{verbatim} +single_output_gene2 <- do.call(rbind, single_output_gene) +single_output_gene2 <- cbind(single_output_gene2[, "X"] - single_output_gene2[, + "Y"], single_output_gene2[, "Z"]) +colnames(single_output_gene2) <- c("delta_x", "a") +\end{verbatim} + +We then perform the convergence check on the simulation result. First, we convert the simulation output to the format for the \texttt{coda} package, and thin the output to speed up the convergence check. 
+ +\begin{verbatim} +single_output_gene_mcmc_thin <- as.mcmc.list(lapply(single_output_gene, + function(x) x[seq(1, nrow(x), by = 100), ])) +\end{verbatim} + +We then inspect the convergence diagnosis plot for the simulation in Figure \ref{fig:converge-gene}. The distributions of the two key variables in different simulation stages are converging, indicating that the simulation is long enough to provide a reliable estimation of the steady-state distribution. Other convergence checks can also be readily performed using the \CRANpkg{coda} package. + +\begin{verbatim} +plot(single_output_gene_mcmc_thin) +\end{verbatim} + +\begin{figure} +\includegraphics[width=1\linewidth,alt={Trace and density plots for variables x, y, and z.}]{RJ-2025-039_files/figure-latex/converge-gene-1} \caption{The convergence check result for the simulation of the gene expression model. The variables in different simulation stages did not show distributional differences, indicating that the simulation is long enough to provide a reliable estimation of the steady-state distribution.}\label{fig:converge-gene} +\end{figure} + +We generate the 3D landscape for this model with \texttt{make\_3d\_single()}. Here, we use \texttt{x} and \texttt{y} to specify the variables of interest, and \texttt{lims} to specify the limits of the x- and y-axes for the landscape. If the \texttt{lims} argument is left blank, the limits are calculated automatically. + +\begin{verbatim} +l_single_gene_3d <- make_3d_single(single_output_gene2, x = "delta_x", + y = "a", lims = c(-1.5, 1.5, 0, 1.5), Umax = 8) +\end{verbatim} + +The resulting landscape is shown in the left panel of Figure \ref{fig:3dstaticgene}. In this plot, the x-axis represents \(\Delta x\,(=X_1-X_2)\), and the y-axis represents \(a\). For comparison, the potential landscape obtained analytically by \citet{wangQuantifyingWaddingtonLandscape2011} is shown in the right panel of Figure \ref{fig:3dstaticgene}.
The result of \CRANpkg{simlandr} appears to be very close to the result based on the analytical derivation. Note that because different normalization methods were used, the \(U\) values of the two landscapes are not directly comparable. Here, we are mainly interested in their relative shape. + +\begin{verbatim} +plot(l_single_gene_3d) +\end{verbatim} + +\begin{figure} +\includegraphics[width=0.5\linewidth,alt={Two similar landscape plots, each with three basins.}]{figures/3dstatic_gene} \includegraphics[width=0.5\linewidth,alt={Two similar landscape plots, each with three basins.}]{figures/wang2011} \caption{The 3D landscape (potential value as z-axis) for the gene expression model. The left panel is the plot produced by simlandr; the right panel is the potential landscape obtained analytically by Wang et al. (2011), reproduced with the permission of the authors and in accordance with the journal policy.}\label{fig:3dstaticgene} +\end{figure} + +We then calculate the barrier for the landscape using \texttt{calculate\_barrier()}. The barrier is calculated by specifying the start and end locations and their search radii. The height of the barrier from the two sides can be calculated with \texttt{get\_barrier\_height()}. + +\begin{verbatim} +b_single_gene_3d <- calculate_barrier(l_single_gene_3d, start_location_value = c(0, + 1.2), end_location_value = c(1, 0.2), start_r = 0.3, end_r = 0.3) + +get_barrier_height(b_single_gene_3d) +\end{verbatim} + +\begin{verbatim} +#> delta_U_start delta_U_end +#> 2.506326 2.810604 +\end{verbatim} + +The local minima, the saddle point, and the MEP can be added to the landscape with \texttt{autolayer()}, as shown in Figure \ref{fig:bsingle3dgene}.
+ +\begin{verbatim} +plot(l_single_gene_3d, 2) + autolayer(b_single_gene_3d) +\end{verbatim} + +\begin{figure} +\includegraphics[width=0.5\linewidth,alt={A landscape plot with a white line connecting two white dots, passing through a red dot in the middle.}]{RJ-2025-039_files/figure-latex/bsingle3dgene-1} \caption{The landscape for the gene expression model. The local minima are marked as white dots, the saddle points are marked as red dots, and the MEPs are marked as white lines.}\label{fig:bsingle3dgene} +\end{figure} + +Next, we use multiple simulations to investigate the influence of two parameters, \(k\) and \(b\), on the stability of the system. As explained above, the parameter \(b\) represents the strength of mutual inhibition between the two genes. Therefore, as \(b\) increases, we expect the differentiation to be more extreme, that is, the cell is more likely to develop into one of the two cell types with very different gene expression levels. The valleys in the landscape representing the two types will become further apart and the barrier will become clearer. The parameter \(k\) represents the speed of degradation of the gene products. As \(k\) increases, the gene products degrade faster, and this effect is more pronounced when the gene products are at high levels. Therefore, the dominant gene will express at a less extreme level, and we expect that the two valleys become closer and the barrier less clear as \(k\) increases. + +We use the batch simulation functions of \CRANpkg{simlandr}. First, we create the argument set for the batch simulation. This specifies the parameters to be varied. We examine three \(b\) values, 0.5, 1, and 1.5, and three \(k\) values, 0.5, 1, and 1.5, which form nine possible combinations.
+ +\begin{verbatim} +batch_arg_set_gene <- new_arg_set() +batch_arg_set_gene <- batch_arg_set_gene |> + add_arg_ele(arg_name = "parameter", ele_name = "b", start = 0.5, end = 1.5, + by = 0.5) |> + add_arg_ele(arg_name = "parameter", ele_name = "k", start = 0.5, end = 1.5, + by = 0.5) +batch_grid_gene <- make_arg_grid(batch_arg_set_gene) +\end{verbatim} + +We then perform the batch simulation with the \texttt{batch\_simulation()} function. Here, we specify the simulation function to be used, which is similar to the single simulation we showed above, together with the data wrangling procedure. The simulation function is defined with the \texttt{sim\_fun} argument. We also use \texttt{bigmemory\ =\ TRUE} to store the simulation results in the \texttt{hash\_big\_matrix} format, which is more memory-efficient. + +\begin{verbatim} +batch_output_gene <- batch_simulation(batch_grid_gene, sim_fun = function(parameter) { + b <- parameter[["b"]] + k <- parameter[["k"]] + drift_gene <- c(rlang::expr(z * x^(!!n)/((!!S)^(!!n) + x^(!!n)) + (!!b) * + (!!S)^(!!n)/((!!S)^(!!n) + y^(!!n)) - (!!k) * x), rlang::expr(z * + y^(!!n)/((!!S)^(!!n) + y^(!!n)) + (!!b) * (!!S)^(!!n)/((!!S)^(!!n) + + x^(!!n)) - (!!k) * y), rlang::expr(-(!!lambda) * z)) |> + as.expression() + set.seed(1614) + single_output_gene <- sim_SDE(drift = drift_gene, diffusion = diffusion_gene, + N = 1e+06, M = 10, Dt = 0.1, x0 = c(0, 0, 1), keep_full = FALSE) + single_output_gene2 <- do.call(rbind, single_output_gene) + single_output_gene2 <- cbind(single_output_gene2[, "X"] - single_output_gene2[, + "Y"], single_output_gene2[, "Z"]) + colnames(single_output_gene2) <- c("delta_x", "a") + single_output_gene2 +}, bigmemory = TRUE) +\end{verbatim} + +If the output is saved in an RDS file, upon next use, it can be read as follows. 
+ +\begin{verbatim} +saveRDS(batch_output_gene, "batch_output_gene.RDS") +batch_output_gene <- readRDS("batch_output_gene.RDS") |> + attach_all_matrices() +\end{verbatim} + +We then make the 3D matrix for the batch output, using \texttt{make\_3d\_matrix()}. + +\begin{verbatim} +l_batch_gene_3d <- make_3d_matrix(batch_output_gene, x = "delta_x", y = "a", + cols = "b", rows = "k", lims = c(-5, 5, -0.5, 2), h = 0.005, Umax = 8, + kde_fun = "ks", individual_landscape = TRUE) +\end{verbatim} + +For the barrier calculation step, the start and end points of the barrier may be different for each landscape plot. The following code shows how to create a barrier grid for each landscape plot. First, we create a barrier grid template using the function \texttt{make\_barrier\_grid\_3d()}. Next, we modify the barrier grid template to create a barrier grid for the landscape plot. + +\begin{verbatim} +make_barrier_grid_3d(batch_grid_gene, start_location_value = c(0, 1.5), + end_location_value = c(1, -0.5), start_r = 1, end_r = 1, print_template = TRUE) +\end{verbatim} + +\begin{verbatim} +#> structure(list(start_location_value = list(c(0, 1.5), c(0, 1.5 +#> ), c(0, 1.5), c(0, 1.5), c(0, 1.5), c(0, 1.5), c(0, 1.5), c(0, +#> 1.5), c(0, 1.5)), start_r = list(c(1, 1), c(1, 1), c(1, 1), c(1, +#> 1), c(1, 1), c(1, 1), c(1, 1), c(1, 1), c(1, 1)), end_location_value = list( +#> c(1, -0.5), c(1, -0.5), c(1, -0.5), c(1, -0.5), c(1, -0.5 +#> ), c(1, -0.5), c(1, -0.5), c(1, -0.5), c(1, -0.5)), end_r = list( +#> c(1, 1), c(1, 1), c(1, 1), c(1, 1), c(1, 1), c(1, 1), c(1, +#> 1), c(1, 1), c(1, 1))), row.names = c(NA, -9L), class = c("arg_grid", +#> "data.frame")) +\end{verbatim} + +\begin{verbatim} +#> ele_list b k start_location_value start_r end_location_value end_r +#> 1 0.5, 0.5 0.5 0.5 0.0, 1.5 1, 1 1.0, -0.5 1, 1 +#> 2 1.0, 0.5 1.0 0.5 0.0, 1.5 1, 1 1.0, -0.5 1, 1 +#> 3 1.5, 0.5 1.5 0.5 0.0, 1.5 1, 1 1.0, -0.5 1, 1 +#> 4 0.5, 1.0 0.5 1.0 0.0, 1.5 1, 1 1.0, -0.5 1, 1 +#> 5 1, 1 1.0 1.0 0.0, 
1.5 1, 1 1.0, -0.5 1, 1 +#> 6 1.5, 1.0 1.5 1.0 0.0, 1.5 1, 1 1.0, -0.5 1, 1 +#> 7 0.5, 1.5 0.5 1.5 0.0, 1.5 1, 1 1.0, -0.5 1, 1 +#> 8 1.0, 1.5 1.0 1.5 0.0, 1.5 1, 1 1.0, -0.5 1, 1 +#> 9 1.5, 1.5 1.5 1.5 0.0, 1.5 1, 1 1.0, -0.5 1, 1 +\end{verbatim} + +\begin{verbatim} +bg_gene <- make_barrier_grid_3d(batch_grid_gene, df = structure(list(start_location_value = list(c(0, + 1.5), c(0, 1.5), c(0, 1.5), c(0, 1.5), c(0, 1.5), c(0, 1.5), c(0, 1.5), + c(0, 1.5), c(0, 1.5)), start_r = list(c(0.2, 1), c(0.2, 1), c(0.2, + 1), c(0.2, 0.5), c(0.2, 0.5), c(0.2, 0.5), c(0.2, 0.3), c(0.2, 0.3), + c(0.2, 0.3)), end_location_value = list(c(2, 0), c(2, 0), c(2, 0), + c(1, 0), c(1, 0), c(1, 0), c(1, 0), c(1, 0), c(1, 0)), end_r = list(c(1, + 1), c(1, 1), c(1, 1), c(1, 1), c(1, 1), c(1, 1), c(1, 1), c(1, 1), + c(1, 1))), row.names = c(NA, -9L), class = c("arg_grid", "data.frame"))) +\end{verbatim} + +With the barrier grid, we can calculate the barrier for each landscape plot. + +\begin{verbatim} +b_batch_gene_3d <- calculate_barrier(l_batch_gene_3d, bg = bg_gene) +\end{verbatim} + +If a barrier grid is not needed, the following code can be used to calculate the barrier. + +\begin{verbatim} +b_batch_gene_3d <- calculate_barrier(l_batch_gene_3d, start_location_value = c(0, + 1.5), end_location_value = c(1, 0), start_r = 1, end_r = 1) +\end{verbatim} + +The resulting landscapes and the MEPs between states are shown in Figure \ref{fig:bbatch3dgene}. + +\begin{verbatim} +plot(l_batch_gene_3d) + autolayer(b_batch_gene_3d) +\end{verbatim} + +\begin{figure} +\includegraphics[width=1\linewidth,alt={Nine landscape plots arranged in a 3×3 grid. The x-axis is labeled delta x, the y-axis is labeled a. Rows correspond to k values (0.5, 1, 1.5), and columns to b values (0.5, 1, 1.5).}]{RJ-2025-039_files/figure-latex/bbatch3dgene-1} \caption{The landscape for the gene expression model for different \( b \) and \( k \) values.
The local minima are marked as white dots, the saddle points are marked as red dots, and the MEPs are marked as white lines.}\label{fig:bbatch3dgene} +\end{figure} + +From the landscapes, it is clear that increasing \(b\), which represents a higher strength of mutual inhibition between the genes, moves the two differentiated states further apart from each other. Increasing \(k\), which represents faster degradation, makes the undifferentiated state disappear earlier, thus making the differentiation occur earlier. When \(b\) is low enough and \(k\) is high enough, there is no differentiation anymore because the two differentiated states merge together and form a more stable state at \(\Delta x = 0\). In this case, there is no actual differentiation in the system, but only a one-to-one conversion of cell types. Only when \(b\) is high enough and \(k\) is low enough is it possible for the cell to differentiate into two types. + +\subsection{Example 2: panic disorder model}\label{example-2-panic-disorder-model} + +The second example we use is the panic disorder model proposed by \citet{robinaugh2024}. The model is implemented in the \pkg{PanicModel} package (\url{https://github.com/jmbh/PanicModel/}). It contains 12 variables and 33 parameters and also involves history dependency and non-differentiable formulas (such as if-else conditions) to model the complex interplay of individual and environmental elements on different time scales. The most important variables of the model are the physical arousal (\(A\)) of a person (e.g., heartbeat, muscle tension, sweating), the person's perceived threat (\(PT\), how dangerous the person cognitively evaluates the environment to be), and the person's tendency to escape from the situation (\(E\)). The core theoretical idea of the model is that a person's physical arousal and perceived threat may strengthen each other in certain circumstances, leading to sudden increases in both variables, manifesting as panic attacks.
The tendency of a person to use physical arousal as cognitive evidence of threat is represented by another variable, the arousal schema \(S\). A comprehensive introduction to the model is beyond the scope of the current article, and we would like to refer interested readers to \citet{robinaugh2024}. Here, to simplify the context, we assume that \(S\) does not change over time, and no psychotherapy is being administered. We focus on the influence of \(S\) on the stability of the system as represented by \(A\) and \(PT\). A graphical illustration of several core variables of this model is shown in Figure \ref{fig:e2} (adapted from \citet{CuiEtAlMetaphorComputationConstructing2021}). + +\begin{figure} +\includegraphics[width=1\linewidth,alt={Four large circles labeled H, A, PT, and E from left to right. Solid arrows go from A to H, A to PT, PT to A, and PT to E. Dashed arrows go from H to A and from E to PT. The solid arrow from A to PT is labeled AS.}]{figures/diagram-e2} \caption{A graphical illustration of the relationships between several important psychological variables in the panic disorder model. Solid arrows represent positive relationships and dashed arrows represent negative relationships.}\label{fig:e2} +\end{figure} + +To construct the potential landscapes for this model, we first create a function that performs a simulation using the \texttt{simPanic()} function from \pkg{PanicModel}. This is required as we need to modify some default options for illustration. + +\begin{verbatim} +library(PanicModel) + +sim_fun_panic <- function(x0, par) { + + # Change several default parameters + pars <- pars_default + # Increase the noise strength to improve sampling efficiency + pars$N$lambda_N <- 200 + # Make S constant through the simulation + pars$TS$r_S_a <- 0 + pars$TS$r_S_e <- 0 + + # Specify the initial values of A and PT according to the format + # requirement by `multi_init_simulation()`, while the other + # variables use the default initial values.
+ initial <- initial_default + initial$A <- x0[1] + initial$PT <- x0[2] + + # Specify the value of S according to the format requirement by + # `batch_simulation()`. + initial$S <- par$S + + # Extract the simulation output from the result by simPanic(). + # Only keep the core variables. + return(as.matrix(simPanic(1:5000, initial = initial, parameters = pars)$outmat[, + c("A", "PT", "E")])) +} +\end{verbatim} + +We then perform a simulation from multiple starting points for a single parameter setting. To speed up the simulation, we use parallel computing. + +\begin{verbatim} +future::plan("multisession") +set.seed(1614, kind = "L'Ecuyer-CMRG") +single_output_panic <- multi_init_simulation(sim_fun = sim_fun_panic, range_x0 = c(0, + 1, 0, 1), R = 4, par = list(S = 0.5)) +\end{verbatim} + +The convergence check results of the simulation, shown in Figure \ref{fig:converge-panic}, indicate that the first 100 data points of the time series are strongly influenced by the choice of initial value. Therefore, we remove the first 100 data points in the following analysis. + +\begin{verbatim} +plot(single_output_panic) +\end{verbatim} + +\begin{figure} +\includegraphics[width=1\linewidth,alt={Trace and density plots for variables x, y, and z.}]{RJ-2025-039_files/figure-latex/converge-panic-1} \caption{The convergence check result for the simulation of the panic disorder model.}\label{fig:converge-panic} +\end{figure} + +We then create the 3D landscape for the panic disorder model, shown in Figure \ref{fig:3dstaticpanic}. The landscape shows that the system has two stable states, which are represented by the valleys in the landscape. The system can be trapped in these valleys, leading to different levels of physical arousal and perceived threat. The valley with higher potential value represents a state with higher physical arousal and perceived threat, which corresponds to a panic attack.
In contrast, the valley with lower potential value represents a state with lower physical arousal and perceived threat, which corresponds to a healthy state. + +\begin{figure} +\includegraphics[width=0.5\linewidth,alt={A landscape plot with two basins.}]{RJ-2025-039_files/figure-latex/3dstaticpanic-1} \caption{The 3D landscape (potential value as color) for the panic disorder model.}\label{fig:3dstaticpanic} +\end{figure} + +We now investigate the effect of the parameter \(S\) on the potential landscape. This parameter represents the tendency of a person to consider physical arousal as a sign of danger. Therefore, we expect that a higher \(S\) will stabilize the panic state and destabilize the healthy state. + +We perform a batch simulation with varying \(S\) values to construct the corresponding potential landscapes. This, again, starts with the creation of a grid of parameter values. + +\begin{verbatim} +batch_arg_grid_panic <- new_arg_set() |> + add_arg_ele(arg_name = "par", ele_name = "S", start = 0, end = 1, by = 0.5) |> + make_arg_grid() +\end{verbatim} + +We then perform the batch simulation using parallel computing. + +\begin{verbatim} +future::plan("multisession") +set.seed(1614, kind = "L'Ecuyer-CMRG") +batch_output_panic <- batch_simulation(batch_arg_grid_panic, sim_fun = function(par) { + multi_init_simulation(sim_fun_panic, range_x0 = c(0, 1, 0, 1), R = 4, + par = par) |> + window(start = 100) +}) +\end{verbatim} + +The 3D landscapes for different \(S\) values are shown in Figure \ref{fig:lbatch3dpanic}. The landscapes show that the system has only one stable state when \(S\) is low, but has two stable states when \(S\) is high. The stability of the panic state also increases when \(S\) is higher. This indicates that a higher \(S\) value corresponds to a higher risk of panic attacks.
+ +\begin{verbatim} +l_batch_panic_3d <- make_3d_matrix(batch_output_panic, x = "A", y = "PT", + cols = "S", h = 0.005, lims = c(-1, 1.5, -0.5, 1.5)) +plot(l_batch_panic_3d) +\end{verbatim} + +\begin{figure} +\includegraphics[width=1\linewidth,alt={Three landscape plots arranged in a row. The x-axis is labeled A, the y-axis is labeled PT. Columns correspond to S values of 0, 0.5, and 1.}]{RJ-2025-039_files/figure-latex/lbatch3dpanic-1} \caption{The landscape for the panic disorder model for different \( S \) values.}\label{fig:lbatch3dpanic} +\end{figure} + +\section{Discussion}\label{discussion} + +Potential landscapes can show the stability of states for a dynamical system in an intuitive and quantitative way. They are especially informative for multistable systems. In this article, we illustrated how to construct potential landscapes using \CRANpkg{simlandr}. The potential landscapes generated by \CRANpkg{simlandr} are based on the steady-state distribution of the system, which is in turn estimated using Monte Carlo simulation. Compared to analytic methods, Monte Carlo estimation is more flexible and thus more applicable for complex models. This flexibility, however, comes with a higher demand for computation time and storage, which is necessary to make the estimation precise enough. The \texttt{hash\_big\_matrix} class partly solves this problem by moving data storage from memory to the hard disk. Also, it is important that the simulation function itself is efficient enough. The functions \texttt{sim\_SDE()} and \texttt{multi\_init\_simulation()} make use of the efficient simulations provided by \CRANpkg{Sim.DiffProc} \citep{GuidoumBoukhetalaPerformingParallelMonte2020} and the parallel computing with the \CRANpkg{future} framework \citep{RJ-2021-048}.
For customized simulation functions, there are also multiple approaches that can be used to improve the performance, for which we refer interested readers to \citet{WickhamImprovingPerformance2019a}. In Supplementary Materials A, we provide a benchmark of the typical time and memory usage of the procedures in \CRANpkg{simlandr}. From there, we can see that time and memory usage are acceptable in most cases on a personal computer. When the transition between attractors is rare, the \texttt{multi\_init\_simulation()} function may help to speed up the convergence, and more advanced sampling methods like importance sampling or rare event sampling may be needed in more complex situations. The detailed ways to implement such methods are highly dependent on the specific model and are beyond the scope of this package. We direct interested readers to \citet{RubinoTuffinRareEventSimulation2009} and \citet{KloekvanDijkBayesianEstimatesEquation1978} for comprehensive treatments of rare event simulation and importance sampling methods. Nevertheless, the landscape construction functions in \CRANpkg{simlandr} allow users to provide weights for the simulation results, which can be used to adjust the sampling distribution. + +In addition, the length of the simulation and the choice of noise strength may also have an important influence on the results. If the length is too short, the density estimation will be inaccurate, resulting in rugged landscapes. If the length is too long, the simulation requires more computational resources, which is not always realistic. If the noise is too weak, the system may not be able to converge in a reasonable time, resulting in problems in convergence checks, overly noisy landscapes, or failure to show valleys that are theoretically present. If the noise is too strong, the simulation may be unstable and the boundaries between valleys may be blurry. In Supplementary Materials B, we show the influence of simulation length and noise strength on the landscape output.
With some theoretical expectations about the system's behavior, it is usually not difficult to spot that a simulation is too short or that the noise level is unsuitable; in that case, the settings need to be adjusted before the landscape can be constructed properly. + +All landscape construction and barrier calculation functions in \CRANpkg{simlandr} provide both visual output and numerical data that can be used for further processing. The HTML plots based on \CRANpkg{plotly} are best suited for interactive illustration, although they can also be exported as static plots using \texttt{plotly::orca()}. The \CRANpkg{ggplot2} plots are readily usable for print. + +We also want to note some limitations of the potential landscape generated by \CRANpkg{simlandr}. First, the generalized potential landscape is not a complete description of all dynamics in a system: it emphasizes the stability of different states by filtering out other dynamical information. Some behaviors, such as oscillations and loops, are not possible in gradient systems and therefore cannot be shown in a potential landscape \citep{zhouConstructionLandscapeMultistable2016}. Second, since the steady-state distribution is estimated with a kernel smoothing method from stochastic simulations, the resulting potential function may not be highly accurate, and its accuracy is further affected by the choice of kernel bandwidth and noise strength. This issue is particularly pronounced at valley edges, where fewer samples are available for estimation. Similar limitations apply to MEP calculations, as they are derived from the generalized landscape rather than the original dynamics. Therefore, we do not recommend directly interpreting the potential function or barrier height results for applications requiring high precision.
Instead, the potential landscape is best used as a semi-quantitative tool to gain insight into the system's overall behavior, guide further analysis, and compare system behavior under different parameter settings, provided the same simulation and kernel estimation conditions are used. The examples in this article illustrate some typical use cases we recommend. + +\section{Availability and future directions}\label{availability-and-future-directions} + +This package is publicly available from the Comprehensive R Archive Network (CRAN) at \url{https://CRAN.R-project.org/package=simlandr}, under the GPL-3 license. The results in the current article were generated with \CRANpkg{simlandr} version 0.4.0. An R script that replicates all the results in this article can be found in the supporting information. + +The barrier height data calculated by \CRANpkg{simlandr} can also be further analyzed and visualized. For example, it is sometimes helpful to examine how barrier height changes as parameters vary (e.g., \citet{CuiEtAlMetaphorComputationConstructing2021}). We encourage users to explore other ways of analyzing and visualizing the various results provided by \CRANpkg{simlandr}. + +The method we chose for \CRANpkg{simlandr} is not the only possible one. The generalized landscape by \citet{wangPotentialLandscapeFlux2008}, which we implemented, is more flexible and emphasizes the probability that the system is in a specific state, while other methods have other strengths (e.g., the method by \citet{rodriguez-sanchezPabRodRolldownPostpublication2020} emphasizes the gradient part of the vector field, and the method by \citet{MooreEtAlQPotPackageStochastic2016} emphasizes the probability of transition processes under small noise). We look forward to future theoretical and methodological developments in this direction.
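In one dimension, the generalized-landscape idea discussed above reduces to estimating the steady-state density from simulated paths and taking $U(x) = -\ln P_{ss}(x)$. The following self-contained sketch is generic R code written for this article's discussion, not \CRANpkg{simlandr}'s implementation (the function names and parameter values are illustrative): it integrates the double-well SDE $dx = -(x^3 - x)\,dt + \sigma\,dW$ with Euler--Maruyama and recovers a two-valley potential from a kernel density estimate.

```r
# Illustrative sketch (not simlandr code): estimate a generalized potential
# U(x) = -log P_ss(x) for the double-well SDE dx = -(x^3 - x) dt + sigma dW.
set.seed(1)

simulate_dw <- function(sigma, n_steps = 1e5, dt = 0.01, x0 = 0) {
  # Euler-Maruyama integration of the double-well SDE
  x <- numeric(n_steps)
  x[1] <- x0
  for (i in 2:n_steps) {
    drift <- -(x[i - 1]^3 - x[i - 1])
    x[i] <- x[i - 1] + drift * dt + sigma * sqrt(dt) * rnorm(1)
  }
  x
}

potential_from_sim <- function(x) {
  # Kernel-smoothed steady-state density, then U = -log P
  d <- density(x)
  data.frame(x = d$x, U = -log(d$y))
}

x <- simulate_dw(sigma = 0.5)  # moderate noise: both wells are visited
U <- potential_from_sim(x)
# U should show two minima near x = -1 and x = 1, separated by a barrier at 0
```

With a much weaker noise (e.g., \texttt{sigma = 0.1}) the same run stays in the starting well, which is precisely the convergence problem described above; starting from multiple initial values, as \texttt{multi\_init\_simulation()} does, is one remedy.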
+ +\section{Acknowledgments}\label{acknowledgments} + +TL was supported by the NSFC under Grant No.~11825102 and the Beijing Academy of Artificial Intelligence (BAAI). ALA and MO were supported by an NWO VIDI grant, Grant No.~VI.Vidi.191.178. + +\bibliography{RJreferences.bib} + +\address{% +Jingmeng Cui\\ +University of Groningen, Radboud University\\% +Faculty of Behavioural and Social Sciences, Groningen, the Netherlands\\ Behavioural Science Institute, Nijmegen, the Netherlands\\ +% +% +\textit{ORCiD: \href{https://orcid.org/0000-0003-3421-8457}{0000-0003-3421-8457}}\\% +\href{mailto:jingmeng.cui@rug.nl}{\nolinkurl{jingmeng.cui@rug.nl}}% +} + +\address{% +Merlijn Olthof\\ +University of Groningen, Radboud University\\% +Faculty of Behavioural and Social Sciences, Groningen, the Netherlands\\ Behavioural Science Institute, Nijmegen, the Netherlands\\ +% +% +\textit{ORCiD: \href{https://orcid.org/0000-0002-5975-6588}{0000-0002-5975-6588}}\\% +\href{mailto:merlijn.olthof@ru.nl}{\nolinkurl{merlijn.olthof@ru.nl}}% +} + +\address{% +Anna Lichtwarck-Aschoff\\ +University of Groningen, Radboud University\\% +Faculty of Behavioural and Social Sciences, Groningen, the Netherlands\\ Behavioural Science Institute, Nijmegen, the Netherlands\\ +% +% +\textit{ORCiD: \href{https://orcid.org/0000-0002-4365-1538}{0000-0002-4365-1538}}\\% +\href{mailto:a.lichtwarck-aschoff@rug.nl}{\nolinkurl{a.lichtwarck-aschoff@rug.nl}}% +} + +\address{% +Tiejun Li\\ +Peking University\\% +\textit{(corresponding author)} LMAM and School of Mathematical Sciences, Beijing, China\\ +% +% +\textit{ORCiD: \href{https://orcid.org/0000-0002-2086-1620}{0000-0002-2086-1620}}\\% +\href{mailto:tieli@pku.edu.cn}{\nolinkurl{tieli@pku.edu.cn}}% +} + +\address{% +Fred Hasselman\\ +Radboud University\\% +\textit{(corresponding author)} Behavioural Science Institute, Nijmegen, the Netherlands\\ +% +% +\textit{ORCiD: \href{https://orcid.org/0000-0003-1384-8361}{0000-0003-1384-8361}}\\%
+\href{mailto:fred.hasselman@ru.nl}{\nolinkurl{fred.hasselman@ru.nl}}% +} diff --git a/_articles/RJ-2025-039/RJ-2025-039.zip b/_articles/RJ-2025-039/RJ-2025-039.zip new file mode 100644 index 0000000000..4cdf87f393 Binary files /dev/null and b/_articles/RJ-2025-039/RJ-2025-039.zip differ diff --git a/_articles/RJ-2025-039/RJ-2025-039_files/figure-html5/3dstaticpanic-1.png b/_articles/RJ-2025-039/RJ-2025-039_files/figure-html5/3dstaticpanic-1.png new file mode 100644 index 0000000000..911f010f39 Binary files /dev/null and b/_articles/RJ-2025-039/RJ-2025-039_files/figure-html5/3dstaticpanic-1.png differ diff --git a/_articles/RJ-2025-039/RJ-2025-039_files/figure-html5/bbatch3dgene-1.png b/_articles/RJ-2025-039/RJ-2025-039_files/figure-html5/bbatch3dgene-1.png new file mode 100644 index 0000000000..acd58b69d4 Binary files /dev/null and b/_articles/RJ-2025-039/RJ-2025-039_files/figure-html5/bbatch3dgene-1.png differ diff --git a/_articles/RJ-2025-039/RJ-2025-039_files/figure-html5/bsingle3dgene-1.png b/_articles/RJ-2025-039/RJ-2025-039_files/figure-html5/bsingle3dgene-1.png new file mode 100644 index 0000000000..01f9981038 Binary files /dev/null and b/_articles/RJ-2025-039/RJ-2025-039_files/figure-html5/bsingle3dgene-1.png differ diff --git a/_articles/RJ-2025-039/RJ-2025-039_files/figure-html5/converge-gene-1.png b/_articles/RJ-2025-039/RJ-2025-039_files/figure-html5/converge-gene-1.png new file mode 100644 index 0000000000..5c97a42c8a Binary files /dev/null and b/_articles/RJ-2025-039/RJ-2025-039_files/figure-html5/converge-gene-1.png differ diff --git a/_articles/RJ-2025-039/RJ-2025-039_files/figure-html5/converge-panic-1.png b/_articles/RJ-2025-039/RJ-2025-039_files/figure-html5/converge-panic-1.png new file mode 100644 index 0000000000..3d7c1070e1 Binary files /dev/null and b/_articles/RJ-2025-039/RJ-2025-039_files/figure-html5/converge-panic-1.png differ diff --git a/_articles/RJ-2025-039/RJ-2025-039_files/figure-html5/lbatch3dpanic-1.png 
b/_articles/RJ-2025-039/RJ-2025-039_files/figure-html5/lbatch3dpanic-1.png new file mode 100644 index 0000000000..3b324818cf Binary files /dev/null and b/_articles/RJ-2025-039/RJ-2025-039_files/figure-html5/lbatch3dpanic-1.png differ diff --git a/_articles/RJ-2025-039/RJournal.sty b/_articles/RJ-2025-039/RJournal.sty new file mode 100644 index 0000000000..351990be38 --- /dev/null +++ b/_articles/RJ-2025-039/RJournal.sty @@ -0,0 +1,358 @@ +% Package `RJournal' to use with LaTeX2e +% Copyright (C) 2010 by the R Foundation +% Copyright (C) 2013 by the R Journal +% +% Originally written by Kurt Hornik and Friedrich Leisch with subsequent +% edits by the editorial board +% +% CAUTION: +% Do not modify this style file. Any changes to this file will be reset when your +% article is submitted. +% If you must modify the style or add LaTeX packages to the article, these +% should be specified in RJwrapper.tex + +\NeedsTeXFormat{LaTeX2e}[1995/12/01] +\ProvidesPackage{RJournal}[2025/10/05 v0.17 RJournal package] + +\RequirePackage{tikz} + +% Overall page layout, fonts etc ----------------------------------------------- + +% Issues of of \emph{The R Journal} are created from the standard \LaTeX{} +% document class \pkg{report}. + +\RequirePackage{geometry} +\geometry{a4paper, + textwidth=14cm, top=1cm, bottom=1cm, + includehead,includefoot,centering, + footskip=1.5cm} +\raggedbottom +\sloppy +\clubpenalty = 10000 +\widowpenalty = 10000 +\brokenpenalty = 10000 +\usepackage{microtype} + + +\RequirePackage{fancyhdr} +\fancyhead{} +\fancyheadoffset{2cm} +\fancyhead[L]{\textsc{\RJ@sectionhead}} +\fancyhead[R]{\thepage} +\fancyfoot{} +\fancyfoot[L]{The R Journal Vol. 
\RJ@volume/\RJ@number, \RJ@month~\RJ@year} +\fancyfoot[R]{ISSN 2073-4859} +\pagestyle{fancy} + +% We use the following fonts (all with T1 encoding): +% +% rm & palatino +% tt & inconsolata +% sf & helvetica +% math & palatino + +\RequirePackage{microtype} + +\RequirePackage[scaled=0.92]{helvet} +\RequirePackage{palatino,mathpazo} +\RequirePackage[scaled=1.02]{inconsolata} +\RequirePackage[T1]{fontenc} + +\RequirePackage[hyphens]{url} +\RequirePackage[pagebackref]{hyperref} +\renewcommand{\backref}[1]{[p#1]} + +% Dark blue colour for all links +\RequirePackage{color} +\definecolor{link}{rgb}{0.45,0.51,0.67} +\hypersetup{ + colorlinks,% + citecolor=link,% + filecolor=link,% + linkcolor=link,% + urlcolor=link +} + +% Give the text a little room to breath +\setlength{\parskip}{3pt} +\RequirePackage{setspace} +\setstretch{1.05} + +% Issue and article metadata --------------------------------------------------- + +% Basic front matter information about the issue: volume, number, and +% date. + +\newcommand{\volume}[1]{\def\RJ@volume{#1}} +\newcommand{\volnumber}[1]{\def\RJ@number{#1}} +\renewcommand{\month}[1]{\def\RJ@month{#1}} +\renewcommand{\year}[1]{\def\RJ@year{#1}} + + +% Individual articles correspond to +% chapters, and are contained in |article| environments. This makes it +% easy to have figures counted within articles and hence hyperlinked +% correctly. + +% An article has an author, a title, and optionally a subtitle. We use +% the obvious commands for specifying these. Articles will be put in certain +% journal sections, named by \sectionhead. + +\newcommand {\sectionhead} [1]{\def\RJ@sectionhead{#1}} +\renewcommand{\author} [1]{\def\RJ@author{#1}} +\renewcommand{\title} [1]{\def\RJ@title{#1}} +\newcommand {\subtitle} [1]{\def\RJ@subtitle{#1}} + +% Control appearance of titles: make slightly smaller than usual, and +% suppress section numbering. 
See http://tex.stackexchange.com/questions/69749 +% for why we don't use \setcounter{secnumdepth}{-1} + +\usepackage[medium]{titlesec} +\usepackage{titletoc} +\titleformat{\section} {\normalfont\large\bfseries}{\arabic{section}}{1em}{} +\titleformat{\subsection}{\normalfont\normalsize\bfseries}{\arabic{section}.\arabic{subsection}}{0.5em}{} +\titlecontents{chapter} [0em]{}{}{}{\titlerule*[1em]{.}\contentspage} + +% Article layout --------------------------------------------------------------- + +% Environment |article| clears the article header information at its beginning. +% We use |\FloatBarrier| from the placeins package to keep floats within +% the article. +\RequirePackage{placeins} +\newenvironment{article}{\author{}\title{}\subtitle{}\FloatBarrier}{\FloatBarrier} + +% Refereed articles should have an abstract, so we redefine |\abstract| to +% give the desired style + +\renewcommand{\abstract}[1]{\noindent\textbf{Abstract} #1} +\renewenvironment{abstract}{\noindent\textbf{Abstract}~}{} + +% The real work is done by a redefined version of |\maketitle|. Note +% that even though we do not want chapters (articles) numbered, we +% need to increment the chapter counter, so that figures get correct +% labelling. + +\renewcommand{\maketitle}{% +\noindent + \chapter{\RJ@title}\refstepcounter{chapter} + \ifx\empty\RJ@subtitle + \else + \noindent\textbf{\RJ@subtitle} + \par\nobreak\addvspace{\baselineskip} + \fi + \ifx\empty\RJ@author + \else + \noindent\textit{\RJ@author} + \par\nobreak\addvspace{\baselineskip} + \fi + \@afterindentfalse\@nobreaktrue\@afterheading +} + +% Now for some ugly redefinitions. We do not want articles to start a +% new page. 
(Actually, we do, but this is handled via explicit +% \newpage +% +% The name@of@eq is a hack to get hyperlinks to equations to work +% within each article, even though there may be multiple eq.(1) +% \begin{macrocode} +\renewcommand\chapter{\secdef\RJ@chapter\@schapter} +\providecommand{\nohyphens}{% + \hyphenpenalty=10000\exhyphenpenalty=10000\relax} +\newcommand{\RJ@chapter}{% + \edef\name@of@eq{equation.\@arabic{\c@chapter}}% + \renewcommand{\@seccntformat}[1]{}% + \@startsection{chapter}{0}{0mm}{% + -2\baselineskip \@plus -\baselineskip \@minus -.2ex}{\p@}{% + \phantomsection\normalfont\huge\bfseries\raggedright}} + +% Book reviews should appear as sections in the text and in the pdf bookmarks, +% however we wish them to appear as chapters in the TOC. Thus we define an +% alternative to |\maketitle| for reviews. +\newcommand{\review}[1]{ + \pdfbookmark[1]{#1}{#1} + \section*{#1} + \addtocontents{toc}{\protect\contentsline{chapter}{#1}{\thepage}{#1.1}} +} + +% We want bibliographies as starred sections within articles. +% +\RequirePackage[sectionbib,round]{natbib} +\bibliographystyle{abbrvnat} +\renewcommand{\bibsection}{\section*{References}} + +% Equations, figures and tables are counted within articles, but we do +% not show the article number. For equations it becomes a bit messy to avoid +% having hyperref getting it wrong. + +% \numberwithin{equation}{chapter} +\renewcommand{\theequation}{\@arabic\c@equation} +\renewcommand{\thefigure}{\@arabic\c@figure} +\renewcommand{\thetable}{\@arabic\c@table} + +% Issue layout ----------------------------------------------------------------- + +% Need to provide our own version of |\tableofcontents|. We use the +% tikz package to get the rounded rectangle. Notice that |\section*| +% is really the same as |\chapter*|. 
+\renewcommand{\contentsname}{Contents} +\renewcommand\tableofcontents{% + \vspace{1cm} + \section*{\contentsname} + { \@starttoc{toc} } +} + +\renewcommand{\titlepage}{% + \thispagestyle{empty} + \hypersetup{ + pdftitle={The R Journal Volume \RJ@volume/\RJ@number, \RJ@month \RJ@year},% + pdfauthor={R Foundation for Statistical Computing},% + } + \noindent + \begin{center} + \fontsize{50pt}{50pt}\selectfont + The \raisebox{-8pt}{\includegraphics[height=77pt]{Rlogo-5}}\hspace{10pt} + Journal + + \end{center} + {\large \hfill Volume \RJ@volume/\RJ@number, \RJ@month{} \RJ@year \quad} + + \rule{\textwidth}{1pt} + \begin{center} + {\Large A peer-reviewed, open-access publication of the \\ + R Foundation for Statistical Computing} + \end{center} + + % And finally, put in the TOC box. Note the way |tocdepth| is adjusted + % before and after producing the TOC: thus, we can ensure that only + % articles show up in the printed TOC, but that in the PDF version, + % bookmarks are created for sections and subsections as well (provided + % that the non-starred forms are used). + \setcounter{tocdepth}{0} + \tableofcontents + \setcounter{tocdepth}{2} + \clearpage +} + +% Text formatting -------------------------------------------------------------- + +\newcommand{\R}{R} +\newcommand{\address}[1]{\addvspace{\baselineskip}\noindent\emph{#1}} +\newcommand{\email}[1]{\href{mailto:#1}{\normalfont\texttt{#1}}} + +% Simple font selection is not good enough. For example, |\texttt{--}| +% gives `\texttt{--}', i.e., an endash in typewriter font. Hence, we +% need to turn off ligatures, which currently only happens for commands +% |\code| and |\samp| and the ones derived from them. Hyphenation is +% another issue; it should really be turned off inside |\samp|. And +% most importantly, \LaTeX{} special characters are a nightmare. E.g., +% one needs |\~{}| to produce a tilde in a file name marked by |\file|. 
+% Perhaps a few years ago, most users would have agreed that this may be +% unfortunate but should not be changed to ensure consistency. But with +% the advent of the WWW and the need for getting `|~|' and `|#|' into +% URLs, commands which only treat the escape and grouping characters +% specially have gained acceptance + +\DeclareRobustCommand\code{\bgroup\@noligs\@codex} +\def\@codex#1{\texorpdfstring% +{{\normalfont\ttfamily\hyphenchar\font=-1 #1}}% +{#1}\egroup} +\newcommand{\kbd}[1]{{\normalfont\texttt{#1}}} +\newcommand{\key}[1]{{\normalfont\texttt{\uppercase{#1}}}} +\DeclareRobustCommand\samp{`\bgroup\@noligs\@sampx} +\def\@sampx#1{{\normalfont\texttt{#1}}\egroup'} +\newcommand{\var}[1]{{\normalfont\textsl{#1}}} +\let\env=\code +\newcommand{\file}[1]{{`\normalfont\textsf{#1}'}} +\let\command=\code +\let\option=\samp +\newcommand{\dfn}[1]{{\normalfont\textsl{#1}}} +% \acronym is effectively disabled since not used consistently +\newcommand{\acronym}[1]{#1} +\newcommand{\strong}[1]{\texorpdfstring% +{{\normalfont\fontseries{b}\selectfont #1}}% +{#1}} +\let\pkg=\strong +\newcommand{\CRANpkg}[1]{\href{https://CRAN.R-project.org/package=#1}{\pkg{#1}}}% +\let\cpkg=\CRANpkg +\newcommand{\ctv}[1]{\href{https://CRAN.R-project.org/view=#1}{\emph{#1}}} +\newcommand{\BIOpkg}[1]{\href{https://www.bioconductor.org/packages/release/bioc/html/#1.html}{\pkg{#1}}} + +% Example environments --------------------------------------------------------- +\RequirePackage{fancyvrb} +\RequirePackage{alltt} + +\DefineVerbatimEnvironment{example}{Verbatim}{} +\renewenvironment{example*}{\begin{alltt}}{\end{alltt}} + +% Support for output from Sweave, and generic session style code +% These used to have fontshape=sl for Sinput/Scode/Sin, but pslatex +% won't use a condensed font in that case. 
+ +% Update (2015-05-28 by DS): remove fontsize=\small to match example environment + +\DefineVerbatimEnvironment{Sinput}{Verbatim}{} +\DefineVerbatimEnvironment{Soutput}{Verbatim}{} +\DefineVerbatimEnvironment{Scode}{Verbatim}{} +\DefineVerbatimEnvironment{Sin}{Verbatim}{} +\DefineVerbatimEnvironment{Sout}{Verbatim}{} +\newenvironment{Schunk}{}{} + +% Mathematics ------------------------------------------------------------------ + +% The implementation of |\operatorname| is similar to the mechanism +% \LaTeXe{} uses for functions like sin and cos, and simpler than the +% one of \AmSLaTeX{}. We use |\providecommand| for the definition in +% order to keep the one of the \pkg{amstex} if this package has +% already been loaded. +% \begin{macrocode} +\providecommand{\operatorname}[1]{% + \mathop{\operator@font#1}\nolimits} +\RequirePackage{amsfonts} + +\renewcommand{\P}{% + \mathop{\operator@font I\hspace{-1.5pt}P\hspace{.13pt}}} +\newcommand{\E}{% + \mathop{\operator@font I\hspace{-1.5pt}E\hspace{.13pt}}} +\newcommand{\VAR}{\operatorname{var}} +\newcommand{\COV}{\operatorname{cov}} +\newcommand{\COR}{\operatorname{cor}} + +% Figures ---------------------------------------------------------------------- + +% For use with pandoc > 3.2.1 +\newsavebox\pandoc@box +\newcommand*\pandocbounded[1]{% scales image to fit in text height/width + \sbox\pandoc@box{#1}% + \Gscale@div\@tempa{\textheight}{\dimexpr\ht\pandoc@box+\dp\pandoc@box\relax}% + \Gscale@div\@tempb{\linewidth}{\wd\pandoc@box}% + \ifdim\@tempb\p@<\@tempa\p@\let\@tempa\@tempb\fi% select the smaller of both + \ifdim\@tempa\p@<\p@\scalebox{\@tempa}{\usebox\pandoc@box}% + \else\usebox{\pandoc@box}% + \fi% +} + +\RequirePackage[font=small,labelfont=bf]{caption} + +% Wide environments for figures and tables ------------------------------------- +\RequirePackage{environ} + +% An easy way to make a figure span the full width of the page +\NewEnviron{widefigure}[1][]{ +\begin{figure}[#1] +\advance\leftskip-2cm 
+\begin{minipage}{\dimexpr\textwidth+4cm\relax}% + \captionsetup{margin=2cm} + \BODY +\end{minipage}% +\end{figure} +} + +\NewEnviron{widetable}[1][]{ +\begin{table}[#1] +\advance\leftskip-2cm +\begin{minipage}{\dimexpr\textwidth+4cm\relax}% + \captionsetup{margin=2cm} + \BODY +\end{minipage}% +\end{table} +} diff --git a/_articles/RJ-2025-039/RJreferences.bib b/_articles/RJ-2025-039/RJreferences.bib new file mode 100644 index 0000000000..f99a5d6c84 --- /dev/null +++ b/_articles/RJ-2025-039/RJreferences.bib @@ -0,0 +1,340 @@ +@article{AoPotentialStochasticDifferential2004, + title = {Potential in Stochastic Differential Equations: Novel Construction}, + author = {Ao, P.}, + year = {2004}, + journal = {Journal of Physics A: Mathematical and General}, + volume = {37}, + number = {3}, + pages = {L25--L30}, + publisher = {IOP Publishing}, + issn = {0305-4470}, + doi = {10.1088/0305-4470/37/3/L01} +} +@article{CuiEtAlMetaphorComputationConstructing2021, + title = {From Metaphor to Computation: Constructing the Potential Landscape for Multivariate Psychological Formal Models}, + author = {Cui, Jingmeng and Lichtwarck-Aschoff, Anna and Olthof, Merlijn and Li, Tiejun and Hasselman, Fred}, + year = {2023}, + journal = {Multivariate Behavioral Research}, + volume = {58}, + number = {4}, + pages = {743--761}, + doi = {10.1080/00273171.2022.2119927} +} +@article{dahiyaOrderedLineIntegral2018, + title = {Ordered Line Integral Methods for Computing the Quasi-Potential}, + author = {Dahiya, Daisy and Cameron, Maria}, + year = {2018}, + journal = {Journal of Scientific Computing}, + volume = {75}, + number = {3}, + pages = {1351--1384}, + issn = {1573-7691}, + doi = {10.1007/s10915-017-0590-9} +} +@article{dijkstra1959note, + title = {A Note on Two Problems in Connexion with Graphs}, + author = {Dijkstra, Edsger W}, + year = {1959}, + journal = {Numerische Mathematik}, + volume = {1}, + number = {1}, + pages = {269--271}, + publisher = {Springer}, + doi = {10.1007/BF01386390} +} 
+@book{ChaconDuongMultivariateKernelSmoothing2018, + title = {Multivariate Kernel Smoothing and Its Applications}, + author = {Chac{\'o}n, Jos{\'e} E. and Duong, Tarn}, + year = {2018}, + month = may, + publisher = {{Chapman and Hall/CRC}}, + address = {{New York}}, + doi = {10.1201/9780429485572}, + isbn = {978-0-429-48557-2} +} +@manual{EddelbuettelEtAlPkgdigestCreateCompact2021, + title = {{{digest}}: Create Compact Hash Digests of {{R}} Objects}, + author = {Eddelbuettel, Dirk and Lucas, Antoine and Tuszynski, Jarek and Bengtsson, Henrik and Urbanek, Simon and Frasca, Mario and Lewis, Bryan and Stokely, Murray and Muehleisen, Hannes and Murdoch, Duncan and Hester, Jim and Wu, Wush and Kou, Qiang and Onkelinx, Thierry and Lang, Michel and Simko, Viliam and Hornik, Kurt and Neal, Radford and Bell, Kendon and de Queljoe, Matthew and Suruceanu, Ion and Denney, Bill and Schumacher, Dirk and Chang, Winston}, + year = {2021}, + doi = {10.32614/CRAN.package.digest} +} +@article{EVanden-EijndenTransitionpathTheoryPathfinding2010, + title = {Transition-Path Theory and Path-Finding Algorithms for the Study of Rare Events}, + author = {E, Weinan and Vanden-Eijnden, Eric}, + year = {2010}, + journal = {Annual Review of Physical Chemistry}, + volume = {61}, + number = {1}, + pages = {391--420}, + issn = {0066-426X}, + doi = {10.1146/annurev.physchem.040808.090412} +} +@book{FreidlinWentzellRandomPerturbationsDynamical2012, + title = {Random Perturbations of Dynamical Systems}, + author = {Freidlin, M.I. 
and Wentzell, A.D.}, + year = {2012}, + edition = {3rd}, + publisher = {Springer}, + address = {Berlin}, + doi = {10.1007/978-3-642-25847-3} +} +@article{HeymannVanden-EijndenPathwaysMaximumLikelihood2008, + title = {Pathways of Maximum Likelihood for Rare Events in Nonequilibrium Systems: Application to Nucleation in the Presence of Shear}, + author = {Heymann, Matthias and Vanden-Eijnden, Eric}, + year = {2008}, + journal = {Physical Review Letters}, + volume = {100}, + number = {14}, + pages = {140601}, + publisher = {American Physical Society}, + doi = {10.1103/PhysRevLett.100.140601} +} +@article{KaneEtAlScalableStrategiesComputing2013, + title = {Scalable Strategies for Computing with Massive Data}, + author = {Kane, Michael J. and Emerson, John and Weston, Stephen}, + year = {2013}, + journal = {Journal of Statistical Software}, + volume = {55}, + number = {14}, + pages = {1--19}, + doi = {10.18637/jss.v055.i14} +} +@article{KloekvanDijkBayesianEstimatesEquation1978, + title = {{Bayesian} Estimates of Equation System Parameters: An Application of Integration by {{Monte Carlo}}}, + author = {Kloek, T. and {van Dijk}, H. K.}, + year = {1978}, + journal = {Econometrica}, + volume = {46}, + number = {1}, + pages = {1--19}, + publisher = {Wiley, Econometric Society}, + issn = {0012-9682}, + doi = {10.2307/1913641} +} +@article{lamotheLinkingBallandcupAnalogy2019, + title = {Linking the Ball-and-Cup Analogy and Ordination Trajectories to Describe Ecosystem Stability, Resistance, and Resilience}, + author = {Lamothe, Karl A. and Somers, Keith M. and Jackson, Donald A.}, + year = {2019}, + journal = {Ecosphere}, + volume = {10}, + number = {3}, + pages = {e02629}, + issn = {2150-8925}, + doi = {10.1002/ecs2.2629} +} +@article{MooreEtAlQPotPackageStochastic2016, + author = {Christopher M. Moore and Christopher R. Stieha and Ben C. Nolting and Maria K. Cameron and Karen C. 
Abbott}, + title = {{{QPot}}: An {{R}} Package for Stochastic Differential Equation Quasi-Potential Analysis}, + year = {2016}, + journal = {{The R Journal}}, + doi = {10.32614/RJ-2016-031}, + pages = {19--38}, + volume = {8}, + number = {2} +} +@article{olthofComplexityTheoryPsychopathology2020, + title = {Complexity Theory of Psychopathology}, + author = {Olthof, Merlijn and Hasselman, Fred and Oude Maatman, Freek and Bosman, Anna Maria Theodora and Lichtwarck-Aschoff, Anna}, + journal = {Journal of Psychopathology and Clinical Science}, + volume = {132}, + number = {3}, + pages = {314--323}, + year = {2023}, + doi = {10.1037/abn0000740} +} +@article{PlummerEtAlCODAConvergenceDiagnosis2006, + title = {CODA: Convergence Diagnosis and Output Analysis for MCMC}, + author = {Plummer, Martyn and Best, Nicky and Cowles, Kate and Vines, Karen}, + year = {2006}, + journal = {R News}, + volume = {6}, + number = {1}, + pages = {7--11}, + url = {https://journal.r-project.org/articles/RN-2006-002/RN-2006-002.pdf} +} +@article{rodriguez-sanchezClimbingEscherStairs2020, + title = {Climbing {Escher}'s Stairs: A Way to Approximate Stability Landscapes in Multidimensional Systems}, + author = {Rodríguez-Sánchez, Pablo and {van Nes}, Egbert H. 
and Scheffer, Marten}, + year = {2020}, + journal = {PLOS Computational Biology}, + volume = {16}, + number = {4}, + pages = {e1007788}, + publisher = {Public Library of Science}, + issn = {1553-7358}, + doi = {10.1371/journal.pcbi.1007788} +} +@misc{rodriguez-sanchezPabRodRolldownPostpublication2020, + title = {PabRod/Rolldown: Post-publication Update}, + author = {Rodríguez-Sánchez, Pablo}, + year = {2020}, + doi = {10.5281/zenodo.3763038} +} +@book{SievertInteractiveWebbasedData2020, + title = {Interactive Web-Based Data Visualization with {{R}}, Plotly, and Shiny}, + author = {Sievert, Carson}, + year = {2020}, + publisher = {{Chapman and Hall/CRC}}, + url = {https://plotly-r.com}, + isbn = {978-1-138-33145-7} +} +@book{WaddingtonPrinciplesDevelopmentDifferentiation1966, + title = {Principles of Development and Differentiation}, + author = {Waddington, Conrad Hal}, + year = {1966}, + publisher = {Macmillan}, + address = {New York}, + abstract = {AGRICULTURAL SCIENCE AND TECHNOLOGY INFORMATION}, + language = {english} +} +@article{wangPotentialLandscapeFlux2008, + title = {Potential Landscape and Flux Framework of Nonequilibrium Networks: Robustness, Dissipation, and Coherence of Biochemical Oscillations}, + author = {Wang, Jin and Xu, Li and Wang, Erkang}, + year = {2008}, + journal = {Proceedings of the National Academy of Sciences}, + volume = {105}, + number = {34}, + pages = {12271--12276}, + publisher = {National Academy of Sciences}, + issn = {0027-8424}, + doi = {10.1073/pnas.0800579105}, + eprint = {18719111}, + eprinttype = {pmid} +} +@article{wangQuantifyingWaddingtonLandscape2011, + title = {Quantifying the {Waddington} Landscape and Biological Paths for Development and Differentiation}, + author = {Wang, Jin and Zhang, Kun and Xu, Li and Wang, Erkang}, + year = {2011}, + journal = {Proceedings of the National Academy of Sciences}, + volume = {108}, + number = {20}, + pages = {8257--8262}, + publisher = {National Academy of Sciences}, + issn = 
{0027-8424}, + doi = {10.1073/pnas.1017017108}, + eprint = {21536909}, + eprinttype = {pmid} +} +@book{WickhamGgplot2ElegantGraphics2016, + title = {{{ggplot2}}: Elegant Graphics for Data Analysis}, + author = {Wickham, Hadley}, + year = {2016}, + publisher = {Springer-Verlag New York}, + url = {https://ggplot2.tidyverse.org}, + isbn = {978-3-319-24277-4} +} +@incollection{WickhamImprovingPerformance2019a, + title = {Improving Performance}, + booktitle = {Advanced {{R}}}, + author = {Wickham, Hadley}, + year = {2019}, + edition = {2nd}, + pages = {531--546}, + publisher = {{Chapman and Hall/CRC}}, + address = {Boca Raton}, + url = {https://adv-r.hadley.nz/} +} +@article{zhouConstructionLandscapeMultistable2016, + title = {Construction of the Landscape for Multi-Stable Systems: Potential Landscape, Quasi-Potential, A-type Integral and Beyond}, + author = {Zhou, Peijie and Li, Tiejun}, + year = {2016}, + journal = {The Journal of Chemical Physics}, + volume = {144}, + number = {9}, + pages = {094109}, + publisher = {American Institute of Physics}, + issn = {0021-9606}, + doi = {10.1063/1.4943096} +} +@misc{CuiEtAlCommentsClimbingEscher2023, + title = {Comments on "{{Climbing {Escher}}'s Stairs}: {{A}} Way to Approximate Stability Landscapes in Multidimensional Systems"}, + shorttitle = {Comments on "{{Climbing {Escher}}'s Stairs}}, + author = {Cui, Jingmeng and {Lichtwarck-Aschoff}, Anna and Hasselman, Fred}, + year = {2023}, + month = dec, + publisher = {{arXiv}}, + doi = {10.48550/arXiv.2312.09690} +} +@article{StumpfEtAlModelingStemCell2021, + title = {Modeling Stem Cell Fates using Non-{{Markov}} Processes}, + author = {Stumpf, Patrick S. 
and Arai, Fumio and MacArthur, Ben D.}, + year = {2021}, + month = {02}, + date = {2021-02-04}, + journal = {Cell Stem Cell}, + pages = {187--190}, + volume = {28}, + number = {2}, + doi = {10.1016/j.stem.2021.01.009} +} +@article{GuidoumBoukhetalaPerformingParallelMonte2020, + title = {Performing Parallel {{Monte Carlo}} and Moment Equations Methods for {{It\^o}} and {{Stratonovich}} Stochastic Differential Systems: {{R}} Package {{Sim}}.{{DiffProc}}}, + shorttitle = {Performing Parallel Monte Carlo and Moment Equations Methods for It\^o and Stratonovich Stochastic Differential Systems}, + author = {Guidoum, Arsalane Chouaib and Boukhetala, Kamal}, + year = {2020}, + month = nov, + journal = {Journal of Statistical Software}, + volume = {96}, + pages = {1--82}, + issn = {1548-7660}, + doi = {10.18637/jss.v096.i02} +} +@article{robinaugh2024, + title = {Advancing the Network Theory of Mental Disorders: A Computational Model of Panic Disorder}, + author = {Robinaugh, Donald J. and Haslbeck, Jonas M. B. and Waldorp, Lourens J. and Kossakowski, Jolanda J. and Fried, Eiko I. and Millner, Alexander J. and McNally, Richard J. and Ryan, {Oisín} and de Ron, Jill and van der Maas, Han L. J. and van Nes, Egbert H. and Scheffer, Marten and Kendler, Kenneth S. 
and Borsboom, Denny}, + year = {2024}, + date = {2024}, + journal = {Psychological Review}, + pages = {1482--1508}, + volume = {131}, + number = {6}, + doi = {10.1037/rev0000515} +} +@article{RJ-2021-048, + title = {A Unifying Framework for Parallel and Distributed Processing in {{R}} Using Futures}, + volume = {13}, + DOI = {10.32614/RJ-2021-048}, + number = {2}, + journal = {The R Journal}, + author = {Bengtsson, Henrik}, + year = {2021}, + pages = {208--227} +} +@book{RubinoTuffinRareEventSimulation2009, + title = {Rare Event Simulation Using {{Monte Carlo}} Methods}, + author = {Rubino, Gerardo and Tuffin, Bruno}, + year = {2009}, + publisher = {{John Wiley \& Sons, Incorporated}}, + address = {{Newark, UNITED KINGDOM}}, + abstract = {Rare Event Simulation In a probabilistic model, a rare event is an event with a very small probability of occurrence. The forecasting of rare events is a formidable task but is important in many areas. For instance a catastrophic failure in a transport system or in a nuclear power plant, the failure of an information processing system in a bank, or in the communication network of a group of banks, leading to financial losses. Being able to evaluate the probability of rare events is therefore a critical issue. Monte Carlo Methods, the simulation of corresponding models, are used to analyze rare events. This book sets out to present the mathematical tools available for the efficient simulation of rare events. Importance sampling and splitting are presented along with an exposition of how to apply these tools to a variety of fields ranging from performance and dependability evaluation of complex systems, typically in computer science or in telecommunications, to chemical reaction analysis in biology or particle transport in physics. 
Graduate students, researchers and practitioners who wish to learn and apply rare event simulation techniques will find this book beneficial.}, + isbn = {978-0-470-74541-0}, + keywords = {System analysis -- Data processing}, + doi = {10.1002/9780470745403} +} +@article{hayesComplexSystemsApproach2020, + title = {A Complex Systems Approach to the Study of Change in Psychotherapy}, + author = {Hayes, Adele M. and Andrews, Leigh A.}, + year = {2020}, + month = jul, + journal = {BMC Medicine}, + volume = {18}, + number = {1}, + pages = {197}, + issn = {1741-7015}, + doi = {10.1186/s12916-020-01662-2}, + abstract = {A growing body of research highlights the limitations of traditional methods for studying the process of change in psychotherapy. The science of complex systems offers a useful paradigm for studying patterns of psychopathology and the development of more functional patterns in psychotherapy. Some basic principles of change are presented from subdisciplines of complexity science that are particularly relevant to psychotherapy: dynamical systems theory, synergetics, and network theory. Two early warning signs of system transition that have been identified across sciences (critical fluctuations and critical slowing) are also described. The network destabilization and transition (NDT) model of therapeutic change is presented as a conceptual framework to import these principles to psychotherapy research and to suggest future research directions.}, + langid = {english} +} +@article{WanEtAlFindingTransitionState2024, + title = {Finding Transition State and Minimum Energy Path of Bistable Elastic Continua through Energy Landscape Explorations}, + author = {Wan, Guangchao and Avis, Samuel J. 
and Wang, Zizheng and Wang, Xueju and Kusumaatmaja, Halim and Zhang, Teng}, + year = {2024}, + month = feb, + journal = {Journal of the Mechanics and Physics of Solids}, + volume = {183}, + pages = {105503}, + issn = {0022-5096}, + doi = {10.1016/j.jmps.2023.105503}, + abstract = {Mechanical bistable structures have two stable equilibria and can transit between them under external stimuli. Due to their unique behaviors such as snap-through and substantial shape changes, bistable structures exhibit unprecedented properties compared to conventional structures and thus have found applications in various fields such as soft robots, morphing wings and logic units. To quantitatively predict the performance of bistable structures in these applications, it is desirable to acquire information about the minimum energy barrier and an energy-efficient transition path between the two stable states. However, there is still a general lack of efficient methodologies to obtain this information, particularly for elastic continua with complicated, unintuitive transition paths. To overcome this challenge, here we integrate energy landscape exploration algorithms into finite element method (FEM). We first utilize the binary image transition state search (BITSS) method to identify the saddle point and then perform nudged elastic band (NEB) calculations with an initial guess based on the BITSS results to find the minimum energy path (MEP). This integrated strategy greatly helps the convergence of MEP calculations, which are highly nonlinear. Two representative cases are studied, including bistable buckled beams and a bistable unit of mechanical metamaterials, and the numerical results agree well with the previous works. 
Importantly, we numerically predict the complicated MEP of an asymmetric bistable unit of mechanical metamaterials and use experiments to demonstrate that following this MEP leads to successful transition between stable states while intuitive uniaxial compression fails to do so. Our work provides an effective numerical platform for identifying the minimum energy barrier and energy-efficient transition path of a bistable continuum, which can offer valuable guidance to the design of actuators, damping structures, energy harvesters, and mechanical metamaterials.}, + keywords = {Asymmetric transition path,Binary image transition state search,Bistable continuum structures,Energy barrier,Minimum energy path,Nudged elastic band} +} diff --git a/_articles/RJ-2025-039/RJwrapper.tex b/_articles/RJ-2025-039/RJwrapper.tex new file mode 100644 index 0000000000..f3363ac546 --- /dev/null +++ b/_articles/RJ-2025-039/RJwrapper.tex @@ -0,0 +1,70 @@ +\documentclass[a4paper]{report} +\usepackage[utf8]{inputenc} +\usepackage[T1]{fontenc} +\usepackage{RJournal} +\usepackage{amsmath,amssymb,array} +\usepackage{booktabs} + + +% tightlist command for lists without linebreak +\providecommand{\tightlist}{% + \setlength{\itemsep}{0pt}\setlength{\parskip}{0pt}} + +\usepackage{longtable} + +% Always define CSL refs as bib entries are contained in separate doc +% Pandoc citation processing +%From Pandoc 3.1.8 +% definitions for citeproc citations +\NewDocumentCommand\citeproctext{}{} +\NewDocumentCommand\citeproc{mm}{% + \begingroup\def\citeproctext{#2}\cite{#1}\endgroup} +\makeatletter + % allow citations to break across lines + \let\@cite@ofmt\@firstofone + % avoid brackets around text for \cite: + \def\@biblabel#1{} + \def\@cite#1#2{{#1\if@tempswa , #2\fi}} +\makeatother +\newlength{\cslhangindent} +\setlength{\cslhangindent}{1.5em} +\newlength{\csllabelwidth} +\setlength{\csllabelwidth}{3em} +\newenvironment{CSLReferences}[2] % #1 hanging-indent, #2 entry-spacing + {\begin{list}{}{% + 
\setlength{\itemindent}{0pt} + \setlength{\leftmargin}{0pt} + \setlength{\parsep}{0pt} + % turn on hanging indent if param 1 is 1 + \ifodd #1 + \setlength{\leftmargin}{\cslhangindent} + \setlength{\itemindent}{-1\cslhangindent} + \fi + % set entry spacing + \setlength{\itemsep}{#2\baselineskip}}} + {\end{list}} +\usepackage{calc} +\newcommand{\CSLBlock}[1]{#1\hfill\break} +\newcommand{\CSLLeftMargin}[1]{\parbox[t]{\csllabelwidth}{#1}} +\newcommand{\CSLRightInline}[1]{\parbox[t]{\linewidth - \csllabelwidth}{#1}\break} +\newcommand{\CSLIndent}[1]{\hspace{\cslhangindent}#1} + + + +\begin{document} + + +%% do not edit, for illustration only +\sectionhead{Contributed research article} +\volume{17} +\volnumber{4} +\year{2025} +\month{December} +\setcounter{page}{173} + +\begin{article} + \input{RJ-2025-039} +\end{article} + + +\end{document} diff --git a/_articles/RJ-2025-039/figures/3dstatic_gene.png b/_articles/RJ-2025-039/figures/3dstatic_gene.png new file mode 100644 index 0000000000..3ec558aabe Binary files /dev/null and b/_articles/RJ-2025-039/figures/3dstatic_gene.png differ diff --git a/_articles/RJ-2025-039/figures/diagram-e1.png b/_articles/RJ-2025-039/figures/diagram-e1.png new file mode 100644 index 0000000000..45c84a2da4 Binary files /dev/null and b/_articles/RJ-2025-039/figures/diagram-e1.png differ diff --git a/_articles/RJ-2025-039/figures/diagram-e2.png b/_articles/RJ-2025-039/figures/diagram-e2.png new file mode 100644 index 0000000000..b08877a7a3 Binary files /dev/null and b/_articles/RJ-2025-039/figures/diagram-e2.png differ diff --git a/_articles/RJ-2025-039/figures/diagram.png b/_articles/RJ-2025-039/figures/diagram.png new file mode 100644 index 0000000000..64165bf10a Binary files /dev/null and b/_articles/RJ-2025-039/figures/diagram.png differ diff --git a/_articles/RJ-2025-039/figures/metaphor.png b/_articles/RJ-2025-039/figures/metaphor.png new file mode 100644 index 0000000000..ead33fefd8 Binary files /dev/null and 
b/_articles/RJ-2025-039/figures/metaphor.png differ diff --git a/_articles/RJ-2025-039/figures/wang2011.png b/_articles/RJ-2025-039/figures/wang2011.png new file mode 100644 index 0000000000..2a6f1ad84c Binary files /dev/null and b/_articles/RJ-2025-039/figures/wang2011.png differ diff --git a/_articles/RJ-2025-040/RJ-2025-040.R b/_articles/RJ-2025-040/RJ-2025-040.R new file mode 100644 index 0000000000..7bce91af65 --- /dev/null +++ b/_articles/RJ-2025-040/RJ-2025-040.R @@ -0,0 +1,506 @@ +# Generated by `rjournal_pdf_article()` using `knitr::purl()`: do not edit by hand +# Please edit RJ-2025-040.Rmd to modify this file + +## ----setup, include=FALSE----------------------------------------------------- +knitr::opts_chunk$set(echo = FALSE, warning = FALSE, message = FALSE) #, cache = TRUE) +options(tinytex.verbose = TRUE) +library(rvif) +library(knitr) +library(kableExtra) +library(memisc) # mtable + + +## ----WisseltableHTML, eval = knitr::is_html_output()-------------------------- +# WisselTABLE = Wissel[,-3] +# knitr::kable(WisselTABLE, format = "html", caption = "Data set presented previously by @Wissell", align="cccccc", digits = 3) + + +## ----WisseltableLATEX, eval = knitr::is_latex_output()------------------------ +WisselTABLE = Wissel[,-3] +knitr::kable(WisselTABLE, format = "latex", booktabs = TRUE, caption = "Data set presented previously by Wissell", align="cccccc", digits = 3)%>% +kable_styling(latex_options = "scale_down") + + +## ----Wisselregression--------------------------------------------------------- +attach(Wissel) +obs = nrow(Wissel) +regWISSEL0 = lm(D~C+I+CP) + regWISSEL0coef = as.double(regWISSEL0$coefficients) + regWISSEL0se = as.double(summary(regWISSEL0)[[4]][,2]) + regWISSEL0texp = as.double(summary(regWISSEL0)[[4]][,3]) + regWISSEL0pvalue = as.double(summary(regWISSEL0)[[4]][,4]) + regWISSEL0sigma2 = as.double(summary(regWISSEL0)[[6]]^2) + regWISSEL0R2 = as.double(summary(regWISSEL0)[[8]]) + regWISSEL0Fexp = 
as.double(summary(regWISSEL0)[[10]][[1]]) + regWISSEL0table = data.frame(c(regWISSEL0coef, obs), + c(regWISSEL0se, regWISSEL0sigma2), + c(regWISSEL0texp, regWISSEL0R2), + c(regWISSEL0pvalue, regWISSEL0Fexp)) + regWISSEL0table = round(regWISSEL0table, digits=4) + colnames(regWISSEL0table) =c("Estimator", "Standard Error", "Experimental t", "p-value") + rownames(regWISSEL0table) =c("Intercept", "Personal consumption", "Personal income", "Outstanding consumer credit", "(Obs, Sigma Est., Coef. Det., F exp.)") +regWISSEL1 = lm(D~C) + regWISSEL1coef = as.double(regWISSEL1$coefficients) + regWISSEL1se = as.double(summary(regWISSEL1)[[4]][,2]) + regWISSEL1texp = as.double(summary(regWISSEL1)[[4]][,3]) + regWISSEL1pvalue = as.double(summary(regWISSEL1)[[4]][,4]) + regWISSEL1sigma2 = as.double(summary(regWISSEL1)[[6]]^2) + regWISSEL1R2 = as.double(summary(regWISSEL1)[[8]]) + regWISSEL1Fexp = as.double(summary(regWISSEL1)[[10]][[1]]) + regWISSEL1table = data.frame(c(regWISSEL1coef, obs), + c(regWISSEL1se, regWISSEL1sigma2), + c(regWISSEL1texp, regWISSEL1R2), + c(regWISSEL1pvalue, regWISSEL1Fexp)) + regWISSEL1table = round(regWISSEL1table, digits=4) + colnames(regWISSEL1table) =c("Estimator", "Standard Error", "Experimental t", "p-value") + rownames(regWISSEL1table) =c("Intercept", "Personal consumption", "(Obs, Sigma Est., Coef. 
Det., F exp.)") +regWISSEL2 = lm(D~C+I) + regWISSEL2coef = as.double(regWISSEL2$coefficients) + regWISSEL2se = as.double(summary(regWISSEL2)[[4]][,2]) + regWISSEL2texp = as.double(summary(regWISSEL2)[[4]][,3]) + regWISSEL2pvalue = as.double(summary(regWISSEL2)[[4]][,4]) + regWISSEL2sigma2 = as.double(summary(regWISSEL2)[[6]]^2) + regWISSEL2R2 = as.double(summary(regWISSEL2)[[8]]) + regWISSEL2Fexp = as.double(summary(regWISSEL2)[[10]][[1]]) + regWISSEL2table = data.frame(c(regWISSEL2coef, obs), + c(regWISSEL2se, regWISSEL2sigma2), + c(regWISSEL2texp, regWISSEL2R2), + c(regWISSEL2pvalue, regWISSEL2Fexp)) + regWISSEL2table = round(regWISSEL2table, digits=4) + colnames(regWISSEL2table) =c("Estimator", "Standard Error", "Experimental t", "p-value") + rownames(regWISSEL2table) =c("Intercept", "Personal consumption", "Personal income", "(Obs, Sigma Est., Coef. Det., F exp.)") +regWISSEL3 = lm(D~C+CP) + regWISSEL3coef = as.double(regWISSEL3$coefficients) + regWISSEL3se = as.double(summary(regWISSEL3)[[4]][,2]) + regWISSEL3texp = as.double(summary(regWISSEL3)[[4]][,3]) + regWISSEL3pvalue = as.double(summary(regWISSEL3)[[4]][,4]) + regWISSEL3sigma2 = as.double(summary(regWISSEL3)[[6]]^2) + regWISSEL3R2 = as.double(summary(regWISSEL3)[[8]]) + regWISSEL3Fexp = as.double(summary(regWISSEL3)[[10]][[1]]) + regWISSEL3table = data.frame(c(regWISSEL3coef, obs), + c(regWISSEL3se, regWISSEL3sigma2), + c(regWISSEL3texp, regWISSEL3R2), + c(regWISSEL3pvalue, regWISSEL3Fexp)) + regWISSEL3table = round(regWISSEL3table, digits=4) + colnames(regWISSEL3table) =c("Estimator", "Standard Error", "Experimental t", "p-value") + rownames(regWISSEL3table) =c("Intercept", "Personal consumption", "Outstanding consumer credit", "(Obs, Sigma Est., Coef. 
Det., F exp.)") + + +## ----Wissel0tableHTML, eval = knitr::is_html_output()------------------------- +# knitr::kable(regWISSEL0table, format = "html", caption = "OLS estimation for the Wissel model", align="cccc", digits = 3) + + +## ----Wissel0tableLATEX, eval = knitr::is_latex_output()----------------------- +knitr::kable(regWISSEL0table, format = "latex", booktabs = TRUE, caption = "OLS estimation for the Wissel model", align="cccc", digits = 3) %>% +kable_styling(latex_options = "scale_down") + + +## ----Wissel1tableHTML, eval = knitr::is_html_output()------------------------- +# knitr::kable(regWISSEL1table, format = "html", caption = "OLS estimation for part of the Wissel model", align="cccc", digits = 3) + + +## ----Wissel1tableLATEX, eval = knitr::is_latex_output()----------------------- +knitr::kable(regWISSEL1table, format = "latex", booktabs = TRUE, caption = "OLS estimation for part of the Wissel model", align="cccc", digits = 3) %>% +kable_styling(latex_options = "scale_down") + + +## ----Wissel2tableHTML, eval = knitr::is_html_output()------------------------- +# knitr::kable(regWISSEL2table, format = "html", caption = "OLS estimation for part of the Wissel model", align="cccc", digits = 3) + + +## ----Wissel2tableLATEX, eval = knitr::is_latex_output()----------------------- +knitr::kable(regWISSEL2table, format = "latex", booktabs = TRUE, caption = "OLS estimation for part of the Wissel model", align="cccc", digits = 3) %>% +kable_styling(latex_options = "scale_down") + + +## ----Wissel3tableHTML, eval = knitr::is_html_output()------------------------- +# knitr::kable(regWISSEL3table, format = "html", caption = "OLS estimation for part of the Wissel model", align="cccc", digits = 3) + + +## ----Wissel3tableLATEX, eval = knitr::is_latex_output()----------------------- +knitr::kable(regWISSEL3table, format = "latex", booktabs = TRUE, caption = "OLS estimation for part of the Wissel model", align="cccc", digits = 3)%>% +kable_styling(latex_options = 
"scale_down") + + +## ----WisselORTO--------------------------------------------------------------- +y = Wissel[,2] +X = as.matrix(Wissel[,3:6]) +Xqr=qr(X) +Xo = qr.Q(Xqr) +regORTO = lm(y~Xo+0) +#summary(regORTO) + regORTOcoef = as.double(regORTO$coefficients) + regORTOse = as.double(summary(regORTO)[[4]][,2]) + regORTOtexp = as.double(summary(regORTO)[[4]][,3]) + regORTOpvalue = as.double(summary(regORTO)[[4]][,4]) + regORTOsigma2 = as.double(summary(regORTO)[[6]]^2) + regORTOR2 = as.double(summary(regORTO)[[8]]) # as I have removed the intercept in the regression, this does not calculate it well + regORTOFexp = as.double(summary(regORTO)[[10]][[1]]) # as I have removed the intercept in the regression, this does not calculate it well + regORTOtable = data.frame(c(regORTOcoef, obs), + c(regORTOse, regORTOsigma2), + c(regORTOtexp, 0.9235), + c(regORTOpvalue, 52.3047)) + regORTOtable = round(regORTOtable, digits=4) + colnames(regORTOtable) =c("Estimator", "Standard Error", "Experimental t", "p-value") + rownames(regORTOtable) =c("Intercept", "Personal consumption", "Personal income", "Outstanding consumer credit", "(Obs, Sigma Est., Coef. 
Det., F exp.)") + + +## ----WisselORTOtableHTML, eval = knitr::is_html_output()---------------------- +# knitr::kable(regORTOtable, format = "html", caption = "OLS estimation for the orthonormal Wissel model", align="cccc", digits = 3) + + +## ----WisselORTOtableLATEX, eval = knitr::is_latex_output()-------------------- +knitr::kable(regORTOtable, format = "latex", booktabs = TRUE, caption = "OLS estimation for the orthonormal Wissel model", align="cccc", digits = 3)%>% +kable_styling(latex_options = "scale_down") + + +## ----KGtableHTML, eval = knitr::is_html_output()------------------------------ +# data(KG) +# KGtable = KG +# colnames(KGtable) = c("Consumption", "Wage income", "Non-farm income", "Farm income") +# knitr::kable(KGtable, format = "html", caption = "Data set presented previously by @KleinGoldberger", align="cccc", digits = 3) + + +## ----KGtableLATEX, eval = knitr::is_latex_output()---------------------------- +data(KG) +KGtable = KG +colnames(KGtable) = c("Consumption", "Wage income", "Non-farm income", "Farm income") +knitr::kable(KGtable, format = "latex", booktabs = TRUE, caption = "Data set presented previously by Klein and Goldberger", align="cccc", digits = 3)%>% +kable_styling(latex_options = "scale_down") + + +## ----KGregression------------------------------------------------------------- +attach(KG) +obs = nrow(KG) +regKG = lm(consumption~wage.income+non.farm.income+farm.income) + regKGcoef = as.double(regKG$coefficients) + regKGse = as.double(summary(regKG)[[4]][,2]) + regKGtexp = as.double(summary(regKG)[[4]][,3]) + regKGpvalue = as.double(summary(regKG)[[4]][,4]) + regKGsigma2 = as.double(summary(regKG)[[6]]^2) + regKGR2 = as.double(summary(regKG)[[8]]) + regKGFexp = as.double(summary(regKG)[[10]][[1]]) + regKGtable = data.frame(c(regKGcoef, obs), + c(regKGse, regKGsigma2), + c(regKGtexp, regKGR2), + c(regKGpvalue, regKGFexp)) + regKGtable = round(regKGtable, digits=4) + colnames(regKGtable) =c("Estimator", "Standard Error", 
"Experimental t", "p-value") + rownames(regKGtable) =c("Intercept", "Wage income", "Non-farm income", "Farm income", "(Obs, Sigma Est., Coef. Det., F exp.)") + + +## ----regKGtableHTML, eval = knitr::is_html_output()--------------------------- +# knitr::kable(regKGtable, format = "html", caption = "OLS estimation for the Klein and Goldberger model", align="cccc", digits = 3) + + +## ----regKGtableLATEX, eval = knitr::is_latex_output()------------------------- +knitr::kable(regKGtable, format = "latex", booktabs = TRUE, caption = "OLS estimation for the Klein and Goldberger model", align="cccc", digits = 3)%>% +kable_styling(latex_options = "scale_down") + + +## ----THEOREM------------------------------------------------------------------ +y = Wissel[,2] +X = as.matrix(Wissel[,3:6]) +theoremWISSEL = multicollinearity(y, X) +rownames(theoremWISSEL) = c("Intercept", "Personal consumption", "Personal income", "Outstanding consumer credit") + +y = KG[,1] +cte = rep(1, length(y)) +X = as.matrix(cbind(cte, KG[,-1])) +theoremKG = multicollinearity(y, X) +rownames(theoremKG) = c("Intercept", "Wage income", "Non-farm income", "Farm income") + + +## ----theoremWISSELtableHTML, eval = knitr::is_html_output()------------------- +# knitr::kable(theoremWISSEL, format = "html", caption = "Theorem results of the Wissel model", align="ccccc", digits = 6) + + +## ----theoremWISSELtableLATEX, eval = knitr::is_latex_output()----------------- +knitr::kable(theoremWISSEL, format = "latex", booktabs = TRUE, caption = "Theorem results of the Wissel model", align="ccccc", digits = 6) %>% +kable_styling(latex_options = "scale_down") + + +## ----theoremKGtableHTML, eval = knitr::is_html_output()----------------------- +# knitr::kable(theoremKG, format = "html", caption = "Theorem results of the Klein and Goldberger model", align="ccccc", digits = 6) + + +## ----theoremKGtableLATEX, eval = knitr::is_latex_output()--------------------- +knitr::kable(theoremKG, format = "latex", booktabs = TRUE, 
caption = "Theorem results of the Klein and Goldberger model", align="ccccc", digits = 6) %>% +kable_styling(latex_options = "scale_down") + + +## ----PAPER13, echo=TRUE------------------------------------------------------- +y_W = Wissel[,2] +X_W = Wissel[,3:6] +multicollinearity(y_W, X_W) + + +## ----PAPER14, echo=TRUE------------------------------------------------------- +y_KG = KG[,1] +cte = rep(1, length(y)) +X_KG = cbind(cte, KG[,2:4]) +multicollinearity(y_KG, X_KG) + + +## ----PAPER15, echo=TRUE------------------------------------------------------- +multicollinearity(y_W, X_W, alpha = 0.01) +multicollinearity(y_KG, X_KG, alpha = 0.01) + + +## ----SAMPLE-SIZE 1------------------------------------------------------------ +## Simulation 1 +set.seed(2024) +obs = 3000 # no individual significance test is affected +cte = rep(1, obs) +x2 = rnorm(obs, 5, 0.1) # related to intercept: non essential +x3 = rnorm(obs, 5, 10) +x4 = x3 + rnorm(obs, 5, 0.5) # related to x3: essential +x5 = rnorm(obs, -1, 3) +x6 = rnorm(obs, 15, 2.5) +y = 4 + 5*x2 - 9*x3 -2*x4 + 2*x5 + 7*x6 + rnorm(obs, 0, 2) +X = cbind(cte, x2, x3, x4, x5, x6) +theoremSIMULATION1 = multicollinearity(y, X) +rownames(theoremSIMULATION1) = c("Intercept", "X2", "X3", "X4", "X5", "X6") +vifsSIMULATION1 = VIF(X) +cnSIMULATION1 = CN(X) +cvsSIMULATION1 = CVs(X) + +## Simulation 2 +obs = 100 # decreasing the number of observations affected to intercept +cte = rep(1, obs) +x2 = rnorm(obs, 5, 0.1) # related to intercept: non essential +x3 = rnorm(obs, 5, 10) +x4 = x3 + rnorm(obs, 5, 0.5) # related to x3: essential +x5 = rnorm(obs, -1, 3) +x6 = rnorm(obs, 15, 2.5) +y = 4 + 5*x2 - 9*x3 -2*x4 + 2*x5 + 7*x6 + rnorm(obs, 0, 2) +X = cbind(cte, x2, x3, x4, x5, x6) +theoremSIMULATION2 = multicollinearity(y, X) +rownames(theoremSIMULATION2) = c("Intercept", "X2", "X3", "X4", "X5", "X6") +vifsSIMULATION2 = VIF(X) +cnSIMULATION2 = CN(X) +cvsSIMULATION2 = CVs(X) + +## Simulation 3 +obs = 30 # decreasing the number of 
observations affected to intercept, x2 and x4 +cte = rep(1, obs) +x2 = rnorm(obs, 5, 0.1) # related to intercept: non essential +x3 = rnorm(obs, 5, 10) +x4 = x3 + rnorm(obs, 5, 0.5) # related to x3: essential +x5 = rnorm(obs, -1, 3) +x6 = rnorm(obs, 15, 2.5) +y = 4 + 5*x2 - 9*x3 -2*x4 + 2*x5 + 7*x6 + rnorm(obs, 0, 2) +X = cbind(cte, x2, x3, x4, x5, x6) +theoremSIMULATION3 = multicollinearity(y, X) +rownames(theoremSIMULATION3) = c("Intercept", "X2", "X3", "X4", "X5", "X6") +vifsSIMULATION3 = VIF(X) +cnSIMULATION3 = CN(X) +cvsSIMULATION3 = CVs(X) + + +## ----traditionalSIMULATIONtableHTML, eval = knitr::is_html_output()----------- +# traditionalSIMULATION = data.frame(c(cvsSIMULATION1, vifsSIMULATION1, cnSIMULATION1), +# c(cvsSIMULATION2, vifsSIMULATION2, cnSIMULATION2), +# c(cvsSIMULATION3, vifsSIMULATION3, cnSIMULATION3)) +# rownames(traditionalSIMULATION) = c("X2 CV", "X3 CV", "X4 CV", "X5 CV", "X6 CV", "X2 VIF", "X3 VIF", "X4 VIF", "X5 VIF", "X6 VIF", "CN") +# colnames(traditionalSIMULATION) = c("Simulation 1", "Simulation 2", "Simulation 3") +# knitr::kable(traditionalSIMULATION, format = "html", caption = "CVs, VIFs for data of Simulations 1, 2 and 3", align="cccccc", digits = 3) + + +## ----traditionalSIMULATIONtableLATEX, eval = knitr::is_latex_output()--------- +traditionalSIMULATION = data.frame(c(cvsSIMULATION1, vifsSIMULATION1, cnSIMULATION1), + c(cvsSIMULATION2, vifsSIMULATION2, cnSIMULATION2), + c(cvsSIMULATION3, vifsSIMULATION3, cnSIMULATION3)) +rownames(traditionalSIMULATION) = c("X2 CV", "X3 CV", "X4 CV", "X5 CV", "X6 CV", "X2 VIF", "X3 VIF", "X4 VIF", "X5 VIF", "X6 VIF", "CN") +colnames(traditionalSIMULATION) = c("Simulation 1", "Simulation 2", "Simulation 3") +knitr::kable(traditionalSIMULATION, format = "latex", booktabs = TRUE, caption = "CVs, VIFs and CN for data of Simulations 1, 2 and 3", align="cccccc", digits = 3) %>% +kable_styling(latex_options = "scale_down") + + +## ----theoremSIMULATION1tableHTML, eval = 
knitr::is_html_output()-------------- +# knitr::kable(theoremSIMULATION1, format = "html", caption = "Theorem results of the Simulation 1 model", align="cccccc", digits = 6) + + +## ----theoremSIMULATION1tableLATEX, eval = knitr::is_latex_output()------------ +knitr::kable(theoremSIMULATION1, format = "latex", booktabs = TRUE, caption = "Theorem results of the Simulation 1 model", align="cccccc", digits = 6) %>% +kable_styling(latex_options = "scale_down") + + +## ----theoremSIMULATION2tableHTML, eval = knitr::is_html_output()-------------- +# knitr::kable(theoremSIMULATION2, format = "html", caption = "Theorem results of the Simulation 2 model", align="cccccc", digits = 6) + + +## ----theoremSIMULATION2tableLATEX, eval = knitr::is_latex_output()------------ +knitr::kable(theoremSIMULATION2, format = "latex", booktabs = TRUE, caption = "Theorem results of the Simulation 2 model", align="cccccc", digits = 6) %>% +kable_styling(latex_options = "scale_down") + + +## ----theoremSIMULATION3tableHTML, eval = knitr::is_html_output()-------------- +# knitr::kable(theoremSIMULATION3, format = "html", caption = "Theorem results of the Simulation 3 model", align="cccccc", digits = 6) + + +## ----theoremSIMULATION3tableLATEX, eval = knitr::is_latex_output()------------ +knitr::kable(theoremSIMULATION3, format = "latex", booktabs = TRUE, caption = "Theorem results of the Simulation 3 model", align="cccccc", digits = 6) %>% +kable_styling(latex_options = "scale_down") + + +## ----Choice1------------------------------------------------------------------ +P = CDpf[,1] +cte = CDpf[,2] +K = CDpf[,3] +W = CDpf[,4] + +data2 = cbind(cte, K, W) +th2 = multicollinearity(P, data2) +rownames(th2) = c("Intercept", "Capital", "Work") + + +## ----theoremCHOICE1tableHTML, eval = knitr::is_html_output()------------------ +# knitr::kable(th2, format = "html", caption = "Theorem results of the Example 2 of @Salmeron2024a", align="cccccc", digits = 6) + + +## ----theoremCHOICE1tableLATEX, eval = 
knitr::is_latex_output()---------------- +knitr::kable(th2, format = "latex", booktabs = TRUE, caption = "Theorem results of the Example 2 of Salmerón et al. (2025)", align="cccccc", digits = 6) %>% +kable_styling(latex_options = "scale_down") + + +## ----Choice2------------------------------------------------------------------ +data2 = cbind(cte, W, K) +th2 = multicollinearity(P, data2) +rownames(th2) = c("Intercept", "Work", "Capital") + + +## ----theoremCHOICE2tableHTML, eval = knitr::is_html_output()------------------ +# knitr::kable(th2, format = "html", caption = "Theorem results of the Example 2 of @Salmeron2024a (reordination 2)", align="cccccc", digits = 6) + + +## ----theoremCHOICE2tableLATEX, eval = knitr::is_latex_output()---------------- +knitr::kable(th2, format = "latex", booktabs = TRUE, caption = "Theorem results of the Example 2 of Salmerón et al. (2025) (reordination 2)", align="cccccc", digits = 6) %>% +kable_styling(latex_options = "scale_down") + + +## ----Choice7, eval=FALSE------------------------------------------------------ +# NE = employees[,1] +# cte = employees[,2] +# FA = employees[,3] +# OI = employees[,4] +# S = employees[,5] +# reg = lm(NE~FA+OI+S) +# summary(reg) + + +## ----Choice8------------------------------------------------------------------ +NE = employees[,1] +cte = employees[,2] +FA = employees[,3] +OI = employees[,4] +S = employees[,5] +data3 = cbind(OI, S, FA) +th8 = multicollinearity(NE, data3) +rownames(th8) = c("OI", "S", "FA") + + +## ----theoremCHOICE8tableHTML, eval = knitr::is_html_output()------------------ +# knitr::kable(th8, format = "html", caption = "Theorem results of the Example 3 of @Salmeron2024a reordination", align="ccccc", digits=20) + + +## ----theoremCHOICE8tableLATEX, eval = knitr::is_latex_output()---------------- +knitr::kable(th8, format = "latex", booktabs = TRUE, caption = "Theorem results of the Example 3 of Salmerón et al. 
(2025) reordination", align="ccccc", digits=20) %>% +kable_styling(latex_options = "scale_down") + + +## ----PAPER1, echo=TRUE-------------------------------------------------------- +E = euribor[,1] +data1 = euribor[,-1] + +VIF(data1) +CN(data1) +CVs(data1) + + +## ----PAPER2, echo = knitr::is_html_output(), eval = knitr::is_html_output()---- +# rvifs(data1, ul = T, intercept = T) + + +## ----PAPER2bis, echo = knitr::is_latex_output(), eval = knitr::is_latex_output()---- +rvifs(data1, ul = T, intercept = T) + + +## ----PAPER_2, echo = knitr::is_html_output(), eval = knitr::is_html_output()---- +# reg_E = lm(euribor[,1]~as.matrix(euribor[,-c(1,2)])) +# rvifs(model.matrix(reg_E)) + + +## ----PAPER_2bis, echo = knitr::is_latex_output(), eval = knitr::is_latex_output()---- +reg_E = lm(euribor[,1]~as.matrix(euribor[,-c(1,2)])) +rvifs(model.matrix(reg_E)) + + +## ----PAPER3, echo=TRUE-------------------------------------------------------- +multicollinearity(E, data1) + + +## ----PAPER4, echo=TRUE-------------------------------------------------------- +P = CDpf[,1] +data2 = CDpf[,2:4] + + +## ----PAPER4bis, echo = knitr::is_html_output(), eval = knitr::is_html_output()---- +# rvifs(data2, ul = T) + + +## ----PAPER4tris, echo = knitr::is_latex_output(), eval = knitr::is_latex_output()---- +rvifs(data2, ul = T) + + +## ----PAPER5, echo=TRUE-------------------------------------------------------- +multicollinearity(P, data2) + + +## ----PAPER5bis, echo=TRUE----------------------------------------------------- +data2 = CDpf[,c(2,4,3)] +multicollinearity(P, data2) + + +## ----PAPER6, echo=TRUE-------------------------------------------------------- +NE = employees[,1] +data3 = employees[,2:5] + + +## ----PAPER6bis, echo = knitr::is_html_output(), eval = knitr::is_html_output()---- +# rvifs(data3, ul = T) + + +## ----PAPER6tris, echo = knitr::is_latex_output(), eval = knitr::is_latex_output()---- +rvifs(data3, ul = T) + + +## ----PAPER7, 
echo=TRUE-------------------------------------------------------- +multicollinearity(NE, data3[,-1]) + + +## ----PAPER8, echo=TRUE-------------------------------------------------------- +set.seed(2022) +obs = 50 +cte4 = rep(1, obs) +V = rnorm(obs, 10, 10) +y1 = 3 + 4*V + rnorm(obs, 0, 2) +Z = rnorm(obs, 10, 0.1) +y2 = 3 + 4*Z + rnorm(obs, 0, 2) + +data4.1 = cbind(cte4, V) +data4.2 = cbind(cte4, Z) + + +## ----PAPER10, echo=TRUE------------------------------------------------------- +rvifs(data4.1, ul = T) + + +## ----PAPER11, echo=TRUE------------------------------------------------------- +rvifs(data4.2, ul = T) + + +## ----PAPER12, echo=TRUE------------------------------------------------------- +multicollinearity(y1, data4.1) +multicollinearity(y2, data4.2) + diff --git a/_articles/RJ-2025-040/RJ-2025-040.Rmd b/_articles/RJ-2025-040/RJ-2025-040.Rmd new file mode 100644 index 0000000000..43dab731c5 --- /dev/null +++ b/_articles/RJ-2025-040/RJ-2025-040.Rmd @@ -0,0 +1,1010 @@ +--- +title: 'rvif: a Decision Rule to Detect Troubling Statistical Multicollinearity Based + on Redefined VIF' +date: '2026-02-04' +abstract: | + Multicollinearity is relevant in many fields where linear regression models are applied, since its presence may affect the analysis of ordinary least squares estimators not only numerically but also from a statistical point of view, which is the focus of this paper. In particular, collinearity can lead to incoherence between the statistical significance of the coefficients of the independent variables and the global significance of the model. In this paper, the thresholds of the Redefined Variance Inflation Factor (RVIF) are reinterpreted and presented as a statistical test with a region of non-rejection (depending on a significance level) to diagnose a worrying degree of multicollinearity that affects the linear regression model from a statistical point of view.
The proposed methodology is implemented in the R package rvif and its application is illustrated with different real-data examples previously used in the scientific literature. +draft: no +author: +- name: Román Salmerón-Gómez + affiliation: University of Granada + address: + - Department of Quantitative Methods for Economics and Business + - Campus Universitario de La Cartuja, Universidad de Granada. 18071 Granada (España) + url: https://www.ugr.es/~romansg/web/index.html + orcid: 0000-0003-2589-4058 + email: romansg@ugr.es +- name: Catalina B. García-García + affiliation: University of Granada + address: + - Department of Quantitative Methods for Economics and Business + - Campus Universitario de La Cartuja, Universidad de Granada. 18071 Granada (España) + url: https://metodoscuantitativos.ugr.es/informacion/directorio-personal/catalina-garcia-garcia + email: cbgarcia@ugr.es + orcid: 0000-0003-1622-3877 +type: package +output: + rjtools::rjournal_article: + self_contained: yes + toc: no +bibliography: RJreferences.bib +editor_options: + chunk_output_type: console +date_received: '2025-02-06' +volume: 17 +issue: 4 +slug: RJ-2025-040 +journal: + lastpage: 215 + firstpage: 192 + +--- + + +```{r setup, include=FALSE} +knitr::opts_chunk$set(echo = FALSE, warning = FALSE, message = FALSE) #, cache = TRUE) +options(tinytex.verbose = TRUE) +library(rvif) +library(knitr) +library(kableExtra) +library(memisc) # mtable +``` + +# Introduction + +It is well known that linear relationships between the independent variables of a multiple linear regression model (multicollinearity) can affect the analysis of the model estimated by Ordinary Least Squares (OLS), either by causing unstable estimates of the coefficients of these variables or by leading the individual significance tests of these coefficients not to reject the null hypothesis (see, for example, @FarrarGlauber, @GunstMason1977, @Gujarati2003, @Silvey1969, @WillanWatts1978 or @Wooldrigde2013).
+However, the measures traditionally applied to detect multicollinearity may conclude that multicollinearity exists even when it does not lead to the negative effects mentioned above (see Subsection [Effect of sample size..](#effect-sample-size) for more details); in that case, the best solution may be not to treat the multicollinearity at all (see @OBrien).
+
+Focusing on the possible effect of multicollinearity on the individual significance tests of the coefficients of the independent variables (the tendency not to reject the null hypothesis), this paper proposes an alternative procedure that checks whether the detected multicollinearity actually affects the statistical analysis of the model. This disruptive approach requires a methodology that indicates whether multicollinearity affects the statistical analysis of the model, and the introduction of such a methodology is the main objective of this paper. The paper also shows the use of the \CRANpkg{rvif} package of R (@R), in which this procedure is implemented.
+
+To this end, we start from the Variance Inflation Factor (VIF). The VIF is obtained from the coefficient of determination of the auxiliary regression of each independent variable of the linear regression model as a function of the other independent variables. Thus, there is a VIF for each independent variable except for the intercept, for which it is not possible to calculate a coefficient of determination for the corresponding auxiliary regression. Consequently, the VIF is able to diagnose the degree of essential approximate multicollinearity (a strong linear relationship between the independent variables excluding the intercept) existing in the model, but it is not able to detect the non-essential type (a strong linear relationship between the intercept and at least one of the other independent variables).
+For more information on essential and non-essential multicollinearity, see @MarquardtSnee1975 and @Salmeron2019.
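+
+As a concrete illustration of this auxiliary-regression definition, the following minimal sketch (base R, with simulated data; the variable names are purely illustrative and not part of the \CRANpkg{rvif} API) computes the VIF of each regressor:
+
+```{r vif-aux-sketch, echo=TRUE}
+# VIF of each regressor via the R^2 of its auxiliary regression on the
+# remaining regressors (simulated, illustrative data)
+set.seed(1)
+n  = 100
+x2 = rnorm(n)
+x3 = x2 + rnorm(n, sd = 0.1)   # x3 is almost collinear with x2
+X  = cbind(x2, x3)
+vif = sapply(seq_len(ncol(X)), function(i) {
+  r2 = summary(lm(X[, i] ~ X[, -i]))$r.squared  # auxiliary regression
+  1 / (1 - r2)                                  # VIF(i) = 1/(1 - R_i^2)
+})
+vif   # both values are large due to the strong linear relationship
+```
+
+Note that no VIF is obtained for the intercept, which is precisely the limitation addressed by the RVIF.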
+
+However, the fact that the VIF detects a worrying level of multicollinearity does not always translate into a negative impact on the statistical analysis. This lack of specificity arises because other factors, such as the sample size and the variance of the random disturbance, can coexist with high values of the VIF without inflating the variance of the OLS estimators (see @OBrien). The explanation for this phenomenon hinges on the fact that, in the orthogonal model traditionally considered as the reference, the linear relationships are assumed to be eliminated while the other factors, such as the variance of the random disturbance, maintain the same values.
+
+Then, to avoid these inconsistencies, @Salmeron2024a propose a QR decomposition of the matrix of independent variables of the model in order to obtain an orthonormal matrix. By redefining the reference point, the variance inflation factor is also redefined, resulting in a new detection measure that analyzes the change in the VIF and in the rest of the relevant factors of the model, thereby overcoming the problems associated with the traditional VIF described by @OBrien, among others. Since the intercept is also included in the detection (contrary to what happens with the traditional VIF), this measure is able to detect both essential and non-essential multicollinearity.
+This new measure presented by @Salmeron2024a is called the Redefined Variance Inflation Factor (RVIF).
+
+In this paper, the RVIF is associated with a statistical test for detecting troubling multicollinearity; this test is given by a region of non-rejection that depends on a significance level. Note that most of the measures used to diagnose multicollinearity are merely indicators with rules of thumb rather than statistical tests per se.
To the best of our knowledge, the only existing statistical test for diagnosing multicollinearity was presented by @FarrarGlauber and has received strong criticism (see, for example, @CriticaFarrar1, @CriticaFarrar2, @CriticaFarrar3 and @CriticaFarrar4).
+Thus, for example, @CriticaFarrar1 points out that the Farrar and Glauber statistic only indicates that the variables are not orthogonal to each other; it tells us nothing more.
+In this sense, @CriticaFarrar4 notes that such a test simply indicates whether the null hypothesis of orthogonality is rejected, giving no information on the value of the determinant of the correlation matrix above which the multicollinearity problem becomes intolerable.
+Therefore, the non-rejection region presented in this paper should be a relevant contribution to the field of econometrics insofar as it fills an existing gap in the scientific literature.
+
+The paper is structured as follows: Sections [Preliminaries](#preliminares) and [A first attempt to...](#modelo-orto) provide preliminary information to introduce the methodology used to establish the non-rejection region described in Section [A non-rejection region...](#new-VIF-orto).
+Section [rvif package](#paqueteRVIF) presents the \CRANpkg{rvif} package of R (@R) and shows its main commands by replicating the results given in @Salmeron2024a and in the previous sections of this paper.
+Finally, Section [Conclusions](#conclusiones-VIF) summarizes the main contributions of this paper.
+
+# Preliminaries {#preliminares}
+
+This section identifies some inconsistencies in the definition of the VIF and how these are reflected in the individual significance tests of the linear regression model.
It also shows how these inconsistencies are overcome in the proposal presented by @Salmeron2024a and how this proposal can lead to a decision rule to determine whether the degree of multicollinearity is troubling, i.e., whether it affects the statistical analysis (individual significance tests) of the model. + +## The original model + +The multiple linear regression model with $n$ observations and $k$ independent variables can be expressed as: +\begin{equation} + \mathbf{y}_{n \times 1} = \mathbf{X}_{n \times k} \cdot \boldsymbol{\beta}_{k \times 1} + \mathbf{u}_{n \times 1}, + (\#eq:model0) +\end{equation} +where the first column of $\mathbf{X} = [\mathbf{1} \ \mathbf{X}_{2} \dots \mathbf{X}_{i} \dots \mathbf{X}_{k}]$ is composed of ones representing the intercept and $\mathbf{u}$ represents the random disturbance assumed to be centered and spherical. That is, $E[\mathbf{u}_{n \times 1}] = \mathbf{0}_{n \times 1}$ and $var(\mathbf{u}_{n \times 1}) = \sigma^{2} \cdot \mathbf{I}_{n \times n}$, where $\mathbf{0}$ is a vector of zeros, $\sigma^{2}$ is the variance of the random disturbance and $\mathbf{I}$ is the identity matrix. + +Given the original model \@ref(eq:model0), the VIF is defined as the ratio between the variance of the estimator in this model, $var \left( \widehat{\beta}_{i} \right)$, and the variance of the estimator of a hypothetical reference model, that is, a hypothetical model in which orthogonality among the independent variables is assumed, $var \left( \widehat{\beta}_{i,o} \right)$. 
This is to say:
+
+\begin{equation}\small{
+ var \left( \widehat{\beta}_{i} \right) = \frac{\sigma^{2}}{n \cdot var(\mathbf{X}_{i})} \cdot \frac{1}{1 - R_{i}^{2}} = var \left( \widehat{\beta}_{i,o} \right) \cdot VIF(i), \quad i=2,\dots,k,
+ (\#eq:vari-VIF)}
+\end{equation}
+\begin{equation}
+ \frac{
+ var \left( \widehat{\beta}_{i} \right)
+ }{
+ var \left( \widehat{\beta}_{i,o} \right)
+ } = VIF(i), \quad i=2,\dots,k,
+ (\#eq:vari-VIF2)
+\end{equation}
+where $\mathbf{X}_{i}$ is the independent variable $i$ of the model \@ref(eq:model0) and $R^{2}_{i}$ the coefficient of determination of the following auxiliary regression:
+\begin{equation}
+ \mathbf{X}_{i} = \mathbf{X}_{-i} \cdot \boldsymbol{\alpha} + \mathbf{v},
+ \label{model_aux} \nonumber
+\end{equation}
+where $\mathbf{X}_{-i}$ is the result of eliminating $\mathbf{X}_{i}$ from the matrix $\mathbf{X}$.
+
+As observed in the expression \@ref(eq:vari-VIF), a high VIF leads to a high variance. Then, since the experimental value for the individual significance test is given by:
+\begin{equation}
+ t_{i} = \left| \frac{\widehat{\beta}_{i}}{\sqrt{\frac{\widehat{\sigma}^{2}}{n \cdot var(\mathbf{X}_{i})} \cdot VIF(i)}} \right|, \quad i=2,\dots,k,
+ (\#eq:texp-orig)
+\end{equation}
+a high VIF will tend to produce a low experimental statistic ($t_{i}$), favoring the non-rejection of the null hypothesis, i.e. the experimental statistic will be lower than the theoretical one (given by $t_{n-k}(1-\alpha/2)$, where $\alpha$ is the significance level).
+
+However, this statement involves several simplifications. Following @OBrien, and as can easily be observed in the expression \@ref(eq:texp-orig), other factors, such as the estimate of the variance of the random disturbance and the sample size, can counterbalance a high value of the VIF, so that the experimental statistic is not necessarily low.
That is to say, it is possible to obtain VIF values greater than 10 (the threshold traditionally established as troubling, see @Marquardt1970 for example) that do not necessarily imply high estimated variance on account of a large sample size or a low value for the estimated variance of the random disturbance. This explains, as noted in the introduction, why not all models with a high value for the VIF present effects on the statistical analysis of the model. + +> Example 1. +>Thus, for example, @Garciaetal2019b considered an extension of the interest rate model presented by @Wooldrigde2013, where $k=3$, in which all the independent variables have associated coefficients significantly different from zero, presenting a VIF equal to 71.516, much higher than the threshold normally established as worrying. In other words, in this case, a high VIF does not mean that the individual significance tests are affected. This situation is probably due to the fact that in this case 131 observations are available, i.e. the expression \@ref(eq:texp-orig) can be expressed as: + $$t_{i} = \left| \frac{\widehat{\beta}_{i}}{\sqrt{\frac{\widehat{\sigma}^{2}}{131 \cdot var(\mathbf{X}_{i})} \cdot 71.516}} \right| + = \left| \frac{\widehat{\beta}_{i}}{\sqrt{0.546 \cdot \frac{\widehat{\sigma}^{2}}{var(\mathbf{X}_{i})}}} \right|, \quad i=2,3.$$ + Note that in this case a high value of $n$ compensates for the high value of VIF. In addition, the value of $n$ will also cause $\widehat{\sigma}^{2}$ to decrease, since $\widehat{\sigma}^{2} = \frac{\mathbf{e}^{t}\mathbf{e}}{n-k}$, where $\mathbf{e}$ are the residuals of the original model \@ref(eq:model0). + +> The Subsection [Effect of sample size..](#effect-sample-size) provides an example that illustrates in more detail the effect of sample size on the statistical analysis of the model. 
\hfill $\lozenge$
+
+On the other hand, considering the hypothetical orthogonal model, the value of the experimental statistic of the individual significance test, whose null hypothesis is $\beta_{i} = 0$ against the alternative hypothesis $\beta_{i} \not= 0$ with $i=2,\dots,k$, is given by:
+\begin{equation}
+ t_{i}^{o} = \left| \frac{\widehat{\beta}_{i}}{\sqrt{\frac{\widehat{\sigma}^{2}}{n \cdot var(\mathbf{X}_{i})}}} \right|, \quad i=2,\dots,k,
+ (\#eq:texp-orto-1)
+\end{equation}
+where the estimated variance of the estimator has decreased because the VIF is always greater than or equal to 1; consequently, $t_{i}^{o} \geq t_{i}$. However, it has been assumed that the same estimates for the coefficients of the independent variables and for the variance of the random disturbance are obtained in the orthogonal and original models, which does not seem to be a plausible supposition (see @Salmeron2024a Section 2.1 for more details).
+
+## An orthonormal reference model {#sub-above}
+
+In @Salmeron2024a the following QR decomposition of the matrix $\mathbf{X}_{n \times k}$ of the model \@ref(eq:model0) is proposed: $\mathbf{X} = \mathbf{X}_{o} \cdot \mathbf{P}$, where $\mathbf{X}_{o}$ is an orthonormal matrix of the same dimensions as $\mathbf{X}$ and $\mathbf{P}$ is an upper triangular matrix of dimensions $k \times k$. Then, the following hypothetical orthonormal reference model:
+\begin{equation}
+ \mathbf{y} = \mathbf{X}_{o} \cdot \boldsymbol{\beta}_{o} + \mathbf{w},
+ (\#eq:model-ref)
+\end{equation}
+verifies that:
+$$\widehat{\boldsymbol{\beta}} = \mathbf{P}^{-1} \cdot \widehat{\boldsymbol{\beta}}_{o}, \
+ \mathbf{e} = \mathbf{e}_{o}, \
+ var \left( \widehat{\boldsymbol{\beta}}_{o} \right) = \sigma^{2} \cdot \mathbf{I},$$
+where $\mathbf{e}_{o}$ are the residuals of the orthonormal reference model \@ref(eq:model-ref).
+Note that since $\mathbf{e} = \mathbf{e}_{o}$, the estimate of $\sigma^{2}$ is the same in the original model \@ref(eq:model0) and in the orthonormal reference model \@ref(eq:model-ref). +Moreover, since the dependent variable is the same in both models, the coefficient of determination and the experimental value of the global significance test are the same in both cases. + +From these values, taking into account the expressions \@ref(eq:vari-VIF) and \@ref(eq:vari-VIF2), it is evident that the ratio between the variance of the estimator in the original model \@ref(eq:model0) and the variance of the estimator of the orthonormal reference model \@ref(eq:model-ref) is: +\begin{equation} + \frac{ + var \left( \widehat{\beta}_{i} \right) + }{ + var \left( \widehat{\beta}_{i,o} \right) + } = \frac{VIF(i)}{n \cdot var(\mathbf{X}_{i})}, \quad i=2,\dots,k. + (\#eq:redef-VIF) \nonumber +\end{equation} +Consequently, @Salmeron2024a defined the redefined VIF (RVIF) for $i=1,\dots,k$ as: +\begin{equation}\small{ + RVIF(i) = \frac{VIF(i)}{n \cdot var(\mathbf{X}_{i})} = \frac{\mathbf{X}_{i}^{t} \mathbf{X}_{i}}{\mathbf{X}_{i}^{t} \mathbf{X}_{i} - \mathbf{X}_{i}^{t} \mathbf{X}_{-i} \cdot \left( \mathbf{X}_{-i}^{t} \mathbf{X}_{-i} \right)^{-1} \cdot \mathbf{X}_{-i}^{t} \mathbf{X}_{i}}, (\#eq:RVIF)} +\end{equation} +which shows, among other questions, that it is defined for $i=1,2,\dots,k$. That is, in contrast to the VIF, the RVIF can be calculated for the intercept of the linear regression model. 
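+
+The cross-product expression in \@ref(eq:RVIF) can be evaluated directly. The following sketch (base R, simulated data; illustrative only, not the internal implementation of \CRANpkg{rvif}) computes the RVIF for every column of $\mathbf{X}$, including the intercept:
+
+```{r rvif-sketch, echo=TRUE}
+# RVIF(i): X_i'X_i divided by the residual cross-product after projecting
+# X_i on the remaining columns X_{-i} (simulated, illustrative data)
+set.seed(1)
+n  = 100
+x2 = rnorm(n, mean = 10)       # small coefficient of variation
+x3 = x2 + rnorm(n, sd = 0.1)   # essential collinearity with x2
+X  = cbind(intercept = 1, x2 = x2, x3 = x3)
+rvif.one = function(X, i) {
+  Xi  = X[, i, drop = FALSE]
+  Xmi = X[, -i, drop = FALSE]
+  num = crossprod(Xi)          # X_i' X_i
+  den = num - crossprod(Xi, Xmi) %*% solve(crossprod(Xmi)) %*% crossprod(Xmi, Xi)
+  as.numeric(num / den)
+}
+sapply(seq_len(ncol(X)), rvif.one, X = X)
+```
+
+With these data, all three values are far above 1: the value for the intercept flags non-essential multicollinearity (x2 barely varies around its mean), while the other two reflect the essential relationship between x2 and x3.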
+
+Other considerations to be taken into account are the following:
+
+- If the data are expressed in unit length, the same transformation used to calculate the Condition Number (CN), then:
+$$RVIF(i) = \frac{1}{1 - \mathbf{X}_{i}^{t} \mathbf{X}_{-i} \cdot \left( \mathbf{X}_{-i}^{t} \mathbf{X}_{-i} \right)^{-1} \cdot \mathbf{X}_{-i}^{t} \mathbf{X}_{i}}, \quad i=1,\dots,k.$$
+- In this case (data expressed in unit length), when $\mathbf{X}_{i}$ is orthogonal to $\mathbf{X}_{-i}$, it is verified that $\mathbf{X}_{i}^{t} \mathbf{X}_{-i} = \mathbf{0}$ and, consequently, $RVIF(i) = 1$ for $i=1,\dots,k$. That is, the RVIF is always greater than or equal to 1 and its minimum value is indicative of the absence of multicollinearity.
+
+- Denoting $a_{i} = \mathbf{X}_{i}^{t} \mathbf{X}_{-i} \cdot \left( \mathbf{X}_{-i}^{t} \mathbf{X}_{-i} \right)^{-1} \cdot \mathbf{X}_{-i}^{t} \mathbf{X}_{i}$, it is verified that $RVIF(i) = \frac{1}{1-a_{i}}$, where $a_{i}$ can be interpreted as the proportion of approximate multicollinearity due to variable $\mathbf{X}_{i}$. Note the similarity of this expression to that of the VIF: $VIF(i) = \frac{1}{1-R_{i}^{2}}$ (see equation \@ref(eq:vari-VIF)).
+
+- Finally, from a simulation for $k=3$, @Salmeron2024a show that if $a_{i} > 0.826$, then the degree of multicollinearity is worrying. In any case, this value should be refined by considering higher values of $k$.
+
+On the other hand, given the orthonormal reference model \@ref(eq:model-ref), the value for the experimental statistic of the individual significance test with the null hypothesis $\beta_{i,o} = 0$ (against the alternative hypothesis $\beta_{i,o} \not= 0$, for $i=1,\dots,k$) is:
+\begin{equation}
+ t_{i}^{o} = \left| \frac{\widehat{\beta}_{i,o}}{\widehat{\sigma}} \right| = \left| \frac{\mathbf{p}_{i} \cdot \widehat{\boldsymbol{\beta}}}{\widehat{\sigma}} \right|,
+ (\#eq:texp-orto-2)
+\end{equation}
+where $\mathbf{p}_{i}$ is the $i$-th row of the matrix $\mathbf{P}$.
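+
+The equality in \@ref(eq:texp-orto-2) can be checked numerically. The following sketch (base R, simulated data; the names are illustrative) obtains $t_{i}^{o}$ both from the rows of $\mathbf{P}$ and by regressing $\mathbf{y}$ on $\mathbf{X}_{o}$:
+
+```{r torto-sketch, echo=TRUE}
+# t_i^o from the rows of P coincides (up to numerical tolerance) with the
+# t-statistics of the regression of y on the orthonormal matrix X_o
+set.seed(1)
+n  = 50
+x2 = rnorm(n, mean = 10)
+x3 = x2 + rnorm(n, sd = 0.1)
+X  = cbind(1, x2, x3)
+y  = 2 + x2 + x3 + rnorm(n)
+dec = qr(X)
+Xo  = qr.Q(dec)                 # orthonormal columns: X = Xo %*% P
+P   = qr.R(dec)                 # upper triangular matrix P
+beta.hat  = qr.coef(dec, y)     # OLS estimates of the original model
+sigma.hat = summary(lm(y ~ x2 + x3))$sigma   # shared by both models
+t.o.from.P = abs(P %*% beta.hat) / sigma.hat # |p_i . beta.hat| / sigma.hat
+t.o.direct = abs(summary(lm(y ~ Xo + 0))$coefficients[, "t value"])
+all.equal(as.numeric(t.o.from.P), as.numeric(t.o.direct))
+```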
+
+By comparing this expression with the one given in \@ref(eq:texp-orto-1), it is observed that, as expected, not only the denominator but also the numerator has changed.
+Thus, in addition to the VIF, the rest of the elements in expression \@ref(eq:texp-orig) have also changed.
+Consequently, if the null hypothesis is rejected in the original model, it is not assured that the same will occur in the orthonormal reference model. For this reason, the orthonormal model proposed as the reference in @Salmeron2024a can be considered more plausible than the one traditionally applied.
+
+## Possible scenarios in the individual significance tests
+
+To determine whether the tendency not to reject the null hypothesis in the individual significance test is caused by a troubling approximate multicollinearity that inflates the variance of the estimator, or whether it is caused by the variables not being statistically significantly related, the following situations are distinguished for a significance level $\alpha$:
+
+a. If the null hypothesis is initially rejected in the original model \@ref(eq:model0), $t_{i} > t_{n-k}(1-\alpha/2)$, the following results can be obtained for the orthonormal model:
+
+a.1. the null hypothesis is rejected, $t_{i}^{o} > t_{n-k}(1-\alpha/2)$; then, the results are consistent.
+
+a.2. the null hypothesis is not rejected, $t_{i}^{o} < t_{n-k}(1-\alpha/2)$; this could be an inconsistency.
+
+b. If the null hypothesis is not initially rejected in the original model \@ref(eq:model0), $t_{i} < t_{n-k}(1-\alpha/2)$, the following results may occur for the orthonormal model:
+
+b.1. the null hypothesis is rejected, $t_{i}^{o} > t_{n-k}(1-\alpha/2)$; then, it is possible to conclude that the degree of multicollinearity affects the statistical analysis of the model, leading to the non-rejection of the null hypothesis in the original model.
+
+b.2. the null hypothesis is also not rejected, $t_{i}^{o} < t_{n-k}(1-\alpha/2)$; then, the results are consistent.
+
+In conclusion, in scenario b.1 the null hypothesis of the individual significance test is not rejected when the linear relationships are considered (original model) but is rejected when the linear relationships are not considered (orthonormal model). Consequently, it is possible to conclude that the linear relationships affect the statistical analysis of the model. The possible inconsistency discussed in scenario a.2 is analyzed in detail in Appendix [Inconsistency](#inconsistency), concluding that it will rarely occur in cases where a high degree of multicollinearity is assumed. The other two scenarios provide consistent situations.
+
+# A first attempt to obtain a non-rejection region associated with a statistical test to detect multicollinearity {#modelo-orto}
+
+## From the traditional orthogonal model
+
+Considering the expressions \@ref(eq:texp-orig) and \@ref(eq:texp-orto-1), it is verified that $t_{i}^{o} = t_{i} \cdot \sqrt{VIF(i)}$. Consequently, in the orthogonal case, with a significance level $\alpha$, the null hypothesis $\beta_{i,o} = 0$ is rejected if $t_{i}^{o} > t_{n-k}(1-\alpha/2)$ for $i=2,\dots,k$. That is, if:
+\begin{equation}
+ VIF(i) > \left( \frac{t_{n-k}(1-\alpha/2)}{t_{i}} \right)^{2} = c_{1}(i), \quad i=2,\dots,k.
+ (\#eq:cond-false)
+\end{equation}
+Thus, if the VIF associated with variable $i$ is greater than the bound $c_{1}(i)$, then it can be concluded that the estimator of the coefficient of that variable is significantly different from zero in the hypothetical case where the variables are orthogonal. In addition, if the null hypothesis is not rejected in the initial model, this failure to reject could be due to the degree of multicollinearity that affects the statistical analysis of the model.
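+
+The bound in \@ref(eq:cond-false) is easy to compute from a fitted model. The following sketch (base R, simulated data; illustrative only) builds $c_{1}(i)$ from the experimental and theoretical $t$ values and compares it with the VIF:
+
+```{r c1-sketch, echo=TRUE}
+# c_1(i) = (t_{n-k}(1-alpha/2) / t_i)^2 compared with the VIF (simulated data)
+set.seed(1)
+n  = 30
+alpha = 0.05
+x2 = rnorm(n, mean = 10)
+x3 = x2 + rnorm(n, sd = 0.1)   # strong collinearity
+y  = 2 + x2 + x3 + rnorm(n)
+fit = lm(y ~ x2 + x3)
+t.exp  = abs(summary(fit)$coefficients[-1, "t value"]) # t_i (intercept dropped)
+t.crit = qt(1 - alpha/2, df = n - 3)                   # t_{n-k}(1-alpha/2)
+c1 = (t.crit / t.exp)^2
+X  = cbind(x2, x3)
+vif = sapply(1:2, function(i) 1 / (1 - summary(lm(X[, i] ~ X[, -i]))$r.squared))
+vif > c1   # TRUE where the orthogonal-case null hypothesis would be rejected
+```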
+ +Finally, note that since the interesting cases are those where the null hypothesis is not initially rejected, $t_{i} < t_{n-k}(1-\alpha/2)$, the upper bound $c_{1}(i)$ will always be greater than one. + +> Example 2. +> Table `r knitr::asis_output(ifelse(knitr::is_html_output(), '\\@ref(tab:WisseltableHTML)', '\\@ref(tab:WisseltableLATEX)'))` shows a dataset (previously presented by @Wissell) with the following variables: outstanding mortgage debt ($\mathbf{D}$, trillions of dollars), personal consumption ($\mathbf{C}$, trillions of dollars), personal income ($\mathbf{I}$, trillions of dollars) and outstanding consumer credit ($\mathbf{CP}$, trillions of dollars) for the years 1996 to 2012. + +```{r WisseltableHTML, eval = knitr::is_html_output()} +WisselTABLE = Wissel[,-3] +knitr::kable(WisselTABLE, format = "html", caption = "Data set presented previously by @Wissell", align="cccccc", digits = 3) +``` + +```{r WisseltableLATEX, eval = knitr::is_latex_output()} +WisselTABLE = Wissel[,-3] +knitr::kable(WisselTABLE, format = "latex", booktabs = TRUE, caption = "Data set presented previously by Wissell", align="cccccc", digits = 3)%>% +kable_styling(latex_options = "scale_down") +``` + +> Table `r knitr::asis_output(ifelse(knitr::is_html_output(), '\\@ref(tab:Wissel0tableHTML)', '\\@ref(tab:Wissel0tableLATEX)'))` shows the OLS estimation of the model explaining the outstanding mortgage debt as a function of the rest of the variables. That is: + $$\mathbf{D} = \beta_{1} + \beta_{2} \cdot \mathbf{C} + \beta_{3} \cdot \mathbf{I} + \beta_{4} \cdot \mathbf{CP} + \mathbf{u}.$$ + Note that the estimates for the coefficients of personal consumption, personal income and outstanding consumer credit are not significantly different from zero (a significance level of 5\% is considered throughout the paper), while the model is considered to be globally valid (experimental value, F exp., higher than theoretical value). 
+ +```{r Wisselregression} +attach(Wissel) +obs = nrow(Wissel) +regWISSEL0 = lm(D~C+I+CP, data = Wissel) + regWISSEL0coef = as.double(regWISSEL0$coefficients) + regWISSEL0se = as.double(summary(regWISSEL0)[[4]][,2]) + regWISSEL0texp = as.double(summary(regWISSEL0)[[4]][,3]) + regWISSEL0pvalue = as.double(summary(regWISSEL0)[[4]][,4]) + regWISSEL0sigma2 = as.double(summary(regWISSEL0)[[6]]^2) + regWISSEL0R2 = as.double(summary(regWISSEL0)[[8]]) + regWISSEL0Fexp = as.double(summary(regWISSEL0)[[10]][[1]]) + regWISSEL0table = data.frame(c(regWISSEL0coef, obs), + c(regWISSEL0se, regWISSEL0sigma2), + c(regWISSEL0texp, regWISSEL0R2), + c(regWISSEL0pvalue, regWISSEL0Fexp)) + regWISSEL0table = round(regWISSEL0table, digits=4) + colnames(regWISSEL0table) =c("Estimator", "Standard Error", "Experimental t", "p-value") + rownames(regWISSEL0table) =c("Intercept", "Personal consumption", "Personal income", "Outstanding consumer credit", "(Obs, Sigma Est., Coef. Det., F exp.)") +regWISSEL1 = lm(D~C, data = Wissel) + regWISSEL1coef = as.double(regWISSEL1$coefficients) + regWISSEL1se = as.double(summary(regWISSEL1)[[4]][,2]) + regWISSEL1texp = as.double(summary(regWISSEL1)[[4]][,3]) + regWISSEL1pvalue = as.double(summary(regWISSEL1)[[4]][,4]) + regWISSEL1sigma2 = as.double(summary(regWISSEL1)[[6]]^2) + regWISSEL1R2 = as.double(summary(regWISSEL1)[[8]]) + regWISSEL1Fexp = as.double(summary(regWISSEL1)[[10]][[1]]) + regWISSEL1table = data.frame(c(regWISSEL1coef, obs), + c(regWISSEL1se, regWISSEL1sigma2), + c(regWISSEL1texp, regWISSEL1R2), + c(regWISSEL1pvalue, regWISSEL1Fexp)) + regWISSEL1table = round(regWISSEL1table, digits=4) + colnames(regWISSEL1table) =c("Estimator", "Standard Error", "Experimental t", "p-value") + rownames(regWISSEL1table) =c("Intercept", "Personal consumption", "(Obs, Sigma Est., Coef. 
Det., F exp.)") +regWISSEL2 = lm(D~C+I, data = Wissel) + regWISSEL2coef = as.double(regWISSEL2$coefficients) + regWISSEL2se = as.double(summary(regWISSEL2)[[4]][,2]) + regWISSEL2texp = as.double(summary(regWISSEL2)[[4]][,3]) + regWISSEL2pvalue = as.double(summary(regWISSEL2)[[4]][,4]) + regWISSEL2sigma2 = as.double(summary(regWISSEL2)[[6]]^2) + regWISSEL2R2 = as.double(summary(regWISSEL2)[[8]]) + regWISSEL2Fexp = as.double(summary(regWISSEL2)[[10]][[1]]) + regWISSEL2table = data.frame(c(regWISSEL2coef, obs), + c(regWISSEL2se, regWISSEL2sigma2), + c(regWISSEL2texp, regWISSEL2R2), + c(regWISSEL2pvalue, regWISSEL2Fexp)) + regWISSEL2table = round(regWISSEL2table, digits=4) + colnames(regWISSEL2table) =c("Estimator", "Standard Error", "Experimental t", "p-value") + rownames(regWISSEL2table) =c("Intercept", "Personal consumption", "Personal income", "(Obs, Sigma Est., Coef. Det., F exp.)") +regWISSEL3 = lm(D~C+CP, data = Wissel) + regWISSEL3coef = as.double(regWISSEL3$coefficients) + regWISSEL3se = as.double(summary(regWISSEL3)[[4]][,2]) + regWISSEL3texp = as.double(summary(regWISSEL3)[[4]][,3]) + regWISSEL3pvalue = as.double(summary(regWISSEL3)[[4]][,4]) + regWISSEL3sigma2 = as.double(summary(regWISSEL3)[[6]]^2) + regWISSEL3R2 = as.double(summary(regWISSEL3)[[8]]) + regWISSEL3Fexp = as.double(summary(regWISSEL3)[[10]][[1]]) + regWISSEL3table = data.frame(c(regWISSEL3coef, obs), + c(regWISSEL3se, regWISSEL3sigma2), + c(regWISSEL3texp, regWISSEL3R2), + c(regWISSEL3pvalue, regWISSEL3Fexp)) + regWISSEL3table = round(regWISSEL3table, digits=4) + colnames(regWISSEL3table) =c("Estimator", "Standard Error", "Experimental t", "p-value") + rownames(regWISSEL3table) =c("Intercept", "Personal consumption", "Outstanding consumer credit", "(Obs, Sigma Est., Coef. 
Det., F exp.)") +``` + +```{r Wissel0tableHTML, eval = knitr::is_html_output()} +knitr::kable(regWISSEL0table, format = "html", caption = "OLS estimation for the Wissel model", align="cccc", digits = 3) +``` + +```{r Wissel0tableLATEX, eval = knitr::is_latex_output()} +knitr::kable(regWISSEL0table, format = "latex", booktabs = TRUE, caption = "OLS estimation for the Wissel model", align="cccc", digits = 3) %>% +kable_styling(latex_options = "scale_down") +``` + +>In addition, the estimated coefficient for the variable personal consumption, which is not significantly different from zero, has the opposite sign to the simple correlation coefficient between this variable and outstanding mortgage debt, 0.953. + Thus, in the simple linear regression between both variables (see Table `r knitr::asis_output(ifelse(knitr::is_html_output(), '\\@ref(tab:Wissel1tableHTML)', '\\@ref(tab:Wissel1tableLATEX)'))`), the estimated coefficient of the variable personal consumption is positive and significantly different from zero. However, adding a second variable (see Tables `r knitr::asis_output(ifelse(knitr::is_html_output(), '\\@ref(tab:Wissel2tableHTML)', '\\@ref(tab:Wissel2tableLATEX)'))` and `r knitr::asis_output(ifelse(knitr::is_html_output(), '\\@ref(tab:Wissel3tableHTML)', '\\@ref(tab:Wissel3tableLATEX)'))`) none of the coefficients are individually significantly different from zero although both models are globally significant. + This is traditionally understood as a symptom of statistically troubling multicollinearity. 
+ +```{r Wissel1tableHTML, eval = knitr::is_html_output()} +knitr::kable(regWISSEL1table, format = "html", caption = "OLS estimation for part of the Wissel model", align="cccc", digits = 3) +``` + +```{r Wissel1tableLATEX, eval = knitr::is_latex_output()} +knitr::kable(regWISSEL1table, format = "latex", booktabs = TRUE, caption = "OLS estimation for part of the Wissel model", align="cccc", digits = 3) %>% +kable_styling(latex_options = "scale_down") +``` + +```{r Wissel2tableHTML, eval = knitr::is_html_output()} +knitr::kable(regWISSEL2table, format = "html", caption = "OLS estimation for part of the Wissel model", align="cccc", digits = 3) +``` + +```{r Wissel2tableLATEX, eval = knitr::is_latex_output()} +knitr::kable(regWISSEL2table, format = "latex", booktabs = TRUE, caption = "OLS estimation for part of the Wissel model", align="cccc", digits = 3) %>% +kable_styling(latex_options = "scale_down") +``` + +```{r Wissel3tableHTML, eval = knitr::is_html_output()} +knitr::kable(regWISSEL3table, format = "html", caption = "OLS estimation for part of the Wissel model", align="cccc", digits = 3) +``` + +```{r Wissel3tableLATEX, eval = knitr::is_latex_output()} +knitr::kable(regWISSEL3table, format = "latex", booktabs = TRUE, caption = "OLS estimation for part of the Wissel model", align="cccc", digits = 3)%>% +kable_styling(latex_options = "scale_down") +``` + +>By using expression \@ref(eq:cond-false) in order to confirm this problem, it is verified that $c_{1}(2) = 6.807$, $c_{1}(3) = 1.985$ and $c_{1}(4) = 18.743$, taking into account that $t_{13}(0.975) = 2.160$. Since the VIFs are equal to 589.754, 281.886 and 189.487, respectively, it is concluded that the individual significance tests for the three cases are affected by the degree of multicollinearity existing in the model. 
\hfill $\lozenge$ + +## From the alternative orthonormal model \@ref(eq:model-ref) + +In the Subsection [An orthonormal reference model](#sub-above) the individual significance test from the expression \@ref(eq:texp-orto-2) is redefined. Thus, the null hypothesis $\beta_{i,o}=0$ will be rejected, with a significance level $\alpha$, if the following condition is verified: +$$t_{i}^{o} > t_{n-k}(1-\alpha/2), \quad i=2,\dots,k.$$ +Taking into account the expressions \@ref(eq:texp-orig) and \@ref(eq:texp-orto-2), this is equivalent to: +\begin{equation}\small{ + VIF(i) > \left( \frac{t_{n-k}(1-\alpha/2)}{\widehat{\beta}_{i,o}} \right)^{2} \cdot \widehat{var} \left( \widehat{\beta}_{i} \right) \cdot n \cdot var(\mathbf{X}_{i}) = c_{2}(i). (\#eq:cota-VIF-orto)} +\end{equation} + +Thus, if the $VIF(i)$ is greater than $c_{2}(i)$, the null hypothesis is rejected in the respective individual significance tests in the orthonormal model (with $i=2,\dots,k$). Then, if the null hypothesis is not rejected in the original model and it is verified that $VIF(i) > c_{2}(i)$, it can be concluded that the multicollinearity existing in the model affects its statistical analysis. In summary, a lower bound for the VIF is established to indicate when the approximate multicollinearity is troubling in a way that can be reinterpreted and presented as a region of non-rejection of a statistical test. + +> Example 3. +> Continuing with the dataset presented by @Wissell, Table `r knitr::asis_output(ifelse(knitr::is_html_output(), '\\@ref(tab:WisselORTOtableHTML)', '\\@ref(tab:WisselORTOtableLATEX)'))` shows the results of the OLS estimation of the orthonormal model obtained from the original model. 
+
+```{r WisselORTO}
+y = Wissel[,2]
+X = as.matrix(Wissel[,3:6])
+Xqr = qr(X)
+Xo = qr.Q(Xqr)
+regORTO = lm(y~Xo+0)
+#summary(regORTO)
+ regORTOcoef = as.double(regORTO$coefficients)
+ regORTOse = as.double(summary(regORTO)[[4]][,2])
+ regORTOtexp = as.double(summary(regORTO)[[4]][,3])
+ regORTOpvalue = as.double(summary(regORTO)[[4]][,4])
+ regORTOsigma2 = as.double(summary(regORTO)[[6]]^2)
+ # summary() does not compute R^2 and the F statistic correctly when the
+ # intercept is removed, so the values shared with the original model
+ # (0.9235 and 52.3047) are inserted manually in the table below
+ regORTOtable = data.frame(c(regORTOcoef, obs),
+ c(regORTOse, regORTOsigma2),
+ c(regORTOtexp, 0.9235),
+ c(regORTOpvalue, 52.3047))
+ regORTOtable = round(regORTOtable, digits=4)
+ colnames(regORTOtable) =c("Estimator", "Standard Error", "Experimental t", "p-value")
+ rownames(regORTOtable) =c("Intercept", "Personal consumption", "Personal income", "Outstanding consumer credit", "(Obs, Sigma Est., Coef. Det., F exp.)")
+```
+
+```{r WisselORTOtableHTML, eval = knitr::is_html_output()}
+knitr::kable(regORTOtable, format = "html", caption = "OLS estimation for the orthonormal Wissel model", align="cccc", digits = 3)
+```
+
+```{r WisselORTOtableLATEX, eval = knitr::is_latex_output()}
+knitr::kable(regORTOtable, format = "latex", booktabs = TRUE, caption = "OLS estimation for the orthonormal Wissel model", align="cccc", digits = 3)%>%
+kable_styling(latex_options = "scale_down")
+```
+
+>When these results are compared with those in Table `r knitr::asis_output(ifelse(knitr::is_html_output(), '\\@ref(tab:Wissel0tableHTML)', '\\@ref(tab:Wissel0tableLATEX)'))`, the following conclusions can be obtained:
+>
+- Except for the outstanding consumer credit variable, whose standard error has increased, the standard error has decreased in all cases.
+> +- The absolute values of the experimental statistics of the individual significance tests associated with the intercept and the personal consumption variable have increased, while the experimental statistic of the personal income variable has decreased and the experimental statistic of the outstanding consumer credit variable remains the same. These facts show that the change from the original model to the orthonormal model does not guarantee an increase in the absolute value of the experimental statistic. +> +- The estimate of the coefficient of the personal consumption variable is not significantly different from zero in the original model, but it is in the orthonormal model. Thus, it is concluded that multicollinearity affects the statistical analysis of the model. Note that there is also a change in the sign of the estimate, although the purpose of the orthonormal model is not to obtain estimates for the coefficients, but rather to provide a reference point against which to measure how much the variances are inflated. Note that an orthonormal model is an idealized construction that may lack a proper interpretation in practice. +> +- The values corresponding to the estimated variance of the random disturbance, the coefficient of determination and the experimental statistic (F exp.) of the global significance test remain the same. + +>On the other hand, considering the VIFs of the independent variables except for the intercept (589.754, 281.886 and 189.487) and their corresponding bounds (17.809, 623.127 and 3545.167) obtained from the expression \@ref(eq:cota-VIF-orto), only the personal consumption variable has a VIF higher than its bound. These results are different from those obtained in Example 2, where the traditional orthogonal model was taken as a reference.
+ +>Finally, Tables `r knitr::asis_output(ifelse(knitr::is_html_output(), '\\@ref(tab:Wissel0tableHTML)', '\\@ref(tab:Wissel0tableLATEX)'))` and `r knitr::asis_output(ifelse(knitr::is_html_output(), '\\@ref(tab:WisselORTOtableHTML)', '\\@ref(tab:WisselORTOtableLATEX)'))` show that the experimental values of the statistic $t$ of the outstanding consumer credit variable are the same in the original and orthonormal models. \hfill $\lozenge$ + +The last fact highlighted at the end of the previous example is not a coincidence, but a consequence of the QR decomposition (see Appendix [Test of...](#apendix)). Therefore, the conclusion of the individual significance test for the last variable will be the same in the original and in the orthonormal model, i.e., we will always be in scenario a.1 or b.2. + +Thus, this behavior makes it necessary to choose carefully which variable is fixed in the last position. Some criteria to select the most appropriate variable for this placement could be: + +- To fix the variable that is considered least relevant to the model. + +- To fix a variable whose associated coefficient is significantly different from zero, since this case is not of interest for the definition of multicollinearity given in the paper; the interesting case is a coefficient considered as zero in the original model but significantly different from zero in the orthonormal one. + +These options are explored in Subsection [Choice of the variable to be fix...](#how-to-fix). + +# A non-rejection region associated with a statistical test to detect multicollinearity {#new-VIF-orto} + +@Salmeron2024a show that high values of the RVIF are associated with a high degree of multicollinearity. The question, however, is how high the RVIF has to be to reflect troubling multicollinearity.
+ +Taking into account the expressions \@ref(eq:RVIF) and \@ref(eq:cota-VIF-orto), it is possible to conclude that multicollinearity is affecting the statistical analysis of the model if it can be verified that: +\begin{equation} + RVIF(i) > \left( \frac{t_{n-k}(1-\alpha/2)}{\widehat{\beta}_{i,o}} \right)^{2} \cdot \widehat{var} \left( \widehat{\beta}_{i} \right) = c_{3}(i), + (\#eq:cota-VIFR) +\end{equation} +for any $i=1,\dots,k$. Note that the intercept is included in this proposal, in contrast to the previous section. + +Following @OBrien, and taking into account that the estimate in expression \@ref(eq:vari-VIF) can be written as: +$$\widehat{var} \left( \widehat{\beta}_{i} \right) = \widehat{\sigma}^{2} \cdot RVIF(i) = \frac{\mathbf{e}^{t}\mathbf{e}}{n-k} \cdot RVIF(i),$$ +there are other factors that counterbalance a high value of the RVIF, thereby avoiding high estimated variances for the estimated coefficients. These factors are the sum of squared residuals (SSR $= \mathbf{e}^{t}\mathbf{e}$) of the model \@ref(eq:model0) and the sample size $n$. Thus, an appropriate specification of the econometric model (i.e., one that implies a good fit and, consequently, a small SSR) and a large sample size can compensate for high RVIF values. +However, contrary to what happens for the VIF in the traditional case, these factors are taken into account in the threshold $c_{3}(i)$ through $\widehat{var} \left( \widehat{\beta}_{i} \right)$, as established in expression \@ref(eq:cota-VIFR). + +> Example 4.
+> This contribution can be illustrated with the data set previously presented by @KleinGoldberger, which includes variables for consumption, $\mathbf{C}$, wage income, $\mathbf{I}$, non-farm income, $\mathbf{InA}$, and farm income, $\mathbf{IA}$, in the United States from 1936 to 1952, as shown in Table `r knitr::asis_output(ifelse(knitr::is_html_output(), '\\@ref(tab:KGtableHTML)', '\\@ref(tab:KGtableLATEX)'))` (data from 1942 to 1944 are not available because they were war years). + +```{r KGtableHTML, eval = knitr::is_html_output()} +data(KG) +KGtable = KG +colnames(KGtable) = c("Consumption", "Wage income", "Non-farm income", "Farm income") +knitr::kable(KGtable, format = "html", caption = "Data set presented previously by @KleinGoldberger", align="cccc", digits = 3) +``` + +```{r KGtableLATEX, eval = knitr::is_latex_output()} +data(KG) +KGtable = KG +colnames(KGtable) = c("Consumption", "Wage income", "Non-farm income", "Farm income") +knitr::kable(KGtable, format = "latex", booktabs = TRUE, caption = "Data set presented previously by Klein and Goldberger", align="cccc", digits = 3)%>% +kable_styling(latex_options = "scale_down") +``` + +> Table `r knitr::asis_output(ifelse(knitr::is_html_output(), '\\@ref(tab:regKGtableHTML)', '\\@ref(tab:regKGtableLATEX)'))` shows the OLS estimation of the model explaining consumption as a function of the rest of the variables. Note that there is some incoherence between the individual significance of the variables and the global significance of the model.
+ +```{r KGregression} +attach(KG) +obs = nrow(KG) +regKG = lm(consumption~wage.income+non.farm.income+farm.income) + regKGcoef = as.double(regKG$coefficients) + regKGse = as.double(summary(regKG)[[4]][,2]) + regKGtexp = as.double(summary(regKG)[[4]][,3]) + regKGpvalue = as.double(summary(regKG)[[4]][,4]) + regKGsigma2 = as.double(summary(regKG)[[6]]^2) + regKGR2 = as.double(summary(regKG)[[8]]) + regKGFexp = as.double(summary(regKG)[[10]][[1]]) + regKGtable = data.frame(c(regKGcoef, obs), + c(regKGse, regKGsigma2), + c(regKGtexp, regKGR2), + c(regKGpvalue, regKGFexp)) + regKGtable = round(regKGtable, digits=4) + colnames(regKGtable) =c("Estimator", "Standard Error", "Experimental t", "p-value") + rownames(regKGtable) =c("Intercept", "Wage income", "Non-farm income", "Farm income", "(Obs, Sigma Est., Coef. Det., F exp.)") +``` + +```{r regKGtableHTML, eval = knitr::is_html_output()} +knitr::kable(regKGtable, format = "html", caption = "OLS estimation for the Klein and Goldberger model", align="cccc", digits = 3) +``` + +```{r regKGtableLATEX, eval = knitr::is_latex_output()} +knitr::kable(regKGtable, format = "latex", booktabs = TRUE, caption = "OLS estimation for the Klein and Goldberger model", align="cccc", digits = 3)%>% +kable_styling(latex_options = "scale_down") +``` + +>The RVIFs are calculated, yielding 1.275, 0.002, 0.014 and 0.053, respectively. The associated bounds, $c_{3}(i)$, are also calculated, yielding 0.002, 0.0001, 0.018 and 1.826, respectively. + +>Since the coefficient of the wage income variable is not significantly different from zero, and because it is verified that $0.002 > 0.0001$, from \@ref(eq:cota-VIFR) it is concluded that the degree of multicollinearity existing in the model is affecting its statistical analysis. 
+\hfill $\lozenge$ + +## From the RVIF {#TheTHEOREM} + +Considering that in the original model \@ref(eq:model0) the null hypothesis $\beta_{i} = 0$ of the individual significance test is not rejected if: +$$RVIF(i) > \left( \frac{\widehat{\beta}_{i}}{\widehat{\sigma} \cdot t_{n-k}(1-\alpha/2)} \right)^{2} = c_{0}(i), \quad i=1,\dots,k,$$ +while in the orthonormal model the null hypothesis is rejected if $RVIF(i) > c_{3}(i)$, the following theorem can be established: + +> Theorem. Given the multiple linear regression model \@ref(eq:model0), the degree of multicollinearity affects its statistical analysis (with a level of significance of $\alpha\%$) if there is a variable $i$, with $i=1,\dots,k$, that verifies $RVIF(i) > \max \{ c_{0}(i), c_{3}(i) \}$. + +Note that @Salmeron2024a indicate that the RVIF must be calculated with unit-length data (as any other transformation removes the intercept from the analysis). However, for the correct application of this theorem, the original data must be used, since no transformation has been considered in this paper. + +> Example 5. Tables `r knitr::asis_output(ifelse(knitr::is_html_output(), '\\@ref(tab:theoremWISSELtableHTML)', '\\@ref(tab:theoremWISSELtableLATEX)'))` and `r knitr::asis_output(ifelse(knitr::is_html_output(), '\\@ref(tab:theoremKGtableHTML)', '\\@ref(tab:theoremKGtableLATEX)'))` present the results of applying the theorem to the @Wissell and @KleinGoldberger models, respectively. Note that in both cases there is a variable $i$ that verifies $RVIF(i) > \max \{ c_{0}(i), c_{3}(i) \}$, and consequently, we can conclude that the degree of approximate multicollinearity is affecting the statistical analysis in both models (with a level of significance of $5\%$).
\hfill $\lozenge$ + +```{r THEOREM} +y = Wissel[,2] +X = as.matrix(Wissel[,3:6]) +theoremWISSEL = multicollinearity(y, X) +rownames(theoremWISSEL) = c("Intercept", "Personal consumption", "Personal income", "Outstanding consumer credit") + +y = KG[,1] +cte = rep(1, length(y)) +X = as.matrix(cbind(cte, KG[,-1])) +theoremKG = multicollinearity(y, X) +rownames(theoremKG) = c("Intercept", "Wage income", "Non-farm income", "Farm income") +``` + +```{r theoremWISSELtableHTML, eval = knitr::is_html_output()} +knitr::kable(theoremWISSEL, format = "html", caption = "Theorem results of the Wissel model", align="ccccc", digits = 6) +``` + +```{r theoremWISSELtableLATEX, eval = knitr::is_latex_output()} +knitr::kable(theoremWISSEL, format = "latex", booktabs = TRUE, caption = "Theorem results of the Wissel model", align="ccccc", digits = 6) %>% +kable_styling(latex_options = "scale_down") +``` + +```{r theoremKGtableHTML, eval = knitr::is_html_output()} +knitr::kable(theoremKG, format = "html", caption = "Theorem results of the Klein and Goldberger model", align="ccccc", digits = 6) +``` + +```{r theoremKGtableLATEX, eval = knitr::is_latex_output()} +knitr::kable(theoremKG, format = "latex", booktabs = TRUE, caption = "Theorem results of the Klein and Goldberger model", align="ccccc", digits = 6) %>% +kable_styling(latex_options = "scale_down") +``` + +# The rvif package {#paqueteRVIF} + +The results developed in @Salmeron2024a and in this paper have been implemented in the R (@R) package \CRANpkg{rvif}. The following shows how to replicate the results presented in both papers using the commands $\texttt{rvifs}$ and $\texttt{multicollinearity}$ of \CRANpkg{rvif}; the executed code is shown below. + +In addition, the following issues will be addressed: + +- Discussion on the effect of the sample size on the detection of the influence of multicollinearity on the statistical analysis of the model.
+ +- Discussion on the choice of the variable to be fixed as the last one before the orthonormalization. + +The code used in these two Subsections is available at . +It is also interesting to consult the package vignette using the command `browseVignettes("rvif")`, as well as its web page with `browseURL(system.file("docs/index.html", package = "rvif"))` or . + +## Detection of multicollinearity with RVIF: does the degree of multicollinearity affect the statistical analysis of the model? + +In @Salmeron2024a, a series of examples is presented to illustrate the usefulness of the RVIF to detect the degree of approximate multicollinearity in a multiple linear regression model. +The results presented by @Salmeron2024a will be reproduced using the command $\texttt{rvifs}$ of the \CRANpkg{rvif} package and complemented with the contribution developed in the present work using the command $\texttt{multicollinearity}$ of the same package. +In order to facilitate the reading of the paper, this information is available in Appendix [Examples of...](#examplesRVIF). + +On the other hand, the following shows how to use the above commands to obtain the results shown in Table `r knitr::asis_output(ifelse(knitr::is_html_output(), '\\@ref(tab:theoremWISSELtableHTML)', '\\@ref(tab:theoremWISSELtableLATEX)'))` of this paper: + +```{r PAPER13, echo=TRUE} +y_W = Wissel[,2] +X_W = Wissel[,3:6] +multicollinearity(y_W, X_W) +``` + +Note that the first two arguments of the $\texttt{multicollinearity}$ command are, respectively, the dependent variable of the linear model and the design matrix containing the independent variables (with the intercept as the first column). + +The results in Table `r knitr::asis_output(ifelse(knitr::is_html_output(), '\\@ref(tab:theoremKGtableHTML)', '\\@ref(tab:theoremKGtableLATEX)'))` can be obtained using this code: + +```{r PAPER14, echo=TRUE} +y_KG = KG[,1] +cte = rep(1, length(y_KG)) +X_KG = cbind(cte, KG[,2:4]) +multicollinearity(y_KG, X_KG) +``` + +As shown above, in both cases it is concluded that the degree of multicollinearity in the model affects its statistical analysis.
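These bounds can also be reproduced by hand from the expressions given above. The following sketch is illustrative and is not part of the \CRANpkg{rvif} package (the object names, and the reuse of `y_KG` and `X_KG` from the previous chunk, are assumptions made here); it computes $RVIF(i)$ through the identity $\widehat{var}(\widehat{\beta}_{i}) = \widehat{\sigma}^{2} \cdot RVIF(i)$, the bound $c_{0}(i)$ of Subsection [From the RVIF](#TheTHEOREM) and the bound $c_{3}(i)$ of expression \@ref(eq:cota-VIFR):

```{r c0c3ByHand, eval=FALSE}
# Illustrative sketch: RVIF(i), c0(i) and c3(i) by hand for the
# Klein and Goldberger model (y_KG and X_KG as defined above).
fitKG = lm(y_KG ~ . + 0, data = as.data.frame(X_KG)) # the cte column acts as intercept
sKG = summary(fitKG)
n = length(y_KG); k = length(coef(fitKG))
tcrit = qt(1 - 0.05/2, n - k)
RVIF = (sKG$coefficients[, 2] / sKG$sigma)^2   # since var(beta_i) = sigma^2 * RVIF(i)
c0 = (coef(fitKG) / (sKG$sigma * tcrit))^2     # bound from the original model
beta_o = coef(lm(y_KG ~ qr.Q(qr(as.matrix(X_KG))) + 0)) # orthonormal coefficients
c3 = (tcrit / beta_o)^2 * sKG$coefficients[, 2]^2       # bound from (cota-VIFR)
cbind(RVIF, c0, c3)
```

The condition of the theorem then amounts to checking, row by row, whether the RVIF exceeds the maximum of the two bounds.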
+ +By default, the $\texttt{multicollinearity}$ command applies the Theorem established in Subsection [From the RVIF](#TheTHEOREM) with a significance level of 5%. +Note that if the significance level is changed to 1% (third argument of the $\texttt{multicollinearity}$ command), the individual significance test of the intercept in the Klein and Goldberger model is also affected by the existing degree of multicollinearity: + +```{r PAPER15, echo=TRUE} +multicollinearity(y_W, X_W, alpha = 0.01) +multicollinearity(y_KG, X_KG, alpha = 0.01) +``` + +It can be seen that the values of $c_{0}$ and $c_{3}$ change depending on the significance level used. + +## Effect of the sample size on the detection of the influence of multicollinearity on the statistical analysis of the model {#effect-sample-size} + +The introduction has highlighted that the measures traditionally used to detect whether the degree of multicollinearity is of concern may indicate that it is troubling even when the statistical analysis of the model is not affected by it. Example 1 shows that this may be due, among other factors, to the size of the sample. + +To explore this issue in more detail, an example is given below in which the traditional measures of multicollinearity detection indicate that the existing multicollinearity is troubling while, for a large sample size, the statistical analysis of the model is not affected. In particular, observations are simulated for $\mathbf{X} = [ \mathbf{1} \ \mathbf{X}_{2} \ \mathbf{X}_{3} \ \mathbf{X}_{4} \ \mathbf{X}_{5} \ \mathbf{X}_{6}]$ where: +$$\mathbf{X}_{2} \sim N(5, 0.1^{2}), \quad \mathbf{X}_{3} \sim N(5, 10^{2}), \quad \mathbf{X}_{4} = \mathbf{X}_{3} + \mathbf{p},$$ +$$\mathbf{X}_{5} \sim N(-1, 3^{2}), \quad \mathbf{X}_{6} \sim N(15, 2.5^{2}),$$ +with $\mathbf{p} \sim N(5, 0.5^2)$, and considering three different sample sizes: $n = 3000$ (Simulation 1), $n = 100$ (Simulation 2) and $n = 30$ (Simulation 3).
+In all cases the dependent variable is generated according to: +$$\mathbf{y} = 4 + 5 \cdot \mathbf{X}_{2} - 9 \cdot \mathbf{X}_{3} - 2 \cdot \mathbf{X}_{4} + 2 \cdot \mathbf{X}_{5} + 7 \cdot \mathbf{X}_{6} + \mathbf{u},$$ +where $\mathbf{u} \sim N(0, 2^2)$. + +To make the results reproducible, a seed has been set using the command *set.seed(2024)*. + +```{r SAMPLE-SIZE-1} +## Simulation 1 +set.seed(2024) +obs = 3000 # no individual significance test is affected +cte = rep(1, obs) +x2 = rnorm(obs, 5, 0.1) # related to the intercept: non-essential +x3 = rnorm(obs, 5, 10) +x4 = x3 + rnorm(obs, 5, 0.5) # related to x3: essential +x5 = rnorm(obs, -1, 3) +x6 = rnorm(obs, 15, 2.5) +y = 4 + 5*x2 - 9*x3 - 2*x4 + 2*x5 + 7*x6 + rnorm(obs, 0, 2) +X = cbind(cte, x2, x3, x4, x5, x6) +theoremSIMULATION1 = multicollinearity(y, X) +rownames(theoremSIMULATION1) = c("Intercept", "X2", "X3", "X4", "X5", "X6") +vifsSIMULATION1 = VIF(X) +cnSIMULATION1 = CN(X) +cvsSIMULATION1 = CVs(X) + +## Simulation 2 +obs = 100 # decreasing the number of observations affects the intercept +cte = rep(1, obs) +x2 = rnorm(obs, 5, 0.1) # related to the intercept: non-essential +x3 = rnorm(obs, 5, 10) +x4 = x3 + rnorm(obs, 5, 0.5) # related to x3: essential +x5 = rnorm(obs, -1, 3) +x6 = rnorm(obs, 15, 2.5) +y = 4 + 5*x2 - 9*x3 - 2*x4 + 2*x5 + 7*x6 + rnorm(obs, 0, 2) +X = cbind(cte, x2, x3, x4, x5, x6) +theoremSIMULATION2 = multicollinearity(y, X) +rownames(theoremSIMULATION2) = c("Intercept", "X2", "X3", "X4", "X5", "X6") +vifsSIMULATION2 = VIF(X) +cnSIMULATION2 = CN(X) +cvsSIMULATION2 = CVs(X) + +## Simulation 3 +obs = 30 # decreasing the number of observations affects the intercept, x2 and x4 +cte = rep(1, obs) +x2 = rnorm(obs, 5, 0.1) # related to the intercept: non-essential +x3 = rnorm(obs, 5, 10) +x4 = x3 + rnorm(obs, 5, 0.5) # related to x3: essential +x5 = rnorm(obs, -1, 3) +x6 = rnorm(obs, 15, 2.5) +y = 4 + 5*x2 - 9*x3 - 2*x4 + 2*x5 + 7*x6 + rnorm(obs, 0, 2) +X = cbind(cte, x2, x3, x4, x5, x6) +theoremSIMULATION3 = multicollinearity(y, X) +rownames(theoremSIMULATION3) = c("Intercept", "X2", "X3", "X4", "X5", "X6") +vifsSIMULATION3 = VIF(X) +cnSIMULATION3 = CN(X) +cvsSIMULATION3 = CVs(X) +``` + +With this generation, the variable $\mathbf{X}_{2}$ is intended to be linearly related to the intercept, and $\mathbf{X}_{4}$ to $\mathbf{X}_{3}$. This is supported by the results shown in Table `r knitr::asis_output(ifelse(knitr::is_html_output(), '\\@ref(tab:traditionalSIMULATIONtableHTML)', '\\@ref(tab:traditionalSIMULATIONtableLATEX)'))`, which have been obtained with the commands $\texttt{CVs}$, $\texttt{VIF}$ and $\texttt{CN}$ of the \CRANpkg{multiColl} package of R (@R). + +The results imply the same conclusions in all three simulations: + +- There is a worrying degree of non-essential multicollinearity in the model, relating the intercept to the variable $\mathbf{X}_{2}$, since its coefficient of variation (CV) is lower than 0.1002506. + +- There is a worrying degree of essential multicollinearity in the model, relating the variables $\mathbf{X}_{3}$ and $\mathbf{X}_{4}$, since the associated Variance Inflation Factors (VIF) are greater than 10.
+ +```{r traditionalSIMULATIONtableHTML, eval = knitr::is_html_output()} +traditionalSIMULATION = data.frame(c(cvsSIMULATION1, vifsSIMULATION1, cnSIMULATION1), + c(cvsSIMULATION2, vifsSIMULATION2, cnSIMULATION2), + c(cvsSIMULATION3, vifsSIMULATION3, cnSIMULATION3)) +rownames(traditionalSIMULATION) = c("X2 CV", "X3 CV", "X4 CV", "X5 CV", "X6 CV", "X2 VIF", "X3 VIF", "X4 VIF", "X5 VIF", "X6 VIF", "CN") +colnames(traditionalSIMULATION) = c("Simulation 1", "Simulation 2", "Simulation 3") +knitr::kable(traditionalSIMULATION, format = "html", caption = "CVs, VIFs and CN for data of Simulations 1, 2 and 3", align="cccccc", digits = 3) +``` + +```{r traditionalSIMULATIONtableLATEX, eval = knitr::is_latex_output()} +traditionalSIMULATION = data.frame(c(cvsSIMULATION1, vifsSIMULATION1, cnSIMULATION1), + c(cvsSIMULATION2, vifsSIMULATION2, cnSIMULATION2), + c(cvsSIMULATION3, vifsSIMULATION3, cnSIMULATION3)) +rownames(traditionalSIMULATION) = c("X2 CV", "X3 CV", "X4 CV", "X5 CV", "X6 CV", "X2 VIF", "X3 VIF", "X4 VIF", "X5 VIF", "X6 VIF", "CN") +colnames(traditionalSIMULATION) = c("Simulation 1", "Simulation 2", "Simulation 3") +knitr::kable(traditionalSIMULATION, format = "latex", booktabs = TRUE, caption = "CVs, VIFs and CN for data of Simulations 1, 2 and 3", align="cccccc", digits = 3) %>% +kable_styling(latex_options = "scale_down") +``` + +However, does the degree of multicollinearity detected really affect the statistical analysis of the model?
According to the results shown in Tables `r knitr::asis_output(ifelse(knitr::is_html_output(), '\\@ref(tab:theoremSIMULATION1tableHTML)', '\\@ref(tab:theoremSIMULATION1tableLATEX)'))` to `r knitr::asis_output(ifelse(knitr::is_html_output(), '\\@ref(tab:theoremSIMULATION3tableHTML)', '\\@ref(tab:theoremSIMULATION3tableLATEX)'))`, this is not always the case: + +- In Simulation 1, when $n=3000$, the degree of multicollinearity does not affect the statistical analysis of the model; scenario a.1 is always verified, i.e., both in the proposed model and in the orthonormal model, the null hypothesis is rejected in the individual significance tests. + +- In Simulation 2, when $n=100$, the degree of multicollinearity affects the statistical analysis of the model only in the individual significance of the intercept; in all other cases scenario a.1 is verified again. + + + As will be seen below, the fact that the individual significance of the variable $\mathbf{X}_{2}$ is not affected may be due to the number of observations in the data set, but it may also be because multicollinearity of the non-essential type affects only the intercept estimate. Thus, for example, @Salmeron2019TAS show (see Table 2 of Example 2) that solving this type of approximate multicollinearity (by centering the variables that cause it) only modifies the estimate of the intercept and its standard deviation, with the estimates of the rest of the independent variables remaining unchanged. + +- In Simulation 3, when $n=30$, the degree of multicollinearity affects the statistical analysis of the model in the individual significance of the intercept, of $\mathbf{X}_{2}$ and of $\mathbf{X}_{4}$. + + + In this case, as anticipated, the reduction in sample size leads to the individual significance of $\mathbf{X}_{2}$ also being affected.
+ +In conclusion, as @OBrien indicates, increasing the sample size can prevent the statistical analysis of the model from being affected by the existing degree of multicollinearity, even though the values of the measures traditionally used to detect this problem indicate that it is troubling. To reach this conclusion, the use of the RVIF proposed by @Salmeron2024a and of the theorem developed in this paper is decisive. + +```{r theoremSIMULATION1tableHTML, eval = knitr::is_html_output()} +knitr::kable(theoremSIMULATION1, format = "html", caption = "Theorem results of the Simulation 1 model", align="cccccc", digits = 6) +``` + +```{r theoremSIMULATION1tableLATEX, eval = knitr::is_latex_output()} +knitr::kable(theoremSIMULATION1, format = "latex", booktabs = TRUE, caption = "Theorem results of the Simulation 1 model", align="cccccc", digits = 6) %>% +kable_styling(latex_options = "scale_down") +``` + +```{r theoremSIMULATION2tableHTML, eval = knitr::is_html_output()} +knitr::kable(theoremSIMULATION2, format = "html", caption = "Theorem results of the Simulation 2 model", align="cccccc", digits = 6) +``` + +```{r theoremSIMULATION2tableLATEX, eval = knitr::is_latex_output()} +knitr::kable(theoremSIMULATION2, format = "latex", booktabs = TRUE, caption = "Theorem results of the Simulation 2 model", align="cccccc", digits = 6) %>% +kable_styling(latex_options = "scale_down") +``` + +```{r theoremSIMULATION3tableHTML, eval = knitr::is_html_output()} +knitr::kable(theoremSIMULATION3, format = "html", caption = "Theorem results of the Simulation 3 model", align="cccccc", digits = 6) +``` + +```{r theoremSIMULATION3tableLATEX, eval = knitr::is_latex_output()} +knitr::kable(theoremSIMULATION3, format = "latex", booktabs = TRUE, caption = "Theorem results of the Simulation 3 model", align="cccccc", digits = 6) %>% +kable_styling(latex_options = "scale_down") +``` + +## Selection of the variable to be set as the last before orthonormalization {#how-to-fix} +
+Since there are as many QR decompositions as there are possible rearrangements of the independent variables, it is convenient to test different options to determine whether the degree of multicollinearity in the regression model affects its statistical analysis. + +A first possibility is to try all possible reorderings, bearing in mind that the intercept must always come first. Thus, in Example 2 of @Salmeron2024a (see Appendix [Examples of...](#examplesRVIF) for more details) it is considered that $\mathbf{X} = [ \mathbf{1} \ \mathbf{K} \ \mathbf{W}]$ (see Table `r knitr::asis_output(ifelse(knitr::is_html_output(), '\\@ref(tab:theoremCHOICE1tableHTML)', '\\@ref(tab:theoremCHOICE1tableLATEX)'))`), but it could also be considered that $\mathbf{X} = [ \mathbf{1} \ \mathbf{W} \ \mathbf{K}]$ (see Table `r knitr::asis_output(ifelse(knitr::is_html_output(), '\\@ref(tab:theoremCHOICE2tableHTML)', '\\@ref(tab:theoremCHOICE2tableLATEX)'))`). + +Note that in these tables the values of the RVIF and $c_{0}$ for each variable are always the same, but those of $c_{3}$ change depending on the position of each variable within the design matrix. + +```{r Choice1} +P = CDpf[,1] +cte = CDpf[,2] +K = CDpf[,3] +W = CDpf[,4] + +data2 = cbind(cte, K, W) +th2 = multicollinearity(P, data2) +rownames(th2) = c("Intercept", "Capital", "Work") +``` + +```{r theoremCHOICE1tableHTML, eval = knitr::is_html_output()} +knitr::kable(th2, format = "html", caption = "Theorem results of the Example 2 of @Salmeron2024a", align="cccccc", digits = 6) +``` + +```{r theoremCHOICE1tableLATEX, eval = knitr::is_latex_output()} +knitr::kable(th2, format = "latex", booktabs = TRUE, caption = "Theorem results of the Example 2 of Salmerón et al.
(2025)", align="cccccc", digits = 6) %>% +kable_styling(latex_options = "scale_down") +``` + +```{r Choice2} +data2 = cbind(cte, W, K) +th2 = multicollinearity(P, data2) +rownames(th2) = c("Intercept", "Work", "Capital") +``` + +```{r theoremCHOICE2tableHTML, eval = knitr::is_html_output()} +knitr::kable(th2, format = "html", caption = "Theorem results of the Example 2 of @Salmeron2024a (reordering 2)", align="cccccc", digits = 6) +``` + +```{r theoremCHOICE2tableLATEX, eval = knitr::is_latex_output()} +knitr::kable(th2, format = "latex", booktabs = TRUE, caption = "Theorem results of the Example 2 of Salmerón et al. (2025) (reordering 2)", align="cccccc", digits = 6) %>% +kable_styling(latex_options = "scale_down") +``` + +It is observed that in one of the two possibilities considered, the individual significance of the work variable is affected by the degree of existing multicollinearity. + +Therefore, to state that the statistical analysis of the multiple linear regression model is not affected by the multicollinearity present in the model, it is necessary to check all the possible QR decompositions and to verify that the statistical analysis is not affected in any of them. However, to determine that the statistical analysis of the model is affected by the presence of multicollinearity, it is sufficient to find one of the possible rearrangements in which situation b.1 occurs. + +```{r Choice7, eval=FALSE} +NE = employees[,1] +cte = employees[,2] +FA = employees[,3] +OI = employees[,4] +S = employees[,5] +reg = lm(NE~FA+OI+S) +summary(reg) +``` + +Another possibility is to set in the last position of $\mathbf{X}$ a particular variable following a specific criterion. Thus, for example, in Example 3 of @Salmeron2024a (see Appendix [Examples of...](#examplesRVIF) for more details) it is verified that the variable FA has a coefficient significantly different from zero.
Fixing this variable in third place, since its individual significance will not be modified, yields the results shown in Table `r knitr::asis_output(ifelse(knitr::is_html_output(), '\\@ref(tab:theoremCHOICE8tableHTML)', '\\@ref(tab:theoremCHOICE8tableLATEX)'))`. + +```{r Choice8} +NE = employees[,1] +cte = employees[,2] +FA = employees[,3] +OI = employees[,4] +S = employees[,5] +data3 = cbind(OI, S, FA) +th8 = multicollinearity(NE, data3) +rownames(th8) = c("OI", "S", "FA") +``` + +```{r theoremCHOICE8tableHTML, eval = knitr::is_html_output()} +knitr::kable(th8, format = "html", caption = "Theorem results of the Example 3 of @Salmeron2024a (reordering)", align="ccccc", digits=20) +``` + +```{r theoremCHOICE8tableLATEX, eval = knitr::is_latex_output()} +knitr::kable(th8, format = "latex", booktabs = TRUE, caption = "Theorem results of the Example 3 of Salmerón et al. (2025) (reordering)", align="ccccc", digits=20) %>% +kable_styling(latex_options = "scale_down") +``` + +It can be seen that in this case the degree of multicollinearity in the model affects the individual significance of the OI and S variables. + +# Conclusions {#conclusiones-VIF} + +In this paper, following @Salmeron2024a, we propose an alternative orthonormal model that leads to a lower bound for the RVIF, indicating whether the degree of multicollinearity present in the model affects its statistical analysis. These thresholds complement the results presented by @OBrien, who stated that the estimated variances depend on other factors that can counterbalance a high value of the VIF, for example, the size of the sample or the estimated variance of the independent variables. Thus, the thresholds presented for the RVIF also take these factors into account, yielding a threshold associated with each independent variable (including the intercept). Note that these thresholds indicate whether the degree of multicollinearity affects the statistical analysis.
+ +As these thresholds are derived from the individual significance tests of the model, it is possible to reinterpret them as a statistical test to determine whether the degree of multicollinearity in the linear regression model affects its statistical analysis. This analytic tool allows researchers to conclude whether the degree of multicollinearity is statistically troubling and whether it needs to be treated. We consider this to be a relevant contribution since, to the best of our knowledge, the only existing example of such a measure, presented by @FarrarGlauber, has been strongly criticized (in addition to the limitations highlighted in the introduction, it should be noted that it completely ignores approximate non-essential multicollinearity, since the correlation matrix does not include information on the intercept); consequently, this new statistical test with a non-rejection region will fill a gap in the scientific literature. + +On the other hand, note that the position of each of the variables in the matrix $\mathbf{X}$ uniquely determines the reference orthonormal model $\mathbf{X}_{o}$. That is to say, there are as many reference models given by the proposed QR decomposition as there are possible rearrangements of the variables within the matrix $\mathbf{X}$. + +In this sense, as has been shown, in order to affirm that the statistical analysis of the model is not affected by the degree of multicollinearity existing in the model (with the level of significance used in the application of the proposed theorem), it is necessary to verify that scenario b.1 does not occur in any of the possible rearrangements of $\mathbf{X}$. On the other hand, when there is a rearrangement in which this scenario appears, it can be stated (with the level of significance used when applying the proposed theorem) that the degree of existing multicollinearity affects the statistical analysis of the model.
+ +Finally, as a future line of work, it would be interesting to complete the analysis presented here by studying when the degree of multicollinearity in the model affects its numerical analysis. + +# Acknowledgments + +This work has been supported by project PP2019-EI-02 of the University of Granada (Spain) and by project A-SEJ-496-UGR20 of the Andalusian Government’s Regional Ministry of Economic Transformation, Industry, Knowledge and Universities (Spain). + +# Appendix + +## Inconsistency in hypothesis tests: situation a.2 {#inconsistency} + +From a numerical point of view, it is possible to reject $H_{0}: \beta_{i} = 0$ while $H_{0}: \beta_{i,o} = 0$ is not rejected, which implies that $t_{i}^{o} < t_{n-k}(1 - \alpha/2) < t_{i}$. Or, in other words, $t_{i}/t_{i}^{o} > 1$. + +However, from expression \@ref(eq:texp-orto-2) it is obtained that $\widehat{\sigma} = | \widehat{\beta}_{i,o} | / t_{i}^{o}$. By substituting $\widehat{\sigma}$ in expression \@ref(eq:texp-orig), taking into account expression \@ref(eq:RVIF), it is obtained that + $$\frac{t_{i}}{t_{i}^{o}} = \frac{| \widehat{\beta}_{i} |}{| \widehat{\beta}_{i,o} |} \cdot \frac{1}{\sqrt{RVIF(i)}}.$$ + From this expression it can be concluded that in situations with high collinearity, $RVIF(i) \rightarrow +\infty$, the ratio $t_{i}/t_{i}^{o}$ will tend to zero, and the condition $t_{i}/t_{i}^{o} > 1$ will rarely occur. That is to say, the inconsistency in situation a.2, commented on in the preliminaries of the paper, will not appear. + +On the other hand, if the variable $i$ is orthogonal to the rest of the independent variables, it is verified that $\widehat{\beta}_{i,o} = \widehat{\beta}_{i}$ since $p_{i} = ( 0 \dots \underbrace{1}_{(i)} \dots 0)$. At the same time, $RVIF(i) = \frac{1}{SST_{i}}$, where $SST$ denotes the total sum of squares. If there is orthonormality, as proposed in this paper, $SST_{i} = 1$ and, as a consequence, it is verified that $t_{i} = t_{i}^{o}$.
Thus, the individual significance tests for the original data and for the orthonormal data coincide. + +## Test of individual significance of coefficient $k$ {#apendix} + +Taking into account that $\boldsymbol{\beta}_{o} = \mathbf{P} \boldsymbol{\beta}$, where: +$$\boldsymbol{\beta}_{o} = \left( + \begin{array}{c} + \beta_{1,o} \\ + \beta_{2,o} \\ + \vdots \\ + \beta_{k,o} + \end{array} \right), \quad + \mathbf{P} = \left( + \begin{array}{cccc} + p_{11} & p_{12} & \dots & p_{1k} \\ + 0 & p_{22} & \dots & p_{2k} \\ + \vdots & \vdots & & \vdots \\ + 0 & 0 & \dots & p_{kk} + \end{array} \right), \quad + \boldsymbol{\beta} = \left( + \begin{array}{c} + \beta_{1} \\ + \beta_{2} \\ + \vdots \\ + \beta_{k} + \end{array} \right),$$ +it is obtained that $\beta_{k,o} = p_{kk} \beta_{k}$. Then, the null hypothesis $H_{0}: \beta_{k,o} = 0$ is equivalent to $H_{0}: \beta_{k} = 0$. Due to this fact, Tables `r knitr::asis_output(ifelse(knitr::is_html_output(), '\\@ref(tab:Wissel0tableHTML)', '\\@ref(tab:Wissel0tableLATEX)'))` and `r knitr::asis_output(ifelse(knitr::is_html_output(), '\\@ref(tab:WisselORTOtableHTML)', '\\@ref(tab:WisselORTOtableLATEX)'))` show the expected behaviour. This behaviour is analyzed in more detail below.
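The identity $t_{i}/t_{i}^{o} = |\widehat{\beta}_{i}| / (|\widehat{\beta}_{i,o}| \sqrt{RVIF(i)})$ and the equality of residuals can be checked numerically with base R alone. A sketch under simulated, unit-length data (so that $SST_i = 1$, as in the text); this is illustrative and not the package implementation:

```r
# Numerical check of t_i / t_i^o = |beta_hat_i| / (|beta_hat_{i,o}| * sqrt(RVIF(i)))
# with unit-length columns, so SST_i = 1 (illustrative simulated data).
set.seed(2)
n <- 60
X <- cbind(rep(1, n), rnorm(n), rnorm(n))
y <- as.numeric(X %*% c(2, 1, -1) + rnorm(n))
Xu <- apply(X, 2, function(col) col / sqrt(sum(col^2)))  # unit-length columns

Q    <- qr.Q(qr(Xu))        # orthonormal reference matrix X_o
fit  <- lm(y ~ 0 + Xu)      # original (unit-length) model
fito <- lm(y ~ 0 + Q)       # orthonormal reference model

i    <- 3
a_i  <- as.numeric(crossprod(Xu[, i], Xu[, -i]) %*%
                   solve(crossprod(Xu[, -i])) %*% crossprod(Xu[, -i], Xu[, i]))
rvif_i <- 1 / (1 - a_i)     # RVIF(i) for unit-length data

t_i  <- abs(summary(fit)$coefficients[i, "t value"])
t_io <- abs(coef(fito)[i]) / summary(fito)$sigma
# t_i / t_io equals |coef(fit)[i]| / (|coef(fito)[i]| * sqrt(rvif_i)),
# and both models share the same residuals (e = e_o).
```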
+ +The experimental value used to make a decision in the test with null hypothesis $H_{0}: \beta_{k,o} = 0$ and alternative hypothesis $H_{1}: \beta_{k,o} \not= 0$ is given by the following expression: +$$t_{k}^{o} = \left| \frac{\widehat{\beta}_{k,o}}{\sqrt{var \left( \widehat{\beta}_{k,o} \right)}} \right|.$$ + +Taking into account that $\widehat{\boldsymbol{\beta}}_{o} = \mathbf{P} \widehat{\boldsymbol{\beta}}$ and $var \left( \widehat{\boldsymbol{\beta}}_{o} \right) = \mathbf{P} var \left( \widehat{\boldsymbol{\beta}} \right) \mathbf{P}^{t},$ it is verified that $\widehat{\beta}_{k,o} = p_{kk} \widehat{\beta}_{k}$ and $var \left( \widehat{\beta}_{k,o} \right) = p_{kk}^{2} var \left( \widehat{\beta}_{k} \right)$. Then: + $$t_{k}^{o} = \left| \frac{p_{kk} \widehat{\beta}_{k}}{p_{kk} \sqrt{var \left( \widehat{\beta}_{k} \right)}} \right| = \left| \frac{\widehat{\beta}_{k}}{\sqrt{var \left( \widehat{\beta}_{k} \right)}} \right| = t_{k},$$ + where $t_{k}$ is the experimental value used to make a decision in the test with null hypothesis $H_{0}: \beta_{k} = 0$ and alternative hypothesis $H_{1}: \beta_{k} \not= 0$. + +## Examples of @Salmeron2024a {#examplesRVIF} + +**Example 1 of @Salmeron2024a: Detection of traditional nonessential multicollinearity**. Using data from a financial model in which the Euribor (E) is analyzed from the Harmonized Index of Consumer Prices (HICP), the balance of payments to net current account (BC) and the government deficit to net nonfinancial accounts (GD), we illustrate the detection of approximate multicollinearity of the non-essential type, i.e. where the intercept is related to one of the remaining independent variables (for details see @MarquardtSnee1975). For more information on this data set use *help(euribor)*. + +Note that @Salmeron2019 establishes that an independent variable with a coefficient of variation less than 0.1002506 indicates that this variable is responsible for a non-essential multicollinearity problem.
+ +Thus, first of all, the approximate multicollinearity detection is performed using the measures traditionally applied for this purpose: the Variance Inflation Factor (VIF) and the Condition Number (CN). Values higher than 10 for the VIF (see, for example, @Marquardt1970) and 30 for the CN (see, for example, @Belsley1991 or @BelsleyKuhWelsch) imply that the degree of existing multicollinearity is troubling. Moreover, according to @Salmeron2019, the VIF is only able to detect essential multicollinearity (relationship between independent variables excluding the intercept, see @MarquardtSnee1975), while the CN detects both essential and non-essential multicollinearity. + +Therefore, the values calculated below (using the $\texttt{VIF}$, $\texttt{CN}$ and $\texttt{CVs}$ commands from the \CRANpkg{multiColl} package, see @Salmeron2021multicoll and @Salmeron2022multicoll for more details on this package) indicate that the degree of essential approximate multicollinearity existing in the model is not troubling, while that of the non-essential type is troubling due to the relationship of HICP with the intercept. + +```{r PAPER1, echo=TRUE} +E = euribor[,1] +data1 = euribor[,-1] + +VIF(data1) +CN(data1) +CVs(data1) +``` + +This diagnosis is confirmed by calculating the RVIF values, which point to a strong relationship between the second variable and the intercept: + +```{r PAPER2, echo = knitr::is_html_output(), eval = knitr::is_html_output()} +rvifs(data1, ul = T, intercept = T) +``` + +```{r PAPER2bis, echo = knitr::is_latex_output(), eval = knitr::is_latex_output()} +rvifs(data1, ul = T, intercept = T) +``` + +The output of the $\texttt{rvifs}$ command provides the values of the Redefined Variance Inflation Factor (RVIF) and the percentage of multicollinearity due to each variable (denoted as $a_{i}$ in the [An orthonormal...](#sub-above) section).
+ +In this case, three of the four arguments available in the $\texttt{rvifs}$ command are used: + +- The first of these refers to the design matrix containing the independent variables (the intercept, if any, being the first column). + +- The second argument, $ul$, indicates that the data are to be transformed into unit length. This transformation makes it possible to establish that the RVIF is always greater than or equal to 1, having as a reference a minimum value that indicates the absence of worrying multicollinearity. + +- The third argument, $intercept$, indicates whether there is an intercept in the design matrix. + +Note that these results can also be obtained after using the $\texttt{lm}$ and $\texttt{model.matrix}$ commands as follows: + +```{r PAPER_2, echo = knitr::is_html_output(), eval = knitr::is_html_output()} +reg_E = lm(euribor[,1]~as.matrix(euribor[,-c(1,2)])) +rvifs(model.matrix(reg_E)) +``` + +```{r PAPER_2bis, echo = knitr::is_latex_output(), eval = knitr::is_latex_output()} +reg_E = lm(euribor[,1]~as.matrix(euribor[,-c(1,2)])) +rvifs(model.matrix(reg_E)) +``` + +Finally, the application of the Theorem established in Subsection [From the RVIF](#TheTHEOREM) detects that the individual inference of the second variable (HICP) is affected by the degree of multicollinearity existing in the model. These results are obtained using the $\texttt{multicollinearity}$ command from the \CRANpkg{rvif} package: + +```{r PAPER3, echo=TRUE} +multicollinearity(E, data1) +``` + +Therefore, it can be established that the existing multicollinearity affects the statistical analysis of the Euribor model. + +**Example 2 of @Salmeron2024a: Detection of generalized nonessential multicollinearity**.
Using data from a Cobb-Douglas production function in which the production (P) is analyzed from the capital (K) and the labor (W), we illustrate the detection of approximate multicollinearity of the generalized non-essential type, i.e., that in which at least two independent variables with very little variability (excluding the intercept) are related to each other (for more details, see @Salmeron2020maths). For more information on this dataset use *help(CDpf)*. + +Using the $\texttt{rvifs}$ command, it can be determined that both capital and labor are linearly related to each other, with high RVIF values that nevertheless remain below the threshold established as worrying: + +```{r PAPER4, echo=TRUE} +P = CDpf[,1] +data2 = CDpf[,2:4] +``` + +```{r PAPER4bis, echo = knitr::is_html_output(), eval = knitr::is_html_output()} +rvifs(data2, ul = T) +``` + +```{r PAPER4tris, echo = knitr::is_latex_output(), eval = knitr::is_latex_output()} +rvifs(data2, ul = T) +``` + +However, the application of the Theorem established in Subsection [From the RVIF](#TheTHEOREM) does not detect that the degree of multicollinearity in the model affects the statistical analysis of the model: + +```{r PAPER5, echo=TRUE} +multicollinearity(P, data2) +``` + +If we now rearrange the design matrix $\mathbf{X}$, we obtain that: + +```{r PAPER5bis, echo=TRUE} +data2 = CDpf[,c(2,4,3)] +multicollinearity(P, data2) +``` + +Therefore, it can be established that the existing multicollinearity does affect the statistical analysis of the Cobb-Douglas production function model. + +**Example 3 of @Salmeron2024a: Detection of essential multicollinearity**.
Using data from a model in which the number of employees of Spanish companies (NE) is analyzed from the fixed assets (FA), operating income (OI) and sales (S), we illustrate the detection of approximate multicollinearity of the essential type, i.e., that in which at least two independent variables (excluding the intercept) are related to each other (for more details, see @MarquardtSnee1975). For more information on this dataset use *help(employees)*. + +In this case, the $\texttt{rvifs}$ command shows that the third and fourth variables (OI and S) have high RVIF values, so they are highly linearly related: + +```{r PAPER6, echo=TRUE} +NE = employees[,1] +data3 = employees[,2:5] +``` + +```{r PAPER6bis, echo = knitr::is_html_output(), eval = knitr::is_html_output()} +rvifs(data3, ul = T) +``` + +```{r PAPER6tris, echo = knitr::is_latex_output(), eval = knitr::is_latex_output()} +rvifs(data3, ul = T) +``` + +Note that if the unit-length transformation is omitted in *rvifs(data3, ul = T)* (as is done in the $\texttt{multicollinearity}$ command), the RVIF cannot be calculated since the system is computationally singular. For this reason, the intercept is eliminated below, since it has been shown above that it does not play a relevant role in the linear relationships of the model. + +Finally, the application of the Theorem established in Subsection [From the RVIF](#TheTHEOREM) detects that the individual inference of the third variable (OI) is affected by the degree of multicollinearity existing in the model: + +```{r PAPER7, echo=TRUE} +multicollinearity(NE, data3[,-1]) +``` + +Therefore, it can be established that the existing multicollinearity affects the statistical analysis of the model of the number of employees in Spanish companies. + +**Example 4 of @Salmeron2024a: The special case of the simple linear model**. The simple linear regression model is an interesting case because it has a single independent variable and the intercept.
Since the intercept is not properly considered as an independent variable of the model in many cases (see the introduction of @Salmeron2019 for more details), different software packages (including R, @R) do not consider that there can be worrisome multicollinearity in this type of model. + +To illustrate this situation, @Salmeron2024a randomly generates observations for the following two simple linear regression models, $\mathbf{y}_{1} = \beta_{1} + \beta_{2} \mathbf{V} + \mathbf{u}_{1}$ and $\mathbf{y}_{2} = \alpha_{1} + \alpha_{2} \mathbf{Z} + \mathbf{u}_{2}$, according to the following code: + +```{r PAPER8, echo=TRUE} +set.seed(2022) +obs = 50 +cte4 = rep(1, obs) +V = rnorm(obs, 10, 10) +y1 = 3 + 4*V + rnorm(obs, 0, 2) +Z = rnorm(obs, 10, 0.1) +y2 = 3 + 4*Z + rnorm(obs, 0, 2) + +data4.1 = cbind(cte4, V) +data4.2 = cbind(cte4, Z) +``` + +For more information on these data sets use *help(SLM1)* and *help(SLM2)*. + +As mentioned above, R (@R) denies the existence of multicollinearity in this type of model. Thus, for example, when using the $\texttt{vif}$ command of the \CRANpkg{car} package on *reg=lm(y1~V)* the following message is obtained: *Error in vif.default(reg): model contains fewer than 2 terms*. + +Undoubtedly, this message is coherent with the fact that, as mentioned above, the VIF is not capable of detecting non-essential multicollinearity (which is the only multicollinearity that exists in this type of model). However, the error message provided may lead a non-specialized user to consider that the multicollinearity problem does not exist in this type of model. These issues are addressed in more depth in @Salmeron2022multicoll.
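The coefficients of variation already signal which of the two models suffers non-essential multicollinearity. A base-R sketch re-creating similar data (the draws differ from the chunk above because the generation order differs, so exact values will not match) against the 0.1002506 threshold of @Salmeron2019:

```r
# Coefficient-of-variation check for the two simple linear models
# (hypothetical re-creation of similar data; draws differ from the
# original chunk because the generation order is not identical).
set.seed(2022)
obs <- 50
V <- rnorm(obs, 10, 10)    # large variability: no non-essential problem
Z <- rnorm(obs, 10, 0.1)   # tiny variability: nearly proportional to the intercept

cv <- function(x) sd(x) / abs(mean(x))
c(CV_V = cv(V), CV_Z = cv(Z))
# CV_Z falls far below the 0.1002506 threshold, flagging non-essential
# multicollinearity in the second model; CV_V is well above it.
```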
+ +On the other hand, the calculation of the RVIF in the first model shows that the degree of multicollinearity is not troubling, since it presents very low values: + +```{r PAPER10, echo=TRUE} +rvifs(data4.1, ul = T) +``` + +While in the second model they are very high, indicating a problem of non-essential multicollinearity: + +```{r PAPER11, echo=TRUE} +rvifs(data4.2, ul = T) +``` + +By using the $\texttt{multicollinearity}$ command, it is found that the individual inference of the intercept of the second model is affected by the degree of multicollinearity in the model: + +```{r PAPER12, echo=TRUE} +multicollinearity(y1, data4.1) +multicollinearity(y2, data4.2) +``` + +Therefore, it can be established that the multicollinearity existing in the first simple linear regression model does not affect the statistical analysis of the model, while in the second one it does. diff --git a/_articles/RJ-2025-040/RJ-2025-040.html b/_articles/RJ-2025-040/RJ-2025-040.html new file mode 100644 index 0000000000..9c3c514934 --- /dev/null +++ b/_articles/RJ-2025-040/RJ-2025-040.html @@ -0,0 +1,4856 @@ + + + + + + + + + + + + + + + + + + + + + + rvif: a Decision Rule to Detect Troubling Statistical Multicollinearity Based on Redefined VIF + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
    +

    rvif: a Decision Rule to Detect Troubling Statistical Multicollinearity Based on Redefined VIF

    + + + +

    Multicollinearity is relevant in many different fields where linear regression models are applied, since its presence may affect the analysis of ordinary least squares estimators not only numerically but also from a statistical point of view, which is the focus of this paper. Thus, it is known that collinearity can lead to inconsistencies in the statistical significance of the coefficients of the independent variables and in the global significance of the model. In this paper, the thresholds of the Redefined Variance Inflation Factor (RVIF) are reinterpreted and presented as a statistical test with a region of non-rejection (which depends on a significance level) to diagnose the existence of a worrying degree of multicollinearity that affects the linear regression model from a statistical point of view. The proposed methodology is implemented in the rvif package of R and its application is illustrated with different real data examples previously used in the scientific literature.

    +
    + + + +
    +

    1 Introduction

    +

    It is well known that linear relationships between the independent variables of a multiple linear regression model (multicollinearity) can affect the analysis of the model estimated by Ordinary Least Squares (OLS), either by causing unstable estimates of the coefficients of these variables or by leading to the non-rejection of the null hypothesis in the individual significance tests of these coefficients (see, for example, Farrar and Glauber (1967), Gunst and Mason (1977), Gujarati (2003), Silvey (1969), Willan and Watts (1978) or Wooldridge (2020)). +However, the measures traditionally applied to detect multicollinearity may conclude that multicollinearity exists even if it does not lead to the negative effects mentioned above (see Subsection Effect of sample size… for more details), when, in fact, the best solution in this case may be not to treat the multicollinearity (see O’Brien (2007)).

    +

    Focusing on the possible effect of multicollinearity on the individual significance tests of the coefficients of the independent variables (a tendency not to reject the null hypothesis), this paper proposes an alternative procedure that checks whether the detected multicollinearity affects the statistical analysis of the model. This approach requires a methodology that indicates whether multicollinearity affects the statistical analysis of the model; introducing such a methodology is the main objective of this paper. The paper also shows the use of the rvif package of R (R Core Team (2025)), in which this procedure is implemented.

    +

    To this end, we start from the Variance Inflation Factor (VIF). The VIF is obtained from the coefficient of determination of the auxiliary regression of each independent variable of the linear regression model as a function of the other independent variables. Thus, there is a VIF for each independent variable except for the intercept, for which it is not possible to calculate a coefficient of determination for the corresponding auxiliary regression. Consequently, the VIF is able to diagnose the degree of essential approximate multicollinearity (strong linear relationship between the independent variables except the intercept) existing in the model but is not able to detect the non-essential one (strong relationship between the intercept and at least one of the independent variables). +For more information on multicollinearity of essential and non-essential type, see Marquardt and Snee (1975) and Salmerón-Gómez et al. (2019).
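The mechanics of the auxiliary regression can be made concrete with a base-R sketch (simulated data, not an output of the rvif package):

```r
# Classical VIF computed by hand: regress one independent variable on the
# others and apply VIF(i) = 1 / (1 - R_i^2). Illustrative simulated data.
set.seed(3)
n  <- 200
x2 <- rnorm(n)
x3 <- 0.9 * x2 + rnorm(n, sd = 0.5)   # x3 correlated with x2

aux  <- lm(x3 ~ x2)                   # auxiliary regression of X_3 on X_{-3}
vif3 <- 1 / (1 - summary(aux)$r.squared)
vif3  # always >= 1; grows without bound as the auxiliary R^2 approaches 1
```

With only two regressors the auxiliary $R^2$ is the squared correlation, which makes the inflation mechanism easy to see.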

    +

    However, the fact that the VIF detects a worrying level of multicollinearity does not always translate into a negative impact on the statistical analysis. This lack of specificity is due to the fact that other factors, such as sample size and the variance of the random disturbance, can lead to high values of the VIF without increasing the variance of the OLS estimators (see O’Brien (2007)). The explanation for this phenomenon is that, in the orthogonal model traditionally taken as the reference, the linear relationships are assumed to be eliminated, while other factors, such as the variance of the random disturbance, keep the same values.

    +

    Then, to avoid these inconsistencies, Salmerón et al. (2025) propose a QR decomposition of the matrix of independent variables of the model in order to obtain an orthonormal matrix. By redefining the reference point, the variance inflation factor is also redefined, resulting in a new detection measure that analyzes the change in the VIF and in the rest of the relevant factors of the model, thereby overcoming the problems associated with the traditional VIF, as described by O’Brien (2007), among others. Since the intercept is also included in the detection (contrary to what happens with the traditional VIF), this measure is able to detect both essential and non-essential multicollinearity. +This new measure presented by Salmerón et al. (2025) is called the Redefined Variance Inflation Factor (RVIF).

    +

    In this paper, the RVIF is associated with a statistical test for detecting troubling multicollinearity; this test is given by a region of non-rejection that depends on a significance level. Note that most of the measures used to diagnose multicollinearity are merely indicators with rules of thumb rather than statistical tests per se. To the best of our knowledge, the only existing statistical test for diagnosing multicollinearity was presented by Farrar and Glauber (1967) and has received strong criticism (see, for example, Haitovsky (1969), Kumar (1975), Wichers (1975) and O’Hagan and McCabe (1975)). +Thus, for example, Haitovsky (1969) notes that the Farrar and Glauber statistic merely indicates that the variables are not orthogonal to each other; it tells us nothing more. +In this sense, O’Hagan and McCabe (1975) indicate that such a test simply shows whether the null hypothesis of orthogonality is rejected, giving no information on the value of the determinant of the correlation matrix above which the multicollinearity problem becomes intolerable. +Therefore, the non-rejection region presented in this paper should be a relevant contribution to the field of econometrics insofar as it would fill an existing gap in the scientific literature.

    +

    The paper is structured as follows: Sections Preliminaries and A first attempt of… provide preliminary information to introduce the methodology used to establish the non-rejection region described in Section A non-rejection region…. +Section rvif package presents the package rvif of R (R Core Team (2025)) and shows its main commands by replicating the results given in Salmerón et al. (2025) and in the previous sections of this paper. +Finally, Section Conclusions summarizes the main contributions of this paper.

    +

    2 Preliminaries

    +

    This section identifies some inconsistencies in the definition of the VIF and how these are reflected in the individual significance tests of the linear regression model. It also shows how these inconsistencies are overcome in the proposal presented by Salmerón et al. (2025) and how this proposal can lead to a decision rule to determine whether the degree of multicollinearity is troubling, i.e., whether it affects the statistical analysis (individual significance tests) of the model.

    +

    2.1 The original model

    +

    The multiple linear regression model with \(n\) observations and \(k\) independent variables can be expressed as: +\[\begin{equation} + \mathbf{y}_{n \times 1} = \mathbf{X}_{n \times k} \cdot \boldsymbol{\beta}_{k \times 1} + \mathbf{u}_{n \times 1}, + \tag{1} +\end{equation}\] +where the first column of \(\mathbf{X} = [\mathbf{1} \ \mathbf{X}_{2} \dots \mathbf{X}_{i} \dots \mathbf{X}_{k}]\) is composed of ones representing the intercept and \(\mathbf{u}\) represents the random disturbance assumed to be centered and spherical. That is, \(E[\mathbf{u}_{n \times 1}] = \mathbf{0}_{n \times 1}\) and \(var(\mathbf{u}_{n \times 1}) = \sigma^{2} \cdot \mathbf{I}_{n \times n}\), where \(\mathbf{0}\) is a vector of zeros, \(\sigma^{2}\) is the variance of the random disturbance and \(\mathbf{I}\) is the identity matrix.

    +

    Given the original model (1), the VIF is defined as the ratio between the variance of the estimator in this model, \(var \left( \widehat{\beta}_{i} \right)\), and the variance of the estimator of a hypothetical reference model, that is, a hypothetical model in which orthogonality among the independent variables is assumed, \(var \left( \widehat{\beta}_{i,o} \right)\). That is to say:

    +

    \[\begin{equation}\small{ + var \left( \widehat{\beta}_{i} \right) = \frac{\sigma^{2}}{n \cdot var(\mathbf{X}_{i})} \cdot \frac{1}{1 - R_{i}^{2}} = var \left( \widehat{\beta}_{i,o} \right) \cdot VIF(i), \quad i=2,\dots,k, + \tag{2}} +\end{equation}\]
    +\[\begin{equation} + \frac{ + var \left( \widehat{\beta}_{i} \right) + }{ + var \left( \widehat{\beta}_{i,o} \right) + } = VIF(i), \quad i=2,\dots,k, + \tag{3} +\end{equation}\] +where \(\mathbf{X}_{i}\) is the independent variable \(i\) of the model (1) and \(R^{2}_{i}\) the coefficient of determination of the following auxiliary regression: +\[\begin{equation} + \mathbf{X}_{i} = \mathbf{X}_{-i} \cdot \boldsymbol{\alpha} + \mathbf{v}, + \label{model_aux} \nonumber +\end{equation}\] +where \(\mathbf{X}_{-i}\) is the result of eliminating \(\mathbf{X}_{i}\) from the matrix \(\mathbf{X}\).

    +

    As observed in the expression (2), a high VIF leads to a high variance. Then, since the experimental value for the individual significance test is given by: +\[\begin{equation} + t_{i} = \left| \frac{\widehat{\beta}_{i}}{\sqrt{\frac{\widehat{\sigma}^{2}}{n \cdot var(\mathbf{X}_{i})} \cdot VIF(i)}} \right|, \quad i=2,\dots,k, + \tag{4} +\end{equation}\] +a high VIF will lead to a low experimental statistic (\(t_{i}\)), provoking the tendency not to reject the null hypothesis, i.e. the experimental statistic will be lower than the theoretical statistic (given by \(t_{n-k}(1-\alpha/2)\), where \(\alpha\) is the significance level).

    +

    However, this statement involves several simplifications. Following O’Brien (2007), and as can easily be observed in the expression (4), other factors, such as the estimate of the variance of the random disturbance and the size of the sample, can counterbalance the high value of the VIF, so that the experimental statistic is not necessarily low. That is to say, it is possible to obtain VIF values greater than 10 (the threshold traditionally established as troubling, see Marquardt (1970) for example) that do not necessarily imply a high estimated variance, on account of a large sample size or a low value for the estimated variance of the random disturbance. This explains, as noted in the introduction, why not all models with a high value for the VIF present effects on the statistical analysis of the model.

    +
    +

    Example 1. +Thus, for example, García et al. (2019) considered an extension of the interest rate model presented by Wooldridge (2020), where \(k=3\), in which all the independent variables have associated coefficients significantly different from zero, while presenting a VIF equal to 71.516, much higher than the threshold normally established as worrying. In other words, in this case, a high VIF does not mean that the individual significance tests are affected. This situation is probably due to the fact that in this case 131 observations are available, i.e. the expression (4) can be expressed as: +\[t_{i} = \left| \frac{\widehat{\beta}_{i}}{\sqrt{\frac{\widehat{\sigma}^{2}}{131 \cdot var(\mathbf{X}_{i})} \cdot 71.516}} \right| += \left| \frac{\widehat{\beta}_{i}}{\sqrt{0.546 \cdot \frac{\widehat{\sigma}^{2}}{var(\mathbf{X}_{i})}}} \right|, \quad i=2,3.\] +Note that in this case a high value of \(n\) compensates for the high value of the VIF. In addition, the value of \(n\) will also cause \(\widehat{\sigma}^{2}\) to decrease, since \(\widehat{\sigma}^{2} = \frac{\mathbf{e}^{t}\mathbf{e}}{n-k}\), where \(\mathbf{e}\) are the residuals of the original model (1).

    +
    +
    +

    Subsection Effect of sample size… provides an example that illustrates in more detail the effect of sample size on the statistical analysis of the model. \(\lozenge\)
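The compensation effect of Example 1 is easy to reproduce by simulation. A hedged base-R sketch (the sample size and standard deviations below are arbitrary choices, not the data of García et al. (2019)):

```r
# Simulation of the compensation effect described by O'Brien (2007):
# strongly collinear regressors (VIF >> 10) can still yield a clearly
# significant t-test when the sample is large. Illustrative sketch only.
set.seed(4)
n  <- 1000
x2 <- rnorm(n)
x3 <- x2 + rnorm(n, sd = 0.1)          # near-collinear with x2
y  <- 1 + x2 + x3 + rnorm(n, sd = 0.5)

vif3 <- 1 / (1 - summary(lm(x3 ~ x2))$r.squared)
pval <- summary(lm(y ~ x2 + x3))$coefficients["x3", "Pr(>|t|)"]
c(VIF = vif3, p_value = pval)          # high VIF, yet x3 remains significant
```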

    +
    +

    On the other hand, considering the hypothetical orthogonal model, the value of the experimental statistic of the individual significance test, whose null hypothesis is \(\beta_{i} = 0\) against the alternative hypothesis \(\beta_{i} \not= 0\) with \(i=2,\dots,k\), is given by: +\[\begin{equation} + t_{i}^{o} = \left| \frac{\widehat{\beta}_{i}}{\sqrt{\frac{\widehat{\sigma}^{2}}{n \cdot var(\mathbf{X}_{i})}}} \right|, \quad i=2,\dots,k, + \tag{5} +\end{equation}\] +where the estimated variance of the estimator has decreased because the VIF is always greater than or equal to 1 and, consequently, \(t_{i}^{o} \geq t_{i}\). However, it has been assumed that the same estimates of the coefficients of the independent variables and of the variance of the random disturbance are obtained in the orthogonal and original models, which does not seem to be a plausible supposition (see Salmerón et al. (2025), Section 2.1, for more details).

    +

    2.2 An orthonormal reference model

    +

    In Salmerón et al. (2025) the following QR decomposition of the matrix \(\mathbf{X}_{n \times k}\) of the model (1) is proposed: \(\mathbf{X} = \mathbf{X}_{o} \cdot \mathbf{P}\), where \(\mathbf{X}_{o}\) is an orthonormal matrix of the same dimensions as \(\mathbf{X}\) and \(\mathbf{P}\) is an upper triangular matrix of dimensions \(k \times k\). Then, the following hypothetical orthonormal reference model: +\[\begin{equation} + \mathbf{y} = \mathbf{X}_{o} \cdot \boldsymbol{\beta}_{o} + \mathbf{w}, + \tag{6} +\end{equation}\] +verifies that: +\[\widehat{\boldsymbol{\beta}} = \mathbf{P}^{-1} \cdot \widehat{\boldsymbol{\beta}}_{o}, \ + \mathbf{e} = \mathbf{e}_{o}, \ + var \left( \widehat{\boldsymbol{\beta}}_{o} \right) = \sigma^{2} \cdot \mathbf{I},\] +where \(\mathbf{e}_{o}\) are the residuals of the orthonormal reference model (6). +Note that since \(\mathbf{e} = \mathbf{e}_{o}\), the estimate of \(\sigma^{2}\) is the same in the original model (1) and in the orthonormal reference model (6). +Moreover, since the dependent variable is the same in both models, the coefficient of determination and the experimental value of the global significance test are the same in both cases.
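These properties can be verified directly with base R's `qr()` (a sketch on simulated data; not the package's code):

```r
# QR decomposition X = X_o P: X_o orthonormal, P upper triangular, and
# beta_hat = P^{-1} beta_hat_o. Illustrative simulated data.
set.seed(5)
n <- 50
X <- cbind(1, rnorm(n), rnorm(n))
y <- as.numeric(X %*% c(1, 2, 3) + rnorm(n))

dec <- qr(X)
Xo  <- qr.Q(dec)   # orthonormal: t(Xo) %*% Xo = I
P   <- qr.R(dec)   # upper triangular, k x k

b   <- qr.solve(X,  y)   # OLS coefficients of the original model
bo  <- qr.solve(Xo, y)   # OLS coefficients of the orthonormal model
# b equals solve(P, bo), and both fits share the same residuals because
# X and Xo span the same column space.
```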

    +

    From these values, taking into account the expressions (2) and (3), it is evident that the ratio between the variance of the estimator in the original model (1) and the variance of the estimator of the orthonormal reference model (6) is: +\[\begin{equation} + \frac{ + var \left( \widehat{\beta}_{i} \right) + }{ + var \left( \widehat{\beta}_{i,o} \right) + } = \frac{VIF(i)}{n \cdot var(\mathbf{X}_{i})}, \quad i=2,\dots,k. + \tag{7} \nonumber +\end{equation}\] +Consequently, Salmerón et al. (2025) defined the redefined VIF (RVIF) for \(i=1,\dots,k\) as: +\[\begin{equation}\small{ + RVIF(i) = \frac{VIF(i)}{n \cdot var(\mathbf{X}_{i})} = \frac{\mathbf{X}_{i}^{t} \mathbf{X}_{i}}{\mathbf{X}_{i}^{t} \mathbf{X}_{i} - \mathbf{X}_{i}^{t} \mathbf{X}_{-i} \cdot \left( \mathbf{X}_{-i}^{t} \mathbf{X}_{-i} \right)^{-1} \cdot \mathbf{X}_{-i}^{t} \mathbf{X}_{i}}, \tag{8}} +\end{equation}\] +which shows, among other things, that it is defined for \(i=1,2,\dots,k\). That is, in contrast to the VIF, the RVIF can be calculated for the intercept of the linear regression model.
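A direct, naive implementation of this formula is straightforward (a sketch only; the package's `rvifs()` command is the reference implementation, and the function name `rvif_manual` is ours):

```r
# RVIF from its matrix definition, applied to every column of the design
# matrix, including the intercept. Illustrative simulated data in which
# column 2 has very little variability (non-essential multicollinearity).
rvif_manual <- function(X, i) {
  Xi  <- X[, i]
  Xmi <- X[, -i, drop = FALSE]
  num <- sum(Xi^2)                       # X_i' X_i
  den <- num - as.numeric(crossprod(Xi, Xmi) %*%
                          solve(crossprod(Xmi)) %*% crossprod(Xmi, Xi))
  num / den
}

set.seed(6)
n <- 100
X  <- cbind(1, rnorm(n, 10, 0.1), rnorm(n))
# Unit-length transformation (as with the ul argument of rvifs):
Xu <- apply(X, 2, function(col) col / sqrt(sum(col^2)))
sapply(1:3, function(i) rvif_manual(Xu, i))
# Columns 1 (intercept) and 2 show large RVIFs; column 3 stays near 1.
```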

    +

    Other considerations to be taken into account are the following:

    +
      +
    • If the data are expressed in unit length, the same transformation as that used to calculate the Condition Number (CN), then: +\[RVIF(i) = \frac{1}{1 - \mathbf{X}_{i}^{t} \mathbf{X}_{-i} \cdot \left( \mathbf{X}_{-i}^{t} \mathbf{X}_{-i} \right)^{-1} \cdot \mathbf{X}_{-i}^{t} \mathbf{X}_{i}}, \quad i=1,\dots,k.\]

    • +
    • In this case (data expressed in unit length), when \(\mathbf{X}_{i}\) is orthogonal to \(\mathbf{X}_{-i}\), it is verified that \(\mathbf{X}_{i}^{t} \mathbf{X}_{-i} = \mathbf{0}\) and, consequently \(RVIF(i) = 1\) for \(i=1,\dots,k\). That is, the RVIF is always greater than or equal to 1 and its minimum value is indicative of the absence of multicollinearity.

• Denoting \(a_{i}= \mathbf{X}_{i}^{t} \mathbf{X}_{-i} \cdot \left( \mathbf{X}_{-i}^{t} \mathbf{X}_{-i} \right)^{-1} \cdot \mathbf{X}_{-i}^{t} \mathbf{X}_{i}\), it is verified that \(RVIF(i) = \frac{1}{1-a_{i}}\), where \(a_{i}\) can be interpreted as the percentage of approximate multicollinearity due to the variable \(\mathbf{X}_{i}\). Note the similarity of this expression to that of the VIF: \(VIF(i) = \frac{1}{1-R_{i}^{2}}\) (see equation (2)).

• Finally, from a simulation for \(k=3\), Salmerón et al. (2025) show that if \(a_{i} > 0.826\), then the degree of multicollinearity is worrying. In any case, this value should be refined by considering higher values of \(k\).

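The unit-length properties listed above can be verified numerically. In this sketch (NumPy, synthetic data; the helper name a_coef is ours), orthonormal columns give \(RVIF(i)=1\), while a near-collinear pair yields an \(a_{i}\) above the 0.826 threshold mentioned above:

```python
import numpy as np

def a_coef(U, i):
    """a_i = U_i' U_-i (U_-i'U_-i)^{-1} U_-i' U_i for unit-length columns U."""
    ui = U[:, i]
    U_rest = np.delete(U, i, axis=1)
    return ui @ U_rest @ np.linalg.solve(U_rest.T @ U_rest, U_rest.T @ ui)

rng = np.random.default_rng(2)
# orthonormal columns: a_i = 0, hence RVIF(i) = 1/(1 - a_i) = 1
Q, _ = np.linalg.qr(rng.normal(size=(30, 3)))
for i in range(3):
    assert abs(1 / (1 - a_coef(Q, i)) - 1.0) < 1e-10

# intercept plus a low-variance regressor: strong non-essential collinearity
X = np.column_stack([np.ones(30), rng.normal(5, 0.1, 30)])
U = X / np.linalg.norm(X, axis=0)       # unit-length transformation
a1 = a_coef(U, 1)
assert 0 < a1 < 1 and 1 / (1 - a1) > 1  # RVIF >= 1 in general
assert a1 > 0.826                       # worrying per the threshold above
```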

On the other hand, given the orthonormal reference model (6), the value of the experimental statistic of the individual significance test with the null hypothesis \(\beta_{i,o} = 0\) (against the alternative hypothesis \(\beta_{i,o} \not= 0\), for \(i=1,\dots,k\)) is:
\[\begin{equation}
  t_{i}^{o} = \left| \frac{\widehat{\beta}_{i,o}}{\widehat{\sigma}} \right| = \left| \frac{\mathbf{p}_{i} \cdot \widehat{\boldsymbol{\beta}}}{\widehat{\sigma}} \right|,
  \tag{9}
\end{equation}\]
where \(\mathbf{p}_{i}\) is the \(i\)-th row of the matrix \(\mathbf{P}\).
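Equation (9) can be checked directly. In the NumPy sketch below (synthetic data, illustrative only), \(\widehat{\boldsymbol{\beta}}_{o}\) computed from the orthonormal model coincides with \(\mathbf{P} \cdot \widehat{\boldsymbol{\beta}}\), so both ways of evaluating \(t_{i}^{o}\) agree:

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 40, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = X @ np.array([2.0, 0.5, -1.0]) + rng.normal(size=n)

X_o, P = np.linalg.qr(X)
beta = np.linalg.lstsq(X, y, rcond=None)[0]
beta_o = X_o.T @ y                     # OLS in the orthonormal model
e = y - X @ beta
sigma_hat = np.sqrt(e @ e / (n - k))   # same sigma in both models (e = e_o)

t_o_direct = np.abs(beta_o / sigma_hat)   # |beta_io / sigma|
t_o_via_P = np.abs(P @ beta / sigma_hat)  # |p_i . beta / sigma|
assert np.allclose(t_o_direct, t_o_via_P)
```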


By comparing this expression with the one given in (5), it is observed that, as expected, not only the denominator but also the numerator has changed. Thus, in addition to the VIF, the rest of the elements in expression (4) have also changed. Consequently, if the null hypothesis is rejected in the original model, it is not assured that the same will occur in the orthonormal reference model. For this reason, the orthonormal model proposed as the reference model in Salmerón et al. (2025) can be considered more plausible than the one traditionally applied.


    2.3 Possible scenarios in the individual significance tests


To determine whether the tendency not to reject the null hypothesis in the individual significance test is caused by a troubling approximate multicollinearity that inflates the variance of the estimator, or by the variables not being statistically significantly related, the following situations are distinguished with a significance level \(\alpha\):

a) If the null hypothesis is initially rejected in the original model (1), \(t_{i} > t_{n-k}(1-\alpha/2)\), the following results can be obtained for the orthonormal model:

    a.1. the null hypothesis is rejected, \(t_{i}^{o} > t_{n-k}(1-\alpha/2)\); then, the results are consistent.


    a.2. the null hypothesis is not rejected, \(t_{i}^{o} < t_{n-k}(1-\alpha/2)\); this could be an inconsistency.

b) If the null hypothesis is not initially rejected in the original model (1), \(t_{i} < t_{n-k}(1-\alpha/2)\), the following results may occur for the orthonormal model:

b.1. the null hypothesis is rejected, \(t_{i}^{o} > t_{n-k}(1-\alpha/2)\); then, it is possible to conclude that the degree of multicollinearity affects the statistical analysis of the model, leading to the non-rejection of the null hypothesis in the original model.


b.2. the null hypothesis is also not rejected, \(t_{i}^{o} < t_{n-k}(1-\alpha/2)\); then, the results are consistent.


In conclusion, when option b.1 occurs, the null hypothesis of the individual significance test is not rejected when the linear relationships are considered (original model) but is rejected when the linear relationships are not considered (orthonormal model). Consequently, it is possible to conclude that the linear relationships affect the statistical analysis of the model. The possible inconsistency discussed in option a.2 is analyzed in detail in Appendix Inconsistency, concluding that it will rarely occur in cases where a high degree of multicollinearity is assumed. The other two scenarios provide consistent situations.


    3 A first attempt to obtain a non-rejection region associated with a statistical test to detect multicollinearity


    3.1 From the traditional orthogonal model


Considering the expressions (4) and (5), it is verified that \(t_{i}^{o} = t_{i} \cdot \sqrt{VIF(i)}\). Consequently, in the orthogonal case, with a significance level \(\alpha\), the null hypothesis \(\beta_{i,o} = 0\) is rejected if \(t_{i}^{o} > t_{n-k}(1-\alpha/2)\) for \(i=2,\dots,k\); that is, if:
\[\begin{equation}
  VIF(i) > \left( \frac{t_{n-k}(1-\alpha/2)}{t_{i}} \right)^{2} = c_{1}(i), \quad i=2,\dots,k.
  \tag{10}
\end{equation}\]
Thus, if the VIF associated with the variable \(i\) is greater than the bound \(c_{1}(i)\), it can be concluded that the estimator of the coefficient of that variable is significantly different from zero in the hypothetical case where the variables are orthogonal. In addition, if the null hypothesis is not rejected in the initial model, the failure to reject could be due to the degree of multicollinearity affecting the statistical analysis of the model.
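A quick numeric check of the bounds \(c_{1}(i)\) is possible with the values reported later in Example 2 (the quantile \(t_{13}(0.975) = 2.160\), the experimental t statistics of Table 2, and the VIFs); all numbers below are taken from the text, none are recomputed:

```python
# values from Example 2 and Table 2 of the paper
t_crit = 2.160                                   # t_13(0.975)
t_exp = {"C": 0.828, "I": 1.533, "CP": 0.500}    # |experimental t|
vif = {"C": 589.754, "I": 281.886, "CP": 189.487}

# expression (10): c_1(i) = (t_crit / t_i)^2
c1 = {v: (t_crit / t) ** 2 for v, t in t_exp.items()}
assert 6.7 < c1["C"] < 6.9                 # approx the 6.807 reported
assert all(vif[v] > c1[v] for v in vif)    # all three tests are affected
```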


Finally, note that since the interesting cases are those where the null hypothesis is not initially rejected, \(t_{i} < t_{n-k}(1-\alpha/2)\), the bound \(c_{1}(i)\) will always be greater than one.


Example 2. Table 1 shows a dataset (previously presented by Wissel (2009)) with the following variables: outstanding mortgage debt (\(\mathbf{D}\), trillions of dollars), personal consumption (\(\mathbf{C}\), trillions of dollars), personal income (\(\mathbf{I}\), trillions of dollars) and outstanding consumer credit (\(\mathbf{CP}\), trillions of dollars) for the years 1996 to 2012.

Table 1: Data set presented previously by Wissel (2009)

| t    | D      | C     | I      | CP      |
|------|--------|-------|--------|---------|
| 1996 | 3.805  | 4.770 | 4.879  | 808.23  |
| 1997 | 3.946  | 4.778 | 5.051  | 798.03  |
| 1998 | 4.058  | 4.935 | 5.362  | 806.12  |
| 1999 | 4.191  | 5.100 | 5.559  | 865.65  |
| 2000 | 4.359  | 5.291 | 5.843  | 997.30  |
| 2001 | 4.545  | 5.434 | 6.152  | 1140.70 |
| 2002 | 4.815  | 5.619 | 6.521  | 1253.40 |
| 2003 | 5.129  | 5.832 | 6.915  | 1324.80 |
| 2004 | 5.615  | 6.126 | 7.423  | 1420.50 |
| 2005 | 6.225  | 6.439 | 7.802  | 1532.10 |
| 2006 | 6.786  | 6.739 | 8.430  | 1717.50 |
| 2007 | 7.494  | 6.910 | 8.724  | 1867.20 |
| 2008 | 8.399  | 7.099 | 8.882  | 1974.10 |
| 2009 | 9.395  | 7.295 | 9.164  | 2078.00 |
| 2010 | 10.680 | 7.561 | 9.727  | 2191.30 |
| 2011 | 12.071 | 7.804 | 10.301 | 2284.90 |
| 2012 | 13.448 | 8.044 | 10.983 | 2387.50 |

Table 2 shows the OLS estimation of the model explaining the outstanding mortgage debt as a function of the rest of the variables. That is:
\[\mathbf{D} = \beta_{1} + \beta_{2} \cdot \mathbf{C} + \beta_{3} \cdot \mathbf{I} + \beta_{4} \cdot \mathbf{CP} + \mathbf{u}.\]
Note that the estimates for the coefficients of personal consumption, personal income and outstanding consumer credit are not significantly different from zero (a significance level of 5% is considered throughout the paper), while the model is considered to be globally valid (experimental value, F exp., higher than the theoretical value).

Table 2: OLS estimation for the Wissel model

|                                       | Estimator | Standard Error | Experimental t | p-value |
|---------------------------------------|-----------|----------------|----------------|---------|
| Intercept                             | 5.469     | 13.017         | 0.420          | 0.681   |
| Personal consumption                  | -4.252    | 5.135          | -0.828         | 0.422   |
| Personal income                       | 3.120     | 2.036          | 1.533          | 0.149   |
| Outstanding consumer credit           | 0.003     | 0.006          | 0.500          | 0.626   |
| (Obs, Sigma Est., Coef. Det., F exp.) | 17.000    | 0.870          | 0.923          | 52.305  |

In addition, the estimated coefficient for the variable personal consumption, which is not significantly different from zero, has the opposite sign to the simple correlation coefficient between this variable and outstanding mortgage debt, 0.953. Thus, in the simple linear regression between both variables (see Table 3), the estimated coefficient of the variable personal consumption is positive and significantly different from zero. However, when a second variable is added (see Tables 4 and 5), none of the coefficients are individually significantly different from zero, although both models are globally significant. This is traditionally understood as a symptom of statistically troubling multicollinearity.

Table 3: OLS estimation for part of the Wissel model

|                                       | Estimator | Standard Error | Experimental t | p-value |
|---------------------------------------|-----------|----------------|----------------|---------|
| Intercept                             | -9.594    | 1.351          | -7.102         | 0.000   |
| Personal consumption                  | 2.629     | 0.214          | 12.285         | 0.000   |
| (Obs, Sigma Est., Coef. Det., F exp.) | 17.000    | 0.890          | 0.910          | 150.925 |
Table 4: OLS estimation for part of the Wissel model

|                                       | Estimator | Standard Error | Experimental t | p-value |
|---------------------------------------|-----------|----------------|----------------|---------|
| Intercept                             | -0.117    | 6.476          | -0.018         | 0.986   |
| Personal consumption                  | -2.343    | 3.335          | -0.703         | 0.494   |
| Personal income                       | 2.856     | 1.912          | 1.494          | 0.158   |
| (Obs, Sigma Est., Coef. Det., F exp.) | 17.000    | 0.823          | 0.922          | 82.770  |
Table 5: OLS estimation for part of the Wissel model

|                                       | Estimator | Standard Error | Experimental t | p-value |
|---------------------------------------|-----------|----------------|----------------|---------|
| Intercept                             | -8.640    | 9.638          | -0.896         | 0.385   |
| Personal consumption                  | 2.335     | 2.943          | 0.793          | 0.441   |
| Outstanding consumer credit           | 0.001     | 0.006          | 0.100          | 0.922   |
| (Obs, Sigma Est., Coef. Det., F exp.) | 17.000    | 0.953          | 0.910          | 70.487  |

Using expression (10) to confirm this problem, it is verified that \(c_{1}(2) = 6.807\), \(c_{1}(3) = 1.985\) and \(c_{1}(4) = 18.743\), taking into account that \(t_{13}(0.975) = 2.160\). Since the VIFs are equal to 589.754, 281.886 and 189.487, respectively, it is concluded that the individual significance tests for the three variables are affected by the degree of multicollinearity existing in the model. \(\lozenge\)
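As a reproducibility sketch for this example, the VIFs of \(\mathbf{C}\), \(\mathbf{I}\) and \(\mathbf{CP}\) can be recomputed from the Table 1 data. The following Python/NumPy code (a sketch, not the R code used in the paper; small rounding differences are possible, so only orders of magnitude are asserted) checks that each VIF exceeds its bound \(c_{1}(i)\):

```python
import numpy as np

# columns C, I, CP transcribed from Table 1
data = np.array([
    [4.770, 4.879, 808.23], [4.778, 5.051, 798.03], [4.935, 5.362, 806.12],
    [5.100, 5.559, 865.65], [5.291, 5.843, 997.30], [5.434, 6.152, 1140.70],
    [5.619, 6.521, 1253.40], [5.832, 6.915, 1324.80], [6.126, 7.423, 1420.50],
    [6.439, 7.802, 1532.10], [6.739, 8.430, 1717.50], [6.910, 8.724, 1867.20],
    [7.099, 8.882, 1974.10], [7.295, 9.164, 2078.00], [7.561, 9.727, 2191.30],
    [7.804, 10.301, 2284.90], [8.044, 10.983, 2387.50]])

def vif(M, i):
    """Classical VIF via the auxiliary regression of column i on the rest."""
    xi = M[:, i]
    Xr = np.column_stack([np.ones(len(M)), np.delete(M, i, axis=1)])
    resid = xi - Xr @ np.linalg.lstsq(Xr, xi, rcond=None)[0]
    r2 = 1 - (resid @ resid) / np.sum((xi - xi.mean()) ** 2)
    return 1 / (1 - r2)

vifs = [vif(data, i) for i in range(3)]     # reported: 589.754, 281.886, 189.487
c1 = [6.807, 1.985, 18.743]                 # bounds from the text
assert all(v > 50 for v in vifs)            # severe multicollinearity
assert all(v > b for v, b in zip(vifs, c1)) # expression (10) is satisfied
```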


    3.2 From the alternative orthonormal model (6)


In the Subsection An orthonormal reference model, the individual significance test is redefined from the expression (9). Thus, the null hypothesis \(\beta_{i,o}=0\) will be rejected, with a significance level \(\alpha\), if the following condition is verified:
\[t_{i}^{o} > t_{n-k}(1-\alpha/2), \quad i=2,\dots,k.\]
Taking into account the expressions (4) and (9), this is equivalent to:
\[\begin{equation}
  VIF(i) > \left( \frac{t_{n-k}(1-\alpha/2)}{\widehat{\beta}_{i,o}} \right)^{2} \cdot \widehat{var} \left( \widehat{\beta}_{i} \right) \cdot n \cdot var(\mathbf{X}_{i}) = c_{2}(i).
  \tag{11}
\end{equation}\]


Thus, if \(VIF(i)\) is greater than \(c_{2}(i)\), the null hypothesis is rejected in the corresponding individual significance test in the orthonormal model (with \(i=2,\dots,k\)). Then, if the null hypothesis is not rejected in the original model and it is verified that \(VIF(i) > c_{2}(i)\), it can be concluded that the multicollinearity existing in the model affects its statistical analysis. In summary, a lower bound for the VIF is established that indicates when the approximate multicollinearity is troubling, and it can be reinterpreted and presented as the non-rejection region of a statistical test.


Example 3. Continuing with the dataset presented by Wissel (2009), Table 6 shows the results of the OLS estimation of the orthonormal model obtained from the original model.

Table 6: OLS estimation for the orthonormal Wissel model

|                                       | Estimator | Standard Error | Experimental t | p-value |
|---------------------------------------|-----------|----------------|----------------|---------|
| Intercept                             | -27.882   | 0.932          | -29.901        | 0.000   |
| Personal consumption                  | 11.592    | 0.932          | 12.432         | 0.000   |
| Personal income                       | -1.355    | 0.932          | -1.453         | 0.170   |
| Outstanding consumer credit           | 0.466     | 0.932          | 0.500          | 0.626   |
| (Obs, Sigma Est., Coef. Det., F exp.) | 17.000    | 0.870          | 0.923          | 52.305  |

    When these results are compared with those in Table 2, the following conclusions can be obtained:

• Except for the outstanding consumer credit variable, whose standard error has increased, the standard errors have decreased in all cases.

    • The absolute values of the experimental statistics of the individual significance tests associated with the intercept and the personal consumption variable have increased, while the experimental statistic of the personal income variable has decreased, and the experimental statistic of the outstanding consumer credit variable remains the same. These facts show that the change from the original model to the orthonormal model does not guarantee an increase in the absolute value of the experimental statistic.

    • The estimation of the coefficient of the personal consumption variable is not significantly different from zero in the original model, but it is in the orthogonal model. Thus, it is concluded that multicollinearity affects the statistical analysis of the model. Note that there is also a change in the sign of the estimate, although the purpose of the orthogonal model is not to obtain estimates for the coefficients, but rather to provide a reference point against which to measure how much the variances are inflated. Note that an orthonormal model is an idealized construction that may lack a proper interpretation in practice.

    • The values corresponding to the estimated variance for the random disturbance, the coefficient of determination and the experimental statistic (F exp.) for the global significance test remain the same.


On the other hand, considering the VIFs of the independent variables except for the intercept (589.754, 281.886 and 189.487) and their corresponding bounds (17.809, 623.127 and 3545.167) obtained from the expression (11), only the personal consumption variable verifies that the VIF is higher than the corresponding bound. These results are different from those obtained in Example 2, where the traditional orthogonal model was taken as a reference.
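This comparison can be stated compactly; the following snippet simply replays the reported figures (all numbers are taken from the text, none are recomputed):

```python
# VIFs and bounds c_2(i) from Example 3 of the text
vif = {"C": 589.754, "I": 281.886, "CP": 189.487}
c2 = {"C": 17.809, "I": 623.127, "CP": 3545.167}

exceeds = [v for v in vif if vif[v] > c2[v]]
assert exceeds == ["C"]   # only personal consumption exceeds its bound
```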


    Finally, Tables 2 and 6 show that the experimental values of the statistic \(t\) of the variable outstanding consumer credit are the same in the original and orthonormal models. \(\lozenge\)


    The last fact highlighted at the end of the previous example is not a coincidence, but a consequence of the QR decomposition, see Appendix Test of…. Therefore, in this case, the conclusion of the individual significance test will be the same in the original and in the orthonormal model, i.e. we will always be in scenarios a.1 or b.2.


Thus, this behavior creates a situation in which the variable to be fixed in the last position must be selected. Some criteria to select the most appropriate variable for this placement could be:

    • To fix the variable that is considered less relevant to the model.

    • To fix a variable whose associated coefficient is significantly different from zero, since this case would not be of interest for the definition of multicollinearity given in the paper. Note that the interest will be related to a coefficient considered as zero in the original model and significantly different from zero in the orthonormal one.


    These options are explored in the Subsection Choice of the variable to be fix….


    4 A non-rejection region associated with a statistical test to detect multicollinearity


    Salmerón et al. (2025) show that high values of RVIF are associated with a high degree of multicollinearity. The question, however, is how high RVIFs have to be to reflect troubling multicollinearity.


Taking into account the expressions (8) and (11), it is possible to conclude that multicollinearity is affecting the statistical analysis of the model if it can be verified that:
\[\begin{equation}
  RVIF(i) > \left( \frac{t_{n-k}(1-\alpha/2)}{\widehat{\beta}_{i,o}} \right)^{2} \cdot \widehat{var} \left( \widehat{\beta}_{i} \right) = c_{3}(i),
  \tag{12}
\end{equation}\]
for any \(i=1,\dots,k\). Note that the intercept is included in this proposal, in contrast to the previous section, in which it was not included.


Following O'Brien (2007), and taking into account that the estimation of the expression (2) can be expressed as:
\[\widehat{var} \left( \widehat{\beta}_{i} \right) = \widehat{\sigma}^{2} \cdot RVIF(i) = \frac{\mathbf{e}^{t}\mathbf{e}}{n-k} \cdot RVIF(i),\]
there are other factors that counterbalance a high value of the RVIF, thereby avoiding high estimated variances for the estimated coefficients. These factors are the sum of the squared residuals (SSR = \(\mathbf{e}^{t}\mathbf{e}\)) of the model (1) and \(n\). Thus, an appropriate specification of the econometric model (i.e., one that implies a good fit and, consequently, a small SSR) and a large sample size can compensate for high RVIF values. However, contrary to what happens for the VIF in the traditional case, these factors are taken into account in the threshold \(c_{3}(i)\) established in the expression (12), through \(\widehat{var} \left( \widehat{\beta}_{i} \right)\).
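The decomposition \(\widehat{var}(\widehat{\beta}_{i}) = \widehat{\sigma}^{2} \cdot RVIF(i)\) can be checked numerically under the unit-length scaling used for the RVIF in Salmerón et al. (2025): with unit-length columns, \(RVIF(i)\) equals \([(\mathbf{X}^{t}\mathbf{X})^{-1}]_{ii}\), the factor multiplying \(\widehat{\sigma}^{2}\) in the usual OLS variance estimate. A hedged NumPy sketch with synthetic data:

```python
import numpy as np

def rvif(X, i):
    """RVIF(i) per equation (8), via the auxiliary regression of column i."""
    xi = X[:, i]
    Xr = np.delete(X, i, axis=1)
    proj = Xr @ np.linalg.solve(Xr.T @ Xr, Xr.T @ xi)
    return (xi @ xi) / (xi @ xi - xi @ proj)

rng = np.random.default_rng(4)
X = np.column_stack([np.ones(25), rng.normal(size=(25, 2))])
U = X / np.linalg.norm(X, axis=0)       # unit-length transformation

# with unit-length columns, RVIF(i) = [(U'U)^{-1}]_{ii}
G_inv = np.linalg.inv(U.T @ U)
for i in range(U.shape[1]):
    assert abs(rvif(U, i) - G_inv[i, i]) < 1e-8
```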


Example 4. This contribution can be illustrated with the data set previously presented by Klein and Goldberger (1955), which includes variables for consumption, \(\mathbf{C}\), wage incomes, \(\mathbf{I}\), non-farm incomes, \(\mathbf{InA}\), and farm incomes, \(\mathbf{IA}\), in the United States from 1936 to 1952, as shown in Table 7 (data from 1942 to 1944 are not available because they were war years).

Table 7: Data set presented previously by Klein and Goldberger (1955)

| Consumption | Wage income | Non-farm income | Farm income |
|-------------|-------------|-----------------|-------------|
| 62.8        | 43.41       | 17.10           | 3.96        |
| 65.0        | 46.44       | 18.65           | 5.48        |
| 63.9        | 44.35       | 17.09           | 4.37        |
| 67.5        | 47.82       | 19.28           | 4.51        |
| 71.3        | 51.02       | 23.24           | 4.88        |
| 76.6        | 58.71       | 28.11           | 6.37        |
| 86.3        | 87.69       | 30.29           | 8.96        |
| 95.7        | 76.73       | 28.26           | 9.76        |
| 98.3        | 75.91       | 27.91           | 9.31        |
| 100.3       | 77.62       | 32.30           | 9.85        |
| 103.2       | 78.01       | 31.39           | 7.21        |
| 108.9       | 83.57       | 35.61           | 7.39        |
| 108.5       | 90.59       | 37.58           | 7.98        |
| 111.4       | 95.47       | 35.17           | 7.42        |
    +

Table 8 shows the OLS estimation of the model explaining consumption as a function of the rest of the variables. Note that there is some incoherence between the individual significance values of the variables and the global significance of the model.

Table 8: OLS estimation for the Klein and Goldberger model

|                                       | Estimator | Standard Error | Experimental t | p-value |
|---------------------------------------|-----------|----------------|----------------|---------|
| Intercept                             | 18.702    | 6.845          | 2.732          | 0.021   |
| Wage income                           | 0.380     | 0.312          | 1.218          | 0.251   |
| Non-farm income                       | 1.419     | 0.720          | 1.969          | 0.077   |
| Farm income                           | 0.533     | 1.400          | 0.381          | 0.711   |
| (Obs, Sigma Est., Coef. Det., F exp.) | 14.000    | 36.725         | 0.919          | 37.678  |
    +

    The RVIFs are calculated, yielding 1.275, 0.002, 0.014 and 0.053, respectively. The associated bounds, \(c_{3}(i)\), are also calculated, yielding 0.002, 0.0001, 0.018 and 1.826, respectively.


    Since the coefficient of the wage income variable is not significantly different from zero, and because it is verified that \(0.002 > 0.0001\), from (12) it is concluded that the degree of multicollinearity existing in the model is affecting its statistical analysis. +\(\lozenge\)


    4.1 From the RVIF


Considering that in the original model (1) the null hypothesis \(\beta_{i} = 0\) of the individual significance test is not rejected if:
\[RVIF(i) > \left( \frac{\widehat{\beta}_{i}}{\widehat{\sigma} \cdot t_{n-k}(1-\alpha/2)} \right)^{2} = c_{0}(i), \quad i=1,\dots,k,\]
while in the orthonormal model the null hypothesis is rejected if \(RVIF(i) > c_{3}(i)\), the following theorem can be established:
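The bound \(c_{0}(i)\) is pure algebra: with \(se(\widehat{\beta}_{i}) = \widehat{\sigma} \sqrt{RVIF(i)}\), the condition \(t_{i} < t_{n-k}(1-\alpha/2)\) is exactly \(RVIF(i) > c_{0}(i)\). A small Python check with arbitrary illustrative numbers (none taken from the paper's models):

```python
import math

# arbitrary illustrative values
beta_i, sigma, t_crit = 0.8, 1.3, 2.16
c0 = (beta_i / (sigma * t_crit)) ** 2          # the bound c_0(i)

for rvif_i in (0.01, 0.1, 1.0, 10.0):
    t_i = abs(beta_i) / (sigma * math.sqrt(rvif_i))  # se = sigma*sqrt(RVIF)
    assert (t_i < t_crit) == (rvif_i > c0)           # the two conditions agree
```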


    Theorem. Given the multiple linear regression model (1), the degree of multicollinearity affects its statistical analysis (with a level of significance of \(\alpha\%\)) if there is a variable \(i\), with \(i=1,\dots,k\), that verifies \(RVIF(i) > \max \{ c_{0}(i), c_{3}(i) \}\).


Note that Salmerón et al. (2025) indicate that the RVIF must be calculated with unit-length data, as any other transformation removes the intercept from the analysis. However, for the correct application of this theorem, the original data must be used, since no transformation has been considered in this paper.


    Example 5. Tables 9 and 10 present the results of applying the theorem to the Wissel (2009) and Klein and Goldberger (1955) models, respectively. Note that in both cases, there is a variable \(i\) that verifies that \(RVIF(i) > \max \{ c_{0}(i), c_{3}(i) \}\), and consequently, we can conclude that the degree of approximate multicollinearity is affecting the statistical analysis in both models (with a level of significance of \(5\%\)). \(\lozenge\)

Table 9: Theorem results of the Wissel model

|                             | RVIFs      | c0       | c3        | Scenario | Affects |
|-----------------------------|------------|----------|-----------|----------|---------|
| Intercept                   | 194.866090 | 7.371069 | 1.017198  | b.1      | Yes     |
| Personal consumption        | 30.326281  | 4.456018 | 0.915790  | b.1      | Yes     |
| Personal income             | 4.765888   | 2.399341 | 10.535976 | b.2      | No      |
| Outstanding consumer credit | 0.000038   | 0.000002 | 0.000715  | b.2      | No      |
Table 10: Theorem results of the Klein and Goldberger model

|                 | RVIFs    | c0       | c3       | Scenario | Affects |
|-----------------|----------|----------|----------|----------|---------|
| Intercept       | 1.275948 | 1.918383 | 0.002189 | a.1      | No      |
| Wage income     | 0.002653 | 0.000793 | 0.000121 | b.1      | Yes     |
| Non-farm income | 0.014131 | 0.011037 | 0.018739 | b.2      | No      |
| Farm income     | 0.053355 | 0.001558 | 1.826589 | b.2      | No      |
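The Theorem reduces to a one-line decision rule. The sketch below (plain Python; the helper name affects is ours) replays it on the Table 10 figures for the Klein and Goldberger model and recovers the reported Affects column:

```python
def affects(rvif, c0, c3):
    """Theorem: multicollinearity affects the test if RVIF > max{c0, c3}."""
    return rvif > max(c0, c3)

rows = {  # variable: (RVIF, c0, c3), values from Table 10
    "Intercept":       (1.275948, 1.918383, 0.002189),
    "Wage income":     (0.002653, 0.000793, 0.000121),
    "Non-farm income": (0.014131, 0.011037, 0.018739),
    "Farm income":     (0.053355, 0.001558, 1.826589),
}
flags = {v: affects(*vals) for v, vals in rows.items()}
assert flags == {"Intercept": False, "Wage income": True,
                 "Non-farm income": False, "Farm income": False}
```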

    5 The rvif package


The results developed in Salmerón et al. (2025) and in this paper have been implemented in the R package rvif (R Core Team (2025)). The following shows how to replicate the results presented in both papers using the commands \(\texttt{rvifs}\) and \(\texttt{multicollinearity}\) available in rvif; the executed code is shown below.


    In addition, the following issues will be addressed:

    • Discussion on the effect of sample size in detecting the influence of multicollinearity on the statistical analysis of the model.

    • Discussion on the choice of the variable to be fixed as the last one before the orthonormalization.


The code used in these two Subsections is available at https://github.com/rnoremlas/RVIF/tree/main/rvif%20package. It is also interesting to consult the package vignette using the command browseVignettes("rvif"), as well as its web page with browseURL(system.file("docs/index.html", package = "rvif")) or https://www.ugr.es/local/romansg/rvif/index.html.


    5.1 Detection of multicollinearity with RVIF: does the degree of multicollinearity affect the statistical analysis of the model?


In Salmerón et al. (2025) a series of examples is presented to illustrate the usefulness of the RVIF to detect the degree of approximate multicollinearity in a multiple linear regression model. The results presented by Salmerón et al. (2025) are reproduced by using the command \(\texttt{rvifs}\) of the rvif package and complemented with the contribution developed in the present work by using the command \(\texttt{multicollinearity}\) of the same package. In order to facilitate the reading of the paper, this information is available in Appendix Examples of….


    On the other hand, the following shows how to use the above commands to obtain the results shown in Table 9 of this paper:

y_W = Wissel[,2]
X_W = Wissel[,3:6]
multicollinearity(y_W, X_W)

            RVIFs            c0            c3 Scenario Affects
1 194.86608971714 7.37106939260  1.0171975958      b.1     Yes
2  30.32628121217 4.45601758858  0.9157898228      b.1     Yes
3   4.76588763064 2.39934057255 10.5359760035      b.2      No
4   0.00003821626 0.00000204264  0.0007149977      b.2      No

    It is noted that the first two arguments of the \(\texttt{multicollinearity}\) command are, respectively, the dependent variable of the linear model and the design matrix containing the independent variables (intercept included as the first column).


The results in Table 10 can be obtained using this code:

y_KG = KG[,1]
cte = rep(1, length(y_KG))
X_KG = cbind(cte, KG[,2:4])
multicollinearity(y_KG, X_KG)

        RVIFs           c0           c3 Scenario Affects
1 1.275947615 1.9183829079 0.0021892653      a.1      No
2 0.002652862 0.0007931658 0.0001206694      b.1     Yes
3 0.014130621 0.0110372472 0.0187393601      b.2      No
4 0.053354814 0.0015584988 1.8265885762      b.2      No

As shown above, in both cases it is concluded that the degree of multicollinearity in the model affects its statistical analysis.


The \(\texttt{multicollinearity}\) command uses by default a significance level of 5% for the application of the Theorem set in Subsection From the RVIF. Note that if the significance level is changed to 1% (third argument of the \(\texttt{multicollinearity}\) command), in the Klein and Goldberger model the individual significance test of the intercept is also affected by the degree of existing multicollinearity:

multicollinearity(y_W, X_W, alpha = 0.01)

            RVIFs            c0           c3 Scenario Affects
1 194.86608971714 3.79137514338  1.977602791      b.1     Yes
2  30.32628121217 2.29199230450  1.780449066      b.1     Yes
3   4.76588763064 1.23412217722 20.483705068      b.2      No
4   0.00003821626 0.00000105065  0.001390076      b.2      No

multicollinearity(y_KG, X_KG, alpha = 0.01)

        RVIFs           c0           c3 Scenario Affects
1 1.275947615 0.9482013897 0.0044292796      b.1     Yes
2 0.002652862 0.0003920390 0.0002441361      b.1     Yes
3 0.014130621 0.0054553932 0.0379131147      b.2      No
4 0.053354814 0.0007703211 3.6955190555      b.2      No

    It can be seen that the values of \(c_{0}\) and \(c_{3}\) change depending on the significance level used.


    5.2 Effect of the sample size on the detection of the influence of multicollinearity on the statistical analysis of the model


    The introduction has highlighted the idea that the measures traditionally used to detect whether the degree of multicollinearity is of concern may indicate that it is troubling while the model analysis is not affected by it. Example 1 shows that this may be due, among other factors, to the size of the sample.


To explore this issue in more detail, an example is given below where traditional measures of multicollinearity detection indicate that the existing multicollinearity is troubling while the statistical analysis of the model is not affected when the sample size is high. In particular, observations are simulated for \(\mathbf{X} = [ \mathbf{1} \ \mathbf{X}_{2} \ \mathbf{X}_{3} \ \mathbf{X}_{4} \ \mathbf{X}_{5} \ \mathbf{X}_{6}]\) where:
\[\mathbf{X}_{2} \sim N(5, 0.1^{2}), \quad \mathbf{X}_{3} \sim N(5, 10^{2}), \quad \mathbf{X}_{4} = \mathbf{X}_{3} + \mathbf{p},\]
\[\mathbf{X}_{5} \sim N(-1, 3^{2}), \quad \mathbf{X}_{6} \sim N(15, 2.5^{2}),\]
with \(\mathbf{p} \sim N(5, 0.5^2)\), considering three different sample sizes: \(n = 3000\) (Simulation 1), \(n = 100\) (Simulation 2) and \(n = 30\) (Simulation 3). In all cases the dependent variable is generated according to:
\[\mathbf{y} = 4 + 5 \cdot \mathbf{X}_{2} - 9 \cdot \mathbf{X}_{3} - 2 \cdot \mathbf{X}_{4} + 2 \cdot \mathbf{X}_{5} + 7 \cdot \mathbf{X}_{6} + \mathbf{u},\]
where \(\mathbf{u} \sim N(0, 2^2)\).
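For readers outside R, the design above can be sketched in Python/NumPy as follows. Seeds are not comparable across languages, so only the qualitative behaviour is checked: the CV of \(\mathbf{X}_{2}\) falls below the usual threshold and the VIFs of \(\mathbf{X}_{3}\) and \(\mathbf{X}_{4}\) exceed 10 (the helper vif is ours):

```python
import numpy as np

def vif(X, i):
    """Classical VIF via the auxiliary regression of column i on the rest."""
    xi = X[:, i]
    Xr = np.delete(X, i, axis=1)            # includes the intercept column
    resid = xi - Xr @ np.linalg.lstsq(Xr, xi, rcond=None)[0]
    r2 = 1 - (resid @ resid) / np.sum((xi - xi.mean()) ** 2)
    return 1 / (1 - r2)

rng = np.random.default_rng(2024)
n = 3000                                    # Simulation 1 sample size
x2 = rng.normal(5, 0.1, n)
x3 = rng.normal(5, 10, n)
x4 = x3 + rng.normal(5, 0.5, n)
x5 = rng.normal(-1, 3, n)
x6 = rng.normal(15, 2.5, n)
X = np.column_stack([np.ones(n), x2, x3, x4, x5, x6])
y = 4 + 5*x2 - 9*x3 - 2*x4 + 2*x5 + 7*x6 + rng.normal(0, 2, n)

assert np.std(x2) / abs(np.mean(x2)) < 0.1002506  # non-essential: X2 vs intercept
assert vif(X, 2) > 10 and vif(X, 3) > 10          # essential: X3 vs X4
```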


To make the results reproducible, a seed has been set using the command set.seed(2024).


This data generation is intended to make the variable \(\mathbf{X}_{2}\) linearly related to the intercept, and \(\mathbf{X}_{3}\) to \(\mathbf{X}_{4}\). This is supported by the results shown in Table 11, which have been obtained with the commands \(\texttt{CV}\), \(\texttt{VIF}\) and \(\texttt{CN}\) of the multiColl package of R (R Core Team (2025)).


    The results imply the same conclusions in all three simulations:

    • There is a worrying degree of non-essential multicollinearity in the model relating the intercept to the variable \(\mathbf{X}_{2}\) since its coefficient of variation (CV) is lower than 0.1002506.

    • There is a worrying degree of essential multicollinearity in the model relating the variables \(\mathbf{X}_{3}\) and \(\mathbf{X}_{4}\) since the associated Variance Inflation Factors (VIF) are greater than 10.

Table 11: CVs, VIFs and CNs for the data of Simulations 1, 2 and 3

|        | Simulation 1 | Simulation 2 | Simulation 3 |
|--------|--------------|--------------|--------------|
| X2 CV  | 0.020        | 0.022        | 0.020        |
| X3 CV  | 2.004        | 1.382        | 1.179        |
| X4 CV  | 0.999        | 0.794        | 0.692        |
| X5 CV  | 3.085        | 3.425        | 58.735       |
| X6 CV  | 0.164        | 0.150        | 0.161        |
| X2 VIF | 1.003        | 1.028        | 1.229        |
| X3 VIF | 399.382      | 366.074      | 673.396      |
| X4 VIF | 399.378      | 365.372      | 675.300      |
| X5 VIF | 1.001        | 1.043        | 1.132        |
| X6 VIF | 1.001        | 1.017       | 1.066        |
| CN     | 145.832      | 141.879      | 180.820      |

    However, does the degree of multicollinearity detected really affect the statistical analysis of the model? According to the results shown in Tables 12 to 14 this is not always the case:

    • In Simulation 1, when \(n=3000\), the degree of multicollinearity in the model does not affect the statistical analysis of the model; scenario a.1 is always verified, i.e., both in the model proposed and in the orthonormal model, the null hypothesis is rejected in the individual significance tests.

    • In Simulation 2, when \(n=100\), the degree of multicollinearity in the model affects the statistical analysis of the model only in the individual significance of the intercept; in all other cases scenario a.1 is verified again.

      • As will be seen below, the fact that the individual significance of the variable \(\mathbf{X}_{2}\) is not affected may be due to the number of observations in the data set. But it may also be because multicollinearity of the nonessential type affects only the intercept estimate. Thus, for example, in Salmerón et al. (2019) it is shown (see Table 2 of Example 2) that solving this type of approximate multicollinearity (by centering the variables that cause it) only modifies the estimate of the intercept and its standard deviation, with the estimates of the rest of the independent variables remaining unchanged.
    • In Simulation 3, when \(n=30\), the degree of multicollinearity in the model affects the statistical analysis of the model in the individual significance of the intercept, in \(\mathbf{X}_{2}\) and in \(\mathbf{X}_{4}\).

      • In this case, as discussed, the reduction in sample size does not prevent the individual significance of \(\mathbf{X}_{2}\) from being affected.

    In conclusion, as O’Brien (2007) indicates, it can be seen that the increase in sample size prevents the statistical analysis of the model from being affected by the degree of existing multicollinearity, even though the values of the measures traditionally used to detect this problem indicate that it is troubling. To reach this conclusion, the use of the RVIF proposed by Salmerón et al. (2025) and the theorem developed in this paper is decisive.

    Table 12: Theorem results of the Simulation 1 model

    |           | RVIFs    | c0       | c3       | Scenario | Affects |
    |-----------|----------|----------|----------|----------|---------|
    | Intercept | 0.891830 | 3.044340 | 0.000001 | a.1      | No      |
    | X2        | 0.034562 | 1.244072 | 0.000153 | a.1      | No      |
    | X3        | 0.001348 | 5.582043 | 0.000000 | a.1      | No      |
    | X4        | 0.001345 | 0.237822 | 0.000005 | a.1      | No      |
    | X5        | 0.000037 | 0.269382 | 0.000000 | a.1      | No      |
    | X6        | 0.000055 | 3.300249 | 0.000000 | a.1      | No      |
    Table 13: Theorem results of the Simulation 2 model

    |           | RVIFs     | c0       | c3       | Scenario | Affects       |
    |-----------|-----------|----------|----------|----------|---------------|
    | Intercept | 24.017641 | 2.248727 | 0.001408 | b.1      | Yes           |
    | X2        | 0.881648  | 1.396582 | 0.005166 | a.1      | No            |
    | X3        | 0.041414  | 5.850199 | 0.000001 | a.1      | No            |
    | X4        | 0.041057  | 0.275152 | 0.078597 | a.2      | Contradiction |
    | X5        | 0.001178  | 0.300116 | 0.000004 | a.1      | No            |
    | X6        | 0.001934  | 3.571484 | 0.000001 | a.1      | No            |
    Table 14: Theorem results of the Simulation 3 model

    |           | RVIFs      | c0       | c3       | Scenario | Affects |
    |-----------|------------|----------|----------|----------|---------|
    | Intercept | 127.767375 | 0.671403 | 0.035046 | b.1      | Yes     |
    | X2        | 4.150578   | 3.062357 | 0.003971 | b.1      | Yes     |
    | X3        | 0.299490   | 5.211866 | 0.000020 | a.1      | No      |
    | X4        | 0.303316   | 0.191530 | 0.012989 | b.1      | Yes     |
    | X5        | 0.002443   | 0.244713 | 0.000053 | a.1      | No      |
    | X6        | 0.005971   | 2.766078 | 0.000013 | a.1      | No      |

    5.3 Selection of the variable to be placed last before orthonormalization


    Since there are as many QR decompositions as there are possible rearrangements of the independent variables, it is convenient to test different options to determine whether the degree of multicollinearity in the regression model affects its statistical analysis.


    A first possibility is to try all possible reorderings, bearing in mind that the intercept must always be in first place. Thus, in Example 2 of Salmerón et al. (2025) (see Appendix Examples of… for more details) it is considered that \(\mathbf{X} = [ \mathbf{1} \ \mathbf{K} \ \mathbf{W}]\) (see Table 15), but it could also be considered that \(\mathbf{X} = [ \mathbf{1} \ \mathbf{W} \ \mathbf{K}]\) (see Table 16).


    Note that in these tables the values for each variable of RVIF and \(c_{0}\) are always the same, but those of \(c_{3}\) change depending on the position of each variable within the design matrix.
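
The dependence of the reference orthonormal model on the column ordering can be illustrated with base R alone. This is a minimal sketch with simulated data; no rvif functions are used and all names are illustrative:

```r
# Sketch with simulated data: the QR decomposition, and hence the orthonormal
# reference model, depends on the ordering of the columns of X.
set.seed(1)
X <- cbind(1, matrix(rnorm(30), ncol = 3))   # intercept plus three regressors
Q1 <- qr.Q(qr(X))                            # decomposition, original ordering
Q2 <- qr.Q(qr(X[, c(1, 4, 3, 2)]))          # same columns, reordered
# The intercept column yields the same direction in both decompositions...
max(abs(abs(Q1[, 1]) - abs(Q2[, 1])))        # essentially zero
# ...but the remaining orthonormal columns differ between the two orderings.
max(abs(abs(Q1[, 2]) - abs(Q2[, 2])))        # clearly non-zero in general
```

This is the mechanism behind the changing \(c_{3}\) values: each reordering induces a different orthonormal reference model.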

    Table 15: Theorem results of Example 2 of Salmerón et al. (2025)

    |           | RVIFs       | c0           | c3         | Scenario | Affects |
    |-----------|-------------|--------------|------------|----------|---------|
    | Intercept | 6388.887975 | 88495.933700 | 1.649518   | a.1      | No      |
    | Capital   | 4.136993    | 207.628058   | 0.050431   | a.1      | No      |
    | Work      | 37.336378   | 9.445619     | 147.582132 | b.2      | No      |
    Table 16: Theorem results of Example 2 of Salmerón et al. (2025) (reordering 2)

    |           | RVIFs       | c0           | c3       | Scenario | Affects |
    |-----------|-------------|--------------|----------|----------|---------|
    | Intercept | 6388.882446 | 88495.933700 | 1.649518 | a.1      | No      |
    | Work      | 37.336378   | 9.445619     | 1.163201 | b.1      | Yes     |
    | Capital   | 4.136993    | 207.628058   | 0.082430 | a.1      | No      |

    It is observed that in one of the two possibilities considered, the individual significance of the labor variable is affected by the degree of existing multicollinearity.


    Therefore, to state that the statistical analysis of the multiple linear regression model is not affected by the multicollinearity present in the model, it is necessary to check all the possible QR decompositions and to verify that in none of them the statistical analysis is affected. In contrast, to determine that the statistical analysis of the model is affected by the presence of multicollinearity, it is sufficient to find one rearrangement in which situation b.1 occurs.
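
With the intercept fixed in first place, the \((k-1)!\) candidate orderings can be generated with base R. A minimal sketch; the recursive helper perms() is hypothetical and not part of the rvif package:

```r
# Sketch: enumerate all column orderings of X with the intercept fixed first;
# each ordering induces one QR decomposition to be checked.
# The helper perms() is hypothetical (not provided by rvif).
perms <- function(v) {
  if (length(v) <= 1) return(list(v))
  out <- list()
  for (i in seq_along(v))
    for (p in perms(v[-i])) out[[length(out) + 1]] <- c(v[i], p)
  out
}
k <- 4                                       # number of columns of X
orderings <- lapply(perms(2:k), function(p) c(1, p))
length(orderings)                            # (k - 1)! = 6 decompositions
```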


    Another possibility is to place a particular variable in the last position of \(\mathbf{X}\) following a specific criterion. Thus, for example, in Example 3 of Salmerón et al. (2025) (see Appendix Examples of… for more details) it is verified that the variable FA has a coefficient significantly different from zero. Fixing this variable in third place, since its individual significance will not be modified, yields the results shown in Table 17.

    Table 17: Theorem results of Example 3 of Salmerón et al. (2025) reordering

    |    | RVIFs                  | c0                     | c3                     | Scenario | Affects |
    |----|------------------------|------------------------|------------------------|----------|---------|
    | OI | 0.00000000000169645443 | 0.00000000000095949419 | 0.00000000000017752441 | b.1      | Yes     |
    | S  | 0.00000000000171853492 | 0.00000000000110043700 | 0.00000000000101211253 | b.1      | Yes     |
    | FA | 0.00000000000000018292 | 0.00000000000000023077 | 0.00000000000000014498 | a.1      | No      |

    It can be seen that in this case the degree of multicollinearity in the model affects the individual significance of the OI and S variables.


    6 Conclusions


    In this paper, following Salmerón et al. (2025), we propose an alternative orthogonal model that leads to a lower bound for the RVIF, indicating whether the degree of multicollinearity present in the model affects its statistical analysis. These thresholds serve as complements to the results presented by O’Brien (2007), who stated that the estimated variances depend on other factors that can counterbalance a high value of the VIF, for example, the size of the sample or the estimated variance of the independent variables. Thus, the thresholds presented for the RVIF also depend on these factors meeting a threshold associated with each independent variable (including the intercept). Note that these thresholds will indicate whether the degree of multicollinearity affects the statistical analysis.


    As these thresholds are derived from the individual significance tests of the model, it is possible to reinterpret them as a statistical test to determine whether the degree of multicollinearity in the linear regression model affects its statistical analysis. This analytic tool allows researchers to conclude whether the degree of multicollinearity is statistically troubling and whether it needs to be treated. We consider this to be a relevant contribution since, to the best of our knowledge, the only existing example of such a measure, presented by Farrar and Glauber (1967), has been strongly criticized (in addition to the limitations highlighted in the introduction, it should be noted that it completely ignores approximate non-essential multicollinearity since the correlation matrix does not include information on the intercept); consequently, this new statistical test with a non-rejection region will fill a gap in the scientific literature.


    On the other hand, note that the position of each of the variables in the matrix \(\mathbf{X}\) uniquely determines the reference orthonormal model \(\mathbf{X}_{o}\). That is to say, there are as many reference models given by the proposed QR decomposition as there are possible rearrangements of the variables within the matrix \(\mathbf{X}\).


    In this sense, as has been shown, in order to affirm that the statistical analysis of the model is not affected by the degree of multicollinearity existing in the model (at the significance level used in the application of the proposed theorem), it is necessary to verify that scenario b.1 does not occur in any of the possible rearrangements of \(\mathbf{X}\). On the other hand, when there is a rearrangement in which this scenario appears, it can be stated (at the significance level used when applying the proposed theorem) that the degree of existing multicollinearity affects the statistical analysis of the model.


    Finally, as a future line of work, it would be interesting to complete the analysis presented here by studying when the degree of multicollinearity in the model affects its numerical analysis.


    7 Acknowledgments


    This work has been supported by project PP2019-EI-02 of the University of Granada (Spain) and by project A-SEJ-496-UGR20 of the Andalusian Government’s Counseling of Economic Transformation, Industry, Knowledge and Universities (Spain).


    8 Appendix


    8.1 Inconsistency in hypothesis tests: situation a.2


    From a numerical point of view it is possible to reject \(H_{0}: \beta_{i} = 0\) while \(H_{0}: \beta_{i,o} = 0\) is not rejected, which implies that \(t_{i}^{o} < t_{n-k}(1 - \alpha/2) < t_{i}\). Or, in other words, \(t_{i}/t_{i}^{o} > 1\).


    However, from expression (9) it is obtained that \(\widehat{\sigma} = | \widehat{\beta}_{i,o} | / t_{i}^{o}\). By substituting \(\widehat{\sigma}\) in expression (4), taking into account expression (8), it is obtained that
    \[\frac{t_{i}}{t_{i}^{o}} = \frac{| \widehat{\beta}_{i} |}{| \widehat{\beta}_{i,o} |} \cdot \frac{1}{\sqrt{RVIF(i)}}.\]
    From this expression it can be concluded that in situations with high collinearity, \(RVIF(i) \rightarrow +\infty\), the ratio \(t_{i}/t_{i}^{o}\) will tend to zero, and the condition \(t_{i}/t_{i}^{o} > 1\) will rarely occur. That is to say, the inconsistency in situation a.2, commented on in the preliminaries of the paper, will not appear.


    On the other hand, if the variable \(i\) is orthogonal to the rest of the independent variables, it is verified that \(\widehat{\beta}_{i,o} = \widehat{\beta}_{i}\) since \(p_{i} = ( 0 \dots \underbrace{1}_{(i)} \dots 0)\). At the same time, \(RVIF(i) = \frac{1}{SST_{i}}\), where \(SST\) denotes the total sum of squares. If there is orthonormality, as proposed in this paper, \(SST_{i} = 1\) and, as a consequence, it is verified that \(t_{i} = t_{i}^{o}\). Thus, the individual significance tests for the original data and for the orthonormal data are identical.


    8.2 Test of individual significance of coefficient \(k\)


    Taking into account that it is verified that \(\boldsymbol{\beta}_{o} = \mathbf{P} \boldsymbol{\beta}\) where:
    \[\boldsymbol{\beta}_{o} = \left(
      \begin{array}{c}
      \beta_{1,o} \\
      \beta_{2,o} \\
      \vdots \\
      \beta_{k,o}
      \end{array} \right), \quad
      \mathbf{P} = \left(
      \begin{array}{cccc}
      p_{11} & p_{12} & \dots & p_{1k} \\
      0 & p_{22} & \dots & p_{2k} \\
      \vdots & \vdots & & \vdots \\
      0 & 0 & \dots & p_{kk}
      \end{array} \right), \quad
      \boldsymbol{\beta} = \left(
      \begin{array}{c}
      \beta_{1} \\
      \beta_{2} \\
      \vdots \\
      \beta_{k}
      \end{array} \right),\]
    it is obtained that \(\beta_{k,o} = p_{kk} \beta_{k}\). Then, the null hypothesis \(H_{0}: \beta_{k,o} = 0\) is equivalent to \(H_{0}: \beta_{k} = 0\). Due to this fact, Tables 2 and 6 showed an expected behaviour. However, this behaviour will be analyzed in more detail.


    The experimental value to be considered to take a decision in the test with null hypothesis \(H_{0}: \beta_{k,o} = 0\) and alternative hypothesis \(H_{1}: \beta_{k,o} \not= 0\) is given by the following expression:
    \[t_{k}^{o} = \left| \frac{\widehat{\beta}_{k,o}}{\sqrt{var \left( \widehat{\beta}_{k,o} \right)}} \right|.\]


    Taking into account that \(\widehat{\boldsymbol{\beta}}_{o} = \mathbf{P} \widehat{\boldsymbol{\beta}}\) and \(var \left( \widehat{\boldsymbol{\beta}}_{o} \right) = \mathbf{P} var \left( \widehat{\boldsymbol{\beta}} \right) \mathbf{P}^{t}\), it is verified that \(\widehat{\beta}_{k,o} = p_{kk} \widehat{\beta}_{k}\) and \(var \left( \widehat{\beta}_{k,o} \right) = p_{kk}^{2} var \left( \widehat{\beta}_{k} \right)\). Then:
    \[t_{k}^{o} = \left| \frac{p_{kk} \widehat{\beta}_{k}}{p_{kk} \sqrt{var \left( \widehat{\beta}_{k} \right)}} \right| = \left| \frac{\widehat{\beta}_{k}}{\sqrt{var \left( \widehat{\beta}_{k} \right)}} \right| = t_{k},\]
    where \(t_{k}\) is the experimental value to take a decision in the test with null hypothesis \(H_{0}: \beta_{k} = 0\) and alternative hypothesis \(H_{1}: \beta_{k} \not= 0\).
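
This equality is easy to check numerically with base R on simulated data. A sketch under illustrative names; the orthonormal regressors are taken from the QR decomposition:

```r
# Sketch with simulated data: |t| of the last coefficient coincides in the
# original model and in the model on the QR-orthonormalized columns.
set.seed(123)
n  <- 50
x2 <- rnorm(n); x3 <- rnorm(n)
y  <- 1 + 2 * x2 + 3 * x3 + rnorm(n)
X  <- cbind(1, x2, x3)
Xo <- qr.Q(qr(X))                            # orthonormal regressors
t_orig  <- summary(lm(y ~ X  - 1))$coefficients[3, "t value"]
t_ortho <- summary(lm(y ~ Xo - 1))$coefficients[3, "t value"]
all.equal(abs(t_orig), abs(t_ortho))         # TRUE: |t_k| is unchanged
```

Both models have the same fitted values and residuals, so \(\widehat{\sigma}\) coincides, and the \(p_{kk}\) factor cancels exactly as in the derivation above.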


    8.3 Examples of Salmerón et al. (2025)


    Example 1 of Salmerón et al. (2025): Detection of traditional nonessential multicollinearity. Using data from a financial model in which the Euribor (E) is analyzed from the Harmonized Index of Consumer Prices (HICP), the balance of payments to net current account (BC) and the government deficit to net nonfinancial accounts (GD), we illustrate the detection of approximate multicollinearity of the non-essential type, i.e. where the intercept is related to one of the remaining independent variables (for details see Marquardt and Snee (1975)). For more information on this data set use help(euribor).


    Note that Salmerón-Gómez et al. (2019) establishes that an independent variable with a coefficient of variation less than 0.1002506 indicates that this variable is responsible for a non-essential multicollinearity problem.
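
This rule can be checked directly in base R. A sketch with simulated data; cv() is a hypothetical helper (the multiColl package provides \(\texttt{CVs}\)):

```r
# Sketch: coefficient of variation of a regressor; values below 0.1002506
# flag non-essential multicollinearity (Salmerón-Gómez et al., 2019).
# cv() is a hypothetical helper; multiColl provides CVs().
cv <- function(x) sd(x) / abs(mean(x))
set.seed(7)
almost_constant <- rnorm(100, mean = 10, sd = 0.1)  # little variability
variable        <- rnorm(100, mean = 10, sd = 10)
cv(almost_constant) < 0.1002506              # TRUE: worrying non-essential
cv(variable)        < 0.1002506              # FALSE
```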


    Thus, first of all, the approximate multicollinearity detection is performed using the measures traditionally applied for this purpose: the Variance Inflation Factor (VIF) and the Condition Number (CN). Values higher than 10 for the VIF (see, for example, Marquardt (1970)) and 30 for the CN (see, for example, Belsley (1991) or Belsley et al. (1980)), imply that the degree of existing multicollinearity is troubling. Moreover, according to Salmerón-Gómez et al. (2019), the VIF is only able to detect essential multicollinearity (relationship between independent variables excluding the intercept, see Marquardt and Snee (1975)), while the CN detects both essential and non-essential multicollinearity.
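
The VIF can be reproduced from its definition with base R alone, which also makes its blindness to non-essential multicollinearity visible. A sketch with simulated data; no package functions are needed:

```r
# Sketch: VIF of the j-th regressor as 1/(1 - R^2) of the auxiliary regression
# of that regressor on the remaining ones; since the intercept never enters as
# a regressand, the VIF cannot detect non-essential multicollinearity.
set.seed(42)
n  <- 100
x2 <- rnorm(n)
x3 <- x2 + rnorm(n, sd = 0.1)                # x3 almost collinear with x2
x4 <- rnorm(n)
X  <- cbind(x2, x3, x4)
vif_of <- function(j, X) {
  r2 <- summary(lm(X[, j] ~ X[, -j]))$r.squared
  1 / (1 - r2)
}
sapply(1:3, vif_of, X = X)                   # VIFs of x2 and x3 far above 10
```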


    Therefore, the values calculated below (using the \(\texttt{VIF}\), \(\texttt{CN}\) and \(\texttt{CVs}\) commands from the multiColl package, see Salmerón et al. (2021) and Salmerón et al. (2022) for more details on this package) indicate that the degree of approximate multicollinearity existing in the model of the essential type is not troubling, while that of the non-essential type is troubling due to the relationship of HIPC with the intercept.

    E = euribor[,1]
    data1 = euribor[,-1]

    VIF(data1)

        HIPC       BC       GD 
    1.349666 1.058593 1.283815 

    CN(data1)

    [1] 39.35375

    CVs(data1)

    [1] 0.06957876 4.34031035 0.55015508

    This assumption is confirmed by calculating the RVIF values, which point to a strong relationship between the second variable and the intercept:

    rvifs(data1, ul = T, intercept = T)

                     RVIF       %
    Intercept  250.294157 99.6005
    Variable 2 280.136873 99.6430
    Variable 3   1.114787 10.2967
    Variable 4   5.525440 81.9019

    The output of the \(\texttt{rvifs}\) command provides the values of the Redefined Variance Inflation Factor (RVIF) and the percentage of multicollinearity due to each variable (denoted as \(a_{i}\) in the An orthonormal… section).


    In this case, three of the four arguments available in the \(\texttt{rvifs}\) command are used:

    • The first of these refers to the design matrix containing the independent variables (the intercept, if any, being the first column).

    • The second argument, \(ul\), indicates that the data is to be transformed into unit length. This transformation makes it possible to establish that the RVIF is always greater than or equal to 1, having as a reference a minimum value that indicates the absence of worrying multicollinearity.

    • The third argument, \(intercept\), indicates whether there is an intercept in the design matrix.


    Note that these results can also be obtained after using the \(\texttt{lm}\) and \(\texttt{model.matrix}\) commands as follows:

    reg_E = lm(euribor[,1]~as.matrix(euribor[,-1]))
    rvifs(model.matrix(reg_E))

                     RVIF       %
    Intercept  250.294157 99.6005
    Variable 2 280.136873 99.6430
    Variable 3   1.114787 10.2967
    Variable 4   5.525440 81.9019

    Finally, the application of the Theorem established in Subsection From the RVIF detects that the individual inference of the second variable (HIPC) is affected by the degree of multicollinearity existing in the model. These results are obtained using the \(\texttt{multicollinearity}\) command from the rvif package:

    multicollinearity(E, data1)

                    RVIFs                  c0                   c3
    1 5.32540760188356455 15.7587108767502624 0.021669072249543789
    2 0.00053578299592945  0.0000032194557411 0.000042493587349727
    3 0.00000000005109564  0.0000000010986494 0.000000000002586237
    4 0.00000000001631439  0.0000000003216522 0.000000000000827476
      Scenario Affects
    1      a.1      No
    2      b.1     Yes
    3      a.1      No
    4      a.1      No

    Therefore, it can be established that the existing multicollinearity affects the statistical analysis of the Euribor model.


    Example 2 of Salmerón et al. (2025): Detection of generalized nonessential multicollinearity. Using data from a Cobb-Douglas production function in which the production (P) is analyzed from the capital (K) and the work (W), we illustrate the detection of approximate multicollinearity of the generalized non-essential type, i.e., that in which at least two independent variables with very little variability (excluding the intercept) are related to each other (for more details, see Salmerón et al. (2020)). For more information on this dataset use help(CDpf).


    Using the \(\texttt{rvifs}\) command, it can be determined that both capital and labor are linearly related to each other, with high RVIF values well above the threshold established as worrying:

    P = CDpf[,1]
    data2 = CDpf[,2:4]

    rvifs(data2, ul = T)

                    RVIF       %
    Intercept  178888.71 99.9994
    Variable 2  38071.44 99.9974
    Variable 3 255219.74 99.9996

    However, the application of the Theorem established in Subsection From the RVIF does not detect that the degree of multicollinearity in the model affects the statistical analysis of the model:

    multicollinearity(P, data2)

            RVIFs           c0           c3 Scenario Affects
    1 6388.887975 88495.933700   1.64951764      a.1      No
    2    4.136993   207.628058   0.05043083      a.1      No
    3   37.336378     9.445619 147.58213164      b.2      No

    However, if we rearrange the design matrix \(\mathbf{X}\), we obtain the following:

    data2 = CDpf[,c(2,4,3)]
    multicollinearity(P, data2)

            RVIFs           c0         c3 Scenario Affects
    1 6388.882446 88495.933700 1.64951764      a.1      No
    2   37.336378     9.445619 1.16320125      b.1     Yes
    3    4.136993   207.628058 0.08242979      a.1      No

    Therefore, it can be established that the existing multicollinearity does affect the statistical analysis of the Cobb-Douglas production function model.


    Example 3 of Salmerón et al. (2025): Detection of essential multicollinearity. Using data from a model in which the number of employees of Spanish companies (NE) is analyzed from the fixed assets (FA), operating income (OI) and sales (S), we illustrate the detection of approximate multicollinearity of the essential type, i.e., that in which at least two independent variables (excluding the intercept) are related to each other (for more details, see Marquardt and Snee (1975)). For more information on this dataset use help(employees).


    In this case, the \(\texttt{rvifs}\) command shows that variables three and four (OI and S) have a high RVIF value, so they are highly linearly related:

    NE = employees[,1]
    data3 = employees[,2:5]

    rvifs(data3, ul = T)

                       RVIF       %
    Intercept      2.984146 66.4896
    Variable 2     5.011397 80.0455
    Variable 3 15186.744870 99.9934
    Variable 4 15052.679178 99.9934

    Note that if the unit-length transformation is omitted in rvifs(data3, ul = T), as happens in the \(\texttt{multicollinearity}\) command, the RVIF cannot be calculated since the system is computationally singular. For this reason, the intercept is eliminated below, since it has been shown above that it does not play a relevant role in the linear relationships of the model.


    Finally, the application of the Theorem established in Subsection From the RVIF detects that the individual inference of the third variable (OI) is affected by the degree of multicollinearity existing in the model:

    multicollinearity(NE, data3[,-1])

                         RVIFs                       c0
    1 0.0000000000000001829154 0.0000000000000002307712
    2 0.0000000000016964544259 0.0000000000009594941859
    3 0.0000000000017185349203 0.0000000000011004369977
                             c3 Scenario Affects
    1 0.00000000000000004679301      a.1      No
    2 0.00000000000021295109127      b.1     Yes
    3 0.00000000000268380859460      b.2      No

    Therefore, it can be established that the existing multicollinearity affects the statistical analysis of the model of the number of employees in Spanish companies.


    Example 4 of Salmerón et al. (2025): The special case of the simple linear model. The simple linear regression model is an interesting case because it has a single independent variable and the intercept. Since the intercept is often not properly considered as an independent variable of the model (see the introduction of Salmerón-Gómez et al. (2019) for more details), different software packages (including R, R Core Team (2025)) do not consider that there can be worrisome multicollinearity in this type of model.


    To illustrate this situation, Salmerón et al. (2025) randomly generate observations for the following two simple linear regression models, \(\mathbf{y}_{1} = \beta_{1} + \beta_{2} \mathbf{V} + \mathbf{u}_{1}\) and \(\mathbf{y}_{2} = \alpha_{1} + \alpha_{2} \mathbf{Z} + \mathbf{u}_{2}\), according to the following code:

    set.seed(2022)
    obs = 50
    cte4 = rep(1, obs)
    V = rnorm(obs, 10, 10)
    y1 = 3 + 4*V + rnorm(obs, 0, 2)
    Z = rnorm(obs, 10, 0.1)
    y2 = 3 + 4*Z + rnorm(obs, 0, 2)

    data4.1 = cbind(cte4, V)
    data4.2 = cbind(cte4, Z)

    For more information on these data sets use help(SLM1) and help(SLM2).


    As mentioned above, R (R Core Team (2025)) denies the existence of multicollinearity in this type of model. Thus, for example, when using the \(\texttt{vif}\) command of the car package on reg=lm(y1~V), the following message is obtained: Error in vif.default(reg): model contains fewer than 2 terms.


    Undoubtedly, this message is coherent with the fact that, as mentioned above, the VIF is not capable of detecting non-essential multicollinearity (which is the only multicollinearity that exists in this type of model). However, the error message provided may lead a non-specialized user to consider that the multicollinearity problem does not exist in this type of model. These issues are addressed in more depth in Salmerón et al. (2022).


    On the other hand, the calculation of the RVIF in the first model shows that the degree of multicollinearity is not troubling, since it presents very low values:

    rvifs(data4.1, ul = T)

                  RVIF       %
    Intercept  2.28619 56.2591
    Variable 2 2.28619 56.2591

    In the second model, by contrast, they are very high, indicating a problem of non-essential multicollinearity:

    rvifs(data4.2, ul = T)

                  RVIF       %
    Intercept  7973.64 99.9875
    Variable 2 7973.64 99.9875

    By using the \(\texttt{multicollinearity}\) command, it is found that the individual inference of the intercept of the second model is affected by the degree of multicollinearity in the model:

    multicollinearity(y1, data4.1)

             RVIFs        c0               c3 Scenario Affects
    1 0.0457237934 0.3306778 0.00000801644147      a.1      No
    2 0.0001862616 0.7394482 0.00000004691791      a.1      No

    multicollinearity(y2, data4.2)

           RVIFs       c0         c3 Scenario Affects
    1 159.472803 1.638066 0.02948054      b.1     Yes
    2   1.598808 1.368176 1.86831678      b.2      No

    Therefore, it can be established that the multicollinearity existing in the first simple linear regression model does not affect the statistical analysis of the model, while in the second one it does.


    8.4 CRAN packages used


    rvif, multiColl, car


    8.5 CRAN Task Views implied by cited packages


    Econometrics, Finance, MixedModels, TeachingStatistics

    +
    +
    +D. Belsley. A guide to using the collinearity diagnostics. Computational Science in Economics and Manegement, 4: 33–50, 1991. URL https://doi.org/10.1007/BF00426854. +
    +
    +D. Belsley, E. Kuh and R. Welsch. Regression diagnostics: Identifying influential data and sources of collinearity. John Wiley; Sons, 1980. URL https://onlinelibrary.wiley.com/doi/book/10.1002/0471725153. +
    +
    +D. E. Farrar and R. R. Glauber. Multicollinearity in regression analysis: The problem revisited. The Review of Economic and Statistics, 49(1): 92–107, 1967. URL https://doi.org/10.2307/1937887. +
    +
    +C. B. García, R. Salmerón, C. García-García and J. García. Residualization: Justification, properties and application. Journal of Applied Statistics, 47(11): 1990–2010, 2019. URL https://doi.org/10.1111/insr.12575. +
    +
    +D. N. Gujarati. Basic econometrics. McGraw-Hill (fourth edition), 2003. URL https://highered.mheducation.com/sites/0072335424/. +
    +
    +R. L. Gunst and R. L. Mason. Advantages of examining multicollinearities in regression analysis. Biometrics, 33(1): 249–260, 1977. URL https://doi.org/10.2307/2529320. +
    +
    +Y. Haitovsky. Multicollinearity in regression analysis: comment. The Review of Economics and Statistics, 51(4): 486–489, 1969. URL https://doi.org/10.2307/1926450. +
    +
    +L. R. Klein and A. S. Goldberger. An economic model of the United States 1929-1952. Amsterdan: North-Holland Publishing Company, 1955. URL https://doi.org/10.2307/2227976. +
    +
    +T. K. Kumar. Multicollinearity in regression analysis. The Review of Economics and Statistics, 57(3): 365–366, 1975. URL https://doi.org/10.2307/1923925. +
    +
    +D. Marquardt. Generalized inverses, ridge regression, biased linear estimation and nonlinear estimation. Technometrics, 12(3): 591–612, 1970. URL https://doi.org/10.2307/1267205. +
    +
    +D. Marquardt and R. D. Snee. Ridge regression in practice. The American Statistician, 29(1): 3–20, 1975. URL https://doi.org/10.2307/2683673. +
    +
    +R. M. O’Brien. A caution regarding rules of thumb for variance inflation factors. Quality & quantity, 41(5): 673–690, 2007. URL https://link.springer.com/article/10.1007/s11135-006-9018-6. +
    +
    +J. O’Hagan and B. McCabe. Tests for the severity of multicolinearity in regression analysis: A comment. The Review of Economics and Statistics, 57(3): 368–370, 1975. URL https://doi.org/10.2307/1923927. +
    +
    +R Core Team. R: A language and environment for statistical computing. 4.5.1 ed Vienna, Austria: R Foundation for Statistical Computing, 2025. URL https://www.R-project.org/. +
    +
    +R. Salmerón, C. B. García and J. García. A guide to using the R package multiColl for detecting multicollinearity. Computational Economics, 57: 529–536, 2021. URL https://doi.org/10.1007/s10614-019-09967-y. +
    +
    +R. Salmerón, C. B. García and J. García. A redefined variance inflation factor: Overcoming the limitations of the variance inflation factor. Computational Economics, 65: 337–363, 2025. URL https://doi.org/10.1007/s10614-024-10575-8. +
    +
    +R. Salmerón, C. B. García and J. García. Comment on A note on collinearity diagnostics and centering” by Velilla (2018). The American Statistician, 74(1): 68–71, 2019. URL https://doi.org/10.1080/00031305.2019.1635527. +
    +
    +R. Salmerón, C. B. García and J. García. Detection of near-multicollinearity through centered and noncentered regression. Mathematics, 8(6): 931, 2020. URL https://doi.org/10.3390/math8060931. +
    +
    +R. Salmerón, C. B. García and J. García. The multiColl package versus other existing packages in R to detect multicollinearity. Computational Economics, 60: 439–450, 2022. URL https://doi.org/10.1007/s10614-021-10154-1. +
    +
    +R. Salmerón-Gómez, A. Rodríguez-Sánchez and C. García-García. Diagnosis and quantification of the non-essential collinearity. Computational Statistics, 35: 647–666, 2019. URL https://doi.org/10.1007/s00180-019-00922-x. +
    +
    +S. Silvey. Multicollinearity and imprecise estimation. Journal of the Royal Statistical Society. Series B (Methodological), 31(3): 539–552, 1969. URL https://www.jstor.org/stable/2984357. +
    +
    +C. R. Wichers. The detection of multicollinearity: A comment. The Review of Economics and Statistics, 57(3): 366–368, 1975. URL https://doi.org/10.2307/1923926. +
    +
    +A. R. Willan and D. G. Watts. Meaningful multicollinearity measures. Technometrics, 20(4): 407–412, 1978. URL https://doi.org/10.1080/00401706.1978.10489694. +
    +
+J. Wissel. A new biased estimator for multivariate regression models with highly collinear variables. PhD thesis, Julius-Maximilians-Universität Würzburg, 2009. URL https://opus.bibliothek.uni-wuerzburg.de/frontdoor/index/index/docId/2949. +
    +
+J. M. Wooldridge. Introductory econometrics: A modern approach. South-Western, CENGAGE Learning (7th edition), 2020. URL https://www.cengage.uk/c/introductory-econometrics-a-modern-approach-7e-wooldridge/9781337558860PF/. +
    +

    References

    +
    +

    Reuse

    +

    Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

    +

    Citation

    +

    For attribution, please cite this work as

    +
    Salmerón-Gómez & García-García, "rvif: a Decision Rule to Detect Troubling Statistical Multicollinearity Based on Redefined VIF", The R Journal, 2026
    +

    BibTeX citation

    +
    @article{RJ-2025-040,
    +  author = {Salmerón-Gómez, Román and García-García, Catalina B.},
    +  title = {rvif: a Decision Rule to Detect Troubling Statistical Multicollinearity Based on Redefined VIF},
    +  journal = {The R Journal},
    +  year = {2026},
    +  note = {https://doi.org/10.32614/RJ-2025-040},
    +  doi = {10.32614/RJ-2025-040},
    +  volume = {17},
    +  issue = {4},
    +  issn = {2073-4859},
    +  pages = {192-215}
    +}
    +
diff --git a/_articles/RJ-2025-040/RJ-2025-040.pdf b/_articles/RJ-2025-040/RJ-2025-040.pdf
new file mode 100644
index 0000000000..0f8294df45
Binary files /dev/null and b/_articles/RJ-2025-040/RJ-2025-040.pdf differ
diff --git a/_articles/RJ-2025-040/RJ-2025-040.tex b/_articles/RJ-2025-040/RJ-2025-040.tex
new file mode 100644
index 0000000000..8cc623caca
--- /dev/null
+++ b/_articles/RJ-2025-040/RJ-2025-040.tex
@@ -0,0 +1,1142 @@
+% !TeX root = RJwrapper.tex
+\title{rvif: a Decision Rule to Detect Troubling Statistical Multicollinearity Based on Redefined VIF}
+
+\author{by Román Salmerón-Gómez and Catalina B. García-García}
+
+\maketitle
+
+\abstract{%
+Multicollinearity is relevant in many fields where linear regression models are applied, since its presence may affect the analysis of the ordinary least squares estimators not only numerically but also, which is the focus of this paper, from a statistical point of view. Indeed, it is known that collinearity can lead to inconsistencies between the statistical significance of the coefficients of the independent variables and the global significance of the model. In this paper, the thresholds of the Redefined Variance Inflation Factor (RVIF) are reinterpreted and presented as a statistical test with a non-rejection region (which depends on a significance level) to diagnose a degree of multicollinearity that is worrying in the sense that it affects the statistical analysis of the linear regression model. The proposed methodology is implemented in the rvif package for R, and its application is illustrated with several real data examples previously analyzed in the scientific literature.
+} + +\section{Introduction}\label{introduction} + +It is well known that linear relationships between the independent variables of a multiple linear regression model (multicollinearity) can affect the analysis of the model estimated by Ordinary Least Squares (OLS), either by causing unstable estimates of the coefficients of these variables or by leading to the non-rejection of the individual significance tests of these coefficients (see, for example, \citet{FarrarGlauber}, \citet{GunstMason1977}, \citet{Gujarati2003}, \citet{Silvey1969}, \citet{WillanWatts1978} or \citet{Wooldrigde2013}). +However, the measures traditionally applied to detect multicollinearity may conclude that multicollinearity exists even if it does not lead to the negative effects mentioned above (see Subsection \hyperref[effect-sample-size]{Effect of sample size\ldots{}} for more details); in that case, the best solution may be not to treat the multicollinearity at all (see \citet{OBrien}). + +Focusing on the possible effect of multicollinearity on the individual significance tests of the coefficients of the independent variables (the tendency not to reject the null hypothesis), this paper proposes an alternative procedure that checks whether the detected multicollinearity affects the statistical analysis of the model. This approach requires a methodology that indicates whether the multicollinearity affects the statistical analysis of the model; introducing such a methodology is the main objective of this paper. The paper also shows the use of the \CRANpkg{rvif} package for R (\citet{R}) in which this procedure is implemented. + +To this end, we start from the Variance Inflation Factor (VIF). The VIF is obtained from the coefficient of determination of the auxiliary regression of each independent variable of the linear regression model as a function of the other independent variables.
Thus, there is a VIF for each independent variable except for the intercept, for which it is not possible to calculate a coefficient of determination for the corresponding auxiliary regression. Consequently, the VIF is able to diagnose the degree of essential approximate multicollinearity (strong linear relationship between the independent variables except the intercept) existing in the model but is not able to detect the non-essential one (strong relationship between the intercept and at least one of the independent variables). +For more information on multicollinearity of essential and non-essential type, see \citet{MarquardtSnee1975} and \citet{Salmeron2019}. + +However, the fact that the VIF detects a worrying level of multicollinearity does not always translate into a negative impact on the statistical analysis. This lack of specificity is due to the fact that other factors, such as sample size and the variance of the random disturbance, can lead to high values of the VIF but not increase the variance of the OLS estimators (see \citet{OBrien}). The explanation for this phenomenon hinges on the fact that, in the orthogonal variable reference model, which is traditionally considered as the reference, the linear relationships are assumed to be eliminated, while other factors, such as the variance of the random disturbance, maintain the same values. + +Then, to avoid these inconsistencies, \citet{Salmeron2024a} propose a QR decomposition in the matrix of independent variables of the model in order to obtain an orthonormal matrix. By redefining the reference point, the variance inflation factor is also redefined, resulting in a new detection measure that analyzes the change in the VIF and the rest of relevant factors of the model, thereby overcoming the problems associated with the traditional VIF, as described by \citet{OBrien} among others. 
The intercept is also included in the detection (contrary to what happens with the traditional VIF), so the RVIF is able to detect both essential and non-essential multicollinearity. +This new measure presented by \citet{Salmeron2024a} is called the Redefined Variance Inflation Factor (RVIF). + +In this paper, the RVIF is associated with a statistical test for detecting troubling multicollinearity; this test is given by a region of non-rejection that depends on a significance level. Note that most of the measures used to diagnose multicollinearity are merely indicators with rules of thumb rather than statistical tests per se. To the best of our knowledge, the only existing statistical test for diagnosing multicollinearity was presented by \citet{FarrarGlauber} and has received strong criticism (see, for example, \citet{CriticaFarrar1}, \citet{CriticaFarrar2}, \citet{CriticaFarrar3} and \citet{CriticaFarrar4}). +For example, \citet{CriticaFarrar1} points out that the Farrar and Glauber statistic only indicates that the variables are not orthogonal to each other; it tells us nothing more. +In the same vein, \citet{CriticaFarrar4} notes that such a test simply indicates whether the null hypothesis of orthogonality is rejected, giving no information on the value of the determinant of the correlation matrix above which the multicollinearity problem becomes intolerable. +Therefore, the non-rejection region presented in this paper should be a relevant contribution to the field of econometrics insofar as it would fill an existing gap in the scientific literature. + +The paper is structured as follows: Sections \hyperref[preliminares]{Preliminaries} and \hyperref[modelo-orto]{A first attempt to\ldots{}} provide preliminary information to introduce the methodology used to establish the non-rejection region described in Section \hyperref[new-VIF-orto]{A non-rejection region\ldots{}}.
+Section \hyperref[paqueteRVIF]{rvif package} presents the package \CRANpkg{rvif} of R (\citet{R}) and shows its main commands by replicating the results given in \citet{Salmeron2024a} and in the previous sections of this paper. +Finally, Section \hyperref[conclusiones-VIF]{Conclusions} summarizes the main contributions of this paper. + +\section{Preliminaries}\label{preliminares} + +This section identifies some inconsistencies in the definition of the VIF and how these are reflected in the individual significance tests of the linear regression model. It also shows how these inconsistencies are overcome in the proposal presented by \citet{Salmeron2024a} and how this proposal can lead to a decision rule to determine whether the degree of multicollinearity is troubling, i.e., whether it affects the statistical analysis (individual significance tests) of the model. + +\subsection{The original model}\label{the-original-model} + +The multiple linear regression model with \(n\) observations and \(k\) independent variables can be expressed as: +\begin{equation} + \mathbf{y}_{n \times 1} = \mathbf{X}_{n \times k} \cdot \boldsymbol{\beta}_{k \times 1} + \mathbf{u}_{n \times 1}, + \label{eq:model0} +\end{equation} +where the first column of \(\mathbf{X} = [\mathbf{1} \ \mathbf{X}_{2} \dots \mathbf{X}_{i} \dots \mathbf{X}_{k}]\) is composed of ones representing the intercept and \(\mathbf{u}\) represents the random disturbance assumed to be centered and spherical. That is, \(E[\mathbf{u}_{n \times 1}] = \mathbf{0}_{n \times 1}\) and \(var(\mathbf{u}_{n \times 1}) = \sigma^{2} \cdot \mathbf{I}_{n \times n}\), where \(\mathbf{0}\) is a vector of zeros, \(\sigma^{2}\) is the variance of the random disturbance and \(\mathbf{I}\) is the identity matrix. 
+ +Given the original model \eqref{eq:model0}, the VIF is defined as the ratio between the variance of the estimator in this model, \(var \left( \widehat{\beta}_{i} \right)\), and the variance of the estimator of a hypothetical reference model, that is, a hypothetical model in which orthogonality among the independent variables is assumed, \(var \left( \widehat{\beta}_{i,o} \right)\). This is to say: + +\begin{equation}\small{ + var \left( \widehat{\beta}_{i} \right) = \frac{\sigma^{2}}{n \cdot var(\mathbf{X}_{i})} \cdot \frac{1}{1 - R_{i}^{2}} = var \left( \widehat{\beta}_{i,o} \right) \cdot VIF(i), \quad i=2,\dots,k, + \label{eq:vari-VIF}} +\end{equation}\\ +\begin{equation} + \frac{ + var \left( \widehat{\beta}_{i} \right) + }{ + var \left( \widehat{\beta}_{i,o} \right) + } = VIF(i), \quad i=2,\dots,k, + \label{eq:vari-VIF2} +\end{equation} +where \(\mathbf{X}_{i}\) is the independent variable \(i\) of the model \eqref{eq:model0} and \(R^{2}_{i}\) the coefficient of determination of the following auxiliary regression: +\begin{equation} + \mathbf{X}_{i} = \mathbf{X}_{-i} \cdot \boldsymbol{\alpha} + \mathbf{v}, + \label{model_aux} \nonumber +\end{equation} +where \(\mathbf{X}_{-i}\) is the result of eliminating \(\mathbf{X}_{i}\) from the matrix \(\mathbf{X}\). + +As observed in the expression \eqref{eq:vari-VIF}, a high VIF leads to a high variance. Then, since the experimental value for the individual significance test is given by: +\begin{equation} + t_{i} = \left| \frac{\widehat{\beta}_{i}}{\sqrt{\frac{\widehat{\sigma}^{2}}{n \cdot var(\mathbf{X}_{i})} \cdot VIF(i)}} \right|, \quad i=2,\dots,k, + \label{eq:texp-orig} +\end{equation} +a high VIF will lead to a low experimental statistic (\(t_{i}\)), provoking the tendency not to reject the null hypothesis, i.e.~the experimental statistic will be lower than the theoretical statistic (given by \(t_{n-k}(1-\alpha/2)\), where \(\alpha\) is the significance level). 
+ +However, this statement is full of simplifications. By following \citet{OBrien}, and as can be easily observed in the expression \eqref{eq:texp-orig}, other factors, such as the estimation of the random disturbance and the size of the sample, can counterbalance the high value of the VIF to yield a low value for the experimental statistic. That is to say, it is possible to obtain VIF values greater than 10 (the threshold traditionally established as troubling, see \citet{Marquardt1970} for example) that do not necessarily imply high estimated variance on account of a large sample size or a low value for the estimated variance of the random disturbance. This explains, as noted in the introduction, why not all models with a high value for the VIF present effects on the statistical analysis of the model. + +\begin{quote} +Example 1. +Thus, for example, \citet{Garciaetal2019b} considered an extension of the interest rate model presented by \citet{Wooldrigde2013}, where \(k=3\), in which all the independent variables have associated coefficients significantly different from zero, presenting a VIF equal to 71.516, much higher than the threshold normally established as worrying. In other words, in this case, a high VIF does not mean that the individual significance tests are affected. This situation is probably due to the fact that in this case 131 observations are available, i.e.~the expression \eqref{eq:texp-orig} can be expressed as: +\[t_{i} = \left| \frac{\widehat{\beta}_{i}}{\sqrt{\frac{\widehat{\sigma}^{2}}{131 \cdot var(\mathbf{X}_{i})} \cdot 71.516}} \right| += \left| \frac{\widehat{\beta}_{i}}{\sqrt{0.546 \cdot \frac{\widehat{\sigma}^{2}}{var(\mathbf{X}_{i})}}} \right|, \quad i=2,3.\] +Note that in this case a high value of \(n\) compensates for the high value of VIF. 
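The decomposition in \eqref{eq:vari-VIF} can be checked numerically. The following base-R sketch uses simulated data (all object names are illustrative, not part of \CRANpkg{rvif}): it computes the VIF from the auxiliary regression and verifies that the estimated variance of the coefficient is the product of \(\widehat{\sigma}^{2}/(n \cdot var(\mathbf{X}_{i}))\) and \(VIF(i)\), with \(var(\cdot)\) taken with denominator \(n\).

```r
# Sketch (simulated data; names are illustrative): the VIF from the
# auxiliary regression, and its role in the estimated variance and in
# the experimental t statistic of eq. (texp-orig).
set.seed(1)
n  <- 50
x2 <- rnorm(n)
x3 <- x2 + rnorm(n, sd = 0.1)                # x3 nearly collinear with x2
y  <- 1 + 2 * x2 - x3 + rnorm(n)
fit <- lm(y ~ x2 + x3)

r2_aux <- summary(lm(x2 ~ x3))$r.squared     # auxiliary regression R^2
vif2   <- 1 / (1 - r2_aux)                   # VIF(2)
s2     <- summary(fit)$sigma^2               # estimated sigma^2
varp   <- function(x) mean((x - mean(x))^2)  # variance with denominator n

# var(beta_2) = sigma^2 / (n * var(x2)) * VIF(2):
lhs <- vcov(fit)["x2", "x2"]
rhs <- s2 / (n * varp(x2)) * vif2
all.equal(lhs, rhs)                          # equal up to rounding

# experimental t of the individual significance test for x2:
t2 <- abs(coef(fit)["x2"]) / sqrt(rhs)
```

Since the factors multiply, a large \(VIF(i)\) can be offset by a large \(n\) or a small \(\widehat{\sigma}^{2}\), which is precisely O'Brien's point.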
In addition, the value of \(n\) will also cause \(\widehat{\sigma}^{2}\) to decrease, since \(\widehat{\sigma}^{2} = \frac{\mathbf{e}^{t}\mathbf{e}}{n-k}\), where \(\mathbf{e}\) are the residuals of the original model \eqref{eq:model0}. +\end{quote} + +\begin{quote} +The Subsection \hyperref[effect-sample-size]{Effect of sample size..} provides an example that illustrates in more detail the effect of sample size on the statistical analysis of the model. \hfill \(\lozenge\) +\end{quote} + +On the other hand, considering the hypothetical orthogonal model, the value of the experimental statistic of the individual significance test, whose null hypothesis is \(\beta_{i} = 0\) in face of the alternative hypothesis \(\beta_{i} \not= 0\) with \(i=2,\dots,k\), is given by: +\begin{equation} + t_{i}^{o} = \left| \frac{\widehat{\beta}_{i}}{\sqrt{\frac{\widehat{\sigma}^{2}}{n \cdot var(\mathbf{X}_{i})}}} \right|, \quad i=2,\dots,k, + \label{eq:texp-orto-1} +\end{equation} +where the estimated variance of the estimator has been diminished due to the VIF always being greater than or equal to 1, and consequently, \(t_{i}^{o} \geq t_{i}\). However, it has been assumed that the same estimates for the independent variable coefficients and random disturbance variance are obtained in the orthogonal and original models, which does not seem to be a plausible supposition (see \citet{Salmeron2024a} Section 2.1 for more details). + +\subsection{An orthonormal reference model}\label{sub-above} + +In \citet{Salmeron2024a} the following QR decomposition of the matrix \(\mathbf{X}_{n \times k}\) of the model \eqref{eq:model0} is proposed: \(\mathbf{X} = \mathbf{X}_{o} \cdot \mathbf{P}\), where \(\mathbf{X}_{o}\) is an orthonormal matrix of the same dimensions as \(\mathbf{X}\) and \(\mathbf{P}\) is a higher-order triangular matrix of dimensions \(k \times k\). 
Then, the following hypothetical orthonormal reference model: +\begin{equation} + \mathbf{y} = \mathbf{X}_{o} \cdot \boldsymbol{\beta}_{o} + \mathbf{w}, + \label{eq:model-ref} +\end{equation} +verifies that: +\[\widehat{\boldsymbol{\beta}} = \mathbf{P}^{-1} \cdot \widehat{\boldsymbol{\beta}}_{o}, \ + \mathbf{e} = \mathbf{e}_{o}, \ + var \left( \widehat{\boldsymbol{\beta}}_{o} \right) = \sigma^{2} \cdot \mathbf{I},\] +where \(\mathbf{e}_{o}\) are the residuals of the orthonormal reference model \eqref{eq:model-ref}. +Note that since \(\mathbf{e} = \mathbf{e}_{o}\), the estimate of \(\sigma^{2}\) is the same in the original model \eqref{eq:model0} and in the orthonormal reference model \eqref{eq:model-ref}. +Moreover, since the dependent variable is the same in both models, the coefficient of determination and the experimental value of the global significance test are the same in both cases. + +From these values, taking into account the expressions \eqref{eq:vari-VIF} and \eqref{eq:vari-VIF2}, it is evident that the ratio between the variance of the estimator in the original model \eqref{eq:model0} and the variance of the estimator of the orthonormal reference model \eqref{eq:model-ref} is: +\begin{equation} + \frac{ + var \left( \widehat{\beta}_{i} \right) + }{ + var \left( \widehat{\beta}_{i,o} \right) + } = \frac{VIF(i)}{n \cdot var(\mathbf{X}_{i})}, \quad i=2,\dots,k. + \label{eq:redef-VIF} \nonumber +\end{equation} +Consequently, \citet{Salmeron2024a} defined the redefined VIF (RVIF) for \(i=1,\dots,k\) as: +\begin{equation}\small{ + RVIF(i) = \frac{VIF(i)}{n \cdot var(\mathbf{X}_{i})} = \frac{\mathbf{X}_{i}^{t} \mathbf{X}_{i}}{\mathbf{X}_{i}^{t} \mathbf{X}_{i} - \mathbf{X}_{i}^{t} \mathbf{X}_{-i} \cdot \left( \mathbf{X}_{-i}^{t} \mathbf{X}_{-i} \right)^{-1} \cdot \mathbf{X}_{-i}^{t} \mathbf{X}_{i}}, \label{eq:RVIF}} +\end{equation} +which shows, among other questions, that it is defined for \(i=1,2,\dots,k\). 
That is, in contrast to the VIF, the RVIF can be calculated for the intercept of the linear regression model. + +Other considerations to be taken into account are the following: + +\begin{itemize} +\item + If the data are expressed in unit length, same transformation used to calculate the Condition Number (CN), then: + \[RVIF(i) = \frac{1}{1 - \mathbf{X}_{i}^{t} \mathbf{X}_{-i} \cdot \left( \mathbf{X}_{-i}^{t} \mathbf{X}_{-i} \right)^{-1} \cdot \mathbf{X}_{-i}^{t} \mathbf{X}_{i}}, \quad i=1,\dots,k.\] +\item + In this case (data expressed in unit length), when \(\mathbf{X}_{i}\) is orthogonal to \(\mathbf{X}_{-i}\), it is verified that \(\mathbf{X}_{i}^{t} \mathbf{X}_{-i} = \mathbf{0}\) and, consequently \(RVIF(i) = 1\) for \(i=1,\dots,k\). That is, the RVIF is always greater than or equal to 1 and its minimum value is indicative of the absence of multicollinearity. +\item + Denoted by \(a_{i}= \mathbf{X}_{i}^{t} \mathbf{X}_{i} \cdot \left( \mathbf{X}_{-i}^{t} \mathbf{X}_{-i} \right)^{-1} \cdot \mathbf{X}_{-i}^{t} \mathbf{X}_{i}\), it is verified that \(RVIF(i) = \frac{1}{1-a_{i}}\) where \(a_{i}\) can be interpreted as the percentage of approximate multicollinearity due to variable \(\mathbf{X}_{i}\). Note the similarity of this expression to that of the VIF: \(VIF(i) = \frac{1}{1-R_{i}^{2}}\) (see equation \eqref{eq:vari-VIF}). +\item + Finally, from a simulation for \(k=3\), \citet{Salmeron2024a} show that if \(a_{i} > 0.826\), then the degree of multicollinearity is worrying. In any case this value should be refined by considering higher values of \(k\). 
+\end{itemize} + +On the other hand, given the orthonormal reference model \eqref{eq:model-ref}, the value for the experimental statistic of the individual significance test with the null hypothesis \(\beta_{i,o} = 0\) (given the alternative hypothesis \(\beta_{i,o} \not= 0\), for \(i=1,\dots,k\)) is: +\begin{equation} + t_{i}^{o} = \left| \frac{\widehat{\beta}_{i,o}}{\widehat{\sigma}} \right| = \left| \frac{\mathbf{p}_{i} \cdot \widehat{\boldsymbol{\beta}}}{\widehat{\sigma}} \right|, + \label{eq:texp-orto-2} +\end{equation} +where \(\mathbf{p}_{i}\) is the \(i\) row of the matrix \(\mathbf{P}\). + +By comparing this expression with the one given in \eqref{eq:texp-orto-1}, it is observed that, as expected, not only the denominator but also the numerator has changed. +Thus, in addition to the VIF, the rest of the elements in expression \eqref{eq:texp-orig} have also changed. +Consequently, if the null hypothesis is rejected in the original model, it is not assured that the same will occur in the orthonormal reference model. For this reason, it is possible to consider that the orthonormal model proposed as the reference model in \citet{Salmeron2024a} is more plausible than the one traditionally applied. 
+ +\subsection{Possible scenarios in the individual significance tests}\label{possible-scenarios-in-the-individual-significance-tests} + +To determine whether the tendency not to reject the null hypothesis in the individual significance test is caused by a troubling approximate multicollinearity that inflates the variance of the estimator, or whether it is caused by variables not being statistically significantly related, the following situations are distinguished with a significance level \(\alpha\): + +\begin{enumerate} +\def\labelenumi{\alph{enumi}.} +\tightlist +\item + If the null hypothesis is initially rejected in the original model \eqref{eq:model0}, \(t_{i} > t_{n-k}(1-\alpha/2)\), the following results can be obtained for the orthonormal model: +\end{enumerate} + +a.1. the null hypothesis is rejected, \(t_{i}^{o} > t_{n-k}(1-\alpha/2)\); then, the results are consistent. + +a.2. the null hypothesis is not rejected, \(t_{i}^{o} < t_{n-k}(1-\alpha/2)\); this could be an inconsistency. + +\begin{enumerate} +\def\labelenumi{\alph{enumi}.} +\setcounter{enumi}{1} +\tightlist +\item + If the null hypothesis is not initially rejected in the original model \eqref{eq:model0}, \(t_{i} < t_{n-k}(1-\alpha/2)\), the following results may occur for the orthonormal model: +\end{enumerate} + +b.1 the null hypothesis is rejected, \(t_{i}^{o} > t_{n-k}(1-\alpha/2)\); then, it is possible to conclude that the degree of multicollinearity affects the statistical analysis of the model, provoking not rejecting the null hypothesis in the original model. + +b.2 the null hypothesis is also not rejected, \(t_{i}^{o} < t_{n-k}(1-\alpha/2)\); then, the results are consistent. + +In conclusion, when option b.1 is given, the null hypothesis of the individual significance test is not rejected when the linear relationships are considered (original model) but is rejected when the linear relationships are not considered (orthonormal model). 
Consequently, it is possible to conclude that the linear relationships affect the statistical analysis of the model. The possible inconsistency discussed in option a.2 is analyzed in detail in Appendix \hyperref[inconsistency]{Inconsistency}, concluding that it will rarely occur in cases where a high degree of multicollinearity is assumed. The other two scenarios provide consistent situations. + +\section{A first attempt to obtain a non-rejection region associated with a statistical test to detect multicollinearity}\label{modelo-orto} + +\subsection{From the traditional orthogonal model}\label{from-the-traditional-orthogonal-model} + +Considering the expressions \eqref{eq:texp-orig} and \eqref{eq:texp-orto-1}, it is verified that \(t_{i}^{o} = t_{i} \cdot \sqrt{VIF(i)}\). Consequently, in the orthogonal case, with a significance level \(\alpha\), the null hypothesis \(\beta_{i,o} = 0\) is rejected if \(t_{i}^{o} > t_{n-k}(1-\alpha/2)\) for \(i=2,\dots,k.\) That is, if: +\begin{equation} + VIF(i) > \left( \frac{t_{n-k}(1-\alpha/2)}{t_{i}} \right)^{2} = c_{1}(i), \quad i=2,\dots,k. + \label{eq:cond-false} +\end{equation} +Thus, if the VIF associated with the variable \(i\) is greater than the upper bound \(c_{1}(i)\), then it can be concluded that the estimator of the coefficient of that variable is significantly different from zero in the hypothetical case where the variables are orthogonal. In addition, if the null hypothesis is not rejected in the initial model, the reason for the failure to reject could be due to the degree of multicollinearity that affects the statistical analysis of the model. + +Finally, note that since the interesting cases are those where the null hypothesis is not initially rejected, \(t_{i} < t_{n-k}(1-\alpha/2)\), the upper bound \(c_{1}(i)\) will always be greater than one. + +\begin{quote} +Example 2. 
+Table \ref{tab:WisseltableLATEX} shows a dataset (previously presented by \citet{Wissell}) with the following variables: outstanding mortgage debt (\(\mathbf{D}\), trillions of dollars), personal consumption (\(\mathbf{C}\), trillions of dollars), personal income (\(\mathbf{I}\), trillions of dollars) and outstanding consumer credit (\(\mathbf{CP}\), trillions of dollars) for the years 1996 to 2012. +\end{quote} + +\begin{table} +\centering +\caption{\label{tab:WisseltableLATEX}Data set presented previously by Wissell} +\centering +\resizebox{\ifdim\width>\linewidth\linewidth\else\width\fi}{!}{ +\begin{tabular}[t]{ccccc} +\toprule +t & D & C & I & CP\\ +\midrule +1996 & 3.805 & 4.770 & 4.879 & 808.23\\ +1997 & 3.946 & 4.778 & 5.051 & 798.03\\ +1998 & 4.058 & 4.935 & 5.362 & 806.12\\ +1999 & 4.191 & 5.100 & 5.559 & 865.65\\ +2000 & 4.359 & 5.291 & 5.843 & 997.30\\ +\addlinespace +2001 & 4.545 & 5.434 & 6.152 & 1140.70\\ +2002 & 4.815 & 5.619 & 6.521 & 1253.40\\ +2003 & 5.129 & 5.832 & 6.915 & 1324.80\\ +2004 & 5.615 & 6.126 & 7.423 & 1420.50\\ +2005 & 6.225 & 6.439 & 7.802 & 1532.10\\ +\addlinespace +2006 & 6.786 & 6.739 & 8.430 & 1717.50\\ +2007 & 7.494 & 6.910 & 8.724 & 1867.20\\ +2008 & 8.399 & 7.099 & 8.882 & 1974.10\\ +2009 & 9.395 & 7.295 & 9.164 & 2078.00\\ +2010 & 10.680 & 7.561 & 9.727 & 2191.30\\ +\addlinespace +2011 & 12.071 & 7.804 & 10.301 & 2284.90\\ +2012 & 13.448 & 8.044 & 10.983 & 2387.50\\ +\bottomrule +\end{tabular}} +\end{table} + +\begin{quote} +Table \ref{tab:Wissel0tableLATEX} shows the OLS estimation of the model explaining the outstanding mortgage debt as a function of the rest of the variables. 
That is: +\[\mathbf{D} = \beta_{1} + \beta_{2} \cdot \mathbf{C} + \beta_{3} \cdot \mathbf{I} + \beta_{4} \cdot \mathbf{CP} + \mathbf{u}.\] +Note that the estimates for the coefficients of personal consumption, personal income and outstanding consumer credit are not significantly different from zero (a significance level of 5\% is considered throughout the paper), while the model is considered to be globally valid (experimental value, F exp., higher than theoretical value). +\end{quote} + +\begin{table} +\centering +\caption{\label{tab:Wissel0tableLATEX}OLS estimation for the Wissel model} +\centering +\resizebox{\ifdim\width>\linewidth\linewidth\else\width\fi}{!}{ +\begin{tabular}[t]{lcccc} +\toprule + & Estimator & Standard Error & Experimental t & p-value\\ +\midrule +Intercept & 5.469 & 13.017 & 0.420 & 0.681\\ +Personal consumption & -4.252 & 5.135 & -0.828 & 0.422\\ +Personal income & 3.120 & 2.036 & 1.533 & 0.149\\ +Outstanding consumer credit & 0.003 & 0.006 & 0.500 & 0.626\\ +(Obs, Sigma Est., Coef. Det., F exp.) & 17.000 & 0.870 & 0.923 & 52.305\\ +\bottomrule +\end{tabular}} +\end{table} + +\begin{quote} +In addition, the estimated coefficient for the variable personal consumption, which is not significantly different from zero, has the opposite sign to the simple correlation coefficient between this variable and outstanding mortgage debt, 0.953. +Thus, in the simple linear regression between both variables (see Table \ref{tab:Wissel1tableLATEX}), the estimated coefficient of the variable personal consumption is positive and significantly different from zero. However, adding a second variable (see Tables \ref{tab:Wissel2tableLATEX} and \ref{tab:Wissel3tableLATEX}) none of the coefficients are individually significantly different from zero although both models are globally significant. +This is traditionally understood as a symptom of statistically troubling multicollinearity. 
+\end{quote} + +\begin{table} +\centering +\caption{\label{tab:Wissel1tableLATEX}OLS estimation for part of the Wissel model} +\centering +\resizebox{\ifdim\width>\linewidth\linewidth\else\width\fi}{!}{ +\begin{tabular}[t]{lcccc} +\toprule + & Estimator & Standard Error & Experimental t & p-value\\ +\midrule +Intercept & -9.594 & 1.351 & -7.102 & 0.000\\ +Personal consumption & 2.629 & 0.214 & 12.285 & 0.000\\ +(Obs, Sigma Est., Coef. Det., F exp.) & 17.000 & 0.890 & 0.910 & 150.925\\ +\bottomrule +\end{tabular}} +\end{table} + +\begin{table} +\centering +\caption{\label{tab:Wissel2tableLATEX}OLS estimation for part of the Wissel model} +\centering +\resizebox{\ifdim\width>\linewidth\linewidth\else\width\fi}{!}{ +\begin{tabular}[t]{lcccc} +\toprule + & Estimator & Standard Error & Experimental t & p-value\\ +\midrule +Intercept & -0.117 & 6.476 & -0.018 & 0.986\\ +Personal consumption & -2.343 & 3.335 & -0.703 & 0.494\\ +Personal income & 2.856 & 1.912 & 1.494 & 0.158\\ +(Obs, Sigma Est., Coef. Det., F exp.) & 17.000 & 0.823 & 0.922 & 82.770\\ +\bottomrule +\end{tabular}} +\end{table} + +\begin{table} +\centering +\caption{\label{tab:Wissel3tableLATEX}OLS estimation for part of the Wissel model} +\centering +\resizebox{\ifdim\width>\linewidth\linewidth\else\width\fi}{!}{ +\begin{tabular}[t]{lcccc} +\toprule + & Estimator & Standard Error & Experimental t & p-value\\ +\midrule +Intercept & -8.640 & 9.638 & -0.896 & 0.385\\ +Personal consumption & 2.335 & 2.943 & 0.793 & 0.441\\ +Outstanding consumer credit & 0.001 & 0.006 & 0.100 & 0.922\\ +(Obs, Sigma Est., Coef. Det., F exp.) & 17.000 & 0.953 & 0.910 & 70.487\\ +\bottomrule +\end{tabular}} +\end{table} + +\begin{quote} +By using expression \eqref{eq:cond-false} in order to confirm this problem, it is verified that \(c_{1}(2) = 6.807\), \(c_{1}(3) = 1.985\) and \(c_{1}(4) = 18.743\), taking into account that \(t_{13}(0.975) = 2.160\). 
Since the VIFs are equal to 589.754, 281.886 and 189.487, respectively, it is concluded that the individual significance tests for the three cases are affected by the degree of multicollinearity existing in the model. \hfill \(\lozenge\) +\end{quote} + +\subsection{From the alternative orthonormal model \eqref{eq:model-ref}}\label{from-the-alternative-orthonormal-model-refeqmodel-ref} + +In the Subsection \hyperref[sub-above]{An orthonormal reference model} the individual significance test from the expression \eqref{eq:texp-orto-2} is redefined. Thus, the null hypothesis \(\beta_{i,o}=0\) will be rejected, with a significance level \(\alpha\), if the following condition is verified: +\[t_{i}^{o} > t_{n-k}(1-\alpha/2), \quad i=2,\dots,k.\] +Taking into account the expressions \eqref{eq:texp-orig} and \eqref{eq:texp-orto-2}, this is equivalent to: +\begin{equation}\small{ + VIF(i) > \left( \frac{t_{n-k}(1-\alpha/2)}{\widehat{\beta}_{i,o}} \right)^{2} \cdot \widehat{var} \left( \widehat{\beta}_{i} \right) \cdot n \cdot var(\mathbf{X}_{i}) = c_{2}(i). \label{eq:cota-VIF-orto}} +\end{equation} + +Thus, if the \(VIF(i)\) is greater than \(c_{2}(i)\), the null hypothesis is rejected in the respective individual significance tests in the orthonormal model (with \(i=2,\dots,k\)). Then, if the null hypothesis is not rejected in the original model and it is verified that \(VIF(i) > c_{2}(i)\), it can be concluded that the multicollinearity existing in the model affects its statistical analysis. In summary, a lower bound for the VIF is established to indicate when the approximate multicollinearity is troubling in a way that can be reinterpreted and presented as a region of non-rejection of a statistical test. + +\begin{quote} +Example 3. +Continuing with the dataset presented by \citet{Wissell}, Table \ref{tab:WisselORTOtableLATEX} shows the results of the OLS estimation of the orthonormal model obtained from the original model. 
+\end{quote}
+
+\begin{table}
+\centering
+\caption{\label{tab:WisselORTOtableLATEX}OLS estimation for the orthonormal Wissel model}
+\centering
+\resizebox{\ifdim\width>\linewidth\linewidth\else\width\fi}{!}{
+\begin{tabular}[t]{lcccc}
+\toprule
+ & Estimator & Standard Error & Experimental t & p-value\\
+\midrule
+Intercept & -27.882 & 0.932 & -29.901 & 0.000\\
+Personal consumption & 11.592 & 0.932 & 12.432 & 0.000\\
+Personal income & -1.355 & 0.932 & -1.453 & 0.170\\
+Outstanding consumer credit & 0.466 & 0.932 & 0.500 & 0.626\\
+(Obs, Sigma Est., Coef. Det., F exp.) & 17.000 & 0.870 & 0.923 & 52.305\\
+\bottomrule
+\end{tabular}}
+\end{table}
+
+\begin{quote}
+When these results are compared with those in Table \ref{tab:Wissel0tableLATEX}, the following conclusions can be drawn:
+
+\begin{itemize}
+\item
+  Except for the outstanding consumer credit variable, whose standard deviation has increased, the standard deviation has decreased in all cases.
+\item
+  The absolute values of the experimental statistics of the individual significance tests associated with the intercept and the personal consumption variable have increased, while the experimental statistic of the personal income variable has decreased, and the experimental statistic of the outstanding consumer credit variable remains the same. These facts show that the change from the original model to the orthonormal model does not guarantee an increase in the absolute value of the experimental statistic.
+\item
+  The estimation of the coefficient of the personal consumption variable is not significantly different from zero in the original model, but it is in the orthonormal model. Thus, it is concluded that multicollinearity affects the statistical analysis of the model.
Note that there is also a change in the sign of the estimate, although the purpose of the orthonormal model is not to obtain estimates for the coefficients, but rather to provide a reference point against which to measure how much the variances are inflated. Recall that an orthonormal model is an idealized construction that may lack a proper interpretation in practice.
+\item
+  The values corresponding to the estimated variance for the random disturbance, the coefficient of determination and the experimental statistic (F exp.) for the global significance test remain the same.
+\end{itemize}
+\end{quote}
+
+\begin{quote}
+On the other hand, considering the VIFs of the independent variables other than the intercept (589.754, 281.886 and 189.487) and their corresponding bounds (17.809, 623.127 and 3545.167) obtained from the expression \eqref{eq:cota-VIF-orto}, only the personal consumption variable has a VIF higher than its corresponding bound. These results are different from those obtained in Example 2, where the traditional orthogonal model was taken as a reference.
+\end{quote}
+
+\begin{quote}
+Finally, Tables \ref{tab:Wissel0tableLATEX} and \ref{tab:WisselORTOtableLATEX} show that the experimental values of the statistic \(t\) of the variable outstanding consumer credit are the same in the original and orthonormal models. \hfill \(\lozenge\)
+\end{quote}
+
+The last fact highlighted at the end of the previous example is not a coincidence, but a consequence of the QR decomposition; see Appendix \hyperref[apendix]{Test of\ldots{}}. Therefore, in this case, the conclusion of the individual significance test will be the same in the original and in the orthonormal model, i.e.~we will always be in scenarios a.1 or b.2.
+
+Thus, this behavior makes it necessary to choose carefully which variable is fixed in the last position.
Some criteria to select the most appropriate variable for this placement could be:
+
+\begin{itemize}
+\item
+  To fix the variable that is considered least relevant to the model.
+\item
+  To fix a variable whose associated coefficient is significantly different from zero, since this case is not of interest for the definition of multicollinearity given in the paper: the relevant case is a coefficient considered zero in the original model but significantly different from zero in the orthonormal one.
+\end{itemize}
+
+These options are explored in Subsection \hyperref[how-to-fix]{Selection of the variable\ldots{}}.
+
+\section{A non-rejection region associated with a statistical test to detect multicollinearity}\label{new-VIF-orto}
+
+\citet{Salmeron2024a} show that high values of the RVIF are associated with a high degree of multicollinearity. The question, however, is how high the RVIF has to be to reflect troubling multicollinearity.
+
+Taking into account the expressions \eqref{eq:RVIF} and \eqref{eq:cota-VIF-orto}, it is possible to conclude that multicollinearity is affecting the statistical analysis of the model if it can be verified that:
+\begin{equation}
+  RVIF(i) > \left( \frac{t_{n-k}(1-\alpha/2)}{\widehat{\beta}_{i,o}} \right)^{2} \cdot \widehat{var} \left( \widehat{\beta}_{i} \right) = c_{3}(i),
+  \label{eq:cota-VIFR}
+\end{equation}
+for any \(i=1,\dots,k\). Note that the intercept is included in this proposal, in contrast to the previous section, in which it was not included.
+
+Following \citet{OBrien}, and taking into account that the estimation of the expression \eqref{eq:vari-VIF} can be written as:
+\[\widehat{var} \left( \widehat{\beta}_{i} \right) = \widehat{\sigma}^{2} \cdot RVIF(i) = \frac{\mathbf{e}^{t}\mathbf{e}}{n-k} \cdot RVIF(i),\]
+there are other factors that counterbalance a high value of the RVIF, thereby avoiding high estimated variances for the estimated coefficients.
These factors are the sum of the squared residuals (SSR = \(\mathbf{e}^{t}\mathbf{e}\)) of the model \eqref{eq:model0} and \(n\). Thus, an appropriate specification of the econometric model (i.e., one that implies a good fit and, consequently, a small SSR) and a large sample size can compensate for high RVIF values.
+However, in contrast to the traditional VIF, these factors are taken into account in the threshold \(c_{3}(i)\) through \(\widehat{var} \left( \widehat{\beta}_{i} \right)\), as established in the expression \eqref{eq:cota-VIFR}.
+
+\begin{quote}
+Example 4.
+This contribution can be illustrated with the data set previously presented by \citet{KleinGoldberger}, which includes variables for consumption, \(\mathbf{C}\), wage incomes, \(\mathbf{I}\), non-farm incomes, \(\mathbf{InA}\), and farm incomes, \(\mathbf{IA}\), in the United States from 1936 to 1952, as shown in Table \ref{tab:KGtableLATEX} (data from 1942 to 1944 are not available because they were war years).
+\end{quote}
+
+\begin{table}
+\centering
+\caption{\label{tab:KGtableLATEX}Data set presented previously by Klein and Goldberger}
+\centering
+\resizebox{\ifdim\width>\linewidth\linewidth\else\width\fi}{!}{
+\begin{tabular}[t]{cccc}
+\toprule
+Consumption & Wage income & Non-farm income & Farm income\\
+\midrule
+62.8 & 43.41 & 17.10 & 3.96\\
+65.0 & 46.44 & 18.65 & 5.48\\
+63.9 & 44.35 & 17.09 & 4.37\\
+67.5 & 47.82 & 19.28 & 4.51\\
+71.3 & 51.02 & 23.24 & 4.88\\
+\addlinespace
+76.6 & 58.71 & 28.11 & 6.37\\
+86.3 & 87.69 & 30.29 & 8.96\\
+95.7 & 76.73 & 28.26 & 9.76\\
+98.3 & 75.91 & 27.91 & 9.31\\
+100.3 & 77.62 & 32.30 & 9.85\\
+\addlinespace
+103.2 & 78.01 & 31.39 & 7.21\\
+108.9 & 83.57 & 35.61 & 7.39\\
+108.5 & 90.59 & 37.58 & 7.98\\
+111.4 & 95.47 & 35.17 & 7.42\\
+\bottomrule
+\end{tabular}}
+\end{table}
+
+\begin{quote}
+Table \ref{tab:regKGtableLATEX} shows the OLS estimations of the model explaining consumption as a function of the rest of the variables.
Note that there is some incoherence between the individual significance values of the variables and the global significance of the model. +\end{quote} + +\begin{table} +\centering +\caption{\label{tab:regKGtableLATEX}OLS estimation for the Klein and Goldberger model} +\centering +\resizebox{\ifdim\width>\linewidth\linewidth\else\width\fi}{!}{ +\begin{tabular}[t]{lcccc} +\toprule + & Estimator & Standard Error & Experimental t & p-value\\ +\midrule +Intercept & 18.702 & 6.845 & 2.732 & 0.021\\ +Wage income & 0.380 & 0.312 & 1.218 & 0.251\\ +Non-farm income & 1.419 & 0.720 & 1.969 & 0.077\\ +Farm income & 0.533 & 1.400 & 0.381 & 0.711\\ +(Obs, Sigma Est., Coef. Det., F exp.) & 14.000 & 36.725 & 0.919 & 37.678\\ +\bottomrule +\end{tabular}} +\end{table} + +\begin{quote} +The RVIFs are calculated, yielding 1.275, 0.002, 0.014 and 0.053, respectively. The associated bounds, \(c_{3}(i)\), are also calculated, yielding 0.002, 0.0001, 0.018 and 1.826, respectively. +\end{quote} + +\begin{quote} +Since the coefficient of the wage income variable is not significantly different from zero, and because it is verified that \(0.002 > 0.0001\), from \eqref{eq:cota-VIFR} it is concluded that the degree of multicollinearity existing in the model is affecting its statistical analysis. +\hfill \(\lozenge\) +\end{quote} + +\subsection{From the RVIF}\label{TheTHEOREM} + +Considering that in the original model \eqref{eq:model0} the null hypothesis \(\beta_{i} = 0\) of the individual significance test is not rejected if: +\[RVIF(i) > \left( \frac{\widehat{\beta}_{i}}{\widehat{\sigma} \cdot t_{n-k}(1-\alpha/2)} \right)^{2} = c_{0}(i), \quad i=1,\dots,k,\] +while in the orthonormal model, the null hypothesis is rejected if \(RVIF(i) > c_{3}(i)\), the following theorem can be established: + +\begin{quote} +Theorem. 
Given the multiple linear regression model \eqref{eq:model0}, the degree of multicollinearity affects its statistical analysis (with a level of significance of \(\alpha\%\)) if there is a variable \(i\), with \(i=1,\dots,k\), that verifies \(RVIF(i) > \max \{ c_{0}(i), c_{3}(i) \}\).
+\end{quote}
+
+Note that \citet{Salmeron2024a} indicate that the RVIF must be calculated with unit length data (as any other transformation removes the intercept from the analysis); however, for the correct application of this theorem, the original data must be used, since no transformation has been considered in this paper.
+
+\begin{quote}
+Example 5. Tables \ref{tab:theoremWISSELtableLATEX} and \ref{tab:theoremKGtableLATEX} present the results of applying the theorem to the \citet{Wissell} and \citet{KleinGoldberger} models, respectively. Note that in both cases there is a variable \(i\) that verifies \(RVIF(i) > \max \{ c_{0}(i), c_{3}(i) \}\), and consequently we can conclude that the degree of approximate multicollinearity is affecting the statistical analysis in both models (with a level of significance of \(5\%\)).
\hfill \(\lozenge\) +\end{quote} + +\begin{table} +\centering +\caption{\label{tab:theoremWISSELtableLATEX}Theorem results of the Wissel model} +\centering +\resizebox{\ifdim\width>\linewidth\linewidth\else\width\fi}{!}{ +\begin{tabular}[t]{lccccc} +\toprule + & RVIFs & c0 & c3 & Scenario & Affects\\ +\midrule +Intercept & 194.866090 & 7.371069 & 1.017198 & b.1 & Yes\\ +Personal consumption & 30.326281 & 4.456018 & 0.915790 & b.1 & Yes\\ +Personal income & 4.765888 & 2.399341 & 10.535976 & b.2 & No\\ +Outstanding consumer credit & 0.000038 & 0.000002 & 0.000715 & b.2 & No\\ +\bottomrule +\end{tabular}} +\end{table} + +\begin{table} +\centering +\caption{\label{tab:theoremKGtableLATEX}Theorem results of the Klein and Goldberger model} +\centering +\resizebox{\ifdim\width>\linewidth\linewidth\else\width\fi}{!}{ +\begin{tabular}[t]{lccccc} +\toprule + & RVIFs & c0 & c3 & Scenario & Affects\\ +\midrule +Intercept & 1.275948 & 1.918383 & 0.002189 & a.1 & No\\ +Wage income & 0.002653 & 0.000793 & 0.000121 & b.1 & Yes\\ +Non-farm income & 0.014131 & 0.011037 & 0.018739 & b.2 & No\\ +Farm income & 0.053355 & 0.001558 & 1.826589 & b.2 & No\\ +\bottomrule +\end{tabular}} +\end{table} + +\section{The rvif package}\label{paqueteRVIF} + +The results developed in \citet{Salmeron2024a} and in this paper have been implemented in the \CRANpkg{rvif} package of R (\citet{R}). The following shows how to replicate the results presented in both papers from the existing commands \(\texttt{rvifs}\) and \(\texttt{multicollinearity}\) in \CRANpkg{rvif}. For this reason, the code executed is shown below. + +In addition, the following issues will be addressed: + +\begin{itemize} +\item + Discussion on the effect of sample size in detecting the influence of multicollinearity on the statistical analysis of the model. +\item + Discussion on the choice of the variable to be fixed as the last one before the orthonormalization. 
+\end{itemize}
+
+The code used in these two Subsections is available at \url{https://github.com/rnoremlas/RVIF/tree/main/rvif\%20package}.
+It is also interesting to consult the package vignette using the command \texttt{browseVignettes("rvif")}, as well as its web page with \texttt{browseURL(system.file("docs/index.html",\ package\ =\ "rvif"))} or \url{https://www.ugr.es/local/romansg/rvif/index.html}.
+
+\subsection{Detection of multicollinearity with RVIF: does the degree of multicollinearity affect the statistical analysis of the model?}\label{detection-of-multicollinearity-with-rvif-does-the-degree-of-multicollinearity-affect-the-statistical-analysis-of-the-model}
+
+In \citet{Salmeron2024a}, a series of examples is presented to illustrate the usefulness of the RVIF to detect the degree of approximate multicollinearity in a multiple linear regression model.
+The results presented by \citet{Salmeron2024a} will be reproduced using the command \(\texttt{rvifs}\) of the \CRANpkg{rvif} package and complemented with the contribution developed in the present work using the command \(\texttt{multicollinearity}\) of the same package.
+To facilitate the reading of the paper, this information is available in Appendix \hyperref[examplesRVIF]{Examples of\ldots{}}.
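The \(\texttt{Scenario}\) and \(\texttt{Affects}\) columns reported by \(\texttt{multicollinearity}\) follow mechanically from the two bounds of the theorem: the null hypothesis is not rejected in the original model when \(RVIF(i) > c_{0}(i)\), and it is rejected in the orthonormal model when \(RVIF(i) > c_{3}(i)\). A minimal re-implementation of this decision rule (a sketch in Python for illustration, not the package's R code; the function name `classify` is ours), checked against the Klein and Goldberger values of Table \ref{tab:theoremKGtableLATEX}:

```python
def classify(rvif, c0, c3):
    """Classify a coefficient into the paper's scenarios.

    H0: beta_i = 0 is NOT rejected in the original model when RVIF(i) > c0(i);
    it IS rejected in the orthonormal model when RVIF(i) > c3(i).
    Multicollinearity affects the analysis only in scenario b.1
    (not rejected in the original model, rejected in the orthonormal one).
    """
    rejected_original = rvif <= c0
    rejected_orthonormal = rvif > c3
    if rejected_original and rejected_orthonormal:
        return "a.1", False
    if rejected_original and not rejected_orthonormal:
        return "a.2", False  # the inconsistency discussed in the Appendix
    if not rejected_original and rejected_orthonormal:
        return "b.1", True
    return "b.2", False

# Klein and Goldberger values (RVIF, c0, c3) taken from the paper's results:
rows = {
    "Intercept":       (1.275947615, 1.9183829079, 0.0021892653),
    "Wage income":     (0.002652862, 0.0007931658, 0.0001206694),
    "Non-farm income": (0.014130621, 0.0110372472, 0.0187393601),
    "Farm income":     (0.053354814, 0.0015584988, 1.8265885762),
}
for name, (rvif, c0, c3) in rows.items():
    scenario, affects = classify(rvif, c0, c3)
    print(name, scenario, affects)
```

Scenario a.2 (rejected only in the original model) is the inconsistency discussed in the Appendix and is expected to occur rarely in practice.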
+
+On the other hand, the following shows how to use the above commands to obtain the results shown in Table \ref{tab:theoremWISSELtableLATEX} of this paper:
+
+\begin{verbatim}
+y_W = Wissel[,2]
+X_W = Wissel[,3:6]
+multicollinearity(y_W, X_W)
+\end{verbatim}
+
+\begin{verbatim}
+#> RVIFs c0 c3 Scenario Affects
+#> 1 1.948661e+02 7.371069e+00 1.017198e+00 b.1 Yes
+#> 2 3.032628e+01 4.456018e+00 9.157898e-01 b.1 Yes
+#> 3 4.765888e+00 2.399341e+00 1.053598e+01 b.2 No
+#> 4 3.821626e-05 2.042640e-06 7.149977e-04 b.2 No
+\end{verbatim}
+
+Note that the first two arguments of the \(\texttt{multicollinearity}\) command are, respectively, the dependent variable of the linear model and the design matrix containing the independent variables (intercept included as the first column).
+
+The results in Table \ref{tab:theoremKGtableLATEX} can be obtained using this code:
+
+\begin{verbatim}
+y_KG = KG[,1]
+cte = rep(1, length(y_KG))
+X_KG = cbind(cte, KG[,2:4])
+multicollinearity(y_KG, X_KG)
+\end{verbatim}
+
+\begin{verbatim}
+#> RVIFs c0 c3 Scenario Affects
+#> 1 1.275947615 1.9183829079 0.0021892653 a.1 No
+#> 2 0.002652862 0.0007931658 0.0001206694 b.1 Yes
+#> 3 0.014130621 0.0110372472 0.0187393601 b.2 No
+#> 4 0.053354814 0.0015584988 1.8265885762 b.2 No
+\end{verbatim}
+
+As shown above, in both cases it is concluded that the degree of multicollinearity in the model affects its statistical analysis.
+
+The \(\texttt{multicollinearity}\) command is used by default with a significance level of 5\% for the application of the Theorem set in Subsection \hyperref[TheTHEOREM]{From the RVIF}.
+Note that if the significance level is changed to 1\% (third argument of the \(\texttt{multicollinearity}\) command), in the Klein and Goldberger model it is obtained that the individual significance test of the intercept is also affected by the degree of existing multicollinearity: + +\begin{verbatim} +multicollinearity(y_W, X_W, alpha = 0.01) +\end{verbatim} + +\begin{verbatim} +#> RVIFs c0 c3 Scenario Affects +#> 1 1.948661e+02 3.791375e+00 1.977602791 b.1 Yes +#> 2 3.032628e+01 2.291992e+00 1.780449066 b.1 Yes +#> 3 4.765888e+00 1.234122e+00 20.483705068 b.2 No +#> 4 3.821626e-05 1.050650e-06 0.001390076 b.2 No +\end{verbatim} + +\begin{verbatim} +multicollinearity(y_KG, X_KG, alpha = 0.01) +\end{verbatim} + +\begin{verbatim} +#> RVIFs c0 c3 Scenario Affects +#> 1 1.275947615 0.9482013897 0.0044292796 b.1 Yes +#> 2 0.002652862 0.0003920390 0.0002441361 b.1 Yes +#> 3 0.014130621 0.0054553932 0.0379131147 b.2 No +#> 4 0.053354814 0.0007703211 3.6955190555 b.2 No +\end{verbatim} + +It can be seen that the values of \(c_{0}\) and \(c_{3}\) change depending on the significance level used. + +\subsection{Effect of the sample size on the detection of the influence of multicollinearity on the statistical analysis of the model}\label{effect-sample-size} + +The introduction has highlighted the idea that the measures traditionally used to detect whether the degree of multicollinearity is of concern may indicate that it is troubling while the model analysis is not affected by it. Example 1 shows that this may be due, among other factors, to the size of the sample. + +To explore this issue in more detail, below is given an example where traditional measures of multicollinearity detection indicate that the existing multicollinearity is troubling while the statistical analysis of the model is not affected when the sample size is high. 
In particular, observations are simulated for \(\mathbf{X} = [ \mathbf{1} \ \mathbf{X}_{2} \ \mathbf{X}_{3} \ \mathbf{X}_{4} \ \mathbf{X}_{5} \ \mathbf{X}_{6}]\) where:
+\[\mathbf{X}_{2} \sim N(5, 0.1^{2}), \quad \mathbf{X}_{3} \sim N(5, 10^{2}), \quad \mathbf{X}_{4} = \mathbf{X}_{3} + \mathbf{p}\]
+\[\mathbf{X}_{5} \sim N(-1, 3^{2}), \quad \mathbf{X}_{6} \sim N(15, 2.5^{2}),\]
+where \(\mathbf{p} \sim N(5, 0.5^2)\) and considering three different sample sizes: \(n = 3000\) (Simulation 1), \(n = 100\) (Simulation 2) and \(n = 30\) (Simulation 3).
+In all cases the dependent variable is generated according to:
+\[\mathbf{y} = 4 + 5 \cdot \mathbf{X}_{2} - 9 \cdot \mathbf{X}_{3} - 2 \cdot \mathbf{X}_{4} + 2 \cdot \mathbf{X}_{5} + 7 \cdot \mathbf{X}_{6} + \mathbf{u},\]
+where \(\mathbf{u} \sim N(0, 2^2)\).
+
+To make the results reproducible, a seed has been set with the command \texttt{set.seed(2024)}.
+
+This design makes the variable \(\mathbf{X}_{2}\) nearly constant, and hence linearly related to the intercept, and makes \(\mathbf{X}_{3}\) linearly related to \(\mathbf{X}_{4}\). This is supported by the results shown in Table \ref{tab:traditionalSIMULATIONtableLATEX}, which have been obtained with the \CRANpkg{multiColl} package of R (\citet{R}) through the commands \(\texttt{CV}\), \(\texttt{VIF}\) and \(\texttt{CN}\).
+
+The results lead to the same conclusions in all three simulations:
+
+\begin{itemize}
+\item
+  There is a worrying degree of non-essential multicollinearity in the model relating the intercept to the variable \(\mathbf{X}_{2}\), since its coefficient of variation (CV) is lower than 0.1002506.
+\item
+  There is a worrying degree of essential multicollinearity in the model relating the variables \(\mathbf{X}_{3}\) and \(\mathbf{X}_{4}\), since the associated Variance Inflation Factors (VIF) are greater than 10.
+\end{itemize} + +\begin{table} +\centering +\caption{\label{tab:traditionalSIMULATIONtableLATEX}CVs, VIFs and CN for data of Simulations 1, 2 and 3} +\centering +\resizebox{\ifdim\width>\linewidth\linewidth\else\width\fi}{!}{ +\begin{tabular}[t]{lccc} +\toprule + & Simulation 1 & Simulation 2 & Simulation 3\\ +\midrule +X2 CV & 0.020 & 0.019 & 0.025\\ +X3 CV & 2.010 & 1.827 & 3.326\\ +X4 CV & 1.004 & 0.968 & 1.434\\ +X5 CV & 3.138 & 1.948 & 2.413\\ +X6 CV & 0.167 & 0.176 & 0.194\\ +\addlinespace +X2 VIF & 1.003 & 1.053 & 1.167\\ +X3 VIF & 388.669 & 373.092 & 926.768\\ +X4 VIF & 388.696 & 373.280 & 929.916\\ +X5 VIF & 1.001 & 1.014 & 1.043\\ +X6 VIF & 1.003 & 1.066 & 1.254\\ +\addlinespace +CN & 148.247 & 162.707 & 123.025\\ +\bottomrule +\end{tabular}} +\end{table} + +However, does the degree of multicollinearity detected really affect the statistical analysis of the model? According to the results shown in Tables \ref{tab:theoremSIMULATION1tableLATEX} to \ref{tab:theoremSIMULATION3tableLATEX} this is not always the case: + +\begin{itemize} +\item + In Simulation 1, when \(n=3000\), the degree of multicollinearity in the model does not affect the statistical analysis of the model; scenario a.1 is always verified, i.e., both in the model proposed and in the orthonormal model, the null hypothesis is rejected in the individual significance tests. +\item + In Simulation 2, when \(n=100\), the degree of multicollinearity in the model affects the statistical analysis of the model only in the individual significance of the intercept; in all other cases scenario a.1 is verified again. + + \begin{itemize} + \tightlist + \item + As will be seen below, the fact that the individual significance of the variable \(\mathbf{X}_{2}\) is not affected may be due to the number of observations in the data set. But it may also be because multicollinearity of the nonessential type affects only the intercept estimate. 
Thus, for example, in \citet{Salmeron2019TAS} it is shown (see Table 2 of Example 2) that solving this type of approximate multicollinearity (by centering the variables that cause it) only modifies the estimate of the intercept and its standard deviation, with the estimates of the rest of the independent variables remaining unchanged.
+  \end{itemize}
+\item
+  In Simulation 3, when \(n=30\), the degree of multicollinearity in the model affects the statistical analysis of the model in the individual significance of the intercept, of \(\mathbf{X}_{2}\) and of \(\mathbf{X}_{4}\).
+
+  \begin{itemize}
+  \tightlist
+  \item
+    In this case, as discussed, the reduction in sample size does not prevent the individual significance of \(\mathbf{X}_{2}\) from being affected.
+  \end{itemize}
+\end{itemize}
+
+In conclusion, as \citet{OBrien} indicates, an increase in the sample size can prevent the statistical analysis of the model from being affected by the degree of existing multicollinearity, even though the values of the measures traditionally used to detect this problem indicate that it is troubling. The use of the RVIF proposed by \citet{Salmeron2024a} and of the theorem developed in this paper is decisive in reaching this conclusion.
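The size of the VIFs for \(\mathbf{X}_{3}\) and \(\mathbf{X}_{4}\) in the simulations above is driven almost entirely by the construction \(\mathbf{X}_{4} = \mathbf{X}_{3} + \mathbf{p}\) with \(var(\mathbf{p})\) much smaller than \(var(\mathbf{X}_{3})\). Since the remaining regressors are generated independently, \(VIF(\mathbf{X}_{3})\) is approximately \(1/(1-r^{2})\) for the \(\mathbf{X}_{3}\)--\(\mathbf{X}_{4}\) pair. A minimal sketch of this mechanism (in Python for illustration; the paper's simulations are run in R, so the figures are not expected to match Table \ref{tab:traditionalSIMULATIONtableLATEX} exactly):

```python
import math
import random

random.seed(2024)
n = 3000

# X3 ~ N(5, 10^2), p ~ N(5, 0.5^2), X4 = X3 + p, as in Simulation 1
x3 = [random.gauss(5, 10) for _ in range(n)]
x4 = [v + random.gauss(5, 0.5) for v in x3]

def corr(a, b):
    """Sample (Pearson) correlation coefficient."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

r = corr(x3, x4)
vif = 1 / (1 - r ** 2)  # approximates VIF(X3) when only X3 and X4 are related
print(f"r = {r:.5f}, approximate VIF = {vif:.1f}")
```

The theoretical correlation is \(\sqrt{100/100.25} \approx 0.9988\), giving a VIF of roughly 400, in line with the values near 390 reported in Table \ref{tab:traditionalSIMULATIONtableLATEX} and well above the usual threshold of 10, regardless of \(n\).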
+ +\begin{table} +\centering +\caption{\label{tab:theoremSIMULATION1tableLATEX}Theorem results of the Simulation 1 model} +\centering +\resizebox{\ifdim\width>\linewidth\linewidth\else\width\fi}{!}{ +\begin{tabular}[t]{lccccc} +\toprule + & RVIFs & c0 & c3 & Scenario & Affects\\ +\midrule +Intercept & 0.934369 & 1.916912 & 0.000001 & a.1 & No\\ +X2 & 0.034899 & 1.359909 & 0.000168 & a.1 & No\\ +X3 & 0.001299 & 5.339519 & 0.000000 & a.1 & No\\ +X4 & 0.001296 & 0.230992 & 0.000004 & a.1 & No\\ +X5 & 0.000036 & 0.257015 & 0.000000 & a.1 & No\\ +\addlinespace +X6 & 0.000053 & 3.160352 & 0.000000 & a.1 & No\\ +\bottomrule +\end{tabular}} +\end{table} + +\begin{table} +\centering +\caption{\label{tab:theoremSIMULATION2tableLATEX}Theorem results of the Simulation 2 model} +\centering +\resizebox{\ifdim\width>\linewidth\linewidth\else\width\fi}{!}{ +\begin{tabular}[t]{lccccc} +\toprule + & RVIFs & c0 & c3 & Scenario & Affects\\ +\midrule +Intercept & 32.965272 & 0.228580 & 0.001580 & b.1 & Yes\\ +X2 & 1.179581 & 1.678248 & 0.014061 & a.1 & No\\ +X3 & 0.037287 & 5.662562 & 0.000001 & a.1 & No\\ +X4 & 0.036687 & 0.113376 & 0.000353 & a.1 & No\\ +X5 & 0.001269 & 0.252728 & 0.000006 & a.1 & No\\ +\addlinespace +X6 & 0.001601 & 3.060976 & 0.000001 & a.1 & No\\ +\bottomrule +\end{tabular}} +\end{table} + +\begin{table} +\centering +\caption{\label{tab:theoremSIMULATION3tableLATEX}Theorem results of the Simulation 3 model} +\centering +\resizebox{\ifdim\width>\linewidth\linewidth\else\width\fi}{!}{ +\begin{tabular}[t]{lccccc} +\toprule + & RVIFs & c0 & c3 & Scenario & Affects\\ +\midrule +Intercept & 70.990340 & 46.793605 & 0.008667 & b.1 & Yes\\ +X2 & 2.524792 & 0.000570 & 0.005083 & b.1 & Yes\\ +X3 & 0.187896 & 3.892727 & 0.000007 & a.1 & No\\ +X4 & 0.187317 & 0.168758 & 0.005113 & b.1 & Yes\\ +X5 & 0.003863 & 0.169923 & 0.000325 & a.1 & No\\ +\addlinespace +X6 & 0.005193 & 2.139108 & 0.000013 & a.1 & No\\ +\bottomrule +\end{tabular}} +\end{table} + +\subsection{Selection of 
the variable to be set as the last before orthonormalization}\label{how-to-fix} + +Since there are as many QR decompositions as there are possible rearrangements of the independent variables, it is convenient to test different options to determine whether the degree of multicollinearity in the regression model affects its statistical analysis. + +A first possibility is to try all possible reorderings considering that the intercept must always be in first place. Thus, in the Example 2 of \citet{Salmeron2024a} (see Appendix \hyperref[examplesRVIF]{Examples of\ldots{}} for more details) it is considered that \(\mathbf{X} = [ \mathbf{1} \ \mathbf{K} \ \mathbf{W}]\) (see Table \ref{tab:theoremCHOICE1tableLATEX}), but it could also be considered that \(\mathbf{X} = [ \mathbf{1} \ \mathbf{W} \ \mathbf{K}]\) (see Table \ref{tab:theoremCHOICE2tableLATEX}). + +Note that in these tables the values for each variable of RVIF and \(c_{0}\) are always the same, but those of \(c_{3}\) change depending on the position of each variable within the design matrix. + +\begin{table} +\centering +\caption{\label{tab:theoremCHOICE1tableLATEX}Theorem results of the Example 2 of Salmerón et al. (2025)} +\centering +\resizebox{\ifdim\width>\linewidth\linewidth\else\width\fi}{!}{ +\begin{tabular}[t]{lccccc} +\toprule + & RVIFs & c0 & c3 & Scenario & Affects\\ +\midrule +Intercept & 6388.887975 & 88495.933700 & 1.649518 & a.1 & No\\ +Capital & 4.136993 & 207.628058 & 0.050431 & a.1 & No\\ +Work & 37.336378 & 9.445619 & 147.582132 & b.2 & No\\ +\bottomrule +\end{tabular}} +\end{table} + +\begin{table} +\centering +\caption{\label{tab:theoremCHOICE2tableLATEX}Theorem results of the Example 2 of Salmerón et al. 
(2025), reordering 2}
+\centering
+\resizebox{\ifdim\width>\linewidth\linewidth\else\width\fi}{!}{
+\begin{tabular}[t]{lccccc}
+\toprule
+ & RVIFs & c0 & c3 & Scenario & Affects\\
+\midrule
+Intercept & 6388.882446 & 88495.933700 & 1.649518 & a.1 & No\\
+Work & 37.336378 & 9.445619 & 1.163201 & b.1 & Yes\\
+Capital & 4.136993 & 207.628058 & 0.082430 & a.1 & No\\
+\bottomrule
+\end{tabular}}
+\end{table}
+
+It is observed that in one of the two possibilities considered, the individual significance of the labor variable (Work) is affected by the degree of existing multicollinearity.
+
+Therefore, to state that the statistical analysis of the multiple linear regression model is not affected by the multicollinearity present in the model, it is necessary to check all the possible QR decompositions and to determine that the statistical analysis is not affected in any of them. However, to determine that the statistical analysis of the model is affected by the presence of multicollinearity, it is sufficient to find one of the possible rearrangements in which scenario b.1 occurs.
+
+Another possibility is to place in the last position of \(\mathbf{X}\) a particular variable chosen by a specific criterion. Thus, for example, in Example 3 of \citet{Salmeron2024a} (see Appendix \hyperref[examplesRVIF]{Examples of\ldots{}} for more details) it is verified that the variable FA has a coefficient significantly different from zero. Fixing this variable in the third (last) place, since its individual significance will not be modified, yields the results shown in Table \ref{tab:theoremCHOICE8tableLATEX}.
+
+\begin{table}
+\centering
+\caption{\label{tab:theoremCHOICE8tableLATEX}Theorem results of the Example 3 of Salmerón et al.
(2025), reordering}
+\centering
+\resizebox{\ifdim\width>\linewidth\linewidth\else\width\fi}{!}{
+\begin{tabular}[t]{lccccc}
+\toprule
+ & RVIFs & c0 & c3 & Scenario & Affects\\
+\midrule
+OI & 1.696454e-12 & 9.594942e-13 & 1.775244e-13 & b.1 & Yes\\
+S & 1.718535e-12 & 1.100437e-12 & 1.012113e-12 & b.1 & Yes\\
+FA & 1.829200e-16 & 2.307700e-16 & 1.449800e-16 & a.1 & No\\
+\bottomrule
+\end{tabular}}
+\end{table}
+
+It can be seen that in this case the degree of multicollinearity in the model affects the individual significance of the OI and S variables.
+
+\section{Conclusions}\label{conclusiones-VIF}
+
+In this paper, following \citet{Salmeron2024a}, we propose an alternative orthonormal model that leads to a lower bound for the RVIF, indicating whether the degree of multicollinearity present in the model affects its statistical analysis. These thresholds complement the results presented by \citet{OBrien}, who stated that the estimated variances depend on other factors that can counterbalance a high value of the VIF, for example, the size of the sample or the estimated variance of the independent variables. Thus, the thresholds presented for the RVIF also incorporate these factors, yielding a threshold associated with each independent variable (including the intercept) that indicates whether the degree of multicollinearity affects the statistical analysis.
+
+As these thresholds are derived from the individual significance tests of the model, it is possible to reinterpret them as a statistical test to determine whether the degree of multicollinearity in the linear regression model affects its statistical analysis. This analytic tool allows researchers to conclude whether the degree of multicollinearity is statistically troubling and whether it needs to be treated.
We consider this to be a relevant contribution since, to the best of our knowledge, the only existing example of such a measure, presented by \citet{FarrarGlauber}, has been strongly criticized (in addition to the limitations highlighted in the introduction, it should be noted that it completely ignores approximate non-essential multicollinearity, since the correlation matrix does not include information on the intercept); consequently, this new statistical test with a non-rejection region will fill a gap in the scientific literature.
+
+On the other hand, note that the position of each of the variables in the matrix \(\mathbf{X}\) uniquely determines the reference orthonormal model \(\mathbf{X}_{o}\). That is to say, there are as many reference models given by the proposed QR decomposition as there are possible rearrangements of the variables within the matrix \(\mathbf{X}\).
+
+In this sense, as has been shown, to affirm that the statistical analysis of the model is not affected by the degree of existing multicollinearity (at the significance level used in the application of the proposed theorem), it is necessary to verify that scenario b.1 does not occur in any of the possible rearrangements of \(\mathbf{X}\). On the other hand, when there is a rearrangement in which this scenario appears, it can be stated (at the significance level used when applying the proposed theorem) that the degree of existing multicollinearity affects the statistical analysis of the model.
+
+Finally, as a future line of work, it would be interesting to complete the analysis presented here by studying when the degree of multicollinearity in the model affects its numerical analysis.
+
+\section{Acknowledgments}\label{acknowledgments}
+
+This work has been supported by project PP2019-EI-02 of the University of Granada (Spain) and by project A-SEJ-496-UGR20 of the Andalusian Regional Government's Ministry of Economic Transformation, Industry, Knowledge and Universities (Spain).
+
+\section{Appendix}\label{appendix}
+
+\subsection{Inconsistency in hypothesis tests: situation a.2}\label{inconsistency}
+
+From a numerical point of view it is possible to reject \(H_{0}: \beta_{i} = 0\) while \(H_{0}: \beta_{i,o} = 0\) is not rejected, which implies that \(t_{i}^{o} < t_{n-k}(1 - \alpha/2) < t_{i}\). Or, in other words, \(t_{i}/t_{i}^{o} > 1\).
+
+However, from expression \eqref{eq:texp-orto-2} it is obtained that \(\widehat{\sigma} = | \widehat{\beta}_{i,o} | / t_{i}^{o}\). By substituting \(\widehat{\sigma}\) in expression \eqref{eq:texp-orig}, taking into account expression \eqref{eq:RVIF}, it is obtained that
+\[\frac{t_{i}}{t_{i}^{o}} = \frac{| \widehat{\beta}_{i} |}{| \widehat{\beta}_{i,o} |} \cdot \frac{1}{\sqrt{RVIF(i)}}.\]
+From this expression it can be concluded that in situations with high collinearity, \(RVIF(i) \rightarrow +\infty\), the ratio \(t_{i}/t_{i}^{o}\) will tend to zero, and the condition \(t_{i}/t_{i}^{o} > 1\) will rarely occur. That is to say, the inconsistency in situation a.2, commented on in the preliminaries of the paper, will not appear.
+
+On the other hand, if the variable \(i\) is orthogonal to the rest of the independent variables, it is verified that \(\widehat{\beta}_{i,o} = \widehat{\beta}_{i}\) since \(p_{i} = ( 0 \dots \underbrace{1}_{(i)} \dots 0)\). At the same time, \(RVIF(i) = \frac{1}{SST_{i}}\), where \(SST\) denotes the total sum of squares. If there is orthonormality, as proposed in this paper, \(SST_{i} = 1\) and, as a consequence, it is verified that \(t_{i} = t_{i}^{o}\). Thus, the individual significance tests for the original data and for the orthonormal data are the same.
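As proved in the next subsection, the experimental statistic of the variable placed last coincides in the original and orthonormal models. This can be checked numerically with a small self-contained sketch (in Python for illustration, with a hand-rolled two-column OLS; the data and names are made up):

```python
import math
import random

def ols_abs_t(X, y):
    """|t| statistics of an OLS fit for a two-column design matrix X.

    b = (X'X)^{-1} X'y, var(b_i) = s^2 (X'X)^{-1}_{ii}, s^2 = e'e / (n - k).
    """
    n, k = len(y), 2
    s11 = sum(r[0] * r[0] for r in X)
    s12 = sum(r[0] * r[1] for r in X)
    s22 = sum(r[1] * r[1] for r in X)
    xy1 = sum(r[0] * v for r, v in zip(X, y))
    xy2 = sum(r[1] * v for r, v in zip(X, y))
    det = s11 * s22 - s12 * s12
    inv = ((s22 / det, -s12 / det), (-s12 / det, s11 / det))
    b = [inv[i][0] * xy1 + inv[i][1] * xy2 for i in (0, 1)]
    sse = sum((v - b[0] * r[0] - b[1] * r[1]) ** 2 for r, v in zip(X, y))
    s2 = sse / (n - k)
    return [abs(b[i]) / math.sqrt(s2 * inv[i][i]) for i in (0, 1)]

random.seed(7)
n = 25
X = [[1.0, random.gauss(10, 0.3)] for _ in range(n)]  # nearly constant regressor
y = [3 + 2 * r[1] + random.gauss(0, 1) for r in X]

# Gram-Schmidt orthonormalization of the columns of X (the Q of X = QR)
c1 = [r[0] for r in X]
c2 = [r[1] for r in X]
n1 = math.sqrt(sum(v * v for v in c1))
q1 = [v / n1 for v in c1]
proj = sum(a * b for a, b in zip(q1, c2))
u2 = [b - proj * a for a, b in zip(q1, c2)]
n2 = math.sqrt(sum(v * v for v in u2))
q2 = [v / n2 for v in u2]
Q = [[a, b] for a, b in zip(q1, q2)]

t_orig = ols_abs_t(X, y)
t_orto = ols_abs_t(Q, y)
print(t_orig[1], t_orto[1])  # the last variable's |t| coincides in both models
```

The t statistic of the intercept, by contrast, generally differs between the two fits, which is exactly what makes the comparison informative.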
+ +\subsection{\texorpdfstring{Test of individual significance of coefficient \(k\)}{Test of individual significance of coefficient k}}\label{apendix} + +Taking into account that \(\boldsymbol{\beta}_{o} = \mathbf{P} \boldsymbol{\beta}\), where: +\[\boldsymbol{\beta}_{o} = \left( + \begin{array}{c} + \beta_{1,o} \\ + \beta_{2,o} \\ + \vdots \\ + \beta_{k,o} + \end{array} \right), \quad + \mathbf{P} = \left( + \begin{array}{cccc} + p_{11} & p_{12} & \dots & p_{1k} \\ + 0 & p_{22} & \dots & p_{2k} \\ + \vdots & \vdots & & \vdots \\ + 0 & 0 & \dots & p_{kk} + \end{array} \right), \quad + \boldsymbol{\beta} = \left( + \begin{array}{c} + \beta_{1} \\ + \beta_{2} \\ + \vdots \\ + \beta_{k} + \end{array} \right),\] +it is obtained that \(\beta_{k,o} = p_{kk} \beta_{k}\). Then, the null hypothesis \(H_{0}: \beta_{k,o} = 0\) is equivalent to \(H_{0}: \beta_{k} = 0\). For this reason, Tables \ref{tab:Wissel0tableLATEX} and \ref{tab:WisselORTOtableLATEX} showed the expected behaviour; nevertheless, this behaviour is analyzed here in more detail. + +The experimental value used to make a decision in the test with null hypothesis \(H_{0}: \beta_{k,o} = 0\) and alternative hypothesis \(H_{1}: \beta_{k,o} \not= 0\) is given by the following expression: +\[t_{k}^{o} = \left| \frac{\widehat{\beta}_{k,o}}{\sqrt{var \left( \widehat{\beta}_{k,o} \right)}} \right|.\] + +Taking into account that \(\widehat{\boldsymbol{\beta}}_{o} = \mathbf{P} \widehat{\boldsymbol{\beta}}\) and \(var \left( \widehat{\boldsymbol{\beta}}_{o} \right) = \mathbf{P} var \left( \widehat{\boldsymbol{\beta}} \right) \mathbf{P}^{t},\) it is verified that \(\widehat{\beta}_{k,o} = p_{kk} \widehat{\beta}_{k}\) and \(var \left( \widehat{\beta}_{k,o} \right) = p_{kk}^{2} var \left( \widehat{\beta}_{k} \right)\).
Then: +\[t_{k}^{o} = \left| \frac{p_{kk} \widehat{\beta}_{k}}{p_{kk} \sqrt{var \left( \widehat{\beta}_{k} \right)}} \right| = \left| \frac{\widehat{\beta}_{k}}{\sqrt{var \left( \widehat{\beta}_{k} \right)}} \right| = t_{k},\] +where \(t_{k}\) is the experimental value used to make a decision in the test with null hypothesis \(H_{0}: \beta_{k} = 0\) and alternative hypothesis \(H_{1}: \beta_{k} \not= 0\). + +\subsection{\texorpdfstring{Examples of \citet{Salmeron2024a}}{Examples of @Salmeron2024a}}\label{examplesRVIF} + +\textbf{Example 1 of \citet{Salmeron2024a}: Detection of traditional nonessential multicollinearity}. Using data from a financial model in which the Euribor (E) is explained by the Harmonized Index of Consumer Prices (HICP), the balance of payments to net current account (BC) and the government deficit to net nonfinancial accounts (GD), we illustrate the detection of approximate multicollinearity of the non-essential type, i.e.~where the intercept is related to one of the remaining independent variables (for details see \citet{MarquardtSnee1975}). For more information on this data set use \emph{help(euribor)}. + +Note that \citet{Salmeron2019} establishes that an independent variable with a coefficient of variation less than 0.1002506 indicates that this variable is responsible for a non-essential multicollinearity problem. + +Thus, approximate multicollinearity detection is first performed using the measures traditionally applied for this purpose: the Variance Inflation Factor (VIF) and the Condition Number (CN). Values higher than 10 for the VIF (see, for example, \citet{Marquardt1970}) and 30 for the CN (see, for example, \citet{Belsley1991} or \citet{BelsleyKuhWelsch}) imply that the degree of existing multicollinearity is troubling.
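As a reminder (these are the standard formulations, which the text applies but does not restate), the VIF of the \(i\)-th independent variable and the CN are given by:

```latex
\[
  VIF(i) \;=\; \frac{1}{1 - R_{i}^{2}},
  \qquad
  CN \;=\; \sqrt{\frac{\lambda_{\max}}{\lambda_{\min}}},
\]
```

where \(R_{i}^{2}\) is the coefficient of determination of the auxiliary regression of the \(i\)-th independent variable on the remaining ones, and \(\lambda_{\max}\) and \(\lambda_{\min}\) are the largest and smallest eigenvalues of \(\mathbf{X}^{t}\mathbf{X}\) (usually computed after scaling the columns of \(\mathbf{X}\) to unit length).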
Moreover, according to \citet{Salmeron2019}, the VIF is only able to detect essential multicollinearity (a relationship between the independent variables excluding the intercept, see \citet{MarquardtSnee1975}), while the CN detects both essential and non-essential multicollinearity. + +Therefore, the values calculated below (using the \(\texttt{VIF}\), \(\texttt{CN}\) and \(\texttt{CVs}\) commands from the \CRANpkg{multiColl} package; see \citet{Salmeron2021multicoll} and \citet{Salmeron2022multicoll} for more details on this package) indicate that the degree of essential multicollinearity in the model is not troubling, while the degree of non-essential multicollinearity is troubling due to the relationship of HIPC with the intercept. + +\begin{verbatim} +E = euribor[,1] +data1 = euribor[,-1] + +VIF(data1) +\end{verbatim} + +\begin{verbatim} +#> HIPC BC GD +#> 1.349666 1.058593 1.283815 +\end{verbatim} + +\begin{verbatim} +CN(data1) +\end{verbatim} + +\begin{verbatim} +#> [1] 39.35375 +\end{verbatim} + +\begin{verbatim} +CVs(data1) +\end{verbatim} + +\begin{verbatim} +#> [1] 0.06957876 4.34031035 0.55015508 +\end{verbatim} + +This conclusion is confirmed by calculating the RVIF values, which point to a strong relationship between the second variable and the intercept: + +\begin{verbatim} +rvifs(data1, ul = T, intercept = T) +\end{verbatim} + +\begin{verbatim} +#> RVIF % +#> Intercept 250.294157 99.6005 +#> Variable 2 280.136873 99.6430 +#> Variable 3 1.114787 10.2967 +#> Variable 4 5.525440 81.9019 +\end{verbatim} + +The output of the \(\texttt{rvifs}\) command provides the values of the Redefined Variance Inflation Factor (RVIF) and the percentage of multicollinearity due to each variable (denoted as \(a_{i}\) in the \hyperref[sub-above]{An orthonormal\ldots{}} section).
+ +In this case, three of the four arguments available in the \(\texttt{rvifs}\) command are used: + +\begin{itemize} +\item + The first of these refers to the design matrix containing the independent variables (the intercept, if any, being the first column). +\item + The second argument, \(ul\), indicates that the data are to be transformed to unit length. This transformation guarantees that the RVIF is always greater than or equal to 1, so that this minimum value serves as a reference indicating the absence of worrying multicollinearity. +\item + The third argument, \(intercept\), indicates whether there is an intercept in the design matrix. +\end{itemize} + +Note that these results can also be obtained by using the \(\texttt{lm}\) and \(\texttt{model.matrix}\) commands as follows: + +\begin{verbatim} +reg_E = lm(euribor[,1]~as.matrix(euribor[,-c(1,2)])) +rvifs(model.matrix(reg_E)) +\end{verbatim} + +\begin{verbatim} +#> RVIF % +#> Intercept 250.294157 99.6005 +#> Variable 2 280.136873 99.6430 +#> Variable 3 1.114787 10.2967 +#> Variable 4 5.525440 81.9019 +\end{verbatim} + +Finally, the application of the Theorem established in Subsection \hyperref[TheTHEOREM]{From the RVIF} detects that the individual inference of the second variable (HIPC) is affected by the degree of multicollinearity existing in the model. These results are obtained using the \(\texttt{multicollinearity}\) command from the \CRANpkg{rvif} package: + +\begin{verbatim} +multicollinearity(E, data1) +\end{verbatim} + +\begin{verbatim} +#> RVIFs c0 c3 Scenario Affects +#> 1 5.325408e+00 1.575871e+01 2.166907e-02 a.1 No +#> 2 5.357830e-04 3.219456e-06 4.249359e-05 b.1 Yes +#> 3 5.109564e-11 1.098649e-09 2.586237e-12 a.1 No +#> 4 1.631439e-11 3.216522e-10 8.274760e-13 a.1 No +\end{verbatim} + +Therefore, it can be established that the existing multicollinearity affects the statistical analysis of the Euribor model.
+ +\textbf{Example 2 of \citet{Salmeron2024a}: Detection of generalized nonessential multicollinearity}. Using data from a Cobb-Douglas production function in which the production (P) is explained by capital (K) and labor (W), we illustrate the detection of approximate multicollinearity of the generalized non-essential type, i.e., that in which at least two independent variables with very little variability (excluding the intercept) are related to each other (for more details, see \citet{Salmeron2020maths}). For more information on this dataset use \emph{help(CDpf)}. + +Using the \(\texttt{rvifs}\) command, it can be determined that capital and labor are linearly related to each other, with RVIF values well above the threshold established as worrying: + +\begin{verbatim} +P = CDpf[,1] +data2 = CDpf[,2:4] +\end{verbatim} + +\begin{verbatim} +rvifs(data2, ul = T) +\end{verbatim} + +\begin{verbatim} +#> RVIF % +#> Intercept 178888.71 99.9994 +#> Variable 2 38071.44 99.9974 +#> Variable 3 255219.74 99.9996 +\end{verbatim} + +However, the application of the Theorem established in Subsection \hyperref[TheTHEOREM]{From the RVIF} does not detect that the degree of multicollinearity in the model affects the statistical analysis of the model: + +\begin{verbatim} +multicollinearity(P, data2) +\end{verbatim} + +\begin{verbatim} +#> RVIFs c0 c3 Scenario Affects +#> 1 6388.887975 88495.933700 1.64951764 a.1 No +#> 2 4.136993 207.628058 0.05043083 a.1 No +#> 3 37.336378 9.445619 147.58213164 b.2 No +\end{verbatim} + +Nevertheless, if we rearrange the design matrix \(\mathbf{X}\) we obtain that: + +\begin{verbatim} +data2 = CDpf[,c(2,4,3)] +multicollinearity(P, data2) +\end{verbatim} + +\begin{verbatim} +#> RVIFs c0 c3 Scenario Affects +#> 1 6388.882446 88495.933700 1.64951764 a.1 No +#> 2 37.336378 9.445619 1.16320125 b.1 Yes +#> 3 4.136993 207.628058 0.08242979 a.1 No +\end{verbatim} + +Therefore, it can be established that the existing multicollinearity does
affect the statistical analysis of the Cobb-Douglas production function model. + +\textbf{Example 3 of \citet{Salmeron2024a}: Detection of essential multicollinearity}. Using data from a model in which the number of employees of Spanish companies (NE) is explained by fixed assets (FA), operating income (OI) and sales (S), we illustrate the detection of approximate multicollinearity of the essential type, i.e., that in which at least two independent variables (excluding the intercept) are related to each other (for more details, see \citet{MarquardtSnee1975}). For more information on this dataset use \emph{help(employees)}. + +In this case, the \(\texttt{rvifs}\) command shows that variables three and four (OI and S) have a high RVIF value, so they are highly linearly related: + +\begin{verbatim} +NE = employees[,1] +data3 = employees[,2:5] +\end{verbatim} + +\begin{verbatim} +rvifs(data3, ul = T) +\end{verbatim} + +\begin{verbatim} +#> RVIF % +#> Intercept 2.984146 66.4896 +#> Variable 2 5.011397 80.0455 +#> Variable 3 15186.744870 99.9934 +#> Variable 4 15052.679178 99.9934 +\end{verbatim} + +Note that if the unit-length transformation is omitted in \emph{rvifs(data3, ul = T)}, as is the case in the \(\texttt{multicollinearity}\) command, the RVIF cannot be calculated because the system is computationally singular. For this reason, the intercept is eliminated below, since it has been shown above that it does not play a relevant role in the linear relationships of the model.
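The computational singularity mentioned above can be reproduced with a toy example. In this sketch (Python with numpy, illustrative only, not the package's own code) two regressors are almost exactly collinear, so the cross-product matrix \(\mathbf{X}^{t}\mathbf{X}\) is numerically close to singular:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
z = rng.normal(size=n)
x1 = z + rng.normal(scale=1e-7, size=n)   # x1 and x2 nearly identical
x2 = z + rng.normal(scale=1e-7, size=n)
X = np.column_stack([np.ones(n), x1, x2])

# A huge condition number means inverting X'X (as the RVIF computation
# requires) is numerically unreliable: the system is "computationally singular".
print(np.linalg.cond(X.T @ X))
```

Rescaling the columns (for example, to unit length) does not remove the collinearity, but it keeps the entries of \(\mathbf{X}^{t}\mathbf{X}\) on a common scale, which is why the unit-length option is used above.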
+ +Finally, the application of the Theorem established in Subsection \hyperref[TheTHEOREM]{From the RVIF} detects that the individual inference of the third variable (OI) is affected by the degree of multicollinearity existing in the model: + +\begin{verbatim} +multicollinearity(NE, data3[,-1]) +\end{verbatim} + +\begin{verbatim} +#> RVIFs c0 c3 Scenario Affects +#> 1 1.829154e-16 2.307712e-16 4.679301e-17 a.1 No +#> 2 1.696454e-12 9.594942e-13 2.129511e-13 b.1 Yes +#> 3 1.718535e-12 1.100437e-12 2.683809e-12 b.2 No +\end{verbatim} + +Therefore, it can be established that the existing multicollinearity affects the statistical analysis of the model of the number of employees in Spanish companies. + +\textbf{Example 4 of \citet{Salmeron2024a}: The special case of the simple linear model}. The simple linear regression model is an interesting case because it has a single independent variable and the intercept. Since the intercept is often not regarded as a proper independent variable of the model (see the introduction of \citet{Salmeron2019} for more details), different software packages (including R, \citet{R}) do not consider that there can be worrisome multicollinearity in this type of model. + +To illustrate this situation, \citet{Salmeron2024a} randomly generates observations for the following two simple linear regression models, \(\mathbf{y}_{1} = \beta_{1} + \beta_{2} \mathbf{V} + \mathbf{u}_{1}\) and \(\mathbf{y}_{2} = \alpha_{1} + \alpha_{2} \mathbf{Z} + \mathbf{u}_{2}\), according to the following code: + +\begin{verbatim} +set.seed(2022) +obs = 50 +cte4 = rep(1, obs) +V = rnorm(obs, 10, 10) +y1 = 3 + 4*V + rnorm(obs, 0, 2) +Z = rnorm(obs, 10, 0.1) +y2 = 3 + 4*Z + rnorm(obs, 0, 2) + +data4.1 = cbind(cte4, V) +data4.2 = cbind(cte4, Z) +\end{verbatim} + +For more information on these data sets use \emph{help(SLM1)} and \emph{help(SLM2)}. + +As mentioned above, R (\citet{R}) does not acknowledge the existence of multicollinearity in this type of model.
Thus, for example, when using the \(\texttt{vif}\) command of the \CRANpkg{car} package on \emph{reg=lm(y1\textasciitilde V)} the following message is obtained: \emph{Error in vif.default(reg): model contains fewer than 2 terms}. + +Undoubtedly, this message is consistent with the fact that, as mentioned above, the VIF is not capable of detecting non-essential multicollinearity (which is the only multicollinearity that exists in this type of model). However, the error message provided may lead a non-specialized user to conclude that the multicollinearity problem does not exist in this type of model. These issues are addressed in more depth in \citet{Salmeron2022multicoll}. + +On the other hand, the calculation of the RVIF in the first model shows that the degree of multicollinearity is not troubling, since the values are very low: + +\begin{verbatim} +rvifs(data4.1, ul = T) +\end{verbatim} + +\begin{verbatim} +#> RVIF % +#> Intercept 2.015249 50.3783 +#> Variable 2 2.015249 50.3783 +\end{verbatim} + +In the second model, by contrast, they are very high, indicating a problem of non-essential multicollinearity: + +\begin{verbatim} +rvifs(data4.2, ul = T) +\end{verbatim} + +\begin{verbatim} +#> RVIF % +#> Intercept 9390.044 99.9894 +#> Variable 2 9390.044 99.9894 +\end{verbatim} + +By using the \(\texttt{multicollinearity}\) command, it is found that the individual inference of the intercept of the second model is affected by the degree of multicollinearity in the model: + +\begin{verbatim} +multicollinearity(y1, data4.1) +\end{verbatim} + +\begin{verbatim} +#> RVIFs c0 c3 Scenario Affects +#> 1 0.0403049717 0.6454323 1.045802e-05 a.1 No +#> 2 0.0002675731 0.8383436 8.540101e-08 a.1 No +\end{verbatim} + +\begin{verbatim} +multicollinearity(y2, data4.2) +\end{verbatim} + +\begin{verbatim} +#> RVIFs c0 c3 Scenario Affects +#> 1 187.800878 21.4798003 0.03277691 b.1 Yes +#> 2 1.879296 0.3687652 9.57724567 b.2 No +\end{verbatim} + +Therefore, it can be established that the
multicollinearity existing in the first simple linear regression model does not affect the statistical analysis of the model, while in the second one it does. + +\bibliography{RJreferences.bib} + +\address{% +Román Salmerón-Gómez\\ +University of Granada\\% +Department of Quantitative Methods for Economics and Business\\ Campus Universitario de La Cartuja, Universidad de Granada. 18071 Granada (España)\\ +% +\url{https://www.ugr.es/~romansg/web/index.html}\\% +\textit{ORCiD: \href{https://orcid.org/0000-0003-2589-4058}{0000-0003-2589-4058}}\\% +\href{mailto:romansg@ugr.es}{\nolinkurl{romansg@ugr.es}}% +} + +\address{% +Catalina B. García-García\\ +University of Granada\\% +Department of Quantitative Methods for Economics and Business\\ Campus Universitario de La Cartuja, Universidad de Granada. 18071 Granada (España)\\ +% +\url{https://metodoscuantitativos.ugr.es/informacion/directorio-personal/catalina-garcia-garcia}\\% +\textit{ORCiD: \href{https://orcid.org/0000-0003-1622-3877}{0000-0003-1622-3877}}\\% +\href{mailto:cbgarcia@ugr.es}{\nolinkurl{cbgarcia@ugr.es}}% +} diff --git a/_articles/RJ-2025-040/RJournal.sty b/_articles/RJ-2025-040/RJournal.sty new file mode 100644 index 0000000000..351990be38 --- /dev/null +++ b/_articles/RJ-2025-040/RJournal.sty @@ -0,0 +1,358 @@ +% Package `RJournal' to use with LaTeX2e +% Copyright (C) 2010 by the R Foundation +% Copyright (C) 2013 by the R Journal +% +% Originally written by Kurt Hornik and Friedrich Leisch with subsequent +% edits by the editorial board +% +% CAUTION: +% Do not modify this style file. Any changes to this file will be reset when your +% article is submitted. 
+% If you must modify the style or add LaTeX packages to the article, these +% should be specified in RJwrapper.tex + +\NeedsTeXFormat{LaTeX2e}[1995/12/01] +\ProvidesPackage{RJournal}[2025/10/05 v0.17 RJournal package] + +\RequirePackage{tikz} + +% Overall page layout, fonts etc ----------------------------------------------- + +% Issues of of \emph{The R Journal} are created from the standard \LaTeX{} +% document class \pkg{report}. + +\RequirePackage{geometry} +\geometry{a4paper, + textwidth=14cm, top=1cm, bottom=1cm, + includehead,includefoot,centering, + footskip=1.5cm} +\raggedbottom +\sloppy +\clubpenalty = 10000 +\widowpenalty = 10000 +\brokenpenalty = 10000 +\usepackage{microtype} + + +\RequirePackage{fancyhdr} +\fancyhead{} +\fancyheadoffset{2cm} +\fancyhead[L]{\textsc{\RJ@sectionhead}} +\fancyhead[R]{\thepage} +\fancyfoot{} +\fancyfoot[L]{The R Journal Vol. \RJ@volume/\RJ@number, \RJ@month~\RJ@year} +\fancyfoot[R]{ISSN 2073-4859} +\pagestyle{fancy} + +% We use the following fonts (all with T1 encoding): +% +% rm & palatino +% tt & inconsolata +% sf & helvetica +% math & palatino + +\RequirePackage{microtype} + +\RequirePackage[scaled=0.92]{helvet} +\RequirePackage{palatino,mathpazo} +\RequirePackage[scaled=1.02]{inconsolata} +\RequirePackage[T1]{fontenc} + +\RequirePackage[hyphens]{url} +\RequirePackage[pagebackref]{hyperref} +\renewcommand{\backref}[1]{[p#1]} + +% Dark blue colour for all links +\RequirePackage{color} +\definecolor{link}{rgb}{0.45,0.51,0.67} +\hypersetup{ + colorlinks,% + citecolor=link,% + filecolor=link,% + linkcolor=link,% + urlcolor=link +} + +% Give the text a little room to breath +\setlength{\parskip}{3pt} +\RequirePackage{setspace} +\setstretch{1.05} + +% Issue and article metadata --------------------------------------------------- + +% Basic front matter information about the issue: volume, number, and +% date. 
+ +\newcommand{\volume}[1]{\def\RJ@volume{#1}} +\newcommand{\volnumber}[1]{\def\RJ@number{#1}} +\renewcommand{\month}[1]{\def\RJ@month{#1}} +\renewcommand{\year}[1]{\def\RJ@year{#1}} + + +% Individual articles correspond to +% chapters, and are contained in |article| environments. This makes it +% easy to have figures counted within articles and hence hyperlinked +% correctly. + +% An article has an author, a title, and optionally a subtitle. We use +% the obvious commands for specifying these. Articles will be put in certain +% journal sections, named by \sectionhead. + +\newcommand {\sectionhead} [1]{\def\RJ@sectionhead{#1}} +\renewcommand{\author} [1]{\def\RJ@author{#1}} +\renewcommand{\title} [1]{\def\RJ@title{#1}} +\newcommand {\subtitle} [1]{\def\RJ@subtitle{#1}} + +% Control appearance of titles: make slightly smaller than usual, and +% suppress section numbering. See http://tex.stackexchange.com/questions/69749 +% for why we don't use \setcounter{secnumdepth}{-1} + +\usepackage[medium]{titlesec} +\usepackage{titletoc} +\titleformat{\section} {\normalfont\large\bfseries}{\arabic{section}}{1em}{} +\titleformat{\subsection}{\normalfont\normalsize\bfseries}{\arabic{section}.\arabic{subsection}}{0.5em}{} +\titlecontents{chapter} [0em]{}{}{}{\titlerule*[1em]{.}\contentspage} + +% Article layout --------------------------------------------------------------- + +% Environment |article| clears the article header information at its beginning. +% We use |\FloatBarrier| from the placeins package to keep floats within +% the article. +\RequirePackage{placeins} +\newenvironment{article}{\author{}\title{}\subtitle{}\FloatBarrier}{\FloatBarrier} + +% Refereed articles should have an abstract, so we redefine |\abstract| to +% give the desired style + +\renewcommand{\abstract}[1]{\noindent\textbf{Abstract} #1} +\renewenvironment{abstract}{\noindent\textbf{Abstract}~}{} + +% The real work is done by a redefined version of |\maketitle|. 
Note +% that even though we do not want chapters (articles) numbered, we +% need to increment the chapter counter, so that figures get correct +% labelling. + +\renewcommand{\maketitle}{% +\noindent + \chapter{\RJ@title}\refstepcounter{chapter} + \ifx\empty\RJ@subtitle + \else + \noindent\textbf{\RJ@subtitle} + \par\nobreak\addvspace{\baselineskip} + \fi + \ifx\empty\RJ@author + \else + \noindent\textit{\RJ@author} + \par\nobreak\addvspace{\baselineskip} + \fi + \@afterindentfalse\@nobreaktrue\@afterheading +} + +% Now for some ugly redefinitions. We do not want articles to start a +% new page. (Actually, we do, but this is handled via explicit +% \newpage +% +% The name@of@eq is a hack to get hyperlinks to equations to work +% within each article, even though there may be multiple eq.(1) +% \begin{macrocode} +\renewcommand\chapter{\secdef\RJ@chapter\@schapter} +\providecommand{\nohyphens}{% + \hyphenpenalty=10000\exhyphenpenalty=10000\relax} +\newcommand{\RJ@chapter}{% + \edef\name@of@eq{equation.\@arabic{\c@chapter}}% + \renewcommand{\@seccntformat}[1]{}% + \@startsection{chapter}{0}{0mm}{% + -2\baselineskip \@plus -\baselineskip \@minus -.2ex}{\p@}{% + \phantomsection\normalfont\huge\bfseries\raggedright}} + +% Book reviews should appear as sections in the text and in the pdf bookmarks, +% however we wish them to appear as chapters in the TOC. Thus we define an +% alternative to |\maketitle| for reviews. +\newcommand{\review}[1]{ + \pdfbookmark[1]{#1}{#1} + \section*{#1} + \addtocontents{toc}{\protect\contentsline{chapter}{#1}{\thepage}{#1.1}} +} + +% We want bibliographies as starred sections within articles. +% +\RequirePackage[sectionbib,round]{natbib} +\bibliographystyle{abbrvnat} +\renewcommand{\bibsection}{\section*{References}} + +% Equations, figures and tables are counted within articles, but we do +% not show the article number. For equations it becomes a bit messy to avoid +% having hyperref getting it wrong. 
+ +% \numberwithin{equation}{chapter} +\renewcommand{\theequation}{\@arabic\c@equation} +\renewcommand{\thefigure}{\@arabic\c@figure} +\renewcommand{\thetable}{\@arabic\c@table} + +% Issue layout ----------------------------------------------------------------- + +% Need to provide our own version of |\tableofcontents|. We use the +% tikz package to get the rounded rectangle. Notice that |\section*| +% is really the same as |\chapter*|. +\renewcommand{\contentsname}{Contents} +\renewcommand\tableofcontents{% + \vspace{1cm} + \section*{\contentsname} + { \@starttoc{toc} } +} + +\renewcommand{\titlepage}{% + \thispagestyle{empty} + \hypersetup{ + pdftitle={The R Journal Volume \RJ@volume/\RJ@number, \RJ@month \RJ@year},% + pdfauthor={R Foundation for Statistical Computing},% + } + \noindent + \begin{center} + \fontsize{50pt}{50pt}\selectfont + The \raisebox{-8pt}{\includegraphics[height=77pt]{Rlogo-5}}\hspace{10pt} + Journal + + \end{center} + {\large \hfill Volume \RJ@volume/\RJ@number, \RJ@month{} \RJ@year \quad} + + \rule{\textwidth}{1pt} + \begin{center} + {\Large A peer-reviewed, open-access publication of the \\ + R Foundation for Statistical Computing} + \end{center} + + % And finally, put in the TOC box. Note the way |tocdepth| is adjusted + % before and after producing the TOC: thus, we can ensure that only + % articles show up in the printed TOC, but that in the PDF version, + % bookmarks are created for sections and subsections as well (provided + % that the non-starred forms are used). + \setcounter{tocdepth}{0} + \tableofcontents + \setcounter{tocdepth}{2} + \clearpage +} + +% Text formatting -------------------------------------------------------------- + +\newcommand{\R}{R} +\newcommand{\address}[1]{\addvspace{\baselineskip}\noindent\emph{#1}} +\newcommand{\email}[1]{\href{mailto:#1}{\normalfont\texttt{#1}}} + +% Simple font selection is not good enough. For example, |\texttt{--}| +% gives `\texttt{--}', i.e., an endash in typewriter font. 
Hence, we +% need to turn off ligatures, which currently only happens for commands +% |\code| and |\samp| and the ones derived from them. Hyphenation is +% another issue; it should really be turned off inside |\samp|. And +% most importantly, \LaTeX{} special characters are a nightmare. E.g., +% one needs |\~{}| to produce a tilde in a file name marked by |\file|. +% Perhaps a few years ago, most users would have agreed that this may be +% unfortunate but should not be changed to ensure consistency. But with +% the advent of the WWW and the need for getting `|~|' and `|#|' into +% URLs, commands which only treat the escape and grouping characters +% specially have gained acceptance + +\DeclareRobustCommand\code{\bgroup\@noligs\@codex} +\def\@codex#1{\texorpdfstring% +{{\normalfont\ttfamily\hyphenchar\font=-1 #1}}% +{#1}\egroup} +\newcommand{\kbd}[1]{{\normalfont\texttt{#1}}} +\newcommand{\key}[1]{{\normalfont\texttt{\uppercase{#1}}}} +\DeclareRobustCommand\samp{`\bgroup\@noligs\@sampx} +\def\@sampx#1{{\normalfont\texttt{#1}}\egroup'} +\newcommand{\var}[1]{{\normalfont\textsl{#1}}} +\let\env=\code +\newcommand{\file}[1]{{`\normalfont\textsf{#1}'}} +\let\command=\code +\let\option=\samp +\newcommand{\dfn}[1]{{\normalfont\textsl{#1}}} +% \acronym is effectively disabled since not used consistently +\newcommand{\acronym}[1]{#1} +\newcommand{\strong}[1]{\texorpdfstring% +{{\normalfont\fontseries{b}\selectfont #1}}% +{#1}} +\let\pkg=\strong +\newcommand{\CRANpkg}[1]{\href{https://CRAN.R-project.org/package=#1}{\pkg{#1}}}% +\let\cpkg=\CRANpkg +\newcommand{\ctv}[1]{\href{https://CRAN.R-project.org/view=#1}{\emph{#1}}} +\newcommand{\BIOpkg}[1]{\href{https://www.bioconductor.org/packages/release/bioc/html/#1.html}{\pkg{#1}}} + +% Example environments --------------------------------------------------------- +\RequirePackage{fancyvrb} +\RequirePackage{alltt} + +\DefineVerbatimEnvironment{example}{Verbatim}{} +\renewenvironment{example*}{\begin{alltt}}{\end{alltt}} + +% 
Support for output from Sweave, and generic session style code +% These used to have fontshape=sl for Sinput/Scode/Sin, but pslatex +% won't use a condensed font in that case. + +% Update (2015-05-28 by DS): remove fontsize=\small to match example environment + +\DefineVerbatimEnvironment{Sinput}{Verbatim}{} +\DefineVerbatimEnvironment{Soutput}{Verbatim}{} +\DefineVerbatimEnvironment{Scode}{Verbatim}{} +\DefineVerbatimEnvironment{Sin}{Verbatim}{} +\DefineVerbatimEnvironment{Sout}{Verbatim}{} +\newenvironment{Schunk}{}{} + +% Mathematics ------------------------------------------------------------------ + +% The implementation of |\operatorname| is similar to the mechanism +% \LaTeXe{} uses for functions like sin and cos, and simpler than the +% one of \AmSLaTeX{}. We use |\providecommand| for the definition in +% order to keep the one of the \pkg{amstex} if this package has +% already been loaded. +% \begin{macrocode} +\providecommand{\operatorname}[1]{% + \mathop{\operator@font#1}\nolimits} +\RequirePackage{amsfonts} + +\renewcommand{\P}{% + \mathop{\operator@font I\hspace{-1.5pt}P\hspace{.13pt}}} +\newcommand{\E}{% + \mathop{\operator@font I\hspace{-1.5pt}E\hspace{.13pt}}} +\newcommand{\VAR}{\operatorname{var}} +\newcommand{\COV}{\operatorname{cov}} +\newcommand{\COR}{\operatorname{cor}} + +% Figures ---------------------------------------------------------------------- + +% For use with pandoc > 3.2.1 +\newsavebox\pandoc@box +\newcommand*\pandocbounded[1]{% scales image to fit in text height/width + \sbox\pandoc@box{#1}% + \Gscale@div\@tempa{\textheight}{\dimexpr\ht\pandoc@box+\dp\pandoc@box\relax}% + \Gscale@div\@tempb{\linewidth}{\wd\pandoc@box}% + \ifdim\@tempb\p@<\@tempa\p@\let\@tempa\@tempb\fi% select the smaller of both + \ifdim\@tempa\p@<\p@\scalebox{\@tempa}{\usebox\pandoc@box}% + \else\usebox{\pandoc@box}% + \fi% +} + +\RequirePackage[font=small,labelfont=bf]{caption} + +% Wide environments for figures and tables ------------------------------------- 
+\RequirePackage{environ} + +% An easy way to make a figure span the full width of the page +\NewEnviron{widefigure}[1][]{ +\begin{figure}[#1] +\advance\leftskip-2cm +\begin{minipage}{\dimexpr\textwidth+4cm\relax}% + \captionsetup{margin=2cm} + \BODY +\end{minipage}% +\end{figure} +} + +\NewEnviron{widetable}[1][]{ +\begin{table}[#1] +\advance\leftskip-2cm +\begin{minipage}{\dimexpr\textwidth+4cm\relax}% + \captionsetup{margin=2cm} + \BODY +\end{minipage}% +\end{table} +} diff --git a/_articles/RJ-2025-040/RJreferences.bib b/_articles/RJ-2025-040/RJreferences.bib new file mode 100644 index 0000000000..6caddb9666 --- /dev/null +++ b/_articles/RJ-2025-040/RJreferences.bib @@ -0,0 +1,283 @@ +@article{Salmeron2024a, + title = {A redefined Variance Inflation Factor: overcoming the limitations of the Variance Inflation Factor}, + year = {2025}, + author = {Salmer{\'o}n, R. and Garc{\'i}a, C.~B. and Garc\'{i}a, J.}, + journal = {Computational Economics}, + volume = {65}, + pages = {337--363}, + doi = {10.1007/s10614-024-10575-8}, + url = {https://doi.org/10.1007/s10614-024-10575-8} +} +@article{FarrarGlauber, + title = {Multicollinearity in regression analysis: the problem revisited}, + author = {Farrar, Donald E and Glauber, Robert R}, + journal = {The Review of Economics and Statistics}, + volume = {49}, + number = {1}, + pages = {92--107}, + year = {1967}, + doi = {10.2307/1937887}, + url = {https://doi.org/10.2307/1937887} +} +@article{GunstMason1977, + title = {Advantages of examining multicollinearities in regression analysis}, + year = {1977}, + author = {Gunst, R.F.
and Mason, R.L.}, + journal = {Biometrics}, + volume = {33}, + number = {1}, + pages = {249--260}, + doi = {10.2307/2529320}, + url = {https://doi.org/10.2307/2529320} +} +@Book{Gujarati2003, + title = {Basic Econometrics}, + author = {Gujarati, D.N.}, + year = {2003}, + publisher = {McGraw-Hill (fourth edition)}, + url = {https://highered.mheducation.com/sites/0072335424/} +} +@article{Silvey1969, + title = {Multicollinearity and imprecise estimation}, + year = {1969}, + author = {Silvey, S.}, + journal = {Journal of the Royal Statistical Society. Series B (Methodological)}, + volume = {31}, + number = {3}, + pages = {539--552}, + url = {https://www.jstor.org/stable/2984357} +} +@article{WillanWatts1978, + title = {Meaningful multicollinearity measures}, + year = {1978}, + author = {Willan, A.R. and Watts, D.G.}, + journal = {Technometrics}, + volume = {20}, + number = {4}, + pages = {407--412}, + doi = {10.1080/00401706.1978.10489694}, + url = {https://doi.org/10.1080/00401706.1978.10489694} +} +@Book{Wooldrigde2013, + title = {Introductory Econometrics. 
A Modern Approach}, + author = {Wooldridge, J.M.}, + year = {2020}, + url = {https://www.cengage.uk/c/introductory-econometrics-a-modern-approach-7e-wooldridge/9781337558860PF/}, + publisher = {South-Western, CENGAGE Learning (7th edition)} +} +@article{OBrien, + title = {A caution regarding rules of thumb for variance inflation factors}, + author = {O'Brien, R.M.}, + journal = {Quality \& Quantity}, + volume = {41}, + number = {5}, + pages = {673--690}, + year = {2007}, + url = {https://link.springer.com/article/10.1007/s11135-006-9018-6} +} +@Manual{R, + title = {{R}: A Language and Environment for Statistical Computing}, + author = {{R Core Team}}, + organization = {R Foundation for Statistical Computing}, + address = {Vienna, Austria}, + year = {2025}, + edition = {4.5.1}, + url = {https://www.R-project.org/} +} +@article{CriticaFarrar1, + title = {Multicollinearity in regression analysis: Comment}, + author = {Haitovsky, Yoel}, + journal = {The Review of Economics and Statistics}, + volume = {51}, + number = {4}, + pages = {486--489}, + year = {1969}, + publisher = {JSTOR}, + doi = {10.2307/1926450}, + url = {https://doi.org/10.2307/1926450} +} +@article{CriticaFarrar2, + title = {Multicollinearity in regression analysis}, + author = {Kumar, T Krishna}, + journal = {The Review of Economics and Statistics}, + volume = {57}, + number = {3}, + pages = {365--366}, + year = {1975}, + publisher = {MIT Press}, + doi = {10.2307/1923925}, + url = {https://doi.org/10.2307/1923925} +} +@article{CriticaFarrar3, + title = {The detection of multicollinearity: A comment}, + author = {Wichers, C Robert}, + journal = {The Review of Economics and Statistics}, + volume = {57}, + number = {3}, + pages = {366--368}, + year = {1975}, + publisher = {JSTOR}, + doi = {10.2307/1923926}, + url = {https://doi.org/10.2307/1923926} +} +@article{CriticaFarrar4, + title = {Tests for the severity of multicolinearity in regression analysis: A comment}, + author = {O'Hagan, John and McCabe,
Brendan}, + journal = {The Review of Economics and Statistics}, + volume = {57}, + number = {3}, + pages = {368--370}, + year = {1975}, + publisher = {JSTOR}, + doi = {10.2307/1923927}, + url = {https://doi.org/10.2307/1923927} +} +@article{Garciaetal2019b, + title = {Residualization: justification, properties and application}, + year = {2019}, + author = {Garc{\'i}a, C.~B. and Salmer{\'o}n, R. and Garc{\'i}a-Garc{\'i}a, C. and Garc\'{i}a, J.}, + journal = {Journal of Applied Statistics}, + volume = {47}, + number = {11}, + pages = {1990--2010}, + doi = {10.1111/insr.12575}, + url = {https://doi.org/10.1111/insr.12575} +} +@PhdThesis{Wissell, + title = {A new biased estimator for multivariate regression models with highly collinear variables}, + author = {Wissel, Julia}, + school = {Bayerische Julius-Maximilians-Universit{\"a}t W{\"u}rzburg}, + url = {https://opus.bibliothek.uni-wuerzburg.de/frontdoor/index/index/docId/2949}, + year = {2009} +} +@article{Salmeron2017, + title = {The raise estimator estimation, inference, and properties}, + author = {Salmer{\'o}n, Roman and Garcia, Catalina and Garcia, Jose and Lopez, Maria del Mar}, + journal = {Communications in Statistics-Theory and Methods}, + volume = {46}, + number = {13}, + pages = {6446--6462}, + year = {2017}, + publisher = {Taylor \& Francis}, + doi = {10.1080/03610926.2015.1125496}, + url = {https://doi.org/10.1080/03610926.2015.1125496} +} +@article{Salmeron2024b, + title = {The Raise Regression: Justification, Properties and Application}, + author = {Salmer\'on, R. and Garc\'ia, C.B. 
and Garc\'ia, J.}, + journal = {International Statistical Review}, + doi = {10.1111/insr.12575}, + year = {2024}, + url = {https://doi.org/10.1111/insr.12575} +} +@Book{KleinGoldberger, + title = {An Economic Model of the {United States} 1929-1952}, + author = {Klein, Lawrence Robert and Goldberger, Arthur S.}, + year = {1955}, + doi = {10.2307/2227976}, + url = {https://doi.org/10.2307/2227976}, + publisher = {Amsterdam: North-Holland Publishing Company} +} +@article{Salmeron2019, + title = {Diagnosis and quantification of the non-essential collinearity}, + author = {Salmer\'on-G\'omez, Rom\'an and Rodr\'iguez-S\'anchez, Ainara and Garc\'ia-Garc\'ia, Catalina}, + journal = {Computational Statistics}, + pages = {647--666}, + volume = {35}, + year = {2019}, + publisher = {Springer}, + doi = {10.1007/s00180-019-00922-x}, + url = {https://doi.org/10.1007/s00180-019-00922-x} +} +@article{Marquardt1970, + title = {Generalized inverses, ridge regression, biased linear estimation and nonlinear estimation}, + author = {Marquardt, D.}, + journal = {Technometrics}, + volume = {12}, + number = {3}, + pages = {591--612}, + doi = {10.2307/1267205}, + url = {https://doi.org/10.2307/1267205}, + year = {1970} +} +@article{Salmeron2019TAS, + title = {Comment on ``{A} Note on Collinearity Diagnostics and Centering'' by {V}elilla (2018)}, + author = {Salmer\'on, R. and Garc\'ia, C.B. and Garc\'ia, J.}, + journal = {The American Statistician}, + volume = {74}, + number = {1}, + pages = {68--71}, + doi = {10.1080/00031305.2019.1635527}, + url = {https://doi.org/10.1080/00031305.2019.1635527}, + year = {2019} +} +@article{MarquardtSnee1975, + title = {Ridge regression in practice}, + author = {Marquardt, D. 
and Snee, R.D.}, + journal = {The American Statistician}, + volume = {29}, + number = {1}, + pages = {3--20}, + doi = {10.2307/2683673}, + url = {https://doi.org/10.2307/2683673}, + year = {1975} +} +@article{Belsley1991, + title = {A guide to using the collinearity diagnostics}, + author = {Belsley, D.}, + journal = {Computer Science in Economics and Management}, + volume = {4}, + pages = {33--50}, + doi = {10.1007/BF00426854}, + url = {https://doi.org/10.1007/BF00426854}, + year = {1991} +} +@Book{BelsleyKuhWelsch, + title = {Regression diagnostics: Identifying influential data and sources of collinearity}, + author = {Belsley, D. and Kuh, E. and Welsch, R.}, + year = {1980}, + url = {https://onlinelibrary.wiley.com/doi/book/10.1002/0471725153}, + publisher = {John Wiley and Sons} +} +@article{Salmeron2022mcvis, + title = {A guide to using the collinearity diagnostics}, + author = {Salmer\'on, R. and Garc\'ia, C.B. and Rodr\'iguez, A. and Garc\'ia, C.}, + journal = {The R Journal}, + volume = {14}, + number = {4}, + pages = {264--279}, + doi = {10.32614/RJ-2023-010}, + url = {https://doi.org/10.32614/RJ-2023-010}, + year = {2022} +} +@article{Salmeron2020maths, + title = {Detection of near-multicollinearity through centered and noncentered regression}, + author = {Salmer\'on, R. and Garc\'ia, C.B. and Garc\'ia, J.}, + journal = {Mathematics}, + volume = {8}, + number = {6}, + pages = {931}, + doi = {10.3390/math8060931}, + url = {https://doi.org/10.3390/math8060931}, + year = {2020} +} +@article{Salmeron2022multicoll, + title = {The multiColl package versus other existing packages in {R} to detect multicollinearity}, + author = {Salmer\'on, R. and Garc\'ia, C.B. 
and Garc\'ia, J.}, + journal = {Computational Economics}, + volume = {60}, + pages = {439--450}, + doi = {10.1007/s10614-021-10154-1}, + url = {https://doi.org/10.1007/s10614-021-10154-1}, + year = {2022} +} +@article{Salmeron2021multicoll, + title = {A Guide to Using the {R} Package multiColl for Detecting Multicollinearity}, + author = {Salmer\'on, R. and Garc\'ia, C.B. and Garc\'ia, J.}, + journal = {Computational Economics}, + volume = {57}, + pages = {529--536}, + doi = {10.1007/s10614-019-09967-y}, + url = {https://doi.org/10.1007/s10614-019-09967-y}, + year = {2021} +} diff --git a/_articles/RJ-2025-040/RJwrapper.tex b/_articles/RJ-2025-040/RJwrapper.tex new file mode 100644 index 0000000000..df64237727 --- /dev/null +++ b/_articles/RJ-2025-040/RJwrapper.tex @@ -0,0 +1,70 @@ +\documentclass[a4paper]{report} +\usepackage[utf8]{inputenc} +\usepackage[T1]{fontenc} +\usepackage{RJournal} +\usepackage{amsmath,amssymb,array} +\usepackage{booktabs} + + +% tightlist command for lists without linebreak +\providecommand{\tightlist}{% + \setlength{\itemsep}{0pt}\setlength{\parskip}{0pt}} + +\usepackage{longtable} + +% Always define CSL refs as bib entries are contained in separate doc +% Pandoc citation processing +%From Pandoc 3.1.8 +% definitions for citeproc citations +\NewDocumentCommand\citeproctext{}{} +\NewDocumentCommand\citeproc{mm}{% + \begingroup\def\citeproctext{#2}\cite{#1}\endgroup} +\makeatletter + % allow citations to break across lines + \let\@cite@ofmt\@firstofone + % avoid brackets around text for \cite: + \def\@biblabel#1{} + \def\@cite#1#2{{#1\if@tempswa , #2\fi}} +\makeatother +\newlength{\cslhangindent} +\setlength{\cslhangindent}{1.5em} +\newlength{\csllabelwidth} +\setlength{\csllabelwidth}{3em} +\newenvironment{CSLReferences}[2] % #1 hanging-indent, #2 entry-spacing + {\begin{list}{}{% + \setlength{\itemindent}{0pt} + \setlength{\leftmargin}{0pt} + \setlength{\parsep}{0pt} + % turn on hanging indent if param 1 is 1 + \ifodd #1 + 
\setlength{\leftmargin}{\cslhangindent} + \setlength{\itemindent}{-1\cslhangindent} + \fi + % set entry spacing + \setlength{\itemsep}{#2\baselineskip}}} + {\end{list}} +\usepackage{calc} +\newcommand{\CSLBlock}[1]{#1\hfill\break} +\newcommand{\CSLLeftMargin}[1]{\parbox[t]{\csllabelwidth}{#1}} +\newcommand{\CSLRightInline}[1]{\parbox[t]{\linewidth - \csllabelwidth}{#1}\break} +\newcommand{\CSLIndent}[1]{\hspace{\cslhangindent}#1} + + + +\begin{document} + + +%% do not edit, for illustration only +\sectionhead{Contributed research article} +\volume{17} +\volnumber{4} +\year{2025} +\month{December} +\setcounter{page}{192} + +\begin{article} + \input{RJ-2025-040} +\end{article} + + +\end{document} diff --git a/_articles/RJ-2025-040/figures/Figura1.png b/_articles/RJ-2025-040/figures/Figura1.png new file mode 100644 index 0000000000..27ba5efd97 Binary files /dev/null and b/_articles/RJ-2025-040/figures/Figura1.png differ diff --git a/_articles/RJ-2025-040/figures/Figura2.png b/_articles/RJ-2025-040/figures/Figura2.png new file mode 100644 index 0000000000..a06d28f2a5 Binary files /dev/null and b/_articles/RJ-2025-040/figures/Figura2.png differ diff --git a/_articles/RJ-2025-040/figures/Figura3.png b/_articles/RJ-2025-040/figures/Figura3.png new file mode 100644 index 0000000000..e0ef4afac5 Binary files /dev/null and b/_articles/RJ-2025-040/figures/Figura3.png differ diff --git a/_articles/RJ-2025-040/rvif.R b/_articles/RJ-2025-040/rvif.R new file mode 100644 index 0000000000..164ec72336 --- /dev/null +++ b/_articles/RJ-2025-040/rvif.R @@ -0,0 +1,506 @@ +# Generated by `rjournal_pdf_article()` using `knitr::purl()`: do not edit by hand +# Please edit rvif.Rmd to modify this file + +## ----setup, include=FALSE----------------------------------------------------- +knitr::opts_chunk$set(echo = FALSE, warning = FALSE, message = FALSE) #, cache = TRUE) +options(tinytex.verbose = TRUE) +library(rvif) +library(knitr) +library(kableExtra) +library(memisc) # mtable + + +## 
----WisseltableHTML, eval = knitr::is_html_output()-------------------------- +# WisselTABLE = Wissel[,-3] +# knitr::kable(WisselTABLE, format = "html", caption = "Data set presented previously by @Wissell", align="cccccc", digits = 3) + + +## ----WisseltableLATEX, eval = knitr::is_latex_output()------------------------ +WisselTABLE = Wissel[,-3] +knitr::kable(WisselTABLE, format = "latex", booktabs = TRUE, caption = "Data set presented previously by Wissell", align="cccccc", digits = 3)%>% +kable_styling(latex_options = "scale_down") + + +## ----Wisselregression--------------------------------------------------------- +attach(Wissel) +obs = nrow(Wissel) +regWISSEL0 = lm(D~C+I+CP) + regWISSEL0coef = as.double(regWISSEL0$coefficients) + regWISSEL0se = as.double(summary(regWISSEL0)[[4]][,2]) + regWISSEL0texp = as.double(summary(regWISSEL0)[[4]][,3]) + regWISSEL0pvalue = as.double(summary(regWISSEL0)[[4]][,4]) + regWISSEL0sigma2 = as.double(summary(regWISSEL0)[[6]]^2) + regWISSEL0R2 = as.double(summary(regWISSEL0)[[8]]) + regWISSEL0Fexp = as.double(summary(regWISSEL0)[[10]][[1]]) + regWISSEL0table = data.frame(c(regWISSEL0coef, obs), + c(regWISSEL0se, regWISSEL0sigma2), + c(regWISSEL0texp, regWISSEL0R2), + c(regWISSEL0pvalue, regWISSEL0Fexp)) + regWISSEL0table = round(regWISSEL0table, digits=4) + colnames(regWISSEL0table) =c("Estimator", "Standard Error", "Experimental t", "p-value") + rownames(regWISSEL0table) =c("Intercept", "Personal consumption", "Personal income", "Outstanding consumer credit", "(Obs, Sigma Est., Coef. 
Det., F exp.)") +regWISSEL1 = lm(D~C) + regWISSEL1coef = as.double(regWISSEL1$coefficients) + regWISSEL1se = as.double(summary(regWISSEL1)[[4]][,2]) + regWISSEL1texp = as.double(summary(regWISSEL1)[[4]][,3]) + regWISSEL1pvalue = as.double(summary(regWISSEL1)[[4]][,4]) + regWISSEL1sigma2 = as.double(summary(regWISSEL1)[[6]]^2) + regWISSEL1R2 = as.double(summary(regWISSEL1)[[8]]) + regWISSEL1Fexp = as.double(summary(regWISSEL1)[[10]][[1]]) + regWISSEL1table = data.frame(c(regWISSEL1coef, obs), + c(regWISSEL1se, regWISSEL1sigma2), + c(regWISSEL1texp, regWISSEL1R2), + c(regWISSEL1pvalue, regWISSEL1Fexp)) + regWISSEL1table = round(regWISSEL1table, digits=4) + colnames(regWISSEL1table) =c("Estimator", "Standard Error", "Experimental t", "p-value") + rownames(regWISSEL1table) =c("Intercept", "Personal consumption", "(Obs, Sigma Est., Coef. Det., F exp.)") +regWISSEL2 = lm(D~C+I) + regWISSEL2coef = as.double(regWISSEL2$coefficients) + regWISSEL2se = as.double(summary(regWISSEL2)[[4]][,2]) + regWISSEL2texp = as.double(summary(regWISSEL2)[[4]][,3]) + regWISSEL2pvalue = as.double(summary(regWISSEL2)[[4]][,4]) + regWISSEL2sigma2 = as.double(summary(regWISSEL2)[[6]]^2) + regWISSEL2R2 = as.double(summary(regWISSEL2)[[8]]) + regWISSEL2Fexp = as.double(summary(regWISSEL2)[[10]][[1]]) + regWISSEL2table = data.frame(c(regWISSEL2coef, obs), + c(regWISSEL2se, regWISSEL2sigma2), + c(regWISSEL2texp, regWISSEL2R2), + c(regWISSEL2pvalue, regWISSEL2Fexp)) + regWISSEL2table = round(regWISSEL2table, digits=4) + colnames(regWISSEL2table) =c("Estimator", "Standard Error", "Experimental t", "p-value") + rownames(regWISSEL2table) =c("Intercept", "Personal consumption", "Personal income", "(Obs, Sigma Est., Coef. 
Det., F exp.)") +regWISSEL3 = lm(D~C+CP) + regWISSEL3coef = as.double(regWISSEL3$coefficients) + regWISSEL3se = as.double(summary(regWISSEL3)[[4]][,2]) + regWISSEL3texp = as.double(summary(regWISSEL3)[[4]][,3]) + regWISSEL3pvalue = as.double(summary(regWISSEL3)[[4]][,4]) + regWISSEL3sigma2 = as.double(summary(regWISSEL3)[[6]]^2) + regWISSEL3R2 = as.double(summary(regWISSEL3)[[8]]) + regWISSEL3Fexp = as.double(summary(regWISSEL3)[[10]][[1]]) + regWISSEL3table = data.frame(c(regWISSEL3coef, obs), + c(regWISSEL3se, regWISSEL3sigma2), + c(regWISSEL3texp, regWISSEL3R2), + c(regWISSEL3pvalue, regWISSEL3Fexp)) + regWISSEL3table = round(regWISSEL3table, digits=4) + colnames(regWISSEL3table) =c("Estimator", "Standard Error", "Experimental t", "p-value") + rownames(regWISSEL3table) =c("Intercept", "Personal consumption", "Outstanding consumer credit", "(Obs, Sigma Est., Coef. Det., F exp.)") + + +## ----Wissel0tableHTML, eval = knitr::is_html_output()------------------------- +# knitr::kable(regWISSEL0table, format = "html", caption = "OLS estimation for the Wissel model", align="cccc", digits = 3) + + +## ----Wissel0tableLATEX, eval = knitr::is_latex_output()----------------------- +knitr::kable(regWISSEL0table, format = "latex", booktabs = TRUE, caption = "OLS estimation for the Wissel model", align="cccc", digits = 3) %>% +kable_styling(latex_options = "scale_down") + + +## ----Wissel1tableHTML, eval = knitr::is_html_output()------------------------- +# knitr::kable(regWISSEL1table, format = "html", caption = "OLS estimation for part of the Wissel model", align="cccc", digits = 3) + + +## ----Wissel1tableLATEX, eval = knitr::is_latex_output()----------------------- +knitr::kable(regWISSEL1table, format = "latex", booktabs = TRUE, caption = "OLS estimation for part of the Wissel model", align="cccc", digits = 3) %>% +kable_styling(latex_options = "scale_down") + + +## ----Wissel2tableHTML, eval = knitr::is_html_output()------------------------- +# 
knitr::kable(regWISSEL2table, format = "html", caption = "OLS estimation for part of the Wissel model", align="cccc", digits = 3) + + +## ----Wissel2tableLATEX, eval = knitr::is_latex_output()----------------------- +knitr::kable(regWISSEL2table, format = "latex", booktabs = TRUE, caption = "OLS estimation for part of the Wissel model", align="cccc", digits = 3) %>% +kable_styling(latex_options = "scale_down") + + +## ----Wissel3tableHTML, eval = knitr::is_html_output()------------------------- +# knitr::kable(regWISSEL3table, format = "html", caption = "OLS estimation for part of the Wissel model", align="cccc", digits = 3) + + +## ----Wissel3tableLATEX, eval = knitr::is_latex_output()----------------------- +knitr::kable(regWISSEL3table, format = "latex", booktabs = TRUE, caption = "OLS estimation for part of the Wissel model", align="cccc", digits = 3)%>% +kable_styling(latex_options = "scale_down") + + +## ----WisselORTO--------------------------------------------------------------- +y = Wissel[,2] +X = as.matrix(Wissel[,3:6]) +Xqr=qr(X) +Xo = qr.Q(Xqr) +regORTO = lm(y~Xo+0) +#summary(regORTO) + regORTOcoef = as.double(regORTO$coefficients) + regORTOse = as.double(summary(regORTO)[[4]][,2]) + regORTOtexp = as.double(summary(regORTO)[[4]][,3]) + regORTOpvalue = as.double(summary(regORTO)[[4]][,4]) + regORTOsigma2 = as.double(summary(regORTO)[[6]]^2) + regORTOR2 = as.double(summary(regORTO)[[8]]) # as I have removed the intercept in the regression, this does not calculate it well + regORTOFexp = as.double(summary(regORTO)[[10]][[1]]) # as I have removed the intercept in the regression, this does not calculate it well + regORTOtable = data.frame(c(regORTOcoef, obs), + c(regORTOse, regORTOsigma2), + c(regORTOtexp, 0.9235), + c(regORTOpvalue, 52.3047)) + regORTOtable = round(regORTOtable, digits=4) + colnames(regORTOtable) =c("Estimator", "Standard Error", "Experimental t", "p-value") + rownames(regORTOtable) =c("Intercept", "Personal consumption", "Personal 
income", "Outstanding consumer credit", "(Obs, Sigma Est., Coef. Det., F exp.)") + + +## ----WisselORTOtableHTML, eval = knitr::is_html_output()---------------------- +# knitr::kable(regORTOtable, format = "html", caption = "OLS estimation for the orthonormal Wissel model", align="cccc", digits = 3) + + +## ----WisselORTOtableLATEX, eval = knitr::is_latex_output()-------------------- +knitr::kable(regORTOtable, format = "latex", booktabs = TRUE, caption = "OLS estimation for the orthonormal Wissel model", align="cccc", digits = 3)%>% +kable_styling(latex_options = "scale_down") + + +## ----KGtableHTML, eval = knitr::is_html_output()------------------------------ +# data(KG) +# KGtable = KG +# colnames(KGtable) = c("Consumption", "Wage income", "Non-farm income", "Farm income") +# knitr::kable(KGtable, format = "html", caption = "Data set presented previously by @KleinGoldberger", align="cccc", digits = 3) + + +## ----KGtableLATEX, eval = knitr::is_latex_output()---------------------------- +data(KG) +KGtable = KG +colnames(KGtable) = c("Consumption", "Wage income", "Non-farm income", "Farm income") +knitr::kable(KGtable, format = "latex", booktabs = TRUE, caption = "Data set presented previously by Klein and Goldberger", align="cccc", digits = 3)%>% +kable_styling(latex_options = "scale_down") + + +## ----KGregression------------------------------------------------------------- +attach(KG) +obs = nrow(KG) +regKG = lm(consumption~wage.income+non.farm.income+farm.income) + regKGcoef = as.double(regKG$coefficients) + regKGse = as.double(summary(regKG)[[4]][,2]) + regKGtexp = as.double(summary(regKG)[[4]][,3]) + regKGpvalue = as.double(summary(regKG)[[4]][,4]) + regKGsigma2 = as.double(summary(regKG)[[6]]^2) + regKGR2 = as.double(summary(regKG)[[8]]) + regKGFexp = as.double(summary(regKG)[[10]][[1]]) + regKGtable = data.frame(c(regKGcoef, obs), + c(regKGse, regKGsigma2), + c(regKGtexp, regKGR2), + c(regKGpvalue, regKGFexp)) + regKGtable = round(regKGtable, digits=4) + 
colnames(regKGtable) =c("Estimator", "Standard Error", "Experimental t", "p-value") + rownames(regKGtable) =c("Intercept", "Wage income", "Non-farm income", "Farm income", "(Obs, Sigma Est., Coef. Det., F exp.)") + + +## ----regKGtableHTML, eval = knitr::is_html_output()--------------------------- +# knitr::kable(regKGtable, format = "html", caption = "OLS estimation for the Klein and Goldberger model", align="cccc", digits = 3) + + +## ----regKGtableLATEX, eval = knitr::is_latex_output()------------------------- +knitr::kable(regKGtable, format = "latex", booktabs = TRUE, caption = "OLS estimation for the Klein and Goldberger model", align="cccc", digits = 3)%>% +kable_styling(latex_options = "scale_down") + + +## ----THEOREM------------------------------------------------------------------ +y = Wissel[,2] +X = as.matrix(Wissel[,3:6]) +theoremWISSEL = multicollinearity(y, X) +rownames(theoremWISSEL) = c("Intercept", "Personal consumption", "Personal income", "Outstanding consumer credit") + +y = KG[,1] +cte = rep(1, length(y)) +X = as.matrix(cbind(cte, KG[,-1])) +theoremKG = multicollinearity(y, X) +rownames(theoremKG) = c("Intercept", "Wage income", "Non-farm income", "Farm income") + + +## ----theoremWISSELtableHTML, eval = knitr::is_html_output()------------------- +# knitr::kable(theoremWISSEL, format = "html", caption = "Theorem results of the Wissel model", align="ccccc", digits = 6) + + +## ----theoremWISSELtableLATEX, eval = knitr::is_latex_output()----------------- +knitr::kable(theoremWISSEL, format = "latex", booktabs = TRUE, caption = "Theorem results of the Wissel model", align="ccccc", digits = 6) %>% +kable_styling(latex_options = "scale_down") + + +## ----theoremKGtableHTML, eval = knitr::is_html_output()----------------------- +# knitr::kable(theoremKG, format = "html", caption = "Theorem results of the Klein and Goldberger model", align="ccccc", digits = 6) + + +## ----theoremKGtableLATEX, eval = knitr::is_latex_output()--------------------- 
+knitr::kable(theoremKG, format = "latex", booktabs = TRUE, caption = "Theorem results of the Klein and Goldberger model", align="ccccc", digits = 6) %>% +kable_styling(latex_options = "scale_down") + + +## ----PAPER13, echo=TRUE------------------------------------------------------- +y_W = Wissel[,2] +X_W = Wissel[,3:6] +multicollinearity(y_W, X_W) + + +## ----PAPER14, echo=TRUE------------------------------------------------------- +y_KG = KG[,1] +cte = rep(1, length(y_KG)) +X_KG = cbind(cte, KG[,2:4]) +multicollinearity(y_KG, X_KG) + + +## ----PAPER15, echo=TRUE------------------------------------------------------- +multicollinearity(y_W, X_W, alpha = 0.01) +multicollinearity(y_KG, X_KG, alpha = 0.01) + + +## ----SAMPLE-SIZE 1------------------------------------------------------------ +## Simulation 1 +set.seed(2024) +obs = 3000 # no individual significance test is affected +cte = rep(1, obs) +x2 = rnorm(obs, 5, 0.1) # related to intercept: non essential +x3 = rnorm(obs, 5, 10) +x4 = x3 + rnorm(obs, 5, 0.5) # related to x3: essential +x5 = rnorm(obs, -1, 3) +x6 = rnorm(obs, 15, 2.5) +y = 4 + 5*x2 - 9*x3 -2*x4 + 2*x5 + 7*x6 + rnorm(obs, 0, 2) +X = cbind(cte, x2, x3, x4, x5, x6) +theoremSIMULATION1 = multicollinearity(y, X) +rownames(theoremSIMULATION1) = c("Intercept", "X2", "X3", "X4", "X5", "X6") +vifsSIMULATION1 = VIF(X) +cnSIMULATION1 = CN(X) +cvsSIMULATION1 = CVs(X) + +## Simulation 2 +obs = 100 # decreasing the number of observations affected to intercept +cte = rep(1, obs) +x2 = rnorm(obs, 5, 0.1) # related to intercept: non essential +x3 = rnorm(obs, 5, 10) +x4 = x3 + rnorm(obs, 5, 0.5) # related to x3: essential +x5 = rnorm(obs, -1, 3) +x6 = rnorm(obs, 15, 2.5) +y = 4 + 5*x2 - 9*x3 -2*x4 + 2*x5 + 7*x6 + rnorm(obs, 0, 2) +X = cbind(cte, x2, x3, x4, x5, x6) +theoremSIMULATION2 = multicollinearity(y, X) +rownames(theoremSIMULATION2) = c("Intercept", "X2", "X3", "X4", "X5", "X6") +vifsSIMULATION2 = VIF(X) +cnSIMULATION2 = CN(X) +cvsSIMULATION2 = CVs(X) + +## 
Simulation 3 +obs = 30 # decreasing the number of observations affected to intercept, x2 and x4 +cte = rep(1, obs) +x2 = rnorm(obs, 5, 0.1) # related to intercept: non essential +x3 = rnorm(obs, 5, 10) +x4 = x3 + rnorm(obs, 5, 0.5) # related to x3: essential +x5 = rnorm(obs, -1, 3) +x6 = rnorm(obs, 15, 2.5) +y = 4 + 5*x2 - 9*x3 -2*x4 + 2*x5 + 7*x6 + rnorm(obs, 0, 2) +X = cbind(cte, x2, x3, x4, x5, x6) +theoremSIMULATION3 = multicollinearity(y, X) +rownames(theoremSIMULATION3) = c("Intercept", "X2", "X3", "X4", "X5", "X6") +vifsSIMULATION3 = VIF(X) +cnSIMULATION3 = CN(X) +cvsSIMULATION3 = CVs(X) + + +## ----traditionalSIMULATIONtableHTML, eval = knitr::is_html_output()----------- +# traditionalSIMULATION = data.frame(c(cvsSIMULATION1, vifsSIMULATION1, cnSIMULATION1), +# c(cvsSIMULATION2, vifsSIMULATION2, cnSIMULATION2), +# c(cvsSIMULATION3, vifsSIMULATION3, cnSIMULATION3)) +# rownames(traditionalSIMULATION) = c("X2 CV", "X3 CV", "X4 CV", "X5 CV", "X6 CV", "X2 VIF", "X3 VIF", "X4 VIF", "X5 VIF", "X6 VIF", "CN") +# colnames(traditionalSIMULATION) = c("Simulation 1", "Simulation 2", "Simulation 3") +# knitr::kable(traditionalSIMULATION, format = "html", caption = "CVs, VIFs for data of Simulations 1, 2 and 3", align="cccccc", digits = 3) + + +## ----traditionalSIMULATIONtableLATEX, eval = knitr::is_latex_output()--------- +traditionalSIMULATION = data.frame(c(cvsSIMULATION1, vifsSIMULATION1, cnSIMULATION1), + c(cvsSIMULATION2, vifsSIMULATION2, cnSIMULATION2), + c(cvsSIMULATION3, vifsSIMULATION3, cnSIMULATION3)) +rownames(traditionalSIMULATION) = c("X2 CV", "X3 CV", "X4 CV", "X5 CV", "X6 CV", "X2 VIF", "X3 VIF", "X4 VIF", "X5 VIF", "X6 VIF", "CN") +colnames(traditionalSIMULATION) = c("Simulation 1", "Simulation 2", "Simulation 3") +knitr::kable(traditionalSIMULATION, format = "latex", booktabs = TRUE, caption = "CVs, VIFs and CN for data of Simulations 1, 2 and 3", align="cccccc", digits = 3) %>% +kable_styling(latex_options = "scale_down") + + +## 
----theoremSIMULATION1tableHTML, eval = knitr::is_html_output()-------------- +# knitr::kable(theoremSIMULATION1, format = "html", caption = "Theorem results of the Simulation 1 model", align="cccccc", digits = 6) + + +## ----theoremSIMULATION1tableLATEX, eval = knitr::is_latex_output()------------ +knitr::kable(theoremSIMULATION1, format = "latex", booktabs = TRUE, caption = "Theorem results of the Simulation 1 model", align="cccccc", digits = 6) %>% +kable_styling(latex_options = "scale_down") + + +## ----theoremSIMULATION2tableHTML, eval = knitr::is_html_output()-------------- +# knitr::kable(theoremSIMULATION2, format = "html", caption = "Theorem results of the Simulation 2 model", align="cccccc", digits = 6) + + +## ----theoremSIMULATION2tableLATEX, eval = knitr::is_latex_output()------------ +knitr::kable(theoremSIMULATION2, format = "latex", booktabs = TRUE, caption = "Theorem results of the Simulation 2 model", align="cccccc", digits = 6) %>% +kable_styling(latex_options = "scale_down") + + +## ----theoremSIMULATION3tableHTML, eval = knitr::is_html_output()-------------- +# knitr::kable(theoremSIMULATION3, format = "html", caption = "Theorem results of the Simulation 3 model", align="cccccc", digits = 6) + + +## ----theoremSIMULATION3tableLATEX, eval = knitr::is_latex_output()------------ +knitr::kable(theoremSIMULATION3, format = "latex", booktabs = TRUE, caption = "Theorem results of the Simulation 3 model", align="cccccc", digits = 6) %>% +kable_styling(latex_options = "scale_down") + + +## ----Choice1------------------------------------------------------------------ +P = CDpf[,1] +cte = CDpf[,2] +K = CDpf[,3] +W = CDpf[,4] + +data2 = cbind(cte, K, W) +th2 = multicollinearity(P, data2) +rownames(th2) = c("Intercept", "Capital", "Work") + + +## ----theoremCHOICE1tableHTML, eval = knitr::is_html_output()------------------ +# knitr::kable(th2, format = "html", caption = "Theorem results of the Example 2 of @Salmeron2024a", align="cccccc", digits = 6) + + 
+## ----theoremCHOICE1tableLATEX, eval = knitr::is_latex_output()---------------- +knitr::kable(th2, format = "latex", booktabs = TRUE, caption = "Theorem results of the Example 2 of Salmerón et al. (2025)", align="cccccc", digits = 6) %>% +kable_styling(latex_options = "scale_down") + + +## ----Choice2------------------------------------------------------------------ +data2 = cbind(cte, W, K) +th2 = multicollinearity(P, data2) +rownames(th2) = c("Intercept", "Work", "Capital") + + +## ----theoremCHOICE2tableHTML, eval = knitr::is_html_output()------------------ +# knitr::kable(th2, format = "html", caption = "Theorem results of the Example 2 of @Salmeron2024a (reordination 2)", align="cccccc", digits = 6) + + +## ----theoremCHOICE2tableLATEX, eval = knitr::is_latex_output()---------------- +knitr::kable(th2, format = "latex", booktabs = TRUE, caption = "Theorem results of the Example 2 of Salmerón et al. (2025) (reordination 2)", align="cccccc", digits = 6) %>% +kable_styling(latex_options = "scale_down") + + +## ----Choice7, eval=FALSE------------------------------------------------------ +# NE = employees[,1] +# cte = employees[,2] +# FA = employees[,3] +# OI = employees[,4] +# S = employees[,5] +# reg = lm(NE~FA+OI+S) +# summary(reg) + + +## ----Choice8------------------------------------------------------------------ +NE = employees[,1] +cte = employees[,2] +FA = employees[,3] +OI = employees[,4] +S = employees[,5] +data3 = cbind(OI, S, FA) +th8 = multicollinearity(NE, data3) +rownames(th8) = c("OI", "S", "FA") + + +## ----theoremCHOICE8tableHTML, eval = knitr::is_html_output()------------------ +# knitr::kable(th8, format = "html", caption = "Theorem results of the Example 3 of @Salmeron2024a reordination", align="ccccc", digits=20) + + +## ----theoremCHOICE8tableLATEX, eval = knitr::is_latex_output()---------------- +knitr::kable(th8, format = "latex", booktabs = TRUE, caption = "Theorem results of the Example 3 of Salmerón et al. 
(2025) reordination", align="ccccc", digits=20) %>% +kable_styling(latex_options = "scale_down") + + +## ----PAPER1, echo=TRUE-------------------------------------------------------- +E = euribor[,1] +data1 = euribor[,-1] + +VIF(data1) +CN(data1) +CVs(data1) + + +## ----PAPER2, echo = knitr::is_html_output(), eval = knitr::is_html_output()---- +# rvifs(data1, ul = T, intercept = T) + + +## ----PAPER2bis, echo = knitr::is_latex_output(), eval = knitr::is_latex_output()---- +rvifs(data1, ul = T, intercept = T) + + +## ----PAPER_2, echo = knitr::is_html_output(), eval = knitr::is_html_output()---- +# reg_E = lm(euribor[,1]~as.matrix(euribor[,-c(1,2)])) +# rvifs(model.matrix(reg_E)) + + +## ----PAPER_2bis, echo = knitr::is_latex_output(), eval = knitr::is_latex_output()---- +reg_E = lm(euribor[,1]~as.matrix(euribor[,-c(1,2)])) +rvifs(model.matrix(reg_E)) + + +## ----PAPER3, echo=TRUE-------------------------------------------------------- +multicollinearity(E, data1) + + +## ----PAPER4, echo=TRUE-------------------------------------------------------- +P = CDpf[,1] +data2 = CDpf[,2:4] + + +## ----PAPER4bis, echo = knitr::is_html_output(), eval = knitr::is_html_output()---- +# rvifs(data2, ul = T) + + +## ----PAPER4tris, echo = knitr::is_latex_output(), eval = knitr::is_latex_output()---- +rvifs(data2, ul = T) + + +## ----PAPER5, echo=TRUE-------------------------------------------------------- +multicollinearity(P, data2) + + +## ----PAPER5bis, echo=TRUE----------------------------------------------------- +data2 = CDpf[,c(2,4,3)] +multicollinearity(P, data2) + + +## ----PAPER6, echo=TRUE-------------------------------------------------------- +NE = employees[,1] +data3 = employees[,2:5] + + +## ----PAPER6bis, echo = knitr::is_html_output(), eval = knitr::is_html_output()---- +# rvifs(data3, ul = T) + + +## ----PAPER6tris, echo = knitr::is_latex_output(), eval = knitr::is_latex_output()---- +rvifs(data3, ul = T) + + +## ----PAPER7, 
echo=TRUE-------------------------------------------------------- +multicollinearity(NE, data3[,-1]) + + +## ----PAPER8, echo=TRUE-------------------------------------------------------- +set.seed(2022) +obs = 50 +cte4 = rep(1, obs) +V = rnorm(obs, 10, 10) +y1 = 3 + 4*V + rnorm(obs, 0, 2) +Z = rnorm(obs, 10, 0.1) +y2 = 3 + 4*Z + rnorm(obs, 0, 2) + +data4.1 = cbind(cte4, V) +data4.2 = cbind(cte4, Z) + + +## ----PAPER10, echo=TRUE------------------------------------------------------- +rvifs(data4.1, ul = T) + + +## ----PAPER11, echo=TRUE------------------------------------------------------- +rvifs(data4.2, ul = T) + + +## ----PAPER12, echo=TRUE------------------------------------------------------- +multicollinearity(y1, data4.1) +multicollinearity(y2, data4.2) + diff --git a/_articles/RJ-2025-040/rvif.tex b/_articles/RJ-2025-040/rvif.tex new file mode 100644 index 0000000000..f41d5bd931 --- /dev/null +++ b/_articles/RJ-2025-040/rvif.tex @@ -0,0 +1,1142 @@ +% !TeX root = RJwrapper.tex +\title{rvif: a Decision Rule to Detect Troubling Statistical Multicollinearity Based on Redefined VIF} + + +\author{by Román Salmerón-Gómez and Catalina B. García-García} + +\maketitle + +\abstract{% +Multicollinearity is relevant in many different fields where linear regression models are applied since its presence may affect the analysis of ordinary least squares estimators not only numerically but also from a statistical point of view, which is the focus of this paper. Thus, it is known that collinearity can lead to incoherence in the statistical significance of the coefficients of the independent variables and in the global significance of the model. 
In this paper, the thresholds of the Redefined Variance Inflation Factor (RVIF) are reinterpreted and presented as a statistical test with a region of non-rejection (which depends on a significance level) to diagnose whether a worrying degree of multicollinearity affects the linear regression model from a statistical point of view. The proposed methodology is implemented in the R package rvif, and its application is illustrated with several real data sets previously analyzed in the scientific literature. +} + +\section{Introduction}\label{introduction} + +It is well known that linear relationships between the independent variables of a multiple linear regression model (multicollinearity) can affect the analysis of the model estimated by Ordinary Least Squares (OLS), either by causing unstable estimates of the coefficients of these variables or by leading the individual significance tests of these coefficients not to reject the null hypothesis (see, for example, \citet{FarrarGlauber}, \citet{GunstMason1977}, \citet{Gujarati2003}, \citet{Silvey1969}, \citet{WillanWatts1978} or \citet{Wooldrigde2013}). +However, the measures traditionally applied to detect multicollinearity may conclude that multicollinearity exists even when it does not produce the negative effects mentioned above (see Subsection \hyperref[effect-sample-size]{Effect of sample size..} for more details); in that case, the best solution may be not to treat the multicollinearity at all (see \citet{OBrien}). + +Focusing on the possible effect of multicollinearity on the individual significance tests of the coefficients of the independent variables (a tendency not to reject the null hypothesis), this paper proposes an alternative procedure that checks whether the detected multicollinearity actually affects the statistical analysis of the model. This disruptive approach requires a methodology that can make such a determination. 
The introduction of such methodology is the main objective of this paper. The paper also shows the use of the \CRANpkg{rvif} package of R (\citet{R}) in which this procedure is implemented. + +To this end, we start from the Variance Inflation Factor (VIF). The VIF is obtained from the coefficient of determination of the auxiliary regression of each independent variable of the linear regression model as a function of the other independent variables. Thus, there is a VIF for each independent variable except for the intercept, for which it is not possible to calculate a coefficient of determination for the corresponding auxiliary regression. Consequently, the VIF is able to diagnose the degree of essential approximate multicollinearity (strong linear relationship between the independent variables except the intercept) existing in the model but is not able to detect the non-essential one (strong relationship between the intercept and at least one of the independent variables). +For more information on multicollinearity of essential and non-essential type, see \citet{MarquardtSnee1975} and \citet{Salmeron2019}. + +However, the fact that the VIF detects a worrying level of multicollinearity does not always translate into a negative impact on the statistical analysis. This lack of specificity is due to the fact that other factors, such as sample size and the variance of the random disturbance, can lead to high values of the VIF but not increase the variance of the OLS estimators (see \citet{OBrien}). The explanation for this phenomenon hinges on the fact that, in the reference model with orthogonal variables that is traditionally considered, the linear relationships are assumed to be eliminated, while other factors, such as the variance of the random disturbance, maintain the same values. 
+ +Then, to avoid these inconsistencies, \citet{Salmeron2024a} propose a QR decomposition of the matrix of independent variables of the model in order to obtain an orthonormal matrix. By redefining the reference point, the variance inflation factor is also redefined, resulting in a new detection measure that analyzes the change in the VIF and the rest of relevant factors of the model, thereby overcoming the problems associated with the traditional VIF, as described by \citet{OBrien} among others. The intercept is also included in the detection (contrary to what happens with the traditional VIF), so this measure is able to detect both essential and non-essential multicollinearity. +This new measure presented by \citet{Salmeron2024a} is called Redefined Variance Inflation Factor (RVIF). + +In this paper, the RVIF is associated with a statistical test for detecting troubling multicollinearity; the test is given by a region of non-rejection that depends on a significance level. Note that most of the measures used to diagnose multicollinearity are merely indicators with rules of thumb rather than statistical tests per se. To the best of our knowledge, the only existing statistical test for diagnosing multicollinearity was presented by \citet{FarrarGlauber} and has received strong criticism (see, for example, \citet{CriticaFarrar1}, \citet{CriticaFarrar2}, \citet{CriticaFarrar3} and \citet{CriticaFarrar4}). +Thus, for example, \citet{CriticaFarrar1} notes that the Farrar and Glauber statistic only indicates that the variables are not orthogonal to each other; it tells us nothing more. +In this sense, \citet{CriticaFarrar4} indicates that such a test simply shows whether the null hypothesis of orthogonality is rejected, giving no information on the value of the determinant of the correlation matrix above which the multicollinearity problem becomes intolerable. 
+Therefore, the non-rejection region presented in this paper should be a relevant contribution to the field of econometrics insofar as it would fill an existing gap in the scientific literature. + +The paper is structured as follows: Sections \hyperref[preliminares]{Preliminaries} and \hyperref[modelo-orto]{A first attempt to\ldots{}} provide preliminary information to introduce the methodology used to establish the non-rejection region described in Section \hyperref[new-VIF-orto]{A non-rejection region\ldots{}}. +Section \hyperref[paqueteRVIF]{rvif package} presents the package \CRANpkg{rvif} of R (\citet{R}) and shows its main commands by replicating the results given in \citet{Salmeron2024a} and in the previous sections of this paper. +Finally, Section \hyperref[conclusiones-VIF]{Conclusions} summarizes the main contributions of this paper. + +\section{Preliminaries}\label{preliminares} + +This section identifies some inconsistencies in the definition of the VIF and how these are reflected in the individual significance tests of the linear regression model. It also shows how these inconsistencies are overcome in the proposal presented by \citet{Salmeron2024a} and how this proposal can lead to a decision rule to determine whether the degree of multicollinearity is troubling, i.e., whether it affects the statistical analysis (individual significance tests) of the model. + +\subsection{The original model}\label{the-original-model} + +The multiple linear regression model with \(n\) observations and \(k\) independent variables can be expressed as: +\begin{equation} + \mathbf{y}_{n \times 1} = \mathbf{X}_{n \times k} \cdot \boldsymbol{\beta}_{k \times 1} + \mathbf{u}_{n \times 1}, + \label{eq:model0} +\end{equation} +where the first column of \(\mathbf{X} = [\mathbf{1} \ \mathbf{X}_{2} \dots \mathbf{X}_{i} \dots \mathbf{X}_{k}]\) is composed of ones representing the intercept and \(\mathbf{u}\) represents the random disturbance assumed to be centered and spherical. 
That is, \(E[\mathbf{u}_{n \times 1}] = \mathbf{0}_{n \times 1}\) and \(var(\mathbf{u}_{n \times 1}) = \sigma^{2} \cdot \mathbf{I}_{n \times n}\), where \(\mathbf{0}\) is a vector of zeros, \(\sigma^{2}\) is the variance of the random disturbance and \(\mathbf{I}\) is the identity matrix. + +Given the original model \eqref{eq:model0}, the VIF is defined as the ratio between the variance of the estimator in this model, \(var \left( \widehat{\beta}_{i} \right)\), and the variance of the estimator of a hypothetical reference model, that is, a hypothetical model in which orthogonality among the independent variables is assumed, \(var \left( \widehat{\beta}_{i,o} \right)\). This is to say: + +\begin{equation}\small{ + var \left( \widehat{\beta}_{i} \right) = \frac{\sigma^{2}}{n \cdot var(\mathbf{X}_{i})} \cdot \frac{1}{1 - R_{i}^{2}} = var \left( \widehat{\beta}_{i,o} \right) \cdot VIF(i), \quad i=2,\dots,k, + \label{eq:vari-VIF}} +\end{equation}\\ +\begin{equation} + \frac{ + var \left( \widehat{\beta}_{i} \right) + }{ + var \left( \widehat{\beta}_{i,o} \right) + } = VIF(i), \quad i=2,\dots,k, + \label{eq:vari-VIF2} +\end{equation} +where \(\mathbf{X}_{i}\) is the independent variable \(i\) of the model \eqref{eq:model0} and \(R^{2}_{i}\) the coefficient of determination of the following auxiliary regression: +\begin{equation} + \mathbf{X}_{i} = \mathbf{X}_{-i} \cdot \boldsymbol{\alpha} + \mathbf{v}, + \label{model_aux} \nonumber +\end{equation} +where \(\mathbf{X}_{-i}\) is the result of eliminating \(\mathbf{X}_{i}\) from the matrix \(\mathbf{X}\). + +As observed in the expression \eqref{eq:vari-VIF}, a high VIF leads to a high variance. 
Then, since the experimental value for the individual significance test is given by: +\begin{equation} + t_{i} = \left| \frac{\widehat{\beta}_{i}}{\sqrt{\frac{\widehat{\sigma}^{2}}{n \cdot var(\mathbf{X}_{i})} \cdot VIF(i)}} \right|, \quad i=2,\dots,k, + \label{eq:texp-orig} +\end{equation} +a high VIF will lead to a low experimental statistic (\(t_{i}\)), inducing a tendency not to reject the null hypothesis, i.e.~the experimental statistic will be lower than the theoretical statistic (given by \(t_{n-k}(1-\alpha/2)\), where \(\alpha\) is the significance level). + +However, this statement relies on several simplifications. By following \citet{OBrien}, and as can be easily observed in the expression \eqref{eq:texp-orig}, other factors, such as the estimation of the random disturbance and the size of the sample, can counterbalance the high value of the VIF so that the experimental statistic is not necessarily low. That is to say, it is possible to obtain VIF values greater than 10 (the threshold traditionally established as troubling, see \citet{Marquardt1970} for example) that do not necessarily imply high estimated variance on account of a large sample size or a low value for the estimated variance of the random disturbance. This explains, as noted in the introduction, why not all models with a high value for the VIF present effects on the statistical analysis of the model. + +\begin{quote} +Example 1. +Thus, for example, \citet{Garciaetal2019b} considered an extension of the interest rate model presented by \citet{Wooldrigde2013}, where \(k=3\), in which all the independent variables have associated coefficients significantly different from zero, presenting a VIF equal to 71.516, much higher than the threshold normally established as worrying. In other words, in this case, a high VIF does not mean that the individual significance tests are affected. 
This situation is probably due to the fact that in this case 131 observations are available, i.e.~the expression \eqref{eq:texp-orig} can be expressed as: +\[t_{i} = \left| \frac{\widehat{\beta}_{i}}{\sqrt{\frac{\widehat{\sigma}^{2}}{131 \cdot var(\mathbf{X}_{i})} \cdot 71.516}} \right| += \left| \frac{\widehat{\beta}_{i}}{\sqrt{0.546 \cdot \frac{\widehat{\sigma}^{2}}{var(\mathbf{X}_{i})}}} \right|, \quad i=2,3.\] +Note that in this case a high value of \(n\) compensates for the high value of VIF. In addition, the value of \(n\) will also cause \(\widehat{\sigma}^{2}\) to decrease, since \(\widehat{\sigma}^{2} = \frac{\mathbf{e}^{t}\mathbf{e}}{n-k}\), where \(\mathbf{e}\) are the residuals of the original model \eqref{eq:model0}. +\end{quote} + +\begin{quote} +The Subsection \hyperref[effect-sample-size]{Effect of sample size..} provides an example that illustrates in more detail the effect of sample size on the statistical analysis of the model. \hfill \(\lozenge\) +\end{quote} + +On the other hand, considering the hypothetical orthogonal model, the value of the experimental statistic of the individual significance test, whose null hypothesis is \(\beta_{i} = 0\) in face of the alternative hypothesis \(\beta_{i} \not= 0\) with \(i=2,\dots,k\), is given by: +\begin{equation} + t_{i}^{o} = \left| \frac{\widehat{\beta}_{i}}{\sqrt{\frac{\widehat{\sigma}^{2}}{n \cdot var(\mathbf{X}_{i})}}} \right|, \quad i=2,\dots,k, + \label{eq:texp-orto-1} +\end{equation} +where the estimated variance of the estimator has been diminished due to the VIF always being greater than or equal to 1, and consequently, \(t_{i}^{o} \geq t_{i}\). However, it has been assumed that the same estimates for the independent variable coefficients and random disturbance variance are obtained in the orthogonal and original models, which does not seem to be a plausible supposition (see \citet{Salmeron2024a} Section 2.1 for more details). 
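The mechanics of expressions \eqref{eq:vari-VIF} and \eqref{eq:texp-orig} can be sketched numerically. The following minimal example computes \(VIF(i) = 1/(1-R_{i}^{2})\) from its auxiliary regression. It is written in plain, dependency-free Python purely for self-containedness; it is not the implementation of the \CRANpkg{rvif} package (whose user-facing functions are `rvifs()` and `multicollinearity()`), and the helper names and toy data are invented for illustration.

```python
# Illustrative sketch only -- not the rvif package's implementation.
# VIF(i) = 1 / (1 - R_i^2), where R_i^2 is the coefficient of determination
# of the auxiliary OLS regression of column i of X on the remaining columns
# (the intercept column is kept among the regressors).

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting (small systems)."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def r_squared(y, X):
    """Coefficient of determination of the OLS regression of y on the columns of X."""
    k = len(X[0])
    XtX = [[sum(row[a] * row[b] for row in X) for b in range(k)] for a in range(k)]
    Xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(k)]
    beta = solve(XtX, Xty)
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
    ybar = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - ybar) ** 2 for yi in y)
    return 1.0 - sse / sst

def vif(X, i):
    """VIF of column i (i >= 1; column 0 is the intercept, which has no VIF)."""
    y_aux = [row[i] for row in X]
    X_aux = [[row[j] for j in range(len(row)) if j != i] for row in X]
    return 1.0 / (1.0 - r_squared(y_aux, X_aux))

# Invented toy model: column 2 is almost an exact multiple of column 1.
x2 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x3 = [2.1, 3.9, 6.2, 8.0, 9.9, 12.1]        # roughly 2 * x2
X_collinear = [[1.0, a, b] for a, b in zip(x2, x3)]
print(vif(X_collinear, 2))                   # far above the usual threshold of 10

x3_free = [5.0, -3.0, 2.0, 7.0, 1.0, -4.0]   # unrelated to x2
X_free = [[1.0, a, b] for a, b in zip(x2, x3_free)]
print(vif(X_free, 2))                        # close to 1
```

In the first data set the auxiliary \(R_{i}^{2}\) is close to one and the VIF explodes; in the second it stays near its minimum of 1, which is the situation the paper contrasts with the effect of \(n\) and \(\widehat{\sigma}^{2}\) in \eqref{eq:texp-orig}.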
+ +\subsection{An orthonormal reference model}\label{sub-above} + +In \citet{Salmeron2024a} the following QR decomposition of the matrix \(\mathbf{X}_{n \times k}\) of the model \eqref{eq:model0} is proposed: \(\mathbf{X} = \mathbf{X}_{o} \cdot \mathbf{P}\), where \(\mathbf{X}_{o}\) is an orthonormal matrix of the same dimensions as \(\mathbf{X}\) and \(\mathbf{P}\) is a higher-order triangular matrix of dimensions \(k \times k\). Then, the following hypothetical orthonormal reference model: +\begin{equation} + \mathbf{y} = \mathbf{X}_{o} \cdot \boldsymbol{\beta}_{o} + \mathbf{w}, + \label{eq:model-ref} +\end{equation} +verifies that: +\[\widehat{\boldsymbol{\beta}} = \mathbf{P}^{-1} \cdot \widehat{\boldsymbol{\beta}}_{o}, \ + \mathbf{e} = \mathbf{e}_{o}, \ + var \left( \widehat{\boldsymbol{\beta}}_{o} \right) = \sigma^{2} \cdot \mathbf{I},\] +where \(\mathbf{e}_{o}\) are the residuals of the orthonormal reference model \eqref{eq:model-ref}. +Note that since \(\mathbf{e} = \mathbf{e}_{o}\), the estimate of \(\sigma^{2}\) is the same in the original model \eqref{eq:model0} and in the orthonormal reference model \eqref{eq:model-ref}. +Moreover, since the dependent variable is the same in both models, the coefficient of determination and the experimental value of the global significance test are the same in both cases. + +From these values, taking into account the expressions \eqref{eq:vari-VIF} and \eqref{eq:vari-VIF2}, it is evident that the ratio between the variance of the estimator in the original model \eqref{eq:model0} and the variance of the estimator of the orthonormal reference model \eqref{eq:model-ref} is: +\begin{equation} + \frac{ + var \left( \widehat{\beta}_{i} \right) + }{ + var \left( \widehat{\beta}_{i,o} \right) + } = \frac{VIF(i)}{n \cdot var(\mathbf{X}_{i})}, \quad i=2,\dots,k. 
+ \label{eq:redef-VIF} \nonumber +\end{equation} +Consequently, \citet{Salmeron2024a} defined the redefined VIF (RVIF) for \(i=1,\dots,k\) as: +\begin{equation}\small{ + RVIF(i) = \frac{VIF(i)}{n \cdot var(\mathbf{X}_{i})} = \frac{\mathbf{X}_{i}^{t} \mathbf{X}_{i}}{\mathbf{X}_{i}^{t} \mathbf{X}_{i} - \mathbf{X}_{i}^{t} \mathbf{X}_{-i} \cdot \left( \mathbf{X}_{-i}^{t} \mathbf{X}_{-i} \right)^{-1} \cdot \mathbf{X}_{-i}^{t} \mathbf{X}_{i}}, \label{eq:RVIF}} +\end{equation} +which shows, among other things, that it is defined for \(i=1,2,\dots,k\). That is, in contrast to the VIF, the RVIF can be calculated for the intercept of the linear regression model. + +Other considerations to be taken into account are the following: + +\begin{itemize} +\item + If the data are expressed in unit length, the same transformation used to calculate the Condition Number (CN), then: + \[RVIF(i) = \frac{1}{1 - \mathbf{X}_{i}^{t} \mathbf{X}_{-i} \cdot \left( \mathbf{X}_{-i}^{t} \mathbf{X}_{-i} \right)^{-1} \cdot \mathbf{X}_{-i}^{t} \mathbf{X}_{i}}, \quad i=1,\dots,k.\] +\item + In this case (data expressed in unit length), when \(\mathbf{X}_{i}\) is orthogonal to \(\mathbf{X}_{-i}\), it is verified that \(\mathbf{X}_{i}^{t} \mathbf{X}_{-i} = \mathbf{0}\) and, consequently, \(RVIF(i) = 1\) for \(i=1,\dots,k\). That is, the RVIF is always greater than or equal to 1 and its minimum value is indicative of the absence of multicollinearity. +\item + Denoting \(a_{i}= \mathbf{X}_{i}^{t} \mathbf{X}_{-i} \cdot \left( \mathbf{X}_{-i}^{t} \mathbf{X}_{-i} \right)^{-1} \cdot \mathbf{X}_{-i}^{t} \mathbf{X}_{i}\), it is verified that \(RVIF(i) = \frac{1}{1-a_{i}}\), where \(a_{i}\) can be interpreted as the percentage of approximate multicollinearity due to variable \(\mathbf{X}_{i}\). Note the similarity of this expression to that of the VIF: \(VIF(i) = \frac{1}{1-R_{i}^{2}}\) (see equation \eqref{eq:vari-VIF}). 
+\item + Finally, from a simulation for \(k=3\), \citet{Salmeron2024a} show that if \(a_{i} > 0.826\), then the degree of multicollinearity is worrying. In any case, this value should be refined by considering higher values of \(k\). +\end{itemize} + +On the other hand, given the orthonormal reference model \eqref{eq:model-ref}, the value for the experimental statistic of the individual significance test with the null hypothesis \(\beta_{i,o} = 0\) (against the alternative hypothesis \(\beta_{i,o} \not= 0\), for \(i=1,\dots,k\)) is: +\begin{equation} + t_{i}^{o} = \left| \frac{\widehat{\beta}_{i,o}}{\widehat{\sigma}} \right| = \left| \frac{\mathbf{p}_{i} \cdot \widehat{\boldsymbol{\beta}}}{\widehat{\sigma}} \right|, + \label{eq:texp-orto-2} +\end{equation} +where \(\mathbf{p}_{i}\) is the \(i\)-th row of the matrix \(\mathbf{P}\). + +By comparing this expression with the one given in \eqref{eq:texp-orto-1}, it is observed that, as expected, not only the denominator but also the numerator has changed. +Thus, in addition to the VIF, the rest of the elements in expression \eqref{eq:texp-orig} have also changed. +Consequently, if the null hypothesis is rejected in the original model, it is not assured that the same will occur in the orthonormal reference model. For this reason, it is possible to consider that the orthonormal model proposed as the reference model in \citet{Salmeron2024a} is more plausible than the one traditionally applied. 
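The computation of expression \eqref{eq:RVIF} can be sketched as follows. This is again a plain-Python illustration with invented data, not the \CRANpkg{rvif} implementation: \(RVIF(i)\) is the squared norm of \(\mathbf{X}_{i}\) divided by the squared norm of its residual after projecting onto the span of the remaining columns, so, unlike the VIF, it is also defined for the intercept (index 0 below) and is invariant to rescaling the columns, which is why the unit-length transformation leaves it unchanged.

```python
# Illustrative sketch only -- not the rvif package's implementation.
# RVIF(i) = X_i'X_i / (X_i'X_i - X_i'X_{-i} (X_{-i}'X_{-i})^{-1} X_{-i}'X_i).
# Data are invented for the example.

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting (small systems)."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def rvif(X, i):
    """RVIF of column i; i = 0 (the intercept) is allowed, unlike for the VIF."""
    k = len(X[0])
    xi = [row[i] for row in X]
    Xm = [[row[j] for j in range(k) if j != i] for row in X]   # X_{-i}
    xii = sum(v * v for v in xi)                               # X_i' X_i
    km = k - 1
    G = [[sum(row[a] * row[b] for row in Xm) for b in range(km)] for a in range(km)]
    g = [sum(row[a] * v for row, v in zip(Xm, xi)) for a in range(km)]
    alpha = solve(G, g)                    # (X_{-i}'X_{-i})^{-1} X_{-i}' X_i
    proj = sum(a * t for a, t in zip(alpha, g))   # squared norm of the projection
    return xii / (xii - proj)

# Invented toy data with a near-exact linear dependence between columns 1 and 2.
x2 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x3 = [2.1, 3.9, 6.2, 8.0, 9.9, 12.1]   # roughly 2 * x2
X = [[1.0, a, b] for a, b in zip(x2, x3)]
print(rvif(X, 0), rvif(X, 2))          # intercept RVIF and the (huge) collinear-column RVIF
```

Both values are at least 1, and the collinear column yields a very large RVIF, mirroring \(RVIF(i) = 1/(1-a_{i})\) as \(a_{i}\) approaches one.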
+ +\subsection{Possible scenarios in the individual significance tests}\label{possible-scenarios-in-the-individual-significance-tests} + +To determine whether the tendency not to reject the null hypothesis in the individual significance test is caused by a troubling approximate multicollinearity that inflates the variance of the estimator, or whether it is caused by variables not being statistically significantly related, the following situations are distinguished with a significance level \(\alpha\): + +\begin{enumerate} +\def\labelenumi{\alph{enumi}.} +\tightlist +\item + If the null hypothesis is initially rejected in the original model \eqref{eq:model0}, \(t_{i} > t_{n-k}(1-\alpha/2)\), the following results can be obtained for the orthonormal model: +\end{enumerate} + +a.1. the null hypothesis is rejected, \(t_{i}^{o} > t_{n-k}(1-\alpha/2)\); then, the results are consistent. + +a.2. the null hypothesis is not rejected, \(t_{i}^{o} < t_{n-k}(1-\alpha/2)\); this could be an inconsistency. + +\begin{enumerate} +\def\labelenumi{\alph{enumi}.} +\setcounter{enumi}{1} +\tightlist +\item + If the null hypothesis is not initially rejected in the original model \eqref{eq:model0}, \(t_{i} < t_{n-k}(1-\alpha/2)\), the following results may occur for the orthonormal model: +\end{enumerate} + +b.1. the null hypothesis is rejected, \(t_{i}^{o} > t_{n-k}(1-\alpha/2)\); then, it is possible to conclude that the degree of multicollinearity affects the statistical analysis of the model, causing the null hypothesis not to be rejected in the original model. + +b.2. the null hypothesis is also not rejected, \(t_{i}^{o} < t_{n-k}(1-\alpha/2)\); then, the results are consistent. + +In conclusion, when option b.1 is given, the null hypothesis of the individual significance test is not rejected when the linear relationships are considered (original model) but is rejected when the linear relationships are not considered (orthonormal model). 
Consequently, it is possible to conclude that the linear relationships affect the statistical analysis of the model. The possible inconsistency discussed in option a.2 is analyzed in detail in Appendix \hyperref[inconsistency]{Inconsistency}, concluding that it will rarely occur in cases where a high degree of multicollinearity is assumed. The other two scenarios provide consistent situations. + +\section{A first attempt to obtain a non-rejection region associated with a statistical test to detect multicollinearity}\label{modelo-orto} + +\subsection{From the traditional orthogonal model}\label{from-the-traditional-orthogonal-model} + +Considering the expressions \eqref{eq:texp-orig} and \eqref{eq:texp-orto-1}, it is verified that \(t_{i}^{o} = t_{i} \cdot \sqrt{VIF(i)}\). Consequently, in the orthogonal case, with a significance level \(\alpha\), the null hypothesis \(\beta_{i,o} = 0\) is rejected if \(t_{i}^{o} > t_{n-k}(1-\alpha/2)\) for \(i=2,\dots,k.\) That is, if: +\begin{equation} + VIF(i) > \left( \frac{t_{n-k}(1-\alpha/2)}{t_{i}} \right)^{2} = c_{1}(i), \quad i=2,\dots,k. + \label{eq:cond-false} +\end{equation} +Thus, if the VIF associated with the variable \(i\) is greater than the upper bound \(c_{1}(i)\), then it can be concluded that the estimator of the coefficient of that variable is significantly different from zero in the hypothetical case where the variables are orthogonal. In addition, if the null hypothesis is not rejected in the initial model, the reason for the failure to reject could be due to the degree of multicollinearity that affects the statistical analysis of the model. + +Finally, note that since the interesting cases are those where the null hypothesis is not initially rejected, \(t_{i} < t_{n-k}(1-\alpha/2)\), the upper bound \(c_{1}(i)\) will always be greater than one. + +\begin{quote} +Example 2. 
+Table \ref{tab:WisseltableLATEX} shows a dataset (previously presented by \citet{Wissell}) with the following variables: outstanding mortgage debt (\(\mathbf{D}\), trillions of dollars), personal consumption (\(\mathbf{C}\), trillions of dollars), personal income (\(\mathbf{I}\), trillions of dollars) and outstanding consumer credit (\(\mathbf{CP}\), trillions of dollars) for the years 1996 to 2012. +\end{quote} + +\begin{table} +\centering +\caption{\label{tab:WisseltableLATEX}Data set presented previously by Wissell} +\centering +\resizebox{\ifdim\width>\linewidth\linewidth\else\width\fi}{!}{ +\begin{tabular}[t]{ccccc} +\toprule +t & D & C & I & CP\\ +\midrule +1996 & 3.805 & 4.770 & 4.879 & 808.23\\ +1997 & 3.946 & 4.778 & 5.051 & 798.03\\ +1998 & 4.058 & 4.935 & 5.362 & 806.12\\ +1999 & 4.191 & 5.100 & 5.559 & 865.65\\ +2000 & 4.359 & 5.291 & 5.843 & 997.30\\ +\addlinespace +2001 & 4.545 & 5.434 & 6.152 & 1140.70\\ +2002 & 4.815 & 5.619 & 6.521 & 1253.40\\ +2003 & 5.129 & 5.832 & 6.915 & 1324.80\\ +2004 & 5.615 & 6.126 & 7.423 & 1420.50\\ +2005 & 6.225 & 6.439 & 7.802 & 1532.10\\ +\addlinespace +2006 & 6.786 & 6.739 & 8.430 & 1717.50\\ +2007 & 7.494 & 6.910 & 8.724 & 1867.20\\ +2008 & 8.399 & 7.099 & 8.882 & 1974.10\\ +2009 & 9.395 & 7.295 & 9.164 & 2078.00\\ +2010 & 10.680 & 7.561 & 9.727 & 2191.30\\ +\addlinespace +2011 & 12.071 & 7.804 & 10.301 & 2284.90\\ +2012 & 13.448 & 8.044 & 10.983 & 2387.50\\ +\bottomrule +\end{tabular}} +\end{table} + +\begin{quote} +Table \ref{tab:Wissel0tableLATEX} shows the OLS estimation of the model explaining the outstanding mortgage debt as a function of the rest of the variables. 
That is: +\[\mathbf{D} = \beta_{1} + \beta_{2} \cdot \mathbf{C} + \beta_{3} \cdot \mathbf{I} + \beta_{4} \cdot \mathbf{CP} + \mathbf{u}.\] +Note that the estimates for the coefficients of personal consumption, personal income and outstanding consumer credit are not significantly different from zero (a significance level of 5\% is considered throughout the paper), while the model is considered to be globally valid (experimental value, F exp., higher than theoretical value). +\end{quote} + +\begin{table} +\centering +\caption{\label{tab:Wissel0tableLATEX}OLS estimation for the Wissel model} +\centering +\resizebox{\ifdim\width>\linewidth\linewidth\else\width\fi}{!}{ +\begin{tabular}[t]{lcccc} +\toprule + & Estimator & Standard Error & Experimental t & p-value\\ +\midrule +Intercept & 5.469 & 13.017 & 0.420 & 0.681\\ +Personal consumption & -4.252 & 5.135 & -0.828 & 0.422\\ +Personal income & 3.120 & 2.036 & 1.533 & 0.149\\ +Outstanding consumer credit & 0.003 & 0.006 & 0.500 & 0.626\\ +(Obs, Sigma Est., Coef. Det., F exp.) & 17.000 & 0.870 & 0.923 & 52.305\\ +\bottomrule +\end{tabular}} +\end{table} + +\begin{quote} +In addition, the estimated coefficient for the variable personal consumption, which is not significantly different from zero, has the opposite sign to the simple correlation coefficient between this variable and outstanding mortgage debt, 0.953. +Thus, in the simple linear regression between both variables (see Table \ref{tab:Wissel1tableLATEX}), the estimated coefficient of the variable personal consumption is positive and significantly different from zero. However, adding a second variable (see Tables \ref{tab:Wissel2tableLATEX} and \ref{tab:Wissel3tableLATEX}) none of the coefficients are individually significantly different from zero although both models are globally significant. +This is traditionally understood as a symptom of statistically troubling multicollinearity. 
+\end{quote} + +\begin{table} +\centering +\caption{\label{tab:Wissel1tableLATEX}OLS estimation for part of the Wissel model} +\centering +\resizebox{\ifdim\width>\linewidth\linewidth\else\width\fi}{!}{ +\begin{tabular}[t]{lcccc} +\toprule + & Estimator & Standard Error & Experimental t & p-value\\ +\midrule +Intercept & -9.594 & 1.351 & -7.102 & 0.000\\ +Personal consumption & 2.629 & 0.214 & 12.285 & 0.000\\ +(Obs, Sigma Est., Coef. Det., F exp.) & 17.000 & 0.890 & 0.910 & 150.925\\ +\bottomrule +\end{tabular}} +\end{table} + +\begin{table} +\centering +\caption{\label{tab:Wissel2tableLATEX}OLS estimation for part of the Wissel model} +\centering +\resizebox{\ifdim\width>\linewidth\linewidth\else\width\fi}{!}{ +\begin{tabular}[t]{lcccc} +\toprule + & Estimator & Standard Error & Experimental t & p-value\\ +\midrule +Intercept & -0.117 & 6.476 & -0.018 & 0.986\\ +Personal consumption & -2.343 & 3.335 & -0.703 & 0.494\\ +Personal income & 2.856 & 1.912 & 1.494 & 0.158\\ +(Obs, Sigma Est., Coef. Det., F exp.) & 17.000 & 0.823 & 0.922 & 82.770\\ +\bottomrule +\end{tabular}} +\end{table} + +\begin{table} +\centering +\caption{\label{tab:Wissel3tableLATEX}OLS estimation for part of the Wissel model} +\centering +\resizebox{\ifdim\width>\linewidth\linewidth\else\width\fi}{!}{ +\begin{tabular}[t]{lcccc} +\toprule + & Estimator & Standard Error & Experimental t & p-value\\ +\midrule +Intercept & -8.640 & 9.638 & -0.896 & 0.385\\ +Personal consumption & 2.335 & 2.943 & 0.793 & 0.441\\ +Outstanding consumer credit & 0.001 & 0.006 & 0.100 & 0.922\\ +(Obs, Sigma Est., Coef. Det., F exp.) & 17.000 & 0.953 & 0.910 & 70.487\\ +\bottomrule +\end{tabular}} +\end{table} + +\begin{quote} +By using expression \eqref{eq:cond-false} in order to confirm this problem, it is verified that \(c_{1}(2) = 6.807\), \(c_{1}(3) = 1.985\) and \(c_{1}(4) = 18.743\), taking into account that \(t_{13}(0.975) = 2.160\). 
Since the VIFs are equal to 589.754, 281.886 and 189.487, respectively, it is concluded that the individual significance tests for the three cases are affected by the degree of multicollinearity existing in the model. \hfill \(\lozenge\) +\end{quote} + +\subsection{From the alternative orthonormal model \eqref{eq:model-ref}}\label{from-the-alternative-orthonormal-model-refeqmodel-ref} + +In the Subsection \hyperref[sub-above]{An orthonormal reference model} the individual significance test from the expression \eqref{eq:texp-orto-2} is redefined. Thus, the null hypothesis \(\beta_{i,o}=0\) will be rejected, with a significance level \(\alpha\), if the following condition is verified: +\[t_{i}^{o} > t_{n-k}(1-\alpha/2), \quad i=2,\dots,k.\] +Taking into account the expressions \eqref{eq:texp-orig} and \eqref{eq:texp-orto-2}, this is equivalent to: +\begin{equation}\small{ + VIF(i) > \left( \frac{t_{n-k}(1-\alpha/2)}{\widehat{\beta}_{i,o}} \right)^{2} \cdot \widehat{var} \left( \widehat{\beta}_{i} \right) \cdot n \cdot var(\mathbf{X}_{i}) = c_{2}(i). \label{eq:cota-VIF-orto}} +\end{equation} + +Thus, if the \(VIF(i)\) is greater than \(c_{2}(i)\), the null hypothesis is rejected in the respective individual significance tests in the orthonormal model (with \(i=2,\dots,k\)). Then, if the null hypothesis is not rejected in the original model and it is verified that \(VIF(i) > c_{2}(i)\), it can be concluded that the multicollinearity existing in the model affects its statistical analysis. In summary, a lower bound for the VIF is established to indicate when the approximate multicollinearity is troubling in a way that can be reinterpreted and presented as a region of non-rejection of a statistical test. + +\begin{quote} +Example 3. +Continuing with the dataset presented by \citet{Wissell}, Table \ref{tab:WisselORTOtableLATEX} shows the results of the OLS estimation of the orthonormal model obtained from the original model. 
+\end{quote} + +\begin{table} +\centering +\caption{\label{tab:WisselORTOtableLATEX}OLS estimation for the orthonormal Wissel model} +\centering +\resizebox{\ifdim\width>\linewidth\linewidth\else\width\fi}{!}{ +\begin{tabular}[t]{lcccc} +\toprule + & Estimator & Standard Error & Experimental t & p-value\\ +\midrule +Intercept & -27.882 & 0.932 & -29.901 & 0.000\\ +Personal consumption & 11.592 & 0.932 & 12.432 & 0.000\\ +Personal income & -1.355 & 0.932 & -1.453 & 0.170\\ +Outstanding consumer credit & 0.466 & 0.932 & 0.500 & 0.626\\ +(Obs, Sigma Est., Coef. Det., F exp.) & 17.000 & 0.870 & 0.923 & 52.305\\ +\bottomrule +\end{tabular}} +\end{table} + +\begin{quote} +When these results are compared with those in Table \ref{tab:Wissel0tableLATEX}, the following conclusions can be obtained: + +\begin{itemize} +\item + Except for the outstanding consumer credit variable, whose standard deviation has increased, the standard deviation has decreased in all cases. +\item + The absolute values of the experimental statistics of the individual significance tests associated with the intercept and the personal consumption variable have increased, while the experimental statistic of the personal income variable has decreased, and the experimental statistic of the outstanding consumer credit variable remains the same. These facts show that the change from the original model to the orthonormal model does not guarantee an increase in the absolute value of the experimental statistic. +\item + The estimation of the coefficient of the personal consumption variable is not significantly different from zero in the original model, but it is in the orthogonal model. Thus, it is concluded that multicollinearity affects the statistical analysis of the model. 
Note that there is also a change in the sign of the estimate, although the purpose of the orthogonal model is not to obtain estimates for the coefficients, but rather to provide a reference point against which to measure how much the variances are inflated. Note that an orthonormal model is an idealized construction that may lack a proper interpretation in practice. +\item + The values corresponding to the estimated variance for the random disturbance, the coefficient of determination and the experimental statistic (F exp.) for the global significance test remain the same. +\end{itemize} +\end{quote} + +\begin{quote} +On the other hand, considering the VIF of the independent variables except for the intercept (589.754, 281.886 and 189.487) and their corresponding bounds (17.809, 623.127 and 3545.167) obtained from the expression \eqref{eq:cota-VIF-orto}, only the variable of personal consumption verifies that the VIF is higher than the corresponding bound. These results are different from those obtained in Example 2, where the traditional orthogonal model was taken as a reference. +\end{quote} + +\begin{quote} +Finally, Tables \ref{tab:Wissel0tableLATEX} and \ref{tab:WisselORTOtableLATEX} show that the experimental values of the statistic \(t\) of the variable outstanding consumer credit are the same in the original and orthonormal models. \hfill \(\lozenge\) +\end{quote} + +The last fact highlighted at the end of the previous example is not a coincidence, but a consequence of the QR decomposition; see Appendix \hyperref[apendix]{Test of\ldots{}}. Therefore, in this case, the conclusion of the individual significance test will be the same in the original and in the orthonormal model, i.e.~we will always be in scenarios a.1 or b.2. + +Thus, this behavior makes it necessary to choose which variable is placed in the last position. 
Some criteria to select the most appropriate variable for this placement could be: + +\begin{itemize} +\item + To fix the variable that is considered least relevant to the model. +\item + To fix a variable whose associated coefficient is significantly different from zero, since this case would not be of interest for the definition of multicollinearity given in the paper. Note that the interest will be related to a coefficient considered as zero in the original model and significantly different from zero in the orthonormal one. +\end{itemize} + +These options are explored in the Subsection \hyperref[how-to-fix]{Choice of the variable to be fixed\ldots{}}. + +\section{A non-rejection region associated with a statistical test to detect multicollinearity}\label{new-VIF-orto} + +\citet{Salmeron2024a} show that high values of RVIF are associated with a high degree of multicollinearity. The question, however, is how high RVIFs have to be to reflect troubling multicollinearity. + +Taking into account the expressions \eqref{eq:RVIF} and \eqref{eq:cota-VIF-orto}, it is possible to conclude that multicollinearity is affecting the statistical analysis of the model if it can be verified that: +\begin{equation} + RVIF(i) > \left( \frac{t_{n-k}(1-\alpha/2)}{\widehat{\beta}_{i,o}} \right)^{2} \cdot \widehat{var} \left( \widehat{\beta}_{i} \right) = c_{3}(i), + \label{eq:cota-VIFR} +\end{equation} +for any \(i=1,\dots,k\). Note that the intercept is included in this proposal, in contrast to the previous section, in which it was not included. + +By following \citet{OBrien} and taking into account that the estimation of the expression \eqref{eq:vari-VIF} can be expressed as: +\[\widehat{var} \left( \widehat{\beta}_{i} \right) = \widehat{\sigma}^{2} \cdot RVIF(i) = \frac{\mathbf{e}^{t}\mathbf{e}}{n-k} \cdot RVIF(i),\] +there are other factors that counterbalance a high value of RVIF, thereby avoiding high estimated variances for the estimated coefficients. 
These factors are the sum of the squared residuals (SSR \(= \mathbf{e}^{t}\mathbf{e}\)) of the model \eqref{eq:model0} and \(n\). Thus, an appropriate specification of the econometric model (i.e., one that implies a good fit and, consequently, a small SSR) and a large sample size can compensate for high RVIF values.
+However, contrary to what happens for the VIF in the traditional case, these factors are taken into account in the threshold \(c_{3}(i)\) established in expression \eqref{eq:cota-VIFR} through \(\widehat{var} \left( \widehat{\beta}_{i} \right)\).
+
+\begin{quote}
+Example 4.
+This contribution can be illustrated with the data set previously presented by \citet{KleinGoldberger}, which includes variables for consumption, \(\mathbf{C}\), wage incomes, \(\mathbf{I}\), non-farm incomes, \(\mathbf{InA}\), and farm incomes, \(\mathbf{IA}\), in the United States from 1936 to 1952, as shown in Table \ref{tab:KGtableLATEX} (data from 1942 to 1944 are not available because they were war years).
+\end{quote}
+
+\begin{table}
+\centering
+\caption{\label{tab:KGtableLATEX}Data set presented previously by Klein and Goldberger}
+\centering
+\resizebox{\ifdim\width>\linewidth\linewidth\else\width\fi}{!}{
+\begin{tabular}[t]{cccc}
+\toprule
+Consumption & Wage income & Non-farm income & Farm income\\
+\midrule
+62.8 & 43.41 & 17.10 & 3.96\\
+65.0 & 46.44 & 18.65 & 5.48\\
+63.9 & 44.35 & 17.09 & 4.37\\
+67.5 & 47.82 & 19.28 & 4.51\\
+71.3 & 51.02 & 23.24 & 4.88\\
+\addlinespace
+76.6 & 58.71 & 28.11 & 6.37\\
+86.3 & 87.69 & 30.29 & 8.96\\
+95.7 & 76.73 & 28.26 & 9.76\\
+98.3 & 75.91 & 27.91 & 9.31\\
+100.3 & 77.62 & 32.30 & 9.85\\
+\addlinespace
+103.2 & 78.01 & 31.39 & 7.21\\
+108.9 & 83.57 & 35.61 & 7.39\\
+108.5 & 90.59 & 37.58 & 7.98\\
+111.4 & 95.47 & 35.17 & 7.42\\
+\bottomrule
+\end{tabular}}
+\end{table}
+
+\begin{quote}
+Table \ref{tab:regKGtableLATEX} shows the OLS estimations of the model explaining consumption as a function of the rest of the variables.
Note that there is some inconsistency between the individual significance values of the variables and the global significance of the model.
+\end{quote}
+
+\begin{table}
+\centering
+\caption{\label{tab:regKGtableLATEX}OLS estimation for the Klein and Goldberger model}
+\centering
+\resizebox{\ifdim\width>\linewidth\linewidth\else\width\fi}{!}{
+\begin{tabular}[t]{lcccc}
+\toprule
+ & Estimator & Standard Error & Experimental t & p-value\\
+\midrule
+Intercept & 18.702 & 6.845 & 2.732 & 0.021\\
+Wage income & 0.380 & 0.312 & 1.218 & 0.251\\
+Non-farm income & 1.419 & 0.720 & 1.969 & 0.077\\
+Farm income & 0.533 & 1.400 & 0.381 & 0.711\\
+(Obs, Sigma Est., Coef. Det., F exp.) & 14.000 & 36.725 & 0.919 & 37.678\\
+\bottomrule
+\end{tabular}}
+\end{table}
+
+\begin{quote}
+The RVIFs are calculated, yielding 1.275, 0.002, 0.014 and 0.053, respectively. The associated bounds, \(c_{3}(i)\), are also calculated, yielding 0.002, 0.0001, 0.018 and 1.826, respectively.
+\end{quote}
+
+\begin{quote}
+Since the coefficient of the wage income variable is not significantly different from zero, and because it is verified that \(0.002 > 0.0001\), from \eqref{eq:cota-VIFR} it is concluded that the degree of multicollinearity existing in the model is affecting its statistical analysis.
+\hfill \(\lozenge\)
+\end{quote}
+
+\subsection{From the RVIF}\label{TheTHEOREM}
+
+Considering that in the original model \eqref{eq:model0} the null hypothesis \(\beta_{i} = 0\) of the individual significance test is not rejected if:
+\[RVIF(i) > \left( \frac{\widehat{\beta}_{i}}{\widehat{\sigma} \cdot t_{n-k}(1-\alpha/2)} \right)^{2} = c_{0}(i), \quad i=1,\dots,k,\]
+while in the orthonormal model the null hypothesis is rejected if \(RVIF(i) > c_{3}(i)\), the following theorem can be established:
+
+\begin{quote}
+Theorem.
Given the multiple linear regression model \eqref{eq:model0}, the degree of multicollinearity affects its statistical analysis (with a level of significance of \(\alpha\%\)) if there is a variable \(i\), with \(i=1,\dots,k\), that verifies \(RVIF(i) > \max \{ c_{0}(i), c_{3}(i) \}\).
+\end{quote}
+
+Note that \citet{Salmeron2024a} indicate that the RVIF must be calculated with unit length data (as any other transformation removes the intercept from the analysis); however, for the correct application of this theorem, the original data must be used, since no transformation has been considered in this paper.
+
+\begin{quote}
+Example 5. Tables \ref{tab:theoremWISSELtableLATEX} and \ref{tab:theoremKGtableLATEX} present the results of applying the theorem to the \citet{Wissell} and \citet{KleinGoldberger} models, respectively. Note that in both cases, there is a variable \(i\) that verifies \(RVIF(i) > \max \{ c_{0}(i), c_{3}(i) \}\), and consequently, we can conclude that the degree of approximate multicollinearity is affecting the statistical analysis in both models (with a level of significance of \(5\%\)).
\hfill \(\lozenge\)
+\end{quote}
+
+\begin{table}
+\centering
+\caption{\label{tab:theoremWISSELtableLATEX}Theorem results of the Wissel model}
+\centering
+\resizebox{\ifdim\width>\linewidth\linewidth\else\width\fi}{!}{
+\begin{tabular}[t]{lccccc}
+\toprule
+ & RVIFs & c0 & c3 & Scenario & Affects\\
+\midrule
+Intercept & 194.866090 & 7.371069 & 1.017198 & b.1 & Yes\\
+Personal consumption & 30.326281 & 4.456018 & 0.915790 & b.1 & Yes\\
+Personal income & 4.765888 & 2.399341 & 10.535976 & b.2 & No\\
+Outstanding consumer credit & 0.000038 & 0.000002 & 0.000715 & b.2 & No\\
+\bottomrule
+\end{tabular}}
+\end{table}
+
+\begin{table}
+\centering
+\caption{\label{tab:theoremKGtableLATEX}Theorem results of the Klein and Goldberger model}
+\centering
+\resizebox{\ifdim\width>\linewidth\linewidth\else\width\fi}{!}{
+\begin{tabular}[t]{lccccc}
+\toprule
+ & RVIFs & c0 & c3 & Scenario & Affects\\
+\midrule
+Intercept & 1.275948 & 1.918383 & 0.002189 & a.1 & No\\
+Wage income & 0.002653 & 0.000793 & 0.000121 & b.1 & Yes\\
+Non-farm income & 0.014131 & 0.011037 & 0.018739 & b.2 & No\\
+Farm income & 0.053355 & 0.001558 & 1.826589 & b.2 & No\\
+\bottomrule
+\end{tabular}}
+\end{table}
+
+\section{The rvif package}\label{paqueteRVIF}
+
+The results developed in \citet{Salmeron2024a} and in this paper have been implemented in the \CRANpkg{rvif} package of R (\citet{R}). The following shows how to replicate the results presented in both papers using the commands \(\texttt{rvifs}\) and \(\texttt{multicollinearity}\) available in \CRANpkg{rvif}; the executed code is shown below.
+
+In addition, the following issues will be addressed:
+
+\begin{itemize}
+\item
+ Discussion on the effect of the sample size on detecting the influence of multicollinearity on the statistical analysis of the model.
+\item
+ Discussion on the choice of the variable to be fixed as the last one before the orthonormalization.
+
+\end{itemize}
+
+The code used in these two Subsections is available at \url{https://github.com/rnoremlas/RVIF/tree/main/rvif\%20package}.
+It is also interesting to consult the package vignette using the command \texttt{browseVignettes("rvif")}, as well as its web page with \texttt{browseURL(system.file("docs/index.html",\ package\ =\ "rvif"))} or \url{https://www.ugr.es/local/romansg/rvif/index.html}.
+
+\subsection{Detection of multicollinearity with RVIF: does the degree of multicollinearity affect the statistical analysis of the model?}\label{detection-of-multicollinearity-with-rvif-does-the-degree-of-multicollinearity-affect-the-statistical-analysis-of-the-model}
+
+In \citet{Salmeron2024a}, a series of examples is presented to illustrate the usefulness of the RVIF to detect the degree of approximate multicollinearity in a multiple linear regression model.
+The results presented by \citet{Salmeron2024a} will be reproduced using the command \(\texttt{rvifs}\) of the \CRANpkg{rvif} package and complemented with the contribution developed in the present work using the command \(\texttt{multicollinearity}\) of the same package.
+To facilitate the reading of the paper, this information is available in Appendix \hyperref[examplesRVIF]{Examples of\ldots{}}.
+
+On the other hand, the following shows how to use the above commands to obtain the results shown in Table 9 of this paper:
+
+\begin{verbatim}
+y_W = Wissel[,2]
+X_W = Wissel[,3:6]
+multicollinearity(y_W, X_W)
+\end{verbatim}
+
+\begin{verbatim}
+#> RVIFs c0 c3 Scenario Affects
+#> 1 1.948661e+02 7.371069e+00 1.017198e+00 b.1 Yes
+#> 2 3.032628e+01 4.456018e+00 9.157898e-01 b.1 Yes
+#> 3 4.765888e+00 2.399341e+00 1.053598e+01 b.2 No
+#> 4 3.821626e-05 2.042640e-06 7.149977e-04 b.2 No
+\end{verbatim}
+
+Note that the first two arguments of the \(\texttt{multicollinearity}\) command are, respectively, the dependent variable of the linear model and the design matrix containing the independent variables (intercept included as the first column).
+
+The results in Table 10 can be obtained using this code:
+
+\begin{verbatim}
+y_KG = KG[,1]
+cte = rep(1, length(y_KG))
+X_KG = cbind(cte, KG[,2:4])
+multicollinearity(y_KG, X_KG)
+\end{verbatim}
+
+\begin{verbatim}
+#> RVIFs c0 c3 Scenario Affects
+#> 1 1.275947615 1.9183829079 0.0021892653 a.1 No
+#> 2 0.002652862 0.0007931658 0.0001206694 b.1 Yes
+#> 3 0.014130621 0.0110372472 0.0187393601 b.2 No
+#> 4 0.053354814 0.0015584988 1.8265885762 b.2 No
+\end{verbatim}
+
+In both cases, it is concluded that the degree of multicollinearity in the model affects its statistical analysis.
+
+The \(\texttt{multicollinearity}\) command is used by default with a significance level of 5\% for the application of the Theorem set in Subsection \hyperref[TheTHEOREM]{From the RVIF}.
+Note that if the significance level is changed to 1\% (the third argument of the \(\texttt{multicollinearity}\) command), the individual significance test of the intercept in the Klein and Goldberger model is also affected by the degree of existing multicollinearity:
+
+\begin{verbatim}
+multicollinearity(y_W, X_W, alpha = 0.01)
+\end{verbatim}
+
+\begin{verbatim}
+#> RVIFs c0 c3 Scenario Affects
+#> 1 1.948661e+02 3.791375e+00 1.977602791 b.1 Yes
+#> 2 3.032628e+01 2.291992e+00 1.780449066 b.1 Yes
+#> 3 4.765888e+00 1.234122e+00 20.483705068 b.2 No
+#> 4 3.821626e-05 1.050650e-06 0.001390076 b.2 No
+\end{verbatim}
+
+\begin{verbatim}
+multicollinearity(y_KG, X_KG, alpha = 0.01)
+\end{verbatim}
+
+\begin{verbatim}
+#> RVIFs c0 c3 Scenario Affects
+#> 1 1.275947615 0.9482013897 0.0044292796 b.1 Yes
+#> 2 0.002652862 0.0003920390 0.0002441361 b.1 Yes
+#> 3 0.014130621 0.0054553932 0.0379131147 b.2 No
+#> 4 0.053354814 0.0007703211 3.6955190555 b.2 No
+\end{verbatim}
+
+It can be seen that the values of \(c_{0}\) and \(c_{3}\) change depending on the significance level used.
+
+\subsection{Effect of the sample size on the detection of the influence of multicollinearity on the statistical analysis of the model}\label{effect-sample-size}
+
+The introduction has highlighted that the measures traditionally used to detect whether the degree of multicollinearity is of concern may indicate that it is troubling while the statistical analysis of the model is not affected by it. Example 1 shows that this may be due, among other factors, to the size of the sample.
+
+To explore this issue in more detail, an example is given below in which traditional measures of multicollinearity detection indicate that the existing multicollinearity is troubling while, for a large sample size, the statistical analysis of the model is not affected.
In particular, observations are simulated for \(\mathbf{X} = [ \mathbf{1} \ \mathbf{X}_{2} \ \mathbf{X}_{3} \ \mathbf{X}_{4} \ \mathbf{X}_{5} \ \mathbf{X}_{6}]\) where:
+\[\mathbf{X}_{2} \sim N(5, 0.1^{2}), \quad \mathbf{X}_{3} \sim N(5, 10^{2}), \quad \mathbf{X}_{4} = \mathbf{X}_{3} + \mathbf{p},\]
+\[\mathbf{X}_{5} \sim N(-1, 3^{2}), \quad \mathbf{X}_{6} \sim N(15, 2.5^{2}),\]
+where \(\mathbf{p} \sim N(5, 0.5^2)\), considering three different sample sizes: \(n = 3000\) (Simulation 1), \(n = 100\) (Simulation 2) and \(n = 30\) (Simulation 3).
+In all cases the dependent variable is generated according to:
+\[\mathbf{y} = 4 + 5 \cdot \mathbf{X}_{2} - 9 \cdot \mathbf{X}_{3} - 2 \cdot \mathbf{X}_{4} + 2 \cdot \mathbf{X}_{5} + 7 \cdot \mathbf{X}_{6} + \mathbf{u},\]
+where \(\mathbf{u} \sim N(0, 2^2)\).
+
+To make the results reproducible, a seed has been set using the command \emph{set.seed(2024)}.
+
+This design makes the variable \(\mathbf{X}_{2}\) linearly related to the intercept, and \(\mathbf{X}_{3}\) linearly related to \(\mathbf{X}_{4}\). This is supported by the results shown in Table \ref{tab:traditionalSIMULATIONtableLATEX}, which have been obtained with the commands \(\texttt{CV}\), \(\texttt{VIF}\) and \(\texttt{CN}\) of the \CRANpkg{multiColl} package of R (\citet{R}).
+
+The results imply the same conclusions in all three simulations:
+
+\begin{itemize}
+\item
+ There is a worrying degree of non-essential multicollinearity in the model relating the intercept to the variable \(\mathbf{X}_{2}\), since its coefficient of variation (CV) is lower than 0.1002506.
+\item
+ There is a worrying degree of essential multicollinearity in the model relating the variables \(\mathbf{X}_{3}\) and \(\mathbf{X}_{4}\), since the associated Variance Inflation Factors (VIF) are greater than 10.
+\end{itemize} + +\begin{table} +\centering +\caption{\label{tab:traditionalSIMULATIONtableLATEX}CVs, VIFs and CN for data of Simulations 1, 2 and 3} +\centering +\resizebox{\ifdim\width>\linewidth\linewidth\else\width\fi}{!}{ +\begin{tabular}[t]{lccc} +\toprule + & Simulation 1 & Simulation 2 & Simulation 3\\ +\midrule +X2 CV & 0.020 & 0.019 & 0.025\\ +X3 CV & 2.010 & 1.827 & 3.326\\ +X4 CV & 1.004 & 0.968 & 1.434\\ +X5 CV & 3.138 & 1.948 & 2.413\\ +X6 CV & 0.167 & 0.176 & 0.194\\ +\addlinespace +X2 VIF & 1.003 & 1.053 & 1.167\\ +X3 VIF & 388.669 & 373.092 & 926.768\\ +X4 VIF & 388.696 & 373.280 & 929.916\\ +X5 VIF & 1.001 & 1.014 & 1.043\\ +X6 VIF & 1.003 & 1.066 & 1.254\\ +\addlinespace +CN & 148.247 & 162.707 & 123.025\\ +\bottomrule +\end{tabular}} +\end{table} + +However, does the degree of multicollinearity detected really affect the statistical analysis of the model? According to the results shown in Tables \ref{tab:theoremSIMULATION1tableLATEX} to \ref{tab:theoremSIMULATION3tableLATEX} this is not always the case: + +\begin{itemize} +\item + In Simulation 1, when \(n=3000\), the degree of multicollinearity in the model does not affect the statistical analysis of the model; scenario a.1 is always verified, i.e., both in the model proposed and in the orthonormal model, the null hypothesis is rejected in the individual significance tests. +\item + In Simulation 2, when \(n=100\), the degree of multicollinearity in the model affects the statistical analysis of the model only in the individual significance of the intercept; in all other cases scenario a.1 is verified again. + + \begin{itemize} + \tightlist + \item + As will be seen below, the fact that the individual significance of the variable \(\mathbf{X}_{2}\) is not affected may be due to the number of observations in the data set. But it may also be because multicollinearity of the nonessential type affects only the intercept estimate. 
Thus, for example, in \citet{Salmeron2019TAS} it is shown (see Table 2 of Example 2) that solving this type of approximate multicollinearity (by centering the variables that cause it) only modifies the estimate of the intercept and its standard deviation, with the estimates of the rest of the independent variables remaining unchanged. + \end{itemize} +\item + In Simulation 3, when \(n=30\), the degree of multicollinearity in the model affects the statistical analysis of the model in the individual significance of the intercept, in \(\mathbf{X}_{2}\) and in \(\mathbf{X}_{4}\). + + \begin{itemize} + \tightlist + \item + In this case, as discussed, the reduction in sample size does not prevent the individual significance of \(\mathbf{X}_{2}\) from being affected. + \end{itemize} +\end{itemize} + +In conclusion, as \citet{OBrien} indicates, it can be seen that the increase in sample size prevents the statistical analysis of the model from being affected by the degree of existing multicollinearity, even though the values of the measures traditionally used to detect this problem indicate that it is troubling. To reach this conclusion, the use of the RVIF proposed by \citet{Salmeron2024a} and the theorem developed in this paper is decisive. 
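+
+The data-generating process of the three simulations can be sketched as follows. This is a minimal sketch in base R under the distributional assumptions stated above; the exact order of the random draws is an assumption, so the values obtained may differ slightly from those reported in the tables:
+
+\begin{verbatim}
+set.seed(2024)
+n = 3000                                # n = 3000, 100 or 30 for Simulations 1, 2, 3
+X2 = rnorm(n, mean = 5, sd = 0.1)       # nearly constant: related to the intercept
+X3 = rnorm(n, mean = 5, sd = 10)
+X4 = X3 + rnorm(n, mean = 5, sd = 0.5)  # X4 linearly related to X3
+X5 = rnorm(n, mean = -1, sd = 3)
+X6 = rnorm(n, mean = 15, sd = 2.5)
+y = 4 + 5*X2 - 9*X3 - 2*X4 + 2*X5 + 7*X6 + rnorm(n, sd = 2)
+X = cbind(rep(1, n), X2, X3, X4, X5, X6)
+multicollinearity(y, X)                 # rvif package
+\end{verbatim}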
+ +\begin{table} +\centering +\caption{\label{tab:theoremSIMULATION1tableLATEX}Theorem results of the Simulation 1 model} +\centering +\resizebox{\ifdim\width>\linewidth\linewidth\else\width\fi}{!}{ +\begin{tabular}[t]{lccccc} +\toprule + & RVIFs & c0 & c3 & Scenario & Affects\\ +\midrule +Intercept & 0.934369 & 1.916912 & 0.000001 & a.1 & No\\ +X2 & 0.034899 & 1.359909 & 0.000168 & a.1 & No\\ +X3 & 0.001299 & 5.339519 & 0.000000 & a.1 & No\\ +X4 & 0.001296 & 0.230992 & 0.000004 & a.1 & No\\ +X5 & 0.000036 & 0.257015 & 0.000000 & a.1 & No\\ +\addlinespace +X6 & 0.000053 & 3.160352 & 0.000000 & a.1 & No\\ +\bottomrule +\end{tabular}} +\end{table} + +\begin{table} +\centering +\caption{\label{tab:theoremSIMULATION2tableLATEX}Theorem results of the Simulation 2 model} +\centering +\resizebox{\ifdim\width>\linewidth\linewidth\else\width\fi}{!}{ +\begin{tabular}[t]{lccccc} +\toprule + & RVIFs & c0 & c3 & Scenario & Affects\\ +\midrule +Intercept & 32.965272 & 0.228580 & 0.001580 & b.1 & Yes\\ +X2 & 1.179581 & 1.678248 & 0.014061 & a.1 & No\\ +X3 & 0.037287 & 5.662562 & 0.000001 & a.1 & No\\ +X4 & 0.036687 & 0.113376 & 0.000353 & a.1 & No\\ +X5 & 0.001269 & 0.252728 & 0.000006 & a.1 & No\\ +\addlinespace +X6 & 0.001601 & 3.060976 & 0.000001 & a.1 & No\\ +\bottomrule +\end{tabular}} +\end{table} + +\begin{table} +\centering +\caption{\label{tab:theoremSIMULATION3tableLATEX}Theorem results of the Simulation 3 model} +\centering +\resizebox{\ifdim\width>\linewidth\linewidth\else\width\fi}{!}{ +\begin{tabular}[t]{lccccc} +\toprule + & RVIFs & c0 & c3 & Scenario & Affects\\ +\midrule +Intercept & 70.990340 & 46.793605 & 0.008667 & b.1 & Yes\\ +X2 & 2.524792 & 0.000570 & 0.005083 & b.1 & Yes\\ +X3 & 0.187896 & 3.892727 & 0.000007 & a.1 & No\\ +X4 & 0.187317 & 0.168758 & 0.005113 & b.1 & Yes\\ +X5 & 0.003863 & 0.169923 & 0.000325 & a.1 & No\\ +\addlinespace +X6 & 0.005193 & 2.139108 & 0.000013 & a.1 & No\\ +\bottomrule +\end{tabular}} +\end{table} + +\subsection{Selection of 
the variable to be set as the last one before orthonormalization}\label{how-to-fix}
+
+Since there are as many QR decompositions as there are possible rearrangements of the independent variables, it is convenient to test different options to determine whether the degree of multicollinearity in the regression model affects its statistical analysis.
+
+A first possibility is to try all possible reorderings, considering that the intercept must always be in first place. Thus, in Example 2 of \citet{Salmeron2024a} (see Appendix \hyperref[examplesRVIF]{Examples of\ldots{}} for more details) it is considered that \(\mathbf{X} = [ \mathbf{1} \ \mathbf{K} \ \mathbf{W}]\) (see Table \ref{tab:theoremCHOICE1tableLATEX}), but it could also be considered that \(\mathbf{X} = [ \mathbf{1} \ \mathbf{W} \ \mathbf{K}]\) (see Table \ref{tab:theoremCHOICE2tableLATEX}).
+
+Note that in these tables the values of RVIF and \(c_{0}\) for each variable are always the same, but those of \(c_{3}\) change depending on the position of each variable within the design matrix.
+
+\begin{table}
+\centering
+\caption{\label{tab:theoremCHOICE1tableLATEX}Theorem results of Example 2 of Salmerón et al. (2025)}
+\centering
+\resizebox{\ifdim\width>\linewidth\linewidth\else\width\fi}{!}{
+\begin{tabular}[t]{lccccc}
+\toprule
+ & RVIFs & c0 & c3 & Scenario & Affects\\
+\midrule
+Intercept & 6388.881402 & 88495.933700 & 1.649518 & a.1 & No\\
+Capital & 4.136993 & 207.628058 & 0.050431 & a.1 & No\\
+Work & 37.336325 & 9.445619 & 147.582132 & b.2 & No\\
+\bottomrule
+\end{tabular}}
+\end{table}
+
+\begin{table}
+\centering
+\caption{\label{tab:theoremCHOICE2tableLATEX}Theorem results of Example 2 of Salmerón et al.
(2025) (reordering 2)}
+\centering
+\resizebox{\ifdim\width>\linewidth\linewidth\else\width\fi}{!}{
+\begin{tabular}[t]{lccccc}
+\toprule
+ & RVIFs & c0 & c3 & Scenario & Affects\\
+\midrule
+Intercept & 6388.876692 & 88495.933700 & 1.649518 & a.1 & No\\
+Work & 37.336325 & 9.445619 & 1.163201 & b.1 & Yes\\
+Capital & 4.136993 & 207.628058 & 0.082430 & a.1 & No\\
+\bottomrule
+\end{tabular}}
+\end{table}
+
+It is observed that in one of the two possibilities considered, the individual significance of the work variable is affected by the degree of existing multicollinearity.
+
+Therefore, to state that the statistical analysis of the multiple linear regression model is not affected by the multicollinearity present in the model, it is necessary to check all the possible QR decompositions and to determine that the statistical analysis is not affected in any of them. However, to determine that the statistical analysis of the model is affected by the presence of multicollinearity, it is sufficient to find one of the possible rearrangements in which situation b.1 occurs.
+
+Another possibility is to set in the last position of \(\mathbf{X}\) a particular variable following a specific criterion. Thus, for example, in Example 3 of \citet{Salmeron2024a} (see Appendix \hyperref[examplesRVIF]{Examples of\ldots{}} for more details) it is verified that the variable FA has a coefficient significantly different from zero. Since its individual significance will not be modified, fixing this variable in third place yields the results shown in Table \ref{tab:theoremCHOICE8tableLATEX}.
+
+\begin{table}
+\centering
+\caption{\label{tab:theoremCHOICE8tableLATEX}Theorem results of Example 3 of Salmerón et al.
(2025) reordering}
+\centering
+\resizebox{\ifdim\width>\linewidth\linewidth\else\width\fi}{!}{
+\begin{tabular}[t]{lccccc}
+\toprule
+ & RVIFs & c0 & c3 & Scenario & Affects\\
+\midrule
+OI & 1.696454e-12 & 9.594942e-13 & 1.775244e-13 & b.1 & Yes\\
+S & 1.718535e-12 & 1.100437e-12 & 1.012113e-12 & b.1 & Yes\\
+FA & 1.829200e-16 & 2.307700e-16 & 1.449800e-16 & a.1 & No\\
+\bottomrule
+\end{tabular}}
+\end{table}
+
+It can be seen that in this case the degree of multicollinearity in the model affects the individual significance of the OI and S variables.
+
+\section{Conclusions}\label{conclusiones-VIF}
+
+In this paper, following \citet{Salmeron2024a}, we propose an alternative orthogonal model that leads to a lower bound for the RVIF, indicating whether the degree of multicollinearity present in the model affects its statistical analysis. These thresholds complement the results presented by \citet{OBrien}, who stated that the estimated variances depend on other factors that can counterbalance a high value of the VIF, for example, the size of the sample or the estimated variance of the independent variables. Thus, the thresholds presented for the RVIF also depend on these factors, yielding a threshold associated with each independent variable (including the intercept). Note that these thresholds indicate whether the degree of multicollinearity affects the statistical analysis.
+
+As these thresholds are derived from the individual significance tests of the model, it is possible to reinterpret them as a statistical test to determine whether the degree of multicollinearity in the linear regression model affects its statistical analysis. This analytic tool allows researchers to conclude whether the degree of multicollinearity is statistically troubling and whether it needs to be treated.
We consider this to be a relevant contribution since, to the best of our knowledge, the only existing example of such a measure, presented by \citet{FarrarGlauber}, has been strongly criticized (in addition to the limitations highlighted in the introduction, it should be noted that it completely ignores approximate non-essential multicollinearity, since the correlation matrix does not include information on the intercept); consequently, this new statistical test with a non-rejection region fills a gap in the scientific literature.
+
+On the other hand, note that the position of each of the variables in the matrix \(\mathbf{X}\) uniquely determines the reference orthonormal model \(\mathbf{X}_{o}\). That is to say, there are as many reference models given by the proposed QR decomposition as there are possible rearrangements of the variables within the matrix \(\mathbf{X}\).
+
+In this sense, as has been shown, in order to affirm that the statistical analysis of the model is not affected by the degree of multicollinearity existing in the model (at the significance level used in the application of the proposed theorem), it is necessary to check that scenario b.1 does not occur in any of the possible rearrangements of \(\mathbf{X}\). On the other hand, when there is a rearrangement in which this scenario appears, it can be stated (at the significance level used when applying the proposed theorem) that the degree of existing multicollinearity affects the statistical analysis of the model.
+
+Finally, as a future line of work, it would be interesting to complete the analysis presented here by studying when the degree of multicollinearity in the model affects its numerical analysis.
+
+\section{Acknowledgments}\label{acknowledgments}
+
+This work has been supported by project PP2019-EI-02 of the University of Granada (Spain) and by project A-SEJ-496-UGR20 of the Andalusian Government's Counseling of Economic Transformation, Industry, Knowledge and Universities (Spain).
+
+\section{Appendix}\label{appendix}
+
+\subsection{Inconsistency in hypothesis tests: situation a.2}\label{inconsistency}
+
+From a numerical point of view, it is possible to reject \(H_{0}: \beta_{i} = 0\) while \(H_{0}: \beta_{i,o} = 0\) is not rejected, which implies that \(t_{i}^{o} < t_{n-k}(1 - \alpha/2) < t_{i}\) or, in other words, that \(t_{i}/t_{i}^{o} > 1\).
+
+However, from expression \eqref{eq:texp-orto-2} it is obtained that \(\widehat{\sigma} = | \widehat{\beta}_{i,o} | / t_{i}^{o}\). By substituting \(\widehat{\sigma}\) in expression \eqref{eq:texp-orig}, taking into account expression \eqref{eq:RVIF}, it is obtained that
+\[\frac{t_{i}}{t_{i}^{o}} = \frac{| \widehat{\beta}_{i} |}{| \widehat{\beta}_{i,o} |} \cdot \frac{1}{\sqrt{RVIF(i)}}.\]
+From this expression it can be concluded that in situations with high collinearity, \(RVIF(i) \rightarrow +\infty\), the ratio \(t_{i}/t_{i}^{o}\) will tend to zero, and the condition \(t_{i}/t_{i}^{o} > 1\) will rarely occur. That is to say, the inconsistency in situation a.2, commented on in the preliminaries of the paper, will not appear.
+
+On the other hand, if the variable \(i\) is orthogonal to the rest of the independent variables, it is verified that \(\widehat{\beta}_{i,o} = \widehat{\beta}_{i}\) since \(p_{i} = ( 0 \dots \underbrace{1}_{(i)} \dots 0)\). At the same time, \(RVIF(i) = \frac{1}{SST_{i}}\), where \(SST\) denotes the total sum of squares. If there is orthonormality, as proposed in this paper, \(SST_{i} = 1\) and, as a consequence, it is verified that \(t_{i} = t_{i}^{o}\). Thus, the individual significance tests for the original data and for the orthonormal data are the same.
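+
+This invariance can be checked numerically. The following is a minimal sketch in base R with simulated data (the data and variable names are illustrative, not taken from the examples in the paper); the columns of \(\mathbf{Q}\) from the QR decomposition play the role of the orthonormal regressors, and the absolute \(t\) statistic of the last variable coincides in the original and orthonormalized fits:
+
+\begin{verbatim}
+set.seed(123)
+n = 50
+X = cbind(1, rnorm(n), rnorm(n), rnorm(n))   # intercept in first column
+y = drop(X %*% c(2, 1, -1, 0.5) + rnorm(n))
+Q = qr.Q(qr(X))                              # orthonormal design matrix
+t_orig = coef(summary(lm(y ~ X - 1)))[, "t value"]
+t_orto = coef(summary(lm(y ~ Q - 1)))[, "t value"]
+abs(t_orig[4]) - abs(t_orto[4])              # approximately zero
+\end{verbatim}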
+
+\subsection{\texorpdfstring{Test of individual significance of coefficient \(k\)}{Test of individual significance of coefficient k}}\label{apendix}
+
+Taking into account that it is verified that \(\boldsymbol{\beta}_{o} = \mathbf{P} \boldsymbol{\beta}\) where:
+\[\boldsymbol{\beta}_{o} = \left(
+ \begin{array}{c}
+ \beta_{1,o} \\
+ \beta_{2,o} \\
+ \vdots \\
+ \beta_{k,o}
+ \end{array} \right), \quad
+ \mathbf{P} = \left(
+ \begin{array}{cccc}
+ p_{11} & p_{12} & \dots & p_{1k} \\
+ 0 & p_{22} & \dots & p_{2k} \\
+ \vdots & \vdots & & \vdots \\
+ 0 & 0 & \dots & p_{kk}
+ \end{array} \right), \quad
+ \boldsymbol{\beta} = \left(
+ \begin{array}{c}
+ \beta_{1} \\
+ \beta_{2} \\
+ \vdots \\
+ \beta_{k}
+ \end{array} \right),\]
+it is obtained that \(\beta_{k,o} = p_{kk} \beta_{k}\). Then, the null hypothesis \(H_{0}: \beta_{k,o} = 0\) is equivalent to \(H_{0}: \beta_{k} = 0\). Due to this fact, Tables \ref{tab:Wissel0tableLATEX} and \ref{tab:WisselORTOtableLATEX} showed an expected behaviour, which is now analyzed in more detail.
+
+The experimental value used to decide the test with null hypothesis \(H_{0}: \beta_{k,o} = 0\) and alternative hypothesis \(H_{1}: \beta_{k,o} \not= 0\) is given by the following expression:
+\[t_{k}^{o} = \left| \frac{\widehat{\beta}_{k,o}}{\sqrt{var \left( \widehat{\beta}_{k,o} \right)}} \right|.\]
+
+Taking into account that \(\widehat{\boldsymbol{\beta}}_{o} = \mathbf{P} \widehat{\boldsymbol{\beta}}\) and \(var \left( \widehat{\boldsymbol{\beta}}_{o} \right) = \mathbf{P} var \left( \widehat{\boldsymbol{\beta}} \right) \mathbf{P}^{t},\) it is verified that \(\widehat{\beta}_{k,o} = p_{kk} \widehat{\beta}_{k}\) and \(var \left( \widehat{\beta}_{k,o} \right) = p_{kk}^{2} var \left( \widehat{\beta}_{k} \right)\).
Then: +\[t_{k}^{o} = \left| \frac{p_{kk} \widehat{\beta}_{k}}{p_{kk} \sqrt{var \left( \widehat{\beta}_{k} \right)}} \right| = \left| \frac{\widehat{\beta}_{k}}{\sqrt{var \left( \widehat{\beta}_{k} \right)}} \right| = t_{k},\] +where \(t_{k}\) is the experimental value to take a decision in the test with null hypothesis \(H_{0}: \beta_{k} = 0\) and alternative hypothesis \(H_{1}: \beta_{k} \not= 0\). + +\subsection{\texorpdfstring{Examples of \citet{Salmeron2024a}}{Examples of @Salmeron2024a}}\label{examplesRVIF} + +\textbf{Example 1 of \citet{Salmeron2024a}: Detection of traditional nonessential multicollinearity}. Using data from a financial model in which the Euribor (E) is analyzed from the Harmonized Index of Consumer Prices (HICP), the balance of payments to net current account (BC) and the government deficit to net nonfinancial accounts (GD), we illustrate the detection of approximate multicollinearity of the non-essential type, i.e.~where the intercept is related to one of the remaining independent variables (for details see \citet{MarquardtSnee1975}). For more information on this data set use \emph{help(euribor)}. + +Note that \citet{Salmeron2019} establishes that an independent variable with a coefficient of variation less than 0.1002506 indicates that this variable is responsible for a non-essential multicollinearity problem. + +Thus, first of all, the approximate multicollinearity detection is performed using the measures traditionally applied for this purpose: the Variance Inflation Factor (VIF) and the Condition Number (CN). Values higher than 10 for the VIF (see, for example, \citet{Marquardt1970}) and 30 for the CN (see, for example, \citet{Belsley1991} or \citet{BelsleyKuhWelsch}), imply that the degree of existing multicollinearity is troubling. 
Moreover, according to \citet{Salmeron2019}, the VIF is only able to detect essential multicollinearity (a relationship between independent variables excluding the intercept, see \citet{MarquardtSnee1975}), while the CN detects both essential and non-essential multicollinearity.
+
+Therefore, the values calculated below (using the \(\texttt{VIF}\), \(\texttt{CN}\) and \(\texttt{CVs}\) commands from the \CRANpkg{multiColl} package, see \citet{Salmeron2021multicoll} and \citet{Salmeron2022multicoll} for more details on this package) indicate that the degree of essential multicollinearity existing in the model is not troubling, while that of the non-essential type is troubling due to the relationship of HIPC with the intercept.
+
+\begin{verbatim}
+E = euribor[,1]
+data1 = euribor[,-1]
+
+VIF(data1)
+\end{verbatim}
+
+\begin{verbatim}
+#> HIPC BC GD
+#> 1.349666 1.058593 1.283815
+\end{verbatim}
+
+\begin{verbatim}
+CN(data1)
+\end{verbatim}
+
+\begin{verbatim}
+#> [1] 39.35375
+\end{verbatim}
+
+\begin{verbatim}
+CVs(data1)
+\end{verbatim}
+
+\begin{verbatim}
+#> [1] 0.06957876 4.34031035 0.55015508
+\end{verbatim}
+
+This assumption is confirmed by calculating the RVIF values, which point to a strong relationship between the second variable and the intercept:
+
+\begin{verbatim}
+rvifs(data1, ul = T, intercept = T)
+\end{verbatim}
+
+\begin{verbatim}
+#> RVIF %
+#> Intercept 250.294157 99.6005
+#> Variable 2 280.136873 99.6430
+#> Variable 3 1.114787 10.2967
+#> Variable 4 5.525440 81.9019
+\end{verbatim}
+
+The output of the \(\texttt{rvifs}\) command provides the values of the Redefined Variance Inflation Factor (RVIF) and the percentage of multicollinearity due to each variable (denoted as \(a_{i}\) in the \hyperref[sub-above]{An orthonormal\ldots{}} section).
In this case, three of the four arguments available in the \(\texttt{rvifs}\) command are used:

\begin{itemize}
\item
  The first of these refers to the design matrix containing the independent variables (the intercept, if any, being the first column).
\item
  The second argument, \(ul\), indicates that the data are to be transformed to unit length. This transformation makes it possible to establish that the RVIF is always greater than or equal to 1, with this minimum value indicating the absence of worrying multicollinearity.
\item
  The third argument, \(intercept\), indicates whether there is an intercept in the design matrix.
\end{itemize}

Note that these results can also be obtained after using the \(\texttt{lm}\) and \(\texttt{model.matrix}\) commands as follows:

\begin{verbatim}
reg_E = lm(euribor[,1]~as.matrix(euribor[,-1]))
rvifs(model.matrix(reg_E))
\end{verbatim}

\begin{verbatim}
#>                   RVIF       %
#> Intercept   250.294157 99.6005
#> Variable 2  280.136873 99.6430
#> Variable 3    1.114787 10.2967
#> Variable 4    5.525440 81.9019
\end{verbatim}

Finally, the application of the Theorem established in Subsection \hyperref[TheTHEOREM]{From the RVIF} detects that the individual inference of the second variable (HIPC) is affected by the degree of multicollinearity existing in the model. These results are obtained using the \(\texttt{multicollinearity}\) command from the \CRANpkg{rvif} package:

\begin{verbatim}
multicollinearity(E, data1)
\end{verbatim}

\begin{verbatim}
#>          RVIFs           c0           c3 Scenario Affects
#> 1 5.325408e+00 1.575871e+01 2.166907e-02      a.1      No
#> 2 5.357830e-04 3.219456e-06 4.249359e-05      b.1     Yes
#> 3 5.109564e-11 1.098649e-09 2.586237e-12      a.1      No
#> 4 1.631439e-11 3.216522e-10 8.274760e-13      a.1      No
\end{verbatim}

Therefore, it can be established that the existing multicollinearity affects the statistical analysis of the Euribor model.
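The coefficient-of-variation rule of \citet{Salmeron2019} quoted above (a variable with CV below 0.1002506 points to a non-essential multicollinearity problem) is equally simple to state in code. A minimal illustrative sketch, not taken from the package (Python, with our own function names and simulated data in place of \emph{euribor}):

```python
import numpy as np

# Threshold proposed by Salmeron et al. (2019), as quoted in the text.
CV_THRESHOLD = 0.1002506

def coefficient_of_variation(x):
    """Sample standard deviation over the absolute sample mean."""
    return np.std(x, ddof=1) / abs(np.mean(x))

def nonessential_suspects(X, names):
    """Flag columns whose CV falls below the threshold, i.e. variables
    with so little variability that they are nearly collinear with
    the intercept."""
    return [nm for nm, col in zip(names, X.T)
            if coefficient_of_variation(col) < CV_THRESHOLD]
```

A variable generated with mean 10 and standard deviation 0.1 (as Z in Example 4 below) has CV around 0.01 and is flagged; one with standard deviation 10 is not.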
\textbf{Example 2 of \citet{Salmeron2024a}: Detection of generalized nonessential multicollinearity}. Using data from a Cobb-Douglas production function in which the production (P) is analyzed from the capital (K) and labor (W), we illustrate the detection of approximate multicollinearity of the generalized non-essential type, i.e., that in which at least two independent variables with very little variability (excluding the intercept) are related to each other (for more details, see \citet{Salmeron2020maths}). For more information on this dataset use \emph{help(CDpf)}.

Using the \(\texttt{rvifs}\) command, it can be determined that both capital and labor are linearly related to each other, with high RVIF values well above the threshold established as a worrying value:

\begin{verbatim}
P = CDpf[,1]
data2 = CDpf[,2:4]
\end{verbatim}

\begin{verbatim}
rvifs(data2, ul = T)
\end{verbatim}

\begin{verbatim}
#>                  RVIF       %
#> Intercept   178888.82 99.9994
#> Variable 2   38071.45 99.9974
#> Variable 3  255219.62 99.9996
\end{verbatim}

However, the application of the Theorem established in Subsection \hyperref[TheTHEOREM]{From the RVIF} does not detect that the degree of multicollinearity in the model affects the statistical analysis of the model:

\begin{verbatim}
multicollinearity(P, data2)
\end{verbatim}

\begin{verbatim}
#>         RVIFs           c0           c3 Scenario Affects
#> 1 6388.881402 88495.933700   1.64951764      a.1      No
#> 2    4.136993   207.628058   0.05043083      a.1      No
#> 3   37.336325     9.445619 147.58213164      b.2      No
\end{verbatim}

Nevertheless, if we rearrange the design matrix \(\mathbf{X}\), we obtain:

\begin{verbatim}
data2 = CDpf[,c(2,4,3)]
multicollinearity(P, data2)
\end{verbatim}

\begin{verbatim}
#>         RVIFs           c0           c3 Scenario Affects
#> 1 6388.876692 88495.933700   1.64951764      a.1      No
#> 2   37.336325     9.445619   1.16320125      b.1     Yes
#> 3    4.136993   207.628058   0.08242979      a.1      No
\end{verbatim}

Therefore, it can be established that the existing multicollinearity does
affect the statistical analysis of the Cobb-Douglas production function model.

\textbf{Example 3 of \citet{Salmeron2024a}: Detection of essential multicollinearity}. Using data from a model in which the number of employees of Spanish companies (NE) is analyzed from the fixed assets (FA), operating income (OI) and sales (S), we illustrate the detection of approximate multicollinearity of the essential type, i.e., that in which at least two independent variables (excluding the intercept) are related to each other (for more details, see \citet{MarquardtSnee1975}). For more information on this dataset use \emph{help(employees)}.

In this case, the \(\texttt{rvifs}\) command shows that variables three and four (OI and S) have a high RVIF value, so they are highly linearly related:

\begin{verbatim}
NE = employees[,1]
data3 = employees[,2:5]
\end{verbatim}

\begin{verbatim}
rvifs(data3, ul = T)
\end{verbatim}

\begin{verbatim}
#>                    RVIF       %
#> Intercept      2.984146 66.4896
#> Variable 2     5.011397 80.0455
#> Variable 3 15186.744870 99.9934
#> Variable 4 15052.679178 99.9934
\end{verbatim}

Note that if the unit-length transformation is omitted in \emph{rvifs(data3, ul = T)}, as is done inside the \(\texttt{multicollinearity}\) command, the RVIF cannot be calculated since the system is computationally singular. For this reason, the intercept is eliminated below, since it has been shown above that it does not play a relevant role in the linear relationships of the model.
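The unit-length transformation invoked by \emph{ul = T} is plain column-wise normalisation, nothing more. As a one-line illustrative sketch (Python; this is our own code, not the package's):

```python
import numpy as np

def unit_length(X):
    """Scale every column of X to Euclidean norm 1 (the unit-length
    transformation: only a column rescaling, no centering)."""
    return X / np.linalg.norm(X, axis=0)
```

After the transformation every column has norm 1, and the original matrix is recovered by multiplying each column back by its norm, so no information is lost.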
Finally, the application of the Theorem established in Subsection \hyperref[TheTHEOREM]{From the RVIF} detects that the individual inference of the third variable (OI) is affected by the degree of multicollinearity existing in the model:

\begin{verbatim}
multicollinearity(NE, data3[,-1])
\end{verbatim}

\begin{verbatim}
#>          RVIFs           c0           c3 Scenario Affects
#> 1 1.829154e-16 2.307712e-16 4.679301e-17      a.1      No
#> 2 1.696454e-12 9.594942e-13 2.129511e-13      b.1     Yes
#> 3 1.718535e-12 1.100437e-12 2.683809e-12      b.2      No
\end{verbatim}

Therefore, it can be established that the existing multicollinearity affects the statistical analysis of the model of the number of employees in Spanish companies.

\textbf{Example 4 of \citet{Salmeron2024a}: The special case of the simple linear model}. The simple linear regression model is an interesting case because it has a single independent variable and the intercept. Since the intercept is not properly considered as an independent variable of the model in many cases (see the introduction of \citet{Salmeron2019} for more details), different software packages (including R, \citet{R}) do not consider that there can be worrisome multicollinearity in this type of model.

To illustrate this situation, \citet{Salmeron2024a} randomly generates observations for the following two simple linear regression models, \(\mathbf{y}_{1} = \beta_{1} + \beta_{2} \mathbf{V} + \mathbf{u}_{1}\) and \(\mathbf{y}_{2} = \alpha_{1} + \alpha_{2} \mathbf{Z} + \mathbf{u}_{2}\), according to the following code:

\begin{verbatim}
set.seed(2022)
obs = 50
cte4 = rep(1, obs)
V = rnorm(obs, 10, 10)
y1 = 3 + 4*V + rnorm(obs, 0, 2)
Z = rnorm(obs, 10, 0.1)
y2 = 3 + 4*Z + rnorm(obs, 0, 2)

data4.1 = cbind(cte4, V)
data4.2 = cbind(cte4, Z)
\end{verbatim}

For more information on these data sets use \emph{help(SLM1)} and \emph{help(SLM2)}.

As mentioned above, R (\citet{R}) does not acknowledge the existence of multicollinearity in this type of model.
Thus, for example, when using the \(\texttt{vif}\) command of the \CRANpkg{car} package on \emph{reg=lm(y1\textasciitilde V)} the following message is obtained: \emph{Error in vif.default(reg): model contains fewer than 2 terms}.

This message is consistent with the fact that, as mentioned above, the VIF is not capable of detecting non-essential multicollinearity (which is the only multicollinearity that exists in this type of model). However, the error message may lead a non-specialized user to conclude that the multicollinearity problem does not exist in this type of model. These issues are addressed in more depth in \citet{Salmeron2022multicoll}.

On the other hand, the calculation of the RVIF in the first model shows that the degree of multicollinearity is not troubling, since it presents very low values:

\begin{verbatim}
rvifs(data4.1, ul = T)
\end{verbatim}

\begin{verbatim}
#>                RVIF       %
#> Intercept  2.015249 50.3783
#> Variable 2 2.015249 50.3783
\end{verbatim}

In the second model, by contrast, the values are very high, indicating a problem of non-essential multicollinearity:

\begin{verbatim}
rvifs(data4.2, ul = T)
\end{verbatim}

\begin{verbatim}
#>                RVIF       %
#> Intercept  9390.044 99.9894
#> Variable 2 9390.044 99.9894
\end{verbatim}

By using the \(\texttt{multicollinearity}\) command, it is found that the individual inference of the intercept of the second model is affected by the degree of multicollinearity in the model:

\begin{verbatim}
multicollinearity(y1, data4.1)
\end{verbatim}

\begin{verbatim}
#>          RVIFs        c0           c3 Scenario Affects
#> 1 0.0403049717 0.6454323 1.045802e-05      a.1      No
#> 2 0.0002675731 0.8383436 8.540101e-08      a.1      No
\end{verbatim}

\begin{verbatim}
multicollinearity(y2, data4.2)
\end{verbatim}

\begin{verbatim}
#>        RVIFs         c0         c3 Scenario Affects
#> 1 187.800878 21.4798003 0.03277691      b.1     Yes
#> 2   1.879296  0.3687652 9.57724567      b.2      No
\end{verbatim}

Therefore, it can be established that the
multicollinearity existing in the first simple linear regression model does not affect the statistical analysis of the model, while in the second one it does. + +\bibliography{RJreferences.bib} + +\address{% +Román Salmerón-Gómez\\ +University of Granada\\% +Department of Quantitative Methods for Economics and Business\\ Campus Universitario de La Cartuja, Universidad de Granada. 18071 Granada (España)\\ +% +\url{https://www.ugr.es/~romansg/web/index.html}\\% +\textit{ORCiD: \href{https://orcid.org/0000-0003-2589-4058}{0000-0003-2589-4058}}\\% +\href{mailto:romansg@ugr.es}{\nolinkurl{romansg@ugr.es}}% +} + +\address{% +Catalina B. García-García\\ +University of Granada\\% +Department of Quantitative Methods for Economics and Business\\ Campus Universitario de La Cartuja, Universidad de Granada. 18071 Granada (España)\\ +% +\url{https://metodoscuantitativos.ugr.es/informacion/directorio-personal/catalina-garcia-garcia}\\% +\textit{ORCiD: \href{https://orcid.org/0000-0003-1622-3877}{0000-0003-1622-3877}}\\% +\href{mailto:cbgarcia@ugr.es}{\nolinkurl{cbgarcia@ugr.es}}% +} diff --git a/_articles/RJ-2025-041/RJ-2025-041.Rmd b/_articles/RJ-2025-041/RJ-2025-041.Rmd new file mode 100644 index 0000000000..773df1bbdd --- /dev/null +++ b/_articles/RJ-2025-041/RJ-2025-041.Rmd @@ -0,0 +1,1162 @@ +--- +title: 'elhmc: An R Package for Hamiltonian Monte Carlo Sampling in Bayesian Empirical + Likelihood' +abstract: | + In this article, we describe an R package for sampling from an + empirical likelihood-based posterior using a Hamiltonian Monte Carlo + method. Empirical likelihood-based methodologies have been used in the + Bayesian modeling of many problems of interest in recent times. This + semiparametric procedure can easily combine the flexibility of a + nonparametric distribution estimator together with the + interpretability of a parametric model. The model is specified by + estimating equation-based constraints. 
Drawing inference from a + Bayesian empirical likelihood (BayesEL) posterior is challenging. The + likelihood is computed numerically, so no closed-form expression of + the posterior exists. Moreover, for any sample of finite size, the + support of the likelihood is non-convex, which hinders fast mixing of + many Markov Chain Monte Carlo (MCMC) procedures. It has been recently + shown that using the properties of the gradient of the log empirical + likelihood, one can devise an efficient Hamiltonian Monte Carlo (HMC) + algorithm to sample from a BayesEL posterior. The package requires the + user to specify only the estimating equations, the prior, and their + respective gradients. An MCMC sample drawn from the BayesEL posterior + of the parameters, with various details required by the user, is + obtained. +author: +- name: Neo Han Wei + affiliation: Citibank, Singapore + address: | + [`nhanwei@gmail.com`](mailto:nhanwei@gmail.com) +- name: Dang Trung Kien + affiliation: Independent Consultant + address: + - '[`trungkiendang@hotmail.com`](mailto:trungkiendang@hotmail.com)' + - | + +- name: Sanjay Chaudhuri + affiliation: University of Nebraska-Lincoln + address: + - Department of Statistics + - 840 Hardin Hall North Wing, Lincoln, NE, USA + - '[`schaudhuri2@nebraska.edu`](mailto:schaudhuri2@nebraska.edu)' + - | + +date: '2026-01-06' +date_received: '2025-03-12' +journal: + firstpage: 237 + lastpage: 254 +volume: 17 +issue: 4 +slug: RJ-2025-041 +packages: + cran: [] + bioc: [] +preview: preview.png +bibliography: kWC.bib +CTV: ~ +legacy_pdf: yes +legacy_converted: yes +output: + rjtools::rjournal_web_article: + self_contained: yes + toc: no + mathjax: https://cdn.jsdelivr.net/npm/mathjax@4/tex-mml-chtml.js + md_extension: -tex_math_single_backslash +draft: no + +--- + + +::::::: article +## Introduction + +Empirical likelihood has several advantages over a traditional +parametric likelihood. 
Even though a correctly specified parametric +likelihood is usually the most efficient for parameter estimation, +semiparametric methods like empirical likelihood, which use a +nonparametric estimate of the underlying distribution, are often more +efficient when the model is misspecified. Empirical likelihood +incorporates parametric model-based information as constraints in +estimating the underlying distribution, which makes the parametric +estimates interpretable. Furthermore, it allows easy incorporation of +known additional information not involving the parameters in the +analysis. + +Bayesian empirical likelihood (BayesEL) [@lazar2003bayesian] methods +employ empirical likelihood in the Bayesian paradigm. Given some +information about the model parameters in the form of a prior +distribution and estimating equations obtained from the model, a +likelihood is constructed from a constrained empirical estimate of the +underlying distribution. The prior is then used to define a posterior +based on this estimated likelihood. Inference on the parameter is drawn +based on samples generated from the posterior distribution. + +BayesEL methods are quite flexible and have been found useful in many +areas of statistics. The examples include small area estimation, +quantile regression, analysis of complex survey data, etc. + +BayesEL procedures, however, require an efficient Markov Chain Monte +Carlo (MCMC) procedure to sample from the resulting posterior. It turns +out that such a procedure is not easily specified. For many parameter +values, it may not be feasible to compute the constrained empirical +distribution function, and the likelihood is estimated to be zero. That +is, the estimated likelihood is not supported over the whole space. +Moreover, this support is non-convex and impossible to determine in most +cases. Thus, a naive random walk MCMC would quite often propose +parameters outside the support and get stuck. 
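Determining whether a proposed $\theta$ lies inside the support amounts to a feasibility check: as formalised in the theoretical background below, the empirical likelihood weights exist exactly when the origin of $\mathbb{R}^q$ lies in the convex hull of the estimating-function values $g(\theta, x_1),\ldots,g(\theta, x_n)$. A purely illustrative sketch of that check (Python with NumPy/SciPy; this is not code from `elhmc`) phrases it as a small linear-programming feasibility problem:

```python
import numpy as np
from scipy.optimize import linprog

def in_support(G):
    """G is the n x q matrix with rows g(theta, x_i).  A probability
    vector omega on the simplex with G^T omega = 0 exists iff the
    origin lies in the convex hull of the rows of G (L(theta) > 0
    additionally requires it to be interior).  We test feasibility
    of that linear system with a zero-objective LP."""
    n, q = G.shape
    A_eq = np.vstack([G.T, np.ones((1, n))])
    b_eq = np.concatenate([np.zeros(q), [1.0]])
    res = linprog(np.zeros(n), A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0.0, None)] * n, method="highs")
    return res.status == 0  # 0 = an optimal (feasible) point was found
```

For a mean parameter, $g(\theta, x_i) = x_i - \theta$, so $\theta$ passes the check only when it lies between the smallest and largest observations; a random-walk proposal outside that interval gives $L(\theta) = 0$.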
Many authors have encountered this problem in frequentist applications. Such "empty set" problems are quite common [@grendar2009empty] and become more frequent in problems with a large number of parameters [@bergsma2012empty]. Several authors [@chen2008adjusted; @emerson2009calibration; @liu2010adjusted] have suggested the addition of extra observations generated from the available data, designed specifically to avoid empty sets. They show that such observations can be proposed without changing the asymptotic distribution of the corresponding Wilks' statistics. Some authors [@tsao2013extending; @tsao2013empirical; @tsaoFu2014] have used a transformation so that the contours of the resultant empirical likelihood could be extended beyond the feasible region. However, in most Bayesian applications the data are finite in size and not large, so these asymptotic arguments have little use.

With the availability of user-friendly software packages like `STAN` [@stan2017], gradient-assisted MCMC methods like Hamiltonian Monte Carlo (HMC) are becoming increasingly popular in Bayesian computation. When the estimating equations are smooth with respect to the parameters, gradient-based methods have a huge advantage in sampling from a BayesEL posterior. This is because @chaudhuriMondalTeng2017 have shown that, under mild conditions, the gradient of the log-posterior diverges to infinity at the boundary of its support. Due to this phenomenon, if an HMC chain approaches the boundary of the posterior support, it is reflected towards its center.

Until now, there has been no software implementing HMC sampling from a BayesEL posterior with smooth estimating equations and priors. We describe such a library, called `elhmc`, written for the `R` platform. The main function in the library only requires the user to specify the estimating equations and the prior, together with, respectively, their Hessian and gradient with respect to the parameters, all as functions.
Outputs with a user-specified degree of detail can be obtained.

The `elhmc` package has been used by practitioners since it was made available on `CRAN`. In recent times, various other libraries for sampling from a BayesEL posterior have been made available. Among them, the library `VBel` [@VBel] deserves special mention. The authors compute a variational approximation of the BayesEL posterior from which samples can be easily drawn. However, most of the time `elhmc` is considered to be the benchmark.

The rest of the article is structured as follows. We start with the theoretical background behind the software package. In Section [2](#sec:theory) we first define the empirical likelihood and construct a Bayesian empirical likelihood from it. The next part of this section is devoted to a review of the properties of the log empirical likelihood gradient. A review of the HMC method with special emphasis on BayesEL sampling is provided next (Section \@ref(sec:hmc)). Section \@ref(sec:package) mainly contains the description of the `elhmc` library. Some illustrative examples with artificial and real data sets are presented in Section \@ref(sec:examples).

## Theoretical background {#sec:theory}

### Basics of Bayesian Empirical Likelihood

Suppose $x=(x_1,\ldots,x_n)$ are $n$ observations in $\mathbb{R}^p$ from a distribution $F^0$ depending on a parameter vector $\theta=(\theta^{(1)}, \ldots,\theta^{(d)})\in\Theta\subseteq \mathbb{R}^d$. We assume that both $F^0$ and the true parameter value $\theta^0$ are unknown. However, certain smooth functions $g(\theta,x)=\left(g_1(\theta,x),\ldots,g_q(\theta,x)\right)^T$ are known to satisfy
$$\begin{equation}
\label{smoothfun}
E_{F^0}[g(\theta^0,x)]=0.
\end{equation} (\#eq:smoothfun)$$

Additionally, information about the parameter is available in the form of a prior density $\pi(\theta)$ supported on $\Theta$. We assume that it is neither possible nor desirable to specify $F^0$ in a parametric form.
On the other hand, it is not beneficial to estimate $F^0$ completely nonparametrically without taking into account the information from \@ref(eq:smoothfun) in the estimation procedure.

Empirical likelihood provides a semiparametric procedure to estimate $F^0$ by incorporating the information contained in \@ref(eq:smoothfun). A likelihood can be computed from the estimate. Moreover, if some information about the parameter is available in the form of a prior distribution, the same likelihood can be employed to derive a posterior of the parameter given the observations.

Let $F\in\mathcal{F}_{\theta}$ be a distribution function depending on the parameter $\theta$. The empirical likelihood is the maximum of the "nonparametric likelihood"
$$\begin{equation}
\label{eqn2}
L(F)=\prod_{i=1}^n \{F(x_i)-F(x_i-)\}
\end{equation} (\#eq:eqn2)$$
over $\mathcal{F}_\theta$, $\theta\in\Theta$, under constraints depending on $g(\theta,x)$.

More specifically, by defining $\omega_i=F(x_i)-F(x_i-)$, the empirical likelihood for $\theta$ is defined by
$$\begin{equation}
\label{eqn3}
L(\theta)\mathrel{\mathop:}=\max_{\omega\in\mathcal{W}_{\theta}}\prod_{i=1}^n \omega_i
\end{equation} (\#eq:eqn3)$$
where
$$\mathcal{W}_{\theta}=\Big\{\omega: \sum_{i=1}^n\omega_i g(\theta,x_i)=0\Big\}\cap\Delta_{n-1}$$
and $\Delta_{n-1}$ is the $n-1$ dimensional simplex, i.e. $\omega_i\geq 0$, $\forall i$ and $\sum_{i=1}^n\omega_i=1$. For any $\theta$, if the problem in \@ref(eq:eqn3) is infeasible, i.e. $\mathcal{W}_{\theta}=\emptyset$, we define $L(\theta)\mathrel{\mathop:}= 0$.

Using the empirical likelihood $L(\theta)$ and the prior $\pi(\theta)$ we can define a posterior as:
$$\begin{equation}
\label{eqn4}
\Pi(\theta|x)=\frac{L(\theta)\pi(\theta)}{\int L(\theta)\pi(\theta) d\theta}\propto L(\theta)\pi(\theta).
\end{equation} (\#eq:eqn4)$$

In Bayesian empirical likelihood (BayesEL), $\Pi(\theta|x)$ is used as the posterior to draw inferences on the parameter.
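For the reader's convenience we recall how the optimisation in \@ref(eq:eqn3) is typically evaluated; the derivation below is the standard Lagrange-multiplier argument [@qin1994empirical] and is included only as a reading aid. When the problem is feasible with strictly positive weights, maximising $\sum_{i=1}^n \log\omega_i$ subject to $\sum_{i=1}^n\omega_i=1$ and $\sum_{i=1}^n\omega_i g(\theta,x_i)=0$ gives the maximising weights in closed form,
$$\hat{\omega}_i(\theta)=\frac{1}{n}\,\frac{1}{1+\lambda(\theta)^{\text{\tiny T}} g(\theta,x_i)},\qquad i=1,\ldots,n,$$
where the multiplier $\lambda(\theta)\in\mathbb{R}^q$ solves the $q$-dimensional equation
$$\sum_{i=1}^n\frac{g(\theta,x_i)}{1+\lambda(\theta)^{\text{\tiny T}} g(\theta,x_i)}=0.$$
Thus, evaluating $L(\theta)$ numerically reduces to a root-finding problem in $\lambda(\theta)$.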
+ +Returning back to \@ref(eq:eqn3) above, suppose we denote: +$$\begin{equation} +\label{eqn5} +\hat{\omega}(\theta)=\mathop{\mathrm{\arg\!\max}}_{\omega\in\mathcal{W}_{\theta}}\prod_{i=1}^n \omega_i. +\qquad\qquad +\Big(\text{ i.e. } L(\theta)=\prod^n_{i=1}\hat{\omega}_i(\theta)\Big) +\end{equation} (\#eq:eqn5)$$ +Each $\hat\omega_i\geq 0$ if and only if the origin in $\mathbb{R}^q$ +can be expressed as a convex combination of +$g(\theta,x_1),\ldots,g(\theta,x_n)$. Otherwise, the optimisation +problem is infeasible, and $\mathcal{W}_{\theta}=\emptyset$. +Furthermore, when $\hat{\omega}_i>0$, $\forall i$ is feasible, the +solution $\hat{\omega}$ of \@ref(eq:eqn5) is unique. + +The estimate of $F^0$ is given by:[^1] +$$\hat{F}^0(x)=\sum_{i=1}^n\hat{\omega}_i(\theta)1_{\{x_i\leq x\}}.$$ +The distribution $\hat{F}^0$ is a step function with a jump of +$\hat{\omega}_i(\theta)$ on $x_i$. If +$\mathcal{W}_{\theta}=\Delta_{n-1}$, i.e. no information about +$g(\theta,x)$ is present, it easily follows that +$\hat{\omega}_i(\theta)=n^{-1}$, for each $i=1$, $2$, $\ldots$, $n$ and +$\hat{F}^0$ is the well-known empirical distribution function. + +By construction, $\Pi(\theta|x)$ can only be computed numerically. No +analytic form is available. Inferences are drawn through the +observations from $\Pi(\theta|x)$ sampled using Markov chain Monte Carlo +techniques. + +Adaptation of Markov chain Monte Carlo methods to BayesEL applications +poses several challenges. First of all, it is not possible to determine +the full conditional densities in a closed form. So techniques like +Gibbs sampling [@geman1984stochastic] cannot be used. In most cases, +random walk Metropolis procedures, with carefully chosen step sizes, are +attempted. However, the nature of the support of $\Pi(\theta|x)$, which +we discuss in detail below, makes the choice of an appropriate step size +extremely difficult. + +```{r scheme, echo=FALSE, fig.cap="Schematic illustration of the Empirical likelihood problem. 
The support of the empirical likelihood is $\\Theta_1$, a subset of $\\mathbb{R}^d$. We take $n=8$ observations. The estimating equations $g(x,\\theta)$ are $q=2$ dimensional. Note that $\\Theta_1$ is non-convex and may not be bounded. The convex hull of the $q$-dimensional vectors, i.e., $\\mathcal{C}(\\theta,x)$, is a pentagon in $\\mathbb{R}^2$. The largest faces of $\\mathcal{C}(\\theta,x)$ are the one-dimensional sides of the pentagon. It follows that $\\theta^{(k)}\\in\\Theta_1$ iff the origin of $\\mathbb{R}^2$, denoted $0_2$, is in the interior $\\mathcal{C}^0(\\theta,x)$ of $\\mathcal{C}(\\theta,x)$. This also implies that the optimal empirical likelihood weights $\\hat{\\omega}(\\theta^{(k)})$ are strictly positive and lie in the interior of the $n-1$, i.e. $7$-dimensional simplex. There is no easy way to determine $\\Theta_1$. We check if $0_2\\in \\mathcal{C}^0(\\theta,x)$ or equivalently if $\\hat{\\omega}(\\theta^{(k)})$ are in the interior of $\\Delta_7$ in order to determine if $\\theta^{(k)}\\in \\Theta_1$. As the sequence $\\theta^{(k)}$ approaches the boundary of $\\Theta_1$, the convex polytope $\\mathcal{C}(\\theta^{(k)},x)$ changes in such a way that $0_2$ converges to its boundary. The sequence of optimal weights $\\hat{\\omega}(\\theta^{(k)})$ will converge to the boundary of $\\Delta_7$.
The current software is based on @chaudhuriMondalTeng2017, who show that, under simple conditions, along almost every sequence $\\theta^{(k)}$ converging to the boundary of $\\Theta_1$, at least one component of the gradient of the log-empirical-likelihood-based posterior diverges to positive or negative infinity.", out.width="100%"}
knitr::include_graphics("figures/dirScan3.jpeg")
```

Provided that the prior is positive over the whole $\Theta$, which is true in most applications, the support of $\Pi(\theta|x)$ is a subset of the support of the likelihood $L(\theta)$, which can be defined as (see Figure \@ref(fig:scheme)):
$$\begin{equation}
\label{support}
\Theta_1=\left\{\theta: L(\theta)>0\right\}.
\end{equation} (\#eq:support)$$
Thus, the efficiency of the MCMC algorithm would depend on $\Theta_1$ and the behaviour of $\Pi(\theta|x)$ on it.

By definition, $\Theta_1$ is closely connected to the set
$$\begin{equation}
\label{convexhull}
\mathcal{C}(\theta,x)=\left\{\sum_{i=1}^n\omega_ig(\theta,x_i) \Big|\omega\in \Delta_{n-1}\right\},
\end{equation} (\#eq:convexhull)$$
which is the closed convex hull of the $q$ dimensional vectors $G(x,\theta)=\{g(\theta,x_1),\ldots,g(\theta,x_n)\}$ in $\mathbb{R}^q$ (the pentagon in Figure \@ref(fig:scheme)). Suppose $\mathcal{C}^0(\theta,x)$ and $\partial \mathcal{C}(\theta,x)$ are respectively the interior and boundary of $\mathcal{C}(\theta,x)$. By construction, $\mathcal{C}(\theta,x)$ is a convex polytope. Since the data $x$ is fixed, the set $\mathcal{C}(\theta,x)$ is a set-valued function of $\theta$. For any $\theta\in\Theta$, the problem in \@ref(eq:eqn3) is feasible (i.e. $\mathcal{W}_{\theta}\ne\emptyset$) if and only if the origin of $\mathbb{R}^q$, denoted by $0_q$, is in $\mathcal{C}(\theta,x)$. That is, $\theta\in\Theta_1$ if and only if $0_q\in\mathcal{C}^0(\theta,x)$. It is not possible to determine $\Theta_1$ in general.
The only way is to check if, for any potential $\theta$, the origin $0_q$ is in $\mathcal{C}^0(\theta,x)$. There is no quick numerical way to check the latter either. Generally, an attempt is made to solve \@ref(eq:eqn3). The existence of such a solution indicates that $\theta\in\Theta_1$.

Examples show [@chaudhuriMondalTeng2017] that even for simple problems, $\Theta_1$ may not be a convex set. Designing an efficient random walk Markov chain Monte Carlo algorithm on a potentially non-convex support is an extremely challenging task. Unless the step sizes and the proposal distributions are adapted well to the proximity of the current position to the boundary of $\Theta_1$, the chain may repeatedly propose values outside the likelihood support and, as a result, converge very slowly. Adaptive algorithms like the one proposed by @haario1999adaptive do not tackle the non-convexity problem well.

Hamiltonian Monte Carlo methods solve well-known equations of motion from classical mechanics to propose new values of $\theta\in\Theta$. Numerical solutions of these equations of motion depend on the gradient of the log posterior. The norm of the gradient of the log empirical likelihood used in BayesEL procedures diverges near the boundary of $\Theta_1$. This property makes Hamiltonian Monte Carlo procedures very efficient for sampling a BayesEL posterior. It ensures that once in $\Theta_1$, the chain would rarely step outside the support and repeatedly sample from the posterior.

### A Review of Some Properties of the Gradient of Log Empirical Likelihood {#sec:elprop}

Various properties of the log-empirical likelihood have been discussed in the literature. However, the properties of its gradients with respect to the model parameters are less well known. Our main goal in this section is to review the behaviour of the gradients of the log-empirical likelihood on the support of the empirical likelihood.
We only state the relevant results here. The proofs of these results can be found in @chaudhuriMondalTeng2017.

Recall that (see Figure \@ref(fig:scheme)) the support $\Theta_1$ can only be specified by checking if $0_q\in\mathcal{C}^0(x,\theta_0)$ for each individual $\theta_0\in\Theta$. If for some $\theta_0\in\Theta$ the origin lies on the boundary of $\mathcal{C}(x,\theta_0)$, i.e. $0_q\in\partial \mathcal{C}(x,\theta_0)$, the problem in \@ref(eq:eqn3) is still feasible; however, $L\left(\theta_0\right)=0$ and the solution of \@ref(eq:eqn5) is not unique. Below we discuss how, under mild conditions, for any $\theta_0\in\Theta$ and a large subset $S\subseteq\partial \mathcal{C}(x,\theta_0)$, if $0_q\in S$, the absolute value of at least one component of the gradient of $\log\left(L\left(\theta_0\right)\right)$ would be large.

Before we proceed, we make the following assumptions:

1. $\Theta$ is an open set. []{#A0 label="A0"}

2. $g$ is a continuously differentiable function of $\theta$ in
   $\Theta$, $q \le d$, and $\Theta_1$ is non-empty. []{#A1 label="A1"}

3. The sample size $n > q$. The matrix $G(x, \theta)$ has full row rank
   for any $\theta \in \Theta$.

4. For any fixed $x$, let $\nabla g(x_i,\theta)$ be the $q \times d$
   Jacobian matrix for any $\theta \in \Theta$. Suppose
   $w=(w_1,\ldots, w_n)\in\Delta_{n-1}$ and there are at least $q$
   elements of $w$ that are greater than $0$. Then, for any
   $\theta \in \Theta$, the matrix
   $\sum_{i=1}^n w_i \nabla g(x_i,\theta)$ has full row rank.

Under the above assumptions, several results about the log empirical likelihood and its gradient can be deduced.

First of all, since the properties of the gradient of the log empirical likelihood at the boundary of the support are of interest, some topological properties of the support need to be investigated.
Under the +standard topology of $\mathbb{R}^q$, since $\mathcal{C}(x,\theta)$ is a +convex polytope with a finite number of faces and extreme points, using +the smoothness of $g$, it is easy to see that, for any +$\theta_0\in\Theta_1$ one can find a real number $\delta>0$, such that +the open ball centred at $\theta_0$ with radius $\delta$ is contained in +$\Theta_1$. That is, $\Theta_1$ is an open subset of $\Theta$. + +Now, since $\Theta_1$ is an open set, the boundary $\partial\Theta_1$ of +$\Theta_1$ is not contained in $\Theta_1$. Let $\theta^{(0)}$ lie within +$\Theta$ and on the boundary of $\Theta_1$ (i.e. $\partial\Theta_1$). +Then it follows that the primal problem \@ref(eq:eqn3) is feasible at +$\theta^{(0)}$ and $0_q$ lies on the boundary of +$\mathcal{C}(x,\theta^{(0)})$ (i.e. +$\partial \mathcal{C}(x,\theta^{(0)})$). + +Our main objective is to study the utility of Hamiltonian Monte Carlo +methods for drawing samples from a BayesEL posterior. The sampling +scheme will produce a sequence of sample points in +$\theta^{(k)}\in\Theta_1$ (see Figure +\@ref(fig:scheme)). It would +be efficient as long as $\log L\left(\theta^{(k)}\right)$ is large. The +sampling scheme could potentially become inefficient if some +$\theta^{(k)}$ is close to the boundary $\partial\Theta_1$. Thus, it is +sufficient to consider the properties of the log empirical likelihood +and its gradient along such a sequence converging to a point +$\theta^{(0)}\in\partial\Theta_1$. + +From the discussion above it is evident that when +$\theta^{(0)} \in \partial \Theta_1$ the problem in \@ref(eq:eqn3) is +feasible but the likelihood $L\left(\theta^{(0)}\right)$ will always be +zero and \@ref(eq:eqn5) will not have a unique solution. 
Since $\mathcal{C}(x,\theta^{(0)})$ is a polytope and $0_q$ lies on one of its faces, there exists a subset $\mathcal{I}_0$ of the observations such that $0_q$ belongs to the interior of the convex hull generated by $g(x_i,\theta^{(0)})$, $i \in \mathcal{I}_0$ (in Figure \@ref(fig:scheme), $\mathcal{I}_0=\{x_4,x_5\}$). It follows from the supporting hyperplane theorem [@boyd2004convex] that there exists a unit vector $a\in \mathbb{R}^q$ such that
$$a^{\text{\tiny T}} g(x_i, \theta^{(0)}) =0 \quad \mbox{for} \quad i \in \mathcal{I}_0, \qquad\text{and}\qquad a^{\text{\tiny T}} g(x_{i}, \theta^{(0)}) >0 \quad \mbox{for} \quad i \in \mathcal{I}_0^c.$$
Some algebraic manipulation then shows that any $\omega\in\mathcal{W}_{\theta^{(0)}}$ ($\mathcal{W}_{\theta}$ as defined in \@ref(eq:eqn3) with $\theta=\theta^{(0)}$) must satisfy[^2]
$$\omega_i=0 \quad \mbox{for} \quad i \in \mathcal{I}_0^c \qquad\text{and}\qquad \omega_i>0 \quad \mbox{for} \quad i \in \mathcal{I}_0.$$

It is well known that the solution of \@ref(eq:eqn5), i.e. $\hat{w}(\theta)$, is smooth for all $\theta\in\Theta_1$ [@qin1994empirical]. As $\theta^{(k)}$ converges to $\theta^{(0)}$, the properties of $\hat{w}(\theta^{(k)})$ need to be considered. To that end, we first make a specific choice of $\hat{w}(\theta^{(0)})$ by considering a restriction of problem \@ref(eq:eqn5) to $\mathcal{I}_0$.

$$\begin{equation}
\label{submax}
\hat\nu(\theta) =\mathop{\mathrm{\arg\!\max}}_{\nu\in\mathcal{V}_\theta} \prod_{i\in\mathcal{I}_0} \nu_i
\end{equation} (\#eq:submax)$$
where
$$\mathcal{V}_\theta=\left\{\nu: \sum_{i\in \mathcal{I}_0}\nu_i g(x_i,\theta)=0\right\}\cap\Delta_{|\mathcal{I}_0|-1}.$$
We now define
$$\hat \omega_i(\theta^{(0)}) = \hat\nu_i(\theta^{(0)}), \quad i \in \mathcal{I}_0 \quad \mbox{and} \quad \hat \omega_i(\theta^{(0)}) = 0, \quad i \in \mathcal{I}_0^c,$$
and
$$L(\theta^{(0)})= \prod_{i=1}^n \hat \omega_i(\theta^{(0)}).$$

Since $0_q$ is in the interior of the convex hull generated by $g(x_i,\theta^{(0)})$, $i \in \mathcal{I}_0$, the problem \@ref(eq:submax) has a unique solution. For each $\theta^{(k)}\in\Theta_1$, $\hat{\omega}(\theta^{(k)})$ is continuous and takes values in a compact set. Thus, as $\theta^{(k)}$ converges to $\theta^{(0)}$, $\hat{\omega}(\theta^{(k)})$ converges to a limit. Furthermore, this limit is a solution of \@ref(eq:eqn5) at $\theta^{(0)}$. However, counterexamples show [@chaudhuriMondalTeng2017] that the limit may not be the $\hat{\omega}(\theta^{(0)})$ defined above. That is, the vectors $\hat{\omega}(\theta^{(k)})$ do not extend continuously to the boundary $\partial\Theta_1$ as a whole. However, we can show that
$$\lim_{k\to\infty}\hat\omega_i(\theta^{(k)}) = \hat \omega_i(\theta^{(0)}) = 0 \quad \text{for all } i \in \mathcal{I}_0^c.$$
That is, the components of $\hat\omega(\theta^{(k)})$ which are zero in $\hat\omega(\theta^{(0)})$ are continuously extendable. Furthermore,
$$\lim_{k\to\infty}L(\theta^{(k)})=L(\theta^{(0)})=0.$$
That is, the likelihood is continuous at $\theta^{(0)}$.

This is not true, however, for the components $\hat{\omega}_i\left(\theta^{(k)}\right)$, $i\in\mathcal{I}_0$, for which $\hat{\omega}_i\left(\theta^{(k)}\right)> 0$.

Since the set $\mathcal{C}(x,\theta)$ is a convex polytope in $\mathbb{R}^q$, the maximum dimension of any of its faces is $q-1$, and such a face has exactly $q$ extreme points.[^3] Furthermore, any face of a smaller dimension can be expressed as an intersection of such $(q-1)$-dimensional faces.

In certain cases, however, the whole vector $\hat{\omega}\left(\theta^{(k)}\right)$ extends continuously to $\hat{\omega}\left(\theta^{(0)}\right)$. In order to argue that, we define
$$\begin{equation}
\label{Theta_2}
 \mathcal{C}(x_{\mathcal{I}},\theta) = \left\{\sum_{i \in \mathcal{I}} \omega_i g(x_i,\theta)\, \Big|\, \omega\in \Delta_{|\mathcal{I}|-1}\right\}
\end{equation} (\#eq:Theta-2)$$
and
$$\begin{equation}
 \partial\Theta_1^{(q-1)} = \Big\{ \theta: 0 \in \mathcal{C}^0(x_{\mathcal{I}},\theta) \mbox{ for some } \mathcal{I} \mbox{ s.t. } \mathcal{C}(x_{\mathcal{I}},\theta) \text{ has exactly } q \text{ extreme points} \Big\} \cap\partial\Theta_1.
\end{equation}$$

Thus $\partial\Theta_1^{(q-1)}$ is the set of all boundary points $\theta^{(0)}$ of $\Theta_1$ such that $0$ belongs to a $(q-1)$-dimensional face of the convex hull $\mathcal{C}(x,\theta^{(0)})$. Now for any $\theta^{(0)}\in \partial\Theta_1^{(q-1)}$, there is a unique set of weights $\nu\in\Delta_{|\mathcal{I}|-1}$ such that $\sum_{i\in\mathcal{I}}\nu_ig\left(x_i,\theta^{(0)}\right)=0$. That is, the set of feasible solutions of \@ref(eq:submax) is a singleton. This, together with the fact that $\hat{\omega}$ takes values in a compact set and an argument using convergent subsequences, implies that for any sequence $\theta^{(k)}\in\Theta_1$ converging to $\theta^{(0)}$, the whole vector $\hat{\omega}\left(\theta^{(k)}\right)$ converges to, and hence extends continuously to, $\hat{\omega}\left(\theta^{(0)}\right)$.

We now consider the behaviour of the gradient of the log empirical likelihood near the boundary of $\Theta_1$. First, note that for any $\theta \in \Theta_1$, the gradient of the log empirical likelihood is given by
$$\nabla \log L(\theta) = -n\sum_{i=1}^n \hat \omega_i(\theta) \hat{\lambda}(\theta)^{\text{\tiny T}} \nabla g(x_i,\theta),$$
where $\hat{\lambda}(\theta)$ is the estimated Lagrange multiplier satisfying the equation:

$$\begin{equation}
\label{eq:lagmult}
\sum_{i=1}^n \frac{g(x_i,\theta)}{\left\{1+ \hat\lambda(\theta)^{\text{\tiny T}} g(x_i,\theta) \right\}}=0.
\end{equation} (\#eq:lagmult)$$

Note that the gradient depends on the value of the Lagrange multiplier but not on the value of its gradient.

Now, under assumption A3, it follows that the gradient of the log empirical likelihood diverges on the set of all boundary points $\partial\Theta_1^{(q-1)}$. More specifically, one can show:

1. As $\theta^{(k)}\rightarrow \theta^{(0)}$,
   $\parallel\hat \lambda(\theta^{(k)})\parallel\to\infty$.

2. If $\theta^{(0)}\in \partial\Theta_1^{(q-1)}$, then as
   $\theta^{(k)}\rightarrow \theta^{(0)}$,
   ${\parallel \nabla \log L(\theta^{(k)}) \parallel}\to \infty$.

Therefore, it follows that at every boundary point $\theta^{(0)}$ of $\Theta_1$ such that $0$ belongs to one of the $(q-1)$-dimensional faces of $\mathcal{C}(x,\theta^{(0)})$, at least one component of the estimated Lagrange multiplier and of the gradient of the log empirical likelihood diverges to positive or negative infinity. The gradient of the negative log empirical likelihood represents the direction of the steepest increase of the negative log empirical likelihood. Since the value of the log empirical likelihood should typically be highest around the center of the support $\Theta_1$, the gradient near the boundary of $\Theta_1$ should point towards its center.
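The formula above can be made concrete with a short R sketch. This is our own illustration, not package code; it uses the standard representation of the optimal weights, $\hat\omega_i(\theta) = 1/[n\{1+\hat\lambda(\theta)^{\text{\tiny T}}g(x_i,\theta)\}]$, which we assume here.

``` r
# Illustrative only (not elhmc internals): evaluate the gradient of the log
# empirical likelihood from a candidate Lagrange multiplier.
# G is the n x q matrix with rows g(x_i, theta); dG is a list of the n
# q x d Jacobians nabla g(x_i, theta).
el_weights <- function(lambda, G) {
  as.vector(1 / (nrow(G) * (1 + G %*% lambda)))
}

grad_log_el <- function(lambda, G, dG) {
  w <- el_weights(lambda, G)
  terms <- Map(function(wi, Di) wi * drop(crossprod(Di, lambda)), w, dG)
  -nrow(G) * Reduce(`+`, terms)   # a d-vector
}

# Mean-type example: g(x_i, theta) = theta - x_i, so each Jacobian is I_2.
v <- rbind(c(1, 1), c(1, 0), c(1, -1), c(0, -1),
           c(-1, -1), c(-1, 0), c(-1, 1), c(0, 1))
theta <- c(0, 0)
G <- sweep(-v, 2, theta, `+`)                       # rows are theta - v_i
dG <- replicate(nrow(v), diag(2), simplify = FALSE)
# At theta = colMeans(v) the multiplier is lambda = 0, the weights are all
# 1/n, and the gradient vanishes.
grad_log_el(c(0, 0), G, dG)   # zero vector
```

At points away from the maximiser, $\hat\lambda$ (and hence the gradient) would have to be obtained from \@ref(eq:lagmult); the sketch only shows how the gradient is assembled once $\hat\lambda$ is available.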
This property can be exploited to force candidate values of $\theta$ generated by HMC proposals to bounce back towards the interior of $\Theta_1$ from its boundaries, and in consequence to reduce the chance of them leaving the support.

### Hamiltonian Monte Carlo Sampling for Bayesian Empirical Likelihood {#sec:hmc}

The Hamiltonian Monte Carlo algorithm is a Metropolis algorithm in which the successive steps are proposed using Hamiltonian dynamics. One can visualise these dynamics as a cube sliding without friction under gravity in a bowl with a smooth surface. The total energy of the cube is the sum of the potential energy $U(\theta)$, defined by its position $\theta$ (in this case its height), and the kinetic energy $K(p)$, which is determined by its momentum $p$. The total energy of the cube is conserved, and it continues to slide up and down on the smooth surface of the bowl forever. The potential and the kinetic energy would, however, vary with the position of the cube.

In order to use Hamiltonian dynamics to sample from the posterior $\Pi\left(\theta\mid x\right)$ we set our potential and kinetic energy as follows:
$$U(\theta)=-\log\Pi(\theta|x)\quad\text{and}\quad K(p)=\frac{1}{2}p^TM^{-1}p.$$
Here, the momentum vector $p=\left(p_1,p_2,\ldots,p_d\right)$ is an entirely artificial construct, usually generated from a $N(0, M)$ distribution. Most often the covariance matrix $M$ is chosen to be a diagonal matrix with diagonal $(m_1,m_2,\ldots,m_d)$, in which case each $m_i$ is interpreted as the mass of the $i$th parameter. The Hamiltonian of the system is the total energy
$$\begin{equation}
\label{hamiltonian dynamics}
\mathcal{H}(\theta,p)=U(\theta)+K(p).
\end{equation} (\#eq:hamiltonian-dynamics)$$

In Hamiltonian mechanics, the variation of the position $\theta$ and the momentum $p$ with time $t$ is determined by the partial derivatives of $\mathcal{H}$ with respect to $p$ and $\theta$ respectively.
In particular, the motion is governed by the pair of so-called Hamiltonian equations:
$$\begin{eqnarray}
\frac{d\theta}{dt}&=&\frac{\partial \mathcal{H}}{\partial p}=M^{-1}p, \label{PDE1}\\
\frac{dp}{dt}&=&-\frac{\partial\mathcal{ H}}{\partial \theta}=-\frac{\partial U(\theta)}{\partial \theta}.\label{PDE2}
\end{eqnarray} (\#eq:PDE1)$$
It is easy to show [@neal2011mcmc] that Hamiltonian dynamics is reversible, conserves the Hamiltonian, and preserves volume, which makes it suitable for MCMC sampling schemes.

In HMC we propose successive states by solving the pair of Hamiltonian equations in \@ref(eq:PDE1). Unfortunately, they cannot be solved analytically (except, of course, in a few simple cases), and they must be approximated numerically at discrete time points. There are several ways to numerically approximate these two equations in the literature [@leimkuhler2004simulating]. For the purpose of MCMC sampling, we need a method that is reversible and volume-preserving.

Leapfrog integration [@birdsall2004plasma] is one such method for numerically integrating the pair of Hamiltonian equations. In this method, a step size $\epsilon$ for the time variable $t$ is first chosen. Given the values of $\theta$ and $p$ at the current time point $t$ (denoted here by $\theta(t)$ and $p(t)$ respectively), the leapfrog updates the position and the momentum at time $t+\epsilon$ as follows:
$$\begin{eqnarray}
p\left(t+\frac{\epsilon}{2}\right)&=&p(t)-\frac{\epsilon}{2}\frac{\partial U(\theta(t))}{\partial\theta},\label{leapfrog1}\\
\theta(t+\epsilon)&=&\theta(t)+\epsilon M^{-1}p\left(t+\frac{\epsilon}{2}\right),\label{leapfrog2}\\
p(t+\epsilon)&=&p\left(t+\frac{\epsilon}{2}\right)-\frac{\epsilon}{2}\frac{\partial U(\theta(t+\epsilon))}{\partial\theta}.\label{leapfrog3}
\end{eqnarray} (\#eq:leapfrog1)$$

Theoretically, due to its symmetry, the leapfrog integration is reversible and preserves volume exactly.
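The three leapfrog updates above can be sketched in a few lines of R. This is our own illustration, not the internal code of `elhmc`; the function and argument names are ours.

``` r
# Minimal leapfrog sketch of the updates above.  grad_U returns the gradient
# of the potential U(theta) = -log posterior; Minv is the inverse mass matrix.
leapfrog <- function(theta, p, grad_U, epsilon, n_steps, Minv) {
  p <- p - (epsilon / 2) * grad_U(theta)            # initial half step in p
  for (s in seq_len(n_steps)) {
    theta <- theta + epsilon * drop(Minv %*% p)     # full step in theta
    if (s < n_steps) p <- p - epsilon * grad_U(theta)
  }
  p <- p - (epsilon / 2) * grad_U(theta)            # final half step in p
  list(theta = theta, p = p)
}

# For a standard normal target, U(theta) = theta^2 / 2 and grad_U is the
# identity.  The Hamiltonian is nearly conserved along the trajectory.
H <- function(theta, p) sum(theta^2) / 2 + sum(p^2) / 2
out <- leapfrog(theta = 1, p = 0.5, grad_U = function(th) th,
                epsilon = 0.1, n_steps = 10, Minv = diag(1))
abs(H(out$theta, out$p) - H(1, 0.5))   # small discretisation error
```

Because the map is symmetric, running it again from `(out$theta, -out$p)` returns (up to floating-point error) to the starting state, which is the reversibility used by the accept-reject correction discussed next.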
However, because of numerical error, the Hamiltonian is not exactly conserved, and the discretised dynamics does not leave the target distribution exactly invariant. This is similar to the Langevin-Hastings algorithm [@besag2004markov], which is a special case of HMC. Fortunately, this lack of invariance is easily corrected: the accept-reject step in the MCMC procedure ensures that the chain converges to the correct posterior.

At the beginning of each iteration of the HMC algorithm, the momentum vector $p$ is randomly sampled from the $N(0,M)$ distribution. Starting with the current state $(\theta,p)$, the leapfrog integrator described above is used to simulate Hamiltonian dynamics for $T$ steps with a step size of $\epsilon$. At the end of this $T$-step trajectory, the momentum $p$ is negated so that the Metropolis proposal is symmetric, and the proposed state $(\theta^*,p^*)$ is accepted with probability
$$\min\{1,\exp(-\mathcal{H}(\theta^*,p^*)+\mathcal{H}(\theta,p))\}.$$

The gradient of the log-posterior used in the leapfrog is the sum of the gradient of the log empirical likelihood and the gradient of the log prior. The prior is user-specified, and it is hypothetically possible that even though at least one component of the gradient of the log empirical likelihood diverges at the boundary $\partial\Theta_1$, the log prior gradient behaves in a way that nullifies this effect, so that the log posterior gradient remains finite over the closure of $\Theta_1$. We make the following assumption on the prior mainly to avoid this possibility (see @chaudhuriMondalTeng2017 for more details). The assumption is stated in terms of the inequality

\begin{equation}
\liminf_{k\to\infty} \frac{ \log\pi(\theta^{(k-1)}) - \log\pi(\theta^{(k)}) }{ \log L(\theta^{(k-1)} ) - \log L(\theta^{(k)} ) } \ge b(n, \theta^{(0)}).
(\#eq:liminf2)
\end{equation}

- Consider a sequence $\{\theta^{(k)} \}$, $k=1, 2,\ldots$, of points
  in $\Theta_1$ such that $\theta^{(k)}$ converges to a boundary point
  $\theta^{(0)}$ of $\Theta_1$.
  Assume that $\theta^{(0)}$ lies within
  $\Theta$ and that $L(\theta^{(k)})$ strictly decreases to
  $L(\theta^{(0)})$. Then inequality \@ref(eq:liminf2) is assumed to hold
  for some constant $b(n, \theta^{(0)}) > -1$.

The assumption implies that near the boundary of the support, the main contribution to the gradient of the log-posterior, with respect to any parameter appearing in the argument of the estimating equations, comes from the corresponding gradient of the log empirical likelihood. This is in most cases expected, especially if the sample size is large: for a large sample size, the log-likelihood should be the dominant term in the log-posterior, and we are simply assuming here that the gradients behave the same way. It also ensures that at the boundary the gradients of the log-likelihood and the log prior do not cancel each other, which is crucial for the proposed Hamiltonian Monte Carlo to work.

Under these assumptions, @chaudhuriMondalTeng2017 show that the gradient of the log-posterior diverges along almost every sequence as the parameter values approach the boundary $\partial \Theta_1$ from the interior of the support. More specifically, they prove that:

$$\begin{equation}
\label{eq:postdiv}
\Bigl\| \nabla \log \pi(\theta^{(k)} \mid x) \Bigr\| \rightarrow \infty, \hspace{.1in} \mbox{ as } \hspace{.1in} k \rightarrow \infty.
\end{equation} (\#eq:postdiv)$$

Since the $(q-1)$-dimensional faces of $\mathcal{C}(x,\theta^{(0)})$ have larger volume than its lower-dimensional faces (see Figure \@ref(fig:scheme)), a random sequence of points from the interior to the boundary converges to a point on $\partial \Theta_1^{(q-1)}$ with probability $1$. Thus, under our assumptions, the gradient of the log-posterior diverges to infinity for these sequences with high probability. The lower-dimensional faces of the convex hull (a polytope) are intersections of $(q-1)$-dimensional faces.
It is not clear, however, whether the norm of the gradient of the posterior diverges on those faces, although we conjecture that it does. Even if the conjecture is not true, it is clear from the setup that the sampler would rarely move to the region where the origin belongs to the lower-dimensional faces of the convex hull.

As has been pointed out above, the gradient vector always points towards the mode of the posterior. From our results, the gradient is large near the support boundary, so whenever the HMC sampler approaches the boundary, the large gradient reflects it back towards the interior of the support rather than letting it leave. The leapfrog parameters can be controlled to increase the efficiency of the sampling.

## Package description {#sec:package}

The main function of the package is `ELHMC`. It draws samples from an empirical likelihood Bayesian posterior of the parameter of interest using Hamiltonian Monte Carlo, once the estimating equations involving the parameters, the prior distribution of the parameters, and the gradients of the estimating equations and of the log prior are specified. Some other parameters controlling the HMC process can also be specified.

Suppose that the data set consists of observations $x = \left( x_1, ..., x_n \right)$ where each $x_i$ is a vector of length $p$ and follows a probability distribution $F$ of family $\mathcal{F}_{\theta}$. Here $\theta = \left(\theta_1,...,\theta_d\right)$ is the $d$-dimensional parameter of interest associated with $F$. Suppose there exist smooth functions $g\left(\theta, x_i\right) = \left(g_1\left(\theta, x_i\right)\right.$, $\ldots$, $\left. g_q\left(\theta, x_i\right)\right)^T$ which satisfy $E_F\left[g\left(\theta,x_i\right)\right] = 0$. As we have explained above, `ELHMC` is used to draw samples of $\theta$ from its posterior defined by an empirical likelihood.


```{r arguments, echo=FALSE}
knitr::kable(
  data.frame(
    Argument = c(
      "`initial`",
      "`data`",
      "`fun`",
      "`dfun`",
      "`prior`",
      "`dprior`",
      "`n.samples`",
      "`lf.steps`",
      "`epsilon`",
      "`p.variance`",
      "`tol`",
      "`detailed`",
      "`print.interval`",
      "`plot.interval`",
      "`which.plot`",
      "`FUN`",
      "`DFUN`"
    ),
    Description = c(
      "A vector containing the initial values of the parameter",
      "A matrix containing the data",
      "The estimating function $g$. It takes in a parameter vector `params` as the first argument and a data point vector `x` as the second argument. This function returns a vector.",
      "A function that calculates the gradient of the estimating function $g$. It takes in a parameter vector `params` as the first argument and a data point vector `x` as the second argument. This function returns a matrix.",
      "A function with one argument `x` that returns the log joint prior density of the parameters of interest.",
      "A function with one argument `x` that returns the gradients of the log densities of the parameters of interest.",
      "Number of samples to draw",
      "Number of leapfrog steps in each Hamiltonian Monte Carlo update (defaults to $10$).",
      "The leapfrog step size (defaults to $0.05$).",
      "The covariance matrix of the multivariate normal distribution used to generate the initial values of the momentum `p` in Hamiltonian Monte Carlo. This can also be a single numeric value or a vector (defaults to $0.1$).",
      "The tolerance used in the empirical likelihood computation",
      "If this is set to `TRUE`, the function will return a list with extra information.",
      "The frequency at which the results are printed on the terminal. Defaults to 1000.",
      "The frequency at which the drawn samples are plotted. The last half of the samples drawn so far is plotted after every `plot.interval` steps. The acceptance rate is also plotted. Defaults to 0, which means no plot.",
      "The vector of parameters to be plotted after each `plot.interval`. 
Defaults to `NULL`, which means no plot.",
      "The same as `fun` but takes in a matrix `X` instead of a vector `x` and returns a matrix so that `FUN(params, X)[i, ]` is the same as `fun(params, X[i, ])`. Only one of `FUN` and `fun` should be provided. If both are, then `fun` is ignored.",
      "The same as `dfun` but takes in a matrix `X` instead of a vector `x` and returns an array so that `DFUN(params, X)[, , i]` is the same as `dfun(params, X[i, ])`. Only one of `DFUN` and `dfun` should be provided. If both are, then `dfun` is ignored."
    ),
    check.names = FALSE
  ),
  caption = "Arguments for function `ELHMC`",
  label = "T1",
  col.names = NULL
)
```


Table \@ref(tab:T1) lists the full set of arguments of `ELHMC`. Arguments `data` and `fun` define the problem: they are the data set $x$ and the collection of smooth functions in $g$. The user-specified starting point for $\theta$ is given in `initial`, whereas `n.samples` is the number of samples of $\theta$ to be drawn. The gradient matrix of $g$ with respect to the parameter $\theta$ (i.e. $\nabla_{\theta}g$) has to be specified in `dfun`; at the moment the function does not compute the gradient numerically by itself. The argument `prior` specifies the log joint prior density of $\theta_1,\ldots,\theta_d$, which for the purpose of this description we denote by $\log\pi\left(\theta\right)$. The gradient of the log prior is specified in `dprior`; this function returns a vector containing the values of $\frac{\partial}{\partial \theta_1}\log\pi\left(\theta\right),\ldots,\frac{\partial}{\partial\theta_d}\log\pi\left(\theta\right)$. Finally, the arguments `epsilon`, `lf.steps`, `p.variance` and `tol` are hyper-parameters which control the Hamiltonian Monte Carlo algorithm.

The arguments `print.interval`, `plot.interval`, and `which.plot` can be used to tune the HMC sampler. They allow printing and plotting of the sampled values at specified intervals while the code is running.
The argument `which.plot` allows the user to plot only the variables whose convergence needs to be checked.

Given the data and a value of $\theta$, `ELHMC` computes the optimal weights using the `el.test` function from the `emplik` package [@{Zhou.:2014nr}]. `el.test` provides $\hat{\lambda}\left(\theta^{(k)}\right)$, from which the gradient of the log empirical likelihood can be computed.

If $\theta\not\in\Theta_1$, i.e. if problem \@ref(eq:eqn5) is not feasible, then `el.test` converges to weights which are all close to zero and do not sum to one. Furthermore, the norm of $\hat{\lambda}\left(\theta^{(k)}\right)$ will be large. In such cases, the empirical likelihood is zero. This means that, whenever the optimal weights are computed, we need to check whether they sum to one (within numerical error).

The function `ELHMC` returns a list. If the argument `detailed` is set to `FALSE`, the list contains the samples of the parameters of interest $\theta$ and the Monte Carlo acceptance rate, as listed in Table \@ref(tab:returned). If `detailed` is set to `TRUE`, additional information, such as the trajectories of $\theta$ and the momentum, is included in the returned list (see Table \@ref(tab:detailedreturned)).

At the moment `ELHMC` only allows a diagonal covariance matrix for the momentum $p$. The default values for the step size `epsilon` and the number of leapfrog steps `lf.steps` are $0.05$ and $10$ respectively. For a specific problem they need to be determined by trial and error, using the output produced via the `plot.interval` and `print.interval` arguments.
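The feasibility check described above can be illustrated with a small self-contained sketch. This is our own code, not the internals of `elhmc` or `emplik`; for the mean-type estimating function $g(x_i,\theta)=\theta-x_i$ it solves \@ref(eq:lagmult) for $\lambda$ by a damped Newton iteration and then tests whether the implied weights are positive and sum to one. Function names and tolerances are our own choices.

``` r
# Illustrative sketch only (not elhmc internals).
el_lambda <- function(G, maxit = 50, tol = 1e-10) {
  lambda <- rep(0, ncol(G))
  for (it in seq_len(maxit)) {
    denom <- as.vector(1 + G %*% lambda)
    f <- colSums(G / denom)                  # left-hand side of (eq:lagmult)
    if (sqrt(sum(f^2)) < tol) break
    J <- -crossprod(G / denom)               # Jacobian of f with respect to lambda
    step <- tryCatch(solve(J, f), error = function(e) NULL)
    if (is.null(step)) break                 # J numerically singular: give up
    repeat {                                 # damping keeps denominators positive
      cand <- lambda - step
      if (all(1 + G %*% cand > 1 / (2 * nrow(G)))) break
      step <- step / 2
    }
    lambda <- cand
  }
  lambda
}

feasible <- function(theta, x, tol = 1e-6) {
  G <- sweep(-x, 2, theta, `+`)              # rows are g(x_i, theta) = theta - x_i
  w <- 1 / (nrow(G) * (1 + G %*% el_lambda(G)))
  all(w > 0) && abs(sum(w) - 1) < tol
}

v <- rbind(c(1, 1), c(1, 0), c(1, -1), c(0, -1),
           c(-1, -1), c(-1, 0), c(-1, 1), c(0, 1))
feasible(c(0, 0), v)   # origin lies inside the convex hull of the data
feasible(c(2, 2), v)   # a point outside the hull: the weights fail the check
```

At a solution of \@ref(eq:lagmult) the weights sum to one automatically; outside the support the iteration cannot reach such a solution and the summed weights fall well below one, which is exactly the signal the package needs to detect.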

```{r returned, echo=FALSE}
knitr::kable(
  data.frame(
    Element = c("`samples`", "`acceptance.rate`", "`call`"),
    Description = c(
      "A matrix containing the parameter samples",
      "The acceptance rate",
      "The matched call"
    ),
    check.names = FALSE
  ),
  caption = "Elements of the list returned by `ELHMC` if `detailed=FALSE`",
  label = "returned",
  col.names = NULL
)
```
```{r detailedreturned, echo=FALSE}
knitr::kable(
  data.frame(
    Element = c("`samples`", "`acceptance.rate`", "`proposed`", "`acceptance`", "`trajectory`", "`call`"),
    Description = c(
      "A matrix containing the parameter samples",
      "The acceptance rate",
      "A matrix containing the proposed values at the `n.samples - 1` Hamiltonian Monte Carlo updates",
      "A vector of `TRUE/FALSE` values indicating whether each proposed value was accepted",
      "A list with 2 elements, `trajectory.q` and `trajectory.p`. These are lists of matrices containing the position and momentum values along the trajectory in each Hamiltonian Monte Carlo update.",
      "The matched call"
    ),
    check.names = FALSE
  ),
  caption = "Elements of the list returned by `ELHMC` if `detailed=TRUE`",
  label = "detailedreturned",
  col.names = NULL
)
```

## Examples {#sec:examples}

In this section, we present two examples of the usage of the package. Both examples in some sense supplement the conditions considered by @chaudhuriMondalTeng2017. In each case, it is seen that the function can sample from the resulting empirical likelihood-based posterior quite efficiently.

### Sample the mean of a simple data set

In the first example, suppose the data set consists of eight data points $v = \left(v_1,...,v_8\right)$:

``` r
R> v <- rbind(c(1, 1), c(1, 0), c(1, -1), c(0, -1),
+ c(-1, -1), c(-1, 0), c(-1, 1), c(0, 1))
R> print(v)
     [,1] [,2]
[1,]    1    1
[2,]    1    0
[3,]    1   -1
[4,]    0   -1
[5,]   -1   -1
[6,]   -1    0
[7,]   -1    1
[8,]    0    1
```

The parameters of interest are the mean $\theta = \left(\theta_1, \theta_2\right)$.
Since $E\left[\theta - v_i\right] = 0$, the smooth function is $g = \theta - v_i$, with $\nabla_\theta g = \left(\left(1, 0\right), \left(0, 1\right)\right)$:

``` r
Function: fun
R> g <- function(params, x) {
+ params - x
+ }

Function: dfun
R> dlg <- function(params, x) {
+ rbind(c(1, 0), c(0, 1))
+ }
```

Functions `g` and `dlg` are supplied to the arguments `fun` and `dfun` in `ELHMC`. These two functions must have `params` as the first argument and `x` as the second. `params` represents a sample of $\theta$, whereas `x` represents a data point $v_i$, i.e. a row of the matrix `v`. `fun` should return a vector and `dfun` a matrix whose $\left(i, j\right)$ entry is $\partial g_i/\partial\theta_j$.

We assume that $\theta_1$ and $\theta_2$ have independent standard normal priors. Next, we define the functions `pr` and `dpr`, which calculate the log prior density and the gradient of the log prior density, as follows:

``` r
Function: prior
R> pr <- function(x) {
+ -.5*(x[1]^2+x[2]^2)-log(2*pi)
+ }
Function: dprior
R> dpr <- function(x) {
+ -x
+ }
```

Functions `pr` and `dpr` are assigned to `prior` and `dprior` in `ELHMC`. Both must take a single argument `x`; `pr` returns the log joint prior density (a scalar) and `dpr` returns its gradient, a vector of the same length as $\theta$.

We can now use `ELHMC` to draw samples of $\theta$.
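Before calling `ELHMC`, it can be useful to verify that the user-supplied functions return objects of conforming dimensions. The following helper is hypothetical, i.e. our own and not part of the package:

``` r
# Hypothetical helper (not part of elhmc): check that fun returns a length-q
# vector and dfun a q x d matrix for a single data point.
check_el_inputs <- function(theta, x_row, fun, dfun) {
  gval <- fun(theta, x_row)
  dval <- dfun(theta, x_row)
  stopifnot(is.numeric(gval),
            is.matrix(dval),
            nrow(dval) == length(gval),   # one row per estimating equation
            ncol(dval) == length(theta))  # one column per parameter
  invisible(TRUE)
}

g   <- function(params, x) params - x
dlg <- function(params, x) rbind(c(1, 0), c(0, 1))
check_el_inputs(c(0.9, 0.95), c(1, 1), g, dlg)
```

A mismatch between the dimensions of `fun` and `dfun` is one of the more common sources of hard-to-diagnose errors in user-supplied estimating equations.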
Let us draw 1000 samples, with starting point $\left(0.9, 0.95\right)$, using 12 leapfrog steps with step size 0.06 for both $\theta_1$ and $\theta_2$ in each Hamiltonian Monte Carlo update:

``` r
R> library(elhmc)
R> set.seed(476)
R> thetas <- ELHMC(initial = c(0.9, 0.95), data = v, fun = g, dfun = dlg,
+ prior = pr, dprior = dpr, n.samples = 1000,
+ lf.steps = 12, epsilon = 0.06, detailed = TRUE)
```

We extract and visualise the distribution of the samples using a boxplot (Figure \@ref(fig:theta)):

``` r
R> boxplot(thetas$samples, names = c(expression(theta[1]), expression(theta[2])))
```

Since we set `detailed = TRUE`, we also have data on the trajectories of $\theta$ and of the momentum $p$. They are stored in the element `trajectory` of `thetas`, which is a list with two elements named `trajectory.q` and `trajectory.p`, holding the trajectories of $\theta$ and of $p$ respectively. `trajectory.q` and `trajectory.p` are both lists with elements `1`, \..., `n.samples - 1`, each a matrix containing the trajectory of $\theta$ (`trajectory.q`) or $p$ (`trajectory.p`) at one Hamiltonian Monte Carlo update.

We illustrate by extracting the trajectories of $\theta$ at the first update and plotting them (Figure \@ref(fig:trajectoryeg1)):

``` r
R> q <- thetas$trajectory$trajectory.q[[1]]
R> plot(q, xlab = expression(theta[1]), ylab = expression(theta[2]),
+ xlim = c(-1, 1), ylim = c(-1, 1), cex = 1, pch = 16)
R> points(v[,1], v[,2], type = "p", cex = 1.5, pch = 16)
R> abline(h = -1); abline(h = 1); abline(v = -1); abline(v = 1)
R> arrows(q[-nrow(q), 1], q[-nrow(q), 2], q[-1, 1], q[-1, 2],
+ length = 0.1, lwd = 1.5)
```


```{r theta, echo=FALSE, fig.cap="Posterior distribution of $\\theta_1$ and $\\theta_2$ samples.", out.width="48%", fig.show='hold'}
knitr::include_graphics("figures/elhmc-008.png")
```

```{r trajectoryeg1, echo=FALSE, fig.cap="Trajectory of $\\theta$ during the first Monte Carlo update.", out.width="48%", fig.show='hold'}
knitr::include_graphics("figures/elhmc-010.png")
```


The special feature of this example lies in the choice of the data points in $v$. @chaudhuriMondalTeng2017 show that the chain will reflect if the one-dimensional boundaries of the convex hull (in this case the unit square) contain two observations each, which happens with probability one for continuous distributions. In this example, however, some one-dimensional boundaries contain more than two observations. Nevertheless, we can see that the HMC method works very well here.

### Logistic regression with an additional constraint {#ex:2}

In this example, we consider a constrained logistic regression of one binary variable on another, where the expectation of the response is known. The frequentist estimation problem using empirical likelihood was considered by @chaudhuri_handcock_rendall_2008.
It has been shown that the empirical likelihood-based formulation has a major practical advantage over the fully parametric formulation. Below we consider a Bayesian extension of the proposed empirical likelihood-based formulation and use `ELHMC` to sample from the resulting posterior.

```{r bhps, echo=FALSE}
knitr::kable(
  data.frame(
    ` ` = c("$y=0$", "$y=1$"),
    `$x=0$` = c(5903, 230),
    `$x=1$` = c(5157, 350),
    check.names = FALSE
  ),
  caption = "The dataset used in this example.",
  label = "bhps"
)
```

The data set $v$ consists of $n$ observations of two binary variables, arranged in two columns $X$ and $Y$. In the $i$th row, $y_i$ indicates whether a woman gave birth between time $t - 1$ and $t$, while $x_i$ indicates whether she had at least one child at time $t - 1$. The data can be found in Table \@ref(tab:bhps) above. In addition, it was known that the prevalent general fertility rate in the population was $0.06179$.[^4]

We are interested in fitting a logistic regression model to the data with $X$ as the independent variable and $Y$ as the dependent variable. However, we also would like to constrain the sample general fertility rate to its value in the population. The logistic regression model takes the form:

$$P \left(Y = 1 | X = x\right) = \frac{\exp\left(\beta_0 + \beta_1 x\right)}{1 + \exp\left(\beta_0 + \beta_1 x\right)}.$$

From the model, using conditions similar to zero-mean residuals and exogeneity, it is clear that:

$$\begin{equation*}
E\left[y_i - \frac{\exp\left(\beta_0 + \beta_1 x_i\right)}{1 + \exp\left(\beta_0 + \beta_1 x_i\right)}\right] = 0,\quad
E\left[x_i\left\{y_i - \frac{\exp\left(\beta_0 + \beta_1 x_i\right)}{1 + \exp\left(\beta_0 + \beta_1 x_i\right)}\right\}\right] = 0.
\end{equation*}$$

Furthermore, from the definition of the general fertility rate, we get:

$$E\left[y_i - 0.06179\right] = 0.$$

Following @chaudhuri_handcock_rendall_2008, we define the estimating equations $g$ as follows:

$$g \left(\beta, v_i\right) = \begin{bmatrix}
y_i - \frac{\exp\left(\beta_0 + \beta_1 x_i\right)}{1 + \exp\left(\beta_0 + \beta_1 x_i\right)} \\
x_i\left[y_i - \frac{\exp\left(\beta_0 + \beta_1 x_i\right)}{1 + \exp\left(\beta_0 + \beta_1 x_i\right)}\right] \\
y_i - 0.06179 \\
\end{bmatrix}$$

The gradient of $g$ with respect to $\beta$ is given by:

$$\nabla_{\beta}g = \begin{bmatrix}
\frac{-\exp\left(\beta_0 + \beta_1 x_i\right)}{\left(\exp\left(\beta_0 + \beta_1 x_i\right) + 1\right)^2} & \frac{-\exp\left(\beta_0 + \beta_1 x_i\right) x_i}{\left(\exp\left(\beta_0 + \beta_1 x_i\right) + 1\right)^2} \\
\frac{-\exp\left(\beta_0 + \beta_1 x_i\right) x_i}{\left(\exp\left(\beta_0 + \beta_1 x_i\right) + 1\right)^2} & \frac{-\exp\left(\beta_0 + \beta_1 x_i\right) x_i^2}{\left(\exp\left(\beta_0 + \beta_1 x_i\right) + 1\right)^2} \\
0 & 0 \\
\end{bmatrix}$$

In R, we create functions `g` and `dg` to represent $g$ and $\nabla_{\beta}g$:

``` r
Function: fun
R> g <- function(params, X) {
+ result <- matrix(0, nrow = nrow(X), ncol = 3)
+ a <- exp(params[1] + params[2] * X[, 1])
+ a <- a / (1 + a)
+ result[, 1] <- X[, 2] - a
+ result[, 2] <- (X[, 2] - a) * X[, 1]
+ result[, 3] <- X[, 2] - 0.06179
+ result
+ }
Function: dfun
R> dg <- function(params, X) {
+ result <- array(0, c(3, 2, nrow(X)))
+ a <- exp(params[1] + params[2] * X[, 1])
+ a <- -a / (a + 1) ^ 2
+ result[1, 1, ] <- a
+ result[1, 2, ] <- result[1, 1, ] * X[, 1]
+ result[2, 1, ] <- result[1, 2, ]
+ result[2, 2, ] <- result[1, 2, ] * X[, 1]
+ result[3, , ] <- 0
+ result
+ }
```

We choose independent $N\left(0, 100\right)$ priors for both $\beta_0$ and $\beta_1$:

``` r
Function: prior
R> pr <- function(x) {
+ - 0.5*t(x)%*%x/10^4 - log(2*pi*10^4)
}
+R> dpr <- function(x) {
++   -x * 10 ^ (-4)
++ }
+```
+
+where `pr` computes the log prior for $\beta$ and `dpr` its gradient.
+
+Our goal is to use `ELHMC` to draw samples of
+$\beta = \left( \beta_0, \beta_1 \right)$ from their resulting posterior
+based on empirical likelihood.
+
+We start our sampling from $(-3.2,0.55)$ and use two stages of sampling.
+In the first stage, $50$ points are sampled with $\epsilon=0.001$,
+$T=15$, and the momentum generated from a $N(0,0.02\cdot I_2)$
+distribution. The acceptance rate at this stage is very high, but it is
+designed to find a good starting point for the second stage, where the
+acceptance rate can be easily controlled.
+
+``` r
+R> bstart.init <- c(-3.2, 0.55)
+R> betas.init <- ELHMC(initial = bstart.init, data = data, fun = g, dfun = dg,
++   n.samples = 50, prior = pr, dprior = dpr, epsilon = 0.001,
++   lf.steps = 15, detailed = TRUE, p.variance = 0.2)
+```
+
+
+```{r density, echo=FALSE, fig.cap="Contour plot of the non-normalised log posterior with the HMC sampling path (left) and density plot (right) of the samples for the constrained logistic regression problem.", out.width="48%", fig.show='hold'}
+knitr::include_graphics(c("figures/bhpsContour.png", "figures/elhmcHMC.png"))
+```
+
+```{r acf, echo=FALSE, fig.cap="The autocorrelation function of the samples drawn from the posterior of $\\beta$.", out.width="48%", fig.show='hold'}
+knitr::include_graphics(c("figures/bhpsAcfb0.png", "figures/bhpsAcfb1.png"))
+```
+
+
+In the second stage, we draw 500 samples of $\beta$ with starting
+values set to the last value from the first stage. The number of leapfrog
+steps per Monte Carlo update is set to 30, with a step size of 0.004 for
+both $\beta_0$ and $\beta_1$. We use
+$N\left(0, 0.02\, I_2\right)$ as the distribution of the momentum.
+
+``` r
+R> bstart <- betas.init$samples[50, ]
+R> betas <- ELHMC(initial = bstart, data = data, fun = g, dfun = dg,
++   n.samples = 500, prior = pr, dprior = dpr, epsilon = 0.004,
++   lf.steps = 30, detailed = FALSE, p.variance = 0.2, print.interval = 10,
++   plot.interval = 1, which.plot = c(1))
+```
+
+Based on our output, we can make inferences about $\beta$. As an
+example, the density plot of the samples of $\beta$ and the
+autocorrelation plots of their second half are shown in Figures
+\@ref(fig:density) and \@ref(fig:acf).
+
+``` r
+R> library(MASS)
+R> n.samp <- 500
+R> beta.density <- kde2d(betas$samples[, 1], betas$samples[, 2])
+R> persp(beta.density, phi = 50, theta = 20,
++   xlab = 'Intercept', ylab = '', zlab = 'Density',
++   ticktype = 'detailed', cex.axis = 0.35, cex.lab = 0.35, d = 0.7)
+R> acf(betas$samples[round(n.samp/2):n.samp, 1],
++   main = expression(paste("Series ", beta[0])))
+R> acf(betas$samples[round(n.samp/2):n.samp, 2],
++   main = expression(paste("Series ", beta[1])))
+```
+
+It is well known [@chaudhuri_handcock_rendall_2008] that the constrained
+estimates of $\beta_0$ and $\beta_1$ have very low standard errors. The
+acceptance rate is close to $78\%$. It is evident that our software can
+sample from such a narrow ridge with ease. Furthermore, the
+autocorrelation of the samples decreases very quickly with the
+lag, which would not be the case for most other MCMC procedures.
+
+## Acknowledgement {#acknowledgement .unnumbered}
+
+Dang Trung Kien would like to acknowledge the support of MOE AcRF grant
+R-155-000-140-112 from the National University of Singapore. Sanjay
+Chaudhuri acknowledges partial support from NSF-DMS grant 2413491
+from the National Science Foundation USA. The authors are grateful to
+Professor Michael Rendall, Department of Sociology, University of
+Maryland, College Park, for kindly sharing the data set on which the
+second example is based.
+:::::::
+
+[^1]: By convention,
+    $x_i=(x_{i1},x_{i2},\ldots,x_{ip})^T\le x=(x_1,x_2,\ldots,x_p)^T$
+    iff $x_{ij}\le x_{j}$ $\forall j$.
+
+[^2]: In Figure \@ref(fig:scheme), $\omega_1=\omega_2=\omega_3=0$,
+    $\omega_4>0$, and $\omega_5>0$.
+
+[^3]: In Figure \@ref(fig:scheme), $q=2$, and the faces of maximum dimension
+    are the sides of the pentagon. They have $q=2$ end points, i.e. extreme
+    points.
+
+[^4]: The authors are grateful to Prof. Michael Rendall, Department of
+    Sociology, University of Maryland, College Park, for kindly sharing
+    the data on which this example is based.
diff --git a/_articles/RJ-2025-041/RJ-2025-041.html b/_articles/RJ-2025-041/RJ-2025-041.html
new file mode 100644
index 0000000000..01e2d82e77
--- /dev/null
+++ b/_articles/RJ-2025-041/RJ-2025-041.html
@@ -0,0 +1,2988 @@
+
    +

    elhmc: An R Package for Hamiltonian Monte Carlo Sampling in Bayesian Empirical Likelihood

+

In this article, we describe an R package for sampling from an +empirical likelihood-based posterior using a Hamiltonian Monte Carlo +method. Empirical likelihood-based methodologies have been used in the +Bayesian modeling of many problems of interest in recent times. This +semiparametric procedure can easily combine the flexibility of a +nonparametric distribution estimator together with the +interpretability of a parametric model. The model is specified by +estimating equation-based constraints. Drawing inference from a +Bayesian empirical likelihood (BayesEL) posterior is challenging. The +likelihood is computed numerically, so no closed-form expression of +the posterior exists. Moreover, for any sample of finite size, the +support of the likelihood is non-convex, which hinders fast mixing of +many Markov chain Monte Carlo (MCMC) procedures. It has been recently +shown that using the properties of the gradient of the log empirical +likelihood, one can devise an efficient Hamiltonian Monte Carlo (HMC) +algorithm to sample from a BayesEL posterior. The package requires the +user to specify only the estimating equations, the prior, and their +respective gradients. It returns an MCMC sample drawn from the BayesEL +posterior of the parameters, together with various details requested by +the user.

    +

    1 Introduction

    +

    Empirical likelihood has several advantages over a traditional +parametric likelihood. Even though a correctly specified parametric +likelihood is usually the most efficient for parameter estimation, +semiparametric methods like empirical likelihood, which use a +nonparametric estimate of the underlying distribution, are often more +efficient when the model is misspecified. Empirical likelihood +incorporates parametric model-based information as constraints in +estimating the underlying distribution, which makes the parametric +estimates interpretable. Furthermore, it allows easy incorporation of +known additional information not involving the parameters in the +analysis.

    +

    Bayesian empirical likelihood (BayesEL) (Lazar 2003) methods +employ empirical likelihood in the Bayesian paradigm. Given some +information about the model parameters in the form of a prior +distribution and estimating equations obtained from the model, a +likelihood is constructed from a constrained empirical estimate of the +underlying distribution. The prior is then used to define a posterior +based on this estimated likelihood. Inference on the parameter is drawn +based on samples generated from the posterior distribution.

    +

BayesEL methods are quite flexible and have been found useful in many +areas of statistics. Examples include small area estimation, quantile +regression, and the analysis of complex survey data.

    +

    BayesEL procedures, however, require an efficient Markov Chain Monte +Carlo (MCMC) procedure to sample from the resulting posterior. It turns +out that such a procedure is not easily specified. For many parameter +values, it may not be feasible to compute the constrained empirical +distribution function, and the likelihood is estimated to be zero. That +is, the estimated likelihood is not supported over the whole space. +Moreover, this support is non-convex and impossible to determine in most +cases. Thus, a naive random walk MCMC would quite often propose +parameters outside the support and get stuck.

    +

Many authors have encountered this problem in frequentist applications. +Such "empty set" problems are quite common (Grendár and Judge 2009) and +become more frequent in problems with a large number of parameters +(Bergsma et al. 2012). Several authors +(Chen et al. 2008; Emerson et al. 2009; Liu et al. 2010) have +suggested the addition of extra observations generated from the +available data, designed specifically to avoid empty sets. They show that +such observations can be added without changing the asymptotic +distribution of the corresponding Wilks' statistics. Other authors +(Tsao 2013; Tsao and Wu 2013, 2014) have used a +transformation so that the contours of the resulting empirical +likelihood can be extended beyond the feasible region. However, in +most Bayesian applications the data set is finite and often small, so +such asymptotic arguments are of little use.

    +

    With the availability of user-friendly software packages like STAN +(Carpenter et al. 2017), gradient-assisted MCMC methods like Hamiltonian Monte Carlo +(HMC) are becoming increasingly popular in Bayesian computation. When +the estimating equations are smooth with respect to the parameters, +gradient-based methods would have a huge advantage in sampling from a +BayesEL posterior. This is because Chaudhuri et al. (2017) have shown +that under mild conditions, the gradient of the log-posterior would +diverge to infinity at the boundary of its support. Due to this +phenomenon, if an HMC chain approaches the boundary of the posterior +support, it would be reflected towards its center.

    +

There is no software to implement HMC sampling from a BayesEL posterior +with smooth estimating equations and priors. We describe such a library +called elhmc written for the R platform. The main function in the +library only requires the user to specify the estimating equations, the +prior, and their respective gradients with respect to the +parameters as functions. Output with a user-specified degree of detail +can be obtained.

    +

The elhmc package has been used by practitioners since it was made available +on CRAN. In recent times, various other libraries for sampling from a +BayesEL posterior have been made available. Among them, the library +VBel (Yu and Lim 2024) deserves special mention. Its authors compute a +variational approximation of the BayesEL posterior from which samples +can easily be drawn. Even so, elhmc commonly serves as the benchmark +against which such methods are compared.

    +

The rest of the article is structured as follows. We start with the +theoretical background behind the software package: in Section + 2 we +define the empirical likelihood, construct a Bayesian empirical +likelihood posterior from it, and review the properties of the gradient +of the log empirical likelihood. A review of the HMC method, with +special emphasis on BayesEL sampling, is provided next. The section +that follows describes the elhmc library itself, and illustrative +examples with artificial and real data sets conclude the article.

    +

    2 Theoretical background

    +

    Basics of Bayesian Empirical Likelihood

    +

Suppose \(x=(x_1,\ldots,x_n)\), where each \(x_i\in \mathbb{R}^p\), are \(n\) observations from a +distribution \(F^0\) depending on a parameter vector +\(\theta=(\theta^{(1)}, \ldots,\theta^{(d)})\in\Theta\subseteq \mathbb{R}^d\). +We assume that both \(F^0\) and the true parameter value \(\theta^0\) are +unknown. However, certain smooth functions +\(g(\theta,x)=\left(g_1(\theta,x),\ldots,g_q(\theta,x)\right)^T\) are +known to satisfy +\[\begin{equation} +\label{smoothfun} +E_{F^0}[g(\theta^0,x)]=0. +\end{equation} \tag{1}\]

    +
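A canonical instance of (1) may help fix ideas: if \(\theta\) is the
mean of scalar observations, one may take
\[g(\theta,x) = x - \theta, \qquad E_{F^0}\left[g(\theta^0,x)\right] = E_{F^0}[x] - \theta^0 = 0,\]
so that the single estimating equation identifies \(\theta^0\) as the
population mean without specifying any other feature of \(F^0\).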

    Additionally, information about the parameter is available in the form +of a prior density \(\pi(\theta)\) supported on \(\Theta\). We assume that +it is neither possible nor desirable to specify \(F^0\) in a parametric +form. On the other hand, it is not beneficial to estimate \(F^0\) +completely nonparametrically without taking into account the information +from (1) in the estimation procedure.

    +

    Empirical likelihood provides a semiparametric procedure to estimate +\(F^0\), by incorporating information contained in (1). A +likelihood can be computed from the estimate. Moreover, if some +information about the parameter is available in the form of a prior +distribution, the same likelihood can be employed to derive a posterior +of the parameter given the observations.

    +

Let \(F\in\mathcal{F}_{\theta}\) be a distribution function depending on +the parameter \(\theta\). The empirical likelihood is the maximum of the +“nonparametric likelihood” +\[\begin{equation} +\label{eqn2} +L(F)=\prod_{i=1}^n \{F(x_i)-F(x_i-)\} +\end{equation} \tag{2}\] +over \(\mathcal{F}_\theta\), \(\theta\in\Theta\), under constraints +depending on \(g(\theta,x)\).

    +

    More specifically, by defining \(\omega_i=F(x_i)-F(x_i-)\), the empirical +likelihood for \(\theta\) is defined by, +\[\begin{equation} +\label{eqn3} +L(\theta)\mathrel{\mathop:}=\max_{\omega\in\mathcal{W}_{\theta}}\prod_{i=1}^n \omega_i +\end{equation} \tag{3}\] +where +\[\mathcal{W}_{\theta}=\Big\{\omega: \sum_{i=1}^n\omega_i g(\theta,x_i)=0\Big\}\cap\Delta_{n-1}\] +and \(\Delta_{n-1}\) is the \(n-1\) dimensional simplex, i.e. +\(\omega_i\geq 0\), \(\forall i\) and \(\sum_{i=1}^n\omega_i=1\). For any +\(\theta\), if the problem in (3) is infeasible, i.e. +\(\mathcal{W}_{\theta}=\emptyset\), we define \(L(\theta)\mathrel{\mathop:}= 0\).

    +
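For the mean constraint \(g(\theta,x_i)=x_i-\theta\), the optimisation in (3) can be carried out with standard empirical likelihood software. The sketch below uses the emplik package, which is independent of elhmc and is shown here purely for illustration; the interpretation of its `wts` component as the optimal weights (up to normalisation) is an assumption of this sketch.

``` r
# Empirical likelihood for a mean: g(theta, x_i) = x_i - theta.
# el.test() maximises prod(omega_i) subject to sum(omega_i * (x_i - theta)) = 0.
library(emplik)

set.seed(1)
x <- rnorm(50, mean = 0.3)

el   <- el.test(x, mu = 0.3)    # theta = 0.3 lies inside the convex hull of x
w    <- el$wts / sum(el$wts)    # optimal weights, normalised to sum to 1
logL <- sum(log(w))             # log L(theta) <= -n log n, equality at theta = mean(x)

# For theta outside the range of the data, W_theta is empty and, by the
# convention below (3), L(theta) = 0; el.test() fails in that case.
```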

    Using the empirical likelihood \(L(\theta)\) and the prior \(\pi(\theta)\) +we can define a posterior as: +\[\begin{equation} +\label{eqn4} +\Pi(\theta|x)=\frac{L(\theta)\pi(\theta)}{\int L(\theta)\pi(\theta) d\theta}\propto L(\theta)\pi(\theta). +\end{equation} \tag{4}\]

    +

    In Bayesian empirical likelihood (BayesEL), \(\Pi(\theta|x)\) is used as +the posterior to draw inferences on the parameter.

    +

Returning back to (3) above, suppose we denote: +\[\begin{equation} +\label{eqn5} +\hat{\omega}(\theta)=\mathop{\mathrm{\arg\!\max}}_{\omega\in\mathcal{W}_{\theta}}\prod_{i=1}^n \omega_i. +\qquad\qquad +\Big(\text{ i.e. } L(\theta)=\prod^n_{i=1}\hat{\omega}_i(\theta)\Big) +\end{equation} \tag{5}\] +A solution with every \(\hat\omega_i\geq 0\) exists if and only if the origin of \(\mathbb{R}^q\) +can be expressed as a convex combination of +\(g(\theta,x_1),\ldots,g(\theta,x_n)\). Otherwise, the optimisation +problem is infeasible, and \(\mathcal{W}_{\theta}=\emptyset\). +Furthermore, when a solution with \(\hat{\omega}_i>0\) for all \(i\) is feasible, the +solution \(\hat{\omega}\) of (5) is unique.

    +

    The estimate of \(F^0\) is given by:1 +\[\hat{F}^0(x)=\sum_{i=1}^n\hat{\omega}_i(\theta)1_{\{x_i\leq x\}}.\] +The distribution \(\hat{F}^0\) is a step function with a jump of +\(\hat{\omega}_i(\theta)\) on \(x_i\). If +\(\mathcal{W}_{\theta}=\Delta_{n-1}\), i.e. no information about +\(g(\theta,x)\) is present, it easily follows that +\(\hat{\omega}_i(\theta)=n^{-1}\), for each \(i=1\), \(2\), \(\ldots\), \(n\) and +\(\hat{F}^0\) is the well-known empirical distribution function.

    +

    By construction, \(\Pi(\theta|x)\) can only be computed numerically. No +analytic form is available. Inferences are drawn through the +observations from \(\Pi(\theta|x)\) sampled using Markov chain Monte Carlo +techniques.

    +

    Adaptation of Markov chain Monte Carlo methods to BayesEL applications +poses several challenges. First of all, it is not possible to determine +the full conditional densities in a closed form. So techniques like +Gibbs sampling (Geman and Geman 1984) cannot be used. In most cases, +random walk Metropolis procedures, with carefully chosen step sizes, are +attempted. However, the nature of the support of \(\Pi(\theta|x)\), which +we discuss in detail below, makes the choice of an appropriate step size +extremely difficult.

    +
    +

+Figure 1: Schematic illustration of the Empirical likelihood problem. The support of the empirical likelihood is \(\Theta_1\), a subset of \(\mathbb{R}^d\). We take \(n=8\) observations. The estimating equations \(g(x,\theta)\) are \(q=2\) dimensional. Note that \(\Theta_1\) is non-convex and may not be bounded. The convex hull of the \(q\)-dimensional vectors, i.e., \(\mathcal{C}(\theta,x)\), is a pentagon in \(\mathbb{R}^2\). The largest faces of \(\mathcal{C}(\theta,x)\) are the one-dimensional sides of the pentagon. It follows that \(\theta^{(k)}\in\Theta_1\) iff the origin of \(\mathbb{R}^2\), denoted \(0_2\), is in the interior \(\mathcal{C}^0(\theta,x)\) of \(\mathcal{C}(\theta,x)\). This also implies that the optimal empirical likelihood weights \(\hat{\omega}(\theta^{(k)})\) are strictly positive and lie in the interior of the \(n-1\), i.e. \(7\)-dimensional simplex. There is no easy way to determine \(\Theta_1\). We check if \(0_2\in \mathcal{C}^0(\theta,x)\) or equivalently if \(\hat{\omega}(\theta^{(k)})\) are in the interior of \(\Delta_7\) in order to determine if \(\theta^{(k)}\in \Theta_1\). As the sequence \(\theta^{(k)}\) approaches the boundary of \(\Theta_1\), the convex polytope \(\mathcal{C}(\theta^{(k)},x)\) changes in such a way that \(0_2\) converges to its boundary. The sequence of optimal weights \(\hat{\omega}(\theta^{(k)})\) will converge to the boundary of \(\Delta_7\). The current software is based on Chaudhuri et al. (2017), who show that, under simple conditions, along almost every sequence \(\theta^{(k)}\) converging to the boundary of \(\Theta_1\), at least one component of the gradient of the log empirical likelihood based posterior diverges to positive or negative infinity. +

    +
    +
    +

    Provided that the prior is positive over the whole \(\Theta\), which is +true in most applications, the support of \(\Pi(\theta|x)\) is a subset of +the support of the likelihood \(L(\theta)\) which can be defined as (see +Figure 1): +\[\begin{equation} +\label{support} +\Theta_1=\left\{\theta: L(\theta)>0\right\}. +\end{equation} \tag{6}\] +Thus, the efficiency of the MCMC algorithm would depend on \(\Theta_1\) +and the behaviour of \(\Pi(\theta|x)\) on it.

    +

By definition, \(\Theta_1\) is closely connected to the set +\[\begin{equation} +\label{convexhull} +\mathcal{C}(\theta,x)=\left\{\sum_{i=1}^n\omega_ig(\theta,x_i) \Big|\omega\in \Delta_{n-1}\right\}, +\end{equation} \tag{7}\] +which is the closed convex hull of the \(q\) dimensional vectors +\(G(x,\theta)=\{g(\theta,x_1),\ldots,g(\theta,x_n)\}\) in \(\mathbb{R}^q\) +(the pentagon in Figure 1). Suppose \(\mathcal{C}^0(\theta,x)\) and +\(\partial \mathcal{C}(\theta,x)\) are respectively the interior and +boundary of \(\mathcal{C}(\theta,x)\). By construction, +\(\mathcal{C}(\theta,x)\) is a convex polytope. Since the data \(x\) are +fixed, the set \(\mathcal{C}(\theta,x)\) is a set-valued function of +\(\theta\). For any \(\theta\in\Theta\), the problem in (3) is +feasible (i.e. \(\mathcal{W}_{\theta}\ne\emptyset\)) if and only if the +origin of \(\mathbb{R}^q\), denoted by \(0_q\), is in +\(\mathcal{C}(\theta,x)\). Moreover, \(\theta\in\Theta_1\) if and only if the +same \(0_q\in\mathcal{C}^0(\theta,x)\). It is not possible to determine +\(\Theta_1\) in general. The only way is to check whether, for any potential +\(\theta\), the origin \(0_q\) is in \(\mathcal{C}^0(\theta,x)\). There is no +quick numerical way to check the latter either. Generally, an attempt is +made to solve (3); the existence of a solution indicates +that \(\theta\in\Theta_1\).

    +
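The feasibility check itself can be posed as a small linear programme: does there exist \(\omega\in\Delta_{n-1}\) with \(\sum_i\omega_i g(\theta,x_i)=0\)? The helper below is hypothetical (not part of elhmc) and sketches this with the lpSolve package; note that it certifies \(0_q\in\mathcal{C}(\theta,x)\) only, and separating the interior from the boundary requires additional care, in line with the remark above.

``` r
# Feasibility of (3): is 0_q in the convex hull of the rows g(theta, x_i)?
# Posed as a linear programme: find w >= 0 with t(G) %*% w = 0 and sum(w) = 1.
library(lpSolve)

origin_in_hull <- function(G) {    # G: n x q matrix with rows g(theta, x_i)
  n <- nrow(G); q <- ncol(G)
  sol <- lp(direction    = "max",
            objective.in = rep(0, n),             # any feasible point will do
            const.mat    = rbind(t(G), rep(1, n)),
            const.dir    = rep("=", q + 1),
            const.rhs    = c(rep(0, q), 1))       # G^T w = 0, sum(w) = 1
  sol$status == 0                  # status 0: a feasible w exists
}
```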

    Examples show (Chaudhuri et al. 2017) that even for simple problems, +\(\Theta_1\) may not be a convex set. Designing an efficient random walk +Markov chain Monte Carlo algorithm on a potentially non-convex support +is an extremely challenging task. Unless the step sizes and the proposal +distributions are adapted well to the proximity of the current position +to the boundary of \(\Theta_1\), the chain may repeatedly propose values +outside the likelihood support and, as a result, converge very slowly. +Adaptive algorithms like the one proposed by Haario et al. (1999) do not +tackle the non-convexity problem well.

    +

    Hamiltonian Monte Carlo methods solve well-known equations of motion +from classical mechanics to propose new values of \(\theta\in\Theta\). +Numerical solutions of these equations of motion are dependent on the +gradient of the log posterior. The norm of the gradient of the log +empirical likelihood used in BayesEL procedures diverges near the +boundary of \(\Theta_1\). This property makes the Hamiltonian Monte Carlo +procedures very efficient for sampling a BayesEL posterior. It ensures +that once in \(\Theta_1\), the chain would rarely step outside the support +and repeatedly sample from the posterior.

    +

    A Review of Some Properties of the Gradient of Log Empirical Likelihood

    +

    Various properties of log-empirical likelihood have been discussed in +the literature. However, the properties of its gradients with respect to +the model parameters are relatively unknown. Our main goal in this +section is to review the behaviour of gradients of log-empirical +likelihood on the support of the empirical likelihood. We only state the +relevant results here. The proofs of these results can be found in +Chaudhuri et al. (2017).

    +

Recall (see Figure 1) that the support \(\Theta_1\) can only be specified by +checking whether \(0_q\in\mathcal{C}^0(x,\theta_0)\) for each individual +\(\theta_0\in\Theta\). If for some \(\theta_0\in\Theta\) the origin lies on +the boundary of \(\mathcal{C}(x,\theta_0)\), i.e. +\(0_q\in\partial \mathcal{C}(x,\theta_0)\), the problem in (3) +is still feasible; however, \(L\left(\theta_0\right)=0\) and the solution +of (5) is not unique. Below we discuss how, under mild +conditions, for any \(\theta_0\in\Theta\) and a large subset +\(S\subseteq\partial \mathcal{C}(x,\theta_0)\), if \(0_q\in S\), the +absolute value of at least one component of the gradient of +\(\log\left(L\left(\theta_0\right)\right)\) would be large.

    +

    Before we proceed, we make the following assumptions:

    +
      +
    1. \(\Theta\) is an open set.

    2. \(g\) is a continuously differentiable function of \(\theta\) in
\(\Theta\), \(q \le d\) and \(\Theta_1\) is non-empty.

    3. The sample size \(n > q\). The matrix \(G(x, \theta)\) has full row rank
for any \(\theta \in \Theta\).

    4. For any fixed \(x\), let \(\nabla g(x_i,\theta)\) be the \(q \times d\)
Jacobian matrix for any \(\theta \in \Theta\). Suppose
\(w=(w_1,\ldots, w_n)\in\Delta_{n-1}\) and there are at least \(q\)
elements of \(w\) that are greater than \(0\). Then, for any
\(\theta \in \Theta\), the matrix
\(\sum_{i=1}^n w_i \nabla g(x_i,\theta)\) has full row rank.
    +

    Under the above assumptions, several results about the log empirical +likelihood and its gradient can be deduced.

    +

    First of all, since the properties of the gradient of the log empirical +likelihood at the boundary of the support are of interest, some +topological properties of the support need to be investigated. Under the +standard topology of \(\mathbb{R}^q\), since \(\mathcal{C}(x,\theta)\) is a +convex polytope with a finite number of faces and extreme points, using +the smoothness of \(g\), it is easy to see that, for any +\(\theta_0\in\Theta_1\) one can find a real number \(\delta>0\), such that +the open ball centred at \(\theta_0\) with radius \(\delta\) is contained in +\(\Theta_1\). That is, \(\Theta_1\) is an open subset of \(\Theta\).

    +

    Now, since \(\Theta_1\) is an open set, the boundary \(\partial\Theta_1\) of +\(\Theta_1\) is not contained in \(\Theta_1\). Let \(\theta^{(0)}\) lie within +\(\Theta\) and on the boundary of \(\Theta_1\) (i.e. \(\partial\Theta_1\)). +Then it follows that the primal problem (3) is feasible at +\(\theta^{(0)}\) and \(0_q\) lies on the boundary of +\(\mathcal{C}(x,\theta^{(0)})\) (i.e. +\(\partial \mathcal{C}(x,\theta^{(0)})\)).

    +

    Our main objective is to study the utility of Hamiltonian Monte Carlo +methods for drawing samples from a BayesEL posterior. The sampling +scheme will produce a sequence of sample points in +\(\theta^{(k)}\in\Theta_1\) (see Figure +1). It would +be efficient as long as \(\log L\left(\theta^{(k)}\right)\) is large. The +sampling scheme could potentially become inefficient if some +\(\theta^{(k)}\) is close to the boundary \(\partial\Theta_1\). Thus, it is +sufficient to consider the properties of the log empirical likelihood +and its gradient along such a sequence converging to a point +\(\theta^{(0)}\in\partial\Theta_1\).

    +

From the discussion above it is evident that when +\(\theta^{(0)} \in \partial \Theta_1\) the problem in (3) is +feasible but the likelihood \(L\left(\theta^{(0)}\right)\) will always be +zero and (5) will not have a unique solution. Since +\(\mathcal{C}(x,\theta^{(0)})\) is a polytope, and \(0_q\) lies on one of +its faces, there exists a subset \(\mathcal{I}_0\) of the observations such that +\(0_q\) belongs to the relative interior of the convex hull generated by +\(g(x_i,\theta^{(0)})\), \(i \in \mathcal{I}_0\) (in Figure +1, +\(\mathcal{I}_0=\{4,5\}\)). It follows from the supporting hyperplane +theorem (Boyd and Vandenberghe 2004) that there exists a unit vector +\(a\in \mathbb{R}^q\) such that +\[a^{\text{\tiny T}} g(x_i, \theta^{(0)}) =0 \quad \mbox{for} \quad i \in \mathcal{I}_0, \qquad\text{and}\qquad a^{\text{\tiny T}} g(x_{i}, \theta^{(0)}) >0 \quad \mbox{for} \quad i \in \mathcal{I}_0^c.\] +Some algebraic manipulation then easily shows that any +\(\omega\in\mathcal{W}_{\theta^{(0)}}\) (\(\mathcal{W}_{\theta}\) as defined +in (3) with \(\theta=\theta^{(0)}\)) must satisfy:2 +\[\omega_i=0 \quad \mbox{for} \quad i \in \mathcal{I}_0^c \qquad\text{and}\qquad \omega_i>0 \quad \mbox{for} \quad i \in \mathcal{I}_0.\]

    +

It is well known that the solution of (5), i.e. +\(\hat{\omega}(\theta)\), is smooth for all \(\theta\in\Theta_1\) +(Qin and Lawless 1994). As \(\theta^{(k)}\) converges to \(\theta^{(0)}\), the +properties of \(\hat{\omega}(\theta^{(k)})\) need to be considered. To that +end, we first make a specific choice of \(\hat{\omega}(\theta^{(0)})\).

    +

    First, we consider a restriction of problem (5) to +\(\mathcal{I}_0\).

    +

\[\begin{equation} +\label{submax} +\hat\nu(\theta) =\mathop{\mathrm{\arg\!\max}}_{\nu\in\mathcal{V}_\theta} \prod_{i\in\mathcal{I}_0} \nu_i +\end{equation} \tag{8}\] +where +\[\mathcal{V}_\theta=\left\{\nu: \sum_{i\in \mathcal{I}_0}\nu_i g(x_i,\theta)=0\right\}\cap\Delta_{|\mathcal{I}_0|-1}.\] +We now define +\[\hat \omega_i(\theta^{(0)}) = \hat\nu_i(\theta^{(0)}), \quad i \in \mathcal{I}_0 \quad \mbox{and} \quad \hat \omega_i(\theta^{(0)}) = 0, \quad i \in \mathcal{I}_0^c,\] +and +\[L(\theta^{(0)})= \prod_{i=1}^n \hat \omega_i(\theta^{(0)}).\]

    +

Since \(0_q\) is in the relative interior of the convex hull generated by +\(g(x_i,\theta^{(0)})\), \(i\in\mathcal{I}_0\), the problem +(8) has a unique solution. For each +\(\theta^{(k)}\in\Theta_1\), \(\hat{\omega}(\theta^{(k)})\) is continuous +and takes values in a compact set. Thus, as \(\theta^{(k)}\) converges to +\(\theta^{(0)}\), \(\hat{\omega}(\theta^{(k)})\) converges to a limit. +Furthermore, this limit is a solution of (5) at +\(\theta^{(0)}\). However, counterexamples show (Chaudhuri et al. 2017) +that the limit may not be \(\hat{\omega}(\theta^{(0)})\) as defined +above. That is, the vectors \(\hat{\omega}(\theta^{(k)})\) do not extend +continuously to the boundary \(\partial\Theta_1\) as a whole. However, we +can show that: +\[\lim_{k\to\infty}\hat\omega_i(\theta^{(k)}) = \hat \omega_i(\theta^{(0)}) = 0, \quad \text{for all } i \in \mathcal{I}_0^c.\] +That is, the components of \(\hat\omega(\theta^{(k)})\) which are zero in +\(\hat\omega(\theta^{(0)})\) are continuously extendable. Furthermore, +\[\lim_{k\to\infty}L(\theta^{(k)})=L(\theta^{(0)})=0.\] +That is, the likelihood is continuous at \(\theta^{(0)}\).

    +

    However, this is not true for the components +\(\hat{\omega}_i\left(\theta^{(k)}\right)\), \(i\in\mathcal{I}_0\) for which +\(\hat{\omega}_i\left(\theta^{(k)}\right)> 0\).

    +

    Since the set \(\mathcal{C}(x,\theta)\) is a convex polytope in +\(\mathbb{R}^q\), the maximum dimension of any of its faces is \(q-1\), +which would have exactly \(q\) extreme points.3 Furthermore, any face +with a smaller dimension can be expressed as an intersection of such +\(q-1\) dimensional faces.

    +

In certain cases, however, the whole vector +\(\hat{\omega}\left(\theta^{(k)}\right)\) extends continuously to +\(\hat{\omega}\left(\theta^{(0)}\right)\). In order to argue that, we +define +\[\begin{equation} +\label{Theta_2} + \mathcal{C}(x_{\mathcal{I}},\theta) = \left\{\sum_{i \in \mathcal{I}} \omega_i g(x_i,\theta)\, \Big|\, \omega\in \Delta_{|\mathcal{I}|-1}\right\} +\end{equation} \tag{9}\] +and +\[\begin{equation} + \partial\Theta_1^{(q-1)} = \Big\{ \theta: 0 \in \mathcal{C}^0(x_{\mathcal{I}},\theta) \text{ for some } \mathcal{I} \text{ such that } \mathcal{C}(x_{\mathcal{I}},\theta) \text{ has exactly } q \text{ extreme points} \Big\} \cap\partial\Theta_1. +\end{equation}\]

    +

Thus \(\partial\Theta_1^{(q-1)}\) is the set of all boundary points +\(\theta^{(0)}\) of \(\Theta_1\) such that \(0_q\) belongs to a +\((q-1)\)-dimensional face of the convex hull +\(\mathcal{C}(x,\theta^{(0)})\). Now for any +\(\theta^{(0)}\in \partial\Theta_1^{(q-1)}\), there is a unique set of +weights \(\nu\in\Delta_{|\mathcal{I}|-1}\) such that +\(\sum_{i\in\mathcal{I}}\nu_ig\left(x_i,\theta^{(0)}\right)=0\). That is, +the set of feasible solutions of (8) is a singleton. +This fact, the compactness of the range of \(\hat{\omega}\), and an +argument using convergent subsequences together imply that for any +sequence \(\theta^{(k)}\in\Theta_1\) converging to \(\theta^{(0)}\), the +whole vector \(\hat{\omega}\left(\theta^{(k)}\right)\) converges to +\(\hat{\omega}\left(\theta^{(0)}\right)\); that is, +\(\hat{\omega}\left(\theta^{(k)}\right)\) extends continuously to +\(\hat{\omega}\left(\theta^{(0)}\right)\).

    +

    We now consider the behaviour of the gradient of the log empirical +likelihood near the boundary of \(\Theta_1\). First, note that, for any +\(\theta \in \Theta_1\), the gradient of the log empirical likelihood is +given by +\[\nabla \log L(\theta) = -n\sum_{i=1}^n \hat \omega_i(\theta) \hat{\lambda}(\theta)^{\text{\tiny T}} \nabla g(x_i,\theta).\] +where \(\hat{\lambda}(\theta)\) is the estimated Lagrange multiplier +satisfying the equation:

\[\begin{equation}
\label{eq:lagmult}
\sum_{i=1}^n \frac{g(x_i,\theta)}{\left\{1+ \hat\lambda(\theta)^{\text{\tiny T}} g(x_i,\theta) \right\}}=0.
\end{equation} \tag{10}\]
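
In practice, the Lagrange multiplier has no closed form and is found numerically. As a minimal illustration of the idea (our own sketch, not the implementation used by the package), equation (10) can be solved for \(\lambda\) by Newton's method, given the matrix of estimating-function values at a fixed \(\theta\):

```r
# Sketch: solve sum_i g_i / (1 + lambda' g_i) = 0 for lambda by Newton's
# method. G is an n x q matrix whose i-th row is g(x_i, theta).
el_lambda <- function(G, max.iter = 100, tol = 1e-10) {
  lambda <- rep(0, ncol(G))
  for (iter in seq_len(max.iter)) {
    denom <- as.vector(1 + G %*% lambda)                # 1 + lambda' g_i
    if (any(denom <= 0)) stop("step left the feasible region")
    score <- colSums(G / denom)                         # LHS of equation (10)
    if (sqrt(sum(score^2)) < tol) break
    J <- -t(G / denom) %*% (G / denom)                  # Jacobian of the score
    lambda <- lambda - as.vector(solve(J, score))       # Newton step
  }
  lambda
}

# When the origin is well inside the convex hull of the g_i, lambda is small;
# for this G the score already vanishes at lambda = 0.
G <- cbind(c(1, -1, 0.5, -0.5))
lam <- el_lambda(G)
```

The iteration starts from \(\lambda = 0\); near the boundary of \(\Theta_1\) the denominators approach zero and \(\|\hat\lambda\|\) blows up, which is exactly the behaviour exploited below.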

Note that the gradient depends on the value of the Lagrange multiplier but not on its gradient.

Now, under assumption A3, it follows that the gradient of the log empirical likelihood diverges on the set of all boundary points \(\partial\Theta_1^{(q-1)}\). More specifically, one can show:

1. As \(\theta^{(k)}\rightarrow \theta^{(0)}\), \(\parallel\hat \lambda(\theta^{(k)})\parallel\to\infty\).
2. If \(\theta^{(0)}\in \partial\Theta_1^{(q-1)}\), then as \(\theta^{(k)}\rightarrow \theta^{(0)}\), \({\parallel \nabla \log L(\theta^{(k)}) \parallel}\to \infty\).

Therefore, at every boundary point \(\theta^{(0)}\) of \(\Theta_1\) such that \(0\) belongs to one of the \((q-1)\)-dimensional faces of \(\mathcal{C}(x,\theta^{(0)})\), at least one component of the estimated Lagrange multiplier and of the gradient of the log empirical likelihood diverges to positive or negative infinity. The gradient of the negative log empirical likelihood represents the direction of the steepest increase of the negative log empirical likelihood. Since the value of the log empirical likelihood should typically be highest around the center of the support \(\Theta_1\), the gradient near the boundary of \(\Theta_1\) should point towards its center. This property can be exploited to force candidates of \(\theta\) generated by HMC proposals to bounce back towards the interior of \(\Theta_1\) from its boundaries, consequently reducing the chance of them getting out of the support.

Hamiltonian Monte Carlo Sampling for Bayesian Empirical Likelihood

The Hamiltonian Monte Carlo algorithm is a Metropolis algorithm in which the successive steps are proposed using Hamiltonian dynamics. One can visualise these dynamics as a cube sliding without friction under gravity in a bowl with a smooth surface. The total energy of the cube is the sum of the potential energy \(U(\theta)\), defined by its position \(\theta\) (in this case its height), and the kinetic energy \(K(p)\), which is determined by its momentum \(p\). The total energy of the cube is conserved, and it will continue to slide up and down the smooth surface of the bowl forever. The potential and the kinetic energy would, however, vary with the position of the cube.

In order to use Hamiltonian dynamics to sample from the posterior \(\Pi\left(\theta\mid x\right)\), we set our potential and kinetic energy as follows:
\[U(\theta)=-\log\Pi(\theta|x)\quad\text{and}\quad K(p)=\frac{1}{2}p^TM^{-1}p.\]
Here, the momentum vector \(p=\left(p_1,p_2,\ldots,p_d\right)\) is a totally artificial construct, usually generated from a \(N(0, M)\) distribution. Most often the covariance matrix \(M\) is chosen to be a diagonal matrix with diagonal \((m_1,m_2,\ldots,m_d)\), in which case each \(m_i\) is interpreted as the mass of the \(i\)th parameter. The Hamiltonian of the system is the total energy
\[\begin{equation}
\label{hamiltonian dynamics}
\mathcal{H}(\theta,p)=U(\theta)+K(p).
\end{equation} \tag{11}\]

In Hamiltonian mechanics, the variation in the position \(\theta\) and momentum \(p\) with time \(t\) is determined by the partial derivatives of \(\mathcal{H}\) with respect to \(p\) and \(\theta\) respectively. In particular, the motion is governed by the pair of so-called Hamiltonian equations:
\[\begin{eqnarray}
\frac{d\theta}{dt}&=&\frac{\partial \mathcal{H}}{\partial p}=M^{-1}p, \label{PDE1}\\
\frac{dp}{dt}&=&-\frac{\partial\mathcal{ H}}{\partial \theta}=-\frac{\partial U(\theta)}{\partial \theta}.
\label{PDE2}
\end{eqnarray} \tag{12}\]
It is easy to show (Neal 2011) that Hamiltonian dynamics is reversible, keeps the Hamiltonian invariant, and is volume preserving, which makes it suitable for MCMC sampling schemes.

In HMC we propose successive states by solving the pair of Hamiltonian equations (12). Unfortunately, they cannot be solved analytically (except, of course, for a few simple cases), and they must be approximated numerically at discrete time points. There are several ways to numerically approximate these two equations in the literature (Leimkuhler and Reich 2004). For the purpose of MCMC sampling, we need a method that is reversible and volume-preserving.

Leapfrog integration (Birdsall and Langdon 2004) is one such method to numerically integrate the pair of Hamiltonian equations. In this method, a step-size \(\epsilon\) for the time variable \(t\) is first chosen. Given the values of \(\theta\) and \(p\) at the current time point \(t\) (denoted here by \(\theta(t)\) and \(p(t)\) respectively), the leapfrog updates the position and the momentum at time \(t+\epsilon\) as follows:
\[\begin{eqnarray}
p\left(t+\frac{\epsilon}{2}\right)&=&p(t)-\frac{\epsilon}{2}\frac{\partial U(\theta(t))}{\partial\theta},\label{leapfrog1}\\
\theta(t+\epsilon)&=&\theta(t)+\epsilon M^{-1}p\left(t+\frac{\epsilon}{2}\right),\label{leapfrog2}\\
p(t+\epsilon)&=&p\left(t+\frac{\epsilon}{2}\right)-\frac{\epsilon}{2}\frac{\partial U(\theta(t+\epsilon))}{\partial\theta}.\label{leapfrog3}
\end{eqnarray} \tag{13}\]

Due to its symmetry, leapfrog integration is reversible and preserves volume. However, because of discretisation error, the Hamiltonian is not conserved exactly along the simulated trajectory. This is similar to the Langevin-Hastings algorithm (Besag 2004), which is a special case of HMC. Fortunately, this lack of exact energy conservation is easily corrected: the accept-reject step in the MCMC procedure ensures that the chain converges to the correct posterior.

At the beginning of each iteration of the HMC algorithm, the momentum vector \(p\) is randomly sampled from the \(N(0,M)\) distribution. Starting with the current state \((\theta,p)\), the leapfrog integrator described above is used to simulate Hamiltonian dynamics for \(T\) steps with a step size of \(\epsilon\). At the end of this \(T\)-step trajectory, the momentum \(p\) is negated so that the Metropolis proposal is symmetric, and the proposed state \((\theta^*,p^*)\) is accepted with probability
\[\min\{1,\exp(-\mathcal{H}(\theta^*,p^*)+\mathcal{H}(\theta,p))\}.\]
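
To make the update concrete, the following self-contained sketch (our own toy illustration, independent of elhmc) combines the leapfrog equations (13) with the acceptance step above for a standard normal target, for which \(U\) and its gradient are available in closed form:

```r
# Toy target: standard normal, so U(theta) = theta^2/2 up to a constant.
log_post <- function(theta) -0.5 * sum(theta^2)
grad_log_post <- function(theta) -theta

# One HMC update: fresh momentum, 'steps' leapfrog steps, accept/reject.
hmc_update <- function(theta, eps, steps, m = 1) {
  p <- rnorm(length(theta), 0, sqrt(m))          # momentum ~ N(0, M)
  H0 <- -log_post(theta) + sum(p^2) / (2 * m)    # current total energy
  q <- theta
  p <- p + (eps / 2) * grad_log_post(q)          # initial half step (13a)
  for (s in seq_len(steps)) {
    q <- q + eps * p / m                         # full position step (13b)
    if (s < steps) p <- p + eps * grad_log_post(q)
  }
  p <- p + (eps / 2) * grad_log_post(q)          # final half step (13c)
  H1 <- -log_post(q) + sum(p^2) / (2 * m)
  if (runif(1) < exp(H0 - H1)) q else theta      # Metropolis accept/reject
}

set.seed(1)
draws <- numeric(2000)
theta <- 0
for (i in seq_along(draws)) {
  theta <- hmc_update(theta, eps = 0.3, steps = 10)
  draws[i] <- theta
}
```

For this target the sample mean of draws should be near \(0\) and the sample variance near \(1\). Negating the momentum is omitted here because the kinetic energy is symmetric in \(p\), so it does not change the acceptance probability.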

The gradient of the log-posterior used in the leapfrog is a sum of the gradient of the log empirical likelihood and the gradient of the log prior. The prior is user-specified, and it is hypothetically possible that, even though at least one component of the gradient of the log empirical likelihood diverges at the boundary \(\partial\Theta_1\), the log prior gradient behaves in a way that nullifies this effect, so that the log posterior gradient remains finite over the closure of \(\Theta_1\). We make the following assumption on the prior mainly to avoid this possibility (see Chaudhuri et al. (2017) for more details).

• Consider a sequence \(\{\theta^{(k)} \}\), \(k=1, 2,\ldots\), of points in \(\Theta_1\) such that \(\theta^{(k)}\) converges to a boundary point \(\theta^{(0)}\) of \(\Theta_1\). Assume that \(\theta^{(0)}\) lies within \(\Theta\) and \(L(\theta^{(k)})\) strictly decreases to \(L(\theta^{(0)})\). Then, for some constant \(b(n, \theta^{(0)}) > -1\), we have
\[\begin{equation}
\liminf_{k\to\infty} \frac{ \log\pi(\theta^{(k-1)}) - \log\pi(\theta^{(k)}) }{ \log L(\theta^{(k-1)} ) - \log L(\theta^{(k)} ) } \ge b(n, \theta^{(0)}).
\tag{14}
\end{equation}\]

The assumption implies that near the boundary of the support, the main contribution to the gradient of the log-posterior with respect to any parameter appearing in the argument of the estimating equations comes from the corresponding gradient of the log empirical likelihood. This is expected in most cases, especially if the sample size is large: for a large sample size, the log-likelihood should be the dominant term in the log-posterior. We are merely assuming here that the gradients behave the same way. The assumption also ensures that at the boundary, the gradients of the log-likelihood and the log-posterior do not cancel each other, which is crucial for the proposed Hamiltonian Monte Carlo to work.

Under these assumptions, Chaudhuri et al. (2017) show that the gradient of the log-posterior diverges along almost every sequence as the parameter values approach the boundary \(\partial \Theta_1\) from the interior of the support. More specifically, they prove that:

\[\begin{equation}
\label{eq:postdiv}
\Bigl\| \nabla \log \pi(\theta^{(k)} \mid x) \Bigr\| \rightarrow \infty, \hspace{.1in} \mbox{ as } \hspace{.1in} k \rightarrow \infty.
\end{equation} \tag{15}\]

Since the \((q-1)\)-dimensional faces of \(\mathcal{C}(x,\theta^{(0)})\) have larger volume than its faces of lower dimension (see Figure 1), a random sequence of points from the interior to the boundary converges to a point on \(\partial \Theta_1^{(q-1)}\) with probability \(1\). Thus, under our assumptions, the gradient of the log-posterior would diverge to infinity for these sequences with high probability. The lower dimensional faces of the convex hull (a polytope) are intersections of \((q-1)\)-dimensional faces. It is not clear, however, whether the norm of the gradient of the posterior diverges on those faces; we conjecture that it does. Even if the conjecture is not true, it is clear from the setup that the sampler would rarely move to the region where the origin belongs to the lower dimensional faces of the convex hull.

As pointed out above, the gradient vector always points towards the mode of the posterior. From our results, the gradient is large near the support boundary, so whenever the HMC sampler approaches the boundary, the high value of the gradient reflects it towards the interior of the support and keeps it from getting out. The leapfrog parameters can be tuned to increase the efficiency of sampling.

3 Package description

The main function of the package is ELHMC. It draws samples from an empirical likelihood Bayesian posterior of the parameters of interest using Hamiltonian Monte Carlo, once the estimating equations involving the parameters, the prior distribution of the parameters, the gradients of the estimating equations, and the gradient of the log prior are specified. Some other parameters controlling the HMC process can also be specified.

Suppose that the data set consists of observations \(x = \left( x_1, ..., x_n \right)\), where each \(x_i\) is a vector of length \(p\) and follows a probability distribution \(F\) of family \(\mathcal{F}_{\theta}\). Here \(\theta = \left(\theta_1,...,\theta_d\right)\) is the \(d\)-dimensional parameter of interest associated with \(F\). Suppose there exist smooth functions \(g\left(\theta, x_i\right) = \left(g_1\left(\theta, x_i\right), \ldots, g_q\left(\theta, x_i\right)\right)^T\) which satisfy \(E_F\left[g\left(\theta,x_i\right)\right] = 0\). As we have explained above, ELHMC is used to draw samples of \(\theta\) from its posterior defined by an empirical likelihood.

Table 1: Arguments for function ELHMC

| Argument | Description |
|---|---|
| initial | A vector containing the initial values of the parameters. |
| data | A matrix containing the data. |
| fun | The estimating function \(g\). It takes in a parameter vector params as the first argument and a data point vector x as the second argument. This function returns a vector. |
| dfun | A function that calculates the gradient of the estimating function \(g\). It takes in a parameter vector params as the first argument and a data point vector x as the second argument. This function returns a matrix. |
| prior | A function with one argument x that returns the log joint prior density of the parameters of interest. |
| dprior | A function with one argument x that returns the gradients of the log densities of the parameters of interest. |
| n.samples | Number of samples to draw. |
| lf.steps | Number of leapfrog steps in each Hamiltonian Monte Carlo update (defaults to \(10\)). |
| epsilon | The leapfrog step size (defaults to \(0.05\)). |
| p.variance | The covariance matrix of a multivariate normal distribution used to generate the initial values of momentum p in Hamiltonian Monte Carlo. This can also be a single numeric value or a vector (defaults to \(0.1\)). |
| tol | EL tolerance. |
| detailed | If set to TRUE, the function returns a list with extra information. |
| print.interval | The frequency at which the results are printed on the terminal. Defaults to 1000. |
| plot.interval | The frequency at which the drawn samples are plotted. The last half of the samples drawn so far is plotted after each plot.interval steps. The acceptance rate is also plotted. Defaults to 0, which means no plot. |
| which.plot | The vector of parameters to be plotted after each plot.interval. Defaults to NULL, which means no plot. |
| FUN | The same as fun but takes in a matrix X instead of a vector x and returns a matrix so that FUN(params, X)[i, ] is the same as fun(params, X[i, ]). Only one of FUN and fun should be provided; if both are, fun is ignored. |
| DFUN | The same as dfun but takes in a matrix X instead of a vector x and returns an array so that DFUN(params, X)[, , i] is the same as dfun(params, X[i, ]). Only one of DFUN and dfun should be provided; if both are, dfun is ignored. |

Table 1 lists the full set of arguments for ELHMC. The arguments data and fun define the problem: they are the data set \(x\) and the collection of smooth functions \(g\). The user-specified starting point for \(\theta\) is given in initial, whereas n.samples is the number of samples of \(\theta\) to be drawn. The gradient matrix of \(g\) with respect to the parameter \(\theta\) (i.e. \(\nabla_{\theta}g\)) has to be specified in dfun; at the moment the function does not compute the gradient numerically by itself. The argument prior represents the joint prior density of \(\theta_1,\ldots,\theta_d\), which for the purpose of this description we denote by \(\pi\). The gradient of the log prior function is specified in dprior. That function returns a vector containing the values of \(\frac{\partial}{\partial \theta_1}\log\pi\left(\theta\right),\ldots,\frac{\partial}{\partial\theta_d}\log\pi\left(\theta\right)\). Finally, the arguments epsilon, lf.steps, p.variance and tol are hyper-parameters which control the Hamiltonian Monte Carlo algorithm.

The arguments print.interval, plot.interval, and which.plot can be used to tune the HMC sampler. They allow printing and plotting of the sampled values at specified intervals while the code is running. The argument which.plot allows the user to plot only the variables whose convergence needs to be checked.

Given the data and a value of \(\theta\), ELHMC computes the optimal weights using the el.test function from the emplik package (Zhou 2014). el.test provides \(\hat{\lambda}\left(\theta^{(k)}\right)\), from which the gradient of the log empirical likelihood can be computed.

If \(\theta\not\in\Theta_1\), i.e. problem (5) is not feasible, then el.test converges to weights that are all close to zero and do not sum to one. Furthermore, the norm of \(\hat{\lambda}\left(\theta^{(k)}\right)\) will be large. In such cases, the empirical likelihood will be zero. This means that, whenever the optimal weights are computed, we need to check whether they sum to one (within numerical error).
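
A minimal version of such a check might look as follows; this is our own sketch of the idea, not the package's internal code:

```r
# Sketch: declare theta infeasible when the returned probability weights
# fail to be non-negative or do not sum to one within tolerance.
weights_feasible <- function(w, tol = 1e-7) {
  all(w >= -tol) && abs(sum(w) - 1) < tol
}

weights_feasible(rep(1/8, 8))     # a valid probability vector: TRUE
weights_feasible(rep(1e-10, 8))   # near-zero weights, infeasible theta: FALSE
```

Whenever the check fails, the empirical likelihood at that \(\theta\) is taken to be zero and the proposal is rejected.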

The function ELHMC returns a list. If the argument detailed is set to FALSE, the list contains the samples of the parameters of interest \(\theta\) and the Monte Carlo acceptance rate, as listed in Table 2. If detailed is set to TRUE, additional information such as the trajectories of \(\theta\) and the momentum is included in the returned list (see Table 3).

At the moment ELHMC only allows a diagonal covariance matrix for the momentum \(p\). The default values for the step size epsilon and the step number lf.steps are \(0.05\) and \(10\) respectively. For a specific problem they need to be determined by trial and error, using the output produced via the plot.interval and print.interval arguments.

    +
Table 2: Elements of the list returned by ELHMC if detailed=FALSE

| Element | Description |
|---|---|
| samples | A matrix containing the parameter samples. |
| acceptance.rate | The acceptance rate. |
| call | The matched call. |

Table 3: Elements of the list returned by ELHMC if detailed=TRUE

| Element | Description |
|---|---|
| samples | A matrix containing the parameter samples. |
| acceptance.rate | The acceptance rate. |
| proposed | A matrix containing the proposed values at the n.samples - 1 Hamiltonian Monte Carlo updates. |
| acceptance | A vector of TRUE/FALSE values indicating whether each proposed value was accepted. |
| trajectory | A list with two elements, trajectory.q and trajectory.p. These are lists of matrices containing the position and momentum values along the trajectory in each Hamiltonian Monte Carlo update. |
| call | The matched call. |

    4 Examples

In this section, we present two examples of the usage of the package. Both examples in some sense supplement the conditions considered by Chaudhuri et al. (2017). In each case, it is seen that the function can sample from the resulting empirical likelihood-based posterior quite efficiently.

Sample the mean of a simple data set

In the first example, suppose the data set consists of eight data points \(v = \left(v_1,...,v_8\right)\):

R> v <- rbind(c(1, 1), c(1, 0), c(1, -1), c(0, -1),
+             c(-1, -1), c(-1, 0), c(-1, 1), c(0, 1))
R> print(v)
     [,1] [,2]
[1,]    1    1
[2,]    1    0
[3,]    1   -1
[4,]    0   -1
[5,]   -1   -1
[6,]   -1    0
[7,]   -1    1
[8,]    0    1

The parameters of interest are the mean \(\theta = \left(\theta_1, \theta_2\right)\). Since \(E\left[\theta - v_i\right] = 0\), the smooth function is \(g = \theta - v_i\) with \(\nabla_\theta g = \left(\left(1, 0\right), \left(0, 1\right)\right)\):

Function: fun
R> g <- function(params, x) {
+    params - x
+  }

Function: dfun
R> dlg <- function(params, x) {
+    rbind(c(1, 0), c(0, 1))
+  }

Functions g and dlg are supplied to the arguments fun and dfun in ELHMC. These two functions must have params as the first argument and x as the second. params represents a sample of \(\theta\), whereas x represents a data point \(v_i\), i.e. a row of the matrix v. fun should return a vector and dfun a matrix whose \(\left(i, j\right)\) entry is \(\partial g_i/\partial\theta_j\).
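
Before a long run, it can be worth verifying that dfun is consistent with fun by comparing it against a finite-difference approximation at a trial parameter value. The helper below is our own suggestion and not part of the package:

```r
# The example's estimating function and its analytic gradient.
g   <- function(params, x) params - x
dlg <- function(params, x) rbind(c(1, 0), c(0, 1))

# Sketch: central finite-difference check that dfun matches the gradient
# of fun; returns the largest discrepancy, which should be near zero.
check_gradient <- function(fun, dfun, params, x, h = 1e-6) {
  num <- sapply(seq_along(params), function(j) {
    e <- replace(numeric(length(params)), j, h)       # perturb theta_j only
    (fun(params + e, x) - fun(params - e, x)) / (2 * h)
  })
  max(abs(num - dfun(params, x)))
}

err <- check_gradient(g, dlg, params = c(0.9, 0.95), x = c(1, 1))
```

Since \(g\) is linear in \(\theta\) here, the discrepancy err is essentially at machine precision.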

We assume that both \(\theta_1\) and \(\theta_2\) have independent standard normal distributions as priors. Next, we define the functions that calculate the log prior density and the gradient of the log prior density as pr and dpr in the following way:

Function: prior
R> pr <- function(x) {
+    -.5 * (x[1]^2 + x[2]^2) - log(2 * pi)
+  }

Function: dprior
R> dpr <- function(x) {
+    -x
+  }

Functions pr and dpr are assigned to prior and dprior in ELHMC. Both must take in only one argument x; prior returns a scalar, and dprior returns a vector of the same length as \(\theta\).

We can now use ELHMC to draw samples of \(\theta\). Let us draw 1000 samples, with starting point \(\left(0.9, 0.95\right)\), using 12 leapfrog steps with step size 0.06 for both \(\theta_1\) and \(\theta_2\) in each Hamiltonian Monte Carlo update:

R> library(elhmc)
R> set.seed(476)
R> thetas <- ELHMC(initial = c(0.9, 0.95), data = v, fun = g, dfun = dlg,
+                  prior = pr, dprior = dpr, n.samples = 1000,
+                  lf.steps = 12, epsilon = 0.06, detailed = TRUE)

We extract and visualise the distribution of the samples using a boxplot (Figure 2):

R> boxplot(thetas$samples, names = c(expression(theta[1]), expression(theta[2])))

Since we set detailed = TRUE, we have data on the trajectories of \(\theta\) as well as of the momentum \(p\). They are stored in the element trajectory of thetas and can be accessed by thetas$trajectory, a list with two elements named trajectory.q and trajectory.p holding the trajectories of \(\theta\) and of the momentum \(p\) respectively. trajectory.q and trajectory.p are both lists with elements 1, ..., n.samples - 1. Each of these elements is a matrix containing the trajectory of \(\theta\) (trajectory.q) or \(p\) (trajectory.p) at one Hamiltonian Monte Carlo update.

We illustrate by extracting the trajectory of \(\theta\) at the first update and plotting it (Figure 3):

R> q <- thetas$trajectory$trajectory.q[[1]]
R> plot(q, xlab = expression(theta[1]), ylab = expression(theta[2]),
+       xlim = c(-1, 1), ylim = c(-1, 1), cex = 1, pch = 16)
R> points(v[, 1], v[, 2], type = "p", cex = 1.5, pch = 16)
R> abline(h = -1); abline(h = 1); abline(v = -1); abline(v = 1)
R> arrows(q[-nrow(q), 1], q[-nrow(q), 2], q[-1, 1], q[-1, 2],
+         length = 0.1, lwd = 1.5)
Figure 2: Posterior distribution of \(\theta_1\) and \(\theta_2\) samples.

Figure 3: Trajectory of \(\theta\) during the first Monte Carlo update.

Figure 4: Samples of \(\theta\) drawn from the posterior.

The special feature of this example is the choice of the data points in \(v\). Chaudhuri et al. (2017) show that the chain will reflect if each one-dimensional boundary of the convex hull (in this case the unit square) contains two observations, which happens with probability one for continuous distributions. In this example, however, more than two points lie on some of the one-dimensional boundaries. Nevertheless, we can see that the HMC method works very well here.

Logistic regression with an additional constraint

In this example, we consider a constrained logistic regression of one binary variable on another, where the expectation of the response is known. The frequentist estimation problem using empirical likelihood was considered by Chaudhuri et al. (2008), where it was shown that the empirical likelihood-based formulation has a major practical advantage over the fully parametric formulation. Below we consider a Bayesian extension of the proposed empirical likelihood-based formulation and use ELHMC to sample from the resulting posterior.

Table 4: The dataset used in this example.

|  | \(x=0\) | \(x=1\) |
|---|---|---|
| \(y=0\) | 5903 | 5157 |
| \(y=1\) | 230 | 350 |

The data set \(v\) consists of \(n\) observations on two binary variables \(X\) and \(Y\). In the \(i\)th row, \(y_i\) is the indicator of whether a woman gave birth between time \(t - 1\) and \(t\), while \(x_i\) is the indicator of whether she had at least one child at time \(t - 1\). The data can be found in Table 4 above. In addition, it was known that the prevalent general fertility rate in the population was \(0.06179\).

We are interested in fitting a logistic regression model to the data with \(X\) as the independent variable and \(Y\) as the dependent variable. However, we also would like to constrain the sample general fertility rate to its value in the population. The logistic regression model takes the form:

\[P \left(Y = 1 | X = x\right) = \frac{\exp\left(\beta_0 + \beta_1 x\right)}{1 + \exp\left(\beta_0 + \beta_1 x\right)}.\]

From the model, using conditions similar to zero-mean residuals and exogeneity, it is clear that:

\[\begin{equation*}
E\left[y_i - \frac{\exp\left(\beta_0 + \beta_1 x_i\right)}{1 + \exp\left(\beta_0 + \beta_1 x_i\right)}\right] = 0,\quad
E\left[x_i\left\{y_i - \frac{\exp\left(\beta_0 + \beta_1 x_i\right)}{1 + \exp\left(\beta_0 + \beta_1 x_i\right)}\right\}\right] = 0.
\end{equation*}\]

Furthermore, from the definition of the general fertility rate, we get:

\[E\left[y_i - 0.06179\right] = 0.\]

Following Chaudhuri et al. (2008), we define the estimating equations \(g\) as follows:

\[g \left(\beta, v_i\right) = \begin{bmatrix}
y_i - \frac{\exp\left(\beta_0 + \beta_1 x_i\right)}{1 + \exp\left(\beta_0 + \beta_1 x_i\right)} \\
x_i\left[y_i - \frac{\exp\left(\beta_0 + \beta_1 x_i\right)}{1 + \exp\left(\beta_0 + \beta_1 x_i\right)}\right] \\
y_i - 0.06179 \\
\end{bmatrix}\]

The gradient of \(g\) with respect to \(\beta\) is given by:

\[\nabla_{\beta}g = \begin{bmatrix}
\frac{-\exp\left(\beta_0 + \beta_1 x_i\right)}{\left(\exp\left(\beta_0 + \beta_1 x_i\right) + 1\right)^2} & \frac{-\exp\left(\beta_0 + \beta_1 x_i\right) x_i}{\left(\exp\left(\beta_0 + \beta_1 x_i\right) + 1\right)^2} \\
\frac{-\exp\left(\beta_0 + \beta_1 x_i\right) x_i}{\left(\exp\left(\beta_0 + \beta_1 x_i\right) + 1\right)^2} & \frac{-\exp\left(\beta_0 + \beta_1 x_i\right) x_i^2}{\left(\exp\left(\beta_0 + \beta_1 x_i\right) + 1\right)^2} \\
0 & 0 \\
\end{bmatrix}\]

In R, we create functions g and dg to represent \(g\) and \(\nabla_{\beta}g\):

Function: fun
R> g <- function(params, X) {
+    result <- matrix(0, nrow = nrow(X), ncol = 3)
+    a <- exp(params[1] + params[2] * X[, 1])
+    a <- a / (1 + a)
+    result[, 1] <- X[, 2] - a
+    result[, 2] <- (X[, 2] - a) * X[, 1]
+    result[, 3] <- X[, 2] - 0.06179
+    result
+  }

Function: dfun
R> dg <- function(params, X) {
+    result <- array(0, c(3, 2, nrow(X)))
+    a <- exp(params[1] + params[2] * X[, 1])
+    a <- -a / (a + 1)^2
+    result[1, 1, ] <- a
+    result[1, 2, ] <- result[1, 1, ] * X[, 1]
+    result[2, 1, ] <- result[1, 2, ]
+    result[2, 2, ] <- result[1, 2, ] * X[, 1]
+    result[3, , ] <- 0
+    result
+  }

We choose independent \(N\left(0, 100^2\right)\) priors for both \(\beta_0\) and \(\beta_1\):

Function: prior
R> pr <- function(x) {
+    -0.5 * t(x) %*% x / 10^4 - log(2 * pi * 10^4)
+  }

Function: dprior
R> dpr <- function(x) {
+    -x * 10^(-4)
+  }

where pr is the log prior density and dpr is the gradient of the log prior for \(\beta\).

Our goal is to use ELHMC to draw samples of \(\beta = \left( \beta_0, \beta_1 \right)\) from the resulting posterior based on the empirical likelihood.

We start our sampling from \((-3.2, 0.55)\) and use two stages of sampling. In the first stage, \(50\) points are sampled with \(\epsilon=0.001\), \(T=15\), and the momentum generated from a \(N(0,0.02\cdot I_2)\) distribution. The acceptance rate at this stage is very high, but the stage is designed to find a good starting point for the second stage, where the acceptance rate can be easily controlled.

R> bstart.init <- c(-3.2, .55)
R> betas.init <- ELHMC(initial = bstart.init, data = data, FUN = g, DFUN = dg,
+                      n.samples = 50, prior = pr, dprior = dpr, epsilon = 0.001,
+                      lf.steps = 15, detailed = TRUE, p.variance = 0.2)
Figure 5: Contour plot of the non-normalised log posterior with the HMC sampling path (left) and density plot (right) of the samples for the constrained logistic regression problem.

Figure 6: The autocorrelation function of the samples drawn from the posterior of \(\beta\).

In the second stage, we draw 500 samples of \(\beta\), starting from the last value of the first stage. The number of leapfrog steps per Monte Carlo update is set to 30, with a step size of 0.004 for both \(\beta_0\) and \(\beta_1\). We use \(N\left(0, 0.02\, I_2\right)\) as the distribution of the momentum.

R> bstart <- betas.init$samples[50, ]
R> betas <- ELHMC(initial = bstart, data = data, fun = g, dfun = dg,
+                 n.samples = 500, prior = pr, dprior = dpr, epsilon = 0.004,
+                 lf.steps = 30, detailed = FALSE, p.variance = 0.2,
+                 print.interval = 10, plot.interval = 1, which.plot = c(1))

Based on our output, we can make inferences about \(\beta\). As an example, the density plot and the autocorrelation plots of the last half of the samples of \(\beta\) are shown in Figures 5 and 6:

R> library(MASS)
R> n.samp <- nrow(betas$samples)
R> beta.density <- kde2d(betas$samples[, 1], betas$samples[, 2])
R> persp(beta.density, phi = 50, theta = 20,
+        xlab = 'Intercept', ylab = '', zlab = 'Density',
+        ticktype = 'detailed', cex.axis = 0.35, cex.lab = 0.35, d = 0.7)
R> acf(betas$samples[round(n.samp/2):n.samp, 1],
+      main = expression(paste("Series ", beta[0])))
R> acf(betas$samples[round(n.samp/2):n.samp, 2],
+      main = expression(paste("Series ", beta[1])))

It is well known (Chaudhuri et al. 2008) that the constrained estimates of \(\beta_0\) and \(\beta_1\) have very low standard errors. The acceptance rate is close to \(78\%\). It is evident that our software can sample from such a narrow ridge with ease. Furthermore, the autocorrelation of the samples decreases very quickly with the lag, which would not be the case for most other MCMC procedures.

Acknowledgement

Dang Trung Kien would like to acknowledge the support of MOE AcRF R-155-000-140-112 from the National University of Singapore. Sanjay Chaudhuri acknowledges partial support from NSF-DMS grant 2413491 from the National Science Foundation, USA. The authors are grateful to Professor Michael Rendall, Department of Sociology, University of Maryland, College Park, for kindly sharing the data set on which the second example is based.

5 Supplementary materials

Supplementary materials are available in addition to this article. They can be downloaded at RJ-2025-041.zip.

6 Note

This article is converted from a Legacy LaTeX article using the texor package. The pdf version is the official version. To report a problem with the html, refer to CONTRIBUTE on the R Journal homepage.

    W. Bergsma, M. Croon, L. A. van der Ark, et al. The empty set and zero likelihood problems in maximum empirical likelihood estimation. Electronic Journal of Statistics, 6: 2356–2361, 2012.

    J. Besag. Markov chain Monte Carlo methods for statistical inference. 2004.

    C. K. Birdsall and A. B. Langdon. Plasma physics via computer simulation. CRC Press, 2004.

    S. P. Boyd and L. Vandenberghe. Convex optimization. Cambridge University Press, 2004.

    B. Carpenter, A. Gelman, M. Hoffman, D. Lee, B. Goodrich, M. Betancourt, M. Brubaker, J. Guo, P. Li and A. Riddell. Stan: A probabilistic programming language. Journal of Statistical Software, 76(1): 1–32, 2017. URL https://www.jstatsoft.org/v076/i01.

    S. Chaudhuri, M. S. Handcock and M. S. Rendall. Generalized linear models incorporating population level information: An empirical-likelihood-based approach. Journal of the Royal Statistical Society: Series B, 70(2): 311–328, 2008.

    S. Chaudhuri, D. Mondal and T. Yin. Hamiltonian Monte Carlo sampling in Bayesian empirical likelihood computation. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 79(1): 293–320, 2017. URL http://dx.doi.org/10.1111/rssb.12164.

    J. Chen, A. M. Variyath and B. Abraham. Adjusted empirical likelihood and its properties. Journal of Computational and Graphical Statistics, 17(2): 426–443, 2008.

    S. C. Emerson, A. B. Owen, et al. Calibration of the empirical likelihood method for a vector mean. Electronic Journal of Statistics, 3: 1161–1192, 2009.

    S. Geman and D. Geman. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence, (6): 721–741, 1984.

    M. Grendár and G. Judge. Empty set problem of maximum empirical likelihood methods. Electronic Journal of Statistics, 3: 1542–1555, 2009.

    H. Haario, E. Saksman and J. Tamminen. Adaptive proposal distribution for random walk Metropolis algorithm. Computational Statistics, 14(3): 375–396, 1999.

    N. A. Lazar. Bayesian empirical likelihood. Biometrika, 90(2): 319–326, 2003.

    B. Leimkuhler and S. Reich. Simulating Hamiltonian dynamics. Cambridge University Press, 2004.

    Y. Liu, J. Chen, et al. Adjusted empirical likelihood with high-order precision. The Annals of Statistics, 38(3): 1341–1362, 2010.

    R. Neal. MCMC using Hamiltonian dynamics. Handbook of Markov Chain Monte Carlo, 113–162, 2011.

    J. Qin and J. Lawless. Empirical likelihood and general estimating equations. The Annals of Statistics, 300–325, 1994.

    M. Tsao. Extending the empirical likelihood by domain expansion. Canadian Journal of Statistics, 41(2): 257–274, 2013.

    M. Tsao and F. Wu. Empirical likelihood on the full parameter space. The Annals of Statistics, 41(4): 2176–2196, 2013.

    M. Tsao and F. Wu. Extended empirical likelihood for estimating equations. Biometrika, 101(3): 703–710, 2014. URL http://dx.doi.org/10.1093/biomet/asu014.

    W. Yu and J. Lim. VBel: Variational Bayes for fast and accurate empirical likelihood inference. 2024. URL https://CRAN.R-project.org/package=VBel. R package version 1.1.0.

    M. Zhou. emplik: Empirical likelihood ratio for censored/truncated data. 2014. URL http://CRAN.R-project.org/package=emplik. R package version 0.9-9-2.
    +
    1. By convention, \(x_i=(x_{i1},x_{i2},\ldots,x_{ip})^T\le x=(x_1,x_2,\ldots,x_p)^T\) iff \(x_{ij}\le x_{j}\) \(\forall j\).↩︎

    3. In Figure (fig:scheme), \(\omega_1=\omega_2=\omega_3=0\), \(\omega_4>0\), and \(\omega_5>0\).↩︎

    5. In Figure (fig:scheme), \(q=2\), and the faces of maximum dimension are the sides of the pentagon. They have \(q=2\) end (i.e. extreme) points.↩︎

    7. The authors are grateful to Prof. Michael Rendall, Department of Sociology, University of Maryland, College Park, for kindly sharing the data on which this example is based.↩︎

    References


    Reuse


    Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".


    Citation


    For attribution, please cite this work as

    Wei, et al., "elhmc: An R Package for Hamiltonian Monte Carlo Sampling in Bayesian Empirical Likelihood", The R Journal, 2026

    BibTeX citation

    @article{RJ-2025-041,
      author = {Wei, Neo Han and Kien, Dang Trung and Chaudhuri, Sanjay},
      title = {elhmc: An R Package for Hamiltonian Monte Carlo Sampling in Bayesian Empirical Likelihood},
      journal = {The R Journal},
      year = {2026},
      note = {https://doi.org/10.32614/RJ-2025-041},
      doi = {10.32614/RJ-2025-041},
      volume = {17},
      issue = {4},
      issn = {2073-4859},
      pages = {237-254}
    }
    + + + + + + + diff --git a/_articles/RJ-2025-041/RJ-2025-041.pdf b/_articles/RJ-2025-041/RJ-2025-041.pdf new file mode 100644 index 0000000000..339c4c8ad5 Binary files /dev/null and b/_articles/RJ-2025-041/RJ-2025-041.pdf differ diff --git a/_articles/RJ-2025-041/RJ-2025-041.zip b/_articles/RJ-2025-041/RJ-2025-041.zip new file mode 100644 index 0000000000..9eefe3c530 Binary files /dev/null and b/_articles/RJ-2025-041/RJ-2025-041.zip differ diff --git a/_articles/RJ-2025-041/RJournal.sty b/_articles/RJ-2025-041/RJournal.sty new file mode 100644 index 0000000000..351990be38 --- /dev/null +++ b/_articles/RJ-2025-041/RJournal.sty @@ -0,0 +1,358 @@ +% Package `RJournal' to use with LaTeX2e +% Copyright (C) 2010 by the R Foundation +% Copyright (C) 2013 by the R Journal +% +% Originally written by Kurt Hornik and Friedrich Leisch with subsequent +% edits by the editorial board +% +% CAUTION: +% Do not modify this style file. Any changes to this file will be reset when your +% article is submitted. +% If you must modify the style or add LaTeX packages to the article, these +% should be specified in RJwrapper.tex + +\NeedsTeXFormat{LaTeX2e}[1995/12/01] +\ProvidesPackage{RJournal}[2025/10/05 v0.17 RJournal package] + +\RequirePackage{tikz} + +% Overall page layout, fonts etc ----------------------------------------------- + +% Issues of of \emph{The R Journal} are created from the standard \LaTeX{} +% document class \pkg{report}. + +\RequirePackage{geometry} +\geometry{a4paper, + textwidth=14cm, top=1cm, bottom=1cm, + includehead,includefoot,centering, + footskip=1.5cm} +\raggedbottom +\sloppy +\clubpenalty = 10000 +\widowpenalty = 10000 +\brokenpenalty = 10000 +\usepackage{microtype} + + +\RequirePackage{fancyhdr} +\fancyhead{} +\fancyheadoffset{2cm} +\fancyhead[L]{\textsc{\RJ@sectionhead}} +\fancyhead[R]{\thepage} +\fancyfoot{} +\fancyfoot[L]{The R Journal Vol. 
\RJ@volume/\RJ@number, \RJ@month~\RJ@year} +\fancyfoot[R]{ISSN 2073-4859} +\pagestyle{fancy} + +% We use the following fonts (all with T1 encoding): +% +% rm & palatino +% tt & inconsolata +% sf & helvetica +% math & palatino + +\RequirePackage{microtype} + +\RequirePackage[scaled=0.92]{helvet} +\RequirePackage{palatino,mathpazo} +\RequirePackage[scaled=1.02]{inconsolata} +\RequirePackage[T1]{fontenc} + +\RequirePackage[hyphens]{url} +\RequirePackage[pagebackref]{hyperref} +\renewcommand{\backref}[1]{[p#1]} + +% Dark blue colour for all links +\RequirePackage{color} +\definecolor{link}{rgb}{0.45,0.51,0.67} +\hypersetup{ + colorlinks,% + citecolor=link,% + filecolor=link,% + linkcolor=link,% + urlcolor=link +} + +% Give the text a little room to breath +\setlength{\parskip}{3pt} +\RequirePackage{setspace} +\setstretch{1.05} + +% Issue and article metadata --------------------------------------------------- + +% Basic front matter information about the issue: volume, number, and +% date. + +\newcommand{\volume}[1]{\def\RJ@volume{#1}} +\newcommand{\volnumber}[1]{\def\RJ@number{#1}} +\renewcommand{\month}[1]{\def\RJ@month{#1}} +\renewcommand{\year}[1]{\def\RJ@year{#1}} + + +% Individual articles correspond to +% chapters, and are contained in |article| environments. This makes it +% easy to have figures counted within articles and hence hyperlinked +% correctly. + +% An article has an author, a title, and optionally a subtitle. We use +% the obvious commands for specifying these. Articles will be put in certain +% journal sections, named by \sectionhead. + +\newcommand {\sectionhead} [1]{\def\RJ@sectionhead{#1}} +\renewcommand{\author} [1]{\def\RJ@author{#1}} +\renewcommand{\title} [1]{\def\RJ@title{#1}} +\newcommand {\subtitle} [1]{\def\RJ@subtitle{#1}} + +% Control appearance of titles: make slightly smaller than usual, and +% suppress section numbering. 
See http://tex.stackexchange.com/questions/69749 +% for why we don't use \setcounter{secnumdepth}{-1} + +\usepackage[medium]{titlesec} +\usepackage{titletoc} +\titleformat{\section} {\normalfont\large\bfseries}{\arabic{section}}{1em}{} +\titleformat{\subsection}{\normalfont\normalsize\bfseries}{\arabic{section}.\arabic{subsection}}{0.5em}{} +\titlecontents{chapter} [0em]{}{}{}{\titlerule*[1em]{.}\contentspage} + +% Article layout --------------------------------------------------------------- + +% Environment |article| clears the article header information at its beginning. +% We use |\FloatBarrier| from the placeins package to keep floats within +% the article. +\RequirePackage{placeins} +\newenvironment{article}{\author{}\title{}\subtitle{}\FloatBarrier}{\FloatBarrier} + +% Refereed articles should have an abstract, so we redefine |\abstract| to +% give the desired style + +\renewcommand{\abstract}[1]{\noindent\textbf{Abstract} #1} +\renewenvironment{abstract}{\noindent\textbf{Abstract}~}{} + +% The real work is done by a redefined version of |\maketitle|. Note +% that even though we do not want chapters (articles) numbered, we +% need to increment the chapter counter, so that figures get correct +% labelling. + +\renewcommand{\maketitle}{% +\noindent + \chapter{\RJ@title}\refstepcounter{chapter} + \ifx\empty\RJ@subtitle + \else + \noindent\textbf{\RJ@subtitle} + \par\nobreak\addvspace{\baselineskip} + \fi + \ifx\empty\RJ@author + \else + \noindent\textit{\RJ@author} + \par\nobreak\addvspace{\baselineskip} + \fi + \@afterindentfalse\@nobreaktrue\@afterheading +} + +% Now for some ugly redefinitions. We do not want articles to start a +% new page. 
(Actually, we do, but this is handled via explicit +% \newpage +% +% The name@of@eq is a hack to get hyperlinks to equations to work +% within each article, even though there may be multiple eq.(1) +% \begin{macrocode} +\renewcommand\chapter{\secdef\RJ@chapter\@schapter} +\providecommand{\nohyphens}{% + \hyphenpenalty=10000\exhyphenpenalty=10000\relax} +\newcommand{\RJ@chapter}{% + \edef\name@of@eq{equation.\@arabic{\c@chapter}}% + \renewcommand{\@seccntformat}[1]{}% + \@startsection{chapter}{0}{0mm}{% + -2\baselineskip \@plus -\baselineskip \@minus -.2ex}{\p@}{% + \phantomsection\normalfont\huge\bfseries\raggedright}} + +% Book reviews should appear as sections in the text and in the pdf bookmarks, +% however we wish them to appear as chapters in the TOC. Thus we define an +% alternative to |\maketitle| for reviews. +\newcommand{\review}[1]{ + \pdfbookmark[1]{#1}{#1} + \section*{#1} + \addtocontents{toc}{\protect\contentsline{chapter}{#1}{\thepage}{#1.1}} +} + +% We want bibliographies as starred sections within articles. +% +\RequirePackage[sectionbib,round]{natbib} +\bibliographystyle{abbrvnat} +\renewcommand{\bibsection}{\section*{References}} + +% Equations, figures and tables are counted within articles, but we do +% not show the article number. For equations it becomes a bit messy to avoid +% having hyperref getting it wrong. + +% \numberwithin{equation}{chapter} +\renewcommand{\theequation}{\@arabic\c@equation} +\renewcommand{\thefigure}{\@arabic\c@figure} +\renewcommand{\thetable}{\@arabic\c@table} + +% Issue layout ----------------------------------------------------------------- + +% Need to provide our own version of |\tableofcontents|. We use the +% tikz package to get the rounded rectangle. Notice that |\section*| +% is really the same as |\chapter*|. 
+\renewcommand{\contentsname}{Contents} +\renewcommand\tableofcontents{% + \vspace{1cm} + \section*{\contentsname} + { \@starttoc{toc} } +} + +\renewcommand{\titlepage}{% + \thispagestyle{empty} + \hypersetup{ + pdftitle={The R Journal Volume \RJ@volume/\RJ@number, \RJ@month \RJ@year},% + pdfauthor={R Foundation for Statistical Computing},% + } + \noindent + \begin{center} + \fontsize{50pt}{50pt}\selectfont + The \raisebox{-8pt}{\includegraphics[height=77pt]{Rlogo-5}}\hspace{10pt} + Journal + + \end{center} + {\large \hfill Volume \RJ@volume/\RJ@number, \RJ@month{} \RJ@year \quad} + + \rule{\textwidth}{1pt} + \begin{center} + {\Large A peer-reviewed, open-access publication of the \\ + R Foundation for Statistical Computing} + \end{center} + + % And finally, put in the TOC box. Note the way |tocdepth| is adjusted + % before and after producing the TOC: thus, we can ensure that only + % articles show up in the printed TOC, but that in the PDF version, + % bookmarks are created for sections and subsections as well (provided + % that the non-starred forms are used). + \setcounter{tocdepth}{0} + \tableofcontents + \setcounter{tocdepth}{2} + \clearpage +} + +% Text formatting -------------------------------------------------------------- + +\newcommand{\R}{R} +\newcommand{\address}[1]{\addvspace{\baselineskip}\noindent\emph{#1}} +\newcommand{\email}[1]{\href{mailto:#1}{\normalfont\texttt{#1}}} + +% Simple font selection is not good enough. For example, |\texttt{--}| +% gives `\texttt{--}', i.e., an endash in typewriter font. Hence, we +% need to turn off ligatures, which currently only happens for commands +% |\code| and |\samp| and the ones derived from them. Hyphenation is +% another issue; it should really be turned off inside |\samp|. And +% most importantly, \LaTeX{} special characters are a nightmare. E.g., +% one needs |\~{}| to produce a tilde in a file name marked by |\file|. 
+% Perhaps a few years ago, most users would have agreed that this may be +% unfortunate but should not be changed to ensure consistency. But with +% the advent of the WWW and the need for getting `|~|' and `|#|' into +% URLs, commands which only treat the escape and grouping characters +% specially have gained acceptance + +\DeclareRobustCommand\code{\bgroup\@noligs\@codex} +\def\@codex#1{\texorpdfstring% +{{\normalfont\ttfamily\hyphenchar\font=-1 #1}}% +{#1}\egroup} +\newcommand{\kbd}[1]{{\normalfont\texttt{#1}}} +\newcommand{\key}[1]{{\normalfont\texttt{\uppercase{#1}}}} +\DeclareRobustCommand\samp{`\bgroup\@noligs\@sampx} +\def\@sampx#1{{\normalfont\texttt{#1}}\egroup'} +\newcommand{\var}[1]{{\normalfont\textsl{#1}}} +\let\env=\code +\newcommand{\file}[1]{{`\normalfont\textsf{#1}'}} +\let\command=\code +\let\option=\samp +\newcommand{\dfn}[1]{{\normalfont\textsl{#1}}} +% \acronym is effectively disabled since not used consistently +\newcommand{\acronym}[1]{#1} +\newcommand{\strong}[1]{\texorpdfstring% +{{\normalfont\fontseries{b}\selectfont #1}}% +{#1}} +\let\pkg=\strong +\newcommand{\CRANpkg}[1]{\href{https://CRAN.R-project.org/package=#1}{\pkg{#1}}}% +\let\cpkg=\CRANpkg +\newcommand{\ctv}[1]{\href{https://CRAN.R-project.org/view=#1}{\emph{#1}}} +\newcommand{\BIOpkg}[1]{\href{https://www.bioconductor.org/packages/release/bioc/html/#1.html}{\pkg{#1}}} + +% Example environments --------------------------------------------------------- +\RequirePackage{fancyvrb} +\RequirePackage{alltt} + +\DefineVerbatimEnvironment{example}{Verbatim}{} +\renewenvironment{example*}{\begin{alltt}}{\end{alltt}} + +% Support for output from Sweave, and generic session style code +% These used to have fontshape=sl for Sinput/Scode/Sin, but pslatex +% won't use a condensed font in that case. 
+ +% Update (2015-05-28 by DS): remove fontsize=\small to match example environment + +\DefineVerbatimEnvironment{Sinput}{Verbatim}{} +\DefineVerbatimEnvironment{Soutput}{Verbatim}{} +\DefineVerbatimEnvironment{Scode}{Verbatim}{} +\DefineVerbatimEnvironment{Sin}{Verbatim}{} +\DefineVerbatimEnvironment{Sout}{Verbatim}{} +\newenvironment{Schunk}{}{} + +% Mathematics ------------------------------------------------------------------ + +% The implementation of |\operatorname| is similar to the mechanism +% \LaTeXe{} uses for functions like sin and cos, and simpler than the +% one of \AmSLaTeX{}. We use |\providecommand| for the definition in +% order to keep the one of the \pkg{amstex} if this package has +% already been loaded. +% \begin{macrocode} +\providecommand{\operatorname}[1]{% + \mathop{\operator@font#1}\nolimits} +\RequirePackage{amsfonts} + +\renewcommand{\P}{% + \mathop{\operator@font I\hspace{-1.5pt}P\hspace{.13pt}}} +\newcommand{\E}{% + \mathop{\operator@font I\hspace{-1.5pt}E\hspace{.13pt}}} +\newcommand{\VAR}{\operatorname{var}} +\newcommand{\COV}{\operatorname{cov}} +\newcommand{\COR}{\operatorname{cor}} + +% Figures ---------------------------------------------------------------------- + +% For use with pandoc > 3.2.1 +\newsavebox\pandoc@box +\newcommand*\pandocbounded[1]{% scales image to fit in text height/width + \sbox\pandoc@box{#1}% + \Gscale@div\@tempa{\textheight}{\dimexpr\ht\pandoc@box+\dp\pandoc@box\relax}% + \Gscale@div\@tempb{\linewidth}{\wd\pandoc@box}% + \ifdim\@tempb\p@<\@tempa\p@\let\@tempa\@tempb\fi% select the smaller of both + \ifdim\@tempa\p@<\p@\scalebox{\@tempa}{\usebox\pandoc@box}% + \else\usebox{\pandoc@box}% + \fi% +} + +\RequirePackage[font=small,labelfont=bf]{caption} + +% Wide environments for figures and tables ------------------------------------- +\RequirePackage{environ} + +% An easy way to make a figure span the full width of the page +\NewEnviron{widefigure}[1][]{ +\begin{figure}[#1] +\advance\leftskip-2cm 
+\begin{minipage}{\dimexpr\textwidth+4cm\relax}% + \captionsetup{margin=2cm} + \BODY +\end{minipage}% +\end{figure} +} + +\NewEnviron{widetable}[1][]{ +\begin{table}[#1] +\advance\leftskip-2cm +\begin{minipage}{\dimexpr\textwidth+4cm\relax}% + \captionsetup{margin=2cm} + \BODY +\end{minipage}% +\end{table} +} diff --git a/_articles/RJ-2025-041/RJwrapper.tex b/_articles/RJ-2025-041/RJwrapper.tex new file mode 100644 index 0000000000..0d73c69b0e --- /dev/null +++ b/_articles/RJ-2025-041/RJwrapper.tex @@ -0,0 +1,60 @@ +\documentclass[a4paper]{report} + +\usepackage[utf8]{inputenc} +\usepackage[T1]{fontenc} +\usepackage{RJournal} + +%% load any required packages, macros etc. FOLLOWING this line + +%\usepackage{Metafix} + +\usepackage{amsmath,amssymb,array} +\usepackage{booktabs} + +\usepackage{tabularx} +\usepackage{amscd} +\usepackage{mathtools} +\usepackage{subcaption} +\usepackage{natbib} +\usepackage{wrapfig} +%\usepackage[colorlinks=true]{hyperref} +%\usepackage{chicago} +%\usepackage[notref]{showkeys} +\def \C{\mathcal{C}} + +\def \S{\mathcal{S}} + +\def\L{\mathcal{L}} + + + +\def\shalf{\mbox{{\tiny$\frac{1}{2}$}}} +\def\half{\mbox{{\small$\frac{1}{2}$}}} +\def\sT{\mbox{\tiny$T$}} + +\def \I{\mathcal{I}} + +\def \F{\mathcal{F}} +\def \R{\mathcal{R}} +\def\E{\mbox{{\rm E\,}}} + +%\renewcommand{\baselinestretch}{1.6} +\DeclareMathOperator*{\argmin}{\arg\!\min} +\DeclareMathOperator*{\argmax}{\arg\!\max} + +\begin{document} + +%% do not edit, for illustration only +\sectionhead{Contributed research article} +\volume{17} +\volnumber{4} +\year{2025} +\month{December} +\setcounter{page}{237} + +%% replace RJtemplate with your article +\begin{article} + \input{kWC} +\end{article} + +\end{document} diff --git a/_articles/RJ-2025-041/figures/bhpsAcfb0.pdf b/_articles/RJ-2025-041/figures/bhpsAcfb0.pdf new file mode 100644 index 0000000000..e6c9bcd29e Binary files /dev/null and b/_articles/RJ-2025-041/figures/bhpsAcfb0.pdf differ diff --git 
a/_articles/RJ-2025-041/figures/bhpsAcfb0.png b/_articles/RJ-2025-041/figures/bhpsAcfb0.png new file mode 100644 index 0000000000..75c585d368 Binary files /dev/null and b/_articles/RJ-2025-041/figures/bhpsAcfb0.png differ diff --git a/_articles/RJ-2025-041/figures/bhpsAcfb0.ps b/_articles/RJ-2025-041/figures/bhpsAcfb0.ps new file mode 100644 index 0000000000..d0d60239bd Binary files /dev/null and b/_articles/RJ-2025-041/figures/bhpsAcfb0.ps differ diff --git a/_articles/RJ-2025-041/figures/bhpsAcfb1.pdf b/_articles/RJ-2025-041/figures/bhpsAcfb1.pdf new file mode 100644 index 0000000000..5d9c4c1926 Binary files /dev/null and b/_articles/RJ-2025-041/figures/bhpsAcfb1.pdf differ diff --git a/_articles/RJ-2025-041/figures/bhpsAcfb1.png b/_articles/RJ-2025-041/figures/bhpsAcfb1.png new file mode 100644 index 0000000000..0f4d38d90e Binary files /dev/null and b/_articles/RJ-2025-041/figures/bhpsAcfb1.png differ diff --git a/_articles/RJ-2025-041/figures/bhpsAcfb1.ps b/_articles/RJ-2025-041/figures/bhpsAcfb1.ps new file mode 100644 index 0000000000..6f6142dc1c Binary files /dev/null and b/_articles/RJ-2025-041/figures/bhpsAcfb1.ps differ diff --git a/_articles/RJ-2025-041/figures/bhpsContour.pdf b/_articles/RJ-2025-041/figures/bhpsContour.pdf new file mode 100644 index 0000000000..4ae7a1ae04 Binary files /dev/null and b/_articles/RJ-2025-041/figures/bhpsContour.pdf differ diff --git a/_articles/RJ-2025-041/figures/bhpsContour.png b/_articles/RJ-2025-041/figures/bhpsContour.png new file mode 100644 index 0000000000..bacaa4d401 Binary files /dev/null and b/_articles/RJ-2025-041/figures/bhpsContour.png differ diff --git a/_articles/RJ-2025-041/figures/dirScan3.jpeg b/_articles/RJ-2025-041/figures/dirScan3.jpeg new file mode 100644 index 0000000000..f65a1a3e34 Binary files /dev/null and b/_articles/RJ-2025-041/figures/dirScan3.jpeg differ diff --git a/_articles/RJ-2025-041/figures/elhmc-008.pdf b/_articles/RJ-2025-041/figures/elhmc-008.pdf new file mode 100644 index 
0000000000..8d305f27fe Binary files /dev/null and b/_articles/RJ-2025-041/figures/elhmc-008.pdf differ diff --git a/_articles/RJ-2025-041/figures/elhmc-008.png b/_articles/RJ-2025-041/figures/elhmc-008.png new file mode 100644 index 0000000000..9dee2647a0 Binary files /dev/null and b/_articles/RJ-2025-041/figures/elhmc-008.png differ diff --git a/_articles/RJ-2025-041/figures/elhmc-010.pdf b/_articles/RJ-2025-041/figures/elhmc-010.pdf new file mode 100644 index 0000000000..c9c87576f3 Binary files /dev/null and b/_articles/RJ-2025-041/figures/elhmc-010.pdf differ diff --git a/_articles/RJ-2025-041/figures/elhmc-010.png b/_articles/RJ-2025-041/figures/elhmc-010.png new file mode 100644 index 0000000000..d108b2e478 Binary files /dev/null and b/_articles/RJ-2025-041/figures/elhmc-010.png differ diff --git a/_articles/RJ-2025-041/figures/elhmcHMC.pdf b/_articles/RJ-2025-041/figures/elhmcHMC.pdf new file mode 100644 index 0000000000..3e2b0fe102 Binary files /dev/null and b/_articles/RJ-2025-041/figures/elhmcHMC.pdf differ diff --git a/_articles/RJ-2025-041/figures/elhmcHMC.png b/_articles/RJ-2025-041/figures/elhmcHMC.png new file mode 100644 index 0000000000..d3375898b8 Binary files /dev/null and b/_articles/RJ-2025-041/figures/elhmcHMC.png differ diff --git a/_articles/RJ-2025-041/kWC.bib b/_articles/RJ-2025-041/kWC.bib new file mode 100644 index 0000000000..a03aa15247 --- /dev/null +++ b/_articles/RJ-2025-041/kWC.bib @@ -0,0 +1,1454 @@ +@manual{VBel, + title = {{VBel}: Variational {Bayes} for Fast and Accurate Empirical Likelihood Inference}, + author = {Weichang Yu and Jeremy Lim}, + year = {2024}, + note = {{R} package version 1.1.0}, + url = {https://CRAN.R-project.org/package=VBel} +} +@article{tsaoFu2014, + author = {Tsao, Min and Wu, Fan}, + title = {Extended empirical likelihood for estimating equations}, + journal = {Biometrika}, + volume = {101}, + number = {3}, + pages = {703-710}, + year = {2014}, + doi = {10.1093/biomet/asu014}, + url = 
{http://dx.doi.org/10.1093/biomet/asu014}, + eprint = {/oup/backfile/content_public/journal/biomet/101/3/10.1093/biomet/asu014/3/asu014.pdf} +} +@article{chaudhuri_handcock_rendall_2008, + author = {Chaudhuri, Sanjay and Handcock, Mark S. and Rendall, Michael S}, + title = {Generalized linear models incorporating population level information: an empirical-likelihood-based approach}, + journal = {Journal of the Royal Statistical Society series B}, + volume = {70}, + part = {2}, + pages = {311--328}, + year = {2008} +} +@article{stan2017, + author = {Carpenter,Bob and Gelman, Andrew and Hoffman, Matthew and Lee, Daniel and Goodrich, Ben and Betancourt, Michael and Brubaker, Marcus and Guo, Jiqiang and Li, Peter and Riddell, Allen}, + title = {Stan: A Probabilistic Programming Language}, + journal = {Journal of Statistical Software, Articles}, + volume = {76}, + number = {1}, + year = {2017}, + keywords = {probabilistic programming; {Bayesian} inference; algorithmic differentiation; Stan}, + abstract = {Stan is a probabilistic programming language for specifying statistical models. A Stan program imperatively defines a log probability function over parameters conditioned on specified data and constants. As of version 2.14.0, Stan provides full {Bayesian} inference for continuous-variable models through Markov chain Monte Carlo methods such as the No-U-Turn sampler, an adaptive form of Hamiltonian Monte Carlo sampling. Penalized maximum likelihood estimates are calculated using optimization methods such as the limited memory Broyden-Fletcher-Goldfarb-Shanno algorithm. Stan is also a platform for computing log densities and their gradients and Hessians, which can be used in alternative algorithms such as variational {Bayes}, expectation propagation, and marginal inference using approximate integration. 
To this end, Stan is set up so that the densities, gradients, and Hessians, along with intermediate quantities of the algorithm such as acceptance probabilities, are easily accessible. Stan can be called from the command line using the cmdstan package, through {R} using the rstan package, and through Python using the pystan package. All three interfaces support sampling and optimization-based inference with diagnostics and posterior analysis. rstan and pystan also provide access to log probabilities, gradients, Hessians, parameter transforms, and specialized plotting.}, + issn = {1548-7660}, + pages = {1--32}, + doi = {10.18637/jss.v076.i01}, + url = {https://www.jstatsoft.org/v076/i01} +} +@article{chaudhuriMondalTeng2017, + author = {Chaudhuri, Sanjay and Mondal, Debashis and Yin, Teng}, + title = {Hamiltonian Monte Carlo sampling in {Bayesian} empirical likelihood computation}, + journal = {Journal of the Royal Statistical Society: Series B (Statistical Methodology)}, + volume = {79}, + number = {1}, + issn = {1467-9868}, + url = {http://dx.doi.org/10.1111/rssb.12164}, + doi = {10.1111/rssb.12164}, + pages = {293--320}, + keywords = {Constrained convex optimization, Empirical likelihood, Generalized linear models, Hamiltonian Monte Carlo methods, Mixed effect models, Score equations, Small area estimation, Unbiased estimating equations}, + year = {2017} +} +@book{Tsao2013ELLforGEE, + author = {Tsao, Min and Wu, Fan}, + publisher = {eprint arXiv:1306.1493}, + title = {Extended empirical likelihood for general estimating equations}, + year = {2013} +} +@book{ye2011interior, + author = {Ye, Yinyu}, + publisher = {John Wiley \& Sons}, + title = {Interior point algorithms: theory and analysis}, + volume = {44}, + year = {2011} +} +@manual{Zhou.:2014nr, + author = {Mai Zhou}, + note = {{R} package version 0.9-9-2}, + title = {emplik: Empirical likelihood ratio for censored/truncated data}, + url = {http://CRAN.R-project.org/package=emplik}, + year = {2014} +} 
+@article{emerson2009calibration, + author = {Emerson, Sarah C and Owen, Art B and others}, + journal = {Electronic Journal of Statistics}, + pages = {1161--1192}, + publisher = {Institute of Mathematical Statistics}, + title = {Calibration of the empirical likelihood method for a vector mean}, + volume = {3}, + year = {2009} +} +@article{haario1999adaptive, + author = {Haario, Heikki and Saksman, Eero and Tamminen, Johanna}, + journal = {Computational Statistics}, + number = {3}, + pages = {375--396}, + publisher = {Citeseer}, + title = {Adaptive proposal distribution for random walk {Metropolis} algorithm}, + volume = {14}, + year = {1999} +} +@article{Hoffman2014Uturn, + author = {Matthew D. Hoffman and Andrew Gelman}, + journal = {Journal of Machine Learning Research}, + pages = {1351-1381}, + title = {The no-U-turn sampler: Adaptively setting path lengths in Hamiltonian Monte Carlo}, + volume = {15}, + year = {2014} +} +@article{hoffman2011no, + author = {Hoffman, Matthew D and Gelman, Andrew}, + journal = {arXiv preprint arXiv:1111.4246}, + title = {The no-U-turn sampler: Adaptively setting path lengths in Hamiltonian Monte Carlo}, + year = {2011} +} +@article{apostol1974mathematical, + author = {Apostol, Tom M}, + publisher = {Addison Wesley Publishing Company}, + title = {Mathematical analysis}, + year = {1974} +} +@book{munkres1975topology, + author = {Munkres, James R}, + publisher = {Prentice-Hall Englewood Cliffs, NJ}, + title = {Topology: a first course}, + volume = {23}, + year = {1975} +} +@article{emplikR, + author = {Owen, A.B. 
and Mai Zhou}, + title = {Empirical likelihood ratio for censored/truncated data}, + url = {http://cran.r-project.org/web/packages/emplik/index.html}, + urldate = {2014-03-13}, + year = {2014-03-13} +} +@article{liu2010adjusted, + author = {Liu, Yukun and Chen, Jiahua and others}, + journal = {The Annals of Statistics}, + number = {3}, + pages = {1341--1362}, + publisher = {Institute of Mathematical Statistics}, + title = {Adjusted empirical likelihood with high-order precision}, + volume = {38}, + year = {2010} +} +@article{peng2013empirical, + author = {Peng, Hanxiang and Schick, Anton and others}, + journal = {Bernoulli}, + number = {3}, + pages = {954--981}, + publisher = {Bernoulli Society for Mathematical Statistics and Probability}, + title = {Empirical likelihood approach to goodness of fit testing}, + volume = {19}, + year = {2013} +} +@article{fang2005expected, + author = {Fang, Kai-tai and Mukerjee, Rahul}, + journal = {Biometrika}, + number = {2}, + pages = {499--503}, + publisher = {Biometrika Trust}, + title = {Expected lengths of confidence intervals based on empirical discrepancy statistics}, + volume = {92}, + year = {2005} +} +@article{kitamura1997empirical, + author = {Kitamura, Yuichi and others}, + journal = {The Annals of Statistics}, + number = {5}, + pages = {2084--2102}, + publisher = {Institute of Mathematical Statistics}, + title = {Empirical likelihood methods with weakly dependent processes}, + volume = {25}, + year = {1997} +} +@article{corcoran1998bartlett, + author = {Corcoran, Stephen A}, + journal = {Biometrika}, + number = {4}, + title = {Bartlett adjustment of empirical discrepancy statistics.}, + volume = {85}, + year = {1998} +} +@article{chen2007second, + author = {Chen, Song Xi and Cui, Hengjian}, + journal = {Journal of Econometrics}, + number = {2}, + pages = {492--516}, + title = {On the second-order properties of empirical likelihood with moment restrictions}, + volume = {141}, + year = {2007} +} 
+@article{Chaudhuri2014twostep, + author = {Chaudhuri, S. and Yin, T}, + journal = {Unpublished manuscript, Department of Statistics and Applied Probability, National University of Singapore}, + title = {Two step {Metropolis} {Hastings} methods for {Bayesian} empirical likelihood}, + year = {2014} +} +@article{sexton1992hamiltonian, + author = {Sexton, JC and Weingarten, DH}, + journal = {Nuclear Physics B}, + number = {3}, + pages = {665--677}, + publisher = {Elsevier}, + title = {Hamiltonian evolution for the hybrid Monte Carlo algorithm}, + volume = {380}, + year = {1992} +} +@article{neal1994improved, + author = {Neal, Radford M}, + journal = {Journal of Computational Physics}, + number = {1}, + pages = {194--203}, + publisher = {Elsevier}, + title = {An improved acceptance procedure for the hybrid Monte Carlo algorithm}, + volume = {111}, + year = {1994} +} +@article{haario2001adaptive, + author = {Haario, Heikki and Saksman, Eero and Tamminen, Johanna and others}, + journal = {Bernoulli}, + number = {2}, + pages = {223--242}, + publisher = {Bernoulli Society for Mathematical Statistics and Probability}, + title = {An adaptive {Metropolis} algorithm}, + volume = {7}, + year = {2001} +} +@book{ghosh1999multivariate, + author = {Ghosh, M. and Natarajan, K.}, + editor = {Ghosh, Subir}, + number = {pp.91-102}, + publisher = {New York: Marcel Dekker}, + title = {Small area estimation: a {Bayesian} perspective. In Multivariate analysis, design of experiments, and survey sampling}, + year = {1999} +} +@book{Smallarea1999Ghosh, + author = {Ghosh, M. & Natarajan K.}, + title = {Small area estimation: a {Bayesian} perspective} +} +@article{qin2009empirical, + author = {Qin, Jing and Zhang, Biao and Leung, Denis HY}, + journal = {Journal of the American Statistical Association}, + number = {488}, + title = {Empirical likelihood in missing data problems}, + volume = {104}, + year = {2009} +} +@article{wang2002empirical, + author = {Wang, Qihua and Rao, J. N. 
K.}, + journal = {The Annals of Statistics}, + number = {3}, + pages = {896--924}, + publisher = {Institute of Mathematical Statistics}, + title = {Empirical likelihood-based inference under imputation for missing response data}, + volume = {30}, + year = {2002} +} +@article{wu2006pseudo, + author = {Wu, Changbao and Rao, J. N. K.}, + journal = {Canadian Journal of Statistics}, + number = {3}, + pages = {359--375}, + publisher = {Wiley Online Library}, + title = {Pseudo-empirical likelihood ratio confidence intervals for complex surveys}, + volume = {34}, + year = {2006} +} +@article{berger2012unified, + author = {Berger, Y. G. and De La Riva Torres, O}, + journal = {Southampton Statistical Sciences Research Institute, http://eprints.soton.ac.uk/337688}, + title = {A unified theory of empirical likelihood ratio confidence intervals for survey data with unequal probabilities and non-negligible sampling fractions}, + year = {2012} +} +@article{rubin1981bayesian, + author = {Rubin, Donald B.}, + journal = {The Annals of Statistics}, + pages = {130--134}, + publisher = {JSTOR}, + title = {The {Bayesian} bootstrap}, + year = {1981} +} +@article{hartley1968new, + author = {Hartley, Herman O. and Rao, J. N. K.}, + journal = {Biometrika}, + number = {3}, + pages = {547--557}, + publisher = {Biometrika Trust}, + title = {A new estimation theory for sample surveys}, + volume = {55}, + year = {1968} +} +@book{McCullagh1983generalized, + author = {McCullagh, P. and Nelder, J.A.}, + publisher = {London: Chapman and Hall}, + title = {Generalized linear models. Monographs on Statistics and Applied Probability}, + year = {1983} +} +@article{chaudhuri2009techniqreport, + author = {Chaudhuri, Sanjay and Ghosh, Malay}, + journal = {National University of Singapore, Technical Report}, + title = {Empirical likelihood for small area estimation}, + year = {2009} +} +@article{ghosh1994small, + author = {Ghosh, Malay and Rao, J. N.
K.}, + journal = {Statistical Science}, + pages = {55--76}, + publisher = {JSTOR}, + title = {Small area estimation: an appraisal}, + year = {1994} +} +@article{ghosh1998generalized, + author = {Ghosh, Malay and Natarajan, Kannan and Stroud, T. W. F. and Carlin, Bradley P}, + journal = {Journal of the American Statistical Association}, + number = {441}, + pages = {273--282}, + publisher = {Taylor \& Francis Group}, + title = {Generalized linear models for small-area estimation}, + volume = {93}, + year = {1998} +} +@article{fowlkes1988evaluating, + author = {Fowlkes, Edward B. and Freeny, Anne E. and Landwehr, James M.}, + journal = {Journal of the American Statistical Association}, + number = {403}, + pages = {611--622}, + publisher = {Taylor \& Francis Group}, + title = {Evaluating logistic models for large contingency tables}, + volume = {83}, + year = {1988} +} +@book{rao2005small, + author = {Rao, J. N. K.}, + publisher = {John Wiley \& Sons}, + title = {Small area estimation}, + volume = {331}, + year = {2005} +} +@article{besag2004markov, + author = {Besag, Julian}, + publisher = {Citeseer}, + title = {Markov chain Monte Carlo methods for statistical inference}, + year = {2004} +} +@article{owen2013self, + author = {Owen, Art B.}, + journal = {Canadian Journal of Statistics}, + number = {3}, + pages = {387--397}, + publisher = {Wiley Online Library}, + title = {Self-concordance for empirical likelihood}, + volume = {41}, + year = {2013} +} +@article{tsao2013empirical, + author = {Tsao, Min and Wu, Fan}, + journal = {The Annals of Statistics}, + number = {4}, + pages = {2176--2196}, + publisher = {Institute of Mathematical Statistics}, + title = {Empirical likelihood on the full parameter space}, + volume = {41}, + year = {2013} +} +@article{tsao2013extending, + author = {Tsao, Min}, + journal = {Canadian Journal of Statistics}, + number = {2}, + pages = {257--274}, + publisher = {Wiley Online Library}, + title = {Extending the empirical likelihood by
domain expansion}, + volume = {41}, + year = {2013} +} +@article{bergsma2012empty, + author = {Bergsma, Wicher and Croon, Marcel and van der Ark, L Andries and others}, + journal = {Electronic Journal of Statistics}, + pages = {2356--2361}, + publisher = {Institute of Mathematical Statistics}, + title = {The empty set and zero likelihood problems in maximum empirical likelihood estimation}, + volume = {6}, + year = {2012} +} +@article{grendar2009empty, + author = {Grend{\'a}r, Marian and Judge, George}, + journal = {Electronic Journal of Statistics}, + pages = {1542--1555}, + title = {Empty set problem of maximum empirical likelihood methods}, + volume = {3}, + year = {2009} +} +@book{wright1997primal, + author = {Wright, Stephen J}, + publisher = {SIAM}, + title = {Primal-dual interior-point methods}, + volume = {54}, + year = {1997} +} +@book{boyd2004convex, + author = {Boyd, Stephen P and Vandenberghe, Lieven}, + publisher = {Cambridge University Press}, + title = {Convex optimization}, + year = {2004} +} +@article{efron1986bootstrap, + author = {Efron, Bradley and Tibshirani, Robert}, + journal = {Statistical Science}, + pages = {54--75}, + publisher = {JSTOR}, + title = {Bootstrap methods for standard errors, confidence intervals, and other measures of statistical accuracy}, + year = {1986} +} +@book{leimkuhler2004simulating, + author = {Leimkuhler, Benedict and Reich, Sebastian}, + publisher = {Cambridge University Press}, + title = {Simulating {Hamiltonian} dynamics}, + volume = {14}, + year = {2004} +} +@book{birdsall2004plasma, + author = {Birdsall, Charles K. and Langdon, A. Bruce}, + publisher = {CRC Press}, + title = {Plasma physics via computer simulation}, + year = {2004} +} +@article{rockafellar1993lagrange, + author = {Rockafellar, R. Tyrrell}, + journal = {SIAM Review}, + number = {2}, + pages = {183--238}, + title = {Lagrange Multipliers and Optimality}, + volume = {35}, + year = {1993} +} +@article{alder2004studies, + author = {Alder, Berni J.
and Wainwright, T. E.}, + journal = {The Journal of Chemical Physics}, + number = {2}, + pages = {459--466}, + publisher = {AIP Publishing}, + title = {Studies in molecular dynamics. I. General method}, + volume = {31}, + year = {1959} +} +@book{esbensen2002multivariate, + author = {Esbensen, Kim and Guyot, Dominique and Westad, Frank and Houmoller, Lars P}, + publisher = {Multivariate Data Analysis}, + title = {Multivariate data analysis-in practice: An introduction to multivariate data analysis and experimental design}, + year = {2002} +} +@article{newey2004higher, + author = {Newey, Whitney K and Smith, Richard J}, + journal = {Econometrica}, + number = {1}, + pages = {219--255}, + publisher = {Wiley Online Library}, + title = {Higher order properties of {GMM} and generalized empirical likelihood estimators}, + volume = {72}, + year = {2004} +} +@article{ahn1995generation, + author = {Ahn, Hongshik and Chen, James J}, + journal = {Journal of Computational and Graphical Statistics}, + number = {1}, + pages = {55--64}, + publisher = {Taylor \& Francis}, + title = {Generation of over-dispersed and under-dispersed binomial variates}, + volume = {4}, + year = {1995} +} +@article{plummer2008penalized, + author = {Plummer, Martyn}, + journal = {Biostatistics}, + number = {3}, + pages = {523--539}, + publisher = {Oxford University Press}, + title = {Penalized loss functions for {Bayesian} model comparison}, + volume = {9}, + year = {2008} +} +@article{chaudhuri2008generalized, + author = {Chaudhuri, Sanjay and Handcock, Mark S and Rendall, Michael S}, + journal = {Journal of the Royal Statistical Society: Series B (Statistical Methodology)}, + number = {2}, + pages = {311--328}, + publisher = {Wiley Online Library}, + title = {Generalized linear models incorporating population level information: an empirical-likelihood-based approach}, + volume = {70}, + year = {2008} +} +@article{grzebyk2004identification, + author = {Grzebyk, Michel and Wild, Pascal and Chouani{\`e}re,
Dominique}, + journal = {Biometrika}, + number = {1}, + pages = {141--151}, + publisher = {Biometrika Trust}, + title = {On identification of multi-factor models with correlated residuals}, + volume = {91}, + year = {2004} +} +@article{chaudhuri2007estimation, + author = {Chaudhuri, Sanjay and Drton, Mathias and Richardson, Thomas S}, + journal = {Biometrika}, + number = {1}, + pages = {199--216}, + publisher = {Biometrika Trust}, + title = {Estimation of a covariance matrix with zeros}, + volume = {94}, + year = {2007} +} +@article{radhakrishna1945information, + author = {Rao, C. Radhakrishna}, + journal = {Bulletin of the Calcutta Mathematical Society}, + number = {3}, + pages = {81--91}, + publisher = {Calcutta Mathematical Society}, + title = {Information and accuracy attainable in the estimation of statistical parameters}, + volume = {37}, + year = {1945} +} +@article{shaby2010exploring, + author = {Shaby, Benjamin and Wells, Martin T}, + journal = {Currently under review}, + pages = {1--17}, + title = {Exploring an adaptive {Metropolis} algorithm}, + volume = {1}, + year = {2010} +} +@article{atchade2006adaptive, + author = {Atchade, Yves F}, + journal = {Methodology and Computing in Applied Probability}, + number = {2}, + pages = {235--254}, + publisher = {Springer}, + title = {An adaptive version for the {Metropolis} adjusted Langevin algorithm with a truncated drift}, + volume = {8}, + year = {2006} +} +@article{roberts1996exponential, + author = {Roberts, Gareth O and Tweedie, Richard L}, + journal = {Bernoulli}, + pages = {341--363}, + publisher = {JSTOR}, + title = {Exponential convergence of Langevin distributions and their discrete approximations}, + year = {1996} +} +@book{amari2000methods, + author = {Amari, Shun-ichi and Nagaoka, Hiroshi}, + publisher = {AMS Bookstore}, + title = {Methods of information geometry}, + volume = {191}, + year = {2000} +} +@article{neal2011mcmc, + author = {Neal, R}, + journal = {Handbook of Markov Chain Monte Carlo}, +
pages = {113--162}, + title = {{MCMC} using {Hamiltonian} dynamics}, + year = {2011} +} +@article{duane1987hybrid, + author = {Duane, Simon and Kennedy, Anthony D and Pendleton, Brian J and Roweth, Duncan}, + journal = {Physics Letters B}, + number = {2}, + pages = {216--222}, + publisher = {Elsevier}, + title = {Hybrid {Monte} {Carlo}}, + volume = {195}, + year = {1987} +} +@article{girolami2011riemann, + author = {Girolami, Mark and Calderhead, Ben}, + journal = {Journal of the Royal Statistical Society: Series B (Statistical Methodology)}, + number = {2}, + pages = {123--214}, + publisher = {Wiley Online Library}, + title = {{Riemann} manifold {Langevin} and {Hamiltonian} {Monte} {Carlo} methods}, + volume = {73}, + year = {2011} +} +@book{owen2010empirical, + author = {Owen, Art B.}, + publisher = {CRC Press}, + title = {Empirical likelihood}, + year = {2010} +} +@article{chen2009review, + author = {Chen, Song Xi and Van Keilegom, Ingrid}, + journal = {Test}, + number = {3}, + pages = {415--447}, + publisher = {Springer}, + title = {A review on empirical likelihood methods for regression}, + volume = {18}, + year = {2009} +} +@article{chen1993empirical, + author = {Chen, Jiahua and Qin, Jing}, + journal = {Biometrika}, + number = {1}, + pages = {107--116}, + publisher = {Biometrika Trust}, + title = {Empirical likelihood estimation for finite populations and the effective usage of auxiliary information}, + volume = {80}, + year = {1993} +} +@article{chen1999pseudo, + author = {Chen, Jiahua and Sitter, RR}, + journal = {Statistica Sinica}, + number = {2}, + pages = {385--406}, + publisher = {Institute of Statistical Science, Academia Sinica}, + title = {A pseudo empirical likelihood approach to the effective use of auxiliary information in complex surveys}, + volume = {9}, + year = {1999} +} +@article{chen2009effects, + author = {Chen, Song Xi and Peng, Liang and Qin, Ying-Li}, + journal = {Biometrika}, + number = {3}, + pages = {711--722}, +
publisher = {Biometrika Trust}, + title = {Effects of data dimension on empirical likelihood}, + volume = {96}, + year = {2009} +} +@article{chen2009empirical, + author = {Chen, Jian and Peng, Liang and Zhao, Yichuan}, + journal = {Journal of Multivariate Analysis}, + number = {1}, + pages = {137--151}, + publisher = {Elsevier}, + title = {Empirical likelihood based confidence intervals for copulas}, + volume = {100}, + year = {2009} +} +@article{otsu2006generalized, + author = {Otsu, Taisuke}, + journal = {Econometric Theory}, + number = {3}, + pages = {513}, + publisher = {Cambridge Univ Press}, + title = {Generalized empirical likelihood inference for nonlinear and time series models under weak identification}, + volume = {22}, + year = {2006} +} +@article{chan2006empirical, + author = {Chan, Ngai Hang and Ling, Shiqing}, + journal = {Econometric Theory}, + number = {3}, + pages = {403}, + publisher = {Cambridge Univ Press}, + title = {Empirical likelihood for {GARCH} models}, + volume = {22}, + year = {2006} +} +@article{nordman2007empirical, + author = {Nordman, Daniel J and Sibbertsen, Philipp and Lahiri, Soumendra N}, + journal = {Journal of Time Series Analysis}, + number = {4}, + pages = {576--599}, + publisher = {Wiley Online Library}, + title = {Empirical likelihood confidence intervals for the mean of a long-range dependent process}, + volume = {28}, + year = {2007} +} +@article{zhao2008empirical-kernel, + author = {Zhao, Yichuan and Chen, Feiming}, + journal = {Journal of Multivariate Analysis}, + number = {2}, + pages = {215--231}, + publisher = {Elsevier}, + title = {Empirical likelihood inference for censored median regression model via nonparametric kernel estimation}, + volume = {99}, + year = {2008} +} +@article{zhao2008empirical, + author = {Zhao, Yichuan and Yang, Song}, + journal = {Annals of the Institute of Statistical Mathematics}, + number = {2}, + pages = {441--457}, + publisher = {Springer}, + title = {Empirical likelihood inference for 
censored median regression with weighted empirical hazard functions}, + volume = {60}, + year = {2008} +} +@article{xue2007empirical, + author = {Xue, Liugen and Zhu, Lixing}, + journal = {Journal of the American Statistical Association}, + number = {478}, + title = {Empirical likelihood for a varying coefficient model with longitudinal data}, + volume = {102}, + year = {2007} +} +@article{you2006block, + author = {You, Jinhong and Chen, Gemai and Zhou, Yong}, + journal = {Canadian Journal of Statistics}, + number = {1}, + pages = {79--96}, + publisher = {Wiley Online Library}, + title = {Block empirical likelihood for longitudinal partially linear regression models}, + volume = {34}, + year = {2006} +} +@article{lu2004empirical, + author = {Lu, Xuewen and Qi, Yongcheng}, + journal = {Probability and Mathematical Statistics}, + number = {2}, + pages = {419--432}, + publisher = {Polish Scientific Publishers}, + title = {Empirical likelihood for the additive risk model}, + volume = {24}, + year = {2004} +} +@article{zhong2000empirical, + author = {Zhong, Bob and Rao, J. N. K.}, + journal = {Biometrika}, + number = {4}, + pages = {929--938}, + publisher = {Biometrika Trust}, + title = {Empirical likelihood inference under stratified random sampling using auxiliary population information}, + volume = {87}, + year = {2000} +} +@article{wang1999empirical, + author = {Wang, Qi-Hua and Jing, Bing-Yi}, + journal = {Statistics \& Probability Letters}, + number = {4}, + pages = {425--433}, + publisher = {Elsevier}, + title = {Empirical likelihood for partial linear models with fixed designs}, + volume = {41}, + year = {1999} +} +@article{qin1993empirical, + author = {Qin, Jing}, + journal = {The Annals of Statistics}, + number = {3}, + pages = {1182--1196}, + publisher = {Institute of Mathematical Statistics}, + title = {Empirical likelihood in biased sample problems}, + volume = {21}, + year = {1993} +} +@article{thomas1975confidence, + author = {Thomas,
David R and Grunkemeier, Gary L}, + journal = {Journal of the American Statistical Association}, + number = {352}, + pages = {865--871}, + publisher = {Taylor \& Francis}, + title = {Confidence interval estimation of survival probabilities for censored data}, + volume = {70}, + year = {1975} +} +@article{tavare1997inferring, + author = {Tavar{\'e}, Simon and Balding, David J and Griffiths, RC and Donnelly, Peter}, + journal = {Genetics}, + number = {2}, + pages = {505--518}, + publisher = {Genetics Soc America}, + title = {Inferring coalescence times from {DNA} sequence data}, + volume = {145}, + year = {1997} +} +@article{marjoram2006modern, + author = {Marjoram, Paul and Tavar{\'e}, Simon}, + journal = {Nature Reviews Genetics}, + number = {10}, + pages = {759--770}, + publisher = {Nature Publishing Group}, + title = {Modern computational approaches for analysing molecular genetic variation data}, + volume = {7}, + year = {2006} +} +@article{beaumont2002approximate, + author = {Beaumont, Mark A and Zhang, Wenyang and Balding, David J}, + journal = {Genetics}, + number = {4}, + pages = {2025--2035}, + publisher = {Genetics Soc America}, + title = {Approximate {Bayesian} computation in population genetics}, + volume = {162}, + year = {2002} +} +@article{akaike1974new, + author = {Akaike, Hirotugu}, + journal = {{IEEE} Transactions on Automatic Control}, + number = {6}, + pages = {716--723}, + publisher = {IEEE}, + title = {A new look at the statistical model identification}, + volume = {19}, + year = {1974} +} +@article{newton1994approximate, + author = {Newton, Michael A and Raftery, Adrian E}, + journal = {Journal of the Royal Statistical Society.
Series B (Methodological)}, + pages = {3--48}, + publisher = {JSTOR}, + title = {Approximate {Bayesian} inference with the weighted likelihood bootstrap}, + year = {1994} +} +@article{chib1995marginal, + author = {Chib, Siddhartha}, + journal = {Journal of the American Statistical Association}, + number = {432}, + pages = {1313--1321}, + publisher = {Taylor \& Francis Group}, + title = {Marginal likelihood from the {Gibbs} output}, + volume = {90}, + year = {1995} +} +@article{green2009reversible, + author = {Green, Peter J and Hastie, David I}, + journal = {Genetics}, + number = {3}, + pages = {1391--1403}, + title = {Reversible jump {MCMC}}, + volume = {155}, + year = {2009} +} +@article{sisson2005transdimensional, + author = {Sisson, Scott A}, + journal = {Journal of the American Statistical Association}, + number = {471}, + pages = {1077--1089}, + publisher = {Taylor \& Francis}, + title = {Transdimensional {Markov} chains: A decade of progress and future perspectives}, + volume = {100}, + year = {2005} +} +@article{waagepetersen2001tutorial, + author = {Waagepetersen, Rasmus and Sorensen, Daniel}, + journal = {International Statistical Review}, + number = {1}, + pages = {49--61}, + publisher = {Wiley Online Library}, + title = {A Tutorial on Reversible Jump {MCMC} with a View toward Applications in {QTL}-mapping}, + volume = {69}, + year = {2001} +} +@article{al2004improving, + author = {Al-Awadhi, Fahimah and Hurn, Merrilee and Jennison, Christopher}, + journal = {Statistics \& Probability Letters}, + number = {2}, + pages = {189--198}, + publisher = {Elsevier}, + title = {Improving the acceptance rate of reversible jump {MCMC} proposals}, + volume = {69}, + year = {2004} +} +@article{jasra2007population, + author = {Jasra, Ajay and Stephens, David A and Holmes, Christopher C}, + journal = {Biometrika}, + number = {4}, + pages = {787--807}, + publisher = {Biometrika Trust}, + title = {Population-based reversible jump {Markov} chain Monte Carlo}, + volume =
{94}, + year = {2007} +} +@article{brooks2003efficient, + author = {Brooks, Stephen P and Giudici, Paolo and Roberts, Gareth O}, + journal = {Journal of the Royal Statistical Society: Series B (Statistical Methodology)}, + number = {1}, + pages = {3--39}, + publisher = {Wiley Online Library}, + title = {Efficient construction of reversible jump {Markov} chain Monte Carlo proposal distributions}, + volume = {65}, + year = {2003} +} +@article{fang2006empirical, + author = {Fang, Kai-Tai and Mukerjee, Rahul}, + journal = {Biometrika}, + number = {3}, + pages = {723--733}, + publisher = {Biometrika Trust}, + title = {Empirical-type likelihoods allowing posterior credible sets with frequentist validity: Higher-order asymptotics}, + volume = {93}, + year = {2006} +} +@article{chang2008bayesian, + author = {Chang, In Hong and Mukerjee, Rahul}, + journal = {Biometrika}, + number = {1}, + pages = {139--147}, + publisher = {Biometrika Trust}, + title = {{Bayesian} and frequentist confidence intervals arising from empirical-type likelihoods}, + volume = {95}, + year = {2008} +} +@article{schennach2007point, + author = {Schennach, Susanne M}, + journal = {The Annals of Statistics}, + number = {2}, + pages = {634--672}, + publisher = {Institute of Mathematical Statistics}, + title = {Point estimation with exponentially tilted empirical likelihood}, + volume = {35}, + year = {2007} +} +@article{tanner1987calculation, + author = {Tanner, Martin A and Wong, Wing Hung}, + journal = {Journal of the American Statistical Association}, + number = {398}, + pages = {528--540}, + publisher = {Taylor \& Francis}, + title = {The calculation of posterior distributions by data augmentation}, + volume = {82}, + year = {1987} +} +@article{geman1984stochastic, + author = {Geman, Stuart and Geman, Donald}, + journal = {{IEEE} Transactions on Pattern Analysis and Machine Intelligence}, + number = {6}, + pages = {721--741}, + publisher = {IEEE}, + title = {Stochastic relaxation, {Gibbs}
distributions, and the {Bayesian} restoration of images}, + year = {1984} +} +@article{metropolis1953equation, + author = {Metropolis, Nicholas and Rosenbluth, Arianna W and Rosenbluth, Marshall N and Teller, Augusta H and Teller, Edward}, + journal = {The Journal of Chemical Physics}, + pages = {1087--1092}, + title = {Equation of state calculations by fast computing machines}, + volume = {21}, + year = {1953} +} +@book{robert2007bayesian, + author = {Robert, Christian}, + publisher = {Springer}, + title = {The {Bayesian} choice: from decision-theoretic foundations to computational implementation}, + year = {2007} +} +@book{bernardo2009bayesian, + author = {Bernardo, Jos{\'e} M and Smith, Adrian FM}, + publisher = {Wiley}, + title = {{Bayesian} theory}, + volume = {405}, + year = {2009} +} +@article{rossi2003bayesian, + author = {Rossi, Peter E and Allenby, Greg M}, + journal = {Marketing Science}, + number = {3}, + pages = {304--328}, + publisher = {INFORMS}, + title = {{Bayesian} statistics and marketing}, + volume = {22}, + year = {2003} +} +@article{brooks1998markov, + author = {Brooks, Stephen}, + journal = {Journal of the Royal Statistical Society: Series D (The Statistician)}, + number = {1}, + pages = {69--100}, + publisher = {Wiley Online Library}, + title = {Markov chain Monte Carlo method and its application}, + volume = {47}, + year = {1998} +} +@article{zhu2000comparing, + author = {Zhu, Li and Carlin, Bradley P}, + journal = {Statistics in Medicine}, + number = {17-18}, + pages = {2265--2278}, + publisher = {Wiley Online Library}, + title = {Comparing hierarchical models for spatio-temporally misaligned data using the deviance information criterion}, + volume = {19}, + year = {2000} +} +@article{andrews1974scale, + author = {Andrews, David F and Mallows, Colin L}, + journal = {Journal of the Royal Statistical Society.
Series B (Methodological)}, + pages = {99--102}, + publisher = {JSTOR}, + title = {Scale mixtures of normal distributions}, + year = {1974} +} +@article{smith1993bayesian, + author = {Smith, Adrian FM and Roberts, Gareth O}, + journal = {Journal of the Royal Statistical Society. Series B (Methodological)}, + pages = {3--23}, + publisher = {JSTOR}, + title = {{Bayesian} computation via the {Gibbs} sampler and related {Markov} chain Monte Carlo methods}, + year = {1993} +} +@article{grendar2009asymptotic, + author = {Grend{\'a}r, Marian and Judge, George}, + journal = {The Annals of Statistics}, + number = {5A}, + pages = {2445--2457}, + publisher = {Institute of Mathematical Statistics}, + title = {Asymptotic equivalence of empirical likelihood and {Bayesian} {MAP}}, + volume = {37}, + year = {2009} +} +@article{kleijn2006misspecification, + author = {Kleijn, Bas JK and van der Vaart, Aad W}, + journal = {The Annals of Statistics}, + number = {2}, + pages = {837--877}, + publisher = {Institute of Mathematical Statistics}, + title = {Misspecification in infinite-dimensional {Bayesian} statistics}, + volume = {34}, + year = {2006} +} +@article{shi2000empirical, + author = {Shi, Jian and Lau, Tai-Shing}, + journal = {Journal of Multivariate Analysis}, + number = {1}, + pages = {132--148}, + publisher = {Elsevier}, + title = {Empirical likelihood for partially linear models}, + volume = {72}, + year = {2000} +} +@article{hall1990methodology, + author = {Hall, Peter and La Scala, Barbara}, + journal = {International Statistical Review/Revue Internationale de Statistique}, + pages = {109--127}, + publisher = {JSTOR}, + title = {Methodology and algorithms of empirical likelihood}, + year = {1990} +} +@article{diciccio1991empirical, + author = {DiCiccio, Thomas and Hall, Peter and Romano, Joseph}, + journal = {The Annals of Statistics}, + number = {2}, + pages = {1053--1061}, + publisher = {Institute of Mathematical Statistics}, + title = {Empirical likelihood is 
Bartlett-correctable}, + volume = {19}, + year = {1991} +} +@article{owen1990empirical, + author = {Owen, Art}, + journal = {The Annals of Statistics}, + number = {1}, + pages = {90--120}, + publisher = {Institute of Mathematical Statistics}, + title = {Empirical likelihood ratio confidence regions}, + volume = {18}, + year = {1990} +} +@article{franccois2011deviance, + author = {Fran{\c{c}}ois, Olivier and Laval, Guillaume}, + journal = {Statistical Applications in Genetics and Molecular Biology}, + number = {1}, + title = {Deviance information criteria for model selection in approximate {Bayesian} computation}, + volume = {10}, + year = {2011} +} +@article{li2012robust, + author = {Li, Yong and Zeng, Tao and Yu, Jun}, + title = {Robust Deviance Information Criterion for Latent Variable Models}, + year = {2012} +} +@article{shriner2009deviance, + author = {Shriner, Daniel and Yi, Nengjun}, + journal = {Computational Statistics \& Data Analysis}, + number = {5}, + pages = {1850--1860}, + publisher = {Elsevier}, + title = {Deviance information criterion (DIC) in {Bayesian} multiple {QTL} mapping}, + volume = {53}, + year = {2009} +} +@article{berg2004deviance, + author = {Berg, Andreas and Meyer, Renate and Yu, Jun}, + journal = {Journal of Business \& Economic Statistics}, + number = {1}, + pages = {107--120}, + publisher = {Taylor \& Francis}, + title = {Deviance information criterion for comparing stochastic volatility models}, + volume = {22}, + year = {2004} +} +@article{celeux2006deviance, + author = {Celeux, Gilles and Forbes, Florence and Robert, Christian P and Titterington, D Michael}, + journal = {Bayesian Analysis}, + number = {4}, + pages = {651--673}, + publisher = {International Society for {Bayesian} Analysis}, + title = {Deviance information criteria for missing data models}, + volume = {1}, + year = {2006} +} +@inproceedings{akaike1973information, + author = {Akaike, Hirotugu}, + booktitle = {Second International Symposium on Information Theory,
Tsahkadsor, Armenian SSR}, + pages = {267--281}, + title = {Information theory and an extension of the maximum likelihood principle}, + year = {1973} +} +@article{george1993variable, + author = {George, Edward I and McCulloch, Robert E}, + journal = {Journal of the American Statistical Association}, + number = {423}, + pages = {881--889}, + publisher = {Taylor \& Francis Group}, + title = {Variable selection via {Gibbs} sampling}, + volume = {88}, + year = {1993} +} +@article{owen1991empirical, + author = {Owen, Art}, + journal = {The Annals of Statistics}, + number = {4}, + pages = {1725--1747}, + publisher = {Institute of Mathematical Statistics}, + title = {Empirical likelihood for linear models}, + volume = {19}, + year = {1991} +} +@book{mosteller1977data, + author = {Mosteller, Frederick and Tukey, John W}, + publisher = {Addison-Wesley}, + series = {Addison-Wesley Series in Behavioral Science: Quantitative Methods}, + title = {Data analysis and regression: a second course in statistics}, + year = {1977} +} +@article{li2012empirical, + author = {Li, Daoji and Pan, Jianxin}, + journal = {Journal of Multivariate Analysis}, + publisher = {Elsevier}, + title = {Empirical likelihood for generalized linear models with longitudinal data}, + year = {2012} +} +@article{fitzmaurice1993likelihood, + author = {Fitzmaurice, Garrett M and Laird, Nan M}, + journal = {Biometrika}, + number = {1}, + pages = {141--151}, + publisher = {Biometrika Trust}, + title = {A likelihood-based method for analysing longitudinal binary responses}, + volume = {80}, + year = {1993} +} +@article{variyath2010empirical, + author = {Variyath, Asokan Mulayath and Chen, Jiahua and Abraham, Bovas}, + journal = {Journal of Statistical Planning and Inference}, + number = {4}, + pages = {971--981}, + publisher = {Elsevier}, + title = {Empirical likelihood based variable selection}, + volume = {140}, + year = {2010} +} +@article{kolaczyk1995information, + author = {Kolaczyk,
Eric D}, + journal = {Unpublished manuscript, Department of Statistics, University of Chicago}, + title = {An information criterion for empirical likelihood with general estimating equations}, + year = {1995} +} +@book{gelman2003bayesian, + author = {Gelman, Andrew and Carlin, John B and Stern, Hal S and Rubin, Donald B}, + publisher = {Chapman \& Hall/CRC}, + title = {{Bayesian} data analysis}, + year = {2003} +} +@article{spiegelhalter2002bayesian, + author = {Spiegelhalter, David J and Best, Nicola G and Carlin, Bradley P and Van Der Linde, Angelika}, + journal = {Journal of the Royal Statistical Society: Series B (Statistical Methodology)}, + number = {4}, + pages = {583--639}, + publisher = {Wiley Online Library}, + title = {{Bayesian} measures of model complexity and fit}, + volume = {64}, + year = {2002} +} +@article{linde2005dic, + author = {van der Linde, Angelika}, + journal = {Statistica Neerlandica}, + number = {1}, + pages = {45--56}, + publisher = {Wiley Online Library}, + title = {{DIC} in variable selection}, + volume = {59}, + year = {2005} +} +@article{chen2010objective, + author = {Chen, Ming-Hui and Dey, Dipak K and M{\"u}ller, Peter and Sun, Dongchu and Ye, Keying}, + journal = {Frontiers of Statistical Decision Making and {Bayesian} Analysis}, + pages = {31--68}, + publisher = {Springer}, + title = {Objective {Bayesian} Inference with Applications}, + year = {2010} +} +@article{wedderburn1974quasi, + author = {Wedderburn, R.W.M.}, + journal = {Biometrika}, + number = {3}, + pages = {439--447}, + publisher = {Biometrika Trust}, + title = {Quasi-likelihood functions, generalized linear models, and the {Gauss}--{Newton} method}, + volume = {61}, + year = {1974} +} +@book{chambers1992statistical, + author = {Chambers, J.M. and Hastie, T.
and others}, + publisher = {Chapman \& Hall London}, + title = {Statistical models in {S}}, + year = {1992} +} +@article{nelder1992likelihood, + author = {Nelder, JA and Lee, Y.}, + journal = {Journal of the Royal Statistical Society. Series B (Methodological)}, + pages = {273--284}, + publisher = {JSTOR}, + title = {Likelihood, quasi-likelihood and pseudolikelihood: some comparisons}, + year = {1992} +} +@article{draper1995assessment, + author = {Draper, D.}, + journal = {Journal of the Royal Statistical Society. Series B (Methodological)}, + pages = {45--97}, + publisher = {JSTOR}, + title = {Assessment and propagation of model uncertainty}, + year = {1995} +} +@book{givens2005computational, + author = {Givens, G.H. and Hoeting, J.A.}, + publisher = {Wiley-Interscience}, + title = {Computational statistics}, + volume = {483}, + year = {2005} +} +@article{chen2003extended, + author = {Chen, S.X. and Cui, H.}, + journal = {Statistica Sinica}, + number = {1}, + pages = {69--82}, + title = {An extended empirical likelihood for generalized linear models}, + volume = {13}, + year = {2003} +} +@article{kolaczyk1994empirical, + author = {Kolaczyk, E.D.}, + journal = {Statistica Sinica}, + number = {1}, + pages = {199--218}, + title = {Empirical likelihood for generalized linear models}, + volume = {4}, + year = {1994} +} +@article{hoeting1999bayesian, + author = {Hoeting, J.A. and Madigan, D. and Raftery, A.E. and Volinsky, C.T.}, + journal = {Statistical Science}, + pages = {382--401}, + publisher = {JSTOR}, + title = {{Bayesian} model averaging: a tutorial}, + year = {1999} +} +@article{robert2002bayesian, + author = {Robert, C.P. and Ryden, T.
and Titterington, D.M.}, + journal = {Journal of the Royal Statistical Society: Series B (Statistical Methodology)}, + number = {1}, + pages = {57--75}, + publisher = {Wiley Online Library}, + title = {{Bayesian} inference in hidden {Markov} models through the reversible jump {Markov} chain Monte Carlo method}, + volume = {62}, + year = {2002} +} +@article{dellaportas2002bayesian, + author = {Dellaportas, P. and Forster, J.J. and Ntzoufras, I.}, + journal = {Statistics and Computing}, + number = {1}, + pages = {27--36}, + publisher = {Springer}, + title = {On {Bayesian} model and variable selection using {MCMC}}, + volume = {12}, + year = {2002} +} +@article{fan2010reversible, + author = {Fan, Y. and Sisson, S.A.}, + journal = {Handbook of {Markov} Chain Monte Carlo: Methods and Applications}, + pages = {67}, + publisher = {Chapman \& Hall/CRC}, + title = {Reversible jump {MCMC}}, + year = {2010} +} +@article{qin1994empirical, + author = {Qin, J. and Lawless, J.}, + journal = {The Annals of Statistics}, + pages = {300--325}, + publisher = {JSTOR}, + title = {Empirical likelihood and general estimating equations}, + year = {1994} +} +@article{chen2008adjusted, + author = {Chen, J. and Variyath, A.M. and Abraham, B.}, + journal = {Journal of Computational and Graphical Statistics}, + number = {2}, + pages = {426--443}, + publisher = {American Statistical Association}, + title = {Adjusted empirical likelihood and its properties}, + volume = {17}, + year = {2008} +} +@article{monahan1992proper, + author = {Monahan, J.F. 
and Boos, D.D.}, + journal = {Biometrika}, + number = {2}, + pages = {271--278}, + publisher = {Biometrika Trust}, + title = {Proper likelihoods for {Bayesian} analysis}, + volume = {79}, + year = {1992} +} +@article{green1995reversible, + author = {Green, P.J.}, + journal = {Biometrika}, + number = {4}, + pages = {711--732}, + publisher = {Biometrika Trust}, + title = {Reversible jump {Markov} chain Monte Carlo computation and {Bayesian} model determination}, + volume = {82}, + year = {1995} +} +@article{owen1988empirical, + author = {Owen, A.B.}, + journal = {Biometrika}, + number = {2}, + pages = {237--249}, + publisher = {Biometrika Trust}, + title = {Empirical likelihood ratio confidence intervals for a single functional}, + volume = {75}, + year = {1988} +} +@article{yang2012bayesian, + author = {Yang, Y. and He, X.}, + journal = {The Annals of Statistics}, + number = {2}, + pages = {1102--1131}, + publisher = {Institute of Mathematical Statistics}, + title = {{Bayesian} empirical likelihood for quantile regression}, + volume = {40}, + year = {2012} +} +@article{tierney1994markov, + author = {Tierney, L.}, + journal = {The Annals of Statistics}, + pages = {1701--1728}, + publisher = {JSTOR}, + title = {{Markov} chains for exploring posterior distributions}, + year = {1994} +} +@article{schennach2005bayesian, + author = {Schennach, S.M.}, + journal = {Biometrika}, + number = {1}, + pages = {31--46}, + publisher = {Biometrika Trust}, + title = {{Bayesian} exponentially tilted empirical likelihood}, + volume = {92}, + year = {2005} +} +@article{rao2010bayesian, + author = {Rao, J.N.K. and Wu, C.}, + journal = {Journal of the Royal Statistical Society: Series B (Statistical Methodology)}, + number = {4}, + pages = {533--544}, + publisher = {Wiley Online Library}, + title = {{Bayesian} pseudo-empirical-likelihood intervals for complex surveys}, + volume = {72}, + year = {2010} +} +@book{owen2001empirical, + author = {Owen, A.B.}, + publisher = {Chapman \& Hall/CRC}, +
title = {Empirical likelihood}, + volume = {92}, + year = {2001} +} +@book{mccullagh1989generalized, + author = {McCullagh, P. and Nelder, J.A.}, + publisher = {Chapman \& Hall/CRC}, + title = {Generalized linear models}, + volume = {37}, + year = {1989} +} +@book{liu2008monte, + author = {Liu, J.S.}, + publisher = {Springer}, + title = {Monte Carlo strategies in scientific computing}, + year = {2008} +} +@article{lazar2003bayesian, + author = {Lazar, N.A.}, + journal = {Biometrika}, + number = {2}, + pages = {319--326}, + publisher = {Biometrika Trust}, + title = {{Bayesian} empirical likelihood}, + volume = {90}, + year = {2003} +} +@article{heidelberger1983simulation, + author = {Heidelberger, P. and Welch, P.D.}, + journal = {Operations Research}, + number = {6}, + pages = {1109--1144}, + publisher = {INFORMS}, + title = {Simulation run length control in the presence of an initial transient}, + volume = {31}, + year = {1983} +} +@article{hastings1970monte, + author = {Hastings, W.K.}, + journal = {Biometrika}, + number = {1}, + pages = {97--109}, + publisher = {Biometrika Trust}, + title = {Monte Carlo sampling methods using {Markov} chains and their applications}, + volume = {57}, + year = {1970} +} +@book{geyer1992markov, + author = {Geyer, C.J.}, + publisher = {Defense Technical Information Center}, + title = {Markov chain Monte Carlo maximum likelihood}, + year = {1992} +} +@article{gelfand1990illustration, + author = {Gelfand, A.E. and Hills, S.E. and Racine-Poon, A. and Smith, A.F.M.}, + journal = {Journal of the American Statistical Association}, + number = {412}, + pages = {972--985}, + publisher = {Taylor \& Francis Group}, + title = {Illustration of {Bayesian} inference in normal data models using {Gibbs} sampling}, + volume = {85}, + year = {1990} +} +@article{chib1995understanding, + author = {Chib, S. 
and Greenberg, E.}, + journal = {The American Statistician}, + number = {4}, + pages = {327--335}, + publisher = {Taylor \& Francis Group}, + title = {Understanding the {Metropolis}-{Hastings} algorithm}, + volume = {49}, + year = {1995} +} +@book{shao2000monte, + author = {Shao, Q.M. and Ibrahim, J.G.}, + publisher = {Springer Series in Statistics, New York}, + title = {Monte Carlo methods in {Bayesian} computation}, + year = {2000} +} +@article{chaudhuri2011empirical, + author = {Chaudhuri, S. and Ghosh, M.}, + journal = {Biometrika}, + number = {2}, + pages = {473--480}, + publisher = {Biometrika Trust}, + title = {Empirical likelihood for small area estimation}, + volume = {98}, + year = {2011} +} diff --git a/_articles/RJ-2025-041/kWC.tex b/_articles/RJ-2025-041/kWC.tex new file mode 100644 index 0000000000..faf1c154ac --- /dev/null +++ b/_articles/RJ-2025-041/kWC.tex @@ -0,0 +1,771 @@ +\title{elhmc: An R Package for Hamiltonian Monte Carlo Sampling in Bayesian Empirical Likelihood} + + +\author{by Neo Han Wei, Dang Trung Kien, and Sanjay Chaudhuri} + +\maketitle +\begin{abstract} +In this article, we describe an {\tt R} package for sampling from an empirical likelihood-based posterior using a Hamiltonian Monte Carlo method. Empirical likelihood-based methodologies have been used in the Bayesian modeling of many problems of interest in recent times. +This semiparametric procedure can easily combine the flexibility of a nonparametric distribution estimator together with the interpretability of a parametric model. The model is specified by estimating equation-based constraints. +Drawing inference from a Bayesian empirical likelihood (BayesEL) posterior is challenging. The likelihood is computed numerically, so no closed-form expression of the posterior exists. Moreover, for any sample of finite size, the support of the likelihood is non-convex, which hinders fast mixing of many Markov Chain Monte Carlo (MCMC) procedures. 
+It has recently been shown that, using the properties of the gradient of the log empirical likelihood, one can devise an efficient Hamiltonian Monte Carlo (HMC) algorithm to sample from a BayesEL posterior. + The package requires the user to specify only the estimating equations, the prior, and their respective gradients. An MCMC sample from the BayesEL posterior of the parameters is then obtained, with the level of detail of the output specified by the user. +\end{abstract} + +\section{Introduction} +Empirical likelihood has several advantages over a traditional parametric likelihood. Even though a correctly specified parametric likelihood is usually the most efficient for parameter estimation, semiparametric methods like empirical likelihood, which use a nonparametric estimate of the underlying distribution, are often more efficient when the model is misspecified. +Empirical likelihood incorporates parametric model-based information as constraints in estimating the underlying distribution, which makes the parametric estimates interpretable. Furthermore, it allows easy incorporation of known additional information not involving the parameters in the analysis. + +Bayesian empirical likelihood (BayesEL) \citep{lazar2003bayesian} methods employ empirical likelihood in the Bayesian paradigm. Given some information about the model parameters in the form of a prior distribution and estimating equations obtained from the model, a likelihood is constructed from a constrained empirical estimate of the underlying distribution. The prior is then used to define a posterior based on this estimated likelihood. +Inference on the parameter is drawn based on samples generated from the posterior distribution. + +BayesEL methods are quite flexible and have been found useful in many areas of statistics. Examples include small area estimation, quantile regression, and the analysis of complex survey data.
+ +BayesEL procedures, however, require an efficient Markov chain Monte Carlo (MCMC) procedure to sample from the resulting posterior. It turns out that such a procedure is not easily specified. For many parameter values, it may not be feasible to compute the constrained empirical distribution function, and the likelihood is estimated to be zero. That is, the estimated likelihood is not supported over the whole space. Moreover, this support is non-convex and impossible to determine in most cases. +Thus, a naive random walk MCMC would quite often propose parameters outside the support and get stuck. + +Many authors have encountered this problem in frequentist applications. Such ``empty set'' problems are quite common \citep{grendar2009empty} and become more frequent in problems with a large number of parameters \citep{bergsma2012empty}. +Several authors \citep{chen2008adjusted,emerson2009calibration,liu2010adjusted} have suggested the addition of extra observations generated from the available data designed specifically to avoid empty sets. They show that such observations can be proposed without changing the asymptotic distribution of the corresponding Wilks' statistics. +Some authors \citep{tsao2013extending,tsao2013empirical,tsaoFu2014} have used a transformation so that the contours of the resultant empirical likelihood could be extended beyond the feasible region. However, in most Bayesian applications the data set is of finite, often small, size, so such asymptotic arguments are of little use. + +With the availability of user-friendly software packages like {\tt STAN} \citep{stan2017}, gradient-assisted MCMC methods like Hamiltonian Monte Carlo (HMC) are becoming increasingly popular in Bayesian computation. When the estimating equations are smooth with respect to the parameters, gradient-based methods have a huge advantage in sampling from a BayesEL posterior.
+This is because \citet{chaudhuriMondalTeng2017} have shown that, under mild conditions, the gradient of the log-posterior diverges to infinity at the boundary of its support. +Due to this phenomenon, if an HMC chain approaches the boundary of the posterior support, it is reflected towards the center. + +There is no other software to implement HMC sampling from a BayesEL posterior with smooth estimating equations and priors. We describe such a library, called {\tt elhmc}, written for the {\tt R} platform. The main function in the library only requires the user to specify, as functions, the estimating equations and the prior, together with their respective gradients with respect to the parameters. Output at a user-specified level of detail can be obtained. + +The \code{elhmc} package has been used by practitioners since it was made available on \code{CRAN}. In recent times, various other libraries for sampling from a BayesEL posterior have been made available. Among them, the library \code{VBel} \citep{VBel} deserves special mention. Its authors compute a variational approximation of the BayesEL posterior from which samples can be easily drawn. However, \code{elhmc} is most often treated as the benchmark. + +The rest of the article is structured as follows. We start with the theoretical background behind the software package. In Section \ref{sec:theory} we first define the empirical likelihood and construct a Bayesian empirical likelihood from it. +The next part of this section is devoted to a review of the properties of the log empirical likelihood gradient. A review of the HMC method with special emphasis on BayesEL sampling is provided next (Section \ref{sec:hmc}). +Section \ref{sec:package} mainly contains the description of the {\tt elhmc} library. Some illustrative examples with artificial and real data sets are presented in Section \ref{sec:examples}.
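To fix ideas before the formal development, the essence of an HMC transition is a leapfrog simulation of Hamiltonian dynamics driven by the gradient of the log posterior, followed by a Metropolis accept--reject step. The following toy sketch (ours, written in Python rather than {\tt R}, and independent of the {\tt elhmc} implementation) samples a standard normal target; in the BayesEL setting the log density and its gradient would be replaced by the numerically computed log empirical likelihood posterior and its gradient:

```python
import math
import random

def hmc_sample(logp_grad, theta0, n_samples, eps=0.2, n_leap=20):
    """Toy one-dimensional Hamiltonian Monte Carlo sampler.

    logp_grad(theta) returns (log p(theta), d/dtheta log p(theta)).
    Each transition draws a Gaussian momentum, simulates the dynamics
    with the leapfrog integrator, and applies a Metropolis correction."""
    samples, theta = [], theta0
    lp, grad = logp_grad(theta)
    for _ in range(n_samples):
        p = random.gauss(0.0, 1.0)             # fresh momentum
        h0 = lp - 0.5 * p * p                  # log of exp(-H) at start
        th, g = theta, grad
        p_half = p + 0.5 * eps * g             # initial half step
        for step in range(n_leap):
            th += eps * p_half                 # full position step
            lp_new, g = logp_grad(th)
            if step < n_leap - 1:
                p_half += eps * g              # full momentum step
        p_end = p_half + 0.5 * eps * g         # final half step
        h1 = lp_new - 0.5 * p_end * p_end
        if math.log(random.random()) < h1 - h0:
            theta, lp, grad = th, lp_new, g    # accept the proposal
        samples.append(theta)
    return samples

# Standard normal target: log p(theta) = -theta^2 / 2 up to a constant.
random.seed(1)
draws = hmc_sample(lambda t: (-0.5 * t * t, -t), theta0=3.0, n_samples=2000)
```

The step size \code{eps} and the number of leapfrog steps \code{n\_leap} play the same role as the corresponding tuning parameters of any HMC sampler.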
+ +\section{Theoretical background}\label{sec:theory} +\subsection{Basics of Bayesian Empirical Likelihood} +Suppose $x=(x_1,\ldots,x_n)$, with each $x_i\in \mathbb{R}^p$, are $n$ observations from a distribution $F^0$ depending on a parameter vector $\theta=(\theta^{(1)}, \ldots,\theta^{(d)})\in\Theta\subseteq \mathbb{R}^d$. We assume that both $F^0$ and the true parameter value $\theta^0$ are unknown. However, certain smooth functions $g(\theta,x)=\left(g_1(\theta,x),\ldots,g_q(\theta,x)\right)^T$ are known to satisfy +\begin{equation}\label{smoothfun} +E_{F^0}[g(\theta^0,x)]=0. +\end{equation} + +Additionally, information about the parameter is available in the form of a prior density $\pi(\theta)$ supported on $\Theta$. We assume that it is neither possible nor desirable to specify $F^0$ in a parametric form. On the other hand, it is not beneficial to estimate $F^0$ completely nonparametrically without taking into account the information from \eqref{smoothfun} in the estimation procedure. + +Empirical likelihood provides a semiparametric procedure to estimate $F^0$ by incorporating the information contained in \eqref{smoothfun}. A likelihood can be computed from the estimate. Moreover, if some information about the parameter is available in the form of a prior distribution, the same likelihood can be employed to derive a posterior of the parameter given the observations. + +Let $F\in\mathcal{F}_{\theta}$ be a distribution function depending on the parameter $\theta$. The empirical likelihood is the maximum of the ``nonparametric likelihood'' +\begin{equation}\label{eqn2} +L(F)=\prod_{i=1}^n \{F(x_i)-F(x_i-)\} +\end{equation} +over $\mathcal{F}_\theta$, $\theta\in\Theta$, under constraints depending on $g(\theta,x)$.
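To make the constrained maximisation concrete: for a scalar mean-type estimating function $g(\theta,x_i)=x_i-\theta$, a standard Lagrangian argument \citep{owen2001empirical} gives the optimal weights in the dual form $\omega_i = 1/\{n(1+\lambda g_i)\}$, where $\lambda$ solves $\sum_i g_i/(1+\lambda g_i)=0$, and the problem is feasible precisely when $0$ lies strictly between $\min_i g_i$ and $\max_i g_i$. A minimal Python sketch of this computation (an illustration, not code taken from the package):

```python
def el_weights(g, tol=1e-12):
    """Maximise prod(w_i) subject to sum(w_i) = 1, w_i > 0 and
    sum(w_i * g_i) = 0 for scalar estimating-function values g_i.
    Dual form: w_i = 1 / (n * (1 + lam * g_i)), where the Lagrange
    multiplier lam is the root of h(lam) = sum_i g_i / (1 + lam * g_i),
    located here by bisection.  Returns None when infeasible."""
    n, gmin, gmax = len(g), min(g), max(g)
    if not (gmin < 0.0 < gmax):      # origin outside the convex hull
        return None                  # corresponds to defining L(theta) = 0
    lo, hi = -1.0 / gmax + 1e-10, -1.0 / gmin - 1e-10
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        # h is strictly decreasing on (lo, hi)
        if sum(gi / (1.0 + mid * gi) for gi in g) > 0.0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return [1.0 / (n * (1.0 + lam * gi)) for gi in g]

# Mean-type constraint g_i = x_i - theta, evaluated at theta = 1.0.
x = [0.2, 0.7, 1.1, 1.6, 2.4]
w = el_weights([xi - 1.0 for xi in x])
```

When the feasibility check fails the function returns \code{None}, which corresponds to setting the empirical likelihood to zero outside its support.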
+ + More specifically, by defining $\omega_i=F(x_i)-F(x_i-)$, the empirical likelihood for $\theta$ is defined as +\begin{equation}\label{eqn3} +L(\theta)\coloneqq\max_{\omega\in\mathcal{W}_{\theta}}\prod_{i=1}^n \omega_i +\end{equation} +where +\[ +\mathcal{W}_{\theta}=\Big\{\omega: \sum_{i=1}^n\omega_i g(\theta,x_i)=0\Big\}\cap\Delta_{n-1} +\] +and $\Delta_{n-1}$ is the $n-1$ dimensional simplex, i.e. $\omega_i\geq 0$, $\forall i$ and $\sum_{i=1}^n\omega_i=1$. For any $\theta$, if the problem in \eqref{eqn3} is infeasible, i.e. $\mathcal{W}_{\theta}=\emptyset$, we define $L(\theta)\coloneqq 0$. + +Using the empirical likelihood $L(\theta)$ and the prior $\pi(\theta)$ we can define a posterior as: +\begin{equation}\label{eqn4} +\Pi(\theta|x)=\frac{L(\theta)\pi(\theta)}{\int L(\theta)\pi(\theta) d\theta}\propto L(\theta)\pi(\theta). +\end{equation} +%\left\{\prod_{i=1}^n\hat{\omega}_i(\theta)\right\}\pi(\theta). + +In Bayesian empirical likelihood (BayesEL), $\Pi(\theta|x)$ is used as the posterior to draw inferences on the parameter. + +Returning to \eqref{eqn3} above, suppose we denote: +\begin{equation}\label{eqn5} +\hat{\omega}(\theta)=\argmax_{\omega\in\mathcal{W}_{\theta}}\prod_{i=1}^n \omega_i. +\qquad\qquad +\Big(\text{ i.e. } L(\theta)=\prod^n_{i=1}\hat{\omega}_i(\theta)\Big) +\end{equation} +A solution with each $\hat\omega_i\geq 0$ exists if and only if the origin of $\mathbb{R}^q$ can be expressed as a convex combination of $g(\theta,x_1),\ldots,g(\theta,x_n)$. Otherwise, the optimisation problem is infeasible, and $\mathcal{W}_{\theta}=\emptyset$. Furthermore, when a solution with $\hat{\omega}_i>0$ for all $i$ is feasible, the solution $\hat{\omega}$ of \eqref{eqn5} is unique. + +The estimate of $F^0$ is given by:\footnote{By convention, $x_i=(x_{i1},x_{i2},\ldots,x_{ip})^T\le x=(x_1,x_2,\ldots,x_p)^T$ iff $x_{ij}\le x_{j}$ $\forall j$.} +\[ +\hat{F}^0(x)=\sum_{i=1}^n\hat{\omega}_i(\theta)1_{\{x_i\leq x\}}.
+\] +The distribution $\hat{F}^0$ is a step function with a jump of $\hat{\omega}_i(\theta)$ on $x_i$. If $\mathcal{W}_{\theta}=\Delta_{n-1}$, i.e. no information about $g(\theta,x)$ is present, it easily follows that $\hat{\omega}_i(\theta)=n^{-1}$, for each $i=1$, $2$, $\ldots$, $n$ and $\hat{F}^0$ is the well-known empirical distribution function. + +%From \eqref{eqn2}, it is clear that if $F$ is continuous, $L(F)=0$. The estimate $\hat{F}^0(x)$ is a step function with a jump of size $\omega_i$ at $x_i$, $i=1,\ldots,n$. Thus for any distribution the estimate of $F^0$ is discrete. Thus $\hat F^0$ is the usual empirical distribution function. + +%The empirical likelihood corresponding to $\hat{F}^0$ is given by +%\begin{equation}\label{emplik} +%L(\theta)=\prod_{i=1}^n \hat{\omega}_i(\theta). +%\end{equation} + +By construction, $\Pi(\theta|x)$ can only be computed numerically. No analytic form is available. Inferences are drawn through the observations from $\Pi(\theta|x)$ sampled using Markov chain Monte Carlo techniques. + +%Since $\hat{\omega}(\theta)$ is determined numerically from \eqref{eqn3}, no analytic form of $\Pi(\theta|x)$ is available in general. + +Adaptation of Markov chain Monte Carlo methods to BayesEL applications poses several challenges. First of all, it is not possible to determine the full conditional densities in a closed form. So techniques like Gibbs sampling \citep{geman1984stochastic} cannot be used. In most cases, random walk Metropolis procedures, with carefully chosen step sizes, are attempted. However, the nature of the support of $\Pi(\theta|x)$, which we discuss in detail below, makes the choice of an appropriate step size extremely difficult. + +\begin{figure}[t] +\includegraphics[width=\textwidth]{figures/dirScan3.jpeg} +\caption{Schematic illustration of the Empirical likelihood problem. The support of the empirical likelihood is $\Theta_1$, a subset of $\mathbb{R}^d$. We take $n=8$ observations. 
The estimating equations $g(\theta,x)$ are $q=2$ dimensional. Note that $\Theta_1$ is non-convex and may not be bounded. The convex hull of the $q$-dimensional vectors, i.e., $\C(\theta,x)$, is a pentagon in $\mathbb{R}^2$. The largest faces of $\C(\theta,x)$ are the one-dimensional sides of the pentagon. +It follows that $\theta^{(k)}\in\Theta_1$ iff the origin of $\mathbb{R}^2$, denoted by $0_2$, is in the interior $\C^0(\theta,x)$ of $\C(\theta,x)$. +This also implies that the optimal empirical likelihood weights $\hat{\omega}(\theta^{(k)})$ are strictly positive and lie in the interior of the $n-1$, i.e. $7$-dimensional simplex. There is no easy way to determine $\Theta_1$. We check if $0_2\in \C^0(\theta,x)$ or equivalently if $\hat{\omega}(\theta^{(k)})$ are in the interior of $\Delta_7$ in order to determine if $\theta^{(k)}\in \Theta_1$. +As the sequence $\theta^{(k)}$ approaches the boundary of $\Theta_1$, the convex polytope $\C(\theta^{(k)},x)$ changes in such a way that $0_2$ converges to its boundary. The sequence of optimal weights $\hat{\omega}(\theta^{(k)})$ will converge to the boundary of $\Delta_7$. +The current software is based on \citet{chaudhuriMondalTeng2017}, who show that, under simple conditions, along almost every sequence $\theta^{(k)}$ converging to the boundary of $\Theta_1$, at least one component of the gradient of the log empirical likelihood based posterior diverges to positive or negative infinity.} +\label{fig:scheme} +\end{figure} + +%For any $\theta\in\Theta$, $\theta$ is in the support of the posterior if and only if $L(\theta)>0$. + +Provided that the prior is positive over the whole $\Theta$, which is true in most applications, the support of $\Pi(\theta|x)$ is a subset of the support of the likelihood $L(\theta)$, which can be defined as (see Figure \ref{fig:scheme}): +\begin{equation}\label{support} +\Theta_1=\left\{\theta: L(\theta)>0\right\}.
+\end{equation} +Thus, the efficiency of the MCMC algorithm would depend on $\Theta_1$ and the behaviour of $\Pi(\theta|x)$ on it. + +By definition, $\Theta_1$ is closely connected to the set +\begin{equation}\label{convexhull} +\C(\theta,x)=\left\{\sum_{i=1}^n\omega_ig(\theta,x_i) \Big|\omega\in \Delta_{n-1}\right\}, +\end{equation} +which is the closed convex hull of the $q$ dimensional vectors $G(x,\theta)=\{g(\theta,x_1),\ldots,g(\theta,x_n)\}$ in $\mathbb{R}^q$ (the pentagon in Figure \ref{fig:scheme}). Suppose $\C^0(\theta,x)$ and $\partial \C(\theta,x)$ are respectively the interior and boundary of $\C(\theta,x)$. By construction, $\C(\theta,x)$ is a convex polytope. Since the data $x$ are fixed, +the set $\C(\theta,x)$ is a set-valued function of $\theta$. For any $\theta\in\Theta$, the problem in \eqref{eqn3} is feasible (i.e. $\mathcal{W}_{\theta}\ne\emptyset$) if and only if the origin of $\mathbb{R}^q$, denoted by $0_q$, is in $\C(\theta,x)$. That is, $\theta\in\Theta_1$ if and only if $0_q\in\C^0(\theta,x)$. +It is not possible to determine $\Theta_1$ in general. The only way is to check if, for any potential $\theta$, the origin $0_q$ is in $\C^0(\theta,x)$. There is no quick numerical way to check the latter either. Generally, an attempt is made to solve \eqref{eqn3}. The existence of a solution with strictly positive weights indicates that $\theta\in \Theta_1$. + +Examples show \citep{chaudhuriMondalTeng2017} that even for simple problems, $\Theta_1$ may not be a convex set. Designing an efficient random walk Markov chain Monte Carlo algorithm on a potentially non-convex support is an extremely challenging task. +Unless the step sizes and the proposal distributions are adapted well to the proximity of the current position to the boundary of $\Theta_1$, the chain may repeatedly propose values outside the likelihood support and, as a result, converge very slowly.
+Adaptive algorithms like the one proposed by \citet{haario1999adaptive} do not tackle the non-convexity problem well. + +Hamiltonian Monte Carlo methods solve well-known equations of motion from classical mechanics to propose new values of $\theta\in\Theta$. Numerical solutions of these equations of motion are dependent on the gradient of the log posterior. +The norm of the gradient of the log empirical likelihood used in BayesEL procedures diverges near the boundary of $\Theta_1$. This property makes the Hamiltonian Monte Carlo procedures very efficient for sampling a BayesEL posterior. It ensures that once in $\Theta_1$, the chain would rarely step outside the support and repeatedly sample from the posterior. + +% it follows that $L(\theta)>0$ if and only if $0\in \C^0(\theta,x)$. We hold the data $x$ fixed so $G(\theta,x)$ and $\C(\theta,x)$ vary with $\theta$. In general, it is not easy to ensure that $0$ is in $\C^0(\theta,x)$. When 0 is not in $\C(\theta,x)$, the primal problem in \eqref{eqn3} is infeasible. That is $\mathcal{W}_{\theta}=\emptyset$. + +\subsection{A Review of Some Properties of the Gradient of Log Empirical Likelihood}\label{sec:elprop} + +Various properties of log-empirical likelihood have been discussed in the literature. However, the properties of its gradients with respect to the model parameters are relatively unknown. Our main goal in this section is to review the behaviour of gradients of log-empirical likelihood on the support of the empirical likelihood. We only state the relevant results here. The proofs of these results can be found in \citet{chaudhuriMondalTeng2017}. + +Recall that, (see Figure \ref{fig:scheme}) the support $\Theta_1$ can only be specified by checking if $0_q\in\C^0(x,\theta_0)$ for each individual $\theta_0\in\Theta$. If for some $\theta_0\in\Theta$, the origin lies on the boundary of $\C(x,\theta_0)$, i.e. 
$0_q\in\partial \C(x,\theta_0)$, +the problem in \eqref{eqn3} is still feasible; however, $L\left(\theta_0\right)=0$ and the solution of \eqref{eqn5} is not unique. Below we discuss how, under mild conditions, for any $\theta_0\in\Theta$, for a large subset $S\subseteq\partial \C(x,\theta_0)$, if $0_q\in S$, +the absolute value of at least one component of the gradient of $\log\left(L\left(\theta_0\right)\right)$ would be large. + +%By construction, for any $\theta_0\in\Theta$, $L(\theta_0)>0$ if and only if $0\in \C^0(x,\theta_0)$. It turns out that (see Lemma \ref{lem2}) when $\theta$ is in the boundary of the support of BayEL posterior, the origin will lie in one of the boundaries of $\C(x,\theta_0)$. The optimisation problem in \eqref{eqn3} is still feasible when $0\in \partial \C(x,\theta_0)$ even though $L(\theta_0)=0$. We show below that under mild conditions for any $\theta_0\in\Theta$ such that $0\in\partial \C(x,\theta_0)$ the gradient of the $\log L(\theta_0)$ with respect to at least one component of $\theta$ has a large positive or a large negative value. + + Before we proceed, we make the following assumptions: + +\begin{enumerate} +\item[(A0)] $\Theta$ is an open set. \label{A0} + +\item[(A1)] $g$ is a continuously differentiable function of $\theta$ in $\Theta$, $q \le d$ and $\Theta_1$ is non-empty. \label{A1} + +\item[(A2)] The sample size $n > q$. The matrix $G(x, \theta)$ has full row rank for any $\theta \in \Theta$. + +\item[(A3)] For any fixed $x$, let $\nabla g(x_i,\theta)$ be the $q \times d$ Jacobian matrix for any $\theta \in \Theta$. Suppose $w=(w_1,\ldots, w_n)\in\Delta_{n-1}$ and there are at least $q$ elements of $w$ that are greater than $0$. Then, for any $\theta \in \Theta$, the matrix $\sum_{i=1}^n w_i \nabla g(x_i,\theta)$ has full row rank. +\end{enumerate} + +Under the above assumptions, several results about the log empirical likelihood and its gradient can be deduced.
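As a computational aside, the membership check $0_q\in\C^0(\theta,x)$ that defines $\Theta_1$ is simple to carry out in low dimensions. For $q=2$, the setting of Figure \ref{fig:scheme}, the origin lies strictly inside the convex hull of nonzero points in $\mathbb{R}^2$ if and only if the points are not all contained in a closed half-plane through the origin, i.e. iff the largest cyclic gap between their polar angles is less than $\pi$. A small Python sketch of this check (an illustration under the $q=2$ assumption, not the method used internally, which proceeds by attempting to solve \eqref{eqn3}):

```python
import math

def origin_in_hull_interior(points, tol=1e-12):
    """True iff the origin is strictly inside the convex hull of the
    given nonzero 2-D points.  The origin is interior iff the points do
    not all fit in a closed half-plane through it, i.e. iff the largest
    cyclic gap between their polar angles is smaller than pi."""
    if len(points) < 3:
        return False
    angles = sorted(math.atan2(y, x) for x, y in points)
    gaps = [b - a for a, b in zip(angles, angles[1:])]
    gaps.append(2.0 * math.pi - (angles[-1] - angles[0]))  # wrap-around
    return max(gaps) < math.pi - tol

# Hull of the g-vectors surrounds the origin: theta would lie in Theta_1.
inside = origin_in_hull_interior([(1.0, 0.0), (-1.0, 1.0), (-1.0, -1.0)])
# All g-vectors in one half-plane: theta would lie outside Theta_1.
outside = origin_in_hull_interior([(1.0, 1.0), (2.0, -1.0), (3.0, 0.5)])
```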
+ +First of all, since the properties of the gradient of the log empirical likelihood at the boundary of the support are of interest, some topological properties of the support need to be investigated. Under the standard topology of $\mathbb{R}^q$, since $\C(x,\theta)$ is a convex polytope with a finite number of faces and extreme points, using the smoothness of $g$, it is easy to see that, +for any $\theta_0\in\Theta_1$ one can find a real number $\delta>0$ such that the open ball centred at $\theta_0$ with radius $\delta$ is contained in $\Theta_1$. That is, $\Theta_1$ is an open subset of $\Theta$. + +Now, since $\Theta_1$ is an open set, the boundary $\partial\Theta_1$ of $\Theta_1$ is not contained in $\Theta_1$. Let $\theta^{(0)}$ lie within $\Theta$ and on the boundary of $\Theta_1$ (i.e. $\partial\Theta_1$). Then it follows that the primal problem \eqref{eqn3} is feasible at $\theta^{(0)}$ and $0_q$ lies on the boundary of $\C(x,\theta^{(0)})$ (i.e. $\partial \C(x,\theta^{(0)})$). + +%\begin{prop}\label{prop2} +%Let $\theta^{(0)}\in\partial\Theta_1$ and lies within $\Theta$. Then the primal problem \eqref{eqn3} is feasible at $\theta^{(0)}$ and $0_q \in \partial \C(x,\theta^{(0)})$. +%\end{prop} + +Our main objective is to study the utility of Hamiltonian Monte Carlo methods for drawing samples from a BayesEL posterior. The sampling scheme will produce a sequence of sample points $\theta^{(k)}\in\Theta_1$ (see Figure \ref{fig:scheme}). It would be efficient as long as $\log L\left(\theta^{(k)}\right)$ is large. The sampling scheme could potentially become inefficient if some $\theta^{(k)}$ +is close to the boundary $\partial\Theta_1$. Thus, it is sufficient to consider the properties of the log empirical likelihood and its gradient along such a sequence converging to a point $\theta^{(0)}\in\partial\Theta_1$.
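The boundary behaviour just described is easy to observe numerically. For the simple mean model $g(\theta,x_i)=x_i-\theta$ with scalar data, the support is $\Theta_1=(\min_i x_i,\,\max_i x_i)$, and a finite-difference estimate of $\frac{d}{d\theta}\log L(\theta)$ stays moderate at the centre of the support but grows rapidly in magnitude as $\theta$ approaches an endpoint. A self-contained Python sketch (ours, for illustration only; it evaluates the standard dual form of the weights, see \citet{owen2001empirical}):

```python
import math

def log_el(theta, x, tol=1e-13):
    """Log empirical likelihood for the mean model g_i = x_i - theta,
    via the dual form w_i = 1 / (n * (1 + lam * g_i)), where the
    Lagrange multiplier lam is located by bisection."""
    g = [xi - theta for xi in x]
    n, gmin, gmax = len(g), min(g), max(g)
    if not (gmin < 0.0 < gmax):
        return float("-inf")          # theta outside the support
    lo, hi = -1.0 / gmax + 1e-12, -1.0 / gmin - 1e-12
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if sum(gi / (1.0 + mid * gi) for gi in g) > 0.0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return -sum(math.log(n * (1.0 + lam * gi)) for gi in g)

x = [0.0, 1.0, 2.0, 3.0, 4.0]          # support Theta_1 = (0, 4)
step = 1e-6
grad = lambda t: (log_el(t + step, x) - log_el(t - step, x)) / (2.0 * step)
g_mid, g_edge = grad(2.0), grad(3.99)  # centre vs near the boundary
```

Here \code{g\_mid} is essentially zero, while \code{g\_edge} is large and negative, reflecting the divergence of the gradient towards the boundary of the support.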
+ +From the discussion above it is evident that when $\theta^{(0)} \in \partial \Theta_1$ the problem in \eqref{eqn3} is feasible, but the likelihood $L\left(\theta^{(0)}\right)$ will always be zero and \eqref{eqn5} will not have a unique solution. Since $\C(x,\theta^{(0)})$ is a polytope, and $0_q$ lies on one of its faces, + there exists a subset $\I_0$ of the observations such that $0$ belongs to the interior of the convex hull generated by all $g(x_i,\theta^{(0)})$ for $i \in \I_0$ (in Figure \ref{fig:scheme}, $\I_0=\{4,5\}$). It follows from the supporting hyperplane theorem \citep{boyd2004convex} that there exists a unit vector $a\in \mathbb{R}^q$ such that +\[ +a^{\text{\tiny $T$}} g(x_i, \theta^{(0)}) =0 \quad \mbox{for} \quad i \in \I_0, \qquad\text{and}\qquad a^{\text{\tiny $T$}} g(x_{i}, \theta^{(0)}) >0 \quad \mbox{for} \quad i \in \I_0^c. +\] +From some algebraic manipulation it easily follows that any $\omega\in\mathcal{W}_{\theta^{(0)}}$ ($\mathcal{W}_{\theta}$ as defined in \eqref{eqn3} with $\theta=\theta^{(0)}$) must satisfy\footnote{In Figure \ref{fig:scheme}, $\omega_1=\omega_2=\omega_3=0$, $\omega_4>0$, and $\omega_5>0$.} +\[\omega_i=0 \quad \mbox{for} \quad i \in \I_0^c \qquad\text{and}\qquad \omega_i>0 \quad \mbox{for} \quad i \in \I_0. +\] + +It is well known that the solution of \eqref{eqn5}, i.e. $\hat{\omega}(\theta)$, is smooth for all $\theta\in\Theta_1$ \citep{qin1994empirical}. As $\theta^{(k)}$ converges to $\theta^{(0)}$, the properties of $\hat{\omega}(\theta^{(k)})$ need to be considered. To that end, we first make a specific choice of $\hat{\omega}(\theta^{(0)})$. + +First, we consider a restriction of problem \eqref{eqn5} to $\I_0$: + +\begin{equation}\label{submax} +\hat\nu(\theta) =\argmax_{\nu\in\mathcal{V}_\theta} \prod_{i\in\I_0} \nu_i +\end{equation} +where +\[ +\mathcal{V}_\theta=\left\{\nu: \sum_{i\in \I_0}\nu_i g(x_i,\theta)=0\right\}\cap\Delta_{|\I_0|-1}.
+\] +We now define +\[ +\hat \omega_i(\theta^{(0)}) = \hat\nu_i(\theta^{(0)}), \quad i \in \I_0 \quad \mbox{and} \quad \hat \omega_i(\theta^{(0)}) = 0, \quad i \in \I_0^c, +\] +and +\[ +L(\theta^{(0)})= \prod_{i=1}^n \hat \omega_i(\theta^{(0)}). +\] + +Since $0_q$ is in the interior of the convex hull generated by $g(x_i,\theta^{(0)})$, $i \in \I_0$, the problem \eqref{submax} has a unique solution. For each $\theta^{(k)}\in\Theta_1$, $\hat{\omega}(\theta^{(k)})$ is continuous, taking values in a compact set. + Thus, as $\theta^{(k)}$ converges to $\theta^{(0)}$, $\hat{\omega}(\theta^{(k)})$ converges to a limit. Furthermore, this limit is a solution of \eqref{eqn5} at $\theta^{(0)}$. However, counterexamples show \citep{chaudhuriMondalTeng2017} that the limit may not be $\hat{\omega}(\theta^{(0)})$ as defined above. +That is, the vectors $\hat{\omega}(\theta^{(k)})$ do not extend continuously to the boundary $\partial\Theta_1$ as a whole. However, we can show that: +\begin{equation} +\lim_{k\to\infty}\hat\omega_i(\theta^{(k)}) = \hat \omega_i(\theta^{(0)}) = 0 \quad \text{for all } i \in \I_0^c. +\end{equation} +That is, the components of $\hat\omega(\theta^{(k)})$ which are zero in $\hat\omega(\theta^{(0)})$ are continuously extendable. Furthermore, +\begin{equation} +\lim_{k\to\infty}L(\theta^{(k)})=L(\theta^{(0)})=0. +\end{equation} +That is, the likelihood is continuous at $\theta^{(0)}$. + +However, this is not true for the components $\hat{\omega}_i\left(\theta^{(k)}\right)$, $i\in\I_0$, for which $\hat{\omega}_i\left(\theta^{(k)}\right)> 0$. + +%The following result about the components $\hat{\omega}_i(\theta^{(k)})$, for $i\in \I_0^c$ and the likelihood can be proved. + +% \begin{thm}\label{continuity1} +%Let $\{ \theta^{(k)}\}$, $k=1,2, \ldots$, be a sequence of points in $\Theta_1$ such that $\theta^{(k)}$ converges to a boundary point $\theta^{(0)}$ of $\Theta_1$. Assume that $\theta^{(0)}$ lies within $\Theta$.
Let $\I_0$ be the subset of $\{1,2,\ldots,n\}$ such that $\hat\omega_i(\theta^{(0)}) >0$ for all $i \in \I_0$. It then follows that
+%\begin{enumerate}
+%\item $\lim_{k\to\infty}\hat\omega_i(\theta^{(k)}) = \hat \omega_i(\theta^{(0)}) = 0$, for all $i \in \I_0^c$.
+%\item $\lim_{k\to\infty}L(\theta^{(k)})=L(\theta^{(0)})=0$.
+%\end{enumerate}
+%\end{thm}
+
+%Theorem \ref{continuity1} shows that the components of $\hat\omega(\theta^{(k)})$ which are zero in $\hat\omega(\theta^{(0)})$ are continuously extendable. Furthermore the likelihood is continuous at $\theta^{(0)}$. Such, however is not true for the components $\hat{\omega}_i\left(\theta^{(k)}\right)$, $i\in\I_0$ for which $\hat{\omega}_i\left(\theta^{(k)}\right)\ne 0$.
+
+Since the set $\C(x,\theta)$ is a convex polytope in $\mathbb{R}^q$, the maximum dimension of any of its proper faces is $q-1$, and a face of this dimension generically has exactly $q$ extreme points.\footnote{In Figure \ref{fig:scheme}, $q=2$, and the faces of maximum dimension are the sides of the pentagon. They have $q=2$ endpoints, i.e.\ extreme points.} Furthermore, any face of smaller dimension can be expressed as an intersection of such $(q-1)$-dimensional faces.
+
+In certain cases, however, the whole vector $\hat{\omega}\left(\theta^{(k)}\right)$ extends continuously to $\hat{\omega}\left(\theta^{(0)}\right)$. In order to argue that, we define
+\begin{equation}\label{Theta_2}
+ \C(x_{\I},\theta) = \left\{\sum_{i \in \I} \omega_i g(x_i,\theta)\, \Big|\, \omega\in \Delta_{|\I|-1}\right\}
+\end{equation}
+and
+\begin{equation}
+ \partial\Theta_1^{(q-1)} = \Big\{ \theta: 0 \in \C^0(x_{\I},\theta) \mbox{ for some } \I \mbox{ such that } \C(x_{\I},\theta) \text{ has exactly $q$ extreme points} \Big\} \cap\partial\Theta_1.
+\end{equation}
+
+ Thus $\partial\Theta_1^{(q-1)}$ is the set of all boundary points $\theta^{(0)}$ of $\Theta_1$ such that $0$ belongs to a $(q-1)$-dimensional face of the convex hull $\mathcal{C}(x,\theta^{(0)})$. 
Now for any $\theta^{(0)}\in \partial\Theta_1^{(q-1)}$, there is a unique set of weights $\nu\in\Delta_{|\I|-1}$ such that $\sum_{i\in\I}\nu_ig\left(x_i,\theta^{(0)}\right)=0$.
+That is, the set of feasible solutions of \eqref{submax} is a singleton. Since $\hat{\omega}$ takes values in a compact set, an argument using convergent subsequences then implies that for any sequence $\theta^{(k)}\in\Theta_1$ converging to $\theta^{(0)}$, the whole vector $\hat{\omega}\left(\theta^{(k)}\right)$ converges to $\hat{\omega}\left(\theta^{(0)}\right)$.
+In other words, the whole vector $\hat{\omega}\left(\theta^{(k)}\right)$ extends continuously to $\hat{\omega}\left(\theta^{(0)}\right)$.
+
+%To that end, first note that since $0 \in \partial \mathcal{C}(x,\theta^{(0)})$, the vector $0$ must belong to one of the hyper-faces of the convex hull $\mathcal{C}(x,\theta^{(0)})$. Now, let $\I_0$ be a
+%$(q-j)$-dimensional hyper-face of the convex hull $\mathcal{C}(x,\theta^{(0)})$ containing $0$ for some $j=1,2, \ldots, q-1$.
+%Further, suppose that there are exactly $q-j+1$ vectors $g(x_i, \theta^{(0)})$ that lie on the closure of this hyper-face. It then follows that there exists a unique set of weights, namely, $\hat \omega_i(\theta^{(0)})$ such that $\sum_i \hat \omega_i(\theta^{(0)}) g(x_i, \theta^{(0)})=0$.
+
+%Since each $\hat \omega(\theta^{(k)})$ belongs to $\Delta_{n-1}$, which is a compact set, it must have a convergent subsequence. By uniqueness of the feasible weights at $\theta^{(0)}$, along any convergent subsequence, $\hat \omega_i(\theta^{(k)})$ must converge to the same $\hat \omega_i(\theta^{(0)})$. This implies that
+%$\hat \omega_i(\theta)$ is continuous at $\theta^{(0)}$ for all $i \in \I_0$. This leads to our second theorem.
+
+%\medskip
+
+%\begin{thm}\label{th:continuity2}
+%Let $\theta^{(k)}$, $\theta^{(0)}$ and $\I_0$ be as defined in Theorem \ref{continuity1}.
Suppose that $0$ belongs to a $(q-j)$-dimensional hyper-face of the convex hull $\mathcal{C}(x,\theta^{(0)})$ for some $j=1,2, \ldots, q-1$. Assume that there are exactly $q-j+1$ vectors $g(x_i, \theta^{(0)})$ that lie on the closure of this hyper-face. Then
+%\[
+%\lim_{k\to\infty} \hat\omega_i(\theta^{(k)}) = \hat\omega_i(\theta^{(0)}), \quad \mbox{for all} \quad i\in \I_0.
+%\]
+%\end{thm}
+
+We now consider the behaviour of the gradient of the log empirical likelihood near the boundary of $\Theta_1$. First, note that, for any $\theta \in \Theta_1$, the gradient of the log empirical likelihood is given by
+\[
+\nabla \log L(\theta) = -n\sum_{i=1}^n \hat \omega_i(\theta) \hat{\lambda}(\theta)^{\text{\tiny $T$}} \nabla g(x_i,\theta),
+\]
+where $\hat{\lambda}(\theta)$ is the estimated Lagrange multiplier satisfying the equation %\footnote{The Lagrange multiplier $\lambda$ should be considered to be a parameter in this problem. The estimate $\hat{\lambda}$ is function of the data $x$ and the parameter $\theta$.}
+
+\begin{equation}\label{eq:lagmult}
+\sum_{i=1}^n \frac{g(x_i,\theta)}{\left\{1+ \hat\lambda(\theta)^{\text{\tiny $T$}} g(x_i,\theta) \right\}}=0.
+\end{equation}
+
+Note that the gradient depends on the value of the Lagrange multiplier, but not on the gradient of the multiplier itself.
+
+%Next we define
+%\[
+%\C(x_{\I},\theta) = \left\{\sum_{i \in \I} \omega_i g(x_i,\theta)\, \Big|\, \omega\in \Delta_{|\I|-1}\right\}
+%\]
+%and
+%\begin{align}\label{Theta_2}
+% &\C(x_{\I},\theta) = \left\{\sum_{i \in \I} \omega_i g(x_i,\theta)\, \Big|\, \omega\in \Delta_{|\I|-1}\right\}\text{ and }\partial\Theta_1^{(q-1)}= \Bigl\{ \theta: 0 \in \C^0(x_{\I},\theta) \mbox{ for some }& \I \Bigr.\nonumber\\
+% &\bigl.\mbox{ such that $\C(x_{\I},\theta)$ has $q$ exactly extreme points }\Bigr\} \cap\partial\Theta_1.
+%\end{align}
+
+% Thus $\partial\Theta_1^{(q-1)}$ is the set of all boundary points $\theta^{(0)}$ of $\Theta_1$ such that $0$ belongs to a $(q-1)$-dimensional face of the convex hull $\mathcal{C}(x,\theta^{(0)})$.
+
+Now, under Assumption A3, it follows that the gradient of the log empirical likelihood diverges on the set of boundary points $\partial\Theta_1^{(q-1)}$. More specifically, one can show:
+\begin{enumerate}
+\item As $\theta^{(k)}\rightarrow \theta^{(0)}$, $\parallel\hat \lambda(\theta^{(k)})\parallel\to\infty$.
+
+\item If $\theta^{(0)}\in \partial\Theta_1^{(q-1)}$, under \text{A3}, as $\theta^{(k)}\rightarrow \theta^{(0)}$, ${\parallel \nabla \log L(\theta^{(k)}) \parallel}\to \infty$.
+\end{enumerate}
+
+%the following result can be proved (see \citet[Theorem $3$]{chaudhuriMondalTeng2017}).
+
+%\medskip
+
+%\begin{thm}\label{lambda}
+%Let $\{ \theta^{(k)}\}$, $\theta^{(0)}\in\partial\Theta^{(q-1)}_1$ be as defined above. Let, for each $k=1,2,\ldots$, the $\hat\lambda(\theta^{(k)})$ be the Lagrange multiplier satisfying equation \ref{eq:lagmult} with $\theta=\theta^{(k)}$. Then
+%\begin{enumerate}
+%\item[(i)] As $\theta^{(k)}\rightarrow \theta^{(0)}$, $\parallel\hat \lambda(\theta^{(k)})\parallel\to\infty$,
+%\item[(ii)] If $\theta^{(0)}\in \partial\Theta_1^{(q-1)}$, under \text{A3} as $\theta^{(k)}\rightarrow \theta^{(0)}$, ${\parallel \nabla \log L(\theta^{(k)}) \parallel}\to \infty$.
+%\end{enumerate}
+%\end{thm}
+
+%\medskip
+
+Therefore, at every boundary point $\theta^{(0)}$ of $\Theta_1$ such that $0$ belongs to one of the $(q-1)$-dimensional faces of $\C(x,\theta^{(0)})$, at least one component of the estimated Lagrange multiplier diverges to positive or negative infinity, and with it the norm of the gradient of the log empirical likelihood diverges as well.
+The gradient of the negative log empirical likelihood points in the direction of its steepest increase. 
Since the value of the log empirical likelihood should typically be highest around the center of the support $\Theta_1$,
+the gradient near the boundary of $\Theta_1$ should point towards its center. This property can be exploited to force candidates of $\theta$ generated by HMC proposals to bounce back towards the interior of $\Theta_1$ from its boundaries, and consequently to reduce the chance of their leaving the support.
+
+%Since the negative log empirical likelihood increases to infinity at these boundary points, the direction of this gradient is in fact the direction at which the negative log empirical likelihood increases the fastest. Typically, we expect that the value of log-likelihood is large in and around the centre of the support.
+%Thus, the gradient near the boundary of the support points to the centre of $\Theta_1$. In the following section we will see that this behaviour of log empirical likelihood forces the candidates of $\theta$ generated by HMC proposals to bounce back to the interior of $\Theta_1$, and consequently once inside $\Theta_1$, the proposed candidates are unlikely to get out of the support.
+
+\subsection{Hamiltonian Monte Carlo Sampling for Bayesian Empirical Likelihood}
+\label{sec:hmc}
+The Hamiltonian Monte Carlo algorithm is a Metropolis algorithm in which the successive steps are proposed using Hamiltonian dynamics. One can visualise these dynamics as a cube sliding without friction under gravity in a bowl with a smooth surface.
+The total energy of the cube is the sum of the potential energy $U(\theta)$, determined by its position $\theta$ (in this case its height), and the kinetic energy $K(p)$, determined by its momentum $p$.
+The total energy of the cube is conserved, and it continues to slide up and down the smooth surface of the bowl forever. The potential and kinetic energies, however, vary with the position of the cube.
+
+In order to use the Hamiltonian dynamics to sample from the posterior $\Pi\left(\theta\mid x\right)$ we set our potential and kinetic energy as follows:
+\[
+U(\theta)=-\log\Pi(\theta|x)\quad\text{and}\quad K(p)=\frac{1}{2}p^TM^{-1}p.
+\]
+Here, the momentum vector $p=\left(p_1,p_2,\ldots,p_d\right)$ is a totally artificial construct usually generated from a $N(0, M)$ distribution. Most often the covariance matrix $M$ is chosen to be a diagonal matrix with diagonal $(m_1,m_2,\ldots,m_d)$, in which case each $m_i$ is interpreted as the mass of the $i$th parameter. The Hamiltonian of the system is the total energy
+\begin{equation}\label{hamiltonian dynamics}
+\mathcal{H}(\theta,p)=U(\theta)+K(p).
+\end{equation}
+
+In Hamiltonian mechanics, the variation in the position $\theta$ and momentum $p$ with time $t$ is determined by the partial derivatives of $\mathcal{H}$ with respect to $p$ and $\theta$ respectively. In particular, the motion is governed by the pair of so-called Hamiltonian equations:
+\begin{eqnarray}
+\frac{d\theta}{dt}&=&\frac{\partial \mathcal{H}}{\partial p}=M^{-1}p, \label{PDE1}\\
+\frac{dp}{dt}&=&-\frac{\partial\mathcal{H}}{\partial \theta}=-\frac{\partial U(\theta)}{\partial \theta}.\label{PDE2}
+\end{eqnarray}
+It is easy to show \citep{neal2011mcmc} that Hamiltonian dynamics is reversible, keeps $\mathcal{H}$ invariant, and preserves volume in $(\theta,p)$ space, which makes it suitable for MCMC sampling schemes.
+
+In HMC we propose successive states by solving the Hamiltonian equations \eqref{PDE1} and \eqref{PDE2}. Unfortunately, they cannot be solved analytically (except of course for a few simple cases), and they must be approximated numerically at discrete time points. There are several ways to numerically approximate these two equations in the literature \citep{leimkuhler2004simulating}.
+For the purpose of MCMC sampling, we need a method that is reversible and volume-preserving.
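These conservation and reversibility properties can be seen concretely in one of the few cases where \eqref{PDE1} and \eqref{PDE2} are solvable in closed form: a standard normal target with $U(\theta)=\theta^2/2$ and $M=1$, for which the flow is a rotation in $(\theta,p)$ phase space. The sketch below is our own illustration (in Python for self-containment; it is not part of the package) and verifies both properties numerically:

```python
import math

# Illustrative sketch (ours, not package code): for a standard normal target,
# U(theta) = theta^2 / 2 with unit mass M = 1, the Hamiltonian equations
# d(theta)/dt = p and dp/dt = -theta have the closed-form solution below:
# a rotation in (theta, p) phase space.
def exact_flow(theta, p, t):
    return (theta * math.cos(t) + p * math.sin(t),
            -theta * math.sin(t) + p * math.cos(t))

def hamiltonian(theta, p):
    return 0.5 * theta ** 2 + 0.5 * p ** 2  # U(theta) + K(p)

theta0, p0 = 1.3, -0.4
theta1, p1 = exact_flow(theta0, p0, t=2.0)

# The Hamiltonian is invariant along the trajectory ...
assert abs(hamiltonian(theta1, p1) - hamiltonian(theta0, p0)) < 1e-12

# ... and the dynamics are reversible: negate the momentum and flow forward
# again for the same time; this returns to the start with negated momentum.
theta2, p2 = exact_flow(theta1, -p1, t=2.0)
assert abs(theta2 - theta0) < 1e-12 and abs(p2 + p0) < 1e-12
```

The rotation also preserves area in the $(\theta,p)$ plane, which is the volume-preservation property required of the discretisation discussed next.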
+
+Leapfrog integration \citep{birdsall2004plasma} is one such method to numerically integrate the pair of Hamiltonian equations. In this method, a step-size $\epsilon$ for the time variable $t$ is first chosen. Given the values of $\theta$ and $p$ at the current time point $t$ (denoted here by $\theta(t)$ and $p(t)$ respectively), the leapfrog updates the position and the momentum at time $t+\epsilon$ as follows:
+\begin{eqnarray}
+p\left(t+\frac{\epsilon}{2}\right)&=&p(t)-\frac{\epsilon}{2}\frac{\partial U(\theta(t))}{\partial\theta},\label{leapfrog1}\\
+\theta(t+\epsilon)&=&\theta(t)+\epsilon M^{-1}p\left(t+\frac{\epsilon}{2}\right),\label{leapfrog2}\\
+p(t+\epsilon)&=&p\left(t+\frac{\epsilon}{2}\right)-\frac{\epsilon}{2}\frac{\partial U(\theta(t+\epsilon))}{\partial\theta}.\label{leapfrog3}
+\end{eqnarray}
+
+Due to its symmetry, the leapfrog integration is reversible and, being a composition of shear transformations, it preserves volume exactly. However, because of the discretisation error, the Hamiltonian is no longer exactly conserved. This is similar to the Langevin-Hastings algorithm \citep{besag2004markov}, which is a special case of HMC.
+Fortunately, the lack of exact energy conservation is easily corrected: the accept-reject step in the MCMC procedure ensures that the chain converges to the correct posterior.
+
+At the beginning of each iteration of the HMC algorithm, the momentum vector $p$ is randomly sampled from the $N(0,M)$ distribution. Starting with the current state $(\theta,p)$, the leapfrog integrator described above is used to simulate Hamiltonian dynamics for $T$ steps with a step size of $\epsilon$. At the end of this $T$-step trajectory, the momentum $p$ is negated so that the Metropolis proposal is symmetric, and the proposed state $(\theta^*,p^*)$ is accepted with probability
+\[
+\min\{1,\exp(-\mathcal{H}(\theta^*,p^*)+\mathcal{H}(\theta,p))\}.
+\]
+
+%Suppose for some $\theta^\text{\tiny $T$}ar\in\Theta$ we define
+%\[
+%\frac{\partial \log\Pi(\theta^\text{\tiny $T$}ar|x)}{\partial\theta}=\frac{\partial\log \Pi(\theta|x)}{\partial\theta}\Big|_{\theta=\theta^\text{\tiny $T$}ar}=\frac{\partial\log L(\theta)}{\partial\theta}\Big|_{\theta=\theta^\text{\tiny $T$}ar}+\frac{\partial\log \pi(\theta)}{\partial\theta}\Big|_{\theta=\theta^\text{\tiny $T$}ar}.
+%\]
+
+The gradient of the log-posterior used in the leapfrog is a sum of the gradient of the log empirical likelihood and the gradient of the log prior. The prior is user-specified, and it is in principle possible that, even though at least one component of the gradient of the log empirical likelihood diverges at the boundary $\partial\Theta_1$,
+the gradient of the log prior behaves in such a way that the effect is cancelled and the gradient of the log posterior remains finite over the closure of $\Theta_1$. We make the following assumption on the prior mainly to avoid this possibility (see \citet{chaudhuriMondalTeng2017} for more details).
+
+%In our application, we take the negative value of the log-posterior as the potential energy. The gradient of the potential, used in leapfrog would depend on both the gradient of the log-likelihood and the gradient of the log-prior. We have discussed some properties of $\partial\log L(\theta)/\partial\theta$ in section \ref{sec:elprop}. The gradient of the log-prior is user specified. We make the following assumption about the prior (see \citet{chaudhuriMondalTeng2017}).

+%\begin{enumerate}
+%\item \label{chpater2A6}Let $\theta=(\theta^{(1)},\ldots,\theta^{(d)})$ such that for each $i=1,2,\ldots,d$, there is at least one $j=1,2,\ldots,q$ such that $h_j(x,\theta)$ directly depends on $\theta^{(i)}$. Now suppose $\{\theta_{k}\}_{k=1}^\infty$ is a sequence of points in $\Theta_1$ such that $\theta_{k}\to\theta_0\in\partial\Theta_1$.
Then for each $i=1,2,\ldots,d$,
+%\[
+%\lim_{k\to\infty} \frac{\left|\partial\log\pi(\theta_k)/\partial\theta^{(i)}_k\right|}{\left|\partial\log\Pi(\theta_k|x)/\partial\theta^{(i)}_k\right|}=0.
+%\]
+%\end{enumerate}
+
+\begin{itemize}
+\item[(A4)] Consider a sequence $\{\theta^{(k)} \}$, $k=1, 2,\ldots$, of points in $\Theta_1$ such that $\theta^{(k)}$ converges to a boundary point $\theta^{(0)}$ of $\Theta_1$. Assume that $\theta^{(0)}$ lies within $\Theta$ and $L(\theta^{(k)})$ strictly decreases to $L(\theta^{(0)})$. Then, for some constant $b(n, \theta^{(0)}) > -1$, we have
+\begin{equation}\label{eq:liminf2}
+ \liminf_{k\to\infty} \frac{ \log\pi(\theta^{(k-1)}) - \log\pi(\theta^{(k)}) }{ \log L(\theta^{(k-1)} ) - \log L(\theta^{(k)} ) } \ge b(n, \theta^{(0)}).
+\end{equation}
+\end{itemize}
+
+The assumption implies that near the boundary of the support, the main contribution to the gradient of the log-posterior with respect to any parameter appearing in the argument of the estimating equations comes from the corresponding gradient of the log empirical likelihood. This is in most cases expected, especially if the sample size is large. For a large sample size, the log-likelihood should be the dominant term in the log-posterior; we are assuming here that the gradients behave the same way. The assumption also ensures that, at the boundary, the gradients of the log empirical likelihood and the log prior do not cancel each other, which is crucial for the proposed Hamiltonian Monte Carlo to work.
+
+Under these assumptions, \citet{chaudhuriMondalTeng2017} show that the gradient of the log-posterior diverges along almost every sequence as the parameter values approach the boundary $\partial \Theta_1$ from the interior of the support. More specifically, they prove that:
+
+\begin{equation}\label{eq:postdiv}
+\Bigl\| \nabla \log \Pi(\theta^{(k)} \mid x) \Bigr\| \rightarrow \infty, \hspace{.1in} \mbox{ as } \hspace{.1in} k \rightarrow \infty.
+\end{equation}
+
+%\medskip
+
+%\begin{thm}\label{postdiv}
+%Consider a sequence $\{\theta^{(k)} \}$, $k=1, 2,\ldots$, of points in $\Theta_1$ such that $\theta^{(k)}$ converges to a boundary point $\theta^{(0)}$ in $\partial \Theta_1^{(q-1)}$. Furthermore, let $\theta^{(0)}$ lie within $\Theta$ and Assumption (A4) hold. Then
+%\begin{equation}\label{eq:postdiv}
+%\Bigl\| \nabla \log \pi(\theta^{(k)} \mid x) \Bigr\| \rightarrow \infty, \hspace{.1in} \mbox{ as } \hspace{.1in} k \rightarrow \infty.
+%\end{equation}
+%\end{thm}
+
+Since the $(q-1)$-dimensional faces of $\mathcal{C}(x,\theta^{(0)})$ have larger volume than its lower-dimensional faces (see Figure \ref{fig:scheme}), a random sequence of points moving from the interior to the boundary converges to a point on $\partial \Theta_1^{(q-1)}$ with probability $1$. Thus, under our assumptions, the gradient of the log-posterior diverges to infinity along such sequences with high probability. The lower-dimensional faces of the convex hull (a polytope) are intersections of $(q-1)$-dimensional faces. It is not clear, however, whether the norm of the gradient of the posterior diverges on those faces; we conjecture that it does. Even if the conjecture is not true, it is clear from the setup that the sampler would rarely move to the region where the origin belongs to a lower-dimensional face of the convex hull.
+
+As pointed out above, the gradient vector always points towards the mode of the posterior. Since, by our results, the gradient is large near the support boundary, whenever the HMC sampler approaches the boundary the high value of the gradient makes it reflect towards the interior of the support rather than leave it. The leapfrog parameters can be tuned to increase the efficiency of sampling.
+
+\section{Package description}\label{sec:package}
+%% Note: If there is markup in \(sub)section, then it has to be escape as above.
+
+The main function of the package is {\tt ELHMC}. It draws samples from an empirical likelihood Bayesian posterior of the parameter of interest using Hamiltonian Monte Carlo, once the estimating equations, their gradients with respect to the parameters, the log prior density, and its gradient are specified. Some other arguments which control the HMC process can also be specified.
+
+Suppose that the data set consists of observations $x = \left( x_1, ..., x_n \right)$ where each $x_i$ is a vector of length $p$ and follows a probability distribution $F$ from the family $\mathcal{F}_{\theta}$. Here $\theta = \left(\theta_1,...,\theta_d\right)$ is the $d$-dimensional parameter of interest associated with $F$. Suppose there exist smooth functions $g\left(\theta, x_i\right) = \left(g_1\left(\theta, x_i\right)\right.$, $\ldots$, $\left. g_q\left(\theta, x_i\right)\right)^T$ which satisfy $E_F\left[g\left(\theta,x_i\right)\right] = 0$. As we have explained above, {\tt ELHMC} is used to draw samples of $\theta$ from its posterior defined by an empirical likelihood.
+
+\begin{table}[ht]
+{\small
+ \begin{tabularx}{\textwidth}{l X}
+ \hline
+ \code{initial} & A vector containing the initial values of the parameter \\
+ \code{data} & A matrix containing the data \\
+ \code{fun} & The estimating function $g$. It takes in a parameter vector \code{params} as the first argument and a data point vector \code{x} as the second argument. This function returns a vector. \\
+ \code{dfun} & A function that calculates the gradient of the estimating function $g$. It takes in a parameter vector \code{params} as the first argument and a data point vector \code{x} as the second argument. This function returns a matrix. \\
+ \code{prior} & A function with one argument \code{x} that returns the log joint prior density of the parameters of interest.
\\
+ \code{dprior} & A function with one argument \code{x} that returns the gradient of the log prior density of the parameters of interest \\
+ \code{n.samples} & Number of samples to draw \\
+ \code{lf.steps} & Number of leapfrog steps in each Hamiltonian Monte Carlo update (defaults to $10$).\\
+ \code{epsilon} & The leapfrog step size (defaults to $0.05$).\\
+ \code{p.variance}& The covariance matrix of a multivariate normal distribution used to generate the initial values of momentum \code{p} in Hamiltonian Monte Carlo. This can also be a single numeric value or a vector (defaults to $0.1$).\\
+ \code{tol} & Tolerance used in the empirical likelihood (EL) computations \\
+ \code{detailed} & If this is set to \code{TRUE}, the function will return a list with extra information. \\
+ \code{print.interval} & The frequency at which the results would be printed on the terminal. Defaults to 1000.\\
+ \code{plot.interval}& The frequency at which the drawn samples would be plotted. The last half of the samples drawn are plotted after each \code{plot.interval} steps. The acceptance rate is also plotted. Defaults to 0, which means no plot.\\
+ \code{which.plot}& The vector of parameters to be plotted after each \code{plot.interval}. Defaults to \code{NULL}, which means no plot.\\
+ \code{FUN}& The same as \code{fun} but takes in a matrix \code{X} instead of
+a vector \code{x} and returns a matrix so that \code{FUN(params, X)[i, ]} is
+the same as \code{fun(params, X[i, ])}. Only one of \code{FUN} and
+\code{fun} should be provided. If both are provided then \code{fun} is ignored.\\
+\code{DFUN}& The same as \code{dfun} but takes in a matrix \code{X} instead of
+a vector \code{x} and returns an array so that \code{DFUN(params, X)[, , i]}
+is the same as \code{dfun(params, X[i, ])}. Only one of \code{DFUN} and
+\code{dfun} should be provided. 
If both are provided then \code{dfun} is ignored.\\
+ \hline
+\end{tabularx}
+}
+\caption{Arguments for function \code{ELHMC}} \label{arguments}
+\end{table}
+
+Table \ref{arguments} lists the full set of arguments for \code{ELHMC}. Arguments \code{data} and \code{fun} define the problem: they are the data set $x$ and the collection of smooth functions in $g$. The user-specified starting point for $\theta$ is given in \code{initial}, whereas \code{n.samples} is the number of samples of $\theta$ to be drawn. The gradient matrix of $g$ with respect to the parameter $\theta$ (i.e. $\nabla_{\theta}g$) has to be specified in \code{dfun}. At the moment the function does not compute the gradient numerically by itself. The argument \code{prior} represents the log joint prior density of $\theta_1,\ldots,\theta_d$; for the purpose of this description we denote the joint prior density by $\pi$.
+The gradient of the log prior is specified in \code{dprior}. %calculates the gradients of log prior densities for $\theta_1,..,\theta_d$.
+This function returns a vector containing the values of $\frac{\partial}{\partial \theta_1}\log\pi\left(\theta\right),\ldots,\frac{\partial}{\partial\theta_d}\log\pi\left(\theta\right)$. Finally, the arguments \code{epsilon}, \code{lf.steps}, \code{p.variance} and \code{tol} are hyper-parameters which control the Hamiltonian Monte Carlo algorithm.
+
+The arguments \code{print.interval}, \code{plot.interval}, and \code{which.plot} can be used to monitor and tune the HMC sampler. They print and plot the sampled values at specified intervals while the code is running. The argument \code{which.plot} allows the user to plot only the variables whose convergence needs to be checked.
+
+Given the data and a value of $\theta$, \code{ELHMC} computes the optimal weights using the \code{el.test} function from the \code{emplik} library \citep{Zhou.:2014nr}. 
The \code{el.test} function provides $\hat{\lambda}\left(\theta^{(k)}\right)$, from which the gradient of the log empirical likelihood can be computed.
+
+If $\theta\not\in\Theta_1$, i.e. problem \eqref{eqn5} is not feasible, then \code{el.test} converges to weights which are all close to zero and do not sum to one. Furthermore, the norm of $\hat{\lambda}\left(\theta^{(k)}\right)$ will be large.
+In such cases, the empirical likelihood will be zero. This means that, whenever the optimal weights are computed, we need to check whether they sum to one (within numerical error).
+
+The function \code{ELHMC} returns a list. If the argument \code{detailed} is set to \code{FALSE}, the list contains the samples of the parameter of interest $\theta$ and the Monte Carlo acceptance rate, as listed in Table \ref{returned}. If \code{detailed} is set to \code{TRUE}, additional information such as the trajectories of $\theta$ and the momentum is included in the returned list (see Table \ref{detailedreturned}).
+
+At the moment \code{ELHMC} only allows a diagonal covariance matrix for the momentum $p$. The default values of the step size \code{epsilon} and the number of steps \code{lf.steps} are $0.05$ and $10$ respectively. For a specific problem they need to be determined by trial and error, using the output produced via the \code{plot.interval} and \code{print.interval} arguments.
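The feasibility check described above (whether the optimal weights sum to one) is easy to see in a small self-contained sketch. The code below is our own illustration in Python, not part of the package (which works through \code{el.test} in R): for the scalar-mean case $g(x_i,\theta)=\theta-x_i$ it solves equation \eqref{eq:lagmult} for the Lagrange multiplier by Newton's method and forms the weights $\hat\omega_i = 1/[n\{1+\hat\lambda\, g(x_i,\theta)\}]$.

```python
# Sketch (ours, purely illustrative): empirical likelihood weights for a
# scalar mean, with estimating function g(x_i, theta) = theta - x_i.
# The Lagrange multiplier lam solves  sum_i g_i / (1 + lam * g_i) = 0,
# and the weights are  w_i = 1 / (n * (1 + lam * g_i)).
def el_weights(xs, theta, iters=100):
    g = [theta - xi for xi in xs]
    lam = 0.0
    for _ in range(iters):  # Newton's method on the dual equation
        f = sum(gi / (1.0 + lam * gi) for gi in g)
        fp = -sum(gi * gi / (1.0 + lam * gi) ** 2 for gi in g)
        lam -= f / fp
    w = [1.0 / (len(xs) * (1.0 + lam * gi)) for gi in g]
    return w, lam

xs = [-1.0, -0.5, 0.0, 0.5, 1.0, 2.0]
w, lam = el_weights(xs, theta=0.2)  # 0.2 lies strictly inside the data range

# Feasible case: the weights are a proper probability distribution and
# satisfy the estimating-equation constraint.
assert abs(sum(w) - 1.0) < 1e-10
assert abs(sum(wi * (0.2 - xi) for wi, xi in zip(w, xs))) < 1e-10
assert all(wi > 0 for wi in w)
```

Here $\theta=0.2$ lies inside the convex hull of the data, so the dual equation has a root and the weights pass the sum-to-one check; this is exactly the check \code{ELHMC} performs after each call to \code{el.test}.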
+
+\begin{table}[t]
+{\small
+\begin{tabularx}{\textwidth}{l X}
+ \hline
+ \code{samples} & A matrix containing the parameter samples \\
+ \code{acceptance.rate} & The acceptance rate \\
+ \code{call} & The matched call \\
+ \hline
+\end{tabularx}
+\caption{Elements of the list returned by \code{ELHMC} if \code{detailed = FALSE}} \label{returned}
+}
+\end{table}
+
+\begin{table}[ht]
+{\small
+\begin{tabularx}{\textwidth}{l X}
+ \hline
+ \code{samples} & A matrix containing the parameter samples \\
+ \code{acceptance.rate} & The acceptance rate \\
+ \code{proposed} & A matrix containing the proposed values at the \code{n.samples - 1} Hamiltonian Monte Carlo updates \\
+ \code{acceptance} & A vector of \code{TRUE/FALSE} values indicating whether each proposed value is accepted \\
+ \code{trajectory} & A list with 2 elements \code{trajectory.q} and \code{trajectory.p}. These are lists of matrices containing position and momentum values along the trajectory in each Hamiltonian Monte Carlo update. \\
+ \code{call} & The matched call \\
+ \hline
+\end{tabularx}
+\caption{Elements of the list returned by \code{ELHMC} if \code{detailed = TRUE}} \label{detailedreturned}
+}
+\end{table}
+
+\section{Examples}\label{sec:examples}
+
+In this section, we present two examples of the use of the package. Both examples in some sense supplement the conditions considered by \citet{chaudhuriMondalTeng2017}. In each case, it is seen that the function can sample from the resulting empirical likelihood-based posterior quite efficiently.
+
+\subsection{Sample the mean of a simple data set}
+
+In the first example, suppose the data set consists of eight data points $v = \left(v_1,...,v_8\right)$:
+
+\begin{verbatim}
+R> v <- rbind(c(1, 1), c(1, 0), c(1, -1), c(0, -1),
++ c(-1, -1), c(-1, 0), c(-1, 1), c(0, 1))
+R> print(v)
+ [,1] [,2]
+[1,] 1 1
+[2,] 1 0
+[3,] 1 -1
+[4,] 0 -1
+[5,] -1 -1
+[6,] -1 0
+[7,] -1 1
+[8,] 0 1
+\end{verbatim}
+
+The parameter of interest is the mean $\theta = \left(\theta_1, \theta_2\right)$. Since $E\left[\theta - v_i\right] = 0$, the smooth function is $g = \theta - v_i$, with $\nabla_\theta g$ the identity matrix $\left(\left(1, 0\right), \left(0, 1\right)\right)$:
+
+\begin{verbatim}
+Function: fun
+R> g <- function(params, x) {
++ params - x
++ }
+
+Function: dfun
+R> dlg <- function(params, x) {
++ rbind(c(1, 0), c(0, 1))
++ }
+\end{verbatim}
+
+Functions \code{g} and \code{dlg} are supplied to arguments \code{fun} and \code{dfun} in \code{ELHMC}. These two functions must have \code{params} as the first argument and \code{x} as the second. \code{params} represents a sample of $\theta$ whereas \code{x} represents a data point $v_i$ or a row in the matrix \code{v}. \code{fun} should return a vector and \code{dfun} a matrix whose $\left(i, j\right)$ entry is $\partial g_i/\partial\theta_j$.
+
+We assume independent standard normal priors for $\theta_1$ and $\theta_2$. Next, we define the function \code{pr}, which calculates the log prior density, and the function \code{dpr}, which calculates the gradient of the log prior density:
+
+\begin{verbatim}
+Function: prior
+R> pr <- function(x) {
++ -.5*(x[1]^2+x[2]^2)-log(2*pi)
++ }
+Function: dprior
+R> dpr <- function(x) {
++ -x
++ }
+\end{verbatim}
+
+Functions \code{pr} and \code{dpr} are assigned to \code{prior} and \code{dprior} in \code{ELHMC}. Both must take a single argument \code{x}; \code{prior} returns a single numeric value (the log joint prior density), while \code{dprior} returns a vector of the same length as $\theta$.
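Since \code{ELHMC} does not compute gradients numerically, a slip in a hand-written gradient function goes straight into the sampler. A cheap safeguard is to compare the analytic gradient with central finite differences. The sketch below is our own illustration in Python, mirroring the \code{pr} and \code{dpr} functions above (the helper name \code{num\_grad} and the tolerance are ours):

```python
import math

# Python mirrors of the R functions above (our illustration, not package code).
def pr(x):
    # Log joint density of two independent N(0, 1) priors.
    return -0.5 * (x[0] ** 2 + x[1] ** 2) - math.log(2 * math.pi)

def dpr(x):
    # Analytic gradient of the log prior.
    return [-x[0], -x[1]]

def num_grad(f, x, h=1e-6):
    """Central finite-difference approximation of the gradient of f at x."""
    g = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        g.append((f(xp) - f(xm)) / (2 * h))
    return g

x = [0.9, 0.95]
# The analytic and numerical gradients should agree to roughly O(h^2).
assert all(abs(a - b) < 1e-6 for a, b in zip(dpr(x), num_grad(pr, x)))
```

The same check applies, and is even more valuable, for the estimating-function gradient supplied through \code{dfun}.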
+
+We can now use \code{ELHMC} to draw samples of $\theta$. Let us draw 1000 samples, starting from $\left(0.9, 0.95\right)$, using 12 leapfrog steps with step size $0.06$ in each Hamiltonian Monte Carlo update:
+
+\begin{verbatim}
+R> library(elhmc)
+R> set.seed(476)
+R> thetas <- ELHMC(initial = c(0.9, 0.95), data = v, fun = g, dfun = dlg,
++ prior = pr, dprior = dpr, n.samples = 1000,
++ lf.steps = 12, epsilon = 0.06, detailed = TRUE)
+\end{verbatim}
+
+We extract and visualise the distribution of the samples using a boxplot (Figure \ref{theta}):
+
+\begin{verbatim}
+R> boxplot(thetas$samples, names = c(expression(theta[1]), expression(theta[2])))
+\end{verbatim}
+
+Since we set \code{detailed = TRUE}, we have data on the trajectory of $\theta$ as well as the momentum $p$. They are stored in the element \code{trajectory} of \code{thetas} and can be accessed by \verb|thetas$trajectory|, a list with two elements named \code{trajectory.q} and \code{trajectory.p}, denoting the trajectories of $\theta$ and the momentum $p$ respectively. \code{trajectory.q} and \code{trajectory.p} are both lists with elements \code{1}, ..., \code{n.samples - 1}. Each of these elements is a matrix containing the trajectory of $\theta$ (\code{trajectory.q}) or $p$ (\code{trajectory.p}) at one Hamiltonian Monte Carlo update.
+
+We illustrate this by extracting the trajectory of $\theta$ at the first update and plotting it (Figure \ref{trajectoryeg1}):
+
+\begin{verbatim}
+R> q <- thetas$trajectory$trajectory.q[[1]]
+R> plot(q, xlab = expression(theta[1]), ylab = expression(theta[2]),
++ xlim = c(-1, 1), ylim = c(-1, 1), cex = 1, pch = 16)
+R> points(v[,1],v[,2],type="p",cex=1.5,pch=16)
+R> abline(h=-1); abline(h=1); abline(v=-1); abline(v=1)
+R> arrows(q[-nrow(q), 1], q[-nrow(q), 2], q[-1, 1], q[-1, 2],
++ length = 0.1, lwd = 1.5)
+\end{verbatim}
+
+\begin{figure}[t]
+ \centering
+ \begin{subfigure}[b]{0.48\textwidth}
+ \centering
+ \includegraphics[width=\textwidth]{figures/elhmc-008}
+ \caption{Posterior distribution of $\theta_1$ and $\theta_2$ samples.}
+ \label{theta}
+ \end{subfigure}
+ \hfill
+ \begin{subfigure}[b]{0.48\textwidth}
+ \centering
+ \includegraphics[width=\textwidth]{figures/elhmc-010}
+ \caption{Trajectory of $\theta$ during the first Monte Carlo update.}
+ \label{trajectoryeg1}
+ \end{subfigure}
+ \caption{Samples of $\theta$ drawn from \code{ELHMC}.}
+ \label{fig:trajectory}
+\end{figure}
+
+The special feature of this example is the choice of the data points in $v$. \citet{chaudhuriMondalTeng2017} show that the chain will reflect off the one-dimensional boundaries of the convex hull (in this case the unit square) when each such boundary contains exactly two observations, which happens with probability one for continuous distributions. In this example, however, some one-dimensional boundaries contain more than two points. Nevertheless, we can see that the HMC method works very well here.
+
+\subsection{Logistic regression with an additional constraint}\label{ex:2}
+In this example, we consider a constrained logistic regression of one binary variable on another, where the expectation of the response is known. The frequentist estimation problem using empirical likelihood was considered by \citet{chaudhuri_handcock_rendall_2008}.
+It has been shown that the empirical likelihood-based formulation has a major practical advantage over the fully parametric formulation. Below we consider a Bayesian extension of this empirical likelihood-based formulation and use \code{ELHMC} to sample from the resulting posterior.
+
+\begin{wraptable}{l}{2in}
+  \centering
+  \begin{tabular}{l||cc}
+  ~&$x=0$&$x=1$\\\hline\hline
+  $y=0$&5903&5157\\
+  $y=1$&230&350\\\hline
+  \end{tabular}
+  \caption{The dataset used in Example \ref{ex:2}.}
+  \label{tab:bhps}
+  \end{wraptable}
+
+The data set $v$ consists of $n$ observations on two binary variables, stored in the columns $X$ and $Y$. In the $i$th row, $y_i$ indicates whether a woman gave birth between time $t - 1$ and $t$, while $x_i$ indicates whether she had at least one child at time $t - 1$. The data can be found in Table \ref{tab:bhps}. In addition, it was known that the prevalent general fertility rate in the population was $0.06179$.\footnote{The authors are grateful to Prof. Michael Rendall, Department of Sociology, University of Maryland, College Park, for kindly sharing the data on which this example is based.}
+
+We are interested in fitting a logistic regression model to the data with $X$ as the independent variable and $Y$ as the dependent variable. However, we also would like to constrain the sample general fertility rate to its value in the population. The logistic regression model takes the form
+
+$$
+ P \left(Y = 1 \mid X = x\right) = \frac{\exp\left(\beta_0 + \beta_1 x\right)}{1 + \exp\left(\beta_0 + \beta_1 x\right)}.
+$$
+
+From the model, using conditions similar to zero-mean residuals and exogeneity, it is clear that
+
+\begin{equation*}
+E\left[y_i - \frac{\exp\left(\beta_0 + \beta_1 x_i\right)}{1 + \exp\left(\beta_0 + \beta_1 x_i\right)}\right] = 0,\quad
+E\left[x_i\left\{y_i - \frac{\exp\left(\beta_0 + \beta_1 x_i\right)}{1 + \exp\left(\beta_0 + \beta_1 x_i\right)}\right\}\right] = 0.
+\end{equation*}
+
+Furthermore, from the definition of the general fertility rate, we get:
+
+$$
+E\left[y_i - 0.06179\right] = 0.
+$$
+
+Following \citet{chaudhuri_handcock_rendall_2008}, we define the estimating equations $g$ as follows:
+
+$$
+g \left(\beta, v_i\right) = \begin{bmatrix}
+y_i - \frac{\exp\left(\beta_0 + \beta_1 x_i\right)}{1 + \exp\left(\beta_0 + \beta_1 x_i\right)} \\
+x_i\left[y_i - \frac{\exp\left(\beta_0 + \beta_1 x_i\right)}{1 + \exp\left(\beta_0 + \beta_1 x_i\right)}\right] \\
+y_i - 0.06179 \\
+\end{bmatrix}
+$$
+
+The gradient of $g$ with respect to $\beta$ is given by:
+
+$$
+\nabla_{\beta}g = \begin{bmatrix}
+\frac{-\exp\left(\beta_0 + \beta_1 x_i\right)}{\left(\exp\left(\beta_0 + \beta_1 x_i\right) + 1\right)^2} & \frac{-\exp\left(\beta_0 + \beta_1 x_i\right) x_i}{\left(\exp\left(\beta_0 + \beta_1 x_i\right) + 1\right)^2} \\
+\frac{-\exp\left(\beta_0 + \beta_1 x_i\right) x_i}{\left(\exp\left(\beta_0 + \beta_1 x_i\right) + 1\right)^2} & \frac{-\exp\left(\beta_0 + \beta_1 x_i\right) x_i^2}{\left(\exp\left(\beta_0 + \beta_1 x_i\right) + 1\right)^2} \\
+0 & 0 \\
+\end{bmatrix}
+$$
+
+In R, we create functions \code{g} and \code{dg} to represent $g$ and $\nabla_{\beta}g$:
+
+\begin{verbatim}
+Function: fun
+R> g <- function(params, X) {
++ result <- matrix(0, nrow = nrow(X), ncol = 3)
++ a <- exp(params[1] + params[2] * X[, 1])
++ a <- a / (1 + a)
++ result[, 1] <- X[, 2] - a
++ result[, 2] <- (X[, 2] - a) * X[, 1]
++ result[, 3] <- X[, 2] - 0.06179
++ result
++ }
+Function: dfun
+R> dg <- function(params, X) {
++ result <- array(0, c(3, 2, nrow(X)))
++ a <- exp(params[1] + params[2] * X[, 1])
++ a <- -a / (a + 1) ^ 2
++ result[1, 1, ] <- a
++ result[1, 2, ] <- result[1, 1, ] * X[, 1]
++ result[2, 1, ] <- result[1, 2, ]
++ result[2, 2, ] <- result[1, 2, ] * X[, 1]
++ result[3, , ] <- 0
++ result
++ }
+\end{verbatim}
+
+We choose independent $N\left(0, 100^2\right)$ priors for both $\beta_0$ and $\beta_1$:
+
+\begin{verbatim}
+Function: prior
+R> pr
<- function(x) {
++ - 0.5*t(x)%*%x/10^4 - log(2*pi*10^4)
++ }
+Function: dprior
+R> dpr <- function(x) {
++ -x * 10 ^ (-4)
++ }
+\end{verbatim}
+where \code{pr} is the log prior for $\beta$ and \code{dpr} is its gradient.
+
+Our goal is to use \code{ELHMC} to draw samples of $\beta = \left( \beta_0, \beta_1 \right)$ from the resulting posterior based on empirical likelihood.
+
+We start our sampling from $(-3.2, 0.55)$ and use two stages of sampling. In the first stage, $50$ points are sampled with $\epsilon=0.001$, $T=15$, and the momentum generated from a $N(0,0.02\cdot I_2)$ distribution. The acceptance rate at this stage is very high; the stage is designed to find a good starting point for the second stage, where the acceptance rate can be controlled more easily. Here \code{data} is the $n \times 2$ matrix of observations $\left(x_i, y_i\right)$ constructed from the counts in Table \ref{tab:bhps}.
+
+\begin{verbatim}
+R> bstart.init <- c(-3.2, 0.55)
+R> betas.init <- ELHMC(initial = bstart.init, data = data, fun = g, dfun = dg,
++ n.samples = 50, prior = pr, dprior = dpr, epsilon = 0.001,
++ lf.steps = 15, detailed = TRUE, p.variance = 0.2)
+\end{verbatim}
+
+\begin{figure}[t]
+  \centering
+  \includegraphics[width=0.48\textwidth]{figures/bhpsContour.pdf}
+  \includegraphics[width=0.48\textwidth]{figures/elhmcHMC.pdf}
+  \caption{Contour plot of the non-normalised log posterior with the HMC sampling path (left) and density plot (right) of the samples for the constrained logistic regression problem.}\label{fig:density}
+\end{figure}
+
+\begin{figure}[b]
+  \centering
+  \includegraphics[width=0.48\textwidth]{figures/bhpsAcfb0}
+  \includegraphics[width=0.48\textwidth]{figures/bhpsAcfb1}
+  \caption{The autocorrelation function of the samples drawn from the posterior of $\beta$.}
+  \label{fig:acf}
+\end{figure}
+
+In the second stage, we draw 500 samples of $\beta$, starting from the last value of the first stage. The number of leapfrog steps per Monte Carlo update is set to 30, with a step size of 0.004 for both $\beta_0$ and $\beta_1$.
We use $N\left(0, 0.02\,\mathbf{I}_2\right)$ as the distribution of the momentum.
+
+\begin{verbatim}
+R> bstart <- betas.init$samples[50, ]
+R> betas <- ELHMC(initial = bstart, data = data, fun = g, dfun = dg,
++ n.samples = 500, prior = pr, dprior = dpr, epsilon = 0.004,
++ lf.steps = 30, detailed = FALSE, p.variance = 0.2, print.interval = 10,
++ plot.interval = 1, which.plot = c(1))
+\end{verbatim}
+
+Based on this output, we can make inferences about $\beta$. As an example, the density plot of the samples is shown in Figure \ref{fig:density} and the autocorrelation plots of the second half of the samples in Figure \ref{fig:acf}.
+
+\begin{verbatim}
+R> library(MASS)
+R> n.samp <- nrow(betas$samples)
+R> beta.density <- kde2d(betas$samples[, 1], betas$samples[, 2])
+R> persp(beta.density, phi = 50, theta = 20,
++ xlab = 'Intercept', ylab = '', zlab = 'Density',
++ ticktype = 'detailed', cex.axis = 0.35, cex.lab = 0.35, d = 0.7)
+R> acf(betas$samples[round(n.samp/2):n.samp, 1],
++ main = expression(paste("Series ", beta[0])))
+R> acf(betas$samples[round(n.samp/2):n.samp, 2],
++ main = expression(paste("Series ", beta[1])))
+\end{verbatim}
+
+It is well known \citep{chaudhuri_handcock_rendall_2008} that the constrained estimates of $\beta_0$ and $\beta_1$ have very low standard errors. The acceptance rate is close to $78\%$. It is evident that our software can sample from such a narrow ridge with ease. Furthermore, the autocorrelation of the samples decreases very quickly with the lag, which would not be the case for most other MCMC procedures.
+
+\section*{Acknowledgement}
+Dang Trung Kien would like to acknowledge the support of MOE AcRF R-155-000-140-112 from the National University of Singapore. Sanjay Chaudhuri acknowledges the partial support of NSF-DMS grant 2413491 from the National Science Foundation, USA.
+The authors are grateful to Professor Michael Rendall, Department of Sociology, University of Maryland, College Park, for kindly sharing the data set on which the second example is based.
+ +\bibliography{kWC} + +%\input{RJwrapper.bbl} + + +\address{% +Neo Han Wei\\ +Citibank, Singapore\\ +% +\href{mailto:nhanwei@gmail.com}{\nolinkurl{nhanwei@gmail.com}}% +} + +\address{% +Dang Trung Kien\\ +Independent Consultant\\ +% +\href{mailto:trungkiendang@hotmail.com}{\nolinkurl{trungkiendang@hotmail.com}}\\ +\url{http://www.stat.nus.edu.sg/}% +} + +\address{% +Sanjay Chaudhuri\\ +University of Nebraska-Lincoln\\ +Department of Statistics\\ +840 Hardin Hall North Wing, Lincoln, NE, USA\\ +% +\href{mailto:schaudhuri2@nebraska.edu}{\nolinkurl{schaudhuri2@nebraska.edu}}\\ +\url{http://www.stat.nus.edu.sg/}% +} +%\end{document} diff --git a/_articles/RJ-2025-041/scripts/contour.pdf b/_articles/RJ-2025-041/scripts/contour.pdf new file mode 100644 index 0000000000..7ec9d3a78a Binary files /dev/null and b/_articles/RJ-2025-041/scripts/contour.pdf differ diff --git a/_articles/RJ-2025-041/scripts/kWC.R b/_articles/RJ-2025-041/scripts/kWC.R new file mode 100644 index 0000000000..74f4ca10f3 --- /dev/null +++ b/_articles/RJ-2025-041/scripts/kWC.R @@ -0,0 +1,152 @@ +rm(list=ls()) +graphics.off() + +library(elhmc) +library(plyr) +library(MASS) +library(magick) +library(ggplot2) +library(grid) +library(pdftools) + +#source('../elhmc/R/ELHMC.R') +#source('../elhmc/R/ELU.R') +#source('../elhmc/R/HMC.R') +#source('../elhmc/R/Utils.R') + + + +#Example 4.1 + +v <- rbind(c(1, 1), c(1, 0), c(1, -1), c(0, -1), + c(-1, -1), c(-1, 0), c(-1, 1), c(0, 1)) + +g <- function(params, x) { + params-x +} + +dlg <- function(params, x) { + rbind(c(1, 0), c(0, 1)) +} + +pr <- function(x) { + -.5*(x[1]^2+x[2]^2)-log(2*pi) +} +dpr <- function(x) { + -x +} + +set.seed(476) +thetas <- ELHMC(initial = c(0.9, 0.95), data = v, fun = g, dfun = dlg, + prior = pr, dprior = dpr, n.samples = 1000, + lf.steps = 12, epsilon = 0.06, detailed = TRUE) + +boxplot(thetas$samples,names = c(expression(theta[1]), expression(theta[2]))) + +q <- thetas$trajectory$trajectory.q[[1]] +par(pty="s") +plot(q, xlab = 
expression(theta[1]), ylab = expression(theta[2]), + xlim = c(-1, 1), ylim = c(-1, 1), cex = 1, pch = 16) +points(v[,1],v[,2],type="p",cex=1.5,pch=16) +abline(h=-1);abline(h=1);abline(v=-1);abline(v=1) +arrows(q[-nrow(q), 1], q[-nrow(q), 2], q[-1, 1], q[-1, 2], + length = 0.1, lwd = 1.5) + +#Example 4.2 + +n <- rbind(c(5903,230),c(5157,350)) +mat <- matrix(0,nrow=sum(n),ncol=2) +mat <- rbind(matrix(1,nrow=n[1,1],ncol=1)%*%c(0,0), + matrix(1,nrow=n[1,2],ncol=1)%*%c(0,1), + matrix(1,nrow=n[2,1],ncol=1)%*%c(1,0), + matrix(1,nrow=n[2,2],ncol=1)%*%c(1,1)) + +#Specifying the population constraints. + +gfr <- .06179*matrix(1,nrow=nrow(mat),ncol=1) +g <- matrix(1,nrow=nrow(mat),ncol=1) +amat <- matrix(mat[,2]*g-gfr,ncol=1) + +data <- mat +g <- function(params, X) { + result <- matrix(0, nrow = nrow(X), ncol = 3) + a <- exp(params[1] + params[2] * X[, 1]) + a <- a / (1 + a) + result[, 1] <- X[, 2] - a + result[, 2] <- (X[, 2] - a) * X[, 1] + result[, 3] <- X[, 2] - 0.06179 + result +} + +dg <- function(params, X) { + result <- array(0, c(3, 2, nrow(X))) + a <- exp(params[1] + params[2] * X[, 1]) + a <- -a / (a + 1) ^ 2 + result[1, 1, ] <- a + result[1, 2, ] <- result[1, 1, ] * X[, 1] + result[2, 1, ] <- result[1, 2, ] + result[2, 2, ] <- result[1, 2, ] * X[, 1] + result[3, , ] <- 0 + result +} + +pr <- function(x) { + - 0.5*t(x)%*%x/10^4 - log(2*pi*10^4) +} + +gpr <- function(x) { + -x*10^(-4) +} + +set.seed(1234) +bstart.init=c(-3.2,.55) +betas.init <- ELHMC(initial = bstart.init, data = data, FUN = g, DFUN = dg, + n.samples = 50, prior = pr, dprior = gpr, epsilon = 0.001, + lf.steps = 15, detailed = T, p.variance = 0.2,print.interval=10) + +betas.init$acceptance.rate + +bstart=betas.init$samples[50,] + +betas <- ELHMC(initial = bstart, data = data, FUN = g, DFUN = dg, + n.samples = 500, prior = pr, dprior = gpr, epsilon = 0.004, + lf.steps = 30, detailed = F, p.variance = 0.2,print.interval=10, + plot.interval=1,which.plot=c(1,2)) + +betas$acceptance.rate 
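+
+# Quick numerical summary: posterior means of beta0 and beta1,
+# computed from the sampled values in betas$samples
+colMeans(betas$samples)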
+n.samp=nrow(betas$samples)
+
+beta.density <- kde2d(betas$samples[round(n.samp/2):n.samp, 1], betas$samples[round(n.samp/2):n.samp, 2])
+
+persp(beta.density, phi = 50, theta = 20, xlab = 'Intercept', ylab = '', zlab = 'Density', ticktype = 'detailed', cex.axis = 0.35, cex.lab = 0.35, d = 0.7)
+
+dev.new()
+par(mfrow=c(2,1))
+
+acf(betas$samples[round(n.samp/2):n.samp, 1],main=expression(paste("Series ",beta[0])))
+acf(betas$samples[round(n.samp/2):n.samp, 2],main=expression(paste("Series ",beta[1])))
+
+
+dev.new()
+par(mfrow=c(2,1))
+
+plot(betas$samples[round(n.samp/2):n.samp, 1],main=expression(paste("Series ",beta[0])),type='l',ylab=" ")
+plot(betas$samples[round(n.samp/2):n.samp, 2],main=expression(paste("Series ",beta[1])),type='l',ylab=" ")
+
+
+dev.new()
+image=image_read_pdf("contour.pdf")%>%image_trim()
+beta.init=betas.init$samples
+beta=betas$samples
+plot(c(-3.5,-2.5),c(-.5,1.5),xlab=expression(beta[0]),ylab=expression(beta[1]),type='n')
+
+rasterImage(image,-3.5,-.5,-2.5,1.5)
+
+for(i in 1:(nrow(beta.init)-1)){
+ points(beta.init[i,1],beta.init[i,2],cex=.5,pch=19,col=2)
+ arrows(beta.init[i,1],beta.init[i,2],beta.init[(i+1),1],beta.init[(i+1),2],length=.075,lwd=1.75,col=2)
+}
+
+points(beta[,1],beta[,2],type='l',lwd=1.75)
+points(beta[,1],beta[,2],type='p',cex=.5,pch=19)
+
diff --git a/_articles/RJ-2025-042/GeRnika.R b/_articles/RJ-2025-042/GeRnika.R
new file mode 100644
index 0000000000..7e9b596e86
--- /dev/null
+++ b/_articles/RJ-2025-042/GeRnika.R
@@ -0,0 +1,78 @@
+library(GeRnika)
+
+
+# Generate a tumor instance with 5 clones, 3 samples, k=0.5, and neutral evolution.
+I <- create_instance(n = 5, m = 3, k = 0.5, selection = "neutral", seed = 1)
+I
+
+
+# Create a 'Phylotree' class object based on the previously simulated instance.
+phylotree <- B_to_phylotree(B = I$B)
+phylotree
+
+
+# Plot the 'Phylotree' class object.
+plot(phylotree)
+
+
+# Plot the 'Phylotree' class object according to the clone proportions
+# associated with the previously generated U matrix and using predefined tags
+# to label the nodes in the tree.
+plot_proportions(phylotree, I$U, labels = TRUE)
+
+
+# Load the first trio of matrices of the B_mats object offered by GeRnika.
+B_mats <- GeRnika::B_mats
+B_real <- B_mats[[1]]$B_real
+B_alg1 <- B_mats[[1]]$B_alg1
+B_alg2 <- B_mats[[1]]$B_alg2
+
+
+# Create a predefined set of tags for the clones in the phylogenetic trees.
+tags <- c("TP53", "KRAS", "PIK3CA", "APC", "EGFR", "BRCA1", "PTEN",
+          "BRAF", "MYC", "CDKN2A")
+
+
+# Create a 'Phylotree' class object for each B matrix in the loaded trio of B
+# matrices.
+phylotree_real <- B_to_phylotree(B_real, labels = tags)
+phylotree_alg1 <- B_to_phylotree(B_alg1, labels = tags)
+phylotree_alg2 <- B_to_phylotree(B_alg2, labels = tags)
+
+
+# Plot the phylogenetic trees using the predefined set of tags.
+plot(phylotree_real, labels = TRUE)
+plot(phylotree_alg1, labels = TRUE)
+plot(phylotree_alg2, labels = TRUE)
+
+
+# Check if phylotree_real and phylotree_alg1 are equal.
+equals(phylotree_1 = phylotree_real, phylotree_2 = phylotree_alg1)
+
+
+# Check if phylotree_real and phylotree_real are equal.
+equals(phylotree_1 = phylotree_real, phylotree_2 = phylotree_real)
+
+
+# Find the maximal common subtrees between phylotree_real and phylotree_alg2,
+# using the predefined set of tags.
+find_common_subtrees(phylotree_1 = phylotree_real, phylotree_2 = phylotree_alg2, + labels = TRUE) + + +# Load the Lancet palette offered by GeRnika +palette <- GeRnika::palettes$Lancet + + +# Compute the consensus tree between phylotree_real and phylotree_alg1, using the +# predefined set of tags for the clones in the trees and the previously loaded +# palette +consensus_real_alg1 <- combine_trees(phylotree_1 = phylotree_real, + phylotree_2 = phylotree_alg1, + labels = TRUE, + palette = palette) + + +# Plot the consensus tree between phylotree_real and phylotree_alg1 by means of +# DiagrammeR's render_graph function. +DiagrammeR::render_graph(consensus_real_alg1) diff --git a/_articles/RJ-2025-042/GeRnika.bib b/_articles/RJ-2025-042/GeRnika.bib new file mode 100644 index 0000000000..fda8741d3b --- /dev/null +++ b/_articles/RJ-2025-042/GeRnika.bib @@ -0,0 +1,390 @@ +@article{1, + author="Marass, Francesco and Mouliere, Florent and Yuan, Ke and Rosenfeld, Nitzan and Markowetz, Florian", + title="A Phylogenetic Latent Feature Model for Clonal Deconvolution", + journal="The Annals of Applied Statistics", + year="2016", + volume="10", + pages="2377-2404", + doi={10.1214/16-AOAS986} +} +@article{articleeffect, + author = {Burrell, Rebecca and Mcgranahan, Nicholas and Bartek, Jiri and Swanton, Charles}, + year = {2013}, + month = {09}, + pages = {338-45}, + title = {The Causes and Consequences of Genetic Heterogeneity in Cancer Evolution}, + volume = {501}, + journal = {Nature}, + doi = {10.1038/nature12625} +} +@article{citup, + title={Clonality inference in multiple tumor samples using phylogeny}, + author={Malikic, Salem and McPherson, Andrew W and Donmez, Nilgun and Sahinalp, Cenk S}, + journal={Bioinformatics}, + volume={31}, + number={9}, + pages={1349--1356}, + doi={10.1093/bioinformatics/btv003}, + year={2015}, + publisher={Oxford University Press} +} +@article{spruce, + title={Inferring the mutational history of a tumor using multi-state perfect phylogeny 
mixtures}, + author={El-Kebir, Mohammed and Satas, Gryte and Oesper, Layla and Raphael, Benjamin J}, + journal={Cell Systems}, + volume={3}, + number={1}, + pages={43--53}, + year={2016}, + doi={10.1016/j.cels.2016.07.004}, + publisher={Elsevier} +} +@article{phylowgs, + title={Phylo{WGS}: reconstructing subclonal composition and evolution from whole-genome sequencing of tumors}, + author={Deshwar, Amit G and Vembu, Shankar and Yung, Christina K and Jang, Gun Ho and Stein, Lincoln and Morris, Quaid}, + journal={Genome Biology}, + volume={16}, + number={1}, + pages={1--20}, + year={2015}, + doi={10.1186/s13059-015-0602-8}, + publisher={BioMed Central} +} +@article{tanner2019simulation, + title={Simulation of heterogeneous tumour genomes with HeteroGenesis and in silico whole exome sequencing}, + author={Tanner, Georgette and Westhead, David R and Droop, Alastair and Stead, Lucy F}, + journal={Bioinformatics}, + volume={35}, + number={16}, + pages={2850--2852}, + year={2019}, + doi={10.1093/bioinformatics/bty1063}, + publisher={Oxford University Press} +} +@article{machina, + title={Inferring parsimonious migration histories for metastatic cancers}, + author={El-Kebir, Mohammed and Satas, Gryte and Raphael, Benjamin J}, + journal={Nature Genetics}, + volume={50}, + number={5}, + pages={718--726}, + year={2018}, + doi={10.1038/s41588-018-0106-z}, + publisher={Nature Publishing Group} +} +@misc{Oncolib, + author = {Qi, YuanYuan and Luo, Yunan and El-Kebir, Mohammed}, + title = {Onco{L}ib}, + year = {2019}, + publisher = {GitHub}, + journal = {GitHub repository}, + url = {https://github.com/elkebir-group/OncoLib/}, + howpublished = {\url{https://github.com/elkebir-group/OncoLib/}} +} +@article{calder, + title={{CALDER}: {I}nferring phylogenetic trees from longitudinal tumor samples}, + author={Myers, Matthew A and Satas, Gryte and Raphael, Benjamin J}, + journal={Cell {S}ystems}, + volume={8}, + number={6}, + pages={514--522}, + year={2019}, + 
doi={10.1016/j.cels.2019.05.010}, + publisher={Elsevier} +} +@article{clonarch, + title={Clon{A}rch: visualizing the spatial clonal architecture of tumors}, + author={Wu, Jiaqi and El-Kebir, Mohammed}, + journal={Bioinformatics}, + volume={36}, + number={Supplement\_1}, + pages={i161--i168}, + year={2020}, + doi={10.1093/bioinformatics/btaa471}, + publisher={Oxford University Press} +} +@article{clonevol, + title={Clon{E}vol: clonal ordering and visualization in cancer sequencing}, + author={Dang, HX and White, BS and Foltz, SM and Miller, CA and Luo, Jingqin and Fields, RC and Maher, CA}, + journal={Annals of {O}ncology}, + volume={28}, + number={12}, + pages={3076--3082}, + year={2017}, + doi={10.1093/annonc/mdx517}, + publisher={Elsevier} +} +@article{clevrvis, + title={clev{R}vis: visualization techniques for clonal evolution}, + author={Sandmann, Sarah and Inserte, Clara and Varghese, Julian}, + journal={GigaScience}, + volume={12}, + pages={giad020}, + year={2023}, + doi={10.1093/gigascience/giad020}, + publisher={Oxford University Press} +} +@inproceedings{contreedp, + title={{C}on{T}ree{DP}: A consensus method of tumor trees based on maximum directed partition support problem}, + author={Fu, Xuecong and Schwartz, Russell}, + booktitle={2021 IEEE International Conference on Bioinformatics and Biomedicine ({BIBM})}, + pages={125--130}, + year={2021}, + doi={10.1101/2021.10.13.463978}, + organization={IEEE} +} +@article{tuelip, + title={A weighted distance-based approach for deriving consensus tumor evolutionary trees}, + author={Guang, Ziyun and Smith-Erb, Matthew and Oesper, Layla}, + journal={Bioinformatics}, + volume={39}, + number={Supplement\_1}, + pages={i204--i212}, + year={2023}, + doi={10.1093/bioinformatics/btad230}, + publisher={Oxford University Press} +} +@article{fishplot, + title={Visualizing tumor evolution with the fishplot package for {R}}, + author={Miller, Christopher A and McMichael, Joshua and Dang, Ha X and Maher, Christopher A and 
Ding, Li and Ley, Timothy J and Mardis, Elaine R and Wilson, Richard K}, + journal={BMC Genomics}, + volume={17}, + pages={1--3}, + year={2016}, + doi={10.1186/s12864-016-3195-z}, + publisher={Springer} +} +@article{pairtree, + title={Reconstructing cancer phylogenies using {P}airtree, a clone tree reconstruction algorithm}, + author={Kulman, Ethan and Wintersinger, Jeff and Morris, Quaid}, + journal={STAR Protocols}, + volume={3}, + number={4}, + pages={101706}, + year={2022}, + doi={10.1016/j.xpro.2022.101706}, + publisher={Elsevier} +} +@article{canopy, + title={Assessing intratumor heterogeneity and tracking longitudinal and spatial clonal evolutionary history by next-generation sequencing}, + author={Jiang, Yuchao and Qiu, Yu and Minn, Andy J and Zhang, Nancy R}, + journal={Proceedings of the National Academy of Sciences}, + volume={113}, + number={37}, + pages={E5528--E5537}, + year={2016}, + doi={10.1073/pnas.1522203113}, + publisher={National Acad Sciences} +} +@inproceedings{bayclone, + title={Bayclone: {B}ayesian nonparametric inference of tumor subclones using {NGS} data}, + author={Sengupta, Subhajit and Wang, Jin and Lee, Juhee and M{\"u}ller, Peter and Gulukota, Kamalakar and Banerjee, Arunava and Ji, Yuan}, + booktitle={Pacific Symposium on Biocomputing Co-Chairs}, + pages={467--478}, + year={2014}, + doi={10.1142/9789814644730_0044}, + organization={World Scientific} +} +@article{bitphylogeny, + title={Bit{P}hylogeny: a probabilistic framework for reconstructing intra-tumor phylogenies}, + author={Yuan, Ke and Sakoparnig, Thomas and Markowetz, Florian and Beerenwinkel, Niko}, + journal={Genome Biology}, + volume={16}, + number={1}, + pages={1--16}, + year={2015}, + doi={10.1186/s13059-015-0592-6}, + publisher={BioMed Central} +} +@article{phylosub, + title={Inferring clonal evolution of tumors from single nucleotide somatic mutations}, + author={Jiao, Wei and Vembu, Shankar and Deshwar, Amit G and Stein, Lincoln and Morris, Quaid}, + journal={BMC 
Bioinformatics}, + volume={15}, + number={1}, + pages={35}, + year={2014}, + doi={10.1186/1471-2105-15-35}, + publisher={Springer} +} +@article{lichee, + title={Fast and scalable inference of multi-sample cancer lineages}, + author={Popic, Victoria and Salari, Raheleh and Hajirasouliha, Iman and Kashef-Haghighi, Dorna and West, Robert B and Batzoglou, Serafim}, + journal={Genome Biology}, + volume={16}, + number={1}, + pages={91}, + year={2015}, + doi={10.1186/s13059-015-0647-8}, + publisher={Springer} +} +@article{mipup, + title={{MIPUP}: minimum perfect unmixed phylogenies for multi-sampled tumors via branchings and {ILP}}, + author={Husi{\'c}, Edin and Li, Xinyue and Hujdurovi{\'c}, Ademir and Mehine, Miika and Rizzi, Romeo and M{\"a}kinen, Veli and Milani{\v{c}}, Martin and Tomescu, Alexandru I}, + journal={Bioinformatics}, + volume={35}, + number={5}, + pages={769--777}, + year={2019}, + doi={10.1093/bioinformatics/bty683}, + publisher={Oxford University Press} +} +@article{sollier2023compass, + title={{COMPASS}: joint copy number and mutation phylogeny reconstruction from amplicon single-cell sequencing data}, + author={Sollier, Etienne and Kuipers, Jack and Takahashi, Koichi and Beerenwinkel, Niko and Jahn, Katharina}, + journal={Nature communications}, + volume={14}, + number={1}, + pages={4921}, + year={2023}, + doi={10.1038/s41467-023-40378-8}, + publisher={Nature Publishing Group UK London} +} +@article{fu2022reconstructing, + title={Reconstructing tumor clonal lineage trees incorporating single-nucleotide variants, copy number alterations and structural variations}, + author={Fu, Xuecong and Lei, Haoyun and Tao, Yifeng and Schwartz, Russell}, + journal={Bioinformatics}, + volume={38}, + number={Supplement\_1}, + pages={i125--i133}, + year={2022}, + doi={10.1093/bioinformatics/btac253}, + publisher={Oxford University Press} +} +@article{satas2020scarlet, + title={{SCARLET}: single-cell tumor phylogeny inference with copy-number constrained mutation 
losses}, + author={Satas, Gryte and Zaccaria, Simone and Mon, Geoffrey and Raphael, Benjamin J}, + journal={Cell systems}, + volume={10}, + number={4}, + pages={323--332}, + year={2020}, + doi={10.1016/j.cels.2020.04.001}, + publisher={Elsevier} +} +@article{grigoriadis2024conipher, + title={{CONIPHER}: a computational framework for scalable phylogenetic reconstruction with error correction}, + author={Grigoriadis, Kristiana and Huebner, Ariana and Bunkum, Abigail and Colliver, Emma and Frankell, Alexander M and Hill, Mark S and Thol, Kerstin and Birkbak, Nicolai J and Swanton, Charles and Zaccaria, Simone and others}, + journal={Nature Protocols}, + volume={19}, + number={1}, + pages={159--183}, + doi={10.1038/s41596-023-00913-9}, + year={2024}, + publisher={Nature Publishing Group UK London} +} +@article{ross2016onconem, + title={Onco{NEM}: inferring tumor evolution from single-cell sequencing data}, + author={Ross, Edith M and Markowetz, Florian}, + journal={Genome biology}, + volume={17}, + pages={1--14}, + year={2016}, + doi={10.1186/s13059-016-0929-9}, + publisher={Springer} +} +@article{malikic2019phiscs, + title={{PhISCS}: a combinatorial approach for subperfect tumor phylogeny reconstruction via integrative use of single-cell and bulk sequencing data}, + author={Malikic, Salem and Mehrabadi, Farid Rashidi and Ciccolella, Simone and Rahman, Md Khaledur and Ricketts, Camir and Haghshenas, Ehsan and Seidman, Daniel and Hach, Faraz and Hajirasouliha, Iman and Sahinalp, S Cenk}, + journal={Genome research}, + volume={29}, + number={11}, + pages={1860--1877}, + year={2019}, + doi={10.1101/gr.234435.118}, + publisher={Cold Spring Harbor Lab} +} +@article{ancestree, + author = {El-Kebir, Mohammed and Oesper, Layla and Acheson-Field, Hannah and Raphael, Benjamin J.}, + title = "{Reconstruction of clonal trees and tumor composition from multi-sample sequencing data}", + journal = {Bioinformatics}, + volume = {31}, + number = {12}, + pages = {62-70}, + year = {2015}, 
+ month = {06}, + doi = {10.1093/bioinformatics/btv261}, + issn = {1367-4803} +} +@inproceedings{Maitena, + author={Tellaetxe-Abete, Maitena and Calvo, Borja and Lawrie, Charles}, + booktitle={2020 IEEE Congress on Evolutionary Computation ({CEC})}, + title={{An Iterated Local Search Algorithm for the Clonal Deconvolution Problem}}, + year={2020}, + doi={10.1109/CEC48606.2020.9185515}, + pages={1-8} +} +@article{trap, + title={{TrA}p: a tree approach for fingerprinting subclonal tumor composition}, + author={Francesco Strino and Fabio Parisi and Mariann Micsinai and Yuval Kluger}, + journal={Nucleic Acids Research}, + year={2013}, + volume={41}, + doi={10.1093/nar/gkt641}, + pages={e165 - e165} +} +@article{nowell1976clonal, + title={The clonal evolution of tumor cell populations}, + author={Nowell, Peter C}, + journal={Science}, + volume={194}, + number={4260}, + pages={23--28}, + year={1976}, + doi={10.1126/science.959840}, + publisher={American Association for the Advancement of Science} +} +@article{kimuraISA, + title={The number of heterozygous nucleotide sites maintained in a finite population due to steady flux of mutations}, + author={Kimura, Motoo}, + journal={Genetics}, + volume={61}, + number={4}, + pages={893}, + year={1969}, + doi={10.1093/genetics/61.4.893}, + publisher={Oxford University Press} +} +@article{gusfield, + title={Efficient algorithms for inferring evolutionary trees}, + author={Gusfield, Dan}, + journal={Networks}, + volume={21}, + number={1}, + pages={19--28}, + year={1991}, + doi={10.1002/net.3230210104}, + publisher={Wiley Online Library} +} +@article{davis2017tumor, + title={Tumor evolution: {L}inear, branching, neutral or punctuated?}, + author={Davis, Alexander and Gao, Ruli and Navin, Nicholas}, + journal={Biochimica et Biophysica Acta ({BBA})-Reviews on Cancer}, + volume={1867}, + number={2}, + pages={151--161}, + year={2017}, + doi={10.1016/j.bbcan.2017.01.003}, + publisher={Elsevier} +} +@article{depth_error, + 
title={Standardization of sequencing coverage depth in {NGS}: recommendation for detection of clonal and subclonal mutations in cancer diagnostics}, + author={Petrackova, Anna and Vasinek, Michal and Sedlarikova, Lenka and Dyskova, Tereza and Schneiderova, Petra and Novosad, Tomas and Papajik, Tomas and Kriegova, Eva}, + journal={Frontiers in Oncology}, + volume={9}, + pages={851}, + year={2019}, + doi={10.3389/fonc.2019.00851}, + publisher={Frontiers} +} +@article{loman2012performance, + title={Performance comparison of benchtop high-throughput sequencing platforms}, + author={Loman, Nicholas J and Misra, Raju V and Dallman, Timothy J and Constantinidou, Chrystala and Gharbia, Saheer E and Wain, John and Pallen, Mark J}, + journal={Nature Biotechnology}, + volume={30}, + number={5}, + pages={434--439}, + year={2012}, + doi={10.1038/nbt.2198}, + publisher={Nature Publishing Group US New York} +} diff --git a/_articles/RJ-2025-042/GeRnika.tex b/_articles/RJ-2025-042/GeRnika.tex new file mode 100644 index 0000000000..b9c1440ae6 --- /dev/null +++ b/_articles/RJ-2025-042/GeRnika.tex @@ -0,0 +1,577 @@ +% !TeX root = RJwrapper.tex +\title{GeRnika: An R Package for the Simulation, Visualization and Comparison of Tumor Phylogenies} +\author{by A. Sánchez-Ferrera\textsuperscript{*}, M. Tellaetxe-Abete\textsuperscript{*}, B. Calvo-Molinos} + +\maketitle + +\begingroup +\renewcommand\thefootnote{\fnsymbol{footnote}} +\footnotetext[1]{These authors contributed equally to this work.} +\addtocounter{footnote}{1} +\endgroup + +\begin{abstract} +The development of methods to study intratumoral heterogeneity and tumor phylogenies is a highly active area of research. However, the advancement of these approaches often necessitates access to substantial amounts of data, which can be challenging and expensive to acquire. Moreover, the assessment of results requires tools for visualizing and comparing tumor phylogenies. 
In this paper, we introduce GeRnika, an R package designed to address these needs by enabling the simulation, visualization, and comparison of tumor evolution data. GeRnika thus provides researchers with a user-friendly tool for assessing approaches to studying tumor composition and evolutionary history.\end{abstract}
+
+\section{Introduction}
+
+The development of tumors is a complex and dynamic process characterized by a succession of events where DNA mutations accumulate over time. These mutations give rise to genetic diversity within the tumor, leading to the emergence of distinct clonal subpopulations or, simply, clones \citep{nowell1976clonal}. Each of these subpopulations exhibits unique mutational profiles, resulting in varied phenotypic and behavioral characteristics among the cancer cells \citep{1}. This phenomenon, known as intratumoral heterogeneity (ITH), significantly hinders the design of effective medical therapies, since different clones within the same tumor may respond differently to treatments, ultimately leading to therapy resistance and disease recurrence \citep{articleeffect}. To address this challenge, several innovative approaches are being developed to study the tumor composition in greater detail and to reconstruct its evolutionary history.
+
+One common approach to studying tumor composition and phylogeny involves using bulk DNA sequencing data from multiple tumor biopsies. This data is relatively straightforward to obtain and provides a broad overview of the genetic alterations within the tumor. However, using bulk sequencing data in the study of ITH faces the challenge that each sample potentially contains a mixture of different clonal populations rather than just one clone. Consequently, the observed mutation frequencies—measured as variant allele frequencies (VAFs)—do not directly estimate the fraction of individual clones.
Instead, the VAF values represent a composite signal: the sum of the fractions of all clones that harbor each mutation in a given sample. + +This complexity implies that reconstructing the tumor's evolutionary history requires deconvolving these clonal admixtures within the samples. This task is precisely the focus of the Clonal Deconvolution and Evolution Problem (CDEP) \citep{trap, ancestree}, which can be summarized as determining the tumor's clonal structure—that is, identifying the number, proportion, and mutational composition of clones in each sample—as well as reconstructing the clonal phylogenetic tree that leads to the observed clonal mosaic. In this context, one of the most prominent approaches to addressing the CDEP is the Variant Allele Frequency Factorization Problem (VAFFP) \citep{ancestree}. + +Given \( s \) tumor samples and \( n \) mutations identified across these samples, we define a matrix \( \boldsymbol{F} \in [0, 1]^{s \times n} \), where each element \( f_{ij} \) represents the VAF value or, equivalently, the fraction of cells that carry mutation \( j \) in sample \( i \). The VAFFP seeks to decompose this input matrix $\boldsymbol{F}$ into two matrices: a matrix $\boldsymbol{B} \in \{0, 1\}^{n \times n}$ that represents the clonal phylogeny, and a matrix $\boldsymbol{U} \in [0, 1]^{s \times n}$ that captures the clone proportions in each tumor sample: + +\begin{equation} +\boldsymbol{F} = \boldsymbol{U} \cdot \boldsymbol{B} +\label{eqF} +\end{equation} + +The $\boldsymbol{B}$ matrix is a binary square matrix of size $n$, where $b_{ij} = 1$ iff clone $i$ contains mutation $j$ \citep{gusfield}. The matrix $\boldsymbol{U}$ is an $s \times n$ matrix where $u_{ij}$ is the fraction of clone $j$ in sample $i$.
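As a toy illustration of Equation \eqref{eqF}, the following base R snippet (with made-up values, not \pkg{GeRnika} output) builds a small instance and recovers $\boldsymbol{F}$:

\begin{example}
# Toy VAFFP instance: n = 3 clones, s = 2 samples (illustrative values only).
# Clones 2 and 3 descend from clone 1, so both inherit mutation 1.
B <- matrix(c(1, 0, 0,
              1, 1, 0,
              1, 0, 1), nrow = 3, byrow = TRUE)  # b_ij = 1 iff clone i has mutation j
U <- matrix(c(0.5, 0.3, 0.2,
              0.2, 0.1, 0.7), nrow = 2, byrow = TRUE)  # rows sum to 1
Fmat <- U %*% B  # f_ij: fraction of cells carrying mutation j in sample i
# Mutation 1 is clonal, so the first column of Fmat equals 1 in every sample.
\end{example}

Note that \code{F} is avoided as a variable name here because it is base R's shorthand for \code{FALSE}.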
+ +The VAFFP operates under two key assumptions: tumors have a monoclonal origin, meaning they arise from a single abnormal cell, and the infinite sites assumption (ISA), which states that mutations occur at most once and cannot disappear over time \citep{kimuraISA}. Under these assumptions, the tumor's clonal structure can be modeled as a perfect phylogeny \citep{gusfield}. This model imposes two key constraints: (1) if two clones share a mutation, they must either be identical or ancestrally related, and (2) once a clone acquires a mutation, that mutation is inherited by all its descendants. + +Over the years, numerous methods have been developed to solve the CDEP \citep{bayclone, bitphylogeny, phylowgs, ancestree, citup, lichee, spruce, canopy, mipup, fu2022reconstructing, grigoriadis2024conipher}, with recent advancements addressing reformulations of the problem that incorporate single-cell sequencing-derived information or variants in metastases, and account for temporal resolution \citep{ross2016onconem, machina, malikic2019phiscs, satas2020scarlet, sollier2023compass}. These methods primarily focus on reconstructing tumor phylogenies and clonal compositions using both real and simulated data. However, the simulation tools they employ often have limitations. For instance, while MiPUP uses simulated data, it allows only basic parameter adjustments—such as the number of mutations, samples, and reads—and lacks the flexibility to fine-tune biological parameters. BitPhylogeny creates highly realistic and complex simulated datasets representing different modes of evolution, but these simulations are manually crafted and limited in number, posing scalability issues. + +Several tools specifically designed for simulating data for the CDEP have also been introduced. Pearsim, written in Python, allows control over parameters like read depth, number of subclones, samples, and mutations \citep{pairtree}. 
OncoLib, a C++ library, facilitates the simulation of tumor heterogeneity and the reconstruction of NGS sequencing data of metastatic tumors, offering control over parameters such as driver mutation probability, per-base sequencing error rate, migration rate, and mutation rate \citep{Oncolib}. Machina, also in C++, provides a framework for simulating metastatic tumors and visualizing their phylogenetic trees and migration graphs \citep{machina}. HeteroGenesis, implemented in Python, simulates heterogeneous tumors at the level of clone genomes, but it is not specifically tailored for CDEP instances and requires additional processing to generate suitable datasets \citep{tanner2019simulation}. + +Visualization of tumor phylogenies is another crucial aspect of studying tumor evolution. Tools like CALDER, Clonevol, SPRUCE, \pkg{fishplot}, ClonArch, and \BIOpkg{clevRvis} offer solutions for visualizing clonal structures and evolutionary trajectories \citep{calder, clonevol, spruce, fishplot, clonarch, clevrvis}. Among these, \pkg{fishplot} and \pkg{clevRvis} are available as R packages. Additionally, methods for generating consensus trees, such as TuELIP and ConTreeDP, have been developed to summarize multiple phylogenetic trees into a single representative tree \citep{tuelip, contreedp}. + +%Despite the availability of these tools, there remains a need for an integrated solution that combines simulation, visualization, and comparison functionalities within a user-friendly platform, particularly in the R programming environment. + +Despite the availability of existing tools, there remains a lack of options in the R programming environment for realistically simulating tumor evolution in a way that is both flexible and user-friendly, while also enabling effective visualization and comparison. + +In this paper, we introduce \CRANpkg{GeRnika}, an R package that provides a comprehensive solution for simulating, visualizing, and comparing tumor evolution data. 
Although \pkg{GeRnika}'s data simulation functionality was primarily devised to create instances for solving the CDEP, the simulated data are not restricted to this purpose and can also be used for exploring evolutionary dynamics in broader contexts. To accommodate diverse research needs, we have implemented the procedures to be highly customizable, allowing users to adjust a wide range of parameters such as the number of clones, selective pressures, mutation rates, and sequencing noise levels. Unlike existing tools that may offer limited customization or are implemented in other programming languages, \pkg{GeRnika} is fully integrated into the R environment, making it easy to use alongside other bioinformatics packages. By combining simulation capabilities with visualization and comparison tools in a user-friendly interface, \pkg{GeRnika} offers an accessible and flexible option within the R ecosystem for researchers studying tumor evolution. It is important to note that \pkg{GeRnika} does not implement algorithms for inferring clonal composition or reconstructing phylogenies from experimental datasets. Instead, it provides a controlled framework for generating, visualizing, and comparing data that can be used to benchmark such methods. + +\section{Simulation of tumor evolution}\label{sec:simulation} + +The main contribution of this work is the introduction of a novel approach for simulating biologically plausible instances of the VAFFP that accounts for several key factors, including the number of clones, selective pressures, and sequencing noise. In this section, we provide a detailed description of the approach. %First, we offer an overview of its structure, followed by a breakdown of its three main components: the tumor model, the sampling simulation, and the sequencing noise simulation. 
+ +%\subsection{Data Simulation Model} + +Broadly speaking, each problem instance consists of a matrix $\boldsymbol{F}$ containing the VAF values of a set of mutations in a set of samples, as described previously. This matrix is built from a pair of matrices $\boldsymbol{B}$ and $\boldsymbol{U}$ that represent a tumor phylogeny fulfilling the ISA and the proportions of the clones in the samples, respectively, following Equation \eqref{eqF}. + +In order to simulate the $\boldsymbol{B}$ and $\boldsymbol{U}$ matrices, we have devised two models: a tumor model that simulates the evolutionary history and current state of the tumor, and a sampling model that represents the tumor sampling process. A third model, namely the sequencing noise model, has been devised to optionally introduce sequencing noise to the VAF values in the $\boldsymbol{F}$ matrix, if noisy data is desired. The following subsections describe these models in detail. + +\subsection{Tumor model} + +The tumor model generates a clonal tree $T$ and an associated matrix $\boldsymbol{B}$, together with the clone proportions $\boldsymbol{c}$ and tumor blend at the moment of sampling. Briefly, for a tumor with a set of $n$ mutations denoted by $M$, $T$ is a rooted tree on an $n$-sized vertex set $V_n = \{v_{1}, \dots, v_{n} \}$, where $v_{i}$ represents clone $i$ and simultaneously corresponds to the first clone containing mutation $M_i$. This one-to-one correspondence between clones and mutations allows us to refer to them interchangeably. The tree is further defined by an $(n-1)$-sized edge set $E_T$, where each edge $e_{ij} \in E_T$ represents a direct ancestral relationship from vertex $v_{i}$ to vertex $v_{j}$. + +In our tumor model, $T$ is iteratively generated with a random topology, as follows. First, the root node of $T$, $\mathcal R(T)$, is set, and a random mutation $M_i \in M$ is assigned to it. 
For each of the remaining $M_j \in M - \{M_i\}$ mutations, a new node $v_j$ is created and the mutation $M_j$ is assigned to this node. The node $v_j$ is then attached as a child to one of the nodes already included in $T$. To adhere to the ISA model, each newly added node inherits all the mutations present in its parent node. + +The attachment of nodes to the tree is not uniformly random. Instead, the nodes in the growing tree $T$ have different probabilities of being selected as parents for the new nodes, depending on their number of ascendants, $|\mathcal A(v_i)|$. Specifically, $\forall v_j \neq \mathcal R(T)$, the parent node of $v_j$ is sampled from a multinomial distribution where the probabilities are calculated as: %Mathematically, $\forall v_j \neq \mathcal R(T)$, the parent node of $v_j$, denoted by the random variable $\mathcal P(v_j)$, is sampled from the set of nodes already in the tree, $V^{\prime} \subset V_n$, following a multinomial distribution: + +\iffalse +\begin{equation} +\mathcal P(v_j) \sim M(\boldsymbol{p}) +\label{eq:pk} +\end{equation} + +where the probability distribution $\boldsymbol{p}$ is defined as: +\fi + +\begin{equation} +\boldsymbol{p}(v_i; k) = \frac{k^{\frac{|\mathcal{A}(v_i)| + 1}{\delta}}}{\sum_{v_l \in V'} k^{\frac{|\mathcal{A}(v_l)| + 1}{\delta}}}; \quad v_i \in V' +\end{equation} + +Here, $\delta$ represents the depth of the growing tree, i.e., the number of levels or layers in the tree structure. $k \in (0, +\infty)$ is the topology parameter that determines whether the topology tends to be branched, with a decreasing probability for increasing numbers of ascendants ($k < 1$), or linear, with an increasing probability for increasing numbers of ascendants ($k > 1$). Likewise, setting $k$ to 1 assigns equal probabilities to all the nodes and generates a completely random topology.
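The parent-selection rule above can be sketched in a few lines of base R. This is an illustrative re-implementation, not the package's internal code; in particular, the depth $\delta$ is approximated here as the current maximum number of ascendants plus one:

\begin{example}
simulate_topology <- function(n, k) {
  parent <- rep(NA_integer_, n)          # parent[1] stays NA: node 1 is the root
  n_asc <- integer(n)                    # |A(v_i)| for each node
  for (j in 2:n) {
    nodes <- seq_len(j - 1)              # nodes already in the tree, V'
    delta <- max(n_asc[nodes]) + 1       # depth of the growing tree
    w <- k^((n_asc[nodes] + 1) / delta)  # unnormalized selection probabilities
    parent[j] <- sample(nodes, 1, prob = w)
    n_asc[j] <- n_asc[parent[j]] + 1
  }
  parent
}

# B matrix: start from the identity and propagate inherited mutations (ISA)
parents_to_B <- function(parent) {
  n <- length(parent)
  B <- diag(n)
  for (j in seq_len(n)) {
    a <- parent[j]
    while (!is.na(a)) { B[j, a] <- 1; a <- parent[a] }
  }
  B
}

set.seed(1)
B <- parents_to_B(simulate_topology(n = 6, k = 0.5))  # k < 1 favors branching
\end{example}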
+ +Once $T$ has been generated, it is represented in the form of a $\boldsymbol{B}$ matrix, constructed by initializing an identity matrix $B_{n}$ and setting $b_{ji}$ to 1 for each pair of nodes $v_i$ and $v_j$ where node $v_j$ is a descendant of node $v_i$ in $T$. + +After obtaining $\boldsymbol{B}$, the proportions of the clones in the whole tumor, denoted as $\boldsymbol{c} = \{c_{1}, \dots, c_{n} \}$, are simulated. It is important to note that these proportions are not the same as those appearing in the $\boldsymbol{U}$ matrix, which represent the \textit{sampled} clone proportions and depend not only on the global clone proportions but also on the spatial distribution of the clones and the sampling sites. + +These clone proportions $\boldsymbol{c}$ are calculated by sequentially sampling a Dirichlet distribution at each multifurcation in $T$, starting from the root. For instance, for a node $v_i$ with children $\mathcal K(v_i)$ = \{$v_j$, $v_k$\}, we draw a sample $(x_i, x_j, x_k)$ that represents the proportions of the parent clone and its two children, respectively, from a Dirichlet distribution $Dir(\alpha_i, \alpha_j, \alpha_k)$. When this sampling is performed at a node $v_i \neq \mathcal R(T)$, these proportions are scaled relative to the original proportion of the parent clone. This ensures that the sum rule is met, and that once all multifurcations have been visited, the proportions of all clones in $T$ sum up to one. While several approaches can exist to determine these proportions, this method provides a natural approximation to the problem that can be interpreted as the distribution of the mass or proportion of each clone between itself and its descendants. + +The parameters of the Dirichlet distribution depend on the tumor's evolution model. In this work, we consider two fundamental cases: positive selection-driven evolution and neutral evolution. 
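The recursive mass-splitting scheme just described can be sketched in base R, assuming nodes are numbered so that every child has a larger index than its parent. This is an illustrative re-implementation, not the package's code; a Dirichlet sample is drawn by normalizing independent Gamma variates:

\begin{example}
# One Dirichlet draw via normalized Gamma variates (no extra packages needed)
rdirichlet1 <- function(alpha) {
  g <- rgamma(length(alpha), shape = alpha)
  g / sum(g)
}

# Walk the tree top-down: at each node, split its current mass between
# itself (weight alpha_p) and each of its children (weight alpha_c).
clone_proportions <- function(parent, alpha_p, alpha_c) {
  n <- length(parent)
  prop <- numeric(n)
  prop[1] <- 1                       # node 1 is the root and starts with all mass
  for (v in seq_len(n)) {
    kids <- which(parent == v)
    if (length(kids) > 0) {
      split <- rdirichlet1(c(alpha_p, rep(alpha_c, length(kids))))
      mass <- prop[v]
      prop[v] <- mass * split[1]
      prop[kids] <- mass * split[-1]
    }
  }
  prop                               # sums to 1 by construction
}

set.seed(1)
parent <- c(NA, 1, 1, 2)                                  # a small example tree
clone_proportions(parent, alpha_p = 0.3, alpha_c = 0.3)   # positive selection
clone_proportions(parent, alpha_p = 5,   alpha_c = 10)    # neutral evolution
\end{example}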
In positive selection-driven evolution, certain mutations confer a growth advantage, while most mutations do not. As a result, the clones carrying these advantageous mutations outcompete other clones and dominate the tumor. Consequently, tumors are predominantly composed of a few dominant clones, with the remaining clones present only in very small proportions. Under neutral evolution, by contrast, no substantial number of mutations provides a fitness advantage, and clones accumulate solely due to tumor progression. As a result, all clones are present in similar proportions \citep{davis2017tumor}. + +Based on this, all the $\alpha_{i}$ parameters for the Dirichlet distribution for positive selection-driven evolution are set to 0.3. For neutral evolution, the parameter corresponding to the parent node ($\alpha_{p}$) is set to 5, and the parameters corresponding to the children node(s) ($\alpha_{c}$) are set to 10. Different $\alpha$ values are used for parent and children nodes in neutral evolution to ensure that clones arising late in the evolution do not end up with proportions that are too small solely due to their position in the topology, preventing deviation from the expected clone proportion distribution for this type of evolution model. These values have been chosen empirically, and their effect is illustrated in Figure \ref{ternary_plots}, which shows how 5,000 random samples from the mentioned Dirichlet distributions (for the particular case of 3 dimensions, i.e., one parent and two children nodes) are distributed. As observed, in the case of positive selection ($\alpha_{i} = \alpha_{j} = \alpha_{k} = 0.3$), the $(x_i, x_j, x_k)$ values are pushed equally towards the three corners of the simplex. In other words, the samples tend to be sparse, with typically one component having a large value and the rest close to 0.
Instead, when neutral evolution is assumed ($\boldsymbol{\alpha}$ = (5, 10, 10)), the $(x_i, x_j, x_k)$ values concentrate close to the center of the simplex, but with a tendency to deviate towards the components with larger $\alpha$ values. This means that samples $(x_i, x_j, x_k)$ are less sparse in neutral evolution, with larger values for $x_j$ and $x_k$ in this case, which represent the children nodes. + +\begin{figure*}[t] +\centerline{\includegraphics[width=\textwidth]{figs/ternary_plots_cropped.pdf}} +\caption[Ternary density plots from Dirichlet distribution samples drawn at each simulated topology multifurcation]{Ternary density plots of 5,000 samples drawn from two 3-dimensional Dirichlet distributions. The parameters of the Dirichlet distribution on the left are \boldmath$\alpha$ \unboldmath = (0.3, 0.3, 0.3) and the distribution is used to represent positive selection-driven evolution. The distribution on the right has parameters \boldmath$\alpha$ \unboldmath = (5, 10, 10) and is used to represent neutral evolution.
Samples drawn from these distributions (or their generalization to higher spaces) are used to calculate clone proportions in each tree multifurcation.} +\label{ternary_plots} +\end{figure*} + +Taking into account that marginalizing the Dirichlet distribution results in a Beta distribution, the proportion of the clone $v_i \in V_n \; | \; v_i \neq \mathcal R(T)$ in the tumor, denoted as $C_i$, follows the distribution: + +\begin{equation} + C_i \sim C_{\mathcal P(v_i)} \cdot \Gamma_i \cdot \Gamma_i^\prime +\end{equation} + +where + +\begin{equation} +\Gamma_i \sim Beta(\alpha_{c}, \alpha_{p} + \alpha_{c} \cdot (|\mathcal K(\mathcal P(v_i))| - 1)) +\end{equation} + +and + +\begin{align} +\Gamma_i^\prime = 1 \quad \text{if } |\mathcal K(v_i)| = 0 \\ +\Gamma_i^\prime \sim Beta(\alpha_{p}, \alpha_{c} \cdot |\mathcal K(v_i)|) \quad \text{if } |\mathcal K(v_i)| \neq 0 +\end{align} + +For the case where $v_i = \mathcal R(T)$, the root node, $C_i$, follows: + +\begin{equation} +C_i \sim Beta(\alpha_{p}, \alpha_{c} \cdot |\mathcal K(v_i)|) +\end{equation} + +Here, $\alpha_{p}$ and $\alpha_{c}$ are the parameters of the Dirichlet distribution assigned to parent and child nodes, respectively. + +To complete the tumor model, the tumor blend is simulated, which represents the degree of physical mixing between the tumor clones. In order to do this, we simplify the spatial distribution to one dimension and model the tumor as a Gaussian mixture model with $n$ components, where each component $G_i$ represents a tumor clone, and the mixture weights are given by $\boldsymbol{c}$. The variance for all components is set to 1, while the mean values are random variables. + +Specifically, we start by selecting a random clone, and its component's mean value is set to 0. Then, the mean values of the remaining $n - 1$ components are calculated sequentially by adding $d$ units to the mean value of the previous component. 
To introduce variability in the tumor blend, the value of $d$ is drawn from the interval $[0, 4]$. For $d = 0$, the two clones are completely mixed, while for $d = 4$, they are physically far apart from each other. The choice of the upper limit for $d$ has been determined empirically, considering that with this value, the overlapping area between the two clones becomes negligible. + +To ensure the separation between the clones is random and that most of the time the separation is small, we use an exponential-like distribution of the form $Beta(\alpha=1, \beta)$ to sample the values of $d$. Specifically, we set $\beta = 5$ to ensure that the samples obtained from the mixture are not excessively sparse. We can express this mathematically as: + +%To ensure the separation between the clones is random and that most of the time the separation is small, we use the following exponential distribution to sample the values of $d$: + +\begin{equation} +D \sim 4 \cdot Beta(\alpha=1, \beta=5) +\end{equation} + +\subsection{Sampling simulation} + +So far, we have described how the clones of a tumor are modelled by the tumor model. However, in practice, there is no easy way of observing these global properties of a tumor. Instead, we typically have access to information provided by samples or biopsies. This means that certain tumor characteristics, such as the real clone proportions $\boldsymbol{c}$, cannot be directly obtained. Rather, we can only determine the \textit{sampled} clone proportions, which depend on the specific sampling procedure employed. Unless there is a perfectly uniform mixture of the clone cells, their sampled proportions will not match the global proportions. These sampled clone proportions are, in fact, the $\boldsymbol{u}_{i.}$ elements in the $\boldsymbol{U}$ matrix. + +The sampling simulation we have devised simulates the physical sampling of the tumor and allows us to construct the $\boldsymbol{U}$ matrix of the problem.
This procedure operates on the data simulated using the tumor model. Specifically, it simulates a sampling procedure carried out in a grid manner over the tumor Gaussian mixture model described in the previous section. Let $G_1$ and $G_n$ be the components with the lowest and highest mean values, respectively, in the Gaussian mixture model. The 1\textsuperscript{st} and $m$\textsuperscript{th} sampling points in the grid are always set to $\mu_{G_1} - 2.8 \cdot \sigma_{G_1}$ and $\mu_{G_n} + 2.8 \cdot \sigma_{G_n}$, respectively, and the remaining $m - 2$ sampling points are determined by dividing the range between these two endpoints into $m-1$ equal intervals. + +The densities of the Gaussian distributions at each sampling point are multiplied by the global proportion of the clones sampled from the Dirichlet distributions, so that for each sampling point $i$, the fraction of clone $j$, $p_{ij}$, is proportional to their product: + +\begin{equation} +p_{ij} \propto c_j \cdot \phi_{ij} +\label{eq:pij_2} +\end{equation} + +where $c_j$ is the global proportion of clone $j$ and $\phi_{ij}$ is the density of the Gaussian component associated with clone $j$ at sampling point $i$. + +Finally, to account for the effect of cell count in the samples, a multinomial distribution is used to sample a given number of cells $n_{c}$ for each tumor sample. In that distribution, the probability of selecting each clone at sampling site $i$ is given by $(p_{i1}, \ldots, p_{in})$. The resulting cell counts, normalized by $n_{c}$, determine the final tumor clone composition in sample $i$, which is represented in the matrix $\boldsymbol{U}$: + +\begin{equation} +U_{i.} \sim \frac{M(n = n_{c}, p = (p_{i1}, \ldots, p_{in}))}{n_{c}} +\end{equation} + +Note that selecting a relatively low value for $n_{c}$ in the multinomial distribution can lead to clones with very low frequencies being modeled as absent in the sample, with composition values equal to 0.
This is arguably more realistic, since in practice clones present at such low frequencies would likely go undetected. + +\subsection{Sequencing noise simulation} + +Up to this point, the $\boldsymbol{B}$ and $\boldsymbol{U}$ matrices of an instance have been simulated. If noise-free data are being simulated, the simulation is complete once Equation \eqref{eqF} is applied to obtain the $\boldsymbol{F}$ matrix. + +As a brief reminder, each element $f_{ij}$ in $\boldsymbol{F}$ denotes the frequency or VAF of the mutation $M_j$ in sample $i$ or, in other words, the proportion of sequencing reads that carry the mutation $M_j$ in that particular sample. This also means that the proportion of reads in that sample that do not observe the mutation but instead contain the reference nucleotide is $1 - f_{ij}$. + +However, empirical factors can artificially alter the VAF value, leading it to deviate from the true ratio between the variant and total allele molecule counts. One of these factors is the noise introduced during the DNA sequencing process itself, which can arise in two main ways. First, limitations of the sequencing instrument can lead to incorrect nucleotide readings of DNA fragments. For example, a position that actually contains nucleotide A may be read as a T. Second, there can be a biased number of reads produced for a particular site, which can result from chemical reaction peculiarities or simply because not all fragments are sequenced. These limitations can, however, be mitigated to some extent. For instance, it has been shown that a high depth of coverage, which refers to the average number of reads that cover each position, can lead to more accurate VAF values \citep{depth_error}. + +To incorporate the effect of sequencing noise into the data instances, we have developed the following procedure.
This procedure introduces noise to the $\boldsymbol{F}$ matrix and generates a noisy matrix $\boldsymbol{F^{(n)}}$, where $\boldsymbol{F^{(n)}} \neq \boldsymbol{U} \cdot \boldsymbol{B}$. The procedure simulates noise at the level of the sequencing reads and recalculates the new $f^{(n)}_{ij}$ values, as follows. + +The sequencing depth $r_{ij}$ at the genomic position where $M_j$ occurs in sample $i$ is distributed according to a negative binomial distribution: + +\begin{equation} +r_{ij} \sim NB(\mu = \mu_{sd}, \alpha = 5) +\label{eq:r} +\end{equation} + +where $\mu_{sd}$ represents the mean sequencing depth, which is the average number of reads covering the genomic position of mutation $M_j$ in the sample, and $\alpha$ is the dispersion parameter, which controls the variability of the sequencing depth around the mean and is fixed at 5. + +The number of reads supporting the alternate allele $r^{a}_{ij}$ is then modeled by a binomial distribution: + +\begin{equation} +r^{a}_{ij} \sim B(n = r_{ij}, p = f_{ij}) +\label{eq:ra} +\end{equation} + +In sequencing data, errors can occur due to limitations inherent to the sequencing methodology. These errors vary depending on the technology used. %For instance, in Illumina platforms, they are reported to occur at a rate of approximately 0.001. + +To simulate the effect of these errors on the VAF values, the number of reads $r^{a\prime}_{ij}$ that, despite originally supporting the alternate allele, contain a different allele as a result of a sequencing error, is modeled using a binomial distribution: + +\begin{equation} +r^{a\prime}_{ij} \sim B(n = r^{a}_{ij}, p = \varepsilon), +\end{equation} + +where $\varepsilon$ represents the sequencing error rate. + +We also need to consider the situation where the reads contain the reference nucleotide but are read with the alternate allele as a result of this error. This can be better understood with an example.
Let's imagine that at a certain genomic position, the normal cells have a T, but in some cells, there is a mutation where the T has changed to an A. In this case, for the normal cells, with a rate of $\varepsilon$, a sequencing error may occur, resulting in a read of C, G, or A instead of T, each with an equal chance. Therefore, in approximately $\frac{\varepsilon}{3}$ of the cases, reads with the mutation of interest will arise from normal reads: + +\begin{equation} +r^{r\prime}_{ij} \sim B(n = r_{ij} - r^{a}_{ij}, p = \frac{\varepsilon}{3}) +\label{eq:ramr} +\end{equation} + +Thus, taking all these into consideration, the final noisy VAF values $f^{(n)}_{ij}$ are simulated as: + +\begin{equation} +f^{(n)}_{ij} = \frac{r^{a}_{ij} - r^{a\prime}_{ij} + r^{r\prime}_{ij}}{r_{ij}} +\label{eq:noisyVAF} +\end{equation} + +By default, the sequencing error rate $\varepsilon$ is set to 0.001, following commonly reported values for Illumina data \citep{loman2012performance}. + +As an illustration of the effect of the noise model, in Figure \ref{F_error_plot}, we have depicted the density of the mean absolute error between the $\boldsymbol{F^{(n)}}$ matrix and its corresponding noise-free $\boldsymbol{F}$ matrix for a collection of noisy instances. As can be seen, as $\mu_{sd}$ increases, the error introduced to the $\boldsymbol{F^{(n)}}$ matrix decreases. This is expected because the $r^{a}_{ij}$ values follow a binomial distribution as described in Equation \eqref{eq:ra}, where the number of trials is determined by $\mu_{sd}$ as shown in Equation \eqref{eq:r}, and the event probability corresponds to the $f^{(n)}_{ij}$ value. Therefore, the larger the number of trials, the closer the noisy VAF value is to the noise-free VAF value. 
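The read-level noise model of Equations \eqref{eq:r}--\eqref{eq:noisyVAF} can be sketched in base R as follows. This is an illustrative re-implementation, not the package's \code{add\_noise} function; note that R's \code{rnbinom} takes the dispersion as its \code{size} argument, and a guard against zero-depth positions is added here:

\begin{example}
add_sequencing_noise <- function(f, mu_sd = 30, alpha = 5, eps = 0.001) {
  m <- length(f)
  r    <- rnbinom(m, size = alpha, mu = mu_sd)       # sequencing depth r_ij
  r    <- pmax(r, 1L)                                # guard against zero depth
  r_a  <- rbinom(m, size = r, prob = f)              # reads supporting the alternate allele
  r_ap <- rbinom(m, size = r_a, prob = eps)          # alternate reads miscalled
  r_rp <- rbinom(m, size = r - r_a, prob = eps / 3)  # reference reads miscalled as alternate
  (r_a - r_ap + r_rp) / r                            # noisy VAF values f^(n)_ij
}

set.seed(1)
f_true  <- c(1, 0.3, 0.2, 0)       # one sample's noise-free VAF values
f_noisy <- add_sequencing_noise(f_true, mu_sd = 100)
\end{example}

Consistent with Figure \ref{F_error_plot}, increasing \code{mu\_sd} shrinks the expected gap between \code{f\_noisy} and \code{f\_true}.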
+ +\begin{figure*}[t] +\centerline{\includegraphics[width=\textwidth]{figs/noisy_F_error.pdf}} +\caption[$F$ matrix error density for different noise levels]{Density of the mean absolute error in noisy $\boldsymbol{F}$ matrices for different $\mu_{sd}$ values that correspond to different noise levels.} +\label{F_error_plot} +\end{figure*} + +As a final remark, it is important to note that although our data simulation procedure follows the ISA, the addition of noise may cause the resulting data to break this assumption. + +\section{The package}\label{sec:package} + +\pkg{GeRnika} provides three main functionalities for studying tumor evolution data: (I) simulating artificial tumor evolution, (II) visualizing tumor phylogenies, and (III) comparing tumor phylogenies. This section explains the functions that support these features. Additionally, we describe extra data provided by \pkg{GeRnika} that users can use to try the methods in the package. + +\subsection{Simulation methods} + +To enable users to simulate tumor evolution data, \pkg{GeRnika} provides various functions inspired by the methods described in \autoref{sec:simulation}. GeRnika offers two options: a single method for streamlined simulations and separate methods for performing each step individually, allowing users to customize or replace specific parts of the process. + +\subsubsection*{create\_instance} + +The main function for streamlined tumor data simulation is \code{create\_instance}. This function provides a convenient way to perform the entire simulation process in a single step. The following command demonstrates how to use it to generate the artificial data: + +\begin{center} +\begin{example} + create_instance(n, m, k, selection, noisy = TRUE, depth = 30, seed = Sys.time()) +\end{example} +\end{center} + + where each argument of the method is described as follows: + +\begin{itemize} + \item \code{n}: An integer representing the number of clones. 
+ \item \code{m}: An integer representing the number of samples. + \item \code{k}: A numeric value, also referred to as the topology parameter, that determines the linearity of the tree topology. Increasing values of this parameter increase the linearity of the topology. When \code{k} is set to 1, all nodes have equal probabilities of being chosen as parents, resulting in a uniformly random topology. + \item \code{selection}: A character string representing the evolutionary mode the tumor follows. This should be either \code{"positive"} or \code{"neutral"}. + \item \code{noisy}: A logical value (\code{TRUE} by default) indicating whether sequencing noise is added to the frequency matrix. + \item \code{depth}: A numeric value (30 by default) representing the mean depth of sequencing. + \item \code{seed}: A numeric value (\code{Sys.time()} by default) used to set the seed for the random number generator. +\end{itemize} + + The \code{create\_instance} function returns a list containing the following components: + +\begin{itemize} + \item \code{F\_noisy}: A matrix representing the noisy frequencies of each mutation across samples. If the \code{noisy} parameter is set to \code{FALSE}, this matrix is equal to \code{F\_true}. + \item \code{B}: A matrix representing the relationships between mutations and clones in the tumor. + \item \code{U}: A matrix representing the frequencies of the clones across the set of samples. + \item \code{F\_true}: A matrix representing the noise-free frequencies of each mutation across the samples. +\end{itemize} + +As explained, the \code{create\_instance} function generates all matrices representing frequencies, proportions, and the phylogeny of the simulated tumor data in a single step.
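As a usage sketch (the argument values here are illustrative, not recommended defaults):

\begin{example}
# Simulate a noisy instance with 5 clones and 4 samples under neutral evolution
I <- create_instance(n = 5, m = 4, k = 1, selection = "neutral",
                     noisy = TRUE, depth = 30, seed = 123)
names(I)         # components F_noisy, B, U and F_true, as described above
dim(I$F_noisy)   # samples x mutations
\end{example}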
However, \pkg{GeRnika} also provides individual functions for simulating each of these elements independently, giving users finer control over the characteristics of the simulated tumor data. + +\subsubsection*{create\_B, create\_U, create\_F and add\_noise} + +These specialized functions generate each of the matrices involved in representing tumor evolution data: + +\begin{itemize} + \item \code{create\_B}: This function generates a mutation matrix ($\boldsymbol{B}$ matrix) for a tumor phylogenetic tree with a given number of nodes and a value \code{k} determining the linearity of the tree topology. + \item \code{create\_U}: This function calculates the $\boldsymbol{U}$ matrix, containing the frequencies of each clone in a set of samples, based on a $\boldsymbol{B}$ matrix, the number of samples considered, the number of cells in each sample, and the evolutionary mode of the tumor. + \item \code{create\_F}: This function generates the $\boldsymbol{F}$ matrix, which contains mutation frequency values for a series of mutations across a collection of tumor biopsies or samples. The matrix is computed based on a pair of matrices, $\boldsymbol{U}$ and $\boldsymbol{B}$, and considers whether the mutations are heterozygous. + \item \code{add\_noise}: This function introduces sequencing noise into the noise-free $\boldsymbol{F}$ matrix generated by the \code{create\_F} method. Users can specify the mean sequencing depth and the overdispersion parameter, which are used to simulate sequencing depth based on a negative binomial distribution. +\end{itemize} + +The reader is encouraged to refer to the package documentation for more information about these functions and their parameters. + +\subsection{Visualization methods} + +The following functions enable the visualization of tumor evolution data by generating phylogenetic trees based on the data under analysis.
+
+\subsubsection*{Phylotree S4 class}
+
+To simplify the execution of its functionalities, \pkg{GeRnika} utilizes the \code{"Phylotree"} class. The \code{"Phylotree"} S4 class is a data structure specifically designed to represent phylogenetic trees, facilitating the use of the package's methods and ensuring their computational efficiency. The attributes of the \code{"Phylotree"} class are as follows:
+
+\begin{itemize}
+ \item \code{B}: A data.frame containing the square matrix that represents the ancestral relationships among the clones in the phylogenetic tree ($\boldsymbol{B}$ matrix).
+ \item \code{clones}: A vector representing the indices of the clones in the $\boldsymbol{B}$ matrix.
+ \item \code{genes}: A vector indicating the index of the gene that first mutated in each clone within the $\boldsymbol{B}$ matrix.
+ \item \code{parents}: A vector indicating the parent clones for each clone in the phylogenetic tree.
+ \item \code{tree}: A \code{"Node"} class object representing the phylogenetic tree (this class is inherited from the \CRANpkg{data.tree} package).
+ \item \code{labels}: A vector containing the gene tags associated with the nodes in the phylogenetic tree.
+\end{itemize}
+
+A \code{"Phylotree"} class object can be instantiated with custom attributes using the \code{create\_phylotree} method, which takes all the attributes of the \code{"Phylotree"} class as arguments. Alternatively, \pkg{GeRnika} provides a function that automatically generates a \code{"Phylotree"} class object from a given $\boldsymbol{B}$ matrix.
+
+\subsubsection*{B\_to\_phylotree}
+
+ In order to instantiate an object of the \code{"Phylotree"} class, the following command can be used:
+
+\begin{example}
+ B_to_phylotree(B, labels = NA)
+\end{example}
+
+ where each argument of the method is described as follows:
+
+\begin{itemize}
+ \item \code{B}: A square $\boldsymbol{B}$ matrix that represents the phylogenetic tree.
+ \item \code{labels}: An optional vector containing the tags of the genes in the phylogenetic tree. \code{NA} by default.
+\end{itemize}
+
+This function returns an object of the \code{"Phylotree"} class, automatically generating its attributes based on \code{B}, which represents the phylogenetic tree of the tumor under analysis.
+
+Once instantiated, the phylogenetic tree in a \code{"Phylotree"} class object can be visualized using the generic \code{plot} function, which takes the \code{"Phylotree"} object as its argument. The \code{plot} function also includes a \code{labels} argument that can be set to \code{TRUE} to display node labels on the phylogenetic tree, using the gene tags stored within the \code{"Phylotree"} object.
+
+The \pkg{GeRnika} package provides the \code{plot\_proportions} function for visualizing phylogenetic trees, with node sizes and colors reflecting the proportions of each clone. This function requires two inputs: a \code{"Phylotree"} class object representing the phylogenetic tree and a numeric vector or matrix specifying clone proportions. If a vector is provided, a single tree is plotted, with the node sizes and colors determined by the values in the vector. If, instead, a matrix is provided, such as the $\boldsymbol{U}$ matrix that represents the frequencies of clones across samples, the function plots one tree for each row of the matrix, based on the clone proportions specified in that row. Additionally, users can enable node labeling by setting the \code{labels} argument to \code{TRUE}, which annotates the tree nodes with gene tags from the \code{"Phylotree"} object.
+
+\subsection{Comparison methods}
+
+This section describes the methods included in \pkg{GeRnika} that facilitate the comparison of tumor phylogenies.
+
+A fundamental approach for comparing two phylogenetic trees is to determine if their evolutionary histories are equivalent.
The \code{equals} function performs this comparison by accepting two \code{"Phylotree"} class objects as arguments. This function returns a logical value indicating whether the provided phylogenetic trees are equivalent.
+
+To analyze similarities and differences between two phylogenetic trees, the \code{find\_common\_subtrees} function identifies and plots all maximal common subtrees between them. In addition to visualizing these subtrees, the function outputs the number of shared and unique edges (those present in only one of the trees) and calculates the distance between the trees, defined as the sum of their unique edges. This method also includes an option to label the maximal common subtrees with gene tags by setting \code{labels = TRUE}.
+
+The \code{combine\_trees} function generates a \dfn{consensus tree} by combining the nodes and edges of two \code{"Phylotree"} class objects. The consensus tree highlights the nodes and edges that form common subtrees between the original trees, as well as the independent edges unique to each tree, which are displayed with reduced opacity. This method also allows labeling the nodes with gene tags by setting \code{labels = TRUE} and customizing the colors of the consensus tree by passing a vector of three hexadecimal color codes to the \code{palette} argument.
+
+\subsection{Exported data}
+
+\pkg{GeRnika} provides various exported data instances to help users easily explore the package's functionalities. These are as follows:
+
+\begin{itemize}
+ \item \code{B\_mats}: A list of 10 trios of $\boldsymbol{B}$ matrices. Each trio includes a real $\boldsymbol{B}$ matrix and two $\boldsymbol{B}$ matrices generated using different algorithms that infer evolutionary relationships from a given $\boldsymbol{F}$ matrix. These matrices serve as illustrative examples for testing and exploring the package's functionalities.
+
+ \item \code{palettes}: A data frame containing three predefined palettes for use with methods in \pkg{GeRnika} that require color palettes.
+
+\end{itemize}
+
+\section{Examples}
+
+In this section, we show examples of the use of the methods explained in \autoref{sec:package}. Please note that in the following examples, we will set the seeds of non-deterministic methods to a predefined value to ensure reproducibility.
+
+\subsection{Simulating tumor evolution data}
+
+The simulation of tumor clonal data involves generating the matrices $\boldsymbol{B}$, $\boldsymbol{U}$, $\boldsymbol{F}$, and $\boldsymbol{F^{(n)}}$ associated with a specific instance. For example, we can simulate, in a single line of code, a noisy instance of a tumor composed of 5 clones/mutations that has evolved under neutral evolution with a $k$ value of 0.5 and from which 3 samples have been taken:
+
+\begin{example}
+
+> I <- create_instance(n = 5, m = 3, k = 0.5, selection = "neutral", seed = 1)
+> I
+
+$F_noisy
+         mut1       mut2      mut3      mut4      mut5
+sample1 1.000 0.09090909 0.0000000 0.2777778 0.3548387
+sample2 1.000 0.20000000 0.2631579 0.8536585 0.2000000
+sample3 0.975 0.03846154 1.0000000 1.0000000 0.0000000
+
+$B
+       mut1 mut2 mut3 mut4 mut5
+clone1    1    0    0    0    0
+clone2    1    1    0    1    0
+clone3    1    0    1    1    0
+clone4    1    0    0    1    0
+clone5    1    0    0    1    1
+
+$U
+        clone1 clone2 clone3 clone4 clone5
+sample1   0.59   0.13   0.00   0.01   0.27
+sample2   0.13   0.27   0.24   0.19   0.17
+sample3   0.00   0.04   0.89   0.07   0.00
+
+$F_true
+        mut1 mut2 mut3 mut4 mut5
+sample1    1 0.13 0.00 0.41 0.27
+sample2    1 0.27 0.24 0.87 0.17
+sample3    1 0.04 0.89 1.00 0.00
+
+\end{example}
+
+Using this approach, the previously mentioned four matrices are simulated. Note that the noise-free $\boldsymbol{F}$ matrix is referred to as \code{F\_true} in the package's code, while the noisy $\boldsymbol{F^{(n)}}$ is denoted as \code{F\_noisy}.
+
+The previous method allows users to generate instances easily and quickly.
However, some users may require more precise control over the data, which can be achieved using the \code{create\_B}, \code{create\_U}, \code{create\_F}, and \code{add\_noise} methods. For examples of how to use these methods, please refer to the package documentation.
+
+\subsection{Visualizing tumor phylogenies}
+
+Once the matrices associated with our tumor instance have been generated, we can create a \code{"Phylotree"} class object, as follows:
+
+\begin{example}
+> phylotree <- B_to_phylotree(B = I$B)
+> phylotree
+
+An object of class "Phylotree"
+Slot "B":
+       mut1 mut2 mut3 mut4 mut5
+clone1    1    0    0    0    0
+clone2    1    1    0    1    0
+clone3    1    0    1    1    0
+clone4    1    0    0    1    0
+clone5    1    0    0    1    1
+
+Slot "clones":
+mut1 mut2 mut3 mut4 mut5
+   1    2    3    4    5
+
+Slot "genes":
+mut1 mut2 mut3 mut4 mut5
+   1    2    3    4    5
+
+Slot "parents":
+[1] -1  4  4  1  4
+
+Slot "tree":
+  levelName
+1 1
+2  °--4
+3      ¦--2
+4      ¦--3
+5      °--5
+
+Slot "labels":
+[1] "mut1" "mut4" "mut2" "mut3" "mut5"
+\end{example}
+
+Since no list of tags is provided to the \code{labels} parameter, a default set of labels is automatically assigned to the instantiated \code{"Phylotree"} class object.
+
+Afterwards, we can visualize the tumor phylogeny associated with the simulated $\boldsymbol{B}$ matrix by using the generic \code{plot} method, as follows:
+
+\begin{example}
+> plot(phylotree)
+\end{example}
+
+\begin{figure*}[t]
+\centerline{\includegraphics[width=0.5\textwidth]{figs/phylotree.png}}
+\caption{Phylogenetic tree associated with the generated \code{"Phylotree"} class object.}
+\label{fig:phylotree}
+\end{figure*}
+
+The resulting plot is shown in Figure \ref{fig:phylotree}. Instead of clone numbers, the user can utilize the predefined tags in the \code{"Phylotree"} class object to label the nodes in the tree by setting \code{labels = TRUE} in the \code{plot} function.
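+
+For instance, a labeled version of the same tree (the corresponding figure is omitted here) can be obtained with:
+
+\begin{example}
+> plot(phylotree, labels = TRUE)
+\end{example}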
+
+When plotting a \code{"Phylotree"} class object, its nodes can be resized according to the proportions of the clones that compose the tumor samples. To achieve this, we can use the $\boldsymbol{U}$ matrix from the previously generated instance to determine the proportions of the clones, as shown below:
+
+\begin{example}
+> plot_proportions(phylotree, I$U, labels = TRUE)
+\end{example}
+
+\begin{figure*}[t]
+\centerline{\includegraphics[width=\textwidth]{figs/proportions.png}}
+\caption{Phylogenetic trees associated with the generated \code{"Phylotree"} class object, using the proportions from the previously generated $\boldsymbol{U}$ matrix.}
+\label{fig:proportions}
+\end{figure*}
+
+The resulting plot is shown in Figure \ref{fig:proportions}. This method plots the proportions of the $\boldsymbol{U}$ matrix, where each tree represents a sample from the $\boldsymbol{U}$ matrix, illustrating the proportion of each clone within that specific sample. In this case, we have set the \code{labels} parameter to \code{TRUE} to label the nodes in the tree using the predefined tags "mut1", "mut4", "mut2", "mut3", and "mut5".
+
+\subsection{Comparing tumor phylogenies}
+
+Now, we present examples showing the use of \pkg{GeRnika}'s functionalities for comparing tumor phylogenies. For this purpose, we will use the \code{B\_mats} object included in the \pkg{GeRnika} package, which contains 10 $\boldsymbol{B}$ matrix trios.
Specifically, we will use the first trio of matrices, and set a predefined set of tags for the clones in the trees:
+
+\begin{example}
+> B_mats <- GeRnika::B_mats
+
+> B_real <- B_mats[[1]]$B_real
+> B_alg1 <- B_mats[[1]]$B_alg1
+> B_alg2 <- B_mats[[1]]$B_alg2
+
+> tags <- c("TP53", "KRAS", "PIK3CA", "APC", "EGFR", "BRCA1", "PTEN", "BRAF",
+            "MYC", "CDKN2A")
+
+> phylotree_real <- B_to_phylotree(B = B_real, labels = tags)
+> phylotree_alg2 <- B_to_phylotree(B = B_alg2, labels = tags)
+> phylotree_alg1 <- B_to_phylotree(B = B_alg1, labels = tags)
+
+> plot(phylotree_real, labels = TRUE)
+> plot(phylotree_alg1, labels = TRUE)
+> plot(phylotree_alg2, labels = TRUE)
+
+\end{example}
+
+The plots of the three instantiated \texttt{Phylotree} class objects are depicted in Figure \ref{fig:phylocomparison}.
+
+\begin{figure*}[t]
+\centerline{\includegraphics[width=\textwidth]{figs/comparisonphylotrees.png}}
+\caption{\code{phylotree\_real}, \code{phylotree\_alg2} and \code{phylotree\_alg1}, from left to right.}
+\label{fig:phylocomparison}
+\end{figure*}
+
+ We can check if the phylogenies of two tumors are equivalent using the \code{equals} method:
+
+\begin{example}
+> equals(phylotree_1 = phylotree_real, phylotree_2 = phylotree_alg1)
+[1] FALSE
+
+> equals(phylotree_1 = phylotree_real, phylotree_2 = phylotree_real)
+[1] TRUE
+\end{example}
+
+In this case, \code{phylotree\_real} and \code{phylotree\_alg1} are not identical, as some edges present in \code{phylotree\_real} are absent in \code{phylotree\_alg1}, and vice versa. However, a phylogenetic tree will always be identical to itself, as shown when comparing \code{phylotree\_real} to itself.
+
+To find the maximal common subtrees between two phylogenetic trees, we can use the following command:
+
+\begin{example}
+> find_common_subtrees(phylotree_1 = phylotree_real, phylotree_2 = phylotree_alg2,
+                       labels = TRUE)
+
+Independent edges of tree1: 2
+
+Independent edges of tree2: 2
+
+Common edges: 7
+
+Distance: 4
+\end{example}
+
+\begin{figure*}[t]
+\centerline{\includegraphics{figs/commonsubtrees.png}}
+\caption{Maximal common subtrees between \code{phylotree\_real} and \code{phylotree\_alg2} using predefined tags. In this case, there exists a single common subtree, but there may exist more in other cases.}
+\label{fig:commonsubtrees}
+\end{figure*}
+
+\begin{figure*}[t!]
+\centerline{\includegraphics[width=0.4\textwidth]{figs/consensus.png}}
+\caption{Consensus tree between \code{phylotree\_real} and \code{phylotree\_alg1} using the \textit{Lancet} palette and predefined tags.}
+\label{fig:consensus}
+\end{figure*}
+
+The maximal common subtrees (in this case, one subtree) between \code{phylotree\_real} and \\ \code{phylotree\_alg2} are shown in Figure \ref{fig:commonsubtrees}. Note that the clones in the maximal common subtree are represented by the predefined tags in the \code{Phylotree} class objects, as we have set \code{labels = TRUE}. Additionally, this method prints the number of common and independent edges of the trees, along with the distance between them; here, the distance is the sum of the independent edges of both trees: $2 + 2 = 4$.
+
+Finally, we generate the consensus tree between two phylogenetic trees using one of the custom palettes offered by \pkg{GeRnika}, specifically the \textit{Lancet} palette, and the predefined tags for the clones as follows:
+
+\begin{example}
+> palette <- GeRnika::palettes$Lancet
+
+> consensus_real_alg1 <- combine_trees(phylotree_1 = phylotree_real,
+                                       phylotree_2 = phylotree_alg1,
+                                       labels = TRUE,
+                                       palette = palette)
+
+> DiagrammeR::render_graph(consensus_real_alg1)
+\end{example}
+
+The consensus tree between \code{phylotree\_real} and \code{phylotree\_alg1} is depicted in Figure \ref{fig:consensus}. Here, the nodes and edges that compose the common subtrees between the original trees are green. In addition, pink edges denote the independent edges of the tree passed as the first parameter of the method, while blue edges represent the independent edges of the second tree. Note that the independent edges of both trees are presented with translucent colors.
+
+\section{Conclusions}
+
+\pkg{GeRnika} is a comprehensive R package designed to address a critical gap in the tools available for studying tumor evolution within the R environment. To this end, it provides researchers with an integrated suite for simulating, visualizing, and comparing tumor phylogenies. Unlike many existing tools, \pkg{GeRnika} is fully implemented in R, making it particularly accessible for the bioinformatics community, which widely relies on R for data analysis and visualization.
+
+One of \pkg{GeRnika}'s key contributions is providing tools to generate biologically plausible datasets for studying intratumoral heterogeneity and clonal dynamics. With varied, easily customizable parameters---controlling features such as clonal tree topology, selective pressures, and sequencing noise---\pkg{GeRnika} enables exploration of a wide range of evolutionary patterns and complexities, providing a valuable resource for testing new methods and hypotheses in tumor heterogeneity research.
+ +Beyond its core simulation features, \pkg{GeRnika} includes tools for visualizing and comparing tumor phylogenies, offering a unified solution that eliminates the need for multiple packages or complex data processing workflows. Future work will focus on enhancing \pkg{GeRnika}’s compatibility with other R packages related to tumor evolution, allowing for easier integration with existing resources. + +Overall, \pkg{GeRnika} provides an accessible, user-friendly tool that supports research into intratumoral heterogeneity, with the potential to substantially advance tumor phylogeny research. While it does not perform clonal deconvolution or phylogeny inference from real sequencing data, \pkg{GeRnika} serves as a valuable platform for simulation, visualization, and benchmarking, supporting the development and evaluation of such algorithms. + +\section{R software} + +The R package \CRANpkg{GeRnika} is now available on CRAN. + +\bibliography{GeRnika} + +\address{Aitor Sánchez-Ferrera\\ +Intelligent Systems Group, Computer Science Faculty, University of the Basque Country\\ +Paseo Manuel Lardizabal, Donostia/San Sebastian, 20018\\ +Spain\\ +ORCiD: 0000-0001-6127-0686\\ +\email{aitor.sanchezf@ehu.eus}} + +\address{Maitena Tellaetxe-Abete\\ +Intelligent Systems Group, Computer Science Faculty, University of the Basque Country\\ +Paseo Manuel Lardizabal, Donostia/San Sebastian, 20018\\ +Spain\\ +ORCiD: 0000-0003-1894-4547\\ +\email{maitena.tellaetxe@ehu.eus}} + +\address{Borja Calvo-Molinos\\ +Intelligent Systems Group, Computer Science Faculty, University of the Basque Country\\ +Paseo Manuel Lardizabal, Donostia/San Sebastian, 20018\\ +Spain\\ +ORCiD: 0000-0001-9969-9664\\ +\email{borja.calvo@ehu.eus}} diff --git a/_articles/RJ-2025-042/RJ-2025-042.Rmd b/_articles/RJ-2025-042/RJ-2025-042.Rmd new file mode 100644 index 0000000000..9eb8010582 --- /dev/null +++ b/_articles/RJ-2025-042/RJ-2025-042.Rmd @@ -0,0 +1,1113 @@ +--- +title: 'GeRnika: An R Package for the 
Simulation, Visualization and Comparison of + Tumor Phylogenies' +abstract: | + The development of methods to study intratumoral heterogeneity and + tumor phylogenies is a highly active area of research. However, the + advancement of these approaches often necessitates access to + substantial amounts of data, which can be challenging and expensive to + acquire. Moreover, the assessment of results requires tools for + visualizing and comparing tumor phylogenies. In this paper, we + introduce GeRnika, an R package designed to address these needs by + enabling the simulation, visualization, and comparison of tumor + evolution data. In summary, GeRnika provides researchers with a + user-friendly tool that facilitates the analysis of their approaches + aimed at studying tumor composition and evolutionary history. +author: +- name: Aitor Sánchez-Ferrera + affiliation: |- + Intelligent Systems Group, Computer Science Faculty, University of the + Basque Country + orcid: 0000-0001-6127-0686 + address: + - Paseo Manuel Lardizabal, Donostia/San Sebastian, 20018 + - Spain + - | + [aitor.sanchezf@ehu.eus](aitor.sanchezf@ehu.eus){.uri} +- name: Maitena Tellaetxe-Abete + affiliation: |- + Intelligent Systems Group, Computer Science Faculty, University of the + Basque Country + orcid: 0000-0003-1894-4547 + address: + - Paseo Manuel Lardizabal, Donostia/San Sebastian, 20018 + - Spain + - | + [maitena.tellaetxe@ehu.eus](maitena.tellaetxe@ehu.eus){.uri} +- name: Borja Calvo-Molinos + affiliation: |- + Intelligent Systems Group, Computer Science Faculty, University of the + Basque Country + orcid: 0000-0001-9969-9664 + address: + - Paseo Manuel Lardizabal, Donostia/San Sebastian, 20018 + - Spain + - | + [borja.calvo@ehu.eus](borja.calvo@ehu.eus){.uri} +date: '2026-02-11' +date_received: '2025-03-31' +journal: + firstpage: 255 + lastpage: 274 +volume: 17 +issue: 4 +slug: RJ-2025-042 +packages: + cran: + - GeRnika + - data.tree + bioc: clevRvis +preview: preview.png +bibliography: 
GeRnika.bib +CTV: ~ +legacy_pdf: yes +legacy_converted: yes +output: + rjtools::rjournal_web_article: + self_contained: yes + toc: no + mathjax: https://cdn.jsdelivr.net/npm/mathjax@4/tex-mml-chtml.js + md_extension: -tex_math_single_backslash +draft: no + +--- + + +```{r setup, include=FALSE} +knitr::opts_chunk$set(fig.align = "center") +``` + +:::: article +## Introduction + +The development of tumors is a complex and dynamic process characterized +by a succession of events where DNA mutations accumulate over time. +These mutations give rise to genetic diversity within the tumor, leading +to the emergence of distinct clonal subpopulations or, simply, clones +[@nowell1976clonal]. Each of these subpopulations exhibits unique +mutational profiles, resulting in varied phenotypic and behavioral +characteristics among the cancer cells [@1]. This phenomenon, known as +intratumoral heterogeneity (ITH), significantly hinders the design of +effective medical therapies, since different clones within the same +tumor may respond differently to treatments, ultimately leading to +therapy resistance and disease recurrence [@articleeffect]. To address +this challenge, several innovative approaches are being developed to +study the tumor composition in greater detail and to reconstruct its +evolutionary history. + +One common approach to studying tumor composition and phylogeny involves +using bulk DNA sequencing data from multiple tumor biopsies. This data +is relatively straightforward to obtain and provides a broad overview of +the genetic alterations within the tumor. However, using bulk sequencing +data in the study of ITH faces the challenge that each sample +potentially contains a mixture of different clonal populations rather +than just one clone. Consequently, the observed mutation +frequencies---measured as variant allele frequencies (VAFs)---do not +directly estimate the fraction of individual clones. 
Instead, the VAF
+values represent a composite signal: the sum of the fractions of all
+clones that harbor each mutation in a given sample.
+
+This complexity implies that reconstructing the tumor's evolutionary
+history requires deconvolving these clonal admixtures within the
+samples. This task is precisely the focus of the Clonal Deconvolution
+and Evolution Problem (CDEP) [@trap; @ancestree], which can be
+summarized as determining the tumor's clonal structure---that is,
+identifying the number, proportion, and mutational composition of clones
+in each sample---as well as reconstructing the clonal phylogenetic tree
+that leads to the observed clonal mosaic. In this context, one of the
+most prominent approaches to addressing the CDEP is the Variant Allele
+Frequency Factorization Problem (VAFFP) [@ancestree].
+
+Given $s$ tumor samples and $n$ mutations identified across these
+samples, we define a matrix $\boldsymbol{F} \in [0, 1]^{s \times n}$,
+where each element $f_{ij}$ represents the VAF value, or equivalently,
+the fraction of cells that carry mutation $j$ in sample $i$. The VAFFP
+seeks to decompose this input matrix $\boldsymbol{F}$ into two matrices:
+a matrix $\boldsymbol{B} \in \{0, 1\}^{n \times n}$ that represents the
+clonal phylogeny, and a matrix $\boldsymbol{U} \in [0, 1]^{s \times n}$
+that captures the clone proportions in each tumor sample:
+
+$$\boldsymbol{F} = \boldsymbol{U} \cdot \boldsymbol{B}
+\label{eqF} (\#eq:eqF)$$
+
+The $\boldsymbol{B}$ matrix is a binary square matrix of size $n$, where
+$b_{ij} = 1$ iff clone $i$ contains mutation $j$ [@gusfield]. The
+matrix $\boldsymbol{U}$ is an $s \times n$ matrix where $u_{ij}$ is the
+fraction of clone $j$ in sample $i$.
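+
+As a toy illustration of this factorization (hypothetical values, not
+produced by the package), consider three clones/mutations and two
+samples, with clone 1 as the root:
+
+``` r
+# B[i, j] = 1 iff clone i carries mutation j (clone 1 is the root)
+B <- matrix(c(1, 0, 0,
+              1, 1, 0,
+              1, 0, 1),
+            nrow = 3, byrow = TRUE)
+# Each row of U gives the clone proportions in one sample and sums to 1
+U <- matrix(c(0.5, 0.3, 0.2,
+              0.2, 0.1, 0.7),
+            nrow = 2, byrow = TRUE)
+U %*% B  # the VAF matrix F: rows (1, 0.3, 0.2) and (1, 0.1, 0.7)
+```
+
+The first column of the product equals the row sums of
+$\boldsymbol{U}$, reflecting that the root mutation is clonal, i.e.,
+present in every clone and hence in every cell of both samples.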
+ +The VAFFP operates under two key assumptions: tumors have a monoclonal +origin, meaning they arise from a single abnormal cell, and the infinite +sites assumption (ISA), which states that mutations occur at most once +and cannot disappear over time [@kimuraISA]. Under these assumptions, +the tumor's clonal structure can be modeled as a perfect phylogeny +[@gusfield]. This model imposes two key constraints: (1) if two clones +share a mutation, they must either be identical or ancestrally related, +and (2) once a clone acquires a mutation, that mutation is inherited by +all its descendants. + +Over the years, numerous methods have been developed to solve the CDEP +[@bayclone; @bitphylogeny; @phylowgs; @ancestree; @citup; @lichee; @spruce; @canopy; @mipup; @fu2022reconstructing; @grigoriadis2024conipher], +with recent advancements addressing reformulations of the problem that +incorporate single-cell sequencing-derived information or variants in +metastases, and account for temporal resolution +[@ross2016onconem; @machina; @malikic2019phiscs; @satas2020scarlet; @sollier2023compass]. +These methods primarily focus on reconstructing tumor phylogenies and +clonal compositions using both real and simulated data. However, the +simulation tools they employ often have limitations. For instance, while +MiPUP uses simulated data, it allows only basic parameter +adjustments---such as the number of mutations, samples, and reads---and +lacks the flexibility to fine-tune biological parameters. BitPhylogeny +creates highly realistic and complex simulated datasets representing +different modes of evolution, but these simulations are manually crafted +and limited in number, posing scalability issues. + +Several tools specifically designed for simulating data for the CDEP +have also been introduced. Pearsim, written in Python, allows control +over parameters like read depth, number of subclones, samples, and +mutations [@pairtree]. 
OncoLib, a C++ library, facilitates the +simulation of tumor heterogeneity and the reconstruction of NGS +sequencing data of metastatic tumors, offering control over parameters +such as driver mutation probability, per-base sequencing error rate, +migration rate, and mutation rate [@Oncolib]. Machina, also in C++, +provides a framework for simulating metastatic tumors and visualizing +their phylogenetic trees and migration graphs [@machina]. HeteroGenesis, +implemented in Python, simulates heterogeneous tumors at the level of +clone genomes, but it is not specifically tailored for CDEP instances +and requires additional processing to generate suitable datasets +[@tanner2019simulation]. + +Visualization of tumor phylogenies is another crucial aspect of studying +tumor evolution. Tools like CALDER, Clonevol, SPRUCE, **fishplot**, +ClonArch, and +[**clevRvis**](https://www.bioconductor.org/packages/release/bioc/html/clevRvis.html) +offer solutions for visualizing clonal structures and evolutionary +trajectories +[@calder; @clonevol; @spruce; @fishplot; @clonarch; @clevrvis]. Among +these, **fishplot** and **clevRvis** are available as R packages. +Additionally, methods for generating consensus trees, such as TuELIP and +ConTreeDP, have been developed to summarize multiple phylogenetic trees +into a single representative tree [@tuelip; @contreedp]. + +Despite the availability of existing tools, there remains a lack of +options in the R programming environment for realistically simulating +tumor evolution in a way that is both flexible and user-friendly, while +also enabling effective visualization and comparison. + +In this paper, we introduce +[**GeRnika**](https://CRAN.R-project.org/package=GeRnika), an R package +that provides a comprehensive solution for simulating, visualizing, and +comparing tumor evolution data. 
Although **GeRnika**'s data simulation +functionality was primarily devised to create instances for solving the +CDEP, the simulated data are not restricted to this purpose and can also +be used for exploring evolutionary dynamics in broader contexts. To +accommodate diverse research needs, we have implemented the procedures +to be highly customizable, allowing users to adjust a wide range of +parameters such as the number of clones, selective pressures, mutation +rates, and sequencing noise levels. Unlike existing tools that may offer +limited customization or are implemented in other programming languages, +**GeRnika** is fully integrated into the R environment, making it easy +to use alongside other bioinformatics packages. By combining simulation +capabilities with visualization and comparison tools in a user-friendly +interface, **GeRnika** offers an accessible and flexible option within +the R ecosystem for researchers studying tumor evolution. It is +important to note that **GeRnika** does not implement algorithms for +inferring clonal composition or reconstructing phylogenies from +experimental datasets. Instead, it provides a controlled framework for +generating, visualizing, and comparing data that can be used to +benchmark such methods. + +## Simulation of tumor evolution {#sec:simulation} + +The main contribution of this work is the introduction of a novel +approach for simulating biologically plausible instances of the VAFFP +that accounts for several key factors, including the number of clones, +selective pressures, and sequencing noise. In this section, we provide a +detailed description of the approach. + +Broadly speaking, each problem instance consists of a matrix +$\boldsymbol{F}$ containing the VAF values of a set of mutations in a +set of samples, as described previously. 
This matrix is built from a +pair of matrices $\boldsymbol{B}$ and $\boldsymbol{U}$ that represent a +tumor phylogeny fulfilling the ISA and the proportions of the clones in +the samples, respectively, following Equation \@ref(eq:eqF). + +In order to simulate the $\boldsymbol{B}$ and $\boldsymbol{U}$ matrices, +we have devised two models: a tumor model that simulates the +evolutionary history and current state of the tumor, and a sampling +model that represents the tumor sampling process. A third model, namely +the sequencing noise model, has been devised to optionally introduce +sequencing noise to the VAF values in the $\boldsymbol{F}$ matrix, if +noisy data is desired. The following subsections describe these models +in detail. + +### Tumor model + +The tumor model generates a clonal tree $T$ and an associated matrix +$\boldsymbol{B}$, together with the clone proportions $\boldsymbol{c}$ +and tumor blend at the moment of sampling. Briefly, for a tumor with a +set of $n$ mutations denoted by $M$, $T$ is a rooted tree on an +$n$-sized vertex set $V_n = \{v_{1}, \dots, v_{n} \}$, where $v_{i}$ +represents clone $i$ and simultaneously corresponds to the first clone +containing mutation $M_i$. This one-to-one correspondence between clones +and mutations allows us to refer to them interchangeably. The tree is +further defined by an $(n-1)$-sized edge set $E_T$, where each edge +$e_{ij} \in E_T$ represents a direct ancestral relationship from vertex +$v_{i}$ to vertex $v_{j}$. + +In our tumor model, $T$ is iteratively generated with a random topology, +as follows. First, the root node of $T$, $\mathcal R(T)$, is set, and a +random mutation $M_i \in M$ is assigned to it. For each of the remaining +$M_j \in M - \{M_i\}$ mutations, a new node $v_j$ is created and the +mutation $M_j$ is assigned to this node. The node $v_j$ is then attached +as a child to one of the nodes already included in $T$. 
To adhere to the +ISA model, each newly added node inherits all the mutations present in +its parent node. + +The attachment of nodes to the tree is not uniformly random. Instead, +the nodes in the growing tree $T$ have different probabilities of being +selected as parents for the new nodes, depending on the number of +ascendants, $\mathcal A(v_i)$, they have. Specifically, +$\forall v_j \neq \mathcal R(T)$, the parent node of $v_j$ is sampled +from a multinomial distribution where the probabilities are calculated +as: + +$$\boldsymbol{p}(v_i; k) = \frac{k^{\frac{|\mathcal{A}(v_i)| + 1}{\delta}}}{\sum_{v_l \in V'} k^{\frac{|\mathcal{A}(v_l)| + 1}{\delta}}}; \quad v_i \in V'$$ + +Here, $\delta$ represents the depth of the growing tree, i.e., the +number of levels or layers in the tree structure. $k \in (0, +\infty)$ +is the topology parameter that determines whether the topology tends to +be branched, with a decreasing probability for increasing numbers of +ascendants ($k$ $<$ 1), or linear, with an increasing probability for +increasing numbers of ascendants ($k$ $>$ 1). + +Once $T$ has been generated, it is represented in the form of a +$\boldsymbol{B}$ matrix, constructed by initializing an identity matrix +$B_{n}$ and setting $b_{ji}$ to 1 for each pair of nodes $v_i$ and $v_j$ +where node $v_j$ is a descendant of node $v_i$ in $T$. + +After obtaining $\boldsymbol{B}$, the proportions of the clones in the +whole tumor, denoted as $\boldsymbol{c} = \{c_{1}, \dots, c_{n} \}$, are +simulated. It is important to note that these proportions are not the +same as those appearing in the $\boldsymbol{U}$ matrix, which represent +the *sampled* clone proportions and depend not only on the global clone +proportions but also on the spatial distribution of the clones and the +sampling sites. + +These clone proportions $\boldsymbol{c}$ are calculated by sequentially +sampling a Dirichlet distribution at each multifurcation in $T$, +starting from the root. 
For instance, for a node $v_i$ with children
+$\mathcal K(v_i)$ = {$v_j$, $v_k$}, we draw a sample $(x_i, x_j, x_k)$
+that represents the proportions of the parent clone and its two
+children, respectively, from a Dirichlet distribution
+$Dir(\alpha_i, \alpha_j, \alpha_k)$. When this sampling is performed at
+a node $v_i \neq \mathcal R(T)$, these proportions are scaled relative
+to the original proportion of the parent clone. This ensures that the
+sum rule is met and that, once all multifurcations have been visited,
+the proportions of all clones in $T$ sum up to one. While several
+approaches exist to determine these proportions, this method provides a
+natural approximation to the problem that can be interpreted as the
+distribution of the mass, or proportion, of each clone between itself
+and its descendants.
+
+The parameters of the Dirichlet distribution depend on the tumor's
+evolution model. In this work, we consider two fundamental cases:
+positive selection-driven evolution and neutral evolution. In positive
+selection-driven evolution, certain mutations confer a growth advantage,
+while most mutations do not. As a result, the clones carrying these
+advantageous mutations outcompete other clones and dominate the tumor.
+Consequently, tumors are predominantly composed of a few dominant
+clones, with the remaining clones present only in very small
+proportions. In contrast, under neutral evolution, no significant number
+of mutations provides a fitness advantage, and clones accumulate solely
+due to tumor progression. As a result, all clones are present in similar
+proportions [@davis2017tumor].
+
+Based on this, all the parameters of the Dirichlet distribution for
+positive selection-driven evolution are set to 0.3. For neutral
+evolution, the parameter corresponding to the parent node ($\alpha_{p}$)
+is set to 5, and the parameters corresponding to the children node(s)
+($\alpha_{c}$) are set to 10. 
Different alpha values are used for parent
+and children nodes in neutral evolution to ensure that clones arising
+late in the evolution do not end up with proportions that are too small
+solely due to their position in the topology, which would make the tumor
+deviate from the clone proportion distribution expected under this
+evolution model. These values have been chosen empirically, and their
+effect is illustrated in Figure \@ref(fig:ternaryplots), which shows how 5,000 random samples from
+the mentioned Dirichlet distributions (for the particular case of 3
+dimensions, i.e., one parent and two children nodes) are distributed. As
+observed, in the case of positive selection
+($\alpha_{i} = \alpha_{j} = \alpha_{k} = 0.3$), the $(x_i, x_j, x_k)$
+values are equally pushed towards the three corners of the simplex. In
+other words, the samples tend to be sparse, with typically one component
+having a large value and the rest close to 0. In contrast, when neutral
+evolution is adopted ($\boldsymbol{\alpha}$ = (5, 10, 10)), the
+$(x_i, x_j, x_k)$ values concentrate close to the center of the simplex,
+but with a tendency to deviate towards those components with larger
+$\alpha$ values. This means that samples $(x_i, x_j, x_k)$ are less
+sparse in neutral evolution, with larger values for $x_j$ and $x_k$,
+which represent the children nodes.
+
+```{r ternaryplots, echo=FALSE, fig.cap="Ternary density plots of 5,000 samples drawn from two 3-dimensional Dirichlet distributions. The parameters of the Dirichlet distribution on the left are $\\alpha = (0.3, 0.3, 0.3)$ and the distribution is used to represent positive selection-driven evolution. The distribution on the right has parameters $\\alpha = (5, 10, 10)$ and is used to represent neutral evolution. 
Samples drawn from these distributions (or their generalization to higher spaces) are used to calculate clone proportions in each tree multifurcation."} +knitr::include_graphics("figs/ternary_plots_cropped.png") +``` + +Taking into account that marginalizing the Dirichlet distribution +results in a Beta distribution, the proportion of the clone +$v_i \in V_n \; | \; v_i \neq \mathcal R(T)$ in the tumor, denoted as +$C_i$, follows the distribution: + +$$C_i \sim C_{\mathcal P(v_i)} \cdot \Gamma_i \cdot \Gamma_i^\prime$$ + +where + +$$\Gamma_i \sim Beta(\alpha_{c}, \alpha_{p} + \alpha_{c} \cdot (|\mathcal K(\mathcal P(v_i))| - 1))$$ + +and + +$$\begin{aligned} +\Gamma_i^\prime = 1 \quad \text{if } |\mathcal K(v_i)| = 0 \\ +\Gamma_i^\prime \sim Beta(\alpha_{p}, \alpha_{c} \cdot |\mathcal K(v_i)|) \quad \text{if } |\mathcal K(v_i)| \neq 0 +\end{aligned}$$ + +For the case where $v_i = \mathcal R(T)$, the root node, $C_i$, follows: + +$$C_i \sim Beta(\alpha_{p}, \alpha_{c} \cdot |\mathcal K(v_i)|)$$ + +Here, $\alpha_{p}$ and $\alpha_{c}$ are the parameters of the Dirichlet +distribution assigned to parent and child nodes, respectively. + +To complete the tumor model, the tumor blend is simulated, which +represents the degree of physical mixing between the tumor clones. In +order to do this, we simplify the spatial distribution to one dimension +and model the tumor as a Gaussian mixture model with $n$ components, +where each component $G_i$ represents a tumor clone, and the mixture +weights are given by $\boldsymbol{c}$. The variance for all components +is set to 1, while the mean values are random variables. + +Specifically, we start by selecting a random clone, and its component's +mean value is set to 0. Then, the mean values of the remaining $n - 1$ +components are calculated sequentially by adding $d$ units to the mean +value of the previous component. To introduce variability in the tumor +blend, the value of $d$ is chosen from the set $\{0, 0.1, \ldots, 4\}$. 
+
+For $d$ = 0, adjacent clones are completely mixed, while for $d$ = 4,
+they are physically far apart from each other. The choice of the upper
+limit for $d$ has been determined empirically, considering that with
+this value, the overlapping area between adjacent components becomes
+negligible.
+
+To ensure the separation between the clones is random and that most of
+the time the separation is small, we sample the values of $d$ from an
+exponential-like distribution of the form $Beta(\alpha=1, \beta)$,
+scaled to the $[0, 4]$ range. Specifically, we set $\beta = 5$ to ensure
+that the clones in the mixture are not excessively separated. We can
+express this mathematically as:
+
+$$D \sim 4 \cdot Beta(\alpha=1, \beta=5)$$
+
+### Sampling simulation
+
+So far, we have described how the clones of a tumor are modelled by the
+tumor model. However, in practice, there is no easy way of observing
+these global properties of a tumor. Instead, we typically have access to
+information provided by samples or biopsies. This means that certain
+tumor characteristics, such as the real clone proportions
+$\boldsymbol{c}$, cannot be directly obtained. Instead, we can only
+determine the *sampled* clone proportions, which depend on the specific
+sampling procedure employed. Unless there is a perfectly uniform mixture
+of the clone cells, their sampled proportions will not match the global
+proportions. These sampled clone proportions are, in fact, the
+$\boldsymbol{u}_{i.}$ elements in the $\boldsymbol{U}$ matrix.
+
+The sampling model we have devised simulates the physical sampling of
+the tumor and allows us to construct the $\boldsymbol{U}$ matrix of the
+problem. This procedure operates on the data simulated using the tumor
+model. Specifically, it simulates a sampling procedure carried out in a
+grid manner over the tumor Gaussian mixture model described in the
+previous section. 
Let $G_1$ and $G_n$ be the components with the lowest
+and largest mean values, respectively, in the Gaussian mixture model.
+The 1^st^ and $m$^th^ sampling points in the grid are always set to
+$\mu_{G_1} - 2.8 \cdot \sigma_{G_1}$ and
+$\mu_{G_n} + 2.8 \cdot \sigma_{G_n}$, respectively, and the remaining
+$m-2$ sampling points are determined by dividing the range between these
+two endpoints into $m-1$ equal intervals.
+
+The densities of the Gaussian distributions at each sampling point are
+multiplied by the global proportion of the clones sampled from the
+Dirichlet distributions, so that for each sampling point $i$, the
+fraction of clone $j$, $p_{ij}$, is proportional to their product:
+
+$$p_{ij} \propto c_j \cdot \phi_{ij} (\#eq:pij-2)$$
+
+where $c_j$ is the global proportion of clone $j$ and $\phi_{ij}$ is the
+density of the Gaussian component associated with clone $j$ at sampling
+point $i$.
+
+Finally, to account for the effect of cell count in the samples, a
+multinomial distribution is used to sample a given number of cells
+$n_{c}$ for each tumor sample. In that distribution, the probability of
+selecting each clone at sampling site $i$ is given by
+$(p_{i1}, \ldots, p_{in})$. The resulting cell counts, divided by
+$n_{c}$, determine the final tumor clone composition in sample $i$,
+which is represented in row $i$ of the matrix $\boldsymbol{U}$:
+
+$$U_{i.} \sim \frac{M(n = n_{c}, p = (p_{i1}, \ldots, p_{in}))}{n_{c}}$$
+
+Note that selecting a relatively low value for $n_{c}$ in the
+multinomial distribution can lead to clones with very low frequencies
+being modeled as absent in the sample, with composition values equal to
+0. This is indeed more realistic than truly observing them with such low
+frequencies.
+
+### Sequencing noise simulation
+
+Up to this point, the $\boldsymbol{B}$ and $\boldsymbol{U}$ matrices of
+an instance have been simulated. 
If noise-free
+data are desired, the simulation is complete once Equation \@ref(eq:eqF)
+is applied to obtain the $\boldsymbol{F}$ matrix.
+
+As a brief reminder, each element $f_{ij}$ in $\boldsymbol{F}$ denotes
+the frequency or VAF of the mutation $M_j$ in sample $i$ or, in other
+words, the proportion of sequencing reads that carry the mutation $M_j$
+in that particular sample. This also means that the proportion of reads
+in that sample that do not carry the mutation but instead contain the
+reference nucleotide is $1 - f_{ij}$.
+
+However, empirical factors can artificially alter the VAF value, leading
+it to deviate from the true ratio between the variant and total allele
+molecule counts. One of these factors is the noise introduced during the
+DNA sequencing process itself, which can arise in two main ways. First,
+limitations of the sequencing instrument can lead to incorrect
+nucleotide readings of DNA fragments. For example, a position that
+actually contains nucleotide A may be read as a T. Second, there can be
+a biased number of reads produced for a particular site, which can
+result from chemical reaction peculiarities or simply because not all
+fragments are sequenced. These limitations can, however, be mitigated to
+some extent. For instance, it has been shown that a high depth of
+coverage, which refers to the average number of reads that cover each
+position, can lead to more accurate VAF values [@depth_error].
+
+In order to incorporate the effect of sequencing noise in the data
+instances, we have developed a procedure to simulate sequencing noise.
+This procedure introduces noise to the $\boldsymbol{F}$ matrix and
+generates a noisy matrix $\boldsymbol{F^{(n)}}$, where
+$\boldsymbol{F^{(n)}} \neq \boldsymbol{U} \cdot \boldsymbol{B}$. The
+procedure simulates noise at the level of the sequencing reads and
+recalculates the new $f^{(n)}_{ij}$ values, as follows. 
+
+The sequencing depth $r$ at the genomic position where $M_j$ occurs in
+sample $i$ is distributed according to a negative binomial distribution:
+
+$$r_{ij} \sim NB(\mu = \mu_{sd}, \alpha = 5) (\#eq:r)$$
+
+where $\mu_{sd}$ represents the mean sequencing depth, which is the
+average number of reads covering the genomic position of mutation $M_j$
+in the sample, and $\alpha$ is the dispersion parameter, which controls
+the variability of the sequencing depth around the mean and is fixed at
+5.
+
+The number of reads supporting the alternate allele $r^{a}_{ij}$ is then
+modeled by a binomial distribution:
+
+$$r^{a}_{ij} \sim B(n = r_{ij}, p = f_{ij}) (\#eq:ra)$$
+
+In sequencing data, errors can occur due to limitations inherent to the
+sequencing methodology. These errors vary depending on the technology
+used.
+
+To simulate the effect of these errors on the VAF values, the number of
+reads $r^{a\prime}_{ij}$ that, despite originally supporting the
+alternate allele, contain a different allele as a result of a sequencing
+error, is modeled using a binomial distribution:
+
+$$r^{a\prime}_{ij} \sim B(n = r^{a}_{ij}, p = \varepsilon),$$
+
+where $\varepsilon$ represents the sequencing error rate.
+
+We also need to consider the situation where the reads contain the
+reference nucleotide but are read with the alternate allele as a result
+of this error. This can be better understood with an example. Let's
+imagine that at a certain genomic position, the normal cells have a T,
+but in some cells, there is a mutation where the T has changed to an A.
+In this case, for the normal cells, with a rate of $\varepsilon$, a
+sequencing error may occur, resulting in a read of C, G, or A instead of
+T, each with an equal chance. 
Therefore, in approximately
+$\frac{\varepsilon}{3}$ of the cases, reads with the mutation of
+interest will arise from normal reads:
+
+$$r^{r\prime}_{ij} \sim B(n = r_{ij} - r^{a}_{ij}, p = \frac{\varepsilon}{3}) (\#eq:ramr)$$
+
+Taking all of this into account, the final noisy VAF values
+$f^{(n)}_{ij}$ are simulated as:
+
+$$f^{(n)}_{ij} = \frac{r^{a}_{ij} - r^{a\prime}_{ij} + r^{r\prime}_{ij}}{r_{ij}} (\#eq:noisyVAF)$$
+
+By default, the sequencing error rate $\varepsilon$ is set to 0.001,
+following commonly reported values for Illumina data
+[@loman2012performance].
+
+As an illustration of the effect of the noise model, in Figure
+\@ref(fig:Ferrorplot), we
+have depicted the density of the mean absolute error between the
+$\boldsymbol{F^{(n)}}$ matrix and its corresponding noise-free
+$\boldsymbol{F}$ matrix for a collection of noisy instances. As can be
+seen, as $\mu_{sd}$ increases, the error introduced to the
+$\boldsymbol{F^{(n)}}$ matrix decreases. This is expected because the
+$r^{a}_{ij}$ values follow a binomial distribution as described in
+Equation \@ref(eq:ra), where the number of trials is determined by
+$\mu_{sd}$ as shown in Equation \@ref(eq:r), and the event probability
+corresponds to the noise-free $f_{ij}$ value. Therefore, the larger the
+number of trials, the closer the noisy VAF value is to the noise-free
+VAF value.
+
+```{r Ferrorplot, echo=FALSE, fig.cap="Density of the mean absolute error in noisy $\\boldsymbol{F}$ matrices for different $\\mu_{sd}$ values that correspond to different noise levels."}
+knitr::include_graphics("figs/noisy_F_error.png")
+```
+
+As a final remark, it is important to note that although our data
+simulation procedure follows the ISA, the addition of noise may cause
+the resulting data to break this assumption. 
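+
+As an illustration, the read-level procedure of Equations \@ref(eq:r)
+to \@ref(eq:noisyVAF) can be sketched in a few lines of R. This is a
+simplified stand-alone sketch, not the implementation used by the
+package:
+
+``` r
+# Sketch of the read-level noise model; mu_sd is the mean sequencing
+# depth, eps the sequencing error rate, alpha the dispersion parameter.
+add_seq_noise_sketch <- function(F_true, mu_sd = 30, eps = 0.001, alpha = 5) {
+  noisy <- F_true
+  for (i in seq_len(nrow(F_true))) {
+    for (j in seq_len(ncol(F_true))) {
+      r   <- rnbinom(1, size = alpha, mu = mu_sd)       # sequencing depth
+      ra  <- rbinom(1, size = r, prob = F_true[i, j])   # alt-allele reads
+      rae <- rbinom(1, size = ra, prob = eps)           # alt reads misread
+      rre <- rbinom(1, size = r - ra, prob = eps / 3)   # ref reads read as alt
+      noisy[i, j] <- if (r > 0) (ra - rae + rre) / r else 0
+    }
+  }
+  noisy
+}
+```
+
+With a large `mu_sd`, the noisy values stay close to the noise-free
+ones, in line with the behaviour shown in Figure \@ref(fig:Ferrorplot).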
+
+## The package {#sec:package}
+
+**GeRnika** provides three main functionalities for studying tumor
+evolution data: (I) simulating artificial tumor evolution, (II)
+visualizing tumor phylogenies, and (III) comparing tumor phylogenies.
+This section explains the functions that support these features.
+Additionally, we describe extra data provided by **GeRnika** that users
+can use to try the methods in the package.
+
+### Simulation methods
+
+To enable users to simulate tumor evolution data, **GeRnika** provides
+various functions inspired by the methods described in Section
+\@ref(sec:simulation). **GeRnika** offers two options: a single method
+for streamlined simulations and separate methods for performing each
+step individually, allowing users to customize or replace specific parts
+of the process.
+
+#### create_instance {#create_instance .unnumbered}
+
+The main function for streamlined tumor data simulation is
+`create_instance`. This function provides a convenient way to perform
+the entire simulation process in a single step. The following command
+demonstrates how to use it to generate artificial data:
+
+::: center
+``` r
+ create_instance(n, m, k, selection, noisy = TRUE, depth = 30, seed = Sys.time())
+```
+:::
+
+where each argument of the method is described as follows:
+
+- `n`: An integer representing the number of clones.
+
+- `m`: An integer representing the number of samples.
+
+- `k`: A numeric value that determines the linearity of the tree
+  topology. Also referred to as the topology parameter. Increasing
+  values of this parameter increase the linearity of the topology.
+  When `k` is set to 1, all nodes have equal probabilities of being
+  chosen as parents, resulting in a uniformly random topology.
+
+- `selection`: A character string representing the evolutionary mode
+  the tumor follows. This should be either "positive" or "neutral". 
+ +- `noisy`: A logical value (`TRUE` by default) indicating whether to + add noise to the frequency matrix. If `TRUE`, noise is added to the + frequency matrix. If `FALSE`, no noise is added. + +- `depth`: A numeric value (30 by default) representing the mean depth + of sequencing. + +- `seed`: A numeric value (`Sys.time()` by default) used to set the + seed for the random number generator. + +The `create_instance` function returns a list containing the following +components: + +- `F_noisy`: A matrix representing the noisy frequencies of each + mutation across samples. If the `noisy` parameter is set to `FALSE`, + this matrix is equal to `F_true`. + +- `B`: A matrix representing the relationships between mutations and + clones in the tumor. + +- `U`: A matrix representing the frequencies of the clones across the + set of samples. + +- `F_true`: A matrix representing the noise-free frequencies of each + mutation across the samples. + +As explained, the `create_instance` function generates all matrices +representing frequencies, proportions, and the phylogeny of the +simulated tumor data in a single step. However, **GeRnika** also +provides individual functions for simulating each of these elements +independently, providing users with greater control over the +characteristics of the simulated tumor data. + +#### create_B, create_U, create_F and add_noise {#create_b-create_u-create_f-and-add_noise .unnumbered} + +These methods provide specialized functions to generate each matrix +involved in representing tumor evolution data. These functions include +the following: + +- `create_B`: This function generates a mutation matrix + ($\boldsymbol{B}$ matrix) for a tumor phylogenetic tree with a given + number of nodes and a value `k` determining the linearity of the + tree topology. 
+ +- `create_U`: This function calculates the $\boldsymbol{U}$ matrix, + containing the frequencies of each clone in a set of samples, based + on a $\boldsymbol{B}$ matrix, the number of samples considered, the + number of cells in each sample, and the evolutionary mode of the + tumor. + +- `create_F`: This function generates the $\boldsymbol{F}$ matrix, + which contains mutation frequency values for a series of mutations + across a collection of tumor biopsies or samples. The matrix is + computed based on a pair of matrices, $\boldsymbol{U}$ and + $\boldsymbol{B}$, and considers whether the mutations are + heterozygous. + +- `add_noise`: This function introduces sequencing noise into the + noise-free $\boldsymbol{F}$ matrix generated by the `create_F` + method. Users can specify the mean sequencing depth and the + overdispersion parameter, which are used to simulate sequencing + depth based on a negative binomial distribution. + +The reader is encouraged to refer to the package documentation for more +information about these functions and their parameters. + +### Visualization methods + +The following functions enable the visualization of tumor evolution data +by generating phylogenetic trees based on the data under analysis. + +#### Phylotree S4 class {#phylotree-s4-class .unnumbered} + +To simplify the execution of its functionalities, **GeRnika** utilizes +the `"Phylotree"` class. The `"Phylotree"` S4 class is a data structure +specifically designed to represent phylogenetic trees, facilitating the +use of the package's methods and ensuring their computational +efficiency. The attributes of the `"Phylotree"` class are as follows: + +- `B`: A data.frame containing the square matrix that represents the + ancestral relationships among the clones in the phylogenetic tree + ($\boldsymbol{B}$ matrix). + +- `clones`: A vector representing the indices of the clones in the + $\boldsymbol{B}$ matrix. 
+
+- `genes`: A vector indicating the index of the gene that first
+  mutated in each clone within the $\boldsymbol{B}$ matrix.
+
+- `parents`: A vector indicating the parent clones for each clone in
+  the phylogenetic tree.
+
+- `tree`: A `"Node"` class object representing the phylogenetic tree
+  (this class is inherited from the
+  [**data.tree**](https://CRAN.R-project.org/package=data.tree)
+  package).
+
+- `labels`: A vector containing the gene tags associated with the
+  nodes in the phylogenetic tree.
+
+A customized `"Phylotree"` class object can be instantiated with custom
+attributes using the `create_phylotree` method, which takes the
+attributes of the `"Phylotree"` class as arguments. Alternatively,
+**GeRnika** provides a function that automatically generates a
+`"Phylotree"` class object on the basis of a given $\boldsymbol{B}$
+matrix.
+
+#### B_to_phylotree {#b_to_phylotree .unnumbered}
+
+In order to instantiate an object of the `"Phylotree"` class, the
+following command can be used:
+
+``` r
+ B_to_phylotree(B, labels = NA)
+```
+
+where each argument of the method is described as follows:
+
+- `B`: A square $\boldsymbol{B}$ matrix that represents the
+  phylogenetic tree.
+
+- `labels`: An optional vector containing the tags of the genes in the
+  phylogenetic tree. `NA` by default.
+
+This function returns an object of the `"Phylotree"` class,
+automatically generating its attributes based on `B`, which represents
+the phylogenetic tree of the tumor under analysis.
+
+Once instantiated, the phylogenetic tree in a `"Phylotree"` class object
+can be visualized using the generic `plot` function, which takes the
+`"Phylotree"` object as its argument. The `plot` function also includes
+a `labels` argument that can be set to `TRUE` to display node labels on
+the phylogenetic tree, using the gene tags stored within the
+`"Phylotree"` object. 
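+
+Under the ISA, the tree structure that `B_to_phylotree` recovers is
+fully determined by $\boldsymbol{B}$: the parent of each clone is the
+clone whose mutation set is the largest proper subset of its own. The
+following minimal sketch illustrates this reconstruction (a hypothetical
+helper, not the package's internal code):
+
+``` r
+# Sketch: recover the parent of each clone from a B matrix under the ISA.
+# The parent of clone j is the clone with the largest mutation set that
+# is a proper subset of clone j's mutations; the root is encoded as -1.
+parents_from_B <- function(B) {
+  n <- nrow(B)
+  vapply(seq_len(n), function(j) {
+    anc <- which(vapply(seq_len(n),
+                        function(i) i != j && all(B[i, ] <= B[j, ]),
+                        logical(1)))
+    if (length(anc) == 0) -1L else anc[which.max(rowSums(B)[anc])]
+  }, integer(1))
+}
+```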
+
+The **GeRnika** package provides the `plot_proportions` function for
+visualizing phylogenetic trees, with node sizes and colors reflecting
+the proportions of each clone. This function requires two inputs: a
+`"Phylotree"` class object representing the phylogenetic tree and a
+numeric vector or matrix specifying clone proportions. If a vector is
+provided, a single tree is plotted, with the node sizes and colors
+determined by the values in the vector. If, instead, a matrix is
+provided, such as the $\boldsymbol{U}$ matrix that represents the
+frequencies of clones across samples, the function plots one tree for
+each row of the matrix. Each tree is generated based on the clone
+proportions specified in the corresponding row. Additionally, users can
+enable node labeling by setting the `labels` argument to `TRUE`, which
+annotates the tree nodes with gene tags from the `"Phylotree"` object.
+
+### Comparison methods
+
+This section describes the methods included in **GeRnika** that
+facilitate the comparison of tumor phylogenies.
+
+A fundamental approach for comparing two phylogenetic trees is to
+determine if their evolutionary histories are equivalent. The `equals`
+function performs this comparison by accepting two `"Phylotree"` class
+objects as arguments. This function returns a logical value indicating
+whether the provided phylogenetic trees are equivalent.
+
+To analyze similarities and differences between two phylogenetic trees,
+the `find_common_subtrees` function identifies and plots all maximal
+common subtrees between them. In addition to visualizing these subtrees,
+the function outputs the number of shared and unique edges (those
+present in only one of the trees) and calculates the distance between
+the trees, defined as the sum of their unique edges. This method also
+includes an option to label the maximal common subtrees with gene tags
+by setting `labels = TRUE`. 
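+
+The distance just described (the number of edges present in only one of
+the two trees) can be illustrated with a small sketch over parent
+vectors, where `parents[j]` gives the parent clone of clone `j` and the
+root is encoded as -1 (a hypothetical helper, assuming both trees share
+the same clone labelling):
+
+``` r
+# Sketch: distance between two clonal trees as the size of the symmetric
+# difference of their edge sets, computed from parent vectors.
+tree_distance <- function(parents1, parents2) {
+  edges <- function(p) paste(p, seq_along(p))[p != -1]  # "parent child" pairs
+  length(setdiff(edges(parents1), edges(parents2))) +
+    length(setdiff(edges(parents2), edges(parents1)))
+}
+```
+
+Two identical trees have distance 0; reattaching a single clone to a
+different parent changes one edge in each tree, giving distance 2.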
+
+The `combine_trees` function generates a consensus tree by combining the
+nodes and edges of two `"Phylotree"` class objects. The consensus tree
+highlights the nodes and edges that form common subtrees between the
+original trees, as well as the independent edges unique to each tree,
+which are displayed with reduced opacity. This method also allows
+labeling the nodes with gene tags by setting `labels = TRUE` and
+customizing the colors of the consensus tree by passing a 3-element
+hexadecimal vector to the `palette` argument.
+
+### Exported data
+
+**GeRnika** provides various exported data instances to help users
+easily explore the package's functionalities. These are as follows:
+
+- `B_mats`: A list of 10 trios of $\boldsymbol{B}$ matrices. Each trio
+  includes a real $\boldsymbol{B}$ matrix and two $\boldsymbol{B}$
+  matrices generated using different algorithms that infer
+  evolutionary relationships from a given $\boldsymbol{F}$ matrix.
+  These matrices serve as illustrative examples for testing and
+  exploring the package's functionalities.
+
+- `palettes`: A data frame containing three predefined palettes for
+  use with methods in **GeRnika** that require color palettes.
+
+## Examples
+
+In this section we show examples of the use of the methods explained in
+Section \@ref(sec:package). Please note that in the following examples,
+we will set the seeds of non-deterministic methods to a predefined value
+to ensure reproducibility.
+
+### Simulating tumor evolution data
+
+The simulation of tumor clonal data involves generating the matrices
+$\boldsymbol{B}$, $\boldsymbol{U}$, $\boldsymbol{F}$, and
+$\boldsymbol{F^{(n)}}$ associated with a specific instance. 
For example,
+we can simulate, in a single line of code, a noisy instance of a tumor
+composed of 5 clones/mutations, which has evolved under neutral
+evolution with a $k$ value of 0.5, and from which 3 samples have been
+taken, as follows:
+
+``` r
+
+> I <- create_instance(n = 5, m = 3, k = 0.5, selection = "neutral", seed = 1)
+> I
+
+$F_noisy
+         mut1       mut2      mut3      mut4      mut5
+sample1 1.000 0.09090909 0.0000000 0.2777778 0.3548387
+sample2 1.000 0.20000000 0.2631579 0.8536585 0.2000000
+sample3 0.975 0.03846154 1.0000000 1.0000000 0.0000000
+
+$B
+       mut1 mut2 mut3 mut4 mut5
+clone1    1    0    0    0    0
+clone2    1    1    0    1    0
+clone3    1    0    1    1    0
+clone4    1    0    0    1    0
+clone5    1    0    0    1    1
+
+$U
+        clone1 clone2 clone3 clone4 clone5
+sample1   0.59   0.13   0.00   0.01   0.27
+sample2   0.13   0.27   0.24   0.19   0.17
+sample3   0.00   0.04   0.89   0.07   0.00
+
+$F_true
+        mut1 mut2 mut3 mut4 mut5
+sample1    1 0.13 0.00 0.41 0.27
+sample2    1 0.27 0.24 0.87 0.17
+sample3    1 0.04 0.89 1.00 0.00
+```
+
+Using this approach, the previously mentioned four matrices are
+simulated. Note that the noise-free $\boldsymbol{F}$ matrix is referred
+to as `F_true` in the package's code, while the noisy
+$\boldsymbol{F^{(n)}}$ is denoted as `F_noisy`.
+
+The previous method allows users to generate instances easily and
+quickly. However, some users may require more precise control over the
+data, which can be achieved using the `create_B`, `create_U`,
+`create_F`, and `add_noise` methods. For examples on how to use these
+methods, please refer to the package documentation. 
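+
+As a quick sanity check on any simulated instance, the noise-free matrix
+can be verified against Equation \@ref(eq:eqF). The snippet below
+re-creates the `U` and `B` values printed above by hand, so that it runs
+on its own:
+
+``` r
+# Verify Equation F = U * B with the U and B matrices printed above
+# (values copied by hand so that the snippet is self-contained).
+B <- matrix(c(1, 0, 0, 0, 0,
+              1, 1, 0, 1, 0,
+              1, 0, 1, 1, 0,
+              1, 0, 0, 1, 0,
+              1, 0, 0, 1, 1), nrow = 5, byrow = TRUE)
+U <- matrix(c(0.59, 0.13, 0.00, 0.01, 0.27,
+              0.13, 0.27, 0.24, 0.19, 0.17,
+              0.00, 0.04, 0.89, 0.07, 0.00), nrow = 3, byrow = TRUE)
+F_true <- U %*% B   # reproduces the F_true matrix shown above
+```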
+ +### Visualizing tumor phylogenies + +Once the matrices associated with our tumor instance have been +generated, we can create a `"Phylotree"` class object, as follows: + +``` r +> phylotree <- B_to_phylotree(B = I$B) +> phylotree + +An object of class "Phylotree" +Slot "B": + mut1 mut2 mut3 mut4 mut5 +clone1 1 0 0 0 0 +clone2 1 1 0 1 0 +clone3 1 0 1 1 0 +clone4 1 0 0 1 0 +clone5 1 0 0 1 1 + +Slot "clones": +mut1 mut2 mut3 mut4 mut5 + 1 2 3 4 5 + +Slot "genes": +mut1 mut2 mut3 mut4 mut5 + 1 2 3 4 5 + +Slot "parents": +[1] -1 4 4 1 4 + +Slot "tree": + levelName +1 1 +2 °--4 +3 ¦--2 +4 ¦--3 +5 °--5 + +Slot "labels": +[1] "mut1" "mut4" "mut2" "mut3" "mut5" +``` + +Since no list of tags is provided to the `labels` parameter, a default +set of labels is automatically assigned to the instantiated +`"Phylotree"` class object. + +Afterwards, we can visualize the tumor phylogeny associated to the +simulated $\boldsymbol{B}$ matrix by using the generic `plot` method, as +follows: + +``` r +> plot(phylotree) +``` + +```{r phylotree, fig.cap="Phylogenetic tree associated to the generated 'Phylotree' class object.", out.width="50%", echo=FALSE} +knitr::include_graphics("figs/phylotree.png") +``` + +The resulting plot is shown in Figure +\@ref(fig:phylotree). +Instead of clone numbers, the user can utilize the predefined tags in +the `"Phylotree"` class object to label the nodes in the tree by setting +the `labels = TRUE` parameter in the `plot` function. + +When plotting a `"Phylotree"` class object, its nodes can be resized +according to the proportions of the clones that compose the tumor +samples. 
To achieve this, we can use the $\boldsymbol{U}$ matrix from
+the previously generated instance to determine the proportions of the
+clones, as shown below:
+
+``` r
+> plot_proportions(phylotree, I$U, labels = TRUE)
+```
+
+```{r proportions, fig.cap="Phylogenetic trees associated with the generated 'Phylotree' class object, using the proportions given by the previously generated $\\boldsymbol{U}$ matrix.", out.width="100%", echo=FALSE}
+knitr::include_graphics("figs/proportions.png")
+```
+
+The resulting plot is shown in Figure
+\@ref(fig:proportions).
+This method plots the proportions of the $\boldsymbol{U}$ matrix, where
+each tree represents a sample from the $\boldsymbol{U}$ matrix,
+illustrating the proportion of each clone within that specific sample.
+In this case, we have set the `labels` parameter to `TRUE` to label the
+nodes in the tree using the predefined tags "mut1", "mut4", "mut2",
+"mut3", and "mut5".
+
+### Comparing tumor phylogenies
+
+Now, we present examples showing the use of **GeRnika**'s
+functionalities for comparing tumor phylogenies. For this purpose, we
+will use the `B_mats` object included in the **GeRnika** package, which
+contains 10 $\boldsymbol{B}$ matrix trios. 
Specifically, we will use the +first trio of matrices, and set a predefined set of tags for the clones +in the trees: + +``` r +> B_mats <- GeRnika::B_mats + +> B_real <- B_mats[[1]]$B_real +> B_alg1 <- B_mats[[1]]$B_alg1 +> B_alg2 <- B_mats[[1]]$B_alg2 + +> tags <- c("TP53", "KRAS", "PIK3CA", "APC", "EGFR", "BRCA1", "PTEN", "BRAF", + "MYC", "CDKN2A") + +> phylotree_real <- B_to_phylotree(B = B_real, labels = tags) +> phylotree_alg2 <- B_to_phylotree(B = B_alg2, labels = tags) +> phylotree_alg1 <- B_to_phylotree(B = B_alg1, labels = tags) + +> plot(phylotree_real, labels=TRUE) +> plot(phylotree_alg1, labels=TRUE) +> plot(phylotree_alg2, labels=TRUE) +``` + +The plots of the three instantiated `Phylotree` class objects are +depicted in Figure \@ref(fig:phylocomparison). + + +```{r phylocomparison, fig.cap="`phylotree_real`, `phylotree_alg1` and `phylotree_alg2`, from the left to the right.", out.width="100%", echo=FALSE} +knitr::include_graphics("figs/comparisonphylotrees.png") +``` + + +We can check if the phylogenies of two tumors are equivalent using the +`equals` method: + +``` r +> equals(phylotree_1 = phylotree_real, phylotree_2 = phylotree_alg1) +[1] FALSE + +> equals(phylotree_1 = phylotree_real, phylotree_2 = phylotree_real) +[1] TRUE +``` + +In this case, `phylotree_real` and `phylotree_alg1` are not identical, +as some edges present in `phylotree_real` are absent in +`phylotree_alg1`, and vice versa. However, a phylogenetic tree will +always be identical to itself, as shown when comparing `phylotree_real` +to itself. 
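+
+Conceptually, two ISA-compliant phylogenies over the same set of
+labelled mutations are equivalent exactly when their $\boldsymbol{B}$
+matrices contain the same set of rows (clone genotypes), regardless of
+clone ordering. The following sketch captures this idea (it is not the
+implementation of `equals`; both matrices are assumed to use the same
+mutation column order):
+
+``` r
+# Sketch: two B matrices describe the same phylogeny when they contain
+# the same set of clone genotypes (rows), whatever the clone ordering.
+same_phylogeny <- function(B1, B2) {
+  key <- function(B) sort(apply(B, 1, paste, collapse = ""))
+  identical(key(B1), key(B2))
+}
+```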
The maximal common subtrees between two phylogenetic trees can be found using the following command:

``` r
> find_common_subtrees(phylotree_1 = phylotree_real, phylotree_2 = phylotree_alg2,
                       labels = TRUE)

Independent edges of tree1: 2

Independent edges of tree2: 2

Common edges: 7

Distance: 4
```

```{r commonsubtrees, fig.cap="Maximal common subtrees between `phylotree_real` and `phylotree_alg2` using predefined tags. In this case, there exists a single common subtree, but there may exist more in other cases.", out.width="100%", echo=FALSE}
knitr::include_graphics("figs/commonsubtrees.png")
```

```{r consensus, fig.cap="Consensus tree between `phylotree_real` and `phylotree_alg1` using the *Lancet* palette and predefined tags.", out.width="40%", echo=FALSE}
knitr::include_graphics("figs/consensus.png")
```

The maximal common subtrees (in this case, one subtree) between `phylotree_real` and\
`phylotree_alg2` are shown in Figure \@ref(fig:commonsubtrees). Note that the clones in the maximal common subtree are represented by the predefined tags in the `Phylotree` class objects, as we have set `labels = TRUE`. Additionally, this method prints the number of common and independent edges of the trees, along with the distance between them.

Finally, we generate the consensus tree between two phylogenetic trees using one of the custom palettes offered by **GeRnika**, specifically the *Lancet* palette, and the predefined tags for the clones as follows:

``` r
> palette <- GeRnika::palettes$Lancet

> consensus_real_alg1 <- combine_trees(phylotree_1 = phylotree_real,
                                       phylotree_2 = phylotree_alg1,
                                       labels = TRUE,
                                       palette = palette)

> DiagrammeR::render_graph(consensus_real_alg1)
```

The consensus tree between `phylotree_real` and `phylotree_alg1` is depicted in Figure \@ref(fig:consensus). Here, the nodes and the edges that compose the common subtrees between the original trees are green.
In addition, pink edges denote the independent edges of the tree passed as the first parameter of the method, while blue edges represent the independent edges of the second tree. Note that the independent edges of both trees are presented with translucent colors.

## Conclusions

**GeRnika** is a comprehensive R package designed to address a critical gap in the tools available for studying tumor evolution within the R environment. To this end, it provides researchers with an integrated suite for simulating, visualizing, and comparing tumor phylogenies. Unlike many existing tools, **GeRnika** is fully implemented in R, making it particularly accessible for the bioinformatics community, which widely relies on R for data analysis and visualization.

One of **GeRnika**'s key contributions is providing tools to generate biologically plausible datasets for studying intratumoral heterogeneity and clonal dynamics. With varied, easily customizable parameters---controlling features such as clonal tree topology, selective pressures, and sequencing noise---**GeRnika** enables exploration of a wide range of evolutionary patterns and complexities that provide a valuable resource for testing new methods and hypotheses in tumor heterogeneity research.

Beyond its core simulation features, **GeRnika** includes tools for visualizing and comparing tumor phylogenies, offering a unified solution that eliminates the need for multiple packages or complex data processing workflows. Future work will focus on enhancing **GeRnika**'s compatibility with other R packages related to tumor evolution, allowing for easier integration with existing resources.

Overall, **GeRnika** provides an accessible, user-friendly tool that supports research into intratumoral heterogeneity, with the potential to substantially advance tumor phylogeny research.
While it does not perform clonal deconvolution or phylogeny inference from real sequencing data, **GeRnika** serves as a valuable platform for simulation, visualization, and benchmarking, supporting the development and evaluation of such algorithms.

## R software

The R package [**GeRnika**](https://CRAN.R-project.org/package=GeRnika) is now available on CRAN.
::::

    GeRnika: An R Package for the Simulation, Visualization and Comparison of Tumor Phylogenies


    The development of methods to study intratumoral heterogeneity and tumor phylogenies is a highly active area of research. However, the advancement of these approaches often necessitates access to substantial amounts of data, which can be challenging and expensive to acquire. Moreover, the assessment of results requires tools for visualizing and comparing tumor phylogenies. In this paper, we introduce GeRnika, an R package designed to address these needs by enabling the simulation, visualization, and comparison of tumor evolution data. In summary, GeRnika provides researchers with a user-friendly tool that facilitates the evaluation of approaches aimed at studying tumor composition and evolutionary history.


    1 Introduction


    The development of tumors is a complex and dynamic process characterized by a succession of events where DNA mutations accumulate over time. These mutations give rise to genetic diversity within the tumor, leading to the emergence of distinct clonal subpopulations or, simply, clones (Nowell 1976). Each of these subpopulations exhibits unique mutational profiles, resulting in varied phenotypic and behavioral characteristics among the cancer cells (Marass et al. 2016). This phenomenon, known as intratumoral heterogeneity (ITH), significantly hinders the design of effective medical therapies, since different clones within the same tumor may respond differently to treatments, ultimately leading to therapy resistance and disease recurrence (Burrell et al. 2013). To address this challenge, several innovative approaches are being developed to study the tumor composition in greater detail and to reconstruct its evolutionary history.


    One common approach to studying tumor composition and phylogeny involves using bulk DNA sequencing data from multiple tumor biopsies. This data is relatively straightforward to obtain and provides a broad overview of the genetic alterations within the tumor. However, using bulk sequencing data in the study of ITH faces the challenge that each sample potentially contains a mixture of different clonal populations rather than just one clone. Consequently, the observed mutation frequencies—measured as variant allele frequencies (VAFs)—do not directly estimate the fraction of individual clones. Instead, the VAF values represent a composite signal: the sum of the fractions of all clones that harbor each mutation in a given sample.


    This complexity implies that reconstructing the tumor’s evolutionary history requires deconvolving these clonal admixtures within the samples. This task is precisely the focus of the Clonal Deconvolution and Evolution Problem (CDEP) (Strino et al. 2013; El-Kebir et al. 2015), which can be summarized as determining the tumor’s clonal structure—that is, identifying the number, proportion, and mutational composition of clones in each sample—as well as reconstructing the clonal phylogenetic tree that leads to the observed clonal mosaic. In this context, one of the most prominent approaches to addressing the CDEP is the Variant Allele Frequency Factorization Problem (VAFFP) (El-Kebir et al. 2015).


    Given \(s\) tumor samples and \(n\) mutations identified across these samples, we define a matrix \(\boldsymbol{F} \in [0, 1]^{s \times n}\), where each element \(f_{ij}\) represents the VAF or, equivalently, the fraction of cells that carry mutation \(j\) in sample \(i\). The VAFFP seeks to decompose this input matrix \(\boldsymbol{F}\) into two matrices: a matrix \(\boldsymbol{B} \in \{0, 1\}^{n \times n}\) that represents the clonal phylogeny, and a matrix \(\boldsymbol{U} \in [0, 1]^{s \times n}\) that captures the clone proportions in each tumor sample:


    \[\boldsymbol{F} = \boldsymbol{U} \cdot \boldsymbol{B} \label{eqF} \tag{1}\]


    The \(\boldsymbol{B}\) matrix is a binary square matrix of size \(n\), where \(b_{ij} = 1\) iff clone \(i\) contains the mutation \(j\) (Gusfield 1991). The matrix \(\boldsymbol{U}\) is an \(s \times n\) matrix where \(u_{ij}\) is the fraction of clone \(j\) in sample \(i\).
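To make the factorization concrete, here is a minimal sketch (in Python, using a hypothetical 3-clone linear phylogeny and two samples; these numbers are illustrative and not taken from the package):

```python
# Sketch of the VAFFP factorization F = U * B for a hypothetical linear
# phylogeny: clone 1 -> clone 2 -> clone 3 (descendants inherit mutations).

# B[i][j] = 1 iff clone i carries mutation j.
B = [
    [1, 0, 0],  # clone 1: mutation 1
    [1, 1, 0],  # clone 2: mutations 1, 2
    [1, 1, 1],  # clone 3: mutations 1, 2, 3
]

# U[i][j] = fraction of clone j in sample i (rows sum to 1).
U = [
    [0.5, 0.3, 0.2],
    [0.1, 0.1, 0.8],
]

def vaf_matrix(U, B):
    """Compute F = U * B: f_ij sums the fractions of clones carrying mutation j."""
    return [[sum(u[k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for u in U]

F = vaf_matrix(U, B)
```

Since the root mutation is carried by every clone, its VAF is 1 in both samples, while mutation 3 only reflects the fraction of clone 3.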


    The VAFFP operates under two key assumptions: tumors have a monoclonal origin, meaning they arise from a single abnormal cell, and the infinite sites assumption (ISA), which states that mutations occur at most once and cannot disappear over time (Kimura 1969). Under these assumptions, the tumor’s clonal structure can be modeled as a perfect phylogeny (Gusfield 1991). This model imposes two key constraints: (1) if two clones share a mutation, they must either be identical or ancestrally related, and (2) once a clone acquires a mutation, that mutation is inherited by all its descendants.


    Over the years, numerous methods have been developed to solve the CDEP (Sengupta et al. 2014; Deshwar et al. 2015; Malikic et al. 2015; Popic et al. 2015; Yuan et al. 2015; El-Kebir et al. 2015, 2016; Jiang et al. 2016; Husić et al. 2019; Fu et al. 2022; Grigoriadis et al. 2024), with recent advancements addressing reformulations of the problem that incorporate single-cell sequencing-derived information or variants in metastases, and account for temporal resolution (Ross and Markowetz 2016; El-Kebir et al. 2018; Malikic et al. 2019; Satas et al. 2020; Sollier et al. 2023). These methods primarily focus on reconstructing tumor phylogenies and clonal compositions using both real and simulated data. However, the simulation tools they employ often have limitations. For instance, while MiPUP uses simulated data, it allows only basic parameter adjustments—such as the number of mutations, samples, and reads—and lacks the flexibility to fine-tune biological parameters. BitPhylogeny creates highly realistic and complex simulated datasets representing different modes of evolution, but these simulations are manually crafted and limited in number, posing scalability issues.


    Several tools specifically designed for simulating data for the CDEP have also been introduced. Pearsim, written in Python, allows control over parameters like read depth, number of subclones, samples, and mutations (Kulman et al. 2022). OncoLib, a C++ library, facilitates the simulation of tumor heterogeneity and the reconstruction of NGS sequencing data of metastatic tumors, offering control over parameters such as driver mutation probability, per-base sequencing error rate, migration rate, and mutation rate (Qi et al. 2019). Machina, also in C++, provides a framework for simulating metastatic tumors and visualizing their phylogenetic trees and migration graphs (El-Kebir et al. 2018). HeteroGenesis, implemented in Python, simulates heterogeneous tumors at the level of clone genomes, but it is not specifically tailored for CDEP instances and requires additional processing to generate suitable datasets (Tanner et al. 2019).


    Visualization of tumor phylogenies is another crucial aspect of studying tumor evolution. Tools like CALDER, Clonevol, SPRUCE, fishplot, ClonArch, and clevRvis offer solutions for visualizing clonal structures and evolutionary trajectories (El-Kebir et al. 2016; Miller et al. 2016; Dang et al. 2017; Myers et al. 2019; Wu and El-Kebir 2020; Sandmann et al. 2023). Among these, fishplot and clevRvis are available as R packages. Additionally, methods for generating consensus trees, such as TuELIP and ConTreeDP, have been developed to summarize multiple phylogenetic trees into a single representative tree (Fu and Schwartz 2021; Guang et al. 2023).


    Despite the availability of existing tools, there remains a lack of options in the R programming environment for realistically simulating tumor evolution in a way that is both flexible and user-friendly, while also enabling effective visualization and comparison.


    In this paper, we introduce GeRnika, an R package that provides a comprehensive solution for simulating, visualizing, and comparing tumor evolution data. Although GeRnika’s data simulation functionality was primarily devised to create instances for solving the CDEP, the simulated data are not restricted to this purpose and can also be used for exploring evolutionary dynamics in broader contexts. To accommodate diverse research needs, we have implemented the procedures to be highly customizable, allowing users to adjust a wide range of parameters such as the number of clones, selective pressures, mutation rates, and sequencing noise levels. Unlike existing tools that may offer limited customization or are implemented in other programming languages, GeRnika is fully integrated into the R environment, making it easy to use alongside other bioinformatics packages. By combining simulation capabilities with visualization and comparison tools in a user-friendly interface, GeRnika offers an accessible and flexible option within the R ecosystem for researchers studying tumor evolution. It is important to note that GeRnika does not implement algorithms for inferring clonal composition or reconstructing phylogenies from experimental datasets. Instead, it provides a controlled framework for generating, visualizing, and comparing data that can be used to benchmark such methods.


    2 Simulation of tumor evolution


    The main contribution of this work is the introduction of a novel approach for simulating biologically plausible instances of the VAFFP that accounts for several key factors, including the number of clones, selective pressures, and sequencing noise. In this section, we provide a detailed description of the approach.


    Broadly speaking, each problem instance consists of a matrix \(\boldsymbol{F}\) containing the VAF values of a set of mutations in a set of samples, as described previously. This matrix is built from a pair of matrices \(\boldsymbol{B}\) and \(\boldsymbol{U}\) that represent a tumor phylogeny fulfilling the ISA and the proportions of the clones in the samples, respectively, following Equation (1).


    In order to simulate the \(\boldsymbol{B}\) and \(\boldsymbol{U}\) matrices, we have devised two models: a tumor model that simulates the evolutionary history and current state of the tumor, and a sampling model that represents the tumor sampling process. A third model, namely the sequencing noise model, has been devised to optionally introduce sequencing noise to the VAF values in the \(\boldsymbol{F}\) matrix, if noisy data is desired. The following subsections describe these models in detail.


    Tumor model


    The tumor model generates a clonal tree \(T\) and an associated matrix \(\boldsymbol{B}\), together with the clone proportions \(\boldsymbol{c}\) and tumor blend at the moment of sampling. Briefly, for a tumor with a set of \(n\) mutations denoted by \(M\), \(T\) is a rooted tree on an \(n\)-sized vertex set \(V_n = \{v_{1}, \dots, v_{n} \}\), where \(v_{i}\) represents clone \(i\) and simultaneously corresponds to the first clone containing mutation \(M_i\). This one-to-one correspondence between clones and mutations allows us to refer to them interchangeably. The tree is further defined by an \((n-1)\)-sized edge set \(E_T\), where each edge \(e_{ij} \in E_T\) represents a direct ancestral relationship from vertex \(v_{i}\) to vertex \(v_{j}\).


    In our tumor model, \(T\) is iteratively generated with a random topology, as follows. First, the root node of \(T\), \(\mathcal R(T)\), is set, and a random mutation \(M_i \in M\) is assigned to it. For each of the remaining \(M_j \in M - \{M_i\}\) mutations, a new node \(v_j\) is created and the mutation \(M_j\) is assigned to this node. The node \(v_j\) is then attached as a child to one of the nodes already included in \(T\). To adhere to the ISA model, each newly added node inherits all the mutations present in its parent node.


    The attachment of nodes to the tree is not uniformly random. Instead, the nodes in the growing tree \(T\) have different probabilities of being selected as parents for the new nodes, depending on the number of ascendants, \(\mathcal A(v_i)\), they have. Specifically, \(\forall v_j \neq \mathcal R(T)\), the parent node of \(v_j\) is sampled from a multinomial distribution where the probabilities are calculated as:


    \[\boldsymbol{p}(v_i; k) = \frac{k^{\frac{|\mathcal{A}(v_i)| + 1}{\delta}}}{\sum_{v_l \in V'} k^{\frac{|\mathcal{A}(v_l)| + 1}{\delta}}}; \quad v_i \in V'\]


    Here, \(\delta\) represents the depth of the growing tree, i.e., the number of levels or layers in the tree structure. \(k \in (0, +\infty)\) is the topology parameter that determines whether the topology tends to be branched, with a decreasing probability for increasing numbers of ascendants (\(k < 1\)), or linear, with an increasing probability for increasing numbers of ascendants (\(k > 1\)).
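The parent-selection rule can be sketched as follows (an illustrative Python translation of the formula above, not GeRnika's R implementation; the helper name `parent_probabilities` is ours):

```python
import random

def parent_probabilities(n_ascendants, k, depth):
    """Probability of each existing node being chosen as the parent of a new
    node: p(v_i; k) = k^((|A(v_i)| + 1) / depth), normalized over all nodes."""
    weights = [k ** ((a + 1) / depth) for a in n_ascendants]
    total = sum(weights)
    return [w / total for w in weights]

# A growing tree whose nodes have 0, 1 and 2 ascendants (depth = 3).
probs_branched = parent_probabilities([0, 1, 2], k=0.5, depth=3)  # k < 1
probs_linear = parent_probabilities([0, 1, 2], k=2.0, depth=3)    # k > 1

# The parent of the next node is then drawn from a multinomial distribution.
parent = random.choices(range(3), weights=probs_branched)[0]
```

With `k = 0.5` the shallowest node is the most likely parent (branched topologies); with `k = 2.0` the deepest node is favored (linear topologies).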


    Once \(T\) has been generated, it is represented in the form of a \(\boldsymbol{B}\) matrix, constructed by initializing an identity matrix \(B_{n}\) and setting \(b_{ji}\) to 1 for each pair of nodes \(v_i\) and \(v_j\) where node \(v_j\) is a descendant of node \(v_i\) in \(T\).
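This construction can be sketched in a few lines (Python; the parent-list tree representation is an assumption made for illustration):

```python
def build_B(parent):
    """Build the clonal matrix B from a parent list, where parent[j] is the
    index of v_j's parent and -1 marks the root.  Start from the identity
    matrix and set b_ji = 1 whenever v_i is an ancestor of v_j, so each
    clone carries its own mutation plus all ancestral ones (ISA)."""
    n = len(parent)
    B = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    for j in range(n):
        i = parent[j]
        while i != -1:          # walk up to the root, marking every ancestor
            B[j][i] = 1
            i = parent[i]
    return B

# Tree: 0 is the root, 1 and 2 are children of 0, 3 is a child of 2.
B = build_B([-1, 0, 0, 2])
```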


    After obtaining \(\boldsymbol{B}\), the proportions of the clones in the whole tumor, denoted as \(\boldsymbol{c} = \{c_{1}, \dots, c_{n} \}\), are simulated. It is important to note that these proportions are not the same as those appearing in the \(\boldsymbol{U}\) matrix, which represent the sampled clone proportions and depend not only on the global clone proportions but also on the spatial distribution of the clones and the sampling sites.


    These clone proportions \(\boldsymbol{c}\) are calculated by sequentially sampling a Dirichlet distribution at each multifurcation in \(T\), starting from the root. For instance, for a node \(v_i\) with children \(\mathcal K(v_i)\) = {\(v_j\), \(v_k\)}, we draw a sample \((x_i, x_j, x_k)\) that represents the proportions of the parent clone and its two children, respectively, from a Dirichlet distribution \(Dir(\alpha_i, \alpha_j, \alpha_k)\). When this sampling is performed at a node \(v_i \neq \mathcal R(T)\), these proportions are scaled relative to the original proportion of the parent clone. This ensures that the sum rule is met, and that once all multifurcations have been visited, the proportions of all clones in \(T\) sum up to one. While several approaches can exist to determine these proportions, this method provides a natural approximation to the problem that can be interpreted as the distribution of the mass or proportion of each clone between itself and its descendants.


    The parameters of the Dirichlet distribution depend on the tumor’s evolution model. In this work, we consider two fundamental cases: positive selection-driven evolution and neutral evolution. In positive selection-driven evolution, certain mutations confer a growth advantage, while most mutations do not. As a result, the clones carrying these advantageous mutations outcompete other clones and dominate the tumor. Consequently, tumors are predominantly composed of a few dominant clones, with the remaining clones present only in very small proportions. Under neutral evolution, instead, no significant number of mutations provides a fitness advantage, and clones accumulate solely due to tumor progression. As a result, all clones are present in similar proportions (Davis et al. 2017).


    Based on this, all the parameters for the Dirichlet distribution for positive selection-driven evolution are set to 0.3. For neutral evolution, the parameter corresponding to the parent node (\(\alpha_{p}\)) is set to 5, and the parameters corresponding to the children node(s) (\(\alpha_{c}\)) are set to 10. Different alpha values are used for parent and children nodes in neutral evolution to ensure that clones arising late in the evolution do not end up with proportions that are too small solely due to their position in the topology, preventing the deviation from the expected clone proportion distribution for this type of evolution model. These values have been chosen empirically, and their effect is illustrated in Figure 1, which shows how 5,000 random samples from the mentioned Dirichlet distributions (for the particular case of 3 dimensions, i.e., one parent and two children nodes) are distributed. As observed, in the case of positive selection (\(\alpha_{i} = \alpha_{j} = \alpha_{k} = 0.3\)), the \((x_i, x_j, x_k)\) values are equally pushed towards the three corners of the simplex. In other words, the samples tend to be sparse, with typically one component having a large value and the rest close to 0. Instead, when neutral evolution is adopted (\(\boldsymbol{\alpha}\) = (5, 10, 10)), the \((x_i, x_j, x_k)\) values concentrate close to the center of the simplex, but with a tendency to deviate towards those components with larger \(\alpha\) value. This means that samples \((x_i, x_j, x_k)\) are less sparse in neutral evolution, with larger values for \(x_j\) and \(x_k\) in this case, which represent the children nodes.


    Figure 1: Ternary density plots of 5,000 samples drawn from two 3-dimensional Dirichlet distributions. The parameters of the Dirichlet distribution on the left are \(\alpha = (0.3, 0.3, 0.3)\) and the distribution is used to represent positive selection-driven evolution. The distribution on the right has parameters \(\alpha = (5, 10, 10)\) and is used to represent neutral evolution. Samples drawn from these distributions (or their generalization to higher spaces) are used to calculate clone proportions in each tree multifurcation.
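A minimal sketch of this recursive mass-splitting scheme (Python, standard library only; the Dirichlet draw is implemented via normalized Gamma samples, and the alpha values follow those quoted above — this is an illustration, not GeRnika's implementation):

```python
import random

def dirichlet(alphas, rng):
    """Draw from a Dirichlet distribution via normalized Gamma samples."""
    gs = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(gs)
    return [g / total for g in gs]

def clone_proportions(children, node, rng, mass=1.0, neutral=True, props=None):
    """Split each node's mass between itself and its children at every
    multifurcation, scaling by the mass inherited from the parent.
    alpha = (5, 10, ..., 10) mimics the neutral-evolution setting;
    alpha = (0.3, ..., 0.3) mimics positive selection-driven evolution."""
    if props is None:
        props = {}
    kids = children.get(node, [])
    if not kids:                      # leaf: keeps all of its mass
        props[node] = mass
        return props
    alphas = ([5.0] + [10.0] * len(kids)) if neutral else [0.3] * (len(kids) + 1)
    shares = dirichlet(alphas, rng)   # parent share first, then one per child
    props[node] = mass * shares[0]
    for kid, share in zip(kids, shares[1:]):
        clone_proportions(children, kid, rng, mass * share, neutral, props)
    return props

rng = random.Random(1)
# Tree: root 0 with children 1 and 2; node 2 has child 3.
c = clone_proportions({0: [1, 2], 2: [3]}, node=0, rng=rng)
```

Because each multifurcation only redistributes the mass of its parent, the resulting proportions always sum to one, which is exactly the sum rule mentioned above.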


    Taking into account that marginalizing the Dirichlet distribution results in a Beta distribution, the proportion of the clone \(v_i \in V_n \; | \; v_i \neq \mathcal R(T)\) in the tumor, denoted as \(C_i\), follows the distribution:


    \[C_i \sim C_{\mathcal P(v_i)} \cdot \Gamma_i \cdot \Gamma_i^\prime\]


    where


    \[\Gamma_i \sim Beta(\alpha_{c}, \alpha_{p} + \alpha_{c} \cdot (|\mathcal K(\mathcal P(v_i))| - 1))\]


    and


    \[\begin{aligned}
    \Gamma_i^\prime = 1 \quad \text{if } |\mathcal K(v_i)| = 0 \\
    \Gamma_i^\prime \sim Beta(\alpha_{p}, \alpha_{c} \cdot |\mathcal K(v_i)|) \quad \text{if } |\mathcal K(v_i)| \neq 0
    \end{aligned}\]


    For the case where \(v_i = \mathcal R(T)\), the root node, \(C_i\), follows:


    \[C_i \sim Beta(\alpha_{p}, \alpha_{c} \cdot |\mathcal K(v_i)|)\]


    Here, \(\alpha_{p}\) and \(\alpha_{c}\) are the parameters of the Dirichlet distribution assigned to parent and child nodes, respectively.


    To complete the tumor model, the tumor blend is simulated, which represents the degree of physical mixing between the tumor clones. In order to do this, we simplify the spatial distribution to one dimension and model the tumor as a Gaussian mixture model with \(n\) components, where each component \(G_i\) represents a tumor clone, and the mixture weights are given by \(\boldsymbol{c}\). The variance for all components is set to 1, while the mean values are random variables.


    Specifically, we start by selecting a random clone, and its component’s mean value is set to 0. Then, the mean values of the remaining \(n - 1\) components are calculated sequentially by adding \(d\) units to the mean value of the previous component. To introduce variability in the tumor blend, the value of \(d\) is chosen from the set \(\{0, 0.1, \ldots, 4\}\). For \(d\) = 0, the two clones are completely mixed, while for \(d\) = 4, they are physically far apart from each other. The choice of the upper limit for \(d\) has been determined empirically, considering that with this value, the overlapping area between the two clones becomes negligible.


    To ensure the separation between the clones is random and that most of the time the separation is small, we use an exponential-like distribution of the form \(Beta(\alpha=1, \beta)\) to sample the values of \(d\). Specifically, we set \(\beta = 5\) to ensure that the samples obtained from the mixture are not excessively sparse. We can express this mathematically as:


    \[D \sim 4 \cdot Beta(\alpha=1, \beta=5)\]
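The placement of the Gaussian components can be sketched as follows (Python; the function name `blend_means` is ours, and fixing the first mean at 0 follows the description above):

```python
import random

def blend_means(n_clones, seed=None):
    """Place each clone's Gaussian component on a 1-D axis: the first mean is
    0 and each subsequent mean is d units after the previous one, with
    d ~ 4 * Beta(1, 5), so small separations (well-mixed clones) dominate."""
    rng = random.Random(seed)
    means = [0.0]
    for _ in range(n_clones - 1):
        d = 4 * rng.betavariate(1, 5)   # d always falls in [0, 4]
        means.append(means[-1] + d)
    return means

means = blend_means(4, seed=7)
```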


    Sampling simulation


    So far, we have described how the clones of a tumor are modelled by the tumor model. However, in real practice, there is no easy way of observing these global properties of a tumor. Instead, we typically have access to information provided by samples or biopsies. This means that certain tumor characteristics, such as the real clone proportions \(\boldsymbol{c}\), cannot be directly obtained. Instead, we can only determine the sampled clone proportions, which depend on the specific sampling procedure employed. Unless there is a perfectly uniform mixture of the clone cells, their sampled proportions will not match the global proportions. These sampled clone proportions are, in fact, the rows \(\boldsymbol{u}_{i\cdot}\) of the \(\boldsymbol{U}\) matrix.


    The sampling simulation we have devised simulates the physical sampling of the tumor and allows us to construct the \(\boldsymbol{U}\) matrix of the problem. This procedure operates on the data simulated using the tumor model. Specifically, it simulates a sampling procedure carried out in a grid manner over the tumor Gaussian mixture model described in the previous section. Let \(G_1\) and \(G_n\) be the components with the lowest and largest mean values, respectively, in the Gaussian mixture model. The first and \(m\)th sampling points in the grid are always set to \(\mu_{G_1} - 2.8 \cdot \sigma_{G_1}\) and \(\mu_{G_n} + 2.8 \cdot \sigma_{G_n}\), respectively, and the remaining \(m-2\) sampling points are determined by dividing the range between these two endpoints into \(m-1\) equal intervals.


    The densities of the Gaussian distributions at each sampling point are multiplied by the global proportion of the clones sampled from the Dirichlet distributions, so that for each sampling point \(i\), the fraction of clone \(j\), \(p_{ij}\), is proportional to their product:


    \[p_{ij} \propto c_j \cdot \phi_{ij} \label{eq:pij_2} \tag{2}\]


    where \(c_j\) is the global proportion of clone \(j\) and \(\phi_{ij}\) is the density of the Gaussian component associated with clone \(j\) at sampling point \(i\).
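The grid construction and Equation (2) can be sketched together (Python; the helper name `sample_fractions` and the two-clone example values are illustrative assumptions):

```python
import math

def sample_fractions(means, c, m, sigma=1.0):
    """Clone fractions at m grid sampling points over a 1-D Gaussian mixture.
    The grid spans mu_1 - 2.8*sigma to mu_n + 2.8*sigma, and at each point
    p_ij is proportional to c_j * phi_ij, normalized to sum to 1."""
    lo = min(means) - 2.8 * sigma
    hi = max(means) + 2.8 * sigma
    points = [lo + t * (hi - lo) / (m - 1) for t in range(m)]

    def phi(x, mu):
        # Gaussian density with standard deviation sigma.
        return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

    P = []
    for x in points:
        w = [cj * phi(x, mu) for cj, mu in zip(c, means)]
        s = sum(w)
        P.append([v / s for v in w])
    return P

# Two well-separated clones (d = 4) with global proportions 0.7 and 0.3.
P = sample_fractions(means=[0.0, 4.0], c=[0.7, 0.3], m=5)
```

With this separation, the first sampling point is almost entirely composed of the first clone and the last one of the second clone, reproducing the spatial effect described above.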


    Finally, to account for the effect of cell count in the samples, a multinomial distribution is used to sample a given number of cells \(n_{c}\) for each tumor sample. In that distribution, the probability of selecting each clone at sampling site \(i\) is given by \((p_{i1}, \ldots, p_{in})\). The resulting values determine the final tumor clone composition in sample \(i\), which are represented in the matrix \(\boldsymbol{U}\):


    \[U_{i\cdot} \sim \frac{M(n = n_{c}, p = (p_{i1}, \ldots, p_{in}))}{n_{c}}\]


    Note that selecting a relatively low value for \(n_{c}\) in the multinomial distribution can lead to clones with very low frequencies being modeled as absent in the sample, with composition values equal to 0. This is indeed more realistic than truly observing them with such low frequencies.
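A minimal sketch of this last step (Python; `sample_U_row` is a hypothetical helper name, and the probabilities are illustrative):

```python
import random

def sample_U_row(p, n_cells, seed=None):
    """Draw n_cells cells at one sampling site with clone probabilities p and
    return the observed clone fractions, i.e., one row of U.  Clones with very
    low probability may receive a count of 0 and thus appear absent."""
    rng = random.Random(seed)
    counts = [0] * len(p)
    for clone in rng.choices(range(len(p)), weights=p, k=n_cells):
        counts[clone] += 1
    return [cnt / n_cells for cnt in counts]

# Three clones, the last one rare; with only 100 cells it may not be observed.
u_row = sample_U_row([0.70, 0.29, 0.01], n_cells=100, seed=3)
```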


    Sequencing noise simulation


    Up to this point, the \(\boldsymbol{B}\) and \(\boldsymbol{U}\) matrices of an instance have been simulated. In case we are simulating noise-free data, the simulation is complete once Equation (1) is applied to obtain the \(\boldsymbol{F}\) matrix.


    As a brief reminder, each element \(f_{ij}\) in \(\boldsymbol{F}\) denotes the frequency or VAF of the mutation \(M_j\) in sample \(i\) or, in other words, the proportion of sequencing reads that carry the mutation \(M_j\) in that particular sample. This also means that the proportion of reads in that sample that do not observe the mutation but instead contain the reference nucleotide is 1 - \(f_{ij}\).


    However, empirical factors can artificially alter the VAF value, leading it to deviate from the true ratio between the variant and total allele molecule counts. One of these factors is the noise introduced during the DNA sequencing process itself, which can arise in two main ways. First, limitations of the sequencing instrument can lead to incorrect nucleotide readings of DNA fragments. For example, a position that actually contains nucleotide A may be read as a T. Second, there can be a biased number of reads produced for a particular site, which can result from chemical reaction peculiarities or simply because not all fragments are sequenced. These limitations can, however, be mitigated to some extent. For instance, it has been shown that a high depth of coverage, which refers to the average number of reads that cover each position, can lead to more accurate VAF values (Petrackova et al. 2019).


    In order to incorporate the effect of sequencing noise in the data instances, we have developed a procedure to simulate sequencing noise. This procedure introduces noise to the \(\boldsymbol{F}\) matrix and generates a noisy matrix \(\boldsymbol{F^{(n)}}\), where \(\boldsymbol{F^{(n)}} \neq \boldsymbol{U} \cdot \boldsymbol{B}\). The procedure simulates noise at the level of the sequencing reads and recalculates the new \(f^{(n)}_{ij}\) values, as follows.


    The sequencing depth \(r\) at the genomic position where \(M_j\) occurs in sample \(i\) is distributed according to a negative binomial distribution:


    \[r_{ij} \sim NB(\mu = \mu_{sd}, \alpha = 5) \label{eq:r} \tag{3}\]


    where \(\mu_{sd}\) represents the mean sequencing depth, which is the average number of reads covering the genomic position of mutation \(M_j\) in the sample, and \(\alpha\) is the dispersion parameter, which controls the variability of the sequencing depth around the mean and is fixed at 5.


    The number of reads supporting the alternate allele \(r^{a}_{ij}\) is then modeled by a binomial distribution:


    \[r^{a}_{ij} \sim B(n = r_{ij}, p = f_{ij}) \label{eq:ra} \tag{4}\]

    +

In sequencing data, errors can occur due to limitations inherent to the sequencing methodology. These errors vary depending on the technology used.

To simulate the effect of these errors on the VAF values, the number of reads \(r^{a\prime}_{ij}\) that originally supported the alternate allele but contain a different allele as a result of a sequencing error is modeled using a binomial distribution:

\[r^{a\prime}_{ij} \sim B(n = r^{a}_{ij}, p = \varepsilon),\]

where \(\varepsilon\) represents the sequencing error rate.

We also need to consider reads that contain the reference nucleotide but are read as the alternate allele as a result of this error. This can be better understood with an example. Imagine that at a certain genomic position the normal cells have a T, but in some cells a mutation has changed the T to an A. For the normal cells, with rate \(\varepsilon\), a sequencing error may occur, producing a read of C, G, or A instead of T, each with equal probability. Therefore, in approximately \(\frac{\varepsilon}{3}\) of the cases, reads carrying the mutation of interest will arise from normal reads:

\[r^{r\prime}_{ij} \sim B\!\left(n = r_{ij} - r^{a}_{ij},\; p = \frac{\varepsilon}{3}\right) \label{eq:ramr} \tag{5}\]

Thus, taking all of this into consideration, the final noisy VAF values \(f^{(n)}_{ij}\) are simulated as:

\[f^{(n)}_{ij} = \frac{r^{a}_{ij} - r^{a\prime}_{ij} + r^{r\prime}_{ij}}{r_{ij}} \label{eq:noisyVAF} \tag{6}\]

By default, the sequencing error rate \(\varepsilon\) is set to 0.001, following commonly reported values for Illumina data (Loman et al. 2012).
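The procedure of Equations (3)-(6) can be sketched in a few lines of base R. The helper below, `add_seq_noise`, is an illustrative reimplementation, not GeRnika's actual `add_noise` code; the argument names `mu_sd`, `alpha`, and `eps` mirror \(\mu_{sd}\), \(\alpha\), and \(\varepsilon\), and the dispersion is assumed to map onto `rnbinom`'s `size` argument:

```r
# Illustrative sketch of the noise model in Equations (3)-(6);
# a simplified reimplementation, not GeRnika's add_noise() code.
add_seq_noise <- function(F, mu_sd = 30, alpha = 5, eps = 0.001) {
  apply(F, c(1, 2), function(f) {
    r <- rnbinom(1, size = alpha, mu = mu_sd)        # Eq. (3): sequencing depth
    if (r == 0) return(0)                            # site covered by no read
    r_a  <- rbinom(1, size = r, prob = f)            # Eq. (4): alt-supporting reads
    r_ae <- rbinom(1, size = r_a, prob = eps)        # alt reads miscalled as another base
    r_re <- rbinom(1, size = r - r_a, prob = eps/3)  # Eq. (5): ref reads miscalled as alt
    (r_a - r_ae + r_re) / r                          # Eq. (6): noisy VAF
  })
}

set.seed(42)
F_true  <- matrix(c(1.00, 0.13, 0.00, 0.41, 0.27), nrow = 1)
F_noisy <- add_seq_noise(F_true, mu_sd = 30)
```

Increasing `mu_sd` shrinks the gap between `F_noisy` and `F_true`, consistent with the behavior discussed around Figure 2.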

As an illustration of the effect of the noise model, in Figure 2 we have depicted the density of the mean absolute error between the \(\boldsymbol{F^{(n)}}\) matrix and its corresponding noise-free \(\boldsymbol{F}\) matrix for a collection of noisy instances. As can be seen, as \(\mu_{sd}\) increases, the error introduced to the \(\boldsymbol{F^{(n)}}\) matrix decreases. This is expected because the \(r^{a}_{ij}\) values follow a binomial distribution, as described in Equation (4), where the number of trials is determined by \(\mu_{sd}\), as shown in Equation (3), and the event probability corresponds to the \(f_{ij}\) value. Therefore, the larger the number of trials, the closer the noisy VAF value is to the noise-free VAF value.


Figure 2: Density of the mean absolute error in noisy \(\boldsymbol{F}\) matrices for different \(\mu_{sd}\) values that correspond to different noise levels.


As a final remark, it is important to note that although our data simulation procedure follows the ISA, the addition of noise may cause the resulting data to break this assumption.

    3 The package


GeRnika provides three main functionalities for studying tumor evolution data: (I) simulating artificial tumor evolution, (II) visualizing tumor phylogenies, and (III) comparing tumor phylogenies. This section explains the functions that support these features. Additionally, we describe extra data provided by GeRnika that users can use to try the methods in the package.

    Simulation methods


To enable users to simulate tumor evolution data, GeRnika provides various functions inspired by the methods described in Section 2. GeRnika offers two options: a single method for streamlined simulations and separate methods for performing each step individually, allowing users to customize or replace specific parts of the process.

create_instance

The main function for streamlined tumor data simulation is create_instance. This function provides a convenient way to perform the entire simulation process in a single step. The following command demonstrates how to use it to generate the artificial data:

    create_instance(n, m, k, selection, noisy = TRUE, depth = 30, seed = Sys.time())

where each argument of the method is described as follows:

• n: An integer representing the number of clones.

• m: An integer representing the number of samples.

• k: A numeric value that determines the linearity of the tree topology, also referred to as the topology parameter. Increasing values of this parameter increase the linearity of the topology. When k is set to 1, all nodes have equal probabilities of being chosen as parents, resulting in a uniformly random topology.

• selection: A character string representing the evolutionary mode the tumor follows. This should be either "positive" or "neutral".

• noisy: A logical value (TRUE by default) indicating whether noise is added to the frequency matrix.

• depth: A numeric value (30 by default) representing the mean sequencing depth.

• seed: A numeric value (Sys.time() by default) used to set the seed for the random number generator.

The create_instance function returns a list containing the following components:

• F_noisy: A matrix representing the noisy frequencies of each mutation across samples. If the noisy parameter is set to FALSE, this matrix is equal to F_true.

• B: A matrix representing the relationships between mutations and clones in the tumor.

• U: A matrix representing the frequencies of the clones across the set of samples.

• F_true: A matrix representing the noise-free frequencies of each mutation across the samples.

As explained, the create_instance function generates all matrices representing frequencies, proportions, and the phylogeny of the simulated tumor data in a single step. However, GeRnika also provides individual functions for simulating each of these elements independently, giving users greater control over the characteristics of the simulated tumor data.

create_B, create_U, create_F and add_noise

GeRnika provides specialized functions to generate each of the matrices involved in representing tumor evolution data:

• create_B: Generates a mutation matrix (\(\boldsymbol{B}\) matrix) for a tumor phylogenetic tree with a given number of nodes and a value k determining the linearity of the tree topology.

• create_U: Calculates the \(\boldsymbol{U}\) matrix, containing the frequencies of each clone in a set of samples, based on a \(\boldsymbol{B}\) matrix, the number of samples considered, the number of cells in each sample, and the evolutionary mode of the tumor.

• create_F: Generates the \(\boldsymbol{F}\) matrix, which contains mutation frequency values for a series of mutations across a collection of tumor biopsies or samples. The matrix is computed from a pair of matrices, \(\boldsymbol{U}\) and \(\boldsymbol{B}\), and considers whether the mutations are heterozygous.

• add_noise: Introduces sequencing noise into the noise-free \(\boldsymbol{F}\) matrix generated by the create_F method. Users can specify the mean sequencing depth and the overdispersion parameter, which are used to simulate sequencing depth based on a negative binomial distribution.

The reader is encouraged to refer to the package documentation for more information about these functions and their parameters.

    Visualization methods


The following functions enable the visualization of tumor evolution data by generating phylogenetic trees based on the data under analysis.

Phylotree S4 class

To simplify the execution of its functionalities, GeRnika utilizes the "Phylotree" class. The "Phylotree" S4 class is a data structure specifically designed to represent phylogenetic trees, facilitating the use of the package's methods and ensuring their computational efficiency. The attributes of the "Phylotree" class are as follows:
• B: A data.frame containing the square matrix that represents the ancestral relationships among the clones in the phylogenetic tree (\(\boldsymbol{B}\) matrix).

• clones: A vector representing the indices of the clones in the \(\boldsymbol{B}\) matrix.

• genes: A vector indicating the index of the gene that first mutated in each clone within the \(\boldsymbol{B}\) matrix.

• parents: A vector indicating the parent clone of each clone in the phylogenetic tree.

• tree: A "Node" class object representing the phylogenetic tree (this class is inherited from the data.tree package).

• labels: A vector containing the gene tags associated with the nodes in the phylogenetic tree.

A customized "Phylotree" class object can be instantiated with custom attributes using the create_instance method. This method takes all the attributes of the "Phylotree" class as arguments. Alternatively, GeRnika provides a function that automatically generates a "Phylotree" class object on the basis of a given \(\boldsymbol{B}\) matrix.

B_to_phylotree

In order to instantiate an object of the "Phylotree" class, the following command can be used:

    B_to_phylotree(B, labels = NA)

where each argument of the method is described as follows:

• B: A square \(\boldsymbol{B}\) matrix that represents the phylogenetic tree.

• labels: An optional vector containing the tags of the genes in the phylogenetic tree. NA by default.

This function returns an object of the "Phylotree" class, automatically generating its attributes based on B, which represents the phylogenetic tree of the tumor under analysis.

Once instantiated, the phylogenetic tree in a "Phylotree" class object can be visualized using the generic plot function, which takes the "Phylotree" object as its argument. The plot function also includes a labels argument that can be set to TRUE to display node labels on the phylogenetic tree, using the gene tags stored within the "Phylotree" object.

The GeRnika package provides the plot_proportions function for visualizing phylogenetic trees, with node sizes and colors reflecting the proportions of each clone. This function requires two inputs: a "Phylotree" class object representing the phylogenetic tree and a numeric vector or matrix specifying clone proportions. If a vector is provided, a single tree is plotted, with node sizes and colors determined by the values in the vector. If, instead, a matrix is provided, such as the \(\boldsymbol{U}\) matrix that represents the frequencies of clones across samples, the function plots one tree for each row of the matrix. Each tree is generated based on the clone proportions specified in the corresponding row. Additionally, users can enable node labeling by setting the labels argument to TRUE, which annotates the tree nodes with gene tags from the "Phylotree" object.

    Comparison methods


This section describes the methods included in GeRnika that facilitate the comparison of tumor phylogenies.

A fundamental approach for comparing two phylogenetic trees is to determine whether their evolutionary histories are equivalent. The equals function performs this comparison, accepting two "Phylotree" class objects as arguments and returning a logical value indicating whether the provided phylogenetic trees are equivalent.

To analyze similarities and differences between two phylogenetic trees, the find_common_subtrees function identifies and plots all maximal common subtrees between them. In addition to visualizing these subtrees, the function outputs the number of shared and unique edges (those present in only one of the trees) and calculates the distance between the trees, defined as the sum of their unique edges. This method also includes an option to label the maximal common subtrees with gene tags by setting labels = TRUE.

The combine_trees function generates a consensus tree by combining the nodes and edges of two "Phylotree" class objects. The consensus tree highlights the nodes and edges that form common subtrees between the original trees, as well as the independent edges unique to each tree, which are displayed with reduced opacity. This method also allows labeling the nodes with gene tags by setting labels = TRUE and customizing the colors of the consensus tree by passing a 3-element hexadecimal vector to the palette argument.

    Exported data


GeRnika provides various exported data instances to help users easily explore the package's functionalities. These are as follows:

• B_mats: A list of 10 trios of \(\boldsymbol{B}\) matrices. Each trio includes a real \(\boldsymbol{B}\) matrix and two \(\boldsymbol{B}\) matrices generated using different algorithms that infer evolutionary relationships from a given \(\boldsymbol{F}\) matrix. These matrices serve as illustrative examples for testing and exploring the package's functionalities.

• palettes: A data frame containing three predefined palettes for use with methods in GeRnika that require color palettes.

    4 Examples


In this section we show examples of the use of the methods explained in Section 3. Please note that in the following examples, we set the seeds of non-deterministic methods to a predefined value to ensure reproducibility.

    Simulating tumor evolution data


The simulation of tumor clonal data involves generating the matrices \(\boldsymbol{B}\), \(\boldsymbol{U}\), \(\boldsymbol{F}\), and \(\boldsymbol{F^{(n)}}\) associated with a specific instance. For example, we can simulate, in a single line of code, a noisy instance of a tumor composed of 5 clones/mutations that has evolved under neutral evolution with a \(k\) value of 0.5 and from which 3 samples have been taken:
    
> I <- create_instance(n = 5, m = 3, k = 0.5, selection = "neutral", seed = 1)
> I

$F_noisy
         mut1       mut2      mut3      mut4      mut5
sample1 1.000 0.09090909 0.0000000 0.2777778 0.3548387
sample2 1.000 0.20000000 0.2631579 0.8536585 0.2000000
sample3 0.975 0.03846154 1.0000000 1.0000000 0.0000000

$B
       mut1 mut2 mut3 mut4 mut5
clone1    1    0    0    0    0
clone2    1    1    0    1    0
clone3    1    0    1    1    0
clone4    1    0    0    1    0
clone5    1    0    0    1    1

$U
        clone1 clone2 clone3 clone4 clone5
sample1   0.59   0.13   0.00   0.01   0.27
sample2   0.13   0.27   0.24   0.19   0.17
sample3   0.00   0.04   0.89   0.07   0.00

$F_true
        mut1 mut2 mut3 mut4 mut5
sample1    1 0.13 0.00 0.41 0.27
sample2    1 0.27 0.24 0.87 0.17
sample3    1 0.04 0.89 1.00 0.00

Using this approach, the previously mentioned four matrices are simulated. Note that the noise-free \(\boldsymbol{F}\) matrix is referred to as F_true in the package's code, while the noisy \(\boldsymbol{F^{(n)}}\) is denoted F_noisy.
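As a quick sanity check, the F_true component can be reconstructed from the printed \(\boldsymbol{U}\) and \(\boldsymbol{B}\) matrices through the identity \(\boldsymbol{F} = \boldsymbol{U} \cdot \boldsymbol{B}\); the values below are copied from the output above:

```r
# Values copied from the instance printed above; F_true satisfies F = U B,
# since every mutation carried by a clone contributes that clone's
# proportion to the mutation's frequency.
U <- matrix(c(0.59, 0.13, 0.00, 0.01, 0.27,
              0.13, 0.27, 0.24, 0.19, 0.17,
              0.00, 0.04, 0.89, 0.07, 0.00),
            nrow = 3, byrow = TRUE)
B <- matrix(c(1, 0, 0, 0, 0,
              1, 1, 0, 1, 0,
              1, 0, 1, 1, 0,
              1, 0, 0, 1, 0,
              1, 0, 0, 1, 1),
            nrow = 5, byrow = TRUE)

round(U %*% B, 2)   # matches the F_true component shown above
```

For this particular instance no heterozygous halving applies, so the product reproduces F_true exactly.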

The previous method allows users to generate instances easily and quickly. However, some users may require more precise control over the data, which can be achieved using the create_B, create_U, create_F, and add_noise methods. For examples of how to use these methods, please refer to the package documentation.

    Visualizing tumor phylogenies


Once the matrices associated with our tumor instance have been generated, we can create a "Phylotree" class object, as follows:
> phylotree <- B_to_phylotree(B = I$B)
> phylotree

An object of class "Phylotree"
Slot "B":
       mut1 mut2 mut3 mut4 mut5
clone1    1    0    0    0    0
clone2    1    1    0    1    0
clone3    1    0    1    1    0
clone4    1    0    0    1    0
clone5    1    0    0    1    1

Slot "clones":
mut1 mut2 mut3 mut4 mut5 
   1    2    3    4    5 

Slot "genes":
mut1 mut2 mut3 mut4 mut5 
   1    2    3    4    5 

Slot "parents":
[1] -1  4  4  1  4

Slot "tree":
  levelName
1 1        
2  °--4    
3      ¦--2
4      ¦--3
5      °--5

Slot "labels":
[1] "mut1" "mut4" "mut2" "mut3" "mut5"

Since no list of tags is provided to the labels parameter, a default set of labels is automatically assigned to the instantiated "Phylotree" class object.
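The "parents" slot in the printout can be recovered directly from the \(\boldsymbol{B}\) matrix: in a perfect phylogeny, the parent of clone i is the clone whose mutation set equals clone i's minus its own distinguishing mutation. The sketch below illustrates this with a hypothetical helper, `parents_from_B` (not a GeRnika function), assuming mutation i is the distinguishing mutation of clone i, as in the "genes" slot above:

```r
# Derive the parents vector from a B matrix (perfect-phylogeny assumption);
# illustrative only, not GeRnika's internal implementation.
B <- matrix(c(1, 0, 0, 0, 0,
              1, 1, 0, 1, 0,
              1, 0, 1, 1, 0,
              1, 0, 0, 1, 0,
              1, 0, 0, 1, 1), nrow = 5, byrow = TRUE)

parents_from_B <- function(B) {
  sapply(seq_len(nrow(B)), function(i) {
    target <- B[i, ]
    target[i] <- 0                    # drop clone i's own mutation (column i)
    if (all(target == 0)) return(-1)  # no mutations left: clone i is the root
    which(apply(B, 1, function(row) all(row == target)))
  })
}

parents_from_B(B)   # -1 4 4 1 4, matching the "parents" slot above
```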

Afterwards, we can visualize the tumor phylogeny associated with the simulated \(\boldsymbol{B}\) matrix by using the generic plot method, as follows:

> plot(phylotree)

Figure 3: Phylogenetic tree associated with the generated "Phylotree" class object.


The resulting plot is shown in Figure 3. Instead of clone numbers, the user can utilize the predefined tags in the "Phylotree" class object to label the nodes in the tree by setting labels = TRUE in the plot function.

When plotting a "Phylotree" class object, its nodes can be resized according to the proportions of the clones that compose the tumor samples. To achieve this, we can use the \(\boldsymbol{U}\) matrix from the previously generated instance to determine the proportions of the clones, as shown below:

> plot_proportions(phylotree, I$U, labels = TRUE)

Figure 4: Phylogenetic trees associated with the generated "Phylotree" class object, using the proportions from the previously generated \(\boldsymbol{U}\) matrix.


The resulting plot is shown in Figure 4. This method plots the proportions encoded in the \(\boldsymbol{U}\) matrix: each tree corresponds to one sample (one row of the matrix) and illustrates the proportion of each clone within that specific sample. In this case, we have set the labels parameter to TRUE to label the nodes in the tree using the predefined tags "mut1", "mut4", "mut2", "mut3", and "mut5".

    Comparing tumor phylogenies


Now, we present examples showing the use of GeRnika's functionalities for comparing tumor phylogenies. For this purpose, we will use the B_mats object included in the GeRnika package, which contains 10 \(\boldsymbol{B}\) matrix trios. Specifically, we will use the first trio of matrices and set a predefined set of tags for the clones in the trees:
> B_mats <- GeRnika::B_mats

> B_real <- B_mats[[1]]$B_real
> B_alg1 <- B_mats[[1]]$B_alg1
> B_alg2 <- B_mats[[1]]$B_alg2

> tags <- c("TP53", "KRAS", "PIK3CA", "APC", "EGFR", "BRCA1", "PTEN", "BRAF", 
            "MYC", "CDKN2A")

> phylotree_real <- B_to_phylotree(B = B_real, labels = tags)
> phylotree_alg2 <- B_to_phylotree(B = B_alg2, labels = tags)
> phylotree_alg1 <- B_to_phylotree(B = B_alg1, labels = tags)

> plot(phylotree_real, labels = TRUE)
> plot(phylotree_alg1, labels = TRUE)
> plot(phylotree_alg2, labels = TRUE)

The plots of the three instantiated "Phylotree" class objects are depicted in Figure 5.

Figure 5: phylotree_real, phylotree_alg1 and phylotree_alg2, from left to right.


We can check if the phylogenies of two tumors are equivalent using the equals method:

> equals(phylotree_1 = phylotree_real, phylotree_2 = phylotree_alg1)
[1] FALSE

> equals(phylotree_1 = phylotree_real, phylotree_2 = phylotree_real)
[1] TRUE

In this case, phylotree_real and phylotree_alg1 are not identical, as some edges present in phylotree_real are absent in phylotree_alg1, and vice versa. However, a phylogenetic tree will always be identical to itself, as shown when comparing phylotree_real to itself.

To find the maximal common subtrees between two phylogenetic trees, we use the following command:

> find_common_subtrees(phylotree_1 = phylotree_real, phylotree_2 = phylotree_alg2, 
                       labels = TRUE)

Independent edges of tree1: 2
Independent edges of tree2: 2
Common edges: 7
Distance: 4

Figure 6: Maximal common subtrees between phylotree_real and phylotree_alg2 using predefined tags. In this case, there exists a single common subtree, but there may exist more in other cases.


Figure 7: Consensus tree between phylotree_real and phylotree_alg1 using the Lancet palette and predefined tags.


The maximal common subtrees (in this case, one subtree) between phylotree_real and phylotree_alg2 are shown in Figure 6. Note that the clones in the maximal common subtree are represented by the predefined tags in the "Phylotree" class objects, as we have set labels = TRUE. Additionally, this method prints the number of common and independent edges of the trees, along with the distance between them.

Finally, we generate the consensus tree between two phylogenetic trees using one of the custom palettes offered by GeRnika, specifically the Lancet palette, and the predefined tags for the clones, as follows:
> palette <- GeRnika::palettes$Lancet

> consensus_real_alg1 <- combine_trees(phylotree_1 = phylotree_real, 
                                       phylotree_2 = phylotree_alg1,
                                       labels = TRUE, 
                                       palette = palette)

> DiagrammeR::render_graph(consensus_real_alg1)

The consensus tree between phylotree_real and phylotree_alg1 is depicted in Figure 7. Here, the nodes and edges that compose the common subtrees between the original trees are green. In addition, pink edges denote the independent edges of the tree passed as the first parameter of the method, while blue edges represent the independent edges of the second tree. Note that the independent edges of both trees are presented with translucent colors.

    5 Conclusions


GeRnika is a comprehensive R package designed to address a critical gap in the tools available for studying tumor evolution within the R environment. To this end, it provides researchers with an integrated suite for simulating, visualizing, and comparing tumor phylogenies. Unlike many existing tools, GeRnika is fully implemented in R, making it particularly accessible for the bioinformatics community, which widely relies on R for data analysis and visualization.

One of GeRnika's key contributions is providing tools to generate biologically plausible datasets for studying intratumoral heterogeneity and clonal dynamics. With varied, easily customizable parameters controlling features such as clonal tree topology, selective pressures, and sequencing noise, GeRnika enables exploration of a wide range of evolutionary patterns and complexities, providing a valuable resource for testing new methods and hypotheses in tumor heterogeneity research.

Beyond its core simulation features, GeRnika includes tools for visualizing and comparing tumor phylogenies, offering a unified solution that eliminates the need for multiple packages or complex data processing workflows. Future work will focus on enhancing GeRnika's compatibility with other R packages related to tumor evolution, allowing for easier integration with existing resources.

Overall, GeRnika provides an accessible, user-friendly tool that supports research into intratumoral heterogeneity, with the potential to substantially advance tumor phylogeny research. While it does not perform clonal deconvolution or phylogeny inference from real sequencing data, GeRnika serves as a valuable platform for simulation, visualization, and benchmarking, supporting the development and evaluation of such algorithms.

    6 R software


The R package GeRnika is now available on CRAN.

    7 Supplementary materials


Supplementary materials are available in addition to this article. They can be downloaded at RJ-2025-042.zip.

    8 CRAN packages used


    GeRnika, data.tree


    9 CRAN Task Views implied by cited packages


    10 Bioconductor packages used


    clevRvis


    11 Note


This article is converted from a Legacy LaTeX article using the texor package. The pdf version is the official version. To report a problem with the html, refer to CONTRIBUTE on the R Journal homepage.
R. Burrell, N. McGranahan, J. Bartek and C. Swanton. The causes and consequences of genetic heterogeneity in cancer evolution. Nature, 501: 338–45, 2013. DOI 10.1038/nature12625.

H. Dang, B. White, S. Foltz, C. Miller, J. Luo, R. Fields and C. Maher. ClonEvol: Clonal ordering and visualization in cancer sequencing. Annals of Oncology, 28(12): 3076–3082, 2017. DOI 10.1093/annonc/mdx517.

A. Davis, R. Gao and N. Navin. Tumor evolution: Linear, branching, neutral or punctuated? Biochimica et Biophysica Acta (BBA)-Reviews on Cancer, 1867(2): 151–161, 2017. DOI 10.1016/j.bbcan.2017.01.003.

A. G. Deshwar, S. Vembu, C. K. Yung, G. H. Jang, L. Stein and Q. Morris. PhyloWGS: Reconstructing subclonal composition and evolution from whole-genome sequencing of tumors. Genome Biology, 16(1): 1–20, 2015. DOI 10.1186/s13059-015-0602-8.

M. El-Kebir, L. Oesper, H. Acheson-Field and B. J. Raphael. Reconstruction of clonal trees and tumor composition from multi-sample sequencing data. Bioinformatics, 31(12): 62–70, 2015. DOI 10.1093/bioinformatics/btv261.

M. El-Kebir, G. Satas, L. Oesper and B. J. Raphael. Inferring the mutational history of a tumor using multi-state perfect phylogeny mixtures. Cell Systems, 3(1): 43–53, 2016. DOI 10.1016/j.cels.2016.07.004.

M. El-Kebir, G. Satas and B. J. Raphael. Inferring parsimonious migration histories for metastatic cancers. Nature Genetics, 50(5): 718–726, 2018. DOI 10.1038/s41588-018-0106-z.

X. Fu, H. Lei, Y. Tao and R. Schwartz. Reconstructing tumor clonal lineage trees incorporating single-nucleotide variants, copy number alterations and structural variations. Bioinformatics, 38(Supplement_1): i125–i133, 2022. DOI 10.1093/bioinformatics/btac253.

X. Fu and R. Schwartz. ConTreeDP: A consensus method of tumor trees based on maximum directed partition support problem. In 2021 IEEE international conference on bioinformatics and biomedicine (BIBM), pages 125–130, 2021. IEEE. DOI 10.1101/2021.10.13.463978.

K. Grigoriadis, A. Huebner, A. Bunkum, E. Colliver, A. M. Frankell, M. S. Hill, K. Thol, N. J. Birkbak, C. Swanton, S. Zaccaria, et al. CONIPHER: A computational framework for scalable phylogenetic reconstruction with error correction. Nature Protocols, 19(1): 159–183, 2024. DOI 10.1038/s41596-023-00913-9.

Z. Guang, M. Smith-Erb and L. Oesper. A weighted distance-based approach for deriving consensus tumor evolutionary trees. Bioinformatics, 39(Supplement_1): i204–i212, 2023. DOI 10.1093/bioinformatics/btad230.

D. Gusfield. Efficient algorithms for inferring evolutionary trees. Networks, 21(1): 19–28, 1991. DOI 10.1002/net.3230210104.

E. Husić, X. Li, A. Hujdurović, M. Mehine, R. Rizzi, V. Mäkinen, M. Milanič and A. I. Tomescu. MIPUP: Minimum perfect unmixed phylogenies for multi-sampled tumors via branchings and ILP. Bioinformatics, 35(5): 769–777, 2019. DOI 10.1093/bioinformatics/bty683.

Y. Jiang, Y. Qiu, A. J. Minn and N. R. Zhang. Assessing intratumor heterogeneity and tracking longitudinal and spatial clonal evolutionary history by next-generation sequencing. Proceedings of the National Academy of Sciences, 113(37): E5528–E5537, 2016. DOI 10.1073/pnas.1522203113.

M. Kimura. The number of heterozygous nucleotide sites maintained in a finite population due to steady flux of mutations. Genetics, 61(4): 893, 1969. DOI 10.1093/genetics/61.4.893.

E. Kulman, J. Wintersinger and Q. Morris. Reconstructing cancer phylogenies using Pairtree, a clone tree reconstruction algorithm. STAR Protocols, 3(4): 101706, 2022. DOI 10.1016/j.xpro.2022.101706.

N. J. Loman, R. V. Misra, T. J. Dallman, C. Constantinidou, S. E. Gharbia, J. Wain and M. J. Pallen. Performance comparison of benchtop high-throughput sequencing platforms. Nature Biotechnology, 30(5): 434–439, 2012. DOI 10.1038/nbt.2198.

S. Malikic, A. W. McPherson, N. Donmez and C. S. Sahinalp. Clonality inference in multiple tumor samples using phylogeny. Bioinformatics, 31(9): 1349–1356, 2015. DOI 10.1093/bioinformatics/btv003.

S. Malikic, F. R. Mehrabadi, S. Ciccolella, M. K. Rahman, C. Ricketts, E. Haghshenas, D. Seidman, F. Hach, I. Hajirasouliha and S. C. Sahinalp. PhISCS: A combinatorial approach for subperfect tumor phylogeny reconstruction via integrative use of single-cell and bulk sequencing data. Genome Research, 29(11): 1860–1877, 2019. DOI 10.1101/gr.234435.118.

F. Marass, F. Mouliere, K. Yuan, N. Rosenfeld and F. Markowetz. A phylogenetic latent feature model for clonal deconvolution. The Annals of Applied Statistics, 10: 2377–2404, 2016. DOI 10.1214/16-AOAS986.

C. A. Miller, J. McMichael, H. X. Dang, C. A. Maher, L. Ding, T. J. Ley, E. R. Mardis and R. K. Wilson. Visualizing tumor evolution with the fishplot package for R. BMC Genomics, 17: 1–3, 2016. DOI 10.1186/s12864-016-3195-z.

M. A. Myers, G. Satas and B. J. Raphael. CALDER: Inferring phylogenetic trees from longitudinal tumor samples. Cell Systems, 8(6): 514–522, 2019. DOI 10.1016/j.cels.2019.05.010.

P. C. Nowell. The clonal evolution of tumor cell populations. Science, 194(4260): 23–28, 1976. DOI 10.1126/science.959840.

A. Petrackova, M. Vasinek, L. Sedlarikova, T. Dyskova, P. Schneiderova, T. Novosad, T. Papajik and E. Kriegova. Standardization of sequencing coverage depth in NGS: Recommendation for detection of clonal and subclonal mutations in cancer diagnostics. Frontiers in Oncology, 9: 851, 2019. DOI 10.3389/fonc.2019.00851.

V. Popic, R. Salari, I. Hajirasouliha, D. Kashef-Haghighi, R. B. West and S. Batzoglou. Fast and scalable inference of multi-sample cancer lineages. Genome Biology, 16(1): 91, 2015. DOI 10.1186/s13059-015-0647-8.

Y. Qi, Y. Luo and M. El-Kebir. OncoLib. GitHub repository, 2019. URL https://github.com/elkebir-group/OncoLib/.

E. M. Ross and F. Markowetz. OncoNEM: Inferring tumor evolution from single-cell sequencing data. Genome Biology, 17: 1–14, 2016. DOI 10.1186/s13059-016-0929-9.

S. Sandmann, C. Inserte and J. Varghese. ClevRvis: Visualization techniques for clonal evolution. GigaScience, 12: giad020, 2023. DOI 10.1093/gigascience/giad020.

G. Satas, S. Zaccaria, G. Mon and B. J. Raphael. SCARLET: Single-cell tumor phylogeny inference with copy-number constrained mutation losses. Cell Systems, 10(4): 323–332, 2020. DOI 10.1016/j.cels.2020.04.001.

S. Sengupta, J. Wang, J. Lee, P. Müller, K. Gulukota, A. Banerjee and Y. Ji. Bayclone: Bayesian nonparametric inference of tumor subclones using NGS data. In Pacific symposium on biocomputing co-chairs, pages 467–478, 2014. World Scientific. DOI 10.1142/9789814644730_0044.

E. Sollier, J. Kuipers, K. Takahashi, N. Beerenwinkel and K. Jahn. COMPASS: Joint copy number and mutation phylogeny reconstruction from amplicon single-cell sequencing data. Nature Communications, 14(1): 4921, 2023. DOI 10.1038/s41467-023-40378-8.

F. Strino, F. Parisi, M. Micsinai and Y. Kluger. TrAp: A tree approach for fingerprinting subclonal tumor composition. Nucleic Acids Research, 41: e165–e165, 2013. DOI 10.1093/nar/gkt641.

G. Tanner, D. R. Westhead, A. Droop and L. F. Stead. Simulation of heterogeneous tumour genomes with HeteroGenesis and in silico whole exome sequencing. Bioinformatics, 35(16): 2850–2852, 2019. DOI 10.1093/bioinformatics/bty1063.

J. Wu and M. El-Kebir. ClonArch: Visualizing the spatial clonal architecture of tumors. Bioinformatics, 36(Supplement_1): i161–i168, 2020. DOI 10.1093/bioinformatics/btaa471.

K. Yuan, T. Sakoparnig, F. Markowetz and N. Beerenwinkel. BitPhylogeny: A probabilistic framework for reconstructing intra-tumor phylogenies. Genome Biology, 16(1): 1–16, 2015. DOI 10.1186/s13059-015-0592-6.
    +
    + + +
    + +
    +
    + + + + + + + +
    +

    References

    +
    +

    Reuse

    +

    Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

    +

    Citation

    +

    For attribution, please cite this work as

    +
    Sánchez-Ferrera, et al., "GeRnika: An R Package for the Simulation, Visualization and Comparison of Tumor Phylogenies", The R Journal, 2026
    +

    BibTeX citation

    +
    @article{RJ-2025-042,
    +  author = {Sánchez-Ferrera, Aitor and Tellaetxe-Abete, Maitena and Calvo-Molinos, Borja},
    +  title = {GeRnika: An R Package for the Simulation, Visualization and Comparison of Tumor Phylogenies},
    +  journal = {The R Journal},
    +  year = {2026},
    +  note = {https://doi.org/10.32614/RJ-2025-042},
    +  doi = {10.32614/RJ-2025-042},
    +  volume = {17},
    +  issue = {4},
    +  issn = {2073-4859},
    +  pages = {255-274}
    +}
    +
    + + + + + + + diff --git a/_articles/RJ-2025-042/RJ-2025-042.pdf b/_articles/RJ-2025-042/RJ-2025-042.pdf new file mode 100644 index 0000000000..a91a051880 Binary files /dev/null and b/_articles/RJ-2025-042/RJ-2025-042.pdf differ diff --git a/_articles/RJ-2025-042/RJ-2025-042.zip b/_articles/RJ-2025-042/RJ-2025-042.zip new file mode 100644 index 0000000000..94dcbfbc6d Binary files /dev/null and b/_articles/RJ-2025-042/RJ-2025-042.zip differ diff --git a/_articles/RJ-2025-042/RJournal.sty b/_articles/RJ-2025-042/RJournal.sty new file mode 100644 index 0000000000..351990be38 --- /dev/null +++ b/_articles/RJ-2025-042/RJournal.sty @@ -0,0 +1,358 @@ +% Package `RJournal' to use with LaTeX2e +% Copyright (C) 2010 by the R Foundation +% Copyright (C) 2013 by the R Journal +% +% Originally written by Kurt Hornik and Friedrich Leisch with subsequent +% edits by the editorial board +% +% CAUTION: +% Do not modify this style file. Any changes to this file will be reset when your +% article is submitted. +% If you must modify the style or add LaTeX packages to the article, these +% should be specified in RJwrapper.tex + +\NeedsTeXFormat{LaTeX2e}[1995/12/01] +\ProvidesPackage{RJournal}[2025/10/05 v0.17 RJournal package] + +\RequirePackage{tikz} + +% Overall page layout, fonts etc ----------------------------------------------- + +% Issues of of \emph{The R Journal} are created from the standard \LaTeX{} +% document class \pkg{report}. + +\RequirePackage{geometry} +\geometry{a4paper, + textwidth=14cm, top=1cm, bottom=1cm, + includehead,includefoot,centering, + footskip=1.5cm} +\raggedbottom +\sloppy +\clubpenalty = 10000 +\widowpenalty = 10000 +\brokenpenalty = 10000 +\usepackage{microtype} + + +\RequirePackage{fancyhdr} +\fancyhead{} +\fancyheadoffset{2cm} +\fancyhead[L]{\textsc{\RJ@sectionhead}} +\fancyhead[R]{\thepage} +\fancyfoot{} +\fancyfoot[L]{The R Journal Vol. 
\RJ@volume/\RJ@number, \RJ@month~\RJ@year} +\fancyfoot[R]{ISSN 2073-4859} +\pagestyle{fancy} + +% We use the following fonts (all with T1 encoding): +% +% rm & palatino +% tt & inconsolata +% sf & helvetica +% math & palatino + +\RequirePackage{microtype} + +\RequirePackage[scaled=0.92]{helvet} +\RequirePackage{palatino,mathpazo} +\RequirePackage[scaled=1.02]{inconsolata} +\RequirePackage[T1]{fontenc} + +\RequirePackage[hyphens]{url} +\RequirePackage[pagebackref]{hyperref} +\renewcommand{\backref}[1]{[p#1]} + +% Dark blue colour for all links +\RequirePackage{color} +\definecolor{link}{rgb}{0.45,0.51,0.67} +\hypersetup{ + colorlinks,% + citecolor=link,% + filecolor=link,% + linkcolor=link,% + urlcolor=link +} + +% Give the text a little room to breath +\setlength{\parskip}{3pt} +\RequirePackage{setspace} +\setstretch{1.05} + +% Issue and article metadata --------------------------------------------------- + +% Basic front matter information about the issue: volume, number, and +% date. + +\newcommand{\volume}[1]{\def\RJ@volume{#1}} +\newcommand{\volnumber}[1]{\def\RJ@number{#1}} +\renewcommand{\month}[1]{\def\RJ@month{#1}} +\renewcommand{\year}[1]{\def\RJ@year{#1}} + + +% Individual articles correspond to +% chapters, and are contained in |article| environments. This makes it +% easy to have figures counted within articles and hence hyperlinked +% correctly. + +% An article has an author, a title, and optionally a subtitle. We use +% the obvious commands for specifying these. Articles will be put in certain +% journal sections, named by \sectionhead. + +\newcommand {\sectionhead} [1]{\def\RJ@sectionhead{#1}} +\renewcommand{\author} [1]{\def\RJ@author{#1}} +\renewcommand{\title} [1]{\def\RJ@title{#1}} +\newcommand {\subtitle} [1]{\def\RJ@subtitle{#1}} + +% Control appearance of titles: make slightly smaller than usual, and +% suppress section numbering. 
See http://tex.stackexchange.com/questions/69749 +% for why we don't use \setcounter{secnumdepth}{-1} + +\usepackage[medium]{titlesec} +\usepackage{titletoc} +\titleformat{\section} {\normalfont\large\bfseries}{\arabic{section}}{1em}{} +\titleformat{\subsection}{\normalfont\normalsize\bfseries}{\arabic{section}.\arabic{subsection}}{0.5em}{} +\titlecontents{chapter} [0em]{}{}{}{\titlerule*[1em]{.}\contentspage} + +% Article layout --------------------------------------------------------------- + +% Environment |article| clears the article header information at its beginning. +% We use |\FloatBarrier| from the placeins package to keep floats within +% the article. +\RequirePackage{placeins} +\newenvironment{article}{\author{}\title{}\subtitle{}\FloatBarrier}{\FloatBarrier} + +% Refereed articles should have an abstract, so we redefine |\abstract| to +% give the desired style + +\renewcommand{\abstract}[1]{\noindent\textbf{Abstract} #1} +\renewenvironment{abstract}{\noindent\textbf{Abstract}~}{} + +% The real work is done by a redefined version of |\maketitle|. Note +% that even though we do not want chapters (articles) numbered, we +% need to increment the chapter counter, so that figures get correct +% labelling. + +\renewcommand{\maketitle}{% +\noindent + \chapter{\RJ@title}\refstepcounter{chapter} + \ifx\empty\RJ@subtitle + \else + \noindent\textbf{\RJ@subtitle} + \par\nobreak\addvspace{\baselineskip} + \fi + \ifx\empty\RJ@author + \else + \noindent\textit{\RJ@author} + \par\nobreak\addvspace{\baselineskip} + \fi + \@afterindentfalse\@nobreaktrue\@afterheading +} + +% Now for some ugly redefinitions. We do not want articles to start a +% new page. 
(Actually, we do, but this is handled via explicit +% \newpage +% +% The name@of@eq is a hack to get hyperlinks to equations to work +% within each article, even though there may be multiple eq.(1) +% \begin{macrocode} +\renewcommand\chapter{\secdef\RJ@chapter\@schapter} +\providecommand{\nohyphens}{% + \hyphenpenalty=10000\exhyphenpenalty=10000\relax} +\newcommand{\RJ@chapter}{% + \edef\name@of@eq{equation.\@arabic{\c@chapter}}% + \renewcommand{\@seccntformat}[1]{}% + \@startsection{chapter}{0}{0mm}{% + -2\baselineskip \@plus -\baselineskip \@minus -.2ex}{\p@}{% + \phantomsection\normalfont\huge\bfseries\raggedright}} + +% Book reviews should appear as sections in the text and in the pdf bookmarks, +% however we wish them to appear as chapters in the TOC. Thus we define an +% alternative to |\maketitle| for reviews. +\newcommand{\review}[1]{ + \pdfbookmark[1]{#1}{#1} + \section*{#1} + \addtocontents{toc}{\protect\contentsline{chapter}{#1}{\thepage}{#1.1}} +} + +% We want bibliographies as starred sections within articles. +% +\RequirePackage[sectionbib,round]{natbib} +\bibliographystyle{abbrvnat} +\renewcommand{\bibsection}{\section*{References}} + +% Equations, figures and tables are counted within articles, but we do +% not show the article number. For equations it becomes a bit messy to avoid +% having hyperref getting it wrong. + +% \numberwithin{equation}{chapter} +\renewcommand{\theequation}{\@arabic\c@equation} +\renewcommand{\thefigure}{\@arabic\c@figure} +\renewcommand{\thetable}{\@arabic\c@table} + +% Issue layout ----------------------------------------------------------------- + +% Need to provide our own version of |\tableofcontents|. We use the +% tikz package to get the rounded rectangle. Notice that |\section*| +% is really the same as |\chapter*|. 
+\renewcommand{\contentsname}{Contents} +\renewcommand\tableofcontents{% + \vspace{1cm} + \section*{\contentsname} + { \@starttoc{toc} } +} + +\renewcommand{\titlepage}{% + \thispagestyle{empty} + \hypersetup{ + pdftitle={The R Journal Volume \RJ@volume/\RJ@number, \RJ@month \RJ@year},% + pdfauthor={R Foundation for Statistical Computing},% + } + \noindent + \begin{center} + \fontsize{50pt}{50pt}\selectfont + The \raisebox{-8pt}{\includegraphics[height=77pt]{Rlogo-5}}\hspace{10pt} + Journal + + \end{center} + {\large \hfill Volume \RJ@volume/\RJ@number, \RJ@month{} \RJ@year \quad} + + \rule{\textwidth}{1pt} + \begin{center} + {\Large A peer-reviewed, open-access publication of the \\ + R Foundation for Statistical Computing} + \end{center} + + % And finally, put in the TOC box. Note the way |tocdepth| is adjusted + % before and after producing the TOC: thus, we can ensure that only + % articles show up in the printed TOC, but that in the PDF version, + % bookmarks are created for sections and subsections as well (provided + % that the non-starred forms are used). + \setcounter{tocdepth}{0} + \tableofcontents + \setcounter{tocdepth}{2} + \clearpage +} + +% Text formatting -------------------------------------------------------------- + +\newcommand{\R}{R} +\newcommand{\address}[1]{\addvspace{\baselineskip}\noindent\emph{#1}} +\newcommand{\email}[1]{\href{mailto:#1}{\normalfont\texttt{#1}}} + +% Simple font selection is not good enough. For example, |\texttt{--}| +% gives `\texttt{--}', i.e., an endash in typewriter font. Hence, we +% need to turn off ligatures, which currently only happens for commands +% |\code| and |\samp| and the ones derived from them. Hyphenation is +% another issue; it should really be turned off inside |\samp|. And +% most importantly, \LaTeX{} special characters are a nightmare. E.g., +% one needs |\~{}| to produce a tilde in a file name marked by |\file|. 
+% Perhaps a few years ago, most users would have agreed that this may be +% unfortunate but should not be changed to ensure consistency. But with +% the advent of the WWW and the need for getting `|~|' and `|#|' into +% URLs, commands which only treat the escape and grouping characters +% specially have gained acceptance + +\DeclareRobustCommand\code{\bgroup\@noligs\@codex} +\def\@codex#1{\texorpdfstring% +{{\normalfont\ttfamily\hyphenchar\font=-1 #1}}% +{#1}\egroup} +\newcommand{\kbd}[1]{{\normalfont\texttt{#1}}} +\newcommand{\key}[1]{{\normalfont\texttt{\uppercase{#1}}}} +\DeclareRobustCommand\samp{`\bgroup\@noligs\@sampx} +\def\@sampx#1{{\normalfont\texttt{#1}}\egroup'} +\newcommand{\var}[1]{{\normalfont\textsl{#1}}} +\let\env=\code +\newcommand{\file}[1]{{`\normalfont\textsf{#1}'}} +\let\command=\code +\let\option=\samp +\newcommand{\dfn}[1]{{\normalfont\textsl{#1}}} +% \acronym is effectively disabled since not used consistently +\newcommand{\acronym}[1]{#1} +\newcommand{\strong}[1]{\texorpdfstring% +{{\normalfont\fontseries{b}\selectfont #1}}% +{#1}} +\let\pkg=\strong +\newcommand{\CRANpkg}[1]{\href{https://CRAN.R-project.org/package=#1}{\pkg{#1}}}% +\let\cpkg=\CRANpkg +\newcommand{\ctv}[1]{\href{https://CRAN.R-project.org/view=#1}{\emph{#1}}} +\newcommand{\BIOpkg}[1]{\href{https://www.bioconductor.org/packages/release/bioc/html/#1.html}{\pkg{#1}}} + +% Example environments --------------------------------------------------------- +\RequirePackage{fancyvrb} +\RequirePackage{alltt} + +\DefineVerbatimEnvironment{example}{Verbatim}{} +\renewenvironment{example*}{\begin{alltt}}{\end{alltt}} + +% Support for output from Sweave, and generic session style code +% These used to have fontshape=sl for Sinput/Scode/Sin, but pslatex +% won't use a condensed font in that case. 
+ +% Update (2015-05-28 by DS): remove fontsize=\small to match example environment + +\DefineVerbatimEnvironment{Sinput}{Verbatim}{} +\DefineVerbatimEnvironment{Soutput}{Verbatim}{} +\DefineVerbatimEnvironment{Scode}{Verbatim}{} +\DefineVerbatimEnvironment{Sin}{Verbatim}{} +\DefineVerbatimEnvironment{Sout}{Verbatim}{} +\newenvironment{Schunk}{}{} + +% Mathematics ------------------------------------------------------------------ + +% The implementation of |\operatorname| is similar to the mechanism +% \LaTeXe{} uses for functions like sin and cos, and simpler than the +% one of \AmSLaTeX{}. We use |\providecommand| for the definition in +% order to keep the one of the \pkg{amstex} if this package has +% already been loaded. +% \begin{macrocode} +\providecommand{\operatorname}[1]{% + \mathop{\operator@font#1}\nolimits} +\RequirePackage{amsfonts} + +\renewcommand{\P}{% + \mathop{\operator@font I\hspace{-1.5pt}P\hspace{.13pt}}} +\newcommand{\E}{% + \mathop{\operator@font I\hspace{-1.5pt}E\hspace{.13pt}}} +\newcommand{\VAR}{\operatorname{var}} +\newcommand{\COV}{\operatorname{cov}} +\newcommand{\COR}{\operatorname{cor}} + +% Figures ---------------------------------------------------------------------- + +% For use with pandoc > 3.2.1 +\newsavebox\pandoc@box +\newcommand*\pandocbounded[1]{% scales image to fit in text height/width + \sbox\pandoc@box{#1}% + \Gscale@div\@tempa{\textheight}{\dimexpr\ht\pandoc@box+\dp\pandoc@box\relax}% + \Gscale@div\@tempb{\linewidth}{\wd\pandoc@box}% + \ifdim\@tempb\p@<\@tempa\p@\let\@tempa\@tempb\fi% select the smaller of both + \ifdim\@tempa\p@<\p@\scalebox{\@tempa}{\usebox\pandoc@box}% + \else\usebox{\pandoc@box}% + \fi% +} + +\RequirePackage[font=small,labelfont=bf]{caption} + +% Wide environments for figures and tables ------------------------------------- +\RequirePackage{environ} + +% An easy way to make a figure span the full width of the page +\NewEnviron{widefigure}[1][]{ +\begin{figure}[#1] +\advance\leftskip-2cm 
+\begin{minipage}{\dimexpr\textwidth+4cm\relax}% + \captionsetup{margin=2cm} + \BODY +\end{minipage}% +\end{figure} +} + +\NewEnviron{widetable}[1][]{ +\begin{table}[#1] +\advance\leftskip-2cm +\begin{minipage}{\dimexpr\textwidth+4cm\relax}% + \captionsetup{margin=2cm} + \BODY +\end{minipage}% +\end{table} +} diff --git a/_articles/RJ-2025-042/RJwrapper.tex b/_articles/RJ-2025-042/RJwrapper.tex new file mode 100644 index 0000000000..561e2f6fe7 --- /dev/null +++ b/_articles/RJ-2025-042/RJwrapper.tex @@ -0,0 +1,31 @@ +%% Just added files from R Project to Overleaf. + +\documentclass[a4paper]{report} +\usepackage[utf8]{inputenc} +\usepackage[T1]{fontenc} +\usepackage{RJournal} +\usepackage{amsmath,amssymb,array} +\usepackage{booktabs} + +% Comments +\usepackage{xcolor} +\definecolor{darkolivegreen}{rgb}{0.32, 0.41, 0.2} +\newcommand{\maitena}[1]{\textcolor{darkolivegreen}{Maitena: #1}} + +%% load any required packages here + +\begin{document} + +%% do not edit, for illustration only +\sectionhead{Contributed research article} +\volume{17} +\volnumber{4} +\year{2025} +\month{December} +\setcounter{page}{255} + +\begin{article} + \input{GeRnika} +\end{article} + +\end{document} diff --git a/_articles/RJ-2025-042/figs/commonsubtrees.png b/_articles/RJ-2025-042/figs/commonsubtrees.png new file mode 100644 index 0000000000..5fdf383632 Binary files /dev/null and b/_articles/RJ-2025-042/figs/commonsubtrees.png differ diff --git a/_articles/RJ-2025-042/figs/comparisonphylotrees.png b/_articles/RJ-2025-042/figs/comparisonphylotrees.png new file mode 100644 index 0000000000..9c3a2b7a35 Binary files /dev/null and b/_articles/RJ-2025-042/figs/comparisonphylotrees.png differ diff --git a/_articles/RJ-2025-042/figs/consensus.png b/_articles/RJ-2025-042/figs/consensus.png new file mode 100644 index 0000000000..57c5d8035a Binary files /dev/null and b/_articles/RJ-2025-042/figs/consensus.png differ diff --git a/_articles/RJ-2025-042/figs/noisy_F_error.pdf 
b/_articles/RJ-2025-042/figs/noisy_F_error.pdf new file mode 100644 index 0000000000..ca0bacf19e Binary files /dev/null and b/_articles/RJ-2025-042/figs/noisy_F_error.pdf differ diff --git a/_articles/RJ-2025-042/figs/noisy_F_error.png b/_articles/RJ-2025-042/figs/noisy_F_error.png new file mode 100644 index 0000000000..80132d1505 Binary files /dev/null and b/_articles/RJ-2025-042/figs/noisy_F_error.png differ diff --git a/_articles/RJ-2025-042/figs/phylotree.png b/_articles/RJ-2025-042/figs/phylotree.png new file mode 100644 index 0000000000..fccb0057b9 Binary files /dev/null and b/_articles/RJ-2025-042/figs/phylotree.png differ diff --git a/_articles/RJ-2025-042/figs/proportions.png b/_articles/RJ-2025-042/figs/proportions.png new file mode 100644 index 0000000000..987792121c Binary files /dev/null and b/_articles/RJ-2025-042/figs/proportions.png differ diff --git a/_articles/RJ-2025-042/figs/ternary_plots_cropped.pdf b/_articles/RJ-2025-042/figs/ternary_plots_cropped.pdf new file mode 100644 index 0000000000..cffdf595f5 Binary files /dev/null and b/_articles/RJ-2025-042/figs/ternary_plots_cropped.pdf differ diff --git a/_articles/RJ-2025-042/figs/ternary_plots_cropped.png b/_articles/RJ-2025-042/figs/ternary_plots_cropped.png new file mode 100644 index 0000000000..b8cdfcdbe8 Binary files /dev/null and b/_articles/RJ-2025-042/figs/ternary_plots_cropped.png differ diff --git a/_articles/RJ-2025-043/RJ-2025-043.R b/_articles/RJ-2025-043/RJ-2025-043.R new file mode 100644 index 0000000000..7862d1cad7 --- /dev/null +++ b/_articles/RJ-2025-043/RJ-2025-043.R @@ -0,0 +1,86 @@ +# Generated by `rjournal_pdf_article()` using `knitr::purl()`: do not edit by hand +# Please edit RJ-2025-043.Rmd to modify this file + +## ----setup, include=FALSE----------------------------------------------------- +knitr::opts_chunk$set(echo = FALSE, warning = FALSE, message = FALSE) +library(ggplot2) +library(kableExtra) + +dir <- getwd() + + +## ----figurememshare, out.width = "100%", 
fig.cap = "A schematic about where the memory is located and how different sessions access it.", echo=FALSE----
+knitr::include_graphics("figures/Grafik_Memshare.png")
+
+
+## ----apply, echo = TRUE-------------------------------------------------------
+library(memshare)
+set.seed(1)
+n <- 10000
+p <- 2000
+# Numeric double matrix (required): n rows (cases) x d columns (features)
+X <- matrix(rnorm(n * p), n, p)
+# Reference vector to correlate with each column
+y <- rnorm(n)
+f <- function(v, y) cor(v, y)
+
+ns <- "my_namespace"
+res <- memshare::memApply(
+X = X, MARGIN = 2,
+FUN = f,
+NAMESPACE = ns,
+VARS = list(y = y),
+MAX.CORES = NULL # defaults to detectCores() - 1
+)
+
+
+## ----lapply, echo = TRUE------------------------------------------------------
+ library(memshare)
+ list_length <- 1000
+ matrix_dim <- 100
+
+ # Create the list of random matrices
+ l <- lapply(
+ 1:list_length,
+ function(i) matrix(rnorm(matrix_dim * matrix_dim),
+ nrow = matrix_dim, ncol = matrix_dim)
+ )
+
+ y <- rnorm(matrix_dim)
+ namespace <- "my_namespace"
+
+ res <- memLapply(l, function(el, y) {
+ el %*% y
+ }, NAMESPACE = namespace, VARS = list(y=y),
+ MAX.CORES = 1) #MAX.CORES=1 for simplicity
+
+
+## ----figure1, out.width = "100%", fig.cap = "Matrix size depicted as magnitude vs median runtime (left) and vs memory overhead (MB) during the run relative to idle (right) for `memshare` and `SharedObject` as error-bar-style plots with intervals given by the median ± AMAD across 100 runs. In addition, the serial baseline is shown as a line in magenta. The top subfigures present the full range of matrix sizes, and the bottom subfigures zoom in.", echo=FALSE----
+knitr::include_graphics("figures/Figure1.png")
+
+
+## ----figure2, out.width = "100%", fig.cap = "The distribution of mutual information for 19637 gene expressions as a histogram, Pareto Density Estimation (PDE), QQ-plot against normal distribution and boxplot. 
There are no missing values (NaN).", fig.path='figures/'----
+Header <- readLines(file.path(dir,"data/MI_values.lrn"), n = 2)[2]
+
+mi_values <- read.table(file = file.path(dir,"data/MI_values.lrn"),header = TRUE,sep = "\t",skip = 5)
+
+DataVisualizations::InspectVariable(mi_values$MI,Name = "Distribution of Mutual Information")
+#length(mi_values$MI)
+#Header
+
+
+## ----figure1-detail, out.width = "100%", fig.cap = "Median runtime (log-scale) vs matrix size for `memshare`, `SharedObject`, and serial baseline; ribbons show IQR across 100 runs. Insets show the difference in total RSS in log(MB), i.e., the memory overhead, during the run relative to idle for Mac, presenting the details of Figure \ref{fig:figure1}.", echo=FALSE----
+knitr::include_graphics("figures/Figure1_appendix_secs_vs_Resident_Set_Size_mac.png")
+
+
+## ----appendix-figure1, out.width = "100%", fig.cap = "Median runtime (log-scale) vs matrix size for `memshare`, `SharedObject`, and serial baseline; ribbons show IQR across 100 runs. 
Insets show the difference in total RSS in log(MB) (i.e., memory overhead) during the run relative to idle for Windows~10 via Boot Camp.", echo=FALSE---- +knitr::include_graphics("figures/Figure1_appendix_secs_vs_Resident_Set_Size_win.png") + + +## ----app-a-1, out.width = "100%", fig.cap = "First Screenshot of ShareObjects Computation.", echo=FALSE---- +knitr::include_graphics("figures/Crash_message1.png") + + +## ----app-a-2, out.width = "100%", fig.cap = "Second Screenshot of ShareObjects Computation.", echo=FALSE---- +knitr::include_graphics("figures/Crash_message2.png") + diff --git a/_articles/RJ-2025-043/RJ-2025-043.Rmd b/_articles/RJ-2025-043/RJ-2025-043.Rmd new file mode 100644 index 0000000000..bc4631a335 --- /dev/null +++ b/_articles/RJ-2025-043/RJ-2025-043.Rmd @@ -0,0 +1,481 @@ +--- +title: 'memshare: Memory Sharing for Multicore Computation in R with an Application + to Feature Selection by Mutual Information using PDE' +date: '2026-02-04' +abstract: | + We present memshare, a package that enables shared-memory multicore computation in R by allocating buffers in C++ shared memory and exposing them to R through ALTREP views. We compare memshare to SharedObject (Bioconductor), discuss semantics and safety, and report a 2x speedup over SharedObject with no additional resident memory in a column-wise apply benchmark. Finally, we illustrate a downstream analytics use case: feature selection by mutual information, in which densities are estimated per feature via Pareto Density Estimation (PDE). The analytical use case is an RNA-seq dataset consisting of N = 10,446 cases and d = 19,637 gene expression measurements, requiring roughly `n_threads` * 10GB of memory in the case of using parallel R sessions. Such and larger use cases are common in big data analytics and make R feel limiting, which is mitigated by the library presented in this work. +draft: no +author: +- name: Michael C. 
Thrun + affiliation: + - University of Marburg + - IAP-GmbH Intelligent Analytics Projects + address: + - Mathematics and Computer Science, D-35032 Marburg + - In den Birken 10A, 29352 Adelheidsdorf + url: https://www.iap-gmbh.de + orcid: 0000-0001-9542-5543 + email: mthrun@informatik.uni-marburg.de +- name: Julian Märte + affiliation: IAP-GmbH Intelligent Analytics Projects + address: In den Birken 10A, 29352 Adelheidsdorf + url: https://www.iap-gmbh.de + email: j.maerte@iap-gmbh.de + orcid: 0000-0001-5451-1023 +type: package +output: + rjtools::rjournal_article: + self_contained: yes + toc: no +bibliography: RJreferences.bib +date_received: '2025-09-10' +volume: 17 +issue: 4 +slug: RJ-2025-043 +journal: + lastpage: 321 + firstpage: 305 + +--- + + +```{r setup, include=FALSE} +knitr::opts_chunk$set(echo = FALSE, warning = FALSE, message = FALSE) +library(ggplot2) +library(kableExtra) + +dir <- getwd() +``` + +## Introduction + +Parallel computing in R is usually realized through PSOCK or FORK clusters, where multiple R processes work in parallel [@Rparallel2025], [@doparallel2025]. A practical issue arises immediately: each worker process receives its own private copy of the data. If a matrix consumes several gigabytes in memory, then spawning ten workers results in ten redundant copies, potentially exhausting system memory and severely reducing performance due to paging. This overhead becomes especially prohibitive in genomics or imaging, where matrices of tens of gigabytes are commonplace. Copying also incurs serialization and deserialization costs when transmitting objects to workers, delaying the onset of actual computation. + +Shared memory frameworks address this issue by allowing workers to view the same physical memory without duplication. Instead of copying the whole object, only small handles or identifiers are communicated, while the underlying data is stored once in RAM. 
This enables efficient multicore computation on large datasets that would otherwise be infeasible. + +ALTREP (short for ALTernative REPresentations) is a framework in R that allows vectors or matrices to be backed by alternative storage while still behaving like ordinary R objects. Method hooks determine how to access length, data pointers, and duplication, so that package developers can integrate external memory management, including shared memory segments, without changing downstream R code. + +A common alternative to in-RAM shared memory is file-backed memory mapping, where binary files on disk are mapped into R and the operating system pages data into memory on demand [@kane2013scalable]. Packages such as `bigmemory` and `bigstatsr` create matrices stored on disk but accessible through R as if they were in memory [@prive2018efficient]. This enables analyses on datasets larger than RAM by working column-wise with a small memory footprint. However, because file-backed matrices rely on disk I/O, they are slower than in-RAM shared memory when a single copy of the data would fit into physical memory, but multiple per-worker copies would not. Therefore, this work focuses on ALTREP-based, in-RAM techniques for data that fit into memory once but not many times. +Our contributions are: + +1. A fully independent and user-friendly implementation based on the ALTREP framework. + +2. A comparison of `memshare` vs `SharedObject`: data model, safety, copy-on-write, and developer surface showing a runtime up to twice as fast for `memshare` on parallel column operations without extra RSS. + +3. A practical template for MI-PDE feature selection on RNA-seq. + +## Background + +A more detailed description of ALTREP internals and our shared-memory use case is provided in Appendix~B; here we summarize only the concepts needed for the comparison of `memshare` to existing approaches. 
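For contrast with the shared-memory approaches reviewed below, the copy-per-worker pattern from the introduction can be reproduced with base R's parallel package alone (a minimal sketch; the matrix size is illustrative):

```r
library(parallel)

# A moderately large matrix; imagine several GB in a real genomics setting.
X <- matrix(rnorm(1e6), nrow = 1000)

cl <- makeCluster(2)                 # PSOCK: separate R processes
clusterExport(cl, "X")               # X is serialized and copied into EACH worker
res <- parSapply(cl, seq_len(ncol(X)),
                 function(j) mean(X[, j]))
stopCluster(cl)
```

Every worker holds a private copy of `X`, so peak memory grows linearly with the worker count; this is exactly the duplication that shared-memory views avoid.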
+ +### `SharedObject` baseline +Having outlined the general ALTREP and shared-memory mechanism, we now briefly review the `SharedObject` package, which serves as our main existing ALTREP-based shared-memory baseline. + +`SharedObject` allocates shared segments and wraps them as ALTREP [@sharedobject2025]. It exposes properties like `copyOnWrite`, `sharedSubset`, and `sharedCopy`; it supports atomic types and (with caveats) character vectors. Developers can map or unmap shared regions and query whether an object is shared. `SharedObject` was among the first implementations that showed how ALTREP can enable multicore parallelism by avoiding data duplication. +`SharedObject` provides `share()` to wrap an R object as a shared ALTREP, with tunables: + +- `copyOnWrite` (default TRUE): duplicates on write; setting FALSE enables in‑place edits but is not fully supported and can lead to surprising behavior (e.g., unary minus mutating the source). + +- `sharedSubset`: whether slices are also shared; can incur extra duplication in some IDEs; often left `FALSE`. + +- `sharedCopy`: whether duplication of a shared object remains shared. +It supports raw, logical, integer, double, or complex and, with restrictions, character (recommended read‑only; new, previously unseen strings cannot be assigned). Developers can also directly allocate, map, unmap, and free shared regions and query `is.shared` or `is.altrep`. + +### R’s threading model +`R`’s C API is single‑threaded; package code must not call R from secondary threads. Process‑level parallelism (clusters) remains the primary avenue. Consequently, shared‑memory frameworks must ensure that mutation is either controlled in the main thread or performed at the raw buffer level without touching R internals. 
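To make the baseline concrete, a minimal `SharedObject` session within the process-parallel model just described might look as follows (`share()` and `is.shared()` are part of its documented API; sizes are illustrative):

```r
library(parallel)
library(SharedObject)    # Bioconductor

X <- share(matrix(rnorm(1e6), nrow = 1000))  # data lives once in shared memory
is.shared(X)                                 # TRUE; X still behaves like a matrix

cl <- makeCluster(2)
clusterExport(cl, "X")   # only the small ALTREP handle is serialized
res <- parSapply(cl, seq_len(ncol(X)), function(j) sum(X[, j]))
stopCluster(cl)
```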
+ +### PDE-based Mutual Information +For feature selection with a discrete response $Y$ and a continuous feature $X$, mutual information can be expressed as: +\begin{equation} +I(X;Y) = \sum_{y} p(y) KL(p(x|y) || p(x)) +\end{equation} + +This formulation requires only univariate densities $p(x)$ and $p(x|y)$ per class. +This lends itself to Pareto Density Estimation (PDE), a density estimator based on hyperspheres with the Pareto radius chosen by an information-optimal criterion. In PDE, the idea is to select a subset S of the data with relative size $p = |S|/|D|$. The information content is $I(p) = -p \ln(p)$. [@ultsch2005pareto] showed that the optimal set size corresponds to about 20.1%, retrieving roughly 88% of the maximum possible information. The unrealized potential (URP) quantifies deviation from the optimal set size and is minimized when $p\approx 20\%$. For univariate density estimation this yields the Pareto radius $R$, which can be approximated by the 18$\%$ quantile distance in one dimension. PDE thus adapts the neighborhood size following the Pareto rule (80–20 rule) to maximize information content. Empirical studies report that PDE can outperform standard density estimators under default settings [@thrun2020analyzing]. + +With respect to the categorical variable, no density estimation is needed, as the most accurate density estimate in this case is simply the relative label count, $p(y) = \frac{\#\{\omega\in\Omega ~|~ Y(\omega) = y\}}{\#\Omega}$. Here $\Omega$ is the set of cases, $Y$ is the categorical random variable, and $y$ runs over the range of $Y$, $Y(\Omega)$. + +## Methods +In idiomatic use, `memshare` coordinates PSOCK clusters (separate processes) that attach to one shared segment. Within workers the relevant shared segments are retrieved and the task is executed on them. 
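Before turning to the API, a self-contained sketch of the per-feature statistic from the preceding section may be useful: mutual information via the density decomposition $I(X;Y) = \sum_y p(y)\, KL(p(x|y)\,||\,p(x))$. For brevity, `stats::density` stands in for PDE here, and the KL divergence is approximated by a Riemann sum on a common grid:

```r
# Sketch: MI score for one continuous feature x against a discrete label y.
# Kernel density is used in place of PDE purely for a dependency-free example.
mi_score <- function(x, y, n_grid = 512) {
  grid <- seq(min(x), max(x), length.out = n_grid)
  px   <- density(x, n = n_grid, from = min(x), to = max(x))$y   # p(x)
  mi   <- 0
  for (lev in unique(y)) {
    py  <- mean(y == lev)                       # p(y): relative label count
    pxy <- density(x[y == lev], n = n_grid,
                   from = min(x), to = max(x))$y                 # p(x|y)
    ok  <- pxy > 0 & px > 0
    kl  <- sum(pxy[ok] * log(pxy[ok] / px[ok])) * diff(grid[1:2])
    mi  <- mi + py * kl
  }
  mi
}

set.seed(1)
x <- c(rnorm(500, 0), rnorm(500, 3))   # feature shifts with the class
y <- rep(c("a", "b"), each = 500)
mi_score(x, y)                         # clearly positive for an informative feature
```

Ranking all d features by such a score per column is the embarrassingly parallel workload to which the shared-memory machinery below is applied.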
The key win is replacing per-worker duplication of the large matrix with a far cheaper retrieval of a handle to the shared memory segment, which is then wrapped in an ALTREP.
+
+### The `memshare` API
+
+Shared memory pages in `memshare` are handled by unique string identifiers on the OS side. These identifiers can be requested and retrieved via `C`/`C++`. Because duplicate allocations can lead to undefined behavior at the OS level, users may define one or more **namespaces** in which the current session operates, preventing two master `R` sessions from accidentally accessing each other’s memory space.
+The `memshare` API closely mirrors `C`’s memory ownership model but applies it to `R` sessions. A master (primary) session owns the memory, while worker (secondary) sessions can access it.
+
+### Shared memory semantics
+
+A crucial aspect of `memshare`’s design is how shared memory is managed and exposed through `R`. Three definitions clarify the terminology:
+
+A **namespace** refers to a character string that defines the identifier of the shared memory context. It allows the initialization, retrieval, and release of shared variables under a common label, ensuring that multiple sessions or clusters can coordinate access to the same objects. While this does not provide absolute protection, it makes it the user’s responsibility to avoid assigning the same namespace to multiple master sessions.
+
+**Pages** are variables owned by the current compilation unit of the code, such as the `R` session or terminal that loaded the DLL. Pages are realized as shared memory objects: on Windows via `MapViewOfFile`, and on Unix systems via `shm` in combination with `mmap`.
+
+**Views** are references to variables owned by another or the same compilation unit. Views are always ALTREP wrappers, providing pointers to the shared memory chunk so that R can interact with them as if they were ordinary vectors or matrices.
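+
+The interplay of these three concepts can be sketched as follows (a schematic example using the API functions introduced in the following subsections; the namespace label and data are ours, and exact signatures may differ):
+
+```r
+library(memshare)
+library(parallel)
+
+ns <- "demo_ns"                      # namespace label (our choice)
+X  <- matrix(rnorm(1e6), 1000, 1000)
+
+# Master session: take ownership of a page named "X" in the namespace.
+registerVariables(ns, list(X = X))
+
+cl <- makeCluster(2)
+res <- parLapply(cl, 1:2, function(i, ns) {
+  library(memshare)
+  v <- retrieveViews(ns, "X")        # worker: ALTREP view, no copy
+  s <- sum(v$X[, i])
+  releaseViews(ns, "X")              # return the view handle
+  s
+}, ns = ns)
+stopCluster(cl)
+
+# Master session: release the page once no views remain.
+releaseVariables(ns, "X")
+```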
+
+Together, these concepts enforce a lifecycle: pages represent ownership of memory segments, views represent references to them, and namespaces serve as the coordination mechanism. The combination guarantees both memory efficiency and safety when performing multicore computations across `R` processes.
+
+If the user detaches the `memshare` package, all handles are destroyed. This means that all variables of all namespaces are cleared, provided there is no other `R` thread still using them. In other words, unloading the package cleans up shared memory regions and ensures that no dangling references remain. Other threads still holding a handle to the memory will prevent this cleanup, as it would invalidate the working memory of those threads. The shared memory is then cleared whenever all handles are released.
+
+#### Master session
+
+A master session takes ownership of a memory page using:
+
+- `registerVariables(namespace, variableList)`
+
+where `variableList` is a named list of supported types: double matrices, double vectors, or lists of these. The names define the memory page IDs through which the variables can be accessed, while the values of the actual variables define the size and content of each memory page.
+
+To deallocate memory pages, the master session can call:
+
+- `releaseVariables(namespace, variableNames)`
+
+where `variableNames` is a character vector containing the names of previously shared variables.
+
+`memshare` also allows releasing all the memory allocated for a given namespace by a memory context, i.e., a parallel cluster with a master session, via the
+
+- `memshare_gc(namespace, cluster)`
+
+function. This first removes every view handle in the context and then releases all pages.
+
+**Note.** Memory pages are not permanently deallocated if another session still holds a view of them. This ensures stability: allowing workers to continue with valid but outdated memory is safer than letting them access invalidated memory.
However, releasing variables still in use is always a user error and must be avoided.
+
+#### Worker session
+
+Once the master session has shared variables, worker sessions can retrieve them via:
+
+- `retrieveViews(namespace, variableNames)`
+
+This returns a named list of `R` objects. These are ALTREP objects indistinguishable from the originals by standard type checks (`is.matrix`, `is.numeric`, etc.), and they behave the same way.
+
+When operating on these objects, workers interact directly with the underlying `C` buffer, backed by `mmap` (Unix) or `MapViewOfFile` (Windows). Changes to such objects modify the shared memory for all sessions. In this framework, however, modification is secondary; the main goal is to transfer data from the master to worker sessions.
+
+For metadata access without retrieving views, workers can call:
+
+- `retrieveMetadata(namespace, variableName)`
+
+which provides information for a single variable.
+
+After processing, workers must return their views to allow memory release by calling:
+
+- `releaseViews(namespace, variableNames)`
+
+The overall high-level concept is summarized in Figure \@ref(fig:figurememshare).
+
+```{r figurememshare, out.width = "100%", fig.cap = "A schematic of where the memory is located and how different sessions access it.", echo=FALSE}
+knitr::include_graphics("figures/Grafik_Memshare.png")
+```
+
+#### Diagnostic tools
+
+To verify correct memory management, two diagnostic functions are available:
+
+- `pageList()`: lists all variables owned by the current session.
+- `viewList()`: lists all views (handles) currently held by the session.
+
+The former is stricter, since it identifies ownership, whereas the latter only tracks held views.
+
+#### User-friendly wrapper functions for `apply` and `lapply`
+
+Since memory ownership and address sharing are low-level concepts, `memshare` provides wrapper functions that mimic `parallel::parApply` and `parallel::parLapply`.
+
+- `memApply(X, MARGIN, FUN, NAMESPACE, CLUSTER, VARS, MAX.CORES)`
+  Mimics `parallel::parApply`.
+  - `X`: a double matrix.
+  - `MARGIN`: direction (`1 = row-wise`, `2 = column-wise`).
+  - `FUN`: function applied to rows/columns.
+  - `CLUSTER`: a prepared parallel cluster (variables exported via `parallel::clusterExport`).
+  - `VARS`: additional shared variables (names must match `FUN` arguments).
+  - `MAX.CORES`: only relevant if `CLUSTER` is uninitialized.
+
+`memApply` automatically manages sharing and cleanup of `X` and `VARS`, ensuring no residual `C`/`C++` buffers remain. Both `X` and `VARS` can also refer to previously allocated shared variables, though in that case the user must manage their lifetime.
+
+- `memLapply(X, FUN, NAMESPACE, CLUSTER, VARS, MAX.CORES)`
+  Equivalent to `parallel::parLapply`, but within the `memshare` framework.
+  - `X`: a list of double matrices or vectors.
+  - Other arguments behave the same way as in `memApply`.
+
+### Examples of Use
+We provide two top-level examples for the use of the `memshare` package: one with `memApply` and one with `memLapply`.
+
+The first example computes the correlation between each column of a matrix and a reference vector using shared memory and `memApply`. The matrix can be provided directly and will be registered automatically, or by name if already registered.
+
+```{r apply, echo = TRUE}
+library(memshare)
+set.seed(1)
+n <- 10000
+p <- 2000
+# Numeric double matrix (required): n rows (cases) x p columns (features)
+X <- matrix(rnorm(n * p), n, p)
+# Reference vector to correlate with each column
+y <- rnorm(n)
+f <- function(v, y) cor(v, y)
+
+ns <- "my_namespace"
+res <- memshare::memApply(
+  X = X, MARGIN = 2,
+  FUN = f,
+  NAMESPACE = ns,
+  VARS = list(y = y),
+  MAX.CORES = NULL # defaults to detectCores() - 1
+)
+```
+
+`memApply` parallelizes a row- or column-wise map over a matrix that lives once in shared memory.
If `X` is passed as an ordinary `R` matrix, it is registered under a generated name in the namespace `ns`. Additional variables (here `y`) can be provided as a named list; these are registered and retrieved as ALTREP views on the workers. A cluster is created automatically if none is provided. Each worker obtains a cached view of the matrix (and any shared variables), extracts the i-th row or column as a vector `v` according to `MARGIN`, calls `FUN(v, ...)`, and returns the result. Views are released after the computation, and any objects registered by this call are freed. Because workers operate on shared views rather than copies, the total resident memory remains close to a single in-RAM copy of `X`, while runtime scales with the available cores.
+
+As a second example, consider a case where a list of 1000 random matrices is multiplied by a random vector. This task is parallelizable at the element level and demonstrates the use of `memshare::memLapply`, which applies a function across list elements in a shared memory context:
+
+```{r lapply, echo = TRUE}
+library(memshare)
+list_length <- 1000
+matrix_dim <- 100
+
+# Create the list of random matrices
+l <- lapply(
+  1:list_length,
+  function(i) matrix(rnorm(matrix_dim * matrix_dim),
+                     nrow = matrix_dim, ncol = matrix_dim)
+)
+
+y <- rnorm(matrix_dim)
+namespace <- "my_namespace"
+
+res <- memLapply(l, function(el, y) {
+  el %*% y
+}, NAMESPACE = namespace, VARS = list(y = y),
+   MAX.CORES = 1) # MAX.CORES = 1 for simplicity
+```
+
+`memLapply()` provides a parallel version of `lapply()` where the list elements and optional auxiliary variables are stored in shared memory. If the input `X` is an ordinary `R` list, it is first registered in a shared memory namespace. Additional variables can be supplied either as names of existing shared objects or as a named list to be registered. A parallel cluster is created automatically if none is provided, and each worker is initialized with the `memshare` environment.
+
+For each index of the list, the worker retrieves an ALTREP view of the corresponding element (and of any shared variables), applies the user-defined function `FUN` to these objects, and then releases the views to avoid memory leaks. The function enforces that the first argument of `FUN` corresponds to the list element and that the names of shared variables match exactly between the namespace and the function signature. Results are collected with `parLapply`, yielding an ordinary `R` list of the same length as the input.
+
+Because only lightweight references to the shared objects are passed to the workers, no duplication of large data occurs, making the approach memory-efficient. Finally, `memLapply()` includes cleanup routines to release temporary registrations, stop the cluster if it was created internally, and free shared memory, ensuring safe reuse in subsequent computations.
+
+### Benchmark design
+
+We compare `memshare` and `SharedObject` on a column-wise apply task across square matrices of sizes \(10^i \times 10^i\) for \(i = 1,\ldots,5\). We use a PSOCK cluster with 32 cores on an iMac Pro, 256 GB DDR4, 2.3 GHz 18-core Intel Xeon W on macOS Sequoia 16.6.1 (24G90) with R 4.5.1 `x86_64-apple-darwin20`. For each size, we run 100 repetitions and record wall-clock times and resident set size (RSS) across all worker PIDs plus the master. The RSS is summed via `ps()` and our helper `total_rss_mb()`.
+We define the memory overhead as the difference in total RSS before and after the call, i.e., we measure the additional memory required by the computation beyond the base process footprint.
+
+For `SharedObject` we create a shared copy `A2 <- share(A1)` and run `parApply()` on it; for `memshare` we call `memApply` directly on `A1` with a namespace, so that only ALTREP views are created on the workers. A serial baseline uses `apply()`, and an additional baseline uses `parApply()` without shared memory.
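+
+The RSS bookkeeping can be sketched as follows (a hypothetical reimplementation of our helper `total_rss_mb()` on top of the `ps` package; the actual helper may differ in detail):
+
+```r
+library(ps)
+
+# Hypothetical sketch: sum the resident set size (in MB) over the
+# master process and all worker PIDs.
+total_rss_mb <- function(pids) {
+  rss_bytes <- vapply(pids, function(pid) {
+    ps_memory_info(ps_handle(pid))[["rss"]]
+  }, numeric(1))
+  sum(rss_bytes) / 2^20
+}
+
+# Usage: PIDs of the master plus the cluster workers, e.g.
+# total_rss_mb(c(Sys.getpid(), worker_pids))
+```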
A minimally edited version of the full script (setup, PID collection, loops, and data saving) is provided in Appendix~A to ensure reproducibility. + +As part of our safety and lifecycle checks, we ensure that views, which keep shared segments alive, are always released in the workers before returning control. Once all work is complete, the corresponding variables are then released in the master. To maintain fairness, we avoid creating incidental copies, such as those introduced by coercions, remove variables with `rm()`, and use R's garbage collection `gc()` after each call. + +### RNA-seq dataset via FireBrowse + +FireBrowse [@firebrowse2025] delivers raw counts of gene expression indexed by NCBI identifiers. For each gene identifier \(i\) (from Ensembl or NCBI), we obtain a raw count \(r_i\) that quantifies the observed read abundance. These raw counts represent the number of reads mapped to each gene, without length normalization. To convert raw counts into TPM (transcripts per million) [@li2010rsem], we require gene or transcript lengths \(l_i\). For each gene \(i\), we compute: + +\begin{equation} +\hat{r}_i=\frac{r_i}{l_i} +\end{equation} + +The total sum $R = \sum_i \hat{r}_i$ across all genes is then used to scale values as: + +\begin{equation} +TPM_i=\frac{\hat{r}_i}{R} \times 10^6 +\end{equation} +This transformation allows comparison of expression levels across genes and samples by correcting for gene length and sequencing depth [@li2010rsem]. After transformation, our dataset consists of \(d = 19,637\) gene expressions across \(N = 10,446\) cases spanning 32 diagnoses. It can be found under [@thrun2025genexpressions]. + +## Results + +In the first subsection, the efficiency of `memshare` is compared to `SharedObject`, and in the second subsection the application is presented. + +### Performance and Memory + +In Figure \@ref(fig:figure1), the results for square matrices of increasing magnitudes are shown as summary plots with variability bars. 
In these error-bar-style plots the bars indicate the median ± AMAD.
+The bottom subfigures (C–D) provide a zoomed view of the first four matrix sizes, while the top subfigures (A–B) display all five magnitudes from \(10^1\) to \(10^5\).
+The x-axis represents the magnitude of the matrix size, and the y-axis shows either runtime in seconds (subfigures A and C) or memory overhead in megabytes (subfigures B and D). Memory overhead is measured as the difference in total RSS. For each magnitude, 100 trials were conducted. The magenta line indicates the performance of the single-threaded R baseline.
+
+Table~\@ref(tab:median-res-tab-static) reports the median runtime and memory overhead, while variability is visualized in Figure~\@ref(fig:figure1-detail) via the scatter of 100 runs and summarized numerically by the robust AMAD dispersion statistic in Table~\@ref(tab:amad-res-tab-static).
+
+`memshare` (orange diamonds) consistently outperforms `SharedObject` (blue circles) in Figure \@ref(fig:figure1). In the scatter plot of the 100 trials in Figure~\@ref(fig:figure1-detail), it is evident that for the first three magnitudes both packages use less memory than the baseline. At \(10^4\), average memory usage is comparable to the baseline, while at \(10^5\), `memshare` slightly exceeds it. `SharedObject`, however, could only be executed at this magnitude from the terminal, but not within RStudio (see Appendix~D).
+
+Considering relative differences [@Ultsch2008], `memshare` achieves computation times that are 90--170\% faster than `SharedObject`. For matrices of size \(10^2\) and larger, memory consumption is reduced by 132--153\% compared to `SharedObject`.
+
+```{r figure1, out.width = "100%", fig.cap = "Matrix size depicted as magnitude vs median runtime (left) and vs memory overhead (MB) during the run relative to idle (right) for `memshare` and `SharedObject` as error-bar-style plots with intervals given by the median ± AMAD across 100 runs.
In addition, the serial baseline is shown as a line in magenta. The top subfigures present the full range of matrix sizes, and the bottom subfigures zoom in.", echo=FALSE} +knitr::include_graphics("figures/Figure1.png") +``` + +Table: (\#tab:median-res-tab-static) The benchmark compares four types: `memshare`, `SharedObject`, a single-threaded baseline, and a parallel baseline. For `memshare` and `SharedObject`, the reported values are the medians over 100 iterations, while the baselines are the result from either a single-threaded R or a simple `parApply` run using one iteration. Magnitude refers to the matrix size. Entries are given as *Time Consumed (Memory Overhead)*, where time is measured in seconds and memory in megabytes (MB); the memory after call is mentioned in Appendix C. + +| Type / Magnitude | Baseline | Baseline parApply | SharedObject | memshare | +|-----------------------------|--------------------------|----------------------------|----------------------------|---------------------------| +| 1 | 0.0003 (0.0234) | 0.0049 (0.9023) | 0.2492 (0.0801) | 0.0416 (1.1426) | +| 2 | 0.0008 (0.1461) | 0.0034 (0.1461) | 0.2531 (0.5117) | 0.0419 (0.3594) | +| 3 | 0.0231 (15.2656) | 0.0356 (7.6406) | 0.3238 (11.6387) | 0.0481 (1.4688) | +| 4 | 2.2322 (1664.9727) | 3.5015 (2627.1133) | 1.5526 (1570.0566) | 0.6655 (895.4473) | +| 4.2 | 9.2883 (4490.4648) | 12.9872 (6040.4023) | 3.1147 (3901.5137) | 1.6223 (3881.7441) | +| 4.5 | 36.3783 (17206.4688) | 53.6183 (24983.7852) | 10.3513 (15391.0020) | 6.6583 (15285.4258) | +| 4.7 | 92.0136 (39355.5703) | 130.6937 (62936.4766) | 32.3116 (38533.7266) | 16.3157 (38389.7305) | +| 5 | 217.4490 (157812.9492) | -- | 128.0942 (152967.0273) | 67.0000 (76311.7402) | + +Table: (\#tab:amad-res-tab-static) AMAD for the benchmark of `SharedObject` vs `memshare`. 
+
+| Magnitude | Type | Time Consumed (seconds) | Memory Overhead (MB) | Memory after Call |
+|--------------|--------------|-----------|-------------------------|----------------------|
+| 1 | SharedObject | 0.01241 | 0.06400 | 7.47613 |
+| 2 | SharedObject | 0.00549 | 0.05270 | 12.85171 |
+| 3 | SharedObject | 0.00743 | 0.23716 | 109.60844 |
+| 4 | SharedObject | 0.05179 | 9.98323 | 364.98253 |
+| 4.2 | SharedObject | 0.13739 | 9.28681 | 533.72239 |
+| 4.5 | SharedObject | 0.63438 | 14.73392 | 1619.81984 |
+| 4.7 | SharedObject | 2.09502 | 15.81807 | 823.25853 |
+| 5 | SharedObject | 4.01802 | 22.88011 | 555.74045 |
+| 1 | memshare | 0.00198 | 1.41166 | 51.33531 |
+| 2 | memshare | 0.00251 | 0.27480 | 17.12808 |
+| 3 | memshare | 0.00228 | 1.91232 | 78.65743 |
+| 4 | memshare | 0.02038 | 110.18064 | 5956.56036 |
+| 4.2 | memshare | 0.03206 | 39.21014 | 2346.06943 |
+| 4.5 | memshare | 0.24571 | 36.97784 | 1086.58683 |
+| 4.7 | memshare | 0.53134 | 89.44624 | 4081.05399 |
+| 5 | memshare | 2.48280 | 31.98623 | 2413.87405 |
+
+### Application to Feature Selection by Mutual Information using Pareto Density Estimation
+
+The computation of mutual information produced values ranging from 0 to 0.54 (Figure \@ref(fig:figure2)). The QQ-plot shows a clear deviation from a straight line, indicating that the distribution is not Gaussian. Both the histogram and the PDE plot provide consistent estimates of the probability density, revealing a bimodal structure. The boxplot further highlights the presence of outliers with values above 0.4.
+
+The analysis required about two hours of computation time and approximately 47 GB of RAM, which was feasible only through memory sharing. In practice, mutual information values can guide feature selection, either by applying a hard threshold or by using a soft approach via a mixture model, depending on the requirements of the subsequent machine learning task.
+ +```{r figure2,out.width = "100%", fig.cap = "The distribution of mutual information for 19637 gene expressions as a histogram, Pareto Density Estimation (PDE), QQ-plot against normal distribution and boxplot. There are no missing values (NaN).", fig.path='figures/'} +Header <- readLines(file.path(dir,"data/MI_values.lrn"), n = 2)[2] + +mi_values <- read.table(file = file.path(dir,"data/MI_values.lrn"),header = TRUE,sep = "\t",skip = 5) + +DataVisualizations::InspectVariable(mi_values$MI,Name = "Distribution of Mutual Information") +#length(mi_values$MI) +#Header +``` + +\newpage + +## Discussion + +On all major platforms, PSOCK clusters execute each worker in a separate R process. Consequently, a call to `parApply(cl, X, MARGIN, FUN, …)` requires R to serialize the matrix to transmit it to each worker, and deserialize it into a full in-memory copy in every process. +As shown in Table \@ref(tab:median-res-tab-static), this replication leads to out-of-memory (OOM) failures once the matrix reaches a size of \(10^5 \times 10^5\), despite the machine providing 256 GB of RAM. +Even for smaller magnitudes, substantial redundant memory allocation occurs: multiple workers may begin materializing private copies of the matrix faster than they can complete their portion of the computation, resulting in transient but significant memory amplification. This behavior, inherent to PSOCK-based parallelization, explains the observed OOM conditions under `parApply`. + +Consequently, shared memory becomes a foundational requirement for scalable high-performance computing in `R`, because it avoids redundant data replication and stabilizes memory usage as problem sizes grow. +In our experiments, this advantage is reflected directly in performance: across matrix sizes, `memshare` achieved a two-fold reduction in median computation time compared to `SharedObject` on the column-wise task. 
For large matrix sizes, both `memshare` and `SharedObject` show a lower total memory overhead than the single-threaded baseline, because `R`’s serial `apply()` implementation creates substantial temporary objects. Each column extraction allocates a full-length numeric vector, and additional intermediates are produced by `FUN`. These private allocations inflate the baseline’s RSS, and memory is not promptly returned to the OS due to R’s garbage-collection strategy. In contrast, `memshare` and `SharedObject` provide ALTREP-backed views into a shared memory segment, eliminating the need to materialize full column vectors.
+
+For matrices of size \(10^2\) and larger, `memshare`’s memory overhead was between half and a third of that of `SharedObject`. At the smallest size (\(10^1\)), `memshare` consumed more memory than `SharedObject` because, in R, the metadata must also be shared; this requires a second shared-memory segment whose fixed overhead dominates at small sizes.
+
+The experiments show that `SharedObject` exhibited overhead consistent with copy-on-write materializations and temporary object creation up to size \(10^4\). Its memory usage was approximately an order of magnitude higher on macOS than that of `memshare` or the single-threaded baseline, as illustrated by the triangles aligning with the single-threaded baseline of a higher magnitude in Figure \@ref(fig:figure1-detail). For matrices of size \(10^5\), `SharedObject` caused RStudio to crash (see Appendix~D), although results were computable in R via the terminal.
+
+Beyond these synthetic benchmarks, the RNA-seq case study illustrates how this computational behavior translates into a practical high-dimensional analysis pipeline. Biologically, the bimodal structure in Figure~\@ref(fig:figure2) is consistent with a separation between largely uninformative background genes and a smaller subset of diagnosis- or subtype-specific markers.
Genes in the lower mode, i.e., with MI values close to zero, show little association with the 32 diagnostic labels and plausibly correspond to broadly expressed housekeeping or pathway genes whose expression varies only weakly across cancer types. In contrast, genes in the upper mode, i.e., MI values in the right-hand peak, exhibit strong dependence on the diagnostic label and are therefore candidates for disease- or tissue-specific markers, including lineage markers and immune-related genes that differ systematically between tumor entities. Although a detailed pathway or gene-set enrichment analysis is beyond the scope of this work, the presence of a distinct high-MI mode indicates that the PDE-based mutual information filter successfully highlights genes whose expression patterns are highly structured with respect to the underlying diagnostic classes and likely reflect underlying molecular subtypes. + +At the same time, the fact that this analysis required approximately two hours of computation time and about 47~GB of RAM underscores the central role of memory sharing: without `memshare`, running such a MI-PDE filter on 19,637 genes and 10,446 cases in parallel would be prohibitively memory-intensive on typical multicore hardware. In sum, `memshare` provides a more stable memory-sharing interface, scales more effectively to large matrix sizes, and achieves greater computational efficiency than `SharedObject`. + +## Summary + +Regardless of the package, `R`'s single-threaded API implies that multi-threaded computation should touch only raw memory and synchronize results at the main thread boundary. Shared mutation requires external synchronization if multiple workers write to overlapping regions. In practice, read-mostly patterns are ideal. + +Here, `memshare`'s namespace + view model and memApply wrapper simplify cross-process sharing compared to manual `share()` + cluster wiring. 
Its explicit `releaseViews`/`releaseVariables` lifecycle makes retention and cleanup auditable. `SharedObject`'s fine-grained properties are powerful, but the interaction of copy-on-write and duplication semantics increases cognitive load.
+
+`memshare` combines ALTREP-backed shared memory with a pragmatic parallel API to deliver strong speed and memory efficiency on multicore systems. In analytic pipelines like MI-based feature selection for RNA-seq, this enables simple, scalable patterns---one in-RAM copy of the data, many cores, and no serialization overhead.
+
+## Appendix A: code listing of benchmark {.appendix}
+
+The full code is accessible via [@thrun2025mem_appendixa]. To avoid the crash message for `SharedObject` shown in Appendix~D, we tried manual garbage collection and performed saves for each iteration without changing the outcome. Only by restricting usage to the `R` terminal console without RStudio were we able to compute all results.
+
+## Appendix B: ALTREP and shared memory in R {.appendix}
+
+In R, ALTREP (short for ALTernate REPresentations) is a framework introduced in version 3.5.0 that allows vectors and other objects to be stored and accessed in non-standard ways while maintaining their usual R interface. Built-in type checks cannot tell the difference between an ALTREP object and its ordinary counterpart, which ensures compatibility.
+
+Instead of relying solely on R's default contiguous in-memory arrays, ALTREP permits objects such as integers, doubles, or strings to be backed by alternative storage mechanisms. Developers can override fundamental methods that govern vector behavior---such as length queries, element access (`DATAPTR`, `DATAPTR_OR_NULL`, etc.), duplication, coercion, and even printing---so that objects can behave normally while drawing data from different sources.
+
+Because these overrides are transparent to R's higher-level functions, ALTREP objects can be passed, transformed, and manipulated like regular vectors, regardless of whether their contents reside in memory, on disk, or are computed lazily.
+
+For package authors, this framework makes it possible to expose objects that look identical to standard R vectors but internally retrieve their data from sources like memory-mapped files, shared memory, compressed formats, or custom C++ buffers. In practice, this enables efficient handling of large datasets and unconventional data representations while keeping existing R code unchanged.
+
+## Appendix C: benchmarking in detail {.appendix}
+
+In Figure \@ref(fig:figure1-detail), the results are presented as scatter plots for square matrices of increasing magnitudes. The left panel shows detailed scatter plots for the first three matrix sizes, while the right panel summarizes all five magnitudes from \(10^1\) to \(10^5\). The x-axis represents computation time (log seconds), and the y-axis represents memory overhead (log megabytes), measured as the difference in total RSS. Each point corresponds to one of 100 trials per magnitude. The magenta baseline indicates the performance of a single-threaded R computation.
+
+
+
+
+
+```{r figure1-detail, out.width = "100%", fig.cap = "Median runtime (log-scale) vs matrix size for `memshare`, `SharedObject`, and serial baseline; ribbons show IQR across 100 runs. Insets show the difference in total RSS in log(MB), i.e., the memory overhead, during the run relative to idle for Mac, presenting the details of Figure 1.", echo=FALSE}
+knitr::include_graphics("figures/Figure1_appendix_secs_vs_Resident_Set_Size_mac.png")
+```
+
+To compute the results for Figure \@ref(fig:appendix-figure1), we used a PSOCK cluster with 15 workers on a different iMac, namely, 128 GB DDR4, 3.8 GHz 8-Core Intel Xeon W with Windows 10 on Boot Camp, and R 4.5.1.
For each size, we run 100 repetitions and record wall-clock times and resident set size (RSS) across all worker PIDs plus the master.
+The results of the benchmark on Windows are presented in Tables \@ref(tab:appendix-res-tab-static) and \@ref(tab:appendix-amad-res-tab-static), and in Figure \@ref(fig:appendix-figure1).
+
+Within macOS Tahoe, for `SharedObject`, the memory after a call increases from about 3062 MB at \(10^1\) to 173,083 MB at \(10^5\), i.e., from roughly 3 GB to 169 GB.
+For `memshare`, the memory after a call grows from about 3490 MB at \(10^1\) to 128,393 MB at \(10^5\), i.e., from roughly 3.5 GB to 125 GB.
+Based on these numbers, `memshare` uses slightly more memory after the call than `SharedObject` for small and medium problem sizes (exponents \(10^1\) to \(10^{4.7}\)),
+but at the largest matrix size \(10^5\) `memshare` requires substantially less overall memory than `SharedObject`. However, within Windows~10 the situation between `SharedObject` and `memshare` changes, as depicted in Table \@ref(tab:appendix-res-tab-static).
+
+It is important to emphasize that our benchmark was conducted on specific and somewhat idiosyncratic hardware, i.e., a 2021 iMac running Windows~10 via Boot Camp. This configuration combines Apple hardware, Apple’s firmware and drivers, and Microsoft’s operating system and memory manager in a way that is not representative of typical server or workstation deployments. As a consequence, low-level aspects that are relevant for shared-memory performance, such as page allocation strategy, memory-mapped file handling, and the interaction between R, BLAS, and the operating system scheduler, may differ substantially on other platforms. We therefore refrain from drawing broader conclusions about absolute performance or cross-platform behavior from this benchmark and instead interpret the results as a comparative case study of `memshare` versus `SharedObject` on this concrete, well-specified environment.
+ +Table: (\#tab:appendix-res-tab-static) Median runtime and memory overhead for the benchmark on Windows 10 via Boot Camp. + +| Type | Magnitude | Time Consumed (seconds) | Memory Overhead (MB) | Memory after Call | +|------------------|-----------|-------------------------|----------------------|-------------------| +| SharedObject | 1 | 0.0047 | 0.0391 | 2870.6816 | +| SharedObject | 2 | 0.0124 | 0.4648 | 2873.7266 | +| SharedObject | 3 | 0.0707 | 13.3691 | 2929.9121 | +| SharedObject | 4 | 1.0768 | 2594.9395 | 5799.9766 | +| SharedObject | 4.2 | 2.0788 | 7029.2070 | 10131.3750 | +| SharedObject | 4.5 | 6.1894 | 23832.9727 | 26923.1855 | +| SharedObject | 4.7 | 16.2314 | 38973.7051 | 42078.9980 | +| memshare | 1 | 0.0417 | 1.3848 | 1609.0801 | +| memshare | 2 | 0.0412 | 1.3145 | 1619.4062 | +| memshare | 3 | 0.0487 | 3.3945 | 1675.7246 | +| memshare | 4 | 0.6164 | 764.8184 | 2823.6172 | +| memshare | 4.2 | 1.4858 | 3841.7480 | 5881.9570 | +| memshare | 4.5 | 5.1790 | 15275.3418 | 17298.4473 | +| memshare | 4.7 | 13.7764 | 38336.5898 | 40407.6250 | +| Baseline | 1 | 0.0002 | 0.0000 | 796.5848 | +| Baseline | 2 | 0.0008 | 0.0021 | 796.8240 | +| Baseline | 3 | 0.0138 | 0.1461 | 816.7769 | +| Baseline | 4 | 1.6573 | 1686.0985 | 2542.8888 | +| Baseline | 4.2 | 9.1679 | 4356.6723 | 5234.8967 | +| Baseline | 4.5 | 34.7380 | 17936.2663 | 18959.0366 | +| Baseline | 4.7 | 86.3598 | 45320.6208 | 46571.1692 | +| Baseline Parallel| 1 | 0.0041 | 0.9023 | 78916.7539 | +| Baseline Parallel| 2 | 0.0043 | 0.0000 | 739.4570 | +| Baseline Parallel| 3 | 0.0389 | 7.1484 | 765.4844 | +| Baseline Parallel| 4 | 1.9595 | 1053.2031 | 3416.6445 | +| Baseline Parallel| 4.2 | 10.6068 | 2621.7109 | 6546.7422 | +| Baseline Parallel| 4.5 | 44.0077 | 23568.1680 | 24471.5742 | +| Baseline Parallel| 4.7 | 108.1856 | 58851.6836 | 59758.3945 | + +Table: (\#tab:appendix-amad-res-tab-static) AMAD for the benchmark grid for `SharedObject` and `memshare` on Windows 10 via Boot Camp. 
| Type | Magnitude | Time Consumed (seconds) | Memory Overhead (MB) | Memory after Call |
|--------------|-----------|-------------------------|----------------------|-------------------|
| SharedObject | 1 | 0.00065 | 0.00000 | 0.38021 |
| SharedObject | 2 | 0.00126 | 0.00000 | 0.26351 |
| SharedObject | 3 | 0.00855 | 2.11560 | 1.31002 |
| SharedObject | 4 | 0.02707 | 5.14219 | 2.51839 |
| SharedObject | 4.2 | 0.02521 | 18.96512 | 2.62003 |
| SharedObject | 4.5 | 0.06582 | 43.49404 | 29.24573 |
| SharedObject | 4.7 | 0.22797 | 186.75258 | 181.13231 |
| memshare | 1 | 0.00558 | 0.99004 | 9.55032 |
| memshare | 2 | 0.00411 | 1.33637 | 6.46726 |
| memshare | 3 | 0.00373 | 6.80983 | 8.52263 |
| memshare | 4 | 0.01511 | 304.55616 | 114.84474 |
| memshare | 4.2 | 0.02341 | 158.45925 | 99.27513 |
| memshare | 4.5 | 0.04168 | 173.53949 | 125.11030 |
| memshare | 4.7 | 0.09361 | 148.23133 | 148.96163 |

```{r appendix-figure1, out.width = "100%", fig.cap = "Median runtime (log-scale) vs matrix size for `memshare`, `SharedObject`, and serial baseline; ribbons show IQR across 100 runs. Insets show the difference in total RSS in log(MB) (i.e., memory overhead) during the run relative to idle for Windows~10 via Boot Camp.", echo=FALSE}
knitr::include_graphics("figures/Figure1_appendix_secs_vs_Resident_Set_Size_win.png")
```

## Appendix D: Screenshots {.appendix}

Figure \@ref(fig:app-a-1) shows a screenshot of the crash of RStudio that occurs if `SharedObject` is called with a matrix of size \(10^5\); Figure \@ref(fig:app-a-2) shows the subsequent state after forcing RStudio to close.
```{r app-a-1, out.width = "100%", fig.cap = "First screenshot of the SharedObject computation.", echo=FALSE}
knitr::include_graphics("figures/Crash_message1.png")
```

```{r app-a-2, out.width = "100%", fig.cap = "Second screenshot of the SharedObject computation.", echo=FALSE}
knitr::include_graphics("figures/Crash_message2.png")
```
    +

    memshare: Memory Sharing for Multicore Computation in R with an Application to Feature Selection by Mutual Information using PDE

    + + + +

    We present memshare, a package that enables shared-memory multicore computation in R by allocating buffers in C++ shared memory and exposing them to R through ALTREP views. We compare memshare to SharedObject (Bioconductor), discuss semantics and safety, and report a 2x speedup over SharedObject with no additional resident memory in a column-wise apply benchmark. Finally, we illustrate a downstream analytics use case: feature selection by mutual information, in which densities are estimated per feature via Pareto Density Estimation (PDE). The analytical use case is an RNA-seq dataset consisting of N = 10,446 cases and d = 19,637 gene expression measurements, requiring roughly n_threads * 10 GB of memory when parallel R sessions are used. Use cases of this size and larger are common in big data analytics, where R's per-worker copying becomes a limiting factor; the package presented in this work mitigates this limitation.

    +
    + + + +
    +

    1 Introduction

    +

    Parallel computing in R is usually realized through PSOCK or FORK clusters, where multiple R processes work in parallel (R Core Team 2025; Corporation and Weston 2025). A practical issue arises immediately: each worker process receives its own private copy of the data. If a matrix consumes several gigabytes in memory, then spawning ten workers results in ten redundant copies, potentially exhausting system memory and severely reducing performance due to paging. This overhead becomes especially prohibitive in genomics or imaging, where matrices of tens of gigabytes are commonplace. Copying also incurs serialization and deserialization costs when transmitting objects to workers, delaying the onset of actual computation.

    +

    Shared memory frameworks address this issue by allowing workers to view the same physical memory without duplication. Instead of copying the whole object, only small handles or identifiers are communicated, while the underlying data is stored once in RAM. This enables efficient multicore computation on large datasets that would otherwise be infeasible.

    +

    ALTREP (short for ALTernative REPresentations) is a framework in R that allows vectors or matrices to be backed by alternative storage while still behaving like ordinary R objects. Method hooks determine how to access length, data pointers, and duplication, so that package developers can integrate external memory management, including shared memory segments, without changing downstream R code.

    +

    A common alternative to in-RAM shared memory is file-backed memory mapping, where binary files on disk are mapped into R and the operating system pages data into memory on demand (Kane et al. 2013). Packages such as bigmemory and bigstatsr create matrices stored on disk but accessible through R as if they were in memory (Prive et al. 2018). This enables analyses on datasets larger than RAM by working column-wise with a small memory footprint. However, because file-backed matrices rely on disk I/O, they are slower than in-RAM shared memory when a single copy of the data would fit into physical memory, but multiple per-worker copies would not. Therefore, this work focuses on ALTREP-based, in-RAM techniques for data that fit into memory once but not many times. +Our contributions are:

    +
      +
    1. A fully independent and user-friendly implementation based on the ALTREP framework.
    2. A comparison of memshare vs SharedObject: data model, safety, copy-on-write, and developer surface, showing a runtime up to twice as fast for memshare on parallel column operations without extra RSS.
    3. A practical template for MI-PDE feature selection on RNA-seq.
    +

    2 Background

    +

    A more detailed description of ALTREP internals and our shared-memory use case is provided in Appendix~B; here we summarize only the concepts needed for the comparison of memshare to existing approaches.

    +

    SharedObject baseline

    +

    Having outlined the general ALTREP and shared-memory mechanism, we now briefly review the SharedObject package, which serves as our main existing ALTREP-based shared-memory baseline.

    +

    SharedObject allocates shared segments and wraps them as ALTREP (Wang and Morgan 2025). It exposes properties like copyOnWrite, sharedSubset, and sharedCopy; it supports atomic types and (with caveats) character vectors. Developers can map or unmap shared regions and query whether an object is shared. SharedObject was among the first implementations that showed how ALTREP can enable multicore parallelism by avoiding data duplication. +SharedObject provides share() to wrap an R object as a shared ALTREP, with tunables:

    +
      +
    • copyOnWrite (default TRUE): duplicates on write; setting FALSE enables in-place edits but is not fully supported and can lead to surprising behavior (e.g., unary minus mutating the source).
    • sharedSubset: whether slices are also shared; can incur extra duplication in some IDEs; often left FALSE.
    • sharedCopy: whether a duplicate of a shared object remains shared.

    It supports raw, logical, integer, double, or complex types and, with restrictions, character vectors (recommended read-only; new, previously unseen strings cannot be assigned). Developers can also directly allocate, map, unmap, and free shared regions and query is.shared or is.altrep.
    +

    R’s threading model

    +

    R’s C API is single‑threaded; package code must not call R from secondary threads. Process‑level parallelism (clusters) remains the primary avenue. Consequently, shared‑memory frameworks must ensure that mutation is either controlled in the main thread or performed at the raw buffer level without touching R internals.

    +

    PDE-based Mutual Information

    +

    For feature selection with a discrete response \(Y\) and a continuous feature \(X\), mutual information can be expressed as: +\[\begin{equation} +I(X;Y) = \sum_{y} p(y) KL(p(x|y) || p(x)) +\end{equation}\]

    +

    This formulation requires only univariate densities \(p(x)\) and \(p(x|y)\) per class. This lends itself to Pareto Density Estimation (PDE), a density estimator based on hyperspheres with the Pareto radius chosen by an information-optimal criterion. In PDE, the idea is to select a subset \(S\) of the data with relative size \(p = |S|/|D|\). The information content is \(I(p) = -p \ln(p)\). Ultsch (2005) showed that the optimal set size corresponds to about 20.1%, retrieving roughly 88% of the maximum possible information. The unrealized potential (URP) quantifies deviation from the optimal set size and is minimized when \(p\approx 20\%\). For univariate density estimation this yields the Pareto radius \(R\), which can be approximated by the 18\(\%\) quantile distance in one dimension. PDE thus adapts the neighborhood size following the Pareto rule (80–20 rule) to maximize information content. Empirical studies report that PDE can outperform standard density estimators under default settings (Thrun et al. 2020).

    +

    With respect to the categorical variable, no density estimation is needed, as the most accurate density estimate in this case is simply the relative label count, \(p(y) = \frac{\#\{\omega\in\Omega ~|~ Y(\omega) = y\}}{\#\Omega}\). Here \(\Omega\) is the set of cases, \(Y\) is the categorical random variable, and \(y\) runs over the range of \(Y\), \(Y(\Omega)\).
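The decomposition above can be sketched in a few lines of R. The snippet below is illustrative only: base R's density() stands in for PDE (a dedicated PDE routine, e.g. from the DataVisualizations package, could be substituted), and the grid, data, and function name are assumptions, not the package's implementation.

```r
# Illustrative sketch: I(X;Y) = sum_y p(y) KL(p(x|y) || p(x)) for a
# continuous feature x and discrete labels y. density() is a stand-in
# for Pareto Density Estimation.
mi_feature <- function(x, y, n_grid = 512) {
  grid <- seq(min(x), max(x), length.out = n_grid)
  step <- grid[2] - grid[1]
  px   <- approx(density(x), xout = grid)$y           # estimate of p(x)
  mi   <- 0
  for (lvl in unique(y)) {
    sel <- y == lvl
    py  <- mean(sel)                                  # p(y): relative label count
    pxy <- approx(density(x[sel]), xout = grid)$y     # estimate of p(x|y)
    ok  <- !is.na(px) & !is.na(pxy) & px > 0 & pxy > 0
    # Riemann sum of p(y) * KL(p(x|y) || p(x))
    mi  <- mi + py * sum(pxy[ok] * log(pxy[ok] / px[ok])) * step
  }
  mi
}

set.seed(42)
y <- rep(c(0, 1), each = 500)
x <- rnorm(1000, mean = 2 * y)  # feature shifted by class, so MI is positive
mi_feature(x, y)
```

A class-independent feature would instead yield an MI estimate near zero, which is what the feature-selection filter exploits.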

    +

    3 Methods

    +

    In idiomatic use, memshare coordinates PSOCK clusters (separate processes) that attach to one shared segment. Within workers the relevant shared segments are retrieved and the task is executed on them. The key win is replacing per‑worker duplication of the large matrix with a far cheaper retrieval of a handle to the shared memory segment and subsequent wrapping in an ALTREP instead.

    +

    The memshare API

    +

    Shared memory pages in memshare are handled by unique string identifiers on the OS side. These identifiers can be requested and retrieved via C/C++. Because duplicate allocations can lead to undefined behavior at the OS level, users may define one or more namespaces in which the current session operates; this prevents two master R sessions from accidentally accessing each other's memory space. The memshare API closely mirrors C's memory ownership model but applies it to R sessions. A master (primary) session owns the memory, while worker (secondary) sessions can access it.

    +

    Shared memory semantics

    +

    A crucial aspect of memshare’s design is how shared memory is managed and exposed through R. Three definitions clarify the terminology:

    +

    A namespace refers to a character string that defines the identifier of the shared memory context. It allows the initialization, retrieval, and release of shared variables under a common label, ensuring that multiple sessions or clusters can coordinate access to the same objects. While this does not provide absolute protection, it makes it the user’s responsibility to avoid assigning the same namespace to multiple master sessions.

    +

    Pages are variables owned by the current compilation unit of the code, such as the R session or terminal that loaded the DLL. Pages are realized as shared memory objects: on Windows via MapViewOfFile, and on Unix systems via shm in combination with mmap.

    +

    Views are references to variables owned by another or the same compilation unit. Views are always ALTREP wrappers, providing pointers to the shared memory chunk so that R can interact with them as if they were ordinary vectors or matrices.

    +

    Together, these concepts enforce a lifecycle: pages represent ownership of memory segments, views represent references to them, and namespaces serve as the coordination mechanism. The combination guarantees both memory efficiency and safety when performing multicore computations across R processes.

    +

    If the user detaches the memshare package, all handles are destroyed. This means that all variables of all namespaces are cleared, provided there is no other R thread still using them. In other words, unloading the package cleans up shared memory regions and ensures that no dangling references remain. Other threads still holding a handle to the memory will prevent this cleanup, as it would invalidate the working memory of those threads. The shared memory is then cleared whenever all handles are released.

    +
    Master session
    +

    A master session takes ownership of a memory page using:

    +
      +
    • registerVariables(namespace, variableList)
    • +
    +

    where variableList is a named list of supported types: double matrices, double vectors, or lists of these. The names define the memory page IDs through which the variables can be accessed, while the values define the size and content of each memory page.

    +

    To deallocate memory pages, the master session can call:

    +
      +
    • releaseVariables(namespace, variableNames)
    • +
    +

    where variableNames is a character vector containing the names of previously shared variables.

    +

    memshare also allows for releasing all the memory allocated for a given namespace by a memory context, i.e., a parallel cluster with a master session, via the

    +
      +
    • memshare_gc(namespace, cluster)
    • +
    +

    function. This first removes every view handle in the context and then releases all pages.

    +

    Note. Memory pages are not permanently deallocated if another session still holds a view of them. This ensures stability: allowing workers to continue with valid but outdated memory is safer than letting them access invalidated memory. However, releasing variables still in use is always a user error and must be avoided.

    +
    Worker session
    +

    Once the master session has shared variables, worker sessions can retrieve them via:

    +
      +
    • retrieveViews(namespace, variableNames)
    • +
    +

    This returns a named list of R objects. These are raw ALTREP objects indistinguishable from the originals: is.matrix, is.numeric, and related predicates all behave the same.

    +

    When operating on these objects, workers interact directly with the underlying C buffer, backed by mmap (Unix) or MapViewOfFile (Windows). Changes to such objects modify the shared memory for all sessions. In this framework, however, modification is secondary—the main goal is to transfer data from the master to worker sessions.

    +

    For metadata access without retrieving views, workers can call:

    +
      +
    • retrieveMetadata(namespace, variableName)
    • +
    +

    which provides information for a single variable.

    +

    After processing, workers must return their views to allow memory release by calling:

    +
      +
    • releaseViews(namespace, variableNames)
    • +
    +

    The overall high-level concept is summarized in Figure 1.

    +
    +
    +A schematic about where the memory is located and how different sessions access it. +

    +Figure 1: A schematic about where the memory is located and how different sessions access it. +

    +
    +
    +
    Diagnostic tools
    +

    To verify correct memory management, two diagnostic functions are available:

    +
      +
    • pageList(): lists all variables owned by the current session.
      +
    • +
    • viewList(): lists all views (handles) currently held by the session.
    • +
    +

    The former is stricter, since it identifies ownership, whereas the latter only tracks held views.
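Putting the lifecycle together, a minimal single-session sketch using the functions listed above might look as follows. This is hedged: in real use, retrieveViews()/releaseViews() would typically run in worker sessions, the namespace label is arbitrary, and whether a master may view its own pages is assumed here rather than guaranteed.

```r
# Minimal sketch of the page/view lifecycle (illustrative, single session).
library(memshare)

ns <- "demo_ns"                                            # arbitrary namespace label
registerVariables(ns, list(A = matrix(rnorm(20), 4, 5)))   # this session now owns page "A"
print(pageList())                                          # diagnostic: pages owned here

v <- retrieveViews(ns, "A")                                # ALTREP view onto the shared page
stopifnot(is.matrix(v$A))                                  # behaves like an ordinary matrix
print(viewList())                                          # diagnostic: views currently held

releaseViews(ns, "A")                                      # return the view first ...
releaseVariables(ns, "A")                                  # ... then free the page
```

Note the ordering: views are released before the page, mirroring the rule that pages are not deallocated while a view of them is still held.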

    +
    User-friendly wrapper functions for apply and lapply
    +

    Since memory ownership and address sharing are low-level concepts, memshare provides wrapper functions that mimic parallel::parApply and parallel::parLapply.

    +
      +
    • memApply(X, MARGIN, FUN, NAMESPACE, CLUSTER, VARS, MAX.CORES)
      +Mimics parallel::parApply. +
        +
      • X: a double matrix.
        +
      • +
      • MARGIN: direction (1 = row-wise, 2 = column-wise).
        +
      • +
      • FUN: function applied to rows/columns.
        +
      • +
      • CLUSTER: a prepared parallel cluster (variables exported via parallel::clusterExport).
        +
      • +
      • VARS: additional shared variables (names must match FUN arguments).
        +
      • +
      • MAX.CORES: only relevant if CLUSTER is uninitialized.
      • +
    • +
    +

    memApply automatically manages sharing and cleanup of X and VARS, ensuring no residual C/C++ buffers remain. Both X and VARS can also refer to previously allocated shared variables, though in that case the user must manage their lifetime.

    +
      +
    • memLapply(X, FUN, NAMESPACE, CLUSTER, VARS, MAX.CORES)
      +Equivalent to parallel::parLapply, but within the memshare framework. +
        +
      • X: a list of double matrices or vectors.
        +
      • +
      • Other arguments behave the same way as in memApply.
      • +
    • +
    +

    Examples of Use

    +

    We provide two top-level examples for the use of the memshare package: one with memLapply and one with memApply.

    +

    The first example computes the correlation between each column of a matrix and a reference vector using shared memory and memApply. The matrix can be provided directly and will be registered automatically, or by name if already registered.

    +
    +
    +
    library(memshare)
    +set.seed(1)
    +n <- 10000
    +p <- 2000
    +# Numeric double matrix (required): n rows (cases) x d columns (features)
    +X <- matrix(rnorm(n * p), n, p)
    +# Reference vector to correlate with each column
    +y <- rnorm(n)
    +f <- function(v, y) cor(v, y)
    +
    +ns <- "my_namespace"
    +res <- memshare::memApply(
    +X = X, MARGIN = 2,
    +FUN = f,
    +NAMESPACE = ns,
    +VARS = list(y = y),
    +MAX.CORES = NULL # defaults to detectCores() - 1
    +)
    +
    +
    +

    memApply parallelizes a row- or column-wise map over a matrix that lives once in shared memory. If X is passed as an ordinary R matrix, it is registered under a generated name in the namespace ns. Additional variables (here y) can be provided as a named list; these are registered and retrieved as ALTREP views on the workers. A cluster is created automatically if none is provided. Each worker obtains a cached view of the matrix (and any shared variables), extracts the i-th row or column as a vector v according to MARGIN, calls FUN(v,...), and returns the result. Views are released after the computation, and any objects registered by this call are freed. Because workers operate on shared views rather than copies, the total resident memory remains close to a single in-RAM copy of X, while runtime scales with the available cores.

    +

    As a second example, consider a case where a list of 1000 random matrices is multiplied by a random vector. This task is parallelizable at the element level and demonstrates the use of memshare::memLapply, which applies a function across list elements in a shared memory context:

    +
    +
    +
      library(memshare)
    +  list_length <- 1000
    +  matrix_dim <- 100
    +
    +  # Create the list of random matrices
    +  l <- lapply(
    +      1:list_length,
    +      function(i) matrix(rnorm(matrix_dim * matrix_dim),
    +      nrow = matrix_dim, ncol = matrix_dim)
    +      )
    +
    +  y <- rnorm(matrix_dim)
    +  namespace <- "my_namespace"
    +
    +  res <- memLapply(l, function(el, y) {
    +    el %*% y
    +  }, NAMESPACE = namespace, VARS = list(y=y),
    +  MAX.CORES = 1) #MAX.CORES=1 for simplicity
    +
    +
    +

    memLapply() provides a parallel version of lapply() where the list elements and optional auxiliary variables are stored in shared memory. If the input X is an ordinary R list, it is first registered in a shared memory namespace. Additional variables can be supplied either as names of existing shared objects or as a named list to be registered. A parallel cluster is created automatically if none is provided, and each worker is initialized with the memshare environment.

    +

    For each index of the list, the worker retrieves an ALTREP view of the corresponding element (and of any shared variables), applies the user-defined function FUN to these objects, and then releases the views to avoid memory leaks. The function enforces that the first argument of FUN corresponds to the list element and that the names of shared variables match exactly between the namespace and the function signature. Results are collected with parLapply, yielding an ordinary R list of the same length as the input.

    +

    Because only lightweight references to the shared objects are passed to the workers, no duplication of large data occurs, making the approach memory-efficient. Finally, memLapply() includes cleanup routines to release temporary registrations, stop the cluster if it was created internally, and free shared memory, ensuring safe reuse in subsequent computations.

    +

    Benchmark design

    +

    We compare memshare and SharedObject on a column-wise apply task across square matrices of sizes \(10^i \times 10^i\) for \(i = 1,\ldots,5\). We use a PSOCK cluster with 32 cores on an iMac Pro (256 GB DDR4, 2.3 GHz 18-core Intel Xeon W) running macOS Sequoia 15.6.1 (24G90) with R 4.5.1 x86_64-apple-darwin20. For each size, we run 100 repetitions and record wall-clock times and resident set size (RSS) across all worker PIDs plus the master. The RSS is summed via ps() and our helper total_rss_mb(). We define the memory overhead as the difference in total RSS before and after the call, i.e., we measure the additional memory required by the computation beyond the base process footprint.
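As a hedged sketch of this bookkeeping (the actual total_rss_mb() helper is listed in Appendix~A and may differ), total RSS can be summed over a set of PIDs with the ps package along these lines:

```r
# Sum resident set size (RSS) over a set of process IDs, in megabytes.
library(ps)

total_rss_mb <- function(pids) {
  rss_bytes <- vapply(pids, function(pid) {
    as.numeric(ps_memory_info(ps_handle(pid))[["rss"]])
  }, numeric(1))
  sum(rss_bytes) / 1024^2                 # bytes -> megabytes
}

# Memory overhead of a call = total RSS after minus total RSS before.
pids   <- Sys.getpid()                    # in the benchmark: master plus worker PIDs
before <- total_rss_mb(pids)
x      <- matrix(rnorm(1e6), 1000, 1000)  # some allocation (~8 MB of doubles)
after  <- total_rss_mb(pids)
after - before                            # overhead in MB
```

In the benchmark, pids contains the master and all cluster worker PIDs, so per-worker copies show up directly in the overhead.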

    +

    For SharedObject we create A2, share(A1), and parApply(); for memshare we call memApply directly on A1 with a namespace, so that only ALTREP views are created on the workers. A serial baseline uses apply(), and an additional baseline uses parApply() without shared memory. A minimally edited version of the full script (setup, PID collection, loops, and data saving) is provided in Appendix~A to ensure reproducibility.

    +

    As part of our safety and lifecycle checks, we ensure that views, which keep shared segments alive, are always released in the workers before returning control. Once all work is complete, the corresponding variables are then released in the master. To maintain fairness, we avoid creating incidental copies, such as those introduced by coercions, remove variables with rm(), and use R’s garbage collection gc() after each call.

    +

    RNA-seq dataset via FireBrowse

    +

    FireBrowse (Broad Institute of MIT and Harvard 2025) delivers raw counts of gene expression indexed by NCBI identifiers. For each gene identifier \(i\) (from Ensembl or NCBI), we obtain a raw count \(r_i\) that quantifies the observed read abundance. These raw counts represent the number of reads mapped to each gene, without length normalization. To convert raw counts into TPM (transcripts per million) (Li and Dewey 2011), we require gene or transcript lengths \(l_i\). For each gene \(i\), we compute:

    +

    \[\begin{equation} +\hat{r}_i=\frac{r_i}{l_i} +\end{equation}\]

    +

    The total sum \(R = \sum_i \hat{r}_i\) across all genes is then used to scale values as:

    +

    \[\begin{equation} +TPM_i=\frac{\hat{r}_i}{R} \times 10^6 +\end{equation}\] +This transformation allows comparison of expression levels across genes and samples by correcting for gene length and sequencing depth (Li and Dewey 2011). After transformation, our dataset consists of \(d = 19,637\) gene expressions across \(N = 10,446\) cases spanning 32 diagnoses. It can be found under (Thrun and Märte 2025).
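The two equations above amount to the following small R helper (illustrative; the gene names, counts, and lengths below are made up):

```r
# TPM from raw counts r and gene lengths l, per the equations above.
tpm <- function(r, l) {
  r_hat <- r / l             # length-normalized counts: r_hat_i = r_i / l_i
  r_hat / sum(r_hat) * 1e6   # scale by R = sum_i r_hat_i, times 10^6
}

counts   <- c(geneA = 100, geneB = 500, geneC = 1000)
gene_len <- c(geneA = 1000, geneB = 2500, geneC = 10000)
tpm(counts, gene_len)        # TPM values; sum to 1e6 by construction
```

Because the values sum to one million per sample, TPM is directly comparable across samples with different sequencing depths.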

    +

    4 Results

    +

    In the first subsection, the efficiency of memshare is compared to SharedObject, and in the second subsection the application is presented.

    +

    Performance and Memory

    +

    In Figure 2, the results for square matrices of increasing magnitudes are shown as summary plots with variability bars. In these error-bar-style plots the bars indicate median ± AMAD. +The bottom subfigures (C–D) provide a zoomed view of the first four matrix sizes, while the top subfigures (A–B) display all five magnitudes from \(10^1\) to \(10^5\). +The x-axis represents the magnitude of the matrix size, and the y-axis shows either runtime in seconds (subfigures A and C) or memory overhead in megabytes (subfigures B and D). Memory overhead is measured as the difference in total RSS. For each magnitude, 100 trials were conducted. The magenta line indicates the performance of the single-threaded R baseline.

    +

    Table~1 reports the median runtime and memory overhead, while variability is visualized in Figure~4 via the scatter of 100 runs and summarized numerically by the robust AMAD dispersion statistic in Table~2.

    +

    memshare (orange diamonds) consistently outperforms SharedObject (blue circles) in Figure 2. In the scatter plot of the 100 trials in Figure~4, it is evident that for the first three magnitudes both packages use less memory than the baseline. At \(10^4\), average memory usage is comparable to the baseline, while at \(10^5\), memshare slightly exceeds it. SharedObject, however, could only be executed at this magnitude from the terminal, but not within RStudio (see Appendix~B).

    +

    Considering relative differences (Ultsch 2008), memshare achieves computation times that are 90–170% faster than SharedObject. For matrices of size \(10^2\) and larger, memory consumption is reduced by 132–153% compared to SharedObject.

    +
    +
    +Matrix size depicted as magnitude vs median runtime (left) and vs memory overhead (MB) during the run relative to idle (right) for `memshare` and `SharedObject` as error-bar-style plots with intervals given by the median ± AMAD across 100 runs. In addition, the serial baseline is shown as a line in magenta. The top subfigures present the full range of matrix sizes, and the bottom subfigures zoom in. +

    +Figure 2: Matrix size depicted as magnitude vs median runtime (left) and vs memory overhead (MB) during the run relative to idle (right) for memshare and SharedObject as error-bar-style plots with intervals given by the median ± AMAD across 100 runs. In addition, the serial baseline is shown as a line in magenta. The top subfigures present the full range of matrix sizes, and the bottom subfigures zoom in. +

    +
    +
    Table 1: The benchmark compares four types: memshare, SharedObject, a single-threaded baseline, and a parallel baseline. For memshare and SharedObject, the reported values are the medians over 100 iterations, while the baselines are the result from either a single-threaded R or a simple parApply run using one iteration. Magnitude refers to the matrix size. Entries are given as Time Consumed (Memory Overhead), where time is measured in seconds and memory in megabytes (MB); the memory after call is mentioned in Appendix C.
| Type / Magnitude | Baseline | Baseline parApply | SharedObject | memshare |
|---|---|---|---|---|
| 1 | 0.0003 (0.0234) | 0.0049 (0.9023) | 0.2492 (0.0801) | 0.0416 (1.1426) |
| 2 | 0.0008 (0.1461) | 0.0034 (0.1461) | 0.2531 (0.5117) | 0.0419 (0.3594) |
| 3 | 0.0231 (15.2656) | 0.0356 (7.6406) | 0.3238 (11.6387) | 0.0481 (1.4688) |
| 4 | 2.2322 (1664.9727) | 3.5015 (2627.1133) | 1.5526 (1570.0566) | 0.6655 (895.4473) |
| 4.2 | 9.2883 (4490.4648) | 12.9872 (6040.4023) | 3.1147 (3901.5137) | 1.6223 (3881.7441) |
| 4.5 | 36.3783 (17206.4688) | 53.6183 (24983.7852) | 10.3513 (15391.0020) | 6.6583 (15285.4258) |
| 4.7 | 92.0136 (39355.5703) | 130.6937 (62936.4766) | 32.3116 (38533.7266) | 16.3157 (38389.7305) |
| 5 | 217.4490 (157812.9492) | — | 128.0942 (152967.0273) | 67.0000 (76311.7402) |
    Table 2: AMAD for the benchmark of SharedObject vs memshare.
| Magnitude | Type | Time Consumed (seconds) | Memory Overhead (MB) | Memory after Call |
|---|---|---|---|---|
| 1 | SharedObject | 0.01241 | 0.06400 | 7.47613 |
| 2 | SharedObject | 0.00549 | 0.05270 | 12.85171 |
| 3 | SharedObject | 0.00743 | 0.23716 | 109.60844 |
| 4 | SharedObject | 0.05179 | 9.98323 | 364.98253 |
| 4.2 | SharedObject | 0.13739 | 9.28681 | 533.72239 |
| 4.5 | SharedObject | 0.63438 | 14.73392 | 1619.81984 |
| 4.7 | SharedObject | 2.09502 | 15.81807 | 823.25853 |
| 5 | SharedObject | 4.01802 | 22.88011 | 555.74045 |
| 1 | memshare | 0.00198 | 1.41166 | 51.33531 |
| 2 | memshare | 0.00251 | 0.27480 | 17.12808 |
| 3 | memshare | 0.00228 | 1.91232 | 78.65743 |
| 4 | memshare | 0.02038 | 110.18064 | 5956.56036 |
| 4.2 | memshare | 0.03206 | 39.21014 | 2346.06943 |
| 4.5 | memshare | 0.24571 | 36.97784 | 1086.58683 |
| 4.7 | memshare | 0.53134 | 89.44624 | 4081.05399 |
| 5 | memshare | 2.48280 | 31.98623 | 2413.87405 |
    +

    Application to Feature Selection by Mutual Information using Pareto Density Estimation

    +

    The computation of mutual information produced values ranging from 0 to 0.54 (Figure 3). The QQ-plot shows clear deviation from a straight line, indicating that the distribution is not Gaussian. Both the histogram and the PDE plot provide consistent estimates of the probability density, revealing a bimodal structure. The boxplot further highlights the presence of outliers with values above 0.4.

    +

    The analysis required about two hours of computation time and approximately 47 GB of RAM, feasible only through memory sharing. In practice, mutual information values can guide feature selection, either by applying a hard threshold or by using a soft approach via a mixture model, depending on the requirements of the subsequent machine learning task.

    +
    +
    +The distribution of mutual information for 19637 gene expressions as a histogram, Pareto Density Estimation (PDE), QQ-plot against normal distribution and boxplot. There are no missing values (NaN). +

    +Figure 3: The distribution of mutual information for 19637 gene expressions as a histogram, Pareto Density Estimation (PDE), QQ-plot against normal distribution and boxplot. There are no missing values (NaN). +

    +
    +
    +
    +

    5 Discussion

    +

    On all major platforms, PSOCK clusters execute each worker in a separate R process. Consequently, a call to parApply(cl, X, MARGIN, FUN, …) requires R to serialize the matrix to transmit it to each worker, and deserialize it into a full in-memory copy in every process. +As shown in Table 1, this replication leads to out-of-memory (OOM) failures once the matrix reaches a size of \(10^5 \times 10^5\), despite the machine providing 256 GB of RAM. +Even for smaller magnitudes, substantial redundant memory allocation occurs: multiple workers may begin materializing private copies of the matrix faster than they can complete their portion of the computation, resulting in transient but significant memory amplification. This behavior, inherent to PSOCK-based parallelization, explains the observed OOM conditions under parApply.

    +

    Consequently, shared memory becomes a foundational requirement for scalable high-performance computing in R, because it avoids redundant data replication and stabilizes memory usage as problem sizes grow. +In our experiments, this advantage is reflected directly in performance: across matrix sizes, memshare achieved a two-fold reduction in median computation time compared to SharedObject on the column-wise task. For large matrix sizes, both memshare and SharedObject show a lower total memory overhead than the single-threaded baseline, because R’s serial apply() implementation creates substantial temporary objects. Each column extraction allocates a full-length numeric vector, and additional intermediates are produced by FUN. These private allocations inflate the baseline’s RSS, and memory is not promptly returned to the OS due to R’s garbage-collection strategy. In contrast, memshare and SharedObject provide ALTREP-backed views into a shared memory segment, eliminating the need to materialize full column vectors.


For matrices of size \(10^2\) and larger, memshare's memory overhead was between one half and one third of SharedObject's. At the smallest size (\(10^1\)), memshare consumed more memory than SharedObject because, in R, the metadata must also be shared; this requires a second shared-memory segment whose fixed overhead dominates at small sizes.


The experiments show that SharedObject exhibited overhead consistent with copy-on-write materializations and temporary object creation up to size \(10^4\). Its memory usage on macOS was approximately an order of magnitude higher than that of memshare or the single-threaded baseline, as illustrated in Figure 4 by the triangles aligning with a single-threaded baseline one magnitude higher. For matrices of size \(10^5\), SharedObject caused RStudio to crash (see Appendix D), although results were computable in R via the terminal.


Beyond these synthetic benchmarks, the RNA-seq case study illustrates how this computational behavior translates into a practical high-dimensional analysis pipeline. Biologically, the bimodal structure in Figure 3 is consistent with a separation between largely uninformative background genes and a smaller subset of diagnosis- or subtype-specific markers. Genes in the lower mode, i.e., with MI values close to zero, show little association with the 32 diagnostic labels and plausibly correspond to broadly expressed housekeeping or pathway genes whose expression varies only weakly across cancer types. In contrast, genes in the upper mode, i.e., with MI values in the right-hand peak, exhibit strong dependence on the diagnostic label and are therefore candidates for disease- or tissue-specific markers, including lineage markers and immune-related genes that differ systematically between tumor entities. Although a detailed pathway or gene-set enrichment analysis is beyond the scope of this work, the presence of a distinct high-MI mode indicates that the PDE-based mutual information filter successfully highlights genes whose expression patterns are highly structured with respect to the underlying diagnostic classes and likely reflect underlying molecular subtypes.


At the same time, the fact that this analysis required approximately two hours of computation time and about 47 GB of RAM underscores the central role of memory sharing: without memshare, running such an MI-PDE filter on 19,637 genes and 10,446 cases in parallel would be prohibitively memory-intensive on typical multicore hardware. In sum, memshare provides a more stable memory-sharing interface, scales more effectively to large matrix sizes, and achieves greater computational efficiency than SharedObject.


    6 Summary


    Regardless of the package, R’s single-threaded API implies that multi-threaded computation should touch only raw memory and synchronize results at the main thread boundary. Shared mutation requires external synchronization if multiple workers write to overlapping regions. In practice, read-mostly patterns are ideal.
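The read-mostly discipline can be sketched as follows; for brevity this uses a plain PSOCK cluster rather than shared views, since the point is the synchronization pattern, not the memory model:

```r
# Read-mostly pattern: workers only read their disjoint slice of the input
# and return private results; all writes happen once, at the main-thread
# boundary, so no external synchronization is required.
library(parallel)

X  <- matrix(rnorm(1e5), nrow = 100)
cl <- makeCluster(2)

idx   <- splitIndices(ncol(X), 2)   # disjoint column ranges per worker
parts <- parLapply(cl, idx,
                   function(j, X) colSums(X[, j, drop = FALSE]), X)
res <- unlist(parts)                # combined by the master only

stopCluster(cl)
```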


    Here, memshare’s namespace + view model and memApply wrapper simplify cross-process sharing compared to manual share() + cluster wiring. Its explicit releaseViews/Variables lifecycle makes retention and cleanup auditable. SharedObject’s fine-grained properties are powerful, but the interaction of copy-on-write and duplication semantics increases cognitive load.
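The difference in developer surface can be seen by writing the same column-wise task against both APIs. This is a sketch assuming both packages are installed; cluster size and data are illustrative:

```r
library(parallel)
X <- matrix(rnorm(1e4), 100, 100)

# memshare: one call; registration, worker views, and cleanup are handled
# internally by memApply.
res1 <- memshare::memApply(X = X, MARGIN = 2, FUN = mean,
                           NAMESPACE = "demo", MAX.CORES = 2)

# SharedObject: manual share() plus cluster wiring; exporting the shared
# object transmits only a handle, but the lifecycle is left to the user.
cl <- makeCluster(2)
Xs <- SharedObject::share(X)
clusterExport(cl, "Xs", envir = environment())
res2 <- parApply(cl, Xs, 2, mean)
stopCluster(cl)
```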


memshare combines ALTREP-backed shared memory with a pragmatic parallel API to deliver strong speed and memory efficiency on multicore systems. In analytic pipelines like MI-based feature selection for RNA-seq, this enables simple, scalable patterns: one in-RAM copy of the data, many cores, and no serialization overhead.


    7 Appendix A: code listing of benchmark


The full code is accessible via (Thrun 2025). To avoid the crash of SharedObject shown in Appendix D, we tried manual garbage collection and saved results at each iteration, without being able to change the outcome. Only by restricting usage to the R terminal console without RStudio were we able to compute all results.


    8 Appendix B: ALTREP and shared memory in R


In R, ALTREP (short for ALTernative REPresentations) is a framework introduced in version 3.5.0 that allows vectors and other objects to be stored and accessed in non-standard ways while maintaining their usual R interface. Built-in type checks cannot tell the difference between an ALTREP object and its ordinary counterpart, which ensures compatibility.
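This transparency can be observed in base R itself, where `1:n` has been a compact ALTREP sequence since 3.5.0 (a small illustration, not part of memshare):

```r
# An ALTREP compact sequence passes ordinary type checks unchanged.
x <- 1:1e6
is.integer(x)          # TRUE, indistinguishable from a materialized vector
length(x)              # answered from metadata, without expanding the data
.Internal(inspect(x))  # reveals the compact ALTREP representation
```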


Instead of relying solely on R’s default contiguous in-memory arrays, ALTREP permits objects such as integers, doubles, or strings to be backed by alternative storage mechanisms. Developers can override fundamental methods that govern vector behavior, such as length queries, element access (DATAPTR, DATAPTR_OR_NULL, etc.), duplication, coercion, and even printing, so that objects behave normally while drawing data from different sources.


    Because these overrides are transparent to R’s higher-level functions, ALTREP objects can be passed, transformed, and manipulated like regular vectors, regardless of whether their contents reside in memory, on disk, or are computed lazily.


    For package authors, this framework makes it possible to expose objects that look identical to standard R vectors but internally retrieve their data from sources like memory-mapped files, shared memory, compressed formats, or custom C++ buffers. In practice, this enables efficient handling of large datasets and unconventional data representations while keeping existing R code unchanged.


    9 Appendix C: benchmarking in detail


In Figure 4, the results are presented as scatter plots for square matrices of increasing magnitudes. The left panel shows detailed scatter plots for the first three matrix sizes, while the right panel summarizes all five magnitudes from \(10^1\) to \(10^5\). The x-axis represents computation time (log seconds), and the y-axis represents memory overhead (log megabytes), measured as the difference in total RSS relative to idle. Each point corresponds to one of 100 trials per magnitude. The magenta baseline indicates the performance of a single-threaded R computation.


Figure 4: Median runtime (log-scale) vs. matrix size for memshare, SharedObject, and the serial baseline; ribbons show the IQR across 100 runs. Insets show the difference in total RSS in log(MB), i.e., the memory overhead, during the run relative to idle on macOS, presenting the details of Figure 1.


To compute the results for Figure 5, we used a PSOCK cluster with 15 workers on a different iMac, namely, 128 GB DDR4, 3.8 GHz 8-core Intel Xeon W with Windows 10 on Boot Camp, and R 4.5.1. For each size, we ran 100 repetitions and recorded wall-clock times and resident set size (RSS) across all worker PIDs plus the master. The results of the benchmark on Windows are presented in Tables 3 and 4, and in Figure 5.
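Summing RSS over the master and all workers can be sketched as follows; this assumes the third-party ps package and is an illustration of the measurement protocol, not the exact benchmark code:

```r
# Sum the resident set size over the master process and every cluster
# worker, as done when recording memory overhead per trial.
library(parallel)
library(ps)

total_rss_mb <- function(cl) {
  pids <- c(Sys.getpid(), unlist(clusterEvalQ(cl, Sys.getpid())))
  sum(vapply(pids,
             function(p) ps_memory_info(ps_handle(p))[["rss"]],
             numeric(1))) / 2^20
}

cl <- makeCluster(2)
idle <- total_rss_mb(cl)            # baseline RSS while idle
# ... run one benchmark repetition here ...
overhead <- total_rss_mb(cl) - idle # memory overhead of the run, in MB
stopCluster(cl)
```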


Within macOS Tahoe, for SharedObject, the memory after a call increases from about 3,062 MB at \(10^1\) to 173,083 MB at \(10^5\), i.e., from roughly 3 GB to 169 GB. For memshare, the memory after a call grows from about 3,490 MB at \(10^1\) to 128,393 MB at \(10^5\), i.e., from roughly 3.5 GB to 125 GB. Based on these numbers, memshare uses slightly more memory after the call than SharedObject for small and medium problem sizes (exponents \(10^1\) to \(10^{4.7}\)), but at the largest matrix size \(10^5\) memshare requires substantially less overall memory than SharedObject. However, within Windows 10 the situation between SharedObject and memshare changes, as depicted in Table 3.


It is important to emphasize that our benchmark was conducted on a specific and somewhat idiosyncratic hardware configuration, i.e., a 2021 iMac running Windows 10 via Boot Camp. This configuration combines Apple hardware, Apple’s firmware and drivers, and Microsoft’s operating system and memory manager in a way that is not representative of typical server or workstation deployments. As a consequence, low-level aspects that are relevant for shared-memory performance, such as page allocation strategy, memory-mapped file handling, and the interaction between R, BLAS, and the operating system scheduler, may differ substantially on other platforms. We therefore refrain from drawing broader conclusions about absolute performance or cross-platform behavior from this benchmark and instead interpret the results as a comparative case study of memshare versus SharedObject on this concrete, well-specified environment.

Table 3: Median runtime and memory overhead for the benchmark on Windows 10 via Boot Camp.

| Type | Magnitude | Time Consumed (seconds) | Memory Overhead (MB) | Memory after Call (MB) |
|---|---|---|---|---|
| SharedObject | 1 | 0.0047 | 0.0391 | 2870.6816 |
| SharedObject | 2 | 0.0124 | 0.4648 | 2873.7266 |
| SharedObject | 3 | 0.0707 | 13.3691 | 2929.9121 |
| SharedObject | 4 | 1.0768 | 2594.9395 | 5799.9766 |
| SharedObject | 4.2 | 2.0788 | 7029.2070 | 10131.3750 |
| SharedObject | 4.5 | 6.1894 | 23832.9727 | 26923.1855 |
| SharedObject | 4.7 | 16.2314 | 38973.7051 | 42078.9980 |
| memshare | 1 | 0.0417 | 1.3848 | 1609.0801 |
| memshare | 2 | 0.0412 | 1.3145 | 1619.4062 |
| memshare | 3 | 0.0487 | 3.3945 | 1675.7246 |
| memshare | 4 | 0.6164 | 764.8184 | 2823.6172 |
| memshare | 4.2 | 1.4858 | 3841.7480 | 5881.9570 |
| memshare | 4.5 | 5.1790 | 15275.3418 | 17298.4473 |
| memshare | 4.7 | 13.7764 | 38336.5898 | 40407.6250 |
| Baseline | 1 | 0.0002 | 0.0000 | 796.5848 |
| Baseline | 2 | 0.0008 | 0.0021 | 796.8240 |
| Baseline | 3 | 0.0138 | 0.1461 | 816.7769 |
| Baseline | 4 | 1.6573 | 1686.0985 | 2542.8888 |
| Baseline | 4.2 | 9.1679 | 4356.6723 | 5234.8967 |
| Baseline | 4.5 | 34.7380 | 17936.2663 | 18959.0366 |
| Baseline | 4.7 | 86.3598 | 45320.6208 | 46571.1692 |
| Baseline Parallel | 1 | 0.0041 | 0.9023 | 78916.7539 |
| Baseline Parallel | 2 | 0.0043 | 0.0000 | 739.4570 |
| Baseline Parallel | 3 | 0.0389 | 7.1484 | 765.4844 |
| Baseline Parallel | 4 | 1.9595 | 1053.2031 | 3416.6445 |
| Baseline Parallel | 4.2 | 10.6068 | 2621.7109 | 6546.7422 |
| Baseline Parallel | 4.5 | 44.0077 | 23568.1680 | 24471.5742 |
| Baseline Parallel | 4.7 | 108.1856 | 58851.6836 | 59758.3945 |
Table 4: AMAD for the benchmark grid for SharedObject and memshare on Windows 10 via Boot Camp.

| Type | Magnitude | Time Consumed (seconds) | Memory Overhead (MB) | Memory after Call (MB) |
|---|---|---|---|---|
| SharedObject | 1 | 0.00065 | 0.00000 | 0.38021 |
| SharedObject | 2 | 0.00126 | 0.00000 | 0.26351 |
| SharedObject | 3 | 0.00855 | 2.11560 | 1.31002 |
| SharedObject | 4 | 0.02707 | 5.14219 | 2.51839 |
| SharedObject | 4.2 | 0.02521 | 18.96512 | 2.62003 |
| SharedObject | 4.5 | 0.06582 | 43.49404 | 29.24573 |
| SharedObject | 4.7 | 0.22797 | 186.75258 | 181.13231 |
| memshare | 1 | 0.00558 | 0.99004 | 9.55032 |
| memshare | 2 | 0.00411 | 1.33637 | 6.46726 |
| memshare | 3 | 0.00373 | 6.80983 | 8.52263 |
| memshare | 4 | 0.01511 | 304.55616 | 114.84474 |
| memshare | 4.2 | 0.02341 | 158.45925 | 99.27513 |
| memshare | 4.5 | 0.04168 | 173.53949 | 125.11030 |
| memshare | 4.7 | 0.09361 | 148.23133 | 148.96163 |

Figure 5: Median runtime (log-scale) vs. matrix size for memshare, SharedObject, and the serial baseline; ribbons show the IQR across 100 runs. Insets show the difference in total RSS in log(MB) (i.e., memory overhead) during the run relative to idle for Windows 10 via Boot Camp.


    10 Appendix D: screenshot


Figures 6 and 7 report screenshots of the crash of RStudio that occurs when SharedObject is called with a matrix of size \(10^5\); the screenshot in Figure 7 was taken after RStudio had to be forced to close.


Figure 6: First screenshot of the SharedObject computation.


Figure 7: Second screenshot of the SharedObject computation.


    11 Supplementary materials


Supplementary materials are available in addition to this article. They can be downloaded at RJ-2025-043.zip.

References

Broad Institute of MIT and Harvard. FireBrowse (RRID:SCR_026320). 2025. Accessed via https://gdac.broadinstitute.org.

Microsoft Corporation and S. Weston. doParallel: Foreach parallel adaptor for the 'parallel' package. 2025. URL https://github.com/revolutionanalytics/doparallel. R package version 1.0.17.

M. Kane, J. W. Emerson and S. Weston. Scalable strategies for computing with massive data. Journal of Statistical Software, 55(14): 1–19, 2013. URL https://doi.org/10.18637/jss.v055.i14.

B. Li and C. N. Dewey. RSEM: Accurate transcript quantification from RNA-seq data with or without a reference genome. BMC Bioinformatics, 12(1): 323, 2011. URL https://doi.org/10.1186/1471-2105-12-323.

F. Prive, H. Aschard, A. Ziyatdinov and M. G. B. Blum. Efficient analysis of large-scale genome-wide data with two R packages: bigstatsr and bigsnpr. Bioinformatics, 34(16): 2781–2787, 2018. URL https://doi.org/10.1093/bioinformatics/bty185.

R Core Team. Support for parallel computation in R. 2025. URL https://stat.ethz.ch/R-manual/R-devel/library/parallel/doc/parallel.pdf. R package 'parallel' included in R.

M. Thrun. Appendix A: Memshare: Memory sharing for multicore computation in R with an application to feature selection by mutual information using PDE. 2025. URL https://zenodo.org/records/17762666.

M. C. Thrun, T. Gehlert and A. Ultsch. Analyzing the fine structure of distributions. PLOS ONE, 15(10): e0238835, 2020. URL https://doi.org/10.1371/journal.pone.0238835.

M. Thrun and J. Märte. Genexpressions dataset derived from FireBrowse. 2025. URL https://zenodo.org/records/16937028.

A. Ultsch. Is log ratio a good value for measuring return in stock investments? In Advances in data analysis, data handling and business intelligence, pages 505–511, 2008. Springer.

A. Ultsch. Pareto density estimation: A density estimation for knowledge discovery. In Proceedings of the 28th annual conference of the German Classification Society (GfKl), pages 91–98, 2005. Springer.

J. Wang and M. Morgan. SharedObject: Sharing R objects across multiple R processes without memory duplication. 2025. URL https://bioconductor.org/packages/SharedObject. R package version (Bioconductor Release).


    Reuse


    Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".


    Citation


    For attribution, please cite this work as

Thrun & Märte, "memshare: Memory Sharing for Multicore Computation in R with an Application to Feature Selection by Mutual Information using PDE", The R Journal, 2026

    BibTeX citation

@article{RJ-2025-043,
  author = {Thrun, Michael C. and Märte, Julian},
  title = {memshare: Memory Sharing for Multicore Computation in R with an Application to Feature Selection by Mutual Information using PDE},
  journal = {The R Journal},
  year = {2026},
  note = {https://doi.org/10.32614/RJ-2025-043},
  doi = {10.32614/RJ-2025-043},
  volume = {17},
  issue = {4},
  issn = {2073-4859},
  pages = {305-321}
}
    + + + + + + + diff --git a/_articles/RJ-2025-043/RJ-2025-043.pdf b/_articles/RJ-2025-043/RJ-2025-043.pdf new file mode 100644 index 0000000000..29f77b7974 Binary files /dev/null and b/_articles/RJ-2025-043/RJ-2025-043.pdf differ diff --git a/_articles/RJ-2025-043/RJ-2025-043.tex b/_articles/RJ-2025-043/RJ-2025-043.tex new file mode 100644 index 0000000000..ce448afd3d --- /dev/null +++ b/_articles/RJ-2025-043/RJ-2025-043.tex @@ -0,0 +1,654 @@ +% !TeX root = RJwrapper.tex +\title{memshare: Memory Sharing for Multicore Computation in R with an Application to Feature Selection by Mutual Information using PDE} + + +\author{by Michael C. Thrun and Julian Märte} + +\maketitle + +\abstract{% +We present memshare, a package that enables shared-memory multicore computation in R by allocating buffers in C++ shared memory and exposing them to R through ALTREP views. We compare memshare to SharedObject (Bioconductor), discuss semantics and safety, and report a 2x speedup over SharedObject with no additional resident memory in a column-wise apply benchmark. Finally, we illustrate a downstream analytics use case: feature selection by mutual information, in which densities are estimated per feature via Pareto Density Estimation (PDE). The analytical use case is an RNA-seq dataset consisting of N = 10,446 cases and d = 19,637 gene expression measurements, requiring roughly \texttt{n\_threads} * 10GB of memory in the case of using parallel R sessions. Such and larger use cases are common in big data analytics and make R feel limiting, which is mitigated by the library presented in this work. +} + +\subsection{Introduction}\label{introduction} + +Parallel computing in R is usually realized through PSOCK or FORK clusters, where multiple R processes work in parallel \citep{Rparallel2025}, \citep{doparallel2025}. A practical issue arises immediately: each worker process receives its own private copy of the data. 
If a matrix consumes several gigabytes in memory, then spawning ten workers results in ten redundant copies, potentially exhausting system memory and severely reducing performance due to paging. This overhead becomes especially prohibitive in genomics or imaging, where matrices of tens of gigabytes are commonplace. Copying also incurs serialization and deserialization costs when transmitting objects to workers, delaying the onset of actual computation. + +Shared memory frameworks address this issue by allowing workers to view the same physical memory without duplication. Instead of copying the whole object, only small handles or identifiers are communicated, while the underlying data is stored once in RAM. This enables efficient multicore computation on large datasets that would otherwise be infeasible. + +ALTREP (short for ALTernative REPresentations) is a framework in R that allows vectors or matrices to be backed by alternative storage while still behaving like ordinary R objects. Method hooks determine how to access length, data pointers, and duplication, so that package developers can integrate external memory management, including shared memory segments, without changing downstream R code. + +A common alternative to in-RAM shared memory is file-backed memory mapping, where binary files on disk are mapped into R and the operating system pages data into memory on demand \citep{kane2013scalable}. Packages such as \texttt{bigmemory} and \texttt{bigstatsr} create matrices stored on disk but accessible through R as if they were in memory \citep{prive2018efficient}. This enables analyses on datasets larger than RAM by working column-wise with a small memory footprint. However, because file-backed matrices rely on disk I/O, they are slower than in-RAM shared memory when a single copy of the data would fit into physical memory, but multiple per-worker copies would not. 
Therefore, this work focuses on ALTREP-based, in-RAM techniques for data that fit into memory once but not many times. +Our contributions are: + +\begin{enumerate} +\def\labelenumi{\arabic{enumi}.} +\item + A fully independent and user-friendly implementation based on the ALTREP framework. +\item + A comparison of \texttt{memshare} vs \texttt{SharedObject}: data model, safety, copy-on-write, and developer surface showing a runtime up to twice as fast for \texttt{memshare} on parallel column operations without extra RSS. +\item + A practical template for MI-PDE feature selection on RNA-seq. +\end{enumerate} + +\subsection{Background}\label{background} + +A more detailed description of ALTREP internals and our shared-memory use case is provided in Appendix\textasciitilde B; here we summarize only the concepts needed for the comparison of \texttt{memshare} to existing approaches. + +\subsubsection{\texorpdfstring{\texttt{SharedObject} baseline}{SharedObject baseline}}\label{sharedobject-baseline} + +Having outlined the general ALTREP and shared-memory mechanism, we now briefly review the \texttt{SharedObject} package, which serves as our main existing ALTREP-based shared-memory baseline. + +\texttt{SharedObject} allocates shared segments and wraps them as ALTREP \citep{sharedobject2025}. It exposes properties like \texttt{copyOnWrite}, \texttt{sharedSubset}, and \texttt{sharedCopy}; it supports atomic types and (with caveats) character vectors. Developers can map or unmap shared regions and query whether an object is shared. \texttt{SharedObject} was among the first implementations that showed how ALTREP can enable multicore parallelism by avoiding data duplication. 
+\texttt{SharedObject} provides \texttt{share()} to wrap an R object as a shared ALTREP, with tunables: + +\begin{itemize} +\item + \texttt{copyOnWrite} (default TRUE): duplicates on write; setting FALSE enables in‑place edits but is not fully supported and can lead to surprising behavior (e.g., unary minus mutating the source). +\item + \texttt{sharedSubset}: whether slices are also shared; can incur extra duplication in some IDEs; often left \texttt{FALSE}. +\item + \texttt{sharedCopy}: whether duplication of a shared object remains shared. + It supports raw, logical, integer, double, or complex and, with restrictions, character (recommended read‑only; new, previously unseen strings cannot be assigned). Developers can also directly allocate, map, unmap, and free shared regions and query \texttt{is.shared} or \texttt{is.altrep}. +\end{itemize} + +\subsubsection{R's threading model}\label{rs-threading-model} + +\texttt{R}'s C API is single‑threaded; package code must not call R from secondary threads. Process‑level parallelism (clusters) remains the primary avenue. Consequently, shared‑memory frameworks must ensure that mutation is either controlled in the main thread or performed at the raw buffer level without touching R internals. + +\subsubsection{PDE-based Mutual Information}\label{pde-based-mutual-information} + +For feature selection with a discrete response \(Y\) and a continuous feature \(X\), mutual information can be expressed as: +\begin{equation} +I(X;Y) = \sum_{y} p(y) KL(p(x|y) || p(x)) +\end{equation} + +This formulation requires only univariate densities \(p(x)\) and \(p(x|y)\) per class. +This lends itself to Pareto Density Estimation (PDE), a density estimator based on hyperspheres with the Pareto radius chosen by an information-optimal criterion. In PDE, the idea is to select a subset S of the data with relative size \(p = |S|/|D|\). The information content is \(I(p) = -p \ln(p)\). 
\citep{ultsch2005pareto} showed that the optimal set size corresponds to about 20.1\%, retrieving roughly 88\% of the maximum possible information. The unrealized potential (URP) quantifies deviation from the optimal set size and is minimized when \(p\approx 20\%\). For univariate density estimation this yields the Pareto radius \(R\), which can be approximated by the 18\(\%\) quantile distance in one dimension. PDE thus adapts the neighborhood size following the Pareto rule (80--20 rule) to maximize information content. Empirical studies report that PDE can outperform standard density estimators under default settings \citep{thrun2020analyzing}. + +With respect to the categorical variable, no density estimation is needed, as the most accurate density estimate in this case is simply the relative label count, \(p(y) = \frac{\#\{\omega\in\Omega ~|~ Y(\omega) = y\}}{\#\Omega}\). Here \(\Omega\) is the set of cases, \(Y\) is the categorical random variable, and \(y\) runs over the range of \(Y\), \(Y(\Omega)\). + +\subsection{Methods}\label{methods} + +In idiomatic use, \texttt{memshare} coordinates PSOCK clusters (separate processes) that attach to one shared segment. Within workers the relevant shared segments are retrieved and the task is executed on them. The key win is replacing per‑worker duplication of the large matrix with a far cheaper retrieval of a handle to the shared memory segment and subsequent wrapping in an ALTREP instead. + +\subsubsection{\texorpdfstring{The \texttt{memshare} API}{The memshare API}}\label{the-memshare-api} + +Shared memory pages in \texttt{memshare} are handled by unique string identifiers on the OS side. These identifiers can be requested and retrieved via \texttt{C}/\texttt{C++}. 
To prevent two master \texttt{R} sessions from accidentally accessing each other's memory space because duplicate allocations can lead to undefined behavior at the OS level, users may define one or more \textbf{namespaces} in which the current session operates. +The \texttt{memshare} API closely mirrors \texttt{C}'s memory ownership model but applies it to \texttt{R} sessions. A master (primary) session owns the memory, while worker (secondary) sessions can access it. + +\subsubsection{Shared memory semantics}\label{shared-memory-semantics} + +A crucial aspect of \texttt{memshare}'s design is how shared memory is managed and exposed through \texttt{R}. Three definitions clarify the terminology: + +A \textbf{namespace} refers to a character string that defines the identifier of the shared memory context. It allows the initialization, retrieval, and release of shared variables under a common label, ensuring that multiple sessions or clusters can coordinate access to the same objects. While this does not provide absolute protection, it makes it the user's responsibility to avoid assigning the same namespace to multiple master sessions. + +\textbf{Pages} are variables owned by the current compilation unit of the code, such as the \texttt{R} session or terminal that loaded the DLL. Pages are realized as shared memory objects: on Windows via \texttt{MapViewOfFile}, and on Unix systems via \texttt{shm} in combination with \texttt{mmap}. + +\textbf{Views} are references to variables owned by another or the same compilation unit. Views are always ALTREP wrappers, providing pointers to the shared memory chunk so that R can interact with them as if they were ordinary vectors or matrices. + +Together, these concepts enforce a lifecycle: pages represent ownership of memory segments, views represent references to them, and namespaces serve as the coordination mechanism. 
The combination guarantees both memory efficiency and safety when performing multicore computations across \texttt{R} processes. + +If the user detaches the \texttt{memshare} package, all handles are destroyed. This means that all variables of all namespaces are cleared, provided there is no other \texttt{R} thread still using them. In other words, unloading the package cleans up shared memory regions and ensures that no dangling references remain. Other threads still holding a handle to the memory will prevent this cleanup, as it would invalidate the working memory of those threads. The shared memory is then cleared whenever all handles are released. + +\paragraph{Master session}\label{master-session} + +A master session takes ownership of a memory page using: + +\begin{itemize} +\tightlist +\item + \texttt{registerVariables(namespace,\ variableList)} +\end{itemize} + +where \texttt{variableList} is a named list of supported types. These are double matrices, double vectors, or lists of these. The names define the memory pages ID through which they can be accessed, while the values of the actual variables define the size and content of the memory page. + +To deallocate memory pages, the master session can call: + +\begin{itemize} +\tightlist +\item + \texttt{releaseVariables(namespace,\ variableNames)} +\end{itemize} + +where \texttt{variableNames} is a character vector containing the names of previously shared variables. + +\texttt{memshare} also allows for releasing all the memory allocated for a given namespace by a memory context, i.e., a parallel cluster with a master session, via the + +\begin{itemize} +\tightlist +\item + \texttt{memshare\_gc(namespace,\ cluster)} +\end{itemize} + +function. This first removes every view handle in the context and then releases all pages. + +\textbf{Note.} Memory pages are not permanently deallocated if another session still holds a view of them. 
This ensures stability: allowing workers to continue with valid but outdated memory is safer than letting them access invalidated memory. However, releasing variables still in use is always a user error and must be avoided. + +\paragraph{Worker session}\label{worker-session} + +Once the master session has shared variables, worker sessions can retrieve them via: + +\begin{itemize} +\tightlist +\item + \texttt{retrieveViews(namespace,\ variableNames)} +\end{itemize} + +This returns a named list of \texttt{R} objects. These objects are raw ALTREP objects indistinguishable from the originals (\texttt{is.matrix}, \texttt{is.numeric}, etc.), and all behave the same. + +When operating on these objects, workers interact directly with the underlying \texttt{C} buffer, backed by \texttt{mmap} (Unix) or \texttt{MapViewOfFile} (Windows). Changes to such objects modify the shared memory for all sessions. In this framework, however, modification is secondary---the main goal is to transfer data from the master to worker sessions. + +For metadata access without retrieving views, workers can call: + +\begin{itemize} +\tightlist +\item + \texttt{retrieveMetadata(namespace,\ variableName)} +\end{itemize} + +which provides information for a single variable. + +After processing, workers must return their views to allow memory release by calling: + +\begin{itemize} +\tightlist +\item + \texttt{releaseViews(namespace,\ variableNames)} +\end{itemize} + +The overall high-level concept is summarized in Figure \ref{fig:figurememshare}. 
+ +\begin{figure} +\includegraphics[width=1\linewidth]{figures/Grafik_Memshare} \caption{A schematic about where the memory is located and how different sessions access it.}\label{fig:figurememshare} +\end{figure} + +\paragraph{Diagnostic tools}\label{diagnostic-tools} + +To verify correct memory management, two diagnostic functions are available: + +\begin{itemize} +\tightlist +\item + \texttt{pageList()}: lists all variables owned by the current session.\\ +\item + \texttt{viewList()}: lists all views (handles) currently held by the session. +\end{itemize} + +The former is stricter, since it identifies ownership, whereas the latter only tracks held views. + +\paragraph{\texorpdfstring{User-friendly wrapper functions for \texttt{apply} and \texttt{lapply}}{User-friendly wrapper functions for apply and lapply}}\label{user-friendly-wrapper-functions-for-apply-and-lapply} + +Since memory ownership and address sharing are low-level concepts, \texttt{memshare} provides wrapper functions that mimic \texttt{parallel::parApply} and \texttt{parallel::parLapply}. + +\begin{itemize} +\tightlist +\item + \texttt{memApply(X,\ MARGIN,\ FUN,\ NAMESPACE,\ CLUSTER,\ VARS,\ MAX.CORES)}\strut \\ + Mimics \texttt{parallel::parApply}. + + \begin{itemize} + \tightlist + \item + \texttt{X}: a double matrix.\\ + \item + \texttt{MARGIN}: direction (\texttt{1\ =\ row-wise}, \texttt{2\ =\ column-wise}).\\ + \item + \texttt{FUN}: function applied to rows/columns.\\ + \item + \texttt{CLUSTER}: a prepared parallel cluster (variables exported via \texttt{parallel::clusterExport}).\\ + \item + \texttt{VARS}: additional shared variables (names must match \texttt{FUN} arguments).\\ + \item + \texttt{MAX.CORES}: only relevant if \texttt{CLUSTER} is uninitialized. + \end{itemize} +\end{itemize} + +\texttt{memApply} automatically manages sharing and cleanup of \texttt{X} and \texttt{VARS}, ensuring no residual \texttt{C}/\texttt{C++} buffers remain. 
Both \texttt{X} and \texttt{VARS} can also refer to previously allocated shared variables, though in that case the user must manage their lifetime.
+
+\begin{itemize}
+\tightlist
+\item
+  \texttt{memLapply(X,\ FUN,\ NAMESPACE,\ CLUSTER,\ VARS,\ MAX.CORES)}\strut \\
+  Equivalent to \texttt{parallel::parLapply}, but within the \texttt{memshare} framework.
+
+  \begin{itemize}
+  \tightlist
+  \item
+    \texttt{X}: a list of double matrices or vectors.\\
+  \item
+    Other arguments behave the same way as in \texttt{memApply}.
+  \end{itemize}
+\end{itemize}
+
+\subsubsection{Examples of Use}\label{examples-of-use}
+
+We provide two top-level examples for the use of the \texttt{memshare} package: one with \texttt{memLapply} and one with \texttt{memApply}.
+
+The first example computes the correlation between each column of a matrix and a reference vector using shared memory and \texttt{memApply}. The matrix can be provided directly and will be registered automatically, or by name if already registered.
+
+\begin{verbatim}
+library(memshare)
+set.seed(1)
+n <- 10000
+p <- 2000
+# Numeric double matrix (required): n rows (cases) x p columns (features)
+X <- matrix(rnorm(n * p), n, p)
+# Reference vector to correlate with each column
+y <- rnorm(n)
+f <- function(v, y) cor(v, y)
+
+ns <- "my_namespace"
+res <- memshare::memApply(
+  X = X, MARGIN = 2,
+  FUN = f,
+  NAMESPACE = ns,
+  VARS = list(y = y),
+  MAX.CORES = NULL # defaults to detectCores() - 1
+)
+\end{verbatim}
+
+\texttt{memApply} parallelizes a row- or column-wise map over a matrix that lives once in shared memory. If \texttt{X} is passed as an ordinary \texttt{R} matrix, it is registered under a generated name in the namespace \texttt{ns}. Additional variables (here \texttt{y}) can be provided as a named list; these are registered and retrieved as ALTREP views on the workers. A cluster is created automatically if none is provided.
Each worker obtains a cached view of the matrix (and any shared variables), extracts the \(i\)-th row or column as a vector \texttt{v} according to \texttt{MARGIN}, calls \texttt{FUN(v,...)}, and returns the result. Views are released after the computation, and any objects registered by this call are freed. Because workers operate on shared views rather than copies, the total resident memory remains close to a single in-RAM copy of \texttt{X}, while runtime scales with the available cores.
+
+As a second example, consider a case where a list of 1000 random matrices is multiplied by a random vector. This task is parallelizable at the element level and demonstrates the use of \texttt{memshare::memLapply}, which applies a function across list elements in a shared memory context:
+
+\begin{verbatim}
+library(memshare)
+list_length <- 1000
+matrix_dim <- 100
+
+# Create the list of random matrices
+l <- lapply(
+  1:list_length,
+  function(i) matrix(rnorm(matrix_dim * matrix_dim),
+                     nrow = matrix_dim, ncol = matrix_dim)
+)
+
+y <- rnorm(matrix_dim)
+namespace <- "my_namespace"
+
+res <- memLapply(l, function(el, y) {
+  el %*% y
+}, NAMESPACE = namespace, VARS = list(y = y),
+  MAX.CORES = 1) # MAX.CORES = 1 for simplicity
+\end{verbatim}
+
+\texttt{memLapply()} provides a parallel version of \texttt{lapply()} where the list elements and optional auxiliary variables are stored in shared memory. If the input \texttt{X} is an ordinary \texttt{R} list, it is first registered in a shared memory namespace. Additional variables can be supplied either as names of existing shared objects or as a named list to be registered. A parallel cluster is created automatically if none is provided, and each worker is initialized with the \texttt{memshare} environment.
+
+For each index of the list, the worker retrieves an ALTREP view of the corresponding element (and of any shared variables), applies the user-defined function \texttt{FUN} to these objects, and then releases the views to avoid memory leaks.
The function enforces that the first argument of \texttt{FUN} corresponds to the list element and that the names of shared variables match exactly between the namespace and the function signature. Results are collected with \texttt{parLapply}, yielding an ordinary \texttt{R} list of the same length as the input.
+
+Because only lightweight references to the shared objects are passed to the workers, no duplication of large data occurs, making the approach memory-efficient. Finally, \texttt{memLapply()} includes cleanup routines to release temporary registrations, stop the cluster if it was created internally, and free shared memory, ensuring safe reuse in subsequent computations.
+
+\subsubsection{Benchmark design}\label{benchmark-design}
+
+We compare \texttt{memshare} and \texttt{SharedObject} on a column-wise apply task across square matrices of sizes \(10^i \times 10^i\) for \(i = 1,\ldots,5\). We use a PSOCK cluster with 32 cores on an iMac Pro (256 GB DDR4, 2.3 GHz 18-core Intel Xeon W) running macOS Sequoia 16.6.1 (24G90) with R 4.5.1 \texttt{x86\_64-apple-darwin20}. For each size, we ran 100 repetitions and recorded wall-clock times and resident set size (RSS) across all worker PIDs plus the master. The RSS is summed via \texttt{ps()} and our helper \texttt{total\_rss\_mb()}.
+We define the memory overhead as the difference in total RSS before and after the call, i.e., we measure the additional memory required by the computation beyond the base process footprint.
+
+For \texttt{SharedObject} we create a shared copy \texttt{A2\ <-\ share(A1)} and apply \texttt{parApply()} to it; for \texttt{memshare} we call \texttt{memApply} directly on \texttt{A1} with a namespace, so that only ALTREP views are created on the workers. A serial baseline uses \texttt{apply()}, and an additional baseline uses \texttt{parApply()} without shared memory.
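+A minimal sketch of the overhead measurement, assuming the \texttt{ps} package; the helper used in the actual benchmark script may differ in detail:
+
+\begin{verbatim}
+library(ps)
+
+# Sum the resident set size (in MB) over the master and all worker PIDs
+total_rss_mb <- function(pids) {
+  procs <- ps::ps()                    # snapshot of all running processes
+  sum(procs$rss[procs$pid %in% pids],
+      na.rm = TRUE) / 1024^2           # rss is reported in bytes
+}
+
+# Memory overhead of a call:
+# total_rss_mb(pids) after the call minus total_rss_mb(pids) before it
+\end{verbatim}
+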
A minimally edited version of the full script (setup, PID collection, loops, and data saving) is provided in Appendix\textasciitilde A to ensure reproducibility. + +As part of our safety and lifecycle checks, we ensure that views, which keep shared segments alive, are always released in the workers before returning control. Once all work is complete, the corresponding variables are then released in the master. To maintain fairness, we avoid creating incidental copies, such as those introduced by coercions, remove variables with \texttt{rm()}, and use R's garbage collection \texttt{gc()} after each call. + +\subsubsection{RNA-seq dataset via FireBrowse}\label{rna-seq-dataset-via-firebrowse} + +FireBrowse \citep{firebrowse2025} delivers raw counts of gene expression indexed by NCBI identifiers. For each gene identifier \(i\) (from Ensembl or NCBI), we obtain a raw count \(r_i\) that quantifies the observed read abundance. These raw counts represent the number of reads mapped to each gene, without length normalization. To convert raw counts into TPM (transcripts per million) \citep{li2010rsem}, we require gene or transcript lengths \(l_i\). For each gene \(i\), we compute: + +\begin{equation} +\hat{r}_i=\frac{r_i}{l_i} +\end{equation} + +The total sum \(R = \sum_i \hat{r}_i\) across all genes is then used to scale values as: + +\begin{equation} +TPM_i=\frac{\hat{r}_i}{R} \times 10^6 +\end{equation} +This transformation allows comparison of expression levels across genes and samples by correcting for gene length and sequencing depth \citep{li2010rsem}. After transformation, our dataset consists of \(d = 19,637\) gene expressions across \(N = 10,446\) cases spanning 32 diagnoses. It can be found under \citep{thrun2025genexpressions}. + +\subsection{Results}\label{results} + +In the first subsection, the efficiency of \texttt{memshare} is compared to \texttt{SharedObject}, and in the second subsection the application is presented. 
+ +\subsubsection{Performance and Memory}\label{performance-and-memory} + +In Figure \ref{fig:figure1}, the results for square matrices of increasing magnitudes are shown as summary plots with variability bars. In these error-bar-style plots the bars indicate median ± AMAD. +The bottom subfigures (C--D) provide a zoomed view of the first four matrix sizes, while the top subfigures (A--B) display all five magnitudes from \(10^1\) to \(10^5\). +The x-axis represents the magnitude of the matrix size, and the y-axis shows either runtime in seconds (subfigures A and C) or memory overhead in megabytes (subfigures B and D). Memory overhead is measured as the difference in total RSS. For each magnitude, 100 trials were conducted. The magenta line indicates the performance of the single-threaded R baseline. + +Table\textasciitilde\ref{tab:median-res-tab-static} reports the median runtime and memory overhead, while variability is visualized in Figure\textasciitilde\ref{fig:figure1-detail} via the scatter of 100 runs and summarized numerically by the robust AMAD dispersion statistic in Table\textasciitilde\ref{tab:amad-res-tab-static}. + +\texttt{memshare} (orange diamonds) consistently outperforms \texttt{SharedObject} (blue circles) in Figure \ref{fig:figure1}. In the scatter plot of the 100 trials in Figure\textasciitilde\ref{fig:figure1-detail}, it is evident that for the first three magnitudes both packages use less memory than the baseline. At \(10^4\), average memory usage is comparable to the baseline, while at \(10^5\), \texttt{memshare} slightly exceeds it. \texttt{SharedObject}, however, could only be executed at this magnitude from the terminal, but not within RStudio (see Appendix\textasciitilde B). + +Considering relative differences \citep{Ultsch2008}, \texttt{memshare} achieves computation times that are 90--170\% faster than \texttt{SharedObject}. 
For matrices of size \(10^2\) and larger, memory consumption is reduced by 132--153\% compared to \texttt{SharedObject}.
+
+\begin{figure}
+\includegraphics[width=1\linewidth]{figures/Figure1} \caption{Matrix size depicted as magnitude vs median runtime (left) and vs memory overhead (MB) during the run relative to idle (right) for `memshare` and `SharedObject` as error-bar-style plots with intervals given by the median ± AMAD across 100 runs. In addition, the serial baseline is shown as a line in magenta. The top subfigures present the full range of matrix sizes, and the bottom subfigures zoom in.}\label{fig:figure1}
+\end{figure}
+
+\begin{longtable}[]{@{}
+  >{\raggedright\arraybackslash}p{(\linewidth - 8\tabcolsep) * \real{0.2101}}
+  >{\raggedright\arraybackslash}p{(\linewidth - 8\tabcolsep) * \real{0.1884}}
+  >{\raggedright\arraybackslash}p{(\linewidth - 8\tabcolsep) * \real{0.2029}}
+  >{\raggedright\arraybackslash}p{(\linewidth - 8\tabcolsep) * \real{0.2029}}
+  >{\raggedright\arraybackslash}p{(\linewidth - 8\tabcolsep) * \real{0.1957}}@{}}
+\caption{\label{tab:median-res-tab-static} The benchmark compares four types: \texttt{memshare}, \texttt{SharedObject}, a single-threaded baseline, and a parallel baseline. For \texttt{memshare} and \texttt{SharedObject}, the reported values are the medians over 100 iterations, while the baselines are the result from either a single-threaded R or a simple \texttt{parApply} run using one iteration. Magnitude refers to the matrix size.
Entries are given as \emph{Time Consumed (Memory Overhead)}, where time is measured in seconds and memory in megabytes (MB); the memory after call is mentioned in Appendix C.}\tabularnewline +\toprule\noalign{} +\begin{minipage}[b]{\linewidth}\raggedright +Type / Magnitude +\end{minipage} & \begin{minipage}[b]{\linewidth}\raggedright +Baseline +\end{minipage} & \begin{minipage}[b]{\linewidth}\raggedright +Baseline parApply +\end{minipage} & \begin{minipage}[b]{\linewidth}\raggedright +SharedObject +\end{minipage} & \begin{minipage}[b]{\linewidth}\raggedright +memshare +\end{minipage} \\ +\midrule\noalign{} +\endfirsthead +\toprule\noalign{} +\begin{minipage}[b]{\linewidth}\raggedright +Type / Magnitude +\end{minipage} & \begin{minipage}[b]{\linewidth}\raggedright +Baseline +\end{minipage} & \begin{minipage}[b]{\linewidth}\raggedright +Baseline parApply +\end{minipage} & \begin{minipage}[b]{\linewidth}\raggedright +SharedObject +\end{minipage} & \begin{minipage}[b]{\linewidth}\raggedright +memshare +\end{minipage} \\ +\midrule\noalign{} +\endhead +\bottomrule\noalign{} +\endlastfoot +1 & 0.0003 (0.0234) & 0.0049 (0.9023) & 0.2492 (0.0801) & 0.0416 (1.1426) \\ +2 & 0.0008 (0.1461) & 0.0034 (0.1461) & 0.2531 (0.5117) & 0.0419 (0.3594) \\ +3 & 0.0231 (15.2656) & 0.0356 (7.6406) & 0.3238 (11.6387) & 0.0481 (1.4688) \\ +4 & 2.2322 (1664.9727) & 3.5015 (2627.1133) & 1.5526 (1570.0566) & 0.6655 (895.4473) \\ +4.2 & 9.2883 (4490.4648) & 12.9872 (6040.4023) & 3.1147 (3901.5137) & 1.6223 (3881.7441) \\ +4.5 & 36.3783 (17206.4688) & 53.6183 (24983.7852) & 10.3513 (15391.0020) & 6.6583 (15285.4258) \\ +4.7 & 92.0136 (39355.5703) & 130.6937 (62936.4766) & 32.3116 (38533.7266) & 16.3157 (38389.7305) \\ +5 & 217.4490 (157812.9492) & -- & 128.0942 (152967.0273) & 67.0000 (76311.7402) \\ +\end{longtable} + +\begin{longtable}[]{@{} + >{\raggedright\arraybackslash}p{(\linewidth - 8\tabcolsep) * \real{0.1628}} + >{\raggedright\arraybackslash}p{(\linewidth - 8\tabcolsep) * 
\real{0.1628}}
+  >{\raggedright\arraybackslash}p{(\linewidth - 8\tabcolsep) * \real{0.1279}}
+  >{\raggedright\arraybackslash}p{(\linewidth - 8\tabcolsep) * \real{0.2907}}
+  >{\raggedright\arraybackslash}p{(\linewidth - 8\tabcolsep) * \real{0.2558}}@{}}
+\caption{\label{tab:amad-res-tab-static} AMAD for the benchmark of \texttt{SharedObject} vs \texttt{memshare}.}\tabularnewline
+\toprule\noalign{}
+\begin{minipage}[b]{\linewidth}\raggedright
+Magnitude
+\end{minipage} & \begin{minipage}[b]{\linewidth}\raggedright
+Type
+\end{minipage} & \begin{minipage}[b]{\linewidth}\raggedright
+Time Consumed (seconds)
+\end{minipage} & \begin{minipage}[b]{\linewidth}\raggedright
+Memory Overhead (MB)
+\end{minipage} & \begin{minipage}[b]{\linewidth}\raggedright
+Memory after Call
+\end{minipage} \\
+\midrule\noalign{}
+\endfirsthead
+\toprule\noalign{}
+\begin{minipage}[b]{\linewidth}\raggedright
+Magnitude
+\end{minipage} & \begin{minipage}[b]{\linewidth}\raggedright
+Type
+\end{minipage} & \begin{minipage}[b]{\linewidth}\raggedright
+Time Consumed (seconds)
+\end{minipage} & \begin{minipage}[b]{\linewidth}\raggedright
+Memory Overhead (MB)
+\end{minipage} & \begin{minipage}[b]{\linewidth}\raggedright
+Memory after Call
+\end{minipage} \\
+\midrule\noalign{}
+\endhead
+\bottomrule\noalign{}
+\endlastfoot
+1 & SharedObject & 0.01241 & 0.06400 & 7.47613 \\
+2 & SharedObject & 0.00549 & 0.05270 & 12.85171 \\
+3 & SharedObject & 0.00743 & 0.23716 & 109.60844 \\
+4 & SharedObject & 0.05179 & 9.98323 & 364.98253 \\
+4.2 & SharedObject & 0.13739 & 9.28681 & 533.72239 \\
+4.5 & SharedObject & 0.63438 & 14.73392 & 1619.81984 \\
+4.7 & SharedObject & 2.09502 & 15.81807 & 823.25853 \\
+5 & SharedObject & 4.01802 & 22.88011 & 555.74045 \\
+1 & memshare & 0.00198 & 1.41166 & 51.33531 \\
+2 & memshare & 0.00251 & 0.27480 & 17.12808 \\
+3 & memshare & 0.00228 & 1.91232 & 78.65743 \\
+4 & memshare & 0.02038 & 110.18064 & 5956.56036 \\
+4.2 & memshare & 0.03206 & 39.21014 & 2346.06943 \\
+4.5 &
memshare & 0.24571 & 36.97784 & 1086.58683 \\
+4.7 & memshare & 0.53134 & 89.44624 & 4081.05399 \\
+5 & memshare & 2.48280 & 31.98623 & 2413.87405 \\
+\end{longtable}
+
+\subsubsection{Application to Feature Selection by Mutual Information using Pareto Density Estimation}\label{application-to-feature-selection-by-mutual-information-using-pareto-density-estimation}
+
+The computation of mutual information produced values ranging from 0 to 0.54 (Figure \ref{fig:figure2}). The QQ-plot shows clear deviation from a straight line, indicating that the distribution is not Gaussian. Both the histogram and the PDE plot provide consistent estimates of the probability density, revealing a bimodal structure. The boxplot further highlights the presence of outliers with values above 0.4.
+
+The analysis required about two hours of computation time and approximately 47 GB of RAM, feasible only through memory sharing. In practice, mutual information values can guide feature selection, either by applying a hard threshold or by using a soft approach via a mixture model, depending on the requirements of the subsequent machine learning task.
+
+\begin{figure}
+\includegraphics[width=1\linewidth]{figures/figure2-1} \caption{The distribution of mutual information for 19637 gene expressions as a histogram, Pareto Density Estimation (PDE), QQ-plot against the normal distribution, and boxplot. There are no missing values (NaN).}\label{fig:figure2}
+\end{figure}
+
+\newpage
+
+\subsection{Discussion}\label{discussion}
+
+On all major platforms, PSOCK clusters execute each worker in a separate R process. Consequently, a call to \texttt{parApply(cl,\ X,\ MARGIN,\ FUN,\ …)} requires R to serialize the matrix to transmit it to each worker, and deserialize it into a full in-memory copy in every process.
+As shown in Table \ref{tab:median-res-tab-static}, this replication leads to out-of-memory (OOM) failures once the matrix reaches a size of \(10^5 \times 10^5\), despite the machine providing 256 GB of RAM. +Even for smaller magnitudes, substantial redundant memory allocation occurs: multiple workers may begin materializing private copies of the matrix faster than they can complete their portion of the computation, resulting in transient but significant memory amplification. This behavior, inherent to PSOCK-based parallelization, explains the observed OOM conditions under \texttt{parApply}. + +Consequently, shared memory becomes a foundational requirement for scalable high-performance computing in \texttt{R}, because it avoids redundant data replication and stabilizes memory usage as problem sizes grow. +In our experiments, this advantage is reflected directly in performance: across matrix sizes, \texttt{memshare} achieved a two-fold reduction in median computation time compared to \texttt{SharedObject} on the column-wise task. For large matrix sizes, both \texttt{memshare} and \texttt{SharedObject} show a lower total memory overhead than the single-threaded baseline, because \texttt{R}'s serial \texttt{apply()} implementation creates substantial temporary objects. Each column extraction allocates a full-length numeric vector, and additional intermediates are produced by \texttt{FUN}. These private allocations inflate the baseline's RSS, and memory is not promptly returned to the OS due to R's garbage-collection strategy. In contrast, \texttt{memshare} and \texttt{SharedObject} provide ALTREP-backed views into a shared memory segment, eliminating the need to materialize full column vectors. + +For matrices of size \(10^2\) and larger, memory overhead was between half and a third of that of \texttt{SharedObject}. 
At the smallest size (\(10^1\)), \texttt{memshare} consumed more memory than \texttt{SharedObject} because, in R, the metadata must also be shared; this requires a second shared-memory segment whose fixed overhead dominates at small sizes.
+
+The experiments show that \texttt{SharedObject} exhibited overhead consistent with copy-on-write materializations and temporary object creation up to size \(10^4\). Its memory usage was approximately an order of magnitude higher on macOS than that of \texttt{memshare} or the single-threaded baseline, as illustrated by the triangles aligning with the single-thread baseline of a higher magnitude in Figure \ref{fig:figure1-detail}. For matrices of size \(10^5\), \texttt{SharedObject} caused RStudio to crash (see Appendix\textasciitilde D), although results were computable in R via the terminal.
+
+Beyond these synthetic benchmarks, the RNA-seq case study illustrates how this computational behavior translates into a practical high-dimensional analysis pipeline. Biologically, the bimodal structure in Figure\textasciitilde\ref{fig:figure2} is consistent with a separation between largely uninformative background genes and a smaller subset of diagnosis- or subtype-specific markers. Genes in the lower mode, i.e., with MI values close to zero, show little association with the 32 diagnostic labels and plausibly correspond to broadly expressed housekeeping or pathway genes whose expression varies only weakly across cancer types. In contrast, genes in the upper mode, i.e., with MI values in the right-hand peak, exhibit strong dependence on the diagnostic label and are therefore candidates for disease- or tissue-specific markers, including lineage markers and immune-related genes that differ systematically between tumor entities.
Although a detailed pathway or gene-set enrichment analysis is beyond the scope of this work, the presence of a distinct high-MI mode indicates that the PDE-based mutual information filter successfully highlights genes whose expression patterns are highly structured with respect to the underlying diagnostic classes and likely reflect underlying molecular subtypes.
+
+At the same time, the fact that this analysis required approximately two hours of computation time and about 47\textasciitilde GB of RAM underscores the central role of memory sharing: without \texttt{memshare}, running such an MI-PDE filter on 19,637 genes and 10,446 cases in parallel would be prohibitively memory-intensive on typical multicore hardware. In sum, \texttt{memshare} provides a more stable memory-sharing interface, scales more effectively to large matrix sizes, and achieves greater computational efficiency than \texttt{SharedObject}.
+
+\subsection{Summary}\label{summary}
+
+Regardless of the package, \texttt{R}'s single-threaded API implies that multi-threaded computation should touch only raw memory and synchronize results at the main-thread boundary. Shared mutation requires external synchronization if multiple workers write to overlapping regions. In practice, read-mostly patterns are ideal.
+
+Here, \texttt{memshare}'s namespace-plus-view model and its \texttt{memApply} wrapper simplify cross-process sharing compared to manual \texttt{share()} plus cluster wiring. Its explicit \texttt{releaseViews}/\texttt{releaseVariables} lifecycle makes retention and cleanup auditable. \texttt{SharedObject}'s fine-grained properties are powerful, but the interaction of copy-on-write and duplication semantics increases cognitive load.
+
+\texttt{memshare} combines ALTREP-backed shared memory with a pragmatic parallel API to deliver strong speed and memory efficiency on multicore systems.
In analytic pipelines like MI-based feature selection for RNA-seq, this enables simple, scalable patterns---one in-RAM copy of the data, many cores, and no serialization overhead.
+
+\subsection{Appendix A: code listing of benchmark}\label{appendix-a-code-listing-of-benchmark}
+
+The full code is accessible via \citep{thrun2025mem_appendixa}. To avoid the crash message for \texttt{SharedObject} in Appendix\textasciitilde D, we tried manual garbage collection and performed saves for each iteration without being able to change the outcome. Only by restricting usage to the `R' terminal console without RStudio were we able to compute all results.
+
+\subsection{Appendix B: ALTREP and shared memory in R}\label{appendix-b-altrep-and-shared-memory-in-r}
+
+In R, ALTREP (short for ALTernate REPresentations) is a framework introduced in version 3.5.0 that allows vectors and other objects to be stored and accessed in non-standard ways while maintaining their usual R interface. Built-in type checks cannot tell the difference between an ALTREP object and its ordinary counterpart, which ensures compatibility.
+
+Instead of relying solely on R's default contiguous in-memory arrays, ALTREP permits objects such as integers, doubles, or strings to be backed by alternative storage mechanisms. Developers can override fundamental methods that govern vector behavior---such as length queries, element access (\texttt{DATAPTR}, \texttt{DATAPTR\_OR\_NULL}, etc.), duplication, coercion, and even printing---so that objects can behave normally while drawing data from different sources.
+
+Because these overrides are transparent to R's higher-level functions, ALTREP objects can be passed, transformed, and manipulated like regular vectors, regardless of whether their contents reside in memory, on disk, or are computed lazily.
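+As a small illustration of this transparency, compact integer sequences in base R (since version 3.5.0) are themselves ALTREP objects, yet ordinary type checks and operations treat them like regular vectors:
+
+\begin{verbatim}
+x <- 1:1e6          # stored as a compact ALTREP sequence, not ~4 MB of data
+is.integer(x)       # TRUE: type checks cannot see the representation
+mean(x)             # 500000.5, computed as for an ordinary vector
+x[1] <- 0L          # modification materializes an ordinary vector copy
+\end{verbatim}
+
+Shared-memory views in \texttt{memshare} behave analogously, except that the ALTREP methods redirect element access to the shared \texttt{C} buffer rather than to a compact sequence.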
+
+For package authors, this framework makes it possible to expose objects that look identical to standard R vectors but internally retrieve their data from sources like memory-mapped files, shared memory, compressed formats, or custom C++ buffers. In practice, this enables efficient handling of large datasets and unconventional data representations while keeping existing R code unchanged.
+
+\subsection{Appendix C: benchmarking in detail}\label{appendix-c-benchmarking-in-detail}
+
+In Figure \ref{fig:figure1-detail}, the results are presented as scatter plots for square matrices of increasing magnitudes. The left panel shows detailed scatter plots for the first three matrix sizes, while the right panel summarizes all five magnitudes from \(10^1\) to \(10^5\). The x-axis represents computation time (log seconds), and the y-axis represents memory overhead (log megabytes), measured as the difference in total RSS. Each point corresponds to one of 100 trials per magnitude. The magenta baseline indicates the performance of a single-threaded R computation.
+
+\begin{figure}
+\includegraphics[width=1\linewidth]{figures/Figure1_appendix_secs_vs_Resident_Set_Size_mac} \caption{Median runtime (log-scale) vs matrix size for `memshare`, `SharedObject`, and serial baseline; ribbons show IQR across 100 runs. Insets show the difference in total RSS in log(MB), i.e., the memory overhead, during the run relative to idle for Mac, presenting the details of Figure \ref{fig:figure1}.}\label{fig:figure1-detail}
+\end{figure}
+
+To compute the results for Figure \ref{fig:appendix-figure1}, we used a PSOCK cluster with 15 workers on a different iMac, namely, 128 GB DDR4, 3.8 GHz 8-Core Intel Xeon W with Windows 10 on Boot Camp, and R 4.5.1. For each size, we ran 100 repetitions and recorded wall-clock times and resident set size (RSS) across all worker PIDs plus the master.
+
+The results of the benchmark on Windows are presented in Tables \ref{tab:appendix-res-tab-static} and \ref{tab:appendix-amad-res-tab-static}, and in Figure \ref{fig:appendix-figure1}.
+
+Within macOS Tahoe, for \texttt{SharedObject}, the memory after a call increases from about 3062 MB at \(10^1\) to 173,083 MB at \(10^5\), i.e.~from roughly 3 GB to 169 GB.
+For \texttt{memshare}, the memory after a call grows from about 3490 MB at \(10^1\) to 128,393 MB at \(10^5\), i.e.~from roughly 3.5 GB to 125 GB.
+Based on these numbers, \texttt{memshare} uses slightly more memory after the call than \texttt{SharedObject} for small and medium problem sizes (exponents \(10^1\) to \(10^{4.7}\)),
+but at the largest matrix size \(10^5\), \texttt{memshare} requires substantially less overall memory than \texttt{SharedObject}. However, within Windows\textasciitilde10 the situation between \texttt{SharedObject} and \texttt{memshare} changes, as depicted in Table \ref{tab:appendix-res-tab-static}.
+
+It is important to emphasize that our benchmark was conducted on specific and somewhat idiosyncratic hardware, i.e., a 2021 iMac running Windows\textasciitilde10 via Boot Camp. This configuration combines Apple hardware, Apple's firmware and drivers, and Microsoft's operating system and memory manager in a way that is not representative of typical server or workstation deployments. As a consequence, low-level aspects that are relevant for shared-memory performance, such as page allocation strategy, memory-mapped file handling, and the interaction between R, BLAS, and the operating system scheduler, may differ substantially on other platforms. We therefore refrain from drawing broader conclusions about absolute performance or cross-platform behavior from this benchmark and instead interpret the results as a comparative case study of \texttt{memshare} versus \texttt{SharedObject} on this concrete, well-specified environment.
+ +\begin{longtable}[]{@{} + >{\raggedright\arraybackslash}p{(\linewidth - 8\tabcolsep) * \real{0.1895}} + >{\raggedright\arraybackslash}p{(\linewidth - 8\tabcolsep) * \real{0.1158}} + >{\raggedright\arraybackslash}p{(\linewidth - 8\tabcolsep) * \real{0.2632}} + >{\raggedright\arraybackslash}p{(\linewidth - 8\tabcolsep) * \real{0.2316}} + >{\raggedright\arraybackslash}p{(\linewidth - 8\tabcolsep) * \real{0.2000}}@{}} +\caption{\label{tab:appendix-res-tab-static} Median runtime and memory overhead for the benchmark on Windows 10 via Boot Camp.}\tabularnewline +\toprule\noalign{} +\begin{minipage}[b]{\linewidth}\raggedright +Type +\end{minipage} & \begin{minipage}[b]{\linewidth}\raggedright +Magnitude +\end{minipage} & \begin{minipage}[b]{\linewidth}\raggedright +Time Consumed (seconds) +\end{minipage} & \begin{minipage}[b]{\linewidth}\raggedright +Memory Overhead (MB) +\end{minipage} & \begin{minipage}[b]{\linewidth}\raggedright +Memory after Call +\end{minipage} \\ +\midrule\noalign{} +\endfirsthead +\toprule\noalign{} +\begin{minipage}[b]{\linewidth}\raggedright +Type +\end{minipage} & \begin{minipage}[b]{\linewidth}\raggedright +Magnitude +\end{minipage} & \begin{minipage}[b]{\linewidth}\raggedright +Time Consumed (seconds) +\end{minipage} & \begin{minipage}[b]{\linewidth}\raggedright +Memory Overhead (MB) +\end{minipage} & \begin{minipage}[b]{\linewidth}\raggedright +Memory after Call +\end{minipage} \\ +\midrule\noalign{} +\endhead +\bottomrule\noalign{} +\endlastfoot +SharedObject & 1 & 0.0047 & 0.0391 & 2870.6816 \\ +SharedObject & 2 & 0.0124 & 0.4648 & 2873.7266 \\ +SharedObject & 3 & 0.0707 & 13.3691 & 2929.9121 \\ +SharedObject & 4 & 1.0768 & 2594.9395 & 5799.9766 \\ +SharedObject & 4.2 & 2.0788 & 7029.2070 & 10131.3750 \\ +SharedObject & 4.5 & 6.1894 & 23832.9727 & 26923.1855 \\ +SharedObject & 4.7 & 16.2314 & 38973.7051 & 42078.9980 \\ +memshare & 1 & 0.0417 & 1.3848 & 1609.0801 \\ +memshare & 2 & 0.0412 & 1.3145 & 1619.4062 \\ +memshare & 3 & 0.0487 & 
3.3945 & 1675.7246 \\ +memshare & 4 & 0.6164 & 764.8184 & 2823.6172 \\ +memshare & 4.2 & 1.4858 & 3841.7480 & 5881.9570 \\ +memshare & 4.5 & 5.1790 & 15275.3418 & 17298.4473 \\ +memshare & 4.7 & 13.7764 & 38336.5898 & 40407.6250 \\ +Baseline & 1 & 0.0002 & 0.0000 & 796.5848 \\ +Baseline & 2 & 0.0008 & 0.0021 & 796.8240 \\ +Baseline & 3 & 0.0138 & 0.1461 & 816.7769 \\ +Baseline & 4 & 1.6573 & 1686.0985 & 2542.8888 \\ +Baseline & 4.2 & 9.1679 & 4356.6723 & 5234.8967 \\ +Baseline & 4.5 & 34.7380 & 17936.2663 & 18959.0366 \\ +Baseline & 4.7 & 86.3598 & 45320.6208 & 46571.1692 \\ +Baseline Parallel & 1 & 0.0041 & 0.9023 & 78916.7539 \\ +Baseline Parallel & 2 & 0.0043 & 0.0000 & 739.4570 \\ +Baseline Parallel & 3 & 0.0389 & 7.1484 & 765.4844 \\ +Baseline Parallel & 4 & 1.9595 & 1053.2031 & 3416.6445 \\ +Baseline Parallel & 4.2 & 10.6068 & 2621.7109 & 6546.7422 \\ +Baseline Parallel & 4.5 & 44.0077 & 23568.1680 & 24471.5742 \\ +Baseline Parallel & 4.7 & 108.1856 & 58851.6836 & 59758.3945 \\ +\end{longtable} + +\begin{longtable}[]{@{} + >{\raggedright\arraybackslash}p{(\linewidth - 8\tabcolsep) * \real{0.1538}} + >{\raggedright\arraybackslash}p{(\linewidth - 8\tabcolsep) * \real{0.1209}} + >{\raggedright\arraybackslash}p{(\linewidth - 8\tabcolsep) * \real{0.2747}} + >{\raggedright\arraybackslash}p{(\linewidth - 8\tabcolsep) * \real{0.2418}} + >{\raggedright\arraybackslash}p{(\linewidth - 8\tabcolsep) * \real{0.2088}}@{}} +\caption{\label{tab:appendix-amad-res-tab-static} AMAD for the benchmark grid for \texttt{SharedObject} and \texttt{memshare} on Windows 10 via Boot Camp.}\tabularnewline +\toprule\noalign{} +\begin{minipage}[b]{\linewidth}\raggedright +Type +\end{minipage} & \begin{minipage}[b]{\linewidth}\raggedright +Magnitude +\end{minipage} & \begin{minipage}[b]{\linewidth}\raggedright +Time Consumed (seconds) +\end{minipage} & \begin{minipage}[b]{\linewidth}\raggedright +Memory Overhead (MB) +\end{minipage} & \begin{minipage}[b]{\linewidth}\raggedright +Memory after 
Call +\end{minipage} \\ +\midrule\noalign{} +\endfirsthead +\toprule\noalign{} +\begin{minipage}[b]{\linewidth}\raggedright +Type +\end{minipage} & \begin{minipage}[b]{\linewidth}\raggedright +Magnitude +\end{minipage} & \begin{minipage}[b]{\linewidth}\raggedright +Time Consumed (seconds) +\end{minipage} & \begin{minipage}[b]{\linewidth}\raggedright +Memory Overhead (MB) +\end{minipage} & \begin{minipage}[b]{\linewidth}\raggedright +Memory after Call +\end{minipage} \\ +\midrule\noalign{} +\endhead +\bottomrule\noalign{} +\endlastfoot +SharedObject & 1 & 0.00065 & 0.00000 & 0.38021 \\ +SharedObject & 2 & 0.00126 & 0.00000 & 0.26351 \\ +SharedObject & 3 & 0.00855 & 2.11560 & 1.31002 \\ +SharedObject & 4 & 0.02707 & 5.14219 & 2.51839 \\ +SharedObject & 4.2 & 0.02521 & 18.96512 & 2.62003 \\ +SharedObject & 4.5 & 0.06582 & 43.49404 & 29.24573 \\ +SharedObject & 4.7 & 0.22797 & 186.75258 & 181.13231 \\ +memshare & 1 & 0.00558 & 0.99004 & 9.55032 \\ +memshare & 2 & 0.00411 & 1.33637 & 6.46726 \\ +memshare & 3 & 0.00373 & 6.80983 & 8.52263 \\ +memshare & 4 & 0.01511 & 304.55616 & 114.84474 \\ +memshare & 4.2 & 0.02341 & 158.45925 & 99.27513 \\ +memshare & 4.5 & 0.04168 & 173.53949 & 125.11030 \\ +memshare & 4.7 & 0.09361 & 148.23133 & 148.96163 \\ +\end{longtable} + +\begin{figure} +\includegraphics[width=1\linewidth]{figures/Figure1_appendix_secs_vs_Resident_Set_Size_win} \caption{Median runtime (log-scale) vs.\ matrix size for \texttt{memshare}, \texttt{SharedObject}, and the serial baseline; ribbons show the IQR across 100 runs. Insets show the difference in total RSS in log(MB) (i.e., memory overhead) during the run relative to idle, for Windows~10 via Boot Camp.}\label{fig:appendix-figure1} +\end{figure} + +\subsection{Appendix D: Screenshots}\label{appendix-d-screenshot} + +Figure \ref{fig:app-a-1} shows a screenshot of the RStudio crash that occurs if \texttt{SharedObject} is called with a matrix of size \(10^5\); Figure \ref{fig:app-a-2} shows the state after RStudio was forced to close.
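+The triggering call can be sketched as follows (a minimal sketch, assuming ``size \(10^5\)'' denotes a square matrix with \(10^5\) rows and columns, i.e.\ roughly 80\,GB of doubles even before sharing):
+
+\begin{example}
+library(SharedObject)
+X  <- matrix(0.0, nrow = 1e5, ncol = 1e5)  # 1e10 doubles, ~8e10 bytes
+Xs <- share(X)  # copies X into shared memory; produced the crash shown
+\end{example}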
+ +\begin{figure} +\includegraphics[width=1\linewidth]{figures/Crash_message1} \caption{First screenshot of the \texttt{SharedObject} computation.}\label{fig:app-a-1} +\end{figure} + +\begin{figure} +\includegraphics[width=1\linewidth]{figures/Crash_message2} \caption{Second screenshot of the \texttt{SharedObject} computation.}\label{fig:app-a-2} +\end{figure} + +\bibliography{RJreferences.bib} + +\address{% +Michael C. Thrun\\ +University of Marburg, Mathematics and Computer Science, D-35032 Marburg\\% +IAP-GmbH Intelligent Analytics Projects, In den Birken 10A, 29352 Adelheidsdorf\\ +% +\url{https://www.iap-gmbh.de}\\% +\textit{ORCiD: \href{https://orcid.org/0000-0001-9542-5543}{0000-0001-9542-5543}}\\% +\href{mailto:mthrun@informatik.uni-marburg.de}{\nolinkurl{mthrun@informatik.uni-marburg.de}}% +} + +\address{% +Julian Märte\\ +IAP-GmbH Intelligent Analytics Projects\\% +In den Birken 10A, 29352 Adelheidsdorf\\ +% +\url{https://www.iap-gmbh.de}\\% +\textit{ORCiD: \href{https://orcid.org/0000-0001-5451-1023}{0000-0001-5451-1023}}\\% +\href{mailto:j.maerte@iap-gmbh.de}{\nolinkurl{j.maerte@iap-gmbh.de}}% +} diff --git a/_articles/RJ-2025-043/RJ-2025-043.zip b/_articles/RJ-2025-043/RJ-2025-043.zip new file mode 100644 index 0000000000..5b24eb6762 Binary files /dev/null and b/_articles/RJ-2025-043/RJ-2025-043.zip differ diff --git a/_articles/RJ-2025-043/RJournal.sty b/_articles/RJ-2025-043/RJournal.sty new file mode 100644 index 0000000000..351990be38 --- /dev/null +++ b/_articles/RJ-2025-043/RJournal.sty @@ -0,0 +1,358 @@ +% Package `RJournal' to use with LaTeX2e +% Copyright (C) 2010 by the R Foundation +% Copyright (C) 2013 by the R Journal +% +% Originally written by Kurt Hornik and Friedrich Leisch with subsequent +% edits by the editorial board +% +% CAUTION: +% Do not modify this style file. Any changes to this file will be reset when your +% article is submitted.
+% If you must modify the style or add LaTeX packages to the article, these +% should be specified in RJwrapper.tex + +\NeedsTeXFormat{LaTeX2e}[1995/12/01] +\ProvidesPackage{RJournal}[2025/10/05 v0.17 RJournal package] + +\RequirePackage{tikz} + +% Overall page layout, fonts etc ----------------------------------------------- + +% Issues of \emph{The R Journal} are created from the standard \LaTeX{} +% document class \pkg{report}. + +\RequirePackage{geometry} +\geometry{a4paper, + textwidth=14cm, top=1cm, bottom=1cm, + includehead,includefoot,centering, + footskip=1.5cm} +\raggedbottom +\sloppy +\clubpenalty = 10000 +\widowpenalty = 10000 +\brokenpenalty = 10000 +\usepackage{microtype} + + +\RequirePackage{fancyhdr} +\fancyhead{} +\fancyheadoffset{2cm} +\fancyhead[L]{\textsc{\RJ@sectionhead}} +\fancyhead[R]{\thepage} +\fancyfoot{} +\fancyfoot[L]{The R Journal Vol. \RJ@volume/\RJ@number, \RJ@month~\RJ@year} +\fancyfoot[R]{ISSN 2073-4859} +\pagestyle{fancy} + +% We use the following fonts (all with T1 encoding): +% +% rm & palatino +% tt & inconsolata +% sf & helvetica +% math & palatino + +\RequirePackage{microtype} + +\RequirePackage[scaled=0.92]{helvet} +\RequirePackage{palatino,mathpazo} +\RequirePackage[scaled=1.02]{inconsolata} +\RequirePackage[T1]{fontenc} + +\RequirePackage[hyphens]{url} +\RequirePackage[pagebackref]{hyperref} +\renewcommand{\backref}[1]{[p#1]} + +% Dark blue colour for all links +\RequirePackage{color} +\definecolor{link}{rgb}{0.45,0.51,0.67} +\hypersetup{ + colorlinks,% + citecolor=link,% + filecolor=link,% + linkcolor=link,% + urlcolor=link +} + +% Give the text a little room to breathe +\setlength{\parskip}{3pt} +\RequirePackage{setspace} +\setstretch{1.05} + +% Issue and article metadata --------------------------------------------------- + +% Basic front matter information about the issue: volume, number, and +% date.
+ +\newcommand{\volume}[1]{\def\RJ@volume{#1}} +\newcommand{\volnumber}[1]{\def\RJ@number{#1}} +\renewcommand{\month}[1]{\def\RJ@month{#1}} +\renewcommand{\year}[1]{\def\RJ@year{#1}} + + +% Individual articles correspond to +% chapters, and are contained in |article| environments. This makes it +% easy to have figures counted within articles and hence hyperlinked +% correctly. + +% An article has an author, a title, and optionally a subtitle. We use +% the obvious commands for specifying these. Articles will be put in certain +% journal sections, named by \sectionhead. + +\newcommand {\sectionhead} [1]{\def\RJ@sectionhead{#1}} +\renewcommand{\author} [1]{\def\RJ@author{#1}} +\renewcommand{\title} [1]{\def\RJ@title{#1}} +\newcommand {\subtitle} [1]{\def\RJ@subtitle{#1}} + +% Control appearance of titles: make slightly smaller than usual, and +% suppress section numbering. See http://tex.stackexchange.com/questions/69749 +% for why we don't use \setcounter{secnumdepth}{-1} + +\usepackage[medium]{titlesec} +\usepackage{titletoc} +\titleformat{\section} {\normalfont\large\bfseries}{\arabic{section}}{1em}{} +\titleformat{\subsection}{\normalfont\normalsize\bfseries}{\arabic{section}.\arabic{subsection}}{0.5em}{} +\titlecontents{chapter} [0em]{}{}{}{\titlerule*[1em]{.}\contentspage} + +% Article layout --------------------------------------------------------------- + +% Environment |article| clears the article header information at its beginning. +% We use |\FloatBarrier| from the placeins package to keep floats within +% the article. +\RequirePackage{placeins} +\newenvironment{article}{\author{}\title{}\subtitle{}\FloatBarrier}{\FloatBarrier} + +% Refereed articles should have an abstract, so we redefine |\abstract| to +% give the desired style + +\renewcommand{\abstract}[1]{\noindent\textbf{Abstract} #1} +\renewenvironment{abstract}{\noindent\textbf{Abstract}~}{} + +% The real work is done by a redefined version of |\maketitle|. 
Note +% that even though we do not want chapters (articles) numbered, we +% need to increment the chapter counter, so that figures get correct +% labelling. + +\renewcommand{\maketitle}{% +\noindent + \chapter{\RJ@title}\refstepcounter{chapter} + \ifx\empty\RJ@subtitle + \else + \noindent\textbf{\RJ@subtitle} + \par\nobreak\addvspace{\baselineskip} + \fi + \ifx\empty\RJ@author + \else + \noindent\textit{\RJ@author} + \par\nobreak\addvspace{\baselineskip} + \fi + \@afterindentfalse\@nobreaktrue\@afterheading +} + +% Now for some ugly redefinitions. We do not want articles to start a +% new page. (Actually, we do, but this is handled via explicit +% \newpage +% +% The name@of@eq is a hack to get hyperlinks to equations to work +% within each article, even though there may be multiple eq.(1) +% \begin{macrocode} +\renewcommand\chapter{\secdef\RJ@chapter\@schapter} +\providecommand{\nohyphens}{% + \hyphenpenalty=10000\exhyphenpenalty=10000\relax} +\newcommand{\RJ@chapter}{% + \edef\name@of@eq{equation.\@arabic{\c@chapter}}% + \renewcommand{\@seccntformat}[1]{}% + \@startsection{chapter}{0}{0mm}{% + -2\baselineskip \@plus -\baselineskip \@minus -.2ex}{\p@}{% + \phantomsection\normalfont\huge\bfseries\raggedright}} + +% Book reviews should appear as sections in the text and in the pdf bookmarks, +% however we wish them to appear as chapters in the TOC. Thus we define an +% alternative to |\maketitle| for reviews. +\newcommand{\review}[1]{ + \pdfbookmark[1]{#1}{#1} + \section*{#1} + \addtocontents{toc}{\protect\contentsline{chapter}{#1}{\thepage}{#1.1}} +} + +% We want bibliographies as starred sections within articles. +% +\RequirePackage[sectionbib,round]{natbib} +\bibliographystyle{abbrvnat} +\renewcommand{\bibsection}{\section*{References}} + +% Equations, figures and tables are counted within articles, but we do +% not show the article number. For equations it becomes a bit messy to avoid +% having hyperref getting it wrong. 
+ +% \numberwithin{equation}{chapter} +\renewcommand{\theequation}{\@arabic\c@equation} +\renewcommand{\thefigure}{\@arabic\c@figure} +\renewcommand{\thetable}{\@arabic\c@table} + +% Issue layout ----------------------------------------------------------------- + +% Need to provide our own version of |\tableofcontents|. We use the +% tikz package to get the rounded rectangle. Notice that |\section*| +% is really the same as |\chapter*|. +\renewcommand{\contentsname}{Contents} +\renewcommand\tableofcontents{% + \vspace{1cm} + \section*{\contentsname} + { \@starttoc{toc} } +} + +\renewcommand{\titlepage}{% + \thispagestyle{empty} + \hypersetup{ + pdftitle={The R Journal Volume \RJ@volume/\RJ@number, \RJ@month \RJ@year},% + pdfauthor={R Foundation for Statistical Computing},% + } + \noindent + \begin{center} + \fontsize{50pt}{50pt}\selectfont + The \raisebox{-8pt}{\includegraphics[height=77pt]{Rlogo-5}}\hspace{10pt} + Journal + + \end{center} + {\large \hfill Volume \RJ@volume/\RJ@number, \RJ@month{} \RJ@year \quad} + + \rule{\textwidth}{1pt} + \begin{center} + {\Large A peer-reviewed, open-access publication of the \\ + R Foundation for Statistical Computing} + \end{center} + + % And finally, put in the TOC box. Note the way |tocdepth| is adjusted + % before and after producing the TOC: thus, we can ensure that only + % articles show up in the printed TOC, but that in the PDF version, + % bookmarks are created for sections and subsections as well (provided + % that the non-starred forms are used). + \setcounter{tocdepth}{0} + \tableofcontents + \setcounter{tocdepth}{2} + \clearpage +} + +% Text formatting -------------------------------------------------------------- + +\newcommand{\R}{R} +\newcommand{\address}[1]{\addvspace{\baselineskip}\noindent\emph{#1}} +\newcommand{\email}[1]{\href{mailto:#1}{\normalfont\texttt{#1}}} + +% Simple font selection is not good enough. For example, |\texttt{--}| +% gives `\texttt{--}', i.e., an endash in typewriter font. 
Hence, we +% need to turn off ligatures, which currently only happens for commands +% |\code| and |\samp| and the ones derived from them. Hyphenation is +% another issue; it should really be turned off inside |\samp|. And +% most importantly, \LaTeX{} special characters are a nightmare. E.g., +% one needs |\~{}| to produce a tilde in a file name marked by |\file|. +% Perhaps a few years ago, most users would have agreed that this may be +% unfortunate but should not be changed to ensure consistency. But with +% the advent of the WWW and the need for getting `|~|' and `|#|' into +% URLs, commands which only treat the escape and grouping characters +% specially have gained acceptance + +\DeclareRobustCommand\code{\bgroup\@noligs\@codex} +\def\@codex#1{\texorpdfstring% +{{\normalfont\ttfamily\hyphenchar\font=-1 #1}}% +{#1}\egroup} +\newcommand{\kbd}[1]{{\normalfont\texttt{#1}}} +\newcommand{\key}[1]{{\normalfont\texttt{\uppercase{#1}}}} +\DeclareRobustCommand\samp{`\bgroup\@noligs\@sampx} +\def\@sampx#1{{\normalfont\texttt{#1}}\egroup'} +\newcommand{\var}[1]{{\normalfont\textsl{#1}}} +\let\env=\code +\newcommand{\file}[1]{{`\normalfont\textsf{#1}'}} +\let\command=\code +\let\option=\samp +\newcommand{\dfn}[1]{{\normalfont\textsl{#1}}} +% \acronym is effectively disabled since not used consistently +\newcommand{\acronym}[1]{#1} +\newcommand{\strong}[1]{\texorpdfstring% +{{\normalfont\fontseries{b}\selectfont #1}}% +{#1}} +\let\pkg=\strong +\newcommand{\CRANpkg}[1]{\href{https://CRAN.R-project.org/package=#1}{\pkg{#1}}}% +\let\cpkg=\CRANpkg +\newcommand{\ctv}[1]{\href{https://CRAN.R-project.org/view=#1}{\emph{#1}}} +\newcommand{\BIOpkg}[1]{\href{https://www.bioconductor.org/packages/release/bioc/html/#1.html}{\pkg{#1}}} + +% Example environments --------------------------------------------------------- +\RequirePackage{fancyvrb} +\RequirePackage{alltt} + +\DefineVerbatimEnvironment{example}{Verbatim}{} +\renewenvironment{example*}{\begin{alltt}}{\end{alltt}} + +% 
Support for output from Sweave, and generic session style code +% These used to have fontshape=sl for Sinput/Scode/Sin, but pslatex +% won't use a condensed font in that case. + +% Update (2015-05-28 by DS): remove fontsize=\small to match example environment + +\DefineVerbatimEnvironment{Sinput}{Verbatim}{} +\DefineVerbatimEnvironment{Soutput}{Verbatim}{} +\DefineVerbatimEnvironment{Scode}{Verbatim}{} +\DefineVerbatimEnvironment{Sin}{Verbatim}{} +\DefineVerbatimEnvironment{Sout}{Verbatim}{} +\newenvironment{Schunk}{}{} + +% Mathematics ------------------------------------------------------------------ + +% The implementation of |\operatorname| is similar to the mechanism +% \LaTeXe{} uses for functions like sin and cos, and simpler than the +% one of \AmSLaTeX{}. We use |\providecommand| for the definition in +% order to keep the one of the \pkg{amstex} if this package has +% already been loaded. +% \begin{macrocode} +\providecommand{\operatorname}[1]{% + \mathop{\operator@font#1}\nolimits} +\RequirePackage{amsfonts} + +\renewcommand{\P}{% + \mathop{\operator@font I\hspace{-1.5pt}P\hspace{.13pt}}} +\newcommand{\E}{% + \mathop{\operator@font I\hspace{-1.5pt}E\hspace{.13pt}}} +\newcommand{\VAR}{\operatorname{var}} +\newcommand{\COV}{\operatorname{cov}} +\newcommand{\COR}{\operatorname{cor}} + +% Figures ---------------------------------------------------------------------- + +% For use with pandoc > 3.2.1 +\newsavebox\pandoc@box +\newcommand*\pandocbounded[1]{% scales image to fit in text height/width + \sbox\pandoc@box{#1}% + \Gscale@div\@tempa{\textheight}{\dimexpr\ht\pandoc@box+\dp\pandoc@box\relax}% + \Gscale@div\@tempb{\linewidth}{\wd\pandoc@box}% + \ifdim\@tempb\p@<\@tempa\p@\let\@tempa\@tempb\fi% select the smaller of both + \ifdim\@tempa\p@<\p@\scalebox{\@tempa}{\usebox\pandoc@box}% + \else\usebox{\pandoc@box}% + \fi% +} + +\RequirePackage[font=small,labelfont=bf]{caption} + +% Wide environments for figures and tables ------------------------------------- 
+\RequirePackage{environ} + +% An easy way to make a figure span the full width of the page +\NewEnviron{widefigure}[1][]{ +\begin{figure}[#1] +\advance\leftskip-2cm +\begin{minipage}{\dimexpr\textwidth+4cm\relax}% + \captionsetup{margin=2cm} + \BODY +\end{minipage}% +\end{figure} +} + +\NewEnviron{widetable}[1][]{ +\begin{table}[#1] +\advance\leftskip-2cm +\begin{minipage}{\dimexpr\textwidth+4cm\relax}% + \captionsetup{margin=2cm} + \BODY +\end{minipage}% +\end{table} +} diff --git a/_articles/RJ-2025-043/RJreferences.bib b/_articles/RJ-2025-043/RJreferences.bib new file mode 100644 index 0000000000..a10267074f --- /dev/null +++ b/_articles/RJ-2025-043/RJreferences.bib @@ -0,0 +1,108 @@ +@InProceedings{ultsch2005pareto, + author = {Alfred Ultsch}, + title = {Pareto Density Estimation: A density estimation for knowledge discovery}, + booktitle = {Proceedings of the 28th Annual Conference of the {German Classification Society} ({GfKl})}, + year = {2005}, + pages = {91--98}, + publisher = {Springer}, + series = {Studies in Classification, Data Analysis, and Knowledge Organization} +} +@Article{thrun2020analyzing, + author = {Michael C. Thrun and Tim Gehlert and Alfred Ultsch}, + title = {Analyzing the Fine Structure of Distributions}, + journal = {PLOS ONE}, + year = {2020}, + volume = {15}, + number = {10}, + pages = {e0238835}, + doi = {10.1371/journal.pone.0238835}, + url = {https://doi.org/10.1371/journal.pone.0238835} +} +@Article{prive2018efficient, + title = {Efficient analysis of large-scale genome-wide data with two {R} packages: bigstatsr and bigsnpr}, + author = {Florian Prive and Hugues Aschard and Andrey Ziyatdinov and Michael G. B. 
Blum}, +journal = {Bioinformatics}, +volume = {34}, +number = {16}, +pages = {2781--2787}, +year = {2018}, +publisher = {Oxford University Press}, +doi = {10.1093/bioinformatics/bty185}, +url = {https://doi.org/10.1093/bioinformatics/bty185} +} +@Article{kane2013scalable, +title = {Scalable strategies for computing with massive data}, +author = {Michael Kane and John W. Emerson and Stephen Weston}, +journal = {Journal of Statistical Software}, +volume = {55}, +number = {14}, +pages = {1--19}, +year = {2013}, +doi = {10.18637/jss.v055.i14}, +url = {https://doi.org/10.18637/jss.v055.i14} +} +@Manual{sharedobject2025, +title = {SharedObject: Sharing {R} objects across multiple {R} processes without memory duplication}, +author = {Jiefei Wang and Martin Morgan}, +year = {2025}, +note = {{R} package version (Bioconductor Release)}, +doi = {10.18129/B9.bioc.SharedObject}, +url = {https://bioconductor.org/packages/SharedObject} +} +@Manual{Rparallel2025, +title = {Support for Parallel Computation in {R}}, +author = {{R Core Team}}, +year = {2025}, +note = {{R} package 'parallel' version included in {R}}, +url = {https://stat.ethz.ch/R-manual/R-devel/library/parallel/doc/parallel.pdf} +} +@Manual{doparallel2025, +title = {doParallel: Foreach Parallel Adaptor for the 'parallel' Package}, +author = {Microsoft Corporation and Steve Weston}, +year = {2025}, +note = {{R} package version 1.0.17}, +url = {https://github.com/revolutionanalytics/doparallel} +} +@Misc{firebrowse2025, +author = {{Broad Institute of MIT and Harvard}}, +title = {FireBrowse (RRID:SCR\_026320)}, +howpublished = {\url{http://firebrowse.org/}}, +note = {Accessed via \url{https://gdac.broadinstitute.org}}, +year = {2025} +} +@Article{li2010rsem, +author = {Bo Li and Colin N.
Dewey}, + title = {RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome}, + journal = {BMC Bioinformatics}, + year = {2011}, + volume = {12}, + number = {1}, + pages = {323}, + doi = {10.1186/1471-2105-12-323}, + url = {https://doi.org/10.1186/1471-2105-12-323} +} +@Misc{thrun2025genexpressions, + author = {Michael Thrun and Julian Märte}, + title = {Genexpressions Dataset derived from {FireBrowse}}, + year = {2025}, + publisher = {Zenodo}, + doi = {10.5281/zenodo.16937028}, + url = {https://zenodo.org/records/16937028} +} +@Misc{thrun2025mem_appendixa, + author = {Michael Thrun}, + title = {AppendixA: Memshare: Memory Sharing for Multicore Computation in {R} with an Application to Feature Selection by Mutual Information using {PDE}}, + year = {2025}, + publisher = {Zenodo}, + doi = {10.5281/zenodo.17762666}, + url = {https://zenodo.org/records/17762666} +} +@InProceedings{Ultsch2008, + author = {Ultsch, Alfred}, + title = {Is Log Ratio a Good Value for Measuring Return in Stock Investments?}, + booktitle = {Advances in Data Analysis, Data Handling and Business Intelligence}, + series = {Studies in Classification, Data Analysis, and Knowledge Organization}, + year = {2008}, + pages = {505--511}, + publisher = {Springer} +} diff --git a/_articles/RJ-2025-043/RJwrapper.tex b/_articles/RJ-2025-043/RJwrapper.tex new file mode 100644 index 0000000000..070cefd3d3 --- /dev/null +++ b/_articles/RJ-2025-043/RJwrapper.tex @@ -0,0 +1,70 @@ +\documentclass[a4paper]{report} +\usepackage[utf8]{inputenc} +\usepackage[T1]{fontenc} +\usepackage{RJournal} +\usepackage{amsmath,amssymb,array} +\usepackage{booktabs} + + +% tightlist command for lists without linebreak +\providecommand{\tightlist}{% + \setlength{\itemsep}{0pt}\setlength{\parskip}{0pt}} + +\usepackage{longtable} + +% Always define CSL refs as bib entries are contained in separate doc +% Pandoc citation processing +%From Pandoc 3.1.8 +% definitions for citeproc citations 
+\NewDocumentCommand\citeproctext{}{} +\NewDocumentCommand\citeproc{mm}{% +\begingroup\def\citeproctext{#2}\cite{#1}\endgroup} +\makeatletter + % allow citations to break across lines + \let\@cite@ofmt\@firstofone + % avoid brackets around text for \cite: + \def\@biblabel#1{} + \def\@cite#1#2{{#1\if@tempswa , #2\fi}} +\makeatother +\newlength{\cslhangindent} +\setlength{\cslhangindent}{1.5em} +\newlength{\csllabelwidth} +\setlength{\csllabelwidth}{3em} +\newenvironment{CSLReferences}[2] % #1 hanging-indent, #2 entry-spacing + {\begin{list}{}{% + \setlength{\itemindent}{0pt} + \setlength{\leftmargin}{0pt} + \setlength{\parsep}{0pt} + % turn on hanging indent if param 1 is 1 + \ifodd #1 + \setlength{\leftmargin}{\cslhangindent} + \setlength{\itemindent}{-1\cslhangindent} + \fi + % set entry spacing + \setlength{\itemsep}{#2\baselineskip}}} + {\end{list}} +\usepackage{calc} +\newcommand{\CSLBlock}[1]{#1\hfill\break} +\newcommand{\CSLLeftMargin}[1]{\parbox[t]{\csllabelwidth}{#1}} +\newcommand{\CSLRightInline}[1]{\parbox[t]{\linewidth - \csllabelwidth}{#1}\break} +\newcommand{\CSLIndent}[1]{\hspace{\cslhangindent}#1} + + + +\begin{document} + + +%% do not edit, for illustration only +\sectionhead{Contributed research article} +\volume{17} +\volnumber{4} +\year{2025} +\month{December} +\setcounter{page}{305} + +\begin{article} + \input{RJ-2025-043} +\end{article} + + +\end{document} diff --git a/_articles/RJ-2025-043/_Rpackages.txt b/_articles/RJ-2025-043/_Rpackages.txt new file mode 100644 index 0000000000..8759687ec5 --- /dev/null +++ b/_articles/RJ-2025-043/_Rpackages.txt @@ -0,0 +1,15 @@ +install.packages("BiocManager") +BiocManager::install("SharedObject") +install.packages("ps") +install.packages("this.path") +# "parallel" ships with base R and does not need to be installed +install.packages("memshare") +install.packages("rjtools") +install.packages("tinytex") +install.packages("ggplot2") +install.packages("ScatterDensity") +install.packages("DataVisualizations") +install.packages("DatabionicSwarm")
+install.packages("this.path") +install.packages("kableExtra") +install.packages("memshare") \ No newline at end of file diff --git a/_articles/RJ-2025-043/data/DF_Results_Windows.rda b/_articles/RJ-2025-043/data/DF_Results_Windows.rda new file mode 100644 index 0000000000..39a24cdac9 Binary files /dev/null and b/_articles/RJ-2025-043/data/DF_Results_Windows.rda differ diff --git a/_articles/RJ-2025-043/data/DF_Results_mac.rda b/_articles/RJ-2025-043/data/DF_Results_mac.rda new file mode 100644 index 0000000000..8adf766af5 Binary files /dev/null and b/_articles/RJ-2025-043/data/DF_Results_mac.rda differ diff --git a/_articles/RJ-2025-043/data/MI_values.lrn b/_articles/RJ-2025-043/data/MI_values.lrn new file mode 100644 index 0000000000..52242cf6ba --- /dev/null +++ b/_articles/RJ-2025-043/data/MI_values.lrn @@ -0,0 +1,19643 @@ +# +# TimeDiff in hours: 2.12562474164698 MemDiff in MB: 47048.78125 mem_idle: 5601.41796875 mem_at_end: 52650.19921875 TimeStart: 2025-08-24 13:42:13.871551 TimeEnd: 2025-08-24 15:49:46.120621 04MionFirebrowse.R +% 19637 +% 2 +% 9 1 +% Key MI +1 0.0103357844659038109608 +2 0.0136229204007971942125 +3 0.0053687399366724323016 +4 0.0205366940846860197845 +5 0.0091291201295921713121 +6 0.1588031986411286566874 +7 0.0763309086256121482883 +8 0.091837792461858108739 +9 0.2051376914173367316252 +10 0.0407009632459015766037 +11 0.0328634733225747419083 +12 0.0531853000768327463521 +13 0.0910589014850319428129 +14 0.2606956031353761527036 +15 0.0223210756645285113287 +16 0.032694455815095051221 +17 0.18048850528075247035 +18 0.0167575135795176889675 +19 0.070715598081555697263 +20 0.0067266521732648023105 +21 0.0008732898408048254697 +22 0.001309238573689347658 +23 0.0778839413126197893655 +24 0.1754167453152533362459 +25 0.1285636345994227158762 +26 0.1561569698104834924557 +27 0.2210195520839992877615 +28 0.0105729423818335978302 +29 0.0549741745095929934539 +30 0.2179148787442016788063 +31 0.1189439123174001189387 +32 0.0919574002340092372387 +33 
0.1211689524662124817223 +34 0.1038808352493928860438 +35 0.0043862063682261380818 +36 0.0930073457525145558256 +37 0.2918745558843475196298 +38 0.0148118419679443592829 +39 0.1391084706021359918005 +40 0.0662059840295123186449 +41 0.0776196252193091795757 +42 0.0662845388976514571233 +43 0.1529710437799923217206 +44 0.1026337663064361971355 +45 0.1978735915208991169578 +46 0.01126049967976763648 +47 0.1301579396159418255952 +48 0.0680944933413662051302 +49 0.1261022084671164777969 +50 0.0342832531180844723995 +51 0.0166782050858635920443 +52 0.1570312896191174978355 +53 0.0427386058766998089276 +54 0.1211917434541791710467 +55 0.0758128897547038355098 +56 0.0676202158644089584838 +57 0.1579025153371456102747 +58 0.1491587406070797872104 +59 0.0677906770293165150676 +60 0.0635731938522145628712 +61 0.1221214745180637939459 +62 0.0456832434850031610107 +63 0.0180795922496297632731 +64 0.0136310643499734634315 +65 0.085253663543921234802 +66 0.1154109218688044863343 +67 0.2580830547552808162948 +68 0.140851124472036487445 +69 0.0572108851765401812961 +70 0.1296626150766504725897 +71 0.2581813366164703027472 +72 0.2174909244007858233338 +73 0.0123050661795052144326 +74 0.0774881538618829995846 +75 0.1065382222439721915386 +76 0.052221816974826501534 +77 0.1523591805254878028819 +78 0.1666132273822979736888 +79 0.1137828417766790778387 +80 0.1097743214901914371451 +81 0.2020679032927427398736 +82 0.1078490630960094731616 +83 0.1476393505258706273509 +84 0.0276127693454678822815 +85 0.0449751651240704111712 +86 0.0961148665890804720924 +87 0.117234449277988636573 +88 0.2309422417583602005209 +89 0.2725870763231378890445 +90 0.0286232778263691249598 +91 0.2149793490458201117299 +92 0.084713629842953455662 +93 0.2126798923929108797726 +94 0.3082101987368611184515 +95 0.052941387100607216476 +96 0.0355214669213359743316 +97 0.0739811638601361037937 +98 0.1628495821525295372822 +99 0.1451177477010414107461 +100 0.1842680042060051703867 +101 0.3238772450876324371372 +102 
0.1291051410082738049034 +103 0.2687136390682243747285 +104 0.3732011451839402638164 +105 0.0612239265018399217433 +106 0.1206370071085289125135 +107 0.1862203143463728172158 +108 0.1883485453305878687402 +109 0.2204198273784529482633 +110 0.1576909258413582981539 +111 0.1246754738123149325313 +112 0.1549066987227584457631 +113 0.0998664894375900363821 +114 0.0018991834565824108026 +115 0.3113520294372204566535 +116 0.1913003253120307278756 +117 0.0839495693257413938548 +118 0.0974885125480152853195 +119 0.2558774859592539141495 +120 0.2381067659111554823959 +121 0.1627620275800221727458 +122 0.0363217644516830445722 +123 0.1515025105803990967956 +124 0.1787698604715260430886 +125 0.0484035519544335629538 +126 0.108454666751242170819 +127 0.2172094328404398644317 +128 0.1304255912214545565231 +129 0.1864433295332020845692 +130 0.2298289767302330033338 +131 0.1981020807828208774026 +132 0.0058518979257340727335 +133 0.1239925025295599281261 +134 0.176632984696542705283 +135 0.2166066635737893542046 +136 0.0702257169765366784375 +137 0.0277348198302902852752 +138 0.1708206140735340494707 +139 0.3310959127553178560355 +140 0.2477599759983817040432 +141 0.1342943934116657689337 +142 0.0864614517372727098277 +143 0.0339959115588105154071 +144 0.1618089443765982349266 +145 0.0457250970034678710929 +146 0.0800823813452104515953 +147 0.0018194833494231655267 +148 0.0435537037379662567149 +149 0.0887397188567303762952 +150 0.1823741200470327394889 +151 0.0195629907130866198539 +152 0.0159970329711347225565 +153 0.0601851371607157181742 +154 0.1030378385749737696342 +155 0.0234641547664009124385 +156 0.0200132344882308042811 +157 0.221700131852678478106 +158 0.0184484755629038335578 +159 0.2262451079882820959011 +160 0.0865838487198515444065 +161 0.1284802748720689780093 +162 0.2223805815782405881365 +163 0.1686345560229465523339 +164 0.0829081254560372071793 +165 0.0809128585110912151857 +166 0.1569651381727796557453 +167 0.0813908203138564989576 +168 
0.2014302408254322052095 +169 0.0131633659318595146875 +170 0.1788001053046577581984 +171 0.1589279579652697338421 +172 0.0704575333031021155961 +173 0.1316082149775442688977 +174 0.2023759217509896735088 +175 0.1487082729277183767991 +176 0.149596668960252754399 +177 0.1178575114164024245644 +178 0.1475194682520282185578 +179 0.0206895191678280604419 +180 0.1299544752629553379109 +181 0.1263512361969936770301 +182 0.1126181878633464444883 +183 0.0050097898050745536327 +184 0.1341082164768893913998 +185 0.0359220781402921768288 +186 0.058929250370876276599 +187 0.0393457457495349077758 +188 0.1554445110484795888883 +189 0.0026472884495242605084 +190 0.0790965436067974092538 +191 0.1631453668438315740552 +192 0.1261857119544278205137 +193 0.1793055736799167021456 +194 0.0623062903257211725405 +195 0.2301145795430945273363 +196 0.1118210438214992108463 +197 0.0334070432172128353732 +198 0.120779197919567482522 +199 0.0875749369921712400577 +200 0.1742626653999835573128 +201 0.0033517273955190000218 +202 0.1878795187525192367239 +203 0.1791777947135514692523 +204 0.1262276119633287674304 +205 0.0826186224296062976524 +206 0.0038610377645176981697 +207 0.0938294867626150719264 +208 0.0028061568956736998928 +209 0.2343106602353579592801 +210 0.0157715919190124682914 +211 0.2575555789676397533405 +212 0.0411660331757855921242 +213 0.1869633534118859408135 +214 0.1615141486142451510144 +215 0.0004995849072089598303 +216 0.0004539482641914736817 +217 0.0115079159558560181059 +218 0.0004446049273121707637 +219 0.1328547814449447528329 +220 0.0043396094324674562917 +221 0.0012736262799782976322 +222 0.0849305108750872256307 +223 0.2417666387951524775701 +224 0.2816779437916664963204 +225 0.1757300027398601971473 +226 0.2155555367911683450899 +227 0.1666480972455068043647 +228 0.1150119943847732056907 +229 0.2808730396404316431713 +230 0.0912625156928722897076 +231 0.2112389815576330986957 +232 0.2351861107006636142369 +233 0.0069844986116444169807 +234 
0.0012788220502079326348 +235 0.1737665986841356313697 +236 0.0101111332539210157372 +237 0.0328275953909663090191 +238 0.1215143256610666799844 +239 0.1161800468497165006454 +240 0.0922891978213567221179 +241 0.1083205046145546412983 +242 0.0353399994666109273922 +243 0.1055458464431290876995 +244 0.2926694701011635646459 +245 0.010155687688925283188 +246 0.0083469728836070983258 +247 0.2171756033509034122808 +248 0.1522074346448337445192 +249 0.0590907851763350158714 +250 0.0982977241410194491067 +251 0.2328961133067044275702 +252 0.2477178514614986337694 +253 0.0018234652325396457674 +254 0.0599069912578104463163 +255 0.0355224510737433343821 +256 0.0611755429802229369618 +257 0.0558614683679837104036 +258 0.1448717283016058010592 +259 0.0782470968354881390994 +260 0.0696516364697300771613 +261 0.0030255316881783096174 +262 0.0057662928031517908523 +263 0.0033972868626507232194 +264 0.0359741338750203323849 +265 0.0336361934668057138698 +266 0.0040023512807412474357 +267 0.0003651114086207292073 +268 0.0542582496760843913108 +269 0.0045689944910909090908 +270 0.1558897255846392326806 +271 0.1442134756245311288936 +272 0.0466099117540086718692 +273 0.1276810108141848421059 +274 0.1366217723351676738286 +275 0.1488725916906816770791 +276 0.0402066563485180897275 +277 0.0588251165873492562475 +278 0.0367436847777296640705 +279 0.0555518335050865438851 +280 0.0033627694779704443012 +281 0.0276124901095222037573 +282 0.0751856838832631879654 +283 0.0513435767671907017373 +284 0.0586319602834895897692 +285 0.0465174561630070093621 +286 0.0344452372410420787352 +287 0.03068748299888058792 +288 0.1122679329466285680361 +289 0.1081781505953970240475 +290 0.0185883251343557202862 +291 0.0517773910704122999893 +292 0.0356022129689673172415 +293 0.1126217748230899096118 +294 0.0414380709725151105682 +295 0.0432797522136023141148 +296 0.1505633675964879458764 +297 0.2845403634401744152882 +298 0.1091822478582534783342 +299 0.1030907242417716390692 +300 
0.1156605669827139443173 +301 0.1698484031993499066004 +302 0.1508745891507072833804 +303 0.1278833411330524139071 +304 0.1740997852114480604069 +305 0.0841266621312769802277 +306 0.2090473369608962239674 +307 0.140729269109096832846 +308 0.0889369523653875115876 +309 0.1249555921394898128085 +310 0.0391358026080069129615 +311 0.0301417565979955580369 +312 0.1824128657742941361786 +313 0.1195469889256878665007 +314 0.1364936885158714308108 +315 0.0886215097115838901232 +316 0.1245211202692586005547 +317 0.1453997653495813779934 +318 0.1325234256436954305425 +319 0.1936361769567648827284 +320 0.1236437417140118094627 +321 0.0083933434898956112147 +322 0.2613422301147083071093 +323 0.3509264443601278204987 +324 0.1633847151192330537928 +325 0.248055607174563652606 +326 0.1024143628895841234838 +327 0.0513068056696931382166 +328 0.0701808095414829802294 +329 0.0699602985127973225898 +330 0.128878405740722595807 +331 0.094260345055257641067 +332 0.0696308244852797131319 +333 0.2733052193156628728588 +334 0.1900294943757125099015 +335 0.0242450328102647542916 +336 0.0073508657351245999118 +337 0.1567007322614284714479 +338 0.0823614471595171121487 +339 0.230239655231159467963 +340 0.0376112566142615123432 +341 0.1873952423382165832777 +342 0.1577118095802926278637 +343 0.1976085856225655856822 +344 0.1081473092301731137033 +345 0.0737546870774549168104 +346 0.2077854118813743655725 +347 0.1467026930548769869667 +348 0.0393054040966375542454 +349 0.131713079725162551803 +350 0.0225816797145129144475 +351 0.1156991092835254059112 +352 0.2208376418368291627115 +353 0.0867234708524018127473 +354 0.1106415240055591692192 +355 0.0194896530385622046566 +356 0.0526009234908681316822 +357 0.0250544409029216409712 +358 0.0499333455797374009544 +359 0.1336759214484485314145 +360 0.164846231374061985564 +361 0.018451540825323201056 +362 0.101537678402829373292 +363 0.1384704065809335693515 +364 0.1504559868761530527781 +365 0.1848320344024300398988 +366 0.1120240519195521144979 
+367 0.2010392062455910733298 +368 0.102452840532241393201 +369 0.0788128929529328392523 +370 0.1205670635344291558022 +371 0.2877204620984501470815 +372 0.1530375048943290461345 +373 0.2167103124708930350639 +374 0.2373512459417280173746 +375 0.348998173004443856815 +376 0.0139901434383774819076 +377 0.1009487128374537551201 +378 0.2003882284822824377812 +379 0.1095555786677667731155 +380 0.1235661264014457499938 +381 0.1291540604061773012123 +382 0.0960587579261628354832 +383 0.0079589112942173544096 +384 0.3077985485989592762124 +385 0.0615617536678957077978 +386 0.2389586527224955292414 +387 0.0438281372640600894175 +388 0.1541283077362071707306 +389 0.1356679996452667680273 +390 0.1573828496359580275676 +391 0.1417048182872693018464 +392 0.036093317141463243658 +393 0.1039491190961282091054 +394 0.003794530176195705113 +395 0.0145685050600620567135 +396 0.0732767507902270798237 +397 0.0987764344089870072185 +398 0.2223149937030846623465 +399 0.0288500319014812628415 +400 0.1744165524079518969636 +401 0.1946460910789859499737 +402 0.2306019443555063497264 +403 0.1906630178646222251171 +404 0.0433056041019055162899 +405 0.1232631007181570481634 +406 0.3173501316532063021292 +407 0.1668630125356784787982 +408 0.1670156908687702723704 +409 0.2085573607551287345085 +410 0.1921577309724521176459 +411 0.0539605389064248680553 +412 0.0641189150061380330747 +413 0.0996292845979522279087 +414 0.2107968239452565617498 +415 0.1047624444433871149229 +416 0.113543263553194717641 +417 0.1831381692728910881574 +418 0.0057496061746064299966 +419 0.1406555083809261363914 +420 0.0305262672434031927249 +421 0.0032656734385369656087 +422 0.2761796731793259573884 +423 0.1499591898884118090773 +424 0.06010491177617544728 +425 0.24660227048055838317 +426 0.1159346334572011472108 +427 0.1223154817036979924438 +428 0.1526127408155963371961 +429 0.1639376439058205792154 +430 0.2192835251103564808695 +431 0.1652421350861395243914 +432 0.1567671060136864025214 +433 
0.0204477380697836719214 +434 0.1269827673725372474411 +435 0.189934797196983390366 +436 0.0206092489979880222339 +437 0.1333328687146567159694 +438 0.1831315365413279661055 +439 0.1186269716774443622187 +440 0.0993048615383739785578 +441 0.017236339046604400721 +442 0.0120901907368988949937 +443 0.0321425994834382486309 +444 0.1561798447803679767976 +445 0.1215061439330050591101 +446 0.2136653089532197513645 +447 0.1341155166139067766995 +448 0.0247860444156957522577 +449 0.2735876074742600305711 +450 0.2594048398554464895938 +451 0.0162508141111626890074 +452 0.0995989133236500623347 +453 0.0129420926996290444611 +454 0.1121429677305311733981 +455 0.0004764886818120700278 +456 0.156679206870268011853 +457 0.0061999673099471224597 +458 0.0646940622392331854407 +459 0.2279537105084166459523 +460 0.1996887279343586008018 +461 0.1489526608270657082045 +462 0.0497109835403597571934 +463 0.0649031101221865525108 +464 0.0869008283335677284009 +465 0.1593904351719953038646 +466 0.2436370613142594399037 +467 0.0996033965804333926153 +468 0.1884552913015014596887 +469 0.0205745709509938048964 +470 0.2158212249719445030571 +471 0.0193621073834161494531 +472 0.0084445963365318964938 +473 0.0841881609955511117649 +474 0.1992786968746164677491 +475 0.145414846047136059326 +476 0.1052547957192212224609 +477 0.1580241154261538549797 +478 0.0548379449563931864464 +479 0.1136082017178825509518 +480 0.0764617896748829295461 +481 0.2442348300025044827155 +482 0.0010082955800276981618 +483 0.1565800284571940703682 +484 0.1887587955696986563581 +485 0.0531342053817551868145 +486 0.0008199356594556804273 +487 0.0482333748845327550669 +488 0.079567198628338817179 +489 0.0598619538924331576202 +490 0.0889003878022898569311 +491 0.0577612997652972581797 +492 0.03650720582345912818 +493 0.053747560019858801883 +494 0.0827391883278271594859 +495 0.1402226907573820846586 +496 0.1110380922985091112221 +497 0.1621256181196870094219 +498 0.1451703030081691880859 +499 0.0905861214399099889949 
+500 0.1068810195440116489474 +501 0.2485244070917716485702 +502 0.380919737502897592929 +503 0.0769829734317917652175 +504 0.0393292352725764238208 +505 0.0183001048239571223464 +506 0.1275174520430334412779 +507 0.2107137482061962052704 +508 0.1561565902391427174134 +509 0.1753636293284050740038 +510 0.08075733090099172784 +511 0.0220981244077736692366 +512 0.1268533609799676609153 +513 0.1014370972147669580643 +514 0.1035988904031701351993 +515 0.1493669432139541897175 +516 0.2320868191552487302953 +517 0.0266861264368701972538 +518 0.1650732709363995909246 +519 0.2069202415434681086026 +520 0.0479373376333897052715 +521 0.165695358195496494913 +522 0.242127544672646466406 +523 0.1208165602168230073898 +524 0.2112760009189422194531 +525 0.0313100621224358527384 +526 0.2057211509124792780945 +527 0.227191960384446806076 +528 0.0590844998040627109037 +529 0.1867696331326222647373 +530 0.0117501674022213661863 +531 0.1534736333031869737198 +532 0.1019137359237553969171 +533 0.1974821737420984490985 +534 0.0279268144494842530945 +535 0.1668931473772106433007 +536 0.0482121720954979127161 +537 0.1613828526200870305818 +538 0.1806000689811398107754 +539 0.2494516182515532365382 +540 0.1183843135655776451776 +541 0.1777434170789813361324 +542 0.0296720197909334480746 +543 0.1566533648021680436013 +544 0.1330673191734303151268 +545 0.1722395453685902078078 +546 0.1428800509330881707282 +547 0.2396724122298835857325 +548 0.0507504996826455903602 +549 0.1058240936015644589885 +550 0.1927785270219431745797 +551 0.1212855405060533769968 +552 0.0011317968376532800898 +553 0.0040133941714261258879 +554 0.1030719980923030382192 +555 0.1110414864086439190594 +556 0.0425831482314530523903 +557 0.0725845813226547870167 +558 0.0257749781603314361 +559 0.0434888532931217222655 +560 0.0064563543156776915383 +561 0.1254624833628069646707 +562 0.1402876712549212534498 +563 0.0391323318403293957113 +564 0.0082388103069831741809 +565 0.1725831530384281387036 +566 
0.1147591026692927201891 +567 0.1237198082327845560791 +568 0.0927125032716096669771 +569 0.0063947987409026284603 +570 0.0107307592355499591824 +571 0.1196626716874496371146 +572 0.0867537072472013420787 +573 0.0472734391682063798124 +574 0.1999032269149785545626 +575 0.2319598154705919956786 +576 0.1027776145677796043598 +577 0.0841646321670154184957 +578 0.0531328503839602200864 +579 0.010217752978466235203 +580 0.0218767139101891383568 +581 0.0661274925760302056954 +582 0.007287025669259791609 +583 0.1062070120148032176255 +584 0.0015110920650276530881 +585 0.1492259225156165602932 +586 0.0781233354892073017517 +587 0.1747980933548836102176 +588 0.0700833921813015237046 +589 0.0978162887525698410141 +590 0.0008343235317485189024 +591 0.0062996091671819604008 +592 0.2034375305839348779813 +593 0.0323318589387552937642 +594 0.0951214432864569553461 +595 0.0040659561860614144499 +596 0.1938981974309635092357 +597 0.0664357646697954079551 +598 0.1056350048184722945077 +599 0.1912998665824247601641 +600 0.2374589082182314014879 +601 0.2169668531817345591417 +602 0.1484529174392095474122 +603 0.1464819532318143990857 +604 0.1931123579863151529246 +605 0.1519237537567467710709 +606 0.002972028373540035965 +607 0.1710832465619619990882 +608 0.1329362248393899659238 +609 0.1046751113466383859008 +610 0.0060742964115353091648 +611 0.2502418998171195285707 +612 0.0007209679851683507444 +613 0.0043825015200025024473 +614 0.0037799996904745629747 +615 0.0905833693671480588705 +616 0.2862367847643237372957 +617 0.1551518551013350011303 +618 0.1206397295593227281252 +619 0.153045872667526144939 +620 0.1668228438541513036775 +621 0.2763640609198229070032 +622 0.2215441199365376367947 +623 0.1628946871860430245516 +624 0.0643187587284703293777 +625 0.059967069173608035515 +626 0.1699739217475710884031 +627 0.1751556575738657250962 +628 0.2465364291451048184278 +629 0.0560332973225868974376 +630 0.0813274247719775050802 +631 0.0024055990779393806624 +632 0.0313027049463935475027 
+633 0.0505326371857241773977 +634 0.1036505179723193198482 +635 0.1328736973379127106476 +636 0.0067779602291926201521 +637 0.0402345819106213614669 +638 0.0098607573231701880351 +639 0.1078313855545160321325 +640 0.0147714235324436175129 +641 0.2291338882336691751274 +642 0.1694104041161113571867 +643 0.1453791975874793307444 +644 0.1359658478484937205177 +645 0.0012611662543457911531 +646 0.2803784916923703907088 +647 0.1695200783061953930808 +648 0.2169400490232868405904 +649 0.1282789849724764452699 +650 0.0805668843817697738485 +651 0.0465150645929471445728 +652 0.0947277610997054181352 +653 0.2154922016394374417736 +654 0.1178168081517685233939 +655 0.2132704675982920183852 +656 0.1905965244207901976292 +657 0.0716083453562443988938 +658 0.2473775292349370824585 +659 0.2051691444097025196491 +660 0.2242975325066883796055 +661 0.1747742841881742004162 +662 0.1328441976560619797176 +663 0.0903802800046638599163 +664 0.1675125295612704401371 +665 0.176259234199928838116 +666 0.0404494305899466624621 +667 0.0160248114802071829621 +668 0.1844926894217976953705 +669 0.0069079131528833387019 +670 0.0791730039376375288596 +671 0.1720284891025518347885 +672 0.0882342380344940946379 +673 0.1160737245857074162458 +674 0.0750434918908716025454 +675 0.0350494696921927750832 +676 0.0838024982963183895368 +677 0.0082973551492339994717 +678 0.0192892266331776418564 +679 0.0878809526309815569611 +680 0.0053715088642330745261 +681 0.0753301953714325678302 +682 0.0165356501469048865205 +683 0.0107558252704677056316 +684 0.0535980466177025730024 +685 0.1514585020084496247428 +686 0.0740672583716078525296 +687 0.0837929398395581975789 +688 0.1021375221506677921646 +689 0.012132067159613003221 +690 0.2477715897839512237955 +691 0.2840245057780644288847 +692 0.1053547062825010427467 +693 0.1059184875815815524147 +694 0.0570997391941396939541 +695 0.236102253875527912097 +696 0.0754455379215616395161 +697 0.0282206593462047371224 +698 0.1388031285660606128918 +699 
0.1023888299697438275526 +700 0.160870225658769266941 +701 0.0497133615391673264194 +702 0.1316863377258291611938 +703 0.2277573348594031932279 +704 0.0669113181718883620608 +705 0.1786158150702540636789 +706 0.1651422954653298524885 +707 0.0024703761611786171498 +708 0.1999136365660366676522 +709 0.0487212669062624809113 +710 0.1102536142758022558175 +711 0.1296835061955128876043 +712 0.3140118462421736533763 +713 0.3145863885260696268809 +714 0.1303011259720200931689 +715 0.1910373149561994665646 +716 0.1781791467226400693669 +717 0.0485081216090400280105 +718 0.0053812748483210090522 +719 0.0401292038938285411387 +720 0.0593424888730685190508 +721 0.1309382914930187213898 +722 0.1605505441664240029453 +723 0.1167624082341881947 +724 0.1556782033152454836067 +725 0.1787842857214200775751 +726 0.2378499831960694532729 +727 0.2294609144485501850408 +728 0.0239068296153474857402 +729 0.1180770299377678772546 +730 0.0405774704340964159632 +731 0.1478713268631493826799 +732 0.2330869492922156560599 +733 0.0026514291906123118106 +734 0.1903711954146452833836 +735 0.0594040758708950789035 +736 0.3088139137055315175395 +737 0.0446461478298251501773 +738 0.1721104439399007979272 +739 0.343880597107053764816 +740 0.2477095373252609367132 +741 0.2988207320743574979538 +742 0.3048418529209319105533 +743 0.2607843008241432225525 +744 0.1782742619096983616167 +745 0.0971828033299457494376 +746 0.2184245393587758155007 +747 0.0584694713587410389022 +748 0.0638279877178173082886 +749 0.0341563877767108031724 +750 0.0545658979838643254379 +751 0.0981242936774055080074 +752 0.2012160334815320528179 +753 0.1581465943892474412813 +754 0.1985449789865851299986 +755 0.2225041047539473293426 +756 0.3262981370135806247212 +757 0.3534796210370066971151 +758 0.1158307976544141054687 +759 0.1363918264340877040119 +760 0.2614739041363389193684 +761 0.110489239196034053081 +762 0.227790077778560196009 +763 0.2710338194433912217818 +764 0.0668211802127595816403 +765 0.1606134350442874336728 
+766 0.1992405011349243082464 +767 0.215192444394042364797 +768 0.1518773846770969104814 +769 0.0791588525980066598509 +770 0.0128048001763508174922 +771 0.1160889617020348235643 +772 0.0983062796302010821758 +773 0.1789392474134967603305 +774 0.1268491960985431155073 +775 0.111565439809329314258 +776 0.1497477096461569123775 +777 0.1577139649467565185592 +778 0.1446487279752141641431 +779 0.2776869971947555360003 +780 0.1123230954319162366861 +781 0.2123031911750946354456 +782 0.3256502052542062375373 +783 0.2788451383063124122685 +784 0.1558474356757902634651 +785 0.2238158976917973552734 +786 0.0137909457506487590633 +787 0.0501485533393365920385 +788 0.119132868794034757487 +789 0.2945254015803294156406 +790 0.2623633653104189278338 +791 0.0944055796742806485611 +792 0.2421155555525777436365 +793 0.1535878982599307995827 +794 0.3233887881591032131823 +795 0.1821126252298093939341 +796 0.0980936468604304140362 +797 0.1963312849219550071389 +798 0.1120711781594434658382 +799 0.0452269571064116623904 +800 0.0958780720012289006116 +801 0.1548367964267970409153 +802 0.3043215528695869531717 +803 0.1069537438402038409535 +804 0.1025581834386352780841 +805 0.1191653430148333359107 +806 0.0027974120859285060529 +807 0.0878646269539565816231 +808 0.1490949599631972988245 +809 0.0340885515578310677176 +810 0.0144981411933851334073 +811 0.0295299055333167644044 +812 0.1155566567174346470637 +813 0.2512223778943901697502 +814 0.1826776931227317435358 +815 0.1386749290942622947487 +816 0.0703164313301549714907 +817 0.0898624599537528739868 +818 0.0182053604830616604093 +819 0.1330442579476156239604 +820 0.021164234033726653933 +821 0.1270914371825014599349 +822 0.1411135579096388092957 +823 0.107111510375125285055 +824 0.069944706979545356873 +825 0.0176934455277967493669 +826 0.1903291630767999786933 +827 0.05917962836951444483 +828 0.1314652455267406316874 +829 0.1120763208413307759903 +830 0.0743343168799734388408 +831 0.110941191317150827933 +832 
0.0831099659426812464869 +833 0.0180690281894575199839 +834 0.1385021731346537943175 +835 0.1310197270013367942809 +836 0.0868853754151411999951 +837 0.1788843978254237299108 +838 0.1722280623020373546694 +839 0.107222796758905211667 +840 0.2365804406488224709282 +841 0.2662370707330685704051 +842 0.2273415830560453487763 +843 0.1825893833826750245475 +844 0.1607239971463591543532 +845 0.0031170526892349338134 +846 0.1076246086092790960054 +847 0.000612393878261248371 +848 0.0022928292990216426507 +849 0.1486664820275574450115 +850 0.0237453898556176518087 +851 0.1162035088920937292434 +852 0.1187229725490356047057 +853 0.0462147883116904439893 +854 0.0042404929813761376919 +855 0.017331729121765816759 +856 0.0078223646772303031399 +857 0.0573871241994060241054 +858 0.0029298830292935722219 +859 0.2140591443916082525778 +860 0.0801012828886110556104 +861 0.1033264964145305386811 +862 0.2617936990175679379256 +863 0.1754335344761759618581 +864 0.194216971598102583485 +865 0.0212636312612261291211 +866 0.0430128143474638666155 +867 0.2268053837570593866957 +868 0.23869253085433009276 +869 0.215755023035297616163 +870 0.1643929213218977281041 +871 0.1574257081454435158729 +872 0.1308364915688858265419 +873 0.2357158982587085893634 +874 0.2223719164495084332778 +875 0.1798978977230228926132 +876 0.0870524276730279406022 +877 0.1926045454279338153825 +878 0.267902877021501950594 +879 0.1468691468946222433711 +880 0.0837190616548069005409 +881 0.1501397616353045694204 +882 0.0071733210342291532299 +883 0.1356315219674733840716 +884 0.1465673061811681165967 +885 0.265062125137343285175 +886 0.1466169287995850722961 +887 0.1733640150315326478037 +888 0.0866725223447913917241 +889 0.1969086258313380299878 +890 0.1330244053610785681663 +891 0.1455351926259598749347 +892 0.1658835326097561069236 +893 0.0546693482610903225272 +894 0.2331723221027502679359 +895 0.1237254633419331284472 +896 0.1969366560662974829921 +897 0.2627363136320698400183 +898 0.1450071970583280966949 
+899 0.1936330859932508807741 +900 0.2879662788033484632777 +901 0.1252909642091710462619 +902 0.2535875464487861719043 +903 0.1566488800908782397237 +904 0.2365257248520505062306 +905 0.12253535736440719639 +906 0.2119570324755394863381 +907 0.02030818016855746172 +908 0.2066151812474020799115 +909 0.1058811200190054035986 +910 0.1510790984919328505676 +911 0.168210748595057829613 +912 0.2945460925830304943496 +913 0.2710619147778436377472 +914 0.1211312820689946223585 +915 0.1836909754293216401333 +916 0.1685962134963669978305 +917 0.0925417035680719118895 +918 0.2291567518875770592679 +919 0.182332920257830849442 +920 0.1765810456113492132868 +921 0.268954699350378356737 +922 0.1869498548396522630277 +923 0.3076184406272254867609 +924 0.2053303091921203082926 +925 0.0754313266134041676692 +926 0.2455497839801591419828 +927 0.1592747216155399880844 +928 0.3419917886334637446843 +929 0.1080098486248657618791 +930 0.1867757396940214476011 +931 0.2482415452066662475694 +932 0.1775498238734169453856 +933 0.0646113526999296111653 +934 0.2880622774064449109765 +935 0.2133267052001324992627 +936 0.2328208829702936022432 +937 0.2091850058874865359027 +938 0.2584561602994933093846 +939 0.149809830852015379854 +940 0.1705694721040474093332 +941 0.1728184270319076398792 +942 0.0635493896088434317981 +943 0.1620818776208256539739 +944 0.0335164644015759002893 +945 0.245070861415436741515 +946 0.2444764750085890447906 +947 0.0543126112360958479552 +948 0.116360900684288642859 +949 0.0888225427290738611674 +950 0.2378581185009978615774 +951 0.1633034347792336193894 +952 0.1545655930751306528226 +953 0.0477755609229121649961 +954 0.1332741891036173798035 +955 0.0351962959640597372601 +956 0.1443684278916747554256 +957 0.1957737533802137719352 +958 0.1049890523553480142782 +959 0.2644686688167312005326 +960 0.3170796657988331213041 +961 0.2196473605741443568817 +962 0.2051647970644786822536 +963 0.1485743603906647947177 +964 0.132962353541120764655 +965 0.254611541547919595363 
+966 0.1603925389508753518442 +967 0.1613033174499724220485 +968 0.0505759099749923182743 +969 0.1908574553767729831133 +970 0.1741019417770157395875 +971 0.2450881106874134018891 +972 0.262121147707788493264 +973 0.0980888622317963032593 +974 0.3059738236644671816045 +975 0.2399899454101925844185 +976 0.026555836839041307984 +977 0.1438266215079595788939 +978 0.1163548796303969373733 +979 0.1079954797416218942319 +980 0.0374139777967985240337 +981 0.0216607273671259037418 +982 0.1021004353505006595482 +983 0.1718375067013132861948 +984 0.1153659684282613562267 +985 0.0035457717932916711472 +986 0.1031597616418546137229 +987 0.2176177625986014663173 +988 0.3148875825300081476854 +989 0.1136783252067939398167 +990 0.2165173310385295757374 +991 0.2754355844431203848011 +992 0.0025823636356058087167 +993 0.287770766071549277676 +994 0.1628063596773578480814 +995 0.1386080597186825080769 +996 0.1790222897263488988084 +997 0.1199532587740894523654 +998 0.1788862436215859774702 +999 0.3385146223093770112555 +1000 0.2196798205400380954444 +1001 0.1640076280672619901679 +1002 0.1969206388550824027295 +1003 0.0614850073040076952613 +1004 0.0287053129138994320146 +1005 0.2239129009791144819808 +1006 0.1354508010770091031016 +1007 0.0004831122093182353846 +1008 0.1429464823295157338556 +1009 0.2175184219845426869533 +1010 0.2962371040990158244099 +1011 0.124955603464376507894 +1012 0.2168337920889716607942 +1013 0.0054073186928538649262 +1014 0.0710277432566941979708 +1015 0.0851395146237353128704 +1016 0.154701904710255078168 +1017 0.1720527554472447706591 +1018 0.1916913766160111554093 +1019 0.0283525667516824377012 +1020 0.1308900653033507610612 +1021 0.0637888984823810345492 +1022 0.072947855795329363815 +1023 0.0935699476751397501983 +1024 0.1349673623939901534463 +1025 0.006989199480915410552 +1026 0.0047598083254007678838 +1027 0.024374867811837553494 +1028 0.0315529457358868226091 +1029 0.0820765183600576514467 +1030 0.1685012028404103412438 +1031 
0.1750125621179144386197 +1032 0.0273160552308884009565 +1033 0.1571131057822081067421 +1034 0.0505753767627332082024 +1035 0.2066426431104639582781 +1036 0.0728634307130093250571 +1037 0.0025338458396075669125 +1038 0.0791639672770705904448 +1039 0.1454900860380906879588 +1040 0.2431160832436011498547 +1041 0.2574541506607220808789 +1042 0.0058136341945447360391 +1043 0.0367349367726494818731 +1044 0.0095071527294055616086 +1045 0.2332719428209642231753 +1046 0.0897303852584891564748 +1047 0.0074363026546675637524 +1048 0.0551094765424028701362 +1049 0.0084224881086383696777 +1050 0.0114633653926870093115 +1051 0.0991005826518278065684 +1052 0.0516161427568152531009 +1053 0.3394972749678783485727 +1054 0.0201291771787169178343 +1055 0.0199636007960640668446 +1056 0.2082237573856789702997 +1057 0.1388522579216645291833 +1058 0.2401931910818343451908 +1059 0.0865788433617293851086 +1060 0.2358887799285734054866 +1061 0.104167663170346577739 +1062 0.0852964909832493178854 +1063 0.0174110746095943631773 +1064 0.0975309651335702076924 +1065 0.002491374619651863867 +1066 0.0052392684310629588154 +1067 0.2250731468400821166931 +1068 0.2341945655740922116461 +1069 0.1099059870425004720174 +1070 0.1453800285400909941824 +1071 0.1471757637310477373482 +1072 0.1930945728554382401221 +1073 0.034049739419697987175 +1074 0.0135871995151827362064 +1075 0.1036110640574321450913 +1076 0.2537297296550772296442 +1077 0.0038552985072675487709 +1078 0.1900663098024906183525 +1079 0.1703626108020865059789 +1080 0.0595905326463781609414 +1081 0.1950702310286170915177 +1082 0.1135406768916217556953 +1083 0.0876823730418149621668 +1084 0.1937920221628925687707 +1085 0.1280160414689443804104 +1086 0.0765403067870862735544 +1087 0.1010450072624454881387 +1088 0.0392249711365273145969 +1089 0.0173457626718523556797 +1090 0.0146325828860611861088 +1091 0.2028345122353513441116 +1092 0.1097030274015017986544 +1093 0.1942347101337511539043 +1094 0.0139506188053430853213 +1095 
0.2548109049902186895586 +1096 0.2430126786680310568833 +1097 0.1165334766396436266822 +1098 0.2068680203906715919793 +1099 0.0157722896562715354796 +1100 0.0041857472021821469713 +1101 0.1497582978313458379471 +1102 0.1526508663049607472662 +1103 0.1546679178892775285981 +1104 0.2032423837995034898718 +1105 0.110071275639377186284 +1106 0.1806810715900839525716 +1107 0.1234400878652296557059 +1108 0.1752662944665286937074 +1109 0.0087114540603013077696 +1110 0.0639404101276518505026 +1111 0.1217573134879403468078 +1112 0.0293611437254287926946 +1113 0.080645397291032705489 +1114 0.0392983288432729796491 +1115 0.1858835368169359791857 +1116 0.1362436241099950762212 +1117 0.0703685143820432235096 +1118 0.167144727013504884594 +1119 0.263233477183036856939 +1120 0.135258431315698729458 +1121 0.18117085294389165262 +1122 0.1528901464415989008838 +1123 0.0919789435779277686489 +1124 0.0850142617595665978092 +1125 0.1722205832375817147195 +1126 0.199070967911263857486 +1127 0.0429854413519196384974 +1128 0.1692499740367919558448 +1129 0.2058222209198370955541 +1130 0.1118158522761493051112 +1131 0.0390447212775827390274 +1132 0.2007197500703065684569 +1133 0.2085832270955933764878 +1134 0.0336450497362893174991 +1135 0.0355304155911140284729 +1136 0.2335911222540178189266 +1137 0.2362412685239793364023 +1138 0.2201265085332028137444 +1139 0.2285581346829669568965 +1140 0.2177420032038389352547 +1141 0.1562201190828678420974 +1142 0.0820615799082632935146 +1143 0.0216288703936734251487 +1144 0.0369230605514057827476 +1145 0.1900069645350747071788 +1146 0.1424705910974010947267 +1147 0.1092864019447295498288 +1148 0.169804208946596091323 +1149 0.1872493078182409786958 +1150 0.2132363994046124477943 +1151 0.0552014782423233402175 +1152 0.1108752214943229535971 +1153 0.0150181243734197958362 +1154 0.1721439191658074019387 +1155 0.2320948756519130296017 +1156 0.2095396729842387284481 +1157 0.2111273272834628123018 +1158 0.0195524170606353187019 +1159 0.178803335904601512496 
+1160 0.1392244148220820598727 +1161 0.163504325688719837073 +1162 0.0250649644565516527273 +1163 0.2584171746706380168312 +1164 0.2167622802361900968826 +1165 0.2566173884808310434735 +1166 0.006387314720535496558 +1167 0.0032405020205456187311 +1168 0.1107565758036627157068 +1169 0.1272993314179687585064 +1170 0.0678206860638581460199 +1171 0.1025516946791618921875 +1172 0.0578127680581143332805 +1173 0.0755948737209076321308 +1174 0.1827637851222030096476 +1175 0.1752749984433800700678 +1176 0.0039663121254265481033 +1177 0.0045587297650738047364 +1178 0.1539827474855063327297 +1179 0.1223203229398327207544 +1180 0.1437452796520869713337 +1181 0.1373004144902869594347 +1182 0.1423428648015322706133 +1183 0.1530709630574370316847 +1184 0.0862090754128030795078 +1185 0.2144642573542974717249 +1186 0.1314135214237687754313 +1187 0.1586471864429254141626 +1188 0.0938048459965129949323 +1189 0.0675959706925503278629 +1190 0.0953013354449169930449 +1191 0.0746662827104477566653 +1192 0.0612415221761706957126 +1193 0.1352024446295502502213 +1194 0.1456695518791354115873 +1195 0.2234508250887265179419 +1196 0.2633184037571543179013 +1197 0.1277855901436855401432 +1198 0.3587976520923636947202 +1199 0.1196169668051927925667 +1200 0.0318951603209484610413 +1201 0.0648535072835508957789 +1202 0.2615607843338863536253 +1203 0.173973519598527193164 +1204 0.0349479752064291229741 +1205 0.1715900028620468975582 +1206 0.3054103505510611937801 +1207 0.1133499399953648567596 +1208 0.0723726204830441866012 +1209 0.2819779971735173473846 +1210 0.0982627877057593995813 +1211 0.0310540785601006903704 +1212 0.2678650175436430580511 +1213 0.2208886690469563374162 +1214 0.1280523663771313547244 +1215 0.1695283398500292415534 +1216 0.2508319228179857196537 +1217 0.1836517075717626346076 +1218 0.0256650768497021682102 +1219 0.1537331887660331464218 +1220 0.1168573393200641324929 +1221 0.0991508467806325594074 +1222 0.2389195783537516881712 +1223 0.0484992593942431757248 +1224 
0.3087234875504218378772 +1225 0.1193905594410641923808 +1226 0.077826142397517589222 +1227 0.0709369085933360815632 +1228 0.0970695990735247482561 +1229 0.1649139462328694238913 +1230 0.2256492619379146802316 +1231 0.2695399094392952887844 +1232 0.1684541388376332438082 +1233 0.131291642993747209589 +1234 0.2140365887919183640609 +1235 0.2531364560248345885185 +1236 0.0754364604101064734065 +1237 0.1283404500036558204545 +1238 0.2194434012449975046621 +1239 0.1584838067632649005745 +1240 0.2044531156507611302509 +1241 0.143011764925004947191 +1242 0.1786082524345182498404 +1243 0.2195522703263048658329 +1244 0.2796630347244535297335 +1245 0.0142660075876303565207 +1246 0.1843818427561252359936 +1247 0.1861283551157981164259 +1248 0.102893775262644232571 +1249 0.0454219520320639641442 +1250 0.1596766467215575691085 +1251 0.1736635574510253798852 +1252 0.0020043493437845584342 +1253 0.2693133446892093663116 +1254 0.1291154856336569345387 +1255 0.1310319907136449912866 +1256 0.1644192323624399931781 +1257 0.288523985155305251471 +1258 0.0038569145842108939863 +1259 0.1081357404210734479699 +1260 0.1671419981567880963524 +1261 0.0008308867645878336653 +1262 0.1798347181747962097997 +1263 0.1322288823590080930614 +1264 0.0223999451742384336761 +1265 0.0685144837389715793963 +1266 0.023692942226074559936 +1267 0.001273889877932105107 +1268 0.0010111585207466383673 +1269 0.0008792309948694133556 +1270 0.178022327386422624107 +1271 0.0736211020539247612549 +1272 0.0190215145106706166878 +1273 0.130133179429877149369 +1274 0.1651401290732080195944 +1275 0.2084878207908841540874 +1276 0.1822356403258404144996 +1277 0.0216232160201209784589 +1278 0.11894171612687408357 +1279 0.2196807938974478791039 +1280 0.2111455187092234064838 +1281 0.0431019252602994548673 +1282 0.1028145481910634623235 +1283 0.1931626230761887830223 +1284 0.0851647822355626887836 +1285 0.1344018044035387160484 +1286 0.1662032693331748911536 +1287 0.2470511043032074804948 +1288 0.0877604065300501279268 
+1289 0.2639854868423664457922 +1290 0.3108730963324741192189 +1291 0.2171270071046442484697 +1292 0.2636780437845708258138 +1293 0.0635275772135023703013 +1294 0.0270752355710573988645 +1295 0.0254188133695572478221 +1296 0.0858138993278352579797 +1297 0.231168564309903040721 +1298 0.2379632635557990816277 +1299 0.1302364370559878914424 +1300 0.0120779561231208650007 +1301 0.0109593678610800434464 +1302 0.09815651854658526132 +1303 0.1793284549263199079139 +1304 0.1572486901666294289992 +1305 0.1956037926720434749761 +1306 0.1033568240008408134489 +1307 0.0687626670487910679475 +1308 0.2037863404947426182456 +1309 0.0835215578255046470213 +1310 0.1504151337957374545695 +1311 0.2730643039092548463032 +1312 0.1949917590383613563709 +1313 0.233016528080250528987 +1314 0.0532223802391766018238 +1315 0.1605494504825229484801 +1316 0.1640139050719862101602 +1317 0.127804298628535845106 +1318 0.041363587355739979412 +1319 0.1892251579929587201878 +1320 0.1108122265978340875714 +1321 0.1302093837046083280029 +1322 0.1293106928017903334283 +1323 0.0573314849785517346992 +1324 0.1321567017055405302006 +1325 0.0923836438493439854325 +1326 0.188242193201127033797 +1327 0.0893867110306224149641 +1328 0.2214709147077585926766 +1329 0.2320568706097347055906 +1330 0.1035771669441804265777 +1331 0.2236854277217698583335 +1332 0.2030012422704423058484 +1333 0.1097185395894913484405 +1334 0.2802569872673349449244 +1335 0.0762948584695525844346 +1336 0.1584642435479279576338 +1337 0.0102211381640826097505 +1338 0.073623968963813957167 +1339 0.0878102225930971480583 +1340 0.2060496463628069263585 +1341 0.197698528344028207071 +1342 0.0102953810337376022721 +1343 0.003980686383884903673 +1344 0.02771672317739501748 +1345 0.0933112551237189474795 +1346 0.0002404048469617763722 +1347 0.0731069927411063386069 +1348 0.2101971366263294394017 +1349 0.2081711412472942168694 +1350 0.2335228673826350298537 +1351 0.2304194002333859458975 +1352 0.1610416074083654003335 +1353 
0.2258441128533723862848 +1354 0.1780889913489273335134 +1355 0.1061223527673572503138 +1356 0.0898419593234256463887 +1357 0.0595519661331002717053 +1358 0.2114401569422915405649 +1359 0.1784271681217005633968 +1360 0.1429352853135366363535 +1361 0.1502500918971757248865 +1362 0.1256776613220659533088 +1363 0.1765132118318330778362 +1364 0.0940848721497400813929 +1365 0.0184743990426760665291 +1366 0.2491772115614554472529 +1367 0.358741763794278023525 +1368 0.3892838884001437183002 +1369 0.0760769794817497957418 +1370 0.1719454785604850877156 +1371 0.183191364702242814877 +1372 0.1508179800607829246228 +1373 0.1510319687972856983471 +1374 0.0850758194844024595582 +1375 0.1887536591064955526775 +1376 0.2282809256425298582993 +1377 0.1652650693883571664244 +1378 0.253568478431193566891 +1379 0.0413841325273747312474 +1380 0.0036699602511292431575 +1381 0.0494078121406845843788 +1382 0.0134886159798076107202 +1383 0.0898309878313614668111 +1384 0.1993227830635348474253 +1385 0.0070415394065056341466 +1386 0.1768241331526067272595 +1387 0.0734908352470554915303 +1388 0.2432012732880700356386 +1389 0.0232644661222389768918 +1390 0.138143132525719275483 +1391 0.1685982511573657638415 +1392 0.2095138306175746145676 +1393 0.1491082600960272619428 +1394 0.0665509709394111748093 +1395 0.0904881790618824594086 +1396 0.0861353996818271655522 +1397 0.0459826667178191011986 +1398 0.1209415015196358411664 +1399 0.3551742866480260829576 +1400 0.2090344355311354596072 +1401 0.0521699957528920024719 +1402 0.1561728221208865396985 +1403 0.0943624270633674211428 +1404 0.2248133017505593578278 +1405 0.2603228402679918507268 +1406 0.0190026107910874035578 +1407 0.1534466403983077431583 +1408 0.207807970415728487934 +1409 0.2258723491201363176017 +1410 0.166172990790642016723 +1411 0.0951269102660345922118 +1412 0.222451347091780049503 +1413 0.1489252532291844288714 +1414 0.2116921680949644946068 +1415 0.0734969126953012169556 +1416 0.0671247114924633375699 +1417 
0.1165361962028146347592 +1418 0.0292071413679862341983 +1419 0.0555442581498111417893 +1420 0.1631840973325706933839 +1421 0.1239720080515270689281 +1422 0.0029124494841279826088 +1423 0.2005642681843850250178 +1424 0.3744365110898986714894 +1425 0.0341596259583922465253 +1426 0.0361947317329775475336 +1427 0.3456338180737475451743 +1428 0.0522506034071344160896 +1429 0.1769011781072945543869 +1430 0.0643051319968257345527 +1431 0.1998712572451887825586 +1432 0.0700273092262569257782 +1433 0.0007609908843910470665 +1434 0.0389580506158462014366 +1435 0.0836907275704486253165 +1436 0.0411757616590907940535 +1437 0.2411685865158308406109 +1438 0.2037224942288647644428 +1439 0.1053556777745310868433 +1440 0.0203197239170457531754 +1441 0.0866227916158720800954 +1442 0.0155034512766536475975 +1443 0.2282036788365139623469 +1444 0.0670780907930317721322 +1445 0.002046253430286516968 +1446 0.230464505684978337996 +1447 0.3321482951892756885215 +1448 0.4154881531852382958725 +1449 0.0203019428473501552246 +1450 0.0027932759510421135626 +1451 0.1794620811112241920426 +1452 0.0689713732248740524211 +1453 0.0094794476083784606768 +1454 0.0018145306836304107064 +1455 0.1016702593091354661636 +1456 0.0671836047892670484849 +1457 0.1674537856579013106462 +1458 0.0171512371538490165923 +1459 0.0217208911574966155589 +1460 0.1961759965446477349538 +1461 0.0958965512892885618701 +1462 0.2638179865418655012377 +1463 0.1136627845942783932021 +1464 0.1567038588835182877368 +1465 0.0771050882648138258268 +1466 0.2093515953804002127647 +1467 0.2315146807102594073147 +1468 0.1727118720743029967046 +1469 0.1327599599736241642312 +1470 0.100214021922222173977 +1471 0.1563587074180801117507 +1472 0.0067695340319817295591 +1473 0.0501876064540958261029 +1474 0.2220240398363156619777 +1475 0.1485794664958310085101 +1476 0.0549421444473995948532 +1477 0.0022047954797605124125 +1478 0.2305772222835717200073 +1479 0.1216769828545585380164 +1480 0.0064864749751544848247 +1481 
0.0274722339196640796821 +1482 0.1798960078327455636149 +1483 0.1778233620102030454468 +1484 0.1412564550114352723842 +1485 0.1098646747256866035292 +1486 0.0278375671752564314565 +1487 0.0862210635583488121947 +1488 0.1771348922790106084069 +1489 0.1052591833527815740057 +1490 0.1600563034522391459014 +1491 0.0941467717791623776025 +1492 0.0946352218286971302863 +1493 0.0113464128442482114351 +1494 0.005777002245841068688 +1495 0.1456441713170053553483 +1496 0.1309631534303195565805 +1497 0.1382454320363267763483 +1498 0.0607361644051601784478 +1499 0.0907203063562155992416 +1500 0.0143586871057439346283 +1501 0.0275241932018792621883 +1502 0.1738600736430466198179 +1503 0.0215041503400611733188 +1504 0.0467709790646368278599 +1505 0.0599899805165273886098 +1506 0.2280637377486965100548 +1507 0.0633369423926529312441 +1508 0.1909364453132750827358 +1509 0.13783227920375301645 +1510 0.0348708691760512487501 +1511 0.1353686465314687392691 +1512 0.006849913520391339404 +1513 0.1594986863519569086911 +1514 0.0918100000392711029784 +1515 0.1334282393661853916011 +1516 0.1914216625656637860153 +1517 0.2067812714939774310086 +1518 0.1994058764119432114459 +1519 0.1318036980667044999471 +1520 0.0736529642463702866051 +1521 0.1799852271707386430766 +1522 0.1642850851100311482256 +1523 0.1073393294046215556969 +1524 0.1293070871537283228481 +1525 0.138934044827181135684 +1526 0.1508490739458000773343 +1527 0.0132138313584694057357 +1528 0.1079259961478137991309 +1529 0.0010738175408233332966 +1530 0.0350261573325129152434 +1531 0.1639646646343927705036 +1532 0.0044013840358056825763 +1533 0.0328550584634767633974 +1534 0.0018668265821087748433 +1535 0.0089430338591643936813 +1536 0.1680619384994674059275 +1537 0.1228551773064831681426 +1538 0.0038347931332554765008 +1539 0.1531281395904099951188 +1540 0.178638681695471895905 +1541 0.2089202836086310743902 +1542 0.0896260853743007651628 +1543 0.0340702501471800145438 +1544 0.1571735540982532874921 +1545 
0.143717213208923250134 +1546 0.2188099247437565175733 +1547 0.1503460541895899060805 +1548 0.228433552628235669868 +1549 0.2275732831336783890119 +1550 0.0901621739047240022824 +1551 0.0397319627611773668052 +1552 0.157060262925003996104 +1553 0.158020106884821792681 +1554 0.0397912148449899513714 +1555 0.2027376115024112357421 +1556 0.068286456307732959492 +1557 0.1722322074287534476866 +1558 0.1863333772697852164235 +1559 0.1074414507389067074961 +1560 0.1263684924530074915605 +1561 0.1879648749523292750041 +1562 0.2868457063202544676805 +1563 0.2199674819640668466114 +1564 0.1271155907295355991238 +1565 0.0043289646030840690447 +1566 0.1986470128023962444797 +1567 0.1984188803161830716437 +1568 0.1761349419304074070958 +1569 0.1212574324796468344667 +1570 0.1419191745431137019473 +1571 0.2120412081115334301806 +1572 0.1946664206636211091883 +1573 0.0499702095488063421924 +1574 0.0877321674657653011176 +1575 0.0028980410471085580118 +1576 0.3981534550998857002746 +1577 0.1072283024365746628703 +1578 0.1067793834973326944082 +1579 0.0011628374089239939572 +1580 0.1348964129626767094461 +1581 0.0603581634854004275836 +1582 0.0847211968747366706012 +1583 0.1943101080368328514858 +1584 0.0255885986920868502281 +1585 0.0841095452592366815692 +1586 0.079022642353581501462 +1587 0.0931450315392893246624 +1588 0.2210953188533726909704 +1589 0.2153058198592286698325 +1590 0.0516155284992240379061 +1591 0.130721352632639703506 +1592 0.2213708066525113815626 +1593 0.1234975143414494824778 +1594 0.1995910896177314486355 +1595 0.0836876322564743257981 +1596 0.2269187180957381311242 +1597 0.2580345916270024297923 +1598 0.2117442434917949922735 +1599 0.1456160850595186462275 +1600 0.0951068507146313607548 +1601 0.0927985500918619976574 +1602 0.0212977895377872393501 +1603 0.1040812435611924435808 +1604 0.0119261210787636410396 +1605 0.000463405310383502929 +1606 0.236901180394535948448 +1607 0.1477231375093162879519 +1608 0.1451978912461293069747 +1609 0.1141864229986995493693 
+1610 0.1212457903279099136196 +1611 0.0777085825272487984439 +1612 0.0235306358375035606922 +1613 0.0453539808054941179227 +1614 0.0462787307298511974007 +1615 0.0892654228383505943256 +1616 0.2056546883929831759374 +1617 0.1430842775578147763049 +1618 0.2688280666065682722454 +1619 0.0517605471031155686679 +1620 0.1566681355375058593005 +1621 0.0635928821488774276283 +1622 0.0400737029065534391958 +1623 0.020600130985784011195 +1624 0.0780344207077174056852 +1625 0.120659391947242386478 +1626 0.1613165271426300895197 +1627 0.0312699983882744753094 +1628 0.0764609734650633088293 +1629 0.0632710801516959270296 +1630 0.1100002174756360973085 +1631 0.1451573092753922766818 +1632 0.1345614473998324456261 +1633 0.1049058834084109076423 +1634 0.0791666209033856094202 +1635 0.2275915492784388316494 +1636 0.1752826975815934273939 +1637 0.3116529139148633831269 +1638 0.0035783512584652318704 +1639 0.004461734804652072589 +1640 0.1627650152981751086045 +1641 0.0192011939954748155068 +1642 0.0117916171813316657258 +1643 0.1455704798021275336861 +1644 0.0700283442476703626989 +1645 0.1133412391209771968859 +1646 0.0299180900498735788395 +1647 0.1955363274682689889161 +1648 0.0316446447252178575393 +1649 0.1384801258599009099459 +1650 0.1612819616164909708456 +1651 0.3548960217165492703195 +1652 0.095160999716452032704 +1653 0.0011495106753204426507 +1654 0.2140760538542594848543 +1655 0.174877927194231924668 +1656 0.1103307815115626561164 +1657 0.2565061414186152188854 +1658 0.0688051179018815961541 +1659 0.0023460459303655911527 +1660 0.0915699657910115927262 +1661 0.1062980691020060930452 +1662 0.2213592288333228741415 +1663 0.0614820433177492783883 +1664 0.0020154384653392936125 +1665 0.0834117300346351592255 +1666 0.1244285714087608879508 +1667 0.0030055083231477568682 +1668 0.1894855928660696686716 +1669 0.1779282756889202510298 +1670 0.1873537277951271839971 +1671 0.1101671668716081992079 +1672 0.0231668617752240207919 +1673 0.0798758547463019191737 +1674 
0.0311720667294251921331 +1675 0.0304545718088546218949 +1676 0.0878570003341758165583 +1677 0.1586492440396635450472 +1678 0.0523051204821283755031 +1679 0.0006992182741403161735 +1680 0.1067834131072753073299 +1681 0.124450133661594880552 +1682 0.0335708123408325828652 +1683 0.160189146366323670323 +1684 0.1992769157760198583951 +1685 0.0056760937432539350714 +1686 0.0707931576904058806887 +1687 0.1675527590575400238571 +1688 0.2224032195301380021579 +1689 0.0921417552339133805184 +1690 0.0956359561511101324838 +1691 0.0633661043922730538025 +1692 0.0947417085150951410188 +1693 0.0038623306653968870854 +1694 0.0057569755875474977469 +1695 0.1584390023871173336367 +1696 0.0574919920891353228298 +1697 0.0456501457934407328665 +1698 0.2406620316190447861349 +1699 0.1278485360917783197898 +1700 0.3628220981266978562019 +1701 0.1425086819659178438702 +1702 0.347673843122107639747 +1703 0.0040358451927714296056 +1704 0.1818253627783488757252 +1705 0.2452390597821866580208 +1706 0.1895072078714603314165 +1707 0.1335950746888855122929 +1708 0.1027524002742159353607 +1709 0.1587705371553003175222 +1710 0.0014841647648655299091 +1711 0.0927997485046593989511 +1712 0.0048403623323144103277 +1713 0.0670847796532387241619 +1714 0.3020298338688852668454 +1715 0.0157967880197951611454 +1716 0.2386673081621798753194 +1717 0.1890246250767010915972 +1718 0.1732513190106091671705 +1719 0.3048521377748558136567 +1720 0.0630853239256338599317 +1721 0.1892905071362412927627 +1722 0.0972846805948104076389 +1723 0.2107320646405555497616 +1724 0.0550367718845683126516 +1725 0.0087729273673208402196 +1726 0.0278333058633024842121 +1727 0.0129486759916347341642 +1728 0.0229914966615109468695 +1729 0.1237374262299021643319 +1730 0.0432635234324254061566 +1731 0.0010790436570818806757 +1732 0.1114344715755956199965 +1733 0.1293559366231598339869 +1734 0.129890095947517242303 +1735 0.0562217800738179576681 +1736 0.0012447302654052844369 +1737 0.1721776212676330575224 +1738 
0.2229717673899782037772 +1739 0.0319169235383320731847 +1740 0.0683885295693265338191 +1741 0.1005382601714530504955 +1742 0.1524041307013540569315 +1743 0.1633532915737200552631 +1744 0.0781290533359192962415 +1745 0.0570044612202557157699 +1746 0.0275227455021010668224 +1747 0.2056832746260620070622 +1748 0.022422759521193103005 +1749 0.1827968393572747263232 +1750 0.0939670489614928133859 +1751 0.0172063053420066711108 +1752 0.1283219230961976653482 +1753 0.1946480797805322793259 +1754 0.2032185865024036808268 +1755 0.1362826594753805964366 +1756 0.0002923372875713933371 +1757 0.2395578673276554659832 +1758 0.07120276503555933445 +1759 0.2195851261320556913059 +1760 0.0639851192587108374976 +1761 0.1285699191621621217951 +1762 0.1218866411720378739592 +1763 0.0165761023937078498525 +1764 0.0491309619697319760467 +1765 0.0490600414378807372917 +1766 0.1894312514206697162233 +1767 0.1215697984753226906784 +1768 0.0066328960922345863924 +1769 0.1119120468661496670126 +1770 0.01636625111045833178 +1771 0.0586092330108958781132 +1772 0.0537784971451739854387 +1773 0.1726019812323281998889 +1774 0.0851913297469434616094 +1775 0.120781270413148664189 +1776 0.2859229770160172656723 +1777 0.0157541366776109623138 +1778 0.1773883026634537618538 +1779 0.0179765411707858042967 +1780 0.1240949167623486143475 +1781 0.1902716561103601489169 +1782 0.2180924272081687376179 +1783 0.0270334540441422958856 +1784 0.1611207578422312602928 +1785 0.0013558947098231692759 +1786 0.0440799264022823655829 +1787 0.0252611213803489963181 +1788 0.0471872012372925447865 +1789 0.0534603815041615332282 +1790 0.197980237566089706247 +1791 0.0052312274967718265517 +1792 0.0315182812736623865768 +1793 0.0832025710815216951177 +1794 0.0622574227760450613078 +1795 0.0053379793027605428393 +1796 0.121439666706038743027 +1797 0.0881718071564735472734 +1798 0.1012918600575407612752 +1799 0.1579132384856775650483 +1800 0.1420111991709097631897 +1801 0.2467633257237845934018 +1802 0.057678845881401416551 
+1803 0.0669467992550054347145 +1804 0.165496436411753544693 +1805 0.0048612829247675045166 +1806 0.1943626979600750481758 +1807 0.2072905188131300979748 +1808 0.2227533591954552472103 +1809 0.1707815912444738959408 +1810 0.1174018233154184437383 +1811 0.1556234042681065576907 +1812 0.1304993788269065146945 +1813 0.07856709003375535183 +1814 0.1243473004214586674632 +1815 0.1994477831777923260148 +1816 0.1482686521643424570893 +1817 0.0533925608765037154613 +1818 0.2643252555198725706198 +1819 0.0455259172223522606826 +1820 0.0056235682022283562526 +1821 0.1367580584380416741652 +1822 0.1134333732511558501255 +1823 0.2261769512054927488975 +1824 0.14059695948723921477 +1825 0.0218685888559882468662 +1826 0.0062856429207065397935 +1827 0.1460591605412163151989 +1828 0.1960456227437601950392 +1829 0.0133091096227051352424 +1830 0.0394843320032822256427 +1831 0.0537689874436527923129 +1832 0.0833199757592199552469 +1833 0.1248147388576523464376 +1834 0.075636862366396703794 +1835 0.1399931370144627817975 +1836 0.0991719366846251504377 +1837 0.0972444418513454950093 +1838 0.0342403519709343698296 +1839 0.1661323022490958367658 +1840 0.0795985883583903985894 +1841 0.1872805919532543605932 +1842 0.185968353904023164791 +1843 0.0069455461518501985727 +1844 0.096958260317709557552 +1845 0.2113411239862769686049 +1846 0.0926957479558133540243 +1847 0.0078871332697591396699 +1848 0.1143840310207457355851 +1849 0.0570832540038608421407 +1850 0.0051710233477355619855 +1851 0.0610474994190257566617 +1852 0.1596955949346933745225 +1853 0.2731381428521609811 +1854 0.1322410279710957148325 +1855 0.1381799441945403550402 +1856 0.1497013210036779728984 +1857 0.3849106451159086672575 +1858 0.1277144945983213764062 +1859 0.1372969534167746974429 +1860 0.0158783599370817610219 +1861 0.0187737449327333230653 +1862 0.0855362647540311438199 +1863 0.0066519217149146369142 +1864 0.0155312955123402453989 +1865 0.0860486246249635294836 +1866 0.1244917253119601080469 +1867 
0.0918607814061349059465 +1868 0.2215514690352383686545 +1869 0.1445571482293346032844 +1870 0.1229952136441440496695 +1871 0.2323052675505717645787 +1872 0.0130142013237965915168 +1873 0.1068710657475362563185 +1874 0.0786123563623496585118 +1875 0.1154220816835969803948 +1876 0.0308348611057653876344 +1877 0.0056480733492883317473 +1878 0.0031011805528407300481 +1879 0.2027762909979937688831 +1880 0.2611143321014571250238 +1881 0.0441895007109392931599 +1882 0.1013926002819085236961 +1883 0.0142834910195327857013 +1884 0.0364801280416596485079 +1885 0.0154991108638468724767 +1886 0.1387443549161609113174 +1887 0.028378001366304708708 +1888 0.1166781661780785517291 +1889 0.0603807181953203916946 +1890 0.1842209092480049936746 +1891 0.2998285497731135174071 +1892 0.2162984653071872154229 +1893 0.1461481294377870721046 +1894 0.1086532823958018811883 +1895 0.1580028135177573289649 +1896 0.0803543771662619671359 +1897 0.1458288914122160551123 +1898 0.1135063642273799744409 +1899 0.2130267019004902784296 +1900 0.2532816208648885347721 +1901 0.3099882393803491398288 +1902 0.1646486820658883132307 +1903 0.351743096694885237774 +1904 0.114659836473846282745 +1905 0.1656027122628649650427 +1906 0.169850243530140526671 +1907 0.0614651456778722363183 +1908 0.1372770372121603354998 +1909 0.0237789030882733717909 +1910 0.1634645072763989126496 +1911 0.0851635531481000351839 +1912 0.000458100846630852957 +1913 0.1183928211259160645508 +1914 0.265348133981826383998 +1915 0.1856707429827902378072 +1916 0.0369707234420690658561 +1917 0.0033977669277486018812 +1918 0.2663934724332373549238 +1919 0.0819225857847247634913 +1920 0.2424758829148426775291 +1921 0.1132367276695116081742 +1922 0.0054530927132660986234 +1923 0.1385046713053320888154 +1924 0.0133488500462734839547 +1925 0.1265758750185619896378 +1926 0.2253709495076605118236 +1927 0.1056789182149322908755 +1928 0.0300711004618811894584 +1929 0.1804712363038681599559 +1930 0.0138742762928459953398 +1931 
0.0006407750709258772624 +1932 0.106957694643468340101 +1933 0.0514779345665944895738 +1934 0.1911045967066604678308 +1935 0.3046832439651542823178 +1936 0.0120159914864323043304 +1937 0.0870894407282059440645 +1938 0.1497221329921841481703 +1939 0.1172773572807458419831 +1940 0.0789867440737429954423 +1941 0.1754029502703680631903 +1942 0.1604310870167975633382 +1943 0.0334121005624364170172 +1944 0.1842916282609213529664 +1945 0.2652995564456493426952 +1946 0.0021175888020343630776 +1947 0.140367216415368756266 +1948 0.031036081220467061359 +1949 0.0809483427513325998204 +1950 0.0545628185120372102834 +1951 0.0655814822168383354528 +1952 0.1419313771763057174802 +1953 0.1941928716402160914889 +1954 0.1944738044676128096988 +1955 0.0027151232513809389103 +1956 0.1767252272185021177986 +1957 0.0040334827792496757634 +1958 0.0740234942041360782783 +1959 0.0761006738396359516674 +1960 0.1710954628566847546267 +1961 0.1163171167914036491231 +1962 0.0412739291575359504294 +1963 0.0632922902237095796885 +1964 0.1136149987164971758569 +1965 0.0063413019792679242714 +1966 0.0782301851311550527912 +1967 0.2102137197034745497159 +1968 0.0049384789597585881937 +1969 0.0072678515672034704753 +1970 0.1769867482584957518732 +1971 0.1900898139430719224752 +1972 0.2145722575503020146392 +1973 0.0667749683008569683285 +1974 0.127329292440909069839 +1975 0.1799959819149643536207 +1976 0.0363288680406711889104 +1977 0.089646344781169826077 +1978 0.1110089724410998207604 +1979 0.0898017246413547881589 +1980 0.1747174517536560223174 +1981 0.1091499252625969251795 +1982 0.0143929113684081160657 +1983 0.1935314444091368601963 +1984 0.0035864284256339048693 +1985 0.0166773280758368315502 +1986 0.1150773551341593148312 +1987 0.0779614242884619362828 +1988 0.2386210316683781973435 +1989 0.000750060167479552997 +1990 0.0922354798611760562377 +1991 0.1013650102273297576305 +1992 0.1342587273108633849628 +1993 0.0066249271998505112127 +1994 0.3104863456294278400982 +1995 
0.0414164555557865951552 +1996 0.1923030473090001357672 +1997 0.1636029641320471683663 +1998 0.1763220067017390868536 +1999 0.0019433071449138957908 +2000 0.0930714066086048008586 +2001 0.1234073651644412156214 +2002 0.0344538296408440844476 +2003 0.0729428038182583782234 +2004 0.1491250377716271668938 +2005 0.2408421088646595653593 +2006 0.1890966153090983536966 +2007 0.0744523062910332766862 +2008 0.097063149514640414961 +2009 0.1608238979369565613275 +2010 0.104067186891640239832 +2011 0.0976910538706694808742 +2012 0.1953884370063160846875 +2013 0.0219525569554477086032 +2014 0.0473728425171052111575 +2015 0.100291369813659825283 +2016 0.0754308893279913711405 +2017 0.064169086457438517801 +2018 0.0425376293038486411047 +2019 0.0525062620479531047546 +2020 0.1766735063203687672306 +2021 0.0947017703058782439607 +2022 0.0151911590872287095844 +2023 0.2053165395925564962187 +2024 0.1393991967684574118103 +2025 0.1426665202153185318945 +2026 0.0567696235464345555499 +2027 0.3302657285559967292876 +2028 0.0152300556890402356142 +2029 0.173903803397126743846 +2030 0.1562483826614712578706 +2031 0.045784262659366810333 +2032 0.0391752704193938025568 +2033 0.1015379694403976129635 +2034 0.0749274963647718061921 +2035 0.1253764349628533891767 +2036 0.1466826331382287407212 +2037 0.0202937937351523565999 +2038 0.0354636995984466824039 +2039 0.1498872556849991100059 +2040 0.1684843620240915607056 +2041 0.0529356585030949650106 +2042 0.2272133050445482027602 +2043 0.0933049722063605396771 +2044 0.020799918092740361103 +2045 0.1696817264707919103994 +2046 0.132440970792012946422 +2047 0.0634863077073196113709 +2048 0.1425958855565272465071 +2049 0.006392074535200494359 +2050 0.0661810683035216062997 +2051 0.1358078642072867159296 +2052 0.130077334040157560624 +2053 0.2167915717499964978199 +2054 0.2188801547719537998749 +2055 0.052820161382363541025 +2056 0.1710162932618068953872 +2057 0.0761842704191446201678 +2058 0.1064136127314902480201 +2059 0.1187426411228201666992 
+2060 0.0122188444158104307646 +2061 0.0328366236913154863064 +2062 0.0137495988262141839026 +2063 0.0332756129487924851018 +2064 0.039624684448078449639 +2065 0.0278984581981834983688 +2066 0.012663860615039011781 +2067 0.092755158529370371201 +2068 0.0453641420810668438701 +2069 0.0026778926072094466453 +2070 0.0142701376730706066404 +2071 0.0141326966780117996292 +2072 0.2505147025682694095217 +2073 0.1700206000335967859716 +2074 0.200870950815392856903 +2075 0.0021552559849684132728 +2076 0.1241072299695308578382 +2077 0.1372790951735606279627 +2078 0.1312356969027525799287 +2079 0.0038243144228264258806 +2080 0.331036052764251131908 +2081 0.1524919022906319199251 +2082 0.0188966197713380038015 +2083 0.0463870905255442389503 +2084 0.0221648820339855612271 +2085 0.1765123466406748886648 +2086 0.1490667466875476854238 +2087 0.145880741844152228559 +2088 0.204261895552804995102 +2089 0.221543370882240936437 +2090 0.1109602215820145870717 +2091 0.1824776685230572459417 +2092 0.1861225336976505551068 +2093 0.0575137949019477179302 +2094 0.0902689418298729534529 +2095 0.0014505835186616356392 +2096 0.10708385806384128347 +2097 0.0142018680133384200825 +2098 0.0795494821796675494197 +2099 0.2472208371827867567916 +2100 0.3148593797442190700409 +2101 0.0056846381493715512565 +2102 0.2255542604388027227991 +2103 0.0403719206559254861766 +2104 0.0168922542350435737901 +2105 0.0327575425623906693606 +2106 0.1831456453939757922367 +2107 0.0940496853077206901306 +2108 0.190281402252787751328 +2109 0.0191106389661264419733 +2110 0.153044462005589632092 +2111 0.0398434220208493186655 +2112 0.0656827162275680082049 +2113 0.0296687543557904546965 +2114 0.1216955715185471659234 +2115 0.0596679513594214894212 +2116 0.4941144019096668782431 +2117 0.0527482557511186125887 +2118 0.2058715030053286376699 +2119 0.0821120509957997263273 +2120 0.0286146613818247935734 +2121 0.0590997401915759751345 +2122 0.0166990934289039209826 +2123 0.1808932505162636206197 +2124 
0.0014608151730294807909 +2125 0.1115145342246197379676 +2126 0.0573242035404796929088 +2127 0.006591584884917262889 +2128 0.3001677662535180979297 +2129 0.0261653700973027425469 +2130 0.0112865698338753921159 +2131 0.2451061388551850350748 +2132 0.0832913347421044975016 +2133 0.0947230306097035928969 +2134 0.1148859841138819143369 +2135 0.0843730020641287048466 +2136 0.3065661353382167497728 +2137 0.1376331451300011610162 +2138 0.0738870187682132972817 +2139 0.3009813893852776867099 +2140 0.0812049060177082310252 +2141 0.0510216012606468719004 +2142 0.0015237113817646663963 +2143 0.114466985205454022112 +2144 0.0514966819915988333056 +2145 0.0627205416751979299983 +2146 0.1589696741677588121 +2147 0.0514535900107221805255 +2148 0.0702640225705546683788 +2149 0.1243810588710783648914 +2150 0.156051872425447490933 +2151 0.1027076713269630126035 +2152 0.1278557990081796991166 +2153 0.1396237280042411932346 +2154 0.1723674935410826336568 +2155 0.0105773565721635504772 +2156 0.0851318224926047306678 +2157 0.080791408708672268757 +2158 0.164348250048210658969 +2159 0.1133549454296667452491 +2160 0.1429118990767677677134 +2161 0.1301172432117271293706 +2162 0.2473764069236052687284 +2163 0.1495500702791089930876 +2164 0.1506456837571751306015 +2165 0.2007409507831228734354 +2166 0.2029086718306183367844 +2167 0.1363187212892545963072 +2168 0.0795563165316749004718 +2169 0.019618409052029897599 +2170 0.2061104831231887479781 +2171 0.0081377019782869312869 +2172 0.0437522981106495886805 +2173 0.245810217768317768039 +2174 0.0963525484350798433475 +2175 0.231446894249561996304 +2176 0.0976590602225020504301 +2177 0.1438849899203688553762 +2178 0.011006551604546712872 +2179 0.0129798007448558017085 +2180 0.0739700922223213985784 +2181 0.2107196001055016976178 +2182 0.1450777920040420554404 +2183 0.0140200776008638748615 +2184 0.202599196791469371437 +2185 0.0685857861731601903266 +2186 0.0571114399136435427207 +2187 0.2963410434995783493406 +2188 0.0769905257549500510184 
+2189 0.0192466192829536149322 +2190 0.0228151426947730516515 +2191 0.0214979319282523340107 +2192 0.1782392972901024053023 +2193 0.0846218598342582950522 +2194 0.2399692718358214515995 +2195 0.0053256199274749841527 +2196 0.2284669204006757881054 +2197 0.2121907736957552581547 +2198 0.1384690374349452079983 +2199 0.0028907847946160309359 +2200 0.1846540843963671851835 +2201 0.2796751993193745677857 +2202 0.1028867098952122999256 +2203 0.1210453163403581999802 +2204 0.005641258064482650679 +2205 0.0011641100072042806762 +2206 0.0396769355542082244326 +2207 0.0010372487945971339135 +2208 0.1079595541627119681394 +2209 0.0702145429322706682296 +2210 0.0014771175434702223973 +2211 0.0303977647047866013363 +2212 0.1292054762381006083327 +2213 0.0014174924350394064228 +2214 0.0017926809915109886018 +2215 0.3167292336960789156386 +2216 0.0546541622707085797406 +2217 0.1871165404179192526257 +2218 0.0421029272985882427327 +2219 0.0899089197041944010458 +2220 0.0686941178191574908229 +2221 0.1272039826243223370472 +2222 0.0506783746063609735755 +2223 0.1622836896745674384324 +2224 0.1985415569227975129962 +2225 0.0440969775459305740606 +2226 0.1488128225368877677326 +2227 0.1239277803770833519259 +2228 0.1612633179939555183591 +2229 0.1966465734909657592233 +2230 0.2078888407164466989308 +2231 0.0406503310787745750221 +2232 0.0803911975452028665368 +2233 0.151295198654482571099 +2234 0.0264033934178477848465 +2235 0.0005640332429777243379 +2236 0.0006611575808794747282 +2237 0.1778418097275234199817 +2238 0.0007401637064039930883 +2239 0.0926964626909429906254 +2240 0.0106897795047131080604 +2241 0.1716634497701265138492 +2242 0.1352863983992468599471 +2243 0.0089774269600483742881 +2244 0.1700188205032847899023 +2245 0.0407690663627830948701 +2246 0.0172598498546742036563 +2247 0.0173419605321513299601 +2248 0.1163829773099507441936 +2249 0.0132430665365084118679 +2250 0.2699577990923201498141 +2251 0.3564262772893716513245 +2252 0.2471403819542553614852 +2253 
0.2103915511408226413081 +2254 0.0973964479767278623612 +2255 0.1359260545784322093965 +2256 0.1108850481801500198475 +2257 0.1813707707274075930926 +2258 0.1199723828586228419057 +2259 0.1796644141217144663436 +2260 0.1696784900379308125284 +2261 0.1186543143088289870013 +2262 0.0792579285952813017424 +2263 0.110651206797672818638 +2264 0.0609446560498566464181 +2265 0.1460499380332936081306 +2266 0.0202347317031434034695 +2267 0.1429410467873833689989 +2268 0.0006545678785038417172 +2269 0.1399470588933192927161 +2270 0.1276858969528308351826 +2271 0.0887693643810482657663 +2272 0.1110810303620934635926 +2273 0.1746072201853033356578 +2274 0.0309018052938064176349 +2275 0.213710390803705102547 +2276 0.000439025464469265951 +2277 0.0289170920380399376626 +2278 0.1665130599355367779779 +2279 0.3242364621849851413771 +2280 0.1585224401489660761566 +2281 0.0339321653457320335234 +2282 0.0064349923353894163819 +2283 0.0402054532173719419985 +2284 0.1772062417112962307009 +2285 0.2197535979056077781735 +2286 0.2259631549401755201689 +2287 0.2148936483475142322774 +2288 0.0720685141448479810178 +2289 0.1028358147634980818141 +2290 0.0217483041808062167732 +2291 0.0470106319372218528496 +2292 0.0377553862556506067416 +2293 0.0908385049248562892465 +2294 0.0052070109267103540512 +2295 0.2173620148952293829048 +2296 0.1043338674538513699552 +2297 0.1667114992359897962526 +2298 0.3020464410274178446159 +2299 0.1195624203547720199037 +2300 0.011071440900946617758 +2301 0.2168157086366016561563 +2302 0.2228212093591651621338 +2303 0.1359961603861247758651 +2304 0.0194616028908875665637 +2305 0.1353249872182736013304 +2306 0.1346754446754717393642 +2307 0.2280449102038492881217 +2308 0.1512178972069129767597 +2309 0.2068208542370139346733 +2310 0.143636171750302465977 +2311 0.2187058815228509756068 +2312 0.0023755355577218290226 +2313 0.0203778994573851519878 +2314 0.0157557874726929614262 +2315 0.0008842289370961901743 +2316 0.0386579217238780942223 +2317 
0.1736345852179494997358 +2318 0.1303761962294471277168 +2319 0.0010191396762768444896 +2320 0.0380974697541103107801 +2321 0.1599830586413667765644 +2322 0.0475094451714917348495 +2323 0.0681145101253152335241 +2324 0.2222394082455394770381 +2325 0.3418175967499597622812 +2326 0.1511101164120162432258 +2327 0.028343986323515202308 +2328 0.2799704640715117154315 +2329 0.2072727014736417128216 +2330 0.1610314928666016220937 +2331 0.109254178688369549266 +2332 0.0025864311517116316853 +2333 0.2304432416650406267866 +2334 0.0224884529178497501178 +2335 0.1804765052780210599082 +2336 0.1604959637400977245925 +2337 0.0580546768736649454468 +2338 0.0014233596634011527141 +2339 0.2036026711558827917425 +2340 0.1743006236719066015528 +2341 0.1633925348673935518118 +2342 0.1463416854976982428571 +2343 0.004915212541040422066 +2344 0.020714163186075350287 +2345 0.0160761720020521151064 +2346 0.07234560829157458961 +2347 0.199054886870588199077 +2348 0.0641715960295495685406 +2349 0.1378268797979087800787 +2350 0.1549043387140212890696 +2351 0.0935702990200337203497 +2352 0.0639582710346222604869 +2353 0.1824996326714730665586 +2354 0.2401178788843613409743 +2355 0.1676792044082175969777 +2356 0.0148103062024930245966 +2357 0.1592811220517064141333 +2358 0.1661907688709641783831 +2359 0.0693712292380164047279 +2360 0.2131902382630554981802 +2361 0.0107678905921609301133 +2362 0.2085100068907214665437 +2363 0.0028094117648293152716 +2364 0.2169571618749998320741 +2365 0.0091459126700460281756 +2366 0.1434724274888089512459 +2367 0.009903949145702304821 +2368 0.0123746535433679194504 +2369 0.1573880898166110653413 +2370 0.0335551940949471855236 +2371 0.2499680191824814601276 +2372 0.1482905766221102594749 +2373 0.0136265525603676881244 +2374 0.0486783001197479484046 +2375 0.1149451359374829978455 +2376 0.1378963897046527886658 +2377 0.2935995887686574379316 +2378 0.0956407511589770081839 +2379 0.0021655418362711460756 +2380 0.007176461301288510064 +2381 0.0718039286984457902907 
+2382 0.0217319881939808216831 +2383 0.0196653600864657301939 +2384 0.1105860668245408212629 +2385 0.2264758896108966801375 +2386 0.0584584783602910412759 +2387 0.034244864243315721597 +2388 0.0822395214080069575369 +2389 0.1824673167296326503628 +2390 0.1868652107181402177716 +2391 0.2090012562847713140091 +2392 0.1727580799884907702957 +2393 0.0605043766968969415343 +2394 0.032327043565465764241 +2395 0.0973877193916912187177 +2396 0.1503180808456568429143 +2397 0.0185760668194342934423 +2398 0.4023926808310127056778 +2399 0.0137320353716692059032 +2400 0.0020977861493932502962 +2401 0.1988338776977151556302 +2402 0.1980015265005981295321 +2403 0.116328739776085432478 +2404 0.139714367193307736903 +2405 0.0041640255547000246464 +2406 0.305329827734439507303 +2407 0.2335163285251364295725 +2408 0.0019536348862314255846 +2409 0.2069365163539931551373 +2410 0.1543527445893314864467 +2411 0.0140700511858201215282 +2412 0.194412221421128267318 +2413 0.0034293868357636011004 +2414 0.1883813892690542057817 +2415 0.1614338560222241258924 +2416 0.0801925568350746326152 +2417 0.2494631301259116196167 +2418 0.1536047149591783511635 +2419 0.2072725420106609484083 +2420 0.0007259506865127487215 +2421 0.1530430490002368526525 +2422 0.0582717069822188613037 +2423 0.2406972136486494606711 +2424 0.1846465195716577578633 +2425 0.2129075593436315916307 +2426 0.0012849501298855964 +2427 0.2121844373271927486346 +2428 0.0826827722958766025974 +2429 0.0119160517986407598484 +2430 0.0101282979862371610297 +2431 0.175471033268002091221 +2432 0.1188313654093694354819 +2433 0.1058339324360653282975 +2434 0.0386820564801596544768 +2435 0.0127875958456133160396 +2436 0.0184649574184235082397 +2437 0.0930927733736684470678 +2438 0.0299204941574969518481 +2439 0.0027898288178541596691 +2440 0.034482386472560737356 +2441 0.0496314790352423707009 +2442 0.1381698769919852132215 +2443 0.010780239761543214394 +2444 0.0749230514274472347847 +2445 0.1764041111875007949994 +2446 
0.1690611359864280793808 +2447 0.0255973350187739846773 +2448 0.1498609066591310989836 +2449 0.168001210648994508956 +2450 0.1439617624587407052061 +2451 0.2376723896824820547735 +2452 0.172184000538101816824 +2453 0.1126755568178180871852 +2454 0.218554993808567299407 +2455 0.0401425388559211171469 +2456 0.0107352462208406100597 +2457 0.2348464924808136844447 +2458 0.1774453047059669497099 +2459 0.1723021858688398300075 +2460 0.1714987706858331073523 +2461 0.0498724396774605505112 +2462 0.0298768141236745478351 +2463 0.0419088018935076800364 +2464 0.0295632518921931375377 +2465 0.0243141232145835765877 +2466 0.1426126690866459378881 +2467 0.0835604562583991267166 +2468 0.0560764294154361478406 +2469 0.4235203572079103695458 +2470 0.1750984752813558587015 +2471 0.0384230930350794377959 +2472 0.1117697224232105795982 +2473 0.1330147451181435680478 +2474 0.010926901943866624764 +2475 0.0364605959267263191048 +2476 0.1045145898159515474024 +2477 0.0845639380913638244452 +2478 0.0194610788753990411604 +2479 0.2027856849541685158034 +2480 0.0292049017792760240431 +2481 0.1608242180191857551019 +2482 0.0565821382440779077627 +2483 0.0440682906194343609885 +2484 0.137115608010643125203 +2485 0.0499635161199597707671 +2486 0.0059624020225934855691 +2487 0.1216973073931245791224 +2488 0.0702878152370673969784 +2489 0.2461041469894100774951 +2490 0.2524852448715217412278 +2491 0.2139504278377334089978 +2492 0.1282735292338666177603 +2493 0.034375195715725954948 +2494 0.0501096454729402299155 +2495 0.2084215921023562512193 +2496 0.0257395761936783651347 +2497 0.2140284912843988807118 +2498 0.106932331399325550314 +2499 0.1285212945268083273298 +2500 0.0982834364407777216766 +2501 0.0025938728969975830864 +2502 0.1745009156054779242417 +2503 0.079063772607434815165 +2504 0.0882389414763944945896 +2505 0.0372004082012147782588 +2506 0.1966197788561229176096 +2507 0.2583942472891290753623 +2508 0.0396274077475269950743 +2509 0.2299307735527399976228 +2510 
0.0397327685839678687585 +2511 0.1199309685323506763366 +2512 0.0570111366476113087809 +2513 0.0051349264804015429253 +2514 0.1574218648311844215293 +2515 0.108400046758774126765 +2516 0.2654577100903594000236 +2517 0.1022052058688600884029 +2518 0.0088497124336279243961 +2519 0.144738530545150873019 +2520 0.0739556855582241268188 +2521 0.0768125188576924272654 +2522 0.294483453303180264804 +2523 0.0424353005371127245393 +2524 0.0442435072555563899122 +2525 0.3544948234667063546866 +2526 0.1895405951177846770594 +2527 0.1542124262745910578953 +2528 0.0121542948078427713587 +2529 0.1054367151655060952375 +2530 0.0484815044226641969627 +2531 0.2448743421481043569532 +2532 0.1281977411934284993844 +2533 0.1037693367689660334241 +2534 0.2198442632756829606677 +2535 0.1350620572421279141917 +2536 0.0814989528061744322729 +2537 0.0021413031538932026515 +2538 0.1640761461726896186519 +2539 0.1450468663142952252532 +2540 0.1058099186525906093559 +2541 0.1986385567473009372552 +2542 0.1343077779605205690494 +2543 0.0220839183245500995934 +2544 0.01662172487150359243 +2545 0.109204391009944820623 +2546 0.0907518947547002186038 +2547 0.0226099582056279481412 +2548 0.0056492366857320650692 +2549 0.0060717114948516419035 +2550 0.3380703439382316544126 +2551 0.0581268526532458112976 +2552 0.2124920803326165175129 +2553 0.1386795315393004879301 +2554 0.1347932988300366008438 +2555 0.121504143230536251763 +2556 0.0513711190621291072245 +2557 0.0666218820718188081687 +2558 0.007069296870015914433 +2559 0.014109402220271067066 +2560 0.1314702786326876204903 +2561 0.1345111760909042386558 +2562 0.1664530807318153060592 +2563 0.1377840312236222330977 +2564 0.0794836775814724422196 +2565 0.1834367607553485934879 +2566 0.0025939839957927961038 +2567 0.1170965578346548630062 +2568 0.1156790410927202306102 +2569 0.2107773943668873051216 +2570 0.027042885639178823598 +2571 0.1736407788144941621855 +2572 0.2052222425011835382769 +2573 0.0179698660019617789674 +2574 0.0247365195121280120882 
+2575 0.0933570396939106678103 +2576 0.2569056791114628923722 +2577 0.0271014626086391145765 +2578 0.0600790944020814857152 +2579 0.0742161981716505936291 +2580 0.1638745842854571477254 +2581 0.1670650448619828143393 +2582 0.0736240732212274234803 +2583 0.0173813572202586863469 +2584 0.0033823132186506774978 +2585 0.0010403574894879575458 +2586 0.1310610599035414158831 +2587 0.01534704418793906254 +2588 0.0557813957837481888546 +2589 0.0034395634530824132497 +2590 0.0073337308571418801523 +2591 0.0094863775494172554448 +2592 0.1170008888272379699025 +2593 0.1269579971912191407313 +2594 0.1203902433152129469729 +2595 0.0384685269822066452927 +2596 0.0051969315201472040741 +2597 0.0133452533146741648268 +2598 0.148974140230077795044 +2599 0.0322909887188171235151 +2600 0.146558300987356726619 +2601 0.0225681076116942894161 +2602 0.1620171706030084091132 +2603 0.0586206189468629484951 +2604 0.099006453779147046701 +2605 0.0526068643220937021132 +2606 0.0455421142673279530366 +2607 0.3455809824999592216876 +2608 0.1427148526143743878514 +2609 0.1257019803286653636398 +2610 0.1884273969067890019957 +2611 0.0919427373862823082984 +2612 0.05718350104503643061 +2613 0.085130309451324659431 +2614 0.2008385394964192216527 +2615 0.1484915800614026726301 +2616 0.0932013485480321646204 +2617 0.0895558126641292640002 +2618 0.0202462494971754961015 +2619 0.1133765761493495810575 +2620 0.0023781333689480792758 +2621 0.0246735421799235757345 +2622 0.1506387198328801024783 +2623 0.0615614004018025517961 +2624 0.1302657740916344786264 +2625 0.0568205947946362061041 +2626 0.1346383061069140740784 +2627 0.0301453913302625851889 +2628 0.0177241934145614433993 +2629 0.0759908193164727102653 +2630 0.137735520000752953429 +2631 0.1816396101387582062703 +2632 0.0155887621557745242873 +2633 0.0099258484981321476459 +2634 0.2123071943282858753399 +2635 0.0760682509053266647792 +2636 0.0056329187436876520384 +2637 0.1369480875371387185169 +2638 0.0114011582022070928621 +2639 
0.1533126341715076446093 +2640 0.208122262460883611368 +2641 0.0359729818882940505231 +2642 0.2298877014815729202812 +2643 0.063867262820876138929 +2644 0.0275495577718649353738 +2645 0.0566244492456836473471 +2646 0.1751320366053481236701 +2647 0.0987229807210229237668 +2648 0.0053737180732420578022 +2649 0.0601803324852872231765 +2650 0.318268765180199419973 +2651 0.0591057491393019232562 +2652 0.215316106781662852665 +2653 0.0882176485738843385587 +2654 0.3335990399447665288868 +2655 0.0461967013925209876724 +2656 0.1350676434589878627257 +2657 0.1080284933867047131306 +2658 0.01576186201582747623 +2659 0.1285420086044584409013 +2660 0.022764113133795352284 +2661 0.1788431835197618613442 +2662 0.1504790573651963447865 +2663 0.014844877079295510508 +2664 0.3004968579312912169144 +2665 0.1420831360179672098187 +2666 0.2116156894060964022142 +2667 0.1648841584779866253907 +2668 0.1938571569820361772951 +2669 0.1867816193948446112927 +2670 0.0370733921340270500266 +2671 0.1418727883309052750782 +2672 0.0674384400357729840714 +2673 0.0935928217515556992723 +2674 0.1990383517846558703024 +2675 0.0037523067666033848959 +2676 0.244438316439917452394 +2677 0.1537143247892193420601 +2678 0.0897513806147612475117 +2679 0.1753375191891037854219 +2680 0.1339936911837263933123 +2681 0.0018754173966811474479 +2682 0.0152028211592923189616 +2683 0.1047142047560685462759 +2684 0.2315866732101002634092 +2685 0.0055861471717716477611 +2686 0.2776580849092614711182 +2687 0.2028345237773025999672 +2688 0.079065098273930170536 +2689 0.0128665115831939486091 +2690 0.1145768356841426888204 +2691 0.2419162883747675851787 +2692 0.1234711303652080982562 +2693 0.3595094067866123621613 +2694 0.0779220018785525408989 +2695 0.2308233100397104331503 +2696 0.084346272732717816889 +2697 0.1974765505702943402255 +2698 0.1966360942817890755663 +2699 0.0452744530661514932302 +2700 0.1342632625444113436686 +2701 0.1385586282478681041663 +2702 0.1545645272303121930957 +2703 0.1447325042885267420978 
+2704 0.0863473552275148964474 +2705 0.0196627560663093503557 +2706 0.016927444305896224841 +2707 0.0013645639623854224996 +2708 0.0079539769469218254277 +2709 0.0817946874934123852041 +2710 0.2037057410520580758018 +2711 0.1141631919482741092109 +2712 0.0005978087577394737258 +2713 0.0162351670973732188641 +2714 0.0865632760970847375814 +2715 0.0481068115023836548327 +2716 0.2629267879997556067728 +2717 0.1715419529539238263638 +2718 0.0917519859178581170811 +2719 0.0783619674073570138262 +2720 0.1332709932541748798496 +2721 0.0802760935337092640385 +2722 0.0604581436489268556067 +2723 0.0014190434261650724558 +2724 0.0667676777232441426291 +2725 0.0004721168292071079974 +2726 0.0038914586210579064744 +2727 0.0497554523325992800742 +2728 0.1521977263204603025848 +2729 0.0620028320722515596808 +2730 0.0633036754937242457375 +2731 0.1026710271484469472192 +2732 0.0769016734278377689371 +2733 0.0693654957875548400237 +2734 0.0119206942920402243768 +2735 0.0203673960793961557336 +2736 0.0754473388721126347889 +2737 0.0624774507747897292487 +2738 0.0046626829982433322039 +2739 0.0790998051521342865344 +2740 0.0304327184615056749173 +2741 0.0357982189310320150777 +2742 0.0319236776737948715899 +2743 0.1026738665599874616996 +2744 0.0673374131115183627294 +2745 0.1764061854164127307598 +2746 0.1280826799934452475682 +2747 0.007782041490858968881 +2748 0.114328661005651616156 +2749 0.0257661425629784265889 +2750 0.107083164388986254445 +2751 0.0043485693480149467671 +2752 0.0083690604798912406753 +2753 0.2292076895750671805541 +2754 0.0544553442500525092251 +2755 0.2129566412579354528489 +2756 0.1889593936780286365096 +2757 0.172642018152317888724 +2758 0.1368574329212502715958 +2759 0.2415674018451373361582 +2760 0.1336280332180080909765 +2761 0.1657357786273243316355 +2762 0.2283189241197183250165 +2763 0.0026254900098847564968 +2764 0.0077541923687372057289 +2765 0.0293538473116906964422 +2766 0.0030969287913701908262 +2767 0.0017201442403423942637 +2768 
0.2569832258647768430393 +2769 0.079559024299790198631 +2770 0.1797063066586075397169 +2771 0.0277497474910505942391 +2772 0.0655778160798214104776 +2773 0.0101996914749051762611 +2774 0.1626572398495955418252 +2775 0.0033164815730715641907 +2776 0.2333375283907073438616 +2777 0.244778262569239263291 +2778 0.1646032105146908730564 +2779 0.1415274127821308236541 +2780 0.3292422763882240088762 +2781 0.046363106551004902578 +2782 0.0024518633608138118603 +2783 0.0997449814116486976889 +2784 0.0031796619162379317382 +2785 0.1433182806337778003591 +2786 0.1964992693631701925483 +2787 0.0325780813743309916042 +2788 0.1142857030211182761681 +2789 0.0239263432678354509564 +2790 0.2431990935859856539647 +2791 0.0316916257647134186337 +2792 0.1703492392958453616192 +2793 0.2076323313851414775755 +2794 0.2480848559896711436057 +2795 0.2224984911335469994764 +2796 0.1655865620784020275646 +2797 0.1031580595455002524741 +2798 0.0886715412326178065161 +2799 0.1226934041984904799616 +2800 0.0498416491010674686524 +2801 0.3349494692147985119846 +2802 0.006210581959476661057 +2803 0.2770502827892607489169 +2804 0.2161935664303321069646 +2805 0.2056558169351428155824 +2806 0.2118263819264807890086 +2807 0.0411162145412144811041 +2808 0.1821524417761425651552 +2809 0.2769424687395267703494 +2810 0.1703953343261174202539 +2811 0.0214078655253415080228 +2812 0.0926063781528581042579 +2813 0.145359389676828071325 +2814 0.2065512490589405392161 +2815 0.0090137241466095169296 +2816 0.1118184298618773026002 +2817 0.0035554948582959869895 +2818 0.0094144396490929531385 +2819 0.2920791773598971907688 +2820 0.1666293660709002588671 +2821 0.1099723366446295758081 +2822 0.1738272936159558079705 +2823 0.0115056751074714466027 +2824 0.1023925994431865438283 +2825 0.1167946021717456012157 +2826 0.0380587313902784399322 +2827 0.1636709800281535287514 +2828 0.1339634016592393950251 +2829 0.2022324627777637717774 +2830 0.094701962860586186288 +2831 0.1746941511609486630263 +2832 
0.0193863322227264441022 +2833 0.0360454333975258195411 +2834 0.2465523181964188459414 +2835 0.0232679562358172158099 +2836 0.0033592042539377763716 +2837 0.1815110558171170729302 +2838 0.3089136735892808460768 +2839 0.1315658555807887597489 +2840 0.1586971235561754622889 +2841 0.1156485175765274292514 +2842 0.0075709372195589298704 +2843 0.1681966237302223998729 +2844 0.1610776147368404576099 +2845 0.0943643320711956001823 +2846 0.1241883832889457972559 +2847 0.0702114321500772881057 +2848 0.1951730262413134131538 +2849 0.0395331160149240481982 +2850 0.0835565956939205156662 +2851 0.139494749504829568254 +2852 0.0096604686797054535596 +2853 0.0195488769998803692263 +2854 0.2399923264482737295111 +2855 0.0950020084488971544578 +2856 0.3272870933481855648672 +2857 0.1147863398720919414497 +2858 0.057747746660610575764 +2859 0.1689709075513060598794 +2860 0.1732548250498491260174 +2861 0.2002990732404179297088 +2862 0.2846162867275424113167 +2863 0.0096910972003828171895 +2864 0.1049552479863856552234 +2865 0.2375623029867749425037 +2866 0.0575306936930087067794 +2867 0.2141835465970087781784 +2868 0.0535310486906852567301 +2869 0.3607528306096018377858 +2870 0.228964175081257603761 +2871 0.1163323074842727566924 +2872 0.3200234073163278902818 +2873 0.0722669837010452426052 +2874 0.0035044148857781404725 +2875 0.0206823367097282578841 +2876 0.0471546389336831711647 +2877 0.0991066480135640165416 +2878 0.2109055675612355740256 +2879 0.268247741211268098116 +2880 0.0497086840141839000906 +2881 0.1277717399095862627068 +2882 0.1071553822391922566881 +2883 0.0845380249154414897816 +2884 0.0250944975695425558093 +2885 0.0387534150747867248143 +2886 0.003615261081989404237 +2887 0.1589143411963168384116 +2888 0.1302023275669566826362 +2889 0.2248458813210832962781 +2890 0.0072814867788405989932 +2891 0.2165559437480575333268 +2892 0.1522167947614985195059 +2893 0.1266797695406784718397 +2894 0.1677907722907424348247 +2895 0.0944377488096280054419 +2896 
0.0845452517355018834389 +2897 0.1866353638397941216986 +2898 0.0157041541864496106418 +2899 0.01237544011179283511 +2900 0.1105786993764815706598 +2901 0.0222508958854637992064 +2902 0.1675183573237997436411 +2903 0.0589193102555670769616 +2904 0.1279521268942941103486 +2905 0.2373088823075289732767 +2906 0.2817936428570636531177 +2907 0.1229845392529253511604 +2908 0.1518299110151070940855 +2909 0.0884840852706702402086 +2910 0.0339035928579533921146 +2911 0.0720287620700666847418 +2912 0.124046364131163863731 +2913 0.1558159372107149609477 +2914 0.1318012475978172226565 +2915 0.1483300037939070115112 +2916 0.2569348975412811819652 +2917 0.1390581473515541399699 +2918 0.2985884569764386542445 +2919 0.1097807252010112866181 +2920 0.2186088846590386325364 +2921 0.0845736331985452172155 +2922 0.166271724790468838906 +2923 0.3196851422745318749286 +2924 0.0826239121843893448149 +2925 0.2496273967283511030502 +2926 0.0353552784702108185977 +2927 0.2288439373347165572969 +2928 0.1586555054251710394908 +2929 0.0573842403319966068431 +2930 0.1233477069128028763556 +2931 0.2125953503603534577859 +2932 0.0587499904316012497296 +2933 0.1580511555910226462185 +2934 0.4480166429170024389173 +2935 0.0014229599529343914515 +2936 0.3146460557967399873647 +2937 0.036605597337155591775 +2938 0.0255339439975152786289 +2939 0.1556839991560617364463 +2940 0.250797511383354987391 +2941 0.2281834110579817054276 +2942 0.1716894995829897074824 +2943 0.0876863391621175436441 +2944 0.13962941893158703488 +2945 0.0200954040745903034226 +2946 0.2865065974619071531571 +2947 0.0827155965499545009489 +2948 0.0465852606576946312589 +2949 0.0316862006374767912753 +2950 0.0610765934997651305238 +2951 0.2075526162511772965136 +2952 0.1567975893727510949827 +2953 0.1251875442049842357406 +2954 0.1993191394576612485157 +2955 0.2988437219377586551161 +2956 0.0518467902508634448377 +2957 0.1419094900481321086527 +2958 0.0046582311760646529647 +2959 0.2277831542939583175933 +2960 
0.1007449227471378522258 +2961 0.0428522636898503450631 +2962 0.1370319892611244483316 +2963 0.0513513888172681606803 +2964 0.1697966476871144991811 +2965 0.2096503171000900500598 +2966 0.1576033377883376551676 +2967 0.0117301782611429292885 +2968 0.0259902564438802691216 +2969 0.018920520887062117904 +2970 0.1167490430399165113329 +2971 0.0422003978743157037723 +2972 0.0415444717725776055395 +2973 0.0139461299679071552837 +2974 0.0547222297888943221 +2975 0.0019603996407729956837 +2976 0.1395077228340765862491 +2977 0.0464143454575344574509 +2978 0.1206770535368144187105 +2979 0.084108770515115505173 +2980 0.0621915315714392860635 +2981 0.1199094617738433454779 +2982 0.0499081558639687533629 +2983 0.0508114418356454822234 +2984 0.013006403661350531345 +2985 0.1184458612553337258921 +2986 0.109383998145070909791 +2987 0.3118947187911950136296 +2988 0.1414503315336376465527 +2989 0.2238975041164148194195 +2990 0.1240055366693619542939 +2991 0.005016482641917922522 +2992 0.1847416680531932253739 +2993 0.027213563057076371049 +2994 0.232042032666544106867 +2995 0.0784491126333243804503 +2996 0.13623542681942771293 +2997 0.1866319186577099220603 +2998 0.1895133004273205357126 +2999 0.0024903879532956600153 +3000 0.2646922346321394781121 +3001 0.0302175561633273234707 +3002 0.261781867488347974593 +3003 0.0149020800670385707154 +3004 0.1145584114933814456894 +3005 0.0944728035016796641177 +3006 0.008113245987565732148 +3007 0.0069611146803846913125 +3008 0.0312304504950194315727 +3009 0.0515072236714657419321 +3010 0.1172302116130079369105 +3011 0.1617054286672423291105 +3012 0.0108462658009758228933 +3013 0.0018699074430577964766 +3014 0.1811177055709279060114 +3015 0.1426268960586380973865 +3016 0.0718984990174210136793 +3017 0.1021416934991820069101 +3018 0.022510713123388009782 +3019 0.1575662504014271325659 +3020 0.1361340061614966512327 +3021 0.1783684432552496423874 +3022 0.0919328984960274403493 +3023 0.0032081838060334963263 +3024 0.0802423742216783336678 +3025 
0.1861252293099680266142 +3026 0.0810520787459095420902 +3027 0.2278771736666826142059 +3028 0.0076304744207732928818 +3029 0.0218687714173012234808 +3030 0.1073527056697983900202 +3031 0.0498410067621438551688 +3032 0.0006371432955588797334 +3033 0.2108870430707977250417 +3034 0.1372286761193078352683 +3035 0.0376911700509253086433 +3036 0.2212798407926439836491 +3037 0.0111775852317381272499 +3038 0.1001531572290251664858 +3039 0.0888126658028011162216 +3040 0.1961300738368988161309 +3041 0.0004538393634603690502 +3042 0.2833552531069427615762 +3043 0.2574097611377465510962 +3044 0.0685489018736263056031 +3045 0.0237900312161981181136 +3046 0.0836888621186484549241 +3047 0.2005547608196312503459 +3048 0.1378531705070404345115 +3049 0.1415507953881944835928 +3050 0.0080449282800475700045 +3051 0.0263701756359114442008 +3052 0.0670687898431599760496 +3053 0.0280219440113129436565 +3054 0.2825217852568266962976 +3055 0.0043219351130198467928 +3056 0.1169956896167674703646 +3057 0.1366960987990151477067 +3058 0.0690420717937976352596 +3059 0.2144164905081941685516 +3060 0.161166000510295642778 +3061 0.0895227040108247384964 +3062 0.2678169545204754919965 +3063 0.0764642512280863662077 +3064 0.2412115543665994121802 +3065 0.1322984922171547561565 +3066 0.1643296198349467052147 +3067 0.0492837226058766217363 +3068 0.1699986805426225988658 +3069 0.0195640761208761664036 +3070 0.2535318900286891441453 +3071 0.2100462189544979318967 +3072 0.1880493157436045059683 +3073 0.059052711211673422631 +3074 0.1719567956323400925722 +3075 0.1996961892768592261582 +3076 0.1704517815891918863791 +3077 0.126854256561462014341 +3078 0.0510325353002580237027 +3079 0.0092953525113053207807 +3080 0.0060882106935869276243 +3081 0.0460701100626370665947 +3082 0.0323862725149140426306 +3083 0.0341049339090510189587 +3084 0.0653638536122347763024 +3085 0.1711777698578192952183 +3086 0.0889465989636387538431 +3087 0.0016680013251085415524 +3088 0.0285819190188840374645 +3089 
0.0199306512112688988259 +3090 0.0021184585929255482159 +3091 0.0353217288668367102034 +3092 0.0064904925141500745614 +3093 0.0190707605229783298817 +3094 0.0181624084747951178298 +3095 0.0426233570136845332788 +3096 0.0327193796302269537812 +3097 0.0092899645837449112973 +3098 0.0006564205503587887142 +3099 0.0510451037867390988723 +3100 0.0313470222734937437048 +3101 0.0221016489533145607527 +3102 0.0311160715435623430603 +3103 0.035041498321941269567 +3104 0.0157165133469368799302 +3105 0.0257570815886230854164 +3106 0.0198404478865008421229 +3107 0.0005474172157414539108 +3108 0.0200798109890301633429 +3109 0.0593095238705631244702 +3110 0.0685260616814646894568 +3111 0.3074576939736610170506 +3112 0.1190262237470886830248 +3113 0.3274102518620881085987 +3114 0.2545059358878248856328 +3115 0.0301532115671413018798 +3116 0.0895013187890096734156 +3117 0.0648007750693457840185 +3118 0.1847408634116886050336 +3119 0.0803407292908526748931 +3120 0.2252622628952133110314 +3121 0.0465691168676453451369 +3122 0.0994772495091964525926 +3123 0.3442923664349671342144 +3124 0.1864505948720996120205 +3125 0.2335394449118377291352 +3126 0.1653577273925228019458 +3127 0.1100815739469794135585 +3128 0.2589344016469439790917 +3129 0.1459172717982839817541 +3130 0.1736646239796034441039 +3131 0.224183934686719238405 +3132 0.1028777197274833576923 +3133 0.0879033414331857887447 +3134 0.055080458028201086107 +3135 0.1569720779421226175554 +3136 0.1787172746692884672814 +3137 0.1254777841954128336788 +3138 0.0758003176180231663661 +3139 0.2326051373851192038966 +3140 0.0604172206794297214638 +3141 0.0938254315424522367106 +3142 0.0285274465348027417289 +3143 0.0287620149660223443921 +3144 0.0343270741461129472172 +3145 0.0927771332994787173432 +3146 0.0316657396552845887827 +3147 0.0230406332035208126496 +3148 0.0366204711004418440035 +3149 0.0263488217879932588861 +3150 0.0360995939189020093041 +3151 0.0660480771721020376575 +3152 0.0862011433504227958524 +3153 
0.1063439249537742742335 +3154 0.0345815176586725669949 +3155 0.2193006080835057935996 +3156 0.1096913668810697572997 +3157 0.1881988440677752894015 +3158 0.0244146789182316699407 +3159 0.1613791387594823434348 +3160 0.2117726884008070686871 +3161 0.0039590461276429724999 +3162 0.214459834833923723707 +3163 0.0383480463021019374326 +3164 0.1585248454876230506105 +3165 0.0705111670358051639829 +3166 0.2706182295602736487261 +3167 0.0878322350962837533617 +3168 0.0517176549549155997743 +3169 0.0302158768456229111232 +3170 0.0388406799032090868651 +3171 0.1027996266762829236097 +3172 0.0128251854164530571661 +3173 0.0675279359351047825388 +3174 0.0291343694823372155456 +3175 0.0353017108063855972189 +3176 0.0364580302138601472506 +3177 0.0423375153613005442144 +3178 0.109122790051786142107 +3179 0.0357963112944383946012 +3180 0.0020584079367283532498 +3181 0.0409886883266247503688 +3182 0.2234599357349993653532 +3183 0.0180263016223652962344 +3184 0.020412072239353113573 +3185 0.024695546076903818894 +3186 0.0271564704087689547107 +3187 0.0593945260999703961158 +3188 0.0574636816445278017507 +3189 0.0529526368170022340709 +3190 0.0413042055184209971896 +3191 0.0197172341266798265003 +3192 0.0622016288677573470078 +3193 0.0316782463688824525438 +3194 0.0596052303635860245001 +3195 0.181012422383792281888 +3196 0.3565426540119927545369 +3197 0.0848583367078007966278 +3198 0.1487948903912677256489 +3199 0.0533913448716650113068 +3200 0.012469129586983186686 +3201 0.0298738958914091107255 +3202 0.0104341356237832947468 +3203 0.1113608986336611650358 +3204 0.0221028536959108380666 +3205 0.1984282294373693467104 +3206 0.263810572435300871863 +3207 0.0766325040223957559826 +3208 0.0697661384771070675059 +3209 0.0283449846047639335278 +3210 0.1040836414430723333435 +3211 0.0818852490623001305625 +3212 0.0425025267787629834615 +3213 0.1202697657523023877513 +3214 0.056084320816759083983 +3215 0.029970462444111323902 +3216 0.0492612357602516329202 +3217 0.1693562418000767733073 
+3218 0.1587473499205541982437 +3219 0.302665619776510186334 +3220 0.158033465822700558423 +3221 0.0970550848298305929296 +3222 0.1393764471881148991894 +3223 0.0122632695262497558308 +3224 0.128834405480652780529 +3225 0.1082429002788807270719 +3226 0.1212111725349946794728 +3227 0.2130383494599963900562 +3228 0.0093227737416850132296 +3229 0.0313391252638081929671 +3230 0.1409419737180716281078 +3231 0.1798065074770405225024 +3232 0.0317339965435771143021 +3233 0.0636704451712455515855 +3234 0.0629321517111462030991 +3235 0.0382271774233775146756 +3236 0.1314266815239853447572 +3237 0.0219051502653139738841 +3238 0.0158940043660962468386 +3239 0.0438587402928188388462 +3240 0.040923041208120916612 +3241 0.3121803184925798779759 +3242 0.1521416720822764967469 +3243 0.0325337522455878061511 +3244 0.0705952048438668877672 +3245 0.0972642006664119207482 +3246 0.0519973810620732446175 +3247 0.0422315114286933718235 +3248 0.1603418803420250116076 +3249 0.0358462927840883655017 +3250 0.2096602250618694318618 +3251 0.2393887511702489367948 +3252 0.1434760651142424070503 +3253 0.2092119684897524589662 +3254 0.2114547918480881860681 +3255 0.1332642797887544361402 +3256 0.0422294176309829308313 +3257 0.1702795929311250733296 +3258 0.0975906191622728097501 +3259 0.1999459993799193313624 +3260 0.1907530832322540670365 +3261 0.1494832993902686646415 +3262 0.0045123667177111867824 +3263 0.3197576382773874570375 +3264 0.1419473888690102048038 +3265 0.2260631185788058716835 +3266 0.2123418544300837984551 +3267 0.2799648911197425982245 +3268 0.2666085263016489959043 +3269 0.1493891400526843882268 +3270 0.1851088009135518075432 +3271 0.1456665604425780302655 +3272 0.1168712257768009937786 +3273 0.1021527095102173904317 +3274 0.1416008056434925055367 +3275 0.177207868941272383978 +3276 0.3361708296119622074727 +3277 0.15755397384049107945 +3278 0.1178897624403039412488 +3279 0.1400013303922583163263 +3280 0.176509061137554518206 +3281 0.1089098599670997036259 +3282 
0.2748092349528887257115 +3283 0.2779879557384857813318 +3284 0.2036895494502742043341 +3285 0.2063792243686550786919 +3286 0.0780427129773455935391 +3287 0.0091113101239992493052 +3288 0.1871229568561795231219 +3289 0.2426027389274114376416 +3290 0.2026468763455281929531 +3291 0.2649111296202343868167 +3292 0.1990752360636304785224 +3293 0.2203818811694421841985 +3294 0.1545254673777810006108 +3295 0.2839434822145660564097 +3296 0.0526567724517632840264 +3297 0.2772356221355222749558 +3298 0.0329684454485768621557 +3299 0.1858027779810614565914 +3300 0.1242936666360192343728 +3301 0.0146300429917546707947 +3302 0.14796884403334820135 +3303 0.0089489093660660556234 +3304 0.0979284976771337828394 +3305 0.1838658155876358446523 +3306 0.0997532050431331612783 +3307 0.0504919250637339139809 +3308 0.3492203674827920045765 +3309 0.1858855977875928933152 +3310 0.0680772505803873412278 +3311 0.0491010600359467475062 +3312 0.1177849405754964745574 +3313 0.0965867573423924846709 +3314 0.2350740594362605850876 +3315 0.2011077259952706586255 +3316 0.1133744903034245438134 +3317 0.1735165687421685032366 +3318 0.1667469077963386059338 +3319 0.0301393395519699847096 +3320 0.0307716620000859260309 +3321 0.0017145842604528104772 +3322 0.043074866589486754398 +3323 0.046577307958076935579 +3324 0.020319618051986142182 +3325 0.0109970347545038510395 +3326 0.0878695751499343619129 +3327 0.1816195252003078020042 +3328 0.1695604876973014485397 +3329 0.0956238916748437112192 +3330 0.1445152867748406011472 +3331 0.0317658028326593261381 +3332 0.1826013701958615897958 +3333 0.1489764291978402932504 +3334 0.0042057080843222599542 +3335 0.0743784179371895920463 +3336 0.0648243826450895926916 +3337 0.1737402990033380945079 +3338 0.0687539074638968961128 +3339 0.2230756905862842920385 +3340 0.2352261977661938285866 +3341 0.1644626540392685465353 +3342 0.1469742789133955118697 +3343 0.1417290217431739773879 +3344 0.1079224124101459819647 +3345 0.0153859539785224808811 +3346 
0.2018890899610975009359 +3347 0.124080201407772344746 +3348 0.1424717964510726631033 +3349 0.1089250075963916208899 +3350 0.1512930605537946282535 +3351 0.0842381678345729273394 +3352 0.0712700341568046347174 +3353 0.2031555987262934348525 +3354 0.1796487830532789709004 +3355 0.1123395383508015710206 +3356 0.0175698166504657943277 +3357 0.0813029139855265536863 +3358 0.1241596197290908976107 +3359 0.2572281800886839775444 +3360 0.0192461160377564409474 +3361 0.1068855421566663049804 +3362 0.1256082118222672794339 +3363 0.2004173579792937476274 +3364 0.044820165382584271907 +3365 0.1444530386580949177233 +3366 0.1228206441930206893609 +3367 0.1604960467035461135765 +3368 0.019904885785959541139 +3369 0.1298731081666716080658 +3370 0.1215025686701936835643 +3371 0.1619741089296333758085 +3372 0.1909399588561942473497 +3373 0.1733423544948868300075 +3374 0.1986212752992650432127 +3375 0.0902718812188627733883 +3376 0.1790676582050176601779 +3377 0.2657696382905293530641 +3378 0.011649424291058665254 +3379 0.0272064432342149706279 +3380 0.0885725151316452269379 +3381 0.2586000933800553513997 +3382 0.2314083001211589551982 +3383 0.0198056124006394593251 +3384 0.3005025506070571061912 +3385 0.1166688695771490869024 +3386 0.1869956067589376991211 +3387 0.1745314256259571650265 +3388 0.0164901870945801977408 +3389 0.006673713672932521726 +3390 0.0035265028542741049498 +3391 0.0118579488171117464201 +3392 0.1350329239399608860506 +3393 0.2158432484541343843176 +3394 0.0023066687645656156792 +3395 0.0326364065688158344614 +3396 0.0833031431359460422525 +3397 0.0857848578809226186559 +3398 0.0111862465340402516406 +3399 0.0463759750331433004411 +3400 0.0060416118589701819919 +3401 0.0006427782950807708982 +3402 0.0491100199987967306337 +3403 0.1603790260318033455977 +3404 0.1307413224098404325169 +3405 0.0231077469051775077902 +3406 0.003932805810367834623 +3407 0.0787359571690409015821 +3408 0.1891261318919864875543 +3409 0.1187405713044350219487 +3410 
0.0109161112328830106621 +3411 0.1141895602321871677765 +3412 0.1384042565875593655544 +3413 0.055862250383252064212 +3414 0.1142969221953334529873 +3415 0.1327788792475713963714 +3416 0.1908809385493322186012 +3417 0.0865255972676073209504 +3418 0.008732591740166857977 +3419 0.0066138035785192703792 +3420 0.0106189051167663150305 +3421 0.0098587367763217099298 +3422 0.0076420199236784591354 +3423 0.1552850883447020324812 +3424 0.2171099320313051861397 +3425 0.1554658690986545910295 +3426 0.1155178766632893733179 +3427 0.123286586193296202052 +3428 0.1951643899795043879397 +3429 0.257050079594686731177 +3430 0.2508211830141769960179 +3431 0.1380120575535491445063 +3432 0.0047449677139764692008 +3433 0.1772604259004863791471 +3434 0.187449969285026385446 +3435 0.2040326959686181718112 +3436 0.1678731229288273618661 +3437 0.1837330679149777601378 +3438 0.1396660659850197383491 +3439 0.1207617602058912442686 +3440 0.2164019068023724789995 +3441 0.2472372339156540532734 +3442 0.2414480108775189648451 +3443 0.147350887892059634332 +3444 0.256228001037917874072 +3445 0.21428914981237773274 +3446 0.1987943487867952496728 +3447 0.1399272016986101707658 +3448 0.2366948054102755993888 +3449 0.1239098091552188474207 +3450 0.137805803618054473203 +3451 0.1739069281962354329707 +3452 0.0891552118972492868565 +3453 0.1001906046793476917633 +3454 0.090294349364083301146 +3455 0.218842761152374415401 +3456 0.0772235826200158154142 +3457 0.1434808414941689813649 +3458 0.0994574097462292416871 +3459 0.0738896829080334938133 +3460 0.1195788926567913496024 +3461 0.1290435117685693744427 +3462 0.070967109687173013377 +3463 0.0754563939470829225797 +3464 0.1566715558898878757343 +3465 0.281280967011571847447 +3466 0.0970049133435441635065 +3467 0.239596171808706709383 +3468 0.1746836301765438137501 +3469 0.0027197627055256298062 +3470 0.0574895576193003368659 +3471 0.1285003638524215563432 +3472 0.2226680679607130064479 +3473 0.199277808294023278668 +3474 0.1740711485709219019657 +3475 
[Data file contents omitted: diff-added lines +3476 through +4825, a single column of decimal values (roughly 0.0002–0.51) from a data file added under _articles/RJ-2014-031; not human-readable as prose.]
0.3145563840663286314658 +4826 0.0754947214636540880894 +4827 0.0807863141128916106837 +4828 0.0132133243670061444747 +4829 0.1025693860015859798507 +4830 0.1303237358464813255843 +4831 0.1290639015922354082555 +4832 0.0711746711396145925743 +4833 0.0145882019956189972615 +4834 0.0284335550172053162787 +4835 0.0307500746516956406651 +4836 0.0444480266801937293208 +4837 0.036711269323264503206 +4838 0.2210001891908661697528 +4839 0.0285861863461234870509 +4840 0.0327045263770062294428 +4841 0.0700084108528353232792 +4842 0.0940896793722627022838 +4843 0.2055662153830629945706 +4844 0.029866972099528956569 +4845 0.0015794409260703929938 +4846 0.1232730495175807378105 +4847 0.0299226047049673433975 +4848 0.0511782422933492117667 +4849 0.1013255288571393020192 +4850 0.113475762991380707545 +4851 0.0907577471073428843074 +4852 0.002104372128145075467 +4853 0.212249619331481231832 +4854 0.0492829771026095930786 +4855 0.0012892676904975731555 +4856 0.1981896302790730757248 +4857 0.1527393215325187070253 +4858 0.194391512116872916538 +4859 0.0777924346310693848316 +4860 0.219224710073657930165 +4861 0.0377167495300441829587 +4862 0.0649514752470660705619 +4863 0.0120743812965247651614 +4864 0.142622996851522299 +4865 0.0535384110428430981532 +4866 0.1209723671396934902011 +4867 0.065173828823379409414 +4868 0.0254912841888103175458 +4869 0.0392498148092685836263 +4870 0.0273235223321135235752 +4871 0.1129048383393093102356 +4872 0.0074609342208508070554 +4873 0.0404427233045363612729 +4874 0.0117608414491422886422 +4875 0.0113140517657200970203 +4876 0.089835954997749295492 +4877 0.0876623011812665553499 +4878 0.1127901578651419045585 +4879 0.1284337053668805073148 +4880 0.1339121805215127136712 +4881 0.1495043056710509410401 +4882 0.0937917517865983485503 +4883 0.1877152642561703554946 +4884 0.0403136978421835892594 +4885 0.3218648959872420500794 +4886 0.0058478310011668881499 +4887 0.0832755226044045310241 +4888 0.1294654535263329853123 +4889 0.1893241582960600022378 
+4890 0.0957577660673121500157 +4891 0.0013785515897553938212 +4892 0.1722794900443196663975 +4893 0.0463255127767497262847 +4894 0.1341436949190538130416 +4895 0.0676147885729588282722 +4896 0.1218932253462059606131 +4897 0.0468796213128502281542 +4898 0.2017736592277697971198 +4899 0.1783950167639407891063 +4900 0.1577874460817795532197 +4901 0.3411615854341327747079 +4902 0.1480257869556722560844 +4903 0.0582251062351919687621 +4904 0.1634395111809037182127 +4905 0.1949140977771753047154 +4906 0.2258037050329695094586 +4907 0.0112707582465849300707 +4908 0.1435906530840601880161 +4909 0.2610081837715649832887 +4910 0.25480473432873984585 +4911 0.1610924430942440588321 +4912 0.2807442036035404653305 +4913 0.183838881615247279333 +4914 0.2273118160322422365294 +4915 0.0081927681641683677144 +4916 0.0037291787931846734176 +4917 0.1629488535392236148169 +4918 0.164010214650120389468 +4919 0.1650438523663789736062 +4920 0.2676564195200917817274 +4921 0.2052208187761042912367 +4922 0.2613411273718195815263 +4923 0.2408780171597859620647 +4924 0.1220746900394716755178 +4925 0.2071739025878859274954 +4926 0.0226260754961124348206 +4927 0.0590517321804560832432 +4928 0.0346882758424414475162 +4929 0.0214124704183018543802 +4930 0.1661042041014171022351 +4931 0.0556072927964750760021 +4932 0.2344489438826961313911 +4933 0.0934970923601824571714 +4934 0.0627024065881412184797 +4935 0.1488271461973018072733 +4936 0.0808939914333822729375 +4937 0.2394395902482875926331 +4938 0.2058010334002671748443 +4939 0.2182664539691295535473 +4940 0.2140572613457590589459 +4941 0.182880695709834617535 +4942 0.0185327700983158108472 +4943 0.0142772468646295887934 +4944 0.2345571229522538103662 +4945 0.0855461239071267226519 +4946 0.14620464649814474134 +4947 0.0393267499920395280366 +4948 0.0511696193244288236035 +4949 0.0438479689762244911888 +4950 0.2233360910910727747947 +4951 0.1869779799143947429663 +4952 0.2128298230821198955276 +4953 0.129865886889977799612 +4954 
0.2583921477096674523821 +4955 0.2877590604044982791621 +4956 0.1175843396719046390908 +4957 0.2434798421088040232263 +4958 0.2011127562179221561767 +4959 0.1571076739133931976511 +4960 0.2371578411619900406127 +4961 0.1350494055849346120013 +4962 0.2152224519837465654959 +4963 0.1058973774803239759068 +4964 0.1009456530590920331214 +4965 0.309129162055472306303 +4966 0.1560750559919588109636 +4967 0.1037112659459642483029 +4968 0.0933599132594925829043 +4969 0.1749949415043687006577 +4970 0.2164279445523314970856 +4971 0.1821280254763844863586 +4972 0.2101309096753017913173 +4973 0.236429981159559238213 +4974 0.240427325264706726049 +4975 0.2203372484252686669404 +4976 0.2017163954762522803943 +4977 0.2526476950997639803198 +4978 0.0049524824988606804335 +4979 0.0583711174651025191396 +4980 0.0849568107311699793893 +4981 0.0142156895828683283584 +4982 0.2180717700527515512388 +4983 0.296891142828538234788 +4984 0.1878965428860461805982 +4985 0.1970468936979270779819 +4986 0.2101709370872049120749 +4987 0.1506118389425049608477 +4988 0.2015758033799161519362 +4989 0.1693549822011713446024 +4990 0.0487333608551988387014 +4991 0.1121949193526840388158 +4992 0.1208376054884339467765 +4993 0.088222585884102325271 +4994 0.1400172156727651773256 +4995 0.2602264834792684111697 +4996 0.1713553247118685118888 +4997 0.2392598609940334020152 +4998 0.1833419503442186848652 +4999 0.0219089078293477033943 +5000 0.0260692463526386551675 +5001 0.049980994739027319318 +5002 0.0381709909195510482816 +5003 0.0002711039633882453954 +5004 0.0243201829999617341604 +5005 0.1320085015704949571447 +5006 0.1649154846279408670462 +5007 0.1127774707354340100185 +5008 0.2334830890358730592915 +5009 0.1005716985744681685189 +5010 0.1905083797628749353592 +5011 0.1075978211518220373222 +5012 0.3250200311098510597141 +5013 0.2735641956386449802352 +5014 0.2515394575232275942156 +5015 0.1921820213944430699726 +5016 0.1581063229116226043214 +5017 0.2191438041385653512361 +5018 
0.1786107100036826433875 +5019 0.0984657817876659902101 +5020 0.2040247940450322094996 +5021 0.1676072529600007854356 +5022 0.0099475855660530326796 +5023 0.0663597230967363505005 +5024 0.0004966277650082133802 +5025 0.0082415906894971573127 +5026 0.0033198519233797098987 +5027 0.1342278912825916881157 +5028 0.1192425634085322178057 +5029 0.0285272097245546467359 +5030 0.1148672124825811102777 +5031 0.0165969407478225097763 +5032 0.1587504383554737830142 +5033 0.2025365198478128736514 +5034 0.1855254439173583402845 +5035 0.1430553356275491294625 +5036 0.265966202715120947353 +5037 0.0263074795559888052754 +5038 0.1894606171093986546339 +5039 0.0514942854376576331088 +5040 0.0115765356820223420248 +5041 0.0519284497087376326063 +5042 0.0117365967758634916901 +5043 0.1496936664263334160196 +5044 0.1994751346848609441231 +5045 0.0324705243930506065597 +5046 0.1775616010918793785667 +5047 0.1825112134644266970174 +5048 0.0020918241271511003529 +5049 0.2154800563151794412509 +5050 0.0314115922275299205846 +5051 0.2162618908311279286583 +5052 0.2057311325205461727261 +5053 0.2069040315069412272475 +5054 0.1271454271665726776241 +5055 0.2124849047272916457985 +5056 0.024608499192460737276 +5057 0.1251540272682928056636 +5058 0.1996244272916053996703 +5059 0.1598376213675556378746 +5060 0.2912745461681436331958 +5061 0.1200244379388842735912 +5062 0.1773815619625213335642 +5063 0.0882424222652787443311 +5064 0.2626543680590184282053 +5065 0.1928043901786541114429 +5066 0.0305226994621478550651 +5067 0.2044445440635363209214 +5068 0.2049588685066312809813 +5069 0.2507787991665464399915 +5070 0.19943431144027315427 +5071 0.0698381092702493599944 +5072 0.2800900284989077393405 +5073 0.0620252922496460773472 +5074 0.0042446707762772472108 +5075 0.2084851777110027226669 +5076 0.189798197112812178089 +5077 0.197102443368141255764 +5078 0.0985601455368964873838 +5079 0.0843524917298528464915 +5080 0.1395277979609956531259 +5081 0.1709648004101598506299 +5082 
0.0113739756434567840931 +5083 0.1002043806686915977666 +5084 0.0493014539926501724199 +5085 0.1724927523647857285916 +5086 0.2669465186909126819259 +5087 0.1240462847703464460691 +5088 0.1315510514728638336024 +5089 0.0063183780501642576588 +5090 0.1569629550601862599812 +5091 0.0904493511200731986621 +5092 0.2134122371948800056529 +5093 0.0303276545697214808261 +5094 0.2667884010323802179698 +5095 0.0294898380632473630647 +5096 0.2704125631233053606017 +5097 0.0902180350423939236837 +5098 0.091152567401019721216 +5099 0.1795693467190687675483 +5100 0.2695254462118605864873 +5101 0.1897014433600908434041 +5102 0.0353751659482292868386 +5103 0.1571724270870281936485 +5104 0.0009470536315212116276 +5105 0.1492320730432993569625 +5106 0.1838751480209152577849 +5107 0.1491511393580619015964 +5108 0.0609843971184887145842 +5109 0.017813041529344517816 +5110 0.0219690009804204902655 +5111 0.1960326078325428100779 +5112 0.2390926650211029347304 +5113 0.1354269439469785452079 +5114 0.2126433141444319108171 +5115 0.1898124702900666482819 +5116 0.250087168301595053066 +5117 0.0279910118143606341523 +5118 0.192595007438030263458 +5119 0.2590027520273904815262 +5120 0.308776703868806567943 +5121 0.0682312205396271853619 +5122 0.027246907575115806438 +5123 0.2074361976162191756323 +5124 0.1047498314471579650142 +5125 0.1805235389580870930537 +5126 0.1080100441379065329128 +5127 0.0461505841452450327189 +5128 0.1476528677142645296083 +5129 0.1538358632425925187626 +5130 0.0064594241274439573655 +5131 0.1537914738530078706535 +5132 0.036610559643729854995 +5133 0.1567063078984346768951 +5134 0.1117395814596291170329 +5135 0.2627493792796335991824 +5136 0.2612407614892414686736 +5137 0.1303037432897650449437 +5138 0.1812647106747309888597 +5139 0.034121729735995351418 +5140 0.2915582044265880301559 +5141 0.1716950762466106783499 +5142 0.1354399282952614136377 +5143 0.1435312391178961421989 +5144 0.2301241673317155067569 +5145 0.1450062877786099657662 +5146 
0.1730128911160070670494 +5147 0.1535149789716777346538 +5148 0.0880770699305246373978 +5149 0.1604478776104890924703 +5150 0.1500021419730936667047 +5151 0.1067699529963172799807 +5152 0.0331626892581471863219 +5153 0.0197939323409604961412 +5154 0.1449437399966186368339 +5155 0.0219233750424108614352 +5156 0.0975266682647050187072 +5157 0.0967739670894236514442 +5158 0.0546147605682671261063 +5159 0.1088399528746582384242 +5160 0.2084181087463875814425 +5161 0.1404633060524795307167 +5162 0.0049283448915436785936 +5163 0.1364371249153983611802 +5164 0.0750281114894140732652 +5165 0.3130259777910924245958 +5166 0.1109815271420790672163 +5167 0.239275529895405542069 +5168 0.0393744404692230390297 +5169 0.0494503557758915954223 +5170 0.1706735340412919299524 +5171 0.1814113899573267518761 +5172 0.0067163137373766872845 +5173 0.2592553661046927970801 +5174 0.2443915062089330736406 +5175 0.0745784873775068440915 +5176 0.0142594646555716762781 +5177 0.0640929996908287946678 +5178 0.2345634025246381981766 +5179 0.1902508942354014798148 +5180 0.00179958566205999323 +5181 0.000827848323829376352 +5182 0.1921967067316516786235 +5183 0.1680349351739365004743 +5184 0.1526154493651839250123 +5185 0.1811906071076854196633 +5186 0.0789028713897464428761 +5187 0.1440954366153753274382 +5188 0.019066471219075495358 +5189 0.0066416145959591549666 +5190 0.1174862460170906924839 +5191 0.1548202568158831005096 +5192 0.1163344914794849499495 +5193 0.1341446412825756107079 +5194 0.2309780807225123855364 +5195 0.0984094192705885495442 +5196 0.1945056463933473311911 +5197 0.1506151806882674859533 +5198 0.164948819774635185853 +5199 0.2425706010613241092599 +5200 0.210047341224982864194 +5201 0.1862060131533584406149 +5202 0.0872392341964068340765 +5203 0.2382217859771895984711 +5204 0.0365897686515499059867 +5205 0.0198574835047035623548 +5206 0.0555429242276904988618 +5207 0.0015193037719239263147 +5208 0.2096121182387682413406 +5209 0.0784807473116028581073 +5210 
0.0136163190046356529644 +5211 0.211866349952831073411 +5212 0.2057182876121430026295 +5213 0.062123950067993656543 +5214 0.312300077414421839439 +5215 0.1862476409917473429978 +5216 0.2299824521487082717996 +5217 0.0536674838510361740251 +5218 0.1447824084867040117519 +5219 0.0531498388116208794751 +5220 0.2135368250496738340427 +5221 0.2714343650820685294178 +5222 0.2053569070972897370186 +5223 0.1221581497379095465616 +5224 0.1452174401559035155085 +5225 0.3122728634289192450879 +5226 0.1179022839874802053295 +5227 0.1435983593383415857225 +5228 0.2015030529234513034798 +5229 0.0618986097600065857116 +5230 0.1431729610673933839049 +5231 0.3441876809625069588705 +5232 0.2581033994185421009959 +5233 0.0010994806788765156685 +5234 0.2333754964250572272455 +5235 0.0239575448606428771658 +5236 0.0572195761519964096742 +5237 0.0093168176364451238403 +5238 0.0625707955596674775256 +5239 0.0162795394192689554802 +5240 0.0360440705437652178511 +5241 0.2207842397301449377522 +5242 0.0942115354245618014106 +5243 0.1692266620650569597384 +5244 0.1545128993390017524412 +5245 0.0639561087911093423264 +5246 0.024909036858272771281 +5247 0.0415308221671518359996 +5248 0.0133982221760321519588 +5249 0.2776674409007288679696 +5250 0.1252538492848424556136 +5251 0.1123772928609377275144 +5252 0.2560572681112592285935 +5253 0.0752003193343899756229 +5254 0.2229100912207461204773 +5255 0.2227526047046860246947 +5256 0.2215527243484598785006 +5257 0.1492828107137843252072 +5258 0.2171212118445552952117 +5259 0.1866599722319669529824 +5260 0.3755365449556266055353 +5261 0.3449805818017609992943 +5262 0.1059964049032021404795 +5263 0.0088164757598470401556 +5264 0.1240385518778803486395 +5265 0.1553484391240206596851 +5266 0.1659925151482400385028 +5267 0.2408245781656481732114 +5268 0.0326285798709721255872 +5269 0.1792034261508238246474 +5270 0.1769607810825122584664 +5271 0.1845662643268051461565 +5272 0.094571230142363543747 +5273 0.1179281130495002521963 +5274 
0.1044985623030001836709 +5275 0.1250854668613417819412 +5276 0.1611773798679536073841 +5277 0.2627452971760081412711 +5278 0.0973516735246265779713 +5279 0.2104037660207943860602 +5280 0.1452081956650796701336 +5281 0.1325608550765921267445 +5282 0.1251433416792298858322 +5283 0.1356433642815827456118 +5284 0.203348039946714242987 +5285 0.1601588261651661626583 +5286 0.1422675767054908391174 +5287 0.1772549370325804951598 +5288 0.1168538390213595262734 +5289 0.0276908346358880237681 +5290 0.2243103592785843092283 +5291 0.1399362932374917378731 +5292 0.2017185240634237763935 +5293 0.1050967190556039709826 +5294 0.0847740114252884813251 +5295 0.059744657478030600839 +5296 0.2466353750749293860522 +5297 0.1845477927426387165788 +5298 0.1196007195700833913854 +5299 0.2456690019694188487076 +5300 0.0758992225316288682269 +5301 0.2325928765366805095471 +5302 0.1705768215132642262599 +5303 0.1780554694960808648219 +5304 0.2421925916710574722135 +5305 0.0073007630312410913553 +5306 0.1323227804011133657003 +5307 0.2146120998561249493264 +5308 0.0615709944130029690479 +5309 0.1827527561933446342834 +5310 0.2365690524650027604103 +5311 0.2510648042212536013018 +5312 0.199269642570942789872 +5313 0.1663819407045757925445 +5314 0.1252103566577472126831 +5315 0.1810807303006210233765 +5316 0.1827698734184236561973 +5317 0.1244881815404528385693 +5318 0.2224392610376377532599 +5319 0.1779355937280837351988 +5320 0.1187170854082866822132 +5321 0.1472984688366519578917 +5322 0.1249020285321916223786 +5323 0.2302402859781280675122 +5324 0.1312111057146450598943 +5325 0.0133135698446721663368 +5326 0.2079878349707363072163 +5327 0.0820886366723530686018 +5328 0.1516860887697044357747 +5329 0.1141538408462055148584 +5330 0.2308434739996991225119 +5331 0.1933023306920755857163 +5332 0.3416084606724699712643 +5333 0.4384485396609182905614 +5334 0.017807130672928432602 +5335 0.1013277570912017172544 +5336 0.128350289184529176012 +5337 0.0841665997503358298548 +5338 
0.1230534693146804886554 +5339 0.199988512443606603064 +5340 0.2350848907843383073235 +5341 0.053897255384550972479 +5342 0.1421892172873372717223 +5343 0.2602420903621658343496 +5344 0.3364819332103271554146 +5345 0.3875216272177700038704 +5346 0.1397527541223476110765 +5347 0.1651030463452063390406 +5348 0.0994350711184099900208 +5349 0.0347453390374611731373 +5350 0.192251066618895544158 +5351 0.1980096384117545071923 +5352 0.0218627417666670542662 +5353 0.0032081028980494989168 +5354 0.06869881002078088994 +5355 0.1785049204721449178646 +5356 0.0526199245735479784192 +5357 0.2746991608584855582009 +5358 0.1278381508502562657892 +5359 0.0791322615581510901972 +5360 0.0596325032489335646324 +5361 0.2467219876674622835999 +5362 0.0043651335817998168642 +5363 0.1543029970056059296812 +5364 0.1652959890582109681034 +5365 0.1615954770977277021871 +5366 0.1025091361739040168866 +5367 0.1311393975663418254296 +5368 0.0941077073890356913255 +5369 0.1554765859704426156362 +5370 0.0278754856512546181357 +5371 0.0323611248710761398306 +5372 0.0832241838567808062122 +5373 0.0941230987144824754465 +5374 0.0164363208108416515574 +5375 0.1229044331459856198574 +5376 0.2266396798819439450945 +5377 0.1656106972431473089991 +5378 0.1441577897063365254482 +5379 0.0850855725074512875272 +5380 0.0793880204952095991366 +5381 0.0497140169440965715153 +5382 0.1596675083797913785588 +5383 0.1093461323906829152364 +5384 0.0282938300623309900439 +5385 0.0739200832551311637353 +5386 0.0167947034039557345497 +5387 0.0952527221378688393472 +5388 0.149869666278222013478 +5389 0.2124599564731924905558 +5390 0.0402590963756105230109 +5391 0.0347421654151619380135 +5392 0.2643218317292952423969 +5393 0.0323487763564336630595 +5394 0.0797866461945101884679 +5395 0.1585248231922071759925 +5396 0.1070761455015945412539 +5397 0.0215043320411142831194 +5398 0.1315576157262594303443 +5399 0.1644295656005451689019 +5400 0.1365895165751667139631 +5401 0.1434553862969714932429 +5402 
0.3029245303588081195123 +5403 0.2263052385263417953798 +5404 0.0037318874432835952987 +5405 0.3007035913029514895278 +5406 0.208248225744601139775 +5407 0.1149894747087491370108 +5408 0.1098586967587296403526 +5409 0.1370373533906333873844 +5410 0.0820482748402784345387 +5411 0.0538182632286128628807 +5412 0.1132954744848062716978 +5413 0.1062328222688762729975 +5414 0.1459645997675081130485 +5415 0.024572253390867219297 +5416 0.0078988393657783984025 +5417 0.0853602997086318482367 +5418 0.0252013249761133401039 +5419 0.1712389091500131355073 +5420 0.0370488614783489461635 +5421 0.0536152846400104535207 +5422 0.1659736674760395880313 +5423 0.1002049050458108936379 +5424 0.2458770943666156727492 +5425 0.1970176750307471302825 +5426 0.0151156542865529940761 +5427 0.1168171615577857230805 +5428 0.0161719655622621777402 +5429 0.2146115229217137598816 +5430 0.224756800864974959353 +5431 0.1351169613579714101625 +5432 0.2311364094631697951865 +5433 0.1525557929116258260027 +5434 0.2104475055721197196412 +5435 0.3518403545320690772868 +5436 0.294947688443039757189 +5437 0.2233368552688614616653 +5438 0.1592025501375711749041 +5439 0.0209141586413164048963 +5440 0.2280044022889163579659 +5441 0.1820358865255851665843 +5442 0.2425213082719848700641 +5443 0.4677110982734684618833 +5444 0.1211179157751627877282 +5445 0.0547368871570942105986 +5446 0.200245022192612864842 +5447 0.2372682862779670331932 +5448 0.2215978566018079742861 +5449 0.0514159431553030815687 +5450 0.1218995114904759813346 +5451 0.0250353436707426123264 +5452 0.0355852193171223088464 +5453 0.0315304842867076412505 +5454 0.0365896838226361306723 +5455 0.1171156073268682928923 +5456 0.1711517665591367476363 +5457 0.0970454546564236686379 +5458 0.1495534118742358475895 +5459 0.1234185539018486515994 +5460 0.1269524240661063785307 +5461 0.3184272230003005188514 +5462 0.1030125631438088273928 +5463 0.0856844195501681726856 +5464 0.3097804252035556427103 +5465 0.2211831443280378173277 +5466 
0.1660272862519960590522 +5467 0.1596665013834726309927 +5468 0.1541137560971471420768 +5469 0.0760655049891443696408 +5470 0.0047927097132941963095 +5471 0.1613643110637683064645 +5472 0.1185961487250451057429 +5473 0.1968550205742937986297 +5474 0.2424687448948805168492 +5475 0.2103143505237580757061 +5476 0.2881932759744739724894 +5477 0.2398167967751246298924 +5478 0.1615675999223835368479 +5479 0.1381104638664191475161 +5480 0.1713153093436702856245 +5481 0.0035646928492834560089 +5482 0.0076066706219052055565 +5483 0.087899726289613885899 +5484 0.1571559299081330962622 +5485 0.0718062325658847810939 +5486 0.0006884795614797719438 +5487 0.2024317763870060737919 +5488 0.0266155863757686905746 +5489 0.3063496647495635993863 +5490 0.0928746458825047138674 +5491 0.0934215798060361174437 +5492 0.0970125775826494868292 +5493 0.1797582718954762459607 +5494 0.1348377931115595229628 +5495 0.1990013796652505551066 +5496 0.1511956252868640326881 +5497 0.2388587757599974770173 +5498 0.1775467893323089685342 +5499 0.1735017085024617156108 +5500 0.1964747259616519692305 +5501 0.0218181464953966404441 +5502 0.1289257347430931566201 +5503 0.3260860192922620526268 +5504 0.0542677371498976671149 +5505 0.1283478114893712995759 +5506 0.1279888069239770720387 +5507 0.1735125881794932212188 +5508 0.0653966713486939660305 +5509 0.1977757931050410367124 +5510 0.1991005948036122241707 +5511 0.1397046495249691655527 +5512 0.2378196038469486683908 +5513 0.1583545377191945546791 +5514 0.0122040312724814910966 +5515 0.1422891918339139472049 +5516 0.0453194917498115842913 +5517 0.0442472529948128856514 +5518 0.133763210489707851103 +5519 0.1758757282038507785416 +5520 0.0198158921439892041216 +5521 0.1384784400857938801988 +5522 0.0284950735660407898642 +5523 0.1836776260506882141321 +5524 0.1636872148552237227204 +5525 0.0913627460375127792291 +5526 0.0117992348134043027047 +5527 0.2808545697970155075041 +5528 0.1295468715718715024821 +5529 0.2593123155061895501738 +5530 
0.2040010142150728289501 +5531 0.1347080536321252353105 +5532 0.0801281812332847176439 +5533 0.2996901064332086472852 +5534 0.0421011735055945301998 +5535 0.1269787678256972252022 +5536 0.1531273739299514324852 +5537 0.0258181456073142204244 +5538 0.4765628027994746607199 +5539 0.3340918259553954383634 +5540 0.1909090088596101519869 +5541 0.0246760225164373500628 +5542 0.0463486632448461624567 +5543 0.0061901959135433736597 +5544 0.1758304441729779121761 +5545 0.1717121585505082259626 +5546 0.1318722272408501539065 +5547 0.1391215796060075726714 +5548 0.1381059309398521006695 +5549 0.0499354193885378563889 +5550 0.1170341829934554572779 +5551 0.2068317457566483663634 +5552 0.1608454745078281289405 +5553 0.0597414661858264584016 +5554 0.0622047156090815567264 +5555 0.1652947474843947384127 +5556 0.2511709575910596181636 +5557 0.0976831283279521039864 +5558 0.085691768098334078485 +5559 0.0102121696764929763279 +5560 0.1336991675108890331725 +5561 0.0296306598078646538097 +5562 0.0983563745264632027787 +5563 0.0931391426177946984977 +5564 0.1424923449746955395057 +5565 0.139153645120868141305 +5566 0.0858336426127809942743 +5567 0.1526236013618044451423 +5568 0.0763014094531858921844 +5569 0.2523520640583298280113 +5570 0.1803481914657777962496 +5571 0.1501353059131391998182 +5572 0.1077616694622989534924 +5573 0.2098718479419765803318 +5574 0.0506974639777460558965 +5575 0.0101583414240198118689 +5576 0.2797633745269028082703 +5577 0.030548971422236777945 +5578 0.3197045510832292047887 +5579 0.1279603724638915251965 +5580 0.230274492398783087177 +5581 0.0459534770651463947422 +5582 0.0094484350196898606727 +5583 0.2203570699925467835101 +5584 0.1955121961621573423162 +5585 0.0778210979143684061787 +5586 0.1940615337891284930549 +5587 0.2043972406180394607578 +5588 0.1639032097916193475573 +5589 0.3097950559071747100859 +5590 0.1428462622993802999538 +5591 0.1469778615364193130599 +5592 0.275023068955268457092 +5593 0.2009812402618243731833 +5594 
0.1856131791003177711197 +5595 0.2198063482243454991316 +5596 0.1286538978129323773647 +5597 0.1255141940113136123092 +5598 0.1161145407334666784793 +5599 0.1117507314926945993783 +5600 0.1601122557405506297012 +5601 0.1113063156339737097555 +5602 0.1989228159369996196126 +5603 0.155797266812949086745 +5604 0.2356216686517612346474 +5605 0.1170064183422125925205 +5606 0.1036815690783133708797 +5607 0.2183183768871427010705 +5608 0.1784144144735027304183 +5609 0.0948652586747403409051 +5610 0.1419172543520663964944 +5611 0.1787043818305992515239 +5612 0.0614319720950543962656 +5613 0.0053967190822779400119 +5614 0.3267833754307447602372 +5615 0.2445422558222536402184 +5616 0.2411602582669402361493 +5617 0.0793778683833280263027 +5618 0.1966610426030507008388 +5619 0.1034875792201656874436 +5620 0.1180972813561476375543 +5621 0.0460316411934240804493 +5622 0.0876337386791462835678 +5623 0.2652807004956594383316 +5624 0.0315192637528683844428 +5625 0.0724224325326280754522 +5626 0.0946491195738906054835 +5627 0.1412945026636958090194 +5628 0.1094974607899504986941 +5629 0.1024293172012131908355 +5630 0.1329029299644893524768 +5631 0.2536577209761444184899 +5632 0.0393027603792932070381 +5633 0.0820312753726730042869 +5634 0.1567944262612362138487 +5635 0.2830072482434324854239 +5636 0.2484180208746043228007 +5637 0.0065526090079587925169 +5638 0.0934518297541288844865 +5639 0.007655122303988452781 +5640 0.0088004222405891723013 +5641 0.0067709460836725748542 +5642 0.0494893783295524067323 +5643 0.0753782406588271169934 +5644 0.004179423641196952123 +5645 0.0067310759326978377923 +5646 0.0011331251799100222274 +5647 0.099393794145388916772 +5648 0.0529668918443234326698 +5649 0.1554480851290435117207 +5650 0.1671152726710891101014 +5651 0.0230374385339557098684 +5652 0.1394751962909948073133 +5653 0.1501624052192694347418 +5654 0.2207505640218473119685 +5655 0.1791185819438510795631 +5656 0.0816802628920478973606 +5657 0.2929170242922088385207 +5658 
0.2135342160756779072983 +5659 0.0321674159626461134143 +5660 0.0030450626702304911878 +5661 0.0955011215604366636711 +5662 0.21147668411145956191 +5663 0.0851020883610363459981 +5664 0.06228084817816599561 +5665 0.1782700057303135132702 +5666 0.1086797330916630943687 +5667 0.1464966756092455080207 +5668 0.1604962423941297400276 +5669 0.232553616012443314931 +5670 0.0275527343765716303814 +5671 0.1403073818947801476575 +5672 0.042080471981508028867 +5673 0.0644454912246553573985 +5674 0.1406587037754165880887 +5675 0.1118870979280140454115 +5676 0.1817481515196406605117 +5677 0.1577158400971395979528 +5678 0.2439901061631758183434 +5679 0.1626004683652916560366 +5680 0.1714596605555782637964 +5681 0.198037554972512852558 +5682 0.2470499268575064577558 +5683 0.1555600493469128653423 +5684 0.1919079919186366944839 +5685 0.1186596451728660489566 +5686 0.2119403100241975534956 +5687 0.0735459365855094665543 +5688 0.2008734060672552890203 +5689 0.2358392057595364577072 +5690 0.1415927796775944269569 +5691 0.2072807290836866656036 +5692 0.1940960497158703002007 +5693 0.1319473406833739370647 +5694 0.0896230740793016178447 +5695 0.1332618923388126930263 +5696 0.1415652996766883775503 +5697 0.2184764290239953055117 +5698 0.0831334164678668718906 +5699 0.0234613385278730525452 +5700 0.2686884850499402510593 +5701 0.2564217196141713173141 +5702 0.0779951055805552795341 +5703 0.3028631297109207198837 +5704 0.0403248379228861170143 +5705 0.0442904509376742944182 +5706 0.1310655479545941026753 +5707 0.1815757287055337065862 +5708 0.1196927406486014838771 +5709 0.125151158336356288503 +5710 0.082074149811062258042 +5711 0.0263823703680651187875 +5712 0.0831397113225178507889 +5713 0.2352246870270305878492 +5714 0.1399080975908285062737 +5715 0.1859977377193419156853 +5716 0.2488871172045050428334 +5717 0.0299082811609027061917 +5718 0.1753844399443389434623 +5719 0.0818382336458225584375 +5720 0.1090027584398074178562 +5721 0.1332966539671957273416 +5722 0.2824142322531239535088 
+5723 0.0157188267564084861727 +5724 0.2058648399957186680975 +5725 0.1841516932226635594461 +5726 0.1169566242304344166891 +5727 0.0357347438015198359818 +5728 0.0888366344268810420592 +5729 0.0589142809919425275433 +5730 0.3667012082895249047709 +5731 0.2131068116303764525821 +5732 0.1817323894327584343777 +5733 0.2451472929191678695737 +5734 0.0778734347445425162393 +5735 0.2147332350763671293681 +5736 0.0083013174068035412712 +5737 0.0031454291895488344609 +5738 0.0005561685319563570073 +5739 0.2006534074940753697991 +5740 0.1980590081106877731187 +5741 0.2022248105264263795533 +5742 0.3522522549189842089312 +5743 0.2257370689732155766283 +5744 0.0533072160657729748889 +5745 0.0433706539236636059997 +5746 0.0183190144226862833277 +5747 0.100037644356730404116 +5748 0.0631597995256480554405 +5749 0.0040042681372273189813 +5750 0.0343642850199593410943 +5751 0.1691503019027770504668 +5752 0.0952452221759817752034 +5753 0.1696831570258861554557 +5754 0.1349611980882476636179 +5755 0.0559076827876446302845 +5756 0.2541552010204976563834 +5757 0.1625076455868024738471 +5758 0.1777174737378089119932 +5759 0.17922948836119578786 +5760 0.1639834849858542720202 +5761 0.2021002418297371816536 +5762 0.1549839760745896632965 +5763 0.0731560111400221813049 +5764 0.1006323052122663347374 +5765 0.0901583371300844482743 +5766 0.0949192224318664568017 +5767 0.0593512250925796666645 +5768 0.2982508690820439567482 +5769 0.0080363038421736292943 +5770 0.0160212196218129272751 +5771 0.0912793617843797311373 +5772 0.1039020396340130381052 +5773 0.2002582173629041673024 +5774 0.2251736076088422522368 +5775 0.2105984380178106685211 +5776 0.0177198330611303342397 +5777 0.0049316586718097894446 +5778 0.0020550996265826866789 +5779 0.2602635396813133272786 +5780 0.1294924515513547202961 +5781 0.23969330341010594565 +5782 0.2747534739326713548735 +5783 0.1122122551297740217224 +5784 0.1283326503055979284085 +5785 0.1747999389838671424613 +5786 0.3390326032529020761075 +5787 
0.1667350080186013883132 +5788 0.1185849061381934188564 +5789 0.2661550056695610733115 +5790 0.0639726937594429084788 +5791 0.0711980749355873021589 +5792 0.0183107781087663151753 +5793 0.194382697408236077008 +5794 0.0261282903147599214477 +5795 0.0327278106580004835013 +5796 0.1898723157006594053353 +5797 0.0109034393709825112168 +5798 0.0159088340058023240686 +5799 0.1444417345403624375333 +5800 0.1876113878640914556239 +5801 0.0705874276049522353382 +5802 0.0116654752381788739957 +5803 0.2046092369538805577633 +5804 0.0156787142094883952259 +5805 0.1145060496548669370931 +5806 0.022192817075597615073 +5807 0.1230850498856445596196 +5808 0.0005736010126551597697 +5809 0.0393592723936930996564 +5810 0.0936346466383056463822 +5811 0.0713473939063584972065 +5812 0.1706989085608578426978 +5813 0.1902658274201060872866 +5814 0.1269083433660158077849 +5815 0.2061853884337727094156 +5816 0.1232520771466859982324 +5817 0.0388043139530889966138 +5818 0.240663398879893325466 +5819 0.2933762453258626501373 +5820 0.292635140607305943572 +5821 0.1410436218037978695072 +5822 0.0049234525885786988367 +5823 0.0213261124501753912552 +5824 0.1550251872731984448261 +5825 0.3001251842342484987824 +5826 0.0095500468946525442909 +5827 0.0918351500157731370777 +5828 0.1779466919358271836948 +5829 0.0413756880325438780588 +5830 0.1387268392284956852745 +5831 0.046296938022687444958 +5832 0.0388309600359277723447 +5833 0.0169715566918180599254 +5834 0.0118134339343431028074 +5835 0.2143533073538643196621 +5836 0.1136557273567257997371 +5837 0.2771781275830050295106 +5838 0.2030785159811470119706 +5839 0.1743367632496022889832 +5840 0.0970085930646875232997 +5841 0.0772630342929760127735 +5842 0.1450159604234304289161 +5843 0.1721450862181843066701 +5844 0.1020193624283927680274 +5845 0.0474678824408281477276 +5846 0.126479498127487599568 +5847 0.0393014374607715891163 +5848 0.0021156785246828364172 +5849 0.0061107885456919376274 +5850 0.0177812720698023764287 +5851 
0.0179880983338337567534 +5852 0.0015729318465181405928 +5853 0.1293696039894565164019 +5854 0.0465035642280100455781 +5855 0.0687320168262303132778 +5856 0.1252076608249506894932 +5857 0.2142475711852538933222 +5858 0.0897687513737183834239 +5859 0.1010284649869643835984 +5860 0.2105323377525662731369 +5861 0.0701382352918721863055 +5862 0.1070179549195817148011 +5863 0.1337503420101380235963 +5864 0.1455377531960836134939 +5865 0.0811507331990258679033 +5866 0.2231011098343935739052 +5867 0.0614669670173300999871 +5868 0.0220881565604158294769 +5869 0.0259912496781187357664 +5870 0.0793965116107303719994 +5871 0.0680763376760400995558 +5872 0.0610068404505618097633 +5873 0.1189019844395201169762 +5874 0.1628615562014867901297 +5875 0.0376921598737001289914 +5876 0.0278457376805905827688 +5877 0.003727597617232579312 +5878 0.0016489728187341755761 +5879 0.3107880691701275233108 +5880 0.0898773972759862177506 +5881 0.0085654709893585711383 +5882 0.0080117186728903312665 +5883 0.2442010786704305880246 +5884 0.2030289221930428045226 +5885 0.1006747122814200257057 +5886 0.0635636747175562649703 +5887 0.1003376514172598099606 +5888 0.1850560939207744060031 +5889 0.2768260793174097122993 +5890 0.1943472307900161455407 +5891 0.3629127627484906581934 +5892 0.0602273141017768270933 +5893 0.0198657138345981476579 +5894 0.1572572608716708286725 +5895 0.0404420521754589173957 +5896 0.2395638982146328921363 +5897 0.161068636339376763944 +5898 0.159227409953788007213 +5899 0.1068657610630645016236 +5900 0.138558924927497661983 +5901 0.0634932176801362019303 +5902 0.0956067312570648281111 +5903 0.233081598189014943534 +5904 0.1678434829773889913618 +5905 0.1569205653214545137519 +5906 0.258946995108177480116 +5907 0.0691571671714993757574 +5908 0.0313823568585654966157 +5909 0.0132351390698506338889 +5910 0.0440813937868853034097 +5911 0.1802842876825268003671 +5912 0.2587445715605955487426 +5913 0.1939121356321778233411 +5914 0.0547724710480511287058 +5915 
0.0459579677203584280321 +5916 0.0012850765014429711437 +5917 0.0015812109568818348299 +5918 0.0032202673521439951348 +5919 0.1192450692007357704316 +5920 0.1336486271682829074514 +5921 0.0094486789857233677986 +5922 0.0062725126701244819996 +5923 0.0957387892990814726168 +5924 0.1820051462418431076351 +5925 0.1636089772018216315086 +5926 0.1833143039012442299107 +5927 0.1926712409071375253777 +5928 0.2076670918001117249752 +5929 0.0076362288608292907752 +5930 0.0021037198077854374866 +5931 0.0087827013658662595602 +5932 0.0048981330716272404244 +5933 0.0017693262332530608483 +5934 0.0028861523125170143386 +5935 0.172638345660741487908 +5936 0.1373766059863818500553 +5937 0.1476135388368403633663 +5938 0.0218329091188605196872 +5939 0.0987024560434054260005 +5940 0.0308794368406612451672 +5941 0.0796318832702303941451 +5942 0.2904821946257925557866 +5943 0.0597454571861558356161 +5944 0.1126413239323255099933 +5945 0.2232931175255526701218 +5946 0.113164935000204025517 +5947 0.0966266524805813875609 +5948 0.2176451502407461824351 +5949 0.1258317512185910258538 +5950 0.2883642572831909278719 +5951 0.3834358195026172366759 +5952 0.1210914455905811315528 +5953 0.0451390957028348707714 +5954 0.1197352032891692286132 +5955 0.0819244473622117996836 +5956 0.0367080049611252581809 +5957 0.0762121287422677512469 +5958 0.2040057635529807789077 +5959 0.071415476015607701199 +5960 0.1599961124236986276248 +5961 0.1271139568973437550259 +5962 0.0870386496592520653159 +5963 0.0947295669127547756982 +5964 0.0975773514156066806846 +5965 0.1952959348651781834594 +5966 0.0004523146972274757799 +5967 0.0172572457851154770214 +5968 0.0217238210816913142331 +5969 0.1074289724969150622291 +5970 0.2197930259585877954542 +5971 0.1062071531524487882914 +5972 0.1006884534865205571563 +5973 0.1310521650467638410387 +5974 0.0535735693140478447249 +5975 0.0249209333865727973578 +5976 0.0024175726252180880158 +5977 0.0070094792813113152713 +5978 0.0104383258736575959547 +5979 
0.2099176847037019233433 +5980 0.2443236325136659836677 +5981 0.1095154490126877822043 +5982 0.2443303835919661826104 +5983 0.2613968546707304274612 +5984 0.1534345945021843016232 +5985 0.1719368972895392033706 +5986 0.2307297051741424021021 +5987 0.135338057903043540442 +5988 0.1905933726513064085939 +5989 0.2022024129253140733997 +5990 0.1512004519318901618607 +5991 0.1128652141837803574154 +5992 0.188495067232567869997 +5993 0.2107285984332014916465 +5994 0.1885206200267376674962 +5995 0.1040534267193514855743 +5996 0.1698934948534307209922 +5997 0.2228866471947879845938 +5998 0.0626249329967564216659 +5999 0.1195881029384595201082 +6000 0.1709680512045371747476 +6001 0.0928430797879980368448 +6002 0.1134573567468820792792 +6003 0.1466150425404364188164 +6004 0.116298444758088717621 +6005 0.0891766457780629462349 +6006 0.1938657845712775940061 +6007 0.185891426692584099678 +6008 0.1735847855537828443584 +6009 0.0933078252291377335803 +6010 0.0038356506063400249741 +6011 0.1283042748621713580182 +6012 0.1226173238353151090374 +6013 0.2822456979007630373246 +6014 0.1549913956627108346797 +6015 0.0645482652624506797467 +6016 0.0165062885935613502808 +6017 0.0628812753607702612513 +6018 0.0904387953847687053877 +6019 0.0906756201916135012864 +6020 0.076014687504332809076 +6021 0.0394194342431534242022 +6022 0.0774701856642043829027 +6023 0.1480060724696281027057 +6024 0.0027153355186498506421 +6025 0.2001159973911320888451 +6026 0.1680995810928871680989 +6027 0.0933168503331497228848 +6028 0.0199494489080037987194 +6029 0.1090197633121181008953 +6030 0.184011720832948061366 +6031 0.127642274450034287625 +6032 0.281990835485932955784 +6033 0.0793890442037462479297 +6034 0.1911129551915582147625 +6035 0.0792893889675006374729 +6036 0.0251810075135823277503 +6037 0.3267901660180426159918 +6038 0.2246191142388134220909 +6039 0.1524346934731158065279 +6040 0.2863752397707267638438 +6041 0.1435650133457424537653 +6042 0.1631120959286356530971 +6043 
0.1730323349015969736087 +6044 0.252175928886599620693 +6045 0.2333207231260782110738 +6046 0.1286415167540314596906 +6047 0.2316194889002435586267 +6048 0.275145757019385372999 +6049 0.1552033007501613381951 +6050 0.1467355774168482285269 +6051 0.0102731260494568522879 +6052 0.0544361252423506761233 +6053 0.1893519639179687685804 +6054 0.0753025633008883171771 +6055 0.05997158320238924617 +6056 0.0913020269676404622183 +6057 0.0984737154354583393845 +6058 0.0252754928591636084112 +6059 0.0825515434655516994189 +6060 0.1976847254584851165671 +6061 0.2698136869690060390958 +6062 0.2190168547254448538908 +6063 0.1619262459879300120047 +6064 0.0105913404323636477916 +6065 0.168385205221261796682 +6066 0.0053985722215878330713 +6067 0.173920809635538298199 +6068 0.2434345825937237195458 +6069 0.1150406066730763898764 +6070 0.3103386659232357791538 +6071 0.204141879636609013815 +6072 0.1571140012612219083454 +6073 0.0004801986818099430163 +6074 0.1576500907598667733378 +6075 0.1374427520847592010256 +6076 0.1332323817844130453558 +6077 0.1111959452815408089243 +6078 0.204225316404912621282 +6079 0.1979283659414477203331 +6080 0.1383651096640071964661 +6081 0.0151553664085626246805 +6082 0.2763920879706186495284 +6083 0.0273476399160426097978 +6084 0.1652905085544127405939 +6085 0.2580746899871783073266 +6086 0.1437904327216377176057 +6087 0.1189070252031715069219 +6088 0.168792906702171230382 +6089 0.0682185988482580107917 +6090 0.0148311632802436837997 +6091 0.0283929641181251066206 +6092 0.0093678591277889678574 +6093 0.0411124627968968633929 +6094 0.0066071372819309654584 +6095 0.1194536748651429808321 +6096 0.0501692296302955267895 +6097 0.1272908166693746401243 +6098 0.1162811547692638031171 +6099 0.1212521124148993878489 +6100 0.0712945876042492543423 +6101 0.0482123322245574525979 +6102 0.0023012827757538326795 +6103 0.1837596092047064089137 +6104 0.1792269261640813404757 +6105 0.2235051760294557154918 +6106 0.1616041840528831818879 +6107 0.2916852289832706501826 
+6108 0.0236127788123278725685 +6109 0.0015426058493666018525 +6110 0.0392283585701765288856 +6111 0.0224159980105419730234 +6112 0.011695085371999291643 +6113 0.0213438797794967619059 +6114 0.0043003606858688604372 +6115 0.0250149322296770518226 +6116 0.0293373528179334264188 +6117 0.0272443155893747034069 +6118 0.0141515071410825473558 +6119 0.0766822065062447572048 +6120 0.0407346604340477133621 +6121 0.2144909223741590098555 +6122 0.031155703840156753065 +6123 0.1955607054317035076174 +6124 0.0355945038757360318615 +6125 0.0617541836709601596467 +6126 0.0633626146363098774472 +6127 0.1638532350329217901486 +6128 0.1789295651382554752651 +6129 0.2043385001359923069941 +6130 0.0026886449319290591127 +6131 0.0312102322928632588961 +6132 0.0021212556478927332287 +6133 0.2501357086579124588965 +6134 0.1577197636116620116375 +6135 0.1656942972705799377309 +6136 0.1247504594299411739833 +6137 0.1596460485533271189684 +6138 0.0601469036066563447762 +6139 0.0220264703364130252916 +6140 0.246349556258219831717 +6141 0.2868542832589540125809 +6142 0.0296238905914580283318 +6143 0.0194679617304996527283 +6144 0.0080393826204632722054 +6145 0.0366377374153782214838 +6146 0.0187469266349293874063 +6147 0.1191622643215099891512 +6148 0.105999481784682367258 +6149 0.2043633413829981648746 +6150 0.0523456204439545630391 +6151 0.1438764553125411738144 +6152 0.0991949197757179407242 +6153 0.1010264007207824082935 +6154 0.1998808275674321399684 +6155 0.0085610336169157537356 +6156 0.1885004077387408827349 +6157 0.085489267980415228898 +6158 0.1944015839729833217131 +6159 0.0356102295790721307123 +6160 0.0124425897922782850713 +6161 0.0067030537489657189207 +6162 0.0874927422440023028205 +6163 0.0026701005375609388089 +6164 0.1467103897836007364575 +6165 0.0040110918937876390011 +6166 0.0386120719090560296505 +6167 0.0172645407848654591776 +6168 0.0040819435430771841208 +6169 0.1705825673999419034299 +6170 0.0041133396778648013276 +6171 0.0078199144313179230759 +6172 
0.0082495256622067067442 +6173 0.0042133168902084849883 +6174 0.0481525586511452538541 +6175 0.0049836382386054143764 +6176 0.0481892197808585637242 +6177 0.0497405287293587392017 +6178 0.0118807301600169258615 +6179 0.0540685711835702589867 +6180 0.109152494361082208485 +6181 0.1016519946033884180814 +6182 0.0665472530154912617073 +6183 0.0090684205288120468824 +6184 0.0243398831566247059177 +6185 0.2323855401195857039998 +6186 0.1316987322505861202071 +6187 0.0880064810349068527007 +6188 0.0941328871586697840668 +6189 0.0763123146905044236199 +6190 0.0443663590068500771069 +6191 0.0508429803566406851578 +6192 0.0393636586135835270239 +6193 0.0940311818431030510546 +6194 0.0520931854002086025535 +6195 0.0533338383033347129825 +6196 0.0769321882475489937647 +6197 0.0944972594560261752727 +6198 0.0517342602021853073535 +6199 0.1751924261823086248491 +6200 0.1423694575135426299006 +6201 0.1420791411554148953034 +6202 0.0153817904089535335832 +6203 0.1310048076352797741251 +6204 0.176221878856137825764 +6205 0.1759521443031886733799 +6206 0.1572249093078289250425 +6207 0.0192328457123714692545 +6208 0.0110845229267276437074 +6209 0.1555031389760291593571 +6210 0.1640402388743329442633 +6211 0.0209926543086005820693 +6212 0.0771875649122297780025 +6213 0.0523663002249369902152 +6214 0.0097513040774134963212 +6215 0.1908305010881104213372 +6216 0.0184069057293816014387 +6217 0.150601253697265902165 +6218 0.0368695510138907570075 +6219 0.1587112009736477424848 +6220 0.044846260299077081446 +6221 0.1454433793247525141812 +6222 0.0088949904238700659709 +6223 0.172101674087783790279 +6224 0.1038908160295760468861 +6225 0.1055068586341223746405 +6226 0.1557947573837801802377 +6227 0.1004915640270240134724 +6228 0.1361612292138245083883 +6229 0.0993627536069939254482 +6230 0.042744205162205985149 +6231 0.1652207475466213049131 +6232 0.2274285257510501589984 +6233 0.0598568504578215823675 +6234 0.1806969131964562824688 +6235 0.1881981714861281351769 +6236 
0.0044352883716031977698 +6237 0.2176083555869988417353 +6238 0.1901531434646659968202 +6239 0.0980542346861306562955 +6240 0.0060365266748622784312 +6241 0.0048951880037527697662 +6242 0.1765271054326795208134 +6243 0.0528413448343061711854 +6244 0.1821061706673147706503 +6245 0.105942314955177779856 +6246 0.2521888163938346139048 +6247 0.0422534293942315800074 +6248 0.107938585766097758234 +6249 0.0046848545519676950896 +6250 0.1035641429865597346049 +6251 0.1161284126275965306041 +6252 0.00327310320073544549 +6253 0.0593396563287599335124 +6254 0.019724250317755005063 +6255 0.0286076200565349124394 +6256 0.0802258798860453342661 +6257 0.0470173213738967962771 +6258 0.1531718274486312536542 +6259 0.0408230816876589477915 +6260 0.2375396729339876444875 +6261 0.1207793231812441386852 +6262 0.0004370583381552423487 +6263 0.0760538502885760997474 +6264 0.0146240092715582501587 +6265 0.2304024441904804210157 +6266 0.0419298628528689404371 +6267 0.0320977330183452591594 +6268 0.0614875404537305672581 +6269 0.0496022837765947924304 +6270 0.0109163756488231625946 +6271 0.0066997589295264871334 +6272 0.2534464026202942754651 +6273 0.0020058304165927165995 +6274 0.1308630402580706064697 +6275 0.3049154151600591000637 +6276 0.2198107279100149358264 +6277 0.0008338716591591484057 +6278 0.0030146336833849554648 +6279 0.0013804610641840987188 +6280 0.0025267648303612796554 +6281 0.0004841030532721159901 +6282 0.105308087642134867723 +6283 0.0026699008551142387573 +6284 0.1489604067713586055266 +6285 0.0425605964172856982008 +6286 0.0044355552241931153409 +6287 0.0006379996175175452243 +6288 0.3243289373623489679765 +6289 0.0621134705438644851849 +6290 0.2592451405983123291499 +6291 0.0382854387924869768445 +6292 0.1331422074010299161362 +6293 0.1633794769638247379451 +6294 0.1341406498618941123269 +6295 0.0981497563590763150154 +6296 0.1088568064684246605722 +6297 0.1516558662519525146894 +6298 0.0915535420407333627724 +6299 0.0609040902502688413778 +6300 
0.1048788166419790252037 +6301 0.1564477528185401755678 +6302 0.1248655940609579850786 +6303 0.2550315174746895507951 +6304 0.1569197323638602814544 +6305 0.0876481925414748230807 +6306 0.2770268297889962871849 +6307 0.1258961069272269694963 +6308 0.3030851929168323355412 +6309 0.2601311829809083708831 +6310 0.0551084597941223150452 +6311 0.0374756105387539437124 +6312 0.0629967330106774026088 +6313 0.1289793020434518155959 +6314 0.1190916485865916168985 +6315 0.0154744021047433649718 +6316 0.0175549280584057082466 +6317 0.0560844882715751877345 +6318 0.0019254890163910248855 +6319 0.1549307776753613141718 +6320 0.1069409963106876804151 +6321 0.0754123226072500041361 +6322 0.211643678764222575861 +6323 0.2007915977175363242413 +6324 0.1810436607348108184468 +6325 0.1723256462394775756497 +6326 0.1058521169557556790286 +6327 0.1777043272832208031797 +6328 0.1796190311652687554567 +6329 0.2045195810634459487876 +6330 0.073476640758224492922 +6331 0.0032936716566865842024 +6332 0.0643392371239562416152 +6333 0.1946722324037457874102 +6334 0.1177407733543508333574 +6335 0.0611617644328880036286 +6336 0.1534244607331993748289 +6337 0.0589190455438160301527 +6338 0.1092375863152018622415 +6339 0.0928199235724071342046 +6340 0.051899778285320059823 +6341 0.0168880047620067856584 +6342 0.0243259353053178993542 +6343 0.0289345461830358662014 +6344 0.094961228042639095337 +6345 0.2613967808474900733628 +6346 0.075116902720678363492 +6347 0.3270020207311737925338 +6348 0.1449182843426043976187 +6349 0.0872846586874060914596 +6350 0.0037444994769517177952 +6351 0.026417100540428322536 +6352 0.0717509136784424822464 +6353 0.033277627850679163124 +6354 0.0422604268369082061718 +6355 0.1072089222086634319187 +6356 0.0753936704631503057383 +6357 0.0585955297982415496127 +6358 0.0289252194288751769691 +6359 0.0124532640591521366613 +6360 0.0746233722795664883298 +6361 0.0253664147809342238604 +6362 0.2161939149897319500937 +6363 0.0136121618696277667249 +6364 
0.0618217469769654845435 +6365 0.0247212379988383246232 +6366 0.1593858224672120293963 +6367 0.0328364687480683925536 +6368 0.0355567653842557285238 +6369 0.0126267204211019459792 +6370 0.0308864633044824143937 +6371 0.0645721447325867531353 +6372 0.1524026728154118937031 +6373 0.1718579236012924282839 +6374 0.2195558528031251455115 +6375 0.1967300883550815171485 +6376 0.0169277276561853734504 +6377 0.0244940165858533662191 +6378 0.1664323513606872917414 +6379 0.0756742933102977632931 +6380 0.1600396846063824318751 +6381 0.1181182329249421125716 +6382 0.0187969571549861345916 +6383 0.097545782006820236365 +6384 0.0959817879752517810754 +6385 0.1062479410060110512903 +6386 0.1127941613877111587394 +6387 0.3361876147403016212856 +6388 0.0461683212375711624076 +6389 0.0787667620562482739821 +6390 0.0418296756464731217529 +6391 0.0659819471831287612806 +6392 0.0328907902670439247772 +6393 0.0076485028431454282066 +6394 0.0290653807174479882069 +6395 0.18273460195423996133 +6396 0.0040149625593208504437 +6397 0.1330240042814171130825 +6398 0.152984494624709449484 +6399 0.0633633636931140165061 +6400 0.0401105623979124900624 +6401 0.1282079156104489747747 +6402 0.0943482099532688134325 +6403 0.1750470381172105627243 +6404 0.1760412334928320721161 +6405 0.0325277490263606999799 +6406 0.0607962875616371092868 +6407 0.1189393566722013184656 +6408 0.1084198209895844700057 +6409 0.0014256860433202855204 +6410 0.0016928058381360953834 +6411 0.0021072138275991625193 +6412 0.2212216664496740037293 +6413 0.0835865832083859949808 +6414 0.113303989902651058852 +6415 0.146833601993499290872 +6416 0.1839710212348815154826 +6417 0.1943561773743082143895 +6418 0.1546832780850398436634 +6419 0.0064028516794422275088 +6420 0.1873537235563742797684 +6421 0.0510940870401953850521 +6422 0.0185673948928583090745 +6423 0.040880772678408949794 +6424 0.0358820181097003337856 +6425 0.1722868474569910213212 +6426 0.0115894141901494857061 +6427 0.0778221991898873921567 +6428 
0.1856878495604148138209 +6429 0.1817509563331964062005 +6430 0.0229558433893138069681 +6431 0.0017529210888475997999 +6432 0.2463369702336387634389 +6433 0.2325409856108833772659 +6434 0.0200671553350801126769 +6435 0.1170508567436589070221 +6436 0.2169706586433677308889 +6437 0.00330882858750440427 +6438 0.0014203263205975638715 +6439 0.0133527826296578194798 +6440 0.0806315873866006865844 +6441 0.1115736378740094236761 +6442 0.0577232788467951021816 +6443 0.0141970735134982276038 +6444 0.0337993966318123548187 +6445 0.0526156937376876868151 +6446 0.1209241182116551488468 +6447 0.1134981231418197705763 +6448 0.0089184344030667217496 +6449 0.0112207770672182252647 +6450 0.1376238293347259444843 +6451 0.0102628449412773652127 +6452 0.170930347430830548161 +6453 0.0775646902768458157418 +6454 0.1432040226238908142697 +6455 0.0939244955965613248505 +6456 0.1822103107604053640056 +6457 0.0774960514468463201876 +6458 0.1975869829041514424972 +6459 0.2312230046317520215648 +6460 0.1632292558826641537539 +6461 0.1419671426921173540414 +6462 0.2100138351270676162486 +6463 0.1529789630551116885737 +6464 0.0453053999206945862133 +6465 0.0268858410492001376202 +6466 0.248316896331502889872 +6467 0.0544292485774690373845 +6468 0.2347604763021020612968 +6469 0.0058314246623960815444 +6470 0.2455764323834590812101 +6471 0.1828541716681549300638 +6472 0.1962808919741362567724 +6473 0.0100622548944888055844 +6474 0.1961040172425834171577 +6475 0.061968307280927822922 +6476 0.1430471023723989265619 +6477 0.1742633385538314705343 +6478 0.3382889493696246696608 +6479 0.2344960361591318098728 +6480 0.1441732866101447241292 +6481 0.1501326168716988485041 +6482 0.2513312645926688571052 +6483 0.0754770745962791977934 +6484 0.1567426684733178587905 +6485 0.1502717757866961767466 +6486 0.0155931085605828835539 +6487 0.07327705558158632837 +6488 0.2123914209395511842882 +6489 0.07744710497744136668 +6490 0.0879486292494550542242 +6491 0.1956373599128413132142 +6492 0.2918272957616663276781 
+6493 0.2020924662808739058484 +6494 0.0644648743633148835208 +6495 0.2333128129938591421855 +6496 0.1243106202470819210415 +6497 0.1629046323595919609772 +6498 0.143001446726329750625 +6499 0.0973209406105584318158 +6500 0.2270152322766300911905 +6501 0.0906864251038341367961 +6502 0.014926826602081571041 +6503 0.0579113883175975466266 +6504 0.1671919920445425478128 +6505 0.0274870212272349187521 +6506 0.1391174638994850665252 +6507 0.1514077405247115648557 +6508 0.1238877749407523570779 +6509 0.0081461622042355158468 +6510 0.3031298924109740444699 +6511 0.0339814995947270159782 +6512 0.0768752967388360158862 +6513 0.1400509299534252183328 +6514 0.2008330079743787244251 +6515 0.0146575452094653783247 +6516 0.0740464479066580971711 +6517 0.0031839058004425806019 +6518 0.1715793368811031616161 +6519 0.4259925218584189599014 +6520 0.0012019814010135617365 +6521 0.3121015245599179088742 +6522 0.1478800805189307288057 +6523 0.1332125451089298528995 +6524 0.1664291354399049516211 +6525 0.1247265225313820696806 +6526 0.1656611924547137515429 +6527 0.0155570790979369273643 +6528 0.0538610971666079185738 +6529 0.0778490307852079815865 +6530 0.0378990613157343947393 +6531 0.0332528090563497871757 +6532 0.0007780302544443608477 +6533 0.1125657826733356542404 +6534 0.1060125905967394532858 +6535 0.1116922565846869136452 +6536 0.0901510058978196998636 +6537 0.0560579469406574937285 +6538 0.0763651484203855396293 +6539 0.0927597406825487746929 +6540 0.0642494663698310597422 +6541 0.0552101358689963336857 +6542 0.002706075325572701671 +6543 0.0093227874457351561643 +6544 0.0187344512947789504365 +6545 0.0047011063503942735267 +6546 0.1169280010531367486326 +6547 0.0259220372978774746264 +6548 0.111872757717569118463 +6549 0.0427303553233369451392 +6550 0.0784842603760878554375 +6551 0.1167177156265563758852 +6552 0.0160215548843743831042 +6553 0.0092318083595709608813 +6554 0.0003573812435783161397 +6555 0.0055546532712574606117 +6556 0.0025790744483065227571 +6557 
0.0076913330780070659348 +6558 0.0047782340873862025679 +6559 0.0078149093954853563609 +6560 0.0016934773080750119886 +6561 0.0105780405642019528778 +6562 0.0016477889475980062126 +6563 0.0124723913027463637754 +6564 0.0072504321838772417888 +6565 0.0590347835624781824837 +6566 0.1618474342785949093848 +6567 0.0913485240502112244565 +6568 0.1687872721674402443082 +6569 0.231267855814371842138 +6570 0.1351805884280661562702 +6571 0.2096155174387697717187 +6572 0.1400249156866700261936 +6573 0.0836863624483870638393 +6574 0.2929175523182862006522 +6575 0.1021729815861018159096 +6576 0.1982483575192470481863 +6577 0.1337407551542923733745 +6578 0.2383796858710598709497 +6579 0.1299797044941439772003 +6580 0.2418841112060074460821 +6581 0.1103784236654487688201 +6582 0.0488042154102427128137 +6583 0.2638320204460604112562 +6584 0.1938349150382421282046 +6585 0.0405432252309991594807 +6586 0.191198136070096447181 +6587 0.2472686893774702354687 +6588 0.0062880896944980130964 +6589 0.085002039972992909922 +6590 0.080267709967796058157 +6591 0.1052575269809458285986 +6592 0.2113952098197312834404 +6593 0.0229164175689986863993 +6594 0.0147211887037421718105 +6595 0.009904756756880288282 +6596 0.0575798050689702162197 +6597 0.0066535314083409650629 +6598 0.0391114504109769528517 +6599 0.2385685398847353644314 +6600 0.0021623020104250384975 +6601 0.20111743421193384318 +6602 0.1344619662993360997838 +6603 0.1599341521839720281495 +6604 0.1262300833071652861328 +6605 0.1035555844159948973848 +6606 0.0806308666423979913951 +6607 0.1568447003675909334763 +6608 0.0784819794481953630916 +6609 0.1833607809646756736655 +6610 0.1800381821608825483327 +6611 0.2513794308214428174786 +6612 0.0532385393875324242074 +6613 0.2175761037681273046829 +6614 0.0785373083700609542213 +6615 0.2218983638726246265804 +6616 0.0146027572551968801845 +6617 0.1592673657950800236716 +6618 0.0161105605666831651346 +6619 0.1105039010146763450715 +6620 0.0470067971813964563532 +6621 
0.0928306192363959614688 +6622 0.2747694111181017695422 +6623 0.0006084760256480669424 +6624 0.0194277618284215726485 +6625 0.142978693781330434831 +6626 0.2345146423255030365684 +6627 0.0445006476976525303102 +6628 0.0020238727154076661778 +6629 0.0643975825069648816212 +6630 0.0172780131125405897463 +6631 0.2405267163705158972586 +6632 0.2901120313801416883415 +6633 0.1321988091619106520103 +6634 0.152182335319851314015 +6635 0.0828119545625734809757 +6636 0.1603656006800869282536 +6637 0.2699740352373932728014 +6638 0.0921908474671449401638 +6639 0.1455946146380769634643 +6640 0.154068942279474790924 +6641 0.0144710721720221358738 +6642 0.0845030869880356722001 +6643 0.155544596007783197944 +6644 0.167626673686813165709 +6645 0.1085092235051271364332 +6646 0.0699938987953007824006 +6647 0.0783382934164532224175 +6648 0.1309734190294808697796 +6649 0.0377287717196675975728 +6650 0.0530909110531919969933 +6651 0.0985234202450770979453 +6652 0.0637452235638477776591 +6653 0.0057830382912479232418 +6654 0.0627284599743241078063 +6655 0.1098289124867148774944 +6656 0.0679260941048707989065 +6657 0.1528100544192936005583 +6658 0.2590230714792907118493 +6659 0.2158205270235448447469 +6660 0.0380230688616467471519 +6661 0.1854040071835148795198 +6662 0.0700865410449131170934 +6663 0.0080966374774279672305 +6664 0.0819085339818140800716 +6665 0.1643972227945900976831 +6666 0.1128102238049530636754 +6667 0.0095615217349317403739 +6668 0.0636390588320896871677 +6669 0.0633921249947407561276 +6670 0.0043417479665295594798 +6671 0.0355415744190362009602 +6672 0.1593972646883441979249 +6673 0.098255536751548494534 +6674 0.1373198734894149619645 +6675 0.0977226270649931144652 +6676 0.0695787285351593953742 +6677 0.0779879548470889388811 +6678 0.0670875396779267718639 +6679 0.2704568204220754723544 +6680 0.1377386087526683844384 +6681 0.1022937274407929308939 +6682 0.0311144358608542968836 +6683 0.0731817911726609093837 +6684 0.0589520827218428106198 +6685 
0.2011129446264857401072 +6686 0.0383547906675025182532 +6687 0.0321033686145350410923 +6688 0.1713380264968447919127 +6689 0.0846086469384531164595 +6690 0.1807589413361541108571 +6691 0.0221693352946342667198 +6692 0.0208347551481722001454 +6693 0.0240934504283759828858 +6694 0.0153572316031732585689 +6695 0.0277854635433716391779 +6696 0.0020801100043890221637 +6697 0.3492369919867175931394 +6698 0.1452425077034987521696 +6699 0.0089849112138200310645 +6700 0.1523289063635125117901 +6701 0.052800947578179394104 +6702 0.0411862563690862867882 +6703 0.0182263569802415190124 +6704 0.0430903870083061343865 +6705 0.0397691376477106867116 +6706 0.16871050332315018494 +6707 0.1666145421273275140095 +6708 0.0963370390162116563282 +6709 0.0886069652998733442439 +6710 0.1211365626310347554107 +6711 0.0629727963842893062596 +6712 0.1430729169714009818559 +6713 0.1244965766995674155693 +6714 0.1577969746935194150783 +6715 0.1989065135256129768226 +6716 0.0309010384621324768473 +6717 0.05632210843643319087 +6718 0.1298436316627563924531 +6719 0.1007702729703717908771 +6720 0.0982338341261885378275 +6721 0.1281385517461155199026 +6722 0.1880897267215397294926 +6723 0.0365466742355859061653 +6724 0.0754722569572871815335 +6725 0.0841889159941402592802 +6726 0.0195089208397466358502 +6727 0.0016971412285932561872 +6728 0.0028285724667345196401 +6729 0.1933535939294830086066 +6730 0.2541380466688484274229 +6731 0.2011818961787693316179 +6732 0.1669331530956693343537 +6733 0.2026008381090057819396 +6734 0.039539852056375557332 +6735 0.1635688658217728519428 +6736 0.1065264783126263353763 +6737 0.2484066510333744759453 +6738 0.1961554342746598678104 +6739 0.1276055973352040462387 +6740 0.2009899437652503395579 +6741 0.147654569008270231123 +6742 0.1808237595548227716336 +6743 0.0403322919305992289019 +6744 0.1172038119458745919488 +6745 0.0080151765125256388411 +6746 0.0089004048919599579315 +6747 0.3334038204546406758411 +6748 0.125496410128114521676 +6749 
0.0076633399819015131371 +6750 0.0035386466574901537935 +6751 0.1018501104185967204296 +6752 0.0057801378527072497238 +6753 0.0650818420357490573513 +6754 0.0008497114676432241222 +6755 0.0046666212455603057335 +6756 0.1556894197136663160475 +6757 0.1878028550585393585681 +6758 0.1487186176554803340721 +6759 0.1346767510005601187206 +6760 0.1644080989188584085436 +6761 0.0781790196185719876709 +6762 0.1631180934951137473377 +6763 0.1412726303874177191666 +6764 0.1733585218816308115564 +6765 0.2218110367036285890396 +6766 0.2075389454044561932111 +6767 0.1108445892032157015228 +6768 0.2076529269764074703275 +6769 0.044656649848116229673 +6770 0.1883631157615723028531 +6771 0.1808678619434770928898 +6772 0.0569723552220503248744 +6773 0.0101714159417192803042 +6774 0.0226384719781705275043 +6775 0.1879156638988419003589 +6776 0.2993860629798745232044 +6777 0.1389991564885895702908 +6778 0.0430276856452954034604 +6779 0.112859545145672021671 +6780 0.0529795022191820036417 +6781 0.1640109692670611574172 +6782 0.100344205729622160117 +6783 0.0094981439906533602496 +6784 0.122959790870989668643 +6785 0.2778951934228080511424 +6786 0.1023832545104687991033 +6787 0.3025663950006142188798 +6788 0.1352882082254931173093 +6789 0.3227007892881398887219 +6790 0.1052958884889646784533 +6791 0.0972176838427383921415 +6792 0.156100559810011030315 +6793 0.0481445401492375732455 +6794 0.0013718107980117484786 +6795 0.0128680468704290631998 +6796 0.0309661138611430492018 +6797 0.006684108855071174693 +6798 0.000984537242038703922 +6799 0.1135881860351443672918 +6800 0.1513929182213579471838 +6801 0.2569491346266051245983 +6802 0.0050905673255816699663 +6803 0.0063943772078976848783 +6804 0.117356655811498747366 +6805 0.0350952882170933455619 +6806 0.2900025481639444957516 +6807 0.0692343011238866767876 +6808 0.2523159286851539562235 +6809 0.2496677220096156224471 +6810 0.2012493998038021036923 +6811 0.1583996455552382776055 +6812 0.1867794452411207783982 +6813 
0.0529268139747673255213 +6814 0.1935760070902865226383 +6815 0.2016651651852214843785 +6816 0.0017556706029552373617 +6817 0.0433280042808951507127 +6818 0.1163817683604457808855 +6819 0.0635495352051272527349 +6820 0.0176796713928052164067 +6821 0.0453557643948531291622 +6822 0.0684199684061418539338 +6823 0.2237626806521066102906 +6824 0.0538234300796507220133 +6825 0.2345819857491906745839 +6826 0.2460577873491113509719 +6827 0.1898615368311965811987 +6828 0.1054419718757311635882 +6829 0.1797130567260208922065 +6830 0.1315900125117239749528 +6831 0.0078964630728857680003 +6832 0.005440934282805278896 +6833 0.0192021293107217140561 +6834 0.0127548197737724598733 +6835 0.008672316351509926427 +6836 0.1864092347843764874149 +6837 0.1119635860876671829001 +6838 0.2226387592757784195108 +6839 0.096755231262250537827 +6840 0.2347136623971006474942 +6841 0.0686399298647391992167 +6842 0.1962262182713534197642 +6843 0.1770465253562533147758 +6844 0.2703814752592825221278 +6845 0.0717311294078946254382 +6846 0.0029305537129735289778 +6847 0.3114144634416235457586 +6848 0.0274850796799370085399 +6849 0.1064122480851530938573 +6850 0.0947433231268478143194 +6851 0.1686200515282713918719 +6852 0.2206228367851677807376 +6853 0.1790472451332134340429 +6854 0.2632762999805194348291 +6855 0.1951058881724699045623 +6856 0.083031473801594132822 +6857 0.1849233675777781638061 +6858 0.0094798702890740525057 +6859 0.0054994278433077919618 +6860 0.1701197229177777159315 +6861 0.0051173419076946972603 +6862 0.1700851451395791469334 +6863 0.1988932460939051849458 +6864 0.2010244261149613520523 +6865 0.0124057297169169588463 +6866 0.1569698779139733213484 +6867 0.2362758628302682739619 +6868 0.2317158069679461085411 +6869 0.1627073448151767354197 +6870 0.2335677444742602915095 +6871 0.1427087025734083414186 +6872 0.1986725081541958759512 +6873 0.0131124922475195828525 +6874 0.095818608983381153843 +6875 0.1637867629990558404618 +6876 0.1937075387490455047335 +6877 
0.2924025074657299327896 +6878 0.161219753581070684989 +6879 0.2760547948819426222755 +6880 0.1978892454650732213306 +6881 0.261552722717613539416 +6882 0.1904137728238812499182 +6883 0.1231617473896098463593 +6884 0.2444819860221792706678 +6885 0.1649224586009477577786 +6886 0.2993844718022266038204 +6887 0.1589172490980094443191 +6888 0.0572690396759494291246 +6889 0.2563651439748567129051 +6890 0.2375161973902255030389 +6891 0.0634716427027591234555 +6892 0.1434013864963223638949 +6893 0.000463778490719803419 +6894 0.0164123756871605748497 +6895 0.0014865029326131464372 +6896 0.248022767742687277881 +6897 0.1378815324833213007416 +6898 0.2130387800023814393047 +6899 0.095191231430643813427 +6900 0.1566901800781311526745 +6901 0.0051877082447453369204 +6902 0.1610703526932819318329 +6903 0.1636579276364559243362 +6904 0.1369380573555604163616 +6905 0.1575401041941232038734 +6906 0.1494367112324106738885 +6907 0.2589469951928975444133 +6908 0.001950691590961720805 +6909 0.1899489364589395656857 +6910 0.0592046565065425592356 +6911 0.177264500584843387454 +6912 0.0407910612813292619561 +6913 0.2278227570515837518972 +6914 0.0100782146640880660576 +6915 0.0012354626777714822726 +6916 0.0998892835473744189478 +6917 0.2441776878559239749578 +6918 0.0313865200789487930333 +6919 0.2621063787362843955364 +6920 0.2849837101865418942381 +6921 0.0035032403911178450553 +6922 0.0471009977096594756318 +6923 0.1520878444428041265102 +6924 0.1939103286375115087203 +6925 0.2304024187440255333481 +6926 0.1856199403081565946216 +6927 0.1187418635046747217299 +6928 0.1931618754398942150452 +6929 0.1294829965009825223365 +6930 0.0012646713878231537805 +6931 0.2787634455667954980207 +6932 0.0040973528063732471069 +6933 0.0462208811666441829469 +6934 0.1653041128316941887899 +6935 0.0027116223368395192886 +6936 0.0258468627703985799615 +6937 0.1837486258918213377633 +6938 0.2119523202376066561836 +6939 0.1935111661971808671989 +6940 0.0079442638736577624731 +6941 
0.0125055145722040399192 +6942 0.0037186184389527551439 +6943 0.0041158757026474764029 +6944 0.1037712937992245759711 +6945 0.0081261938794545598291 +6946 0.0077040706989230526083 +6947 0.0782039820019537895712 +6948 0.1857239687531499761075 +6949 0.1222818277210285936585 +6950 0.031695405387718464385 +6951 0.0228411190039118951189 +6952 0.0085540816903471351973 +6953 0.0081262814651944601657 +6954 0.0075541458763101424628 +6955 0.003945766534540177603 +6956 0.0058586346236182392302 +6957 0.015156852138865892915 +6958 0.2692444845391226970754 +6959 0.1725980213015269326693 +6960 0.2105697021087762077496 +6961 0.1194547524288085416089 +6962 0.2468881232323845986709 +6963 0.1795787850988713185707 +6964 0.0224941371213345335522 +6965 0.2320508017026192137955 +6966 0.1609619766538062102246 +6967 0.1093502164221428207824 +6968 0.2572508682678579794434 +6969 0.1237001602371263359048 +6970 0.1582015481839995629798 +6971 0.0990371509229996671575 +6972 0.0013789293520350496408 +6973 0.174434280408294895004 +6974 0.1466813080751296349824 +6975 0.0479496655249919412345 +6976 0.0107842625368337313024 +6977 0.0093123248097053221844 +6978 0.0054087754508379669999 +6979 0.008011905827938510874 +6980 0.1154252072228778430718 +6981 0.1033834897460963464511 +6982 0.0235538300716771724364 +6983 0.0255570377700515896646 +6984 0.1210179486262552145392 +6985 0.0306737017473436394954 +6986 0.1676375455589426133329 +6987 0.1702285451552505157746 +6988 0.1961482708715489298168 +6989 0.0094377157484646853614 +6990 0.2547414885818424790109 +6991 0.270631235224775668069 +6992 0.1660346920806677062377 +6993 0.1277660008452711770666 +6994 0.0343758704946244414424 +6995 0.0779766620655987507948 +6996 0.0537584873408924537874 +6997 0.1031439530254232295192 +6998 0.1877667999150261224361 +6999 0.1305221012484454046199 +7000 0.0262844802619938561206 +7001 0.1412701693053333984462 +7002 0.0598441558752462951509 +7003 0.016373620501336335481 +7004 0.0010090702568582566017 +7005 
0.0601727392588179768596 +7006 0.0800446795115851844615 +7007 0.134016019412936782107 +7008 0.1324747349989361389078 +7009 0.0254168824710101423636 +7010 0.2281315417027489633739 +7011 0.2552459887898338442014 +7012 0.1702657779524869297028 +7013 0.1431894923019999266156 +7014 0.0608342908279927915838 +7015 0.0446474301225592423714 +7016 0.0004473408334054415104 +7017 0.1696723447920359950647 +7018 0.1614123479663291949038 +7019 0.1943984094797673867205 +7020 0.1026104307698111894886 +7021 0.0713508353594766997485 +7022 0.0044486308028851059143 +7023 0.0564886379508027711149 +7024 0.058762507135387845314 +7025 0.2174741226753020284512 +7026 0.0734884283629053253195 +7027 0.0025443377055439073998 +7028 0.0495985564932890921064 +7029 0.1708240673688136979091 +7030 0.0974737514871325261279 +7031 0.1797438774013863926893 +7032 0.0349863994354448690216 +7033 0.043336047588123011165 +7034 0.0006499093024782119123 +7035 0.1148717524474717616156 +7036 0.0397986056350105241175 +7037 0.2024538258658122591438 +7038 0.1409106084531014935202 +7039 0.2474237066960955633288 +7040 0.2418268766807698810783 +7041 0.0112274254555262721633 +7042 0.0510759685055730897063 +7043 0.0032044567091270687552 +7044 0.1164074770774476408342 +7045 0.008922784670801577056 +7046 0.1388085290457064802094 +7047 0.002431652960148449321 +7048 0.0100585832865899733879 +7049 0.0227037642209334117327 +7050 0.0023525478413698435592 +7051 0.0401068117952581848495 +7052 0.1679931017362037426466 +7053 0.099184482398131754155 +7054 0.0869947130487890851747 +7055 0.149306360717943925609 +7056 0.1400249108958426524918 +7057 0.0041211983905861193991 +7058 0.0706991483816256927053 +7059 0.115319483186310245415 +7060 0.2513124854242952399552 +7061 0.0288690299220167173599 +7062 0.1825886435003297447732 +7063 0.0408353710226772442793 +7064 0.2112380201986803207603 +7065 0.027463429399787932933 +7066 0.1312491160306407111946 +7067 0.0287177393231401213713 +7068 0.0211015679569402578308 +7069 0.1285183661897205675828 
+7070 0.0241641304517487484627 +7071 0.0425494702431178026258 +7072 0.0568167790348067872075 +7073 0.2377412367398576698019 +7074 0.0628037566824229676055 +7075 0.0190240516106209255065 +7076 0.0400343594994245835506 +7077 0.0627083755436964873642 +7078 0.0192295636211119898296 +7079 0.0050889876575943016057 +7080 0.1281742302669431898821 +7081 0.0389689709566808512031 +7082 0.005349280300847131589 +7083 0.1174804355710783038136 +7084 0.2810972670350996294175 +7085 0.1652880301844797672484 +7086 0.0607263563490851449034 +7087 0.2047446976513243588336 +7088 0.0210437411517892317137 +7089 0.0237265869944407042924 +7090 0.1292638542538473156451 +7091 0.1821315917862811883943 +7092 0.0024122010906484328725 +7093 0.0320911155746763793184 +7094 0.0653809404820665934777 +7095 0.3067621932478507562614 +7096 0.0882976223647601032862 +7097 0.0405455169898889106195 +7098 0.1006761568741657286719 +7099 0.0222135174664889502305 +7100 0.0835649923383346565631 +7101 0.1471352405598633783512 +7102 0.0232352505895321755081 +7103 0.2044526228448531601778 +7104 0.0308833322998154358174 +7105 0.0204098275282929182695 +7106 0.130431311444891284701 +7107 0.0390289578731146322155 +7108 0.0341997715489326462035 +7109 0.0790190803598364438498 +7110 0.1531101161540770472236 +7111 0.2845955420590504103551 +7112 0.010268162336396158274 +7113 0.1520672448311609636917 +7114 0.0896552936671122929768 +7115 0.0327616196962487241096 +7116 0.076022176452572948202 +7117 0.2267611884457592652531 +7118 0.3226643096491740858589 +7119 0.1487387959984254048873 +7120 0.2599492635301189347174 +7121 0.2925745105389020439546 +7122 0.0141523649556834860835 +7123 0.012302611697058469728 +7124 0.1920775679674741887037 +7125 0.1073070905365919452956 +7126 0.0806147523041901292729 +7127 0.1825678886931261513116 +7128 0.1796800922479328821435 +7129 0.2517008004299816081861 +7130 0.2956920526852048203281 +7131 0.2330077441291455442318 +7132 0.1776070404737491448266 +7133 0.1013108005119411347827 +7134 
0.1625119124691619676426 +7135 0.2032908486140322656954 +7136 0.1270912873514711827205 +7137 0.160184020895032003251 +7138 0.0121463918097770001009 +7139 0.001144594223953125222 +7140 0.106800212182768208824 +7141 0.252376519603120241797 +7142 0.0386144925919582882035 +7143 0.0585522152850152174941 +7144 0.1163637823054508718679 +7145 0.0736958484652108086754 +7146 0.2597463912235527527805 +7147 0.0566554735414096233281 +7148 0.038421443639089968014 +7149 0.0294694976472139082146 +7150 0.0903287287942549921294 +7151 0.0271082961552342113776 +7152 0.1725579838513911778985 +7153 0.0365182054548066140076 +7154 0.0462382341852170600238 +7155 0.0153021154350804766686 +7156 0.0688331532599636419789 +7157 0.1836455995405524288522 +7158 0.0388296929041793761894 +7159 0.0380090567390104633327 +7160 0.1806867774482911193079 +7161 0.5143091690219215861291 +7162 0.153484442068992477104 +7163 0.1461929300072654924758 +7164 0.0868459276201993257693 +7165 0.1484975363409632154177 +7166 0.1694156411637361692168 +7167 0.1794808666283979614597 +7168 0.2919718668648719206438 +7169 0.0024774412691419070431 +7170 0.1151924983036917843338 +7171 0.0343534331763539063642 +7172 0.1332648821599643207403 +7173 0.1424134258510840189693 +7174 0.2051807544472373789901 +7175 0.3573527106168091771465 +7176 0.0504163050066006035443 +7177 0.0528730518209758679204 +7178 0.0422383164553685047649 +7179 0.1358339703819765598158 +7180 0.1745780699896406218397 +7181 0.0243682883726829800641 +7182 0.0159991076520231145786 +7183 0.197014494267453821319 +7184 0.1797956631160479157305 +7185 0.0924955143882882357964 +7186 0.0485494827412580251536 +7187 0.1502465778620536429599 +7188 0.0050688341852603872525 +7189 0.2788732320019115529242 +7190 0.1261731825454097399852 +7191 0.2602967062724941271767 +7192 0.0193800617740809734413 +7193 0.1844219816324207950053 +7194 0.0231547262422769940804 +7195 0.0663944604589678522011 +7196 0.0698691318359255797432 +7197 0.0355906026508230124983 +7198 
0.0877733691830360762687 +7199 0.0046483159818050132431 +7200 0.0193510587348805565555 +7201 0.0704511707180515700522 +7202 0.1685335257908435191432 +7203 0.0648069087426485224146 +7204 0.1548125933800301723409 +7205 0.0332617505834901516781 +7206 0.0039500204260573363799 +7207 0.0923614814238988279804 +7208 0.1317685662150410441651 +7209 0.022573642352775979969 +7210 0.187171064610180237775 +7211 0.0021847163879245160359 +7212 0.0029484223134867149074 +7213 0.0117211867543335051395 +7214 0.0256153477367695864286 +7215 0.0302601892879705500206 +7216 0.0080743576731721698053 +7217 0.1615430057815815112487 +7218 0.2132745581066337881015 +7219 0.0990207450198271055264 +7220 0.0082770682584584111957 +7221 0.2457828709494123409218 +7222 0.1496947903196798879133 +7223 0.1709611291053572401655 +7224 0.0729557172210548859903 +7225 0.1638225432022411276467 +7226 0.2367647419907275563311 +7227 0.1071743008484072057218 +7228 0.1453173722973346015941 +7229 0.0540652712779035940494 +7230 0.0676581230308455211508 +7231 0.0220501712802128602142 +7232 0.1641008729062123361331 +7233 0.0030408974998264088653 +7234 0.1592706668940992442707 +7235 0.2705598520150543362384 +7236 0.0169708041488108481165 +7237 0.0240243016277104197431 +7238 0.0320370348504520202626 +7239 0.0452130887385100641018 +7240 0.0021732078869530794749 +7241 0.1265452695915671443849 +7242 0.2215662817999634881527 +7243 0.2121824731028696675583 +7244 0.1670019111011586676607 +7245 0.0248070535250153947904 +7246 0.009026503724437041562 +7247 0.1977391426567640386747 +7248 0.0228144461506976717335 +7249 0.007957308999043375547 +7250 0.1364840801595860186879 +7251 0.0688081437601193723896 +7252 0.1597928569777959662002 +7253 0.1015986575658400364874 +7254 0.1329819487918781073521 +7255 0.1964642021308921349121 +7256 0.2039842528007781985444 +7257 0.2629423632225511564542 +7258 0.2007783977141902598262 +7259 0.095200631749494293854 +7260 0.108485014730842807551 +7261 0.028144930487013341408 +7262 
0.1327881162535629877564 +7263 0.1679197762273670901934 +7264 0.2246221853101287024845 +7265 0.2713118969980199413072 +7266 0.2090316112445549379117 +7267 0.2397132919934392980554 +7268 0.2746121467819490624684 +7269 0.1511887208139442440036 +7270 0.1095349141223087990893 +7271 0.0712154595608623791625 +7272 0.2379242297371115255622 +7273 0.1810676195783399999861 +7274 0.1594040602936583339577 +7275 0.1312999557418670915077 +7276 0.1684155434400857465072 +7277 0.0391534819185887297821 +7278 0.2078745155227762764216 +7279 0.1293171125017410771907 +7280 0.0881371477654050300421 +7281 0.1763619859423152524958 +7282 0.083382289319535965455 +7283 0.1754476970497246823832 +7284 0.3075028891840088096643 +7285 0.0103038635172453679756 +7286 0.0643802460126821907149 +7287 0.0027550168782264085852 +7288 0.0009355258105007295587 +7289 0.0008918928235668504905 +7290 0.0159396823716495654455 +7291 0.012910591478893810799 +7292 0.0210210157332206444558 +7293 0.2182765365716880789915 +7294 0.0208047962208791070715 +7295 0.1451002301958850271291 +7296 0.0864985025931018919332 +7297 0.0099131726489990884543 +7298 0.010429573155157131209 +7299 0.0825857954835385876802 +7300 0.1003814392911926978158 +7301 0.1528444860692270701463 +7302 0.0750386724888685158197 +7303 0.0998886283685835318913 +7304 0.0113362414845314084244 +7305 0.0723186454183154525666 +7306 0.0831716350784509617089 +7307 0.1588679122700848989069 +7308 0.168476266936828211751 +7309 0.0888138591913109898268 +7310 0.1018119494098196214926 +7311 0.0107643145854148143681 +7312 0.0153263168283198671082 +7313 0.2541441745111340799745 +7314 0.0061358492420292148894 +7315 0.2270170435981554157845 +7316 0.0776560672016631398096 +7317 0.1633060635433541962769 +7318 0.0066930628909506505522 +7319 0.0120806157542278549272 +7320 0.0408741593237312225706 +7321 0.0324023817549913828451 +7322 0.0330529436913397714481 +7323 0.0255128944990470653276 +7324 0.0979837073593159968432 +7325 0.0014880203241825773215 +7326 
0.008936603504557645225 +7327 0.0851505790808850526741 +7328 0.0046622059201657890457 +7329 0.1345557176558048184667 +7330 0.2029168331355349075285 +7331 0.2500354429769177655452 +7332 0.1204468837782945306314 +7333 0.2457680283403710363999 +7334 0.2455424801713388671676 +7335 0.0169626290343916213965 +7336 0.0036448327014053163997 +7337 0.2042108072169066812496 +7338 0.1089552727300526074883 +7339 0.2760136376805323310535 +7340 0.1169449190237073499254 +7341 0.2051060711846235673494 +7342 0.0689298080300324578218 +7343 0.3667420197781269819437 +7344 0.068420339989148704074 +7345 0.1301486917787081643016 +7346 0.1897918684218319884671 +7347 0.2004858099288919504044 +7348 0.0910719076722397991031 +7349 0.0965409177002838209125 +7350 0.2842204721827849556171 +7351 0.0634152136780540137684 +7352 0.0188413358044451570428 +7353 0.0379061548153594649047 +7354 0.0894135476404651630133 +7355 0.1250269501198652821738 +7356 0.0397688538845953690193 +7357 0.0427009848586788337554 +7358 0.0119786132116376856094 +7359 0.0544910253902383470703 +7360 0.0833276108553751421182 +7361 0.0197290084566971342384 +7362 0.0221942047407584289176 +7363 0.1603135182592677621116 +7364 0.2325320611763350697743 +7365 0.1536973064061433469352 +7366 0.0205275857709079634361 +7367 0.0199238450735758076959 +7368 0.0029153904790262677796 +7369 0.1493792807281243972817 +7370 0.1325969410022837247087 +7371 0.0978529389547513755998 +7372 0.0875060456389647467468 +7373 0.1523687819497749618147 +7374 0.150076065613148151856 +7375 0.0956538978461537736653 +7376 0.1360492455855410176557 +7377 0.0394190110612618119235 +7378 0.0089236029890490259353 +7379 0.1126894800000600022072 +7380 0.1086366168972305173979 +7381 0.1342931000383379702701 +7382 0.0073617651609381596495 +7383 0.0123197414623424112562 +7384 0.0174708092814022490635 +7385 0.0087175079185053041403 +7386 0.0043522340240265107794 +7387 0.0376789039899925876087 +7388 0.0062014275224099163486 +7389 0.0045939544934346585037 +7390 0 +7391 0 +7392 
0.0140361672187343348994 +7393 0.2024739604282171367977 +7394 0.020983667623062762092 +7395 0.0100320900811012164722 +7396 0.2004206882096322972497 +7397 0.0024734223349860799679 +7398 0.0487358737428732549657 +7399 0.1879472730653228562669 +7400 0.231883497772862351427 +7401 0.1570680273851374209038 +7402 0.157269129828911891078 +7403 0.1303467836399435364481 +7404 0.02152617849669493863 +7405 0.0911067675292501072404 +7406 0.0636121472097137680146 +7407 0.036213674645305046762 +7408 0.1641052122832067916569 +7409 0.0830496762361105228534 +7410 0.178296098349467163624 +7411 0.0763066348852451115103 +7412 0.0357087797559920080981 +7413 0.1408556470198993326726 +7414 0.00954162441811895988 +7415 0.007419778481662316072 +7416 0.000516872071704517033 +7417 0.1120924644774005385361 +7418 0.1084045852092262590682 +7419 0.1264663562238993732567 +7420 0.3408171837325087438764 +7421 0.1855073654107717295858 +7422 0.1716342735522776996593 +7423 0.1454489354809909151811 +7424 0.2593839291116726841757 +7425 0.1644513968936923420383 +7426 0.1555308010062710910582 +7427 0.1782260793099902373982 +7428 0.1021501534313563608336 +7429 0.0039172955613703359587 +7430 0.0597944155242512778714 +7431 0.1081976872551827090208 +7432 0.026018143423227280131 +7433 0.2581542084805921177804 +7434 0.1611691132231049261581 +7435 0.2700567602383247001185 +7436 0.1357601444964054160369 +7437 0.359994423241028482785 +7438 0.2530025717718072386653 +7439 0.2394023925076095016262 +7440 0.2490933918944291114439 +7441 0.2197865226061767318644 +7442 0.133841066020050358798 +7443 0.1155551806022031696708 +7444 0.0475865607157361300561 +7445 0.0725008206954955852774 +7446 0.1851026671023259240201 +7447 0.0413895504998437954725 +7448 0.2279823068740435831891 +7449 0.0021367355560359515068 +7450 0.0233184548354764716038 +7451 0.2236590581605667482634 +7452 0.1019729117240492227703 +7453 0.1077157546511324226479 +7454 0.0977431454192145277027 +7455 0.1796478567168700579693 +7456 0.0342361383365885826868 
+7457 0.1496340127219581150886 +7458 0.2091725035878120309007 +7459 0.0894011303725937706011 +7460 0.2331742687296593785629 +7461 0.2246807808757096958097 +7462 0.0040056386840906952826 +7463 0.1361467128423534955761 +7464 0.0248928644527552922483 +7465 0.1892588620444993763314 +7466 0.0335657516495644278609 +7467 0.2124398535023727663251 +7468 0.028185879959748118273 +7469 0.2041411411738190961884 +7470 0.2082992871052778693919 +7471 0.1626405028389218965224 +7472 0.1620803340857391861007 +7473 0.214591620399927907048 +7474 0.1305925357045854551252 +7475 0.0509191178573222755221 +7476 0.0898400481311864562706 +7477 0.2491019780732322830286 +7478 0.1711576599254318598042 +7479 0.2210075205627017869148 +7480 0.0976295625823682261535 +7481 0.0147154998114998932651 +7482 0.0672219035846862433825 +7483 0.0285243201598884901782 +7484 0.1170811508578943477277 +7485 0.004695641950287078592 +7486 0.0084206824086788586298 +7487 0.0506581410321903305438 +7488 0.2128519071058876110936 +7489 0.1739591799911736069717 +7490 0.1644540487855498112069 +7491 0.112563854742572078127 +7492 0.2032551791392386841828 +7493 0.2139209483220697316508 +7494 0.1404208309759610584511 +7495 0.0865397560181712238725 +7496 0.0982895756009059679004 +7497 0.1898073256908553441136 +7498 0.1294489971339391198857 +7499 0.1821261591201197849177 +7500 0.0529150996747691712563 +7501 0.0121191699640577285613 +7502 0.1271590457836973275807 +7503 0.1498586562665057686505 +7504 0.0173929704562815271029 +7505 0.2126850861023679983841 +7506 0.2585907746581592170365 +7507 0.1253979786096073301138 +7508 0.0178951657260589674925 +7509 0.0123104624306440337683 +7510 0.0008581594358516511515 +7511 0.1239144831426130799468 +7512 0.1507268817153783746488 +7513 0.1605654679331280920707 +7514 0.1347895905090470414223 +7515 0.241636790612875101969 +7516 0.1571054400470403289436 +7517 0.1234432851756936261323 +7518 0.1305447220777536110337 +7519 0.0494948032241300864276 +7520 0.1956180496514848543566 +7521 
0.0675022255899696971282 +7522 0.0749375249280158500786 +7523 0.1715029259705939945757 +7524 0.1576569782236373340467 +7525 0.0018660387529547952865 +7526 0.2001454364775369920704 +7527 0.011479737088007306578 +7528 0.2060020260813822390311 +7529 0.1623967678817888338205 +7530 0.1307445360816844592833 +7531 0.1242313136777665610877 +7532 0.1759244884245406703727 +7533 0.229674435477867755484 +7534 0.1644172705238240994596 +7535 0.326494762904196600406 +7536 0.1644765396531465417862 +7537 0.0351222391274917458692 +7538 0.2363156145524553941595 +7539 0.2614356221992091833251 +7540 0.0089934822453611989318 +7541 0.0088866598064387108263 +7542 0.0826156682013749199545 +7543 0.0288801105647755115835 +7544 0.0284857084358081553976 +7545 0.008567218744618744844 +7546 0.0364284947934817096571 +7547 0.0050515754657117157081 +7548 0.1365111119692143104576 +7549 0.0129991297935365835575 +7550 0.0493669687670928170053 +7551 0.0288595298746761701192 +7552 0.0174148972619346392721 +7553 0.0237827101357744596921 +7554 0.0659454668480743272374 +7555 0.012424539704524967948 +7556 0.0414716519477870901311 +7557 0.0060982613768865634105 +7558 0.0019749918060059895412 +7559 0.0860617914388667609726 +7560 0.1281906385788028901462 +7561 0.1298371235252915867342 +7562 0.0335467909990141074927 +7563 0.0820372120532534276904 +7564 0.0348732743753044499568 +7565 0.0021606248252282011894 +7566 0.0388932550412012453811 +7567 0.1109430098787142215944 +7568 0.0522316947129023156915 +7569 0.0485855952187017325894 +7570 0.0796917459147115925244 +7571 0.0114459016862849122242 +7572 0.0276120799972513035481 +7573 0.0019579738036567995337 +7574 0.0052486942925796505402 +7575 0.0470919623960520949968 +7576 0.1146806603889408970876 +7577 0.0029090853961013874232 +7578 0.0186946383228683392363 +7579 0.0595996516914124099817 +7580 0.0078565825482313500494 +7581 0.0035002802102875626815 +7582 0.0016358206211369095539 +7583 0.0027522982008290749686 +7584 0.0680978072857042343591 +7585 
0.0040064658645709971124 +7586 0.0072555440385493694816 +7587 0.0228064910318178902526 +7588 0.0046946848720051618381 +7589 0.0556579395459941492219 +7590 0.0379327434609296604284 +7591 0.0683332544559708682241 +7592 0.0427438768026611667916 +7593 0.0134911154326452682739 +7594 0.0806052667344526463378 +7595 0.0124163311867182719894 +7596 0.0387879339849202114943 +7597 0.1065587427049870944407 +7598 0.1534827513542542376523 +7599 0.0784660488380934800778 +7600 0.0074124368890615523545 +7601 0.0170224660924376478 +7602 0.0793725041029665623338 +7603 0.0912753053195942271048 +7604 0.0840165530261533094469 +7605 0.0546393875323556155177 +7606 0.0816585567617398866425 +7607 0.1275521610691978946495 +7608 0.1307545421954632858252 +7609 0.033259644425590720318 +7610 0.2172227104699936128807 +7611 0.092341339328208024706 +7612 0.149213736148017223071 +7613 0.0386636635994135613448 +7614 0.1693400994450028340665 +7615 0.2382205763073545756736 +7616 0.1229399689136530665623 +7617 0.1186058360412045736831 +7618 0.0864301495075685127789 +7619 0.1292895088672188219636 +7620 0.1194778513097639743856 +7621 0.0797512490839563520373 +7622 0.0276061332531845997351 +7623 0.1162374200744188817991 +7624 0.1019157242664188306458 +7625 0.0297962884486058544875 +7626 0.0881032336902971419113 +7627 0.0526327166070035545875 +7628 0.0835341696582452830633 +7629 0.0097803595421533280618 +7630 0.1273206269389361156019 +7631 0.1072839440052221898769 +7632 0.0510961459684963897887 +7633 0.0817839778124322125397 +7634 0.1363767577629549920815 +7635 0.1296350527949128028649 +7636 0.0202523202926151689451 +7637 0.1315367345772084917144 +7638 0.0736681889622753799385 +7639 0.1895668557771294504555 +7640 0.1425074609796906088821 +7641 0.1357898683813494045136 +7642 0.06850596746493045619 +7643 0.0373110594449653498739 +7644 0.0988114183417411473531 +7645 0.0946250185783139019513 +7646 0.0684015104617954200483 +7647 0.0428055438073504448959 +7648 0.2556886104794568459475 +7649 
0.2587389809624160430523 +7650 0.0261405045140640814039 +7651 0.2667359373190992255509 +7652 0.1638205875688134305346 +7653 0.065404423346804144157 +7654 0.000432575810881318727 +7655 0.0589489237717977510034 +7656 0.2871164096494662198999 +7657 0.0431430652993868429812 +7658 0.0326674092914173222479 +7659 0.0325875141186911798652 +7660 0.2383965773595692894116 +7661 0.2026451990104156386518 +7662 0.1912048964923938054739 +7663 0.1866936228548951304251 +7664 0.0295477076545063339907 +7665 0.1776227307914819419921 +7666 0.1257346625519757399303 +7667 0.1642553473138395891961 +7668 0.0131002811208574335144 +7669 0.261496997994879132321 +7670 0.0492266747236059579174 +7671 0.1673194415338175311536 +7672 0.1392021263127091734724 +7673 0.060400947643644606333 +7674 0.0659152873956738993844 +7675 0.0162859273372955894177 +7676 0.0110601136017136700979 +7677 0.3759722996569637154529 +7678 0.2184854559726182310353 +7679 0.3667109499413945816748 +7680 0.2676752696682538901207 +7681 0.2889380292318690091058 +7682 0.1933174753129356104875 +7683 0.3069375127304095896008 +7684 0.1904526581571102383794 +7685 0.2476847323529803857056 +7686 0.2316095251860201542637 +7687 0.1964858449094623316089 +7688 0.2456636337580349882526 +7689 0.2828157439286412278001 +7690 0.0011568589114942194867 +7691 0.3012181751650901406769 +7692 0.2279235145196432943404 +7693 0.282771493904825199639 +7694 0.1923590690813153092353 +7695 0.2150136251784743046667 +7696 0.2670939873297866751223 +7697 0.2389051964827669238822 +7698 0.24186787594795283729 +7699 0.2726674783940083446332 +7700 0.2080420239500096746266 +7701 0.1834806990279933269772 +7702 0.1968640219608780161931 +7703 0.2356921470663106976673 +7704 0.2832263157314920398733 +7705 0.2172890939675496402295 +7706 0.1579489218302234654345 +7707 0.2492033751473797797971 +7708 0.2149789369417271223117 +7709 0.1762090971266905281567 +7710 0.1567294723433814707114 +7711 0.1873733724083976592834 +7712 0.1283108484767878088029 +7713 
0.1059678348442252421302 +7714 0.0443192234831766979086 +7715 0.0219011014616836349744 +7716 0.00976692340522885942 +7717 0.19241189472570530139 +7718 0.1846207223601159796988 +7719 0.1251253385125424832935 +7720 0.1760945022294530815099 +7721 0.1201002190911241440663 +7722 0.0707038118086094485859 +7723 0.1591841956250227285707 +7724 0.1227584226913335241349 +7725 0.0436999563925749498483 +7726 0.0455632706424348027374 +7727 0.1370682606473955789106 +7728 0.1603581631827242504063 +7729 0.2026859548766176688517 +7730 0.0243648136177845101025 +7731 0.0781358980680691567189 +7732 0.1089403104127951021907 +7733 0.1113273700734550702984 +7734 0.0714694244098405817578 +7735 0.1324778957217895292153 +7736 0.2940036078664337582111 +7737 0.0861794512179971383681 +7738 0.1147005924578168994943 +7739 0.2383729986626546282213 +7740 0.1004938693456455534037 +7741 0.0043804196978884562055 +7742 0.0525901089309372127278 +7743 0.0331250147170802020091 +7744 0.0794887287135967873786 +7745 0.0764708649727942613161 +7746 0.1325095055530903231933 +7747 0.2552228468187550114443 +7748 0.0780722796894219028818 +7749 0.0763163682898871653659 +7750 0.0010040458456855598426 +7751 0.039667079764968124489 +7752 0.0552298151858492680777 +7753 0.0857010876122456105586 +7754 0.10801598383825504468 +7755 0.1972523998761545604985 +7756 0.1612837224838029603902 +7757 0.1771341453065832793889 +7758 0.2287299769986745334727 +7759 0.0779677896218715915655 +7760 0.0484756944618742052766 +7761 0.1012690213418379475696 +7762 0.0273004954662233977059 +7763 0.0220459118983915465517 +7764 0.032090122799253530117 +7765 0.1606187781710781969924 +7766 0.0921165001051857251779 +7767 0.0811154629825486805927 +7768 0.2396110680670553438887 +7769 0.1641551035053006590836 +7770 0.1518743682611336520694 +7771 0.0576294184923425226175 +7772 0.1331957723993718034627 +7773 0.0465894102359195941276 +7774 0.0852694714485661420245 +7775 0.1241738390145540466003 +7776 0.0265281836500965290115 +7777 0.085121852783833495959 
+7778 0.0201791744589736692095 +7779 0.0418705590430583848849 +7780 0.0445266609044060596156 +7781 0.0777003422116917669138 +7782 0.0220188171236753091331 +7783 0.0192753543487450829108 +7784 0.0860526644849779892565 +7785 0.1197209332128265357742 +7786 0.0776842638247376426897 +7787 0.0631458447683398149675 +7788 0.0195775225969501212586 +7789 0.0377440994457753226099 +7790 0.0599119553946016944468 +7791 0.0082272871028938477506 +7792 0.1819970289381248607086 +7793 0.1143410328835337691489 +7794 0.1950724232008145819783 +7795 0.1039347866428994432431 +7796 0.1178296322935371009955 +7797 0.0139480566132595979606 +7798 0.0455886182112002005806 +7799 0.0134380309619125705434 +7800 0.0450604894481721474087 +7801 0.0195451111311735446774 +7802 0.0229596428998333282334 +7803 0.1931433190369310215484 +7804 0.0560717016718346311643 +7805 0.0545526398842702819891 +7806 0.1737466006329947409981 +7807 0.1796325620578898274449 +7808 0.1243398330828845849139 +7809 0.2942923173253740309896 +7810 0.038885870844488135134 +7811 0.0635790117509993091272 +7812 0.1169697401517209855992 +7813 0.154526965333016269577 +7814 0.1736152600423614134062 +7815 0.0390040061797075898742 +7816 0.1933300247167799301717 +7817 0.0038415200191523821149 +7818 0.1179981009904673694422 +7819 0.0278969662903250266439 +7820 0.1561238935319643472699 +7821 0.1194968234522447575463 +7822 0.052705212032742997097 +7823 0.0827657372224917115 +7824 0.2084270912513574991465 +7825 0.0036445227100406090738 +7826 0.0085056293267852852574 +7827 0.216973630074812046109 +7828 0.254588641666423864951 +7829 0.1402935648969171056333 +7830 0.1355556310215164650401 +7831 0.0831235977344571552727 +7832 0.1277243244957885492941 +7833 0.0603619363080557386203 +7834 0.0444894965853371085474 +7835 0.1434521372261584326591 +7836 0.0042967247688592349886 +7837 0.0020632728768130242057 +7838 0.0865370469448598494955 +7839 0.098984144541199303724 +7840 0.0925270299018886716036 +7841 0.0893927218981619975402 +7842 
0.2176719704820487821806 +7843 0.0770381140890784993358 +7844 0.1849529737380625205034 +7845 0.1986148685538724756316 +7846 0.025018423831552401293 +7847 0.0199163418301229605545 +7848 0.2638492919534161096351 +7849 0.0970469204903780108262 +7850 0.204053643712579668712 +7851 0.1445817922802202226684 +7852 0.1116336839200813835227 +7853 0.0063197150585952953372 +7854 0.1378306026586227917008 +7855 0.1286393246713649074486 +7856 0.1535593162159908098285 +7857 0.1620172867984844833344 +7858 0.0857636808125600280661 +7859 0.0084340915288044857739 +7860 0.0221630092775593652565 +7861 0.0349310309891847983743 +7862 0.0270920744095849465316 +7863 0.0435884233491691669427 +7864 0.123051002505464010528 +7865 0.1196835441208962264037 +7866 0.088760674736947403618 +7867 0.1346833186915855073984 +7868 0.1031765499293945814729 +7869 0.1221677371247434640278 +7870 0.1803029722186451910826 +7871 0.0813697864550963906316 +7872 0.2480679141558752298646 +7873 0.1634288904939017461615 +7874 0.026121649096961727099 +7875 0.0007518892895408251778 +7876 0.0005699886635757907835 +7877 0.0112725577701837231109 +7878 0.0117191209508870249068 +7879 0.0736166512295413216771 +7880 0.024194195736976020078 +7881 0.0169609877607077470796 +7882 0.0704579421936450261965 +7883 0.0203972619109350529476 +7884 0.0006730685238339644601 +7885 0.0285019811293807656671 +7886 0.0035063325873171788567 +7887 0.0011586323435832670235 +7888 0.0014402566184047731375 +7889 0.0007059592681142110981 +7890 0.0446859376431841814892 +7891 0.0437815708014066412579 +7892 0.0194657437520748648196 +7893 0.0048389857091368443723 +7894 0.0779211627586418459357 +7895 0.1306065107671844915949 +7896 0.2438538761331888737871 +7897 0.0332621191115974809693 +7898 0.0242968982981237174856 +7899 0.1746319138790585201448 +7900 0.062315287627756094091 +7901 0.1410366069375854347623 +7902 0.0041457414022512137711 +7903 0.1701365979399347416745 +7904 0.202397810848261666683 +7905 0.1241783738698143263468 +7906 
0.2341010355805780429606 +7907 0.2530078251493072527545 +7908 0.0696951959660535441676 +7909 0.0557678758093095722215 +7910 0.0980807184956736799464 +7911 0.170033438326043340183 +7912 0.1061291505931268863705 +7913 0.050887482618959416214 +7914 0.0931450313471525875864 +7915 0.1426673169212336611533 +7916 0.0012342447541678150131 +7917 0.1683490968809696619601 +7918 0.1503434107946854392246 +7919 0.001638236664645191477 +7920 0.1912641270293847228778 +7921 0.3556435113967142958025 +7922 0.3013856103843566835998 +7923 0.086701617363328764565 +7924 0.0669510553455204965312 +7925 0.1025461082133745788214 +7926 0.0307212774548065344071 +7927 0.0448746628000845915185 +7928 0.1579573239012362073641 +7929 0.2062441244258933770173 +7930 0.0778220188901800752346 +7931 0.0807204598562545477813 +7932 0.0956489490948867343567 +7933 0.125546603206583284873 +7934 0.332930153951908724963 +7935 0.2007825003349816284093 +7936 0.2398483633013025750902 +7937 0.2193059084779706569002 +7938 0.2462904583204033048816 +7939 0.1154312803656749597536 +7940 0.1271934411122072061673 +7941 0.1585456090833379794169 +7942 0.1630226223384990680287 +7943 0.0860083991318629503819 +7944 0.0258402826672891987314 +7945 0.0298901674297904519639 +7946 0.0100550512795625542756 +7947 0.1928901822591163883747 +7948 0.1329291365930915791438 +7949 0.0772212025128684309561 +7950 0.0787550991290837665293 +7951 0.0955403070366815238001 +7952 0.1396173676252501494321 +7953 0.0827688132481362859316 +7954 0.296050911806255756531 +7955 0.1517377606003872392293 +7956 0.2735501490771631849519 +7957 0.0988196019779837425689 +7958 0.1730225641439714356906 +7959 0.1515055231436285010371 +7960 0.1051991348048076785338 +7961 0.0751530732582413596443 +7962 0.0375611825527542020353 +7963 0.0875867253291385816638 +7964 0.0662664549414085923829 +7965 0.0688240313270044312688 +7966 0.0047026591289701311083 +7967 0.0408089856972086509335 +7968 0.0352768132870023357062 +7969 0.0607882170729906606832 +7970 
0.1489635483113908309694 +7971 0.1629919637022253509073 +7972 0.1939445098123929733802 +7973 0.2161776290219819540717 +7974 0.0005093453414150353725 +7975 0.0057156657402992029973 +7976 0.0043864941852611132045 +7977 0.0223189627679737656596 +7978 0.0003870086401841667375 +7979 0.0008338224975211719521 +7980 0.0031326239610273728768 +7981 0.0207786048351801637402 +7982 0.0079862036896744953429 +7983 0.0157789220243411618116 +7984 0.0064676275242287698669 +7985 0.0210548249979768710427 +7986 0.0128888492670193449685 +7987 0.0007974635262692003016 +7988 0.0053476036593378597356 +7989 0.1726683664974643372947 +7990 0.1950499705149030260642 +7991 0.0168593288194459969831 +7992 0.017283952616719350931 +7993 0.1520367363564799623177 +7994 0.2081494982607478283487 +7995 0.0193041270607221572553 +7996 0.002653916493526287633 +7997 0.0060992535033353644999 +7998 0.1057198963661232782307 +7999 0.2332227253424350443822 +8000 0.2211219642909796889718 +8001 0.2710028770808610154575 +8002 0.1917413879289884548474 +8003 0.1160310925734053533187 +8004 0.2291498991027942633281 +8005 0.2769781291135791079228 +8006 0.1798243250441287910402 +8007 0.1850214757162216006048 +8008 0.1473926005596505051098 +8009 0.1665491287889428739799 +8010 0.1964235155845980018707 +8011 0.2557955537647512156418 +8012 0.2825406601997142996829 +8013 0.0239324440939714389032 +8014 0.1426692118816489940336 +8015 0.0044089436624696358089 +8016 0.0537569137657893658666 +8017 0.0445310973831400877532 +8018 0.0796743425504855135033 +8019 0.113295831396720461659 +8020 0.0862187754800706751546 +8021 0.1446698211423786128993 +8022 0.0696624666552513732709 +8023 0.0317415404971764722464 +8024 0.0659931520566814505679 +8025 0.1024876541943848390348 +8026 0.1087230656481881913011 +8027 0.1875043194424347581251 +8028 0.0644900942714387026555 +8029 0.0325798566685546325816 +8030 0.2267012710258762553384 +8031 0.0270460343132474349825 +8032 0.0183463963242471266024 +8033 0.0161070512560872423913 +8034 
0.0092858961104817800553 +8035 0.0281803337147393320739 +8036 0.0218400806468336712574 +8037 0.0169382613058607668644 +8038 0.0347714077855867387257 +8039 0.01986614604575804785 +8040 0.1160271207300045820388 +8041 0.0475464450301294541679 +8042 0.2661378390314421138463 +8043 0.0406998651210370987474 +8044 0.1483958358883378236825 +8045 0.0159648467742135626024 +8046 0.1535577431864647290904 +8047 0.0990377908730425554618 +8048 0.0530839932601600353324 +8049 0.10048184043616953387 +8050 0.0898331326779885364076 +8051 0.1615450910778126469847 +8052 0.0937059835566323717782 +8053 0.0735816783518493250371 +8054 0.1238168024079212697908 +8055 0.0240350215283146768919 +8056 0.2312685694004778524935 +8057 0.1171745669294664976556 +8058 0.1300676733531173634439 +8059 0.1257719516315556618213 +8060 0.013579608201392709757 +8061 0.2212872839454851081609 +8062 0.1925864879212845492962 +8063 0.2934888092554505667486 +8064 0.0916959097793988708869 +8065 0.2325946565714692715332 +8066 0.0049660439219404821365 +8067 0.0826683747365726390166 +8068 0.015236443818521959101 +8069 0.0451674169107786349642 +8070 0.012019977032609032927 +8071 0.1365046015954938796177 +8072 0.0654539262502410040812 +8073 0.1490686838991415186229 +8074 0.0118012607904992190833 +8075 0.0006946324772002267483 +8076 0.2254428193336694141724 +8077 0.1887113664395209389202 +8078 0.1543935346516399909067 +8079 0.0125642910487055396024 +8080 0.0108806265889780529738 +8081 0.0051315717936510725147 +8082 0.1668311305898099172751 +8083 0.0088618344753505790684 +8084 0.0603700012160067675526 +8085 0.1181733855043776565408 +8086 0.2018024109397341947503 +8087 0.1077986974730957697721 +8088 0.0110416429416891493515 +8089 0.1566656125756440975927 +8090 0.0511124591487021190384 +8091 0.0644308075592996826186 +8092 0.0251816058213628832241 +8093 0.150717205438087531455 +8094 0.0028347603800281222231 +8095 0.0313778819519579255104 +8096 0.0287895866851518508756 +8097 0.0171383029313267945537 +8098 
0.0770846809299985391561 +8099 0.0172051733532912927427 +8100 0.0087249902243147162834 +8101 0.0134791946096992840132 +8102 0.0697738908340606728276 +8103 0.2086526526328291442081 +8104 0.0061973274661356880788 +8105 0.0410572964203708284936 +8106 0.0110497557709863906961 +8107 0.1411377741369430061091 +8108 0.0150815345652777852564 +8109 0.2108743929439176045815 +8110 0.0879439569380841557056 +8111 0.2193426855057571400742 +8112 0.1456310976273381407875 +8113 0.0229111852628516507457 +8114 0.0348735349647690748287 +8115 0.0106496258697687216921 +8116 0.187723196717237805764 +8117 0.0113161067128617499195 +8118 0.0033084183851271905911 +8119 0.0332461535550605050138 +8120 0.02155581459212744036 +8121 0.0213537312151824941464 +8122 0.0003058826640216788387 +8123 0.008882269415836384352 +8124 0.1350635747947812326242 +8125 0.0686271412373081479696 +8126 0.0024203748978379297473 +8127 0.0051502177414069474568 +8128 0.1700534747925095957477 +8129 0.001591584851292949165 +8130 0.0306655372669516340656 +8131 0.0137406568877327349421 +8132 0.171463072680654360358 +8133 0.008903467841784140635 +8134 0.0119488440774803816175 +8135 0.0065854266562691077605 +8136 0.1468248127359326749009 +8137 0.0931952248600785015942 +8138 0.0061505957371002836082 +8139 0.0947653088601029675031 +8140 0.0019635560229774233314 +8141 0.0438412434105155054964 +8142 0.1557476489079694936812 +8143 0.0012831757058722045514 +8144 0.0043742411813687759672 +8145 0.026820649124110629985 +8146 0.1580674960662048211812 +8147 0.0355918234866338667466 +8148 0.0113728011107587546891 +8149 0.0982460597192237050646 +8150 0.0914601295211995707346 +8151 0.0224496599324540482834 +8152 0.0024232584247553946681 +8153 0.0160428132373416597323 +8154 0.3702069776176252768884 +8155 0.1196299075959143393133 +8156 0.2187406535369678939329 +8157 0.200074330918258630474 +8158 0.2128962664686286365701 +8159 0.1549256915308926119135 +8160 0.1426433734195731717342 +8161 0.0493778883690783598759 +8162 0.3155306974358246252521 
+8163 0.1725127586271398838491 +8164 0.0293543425259273384198 +8165 0.1339660504538901231175 +8166 0.0075200978978573846875 +8167 0.082859056801695821215 +8168 0.2951126674745749300932 +8169 0.1355125751957901580891 +8170 0.097256376973487684845 +8171 0.2849912306803766837149 +8172 0.2266929260697230219535 +8173 0.0005224801165746368508 +8174 0.0091020128596853624309 +8175 0.2004045653543649052608 +8176 0.1699806543412160919626 +8177 0.2374329648297423267511 +8178 0.2602892178156579472414 +8179 0.0871590329928716711638 +8180 0.0914702126753063199383 +8181 0.1345953734388850298043 +8182 0.0201547664335187150242 +8183 0.1012202781014499425316 +8184 0.1905404766131464644463 +8185 0.2329242366291104926468 +8186 0.2152806191803761370895 +8187 0.0187189490707160566263 +8188 0.0169811140958336138918 +8189 0.062829698802561506632 +8190 0.1367859863065006564842 +8191 0.0810484997866663392507 +8192 0.0362912022220352320501 +8193 0.0385904308386632866057 +8194 0.1247605098932989819982 +8195 0.0976381965947603347455 +8196 0.1766556103652550946403 +8197 0.0935034687927338709068 +8198 0.2059270642755827440684 +8199 0.1244959189105442481926 +8200 0.187930162139213846606 +8201 0.1852282624499892449421 +8202 0.1730130322368503281716 +8203 0.0288376728436625824614 +8204 0.1528581758589057648656 +8205 0.1528395024441393523773 +8206 0.1711165668309540810466 +8207 0.1647731450177844225724 +8208 0.3396490961022038956862 +8209 0.1243208937436510452823 +8210 0.0243268178799608027207 +8211 0.020680461929405410948 +8212 0.085823923606832427935 +8213 0.1498983519178340162448 +8214 0.0023459087102835956759 +8215 0.0092775099397921084732 +8216 0.0030637821289449551111 +8217 0.002642096252129567252 +8218 0.0405296504267370835684 +8219 0.0363162826183203862884 +8220 0.0360909918999247625315 +8221 0.2292925400009672431967 +8222 0.011836250040414461257 +8223 0.1647274232824157691457 +8224 0.260771709352052238895 +8225 0.0986203337807698698914 +8226 0.1626097880316599020301 +8227 
0.1756568930592790067635 +8228 0.0358472961008086260515 +8229 0.0334862857619721010494 +8230 0.204490973647514479028 +8231 0.0289074872517313466447 +8232 0.2150611450059162210735 +8233 0.2081886018236304702889 +8234 0.2032723777307310908391 +8235 0.125940560309174304221 +8236 0.136669858152615286695 +8237 0.2836386682824428295824 +8238 0.2869379329010571244574 +8239 0.0308950603967206964551 +8240 0.0461640575622408594336 +8241 0.1988151961399936829 +8242 0.2008885222181200080893 +8243 0.1971104132349726645312 +8244 0.1998001231145459521965 +8245 0.1128517756721466869241 +8246 0.1324073353261488172894 +8247 0.0709308832897719843125 +8248 0.2343175358917653217095 +8249 0.2250575851379885417014 +8250 0.1883157104745049326144 +8251 0.2263413584138835032977 +8252 0.2205339996969711835462 +8253 0.1764427698142534139958 +8254 0.1217653599197023595035 +8255 0.1175128370019645207556 +8256 0.0122661157805369451512 +8257 0.0005014032023498829265 +8258 0.0010309882100125854903 +8259 0.0111725837610311476422 +8260 0.0175896823023807546971 +8261 0.0961481568143081333222 +8262 0.0194535332408504943458 +8263 0.0083716406561758875682 +8264 0.4266332955447856289943 +8265 0.1373108429648904338372 +8266 0.2203760393001583661743 +8267 0.2065719419253984334706 +8268 0.1180381813675825813936 +8269 0.1298244551169779881228 +8270 0.1025492276666276625363 +8271 0.2160748380618027764122 +8272 0.199410379589783165466 +8273 0.1869568461003000281462 +8274 0.1090235973151456594366 +8275 0.0734901281409549023138 +8276 0.1634705005713570635795 +8277 0.1547153493784516609111 +8278 0.1401364931877334962795 +8279 0.0794226315397620064029 +8280 0.196514633285544026986 +8281 0.1304792676848122168209 +8282 0.178659655552167118131 +8283 0.1123604080858924270103 +8284 0.1963864243729896708057 +8285 0.2694984263548764746865 +8286 0.1010546687005020594086 +8287 0.0475640032547754235126 +8288 0.0851489097478186351964 +8289 0.0015806472389913528778 +8290 0.0256172827300952463125 +8291 0.1320269233057109747875 
+8292 0.1247886614653523490448 +8293 0.0839372500856030434591 +8294 0.0039131802592870454371 +8295 0.0408193196974821898015 +8296 0.1135086282891694664521 +8297 0.2498340907177203917833 +8298 0.0548030914235575067339 +8299 0.2821879354715443599666 +8300 0.0646681828741367398239 +8301 0.2517809906783031737021 +8302 0.2835491120658327757198 +8303 0.3234034960452050744273 +8304 0.0322483474377103715436 +8305 0.215178009207053072549 +8306 0.0713852000012474841029 +8307 0.0741072197485198869149 +8308 0.0054617462085422225151 +8309 0.0153247372735739583599 +8310 0.0619930278516881566597 +8311 0.0523461360857300009308 +8312 0.0076064259246018941715 +8313 0.039857338989777113536 +8314 0.1408121918090682600511 +8315 0.1374750472832215741903 +8316 0.0879654719033735321698 +8317 0.2927134981817896552947 +8318 0.1387424669077952921192 +8319 0.0373012722271980692867 +8320 0.1626528353232959966945 +8321 0.1966310263836207883337 +8322 0.1832837729961475647933 +8323 0.0060689898849009132897 +8324 0.0912287111453712468156 +8325 0.1286172116702031242852 +8326 0.00496257297623490818 +8327 0.199239565943128416059 +8328 0.1929114084120694128099 +8329 0.0806296518258376726518 +8330 0.0287058554450318609286 +8331 0.1753410655081814040201 +8332 0.0739667723168215551777 +8333 0.0276796195455713302247 +8334 0.1302050314471113467985 +8335 0.0134882450966109374013 +8336 0.0886401013513393315479 +8337 0.1083403778653998644765 +8338 0.0528947559593432800606 +8339 0.1019587766453859956073 +8340 0.0716924464459952476281 +8341 0.1512562544948740650419 +8342 0.0284996368135156941115 +8343 0.0022956348252046068596 +8344 0.1987696541886788126341 +8345 0.1147356016189915173253 +8346 0.1670226398281123014744 +8347 0.1277242419870563971962 +8348 0.2692077516643085477455 +8349 0.184023854651205492905 +8350 0.0948578682452256571889 +8351 0.1161877817898767234484 +8352 0.084232545928877161634 +8353 0.1444736183655415395766 +8354 0.1386989066038269158287 +8355 0.0872548388204959224357 +8356 
0.1190990350454049495621 +8357 0.121880326688006745961 +8358 0.0249814937245723747705 +8359 0.0133986165709000133728 +8360 0.0494588545686099612109 +8361 0.0149101731325964569075 +8362 0.0068131434667802488053 +8363 0.0534069242159756346622 +8364 0.2851859577494834652178 +8365 0.253314179920509252586 +8366 0.1515503950174606206946 +8367 0.238829156215830396004 +8368 0.0801245084286383441041 +8369 0.1619772277269920512932 +8370 0.1209995231019217604151 +8371 0.0989221400216643564907 +8372 0.1050621982854366281979 +8373 0.3099639706843750897036 +8374 0.1439044472773757799011 +8375 0.1793336689868239297585 +8376 0.1972167402319305873348 +8377 0.1071343067294752843255 +8378 0.2887570331710305748807 +8379 0.3539263203243126487685 +8380 0.1014285035769060677735 +8381 0.2064330217237701492472 +8382 0.1550885598744145554839 +8383 0.0857159008683094914005 +8384 0.0274857428668016379181 +8385 0.0503569291866675580738 +8386 0.106811909265523780066 +8387 0.137393207298758174284 +8388 0.2168626993714627415599 +8389 0.0616164909513487804582 +8390 0.096890971498140077145 +8391 0.0833621017814875631124 +8392 0.267889840061612927169 +8393 0.1296109020598621153209 +8394 0.1243104910453090988787 +8395 0.2400465897263367731629 +8396 0.1438341562279945029523 +8397 0.1899002750717191656538 +8398 0.0725199867082254173711 +8399 0.2013243642879081196639 +8400 0.1949345701043641854167 +8401 0.1553192492674611568937 +8402 0.106169609132398992668 +8403 0.0953767206181078824878 +8404 0.0864107884742765786079 +8405 0.1551779447766649111529 +8406 0.198899605530331952119 +8407 0.1940454386346371629113 +8408 0.1548997437405025823232 +8409 0.1488443017525889477959 +8410 0.0822610293010166276906 +8411 0.0560053474930843758761 +8412 0.1325853689683804681465 +8413 0.1953528389193071035379 +8414 0.1264329043690376463971 +8415 0.1672375170595779836358 +8416 0.0265870824608366654318 +8417 0.1931970626664727130883 +8418 0.242883034873440428747 +8419 0.0984995141104343385763 +8420 0.1053335865077440897108 
+8421 0.0699685004547361821814 +8422 0.0334897230028006198665 +8423 0.0438162323578882989539 +8424 0.1298041482494068921127 +8425 0.142596692062300189896 +8426 0.0468739664747962886393 +8427 0.0836416195120491118287 +8428 0.1201680154943152445401 +8429 0.0135441411852128008181 +8430 0.1387365926331804166782 +8431 0.0845973092371955504021 +8432 0.1778400015245457654878 +8433 0.2995018058475634181725 +8434 0.1104570453180502426216 +8435 0.2872765313870900283355 +8436 0.1125323008696950261021 +8437 0.2562683965074810110707 +8438 0.0767996334301541544454 +8439 0.3028096455498837658027 +8440 0.0050786626805905503435 +8441 0.3328767447860822015215 +8442 0.0555708520786918347412 +8443 0.0070065844842152301408 +8444 0.1372769200793052257747 +8445 0.1658952728110394680971 +8446 0.21288967262208283171 +8447 0.0107474677579122313703 +8448 0.2676047674112670970992 +8449 0.1498219245432675372776 +8450 0.0685652613330299670613 +8451 0.0632449252544773909968 +8452 0.1380745290805992508965 +8453 0.0055384772223088301821 +8454 0.0268834277799697950184 +8455 0.0815034795604387268808 +8456 0.0818646474402432294815 +8457 0.0308507372167282668818 +8458 0.0125147146590686795392 +8459 0.1470050469571408668923 +8460 0.0060297500606925928321 +8461 0.0778117480956866669484 +8462 0.1344136946596957427058 +8463 0.0601946975498324210463 +8464 0.1372695496165283746137 +8465 0.0480924143055664402024 +8466 0.1247790707648753771863 +8467 0.026864209358468935096 +8468 0.13603632574713969694 +8469 0.1342818453732010086643 +8470 0.1432592119961491328173 +8471 0.1667455347175985647112 +8472 0.1208666328887958996141 +8473 0.0100635043368035737715 +8474 0.0367526459985671391517 +8475 0.0075689901450897495092 +8476 0.1546066218041613760104 +8477 0.0501081312517454718902 +8478 0.0793215634828346310981 +8479 0.039521764170312480291 +8480 0.0710921520834521813992 +8481 0.0435970712291062767463 +8482 0.055692778176582878602 +8483 0.087379438828579555909 +8484 0.0840855304907888317922 +8485 
0.0244265882232814687813 +8486 0.046530720457343383023 +8487 0.0411898506195353536485 +8488 0.0028610803822663537742 +8489 0.0279359583498239548438 +8490 0.0922796467283041182972 +8491 0.1267715601539365355421 +8492 0.0545901806427041905168 +8493 0.1500606946213084158526 +8494 0.0942370963673655193116 +8495 0.1927579446168497245839 +8496 0.1765470111321363888557 +8497 0.0637635304151677184681 +8498 0.0157961533303975602827 +8499 0.0772265530704200342527 +8500 0.3224316456913740025492 +8501 0.3101564981158096978575 +8502 0.0245178183270786075043 +8503 0.0674203619984789087605 +8504 0.0272051477048295582695 +8505 0.0458778622262686966438 +8506 0.0401152366864103021582 +8507 0.0064300440366741196813 +8508 0.0981610185659428091665 +8509 0.131362304842922439363 +8510 0.1199522860912959620894 +8511 0.0298885802110887832672 +8512 0.0575851441180640979955 +8513 0.0944968170609585961239 +8514 0.0057255901381017930418 +8515 0.0271841179295313969955 +8516 0.0035749158281992632 +8517 0.1883008183772245935916 +8518 0.0261353293078208995248 +8519 0.0768587472637319857327 +8520 0.1115814959780950521573 +8521 0.1373586336444014566194 +8522 0.2423293292839186485743 +8523 0.0574507074318488519804 +8524 0.0522677341015111290856 +8525 0.0616915668857879462283 +8526 0.0561406369112606440375 +8527 0.0361170733504326893493 +8528 0.112287841001874250102 +8529 0.0684572519843886367896 +8530 0.1007808848813154523061 +8531 0.1837137812924459490116 +8532 0.0828980670401384123203 +8533 0.1411394382626726673458 +8534 0.0068054542417987102382 +8535 0.0656562474824478742308 +8536 0.2423461377893346146983 +8537 0.2017931912515903147831 +8538 0.0792923484174761217247 +8539 0.0326564332025235937174 +8540 0.1524766010640912772267 +8541 0.0302520412871433094781 +8542 0.0627549654283880292693 +8543 0.0308265990056969876876 +8544 0.1587400073135581957118 +8545 0.0391665357467177940864 +8546 0.0573811104433479476916 +8547 0.0056505472740991588565 +8548 0.0582349131195196573207 +8549 
0.0039247793006519111209 +8550 0.0748444009431286083611 +8551 0.0627364796790810808069 +8552 0.2145462879050566984418 +8553 0.0210343870773128625851 +8554 0.190250033411893948454 +8555 0.0237629351665213477129 +8556 0.2720010114967867931313 +8557 0.0837622721544434428731 +8558 0.2577230814422862836821 +8559 0.2109191547322775694973 +8560 0.0163457007984557957381 +8561 0.1651339206670606740701 +8562 0.1609532327702005716219 +8563 0.0400281359249703150294 +8564 0.2850622918066176847418 +8565 0.1541551939300546958389 +8566 0.0856868638136405735839 +8567 0.2604770146411228282091 +8568 0.1210990552832905686742 +8569 0.3051387445416685051391 +8570 0.1231477045006560966378 +8571 0.1843783337009994305333 +8572 0.0680592065543928065319 +8573 0.0150870451756516706088 +8574 0.2750223898093290264555 +8575 0.1812332137451080071866 +8576 0.1546591600751805462011 +8577 0.1028767260338346034576 +8578 0.1373324702829188115327 +8579 0.0364048237110454406262 +8580 0.0603153368275763240858 +8581 0.1518815453172895069134 +8582 0.2458375073390864362644 +8583 0.1652530316021365019985 +8584 0.1938696131795154820043 +8585 0.1653200002628394948978 +8586 0.0102415800591138216047 +8587 0.2111992314142422666468 +8588 0.1771277554908438911507 +8589 0.2328629453663851045508 +8590 0.1453337064194472894041 +8591 0.1011338261790686171571 +8592 0.1610175398289128145812 +8593 0.0676879054932211232654 +8594 0.1747790617824113712508 +8595 0.1732873854277291458814 +8596 0.1183564126718560510776 +8597 0.0208970508396723042566 +8598 0.0047234806503864091784 +8599 0.2558599431232341792075 +8600 0.0134262780518028396193 +8601 0.2949221252275749938221 +8602 0.2586965113632408996835 +8603 0.1070645995116193682772 +8604 0.2649721644975878143136 +8605 0.1802854031882577534596 +8606 0.2335858098305548524731 +8607 0.2797968637867251362472 +8608 0.0369834374392549458666 +8609 0.1543806435008284738686 +8610 0.0018580472829382744127 +8611 0.1586669062940708752052 +8612 0.1199839318248194508509 +8613 
0.2249782513728329214064 +8614 0.0224386788468770477512 +8615 0.0189843980596832613561 +8616 0.3193786240608227910975 +8617 0.1836159270235522555925 +8618 0.2659345083129484588191 +8619 0.2037110943864854639695 +8620 0.1429072222497842492572 +8621 0.0970785520368723647833 +8622 0.1703450858300144810098 +8623 0.2497877287138557111934 +8624 0.1918364349108097333119 +8625 0.2206422835240241242527 +8626 0.1826255066917265856841 +8627 0.1763673240543145448456 +8628 0.2067040847314073059859 +8629 0.0666189297610578906594 +8630 0.1631540367487031018001 +8631 0.146688323376795826114 +8632 0.0878003921803628822884 +8633 0.0370765066894695252686 +8634 0.0322220349269553563154 +8635 0.1112463274161931275907 +8636 0.2338487479511763278417 +8637 0.2552369968438371050645 +8638 0.1697615366674623815779 +8639 0.2321210460125368579831 +8640 0.1729627855891780752184 +8641 0.0990796697233652934322 +8642 0.0281938001713800019676 +8643 0.0680001480818082648661 +8644 0.0954468034434066692206 +8645 0.1351951555952239925062 +8646 0.1790703723267226543658 +8647 0.1999014932159775825848 +8648 0.1097132198902402155927 +8649 0.2252074634597530933267 +8650 0.0710735479577888490388 +8651 0.1237878233942528577449 +8652 0.1681741581068993407477 +8653 0.0886444964692530423811 +8654 0.09222080043698246965 +8655 0.1909477782952623547974 +8656 0.1941431068252871816604 +8657 0.1487231977154283291132 +8658 0.1173928521243140371544 +8659 0.1585960272323008179995 +8660 0.1655923409942065549494 +8661 0.1404920203428259850575 +8662 0.1443055669901420756673 +8663 0.1370063232795027119426 +8664 0.0210198128243151392824 +8665 0.1961227387784315046027 +8666 0.0481806157048993918823 +8667 0.1605842957862302633476 +8668 0.2345637380019330031633 +8669 0.2669131750055372953589 +8670 0.2187619648456786414226 +8671 0.3028853328708052128349 +8672 0.0426609290532276794194 +8673 0.0577751409399351462115 +8674 0.1743993914245962617571 +8675 0.3244754071685772567335 +8676 0.010207818692816769679 +8677 
0.1776378667013016054987 +8678 0.1046738356001176567522 +8679 0.1694291798308132368511 +8680 0.2341488981521240297923 +8681 0.3343062993857486842053 +8682 0.1710294180973286615188 +8683 0.0788741549506775108114 +8684 0.2530363645475956491104 +8685 0.1220124834714734901597 +8686 0.215820934620269827553 +8687 0.0992169782340057981873 +8688 0.0147755006713805644297 +8689 0.1715725410080337909324 +8690 0.1504797592305885389763 +8691 0.1459900298412341035359 +8692 0.1781516065926315450785 +8693 0.1303657146859464877053 +8694 0.1024996258134081711377 +8695 0.0084941569493823913806 +8696 0.1574893356234526864412 +8697 0.0118415190296310198631 +8698 0.172387331041172431334 +8699 0.1944119962989794281327 +8700 0.1188267398074664554786 +8701 0.0927590628005814654689 +8702 0.1240963586861112966098 +8703 0.0433740311008107073953 +8704 0.1686122838037376614473 +8705 0.1108197451259673588231 +8706 0.1172181233295015445606 +8707 0.2622867017102489262115 +8708 0.0915165607821551152501 +8709 0.1897930470139339853564 +8710 0.0585791200622245711305 +8711 0.1648615682150594841104 +8712 0.0569378369815183513203 +8713 0.1601870820235099801554 +8714 0.0574110522977347020879 +8715 0.1776523645424734065834 +8716 0.1132157495522631840412 +8717 0.2780581724199865378822 +8718 0.0007660853408153096844 +8719 0.1495768057148144369872 +8720 0.0662134293779327171015 +8721 0.2734660336507215294688 +8722 0.1592885395581593588332 +8723 0.1559253157595129890556 +8724 0.0389237099265435787521 +8725 0.1504543866831990761579 +8726 0.182634342252342762869 +8727 0.1781028698509506680292 +8728 0.1855832103950514899715 +8729 0.1626422812195581768524 +8730 0.0825948583251277790307 +8731 0.094079997773959797569 +8732 0.1631579797761323058491 +8733 0.1280119741171291680715 +8734 0.1466737862453895702153 +8735 0.2167069868797183995746 +8736 0.2881145705563283865303 +8737 0.1018707967919347673336 +8738 0.1504781128812072987788 +8739 0.1363458991215675819575 +8740 0.2263989138483550045411 +8741 
0.2464995677462798739921 +8742 0.1732591592638327016029 +8743 0.0528389540666565071803 +8744 0.2583654261566338261602 +8745 0.2494677073073827411331 +8746 0.0095182941314510532277 +8747 0.2421374967844966696884 +8748 0.2326795950358193920682 +8749 0.2287810895987140435981 +8750 0.2282247361918612327258 +8751 0.1991155757670834136608 +8752 0.1644645737236224702915 +8753 0.1890619918053183523554 +8754 0.2035450607394189148636 +8755 0.3065697274712940179064 +8756 0.1794922050298140281388 +8757 0.0294079043632657181895 +8758 0.0434724945385207561799 +8759 0.0623891660815162410469 +8760 0.1651053443466149184271 +8761 0.2295925258230584187213 +8762 0.0009783135816368562689 +8763 0.3106144965744053187962 +8764 0.2742951320816547911008 +8765 0.2020633131256220627048 +8766 0.2083921507818630447506 +8767 0.245738325409258911991 +8768 0.1914906683538147302848 +8769 0.0650737559289523148642 +8770 0.1768888009015019968651 +8771 0.2627603145567005138439 +8772 0.0978914202514062498084 +8773 0.2319396942010392703715 +8774 0.2206568831039431299867 +8775 0.2623952515082687564352 +8776 0.2942349791928268731844 +8777 0.055050623976922175018 +8778 0.3153641926123249561442 +8779 0.1221315892427932703335 +8780 0.1316097121618917675789 +8781 0.0005883548844420152021 +8782 0.0034606559198119995514 +8783 0.0302496690125612054112 +8784 0.0033507108658286325481 +8785 0.016315809463014582098 +8786 0.0026998149908534544404 +8787 0.0017128025500697220077 +8788 0.0010701014192713151005 +8789 0.0725721164366126131329 +8790 0.132921998251943063174 +8791 0.0509086369621347886727 +8792 0.0020008222847618722592 +8793 0.1024455424033699213471 +8794 0.0468742155535367410746 +8795 0.0482664211624013947399 +8796 0.2428976410113369288624 +8797 0.1800517135589345396607 +8798 0.160232655038910876133 +8799 0.1155096742238283652471 +8800 0.1431022908505255597511 +8801 0.2104644147543652255017 +8802 0.081303090799038635339 +8803 0.1578913295158284879616 +8804 0.0625116571195923342863 +8805 
0.2933312768463957054266 +8806 0.1859401561534254065933 +8807 0.0184828564358742279683 +8808 0.0118641358418892506332 +8809 0.1072795342514177158177 +8810 0.2174385520454933107271 +8811 0.1447789598597324367546 +8812 0.4137700638622152649049 +8813 0.1130989444995533638183 +8814 0.1466236096807562450106 +8815 0.1692852469525892022961 +8816 0.1767356748314411196699 +8817 0.093546829278308360478 +8818 0.2787349516921087677623 +8819 0.2527067466034098042194 +8820 0.1644713920643271487521 +8821 0.1426961957610095033111 +8822 0.114745858737520356474 +8823 0.0826369901494222042215 +8824 0.0506387971061982369858 +8825 0.1002813247474612090571 +8826 0.0842647013211725998127 +8827 0.2794783382872994548229 +8828 0.0237408450897899706011 +8829 0.0469773613271550899428 +8830 0.1413731071828622787301 +8831 0.1102916054913853499686 +8832 0.2107598012591708758201 +8833 0.1301086844271786002736 +8834 0.0986355310666577334011 +8835 0.1950242872760624968098 +8836 0.004177990934726347888 +8837 0.060895700886806555796 +8838 0.1686294792992237279172 +8839 0.2029756618412955071484 +8840 0.2183774291320710325692 +8841 0.048257402391321445323 +8842 0.2224894641314822396616 +8843 0.160200607781387643902 +8844 0.1563495456772075187235 +8845 0.105476351160557202391 +8846 0.2163036223410646219367 +8847 0.0394077225321826582483 +8848 0.0132477320654364738584 +8849 0.1631736928333953617898 +8850 0.0147248078799576982006 +8851 0.0250019093657492738614 +8852 0.0262983805043584331629 +8853 0.2059437917452929256434 +8854 0.0179431976322921224454 +8855 0.0457182908262665499421 +8856 0.0416282799027548264248 +8857 0.1733041957491589235563 +8858 0.0989591978678688449778 +8859 0.1026191725622142697505 +8860 0.1144470794535678848103 +8861 0.1182723910958837360008 +8862 0.1075427111465321466932 +8863 0.0857183855563201646532 +8864 0.0102988308744156719515 +8865 0.0299571797204585127394 +8866 0.000551131849778764343 +8867 0.0537427101366555401429 +8868 0.0029902164612644935825 +8869 
0.1874163181267580746692 +8870 0.1825750660724729801387 +8871 0.1605541820089251559001 +8872 0.044529215879237346587 +8873 0.0594433704361362058588 +8874 0.0739216885913853150036 +8875 0.1251723539380396044152 +8876 0.0523392851022716071308 +8877 0.0958569278777262751001 +8878 0.1024877386250337835127 +8879 0.2664414721780679617957 +8880 0.0950719883481915967183 +8881 0.0027287925508246045278 +8882 0.0557064858066866341879 +8883 0.1203274413928494129822 +8884 0.0595177227717718429489 +8885 0.0621309125171923895548 +8886 0.0818886132821692591666 +8887 0.0661205086827406224304 +8888 0.0792382277119935696241 +8889 0.0390441624381414956191 +8890 0.0661487486101955679541 +8891 0.0748427825463443135989 +8892 0.0169564110974319941227 +8893 0.1797351730160718086271 +8894 0.1300463768412805365404 +8895 0.2410997729193522787217 +8896 0.1693914907666173863543 +8897 0.2374598022514708695052 +8898 0.1524488264447918239863 +8899 0.2469267257873430654325 +8900 0.1950725105781999646481 +8901 0.1369462049268280745551 +8902 0.0691753479245899860484 +8903 0.1985720429709348644476 +8904 0.0082313413383043598781 +8905 0.2006994131762021016385 +8906 0.0227618217646265094678 +8907 0.2060906720082924803439 +8908 0.0950081856820145925768 +8909 0.2064949037976503698477 +8910 0.1249390385367699779495 +8911 0.0450734056028104165814 +8912 0.1445876263736547151506 +8913 0.0230244820909837298595 +8914 0.0454299758802251055223 +8915 0.01787916891868806743 +8916 0.0006406441702845708415 +8917 0.0489643263730233438413 +8918 0.1341774305868304872913 +8919 0.0614703046078635614857 +8920 0.0729996143539159286773 +8921 0.2014152378267811405177 +8922 0.2634542016191506741407 +8923 0.2232522347626289815903 +8924 0.014843010465368801018 +8925 0.0414998745924296391641 +8926 0.0600805665724093510005 +8927 0.0326345582058278141369 +8928 0.0112616705901575769877 +8929 0.0004516774742378030995 +8930 0.0012134757343348178803 +8931 0.0008006340121037254032 +8932 0.0019810997872627281527 +8933 
[data-file diff omitted: ~1,280 added numeric values, diff lines +8934 through +10210]
+10211 0.1509094709916055376819 +10212 0.1713711393605333310752 +10213 0.1350693652385464849086 +10214 0.1238494139678416039452 +10215 0.1306068375832213079857 +10216 0.0942901855998562188343 +10217 0.1337901685820544750438 +10218 0.0835921629969760826562 +10219 0.0366484803689541630001 +10220 0.0181731431156448609843 +10221 0.0517057397901294416021 +10222 0.1526366897551075518802 +10223 0.0980398586009871425562 +10224 0.0251979868984783805796 +10225 0.1383283905698171889487 +10226 0.1178362701135518691986 +10227 0.1627543948527981620966 +10228 0.1624873877235728514279 +10229 0.1533803325668131045667 +10230 0.2109035450786798937539 +10231 0.1537116522208703539576 +10232 0.134200592564937354334 +10233 0.1366736127805428802073 +10234 0.1816710232290486282114 +10235 0.1576754943608234815766 +10236 0.2412925718067542024681 +10237 0.1941353758005301688438 +10238 0.0158009100639610045036 +10239 0.1545919150896781779636 +10240 0.1683448603413743305257 +10241 0.0299401019773914874034 +10242 0.0798133875921868546355 +10243 0.0926158087763158943684 +10244 0.0785597577971703514566 +10245 0.0964523899353751379415 +10246 0.2659590535900488994514 +10247 0.2435117431031727508461 +10248 0.0657759356361182212503 +10249 0.0969512713335537179571 +10250 0.1526087889066483627154 +10251 0.0668747752056529781717 +10252 0.1708542426254030244248 +10253 0.2303119350672376552946 +10254 0.1823753723746990573762 +10255 0.1246414999102828380373 +10256 0.0133311210334132769495 +10257 0.0182289617350873435486 +10258 0.0327920737258485875398 +10259 0.0031486397550241065554 +10260 0.1986545906577934506032 +10261 0.0002656059431829995592 +10262 0.1058848239379956007733 +10263 0.1926080219002401228146 +10264 0.1985065100166871765452 +10265 0.0961094997486324209568 +10266 0.0161828883082159175055 +10267 0.1546273471115928721531 +10268 0.0428560273286544210491 +10269 0.2401903876595638531732 +10270 0.1335791356430102716502 +10271 0.1691414749877933332289 +10272 0.1356891035021354496859 +10273 
0.2282914217891059138754 +10274 0.1224782722483608587982 +10275 0.174551728181041421184 +10276 0.0738037450631337105245 +10277 0.1525562851598606106673 +10278 0.0039081349722827965487 +10279 0.1889070240774610165424 +10280 0.0738383586441969802516 +10281 0.1525698444239527162836 +10282 0.0195187417324171573474 +10283 0.120177692076687062106 +10284 0.1657602331775315240847 +10285 0.1763736990823132066986 +10286 0.0362665448690029715295 +10287 0.1316741059412499248449 +10288 0.1543242187770126339075 +10289 0.3350483442177141779617 +10290 0.1993015649959750112963 +10291 0.2169114378128242370813 +10292 0.271182690975210138884 +10293 0.0859580282123211247836 +10294 0.1507194895735408657345 +10295 0.081129732263350573529 +10296 0.194146054056534761445 +10297 0.0409682400872337648678 +10298 0.1930365070172465058462 +10299 0.1379802990498113501872 +10300 0.2295896424221383347319 +10301 0.2009941432485133350205 +10302 0.0561262397363408716 +10303 0.0245750495418404872605 +10304 0.2825273686999131284026 +10305 0.0773380501174079498794 +10306 0.1149002198114291717346 +10307 0.0695994868021339724296 +10308 0.1530148309924324045994 +10309 0.0870740493223875616779 +10310 0.1239943501300822537026 +10311 0.0969843587735302881114 +10312 0.1649673084924125343598 +10313 0.1265238570016113117678 +10314 0.1873873191109230451623 +10315 0.0956880525190112185552 +10316 0.1724985066946772838126 +10317 0.1333755925736835601381 +10318 0.0262147378700560221287 +10319 0.1731659264523979224748 +10320 0.1847832195898243545784 +10321 0.2970413359560905020018 +10322 0.083527043738906084358 +10323 0.1717302763945188215366 +10324 0.1481078653821545632052 +10325 0.0567028578207449465709 +10326 0.1189716631052753353925 +10327 0.0896370160416601613473 +10328 0.1290355235097405128819 +10329 0.1468586635615663771315 +10330 0.1905790504227697357287 +10331 0.2118958012643925303831 +10332 0.1085844568457085268509 +10333 0.1283247869827617848593 +10334 0.1334802620461255640016 +10335 0.1308234401678730340901 
+10336 0.0219666912416795374885 +10337 0.0094996491630685454127 +10338 0.2122405533295132484817 +10339 0.1316161215107699289373 +10340 0.0630484483309857318689 +10341 0.1399780618258120723496 +10342 0.0014778618133139216542 +10343 0.0448378918112395871698 +10344 0.1140191654822647615575 +10345 0.0642303676058293832041 +10346 0.2791354832080760739466 +10347 0.0076871472710875257683 +10348 0.1786271353276323303749 +10349 0.0965647513107793092901 +10350 0.2924862562922952435862 +10351 0.2462084176097802667993 +10352 0.1770706580147976549888 +10353 0.0957721657725423697327 +10354 0.1852658592522408165237 +10355 0.2446494523529957609842 +10356 0.2900259519966208454633 +10357 0.1767876008941856036394 +10358 0.2043988647974269989849 +10359 0.0562311482572719623096 +10360 0.0562854013200537253714 +10361 0.1451719837900817544618 +10362 0.1871130175039452114838 +10363 0.1239547325829638557693 +10364 0.1761601431454807298316 +10365 0.2483720258307285377875 +10366 0.2637478146184376592842 +10367 0.1252762759246145995995 +10368 0.1508898540332951621057 +10369 0.1078520965285766908259 +10370 0.1619758805803349632768 +10371 0.0112452725889604296383 +10372 0.1066652210267149819556 +10373 0.2111057927667629730006 +10374 0.1081398784134021823222 +10375 0.1150493844227767903377 +10376 0.1812006274682063378112 +10377 0.1311876031406577214344 +10378 0.1840795310556487263121 +10379 0.0799685975979418373072 +10380 0.1258487851651018674115 +10381 0.2891927049495857660055 +10382 0.0725192572971960630612 +10383 0.2195568952309603083517 +10384 0.0577810921776477845468 +10385 0.0840919112679776037389 +10386 0.0511254555214646355665 +10387 0.0006632224864630002943 +10388 0.0549714163151343582214 +10389 0.1565061781353953296314 +10390 0.079458892638170047773 +10391 0.1717553174330957999061 +10392 0.030088642344011821278 +10393 0.137471237195314566204 +10394 0.2114479775448294407436 +10395 0.1051483888164348318162 +10396 0.1781167254310599690204 +10397 0.1557937812816105049851 +10398 
0.1644107208579102252788 +10399 0.1311104249620748118943 +10400 0.1528632912707621194226 +10401 0.123478728451627675633 +10402 0.0931465011915311308366 +10403 0.103199644326488759738 +10404 0.1754881869714297881746 +10405 0.0143909468771808678189 +10406 0.1334928390682002530276 +10407 0.2109511267135775192116 +10408 0.2822011832623634863815 +10409 0.2421827451366067029959 +10410 0.1096996120361997395509 +10411 0.1725363069667291970877 +10412 0.2436837224463793416529 +10413 0.1501816682564551252987 +10414 0.3131331875576994772103 +10415 0.1822897519902744734033 +10416 0.1442493942327275624482 +10417 0.0784871016437104357388 +10418 0.2012995157480652053117 +10419 0.1595638859882531623846 +10420 0.2105632024982996286511 +10421 0.2270598713670791690689 +10422 0.0713797747589285797032 +10423 0.1152490534305580099916 +10424 0.1064422378909874455921 +10425 0.1493261722406484248715 +10426 0.0219152928518290233961 +10427 0.0104590591351549095506 +10428 0.272817772651692047603 +10429 0.2243683419706411497074 +10430 0.1515999221358252102565 +10431 0.2464301778116344543434 +10432 0.1216447177818577723984 +10433 0.1755087834252983813865 +10434 0.1813434761958249430425 +10435 0.1573606125600074678861 +10436 0.1569283836165892842551 +10437 0.1652226157758886759819 +10438 0.1073340200556509443919 +10439 0.0848814572544025897072 +10440 0.0134944916165448732664 +10441 0.0680986182018456787102 +10442 0.1978251351366967503154 +10443 0.0108687090556989617712 +10444 0.092683459251710731408 +10445 0.0400035640967645114707 +10446 0.0079809574703396911344 +10447 0.1512289964168208877826 +10448 0.2276131607752815699808 +10449 0.1044890676014328112498 +10450 0.0986484528587936310151 +10451 0.023327323194223942987 +10452 0.0424293330522021241991 +10453 0.0028350113119521452333 +10454 0.0208979552183389365172 +10455 0.0820354441476270068723 +10456 0.0921464033094878259833 +10457 0.0517977983589842580492 +10458 0.0027059062600381061751 +10459 0.0057481035130205436479 +10460 
0.0690503312538482405536 +10461 0.0446677888913329795839 +10462 0.0214025577246721859792 +10463 0.0674673961855788567599 +10464 0.0112789706190148351916 +10465 0.0037687880524863780034 +10466 0.0182830768159775967907 +10467 0.1611823250558335474114 +10468 0.209781625497869522734 +10469 0.0583645908350450132174 +10470 0.0640239313998230280545 +10471 0.197275192770465512826 +10472 0.0491193026175120381338 +10473 0.0589627575879349344334 +10474 0.251647247618697555982 +10475 0.121816102034047427094 +10476 0.164503841944366235861 +10477 0.0866546009182491555611 +10478 0.0996084134923413966334 +10479 0.3556402399258545843352 +10480 0.0849707014250378062092 +10481 0.1782759870632331933304 +10482 0.0650475029651978314815 +10483 0.0502717576050509123986 +10484 0.0567117739219787572735 +10485 0.1011884720473249926753 +10486 0.1200049925299151998281 +10487 0.3254177043172663541526 +10488 0.1229584580059452647571 +10489 0.0130998327010346474891 +10490 0.0277552202345127743688 +10491 0.1567986467890526147695 +10492 0.1592030817138300746727 +10493 0.0636263249845854944597 +10494 0.2734002258588315026344 +10495 0.0370057414582603896869 +10496 0.1869960758676947332191 +10497 0.0112902004734443325851 +10498 0.1652266225701510271584 +10499 0.1478053121193214636264 +10500 0.0344482903570240209246 +10501 0.3407576021211349837081 +10502 0.0898592464010294494914 +10503 0.2174433155049939891956 +10504 0.1897202002633519013486 +10505 0.0742785287658941917321 +10506 0.4656757212334676765408 +10507 0.0172259709435032271063 +10508 0.199887032936105518699 +10509 0.3132682240582040922305 +10510 0.03250124796063783017 +10511 0.1258777799396842700386 +10512 0.1722412861788296112309 +10513 0.0041609957475024720447 +10514 0.0215084927864102812278 +10515 0.1441420209678476171966 +10516 0.0196892078324299828196 +10517 0.1390429405001236684569 +10518 0.2910071474863768492547 +10519 0.0468145770401572763886 +10520 0.1892523747033483527691 +10521 0.2050879242726948992193 +10522 
0.1571647007542794349799 +10523 0.2082665541348754922879 +10524 0.1090437874370733806195 +10525 0.27954610068115476329 +10526 0.0187444468800137820086 +10527 0.1790002675554514255385 +10528 0.0275734189328443214662 +10529 0.1124965591614841137202 +10530 0.2682705319758569850741 +10531 0.1192008861686391502088 +10532 0.0106027171923234794088 +10533 0.2409756904814446820673 +10534 0.074911833459277057945 +10535 0.2680481032147360997975 +10536 0.2322745386233147912858 +10537 0.0333154971547640968432 +10538 0.1161344610380880687517 +10539 0.1492186917136057999755 +10540 0.2059806608912814751555 +10541 0.1000860995144105675125 +10542 0.0811328757953136275871 +10543 0.2932982319429113249321 +10544 0.1211866549477938481694 +10545 0.3592497684979893413093 +10546 0.2585329807827181913815 +10547 0.011890916811229897132 +10548 0.1505176747286164351536 +10549 0.0715182872472034936173 +10550 0.0059825810559677138067 +10551 0.0241784084238284187329 +10552 0.0409609700013165156651 +10553 0.0856922311068947645252 +10554 0.1207310885098846903185 +10555 0.2554928727615564665321 +10556 0.2599148354477536226881 +10557 0.4177066894855241407924 +10558 0.0004867458800217712368 +10559 0.0083989321209835422072 +10560 0.0662596266126911220828 +10561 0.0048516594489053984501 +10562 0.0028562356834031660258 +10563 0.0200374546328614747126 +10564 0.0047615841312902123378 +10565 0.0013423489665599602598 +10566 0.0888987620613408413561 +10567 0.1263637316648516251139 +10568 0.0818256308611764043848 +10569 0.0953694647683482749079 +10570 0.0745366471435111588306 +10571 0.1455762377423251707764 +10572 0.1470264468030917648367 +10573 0.0380592585566538604835 +10574 0.0334023824832769614113 +10575 0.1154421413699943110842 +10576 0.1458977588347875331554 +10577 0.1906023841841877897174 +10578 0.1127376597856771384132 +10579 0.1585467380238854417929 +10580 0.1217560244475175135204 +10581 0.146698188102494658791 +10582 0.0139993352122526927561 +10583 0.1668880672691504807315 +10584 
0.0838782978486737051949 +10585 0.1420115258673775027187 +10586 0.0874755451661559241261 +10587 0.1311031375023135048252 +10588 0.0304193742986266293937 +10589 0.066556398251056386739 +10590 0.0483229067140639664824 +10591 0.1277688962344696033124 +10592 0.1338297808213959050327 +10593 0.1810709189425968868026 +10594 0.0627342354406831698155 +10595 0.1531477635002724824176 +10596 0.1139633460801393050055 +10597 0.1084477173561030766136 +10598 0.2284308785914455908195 +10599 0.1727818546741789196286 +10600 0.1344680255628247500521 +10601 0.1309314698388656661443 +10602 0.1948337373251347592884 +10603 0.1632570268272000579834 +10604 0.0428220649191738689243 +10605 0.1446695249418231410399 +10606 0.0872132952438560288266 +10607 0.0705048341039508225103 +10608 0.0833214692723726335633 +10609 0.1430209418292990308785 +10610 0.1211150876555264072065 +10611 0.0917851198651245259175 +10612 0.1126242385403006251998 +10613 0.1702582657800145171656 +10614 0.1526746345759584233548 +10615 0.0821682867970197627905 +10616 0.1595021645349715966322 +10617 0.0625107882441308659871 +10618 0.1389306232466676060966 +10619 0.101902749333700415324 +10620 0.1788216658349138232786 +10621 0.0524411584003027092526 +10622 0.0825118833785322275398 +10623 0.023098563842528129636 +10624 0.0773207804246255347991 +10625 0.1181929844699235038208 +10626 0.1334638622305504840337 +10627 0.0900356367780942440371 +10628 0.1815197175706255716676 +10629 0.1267894765702480552871 +10630 0.1379157665562804613302 +10631 0.0556615353705869342349 +10632 0.193304296808737202662 +10633 0.1782171084565887797524 +10634 0.0798334504884986306816 +10635 0.117791665074553533743 +10636 0.0539772348010196523216 +10637 0.1611140322123232604401 +10638 0.1418704218671887340442 +10639 0.0876674631961585348883 +10640 0.053485565194283310464 +10641 0.2270594465542840612038 +10642 0.1208315275938831728775 +10643 0.1953887440803988850835 +10644 0.1131021992164565614924 +10645 0.1280514359549511649305 +10646 
0.1911421152325180339293 +10647 0.1085999612688771792657 +10648 0.2272511894240716034954 +10649 0.1170829428796329035878 +10650 0.0029014325120817335835 +10651 0.0125267816135549055201 +10652 0.0164409535265081022115 +10653 0.0273073521728509696627 +10654 0.0037739206067500963987 +10655 0.0233763536890576953398 +10656 0.006341488169051854272 +10657 0.0138743733230671448253 +10658 0.0791897727253541988679 +10659 0.0006010473676060290116 +10660 0.1178444823573299299335 +10661 0.0032413893698858635938 +10662 0.0121344746729443869748 +10663 0.0140039186046347144399 +10664 0.0350659851292266580569 +10665 0.0018433022016750356172 +10666 0.1738768077750950658 +10667 0.1601061424112658337915 +10668 0.0160124057133060083979 +10669 0.1246756876902711252963 +10670 0.1814049407087318377396 +10671 0.149245743918694384611 +10672 0.1098214210061630585757 +10673 0.060341165903824545147 +10674 0.1695537939372631086599 +10675 0.1923231780283950753674 +10676 0.216192393165744789485 +10677 0.0898277374414186979834 +10678 0.0572421996013227907252 +10679 0.003277835753809389685 +10680 0.3681350670422727233699 +10681 0.1042815442600485281721 +10682 0.1860194289710013215711 +10683 0.1685227743422690049879 +10684 0.1267902240509303224414 +10685 0.2835911858655483230862 +10686 0.150725910551014585792 +10687 0.2074237984613548635426 +10688 0.0075094382741584426999 +10689 0.1201021558488698792733 +10690 0.0322020660350077758682 +10691 0.0801506623376766896838 +10692 0.0050951027407922077489 +10693 0.0009178968021828513629 +10694 0.0273485634189215703893 +10695 0.0236293271018169492859 +10696 0.0132857865014355536193 +10697 0.0209460025000829698494 +10698 0.0024894856930937842368 +10699 0.0049229064079111799906 +10700 0.0469423563201851815063 +10701 0.0589669398821453943915 +10702 0.0668390613481183865074 +10703 0.0024074921354417045186 +10704 0.1115099506389160693987 +10705 0.3133892407011057978927 +10706 0.2703103565475675207352 +10707 0.0830090930402086618622 +10708 0.1848304241481820431314 
+10709 0.211402173697775780381 +10710 0.1539359161898975481009 +10711 0.056362118043379330623 +10712 0.0563371860787408793025 +10713 0.1809735978313489757063 +10714 0.118309985939956022194 +10715 0.2590859201988678917061 +10716 0.1912444649356500325776 +10717 0.0187258359841209615082 +10718 0.0799984519893280204839 +10719 0.1604472357600762733032 +10720 0.1199927769170564956225 +10721 0.1354172137050197177732 +10722 0.1153550004066816597748 +10723 0.1485958911364099188646 +10724 0.1497491177579284571397 +10725 0.2161144806108308558112 +10726 0.2340393770550959484655 +10727 0.1390857296258131148292 +10728 0.1181103639164412522078 +10729 0.2245389913565656037076 +10730 0.1441450235230041232448 +10731 0.1937944098166675677142 +10732 0.034011957732508805996 +10733 0.2294550254965656566064 +10734 0.2293198473522563007876 +10735 0.1845132763250866281268 +10736 0.1859652445657801467238 +10737 0.1990598830706707678129 +10738 0.1955238349235002559556 +10739 0.1927318174314110454937 +10740 0.1867399070052860321489 +10741 0.2017627301634287440013 +10742 0.0892002315067008388905 +10743 0.2001860836676694899161 +10744 0.0226805088692505381343 +10745 0.1388730678442574306697 +10746 0.0902681789355850722201 +10747 0.178121816717508762995 +10748 0.0192093220559752184129 +10749 0.0029088201232265018144 +10750 0.1600493748800642934693 +10751 0.1286074351144176897144 +10752 0.1419376397644698251099 +10753 0.1508413348674641218938 +10754 0.2168383728501075968786 +10755 0.1104806102481906138202 +10756 0.145034127393855932775 +10757 0.2331930795165603198527 +10758 0.2497259135313544542445 +10759 0.1241119653809344969275 +10760 0.0649415489345029212087 +10761 0.0690747780592022575519 +10762 0.0225205447310945425232 +10763 0.1313852320151557007133 +10764 0.1339597008965716373741 +10765 0.1245144523852322604451 +10766 0.0450594844488023116291 +10767 0.1780602278824546524927 +10768 0.1618088029729192189876 +10769 0.0513658505249441066365 +10770 0.0142121791166484167518 +10771 
0.2279960164371989950194 +10772 0.041334837579062345958 +10773 0.0125588902166699663276 +10774 0.018995797666693636091 +10775 0.0334566048742325969356 +10776 0.0253424478850599009527 +10777 0.0205978935541534548259 +10778 0.0017361381657532717025 +10779 0.0103732297632674157206 +10780 0.1754368662088821861911 +10781 0.1696735728417333999296 +10782 0.0192864296886771750816 +10783 0.2884859157319214495452 +10784 0.0110605242183174374421 +10785 0.1689963126940555726563 +10786 0.0118776023883968018247 +10787 0.0077974232942634282353 +10788 0.1897903604721366821462 +10789 0.1869106659715450979853 +10790 0.1188595245220709362144 +10791 0.0433783781109046695512 +10792 0.0608379364600384539563 +10793 0.3130060957590903303327 +10794 0.0805903457609357248836 +10795 0.1515014580470445126892 +10796 0.1453886219148880498686 +10797 0.1571539352407783085575 +10798 0.2859883138404030744617 +10799 0.2601935227495038582823 +10800 0.1013475267453808476015 +10801 0.2726999583987570474619 +10802 0.0991590165447028687451 +10803 0.0306925316390619928875 +10804 0.0010994815921077619354 +10805 0.1671877992610242280591 +10806 0.144283706930507710231 +10807 0.0574262643596717414773 +10808 0.1875746883620567073692 +10809 0.0047470020245030299794 +10810 0.00489226102685464629 +10811 0.0076232019137841304166 +10812 0.0289939376975210001686 +10813 0.0088927665793596920307 +10814 0.216166508046580718716 +10815 0.0783274685354332228515 +10816 0.0247316283840282695461 +10817 0.1300245862080424141816 +10818 0.0021144745323358540845 +10819 0.1251721336025045649976 +10820 0.0697584007151447471617 +10821 0.0801448305684659828563 +10822 0.1546297642472987954321 +10823 0.2041972651418667572543 +10824 0.3695521533148148907166 +10825 0.069902282788936925817 +10826 0.0249488146393163010006 +10827 0.0028912275658606184647 +10828 0.0106333433257263104688 +10829 0.1127698221735601163962 +10830 0.0421775980044993073492 +10831 0.0087740231280413558818 +10832 0.3576191077925123806835 +10833 
0.0648792965306137225934 +10834 0.0062188993361810803243 +10835 0.0024161822167782726414 +10836 0.0038083094613043143653 +10837 0.0067180220126745295375 +10838 0.009950332304098377656 +10839 0.0060727496977864333511 +10840 0.0046455224845593952321 +10841 0.0037270838932329016652 +10842 0.0031667406010572455612 +10843 0.2205957045114982639777 +10844 0.0063953583306456751947 +10845 0.1150603236463643430643 +10846 0.0958828811076629511367 +10847 0.0073260030001798840879 +10848 0.0038224191182639208689 +10849 0.0050889282093766691553 +10850 0.0075219916877178700718 +10851 0.1227558029824011576192 +10852 0.1070081816571444299768 +10853 0.1368838301365271348242 +10854 0.0041331466878632367248 +10855 0.0633440234821328190806 +10856 0.1807207210541365294265 +10857 0.0064446827312218384759 +10858 0.0118233936599972441067 +10859 0.0146608754286513889853 +10860 0.0242045603107103601825 +10861 0.0073386414362319090168 +10862 0.0494139735919359723848 +10863 0.1484588569679064495865 +10864 0.0011181432912390348866 +10865 0.1866750306412496285269 +10866 0.0725874103943275339468 +10867 0.1860429219974387704539 +10868 0.0126516522925586780146 +10869 0.2357283024944059335315 +10870 0.0613904431131447880787 +10871 0.2371159822995498323017 +10872 0.3253525467982418817314 +10873 0.2833459500138735265296 +10874 0.2370041189776294077163 +10875 0.1012351711177342517001 +10876 0.122656900666130108668 +10877 0.0173017313827225072131 +10878 0.074386803742766308134 +10879 0.0187951441645419076165 +10880 0.0969927029072727292514 +10881 0.3165059989673122409215 +10882 0.2821432275927054922349 +10883 0.2068017448032681837589 +10884 0.0520399636550026994941 +10885 0.0997631281726480412253 +10886 0.2185478017636682812608 +10887 0.2688032769780234199075 +10888 0.0221861653025933157346 +10889 0.0048755625709366668188 +10890 0.0085915400104023594702 +10891 0.252532230245588018569 +10892 0.0055755825806510115814 +10893 0.0097912943261605558942 +10894 0.0032110477131958707099 +10895 
0.1368072844183486069802 +10896 0.0038917990296466089921 +10897 0.0038368265717368742022 +10898 0.0107578250195982799697 +10899 0.0110746682084889615238 +10900 0.0063544679904252061864 +10901 0.1029895733173971095198 +10902 0.0861889476991874459166 +10903 0.1179958809018363669052 +10904 0.2840162827533911360511 +10905 0.1560476684251518031399 +10906 0.1073753069223178091551 +10907 0.0936431947333716263593 +10908 0.0833101356498882339441 +10909 0.135171855615129149486 +10910 0.1557360934090481763281 +10911 0.1175095766692126481034 +10912 0.1755861457392798585797 +10913 0.1848686697117125354861 +10914 0.0769681094422441508529 +10915 0.1127225387018799124261 +10916 0.2181702495280080089657 +10917 0.0919083801200037409096 +10918 0.0894239902994851121321 +10919 0.0357432351516282983961 +10920 0.177298986675664405066 +10921 0.1644721596969108412534 +10922 0.1101477557052633754475 +10923 0.1752200284163155374895 +10924 0.1333321425763448475621 +10925 0.1951287025563168409104 +10926 0.081996713692437633858 +10927 0.1797855754644257231245 +10928 0.2403090619440876873902 +10929 0.2420060226066546305024 +10930 0.0630896077498738078182 +10931 0.0044282573325478448434 +10932 0.1675626503152725377266 +10933 0.071892653274521253759 +10934 0.1330610143894299657319 +10935 0.0862513719857383454048 +10936 0.1417944233036131929104 +10937 0.1508576962063960524851 +10938 0.0720924279911504817253 +10939 0.2115327007871348197199 +10940 0.1543609561137938568365 +10941 0.0761313980086493202748 +10942 0.1969998478504316197224 +10943 0.0551421103399615947227 +10944 0.0886633269908813081983 +10945 0.1881709043104153600279 +10946 0.1800339090824152610271 +10947 0.1459023766278403910768 +10948 0.0683451777744084332245 +10949 0.1091319384712569617868 +10950 0.0571271200201842396793 +10951 0.2158245684467988978472 +10952 0.0679443798608264371008 +10953 0.0493442718200758009828 +10954 0.1034213049848854704127 +10955 0.0063285794842786406017 +10956 0.0283436120044860302702 +10957 
0.1422009403886989320487 +10958 0.2787894316072752620705 +10959 0.2122200954743155287829 +10960 0.2222649269951503248066 +10961 0.2506364533888595924616 +10962 0.2329724451111270799863 +10963 0.2495300361029131885893 +10964 0.0733221000162709302117 +10965 0.2209875365560864113323 +10966 0.101698133561056960894 +10967 0.1756694219740731477408 +10968 0.1030032299832903397441 +10969 0.1185373141889202591726 +10970 0.1148866564815485963313 +10971 0.0309903355770438510008 +10972 0.1863235804348170876477 +10973 0.1681401894209471337138 +10974 0.0670032423503299501277 +10975 0.0278541914870658949277 +10976 0.1475935373350675816617 +10977 0.1840563467637631456242 +10978 0.1754989957587495541524 +10979 0.1827750843835528049564 +10980 0.1889073194237209651902 +10981 0.0245168649194995556806 +10982 0.0429190820943639861351 +10983 0.1661581727926417506591 +10984 0.1857020226608000201818 +10985 0.0861254357935910125343 +10986 0.2031774478918018933538 +10987 0.2166289127934064862036 +10988 0.1494575545354938639342 +10989 0.0179896026852295887188 +10990 0.0546700690053249505063 +10991 0.1641436051834312404019 +10992 0.2003681334250041468081 +10993 0.1702039074603941271047 +10994 0.0995044498722695985871 +10995 0.0748862895321938670801 +10996 0.1619177152291882970658 +10997 0.170185484421143895517 +10998 0.1199654555622141627325 +10999 0.1076089205779661578388 +11000 0.1415378429410550642498 +11001 0.1274048979916551638247 +11002 0.0389124874974473705658 +11003 0.0589012239092544867547 +11004 0.1114502570394669545006 +11005 0.2300726915720923693165 +11006 0.1038298338255989883283 +11007 0.058739324075153226401 +11008 0.2732786346026630508455 +11009 0.1007222231340264007038 +11010 0.0975319821034276723815 +11011 0.0997303351595791864881 +11012 0.0543409246890465130742 +11013 0.1966951392368414541778 +11014 0.234140196723186394534 +11015 0.1765861217801882043776 +11016 0.319660411427688195829 +11017 0.1963650230568794174957 +11018 0.2073194984033019938607 +11019 
0.0532719489492221379145 +11020 0.0739177230065780066504 +11021 0.1795130493917034275597 +11022 0.0488371574660801180401 +11023 0.0377229135980790900029 +11024 0.0479411288295002180337 +11025 0.0868147361840509029784 +11026 0.1305681493297707052026 +11027 0.1798559164780621499879 +11028 0.1815246663783538771941 +11029 0.1596563379976242147418 +11030 0.1042279628113816775103 +11031 0.2256890814449392657082 +11032 0.1274595597133718061222 +11033 0.2239532751474485328114 +11034 0.2156831035620089542792 +11035 0.2343981162016072483656 +11036 0.3070073399299533822848 +11037 0.1486014361361301383546 +11038 0.0757289101887394511259 +11039 0.193051751385089509494 +11040 0.22117386202382494087 +11041 0.1503442628742011344389 +11042 0.1640388061092835081389 +11043 0.2524514784630543240418 +11044 0.2272298736902714222374 +11045 0.0160062052262797613333 +11046 0.0022171105384256955227 +11047 0.0159754864829767979917 +11048 0.009624237235185725825 +11049 0.0292287046218332249858 +11050 0.0022798235059812419802 +11051 0.0036964300714951736117 +11052 0.2549293099054139588588 +11053 0.1620552891366030257014 +11054 0.1093360433524033226149 +11055 0.0542909200697666710633 +11056 0.0896293301171808970418 +11057 0.4141671299949893914061 +11058 0.0364543808267442370763 +11059 0.0078796960540383903515 +11060 0.0135136674808218198995 +11061 0.0050026594994362480126 +11062 0.0560786102715389450646 +11063 0.0055724864013356953385 +11064 0.0761768432644743809234 +11065 0.0150138994994176849795 +11066 0.0088229243414580556248 +11067 0.0123560648996448637499 +11068 0.0371860124362729041469 +11069 0.042665817452018223177 +11070 0.1401938147750847463158 +11071 0.0663323250992071677956 +11072 0.0633232230830637204955 +11073 0.0273471802847187785146 +11074 0.0628899796857359111302 +11075 0.2544951468792394089213 +11076 0.0842101440114737531673 +11077 0.164883727859770484736 +11078 0.1188747000519361218807 +11079 0.045461373008375702276 +11080 0.0083241753664364190324 +11081 
0.0753705372721201333208 +11082 0.0257489459300900523642 +11083 0.0011626814169271380405 +11084 0.1384129890658335193532 +11085 0.0746377826684770789845 +11086 0.1237723582878757916381 +11087 0.2437538275834109646745 +11088 0.0366045716849553448569 +11089 0.4511803442010767217774 +11090 0.2486000434270057846131 +11091 0.1243566761287067390374 +11092 0.1447630393989903230256 +11093 0.2121526214243555363836 +11094 0.0850486455604010094245 +11095 0.2411529444752705531041 +11096 0.1866739773835116578127 +11097 0.2018900887214092054744 +11098 0.2180774387669623670671 +11099 0.2111683675690630146438 +11100 0.1652818804784274009378 +11101 0.0150543634556484402787 +11102 0.0010024358893505013173 +11103 0.0240409384431559403072 +11104 0.156246421734362234135 +11105 0.1108277863353412800995 +11106 0.0977097226961807369117 +11107 0.0980015678013289037729 +11108 0.1362714642664153152918 +11109 0.1447243702381152685721 +11110 0.1579282581453333167421 +11111 0.0731605562969685208063 +11112 0.2832494696475988393836 +11113 0.107885900764374598193 +11114 0.1585688230509192386819 +11115 0.1648033149729629665003 +11116 0.1623851434560596440626 +11117 0.1304187524238092921181 +11118 0.2234046353793088246498 +11119 0.16306296881371210028 +11120 0.2725637791798047326886 +11121 0.1538634955849357555113 +11122 0.1351190160178063204288 +11123 0.0546088045084897749648 +11124 0.0911355636217711095881 +11125 0.2308303011407322746784 +11126 0.1036206515037417091873 +11127 0.1349222159433950685159 +11128 0.1222508925284485070684 +11129 0.1253080343696766907335 +11130 0.0485988269211257845059 +11131 0.2314720257172969974935 +11132 0.0608726603346907840786 +11133 0.2579821017640902547363 +11134 0.031517968426523851333 +11135 0.1279987662124228986471 +11136 0.0905953462116946411919 +11137 0.1441181636649390696014 +11138 0.1473981322386181680439 +11139 0.0816108733049210055555 +11140 0.072718133056571612749 +11141 0.2059072352734529764984 +11142 0.1711141594611025606998 +11143 
0.138006641823240017164 +11144 0.1960444454701678329567 +11145 0.1388856627436211021998 +11146 0.0883409798940262974698 +11147 0.2739436353055280659596 +11148 0.0037853615031382373442 +11149 0.0567361541943842348501 +11150 0.0167722867646189939539 +11151 0.2379889832585856679348 +11152 0.2503640020848546532406 +11153 0.1891245134411955186593 +11154 0.0096766242441421075615 +11155 0.1237604377854112369661 +11156 0.1544013118793635397363 +11157 0.1356530717934048002604 +11158 0.1188932862956439406865 +11159 0.0525044193567343123186 +11160 0.0195077235948235387653 +11161 0.0584274647624794460365 +11162 0.0589315329336813492622 +11163 0.128284903771975183906 +11164 0.1936422581672885279147 +11165 0.1478290933720776689775 +11166 0.0749860439755317131905 +11167 0.191639047319164784744 +11168 0.1199749310223978127077 +11169 0.189835578596873194579 +11170 0.1466964286630244307386 +11171 0.2504066101384661013896 +11172 0.0643130421720882472503 +11173 0.2714901837943138884768 +11174 0.1098440190022967333228 +11175 0.22196325050947757207 +11176 0.0991450845022220084157 +11177 0.0750458831263402376122 +11178 0.0102389186126078807204 +11179 0.0394834862825286167509 +11180 0.1053212329763942856076 +11181 0.2207370093131718058466 +11182 0.1686797561332046280214 +11183 0.2487959548252349495367 +11184 0.0230869871481609312269 +11185 0.0895652985223539949144 +11186 0.1123577345040902147888 +11187 0.0161830423086010768974 +11188 0.1173757625732365639237 +11189 0.0930018092477157970466 +11190 0.1360027809011435662168 +11191 0.0324017534114106817156 +11192 0.0134265360976362439077 +11193 0.2902654348733384459536 +11194 0.0856774568606429864914 +11195 0.0064348791625517846524 +11196 0.009534870547191293122 +11197 0.050024729918511132154 +11198 0.0178925130400652693252 +11199 0.0109316712884814613849 +11200 0.0076886151439247499284 +11201 0.0025399968820742631175 +11202 0.0455197471862898092598 +11203 0.0098647432450788517799 +11204 0.1521479140197540824708 +11205 
0.1398895535019814007072 +11206 0.060620274271138099853 +11207 0.0073684806639955652968 +11208 0.180265491299047092566 +11209 0.0242673254466761502413 +11210 0.2308351867246955235746 +11211 0.1057263650220631040488 +11212 0.176211025815583066656 +11213 0.1066180634762118051295 +11214 0.2051474007283331790763 +11215 0.0865433235275529927355 +11216 0.2095242182584950152435 +11217 0.0792561942740729885593 +11218 0.2578860935888415140127 +11219 0.2362234118816798345009 +11220 0.1791168079182200445487 +11221 0.074545563362107714056 +11222 0.3397283097727435419166 +11223 0.1545161502079039350654 +11224 0.1552367856375528598623 +11225 0.0119529868847067233251 +11226 0.0715563574821760778333 +11227 0.1073599577419182965476 +11228 0.134564449549273712492 +11229 0.2172794523498110808024 +11230 0.1961591196704676187679 +11231 0.0888621353804864011483 +11232 0.1386088631096635270801 +11233 0.1443970437759444225279 +11234 0.2338642806730687195849 +11235 0.1493918113395577007552 +11236 0.1073390993126251968803 +11237 0.0557631852308763664716 +11238 0.0049479766281727887225 +11239 0.08682731479169131239 +11240 0.0147369106090477364895 +11241 0.1163355783927936260014 +11242 0.0574143797829362104168 +11243 0.3430310852660694020955 +11244 0.0141360454006008683103 +11245 0.0901799227761032884398 +11246 0.2250283321317642082349 +11247 0.1459236764877027414666 +11248 0.1539817660048601877243 +11249 0.0466563703440901825226 +11250 0.0023154272111906332339 +11251 0.2485873009440777436918 +11252 0.0025731095894631472037 +11253 0.0049268505135236116341 +11254 0.0519632448419303297205 +11255 0.1781391086758237762844 +11256 0.2045313860268543615373 +11257 0.0946541157459759446846 +11258 0.2059802547058104393862 +11259 0.1986949325486218331793 +11260 0.1295904909387399217557 +11261 0.0675453026411208939939 +11262 0.1672834744968138831478 +11263 0.3826127284264587702367 +11264 0.0410959839130372178717 +11265 0.1343282436013968350608 +11266 0.1956967930512559517009 +11267 
0.2252059692094203480206 +11268 0.0221068878466905080482 +11269 0.1563962924099933093913 +11270 0.2010191176039299587597 +11271 0.1850282014767535010424 +11272 0.1810340818016514252697 +11273 0.1688611401146798973727 +11274 0.0909097598811478835312 +11275 0.1611093809340465032864 +11276 0.1434046104129882148737 +11277 0.1605019662691136717036 +11278 0.1629960822903208650381 +11279 0.1371686826195400443762 +11280 0.1061021235855863220632 +11281 0.0700645608324429181035 +11282 0.3096518549154310129268 +11283 0.0941075612534689881494 +11284 0.1800894653322889227276 +11285 0.0937046612793854261092 +11286 0.0848125610709568461543 +11287 0.1068957497544152507318 +11288 0.1564260284615183460577 +11289 0.0816503789095592152902 +11290 0.0333110503729750068169 +11291 0.0248508418802191322072 +11292 0.004787555313177152376 +11293 0.0310707663684455445696 +11294 0.3258694620374376071936 +11295 0.1829076933495571155586 +11296 0.0540669057627718163683 +11297 0.1445422685367635795561 +11298 0.1202913286957752875272 +11299 0.0236269209807270473556 +11300 0.2520428042700930126863 +11301 0.0232343177182590919505 +11302 0.0433281159788386563414 +11303 0.0179501731470436964866 +11304 0.0244591838724175607289 +11305 0.0079916412959955777234 +11306 0.0450211106439018585168 +11307 0.1871164655117040775956 +11308 0.0022125143363269771796 +11309 0.0167492222514565337887 +11310 0.0272291741495889992597 +11311 0.0025074435182400886882 +11312 0.1364577033459469790344 +11313 0.2915011967500899525696 +11314 0.2883242070954282199047 +11315 0.2023526135764424915831 +11316 0.1820594534316122559137 +11317 0.1108489195202072702395 +11318 0.1416618250311977578182 +11319 0.0989131555037402465658 +11320 0.0921165621990739647007 +11321 0.063366527218934182164 +11322 0.1563970251912288333962 +11323 0.0026118848877814400226 +11324 0.0130011064786654008396 +11325 0.0408034090756161094138 +11326 0.0034970273379338317338 +11327 0.0333927372641573613765 +11328 0.096463251187833970457 +11329 
0.0587847782007918026603 +11330 0.0742152557973462245755 +11331 0.0039170966485732108819 +11332 0.0046029016004044851981 +11333 0.0254993708287061214557 +11334 0.0288975798105777989533 +11335 0.03690717041108119989 +11336 0.0285550779379527240598 +11337 0.1882317252568388166178 +11338 0.0028541236941546409625 +11339 0.1272954908961913822463 +11340 0.1392135604261471593102 +11341 0.1742388259592841381895 +11342 0.0449072369574740407061 +11343 0.0054314714273536500791 +11344 0.1666473707892929534413 +11345 0.2123429909728052344953 +11346 0.2435351350190927133266 +11347 0.1861108635316283121952 +11348 0.2047045348655621432865 +11349 0.2711689619535358475311 +11350 0.1708759626300707934465 +11351 0.1515493232030055703596 +11352 0.1561457669054075458881 +11353 0.1582153732475749807751 +11354 0.0017282896388651071244 +11355 0.1393303829903553836544 +11356 0.1547705401869264862924 +11357 0.0158525784322423660133 +11358 0.0173502171826623724504 +11359 0.0325460615697414495928 +11360 0.0080176009069186371381 +11361 0.1582414754904319653672 +11362 0.1062843858455126966334 +11363 0.2056019369394314622035 +11364 0.0082040374151243922879 +11365 0.1616133764631407787604 +11366 0.1271733913502643420479 +11367 0.1648428466724134733301 +11368 0.1830730056746188993966 +11369 0.1241200295119796598309 +11370 0.0348891223262233693414 +11371 0.0769792831407751787509 +11372 0.0928267179474632914138 +11373 0.0754642298589590643232 +11374 0.1160962737577704989578 +11375 0.2084573373951330299558 +11376 0.1068939849653054163792 +11377 0.1045855158218311764529 +11378 0.0723817229189240990417 +11379 0.2187517676498107466188 +11380 0.1937581850176645359429 +11381 0.208012981971469751441 +11382 0.1898704895430567696213 +11383 0.0904238696597449015391 +11384 0.075767194749044211588 +11385 0.0685879603619775068957 +11386 0.1170174746091242612422 +11387 0.1254940978820524022375 +11388 0.1795848367337511963804 +11389 0.1781038064736174542091 +11390 0.1724569176830963723734 +11391 
0.2828917257696642595377 +11392 0.2196948914811534536717 +11393 0.1462170450782914743471 +11394 0.0012301432879662590764 +11395 0.0116774402126921181266 +11396 0.0043532868475383576859 +11397 0.1073645273119751691882 +11398 0.250901896003875968244 +11399 0.1456982544203218321588 +11400 0.1411344714471502170827 +11401 0.0906686132924155591128 +11402 0.0294553109364202696241 +11403 0.1625921130075843601404 +11404 0.0155358827103202826181 +11405 0.0042841585116648343498 +11406 0.223203219684257575528 +11407 0.3062193749056863478764 +11408 0.0081396427325127421876 +11409 0.0956711929595947530025 +11410 0.0129205791493700022227 +11411 0.1122117956313344044883 +11412 0.002515894677314486471 +11413 0.1824123476659118003873 +11414 0.0820819280425488378983 +11415 0.0439036751298770983043 +11416 0.1582713089334743528092 +11417 0.1701071277735322351266 +11418 0.0047915279931412292636 +11419 0.1354233564045221327454 +11420 0.024266773049606088497 +11421 0.0092678156287506249095 +11422 0.007069932173748095898 +11423 0.0582380057970316653004 +11424 0.1509382446779113373658 +11425 0.2243313714491701749143 +11426 0.3175519769449676732442 +11427 0.039679624873499096116 +11428 0.1123131748923450606847 +11429 0.0598372601430312497928 +11430 0.0156923901030521957545 +11431 0.0041709169100558529317 +11432 0.2688203646893126408379 +11433 0.1296981093651444483239 +11434 0.100462377509720826585 +11435 0.0165528379446150709919 +11436 0.0185380370552394174721 +11437 0.0590927415010613285573 +11438 0.203993164036020968588 +11439 0.1607457720306713910841 +11440 0.1301350734210408588432 +11441 0.2572767204751381653338 +11442 0.2033302120275166102736 +11443 0.2324508167759405108388 +11444 0.1949798557066922399805 +11445 0.0173052673814498188254 +11446 0.0038922867084579424943 +11447 0.0010783330744990554748 +11448 0.1643234316804402239676 +11449 0.1751944597292894589824 +11450 0.0881370795903570758423 +11451 0.1660128565606079631412 +11452 0.2142791449936980729962 +11453 
0.0255797988670521253551 +11454 0.0002947000518527094167 +11455 0.1814718547885674215792 +11456 0.0244881547758949887073 +11457 0.0358897584273944192379 +11458 0.0997967550681597842521 +11459 0.002998916367169054617 +11460 0.0091868107706918368527 +11461 0.0148204511921674660785 +11462 0.0058038697796830244352 +11463 0.0082554149036227209013 +11464 0.0909633602878916064371 +11465 0.0342981003378303589413 +11466 0.0604857758149192209496 +11467 0.1586836637775205227641 +11468 0.0363459585434384307989 +11469 0.0902756820693400435784 +11470 0.1043373908621225698568 +11471 0.1665430400744946637381 +11472 0.2064165513696651077513 +11473 0.0592217356810783787657 +11474 0.2896985512464956280532 +11475 0.2195824318686107889942 +11476 0.063078207045118725449 +11477 0.1835276741823540502807 +11478 0.1247111881127577620587 +11479 0.172292058686121407618 +11480 0.0496882073037517302261 +11481 0.0044300179908809871401 +11482 0.1832819576446186871088 +11483 0.1184965566452951951648 +11484 0.0793602460560133421419 +11485 0.1667413377606624946825 +11486 0.2194686168595314512686 +11487 0.0600347467405707574395 +11488 0.0287955163005024633582 +11489 0.0402539961470066987026 +11490 0.0365730161557873897915 +11491 0.0713530592009522374175 +11492 0.0424781200582921292241 +11493 0.0054970683153095457538 +11494 0.1820481039280045787443 +11495 0.0069554246157220661964 +11496 0.0241502758774541012587 +11497 0.0456720092312518782807 +11498 0.1518409709726710188349 +11499 0.2324865986592245525877 +11500 0.148070762712085274293 +11501 0.2549859304055267505973 +11502 0.016109993259316442854 +11503 0.1342102196899180865941 +11504 0.224464193629516112205 +11505 0.0757030909469879054186 +11506 0.0357387798808058293476 +11507 0.047255018709035818969 +11508 0.1518638751242587237034 +11509 0.0154343368324496542365 +11510 0.0368441014391832979968 +11511 0.0009440701999676484298 +11512 0.2071614165814543717659 +11513 0.0225166720632374479927 +11514 0.0928341031166338009895 +11515 
0.2601091688638214605156 +11516 0.1503248753725500430622 +11517 0.1200370055486110754117 +11518 0.4011115339374073385237 +11519 0.0713108793796751322391 +11520 0.1611716325807312777396 +11521 0.2308533820877078923672 +11522 0.0636707791854996124625 +11523 0.1981770641956949186024 +11524 0.1944271193714486589688 +11525 0.0329171424451727701044 +11526 0.1173414295542149626828 +11527 0.1266939104001571225755 +11528 0.2904895778859709065678 +11529 0.1853171282604724190435 +11530 0.1979760652004313170593 +11531 0.071008677604111045123 +11532 0.0052076131403868183151 +11533 0.1923331204325965937407 +11534 0.1312204859172112214605 +11535 0.1924068352667283876567 +11536 0.0758149923022912175519 +11537 0.1113533868847298158311 +11538 0.1594325203596208551104 +11539 0.1671899029763646571389 +11540 0.1107191344308611491432 +11541 0.0681701926044363110124 +11542 0.0145242679214586181691 +11543 0.2909039901541974337817 +11544 0.1215261555022910527901 +11545 0.1187167868651788499301 +11546 0.1474310350890847554073 +11547 0.2166127292545901861232 +11548 0.1418264203650363430853 +11549 0.0950821625526765967784 +11550 0.0863892406631392023586 +11551 0.0151424083438618656272 +11552 0.1785113839030354199178 +11553 0.0529352849019919385887 +11554 0.0791119338134761151959 +11555 0.0164314932322269860454 +11556 0.1646230644728804703991 +11557 0.0630737191326593560348 +11558 0.0241472829901146872345 +11559 0.175727048001134256161 +11560 0.0157514496069311935034 +11561 0.0101677781211168914016 +11562 0.0844106126352152513759 +11563 0.0395958899665454708283 +11564 0.1954333945679734563239 +11565 0.207881944865088263974 +11566 0.0076564724677473314229 +11567 0.0614121555357384080187 +11568 0.0067490227736116265023 +11569 0.1291279801154500139759 +11570 0.195269165755855872435 +11571 0.1475658530807451296329 +11572 0.1263111800467849876739 +11573 0.1835075843684914331799 +11574 0.0615046603037262828995 +11575 0.246407446188161238787 +11576 0.1988867693733930852584 +11577 
0.1650498114805522553716 +11578 0.1257613145980953917036 +11579 0.197754668141905232126 +11580 0.1957851214775409021129 +11581 0.1358714948511348941107 +11582 0.1932737853927984084113 +11583 0.2387727517463152926425 +11584 0.1769158858145058188516 +11585 0.1762860717585586212142 +11586 0.1576917348708518074041 +11587 0.1801872388614372821891 +11588 0.2729765304766984534979 +11589 0.2292216140404514657103 +11590 0.1681328547238173098943 +11591 0.1893203124417582594585 +11592 0.1375408041660071967183 +11593 0.179359985641841523929 +11594 0.2159895874550007977 +11595 0.1960173370677725512845 +11596 0.1336839839699256737848 +11597 0.200311260053960105143 +11598 0.1775882841696556357913 +11599 0.1192866714602097422127 +11600 0.1051860933431314065745 +11601 0.2086187784236243902836 +11602 0.2014077773178615526906 +11603 0.043906174320414756429 +11604 0.2003558944894351101507 +11605 0.2103954021080240710528 +11606 0.1618682847372006350373 +11607 0.1083043581749253136959 +11608 0.2678989012316806417324 +11609 0.1115552157562269136593 +11610 0.2201637351160627420477 +11611 0.0226719260189288014662 +11612 0.232761981414026403181 +11613 0.1320483229505436395534 +11614 0.1687818660709806684039 +11615 0.1438183943106209339291 +11616 0.2390474621207580585569 +11617 0.22463656027896306 +11618 0.0120449545096103227682 +11619 0.1623401747824744500992 +11620 0.2227956257510817217327 +11621 0.1095909539831161211287 +11622 0.213716163899890954081 +11623 0.1650807513216302202519 +11624 0.204761451709094571827 +11625 0.1328734542578222932452 +11626 0.1386484587711097615781 +11627 0.20973325333817244287 +11628 0.1723602843894762592925 +11629 0.1331014885203895548038 +11630 0.087027869175274039093 +11631 0.2010530268996703617823 +11632 0.1403595266511805939036 +11633 0.0677520509894646572047 +11634 0.1051814984321180479476 +11635 0.1307011021238294046221 +11636 0.2874271846866067181558 +11637 0.1185421278044107851191 +11638 0.1737646318259916666804 +11639 0.158070902575556754277 +11640 
0.1760052492495619846924 +11641 0.0172540372708449539496 +11642 0.0015904942531911679237 +11643 0.0255700649412696628948 +11644 0.0396628799318737707003 +11645 0.0009113457456561963304 +11646 0.002403476552552736515 +11647 0.0959884082139059180427 +11648 0.246572539667683304776 +11649 0.0685313955505811589486 +11650 0.0267301708267658200135 +11651 0.1285680269434120170935 +11652 0.0407545676531757966599 +11653 0.1668514947028741923774 +11654 0.0610211006512943951452 +11655 0.139382739664383831224 +11656 0.0352913459398914650111 +11657 0.2085710614466952661505 +11658 0.0917162970048765735509 +11659 0.1235614069528776504114 +11660 0.1274758684477480530362 +11661 0.0540000209439132558553 +11662 0.0960172780595835440032 +11663 0.1559635868486203280625 +11664 0.379517821297398527669 +11665 0.2453903224986888897963 +11666 0.2799928199851865473491 +11667 0.137389882513457156632 +11668 0.1494277663495372499014 +11669 0.0066818109074351919249 +11670 0.0062206752158438273492 +11671 0.0252172688166847000502 +11672 0.1903785818737241675169 +11673 0.0073857341526744899399 +11674 0.0380410430275176236758 +11675 0.1576276445284080784948 +11676 0.234657298304853767501 +11677 0.3283570376389626788161 +11678 0.000432091686735405745 +11679 0.0036585444924359744948 +11680 0.1061781766135483501268 +11681 0.0005792666024961712736 +11682 0.0461178875378694258513 +11683 0.0008032470349052967407 +11684 0.1645435653860740365939 +11685 0.2309601324127429355926 +11686 0.0456918809794485958342 +11687 0.0398448254876590246543 +11688 0.0126052345037963760277 +11689 0.0364186161785009440695 +11690 0.0282660645020852696607 +11691 0.0461978563380125717419 +11692 0.1047367620485807893083 +11693 0.0756707223301289338968 +11694 0.1570134248839516188934 +11695 0.1588423465930696998338 +11696 0.1508136910281057674332 +11697 0.1716170386341461795432 +11698 0.1649185129847504371892 +11699 0.1603181346065583601224 +11700 0.1237044802329534953778 +11701 0.0876230971487710480483 +11702 
0.2435084674986960906473 +11703 0.0199188521520217952376 +11704 0.1015694562196163969192 +11705 0.3009108787409721652573 +11706 0.0325519251405104667607 +11707 0.1882107804458638955492 +11708 0.0027741413228222042316 +11709 0.1384938651877636317522 +11710 0.1745532152085047528089 +11711 0.0113284580091676662095 +11712 0.0356835077872515379904 +11713 0.0663296481718687186424 +11714 0.0547192713909119415883 +11715 0.0349542137893795110126 +11716 0.078377654960671172546 +11717 0.1773018386577809579752 +11718 0.1768734547316168081998 +11719 0.0021710245219799316312 +11720 0.0528601767322834023566 +11721 0.1166341732222045679546 +11722 0.035626969752715266504 +11723 0.1955472963371598926763 +11724 0.0463862958814935735785 +11725 0.0580579839776344114322 +11726 0.0929910726850841501401 +11727 0.052789689645853940525 +11728 0.0367138952811546098731 +11729 0.1542087238866089449196 +11730 0.1732680136520248215248 +11731 0.0264670813209012950606 +11732 0.0997725970140016654719 +11733 0.1989717149772031179467 +11734 0.1972423701949272967227 +11735 0.0005199614740114047711 +11736 0.0005034154275225144644 +11737 0.0109385229640586046501 +11738 0.0326892609993922472755 +11739 0.0042142515579860675284 +11740 0.0056859872815732620879 +11741 0.0094129394086633240424 +11742 0.0140828597425210373661 +11743 0.1784553678247656205436 +11744 0.0222763450914631559729 +11745 0.0023851153067624979338 +11746 0.1625896332852570513872 +11747 0.0136901236901965980136 +11748 0.0339962074061242458534 +11749 0.0183242657118193041921 +11750 0.0132532128259644620821 +11751 0.0095779954158478047138 +11752 0.0071259597576864720098 +11753 0.0705269607870576131781 +11754 0.0070969020532688038491 +11755 0.0069197537374051498044 +11756 0.0237697722569346660271 +11757 0.002740272114618620778 +11758 0.0024550861390580922851 +11759 0.0005599220266272844941 +11760 0.0035101064472430219474 +11761 0.0024944703197968647046 +11762 0.020586845234705965918 +11763 0.013653194608141671143 +11764 
0.0060327705520621536031 +11765 0.0040058462327471839118 +11766 0.0083943291850837528645 +11767 0.0006046868415293832908 +11768 0.0051156044109487248603 +11769 0.0067763506660475352622 +11770 0.006561916428177666169 +11771 0.0021472871353234823481 +11772 0.0054732874730275132835 +11773 0.0151819663991647341161 +11774 0.0131174947120966094855 +11775 0.0146005075545929691866 +11776 0.0102189221350823585682 +11777 0.0050979785424241389966 +11778 0.0022683931496446933969 +11779 0.0085388104304523655685 +11780 0.0007641030362859567958 +11781 0.0021567450740828564455 +11782 0.005494308031758684055 +11783 0.0079923243381817624109 +11784 0.0158441299768129276182 +11785 0.0176614640685134917375 +11786 0.0043550375992472351375 +11787 0.0057373857050381903838 +11788 0.0049317714163807594568 +11789 0.0437368162967360571414 +11790 0.0105852393278744721233 +11791 0.0034233540685034593958 +11792 0.005135563926091758942 +11793 0.0170039666789657262225 +11794 0.0159486826984624324721 +11795 0.0110148386613897978498 +11796 0.0142180403634629727583 +11797 0.0135813863223545744363 +11798 0.0128929239990621651718 +11799 0.0200328977971209395981 +11800 0.0255209918206839268817 +11801 0.0031706925981895991117 +11802 0.0040593034307682728315 +11803 0.0271210076532184748976 +11804 0.0044874278326901603092 +11805 0.0053007148215074278291 +11806 0.0073325662221886489869 +11807 0.0267240002872007539891 +11808 0.0170752648624135187905 +11809 0.000777724580663039281 +11810 0.0075298403768212897672 +11811 0.0004433089956295018083 +11812 0.0352008399929388607341 +11813 0.0105330698701100108905 +11814 0.0180634149217350804817 +11815 0.0036996794135201486672 +11816 0.007266239885721504678 +11817 0.0101816984671948117941 +11818 0.0275127760495788367678 +11819 0.0201016911077809973252 +11820 0.0115548903957118989649 +11821 0.0269430903216090200658 +11822 0.0282139127527198836642 +11823 0.0282983201030432632295 +11824 0.0148733367938210549186 +11825 0.0030220749611831495933 +11826 
0.0023006841456520519401 +11827 0.0072937691356750183891 +11828 0.0351304578765462144352 +11829 0.0042297563016138748151 +11830 0.0065145066044315150244 +11831 0.0090963857259498143853 +11832 0.0009692572534571153829 +11833 0.1210625535861129253856 +11834 0.0929832208809892313273 +11835 0.0195663268818735205934 +11836 0.0216723690230391195788 +11837 0.0166808000121542662764 +11838 0.1396576516583366567303 +11839 0.0248002144947733482727 +11840 0.0077279985591515122298 +11841 0.0257343959801144238353 +11842 0.0365259603862330278767 +11843 0.0032658594017909827259 +11844 0.0229421218662825936174 +11845 0.018881970661192227362 +11846 0.0053808183265404369369 +11847 0.0373842510332622821045 +11848 0.0524244336370798535985 +11849 0.0049908549095172439342 +11850 0.0083335104330878699563 +11851 0.0130153065406617748229 +11852 0.0013343235969608183405 +11853 0.0036625400438123375917 +11854 0.0214613356864729667994 +11855 0.0222004410483771369433 +11856 0.0073948057767792146899 +11857 0.0043327537892962932065 +11858 0.0550865689337677391646 +11859 0.0037560387359860023491 +11860 0.018418446681618075994 +11861 0.023077343233890643126 +11862 0.1036343157470912546003 +11863 0.0457120470570652887021 +11864 0.0406773926686518280671 +11865 0.0269092818685378425136 +11866 0.0259448781337056275098 +11867 0.0178728534825337882774 +11868 0.0031805976742606328553 +11869 0.0137413157099581552389 +11870 0.0212526592439351846853 +11871 0.0127869953231841172736 +11872 0.0112015111567200544018 +11873 0.0056231634076754728399 +11874 0.0164269945446221844743 +11875 0.0149845886028817411317 +11876 0.0131800521744314114159 +11877 0.0057300731469685332875 +11878 0.0040987982383312868487 +11879 0.006617552707112199159 +11880 0.0166158451050898783863 +11881 0.066664991776553231273 +11882 0.006139379523879160902 +11883 0.0220507388638169939121 +11884 0.0045315811113984319491 +11885 0.0039788064417122410987 +11886 0.0066849897547396792963 +11887 0.0017449198263997283201 +11888 
0.0738456134759257615263 +11889 0.0040839206360615025113 +11890 0.0056712879875054269391 +11891 0.108228881437865950832 +11892 0.0057507406838498958263 +11893 0.0004000001659846297601 +11894 0.002848342603158266189 +11895 0.0214667233503977525344 +11896 0.0330916779308484121369 +11897 0.0011705863536840804395 +11898 0.0026226475445231939819 +11899 0.001571696667735347731 +11900 0.0023570733811078469974 +11901 0.0053635158394941826257 +11902 0.0001969614397818534853 +11903 0.0020336713900778401708 +11904 0.0061472952809817071884 +11905 0.0024546928321657088094 +11906 0.0043723544250607922906 +11907 0.004773216451012239972 +11908 0.006586016960500468316 +11909 0.0029559028118734407067 +11910 0.0048337549366735659689 +11911 0.0030284175659386911375 +11912 0.000502499183478922703 +11913 0.0071837976547178394177 +11914 0.0107248006885681708422 +11915 0.0058011687477494151138 +11916 0.0053926639695293567683 +11917 0.0308742157071487631492 +11918 0.0068526606686335599788 +11919 0.0062058539199325154709 +11920 0.0059178927446700210768 +11921 0.0037270238598914795167 +11922 0.0083840219206731021739 +11923 0.0397461151363616704835 +11924 0.0030425443923695048333 +11925 0.001263378031697651345 +11926 0.0120599082120506767568 +11927 0.0055589661040449734122 +11928 0.0110152257772407353303 +11929 0.0067819936382098695501 +11930 0.0019465066733416699386 +11931 0.0083380405114994032845 +11932 0.0035228000347029943609 +11933 0.004842945572938123569 +11934 0.0074561558366072295742 +11935 0.0020433083737237071192 +11936 0.030099043223942304609 +11937 0.0041906179150867214553 +11938 0.0055119212567262248353 +11939 0.0103203060582434270193 +11940 0.0054937542370551801715 +11941 0.0025748732484471553023 +11942 0.0054420150068855638673 +11943 0.0012479259465569957511 +11944 0.0054131243321305297367 +11945 0.0080412301325504583621 +11946 0.0057946342028590497666 +11947 0.0147606066324557604064 +11948 0.0251430614218754316935 +11949 0.0107174303992301992589 +11950 0.009173046755927789786 
+11951 0.0116406725329985943906 +11952 0.010650315417884376884 +11953 0.0231036705884912901832 +11954 0.044669200167116521194 +11955 0.1100957444578239685651 +11956 0.026438378072871013269 +11957 0.0137640322114390596664 +11958 0.0033207443521679779787 +11959 0.0038673157691947283865 +11960 0.0035829446161209305491 +11961 0.0178143176572557286608 +11962 0.0097344088283398428479 +11963 0.0074008432716384381367 +11964 0.0130603225578938761386 +11965 0.0488375461651271228214 +11966 0.0556360379639789306983 +11967 0.0056277346862437516162 +11968 0.0062968895893910602213 +11969 0.0034353503499549580165 +11970 0.0253236615433893134941 +11971 0.013006832262403124692 +11972 0.0375133665773817598366 +11973 0.0131608892917961014385 +11974 0.0126097285574464142849 +11975 0.0046652469612359793311 +11976 0.0119528495278813234881 +11977 0.0011460080497869532153 +11978 0.0372728542668780485347 +11979 0.0262624781883120834991 +11980 0.03829113771522511811 +11981 0.0003187778467690421599 +11982 0.0119196247148770015367 +11983 0.0106021574037409704178 +11984 0.0191376820820838081105 +11985 0.0099090319915414223212 +11986 0.0069075357250828898006 +11987 0.0074791443007673485965 +11988 0.0651135743225569468828 +11989 0.0071421541804006054288 +11990 0.0409328866448233075204 +11991 0.0311930332791687282834 +11992 0.0033363309815510116749 +11993 0.007727859304078150797 +11994 0.0109266384808908850745 +11995 0.028560103817525737957 +11996 0.0322019835584306246545 +11997 0.049930934338088601987 +11998 0.0039869094728580099785 +11999 0.0135406706122793256408 +12000 0.0035981702417633835457 +12001 0.0015843820258734490642 +12002 0.0171158484137150929127 +12003 0.0025580337015632123934 +12004 0.0057593198699757283329 +12005 0.0070633068386686086171 +12006 0.0022669055364556546864 +12007 0.0045219749484045439203 +12008 0.0041314658950947257335 +12009 0.0179712396172024835295 +12010 0.0068000112105741743196 +12011 0.0052907539979850847986 +12012 0.0038429800463973848128 +12013 
0.0039458005554870697121 +12014 0.0060026372999218893106 +12015 0.0064289735906773353216 +12016 0.0020795535223170880622 +12017 0.0019317240185625593734 +12018 0.0068764388291209267712 +12019 0.0014471616086713229559 +12020 0.0054022418200057788912 +12021 0.0073548539339644662671 +12022 0.0078026395269230781893 +12023 0.0040495913563211817163 +12024 0.0035862457386542276848 +12025 0.0285275088718854874514 +12026 0.0850944009474807611104 +12027 0.0055046461226099753158 +12028 0.0046113534002220051694 +12029 0.0040878857664774833486 +12030 0.0016755593262757319858 +12031 0.001422208433197531775 +12032 0.0361818119310744895634 +12033 0.0027513322000844203825 +12034 0.0008856593855317731849 +12035 0.0037813719117240750939 +12036 0.0023780123768816158539 +12037 0.0009091812763697840541 +12038 0.0018387707641679019045 +12039 0.0056956720729010434229 +12040 0.0017060460018248468222 +12041 0.0021478803206763691543 +12042 0.0023689408526150956477 +12043 0.000988802442903822772 +12044 0.0054170817227068646452 +12045 0.0076788124982119310494 +12046 0.0056833939527825656432 +12047 0.0068940661901653512572 +12048 0.0055481389861483003736 +12049 0.0004329860523325485097 +12050 0.0037725143163689139544 +12051 0.0008162955007953338893 +12052 0.0022387806675117681923 +12053 0.0083631868168122028284 +12054 0.0042970957856580498102 +12055 0.0020012564484811535583 +12056 0.0304481961081543156533 +12057 0.0013785388381609866952 +12058 0.0087321305361527639527 +12059 0.0009994457716334609679 +12060 0.0082376137484376013576 +12061 0.0117150790617818554434 +12062 0.0066153035188704183145 +12063 0.0029763896596016136374 +12064 0.0092636553246700347602 +12065 0.0080187804850683601365 +12066 0.0077255516113914580664 +12067 0.0042608261981413467559 +12068 0.0058882833954285408382 +12069 0.0103884985118871588788 +12070 0.0032524447890440026133 +12071 0.0315806508144250291936 +12072 0.0049254763181785640377 +12073 0.0051536157489925531541 +12074 0.0044276182431978077306 +12075 
0.0079704455002256105467 +12076 0.031635201012319834335 +12077 0.0394248571692886740081 +12078 0.0009624662333226080167 +12079 0.0250408595450315775433 +12080 0.0007270232160456590459 +12081 0.0222363181014945746061 +12082 0.0849219316255895723122 +12083 0.0051937335196680489008 +12084 0.0007140328952502956322 +12085 0.0052399589540615245167 +12086 0.0211578652435300586243 +12087 0.0130792976398284282319 +12088 0.013853744191078069467 +12089 0.0125523047544226756117 +12090 0.0048278790000826007911 +12091 0.008294586646621941381 +12092 0.0017253591519140740792 +12093 0.0138618486458972154513 +12094 0.0071764530698379533011 +12095 0.0111603160850307654012 +12096 0.0155744670798312317278 +12097 0.0026081394707313295539 +12098 0.0095861096440466118246 +12099 0.0015588528080922141317 +12100 0.0019639666119593769851 +12101 0.0096683975278448255897 +12102 0.0044812087355884117762 +12103 0.0032062004150644539228 +12104 0.0008344086347235479427 +12105 0.0003827486959034756618 +12106 0.0042682033254605655434 +12107 0.0035577236802816145089 +12108 0.0100363459452909083852 +12109 0.0141234532438320216363 +12110 0.0017164892227972163019 +12111 0.0010801375072204911967 +12112 0.0212737018436939540056 +12113 0.0097263567574919513464 +12114 0.0069778835269394745219 +12115 0.0087691383252152307198 +12116 0.1910229549666921933238 +12117 0.316131446811921623663 +12118 0.2458622322983493868609 +12119 0.0068898406103343632842 +12120 0.0988210369559822110475 +12121 0.1627244082160521354563 +12122 0.1777651977795146054362 +12123 0.2211790599901876375188 +12124 0.0235704652209702304744 +12125 0.2832800549444294269108 +12126 0.0942996947376435051824 +12127 0.1085733955081452511982 +12128 0.1798818526279666596412 +12129 0.0845743458866715647204 +12130 0.0481703941269270366488 +12131 0.0199453717832537777421 +12132 0.0396419014270772973596 +12133 0.1820176212577429009087 +12134 0.0814572930591251909149 +12135 0.227084121284071793756 +12136 0.0958162743154245960531 +12137 
[Added data file: diff lines +12138 through +13440, one numeric value per line (decimal fractions roughly in the range 0.0003–0.54); raw values omitted here for readability.]
0.2379244317163354827116 +13441 0.3234985221750406592101 +13442 0.187429371119043730598 +13443 0.0687219646948090678862 +13444 0.1489555511466299497769 +13445 0.1891341203578108531858 +13446 0.0481011318620260044931 +13447 0.020830452286613055507 +13448 0.1794746119630337144457 +13449 0.0033344425732761691188 +13450 0.0577092999662822750406 +13451 0.1267675503054299102068 +13452 0.0845024099249604837558 +13453 0.1578834726919315645244 +13454 0.0694958568409970584012 +13455 0.0422410003766769681643 +13456 0.000989741220502608288 +13457 0.041424326842399883386 +13458 0.0287731638442750708995 +13459 0.0054271169225150565907 +13460 0.0005576479213511870581 +13461 0.0603799336340612863294 +13462 0.2157744744881338050391 +13463 0.0006600202040831915215 +13464 0.2247161089001386002906 +13465 0.0326373231705856178819 +13466 0.0358858126647500150219 +13467 0.0258994457564994048915 +13468 0.0905364207464880998977 +13469 0.1406086122210911115715 +13470 0.1474594137619671951889 +13471 0.1216831698816720902512 +13472 0.1012332988214770385094 +13473 0.2072836445301870778035 +13474 0.1775902583450410798793 +13475 0.04934267573665182538 +13476 0.2060787753325541427074 +13477 0.2327259014128763847662 +13478 0.1775852198256664404852 +13479 0.1217674027563975802657 +13480 0.1427525052358533941277 +13481 0.1766223597882721763064 +13482 0.0015009725191667733137 +13483 0.0312631010175136794804 +13484 0.0109220161330563104302 +13485 0.0590328029663993864018 +13486 0.0816636795899954165412 +13487 0.0996315294643322357793 +13488 0.1456213638522460829261 +13489 0.0039284812096820201713 +13490 0.1923670775876912542035 +13491 0.2986967619981269472795 +13492 0.1915843027085344929539 +13493 0.1456925426252010380423 +13494 0.17638632849646407319 +13495 0.0873607132817825482451 +13496 0.0499769127986118832929 +13497 0.1403758104808249529416 +13498 0.0025590200985797567287 +13499 0 +13500 0.0014771675048114022485 +13501 0 +13502 0.0507882216991051016475 +13503 0.0234761821548312259178 +13504 
0.0226312170853699319573 +13505 0.0094959918024426215322 +13506 0.0730074648451375235458 +13507 0.2220209103412033591063 +13508 0.0012658958163772235893 +13509 0.1675252297276144941574 +13510 0.1361920014255290078964 +13511 0.1605047149519324423128 +13512 0.107549682650039085674 +13513 0.2024103039486822575022 +13514 0.1013781042693547296274 +13515 0.4164452989272092175987 +13516 0.0185791002350720983738 +13517 0.2361065881172933200727 +13518 0.1404392520016008172323 +13519 0.1135822666022177906964 +13520 0.1820500444976420328924 +13521 0.0386077857040471039696 +13522 0.1348316861868487648302 +13523 0.0439835967600996496918 +13524 0.0225751072142408096277 +13525 0.0301599066672249248311 +13526 0.008557384889532792388 +13527 0.0935403425761489187851 +13528 0.1619772419821329245693 +13529 0.0666389515313985381084 +13530 0.0476084720129857225102 +13531 0.0039646002696131629403 +13532 0.0106958695104588958485 +13533 0.0340425815761337055565 +13534 0.0051466473962892044305 +13535 0.0085547429504099509884 +13536 0.000315344484872706176 +13537 0.0149541067059192085004 +13538 0.0078603973767593666305 +13539 0.0440203190497098564848 +13540 0.0012300031696650074332 +13541 0.0239635888112440277598 +13542 0.0332659152511831784915 +13543 0.0129765013985546073805 +13544 0.0147829992815823561825 +13545 0.3250373369851281002418 +13546 0.0524528583964495170866 +13547 0.2499194834654406838226 +13548 0.0899890941005105499118 +13549 0.0228273566921472631508 +13550 0.2015800825067089185882 +13551 0.1964475286809101439722 +13552 0.0153244492431756345469 +13553 0.0084427634085177758611 +13554 0.1844452713180091407708 +13555 0.2241695926878929934212 +13556 0.0315889045708867624573 +13557 0.1058356531629230656844 +13558 0.143213429970789590806 +13559 0.3314028503130395097998 +13560 0.1540208154886045233134 +13561 0.1745911131538417926379 +13562 0.1689872375986326558106 +13563 0.1324418736845892619058 +13564 0.0008027871524804985068 +13565 0.0016412199929821871119 +13566 
0.0025834548961586436058 +13567 0.0005513486039323183333 +13568 0.00244749689955516209 +13569 0.004399380895807832037 +13570 0.0006905048959456702891 +13571 0.0018184117123525623833 +13572 0.0066022845224348609278 +13573 0.004273836665008202626 +13574 0.1143899586239315729896 +13575 0.2114824992082919297154 +13576 0.1663708893376619757376 +13577 0.0014842500828802810894 +13578 0.0268278002818627056558 +13579 0.0301101912730600690393 +13580 0.1631649302993892491909 +13581 0.178315469242797591809 +13582 0.1539121286487196704762 +13583 0.0890683378361531025158 +13584 0.1654177374955509149945 +13585 0.0091532184625040865361 +13586 0.1042496784877227772315 +13587 0.0325317291218128537555 +13588 0.0872766878106668725357 +13589 0.0890413904728553917689 +13590 0.0441401184649467537291 +13591 0.1344147568595465969121 +13592 0.1027353752805820474858 +13593 0.1691535481405398999577 +13594 0.1214518048686932250035 +13595 0.1464404390864695593155 +13596 0.1080386750482418001251 +13597 0.0691553107008020112589 +13598 0.1171560694550801695613 +13599 0.1815190853629832790084 +13600 0.1258912205349361213003 +13601 0.0364816222812030591105 +13602 0.1054840394249293156204 +13603 0.1387401935761039539763 +13604 0.0299431763790027685723 +13605 0.1137716775588014095355 +13606 0.1102178247980055564303 +13607 0.1805700723360786630334 +13608 0.1968852265628452113955 +13609 0.1631238802694993383913 +13610 0.1852554033126617427651 +13611 0.0186850733968458859735 +13612 0.135791991510391951703 +13613 0.2212909232032646833499 +13614 0.0723956273451801940455 +13615 0.1658865025791225744634 +13616 0.0928166754989598885572 +13617 0.099636610655001636716 +13618 0.1664328456059929461919 +13619 0.1437041600922831230402 +13620 0.2013410618230042081755 +13621 0.1952702947776356334764 +13622 0.1241234224576488187619 +13623 0.1116355819842781565976 +13624 0.1118432395274509966177 +13625 0.1389842747826281199597 +13626 0.1156415328659292501845 +13627 0.0764871683877921537675 +13628 
0.0166410435916220511565 +13629 0.2293068688776531138807 +13630 0.012326072880530291459 +13631 0.0009738742826608230317 +13632 0.1845533722726247771728 +13633 0.1124178272800200606429 +13634 0.1002879891475169232828 +13635 0.106050825281088223484 +13636 0.0690393145786384609952 +13637 0.050417718655176031739 +13638 0.3274516041523610199171 +13639 0.2283519472926694438009 +13640 0.1173285485876752604062 +13641 0.1721812860566006575525 +13642 0.2475675596771896580517 +13643 0.0405984579713012105895 +13644 0.0028508209522701204001 +13645 0.0551984866051312833557 +13646 0.1587074311119612834986 +13647 0.0127891582129563696757 +13648 0.0358159490336368707686 +13649 0.1634717139820892339852 +13650 0.2548856051268315048297 +13651 0.1055511412880468929965 +13652 0.1289374299906761167644 +13653 0.0362806596865845598643 +13654 0.0013638750829266705917 +13655 0.0149185227067544830953 +13656 0.0319534117303241910202 +13657 0.0331373648757026031952 +13658 0.0613190940892771180204 +13659 0.079510076394574663583 +13660 0.0567379443837843361331 +13661 0.170459727762938217932 +13662 0.0910153139272674438498 +13663 0.0529866724106405062589 +13664 0.1697258090281698050816 +13665 0.0331149124438697983752 +13666 0.0604825213448796014659 +13667 0.0591534749958117986313 +13668 0.0744987427748175362607 +13669 0.3004951870441239236698 +13670 0.0805636221504978239372 +13671 0.0222849856848996406844 +13672 0.0457380041753317895248 +13673 0.0939015765737370861066 +13674 0.0102585814127894598552 +13675 0.061449961976596322033 +13676 0.00060024483697459607 +13677 0.1556934799117080658881 +13678 0.1724988734352065622435 +13679 0.3233717548919021189668 +13680 0.1564245475464031975843 +13681 0.356675570271579034376 +13682 0.1803337507732624411805 +13683 0.1349837879410323016049 +13684 0.2460526856247898797037 +13685 0.0877617780438600808557 +13686 0.0640815767544423570135 +13687 0.0585359066353785054138 +13688 0.1407498501319371653207 +13689 0.1919268885476843322646 +13690 0.1357105040670668338887 
+13691 0.1214987222408589334632 +13692 0.1564353341678497433254 +13693 0.2458648465460715470954 +13694 0.195560535847470545745 +13695 0.2033397562828682392233 +13696 0.1342396058560233473855 +13697 0.1586101921171275408451 +13698 0.2220126850287754971536 +13699 0.019085561244768251693 +13700 0.0726447895821634437308 +13701 0.1679802987080494525163 +13702 0.0494896916956500607099 +13703 0.2355364231941002584048 +13704 0.2097892964349772071397 +13705 0.2436222642376132119235 +13706 0.1259673312289950608545 +13707 0.0546088491283283558597 +13708 0.1594827825011508604103 +13709 0.1188616632391514410516 +13710 0.0117709564853840445431 +13711 0.3503209735800650315163 +13712 0.0055951776159734786104 +13713 0.0593576168509995366618 +13714 0.1164188503602329588515 +13715 0.1613458066213884489759 +13716 0.2221951947883450961818 +13717 0.2889973830674821830833 +13718 0.1419147776549713191319 +13719 0.2230779621632372755613 +13720 0.1956314283417706079149 +13721 0.1765749863240734796044 +13722 0.3224990223125475119481 +13723 0.226136930811046354961 +13724 0.0705316609196149219008 +13725 0.0580531639024830792284 +13726 0.0213056603145024975321 +13727 0.0143401129646191886902 +13728 0.2458465377269539731131 +13729 0.1021785581198374620593 +13730 0.1086140736972596443355 +13731 0.061713688880516953672 +13732 0.1854464990690676273744 +13733 0.2183779208174443653423 +13734 0.083498493613593627316 +13735 0.0339105364613970575172 +13736 0.083970472779739749325 +13737 0.3050907758020307003477 +13738 0.254052765682921843915 +13739 0.1390832721736318389194 +13740 0.0156391232058513095227 +13741 0.0175164002515664427029 +13742 0.2042198781657270367784 +13743 0.2332001010115108408094 +13744 0.1970534182954850055403 +13745 0.2524065606409007234134 +13746 0.2053365662567217253542 +13747 0.1997265440858724538931 +13748 0.1683213242858765112953 +13749 0.1913728324478329578806 +13750 0.0884594478292538288766 +13751 0.1258358723731085870856 +13752 0.1166385060759833613986 +13753 
0.1294116263941036681917 +13754 0.0282996924046526772412 +13755 0.0674242726087412314051 +13756 0.299706808963676241131 +13757 0.2515592421851019500068 +13758 0.1211589265631216327801 +13759 0.110249076357369038548 +13760 0.0184596584928314114138 +13761 0.0047762717854566583836 +13762 0.1725322660188878254317 +13763 0.247216904128520043038 +13764 0.0288509731504169725236 +13765 0.0094058592426492591138 +13766 0.2339857666874880781638 +13767 0.2484739833319636048703 +13768 0.007182498072256950257 +13769 0.090313171245806886378 +13770 0.1696567394093774217012 +13771 0.2004942226758675560827 +13772 0.1749431400401267044931 +13773 0.1279232286383226169235 +13774 0.0017564618109737473173 +13775 0.117456523562410194983 +13776 0.3023124309681283206075 +13777 0.1308015452791077282146 +13778 0.1080743906785597008291 +13779 0.0208311618182337940364 +13780 0.01058170834487954029 +13781 0.185256057917042638028 +13782 0.1330907537702210041708 +13783 0.0028301207902100694591 +13784 0.1461066257495834963542 +13785 0.1711418860805748731746 +13786 0.0556742152083989846889 +13787 0.10999311922988526391 +13788 0.035230007783099502594 +13789 0.0299330912401942830781 +13790 0.0005681537576894359932 +13791 0.0017745780949923585599 +13792 0.2663418902909290841841 +13793 0.3215443982562631863864 +13794 0.1348264413475000478293 +13795 0.2965860271388124358438 +13796 0.0279723427868984082789 +13797 0.0681708658132867406643 +13798 0.1433073343401146670928 +13799 0.0728481537816367957783 +13800 0.0631000076368397333493 +13801 0.288611721205796312173 +13802 0.0390062067507493046015 +13803 0.0003956545764838058452 +13804 0.1184192913835281774082 +13805 0.1450786708493032317602 +13806 0.0886902928544402724409 +13807 0.118921438732773152025 +13808 0.2358626307270084432322 +13809 0.2879799167869272480402 +13810 0.227481325579810766957 +13811 0.0893234859646064999206 +13812 0.0288089141659291676778 +13813 0.2587649504157423785422 +13814 0.1274337497043301625776 +13815 0.2706429326638543630956 
+13816 0.0847658117643898395555 +13817 0.0348844457768942434184 +13818 0.306724997810351673877 +13819 0.1851565727251069937154 +13820 0.3005946811173407229312 +13821 0.1916224112500212184429 +13822 0.237073520444526480988 +13823 0.1935772414134389607376 +13824 0.1163591808885750950031 +13825 0.3296742441944711665691 +13826 0.1891561167107662366771 +13827 0.1791449821952838528105 +13828 0.1488952098764847187251 +13829 0.1172375749835532776633 +13830 0.027055197060143733323 +13831 0.1195753002058309588662 +13832 0.0524848256059910825422 +13833 0.0481290292961321294896 +13834 0.2172808161196898013579 +13835 0.5445549592636894642439 +13836 0.0502954009647814948014 +13837 0.1928676222913369153744 +13838 0.1726865014095094741364 +13839 0.2364645424384712313337 +13840 0.1386898101262747928786 +13841 0.1887412080663428137761 +13842 0.0852959626675010296193 +13843 0.2099020385802319033086 +13844 0.0307725100899690628964 +13845 0.1674762959315479471378 +13846 0.196940040556663281679 +13847 0.1963295362587181569491 +13848 0.1887481510586618282943 +13849 0.1816942300833431500617 +13850 0.0918113699051868181744 +13851 0.1296877815583076520856 +13852 0.3380348649680519068639 +13853 0.1205818054916735770909 +13854 0.1073483924506268977206 +13855 0.1526343628573271216098 +13856 0.1231290434661278521133 +13857 0.2997276799486667253625 +13858 0.1494696829143885019597 +13859 0.1769681064296728667529 +13860 0.1503645201441438372658 +13861 0.0469042899206327801309 +13862 0.0201777992416998126768 +13863 0.0170719411418145013137 +13864 0.0584273822384968541388 +13865 0.160894572270991298435 +13866 0.226123687155286079431 +13867 0.0518085135684738240514 +13868 0.1855050762086993543232 +13869 0.2167940734611104036667 +13870 0.1343707608227901750819 +13871 0.2701484787237231066648 +13872 0.2159233109502901870602 +13873 0.1698790941986985936207 +13874 0.1102068204939316009572 +13875 0.2004883629676069423109 +13876 0.1774133315167263635903 +13877 0.1414401046945157891521 +13878 
0.1580831672469266424574 +13879 0.1548617870652060635184 +13880 0.2131100489077036030938 +13881 0.1631042984518284444384 +13882 0.0081458731993469517957 +13883 0.1117169633593522587445 +13884 0.1366802907202556427446 +13885 0.2932312960066725815444 +13886 0.2054205588197476339829 +13887 0.2000018171884644646585 +13888 0.1531345742288465205938 +13889 0.2132783593968336777191 +13890 0.1275713003835574343992 +13891 0.2153383979875116982505 +13892 0.100661665358241086965 +13893 0.1542565693918779012872 +13894 0.2075240986652170382953 +13895 0.1663816932112096302898 +13896 0.1765394487901352504888 +13897 0.369501655587234301148 +13898 0.2318423882119103884847 +13899 0.1760027739431022919536 +13900 0.1182607599017346522441 +13901 0.2806042306653727669286 +13902 0.2372162839038287340809 +13903 0.1744485529807979218297 +13904 0.1837517873675293400559 +13905 0.0008349041403441156972 +13906 0.1096784781849411971599 +13907 0.0587315905038375760117 +13908 0.1257821862300725068717 +13909 0.1804452483023751274693 +13910 0.1086925969439143518924 +13911 0.0671081134626719810754 +13912 0.1508363033898212912653 +13913 0.0592290136767959288222 +13914 0.2269075560457762263944 +13915 0.2876283467518229164384 +13916 0.1730341711099839252519 +13917 0.1816643110780848946462 +13918 0.1988149963698425393144 +13919 0.2482727931918931318922 +13920 0.0649953453747659226636 +13921 0.1601617487123776495395 +13922 0.0708581474849877129829 +13923 0.1623563799572753096889 +13924 0.0958968953069833790481 +13925 0.0514970625164793485706 +13926 0.1001843733525603907797 +13927 0.1329860309732113221592 +13928 0.0534841545820055677685 +13929 0.2012070321627417446297 +13930 0.0249477653422544125295 +13931 0.0304178430241304198489 +13932 0.1176665790628767249704 +13933 0.0747411065561078558073 +13934 0.1369227781701735091602 +13935 0.301260110162315131177 +13936 0.1913196134487240296362 +13937 0.0780419773227748608813 +13938 0.1282916522794582403488 +13939 0.1201386685404709220304 +13940 
0.1722013730197644032494 +13941 0.0644213462226271987854 +13942 0.2582612680573718733079 +13943 0.2811055650924420001857 +13944 0.168122345857984717421 +13945 0.0868641726255781065902 +13946 0.1910728640704473790812 +13947 0.0851415716475505957028 +13948 0.1790755824331493051282 +13949 0.0405275111229612058916 +13950 0.1457165323348849628182 +13951 0.0522421676987887342869 +13952 0.1859416857784658982933 +13953 0.1792736799079930609668 +13954 0.1279994937264858634318 +13955 0.2820333987462790292788 +13956 0.113790142667965818446 +13957 0.0679544747759842082679 +13958 0.1478702148714116371142 +13959 0.0782567634167443543225 +13960 0.2428363492799270051403 +13961 0.1611101201731485743629 +13962 0.0305778262970792420739 +13963 0.1338687781825182721018 +13964 0.1048626698201645668362 +13965 0.1296957706359685436048 +13966 0.1843794569313753073381 +13967 0.0909160936765814470251 +13968 0.0162476604242579487913 +13969 0.2309647549194017712715 +13970 0.2674571359476481391404 +13971 0.2262674959550128872543 +13972 0.1256184119358582651493 +13973 0.1250195105124664585183 +13974 0.1495572192368131991813 +13975 0.1128901825721305707262 +13976 0.2253822040902807577467 +13977 0.0076753426822921549944 +13978 0.0084147759711139108463 +13979 0.1034989405480136875903 +13980 0.2294837290084081327102 +13981 0.0182143560738366010188 +13982 0.1117722966403183193496 +13983 0.0804976741778734988797 +13984 0.1274913215397071031276 +13985 0.190622569100363986605 +13986 0.1741946426057260910447 +13987 0.1553157252727136727888 +13988 0.2376936868472636632532 +13989 0.1748372409216009393251 +13990 0.1815081189408371264982 +13991 0.1357123557313830719551 +13992 0.0972647574629257605228 +13993 0.1343853783043075234005 +13994 0.0955207723691839927227 +13995 0.0354125344395027560895 +13996 0.2420574726174133106671 +13997 0.0891316838436235769905 +13998 0.0865192895174075188303 +13999 0.161062296999817416987 +14000 0.0440930925445592244993 +14001 0.0925192835608802249814 +14002 
0.1063693730630638767387 +14003 0.0989053205154740572302 +14004 0.1081404104060088716688 +14005 0.0764123038297376544481 +14006 0.0959992580535721645729 +14007 0.108044810286812217881 +14008 0.1496321361603837607479 +14009 0.055420462138110145045 +14010 0.0232704214139407006279 +14011 0.0785914043553380819995 +14012 0.074049990755935768405 +14013 0.1885225606363044592584 +14014 0.1840145896214789578149 +14015 0.1310902794299731999317 +14016 0.2727566382606382289211 +14017 0.2188774058727257776358 +14018 0.0911188364583165044674 +14019 0.2013237794833059035593 +14020 0.0298273815392365825028 +14021 0.0463685027423514431466 +14022 0.1972478482244469921714 +14023 0.2259103157950254348041 +14024 0.0005601513171481641191 +14025 0.0057634344725711545224 +14026 0.065267499169608361731 +14027 0.204839865878989502157 +14028 0.142472844095283046606 +14029 0.1827453028800271883636 +14030 0.097706770099638726168 +14031 0.2012851549889808633331 +14032 0.0496510211516114352559 +14033 0.1461000163142382735071 +14034 0.1880590211772893982634 +14035 0.1052481971239567354326 +14036 0.319124640377335400121 +14037 0.1968294197106972842626 +14038 0.2222969860194421443111 +14039 0.1220383163893207362216 +14040 0.0554408641459492659842 +14041 0.1644344814318798586594 +14042 0.2141252563970403866023 +14043 0.1992266035993975059615 +14044 0.2189268833544367742761 +14045 0.2265462237894934338645 +14046 0.1214674779089007089494 +14047 0.2239255660682374360348 +14048 0.1574245063495907925866 +14049 0.1091296162648334872047 +14050 0.0161742040069407407021 +14051 0.2905119273384119127357 +14052 0.1472824583421580846299 +14053 0.0299034526341692800944 +14054 0.2344692546945709443751 +14055 0.1848675568317361406212 +14056 0.1872586316562069586045 +14057 0.1660406435933769186963 +14058 0.169225410887447802466 +14059 0.1418462284315269161805 +14060 0.1467942035505579445598 +14061 0.1627484234102632154606 +14062 0.2069887211848891184207 +14063 0.1114367617918528563514 +14064 0.0671249983710946601656 
+14065 0.1858360591064328881661 +14066 0.0092275065224285898746 +14067 0.2023988665029067823831 +14068 0.0158283910431303610766 +14069 0.4573882316879983034319 +14070 0.270627499377971647565 +14071 0.2289483873140144287728 +14072 0.2424411190516835012954 +14073 0.2123001130580787487734 +14074 0.1899938693655365551383 +14075 0.1313424392753218539198 +14076 0.163030923229098706484 +14077 0.2884850777697178236814 +14078 0.2768363590024075504559 +14079 0.1529970130243568648964 +14080 0.1071877780205952312986 +14081 0.2040998939108893683869 +14082 0.0419640819434108566788 +14083 0.019090394848881628298 +14084 0.2746709405399149206417 +14085 0.0056750517809564830427 +14086 0.0051075380242706419284 +14087 0.0076138783466226404514 +14088 0.0066280693784909151464 +14089 0.0030088011469816249716 +14090 0.0105379722054360844274 +14091 0.102693745139886105644 +14092 0.0026574402060082660641 +14093 0.0004705404812825004298 +14094 0.1278763023567803525093 +14095 0.1679537763372263292627 +14096 0.0780830363820572276312 +14097 0.001812074366412947616 +14098 0.0293315161735217787287 +14099 0.0845668092285207922965 +14100 0.3330244468426418746887 +14101 0.1395386906665811443951 +14102 0.197226424763821350794 +14103 0.192066010331336939343 +14104 0.0949867584776862422524 +14105 0.0861942482115681807286 +14106 0.1971725656297943773954 +14107 0.2886890761733506560738 +14108 0.3671793598868241526034 +14109 0.1075319294538627373781 +14110 0.3197779421192382121042 +14111 0.1248242745714482626607 +14112 0.1299873843718777621792 +14113 0.1979938589260811510062 +14114 0.0402134421718188042605 +14115 0.088118492121428845798 +14116 0.2204220417359427641735 +14117 0.0851337923279696096035 +14118 0.1274743162246586847264 +14119 0.107893350218209449487 +14120 0.2363592191239227979782 +14121 0.1341279605201835811723 +14122 0.0004706676210280526305 +14123 0.0137718536366909817537 +14124 0.1253628510804492557362 +14125 0.0536105339429525512562 +14126 0.1473326087214142654513 +14127 
0.028719291818898338986 +14128 0.1540214000661402071746 +14129 0.071398777698402651759 +14130 0.0591233637237823378419 +14131 0.1019117121567135741955 +14132 0.0105527199074349573721 +14133 0.0699109294041405926468 +14134 0.1899953723805034855321 +14135 0.1391876161378469145546 +14136 0.0832657997479224476489 +14137 0.202228511477354072623 +14138 0.1353065839076030840715 +14139 0.0505813226255326306435 +14140 0.1455957300175419466548 +14141 0.2768201828838499634067 +14142 0.0612189921452175114802 +14143 0.3360079422107702384892 +14144 0.4052680617716474253953 +14145 0.1691409073582766930954 +14146 0.0166783613468534895607 +14147 0.0059038717737648566466 +14148 0.0079678576353143636818 +14149 0.0056001211640964963789 +14150 0.0186544896458543316142 +14151 0.189080386468886058049 +14152 0.0568275908712118776878 +14153 0.2035879360383924530087 +14154 0.0984954407980472290696 +14155 0.0167864136341610258041 +14156 0.0542758811214527278621 +14157 0.0644685606551478329607 +14158 0.0287035388723753982776 +14159 0.0808170941640199841371 +14160 0.1561728272437654385651 +14161 0.0182130051807472478909 +14162 0.0187697119347811774981 +14163 0.2218338563430566623413 +14164 0.1857081619711303532849 +14165 0.1256367866015444534344 +14166 0.0963450231204349416547 +14167 0.1750888630964271552859 +14168 0.0021308491839058889793 +14169 0.0757265170371984180919 +14170 0.0063195610007185064327 +14171 0.1480415970354576338242 +14172 0.0051234400994571438481 +14173 0.0301146054695219204944 +14174 0.1188936443049715824261 +14175 0.0418859971772261863165 +14176 0.1564931075005411842405 +14177 0.1522070492926553553215 +14178 0.1621957276602542519228 +14179 0.1388226736168892694323 +14180 0.1612648906481952415515 +14181 0.1111097630990060258771 +14182 0.198109667016193702338 +14183 0.1660057683985535492699 +14184 0.1705319605425605899995 +14185 0.1431296549639648474361 +14186 0.0398429854567988669944 +14187 0.2122790160923284374928 +14188 0.1158956305464083069712 +14189 
0.1215686136169776415672 +14190 0.0223937812535498458055 +14191 0.0671647558939861555327 +14192 0.0601304423492123873984 +14193 0.0136385560395859521676 +14194 0.0078588302584147919777 +14195 0.0152116341401871031647 +14196 0.2079553250440359379869 +14197 0.1596502609269776495893 +14198 0.2260567823622577077192 +14199 0.1453708174301489020053 +14200 0.220125691844868964786 +14201 0.1332406979211800956975 +14202 0.1388036924486995005168 +14203 0.1064377559694702074511 +14204 0.2115013740941728559442 +14205 0.1323185276724829217976 +14206 0.01560471835994534269 +14207 0.1565562381467841823746 +14208 0.0250208688491622803507 +14209 0.154428472429874841243 +14210 0.2540546513361383040852 +14211 0.1959679820052322440915 +14212 0.1086883694735013283772 +14213 0.1922723523911153975519 +14214 0.0008843357653262608978 +14215 0.2964013838994893346168 +14216 0.1761831213795501949182 +14217 0.160413439927872519819 +14218 0.1069747006229226915508 +14219 0.0496541055684694629391 +14220 0.158055000870837097926 +14221 0.1835284852515083331426 +14222 0.241356200859247638224 +14223 0.1539538461101311872525 +14224 0.0644070239084697238052 +14225 0.0726138598595223172527 +14226 0.1500443190845520968768 +14227 0.2033458324762149049825 +14228 0.0966518363038318534253 +14229 0.1815716643382601358514 +14230 0.0249241952561385753107 +14231 0.053345809650482786668 +14232 0.1593140849923221913365 +14233 0.1334572639080400247025 +14234 0.2702891583806520947597 +14235 0.0126695304440460898565 +14236 0.1437102401330610512709 +14237 0.0672291327885297801403 +14238 0.0973021665801352270142 +14239 0.048828863493201223045 +14240 0.1519066738211715950069 +14241 0.0355435663671560680976 +14242 0.1135036450285853842912 +14243 0.0010978582463854965907 +14244 0.0333319499066312280311 +14245 0.0636697871627858597021 +14246 0.1385251081473136414157 +14247 0.0540422088029080399663 +14248 0.1414760983517228121187 +14249 0.1045360687308820590857 +14250 0.0683395704717833091246 +14251 0.127386476042384039209 
+14252 0.0150513045180735629408 +14253 0.0319179506277410748871 +14254 0.086116470451612830983 +14255 0.0083390775790452079108 +14256 0.0073377897251146522936 +14257 0.1827667578846006446458 +14258 0.228827727165548638899 +14259 0.1570805429912016659699 +14260 0.2269796773374280485225 +14261 0.2024938334800775452393 +14262 0.0361452097668491373339 +14263 0.2072844485521175283971 +14264 0.1111789419816369278715 +14265 0.0433708062706893160421 +14266 0.0445093122394038948064 +14267 0.055100139188902247378 +14268 0.0276197940505797726041 +14269 0.1015919832846352427458 +14270 0.19428184018516037046 +14271 0.2575503186718441361869 +14272 0.1520954680625301869146 +14273 0.1954009541942188232788 +14274 0.1503531007680946329774 +14275 0.1855167389167596547761 +14276 0.21153979079030693633 +14277 0.1354308268233905820921 +14278 0.1867145505390565562998 +14279 0.1156254745547409396034 +14280 0.1258897988881071960954 +14281 0.1285606281932051286887 +14282 0.1814111293252885781513 +14283 0.0840391091942173373841 +14284 0.1364990287887837550063 +14285 0.0616382603857311625606 +14286 0.1985557021264180710407 +14287 0.0039471297235175442294 +14288 0.0046148818374404758369 +14289 0.0004604710582585323873 +14290 0.1587250087320470348651 +14291 0.2477147577120022980957 +14292 0.063396554389403531049 +14293 0.0837632272164462315933 +14294 0.2922931716298517446084 +14295 0.1773264074697722403329 +14296 0.1935082867004867535066 +14297 0.1669101772027864194481 +14298 0.1494856070820230475871 +14299 0.1523308811620353031557 +14300 0.170761027671116283333 +14301 0.2536093641223240435778 +14302 0.2534669733715750883363 +14303 0.0919970902793633693584 +14304 0.1164850735561209910118 +14305 0.1215337695275599261269 +14306 0.10702649746046995205 +14307 0.138093972392053537801 +14308 0.0815503002926728226551 +14309 0.0825875520550367336892 +14310 0.0952234460605769911234 +14311 0.038952654579153547787 +14312 0.1644389355511726158454 +14313 0.2473824169373967929442 +14314 
0.1694928105825443676036 +14315 0.267719198034781169504 +14316 0.2010037426060831411423 +14317 0.0630716038524623545669 +14318 0.1616164988617894149314 +14319 0.1853593162426594442849 +14320 0.0360842448399801424275 +14321 0.181958076554947584258 +14322 0.0873317577514707843456 +14323 0.3270319219100751850782 +14324 0.2312774372912093090449 +14325 0.0006188646732116436606 +14326 0.1926801638337037880788 +14327 0.1994251111795374287183 +14328 0.0562445538709444026182 +14329 0.0158826143786843301808 +14330 0.0930211678766043464917 +14331 0.2190803846508081187405 +14332 0.0134275619831285501077 +14333 0.001949498962752731733 +14334 0.0122908811389557514476 +14335 0.1191329417274260421555 +14336 0.1505040301116183920627 +14337 0.0868562903071967706836 +14338 0.1152129405090285968472 +14339 0.2079282480115899056639 +14340 0.0106268457790758743403 +14341 0.1086479455480601763462 +14342 0.0309509839273947387484 +14343 0.010063253065210128398 +14344 0.0014843565404540572019 +14345 0.0240167974035312634107 +14346 0.0984056101651946124598 +14347 0.0337883744660264356829 +14348 0.0155846974751506703316 +14349 0.2209340333973748571328 +14350 0.0895664837776182776174 +14351 0.0534887949409723825744 +14352 0.0085003309705875117891 +14353 0.0011785712161009531066 +14354 0.1856468147168567617289 +14355 0.0987187372797409951053 +14356 0.1971788565056662168029 +14357 0.2286724348244551230369 +14358 0.2311651419445280575271 +14359 0.2173191359135965283755 +14360 0.1150473107616588747115 +14361 0.1539083716894823206101 +14362 0.0453918036604296221137 +14363 0.2034101649494740871038 +14364 0.072274654021694778594 +14365 0.2636710724457929644338 +14366 0.1975178620548690422432 +14367 0.203896730834690648182 +14368 0.2018895212250289283862 +14369 0.1063993040480941398851 +14370 0.0017052749593941308754 +14371 0.1149508057393404530933 +14372 0.1512477412112839814107 +14373 0.2138889469192823133969 +14374 0.0273188972486578217358 +14375 0.088048031491716735597 +14376 
0.1981922421082136787085 +14377 0.1000341653214014969731 +14378 0.0465979901544882404996 +14379 0.205528563156269938883 +14380 0.2135391893906567195582 +14381 0.2742667660729283252152 +14382 0.0055228861069273431361 +14383 0.2599219228641593271156 +14384 0.1655055215924879918887 +14385 0.113176526429574791921 +14386 0.2432103032986805413174 +14387 0.1769110932989108042612 +14388 0.091390401054535774783 +14389 0.0938093316996095744065 +14390 0.1713637316548725086918 +14391 0.3085986559843901244626 +14392 0.0146363366643108856052 +14393 0.2842325747804003155395 +14394 0.2877632084014824509488 +14395 0.1168136406009619709945 +14396 0.0007629037574464857382 +14397 0.0400100085852222175919 +14398 0.1702435024091659421241 +14399 0.1848434615377103906741 +14400 0.179018417621977843357 +14401 0.1385640001656934516827 +14402 0.1604488584083903490729 +14403 0.1877375910699104477608 +14404 0.0898473346103956377329 +14405 0.2118507086978931586163 +14406 0.0643065910741141971707 +14407 0.0285196813445228250505 +14408 0.3078130388138856088354 +14409 0.1347639740920348605524 +14410 0.0877528949938066182845 +14411 0.0111413889782161750625 +14412 0.2492399800865558734397 +14413 0.1226619769255028746802 +14414 0.1546375430295816866266 +14415 0.1368972458858888352484 +14416 0.2190415922291976957847 +14417 0.0938410282808908452479 +14418 0.1937294078049739132208 +14419 0.2222836329336029259629 +14420 0.1002652914966237485128 +14421 0.1538461958494662940033 +14422 0.1763171343518068201472 +14423 0.2240033458769691843138 +14424 0.1384776041768688570333 +14425 0.2102397170709781049069 +14426 0.0537797521041133083641 +14427 0.1569770580906821311196 +14428 0.1622110867531320610446 +14429 0.0591502317343738162614 +14430 0.1285982642185004576341 +14431 0.2465970212091793123399 +14432 0.1237531240911405594485 +14433 0.1200717526814425450965 +14434 0.1816575151684246269834 +14435 0.0930000494089111007234 +14436 0.0711784875670120054103 +14437 0.2430640506398959177492 +14438 
0.0974104654105040629331 +14439 0.1965215318491121132549 +14440 0.2802558208079188317718 +14441 0.136672932583628115033 +14442 0.2128434091107331005421 +14443 0.1497425243242883619033 +14444 0.2071777991122182305261 +14445 0.1833481999142760054955 +14446 0.1500087037525694722007 +14447 0.09964794324557599281 +14448 0.0685542949343153074082 +14449 0.0089717275788367759071 +14450 0.1041679742462954111337 +14451 0.2250762429888976123848 +14452 0.1400337150178858347527 +14453 0.146004798843629879368 +14454 0.1416239043315317691629 +14455 0.1031602952309736975778 +14456 0.2092411830491201496507 +14457 0.2297837319736522609936 +14458 0.2059035676779223666966 +14459 0.0345730614028569860352 +14460 0.0150423189620210919631 +14461 0 +14462 0.0041019543823909336913 +14463 0 +14464 0.158672715277867359962 +14465 0.0979185388273065421272 +14466 0.0654717646198723218776 +14467 0.1189542928157805573575 +14468 0.193750419181717731254 +14469 0.2093551890427154393848 +14470 0.1671126763387345559586 +14471 0.3576700381353813407159 +14472 0.214028295903335052941 +14473 0.0322095854269028800121 +14474 0.0686938699447892514582 +14475 0.085570379614867159157 +14476 0.0204808687417929909114 +14477 0.0631044433791053127614 +14478 0.0926935049737936667125 +14479 0.1619613384183206938882 +14480 0.1400710394465229657879 +14481 0.0110240791083619140561 +14482 0.3326086863827660278226 +14483 0.0694733138024522106635 +14484 0.0865334351072923474435 +14485 0.003375280739091071798 +14486 0.0135558928210780457735 +14487 0.2029294430675961102839 +14488 0.0745004718531268317339 +14489 0.1306005905975831382637 +14490 0.2082058367528715758787 +14491 0.3183532462959074438302 +14492 0.1158188170830576346759 +14493 0.0898753683459554558732 +14494 0.2262706700514047486461 +14495 0.1756063113516906082356 +14496 0.141185588821399504722 +14497 0.1621045333196631244022 +14498 0.0162629270765283530376 +14499 0.1357630861421820767099 +14500 0.0334757551749187143297 +14501 0.0822603728892497693126 +14502 
0.1679902492961348470235 +14503 0.0560151166039214304382 +14504 0.0984997614947936694918 +14505 0.2505356965203963670596 +14506 0.0810810610036576950854 +14507 0.2379861927213424155347 +14508 0.264886402103572116129 +14509 0.0340905681048484301976 +14510 0.2817540431565926373203 +14511 0.1961284616488445098348 +14512 0.2684184285901111133299 +14513 0.0137815899699823326413 +14514 0.1116593850510527036191 +14515 0.1226006486153570357311 +14516 0.3402294236477568789567 +14517 0.124266044336265396586 +14518 0.2411872364375871602427 +14519 0.2738884812829477932716 +14520 0.4038470846473997122317 +14521 0.2346160455234379826894 +14522 0.1548092146883517650213 +14523 0.0224692156657972062772 +14524 0.098541190318431948203 +14525 0.0654681938857850781455 +14526 0.2357550292244994238899 +14527 0.0668026087729627210399 +14528 0.1503794052159853622541 +14529 0.055093088477524677915 +14530 0.2507810683249629413183 +14531 0.1711311142398113938068 +14532 0.1712923093081981096297 +14533 0.1338386304121730041583 +14534 0.1212961882754400727569 +14535 0.2023313309635434431577 +14536 0.0032935183078503555039 +14537 0.1796995807421077617771 +14538 0.1437404640866547511635 +14539 0.0200913665590523073112 +14540 0.1409364442539269013377 +14541 0.1334810402437182907942 +14542 0.1676558301009388463054 +14543 0.1588615766053348143938 +14544 0.1124555709597068881012 +14545 0.1378686015825977273508 +14546 0.1440197352422594445631 +14547 0.1531030013258611877092 +14548 0.1688519974960958625054 +14549 0.1365445219660484932334 +14550 0.1129718866903077906239 +14551 0.2262474360238512505195 +14552 0.1138207133102184381857 +14553 0.0041441838817575342435 +14554 0.2758393525852779748497 +14555 0.1763320409759747131595 +14556 0.1956218254412305235235 +14557 0.2585062876504369278052 +14558 0.0832240522959905443257 +14559 0.2476137137744280902751 +14560 0.0196180540616829671152 +14561 0.1329186430604536173217 +14562 0.1622950831092599965899 +14563 0.0878864164258115893036 +14564 
0.0550222070247362998741 +14565 0.1810756979182843173781 +14566 0.1501719870654500688101 +14567 0.0735435367144495338865 +14568 0.1510656669565444976122 +14569 0.0455366056598926760568 +14570 0.1585952231850852878203 +14571 0.1831729401550585001335 +14572 0.2250601211633411480317 +14573 0.1583608751543079118385 +14574 0.1931524547333112740155 +14575 0.186938828737294837623 +14576 0.1561751483552826780254 +14577 0.0324614132749669187517 +14578 0.1707761943640121737875 +14579 0.0250758478511366837693 +14580 0.0525172306637631119974 +14581 0.174616451424428498207 +14582 0.1992564745974052531352 +14583 0.2033414314866055228048 +14584 0.1573864076625466346115 +14585 0.1594198175262893091286 +14586 0.1228282127429988213052 +14587 0.1198567011276575611234 +14588 0.0646787938683743846013 +14589 0.131455323102273452518 +14590 0.2051076948624164231205 +14591 0.1823367070760444630384 +14592 0.1550473363482273991831 +14593 0.059217994826206932224 +14594 0.0981038138938422343838 +14595 0.2147454924460099889849 +14596 0.1395283772000297795035 +14597 0.2211755075295253614165 +14598 0.0697039214746634633313 +14599 0.2304133740679505104598 +14600 0.192775105293419901642 +14601 0.1817874103639398886134 +14602 0.0321474696101749293775 +14603 0.1503158382952374250152 +14604 0.2116315802072443741366 +14605 0.1383045477448467952541 +14606 0.0943582916912196895121 +14607 0.2565772541615374824353 +14608 0.1696420525803874201909 +14609 0.0166554287900238413844 +14610 0.0664062894881708581929 +14611 0.3174910701409612290291 +14612 0.0326849101435064126608 +14613 0.1818410490084581832004 +14614 0.2681104279394281864057 +14615 0.1085609225862435300858 +14616 0.1438655227018132698458 +14617 0.0171186694070149907076 +14618 0.0858422020456842266301 +14619 0.0759153335854049760023 +14620 0.1639367693753380483823 +14621 0.1779069390657466442907 +14622 0.266117147056654512749 +14623 0.2173442177443099254663 +14624 0.1902969754835630300516 +14625 0.0193444821242128132954 +14626 
0.1905728960480728428895 +14627 0.2440204966000510677926 +14628 0.0034708026611957065212 +14629 0.1785610569233750666296 +14630 0.1599092789984060403174 +14631 0.1928860006332416554553 +14632 0.1613552698370852023224 +14633 0.0594027937379029077891 +14634 0.2742970648786025589239 +14635 0.0325933467508109711575 +14636 0.2042608609963469101789 +14637 0.2600901852738207731264 +14638 0.1333520348597777849253 +14639 0.2073815444553305797903 +14640 0.0157694537113946681162 +14641 0.1591919146286941144641 +14642 0.2494706307014122326215 +14643 0.2157147116441978096368 +14644 0.0025977278907390699385 +14645 0.1861569071783654161223 +14646 0.1856287091938529365542 +14647 0.1736959360249657746689 +14648 0.0665517091955043077478 +14649 0.1883942145588805849865 +14650 0.1456744764747673615002 +14651 0.1523889115498285595773 +14652 0.1438628624592696303441 +14653 0.2115733049342470861731 +14654 0.2179829335105602505784 +14655 0.0872905969662153835564 +14656 0.0691734454366514928125 +14657 0.182172223662109028508 +14658 0.157227362030090273004 +14659 0.1097053099777428181172 +14660 0.0012222952795103180813 +14661 0.1699337690752295848551 +14662 0.0374404612399439826187 +14663 0.2044566792036381364106 +14664 0.0368473874558262956547 +14665 0.2026919271124604793322 +14666 0.088629054314629460154 +14667 0.1511100590337984828082 +14668 0.09116535127332210664 +14669 0.0289493573726686548309 +14670 0.0300545340711214255258 +14671 0.2162728305424150565539 +14672 0.0262407046026746706979 +14673 0.0732149664446322817613 +14674 0.0301617754069784700655 +14675 0.0449971544469779036191 +14676 0.0299271613455644978907 +14677 0.0351186455892446222626 +14678 0.0499726484941659804262 +14679 0.1902553622641972863594 +14680 0.2162428103327083683816 +14681 0.2195937419813452085027 +14682 0.1255704812681847359901 +14683 0.0279777154030998592704 +14684 0.1221642711493335087258 +14685 0.1667981944598197208141 +14686 0.1222282952321326748857 +14687 0.2246892256090398110047 +14688 
0.0464180081372116137839 +14689 0.2249497362771971653572 +14690 0.0358301517615586886945 +14691 0.171883130666028949296 +14692 0.0794951250687499083591 +14693 0.3425035334340596482328 +14694 0.0053115100129654925845 +14695 0.1755739316922062809301 +14696 0.0353043546178573794392 +14697 0.072932095155967185951 +14698 0.2227636999037861664785 +14699 0.0042964163886268797593 +14700 0.0210919463840062298976 +14701 0.0649323835002008076245 +14702 0.0844777093220951263985 +14703 0.0938655204929855213614 +14704 0.225116837405910596992 +14705 0.3047676994018375107309 +14706 0.3708400090547754324177 +14707 0.0113826348111728883894 +14708 0.1439559805646384027522 +14709 0.2027087544314429001791 +14710 0.1876331124470076439081 +14711 0.0101787079746949310488 +14712 0.1643048317368392963189 +14713 0.0729824373186921671053 +14714 0.1295337566029902687248 +14715 0.1654631930695228403394 +14716 0.2595591817495796482795 +14717 0.2587564925339477794886 +14718 0.1257278296590743638106 +14719 0.1525334222110001636263 +14720 0.290100391112416367001 +14721 0.1048259595209988653819 +14722 0.1491409254735273637316 +14723 0.1331800131921282159819 +14724 0.0111488823369393207779 +14725 0.0123293458714061002907 +14726 0.0180186834514543640917 +14727 0.0112385065736030267253 +14728 0.1188489495147091734584 +14729 0.2339335648254665667167 +14730 0.1389562277799829548286 +14731 0.1466554405495808510729 +14732 0.2168225476711765631332 +14733 0.0156811801628018721022 +14734 0.0537857319060057251225 +14735 0.0050002186026826328166 +14736 0.2691052859781076223022 +14737 0.3130296171568542895614 +14738 0.0252230111749535655663 +14739 0.209022137605742064359 +14740 0.2650681697735201569621 +14741 0.3229502195327652036561 +14742 0.1464651296555091186669 +14743 0.1568328061502676784578 +14744 0.0007373149493195674904 +14745 0.0401552176984009759342 +14746 0.0989595638008345518388 +14747 0.2134802497376751539981 +14748 0.022158627147452709949 +14749 0.0422276142624239472778 +14750 
0.0834056839640667929681 +14751 0.1171384130390467487315 +14752 0.142455910817254727263 +14753 0.0012606796642245924386 +14754 0.1992243771474289937284 +14755 0.0328573153775607793015 +14756 0.0616465178398828209105 +14757 0.2238732737125103955389 +14758 0.1338961898178901155365 +14759 0.0690009724232193838445 +14760 0.1148848211353596798778 +14761 0.0759265091471029440751 +14762 0.0143528282353451459158 +14763 0.0080511420568338317588 +14764 0.0031539997794972859414 +14765 0.095116712537193254251 +14766 0.0913186072439876872497 +14767 0.086791732383299749487 +14768 0.1949444158075035182787 +14769 0.1685163887216386269241 +14770 0.242742857696786606958 +14771 0.3009225954222181642628 +14772 0.333232798576902355947 +14773 0.0271083353978143239138 +14774 0.0004628023755054846218 +14775 0.3310224328995793885433 +14776 0.2655690999747909453177 +14777 0.0752032394090807604359 +14778 0.0740110979812742958428 +14779 0.1611393928710243284019 +14780 0.0804289842402688553991 +14781 0.2053081222145426432135 +14782 0.1216203306289658503392 +14783 0.2598562981421964623863 +14784 0.2760848150260462596783 +14785 0.0948454133677817806669 +14786 0.1571632002660377780412 +14787 0.0642623775106749006714 +14788 0.0849417127012146860876 +14789 0.0019272971903775742899 +14790 0.0592617109990659363117 +14791 0.1121281262046147780831 +14792 0.0406435633766101533926 +14793 0.0580490650678952765928 +14794 0.0765229662334405424939 +14795 0.0809694343888302942025 +14796 0.2173128250654998983027 +14797 0.1377813469540078994413 +14798 0.1175244868292072836224 +14799 0.2230302277447349379447 +14800 0.1686796347252797756422 +14801 0.155872877738723047969 +14802 0.2199725387893510519799 +14803 0.1170629348297661936895 +14804 0.085514322607053647185 +14805 0.194003849773706049664 +14806 0.1505628444686051004009 +14807 0.2310845858345438041948 +14808 0.1407438009055115846202 +14809 0.1197517721103436016961 +14810 0.1831948477457359392151 +14811 0.1895042670513407412347 +14812 
0.1494030142480322065346 +14813 0.1211206568996636145918 +14814 0.1339336928495116485482 +14815 0.2848626203747542939837 +14816 0.1864718277466479168236 +14817 0.280248642271063774789 +14818 0.1666737672942721348868 +14819 0.2147764178523416944699 +14820 0.1735467795369969590258 +14821 0.2421075622565262708275 +14822 0.2206278243716709719369 +14823 0.2081647613385345374493 +14824 0.0128770826241469855955 +14825 0.2746016625337674987861 +14826 0.2188473975647663993804 +14827 0.0271849758210599944108 +14828 0.0076201284387572756213 +14829 0.166690580398041493515 +14830 0.1985905573559764158986 +14831 0.060328048552402355198 +14832 0.0539936010575143010559 +14833 0.101764117340906248832 +14834 0.0102555966263109029496 +14835 0.1669203157178917318326 +14836 0.0958355688633820840261 +14837 0.2391474498456219455189 +14838 0.1792184362419769472208 +14839 0.100636900389394781663 +14840 0.2844385444717305033713 +14841 0.2603270971720696436158 +14842 0.1154348080270817811233 +14843 0.0769272744330731783613 +14844 0.2882481598742105965982 +14845 0.1135803048002795578864 +14846 0.2186612556297428122676 +14847 0.0178034914594586771153 +14848 0.0497805886359445476375 +14849 0.2723179578759347130301 +14850 0.209845023383344764234 +14851 0.0932943938806134603903 +14852 0.0084576329905986374902 +14853 0.0080903065024021253437 +14854 0.0483073238920018882547 +14855 0.0105852368875626514505 +14856 0.0122145136519978582401 +14857 0.010476260611843030493 +14858 0.0065658562714093534479 +14859 0.0013206487324977222043 +14860 0.0151303319503177814004 +14861 0.0033946074063647860113 +14862 0.0047231227479471926331 +14863 0.0140583952458234462402 +14864 0.0249332930635074619119 +14865 0.012443306391319932519 +14866 0.0218074880353530918542 +14867 0.0056293737228367132439 +14868 0.0083092553005891696405 +14869 0.0405068617324257362888 +14870 0.0096972527686702680166 +14871 0.0717068275307186642209 +14872 0.0476623406205271965552 +14873 0.2979033940134447888681 +14874 
0.2255721665271900044925 +14875 0.0676573302233266921091 +14876 0.0576277144353388656728 +14877 0.0979351560844222146507 +14878 0.0539025724823345361858 +14879 0.0614234244318758934589 +14880 0.1151276090431096743583 +14881 0.0903739049003903366364 +14882 0.0121229886671549730998 +14883 0.0031740656313959292496 +14884 0.0050604654738404036482 +14885 0.0027385514465726921128 +14886 0.001862283644769087464 +14887 0.0078975949403439476421 +14888 0.0112016456397300203512 +14889 0.0044265457271771366937 +14890 0.019721197940290955547 +14891 0.0099930364048488855983 +14892 0.0225280217446792510594 +14893 0.2577487385429281796512 +14894 0.0888612198011000525844 +14895 0.1837163131267748328312 +14896 0.1709372962991067002037 +14897 0.2862744784819295507106 +14898 0.1032991813280390996921 +14899 0.0996018780561275796082 +14900 0.0395444443015593033453 +14901 0.0021822672480222853227 +14902 0.0021597686283105397252 +14903 0.1499105087903606359223 +14904 0.0737901377733784086255 +14905 0.0820046352470311934457 +14906 0.0900468230946451925201 +14907 0.0874833851050184907461 +14908 0.1259975452053360422955 +14909 0.0509497688546148158606 +14910 0.07921059789118180261 +14911 0.0201696475613924175452 +14912 0.0270557026003751706356 +14913 0.0607329586183364134033 +14914 0.0534626418329641056593 +14915 0.1614606411257595020015 +14916 0.2183563970167152479274 +14917 0.032641517257165325594 +14918 0.0347132347384088366105 +14919 0.0213352590340077537256 +14920 0.0598825750346242344224 +14921 0.0965674811274181421616 +14922 0.1997964855320945587458 +14923 0.1572563557679214629736 +14924 0.1537122630676545920192 +14925 0.128569784338205678198 +14926 0.1500949776370305854822 +14927 0.1384427479351268974739 +14928 0.1906845327533390199726 +14929 0.1989123196730490650275 +14930 0.1273554377860663666677 +14931 0.0500888983577596064167 +14932 0.033024301696155181074 +14933 0.0072647611232821074009 +14934 0.0221935281340300306907 +14935 0.0591241166283016858496 +14936 
0.0323472030206860966528 +14937 0.0364037083190365540197 +14938 0.1534285036819305336753 +14939 0.1372500150373902882972 +14940 0.2143282393181803746618 +14941 0.125029947709879657447 +14942 0.2519804553373244937475 +14943 0.2614265757564509518041 +14944 0.2343186601169481020346 +14945 0.2574286876476790109081 +14946 0.1897439611121812252925 +14947 0.1021909966034433847604 +14948 0.2298408460199745928509 +14949 0.1341174503574260601368 +14950 0.1551628264465765016933 +14951 0.1018534850482086667744 +14952 0.1166879053807990118674 +14953 0.1413921157222996494163 +14954 0.1088384825195796851149 +14955 0.1371046660874996880874 +14956 0.1131714602723532386852 +14957 0.1045617890391233595349 +14958 0.0834314768872771589381 +14959 0.1424163558922689754027 +14960 0.1288347953336186069162 +14961 0.0949407408056172907873 +14962 0.1514622972955906587966 +14963 0.1145129032714128358705 +14964 0.0649613724341419757069 +14965 0.1290663966616353575123 +14966 0.1777152736926378151505 +14967 0.3295378969830388138362 +14968 0.283236685338067228912 +14969 0.0739737578271926626838 +14970 0.2349006090199961260812 +14971 0.0392657052532615485907 +14972 0.0491780649559683119554 +14973 0.1755552949145059715708 +14974 0.1359479050745266981437 +14975 0.1456722789693686848089 +14976 0.2407372348826617713957 +14977 0.1671437005350517035485 +14978 0.0139614220145280309299 +14979 0.0583530143291577860221 +14980 0.0547069155890151787025 +14981 0.1610749753317337884528 +14982 0.1215151710529512874004 +14983 0.0631801539307214748975 +14984 0.1513632074637296653918 +14985 0.0258036452580524128253 +14986 0.3950776430828631946568 +14987 0.1030883663664788757464 +14988 0.1848797309390599885592 +14989 0.1726620866136867327167 +14990 0.1377758094804789612553 +14991 0.1706467539122008547636 +14992 0.0853782164118507269635 +14993 0.188220273843917096368 +14994 0.2309234347121734975961 +14995 0.1431081415367423115281 +14996 0.1411815521915166238287 +14997 0.0527267680771811159124 +14998 
0.16019978152723077236 +14999 0.0223323991846002822825 +15000 0.0792901425357430134078 +15001 0.1346495705097606465639 +15002 0.1086962924717287531928 +15003 0.2353229833767898071528 +15004 0.1035442464475556656467 +15005 0.0854581013477836437531 +15006 0.0133281939325491424059 +15007 0.20119034513329045466 +15008 0.1024027205780551896819 +15009 0.1353980476367817320504 +15010 0.0207614558326697645763 +15011 0.2523450238846036142171 +15012 0.0358104253226029581314 +15013 0.1221288157555578540725 +15014 0.1465620691383173190836 +15015 0.1484908910110934177062 +15016 0.0635068704148774171303 +15017 0.1315500831009459858922 +15018 0.1753231271874507168906 +15019 0.0353746609058941952064 +15020 0.0270586832636340555991 +15021 0.0520889295569090249316 +15022 0.087432921168187363592 +15023 0.0211740466219808368131 +15024 0.0114939247886927161518 +15025 0.1494271080827094533294 +15026 0.0635736439100878236719 +15027 0.1744257059505136264743 +15028 0.1322858633330304678477 +15029 0.0164366723079415466557 +15030 0.1138283963315878388745 +15031 0.1286159467509457865475 +15032 0.1626553058363396642161 +15033 0.1182758818252385851766 +15034 0.1710775758676675029069 +15035 0.1656395244986824111244 +15036 0.1580525065215513391692 +15037 0.0785049598624087685561 +15038 0.099850453148213474952 +15039 0.0268055983056011970511 +15040 0.001970830549245297051 +15041 0.0019708000866273485061 +15042 0.1607229720953552709606 +15043 0.1398972569679652611097 +15044 0.2189352325133092913578 +15045 0.2238479976920164926657 +15046 0.1755422331692935711267 +15047 0.1770339616942321336879 +15048 0.1442797392380787691035 +15049 0.1949751109684071670447 +15050 0.0898726870546474043699 +15051 0.1037359082733196324888 +15052 0.208798503868790341631 +15053 0.1313943343514634432001 +15054 0.1516380107105979868631 +15055 0.1987151566815898717788 +15056 0.1751088542510995105239 +15057 0.0044488445266612592841 +15058 0.0018075879383942091653 +15059 0.063168347086967904791 +15060 0.340457271986250864515 
+15061 0.1835904242111640327639 +15062 0.079187523290795314157 +15063 0.1568614245694555653365 +15064 0.1338455547876995221923 +15065 0.0431255448294132273657 +15066 0.2869259046381191780029 +15067 0.2869876283871521938096 +15068 0.145195569176896716046 +15069 0.2019470673537832883238 +15070 0.1412826777011378098781 +15071 0.1321158068285396836039 +15072 0.2115482102735397884619 +15073 0.2767162119103586670477 +15074 0.105083508771985800756 +15075 0.1268282373658327077948 +15076 0.0083111999358263417836 +15077 0.0109871694617383695164 +15078 0.2528941851324209566521 +15079 0.1893302387144827902965 +15080 0.1514597271987073445754 +15081 0.0800855878997228237681 +15082 0.1645310565383497625191 +15083 0.288590341760820878747 +15084 0.2828661243903938249389 +15085 0.1477930663052880178032 +15086 0.0771166124802423469387 +15087 0.0019017459437442830669 +15088 0.009060146728200282884 +15089 0.1360503601530458628321 +15090 0.126649220633203385411 +15091 0.131080373946328154755 +15092 0.0997347623448593817441 +15093 0.0843090738913501724916 +15094 0.0681752689990169774559 +15095 0.0030640090783411828498 +15096 0.0231614228001935373857 +15097 0.0101353874329837816831 +15098 0.0083550544469972939932 +15099 0.0824925265714696320796 +15100 0.2855871441846031810741 +15101 0.0590878309363226175699 +15102 0.0559068176902981842091 +15103 0.0497696853265483796225 +15104 0.2029225225558500544931 +15105 0.262898696073951165797 +15106 0.0075403901347051993653 +15107 0.0862018832100225340653 +15108 0.0414953868059671146518 +15109 0.1127200541328864658031 +15110 0.1297347547080969221156 +15111 0.0382427917891765972724 +15112 0.0848277491478822698312 +15113 0.0034794201916534248965 +15114 0.1498832300393654615167 +15115 0.1685388550629553505722 +15116 0.216868262124318006423 +15117 0.1489306805012127199461 +15118 0.0686209860269721971804 +15119 0.0129641575870808167109 +15120 0.0705351464680923495587 +15121 0.1957764529400599085029 +15122 0.2307190478871763983371 +15123 
0.1486339516292284612664 +15124 0.1094091520001979761512 +15125 0.0815154484405229834421 +15126 0.0897849412204183711106 +15127 0.2204601324857353439324 +15128 0.0799023169360107837766 +15129 0.1925876669410313546393 +15130 0.1949751007519291978731 +15131 0.2036863222882995816398 +15132 0.2856298018503333202567 +15133 0.1514403938870018395679 +15134 0.2390365653792807798528 +15135 0.1202182190014949997314 +15136 0.2362600915803014511862 +15137 0.0804824137731489086534 +15138 0.1886533581940339210359 +15139 0.2095570963020600696591 +15140 0.2918344193761417404431 +15141 0.1849594752982857137624 +15142 0.223496460675462305101 +15143 0.1595738223720947757123 +15144 0.1042320738136161273335 +15145 0.1183343059406585101589 +15146 0.2260910334575400304491 +15147 0.1872269750380978481363 +15148 0.1744487377089844748479 +15149 0.0181081528290080381871 +15150 0.1142954590062046499321 +15151 0.1917046798549042752047 +15152 0.2444446471493398886921 +15153 0.149318781209926310094 +15154 0.2162388752793804602703 +15155 0.090765096398385730625 +15156 0.1504414869938278109807 +15157 0.1222433179502518396653 +15158 0.1745213120854484933897 +15159 0.1080139453828777157396 +15160 0.2117580531859710457621 +15161 0.2275991259553379331138 +15162 0.0370330313827661045112 +15163 0.0970792153782090855296 +15164 0.0107063630961575888267 +15165 0.0245935623943640546685 +15166 0.1536672768289764900995 +15167 0.1905587898423468373732 +15168 0.1524467558022143676144 +15169 0.2055676753930714140139 +15170 0.189537270959304837703 +15171 0.2693679428219707405745 +15172 0.2231677252894800622673 +15173 0.157466532198116709873 +15174 0.1436862577734402957041 +15175 0.1447410437295786334033 +15176 0.216634771662075953369 +15177 0.2765211846249318039526 +15178 0.1936111035984995898396 +15179 0.2732596776262219973042 +15180 0.248357627174961376193 +15181 0.2039840847473644303012 +15182 0.1967825926150783000246 +15183 0.1878236577830786624421 +15184 0.2918475685110730188931 +15185 
0.161916673451496406777 +15186 0.2181186512581285430468 +15187 0.0564897097997928629431 +15188 0.1746724517970308643289 +15189 0.14314617542770385894 +15190 0.058875507374926354387 +15191 0.1008243085038472292014 +15192 0.2529890812491831786346 +15193 0.0652282429788558770767 +15194 0.061521426991351220448 +15195 0.1021306932739173556346 +15196 0.034982248514130648287 +15197 0.0694208846532851225897 +15198 0.1427165746095161447116 +15199 0.1105175627940269689198 +15200 0.2465256724221734918601 +15201 0.1346975420858086136544 +15202 0.1648201378409997752694 +15203 0.0368902360076618224261 +15204 0.0034142759114750452151 +15205 0.1182354955917118838338 +15206 0.0486965321489550642675 +15207 0.0064654007749505868277 +15208 0.0052499656072125516076 +15209 0.295041451750693128897 +15210 0.1191865725176150109244 +15211 0.1283937300843506768011 +15212 0.1097220549444083148583 +15213 0.1308817214944215101546 +15214 0.1977737260942524222251 +15215 0.1822509658459053882851 +15216 0.062379075652445457878 +15217 0.1162389935361981263551 +15218 0.1890655188691252597266 +15219 0.1410561075295375943472 +15220 0.238255919602140597835 +15221 0.218426746847600605772 +15222 0.0917037274615200709293 +15223 0.1279043796057225679252 +15224 0.2828066930874194029322 +15225 0.2136907049372336608872 +15226 0.1100156282583949485243 +15227 0.2880506635134689741307 +15228 0.22199806017267223468 +15229 0.2361014108320574067523 +15230 0.2564231126053175269242 +15231 0.2359243677091359125964 +15232 0.0709097749090906032121 +15233 0.2203629159752278077189 +15234 0.0413304178164116201022 +15235 0.0025020517275630388236 +15236 0.0601383508272575889597 +15237 0.4533782572324805415498 +15238 0.1914223273667572111645 +15239 0.2627114493348440271703 +15240 0.0309806116531716919937 +15241 0.0801584460885631094884 +15242 0.0067880235392419798618 +15243 0.0063709460039316403307 +15244 0.3488720812193784048816 +15245 0.2045155397506603911495 +15246 0.1136664730794505334677 +15247 0.0851384786448890068522 
+15248 0.3316275779618672037152 +15249 0.2056351331402446747099 +15250 0.2169305512172764183187 +15251 0.1395866479757023326247 +15252 0.1398847317634927811536 +15253 0.1641248241193487555911 +15254 0.2332144031614090118332 +15255 0.1567584997467541541027 +15256 0.0871101248424558105565 +15257 0.2289085691372378106312 +15258 0.2213258290977991038506 +15259 0.2235041113114456523547 +15260 0.0805678735591676825623 +15261 0.1091022154587962256223 +15262 0.1959932729039627619105 +15263 0.3107770449716509841842 +15264 0.1874916800357785406916 +15265 0.1606682308065233988348 +15266 0.0968752206002762572545 +15267 0.2402329721167731357845 +15268 0.0963208306736275110538 +15269 0.0791996535589404998534 +15270 0.0669916196459604124103 +15271 0.0871930261851702470555 +15272 0.0462416920122571822871 +15273 0.1987641358753508835644 +15274 0.1696684018092537160616 +15275 0.3207795580977487737151 +15276 0.0568012624038161592632 +15277 0.0755987975798596972821 +15278 0.1620175769032502488542 +15279 0.125256592149941892167 +15280 0.2328198754626390198919 +15281 0.0892312618788855954621 +15282 0.131835589980916711994 +15283 0.0008074301367317806687 +15284 0.026082730669097668047 +15285 0.0159323682464233007394 +15286 0.3646499806576515290146 +15287 0.107726344026221679262 +15288 0.0317121306655548246267 +15289 0.160628689533985957727 +15290 0.1077588331163883417219 +15291 0.0790568028821991980459 +15292 0.2499733885074705008744 +15293 0.1474965023024130961193 +15294 0.177239702492351519636 +15295 0.0609353676469075178357 +15296 0.0016656183512680144618 +15297 0.0781698895076564370665 +15298 0.0950153229077650013368 +15299 0.1622858024900554474446 +15300 0.0624606193025804973251 +15301 0.1735433081759261975119 +15302 0.1816431016344151883057 +15303 0.2216559806878747873604 +15304 0.3190904988622282334454 +15305 0.103792990934899048705 +15306 0.1538811779375977573103 +15307 0.0351204676994232925513 +15308 0.170579878001853257885 +15309 0.1505642567002117293562 +15310 
0.2249977156677035461385 +15311 0.070884729979961744184 +15312 0.0197408180605293907817 +15313 0.032118801606434439766 +15314 0.0462536219761678912832 +15315 0.0250116671401462153934 +15316 0.0206226572121701694218 +15317 0.0611422215940451013738 +15318 0.0088163409610677988953 +15319 0.0157217690879349371313 +15320 0.0539711355980279361422 +15321 0.0998200637273136681671 +15322 0.0663290233531785983212 +15323 0.0198129784885314325082 +15324 0.0856779860982529450908 +15325 0.0841870057117596881646 +15326 0.1014599082281048558762 +15327 0.2709064439998323803316 +15328 0.1520669597259560468228 +15329 0.2344319106861474932835 +15330 0.1081709715528891696534 +15331 0.0251992487727540695475 +15332 0.1212084970272798434276 +15333 0.1836914029214800903045 +15334 0.1504109136758255416133 +15335 0.18102159762927619302 +15336 0.1105206735129773687465 +15337 0.0160903369924267486413 +15338 0.1672759998148979354582 +15339 0.1369476071537575612158 +15340 0.3535996213846902547218 +15341 0.0596126868491548347895 +15342 0.0627527116317517230648 +15343 0.026062395544558827204 +15344 0.0804364100638294138301 +15345 0.183546962667330476604 +15346 0.1170738346910596278105 +15347 0.2840015640875253044584 +15348 0.1582673370045673544304 +15349 0.1736588391868843506138 +15350 0.1848586686671216117173 +15351 0.1646410899127938032294 +15352 0.0622287013639470643067 +15353 0.1323216337668265685235 +15354 0.0610845479414886677327 +15355 0.0156897273405678869262 +15356 0.0196195538604376981673 +15357 0.2250826090499905840492 +15358 0.2091915780447005657194 +15359 0.0088981284579177262473 +15360 0.006197134879581616719 +15361 0.1723711563477065000782 +15362 0.1707987152376994333114 +15363 0.2395338701085033272786 +15364 0.0423274640137420166131 +15365 0.2507450947216997017009 +15366 0.0700748704118610010738 +15367 0.0779583110718007066264 +15368 0.1114575849788347616798 +15369 0.0303864792593827592881 +15370 0.3401904723766905336291 +15371 0.1629094693739549792877 +15372 
0.1067063145980951810454 +15373 0.173139514399784977261 +15374 0.2139016937933115036774 +15375 0.0235784850225226084997 +15376 0.071908589987558224621 +15377 0.0389994619634058195534 +15378 0.0927347919809289045201 +15379 0.0119106169910992293337 +15380 0.1138274018346653593259 +15381 0.0785683055427956744676 +15382 0.0713120298685322312471 +15383 0.0192974528268850462065 +15384 0.2040278750650460770633 +15385 0.0513214556358469742303 +15386 0.0253330884494514831096 +15387 0.0983092477438442247095 +15388 0.0423027044338188731865 +15389 0.0774957283311820677163 +15390 0.1283141904104653785446 +15391 0.0235824508773545306328 +15392 0.1221920283497243547854 +15393 0.0189867577670402923329 +15394 0.2581886970169828821042 +15395 0.0337491052945461597612 +15396 0.1447661497735097668738 +15397 0.0543870585591184965701 +15398 0.0993481308331188167626 +15399 0.155541580939891510571 +15400 0.0409849751160986375265 +15401 0.0075794567708785901688 +15402 0.0244247137642791961598 +15403 0.0003998803504070059015 +15404 0.0916321507485420988504 +15405 0.0354586251781291231233 +15406 0.0105553458490068204079 +15407 0.0291672772121544555957 +15408 0.2112264475256204754317 +15409 0.0312735029881136689101 +15410 0.2261051128266001186695 +15411 0.0945232711771470390794 +15412 0.0682164522259687267081 +15413 0.1505776305121926117447 +15414 0.1128102908294878470308 +15415 0.0541209204132786572683 +15416 0.2394515937934405536147 +15417 0.1675335794336916261926 +15418 0.2296084455709239691146 +15419 0.2222063623154969524975 +15420 0.1125788178103570152944 +15421 0.03764454573659149067 +15422 0.0564212387020423949968 +15423 0.0464459048872941995811 +15424 0.0770951241374031037701 +15425 0.1251936976871520779131 +15426 0.079468643005011746494 +15427 0.0812537622019482430602 +15428 0.0955387145544665389307 +15429 0.2851690814605954416017 +15430 0.0040443969236500102241 +15431 0.0407446835415267369362 +15432 0.0046183645392077130043 +15433 0.1180857729387978505287 +15434 
0.0531636565646723541212 +15435 0.0387822361701183268567 +15436 0.0071593233712998454593 +15437 0.1867151109266980135715 +15438 0.1162416161600378500252 +15439 0.015777794905594149516 +15440 0.0561532329398683680011 +15441 0.1001389716360903542336 +15442 0.200989772053315002287 +15443 0.1802412199415610594144 +15444 0.1171003364467610857735 +15445 0.0256039422291295393208 +15446 0.0157179787324714069496 +15447 0.0327838311691424355576 +15448 0.1081208960232912102972 +15449 0.048405288117458582231 +15450 0.113130595590897595204 +15451 0.0229095083267349536316 +15452 0.0345799260462398630467 +15453 0.0055029214552379003556 +15454 0.0466593135727547403402 +15455 0.079540197178983473858 +15456 0.2569880829740418537455 +15457 0.1141234354535243106232 +15458 0.2349478336601781847115 +15459 0.0513244780973373046495 +15460 0.2485673956722157795785 +15461 0.0394035030237876379711 +15462 0.0732726570415571654138 +15463 0.1821303028046585281707 +15464 0.1643223544404903702087 +15465 0.1422527004855284604723 +15466 0.0713396790448348055991 +15467 0.0134099412453587071847 +15468 0.0673129426853324719371 +15469 0.0149034403901047284718 +15470 0.0729346298269040876905 +15471 0.1057657626613224821632 +15472 0.2222269809784684357901 +15473 0.0521724127649357688474 +15474 0.0460456422659814151865 +15475 0.0628205104144172188363 +15476 0.1637113792240175169646 +15477 0.0225148527698606874814 +15478 0.1109624958535758015854 +15479 0.1034989082502709989431 +15480 0.2529664551099723990291 +15481 0.2163506201864039135696 +15482 0.1881811519035586355297 +15483 0.1708880225432536092978 +15484 0.217769105302685672676 +15485 0.0880265322097574243188 +15486 0.0832198509814445902544 +15487 0.1628407442727931031534 +15488 0.2015744511324949472808 +15489 0.129025635646033393078 +15490 0.1161780280127505671217 +15491 0.19428981573829795515 +15492 0.0032157704243991305955 +15493 0.12911785983420115409 +15494 0.3711640112816024239528 +15495 0.3536107713738574354068 +15496 0.0938986300239670612644 
+15497 0.1922770960031738030072 +15498 0.1013584262068815411428 +15499 0.1488704349416622996216 +15500 0.1237432421025617124144 +15501 0.0401841604383558412272 +15502 0.1710145282749791773025 +15503 0.0073076753651037254858 +15504 0.1432472497910264042087 +15505 0.1149172863116754983448 +15506 0.081425134688677888728 +15507 0.0281034216946822526517 +15508 0.1962501845168291847443 +15509 0.1024960942479745257261 +15510 0.2569208725151675976051 +15511 0.1279240256718601753683 +15512 0.1650436855905590416427 +15513 0.1218086913065720700011 +15514 0.1305741146318138701243 +15515 0.3071047989782707543505 +15516 0.2678680330926118680068 +15517 0.1531348868813032504388 +15518 0.2366647458123253555673 +15519 0.169683698163197782538 +15520 0.1342830384415158173628 +15521 0.139978282581840612897 +15522 0.1969461996760829280984 +15523 0.2438909784486539833459 +15524 0.0857676361596401010123 +15525 0.0443681701582605211853 +15526 0.0161279688194990375449 +15527 0.0734824544657615152943 +15528 0.0243751944588856614415 +15529 0.2317141421896515285361 +15530 0.0536396568757158340168 +15531 0.0517018540678671478794 +15532 0.0130280522525201680056 +15533 0.2175038161870642139384 +15534 0.0485269308444815222758 +15535 0.0356648654334665998755 +15536 0.0354456520382366163657 +15537 0.0946723938097468187269 +15538 0.0906605555979008159717 +15539 0.1950438199357003021817 +15540 0.017160111683141118788 +15541 0.0560132355093904896193 +15542 0.0058952871618457962366 +15543 0.1877389696381218109345 +15544 0.0633735442068099757984 +15545 0.0700590633127858436469 +15546 0.0928271580026574782973 +15547 0.2816504193622674789843 +15548 0.0367829996628817809667 +15549 0.1970529220101016054656 +15550 0.0553020604150715675007 +15551 0.1041187546853301060246 +15552 0.1111006669540444413702 +15553 0.0269302183010025342202 +15554 0.1378645024439515975878 +15555 0.0550357228166433801175 +15556 0.172357565021455744203 +15557 0.1387293786591664579433 +15558 0.0036361203261237842917 +15559 
0.140316243662368000944 +15560 0.1009703330846805913179 +15561 0.0993284739022540080766 +15562 0.0888515292480629714778 +15563 0.1306023633047409537866 +15564 0.0280925108516042708329 +15565 0.1166199830673500942302 +15566 0.2185977529803652397167 +15567 0.1688186376242594155528 +15568 0.2219571660379860145174 +15569 0.0063335325401814822344 +15570 0.2096548721977236906522 +15571 0.1475919479212679519264 +15572 0.1498319579965564363189 +15573 0.0104138874009393599812 +15574 0.1675585862440327755696 +15575 0.0157760715708234744437 +15576 0.1812311816683496445979 +15577 0.0182801840589682171312 +15578 0.2569665999835050174127 +15579 0.0665930408392023387432 +15580 0.1812275501706346603203 +15581 0.2221649420631966187489 +15582 0.2651118046978471753938 +15583 0.036074999610887538426 +15584 0.1268723214331698712076 +15585 0.1072074015054325785146 +15586 0.3422787059261071518357 +15587 0.1421652856660737818117 +15588 0.1005462953921467811336 +15589 0.1856638347075047490442 +15590 0.2506600112226722854025 +15591 0.03966719715534213625 +15592 0.1675864873759017881216 +15593 0.2224882564871756007729 +15594 0.0107935120675292615944 +15595 0.173442447396480542432 +15596 0.1437458705249253509972 +15597 0.2312405314235564945946 +15598 0.0549405570523840425246 +15599 0.0156291782437192293409 +15600 0.1810520642131271396025 +15601 0.0197212083422837851365 +15602 0.0144526008856769444533 +15603 0.0228697426150585442217 +15604 0.0968155826905111316361 +15605 0.2913329540963830455169 +15606 0.0568068981838786499217 +15607 0.2022053787121544099481 +15608 0.3138134239001402203684 +15609 0.2101520661480217044836 +15610 0.0305621130121277763403 +15611 0.2017522335929807786314 +15612 0.1506048937010127930503 +15613 0.1312568072123563855946 +15614 0.0850123099826198408424 +15615 0.2268188969792716835894 +15616 0.0131303056592755752058 +15617 0.1584391752323174862305 +15618 0.0052106567369408478968 +15619 0.044860156905536720251 +15620 0.2250601920177332515571 +15621 
0.1352393268514541779801 +15622 0.0419492825886923562106 +15623 0.2115088953486161105211 +15624 0.1224519182633156400009 +15625 0.196945807149109725076 +15626 0.0136664549205734713738 +15627 0.1438803072505852598262 +15628 0.0726301065048782917843 +15629 0.1943673887694548907401 +15630 0.0656935266562913844357 +15631 0.2697136511539718894959 +15632 0.0399646195550160468479 +15633 0.1869260140963674987624 +15634 0.2301673125216098791945 +15635 0.1945611399506727290376 +15636 0.0711393095509445239077 +15637 0.1953106086815758757336 +15638 0.0289696441499718813917 +15639 0.0939365961910583191496 +15640 0.2252638730540382772105 +15641 0.1660010346362067457093 +15642 0.0473292771832163686185 +15643 0.1734931368378358762516 +15644 0.1683887610233213705246 +15645 0.2849160916631345852323 +15646 0.3273999149032687516758 +15647 0.0795845512312968461632 +15648 0.3173124996795073649025 +15649 0.099685483330493679377 +15650 0.1672491921949684490212 +15651 0.0969931047037896343355 +15652 0.0681122796384037859641 +15653 0.0171632990028541231042 +15654 0.0687068840062499547328 +15655 0.1786677234941488612296 +15656 0.0186740402783930094233 +15657 0.184872107049783407362 +15658 0.0592903900947470210991 +15659 0.0760225189547493462872 +15660 0.1689626792799877319506 +15661 0.0305388552650885207462 +15662 0.1696670994766602547088 +15663 0.0847734314026661883723 +15664 0.2465342477717313851571 +15665 0.0572485380908603624928 +15666 0.0549398510401282874627 +15667 0.261225300030136664553 +15668 0.0338293575212032376909 +15669 0.1035203877870310845699 +15670 0.0323078643144227503625 +15671 0.0724550742806042563648 +15672 0.055426491091450681703 +15673 0.0109588535002794734002 +15674 0.1363188359262621773826 +15675 0.0120292898710794640693 +15676 0.0048983216011440691706 +15677 0.1544317484085711289499 +15678 0.0142370582678691213335 +15679 0.0266623097856169936903 +15680 0.1295551141196601696581 +15681 0.0093038754487761061995 +15682 0.0067853631132653901073 +15683 
0.1690009683687814512343 +15684 0.1544228927607754209284 +15685 0.0503767929980654532995 +15686 0.0763170951602308300643 +15687 0.0637789795835236456023 +15688 0.0681973488044206876957 +15689 0.018121579732999087281 +15690 0.0246917555035805827424 +15691 0.178051861730811084028 +15692 0.109929548471407900756 +15693 0.0598799676456581511652 +15694 0.05343753774966661374 +15695 0.0173796747060573665566 +15696 0.0013551809558679768844 +15697 0.2194379673392281848709 +15698 0.0318566883699454433954 +15699 0.2450490239218224841622 +15700 0.1331717580274186241152 +15701 0.0083672999191621976606 +15702 0.0561425728022925882299 +15703 0.0092313245139764056602 +15704 0.1107696838929443095001 +15705 0.2084011786270877353289 +15706 0.0325724825788041710828 +15707 0.0505671233184075280542 +15708 0.0710474212006543537345 +15709 0.1387307969810158136958 +15710 0.1909916690879877310127 +15711 0.2071316882522572511149 +15712 0.054867452006063431591 +15713 0.121046039267086347202 +15714 0.03235786159816789187 +15715 0.0922113507729047793271 +15716 0.0946041055616177462007 +15717 0.161264420822781334719 +15718 0.0279298135915852924205 +15719 0.0322639399286829969826 +15720 0.1833744928131293139284 +15721 0.0984308305283028706079 +15722 0.1521980003941817705115 +15723 0.2259849699421356272744 +15724 0.024633426263136769252 +15725 0.0443947357175369025639 +15726 0.089473371303440921154 +15727 0.2912850478494989348022 +15728 0.1010586541415701422597 +15729 0.1493206349570299351637 +15730 0.195218940216191988446 +15731 0.1256787797287610419783 +15732 0.1073158560899779845332 +15733 0.0374642896746042591838 +15734 0.138250105342314411061 +15735 0.1896319454491003919649 +15736 0.1562083000933861265391 +15737 0.1346293640713370209916 +15738 0.0852035144399395372172 +15739 0.1701777305053106537081 +15740 0.0335692076093537239423 +15741 0.0193840088621929917956 +15742 0.0194751423813808062502 +15743 0.2136322110722240763536 +15744 0.1940251325825375905421 +15745 0.1605861563231589239642 
+15746 0.0251576603515517062404 +15747 0.1323524806877426907015 +15748 0.0493061410111975892412 +15749 0.1209587482373479100373 +15750 0.0335506028693820235187 +15751 0.1517626729831274556837 +15752 0.0889164492402083894707 +15753 0.2079345803149570404145 +15754 0.1063229972700561742904 +15755 0.1541373103144714762358 +15756 0.1342793655665905361296 +15757 0.0250720893119567855845 +15758 0.2245116705295475412196 +15759 0.0727690832981485041797 +15760 0.1718545472031116627676 +15761 0.1030132367324612108206 +15762 0.0078975167088801596915 +15763 0.020571205726177002604 +15764 0.1681330727654642143154 +15765 0.2438503356752504680482 +15766 0.0241393533668888098098 +15767 0.1726929457050099536719 +15768 0.1620675425430599392307 +15769 0.1356391266519088867337 +15770 0.2047094976731080040455 +15771 0.076340571937239670719 +15772 0.1303074971284891303558 +15773 0.2158550409146678394912 +15774 0.1821359025667903253076 +15775 0.1275892486736973363559 +15776 0.3088882828785238787717 +15777 0.2528942805845701413681 +15778 0.1275312730521554749341 +15779 0.1938332413438550172469 +15780 0.1778723964578449168172 +15781 0.1679013592597847770449 +15782 0.1616425066667726562208 +15783 0.185515781799493545412 +15784 0.2489063967677999367201 +15785 0.0786854381533256264092 +15786 0.2224192389700566097055 +15787 0.2324762743731190184349 +15788 0.2324547670905026119925 +15789 0.1115803772852139164051 +15790 0.3296068393272970076247 +15791 0.0345723141963002744692 +15792 0.1763688815198347081825 +15793 0.0818731205310098963945 +15794 0.0501510294420498131673 +15795 0.1387931248120843064076 +15796 0.1777578553486997281485 +15797 0.040490109953894730388 +15798 0.1343161144475885360183 +15799 0.156664137466005826127 +15800 0.0074434872880238117654 +15801 0.0677775374359546461944 +15802 0.1785800556402535876899 +15803 0.2444226084790113717737 +15804 0.1794293638962748682619 +15805 0.1796323301070872902052 +15806 0.1811058849249464475228 +15807 0.0244152449610195859675 +15808 
0.1569003554329740646178 +15809 0.0783658770922387032387 +15810 0.2856407575664773879076 +15811 0.1737950814394776666294 +15812 0.0801269307199168889788 +15813 0.1139175325997451165838 +15814 0.1947948372815765105681 +15815 0.1177254652306021359687 +15816 0.0384005507729237721692 +15817 0.1973041930637775553947 +15818 0.0915506197384305248077 +15819 0.3146877203648243970235 +15820 0.119653863039528421397 +15821 0.1400617126126419420551 +15822 0.1884334374470129080947 +15823 0.1531533472494146785881 +15824 0.2277439968689753679865 +15825 0.0143829650904713553933 +15826 0.000520748995161072494 +15827 0.0008433828138346014316 +15828 0.090333069745564639752 +15829 0.0070110360281086170006 +15830 0.1645330374466079936813 +15831 0.0620084779989740972339 +15832 0.2323389600187883707871 +15833 0.0759143651533751573357 +15834 0.1147834287364405819742 +15835 0.1597876848710067521075 +15836 0.0101389107760028237409 +15837 0.1407112724438823181394 +15838 0.0796685797559793379419 +15839 0.183948669792578956006 +15840 0.2215532859510203345099 +15841 0.0692115988290202127642 +15842 0.1594553283413341626673 +15843 0.0088788283345035429878 +15844 0.2437053366151859246092 +15845 0.0753656392875992481306 +15846 0.018099773704820392578 +15847 0.117945717531163438152 +15848 0.1744501429504918821323 +15849 0.0946782641114382456093 +15850 0.0580664983397123427022 +15851 0.0761098539933282164371 +15852 0.1392486122132187698064 +15853 0.1933642016575029820569 +15854 0.2373362539830534967411 +15855 0.1261394954617922337103 +15856 0.1804768970884004786193 +15857 0.066553543400821077447 +15858 0.0734558295978184061514 +15859 0.1899758432291977772177 +15860 0.0780237535027299144996 +15861 0.0842435893492749082734 +15862 0.0712778291109563422179 +15863 0.0900774475183909878462 +15864 0.1299333372091995009168 +15865 0.1394768757819678239862 +15866 0.1086929154775388844945 +15867 0.1326242295566910645022 +15868 0.0795400118954108781288 +15869 0.0106616835226228945377 +15870 
0.2091579502515585331501 +15871 0.018081855388729958789 +15872 0.0179284274295543992472 +15873 0.0021165770749352469382 +15874 0.0143328095152754526886 +15875 0.012687480238168380095 +15876 0.0456443621935293253333 +15877 0.009430605608512861518 +15878 0.010854352374579850915 +15879 0.0039813105510861602696 +15880 0.0133748466508353588272 +15881 0.0081981166539722115372 +15882 0.0165728005840291570117 +15883 0.0122641161042949687482 +15884 0.0161287289412779047404 +15885 0.042627755183281032525 +15886 0.0020547933771471690347 +15887 0.0149827213859907125426 +15888 0.0020181554509326022316 +15889 0.0057035790949799287236 +15890 0.0389868336400817847021 +15891 0.0024615070583479145387 +15892 0.0171542618279685102345 +15893 0.0137968391401723285833 +15894 0.0250381680389022073796 +15895 0.0113330930178843700129 +15896 0.0238598084758756025237 +15897 0.0105498927518330870567 +15898 0.002762088929508463471 +15899 0.0156794456450007370363 +15900 0.0068385981630196988543 +15901 0.000610423663002993889 +15902 0.026377220998888742387 +15903 0.0144729279208014333796 +15904 0.0051692780142417798714 +15905 0.0063826746903402004421 +15906 0.0070001738056696509449 +15907 0.0041593395397065318145 +15908 0.0044304825273813941075 +15909 0.0284359031173547389948 +15910 0.0048904144522761748906 +15911 0.012450333954070920417 +15912 0.0115307561747124871776 +15913 0.0010639646405011576793 +15914 0.0216497591542206616055 +15915 0.0108601221827144042015 +15916 0.0123538366324283267056 +15917 0.0070803216207248869524 +15918 0.0031163896942579634095 +15919 0.0269424678982205997868 +15920 0.0089434613557534802658 +15921 0.0048591752438354582008 +15922 0.0043330015791924192392 +15923 0.0125304130450550055276 +15924 0.0240982308471847789244 +15925 0.0141082227832090741082 +15926 0.0391687973174497036655 +15927 0.0075545275280243439839 +15928 0.0043772817749264669729 +15929 0.0311215413876297780638 +15930 0.0214894870246444304185 +15931 0.001439325653470692034 +15932 0.0082283015268247387125 
+15933 0.0143726421285125770255 +15934 0.0507435897120127807347 +15935 0.0099802974304633185892 +15936 0.0151410773717020850915 +15937 0.0141690948765983096985 +15938 0.0305475594511607877191 +15939 0.000852918988304910493 +15940 0.0027515615278423236935 +15941 0.0029675034070429703176 +15942 0.002615965287525340386 +15943 0.0039889507359996950048 +15944 0.0229117297974692456231 +15945 0.0249065573769584672204 +15946 0.0177024942460157798385 +15947 0.0178637488572424667221 +15948 0.006677043798659582606 +15949 0.0930702779992970685718 +15950 0.0067003170651024693272 +15951 0.0423339997555737534984 +15952 0.0097874448360393915408 +15953 0.1135872447914887672926 +15954 0.0034144837784291573457 +15955 0 +15956 0 +15957 0 +15958 0 +15959 0 +15960 0 +15961 0 +15962 0 +15963 0 +15964 0 +15965 0 +15966 0 +15967 0 +15968 0 +15969 0 +15970 0 +15971 0 +15972 0 +15973 0 +15974 0 +15975 0 +15976 0 +15977 0 +15978 0 +15979 0 +15980 0 +15981 0 +15982 0 +15983 0 +15984 0 +15985 0 +15986 0 +15987 0 +15988 0 +15989 0 +15990 0 +15991 0 +15992 0 +15993 0 +15994 0 +15995 0 +15996 0 +15997 0 +15998 0.0415762213042851422329 +15999 0 +16000 0 +16001 0 +16002 0 +16003 0 +16004 0 +16005 0 +16006 0.0641898168403989693998 +16007 0 +16008 0 +16009 0 +16010 0 +16011 0 +16012 0 +16013 0 +16014 0 +16015 0 +16016 0 +16017 0 +16018 0 +16019 0 +16020 0 +16021 0 +16022 0 +16023 0 +16024 0.0210227040950905601224 +16025 0 +16026 0 +16027 0 +16028 0 +16029 0 +16030 0 +16031 0 +16032 0 +16033 0 +16034 0.0032061380267618664643 +16035 0.0056919706198307173234 +16036 0.0223000099490644615452 +16037 0 +16038 0 +16039 0 +16040 0 +16041 0 +16042 0.0021159568693526835913 +16043 0.0073484761860509806644 +16044 0 +16045 0 +16046 0 +16047 0 +16048 0 +16049 0 +16050 0 +16051 0 +16052 0 +16053 0 +16054 0 +16055 0 +16056 0 +16057 0 +16058 0 +16059 0 +16060 0 +16061 0 +16062 0 +16063 0 +16064 0 +16065 0 +16066 0 +16067 0.0012058177342322428001 +16068 0 +16069 0 +16070 0 +16071 0 +16072 0 +16073 0 +16074 0 +16075 0 
+16076 0 +16077 0 +16078 0 +16079 0 +16080 0 +16081 0 +16082 0 +16083 0 +16084 0 +16085 0 +16086 0 +16087 0 +16088 0.0152972973314947965212 +16089 0 +16090 0 +16091 0 +16092 0 +16093 0 +16094 0 +16095 0 +16096 0 +16097 0 +16098 0 +16099 0 +16100 0.0112794410809940095358 +16101 0.0052946376273602059268 +16102 0.0006024998298750258415 +16103 0 +16104 0 +16105 0.0048867304001417978879 +16106 0 +16107 0 +16108 0 +16109 0.0090300274481830908324 +16110 0 +16111 0.001938681051134623053 +16112 0.0755510496369360545765 +16113 0.1808536771715715252284 +16114 0.1570952381684577459087 +16115 0.1867673213741027071766 +16116 0.2868772701588779283632 +16117 0.1343420704690497236022 +16118 0.2529608167913260929893 +16119 0.1430168041143461410414 +16120 0.1657716792095448588018 +16121 0.1876858377995943849559 +16122 0.2644631687433110500685 +16123 0.0773763471563061328018 +16124 0.2804417736518529657985 +16125 0.1884449346786252754704 +16126 0.1622052554665074919349 +16127 0.0518968928047949737414 +16128 0.1143027482508971875985 +16129 0.099242627826099730215 +16130 0.0654952095620215668514 +16131 0.1287752994191638800103 +16132 0.4625702767179263319441 +16133 0.1619269359119421825 +16134 0.2089530542000464508057 +16135 0.1888895168993203199648 +16136 0.1395101176231851036746 +16137 0.0013680703943796026362 +16138 0.010281385554565352497 +16139 0.0100730992823688369348 +16140 0.2494397195570102199191 +16141 0.1384144192117151006549 +16142 0.0798530036383782726439 +16143 0.0991166087385450395031 +16144 0.1695908927627955142814 +16145 0.1840619488882189458412 +16146 0.2153081504560865888642 +16147 0.133021785526401592481 +16148 0.1324724708500835956837 +16149 0.2374137487514335065342 +16150 0.2010087538157708519915 +16151 0.1918463908003349982501 +16152 0.2562459132521640037261 +16153 0.1292167344714361121305 +16154 0.1419472609250291561533 +16155 0.1596525254375296687392 +16156 0.2170329958256839997244 +16157 0.0947697336874704643339 +16158 0.2425566942756396671133 +16159 
0.2870384375627197948155 +16160 0.2333354873869347423376 +16161 0.1278235779910205649479 +16162 0.0356561860036541597441 +16163 0.1712780063607702851769 +16164 0.1347100797681051453925 +16165 0.1594162013517100018323 +16166 0.1312714512135875755128 +16167 0.3361338168610855658969 +16168 0.1019641398019448252921 +16169 0.2436179688233194307045 +16170 0.113859487562938593852 +16171 0.2651953548128310189647 +16172 0.1062061422084417117162 +16173 0.0207053336332446887014 +16174 0.2241699850625888390532 +16175 0.0357803520775029498813 +16176 0.1466239095449780593938 +16177 0.076747223704257627741 +16178 0.1370535462543056304252 +16179 0.1600380377592270919607 +16180 0.1742915251930536035818 +16181 0.0477659486742453356922 +16182 0.1977367862387306718386 +16183 0.1057997466256000718143 +16184 0.0169512254097073412906 +16185 0.014005661030100270345 +16186 0.0272612691323418782074 +16187 0.2382234198507621747432 +16188 0.2236575568607556963041 +16189 0.1224587128916872630358 +16190 0.1560316223863706108776 +16191 0.198129399751574064581 +16192 0.0634923098832794741142 +16193 0.1010801864606213906761 +16194 0.1630561684782222686696 +16195 0.2251340032072514096839 +16196 0.1681481803139555919469 +16197 0.2748596703817050568297 +16198 0.1568892996518606308687 +16199 0.1730874403323137411537 +16200 0.0179434908895633772608 +16201 0.0130498707192984869652 +16202 0.204047096241398301375 +16203 0.0940170929149356238153 +16204 0.1664723711802338079746 +16205 0.1205526969380752294603 +16206 0.0049108646302671657424 +16207 0.2213847915716623815374 +16208 0.1520969687432126848936 +16209 0.0354476104668228625405 +16210 0.1024807268626003464718 +16211 0.2169618696399453205981 +16212 0.1436806745980299393484 +16213 0.255048730610669960317 +16214 0.0621275609965114203326 +16215 0.0502519390148938385754 +16216 0.1358095530227527714118 +16217 0.1539342826070745906542 +16218 0.177244240477499814812 +16219 0.0841536178715079352664 +16220 0.1504170613380071686027 +16221 
0.1780546434577246761588 +16222 0.2442226241084822224714 +16223 0.1618197818932041454065 +16224 0.2180021745788333353833 +16225 0.0455651052828850502974 +16226 0.3097788423517595091106 +16227 0.1763197481540772215691 +16228 0.2090431174410198322811 +16229 0.2064958290101799320837 +16230 0.0212151990432322724855 +16231 0.0375333729161383897188 +16232 0.0029642132309036311154 +16233 0.061602922771059054341 +16234 0.0056336710262415235287 +16235 0.0694265797048045085749 +16236 0.0122952890947797856269 +16237 0.0121386683241781791198 +16238 0.070623350585827388004 +16239 0.0015236230099298143405 +16240 0.0007463371203306307881 +16241 0.0007157814721164129906 +16242 0.1437364858362560893212 +16243 0.0116178828075105926299 +16244 0.151763201858299556779 +16245 0.1621066862468218405802 +16246 0.0450645056980920541423 +16247 0.0343913576641591486882 +16248 0.30938137994689207666 +16249 0.0565314308441903293212 +16250 0.2967535587530959473312 +16251 0.021743524732034009983 +16252 0.0012976026312942494281 +16253 0.0026554416436502070557 +16254 0.0023935890464098137122 +16255 0.0039652364658300329886 +16256 0.0018148575099181313893 +16257 0.0006436242603748483938 +16258 0.0008703910722558101491 +16259 0.1991576065567107478582 +16260 0.1615232167044536593092 +16261 0.2475947656130859098056 +16262 0.0629224609542276269725 +16263 0.1545662449469100419019 +16264 0.0065288578731862965954 +16265 0.0858730318608592613217 +16266 0.1611321914644429365016 +16267 0.0018853244822599252075 +16268 0.1788758652273436389191 +16269 0.0200133973541869528145 +16270 0.0448220658756681294976 +16271 0.1460518950594304754542 +16272 0.1065241475050062031071 +16273 0.1500753146657482217652 +16274 0.001585206542975181555 +16275 0.0507212653088975479188 +16276 0.0839919380648753571839 +16277 0.1311421059169739222749 +16278 0.2297044843147226511615 +16279 0.3090813843403525384979 +16280 0.0013885928987487593505 +16281 0.0736232206047166271023 +16282 0.018885540892245589184 +16283 
0.0109890126301533767872 +16284 0.2211969809601326553139 +16285 0.12330063929240557552 +16286 0.1010915809764498518586 +16287 0.3048132349621700543096 +16288 0.1821047728687413269633 +16289 0.0429951990400846417728 +16290 0.1280822723619236824799 +16291 0.3011145864118492387895 +16292 0.1101538186803412561199 +16293 0.0013125917194658097333 +16294 0.0874979242656501177144 +16295 0.0777138782596030891492 +16296 0.119547858169796905603 +16297 0.011475869316206777429 +16298 0.0999475262486890780878 +16299 0.0960133105416235144158 +16300 0.0464171192607709393041 +16301 0.1873740509837303891416 +16302 0.140134244944758790119 +16303 0.005710016390671579338 +16304 0.2160209337998105494361 +16305 0.0793978900881162075187 +16306 0.04094947156855421877 +16307 0.2416760896478367115492 +16308 0.214829564567010994347 +16309 0.1355561367669022299065 +16310 0.2379474950894443496807 +16311 0.1273010324527202863365 +16312 0.2229399413654486461489 +16313 0.0825029630483845866085 +16314 0.131488858864072355459 +16315 0.021121338729104485038 +16316 0.0063330706065581488656 +16317 0.246347701025151855081 +16318 0.0766338513306978746931 +16319 0.2222371357985676032509 +16320 0.1732009514267404126642 +16321 0.0651946617770307185191 +16322 0.0199565023183427454889 +16323 0.0087378724157265268146 +16324 0.0637559956240909553316 +16325 0.0285094074299733560296 +16326 0.0033330220050117164571 +16327 0.022330909531827779213 +16328 0.0086184690034733796998 +16329 0.0193191492350699572933 +16330 0.0120422059587360526234 +16331 0.0020685822222891690758 +16332 0.0073787412903488728416 +16333 0.3475070662618787720177 +16334 0.2553160997500576456787 +16335 0.0005807460121379318671 +16336 0.0003074112600617406985 +16337 0.3367575658948455674668 +16338 0.1748155974332307738273 +16339 0.1356188527361354978495 +16340 0.1811899394006075414421 +16341 0.0556660391279432745226 +16342 0.0941596500227279420425 +16343 0.0002096567227190658132 +16344 0.036664531595311258827 +16345 0.1545217950103895676595 
+16346 0.1583665151900414125308 +16347 0.1100679297840366638317 +16348 0.0638298164026015468764 +16349 0.069717828928702513247 +16350 0.2085898998558391725933 +16351 0.1796347462783746407045 +16352 0.1181617081106236394339 +16353 0.0472837609857802015911 +16354 0.0975415490394305056965 +16355 0.1420173426548804185288 +16356 0.2068873906678582730301 +16357 0.2678952483848829890789 +16358 0.1836572435176615225583 +16359 0.0387603963151841868218 +16360 0.1439833620993223317619 +16361 0.074605310439016470081 +16362 0.0890924180747597743713 +16363 0.0718066315972803043133 +16364 0.0259565596483371935355 +16365 0.0783232090301988898018 +16366 0.0526470537091843146515 +16367 0.013169611645634128963 +16368 0.0305363060258417515214 +16369 0.0241225706160535721678 +16370 0.0094357672712936067894 +16371 0.1611212495406053046842 +16372 0.1218685868105491981739 +16373 0.1253155352108008824263 +16374 0.0880342873854908641862 +16375 0.2990279861507877390281 +16376 0.1891444887578442035636 +16377 0.0421421053082018381986 +16378 0.2996872355830028089585 +16379 0.1363489349166490394172 +16380 0.0962986775905416148014 +16381 0.1511350796498457105255 +16382 0.0774024110032655587865 +16383 0.013153281389840610488 +16384 0.3059524081308522824862 +16385 0.1948408975859114711238 +16386 0.0485292027712095411229 +16387 0.2171470652150441493777 +16388 0.0144657801580548115578 +16389 0.0499660296875392151428 +16390 0.2037373958272570839689 +16391 0.1930532986556718066939 +16392 0.1716535561472302684471 +16393 0.2246803031689924567971 +16394 0.0002818949054555021697 +16395 0.0755126888785089406264 +16396 0.1821671135274197661502 +16397 0.0417557540250754061217 +16398 0.1874875657359865166462 +16399 0.0786459636752006585603 +16400 0.1927583507652327055037 +16401 0.2377870744570979855137 +16402 0.0523486079667615997191 +16403 0.0335396177188719488149 +16404 0.2140303700404109221722 +16405 0.0545672611004820728997 +16406 0.0587749242595517404353 +16407 0.090290051540706892097 +16408 
0.0681545123716321143981 +16409 0.1721012594798842498811 +16410 0.1605764571806236606921 +16411 0.1156637601286709809978 +16412 0.0668570233149214931956 +16413 0.2448177831389002911955 +16414 0.2836750008679324031036 +16415 0.0403184745353195941875 +16416 0.2976159933755366249208 +16417 0.0344758629001818764359 +16418 0.1320208420750124400644 +16419 0.2599179878666604270876 +16420 0.0276967257768320658518 +16421 0.1002922106698305171335 +16422 0.0642850351822128462942 +16423 0.0716190156984904402471 +16424 0.0531764782734983065815 +16425 0.2386499545147739431705 +16426 0.2559317823843381689564 +16427 0.2622503811743617241348 +16428 0.0512052472594268365258 +16429 0.151912014120259120098 +16430 0.028089154689480859417 +16431 0.0547043813988131108483 +16432 0.0515318880276276272689 +16433 0.0536860437820012720289 +16434 0.2121883709728514810777 +16435 0.2201717109858775667242 +16436 0.098810326216908953012 +16437 0.0114320871438079070537 +16438 0.0973938229445485137425 +16439 0.2028057327731951353655 +16440 0.3325976334037407666777 +16441 0.0751050462925591660879 +16442 0.0346740978104739319687 +16443 0.0471253350955411348666 +16444 0.0813523749139665325325 +16445 0.0464626977476229052755 +16446 0.1434794336740979614486 +16447 0.211166088306406207753 +16448 0.2320204632680101242137 +16449 0.1665084325270737575231 +16450 0.2200337374405656187815 +16451 0.0902880820313199938809 +16452 0.096730875912742431133 +16453 0.1500406576442245565328 +16454 0.1172979131883619269283 +16455 0.0906114731085046476444 +16456 0.1499366401746584920307 +16457 0.1094767689242233943414 +16458 0.1325211278914099899406 +16459 0.2355177649519460403305 +16460 0.1594620541144604053407 +16461 0.102007709923003495045 +16462 0.1191934542677898917207 +16463 0.1150094900705122408224 +16464 0.0384631165363305271843 +16465 0.0308390594380723373491 +16466 0.0166082257381199756285 +16467 0.0164287847123832861373 +16468 0.0207268866780039653919 +16469 0.001713369123066352858 +16470 
0.1675392417317425608481 +16471 0.0262699483950079581973 +16472 0.0374456591303431174378 +16473 0.0122060830161217024475 +16474 0.0046271153383257357036 +16475 0.0081267201315632607572 +16476 0.0073355590810458629059 +16477 0.0082649341424597955186 +16478 0.2124672751922780100653 +16479 0.480737676209128828031 +16480 0.0468791667626539607094 +16481 0.257146533794501974679 +16482 0.2512600098613693155869 +16483 0.2482897483344081168699 +16484 0.171945323477121186917 +16485 0.2678889725171052504926 +16486 0.1559075912539305275306 +16487 0.17803210959386456258 +16488 0.1036018798108513799816 +16489 0.0753244098611666884624 +16490 0.0707089943300080042698 +16491 0.1954791603288393575522 +16492 0.1976174107062027562876 +16493 0.112802798465425804797 +16494 0.085968316658744783898 +16495 0.2049328339685261013425 +16496 0.0381863719095785827951 +16497 0.0394742245381534823112 +16498 0.0022501318062166588037 +16499 0.0619465306500691648139 +16500 0.0402131019646514847876 +16501 0.0943096432817352309241 +16502 0.198591615177624747357 +16503 0.0247458071629198528285 +16504 0.1413101957803788444323 +16505 0.1526227796016569049353 +16506 0.0696463672993315741078 +16507 0.0963945779602444763334 +16508 0.0259557854097685405947 +16509 0.009623724435563030255 +16510 0.0142166802332561077737 +16511 0.0141035987611602166292 +16512 0.0897499138842930976878 +16513 0.1520314962889373067156 +16514 0.2098625795227430090861 +16515 0.1130551924633672322695 +16516 0.022181431073757612582 +16517 0.2257463786573827013715 +16518 0.2330754599417744765688 +16519 0.2369210449337529889036 +16520 0.0824509668888410429677 +16521 0.0212523698439925703974 +16522 0.3190555470585098585268 +16523 0.0955346921159861833805 +16524 0.1627022953852163278388 +16525 0.1904171702578934810557 +16526 0.0098440127439613205745 +16527 0.1003992776401819997378 +16528 0.2041031571477570916162 +16529 0.0496038435138091912679 +16530 0.2460654461953766247717 +16531 0.1913116863307163240115 +16532 0.0337839627433631981002 
+16533 0.0813816268362216183041 +16534 0.1288294349931530802245 +16535 0.1532259385667350537474 +16536 0.0273015357290394868051 +16537 0.221879966985791626799 +16538 0.2447333010635615591077 +16539 0.3116068830898905961391 +16540 0.0010772397482657735789 +16541 0.0855242618413244171105 +16542 0.1361531188656138202653 +16543 0.1768011638223698223094 +16544 0.0077101010952201785506 +16545 0.0639651741509901783367 +16546 0.2055113001129926786348 +16547 0.1717203861720122681778 +16548 0.0538424305662790758653 +16549 0.1277524914223968632854 +16550 0.0876462338835080068122 +16551 0.2359402969799641869297 +16552 0.1701671002815398625607 +16553 0.1065515562915223535256 +16554 0.1578177926959536125917 +16555 0.2543373462686832620072 +16556 0.174683397147183228082 +16557 0.0466652808422031578628 +16558 0.2604070847084569706453 +16559 0.1327266501814821642036 +16560 0.1383350581903491360869 +16561 0.0713187082672100680636 +16562 0.2909045476057164347772 +16563 0.1889252318513657546983 +16564 0.0285721366479008247186 +16565 0.1603374900413054582327 +16566 0.1985249313238600099574 +16567 0.1570078635222187379838 +16568 0.2810283535975013347041 +16569 0.1918953345239265140254 +16570 0.1985368562960719840405 +16571 0.0873480603355668849819 +16572 0.2882377779165704034803 +16573 0.1077036073389062431627 +16574 0.2044794614398423304724 +16575 0.0407424864813078768666 +16576 0.1943841459400439397598 +16577 0.2081758194472668188624 +16578 0.0937560119999833602167 +16579 0.2219574339644267979565 +16580 0.1746829165672868500181 +16581 0.2828822214823044078891 +16582 0.070915768769662879123 +16583 0.017641020444089001995 +16584 0.1762755846031192896728 +16585 0.0248150692929759719707 +16586 0.1280990416237956508461 +16587 0.2210012150746637482612 +16588 0.0962022715799988603624 +16589 0.1503304623338309609082 +16590 0.1648463549990878962426 +16591 0.058027000374450889153 +16592 0.0046596056935093513318 +16593 0.2560268487012705707073 +16594 0.3027665256817782535315 +16595 
0.0568886152310058429427 +16596 0.3071176518599898486173 +16597 0.0007331427296657904574 +16598 0.0952773401795065089104 +16599 0.1980288962664776275524 +16600 0.2053561206763912239115 +16601 0.0179721220343074285231 +16602 0.1578128320232694836722 +16603 0.1130928656790476244387 +16604 0.2182938049107271960558 +16605 0.1089416262422273490307 +16606 0.0741646206485001013853 +16607 0.3506450383184788188728 +16608 0.1114618920020841036367 +16609 0.1595913965156102176657 +16610 0.0588861299841709137648 +16611 0.2658124724117779114074 +16612 0.088384694224109827676 +16613 0.1206222633019050777259 +16614 0.2749991520752583795684 +16615 0.1103583700747981649615 +16616 0.2040627726728093949138 +16617 0.2322009926674628854659 +16618 0.2873203649758870326103 +16619 0.2605965548147623489506 +16620 0.1757753070439408982217 +16621 0.1918837944189003552609 +16622 0.1861233639315230947719 +16623 0.2219128537687242164189 +16624 0.081472822685810700194 +16625 0.0858558764601300844488 +16626 0.1136858257235384278561 +16627 0.1239018088168752224076 +16628 0.2784486949758399609856 +16629 0.1199861271986065458517 +16630 0.1056988111194767376855 +16631 0.1509205605615895895433 +16632 0.1822093637329064153008 +16633 0.132253139320661766698 +16634 0.2698494612862853414548 +16635 0.0173718221142447372396 +16636 0.2325596733103958146938 +16637 0.2683500315713659989392 +16638 0.236904628167770386904 +16639 0.149251576008259362327 +16640 0.0682844685427882180928 +16641 0.146672156241963252965 +16642 0.1336089905431968039018 +16643 0.1334436784480442228507 +16644 0.1091214507919653214918 +16645 0.0658247019411827977153 +16646 0.1176362420720962614906 +16647 0.0017429471636527702708 +16648 0.1230911857278110255187 +16649 0.0198693535707986819461 +16650 0.0586243917297910172026 +16651 0.1437312954495182093506 +16652 0.0646714547888001395348 +16653 0.0107500639851739725888 +16654 0.3107473236850621933769 +16655 0.0309346403692007942965 +16656 0.1092196678603104126326 +16657 
0.1313158155976971996903 +16658 0.159505365201158183508 +16659 0.0420586566980089837653 +16660 0.0941339218426150703412 +16661 0.1130661900267599800918 +16662 0.0251608374765898527847 +16663 0.0007010229789564680113 +16664 0.2920043558286441465199 +16665 0.2069333852921236005518 +16666 0.1465854005723923048077 +16667 0.0740670621114126598306 +16668 0.0658790203477956221034 +16669 0.0816558083235175979908 +16670 0.2566215339441053067127 +16671 0.0293031282306569322571 +16672 0.1733113297577919265091 +16673 0.1357372445161131591096 +16674 0.1459351037882048174321 +16675 0.1981730191088286463508 +16676 0.1986559999438730905741 +16677 0.0519487020216464548406 +16678 0.0932298196453418270835 +16679 0.1096055196826311445291 +16680 0.0451226200270858565644 +16681 0.1502763296696299488708 +16682 0.1723285423478568556543 +16683 0.1276321235577169421749 +16684 0.1154375472122303736278 +16685 0.090028489741463310625 +16686 0.2453868887223023820265 +16687 0.0413743306160370555302 +16688 0.0155492107788475653757 +16689 0.0369806374780356772458 +16690 0.0926216479851628554654 +16691 0.2379356635881525738885 +16692 0.0453010617692485087482 +16693 0.0995178303896977894949 +16694 0.2023170304205110237028 +16695 0.1025611261185422895537 +16696 0.1820584384030380364905 +16697 0.066207060666007164218 +16698 0.0365946484790119208608 +16699 0.1275038252665074423042 +16700 0.0060172173524954799054 +16701 0.0029314804190191225478 +16702 0.0237422308900607434423 +16703 0.0358601116508215980727 +16704 0.0460275001762823210427 +16705 0.2300270441520970987614 +16706 0.0527956505805891179728 +16707 0.1506476160507932426924 +16708 0.2792732979874858467717 +16709 0.0905845204582605423305 +16710 0.1106069306830211596004 +16711 0.047077448566541529873 +16712 0.0925073239606714337047 +16713 0.1862193505748956923185 +16714 0.0930894499942639258583 +16715 0.166866793062796919811 +16716 0.1526726391953166228621 +16717 0.1915429491510418080757 +16718 0.1918911833288670099051 +16719 
0.2483759930421294837011 +16720 0.1184888011587561518212 +16721 0.0010178198880551930696 +16722 0.2309811804621424435879 +16723 0.1994884524596402530783 +16724 0.1685336760922676868635 +16725 0.0353824232411657918496 +16726 0.004123960385574236033 +16727 0.0268991151256034420614 +16728 0.1750094110515833523145 +16729 0.0346526665874999570405 +16730 0.0984768404522170376358 +16731 0.0280233249041509681343 +16732 0.0537748675091609396048 +16733 0.1723775306161291764884 +16734 0.1984603894788008193739 +16735 0.0215482388213103989716 +16736 0.2801417017342520132672 +16737 0.0918635688329322158685 +16738 0.0612855111502070143414 +16739 0.1410066085355373977084 +16740 0.0680933516056000803784 +16741 0.0798000490391305439264 +16742 0.2178339924759949142175 +16743 0.0898510017536864324939 +16744 0.0454457088329357702938 +16745 0.0263192041756223037152 +16746 0.0824995881241344652945 +16747 0.0662350340099270712413 +16748 0.0789958447724011803004 +16749 0.1498598317399812140938 +16750 0.0323111607756816840431 +16751 0.1337413776167186152399 +16752 0.3613004021941749188684 +16753 0.0537476847733288215503 +16754 0.19127003309100323869 +16755 0.1367137283934932723284 +16756 0.0229976736420459890176 +16757 0.1212410854637801654876 +16758 0.0019899085101834092758 +16759 0.007211258985054574272 +16760 0.0003017419165556903498 +16761 0.0002386763786663105233 +16762 0.0125500876087183532021 +16763 0.0069820355593824086002 +16764 0.2444278805934523568499 +16765 0.1273916231792801001532 +16766 0.0803676984678970973697 +16767 0.0012230975374424198077 +16768 0.0032331265451713470972 +16769 0.0039414037331521566132 +16770 0.1330088674660935832428 +16771 0.0886057928596776650521 +16772 0.0133124836917912683226 +16773 0.1288781317826888372924 +16774 0.0158316450355630508606 +16775 0.0202314120153291310267 +16776 0.0054983419240957290969 +16777 0.3231676095269647164265 +16778 0.1857475069562270209023 +16779 0.1010235480438350791355 +16780 0.2480156700686920179244 +16781 
0.2398946874781062721382 +16782 0.1608385078980763549517 +16783 0.0787852833611747821463 +16784 0.2032811822779386490279 +16785 0.1159888384179353360004 +16786 0.2112167948953618257235 +16787 0.171132415073268040473 +16788 0.1120518093164746403767 +16789 0.1747244999366075746128 +16790 0.0866744124606114163534 +16791 0.0855085648650566465534 +16792 0.1676600880329578990846 +16793 0.1646162085106168126991 +16794 0.2310293261470057202267 +16795 0.08073494480559681441 +16796 0.1916966153780668979945 +16797 0.2140960296298549392358 +16798 0.2306740324251390916732 +16799 0.1868659668580923138848 +16800 0.0676477707148764179079 +16801 0.0307915998283505645827 +16802 0.219527291800017815282 +16803 0.0381196373362432811005 +16804 0.2204394852376395652627 +16805 0.1491152681291241233197 +16806 0.0919587160066818454229 +16807 0.1953079330212796616006 +16808 0.1429681333125652764604 +16809 0.0726054912670692498278 +16810 0.029511627526061182808 +16811 0.1001501062780154466259 +16812 0.1393195879459326602667 +16813 0.1907719107971305172455 +16814 0.1735411534613071071753 +16815 0.1967286940012358265051 +16816 0.1080289894801525762968 +16817 0.1852524610933890258924 +16818 0.2852498663894187380663 +16819 0.0999699539962512345737 +16820 0.1072060648350098704951 +16821 0.0861708147319900086281 +16822 0.1950873655992161614936 +16823 0.2428908644956299101025 +16824 0.1442095323865800193808 +16825 0.2201871439847058187222 +16826 0.2223176262982817785741 +16827 0.0076401309707995936765 +16828 0.1188335176492745642873 +16829 0.0997923903430294190198 +16830 0.2347428241055161957096 +16831 0.1076052067711668058791 +16832 0.0035516005371227566713 +16833 0.0005263118499466627569 +16834 0.0031017365164040145049 +16835 0.0701793928566424585203 +16836 0.0395978220434976893594 +16837 0.0757733065349713158509 +16838 0.0053035082506607718333 +16839 0.0552608850333980994307 +16840 0.0005775941698080076748 +16841 0.1029807536408611917667 +16842 0.0203824714052425644273 +16843 
0.0483953433167442978458 +16844 0.0651327494091398229159 +16845 0.0181156016621233303954 +16846 0.0469363098202866257935 +16847 0.0429485697632946625268 +16848 0.0350183784221284322147 +16849 0.0149221988782463187845 +16850 0.0295885322973184067274 +16851 0.0533482552769923359937 +16852 0.0973447436937014964897 +16853 0.0534535678576580977617 +16854 0.0903817449699020891707 +16855 0.0370551156887112750904 +16856 0.0187004564533809550242 +16857 0.0221549990430330925872 +16858 0.0262034893336372844141 +16859 0.1068661456126464570637 +16860 0.078519976874424385338 +16861 0.1464717944535466331857 +16862 0.2343052388581341738405 +16863 0.0437073731655179126276 +16864 0.1047352305699651292548 +16865 0.2764493643417253454508 +16866 0.0848776647340919315754 +16867 0.1678012993699057320285 +16868 0.234676029417009907041 +16869 0.1028537980715492072825 +16870 0.2806057448709480262039 +16871 0.2569213252153128013155 +16872 0.1195731597909068105734 +16873 0.1007544982092021257358 +16874 0.088488779844736492608 +16875 0.2460359084116182926127 +16876 0.2372085872625338820452 +16877 0.158383252654506756496 +16878 0.2122559370494951358488 +16879 0.0007655928464404561894 +16880 0.1445870459157371223924 +16881 0.0665445744276592504596 +16882 0.13920135071608008559 +16883 0.1431455065407894744478 +16884 0.1121422076687203089884 +16885 0.0190305654456373839201 +16886 0.0026310656607839897313 +16887 0.158465549732888372203 +16888 0.2770351864162637900435 +16889 0.044802610839457621128 +16890 0.0168557331349535460818 +16891 0.0303980544956027089187 +16892 0.0021461787191927100479 +16893 0.088240746070544953672 +16894 0.0455754357784949742927 +16895 0.2699395523600004453968 +16896 0.055333859702969563632 +16897 0.1821619449824540826199 +16898 0.2123544858541883961767 +16899 0.1754345204399675450802 +16900 0.148255321423210101317 +16901 0.1433974540310116729813 +16902 0.2605095332060469726088 +16903 0.1366266105367197891685 +16904 0.0468508480346157898944 +16905 0.1339677655687750990054 
+16906 0.1609809930015090317923 +16907 0.0950339600704430409461 +16908 0.220595940767843057273 +16909 0.1211922002806690989996 +16910 0.2067818349233826180633 +16911 0.155038520739248619762 +16912 0.1663194618431922955359 +16913 0.0733976115337666434879 +16914 0.1946146196517153925676 +16915 0.1334613185008990954294 +16916 0.141115535114945761519 +16917 0.0371263765116141078448 +16918 0.0584166010858219938995 +16919 0.0209103711035189543388 +16920 0.2734751019915820924311 +16921 0.1614992365978968891227 +16922 0.0031278694140456867606 +16923 0.0820205307584023907141 +16924 0.063478384885516450642 +16925 0.0928193151717257164401 +16926 0.0513030211427458401885 +16927 0.0422947591699351505001 +16928 0.0253242616126035137436 +16929 0.0212922404343575434604 +16930 0.1537611431893558533446 +16931 0.2077507290108916437887 +16932 0.092659135189156777157 +16933 0.0847267579636472728533 +16934 0.0849202432064653650601 +16935 0.0992109718214185809515 +16936 0.1347224199517542408966 +16937 0.2643923608757871579478 +16938 0.0615420979624623359916 +16939 0.0070661448766131023172 +16940 0.0978565114803454511039 +16941 0.3269769986568658581838 +16942 0.404366766340198635632 +16943 0.1714812567939071052958 +16944 0.2459392725691601777616 +16945 0.3201172435957587159017 +16946 0.1731226701722713146037 +16947 0.2019990984871045391902 +16948 0.1739451022383857459186 +16949 0.2130333966324369499024 +16950 0.2400034069485424970125 +16951 0.1672878487854194684559 +16952 0.1071438615120936937997 +16953 0.0897568265155254674248 +16954 0.0070728727172890875144 +16955 0.0555929593641944636007 +16956 0.1335586243443128706687 +16957 0.2174603165178919661482 +16958 0.1719363897308549271603 +16959 0.0814821730834425494994 +16960 0.1559355006798198994833 +16961 0.1934269546065927602996 +16962 0.0203122997409412760217 +16963 0.0083295832730608219963 +16964 0.2998867865516931519032 +16965 0.134412854424528327435 +16966 0.2117846466604303423686 +16967 0.1134012721797270223378 +16968 
0.2022235155891342062162 +16969 0.0441856280529197911755 +16970 0.1040165593069849597763 +16971 0.0013043497062755593074 +16972 0.0015827809074878016957 +16973 0.1162397720253407207291 +16974 0.070251147381323209018 +16975 0.0355778725667007245104 +16976 0.0246704255431467178727 +16977 0.0154300891748256063796 +16978 0.2134715881201312559501 +16979 0.2490932821313284473153 +16980 0.1060742582878801687496 +16981 0.0078761223908453779213 +16982 0.1602882596036575846288 +16983 0.1287998583682262720984 +16984 0.0327377225273480787493 +16985 0.1503485225245154155438 +16986 0.3428279056730536011877 +16987 0.0295227242495155264845 +16988 0.0105858085446398596241 +16989 0.0201954080307525235538 +16990 0.1355624574299003071154 +16991 0.0378825438803815153777 +16992 0.2858783524287325517044 +16993 0.1501323841862007568704 +16994 0.1439817170868852302945 +16995 0.0377516672189745702837 +16996 0.0335062169913254992437 +16997 0.1356816004487113314081 +16998 0.1096505774314538023129 +16999 0.0709022035712956605336 +17000 0.2307683788449631512307 +17001 0.1255039017559597958051 +17002 0.051093828329668994348 +17003 0.0273682108039232097063 +17004 0.0477955281326200506764 +17005 0.0528445802968597541649 +17006 0.1550612823321834699009 +17007 0.0777441362234719685054 +17008 0.1313202998062957538306 +17009 0.0455931272987902813654 +17010 0.0040035802397153661195 +17011 0.144815058911985428125 +17012 0.1881879752397749572257 +17013 0.2198761444432857925513 +17014 0.28016422454744338566 +17015 0.1218186503270866638537 +17016 0.1021938634979929205748 +17017 0.1762942543858513899657 +17018 0.0019748582747833418223 +17019 0.0398736933500634244121 +17020 0.0128635546671747990483 +17021 0.0062704479267977767179 +17022 0.0871779729911097145401 +17023 0.0026158491806844751462 +17024 0.2161274263195462619347 +17025 0.0210197703861606834119 +17026 0.027413340417271395838 +17027 0.0033717003588937671799 +17028 0.0091009252756640117432 +17029 0.0045013078921285155573 +17030 
0.1487636015254002741504 +17031 0.1519297879804786810531 +17032 0.2174545303292691189601 +17033 0.1861703881949445804622 +17034 0.0599720061538240034604 +17035 0.0369280648351024420872 +17036 0.0473687712868015031731 +17037 0.3584918550062101139098 +17038 0.1831713433314167416288 +17039 0.0209067379289461542646 +17040 0.1058407881693002178247 +17041 0.0274059280148723841131 +17042 0.2266032480356568179491 +17043 0.0779987939876936181571 +17044 0.0825693535136396289387 +17045 0.1100944536846039933931 +17046 0.2226734120761820756584 +17047 0.0023360415897440609499 +17048 0.2187391085863836237557 +17049 0.023851152812851425522 +17050 0.0835346004049178775253 +17051 0.0101635413274941913825 +17052 0.0043070015539042431907 +17053 0.0007706483847890603948 +17054 0.0221993827374691102272 +17055 0.0024459037474329958436 +17056 0.1912735750011903823609 +17057 0.2660111968199093679388 +17058 0.0033487240010780438756 +17059 0.1307027215463235081394 +17060 0.1453161298922036881454 +17061 0.1223196280476895325684 +17062 0.1778271780341377505952 +17063 0.1071513167445578423864 +17064 0.2379732891255935123276 +17065 0.0068249543430002838212 +17066 0.0973212263964932239313 +17067 0.1504957423659121495962 +17068 0.0883439286755859781719 +17069 0.1831339431801572159575 +17070 0.0816540913792574984953 +17071 0.0694377197751442493123 +17072 0.0077468550311974417391 +17073 0.1298332934084729661794 +17074 0.0185985493232085422055 +17075 0.1670056291155945227267 +17076 0.0888138060928748696021 +17077 0.1513613275973254845841 +17078 0.0280346581123692983883 +17079 0.0155845414610686714557 +17080 0.0656422739898585710927 +17081 0.198939410265242783149 +17082 0.0808678963347690404895 +17083 0.0097434750723621749852 +17084 0.1477992782990636688734 +17085 0.251308972255904816695 +17086 0.1251348329762491384809 +17087 0.0653388551011065271679 +17088 0.1233295348358915549669 +17089 0.1389303148372093565932 +17090 0.1710475842015012903907 +17091 0.0948893306847218148681 +17092 
0.1888249643005095179671 +17093 0.0553253320101485049354 +17094 0.0472067344098947572206 +17095 0.0483597920656440027898 +17096 0.0968396177219593562446 +17097 0.1454910283700276418806 +17098 0.0936836007662693315279 +17099 0.2114341771999231622825 +17100 0.1450544114929031846994 +17101 0.0165807451159886050374 +17102 0.0125450240802278435714 +17103 0.2473102525383007455062 +17104 0.1008755832763718052192 +17105 0.1125413580220744752536 +17106 0.0205494107621235141958 +17107 0.0032834957164215041424 +17108 0.0493525597892387965504 +17109 0.0054435639172064360231 +17110 0.0022950738752712746341 +17111 0.2414792909464940351416 +17112 0.1531395120812955312761 +17113 0.1239460829495167004666 +17114 0.0782096584144995332633 +17115 0.0007983690158747572914 +17116 0.313679347344244963125 +17117 0.1997120781045056969738 +17118 0.1115049297973629838232 +17119 0.1065101148259735147628 +17120 0.2004261042119004077033 +17121 0.1915216991835683169043 +17122 0.173093354397518828236 +17123 0.1290781376003847791267 +17124 0.0563708301238291381585 +17125 0.1869594968815427304953 +17126 0.1885299756444164254177 +17127 0.0514206641545097986601 +17128 0.054094237072113582343 +17129 0.115223127249065110389 +17130 0.0939166153941944636951 +17131 0.0035953805489971108096 +17132 0.0161393062310756021938 +17133 0.1717512903392038048356 +17134 0.0126257974259074701173 +17135 0.0440098777889145684195 +17136 0.2181786806496252839604 +17137 0.1581236731180842014499 +17138 0.2469110504597383792813 +17139 0.0875048179427920097551 +17140 0.2009856988024693869033 +17141 0.1083979001822747978423 +17142 0.2394554018836604902809 +17143 0.1239379973371035892349 +17144 0.1880742941645971166853 +17145 0.0984089865369479344093 +17146 0.2069150515278548319031 +17147 0.1441316189704201777833 +17148 0.24022677776858875065 +17149 0.2748083703786420861626 +17150 0.1438085215950061501733 +17151 0.0290170840710436764298 +17152 0.0941297768363953679671 +17153 0.0806703315488283834167 +17154 
0.0547601786151387659451 +17155 0.0065569349241105271894 +17156 0.2172305238639896662889 +17157 0.2589080140827704212825 +17158 0.1782160560532977666703 +17159 0.1781073335858649464569 +17160 0.1240598608341687975054 +17161 0.082591062767815040524 +17162 0.0672393741893736046622 +17163 0.1572926763341831180032 +17164 0.1792658654628397907338 +17165 0.215195369507629019612 +17166 0.0344104924988244895157 +17167 0.2140559008434681220745 +17168 0.0919637267950459830201 +17169 0.1819983931676389665721 +17170 0.0219852588029356282862 +17171 0.1852540776174025904943 +17172 0.1692241122012819731779 +17173 0.0478343256356169033161 +17174 0.1267144045397286811205 +17175 0.1841988139441036176347 +17176 0.1221701421535668907747 +17177 0.3605465535422571199398 +17178 0.3315387798358139792576 +17179 0.048103556883108314346 +17180 0.0422998882602492604077 +17181 0.2228742896985811960953 +17182 0.0748034734519037597389 +17183 0.0585621323691317557936 +17184 0.083826760070018796478 +17185 0.1293948158378209201125 +17186 0.0231998569196218383526 +17187 0.2437726056589805123664 +17188 0.1826507172537543133295 +17189 0.1264548645467935272801 +17190 0.1330663995336935079283 +17191 0.1078385803494616573817 +17192 0.1058627084179283306398 +17193 0.050646820112752580223 +17194 0.2222268284568062746143 +17195 0.1358740541893384556893 +17196 0.1091633944001808992352 +17197 0.0951285755288747425507 +17198 0.167897206929694786659 +17199 0.2190553464899325508686 +17200 0.0419260498512176379982 +17201 0.1814306182491589036765 +17202 0.0460393876572153387672 +17203 0.0143140219733319060819 +17204 0.0869662403527916016976 +17205 0.1426161291185380208812 +17206 0.1856927028709929317074 +17207 0.3825109416399239958828 +17208 0.1747199691306438318339 +17209 0.3619791706992045998703 +17210 0.0113509056977867169719 +17211 0.0056954301104061395497 +17212 0.2227911987789777392699 +17213 0.1103973716126110254931 +17214 0.1544385931156481706061 +17215 0.0978225420936189998811 +17216 
0.2519376120754078240971 +17217 0.2074652028420872484915 +17218 0.0750244037859766482734 +17219 0.1673379985611376519206 +17220 0.1894438615427269656522 +17221 0.084194557005301984276 +17222 0.0090424166101005700596 +17223 0.0857354057677638253798 +17224 0.2574966007195677031305 +17225 0.0192261669615531177735 +17226 0.1079226521618634609956 +17227 0.1105720887701659577163 +17228 0.1480088411053462615907 +17229 0.0845670373099616234924 +17230 0.1360798711945767347498 +17231 0.0816463687356675793527 +17232 0.0987909857589089612784 +17233 0.0277748945151506942963 +17234 0.0209134277376155468253 +17235 0.0310444038073204031658 +17236 0.0126953913000346276657 +17237 0.0562294513107078550163 +17238 0.0330652615040648092948 +17239 0.0199778810054734995427 +17240 0.0305362072579387321125 +17241 0.2125377224438468048984 +17242 0.1266449312160743134736 +17243 0.0674393891900937902939 +17244 0.1207690864544969794325 +17245 0.0277971184280833942537 +17246 0.0681471785744877794899 +17247 0.06659248665140676815 +17248 0.156767324653021900005 +17249 0.0238661012528588976911 +17250 0.0878542799428907728032 +17251 0.0906649276116418684213 +17252 0.0756027091178259336157 +17253 0.2026802275937653330384 +17254 0.1773620049726355174524 +17255 0.2946086517520967174022 +17256 0.1173638646419656178521 +17257 0.2829800629940412148855 +17258 0.1560555951831270049546 +17259 0.264868572367622401309 +17260 0.0157350051580672188611 +17261 0.0164292903607400619936 +17262 0.0083700012419607571629 +17263 0.4341518842217534923655 +17264 0.2097742010116541433451 +17265 0.2393680346948404147334 +17266 0.1724094727338393862048 +17267 0.1173902483900085697011 +17268 0.2419134926565315313329 +17269 0.1265963207922135291827 +17270 0.1831250727933746158538 +17271 0.125000520695603606347 +17272 0.0008904796419102399248 +17273 0.0090401094394023610162 +17274 0.1966313895041106318828 +17275 0.0018597516493554320377 +17276 0.1561055070186885829209 +17277 0.1917452011912305909203 +17278 
0.1455991567538177655639 +17279 0.1367351764038861605677 +17280 0.1458239921771110614301 +17281 0.2561337313974862417432 +17282 0.3216779024206246062612 +17283 0.0762777388518218746327 +17284 0.0147636441183274999545 +17285 0.0937051864170419246403 +17286 0.1518286779721758072981 +17287 0.1619323511068410936797 +17288 0.2018949654491126177724 +17289 0.1528092443654550380838 +17290 0.0627803806475361325656 +17291 0.0949121584271426094181 +17292 0.018371072360652215133 +17293 0.260804612557674453388 +17294 0.0769482275407650101462 +17295 0.0552528216411833431043 +17296 0.1867474531545426363888 +17297 0.1223747557235510541318 +17298 0.1234193681902506639636 +17299 0.0491912001512496976918 +17300 0.1904491683596581941273 +17301 0.1901083225941216647836 +17302 0.1635508448086665789223 +17303 0.1150086197876039467713 +17304 0.0166094141274792245211 +17305 0.3212583389530604405238 +17306 0.125873425515836245836 +17307 0.0993761912825071036526 +17308 0.0726898531286603866164 +17309 0.0312474171394975762384 +17310 0.1717739504768655833811 +17311 0.1291766445873712421122 +17312 0.1271568151846139971273 +17313 0.0549752040871996985882 +17314 0.2806122127770785401601 +17315 0.0647208082018650293632 +17316 0.0928757648350940351323 +17317 0.2201355192911008673562 +17318 0.2973746329677817157311 +17319 0.198090302816732038993 +17320 0.0744195449092927563406 +17321 0.1043462860321264507046 +17322 0.2176686173470270824204 +17323 0.0790501986314551263746 +17324 0.0712866398098904591185 +17325 0.0511664748716841727427 +17326 0.0914106821343546038783 +17327 0.1282346809439547929976 +17328 0.1611198827855400139697 +17329 0.3603119700645429079877 +17330 0.1647846078759122590007 +17331 0.3090985383308906264688 +17332 0.2994280486589228074124 +17333 0.1736361937240409047956 +17334 0.2153659177898539500706 +17335 0.066861766143256132211 +17336 0.1821008484606409516537 +17337 0.0083648150184789642897 +17338 0.1568990803299119274339 +17339 0.0770828147925064210977 +17340 
0.0787346986859215614007 +17341 0.1452950081748025923378 +17342 0.1911168013211043126276 +17343 0.0265440247866746345529 +17344 0.2196140190736160235918 +17345 0.1203970360884706186644 +17346 0.0215782362090238613583 +17347 0.1303848799676247394164 +17348 0.1212774760632003984151 +17349 0.1569343697412305671524 +17350 0.0448412715397209019064 +17351 0.0905448140389489147806 +17352 0.0138752322674722088081 +17353 0.3304615940928214978811 +17354 0.1538992588893424440233 +17355 0.1275598610942117461242 +17356 0.1429308369952264901048 +17357 0.0792263564260479863366 +17358 0.1079857226668507613754 +17359 0.1016406210379122432297 +17360 0.1901416786733265140708 +17361 0.2279200679889120118027 +17362 0.10104379823623320378 +17363 0.2515609893958789533563 +17364 0.2834480307678198807153 +17365 0.1062606661918030770408 +17366 0.0804408727971790415978 +17367 0.2310665408993060054854 +17368 0.0385647487120655413362 +17369 0.2566130679000091507902 +17370 0.3335591674967264830265 +17371 0.342287188308860568231 +17372 0.1468589266957100736288 +17373 0.1343663210199798063726 +17374 0.2649331284149304410747 +17375 0.0967474615815937244445 +17376 0.1756116288288950832985 +17377 0.1951524262472911774857 +17378 0.0965678225042231191955 +17379 0.0286727208802415658995 +17380 0.1049094303461908933572 +17381 0.1651432384213426529662 +17382 0.1489048847499782890491 +17383 0.1493097682406747395145 +17384 0.1332249346036318005115 +17385 0.1812531804025921955414 +17386 0.1225940453757533160628 +17387 0.1253581202764829372054 +17388 0.2035654181215918456793 +17389 0.1261837227458690036119 +17390 0.2363596944252396303554 +17391 0.154545502469547235691 +17392 0.0038046749283590555052 +17393 0.0015015010276539151029 +17394 0.2896661857473889489789 +17395 0.1351121981890568979345 +17396 0.1549298189108122836277 +17397 0.1260001413731382802208 +17398 0.0547412801976542851001 +17399 0.2863028899932322235422 +17400 0.1211468106368704389375 +17401 0.0177723748965278455225 +17402 
0.204881387210346294081 +17403 0.109719420498328903979 +17404 0.0676268729621175151978 +17405 0.2832074349300987559097 +17406 0.00649926834459535821 +17407 0.1706800539147473039225 +17408 0.0967931277764303432676 +17409 0.3188166710004035064863 +17410 0.1708083264423358171857 +17411 0.0177496556403248795242 +17412 0.1513944570470798500761 +17413 0.017171520586955663773 +17414 0.0542181262439781139206 +17415 0.0361139793965364053197 +17416 0.0203041331370498248643 +17417 0.0410490296477455871016 +17418 0.2665838377139750359213 +17419 0.0193796185011175202406 +17420 0.1581207824026825692343 +17421 0.0519042843213351964837 +17422 0.2054320490314534564114 +17423 0.1551537970030529567378 +17424 0.206769280999664517573 +17425 0.210614321769185847133 +17426 0.132453447688399977844 +17427 0.0037830338240422080494 +17428 0.0467277077381198502337 +17429 0.173515425579376741938 +17430 0.2105978454621436624095 +17431 0.1675325540965930670723 +17432 0.0833552311016561159329 +17433 0.0692336104022796827717 +17434 0.3032035531627783053743 +17435 0.0881848012355123112016 +17436 0.1250599027083837133656 +17437 0.1528548524960360310843 +17438 0.2463808230473267801841 +17439 0.2887278713625217596572 +17440 0.0088354727846605141117 +17441 0.0066243032571488436738 +17442 0.1164380771925279250967 +17443 0.1503588281105604773646 +17444 0.2803781990230154730348 +17445 0.0674557235358624468491 +17446 0.1663798557089981033918 +17447 0.2123736743574730911899 +17448 0.2431714878998942064303 +17449 0.1577010496983524134862 +17450 0.1783264827588262046998 +17451 0.1363298426102118321435 +17452 0.3263647587462483734555 +17453 0.1702895064176811190659 +17454 0.0605073349655832343208 +17455 0.0683461419572238548792 +17456 0.2426952852747386035759 +17457 0.0821669806920296563746 +17458 0.2008715725279928143721 +17459 0.0560052072986534157928 +17460 0.0796811178457514368567 +17461 0.2534952532819030879274 +17462 0.2573583135266716559464 +17463 0.0566700093684485081691 +17464 0.1348483441110690517206 
+17465 0.2352411547656632628822 +17466 0.2333722323419256117205 +17467 0.1221900245334768542183 +17468 0.0994288055669207276877 +17469 0.2260909663240591727096 +17470 0.23202564549478479905 +17471 0.2510183092366751189672 +17472 0.0459844796867012184372 +17473 0.1684713614246004242947 +17474 0.0742015627337450633938 +17475 0.1966605181737196561542 +17476 0.2710924922902029909899 +17477 0.0207492147511997224174 +17478 0.1681155128864438019765 +17479 0.0106522922235621434611 +17480 0.0896234875301694794425 +17481 0.3051783492661062635598 +17482 0.1080404177906216095506 +17483 0.0975505998233332505265 +17484 0.2052248935617032588041 +17485 0.0887056527583243992208 +17486 0.0843958254951684139211 +17487 0.0747603529517118226 +17488 0.039071025252589541843 +17489 0.2697141624345441024424 +17490 0.1940901406842709953082 +17491 0.0236281637754693560338 +17492 0.1177510104923366190954 +17493 0.0459324845020252503502 +17494 0.166318613328207964841 +17495 0.0626321708737428151093 +17496 0.0669024249639774692966 +17497 0.3226468381898938697638 +17498 0.4100873296341471796467 +17499 0.0347186208631031154037 +17500 0.1688074746804271997203 +17501 0.0217940141120689864773 +17502 0.1987316028334739637096 +17503 0.3645166728703855274496 +17504 0.0060309074933035484653 +17505 0.0182124216869740326863 +17506 0.1790590630474977740239 +17507 0.1095678434411886253974 +17508 0.1830396146807750923369 +17509 0.1326491500222359287697 +17510 0.0004059394694301096023 +17511 0.0373924958373910154563 +17512 0.2568143622373957879468 +17513 0.1170415028435467102996 +17514 0.3736726439632587215556 +17515 0.2563304151271536546197 +17516 0.2360609822379295119621 +17517 0.0489645749960473664064 +17518 0.0147934961611063345993 +17519 0.0571363772054815435841 +17520 0.2428960681341053862159 +17521 0.1877769160321392516444 +17522 0.2508837406510847078955 +17523 0.2477348823604934580089 +17524 0.0051702669611275384187 +17525 0.2370746518390743351468 +17526 0.1672879889879479997639 +17527 
0.0452434797888827955337 +17528 0.007159339689732154878 +17529 0.0943471721669366103491 +17530 0.0598968405442416237938 +17531 0.0084114933092651941043 +17532 0.221915236041743846318 +17533 0.0032022602304412990312 +17534 0.1989365369956827278575 +17535 0.1299246543501582362712 +17536 0.2405530290487270794753 +17537 0.1751149212413921629761 +17538 0.1420694399600872515066 +17539 0.0071047661897775966217 +17540 0.0021392690340753073124 +17541 0.1373672684593550130661 +17542 0.1266597314620859227041 +17543 0.1107118496106767047138 +17544 0.1290342930287353695462 +17545 0.0681287871136207895395 +17546 0.1071167090163663987434 +17547 0.1376015252608058514472 +17548 0.1440123330345417507203 +17549 0.1528882811781334671863 +17550 0.1935473214169615152613 +17551 0.149021475528536895272 +17552 0.0885144490730446487081 +17553 0.2010257079136187341462 +17554 0.120334784293850846626 +17555 0.0577330467613888995149 +17556 0.1767941703591145996555 +17557 0.1293411552430853428408 +17558 0.040712083490983652645 +17559 0.0680781917937748337621 +17560 0.1013899192364805940114 +17561 0.1537672079481599562989 +17562 0.057533762116366737871 +17563 0.1552062225836624009823 +17564 0.2294926550971274903379 +17565 0.1875864682111377523821 +17566 0.1749842538836254013379 +17567 0.0856340917908395976577 +17568 0.0707999529023454338139 +17569 0.0981333123720587174876 +17570 0.1855474589533172291578 +17571 0.0145772109844762454839 +17572 0.0116995506140676469586 +17573 0.2269426085050559871448 +17574 0.0088575881265483998039 +17575 0.0541853043239214349214 +17576 0.0777721624676322720315 +17577 0.1825903260260076099719 +17578 0.0884318337531108805694 +17579 0.095308970896738981482 +17580 0.1333271116016485624556 +17581 0.0262149382730584475099 +17582 0.0479921714920739789223 +17583 0.035754333785909697685 +17584 0.0414263759607198939205 +17585 0.0842898903020312534329 +17586 0.0053835375642107699892 +17587 0.0711766304193620497065 +17588 0.264960078995582160033 +17589 0.073594653911873447738 
+17590 0.2602772094907165323363 +17591 0.0387654588475547512427 +17592 0.0554381762887202755863 +17593 0.016399113797080970556 +17594 0.0186052991134766504411 +17595 0.0262941616499185668032 +17596 0.0902858386148366887092 +17597 0.0409751188012960898543 +17598 0.1825136748008133746879 +17599 0.2679748270096711237898 +17600 0.0887330997866574017197 +17601 0.0202621497490086122095 +17602 0.1473992500390663429322 +17603 0.1754149319829911846114 +17604 0.2669346059878092702888 +17605 0.2064923873448883118176 +17606 0.1331089435040115620534 +17607 0.003267478201817675728 +17608 0.006901154612725867929 +17609 0.006916694158931212949 +17610 0.0106365059819990368656 +17611 0.007772509826417916233 +17612 0.0677793016006599247092 +17613 0.0098277864845368861219 +17614 0.0116313060579225999863 +17615 0.0037974434040148291429 +17616 0.0104522390496838649826 +17617 0.0367715274237905495602 +17618 0.0007727729246302161631 +17619 0.0004568854871342578725 +17620 0.1765380020250975623419 +17621 0.2764377495531472050949 +17622 0.1283109996350883519334 +17623 0.1766794130141338681206 +17624 0.1075001058028564604463 +17625 0.1541491961450831837954 +17626 0.2187073722552392018859 +17627 0.0423360941039494578786 +17628 0.1284411474308444767178 +17629 0.2341395765778331028439 +17630 0.0521616764267755564366 +17631 0.0157691077689700864806 +17632 0.0554277994265752749214 +17633 0.2575910960578162156587 +17634 0.226203527433620765752 +17635 0.1301470537116006243039 +17636 0.1503566197950980121156 +17637 0.2353724567681167445521 +17638 0.1758970123064466650753 +17639 0.0661899044020060473015 +17640 0.178115475474459161731 +17641 0.1119999294016731095081 +17642 0.132957223063391971607 +17643 0.1039314822014957145901 +17644 0.0991408601665718230977 +17645 0.1033441637099937887578 +17646 0.0256537056002503656427 +17647 0.1188113563503915282027 +17648 0.0950941779228684919145 +17649 0.1841600416189818623103 +17650 0.170631125233074776304 +17651 0.0683657524174302727582 +17652 
0.1845314125587259601602 +17653 0.1036166125715066443913 +17654 0.13937119553915466863 +17655 0.2352417630032319906253 +17656 0.1528716419073986032551 +17657 0.2495440901618449780486 +17658 0.1854030951598530096991 +17659 0.1793115750011081976911 +17660 0.1396143963044610969426 +17661 0.1552913145372338821204 +17662 0.1169958099234412879364 +17663 0.0633052048541162259987 +17664 0.1231566227245867312101 +17665 0.1417295910348501664089 +17666 0.1151702688678730573946 +17667 0.0754506153322898331925 +17668 0.2160546992053653247989 +17669 0.1884604914170809886187 +17670 0.2169704247511561878703 +17671 0.0847886078910183166313 +17672 0.0686911460418307417974 +17673 0.1383172842726787254986 +17674 0.1756454808302043701929 +17675 0.0987242198629785516761 +17676 0.0438280160700129511886 +17677 0.0021980781701272846668 +17678 0.1469286430950887678559 +17679 0.1867397546926861662264 +17680 0.1331709070576526932062 +17681 0.1171807779160660839546 +17682 0.213402526896336042217 +17683 0.0123211480661988056345 +17684 0.1523576131654344190114 +17685 0.1235199423627011267035 +17686 0.002211582591133594896 +17687 0.2327182834003892475394 +17688 0.0014266215213399394135 +17689 0.0011484097577750861444 +17690 0.1075671625176487650188 +17691 0.2506930653125782337831 +17692 0.0710528409890456430498 +17693 0.0877036810460103677389 +17694 0.0627325440580780713251 +17695 0.2336617241073603856716 +17696 0.2522316149081719194314 +17697 0.2670397551782329181869 +17698 0.0487633427538501917908 +17699 0.2870825795953111936321 +17700 0.1498134798110338228305 +17701 0.0099737450917379757293 +17702 0.0223561019322532049391 +17703 0.1010477677730047402216 +17704 0.141180829786402173287 +17705 0.1152610615009539152265 +17706 0.0331160305039606209232 +17707 0.1159479651720768250689 +17708 0.1971414244585006358967 +17709 0.0121866951267675781984 +17710 0.1120108444606853909109 +17711 0.1025686929487572562891 +17712 0.001078031864598520944 +17713 0.0022979997322443374355 +17714 
0.0041110448191052995429 +17715 0.0208753031413579683007 +17716 0.2012865860156639874479 +17717 0.178750118399215895959 +17718 0.1980796533841326645398 +17719 0.0267723617276340482285 +17720 0.0567350141737465174518 +17721 0.2674393614905821170424 +17722 0.1649828855641960190237 +17723 0.2459217890162623543215 +17724 0.167670971469833229861 +17725 0.2433813269327963213495 +17726 0.0265501197835399660452 +17727 0.0792088381045154416027 +17728 0.2187207560188052513084 +17729 0.1250345082364271442898 +17730 0.1402062939914864303503 +17731 0.1926026048158931691656 +17732 0.108791638827987355298 +17733 0.2087291143621818056975 +17734 0.1725760097418224114918 +17735 0.1869585345608048843058 +17736 0.1308152685608709453469 +17737 0.2982796844357694654448 +17738 0.1971369940936621434346 +17739 0.2215802075459601949703 +17740 0.2940514581434341412347 +17741 0.0837875561280355674043 +17742 0.1579890186233659787884 +17743 0.1408763960465529152533 +17744 0.1525091322540319815904 +17745 0.1864496383182588923333 +17746 0.079164952585242753802 +17747 0.1004470169888194985841 +17748 0.1244402400704113093033 +17749 0.1829685671239367483754 +17750 0.1562264983221219039056 +17751 0.2519841594021877129173 +17752 0.0833026514538857676406 +17753 0.1372832131361766683231 +17754 0.1802673156005905052979 +17755 0.1687929961981549931149 +17756 0.0538806595559024256725 +17757 0.06422469217928890306 +17758 0.0045153345072611534086 +17759 0.0200233997813087268025 +17760 0.0386398576116297950978 +17761 0.1442406213563132977296 +17762 0.013769672641317572126 +17763 0.0496012272733738837638 +17764 0.0269707938682387134144 +17765 0.146400232312574429816 +17766 0.1819317534895782129123 +17767 0.0252325486174968839637 +17768 0.0328919208513035049579 +17769 0.0155617209876680092501 +17770 0.0047600223105044364366 +17771 0.1350438067953519671871 +17772 0.1679944087297486898791 +17773 0.1810093475503501136981 +17774 0.0814777951053813342952 +17775 0.1171728070167250418399 +17776 
0.1009581695962852437853 +17777 0.1448964343187196246454 +17778 0.1919317417247303758643 +17779 0.198122409841204827341 +17780 0.2334269553855928547303 +17781 0.0595199349724117171556 +17782 0.1633368736811628141314 +17783 0.0306164098572347942451 +17784 0.147392207264616631468 +17785 0.15320915611088148367 +17786 0.351508192183617840243 +17787 0.1200367027983789275369 +17788 0.1481615111238178406072 +17789 0.1018976366207100803241 +17790 0.1382384676994295458208 +17791 0.1379108539616982487264 +17792 0.3140950953404664791613 +17793 0.251333326933924183777 +17794 0.0880072767208470163602 +17795 0.1587077365818701790712 +17796 0.0579112208342916282011 +17797 0.2037790832059445667035 +17798 0.1567600189182307979241 +17799 0.1498880854768303017011 +17800 0.1112089108989141317263 +17801 0.2743293015619173158548 +17802 0.2308737251624245756521 +17803 0.2127497031223327395022 +17804 0.0159927977357741252229 +17805 0.1744364696096733502451 +17806 0.0007290543701792118582 +17807 0.0057030698462667188745 +17808 0.1731222988727328382375 +17809 0.0904510210228518241049 +17810 0.1124272354714990068691 +17811 0.1254114956298172656179 +17812 0.002599859746391787723 +17813 0.0016441813597793356484 +17814 0.0106402807373247362127 +17815 0.1885735743100152572715 +17816 0.0259015415335931906138 +17817 0.0804868278731957625949 +17818 0.0228607043912523663631 +17819 0.1318712987245394541169 +17820 0.1628388978531758735091 +17821 0.1011598317481375297611 +17822 0.2438417720721710313381 +17823 0.1557047007069042032779 +17824 0.0165028300175126194094 +17825 0.0333568177949287031359 +17826 0.1590454495785965094434 +17827 0.2165572613027843817957 +17828 0.0396274766049706794746 +17829 0.0077615429301267401321 +17830 0.1168454739049310658894 +17831 0.122418656148255630689 +17832 0.0270179881374193429922 +17833 0.2026662562433902137204 +17834 0.0040680228603337965226 +17835 0.0938200406764211763955 +17836 0.0572753417218349186735 +17837 0.0107927932488262254274 +17838 
0.0175638013892175197384 +17839 0.0003171181729700912888 +17840 0.1571522504378659035318 +17841 0.2059483195707200453572 +17842 0.2201734223616254315647 +17843 0.0036913405647431962037 +17844 0.0500815582916849938444 +17845 0.1932149567047153071453 +17846 0.1631712557838496391316 +17847 0.1990082832027492698579 +17848 0.1929964465328024847413 +17849 0.1838781511340079333205 +17850 0.0371891741117211793721 +17851 0.1573272497029332139817 +17852 0.2619464776069418965321 +17853 0.0036757731852870307448 +17854 0.1421070314705659143328 +17855 0.1078119062032926317451 +17856 0.1114010011904566638252 +17857 0.1053850109982522043062 +17858 0.1483434303761534123201 +17859 0.1357836891429036185741 +17860 0.0367263318186745180149 +17861 0.1476903701055372886142 +17862 0.1250253639576204833794 +17863 0.1720409318076463189851 +17864 0.135477936090136086289 +17865 0.1995930788201144001626 +17866 0.1069479859090914503073 +17867 0.1599068612552659585102 +17868 0.2746959505201342355285 +17869 0.2799285814479590683845 +17870 0.2381118079604500892277 +17871 0.0023150775132494022003 +17872 0.2853391009944403755583 +17873 0.0395552816477530017059 +17874 0.0431587804408285688473 +17875 0.0447173190999186967454 +17876 0.0418636874685701654286 +17877 0.0449787825969031260231 +17878 0.0095528076434917581078 +17879 0.0072483744757117713267 +17880 0.0562203494560396116464 +17881 0.1020688586850237167258 +17882 0.020854387377373835577 +17883 0.2066116742873044809237 +17884 0.0022810674538527014182 +17885 0.0229453649329953088265 +17886 0.1555055434854079043916 +17887 0.094044255454987984999 +17888 0.2031442414019743736286 +17889 0.2099532010877925292114 +17890 0.1098376388103559503362 +17891 0.0968899186528724004619 +17892 0.0612779390112974317595 +17893 0.1205686578813443821101 +17894 0.0081365867277816278369 +17895 0.1518435534603332359715 +17896 0.0656342516266924247414 +17897 0.1304832984058515166215 +17898 0.0794316071164173492791 +17899 0.0004956394108547007137 +17900 
0.2120577908825882484933 +17901 0.2418299736334363070345 +17902 0.1161643996736033412898 +17903 0.1248249761152757930649 +17904 0.2568959665857055152394 +17905 0.1817560973875518015586 +17906 0.0963396907671262708872 +17907 0.0947977156814842070176 +17908 0.167876700976720055758 +17909 0.1855982821404117688591 +17910 0.0204781210554216795694 +17911 0.0588826276993083819189 +17912 0.015440878674393499273 +17913 0.1541993133534127202733 +17914 0.0018614062839284655024 +17915 0.0368651602813022116756 +17916 0.019087582435660596647 +17917 0.1918553210779067874636 +17918 0.1342995205655101997877 +17919 0.110148927696001278087 +17920 0.0945378718659003053348 +17921 0.0371226616765013353594 +17922 0.0508474527893272992829 +17923 0.0615075519013823304726 +17924 0.0039808954438738152273 +17925 0.0465744711385413165394 +17926 0.0491477688463244355432 +17927 0.1476369356001050736626 +17928 0.1117540423263331528236 +17929 0.0747147418607106844624 +17930 0.0891755409397887444101 +17931 0.1747194903640390262201 +17932 0.1401827614951975664592 +17933 0.1608419954083477321216 +17934 0.1550479591492441022726 +17935 0.0019531521749367904139 +17936 0.0980309191723687384235 +17937 0.1946647877605225951392 +17938 0.0217582298349975888918 +17939 0.2311984090989286089179 +17940 0.0899837170902808014539 +17941 0.0176420911088277926337 +17942 0.0602084892811322186512 +17943 0.1466860421080045939757 +17944 0.2220745672766799017062 +17945 0.313539013686785772439 +17946 0.1147312440936136129777 +17947 0.1457798081198857786944 +17948 0.2798595467749289134041 +17949 0.2093702735119226665539 +17950 0.1983178985997993137858 +17951 0.0517080726910148572029 +17952 0.1809966089067192585027 +17953 0.0090095587153941111141 +17954 0.009897734442197910279 +17955 0.011999465514623608442 +17956 0.0103182822147734921947 +17957 0.2881285457346272615986 +17958 0.1502644062122389378988 +17959 0.2804561488078958775283 +17960 0.0718416662195610206121 +17961 0.0170336803057895731783 +17962 
0.1302004371589564613032 +17963 0.272398902721018465467 +17964 0.1845656602554655578707 +17965 0.1594345294007278590609 +17966 0.004529010001087588333 +17967 0 +17968 0.0953208661314446231705 +17969 0.1379307899563331496484 +17970 0.0665035756224646723567 +17971 0.1817130754568493933299 +17972 0.3745215619567749398833 +17973 0.2277649339195904065747 +17974 0.2294081933589630917236 +17975 0.1697939642213852351471 +17976 0.1748237735137694870424 +17977 0.2512134660711425881985 +17978 0.1115309980549059193367 +17979 0.1561973257871864251278 +17980 0.213779008455895064289 +17981 0.0181604106409523936849 +17982 0.1802095376252311753618 +17983 0.1148667459791108402811 +17984 0.1239618957735275361909 +17985 0.091350020849392879474 +17986 0.0567860668115286321145 +17987 0.1508985283971136237557 +17988 0.1599698877357179638281 +17989 0.0585973733447195527235 +17990 0.0249538883167546854391 +17991 0.0748761829120155458783 +17992 0.0244866179610322629023 +17993 0.1713152871943108002117 +17994 0.0393140954381089868797 +17995 0.2493035922971660045544 +17996 0.0029315752142405854079 +17997 0.2252821520668261967568 +17998 0.1441224909938375320362 +17999 0.1516127259054756704071 +18000 0.0833077334128274338276 +18001 0.1724151539469438165764 +18002 0.1234822509294044162731 +18003 0.0645662375217156137008 +18004 0.19386929781875786305 +18005 0.2857282622563305229946 +18006 0.1598499804645054367391 +18007 0.1043818874441863192581 +18008 0.12972299321464003663 +18009 0.3177311169603224128011 +18010 0.1404784722033636246152 +18011 0.1209355607961160733455 +18012 0.2876327413664799270698 +18013 0.2168581312752037859592 +18014 0.2784191070680955837346 +18015 0.1090203602167259117062 +18016 0.1655807201676856577066 +18017 0.0806935527045730544016 +18018 0.2198991204282246259005 +18019 0.1635814238918513741083 +18020 0.2351466411447646898125 +18021 0.000441671247514348244 +18022 0.1511545879836183881029 +18023 0.1589090830955917910128 +18024 0.0918568026088247696626 +18025 
0.2589808280989620570267 +18026 0.0564213278131782472458 +18027 0.1456666178016300383646 +18028 0.207648979962351459827 +18029 0.2043842298105004728903 +18030 0.0390301383498186105214 +18031 0.1357985836768341725289 +18032 0.0035710077490131312064 +18033 0.1010553134134538622035 +18034 0.207365687033288476071 +18035 0.0039389740093661411435 +18036 0.060855324283046449696 +18037 0.0727800800537980185911 +18038 0.005430515254259168749 +18039 0.0220192151202911155783 +18040 0.0027559027601297904526 +18041 0.0022656024204406885816 +18042 0.0047476069384497438311 +18043 0.1268088012780932971602 +18044 0.1000616556986353850034 +18045 0.0003011478934097169564 +18046 0 +18047 0.0017149757368658193231 +18048 0.0117334634503691980639 +18049 0.000518282190365185213 +18050 0.0023610965992863800975 +18051 0.0071424738916848334869 +18052 0.0023378518127352082914 +18053 0.0364343830548470443143 +18054 0.0066135024976973835753 +18055 0.0113797922623484221855 +18056 0.0069449934693437955988 +18057 0.0024741994122077872777 +18058 0.1941830563862584624335 +18059 0.217605378715923503119 +18060 0.0550603222151690857578 +18061 0.1893371249319093230223 +18062 0.3152647024235313288898 +18063 0.4424199546953626605017 +18064 0.0102805503849548209028 +18065 0.0128387122495456987292 +18066 0.0058888675212329152225 +18067 0.1750778057400688703105 +18068 0.0104382357241808096382 +18069 0.0228866033678337549295 +18070 0.0490908030712392876027 +18071 0.0177146330960381015451 +18072 0.2611269371892904755939 +18073 0.2098040035467355346466 +18074 0.2341937024897038643889 +18075 0.196890759708400447936 +18076 0.1579411915872625804536 +18077 0.2222736650548767789459 +18078 0.0685501698576211487834 +18079 0.292851464978596320865 +18080 0.0295263770471875727774 +18081 0.0546256877036014332205 +18082 0.1705996074873241397096 +18083 0.3119315957812990314224 +18084 0.1482843531245990609246 +18085 0.0976547477681229819657 +18086 0.1784989919050370132947 +18087 0.1716432706371492367126 +18088 
0.1766755578814606097993 +18089 0.1865157882650629039833 +18090 0.175918298040749404576 +18091 0.2152742892926876039361 +18092 0.1888188947879438361888 +18093 0.0013274914882078717865 +18094 0.0185702313701437450588 +18095 0.1691248809726571544765 +18096 0.1446872472652094010481 +18097 0.1292719090081078103793 +18098 0.373306747868998034523 +18099 0.1804728297891636523698 +18100 0.015321701239754776866 +18101 0.1786312232368219010681 +18102 0.1279635620360682624064 +18103 0.2076486986459515904357 +18104 0.0507359932737087918064 +18105 0.0443584788196931881221 +18106 0.0555158900571330812812 +18107 0.1649199407413120543886 +18108 0.0555112643001908698248 +18109 0.1763239471018666626456 +18110 0.0130157990095159849936 +18111 0.1595207422499282656769 +18112 0.1982748286381401103817 +18113 0.2059805015307418774029 +18114 0.1599511235431538835883 +18115 0.4135119767241617894094 +18116 0.1601266128234338781411 +18117 0.1133121393479562394324 +18118 0.0050743874900839184075 +18119 0.0293940472474371883305 +18120 0.0891460551678163598543 +18121 0.0833638672965437976803 +18122 0.0033566045052715594126 +18123 0.1180204875054727542416 +18124 0.155107661469005231325 +18125 0.0794707055949172291864 +18126 0.153944828356646412626 +18127 0.0411142135919342829875 +18128 0.0471637192774587640787 +18129 0.2321518069758631841459 +18130 0.0944694728582464865463 +18131 0.1110383550624630816239 +18132 0.168156505499448438723 +18133 0.1320931800139716061349 +18134 0.2438761199503773346997 +18135 0.1213722202742059108616 +18136 0.0430410090944168335914 +18137 0.1078672849063931488134 +18138 0.1133912051894608585823 +18139 0.0662062762063518606137 +18140 0.0895895033108574656566 +18141 0.1377136287829839600416 +18142 0.0165564708149989990582 +18143 0.1399935143880475241218 +18144 0.2375786570908554606874 +18145 0.1191118763023114174304 +18146 0.156125130056595118333 +18147 0.1518937547945568289354 +18148 0.2002309505875196349312 +18149 0.1188847206013360496835 +18150 
0.0407061513823591819561 +18151 0.2444809056530179158795 +18152 0.1497762496450918134894 +18153 0.1433813815913974054617 +18154 0.1118420645887449327294 +18155 0.1045597440941408162729 +18156 0.2659567658467611450668 +18157 0.1955417935577852817097 +18158 0.1682121644842596852598 +18159 0.253706327486346416844 +18160 0.1743156547728661187602 +18161 0.0687832428842840570837 +18162 0.0472956005722800915114 +18163 0.2007118004293849955477 +18164 0.1888578976377574825118 +18165 0.0624528572792661523794 +18166 0.1097416497142324615099 +18167 0.1516032142940242910534 +18168 0.1348204698622019193088 +18169 0.2343293424384812473349 +18170 0.1475943377547144774731 +18171 0.2108141929032178452097 +18172 0.180401728089734675331 +18173 0.2477611829374505303392 +18174 0.0214798796465132182942 +18175 0.3030635258561778511144 +18176 0.261841072187886103606 +18177 0.1815178481483136208041 +18178 0.1184367797017038398399 +18179 0.2441227564022127871635 +18180 0.2083401990581824070325 +18181 0.1421576493869268209558 +18182 0.2469289289610227811078 +18183 0.1758360929148369000696 +18184 0.1672384357248306863131 +18185 0.1269571764741111441488 +18186 0.0521276139139820945956 +18187 0.139165518921991471224 +18188 0.1467735124931759949085 +18189 0.1009278405690279545315 +18190 0.0871233893919486912516 +18191 0.2370596716077133792044 +18192 0.080512330338968332577 +18193 0.0889040592312476118941 +18194 0.130589146566716551856 +18195 0.1181411286276151662022 +18196 0.1514794278715471120833 +18197 0.182432327461889337572 +18198 0.2432459225197852459299 +18199 0.0078301338612682507001 +18200 0.0687863055257729033576 +18201 0.1006729920036149167339 +18202 0.1256744729435230190084 +18203 0.0759872979221539457395 +18204 0.2019418818857780240084 +18205 0.0547654269972163709745 +18206 0.169283585934895453029 +18207 0.177425700805586561426 +18208 0.1480722804985613871853 +18209 0.1957616110692123045922 +18210 0.1482906275536453044595 +18211 0.2800214437133635159149 +18212 0.0733045083082826537391 
+18213 0.0273700789045759063089 +18214 0.1683128665647343202849 +18215 0.2124467855475032918022 +18216 0.1321687805629958023523 +18217 0.2296651118661119372888 +18218 0.1751735068295801334148 +18219 0.1520525203551351400844 +18220 0.2381540574448402780572 +18221 0.1197287748996094386422 +18222 0.2056208324060196102057 +18223 0.0010979967407788780807 +18224 0.2195469744755686736237 +18225 0.042572500083864081577 +18226 0.1892792969712624173706 +18227 0.1263267771395576444249 +18228 0.1695337810075737694504 +18229 0.1481472615670355041928 +18230 0.1905824315916943290095 +18231 0.2055024909668216603897 +18232 0.229727780823096916496 +18233 0.1125929331022468837809 +18234 0.0033962196481329971591 +18235 0.2480749791622353650489 +18236 0.1684927780120254459106 +18237 0.1600027540147423987804 +18238 0.2377345726643743939466 +18239 0.1752575317824640044329 +18240 0.141858710875027271836 +18241 0.1904506740419150689902 +18242 0.4370606099766647933968 +18243 0.1882552643350380305876 +18244 0.1360470492382698237677 +18245 0.1842504306297755578559 +18246 0.1773482652400054926378 +18247 0.1750594059050461226423 +18248 0.3012393326371941015829 +18249 0.2689549033766223118391 +18250 0.1004778785826502679468 +18251 0.1035448070255398228534 +18252 0.0010054546190000285613 +18253 0.0189635542189391485823 +18254 0.0030638125750028963844 +18255 0.0008775081457463466744 +18256 0.0018239627040821607823 +18257 0.1532221428914678218458 +18258 0.0076898843351129149773 +18259 0.190821490985056851919 +18260 0.0748286118201101357883 +18261 0.0547654358066303192931 +18262 0.1741460230441680412383 +18263 0.0712451857772737756624 +18264 0.3621395680130584660006 +18265 0.0899993367766359908533 +18266 0.0477098788422328570769 +18267 0.1693208974519741782938 +18268 0.1682902987490794977088 +18269 0.1240929746385259074826 +18270 0.0475063883123203398173 +18271 0.0391531965970683196177 +18272 0.0285765106883843697516 +18273 0.047440868077832210048 +18274 0.0018475543837059684241 +18275 
0.1300849639172065030923 +18276 0.0322680651779538515966 +18277 0.0208764976945624812499 +18278 0.062621831959888699104 +18279 0.0037470127902126991806 +18280 0.1310651212212339267182 +18281 0.0852640134002118910139 +18282 0.0106638755911186034614 +18283 0.0877656537006233361531 +18284 0.0037577661236115533425 +18285 0.1026831433452825154129 +18286 0.1011184012486548988718 +18287 0.0856150009395674205237 +18288 0.0165463089966162192612 +18289 0.1990767812105071477013 +18290 0.1775205286190646436495 +18291 0.2462994974274429760275 +18292 0.1360071827533103350216 +18293 0.2219614168334577974928 +18294 0.1121606710407034857013 +18295 0.1959328514125608655672 +18296 0.0163142616769633257146 +18297 0.149460582507563249921 +18298 0.115950616910339490695 +18299 0.1407784412334128487654 +18300 0.1433337578460555095106 +18301 0.0760418304664580158025 +18302 0.0934798407220509552928 +18303 0.0313739197931876684367 +18304 0.0215215496277923190593 +18305 0.2155588699660665463753 +18306 0.2591869935424570559057 +18307 0.0512298178963613154435 +18308 0.1545487874536299877093 +18309 0.1969726291389284622557 +18310 0.0423728006010131649783 +18311 0.2020671711699580652688 +18312 0.2049860775881432728429 +18313 0.0078199452334037286572 +18314 0.159120879428883921225 +18315 0.1209558329253512115598 +18316 0.166803905542060537659 +18317 0.1208933899620305146838 +18318 0.0567299265079361028641 +18319 0.026909352787147330921 +18320 0.1489721004273446003374 +18321 0.0126058544952274212414 +18322 0.0677218252620448712964 +18323 0.0063519964269385300465 +18324 0.1775812037422017652943 +18325 0.1809756521365397941015 +18326 0.1423322579421414102452 +18327 0.1588281939931837916102 +18328 0.0353430805169004796928 +18329 0.1423181096993264282968 +18330 0.1006550311700829952732 +18331 0.2486762389902469894132 +18332 0.2004623261112695453701 +18333 0.030043201408475720543 +18334 0.0359112325715776128932 +18335 0.0355584251777642385006 +18336 0.0158830232999571628605 +18337 
0.0175853035081263847639 +18338 0.1228154349360874864061 +18339 0.0194485628517798748993 +18340 0.2626689742812493011392 +18341 0.0740892379331159894873 +18342 0.0699942730653436290256 +18343 0.0670014646872278962775 +18344 0.1689104747371281411983 +18345 0.1946809662740139335035 +18346 0.1786643854383287155496 +18347 0.0428063962470211431532 +18348 0.1470180248446233128856 +18349 0.1490506972871445956486 +18350 0.0781392707064163388742 +18351 0.1622198904468862279415 +18352 0.2583001547514197082656 +18353 0.1894584333019087296623 +18354 0.1043230423581738841587 +18355 0.0590129112635531249964 +18356 0.3321678377929551673198 +18357 0.1177033376884292770193 +18358 0.1251173756578504714376 +18359 0.1382442361119499418809 +18360 0.0446534770188088056542 +18361 0.3771885209069316680086 +18362 0.1330994691396859552768 +18363 0.0040153707601135102054 +18364 0.1730923151792794489889 +18365 0.0787689173629117084241 +18366 0.1113846907207212966195 +18367 0.2041191921321554025859 +18368 0.3479090118511659612821 +18369 0.1827357219924398701316 +18370 0.0656379910741664712726 +18371 0.0940642597283680298448 +18372 0.1640926890864767284661 +18373 0.1760348564241021596111 +18374 0.0220982299031303244807 +18375 0.0037308044359354725072 +18376 0.0071945569890447466643 +18377 0.0858510193751142736129 +18378 0.2070025668257270412553 +18379 0.139206457264875238522 +18380 0.3379932580434205635633 +18381 0.1473447411253314687229 +18382 0.0650915310099227095719 +18383 0.1540878829893118917571 +18384 0.0087975156947480946668 +18385 0.0028743285747490931493 +18386 0.1322112980924165237528 +18387 0.1381356622344782070222 +18388 0.00230750435207690641 +18389 0.1452683577573039286346 +18390 0.2823371998153218487282 +18391 0.1636987633578381173827 +18392 0.0506292093499677506307 +18393 0.2215414989100196185845 +18394 0.2038605605318324076602 +18395 0.0184123656577295131964 +18396 0.1238522950882637169157 +18397 0.1777415528289000012929 +18398 0.1497831448375858132138 +18399 
0.2358487582043662378339 +18400 0.1825532099461999457901 +18401 0.0828885833122554593899 +18402 0.1996085442238781648694 +18403 0.1817877819760976843 +18404 0.0289663123119633901525 +18405 0.1072426293527988933052 +18406 0.1544276631949356437534 +18407 0.222380530074135385199 +18408 0.2029269377947564012565 +18409 0.0744760397176732202862 +18410 0.2812860735377147869407 +18411 0.0224498634746041619836 +18412 0.1501896521701654718672 +18413 0.1055680127919115529611 +18414 0.3318051496775146214091 +18415 0.168717959385079302459 +18416 0.1336345409911959547511 +18417 0.0377165874258064554025 +18418 0.2031947834422959120459 +18419 0.1112310069861062317154 +18420 0.0808008936401047733744 +18421 0.1362603028383152226066 +18422 0.2089584464557599219336 +18423 0.2301521590405874451157 +18424 0.0364023966166654930343 +18425 0.0203589975167012202095 +18426 0.2304190630606022249793 +18427 0.0594294085002385202854 +18428 0.1492583794532164331859 +18429 0.0066691138091164023013 +18430 0.1373339769404682553233 +18431 0.1081751567965066329613 +18432 0.1500177934373270183244 +18433 0.1104994227952121876868 +18434 0.164086958711541941458 +18435 0.0055466930167721549549 +18436 0.0118450464020503545143 +18437 0.0012402389618443420815 +18438 0.1544446215705267133611 +18439 0.0469383235390932532227 +18440 0.1523596520496398876166 +18441 0.0315273591461389529611 +18442 0.1188079548756395681375 +18443 0.1315432596403146303654 +18444 0.204558405680878824251 +18445 0.1993184616760916016975 +18446 0.1305229905098053422297 +18447 0.1130076376961058060022 +18448 0.0990793894281722098416 +18449 0.3332481106178610819057 +18450 0.0069400196614444230672 +18451 0.2646596090049528893218 +18452 0.1624120899578676280317 +18453 0.1005447995940161598494 +18454 0.1679060771238996274679 +18455 0.1609525569806243561199 +18456 0.2163288137405453281925 +18457 0.0335975692392678701448 +18458 0.201696395954786833471 +18459 0.2685551996516466477516 +18460 0.0631570879783048266809 +18461 
0.2780685335583163997342 +18462 0.1846582039824579690723 +18463 0.115857229460904195828 +18464 0.1604610254484162001898 +18465 0.0081705351852282316616 +18466 0.1195247850355663221666 +18467 0.2016354487027081365813 +18468 0.1450577505786348198047 +18469 0.0562897326910547085577 +18470 0.152114692787212707481 +18471 0.1495997193592639717963 +18472 0.0920112513370975881877 +18473 0.0011113831649603978478 +18474 0.0048805785515343485734 +18475 0.0047692856141091349159 +18476 0.0043005957838099075422 +18477 0.0018440095329387341284 +18478 0.2446830499248825452341 +18479 0.1372599859506910546791 +18480 0.0649501581036552866344 +18481 0.2375014243010505621267 +18482 0.1595603633875452842528 +18483 0.2889707313547236200435 +18484 0.1621407788650274517384 +18485 0.003580755206998893856 +18486 0.0540021942020356068848 +18487 0.1179184040454388515684 +18488 0.2070493643182073628672 +18489 0.1888059720503992011675 +18490 0.0130705362795139213433 +18491 0.0480492515264844474721 +18492 0.0017686871859852800349 +18493 0.0121423925141752927365 +18494 0.0950888686643246716645 +18495 0.0037235799832657976403 +18496 0.175911899797843940263 +18497 0.196702314599184080457 +18498 0.1338125006677290951274 +18499 0.3353529568313248776512 +18500 0.1335712267899539229532 +18501 0.2048149664921998514 +18502 0.0989187553446903983234 +18503 0.000665653125714388001 +18504 0.0064629806658318255524 +18505 0.0998507785638114014892 +18506 0.2576945392603415818655 +18507 0.1180278390197541937834 +18508 0.2195510691795415081362 +18509 0.1922822678947682106809 +18510 0.0011446105455075194796 +18511 0.16473876921711352872 +18512 0.0294090348908226072167 +18513 0.0082700670996561035514 +18514 0.0254257521125705321374 +18515 0.0288587563160839777332 +18516 0.027752608576851640898 +18517 0.0693393506448980317947 +18518 0.0119347189075395838237 +18519 0.2068409083833274431097 +18520 0.0144901036078989037953 +18521 0.0150072773362545860976 +18522 0.3970271297747804961631 +18523 0.1363697790724937297657 
+18524 0.1373896932780124258144 +18525 0.1008971388607416697081 +18526 0.2331345610902002263298 +18527 0.1540584162403156320842 +18528 0.2070007992693947418417 +18529 0.2417938036923152622659 +18530 0.1595894183490109696777 +18531 0.0486239708288420044435 +18532 0.2839829758427137584853 +18533 0.1321483677005825807527 +18534 0.0700160592145586124779 +18535 0.1951122703531600333093 +18536 0.1619984315781600181516 +18537 0.1353399545680018523086 +18538 0.1994356512822186566769 +18539 0.1080415859322212124294 +18540 0.2515006592176165511887 +18541 0.1188169304979228985131 +18542 0.2718122272342773371179 +18543 0.2954137994503140962621 +18544 0.2285334712563556469611 +18545 0.2040633019861021191232 +18546 0.2984992202528120031069 +18547 0.1927800022117476530514 +18548 0.2394903249609868778336 +18549 0.2118231141323248245545 +18550 0.1382684125758435744746 +18551 0.1872885988134298240748 +18552 0.1689613192980941869248 +18553 0.1475307353511701169424 +18554 0.0560360909344354493622 +18555 0.1223812251244479759604 +18556 0.017485082425542485679 +18557 0.2141003357250291783132 +18558 0.0354896310375256082703 +18559 0.1758644888308338039007 +18560 0.0649838685827093531788 +18561 0.005532321348159312778 +18562 0.0607978290885191965942 +18563 0.0357373004991073869863 +18564 0.0208881881647858189122 +18565 0.0895038642578959570129 +18566 0.0946350058416036560294 +18567 0.0464556769102554764639 +18568 0.0070028665514906057385 +18569 0.1483638288749385969467 +18570 0.0065167071983196290033 +18571 0.1962399999599510891368 +18572 0.1594796602553577302697 +18573 0.1540534217323296928281 +18574 0.0009086574898092184644 +18575 0 +18576 0 +18577 0.2100225905313938268293 +18578 0.0979120636057256993334 +18579 0.0161169589395486478889 +18580 0.0166272398308350138896 +18581 0.1326144703736487351353 +18582 0.0067374807689089682983 +18583 0.052792192591312694927 +18584 0.023145936702383674427 +18585 0.0102048300159266296105 +18586 0.1139265988151916819682 +18587 0.0359323870657666938344 
+18588 0.1151574760316894652235 +18589 0.2112489289516611412623 +18590 0.1907513061155081401932 +18591 0.0470038246547416285748 +18592 0.0692460901837928383129 +18593 0.1248883537462997100898 +18594 0.1960845485886129335729 +18595 0.225612636401993527091 +18596 0.0652042013894028882337 +18597 0.1111360955619497409286 +18598 0.1309529947294738594366 +18599 0.1372665441048375734034 +18600 0.0208868405977043931909 +18601 0.0436243296139947958623 +18602 0.4070023741461871180647 +18603 0.1248134437229772714106 +18604 0.2772024393413830556376 +18605 0.2153419521270824588122 +18606 0.1174184029923247274407 +18607 0.1503211217570716018432 +18608 0.1063347266496385556955 +18609 0.1785507840517786548595 +18610 0.0548825402492473177207 +18611 0.0616608470556775714599 +18612 0.1625722701993999597114 +18613 0.1648379918029636570509 +18614 0.3078292513515992201434 +18615 0.1006330426512352521629 +18616 0.1647567773104708366283 +18617 0.0482026316889102510177 +18618 0.1391884823204224685966 +18619 0.205657730278581946104 +18620 0.0144688643020750989931 +18621 0.1529486338315818039213 +18622 0.0778514475349953161132 +18623 0.276564973434942795727 +18624 0.1453396062412823375176 +18625 0.1859065939672181755338 +18626 0.1609229444227850136517 +18627 0.1045160367886696639594 +18628 0.1319918617488965162377 +18629 0.1128169323678423902058 +18630 0.1438350388803957546902 +18631 0.1703724105825713819318 +18632 0.1603062683098510909918 +18633 0.2469369676634066701482 +18634 0.1468452260746450632745 +18635 0.2357337406648322974956 +18636 0.0115229451581461209142 +18637 0.0387108234921452049049 +18638 0.2631665623094598949194 +18639 0.2399586353362735680061 +18640 0.0959701708647213991288 +18641 0.1828718722715768629783 +18642 0.1254196918590610998478 +18643 0.1450101562771428098664 +18644 0.2327956160301346555386 +18645 0.2923126875501758648035 +18646 0.0977974397513787230274 +18647 0.2045607902182501880439 +18648 0.1339511163820340244879 +18649 0.2076753691835491466566 +18650 
[~990 added data rows (+18650 through +19637 of the article's mutual-information data file, one value per row) elided for readability] \ No newline at end of file diff --git a/_articles/RJ-2025-043/data/RNA_seq_dataset_via_FireBrowse.txt b/_articles/RJ-2025-043/data/RNA_seq_dataset_via_FireBrowse.txt new file mode 100644 index 0000000000..85c212928c --- /dev/null +++
b/_articles/RJ-2025-043/data/RNA_seq_dataset_via_FireBrowse.txt @@ -0,0 +1,2 @@ +accessible via +https://zenodo.org/records/16937028 \ No newline at end of file diff --git a/_articles/RJ-2025-043/figures/Crash_message1.png b/_articles/RJ-2025-043/figures/Crash_message1.png new file mode 100644 index 0000000000..14fb441c53 Binary files /dev/null and b/_articles/RJ-2025-043/figures/Crash_message1.png differ diff --git a/_articles/RJ-2025-043/figures/Crash_message2.png b/_articles/RJ-2025-043/figures/Crash_message2.png new file mode 100644 index 0000000000..30dd44cae9 Binary files /dev/null and b/_articles/RJ-2025-043/figures/Crash_message2.png differ diff --git a/_articles/RJ-2025-043/figures/Figure1.png b/_articles/RJ-2025-043/figures/Figure1.png new file mode 100644 index 0000000000..c10cb974df Binary files /dev/null and b/_articles/RJ-2025-043/figures/Figure1.png differ diff --git a/_articles/RJ-2025-043/figures/Figure1_appendix_secs_vs_Resident_Set_Size_mac.png b/_articles/RJ-2025-043/figures/Figure1_appendix_secs_vs_Resident_Set_Size_mac.png new file mode 100644 index 0000000000..48c16fb5b9 Binary files /dev/null and b/_articles/RJ-2025-043/figures/Figure1_appendix_secs_vs_Resident_Set_Size_mac.png differ diff --git a/_articles/RJ-2025-043/figures/Figure1_appendix_secs_vs_Resident_Set_Size_win.png b/_articles/RJ-2025-043/figures/Figure1_appendix_secs_vs_Resident_Set_Size_win.png new file mode 100644 index 0000000000..e69b3e63ca Binary files /dev/null and b/_articles/RJ-2025-043/figures/Figure1_appendix_secs_vs_Resident_Set_Size_win.png differ diff --git a/_articles/RJ-2025-043/figures/Grafik_Memshare.png b/_articles/RJ-2025-043/figures/Grafik_Memshare.png new file mode 100644 index 0000000000..4f018014e9 Binary files /dev/null and b/_articles/RJ-2025-043/figures/Grafik_Memshare.png differ diff --git a/_articles/RJ-2025-043/figures/figure2-1.pdf b/_articles/RJ-2025-043/figures/figure2-1.pdf new file mode 100644 index 0000000000..0517131bf3 Binary files /dev/null 
and b/_articles/RJ-2025-043/figures/figure2-1.pdf differ diff --git a/_articles/RJ-2025-043/figures/figure2-1.png b/_articles/RJ-2025-043/figures/figure2-1.png new file mode 100644 index 0000000000..431bb64568 Binary files /dev/null and b/_articles/RJ-2025-043/figures/figure2-1.png differ diff --git a/_articles/RJ-2025-043/memshare.R b/_articles/RJ-2025-043/memshare.R new file mode 100644 index 0000000000..0dba61e051 --- /dev/null +++ b/_articles/RJ-2025-043/memshare.R @@ -0,0 +1,86 @@ +# Generated by `rjournal_pdf_article()` using `knitr::purl()`: do not edit by hand +# Please edit memshare.Rmd to modify this file + +## ----setup, include=FALSE----------------------------------------------------- +knitr::opts_chunk$set(echo = FALSE, warning = FALSE, message = FALSE) +library(ggplot2) +library(kableExtra) + +dir <- getwd() + + +## ----figurememshare, out.width = "100%", fig.cap = "A schematic about where the memory is located and how different sessions access it."---- +knitr::include_graphics("figures/Grafik_Memshare.png") + + +## ----apply, echo = TRUE------------------------------------------------------- +library(memshare) +set.seed(1) +n <- 10000 +p <- 2000 +# Numeric double matrix (required): n rows (cases) x d columns (features) +X <- matrix(rnorm(n * p), n, p) +# Reference vector to correlate with each column +y <- rnorm(n) +f <- function(v, y) cor(v, y) + +ns <- "my_namespace" +res <- memshare::memApply( +X = X, MARGIN = 2, +FUN = f, +NAMESPACE = ns, +VARS = list(y = y), +MAX.CORES = NULL # defaults to detectCores() - 1 +) + + +## ----lapply, echo = TRUE------------------------------------------------------ + library(memshare) + list_length <- 1000 + matrix_dim <- 100 + + # Create the list of random matrices + l <- lapply( + 1:list_length, + function(i) matrix(rnorm(matrix_dim * matrix_dim), + nrow = matrix_dim, ncol = matrix_dim) + ) + + y <- rnorm(matrix_dim) + namespace <- "my_namespace" + + res <- memLapply(l, function(el, y) { + el %*% y + }, NAMESPACE 
= namespace, VARS = list(y = y), + MAX.CORES = 1) # MAX.CORES = 1 for simplicity + + +## ----figure1, out.width = "100%", fig.cap = "Matrix size depicted as magnitude vs median runtime (left) and vs memory overhead (MB) during the run relative to idle (right) for `memshare` and `SharedObject` as error-bar style plots with intervals given by the median ± AMAD across 100 runs. In addition, the serial baseline is shown as a line in magenta. The top subfigures present the full range of matrix sizes, and the bottom subfigures zoom in."---- +knitr::include_graphics("figures/Figure1.png") + + +## ----figure2,out.width = "100%", fig.cap = "The distribution of mutual information for 19637 gene expressions as a histogram, Pareto Density Estimation (PDE), QQ-plot against the normal distribution, and boxplot. There are no missing values (NaN).", fig.path='figures/'---- +Header <- readLines(file.path(dir,"data/MI_values.lrn"), n = 2)[2] + +mi_values <- read.table(file = file.path(dir,"data/MI_values.lrn"),header = TRUE,sep = "\t",skip = 5) + +DataVisualizations::InspectVariable(mi_values$MI,Name = "Distribution of Mutual Information") +#length(mi_values$MI) +#Header + + +## ----figure1-detail, out.width = "100%", fig.cap = "Median runtime (log-scale) vs matrix size for `memshare`, `SharedObject`, and serial baseline; ribbons show IQR across 100 runs. Insets show the difference in total RSS in log(MB), i.e., the memory overhead, during the run relative to idle for macOS, presenting the details of Figure \ref{fig:figure1}."---- +knitr::include_graphics("figures/Figure1_appendix_secs_vs_Resident_Set_Size_mac.png") + + +## ----appendix-figure1, out.width = "100%", fig.cap = "Median runtime (log-scale) vs matrix size for `memshare`, `SharedObject`, and serial baseline; ribbons show IQR across 100 runs. 
Insets show the difference in total RSS in log(MB) (i.e., memory overhead) during the run relative to idle for Windows~10 via Boot Camp."---- +knitr::include_graphics("figures/Figure1_appendix_secs_vs_Resident_Set_Size_win.png") + + +## ----app-a-1, out.width = "100%", fig.cap = "First screenshot of the SharedObject computation."---- +knitr::include_graphics("figures/Crash_message1.png") + + +## ----app-a-2, out.width = "100%", fig.cap = "Second screenshot of the SharedObject computation."---- +knitr::include_graphics("figures/Crash_message2.png") + diff --git a/_articles/RJ-2025-043/scripts/01BenchmarkMemshareVSSharedObject.R b/_articles/RJ-2025-043/scripts/01BenchmarkMemshareVSSharedObject.R new file mode 100644 index 0000000000..115424c7ed --- /dev/null +++ b/_articles/RJ-2025-043/scripts/01BenchmarkMemshareVSSharedObject.R @@ -0,0 +1,227 @@ +#01BenchmarkMemshareVSSharedObject.R +comment <- "01BenchmarkMemshareVSSharedObject.R" +if(R.Version()$os=="darwin20"){ + path <- file.path(getwd(),"01Transformierte/mac") +}else{ + path <- file.path(getwd(),"01Transformierte/win") +} +library(ps) + +# Get the PIDs of the workers in a PSOCK cluster +cluster_pids = function(cl) { + as.integer(unlist(clusterCall(cl, Sys.getpid))) +} + +# Sum RSS (MB) for a vector of PIDs; optionally include the master process +total_rss_mb = function(pids, include_master = TRUE) { + rss_pid = function(pid) { + h = ps_handle(pid) + as.numeric(ps_memory_info(h)["rss"]) / (1024^2) + } + # sum worker RSS; skip PIDs that may have exited + worker_sum = sum(vapply(pids, function(pid) { + tryCatch(rss_pid(pid), error = function(e) 0) + }, numeric(1))) + + if (include_master) { + master = as.numeric(ps_memory_info()["rss"]) / (1024^2) + worker_sum + master + } else { + worker_sum + } +} + +nearest_square <- function(i) { + if(i%%1==0){ + A <- matrix(runif(round(10^(i+1))), nrow = 10^i, ncol = 10^i) + }else{ + n <- as.double(round(10^i)) + A <- matrix(runif(round(n*n)), nrow = n, ncol = n) + } + return(A) +} + 
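The `nearest_square()` helper above maps a (possibly fractional) exponent `i` to a square test matrix whose side length is `round(10^i)`, which is how the benchmark's magnitude axis is generated. A minimal base-R sketch of the dimension logic (the `side_length()` helper is hypothetical, introduced here only for illustration, and the memory figure is plain doubles-times-8-bytes arithmetic):

```r
# Hypothetical helper mirroring only the dimension logic of nearest_square():
# for exponent i, the generated matrix is n x n with n = round(10^i).
side_length <- function(i) as.integer(round(10^i))

# Exponents benchmarked in Ivec below; fractional exponents give
# intermediate matrix sizes between the powers of ten.
Ivec <- c(1, 2, 3, 4, 4.2, 4.5, 4.7, 5)
print(data.frame(
  exponent = Ivec,
  side     = vapply(Ivec, side_length, integer(1)),
  cells    = vapply(Ivec, function(i) as.numeric(side_length(i))^2, numeric(1))
))

# At 10^4.7 the matrix is 50119 x 50119 doubles, i.e. roughly 20 GB,
# which is why the per-worker memory overhead matters at these sizes.
```

This also shows why the benchmark spaces extra points at 10^4.2, 10^4.5, and 10^4.7: the cell count, and hence time and memory cost, grows with the square of the side length.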
+library(parallel) +# For comparison purpose, fix the number of clusters to 32 +set.seed(1234) +Ivec <- c(1,2,3,4,4.2,4.5,4.7,5) +MaxIt <- 100 +##SharedObject ---- +cl <- makeCluster(32) +pids <- cluster_pids(cl) +library(SharedObject) +mem_idle_SharedObject <- total_rss_mb(pids, include_master = TRUE) + +SharedObjectPerformance <- c() + +for(i in Ivec){ + if(i>1){ + save(file=file.path(path,"SharedObjectPerformance_v2.rda"), + SharedObjectPerformance,comment,mem_idle_SharedObject) + } + Start <- c() + Ende <- c() + print(i) + mem_before_call <- c() + mem_after_call <- c() + p <- txtProgressBar(min = 0, max = MaxIt, style = 3) + for(k in 1:MaxIt){ + setTxtProgressBar(p, k) + mem_before_call[k] <- total_rss_mb(pids, include_master = TRUE) + A1 <- nearest_square(i) + x1 <- Sys.time() + A2 <- share(A1) + clusterExport(cl, varlist = "A2", envir = environment()) + res <- clusterApply(cl, 1:ncol(A2), function(col_index) { + sd(A2[, col_index]) + }) + Start[k] <- x1 + freeSharedMemory(listSharedObjects()) + x2 <- Sys.time() + Ende[k] <- x2 + mem_after_call[k] <- total_rss_mb(pids, include_master = TRUE) + # Cleanup + clusterEvalQ(cl, { + rm(A2) + gc() + }) + rm(A1) + gc() + save(file=file.path(path,paste0("SharedObjectPerformance_v2_part",i,".rda")), + mem_before_call,mem_after_call,Start,Ende,comment) + } + #temporary save ---- + mem_at_end_SharedObject <- total_rss_mb(pids, include_master = TRUE) + + SharedObjectPerformance[[as.character(i)]] <- cbind(DiffTime=Ende-Start,Start,Ende,mem_before_call,mem_after_call) + + save(file=file.path(path,"SharedObjectPerformance_v2.rda"), + SharedObjectPerformance,comment,mem_idle_SharedObject,mem_at_end_SharedObject) + +} +mem_at_end_SharedObject <- total_rss_mb(pids, include_master = TRUE) +stopCluster(cl) +save(file=file.path(path,"SharedObjectPerformance_v3.rda"), + SharedObjectPerformance,comment,mem_idle_SharedObject,mem_at_end_SharedObject) + +listSharedObjects() + +##memshare ---- +cl <- makeCluster(32) +pids <- 
cluster_pids(cl) +library(memshare) +mem_idle_memshare <- total_rss_mb(pids, include_master = TRUE) +memsharePerformance <- c() +for(i in Ivec){ + Start <- c() + Ende <- c() + mem_before_call <- c() + mem_after_call <- c() + print(i) + p <- txtProgressBar(min = 0, max = MaxIt, style = 3) + for(k in 1:MaxIt){ + setTxtProgressBar(p, k) + mem_before_call[k] <- total_rss_mb(pids, include_master = TRUE) + A1 <- nearest_square(i) + x1 <- Sys.time() + res <- memshare::memApply(X = A1, MARGIN = 2, FUN = function(x) { + return(sd(x)) + }, CLUSTER=cl, NAMESPACE="namespace_id") + #memshare::releaseViews("my_namespace",A1) + x2 <- Sys.time() + Start[k] <- x1 + Ende[k] <- x2 + mem_after_call[k] <- total_rss_mb(pids, include_master = TRUE) + rm(A1) + gc() + } + memsharePerformance[[as.character(i)]]=cbind(DiffTime=Ende-Start,Start,Ende,mem_before_call,mem_after_call) +save(file=file.path(path,"memsharePerformance.rda"), + memsharePerformance,comment,mem_idle_memshare) + +} +mem_at_end_memshare <- total_rss_mb(pids, include_master = TRUE) + +save(file=file.path(path,"memsharePerformance.rda"), + memsharePerformance,comment,mem_idle_memshare,mem_at_end_memshare) +stopCluster(cl) +##baseline ---- + # 0.25-steps for better line interpolation for small matrices +Ivec <- c(seq(from = 1, to = 4, by = 0.25),4.2,4.5,4.7,5) +mem_idle_base <- total_rss_mb(pids, include_master = TRUE) +BaselinePerformance <- list() +It <- 1 +for(i in Ivec){ + print(paste("Exponent:", i)) + + Start <- c() + Ende <- c() + mem_before_call <- c() + mem_after_call <- c() + + for(k in 1:It){ + + + mem_before_call[k] <- total_rss_mb(pids, include_master = TRUE) + + x1 <- Sys.time() + A1 <- nearest_square(i) + res <- apply(A1, 2, sd) + x2 <- Sys.time() + + Start[k] <- x1 + Ende[k] <- x2 + mem_after_call[k] <- total_rss_mb(pids, include_master = TRUE) + + rm(A1) + gc() + } + BaselinePerformance[[as.character(i)]] <- cbind(DiffTime = Ende - Start, + Start, Ende, + mem_before_call, mem_after_call) +} +mem_at_end_base 
<- total_rss_mb(pids, include_master = TRUE) + +save(file = file.path(path, "BaselinePerformance.rda"), + BaselinePerformance, comment, mem_idle_base, mem_at_end_base) + +#parallel Baseline +cl <- makeCluster(32) +pids <- cluster_pids(cl) # refresh worker PIDs for the new cluster +Ivec <- c(seq(from = 1, to = 4, by = 0.25),4.2,4.5,4.7) +mem_idle_base <- total_rss_mb(pids, include_master = TRUE) +BaselinePerformanceParallel <- list() +It <- 1 +for(i in Ivec){ + print(paste("Exponent:", i)) + + Start <- c() + Ende <- c() + mem_before_call <- c() + mem_after_call <- c() + + for(k in 1:It){ + + + mem_before_call[k] <- total_rss_mb(pids, include_master = TRUE) + + x1 <- Sys.time() + A1 <- nearest_square(i) + res <- parApply(cl,A1, 2, sd) + x2 <- Sys.time() + + Start[k] <- x1 + Ende[k] <- x2 + mem_after_call[k] <- total_rss_mb(pids, include_master = TRUE) + + rm(A1) + gc() + } + BaselinePerformanceParallel[[as.character(i)]] <- cbind(DiffTime = Ende - Start, + Start, Ende, + mem_before_call, mem_after_call) +} +mem_at_end_base <- total_rss_mb(pids, include_master = TRUE) + +save(file = file.path(path, "BaselinePerformanceParallel.rda"), + BaselinePerformanceParallel, comment, mem_idle_base, mem_at_end_base) +stopCluster(cl) \ No newline at end of file diff --git a/_articles/RJ-2025-043/scripts/02CreateManuscriptInR.R b/_articles/RJ-2025-043/scripts/02CreateManuscriptInR.R new file mode 100644 index 0000000000..60dcd32a76 --- /dev/null +++ b/_articles/RJ-2025-043/scripts/02CreateManuscriptInR.R @@ -0,0 +1,77 @@ +#02CreateManuscriptInR.R +install.packages("tinytex") # one-time +tinytex::install_tinytex() # one-time +tinytex::tlmgr_update() # if already installed +tinytex::is_tinytex() + +tinytex::tlmgr_install("pgf") +tinytex::tlmgr_install("fancyhdr") +tinytex::tlmgr_install("microtype") +tinytex::tlmgr_install("setspace") +tinytex::tlmgr_install("titlesec") +tinytex::tlmgr_install("placeins") +tinytex::tlmgr_install("caption") +tinytex::tlmgr_install("environ") +tinytex::tlmgr_install("upquote") + +tinytex::tlmgr_install(c( + "psnfss", # 
Palatino + friends (pslatex, mathpazo live here) + "pslatex", + "mathpazo", + "palatino", + "microtype", + "cm-super" # full T1 Computer Modern; often helps with T1/font issues +)) + +tinytex::tlmgr_install(c("collection-fontsrecommended", "collection-latexrecommended")) + +install.packages("tikzDevice") +install.packages("rjtools") # one-time + +library(rjtools) +setwd(file.path(getwd(),"05OurPublication2")) +name <- "memshare" +create_article(name = name,file = xfun::with_ext(name,"Rmd")) + +##prepare via ChatGPT the table for macOS ---- +load(file.path(getwd(),"05OurPublication/data/DF_Results_mac.rda")) +is_num_DF_Results <- sapply(DF_Results, is.numeric) +DF_Results[is_num_DF_Results] <- lapply(DF_Results[is_num_DF_Results], function(x) { + return(round(x,4)) +}) +dput(DF_Results) #->prompting + +ind <- which(stat_amad$Type=="memshare"|stat_amad$Type=="SharedObject") +stat_amad_sel <- stat_amad[ind,] + +#is_num_stat_amad <- sapply(stat_amad, is.numeric) +# stat_amad[is_num_stat_amad] <- lapply(stat_amad[is_num_stat_amad], function(x) { +# return(round(x,6)) +# }) +Diff_Sec <- round(as.numeric(stat_amad_sel$Diff_Sec[,1]),5) +MemoryDiff_MB <- round(as.numeric(stat_amad_sel$MemoryDiff_MB[,1]),5) +mem_after_call <- round(as.numeric(stat_amad_sel$mem_after_call[,1]),5) +DF_out <- data.frame(stat_amad_sel$Exponent,stat_amad_sel$Type,Diff_Sec,MemoryDiff_MB,mem_after_call) +dput(DF_out) #->prompting + + +##prepare via ChatGPT the table for Windows for the appendix ---- +load(file.path(getwd(),"05OurPublication/data/DF_Results_Windows.rda")) +is_num_DF_Results <- sapply(DF_Results, is.numeric) +DF_Results[is_num_DF_Results] <- lapply(DF_Results[is_num_DF_Results], function(x) { 
+# return(round(x,6)) +# }) +Diff_Sec <- round(as.numeric(stat_amad_sel$Diff_Sec[,1]),5) +MemoryDiff_MB <- round(as.numeric(stat_amad_sel$MemoryDiff_MB[,1]),5) +mem_after_call <- round(as.numeric(stat_amad_sel$mem_after_call[,1]),5) +DF_out <- data.frame(stat_amad_sel$Exponent,stat_amad_sel$Type,Diff_Sec,MemoryDiff_MB,mem_after_call) +dput(DF_out) #->prompting diff --git a/_articles/RJ-2025-043/scripts/03EvalauteBenchmark.R b/_articles/RJ-2025-043/scripts/03EvalauteBenchmark.R new file mode 100644 index 0000000000..a5c22e3a45 --- /dev/null +++ b/_articles/RJ-2025-043/scripts/03EvalauteBenchmark.R @@ -0,0 +1,522 @@ +#03EvalauteBenchmark.R +Comment <- "03EvalauteBenchmark.R" + +indir <- file.path(getwd(),"01Transformierte/mac") +load(file.path(indir,"BaselinePerformance.rda")) + + +Base <- lapply(1:length(BaselinePerformance), function(i,x) { + x <- x[[i]] + return(cbind(i,x[,1],x[,5]-x[,4],x[,5])) +},BaselinePerformance) +BaselinePerformance_Mat <- do.call(rbind,Base) + +load(file.path(indir,"SharedObjectPerformance_v2.rda")) + +load(file.path(indir,"memsharePerformance.rda")) + +# sanity check: net RSS growth over the whole session +mem_at_end_SharedObject-mem_idle_SharedObject +mem_at_end_memshare-mem_idle_memshare +mem_at_end_base-mem_idle_base + + +CompareShare <- lapply(1:length(SharedObjectPerformance), function(i,x) { + x <- x[[i]] + return(cbind(i,x[,1],x[,5]-x[,4],x[,5])) +},SharedObjectPerformance) +CompareShareMat <- do.call(rbind,CompareShare) + + + +# Scatter plot submitted to the R Journal and provided on arXiv ---- +CompareMem <- lapply(1:length(memsharePerformance), function(i,x) { + x <- x[[i]] + return(cbind(i,x[,1],x[,5]-x[,4],x[,5])) +},memsharePerformance) +CompareMemMat <- do.call(rbind,CompareMem) + +CompareMemMat[,1] <- CompareMemMat[,1]+10 + +Compare <- rbind(CompareShareMat,CompareMemMat) +colnames(Compare) <- c("Exponent","Diff_Sec","MemoryDiff_MB","mem_after_call") +table(Compare[,1]) +##Prepare data for ggplot2 +Size <- c(rep(c("10^1", "10^2", "10^3", "10^4","10^4.2","10^4.5","10^4.7", 
"10^5"), 2)) +Source <- c(rep("SharedObject", 8), rep("memshare", 8)) +names(Size) <- c(1:8,11:18) +names(Source) <- c(1:8,11:18) + +df <- data.frame( + x = Compare[,2], + y = Compare[,3], + label = Compare[,1] +) +df$label <- as.factor(df$label) +df$size <- Size[as.character(df$label)] +df$source <- Source[as.character(df$label)] + +# Define explicit shapes for each Source +source_shapes <- c( + "SharedObject" = 15, # square + "memshare" = 17 # triangle +) + +library(ggplot2) +# Plot + + +BaselinePerformance_Mat[,1] <- BaselinePerformance_Mat[,1]+100 +BaselinePerformance_df <- as.data.frame(BaselinePerformance_Mat) +colnames(BaselinePerformance_df) <- c("class","xbase","ybase","z") +BaselinePerformance_df$xbase <- BaselinePerformance_df$xbase +BaselinePerformance_df$ybase <- BaselinePerformance_df$ybase + +size_colors <- c( + "10^1" = "lightblue", + "10^2" = "darkgreen", + "10^3" = "gold", + "10^4" = "orange", + "10^4.2" = "darkorange", + "10^4.5" = "red", + "10^4.7" = "darkred", + "10^5" = "blue" +) + +source_shapes <- c( + "SharedObject" = 21, # circle with fill + border + "memshare" = 24 # triangle with fill + border +) +FontSize <- 26 #for 2000x1000 png, for screen decrease + +obj1 <- ggplot(df, aes(x = DataVisualizations::SignedLog(x), y = DataVisualizations::SignedLog(y))) + + geom_point( + aes(fill = size, shape = source), + size = 3, stroke = 0.7, colour = "black", alpha = 0.5 + ) + + # Magnitude legend (colors) + scale_fill_manual( + name = "Magnitude", + values = size_colors, + guide = guide_legend(override.aes = list(colour = size_colors)) + ) + + # Type legend (shapes) + scale_shape_manual( + name = "Type", + values = source_shapes + ) + + # Baseline legend (line) + geom_line( + data = BaselinePerformance_df, + aes(x = DataVisualizations::SignedLog(xbase), y = DataVisualizations::SignedLog(ybase), colour = "Baseline"), + inherit.aes = FALSE, + linewidth = 1 + ) + + scale_colour_manual( + name = "", # legend title blank + values = c("Baseline" = 
"magenta"), + guide = guide_legend(override.aes = list(linetype = 1)) + ) + + labs( + # title = "Benchmark of SharedObject vs. memshare", + x = "Time in log(s)", + y = "Difference in total RSS in log(MB)" + ) + + theme_minimal(base_size = FontSize) + + theme( + legend.position = "right", + legend.box = "vertical", + )#+ theme(plot.title = element_text(hjust = 0.5))#+xlim(0,0.1) +obj1 + +obj2 <- ggplot(df, aes(x = DataVisualizations::SignedLog(x), y = DataVisualizations::SignedLog(y))) + + geom_point( + aes(fill = size, shape = source), + size = 3, stroke = 0.7, colour = "black", alpha = 0.5 + ) + + # Magnitude legend (colors) + scale_fill_manual( + name = "Magnitude", + values = size_colors, + guide = guide_legend(override.aes = list(colour = size_colors)) + ) + + # Type legend (shapes) + scale_shape_manual( + name = "Type", + values = source_shapes + ) + + # Baseline legend (line) + geom_line( + data = BaselinePerformance_df, + aes(x = DataVisualizations::SignedLog(xbase), y = DataVisualizations::SignedLog(ybase), colour = "Baseline"), + inherit.aes = FALSE, + linewidth = 1 + ) + + scale_colour_manual( + name = "", # legend title blank + values = c("Baseline" = "magenta"), + guide = guide_legend(override.aes = list(linetype = 1)) + ) + + labs( + #title = "Benchmark of SharedObject vs. 
memshare", + x = "Time in log(s)", + y = "Difference in total RSS in log(MB)" + ) + + theme_minimal(base_size = FontSize) + xlim(0,0.1)+ylim(0,2)+ theme(legend.position = "none") + # theme( + # legend.position = "right", + # legend.box = "vertical", + # legend.title = element_text(size = 10), + # legend.text = element_text(size = 9) + # )+ theme(plot.title = element_text(hjust = 0.5))+ + +DataVisualizations::Multiplot(obj2,obj1,ColNo = 2) + +# Error bar plot with line plot in the revised version of R Journal ---- +dbt_mad <- function (x) +{ + if (is.vector(x)) { + centerMad <- mad(x) + leftMad <- mad(x[x < median(x)]) + rightMad <- mad(x[x > median(x)]) + } + else { + centerMad <- matrix(1, ncol = ncol(x)) + leftMad <- matrix(1, ncol = ncol(x)) + rightMad <- matrix(1, ncol = ncol(x)) + for (i in 1:ncol(x)) { + centerMad[, i] <- mad(x[, i], na.rm = TRUE) + leftMad[, i] <- mad(x[x < median(x[, i])], na.rm = TRUE) + rightMad[, i] <- mad(x[x > median(x[, i])], na.rm = TRUE) + } + } + return(list(centerMad = centerMad, leftMad = leftMad, rightMad = rightMad)) +} +amad <- function (x) +{ + adjfactor <- 1.3 + ergMad <- dbt_mad(x) + amad <- ergMad$centerMad * adjfactor + leftAmad <- ergMad$leftMad * adjfactor + rightAmad <- ergMad$rightMad * adjfactor + return(list(amad = amad, leftAmad = leftAmad, rightAmad = rightAmad)) +} +amad_val <- function(x,na.rm){ + if(isTRUE(na.rm)){ + x <- x[is.finite(x)] + } + return (amad(x)$amad) +} +CompareMem <- lapply(1:length(memsharePerformance), function(i,x) { + x <- x[[i]] + return(cbind(i,x[,1],x[,5]-x[,4],x[,5])) +},memsharePerformance) +CompareMemMat <- do.call(rbind,CompareMem) + +Compare <- rbind(CompareShareMat,CompareMemMat) +colnames(Compare) <- c("Exponent","Diff_Sec","MemoryDiff_MB","mem_after_call") +table(Compare[,1]) +## --- mapping exponent -> Magnitude label (as in your Size vector) ---- +exp_to_mag <- c( + "1" = "10^1", + "2" = "10^2", + "3" = "10^3", + "4" = "10^4", + "5" = "10^4.2", + "6" = "10^4.5", + "7" = 
"10^4.7", + "8" = "10^5" +) + +## SharedObject: exponents 1..8 +df_time_SO <- data.frame( + method = "SharedObject", + exponent = CompareShareMat[,1], + magnitude = exp_to_mag[as.character(CompareShareMat[,1])], + value = CompareShareMat[,2], # time (Diff_Sec) + mem = CompareShareMat[,3] # memory diff +) + +## memshare: exponents 1..8 +df_time_MS <- data.frame( + method = "memshare", + exponent = CompareMemMat[,1], + magnitude = exp_to_mag[as.character(CompareMemMat[,1])], + value = CompareMemMat[,2], + mem = CompareMemMat[,3] +) + +## Baseline: exponents 1..8 +df_time_BL <- data.frame( + method = "Baseline", + exponent = BaselinePerformance_Mat[,1], + magnitude = 10^as.numeric(names(BaselinePerformance)),#paste0("10^",names(BaselinePerformance)), + value = BaselinePerformance_Mat[,2], # time + mem = BaselinePerformance_Mat[,3] # memory diff +) + +## Long tables for time and memory +df_time_long <- rbind( + df_time_SO[,c("method","magnitude","value")], + df_time_MS[,c("method","magnitude","value")] +) + +df_mem_long <- rbind( + df_time_SO[,c("method","magnitude","mem")], + df_time_MS[,c("method","magnitude","mem")] +) +colnames(df_mem_long)[3] <- "value" # same col name as df_time_long + +MakeYmatrixForClassErrorbar <- function(df_long, MagnitudeLevels, MethodLevels) { + df_long$magnitude <- factor(df_long$magnitude, levels = MagnitudeLevels) + df_long$method <- factor(df_long$method, levels = MethodLevels) + + # maximum number of repetitions over all (method, magnitude) + n_max <- max(table(df_long$magnitude, df_long$method)) + + cols <- list() + for (meth in MethodLevels) { + for (mag in MagnitudeLevels) { + v <- df_long$value[df_long$method == meth & df_long$magnitude == mag] + if (length(v) == 0L) { + v <- rep(NA_real_, n_max) + } else if (length(v) < n_max) { + v <- c(v, rep(NA_real_, n_max - length(v))) + } + cols[[paste(meth, mag, sep = "_")]] <- v + } + } + + Y <- do.call(cbind, cols) + Xvalues <- MagnitudeLevels + Cls <- rep(seq_along(MethodLevels), each = 
length(MagnitudeLevels)) + + list(Ymatrix = Y, Xvalues = Xvalues, Cls = Cls) +} + +MagnitudeLevels <- c("10^1", "10^2", "10^3", "10^4", + "10^4.2", "10^4.5", "10^4.7", "10^5") +MethodLevels <- c("SharedObject", "memshare") + +## --- time --- +time_dat <- MakeYmatrixForClassErrorbar(df_time_long, MagnitudeLevels, MethodLevels) +Xvalues <- log(c(10^1,10^2,10^3,10^4,10^4.2,10^4.5,10^4.7,10^5),base = 10) +names(Xvalues) <- time_dat$Xvalues +CE_time <- DataVisualizations::ClassErrorbar( + Xvalues = Xvalues, + Ymatrix = time_dat$Ymatrix, + Cls = time_dat$Cls, + ClassNames = MethodLevels, + SDfun = amad_val, + ClassCols = c("steelblue","orange"), + ClassShape = c(20, 18), + main = "", + xlab = "Magnitude", + ylab = "Time [s]", + JitterPosition = 0, + WhiskerWidth = 0.15, + Whisker_lwd = 0.7, + BW = FALSE +) +p_time_full <- CE_time$ggobj + + theme_minimal(base_size = FontSize) + + ## x-axis with pretty 10^k labels (here only 10^1,10^2,10^3,10^4,10^5) + scale_x_continuous( + breaks = Xvalues[c(1:4, 8)], + labels = as.expression( + lapply(Xvalues[c(1:4, 8)], function(z) bquote(10^.(z))) + ) + ) + + ## Baseline as magenta line; also mapped to colour AND shape + geom_line( + data = df_time_BL, + aes( + x = log(magnitude, base = 10), + y = value, + colour = "Baseline", + shape = "Baseline" + ), + inherit.aes = FALSE, + linewidth = 1 + ) + + ## Overwrite BOTH colour and shape scales so there is ONE legend + scale_colour_manual( + name = "Type", + values = c( + "SharedObject" = "steelblue", + "memshare" = "orange", + "Baseline" = "magenta" + ) + ) + + scale_shape_manual( + name = "Type", + values = c( + "SharedObject" = 20, + "memshare" = 18, + "Baseline" = NA # no point symbol for baseline + ), + guide = guide_legend( + override.aes = list( + linetype = c("solid","blank", "blank"), # Baseline as line + size = c( 1.2,3, 3) # point sizes / line size + ) + ) + ) + +#p_time_full + +## zoomed time (adjust ylim as needed) +p_time_zoom <- p_time_full + + coord_cartesian(ylim = 
c(min(CE_time$Statistics$lower), + quantile(CE_time$Statistics$upper, 0.6))) + + +mem_dat <- MakeYmatrixForClassErrorbar(df_mem_long, MagnitudeLevels, MethodLevels) + +Xvalues <- log(c(10^1,10^2,10^3,10^4,10^4.2,10^4.5,10^4.7,10^5),base = 10) +names(Xvalues) <- mem_dat$Xvalues +CE_mem <- DataVisualizations::ClassErrorbar( + Xvalues = Xvalues, + Ymatrix = mem_dat$Ymatrix, + Cls = mem_dat$Cls, + ClassNames = MethodLevels, + SDfun = amad_val, + ClassCols = c("steelblue","orange"), + ClassShape = c(20, 18), + main = "", + xlab = "Magnitude", + ylab = "Memory overhead [MB]", + JitterPosition = 0, + WhiskerWidth = 0.15, + Whisker_lwd = 0.7, + BW = FALSE +) + +p_mem_full <- CE_mem$ggobj + theme_minimal(base_size = FontSize) + + ## x-axis with pretty 10^k labels (here only 10^1,10^2,10^3,10^4,10^5) + scale_x_continuous( + breaks = Xvalues[c(1:4, 8)], + labels = as.expression( + lapply(Xvalues[c(1:4, 8)], function(z) bquote(10^.(z))) + ) + ) + + ## Baseline as magenta line; also mapped to colour AND shape + geom_line( + data = df_time_BL, + aes( + x = log(magnitude, base = 10), + y = mem, + colour = "Baseline", + shape = "Baseline" + ), + inherit.aes = FALSE, + linewidth = 1 + ) + + ## Overwrite BOTH colour and shape scales so there is ONE legend + scale_colour_manual( + name = "Type", + values = c( + "SharedObject" = "steelblue", + "memshare" = "orange", + "Baseline" = "magenta" + ) + ) + + scale_shape_manual( + name = "Type", + values = c( + "SharedObject" = 20, + "memshare" = 18, + "Baseline" = NA # no point symbol for baseline + ), + guide = guide_legend( + override.aes = list( + linetype = c("solid","blank", "blank"), # Baseline as line + size = c( 1.2,3, 3) # point sizes / line size + ) + ) + ) + +p_mem_zoom <- p_mem_full + + coord_cartesian(ylim = c(min(CE_mem$Statistics$lower), + quantile(CE_mem$Statistics$upper, 0.6))) + +# helper to add panel label in top-left +add_panel_label <- function(p, label) { + p + annotate( + "text", + x = -Inf, + y = Inf, + label = 
label, + hjust = -0.1, # nudge a bit right from -Inf + vjust = 1.1, # nudge a bit down from Inf + fontface = "bold", + ) +} + +# add labels +p_time_full_lab <- add_panel_label(p_time_full, "A") +p_time_zoom_lab <- add_panel_label(p_time_zoom, "C (Zoom)") +p_mem_full_lab <- add_panel_label(p_mem_full, "B") +p_mem_zoom_lab <- add_panel_label(p_mem_zoom, "D (Zoom)") + +DataVisualizations::Multiplot( + p_time_full_lab, + p_time_zoom_lab, + p_mem_full_lab, + p_mem_zoom_lab, + ColNo = 2 +) +##prepare an object for ktable ---- + + + + +load(file.path(indir,"BaselinePerformanceParallel.rda")) +Base_par <- lapply(1:length(BaselinePerformanceParallel), function(i,x) { + x <- x[[i]] + return(data.frame(Exponent=names(BaselinePerformanceParallel)[i],x[,1],x[,5]-x[,4],x[,5])) +},BaselinePerformanceParallel) +BaselinePerformance_Par <- do.call(rbind,Base_par) +colnames(BaselinePerformance_Par) <- c("Exponent","Diff_Sec","MemoryDiff_MB","mem_after_call") +BaselinePerformance_Par$Type <- "Baseline Parallel" + +load(file.path(indir,"BaselinePerformance.rda")) + +Base <- lapply(1:length(BaselinePerformance), function(i,x) { + x <- x[[i]] + return(data.frame(Exponent <- names(BaselinePerformance)[i],x[,1],x[,5]-x[,4],x[,5])) +},BaselinePerformance) +BaselinePerformance <- do.call(rbind,Base) +colnames(BaselinePerformance) <- c("Exponent","Diff_Sec","MemoryDiff_MB","mem_after_call") +BaselinePerformance$Type <- "Baseline" + +load(file.path(indir,"memsharePerformance.rda")) +CompareMem <- lapply(1:length(memsharePerformance), function(i,x) { + x <- x[[i]] + return(data.frame(names(memsharePerformance)[i],x[,1],x[,5]-x[,4],x[,5])) +},memsharePerformance) +CompareMemDF <- do.call(rbind,CompareMem) +colnames(CompareMemDF) <- c("Exponent","Diff_Sec","MemoryDiff_MB","mem_after_call") +CompareMemDF_med <- aggregate(cbind(Diff_Sec,MemoryDiff_MB,mem_after_call) ~Exponent,data=CompareMemDF,FUN=median,na.rm=T) +CompareMemDF_med$Type <- "memshare" + +CompareMemDF_amad <- 
aggregate(cbind(Diff_Sec,MemoryDiff_MB,mem_after_call) ~Exponent,data=CompareMemDF,FUN=amad) +CompareMemDF_amad$Type <- "memshare" + +load(file.path(indir,"SharedObjectPerformance_v2.rda")) +CompareShare <- lapply(1:length(SharedObjectPerformance), function(i,x) { + x <- x[[i]] + return(data.frame(Exponent=names(SharedObjectPerformance)[i],x[,1],x[,5]-x[,4],x[,5])) +},SharedObjectPerformance) +CompareShareDF <- do.call(rbind,CompareShare) +colnames(CompareShareDF) <- c("Exponent","Diff_Sec","MemoryDiff_MB","mem_after_call") +CompareShareDF_med <- aggregate(cbind(Diff_Sec,MemoryDiff_MB,mem_after_call) ~Exponent,data=CompareShareDF,FUN=median,na.rm=T) +CompareShareDF_med$Type <- "SharedObject" + +CompareShareDF_amad <- aggregate(cbind(Diff_Sec,MemoryDiff_MB,mem_after_call) ~Exponent,data=CompareShareDF,FUN=amad) +CompareShareDF_amad$Type <- "SharedObject" + +DF_Results <- rbind(CompareShareDF_med,CompareMemDF_med,BaselinePerformance,BaselinePerformance_Par) + +stat_amad <- rbind(CompareShareDF_amad,CompareMemDF_amad,BaselinePerformance,BaselinePerformance_Par) + +save(file=file.path(getwd(),"05OurPublication/data","DF_Results_mac.rda"),DF_Results,stat_amad,Comment) diff --git a/_articles/RJ-2025-043/scripts/03EvalauteBenchmark_windows.R b/_articles/RJ-2025-043/scripts/03EvalauteBenchmark_windows.R new file mode 100644 index 0000000000..7d64c335e6 --- /dev/null +++ b/_articles/RJ-2025-043/scripts/03EvalauteBenchmark_windows.R @@ -0,0 +1,260 @@ +#03EvalauteBenchmark_windows.R +Comment <- "03EvalauteBenchmark_windows.R" + +indir <- file.path(getwd(),"01Transformierte/win") +load(file.path(indir,"BaselinePerformance_100trials.rda")) + + +Base <- lapply(1:length(BaselinePerformance), function(i,x) { + x <- x[[i]] + return(cbind(i,mean(x[,1]),mean(x[,5])-mean(x[,4]),mean(x[,5]))) +},BaselinePerformance) +BaselinePerformance_Mat <- do.call(rbind,Base) + +load(file.path(indir,"SharedObjectPerformance_v2.rda")) + +load(file.path(indir,"memsharePerformance.rda")) + +# 
mem_at_end_SharedObject-mem_idle_SharedObject +# mem_at_end_memshare-mem_idle_memshare +# mem_at_end_base-mem_idle_base + + +CompareShare <- lapply(1:length(SharedObjectPerformance), function(i,x) { + x <- x[[i]] + return(cbind(i,x[,1],x[,5]-x[,4],x[,5])) +},SharedObjectPerformance) +CompareShareMat <- do.call(rbind,CompareShare) + +CompareMem <- lapply(1:length(memsharePerformance), function(i,x) { + x <- x[[i]] + return(cbind(i,x[,1],x[,5]-x[,4],x[,5])) +},memsharePerformance) +CompareMemMat <- do.call(rbind,CompareMem) + +CompareMemMat[,1] <- CompareMemMat[,1]+10 + +Compare <- rbind(CompareShareMat,CompareMemMat) +colnames(Compare) <- c("Exponent","Diff_Sec","MemoryDiff_MB","mem_after_call") +table(Compare[,1]) + +##Prepare accordingly to ggplot2 +Size <- c(rep(c("10^1", "10^2", "10^3", "10^4","10^4.2","10^4.5","10^4.7"), 2)) +Source <- c(rep("SharedObject", length(memsharePerformance)), rep("memshare", length(SharedObjectPerformance))) +names(Size) <- c(1:(length(memsharePerformance)),11:(11+length(SharedObjectPerformance)-1)) +names(Source) <- c(1:(length(memsharePerformance)),11:(11+length(SharedObjectPerformance)-1)) + +df <- data.frame( + x = Compare[,2], + y = Compare[,3], + label = Compare[,1] +) +df$label <- as.factor(df$label) +df$size <- Size[as.character(df$label)] +df$source <- Source[as.character(df$label)] + +# size_colors <- c( +# "10^1" = "lightblue", +# "10^2" = "darkgreen", +# "10^3" = "gold", +# "10^4" = "red", +# "10^5" = "black" +# ) + +# Define explicit shapes for each Source +source_shapes <- c( + "SharedObject" = 15, # square + "memshare" = 17 # triangle +) + +library(ggplot2) +# Plot + + +BaselinePerformance_Mat[,1] <- BaselinePerformance_Mat[,1]+100 +BaselinePerformance_df <- as.data.frame(BaselinePerformance_Mat) +colnames(BaselinePerformance_df) <- c("class","xbase","ybase","z") +BaselinePerformance_df$xbase <- BaselinePerformance_df$xbase +BaselinePerformance_df$ybase <- BaselinePerformance_df$ybase + +size_colors <- c( + "10^1" = 
"lightblue", + "10^2" = "darkgreen", + "10^3" = "gold", + "10^4" = "orange", + "10^4.2" = "darkorange", + "10^4.5" = "red", + "10^4.7" = "darkred"#, + # "10^5" = "blue" +) + +source_shapes <- c( + "SharedObject" = 21, # circle with fill + border + "memshare" = 24 # triangle with fill + border +) +FontSize <- 26 #for 2000x1000 png, for screen decrease + +obj1 <- ggplot(df, aes(x = DataVisualizations::SignedLog(x), y = DataVisualizations::SignedLog(y))) + + geom_point( + aes(fill = size, shape = source), + size = 3, stroke = 0.7, colour = "black", alpha = 0.5 + ) + + # Magnitude legend (colors) + scale_fill_manual( + name = "Magnitude", + values = size_colors, + guide = guide_legend(override.aes = list(colour = size_colors)) + ) + + # Type legend (shapes) + scale_shape_manual( + name = "Type", + values = source_shapes + ) + + # Baseline legend (line) + geom_line( + data = BaselinePerformance_df, + aes(x = DataVisualizations::SignedLog(xbase), y = DataVisualizations::SignedLog(ybase), colour = "Baseline"), + inherit.aes = FALSE, + linewidth = 1 + ) + + scale_colour_manual( + name = "", # legend title blank + values = c("Baseline" = "magenta"), + guide = guide_legend(override.aes = list(linetype = 1)) + ) + + labs( + # title = "Benchmark of SharedObject vs. 
memshare", + x = "Time in log(s)", + y = "Difference in total RSS in log(MB)" + ) + + theme_minimal(base_size = FontSize) + + theme( + legend.position = "right", + legend.box = "vertical" + )#+ theme(plot.title = element_text(hjust = 0.5))#+xlim(0,0.1) +obj1 + +obj2 <- ggplot(df, aes(x = DataVisualizations::SignedLog(x), y = DataVisualizations::SignedLog(y))) + + geom_point( + aes(fill = size, shape = source), + size = 3, stroke = 0.7, colour = "black", alpha = 0.5 + ) + + # Magnitude legend (colors) + scale_fill_manual( + name = "Magnitude", + values = size_colors, + guide = guide_legend(override.aes = list(colour = size_colors)) + ) + + # Type legend (shapes) + scale_shape_manual( + name = "Type", + values = source_shapes + ) + + # Baseline legend (line) + geom_line( + data = BaselinePerformance_df, + aes(x = DataVisualizations::SignedLog(xbase), y = DataVisualizations::SignedLog(ybase), colour = "Baseline"), + inherit.aes = FALSE, + linewidth = 1 + ) + + scale_colour_manual( + name = "", # legend title blank + values = c("Baseline" = "magenta"), + guide = guide_legend(override.aes = list(linetype = 1)) + ) + + labs( + #title = "Benchmark of SharedObject vs. 
memshare", + x = "Time in log(s)", + y = "Difference in total RSS in log(MB)" + ) + + theme_minimal(base_size = FontSize) + xlim(0,0.1)+ylim(0,2)+ theme(legend.position = "none") + # theme( + # legend.position = "right", + # legend.box = "vertical", + # legend.title = element_text(size = 10), + # legend.text = element_text(size = 9) + # )+ theme(plot.title = element_text(hjust = 0.5))+ + +DataVisualizations::Multiplot(obj2,obj1,ColNo = 2) + +## prepare an object for kable ---- + +dbt_mad <- function (x) +{ + if (is.vector(x)) { + centerMad <- mad(x) + leftMad <- mad(x[x < median(x)]) + rightMad <- mad(x[x > median(x)]) + } + else { + centerMad <- matrix(1, ncol = ncol(x)) + leftMad <- matrix(1, ncol = ncol(x)) + rightMad <- matrix(1, ncol = ncol(x)) + for (i in 1:ncol(x)) { + centerMad[, i] <- mad(x[, i], na.rm = TRUE) + leftMad[, i] <- mad(x[, i][x[, i] < median(x[, i])], na.rm = TRUE) + rightMad[, i] <- mad(x[, i][x[, i] > median(x[, i])], na.rm = TRUE) + } + } + return(list(centerMad = centerMad, leftMad = leftMad, rightMad = rightMad)) +} +amad <- function (x) +{ + adjfactor <- 1.3 + ergMad <- dbt_mad(x) + amad <- ergMad$centerMad * adjfactor + leftAmad <- ergMad$leftMad * adjfactor + rightAmad <- ergMad$rightMad * adjfactor + return(list(amad = amad, leftAmad = leftAmad, rightAmad = rightAmad)) +} + +load(file.path(indir,"BaselinePerformanceParallel.rda")) +Base_par <- lapply(1:length(BaselinePerformanceParallel), function(i,x) { + x <- x[[i]] + return(data.frame(Exponent=names(BaselinePerformanceParallel)[i],x[,1],x[,5]-x[,4],x[,5])) +},BaselinePerformanceParallel) +BaselinePerformance_Par <- do.call(rbind,Base_par) +colnames(BaselinePerformance_Par) <- c("Exponent","Diff_Sec","MemoryDiff_MB","mem_after_call") +BaselinePerformance_Par$Type <- "Baseline Parallel" + +load(file.path(indir,"BaselinePerformance_100trials.rda")) + +Base <- lapply(1:length(BaselinePerformance), function(i,x) { + x <- x[[i]] + 
return(data.frame(Exponent=names(BaselinePerformance)[i],mean(x[,1]),mean(x[,5])-mean(x[,4]),mean(x[,5]))) +},BaselinePerformance) +BaselinePerformance <- do.call(rbind,Base) +colnames(BaselinePerformance) <- c("Exponent","Diff_Sec","MemoryDiff_MB","mem_after_call") +BaselinePerformance$Type <- "Baseline" + +load(file.path(indir,"memsharePerformance.rda")) +CompareMem <- lapply(1:length(memsharePerformance), function(i,x) { + x <- x[[i]] + return(data.frame(names(memsharePerformance)[i],x[,1],x[,5]-x[,4],x[,5])) +},memsharePerformance) +CompareMemDF <- do.call(rbind,CompareMem) +colnames(CompareMemDF) <- c("Exponent","Diff_Sec","MemoryDiff_MB","mem_after_call") +CompareMemDF_med <- aggregate(cbind(Diff_Sec,MemoryDiff_MB,mem_after_call) ~Exponent,data=CompareMemDF,FUN=median,na.rm=T) +CompareMemDF_med$Type <- "memshare" + +CompareMemDF_amad <- aggregate(cbind(Diff_Sec,MemoryDiff_MB,mem_after_call) ~Exponent,data=CompareMemDF,FUN=amad) +CompareMemDF_amad$Type <- "memshare" + +load(file.path(indir,"SharedObjectPerformance_v2.rda")) +CompareShare <- lapply(1:length(SharedObjectPerformance), function(i,x) { + x <- x[[i]] + return(data.frame(Exponent=names(SharedObjectPerformance)[i],x[,1],x[,5]-x[,4],x[,5])) +},SharedObjectPerformance) +CompareShareDF <- do.call(rbind,CompareShare) +colnames(CompareShareDF) <- c("Exponent","Diff_Sec","MemoryDiff_MB","mem_after_call") +CompareShareDF_med <- aggregate(cbind(Diff_Sec,MemoryDiff_MB,mem_after_call) ~Exponent,data=CompareShareDF,FUN=median,na.rm=T) +CompareShareDF_med$Type <- "SharedObject" + +CompareShareDF_amad <- aggregate(cbind(Diff_Sec,MemoryDiff_MB,mem_after_call) ~Exponent,data=CompareShareDF,FUN=amad) +CompareShareDF_amad$Type <- "SharedObject" + +DF_Results <- rbind(CompareShareDF_med,CompareMemDF_med,BaselinePerformance,BaselinePerformance_Par) + +stat_amad <- rbind(CompareShareDF_amad,CompareMemDF_amad,BaselinePerformance,BaselinePerformance_Par) + 
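The `aggregate(..., FUN = amad)` calls above summarize each benchmark cell with an adjusted MAD. As a minimal sketch of what that helper returns — assuming `amad()` and `dbt_mad()` from this script are already sourced, and using a hypothetical sample `x_demo` rather than benchmark data:

```r
# Sketch only: illustrates the adjusted-MAD helper defined above.
# x_demo is hypothetical right-skewed data, not a benchmark result.
set.seed(1)
x_demo <- c(rnorm(50, mean = 1, sd = 0.2), rnorm(10, mean = 3, sd = 0.2))
spread <- amad(x_demo)
spread$amad       # 1.3 * mad(x_demo): overall robust spread
spread$leftAmad   # 1.3 * mad of values below median(x_demo)
spread$rightAmad  # 1.3 * mad of values above median(x_demo)
```

The separate left/right components expose the asymmetry of the runtime and memory distributions, which is why `stat_amad` is saved alongside the medians in `DF_Results`.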
+save(file=file.path(file.path(getwd(),"05OurPublication/data"),"DF_Results_Windows.rda"),DF_Results,stat_amad,Comment) diff --git a/_articles/RJ-2025-043/scripts/04MionFirebrowse.R b/_articles/RJ-2025-043/scripts/04MionFirebrowse.R new file mode 100644 index 0000000000..bf4ae68df3 --- /dev/null +++ b/_articles/RJ-2025-043/scripts/04MionFirebrowse.R @@ -0,0 +1,70 @@ +#04MionFirebrowse.R +Comment="04MionFirebrowse.R" +library(ps) + +# Get the PIDs of the workers in a PSOCK cluster +cluster_pids <- function(cl) { + as.integer(unlist(clusterCall(cl, Sys.getpid))) +} + +# Sum RSS (MB) for a vector of PIDs; optionally include the master process +total_rss_mb <- function(pids, include_master = TRUE) { + rss_pid <- function(pid) { + h <- ps_handle(pid) + as.numeric(ps_memory_info(h)["rss"]) / (1024^2) + } + # sum worker RSS; skip PIDs that may have exited + worker_sum <- sum(vapply(pids, function(pid) { + tryCatch(rss_pid(pid), error = function(e) 0) + }, numeric(1))) + + if (include_master) { + master <- as.numeric(ps_memory_info()["rss"]) / (1024^2) + worker_sum + master + } else { + worker_sum + } +} +# set path to the Zenodo-downloaded data +path <- file.path(getwd(),"01Transformierte") +# or use another csv reader, +# for example +#read.table(file = file.path(dir,"data/12Genexpession_Firebrowse_d19637_N10446.lrn"),header = T,sep = "\t",skip = 5) + +V <- ReadLRN("12Genexpession_Firebrowse_d19637_N10446",path) +Key=V$Key +Data=V$Data +Header=V$Header +# or use another csv reader +V2 <- ReadCLS("14Genexpression_Firebrowse_Cls_N10446",path) +ClsKey <- V2$Key +Cls <- V2$Cls +TheSameKey(Key,ClsKey) +#FCPS::ClusterCount(Cls) +library(parallel) +library(memshare) +cl <- makeCluster(detectCores()-1) +namespace <- "mutual_info" + +pids <- cluster_pids(cl) +mem_idle <- total_rss_mb(pids, include_master = TRUE) +start <- Sys.time() +mi_vals <- memshare::memApply( + CLUSTER = cl, + X = Data, + MARGIN = 2, + FUN = function(x,y) { + cc <- memshare::mutualinfo(x,y,isYDiscrete = T,na.rm = 
T,useMPMI = F) + return(cc) + }, + VARS = list(y=Cls), + NAMESPACE = namespace) + +mem_at_end <- total_rss_mb(pids, include_master = TRUE) +atend <- Sys.time() +mi_vals_vec <- unlist(mi_vals) +atend-start +Comments <- paste("TimeDiff in hours:",atend-start,"MemDiff in MB:",mem_at_end-mem_idle,"mem_idle:",mem_idle,"mem_at_end:",mem_at_end,"TimeStart:",start,"TimeEnd:",atend,Comment) + +# not accessible externally +#WriteLRN("MI_values",Data =mi_vals_vec,Key = as.numeric(gsub("C","",Header)),Comments = Comments,Header =c("MI"),OutDirectory = RelPath(1,"05OurPublication/data")) diff --git a/_articles/RJ-2025-044/RJ-2025-044.R b/_articles/RJ-2025-044/RJ-2025-044.R new file mode 100644 index 0000000000..10754c534b --- /dev/null +++ b/_articles/RJ-2025-044/RJ-2025-044.R @@ -0,0 +1,859 @@ +# Generated by `rjournal_pdf_article()` using `knitr::purl()`: do not edit by hand +# Please edit RJ-2025-044.Rmd to modify this file + +## ----setup, include=FALSE----------------------------------------------------- +knitr::opts_chunk$set(echo = FALSE, warning = FALSE, message = FALSE) +library(ggplot2) +library(MultiATSM) +library(kableExtra) +library(magrittr) + + +## ----tab-ModFea-H, eval = knitr::is_html_output(), layout = "l-body-outset"---- +# ModelLabels <- c("JPS original", "JPS global", "JPS multi", "GVAR single", "GVAR multi", +# "JLL original", "JLL No DomUnit", "JLL joint Sigma") +# +# # Rows +# Tab <- data.frame(matrix(nrow = length(ModelLabels), ncol = 0)) +# rownames(Tab) <- ModelLabels +# +# # Empty columns +# EmptyCol <- c("", "", "", "", "", "", "", "") +# Tab$EmptyCol0 <- EmptyCol +# # P-dynamics + 2 empty spaces +# Tab$PdynIndUnco <- c("x", "", "", "", "", "", "", "") +# Tab$PdynIndCo <- c("", "", "", "", "", "", "", "") +# Tab$PdynJointUnco <- c("", "x", "x", "", "", "", "", "") +# Tab$PdynJointJLL <- c("", "", "", "", "", "x", "x", "x") +# Tab$PdynJointGVAR <- c("", "", "", "x", "x", "", "", "") +# Tab$EmptyCol1 <- EmptyCol +# Tab$EmptyCol2 <- EmptyCol +# # 
Q-dynamics + 2 empty spaces +# Tab$QdynInd <- c("x", "x", "", "x", "", "", "", "") +# Tab$QdynJoint <- c("", "", "x", "", "x", "x", "x", "x") +# Tab$EmptyCol3 <- EmptyCol +# Tab$EmptyCol4 <- EmptyCol +# # Sigma + 2 empty spaces +# Tab$Ponly <- c("", "", "", "", "", "x", "x", "") +# Tab$PandQ <- c("x", "x", "x", "x", "x", "", "", "x") +# Tab$EmptyCol5 <- EmptyCol +# Tab$EmptyCol6 <- EmptyCol +# # Dominant Unit +# Tab$DomUnit <- c("", "", "", "", "", "x", "", "x") +# +# # Adjust column names +# ColNames <- c("","","","","JLL", "GVAR", "", "", "", "", "", "", "","", "", "","") +# colnames(Tab) <- ColNames +# +# # Generate the table +# kableExtra::kbl(Tab, align = "c", caption = "Summary of model features") %>% +# kableExtra::kable_classic("striped", full_width = F) %>% +# kableExtra::row_spec(0, font_size = 10) %>% +# kableExtra::add_header_above(c(" "=2, "UR" = 1, "R" = 1, "UR" = 1, "R" = 2, " " = 11)) %>% +# kableExtra::add_header_above(c(" "=2, "Single" = 2, "Joint" = 3, " "=2, "Single" = 1, "Joint" = 1, " "=2, "P only" = 1, "P and Q" = 1, " " = 3)) %>% +# kableExtra::add_header_above(c( " "=2, "P-dynamics"= 5, " "=2, "Q-dynamics"= 2, " "=2, "Sigma matrix estimation" = 2, " "=2, "Dom. Eco."=1), bold = T) %>% +# kableExtra::pack_rows("Unrestricted VAR", 1, 3 , label_row_css = "background-color: #666; color: #fff;") %>% +# kableExtra::pack_rows("Restricted VAR (GVAR)", 4, 5, label_row_css = "background-color: #666; color: #fff;") %>% +# kableExtra::pack_rows("Restricted VAR (JLL)", 6, 8, label_row_css = "background-color: #666; color: #fff;") %>% +# kableExtra::column_spec(1, width = "10em") %>% +# kableExtra::column_spec(3:17, width = "4.5em") %>% +# kableExtra::footnote(general = "Risk factor dynamics under the \\(\\mathbb{P}\\)-measure may follow either an unrestricted (UR) or a restricted (R) specification. The set of restrictions present in the JLL-based and GVAR-based models are described in @JotikasthiraLeLundblad2015 and @CandelonMoura2024, respectively. 
The estimation of the \\(\\Sigma\\) matrix is done either exclusively with the other parameters of the \\(\\mathbb{P}\\)-dynamics (*P* column) or jointly under both \\(\\mathbb{P}\\)- and \\(\\mathbb{Q}\\)-parameters (*P and Q* column). *Dom. Eco.* relates to the presence of a dominant economy. The entries featuring *x* indicate that the referred characteristic is part of the model.", +# escape = FALSE) + + +## ----tab-ModFea-L, eval = knitr::is_latex_output()---------------------------- +ModelLabels <- c("JPS original", "JPS global", "JPS multi", "GVAR single", "GVAR multi", + "JLL original", "JLL No DomUnit", "JLL joint Sigma") + +# Rows +Tab <- data.frame(matrix(nrow = length(ModelLabels), ncol = 0)) +rownames(Tab) <- ModelLabels + +# Empty columns +EmptyCol <- c("", "", "", "", "", "", "", "") +Tab$EmptyCol0 <- EmptyCol +# P-dynamics + 2 empty spaces +Tab$PdynIndUnco <- c("x", "", "", "", "", "", "", "") +Tab$PdynIndCo <- c("", "", "", "", "", "", "", "") +Tab$PdynJointUnco <- c("", "x", "x", "", "", "", "", "") +Tab$PdynJointJLL <- c("", "", "", "", "", "x", "x", "x") +Tab$PdynJointGVAR <- c("", "", "", "x", "x", "", "", "") +Tab$EmptyCol1 <- EmptyCol +# Q-dynamics + 2 empty spaces +Tab$QdynInd <- c("x", "x", "", "x", "", "", "", "") +Tab$QdynJoint <- c("", "", "x", "", "x", "x", "x", "x") +Tab$EmptyCol4 <- EmptyCol +# Sigma + 2 empty spaces +Tab$Ponly <- c("", "", "", "", "", "x", "x", "") +Tab$PandQ <- c("x", "x", "x", "x", "x", "", "", "x") +Tab$EmptyCol2 <- EmptyCol +# Dominant Unit +Tab$DomUnit <- c("", "", "", "", "", "x", "", "x") + +# Adjust column names +ColNames <- c("","", "", "", "JLL", "GVAR", "", "", "", "", "","", "", "") +colnames(Tab) <- ColNames + +# Generate the table + kableExtra::kbl(Tab, align = "c", format = "latex", booktabs = TRUE, + caption = "Summary of model features", escape = FALSE) %>% + kableExtra::row_spec(0, bold = TRUE) %>% + kableExtra::add_header_above(c(" " = 2, "UR" = 1, "R" = 1, "UR" = 1, "R" = 2, " " = 1, " " = 1, + " " 
= 1, " " = 1, " " = 1, " " = 1, " " = 1, " " = 1)) %>% + kableExtra::add_header_above(c(" " = 2, "Single" = 2, "Joint" = 3, " " = 1, "Single" = 1, + "Joint" = 1, " " = 1, "P" = 1, "P and Q" = 1, " " = 1)) %>% kableExtra::add_header_above(c(" " = 2,"P-dynamics" = 5, " " = 1,"Q-dynamics" = 2, + " " = 1,"Sigma estimation" = 2, " " = 1,"Dom. Eco." = 1), + bold = TRUE) %>% +kableExtra::pack_rows("Unrestricted VAR", 1, 3) %>% +kableExtra::pack_rows("Restricted VAR (GVAR)", 4, 5) %>% +kableExtra::pack_rows("Restricted VAR (JLL)", 6, 8) %>% +kableExtra::kable_styling(font_size = 7, latex_options = "hold_position") +knitr::asis_output(" +\\vspace{-2.5em} +\\begin{center} +\\captionsetup{type=table} +\\caption*{\\footnotesize Note: Risk factor dynamics under the $\\mathbb{P}$-measure may follow either an unrestricted (UR) or a restricted (R) specification. The set of restrictions present in the JLL-based and GVAR-based models are described in \\cite{JotikasthiraLeLundblad2015} and \\cite{CandelonMoura2024}, respectively. The estimation of the $\\Sigma$ matrix is done either exclusively with the other parameters of the $\\mathbb{P}$-dynamics (\\textit{P} column) or jointly under both $\\mathbb{P}$- and $\\mathbb{Q}$-parameters (\\textit{P and Q} column). \\textit{Dom. Eco.} relates to the presence of a dominant economy. 
The entries featuring \\textit{x} indicate that the referred characteristic is part of the model.} +\\end{center} +") + + +## ----echo=TRUE---------------------------------------------------------------- +LoadData("CM_2024") + + +## ----echo=TRUE---------------------------------------------------------------- +ModelType <- "JPS original" +Economies <- c("Brazil", "Mexico", "Uruguay") +GlobalVar <- c("Gl_Eco_Act", "Gl_Inflation") +DomVar <- c("Eco_Act", "Inflation") +N <- 3 +t0 <- "01-07-2005" +tF <- "01-12-2019" +DataFreq <- "Monthly" +StatQ <- FALSE +Folder2Save <- NULL +OutputLabel <- "Model_demo" + + +## ----echo=TRUE---------------------------------------------------------------- +VARXtype <- "unconstrained" + + +## ----echo=TRUE---------------------------------------------------------------- +data('TradeFlows') +W_type <- "Sample Mean" +t_First_Wgvar <- "2000" +t_Last_Wgvar <- "2015" +DataConnectedness <- TradeFlows + + +## ----echo=TRUE---------------------------------------------------------------- +GVARlist <- list(VARXtype = "unconstrained", W_type = "Sample Mean", + t_First_Wgvar = "2000", t_Last_Wgvar = "2015", + DataConnectedness = TradeFlows) + + +## ----echo=TRUE---------------------------------------------------------------- +## Example for "JLL original" and "JLL joint Sigma" models +JLLlist <- list(DomUnit = "China") + +## For "JLL No DomUnit" model +JLLlist <- list(DomUnit = "None") + + +## ----echo = TRUE-------------------------------------------------------------- +BRWlist <- within(list(Cent_Measure = "Mean", gamma = 0.2, N_iter = 500, B = 50, + checkBRW = TRUE, B_check = 1000, Eigen_rest = 1), + N_burn <- round(N_iter * 0.15)) + + +## ----echo = TRUE-------------------------------------------------------------- +DesiredGraphs <- c("Fit", "GIRF", "GFEVD", "TermPremia") + + +## ----echo = TRUE-------------------------------------------------------------- +WishGraphRiskFac <- FALSE +WishGraphYields <- TRUE +WishOrthoJLLgraphs <- FALSE + + +## 
----echo = TRUE-------------------------------------------------------------- +Bootlist <- list(methodBS = 'block', BlockLength = 4, ndraws = 1000, pctg = 95) + + +## ----echo = TRUE-------------------------------------------------------------- +ForecastList <- list(ForHoriz = 12, t0Sample = 1, t0Forecast = 70, ForType = "Rolling") + + +## ----echo = TRUE-------------------------------------------------------------- +data('Yields') +w <- pca_weights_one_country(Yields, Economy = "Uruguay") + + +## ----pca-H, fig.height = 5, fig.cap="Yield loadings on the spanned factors. Example using bond yield data for Uruguay. Graph generated using the ggplot2 package [@ggplot22016].", include=knitr::is_html_output(), eval=knitr::is_html_output()---- +# +# LabSpaFac <- c("Level", "Slope", "Curvature") +# N <- length(LabSpaFac) +# +# mat <- c(0.25, 0.5, 1, 3, 5, 10) +# +# w_pca <- data.frame(t(w[1:N,])) +# colnames(w_pca) <- LabSpaFac +# w_pca$mat <- mat +# +# ## Prepare plots +# colors <- c("Level" = "#0072B2", "Slope" = "#009E73", "Curvature" = "#D55E00") +# +# g <- ggplot2::ggplot(data = w_pca, ggplot2::aes(x= mat)) + +# ggplot2::geom_line(ggplot2::aes(y = Level, color = "Level"), size = 0.7) + +# ggplot2::geom_line(ggplot2::aes(y = Slope, color = "Slope"), size = 0.7) + +# ggplot2::geom_line(ggplot2::aes(y = Curvature, color = "Curvature"), size = 0.7) + +# ggplot2::labs(color = "Legend") + ggplot2::scale_color_manual(values = colors) + ggplot2::theme_classic() + +# ggplot2::theme(legend.position="top", legend.title=ggplot2::element_blank(), legend.text= ggplot2::element_text(size=8) ) + +# ggplot2::xlab("Maturity (Years)") + ggplot2:: scale_y_continuous(name="Weights") + ggplot2::geom_hline(yintercept=0) +# +# print(g) + + +## ----pca-L, fig.height = 2.8, fig.width = 5, fig.cap="Yield loadings on the spanned factors. Example using bond yield data for Uruguay. 
Graph was generated using the ggplot2 package \\citep{ggplot22016}.", include=knitr::is_latex_output(), eval=knitr::is_latex_output()---- + +LabSpaFac <- c("Level", "Slope", "Curvature") +N <- length(LabSpaFac) + +mat <- c(0.25, 0.5, 1, 3, 5, 10) + +w_pca <- data.frame(t(w[1:N,])) +colnames(w_pca) <- LabSpaFac +w_pca$mat <- mat + +## Prepare plots +colors <- c("Level" = "#0072B2", "Slope" = "#009E73", "Curvature" = "#D55E00") + + ggplot2::ggplot(data = w_pca, ggplot2::aes(x= mat)) + + ggplot2::geom_line(ggplot2::aes(y = Level, color = "Level"), size = 0.7) + + ggplot2::geom_line(ggplot2::aes(y = Slope, color = "Slope"), size = 0.7) + + ggplot2::geom_line(ggplot2::aes(y = Curvature, color = "Curvature"), size = 0.7) + + ggplot2::labs(color = "Legend") + ggplot2::scale_color_manual(values = colors) + ggplot2::theme_classic() + + ggplot2::theme(legend.position="top", legend.title=ggplot2::element_blank(), legend.text= ggplot2::element_text(size=8) ) + + ggplot2::xlab("Maturity (Years)") + ggplot2:: scale_y_continuous(name="Weights") + ggplot2::geom_hline(yintercept=0) + + +## ----echo = TRUE-------------------------------------------------------------- +data('Yields') +Economies <- c("China", "Brazil", "Mexico", "Uruguay") +N <- 2 +SpaFact <- Spanned_Factors(Yields, Economies, N) + + +## ----echo = TRUE-------------------------------------------------------------- +## Example 1: "JPS global" and "JPS multi" models +data("RiskFacFull") +PdynPara <- VAR(RiskFacFull, VARtype = "unconstrained") + +## Example 2: "JPS original" model for China +FactorsChina <- RiskFacFull[1:7, ] +PdynPara <- VAR(FactorsChina, VARtype = "unconstrained") + + +## ----echo = TRUE-------------------------------------------------------------- +data("GVARFactors") + + +## ----echo = TRUE-------------------------------------------------------------- +data('GVARFactors') +GVARinputs <- list(Economies = Economies, GVARFactors = GVARFactors, + VARXtype ="constrained: Inflation") + + +## 
----------------------------------------------------------------------------- +data("TradeFlows") +t_First <- "2006" +t_Last <- "2019" +Economies <- c("China", "Brazil", "Mexico", "Uruguay") +type <- "Sample Mean" +W_gvar <- Transition_Matrix(t_First, t_Last, Economies, type, TradeFlows) + +round(W_gvar, digits= 4) + + +## ----echo = TRUE-------------------------------------------------------------- +data("GVARFactors") +GVARinputs <- list(Economies = Economies, GVARFactors = GVARFactors, + VARXtype = "unconstrained", Wgvar = W_gvar) +N <- 3 +GVARpara <- GVAR(GVARinputs, N, CheckInputs = TRUE) + + +## ----eval=FALSE, echo=TRUE---------------------------------------------------- +# ## First set the JLLinputs +# ModelType <- "JLL original" +# JLLinputs <- list(Economies = Economies, DomUnit = "China", WishSigmas = TRUE, +# SigmaNonOrtho = NULL, JLLModelType = ModelType) +# +# ## Then, estimate the desired the P-dynamics from the desired JLL model +# data("RiskFacFull") +# N <- 3 +# JLLpara <- JLL(RiskFacFull, N, JLLinputs, CheckInputs = TRUE) + + +## ----FullImpl, cache=FALSE, echo = TRUE--------------------------------------- +library(MultiATSM) +# 1) USER INPUTS +# A) Load database data +LoadData("CM_2024") + +# B) GENERAL model inputs +ModelType <- "JPS original" +Economies <- c("China", "Brazil") +GlobalVar <- c("Gl_Eco_Act") +DomVar <- c("Eco_Act") +N <- 2 +t0_sample <- "01-05-2005" +tF_sample <- "01-12-2019" +OutputLabel <- "Test" +DataFreq <-"Monthly" +Folder2Save <- NULL +StatQ <- FALSE + +# B.1) SPECIFIC model inputs +# GVAR-based models +GVARlist <- list( VARXtype = "unconstrained", W_type = "Sample Mean", t_First_Wgvar = "2005", + t_Last_Wgvar = "2019", DataConnectedness = TradeFlows ) + +# JLL-based models +JLLlist <- list(DomUnit = "China") + +# BRW inputs +WishBC <- FALSE +BRWlist <- within(list(Cent_Measure = "Mean", gamma = 0.05, N_iter = 250, B = 50, checkBRW = TRUE, + B_check = 1000, Eigen_rest = 1), N_burn <- round(N_iter * 0.15)) + +# C) Decide on 
Settings for numerical outputs +WishFPremia <- TRUE +FPmatLim <- c(60,120) + +Horiz <- 30 +DesiredGraphs <- c() +WishGraphRiskFac <- FALSE +WishGraphYields <- FALSE +WishOrthoJLLgraphs <- FALSE + +# D) Bootstrap settings +WishBootstrap <- TRUE +BootList <- list(methodBS = 'bs', BlockLength = 4, ndraws = 5, pctg = 95) + +# E) Out-of-sample forecast +WishForecast <- TRUE +ForecastList <- list(ForHoriz = 12, t0Sample = 1, t0Forecast = 162, ForType = "Rolling") + +########################################################################################## +# NO NEED TO MAKE CHANGES FROM HERE: +# The sections below automatically process the inputs provided above, run the model +# estimation, generate the numerical and graphical outputs, and save results. + +# 2) Minor preliminary work: get the sets of factor labels +FactorLabels <- LabFac(N, DomVar, GlobalVar, Economies, ModelType) + +# 3) Prepare the inputs of the likelihood function +ATSMInputs <- InputsForOpt(t0_sample, tF_sample, ModelType, Yields, GlobalMacro, + DomMacro, FactorLabels, Economies, DataFreq, GVARlist, + JLLlist, WishBC, BRWlist) + +# 4) Optimization of the ATSM (Point Estimates) +ModelParaList <- Optimization(ATSMInputs, StatQ, DataFreq, FactorLabels, Economies, ModelType) + +# 5) Numerical and graphical outputs +# a) Prepare list of inputs for graphs and numerical outputs +InputsForOutputs <- InputsForOutputs(ModelType, Horiz, DesiredGraphs, OutputLabel, StatQ, + DataFreq, WishGraphYields, WishGraphRiskFac, + WishOrthoJLLgraphs, WishFPremia, + FPmatLim, WishBootstrap, BootList, + WishForecast, ForecastList) + +# b) Fit, IRF, FEVD, GIRF, GFEVD, and Term Premia +NumericalOutputs <- NumOutputs(ModelType, ModelParaList, InputsForOutputs, + FactorLabels, Economies, Folder2Save) + +# c) Confidence intervals (bootstrap analysis) +BootstrapAnalysis <- Bootstrap(ModelType, ModelParaList, NumericalOutputs, Economies, + InputsForOutputs, FactorLabels, JLLlist, GVARlist, + WishBC, BRWlist, Folder2Save) + +# 6) 
Out-of-sample forecasting +Forecasts <- ForecastYields(ModelType, ModelParaList, InputsForOutputs, FactorLabels, + Economies, JLLlist, GVARlist, WishBC, BRWlist, + Folder2Save) + + +## ----FitYields, out.width="100%", fig.width = 6, fig.height = 4.5, fig.cap = if (knitr::is_html_output()) { knitr::asis_output("Chinese bond yield maturities with model fit comparisons. *Model-fit* reflects estimation using only risk-neutral ($\\mathbb{Q}$) dynamics parameters, while *Model-Implied* incorporates both physical ($\\mathbb{P}$) and risk-neutral ($\\mathbb{Q}$) dynamics. The $x$-axes represent time in months and the $y$-axis is in natural units.")} else{ knitr::asis_output("Chinese bond yield maturities with model fit comparisons. \\emph{Model-fit} reflects estimation using only risk-neutral ($\\mathbb{Q}$) dynamics parameters, while \\emph{Model-implied} incorporates both physical ($\\mathbb{P}$) and risk-neutral ($\\mathbb{Q}$) dynamics. The $x$-axes represent time in months and the $y$-axis is in natural units.")}---- +FitYields <- autoplot(NumericalOutputs, type = "Fit") +FitYields$China + + +## ----IRF, out.width="100%", fig.width = 6, fig.height = 4.5, fig.cap = "IRFs from the Brazilian bond yields to global economic activity. Size of the shock is one-standard deviation. The black lines are the point estimates. Gray dashed lines are the bounds of the 95% confidence intervals and the green lines correspond to the median of these intervals. The $x$-axes are expressed in months and the $y$-axis is in natural units."---- +IRFs_Graphs <- autoplot(BootstrapAnalysis, NumericalOutputs, type = "IRF_Yields_Boot") +IRFs_Graphs$Brazil$Gl_Eco_Act + + +## ----FEVD, out.width="100%", fig.width = 6, fig.height = 4.5, fig.cap = "FEVD from the Brazilian bond yield with maturity 60 months. 
The $x$-axis represents the forecast horizon in months and the $y$-axis is in natural units."---- +FEVDs_Graphs <- autoplot(NumericalOutputs, type = "FEVD_Yields") +FEVDs_Graphs$Brazil$Y60M_Brazil + + +## ----TermPremia, out.width="100%", fig.width = 6, fig.height = 4.5, fig.cap = "Chinese sovereign yield curve decomposition showing (i) expected future short rates and (ii) term premia components. The $x$-axis represents time in months and the $y$-axis is expressed in percentage points."---- +TP_Graphs <- autoplot(NumericalOutputs, type = "TermPremia") +TP_Graphs$China + + +## ----echo=TRUE---------------------------------------------------------------- +MacroData <- Load_Excel_Data(system.file("extdata", "MacroData.xlsx", + package = "MultiATSM")) +YieldsData <- Load_Excel_Data(system.file("extdata", "YieldsData.xlsx", + package = "MultiATSM")) + + +## ----echo=TRUE---------------------------------------------------------------- +ModelType <- "JPS original" +Initial_Date <- "2006-09-01" +Final_Date <- "2019-01-01" +DataFrequency <- "Monthly" +GlobalVar <- c("GBC", "VIX") +DomVar <- c("Eco_Act", "Inflation", "Com_Prices", "Exc_Rates") +N <- 3 +Economies <- c("China", "Mexico", "Uruguay", "Brazil", "Russia") + + +## ----echo=TRUE---------------------------------------------------------------- +FactorLabels <- LabFac(N, DomVar, GlobalVar, Economies, ModelType) +RiskFactorsSet <- DataForEstimation(Initial_Date, Final_Date, Economies, N, FactorLabels, + ModelType, DataFrequency, MacroData, YieldsData) + + +## ----echo=TRUE---------------------------------------------------------------- +data("TradeFlows") +t_First <- "2006" +t_Last <- "2019" +Economies <- c("China", "Brazil", "Mexico", "Uruguay") +type <- "Sample Mean" +W_gvar <- Transition_Matrix(t_First, t_Last, Economies, type, TradeFlows) + + +## ----echo=TRUE---------------------------------------------------------------- + WishFPremia <- TRUE + FPmatLim <- c(60, 120) + + +## 
----echo=TRUE---------------------------------------------------------------- +# 1) INPUTS +# A) Load database data +LoadData("BR_2017") + +# B) GENERAL model inputs +ModelType <- "JPS original" + +Economies <- c("US") +GlobalVar <- c() +DomVar <- c("GRO", "INF") +N <- 3 +t0_sample <- "January-1985" +tF_sample <- "December-2007" +DataFreq <- "Monthly" +StatQ <- FALSE + +# 2) Minor preliminary work +FactorLabels <- LabFac(N, DomVar, GlobalVar, Economies, ModelType) +Yields <- t(BR_jps_out$Y) +DomesticMacroVar <- t(BR_jps_out$M.o) +GlobalMacroVar <- c() + +# 3) Prepare the inputs of the likelihood function +ATSMInputs <- InputsForOpt(t0_sample, tF_sample, ModelType, Yields, GlobalMacroVar, + DomesticMacroVar, FactorLabels, Economies, DataFreq) + +# 4) Optimization of the model +ModelPara <- Optimization(ATSMInputs, StatQ, DataFreq, FactorLabels, Economies, ModelType) + + +## ----QdynTab-H, eval = knitr::is_html_output(), layout = "l-body-outset"------ +# options(scipen = 100) +# options(scipen = 100) +# +# RowsQ <- c("$r_0$", "$\\lambda_1$", "$\\lambda_2$", "$\\lambda_3$") +# TableQ <- data.frame(matrix(NA, ncol = 0, nrow = length(RowsQ))) +# row.names(TableQ) <- RowsQ +# +# PackageQ <- c( +# ModelPara$`JPS original`$US$ModEst$Q$r0, +# diag(ModelPara$`JPS original`$US$ModEst$Q$K1XQ) +# ) +# BRq <- c( +# BR_jps_out$est.llk$rho0.cP, +# diag(BR_jps_out$est.llk$KQ.XX) +# ) +# +# TableQ$MultiATSM <- PackageQ +# TableQ$'BR (2017)' <- BRq +# +# # Function for consistent width and right alignment in HTML +# format_html_num <- function(x, digits = 4) { +# fmt <- formatC(x, format = "f", digits = digits) +# fmt <- gsub("-", "−", fmt, fixed = TRUE) # replace hyphen with Unicode minus +# # wrap in right-aligned span to preserve your table's theme +# paste0('', fmt, '') +# } +# +# TableQ_fmt <- TableQ +# TableQ_fmt[] <- lapply(TableQ_fmt, function(col) { +# if (is.numeric(col)) format_html_num(col) else col +# }) +# +# library(kableExtra) +# library(magrittr) +# +# 
kbl(TableQ_fmt, align = "c", caption = "$Q$-dynamics parameters", escape = FALSE) %>% +# kable_classic("striped", full_width = FALSE) %>% +# row_spec(0, font_size = 14) %>% +# footnote( +# general = "λ's are the eigenvalues from the risk-neutral feedback matrix and r₀ is the long-run mean of the short rate under Q." +# ) + + +## ----QdynTab-L, eval = knitr::is_latex_output()------------------------------- +options(scipen = 100) # eliminate scientific notation + +RowsQ <- c("$r_0$", "$\\lambda_1$", "$\\lambda_2$", "$\\lambda_3$") +TableQ <- data.frame(matrix(NA, ncol = 0, nrow = length(RowsQ))) +row.names(TableQ) <- RowsQ + +PackageQ <- c(ModelPara$`JPS original`$US$ModEst$Q$r0, diag(ModelPara$`JPS original`$US$ModEst$Q$K1XQ)) +BRq <- c(BR_jps_out$est.llk$rho0.cP, diag(BR_jps_out$est.llk$KQ.XX)) +TableQ$MultiATSM <- PackageQ +TableQ$'BR (2017)' <- BRq + +TableQ <- round(TableQ, digits = 4) + +# Ensure that numbers in the table are actual numerical values. This is necessary to ensure that negative signs show up as dashes rather than hyphens. 
+TableQ <- as.data.frame(TableQ) +TableQ[] <- lapply(TableQ, function(col) { + if (all(suppressWarnings(!is.na(as.numeric(col))))) { + paste0("$", col, "$") + } else { + col + } +}) + + +format_latex_num <- function(x, digits = 4) { + # Round and create a fixed-width string + fmt <- formatC(x, format = "f", digits = digits, width = digits + 3) + + # Replace normal space padding with phantom zeros for alignment in LaTeX + fmt <- gsub(" ", "\\\\phantom{0}", fmt) + + # Replace minus sign with LaTeX proper math minus + fmt <- gsub("-", "\\\\text{-}", fmt) + + paste0("$", fmt, "$") +} + +# Apply formatting only to numeric columns +TableQ_fmt <- TableQ +TableQ_fmt[] <- lapply(TableQ_fmt, function(col) { + if (is.numeric(col)) format_latex_num(col) else col +}) + +library(kableExtra) + +kable( + TableQ_fmt, + format = "latex", + booktabs = TRUE, + escape = FALSE, + align = "r", + caption = "$Q$-dynamics parameters" +) %>% + kable_styling(font_size = 7, latex_options = "hold_position") +knitr::asis_output(" +\\vspace{-2.0em} +\\begin{center} +\\footnotesize Note: $\\lambda$'s are the eigenvalues from the risk-neutral feedback matrix and $r_0$ is the long-run mean of the short rate under $\\mathbb{Q}$. 
+\\end{center} +") + + +## ----PdynTab-H, eval = knitr::is_html_output(), layout = "l-body-outset"------ +# +# RowsP <- c("PC1", "PC2", "PC3", "GRO", "INF") +# ColP <- c(" ", RowsP) +# +# # 1) K0Z and K1Z +# # Bauer and Rudebusch coefficients +# TablePbr <- data.frame(matrix(NA, ncol = length(ColP), nrow = length(RowsP))) +# row.names(TablePbr) <- RowsP +# colnames(TablePbr) <- ColP +# +# TablePbr[[ColP[1]]] <- BR_jps_out$est.llk$KP.0Z +# for (j in seq_along(RowsP)) { +# TablePbr[[RowsP[j]]] <- BR_jps_out$est.llk$KP.ZZ[, j] +# } +# +# TablePbr <- round(TablePbr, digits = 4) +# +# # MultiATSM coefficients +# TablePMultiATSM <- data.frame(matrix(NA, ncol = length(ColP), nrow = length(RowsP))) +# row.names(TablePMultiATSM) <- RowsP +# colnames(TablePMultiATSM) <- ColP +# +# PP <- BR_jps_out$W[1:N, ] %*% Yields +# ZZ <- rbind(PP, DomesticMacroVar) +# Pdyncoef <- VAR(ZZ, "unconstrained") +# +# TablePMultiATSM[[ColP[1]]] <- Pdyncoef$K0Z +# for (j in seq_along(RowsP)) { +# TablePMultiATSM[[RowsP[j]]] <- Pdyncoef$K1Z[, j] +# } +# +# TablePMultiATSM <- round(TablePMultiATSM, digits = 4) +# +# # Combine both tables +# TableP <- rbind(TablePbr, TablePMultiATSM) +# row.names(TableP) <- c(RowsP, paste0(RowsP, " ")) +# +# # ---- Formatting for HTML ---- +# # Same right-aligned CSS approach as in Q-table +# format_html_num <- function(x, digits = 4) { +# fmt <- formatC(x, format = "f", digits = digits) +# fmt <- gsub("-", "−", fmt, fixed = TRUE) # Unicode minus +# paste0('', fmt, '') +# } +# +# TableP_fmt <- TableP +# TableP_fmt[] <- lapply(TableP_fmt, function(col) { +# if (is.numeric(col)) format_html_num(col) else col +# }) +# +# library(kableExtra) +# library(magrittr) +# +# kbl(TableP_fmt, align = "c", caption = "$P$-dynamics parameters", escape = FALSE) %>% +# kable_classic("striped", full_width = FALSE) %>% +# row_spec(0, font_size = 14) %>% +# add_header_above(c(" " = 1, "K0Z" = 1, "K1Z" = 5), bold = TRUE) %>% +# pack_rows("BR (2017)", 1, 5) %>% +# pack_rows("MultiATSM", 
6, 10) %>% +# footnote( +# general = "$K0Z$ is the intercept and $K1Z$ is the feedback matrix from the $P$-dynamics." +# ) +# + + +## ----PdynTab-L, eval = knitr::is_latex_output()------------------------------- + +RowsP <- c("PC1", "PC2", "PC3", "GRO", "INF") +ColP <- c(" ", RowsP) + +# --- 1) K0Z and K1Z : Bauer and Rudebusch coefficients --- +TablePbr <- data.frame(matrix(NA, ncol = length(ColP), nrow = length(RowsP))) +row.names(TablePbr) <- RowsP +colnames(TablePbr) <- ColP + +TablePbr[[ColP[1]]] <- BR_jps_out$est.llk$KP.0Z +for (j in seq_along(RowsP)) { + TablePbr[[RowsP[j]]] <- BR_jps_out$est.llk$KP.ZZ[, j] +} +TablePbr <- round(TablePbr, digits = 4) + +# --- 2) MultiATSM coefficients --- +TablePMultiATSM <- data.frame(matrix(NA, ncol = length(ColP), nrow = length(RowsP))) +row.names(TablePMultiATSM) <- RowsP +colnames(TablePMultiATSM) <- ColP + +PP <- BR_jps_out$W[1:N, ] %*% Yields +ZZ <- rbind(PP, DomesticMacroVar) +Pdyncoef <- VAR(ZZ, "unconstrained") + +TablePMultiATSM[[ColP[1]]] <- Pdyncoef$K0Z +for (j in seq_along(RowsP)) { + TablePMultiATSM[[RowsP[j]]] <- Pdyncoef$K1Z[, j] +} +TablePMultiATSM <- round(TablePMultiATSM, digits = 4) + +# --- 3) Combine and label --- +TableP <- rbind(TablePbr, TablePMultiATSM) +row.names(TableP) <- c(RowsP, paste0(RowsP, " ")) # avoid duplicate names + +# --- 4) Format numeric cells for LaTeX alignment --- +format_latex_num <- function(x, digits = 4) { + fmt <- formatC(x, format = "f", digits = digits, width = digits + 4) + fmt <- gsub(" ", "\\\\phantom{0}", fmt) # pad spaces with phantom zeros + fmt <- gsub("-", "\\\\text{-}", fmt) # proper minus sign + paste0("$", fmt, "$") +} + +TableP_fmt <- TableP +TableP_fmt[] <- lapply(TableP_fmt, function(col) { + if (is.numeric(col)) format_latex_num(col) else col +}) + +library(kableExtra) +library(magrittr) + +kable(TableP_fmt, align = "c", format = "latex", booktabs = TRUE, escape = FALSE, + caption = "$P$-dynamics parameters") %>% + kable_styling(latex_options = 
"hold_position", font_size = 7) %>% + add_header_above(c(" " = 1, "K0Z" = 1, "K1Z" = 5), bold = TRUE) %>% + pack_rows("BR (2017)", 1, 5) %>% + pack_rows("MultiATSM", 6, 10) %>% + footnote( + general = "$K0Z$ is the intercept and $K1Z$ is the feedback matrix from the $P$-dynamics.", + escape = FALSE + ) + + +## ----eval=FALSE, echo=TRUE---------------------------------------------------- +# # 1) INPUTS +# # A) Load database data +# LoadData("CM_2024") +# +# # B) GENERAL model inputs +# ModelType <- "GVAR multi" +# Economies <- c("China", "Brazil", "Mexico", "Uruguay") +# GlobalVar <- c("Gl_Eco_Act", "Gl_Inflation") +# DomVar <- c("Eco_Act", "Inflation") +# N <- 3 +# t0_sample <- "01-06-2004" +# tF_sample <- "01-01-2020" +# OutputLabel <- "CM_jfec" +# DataFreq <-"Monthly" +# StatQ <- FALSE +# +# # B.1) SPECIFIC model inputs +# # GVAR-based models +# GVARlist <- list( VARXtype = "unconstrained", W_type = "Sample Mean", t_First_Wgvar = "2004", +# t_Last_Wgvar = "2019", DataConnectedness = TradeFlows ) +# +# # JLL-based models +# JLLlist <- list(DomUnit = "China") +# +# # BRW inputs +# WishBC <- TRUE +# BRWlist <- within(list(Cent_Measure = "Mean", gamma = 0.001, N_iter = 200, B = 50, checkBRW = TRUE, +# B_check = 1000, Eigen_rest = 1), N_burn <- round(N_iter * 0.15)) +# +# # C) Decide on Settings for numerical outputs +# WishFPremia <- TRUE +# FPmatLim <- c(24,36) +# +# Horiz <- 25 +# DesiredGraphs <- c("GIRF", "GFEVD", "TermPremia") +# WishGraphRiskFac <- FALSE +# WishGraphYields <- TRUE +# WishOrthoJLLgraphs <- TRUE +# +# # D) Bootstrap settings +# WishBootstrap <- FALSE +# BootList <- list(methodBS = 'bs', BlockLength = 4, ndraws = 1000, pctg = 95) +# +# # E) Out-of-sample forecast +# WishForecast <- TRUE +# ForecastList <- list(ForHoriz = 12, t0Sample = 1, t0Forecast = 100, ForType = "Rolling") +# +# # 2) Minor preliminary work: get the sets of factor labels and a vector of common maturities +# FactorLabels <- LabFac(N, DomVar, GlobalVar, Economies, ModelType) +# 
+# # 3) Prepare the inputs of the likelihood function +# ATSMInputs <- InputsForOpt(t0_sample, tF_sample, ModelType, Yields, GlobalMacro, +# DomMacro, FactorLabels, Economies, DataFreq, +# GVARlist, JLLlist, WishBC, BRWlist) +# +# # 4) Optimization of the ATSM (Point Estimates) +# ModelParaList <- Optimization(ATSMInputs, StatQ, DataFreq, FactorLabels, Economies, ModelType) +# +# # 5) Numerical and graphical outputs +# # a) Prepare list of inputs for graphs and numerical outputs +# InputsForOutputs <- InputsForOutputs(ModelType, Horiz, DesiredGraphs, OutputLabel, StatQ, +# DataFreq, WishGraphYields, WishGraphRiskFac, +# WishOrthoJLLgraphs, WishFPremia, FPmatLim, +# WishBootstrap, BootList, WishForecast, +# ForecastList) +# +# # b) Fit, IRF, FEVD, GIRF, GFEVD, and Term Premia +# NumericalOutputs <- NumOutputs(ModelType, ModelParaList, InputsForOutputs, +# FactorLabels, Economies) +# +# # c) Confidence intervals (bootstrap analysis) +# BootstrapAnalysis <- Bootstrap(ModelType, ModelParaList, NumericalOutputs, Economies, +# InputsForOutputs, FactorLabels, JLLlist, GVARlist, +# WishBC, BRWlist) +# +# # 6) Out-of-sample forecasting +# Forecasts <- ForecastYields(ModelType, ModelParaList, InputsForOutputs, FactorLabels, +# Economies, JLLlist, GVARlist, WishBC, BRWlist) + + +## ----eval=FALSE, echo=TRUE---------------------------------------------------- +# # 1) INPUTS +# # A) Load database data +# LoadData("CM_2023") +# +# # B) GENERAL model inputs +# ModelType <- "GVAR multi" +# Economies <- c("Brazil", "India", "Russia", "Mexico") +# GlobalVar <- c("US_Output_growth", "China_Output_growth", "SP500") +# DomVar <- c("Inflation","Output_growth", "CDS", "COVID") +# N <- 2 +# t0_sample <- "22-03-2020" +# tF_sample <- "26-09-2021" +# OutputLabel <- "CM_EM" +# DataFreq <-"Weekly" +# StatQ <- FALSE +# +# # B.1) SPECIFIC model inputs +# # GVAR-based models +# GVARlist <- list(VARXtype = "constrained: COVID", W_type = "Sample Mean", +# t_First_Wgvar = "2015", t_Last_Wgvar = 
"2020", +# DataConnectedness = TradeFlows_covid) +# +# # BRW inputs +# WishBC <- FALSE +# +# # C) Decide on Settings for numerical outputs +# WishFPremia <- TRUE +# FPmatLim <- c(47,48) +# +# Horiz <- 12 +# DesiredGraphs <- c("GIRF", "GFEVD", "TermPremia") +# WishGraphRiskFac <- FALSE +# WishGraphYields <- TRUE +# WishOrthoJLLgraphs <- FALSE +# +# # D) Bootstrap settings +# WishBootstrap <- TRUE +# BootList <- list(methodBS = 'bs', BlockLength = 4, ndraws = 100, pctg = 95) +# +# # 2) Minor preliminary work: get the sets of factor labels and a vector of common maturities +# FactorLabels <- LabFac(N, DomVar, GlobalVar, Economies, ModelType) +# +# # 3) Prepare the inputs of the likelihood function +# ATSMInputs <- InputsForOpt(t0_sample, tF_sample, ModelType, Yields_covid, GlobalMacro_covid, +# DomMacro_covid, FactorLabels, Economies, DataFreq, GVARlist) +# +# # 4) Optimization of the ATSM (Point Estimates) +# ModelParaList <- Optimization(ATSMInputs, StatQ, DataFreq, FactorLabels, Economies, ModelType) +# +# # 5) Numerical and graphical outputs +# # a) Prepare list of inputs for graphs and numerical outputs +# InputsForOutputs <- InputsForOutputs(ModelType, Horiz, DesiredGraphs, OutputLabel, StatQ, +# DataFreq, WishGraphYields, WishGraphRiskFac, +# WishOrthoJLLgraphs, WishFPremia, FPmatLim, +# WishBootstrap, BootList) +# +# # b) Fit, IRF, FEVD, GIRF, GFEVD, and Term Premia +# NumericalOutputs <- NumOutputs(ModelType, ModelParaList, InputsForOutputs, FactorLabels, +# Economies) +# +# # c) Confidence intervals (bootstrap analysis) +# BootstrapAnalysis <- Bootstrap(ModelType, ModelParaList, NumericalOutputs, Economies, +# InputsForOutputs, FactorLabels, +# JLLlist = NULL, GVARlist) + diff --git a/_articles/RJ-2025-044/RJ-2025-044.Rmd b/_articles/RJ-2025-044/RJ-2025-044.Rmd new file mode 100644 index 0000000000..ebfe6fcbaf --- /dev/null +++ b/_articles/RJ-2025-044/RJ-2025-044.Rmd @@ -0,0 +1,1589 @@ +--- +title: 'MultiATSM: An R Package for Arbitrage-Free Macrofinance 
Multicountry Affine + Term Structure Models' +date: '2026-02-10' +abstract: | + The MultiATSM package provides estimation tools and a wide range of outputs for eight macrofinance affine term structure model (ATSM) classes, supporting practitioners, academics, and policymakers. All models extend the single-country framework of Joslin et al. (2014) to multicountry settings, with additional adaptations from Jotikasthira et al. (2015) and Candelon and Moura (2024). These model extensions incorporate, respectively, the presence of a dominant (global) economy and adopt a global vector autoregressive (GVAR) setup to capture the joint dynamics of risk factors. The package generates diverse outputs for each ATSM, including graphical representations of model fit, risk premia, impulse response functions, and forecast error variance decompositions. It also implements bootstrap methods for confidence intervals and produces bond yield forecasts. +draft: no +author: +- name: Rubens Moura + affiliation: Banco de Mexico + address: + - Avenida 5 de Mayo, 2 + - Mexico City, Mexico + orcid: 0000-0001-8105-4729 + email: rubens.guimaraes@banxico.org.mx +type: package +output: + rjtools::rjournal_web_article: + self_contained: yes + toc: no + mathjax: https://cdn.jsdelivr.net/npm/mathjax@4/tex-mml-chtml.js + rjtools::rjournal_pdf_article: + toc: no +bibliography: References.bib +date_received: '2025-06-10' +volume: 17 +issue: 4 +slug: RJ-2025-044 +journal: + lastpage: 304 + firstpage: 275 + +--- + + +```{r setup, include=FALSE} +knitr::opts_chunk$set(echo = FALSE, warning = FALSE, message = FALSE) +library(ggplot2) +library(MultiATSM) +library(kableExtra) +library(magrittr) +``` + +# Introduction +The term structure of interest rates (or yield curve) describes the relationship between bond yields and investment maturities. As @Piazzesi2010 emphasizes, understanding its dynamics is essential for several reasons. 
First, long-term yields incorporate market expectations of future short-term rates, making the yield curve a useful forecasting tool for macroeconomic aggregates such as output and inflation, thereby supporting economic agents' consumption-saving and capital allocation decisions. Second, it plays a key role in the transmission of monetary policy, linking short-term policy rates to long-term borrowing costs. Third, it guides fiscal authorities in shaping debt maturities to balance refinancing risk and interest rate exposure. Fourth, it is essential for pricing and hedging interest rate derivatives, which rely on accurate yield curve modelling. + +Affine Term Structure Models (ATSMs) are the workhorse of yield curve modelling. Based on the assumption of no arbitrage, ATSMs offer a flexible framework to assess how investors price risks and generate predictions for the price of any bond (see @Piazzesi2010; @GurkaynakWright2012 for comprehensive reviews). Early ATSMs gained popularity for their ability to capture nearly all term structure fluctuations, appealing to both academics and practitioners [@Vasicek1977; @DuffieKan1996; @DaiSingleton2002]. While these models produce accurate statistical descriptions of the yield curve, they are silent on the deeper economic determinants that policymakers require for causal inference. + +In response to this limitation, a large body of research has emerged to explore the interplay between the term structure and macroeconomic developments (seminal contributions include @AngPiazzesi2003 and @RudebuschWu2008). A prominent contribution in this area is the unspanned economic risk framework developed by @JoslinPriebschSingleton2014 (henceforth JPS, 2014). In essence, this model assumes an arbitrage-free bond market and considers a linear state space representation to describe the dynamics of the yield curve.
Compared to earlier macrofinance ATSMs, JPS (2014) offers a tractable estimation approach that integrates traditional yield curve factors (spanned factors) with macroeconomic variables (unspanned factors). As a result, the model delivers a strong cross-sectional fit while explicitly linking bond yield responses to the state of the economy. + +The work of JPS (2014) lays the foundational framework for the modelling tools included in the \CRANpkg{MultiATSM} package [@MultiATSM2025]. In addition to the original single-country setup proposed by JPS (2014), the package incorporates multicountry extensions developed by @JotikasthiraLeLundblad2015 (henceforth JLL, 2015) and @CandelonMoura2024 (henceforth CM, 2024). Altogether, the package offers functions to build eight types of ATSMs, covering the original versions and several variants of these three frameworks. + +Beyond complete routines for model estimation, \CRANpkg{MultiATSM} produces a wide range of analytical outputs. In particular, it generates graphical representations such as model-implied bond yields, bond risk premia, and both orthogonalized and generalized versions of: *(i)* impulse response functions, and *(ii)* forecast error variance decompositions for yields and risk factors. Confidence intervals for the two latter outputs can be computed using three bootstrap methods: residual-based, block, or wild bootstrap. Moreover, the package supports out-of-sample forecasting of bond yields across the maturity spectrum. This paper provides detailed guidance on how to use the \CRANpkg{MultiATSM} package effectively. + +There are a few notable packages for term structure modelling in the R programming environment. \CRANpkg{YieldCurve} [@YieldCurve2015] and \CRANpkg{fBonds} [@fBonds2017] provide a collection of functions to build term structures based on the frameworks of @NelsonSiegel1987 and @Svensson1994. 
These yield curve methods have gained popularity for their parsimonious parameterization and good empirical fit. However, these models do not rule out arbitrage opportunities, a limitation addressed by ATSMs. Moreover, the focus of \CRANpkg{YieldCurve} and \CRANpkg{fBonds} is restricted to parameter estimation and yield curve fitting, without offering additional model outputs such as those provided by \CRANpkg{MultiATSM}. + +Several other R packages support time series modelling [@HyndmanKillick2025], particularly within state space and vector autoregressive (VAR) frameworks. State space packages are relatively few and tend to focus on either estimation, \CRANpkg{statespacer} [@statespacer2023], or simulation, \CRANpkg{simStateSpace} [@simStateSpace2025]. VAR-based tools are more numerous. For instance, \CRANpkg{vars} [@vars2024] and \CRANpkg{MTS} [@MTS2022] provide extensive functionality for estimation, diagnostics, and forecasting, while \CRANpkg{svars} [@svars2023] adds structural identification methods. High-dimensional VARs are handled by packages like \CRANpkg{bigtime} [@bigtime2023] and \CRANpkg{BigVAR} [@BigVAR2025], and cross-country spillovers are modeled by \CRANpkg{Spillover} [@Spillover2024] and \CRANpkg{BGVAR} [@BGVAR2024]. + +Although these tools share some features with \CRANpkg{MultiATSM}, they are tailored to standard state space or VAR analysis. In contrast, \CRANpkg{MultiATSM} embeds VAR dynamics within a state space representation that is explicitly grounded in arbitrage-free asset pricing theory. As such, \CRANpkg{MultiATSM} fills a specific gap in the R ecosystem by combining the structure of ATSMs with the flexibility of modern time series tools. + +The remainder of the paper is organized as follows. Section [2](#S:ATSMTheory) outlines the theoretical foundations of the ATSMs implemented in the \CRANpkg{MultiATSM} package, and Section [3](#S:ATSMoptions) details each model’s features. 
The subsequent sections focus on the practical implementation of ATSMs. Section [4](#S:SectionData) presents the dataset included in the package. Section [5](#S:SectionInputs) explains the user inputs required for model estimation. Section [6](#S:SectionEstimation) details the estimation procedure, and Section [7](#S:SectionImplementation) shows how to estimate ATSMs from scratch using \CRANpkg{MultiATSM}. Replications of published academic studies are provided in the Appendix. + +# ATSMs with unspanned economic risks: theoretical background {#S:ATSMTheory} +In this section, I outline several arbitrage-free ATSMs with unspanned macroeconomic risks available in the \CRANpkg{MultiATSM} package. A key appealing feature of these setups is their ability to disentangle the yield curve into a cross-sectional component, governed by the risk-neutral ($\mathbb{Q}$) dynamics, and a time-series component, driven by the physical ($\mathbb{P}$) dynamics. In light of this characteristic of the models, I present the single and the multicountry $\mathbb{Q}$-dynamics model dimensions in Section [2.1](#S:Qdyn). Next, I describe the specific features of the risk factor dynamics under the $\mathbb{P}$-measure for the various restricted and unrestricted VAR settings in Section [2.2](#S:Pdyn). Section [2.3](#S:ATSMestimation) describes the model estimation procedures. + + +## Model cross-sectional dimension (Q-dynamics) {#S:Qdyn} +### Single-country specifications (individual Q-dynamics model classes) +The model cross-sectional structure is based on two central equations.
The first one assumes that the country $i$ short-term interest rate at time $t$, $r_{i,t}$, is an affine function of $N$ unobserved (latent) country-specific factors, $\boldsymbol{X_{i,t}}$: +\begin{equation} +\underset{(1 \times 1)}{\vphantom{\Big|} +r_{i,t}} = +\underset{(1 \times 1)}{ +\vphantom{\Big|} +\delta_{i,0}} + +\underset{(1 \times N)}{% +\vphantom{\Big|} +\boldsymbol{\delta}_{i,1}^{\top}} +\underset{(N \times 1)}{% +\vphantom{\Big|} +\boldsymbol{X}_{i,t}}\text{,} +(\#eq:ShortRate) +\end{equation} +where $\delta_{i,0}$ and $\boldsymbol{\delta_{i,1}}$ are time-invariant parameters. + +The second equation assumes that the unobserved factor dynamics for each country $i$ follow a maximally flexible, first-order, $N$-dimensional multivariate Gaussian ($\mathcal{N}$) VAR model under the $\mathbb{Q}$-measure: + +\begin{align} +& \underset{(N \times 1)}{\boldsymbol{\vphantom{\Big|} +X_{i,t}}} = +\underset{(N \times 1)}{\boldsymbol{\vphantom{\Big|} +\mu^{Q}_{i,X}}} + +\underset{(N \times N)}{\vphantom{\Big|} +\Phi^{Q}_{i,X}} +\underset{(N \times 1)}{\boldsymbol{\vphantom{\Big|} +X_{i,t-1}}} + +\underset{(N \times N)}{\vphantom{\Big|} +\Gamma_{i,X}} +\underset{(N \times 1)}{\boldsymbol{\vphantom{\Big|} +\varepsilon_{i,t}^{Q}}}\text{,} + & \boldsymbol{\varepsilon_{i,t}^{Q}}\sim {\mathcal{N}_N}(\boldsymbol{0}_N,\mathrm{I}_N)\text{,} + (\#eq:VARQ) +\end{align} +where $\boldsymbol{\mu^{Q}_{i,X}}$ contains the intercepts; $\Phi^{Q}_{i,X}$ is the feedback matrix; and $\Gamma_{i,X}$ is a lower triangular matrix.
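
To fix ideas, the following minimal R sketch simulates these two equations for a single country. All parameter values are illustrative and chosen only for demonstration; they are not estimates produced by the package.

```r
# Illustrative simulation of the short rate and the latent factor VAR under Q
# for one country. The JSZ identification restrictions discussed below set
# mu^Q = 0, delta_1 = 1_N, and a diagonal Phi^Q; numbers here are made up.
set.seed(123)
N <- 3; Tobs <- 200
delta0 <- 0.01
delta1 <- rep(1, N)                      # N-vector of ones
PhiQ   <- diag(c(0.98, 0.95, 0.90))      # diagonal Q feedback matrix
Gamma  <- diag(0.001, N)                 # lower-triangular shock loadings

X <- matrix(0, Tobs, N)                  # latent factors X_{i,t}
for (t in 2:Tobs) {
  X[t, ] <- PhiQ %*% X[t - 1, ] + Gamma %*% rnorm(N)
}
r <- delta0 + X %*% delta1               # short rate r_{i,t}
```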
+ +Based on Equations \@ref(eq:ShortRate) and \@ref(eq:VARQ), @DaiSingleton2000 show that the country-specific zero-coupon bond yield with maturity of $n$ periods, $y_{i,t}^{(n)}$, is affine in $\boldsymbol{X_{i,t}}$: +\begin{equation} +\underset{(1 \times 1)}{\vphantom{\Big|} +y_{i,t}^{(n)}} = +\underset{(1 \times 1)}{\vphantom{\Big|} +a_{i,n}(\Theta_{n})} + +\underset{(1 \times N)}{\vphantom{\Big|} +\boldsymbol{b_{i,n}(\Theta_{n})}^\top} +\underset{(N \times 1)}{\vphantom{\Big|} +\boldsymbol{X_{i,t}}}\text{,} + (\#eq:AffineYieldsScalar) +\end{equation} +where $a_{i,n}(\Theta _{n})$ and $\boldsymbol{b_{i,n}(\Theta _{n})}$ are constrained to eliminate arbitrage opportunities within this bond market, as dictated by the well-known Riccati equations.^[Specifically, the referred loadings are $a_{i,n+1}(\Theta _{n+1}) = a_{i,n}(\Theta _{n}) + b_{i,n}(\Theta _{n}) \mu^{Q}_{i,X} + \frac{1}{2} b_{i,n}(\Theta _{n}) \Gamma_{i,X} \Gamma_{i,X}' b_{i,n}(\Theta _{n})' - \delta_{i,0}$ and $b_{i, n+1}=b_{i,n}\Phi^{Q}_{i,X} - \delta_{i,1}$, considering that the boundary conditions are $a_{i,1}(\Theta _1)=-\delta_{i,0}$ and $b_{i,1}(\Theta_1)=-\delta_{i,1}$. These expressions assume that the Radon–Nikodym derivative, which maps the risk-neutral measure to the physical measure, follows a log-normal process, and that the market price of risk is time-varying and affine in $X_t$. See @AngPiazzesi2003 for a detailed derivation of these expressions.] For notational simplicity, we collect $J$ bond yields into the vector $\boldsymbol{Y_{i,t}}=[y_{i,t}^{(1)}, y_{i,t}^{(2)},...,y_{i,t}^{(J)}]^\top$, the $J$ intercepts into $\boldsymbol{A_X(\Theta_i)}=[a_{i,1}(\Theta _{1}), a_{i,2}(\Theta _{2}) ,...,a_{i,J}(\Theta _{J})]^\top$ $\in \mathbb{R}^J$, and the $N$ slope coefficients into a $J \times N$ matrix $B_X(\Theta_i)=[\boldsymbol{b_{i,1}(\Theta _{1})}^\top, \boldsymbol{b_{i,2}(\Theta _{2})}^\top, ...,\boldsymbol{b_{i,J}(\Theta _{J})}^\top]^\top$. 
Accordingly, the yield curve cross-section dimension of country $i$ is: +\begin{equation} + \underset{(J \times 1)}{\vphantom{\Big|} + \boldsymbol{Y_{i,t}}} = + \underset{(J \times 1)}{\vphantom{\Big|} + \boldsymbol{A_X(\Theta_i)}} + + \underset{(J \times N)}{\vphantom{\Big|} + B_X(\Theta_i)} + \underset{(N \times 1)}{\vphantom{\Big|} + \boldsymbol{X_{i,t}}}\text{.} +(\#eq:AffineYieldsVector) +\end{equation} + +It follows from Equations \@ref(eq:ShortRate) and \@ref(eq:VARQ) that the parameter set $\Theta_i =\{\boldsymbol{\mu^Q_{i,X}},\Phi^Q_{i,X}, \Gamma_{i,X}, \delta_{i,0}, \boldsymbol{\delta_{i,1}}\}$ fully characterizes the cross-section of country $i$'s term structure. Importantly, @DaiSingleton2000 demonstrate that this system is not identified without additional restrictions, since $\boldsymbol{X_{i,t}}$ and any invertible affine transformation of $\boldsymbol{X_{i,t}}$ yield observationally equivalent representations. To circumvent this problem, JPS (2014) adopt the three sets of (minimal) restrictions proposed by @JoslinSingletonZhu2011. First, they restrict the latent factors to be zero-mean processes, forcing $\boldsymbol{\mu^{Q}_{i,X}}= \boldsymbol{0}_N$. Second, they choose $\boldsymbol{\delta_{i,1}}$ to be an $N$-dimensional vector whose entries are all equal to one. Lastly, $\Phi^Q_{i,X}$ is restricted to be a diagonal matrix whose elements are the real and distinct eigenvalues, $\lambda^Q_i$, of the risk-neutral feedback matrix. Based on this restriction set, @JoslinSingletonZhu2011 show that no additional invariant rotation is possible. + +@JoslinSingletonZhu2011 also show that a rotation from $\boldsymbol{X_{i,t}}$ to portfolios of yields, the spanned factors $\boldsymbol{P_{i,t}}$, leads to an observationally equivalent model representation. This invariant transformation implies that $N$ portfolios of yields are perfectly priced and observed without errors, while the remaining $J-N$ portfolios are priced and observed imperfectly.
Specifically, the spanned factors are computed as $\boldsymbol{P_{i,t}}=V_i\boldsymbol{Y_{i,t}}$, for a full-rank matrix $V_i$. Based on this definition, Equation \@ref(eq:AffineYieldsVector) can be rearranged as an affine function of $\boldsymbol{P_{i,t}}$: + +\begin{equation} + \underset{(J \times 1)}{\vphantom{\Big|} + \boldsymbol{Y_{i,t}}}= + \underset{(J \times 1)}{\vphantom{\Big|} + \boldsymbol{A_P(\Theta_i)}}+ + \underset{(J \times N)}{\vphantom{\Big|} + B_P(\Theta_i)} + \underset{(N \times 1)}{\vphantom{\Big|} + \boldsymbol{P_{i,t}}}\text{,} + (\#eq:AffineYieldsSpanned) +\end{equation} +where $\boldsymbol{A_P(\Theta_i)}= \left[ \mathrm{I}_J - B_X(\Theta_i) \left[ V_iB_X(\Theta_i) \right]^{-1}V_i \right] \boldsymbol{A_X(\Theta_i)}$ and +$B_P(\Theta_i)=B_X(\Theta_i) \left[ V_iB_X(\Theta_i) \right]^{-1}$. + + +The rotation from $\boldsymbol{X_{i,t}}$ to $\boldsymbol{P_{i,t}}$ is convenient for two key reasons. First, $\boldsymbol{P_{i,t}}$ contains directly observable yield curve factors (unlike $\boldsymbol{X_{i,t}}$), with its $N$ elements mapping to traditional yield curve components. For instance, for $N=3$ and $V_i$ being the weight matrix that results from a principal component analysis, the portfolios of yields $\boldsymbol{P_{i,t}}$ are commonly referred to as the level, slope, and curvature factors (see Section [6.1](#S:SpaFac)). Second, it enables a convenient decomposition of the likelihood function, facilitating both estimation and the interpretation of model parameters.
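
The loading recursions in the footnote can be sketched in a few lines of R. The snippet below uses illustrative parameter values and assumes the recursion delivers log-price loadings, with yield loadings obtained by dividing by $-n$; this convention, and all numbers, are assumptions of the sketch rather than the package's internal implementation.

```r
# Sketch of the no-arbitrage (Riccati) loading recursions for one country.
# a[n] and b[n, ] are treated here as log-price loadings; yield loadings
# follow by dividing by -n (an assumption of this sketch).
N <- 3; J <- 10
delta0 <- 0.01; delta1 <- rep(1, N)
muQ    <- rep(0, N)                      # JSZ restriction: zero-mean factors
PhiQ   <- diag(c(0.98, 0.95, 0.90))
Gamma  <- diag(0.001, N)

a <- numeric(J)
b <- matrix(0, J, N)
a[1] <- -delta0                          # boundary conditions
b[1, ] <- -delta1
for (n in 1:(J - 1)) {
  bn <- b[n, , drop = FALSE]             # 1 x N row vector
  a[n + 1] <- a[n] + drop(bn %*% muQ) +
    0.5 * drop(bn %*% Gamma %*% t(Gamma) %*% t(bn)) - delta0
  b[n + 1, ] <- drop(bn %*% PhiQ) - delta1
}
A_X <- -a / (1:J)                        # yield intercepts, one per maturity
B_X <- -b / (1:J)                        # J x N matrix of yield slopes
```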
+ + +### Multicountry specifications (joint Q-dynamics model classes) +The cross-section multicountry extension is formed by stacking the country yields, spanned factors, and intercepts from Equation \@ref(eq:AffineYieldsSpanned) into, respectively, $\boldsymbol{Y_t}=[\boldsymbol{Y_{1,t}}^\top, \boldsymbol{Y_{2,t}}^\top, ...,\boldsymbol{Y_{C,t}}^\top]^\top$, $\boldsymbol{P_t}=[\boldsymbol{P_{1,t}}^\top, \boldsymbol{P_{2,t}}^\top, ..., \boldsymbol{P_{C,t}}^\top]^\top$, and $\boldsymbol{A_P(\Theta)}=[\boldsymbol{A_P^\top(\Theta_1)}, \boldsymbol{A_P^\top(\Theta_2)}, ..., \boldsymbol{A_P^\top(\Theta_C)}]^\top$, where $C$ denotes the number of countries in this economic system. Additionally, we set $B_{P}(\Theta)$ as block diagonal, $B_P(\Theta)=B_P(\Theta_1) \oplus B_P(\Theta_2) \oplus \dots \oplus B_P(\Theta_C)$, where $\oplus$ denotes the direct sum operator. Accordingly, +\begin{equation} + \underset{(CJ \times 1)}{\vphantom{\Big|} + \boldsymbol{Y_{t}}} = + \underset{(CJ \times 1)}{\vphantom{\Big|} + \boldsymbol{A_{P}(\Theta)}} + + \underset{(CJ \times CN)}{\vphantom{\Big|} + B_{P}(\Theta)} + \underset{(CN \times 1)}{\vphantom{\Big|} + \boldsymbol{P_{t}}}\text{.} + (\#eq:AffineYieldsSpannedMultiCountry) +\end{equation} + +## Model time series dimension (P-dynamics) {#S:Pdyn} +In the modelling frameworks implemented in the \CRANpkg{MultiATSM} package, the risk factor dynamics under the $\mathbb{P}$-measure must include at least $N$ domestic spanned factors ($\boldsymbol{P_{i,t}}$) and $M$ domestic unspanned factors ($\boldsymbol{M_{i,t}}$), and may optionally include $G$ global unspanned factors ($\boldsymbol{M_t^W}$), depending on the specification. These risk factors evolve as either an unrestricted or a restricted VAR model. The unrestricted case corresponds to the JPS specification, while the restricted setup encompasses the GVAR and JLL frameworks. + +It is worth stressing the role of unspanned factors in shaping yield curve developments.
Although these factors are absent in the cross-section dimension of the models, they influence the dynamics of the spanned factors, which, in turn, directly affect bond yields. + +### JPS-based models +The country-specific state vector, $\boldsymbol{Z_{i,t}}$, is formed from stacking the global and domestic (unspanned and spanned) risk factors: $\boldsymbol{Z_{i,t}} = [\boldsymbol{M_t^{W^\top}}$, $\boldsymbol{M_{i,t}}^\top$, $\boldsymbol{P_{i,t}}^\top]^\top$. As such, $\boldsymbol{Z_{i,t}}$ is an $R$-dimensional vector, where $R = G + K$ and $K = M + N$. +In JPS-based setups, $\boldsymbol{Z_{i,t}}$ follows a standard unrestricted Gaussian VAR(1): +\begin{align} + & \underset{(R \times 1)}{\vphantom{\Big|} + \boldsymbol{Z_{i,t}}} = + \underset{(R \times 1)}{\vphantom{\Big|} + \boldsymbol{C_i^{\mathbb{P}}}} + + \underset{(R \times R)}{\vphantom{\Big|} + \Phi_i^{\mathbb{P}}} + \underset{ (R \times 1)}{\vphantom{\Big|} + \boldsymbol{Z_{i,t-1}}} + + \underset{(R \times R)}{\vphantom{\Big|} + \Gamma_i} + \underset{(R \times 1)}{\vphantom{\Big|} + \boldsymbol{\varepsilon_{Z,t}^{\mathbb{P}}}}\text{,} + & \boldsymbol{\varepsilon_{Z,t}^{\mathbb{P}}} \sim + {\mathcal{N}_R}(\boldsymbol{0}_R,\mathrm{I}_R)\text{,}% + (\#eq:VARunspanned) +\end{align} +where $\boldsymbol{C_i^{\mathbb{P}}}$ denotes the vector of intercepts; $\Phi_i^{\mathbb{P}}$, the feedback matrix; and $\Gamma_i$, the Cholesky factor (a lower triangular matrix). + +### GVAR-based models {#S:GVARtheory} +In the \CRANpkg{MultiATSM} package, the GVAR setup is formed from two parts: the marginal and the VARX$^{*}$ models. The former captures the joint dynamics of the global economy, whereas the latter describes the dynamics of the domestic factors. For a thorough description of GVAR models, see @ChudikPesaran2016.
+ +The marginal model is an unrestricted VAR($1$) featuring exclusively the global factors: +\begin{align} +& \underset{(G \times 1)}{\vphantom{\Big|} +\boldsymbol{M_t^W}}= +\underset{(G \times 1)}{\vphantom{\Big|} +\boldsymbol{C^W}} + +\underset{(G \times G)}{\vphantom{\Big|} +\Phi^W} \underset{(G \times 1)}{\vphantom{\Big|} +\boldsymbol{M_{t-1}^W}} + +\underset{(G \times G)}{\vphantom{\Big|} +\Gamma^W}\underset{(G \times 1)}{\vphantom{\Big|} +\boldsymbol{\varepsilon_{t}^W}}\text{,} & \boldsymbol{\varepsilon_t^W} \sim {\mathcal{N}_G}(\boldsymbol{0}_G,\mathrm{I}_G). +(\#eq:MarginalModel) +\end{align} + +The VARX$^{*}$ setups are country-specific small-scale VAR models containing global factors and weakly exogenous 'star' variables (weighted averages of foreign variables), built as +\begin{equation} + \boldsymbol{Z_{i,t}^{\ast^\top}} = \sum_{j=1}^{C} w_{i,j} \boldsymbol{Z_{j,t}^\top}, \qquad \sum_{j=1}^{C} w_{i,j}= 1, \quad w_{i,i}=0 \quad \forall i \in \{1,2, ...C \}, +(\#eq:StarVar) +\end{equation} +where $\boldsymbol{Z_{j,t}}$ is a $K$-dimensional vector of domestic factors, $\boldsymbol{Z_{j,t}} = [\boldsymbol{M_{j,t}}^\top$, $\boldsymbol{P_{j,t}}^\top]^\top$, and $w_{i,j}$ is a scalar that measures the degree of connectedness of country $i$ with country $j$. + +These models follow a VARX$^{*}(p,q,r)$ specification, where $p$, $q$, and $r$ are the numbers of lags of, respectively, the domestic, the star, and the global risk factors. The \CRANpkg{MultiATSM} package provides estimates for the case $p=q=r=1$.
In such a case, the dynamics of $\boldsymbol{Z_{i,t}}$ is described as a VARX$^{*}$ of the following form: +\begin{align} +& \underset{(K \times 1)}{\vphantom{\Big|} +\boldsymbol{Z_{i,t}}} = +\underset{(K \times 1)}{\vphantom{\Big|} +\boldsymbol{C^X_{i}}} + +\underset{(K \times K)}{\vphantom{\Big|} +\Phi^X_{i}} +\underset{(K \times 1)}{\vphantom{\Big|} +\boldsymbol{Z_{i,t-1}}} + +\underset{(K \times K)}{\vphantom{\Big|} +\Phi^{X^\ast}_i} +\underset{(K \times 1)}{\vphantom{\Big|} +\boldsymbol{Z_{i,t-1}^{\ast}}} + +\underset{(K \times G)}{\vphantom{\Big|} +\Phi_{i}^{X^{W}}} +\underset{(G \times 1)}{\vphantom{\Big|} +\boldsymbol{M_{t-1}^{W}}} + +\underset{(K \times K)}{\vphantom{\Big|} +\Gamma_{i}^{X}} +\underset{(K \times 1)}{\vphantom{\Big|} +\boldsymbol{\varepsilon^X_{i,t}}}\text{,} & \boldsymbol{\varepsilon^X_{i,t}} \sim {\mathcal{N}_K}(\boldsymbol{0}_K,\mathrm{I}_K). +(\#eq:VARXmodel) +\end{align} + + +Additionally, GVAR models require, as an intermediate step, the specification of country-specific $2K \times CK$-link matrices, $W_i$, to unify the individual VARX$^{*}$ models. Formally, +\begin{equation} +\begin{bmatrix} \boldsymbol{Z_{i,t}} \\ \boldsymbol{Z_{i,t}}^{*} \end{bmatrix}_{2K \times 1} \equiv \underset{(2K \times CK)}{W_i} \begin{bmatrix} \boldsymbol{Z_{1,t}} \\ \boldsymbol{Z_{2,t}} \\ \vdots \\ \boldsymbol{Z_{C,t}} +\end{bmatrix}_{CK \times 1}. +(\#eq:LinkMatequation) +\end{equation} + + +Last, to compose the $F$-dimensional state vector for $F = G + CK$, we gather the global economic variables and the country-specific risk factors, as $\boldsymbol{Z_t} = [\boldsymbol{M_{t}^{W^\top}}$, $\boldsymbol{Z_{1,t}}^\top$, $\boldsymbol{Z_{2,t}}^\top, \ldots \boldsymbol{Z_{C,t}}^\top]^\top$. 
As such, we can form a first-order GVAR process as
+ \begin{align}
+ & \underset{(F \times 1)}{\vphantom{\Big|}
+ \boldsymbol{Z_t}} =
+ \underset{(F \times 1)}{\vphantom{\Big|}
+ \boldsymbol{C_y}} +
+ \underset{(F \times F)}{\vphantom{\Big|}
+ \Phi_y}
+ \underset{(F \times 1)}{\vphantom{\Big|}
+ \boldsymbol{Z_{t-1}}} +
+ \underset{(F \times F)}{\vphantom{\Big|}
+ \Gamma_y}
+ \underset{(F \times 1)}{\vphantom{\Big|}
+ \boldsymbol{\varepsilon_{y,t}}}\text{,} &
+ \boldsymbol{\varepsilon_{y,t}} \sim {\mathcal{N}_F}(\boldsymbol{0}_F,\mathrm{I}_F)\text{,} (\#eq:GVARequation)
+\end{align}
+where $\boldsymbol{C_y} = [\boldsymbol{C^{W^\top}}$, $\boldsymbol{C_1^{X^\top}}$, $\boldsymbol{C_2^{X^\top}}, \ldots, \boldsymbol{C_C^{X^\top}}]^\top$, $\boldsymbol{\varepsilon_{y,t}} =[ \boldsymbol{\varepsilon^{W^\top}_t}$, $\boldsymbol{\varepsilon_{1,t}^{X^\top}}$, $\boldsymbol{\varepsilon_{2,t}^{X^\top}}, \ldots, \boldsymbol{\varepsilon_{C,t}^{X^\top}}]^\top$, $\Gamma_y=\Gamma^W \oplus \Gamma_1^X \oplus \Gamma_2^X \oplus \dots \oplus \Gamma_C^X$, and
+
+\begin{equation}
+\Phi_y =
+\begin{bmatrix}
+\Phi^W & 0_{\scriptscriptstyle{G \times CK}} \\
+\Phi^{X^{W}} & G_1
+\end{bmatrix}_{F \times F} ,
+\end{equation}
+
+where $\Phi^{X^{W}}=
+\begin{bmatrix}
+\Phi^{X^{W}}_1 \\
+\Phi^{X^{W}}_2 \\
+\vdots \\
+\Phi^{X^{W}}_C
+\end{bmatrix}_{CK \times G}$
+ and $G_1=
+\begin{bmatrix}
+\Phi_1W_1 \\
+\Phi_2W_2 \\
+\vdots \\
+\Phi_CW_C
+\end{bmatrix}_{CK \times CK}$, for $\Phi_i= [\Phi_i^{X}$, $\Phi_i^{X^*}]$, $\forall i \in \{1,2,\ldots,C \}$.
+
+
+### JLL-based models {#S:JLL}
+JLL-based models incorporate three components: *(i)* the global economy, *(ii)* a dominant large economy,^[Notably, in the context of the \CRANpkg{MultiATSM} package, the model type `JLL No DomUnit` is the only exception (see Section [3](#S:ATSMoptions)).] and *(iii)* a set of smaller economies.
The state vector is formed from a number of linear projections that build domestic risk factors free of the influence of the variables from other countries and/or from the global economy.
+
+The construction of the domestic spanned factors proceeds in two steps. First, for each economy $i$, $\boldsymbol{P_{i,t}}$ is projected on $\boldsymbol{M_{i,t}}$ of the same country:
+\begin{equation}
+ \underset{ (N \times 1)}{\vphantom{\Big|}
+ \boldsymbol{P_{i,t}}} =
+ \underset{(N \times M)}{\vphantom{\Big|}
+ b_i}
+ \underset{(M \times 1)}{\vphantom{\Big|}
+ \boldsymbol{M_{i,t}}} +
+ \underset{ (N \times 1)}{\vphantom{\Big|}
+ \boldsymbol{P_{i,t}^e}} \text{,}
+(\#eq:PricingOrthoAll)
+\end{equation}
+where the residuals $\boldsymbol{P_{i,t}^e}$ are orthogonal to the economic fundamentals of country $i$.
+
+Second, for the non-dominant economies, $\boldsymbol{P_{i,t}^e}$ is additionally projected on the orthogonalized spanned factors of the dominant country, indexed by $D$, as follows:
+\begin{equation}
+ \underset{(N \times 1)}{\vphantom{\Big|}
+ \boldsymbol{P_{i,t}^e}} =
+ \underset{(N \times N)}{\vphantom{\Big|}
+ c_i^D}
+ \underset{(N \times 1)}{\vphantom{\Big|}
+ \boldsymbol{P_{D,t}^e}} +
+ \underset{(N \times 1)}{\vphantom{\Big|}
+ \boldsymbol{P_{i,t}^{e*}}}\text{,} \quad i \neq D,
+(\#eq:PricingOrthoNonDU)
+\end{equation}
+where $\boldsymbol{P_{i,t}^{e*}}$ denotes the set of residuals for the non-dominant country $i$.
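+As a point of reference (this is standard linear-projection algebra, not package-specific notation, and assumes demeaned factors), the coefficient matrices in Equations \@ref(eq:PricingOrthoAll) and \@ref(eq:PricingOrthoNonDU) are the usual least-squares projection coefficients:
+\begin{equation*}
+b_i = \operatorname{Cov}\!\left(\boldsymbol{P_{i,t}}, \boldsymbol{M_{i,t}}\right)\operatorname{Var}\!\left(\boldsymbol{M_{i,t}}\right)^{-1}, \qquad
+c_i^D = \operatorname{Cov}\!\left(\boldsymbol{P_{i,t}^e}, \boldsymbol{P_{D,t}^e}\right)\operatorname{Var}\!\left(\boldsymbol{P_{D,t}^e}\right)^{-1},
+\end{equation*}
+which guarantees, by construction, that the residuals of each step are orthogonal to the corresponding regressors.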
+ +The design of the domestic unspanned factors also features two steps: for the dominant economy, $\boldsymbol{M_{D,t}}$ is projected on the global economic factors +\begin{equation} + \underset{(M \times 1)}{\vphantom{\Big|} + \boldsymbol{M_{D,t}}} = + \underset{(M \times G)}{\vphantom{\Big|} + a_D^W} + \underset{(G \times 1)}{\vphantom{\Big|} + \boldsymbol{M_t^W}} + + \underset{(M \times 1)}{\vphantom{\Big|} + \boldsymbol{M_{D,t}^e}} \text{,} +(\#eq:MacroOrthoDU) +\end{equation} +and, for the other economies, the residuals of the previous regression are used to compute +\begin{equation} + \underset{(M \times 1)}{\vphantom{\Big|} + \boldsymbol{M_{i,t}}} = + \underset{(M \times G)}{\vphantom{\Big|} + a_i^W} + \underset{(G \times 1)}{\vphantom{\Big|} + \boldsymbol{M_t^W}} + + \underset{(M \times M)}{\vphantom{\Big|} + a_i^D} + \underset{(M \times 1)}{\vphantom{\Big|} + \boldsymbol{M_{D,t}^e}} + + \underset{(M \times 1)}{\vphantom{\Big|} + \boldsymbol{M_{i,t}^{e*}}}\text{.} +(\#eq:MacroOrthoNonDU) +\end{equation} + +Accordingly, the state vector is formed by $\boldsymbol{Z_t^e}= [\boldsymbol{M_t^{W^\top}}$, $\boldsymbol{M_{D,t}^{e^\top}}$, $\boldsymbol{P_{D,t}^{e^\top}}$, $\boldsymbol{M_{2,t}^{e*^\top}}$, $\boldsymbol{P_{2,t}^{e*^\top}}$ ... $\boldsymbol{M_{C,t}^{e*^\top}}$, $\boldsymbol{P_{C,t}^{e*^\top}}]^\top$ and its dynamics evolve as a restricted VAR(1), +\begin{align} +& \underset{(F \times 1)}{\vphantom{\Big|} +\boldsymbol{Z_t^e}}= +\underset{(F \times 1)}{\vphantom{\Big|} +\boldsymbol{C^{e}_Y}} + +\underset{(F \times F)}{\vphantom{\Big|} +\Phi^e_Y} +\underset{(F \times 1)}{\vphantom{\Big|} +\boldsymbol{Z_{t-1}^e}} + +\underset{(F \times F)}{\vphantom{\Big|} +\Gamma_{Y}^e} +\underset{(F \times 1)}{\vphantom{\Big|} +\boldsymbol{\varepsilon^e_{Z,t}}}\text{,} & \boldsymbol{\varepsilon _{Z,t}^{e}} \sim {\mathcal{N}_F}(\boldsymbol{0}_F,\mathrm{I}_F). 
+(\#eq:VAROrtho)
+\end{align}
+JLL (2015) impose a set of zero restrictions on $\Phi^e_Y$ and $\Gamma_{Y}^e$, with their detailed structure provided in the original study.
+
+## Estimation procedures {#S:ATSMestimation}
+The approach proposed by JPS (2014) enables an efficient estimation procedure through its structural design. Specifically, the parameters governing the $\mathbb{Q}$- and $\mathbb{P}$-measures can be estimated independently. The only exception is the variance-covariance matrix, $\Sigma$, which appears in both likelihood functions and, therefore, must be estimated jointly.
+
+In JLL (2015), however, the authors adopt a simplified estimation procedure by estimating the $\Sigma$ matrix exclusively under the $\mathbb{P}$-measure. While they acknowledge that this approach is not fully efficient, they argue that the empirical implications are limited in their application.
+
+
+# The ATSMs available in the MultiATSM package {#S:ATSMoptions}
+
+As outlined in the previous section, the ATSMs implemented in the \CRANpkg{MultiATSM} package differ in the specification of their $\mathbb{Q}$- and $\mathbb{P}$-measure dynamics. In short, under the $\mathbb{Q}$-measure, models can be specified either on a country-by-country basis (JPS, 2014) or jointly across countries (JLL, 2015; CM, 2024). Under the $\mathbb{P}$-measure, risk factor dynamics follow a VAR(1) process, which may be unrestricted, as in the JPS-related frameworks, or restricted, as in the JLL and GVAR specifications.
+
+\CRANpkg{MultiATSM} provides support for eight different classes of ATSMs based on these modelling approaches. These classes vary along several dimensions: the specification of the $\mathbb{P}$- and $\mathbb{Q}$-dynamics, the estimation approach, and whether a dominant economy is included. Table `r knitr::asis_output(ifelse(knitr::is_html_output(), '\\@ref(tab:tab-ModFea-H)', '\\@ref(tab:tab-ModFea-L)'))` summarizes the defining features of each model class available in the package.
A brief overview of these specifications follows below. + + +```{r tab-ModFea-H, eval = knitr::is_html_output(), layout = "l-body-outset"} +ModelLabels <- c("JPS original", "JPS global", "JPS multi", "GVAR single", "GVAR multi", + "JLL original", "JLL No DomUnit", "JLL joint Sigma") + +# Rows +Tab <- data.frame(matrix(nrow = length(ModelLabels), ncol = 0)) +rownames(Tab) <- ModelLabels + +# Empty columns +EmptyCol <- c("", "", "", "", "", "", "", "") +Tab$EmptyCol0 <- EmptyCol +# P-dynamics + 2 empty spaces +Tab$PdynIndUnco <- c("x", "", "", "", "", "", "", "") +Tab$PdynIndCo <- c("", "", "", "", "", "", "", "") +Tab$PdynJointUnco <- c("", "x", "x", "", "", "", "", "") +Tab$PdynJointJLL <- c("", "", "", "", "", "x", "x", "x") +Tab$PdynJointGVAR <- c("", "", "", "x", "x", "", "", "") +Tab$EmptyCol1 <- EmptyCol +Tab$EmptyCol2 <- EmptyCol +# Q-dynamics + 2 empty spaces +Tab$QdynInd <- c("x", "x", "", "x", "", "", "", "") +Tab$QdynJoint <- c("", "", "x", "", "x", "x", "x", "x") +Tab$EmptyCol3 <- EmptyCol +Tab$EmptyCol4 <- EmptyCol +# Sigma + 2 empty spaces +Tab$Ponly <- c("", "", "", "", "", "x", "x", "") +Tab$PandQ <- c("x", "x", "x", "x", "x", "", "", "x") +Tab$EmptyCol5 <- EmptyCol +Tab$EmptyCol6 <- EmptyCol +# Dominant Unit +Tab$DomUnit <- c("", "", "", "", "", "x", "", "x") + +# Adjust column names +ColNames <- c("","","","","JLL", "GVAR", "", "", "", "", "", "", "","", "", "","") +colnames(Tab) <- ColNames + +# Generate the table +kableExtra::kbl(Tab, align = "c", caption = "Summary of model features") %>% + kableExtra::kable_classic("striped", full_width = F) %>% + kableExtra::row_spec(0, font_size = 10) %>% + kableExtra::add_header_above(c(" "=2, "UR" = 1, "R" = 1, "UR" = 1, "R" = 2, " " = 11)) %>% + kableExtra::add_header_above(c(" "=2, "Single" = 2, "Joint" = 3, " "=2, "Single" = 1, "Joint" = 1, " "=2, "P only" = 1, "P and Q" = 1, " " = 3)) %>% + kableExtra::add_header_above(c( " "=2, "P-dynamics"= 5, " "=2, "Q-dynamics"= 2, " "=2, "Sigma matrix estimation" = 
2, " "=2, "Dom. Eco."=1), bold = TRUE) %>%
+kableExtra::pack_rows("Unrestricted VAR", 1, 3, label_row_css = "background-color: #666; color: #fff;") %>%
+kableExtra::pack_rows("Restricted VAR (GVAR)", 4, 5, label_row_css = "background-color: #666; color: #fff;") %>%
+kableExtra::pack_rows("Restricted VAR (JLL)", 6, 8, label_row_css = "background-color: #666; color: #fff;") %>%
+kableExtra::column_spec(1, width = "10em") %>%
+kableExtra::column_spec(3:17, width = "4.5em") %>%
+kableExtra::footnote(general = "Risk factor dynamics under the \\(\\mathbb{P}\\)-measure may follow either an unrestricted (UR) or a restricted (R) specification. The sets of restrictions present in the JLL-based and GVAR-based models are described in @JotikasthiraLeLundblad2015 and @CandelonMoura2024, respectively. The estimation of the \\(\\Sigma\\) matrix is done either exclusively with the other parameters of the \\(\\mathbb{P}\\)-dynamics (*P only* column) or jointly under both \\(\\mathbb{P}\\)- and \\(\\mathbb{Q}\\)-parameters (*P and Q* column). *Dom. Eco.* relates to the presence of a dominant economy.
The entries featuring *x* indicate that the referred characteristic is part of the model.", + escape = FALSE) +``` + + + +```{r tab-ModFea-L, eval = knitr::is_latex_output()} +ModelLabels <- c("JPS original", "JPS global", "JPS multi", "GVAR single", "GVAR multi", + "JLL original", "JLL No DomUnit", "JLL joint Sigma") + +# Rows +Tab <- data.frame(matrix(nrow = length(ModelLabels), ncol = 0)) +rownames(Tab) <- ModelLabels + +# Empty columns +EmptyCol <- c("", "", "", "", "", "", "", "") +Tab$EmptyCol0 <- EmptyCol +# P-dynamics + 2 empty spaces +Tab$PdynIndUnco <- c("x", "", "", "", "", "", "", "") +Tab$PdynIndCo <- c("", "", "", "", "", "", "", "") +Tab$PdynJointUnco <- c("", "x", "x", "", "", "", "", "") +Tab$PdynJointJLL <- c("", "", "", "", "", "x", "x", "x") +Tab$PdynJointGVAR <- c("", "", "", "x", "x", "", "", "") +Tab$EmptyCol1 <- EmptyCol +# Q-dynamics + 2 empty spaces +Tab$QdynInd <- c("x", "x", "", "x", "", "", "", "") +Tab$QdynJoint <- c("", "", "x", "", "x", "x", "x", "x") +Tab$EmptyCol4 <- EmptyCol +# Sigma + 2 empty spaces +Tab$Ponly <- c("", "", "", "", "", "x", "x", "") +Tab$PandQ <- c("x", "x", "x", "x", "x", "", "", "x") +Tab$EmptyCol2 <- EmptyCol +# Dominant Unit +Tab$DomUnit <- c("", "", "", "", "", "x", "", "x") + +# Adjust column names +ColNames <- c("","", "", "", "JLL", "GVAR", "", "", "", "", "","", "", "") +colnames(Tab) <- ColNames + +# Generate the table + kableExtra::kbl(Tab, align = "c", format = "latex", booktabs = TRUE, + caption = "Summary of model features", escape = FALSE) %>% + kableExtra::row_spec(0, bold = TRUE) %>% + kableExtra::add_header_above(c(" " = 2, "UR" = 1, "R" = 1, "UR" = 1, "R" = 2, " " = 1, " " = 1, + " " = 1, " " = 1, " " = 1, " " = 1, " " = 1, " " = 1)) %>% + kableExtra::add_header_above(c(" " = 2, "Single" = 2, "Joint" = 3, " " = 1, "Single" = 1, + "Joint" = 1, " " = 1, "P" = 1, "P and Q" = 1, " " = 1)) %>% kableExtra::add_header_above(c(" " = 2,"P-dynamics" = 5, " " = 1,"Q-dynamics" = 2, + " " = 1,"Sigma 
estimation" = 2, " " = 1,"Dom. Eco." = 1),
+ bold = TRUE) %>%
+kableExtra::pack_rows("Unrestricted VAR", 1, 3) %>%
+kableExtra::pack_rows("Restricted VAR (GVAR)", 4, 5) %>%
+kableExtra::pack_rows("Restricted VAR (JLL)", 6, 8) %>%
+kableExtra::kable_styling(font_size = 7, latex_options = "hold_position")
+knitr::asis_output("
+\\vspace{-2.5em}
+\\begin{center}
+\\captionsetup{type=table}
+\\caption*{\\footnotesize Note: Risk factor dynamics under the $\\mathbb{P}$-measure may follow either an unrestricted (UR) or a restricted (R) specification. The sets of restrictions present in the JLL-based and GVAR-based models are described in \\cite{JotikasthiraLeLundblad2015} and \\cite{CandelonMoura2024}, respectively. The estimation of the $\\Sigma$ matrix is done either exclusively with the other parameters of the $\\mathbb{P}$-dynamics (\\textit{P} column) or jointly under both $\\mathbb{P}$- and $\\mathbb{Q}$-parameters (\\textit{P and Q} column). \\textit{Dom. Eco.} relates to the presence of a dominant economy. The entries featuring \\textit{x} indicate that the referred characteristic is part of the model.}
+\\end{center}
+")
+```
+
+The ATSMs in which the estimation is performed separately for each country are labeled as `JPS original`, `JPS global` and `GVAR single`. In the `JPS original` setup, the set of risk factors includes exclusively each country's domestic variables and the global unspanned factors, whereas `JPS global` and `GVAR single` also incorporate the domestic risk factors of the other countries of the economic system. Notably, the difference between `JPS global` and `GVAR single` stems from the set of restrictions imposed under the $\mathbb{P}$-dynamics.
+
+Within the multicountry frameworks, certain features are worth noting.
The `JLL original` model reproduces the setup in JLL (2015), assuming an economic cohort composed of a globally dominant economy and a set of smaller countries, and estimating the $\Sigma$ matrix exclusively under the $\mathbb{P}$-measure. The two alternative versions assume the absence of a dominant country (`JLL No DomUnit`) and the estimation of $\Sigma$ under both the $\mathbb{P}$- and $\mathbb{Q}$-measures (`JLL joint Sigma`), as in JPS (2014). The remaining specifications differ in their $\mathbb{P}$-dynamics: either an unrestricted VAR model (`JPS multi`) or a GVAR setup (`GVAR multi`), as proposed in CM (2024).
+
+# Package dataset {#S:SectionData}
+
+The \CRANpkg{MultiATSM} package provides datasets that approximate those used in the GVAR-based ATSMs of @CandelonMoura2023 and CM (2024). The data requirements for estimating GVAR models encompass those of all other model classes, making them suitable for generating outputs across all models supported by the package. As such, the examples in the following sections use the dataset from CM (2024).
+
+The `LoadData()` function provides access to the datasets included in the package. To load the data from CM (2024), set the argument to `CM_2024`:
+
+```{r, echo=TRUE}
+LoadData("CM_2024")
+```
+
+This function returns three sets of data. The first contains time series of zero-coupon bond yields for four emerging market economies: China, Brazil, Mexico, and Uruguay. The data are monthly and span June 2004 to January 2020. For the purpose of model estimation, the package requires that *(i)* bond yield maturities be the same across all countries;^[It is worth emphasizing that, although the `DataForEstimation()` and `InputsForOpt()` functions in the package accept inputs with differing maturities, their outputs are standardized to a common set of yields.] and *(ii)* yields be expressed in annualized percentage terms (not basis points).
Note that the \CRANpkg{MultiATSM} package does not provide routines for bootstrapping zero-coupon yields from coupon bonds, so any such treatment must be handled by the user. + +The second dataset comprises time series for unspanned risk factors — specifically, the macroeconomic indicators economic growth and inflation — covering the same period as the bond yield data. These data cover both *(i)* domestic variables for each of the four countries in the sample and *(ii)* corresponding global indicators. The construction of unspanned risk factors, like that of bond yields, must be carried out externally by the user. + +The final dataset contains measures of interconnectedness, proxied by trade flows, which are specifically required for estimating the GVAR-based models. The trade flow data report the annual value of goods imported and exported between each pair of countries in the sample, starting from 1948. All values are expressed in U.S. dollars on a free-on-board basis. These data are used to construct the transition matrix in the GVAR framework. + + +# Required user inputs {#S:SectionInputs} + +## Fundamental inputs +To estimate any model, the user must specify several general inputs, which can be grouped into the following categories: + + +1. Desired ATSM class (`ModelType`): a character vector containing the label of the model to be estimated as described in Table `r knitr::asis_output(ifelse(knitr::is_html_output(), '\\@ref(tab:tab-ModFea-H)', '\\@ref(tab:tab-ModFea-L)'))`; + +2. Risk Factor Features. This includes the following list of elements: + +- Set of economies (`Economies`): a character vector containing the names of the economies which are part of the economic system; + +- Global variables (`GlobalVar`): a character vector containing the labels of the $G$ global unspanned factors. 
Studies examining the impact of global developments on bond prices could include proxy measures of global inflation and global economic activity in this category [@JotikasthiraLeLundblad2015; @AbbrittiDellErbaMorenoSola2018; @CandelonMoura2024];
+
+- Domestic variables (`DomVar`): a character vector containing the labels of the $M$ domestic unspanned factors. These typically correspond to measures of domestic inflation and economic activity, the standard macroeconomic indicators monitored by central banks [@AngPiazzesi2003; @JoslinPriebschSingleton2014; @JotikasthiraLeLundblad2015; @CandelonMoura2024];
+
+- Number of spanned factors ($N$): a scalar representing the number of country-specific spanned factors. Although, in principle, $N$ could vary across countries, the models provided in the package assume a common value of $N$ for all countries.
+A common choice in the literature is $N=3$, as in JPS (2014) and CM (2024), since this produces an excellent cross-sectional fit of bond yields [@LittermanScheinkman1991].
+Other studies, such as @AdrianCrumpMoench2013, extend the specification to $N=5$, arguing that it improves the performance of model-implied term premia. Further intuition on the role and interpretation of spanned factors is provided in Section [6.1](#S:SpaFac).
+
+3. Sample span:
+
+- Initial sample date (`t0`): the start of the sample period in the format *dd-mm-yyyy*;
+
+- End sample date (`tF`): the end of the sample period in the format *dd-mm-yyyy*.
+
+4. Data Frequency (`DataFreq`): a character vector specifying the frequency of the time series data. The available options are: `Annually`, `Quarterly`, `Monthly`, `Weekly`, `Daily Business Days`, and `Daily All Days`;
+
+5. Stationarity constraint under the $\mathbb{Q}$-dynamics (`StatQ`): a logical that takes `TRUE` if the user wishes to impose that the largest eigenvalue under the $\mathbb{Q}$-measure, $\lambda^Q_i$, is strictly less than 1.
While enforcing this stationarity constraint may increase estimation time, it can improve convergence and numerical stability. Moreover, by inducing near-cointegration, the eigenvalue restriction helps to pin down more plausible dynamics for bond risk premia [@BauerRudebuschWu2012; @JoslinPriebschSingleton2014];
+
+
+6. Folder for graphical outputs (`Folder2Save`): the path where the selected graphical outputs will be saved. If set to `NULL`, the outputs are stored in the user's temporary directory (accessible via `tempdir()`);
+
+7. Output label (`OutputLabel`): a single-element character vector containing the name used in the file that stores the model outputs.
+
+
+The following provides an example of the basic model input specification:
+
+```{r, echo=TRUE}
+ModelType <- "JPS original"
+Economies <- c("Brazil", "Mexico", "Uruguay")
+GlobalVar <- c("Gl_Eco_Act", "Gl_Inflation")
+DomVar <- c("Eco_Act", "Inflation")
+N <- 3
+t0 <- "01-07-2005"
+tF <- "01-12-2019"
+DataFreq <- "Monthly"
+StatQ <- FALSE
+Folder2Save <- NULL
+OutputLabel <- "Model_demo"
+```
+
+## Model-specific inputs
+
+### GVARlist and JLLlist {#S:SectionGVARJLLinputs}
+The inputs described above are sufficient for estimating all variants of the JPS models presented in Table `r knitr::asis_output(ifelse(knitr::is_html_output(), '\\@ref(tab:tab-ModFea-H)', '\\@ref(tab:tab-ModFea-L)'))`. However, estimating the GVAR or JLL setups requires additional elements. For clarity, these extra inputs should be organized into separate lists for each model. This section outlines the general structure of both lists, while Section [6.2](#S:PdynEst) provides more detailed explanations of their components and available options, reflecting the broader scope of each setup.
+
+For GVAR models, the required inputs are twofold. First, the user must specify the dynamic structure of each country's VARX model.
For example: + +```{r, echo=TRUE} +VARXtype <- "unconstrained" +``` + +Next, provide the desired inputs to build the transition matrix. For instance: + +```{r, echo=TRUE} +data('TradeFlows') +W_type <- "Sample Mean" +t_First_Wgvar <- "2000" +t_Last_Wgvar <- "2015" +DataConnectedness <- TradeFlows +``` + +Based on these inputs, a complete instance of the `GVARlist` object is + +```{r, echo=TRUE} +GVARlist <- list(VARXtype = "unconstrained", W_type = "Sample Mean", + t_First_Wgvar = "2000", t_Last_Wgvar = "2015", + DataConnectedness = TradeFlows) +``` + +For the JLL frameworks, if the chosen model is either `JLL original` or `JLL joint Sigma`, it suffices to specify the name of the dominant economy. Otherwise, for the `JLL No DomUnit` class, the user must set `None`. For instance: + +```{r, echo=TRUE} +## Example for "JLL original" and "JLL joint Sigma" models +JLLlist <- list(DomUnit = "China") + +## For "JLL No DomUnit" model +JLLlist <- list(DomUnit = "None") +``` + + +### BRWlist {#S:SectionBRWinputs} +In an influential paper, @BauerRudebuschWu2012 (henceforth BRW, 2012) show that estimates from traditional ATSMs often suffer from severe small-sample bias. This can lead to unrealistically stable expectations for future short-term interest rates and, consequently, distort term premium estimates for long-maturity bonds. To address this issue, BRW (2012) propose an indirect inference estimator based on a stochastic approximation algorithm, which corrects for bias and enhances the persistence of short-term interest rates, resulting in more plausible term premium dynamics. + +It is worth noting that this framework serves as a complementary feature to the core ATSMs and can therefore be applied to any of the model types supported by the \CRANpkg{MultiATSM} package. If the user intends to implement a model following the BRW (2012) approach, a few additional inputs must be specified. 
+
+These include:
+
+- Mean or median of the physical-dynamics estimates (`Cent_Measure`): set to `Mean` or `Median` to compute, respectively, the mean or the median of the $\mathbb{P}$-dynamics estimates after each bootstrap iteration;
+
+- Adjustment parameter (`gamma`): controls the degree of shrinkage applied to the difference between the estimates prior to the bias correction and the bootstrap-based estimates after each iteration. It remains fixed across iterations and must lie in the interval $(0,1)$;
+
+- Number of iterations (`N_iter`): total number of iterations used
+ in the stochastic approximation algorithm after burn-in;
+
+- Number of bootstrap samples (`B`): number of simulated samples
+ used in each burn-in or post-burn-in iteration;
+
+- Perform closeness check (`checkBRW`): indicates whether the user wishes to compute the root mean square distance between the model estimates obtained with and without the bias-correction method. The default is `TRUE`;
+
+- Number of bootstrap samples used in the closeness check (`B_check`):
+ the default is 100,000 samples;
+
+- Eigenvalue restriction (`Eigen_rest`): upper bound imposed on the largest eigenvalue under the $\mathbb{P}$-measure after applying the bias-correction procedure. The default is $1$;
+
+- Number of burn-in iterations (`N_burn`): number of iterations
+ discarded in the first stage of the bias-correction estimation
+ process. The recommended number is $15\%$ of the total number of
+ iterations. In practice, this resembles the burn-in concept in Markov chain Monte Carlo methods. In particular, the BRW (2012) stochastic approximation algorithm is iterative and, for a sufficiently large number of iterations, the parameters converge to their true values. As such, discarding early iterations avoids the need to assess a computationally costly exit condition.
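+
+With these ingredients, each post-burn-in iteration $k$ of the BRW (2012) stochastic approximation can be sketched as follows (a schematic rendering of the bullet points above, with $\hat{\theta}$ denoting the uncorrected $\mathbb{P}$-dynamics estimates and $\bar{\theta}^{*}(\theta_k)$ the mean (or median, per `Cent_Measure`) of the `B` bootstrap estimates generated under $\theta_k$):
+\begin{equation*}
+\theta_{k+1} = \theta_k + \gamma \left[ \hat{\theta} - \bar{\theta}^{*}(\theta_k) \right].
+\end{equation*}
+At a fixed point, data simulated under $\theta$ reproduce the uncorrected estimates on average, which is the indirect-inference logic underlying the bias correction.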
+ +```{r, echo = TRUE} +BRWlist <- within(list(Cent_Measure = "Mean", gamma = 0.2, N_iter = 500, B = 50, + checkBRW = TRUE, B_check = 1000, Eigen_rest = 1), + N_burn <- round(N_iter * 0.15)) +``` + +## Additional inputs for numerical and graphical outputs {#S:SectionNumOut} +Once the desired features are selected and the parameters of the chosen ATSM have been estimated, the \CRANpkg{MultiATSM} package provides tools to generate the following numerical and graphical outputs +via the `NumOutputs()` function: + +- Time-series dynamics of the risk factors; +- Model fit of the bond yields; +- Orthogonalized impulse response functions (IRFs); +- Orthogonalized forecast error variance decompositions (FEVDs); +- Generalized impulse response functions (GIRFs); +- Generalized forecast error variance decompositions (GFEVDs); +- Decomposition of bond yields into expected and term premia components. + +These outputs are organized into distinct analytical components, each offering different insights into model behavior and its economic interpretation. + +The time-series dynamics of the risk factors are displayed in separate subplots: one for each global factor, and one subplot per domestic risk factor showing all countries in the economic system. The model fit of the bond yields is provided through two measures of model-implied yields. The first is a fitted measure derived solely from the cross-sectional component, as in Equation \@ref(eq:AffineYieldsSpanned) for single-country models and Equation \@ref(eq:AffineYieldsSpannedMultiCountry) for multicountry setups. This measure reflects the fit based exclusively on the parameters governing the $\mathbb{Q}$-dynamics. The second incorporates both the physical and risk-neutral dynamics, combining the cross-sectional equations with the state evolution specified by each ATSM. + +The impulse response functions and variance decompositions are available in both orthogonalized and generalized forms. 
The orthogonalized outputs (IRFs and FEVDs) are computed using a short-run recursive identification scheme, meaning that they depend on the ordering of the selected risk factors. Specifically, the package places the global unspanned factors first, followed by each country's domestic unspanned and spanned factors, with countries ordered as they appear in the `Economies` vector. In contrast, the generalized versions (GIRFs and GFEVDs) are robust to factor ordering but allow for correlated shocks across risk factors [@PesaranShin1998]. For the numerical computation of these outputs, an analysis horizon has to be specified, *e.g.*, `Horiz <- 100`.
+
+
+The bond yield decomposition can be performed with respect to two measures of risk compensation: term premia and forward premia. While the term premium is derived directly from the bond yield levels, the forward premium is obtained from the decomposition of forward rates. A more formal presentation of both measures is provided in the Appendix.
+
+Users must specify the desired graph types in a character vector. Available options include: `RiskFactors`, `Fit`, `IRF`, `FEVD`, `GIRF`, `GFEVD`, and `TermPremia`. For example:
+
+```{r, echo = TRUE}
+DesiredGraphs <- c("Fit", "GIRF", "GFEVD", "TermPremia")
+```
+
+Moreover, for all models, users must indicate the types of variables of interest (yields, risk factors, or both). For JLL-type models specifically, users must also specify whether to include the orthogonalized versions. Each of these options should be set to `TRUE` to generate the corresponding graphs, and `FALSE` otherwise.
+
+```{r, echo = TRUE}
+WishGraphRiskFac <- FALSE
+WishGraphYields <- TRUE
+WishOrthoJLLgraphs <- FALSE
+```
+
+
+The desired graphical outputs are stored in the selected folder, `Folder2Save`. Alternatively, users can display the desired plots directly in the console, without saving them to `Folder2Save`, by using the `autoplot()` method.
+
+### Bootstrap settings {#S:SectionBootstrap}
+@Horowitz2019 shows that bootstrap methods generally produce more accurate statistical inference than those based on asymptotic distribution theory. To generate bootstrap confidence intervals via the `Bootstrap()` function, an additional list of inputs must be provided:
+
+- Desired bootstrap procedure (`methodBS`): the user must select one of the following options: *(i)* standard residual bootstrap (`bs`); *(ii)* wild bootstrap (`wild`); or *(iii)* block bootstrap (`block`). If the block bootstrap is selected, the block length must also be specified.
+The residual bootstrap is a conventional method that is straightforward to implement when a parametric model, such as a VAR model, is available. The block bootstrap makes weaker assumptions about the data-generating process and is well-suited to handling both weak and strong serial dependence. The wild bootstrap is particularly appropriate for data exhibiting heteroskedasticity [@Horowitz2019];
+- Number of bootstrap draws (`ndraws`): @KilianLutkepohl2017 suggest that, in VAR specifications, `ndraws` can range from a few hundred to several thousand, depending on factors such as sample size, lag order, and the desired quantiles of the distribution. Illustrating this, CM (2024) set `ndraws = 1000` in their ATSM to construct confidence intervals for IRFs;
+- Confidence level (`pctg`): the desired confidence level, expressed in percentage points. Common choices in VAR-related setups include $68\%$, $90\%$ and $95\%$ [@KilianLutkepohl2017].
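+
+As a point of reference for the `bs` option (a standard textbook scheme, not necessarily the package's exact implementation), the residual bootstrap for a VAR(1) rebuilds each artificial sample recursively from resampled residuals:
+\begin{equation*}
+\boldsymbol{Z_t^{(b)}} = \hat{\boldsymbol{C}} + \hat{\Phi}\, \boldsymbol{Z_{t-1}^{(b)}} + \hat{\boldsymbol{u}}_{s(t)}, \qquad t = 1, \ldots, T,
+\end{equation*}
+where $s(t)$ draws indices uniformly with replacement from the fitted residuals $\{\hat{\boldsymbol{u}}_1, \ldots, \hat{\boldsymbol{u}}_T\}$; the model is then re-estimated on each of the `ndraws` artificial samples, and the `pctg` confidence band is read off the resulting empirical distribution.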
+
+```{r, echo = TRUE}
+Bootlist <- list(methodBS = "block", BlockLength = 4, ndraws = 1000, pctg = 95)
+```
+
+### Out-of-sample forecast settings {#S:SectionForecast}
+To generate bond yield forecasts, use `ForecastYields()` with the following inputs:
+
+- Forecast horizon (`ForHoriz`): number of periods ahead to be forecast;
+- Index of the first observation (`t0Sample`): time index of the first observation included in the information set;
+- Index of the last observation (`t0Forecast`): time index of the last observation in the information set used to generate the first forecast;
+- Method used for forecast computation (`ForType`): forecasts can be generated using either a rolling or an expanding window. To use a rolling window, set this parameter to `Rolling`. In this case, the sample length for each forecast is fixed and defined by `t0Sample`. For expanding window forecasts, set this input to `Expanding`, allowing the information set to grow at each forecast iteration.
+
+```{r, echo = TRUE}
+ForecastList <- list(ForHoriz = 12, t0Sample = 1, t0Forecast = 70, ForType = "Rolling")
+```
+
+# Model estimation {#S:SectionEstimation}
+Using the dataset described in Section [4](#S:SectionData), the estimation of the ATSM proceeds in three main steps. First, the country-specific spanned factors are estimated; together with the global and domestic unspanned factors, they form the complete set of risk factors used in the subsequent estimation steps. Second, the package estimates the parameters governing the dynamics of the risk factors under the $\mathbb{P}$-measure. Finally, it optimizes the full ATSM specification, including the parameters under the $\mathbb{Q}$-measure.
+
+As will be made clear in Section [7](#S:SectionImplementation), although the functions introduced in this section can be used individually, they are primarily designed to be used together with the broader set of functions available in the \CRANpkg{MultiATSM} package.
However, as these functions play a central role in the package structure, they warrant a dedicated section. + +## Spanned factors {#S:SpaFac} + +The spanned factors for country $i$, denoted by $\boldsymbol{P_{i,t}}$, are typically obtained as the first $N$ principal components (PCs) of the observed bond yields. The PC method provides orthogonal linear combinations of the original variables, ordered by their ability to capture the variance in the data. Formally, $\boldsymbol{P_{i,t}}$ is computed as $\boldsymbol{P_{i,t}} = w_i \boldsymbol{Y_{i,t}}$, where yields are ordered by increasing maturity in $\boldsymbol{Y_{i,t}}$, and $w_i$ is the matrix of eigenvectors derived from the covariance matrix of $\boldsymbol{Y_{i,t}}$. + +In the case of $N = 3$, the spanned factors are traditionally interpreted as level, slope, and curvature components of the yield curve [@LittermanScheinkman1991]. This interpretation stems from the properties of the $w_i$ matrix, as illustrated below: + +```{r, echo = TRUE} +data('Yields') +w <- pca_weights_one_country(Yields, Economy = "Uruguay") +``` + +In matrix *w*, each row holds the weights for constructing a spanned factor. The first row relates to the level factor, with weights loading roughly equally across maturities. As such, high (low) values of the level factor indicate an overall high (low) value of yields across all maturities. The second row features increasing weights with maturity, capturing the slope of the yield curve: high values indicate steep curves, while low values reflect flat or inverted curves. The third row corresponds to the curvature factor, with weights emphasizing medium-term maturities. This captures the ‘hump-shaped’ features of the yield curve typically associated with changes in its curvature. These concepts are also graphically illustrated in Figure `r knitr::asis_output(ifelse(knitr::is_html_output(), '\\@ref(fig:pca-H)', '\\@ref(fig:pca-L)'))`. 
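+
+The mechanics behind these weights can be reproduced with a few lines of base R. The sketch below uses a synthetic yield matrix rather than the package data, and mirrors the eigendecomposition underlying `pca_weights_one_country()` (up to the sign and scaling conventions applied by the package):
+
+```{r, echo = TRUE, eval = FALSE}
+## Spanned factors as principal components: P_t = w %*% Y_t (synthetic data)
+set.seed(2)
+Y <- matrix(rnorm(6 * 120, mean = 5), nrow = 6)  # 6 maturities x 120 periods
+w <- t(eigen(cov(t(Y)))$vectors)                 # rows: eigenvectors of cov(Y)
+P <- w %*% Y                                     # one spanned factor per row
+round(diag(cov(t(P))), 2)                        # variances decrease with factor order
+```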
+ + +```{r pca-H, fig.height = 5, fig.cap="Yield loadings on the spanned factors. Example using bond yield data for Uruguay. Graph generated using the ggplot2 package [@ggplot22016].", include=knitr::is_html_output(), eval=knitr::is_html_output()} + +LabSpaFac <- c("Level", "Slope", "Curvature") +N <- length(LabSpaFac) + +mat <- c(0.25, 0.5, 1, 3, 5, 10) + +w_pca <- data.frame(t(w[1:N,])) +colnames(w_pca) <- LabSpaFac +w_pca$mat <- mat + +## Prepare plots +colors <- c("Level" = "#0072B2", "Slope" = "#009E73", "Curvature" = "#D55E00") + +g <- ggplot2::ggplot(data = w_pca, ggplot2::aes(x= mat)) + + ggplot2::geom_line(ggplot2::aes(y = Level, color = "Level"), size = 0.7) + + ggplot2::geom_line(ggplot2::aes(y = Slope, color = "Slope"), size = 0.7) + + ggplot2::geom_line(ggplot2::aes(y = Curvature, color = "Curvature"), size = 0.7) + + ggplot2::labs(color = "Legend") + ggplot2::scale_color_manual(values = colors) + ggplot2::theme_classic() + + ggplot2::theme(legend.position="top", legend.title=ggplot2::element_blank(), legend.text= ggplot2::element_text(size=8) ) + + ggplot2::xlab("Maturity (Years)") + ggplot2:: scale_y_continuous(name="Weights") + ggplot2::geom_hline(yintercept=0) + +print(g) +``` + + +```{r pca-L, fig.height = 2.8, fig.width = 5, fig.cap="Yield loadings on the spanned factors. Example using bond yield data for Uruguay. 
Graph was generated using the ggplot2 package \\citep{ggplot22016}.", include=knitr::is_latex_output(), eval=knitr::is_latex_output()} + +LabSpaFac <- c("Level", "Slope", "Curvature") +N <- length(LabSpaFac) + +mat <- c(0.25, 0.5, 1, 3, 5, 10) + +w_pca <- data.frame(t(w[1:N,])) +colnames(w_pca) <- LabSpaFac +w_pca$mat <- mat + +## Prepare plots +colors <- c("Level" = "#0072B2", "Slope" = "#009E73", "Curvature" = "#D55E00") + + ggplot2::ggplot(data = w_pca, ggplot2::aes(x= mat)) + + ggplot2::geom_line(ggplot2::aes(y = Level, color = "Level"), size = 0.7) + + ggplot2::geom_line(ggplot2::aes(y = Slope, color = "Slope"), size = 0.7) + + ggplot2::geom_line(ggplot2::aes(y = Curvature, color = "Curvature"), size = 0.7) + + ggplot2::labs(color = "Legend") + ggplot2::scale_color_manual(values = colors) + ggplot2::theme_classic() + + ggplot2::theme(legend.position="top", legend.title=ggplot2::element_blank(), legend.text= ggplot2::element_text(size=8) ) + + ggplot2::xlab("Maturity (Years)") + ggplot2:: scale_y_continuous(name="Weights") + ggplot2::geom_hline(yintercept=0) +``` + + +The user can directly obtain the time series of the country-specific spanned factors by calling `Spanned_Factors()`, as shown below: + +```{r, echo = TRUE} +data('Yields') +Economies <- c("China", "Brazil", "Mexico", "Uruguay") +N <- 2 +SpaFact <- Spanned_Factors(Yields, Economies, N) +``` + + +## The P-dynamics estimation {#S:PdynEst} +As presented in Table `r knitr::asis_output(ifelse(knitr::is_html_output(), '\\@ref(tab:tab-ModFea-H)', '\\@ref(tab:tab-ModFea-L)'))` and explained in detail in Section [2](#S:ATSMTheory), the dynamics of the risk factors under the $\mathbb{P}$-measure in the available models follow a VAR(1) process. This specification can be fully unrestricted, as in the JPS-related models, or subject to restrictions, as in the GVAR and JLL frameworks. This subsection illustrates how each of these model configurations is implemented. 
+
+### VAR {#S:SectionVAR}
+To use `VAR()`, the user needs to select the appropriate set of risk factors for the model being estimated and specify `unconstrained` in the argument `VARtype`. In the two examples presented below, the outputs are the intercept vector, the feedback matrix, and the variance–covariance matrix for a VAR(1) model under the $\mathbb{P}$-measure:
+
+```{r, echo = TRUE}
+## Example 1: "JPS global" and "JPS multi" models
+data("RiskFacFull")
+PdynPara <- VAR(RiskFacFull, VARtype = "unconstrained")
+
+## Example 2: "JPS original" model for China
+FactorsChina <- RiskFacFull[1:7, ]
+PdynPara <- VAR(FactorsChina, VARtype = "unconstrained")
+```
+
+
+### GVAR {#S:SectionGVAR}
+The `GVAR()` function estimates a GVAR(1) model constructed from country-specific VARX$^{*}(1,1,1)$ specifications. It requires two main inputs: the number of domestic spanned factors ($N$) and a set of elements grouped in the `GVARinputs` list. The latter consists of four components:
+
+
+1. Economies: a $C$-dimensional character vector containing the names of the economies present in the economic system;
+
+2. GVAR list of risk factors: a list of risk factors sorted by country in addition to the global variables. An example of the expected data structure is:
+
+```{r, echo = TRUE}
+data("GVARFactors")
+```
+To assist in formatting the data accordingly, users may use the `DatabasePrep()` function;
+
+3. VARX type: a character vector specifying the desired structure of the VARX$^{*}$ model. Two general options are available:
+
+- Fully unconstrained: specify as `unconstrained`. This option estimates each equation in the system separately via ordinary least squares, without imposing any restrictions.
+
+- With constraints: imposes a specific set of zero restrictions on the feedback matrix.
This category includes two sub-options:
+*(a)* `constrained: Spanned Factors` prevents foreign spanned factors from affecting any domestic risk factor;
+*(b)* `constrained: [factor name]` restricts the specified risk factor to be influenced only by its own lags and the lags of its associated star variables. In both cases, the VARX$^{*}$ is estimated using restricted least squares.
+
+```{r, echo = TRUE}
+data('GVARFactors')
+GVARinputs <- list(Economies = Economies, GVARFactors = GVARFactors,
+                   VARXtype = "constrained: Inflation")
+```
+
+4. Transition matrix: a $C \times C$ matrix that captures the degree of interdependence across the countries in the system. Each entry $(i,j)$ represents the strength of the dependence of economy $i$ on economy $j$. As an example, the matrix below is computed from bilateral trade flow data, averaged over the period 2006–2019, for a system comprising China, Brazil, Mexico, and Uruguay. The rows are normalized so that the weights of each country sum to $1$.
+The transition matrix can be generated using `Transition_Matrix()`, as illustrated in the Appendix:
+
+```{r}
+data("TradeFlows")
+t_First <- "2006"
+t_Last <- "2019"
+Economies <- c("China", "Brazil", "Mexico", "Uruguay")
+type <- "Sample Mean"
+W_gvar <- Transition_Matrix(t_First, t_Last, Economies, type, TradeFlows)
+
+round(W_gvar, digits = 4)
+```
+
+With these inputs specified, the user can estimate a GVAR model using:
+
+```{r, echo = TRUE}
+data("GVARFactors")
+GVARinputs <- list(Economies = Economies, GVARFactors = GVARFactors,
+                   VARXtype = "unconstrained", Wgvar = W_gvar)
+N <- 3
+GVARpara <- GVAR(GVARinputs, N, CheckInputs = TRUE)
+```
+
+Note that the `CheckInputs` parameter should be set to `TRUE` to perform a consistency check on the inputs specified in `GVARinputs` prior to the $\mathbb{P}$-dynamics estimation.
+
+### JLL {#S:SectionJLL}
+The `JLL()` function estimates the physical parameters. Required inputs are:
+
+1.
Risk factors: a time series matrix of the risk factors in their non-orthogonalized form;
+
+2. Number of spanned factors ($N$): a scalar representing the number of country-specific spanned factors;
+
+3. `JLLinputs`: a list object containing the following elements:
+
+    - Economies: a $C$-dimensional character vector listing the economies;
+
+    - Dominant Economy: a character vector indicating either the name of the country assigned as the dominant economy (for `JLL original` and `JLL jointSigma` models), or `None` (for the `JLL No DomUnit` case);
+
+    - Estimate Sigma Matrices: a logical equal to `TRUE` if the user wishes to estimate the full set of JLL sigma matrices (i.e., variance-covariance and Cholesky factor matrices), and `FALSE` otherwise. Since this numerical estimation is costly, it may significantly increase computation time;
+
+    - Precomputed Variance-Covariance Matrix: in some instances, a precomputed variance-covariance matrix from the non-orthogonalized dynamics can be supplied here to save time and memory. If no such matrix is available, this input should be set to `NULL`;
+
+    - JLL type: a character string specifying the chosen JLL model, following the classification described in Table `r knitr::asis_output(ifelse(knitr::is_html_output(), '\\@ref(tab:tab-ModFea-H)', '\\@ref(tab:tab-ModFea-L)'))`.
+
+```{r, eval=FALSE, echo=TRUE}
+## First set the JLLinputs
+ModelType <- "JLL original"
+JLLinputs <- list(Economies = Economies, DomUnit = "China", WishSigmas = TRUE,
+                  SigmaNonOrtho = NULL, JLLModelType = ModelType)
+
+## Then, estimate the P-dynamics of the desired JLL model
+data("RiskFacFull")
+N <- 3
+JLLpara <- JLL(RiskFacFull, N, JLLinputs, CheckInputs = TRUE)
+```
+
+The `CheckInputs` input is set to `TRUE` to perform a consistency check on the inputs specified in `JLLinputs` before running the $\mathbb{P}$-dynamics estimation.
+
+## ATSM estimation
+Estimating the ATSM parameters involves maximizing the log-likelihood function to obtain the best-fitting model parameters using `Optimization()`. The unspanned risk factor framework of JPS (2014) (and, therefore, all its multicountry extensions) follows a model parameterization similar to that proposed in @JoslinSingletonZhu2011. In particular, it requires estimating a set of six parameter blocks:
+
+1. The risk-neutral long-run mean of the short rate ($r0$);
+
+2. The risk-neutral feedback matrix ($K1XQ$);
+
+3. The standard deviation of measurement errors for yields observed with error ($se$);
+
+4. The variance-covariance matrix from the VAR process ($SSZ$);
+
+5. The intercept matrix of the physical dynamics ($K0Z$);
+
+6. The feedback matrix of the physical dynamics ($K1Z$).
+
+The parameters $K0Z$ and $K1Z$ have closed-form solutions. Similarly, $r0$ and $se$ are derived analytically and are factored out of the log-likelihood function. In contrast, the remaining parameters, $K1XQ$ and $SSZ$, must be estimated numerically.
+
+
+The optimization routine in \CRANpkg{MultiATSM} combines the `Nelder–Mead` and `L-BFGS-B` algorithms, executed sequentially and repeated until convergence. At each iteration, the parameter vector yielding the highest likelihood is retained, enhancing robustness to local optima without resorting to full multi-start procedures. Convergence is achieved when the absolute change in the mean log-likelihood falls below a user-defined tolerance (default $10^{-4}$). For the bootstrap replications, the same optimization procedure is applied; however, only the `Nelder–Mead` algorithm is used to reduce computation time.
+
+
+# Full implementation of ATSMs {#S:SectionImplementation}
+
+## Package workflow
+The complete workflow of the \CRANpkg{MultiATSM} package is built around seven core functions, which together support a streamlined and modular process. An overview of these functions is provided below:
+
+1.
`LabFac()`: returns a list of risk factor labels used throughout the package. In particular, these labels assist in structuring sub-function inputs and generating variable and graph labels in a parsimonious manner; + +2. `InputsForOpt()`: collects and processes the inputs needed to build the likelihood function as specified in Section [5](#S:SectionInputs). It estimates the model’s $\mathbb{P}$-dynamics and returns an object of class *ATSMModelInputs*, which includes `print()` and `summary()` S3 methods. The `print()` method summarizes model inputs and system features, while `summary()` reports statistics on risk factors and bond yields; + +3. `Optimization()`: performs the estimation of the model parameters, primarily the $\mathbb{Q}$-dynamics, using numerical optimization. This function returns a comprehensive list of the model's point estimates and can be computationally intensive; + +4. `InputsForOutputs()`: an auxiliary function that compiles the necessary elements for producing numerical and graphical outputs. It also creates separate folders in the user's `Folder2Save` directory to store the generated figures; + +5. `NumOutputs()`: produces the numerical outputs as selected in Section [5.3](#S:SectionNumOut), based on the model’s point estimates. The function returns an object of class *ATSMNumOutputs*, for which an `autoplot()` S3 method is available. This method provides a convenient way to visualize the selected graphical outputs; + +6. `Bootstrap()`: computes confidence bounds for the numerical outputs using the bootstrap procedures defined in Section [5.3](#S:SectionNumOut) (subsection "Bootstrap settings"). The function returns an *ATSMModelBoot* object, which can be accessed via the `autoplot()` S3 method to generate the desired graphical outputs with confidence intervals. As this step involves repeated model estimation, it may require several hours (possibly days) to complete; + + +7. 
`ForecastYields()`: generates bond yield forecasts and the corresponding forecast errors according to the specifications outlined in Section [5.3](#S:SectionNumOut) (subsection "Out-of-sample forecast settings"). This function returns an object of class *ATSMModelForecast*, accessible via the `plot()` S3 method, which displays Root Mean Squared Errors (RMSEs) by country and forecast horizon.
+
+
+## Complete implementation
+
+This section illustrates how to fully implement ATSMs using the \CRANpkg{MultiATSM} package. A simplified two-country `JPS original` framework serves as the example. The implementation steps are outlined below, and a sample of graphical outputs is presented in Figures \@ref(fig:FitYields) -- \@ref(fig:TermPremia).
+
+```{r FullImpl, cache=FALSE, echo = TRUE}
+library(MultiATSM)
+# 1) USER INPUTS
+# A) Load database data
+LoadData("CM_2024")
+
+# B) GENERAL model inputs
+ModelType <- "JPS original"
+Economies <- c("China", "Brazil")
+GlobalVar <- c("Gl_Eco_Act")
+DomVar <- c("Eco_Act")
+N <- 2
+t0_sample <- "01-05-2005"
+tF_sample <- "01-12-2019"
+OutputLabel <- "Test"
+DataFreq <- "Monthly"
+Folder2Save <- NULL
+StatQ <- FALSE
+
+# B.1) SPECIFIC model inputs
+# GVAR-based models
+GVARlist <- list(VARXtype = "unconstrained", W_type = "Sample Mean", t_First_Wgvar = "2005",
+                 t_Last_Wgvar = "2019", DataConnectedness = TradeFlows)
+
+# JLL-based models
+JLLlist <- list(DomUnit = "China")
+
+# BRW inputs
+WishBC <- FALSE
+BRWlist <- within(list(Cent_Measure = "Mean", gamma = 0.05, N_iter = 250, B = 50, checkBRW = TRUE,
+                       B_check = 1000, Eigen_rest = 1), N_burn <- round(N_iter * 0.15))
+
+# C) Decide on settings for numerical outputs
+WishFPremia <- TRUE
+FPmatLim <- c(60, 120)
+
+Horiz <- 30
+DesiredGraphs <- c()
+WishGraphRiskFac <- FALSE
+WishGraphYields <- FALSE
+WishOrthoJLLgraphs <- FALSE
+
+# D) Bootstrap settings
+WishBootstrap <- TRUE
+BootList <- list(methodBS = 'bs', BlockLength = 4, ndraws = 5, pctg = 95)
+
+# E) Out-of-sample forecast
+WishForecast <- TRUE +ForecastList <- list(ForHoriz = 12, t0Sample = 1, t0Forecast = 162, ForType = "Rolling") + +########################################################################################## +# NO NEED TO MAKE CHANGES FROM HERE: +# The sections below automatically process the inputs provided above, run the model +# estimation, generate the numerical and graphical outputs, and save results. + +# 2) Minor preliminary work: get the sets of factor labels +FactorLabels <- LabFac(N, DomVar, GlobalVar, Economies, ModelType) + +# 3) Prepare the inputs of the likelihood function +ATSMInputs <- InputsForOpt(t0_sample, tF_sample, ModelType, Yields, GlobalMacro, + DomMacro, FactorLabels, Economies, DataFreq, GVARlist, + JLLlist, WishBC, BRWlist) + +# 4) Optimization of the ATSM (Point Estimates) +ModelParaList <- Optimization(ATSMInputs, StatQ, DataFreq, FactorLabels, Economies, ModelType) + +# 5) Numerical and graphical outputs +# a) Prepare list of inputs for graphs and numerical outputs +InputsForOutputs <- InputsForOutputs(ModelType, Horiz, DesiredGraphs, OutputLabel, StatQ, + DataFreq, WishGraphYields, WishGraphRiskFac, + WishOrthoJLLgraphs, WishFPremia, + FPmatLim, WishBootstrap, BootList, + WishForecast, ForecastList) + +# b) Fit, IRF, FEVD, GIRF, GFEVD, and Term Premia +NumericalOutputs <- NumOutputs(ModelType, ModelParaList, InputsForOutputs, + FactorLabels, Economies, Folder2Save) + +# c) Confidence intervals (bootstrap analysis) +BootstrapAnalysis <- Bootstrap(ModelType, ModelParaList, NumericalOutputs, Economies, + InputsForOutputs, FactorLabels, JLLlist, GVARlist, + WishBC, BRWlist, Folder2Save) + +# 6) Out-of-sample forecasting +Forecasts <- ForecastYields(ModelType, ModelParaList, InputsForOutputs, FactorLabels, + Economies, JLLlist, GVARlist, WishBC, BRWlist, + Folder2Save) +``` + + +```{r FitYields, out.width="100%", fig.width = 6, fig.height = 4.5, fig.cap = if (knitr::is_html_output()) { knitr::asis_output("Chinese bond yield maturities with 
model fit comparisons. *Model-fit* reflects estimation using only risk-neutral ($\\mathbb{Q}$) dynamics parameters, while *Model-implied* incorporates both physical ($\\mathbb{P}$) and risk-neutral ($\\mathbb{Q}$) dynamics. The $x$-axis represents time in months and the $y$-axis is in natural units.")} else { knitr::asis_output("Chinese bond yield maturities with model fit comparisons. \\emph{Model-fit} reflects estimation using only risk-neutral ($\\mathbb{Q}$) dynamics parameters, while \\emph{Model-implied} incorporates both physical ($\\mathbb{P}$) and risk-neutral ($\\mathbb{Q}$) dynamics. The $x$-axis represents time in months and the $y$-axis is in natural units.")}}
+FitYields <- autoplot(NumericalOutputs, type = "Fit")
+FitYields$China
+```
+
+
+```{r IRF, out.width="100%", fig.width = 6, fig.height = 4.5, fig.cap = "IRFs from the Brazilian bond yields to global economic activity. The size of the shock is one standard deviation. The black lines are the point estimates. Gray dashed lines are the bounds of the 95% confidence intervals and the green lines correspond to the median of these intervals. The $x$-axis is expressed in months and the $y$-axis is in natural units."}
+IRFs_Graphs <- autoplot(BootstrapAnalysis, NumericalOutputs, type = "IRF_Yields_Boot")
+IRFs_Graphs$Brazil$Gl_Eco_Act
+```
+
+```{r FEVD, out.width="100%", fig.width = 6, fig.height = 4.5, fig.cap = "FEVD from the Brazilian bond yield with maturity 60 months. The $x$-axis represents the forecast horizon in months and the $y$-axis is in natural units."}
+FEVDs_Graphs <- autoplot(NumericalOutputs, type = "FEVD_Yields")
+FEVDs_Graphs$Brazil$Y60M_Brazil
+```
+
+```{r TermPremia, out.width="100%", fig.width = 6, fig.height = 4.5, fig.cap = "Chinese sovereign yield curve decomposition showing (i) expected future short rates and (ii) term premia components.
The $x$-axis represents time in months and the $y$-axis is expressed in percentage points."}
+TP_Graphs <- autoplot(NumericalOutputs, type = "TermPremia")
+TP_Graphs$China
+```
+
+# Concluding remarks
+The \CRANpkg{MultiATSM} package aims to advance yield curve (term structure) modelling within the R programming environment. It provides a comprehensive yet user-friendly toolkit for practitioners, academics, and policymakers, featuring estimation routines and generating detailed outputs across several macrofinance model classes. This allows for an in-depth exploration of the relationship between real economy developments and fixed income markets.
+
+The package covers eight classes of macrofinance term structure models, all built upon the single-country unspanned macroeconomic risk framework of @JoslinPriebschSingleton2014, which is also extended to a multicountry setting. Additional multicountry variants based on @JotikasthiraLeLundblad2015 and @CandelonMoura2024 are included, incorporating, respectively, a dominant economy and a GVAR structure to model cross-country interdependence.
+
+Each model class provides analytical outputs that offer insight into term structure dynamics, including plots of model fit, risk premia, impulse responses, and forecast error variance decompositions. The \CRANpkg{MultiATSM} package also offers bootstrap procedures for confidence interval construction and out-of-sample forecasting of bond yields.
+
+# Acknowledgments {-}
+I thank the editor, Rob Hyndman, and an anonymous referee for several helpful comments. I am also grateful to Bertrand Candelon, Adhir Dhoble and Gustavo Torregrosa for many insightful discussions. An earlier version of this paper circulated under the title *MultiATSM: An R Package for Arbitrage-Free Multicountry Affine Term Structure of Interest Rate Models with Unspanned Macroeconomic Risk* and was part of the author’s PhD dissertation at UCLouvain [@Moura2022].
The views expressed in this paper are those of the author and do not necessarily reflect those of Banco de Mexico.
+
+
+# Appendix {.appendix}
+
+## A: Supplementary functions {-}
+
+### Importing data from Excel files
+
+The \CRANpkg{MultiATSM} package also provides an automated procedure for importing data from Excel files via `Load_Excel_Data()` and preparing the risk factor database used directly in the model estimation. To ensure compatibility with the package functions, the following requirements must be met:
+
+1. Databases must be organized in separate Excel files: one for unspanned factors and another for term structure data. For GVAR-based models, a third file containing the interdependence measures is also required;
+2. Each Excel file should include one tab per country. In the case of unspanned factors, an additional tab must be included for the global variables if the user opts to incorporate them;
+3. Variable names must be identical across all tabs within each file.
+
+An example Excel file meeting these requirements is provided with the package. Below is an example of how to import the data from Excel and construct the input list to be supplied:
+
+```{r, echo=TRUE}
+MacroData <- Load_Excel_Data(system.file("extdata", "MacroData.xlsx",
+                                         package = "MultiATSM"))
+YieldsData <- Load_Excel_Data(system.file("extdata", "YieldsData.xlsx",
+                                          package = "MultiATSM"))
+```
+
+```{r, echo=TRUE}
+ModelType <- "JPS original"
+Initial_Date <- "2006-09-01"
+Final_Date <- "2019-01-01"
+DataFrequency <- "Monthly"
+GlobalVar <- c("GBC", "VIX")
+DomVar <- c("Eco_Act", "Inflation", "Com_Prices", "Exc_Rates")
+N <- 3
+Economies <- c("China", "Mexico", "Uruguay", "Brazil", "Russia")
+```
+
+These inputs are used to construct the *RiskFactorsSet* variable, which holds the full collection of risk factors required by the model.
+
+```{r, echo=TRUE}
+FactorLabels <- LabFac(N, DomVar, GlobalVar, Economies, ModelType)
+RiskFactorsSet <- DataForEstimation(Initial_Date, Final_Date, Economies, N, FactorLabels,
+                                    ModelType, DataFrequency, MacroData, YieldsData)
+```
+
+### Transition matrix and star factors
+To construct the transition matrix for GVAR specifications, the user can employ `Transition_Matrix()`. This function requires:
+
+1. Data selection: choose proxies for cross-country interdependence.
+
+2. Time frame: specify the sample’s start and end dates.
+
+3. Dependence measure: select from:
+    - Time-varying (dynamic weights)
+    - Sample Mean (static average)
+    - A numeric scalar (fixed-year snapshot).
+
+```{r, echo=TRUE}
+data("TradeFlows")
+t_First <- "2006"
+t_Last <- "2019"
+Economies <- c("China", "Brazil", "Mexico", "Uruguay")
+type <- "Sample Mean"
+W_gvar <- Transition_Matrix(t_First, t_Last, Economies, type, TradeFlows)
+```
+Note that if data is missing for any country in a given year, the corresponding transition matrix will contain only `NA`s.
+
+A more flexible approach to modelling interdependence is to allow the transition matrix to vary over time. In this case, the star factors are constructed using trade flow weights specific to each year, adjusting the corresponding year’s risk factors accordingly. To enable this feature, users must set the `type` argument to `Time-varying` and specify the same year for both the initial and final periods in the transition matrix. This indicates that the trade weights from that particular year are used when solving the GVAR system (i.e., in the construction of the link matrices, see Equation \@ref(eq:LinkMatequation)).
+
+
+## B: Additional theoretical considerations {-}
+
+### Bond yield decomposition
+
+The \CRANpkg{MultiATSM} package allows for the calculation
+of two risk compensation measures: term premia and forward premia.
Assume that an $n$-maturity bond yield can be decomposed into two components: the expected short-rate ($\mathrm{Exp}_{i,t}^{(n)}$) and term premia
+($\mathrm{TP}_{i,t}^{(n)}$).
+Technically:
+$$
+y_{i,t}^{(n)} = \mathrm{Exp}_{i,t}^{(n)} + \mathrm{TP}_{i,t}^{(n)} \text{.}
+$$ In the package's standard form, the expected short rate term is
+computed from time $t$ to $t+n$, which represents the bond's maturity:
+$\mathrm{Exp}_{i,t}^{(n)} = \sum_{h=0}^{n} E_t[y_{i, t+h}^{(1)}]$. Alternatively,
+the decomposition for the forward rates ($f_{i,t}^{(n)}$) is
+$f_{i,t}^{(n)} = \sum_{h=m}^{n} E_t[y_{i,t+h}^{(1)}] + \mathrm{FP}_{i,t}^{(n)}$ where
+$\mathrm{FP}_{i,t}^{(n)}$ corresponds to the forward premia. In this case, the user
+must set `WishFPremia` to `TRUE` if the computation of forward
+premia is desired, or `FALSE` otherwise. If set to `TRUE`, the user must also
+provide a two-element numerical vector (`FPmatLim`) containing the
+starting and ending maturities over which the forward premia are computed.
+Example:
+
+```{r, echo=TRUE}
+ WishFPremia <- TRUE
+ FPmatLim <- c(60, 120)
+```
+
+## C: Replication of existing research {-}
+
+### Joslin, Priebsch and Singleton (2014)
+The dataset used in this replication was constructed by @BauerRudebusch2017 (henceforth BR, 2017) and is available on Bauer’s website. In their paper, BR (2017) investigate whether macrofinance term structure models are better suited to the unspanned macro risk framework of JPS (2014) or to earlier, traditional spanned settings such as @AngPiazzesi2003. To that end, BR (2017) replicate selected empirical results from JPS (2014). The corresponding R code is also available on Bauer’s website.
+
+Using the dataset from BR (2017), the code below applies the \CRANpkg{MultiATSM} package to estimate the key ATSM parameters following the `JPS original` modelling setup.
+ +```{r, echo=TRUE} +# 1) INPUTS +# A) Load database data +LoadData("BR_2017") + +# B) GENERAL model inputs +ModelType <- "JPS original" + +Economies <- c("US") +GlobalVar <- c() +DomVar <- c("GRO", "INF") +N <- 3 +t0_sample <- "January-1985" +tF_sample <- "December-2007" +DataFreq <- "Monthly" +StatQ <- FALSE + +# 2) Minor preliminary work +FactorLabels <- LabFac(N, DomVar, GlobalVar, Economies, ModelType) +Yields <- t(BR_jps_out$Y) +DomesticMacroVar <- t(BR_jps_out$M.o) +GlobalMacroVar <- c() + +# 3) Prepare the inputs of the likelihood function +ATSMInputs <- InputsForOpt(t0_sample, tF_sample, ModelType, Yields, GlobalMacroVar, + DomesticMacroVar, FactorLabels, Economies, DataFreq) + +# 4) Optimization of the model +ModelPara <- Optimization(ATSMInputs, StatQ, DataFreq, FactorLabels, Economies, ModelType) +``` + + +The tables below compare the ATSM parameter estimates generated from BR (2017) and the \CRANpkg{MultiATSM}. Table `r knitr::asis_output(ifelse(knitr::is_html_output(), '\\@ref(tab:QdynTab-H)', '\\@ref(tab:QdynTab-L)'))` reports the risk-neutral parameters. While the values presented do not match exactly, the differences are well within convergence tolerance and are arguably economically negligible. Table `r knitr::asis_output(ifelse(knitr::is_html_output(), '\\@ref(tab:PdynTab-H)', '\\@ref(tab:PdynTab-L)'))`, by contrast, contains parameters related to the model’s time-series dynamics. As these are derived in closed form, the estimates are exactly the same under both specifications. 
+ +\newpage + + +```{r QdynTab-H, eval = knitr::is_html_output(), layout = "l-body-outset"} +options(scipen = 100) +options(scipen = 100) + +RowsQ <- c("$r_0$", "$\\lambda_1$", "$\\lambda_2$", "$\\lambda_3$") +TableQ <- data.frame(matrix(NA, ncol = 0, nrow = length(RowsQ))) +row.names(TableQ) <- RowsQ + +PackageQ <- c( + ModelPara$`JPS original`$US$ModEst$Q$r0, + diag(ModelPara$`JPS original`$US$ModEst$Q$K1XQ) +) +BRq <- c( + BR_jps_out$est.llk$rho0.cP, + diag(BR_jps_out$est.llk$KQ.XX) +) + +TableQ$MultiATSM <- PackageQ +TableQ$'BR (2017)' <- BRq + +# Function for consistent width and right alignment in HTML +format_html_num <- function(x, digits = 4) { + fmt <- formatC(x, format = "f", digits = digits) + fmt <- gsub("-", "−", fmt, fixed = TRUE) # replace hyphen with Unicode minus + # wrap in right-aligned span to preserve your table's theme + paste0('', fmt, '') +} + +TableQ_fmt <- TableQ +TableQ_fmt[] <- lapply(TableQ_fmt, function(col) { + if (is.numeric(col)) format_html_num(col) else col +}) + +library(kableExtra) +library(magrittr) + +kbl(TableQ_fmt, align = "c", caption = "$Q$-dynamics parameters", escape = FALSE) %>% + kable_classic("striped", full_width = FALSE) %>% + row_spec(0, font_size = 14) %>% + footnote( + general = "λ's are the eigenvalues from the risk-neutral feedback matrix and r₀ is the long-run mean of the short rate under Q." + ) +``` + +```{r QdynTab-L, eval = knitr::is_latex_output()} +options(scipen = 100) # eliminate scientific notation + +RowsQ <- c("$r_0$", "$\\lambda_1$", "$\\lambda_2$", "$\\lambda_3$") +TableQ <- data.frame(matrix(NA, ncol = 0, nrow = length(RowsQ))) +row.names(TableQ) <- RowsQ + +PackageQ <- c(ModelPara$`JPS original`$US$ModEst$Q$r0, diag(ModelPara$`JPS original`$US$ModEst$Q$K1XQ)) +BRq <- c(BR_jps_out$est.llk$rho0.cP, diag(BR_jps_out$est.llk$KQ.XX)) +TableQ$MultiATSM <- PackageQ +TableQ$'BR (2017)' <- BRq + +TableQ <- round(TableQ, digits = 4) + +# Ensure that numbers in the table are actual numerical values. 
This is necessary to ensure that negative signs show up as dashes rather than hyphens. +TableQ <- as.data.frame(TableQ) +TableQ[] <- lapply(TableQ, function(col) { + if (all(suppressWarnings(!is.na(as.numeric(col))))) { + paste0("$", col, "$") + } else { + col + } +}) + + +format_latex_num <- function(x, digits = 4) { + # Round and create a fixed-width string + fmt <- formatC(x, format = "f", digits = digits, width = digits + 3) + + # Replace normal space padding with phantom zeros for alignment in LaTeX + fmt <- gsub(" ", "\\\\phantom{0}", fmt) + + # Replace minus sign with LaTeX proper math minus + fmt <- gsub("-", "\\\\text{-}", fmt) + + paste0("$", fmt, "$") +} + +# Apply formatting only to numeric columns +TableQ_fmt <- TableQ +TableQ_fmt[] <- lapply(TableQ_fmt, function(col) { + if (is.numeric(col)) format_latex_num(col) else col +}) + +library(kableExtra) + +kable( + TableQ_fmt, + format = "latex", + booktabs = TRUE, + escape = FALSE, + align = "r", + caption = "$Q$-dynamics parameters" +) %>% + kable_styling(font_size = 7, latex_options = "hold_position") +knitr::asis_output(" +\\vspace{-2.0em} +\\begin{center} +\\footnotesize Note: $\\lambda$'s are the eigenvalues from the risk-neutral feedback matrix and $r_0$ is the long-run mean of the short rate under $\\mathbb{Q}$. 
+\\end{center} +") +``` + + +```{r PdynTab-H, eval = knitr::is_html_output(), layout = "l-body-outset"} + +RowsP <- c("PC1", "PC2", "PC3", "GRO", "INF") +ColP <- c(" ", RowsP) + +# 1) K0Z and K1Z +# Bauer and Rudebusch coefficients +TablePbr <- data.frame(matrix(NA, ncol = length(ColP), nrow = length(RowsP))) +row.names(TablePbr) <- RowsP +colnames(TablePbr) <- ColP + +TablePbr[[ColP[1]]] <- BR_jps_out$est.llk$KP.0Z +for (j in seq_along(RowsP)) { + TablePbr[[RowsP[j]]] <- BR_jps_out$est.llk$KP.ZZ[, j] +} + +TablePbr <- round(TablePbr, digits = 4) + +# MultiATSM coefficients +TablePMultiATSM <- data.frame(matrix(NA, ncol = length(ColP), nrow = length(RowsP))) +row.names(TablePMultiATSM) <- RowsP +colnames(TablePMultiATSM) <- ColP + +PP <- BR_jps_out$W[1:N, ] %*% Yields +ZZ <- rbind(PP, DomesticMacroVar) +Pdyncoef <- VAR(ZZ, "unconstrained") + +TablePMultiATSM[[ColP[1]]] <- Pdyncoef$K0Z +for (j in seq_along(RowsP)) { + TablePMultiATSM[[RowsP[j]]] <- Pdyncoef$K1Z[, j] +} + +TablePMultiATSM <- round(TablePMultiATSM, digits = 4) + +# Combine both tables +TableP <- rbind(TablePbr, TablePMultiATSM) +row.names(TableP) <- c(RowsP, paste0(RowsP, " ")) + +# ---- Formatting for HTML ---- +# Same right-aligned CSS approach as in Q-table +format_html_num <- function(x, digits = 4) { + fmt <- formatC(x, format = "f", digits = digits) + fmt <- gsub("-", "−", fmt, fixed = TRUE) # Unicode minus + paste0('<span style="display: inline-block; min-width: 5em; text-align: right;">', fmt, '</span>') +} + +TableP_fmt <- TableP +TableP_fmt[] <- lapply(TableP_fmt, function(col) { + if (is.numeric(col)) format_html_num(col) else col +}) + +library(kableExtra) +library(magrittr) + +kbl(TableP_fmt, align = "c", caption = "$P$-dynamics parameters", escape = FALSE) %>% + kable_classic("striped", full_width = FALSE) %>% + row_spec(0, font_size = 14) %>% + add_header_above(c(" " = 1, "K0Z" = 1, "K1Z" = 5), bold = TRUE) %>% + pack_rows("BR (2017)", 1, 5) %>% + pack_rows("MultiATSM", 6, 10) %>% + footnote( + general = "$K0Z$ is the intercept and $K1Z$ is the feedback matrix
from the $P$-dynamics." + ) + +``` + + +```{r PdynTab-L, eval = knitr::is_latex_output()} + +RowsP <- c("PC1", "PC2", "PC3", "GRO", "INF") +ColP <- c(" ", RowsP) + +# --- 1) K0Z and K1Z : Bauer and Rudebusch coefficients --- +TablePbr <- data.frame(matrix(NA, ncol = length(ColP), nrow = length(RowsP))) +row.names(TablePbr) <- RowsP +colnames(TablePbr) <- ColP + +TablePbr[[ColP[1]]] <- BR_jps_out$est.llk$KP.0Z +for (j in seq_along(RowsP)) { + TablePbr[[RowsP[j]]] <- BR_jps_out$est.llk$KP.ZZ[, j] +} +TablePbr <- round(TablePbr, digits = 4) + +# --- 2) MultiATSM coefficients --- +TablePMultiATSM <- data.frame(matrix(NA, ncol = length(ColP), nrow = length(RowsP))) +row.names(TablePMultiATSM) <- RowsP +colnames(TablePMultiATSM) <- ColP + +PP <- BR_jps_out$W[1:N, ] %*% Yields +ZZ <- rbind(PP, DomesticMacroVar) +Pdyncoef <- VAR(ZZ, "unconstrained") + +TablePMultiATSM[[ColP[1]]] <- Pdyncoef$K0Z +for (j in seq_along(RowsP)) { + TablePMultiATSM[[RowsP[j]]] <- Pdyncoef$K1Z[, j] +} +TablePMultiATSM <- round(TablePMultiATSM, digits = 4) + +# --- 3) Combine and label --- +TableP <- rbind(TablePbr, TablePMultiATSM) +row.names(TableP) <- c(RowsP, paste0(RowsP, " ")) # avoid duplicate names + +# --- 4) Format numeric cells for LaTeX alignment --- +format_latex_num <- function(x, digits = 4) { + fmt <- formatC(x, format = "f", digits = digits, width = digits + 4) + fmt <- gsub(" ", "\\\\phantom{0}", fmt) # pad spaces with phantom zeros + fmt <- gsub("-", "\\\\text{-}", fmt) # proper minus sign + paste0("$", fmt, "$") +} + +TableP_fmt <- TableP +TableP_fmt[] <- lapply(TableP_fmt, function(col) { + if (is.numeric(col)) format_latex_num(col) else col +}) + +library(kableExtra) +library(magrittr) + +kable(TableP_fmt, align = "c", format = "latex", booktabs = TRUE, escape = FALSE, + caption = "$P$-dynamics parameters") %>% + kable_styling(latex_options = "hold_position", font_size = 7) %>% + add_header_above(c(" " = 1, "K0Z" = 1, "K1Z" = 5), bold = TRUE) %>% + pack_rows("BR (2017)", 1, 
5) %>% + pack_rows("MultiATSM", 6, 10) %>% + footnote( + general = "$K0Z$ is the intercept and $K1Z$ is the feedback matrix from the $P$-dynamics.", + escape = FALSE + ) +``` + + + +For replicability, it is important to note that the physical dynamics results reported in Table `r knitr::asis_output(ifelse(knitr::is_html_output(), '\\@ref(tab:PdynTab-H)', '\\@ref(tab:PdynTab-L)'))` using \CRANpkg{MultiATSM} rely on the principal component weights provided by BR (2017). Such a matrix is simply a scaled-up version of the one produced by the function `pca_weights_one_country()` of the current package. Accordingly, despite the numerical differences in the weight matrices, both methods generate time series of spanned factors that are perfectly correlated. Another difference between the two approaches relates to the construction of the log-likelihood function: while the BR (2017) code expresses it in terms of a portfolio of yields, the \CRANpkg{MultiATSM} package generates this same input directly as a function of observed yields (i.e., both procedures lead to equivalent log-likelihood values up to the Jacobian term). + +Additionally, it is worth highlighting that the standard deviations for the portfolios of yields observed with errors are nearly identical, matching to seven decimal places: `r format(round(BR_jps_out$est.llk$sigma.e, 7), nsmall = 7)` for BR (2017) and `r format(round(ModelPara[[1]]$US$ModEst$Q$se, 7), nsmall = 7)` for \CRANpkg{MultiATSM}. + +### Candelon and Moura (2024) + +The multicountry framework introduced in @CandelonMoura2024 enhances the tractability of large-scale ATSMs and deepens our understanding of the global economic mechanisms driving domestic yield curve fluctuations. This framework also generates more precise model estimates and enhances the forecasting capabilities of these models.
This novel setup, embodied by the `GVAR multi` model class, is benchmarked against the findings of @JotikasthiraLeLundblad2015, which are captured by the `JLL original` model class. The paper showcases an empirical illustration involving China, Brazil, Mexico, and Uruguay. + +```{r, eval=FALSE, echo=TRUE} +# 1) INPUTS +# A) Load database data +LoadData("CM_2024") + +# B) GENERAL model inputs +ModelType <- "GVAR multi" +Economies <- c("China", "Brazil", "Mexico", "Uruguay") +GlobalVar <- c("Gl_Eco_Act", "Gl_Inflation") +DomVar <- c("Eco_Act", "Inflation") +N <- 3 +t0_sample <- "01-06-2004" +tF_sample <- "01-01-2020" +OutputLabel <- "CM_jfec" +DataFreq <-"Monthly" +StatQ <- FALSE + +# B.1) SPECIFIC model inputs +# GVAR-based models +GVARlist <- list( VARXtype = "unconstrained", W_type = "Sample Mean", t_First_Wgvar = "2004", + t_Last_Wgvar = "2019", DataConnectedness = TradeFlows ) + +# JLL-based models +JLLlist <- list(DomUnit = "China") + +# BRW inputs +WishBC <- TRUE +BRWlist <- within(list(Cent_Measure = "Mean", gamma = 0.001, N_iter = 200, B = 50, checkBRW = TRUE, + B_check = 1000, Eigen_rest = 1), N_burn <- round(N_iter * 0.15)) + +# C) Decide on Settings for numerical outputs +WishFPremia <- TRUE +FPmatLim <- c(24,36) + +Horiz <- 25 +DesiredGraphs <- c("GIRF", "GFEVD", "TermPremia") +WishGraphRiskFac <- FALSE +WishGraphYields <- TRUE +WishOrthoJLLgraphs <- TRUE + +# D) Bootstrap settings +WishBootstrap <- FALSE +BootList <- list(methodBS = 'bs', BlockLength = 4, ndraws = 1000, pctg = 95) + +# E) Out-of-sample forecast +WishForecast <- TRUE +ForecastList <- list(ForHoriz = 12, t0Sample = 1, t0Forecast = 100, ForType = "Rolling") + +# 2) Minor preliminary work: get the sets of factor labels and a vector of common maturities +FactorLabels <- LabFac(N, DomVar, GlobalVar, Economies, ModelType) + +# 3) Prepare the inputs of the likelihood function +ATSMInputs <- InputsForOpt(t0_sample, tF_sample, ModelType, Yields, GlobalMacro, + DomMacro, FactorLabels, Economies, 
DataFreq, + GVARlist, JLLlist, WishBC, BRWlist) + +# 4) Optimization of the ATSM (Point Estimates) +ModelParaList <- Optimization(ATSMInputs, StatQ, DataFreq, FactorLabels, Economies, ModelType) + +# 5) Numerical and graphical outputs +# a) Prepare list of inputs for graphs and numerical outputs +InputsForOutputs <- InputsForOutputs(ModelType, Horiz, DesiredGraphs, OutputLabel, StatQ, + DataFreq, WishGraphYields, WishGraphRiskFac, + WishOrthoJLLgraphs, WishFPremia, FPmatLim, + WishBootstrap, BootList, WishForecast, + ForecastList) + +# b) Fit, IRF, FEVD, GIRF, GFEVD, and Term Premia +NumericalOutputs <- NumOutputs(ModelType, ModelParaList, InputsForOutputs, + FactorLabels, Economies) + +# c) Confidence intervals (bootstrap analysis) +BootstrapAnalysis <- Bootstrap(ModelType, ModelParaList, NumericalOutputs, Economies, + InputsForOutputs, FactorLabels, JLLlist, GVARlist, + WishBC, BRWlist) + +# 6) Out-of-sample forecasting +Forecasts <- ForecastYields(ModelType, ModelParaList, InputsForOutputs, FactorLabels, + Economies, JLLlist, GVARlist, WishBC, BRWlist) +``` + +### Candelon and Moura (2023) + +In this paper, @CandelonMoura2023 investigate the underlying factors that shape the sovereign yield curves of Brazil, India, Mexico, and Russia during the COVID-19 pandemic crisis. The study adopts a `GVAR multi` approach to capture the complex global macrofinancial and, especially, health-related interdependencies during the pandemic.
+ +```{r, eval=FALSE, echo=TRUE} +# 1) INPUTS +# A) Load database data +LoadData("CM_2023") + +# B) GENERAL model inputs +ModelType <- "GVAR multi" +Economies <- c("Brazil", "India", "Russia", "Mexico") +GlobalVar <- c("US_Output_growth", "China_Output_growth", "SP500") +DomVar <- c("Inflation","Output_growth", "CDS", "COVID") +N <- 2 +t0_sample <- "22-03-2020" +tF_sample <- "26-09-2021" +OutputLabel <- "CM_EM" +DataFreq <-"Weekly" +StatQ <- FALSE + +# B.1) SPECIFIC model inputs +# GVAR-based models +GVARlist <- list(VARXtype = "constrained: COVID", W_type = "Sample Mean", + t_First_Wgvar = "2015", t_Last_Wgvar = "2020", + DataConnectedness = TradeFlows_covid) + +# BRW inputs +WishBC <- FALSE + +# C) Decide on Settings for numerical outputs +WishFPremia <- TRUE +FPmatLim <- c(47,48) + +Horiz <- 12 +DesiredGraphs <- c("GIRF", "GFEVD", "TermPremia") +WishGraphRiskFac <- FALSE +WishGraphYields <- TRUE +WishOrthoJLLgraphs <- FALSE + +# D) Bootstrap settings +WishBootstrap <- TRUE +BootList <- list(methodBS = 'bs', BlockLength = 4, ndraws = 100, pctg = 95) + +# 2) Minor preliminary work: get the sets of factor labels and a vector of common maturities +FactorLabels <- LabFac(N, DomVar, GlobalVar, Economies, ModelType) + +# 3) Prepare the inputs of the likelihood function +ATSMInputs <- InputsForOpt(t0_sample, tF_sample, ModelType, Yields_covid, GlobalMacro_covid, + DomMacro_covid, FactorLabels, Economies, DataFreq, GVARlist) + +# 4) Optimization of the ATSM (Point Estimates) +ModelParaList <- Optimization(ATSMInputs, StatQ, DataFreq, FactorLabels, Economies, ModelType) + +# 5) Numerical and graphical outputs +# a) Prepare list of inputs for graphs and numerical outputs +InputsForOutputs <- InputsForOutputs(ModelType, Horiz, DesiredGraphs, OutputLabel, StatQ, + DataFreq, WishGraphYields, WishGraphRiskFac, + WishOrthoJLLgraphs, WishFPremia, FPmatLim, + WishBootstrap, BootList) + +# b) Fit, IRF, FEVD, GIRF, GFEVD, and Term Premia +NumericalOutputs <- NumOutputs(ModelType, 
ModelParaList, InputsForOutputs, FactorLabels, + Economies) + +# c) Confidence intervals (bootstrap analysis) +BootstrapAnalysis <- Bootstrap(ModelType, ModelParaList, NumericalOutputs, Economies, + InputsForOutputs, FactorLabels, + JLLlist = NULL, GVARlist) +``` diff --git a/_articles/RJ-2025-044/RJ-2025-044.html b/_articles/RJ-2025-044/RJ-2025-044.html new file mode 100644 index 0000000000..8adc5f4a9b --- /dev/null +++ b/_articles/RJ-2025-044/RJ-2025-044.html @@ -0,0 +1,3935 @@

    MultiATSM: An R Package for Arbitrage-Free Macrofinance Multicountry Affine Term Structure Models


    The MultiATSM package provides estimation tools and a wide range of outputs for eight macrofinance affine term structure model (ATSM) classes, supporting practitioners, academics, and policymakers. All models extend the single-country framework of Joslin et al. (2014) to multicountry settings, with additional adaptations from Jotikasthira et al. (2015) and Candelon and Moura (2024). These model extensions incorporate, respectively, the presence of a dominant (global) economy and adopt a global vector autoregressive (GVAR) setup to capture the joint dynamics of risk factors. The package generates diverse outputs for each ATSM, including graphical representations of model fit, risk premia, impulse response functions, and forecast error variance decompositions. It also implements bootstrap methods for confidence intervals and produces bond yield forecasts.


    1 Introduction


    The term structure of interest rates (or yield curve) describes the relationship between bond yields and investment maturities. As Piazzesi (2010) emphasizes, understanding its dynamics is essential for several reasons. First, long-term yields incorporate market expectations of future short-term rates, making the yield curve a handy forecasting tool for macroeconomic aggregates like output and inflation. As such, this supports optimal consumption-saving decisions and capital allocation by economic agents. Second, it plays a key role in the transmission of monetary policy, linking short-term policy rates to long-term borrowing costs. Third, it guides fiscal authorities in shaping debt maturities to balance refinancing risk and interest rate exposure. Fourth, it is essential for pricing and hedging interest rate derivatives, which rely on accurate yield curve modelling.


    Affine Term Structure Models (ATSMs) are the workhorse in yield curve modelling. Based on the assumption of no arbitrage, ATSMs offer a flexible framework to assess how investors price risks and generate predictions for the price of any bond (see Piazzesi (2010); Gürkaynak and Wright (2012) for comprehensive reviews). Early ATSMs gained popularity for their ability to capture nearly all term structure fluctuations, appealing to both academics and practitioners (Vasicek 1977; Duffie and Kan 1996; Dai and Singleton 2002). While these models produce accurate statistical descriptions of the yield curve, they are silent on the deeper economic determinants that policymakers require for causal inference.


    In response to this limitation, a large body of research has emerged to explore the interplay between the term structure and macroeconomic developments (seminal contributions include Ang and Piazzesi (2003) and Rudebusch and Wu (2008)). A prominent contribution in this area is the unspanned economic risk framework developed by Joslin et al. (2014) (henceforth JPS, 2014). In essence, this model assumes an arbitrage-free bond market and considers a linear state space representation to describe the dynamics of the yield curve. Compared to earlier macrofinance ATSMs, JPS (2014) offers a tractable estimation approach that integrates traditional yield curve factors (spanned factors) with macroeconomic variables (unspanned factors). As a result, the model delivers a strong cross-sectional fit while explicitly linking bond yield responses to the state of the economy.


    The work of JPS (2014) lays the foundational framework for the modelling tools included in the MultiATSM package (Moura 2025). In addition to the original single-country setup proposed by JPS (2014), the package incorporates multicountry extensions developed by Jotikasthira et al. (2015) (henceforth JLL, 2015) and Candelon and Moura (2024) (henceforth CM, 2024). Altogether, the package offers functions to build eight types of ATSMs, covering the original versions and several variants of these three frameworks.


    Beyond complete routines for model estimation, MultiATSM produces a wide range of analytical outputs. In particular, it generates graphical representations such as model-implied bond yields, bond risk premia, and both orthogonalized and generalized versions of: (i) impulse response functions, and (ii) forecast error variance decompositions for yields and risk factors. Confidence intervals for the two latter outputs can be computed using three bootstrap methods: residual-based, block, or wild bootstrap. Moreover, the package supports out-of-sample forecasting of bond yields across the maturity spectrum. This paper provides detailed guidance on how to use the MultiATSM package effectively.


    There are a few notable packages for term structure modelling in the R programming environment. YieldCurve (Guirreri 2015) and fBonds (Setz 2017) provide a collection of functions to build term structures based on the frameworks of Nelson and Siegel (1987) and Svensson (1994). These yield curve methods have gained popularity for their parsimonious parameterization and good empirical fit. However, these models do not rule out arbitrage opportunities, a limitation addressed by ATSMs. Moreover, the focus of YieldCurve and fBonds is restricted to parameter estimation and yield curve fitting, without offering additional model outputs such as those provided by MultiATSM.


    Several other R packages support time series modelling (Hyndman and Killick 2025), particularly within state space and vector autoregressive (VAR) frameworks. State space packages are relatively few and tend to focus on either estimation, statespacer (Beijers 2023), or simulation, simStateSpace (Pesigan 2025). VAR-based tools are more numerous. For instance, vars (Pfaff and Stigler 2024) and MTS (Tsay et al. 2022) provide extensive functionality for estimation, diagnostics, and forecasting, while svars (Lange et al. 2023) adds structural identification methods. High-dimensional VARs are handled by packages like bigtime (Wilms et al. 2023) and BigVAR (Nicholson et al. 2025), and cross-country spillovers are modeled by Spillover (Urbina 2024) and BGVAR (Boeck et al. 2024).


    Although these tools share some features with MultiATSM, they are tailored to standard state space or VAR analysis. In contrast, MultiATSM embeds VAR dynamics within a state space representation that is explicitly grounded in arbitrage-free asset pricing theory. As such, MultiATSM fills a specific gap in the R ecosystem by combining the structure of ATSMs with the flexibility of modern time series tools.


    The remainder of the paper is organized as follows. Section 2 outlines the theoretical foundations of the ATSMs implemented in the MultiATSM package, and Section 3 details each model’s features. The subsequent sections focus on the practical implementation of ATSMs. Section 4 presents the dataset included in the package. Section 5 explains the user inputs required for model estimation. Section 6 explains the estimation procedure, and Section 7 shows how to estimate ATSMs from scratch using MultiATSM. Replications of published academic studies are provided in the Appendix.


    2 ATSMs with unspanned economic risks: theoretical background


In this section, I outline several arbitrage-free ATSMs with unspanned macroeconomic risks available in the MultiATSM package. A key appealing feature of these setups is their ability to disentangle the yield curve into a cross-sectional component, governed by the risk-neutral (\(\mathbb{Q}\)) dynamics, and a time-series component, driven by the physical (\(\mathbb{P}\)) dynamics. In light of this characteristic, I present the single-country and multicountry \(\mathbb{Q}\)-dynamics model dimensions in Section 2.1. Next, I describe the specific features of the risk factor dynamics under the \(\mathbb{P}\)-measure for the various restricted and unrestricted VAR settings in Section 2.2. Section 2.3 describes the model estimation procedures.


    2.1 Model cross-sectional dimension (Q-dynamics)


    Single-country specifications (individual Q-dynamics model classes)


    The model cross-sectional structure is based on two central equations. The first one assumes that the country \(i\) short-term interest rate at time \(t\), \(r_{i,t}\), is an affine function of \(N\) unobserved (latent) country-specific factors, \(\boldsymbol{X_{i,t}}\): +\[\begin{equation} +\underset{(1 \times 1)}{\vphantom{\Big|} +r_{i,t}} = +\underset{(1 \times 1)}{ +\vphantom{\Big|} +\delta_{i,0}} + +\underset{(1 \times N)}{% +\vphantom{\Big|} +\boldsymbol{\delta}_{i,1}^{\top}} +\underset{(N \times 1)}{% +\vphantom{\Big|} +\boldsymbol{X}_{i,t}}\text{,} +\tag{1} +\end{equation}\] +where \(\delta_{i,0}\) and \(\boldsymbol{\delta_{i,1}}\) are time-invariant parameters.
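For concreteness, Equation (1) amounts to a single inner product. A minimal base R sketch (all parameter and factor values below are hypothetical, not package output):

```r
# Short rate as an affine function of N = 3 latent factors, as in Equation (1)
delta0 <- 0.02                    # hypothetical intercept delta_{i,0}
delta1 <- c(1, 1, 1)              # hypothetical loadings (the JPS normalization sets them to one)
X_t    <- c(0.010, -0.004, 0.001) # hypothetical latent factor values at time t

r_t <- delta0 + sum(delta1 * X_t) # r_t = delta0 + delta1' X_t
```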


    The second equation assumes that the unobserved factor dynamics for each country \(i\) follow a maximally flexible, first-order, \(N-\)dimensional multivariate Gaussian (\(\mathcal{N}\)) VAR model under the \(\mathbb{Q}\)-measure:


    \[\begin{align} +& \underset{(N \times 1)}{\boldsymbol{\vphantom{\Big|} +X_{i,t}}} = +\underset{(N \times 1)}{\boldsymbol{\vphantom{\Big|} +\mu^{Q}_{i,X}}} + +\underset{(N \times N)}{\vphantom{\Big|} +\Phi^{Q}_{i,X}} +\underset{(N \times 1)}{\boldsymbol{\vphantom{\Big|} +X_{i,t-1}}} + +\underset{(N \times N)}{\vphantom{\Big|} +\Gamma_{i,X}} +\underset{(N \times 1)}{\boldsymbol{\vphantom{\Big|} +\varepsilon_{i,t}^{Q}}}\text{,} + & \boldsymbol{\varepsilon_{i,t}^{Q}}\sim {\mathcal{N}_N}(\boldsymbol{0}_N,\mathrm{I}_N)\text{,} + \tag{2} +\end{align}\] +where \(\boldsymbol{\mu^{Q}_{i,X}}\) contains intercepts, \(\Phi^{Q}_{i,X}\), the feedback matrix, and \(\Gamma_{i,X}\) a lower triangular matrix.


    Based on Equations (1) and (2), Dai and Singleton (2000) show that the country-specific zero-coupon bond yield with maturity of \(n\) periods, \(y_{i,t}^{(n)}\), is affine in \(\boldsymbol{X_{i,t}}\): +\[\begin{equation} +\underset{(1 \times 1)}{\vphantom{\Big|} +y_{i,t}^{(n)}} = +\underset{(1 \times 1)}{\vphantom{\Big|} +a_{i,n}(\Theta_{n})} + +\underset{(1 \times N)}{\vphantom{\Big|} +\boldsymbol{b_{i,n}(\Theta_{n})}^\top} +\underset{(N \times 1)}{\vphantom{\Big|} +\boldsymbol{X_{i,t}}}\text{,} + \tag{3} +\end{equation}\] +where \(a_{i,n}(\Theta _{n})\) and \(\boldsymbol{b_{i,n}(\Theta _{n})}\) are constrained to eliminate arbitrage opportunities within this bond market, as dictated by the well-known Riccati equations.1 For notational simplicity, we collect \(J\) bond yields into the vector \(\boldsymbol{Y_{i,t}}=[y_{i,t}^{(1)}, y_{i,t}^{(2)},...,y_{i,t}^{(J)}]^\top\), the \(J\) intercepts into \(\boldsymbol{A_X(\Theta_i)}=[a_{i,1}(\Theta _{1}), a_{i,2}(\Theta _{2}) ,...,a_{i,J}(\Theta _{J})]^\top\) \(\in \mathbb{R}^J\), and the \(N\) slope coefficients into a \(J \times N\) matrix \(B_X(\Theta_i)=[\boldsymbol{b_{i,1}(\Theta _{1})}^\top, \boldsymbol{b_{i,2}(\Theta _{2})}^\top, ...,\boldsymbol{b_{i,J}(\Theta _{J})}^\top]^\top\). Accordingly, the yield curve cross-section dimension of country \(i\) is: +\[\begin{equation} + \underset{(J \times 1)}{\vphantom{\Big|} + \boldsymbol{Y_{i,t}}} = + \underset{(J \times 1)}{\vphantom{\Big|} + \boldsymbol{A_X(\Theta_i)}} + + \underset{(J \times N)}{\vphantom{\Big|} + B_X(\Theta_i)} + \underset{(N \times 1)}{\vphantom{\Big|} + \boldsymbol{X_{i,t}}}\text{.} +\tag{4} +\end{equation}\]


It follows from Equations (1) and (2) that the parameter set \(\Theta_i =\{\boldsymbol{\mu^Q_{i,X}},\Phi^Q_{i,X}, \Gamma_{i,X}, \delta_{i,0}, \boldsymbol{\delta_{i,1}}\}\) fully characterizes the cross-section of country \(i\)’s term structure. Importantly, Dai and Singleton (2000) demonstrate that this system is not identified without additional restrictions, since \(\boldsymbol{X_{i,t}}\) and any invertible affine transformation of \(\boldsymbol{X_{i,t}}\) yield observationally equivalent representations. To circumvent this problem, JPS (2014) adopt the three sets of (minimal) restrictions proposed by Joslin et al. (2011). First, they require the latent factors to be zero-mean processes, forcing \(\boldsymbol{\mu^{Q}_{i,X}}= \boldsymbol{0}_N\). Second, they choose \(\boldsymbol{\delta_{i,1}}\) to be an \(N\)-dimensional vector whose entries are all equal to one. Lastly, \(\Phi^Q_{i,X}\) is a diagonal matrix whose elements are the real and distinct eigenvalues, \(\lambda^Q_i\), of the risk-neutral feedback matrix. Based on this restriction set, Joslin et al. (2011) show that no additional invariant rotation is possible.


    Joslin et al. (2011) also show that a rotation from \(\boldsymbol{X_{i,t}}\) to portfolios of yields, the spanned factors \(\boldsymbol{P_{i,t}}\), leads to an observationally equivalent model representation. This invariant transformation implies that \(N\) portfolios of yields are perfectly priced and observed without errors, while the remaining \(J-N\) portfolios are priced and observed imperfectly. Specifically, the spanned factors are computed as \(\boldsymbol{P_{i,t}}=V_i\boldsymbol{Y_{i,t}}\), for a full-rank matrix \(V_i\). Based on this definition, Equation (4) can be rearranged as an affine function of \(\boldsymbol{P_{i,t}}\)


\[\begin{equation} + \underset{(J \times 1)}{\vphantom{\Big|} + \boldsymbol{Y_{i,t}}}= + \underset{(J \times 1)}{\vphantom{\Big|} + \boldsymbol{A_P(\Theta_i)}}+ + \underset{(J \times N)}{\vphantom{\Big|} + B_P(\Theta_i)} + \underset{(N \times 1)}{\vphantom{\Big|} + \boldsymbol{P_{i,t}}}\text{,} + \tag{5} +\end{equation}\] +where \(\boldsymbol{A_P(\Theta_i)}= \left( \mathrm{I}_J - B_X(\Theta_i) \left[ V_i B_X(\Theta_i) \right]^{-1} V_i \right) \boldsymbol{A_X(\Theta_i)}\) and +\(B_P(\Theta_i)=B_X(\Theta_i) \left[ V_iB_X(\Theta_i ) \right]^{-1}\).
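These mappings are easy to verify numerically: starting from hypothetical latent-model loadings and a random full-rank weight matrix, the rotated representation must reproduce the same yields. A base R sketch:

```r
set.seed(1)
J <- 4; N <- 2
A_X <- rnorm(J)                 # hypothetical latent-model intercepts (J x 1)
B_X <- matrix(rnorm(J * N), J)  # hypothetical latent-model slopes (J x N)
V   <- matrix(rnorm(N * J), N)  # full-rank weight matrix, so that P_t = V %*% Y_t

VB  <- V %*% B_X                                     # N x N, almost surely invertible
B_P <- B_X %*% solve(VB)                             # J x N rotated slopes
A_P <- (diag(J) - B_X %*% solve(VB) %*% V) %*% A_X   # J x 1 rotated intercepts

# Check: the rotated representation reproduces the same yields
X_t <- rnorm(N)
Y_t <- A_X + B_X %*% X_t
P_t <- V %*% Y_t
max(abs(A_P + B_P %*% P_t - Y_t))  # should be zero up to numerical precision
```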


    The rotation from \(\boldsymbol{X_{i,t}}\) to \(\boldsymbol{P_{i,t}}\) is convenient for two key reasons. First, \(\boldsymbol{P_{i,t}}\) contains directly observable yield curve factors (unlike \(\boldsymbol{X_{i,t}}\)), with its \(N\) elements mapping to traditional yield curve components. For instance, for \(N=3\) and \(V_i\) being the weight matrix that results from a principal component analysis, the portfolios of yields \(\boldsymbol{P_{i,t}}\) are commonly referred to as the level, slope, and curvature factors (see Section 6.1). Second, it enables a convenient decomposition of the likelihood function, facilitating both estimation and the interpretation of model parameters.
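As an illustration, a PCA-based weight matrix \(V_i\) can be obtained with base R's `prcomp()`; the yield panel below is simulated, purely for illustration:

```r
set.seed(123)
T_obs <- 200; J <- 6                     # hypothetical sample size and number of maturities
Y <- matrix(rnorm(T_obs * J), T_obs, J)  # placeholder yield panel (T x J)
Y <- sweep(Y, 2, seq(0.02, 0.05, length.out = J), "+")  # add a maturity-dependent mean

pca <- prcomp(Y, center = TRUE, scale. = FALSE)
V   <- t(pca$rotation[, 1:3])  # 3 x J weight matrix: level, slope, curvature
P   <- V %*% t(Y)              # spanned factors (up to centering), 3 x T
```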


    Multicountry specifications (joint Q-dynamics model classes)


    The cross-section multicountry extension is formed by stacking the country yields, spanned factors, and intercepts from Equation (5) into, respectively, \(\boldsymbol{Y_t}=[\boldsymbol{Y_{1,t}}^\top, \boldsymbol{Y_{2,t}}^\top, ...,\boldsymbol{Y_{C,t}}^\top]^\top\), \(\boldsymbol{P_t}=[\boldsymbol{P_{1,t}}^\top, \boldsymbol{P_{2,t}}^\top, ..., \boldsymbol{P_{C,t}}^\top]^\top\), and \(\boldsymbol{A_P(\Theta)}=[\boldsymbol{A_P^\top(\Theta_1)}, \boldsymbol{A_P^\top(\Theta_2)}, ..., \boldsymbol{A_P^\top(\Theta_C)}]^\top\), where \(C\) denotes the number of countries in this economic system. Additionally, we set \(B_{P}(\Theta)\) as block diagonal, \(B_P(\Theta)=B_P(\Theta_1) \oplus B_P(\Theta_2) \oplus \dots \oplus B_P(\Theta_C)\), where \(\oplus\) refers to the direct sum symbol. Accordingly, +\[\begin{equation} + \underset{(CJ \times 1)}{\vphantom{\Big|} + \boldsymbol{Y_{t}}} = + \underset{(CJ \times 1)}{\vphantom{\Big|} + \boldsymbol{A_{P}(\Theta)}} + + \underset{(CJ \times CN)}{\vphantom{\Big|} + B_{P}(\Theta)} + \underset{(CN \times 1)}{\vphantom{\Big|} + \boldsymbol{P_{t}}}\text{.} + \tag{6} +\end{equation}\]
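The direct sum used to build \(B_P(\Theta)\) is plain block-diagonal stacking. A small helper in base R with a hypothetical two-country example (`Matrix::bdiag()` provides the same operation for sparse matrices):

```r
# Direct sum: place country loading matrices on the block diagonal
direct_sum <- function(...) {
  mats <- list(...)
  out <- matrix(0, sum(sapply(mats, nrow)), sum(sapply(mats, ncol)))
  ro <- co <- 0
  for (m in mats) {
    out[ro + seq_len(nrow(m)), co + seq_len(ncol(m))] <- m
    ro <- ro + nrow(m); co <- co + ncol(m)
  }
  out
}

B1 <- matrix(1, 2, 3)     # hypothetical J x N loadings, country 1
B2 <- matrix(2, 2, 3)     # hypothetical loadings, country 2
B  <- direct_sum(B1, B2)  # (CJ x CN) block-diagonal matrix
```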


    2.2 Model time series dimension (P-dynamics)


In the modelling frameworks implemented in the MultiATSM package, the risk factor dynamics under the \(\mathbb{P}\)-measure must include at least \(N\) domestic spanned factors (\(\boldsymbol{P_{i,t}}\)) and \(M\) domestic unspanned factors (\(\boldsymbol{M_{i,t}}\)), and may optionally include \(G\) global unspanned factors (\(\boldsymbol{M_t^W}\)), depending on the specification. These risk factors evolve as either an unrestricted or a restricted VAR model. The unrestricted case corresponds to the JPS specification, while the restricted setup encompasses the GVAR and JLL frameworks.


It is worth stressing the role of unspanned factors in shaping yield curve developments. Although these factors are absent from the cross-sectional dimension of the models, they influence the dynamics of the spanned factors, which, in turn, directly affect bond yields.


    JPS-based models


    The country-specific state vector, \(\boldsymbol{Z_{i,t}}\), is formed from stacking the global and domestic (unspanned and spanned) risk factors: \(\boldsymbol{Z_{i,t}} = [\boldsymbol{M_t^{W^\top}}\), \(\boldsymbol{M_{i,t}}^\top\), \(\boldsymbol{P_{i,t}}^\top]^\top\). As such, \(\boldsymbol{Z_{i,t}}\) is a \(R\)-dimensional vector, where \(R =G + K\) and \(K = M + N\). +In JPS-based setups, \(\boldsymbol{Z_{i,t}}\) follows a standard unrestricted Gaussian VAR(1): +\[\begin{align} + & \underset{(R \times 1)}{\vphantom{\Big|} + \boldsymbol{Z_{i,t}}} = + \underset{(R \times 1)}{\vphantom{\Big|} + \boldsymbol{C_i^{\mathbb{P}}}} + + \underset{(R \times R)}{\vphantom{\Big|} + \Phi_i^{\mathbb{P}}} + \underset{ (R \times 1)}{\vphantom{\Big|} + \boldsymbol{Z_{i,t-1}}} + + \underset{(R \times R)}{\vphantom{\Big|} + \Gamma_i} + \underset{(R \times 1)}{\vphantom{\Big|} + \boldsymbol{\varepsilon_{Z,t}^{\mathbb{P}}}}\text{,} + & \boldsymbol{\varepsilon_{Z,t}^{\mathbb{P}}} \sim + {\mathcal{N}_R}(\boldsymbol{0}_R,\mathrm{I}_R)\text{,}% + \tag{7} +\end{align}\] +where \(\boldsymbol{C_i^{\mathbb{P}}}\) denotes the vector of intercepts; \(\Phi_i^{\mathbb{P}}\), the feedback matrix; and \(\Gamma_i\), the Cholesky factor (a lower triangular matrix).
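The unrestricted VAR(1) in Equation (7) can be estimated equation by equation with OLS. A self-contained sketch on simulated data (the feedback matrix below is hypothetical):

```r
set.seed(42)
R_dim <- 3; T_obs <- 300
Z <- matrix(0, R_dim, T_obs)
Phi_true <- diag(c(0.9, 0.7, 0.5))  # hypothetical feedback matrix
for (t in 2:T_obs) Z[, t] <- Phi_true %*% Z[, t - 1] + rnorm(R_dim, sd = 0.1)

Y_lhs <- t(Z[, -1])                # (T-1) x R: left-hand side, Z_t
X_rhs <- cbind(1, t(Z[, -T_obs]))  # (T-1) x (R+1): intercept plus lag Z_{t-1}
coefs <- solve(crossprod(X_rhs), crossprod(X_rhs, Y_lhs))  # OLS normal equations
C_P   <- coefs[1, ]     # estimated intercepts
Phi_P <- t(coefs[-1, ]) # estimated feedback matrix
diag(Phi_P)             # should be close to c(0.9, 0.7, 0.5)
```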


    GVAR-based models


In the MultiATSM package, the GVAR setup comprises two parts: the marginal and the VARX\(^{*}\) models. The former captures the joint dynamics of the global economy, whereas the latter describes the dynamics of the domestic factors. For a thorough description of GVAR models, see Chudik and Pesaran (2016).


    The marginal model is an unrestricted VAR(\(1\)) featuring exclusively the global factors: +\[\begin{align} +& \underset{(G \times 1)}{\vphantom{\Big|} +\boldsymbol{M_t^W}}= +\underset{(G \times 1)}{\vphantom{\Big|} +\boldsymbol{C^W}} + +\underset{(G \times G)}{\vphantom{\Big|} +\Phi^W} \underset{(G \times 1)}{\vphantom{\Big|} +\boldsymbol{M_{t-1}^W}} + +\underset{(G \times G)}{\vphantom{\Big|} +\Gamma^W}\underset{(G \times 1)}{\vphantom{\Big|} +\boldsymbol{\varepsilon_{t}^W}}\text{,} & \boldsymbol{\varepsilon_t^W} \sim {\mathcal{N}_G}(\boldsymbol{0}_G,\mathrm{I}_G). +\tag{8} +\end{align}\]


The VARX\(^{*}\) setups are country-specific small-scale VAR models containing global factors and weakly exogenous ‘star’ variables — weighted averages of foreign variables — built as +\[\begin{equation} +\boldsymbol{Z_{i,t}^{\ast^\top}} = \sum_{j=1}^{C} w_{i,j} \boldsymbol{Z_{j,t}^\top}, \qquad \sum_{j=1}^{C} w_{i,j}= 1, \quad w_{i,i}=0 \quad \forall i \in \{1, 2, \ldots, C\}, +\tag{9} +\end{equation}\] +where \(\boldsymbol{Z_{j,t}}\) is a \(K\)-dimensional vector of domestic factors \(\boldsymbol{Z_{j,t}} = [\boldsymbol{M_{j,t}}^\top\), \(\boldsymbol{P_{j,t}}^\top]^\top\) and \(w_{i,j}\) is a scalar that measures the degree of connectedness of country \(i\) with country \(j\).
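The star variables in Equation (9) are simple weighted cross-country averages. A base R sketch with C = 3 countries, K = 2 domestic factors, and hypothetical weights:

```r
# Hypothetical domestic factor vectors Z_{j,t} for three countries
Z <- list(c(1, 2), c(3, 4), c(5, 6))
# Connectedness weights w_{i,j}: zero diagonal, rows summing to one
W <- rbind(c(0,   0.6, 0.4),
           c(0.5, 0,   0.5),
           c(0.3, 0.7, 0))

# Z*_{i,t} = sum_j w_{i,j} * Z_{j,t}
Z_star <- lapply(1:3, function(i) Reduce(`+`, Map(`*`, W[i, ], Z)))
```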


    These models follow a VARX\(^{*}(p,q,r)\) specification, where \(p\), \(q\) and \(r\) are the number of lags from, respectively, the domestic, the star, and the global risk factors. The MultiATSM package provides the estimates for the case \(p=q=r=1\). In such a case, the dynamics of \(\boldsymbol{Z_{i,t}}\) is described as a VARX\(^{*}\) of the following form: +\[\begin{align} +& \underset{(K \times 1)}{\vphantom{\Big|} +\boldsymbol{Z_{i,t}}} = +\underset{(K \times 1)}{\vphantom{\Big|} +\boldsymbol{C^X_{i}}} + +\underset{(K \times K)}{\vphantom{\Big|} +\Phi^X_{i}} +\underset{(K \times 1)}{\vphantom{\Big|} +\boldsymbol{Z_{i,t-1}}} + +\underset{(K \times K)}{\vphantom{\Big|} +\Phi^{X^\ast}_i} +\underset{(K \times 1)}{\vphantom{\Big|} +\boldsymbol{Z_{i,t-1}^{\ast}}} + +\underset{(K \times G)}{\vphantom{\Big|} +\Phi_{i}^{X^{W}}} +\underset{(G \times 1)}{\vphantom{\Big|} +\boldsymbol{M_{t-1}^{W}}} + +\underset{(K \times K)}{\vphantom{\Big|} +\Gamma_{i}^{X}} +\underset{(K \times 1)}{\vphantom{\Big|} +\boldsymbol{\varepsilon^X_{i,t}}}\text{,} & \boldsymbol{\varepsilon^X_{i,t}} \sim {\mathcal{N}_K}(\boldsymbol{0}_K,\mathrm{I}_K). +\tag{10} +\end{align}\]


    Additionally, GVAR models require, as an intermediate step, the specification of country-specific \(2K \times CK\) link matrices, \(W_i\), to unify the individual VARX\(^{*}\) models. Formally,
    \[\begin{equation}
    \begin{bmatrix} \boldsymbol{Z_{i,t}} \\ \boldsymbol{Z_{i,t}}^{*} \end{bmatrix}_{2K \times 1} \equiv \underset{(2K \times CK)}{W_i} \begin{bmatrix} \boldsymbol{Z_{1,t}} \\ \boldsymbol{Z_{2,t}} \\ \vdots \\ \boldsymbol{Z_{C,t}} \end{bmatrix}_{CK \times 1}.
    \tag{11}
    \end{equation}\]
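    A minimal base-R sketch of how a link matrix of this form can be assembled for country 1: the top block selects the country's own factors, while the bottom block applies the weights \(w_{1,j}\) to the stacked foreign factors. The weights and factor values are purely illustrative:

```r
# Toy link matrix W_1 from Eq. (11) for C = 3 countries and K = 2 factors.
C <- 3; K <- 2
w1 <- c(0, 0.6, 0.4)                                   # illustrative weights of country 1
top    <- cbind(diag(K), matrix(0, K, (C - 1) * K))    # picks out country 1's own factors
bottom <- do.call(cbind, lapply(w1, function(x) x * diag(K)))
W1 <- rbind(top, bottom)                               # (2K x CK)
z_stack <- 1:(C * K)                                   # toy stacked vector [Z_1; Z_2; Z_3]
W1 %*% z_stack                                         # returns [Z_1; Z_1^*]
```

    Multiplying \(W_1\) with the stacked country vector reproduces both the own factors and the star variables in one step, which is exactly what Equation (11) states.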


    Last, to compose the \(F\)-dimensional state vector for \(F = G + CK\), we gather the global economic variables and the country-specific risk factors, as \(\boldsymbol{Z_t} = [\boldsymbol{M_{t}^{W^\top}}, \boldsymbol{Z_{1,t}}^\top, \boldsymbol{Z_{2,t}}^\top, \ldots, \boldsymbol{Z_{C,t}}^\top]^\top\). As such, we can form a first-order GVAR process as
    \[\begin{align}
    & \underset{(F \times 1)}{\vphantom{\Big|} \boldsymbol{Z_t}} =
    \underset{(F \times 1)}{\vphantom{\Big|} \boldsymbol{C_y}} +
    \underset{(F \times F)}{\vphantom{\Big|} \Phi_y}
    \underset{(F \times 1)}{\vphantom{\Big|} \boldsymbol{Z_{t-1}}} +
    \underset{(F \times F)}{\vphantom{\Big|} \Gamma_y}
    \underset{(F \times 1)}{\vphantom{\Big|} \boldsymbol{\varepsilon_{y,t}}}\text{,} &
    \boldsymbol{\varepsilon_{y,t}} \sim {\mathcal{N}_F}(\boldsymbol{0}_F,\mathrm{I}_F)\text{,} \tag{12}
    \end{align}\]
    where \(\boldsymbol{C_y} = [\boldsymbol{C^{W^\top}}, \boldsymbol{C_1^{X^\top}}, \boldsymbol{C_2^{X^\top}}, \ldots, \boldsymbol{C_C^{X^\top}}]^\top\), \(\boldsymbol{\varepsilon_{y,t}} =[ \boldsymbol{\varepsilon^{W^\top}_t}, \boldsymbol{\varepsilon_{1,t}^{X^\top}}, \boldsymbol{\varepsilon_{2,t}^{X^\top}}, \ldots, \boldsymbol{\varepsilon_{C,t}^{X^\top}}]^\top\), \(\Gamma_y=\Gamma^W \oplus \Gamma_1^X \oplus \Gamma_2^X \oplus \dots \oplus \Gamma_C^X\), and


    \[\begin{equation}
    \Phi_y =
    \begin{bmatrix}
    \Phi^W & 0_{\scriptscriptstyle{G \times CK}} \\
    \Phi^{X^{W}} & G_1
    \end{bmatrix}_{F \times F},
    \end{equation}\]


    where \(\Phi^{X^{W}}=
    \begin{bmatrix}
    \Phi^{X^{W}}_1 \\
    \Phi^{X^{W}}_2 \\
    \vdots \\
    \Phi^{X^{W}}_C
    \end{bmatrix}_{CK \times G}\)
    and \(G_1=
    \begin{bmatrix}
    \Phi_1W_1 \\
    \Phi_2W_2 \\
    \vdots \\
    \Phi_CW_C
    \end{bmatrix}_{CK \times CK}\), for \(\Phi_i= [\Phi_i^{X}, \Phi_i^{X^*}]\), \(\forall i \in \{1,2,\ldots,C \}\).
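    The block structure of \(\Phi_y\) can be sketched in base R with toy inputs. The numerical values, and the diagonal matrix standing in for the \(\Phi_i W_i\) blocks, are illustrative only:

```r
# Assembling the block matrix Phi_y for C = 2 countries, K = 2 domestic
# factors and G = 1 global factor (all entries are toy values).
C <- 2; K <- 2; G <- 1
PhiW  <- matrix(0.9, G, G)                       # global feedback block
PhiXW <- matrix(0.1, C * K, G)                   # stacked Phi_i^{X^W} blocks
G1    <- diag(0.5, C * K)                        # stand-in for the Phi_i W_i blocks
Phi_y <- rbind(cbind(PhiW, matrix(0, G, C * K)), # zero block: global factors are
               cbind(PhiXW, G1))                 # unaffected by domestic ones
```

    The zero block in the first row restates, in matrix form, that the global factors follow the marginal model of Equation (8) and do not load on domestic factors.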


    JLL-based models


    JLL-based models incorporate three components: (i) the global economy, (ii) a dominant large economy,2 and (iii) a set of smaller economies. The state vector is formed from a number of linear projections to build domestic risk factors that are free of the influence of the variables from other countries and/or from the global economy.


    The construction of the domestic spanned factors proceeds in two steps. First, for each economy \(i\), \(\boldsymbol{P_{i,t}}\) is projected on \(\boldsymbol{M_{i,t}}\) of this same country:
    \[\begin{equation}
    \underset{ (N \times 1)}{\vphantom{\Big|} \boldsymbol{P_{i,t}}} =
    \underset{(N \times M)}{\vphantom{\Big|} b_i}
    \underset{(M \times 1)}{\vphantom{\Big|} \boldsymbol{M_{i,t}}} +
    \underset{ (N \times 1)}{\vphantom{\Big|} \boldsymbol{P_{i,t}^e}} \text{,}
    \tag{13}
    \end{equation}\]
    where the residuals \(\boldsymbol{P_{i,t}^e}\) are orthogonal to the economic fundamentals of country \(i\).
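    The projection in Equation (13) amounts to an OLS regression without intercept, so its residuals are orthogonal to the regressors by construction. A short base-R sketch on simulated data (all values are simulated for illustration) makes this concrete:

```r
# Sketch of Eq. (13): regress spanned factors on domestic macro factors and
# keep the residuals, which are orthogonal to M_{i,t} by construction.
set.seed(1)
Tobs <- 100; N <- 3; M <- 2
M_i <- matrix(rnorm(Tobs * M), Tobs, M)                 # rows = time periods
P_i <- M_i %*% matrix(rnorm(M * N), M, N) +
       matrix(rnorm(Tobs * N), Tobs, N)                 # toy spanned factors
fit <- lm(P_i ~ M_i - 1)                                # no intercept, as in (13)
P_e <- residuals(fit)                                   # orthogonalized factors
max(abs(crossprod(M_i, P_e)))                           # numerically zero
```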


    Second, for the non-dominant economies, \(\boldsymbol{P_{i,t}^e}\) is additionally projected on the orthogonalized spanned factors of the dominant country, indexed by \(D\), as follows:
    \[\begin{equation}
    \underset{(N \times 1)}{\vphantom{\Big|} \boldsymbol{P_{i,t}^e}} =
    \underset{(N \times N)}{\vphantom{\Big|} c_i^D}
    \underset{(N \times 1)}{\vphantom{\Big|} \boldsymbol{P_{D,t}^e}} +
    \underset{(N \times 1)}{\vphantom{\Big|} \boldsymbol{P_{i,t}^{e*}}}\text{,} \quad i \neq D,
    \tag{14}
    \end{equation}\]
    where \(\boldsymbol{P_{i,t}^{e*}}\) corresponds to the set of residuals of the non-dominant country \(i\).


    The design of the domestic unspanned factors also features two steps: for the dominant economy, \(\boldsymbol{M_{D,t}}\) is projected on the global economic factors,
    \[\begin{equation}
    \underset{(M \times 1)}{\vphantom{\Big|} \boldsymbol{M_{D,t}}} =
    \underset{(M \times G)}{\vphantom{\Big|} a_D^W}
    \underset{(G \times 1)}{\vphantom{\Big|} \boldsymbol{M_t^W}} +
    \underset{(M \times 1)}{\vphantom{\Big|} \boldsymbol{M_{D,t}^e}} \text{,}
    \tag{15}
    \end{equation}\]
    and, for the other economies, the residuals of the previous regression are used to compute
    \[\begin{equation}
    \underset{(M \times 1)}{\vphantom{\Big|} \boldsymbol{M_{i,t}}} =
    \underset{(M \times G)}{\vphantom{\Big|} a_i^W}
    \underset{(G \times 1)}{\vphantom{\Big|} \boldsymbol{M_t^W}} +
    \underset{(M \times M)}{\vphantom{\Big|} a_i^D}
    \underset{(M \times 1)}{\vphantom{\Big|} \boldsymbol{M_{D,t}^e}} +
    \underset{(M \times 1)}{\vphantom{\Big|} \boldsymbol{M_{i,t}^{e*}}}\text{.}
    \tag{16}
    \end{equation}\]


    Accordingly, the state vector is formed by \(\boldsymbol{Z_t^e}= [\boldsymbol{M_t^{W^\top}}, \boldsymbol{M_{D,t}^{e^\top}}, \boldsymbol{P_{D,t}^{e^\top}}, \boldsymbol{M_{2,t}^{e*^\top}}, \boldsymbol{P_{2,t}^{e*^\top}}, \ldots, \boldsymbol{M_{C,t}^{e*^\top}}, \boldsymbol{P_{C,t}^{e*^\top}}]^\top\) and its dynamics evolve as a restricted VAR(1),
    \[\begin{align}
    & \underset{(F \times 1)}{\vphantom{\Big|} \boldsymbol{Z_t^e}} =
    \underset{(F \times 1)}{\vphantom{\Big|} \boldsymbol{C^{e}_Y}} +
    \underset{(F \times F)}{\vphantom{\Big|} \Phi^e_Y}
    \underset{(F \times 1)}{\vphantom{\Big|} \boldsymbol{Z_{t-1}^e}} +
    \underset{(F \times F)}{\vphantom{\Big|} \Gamma_{Y}^e}
    \underset{(F \times 1)}{\vphantom{\Big|} \boldsymbol{\varepsilon^e_{Z,t}}}\text{,} & \boldsymbol{\varepsilon _{Z,t}^{e}} \sim {\mathcal{N}_F}(\boldsymbol{0}_F,\mathrm{I}_F).
    \tag{17}
    \end{align}\]
    JLL (2015) impose a set of zero restrictions on \(\Phi^e_Y\) and \(\Gamma_{Y}^e\), with their detailed structure provided in the original study.


    2.3 Estimation procedures


    The approach proposed by JPS (2014) enables an efficient estimation procedure through its structural design. Specifically, the parameters governing the \(\mathbb{Q}\)- and \(\mathbb{P}\)-measures can be estimated independently. The only exception is the variance-covariance matrix, \(\Sigma\), which appears in both likelihood functions and, therefore, must be estimated jointly.


    In JLL (2015), however, the authors adopt a simplified estimation procedure by estimating the \(\Sigma\) matrix exclusively under the \(\mathbb{P}\)-measure. While they acknowledge that this approach is not fully efficient, they argue that the empirical implications are limited in their application.


    3 The ATSMs available in the MultiATSM package


    As outlined in the previous section, the ATSMs implemented in the MultiATSM package differ in the specification of their \(\mathbb{Q}\)- and \(\mathbb{P}\)-measure dynamics. In short, under the \(\mathbb{Q}\)-measure, models can be specified either on a country-by-country basis (JPS, 2014) or jointly across countries (JLL, 2015; CM, 2024). Under the \(\mathbb{P}\)-measure, risk factor dynamics follow a VAR(1) process, which may be unrestricted, as in the JPS-related frameworks, or restricted, as in the JLL and GVAR specifications.


    MultiATSM provides support for eight different classes of ATSMs based on these modelling approaches. These classes vary along several dimensions: the specification of the \(\mathbb{P}\)- and \(\mathbb{Q}\)-dynamics, the estimation approach, and whether a dominant economy is included. Table 1 summarizes the defining features of each model class available in the package. A brief overview of these specifications follows below.

    Table 1: Summary of model features

    | Model class | \(\mathbb{P}\)-dynamics | \(\mathbb{Q}\)-dynamics | \(\Sigma\) matrix estimation | Dom. Eco. |
    |---|---|---|---|---|
    | *Unrestricted VAR* | | | | |
    | JPS original | Single (UR) | Single | P and Q | |
    | JPS global | Joint (UR) | Single | P and Q | |
    | JPS multi | Joint (UR) | Joint | P and Q | |
    | *Restricted VAR (GVAR)* | | | | |
    | GVAR single | Joint (R) | Single | P and Q | |
    | GVAR multi | Joint (R) | Joint | P and Q | |
    | *Restricted VAR (JLL)* | | | | |
    | JLL original | Joint (R) | Joint | P only | x |
    | JLL No DomUnit | Joint (R) | Joint | P only | |
    | JLL joint Sigma | Joint (R) | Joint | P and Q | x |

    Note: Risk factor dynamics under the \(\mathbb{P}\)-measure may follow either an unrestricted (UR) or a restricted (R) specification. The sets of restrictions present in the JLL-based and GVAR-based models are described in Jotikasthira et al. (2015) and Candelon and Moura (2024), respectively. The estimation of the \(\Sigma\) matrix is done either exclusively with the other parameters of the \(\mathbb{P}\)-dynamics (P only column) or jointly with the \(\mathbb{Q}\)-parameters (P and Q column). Dom. Eco. relates to the presence of a dominant economy. The entries featuring x indicate that the referred characteristic is part of the model.

    The ATSMs in which the estimation is performed separately for each country are labeled JPS original, JPS global and GVAR single. In the JPS original setup, the set of risk factors includes exclusively each country’s domestic variables and the global unspanned factors, whereas JPS global and GVAR single also incorporate domestic risk factors of the other countries of the economic system. Notably, the difference between JPS global and GVAR single stems from the set of restrictions imposed under the \(\mathbb{P}\)-dynamics.


    Within the multicountry frameworks, certain features are worth noting. The JLL original model reproduces the setup in JLL (2015), assuming an economic cohort composed of a globally dominant economy and a set of smaller countries, and estimating the \(\Sigma\) matrix exclusively under the \(\mathbb{P}\)-measure. The two alternative versions assume the absence of a dominant country (JLL No DomUnit) and the estimation of \(\Sigma\) under both the \(\mathbb{P}\) and \(\mathbb{Q}\) measures (JLL joint Sigma), as in JPS (2014). The remaining specifications differ in their \(\mathbb{P}\)-dynamics: either by an unrestricted VAR model (JPS multi) or by a GVAR setup (GVAR multi), as proposed in CM (2024).


    4 Package dataset


    The MultiATSM package provides datasets that approximate those used in the GVAR-based ATSMs of Candelon and Moura (2023) and CM (2024). The data requirements for estimating GVAR models encompass those of all other model classes, making them suitable for generating outputs across all models supported by the package. As such, the examples in the following sections use the dataset from CM (2024).


    The LoadData() function provides access to the datasets included in the package. To load the data from CM (2024), set the argument to CM_2024:

    LoadData("CM_2024")

    This function returns three sets of data. The first contains time series of zero-coupon bond yields for four emerging market economies: China, Brazil, Mexico, and Uruguay. The data span monthly intervals from June 2004 to January 2020. For the purpose of model estimation, the package requires that (i) bond yield maturities are the same across all countries;3 and (ii) yields are expressed in annualized percentage terms (not basis points). Note that the MultiATSM package does not provide routines for bootstrapping zero-coupon yields from coupon bonds, so any such treatment must be handled by the user.


    The second dataset comprises time series for unspanned risk factors — specifically, the macroeconomic indicators economic growth and inflation — covering the same period as the bond yield data. These data cover both (i) domestic variables for each of the four countries in the sample and (ii) corresponding global indicators. The construction of unspanned risk factors, like that of bond yields, must be carried out externally by the user.


    The final dataset contains measures of interconnectedness, proxied by trade flows, which are specifically required for estimating the GVAR-based models. The trade flow data report the annual value of goods imported and exported between each pair of countries in the sample, starting from 1948. All values are expressed in U.S. dollars on a free-on-board basis. These data are used to construct the transition matrix in the GVAR framework.
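    As a rough illustration, one simple way to turn such a bilateral trade matrix into connectedness weights is to normalize each country's row by its total trade. This is only a sketch with invented numbers; it does not reproduce the package's exact W_type options:

```r
# Toy bilateral trade matrix (rows = importing country, zero own-trade diagonal).
trade <- matrix(c( 0, 30, 10,
                  20,  0, 20,
                   5, 15,  0), nrow = 3, byrow = TRUE)
w <- trade / rowSums(trade)   # row-normalize: rows sum to one, diagonal stays zero
```

    The resulting matrix has exactly the properties required of the weights in Equation (9).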


    5 Required user inputs


    5.1 Fundamental inputs


    To estimate any model, the user must specify several general inputs, which can be grouped into the following categories:

    1. Desired ATSM class (ModelType): a character vector containing the label of the model to be estimated, as described in Table 1;

    2. Risk Factor Features. This includes the following list of elements:

       • Set of economies (Economies): a character vector containing the names of the economies which are part of the economic system;

       • Global variables (GlobalVar): a character vector containing the labels of the \(G\) global unspanned factors. Studies examining the impact of global developments on bond prices could include proxy measures of global inflation and global economic activity in this category (Jotikasthira et al. 2015; Abbritti et al. 2018; Candelon and Moura 2024);

       • Domestic variables (DomVar): a character vector containing the labels of the \(M\) domestic unspanned factors. These typically correspond to measures of domestic inflation and economic activity, the standard macroeconomic indicators monitored by central banks (Ang and Piazzesi 2003; Joslin et al. 2014; Jotikasthira et al. 2015; Candelon and Moura 2024);

       • Number of spanned factors (\(N\)): a scalar representing the number of country-specific spanned factors. Although, in principle, \(N\) could vary across countries, the models provided in the package assume a common value of \(N\) for all countries. A common choice in the literature is \(N=3\), as in JPS (2014) and CM (2024), since this produces an excellent cross-sectional fit of bond yields (Litterman and Scheinkman 1991). Other studies, such as Adrian et al. (2013), extend the specification to \(N=5\), arguing that it improves the performance of model-implied term premia. Further intuition on the role and interpretation of spanned factors is provided in Section 6.1.

    3. Sample span:

       • Initial sample date (t0): the start of the sample period in the format dd-mm-yyyy;

       • End sample date (tF): the end of the sample period in the format dd-mm-yyyy.

    4. Data Frequency (DataFreq): a character vector specifying the frequency of the time series data. The available options are: Annually, Quarterly, Monthly, Weekly, Daily Business Days, and Daily All Days;

    5. Stationarity constraint under the \(\mathbb{Q}\)-dynamics (StatQ): a logical that takes TRUE if the user wishes to impose that the largest eigenvalue under the \(\mathbb{Q}\)-measure, \(\lambda^Q_i\), is strictly less than 1. While enforcing this stationarity constraint may increase estimation time, it can improve convergence and numerical stability. Moreover, by inducing near-cointegration, the eigenvalue restriction helps to pin down more plausible dynamics for bond risk premia (Bauer et al. 2012; Joslin et al. 2014);

    6. Selected folder to save the graphical outputs (Folder2Save): path where the selected graphical outputs will be saved. If set to NULL, the outputs are stored in the user’s temporary directory (accessible via tempdir());

    7. Output label (OutputLabel): a single-element character vector containing the name used in the file name that stores the model outputs.
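    The eigenvalue condition behind the stationarity constraint can be illustrated on a toy feedback matrix in base R (the matrix values are invented for the example):

```r
# Stationarity check: the largest absolute eigenvalue of the feedback matrix
# must be strictly below one for the dynamics to be stationary.
PhiQ <- matrix(c(0.98, 0.05,
                 0.00, 0.90), nrow = 2, byrow = TRUE)
max(abs(eigen(PhiQ)$values))   # 0.98 < 1: stationary dynamics
```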

    The following provides an example of the basic model input specification:

    ModelType <- "JPS original"
    Economies <- c("Brazil", "Mexico", "Uruguay")
    GlobalVar <- c("Gl_Eco_Act", "Gl_Inflation")
    DomVar <- c("Eco_Act", "Inflation")
    N <- 3
    t0 <- "01-07-2005"
    tF <- "01-12-2019"
    DataFreq <- "Monthly"
    StatQ <- FALSE
    Folder2Save <- NULL
    OutputLabel <- "Model_demo"

    5.2 Model-specific inputs


    GVARlist and JLLlist


    The inputs described above are sufficient for estimating all variants of the JPS models presented in Table 1. However, estimating the GVAR or JLL setups requires additional elements. For clarity, these extra inputs should be organized into separate lists for each model. This section outlines the general structure of both lists, while Section 6.2 provides a more detailed explanation of their components and available options, reflecting the broader scope of each setup.


    For GVAR models, the required inputs are twofold. First, the user must specify the dynamic structure of each country’s VARX model. For example:

    VARXtype <- "unconstrained"

    Next, provide the desired inputs to build the transition matrix. For instance:

    data('TradeFlows')
    W_type <- "Sample Mean"
    t_First_Wgvar <- "2000"
    t_Last_Wgvar <- "2015"
    DataConnectedness <- TradeFlows

    Based on these inputs, a complete instance of the GVARlist object is

    GVARlist <- list(VARXtype = "unconstrained", W_type = "Sample Mean",
                     t_First_Wgvar = "2000", t_Last_Wgvar = "2015",
                     DataConnectedness = TradeFlows)

    For the JLL frameworks, if the chosen model is either JLL original or JLL joint Sigma, it suffices to specify the name of the dominant economy. Otherwise, for the JLL No DomUnit class, the user must set None. For instance:

    ## Example for "JLL original" and "JLL joint Sigma" models
    JLLlist <- list(DomUnit = "China")

    ## For "JLL No DomUnit" model
    JLLlist <- list(DomUnit = "None")

    BRWlist


    In an influential paper, Bauer et al. (2012) (henceforth BRW, 2012) show that estimates from traditional ATSMs often suffer from severe small-sample bias. This can lead to unrealistically stable expectations for future short-term interest rates and, consequently, distort term premium estimates for long-maturity bonds. To address this issue, BRW (2012) propose an indirect inference estimator based on a stochastic approximation algorithm, which corrects for bias and enhances the persistence of short-term interest rates, resulting in more plausible term premium dynamics.
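    The bias that motivates this procedure is easy to reproduce on a simple AR(1) in base R. This is a stylized illustration of the small-sample bias itself, not of the BRW (2012) algorithm:

```r
# Monte Carlo illustration: OLS systematically underestimates a persistence
# parameter of 0.95 in samples of 80 observations.
set.seed(3)
phi <- 0.95; Tobs <- 80
est <- replicate(500, {
  y <- as.numeric(arima.sim(list(ar = phi), n = Tobs))
  sum(y[-1] * y[-Tobs]) / sum(y[-Tobs]^2)   # OLS slope without intercept
})
mean(est) - phi                             # negative: downward bias
```

    The downward bias in persistence translates, in ATSMs, into expected short rates that revert to their mean too quickly, which is precisely what the BRW (2012) correction targets.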


    It is worth noting that this framework serves as a complementary feature to the core ATSMs and can therefore be applied to any of the model types supported by the MultiATSM package. If the user intends to implement a model following the BRW (2012) approach, a few additional inputs must be specified. These include:

    • Mean or median of physical dynamics estimates (Cent_Measure): compute the mean or the median of the \(\mathbb{P}\)-dynamics estimates after each bootstrap iteration by setting this option to Mean (for the mean) or Median (for the median);

    • Adjustment parameter (gamma): this parameter controls the degree of shrinkage applied to the difference between the estimates prior to the bias correction and the bootstrap-based estimates after each iteration. It remains fixed across iterations and must lie in the interval \((0,1)\);

    • Number of iterations (N_iter): total number of iterations used in the stochastic approximation algorithm after burn-in;

    • Number of bootstrap samples (B): quantity of simulated samples used in each burn-in or actual iteration;

    • Perform closeness check (checkBRW): indicates whether the user wishes to compute the root mean square distance between the model estimates obtained with and without the bias-correction method. The default is TRUE;

    • Number of bootstrap samples used in the closeness check (B_check): the default is 100,000 samples;

    • Eigenvalue restriction (Eigen_rest): imposes a restriction on the largest eigenvalue under the \(\mathbb{P}\)-measure after applying the bias-correction procedure. The default is \(1\);

    • Number of burn-in iterations (N_burn): quantity of iterations to be discarded in the first stage of the bias-correction estimation process. The recommended number is \(15\%\) of the total number of iterations. In practice, this resembles the burn-in concept in Markov chain Monte Carlo methods. In particular, the BRW (2012) stochastic approximation algorithm is iterative and, for a sufficiently large number of iterations, the parameters converge to their true values. As such, discarding early iterations avoids the need to assess a computationally costly exit condition.
    BRWlist <- within(list(Cent_Measure = "Mean", gamma = 0.2, N_iter = 500, B = 50,
                           checkBRW = TRUE, B_check = 1000, Eigen_rest = 1),
                      N_burn <- round(N_iter * 0.15))

    5.3 Additional inputs for numerical and graphical outputs


    Once the desired features are selected and the parameters of the chosen ATSM have been estimated, the MultiATSM package provides tools to generate the following numerical and graphical outputs via the NumOutputs() function:

    • Time-series dynamics of the risk factors;

    • Model fit of the bond yields;

    • Orthogonalized impulse response functions (IRFs);

    • Orthogonalized forecast error variance decompositions (FEVDs);

    • Generalized impulse response functions (GIRFs);

    • Generalized forecast error variance decompositions (GFEVDs);

    • Decomposition of bond yields into expected and term premia components.

    These outputs are organized into distinct analytical components, each offering different insights into model behavior and its economic interpretation.


    The time-series dynamics of the risk factors are displayed in separate subplots: one for each global factor, and one subplot per domestic risk factor showing all countries in the economic system. The model fit of the bond yields is provided through two measures of model-implied yields. The first is a fitted measure derived solely from the cross-sectional component, as in Equation (5) for single-country models and Equation (6) for multicountry setups. This measure reflects the fit based exclusively on the parameters governing the \(\mathbb{Q}\)-dynamics. The second incorporates both the physical and risk-neutral dynamics, combining the cross-sectional equations with the state evolution specified by each ATSM.


    The impulse response functions and variance decompositions are available in both orthogonalized and generalized forms. The orthogonalized outputs (IRFs and FEVDs) are computed using a short-run recursive identification scheme, meaning they depend on the ordering of the selected risk factors. Specifically, the package is structured to place the global unspanned factors first, followed by the domestic unspanned and spanned factors of each country, in the order in which countries are listed in the Economies vector. In contrast, the generalized versions (GIRFs and GFEVDs) are robust to factor ordering but allow for correlated shocks across risk factors (Pesaran and Shin 1998). For the numerical computation of these outputs, a horizon of analysis has to be specified, e.g., Horiz <- 100.
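    The recursive scheme can be sketched for a toy VAR(1) in base R, where the impact matrix is the lower-triangular Cholesky factor of the error covariance. All parameter values are invented; this is not the package's internal routine:

```r
# Orthogonalized IRFs for a toy bivariate VAR(1): response at horizon h is
# Phi^h %*% P0, where P0 is the (ordering-dependent) Cholesky impact matrix.
Phi   <- matrix(c(0.5, 0.1,
                  0.0, 0.4), nrow = 2, byrow = TRUE)
Sigma <- matrix(c(1.0, 0.3,
                  0.3, 1.0), nrow = 2)
P0 <- t(chol(Sigma))                   # lower triangular: ordering matters
irf <- lapply(0:5, function(h) {
  Ph <- diag(2)
  for (k in seq_len(h)) Ph <- Ph %*% Phi
  Ph %*% P0                            # response matrix at horizon h
})
```

    Reordering the variables changes \(P0\) (and hence the IRFs), which is exactly why the factor ordering described above matters for the orthogonalized outputs.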


    The bond yield decomposition can be performed with respect to two measures of risk compensation: term premia and forward premia. While the term premium is derived directly from the bond yield levels, the forward premium is obtained from the decomposition of forward rates. A more formal presentation of both measures is provided in the Appendix.


    Users must specify the desired graph types in a character vector. Available options include: RiskFactors, Fit, IRF, FEVD, GIRF, GFEVD, and TermPremia. For example:

    DesiredGraphs <- c("Fit", "GIRF", "GFEVD", "TermPremia")

    Moreover, for all models, users must indicate the types of variables of interest (yields, risk factors, or both). For JLL-type models specifically, users must also specify whether to include the orthogonalized versions. Each of these options should be set to TRUE to generate the corresponding graphs, and FALSE otherwise.

    WishGraphRiskFac <- FALSE
    WishGraphYields <- TRUE
    WishOrthoJLLgraphs <- FALSE

    The desired graphical outputs are stored in the selected folder, Folder2Save. Alternatively, users can display the desired plots directly in the console without saving them to Folder2Save by using the autoplot() method.


    Bootstrap settings


    Horowitz (2019) shows that bootstrap methods generally produce more accurate statistical inference than methods based on asymptotic distribution theory. To generate confidence intervals using the bootstrap, via the Bootstrap() function, an additional list of inputs must be provided:

    • Desired bootstrap procedure (methodBS): the user must select one of the following options: (i) standard residual bootstrap (bs); (ii) wild bootstrap (wild); or (iii) block bootstrap (block). If the block bootstrap is selected, the block length must also be specified. The residual bootstrap is a conventional method that is straightforward to implement when a parametric model, such as a VAR model, is available. The block bootstrap makes weaker assumptions about the data-generating process and is well-suited to handling both weak and strong serial dependence. The wild bootstrap is particularly appropriate for data exhibiting heteroskedasticity (Horowitz 2019);

    • Number of bootstrap draws (ndraws): Kilian and Lütkepohl (2017) suggest that, in VAR specifications, ndraws can range from a few hundred to several thousand, depending on factors such as sample size, lag order, and the desired quantiles of the distribution. Illustrating this, CM (2024) set ndraws = 1,000 in their ATSM to construct confidence intervals for IRFs;

    • Confidence level (pctg): the desired confidence level, expressed in percentage points. Common choices in VAR-related setups include \(68\%\), \(90\%\) and \(95\%\) (Kilian and Lütkepohl 2017).
    Bootlist <- list(methodBS = 'block', BlockLength = 4, ndraws = 1000, pctg = 95)

    Out-of-sample forecast settings


    To generate bond yield forecasts, use ForecastYields() with the following inputs:

    • Forecast horizon (ForHoriz): number of forecast horizons, in periods;

    • Index of the first observation (t0Sample): time index of the first observation included in the information set;

    • Index of the last observation (t0Forecast): time index of the last observation in the information set used to generate the first forecast;

    • Method used for forecast computation (ForType): forecasts can be generated using either a rolling or an expanding window. To use a rolling window, set this parameter to Rolling; in this case, the sample length for each forecast is fixed and defined by t0Sample. For expanding window forecasts, set this input to Expanding, allowing the information set to increase at each forecast iteration.
    ForecastList <- list(ForHoriz = 12, t0Sample = 1, t0Forecast = 70, ForType = "Rolling")
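    The difference between the two schemes can be made concrete by listing the index sets of the first three forecast rounds, a base-R sketch using the toy values t0Sample = 1 and t0Forecast = 70:

```r
# Information sets for the first three forecast rounds under each scheme.
t0Sample <- 1; t0Forecast <- 70
rolling   <- lapply(0:2, function(h) (t0Sample + h):(t0Forecast + h))
expanding <- lapply(0:2, function(h) t0Sample:(t0Forecast + h))
lengths(rolling)     # fixed sample length across rounds
lengths(expanding)   # sample length grows by one each round
```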

    6 Model estimation


    Using the dataset described in Section 4, the estimation of the ATSM proceeds in three main steps. First, the country-specific spanned factors are estimated, which, along with the global and domestic unspanned factors, form the complete set of risk factors used in the subsequent estimation steps. Second, the package estimates the parameters governing the dynamics of the risk factors under the \(\mathbb{P}\)-measure. Finally, it optimizes the full ATSM specification, including the parameters under the \(\mathbb{Q}\)-measure.


    As will be made clear in Section 7, although the functions introduced in this section can be used individually, they are primarily designed to be used together with the broader set of functions available in the MultiATSM package. However, as these functions play a central role in the package structure, they warrant a dedicated section.


    6.1 Spanned factors


    The spanned factors for country \(i\), denoted by \(\boldsymbol{P_{i,t}}\), are typically obtained as the first \(N\) principal components (PCs) of the observed bond yields. The PC method provides orthogonal linear combinations of the original variables, ordered by their ability to capture the variance in the data. Formally, \(\boldsymbol{P_{i,t}}\) is computed as \(\boldsymbol{P_{i,t}} = w_i \boldsymbol{Y_{i,t}}\), where yields are ordered by increasing maturity in \(\boldsymbol{Y_{i,t}}\), and \(w_i\) is the matrix of eigenvectors derived from the covariance matrix of \(\boldsymbol{Y_{i,t}}\).
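    The construction described above can be sketched in base R on simulated data; the simulated matrix merely stands in for a \(T \times J\) panel of yields:

```r
# Spanned factors as principal components: w holds the eigenvectors of the
# sample covariance matrix, and the resulting factors are uncorrelated in sample.
set.seed(2)
Ysim <- matrix(rnorm(200 * 6), 200, 6)       # stand-in for T x J yields
eig  <- eigen(cov(Ysim))
w    <- t(eig$vectors[, 1:3])                # rows: first three PC weight vectors
P    <- Ysim %*% t(w)                        # spanned factors, stacked over time
cov(P)[upper.tri(diag(3))]                   # off-diagonals are numerically zero
```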


    In the case of \(N = 3\), the spanned factors are traditionally interpreted as level, slope, and curvature components of the yield curve (Litterman and Scheinkman 1991). This interpretation stems from the properties of the \(w_i\) matrix, as illustrated below:

    data('Yields')
    w <- pca_weights_one_country(Yields, Economy = "Uruguay")

    In matrix w, each row holds the weights for constructing a spanned factor. The first row relates to the level factor, with weights loading roughly equally across maturities. As such, high (low) values of the level factor indicate an overall high (low) value of yields across all maturities. The second row features increasing weights with maturity, capturing the slope of the yield curve: high values indicate steep curves, while low values reflect flat or inverted curves. The third row corresponds to the curvature factor, with weights emphasizing medium-term maturities. This captures the ‘hump-shaped’ features of the yield curve typically associated with changes in its curvature. These concepts are also graphically illustrated in Figure 1.

    +
    +
    +Yield loadings on the spanned factors. Example using bond yield data for Uruguay. Graph generated using the ggplot2 package [@ggplot22016]. +

    +Figure 1: Yield loadings on the spanned factors. Example using bond yield data for Uruguay. Graph generated using the ggplot2 package (Wickham 2016). +


    The user can directly obtain the time series of the country-specific spanned factors by calling Spanned_Factors(), as shown below:

data('Yields')
Economies <- c("China", "Brazil", "Mexico", "Uruguay")
N <- 2
SpaFact <- Spanned_Factors(Yields, Economies, N)

    6.2 The P-dynamics estimation


    As presented in Table 1 and explained in detail in Section 2, the dynamics of the risk factors under the \(\mathbb{P}\)-measure in the available models follow a VAR(1) process. This specification can be fully unrestricted, as in the JPS-related models, or subject to restrictions, as in the GVAR and JLL frameworks. This subsection illustrates how each of these model configurations is implemented.


    VAR


    To use VAR(), the user needs to select the appropriate set of risk factors for the model being estimated and specify unconstrained in the argument VARtype. In the two examples presented below, the outputs are the intercept vector, the feedback matrix, and the variance–covariance matrix for a VAR(1) model under the \(\mathbb{P}\)-measure:

## Example 1: "JPS global" and "JPS multi" models
data("RiskFacFull")
PdynPara <- VAR(RiskFacFull, VARtype = "unconstrained")

## Example 2: "JPS original" model for China
FactorsChina <- RiskFacFull[1:7, ]
PdynPara <- VAR(FactorsChina, VARtype = "unconstrained")
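Conceptually, the unconstrained case reduces to equation-by-equation OLS on the lagged factors. The base-R sketch below (simulated data, not the package's internal code) recovers the three outputs named above; the names K0, K1, and SSZ are illustrative:

```r
# OLS estimation of an unconstrained VAR(1): Z_t = K0 + K1 Z_{t-1} + e_t.
set.seed(1)
T_obs <- 500; K <- 3
A <- diag(0.5, K)                          # true feedback matrix
Z <- matrix(0, T_obs, K)
for (t in 2:T_obs) Z[t, ] <- Z[t - 1, ] %*% t(A) + rnorm(K, sd = 0.1)

X  <- cbind(1, Z[-T_obs, ])                # regressors: intercept + lagged factors
Yv <- Z[-1, ]                              # left-hand-side variables
B  <- solve(t(X) %*% X, t(X) %*% Yv)       # stacked coefficients [K0; t(K1)]
K0 <- B[1, ]                               # intercept vector
K1 <- t(B[-1, ])                           # feedback matrix
SSZ <- crossprod(Yv - X %*% B) / (nrow(X) - ncol(X))  # residual covariance

round(diag(K1), 2)                         # close to the true value 0.5
```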

    GVAR


    The GVAR() function estimates a GVAR(1) model constructed from country-specific VARX\(^{*}(1,1,1)\) specifications. It requires two main inputs: the number of domestic spanned factors (\(N\)) and a set of elements grouped in the GVARinputs list. The latter consists of four components:

1. Economies: a \(C\)-dimensional character vector containing the names of the economies present in the economic system;

2. GVAR list of risk factors: a list of risk factors sorted by country in addition to the global variables. An example of the expected data structure is:

    data("GVARFactors")

To assist in formatting the data accordingly, users may use the DatabasePrep() function;

3. VARX type: a character vector specifying the desired structure of the VARX\(^{*}\) model. Two general options are available:

    • Fully unconstrained: specify as unconstrained. This option estimates each equation in the system separately via ordinary least squares, without imposing any restrictions.

    • With constraints: imposes a specific set of zero restrictions on the feedback matrix. This category includes two sub-options: (a) constrained: Spanned Factors prevents foreign spanned factors from affecting any domestic risk factor; (b) constrained: [factor name] restricts the specified risk factor to be influenced only by its own lags and the lags of its associated star variables. In both cases, the VARX\(^{*}\) is estimated using restricted least squares.

    data('GVARFactors')
    GVARinputs <- list(Economies = Economies, GVARFactors = GVARFactors,
                       VARXtype = "constrained: Inflation")
4. Transition matrix: a \(C \times C\) matrix that captures the degree of interdependence across the countries in the system. Each entry \((i,j)\) represents the strength of the dependence of economy \(i\) on economy \(j\). As an example, the matrix below is computed from bilateral trade flow data, averaged over the period 2006–2019, for a system comprising China, Brazil, Mexico, and Uruguay. Each row is normalized so that the weights sum to \(1\) for each country. The transition matrix can be generated using Transition_Matrix(), as illustrated in the Appendix:

             China Brazil Mexico Uruguay
    China   0.0000 0.6549 0.3155  0.0296
    Brazil  0.8269 0.0000 0.1234  0.0497
    Mexico  0.8596 0.1326 0.0000  0.0078
    Uruguay 0.3811 0.5498 0.0691  0.0000
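The normalization behind such a matrix is straightforward: each row of the raw bilateral trade-flow matrix is divided by its row sum. A toy base-R sketch with hypothetical trade figures (Transition_Matrix() performs this on the actual trade data):

```r
# Row-normalizing a (hypothetical) bilateral trade-flow matrix:
# entry (i, j) becomes trade of i with j / total trade of economy i.
trade <- matrix(c(  0, 442, 213,  20,
                  250,   0,  37,  15,
                  310,  48,   0,   3,
                   27,  39,   5,   0),
                nrow = 4, byrow = TRUE,
                dimnames = list(c("China", "Brazil", "Mexico", "Uruguay"),
                                c("China", "Brazil", "Mexico", "Uruguay")))
W <- trade / rowSums(trade)   # each row now sums to 1; the diagonal stays 0
rowSums(W)                    # all equal to 1
```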

With the inputs specified, the user can estimate a GVAR model as follows:

data("GVARFactors")
GVARinputs <- list(Economies = Economies, GVARFactors = GVARFactors,
                   VARXtype = "unconstrained", Wgvar = W_gvar)
N <- 3
GVARpara <- GVAR(GVARinputs, N, CheckInputs = TRUE)

    Note that the CheckInputs parameter should be set to TRUE to perform a consistency check on the inputs specified in GVARinputs prior to the \(\mathbb{P}\)-dynamics estimation.


    JLL


    The JLL() function estimates the physical parameters. Required inputs are:

1. Risk factors: a time series matrix of the risk factors in their non-orthogonalized form;

2. Number of spanned factors (\(N\)): a scalar representing the number of country-specific spanned factors;

3. JLLinputs: a list object containing the following elements:
• Economies: a \(C\)-dimensional character vector listing the economies;

• Dominant Economy: a character vector indicating either the name of the country assigned as the dominant economy (for JLL original and JLL jointSigma models), or None (for the JLL No DomUnit case);

• Estimate Sigma Matrices: a logical equal to TRUE if the user wishes to estimate the full set of JLL sigma matrices (i.e., variance-covariance and Cholesky factor matrices), and FALSE otherwise. Since this numerical estimation is costly, it may significantly increase computation time;

• Precomputed Variance-Covariance Matrix: in some instances, a precomputed variance-covariance matrix from the non-orthogonalized dynamics can be supplied here to save time and memory. If no such matrix is available, this input should be set to NULL;

• JLL type: a character string specifying the chosen JLL model, following the classification described in Table 1.
## First set the JLLinputs
ModelType <- "JLL original"
JLLinputs <- list(Economies = Economies, DomUnit = "China", WishSigmas = TRUE,
                  SigmaNonOrtho = NULL, JLLModelType = ModelType)

## Then, estimate the P-dynamics of the desired JLL model
data("RiskFacFull")
N <- 3
JLLpara <- JLL(RiskFacFull, N, JLLinputs, CheckInputs = TRUE)

    The CheckInputs input is set to TRUE to perform a consistency check on the inputs specified in JLLinputs before running the \(\mathbb{P}\)-dynamics estimation.


    6.3 ATSM estimation


Estimating the ATSM parameters involves maximizing the log-likelihood function with Optimization() to obtain the best-fitting model parameters. The unspanned risk factor framework of JPS (2014) (and, therefore, all of its multicountry extensions) follows a model parameterization similar to the one proposed in Joslin et al. (2011). In particular, it requires estimating six parameter blocks:

1. The risk-neutral long-run mean of the short rate (\(r0\));

2. The risk-neutral feedback matrix (\(K1XQ\));

3. The standard deviation of the measurement errors for yields observed with error (\(se\));

4. The variance-covariance matrix from the VAR process (\(SSZ\));

5. The intercept matrix of the physical dynamics (\(K0Z\));

6. The feedback matrix of the physical dynamics (\(K1Z\)).

    The parameters \(K0Z\) and \(K1Z\) have closed-form solutions. Similarly, \(r0\) and \(se\) are derived analytically and are factored out of the log-likelihood function. In contrast, the remaining parameters, \(K1XQ\) and \(SSZ\), must be estimated numerically.


    The optimization routine in MultiATSM combines the Nelder–Mead and L-BFGS-B algorithms, executed sequentially and repeated until convergence is achieved. At each iteration, the parameter vector yielding the highest likelihood is retained, enhancing robustness to local optima without resorting to full multi-start procedures. Convergence is achieved when the absolute change in the mean log-likelihood falls below a user-defined tolerance (default \(10^{-4}\)). For the bootstrap replications, the same optimization procedure is applied; however, only the Nelder–Mead algorithm is used to reduce computation time.
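The alternation-and-keep-best strategy can be sketched with stats::optim() on a placeholder objective. This is a schematic of the logic only, under an assumed tolerance of \(10^{-4}\); the package's actual likelihood evaluation and convergence bookkeeping are more involved:

```r
# Schematic: alternate Nelder-Mead and L-BFGS-B, keep the best parameter
# vector found so far, and stop once the improvement falls below tol.
f <- function(x) sum((x - c(1, 2))^2)      # placeholder objective (minimized)
x_best <- c(0, 0)
f_best <- f(x_best); f_prev <- Inf; tol <- 1e-4
repeat {
  for (m in c("Nelder-Mead", "L-BFGS-B")) {
    fit <- optim(x_best, f, method = m)
    if (fit$value < f_best) {              # retain the best point across methods
      x_best <- fit$par; f_best <- fit$value
    }
  }
  if (f_prev - f_best < tol) break         # converged: negligible improvement
  f_prev <- f_best
}
round(x_best, 3)                           # close to the minimizer c(1, 2)
```

Restarting each method from the current best point makes the routine robust to cases where one algorithm stalls, without the cost of a full multi-start search.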


    7 Full implementation of ATSMs


    7.1 Package workflow


    The complete workflow of the MultiATSM package is built around seven core functions, which together support a streamlined and modular process. An overview of these functions is provided below:

1. LabFac(): returns a list of risk factor labels used throughout the package. In particular, these labels assist in structuring sub-function inputs and generating variable and graph labels in a parsimonious manner;

2. InputsForOpt(): collects and processes the inputs needed to build the likelihood function as specified in Section 5. It estimates the model’s \(\mathbb{P}\)-dynamics and returns an object of class ATSMModelInputs, which includes print() and summary() S3 methods. The print() method summarizes model inputs and system features, while summary() reports statistics on risk factors and bond yields;

3. Optimization(): performs the estimation of the model parameters, primarily the \(\mathbb{Q}\)-dynamics, using numerical optimization. This function returns a comprehensive list of the model’s point estimates and can be computationally intensive;

4. InputsForOutputs(): an auxiliary function that compiles the necessary elements for producing numerical and graphical outputs. It also creates separate folders in the user’s Folder2Save directory to store the generated figures;

5. NumOutputs(): produces the numerical outputs selected in Section 5.3, based on the model’s point estimates. The function returns an object of class ATSMNumOutputs, for which an autoplot() S3 method is available. This method provides a convenient way to visualize the selected graphical outputs;

6. Bootstrap(): computes confidence bounds for the numerical outputs using the bootstrap procedures defined in Section 5.3 (subsection “Bootstrap settings”). The function returns an ATSMModelBoot object, which can be accessed via the autoplot() S3 method to generate the desired graphical outputs with confidence intervals. As this step involves repeated model estimation, it may require several hours (possibly days) to complete;

7. ForecastYields(): generates bond yield forecasts and the corresponding forecast errors according to the specifications outlined in Section 5.3 (subsection “Out-of-sample forecast settings”). This function returns an object of class ATSMModelForecast, accessible via the plot() S3 method, which displays Root Mean Squared Errors (RMSEs) by country and forecast horizon.
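As a rough illustration of the out-of-sample logic behind the last step, the base-R sketch below runs a rolling exercise on simulated data with a naive random-walk benchmark and collects RMSEs by horizon; ForecastYields() itself, of course, forecasts with the estimated ATSM rather than a random walk:

```r
# Rolling out-of-sample exercise on a toy AR(1) series: at each forecast
# origin, the "forecast" is the last observed value (random walk); errors
# are collected per horizon and summarized as one RMSE per horizon.
set.seed(42)
y <- as.numeric(arima.sim(list(ar = 0.9), n = 150))
t0 <- 100; h_max <- 12
origins <- t0:(length(y) - h_max)            # common set of forecast origins
errs <- sapply(1:h_max, function(h) y[origins + h] - y[origins])
RMSE <- sqrt(colMeans(errs^2))               # one RMSE per forecast horizon
round(RMSE, 2)
```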

    7.2 Complete implementation


This section illustrates how to fully implement ATSMs using the MultiATSM package. A simplified two-country JPS original framework serves as the example. The implementation steps are outlined below, and a sample of the graphical outputs is presented in Figures 2–5.

library(MultiATSM)
# 1) USER INPUTS
# A) Load database data
LoadData("CM_2024")

# B) GENERAL model inputs
ModelType <- "JPS original"
Economies <- c("China", "Brazil")
GlobalVar <- c("Gl_Eco_Act")
DomVar <- c("Eco_Act")
N <- 2
t0_sample <- "01-05-2005"
tF_sample <- "01-12-2019"
OutputLabel <- "Test"
DataFreq <- "Monthly"
Folder2Save <- NULL
StatQ <- FALSE

# B.1) SPECIFIC model inputs
# GVAR-based models
GVARlist <- list(VARXtype = "unconstrained", W_type = "Sample Mean", t_First_Wgvar = "2005",
                 t_Last_Wgvar = "2019", DataConnectedness = TradeFlows)

# JLL-based models
JLLlist <- list(DomUnit = "China")

# BRW inputs
WishBC <- FALSE
BRWlist <- within(list(Cent_Measure = "Mean", gamma = 0.05, N_iter = 250, B = 50, checkBRW = TRUE,
                       B_check = 1000, Eigen_rest = 1), N_burn <- round(N_iter * 0.15))

# C) Decide on Settings for numerical outputs
WishFPremia <- TRUE
FPmatLim <- c(60, 120)

Horiz <- 30
DesiredGraphs <- c()
WishGraphRiskFac <- FALSE
WishGraphYields <- FALSE
WishOrthoJLLgraphs <- FALSE

# D) Bootstrap settings
WishBootstrap <- TRUE
BootList <- list(methodBS = 'bs', BlockLength = 4, ndraws = 5, pctg = 95)

# E) Out-of-sample forecast
WishForecast <- TRUE
ForecastList <- list(ForHoriz = 12, t0Sample = 1, t0Forecast = 162, ForType = "Rolling")

##########################################################################################
# NO NEED TO MAKE CHANGES FROM HERE:
# The sections below automatically process the inputs provided above, run the model
# estimation, generate the numerical and graphical outputs, and save results.

# 2) Minor preliminary work: get the sets of factor labels
FactorLabels <- LabFac(N, DomVar, GlobalVar, Economies, ModelType)

# 3) Prepare the inputs of the likelihood function
ATSMInputs <- InputsForOpt(t0_sample, tF_sample, ModelType, Yields, GlobalMacro,
                           DomMacro, FactorLabels, Economies, DataFreq, GVARlist,
                           JLLlist, WishBC, BRWlist)

# 4) Optimization of the ATSM (Point Estimates)
ModelParaList <- Optimization(ATSMInputs, StatQ, DataFreq, FactorLabels, Economies, ModelType)

# 5) Numerical and graphical outputs
# a) Prepare list of inputs for graphs and numerical outputs
InputsForOutputs <- InputsForOutputs(ModelType, Horiz, DesiredGraphs, OutputLabel, StatQ,
                                     DataFreq, WishGraphYields, WishGraphRiskFac,
                                     WishOrthoJLLgraphs, WishFPremia,
                                     FPmatLim, WishBootstrap, BootList,
                                     WishForecast, ForecastList)

# b) Fit, IRF, FEVD, GIRF, GFEVD, and Term Premia
NumericalOutputs <- NumOutputs(ModelType, ModelParaList, InputsForOutputs,
                               FactorLabels, Economies, Folder2Save)

# c) Confidence intervals (bootstrap analysis)
BootstrapAnalysis <- Bootstrap(ModelType, ModelParaList, NumericalOutputs, Economies,
                               InputsForOutputs, FactorLabels, JLLlist, GVARlist,
                               WishBC, BRWlist, Folder2Save)

# 6) Out-of-sample forecasting
Forecasts <- ForecastYields(ModelType, ModelParaList, InputsForOutputs, FactorLabels,
                            Economies, JLLlist, GVARlist, WishBC, BRWlist,
                            Folder2Save)
    +
Figure 2: Chinese bond yield maturities with model fit comparisons. Model-fit reflects estimation using only risk-neutral (\(\mathbb{Q}\)) dynamics parameters, while Model-Implied incorporates both physical (\(\mathbb{P}\)) and risk-neutral (\(\mathbb{Q}\)) dynamics. The \(x\)-axes represent time in months and the \(y\)-axis is in natural units.

Figure 3: IRFs from the Brazilian bond yields to global economic activity. Size of the shock is one standard deviation. The black lines are the point estimates. Gray dashed lines are the bounds of the 95% confidence intervals and the green lines correspond to the median of these intervals. The \(x\)-axes are expressed in months and the \(y\)-axis is in natural units.

Figure 4: FEVD from the Brazilian bond yield with maturity 60 months. The \(x\)-axis represents the forecast horizon in months and the \(y\)-axis is in natural units.

Figure 5: Chinese sovereign yield curve decomposition showing (i) expected future short rates and (ii) term premia components. The \(x\)-axis represents time in months and the \(y\)-axis is expressed in percentage points.


    8 Concluding remarks


The MultiATSM package aims to advance yield curve (term structure) modelling within the R programming environment. It provides a comprehensive yet user-friendly toolkit for practitioners, academics, and policymakers, featuring estimation routines and detailed outputs across several macrofinance model classes. This allows for an in-depth exploration of the relationship between real-economy developments and fixed-income markets.


    The package covers eight classes of macrofinance term structure models, all built upon the single-country unspanned macroeconomic risk framework of Joslin et al. (2014), which is also extended to a multicountry setting. Additional multicountry variants based on Jotikasthira et al. (2015) and Candelon and Moura (2024) are included, incorporating, respectively, a dominant economy and a GVAR structure to model cross-country interdependence.


Each model class provides analytical outputs that offer insight into term structure dynamics, including plots of model fit, risk premia, impulse responses, and forecast error variance decompositions. The MultiATSM package also offers bootstrap procedures for confidence interval construction and out-of-sample forecasting of bond yields.


    Acknowledgments


    I thank the editor, Rob Hyndman, and an anonymous referee for several helpful comments. I am also grateful to Bertrand Candelon, Adhir Dhoble and Gustavo Torregrosa for many insightful discussions. An earlier version of this paper circulated under the title MultiATSM: An R Package for Arbitrage-Free Multicountry Affine Term Structure of Interest Rate Models with Unspanned Macroeconomic Risk and was part of the author’s PhD dissertation at UCLouvain (Moura 2022). The views expressed in this paper are those of the author and do not necessarily reflect those of Banco de Mexico.


    9 Appendix


    A: Supplementary functions


    Importing data from Excel files


    The MultiATSM package also provides an automated procedure for importing data from Excel files via Load_Excel_Data() and preparing the risk factor database used directly in the model estimation. To ensure compatibility with the package functions, the following requirements must be met:

1. Databases must be organized in separate Excel files: one for unspanned factors and another for term structure data. For GVAR-based models, a third file containing the interdependence measures is also required;

2. Each Excel file should include one tab per country. In the case of unspanned factors, an additional tab must be included for the global variables if the user opts to incorporate them;

3. Variable names must be identical across all tabs within each file.

An example Excel file meeting these requirements is provided with the package. Below is an example of how to import the data from Excel and construct the input list to be supplied:

MacroData  <- Load_Excel_Data(system.file("extdata", "MacroData.xlsx",
                                          package = "MultiATSM"))
YieldsData <- Load_Excel_Data(system.file("extdata", "YieldsData.xlsx",
                                          package = "MultiATSM"))
ModelType <- "JPS original"
Initial_Date <- "2006-09-01"
Final_Date <- "2019-01-01"
DataFrequency <- "Monthly"
GlobalVar <- c("GBC", "VIX")
DomVar <- c("Eco_Act", "Inflation", "Com_Prices", "Exc_Rates")
N <- 3
Economies <- c("China", "Mexico", "Uruguay", "Brazil", "Russia")

    These inputs are used to construct the RiskFactorsSet variable, which holds the full collection of risk factors required by the model.

FactorLabels <- LabFac(N, DomVar, GlobalVar, Economies, ModelType)
RiskFactorsSet <- DataForEstimation(Initial_Date, Final_Date, Economies, N, FactorLabels,
                                    ModelType, DataFrequency, MacroData, YieldsData)

    Transition matrix and star factors


    To construct the transition matrix for GVAR specifications, the user can employ Transition_Matrix(). This function requires:

1. Data selection: choose proxies for cross-country interdependence.

2. Time frame: specify the sample’s start and end dates.

3. Dependence measure: select from:

    • Time-varying (dynamic weights)

    • Sample Mean (static average)

    • A numeric scalar (fixed-year snapshot).
data("TradeFlows")
t_First <- "2006"
t_Last <- "2019"
Economies <- c("China", "Brazil", "Mexico", "Uruguay")
type <- "Sample Mean"
W_gvar <- Transition_Matrix(t_First, t_Last, Economies, type, TradeFlows)

    Note that if data is missing for any country in a given year, the corresponding transition matrix will contain only NAs.


A more flexible approach to modelling interdependence is to allow the transition matrix to vary over time. In this case, the star factors are constructed using trade flow weights specific to each year, adjusting the corresponding year’s risk factors accordingly. To enable this feature, users must set the type argument to Time-varying and specify the same year for both the initial and final periods of the transition matrix. This indicates that the trade weights from that particular year are used when solving the GVAR system (i.e., in the construction of the link matrices; see Equation (11)).


    B: Additional theoretical considerations


    Bond yield decomposition


The MultiATSM package allows for the calculation of two risk compensation measures: term premia and forward premia. Assume that an \(n\)-maturity bond yield can be decomposed into two components: the expected short rate (\(\mathrm{Exp}_{i,t}^{(n)}\)) and the term premium (\(\mathrm{TP}_{i,t}^{(n)}\)). Technically:
\[
y_{i,t}^{(n)} = \mathrm{Exp}_{i,t}^{(n)} + \mathrm{TP}_{i,t}^{(n)} \text{.}
\]
In the package’s standard form, the expected short-rate term is computed from time \(t\) to \(t+n\), where \(n\) is the bond’s maturity: \(\mathrm{Exp}_{i,t}^{(n)} = \sum_{h=0}^{n} E_t[y_{i, t+h}^{(1)}]\). Alternatively, the decomposition for the forward rates (\(f_{i,t}^{(n)}\)) is \(f_{i,t}^{(n)} = \sum_{h=m}^{n} E_t[y_{i,t+h}^{(1)}] + \mathrm{FP}_{i,t}^{(n)}\), where \(\mathrm{FP}_{i,t}^{(n)}\) corresponds to the forward premium. In this case, the user must specify TRUE if the computation of forward premia is desired, or FALSE otherwise. If set to TRUE, the user must also provide a two-element numerical vector containing the maturities corresponding to the starting and ending dates of the bond maturity. Example:

WishFPremia <- TRUE
FPmatLim <- c(60, 120)
    C: Replication of existing research


Joslin, Priebsch and Singleton (2014)


    The dataset used in this replication was constructed by Bauer and Rudebusch (2017) (henceforth BR, 2017) and is available on Bauer’s website. In their paper, BR (2017) investigate whether macrofinance term structure models are better suited to the unspanned macro risk framework of JPS (2014) or to earlier, traditional spanned settings such as Ang and Piazzesi (2003). To that end, BR (2017) replicate selected empirical results from JPS (2014). The corresponding R code is also available on Bauer’s website.


    Using the dataset from BR (2017), the code below applies the MultiATSM package to estimate the key ATSM parameters following the JPS original modelling setup.

# 1) INPUTS
# A) Load database data
LoadData("BR_2017")

# B) GENERAL model inputs
ModelType <- "JPS original"

Economies <- c("US")
GlobalVar <- c()
DomVar <- c("GRO", "INF")
N <- 3
t0_sample <- "January-1985"
tF_sample <- "December-2007"
DataFreq <- "Monthly"
StatQ <- FALSE

# 2) Minor preliminary work
FactorLabels <- LabFac(N, DomVar, GlobalVar, Economies, ModelType)
Yields <- t(BR_jps_out$Y)
DomesticMacroVar <- t(BR_jps_out$M.o)
GlobalMacroVar <- c()

# 3) Prepare the inputs of the likelihood function
ATSMInputs <- InputsForOpt(t0_sample, tF_sample, ModelType, Yields, GlobalMacroVar,
                           DomesticMacroVar, FactorLabels, Economies, DataFreq)

# 4) Optimization of the model
ModelPara <- Optimization(ATSMInputs, StatQ, DataFreq, FactorLabels, Economies, ModelType)

The tables below compare the ATSM parameter estimates generated by BR (2017) and by MultiATSM. Table 2 reports the risk-neutral parameters. While the values do not match exactly, the differences are well within convergence tolerance and arguably economically negligible. Table 3, by contrast, contains the parameters related to the model’s time-series dynamics. As these are derived in closed form, the estimates are identical under both implementations.

Table 2: \(Q\)-dynamics parameters

               MultiATSM   BR (2017)
\(r_0\)           0.0006     −0.0002
\(\lambda_1\)     0.9967      0.9968
\(\lambda_2\)     0.9149      0.9594
\(\lambda_3\)     0.9149      0.8717

Note: the \(\lambda\)’s are the eigenvalues of the risk-neutral feedback matrix and \(r_0\) is the long-run mean of the short rate under \(\mathbb{Q}\).
Table 3: \(P\)-dynamics parameters

           \(K0Z\)                    \(K1Z\)
                       PC1      PC2      PC3      GRO      INF
BR (2017)
PC1        0.0781   0.9369  −0.0131  −0.0218   0.1046   0.1003
PC2        0.0210   0.0058   0.9781   0.1703  −0.1672  −0.0402
PC3        0.1005  −0.0104  −0.0062   0.7835  −0.0399   0.0437
GRO        0.0690  −0.0048   0.0180  −0.1112   0.8818  −0.0025
INF        0.0500   0.0018   0.0064  −0.0592   0.0277   0.9859
MultiATSM
PC1        0.0781   0.9369  −0.0131  −0.0218   0.1046   0.1003
PC2        0.0210   0.0058   0.9781   0.1703  −0.1672  −0.0402
PC3        0.1005  −0.0104  −0.0062   0.7835  −0.0399   0.0437
GRO        0.0690  −0.0048   0.0180  −0.1112   0.8818  −0.0025
INF        0.0500   0.0018   0.0064  −0.0592   0.0277   0.9859

Note: \(K0Z\) is the intercept and \(K1Z\) is the feedback matrix from the \(P\)-dynamics.

For replicability, it is important to note that the physical dynamics results reported in Table 3 using MultiATSM rely on the principal component weights provided by BR (2017). That matrix is simply a scaled version of the one produced by pca_weights_one_country() in the current package. Accordingly, despite the numerical differences in the weight matrices, both methods generate time series of spanned factors that are perfectly correlated. Another difference between the two approaches relates to the construction of the log-likelihood function: while the BR (2017) code expresses it in terms of a portfolio of yields, the MultiATSM package generates this same input directly as a function of observed yields (i.e., both procedures lead to equivalent log-likelihood values up to the Jacobian term).


Additionally, it is worth highlighting that the standard deviations of the measurement errors for the portfolios of yields observed with error are nearly identical: 0.0000546 for MultiATSM and 0.0000550 for BR (2017).


    Candelon and Moura (2024)


    The multicountry framework introduced in Candelon and Moura (2024) enhances the tractability of large-scale ATSMs and deepens our understanding of the global economic mechanisms driving domestic yield curve fluctuations. This framework also generates more precise model estimates and enhances the forecasting capabilities of these models. This novel setup, embodied by the GVAR multi model class, is benchmarked against the findings of Jotikasthira et al. (2015), which are captured by the JLL original model class. The paper showcases an empirical illustration involving China, Brazil, Mexico, and Uruguay.

# 1) INPUTS
# A) Load database data
LoadData("CM_2024")

# B) GENERAL model inputs
ModelType <- "GVAR multi"
Economies <- c("China", "Brazil", "Mexico", "Uruguay")
GlobalVar <- c("Gl_Eco_Act", "Gl_Inflation")
DomVar <- c("Eco_Act", "Inflation")
N <- 3
t0_sample <- "01-06-2004"
tF_sample <- "01-01-2020"
OutputLabel <- "CM_jfec"
DataFreq <- "Monthly"
StatQ <- FALSE

# B.1) SPECIFIC model inputs
# GVAR-based models
GVARlist <- list(VARXtype = "unconstrained", W_type = "Sample Mean", t_First_Wgvar = "2004",
                 t_Last_Wgvar = "2019", DataConnectedness = TradeFlows)

# JLL-based models
JLLlist <- list(DomUnit = "China")

# BRW inputs
WishBC <- TRUE
BRWlist <- within(list(Cent_Measure = "Mean", gamma = 0.001, N_iter = 200, B = 50, checkBRW = TRUE,
                       B_check = 1000, Eigen_rest = 1), N_burn <- round(N_iter * 0.15))

# C) Decide on Settings for numerical outputs
WishFPremia <- TRUE
FPmatLim <- c(24, 36)

Horiz <- 25
DesiredGraphs <- c("GIRF", "GFEVD", "TermPremia")
WishGraphRiskFac <- FALSE
WishGraphYields <- TRUE
WishOrthoJLLgraphs <- TRUE

# D) Bootstrap settings
WishBootstrap <- FALSE
BootList <- list(methodBS = 'bs', BlockLength = 4, ndraws = 1000, pctg = 95)

# E) Out-of-sample forecast
WishForecast <- TRUE
ForecastList <- list(ForHoriz = 12, t0Sample = 1, t0Forecast = 100, ForType = "Rolling")

# 2) Minor preliminary work: get the sets of factor labels and a vector of common maturities
FactorLabels <- LabFac(N, DomVar, GlobalVar, Economies, ModelType)

# 3) Prepare the inputs of the likelihood function
ATSMInputs <- InputsForOpt(t0_sample, tF_sample, ModelType, Yields, GlobalMacro,
                           DomMacro, FactorLabels, Economies, DataFreq,
                           GVARlist, JLLlist, WishBC, BRWlist)

# 4) Optimization of the ATSM (Point Estimates)
ModelParaList <- Optimization(ATSMInputs, StatQ, DataFreq, FactorLabels, Economies, ModelType)

# 5) Numerical and graphical outputs
# a) Prepare list of inputs for graphs and numerical outputs
InputsForOutputs <- InputsForOutputs(ModelType, Horiz, DesiredGraphs, OutputLabel, StatQ,
                                     DataFreq, WishGraphYields, WishGraphRiskFac,
                                     WishOrthoJLLgraphs, WishFPremia, FPmatLim,
                                     WishBootstrap, BootList, WishForecast,
                                     ForecastList)

# b) Fit, IRF, FEVD, GIRF, GFEVD, and Term Premia
NumericalOutputs <- NumOutputs(ModelType, ModelParaList, InputsForOutputs,
                               FactorLabels, Economies)

# c) Confidence intervals (bootstrap analysis)
BootstrapAnalysis <- Bootstrap(ModelType, ModelParaList, NumericalOutputs, Economies,
                               InputsForOutputs, FactorLabels, JLLlist, GVARlist,
                               WishBC, BRWlist)

# 6) Out-of-sample forecasting
Forecasts <- ForecastYields(ModelType, ModelParaList, InputsForOutputs, FactorLabels,
                            Economies, JLLlist, GVARlist, WishBC, BRWlist)
    Candelon and Moura (2023)

In this paper, Candelon and Moura (2023) investigate the underlying factors that shape the sovereign yield curves of Brazil, India, Mexico, and Russia during the COVID-19 pandemic. The study adopts a GVAR multi approach to capture the complex global macrofinancial and, especially, health-related interdependencies observed during the pandemic.

# 1) INPUTS
# A) Load database data
LoadData("CM_2023")

# B) GENERAL model inputs
ModelType <- "GVAR multi"
Economies <- c("Brazil", "India", "Russia", "Mexico")
GlobalVar <- c("US_Output_growth", "China_Output_growth", "SP500")
DomVar <- c("Inflation", "Output_growth", "CDS", "COVID")
N <- 2
t0_sample <- "22-03-2020"
tF_sample <- "26-09-2021"
OutputLabel <- "CM_EM"
DataFreq <- "Weekly"
StatQ <- FALSE

# B.1) SPECIFIC model inputs
# GVAR-based models
GVARlist <- list(VARXtype = "constrained: COVID", W_type = "Sample Mean",
                 t_First_Wgvar = "2015", t_Last_Wgvar = "2020",
                 DataConnectedness = TradeFlows_covid)

# BRW inputs
WishBC <- FALSE

# C) Decide on settings for numerical outputs
WishFPremia <- TRUE
FPmatLim <- c(47, 48)

Horiz <- 12
DesiredGraphs <- c("GIRF", "GFEVD", "TermPremia")
WishGraphRiskFac <- FALSE
WishGraphYields <- TRUE
WishOrthoJLLgraphs <- FALSE

# D) Bootstrap settings
WishBootstrap <- TRUE
BootList <- list(methodBS = 'bs', BlockLength = 4, ndraws = 100, pctg = 95)

# 2) Minor preliminary work: get the sets of factor labels and a vector of common maturities
FactorLabels <- LabFac(N, DomVar, GlobalVar, Economies, ModelType)

# 3) Prepare the inputs of the likelihood function
ATSMInputs <- InputsForOpt(t0_sample, tF_sample, ModelType, Yields_covid, GlobalMacro_covid,
                           DomMacro_covid, FactorLabels, Economies, DataFreq, GVARlist)

# 4) Optimization of the ATSM (point estimates)
ModelParaList <- Optimization(ATSMInputs, StatQ, DataFreq, FactorLabels, Economies, ModelType)

# 5) Numerical and graphical outputs
# a) Prepare list of inputs for graphs and numerical outputs
InputsForOutputs <- InputsForOutputs(ModelType, Horiz, DesiredGraphs, OutputLabel, StatQ,
                                     DataFreq, WishGraphYields, WishGraphRiskFac,
                                     WishOrthoJLLgraphs, WishFPremia, FPmatLim,
                                     WishBootstrap, BootList)

# b) Fit, IRF, FEVD, GIRF, GFEVD, and term premia
NumericalOutputs <- NumOutputs(ModelType, ModelParaList, InputsForOutputs, FactorLabels,
                               Economies)

# c) Confidence intervals (bootstrap analysis)
BootstrapAnalysis <- Bootstrap(ModelType, ModelParaList, NumericalOutputs, Economies,
                               InputsForOutputs, FactorLabels,
                               JLLlist = NULL, GVARlist)

    9.1 Supplementary materials


Supplementary materials are available in addition to this article. They can be downloaded at RJ-2025-044.zip


    9.2 CRAN packages used


    MultiATSM, YieldCurve, fBonds, statespacer, simStateSpace, vars, MTS, svars, bigtime, BigVAR, Spillover, BGVAR


    9.3 CRAN Task Views implied by cited packages


    Econometrics, Finance, TimeSeries

M. Abbritti, S. Dell’Erba, A. Moreno and S. Sola. Global factors in the term structure of interest rates. International Journal of Central Banking, 14(2): 301–339, 2018. URL https://www.ijcb.org/journal/ijcb18q1a7.htm.

T. Adrian, R. K. Crump and E. Moench. Pricing the term structure with linear regressions. Journal of Financial Economics, 110(1): 110–138, 2013. URL https://doi.org/10.1016/j.jfineco.2013.04.009.

A. Ang and M. Piazzesi. A no-arbitrage vector autoregression of term structure dynamics with macroeconomic and latent variables. Journal of Monetary Economics, 50(4): 745–787, 2003. URL https://doi.org/10.1016/S0304-3932(03)00032-1.

M. D. Bauer and G. D. Rudebusch. Resolving the spanning puzzle in macro-finance term structure models. Review of Finance, 21(2): 511–553, 2017. URL https://doi.org/10.1093/rof/rfw044.

M. D. Bauer, G. D. Rudebusch and J. C. Wu. Correcting estimation bias in dynamic term structure models. Journal of Business & Economic Statistics, 30(3): 454–467, 2012. URL https://doi.org/10.1080/07350015.2012.693855.

D. Beijers. statespacer: State space modelling in R. 2023. URL https://CRAN.R-project.org/package=statespacer. R package version 0.5.0.

M. Boeck, M. Feldkircher, F. Huber and D. Hosszejni. BGVAR: Bayesian global vector autoregressions. 2024. URL https://CRAN.R-project.org/package=BGVAR. R package version 2.5.8.

B. Candelon and R. Moura. A multicountry model of the term structures of interest rates with a GVAR. Journal of Financial Econometrics, 22(5): 1558–1587, 2024. URL https://doi.org/10.1093/jjfinec/nbae008.

B. Candelon and R. Moura. Sovereign yield curves and the COVID-19 in emerging markets. Economic Modelling, 127: 106453, 2023. URL https://doi.org/10.1016/j.econmod.2023.106453.

A. Chudik and M. H. Pesaran. Theory and practice of GVAR modelling. Journal of Economic Surveys, 30(1): 165–197, 2016. URL https://doi.org/10.1111/joes.12095.

Q. Dai and K. J. Singleton. Expectation puzzles, time-varying risk premia, and affine models of the term structure. Journal of Financial Economics, 63(3): 415–441, 2002. URL https://doi.org/10.1016/S0304-405X(02)00067-3.

Q. Dai and K. J. Singleton. Specification analysis of affine term structure models. Journal of Finance, 55(5): 1943–1978, 2000. URL https://doi.org/10.1111/0022-1082.00278.

D. Duffie and R. Kan. A yield-factor model of interest rates. Mathematical Finance, 6(4): 379–406, 1996. URL https://doi.org/10.1111/j.1467-9965.1996.tb00123.x.

S. S. Guirreri. YieldCurve: Modelling and estimation of the yield curve. 2015. URL https://CRAN.R-project.org/package=YieldCurve. R package version 4.1.

R. S. Gürkaynak and J. H. Wright. Macroeconomics and the term structure. Journal of Economic Literature, 50(2): 331–367, 2012. URL https://www.aeaweb.org/articles?id=10.1257/jel.50.2.331.

J. L. Horowitz. Bootstrap methods in econometrics. Annual Review of Economics, 11(1): 193–224, 2019. URL https://doi.org/10.1146/annurev-economics-080218-025651.

R. J. Hyndman and R. Killick. CRAN task view: Time series analysis. 2025. URL https://cran.r-project.org/web/views/TimeSeries.html.

S. Joslin, M. Priebsch and K. J. Singleton. Risk premiums in dynamic term structure models with unspanned macro risks. Journal of Finance, 69(3): 1197–1233, 2014. URL https://doi.org/10.1111/jofi.12131.

S. Joslin, K. J. Singleton and H. Zhu. A new perspective on Gaussian dynamic term structure models. Review of Financial Studies, 24(3): 926–970, 2011. URL https://doi.org/10.1093/rfs/hhq128.

C. Jotikasthira, A. Le and C. Lundblad. Why do term structures in different currencies co-move? Journal of Financial Economics, 115: 58–83, 2015. URL https://doi.org/10.1016/j.jfineco.2014.09.004.

L. Kilian and H. Lütkepohl. Structural vector autoregressive analysis. Cambridge University Press, 2017. URL https://doi.org/10.1017/9781108164818.

A. Lange, B. Dalheimer, H. Herwartz, S. Maxand and H. Riebl. svars: Data-driven identification of SVAR models. 2023. URL https://CRAN.R-project.org/package=svars. R package version 1.3.11.

R. Litterman and J. Scheinkman. Common factors affecting bond returns. Journal of Fixed Income, 1: 54–61, 1991. URL https://doi.org/10.3905/jfi.1991.692347.

R. Moura. MultiATSM: Multicountry term structure of interest rates models. 2025. URL https://CRAN.R-project.org/package=MultiATSM. R package version 1.5.1.

R. G. T. de Moura. Modelling the term structure of interest rates in a multicountry setting. 2022. URL https://dial.uclouvain.be/pr/boreal/object/boreal:262850.

C. R. Nelson and A. F. Siegel. Parsimonious modeling of yield curves. Journal of Business, 60(4): 473–489, 1987. URL https://www.jstor.org/stable/2352957.

W. Nicholson, D. Matteson and J. Bien. BigVAR: Dimension reduction methods for multivariate time series. 2025. URL https://CRAN.R-project.org/package=BigVAR. R package version 1.1.3.

H. H. Pesaran and Y. Shin. Generalized impulse response analysis in linear multivariate models. Economics Letters, 58(1): 17–29, 1998. URL https://doi.org/10.1016/S0165-1765(97)00214-0.

I. J. A. Pesigan. simStateSpace: Simulate data from state space models. 2025. URL https://CRAN.R-project.org/package=simStateSpace. R package version 1.2.10.

B. Pfaff and M. Stigler. vars: VAR modelling. 2024. URL https://CRAN.R-project.org/package=vars. R package version 1.6-1.

M. Piazzesi. Affine term structure models. In Handbook of financial econometrics: Tools and techniques, pages 691–766. Elsevier, 2010. URL https://doi.org/10.1016/B978-0-444-50897-3.50015-8.

G. D. Rudebusch and T. Wu. A macro-finance model of the term structure, monetary policy and the economy. The Economic Journal, 118(530): 906–926, 2008. URL https://doi.org/10.1111/j.1468-0297.2008.02155.x.

T. Setz. fBonds: Rmetrics - pricing and evaluating bonds. 2017. URL https://CRAN.R-project.org/package=fBonds. R package version 3042.78.

L. E. Svensson. Estimating and interpreting forward interest rates: Sweden 1992-1994. NBER Working Paper 4871, 1994. URL https://www.nber.org/papers/w4871.

R. S. Tsay, D. Wood and J. Lachmann. MTS: All-purpose toolkit for analyzing multivariate time series (MTS) and estimating multivariate volatility models. 2022. URL https://CRAN.R-project.org/package=MTS. R package version 1.2.1.

J. Urbina. Spillover: Spillover/connectedness index based on VAR modelling. 2024. URL https://CRAN.R-project.org/package=Spillover. R package version 0.1.1.

O. Vasicek. An equilibrium characterization of the term structure. Journal of Financial Economics, 5(2): 177–188, 1977. URL https://doi.org/10.1016/0304-405X(77)90016-2.

H. Wickham. ggplot2: Elegant graphics for data analysis. Springer-Verlag New York, 2016. URL https://ggplot2.tidyverse.org.

I. Wilms, D. S. Matteson, J. Bien, S. Basu and W. N. E. Wegner. bigtime: Sparse estimation of large time series models. 2023. URL https://CRAN.R-project.org/package=bigtime. R package version 0.2.3.
    1. Specifically, the referred loadings are \(a_{i,n+1}(\Theta _{n+1}) = a_{i,n}(\Theta _{n}) + b_{i,n}(\Theta _{n}) \mu^{Q}_{i,X} + \frac{1}{2} b_{i,n}(\Theta _{n}) \Gamma_{i,X} \Gamma_{i,X}' b_{i,n}(\Theta _{n})' - \delta_{i,0}\) and \(b_{i, n+1}=b_{i,n}\Phi^{Q}_{i,X} - \delta_{i,1}\), considering that the boundary conditions are \(a_{i,1}(\Theta _1)=-\delta_{i,0}\) and \(b_{i,1}(\Theta_1)=-\delta_{i,1}\). These expressions assume that the Radon–Nikodym derivative, which maps the risk-neutral measure to the physical measure, follows a log-normal process, and that the market price of risk is time-varying and affine in \(X_t\). See Ang and Piazzesi (2003) for a detailed derivation of these expressions.↩︎

    3. Noticeably, in the context of the MultiATSM package, the model type JLL No DomUnit is the only exception (see Section 3).↩︎

    5. It is worth emphasizing that, although the DataForEstimation() and InputsForOpt() functions in the package accept inputs with differing maturities, their outputs are standardized to a common set of yields.↩︎


    References


    Reuse


    Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".


    Citation


    For attribution, please cite this work as

    Moura, "MultiATSM: An R Package for Arbitrage-Free Macrofinance Multicountry Affine Term Structure Models", The R Journal, 2026

    BibTeX citation

@article{RJ-2025-044,
  author = {Moura, Rubens},
  title = {MultiATSM: An R Package for Arbitrage-Free Macrofinance Multicountry Affine Term Structure Models},
  journal = {The R Journal},
  year = {2026},
  note = {https://doi.org/10.32614/RJ-2025-044},
  doi = {10.32614/RJ-2025-044},
  volume = {17},
  issue = {4},
  issn = {2073-4859},
  pages = {275-304}
}
% !TeX root = RJwrapper.tex
\title{MultiATSM: An R Package for Arbitrage-Free Macrofinance Multicountry Affine Term Structure Models}

\author{by Rubens Moura}

\maketitle

\abstract{%
The MultiATSM package provides estimation tools and a wide range of outputs for eight macrofinance affine term structure model (ATSM) classes, supporting practitioners, academics, and policymakers. All models extend the single-country framework of Joslin et al.~(2014) to multicountry settings, with additional adaptations from Jotikasthira et al.~(2015) and Candelon and Moura (2024). These extensions incorporate, respectively, the presence of a dominant (global) economy and a global vector autoregressive (GVAR) setup to capture the joint dynamics of risk factors. The package generates diverse outputs for each ATSM, including graphical representations of model fit, risk premia, impulse response functions, and forecast error variance decompositions. It also implements bootstrap methods for confidence intervals and produces bond yield forecasts.
}

\section{Introduction}\label{introduction}

The term structure of interest rates (or yield curve) describes the relationship between bond yields and investment maturities. As \citet{Piazzesi2010} emphasizes, understanding its dynamics is essential for several reasons. First, long-term yields incorporate market expectations of future short-term rates, making the yield curve a handy forecasting tool for macroeconomic aggregates like output and inflation.
As such, this supports optimal consumption-saving decisions and capital allocation by economic agents. Second, it plays a key role in the transmission of monetary policy, linking short-term policy rates to long-term borrowing costs. Third, it guides fiscal authorities in shaping debt maturities to balance refinancing risk and interest rate exposure. Fourth, it is essential for pricing and hedging interest rate derivatives, which rely on accurate yield curve modelling.

Affine Term Structure Models (ATSMs) are the workhorse of yield curve modelling. Based on the assumption of no arbitrage, ATSMs offer a flexible framework to assess how investors price risks and generate predictions for the price of any bond (see \citet{Piazzesi2010} and \citet{GurkaynakWright2012} for comprehensive reviews). Early ATSMs gained popularity for their ability to capture nearly all term structure fluctuations, appealing to both academics and practitioners \citep{Vasicek1977, DuffieKan1996, DaiSingleton2002}. While these models produce accurate statistical descriptions of the yield curve, they are silent on the deeper economic determinants that policymakers require for causal inference.

In response to this limitation, a large body of research has emerged to explore the interplay between the term structure and macroeconomic developments (seminal contributions include \citet{AngPiazzesi2003} and \citet{RudebuschWu2008}). A prominent contribution in this area is the unspanned economic risk framework developed by \citet{JoslinPriebschSingleton2014} (henceforth JPS, 2014). In essence, this model assumes an arbitrage-free bond market and considers a linear state space representation to describe the dynamics of the yield curve. Compared to earlier macrofinance ATSMs, JPS (2014) offers a tractable estimation approach that integrates traditional yield curve factors (spanned factors) with macroeconomic variables (unspanned factors).
As a result, the model delivers a strong cross-sectional fit while explicitly linking bond yield responses to the state of the economy.

The work of JPS (2014) lays the foundational framework for the modelling tools included in the \CRANpkg{MultiATSM} package \citep{MultiATSM2025}. In addition to the original single-country setup proposed by JPS (2014), the package incorporates multicountry extensions developed by \citet{JotikasthiraLeLundblad2015} (henceforth JLL, 2015) and \citet{CandelonMoura2024} (henceforth CM, 2024). Altogether, the package offers functions to build eight types of ATSMs, covering the original versions and several variants of these three frameworks.

Beyond complete routines for model estimation, \CRANpkg{MultiATSM} produces a wide range of analytical outputs. In particular, it generates graphical representations such as model-implied bond yields, bond risk premia, and both orthogonalized and generalized versions of \emph{(i)} impulse response functions and \emph{(ii)} forecast error variance decompositions for yields and risk factors. Confidence intervals for the two latter outputs can be computed using three bootstrap methods: residual-based, block, or wild bootstrap. Moreover, the package supports out-of-sample forecasting of bond yields across the maturity spectrum. This paper provides detailed guidance on how to use the \CRANpkg{MultiATSM} package effectively.

There are a few notable packages for term structure modelling in the R programming environment. \CRANpkg{YieldCurve} \citep{YieldCurve2015} and \CRANpkg{fBonds} \citep{fBonds2017} provide a collection of functions to build term structures based on the frameworks of \citet{NelsonSiegel1987} and \citet{Svensson1994}. These yield curve methods have gained popularity for their parsimonious parameterization and good empirical fit. However, these models do not rule out arbitrage opportunities, a limitation addressed by ATSMs.
Moreover, the focus of \CRANpkg{YieldCurve} and \CRANpkg{fBonds} is restricted to parameter estimation and yield curve fitting, without offering additional model outputs such as those provided by \CRANpkg{MultiATSM}.

Several other R packages support time series modelling \citep{HyndmanKillick2025}, particularly within state space and vector autoregressive (VAR) frameworks. State space packages are relatively few and tend to focus on either estimation, \CRANpkg{statespacer} \citep{statespacer2023}, or simulation, \CRANpkg{simStateSpace} \citep{simStateSpace2025}. VAR-based tools are more numerous. For instance, \CRANpkg{vars} \citep{vars2024} and \CRANpkg{MTS} \citep{MTS2022} provide extensive functionality for estimation, diagnostics, and forecasting, while \CRANpkg{svars} \citep{svars2023} adds structural identification methods. High-dimensional VARs are handled by packages like \CRANpkg{bigtime} \citep{bigtime2023} and \CRANpkg{BigVAR} \citep{BigVAR2025}, and cross-country spillovers are modeled by \CRANpkg{Spillover} \citep{Spillover2024} and \CRANpkg{BGVAR} \citep{BGVAR2024}.

Although these tools share some features with \CRANpkg{MultiATSM}, they are tailored to standard state space or VAR analysis. In contrast, \CRANpkg{MultiATSM} embeds VAR dynamics within a state space representation that is explicitly grounded in arbitrage-free asset pricing theory. As such, \CRANpkg{MultiATSM} fills a specific gap in the R ecosystem by combining the structure of ATSMs with the flexibility of modern time series tools.

The remainder of the paper is organized as follows. Section~\hyperref[S:ATSMTheory]{2} outlines the theoretical foundations of the ATSMs implemented in the \CRANpkg{MultiATSM} package, and Section~\hyperref[S:ATSMoptions]{3} details each model's features. The subsequent sections focus on the practical implementation of ATSMs. Section~\hyperref[S:SectionData]{4} presents the dataset included in the package.
Section~\hyperref[S:SectionInputs]{5} explains the user inputs required for model estimation. Section~\hyperref[S:SectionEstimation]{6} describes the estimation procedure, and Section~\hyperref[S:SectionImplementation]{7} shows how to estimate ATSMs from scratch using \CRANpkg{MultiATSM}. Replications of published academic studies are provided in the Appendix.

\section{ATSMs with unspanned economic risks: theoretical background}\label{S:ATSMTheory}

In this section, I outline several arbitrage-free ATSMs with unspanned macroeconomic risks available in the \CRANpkg{MultiATSM} package. A key appealing feature of these setups is their ability to disentangle the yield curve into a cross-sectional component, governed by the risk-neutral (\(\mathbb{Q}\)) dynamics, and a time-series component, driven by the physical (\(\mathbb{P}\)) dynamics. In light of this characteristic, I present the single-country and multicountry \(\mathbb{Q}\)-dynamics model dimensions in Section~\hyperref[S:Qdyn]{2.1}. Next, I describe the specific features of the risk factor dynamics under the \(\mathbb{P}\)-measure for the various restricted and unrestricted VAR settings in Section~\hyperref[S:Pdyn]{2.2}. Section~\hyperref[S:ATSMestimation]{2.3} describes the model estimation procedures.

\subsection{Model cross-sectional dimension (Q-dynamics)}\label{S:Qdyn}

\subsubsection{Single-country specifications (individual Q-dynamics model classes)}\label{single-country-specifications-individual-q-dynamics-model-classes}

The model cross-sectional structure is based on two central equations.
The first one assumes that the country \(i\) short-term interest rate at time \(t\), \(r_{i,t}\), is an affine function of \(N\) unobserved (latent) country-specific factors, \(\boldsymbol{X_{i,t}}\):
\begin{equation}
\underset{(1 \times 1)}{\vphantom{\Big|} r_{i,t}} =
\underset{(1 \times 1)}{\vphantom{\Big|} \delta_{i,0}} +
\underset{(1 \times N)}{\vphantom{\Big|} \boldsymbol{\delta}_{i,1}^{\top}}
\underset{(N \times 1)}{\vphantom{\Big|} \boldsymbol{X}_{i,t}}\text{,}
\label{eq:ShortRate}
\end{equation}
where \(\delta_{i,0}\) and \(\boldsymbol{\delta_{i,1}}\) are time-invariant parameters.

The second equation assumes that the unobserved factor dynamics for each country \(i\) follow a maximally flexible, first-order, \(N\)-dimensional multivariate Gaussian (\(\mathcal{N}\)) VAR model under the \(\mathbb{Q}\)-measure:
\begin{align}
& \underset{(N \times 1)}{\vphantom{\Big|} \boldsymbol{X_{i,t}}} =
\underset{(N \times 1)}{\vphantom{\Big|} \boldsymbol{\mu^{Q}_{i,X}}} +
\underset{(N \times N)}{\vphantom{\Big|} \Phi^{Q}_{i,X}}
\underset{(N \times 1)}{\vphantom{\Big|} \boldsymbol{X_{i,t-1}}} +
\underset{(N \times N)}{\vphantom{\Big|} \Gamma_{i,X}}
\underset{(N \times 1)}{\vphantom{\Big|} \boldsymbol{\varepsilon_{i,t}^{Q}}}\text{,}
& \boldsymbol{\varepsilon_{i,t}^{Q}} \sim {\mathcal{N}_N}(\boldsymbol{0}_N,\mathrm{I}_N)\text{,}
\label{eq:VARQ}
\end{align}
where \(\boldsymbol{\mu^{Q}_{i,X}}\) is a vector of intercepts, \(\Phi^{Q}_{i,X}\) is the feedback matrix, and \(\Gamma_{i,X}\) is a lower triangular matrix.
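To make this concrete, the \(\mathbb{Q}\)-dynamics in Equation \eqref{eq:VARQ} can be simulated in a few lines of base R. The sketch below is purely illustrative: all parameter values are hypothetical and are merely chosen to mimic the identification restrictions discussed next (zero intercepts and a diagonal feedback matrix with real, distinct eigenvalues); nothing here relies on the MultiATSM package.

```r
# Illustrative simulation of the latent Q-measure VAR(1):
# X_t = mu + Phi X_{t-1} + Gamma eps_t,  eps_t ~ N(0, I_N).
# All parameter values are hypothetical.
set.seed(123)
N  <- 3                              # number of latent factors
TT <- 200                            # sample length
mu    <- rep(0, N)                   # zero-mean normalization
Phi   <- diag(c(0.98, 0.95, 0.90))   # diagonal feedback matrix, distinct eigenvalues
Gamma <- t(chol(diag(0.01, N)))      # lower triangular scaling matrix
X <- matrix(0, nrow = TT, ncol = N)
for (t in 2:TT) {
  X[t, ] <- mu + Phi %*% X[t - 1, ] + Gamma %*% rnorm(N)
}
```

Because all eigenvalues of `Phi` lie inside the unit circle, the simulated factors are stationary under \(\mathbb{Q}\).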
Based on Equations \eqref{eq:ShortRate} and \eqref{eq:VARQ}, \citet{DaiSingleton2000} show that the country-specific zero-coupon bond yield with a maturity of \(n\) periods, \(y_{i,t}^{(n)}\), is affine in \(\boldsymbol{X_{i,t}}\):
\begin{equation}
\underset{(1 \times 1)}{\vphantom{\Big|} y_{i,t}^{(n)}} =
\underset{(1 \times 1)}{\vphantom{\Big|} a_{i,n}(\Theta_{n})} +
\underset{(1 \times N)}{\vphantom{\Big|} \boldsymbol{b_{i,n}(\Theta_{n})}^\top}
\underset{(N \times 1)}{\vphantom{\Big|} \boldsymbol{X_{i,t}}}\text{,}
\label{eq:AffineYieldsScalar}
\end{equation}
where \(a_{i,n}(\Theta_{n})\) and \(\boldsymbol{b_{i,n}(\Theta_{n})}\) are constrained to eliminate arbitrage opportunities within this bond market, as dictated by the well-known Riccati equations.\footnote{Specifically, the referred loadings are \(a_{i,n+1}(\Theta_{n+1}) = a_{i,n}(\Theta_{n}) + b_{i,n}(\Theta_{n}) \mu^{Q}_{i,X} + \frac{1}{2} b_{i,n}(\Theta_{n}) \Gamma_{i,X} \Gamma_{i,X}' b_{i,n}(\Theta_{n})' - \delta_{i,0}\) and \(b_{i,n+1}=b_{i,n}\Phi^{Q}_{i,X} - \delta_{i,1}\), considering that the boundary conditions are \(a_{i,1}(\Theta_1)=-\delta_{i,0}\) and \(b_{i,1}(\Theta_1)=-\delta_{i,1}\). These expressions assume that the Radon--Nikodym derivative, which maps the risk-neutral measure to the physical measure, follows a log-normal process, and that the market price of risk is time-varying and affine in \(X_t\).
See \citet{AngPiazzesi2003} for a detailed derivation of these expressions.} For notational simplicity, we collect the \(J\) bond yields into the vector \(\boldsymbol{Y_{i,t}}=[y_{i,t}^{(1)}, y_{i,t}^{(2)},...,y_{i,t}^{(J)}]^\top\), the \(J\) intercepts into \(\boldsymbol{A_X(\Theta_i)}=[a_{i,1}(\Theta_{1}), a_{i,2}(\Theta_{2}),...,a_{i,J}(\Theta_{J})]^\top \in \mathbb{R}^J\), and the slope coefficients into a \(J \times N\) matrix \(B_X(\Theta_i)=[\boldsymbol{b_{i,1}(\Theta_{1})}^\top, \boldsymbol{b_{i,2}(\Theta_{2})}^\top, ...,\boldsymbol{b_{i,J}(\Theta_{J})}^\top]^\top\). Accordingly, the yield curve cross-sectional dimension of country \(i\) is:
\begin{equation}
\underset{(J \times 1)}{\vphantom{\Big|} \boldsymbol{Y_{i,t}}} =
\underset{(J \times 1)}{\vphantom{\Big|} \boldsymbol{A_X(\Theta_i)}} +
\underset{(J \times N)}{\vphantom{\Big|} B_X(\Theta_i)}
\underset{(N \times 1)}{\vphantom{\Big|} \boldsymbol{X_{i,t}}}\text{.}
\label{eq:AffineYieldsVector}
\end{equation}

It follows from Equations \eqref{eq:ShortRate} and \eqref{eq:VARQ} that the parameter set \(\Theta_i =\{\boldsymbol{\mu^Q_{i,X}},\Phi^Q_{i,X}, \Gamma_{i,X}, \delta_{i,0}, \boldsymbol{\delta_{i,1}}\}\) fully characterizes the cross-section of country \(i\)'s term structure. Importantly, \citet{DaiSingleton2000} demonstrate that this system is not identified without additional restrictions, since \(\boldsymbol{X_{i,t}}\) and any invertible affine transformation of \(\boldsymbol{X_{i,t}}\) yield observationally equivalent representations. To circumvent this problem, JPS (2014) adopt the three sets of (minimal) restrictions proposed by \citet{JoslinSingletonZhu2011}. First, they require the latent factors to be zero-mean processes, forcing \(\boldsymbol{\mu^{Q}_{i,X}}= \boldsymbol{0}_N\). Second, they choose \(\boldsymbol{\delta_{i,1}}\) to be an \(N\)-dimensional vector whose entries are all equal to one.
Lastly, \(\Phi^Q_{i,X}\) is a diagonal matrix whose elements, \(\lambda^Q_i\), are the real and distinct eigenvalues of \(\Phi^Q_{i,X}\). Based on this restriction set, \citet{JoslinSingletonZhu2011} show that no additional invariant rotation is possible.

\citet{JoslinSingletonZhu2011} also show that a rotation from \(\boldsymbol{X_{i,t}}\) to portfolios of yields, the spanned factors \(\boldsymbol{P_{i,t}}\), leads to an observationally equivalent model representation. This invariant transformation implies that \(N\) portfolios of yields are perfectly priced and observed without errors, while the remaining \(J-N\) portfolios are priced and observed imperfectly. Specifically, the spanned factors are computed as \(\boldsymbol{P_{i,t}}=V_i\boldsymbol{Y_{i,t}}\), for a full-rank matrix \(V_i\). Based on this definition, Equation \eqref{eq:AffineYieldsVector} can be rearranged as an affine function of \(\boldsymbol{P_{i,t}}\):
\begin{equation}
\underset{(J \times 1)}{\vphantom{\Big|} \boldsymbol{Y_{i,t}}}=
\underset{(J \times 1)}{\vphantom{\Big|} \boldsymbol{A_P(\Theta_i)}}+
\underset{(J \times N)}{\vphantom{\Big|} B_P(\Theta_i)}
\underset{(N \times 1)}{\vphantom{\Big|} \boldsymbol{P_{i,t}}}\text{,}
\label{eq:AffineYieldsSpanned}
\end{equation}
where \(\boldsymbol{A_P(\Theta_i)}= \left(\mathrm{I}_J - B_X(\Theta_i) \left[ V_iB_X(\Theta_i) \right]^{-1}V_i\right)\boldsymbol{A_X(\Theta_i)}\) and
\(B_P(\Theta_i)=B_X(\Theta_i) \left[ V_iB_X(\Theta_i) \right]^{-1}\).

The rotation from \(\boldsymbol{X_{i,t}}\) to \(\boldsymbol{P_{i,t}}\) is convenient for two key reasons. First, \(\boldsymbol{P_{i,t}}\) contains directly observable yield curve factors (unlike \(\boldsymbol{X_{i,t}}\)), with its \(N\) elements mapping to traditional yield curve components.
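As a concrete illustration of this rotation, \(V_i\) can be taken as the principal-component weight matrix of a yield panel, so that the first \(N\) spanned factors correspond to the familiar level, slope, and curvature components. The sketch below uses base R on a simulated yield panel; the data-generating process and maturity grid are made up for illustration, and nothing here relies on the MultiATSM package.

```r
# Rotation from yields to spanned factors, P_t = V Y_t, with V given by
# principal-component weights. The yield panel is simulated for illustration.
set.seed(42)
TT <- 150; J <- 6; N <- 3
mats  <- c(3, 6, 12, 24, 60, 120) / 12          # hypothetical maturities (years)
level <- cumsum(rnorm(TT, sd = 0.10))           # common level shifts
slope <- cumsum(rnorm(TT, sd = 0.05))           # maturity-dependent component
Y <- outer(level, rep(1, J)) + outer(slope, log(mats)) +
  matrix(rnorm(TT * J, sd = 0.02), TT, J)
pca <- prcomp(Y, center = TRUE)                 # principal component analysis
V <- t(pca$rotation[, 1:N])                     # (N x J) full-rank weight matrix
P <- Y %*% t(V)                                 # spanned factors (TT x N)
```

With this choice of \(V_i\), the columns of `P` are ordered by explained variance, which is why the first three are conventionally labelled level, slope, and curvature.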
For instance, for \(N=3\) and \(V_i\) being the weight matrix that results from a principal component analysis, the portfolios of yields \(\boldsymbol{P_{i,t}}\) are commonly referred to as the level, slope, and curvature factors (see Section~\hyperref[S:SpaFac]{6.1}). Second, it enables a convenient decomposition of the likelihood function, facilitating both estimation and the interpretation of model parameters.

\subsubsection{Multicountry specifications (joint Q-dynamics model classes)}\label{multicountry-specifications-joint-q-dynamics-model-classes}

The cross-sectional multicountry extension is formed by stacking the country yields, spanned factors, and intercepts from Equation \eqref{eq:AffineYieldsSpanned} into, respectively, \(\boldsymbol{Y_t}=[\boldsymbol{Y_{1,t}}^\top, \boldsymbol{Y_{2,t}}^\top, ...,\boldsymbol{Y_{C,t}}^\top]^\top\), \(\boldsymbol{P_t}=[\boldsymbol{P_{1,t}}^\top, \boldsymbol{P_{2,t}}^\top, ..., \boldsymbol{P_{C,t}}^\top]^\top\), and \(\boldsymbol{A_P(\Theta)}=[\boldsymbol{A_P^\top(\Theta_1)}, \boldsymbol{A_P^\top(\Theta_2)}, ..., \boldsymbol{A_P^\top(\Theta_C)}]^\top\), where \(C\) denotes the number of countries in this economic system. Additionally, we set \(B_{P}(\Theta)\) as block diagonal, \(B_P(\Theta)=B_P(\Theta_1) \oplus B_P(\Theta_2) \oplus \dots \oplus B_P(\Theta_C)\), where \(\oplus\) denotes the direct sum.
Accordingly,
+\begin{equation}
+  \underset{(CJ \times 1)}{\vphantom{\Big|}
+  \boldsymbol{Y_{t}}} =
+  \underset{(CJ \times 1)}{\vphantom{\Big|}
+  \boldsymbol{A_{P}(\Theta)}} +
+  \underset{(CJ \times CN)}{\vphantom{\Big|}
+  B_{P}(\Theta)}
+  \underset{(CN \times 1)}{\vphantom{\Big|}
+  \boldsymbol{P_{t}}}\text{.}
+  \label{eq:AffineYieldsSpannedMultiCountry}
+\end{equation}
+
+\subsection{Model time series dimension (P-dynamics)}\label{S:Pdyn}
+
+In the modelling frameworks implemented in the \CRANpkg{MultiATSM} package, the risk factor dynamics under the \(\mathbb{P}\)-measure must include at least \(N\) domestic spanned factors (\(\boldsymbol{P_{i,t}}\)) and \(M\) domestic unspanned factors (\(\boldsymbol{M_{i,t}}\)), and may optionally include \(G\) global unspanned factors (\(\boldsymbol{M_t^W}\)), depending on the specification. The dynamics of these risk factors evolve as either an unrestricted or a restricted VAR model. The unrestricted case corresponds to the JPS specification, while the restricted setup encompasses the GVAR and JLL frameworks.
+
+It is worth stressing the role of unspanned factors in shaping yield curve developments. Although these factors are absent from the cross-sectional dimension of the models, they influence the dynamics of the spanned factors which, in turn, directly affect bond yields.
+
+\subsubsection{JPS-based models}\label{jps-based-models}
+
+The country-specific state vector, \(\boldsymbol{Z_{i,t}}\), is formed by stacking the global and domestic (unspanned and spanned) risk factors: \(\boldsymbol{Z_{i,t}} = [\boldsymbol{M_t^{W^\top}}\), \(\boldsymbol{M_{i,t}}^\top\), \(\boldsymbol{P_{i,t}}^\top]^\top\). As such, \(\boldsymbol{Z_{i,t}}\) is an \(R\)-dimensional vector, where \(R = G + K\) and \(K = M + N\).
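+To fix ideas on the dimensions, the stacking of \(\boldsymbol{Z_{i,t}}\) can be sketched in base R; the factor labels are illustrative and assume \(G=2\), \(M=2\) and \(N=3\):
+
+\begin{verbatim}
+M_W <- c(GlAct = 0.4, GlInf = 2.1)               # G global unspanned factors
+M_i <- c(EcoAct = -0.2, Inflation = 3.0)         # M domestic unspanned factors
+P_i <- c(Level = 5.1, Slope = -1.2, Curv = 0.3)  # N domestic spanned factors
+Z_i <- c(M_W, M_i, P_i)                          # R = G + M + N = 7 elements
+\end{verbatim}
+
+This ordering (global factors first, then domestic unspanned and spanned factors) matters for outputs that rely on a recursive identification scheme, discussed later.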
+In JPS-based setups, \(\boldsymbol{Z_{i,t}}\) follows a standard unrestricted Gaussian VAR(1):
+\begin{align}
+  & \underset{(R \times 1)}{\vphantom{\Big|}
+  \boldsymbol{Z_{i,t}}} =
+  \underset{(R \times 1)}{\vphantom{\Big|}
+  \boldsymbol{C_i^{\mathbb{P}}}} +
+  \underset{(R \times R)}{\vphantom{\Big|}
+  \Phi_i^{\mathbb{P}}}
+  \underset{ (R \times 1)}{\vphantom{\Big|}
+  \boldsymbol{Z_{i,t-1}}} +
+  \underset{(R \times R)}{\vphantom{\Big|}
+  \Gamma_i}
+  \underset{(R \times 1)}{\vphantom{\Big|}
+  \boldsymbol{\varepsilon_{Z,t}^{\mathbb{P}}}}\text{,}
+  & \boldsymbol{\varepsilon_{Z,t}^{\mathbb{P}}} \sim
+  {\mathcal{N}_R}(\boldsymbol{0}_R,\mathrm{I}_R)\text{,}%
+  \label{eq:VARunspanned}
+\end{align}
+where \(\boldsymbol{C_i^{\mathbb{P}}}\) denotes the vector of intercepts; \(\Phi_i^{\mathbb{P}}\), the feedback matrix; and \(\Gamma_i\), the Cholesky factor (a lower triangular matrix).
+
+\subsubsection{GVAR-based models}\label{S:GVARtheory}
+
+In the \CRANpkg{MultiATSM} package, the GVAR setup consists of two parts: the marginal and the VARX\(^{*}\) models. The former captures the joint dynamics of the global economy, whereas the latter describes the dynamics of the domestic factors. For a thorough description of GVAR models, see \citet{ChudikPesaran2016}.
+
+The marginal model is an unrestricted VAR(\(1\)) featuring exclusively the global factors:
+\begin{align}
+& \underset{(G \times 1)}{\vphantom{\Big|}
+\boldsymbol{M_t^W}}=
+\underset{(G \times 1)}{\vphantom{\Big|}
+\boldsymbol{C^W}} +
+\underset{(G \times G)}{\vphantom{\Big|}
+\Phi^W} \underset{(G \times 1)}{\vphantom{\Big|}
+\boldsymbol{M_{t-1}^W}} +
+\underset{(G \times G)}{\vphantom{\Big|}
+\Gamma^W}\underset{(G \times 1)}{\vphantom{\Big|}
+\boldsymbol{\varepsilon_{t}^W}}\text{,} & \boldsymbol{\varepsilon_t^W} \sim {\mathcal{N}_G}(\boldsymbol{0}_G,\mathrm{I}_G).
+\label{eq:MarginalModel}
+\end{align}
+
+The VARX\(^{*}\) setups are country-specific small-scale VAR models containing global factors and weakly exogenous `star' variables --- weighted averages of foreign variables --- built as
+\begin{equation}
+  \boldsymbol{Z_{i,t}^{\ast}} = \sum_{j=1}^{C} w_{i,j} \boldsymbol{Z_{j,t}}, \qquad \sum_{j=1}^{C} w_{i,j}= 1, \quad w_{i,i}=0 \quad \forall i \in \{1,2,\ldots,C\},
+\label{eq:StarVar}
+\end{equation}
+where \(\boldsymbol{Z_{j,t}}\) is a \(K\)-dimensional vector of domestic factors, \(\boldsymbol{Z_{j,t}} = [\boldsymbol{M_{j,t}}^\top\), \(\boldsymbol{P_{j,t}}^\top]^\top\), and \(w_{i,j}\) is a scalar that measures the degree of connectedness of country \(i\) with country \(j\).
+
+These models follow a VARX\(^{*}(p,q,r)\) specification, where \(p\), \(q\) and \(r\) are the numbers of lags of, respectively, the domestic, star, and global risk factors. The \CRANpkg{MultiATSM} package provides the estimates for the case \(p=q=r=1\). In such a case, the dynamics of \(\boldsymbol{Z_{i,t}}\) are described by a VARX\(^{*}\) of the following form:
+\begin{align}
+& \underset{(K \times 1)}{\vphantom{\Big|}
+\boldsymbol{Z_{i,t}}} =
+\underset{(K \times 1)}{\vphantom{\Big|}
+\boldsymbol{C^X_{i}}} +
+\underset{(K \times K)}{\vphantom{\Big|}
+\Phi^X_{i}}
+\underset{(K \times 1)}{\vphantom{\Big|}
+\boldsymbol{Z_{i,t-1}}} +
+\underset{(K \times K)}{\vphantom{\Big|}
+\Phi^{X^\ast}_i}
+\underset{(K \times 1)}{\vphantom{\Big|}
+\boldsymbol{Z_{i,t-1}^{\ast}}} +
+\underset{(K \times G)}{\vphantom{\Big|}
+\Phi_{i}^{X^{W}}}
+\underset{(G \times 1)}{\vphantom{\Big|}
+\boldsymbol{M_{t-1}^{W}}} +
+\underset{(K \times K)}{\vphantom{\Big|}
+\Gamma_{i}^{X}}
+\underset{(K \times 1)}{\vphantom{\Big|}
+\boldsymbol{\varepsilon^X_{i,t}}}\text{,} & \boldsymbol{\varepsilon^X_{i,t}} \sim {\mathcal{N}_K}(\boldsymbol{0}_K,\mathrm{I}_K).
+\label{eq:VARXmodel}
+\end{align}
+
+Additionally, GVAR models require, as an intermediate step, the specification of country-specific \(2K \times CK\) link matrices, \(W_i\), to unify the individual VARX\(^{*}\) models. Formally,
+\begin{equation}
+\begin{bmatrix} \boldsymbol{Z_{i,t}} \\ \boldsymbol{Z_{i,t}}^{*} \end{bmatrix}_{2K \times 1} \equiv \underset{(2K \times CK)}{W_i} \begin{bmatrix} \boldsymbol{Z_{1,t}} \\ \boldsymbol{Z_{2,t}} \\ \vdots \\ \boldsymbol{Z_{C,t}}
+\end{bmatrix}_{CK \times 1}.
+\label{eq:LinkMatequation}
+\end{equation}
+
+Lastly, to compose the \(F\)-dimensional state vector, where \(F = G + CK\), we gather the global economic variables and the country-specific risk factors, as \(\boldsymbol{Z_t} = [\boldsymbol{M_{t}^{W^\top}}\), \(\boldsymbol{Z_{1,t}}^\top\), \(\boldsymbol{Z_{2,t}}^\top, \ldots \boldsymbol{Z_{C,t}}^\top]^\top\). As such, we can form a first-order GVAR process as
+\begin{align}
+  & \underset{(F \times 1)}{\vphantom{\Big|}
+  \boldsymbol{Z_t}} =
+  \underset{(F \times 1)}{\vphantom{\Big|}
+  \boldsymbol{C_y}} +
+  \underset{(F \times F)}{\vphantom{\Big|}
+  \Phi_y}
+  \underset{(F \times 1)}{\vphantom{\Big|}
+  \boldsymbol{Z_{t-1}}} +
+  \underset{(F \times F)}{\vphantom{\Big|}
+  \Gamma_y}
+  \underset{(F \times 1)}{\vphantom{\Big|}
+  \boldsymbol{\varepsilon_{y,t}}}\text{,} &
+  \boldsymbol{\varepsilon_{y,t}} \sim {\mathcal{N}_F}(\boldsymbol{0}_F,\mathrm{I}_F)\text{,} \label{eq:GVARequation}
+\end{align}
+where \(\boldsymbol{C_y} = [\boldsymbol{C^{W^\top}}\), \(\boldsymbol{C_1^{X^\top}}\), \(\boldsymbol{C_2^{X^\top}}\),\ldots{} \(\boldsymbol{C_C^{X^\top}}]^\top\), \(\boldsymbol{\varepsilon_{y,t}} =[ \boldsymbol{\varepsilon^{W^\top}_t}\), \(\boldsymbol{\varepsilon_{1,t}^{X^\top}}\), \(\boldsymbol{\varepsilon_{2,t}^{X^\top}}\) \ldots{} \(\boldsymbol{\varepsilon_{C,t}^{X^\top}}]^\top\), \(\Gamma_y=\Gamma^W \oplus \Gamma_1^X \oplus \Gamma_2^X \oplus \dots \oplus \Gamma_C^X\), and
+
+\begin{equation}
+\Phi_y =
+\begin{bmatrix}
+\Phi^W &
0_{\scriptscriptstyle{G \times CK}} \\
+\Phi^{X^{W}} & G_1
+\end{bmatrix}_{F \times F} ,
+\end{equation}
+
+where \(\Phi^{X^{W}}=
+\begin{bmatrix}
+\Phi^{X^{W}}_1 \\
+\Phi^{X^{W}}_2 \\
+\vdots \\
+\Phi^{X^{W}}_C
+\end{bmatrix}_{CK \times G}\)
+and \(G_1=
+\begin{bmatrix}
+\Phi_1W_1 \\
+\Phi_2W_2 \\
+\vdots \\
+\Phi_CW_C
+\end{bmatrix}_{CK \times CK}\), for \(\Phi_i= [\Phi_i^{X}\), \(\Phi_i^{X^*}]\) and \(\forall i \in \{1,2,\ldots,C\}\).
+
+\subsubsection{JLL-based models}\label{S:JLL}
+
+JLL-based models incorporate three components: \emph{(i)} the global economy, \emph{(ii)} a dominant large economy,\footnote{Notably, in the context of the \CRANpkg{MultiATSM} package, the model type \texttt{JLL\ No\ DomUnit} is the only exception (see Section~\hyperref[S:ATSMoptions]{3}).} and \emph{(iii)} a set of smaller economies. The state vector is formed from a number of linear projections that build domestic risk factors free of the influence of variables from other countries and/or from the global economy.
+
+The construction of the domestic spanned factors proceeds in two steps. First, for each economy \(i\), \(\boldsymbol{P_{i,t}}\) is projected on the same country's \(\boldsymbol{M_{i,t}}\)
+\begin{equation}
+  \underset{ (N \times 1)}{\vphantom{\Big|}
+  \boldsymbol{P_{i,t}}} =
+  \underset{(N \times M)}{\vphantom{\Big|}
+  b_i}
+  \underset{(M \times 1)}{\vphantom{\Big|}
+  \boldsymbol{M_{i,t}}} +
+  \underset{ (N \times 1)}{\vphantom{\Big|}
+  \boldsymbol{P_{i,t}^e}} \text{,}
+\label{eq:PricingOrthoAll}
+\end{equation}
+where the residuals \(\boldsymbol{P_{i,t}^e}\) are orthogonal to the economic fundamentals of country \(i\).
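+This first projection step amounts to an OLS regression of each spanned factor on the domestic unspanned factors, keeping the residuals. A minimal base-R sketch with simulated data (illustrative dimensions; not package code):
+
+\begin{verbatim}
+set.seed(123)
+Tobs <- 200
+M_i <- matrix(rnorm(Tobs * 2), Tobs, 2)    # unspanned factors (T x M)
+P_i <- M_i %*% matrix(rnorm(6), 2, 3) +
+       matrix(rnorm(Tobs * 3), Tobs, 3)    # spanned factors (T x N)
+b_i <- solve(crossprod(M_i), crossprod(M_i, P_i))  # OLS coefficients b_i
+P_e <- P_i - M_i %*% b_i                   # orthogonalized residuals P_e
+max(abs(crossprod(M_i, P_e)))              # numerically zero by construction
+\end{verbatim}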
+
+Second, for the non-dominant economies, \(\boldsymbol{P_{i,t}^e}\) is additionally projected on the orthogonalized spanned factors of the dominant country, indexed by \(D\), as follows:
+\begin{equation}
+  \underset{(N \times 1)}{\vphantom{\Big|}
+  \boldsymbol{P_{i,t}^e}} =
+  \underset{(N \times N)}{\vphantom{\Big|}
+  c_i^D}
+  \underset{(N \times 1)}{\vphantom{\Big|}
+  \boldsymbol{P_{D,t}^e}} +
+  \underset{(N \times 1)}{\vphantom{\Big|}
+  \boldsymbol{P_{i,t}^{e*}}}\text{,} \quad i \neq D,
+\label{eq:PricingOrthoNonDU}
+\end{equation}
+where \(\boldsymbol{P_{i,t}^{e*}}\) denotes the set of residuals of the non-dominant country \(i\).
+
+The design of the domestic unspanned factors also features two steps: for the dominant economy, \(\boldsymbol{M_{D,t}}\) is projected on the global economic factors
+\begin{equation}
+  \underset{(M \times 1)}{\vphantom{\Big|}
+  \boldsymbol{M_{D,t}}} =
+  \underset{(M \times G)}{\vphantom{\Big|}
+  a_D^W}
+  \underset{(G \times 1)}{\vphantom{\Big|}
+  \boldsymbol{M_t^W}} +
+  \underset{(M \times 1)}{\vphantom{\Big|}
+  \boldsymbol{M_{D,t}^e}} \text{,}
+\label{eq:MacroOrthoDU}
+\end{equation}
+and, for the other economies, the residuals of the previous regression are used to compute
+\begin{equation}
+  \underset{(M \times 1)}{\vphantom{\Big|}
+  \boldsymbol{M_{i,t}}} =
+  \underset{(M \times G)}{\vphantom{\Big|}
+  a_i^W}
+  \underset{(G \times 1)}{\vphantom{\Big|}
+  \boldsymbol{M_t^W}} +
+  \underset{(M \times M)}{\vphantom{\Big|}
+  a_i^D}
+  \underset{(M \times 1)}{\vphantom{\Big|}
+  \boldsymbol{M_{D,t}^e}} +
+  \underset{(M \times 1)}{\vphantom{\Big|}
+  \boldsymbol{M_{i,t}^{e*}}}\text{.}
+\label{eq:MacroOrthoNonDU}
+\end{equation}
+
+Accordingly, the state vector is formed by \(\boldsymbol{Z_t^e}= [\boldsymbol{M_t^{W^\top}}\), \(\boldsymbol{M_{D,t}^{e^\top}}\), \(\boldsymbol{P_{D,t}^{e^\top}}\), \(\boldsymbol{M_{2,t}^{e*^\top}}\), \(\boldsymbol{P_{2,t}^{e*^\top}}\) \ldots{} \(\boldsymbol{M_{C,t}^{e*^\top}}\),
\(\boldsymbol{P_{C,t}^{e*^\top}}]^\top\) and its dynamics evolve as a restricted VAR(1), +\begin{align} +& \underset{(F \times 1)}{\vphantom{\Big|} +\boldsymbol{Z_t^e}}= +\underset{(F \times 1)}{\vphantom{\Big|} +\boldsymbol{C^{e}_Y}} + +\underset{(F \times F)}{\vphantom{\Big|} +\Phi^e_Y} +\underset{(F \times 1)}{\vphantom{\Big|} +\boldsymbol{Z_{t-1}^e}} + +\underset{(F \times F)}{\vphantom{\Big|} +\Gamma_{Y}^e} +\underset{(F \times 1)}{\vphantom{\Big|} +\boldsymbol{\varepsilon^e_{Z,t}}}\text{,} & \boldsymbol{\varepsilon _{Z,t}^{e}} \sim {\mathcal{N}_F}(\boldsymbol{0}_F,\mathrm{I}_F). +\label{eq:VAROrtho} +\end{align} +JLL (2015) impose a set of zero restrictions on \(\Phi^e_Y\) and \(\Gamma_{Y}^e\), with their detailed structure provided in the original study. + +\subsection{Estimation procedures}\label{S:ATSMestimation} + +The approach proposed by JPS (2014) enables an efficient estimation procedure through its structural design. Specifically, the parameters governing the \(\mathbb{Q}\)- and \(\mathbb{P}\)-measures can be estimated independently. The only exception is the variance-covariance matrix, \(\Sigma\), which appears in both likelihood functions and, therefore, must be estimated jointly. + +In JLL (2015), however, the authors adopt a simplified estimation procedure by estimating the \(\Sigma\) matrix exclusively under the \(\mathbb{P}\)-measure. While they acknowledge that this approach is not fully efficient, they argue that the empirical implications are limited in their application. + +\section{The ATSMs available at the MultiATSM package}\label{S:ATSMoptions} + +As outlined in the previous section, the ATSMs implemented in the \CRANpkg{MultiATSM} package differ in the specification of their \(\mathbb{Q}\)- and \(\mathbb{P}\)-measure dynamics. In short, under the \(\mathbb{Q}\)-measure, models can be specified either on a country-by-country basis (JPS, 2014) or jointly across countries (JLL, 2015; CM, 2024). 
Under the \(\mathbb{P}\)-measure, risk factor dynamics follow a VAR(1) process, which may be unrestricted, as in the JPS-related frameworks, or restricted, as in the JLL and GVAR specifications. + +\CRANpkg{MultiATSM} provides support for eight different classes of ATSMs based on these modelling approaches. These classes vary along several dimensions: the specification of the \(\mathbb{P}\)- and \(\mathbb{Q}\)-dynamics, the estimation approach, and whether a dominant economy is included. Table \ref{tab:tab-ModFea-L} summarizes the defining features of each model class available in the package. A brief overview of these specifications follows below. + +\begin{table}[!h] +\centering +\caption{\label{tab:tab-ModFea-L}Summary of model features} +\centering +\fontsize{7}{9}\selectfont +\begin{tabular}[t]{lcccccccccccccc} +\toprule +\multicolumn{2}{c}{\textbf{ }} & \multicolumn{5}{c}{\textbf{P-dynamics}} & \multicolumn{1}{c}{\textbf{ }} & \multicolumn{2}{c}{\textbf{Q-dynamics}} & \multicolumn{1}{c}{\textbf{ }} & \multicolumn{2}{c}{\textbf{Sigma estimation}} & \multicolumn{1}{c}{\textbf{ }} & \multicolumn{1}{c}{\textbf{Dom. 
Eco.}} \\ +\cmidrule(l{3pt}r{3pt}){3-7} \cmidrule(l{3pt}r{3pt}){9-10} \cmidrule(l{3pt}r{3pt}){12-13} \cmidrule(l{3pt}r{3pt}){15-15} +\multicolumn{2}{c}{ } & \multicolumn{2}{c}{Single} & \multicolumn{3}{c}{Joint} & \multicolumn{1}{c}{ } & \multicolumn{1}{c}{Single} & \multicolumn{1}{c}{Joint} & \multicolumn{1}{c}{ } & \multicolumn{1}{c}{P} & \multicolumn{1}{c}{P and Q} & \multicolumn{1}{c}{ } \\ +\cmidrule(l{3pt}r{3pt}){3-4} \cmidrule(l{3pt}r{3pt}){5-7} \cmidrule(l{3pt}r{3pt}){9-9} \cmidrule(l{3pt}r{3pt}){10-10} \cmidrule(l{3pt}r{3pt}){12-12} \cmidrule(l{3pt}r{3pt}){13-13} +\multicolumn{2}{c}{ } & \multicolumn{1}{c}{UR} & \multicolumn{1}{c}{R} & \multicolumn{1}{c}{UR} & \multicolumn{2}{c}{R} & \multicolumn{1}{c}{ } & \multicolumn{1}{c}{ } & \multicolumn{1}{c}{ } & \multicolumn{1}{c}{ } & \multicolumn{1}{c}{ } & \multicolumn{1}{c}{ } & \multicolumn{1}{c}{ } & \multicolumn{1}{c}{ } \\ +\cmidrule(l{3pt}r{3pt}){3-3} \cmidrule(l{3pt}r{3pt}){4-4} \cmidrule(l{3pt}r{3pt}){5-5} \cmidrule(l{3pt}r{3pt}){6-7} +\textbf{ } & \textbf{} & \textbf{} & \textbf{} & \textbf{} & \textbf{JLL} & \textbf{GVAR} & \textbf{} & \textbf{} & \textbf{} & \textbf{} & \textbf{} & \textbf{} & \textbf{} & \textbf{}\\ +\midrule +\addlinespace[0.3em] +\multicolumn{15}{l}{\textbf{Unrestricted VAR}}\\ +\hspace{1em}JPS original & & x & & & & & & x & & & & x & & \\ +\hspace{1em}JPS global & & & & x & & & & x & & & & x & & \\ +\hspace{1em}JPS multi & & & & x & & & & & x & & & x & & \\ +\addlinespace[0.3em] +\multicolumn{15}{l}{\textbf{Restricted VAR (GVAR)}}\\ +\hspace{1em}GVAR single & & & & & & x & & x & & & & x & & \\ +\hspace{1em}GVAR multi & & & & & & x & & & x & & & x & & \\ +\addlinespace[0.3em] +\multicolumn{15}{l}{\textbf{Restricted VAR (JLL)}}\\ +\hspace{1em}JLL original & & & & & x & & & & x & & x & & & x\\ +\hspace{1em}JLL No DomUnit & & & & & x & & & & x & & x & & & \\ +\hspace{1em}JLL joint Sigma & & & & & x & & & & x & & & x & & x\\ +\bottomrule +\end{tabular} +\end{table} + +\vspace{-2.5em} 
+\begin{center}
+\captionsetup{type=table}
+\caption*{\footnotesize Note: Risk factor dynamics under the $\mathbb{P}$-measure may follow either an unrestricted (UR) or a restricted (R) specification. The sets of restrictions present in the JLL-based and GVAR-based models are described in \cite{JotikasthiraLeLundblad2015} and \cite{CandelonMoura2024}, respectively. The $\Sigma$ matrix is estimated either exclusively with the other parameters of the $\mathbb{P}$-dynamics (\textit{P} column) or jointly with the parameters of both the $\mathbb{P}$- and $\mathbb{Q}$-dynamics (\textit{P and Q} column). \textit{Dom. Eco.} relates to the presence of a dominant economy. The entries featuring \textit{x} indicate that the referred characteristic is part of the model.}
+\end{center}
+
+The ATSMs in which the estimation is performed separately for each country are labeled \texttt{JPS\ original}, \texttt{JPS\ global} and \texttt{GVAR\ single}. In the \texttt{JPS\ original} setup, the set of risk factors includes exclusively each country's domestic variables and the global unspanned factors, whereas \texttt{JPS\ global} and \texttt{GVAR\ single} also incorporate domestic risk factors of the other countries of the economic system. Notably, the difference between \texttt{JPS\ global} and \texttt{GVAR\ single} stems from the set of restrictions imposed under the \(\mathbb{P}\)-dynamics.
+
+Within the multicountry frameworks, certain features are worth noting. The \texttt{JLL\ original} model reproduces the setup in JLL (2015), assuming an economic cohort composed of a globally dominant economy and a set of smaller countries, and estimating the \(\Sigma\) matrix exclusively under the \(\mathbb{P}\)-measure. The two alternative versions assume the absence of a dominant country (\texttt{JLL\ No\ DomUnit}) and the estimation of \(\Sigma\) under both the \(\mathbb{P}\)- and \(\mathbb{Q}\)-measures (\texttt{JLL\ joint\ Sigma}), as in JPS (2014).
The remaining specifications differ in their \(\mathbb{P}\)-dynamics: either by an unrestricted VAR model (\texttt{JPS\ multi}) or by a GVAR setup (\texttt{GVAR\ multi}), as proposed in CM (2024). + +\section{Package dataset}\label{S:SectionData} + +The \CRANpkg{MultiATSM} package provides datasets that approximate those used in the GVAR-based ATSMs of \citet{CandelonMoura2023} and CM (2024). The data requirements for estimating GVAR models encompass those of all other model classes, making them suitable for generating outputs across all models supported by the package. As such, the examples in the following sections use the dataset from CM (2024). + +The \texttt{LoadData()} function provides access to the datasets included in the package. To load the data from CM (2024), set the argument to \texttt{CM\_2024}: + +\begin{verbatim} +LoadData("CM_2024") +\end{verbatim} + +This function returns three sets of data. The first contains time series of zero-coupon bond yields for four emerging market economies: China, Brazil, Mexico, and Uruguay. The data spans monthly intervals from June 2004 to January 2020. For the purpose of model estimation, the package requires that \emph{(i)} bond yield maturities are the same across all countries;\footnote{It is worth emphasizing that, although the \texttt{DataForEstimation()} and \texttt{InputsForOpt()} functions in the package accept inputs with differing maturities, their outputs are standardized to a common set of yields.} and \emph{(ii)} yields must be expressed in annualized percentage terms (not basis points). Note that the \CRANpkg{MultiATSM} package does not provide routines for bootstrapping zero-coupon yields from coupon bonds, so any such treatment must be handled by the user. + +The second dataset comprises time series for unspanned risk factors --- specifically, the macroeconomic indicators economic growth and inflation --- covering the same period as the bond yield data. 
These data cover both \emph{(i)} domestic variables for each of the four countries in the sample and \emph{(ii)} corresponding global indicators. The construction of unspanned risk factors, like that of bond yields, must be carried out externally by the user. + +The final dataset contains measures of interconnectedness, proxied by trade flows, which are specifically required for estimating the GVAR-based models. The trade flow data report the annual value of goods imported and exported between each pair of countries in the sample, starting from 1948. All values are expressed in U.S. dollars on a free-on-board basis. These data are used to construct the transition matrix in the GVAR framework. + +\section{Required user inputs}\label{S:SectionInputs} + +\subsection{Fundamental inputs}\label{fundamental-inputs} + +To estimate any model, the user must specify several general inputs, which can be grouped into the following categories: + +\begin{enumerate} +\def\labelenumi{\arabic{enumi}.} +\item + Desired ATSM class (\texttt{ModelType}): a character vector containing the label of the model to be estimated as described in Table \ref{tab:tab-ModFea-L}; +\item + Risk Factor Features. This includes the following list of elements: +\end{enumerate} + +\begin{itemize} +\item + Set of economies (\texttt{Economies}): a character vector containing the names of the economies which are part of the economic system; +\item + Global variables (\texttt{GlobalVar}): a character vector containing the labels of the \(G\) global unspanned factors. Studies examining the impact of global developments on bond prices could include proxy measures of global inflation and global economic activity in this category \citep{JotikasthiraLeLundblad2015, AbbrittiDellErbaMorenoSola2018, CandelonMoura2024}; +\item + Domestic variables (\texttt{DomVar}): a character vector containing the labels of the \(M\) domestic unspanned factors. 
These typically correspond to measures of domestic inflation and economic activity, the standard macroeconomic indicators monitored by central banks \citep{AngPiazzesi2003, JoslinPriebschSingleton2014, JotikasthiraLeLundblad2015, CandelonMoura2024};
+\item
+  Number of spanned factors (\(N\)): a scalar representing the number of country-specific spanned factors. Although, in principle, \(N\) could vary across countries, the models provided in the package assume a common value of \(N\) for all countries.
+  A common choice in the literature is \(N=3\), as in JPS (2014) and CM (2024), since this produces an excellent cross-sectional fit of bond yields \citep{LittermanScheinkman1991}.
+  Other studies, such as \citet{AdrianCrumpMoench2013}, extend the specification to \(N=5\), arguing that it improves the performance of model-implied term premia. Further intuition on the role and interpretation of spanned factors is provided in Section~\hyperref[S:SpaFac]{6.1}.
+\end{itemize}
+
+\begin{enumerate}
+\def\labelenumi{\arabic{enumi}.}
+\setcounter{enumi}{2}
+\tightlist
+\item
+  Sample span:
+\end{enumerate}
+
+\begin{itemize}
+\item
+  Initial sample date (\texttt{t0}): the start of the sample period in the format \emph{dd-mm-yyyy};
+\item
+  End sample date (\texttt{tF}): the end of the sample period in the format \emph{dd-mm-yyyy}.
+\end{itemize}
+
+\begin{enumerate}
+\def\labelenumi{\arabic{enumi}.}
+\setcounter{enumi}{3}
+\item
+  Data Frequency (\texttt{DataFreq}): a character vector specifying the frequency of the time series data. The available options are: \texttt{Annually}, \texttt{Quarterly}, \texttt{Monthly}, \texttt{Weekly}, \texttt{Daily\ Business\ Days}, and \texttt{Daily\ All\ Days};
+\item
+  Stationarity constraint under the \(\mathbb{Q}\)-dynamics (\texttt{StatQ}): a logical that, if \texttt{TRUE}, imposes that the largest eigenvalue under the \(\mathbb{Q}\)-measure, \(\lambda^Q_i\), is strictly less than 1.
While enforcing this stationarity constraint may increase estimation time, it can improve convergence and numerical stability. Moreover, by inducing near-cointegration, the eigenvalue restriction helps to pin down more plausible dynamics for bond risk premia \citep{BauerRudebuschWu2012, JoslinPriebschSingleton2014};
+\item
+  Selected folder to save the graphical outputs (\texttt{Folder2Save}): path where the selected graphical outputs will be saved. If set to \texttt{NULL}, the outputs are stored in the user's temporary directory (accessible via \texttt{tempdir()});
+\item
+  Output label (\texttt{OutputLabel}): a single-element character vector containing the name used in the file name that stores the model outputs.
+\end{enumerate}
+
+The following provides an example of the basic model input specification:
+
+\begin{verbatim}
+ModelType <- "JPS original"
+Economies <- c("Brazil", "Mexico", "Uruguay")
+GlobalVar <- c("Gl_Eco_Act", "Gl_Inflation")
+DomVar <- c("Eco_Act", "Inflation")
+N <- 3
+t0 <- "01-07-2005"
+tF <- "01-12-2019"
+DataFreq <- "Monthly"
+StatQ <- FALSE
+Folder2Save <- NULL
+OutputLabel <- "Model_demo"
+\end{verbatim}
+
+\subsection{Model-specific inputs}\label{model-specific-inputs}
+
+\subsubsection{GVARlist and JLLlist}\label{S:SectionGVARJLLinputs}
+
+The inputs described above are sufficient for estimating all variants of the JPS models presented in Table \ref{tab:tab-ModFea-L}. However, estimating the GVAR or JLL setups requires additional elements. For clarity, these extra inputs should be organized into separate lists for each model. This section outlines the general structure of both lists, while Section~\hyperref[S:PdynEst]{6.2} provides a more detailed explanation of their components and available options, reflecting the broader scope of each setup.
+
+For GVAR models, the required inputs are twofold. First, the user must specify the dynamic structure of each country's VARX\(^{*}\) model.
For example: + +\begin{verbatim} +VARXtype <- "unconstrained" +\end{verbatim} + +Next, provide the desired inputs to build the transition matrix. For instance: + +\begin{verbatim} +data('TradeFlows') +W_type <- "Sample Mean" +t_First_Wgvar <- "2000" +t_Last_Wgvar <- "2015" +DataConnectedness <- TradeFlows +\end{verbatim} + +Based on these inputs, a complete instance of the \texttt{GVARlist} object is + +\begin{verbatim} +GVARlist <- list(VARXtype = "unconstrained", W_type = "Sample Mean", + t_First_Wgvar = "2000", t_Last_Wgvar = "2015", + DataConnectedness = TradeFlows) +\end{verbatim} + +For the JLL frameworks, if the chosen model is either \texttt{JLL\ original} or \texttt{JLL\ joint\ Sigma}, it suffices to specify the name of the dominant economy. Otherwise, for the \texttt{JLL\ No\ DomUnit} class, the user must set \texttt{None}. For instance: + +\begin{verbatim} +## Example for "JLL original" and "JLL joint Sigma" models +JLLlist <- list(DomUnit = "China") + +## For "JLL No DomUnit" model +JLLlist <- list(DomUnit = "None") +\end{verbatim} + +\subsubsection{BRWlist}\label{S:SectionBRWinputs} + +In an influential paper, \citet{BauerRudebuschWu2012} (henceforth BRW, 2012) show that estimates from traditional ATSMs often suffer from severe small-sample bias. This can lead to unrealistically stable expectations for future short-term interest rates and, consequently, distort term premium estimates for long-maturity bonds. To address this issue, BRW (2012) propose an indirect inference estimator based on a stochastic approximation algorithm, which corrects for bias and enhances the persistence of short-term interest rates, resulting in more plausible term premium dynamics. + +It is worth noting that this framework serves as a complementary feature to the core ATSMs and can therefore be applied to any of the model types supported by the \CRANpkg{MultiATSM} package. 
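+The logic of this estimator can be conveyed with a deliberately simplified base-R sketch for a single AR(1) coefficient (our own illustration, not the package implementation): the parameter is adjusted until the average OLS estimate from data simulated at that parameter matches the OLS estimate from the observed sample.
+
+\begin{verbatim}
+set.seed(42)
+ols_ar1 <- function(y) sum(y[-1] * y[-length(y)]) / sum(y[-length(y)]^2)
+sim_ar1 <- function(theta, Tobs) {
+  y <- numeric(Tobs)
+  for (t in 2:Tobs) y[t] <- theta * y[t - 1] + rnorm(1)
+  y
+}
+Tobs <- 100
+y_obs <- sim_ar1(0.95, Tobs)
+theta_hat <- ols_ar1(y_obs)   # OLS is downward biased in small samples
+gamma <- 0.2                  # shrinkage parameter of the algorithm
+theta <- theta_hat
+for (k in 1:100) {            # stochastic approximation iterations
+  boot_mean <- mean(replicate(25, ols_ar1(sim_ar1(theta, Tobs))))
+  theta <- min(theta + gamma * (theta_hat - boot_mean), 1)  # cap at 1
+}
+# theta ends above theta_hat, restoring persistence
+\end{verbatim}
+
+In the package, the analogous adjustments are applied to the full \(\mathbb{P}\)-dynamics feedback matrix rather than to a single scalar coefficient.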
If the user intends to implement a model following the BRW (2012) approach, a few additional inputs must be specified. These include:
+
+\begin{itemize}
+\item
+  Mean or median of physical dynamic estimates (\texttt{Cent\_Measure}): compute the mean or the median of the \(\mathbb{P}\)-dynamics estimates after each bootstrap iteration by setting the option to \texttt{Mean} (for the mean) or \texttt{Median} (for the median);
+\item
+  Adjustment parameter (\texttt{gamma}): this parameter controls the degree of shrinkage applied to the difference between the estimates prior to the bias correction and the bootstrap-based estimates after each iteration. It remains fixed across iterations and must lie in the interval \((0,1)\);
+\item
+  Number of iterations (\texttt{N\_iter}): total number of iterations used
+  in the stochastic approximation algorithm after burn-in;
+\item
+  Number of bootstrap samples (\texttt{B}): number of simulated samples
+  used in each burn-in or main iteration;
+\item
+  Perform closeness check (\texttt{checkBRW}): indicates whether the user wishes to compute the root mean square distance between the model estimates obtained through the bias-correction method and those generated without bias correction. The default is \texttt{TRUE};
+\item
+  Number of bootstrap samples used in the closeness check (\texttt{B\_check}):
+  the default is 100,000 samples;
+\item
+  Eigenvalue restriction (\texttt{Eigen\_rest}): imposes a restriction on the largest eigenvalue under the \(\mathbb{P}\)-measure after applying the bias-correction procedure. The default is \(1\);
+\item
+  Number of burn-in iterations (\texttt{N\_burn}): number of iterations
+  discarded in the first stage of the bias-correction estimation
+  process. The recommended number is \(15\%\) of the total number of
+  iterations. In practice, this resembles the burn-in concept in Markov chain Monte Carlo methods.
Particularly, the BRW (2012) stochastic approximation algorithm is iterative, and for a sufficiently large number of iterations, the parameters converge to their true values. As such, discarding early iterations avoids the need for assessing a computationally costly exit condition. +\end{itemize} + +\begin{verbatim} +BRWlist <- within(list(Cent_Measure = "Mean", gamma = 0.2, N_iter = 500, B = 50, + checkBRW = TRUE, B_check = 1000, Eigen_rest = 1), + N_burn <- round(N_iter * 0.15)) +\end{verbatim} + +\subsection{Additional inputs for numerical and graphical outputs}\label{S:SectionNumOut} + +Once the desired features are selected and the parameters of the chosen ATSM have been estimated, the \CRANpkg{MultiATSM} package provides tools to generate the following numerical and graphical outputs +via the \texttt{NumOutputs()} function: + +\begin{itemize} +\tightlist +\item + Time-series dynamics of the risk factors; +\item + Model fit of the bond yields; +\item + Orthogonalized impulse response functions (IRFs); +\item + Orthogonalized forecast error variance decompositions (FEVDs); +\item + Generalized impulse response functions (GIRFs); +\item + Generalized forecast error variance decompositions (GFEVDs); +\item + Decomposition of bond yields into expected and term premia components. +\end{itemize} + +These outputs are organized into distinct analytical components, each offering different insights into model behavior and its economic interpretation. + +The time-series dynamics of the risk factors are displayed in separate subplots: one for each global factor, and one subplot per domestic risk factor showing all countries in the economic system. The model fit of the bond yields is provided through two measures of model-implied yields. The first is a fitted measure derived solely from the cross-sectional component, as in Equation \eqref{eq:AffineYieldsSpanned} for single-country models and Equation \eqref{eq:AffineYieldsSpannedMultiCountry} for multicountry setups. 
This measure reflects the fit based exclusively on the parameters governing the \(\mathbb{Q}\)-dynamics. The second incorporates both the physical and risk-neutral dynamics, combining the cross-sectional equations with the state evolution specified by each ATSM.
+
+The impulse response functions and variance decompositions are available in both orthogonalized and generalized forms. The orthogonalized outputs (IRFs and FEVDs) are computed using a short-run recursive identification scheme, meaning they depend on the ordering of the selected risk factors. Specifically, the package is structured to place global unspanned factors first, followed by each country's domestic unspanned and spanned factors, in the order in which countries are listed in the \texttt{Economies} vector. In contrast, the generalized versions (GIRFs and GFEVDs) are robust to factor ordering but allow for correlated shocks across risk factors \citep{PesaranShin1998}. For the numerical computation of these outputs, a horizon of analysis must be specified, \emph{e.g.}, \texttt{Horiz\ \textless{}-\ 100}.
+
+The bond yield decomposition can be performed with respect to two measures of risk compensation: term premia and forward premia. While the term premium is derived directly from the bond yield levels, the forward premium is obtained from the decomposition of forward rates. A more formal presentation of both measures is provided in the Appendix.
+
+Users must specify the desired graph types in a character vector. Available options include: \texttt{RiskFactors}, \texttt{Fit}, \texttt{IRF}, \texttt{FEVD}, \texttt{GIRF}, \texttt{GFEVD}, and \texttt{TermPremia}. For example:
+
+\begin{verbatim}
+DesiredGraphs <- c("Fit", "GIRF", "GFEVD", "TermPremia")
+\end{verbatim}
+
+Moreover, for all models, users must indicate the types of variables of interest (yields, risk factors, or both). For JLL-type models specifically, users must also specify whether to include the orthogonalized versions.
Each of these options should be set to \texttt{TRUE} to generate the corresponding graphs, and \texttt{FALSE} otherwise. + +\begin{verbatim} +WishGraphRiskFac <- FALSE +WishGraphYields <- TRUE +WishOrthoJLLgraphs <- FALSE +\end{verbatim} + +The desired graphical outputs are stored in the selected folder, \texttt{Folder2Save}. Alternatively, users can display the desired plots directly in the console without saving them to \texttt{Folder2Save} by using the \texttt{autoplot()} method. + +\subsubsection{Bootstrap settings}\label{S:SectionBootstrap} + +\citet{Horowitz2019} shows that bootstrap methods generally produce more accurate statistical inference than those based on asymptotic distribution theory. To generate confidence intervals +using bootstrap, via the \texttt{Bootstrap()} function, an additional list of inputs must be provided: + +\begin{itemize} +\tightlist +\item + Desired bootstrap procedure (\texttt{methodBS}): the user must select one of the following options: \emph{(i)} standard residual bootstrap (\texttt{bs}); \emph{(ii)} wild bootstrap (\texttt{wild}); or \emph{(iii)} block bootstrap (\texttt{block}). If the block bootstrap is selected, the block length must also be specified. + The residual bootstrap is a conventional method that is straightforward to implement when a parametric model, such as a VAR model, is available. The block bootstrap makes weaker assumptions about the data-generating process and is well-suited to handling both weak and strong serial dependence. The wild bootstrap is particularly appropriate for data exhibiting heteroskedasticity \citep{Horowitz2019}; +\item + Number of bootstrap draws (\texttt{ndraws}): \citet{KilianLutkepohl2017} suggest that, in VAR specifications, \texttt{ndraws} can range from a few hundred to several thousand, depending on factors such as sample size, lag order, and the desired quantiles of the distribution. 
Illustrating this, CM (2024) set \texttt{ndraws\ =\ 1000} in their ATSM to construct confidence intervals for IRFs;
+\item
+ Confidence level (\texttt{pctg}): the desired confidence level, expressed in percentage points. Common choices in VAR-related setups include \(68\%\), \(90\%\) and \(95\%\) \citep{KilianLutkepohl2017}.
+\end{itemize}
+
+\begin{verbatim}
+Bootlist <- list(methodBS = 'block', BlockLength = 4, ndraws = 1000, pctg = 95)
+\end{verbatim}
+
+\subsubsection{Out-of-sample forecast settings}\label{S:SectionForecast}
+
+To generate bond yield forecasts, use \texttt{ForecastYields()} with the following inputs:
+
+\begin{itemize}
+\tightlist
+\item
+ Forecast horizon (\texttt{ForHoriz}): the number of periods ahead for which forecasts are produced;
+\item
+ Index of the first observation (\texttt{t0Sample}): time index of the first observation included in the information set;
+\item
+ Index of the last observation (\texttt{t0Forecast}): time index of the last observation in the information set used to generate the first forecast;
+\item
+ Method used for forecast computation (\texttt{ForType}): forecasts can be generated using either a rolling or expanding window. To use a rolling window, set this parameter to \texttt{Rolling}. In this case, the sample length for each forecast is fixed and defined by \texttt{t0Sample}. For expanding window forecasts, set this input to \texttt{Expanding}, allowing the information set to increase at each forecast iteration.
+\end{itemize}
+
+\begin{verbatim}
+ForecastList <- list(ForHoriz = 12, t0Sample = 1, t0Forecast = 70, ForType = "Rolling")
+\end{verbatim}
+
+\section{Model estimation}\label{S:SectionEstimation}
+
+Using the dataset described in Section~\hyperref[S:SectionData]{4}, the estimation of the ATSM proceeds in three main steps.
First, the country-specific spanned factors are estimated, which, along with the global and domestic unspanned factors, form the complete set of risk factors used in the subsequent estimation steps. Second, the package estimates the parameters governing the dynamics of the risk factors under the \(\mathbb{P}\)-measure. Finally, it optimizes the full ATSM specification, including the parameters under the \(\mathbb{Q}\)-measure. + +As will be made clear in Section~\hyperref[S:SectionImplementation]{7}, although the functions introduced in this section can be used individually, they are primarily designed to be used together with the broader set of functions available in the \CRANpkg{MultiATSM} package. However, as these functions play a central role in the package structure, they warrant a dedicated section. + +\subsection{Spanned factors}\label{S:SpaFac} + +The spanned factors for country \(i\), denoted by \(\boldsymbol{P_{i,t}}\), are typically obtained as the first \(N\) principal components (PCs) of the observed bond yields. The PC method provides orthogonal linear combinations of the original variables, ordered by their ability to capture the variance in the data. Formally, \(\boldsymbol{P_{i,t}}\) is computed as \(\boldsymbol{P_{i,t}} = w_i \boldsymbol{Y_{i,t}}\), where yields are ordered by increasing maturity in \(\boldsymbol{Y_{i,t}}\), and \(w_i\) is the matrix of eigenvectors derived from the covariance matrix of \(\boldsymbol{Y_{i,t}}\). + +In the case of \(N = 3\), the spanned factors are traditionally interpreted as level, slope, and curvature components of the yield curve \citep{LittermanScheinkman1991}. This interpretation stems from the properties of the \(w_i\) matrix, as illustrated below: + +\begin{verbatim} +data('Yields') +w <- pca_weights_one_country(Yields, Economy = "Uruguay") +\end{verbatim} + +In matrix \emph{w}, each row holds the weights for constructing a spanned factor. 
The first row relates to the level factor, with weights loading roughly equally across maturities. As such, high (low) values of the level factor indicate an overall high (low) value of yields across all maturities. The second row features increasing weights with maturity, capturing the slope of the yield curve: high values indicate steep curves, while low values reflect flat or inverted curves. The third row corresponds to the curvature factor, with weights emphasizing medium-term maturities. This captures the `hump-shaped' features of the yield curve typically associated with changes in its curvature. These concepts are also graphically illustrated in Figure \ref{fig:pca-L}. + +\begin{figure} +\centering +\pandocbounded{\includegraphics[keepaspectratio]{RJ-2025-044_files/figure-latex/pca-L-1.pdf}} +\caption{\label{fig:pca-L}Yield loadings on the spanned factors. Example using bond yield data for Uruguay. Graph was generated using the ggplot2 package \citep{ggplot22016}.} +\end{figure} + +The user can directly obtain the time series of the country-specific spanned factors by calling \texttt{Spanned\_Factors()}, as shown below: + +\begin{verbatim} +data('Yields') +Economies <- c("China", "Brazil", "Mexico", "Uruguay") +N <- 2 +SpaFact <- Spanned_Factors(Yields, Economies, N) +\end{verbatim} + +\subsection{The P-dynamics estimation}\label{S:PdynEst} + +As presented in Table \ref{tab:tab-ModFea-L} and explained in detail in Section~\hyperref[S:ATSMTheory]{2}, the dynamics of the risk factors under the \(\mathbb{P}\)-measure in the available models follow a VAR(1) process. This specification can be fully unrestricted, as in the JPS-related models, or subject to restrictions, as in the GVAR and JLL frameworks. This subsection illustrates how each of these model configurations is implemented. 
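Before turning to the estimation functions, the common VAR(1) backbone, \(X_t = K_0 + K_1 X_{t-1} + \varepsilon_t\), can be sketched in a few lines of base R. The snippet below is a toy illustration, not package code (the simulated process and all variable names are ours): it simulates a bivariate VAR(1) and recovers the intercept and feedback matrix by equation-by-equation ordinary least squares, the estimator underlying the fully unrestricted case.

```r
## Toy sketch (not MultiATSM code): simulate a bivariate VAR(1),
## X_t = K0 + K1 X_{t-1} + eps_t, and recover (K0, K1) via OLS
set.seed(123)
K0 <- c(0.1, -0.2)
K1 <- matrix(c(0.9, 0.05,
               0.0, 0.80), nrow = 2, byrow = TRUE)
T_obs <- 500
X <- matrix(0, T_obs, 2)
for (t in 2:T_obs)
  X[t, ] <- K0 + K1 %*% X[t - 1, ] + rnorm(2, sd = 0.05)

## Regress X_t on a constant and X_{t-1}, one equation at a time
Z <- cbind(1, X[-T_obs, ])                # (T-1) x 3 regressor matrix
B <- solve(t(Z) %*% Z, t(Z) %*% X[-1, ])  # 3 x 2 coefficient matrix
```

Here \texttt{t(B)} stacks the estimated intercept (first column) next to the estimated feedback matrix, so it should be close to \texttt{cbind(K0, K1)} for a sufficiently long sample.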
+
+\subsubsection{VAR}\label{S:SectionVAR}
+
+To use \texttt{VAR()}, the user needs to select the appropriate set of risk factors for the model being estimated and specify \texttt{unconstrained} in the argument \texttt{VARtype}. In the two examples presented below, the outputs are the intercept vector, the feedback matrix, and the variance--covariance matrix for a VAR(1) model under the \(\mathbb{P}\)-measure:
+
+\begin{verbatim}
+## Example 1: "JPS global" and "JPS multi" models
+data("RiskFacFull")
+PdynPara <- VAR(RiskFacFull, VARtype = "unconstrained")
+
+## Example 2: "JPS original" model for China
+FactorsChina <- RiskFacFull[1:7, ]
+PdynPara <- VAR(FactorsChina, VARtype = "unconstrained")
+\end{verbatim}
+
+\subsubsection{GVAR}\label{S:SectionGVAR}
+
+The \texttt{GVAR()} function estimates a GVAR(1) model constructed from country-specific VARX\(^{*}(1,1,1)\) specifications. It requires two main inputs: the number of domestic spanned factors (\(N\)) and a set of elements grouped in the \texttt{GVARinputs} list. The latter consists of four components:
+
+\begin{enumerate}
+\def\labelenumi{\arabic{enumi}.}
+\item
+ Economies: a \(C\)-dimensional character vector containing the names of the economies present in the economic system;
+\item
+ GVAR list of risk factors: a list containing the risk factors grouped by country, along with the global variables. An example of the expected data structure is:
+\end{enumerate}
+
+\begin{verbatim}
+data("GVARFactors")
+\end{verbatim}
+
+To assist in formatting the data accordingly, users may use the \texttt{DatabasePrep()} function;
+
+\begin{enumerate}
+\def\labelenumi{\arabic{enumi}.}
+\setcounter{enumi}{2}
+\tightlist
+\item
+ VARX type: a character vector specifying the desired structure of the VARX\(^{*}\) model. Two general options are available:
+\end{enumerate}
+
+\begin{itemize}
+\item
+ Fully unconstrained: specify as \texttt{unconstrained}.
This option estimates each equation in the system separately via ordinary least squares, without imposing any restrictions.
+\item
+ With constraints: imposes a specific set of zero restrictions on the feedback matrix. This category includes two sub-options:
+ \emph{(a)} \texttt{constrained:\ Spanned\ Factors} prevents foreign spanned factors from affecting any domestic risk factor;
+ \emph{(b)} \texttt{constrained:\ {[}factor\ name{]}} restricts the specified risk factor to be influenced only by its own lags and the lags of its associated star variables. In both cases, the VARX\(^{*}\) is estimated using restricted least squares.
+\end{itemize}
+
+\begin{verbatim}
+data('GVARFactors')
+GVARinputs <- list(Economies = Economies, GVARFactors = GVARFactors,
+                   VARXtype = "constrained: Inflation")
+\end{verbatim}
+
+\begin{enumerate}
+\def\labelenumi{\arabic{enumi}.}
+\setcounter{enumi}{3}
+\tightlist
+\item
+ Transition matrix: a \(C \times C\) matrix that captures the degree of interdependence across the countries in the system. Each entry \((i,j)\) represents the strength of the dependence of economy \(i\) on economy \(j\). As an example, the matrix below is computed from bilateral trade flow data, averaged over the period 2006--2019, for a system comprising China, Brazil, Mexico, and Uruguay. The rows are normalized so that, for each country, the weights sum to \(1\).
+ The transition matrix can be generated using \texttt{Transition\_Matrix()}, as illustrated in the Appendix:
+\end{enumerate}
+
+\begin{verbatim}
+#>           China Brazil Mexico Uruguay
+#> China    0.0000 0.6549 0.3155  0.0296
+#> Brazil   0.8269 0.0000 0.1234  0.0497
+#> Mexico   0.8596 0.1326 0.0000  0.0078
+#> Uruguay  0.3811 0.5498 0.0691  0.0000
+\end{verbatim}
+
+With the inputs specified, the user can estimate a GVAR model using:
+
+\begin{verbatim}
+data("GVARFactors")
+GVARinputs <- list(Economies = Economies, GVARFactors = GVARFactors,
+                   VARXtype = "unconstrained", Wgvar = W_gvar)
+N <- 3
+GVARpara <- GVAR(GVARinputs, N, CheckInputs = TRUE)
+\end{verbatim}
+
+Note that the \texttt{CheckInputs} parameter should be set to \texttt{TRUE} to perform a consistency check on the inputs specified in \texttt{GVARinputs} prior to the \(\mathbb{P}\)-dynamics estimation.
+
+\subsubsection{JLL}\label{S:SectionJLL}
+
+The \texttt{JLL()} function estimates the physical parameters. Required inputs are:
+
+\begin{enumerate}
+\def\labelenumi{\arabic{enumi}.}
+\item
+ Risk factors: a time series matrix of the risk factors in their non-orthogonalized form;
+\item
+ Number of spanned factors (\(N\)): a scalar representing the number of country-specific spanned factors;
+\item
+ \texttt{JLLinputs}: a list object containing the following elements:
+\end{enumerate}
+
+\begin{itemize}
+\item
+ Economies: a \(C\)-dimensional character vector listing the economies;
+\item
+ Dominant Economy: a character vector indicating either the name of the country assigned as the dominant economy (for \texttt{JLL\ original} and \texttt{JLL\ jointSigma} models), or \texttt{None} (for the \texttt{JLL\ No\ DomUnit} case);
+\item
+ Estimate Sigma Matrices: a logical equal to \texttt{TRUE} if the user wishes to estimate the full set of JLL sigma matrices (i.e., variance-covariance and Cholesky factor matrices), and \texttt{FALSE} otherwise.
Since this numerical estimation is costly, it may significantly increase computation time;
+\item
+ Precomputed Variance-Covariance Matrix: in some instances, a precomputed variance-covariance matrix from the non-orthogonalized dynamics can be supplied here to save time and memory. If no such matrix is available, this input should be set to \texttt{NULL};
+\item
+ JLL type: a character string specifying the chosen JLL model, following the classification described in Table \ref{tab:tab-ModFea-L}.
+\end{itemize}
+
+\begin{verbatim}
+## First set the JLLinputs
+ModelType <- "JLL original"
+JLLinputs <- list(Economies = Economies, DomUnit = "China", WishSigmas = TRUE,
+                  SigmaNonOrtho = NULL, JLLModelType = ModelType)
+
+## Then, estimate the P-dynamics from the desired JLL model
+data("RiskFacFull")
+N <- 3
+JLLpara <- JLL(RiskFacFull, N, JLLinputs, CheckInputs = TRUE)
+\end{verbatim}
+
+The \texttt{CheckInputs} input is set to \texttt{TRUE} to perform a consistency check on the inputs specified in \texttt{JLLinputs} before running the \(\mathbb{P}\)-dynamics estimation.
+
+\subsection{ATSM estimation}\label{atsm-estimation}
+
+Estimating the ATSM involves maximizing the log-likelihood function via \texttt{Optimization()} to obtain the best-fitting model parameters. The unspanned risk factor framework of JPS (2014) (and, therefore, all its multicountry extensions) follows a model parameterization similar to that proposed in \citet{JoslinSingletonZhu2011}.
Particularly, it requires estimating a set of six parameter blocks: + +\begin{enumerate} +\def\labelenumi{\arabic{enumi}.} +\item + The risk-neutral long-run mean of the short rate (\(r0\)); +\item + The risk-neutral feedback matrix (\(K1XQ\)); +\item + Standard deviation of measurement errors for yields observed with error (\(se\)); +\item + The variance-covariance matrix from the VAR process (\(SSZ\)); +\item + The intercept matrix of the physical dynamics (\(K0Z\)); +\item + The feedback matrix of the physical dynamics (\(K1Z\)). +\end{enumerate} + +The parameters \(K0Z\) and \(K1Z\) have closed-form solutions. Similarly, \(r0\) and \(se\) are derived analytically and are factored out of the log-likelihood function. In contrast, the remaining parameters, \(K1XQ\) and \(SSZ\), must be estimated numerically. + +The optimization routine in \CRANpkg{MultiATSM} combines the \texttt{Nelder–Mead} and \texttt{L-BFGS-B} algorithms, executed sequentially and repeated until convergence is achieved. At each iteration, the parameter vector yielding the highest likelihood is retained, enhancing robustness to local optima without resorting to full multi-start procedures. Convergence is achieved when the absolute change in the mean log-likelihood falls below a user-defined tolerance (default \(10^{-4}\)). For the bootstrap replications, the same optimization procedure is applied; however, only the \texttt{Nelder–Mead} algorithm is used to reduce computation time. + +\section{Full implementation of ATSMs}\label{S:SectionImplementation} + +\subsection{Package workflow}\label{package-workflow} + +The complete workflow of the \CRANpkg{MultiATSM} package is built around seven core functions, which together support a streamlined and modular process. An overview of these functions is provided below: + +\begin{enumerate} +\def\labelenumi{\arabic{enumi}.} +\item + \texttt{LabFac()}: returns a list of risk factor labels used throughout the package. 
In particular, these labels assist in structuring sub-function inputs and generating variable and graph labels in a parsimonious manner; +\item + \texttt{InputsForOpt()}: collects and processes the inputs needed to build the likelihood function as specified in Section~\hyperref[S:SectionInputs]{5}. It estimates the model's \(\mathbb{P}\)-dynamics and returns an object of class \emph{ATSMModelInputs}, which includes \texttt{print()} and \texttt{summary()} S3 methods. The \texttt{print()} method summarizes model inputs and system features, while \texttt{summary()} reports statistics on risk factors and bond yields; +\item + \texttt{Optimization()}: performs the estimation of the model parameters, primarily the \(\mathbb{Q}\)-dynamics, using numerical optimization. This function returns a comprehensive list of the model's point estimates and can be computationally intensive; +\item + \texttt{InputsForOutputs()}: an auxiliary function that compiles the necessary elements for producing numerical and graphical outputs. It also creates separate folders in the user's \texttt{Folder2Save} directory to store the generated figures; +\item + \texttt{NumOutputs()}: produces the numerical outputs as selected in Section~\hyperref[S:SectionNumOut]{5.3}, based on the model's point estimates. The function returns an object of class \emph{ATSMNumOutputs}, for which an \texttt{autoplot()} S3 method is available. This method provides a convenient way to visualize the selected graphical outputs; +\item + \texttt{Bootstrap()}: computes confidence bounds for the numerical outputs using the bootstrap procedures defined in Section~\hyperref[S:SectionNumOut]{5.3} (subsection ``Bootstrap settings''). The function returns an \emph{ATSMModelBoot} object, which can be accessed via the \texttt{autoplot()} S3 method to generate the desired graphical outputs with confidence intervals. 
As this step involves repeated model estimation, it may require several hours (possibly days) to complete;
+\item
+ \texttt{ForecastYields()}: generates bond yield forecasts and the corresponding forecast errors according to the specifications outlined in Section~\hyperref[S:SectionNumOut]{5.3} (subsection ``Out-of-sample forecast settings''). This function returns an object of class \emph{ATSMModelForecast}, accessible via the \texttt{plot()} S3 method, which displays Root Mean Squared Errors (RMSEs) by country and forecast horizon.
+\end{enumerate}
+
+\subsection{Complete implementation}\label{complete-implementation}
+
+This section illustrates how to fully implement ATSMs using the \CRANpkg{MultiATSM} package. A simplified two-country \texttt{JPS\ original} framework serves as the example. The implementation steps are outlined below, and a sample of graphical outputs is presented in Figures \ref{fig:FitYields} -- \ref{fig:TermPremia}.
+
+\begin{verbatim}
+library(MultiATSM)
+# 1) USER INPUTS
+# A) Load database data
+LoadData("CM_2024")
+
+# B) GENERAL model inputs
+ModelType <- "JPS original"
+Economies <- c("China", "Brazil")
+GlobalVar <- c("Gl_Eco_Act")
+DomVar <- c("Eco_Act")
+N <- 2
+t0_sample <- "01-05-2005"
+tF_sample <- "01-12-2019"
+OutputLabel <- "Test"
+DataFreq <- "Monthly"
+Folder2Save <- NULL
+StatQ <- FALSE
+
+# B.1) SPECIFIC model inputs
+# GVAR-based models
+GVARlist <- list(VARXtype = "unconstrained", W_type = "Sample Mean", t_First_Wgvar = "2005",
+                 t_Last_Wgvar = "2019", DataConnectedness = TradeFlows)
+
+# JLL-based models
+JLLlist <- list(DomUnit = "China")
+
+# BRW inputs
+WishBC <- FALSE
+BRWlist <- within(list(Cent_Measure = "Mean", gamma = 0.05, N_iter = 250, B = 50, checkBRW = TRUE,
+                       B_check = 1000, Eigen_rest = 1), N_burn <- round(N_iter * 0.15))
+
+# C) Decide on settings for numerical outputs
+WishFPremia <- TRUE
+FPmatLim <- c(60, 120)
+
+Horiz <- 30
+DesiredGraphs <- c()
+WishGraphRiskFac <- FALSE
+WishGraphYields <- FALSE
+WishOrthoJLLgraphs <- FALSE + +# D) Bootstrap settings +WishBootstrap <- TRUE +BootList <- list(methodBS = 'bs', BlockLength = 4, ndraws = 5, pctg = 95) + +# E) Out-of-sample forecast +WishForecast <- TRUE +ForecastList <- list(ForHoriz = 12, t0Sample = 1, t0Forecast = 162, ForType = "Rolling") + +########################################################################################## +# NO NEED TO MAKE CHANGES FROM HERE: +# The sections below automatically process the inputs provided above, run the model +# estimation, generate the numerical and graphical outputs, and save results. + +# 2) Minor preliminary work: get the sets of factor labels +FactorLabels <- LabFac(N, DomVar, GlobalVar, Economies, ModelType) + +# 3) Prepare the inputs of the likelihood function +ATSMInputs <- InputsForOpt(t0_sample, tF_sample, ModelType, Yields, GlobalMacro, + DomMacro, FactorLabels, Economies, DataFreq, GVARlist, + JLLlist, WishBC, BRWlist) + +# 4) Optimization of the ATSM (Point Estimates) +ModelParaList <- Optimization(ATSMInputs, StatQ, DataFreq, FactorLabels, Economies, ModelType) + +# 5) Numerical and graphical outputs +# a) Prepare list of inputs for graphs and numerical outputs +InputsForOutputs <- InputsForOutputs(ModelType, Horiz, DesiredGraphs, OutputLabel, StatQ, + DataFreq, WishGraphYields, WishGraphRiskFac, + WishOrthoJLLgraphs, WishFPremia, + FPmatLim, WishBootstrap, BootList, + WishForecast, ForecastList) + +# b) Fit, IRF, FEVD, GIRF, GFEVD, and Term Premia +NumericalOutputs <- NumOutputs(ModelType, ModelParaList, InputsForOutputs, + FactorLabels, Economies, Folder2Save) + +# c) Confidence intervals (bootstrap analysis) +BootstrapAnalysis <- Bootstrap(ModelType, ModelParaList, NumericalOutputs, Economies, + InputsForOutputs, FactorLabels, JLLlist, GVARlist, + WishBC, BRWlist, Folder2Save) + +# 6) Out-of-sample forecasting +Forecasts <- ForecastYields(ModelType, ModelParaList, InputsForOutputs, FactorLabels, + Economies, JLLlist, GVARlist, WishBC, BRWlist, + 
Folder2Save) +\end{verbatim} + +\begin{figure} +\includegraphics[width=1\linewidth]{RJ-2025-044_files/figure-latex/FitYields-1} \caption{Chinese bond yield maturities with model fit comparisons. \emph{Model-fit} reflects estimation using only risk-neutral ($\mathbb{Q}$) dynamics parameters, while \emph{Model-implied} incorporates both physical ($\mathbb{P}$) and risk-neutral ($\mathbb{Q}$) dynamics. The $x$-axes represent time in months and the $y$-axis is in natural units.}\label{fig:FitYields} +\end{figure} + +\begin{figure} +\includegraphics[width=1\linewidth]{RJ-2025-044_files/figure-latex/IRF-1} \caption{IRFs from the Brazilian bond yields to global economic activity. Size of the shock is one-standard deviation. The black lines are the point estimates. Gray dashed lines are the bounds of the 95\% confidence intervals and the green lines correspond to the median of these intervals. The $x$-axes are expressed in months and the $y$-axis is in natural units.}\label{fig:IRF} +\end{figure} + +\begin{figure} +\includegraphics[width=1\linewidth]{RJ-2025-044_files/figure-latex/FEVD-1} \caption{FEVD from the Brazilian bond yield with maturity 60 months. The $x$-axis represents the forecast horizon in months and the $y$-axis is in natural units.}\label{fig:FEVD} +\end{figure} + +\begin{figure} +\includegraphics[width=1\linewidth]{RJ-2025-044_files/figure-latex/TermPremia-1} \caption{Chinese sovereign yield curve decomposition showing (i) expected future short rates and (ii) term premia components. The $x$-axis represents time in months and the $y$-axis is expressed in percentage points.}\label{fig:TermPremia} +\end{figure} + +\section{Concluding remarks}\label{concluding-remarks} + +The \CRANpkg{MultiATSM} package aims to advance yield curve (term structure) modelling within the R programming environment. 
It provides a comprehensive yet user-friendly toolkit for practitioners, academics, and policymakers, featuring estimation routines and generating detailed outputs across several macrofinance model classes. This allows for an in-depth exploration of the relationship between real-economy developments and fixed-income markets.
+
+The package covers eight classes of macrofinance term structure models, all built upon the single-country unspanned macroeconomic risk framework of \citet{JoslinPriebschSingleton2014}, which is also extended to a multicountry setting. Additional multicountry variants based on \citet{JotikasthiraLeLundblad2015} and \citet{CandelonMoura2024} are included, incorporating, respectively, a dominant economy and a GVAR structure to model cross-country interdependence.
+
+Each model class provides analytical outputs that offer insight into term structure dynamics, including plots of model fit, risk premia, impulse responses, and forecast error variance decompositions. The \CRANpkg{MultiATSM} package also offers bootstrap procedures for confidence interval construction and out-of-sample forecasting of bond yields.
+
+\section*{Acknowledgments}\label{acknowledgments}
+\addcontentsline{toc}{section}{Acknowledgments}
+
+I thank the editor, Rob Hyndman, and an anonymous referee for several helpful comments. I am also grateful to Bertrand Candelon, Adhir Dhoble and Gustavo Torregrosa for many insightful discussions. An earlier version of this paper circulated under the title \emph{MultiATSM: An R Package for Arbitrage-Free Multicountry Affine Term Structure of Interest Rate Models with Unspanned Macroeconomic Risk} and was part of the author's PhD dissertation at UCLouvain \citep{Moura2022}. The views expressed in this paper are those of the author and do not necessarily reflect those of Banco de Mexico.
+ +\section{Appendix}\label{appendix} + +\subsection*{A: Supplementary functions}\label{a-supplementary-functions} +\addcontentsline{toc}{subsection}{A: Supplementary functions} + +\subsubsection{Importing data from Excel files}\label{importing-data-from-excel-files} + +The \CRANpkg{MultiATSM} package also provides an automated procedure for importing data from Excel files via \texttt{Load\_Excel\_Data()} and preparing the risk factor database used directly in the model estimation. To ensure compatibility with the package functions, the following requirements must be met: + +\begin{enumerate} +\def\labelenumi{\arabic{enumi}.} +\tightlist +\item + Databases must be organized in separate Excel files: one for unspanned factors and another for term structure data. For GVAR-based models, a third file containing the interdependence measures is also required; +\item + Each Excel file should include one tab per country. In the case of unspanned factors, an additional tab must be included for the global variables if the user opts to incorporate them; +\item + Variable names must be identical across all tabs within each file. +\end{enumerate} + +An example Excel file meeting these requirements is provided with the package. 
Below is an example of how to import the data from Excel and construct the input list to be supplied:
+
+\begin{verbatim}
+MacroData <- Load_Excel_Data(system.file("extdata", "MacroData.xlsx",
+                                         package = "MultiATSM"))
+YieldsData <- Load_Excel_Data(system.file("extdata", "YieldsData.xlsx",
+                                          package = "MultiATSM"))
+\end{verbatim}
+
+\begin{verbatim}
+ModelType <- "JPS original"
+Initial_Date <- "2006-09-01"
+Final_Date <- "2019-01-01"
+DataFrequency <- "Monthly"
+GlobalVar <- c("GBC", "VIX")
+DomVar <- c("Eco_Act", "Inflation", "Com_Prices", "Exc_Rates")
+N <- 3
+Economies <- c("China", "Mexico", "Uruguay", "Brazil", "Russia")
+\end{verbatim}
+
+These inputs are used to construct the \emph{RiskFactorsSet} variable, which holds the full collection of risk factors required by the model.
+
+\begin{verbatim}
+FactorLabels <- LabFac(N, DomVar, GlobalVar, Economies, ModelType)
+RiskFactorsSet <- DataForEstimation(Initial_Date, Final_Date, Economies, N, FactorLabels,
+                                    ModelType, DataFrequency, MacroData, YieldsData)
+\end{verbatim}
+
+\subsubsection{Transition matrix and star factors}\label{transition-matrix-and-star-factors}
+
+To construct the transition matrix for GVAR specifications, the user can employ \texttt{Transition\_Matrix()}. This function requires:
+
+\begin{enumerate}
+\def\labelenumi{\arabic{enumi}.}
+\item
+ Data selection: choose proxies for cross-country interdependence.
+\item
+ Time frame: specify the sample's start and end dates.
+\item
+ Dependence measure: select from:
+\end{enumerate}
+
+\begin{itemize}
+\tightlist
+\item
+ Time-varying (dynamic weights)
+\item
+ Sample Mean (static average)
+\item
+ A numeric scalar (fixed-year snapshot).
+\end{itemize}
+
+\begin{verbatim}
+data("TradeFlows")
+t_First <- "2006"
+t_Last <- "2019"
+Economies <- c("China", "Brazil", "Mexico", "Uruguay")
+type <- "Sample Mean"
+W_gvar <- Transition_Matrix(t_First, t_Last, Economies, type, TradeFlows)
+\end{verbatim}
+
+Note that if data is missing for any country in a given year, the corresponding transition matrix will contain only \texttt{NA}s.
+
+A more flexible approach to modelling interdependence is to allow the transition matrix to vary over time. In this case, the star factors are constructed using trade flow weights specific to each year, adjusting the corresponding year's risk factors accordingly. To enable this feature, users must set the \texttt{type} argument to \texttt{Time-varying} and specify the same year for both the initial and final periods in the transition matrix. This indicates that the trade weights from that particular year are used when solving the GVAR system (i.e., in the construction of the link matrices, see Equation \eqref{eq:LinkMatequation}).
+
+\subsection*{B: Additional theoretical considerations}\label{b-additional-theoretical-considerations}
+\addcontentsline{toc}{subsection}{B: Additional theoretical considerations}
+
+\subsubsection{Bond yield decomposition}\label{bond-yield-decomposition}
+
+The \CRANpkg{MultiATSM} package allows for the calculation of two risk compensation measures: term premia and forward premia. Assume that an \(n\)-maturity bond yield can be decomposed into two components: the expected short-rate (\(\mathrm{Exp}_{i,t}^{(n)}\)) and term premia (\(\mathrm{TP}_{i,t}^{(n)}\)). Technically:
+\[
+y_{i,t}^{(n)} = \mathrm{Exp}_{i,t}^{(n)} + \mathrm{TP}_{i,t}^{(n)} \text{.}
+\]
+In the package's standard form, the expected short rate term is computed from time \(t\) to \(t+n\), which represents the bond's maturity: \(\mathrm{Exp}_{i,t}^{(n)} = \sum_{h=0}^{n} E_t[y_{i, t+h}^{(1)}]\).
Alternatively, the decomposition for the forward rates (\(f_{i,t}^{(n)}\)) is \(f_{i,t}^{(n)} = \sum_{h=m}^{n} E_t[y_{i,t+h}^{(1)}] + \mathrm{FP}_{i,t}^{(n)}\), where \(\mathrm{FP}_{i,t}^{(n)}\) corresponds to the forward premia. In this case, the user must set \texttt{WishFPremia} to \texttt{TRUE} if the computation of forward premia is desired, or to \texttt{FALSE} otherwise. If set to \texttt{TRUE}, the user must also provide a two-element numerical vector containing the maturities that delimit the start and end of the forward horizon. Example:
+
+\begin{verbatim}
+ WishFPremia <- TRUE
+ FPmatLim <- c(60, 120)
+\end{verbatim}
+
+\subsection*{C: Replication of existing research}\label{c-replication-of-existing-research}
+\addcontentsline{toc}{subsection}{C: Replication of existing research}
+
+\subsubsection{Joslin, Priebsch and Singleton (2014)}\label{joslin-priebisch-and-singleton-2014}
+
+The dataset used in this replication was constructed by \citet{BauerRudebusch2017} (henceforth BR, 2017) and is available on Bauer's website. In their paper, BR (2017) investigate whether macrofinance term structure models are better suited to the unspanned macro risk framework of JPS (2014) or to earlier, traditional spanned settings such as \citet{AngPiazzesi2003}. To that end, BR (2017) replicate selected empirical results from JPS (2014). The corresponding R code is also available on Bauer's website.
+
+Using the dataset from BR (2017), the code below applies the \CRANpkg{MultiATSM} package to estimate the key ATSM parameters following the \texttt{JPS\ original} modelling setup.
+
+\begin{verbatim}
+# 1) INPUTS
+# A) Load database data
+LoadData("BR_2017")
+
+# B) GENERAL model inputs
+ModelType <- "JPS original"
+
+Economies <- c("US")
+GlobalVar <- c()
+DomVar <- c("GRO", "INF")
+N <- 3
+t0_sample <- "January-1985"
+tF_sample <- "December-2007"
+DataFreq <- "Monthly"
+StatQ <- FALSE
+
+# 2) Minor preliminary work
+FactorLabels <- LabFac(N, DomVar, GlobalVar, Economies, ModelType)
+Yields <- t(BR_jps_out$Y)
+DomesticMacroVar <- t(BR_jps_out$M.o)
+GlobalMacroVar <- c()
+
+# 3) Prepare the inputs of the likelihood function
+ATSMInputs <- InputsForOpt(t0_sample, tF_sample, ModelType, Yields, GlobalMacroVar,
+                           DomesticMacroVar, FactorLabels, Economies, DataFreq)
+
+# 4) Optimization of the model
+ModelPara <- Optimization(ATSMInputs, StatQ, DataFreq, FactorLabels, Economies, ModelType)
+\end{verbatim}
+
+The tables below compare the ATSM parameter estimates generated by the BR (2017) code and by the \CRANpkg{MultiATSM} package. Table \ref{tab:QdynTab-L} reports the risk-neutral parameters. While the values presented do not match exactly, the differences are well within convergence tolerance and are arguably economically negligible. Table \ref{tab:PdynTab-L}, by contrast, contains parameters related to the model's time-series dynamics. As these are derived in closed form, the estimates are identical under both implementations.
+
+\newpage
+
+\begin{table}[!h]
+\centering
+\caption{\label{tab:QdynTab-L}$Q$-dynamics parameters}
+\centering
+\fontsize{7}{9}\selectfont
+\begin{tabular}[t]{lrr}
+\toprule
+ & MultiATSM & BR (2017)\\
+\midrule
+$r_0$ & $0.0006$ & $-0.0002$\\
+$\lambda_1$ & $0.9967$ & $0.9968$\\
+$\lambda_2$ & $0.9149$ & $0.9594$\\
+$\lambda_3$ & $0.9149$ & $0.8717$\\
+\bottomrule
+\end{tabular}
+\end{table}
+
+\vspace{-2.0em}
+\begin{center}
+\footnotesize Note: $\lambda$'s are the eigenvalues from the risk-neutral feedback matrix and $r_0$ is the long-run mean of the short rate under $\mathbb{Q}$. 
+\end{center} + +\begin{table}[!h] +\centering +\caption{\label{tab:PdynTab-L}$P$-dynamics parameters} +\centering +\fontsize{7}{9}\selectfont +\begin{tabular}[t]{lcccccc} +\toprule +\multicolumn{1}{c}{\textbf{ }} & \multicolumn{1}{c}{\textbf{K0Z}} & \multicolumn{5}{c}{\textbf{K1Z}} \\ +\cmidrule(l{3pt}r{3pt}){2-2} \cmidrule(l{3pt}r{3pt}){3-7} + & & PC1 & PC2 & PC3 & GRO & INF\\ +\midrule +\addlinespace[0.3em] +\multicolumn{7}{l}{\textbf{BR (2017)}}\\ +\hspace{1em}PC1 & $\phantom{0}\phantom{0}0.0781$ & $\phantom{0}\phantom{0}0.9369$ & $\phantom{0}\text{-}0.0131$ & $\phantom{0}\text{-}0.0218$ & $\phantom{0}\phantom{0}0.1046$ & \vphantom{1} $\phantom{0}\phantom{0}0.1003$\\ +\hspace{1em}PC2 & $\phantom{0}\phantom{0}0.0210$ & $\phantom{0}\phantom{0}0.0058$ & $\phantom{0}\phantom{0}0.9781$ & $\phantom{0}\phantom{0}0.1703$ & $\phantom{0}\text{-}0.1672$ & \vphantom{1} $\phantom{0}\text{-}0.0402$\\ +\hspace{1em}PC3 & $\phantom{0}\phantom{0}0.1005$ & $\phantom{0}\text{-}0.0104$ & $\phantom{0}\text{-}0.0062$ & $\phantom{0}\phantom{0}0.7835$ & $\phantom{0}\text{-}0.0399$ & \vphantom{1} $\phantom{0}\phantom{0}0.0437$\\ +\hspace{1em}GRO & $\phantom{0}\phantom{0}0.0690$ & $\phantom{0}\text{-}0.0048$ & $\phantom{0}\phantom{0}0.0180$ & $\phantom{0}\text{-}0.1112$ & $\phantom{0}\phantom{0}0.8818$ & \vphantom{1} $\phantom{0}\text{-}0.0025$\\ +\hspace{1em}INF & $\phantom{0}\phantom{0}0.0500$ & $\phantom{0}\phantom{0}0.0018$ & $\phantom{0}\phantom{0}0.0064$ & $\phantom{0}\text{-}0.0592$ & $\phantom{0}\phantom{0}0.0277$ & \vphantom{1} $\phantom{0}\phantom{0}0.9859$\\ +\addlinespace[0.3em] +\multicolumn{7}{l}{\textbf{MultiATSM}}\\ +\hspace{1em}PC1 & $\phantom{0}\phantom{0}0.0781$ & $\phantom{0}\phantom{0}0.9369$ & $\phantom{0}\text{-}0.0131$ & $\phantom{0}\text{-}0.0218$ & $\phantom{0}\phantom{0}0.1046$ & $\phantom{0}\phantom{0}0.1003$\\ +\hspace{1em}PC2 & $\phantom{0}\phantom{0}0.0210$ & $\phantom{0}\phantom{0}0.0058$ & $\phantom{0}\phantom{0}0.9781$ & $\phantom{0}\phantom{0}0.1703$ & 
$\phantom{0}\text{-}0.1672$ & $\phantom{0}\text{-}0.0402$\\
+\hspace{1em}PC3 & $\phantom{0}\phantom{0}0.1005$ & $\phantom{0}\text{-}0.0104$ & $\phantom{0}\text{-}0.0062$ & $\phantom{0}\phantom{0}0.7835$ & $\phantom{0}\text{-}0.0399$ & $\phantom{0}\phantom{0}0.0437$\\
+\hspace{1em}GRO & $\phantom{0}\phantom{0}0.0690$ & $\phantom{0}\text{-}0.0048$ & $\phantom{0}\phantom{0}0.0180$ & $\phantom{0}\text{-}0.1112$ & $\phantom{0}\phantom{0}0.8818$ & $\phantom{0}\text{-}0.0025$\\
+\hspace{1em}INF & $\phantom{0}\phantom{0}0.0500$ & $\phantom{0}\phantom{0}0.0018$ & $\phantom{0}\phantom{0}0.0064$ & $\phantom{0}\text{-}0.0592$ & $\phantom{0}\phantom{0}0.0277$ & $\phantom{0}\phantom{0}0.9859$\\
+\bottomrule
+\multicolumn{7}{l}{\rule{0pt}{1em}\textit{Note: }}\\
+\multicolumn{7}{l}{\rule{0pt}{1em}$K0Z$ is the intercept and $K1Z$ is the feedback matrix from the $P$-dynamics.}\\
+\end{tabular}
+\end{table}
+
+For replicability, it is important to note that the physical dynamics results reported in Table \ref{tab:PdynTab-L} using \CRANpkg{MultiATSM} rely on the principal component weights provided by BR (2017). Such a matrix is simply a scaled-up version of the one provided by the function \texttt{pca\_weights\_one\_country()} of the current package. Accordingly, despite the numerical differences in the weight matrices, both methods generate time series of spanned factors that are perfectly correlated. Another difference between the two approaches relates to the construction of the log-likelihood function: while in the BR (2017) code this is expressed in terms of a portfolio of yields, the \CRANpkg{MultiATSM} package generates this same input directly as a function of observed yields (i.e., both procedures lead to equivalent log-likelihood values up to the Jacobian term). 
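+
+The invariance of the spanned factors to the scaling of the weight matrix can be illustrated with a short base-R sketch (simulated data, not the BR (2017) inputs): multiplying a weight matrix by a positive constant rescales each principal component but leaves it perfectly correlated with the original.
+
+\begin{verbatim}
+set.seed(123)
+Y <- matrix(rnorm(200 * 8), 200, 8)  # simulated panel of 8 yields
+W <- prcomp(Y)$rotation[, 1:3]       # PC weight matrix (loadings)
+F_base <- Y %*% W                    # spanned factors
+F_scaled <- Y %*% (100 * W)          # factors from scaled-up weights
+diag(cor(F_base, F_scaled))          # equal to 1 for every factor
+\end{verbatim}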
+
+Additionally, it is worth highlighting that the standard deviations for the portfolios of yields observed with errors are nearly identical: 0.0000546 for \CRANpkg{MultiATSM} and 0.0000550 for BR (2017).
+
+\subsubsection{Candelon and Moura (2024)}\label{candelon-and-moura-2024}
+
+The multicountry framework introduced in \citet{CandelonMoura2024} enhances the tractability of large-scale ATSMs and deepens our understanding of the global economic mechanisms driving domestic yield curve fluctuations. This framework also generates more precise model estimates and improves the forecasting performance of these models. This novel setup, embodied by the \texttt{GVAR\ multi} model class, is benchmarked against the findings of \citet{JotikasthiraLeLundblad2015}, which are captured by the \texttt{JLL\ original} model class. The paper showcases an empirical illustration involving China, Brazil, Mexico, and Uruguay.
+
+\begin{verbatim}
+# 1) INPUTS
+# A) Load database data
+LoadData("CM_2024")
+
+# B) GENERAL model inputs
+ModelType <- "GVAR multi"
+Economies <- c("China", "Brazil", "Mexico", "Uruguay")
+GlobalVar <- c("Gl_Eco_Act", "Gl_Inflation")
+DomVar <- c("Eco_Act", "Inflation")
+N <- 3
+t0_sample <- "01-06-2004"
+tF_sample <- "01-01-2020"
+OutputLabel <- "CM_jfec"
+DataFreq <- "Monthly"
+StatQ <- FALSE
+
+# B.1) SPECIFIC model inputs
+# GVAR-based models
+GVARlist <- list(VARXtype = "unconstrained", W_type = "Sample Mean", t_First_Wgvar = "2004",
+                 t_Last_Wgvar = "2019", DataConnectedness = TradeFlows)
+
+# JLL-based models
+JLLlist <- list(DomUnit = "China")
+
+# BRW inputs
+WishBC <- TRUE
+BRWlist <- within(list(Cent_Measure = "Mean", gamma = 0.001, N_iter = 200, B = 50, checkBRW = TRUE,
+                       B_check = 1000, Eigen_rest = 1), N_burn <- round(N_iter * 0.15))
+
+# C) Decide on settings for numerical outputs
+WishFPremia <- TRUE
+FPmatLim <- c(24, 36)
+
+Horiz <- 25
+DesiredGraphs <- c("GIRF", "GFEVD", "TermPremia")
+WishGraphRiskFac <- 
FALSE
+WishGraphYields <- TRUE
+WishOrthoJLLgraphs <- TRUE
+
+# D) Bootstrap settings
+WishBootstrap <- FALSE
+BootList <- list(methodBS = 'bs', BlockLength = 4, ndraws = 1000, pctg = 95)
+
+# E) Out-of-sample forecast
+WishForecast <- TRUE
+ForecastList <- list(ForHoriz = 12, t0Sample = 1, t0Forecast = 100, ForType = "Rolling")
+
+# 2) Minor preliminary work: get the sets of factor labels and a vector of common maturities
+FactorLabels <- LabFac(N, DomVar, GlobalVar, Economies, ModelType)
+
+# 3) Prepare the inputs of the likelihood function
+ATSMInputs <- InputsForOpt(t0_sample, tF_sample, ModelType, Yields, GlobalMacro,
+                           DomMacro, FactorLabels, Economies, DataFreq,
+                           GVARlist, JLLlist, WishBC, BRWlist)
+
+# 4) Optimization of the ATSM (Point Estimates)
+ModelParaList <- Optimization(ATSMInputs, StatQ, DataFreq, FactorLabels, Economies, ModelType)
+
+# 5) Numerical and graphical outputs
+# a) Prepare list of inputs for graphs and numerical outputs
+InputsForOutputs <- InputsForOutputs(ModelType, Horiz, DesiredGraphs, OutputLabel, StatQ,
+                                     DataFreq, WishGraphYields, WishGraphRiskFac,
+                                     WishOrthoJLLgraphs, WishFPremia, FPmatLim,
+                                     WishBootstrap, BootList, WishForecast,
+                                     ForecastList)
+
+# b) Fit, IRF, FEVD, GIRF, GFEVD, and Term Premia
+NumericalOutputs <- NumOutputs(ModelType, ModelParaList, InputsForOutputs,
+                               FactorLabels, Economies)
+
+# c) Confidence intervals (bootstrap analysis)
+BootstrapAnalysis <- Bootstrap(ModelType, ModelParaList, NumericalOutputs, Economies,
+                               InputsForOutputs, FactorLabels, JLLlist, GVARlist,
+                               WishBC, BRWlist)
+
+# 6) Out-of-sample forecasting
+Forecasts <- ForecastYields(ModelType, ModelParaList, InputsForOutputs, FactorLabels,
+                            Economies, JLLlist, GVARlist, WishBC, BRWlist)
+\end{verbatim}
+
+\subsubsection{Candelon and Moura (2023)}\label{candelon-and-moura-2023}
+
+In this paper, \citet{CandelonMoura2023} investigate the underlying factors that shape the sovereign yield curves of Brazil, India, Mexico, and Russia during the 
COVID-19 pandemic crisis. The study adopts a \texttt{GVAR\ multi} approach to capture the complex global macrofinancial and, especially, health-related interdependencies during the pandemic.
+
+\begin{verbatim}
+# 1) INPUTS
+# A) Load database data
+LoadData("CM_2023")
+
+# B) GENERAL model inputs
+ModelType <- "GVAR multi"
+Economies <- c("Brazil", "India", "Russia", "Mexico")
+GlobalVar <- c("US_Output_growth", "China_Output_growth", "SP500")
+DomVar <- c("Inflation", "Output_growth", "CDS", "COVID")
+N <- 2
+t0_sample <- "22-03-2020"
+tF_sample <- "26-09-2021"
+OutputLabel <- "CM_EM"
+DataFreq <- "Weekly"
+StatQ <- FALSE
+
+# B.1) SPECIFIC model inputs
+# GVAR-based models
+GVARlist <- list(VARXtype = "constrained: COVID", W_type = "Sample Mean",
+                 t_First_Wgvar = "2015", t_Last_Wgvar = "2020",
+                 DataConnectedness = TradeFlows_covid)
+
+# BRW inputs
+WishBC <- FALSE
+
+# C) Decide on settings for numerical outputs
+WishFPremia <- TRUE
+FPmatLim <- c(47, 48)
+
+Horiz <- 12
+DesiredGraphs <- c("GIRF", "GFEVD", "TermPremia")
+WishGraphRiskFac <- FALSE
+WishGraphYields <- TRUE
+WishOrthoJLLgraphs <- FALSE
+
+# D) Bootstrap settings
+WishBootstrap <- TRUE
+BootList <- list(methodBS = 'bs', BlockLength = 4, ndraws = 100, pctg = 95)
+
+# 2) Minor preliminary work: get the sets of factor labels and a vector of common maturities
+FactorLabels <- LabFac(N, DomVar, GlobalVar, Economies, ModelType)
+
+# 3) Prepare the inputs of the likelihood function
+ATSMInputs <- InputsForOpt(t0_sample, tF_sample, ModelType, Yields_covid, GlobalMacro_covid,
+                           DomMacro_covid, FactorLabels, Economies, DataFreq, GVARlist)
+
+# 4) Optimization of the ATSM (Point Estimates)
+ModelParaList <- Optimization(ATSMInputs, StatQ, DataFreq, FactorLabels, Economies, ModelType)
+
+# 5) Numerical and graphical outputs
+# a) Prepare list of inputs for graphs and numerical outputs
+InputsForOutputs <- InputsForOutputs(ModelType, Horiz, DesiredGraphs, OutputLabel, StatQ,
+                                     DataFreq, WishGraphYields, 
WishGraphRiskFac, + WishOrthoJLLgraphs, WishFPremia, FPmatLim, + WishBootstrap, BootList) + +# b) Fit, IRF, FEVD, GIRF, GFEVD, and Term Premia +NumericalOutputs <- NumOutputs(ModelType, ModelParaList, InputsForOutputs, FactorLabels, + Economies) + +# c) Confidence intervals (bootstrap analysis) +BootstrapAnalysis <- Bootstrap(ModelType, ModelParaList, NumericalOutputs, Economies, + InputsForOutputs, FactorLabels, + JLLlist = NULL, GVARlist) +\end{verbatim} + +\bibliography{References.bib} + +\address{% +Rubens Moura\\ +Banco de Mexico\\% +Avenida 5 de Mayo, 2\\ Mexico City, Mexico\\ +% +% +\textit{ORCiD: \href{https://orcid.org/0000-0001-8105-4729}{0000-0001-8105-4729}}\\% +\href{mailto:rubens.guimaraes@banxico.org.mx}{\nolinkurl{rubens.guimaraes@banxico.org.mx}}% +} diff --git a/_articles/RJ-2025-044/RJ-2025-044.zip b/_articles/RJ-2025-044/RJ-2025-044.zip new file mode 100644 index 0000000000..4d87ead0ce Binary files /dev/null and b/_articles/RJ-2025-044/RJ-2025-044.zip differ diff --git a/_articles/RJ-2025-044/RJ-2025-044_files/figure-html5/FEVD-1.png b/_articles/RJ-2025-044/RJ-2025-044_files/figure-html5/FEVD-1.png new file mode 100644 index 0000000000..86a35b5857 Binary files /dev/null and b/_articles/RJ-2025-044/RJ-2025-044_files/figure-html5/FEVD-1.png differ diff --git a/_articles/RJ-2025-044/RJ-2025-044_files/figure-html5/FitYields-1.png b/_articles/RJ-2025-044/RJ-2025-044_files/figure-html5/FitYields-1.png new file mode 100644 index 0000000000..486eb63e36 Binary files /dev/null and b/_articles/RJ-2025-044/RJ-2025-044_files/figure-html5/FitYields-1.png differ diff --git a/_articles/RJ-2025-044/RJ-2025-044_files/figure-html5/IRF-1.png b/_articles/RJ-2025-044/RJ-2025-044_files/figure-html5/IRF-1.png new file mode 100644 index 0000000000..6595f5809f Binary files /dev/null and b/_articles/RJ-2025-044/RJ-2025-044_files/figure-html5/IRF-1.png differ diff --git a/_articles/RJ-2025-044/RJ-2025-044_files/figure-html5/TermPremia-1.png 
b/_articles/RJ-2025-044/RJ-2025-044_files/figure-html5/TermPremia-1.png new file mode 100644 index 0000000000..9ebe3401c7 Binary files /dev/null and b/_articles/RJ-2025-044/RJ-2025-044_files/figure-html5/TermPremia-1.png differ diff --git a/_articles/RJ-2025-044/RJ-2025-044_files/figure-html5/pca-H-1.png b/_articles/RJ-2025-044/RJ-2025-044_files/figure-html5/pca-H-1.png new file mode 100644 index 0000000000..f80dcf3009 Binary files /dev/null and b/_articles/RJ-2025-044/RJ-2025-044_files/figure-html5/pca-H-1.png differ diff --git a/_articles/RJ-2025-044/RJ-2025-044_files/figure-latex/FEVD-1.pdf b/_articles/RJ-2025-044/RJ-2025-044_files/figure-latex/FEVD-1.pdf new file mode 100644 index 0000000000..cfbedc0dbd Binary files /dev/null and b/_articles/RJ-2025-044/RJ-2025-044_files/figure-latex/FEVD-1.pdf differ diff --git a/_articles/RJ-2025-044/RJ-2025-044_files/figure-latex/FitYields-1.pdf b/_articles/RJ-2025-044/RJ-2025-044_files/figure-latex/FitYields-1.pdf new file mode 100644 index 0000000000..99e54d3e26 Binary files /dev/null and b/_articles/RJ-2025-044/RJ-2025-044_files/figure-latex/FitYields-1.pdf differ diff --git a/_articles/RJ-2025-044/RJ-2025-044_files/figure-latex/IRF-1.pdf b/_articles/RJ-2025-044/RJ-2025-044_files/figure-latex/IRF-1.pdf new file mode 100644 index 0000000000..e741419126 Binary files /dev/null and b/_articles/RJ-2025-044/RJ-2025-044_files/figure-latex/IRF-1.pdf differ diff --git a/_articles/RJ-2025-044/RJ-2025-044_files/figure-latex/TermPremia-1.pdf b/_articles/RJ-2025-044/RJ-2025-044_files/figure-latex/TermPremia-1.pdf new file mode 100644 index 0000000000..fe7140c254 Binary files /dev/null and b/_articles/RJ-2025-044/RJ-2025-044_files/figure-latex/TermPremia-1.pdf differ diff --git a/_articles/RJ-2025-044/RJ-2025-044_files/figure-latex/pca-L-1.pdf b/_articles/RJ-2025-044/RJ-2025-044_files/figure-latex/pca-L-1.pdf new file mode 100644 index 0000000000..694810e2bf Binary files /dev/null and 
b/_articles/RJ-2025-044/RJ-2025-044_files/figure-latex/pca-L-1.pdf differ diff --git a/_articles/RJ-2025-044/RJournal.sty b/_articles/RJ-2025-044/RJournal.sty new file mode 100644 index 0000000000..351990be38 --- /dev/null +++ b/_articles/RJ-2025-044/RJournal.sty @@ -0,0 +1,358 @@ +% Package `RJournal' to use with LaTeX2e +% Copyright (C) 2010 by the R Foundation +% Copyright (C) 2013 by the R Journal +% +% Originally written by Kurt Hornik and Friedrich Leisch with subsequent +% edits by the editorial board +% +% CAUTION: +% Do not modify this style file. Any changes to this file will be reset when your +% article is submitted. +% If you must modify the style or add LaTeX packages to the article, these +% should be specified in RJwrapper.tex + +\NeedsTeXFormat{LaTeX2e}[1995/12/01] +\ProvidesPackage{RJournal}[2025/10/05 v0.17 RJournal package] + +\RequirePackage{tikz} + +% Overall page layout, fonts etc ----------------------------------------------- + +% Issues of of \emph{The R Journal} are created from the standard \LaTeX{} +% document class \pkg{report}. + +\RequirePackage{geometry} +\geometry{a4paper, + textwidth=14cm, top=1cm, bottom=1cm, + includehead,includefoot,centering, + footskip=1.5cm} +\raggedbottom +\sloppy +\clubpenalty = 10000 +\widowpenalty = 10000 +\brokenpenalty = 10000 +\usepackage{microtype} + + +\RequirePackage{fancyhdr} +\fancyhead{} +\fancyheadoffset{2cm} +\fancyhead[L]{\textsc{\RJ@sectionhead}} +\fancyhead[R]{\thepage} +\fancyfoot{} +\fancyfoot[L]{The R Journal Vol. 
\RJ@volume/\RJ@number, \RJ@month~\RJ@year} +\fancyfoot[R]{ISSN 2073-4859} +\pagestyle{fancy} + +% We use the following fonts (all with T1 encoding): +% +% rm & palatino +% tt & inconsolata +% sf & helvetica +% math & palatino + +\RequirePackage{microtype} + +\RequirePackage[scaled=0.92]{helvet} +\RequirePackage{palatino,mathpazo} +\RequirePackage[scaled=1.02]{inconsolata} +\RequirePackage[T1]{fontenc} + +\RequirePackage[hyphens]{url} +\RequirePackage[pagebackref]{hyperref} +\renewcommand{\backref}[1]{[p#1]} + +% Dark blue colour for all links +\RequirePackage{color} +\definecolor{link}{rgb}{0.45,0.51,0.67} +\hypersetup{ + colorlinks,% + citecolor=link,% + filecolor=link,% + linkcolor=link,% + urlcolor=link +} + +% Give the text a little room to breath +\setlength{\parskip}{3pt} +\RequirePackage{setspace} +\setstretch{1.05} + +% Issue and article metadata --------------------------------------------------- + +% Basic front matter information about the issue: volume, number, and +% date. + +\newcommand{\volume}[1]{\def\RJ@volume{#1}} +\newcommand{\volnumber}[1]{\def\RJ@number{#1}} +\renewcommand{\month}[1]{\def\RJ@month{#1}} +\renewcommand{\year}[1]{\def\RJ@year{#1}} + + +% Individual articles correspond to +% chapters, and are contained in |article| environments. This makes it +% easy to have figures counted within articles and hence hyperlinked +% correctly. + +% An article has an author, a title, and optionally a subtitle. We use +% the obvious commands for specifying these. Articles will be put in certain +% journal sections, named by \sectionhead. + +\newcommand {\sectionhead} [1]{\def\RJ@sectionhead{#1}} +\renewcommand{\author} [1]{\def\RJ@author{#1}} +\renewcommand{\title} [1]{\def\RJ@title{#1}} +\newcommand {\subtitle} [1]{\def\RJ@subtitle{#1}} + +% Control appearance of titles: make slightly smaller than usual, and +% suppress section numbering. 
See http://tex.stackexchange.com/questions/69749 +% for why we don't use \setcounter{secnumdepth}{-1} + +\usepackage[medium]{titlesec} +\usepackage{titletoc} +\titleformat{\section} {\normalfont\large\bfseries}{\arabic{section}}{1em}{} +\titleformat{\subsection}{\normalfont\normalsize\bfseries}{\arabic{section}.\arabic{subsection}}{0.5em}{} +\titlecontents{chapter} [0em]{}{}{}{\titlerule*[1em]{.}\contentspage} + +% Article layout --------------------------------------------------------------- + +% Environment |article| clears the article header information at its beginning. +% We use |\FloatBarrier| from the placeins package to keep floats within +% the article. +\RequirePackage{placeins} +\newenvironment{article}{\author{}\title{}\subtitle{}\FloatBarrier}{\FloatBarrier} + +% Refereed articles should have an abstract, so we redefine |\abstract| to +% give the desired style + +\renewcommand{\abstract}[1]{\noindent\textbf{Abstract} #1} +\renewenvironment{abstract}{\noindent\textbf{Abstract}~}{} + +% The real work is done by a redefined version of |\maketitle|. Note +% that even though we do not want chapters (articles) numbered, we +% need to increment the chapter counter, so that figures get correct +% labelling. + +\renewcommand{\maketitle}{% +\noindent + \chapter{\RJ@title}\refstepcounter{chapter} + \ifx\empty\RJ@subtitle + \else + \noindent\textbf{\RJ@subtitle} + \par\nobreak\addvspace{\baselineskip} + \fi + \ifx\empty\RJ@author + \else + \noindent\textit{\RJ@author} + \par\nobreak\addvspace{\baselineskip} + \fi + \@afterindentfalse\@nobreaktrue\@afterheading +} + +% Now for some ugly redefinitions. We do not want articles to start a +% new page. 
(Actually, we do, but this is handled via explicit +% \newpage +% +% The name@of@eq is a hack to get hyperlinks to equations to work +% within each article, even though there may be multiple eq.(1) +% \begin{macrocode} +\renewcommand\chapter{\secdef\RJ@chapter\@schapter} +\providecommand{\nohyphens}{% + \hyphenpenalty=10000\exhyphenpenalty=10000\relax} +\newcommand{\RJ@chapter}{% + \edef\name@of@eq{equation.\@arabic{\c@chapter}}% + \renewcommand{\@seccntformat}[1]{}% + \@startsection{chapter}{0}{0mm}{% + -2\baselineskip \@plus -\baselineskip \@minus -.2ex}{\p@}{% + \phantomsection\normalfont\huge\bfseries\raggedright}} + +% Book reviews should appear as sections in the text and in the pdf bookmarks, +% however we wish them to appear as chapters in the TOC. Thus we define an +% alternative to |\maketitle| for reviews. +\newcommand{\review}[1]{ + \pdfbookmark[1]{#1}{#1} + \section*{#1} + \addtocontents{toc}{\protect\contentsline{chapter}{#1}{\thepage}{#1.1}} +} + +% We want bibliographies as starred sections within articles. +% +\RequirePackage[sectionbib,round]{natbib} +\bibliographystyle{abbrvnat} +\renewcommand{\bibsection}{\section*{References}} + +% Equations, figures and tables are counted within articles, but we do +% not show the article number. For equations it becomes a bit messy to avoid +% having hyperref getting it wrong. + +% \numberwithin{equation}{chapter} +\renewcommand{\theequation}{\@arabic\c@equation} +\renewcommand{\thefigure}{\@arabic\c@figure} +\renewcommand{\thetable}{\@arabic\c@table} + +% Issue layout ----------------------------------------------------------------- + +% Need to provide our own version of |\tableofcontents|. We use the +% tikz package to get the rounded rectangle. Notice that |\section*| +% is really the same as |\chapter*|. 
+\renewcommand{\contentsname}{Contents} +\renewcommand\tableofcontents{% + \vspace{1cm} + \section*{\contentsname} + { \@starttoc{toc} } +} + +\renewcommand{\titlepage}{% + \thispagestyle{empty} + \hypersetup{ + pdftitle={The R Journal Volume \RJ@volume/\RJ@number, \RJ@month \RJ@year},% + pdfauthor={R Foundation for Statistical Computing},% + } + \noindent + \begin{center} + \fontsize{50pt}{50pt}\selectfont + The \raisebox{-8pt}{\includegraphics[height=77pt]{Rlogo-5}}\hspace{10pt} + Journal + + \end{center} + {\large \hfill Volume \RJ@volume/\RJ@number, \RJ@month{} \RJ@year \quad} + + \rule{\textwidth}{1pt} + \begin{center} + {\Large A peer-reviewed, open-access publication of the \\ + R Foundation for Statistical Computing} + \end{center} + + % And finally, put in the TOC box. Note the way |tocdepth| is adjusted + % before and after producing the TOC: thus, we can ensure that only + % articles show up in the printed TOC, but that in the PDF version, + % bookmarks are created for sections and subsections as well (provided + % that the non-starred forms are used). + \setcounter{tocdepth}{0} + \tableofcontents + \setcounter{tocdepth}{2} + \clearpage +} + +% Text formatting -------------------------------------------------------------- + +\newcommand{\R}{R} +\newcommand{\address}[1]{\addvspace{\baselineskip}\noindent\emph{#1}} +\newcommand{\email}[1]{\href{mailto:#1}{\normalfont\texttt{#1}}} + +% Simple font selection is not good enough. For example, |\texttt{--}| +% gives `\texttt{--}', i.e., an endash in typewriter font. Hence, we +% need to turn off ligatures, which currently only happens for commands +% |\code| and |\samp| and the ones derived from them. Hyphenation is +% another issue; it should really be turned off inside |\samp|. And +% most importantly, \LaTeX{} special characters are a nightmare. E.g., +% one needs |\~{}| to produce a tilde in a file name marked by |\file|. 
+% Perhaps a few years ago, most users would have agreed that this may be +% unfortunate but should not be changed to ensure consistency. But with +% the advent of the WWW and the need for getting `|~|' and `|#|' into +% URLs, commands which only treat the escape and grouping characters +% specially have gained acceptance + +\DeclareRobustCommand\code{\bgroup\@noligs\@codex} +\def\@codex#1{\texorpdfstring% +{{\normalfont\ttfamily\hyphenchar\font=-1 #1}}% +{#1}\egroup} +\newcommand{\kbd}[1]{{\normalfont\texttt{#1}}} +\newcommand{\key}[1]{{\normalfont\texttt{\uppercase{#1}}}} +\DeclareRobustCommand\samp{`\bgroup\@noligs\@sampx} +\def\@sampx#1{{\normalfont\texttt{#1}}\egroup'} +\newcommand{\var}[1]{{\normalfont\textsl{#1}}} +\let\env=\code +\newcommand{\file}[1]{{`\normalfont\textsf{#1}'}} +\let\command=\code +\let\option=\samp +\newcommand{\dfn}[1]{{\normalfont\textsl{#1}}} +% \acronym is effectively disabled since not used consistently +\newcommand{\acronym}[1]{#1} +\newcommand{\strong}[1]{\texorpdfstring% +{{\normalfont\fontseries{b}\selectfont #1}}% +{#1}} +\let\pkg=\strong +\newcommand{\CRANpkg}[1]{\href{https://CRAN.R-project.org/package=#1}{\pkg{#1}}}% +\let\cpkg=\CRANpkg +\newcommand{\ctv}[1]{\href{https://CRAN.R-project.org/view=#1}{\emph{#1}}} +\newcommand{\BIOpkg}[1]{\href{https://www.bioconductor.org/packages/release/bioc/html/#1.html}{\pkg{#1}}} + +% Example environments --------------------------------------------------------- +\RequirePackage{fancyvrb} +\RequirePackage{alltt} + +\DefineVerbatimEnvironment{example}{Verbatim}{} +\renewenvironment{example*}{\begin{alltt}}{\end{alltt}} + +% Support for output from Sweave, and generic session style code +% These used to have fontshape=sl for Sinput/Scode/Sin, but pslatex +% won't use a condensed font in that case. 
+ +% Update (2015-05-28 by DS): remove fontsize=\small to match example environment + +\DefineVerbatimEnvironment{Sinput}{Verbatim}{} +\DefineVerbatimEnvironment{Soutput}{Verbatim}{} +\DefineVerbatimEnvironment{Scode}{Verbatim}{} +\DefineVerbatimEnvironment{Sin}{Verbatim}{} +\DefineVerbatimEnvironment{Sout}{Verbatim}{} +\newenvironment{Schunk}{}{} + +% Mathematics ------------------------------------------------------------------ + +% The implementation of |\operatorname| is similar to the mechanism +% \LaTeXe{} uses for functions like sin and cos, and simpler than the +% one of \AmSLaTeX{}. We use |\providecommand| for the definition in +% order to keep the one of the \pkg{amstex} if this package has +% already been loaded. +% \begin{macrocode} +\providecommand{\operatorname}[1]{% + \mathop{\operator@font#1}\nolimits} +\RequirePackage{amsfonts} + +\renewcommand{\P}{% + \mathop{\operator@font I\hspace{-1.5pt}P\hspace{.13pt}}} +\newcommand{\E}{% + \mathop{\operator@font I\hspace{-1.5pt}E\hspace{.13pt}}} +\newcommand{\VAR}{\operatorname{var}} +\newcommand{\COV}{\operatorname{cov}} +\newcommand{\COR}{\operatorname{cor}} + +% Figures ---------------------------------------------------------------------- + +% For use with pandoc > 3.2.1 +\newsavebox\pandoc@box +\newcommand*\pandocbounded[1]{% scales image to fit in text height/width + \sbox\pandoc@box{#1}% + \Gscale@div\@tempa{\textheight}{\dimexpr\ht\pandoc@box+\dp\pandoc@box\relax}% + \Gscale@div\@tempb{\linewidth}{\wd\pandoc@box}% + \ifdim\@tempb\p@<\@tempa\p@\let\@tempa\@tempb\fi% select the smaller of both + \ifdim\@tempa\p@<\p@\scalebox{\@tempa}{\usebox\pandoc@box}% + \else\usebox{\pandoc@box}% + \fi% +} + +\RequirePackage[font=small,labelfont=bf]{caption} + +% Wide environments for figures and tables ------------------------------------- +\RequirePackage{environ} + +% An easy way to make a figure span the full width of the page +\NewEnviron{widefigure}[1][]{ +\begin{figure}[#1] +\advance\leftskip-2cm 
+\begin{minipage}{\dimexpr\textwidth+4cm\relax}% + \captionsetup{margin=2cm} + \BODY +\end{minipage}% +\end{figure} +} + +\NewEnviron{widetable}[1][]{ +\begin{table}[#1] +\advance\leftskip-2cm +\begin{minipage}{\dimexpr\textwidth+4cm\relax}% + \captionsetup{margin=2cm} + \BODY +\end{minipage}% +\end{table} +} diff --git a/_articles/RJ-2025-044/RJwrapper.tex b/_articles/RJ-2025-044/RJwrapper.tex new file mode 100644 index 0000000000..862395dc73 --- /dev/null +++ b/_articles/RJ-2025-044/RJwrapper.tex @@ -0,0 +1,70 @@ +\documentclass[a4paper]{report} +\usepackage[utf8]{inputenc} +\usepackage[T1]{fontenc} +\usepackage{RJournal} +\usepackage{amsmath,amssymb,array} +\usepackage{booktabs} + + +% tightlist command for lists without linebreak +\providecommand{\tightlist}{% + \setlength{\itemsep}{0pt}\setlength{\parskip}{0pt}} + +\usepackage{longtable} + +% Always define CSL refs as bib entries are contained in separate doc +% Pandoc citation processing +%From Pandoc 3.1.8 +% definitions for citeproc citations +\NewDocumentCommand\citeproctext{}{} +\NewDocumentCommand\citeproc{mm}{% + \begingroup\def\citeproctext{#2}\cite{#1}\endgroup} +\makeatletter + % allow citations to break across lines + \let\@cite@ofmt\@firstofone + % avoid brackets around text for \cite: + \def\@biblabel#1{} + \def\@cite#1#2{{#1\if@tempswa , #2\fi}} +\makeatother +\newlength{\cslhangindent} +\setlength{\cslhangindent}{1.5em} +\newlength{\csllabelwidth} +\setlength{\csllabelwidth}{3em} +\newenvironment{CSLReferences}[2] % #1 hanging-indent, #2 entry-spacing + {\begin{list}{}{% + \setlength{\itemindent}{0pt} + \setlength{\leftmargin}{0pt} + \setlength{\parsep}{0pt} + % turn on hanging indent if param 1 is 1 + \ifodd #1 + \setlength{\leftmargin}{\cslhangindent} + \setlength{\itemindent}{-1\cslhangindent} + \fi + % set entry spacing + \setlength{\itemsep}{#2\baselineskip}}} + {\end{list}} +\usepackage{calc} +\newcommand{\CSLBlock}[1]{#1\hfill\break} 
+\newcommand{\CSLLeftMargin}[1]{\parbox[t]{\csllabelwidth}{#1}} +\newcommand{\CSLRightInline}[1]{\parbox[t]{\linewidth - \csllabelwidth}{#1}\break} +\newcommand{\CSLIndent}[1]{\hspace{\cslhangindent}#1} + + + +\begin{document} + + +%% do not edit, for illustration only +\sectionhead{Contributed research article} +\volume{17} +\volnumber{4} +\year{2025} +\month{December} +\setcounter{page}{275} + +\begin{article} + \input{RJ-2025-044} +\end{article} + + +\end{document} diff --git a/_articles/RJ-2025-044/References.bib b/_articles/RJ-2025-044/References.bib new file mode 100644 index 0000000000..5e874f4cd7 --- /dev/null +++ b/_articles/RJ-2025-044/References.bib @@ -0,0 +1,371 @@ +@article{AbbrittiDellErbaMorenoSola2018, + author = {Abbritti, Mirko and Dell'Erba, Salvatore and Moreno, Antonio and Sola, Sergio}, + title = {Global Factors in the Term Structure of Interest Rates}, + journal = {International Journal of Central Banking}, + year = {2018}, + month = {March}, + volume = {14}, + pages = {301-339}, + url = {https://www.ijcb.org/journal/ijcb18q1a7.htm}, + number = {2} +} +@article{AdrianCrumpMoench2013, + title = {Pricing the Term Structure with Linear Regressions}, + author = {Adrian, Tobias and Crump, Richard K and Moench, Emanuel}, + journal = {Journal of Financial Economics}, + volume = {110}, + number = {1}, + pages = {110--138}, + year = {2013}, + url = {https://doi.org/10.1016/j.jfineco.2013.04.009}, + publisher = {Elsevier} +} +@article{AngPiazzesi2003, + author = {Andrew Ang and Monika Piazzesi}, + title = {A No-Arbitrage Vector Autoregression of Term Structure Dynamics with Macroeconomic and Latent Variables}, + journal = {Journal of Monetary Economics}, + year = {2003}, + volume = {50}, + pages = {745-787}, + url = {https://doi.org/10.1016/S0304-3932(03)00032-1}, + number = {4} +} +@Manual{BGVAR2024, + title = {{BGVAR}: {Bayesian} Global Vector Autoregressions}, + author = {Maximilian Boeck and Martin Feldkircher and Florian Huber and Darjus 
Hosszejni}, + year = {2024}, + note = {{R} package version 2.5.8}, + url = {https://CRAN.R-project.org/package=BGVAR} +} +@Manual{bigtime2023, + title = {{bigtime}: Sparse Estimation of Large Time Series Models}, + author = {Ines Wilms and David S. Matteson and Jacob Bien and Sumanta Basu and Will Nicholson and Enrico Wegner}, + year = {2023}, + note = {{R} package version 0.2.3}, + url = {https://CRAN.R-project.org/package=bigtime} +} +@Manual{BigVAR2025, + title = {{BigVAR}: Dimension Reduction Methods for Multivariate Time Series}, + author = {Will Nicholson and David Matteson and Jacob Bien}, + year = {2025}, + note = {{R} package version 1.1.3}, + url = {https://CRAN.R-project.org/package=BigVAR} +} +@article{BauerRudebuschWu2012, + title = {Correcting Estimation Bias in Dynamic Term Structure Models}, + author = {Bauer, Michael D and Rudebusch, Glenn D and Wu, Jing Cynthia}, + journal = {Journal of Business \& Economic Statistics}, + volume = {30}, + number = {3}, + pages = {454--467}, + year = {2012}, + url = {https://doi.org/10.1080/07350015.2012.693855}, + publisher = {Taylor \& Francis} +} +@article{BauerRudebusch2017, + title = {Resolving the Spanning Puzzle in Macro-Finance Term Structure Models}, + author = {Bauer, Michael D and Rudebusch, Glenn D}, + journal = {Review of Finance}, + volume = {21}, + number = {2}, + pages = {511--553}, + year = {2017}, + url = {https://doi.org/10.1093/rof/rfw044}, + publisher = {Oxford University Press} +} +@article{CandelonMoura2023, + title = {Sovereign Yield Curves and the {COVID}-19 in Emerging Markets}, + author = {Candelon, Bertrand and Moura, Rubens}, + journal = {Economic Modelling}, + volume = {127}, + pages = {106453}, + year = {2023}, + url = {https://doi.org/10.1016/j.econmod.2023.106453}, + publisher = {Elsevier} +} +@article{CandelonMoura2024, + title = {A Multicountry Model of the Term Structures of Interest Rates with a {GVAR}}, + author = {Candelon, Bertrand and Moura, Rubens}, + journal = {Journal of
Financial Econometrics}, + volume = {22}, + number = {5}, + pages = {1558--1587}, + year = {2024}, + url = {https://doi.org/10.1093/jjfinec/nbae008}, + publisher = {Oxford University Press} +} +@article{ChudikPesaran2016, + title = {Theory and Practice of {GVAR} Modelling}, + author = {Chudik, Alexander and Pesaran, M Hashem}, + journal = {Journal of Economic Surveys}, + volume = {30}, + number = {1}, + pages = {165--197}, + year = {2016}, + url = {https://doi.org/10.1111/joes.12095}, + publisher = {Wiley Online Library} +} +@article{DaiSingleton2000, + author = {Dai, Qiang and Singleton, Kenneth J.}, + title = {Specification Analysis of Affine Term Structure Models}, + journal = {Journal of Finance}, + year = {2000}, + volume = {55}, + pages = {1943--1978}, + url = {https://doi.org/10.1111/0022-1082.00278}, + number = {5} +} +@article{DaiSingleton2002, + title = {Expectation Puzzles, Time-Varying Risk Premia, and Affine Models of the Term Structure}, + author = {Dai, Qiang and Singleton, Kenneth J}, + journal = {Journal of Financial Economics}, + volume = {63}, + number = {3}, + pages = {415--441}, + year = {2002}, + url = {https://doi.org/10.1016/S0304-405X(02)00067-3}, + publisher = {Elsevier} +} +@article{DuffieKan1996, + title = {A Yield-Factor Model of Interest Rates}, + author = {Duffie, Darrell and Kan, Rui}, + journal = {Mathematical Finance}, + volume = {6}, + number = {4}, + pages = {379--406}, + year = {1996}, + url = {https://doi.org/10.1111/j.1467-9965.1996.tb00123.x}, + publisher = {Wiley Online Library} +} +@book{EfronTibshirani1994, + title = {An Introduction to the Bootstrap}, + author = {Efron, Bradley and Tibshirani, Robert J}, + year = {1994}, + doi = {https://doi.org/10.1201/9780429246593}, + publisher = {Chapman and Hall/CRC} +} +@Manual{EWS2017, + title = {{EWS}: Early Warning System}, + author = {Jean-Baptiste Hasse and Quentin Lajaunie}, + year = {2021}, + note = {{R} package version 0.2.0}, + url = {https://CRAN.R-project.org/package=EWS} 
+} +@Manual{fBonds2017, + title = {{fBonds}: {R}metrics - Pricing and Evaluating Bonds}, + author = {Tobias Setz}, + year = {2017}, + note = {{R} package version 3042.78}, + url = {https://CRAN.R-project.org/package=fBonds} +} +@book{ggplot22016, + author = {Hadley Wickham}, + title = {{ggplot2}: Elegant Graphics for Data Analysis}, + publisher = {Springer-Verlag New York}, + year = {2016}, + isbn = {978-3-319-24277-4}, + url = {https://ggplot2.tidyverse.org} +} +@article{Moura2022, + title = {Modelling the Term Structure of Interest Rates in a Multicountry Setting}, + author = {Moura, Rubens Guimar{\~a}es Togeiro de}, + year = {2022}, + url = {https://dial.uclouvain.be/pr/boreal/object/boreal:262850} +} +@Manual{MTS2022, + title = {{MTS}: All-Purpose Toolkit for Analyzing Multivariate Time Series (MTS) and Estimating Multivariate Volatility Models}, + author = {Ruey S. Tsay and David Wood and Jon Lachmann}, + year = {2022}, + note = {{R} package version 1.2.1}, + url = {https://CRAN.R-project.org/package=MTS} +} +@article{GurkaynakWright2012, + title = {Macroeconomics and the Term Structure}, + author = {G{\"u}rkaynak, Refet S and Wright, Jonathan H}, + journal = {Journal of Economic Literature}, + volume = {50}, + number = {2}, + pages = {331--67}, + url = {https://www.aeaweb.org/articles?id=10.1257/jel.50.2.331}, + year = {2012} +} +@article{Horowitz2019, + title = {Bootstrap Methods in Econometrics}, + author = {Horowitz, Joel L}, + journal = {Annual Review of Economics}, + volume = {11}, + number = {1}, + pages = {193--224}, + year = {2019}, + url = {https://doi.org/10.1146/annurev-economics-080218-025651}, + publisher = {Annual Reviews} +} +@article{HyndmanKillick2025, + title = {{{CRAN}} Task View: Time Series Analysis}, + author = {Hyndman, Rob J and Killick, Rebecca}, + year = {2025}, + url = {https://cran.r-project.org/web/views/TimeSeries.html}, + publisher = {Comprehensive {R} Archive Network ({CRAN})} +} +@book{KilianLutkepohl2017, + title = {Structural
Vector Autoregressive Analysis}, + author = {Kilian, Lutz and L{\"u}tkepohl, Helmut}, + year = {2017}, + url = {https://doi.org/10.1017/9781108164818}, + publisher = {Cambridge University Press} +} +@misc{LeSingleton2018, + title = {A Small Package of {MATLAB} Routines for the Estimation of Some Term Structure Models}, + author = {Anh Le and Ken Singleton}, + institution = {Bank of Finland}, + type = {Euro Area Business Cycle Network Training School - Term Structure Modelling}, + url = {https://cepr.org/40029}, + year = {2018} +} +@article{JoslinPriebschSingleton2014, + author = {Joslin, Scott and Priebsch, Marcel and Singleton, Kenneth J.}, + title = {Risk Premiums in Dynamic Term Structure Models with Unspanned Macro Risks}, + journal = {Journal of Finance}, + year = {2014}, + volume = {69}, + pages = {1197-1233}, + url = {https://doi.org/10.1111/jofi.12131}, + number = {3} +} +@article{JoslinSingletonZhu2011, + author = {Scott Joslin and Kenneth J. Singleton and Haoxing Zhu}, + title = {A New Perspective on {Gaussian} Dynamic Term Structure Models}, + journal = {Review of Financial Studies}, + year = {2011}, + volume = {24}, + pages = {926-970}, + url = {https://doi.org/10.1093/rfs/hhq128}, + number = {3} +} +@article{JotikasthiraLeLundblad2015, + author = {Jotikasthira, Chotibhak and Le, Anh and Lundblad, Christian}, + title = {Why Do Term Structures in Different Currencies Co-move?}, + journal = {Journal of Financial Economics}, + year = {2015}, + pages = {58-83}, + url = {https://doi.org/10.1016/j.jfineco.2014.09.004}, + volume = {115} +} +@article{LittermanScheinkman1991, + author = {Litterman, Robert and Scheinkman, Jos{\'e}}, + title = {Common Factors Affecting Bond Returns}, + journal = {Journal of Fixed Income}, + year = {1991}, + volume = {1}, + pages = {54-61}, + doi = {10.3905/jfi.1991.692347} +} +@Manual{MultiATSM2025, + title = {{MultiATSM}: Multicountry Term Structure of Interest Rates Models}, + author = {Rubens Moura}, + year = {2025}, + note = 
{{R} package version 1.5.1}, + url = {https://CRAN.R-project.org/package=MultiATSM} +} +@article{NelsonSiegel1987, + title = {Parsimonious Modeling of Yield Curves}, + author = {Nelson, Charles R and Siegel, Andrew F}, + journal = {Journal of Business}, + pages = {473--489}, + year = {1987}, + url = {https://www.jstor.org/stable/2352957}, + publisher = {JSTOR} +} +@article{PesaranShin1998, + title = {Generalized Impulse Response Analysis in Linear Multivariate Models}, + author = {Pesaran, H Hashem and Shin, Yongcheol}, + journal = {Economics Letters}, + volume = {58}, + number = {1}, + pages = {17--29}, + year = {1998}, + url = {https://doi.org/10.1016/S0165-1765(97)00214-0}, + publisher = {Elsevier} +} +@incollection{Piazzesi2010, + title = {Affine Term Structure Models}, + author = {Piazzesi, Monika}, + booktitle = {Handbook of Financial Econometrics: Tools and Techniques}, + pages = {691--766}, + year = {2010}, + url = {https://doi.org/10.1016/B978-0-444-50897-3.50015-8}, + publisher = {Elsevier} +} +@article{RudebuschWu2008, + title = {A Macro-Finance Model of the Term Structure, Monetary Policy and the Economy}, + author = {Rudebusch, Glenn D and Wu, Tao}, + journal = {The Economic Journal}, + volume = {118}, + number = {530}, + pages = {906--926}, + year = {2008}, + url = {https://doi.org/10.1111/j.1468-0297.2008.02155.x}, + publisher = {Oxford University Press Oxford, UK} +} +@Manual{simStateSpace2025, + title = {{simStateSpace}: Simulate Data from State Space Models}, + author = {Ivan Jacob Agaloos Pesigan}, + year = {2025}, + note = {{R} package version 1.2.10}, + url = {https://CRAN.R-project.org/package=simStateSpace} +} +@Manual{Spillover2024, + title = {{Spillover}: Spillover/Connectedness Index Based on VAR Modelling}, + author = {Jilber Urbina}, + year = {2024}, + note = {{R} package version 0.1.1}, + url = {https://CRAN.R-project.org/package=Spillover} +} +@Manual{statespacer2023, + title = {{statespacer}: State Space Modelling in {R}}, + author = 
{Dylan Beijers}, + year = {2023}, + note = {{R} package version 0.5.0}, + url = {https://CRAN.R-project.org/package=statespacer} +} +@Manual{svars2023, + title = {{svars}: Data-Driven Identification of {SVAR} Models}, + author = {Alexander Lange and Bernhard Dalheimer and Helmut Herwartz and Simone Maxand and Hannes Riebl}, + year = {2023}, + note = {{R} package version 1.3.11}, + url = {https://CRAN.R-project.org/package=svars} +} +@misc{Svensson1994, + title = {Estimating and Interpreting Forward Interest Rates: {Sweden} 1992-1994}, + author = {Svensson, Lars EO}, + year = {1994}, + url = {https://www.nber.org/papers/w4871}, + doi = {10.3386/w4871}, + publisher = {National Bureau of Economic Research Cambridge, Mass., {USA}} +} +@article{Vasicek1977, + title = {An Equilibrium Characterization of the Term Structure}, + author = {Vasicek, Oldrich}, + journal = {Journal of Financial Economics}, + volume = {5}, + number = {2}, + pages = {177--188}, + year = {1977}, + url = {https://doi.org/10.1016/0304-405X(77)90016-2}, + publisher = {Elsevier} +} +@Manual{YieldCurve2015, + title = {{YieldCurve}: Modelling and Estimation of the Yield Curve}, + author = {Sergio Salvino Guirreri}, + year = {2015}, + note = {{R} package version 4.1}, + url = {https://CRAN.R-project.org/package=YieldCurve} +} +@Manual{vars2024, + title = {{vars}: {VAR} Modelling}, + author = {Bernhard Pfaff and Matthieu Stigler}, + year = {2024}, + note = {{R} package version 1.6-1}, + url = {https://CRAN.R-project.org/package=vars} +} + diff --git a/_articles/RJ-2025-045/CPMP-2015_data/algorithm_runs.arff b/_articles/RJ-2025-045/CPMP-2015_data/algorithm_runs.arff new file mode 100644 index 0000000000..ba74aa9c6e --- /dev/null +++ b/_articles/RJ-2025-045/CPMP-2015_data/algorithm_runs.arff @@ -0,0 +1,2117 @@ +@RELATION algorithm_runs_premarshalling_astar_2013 + +@ATTRIBUTE instance_id STRING +@ATTRIBUTE repetition NUMERIC +@ATTRIBUTE algorithm STRING +@ATTRIBUTE runtime NUMERIC +@ATTRIBUTE runstatus {ok, 
timeout, memout, not_applicable, crash, other} + +@DATA +BF10_cpmp_16_8_77_16_58_13,1,astar-symmulgt-transmul,3600,memout +BF10_cpmp_16_8_77_16_58_13,1,astar-symmullt-transmul,3600,memout +BF10_cpmp_16_8_77_16_58_13,1,idastar-symmulgt-transmul,311.863,ok +BF10_cpmp_16_8_77_16_58_13,1,idastar-symmullt-transmul,3600,timeout +BF10_cpmp_16_8_77_16_58_3,1,astar-symmulgt-transmul,3600,memout +BF10_cpmp_16_8_77_16_58_3,1,astar-symmullt-transmul,3600,memout +BF10_cpmp_16_8_77_16_58_3,1,idastar-symmulgt-transmul,78.121,ok +BF10_cpmp_16_8_77_16_58_3,1,idastar-symmullt-transmul,600.402,ok +BF10_cpmp_16_8_77_16_58_8,1,astar-symmulgt-transmul,3600,memout +BF10_cpmp_16_8_77_16_58_8,1,astar-symmullt-transmul,3600,memout +BF10_cpmp_16_8_77_16_58_8,1,idastar-symmulgt-transmul,0.012,ok +BF10_cpmp_16_8_77_16_58_8,1,idastar-symmullt-transmul,60.72,ok +BF11_cpmp_16_8_77_31_47_10,1,astar-symmulgt-transmul,3600,memout +BF11_cpmp_16_8_77_31_47_10,1,astar-symmullt-transmul,92.986,ok +BF11_cpmp_16_8_77_31_47_10,1,idastar-symmulgt-transmul,2031.039,ok +BF11_cpmp_16_8_77_31_47_10,1,idastar-symmullt-transmul,141.857,ok +BF11_cpmp_16_8_77_31_47_11,1,astar-symmulgt-transmul,3600,memout +BF11_cpmp_16_8_77_31_47_11,1,astar-symmullt-transmul,3600,memout +BF11_cpmp_16_8_77_31_47_11,1,idastar-symmulgt-transmul,422.53,ok +BF11_cpmp_16_8_77_31_47_11,1,idastar-symmullt-transmul,3600,timeout +BF11_cpmp_16_8_77_31_47_16,1,astar-symmulgt-transmul,182.019,ok +BF11_cpmp_16_8_77_31_47_16,1,astar-symmullt-transmul,3600,memout +BF11_cpmp_16_8_77_31_47_16,1,idastar-symmulgt-transmul,66.988,ok +BF11_cpmp_16_8_77_31_47_16,1,idastar-symmullt-transmul,249.488,ok +BF11_cpmp_16_8_77_31_47_6,1,astar-symmulgt-transmul,3600,memout +BF11_cpmp_16_8_77_31_47_6,1,astar-symmullt-transmul,3600,memout +BF11_cpmp_16_8_77_31_47_6,1,idastar-symmulgt-transmul,3600,timeout +BF11_cpmp_16_8_77_31_47_6,1,idastar-symmullt-transmul,1121.786,ok +BF11_cpmp_16_8_77_31_47_7,1,astar-symmulgt-transmul,2.28,ok 
+BF11_cpmp_16_8_77_31_47_7,1,astar-symmullt-transmul,3600,memout +BF11_cpmp_16_8_77_31_47_7,1,idastar-symmulgt-transmul,14.481,ok +BF11_cpmp_16_8_77_31_47_7,1,idastar-symmullt-transmul,758.815,ok +BF12_cpmp_16_8_77_31_58_14,1,astar-symmulgt-transmul,3600,memout +BF12_cpmp_16_8_77_31_58_14,1,astar-symmullt-transmul,3600,memout +BF12_cpmp_16_8_77_31_58_14,1,idastar-symmulgt-transmul,2816.652,ok +BF12_cpmp_16_8_77_31_58_14,1,idastar-symmullt-transmul,1246.814,ok +BF17_cpmp_20_5_60_12_36_1,1,astar-symmulgt-transmul,0.052,ok +BF17_cpmp_20_5_60_12_36_1,1,astar-symmullt-transmul,0.332,ok +BF17_cpmp_20_5_60_12_36_1,1,idastar-symmulgt-transmul,0.004,ok +BF17_cpmp_20_5_60_12_36_1,1,idastar-symmullt-transmul,0.076,ok +BF17_cpmp_20_5_60_12_36_10,1,astar-symmulgt-transmul,3600,memout +BF17_cpmp_20_5_60_12_36_10,1,astar-symmullt-transmul,3600,memout +BF17_cpmp_20_5_60_12_36_10,1,idastar-symmulgt-transmul,0.004,ok +BF17_cpmp_20_5_60_12_36_10,1,idastar-symmullt-transmul,0.032,ok +BF17_cpmp_20_5_60_12_36_11,1,astar-symmulgt-transmul,1.096,ok +BF17_cpmp_20_5_60_12_36_11,1,astar-symmullt-transmul,3600,memout +BF17_cpmp_20_5_60_12_36_11,1,idastar-symmulgt-transmul,3600,timeout +BF17_cpmp_20_5_60_12_36_11,1,idastar-symmullt-transmul,3600,timeout +BF17_cpmp_20_5_60_12_36_12,1,astar-symmulgt-transmul,10.053,ok +BF17_cpmp_20_5_60_12_36_12,1,astar-symmullt-transmul,0.024,ok +BF17_cpmp_20_5_60_12_36_12,1,idastar-symmulgt-transmul,3600,timeout +BF17_cpmp_20_5_60_12_36_12,1,idastar-symmullt-transmul,3600,timeout +BF17_cpmp_20_5_60_12_36_15,1,astar-symmulgt-transmul,0.024,ok +BF17_cpmp_20_5_60_12_36_15,1,astar-symmullt-transmul,0.616,ok +BF17_cpmp_20_5_60_12_36_15,1,idastar-symmulgt-transmul,0.004,ok +BF17_cpmp_20_5_60_12_36_15,1,idastar-symmullt-transmul,0.004,ok +BF17_cpmp_20_5_60_12_36_16,1,astar-symmulgt-transmul,1.008,ok +BF17_cpmp_20_5_60_12_36_16,1,astar-symmullt-transmul,106.643,ok +BF17_cpmp_20_5_60_12_36_16,1,idastar-symmulgt-transmul,2602.087,ok 
+BF17_cpmp_20_5_60_12_36_16,1,idastar-symmullt-transmul,38.226,ok +BF17_cpmp_20_5_60_12_36_17,1,astar-symmulgt-transmul,33.774,ok +BF17_cpmp_20_5_60_12_36_17,1,astar-symmullt-transmul,22.953,ok +BF17_cpmp_20_5_60_12_36_17,1,idastar-symmulgt-transmul,0.004,ok +BF17_cpmp_20_5_60_12_36_17,1,idastar-symmullt-transmul,89.422,ok +BF17_cpmp_20_5_60_12_36_2,1,astar-symmulgt-transmul,0.044,ok +BF17_cpmp_20_5_60_12_36_2,1,astar-symmullt-transmul,0.08,ok +BF17_cpmp_20_5_60_12_36_2,1,idastar-symmulgt-transmul,3600,timeout +BF17_cpmp_20_5_60_12_36_2,1,idastar-symmullt-transmul,3600,timeout +BF17_cpmp_20_5_60_12_36_20,1,astar-symmulgt-transmul,0.024,ok +BF17_cpmp_20_5_60_12_36_20,1,astar-symmullt-transmul,0.1,ok +BF17_cpmp_20_5_60_12_36_20,1,idastar-symmulgt-transmul,3600,timeout +BF17_cpmp_20_5_60_12_36_20,1,idastar-symmullt-transmul,67.956,ok +BF17_cpmp_20_5_60_12_36_3,1,astar-symmulgt-transmul,0.028,ok +BF17_cpmp_20_5_60_12_36_3,1,astar-symmullt-transmul,0.092,ok +BF17_cpmp_20_5_60_12_36_3,1,idastar-symmulgt-transmul,3600,timeout +BF17_cpmp_20_5_60_12_36_3,1,idastar-symmullt-transmul,3600,timeout +BF17_cpmp_20_5_60_12_36_4,1,astar-symmulgt-transmul,171.987,ok +BF17_cpmp_20_5_60_12_36_4,1,astar-symmullt-transmul,0.032,ok +BF17_cpmp_20_5_60_12_36_4,1,idastar-symmulgt-transmul,3600,timeout +BF17_cpmp_20_5_60_12_36_4,1,idastar-symmullt-transmul,3600,timeout +BF17_cpmp_20_5_60_12_36_5,1,astar-symmulgt-transmul,3600,memout +BF17_cpmp_20_5_60_12_36_5,1,astar-symmullt-transmul,2.64,ok +BF17_cpmp_20_5_60_12_36_5,1,idastar-symmulgt-transmul,3600,timeout +BF17_cpmp_20_5_60_12_36_5,1,idastar-symmullt-transmul,3600,timeout +BF17_cpmp_20_5_60_12_36_6,1,astar-symmulgt-transmul,0.028,ok +BF17_cpmp_20_5_60_12_36_6,1,astar-symmullt-transmul,3600,memout +BF17_cpmp_20_5_60_12_36_6,1,idastar-symmulgt-transmul,0.004,ok +BF17_cpmp_20_5_60_12_36_6,1,idastar-symmullt-transmul,3600,timeout +BF17_cpmp_20_5_60_12_36_7,1,astar-symmulgt-transmul,0.156,ok 
+BF17_cpmp_20_5_60_12_36_7,1,astar-symmullt-transmul,3600,memout +BF17_cpmp_20_5_60_12_36_7,1,idastar-symmulgt-transmul,3600,timeout +BF17_cpmp_20_5_60_12_36_7,1,idastar-symmullt-transmul,3600,timeout +BF17_cpmp_20_5_60_12_36_8,1,astar-symmulgt-transmul,1.46,ok +BF17_cpmp_20_5_60_12_36_8,1,astar-symmullt-transmul,0.04,ok +BF17_cpmp_20_5_60_12_36_8,1,idastar-symmulgt-transmul,3600,timeout +BF17_cpmp_20_5_60_12_36_8,1,idastar-symmullt-transmul,1.9,ok +BF18_cpmp_20_5_60_12_45_1,1,astar-symmulgt-transmul,0.06,ok +BF18_cpmp_20_5_60_12_45_1,1,astar-symmullt-transmul,3600,memout +BF18_cpmp_20_5_60_12_45_1,1,idastar-symmulgt-transmul,3600,timeout +BF18_cpmp_20_5_60_12_45_1,1,idastar-symmullt-transmul,526.597,ok +BF18_cpmp_20_5_60_12_45_12,1,astar-symmulgt-transmul,3600,memout +BF18_cpmp_20_5_60_12_45_12,1,astar-symmullt-transmul,0.132,ok +BF18_cpmp_20_5_60_12_45_12,1,idastar-symmulgt-transmul,0.004,ok +BF18_cpmp_20_5_60_12_45_12,1,idastar-symmullt-transmul,36.754,ok +BF18_cpmp_20_5_60_12_45_13,1,astar-symmulgt-transmul,0.032,ok +BF18_cpmp_20_5_60_12_45_13,1,astar-symmullt-transmul,3600,memout +BF18_cpmp_20_5_60_12_45_13,1,idastar-symmulgt-transmul,3600,timeout +BF18_cpmp_20_5_60_12_45_13,1,idastar-symmullt-transmul,79.909,ok +BF18_cpmp_20_5_60_12_45_14,1,astar-symmulgt-transmul,88.109,ok +BF18_cpmp_20_5_60_12_45_14,1,astar-symmullt-transmul,162.306,ok +BF18_cpmp_20_5_60_12_45_14,1,idastar-symmulgt-transmul,0.004,ok +BF18_cpmp_20_5_60_12_45_14,1,idastar-symmullt-transmul,68.552,ok +BF18_cpmp_20_5_60_12_45_15,1,astar-symmulgt-transmul,0.036,ok +BF18_cpmp_20_5_60_12_45_15,1,astar-symmullt-transmul,87.281,ok +BF18_cpmp_20_5_60_12_45_15,1,idastar-symmulgt-transmul,123.636,ok +BF18_cpmp_20_5_60_12_45_15,1,idastar-symmullt-transmul,0.124,ok +BF18_cpmp_20_5_60_12_45_16,1,astar-symmulgt-transmul,0.144,ok +BF18_cpmp_20_5_60_12_45_16,1,astar-symmullt-transmul,0.032,ok +BF18_cpmp_20_5_60_12_45_16,1,idastar-symmulgt-transmul,0.004,ok 
+BF18_cpmp_20_5_60_12_45_16,1,idastar-symmullt-transmul,0.008,ok +BF18_cpmp_20_5_60_12_45_17,1,astar-symmulgt-transmul,0.04,ok +BF18_cpmp_20_5_60_12_45_17,1,astar-symmullt-transmul,0.056,ok +BF18_cpmp_20_5_60_12_45_17,1,idastar-symmulgt-transmul,0.004,ok +BF18_cpmp_20_5_60_12_45_17,1,idastar-symmullt-transmul,3600,timeout +BF18_cpmp_20_5_60_12_45_18,1,astar-symmulgt-transmul,0.248,ok +BF18_cpmp_20_5_60_12_45_18,1,astar-symmullt-transmul,3600,memout +BF18_cpmp_20_5_60_12_45_18,1,idastar-symmulgt-transmul,0.004,ok +BF18_cpmp_20_5_60_12_45_18,1,idastar-symmullt-transmul,3.552,ok +BF18_cpmp_20_5_60_12_45_19,1,astar-symmulgt-transmul,3600,memout +BF18_cpmp_20_5_60_12_45_19,1,astar-symmullt-transmul,3600,memout +BF18_cpmp_20_5_60_12_45_19,1,idastar-symmulgt-transmul,0.004,ok +BF18_cpmp_20_5_60_12_45_19,1,idastar-symmullt-transmul,0.008,ok +BF18_cpmp_20_5_60_12_45_2,1,astar-symmulgt-transmul,0.352,ok +BF18_cpmp_20_5_60_12_45_2,1,astar-symmullt-transmul,20.105,ok +BF18_cpmp_20_5_60_12_45_2,1,idastar-symmulgt-transmul,0.004,ok +BF18_cpmp_20_5_60_12_45_2,1,idastar-symmullt-transmul,6.752,ok +BF18_cpmp_20_5_60_12_45_3,1,astar-symmulgt-transmul,5.912,ok +BF18_cpmp_20_5_60_12_45_3,1,astar-symmullt-transmul,3600,memout +BF18_cpmp_20_5_60_12_45_3,1,idastar-symmulgt-transmul,3600,timeout +BF18_cpmp_20_5_60_12_45_3,1,idastar-symmullt-transmul,3600,timeout +BF18_cpmp_20_5_60_12_45_4,1,astar-symmulgt-transmul,0.044,ok +BF18_cpmp_20_5_60_12_45_4,1,astar-symmullt-transmul,3600,memout +BF18_cpmp_20_5_60_12_45_4,1,idastar-symmulgt-transmul,0.06,ok +BF18_cpmp_20_5_60_12_45_4,1,idastar-symmullt-transmul,3600,timeout +BF18_cpmp_20_5_60_12_45_5,1,astar-symmulgt-transmul,3600,memout +BF18_cpmp_20_5_60_12_45_5,1,astar-symmullt-transmul,0.032,ok +BF18_cpmp_20_5_60_12_45_5,1,idastar-symmulgt-transmul,0.116,ok +BF18_cpmp_20_5_60_12_45_5,1,idastar-symmullt-transmul,3600,timeout +BF18_cpmp_20_5_60_12_45_6,1,astar-symmulgt-transmul,3600,memout 
+BF18_cpmp_20_5_60_12_45_6,1,astar-symmullt-transmul,3600,memout +BF18_cpmp_20_5_60_12_45_6,1,idastar-symmulgt-transmul,0.004,ok +BF18_cpmp_20_5_60_12_45_6,1,idastar-symmullt-transmul,3600,timeout +BF18_cpmp_20_5_60_12_45_7,1,astar-symmulgt-transmul,1.256,ok +BF18_cpmp_20_5_60_12_45_7,1,astar-symmullt-transmul,3600,memout +BF18_cpmp_20_5_60_12_45_7,1,idastar-symmulgt-transmul,0.004,ok +BF18_cpmp_20_5_60_12_45_7,1,idastar-symmullt-transmul,3600,timeout +BF18_cpmp_20_5_60_12_45_8,1,astar-symmulgt-transmul,1.6,ok +BF18_cpmp_20_5_60_12_45_8,1,astar-symmullt-transmul,167.146,ok +BF18_cpmp_20_5_60_12_45_8,1,idastar-symmulgt-transmul,0.004,ok +BF18_cpmp_20_5_60_12_45_8,1,idastar-symmullt-transmul,2.684,ok +BF18_cpmp_20_5_60_12_45_9,1,astar-symmulgt-transmul,3600,memout +BF18_cpmp_20_5_60_12_45_9,1,astar-symmullt-transmul,0.028,ok +BF18_cpmp_20_5_60_12_45_9,1,idastar-symmulgt-transmul,0.004,ok +BF18_cpmp_20_5_60_12_45_9,1,idastar-symmullt-transmul,0.004,ok +BF19_cpmp_20_5_60_24_36_1,1,astar-symmulgt-transmul,3600,memout +BF19_cpmp_20_5_60_24_36_1,1,astar-symmullt-transmul,3600,memout +BF19_cpmp_20_5_60_24_36_1,1,idastar-symmulgt-transmul,1.788,ok +BF19_cpmp_20_5_60_24_36_1,1,idastar-symmullt-transmul,3600,timeout +BF19_cpmp_20_5_60_24_36_10,1,astar-symmulgt-transmul,3600,memout +BF19_cpmp_20_5_60_24_36_10,1,astar-symmullt-transmul,321.764,ok +BF19_cpmp_20_5_60_24_36_10,1,idastar-symmulgt-transmul,1.528,ok +BF19_cpmp_20_5_60_24_36_10,1,idastar-symmullt-transmul,37.138,ok +BF19_cpmp_20_5_60_24_36_11,1,astar-symmulgt-transmul,3600,memout +BF19_cpmp_20_5_60_24_36_11,1,astar-symmullt-transmul,1.392,ok +BF19_cpmp_20_5_60_24_36_11,1,idastar-symmulgt-transmul,0.004,ok +BF19_cpmp_20_5_60_24_36_11,1,idastar-symmullt-transmul,1.036,ok +BF19_cpmp_20_5_60_24_36_14,1,astar-symmulgt-transmul,0.036,ok +BF19_cpmp_20_5_60_24_36_14,1,astar-symmullt-transmul,3600,memout +BF19_cpmp_20_5_60_24_36_14,1,idastar-symmulgt-transmul,0.004,ok 
+BF19_cpmp_20_5_60_24_36_14,1,idastar-symmullt-transmul,1490.277,ok +BF19_cpmp_20_5_60_24_36_15,1,astar-symmulgt-transmul,3600,memout +BF19_cpmp_20_5_60_24_36_15,1,astar-symmullt-transmul,247.087,ok +BF19_cpmp_20_5_60_24_36_15,1,idastar-symmulgt-transmul,3600,timeout +BF19_cpmp_20_5_60_24_36_15,1,idastar-symmullt-transmul,3600,timeout +BF19_cpmp_20_5_60_24_36_16,1,astar-symmulgt-transmul,6.66,ok +BF19_cpmp_20_5_60_24_36_16,1,astar-symmullt-transmul,0.18,ok +BF19_cpmp_20_5_60_24_36_16,1,idastar-symmulgt-transmul,3600,timeout +BF19_cpmp_20_5_60_24_36_16,1,idastar-symmullt-transmul,3600,timeout +BF19_cpmp_20_5_60_24_36_17,1,astar-symmulgt-transmul,0.036,ok +BF19_cpmp_20_5_60_24_36_17,1,astar-symmullt-transmul,3600,memout +BF19_cpmp_20_5_60_24_36_17,1,idastar-symmulgt-transmul,7.768,ok +BF19_cpmp_20_5_60_24_36_17,1,idastar-symmullt-transmul,3600,timeout +BF19_cpmp_20_5_60_24_36_18,1,astar-symmulgt-transmul,188.192,ok +BF19_cpmp_20_5_60_24_36_18,1,astar-symmullt-transmul,14.485,ok +BF19_cpmp_20_5_60_24_36_18,1,idastar-symmulgt-transmul,0.016,ok +BF19_cpmp_20_5_60_24_36_18,1,idastar-symmullt-transmul,2.704,ok +BF19_cpmp_20_5_60_24_36_2,1,astar-symmulgt-transmul,1.948,ok +BF19_cpmp_20_5_60_24_36_2,1,astar-symmullt-transmul,3600,memout +BF19_cpmp_20_5_60_24_36_2,1,idastar-symmulgt-transmul,0.472,ok +BF19_cpmp_20_5_60_24_36_2,1,idastar-symmullt-transmul,3.312,ok +BF19_cpmp_20_5_60_24_36_20,1,astar-symmulgt-transmul,22.129,ok +BF19_cpmp_20_5_60_24_36_20,1,astar-symmullt-transmul,3600,memout +BF19_cpmp_20_5_60_24_36_20,1,idastar-symmulgt-transmul,0.004,ok +BF19_cpmp_20_5_60_24_36_20,1,idastar-symmullt-transmul,263.444,ok +BF19_cpmp_20_5_60_24_36_3,1,astar-symmulgt-transmul,3600,memout +BF19_cpmp_20_5_60_24_36_3,1,astar-symmullt-transmul,3600,memout +BF19_cpmp_20_5_60_24_36_3,1,idastar-symmulgt-transmul,0.004,ok +BF19_cpmp_20_5_60_24_36_3,1,idastar-symmullt-transmul,3600,timeout +BF19_cpmp_20_5_60_24_36_5,1,astar-symmulgt-transmul,3.572,ok 
+BF19_cpmp_20_5_60_24_36_5,1,astar-symmullt-transmul,0.124,ok +BF19_cpmp_20_5_60_24_36_5,1,idastar-symmulgt-transmul,0.004,ok +BF19_cpmp_20_5_60_24_36_5,1,idastar-symmullt-transmul,0.008,ok +BF19_cpmp_20_5_60_24_36_6,1,astar-symmulgt-transmul,0.036,ok +BF19_cpmp_20_5_60_24_36_6,1,astar-symmullt-transmul,0.036,ok +BF19_cpmp_20_5_60_24_36_6,1,idastar-symmulgt-transmul,3600,timeout +BF19_cpmp_20_5_60_24_36_6,1,idastar-symmullt-transmul,3600,timeout +BF19_cpmp_20_5_60_24_36_7,1,astar-symmulgt-transmul,5.156,ok +BF19_cpmp_20_5_60_24_36_7,1,astar-symmullt-transmul,69.512,ok +BF19_cpmp_20_5_60_24_36_7,1,idastar-symmulgt-transmul,0.004,ok +BF19_cpmp_20_5_60_24_36_7,1,idastar-symmullt-transmul,2569.305,ok +BF19_cpmp_20_5_60_24_36_8,1,astar-symmulgt-transmul,0.072,ok +BF19_cpmp_20_5_60_24_36_8,1,astar-symmullt-transmul,3600,memout +BF19_cpmp_20_5_60_24_36_8,1,idastar-symmulgt-transmul,3600,timeout +BF19_cpmp_20_5_60_24_36_8,1,idastar-symmullt-transmul,3600,timeout +BF19_cpmp_20_5_60_24_36_9,1,astar-symmulgt-transmul,0.048,ok +BF19_cpmp_20_5_60_24_36_9,1,astar-symmullt-transmul,2.748,ok +BF19_cpmp_20_5_60_24_36_9,1,idastar-symmulgt-transmul,3600,timeout +BF19_cpmp_20_5_60_24_36_9,1,idastar-symmullt-transmul,3600,timeout +BF1_cpmp_16_5_48_10_29_1,1,astar-symmulgt-transmul,0.028,ok +BF1_cpmp_16_5_48_10_29_1,1,astar-symmullt-transmul,0.804,ok +BF1_cpmp_16_5_48_10_29_1,1,idastar-symmulgt-transmul,0.004,ok +BF1_cpmp_16_5_48_10_29_1,1,idastar-symmullt-transmul,3600,timeout +BF1_cpmp_16_5_48_10_29_12,1,astar-symmulgt-transmul,3600,memout +BF1_cpmp_16_5_48_10_29_12,1,astar-symmullt-transmul,0.572,ok +BF1_cpmp_16_5_48_10_29_12,1,idastar-symmulgt-transmul,3600,timeout +BF1_cpmp_16_5_48_10_29_12,1,idastar-symmullt-transmul,128.756,ok +BF1_cpmp_16_5_48_10_29_14,1,astar-symmulgt-transmul,3600,memout +BF1_cpmp_16_5_48_10_29_14,1,astar-symmullt-transmul,19.085,ok +BF1_cpmp_16_5_48_10_29_14,1,idastar-symmulgt-transmul,0.764,ok 
+BF1_cpmp_16_5_48_10_29_14,1,idastar-symmullt-transmul,3600,timeout +BF1_cpmp_16_5_48_10_29_15,1,astar-symmulgt-transmul,3600,memout +BF1_cpmp_16_5_48_10_29_15,1,astar-symmullt-transmul,3600,memout +BF1_cpmp_16_5_48_10_29_15,1,idastar-symmulgt-transmul,106.755,ok +BF1_cpmp_16_5_48_10_29_15,1,idastar-symmullt-transmul,3588.976,ok +BF1_cpmp_16_5_48_10_29_17,1,astar-symmulgt-transmul,3600,memout +BF1_cpmp_16_5_48_10_29_17,1,astar-symmullt-transmul,0.552,ok +BF1_cpmp_16_5_48_10_29_17,1,idastar-symmulgt-transmul,0.008,ok +BF1_cpmp_16_5_48_10_29_17,1,idastar-symmullt-transmul,0.008,ok +BF1_cpmp_16_5_48_10_29_18,1,astar-symmulgt-transmul,0.032,ok +BF1_cpmp_16_5_48_10_29_18,1,astar-symmullt-transmul,0.048,ok +BF1_cpmp_16_5_48_10_29_18,1,idastar-symmulgt-transmul,0.096,ok +BF1_cpmp_16_5_48_10_29_18,1,idastar-symmullt-transmul,50.831,ok +BF1_cpmp_16_5_48_10_29_19,1,astar-symmulgt-transmul,0.024,ok +BF1_cpmp_16_5_48_10_29_19,1,astar-symmullt-transmul,0.504,ok +BF1_cpmp_16_5_48_10_29_19,1,idastar-symmulgt-transmul,0.004,ok +BF1_cpmp_16_5_48_10_29_19,1,idastar-symmullt-transmul,0.568,ok +BF1_cpmp_16_5_48_10_29_20,1,astar-symmulgt-transmul,3600,memout +BF1_cpmp_16_5_48_10_29_20,1,astar-symmullt-transmul,3600,memout +BF1_cpmp_16_5_48_10_29_20,1,idastar-symmulgt-transmul,220.526,ok +BF1_cpmp_16_5_48_10_29_20,1,idastar-symmullt-transmul,509.552,ok +BF1_cpmp_16_5_48_10_29_3,1,astar-symmulgt-transmul,3600,memout +BF1_cpmp_16_5_48_10_29_3,1,astar-symmullt-transmul,0.1,ok +BF1_cpmp_16_5_48_10_29_3,1,idastar-symmulgt-transmul,127.056,ok +BF1_cpmp_16_5_48_10_29_3,1,idastar-symmullt-transmul,134.024,ok +BF1_cpmp_16_5_48_10_29_4,1,astar-symmulgt-transmul,7.936,ok +BF1_cpmp_16_5_48_10_29_4,1,astar-symmullt-transmul,6.476,ok +BF1_cpmp_16_5_48_10_29_4,1,idastar-symmulgt-transmul,84.661,ok +BF1_cpmp_16_5_48_10_29_4,1,idastar-symmullt-transmul,15.057,ok +BF1_cpmp_16_5_48_10_29_5,1,astar-symmulgt-transmul,3600,memout +BF1_cpmp_16_5_48_10_29_5,1,astar-symmullt-transmul,3600,memout 
+BF1_cpmp_16_5_48_10_29_5,1,idastar-symmulgt-transmul,0.9,ok +BF1_cpmp_16_5_48_10_29_5,1,idastar-symmullt-transmul,3600,timeout +BF1_cpmp_16_5_48_10_29_6,1,astar-symmulgt-transmul,0.024,ok +BF1_cpmp_16_5_48_10_29_6,1,astar-symmullt-transmul,3600,memout +BF1_cpmp_16_5_48_10_29_6,1,idastar-symmulgt-transmul,344.342,ok +BF1_cpmp_16_5_48_10_29_6,1,idastar-symmullt-transmul,1055.966,ok +BF1_cpmp_16_5_48_10_29_7,1,astar-symmulgt-transmul,0.248,ok +BF1_cpmp_16_5_48_10_29_7,1,astar-symmullt-transmul,8.537,ok +BF1_cpmp_16_5_48_10_29_7,1,idastar-symmulgt-transmul,0.004,ok +BF1_cpmp_16_5_48_10_29_7,1,idastar-symmullt-transmul,1.256,ok +BF1_cpmp_16_5_48_10_29_8,1,astar-symmulgt-transmul,0.144,ok +BF1_cpmp_16_5_48_10_29_8,1,astar-symmullt-transmul,0.032,ok +BF1_cpmp_16_5_48_10_29_8,1,idastar-symmulgt-transmul,0.004,ok +BF1_cpmp_16_5_48_10_29_8,1,idastar-symmullt-transmul,0.008,ok +BF1_cpmp_16_5_48_10_29_9,1,astar-symmulgt-transmul,0.132,ok +BF1_cpmp_16_5_48_10_29_9,1,astar-symmullt-transmul,3600,memout +BF1_cpmp_16_5_48_10_29_9,1,idastar-symmulgt-transmul,0.004,ok +BF1_cpmp_16_5_48_10_29_9,1,idastar-symmullt-transmul,3600,timeout +BF20_cpmp_20_5_60_24_45_1,1,astar-symmulgt-transmul,0.048,ok +BF20_cpmp_20_5_60_24_45_1,1,astar-symmullt-transmul,3600,memout +BF20_cpmp_20_5_60_24_45_1,1,idastar-symmulgt-transmul,0.004,ok +BF20_cpmp_20_5_60_24_45_1,1,idastar-symmullt-transmul,3600,timeout +BF20_cpmp_20_5_60_24_45_10,1,astar-symmulgt-transmul,1.956,ok +BF20_cpmp_20_5_60_24_45_10,1,astar-symmullt-transmul,3.096,ok +BF20_cpmp_20_5_60_24_45_10,1,idastar-symmulgt-transmul,6.34,ok +BF20_cpmp_20_5_60_24_45_10,1,idastar-symmullt-transmul,12.757,ok +BF20_cpmp_20_5_60_24_45_11,1,astar-symmulgt-transmul,11.177,ok +BF20_cpmp_20_5_60_24_45_11,1,astar-symmullt-transmul,376.04,ok +BF20_cpmp_20_5_60_24_45_11,1,idastar-symmulgt-transmul,112.059,ok +BF20_cpmp_20_5_60_24_45_11,1,idastar-symmullt-transmul,3600,timeout +BF20_cpmp_20_5_60_24_45_12,1,astar-symmulgt-transmul,3600,memout 
+BF20_cpmp_20_5_60_24_45_12,1,astar-symmullt-transmul,65.732,ok +BF20_cpmp_20_5_60_24_45_12,1,idastar-symmulgt-transmul,3600,timeout +BF20_cpmp_20_5_60_24_45_12,1,idastar-symmullt-transmul,3600,timeout +BF20_cpmp_20_5_60_24_45_13,1,astar-symmulgt-transmul,3600,memout +BF20_cpmp_20_5_60_24_45_13,1,astar-symmullt-transmul,0.08,ok +BF20_cpmp_20_5_60_24_45_13,1,idastar-symmulgt-transmul,0.004,ok +BF20_cpmp_20_5_60_24_45_13,1,idastar-symmullt-transmul,0.036,ok +BF20_cpmp_20_5_60_24_45_14,1,astar-symmulgt-transmul,40.315,ok +BF20_cpmp_20_5_60_24_45_14,1,astar-symmullt-transmul,0.208,ok +BF20_cpmp_20_5_60_24_45_14,1,idastar-symmulgt-transmul,0.004,ok +BF20_cpmp_20_5_60_24_45_14,1,idastar-symmullt-transmul,234.759,ok +BF20_cpmp_20_5_60_24_45_15,1,astar-symmulgt-transmul,3600,memout +BF20_cpmp_20_5_60_24_45_15,1,astar-symmullt-transmul,3600,memout +BF20_cpmp_20_5_60_24_45_15,1,idastar-symmulgt-transmul,0.004,ok +BF20_cpmp_20_5_60_24_45_15,1,idastar-symmullt-transmul,3600,timeout +BF20_cpmp_20_5_60_24_45_17,1,astar-symmulgt-transmul,3600,memout +BF20_cpmp_20_5_60_24_45_17,1,astar-symmullt-transmul,0.032,ok +BF20_cpmp_20_5_60_24_45_17,1,idastar-symmulgt-transmul,0.004,ok +BF20_cpmp_20_5_60_24_45_17,1,idastar-symmullt-transmul,1.204,ok +BF20_cpmp_20_5_60_24_45_2,1,astar-symmulgt-transmul,0.036,ok +BF20_cpmp_20_5_60_24_45_2,1,astar-symmullt-transmul,3600,memout +BF20_cpmp_20_5_60_24_45_2,1,idastar-symmulgt-transmul,3600,timeout +BF20_cpmp_20_5_60_24_45_2,1,idastar-symmullt-transmul,1165.425,ok +BF20_cpmp_20_5_60_24_45_20,1,astar-symmulgt-transmul,3600,memout +BF20_cpmp_20_5_60_24_45_20,1,astar-symmullt-transmul,3600,memout +BF20_cpmp_20_5_60_24_45_20,1,idastar-symmulgt-transmul,0.004,ok +BF20_cpmp_20_5_60_24_45_20,1,idastar-symmullt-transmul,7.856,ok +BF20_cpmp_20_5_60_24_45_3,1,astar-symmulgt-transmul,0.192,ok +BF20_cpmp_20_5_60_24_45_3,1,astar-symmullt-transmul,1.768,ok +BF20_cpmp_20_5_60_24_45_3,1,idastar-symmulgt-transmul,0.004,ok 
+BF20_cpmp_20_5_60_24_45_3,1,idastar-symmullt-transmul,0.18,ok
+BF20_cpmp_20_5_60_24_45_5,1,astar-symmulgt-transmul,3600,memout
+BF20_cpmp_20_5_60_24_45_5,1,astar-symmullt-transmul,0.032,ok
+BF20_cpmp_20_5_60_24_45_5,1,idastar-symmulgt-transmul,0.004,ok
+BF20_cpmp_20_5_60_24_45_5,1,idastar-symmullt-transmul,0.008,ok
+BF20_cpmp_20_5_60_24_45_6,1,astar-symmulgt-transmul,0.036,ok
+BF20_cpmp_20_5_60_24_45_6,1,astar-symmullt-transmul,182.735,ok
+BF20_cpmp_20_5_60_24_45_6,1,idastar-symmulgt-transmul,0.004,ok
+BF20_cpmp_20_5_60_24_45_6,1,idastar-symmullt-transmul,3.256,ok
+BF20_cpmp_20_5_60_24_45_7,1,astar-symmulgt-transmul,3600,memout
+BF20_cpmp_20_5_60_24_45_7,1,astar-symmullt-transmul,18.845,ok
+BF20_cpmp_20_5_60_24_45_7,1,idastar-symmulgt-transmul,3600,timeout
+BF20_cpmp_20_5_60_24_45_7,1,idastar-symmullt-transmul,81.685,ok
+BF20_cpmp_20_5_60_24_45_8,1,astar-symmulgt-transmul,3600,memout
+BF20_cpmp_20_5_60_24_45_8,1,astar-symmullt-transmul,0.456,ok
+BF20_cpmp_20_5_60_24_45_8,1,idastar-symmulgt-transmul,0.004,ok
+BF20_cpmp_20_5_60_24_45_8,1,idastar-symmullt-transmul,3600,timeout
+BF20_cpmp_20_5_60_24_45_9,1,astar-symmulgt-transmul,3600,memout
+BF20_cpmp_20_5_60_24_45_9,1,astar-symmullt-transmul,3600,memout
+BF20_cpmp_20_5_60_24_45_9,1,idastar-symmulgt-transmul,2.844,ok
+BF20_cpmp_20_5_60_24_45_9,1,idastar-symmullt-transmul,319.476,ok
+BF21_cpmp_20_5_80_16_48_11,1,astar-symmulgt-transmul,1.036,ok
+BF21_cpmp_20_5_80_16_48_11,1,astar-symmullt-transmul,3600,memout
+BF21_cpmp_20_5_80_16_48_11,1,idastar-symmulgt-transmul,3600,timeout
+BF21_cpmp_20_5_80_16_48_11,1,idastar-symmullt-transmul,7.244,ok
+BF21_cpmp_20_5_80_16_48_14,1,astar-symmulgt-transmul,8.625,ok
+BF21_cpmp_20_5_80_16_48_14,1,astar-symmullt-transmul,13.969,ok
+BF21_cpmp_20_5_80_16_48_14,1,idastar-symmulgt-transmul,5.416,ok
+BF21_cpmp_20_5_80_16_48_14,1,idastar-symmullt-transmul,6.64,ok
+BF21_cpmp_20_5_80_16_48_16,1,astar-symmulgt-transmul,1.096,ok
+BF21_cpmp_20_5_80_16_48_16,1,astar-symmullt-transmul,311.187,ok
+BF21_cpmp_20_5_80_16_48_16,1,idastar-symmulgt-transmul,0.488,ok
+BF21_cpmp_20_5_80_16_48_16,1,idastar-symmullt-transmul,3600,timeout
+BF21_cpmp_20_5_80_16_48_18,1,astar-symmulgt-transmul,3600,memout
+BF21_cpmp_20_5_80_16_48_18,1,astar-symmullt-transmul,3600,memout
+BF21_cpmp_20_5_80_16_48_18,1,idastar-symmulgt-transmul,1182.126,ok
+BF21_cpmp_20_5_80_16_48_18,1,idastar-symmullt-transmul,967.624,ok
+BF21_cpmp_20_5_80_16_48_19,1,astar-symmulgt-transmul,35.778,ok
+BF21_cpmp_20_5_80_16_48_19,1,astar-symmullt-transmul,3600,memout
+BF21_cpmp_20_5_80_16_48_19,1,idastar-symmulgt-transmul,108.587,ok
+BF21_cpmp_20_5_80_16_48_19,1,idastar-symmullt-transmul,3600,timeout
+BF21_cpmp_20_5_80_16_48_2,1,astar-symmulgt-transmul,3.604,ok
+BF21_cpmp_20_5_80_16_48_2,1,astar-symmullt-transmul,2.696,ok
+BF21_cpmp_20_5_80_16_48_2,1,idastar-symmulgt-transmul,3600,timeout
+BF21_cpmp_20_5_80_16_48_2,1,idastar-symmullt-transmul,3600,timeout
+BF21_cpmp_20_5_80_16_48_3,1,astar-symmulgt-transmul,3600,memout
+BF21_cpmp_20_5_80_16_48_3,1,astar-symmullt-transmul,8.865,ok
+BF21_cpmp_20_5_80_16_48_3,1,idastar-symmulgt-transmul,2814.224,ok
+BF21_cpmp_20_5_80_16_48_3,1,idastar-symmullt-transmul,56.112,ok
+BF21_cpmp_20_5_80_16_48_4,1,astar-symmulgt-transmul,5.576,ok
+BF21_cpmp_20_5_80_16_48_4,1,astar-symmullt-transmul,3600,memout
+BF21_cpmp_20_5_80_16_48_4,1,idastar-symmulgt-transmul,765.884,ok
+BF21_cpmp_20_5_80_16_48_4,1,idastar-symmullt-transmul,846.193,ok
+BF21_cpmp_20_5_80_16_48_5,1,astar-symmulgt-transmul,101.554,ok
+BF21_cpmp_20_5_80_16_48_5,1,astar-symmullt-transmul,3.548,ok
+BF21_cpmp_20_5_80_16_48_5,1,idastar-symmulgt-transmul,3600,timeout
+BF21_cpmp_20_5_80_16_48_5,1,idastar-symmullt-transmul,3600,timeout
+BF21_cpmp_20_5_80_16_48_7,1,astar-symmulgt-transmul,3600,memout
+BF21_cpmp_20_5_80_16_48_7,1,astar-symmullt-transmul,27.286,ok
+BF21_cpmp_20_5_80_16_48_7,1,idastar-symmulgt-transmul,3600,timeout
+BF21_cpmp_20_5_80_16_48_7,1,idastar-symmullt-transmul,3600,timeout
+BF21_cpmp_20_5_80_16_48_8,1,astar-symmulgt-transmul,3600,memout
+BF21_cpmp_20_5_80_16_48_8,1,astar-symmullt-transmul,17.933,ok
+BF21_cpmp_20_5_80_16_48_8,1,idastar-symmulgt-transmul,3600,timeout
+BF21_cpmp_20_5_80_16_48_8,1,idastar-symmullt-transmul,3600,timeout
+BF21_cpmp_20_5_80_16_48_9,1,astar-symmulgt-transmul,15.261,ok
+BF21_cpmp_20_5_80_16_48_9,1,astar-symmullt-transmul,47.395,ok
+BF21_cpmp_20_5_80_16_48_9,1,idastar-symmulgt-transmul,35.526,ok
+BF21_cpmp_20_5_80_16_48_9,1,idastar-symmullt-transmul,3600,timeout
+BF22_cpmp_20_5_80_16_60_11,1,astar-symmulgt-transmul,3600,memout
+BF22_cpmp_20_5_80_16_60_11,1,astar-symmullt-transmul,3600,memout
+BF22_cpmp_20_5_80_16_60_11,1,idastar-symmulgt-transmul,3600,timeout
+BF22_cpmp_20_5_80_16_60_11,1,idastar-symmullt-transmul,2811.512,ok
+BF22_cpmp_20_5_80_16_60_16,1,astar-symmulgt-transmul,3600,memout
+BF22_cpmp_20_5_80_16_60_16,1,astar-symmullt-transmul,53.339,ok
+BF22_cpmp_20_5_80_16_60_16,1,idastar-symmulgt-transmul,3600,timeout
+BF22_cpmp_20_5_80_16_60_16,1,idastar-symmullt-transmul,3600,timeout
+BF22_cpmp_20_5_80_16_60_4,1,astar-symmulgt-transmul,4.724,ok
+BF22_cpmp_20_5_80_16_60_4,1,astar-symmullt-transmul,3600,memout
+BF22_cpmp_20_5_80_16_60_4,1,idastar-symmulgt-transmul,3600,timeout
+BF22_cpmp_20_5_80_16_60_4,1,idastar-symmullt-transmul,3600,timeout
+BF22_cpmp_20_5_80_16_60_6,1,astar-symmulgt-transmul,0.052,ok
+BF22_cpmp_20_5_80_16_60_6,1,astar-symmullt-transmul,137.485,ok
+BF22_cpmp_20_5_80_16_60_6,1,idastar-symmulgt-transmul,3600,timeout
+BF22_cpmp_20_5_80_16_60_6,1,idastar-symmullt-transmul,3600,timeout
+BF22_cpmp_20_5_80_16_60_7,1,astar-symmulgt-transmul,47.947,ok
+BF22_cpmp_20_5_80_16_60_7,1,astar-symmullt-transmul,3600,memout
+BF22_cpmp_20_5_80_16_60_7,1,idastar-symmulgt-transmul,3600,timeout
+BF22_cpmp_20_5_80_16_60_7,1,idastar-symmullt-transmul,3600,timeout
+BF22_cpmp_20_5_80_16_60_9,1,astar-symmulgt-transmul,3600,memout
+BF22_cpmp_20_5_80_16_60_9,1,astar-symmullt-transmul,1.608,ok
+BF22_cpmp_20_5_80_16_60_9,1,idastar-symmulgt-transmul,3600,timeout
+BF22_cpmp_20_5_80_16_60_9,1,idastar-symmullt-transmul,3600,timeout
+BF23_cpmp_20_5_80_32_48_10,1,astar-symmulgt-transmul,1.66,ok
+BF23_cpmp_20_5_80_32_48_10,1,astar-symmullt-transmul,18.621,ok
+BF23_cpmp_20_5_80_32_48_10,1,idastar-symmulgt-transmul,3600,timeout
+BF23_cpmp_20_5_80_32_48_10,1,idastar-symmullt-transmul,3600,timeout
+BF23_cpmp_20_5_80_32_48_11,1,astar-symmulgt-transmul,12.141,ok
+BF23_cpmp_20_5_80_32_48_11,1,astar-symmullt-transmul,338.729,ok
+BF23_cpmp_20_5_80_32_48_11,1,idastar-symmulgt-transmul,3600,timeout
+BF23_cpmp_20_5_80_32_48_11,1,idastar-symmullt-transmul,3600,timeout
+BF23_cpmp_20_5_80_32_48_12,1,astar-symmulgt-transmul,3600,memout
+BF23_cpmp_20_5_80_32_48_12,1,astar-symmullt-transmul,478.102,ok
+BF23_cpmp_20_5_80_32_48_12,1,idastar-symmulgt-transmul,3600,timeout
+BF23_cpmp_20_5_80_32_48_12,1,idastar-symmullt-transmul,3600,timeout
+BF23_cpmp_20_5_80_32_48_15,1,astar-symmulgt-transmul,139.557,ok
+BF23_cpmp_20_5_80_32_48_15,1,astar-symmullt-transmul,3600,memout
+BF23_cpmp_20_5_80_32_48_15,1,idastar-symmulgt-transmul,0.18,ok
+BF23_cpmp_20_5_80_32_48_15,1,idastar-symmullt-transmul,3600,timeout
+BF23_cpmp_20_5_80_32_48_16,1,astar-symmulgt-transmul,3600,memout
+BF23_cpmp_20_5_80_32_48_16,1,astar-symmullt-transmul,166.602,ok
+BF23_cpmp_20_5_80_32_48_16,1,idastar-symmulgt-transmul,4.424,ok
+BF23_cpmp_20_5_80_32_48_16,1,idastar-symmullt-transmul,1365.913,ok
+BF23_cpmp_20_5_80_32_48_18,1,astar-symmulgt-transmul,0.164,ok
+BF23_cpmp_20_5_80_32_48_18,1,astar-symmullt-transmul,4.66,ok
+BF23_cpmp_20_5_80_32_48_18,1,idastar-symmulgt-transmul,3600,timeout
+BF23_cpmp_20_5_80_32_48_18,1,idastar-symmullt-transmul,3600,timeout
+BF23_cpmp_20_5_80_32_48_3,1,astar-symmulgt-transmul,7.836,ok
+BF23_cpmp_20_5_80_32_48_3,1,astar-symmullt-transmul,213.633,ok
+BF23_cpmp_20_5_80_32_48_3,1,idastar-symmulgt-transmul,3600,timeout
+BF23_cpmp_20_5_80_32_48_3,1,idastar-symmullt-transmul,3113.451,ok
+BF23_cpmp_20_5_80_32_48_4,1,astar-symmulgt-transmul,2.768,ok
+BF23_cpmp_20_5_80_32_48_4,1,astar-symmullt-transmul,123.932,ok
+BF23_cpmp_20_5_80_32_48_4,1,idastar-symmulgt-transmul,63.008,ok
+BF23_cpmp_20_5_80_32_48_4,1,idastar-symmullt-transmul,690.687,ok
+BF23_cpmp_20_5_80_32_48_6,1,astar-symmulgt-transmul,2.856,ok
+BF23_cpmp_20_5_80_32_48_6,1,astar-symmullt-transmul,1.268,ok
+BF23_cpmp_20_5_80_32_48_6,1,idastar-symmulgt-transmul,3600,timeout
+BF23_cpmp_20_5_80_32_48_6,1,idastar-symmullt-transmul,3600,timeout
+BF23_cpmp_20_5_80_32_48_7,1,astar-symmulgt-transmul,21.673,ok
+BF23_cpmp_20_5_80_32_48_7,1,astar-symmullt-transmul,3600,memout
+BF23_cpmp_20_5_80_32_48_7,1,idastar-symmulgt-transmul,3600,timeout
+BF23_cpmp_20_5_80_32_48_7,1,idastar-symmullt-transmul,3600,timeout
+BF24_cpmp_20_5_80_32_60_13,1,astar-symmulgt-transmul,296.083,ok
+BF24_cpmp_20_5_80_32_60_13,1,astar-symmullt-transmul,21.981,ok
+BF24_cpmp_20_5_80_32_60_13,1,idastar-symmulgt-transmul,3600,timeout
+BF24_cpmp_20_5_80_32_60_13,1,idastar-symmullt-transmul,3600,timeout
+BF24_cpmp_20_5_80_32_60_17,1,astar-symmulgt-transmul,3600,memout
+BF24_cpmp_20_5_80_32_60_17,1,astar-symmullt-transmul,280.966,ok
+BF24_cpmp_20_5_80_32_60_17,1,idastar-symmulgt-transmul,3600,timeout
+BF24_cpmp_20_5_80_32_60_17,1,idastar-symmullt-transmul,3600,timeout
+BF24_cpmp_20_5_80_32_60_2,1,astar-symmulgt-transmul,51.007,ok
+BF24_cpmp_20_5_80_32_60_2,1,astar-symmullt-transmul,3600,memout
+BF24_cpmp_20_5_80_32_60_2,1,idastar-symmulgt-transmul,3600,timeout
+BF24_cpmp_20_5_80_32_60_2,1,idastar-symmullt-transmul,3600,timeout
+BF24_cpmp_20_5_80_32_60_8,1,astar-symmulgt-transmul,3600,memout
+BF24_cpmp_20_5_80_32_60_8,1,astar-symmullt-transmul,20.565,ok
+BF24_cpmp_20_5_80_32_60_8,1,idastar-symmulgt-transmul,3600,timeout
+BF24_cpmp_20_5_80_32_60_8,1,idastar-symmullt-transmul,3600,timeout
+BF25_cpmp_20_8_96_20_58_20,1,astar-symmulgt-transmul,109.611,ok
+BF25_cpmp_20_8_96_20_58_20,1,astar-symmullt-transmul,3600,memout
+BF25_cpmp_20_8_96_20_58_20,1,idastar-symmulgt-transmul,3600,timeout
+BF25_cpmp_20_8_96_20_58_20,1,idastar-symmullt-transmul,922.238,ok
+BF26_cpmp_20_8_96_20_72_5,1,astar-symmulgt-transmul,482.286,ok
+BF26_cpmp_20_8_96_20_72_5,1,astar-symmullt-transmul,3600,memout
+BF26_cpmp_20_8_96_20_72_5,1,idastar-symmulgt-transmul,3600,timeout
+BF26_cpmp_20_8_96_20_72_5,1,idastar-symmullt-transmul,3600,timeout
+BF27_cpmp_20_8_96_39_58_12,1,astar-symmulgt-transmul,3600,memout
+BF27_cpmp_20_8_96_39_58_12,1,astar-symmullt-transmul,3600,memout
+BF27_cpmp_20_8_96_39_58_12,1,idastar-symmulgt-transmul,359.054,ok
+BF27_cpmp_20_8_96_39_58_12,1,idastar-symmullt-transmul,3600,timeout
+BF27_cpmp_20_8_96_39_58_6,1,astar-symmulgt-transmul,3600,memout
+BF27_cpmp_20_8_96_39_58_6,1,astar-symmullt-transmul,3600,memout
+BF27_cpmp_20_8_96_39_58_6,1,idastar-symmulgt-transmul,178.767,ok
+BF27_cpmp_20_8_96_39_58_6,1,idastar-symmullt-transmul,1104.137,ok
+BF27_cpmp_20_8_96_39_58_9,1,astar-symmulgt-transmul,70.236,ok
+BF27_cpmp_20_8_96_39_58_9,1,astar-symmullt-transmul,3600,memout
+BF27_cpmp_20_8_96_39_58_9,1,idastar-symmulgt-transmul,3600,timeout
+BF27_cpmp_20_8_96_39_58_9,1,idastar-symmullt-transmul,3600,timeout
+BF28_cpmp_20_8_96_39_72_19,1,astar-symmulgt-transmul,3600,memout
+BF28_cpmp_20_8_96_39_72_19,1,astar-symmullt-transmul,3600,memout
+BF28_cpmp_20_8_96_39_72_19,1,idastar-symmulgt-transmul,220.91,ok
+BF28_cpmp_20_8_96_39_72_19,1,idastar-symmullt-transmul,1868.773,ok
+BF2_cpmp_16_5_48_10_36_1,1,astar-symmulgt-transmul,0.016,ok
+BF2_cpmp_16_5_48_10_36_1,1,astar-symmullt-transmul,1.276,ok
+BF2_cpmp_16_5_48_10_36_1,1,idastar-symmulgt-transmul,165.582,ok
+BF2_cpmp_16_5_48_10_36_1,1,idastar-symmullt-transmul,183.243,ok
+BF2_cpmp_16_5_48_10_36_11,1,astar-symmulgt-transmul,0.772,ok
+BF2_cpmp_16_5_48_10_36_11,1,astar-symmullt-transmul,1.136,ok
+BF2_cpmp_16_5_48_10_36_11,1,idastar-symmulgt-transmul,3600,timeout
+BF2_cpmp_16_5_48_10_36_11,1,idastar-symmullt-transmul,3600,timeout
+BF2_cpmp_16_5_48_10_36_12,1,astar-symmulgt-transmul,3600,memout
+BF2_cpmp_16_5_48_10_36_12,1,astar-symmullt-transmul,3600,memout
+BF2_cpmp_16_5_48_10_36_12,1,idastar-symmulgt-transmul,0.004,ok
+BF2_cpmp_16_5_48_10_36_12,1,idastar-symmullt-transmul,3600,timeout
+BF2_cpmp_16_5_48_10_36_13,1,astar-symmulgt-transmul,48.127,ok
+BF2_cpmp_16_5_48_10_36_13,1,astar-symmullt-transmul,7.476,ok
+BF2_cpmp_16_5_48_10_36_13,1,idastar-symmulgt-transmul,3600,timeout
+BF2_cpmp_16_5_48_10_36_13,1,idastar-symmullt-transmul,3600,timeout
+BF2_cpmp_16_5_48_10_36_14,1,astar-symmulgt-transmul,0.056,ok
+BF2_cpmp_16_5_48_10_36_14,1,astar-symmullt-transmul,0.016,ok
+BF2_cpmp_16_5_48_10_36_14,1,idastar-symmulgt-transmul,3600,timeout
+BF2_cpmp_16_5_48_10_36_14,1,idastar-symmullt-transmul,3600,timeout
+BF2_cpmp_16_5_48_10_36_15,1,astar-symmulgt-transmul,0.06,ok
+BF2_cpmp_16_5_48_10_36_15,1,astar-symmullt-transmul,0.484,ok
+BF2_cpmp_16_5_48_10_36_15,1,idastar-symmulgt-transmul,0.004,ok
+BF2_cpmp_16_5_48_10_36_15,1,idastar-symmullt-transmul,0.004,ok
+BF2_cpmp_16_5_48_10_36_17,1,astar-symmulgt-transmul,0.048,ok
+BF2_cpmp_16_5_48_10_36_17,1,astar-symmullt-transmul,0.968,ok
+BF2_cpmp_16_5_48_10_36_17,1,idastar-symmulgt-transmul,0.004,ok
+BF2_cpmp_16_5_48_10_36_17,1,idastar-symmullt-transmul,0.008,ok
+BF2_cpmp_16_5_48_10_36_18,1,astar-symmulgt-transmul,0.024,ok
+BF2_cpmp_16_5_48_10_36_18,1,astar-symmullt-transmul,0.016,ok
+BF2_cpmp_16_5_48_10_36_18,1,idastar-symmulgt-transmul,0.004,ok
+BF2_cpmp_16_5_48_10_36_18,1,idastar-symmullt-transmul,3600,timeout
+BF2_cpmp_16_5_48_10_36_19,1,astar-symmulgt-transmul,0.284,ok
+BF2_cpmp_16_5_48_10_36_19,1,astar-symmullt-transmul,3600,memout
+BF2_cpmp_16_5_48_10_36_19,1,idastar-symmulgt-transmul,3600,timeout
+BF2_cpmp_16_5_48_10_36_19,1,idastar-symmullt-transmul,3600,timeout
+BF2_cpmp_16_5_48_10_36_2,1,astar-symmulgt-transmul,3600,memout
+BF2_cpmp_16_5_48_10_36_2,1,astar-symmullt-transmul,0.184,ok
+BF2_cpmp_16_5_48_10_36_2,1,idastar-symmulgt-transmul,3600,timeout
+BF2_cpmp_16_5_48_10_36_2,1,idastar-symmullt-transmul,3600,timeout
+BF2_cpmp_16_5_48_10_36_20,1,astar-symmulgt-transmul,4.004,ok
+BF2_cpmp_16_5_48_10_36_20,1,astar-symmullt-transmul,0.016,ok
+BF2_cpmp_16_5_48_10_36_20,1,idastar-symmulgt-transmul,0.004,ok
+BF2_cpmp_16_5_48_10_36_20,1,idastar-symmullt-transmul,0.004,ok
+BF2_cpmp_16_5_48_10_36_3,1,astar-symmulgt-transmul,0.108,ok
+BF2_cpmp_16_5_48_10_36_3,1,astar-symmullt-transmul,14.797,ok
+BF2_cpmp_16_5_48_10_36_3,1,idastar-symmulgt-transmul,0.004,ok
+BF2_cpmp_16_5_48_10_36_3,1,idastar-symmullt-transmul,0.004,ok
+BF2_cpmp_16_5_48_10_36_4,1,astar-symmulgt-transmul,7.212,ok
+BF2_cpmp_16_5_48_10_36_4,1,astar-symmullt-transmul,1.284,ok
+BF2_cpmp_16_5_48_10_36_4,1,idastar-symmulgt-transmul,0.004,ok
+BF2_cpmp_16_5_48_10_36_4,1,idastar-symmullt-transmul,3600,timeout
+BF2_cpmp_16_5_48_10_36_5,1,astar-symmulgt-transmul,0.556,ok
+BF2_cpmp_16_5_48_10_36_5,1,astar-symmullt-transmul,0.028,ok
+BF2_cpmp_16_5_48_10_36_5,1,idastar-symmulgt-transmul,0.0,ok
+BF2_cpmp_16_5_48_10_36_5,1,idastar-symmullt-transmul,0.004,ok
+BF2_cpmp_16_5_48_10_36_6,1,astar-symmulgt-transmul,3600,memout
+BF2_cpmp_16_5_48_10_36_6,1,astar-symmullt-transmul,3600,memout
+BF2_cpmp_16_5_48_10_36_6,1,idastar-symmulgt-transmul,1323.091,ok
+BF2_cpmp_16_5_48_10_36_6,1,idastar-symmullt-transmul,3600,timeout
+BF2_cpmp_16_5_48_10_36_7,1,astar-symmulgt-transmul,0.808,ok
+BF2_cpmp_16_5_48_10_36_7,1,astar-symmullt-transmul,3600,memout
+BF2_cpmp_16_5_48_10_36_7,1,idastar-symmulgt-transmul,0.0,ok
+BF2_cpmp_16_5_48_10_36_7,1,idastar-symmullt-transmul,3600,timeout
+BF2_cpmp_16_5_48_10_36_8,1,astar-symmulgt-transmul,0.052,ok
+BF2_cpmp_16_5_48_10_36_8,1,astar-symmullt-transmul,3600,memout
+BF2_cpmp_16_5_48_10_36_8,1,idastar-symmulgt-transmul,0.004,ok
+BF2_cpmp_16_5_48_10_36_8,1,idastar-symmullt-transmul,2933.159,ok
+BF2_cpmp_16_5_48_10_36_9,1,astar-symmulgt-transmul,0.512,ok
+BF2_cpmp_16_5_48_10_36_9,1,astar-symmullt-transmul,0.54,ok
+BF2_cpmp_16_5_48_10_36_9,1,idastar-symmulgt-transmul,0.096,ok
+BF2_cpmp_16_5_48_10_36_9,1,idastar-symmullt-transmul,0.088,ok
+BF3_cpmp_16_5_48_20_29_1,1,astar-symmulgt-transmul,3600,memout
+BF3_cpmp_16_5_48_20_29_1,1,astar-symmullt-transmul,3600,memout
+BF3_cpmp_16_5_48_20_29_1,1,idastar-symmulgt-transmul,108.943,ok
+BF3_cpmp_16_5_48_20_29_1,1,idastar-symmullt-transmul,3600,timeout
+BF3_cpmp_16_5_48_20_29_11,1,astar-symmulgt-transmul,0.016,ok
+BF3_cpmp_16_5_48_20_29_11,1,astar-symmullt-transmul,3600,memout
+BF3_cpmp_16_5_48_20_29_11,1,idastar-symmulgt-transmul,0.0,ok
+BF3_cpmp_16_5_48_20_29_11,1,idastar-symmullt-transmul,184.984,ok
+BF3_cpmp_16_5_48_20_29_13,1,astar-symmulgt-transmul,24.049,ok
+BF3_cpmp_16_5_48_20_29_13,1,astar-symmullt-transmul,3600,memout
+BF3_cpmp_16_5_48_20_29_13,1,idastar-symmulgt-transmul,0.0,ok
+BF3_cpmp_16_5_48_20_29_13,1,idastar-symmullt-transmul,3600,timeout
+BF3_cpmp_16_5_48_20_29_14,1,astar-symmulgt-transmul,3600,memout
+BF3_cpmp_16_5_48_20_29_14,1,astar-symmullt-transmul,0.052,ok
+BF3_cpmp_16_5_48_20_29_14,1,idastar-symmulgt-transmul,3600,timeout
+BF3_cpmp_16_5_48_20_29_14,1,idastar-symmullt-transmul,3600,timeout
+BF3_cpmp_16_5_48_20_29_16,1,astar-symmulgt-transmul,13.061,ok
+BF3_cpmp_16_5_48_20_29_16,1,astar-symmullt-transmul,1.032,ok
+BF3_cpmp_16_5_48_20_29_16,1,idastar-symmulgt-transmul,5.628,ok
+BF3_cpmp_16_5_48_20_29_16,1,idastar-symmullt-transmul,859.542,ok
+BF3_cpmp_16_5_48_20_29_19,1,astar-symmulgt-transmul,3600,memout
+BF3_cpmp_16_5_48_20_29_19,1,astar-symmullt-transmul,19.833,ok
+BF3_cpmp_16_5_48_20_29_19,1,idastar-symmulgt-transmul,155.03,ok
+BF3_cpmp_16_5_48_20_29_19,1,idastar-symmullt-transmul,5.208,ok
+BF3_cpmp_16_5_48_20_29_20,1,astar-symmulgt-transmul,61.472,ok
+BF3_cpmp_16_5_48_20_29_20,1,astar-symmullt-transmul,3600,memout
+BF3_cpmp_16_5_48_20_29_20,1,idastar-symmulgt-transmul,3446.571,ok
+BF3_cpmp_16_5_48_20_29_20,1,idastar-symmullt-transmul,3600,timeout
+BF3_cpmp_16_5_48_20_29_3,1,astar-symmulgt-transmul,3.072,ok
+BF3_cpmp_16_5_48_20_29_3,1,astar-symmullt-transmul,306.691,ok
+BF3_cpmp_16_5_48_20_29_3,1,idastar-symmulgt-transmul,3600,timeout
+BF3_cpmp_16_5_48_20_29_3,1,idastar-symmullt-transmul,3600,timeout
+BF3_cpmp_16_5_48_20_29_5,1,astar-symmulgt-transmul,3600,memout
+BF3_cpmp_16_5_48_20_29_5,1,astar-symmullt-transmul,3600,memout
+BF3_cpmp_16_5_48_20_29_5,1,idastar-symmulgt-transmul,665.154,ok
+BF3_cpmp_16_5_48_20_29_5,1,idastar-symmullt-transmul,228.986,ok
+BF3_cpmp_16_5_48_20_29_6,1,astar-symmulgt-transmul,0.1,ok
+BF3_cpmp_16_5_48_20_29_6,1,astar-symmullt-transmul,6.948,ok
+BF3_cpmp_16_5_48_20_29_6,1,idastar-symmulgt-transmul,0.004,ok
+BF3_cpmp_16_5_48_20_29_6,1,idastar-symmullt-transmul,3.604,ok
+BF3_cpmp_16_5_48_20_29_8,1,astar-symmulgt-transmul,0.02,ok
+BF3_cpmp_16_5_48_20_29_8,1,astar-symmullt-transmul,3600,memout
+BF3_cpmp_16_5_48_20_29_8,1,idastar-symmulgt-transmul,0.004,ok
+BF3_cpmp_16_5_48_20_29_8,1,idastar-symmullt-transmul,3600,timeout
+BF3_cpmp_16_5_48_20_29_9,1,astar-symmulgt-transmul,0.016,ok
+BF3_cpmp_16_5_48_20_29_9,1,astar-symmullt-transmul,3600,memout
+BF3_cpmp_16_5_48_20_29_9,1,idastar-symmulgt-transmul,0.0,ok
+BF3_cpmp_16_5_48_20_29_9,1,idastar-symmullt-transmul,3600,timeout
+BF4_cpmp_16_5_48_20_36_1,1,astar-symmulgt-transmul,0.012,ok
+BF4_cpmp_16_5_48_20_36_1,1,astar-symmullt-transmul,0.132,ok
+BF4_cpmp_16_5_48_20_36_1,1,idastar-symmulgt-transmul,0.004,ok
+BF4_cpmp_16_5_48_20_36_1,1,idastar-symmullt-transmul,0.012,ok
+BF4_cpmp_16_5_48_20_36_10,1,astar-symmulgt-transmul,3600,memout
+BF4_cpmp_16_5_48_20_36_10,1,astar-symmullt-transmul,3600,memout
+BF4_cpmp_16_5_48_20_36_10,1,idastar-symmulgt-transmul,0.004,ok
+BF4_cpmp_16_5_48_20_36_10,1,idastar-symmullt-transmul,0.036,ok
+BF4_cpmp_16_5_48_20_36_11,1,astar-symmulgt-transmul,3600,memout
+BF4_cpmp_16_5_48_20_36_11,1,astar-symmullt-transmul,107.599,ok
+BF4_cpmp_16_5_48_20_36_11,1,idastar-symmulgt-transmul,3600,timeout
+BF4_cpmp_16_5_48_20_36_11,1,idastar-symmullt-transmul,3600,timeout
+BF4_cpmp_16_5_48_20_36_13,1,astar-symmulgt-transmul,0.052,ok
+BF4_cpmp_16_5_48_20_36_13,1,astar-symmullt-transmul,0.024,ok
+BF4_cpmp_16_5_48_20_36_13,1,idastar-symmulgt-transmul,0.004,ok
+BF4_cpmp_16_5_48_20_36_13,1,idastar-symmullt-transmul,0.244,ok
+BF4_cpmp_16_5_48_20_36_14,1,astar-symmulgt-transmul,2.264,ok
+BF4_cpmp_16_5_48_20_36_14,1,astar-symmullt-transmul,1.748,ok
+BF4_cpmp_16_5_48_20_36_14,1,idastar-symmulgt-transmul,0.332,ok
+BF4_cpmp_16_5_48_20_36_14,1,idastar-symmullt-transmul,0.376,ok
+BF4_cpmp_16_5_48_20_36_15,1,astar-symmulgt-transmul,3600,memout
+BF4_cpmp_16_5_48_20_36_15,1,astar-symmullt-transmul,3600,memout
+BF4_cpmp_16_5_48_20_36_15,1,idastar-symmulgt-transmul,0.004,ok
+BF4_cpmp_16_5_48_20_36_15,1,idastar-symmullt-transmul,3600,timeout
+BF4_cpmp_16_5_48_20_36_16,1,astar-symmulgt-transmul,0.02,ok
+BF4_cpmp_16_5_48_20_36_16,1,astar-symmullt-transmul,0.428,ok
+BF4_cpmp_16_5_48_20_36_16,1,idastar-symmulgt-transmul,0.004,ok
+BF4_cpmp_16_5_48_20_36_16,1,idastar-symmullt-transmul,0.004,ok
+BF4_cpmp_16_5_48_20_36_17,1,astar-symmulgt-transmul,82.681,ok
+BF4_cpmp_16_5_48_20_36_17,1,astar-symmullt-transmul,2.356,ok
+BF4_cpmp_16_5_48_20_36_17,1,idastar-symmulgt-transmul,8.893,ok
+BF4_cpmp_16_5_48_20_36_17,1,idastar-symmullt-transmul,0.44,ok
+BF4_cpmp_16_5_48_20_36_18,1,astar-symmulgt-transmul,0.16,ok
+BF4_cpmp_16_5_48_20_36_18,1,astar-symmullt-transmul,5.616,ok
+BF4_cpmp_16_5_48_20_36_18,1,idastar-symmulgt-transmul,0.424,ok
+BF4_cpmp_16_5_48_20_36_18,1,idastar-symmullt-transmul,3600,timeout
+BF4_cpmp_16_5_48_20_36_19,1,astar-symmulgt-transmul,234.887,ok
+BF4_cpmp_16_5_48_20_36_19,1,astar-symmullt-transmul,3600,memout
+BF4_cpmp_16_5_48_20_36_19,1,idastar-symmulgt-transmul,0.004,ok
+BF4_cpmp_16_5_48_20_36_19,1,idastar-symmullt-transmul,0.004,ok
+BF4_cpmp_16_5_48_20_36_2,1,astar-symmulgt-transmul,3600,memout
+BF4_cpmp_16_5_48_20_36_2,1,astar-symmullt-transmul,3600,memout
+BF4_cpmp_16_5_48_20_36_2,1,idastar-symmulgt-transmul,121.056,ok
+BF4_cpmp_16_5_48_20_36_2,1,idastar-symmullt-transmul,165.59,ok
+BF4_cpmp_16_5_48_20_36_20,1,astar-symmulgt-transmul,5.376,ok
+BF4_cpmp_16_5_48_20_36_20,1,astar-symmullt-transmul,0.056,ok
+BF4_cpmp_16_5_48_20_36_20,1,idastar-symmulgt-transmul,18.581,ok
+BF4_cpmp_16_5_48_20_36_20,1,idastar-symmullt-transmul,0.656,ok
+BF4_cpmp_16_5_48_20_36_3,1,astar-symmulgt-transmul,0.152,ok
+BF4_cpmp_16_5_48_20_36_3,1,astar-symmullt-transmul,3600,memout
+BF4_cpmp_16_5_48_20_36_3,1,idastar-symmulgt-transmul,0.004,ok
+BF4_cpmp_16_5_48_20_36_3,1,idastar-symmullt-transmul,3600,timeout
+BF4_cpmp_16_5_48_20_36_5,1,astar-symmulgt-transmul,0.216,ok
+BF4_cpmp_16_5_48_20_36_5,1,astar-symmullt-transmul,1.312,ok
+BF4_cpmp_16_5_48_20_36_5,1,idastar-symmulgt-transmul,0.048,ok
+BF4_cpmp_16_5_48_20_36_5,1,idastar-symmullt-transmul,3.232,ok
+BF4_cpmp_16_5_48_20_36_6,1,astar-symmulgt-transmul,1.668,ok
+BF4_cpmp_16_5_48_20_36_6,1,astar-symmullt-transmul,0.088,ok
+BF4_cpmp_16_5_48_20_36_6,1,idastar-symmulgt-transmul,0.0,ok
+BF4_cpmp_16_5_48_20_36_6,1,idastar-symmullt-transmul,3600,timeout
+BF4_cpmp_16_5_48_20_36_7,1,astar-symmulgt-transmul,11.393,ok
+BF4_cpmp_16_5_48_20_36_7,1,astar-symmullt-transmul,0.52,ok
+BF4_cpmp_16_5_48_20_36_7,1,idastar-symmulgt-transmul,93.89,ok
+BF4_cpmp_16_5_48_20_36_7,1,idastar-symmullt-transmul,1.192,ok
+BF4_cpmp_16_5_48_20_36_8,1,astar-symmulgt-transmul,0.02,ok
+BF4_cpmp_16_5_48_20_36_8,1,astar-symmullt-transmul,267.681,ok
+BF4_cpmp_16_5_48_20_36_8,1,idastar-symmulgt-transmul,0.004,ok
+BF4_cpmp_16_5_48_20_36_8,1,idastar-symmullt-transmul,68.28,ok
+BF4_cpmp_16_5_48_20_36_9,1,astar-symmulgt-transmul,2.828,ok
+BF4_cpmp_16_5_48_20_36_9,1,astar-symmullt-transmul,6.94,ok
+BF4_cpmp_16_5_48_20_36_9,1,idastar-symmulgt-transmul,0.0,ok
+BF4_cpmp_16_5_48_20_36_9,1,idastar-symmullt-transmul,0.012,ok
+BF5_cpmp_16_5_64_13_39_1,1,astar-symmulgt-transmul,228.59,ok
+BF5_cpmp_16_5_64_13_39_1,1,astar-symmullt-transmul,7.68,ok
+BF5_cpmp_16_5_64_13_39_1,1,idastar-symmulgt-transmul,485.89,ok
+BF5_cpmp_16_5_64_13_39_1,1,idastar-symmullt-transmul,4.94,ok
+BF5_cpmp_16_5_64_13_39_10,1,astar-symmulgt-transmul,3600,memout
+BF5_cpmp_16_5_64_13_39_10,1,astar-symmullt-transmul,337.773,ok
+BF5_cpmp_16_5_64_13_39_10,1,idastar-symmulgt-transmul,3600,timeout
+BF5_cpmp_16_5_64_13_39_10,1,idastar-symmullt-transmul,3600,timeout
+BF5_cpmp_16_5_64_13_39_11,1,astar-symmulgt-transmul,3600,memout
+BF5_cpmp_16_5_64_13_39_11,1,astar-symmullt-transmul,3600,memout
+BF5_cpmp_16_5_64_13_39_11,1,idastar-symmulgt-transmul,3600,timeout
+BF5_cpmp_16_5_64_13_39_11,1,idastar-symmullt-transmul,238.703,ok
+BF5_cpmp_16_5_64_13_39_12,1,astar-symmulgt-transmul,0.064,ok
+BF5_cpmp_16_5_64_13_39_12,1,astar-symmullt-transmul,5.068,ok
+BF5_cpmp_16_5_64_13_39_12,1,idastar-symmulgt-transmul,0.788,ok
+BF5_cpmp_16_5_64_13_39_12,1,idastar-symmullt-transmul,299.575,ok
+BF5_cpmp_16_5_64_13_39_14,1,astar-symmulgt-transmul,0.26,ok
+BF5_cpmp_16_5_64_13_39_14,1,astar-symmullt-transmul,2.836,ok
+BF5_cpmp_16_5_64_13_39_14,1,idastar-symmulgt-transmul,1.192,ok
+BF5_cpmp_16_5_64_13_39_14,1,idastar-symmullt-transmul,0.92,ok
+BF5_cpmp_16_5_64_13_39_15,1,astar-symmulgt-transmul,3600,memout
+BF5_cpmp_16_5_64_13_39_15,1,astar-symmullt-transmul,237.703,ok
+BF5_cpmp_16_5_64_13_39_15,1,idastar-symmulgt-transmul,3600,timeout
+BF5_cpmp_16_5_64_13_39_15,1,idastar-symmullt-transmul,12.665,ok
+BF5_cpmp_16_5_64_13_39_17,1,astar-symmulgt-transmul,1.284,ok
+BF5_cpmp_16_5_64_13_39_17,1,astar-symmullt-transmul,38.662,ok
+BF5_cpmp_16_5_64_13_39_17,1,idastar-symmulgt-transmul,0.516,ok
+BF5_cpmp_16_5_64_13_39_17,1,idastar-symmullt-transmul,0.256,ok
+BF5_cpmp_16_5_64_13_39_2,1,astar-symmulgt-transmul,95.354,ok
+BF5_cpmp_16_5_64_13_39_2,1,astar-symmullt-transmul,0.04,ok
+BF5_cpmp_16_5_64_13_39_2,1,idastar-symmulgt-transmul,1583.931,ok
+BF5_cpmp_16_5_64_13_39_2,1,idastar-symmullt-transmul,2516.925,ok
+BF5_cpmp_16_5_64_13_39_20,1,astar-symmulgt-transmul,3600,memout
+BF5_cpmp_16_5_64_13_39_20,1,astar-symmullt-transmul,3600,memout
+BF5_cpmp_16_5_64_13_39_20,1,idastar-symmulgt-transmul,3600,timeout
+BF5_cpmp_16_5_64_13_39_20,1,idastar-symmullt-transmul,1500.394,ok
+BF5_cpmp_16_5_64_13_39_3,1,astar-symmulgt-transmul,0.352,ok
+BF5_cpmp_16_5_64_13_39_3,1,astar-symmullt-transmul,3600,memout
+BF5_cpmp_16_5_64_13_39_3,1,idastar-symmulgt-transmul,3600,timeout
+BF5_cpmp_16_5_64_13_39_3,1,idastar-symmullt-transmul,3600,timeout
+BF5_cpmp_16_5_64_13_39_4,1,astar-symmulgt-transmul,5.6,ok
+BF5_cpmp_16_5_64_13_39_4,1,astar-symmullt-transmul,1.556,ok
+BF5_cpmp_16_5_64_13_39_4,1,idastar-symmulgt-transmul,3600,timeout
+BF5_cpmp_16_5_64_13_39_4,1,idastar-symmullt-transmul,3600,timeout
+BF5_cpmp_16_5_64_13_39_5,1,astar-symmulgt-transmul,7.056,ok
+BF5_cpmp_16_5_64_13_39_5,1,astar-symmullt-transmul,2.452,ok
+BF5_cpmp_16_5_64_13_39_5,1,idastar-symmulgt-transmul,0.892,ok
+BF5_cpmp_16_5_64_13_39_5,1,idastar-symmullt-transmul,0.352,ok
+BF5_cpmp_16_5_64_13_39_8,1,astar-symmulgt-transmul,6.72,ok
+BF5_cpmp_16_5_64_13_39_8,1,astar-symmullt-transmul,138.617,ok
+BF5_cpmp_16_5_64_13_39_8,1,idastar-symmulgt-transmul,14.333,ok
+BF5_cpmp_16_5_64_13_39_8,1,idastar-symmullt-transmul,0.296,ok
+BF5_cpmp_16_5_64_13_39_9,1,astar-symmulgt-transmul,0.832,ok
+BF5_cpmp_16_5_64_13_39_9,1,astar-symmullt-transmul,0.34,ok
+BF5_cpmp_16_5_64_13_39_9,1,idastar-symmulgt-transmul,29.366,ok
+BF5_cpmp_16_5_64_13_39_9,1,idastar-symmullt-transmul,81.745,ok
+BF6_cpmp_16_5_64_13_48_1,1,astar-symmulgt-transmul,11.389,ok
+BF6_cpmp_16_5_64_13_48_1,1,astar-symmullt-transmul,11.861,ok
+BF6_cpmp_16_5_64_13_48_1,1,idastar-symmulgt-transmul,3600,timeout
+BF6_cpmp_16_5_64_13_48_1,1,idastar-symmullt-transmul,3600,timeout
+BF6_cpmp_16_5_64_13_48_11,1,astar-symmulgt-transmul,3.016,ok
+BF6_cpmp_16_5_64_13_48_11,1,astar-symmullt-transmul,1.588,ok
+BF6_cpmp_16_5_64_13_48_11,1,idastar-symmulgt-transmul,3600,timeout
+BF6_cpmp_16_5_64_13_48_11,1,idastar-symmullt-transmul,3600,timeout
+BF6_cpmp_16_5_64_13_48_12,1,astar-symmulgt-transmul,78.477,ok
+BF6_cpmp_16_5_64_13_48_12,1,astar-symmullt-transmul,0.852,ok
+BF6_cpmp_16_5_64_13_48_12,1,idastar-symmulgt-transmul,3600,timeout
+BF6_cpmp_16_5_64_13_48_12,1,idastar-symmullt-transmul,3600,timeout
+BF6_cpmp_16_5_64_13_48_13,1,astar-symmulgt-transmul,114.211,ok
+BF6_cpmp_16_5_64_13_48_13,1,astar-symmullt-transmul,1.536,ok
+BF6_cpmp_16_5_64_13_48_13,1,idastar-symmulgt-transmul,3600,timeout
+BF6_cpmp_16_5_64_13_48_13,1,idastar-symmullt-transmul,3600,timeout
+BF6_cpmp_16_5_64_13_48_15,1,astar-symmulgt-transmul,3600,memout
+BF6_cpmp_16_5_64_13_48_15,1,astar-symmullt-transmul,133.524,ok
+BF6_cpmp_16_5_64_13_48_15,1,idastar-symmulgt-transmul,3600,timeout
+BF6_cpmp_16_5_64_13_48_15,1,idastar-symmullt-transmul,3600,timeout
+BF6_cpmp_16_5_64_13_48_16,1,astar-symmulgt-transmul,1.072,ok
+BF6_cpmp_16_5_64_13_48_16,1,astar-symmullt-transmul,3.036,ok
+BF6_cpmp_16_5_64_13_48_16,1,idastar-symmulgt-transmul,3600,timeout
+BF6_cpmp_16_5_64_13_48_16,1,idastar-symmullt-transmul,3600,timeout
+BF6_cpmp_16_5_64_13_48_17,1,astar-symmulgt-transmul,1.904,ok
+BF6_cpmp_16_5_64_13_48_17,1,astar-symmullt-transmul,2.112,ok
+BF6_cpmp_16_5_64_13_48_17,1,idastar-symmulgt-transmul,3600,timeout
+BF6_cpmp_16_5_64_13_48_17,1,idastar-symmullt-transmul,3600,timeout
+BF6_cpmp_16_5_64_13_48_18,1,astar-symmulgt-transmul,12.225,ok
+BF6_cpmp_16_5_64_13_48_18,1,astar-symmullt-transmul,3600,memout
+BF6_cpmp_16_5_64_13_48_18,1,idastar-symmulgt-transmul,3600,timeout
+BF6_cpmp_16_5_64_13_48_18,1,idastar-symmullt-transmul,3600,timeout
+BF6_cpmp_16_5_64_13_48_19,1,astar-symmulgt-transmul,3600,memout
+BF6_cpmp_16_5_64_13_48_19,1,astar-symmullt-transmul,28.89,ok
+BF6_cpmp_16_5_64_13_48_19,1,idastar-symmulgt-transmul,3600,timeout
+BF6_cpmp_16_5_64_13_48_19,1,idastar-symmullt-transmul,3600,timeout
+BF6_cpmp_16_5_64_13_48_2,1,astar-symmulgt-transmul,1.72,ok
+BF6_cpmp_16_5_64_13_48_2,1,astar-symmullt-transmul,8.465,ok
+BF6_cpmp_16_5_64_13_48_2,1,idastar-symmulgt-transmul,3600,timeout
+BF6_cpmp_16_5_64_13_48_2,1,idastar-symmullt-transmul,3600,timeout
+BF6_cpmp_16_5_64_13_48_3,1,astar-symmulgt-transmul,1.972,ok
+BF6_cpmp_16_5_64_13_48_3,1,astar-symmullt-transmul,3600,memout
+BF6_cpmp_16_5_64_13_48_3,1,idastar-symmulgt-transmul,3600,timeout
+BF6_cpmp_16_5_64_13_48_3,1,idastar-symmullt-transmul,3600,timeout
+BF6_cpmp_16_5_64_13_48_4,1,astar-symmulgt-transmul,3600,memout
+BF6_cpmp_16_5_64_13_48_4,1,astar-symmullt-transmul,125.576,ok
+BF6_cpmp_16_5_64_13_48_4,1,idastar-symmulgt-transmul,3600,timeout
+BF6_cpmp_16_5_64_13_48_4,1,idastar-symmullt-transmul,3600,timeout
+BF6_cpmp_16_5_64_13_48_6,1,astar-symmulgt-transmul,156.334,ok
+BF6_cpmp_16_5_64_13_48_6,1,astar-symmullt-transmul,3600,memout
+BF6_cpmp_16_5_64_13_48_6,1,idastar-symmulgt-transmul,3600,timeout
+BF6_cpmp_16_5_64_13_48_6,1,idastar-symmullt-transmul,3600,timeout
+BF6_cpmp_16_5_64_13_48_7,1,astar-symmulgt-transmul,9.729,ok
+BF6_cpmp_16_5_64_13_48_7,1,astar-symmullt-transmul,14.477,ok
+BF6_cpmp_16_5_64_13_48_7,1,idastar-symmulgt-transmul,3600,timeout
+BF6_cpmp_16_5_64_13_48_7,1,idastar-symmullt-transmul,3600,timeout
+BF6_cpmp_16_5_64_13_48_8,1,astar-symmulgt-transmul,76.481,ok
+BF6_cpmp_16_5_64_13_48_8,1,astar-symmullt-transmul,3600,memout
+BF6_cpmp_16_5_64_13_48_8,1,idastar-symmulgt-transmul,311.359,ok
+BF6_cpmp_16_5_64_13_48_8,1,idastar-symmullt-transmul,3600,timeout
+BF6_cpmp_16_5_64_13_48_9,1,astar-symmulgt-transmul,8.917,ok
+BF6_cpmp_16_5_64_13_48_9,1,astar-symmullt-transmul,59.656,ok
+BF6_cpmp_16_5_64_13_48_9,1,idastar-symmulgt-transmul,3600,timeout
+BF6_cpmp_16_5_64_13_48_9,1,idastar-symmullt-transmul,3600,timeout
+BF7_cpmp_16_5_64_26_39_1,1,astar-symmulgt-transmul,23.57,ok
+BF7_cpmp_16_5_64_26_39_1,1,astar-symmullt-transmul,3600,memout
+BF7_cpmp_16_5_64_26_39_1,1,idastar-symmulgt-transmul,3600,timeout
+BF7_cpmp_16_5_64_26_39_1,1,idastar-symmullt-transmul,3600,timeout
+BF7_cpmp_16_5_64_26_39_10,1,astar-symmulgt-transmul,353.126,ok
+BF7_cpmp_16_5_64_26_39_10,1,astar-symmullt-transmul,2.216,ok
+BF7_cpmp_16_5_64_26_39_10,1,idastar-symmulgt-transmul,66.108,ok
+BF7_cpmp_16_5_64_26_39_10,1,idastar-symmullt-transmul,2.032,ok
+BF7_cpmp_16_5_64_26_39_11,1,astar-symmulgt-transmul,25.254,ok
+BF7_cpmp_16_5_64_26_39_11,1,astar-symmullt-transmul,13.929,ok
+BF7_cpmp_16_5_64_26_39_11,1,idastar-symmulgt-transmul,345.646,ok
+BF7_cpmp_16_5_64_26_39_11,1,idastar-symmullt-transmul,2782.858,ok
+BF7_cpmp_16_5_64_26_39_12,1,astar-symmulgt-transmul,3600,memout
+BF7_cpmp_16_5_64_26_39_12,1,astar-symmullt-transmul,3600,memout
+BF7_cpmp_16_5_64_26_39_12,1,idastar-symmulgt-transmul,2387.621,ok
+BF7_cpmp_16_5_64_26_39_12,1,idastar-symmullt-transmul,3143.292,ok
+BF7_cpmp_16_5_64_26_39_13,1,astar-symmulgt-transmul,5.54,ok
+BF7_cpmp_16_5_64_26_39_13,1,astar-symmullt-transmul,3600,memout
+BF7_cpmp_16_5_64_26_39_13,1,idastar-symmulgt-transmul,33.922,ok
+BF7_cpmp_16_5_64_26_39_13,1,idastar-symmullt-transmul,33.098,ok
+BF7_cpmp_16_5_64_26_39_14,1,astar-symmulgt-transmul,3600,memout
+BF7_cpmp_16_5_64_26_39_14,1,astar-symmullt-transmul,3600,memout
+BF7_cpmp_16_5_64_26_39_14,1,idastar-symmulgt-transmul,3600,timeout
+BF7_cpmp_16_5_64_26_39_14,1,idastar-symmullt-transmul,657.537,ok
+BF7_cpmp_16_5_64_26_39_15,1,astar-symmulgt-transmul,3600,memout
+BF7_cpmp_16_5_64_26_39_15,1,astar-symmullt-transmul,8.097,ok
+BF7_cpmp_16_5_64_26_39_15,1,idastar-symmulgt-transmul,1.688,ok
+BF7_cpmp_16_5_64_26_39_15,1,idastar-symmullt-transmul,3.228,ok
+BF7_cpmp_16_5_64_26_39_16,1,astar-symmulgt-transmul,4.304,ok
+BF7_cpmp_16_5_64_26_39_16,1,astar-symmullt-transmul,14.337,ok
+BF7_cpmp_16_5_64_26_39_16,1,idastar-symmulgt-transmul,0.772,ok
+BF7_cpmp_16_5_64_26_39_16,1,idastar-symmullt-transmul,55.947,ok
+BF7_cpmp_16_5_64_26_39_17,1,astar-symmulgt-transmul,3600,memout
+BF7_cpmp_16_5_64_26_39_17,1,astar-symmullt-transmul,300.859,ok
+BF7_cpmp_16_5_64_26_39_17,1,idastar-symmulgt-transmul,3.472,ok
+BF7_cpmp_16_5_64_26_39_17,1,idastar-symmullt-transmul,589.277,ok +BF7_cpmp_16_5_64_26_39_19,1,astar-symmulgt-transmul,3600,memout +BF7_cpmp_16_5_64_26_39_19,1,astar-symmullt-transmul,391.176,ok +BF7_cpmp_16_5_64_26_39_19,1,idastar-symmulgt-transmul,200.645,ok +BF7_cpmp_16_5_64_26_39_19,1,idastar-symmullt-transmul,945.447,ok +BF7_cpmp_16_5_64_26_39_2,1,astar-symmulgt-transmul,3600,memout +BF7_cpmp_16_5_64_26_39_2,1,astar-symmullt-transmul,3600,memout +BF7_cpmp_16_5_64_26_39_2,1,idastar-symmulgt-transmul,2288.879,ok +BF7_cpmp_16_5_64_26_39_2,1,idastar-symmullt-transmul,699.408,ok +BF7_cpmp_16_5_64_26_39_20,1,astar-symmulgt-transmul,3600,memout +BF7_cpmp_16_5_64_26_39_20,1,astar-symmullt-transmul,3600,memout +BF7_cpmp_16_5_64_26_39_20,1,idastar-symmulgt-transmul,3600,timeout +BF7_cpmp_16_5_64_26_39_20,1,idastar-symmullt-transmul,2240.424,ok +BF7_cpmp_16_5_64_26_39_3,1,astar-symmulgt-transmul,3600,memout +BF7_cpmp_16_5_64_26_39_3,1,astar-symmullt-transmul,3600,memout +BF7_cpmp_16_5_64_26_39_3,1,idastar-symmulgt-transmul,3600,timeout +BF7_cpmp_16_5_64_26_39_3,1,idastar-symmullt-transmul,3387.188,ok +BF7_cpmp_16_5_64_26_39_6,1,astar-symmulgt-transmul,2.832,ok +BF7_cpmp_16_5_64_26_39_6,1,astar-symmullt-transmul,1.124,ok +BF7_cpmp_16_5_64_26_39_6,1,idastar-symmulgt-transmul,8.953,ok +BF7_cpmp_16_5_64_26_39_6,1,idastar-symmullt-transmul,1.136,ok +BF7_cpmp_16_5_64_26_39_7,1,astar-symmulgt-transmul,3600,memout +BF7_cpmp_16_5_64_26_39_7,1,astar-symmullt-transmul,3600,memout +BF7_cpmp_16_5_64_26_39_7,1,idastar-symmulgt-transmul,76.673,ok +BF7_cpmp_16_5_64_26_39_7,1,idastar-symmullt-transmul,407.793,ok +BF7_cpmp_16_5_64_26_39_8,1,astar-symmulgt-transmul,12.325,ok +BF7_cpmp_16_5_64_26_39_8,1,astar-symmullt-transmul,3600,memout +BF7_cpmp_16_5_64_26_39_8,1,idastar-symmulgt-transmul,226.442,ok +BF7_cpmp_16_5_64_26_39_8,1,idastar-symmullt-transmul,6.344,ok +BF8_cpmp_16_5_64_26_48_1,1,astar-symmulgt-transmul,0.128,ok +BF8_cpmp_16_5_64_26_48_1,1,astar-symmullt-transmul,24.645,ok 
+BF8_cpmp_16_5_64_26_48_1,1,idastar-symmulgt-transmul,3600,timeout +BF8_cpmp_16_5_64_26_48_1,1,idastar-symmullt-transmul,3600,timeout +BF8_cpmp_16_5_64_26_48_11,1,astar-symmulgt-transmul,196.648,ok +BF8_cpmp_16_5_64_26_48_11,1,astar-symmullt-transmul,81.437,ok +BF8_cpmp_16_5_64_26_48_11,1,idastar-symmulgt-transmul,3600,timeout +BF8_cpmp_16_5_64_26_48_11,1,idastar-symmullt-transmul,3600,timeout +BF8_cpmp_16_5_64_26_48_12,1,astar-symmulgt-transmul,12.009,ok +BF8_cpmp_16_5_64_26_48_12,1,astar-symmullt-transmul,22.061,ok +BF8_cpmp_16_5_64_26_48_12,1,idastar-symmulgt-transmul,3600,timeout +BF8_cpmp_16_5_64_26_48_12,1,idastar-symmullt-transmul,3600,timeout +BF8_cpmp_16_5_64_26_48_17,1,astar-symmulgt-transmul,3600,memout +BF8_cpmp_16_5_64_26_48_17,1,astar-symmullt-transmul,127.268,ok +BF8_cpmp_16_5_64_26_48_17,1,idastar-symmulgt-transmul,3600,timeout +BF8_cpmp_16_5_64_26_48_17,1,idastar-symmullt-transmul,3600,timeout +BF8_cpmp_16_5_64_26_48_20,1,astar-symmulgt-transmul,3600,memout +BF8_cpmp_16_5_64_26_48_20,1,astar-symmullt-transmul,3600,memout +BF8_cpmp_16_5_64_26_48_20,1,idastar-symmulgt-transmul,284.538,ok +BF8_cpmp_16_5_64_26_48_20,1,idastar-symmullt-transmul,15.101,ok +BF8_cpmp_16_5_64_26_48_4,1,astar-symmulgt-transmul,3600,memout +BF8_cpmp_16_5_64_26_48_4,1,astar-symmullt-transmul,3600,memout +BF8_cpmp_16_5_64_26_48_4,1,idastar-symmulgt-transmul,3600,timeout +BF8_cpmp_16_5_64_26_48_4,1,idastar-symmullt-transmul,1433.006,ok +BF8_cpmp_16_5_64_26_48_5,1,astar-symmulgt-transmul,15.465,ok +BF8_cpmp_16_5_64_26_48_5,1,astar-symmullt-transmul,32.898,ok +BF8_cpmp_16_5_64_26_48_5,1,idastar-symmulgt-transmul,7.288,ok +BF8_cpmp_16_5_64_26_48_5,1,idastar-symmullt-transmul,3600,timeout +BF8_cpmp_16_5_64_26_48_6,1,astar-symmulgt-transmul,3600,memout +BF8_cpmp_16_5_64_26_48_6,1,astar-symmullt-transmul,3600,memout +BF8_cpmp_16_5_64_26_48_6,1,idastar-symmulgt-transmul,148.953,ok +BF8_cpmp_16_5_64_26_48_6,1,idastar-symmullt-transmul,1452.071,ok 
+BF8_cpmp_16_5_64_26_48_7,1,astar-symmulgt-transmul,2.72,ok +BF8_cpmp_16_5_64_26_48_7,1,astar-symmullt-transmul,3600,memout +BF8_cpmp_16_5_64_26_48_7,1,idastar-symmulgt-transmul,3600,timeout +BF8_cpmp_16_5_64_26_48_7,1,idastar-symmullt-transmul,3600,timeout +BF9_cpmp_16_8_77_16_47_1,1,astar-symmulgt-transmul,3600,memout +BF9_cpmp_16_8_77_16_47_1,1,astar-symmullt-transmul,3600,memout +BF9_cpmp_16_8_77_16_47_1,1,idastar-symmulgt-transmul,1468.668,ok +BF9_cpmp_16_8_77_16_47_1,1,idastar-symmullt-transmul,3600,timeout +BF9_cpmp_16_8_77_16_47_10,1,astar-symmulgt-transmul,60.852,ok +BF9_cpmp_16_8_77_16_47_10,1,astar-symmullt-transmul,3600,memout +BF9_cpmp_16_8_77_16_47_10,1,idastar-symmulgt-transmul,300.563,ok +BF9_cpmp_16_8_77_16_47_10,1,idastar-symmullt-transmul,37.682,ok +BF9_cpmp_16_8_77_16_47_12,1,astar-symmulgt-transmul,3600,memout +BF9_cpmp_16_8_77_16_47_12,1,astar-symmullt-transmul,3600,memout +BF9_cpmp_16_8_77_16_47_12,1,idastar-symmulgt-transmul,705.936,ok +BF9_cpmp_16_8_77_16_47_12,1,idastar-symmullt-transmul,406.801,ok +BF9_cpmp_16_8_77_16_47_3,1,astar-symmulgt-transmul,3600,memout +BF9_cpmp_16_8_77_16_47_3,1,astar-symmullt-transmul,3600,memout +BF9_cpmp_16_8_77_16_47_3,1,idastar-symmulgt-transmul,1137.839,ok +BF9_cpmp_16_8_77_16_47_3,1,idastar-symmullt-transmul,1.932,ok +BF9_cpmp_16_8_77_16_47_4,1,astar-symmulgt-transmul,3600,memout +BF9_cpmp_16_8_77_16_47_4,1,astar-symmullt-transmul,6.1,ok +BF9_cpmp_16_8_77_16_47_4,1,idastar-symmulgt-transmul,1010.091,ok +BF9_cpmp_16_8_77_16_47_4,1,idastar-symmullt-transmul,7.836,ok +BF9_cpmp_16_8_77_16_47_6,1,astar-symmulgt-transmul,3600,memout +BF9_cpmp_16_8_77_16_47_6,1,astar-symmullt-transmul,3600,memout +BF9_cpmp_16_8_77_16_47_6,1,idastar-symmulgt-transmul,444.748,ok +BF9_cpmp_16_8_77_16_47_6,1,idastar-symmullt-transmul,1096.493,ok +BF9_cpmp_16_8_77_16_47_7,1,astar-symmulgt-transmul,3600,memout +BF9_cpmp_16_8_77_16_47_7,1,astar-symmullt-transmul,3600,memout 
+BF9_cpmp_16_8_77_16_47_7,1,idastar-symmulgt-transmul,1246.822,ok +BF9_cpmp_16_8_77_16_47_7,1,idastar-symmullt-transmul,2556.336,ok +BF9_cpmp_16_8_77_16_47_8,1,astar-symmulgt-transmul,3600,memout +BF9_cpmp_16_8_77_16_47_8,1,astar-symmullt-transmul,2.728,ok +BF9_cpmp_16_8_77_16_47_8,1,idastar-symmulgt-transmul,3600,timeout +BF9_cpmp_16_8_77_16_47_8,1,idastar-symmullt-transmul,3600,timeout +LC2a_lc2a_1,1,astar-symmulgt-transmul,0.024,ok +LC2a_lc2a_1,1,astar-symmullt-transmul,14.857,ok +LC2a_lc2a_1,1,idastar-symmulgt-transmul,0.02,ok +LC2a_lc2a_1,1,idastar-symmullt-transmul,0.764,ok +LC2a_lc2a_10,1,astar-symmulgt-transmul,0.484,ok +LC2a_lc2a_10,1,astar-symmullt-transmul,0.02,ok +LC2a_lc2a_10,1,idastar-symmulgt-transmul,0.392,ok +LC2a_lc2a_10,1,idastar-symmullt-transmul,0.008,ok +LC2a_lc2a_2,1,astar-symmulgt-transmul,792.29,ok +LC2a_lc2a_2,1,astar-symmullt-transmul,5.28,ok +LC2a_lc2a_2,1,idastar-symmulgt-transmul,613.682,ok +LC2a_lc2a_2,1,idastar-symmullt-transmul,160.234,ok +LC2a_lc2a_3,1,astar-symmulgt-transmul,31.086,ok +LC2a_lc2a_3,1,astar-symmullt-transmul,1.636,ok +LC2a_lc2a_3,1,idastar-symmulgt-transmul,25.442,ok +LC2a_lc2a_3,1,idastar-symmullt-transmul,30.694,ok +LC2a_lc2a_4,1,astar-symmulgt-transmul,0.648,ok +LC2a_lc2a_4,1,astar-symmullt-transmul,3600,memout +LC2a_lc2a_4,1,idastar-symmulgt-transmul,0.52,ok +LC2a_lc2a_4,1,idastar-symmullt-transmul,0.112,ok +LC2a_lc2a_5,1,astar-symmulgt-transmul,2.476,ok +LC2a_lc2a_5,1,astar-symmullt-transmul,13.665,ok +LC2a_lc2a_5,1,idastar-symmulgt-transmul,1.92,ok +LC2a_lc2a_5,1,idastar-symmullt-transmul,1.804,ok +LC2a_lc2a_7,1,astar-symmulgt-transmul,11.417,ok +LC2a_lc2a_7,1,astar-symmullt-transmul,231.046,ok +LC2a_lc2a_7,1,idastar-symmulgt-transmul,9.253,ok +LC2a_lc2a_7,1,idastar-symmullt-transmul,6.796,ok +LC2b_lc2b_1,1,astar-symmulgt-transmul,21.005,ok +LC2b_lc2b_1,1,astar-symmullt-transmul,3600,memout +LC2b_lc2b_1,1,idastar-symmulgt-transmul,17.401,ok +LC2b_lc2b_1,1,idastar-symmullt-transmul,652.297,ok 
+LC2b_lc2b_2,1,astar-symmulgt-transmul,3039.914,ok +LC2b_lc2b_2,1,astar-symmullt-transmul,3600,memout +LC2b_lc2b_2,1,idastar-symmulgt-transmul,1806.705,ok +LC2b_lc2b_2,1,idastar-symmullt-transmul,3600,timeout +LC2b_lc2b_3,1,astar-symmulgt-transmul,0.752,ok +LC2b_lc2b_3,1,astar-symmullt-transmul,3600,memout +LC2b_lc2b_3,1,idastar-symmulgt-transmul,0.468,ok +LC2b_lc2b_3,1,idastar-symmullt-transmul,0.516,ok +LC2b_lc2b_4,1,astar-symmulgt-transmul,60.796,ok +LC2b_lc2b_4,1,astar-symmullt-transmul,0.232,ok +LC2b_lc2b_4,1,idastar-symmulgt-transmul,50.051,ok +LC2b_lc2b_4,1,idastar-symmullt-transmul,23.045,ok +LC2b_lc2b_5,1,astar-symmulgt-transmul,3600,memout +LC2b_lc2b_5,1,astar-symmullt-transmul,3600,memout +LC2b_lc2b_5,1,idastar-symmulgt-transmul,3600,timeout +LC2b_lc2b_5,1,idastar-symmullt-transmul,10.729,ok +LC2b_lc2b_6,1,astar-symmulgt-transmul,0.38,ok +LC2b_lc2b_6,1,astar-symmullt-transmul,2.676,ok +LC2b_lc2b_6,1,idastar-symmulgt-transmul,0.308,ok +LC2b_lc2b_6,1,idastar-symmullt-transmul,1.156,ok +LC2b_lc2b_7,1,astar-symmulgt-transmul,1056.57,ok +LC2b_lc2b_7,1,astar-symmullt-transmul,7.856,ok +LC2b_lc2b_7,1,idastar-symmulgt-transmul,779.805,ok +LC2b_lc2b_7,1,idastar-symmullt-transmul,528.325,ok +LC2b_lc2b_9,1,astar-symmulgt-transmul,82.761,ok +LC2b_lc2b_9,1,astar-symmullt-transmul,3600,memout +LC2b_lc2b_9,1,idastar-symmulgt-transmul,67.752,ok +LC2b_lc2b_9,1,idastar-symmullt-transmul,74.373,ok +LC3a_lc3a_1,1,astar-symmulgt-transmul,70.404,ok +LC3a_lc3a_1,1,astar-symmullt-transmul,0.572,ok +LC3a_lc3a_1,1,idastar-symmulgt-transmul,56.104,ok +LC3a_lc3a_1,1,idastar-symmullt-transmul,29.318,ok +LC3a_lc3a_10,1,astar-symmulgt-transmul,1.176,ok +LC3a_lc3a_10,1,astar-symmullt-transmul,3.996,ok +LC3a_lc3a_10,1,idastar-symmulgt-transmul,0.976,ok +LC3a_lc3a_10,1,idastar-symmullt-transmul,0.048,ok +LC3a_lc3a_3,1,astar-symmulgt-transmul,170.859,ok +LC3a_lc3a_3,1,astar-symmullt-transmul,3600,memout +LC3a_lc3a_3,1,idastar-symmulgt-transmul,142.593,ok 
+LC3a_lc3a_3,1,idastar-symmullt-transmul,114.671,ok +LC3a_lc3a_4,1,astar-symmulgt-transmul,3600,memout +LC3a_lc3a_4,1,astar-symmullt-transmul,81.537,ok +LC3a_lc3a_4,1,idastar-symmulgt-transmul,3600,timeout +LC3a_lc3a_4,1,idastar-symmullt-transmul,2436.808,ok +LC3a_lc3a_6,1,astar-symmulgt-transmul,28.546,ok +LC3a_lc3a_6,1,astar-symmullt-transmul,0.648,ok +LC3a_lc3a_6,1,idastar-symmulgt-transmul,23.149,ok +LC3a_lc3a_6,1,idastar-symmullt-transmul,2.08,ok +LC3a_lc3a_7,1,astar-symmulgt-transmul,111.835,ok +LC3a_lc3a_7,1,astar-symmullt-transmul,0.144,ok +LC3a_lc3a_7,1,idastar-symmulgt-transmul,80.653,ok +LC3a_lc3a_7,1,idastar-symmullt-transmul,89.89,ok +LC3a_lc3a_8,1,astar-symmulgt-transmul,1320.567,ok +LC3a_lc3a_8,1,astar-symmullt-transmul,229.57,ok +LC3a_lc3a_8,1,idastar-symmulgt-transmul,1009.243,ok +LC3a_lc3a_8,1,idastar-symmullt-transmul,569.048,ok +LC3a_lc3a_9,1,astar-symmulgt-transmul,3600,memout +LC3a_lc3a_9,1,astar-symmullt-transmul,55.592,ok +LC3a_lc3a_9,1,idastar-symmulgt-transmul,3600,timeout +LC3a_lc3a_9,1,idastar-symmullt-transmul,2312.845,ok +LC3b_lc3b_1,1,astar-symmulgt-transmul,1054.294,ok +LC3b_lc3b_1,1,astar-symmullt-transmul,214.365,ok +LC3b_lc3b_1,1,idastar-symmulgt-transmul,857.706,ok +LC3b_lc3b_1,1,idastar-symmullt-transmul,359.93,ok +LC3b_lc3b_10,1,astar-symmulgt-transmul,1053.806,ok +LC3b_lc3b_10,1,astar-symmullt-transmul,3600,memout +LC3b_lc3b_10,1,idastar-symmulgt-transmul,822.027,ok +LC3b_lc3b_10,1,idastar-symmullt-transmul,1984.696,ok +LC3b_lc3b_4,1,astar-symmulgt-transmul,338.281,ok +LC3b_lc3b_4,1,astar-symmullt-transmul,759.055,ok +LC3b_lc3b_4,1,idastar-symmulgt-transmul,275.901,ok +LC3b_lc3b_4,1,idastar-symmullt-transmul,12.305,ok +LC3b_lc3b_5,1,astar-symmulgt-transmul,0.476,ok +LC3b_lc3b_5,1,astar-symmullt-transmul,0.14,ok +LC3b_lc3b_5,1,idastar-symmulgt-transmul,0.408,ok +LC3b_lc3b_5,1,idastar-symmullt-transmul,0.228,ok +LC3b_lc3b_6,1,astar-symmulgt-transmul,1.732,ok +LC3b_lc3b_6,1,astar-symmullt-transmul,0.752,ok 
+LC3b_lc3b_6,1,idastar-symmulgt-transmul,1.46,ok +LC3b_lc3b_6,1,idastar-symmullt-transmul,0.484,ok +LC3b_lc3b_8,1,astar-symmulgt-transmul,43.835,ok +LC3b_lc3b_8,1,astar-symmullt-transmul,23.433,ok +LC3b_lc3b_8,1,idastar-symmulgt-transmul,42.227,ok +LC3b_lc3b_8,1,idastar-symmullt-transmul,399.973,ok +LC3b_lc3b_9,1,astar-symmulgt-transmul,94.622,ok +LC3b_lc3b_9,1,astar-symmullt-transmul,3600,memout +LC3b_lc3b_9,1,idastar-symmulgt-transmul,75.209,ok +LC3b_lc3b_9,1,idastar-symmullt-transmul,74.453,ok +cv_data3-5-13,1,astar-symmulgt-transmul,0.18,ok +cv_data3-5-13,1,astar-symmullt-transmul,0.196,ok +cv_data3-5-13,1,idastar-symmulgt-transmul,0.072,ok +cv_data3-5-13,1,idastar-symmullt-transmul,0.076,ok +cv_data3-5-8,1,astar-symmulgt-transmul,0.26,ok +cv_data3-5-8,1,astar-symmullt-transmul,0.268,ok +cv_data3-5-8,1,idastar-symmulgt-transmul,0.084,ok +cv_data3-5-8,1,idastar-symmullt-transmul,0.084,ok +cv_data3-6-24,1,astar-symmulgt-transmul,0.064,ok +cv_data3-6-24,1,astar-symmullt-transmul,0.072,ok +cv_data3-6-24,1,idastar-symmulgt-transmul,0.188,ok +cv_data3-6-24,1,idastar-symmullt-transmul,0.092,ok +cv_data3-6-26,1,astar-symmulgt-transmul,0.528,ok +cv_data3-6-26,1,astar-symmullt-transmul,0.652,ok +cv_data3-6-26,1,idastar-symmulgt-transmul,0.244,ok +cv_data3-6-26,1,idastar-symmullt-transmul,0.264,ok +cv_data3-7-12,1,astar-symmulgt-transmul,0.344,ok +cv_data3-7-12,1,astar-symmullt-transmul,0.484,ok +cv_data3-7-12,1,idastar-symmulgt-transmul,0.436,ok +cv_data3-7-12,1,idastar-symmullt-transmul,0.228,ok +cv_data3-7-17,1,astar-symmulgt-transmul,0.268,ok +cv_data3-7-17,1,astar-symmullt-transmul,0.252,ok +cv_data3-7-17,1,idastar-symmulgt-transmul,0.14,ok +cv_data3-7-17,1,idastar-symmullt-transmul,0.128,ok +cv_data3-7-18,1,astar-symmulgt-transmul,0.052,ok +cv_data3-7-18,1,astar-symmullt-transmul,0.06,ok +cv_data3-7-18,1,idastar-symmulgt-transmul,0.06,ok +cv_data3-7-18,1,idastar-symmullt-transmul,0.216,ok +cv_data3-7-22,1,astar-symmulgt-transmul,0.44,ok 
+cv_data3-7-22,1,astar-symmullt-transmul,0.536,ok +cv_data3-7-22,1,idastar-symmulgt-transmul,0.032,ok +cv_data3-7-22,1,idastar-symmullt-transmul,0.024,ok +cv_data3-7-27,1,astar-symmulgt-transmul,0.096,ok +cv_data3-7-27,1,astar-symmullt-transmul,0.116,ok +cv_data3-7-27,1,idastar-symmulgt-transmul,0.2,ok +cv_data3-7-27,1,idastar-symmullt-transmul,0.104,ok +cv_data3-7-30,1,astar-symmulgt-transmul,0.316,ok +cv_data3-7-30,1,astar-symmullt-transmul,0.384,ok +cv_data3-7-30,1,idastar-symmulgt-transmul,0.004,ok +cv_data3-7-30,1,idastar-symmullt-transmul,0.004,ok +cv_data3-7-5,1,astar-symmulgt-transmul,0.292,ok +cv_data3-7-5,1,astar-symmullt-transmul,0.26,ok +cv_data3-7-5,1,idastar-symmulgt-transmul,0.004,ok +cv_data3-7-5,1,idastar-symmullt-transmul,0.004,ok +cv_data3-8-17,1,astar-symmulgt-transmul,0.116,ok +cv_data3-8-17,1,astar-symmullt-transmul,0.168,ok +cv_data3-8-17,1,idastar-symmulgt-transmul,0.008,ok +cv_data3-8-17,1,idastar-symmullt-transmul,0.008,ok +cv_data3-8-2,1,astar-symmulgt-transmul,0.016,ok +cv_data3-8-2,1,astar-symmullt-transmul,0.016,ok +cv_data3-8-2,1,idastar-symmulgt-transmul,0.08,ok +cv_data3-8-2,1,idastar-symmullt-transmul,0.272,ok +cv_data3-8-24,1,astar-symmulgt-transmul,0.764,ok +cv_data3-8-24,1,astar-symmullt-transmul,0.62,ok +cv_data3-8-24,1,idastar-symmulgt-transmul,0.54,ok +cv_data3-8-24,1,idastar-symmullt-transmul,0.248,ok +cv_data3-8-25,1,astar-symmulgt-transmul,0.156,ok +cv_data3-8-25,1,astar-symmullt-transmul,0.152,ok +cv_data3-8-25,1,idastar-symmulgt-transmul,0.012,ok +cv_data3-8-25,1,idastar-symmullt-transmul,0.008,ok +cv_data3-8-28,1,astar-symmulgt-transmul,0.308,ok +cv_data3-8-28,1,astar-symmullt-transmul,0.252,ok +cv_data3-8-28,1,idastar-symmulgt-transmul,0.168,ok +cv_data3-8-28,1,idastar-symmullt-transmul,0.104,ok +cv_data3-8-37,1,astar-symmulgt-transmul,0.144,ok +cv_data3-8-37,1,astar-symmullt-transmul,0.104,ok +cv_data3-8-37,1,idastar-symmulgt-transmul,0.48,ok +cv_data3-8-37,1,idastar-symmullt-transmul,0.112,ok 
+cv_data3-8-38,1,astar-symmulgt-transmul,0.416,ok +cv_data3-8-38,1,astar-symmullt-transmul,0.532,ok +cv_data3-8-38,1,idastar-symmulgt-transmul,0.536,ok +cv_data3-8-38,1,idastar-symmullt-transmul,0.208,ok +cv_data3-8-7,1,astar-symmulgt-transmul,1.088,ok +cv_data3-8-7,1,astar-symmullt-transmul,1.876,ok +cv_data3-8-7,1,idastar-symmulgt-transmul,1.928,ok +cv_data3-8-7,1,idastar-symmullt-transmul,1.064,ok +cv_data4-4-10,1,astar-symmulgt-transmul,0.38,ok +cv_data4-4-10,1,astar-symmullt-transmul,0.368,ok +cv_data4-4-10,1,idastar-symmulgt-transmul,0.16,ok +cv_data4-4-10,1,idastar-symmullt-transmul,0.152,ok +cv_data4-4-12,1,astar-symmulgt-transmul,0.872,ok +cv_data4-4-12,1,astar-symmullt-transmul,0.868,ok +cv_data4-4-12,1,idastar-symmulgt-transmul,0.656,ok +cv_data4-4-12,1,idastar-symmullt-transmul,0.576,ok +cv_data4-4-13,1,astar-symmulgt-transmul,0.188,ok +cv_data4-4-13,1,astar-symmullt-transmul,0.192,ok +cv_data4-4-13,1,idastar-symmulgt-transmul,0.068,ok +cv_data4-4-13,1,idastar-symmullt-transmul,0.064,ok +cv_data4-4-14,1,astar-symmulgt-transmul,0.332,ok +cv_data4-4-14,1,astar-symmullt-transmul,0.316,ok +cv_data4-4-14,1,idastar-symmulgt-transmul,0.472,ok +cv_data4-4-14,1,idastar-symmullt-transmul,0.432,ok +cv_data4-4-15,1,astar-symmulgt-transmul,0.532,ok +cv_data4-4-15,1,astar-symmullt-transmul,0.396,ok +cv_data4-4-15,1,idastar-symmulgt-transmul,0.272,ok +cv_data4-4-15,1,idastar-symmullt-transmul,0.256,ok +cv_data4-4-16,1,astar-symmulgt-transmul,17.353,ok +cv_data4-4-16,1,astar-symmullt-transmul,16.273,ok +cv_data4-4-16,1,idastar-symmulgt-transmul,6.432,ok +cv_data4-4-16,1,idastar-symmullt-transmul,6.104,ok +cv_data4-4-19,1,astar-symmulgt-transmul,0.512,ok +cv_data4-4-19,1,astar-symmullt-transmul,0.688,ok +cv_data4-4-19,1,idastar-symmulgt-transmul,0.216,ok +cv_data4-4-19,1,idastar-symmullt-transmul,0.204,ok +cv_data4-4-2,1,astar-symmulgt-transmul,0.148,ok +cv_data4-4-2,1,astar-symmullt-transmul,0.148,ok +cv_data4-4-2,1,idastar-symmulgt-transmul,0.04,ok 
+cv_data4-4-2,1,idastar-symmullt-transmul,0.036,ok +cv_data4-4-20,1,astar-symmulgt-transmul,1.716,ok +cv_data4-4-20,1,astar-symmullt-transmul,1.748,ok +cv_data4-4-20,1,idastar-symmulgt-transmul,2.316,ok +cv_data4-4-20,1,idastar-symmullt-transmul,2.068,ok +cv_data4-4-21,1,astar-symmulgt-transmul,0.62,ok +cv_data4-4-21,1,astar-symmullt-transmul,0.508,ok +cv_data4-4-21,1,idastar-symmulgt-transmul,0.172,ok +cv_data4-4-21,1,idastar-symmullt-transmul,0.236,ok +cv_data4-4-23,1,astar-symmulgt-transmul,152.046,ok +cv_data4-4-23,1,astar-symmullt-transmul,180.103,ok +cv_data4-4-23,1,idastar-symmulgt-transmul,8.893,ok +cv_data4-4-23,1,idastar-symmullt-transmul,9.349,ok +cv_data4-4-24,1,astar-symmulgt-transmul,12.105,ok +cv_data4-4-24,1,astar-symmullt-transmul,11.865,ok +cv_data4-4-24,1,idastar-symmulgt-transmul,2.08,ok +cv_data4-4-24,1,idastar-symmullt-transmul,2.056,ok +cv_data4-4-25,1,astar-symmulgt-transmul,46.347,ok +cv_data4-4-25,1,astar-symmullt-transmul,40.486,ok +cv_data4-4-25,1,idastar-symmulgt-transmul,12.009,ok +cv_data4-4-25,1,idastar-symmullt-transmul,11.057,ok +cv_data4-4-27,1,astar-symmulgt-transmul,17.881,ok +cv_data4-4-27,1,astar-symmullt-transmul,15.337,ok +cv_data4-4-27,1,idastar-symmulgt-transmul,5.064,ok +cv_data4-4-27,1,idastar-symmullt-transmul,6.004,ok +cv_data4-4-28,1,astar-symmulgt-transmul,0.112,ok +cv_data4-4-28,1,astar-symmullt-transmul,0.1,ok +cv_data4-4-28,1,idastar-symmulgt-transmul,0.584,ok +cv_data4-4-28,1,idastar-symmullt-transmul,0.484,ok +cv_data4-4-29,1,astar-symmulgt-transmul,0.612,ok +cv_data4-4-29,1,astar-symmullt-transmul,0.44,ok +cv_data4-4-29,1,idastar-symmulgt-transmul,0.372,ok +cv_data4-4-29,1,idastar-symmullt-transmul,0.328,ok +cv_data4-4-3,1,astar-symmulgt-transmul,0.636,ok +cv_data4-4-3,1,astar-symmullt-transmul,0.544,ok +cv_data4-4-3,1,idastar-symmulgt-transmul,0.488,ok +cv_data4-4-3,1,idastar-symmullt-transmul,0.4,ok +cv_data4-4-30,1,astar-symmulgt-transmul,2.744,ok +cv_data4-4-30,1,astar-symmullt-transmul,2.964,ok 
+cv_data4-4-30,1,idastar-symmulgt-transmul,1.8,ok +cv_data4-4-30,1,idastar-symmullt-transmul,1.824,ok +cv_data4-4-33,1,astar-symmulgt-transmul,34.154,ok +cv_data4-4-33,1,astar-symmullt-transmul,42.479,ok +cv_data4-4-33,1,idastar-symmulgt-transmul,2.12,ok +cv_data4-4-33,1,idastar-symmullt-transmul,1.848,ok +cv_data4-4-34,1,astar-symmulgt-transmul,0.804,ok +cv_data4-4-34,1,astar-symmullt-transmul,0.68,ok +cv_data4-4-34,1,idastar-symmulgt-transmul,0.832,ok +cv_data4-4-34,1,idastar-symmullt-transmul,0.516,ok +cv_data4-4-35,1,astar-symmulgt-transmul,2.156,ok +cv_data4-4-35,1,astar-symmullt-transmul,1.88,ok +cv_data4-4-35,1,idastar-symmulgt-transmul,1.9,ok +cv_data4-4-35,1,idastar-symmullt-transmul,1.676,ok +cv_data4-4-7,1,astar-symmulgt-transmul,2.512,ok +cv_data4-4-7,1,astar-symmullt-transmul,2.204,ok +cv_data4-4-7,1,idastar-symmulgt-transmul,0.552,ok +cv_data4-4-7,1,idastar-symmullt-transmul,0.536,ok +cv_data4-5-1,1,astar-symmulgt-transmul,3.432,ok +cv_data4-5-1,1,astar-symmullt-transmul,3.472,ok +cv_data4-5-1,1,idastar-symmulgt-transmul,0.516,ok +cv_data4-5-1,1,idastar-symmullt-transmul,0.408,ok +cv_data4-5-12,1,astar-symmulgt-transmul,0.188,ok +cv_data4-5-12,1,astar-symmullt-transmul,0.176,ok +cv_data4-5-12,1,idastar-symmulgt-transmul,0.772,ok +cv_data4-5-12,1,idastar-symmullt-transmul,0.78,ok +cv_data4-5-14,1,astar-symmulgt-transmul,4.804,ok +cv_data4-5-14,1,astar-symmullt-transmul,5.048,ok +cv_data4-5-14,1,idastar-symmulgt-transmul,4.892,ok +cv_data4-5-14,1,idastar-symmullt-transmul,3.864,ok +cv_data4-5-15,1,astar-symmulgt-transmul,6.28,ok +cv_data4-5-15,1,astar-symmullt-transmul,5.312,ok +cv_data4-5-15,1,idastar-symmulgt-transmul,6.936,ok +cv_data4-5-15,1,idastar-symmullt-transmul,5.968,ok +cv_data4-5-17,1,astar-symmulgt-transmul,197.196,ok +cv_data4-5-17,1,astar-symmullt-transmul,142.161,ok +cv_data4-5-17,1,idastar-symmulgt-transmul,82.777,ok +cv_data4-5-17,1,idastar-symmullt-transmul,66.976,ok +cv_data4-5-19,1,astar-symmulgt-transmul,2.144,ok 
+cv_data4-5-19,1,astar-symmullt-transmul,1.92,ok +cv_data4-5-19,1,idastar-symmulgt-transmul,0.804,ok +cv_data4-5-19,1,idastar-symmullt-transmul,0.56,ok +cv_data4-5-2,1,astar-symmulgt-transmul,0.324,ok +cv_data4-5-2,1,astar-symmullt-transmul,0.296,ok +cv_data4-5-2,1,idastar-symmulgt-transmul,1.312,ok +cv_data4-5-2,1,idastar-symmullt-transmul,0.788,ok +cv_data4-5-20,1,astar-symmulgt-transmul,0.332,ok +cv_data4-5-20,1,astar-symmullt-transmul,0.7,ok +cv_data4-5-20,1,idastar-symmulgt-transmul,0.508,ok +cv_data4-5-20,1,idastar-symmullt-transmul,0.336,ok +cv_data4-5-21,1,astar-symmulgt-transmul,589.337,ok +cv_data4-5-21,1,astar-symmullt-transmul,624.091,ok +cv_data4-5-21,1,idastar-symmulgt-transmul,3.956,ok +cv_data4-5-21,1,idastar-symmullt-transmul,3.816,ok +cv_data4-5-22,1,astar-symmulgt-transmul,2.244,ok +cv_data4-5-22,1,astar-symmullt-transmul,2.692,ok +cv_data4-5-22,1,idastar-symmulgt-transmul,0.48,ok +cv_data4-5-22,1,idastar-symmullt-transmul,0.472,ok +cv_data4-5-23,1,astar-symmulgt-transmul,1.516,ok +cv_data4-5-23,1,astar-symmullt-transmul,1.688,ok +cv_data4-5-23,1,idastar-symmulgt-transmul,1.78,ok +cv_data4-5-23,1,idastar-symmullt-transmul,1.56,ok +cv_data4-5-24,1,astar-symmulgt-transmul,0.276,ok +cv_data4-5-24,1,astar-symmullt-transmul,0.268,ok +cv_data4-5-24,1,idastar-symmulgt-transmul,0.016,ok +cv_data4-5-24,1,idastar-symmullt-transmul,0.016,ok +cv_data4-5-25,1,astar-symmulgt-transmul,2.884,ok +cv_data4-5-25,1,astar-symmullt-transmul,2.788,ok +cv_data4-5-25,1,idastar-symmulgt-transmul,0.836,ok +cv_data4-5-25,1,idastar-symmullt-transmul,0.764,ok +cv_data4-5-26,1,astar-symmulgt-transmul,5.028,ok +cv_data4-5-26,1,astar-symmullt-transmul,5.9,ok +cv_data4-5-26,1,idastar-symmulgt-transmul,12.525,ok +cv_data4-5-26,1,idastar-symmullt-transmul,11.317,ok +cv_data4-5-27,1,astar-symmulgt-transmul,2.308,ok +cv_data4-5-27,1,astar-symmullt-transmul,2.416,ok +cv_data4-5-27,1,idastar-symmulgt-transmul,2.304,ok +cv_data4-5-27,1,idastar-symmullt-transmul,1.604,ok 
+cv_data4-5-28,1,astar-symmulgt-transmul,0.26,ok +cv_data4-5-28,1,astar-symmullt-transmul,0.272,ok +cv_data4-5-28,1,idastar-symmulgt-transmul,0.084,ok +cv_data4-5-28,1,idastar-symmullt-transmul,0.084,ok +cv_data4-5-29,1,astar-symmulgt-transmul,10.721,ok +cv_data4-5-29,1,astar-symmullt-transmul,11.181,ok +cv_data4-5-29,1,idastar-symmulgt-transmul,5.036,ok +cv_data4-5-29,1,idastar-symmullt-transmul,3.204,ok +cv_data4-5-3,1,astar-symmulgt-transmul,0.464,ok +cv_data4-5-3,1,astar-symmullt-transmul,0.8,ok +cv_data4-5-3,1,idastar-symmulgt-transmul,0.048,ok +cv_data4-5-3,1,idastar-symmullt-transmul,0.08,ok +cv_data4-5-30,1,astar-symmulgt-transmul,19.449,ok +cv_data4-5-30,1,astar-symmullt-transmul,14.937,ok +cv_data4-5-30,1,idastar-symmulgt-transmul,2.94,ok +cv_data4-5-30,1,idastar-symmullt-transmul,2.928,ok +cv_data4-5-31,1,astar-symmulgt-transmul,0.1,ok +cv_data4-5-31,1,astar-symmullt-transmul,0.096,ok +cv_data4-5-31,1,idastar-symmulgt-transmul,0.368,ok +cv_data4-5-31,1,idastar-symmullt-transmul,0.268,ok +cv_data4-5-32,1,astar-symmulgt-transmul,3.484,ok +cv_data4-5-32,1,astar-symmullt-transmul,2.972,ok +cv_data4-5-32,1,idastar-symmulgt-transmul,1.552,ok +cv_data4-5-32,1,idastar-symmullt-transmul,1.424,ok +cv_data4-5-34,1,astar-symmulgt-transmul,21.013,ok +cv_data4-5-34,1,astar-symmullt-transmul,17.137,ok +cv_data4-5-34,1,idastar-symmulgt-transmul,6.148,ok +cv_data4-5-34,1,idastar-symmullt-transmul,13.081,ok +cv_data4-5-35,1,astar-symmulgt-transmul,2.136,ok +cv_data4-5-35,1,astar-symmullt-transmul,1.692,ok +cv_data4-5-35,1,idastar-symmulgt-transmul,3.664,ok +cv_data4-5-35,1,idastar-symmullt-transmul,2.88,ok +cv_data4-5-39,1,astar-symmulgt-transmul,2.1,ok +cv_data4-5-39,1,astar-symmullt-transmul,1.928,ok +cv_data4-5-39,1,idastar-symmulgt-transmul,7.292,ok +cv_data4-5-39,1,idastar-symmullt-transmul,5.504,ok +cv_data4-5-5,1,astar-symmulgt-transmul,1.448,ok +cv_data4-5-5,1,astar-symmullt-transmul,0.66,ok +cv_data4-5-5,1,idastar-symmulgt-transmul,0.136,ok 
+cv_data4-5-5,1,idastar-symmullt-transmul,0.116,ok +cv_data4-5-9,1,astar-symmulgt-transmul,1.484,ok +cv_data4-5-9,1,astar-symmullt-transmul,1.46,ok +cv_data4-5-9,1,idastar-symmulgt-transmul,1.256,ok +cv_data4-5-9,1,idastar-symmullt-transmul,0.736,ok +cv_data4-6-1,1,astar-symmulgt-transmul,471.613,ok +cv_data4-6-1,1,astar-symmullt-transmul,413.73,ok +cv_data4-6-1,1,idastar-symmulgt-transmul,377.572,ok +cv_data4-6-1,1,idastar-symmullt-transmul,236.251,ok +cv_data4-6-10,1,astar-symmulgt-transmul,0.86,ok +cv_data4-6-10,1,astar-symmullt-transmul,0.712,ok +cv_data4-6-10,1,idastar-symmulgt-transmul,0.26,ok +cv_data4-6-10,1,idastar-symmullt-transmul,0.248,ok +cv_data4-6-11,1,astar-symmulgt-transmul,0.48,ok +cv_data4-6-11,1,astar-symmullt-transmul,0.388,ok +cv_data4-6-11,1,idastar-symmulgt-transmul,0.76,ok +cv_data4-6-11,1,idastar-symmullt-transmul,0.42,ok +cv_data4-6-12,1,astar-symmulgt-transmul,47.367,ok +cv_data4-6-12,1,astar-symmullt-transmul,44.011,ok +cv_data4-6-12,1,idastar-symmulgt-transmul,5.664,ok +cv_data4-6-12,1,idastar-symmullt-transmul,4.856,ok +cv_data4-6-13,1,astar-symmulgt-transmul,5.82,ok +cv_data4-6-13,1,astar-symmullt-transmul,5.708,ok +cv_data4-6-13,1,idastar-symmulgt-transmul,3.632,ok +cv_data4-6-13,1,idastar-symmullt-transmul,2.04,ok +cv_data4-6-14,1,astar-symmulgt-transmul,0.408,ok +cv_data4-6-14,1,astar-symmullt-transmul,0.336,ok +cv_data4-6-14,1,idastar-symmulgt-transmul,0.104,ok +cv_data4-6-14,1,idastar-symmullt-transmul,0.052,ok +cv_data4-6-15,1,astar-symmulgt-transmul,1.604,ok +cv_data4-6-15,1,astar-symmullt-transmul,1.292,ok +cv_data4-6-15,1,idastar-symmulgt-transmul,3.716,ok +cv_data4-6-15,1,idastar-symmullt-transmul,2.808,ok +cv_data4-6-16,1,astar-symmulgt-transmul,3.812,ok +cv_data4-6-16,1,astar-symmullt-transmul,4.108,ok +cv_data4-6-16,1,idastar-symmulgt-transmul,9.345,ok +cv_data4-6-16,1,idastar-symmullt-transmul,12.165,ok +cv_data4-6-18,1,astar-symmulgt-transmul,209.581,ok +cv_data4-6-18,1,astar-symmullt-transmul,200.317,ok 
+cv_data4-6-18,1,idastar-symmulgt-transmul,49.899,ok +cv_data4-6-18,1,idastar-symmullt-transmul,31.87,ok +cv_data4-6-19,1,astar-symmulgt-transmul,35.094,ok +cv_data4-6-19,1,astar-symmullt-transmul,41.263,ok +cv_data4-6-19,1,idastar-symmulgt-transmul,11.369,ok +cv_data4-6-19,1,idastar-symmullt-transmul,7.944,ok +cv_data4-6-2,1,astar-symmulgt-transmul,0.128,ok +cv_data4-6-2,1,astar-symmullt-transmul,0.12,ok +cv_data4-6-2,1,idastar-symmulgt-transmul,0.46,ok +cv_data4-6-2,1,idastar-symmullt-transmul,0.312,ok +cv_data4-6-20,1,astar-symmulgt-transmul,1.92,ok +cv_data4-6-20,1,astar-symmullt-transmul,1.972,ok +cv_data4-6-20,1,idastar-symmulgt-transmul,17.581,ok +cv_data4-6-20,1,idastar-symmullt-transmul,14.749,ok +cv_data4-6-21,1,astar-symmulgt-transmul,72.617,ok +cv_data4-6-21,1,astar-symmullt-transmul,74.905,ok +cv_data4-6-21,1,idastar-symmulgt-transmul,6.868,ok +cv_data4-6-21,1,idastar-symmullt-transmul,6.24,ok +cv_data4-6-24,1,astar-symmulgt-transmul,3600,memout +cv_data4-6-24,1,astar-symmullt-transmul,3600,memout +cv_data4-6-24,1,idastar-symmulgt-transmul,57.016,ok +cv_data4-6-24,1,idastar-symmullt-transmul,38.75,ok +cv_data4-6-25,1,astar-symmulgt-transmul,1.096,ok +cv_data4-6-25,1,astar-symmullt-transmul,1.096,ok +cv_data4-6-25,1,idastar-symmulgt-transmul,0.528,ok +cv_data4-6-25,1,idastar-symmullt-transmul,0.72,ok +cv_data4-6-27,1,astar-symmulgt-transmul,2.304,ok +cv_data4-6-27,1,astar-symmullt-transmul,2.252,ok +cv_data4-6-27,1,idastar-symmulgt-transmul,2.128,ok +cv_data4-6-27,1,idastar-symmullt-transmul,1.328,ok +cv_data4-6-28,1,astar-symmulgt-transmul,632.216,ok +cv_data4-6-28,1,astar-symmullt-transmul,831.544,ok +cv_data4-6-28,1,idastar-symmulgt-transmul,166.166,ok +cv_data4-6-28,1,idastar-symmullt-transmul,87.113,ok +cv_data4-6-29,1,astar-symmulgt-transmul,9.033,ok +cv_data4-6-29,1,astar-symmullt-transmul,8.221,ok +cv_data4-6-29,1,idastar-symmulgt-transmul,1.64,ok +cv_data4-6-29,1,idastar-symmullt-transmul,1.42,ok 
+cv_data4-6-3,1,astar-symmulgt-transmul,53.983,ok +cv_data4-6-3,1,astar-symmullt-transmul,71.597,ok +cv_data4-6-3,1,idastar-symmulgt-transmul,61.412,ok +cv_data4-6-3,1,idastar-symmullt-transmul,47.743,ok +cv_data4-6-30,1,astar-symmulgt-transmul,5.232,ok +cv_data4-6-30,1,astar-symmullt-transmul,4.676,ok +cv_data4-6-30,1,idastar-symmulgt-transmul,2.12,ok +cv_data4-6-30,1,idastar-symmullt-transmul,1.932,ok +cv_data4-6-32,1,astar-symmulgt-transmul,205.333,ok +cv_data4-6-32,1,astar-symmullt-transmul,228.702,ok +cv_data4-6-32,1,idastar-symmulgt-transmul,9.957,ok +cv_data4-6-32,1,idastar-symmullt-transmul,10.213,ok +cv_data4-6-33,1,astar-symmulgt-transmul,7.04,ok +cv_data4-6-33,1,astar-symmullt-transmul,6.008,ok +cv_data4-6-33,1,idastar-symmulgt-transmul,2.58,ok +cv_data4-6-33,1,idastar-symmullt-transmul,31.722,ok +cv_data4-6-34,1,astar-symmulgt-transmul,65.304,ok +cv_data4-6-34,1,astar-symmullt-transmul,60.68,ok +cv_data4-6-34,1,idastar-symmulgt-transmul,26.746,ok +cv_data4-6-34,1,idastar-symmullt-transmul,20.633,ok +cv_data4-6-35,1,astar-symmulgt-transmul,20.281,ok +cv_data4-6-35,1,astar-symmullt-transmul,22.957,ok +cv_data4-6-35,1,idastar-symmulgt-transmul,2.484,ok +cv_data4-6-35,1,idastar-symmullt-transmul,2.684,ok +cv_data4-6-36,1,astar-symmulgt-transmul,635.312,ok +cv_data4-6-36,1,astar-symmullt-transmul,813.615,ok +cv_data4-6-36,1,idastar-symmulgt-transmul,18.949,ok +cv_data4-6-36,1,idastar-symmullt-transmul,12.849,ok +cv_data4-6-37,1,astar-symmulgt-transmul,40.55,ok +cv_data4-6-37,1,astar-symmullt-transmul,47.559,ok +cv_data4-6-37,1,idastar-symmulgt-transmul,30.566,ok +cv_data4-6-37,1,idastar-symmullt-transmul,15.421,ok +cv_data4-6-39,1,astar-symmulgt-transmul,1.176,ok +cv_data4-6-39,1,astar-symmullt-transmul,1.224,ok +cv_data4-6-39,1,idastar-symmulgt-transmul,0.612,ok +cv_data4-6-39,1,idastar-symmullt-transmul,0.372,ok +cv_data4-6-4,1,astar-symmulgt-transmul,174.419,ok +cv_data4-6-4,1,astar-symmullt-transmul,146.437,ok 
+cv_data4-6-4,1,idastar-symmulgt-transmul,2.516,ok +cv_data4-6-4,1,idastar-symmullt-transmul,1.432,ok +cv_data4-6-40,1,astar-symmulgt-transmul,30.45,ok +cv_data4-6-40,1,astar-symmullt-transmul,27.162,ok +cv_data4-6-40,1,idastar-symmulgt-transmul,10.117,ok +cv_data4-6-40,1,idastar-symmullt-transmul,11.849,ok +cv_data4-6-5,1,astar-symmulgt-transmul,2.14,ok +cv_data4-6-5,1,astar-symmullt-transmul,1.788,ok +cv_data4-6-5,1,idastar-symmulgt-transmul,1.936,ok +cv_data4-6-5,1,idastar-symmullt-transmul,0.788,ok +cv_data4-6-7,1,astar-symmulgt-transmul,3600,memout +cv_data4-6-7,1,astar-symmullt-transmul,3600,memout +cv_data4-6-7,1,idastar-symmulgt-transmul,84.385,ok +cv_data4-6-7,1,idastar-symmullt-transmul,80.585,ok +cv_data4-6-8,1,astar-symmulgt-transmul,0.104,ok +cv_data4-6-8,1,astar-symmullt-transmul,0.092,ok +cv_data4-6-8,1,idastar-symmulgt-transmul,0.22,ok +cv_data4-6-8,1,idastar-symmullt-transmul,0.088,ok +cv_data4-6-9,1,astar-symmulgt-transmul,3.552,ok +cv_data4-6-9,1,astar-symmullt-transmul,3.28,ok +cv_data4-6-9,1,idastar-symmulgt-transmul,1.004,ok +cv_data4-6-9,1,idastar-symmullt-transmul,0.76,ok +cv_data4-7-1,1,astar-symmulgt-transmul,86.009,ok +cv_data4-7-1,1,astar-symmullt-transmul,56.924,ok +cv_data4-7-1,1,idastar-symmulgt-transmul,2.644,ok +cv_data4-7-1,1,idastar-symmullt-transmul,2.336,ok +cv_data4-7-10,1,astar-symmulgt-transmul,3600,memout +cv_data4-7-10,1,astar-symmullt-transmul,3600,memout +cv_data4-7-10,1,idastar-symmulgt-transmul,96.27,ok +cv_data4-7-10,1,idastar-symmullt-transmul,96.41,ok +cv_data4-7-11,1,astar-symmulgt-transmul,1.432,ok +cv_data4-7-11,1,astar-symmullt-transmul,0.76,ok +cv_data4-7-11,1,idastar-symmulgt-transmul,0.3,ok +cv_data4-7-11,1,idastar-symmullt-transmul,0.112,ok +cv_data4-7-12,1,astar-symmulgt-transmul,3600,memout +cv_data4-7-12,1,astar-symmullt-transmul,3600,memout +cv_data4-7-12,1,idastar-symmulgt-transmul,247.307,ok +cv_data4-7-12,1,idastar-symmullt-transmul,275.873,ok +cv_data4-7-13,1,astar-symmulgt-transmul,395.429,ok 
+cv_data4-7-13,1,astar-symmullt-transmul,332.793,ok +cv_data4-7-13,1,idastar-symmulgt-transmul,0.62,ok +cv_data4-7-13,1,idastar-symmullt-transmul,0.396,ok +cv_data4-7-14,1,astar-symmulgt-transmul,11.865,ok +cv_data4-7-14,1,astar-symmullt-transmul,11.733,ok +cv_data4-7-14,1,idastar-symmulgt-transmul,11.353,ok +cv_data4-7-14,1,idastar-symmullt-transmul,4.988,ok +cv_data4-7-15,1,astar-symmulgt-transmul,893.188,ok +cv_data4-7-15,1,astar-symmullt-transmul,944.267,ok +cv_data4-7-15,1,idastar-symmulgt-transmul,126.544,ok +cv_data4-7-15,1,idastar-symmullt-transmul,58.7,ok +cv_data4-7-16,1,astar-symmulgt-transmul,0.2,ok +cv_data4-7-16,1,astar-symmullt-transmul,0.288,ok +cv_data4-7-16,1,idastar-symmulgt-transmul,18.001,ok +cv_data4-7-16,1,idastar-symmullt-transmul,9.033,ok +cv_data4-7-17,1,astar-symmulgt-transmul,0.264,ok +cv_data4-7-17,1,astar-symmullt-transmul,0.34,ok +cv_data4-7-17,1,idastar-symmulgt-transmul,1.464,ok +cv_data4-7-17,1,idastar-symmullt-transmul,1.164,ok +cv_data4-7-19,1,astar-symmulgt-transmul,30.118,ok +cv_data4-7-19,1,astar-symmullt-transmul,30.686,ok +cv_data4-7-19,1,idastar-symmulgt-transmul,12.541,ok +cv_data4-7-19,1,idastar-symmullt-transmul,5.98,ok +cv_data4-7-2,1,astar-symmulgt-transmul,8.741,ok +cv_data4-7-2,1,astar-symmullt-transmul,9.149,ok +cv_data4-7-2,1,idastar-symmulgt-transmul,49.971,ok +cv_data4-7-2,1,idastar-symmullt-transmul,26.998,ok +cv_data4-7-20,1,astar-symmulgt-transmul,2.372,ok +cv_data4-7-20,1,astar-symmullt-transmul,3.352,ok +cv_data4-7-20,1,idastar-symmulgt-transmul,0.192,ok +cv_data4-7-20,1,idastar-symmullt-transmul,1.008,ok +cv_data4-7-21,1,astar-symmulgt-transmul,11.193,ok +cv_data4-7-21,1,astar-symmullt-transmul,10.981,ok +cv_data4-7-21,1,idastar-symmulgt-transmul,17.837,ok +cv_data4-7-21,1,idastar-symmullt-transmul,17.009,ok +cv_data4-7-22,1,astar-symmulgt-transmul,3600,memout +cv_data4-7-22,1,astar-symmullt-transmul,3600,memout +cv_data4-7-22,1,idastar-symmulgt-transmul,432.431,ok 
+cv_data4-7-22,1,idastar-symmullt-transmul,179.491,ok +cv_data4-7-23,1,astar-symmulgt-transmul,1.232,ok +cv_data4-7-23,1,astar-symmullt-transmul,1.16,ok +cv_data4-7-23,1,idastar-symmulgt-transmul,5.684,ok +cv_data4-7-23,1,idastar-symmullt-transmul,3.168,ok +cv_data4-7-24,1,astar-symmulgt-transmul,0.488,ok +cv_data4-7-24,1,astar-symmullt-transmul,0.332,ok +cv_data4-7-24,1,idastar-symmulgt-transmul,0.26,ok +cv_data4-7-24,1,idastar-symmullt-transmul,0.124,ok +cv_data4-7-25,1,astar-symmulgt-transmul,3600,memout +cv_data4-7-25,1,astar-symmullt-transmul,3600,memout +cv_data4-7-25,1,idastar-symmulgt-transmul,108.619,ok +cv_data4-7-25,1,idastar-symmullt-transmul,55.319,ok +cv_data4-7-26,1,astar-symmulgt-transmul,5.748,ok +cv_data4-7-26,1,astar-symmullt-transmul,5.26,ok +cv_data4-7-26,1,idastar-symmulgt-transmul,40.315,ok +cv_data4-7-26,1,idastar-symmullt-transmul,32.738,ok +cv_data4-7-27,1,astar-symmulgt-transmul,3600,memout +cv_data4-7-27,1,astar-symmullt-transmul,3600,memout +cv_data4-7-27,1,idastar-symmulgt-transmul,134.6,ok +cv_data4-7-27,1,idastar-symmullt-transmul,106.611,ok +cv_data4-7-28,1,astar-symmulgt-transmul,0.936,ok +cv_data4-7-28,1,astar-symmullt-transmul,0.688,ok +cv_data4-7-28,1,idastar-symmulgt-transmul,0.344,ok +cv_data4-7-28,1,idastar-symmullt-transmul,0.364,ok +cv_data4-7-29,1,astar-symmulgt-transmul,14.301,ok +cv_data4-7-29,1,astar-symmullt-transmul,12.857,ok +cv_data4-7-29,1,idastar-symmulgt-transmul,4.224,ok +cv_data4-7-29,1,idastar-symmullt-transmul,3.12,ok +cv_data4-7-3,1,astar-symmulgt-transmul,1.02,ok +cv_data4-7-3,1,astar-symmullt-transmul,1.22,ok +cv_data4-7-3,1,idastar-symmulgt-transmul,0.888,ok +cv_data4-7-3,1,idastar-symmullt-transmul,0.628,ok +cv_data4-7-30,1,astar-symmulgt-transmul,4.6,ok +cv_data4-7-30,1,astar-symmullt-transmul,1.408,ok +cv_data4-7-30,1,idastar-symmulgt-transmul,0.432,ok +cv_data4-7-30,1,idastar-symmullt-transmul,0.352,ok +cv_data4-7-31,1,astar-symmulgt-transmul,3600,memout 
+cv_data4-7-31,1,astar-symmullt-transmul,3600,memout +cv_data4-7-31,1,idastar-symmulgt-transmul,25.406,ok +cv_data4-7-31,1,idastar-symmullt-transmul,15.453,ok +cv_data4-7-32,1,astar-symmulgt-transmul,3600,memout +cv_data4-7-32,1,astar-symmullt-transmul,3600,memout +cv_data4-7-32,1,idastar-symmulgt-transmul,83.561,ok +cv_data4-7-32,1,idastar-symmullt-transmul,57.224,ok +cv_data4-7-33,1,astar-symmulgt-transmul,934.406,ok +cv_data4-7-33,1,astar-symmullt-transmul,1203.52,ok +cv_data4-7-33,1,idastar-symmulgt-transmul,32.114,ok +cv_data4-7-33,1,idastar-symmullt-transmul,33.666,ok +cv_data4-7-34,1,astar-symmulgt-transmul,0.096,ok +cv_data4-7-34,1,astar-symmullt-transmul,0.092,ok +cv_data4-7-34,1,idastar-symmulgt-transmul,0.296,ok +cv_data4-7-34,1,idastar-symmullt-transmul,0.128,ok +cv_data4-7-35,1,astar-symmulgt-transmul,26.122,ok +cv_data4-7-35,1,astar-symmullt-transmul,28.178,ok +cv_data4-7-35,1,idastar-symmulgt-transmul,7.176,ok +cv_data4-7-35,1,idastar-symmullt-transmul,6.564,ok +cv_data4-7-36,1,astar-symmulgt-transmul,0.192,ok +cv_data4-7-36,1,astar-symmullt-transmul,0.684,ok +cv_data4-7-36,1,idastar-symmulgt-transmul,0.936,ok +cv_data4-7-36,1,idastar-symmullt-transmul,1.308,ok +cv_data4-7-37,1,astar-symmulgt-transmul,201.793,ok +cv_data4-7-37,1,astar-symmullt-transmul,259.488,ok +cv_data4-7-37,1,idastar-symmulgt-transmul,21.665,ok +cv_data4-7-37,1,idastar-symmullt-transmul,35.174,ok +cv_data4-7-38,1,astar-symmulgt-transmul,2.088,ok +cv_data4-7-38,1,astar-symmullt-transmul,2.16,ok +cv_data4-7-38,1,idastar-symmulgt-transmul,0.548,ok +cv_data4-7-38,1,idastar-symmullt-transmul,0.488,ok +cv_data4-7-39,1,astar-symmulgt-transmul,604.022,ok +cv_data4-7-39,1,astar-symmullt-transmul,702.224,ok +cv_data4-7-39,1,idastar-symmulgt-transmul,129.468,ok +cv_data4-7-39,1,idastar-symmullt-transmul,50.035,ok +cv_data4-7-4,1,astar-symmulgt-transmul,3.68,ok +cv_data4-7-4,1,astar-symmullt-transmul,14.881,ok +cv_data4-7-4,1,idastar-symmulgt-transmul,2.924,ok 
+cv_data4-7-4,1,idastar-symmullt-transmul,1.304,ok +cv_data4-7-40,1,astar-symmulgt-transmul,0.54,ok +cv_data4-7-40,1,astar-symmullt-transmul,0.484,ok +cv_data4-7-40,1,idastar-symmulgt-transmul,0.256,ok +cv_data4-7-40,1,idastar-symmullt-transmul,0.124,ok +cv_data4-7-5,1,astar-symmulgt-transmul,46.455,ok +cv_data4-7-5,1,astar-symmullt-transmul,44.535,ok +cv_data4-7-5,1,idastar-symmulgt-transmul,76.485,ok +cv_data4-7-5,1,idastar-symmullt-transmul,29.214,ok +cv_data4-7-6,1,astar-symmulgt-transmul,2.244,ok +cv_data4-7-6,1,astar-symmullt-transmul,4.272,ok +cv_data4-7-6,1,idastar-symmulgt-transmul,11.713,ok +cv_data4-7-6,1,idastar-symmullt-transmul,8.173,ok +cv_data4-7-7,1,astar-symmulgt-transmul,864.594,ok +cv_data4-7-7,1,astar-symmullt-transmul,667.686,ok +cv_data4-7-7,1,idastar-symmulgt-transmul,121.948,ok +cv_data4-7-7,1,idastar-symmullt-transmul,62.636,ok +cv_data4-7-8,1,astar-symmulgt-transmul,1.912,ok +cv_data4-7-8,1,astar-symmullt-transmul,2.408,ok +cv_data4-7-8,1,idastar-symmulgt-transmul,0.78,ok +cv_data4-7-8,1,idastar-symmullt-transmul,0.596,ok +cv_data4-7-9,1,astar-symmulgt-transmul,0.456,ok +cv_data4-7-9,1,astar-symmullt-transmul,0.408,ok +cv_data4-7-9,1,idastar-symmulgt-transmul,0.94,ok +cv_data4-7-9,1,idastar-symmullt-transmul,0.552,ok +cv_data5-10-19,1,astar-symmulgt-transmul,3600,memout +cv_data5-10-19,1,astar-symmullt-transmul,3600,memout +cv_data5-10-19,1,idastar-symmulgt-transmul,1236.317,ok +cv_data5-10-19,1,idastar-symmullt-transmul,1059.75,ok +cv_data5-10-23,1,astar-symmulgt-transmul,3600,memout +cv_data5-10-23,1,astar-symmullt-transmul,3600,memout +cv_data5-10-23,1,idastar-symmulgt-transmul,2579.769,ok +cv_data5-10-23,1,idastar-symmullt-transmul,556.423,ok +cv_data5-10-32,1,astar-symmulgt-transmul,3600,memout +cv_data5-10-32,1,astar-symmullt-transmul,3600,memout +cv_data5-10-32,1,idastar-symmulgt-transmul,2967.681,ok +cv_data5-10-32,1,idastar-symmullt-transmul,3600,timeout +cv_data5-10-33,1,astar-symmulgt-transmul,3600,memout 
+cv_data5-10-33,1,astar-symmullt-transmul,3600,memout +cv_data5-10-33,1,idastar-symmulgt-transmul,3600,timeout +cv_data5-10-33,1,idastar-symmullt-transmul,3050.235,ok +cv_data5-4-1,1,astar-symmulgt-transmul,564.919,ok +cv_data5-4-1,1,astar-symmullt-transmul,590.937,ok +cv_data5-4-1,1,idastar-symmulgt-transmul,68.6,ok +cv_data5-4-1,1,idastar-symmullt-transmul,64.056,ok +cv_data5-4-10,1,astar-symmulgt-transmul,0.564,ok +cv_data5-4-10,1,astar-symmullt-transmul,0.588,ok +cv_data5-4-10,1,idastar-symmulgt-transmul,0.308,ok +cv_data5-4-10,1,idastar-symmullt-transmul,0.296,ok +cv_data5-4-12,1,astar-symmulgt-transmul,46.627,ok +cv_data5-4-12,1,astar-symmullt-transmul,42.235,ok +cv_data5-4-12,1,idastar-symmulgt-transmul,61.76,ok +cv_data5-4-12,1,idastar-symmullt-transmul,57.784,ok +cv_data5-4-13,1,astar-symmulgt-transmul,3600,memout +cv_data5-4-13,1,astar-symmullt-transmul,3600,memout +cv_data5-4-13,1,idastar-symmulgt-transmul,320.532,ok +cv_data5-4-13,1,idastar-symmullt-transmul,242.255,ok +cv_data5-4-14,1,astar-symmulgt-transmul,3600,memout +cv_data5-4-14,1,astar-symmullt-transmul,3600,memout +cv_data5-4-14,1,idastar-symmulgt-transmul,1264.015,ok +cv_data5-4-14,1,idastar-symmullt-transmul,1259.867,ok +cv_data5-4-15,1,astar-symmulgt-transmul,3600,memout +cv_data5-4-15,1,astar-symmullt-transmul,3600,memout +cv_data5-4-15,1,idastar-symmulgt-transmul,60.304,ok +cv_data5-4-15,1,idastar-symmullt-transmul,60.66,ok +cv_data5-4-16,1,astar-symmulgt-transmul,338.085,ok +cv_data5-4-16,1,astar-symmullt-transmul,289.334,ok +cv_data5-4-16,1,idastar-symmulgt-transmul,28.646,ok +cv_data5-4-16,1,idastar-symmullt-transmul,35.018,ok +cv_data5-4-17,1,astar-symmulgt-transmul,3600,memout +cv_data5-4-17,1,astar-symmullt-transmul,3600,memout +cv_data5-4-17,1,idastar-symmulgt-transmul,1124.37,ok +cv_data5-4-17,1,idastar-symmullt-transmul,1010.499,ok +cv_data5-4-18,1,astar-symmulgt-transmul,0.388,ok +cv_data5-4-18,1,astar-symmullt-transmul,0.34,ok +cv_data5-4-18,1,idastar-symmulgt-transmul,1.372,ok 
+cv_data5-4-18,1,idastar-symmullt-transmul,1.876,ok +cv_data5-4-20,1,astar-symmulgt-transmul,417.706,ok +cv_data5-4-20,1,astar-symmullt-transmul,541.146,ok +cv_data5-4-20,1,idastar-symmulgt-transmul,7.048,ok +cv_data5-4-20,1,idastar-symmullt-transmul,21.585,ok +cv_data5-4-21,1,astar-symmulgt-transmul,3600,memout +cv_data5-4-21,1,astar-symmullt-transmul,3600,memout +cv_data5-4-21,1,idastar-symmulgt-transmul,2854.162,ok +cv_data5-4-21,1,idastar-symmullt-transmul,2822.916,ok +cv_data5-4-22,1,astar-symmulgt-transmul,314.556,ok +cv_data5-4-22,1,astar-symmullt-transmul,317.788,ok +cv_data5-4-22,1,idastar-symmulgt-transmul,43.935,ok +cv_data5-4-22,1,idastar-symmullt-transmul,38.822,ok +cv_data5-4-23,1,astar-symmulgt-transmul,0.5,ok +cv_data5-4-23,1,astar-symmullt-transmul,0.472,ok +cv_data5-4-23,1,idastar-symmulgt-transmul,0.2,ok +cv_data5-4-23,1,idastar-symmullt-transmul,0.196,ok +cv_data5-4-24,1,astar-symmulgt-transmul,3600,memout +cv_data5-4-24,1,astar-symmullt-transmul,3600,memout +cv_data5-4-24,1,idastar-symmulgt-transmul,282.866,ok +cv_data5-4-24,1,idastar-symmullt-transmul,265.525,ok +cv_data5-4-25,1,astar-symmulgt-transmul,3600,memout +cv_data5-4-25,1,astar-symmullt-transmul,3600,memout +cv_data5-4-25,1,idastar-symmulgt-transmul,233.355,ok +cv_data5-4-25,1,idastar-symmullt-transmul,201.601,ok +cv_data5-4-27,1,astar-symmulgt-transmul,73.965,ok +cv_data5-4-27,1,astar-symmullt-transmul,85.521,ok +cv_data5-4-27,1,idastar-symmulgt-transmul,6.58,ok +cv_data5-4-27,1,idastar-symmullt-transmul,5.9,ok +cv_data5-4-29,1,astar-symmulgt-transmul,3600,memout +cv_data5-4-29,1,astar-symmullt-transmul,3600,memout +cv_data5-4-29,1,idastar-symmulgt-transmul,123.768,ok +cv_data5-4-29,1,idastar-symmullt-transmul,150.445,ok +cv_data5-4-3,1,astar-symmulgt-transmul,3600,memout +cv_data5-4-3,1,astar-symmullt-transmul,3600,memout +cv_data5-4-3,1,idastar-symmulgt-transmul,202.321,ok +cv_data5-4-3,1,idastar-symmullt-transmul,187.94,ok +cv_data5-4-30,1,astar-symmulgt-transmul,2.472,ok 
+cv_data5-4-30,1,astar-symmullt-transmul,2.82,ok +cv_data5-4-30,1,idastar-symmulgt-transmul,2.3,ok +cv_data5-4-30,1,idastar-symmullt-transmul,2.04,ok +cv_data5-4-31,1,astar-symmulgt-transmul,2086.65,ok +cv_data5-4-31,1,astar-symmullt-transmul,1950.48,ok +cv_data5-4-31,1,idastar-symmulgt-transmul,298.295,ok +cv_data5-4-31,1,idastar-symmullt-transmul,276.989,ok +cv_data5-4-32,1,astar-symmulgt-transmul,21.645,ok +cv_data5-4-32,1,astar-symmullt-transmul,24.758,ok +cv_data5-4-32,1,idastar-symmulgt-transmul,16.553,ok +cv_data5-4-32,1,idastar-symmullt-transmul,15.669,ok +cv_data5-4-33,1,astar-symmulgt-transmul,41.851,ok +cv_data5-4-33,1,astar-symmullt-transmul,47.363,ok +cv_data5-4-33,1,idastar-symmulgt-transmul,13.501,ok +cv_data5-4-33,1,idastar-symmullt-transmul,29.734,ok +cv_data5-4-34,1,astar-symmulgt-transmul,3600,memout +cv_data5-4-34,1,astar-symmullt-transmul,3600,memout +cv_data5-4-34,1,idastar-symmulgt-transmul,969.093,ok +cv_data5-4-34,1,idastar-symmullt-transmul,976.633,ok +cv_data5-4-35,1,astar-symmulgt-transmul,618.047,ok +cv_data5-4-35,1,astar-symmullt-transmul,528.217,ok +cv_data5-4-35,1,idastar-symmulgt-transmul,68.704,ok +cv_data5-4-35,1,idastar-symmullt-transmul,128.564,ok +cv_data5-4-36,1,astar-symmulgt-transmul,783.105,ok +cv_data5-4-36,1,astar-symmullt-transmul,1015.33,ok +cv_data5-4-36,1,idastar-symmulgt-transmul,175.423,ok +cv_data5-4-36,1,idastar-symmullt-transmul,162.726,ok +cv_data5-4-37,1,astar-symmulgt-transmul,1.056,ok +cv_data5-4-37,1,astar-symmullt-transmul,1.048,ok +cv_data5-4-37,1,idastar-symmulgt-transmul,1.732,ok +cv_data5-4-37,1,idastar-symmullt-transmul,1.504,ok +cv_data5-4-38,1,astar-symmulgt-transmul,772.792,ok +cv_data5-4-38,1,astar-symmullt-transmul,676.198,ok +cv_data5-4-38,1,idastar-symmulgt-transmul,28.33,ok +cv_data5-4-38,1,idastar-symmullt-transmul,28.162,ok +cv_data5-4-4,1,astar-symmulgt-transmul,3600,memout +cv_data5-4-4,1,astar-symmullt-transmul,3600,memout +cv_data5-4-4,1,idastar-symmulgt-transmul,322.9,ok 
+cv_data5-4-4,1,idastar-symmullt-transmul,297.275,ok +cv_data5-4-6,1,astar-symmulgt-transmul,6.54,ok +cv_data5-4-6,1,astar-symmullt-transmul,6.448,ok +cv_data5-4-6,1,idastar-symmulgt-transmul,2.588,ok +cv_data5-4-6,1,idastar-symmullt-transmul,2.432,ok +cv_data5-4-7,1,astar-symmulgt-transmul,5.904,ok +cv_data5-4-7,1,astar-symmullt-transmul,5.772,ok +cv_data5-4-7,1,idastar-symmulgt-transmul,32.91,ok +cv_data5-4-7,1,idastar-symmullt-transmul,28.43,ok +cv_data5-4-9,1,astar-symmulgt-transmul,234.323,ok +cv_data5-4-9,1,astar-symmullt-transmul,258.888,ok +cv_data5-4-9,1,idastar-symmulgt-transmul,42.911,ok +cv_data5-4-9,1,idastar-symmullt-transmul,43.839,ok +cv_data5-5-10,1,astar-symmulgt-transmul,3600,memout +cv_data5-5-10,1,astar-symmullt-transmul,3600,memout +cv_data5-5-10,1,idastar-symmulgt-transmul,92.846,ok +cv_data5-5-10,1,idastar-symmullt-transmul,62.912,ok +cv_data5-5-11,1,astar-symmulgt-transmul,3600,memout +cv_data5-5-11,1,astar-symmullt-transmul,3600,memout +cv_data5-5-11,1,idastar-symmulgt-transmul,3600,timeout +cv_data5-5-11,1,idastar-symmullt-transmul,2891.773,ok +cv_data5-5-12,1,astar-symmulgt-transmul,3600,memout +cv_data5-5-12,1,astar-symmullt-transmul,3600,memout +cv_data5-5-12,1,idastar-symmulgt-transmul,224.286,ok +cv_data5-5-12,1,idastar-symmullt-transmul,213.965,ok +cv_data5-5-13,1,astar-symmulgt-transmul,3600,memout +cv_data5-5-13,1,astar-symmullt-transmul,3600,memout +cv_data5-5-13,1,idastar-symmulgt-transmul,1303.693,ok +cv_data5-5-13,1,idastar-symmullt-transmul,913.593,ok +cv_data5-5-14,1,astar-symmulgt-transmul,3600,memout +cv_data5-5-14,1,astar-symmullt-transmul,3600,memout +cv_data5-5-14,1,idastar-symmulgt-transmul,364.819,ok +cv_data5-5-14,1,idastar-symmullt-transmul,290.018,ok +cv_data5-5-15,1,astar-symmulgt-transmul,3600,memout +cv_data5-5-15,1,astar-symmullt-transmul,3600,memout +cv_data5-5-15,1,idastar-symmulgt-transmul,1089.456,ok +cv_data5-5-15,1,idastar-symmullt-transmul,802.942,ok +cv_data5-5-16,1,astar-symmulgt-transmul,3600,memout 
+cv_data5-5-16,1,astar-symmullt-transmul,3600,memout +cv_data5-5-16,1,idastar-symmulgt-transmul,371.783,ok +cv_data5-5-16,1,idastar-symmullt-transmul,250.328,ok +cv_data5-5-17,1,astar-symmulgt-transmul,3600,memout +cv_data5-5-17,1,astar-symmullt-transmul,3600,memout +cv_data5-5-17,1,idastar-symmulgt-transmul,33.714,ok +cv_data5-5-17,1,idastar-symmullt-transmul,28.894,ok +cv_data5-5-18,1,astar-symmulgt-transmul,358.538,ok +cv_data5-5-18,1,astar-symmullt-transmul,453.3,ok +cv_data5-5-18,1,idastar-symmulgt-transmul,147.785,ok +cv_data5-5-18,1,idastar-symmullt-transmul,299.363,ok +cv_data5-5-19,1,astar-symmulgt-transmul,1861.84,ok +cv_data5-5-19,1,astar-symmullt-transmul,1449.08,ok +cv_data5-5-19,1,idastar-symmulgt-transmul,11.777,ok +cv_data5-5-19,1,idastar-symmullt-transmul,7.856,ok +cv_data5-5-20,1,astar-symmulgt-transmul,735.558,ok +cv_data5-5-20,1,astar-symmullt-transmul,642.92,ok +cv_data5-5-20,1,idastar-symmulgt-transmul,116.039,ok +cv_data5-5-20,1,idastar-symmullt-transmul,84.897,ok +cv_data5-5-21,1,astar-symmulgt-transmul,3600,memout +cv_data5-5-21,1,astar-symmullt-transmul,3600,memout +cv_data5-5-21,1,idastar-symmulgt-transmul,244.363,ok +cv_data5-5-21,1,idastar-symmullt-transmul,205.309,ok +cv_data5-5-22,1,astar-symmulgt-transmul,3600,memout +cv_data5-5-22,1,astar-symmullt-transmul,3600,memout +cv_data5-5-22,1,idastar-symmulgt-transmul,95.806,ok +cv_data5-5-22,1,idastar-symmullt-transmul,79.885,ok +cv_data5-5-23,1,astar-symmulgt-transmul,70.788,ok +cv_data5-5-23,1,astar-symmullt-transmul,69.408,ok +cv_data5-5-23,1,idastar-symmulgt-transmul,7.152,ok +cv_data5-5-23,1,idastar-symmullt-transmul,6.016,ok +cv_data5-5-24,1,astar-symmulgt-transmul,0.632,ok +cv_data5-5-24,1,astar-symmullt-transmul,0.468,ok +cv_data5-5-24,1,idastar-symmulgt-transmul,0.792,ok +cv_data5-5-24,1,idastar-symmullt-transmul,0.788,ok +cv_data5-5-25,1,astar-symmulgt-transmul,3600,memout +cv_data5-5-25,1,astar-symmullt-transmul,3600,memout +cv_data5-5-25,1,idastar-symmulgt-transmul,506.316,ok 
+cv_data5-5-25,1,idastar-symmullt-transmul,573.432,ok +cv_data5-5-27,1,astar-symmulgt-transmul,3600,memout +cv_data5-5-27,1,astar-symmullt-transmul,3600,memout +cv_data5-5-27,1,idastar-symmulgt-transmul,2993.091,ok +cv_data5-5-27,1,idastar-symmullt-transmul,2180.444,ok +cv_data5-5-28,1,astar-symmulgt-transmul,3600,memout +cv_data5-5-28,1,astar-symmullt-transmul,3600,memout +cv_data5-5-28,1,idastar-symmulgt-transmul,2240.964,ok +cv_data5-5-28,1,idastar-symmullt-transmul,3600,timeout +cv_data5-5-29,1,astar-symmulgt-transmul,3600,memout +cv_data5-5-29,1,astar-symmullt-transmul,3600,memout +cv_data5-5-29,1,idastar-symmulgt-transmul,1960.979,ok +cv_data5-5-29,1,idastar-symmullt-transmul,1565.862,ok +cv_data5-5-3,1,astar-symmulgt-transmul,33.914,ok +cv_data5-5-3,1,astar-symmullt-transmul,22.693,ok +cv_data5-5-3,1,idastar-symmulgt-transmul,1.492,ok +cv_data5-5-3,1,idastar-symmullt-transmul,1.396,ok +cv_data5-5-30,1,astar-symmulgt-transmul,3600,memout +cv_data5-5-30,1,astar-symmullt-transmul,3600,memout +cv_data5-5-30,1,idastar-symmulgt-transmul,1173.737,ok +cv_data5-5-30,1,idastar-symmullt-transmul,1168.273,ok +cv_data5-5-31,1,astar-symmulgt-transmul,3600,memout +cv_data5-5-31,1,astar-symmullt-transmul,3600,memout +cv_data5-5-31,1,idastar-symmulgt-transmul,967.488,ok +cv_data5-5-31,1,idastar-symmullt-transmul,761.496,ok +cv_data5-5-32,1,astar-symmulgt-transmul,3600,memout +cv_data5-5-32,1,astar-symmullt-transmul,3600,memout +cv_data5-5-32,1,idastar-symmulgt-transmul,744.599,ok +cv_data5-5-32,1,idastar-symmullt-transmul,667.818,ok +cv_data5-5-33,1,astar-symmulgt-transmul,3600,memout +cv_data5-5-33,1,astar-symmullt-transmul,3600,memout +cv_data5-5-33,1,idastar-symmulgt-transmul,442.524,ok +cv_data5-5-33,1,idastar-symmullt-transmul,404.633,ok +cv_data5-5-36,1,astar-symmulgt-transmul,3600,memout +cv_data5-5-36,1,astar-symmullt-transmul,3600,memout +cv_data5-5-36,1,idastar-symmulgt-transmul,922.322,ok +cv_data5-5-36,1,idastar-symmullt-transmul,638.1,ok 
+cv_data5-5-37,1,astar-symmulgt-transmul,3600,memout +cv_data5-5-37,1,astar-symmullt-transmul,3600,memout +cv_data5-5-37,1,idastar-symmulgt-transmul,176.123,ok +cv_data5-5-37,1,idastar-symmullt-transmul,145.557,ok +cv_data5-5-38,1,astar-symmulgt-transmul,380.444,ok +cv_data5-5-38,1,astar-symmullt-transmul,349.482,ok +cv_data5-5-38,1,idastar-symmulgt-transmul,44.291,ok +cv_data5-5-38,1,idastar-symmullt-transmul,29.462,ok +cv_data5-5-39,1,astar-symmulgt-transmul,2184.39,ok +cv_data5-5-39,1,astar-symmullt-transmul,1915.32,ok +cv_data5-5-39,1,idastar-symmulgt-transmul,275.541,ok +cv_data5-5-39,1,idastar-symmullt-transmul,189.14,ok +cv_data5-5-4,1,astar-symmulgt-transmul,2020.47,ok +cv_data5-5-4,1,astar-symmullt-transmul,1755.28,ok +cv_data5-5-4,1,idastar-symmulgt-transmul,86.217,ok +cv_data5-5-4,1,idastar-symmullt-transmul,63.828,ok +cv_data5-5-40,1,astar-symmulgt-transmul,1629.71,ok +cv_data5-5-40,1,astar-symmullt-transmul,1423.57,ok +cv_data5-5-40,1,idastar-symmulgt-transmul,33.094,ok +cv_data5-5-40,1,idastar-symmullt-transmul,34.242,ok +cv_data5-5-5,1,astar-symmulgt-transmul,3600,memout +cv_data5-5-5,1,astar-symmullt-transmul,3600,memout +cv_data5-5-5,1,idastar-symmulgt-transmul,503.075,ok +cv_data5-5-5,1,idastar-symmullt-transmul,332.769,ok +cv_data5-5-6,1,astar-symmulgt-transmul,3600,memout +cv_data5-5-6,1,astar-symmullt-transmul,3600,memout +cv_data5-5-6,1,idastar-symmulgt-transmul,431.859,ok +cv_data5-5-6,1,idastar-symmullt-transmul,340.801,ok +cv_data5-5-7,1,astar-symmulgt-transmul,1469.89,ok +cv_data5-5-7,1,astar-symmullt-transmul,1034.31,ok +cv_data5-5-7,1,idastar-symmulgt-transmul,83.229,ok +cv_data5-5-7,1,idastar-symmullt-transmul,61.068,ok +cv_data5-5-8,1,astar-symmulgt-transmul,3600,memout +cv_data5-5-8,1,astar-symmullt-transmul,3600,memout +cv_data5-5-8,1,idastar-symmulgt-transmul,333.969,ok +cv_data5-5-8,1,idastar-symmullt-transmul,251.252,ok +cv_data5-5-9,1,astar-symmulgt-transmul,0.584,ok +cv_data5-5-9,1,astar-symmullt-transmul,0.484,ok 
+cv_data5-5-9,1,idastar-symmulgt-transmul,0.296,ok +cv_data5-5-9,1,idastar-symmullt-transmul,1.208,ok +cv_data5-6-11,1,astar-symmulgt-transmul,22.521,ok +cv_data5-6-11,1,astar-symmullt-transmul,24.926,ok +cv_data5-6-11,1,idastar-symmulgt-transmul,32.982,ok +cv_data5-6-11,1,idastar-symmullt-transmul,19.305,ok +cv_data5-6-19,1,astar-symmulgt-transmul,3600,memout +cv_data5-6-19,1,astar-symmullt-transmul,3600,memout +cv_data5-6-19,1,idastar-symmulgt-transmul,3121.263,ok +cv_data5-6-19,1,idastar-symmullt-transmul,3253.375,ok +cv_data5-6-2,1,astar-symmulgt-transmul,3600,memout +cv_data5-6-2,1,astar-symmullt-transmul,3600,memout +cv_data5-6-2,1,idastar-symmulgt-transmul,142.021,ok +cv_data5-6-2,1,idastar-symmullt-transmul,80.553,ok +cv_data5-6-20,1,astar-symmulgt-transmul,3600,memout +cv_data5-6-20,1,astar-symmullt-transmul,3600,memout +cv_data5-6-20,1,idastar-symmulgt-transmul,1852.048,ok +cv_data5-6-20,1,idastar-symmullt-transmul,1032.337,ok +cv_data5-6-22,1,astar-symmulgt-transmul,3600,memout +cv_data5-6-22,1,astar-symmullt-transmul,3600,memout +cv_data5-6-22,1,idastar-symmulgt-transmul,135.804,ok +cv_data5-6-22,1,idastar-symmullt-transmul,102.822,ok +cv_data5-6-25,1,astar-symmulgt-transmul,3600,memout +cv_data5-6-25,1,astar-symmullt-transmul,3600,memout +cv_data5-6-25,1,idastar-symmulgt-transmul,686.015,ok +cv_data5-6-25,1,idastar-symmullt-transmul,299.055,ok +cv_data5-6-26,1,astar-symmulgt-transmul,3600,memout +cv_data5-6-26,1,astar-symmullt-transmul,3600,memout +cv_data5-6-26,1,idastar-symmulgt-transmul,1242.33,ok +cv_data5-6-26,1,idastar-symmullt-transmul,1144.932,ok +cv_data5-6-28,1,astar-symmulgt-transmul,502.687,ok +cv_data5-6-28,1,astar-symmullt-transmul,532.821,ok +cv_data5-6-28,1,idastar-symmulgt-transmul,148.685,ok +cv_data5-6-28,1,idastar-symmullt-transmul,94.706,ok +cv_data5-6-30,1,astar-symmulgt-transmul,3600,memout +cv_data5-6-30,1,astar-symmullt-transmul,3600,memout +cv_data5-6-30,1,idastar-symmulgt-transmul,1011.107,ok 
+cv_data5-6-30,1,idastar-symmullt-transmul,696.028,ok +cv_data5-6-31,1,astar-symmulgt-transmul,3600,memout +cv_data5-6-31,1,astar-symmullt-transmul,3600,memout +cv_data5-6-31,1,idastar-symmulgt-transmul,2419.015,ok +cv_data5-6-31,1,idastar-symmullt-transmul,1421.457,ok +cv_data5-6-33,1,astar-symmulgt-transmul,3600,memout +cv_data5-6-33,1,astar-symmullt-transmul,3600,memout +cv_data5-6-33,1,idastar-symmulgt-transmul,3046.198,ok +cv_data5-6-33,1,idastar-symmullt-transmul,2126.257,ok +cv_data5-6-35,1,astar-symmulgt-transmul,3600,memout +cv_data5-6-35,1,astar-symmullt-transmul,3600,memout +cv_data5-6-35,1,idastar-symmulgt-transmul,2532.15,ok +cv_data5-6-35,1,idastar-symmullt-transmul,2689.408,ok +cv_data5-6-37,1,astar-symmulgt-transmul,3600,memout +cv_data5-6-37,1,astar-symmullt-transmul,3600,memout +cv_data5-6-37,1,idastar-symmulgt-transmul,3600,timeout +cv_data5-6-37,1,idastar-symmullt-transmul,3195.648,ok +cv_data5-6-5,1,astar-symmulgt-transmul,3600,memout +cv_data5-6-5,1,astar-symmullt-transmul,3600,memout +cv_data5-6-5,1,idastar-symmulgt-transmul,1595.128,ok +cv_data5-6-5,1,idastar-symmullt-transmul,941.507,ok +cv_data5-6-6,1,astar-symmulgt-transmul,1341.13,ok +cv_data5-6-6,1,astar-symmullt-transmul,1294.58,ok +cv_data5-6-6,1,idastar-symmulgt-transmul,9.777,ok +cv_data5-6-6,1,idastar-symmullt-transmul,4.708,ok +cv_data5-6-7,1,astar-symmulgt-transmul,3600,memout +cv_data5-6-7,1,astar-symmullt-transmul,3600,memout +cv_data5-6-7,1,idastar-symmulgt-transmul,159.95,ok +cv_data5-6-7,1,idastar-symmullt-transmul,169.227,ok +cv_data5-6-8,1,astar-symmulgt-transmul,3600,memout +cv_data5-6-8,1,astar-symmullt-transmul,3600,memout +cv_data5-6-8,1,idastar-symmulgt-transmul,245.599,ok +cv_data5-6-8,1,idastar-symmullt-transmul,189.796,ok +cv_data5-7-1,1,astar-symmulgt-transmul,9.765,ok +cv_data5-7-1,1,astar-symmullt-transmul,13.801,ok +cv_data5-7-1,1,idastar-symmulgt-transmul,7.016,ok +cv_data5-7-1,1,idastar-symmullt-transmul,6.524,ok 
+cv_data5-7-11,1,astar-symmulgt-transmul,3600,memout +cv_data5-7-11,1,astar-symmullt-transmul,3600,memout +cv_data5-7-11,1,idastar-symmulgt-transmul,74.173,ok +cv_data5-7-11,1,idastar-symmullt-transmul,50.827,ok +cv_data5-7-12,1,astar-symmulgt-transmul,3600,memout +cv_data5-7-12,1,astar-symmullt-transmul,3600,memout +cv_data5-7-12,1,idastar-symmulgt-transmul,798.354,ok +cv_data5-7-12,1,idastar-symmullt-transmul,1146.04,ok +cv_data5-7-13,1,astar-symmulgt-transmul,3600,memout +cv_data5-7-13,1,astar-symmullt-transmul,3600,memout +cv_data5-7-13,1,idastar-symmulgt-transmul,794.39,ok +cv_data5-7-13,1,idastar-symmullt-transmul,851.317,ok +cv_data5-7-15,1,astar-symmulgt-transmul,293.238,ok +cv_data5-7-15,1,astar-symmullt-transmul,312.476,ok +cv_data5-7-15,1,idastar-symmulgt-transmul,135.993,ok +cv_data5-7-15,1,idastar-symmullt-transmul,78.257,ok +cv_data5-7-16,1,astar-symmulgt-transmul,55.755,ok +cv_data5-7-16,1,astar-symmullt-transmul,72.869,ok +cv_data5-7-16,1,idastar-symmulgt-transmul,120.464,ok +cv_data5-7-16,1,idastar-symmullt-transmul,88.306,ok +cv_data5-7-18,1,astar-symmulgt-transmul,3600,memout +cv_data5-7-18,1,astar-symmullt-transmul,3600,memout +cv_data5-7-18,1,idastar-symmulgt-transmul,85.773,ok +cv_data5-7-18,1,idastar-symmullt-transmul,110.471,ok +cv_data5-7-19,1,astar-symmulgt-transmul,3600,memout +cv_data5-7-19,1,astar-symmullt-transmul,3600,memout +cv_data5-7-19,1,idastar-symmulgt-transmul,314.256,ok +cv_data5-7-19,1,idastar-symmullt-transmul,169.831,ok +cv_data5-7-22,1,astar-symmulgt-transmul,71.172,ok +cv_data5-7-22,1,astar-symmullt-transmul,65.408,ok +cv_data5-7-22,1,idastar-symmulgt-transmul,7.788,ok +cv_data5-7-22,1,idastar-symmullt-transmul,15.089,ok +cv_data5-7-31,1,astar-symmulgt-transmul,3600,memout +cv_data5-7-31,1,astar-symmullt-transmul,3600,memout +cv_data5-7-31,1,idastar-symmulgt-transmul,1194.343,ok +cv_data5-7-31,1,idastar-symmullt-transmul,632.019,ok +cv_data5-7-32,1,astar-symmulgt-transmul,3600,memout 
+cv_data5-7-32,1,astar-symmullt-transmul,3600,memout +cv_data5-7-32,1,idastar-symmulgt-transmul,3600,timeout +cv_data5-7-32,1,idastar-symmullt-transmul,2907.326,ok +cv_data5-7-33,1,astar-symmulgt-transmul,3600,memout +cv_data5-7-33,1,astar-symmullt-transmul,3600,memout +cv_data5-7-33,1,idastar-symmulgt-transmul,668.422,ok +cv_data5-7-33,1,idastar-symmullt-transmul,1496.738,ok +cv_data5-7-36,1,astar-symmulgt-transmul,3600,memout +cv_data5-7-36,1,astar-symmullt-transmul,3600,memout +cv_data5-7-36,1,idastar-symmulgt-transmul,907.445,ok +cv_data5-7-36,1,idastar-symmullt-transmul,675.686,ok +cv_data5-7-37,1,astar-symmulgt-transmul,156.866,ok +cv_data5-7-37,1,astar-symmullt-transmul,137.417,ok +cv_data5-7-37,1,idastar-symmulgt-transmul,75.369,ok +cv_data5-7-37,1,idastar-symmullt-transmul,96.022,ok +cv_data5-7-4,1,astar-symmulgt-transmul,964.04,ok +cv_data5-7-4,1,astar-symmullt-transmul,808.547,ok +cv_data5-7-4,1,idastar-symmulgt-transmul,599.933,ok +cv_data5-7-4,1,idastar-symmullt-transmul,368.859,ok +cv_data5-7-40,1,astar-symmulgt-transmul,3600,memout +cv_data5-7-40,1,astar-symmullt-transmul,3600,memout +cv_data5-7-40,1,idastar-symmulgt-transmul,3600,timeout +cv_data5-7-40,1,idastar-symmullt-transmul,1961.883,ok +cv_data5-7-7,1,astar-symmulgt-transmul,3600,memout +cv_data5-7-7,1,astar-symmullt-transmul,3600,memout +cv_data5-7-7,1,idastar-symmulgt-transmul,577.132,ok +cv_data5-7-7,1,idastar-symmullt-transmul,524.621,ok +cv_data5-8-19,1,astar-symmulgt-transmul,3600,memout +cv_data5-8-19,1,astar-symmullt-transmul,3600,memout +cv_data5-8-19,1,idastar-symmulgt-transmul,3600,timeout +cv_data5-8-19,1,idastar-symmullt-transmul,2413.111,ok +cv_data5-8-2,1,astar-symmulgt-transmul,3600,memout +cv_data5-8-2,1,astar-symmullt-transmul,3600,memout +cv_data5-8-2,1,idastar-symmulgt-transmul,936.483,ok +cv_data5-8-2,1,idastar-symmullt-transmul,391.54,ok +cv_data5-8-25,1,astar-symmulgt-transmul,3600,memout +cv_data5-8-25,1,astar-symmullt-transmul,3600,memout 
+cv_data5-8-25,1,idastar-symmulgt-transmul,3512.672,ok +cv_data5-8-25,1,idastar-symmullt-transmul,3370.827,ok +cv_data5-8-29,1,astar-symmulgt-transmul,126.476,ok +cv_data5-8-29,1,astar-symmullt-transmul,281.102,ok +cv_data5-8-29,1,idastar-symmulgt-transmul,68.272,ok +cv_data5-8-29,1,idastar-symmullt-transmul,49.343,ok +cv_data5-8-3,1,astar-symmulgt-transmul,3600,memout +cv_data5-8-3,1,astar-symmullt-transmul,3600,memout +cv_data5-8-3,1,idastar-symmulgt-transmul,1574.23,ok +cv_data5-8-3,1,idastar-symmullt-transmul,508.596,ok +cv_data5-8-36,1,astar-symmulgt-transmul,3600,memout +cv_data5-8-36,1,astar-symmullt-transmul,3600,memout +cv_data5-8-36,1,idastar-symmulgt-transmul,2910.83,ok +cv_data5-8-36,1,idastar-symmullt-transmul,1556.549,ok +cv_data5-8-38,1,astar-symmulgt-transmul,3600,memout +cv_data5-8-38,1,astar-symmullt-transmul,3600,memout +cv_data5-8-38,1,idastar-symmulgt-transmul,2471.762,ok +cv_data5-8-38,1,idastar-symmullt-transmul,1874.097,ok +cv_data5-8-39,1,astar-symmulgt-transmul,3600,memout +cv_data5-8-39,1,astar-symmullt-transmul,3600,memout +cv_data5-8-39,1,idastar-symmulgt-transmul,2529.79,ok +cv_data5-8-39,1,idastar-symmullt-transmul,1526.903,ok +cv_data5-8-4,1,astar-symmulgt-transmul,3600,memout +cv_data5-8-4,1,astar-symmullt-transmul,3600,memout +cv_data5-8-4,1,idastar-symmulgt-transmul,1656.592,ok +cv_data5-8-4,1,idastar-symmullt-transmul,1792.312,ok +cv_data5-8-6,1,astar-symmulgt-transmul,3600,memout +cv_data5-8-6,1,astar-symmullt-transmul,3600,memout +cv_data5-8-6,1,idastar-symmulgt-transmul,1610.869,ok +cv_data5-8-6,1,idastar-symmullt-transmul,1532.88,ok +cv_data5-8-8,1,astar-symmulgt-transmul,3600,memout +cv_data5-8-8,1,astar-symmullt-transmul,3600,memout +cv_data5-8-8,1,idastar-symmulgt-transmul,348.75,ok +cv_data5-8-8,1,idastar-symmullt-transmul,359.362,ok +cv_data5-9-10,1,astar-symmulgt-transmul,3600,memout +cv_data5-9-10,1,astar-symmullt-transmul,3600,memout +cv_data5-9-10,1,idastar-symmulgt-transmul,675.93,ok 
+cv_data5-9-10,1,idastar-symmullt-transmul,514.58,ok +cv_data5-9-21,1,astar-symmulgt-transmul,3600,memout +cv_data5-9-21,1,astar-symmullt-transmul,3600,memout +cv_data5-9-21,1,idastar-symmulgt-transmul,848.305,ok +cv_data5-9-21,1,idastar-symmullt-transmul,608.694,ok +cv_data5-9-3,1,astar-symmulgt-transmul,3600,memout +cv_data5-9-3,1,astar-symmullt-transmul,3600,memout +cv_data5-9-3,1,idastar-symmulgt-transmul,3600,timeout +cv_data5-9-3,1,idastar-symmullt-transmul,3576.584,ok +cv_data5-9-32,1,astar-symmulgt-transmul,3600,memout +cv_data5-9-32,1,astar-symmullt-transmul,3600,memout +cv_data5-9-32,1,idastar-symmulgt-transmul,3600,timeout +cv_data5-9-32,1,idastar-symmullt-transmul,1377.438,ok +cv_data5-9-38,1,astar-symmulgt-transmul,3600,memout +cv_data5-9-38,1,astar-symmullt-transmul,3600,memout +cv_data5-9-38,1,idastar-symmulgt-transmul,3600,timeout +cv_data5-9-38,1,idastar-symmullt-transmul,1108.917,ok +cv_data5-9-4,1,astar-symmulgt-transmul,3600,memout +cv_data5-9-4,1,astar-symmullt-transmul,3600,memout +cv_data5-9-4,1,idastar-symmulgt-transmul,3253.151,ok +cv_data5-9-4,1,idastar-symmullt-transmul,1466.84,ok diff --git a/_articles/RJ-2025-045/CPMP-2015_data/algorithm_runs_test.arff b/_articles/RJ-2025-045/CPMP-2015_data/algorithm_runs_test.arff new file mode 100644 index 0000000000..8892c84614 --- /dev/null +++ b/_articles/RJ-2025-045/CPMP-2015_data/algorithm_runs_test.arff @@ -0,0 +1,2197 @@ +@RELATION algorithm_runs_premarshalling_astar_2013 +@ATTRIBUTE instance_id STRING +@ATTRIBUTE repetition NUMERIC +@ATTRIBUTE algorithm STRING +@ATTRIBUTE performance NUMERIC +@ATTRIBUTE runstatus {ok, timeout, memout, not_applicable, crash, other} + +@DATA + +4-6-75pct-2_10,1,astar-symmulgt-transmul,1.99612,ok +4-6-75pct-2_10,1,idastar-symmulgt-transmul,0.572035,ok +4-6-75pct-2_10,1,idastar-symmullt-transmul,0.708044,ok +4-6-75pct-2_10,1,astar-symmullt-transmul,1.94412,ok +4-6-75pct-2_102,1,astar-symmulgt-transmul,9.47659,ok 
+4-6-75pct-2_102,1,idastar-symmulgt-transmul,4.10426,ok +4-6-75pct-2_102,1,idastar-symmullt-transmul,4.47228,ok +4-6-75pct-2_102,1,astar-symmullt-transmul,9.32058,ok +4-6-75pct-2_103,1,astar-symmulgt-transmul,2.28814,ok +4-6-75pct-2_103,1,idastar-symmulgt-transmul,1.06407,ok +4-6-75pct-2_103,1,idastar-symmullt-transmul,1.21208,ok +4-6-75pct-2_103,1,astar-symmullt-transmul,1.99212,ok +4-6-75pct-2_104,1,astar-symmulgt-transmul,1.00006,ok +4-6-75pct-2_104,1,idastar-symmulgt-transmul,1.70811,ok +4-6-75pct-2_104,1,idastar-symmullt-transmul,1.86012,ok +4-6-75pct-2_104,1,astar-symmullt-transmul,1.00406,ok +4-6-75pct-2_105,1,astar-symmulgt-transmul,3.76824,ok +4-6-75pct-2_105,1,idastar-symmulgt-transmul,2.66817,ok +4-6-75pct-2_105,1,idastar-symmullt-transmul,3.82824,ok +4-6-75pct-2_105,1,astar-symmullt-transmul,3.61623,ok +4-6-75pct-2_106,1,astar-symmulgt-transmul,4.02825,ok +4-6-75pct-2_106,1,idastar-symmulgt-transmul,2.89618,ok +4-6-75pct-2_106,1,idastar-symmullt-transmul,1.98412,ok +4-6-75pct-2_106,1,astar-symmullt-transmul,4.07225,ok +4-6-75pct-2_107,1,astar-symmulgt-transmul,3.80024,ok +4-6-75pct-2_107,1,idastar-symmulgt-transmul,1.5481,ok +4-6-75pct-2_107,1,idastar-symmullt-transmul,1.39209,ok +4-6-75pct-2_107,1,astar-symmullt-transmul,3.72023,ok +4-6-75pct-2_109,1,astar-symmulgt-transmul,0.80405,ok +4-6-75pct-2_109,1,idastar-symmulgt-transmul,1.05207,ok +4-6-75pct-2_109,1,idastar-symmullt-transmul,0.920057,ok +4-6-75pct-2_109,1,astar-symmullt-transmul,0.720044,ok +4-6-75pct-2_11,1,astar-symmulgt-transmul,2.36415,ok +4-6-75pct-2_11,1,idastar-symmulgt-transmul,0.788049,ok +4-6-75pct-2_11,1,idastar-symmullt-transmul,1.16007,ok +4-6-75pct-2_11,1,astar-symmullt-transmul,1.74011,ok +4-6-75pct-2_112,1,astar-symmulgt-transmul,5.23633,ok +4-6-75pct-2_112,1,idastar-symmulgt-transmul,2.74417,ok +4-6-75pct-2_112,1,idastar-symmullt-transmul,2.77617,ok +4-6-75pct-2_112,1,astar-symmullt-transmul,5.08832,ok +4-6-75pct-2_113,1,astar-symmulgt-transmul,3.59622,ok 
+4-6-75pct-2_113,1,idastar-symmulgt-transmul,1.91212,ok +4-6-75pct-2_113,1,idastar-symmullt-transmul,2.12813,ok +4-6-75pct-2_113,1,astar-symmullt-transmul,3.47622,ok +4-6-75pct-2_115,1,astar-symmulgt-transmul,3.06819,ok +4-6-75pct-2_115,1,idastar-symmulgt-transmul,1.94412,ok +4-6-75pct-2_115,1,idastar-symmullt-transmul,2.02813,ok +4-6-75pct-2_115,1,astar-symmullt-transmul,3.00819,ok +4-6-75pct-2_116,1,astar-symmulgt-transmul,11.9927,ok +4-6-75pct-2_116,1,idastar-symmulgt-transmul,4.22826,ok +4-6-75pct-2_116,1,idastar-symmullt-transmul,3.73623,ok +4-6-75pct-2_116,1,astar-symmullt-transmul,11.8487,ok +4-6-75pct-2_117,1,astar-symmulgt-transmul,4.48428,ok +4-6-75pct-2_117,1,idastar-symmulgt-transmul,1.6921,ok +4-6-75pct-2_117,1,idastar-symmullt-transmul,2.07613,ok +4-6-75pct-2_117,1,astar-symmullt-transmul,4.46028,ok +4-6-75pct-2_119,1,astar-symmulgt-transmul,77.5888,ok +4-6-75pct-2_119,1,idastar-symmulgt-transmul,6.69642,ok +4-6-75pct-2_119,1,idastar-symmullt-transmul,6.85643,ok +4-6-75pct-2_119,1,astar-symmullt-transmul,73.3806,ok +4-6-75pct-2_121,1,astar-symmulgt-transmul,4.7483,ok +4-6-75pct-2_121,1,idastar-symmulgt-transmul,1.6601,ok +4-6-75pct-2_121,1,idastar-symmullt-transmul,2.36015,ok +4-6-75pct-2_121,1,astar-symmullt-transmul,4.90431,ok +4-6-75pct-2_122,1,astar-symmulgt-transmul,1.50009,ok +4-6-75pct-2_122,1,idastar-symmulgt-transmul,0.48003,ok +4-6-75pct-2_122,1,idastar-symmullt-transmul,0.524032,ok +4-6-75pct-2_122,1,astar-symmullt-transmul,1.46009,ok +4-6-75pct-2_124,1,astar-symmulgt-transmul,12.3168,ok +4-6-75pct-2_124,1,idastar-symmulgt-transmul,3.40421,ok +4-6-75pct-2_124,1,idastar-symmullt-transmul,4.64029,ok +4-6-75pct-2_124,1,astar-symmullt-transmul,11.7887,ok +4-6-75pct-2_125,1,astar-symmulgt-transmul,13.4008,ok +4-6-75pct-2_125,1,idastar-symmulgt-transmul,12.11276,ok +4-6-75pct-2_125,1,idastar-symmullt-transmul,12.59679,ok +4-6-75pct-2_125,1,astar-symmullt-transmul,10.8527,ok +4-6-75pct-2_129,1,astar-symmulgt-transmul,1.6601,ok 
+4-6-75pct-2_129,1,idastar-symmulgt-transmul,1.00806,ok +4-6-75pct-2_129,1,idastar-symmullt-transmul,0.788049,ok +4-6-75pct-2_129,1,astar-symmullt-transmul,1.6961,ok +4-6-75pct-2_130,1,astar-symmulgt-transmul,1.28808,ok +4-6-75pct-2_130,1,idastar-symmulgt-transmul,0.81205,ok +4-6-75pct-2_130,1,idastar-symmullt-transmul,0.98806,ok +4-6-75pct-2_130,1,astar-symmullt-transmul,1.28808,ok +4-6-75pct-2_134,1,astar-symmulgt-transmul,4.54828,ok +4-6-75pct-2_134,1,idastar-symmulgt-transmul,4.19626,ok +4-6-75pct-2_134,1,idastar-symmullt-transmul,3.56422,ok +4-6-75pct-2_134,1,astar-symmullt-transmul,4.44428,ok +4-6-75pct-2_136,1,astar-symmulgt-transmul,3.96825,ok +4-6-75pct-2_136,1,idastar-symmulgt-transmul,0.81205,ok +4-6-75pct-2_136,1,idastar-symmullt-transmul,0.93606,ok +4-6-75pct-2_136,1,astar-symmullt-transmul,3.94825,ok +4-6-75pct-2_144,1,astar-symmulgt-transmul,4.13626,ok +4-6-75pct-2_144,1,idastar-symmulgt-transmul,2.86818,ok +4-6-75pct-2_144,1,idastar-symmullt-transmul,3.53222,ok +4-6-75pct-2_144,1,astar-symmullt-transmul,4.35227,ok +4-6-75pct-2_146,1,astar-symmulgt-transmul,2.10013,ok +4-6-75pct-2_146,1,idastar-symmulgt-transmul,0.900056,ok +4-6-75pct-2_146,1,idastar-symmullt-transmul,0.620038,ok +4-6-75pct-2_146,1,astar-symmullt-transmul,2.15613,ok +4-6-75pct-2_147,1,astar-symmulgt-transmul,20.8653,ok +4-6-75pct-2_147,1,idastar-symmulgt-transmul,2.31214,ok +4-6-75pct-2_147,1,idastar-symmullt-transmul,2.61616,ok +4-6-75pct-2_147,1,astar-symmullt-transmul,19.5692,ok +4-6-75pct-2_149,1,astar-symmulgt-transmul,7.31246,ok +4-6-75pct-2_149,1,idastar-symmulgt-transmul,1.21208,ok +4-6-75pct-2_149,1,idastar-symmullt-transmul,1.52409,ok +4-6-75pct-2_149,1,astar-symmullt-transmul,4.15626,ok +4-6-75pct-2_151,1,astar-symmulgt-transmul,1.96412,ok +4-6-75pct-2_151,1,idastar-symmulgt-transmul,0.348021,ok +4-6-75pct-2_151,1,idastar-symmullt-transmul,0.476029,ok +4-6-75pct-2_151,1,astar-symmullt-transmul,1.05607,ok +4-6-75pct-2_152,1,astar-symmulgt-transmul,6.97243,ok 
+4-6-75pct-2_152,1,idastar-symmulgt-transmul,2.56816,ok +4-6-75pct-2_152,1,idastar-symmullt-transmul,3.05219,ok +4-6-75pct-2_152,1,astar-symmullt-transmul,7.08044,ok +4-6-75pct-2_153,1,astar-symmulgt-transmul,3.76423,ok +4-6-75pct-2_153,1,idastar-symmulgt-transmul,3.90024,ok +4-6-75pct-2_153,1,idastar-symmullt-transmul,4.68029,ok +4-6-75pct-2_153,1,astar-symmullt-transmul,3.77223,ok +4-6-75pct-2_155,1,astar-symmulgt-transmul,35.1542,ok +4-6-75pct-2_155,1,idastar-symmulgt-transmul,15.965,ok +4-6-75pct-2_155,1,idastar-symmullt-transmul,23.18145,ok +4-6-75pct-2_155,1,astar-symmullt-transmul,24.8095,ok +4-6-75pct-2_156,1,astar-symmulgt-transmul,11.0127,ok +4-6-75pct-2_156,1,idastar-symmulgt-transmul,1.73611,ok +4-6-75pct-2_156,1,idastar-symmullt-transmul,1.18807,ok +4-6-75pct-2_156,1,astar-symmullt-transmul,9.85661,ok +4-6-75pct-2_157,1,astar-symmulgt-transmul,29.6539,ok +4-6-75pct-2_157,1,idastar-symmulgt-transmul,20.7613,ok +4-6-75pct-2_157,1,idastar-symmullt-transmul,28.08575,ok +4-6-75pct-2_157,1,astar-symmullt-transmul,21.5373,ok +4-6-75pct-2_159,1,astar-symmulgt-transmul,7.93249,ok +4-6-75pct-2_159,1,idastar-symmulgt-transmul,0.92806,ok +4-6-75pct-2_159,1,idastar-symmullt-transmul,1.26808,ok +4-6-75pct-2_159,1,astar-symmullt-transmul,7.70848,ok +4-6-75pct-2_161,1,astar-symmulgt-transmul,4.94031,ok +4-6-75pct-2_161,1,idastar-symmulgt-transmul,3.62823,ok +4-6-75pct-2_161,1,idastar-symmullt-transmul,4.18426,ok +4-6-75pct-2_161,1,astar-symmullt-transmul,4.8563,ok +4-6-75pct-2_163,1,astar-symmulgt-transmul,1.04406,ok +4-6-75pct-2_163,1,idastar-symmulgt-transmul,0.136008,ok +4-6-75pct-2_163,1,idastar-symmullt-transmul,0.17601,ok +4-6-75pct-2_163,1,astar-symmullt-transmul,1.01606,ok +4-6-75pct-2_165,1,astar-symmulgt-transmul,2.26014,ok +4-6-75pct-2_165,1,idastar-symmulgt-transmul,1.73211,ok +4-6-75pct-2_165,1,idastar-symmullt-transmul,2.04013,ok +4-6-75pct-2_165,1,astar-symmullt-transmul,2.26414,ok +4-6-75pct-2_166,1,astar-symmulgt-transmul,1160.06,ok 
+4-6-75pct-2_166,1,idastar-symmulgt-transmul,61.03581,ok +4-6-75pct-2_166,1,idastar-symmullt-transmul,78.72492,ok +4-6-75pct-2_166,1,astar-symmullt-transmul,1058.89,ok +4-6-75pct-2_169,1,astar-symmulgt-transmul,27.2657,ok +4-6-75pct-2_169,1,idastar-symmulgt-transmul,6.70442,ok +4-6-75pct-2_169,1,idastar-symmullt-transmul,9.31658,ok +4-6-75pct-2_169,1,astar-symmullt-transmul,16.261,ok +4-6-75pct-2_17,1,astar-symmulgt-transmul,54.4994,ok +4-6-75pct-2_17,1,idastar-symmulgt-transmul,8.18451,ok +4-6-75pct-2_17,1,idastar-symmullt-transmul,10.76467,ok +4-6-75pct-2_17,1,astar-symmullt-transmul,50.8152,ok +4-6-75pct-2_171,1,astar-symmulgt-transmul,3.68023,ok +4-6-75pct-2_171,1,idastar-symmulgt-transmul,0.508031,ok +4-6-75pct-2_171,1,idastar-symmullt-transmul,0.668041,ok +4-6-75pct-2_171,1,astar-symmullt-transmul,2.12013,ok +4-6-75pct-2_172,1,astar-symmulgt-transmul,30.9419,ok +4-6-75pct-2_172,1,idastar-symmulgt-transmul,8.72854,ok +4-6-75pct-2_172,1,idastar-symmullt-transmul,11.93274,ok +4-6-75pct-2_172,1,astar-symmullt-transmul,17.6211,ok +4-6-75pct-2_178,1,astar-symmulgt-transmul,2.23214,ok +4-6-75pct-2_178,1,idastar-symmulgt-transmul,0.32002,ok +4-6-75pct-2_178,1,idastar-symmullt-transmul,0.416026,ok +4-6-75pct-2_178,1,astar-symmullt-transmul,2.17213,ok +4-6-75pct-2_181,1,astar-symmulgt-transmul,1.18007,ok +4-6-75pct-2_181,1,idastar-symmulgt-transmul,0.292018,ok +4-6-75pct-2_181,1,idastar-symmullt-transmul,0.352022,ok +4-6-75pct-2_181,1,astar-symmullt-transmul,1.21208,ok +4-6-75pct-2_183,1,astar-symmulgt-transmul,7.08444,ok +4-6-75pct-2_183,1,idastar-symmulgt-transmul,1.25608,ok +4-6-75pct-2_183,1,idastar-symmullt-transmul,1.26808,ok +4-6-75pct-2_183,1,astar-symmullt-transmul,3.59622,ok +4-6-75pct-2_185,1,astar-symmulgt-transmul,5.26033,ok +4-6-75pct-2_185,1,idastar-symmulgt-transmul,1.08007,ok +4-6-75pct-2_185,1,idastar-symmullt-transmul,0.88405,ok +4-6-75pct-2_185,1,astar-symmullt-transmul,5.36033,ok +4-6-75pct-2_187,1,astar-symmulgt-transmul,96.014,ok 
+4-6-75pct-2_187,1,idastar-symmulgt-transmul,19.2332,ok +4-6-75pct-2_187,1,idastar-symmullt-transmul,30.23389,ok +4-6-75pct-2_187,1,astar-symmullt-transmul,55.1834,ok +4-6-75pct-2_19,1,astar-symmulgt-transmul,3.34021,ok +4-6-75pct-2_19,1,idastar-symmulgt-transmul,1.25608,ok +4-6-75pct-2_19,1,idastar-symmullt-transmul,1.90412,ok +4-6-75pct-2_19,1,astar-symmullt-transmul,1.6521,ok +4-6-75pct-2_193,1,astar-symmulgt-transmul,2.69617,ok +4-6-75pct-2_193,1,idastar-symmulgt-transmul,0.620038,ok +4-6-75pct-2_193,1,idastar-symmullt-transmul,0.516031,ok +4-6-75pct-2_193,1,astar-symmullt-transmul,1.28008,ok +4-6-75pct-2_194,1,astar-symmulgt-transmul,6.4244,ok +4-6-75pct-2_194,1,idastar-symmulgt-transmul,1.35608,ok +4-6-75pct-2_194,1,idastar-symmullt-transmul,1.93612,ok +4-6-75pct-2_194,1,astar-symmullt-transmul,3.1762,ok +4-6-75pct-2_198,1,astar-symmulgt-transmul,3.35221,ok +4-6-75pct-2_198,1,idastar-symmulgt-transmul,0.604037,ok +4-6-75pct-2_198,1,idastar-symmullt-transmul,0.680042,ok +4-6-75pct-2_198,1,astar-symmullt-transmul,3.36021,ok +4-6-75pct-2_199,1,astar-symmulgt-transmul,1.92412,ok +4-6-75pct-2_199,1,idastar-symmulgt-transmul,0.90006,ok +4-6-75pct-2_199,1,idastar-symmullt-transmul,0.94006,ok +4-6-75pct-2_199,1,astar-symmullt-transmul,0.880054,ok +4-6-75pct-2_2,1,astar-symmulgt-transmul,8.97656,ok +4-6-75pct-2_2,1,idastar-symmulgt-transmul,2.70017,ok +4-6-75pct-2_2,1,idastar-symmullt-transmul,2.97619,ok +4-6-75pct-2_2,1,astar-symmullt-transmul,4.72829,ok +4-6-75pct-2_20,1,astar-symmulgt-transmul,1.18407,ok +4-6-75pct-2_20,1,idastar-symmulgt-transmul,0.408025,ok +4-6-75pct-2_20,1,idastar-symmullt-transmul,0.620038,ok +4-6-75pct-2_20,1,astar-symmullt-transmul,0.560034,ok +4-6-75pct-2_209,1,astar-symmulgt-transmul,1.5561,ok +4-6-75pct-2_209,1,idastar-symmulgt-transmul,0.564035,ok +4-6-75pct-2_209,1,idastar-symmullt-transmul,0.82005,ok +4-6-75pct-2_209,1,astar-symmullt-transmul,0.736045,ok +4-6-75pct-2_21,1,astar-symmulgt-transmul,2.70817,ok 
+4-6-75pct-2_21,1,idastar-symmulgt-transmul,0.504031,ok +4-6-75pct-2_21,1,idastar-symmullt-transmul,0.748046,ok +4-6-75pct-2_21,1,astar-symmullt-transmul,1.34808,ok +4-6-75pct-2_210,1,astar-symmulgt-transmul,43.1027,ok +4-6-75pct-2_210,1,idastar-symmulgt-transmul,3.98425,ok +4-6-75pct-2_210,1,idastar-symmullt-transmul,4.8083,ok +4-6-75pct-2_210,1,astar-symmullt-transmul,39.1344,ok +4-6-75pct-2_214,1,astar-symmulgt-transmul,3.12019,ok +4-6-75pct-2_214,1,idastar-symmulgt-transmul,1.96812,ok +4-6-75pct-2_214,1,idastar-symmullt-transmul,2.55216,ok +4-6-75pct-2_214,1,astar-symmullt-transmul,1.5961,ok +4-6-75pct-2_216,1,astar-symmulgt-transmul,1.31608,ok +4-6-75pct-2_216,1,idastar-symmulgt-transmul,0.080004,ok +4-6-75pct-2_216,1,idastar-symmullt-transmul,0.104006,ok +4-6-75pct-2_216,1,astar-symmullt-transmul,0.632039,ok +4-6-75pct-2_218,1,astar-symmulgt-transmul,1.48009,ok +4-6-75pct-2_218,1,idastar-symmulgt-transmul,0.48003,ok +4-6-75pct-2_218,1,idastar-symmullt-transmul,0.620038,ok +4-6-75pct-2_218,1,astar-symmullt-transmul,0.704044,ok +4-6-75pct-2_219,1,astar-symmulgt-transmul,143.229,ok +4-6-75pct-2_219,1,idastar-symmulgt-transmul,8.43653,ok +4-6-75pct-2_219,1,idastar-symmullt-transmul,8.75255,ok +4-6-75pct-2_219,1,astar-symmullt-transmul,76.9168,ok +4-6-75pct-2_221,1,astar-symmulgt-transmul,1.53209,ok +4-6-75pct-2_221,1,idastar-symmulgt-transmul,1.43609,ok +4-6-75pct-2_221,1,idastar-symmullt-transmul,1.5721,ok +4-6-75pct-2_221,1,astar-symmullt-transmul,0.760047,ok +4-6-75pct-2_224,1,astar-symmulgt-transmul,15.313,ok +4-6-75pct-2_224,1,idastar-symmulgt-transmul,8.27652,ok +4-6-75pct-2_224,1,idastar-symmullt-transmul,5.53635,ok +4-6-75pct-2_224,1,astar-symmullt-transmul,8.66054,ok +4-6-75pct-2_225,1,astar-symmulgt-transmul,1.54009,ok +4-6-75pct-2_225,1,idastar-symmulgt-transmul,0.62004,ok +4-6-75pct-2_225,1,idastar-symmullt-transmul,0.468029,ok +4-6-75pct-2_225,1,astar-symmullt-transmul,0.748046,ok +4-6-75pct-2_226,1,astar-symmulgt-transmul,1.52409,ok 
+4-6-75pct-2_226,1,idastar-symmulgt-transmul,1.02006,ok +4-6-75pct-2_226,1,idastar-symmullt-transmul,1.00806,ok +4-6-75pct-2_226,1,astar-symmullt-transmul,0.732045,ok +4-6-75pct-2_227,1,astar-symmulgt-transmul,2.06413,ok +4-6-75pct-2_227,1,idastar-symmulgt-transmul,0.81605,ok +4-6-75pct-2_227,1,idastar-symmullt-transmul,1.10807,ok +4-6-75pct-2_227,1,astar-symmullt-transmul,1.06807,ok +4-6-75pct-2_23,1,astar-symmulgt-transmul,2.74417,ok +4-6-75pct-2_23,1,idastar-symmulgt-transmul,0.424026,ok +4-6-75pct-2_23,1,idastar-symmullt-transmul,0.356022,ok +4-6-75pct-2_23,1,astar-symmullt-transmul,1.31208,ok +4-6-75pct-2_230,1,astar-symmulgt-transmul,2.44415,ok +4-6-75pct-2_230,1,idastar-symmulgt-transmul,1.14807,ok +4-6-75pct-2_230,1,idastar-symmullt-transmul,2.10013,ok +4-6-75pct-2_230,1,astar-symmullt-transmul,1.24008,ok +4-6-75pct-2_231,1,astar-symmulgt-transmul,1.95612,ok +4-6-75pct-2_231,1,idastar-symmulgt-transmul,0.360022,ok +4-6-75pct-2_231,1,idastar-symmullt-transmul,0.408025,ok +4-6-75pct-2_231,1,astar-symmullt-transmul,0.948059,ok +4-6-75pct-2_235,1,astar-symmulgt-transmul,5.73236,ok +4-6-75pct-2_235,1,idastar-symmulgt-transmul,0.556034,ok +4-6-75pct-2_235,1,idastar-symmullt-transmul,0.79205,ok +4-6-75pct-2_235,1,astar-symmullt-transmul,1.6721,ok +4-6-75pct-2_236,1,astar-symmulgt-transmul,7.20845,ok +4-6-75pct-2_236,1,idastar-symmulgt-transmul,1.5481,ok +4-6-75pct-2_236,1,idastar-symmullt-transmul,1.5961,ok +4-6-75pct-2_236,1,astar-symmullt-transmul,3.66423,ok +4-6-75pct-2_239,1,astar-symmulgt-transmul,24.0695,ok +4-6-75pct-2_239,1,idastar-symmulgt-transmul,4.8483,ok +4-6-75pct-2_239,1,idastar-symmullt-transmul,6.02838,ok +4-6-75pct-2_239,1,astar-symmullt-transmul,14.1569,ok +4-6-75pct-2_242,1,astar-symmulgt-transmul,1.17207,ok +4-6-75pct-2_242,1,idastar-symmulgt-transmul,0.600037,ok +4-6-75pct-2_242,1,idastar-symmullt-transmul,0.724045,ok +4-6-75pct-2_242,1,astar-symmullt-transmul,1.16407,ok +4-6-75pct-2_243,1,astar-symmulgt-transmul,3.37221,ok 
+4-6-75pct-2_243,1,idastar-symmulgt-transmul,0.588036,ok +4-6-75pct-2_243,1,idastar-symmullt-transmul,0.520032,ok +4-6-75pct-2_243,1,astar-symmullt-transmul,3.36421,ok +4-6-75pct-2_248,1,astar-symmulgt-transmul,2.61216,ok +4-6-75pct-2_248,1,idastar-symmulgt-transmul,0.568035,ok +4-6-75pct-2_248,1,idastar-symmullt-transmul,0.544034,ok +4-6-75pct-2_248,1,astar-symmullt-transmul,2.69617,ok +4-6-75pct-2_25,1,astar-symmulgt-transmul,44.4308,ok +4-6-75pct-2_25,1,idastar-symmulgt-transmul,15.02894,ok +4-6-75pct-2_25,1,idastar-symmullt-transmul,20.8093,ok +4-6-75pct-2_25,1,astar-symmullt-transmul,45.8309,ok +4-6-75pct-2_28,1,astar-symmulgt-transmul,4.34027,ok +4-6-75pct-2_28,1,idastar-symmulgt-transmul,0.584036,ok +4-6-75pct-2_28,1,idastar-symmullt-transmul,0.83205,ok +4-6-75pct-2_28,1,astar-symmullt-transmul,4.29227,ok +4-6-75pct-2_31,1,astar-symmulgt-transmul,2.86418,ok +4-6-75pct-2_31,1,idastar-symmulgt-transmul,1.44809,ok +4-6-75pct-2_31,1,idastar-symmullt-transmul,1.6361,ok +4-6-75pct-2_31,1,astar-symmullt-transmul,2.95218,ok +4-6-75pct-2_33,1,astar-symmulgt-transmul,5.48834,ok +4-6-75pct-2_33,1,idastar-symmulgt-transmul,0.89205,ok +4-6-75pct-2_33,1,idastar-symmullt-transmul,0.91606,ok +4-6-75pct-2_33,1,astar-symmullt-transmul,5.10832,ok +4-6-75pct-2_35,1,astar-symmulgt-transmul,8.38852,ok +4-6-75pct-2_35,1,idastar-symmulgt-transmul,1.77611,ok +4-6-75pct-2_35,1,idastar-symmullt-transmul,1.97612,ok +4-6-75pct-2_35,1,astar-symmullt-transmul,4.34027,ok +4-6-75pct-2_36,1,astar-symmulgt-transmul,1.35208,ok +4-6-75pct-2_36,1,idastar-symmulgt-transmul,0.192012,ok +4-6-75pct-2_36,1,idastar-symmullt-transmul,0.308019,ok +4-6-75pct-2_36,1,astar-symmullt-transmul,1.5481,ok +4-6-75pct-2_44,1,astar-symmulgt-transmul,3.82024,ok +4-6-75pct-2_44,1,idastar-symmulgt-transmul,1.34808,ok +4-6-75pct-2_44,1,idastar-symmullt-transmul,1.5761,ok +4-6-75pct-2_44,1,astar-symmullt-transmul,3.84424,ok +4-6-75pct-2_47,1,astar-symmulgt-transmul,1.29208,ok 
+4-6-75pct-2_47,1,idastar-symmulgt-transmul,0.96806,ok +4-6-75pct-2_47,1,idastar-symmullt-transmul,1.48809,ok +4-6-75pct-2_47,1,astar-symmullt-transmul,1.29608,ok +4-6-75pct-2_49,1,astar-symmulgt-transmul,1.48409,ok +4-6-75pct-2_49,1,idastar-symmulgt-transmul,0.276017,ok +4-6-75pct-2_49,1,idastar-symmullt-transmul,0.272017,ok +4-6-75pct-2_49,1,astar-symmullt-transmul,0.704043,ok +4-6-75pct-2_55,1,astar-symmulgt-transmul,26.8857,ok +4-6-75pct-2_55,1,idastar-symmulgt-transmul,3.99625,ok +4-6-75pct-2_55,1,idastar-symmullt-transmul,4.22426,ok +4-6-75pct-2_55,1,astar-symmullt-transmul,15.0569,ok +4-6-75pct-2_60,1,astar-symmulgt-transmul,1.52409,ok +4-6-75pct-2_60,1,idastar-symmulgt-transmul,0.68404,ok +4-6-75pct-2_60,1,idastar-symmullt-transmul,0.256016,ok +4-6-75pct-2_60,1,astar-symmullt-transmul,0.720044,ok +4-6-75pct-2_62,1,astar-symmulgt-transmul,3.02819,ok +4-6-75pct-2_62,1,idastar-symmulgt-transmul,0.752047,ok +4-6-75pct-2_62,1,idastar-symmullt-transmul,0.79605,ok +4-6-75pct-2_62,1,astar-symmullt-transmul,2.99219,ok +4-6-75pct-2_64,1,astar-symmulgt-transmul,1.35208,ok +4-6-75pct-2_64,1,idastar-symmulgt-transmul,0.50003,ok +4-6-75pct-2_64,1,idastar-symmullt-transmul,0.82805,ok +4-6-75pct-2_64,1,astar-symmullt-transmul,1.35608,ok +4-6-75pct-2_65,1,astar-symmulgt-transmul,46.6869,ok +4-6-75pct-2_65,1,idastar-symmulgt-transmul,4.54028,ok +4-6-75pct-2_65,1,idastar-symmullt-transmul,3.73623,ok +4-6-75pct-2_65,1,astar-symmullt-transmul,25.1616,ok +4-6-75pct-2_68,1,astar-symmulgt-transmul,6.85643,ok +4-6-75pct-2_68,1,idastar-symmulgt-transmul,5.54035,ok +4-6-75pct-2_68,1,idastar-symmullt-transmul,7.06044,ok +4-6-75pct-2_68,1,astar-symmullt-transmul,6.83643,ok +4-6-75pct-2_71,1,astar-symmulgt-transmul,3.1482,ok +4-6-75pct-2_71,1,idastar-symmulgt-transmul,0.612038,ok +4-6-75pct-2_71,1,idastar-symmullt-transmul,0.500031,ok +4-6-75pct-2_71,1,astar-symmullt-transmul,3.1642,ok +4-6-75pct-2_75,1,astar-symmulgt-transmul,21.5293,ok 
+4-6-75pct-2_75,1,idastar-symmulgt-transmul,4.34027,ok +4-6-75pct-2_75,1,idastar-symmullt-transmul,6.00838,ok +4-6-75pct-2_75,1,astar-symmullt-transmul,11.7167,ok +4-6-75pct-2_76,1,astar-symmulgt-transmul,4.7883,ok +4-6-75pct-2_76,1,idastar-symmulgt-transmul,0.760047,ok +4-6-75pct-2_76,1,idastar-symmullt-transmul,0.75605,ok +4-6-75pct-2_76,1,astar-symmullt-transmul,5.11632,ok +4-6-75pct-2_78,1,astar-symmulgt-transmul,1.10807,ok +4-6-75pct-2_78,1,idastar-symmulgt-transmul,0.240015,ok +4-6-75pct-2_78,1,idastar-symmullt-transmul,0.236014,ok +4-6-75pct-2_78,1,astar-symmullt-transmul,1.12007,ok +4-6-75pct-2_83,1,astar-symmulgt-transmul,1.86812,ok +4-6-75pct-2_83,1,idastar-symmulgt-transmul,0.232014,ok +4-6-75pct-2_83,1,idastar-symmullt-transmul,0.336021,ok +4-6-75pct-2_83,1,astar-symmullt-transmul,1.84811,ok +4-6-75pct-2_84,1,astar-symmulgt-transmul,2.48015,ok +4-6-75pct-2_84,1,idastar-symmulgt-transmul,0.76405,ok +4-6-75pct-2_84,1,idastar-symmullt-transmul,1.16807,ok +4-6-75pct-2_84,1,astar-symmullt-transmul,1.5841,ok +4-6-75pct-2_86,1,astar-symmulgt-transmul,2.20814,ok +4-6-75pct-2_86,1,idastar-symmulgt-transmul,0.132008,ok +4-6-75pct-2_86,1,idastar-symmullt-transmul,0.124007,ok +4-6-75pct-2_86,1,astar-symmullt-transmul,0.96806,ok +4-6-75pct-2_89,1,astar-symmulgt-transmul,48.239,ok +4-6-75pct-2_89,1,idastar-symmulgt-transmul,4.70829,ok +4-6-75pct-2_89,1,idastar-symmullt-transmul,5.19232,ok +4-6-75pct-2_89,1,astar-symmullt-transmul,33.1781,ok +4-6-75pct-2_93,1,astar-symmulgt-transmul,38.2464,ok +4-6-75pct-2_93,1,idastar-symmulgt-transmul,4.56428,ok +4-6-75pct-2_93,1,idastar-symmullt-transmul,5.70836,ok +4-6-75pct-2_93,1,astar-symmullt-transmul,28.8698,ok +4-6-75pct-2_95,1,astar-symmulgt-transmul,3.1842,ok +4-6-75pct-2_95,1,idastar-symmulgt-transmul,1.70811,ok +4-6-75pct-2_95,1,idastar-symmullt-transmul,2.24014,ok +4-6-75pct-2_95,1,astar-symmullt-transmul,1.5881,ok +4-6-75pct-2_96,1,astar-symmulgt-transmul,2.83218,ok 
+4-6-75pct-2_96,1,idastar-symmulgt-transmul,0.408025,ok +4-6-75pct-2_96,1,idastar-symmullt-transmul,0.440027,ok +4-6-75pct-2_96,1,astar-symmullt-transmul,1.39209,ok +4-6-75pct-2_97,1,astar-symmulgt-transmul,104.219,ok +4-6-75pct-2_97,1,idastar-symmulgt-transmul,8.86855,ok +4-6-75pct-2_97,1,idastar-symmullt-transmul,10.85668,ok +4-6-75pct-2_97,1,astar-symmullt-transmul,69.0083,ok +4-6-75pct-2_99,1,astar-symmulgt-transmul,3.1922,ok +4-6-75pct-2_99,1,idastar-symmulgt-transmul,0.216013,ok +4-6-75pct-2_99,1,idastar-symmullt-transmul,0.360022,ok +4-6-75pct-2_99,1,astar-symmullt-transmul,2.67617,ok +6-6-75pct-2_0,1,astar-symmulgt-transmul,0.18401,ok +6-6-75pct-2_0,1,idastar-symmulgt-transmul,0.192011,ok +6-6-75pct-2_0,1,idastar-symmullt-transmul,1.02006,ok +6-6-75pct-2_0,1,astar-symmullt-transmul,0.092005,ok +6-6-75pct-2_1,1,astar-symmulgt-transmul,44.4948,ok +6-6-75pct-2_1,1,idastar-symmulgt-transmul,21.74136,ok +6-6-75pct-2_1,1,idastar-symmullt-transmul,34.90618,ok +6-6-75pct-2_1,1,astar-symmullt-transmul,23.6695,ok +6-6-75pct-2_102,1,astar-symmulgt-transmul,4.7763,ok +6-6-75pct-2_102,1,idastar-symmulgt-transmul,0.440027,ok +6-6-75pct-2_102,1,idastar-symmullt-transmul,1.07207,ok +6-6-75pct-2_102,1,astar-symmullt-transmul,2.08413,ok +6-6-75pct-2_103,1,astar-symmulgt-transmul,97.3901,ok +6-6-75pct-2_103,1,idastar-symmulgt-transmul,5.34833,ok +6-6-75pct-2_103,1,idastar-symmullt-transmul,6.31639,ok +6-6-75pct-2_103,1,astar-symmullt-transmul,60.5518,ok +6-6-75pct-2_104,1,astar-symmulgt-transmul,18.5532,ok +6-6-75pct-2_104,1,idastar-symmulgt-transmul,4.71629,ok +6-6-75pct-2_104,1,idastar-symmullt-transmul,10.17664,ok +6-6-75pct-2_104,1,astar-symmullt-transmul,17.7571,ok +6-6-75pct-2_105,1,astar-symmulgt-transmul,3.64823,ok +6-6-75pct-2_105,1,idastar-symmulgt-transmul,5.38434,ok +6-6-75pct-2_105,1,idastar-symmullt-transmul,1.96812,ok +6-6-75pct-2_105,1,astar-symmullt-transmul,1.86812,ok +6-6-75pct-2_108,1,astar-symmulgt-transmul,10.8927,ok 
+6-6-75pct-2_108,1,idastar-symmulgt-transmul,0.936058,ok +6-6-75pct-2_108,1,idastar-symmullt-transmul,1.05607,ok +6-6-75pct-2_108,1,astar-symmullt-transmul,4.7683,ok +6-6-75pct-2_113,1,astar-symmulgt-transmul,1.26408,ok +6-6-75pct-2_113,1,idastar-symmulgt-transmul,0.128008,ok +6-6-75pct-2_113,1,idastar-symmullt-transmul,0.184011,ok +6-6-75pct-2_113,1,astar-symmullt-transmul,1.36808,ok +6-6-75pct-2_114,1,astar-symmulgt-transmul,137.961,ok +6-6-75pct-2_114,1,idastar-symmulgt-transmul,33.06207,ok +6-6-75pct-2_114,1,idastar-symmullt-transmul,78.3289,ok +6-6-75pct-2_114,1,astar-symmullt-transmul,70.5244,ok +6-6-75pct-2_115,1,astar-symmulgt-transmul,9.88862,ok +6-6-75pct-2_115,1,idastar-symmulgt-transmul,5.60835,ok +6-6-75pct-2_115,1,idastar-symmullt-transmul,11.2647,ok +6-6-75pct-2_115,1,astar-symmullt-transmul,4.51628,ok +6-6-75pct-2_117,1,astar-symmulgt-transmul,30.4259,ok +6-6-75pct-2_117,1,idastar-symmulgt-transmul,4.95231,ok +6-6-75pct-2_117,1,idastar-symmullt-transmul,9.46859,ok +6-6-75pct-2_117,1,astar-symmullt-transmul,18.2571,ok +6-6-75pct-2_119,1,astar-symmulgt-transmul,5.57635,ok +6-6-75pct-2_119,1,idastar-symmulgt-transmul,3.76823,ok +6-6-75pct-2_119,1,idastar-symmullt-transmul,7.72048,ok +6-6-75pct-2_119,1,astar-symmullt-transmul,2.50016,ok +6-6-75pct-2_120,1,astar-symmulgt-transmul,27.5137,ok +6-6-75pct-2_120,1,idastar-symmulgt-transmul,1.89612,ok +6-6-75pct-2_120,1,idastar-symmullt-transmul,2.74817,ok +6-6-75pct-2_120,1,astar-symmullt-transmul,22.4254,ok +6-6-75pct-2_122,1,astar-symmulgt-transmul,1.05607,ok +6-6-75pct-2_122,1,idastar-symmulgt-transmul,0.088005,ok +6-6-75pct-2_122,1,idastar-symmullt-transmul,0.140008,ok +6-6-75pct-2_122,1,astar-symmullt-transmul,0.604037,ok +6-6-75pct-2_126,1,astar-symmulgt-transmul,0.600037,ok +6-6-75pct-2_126,1,idastar-symmulgt-transmul,1.24808,ok +6-6-75pct-2_126,1,idastar-symmullt-transmul,1.6081,ok +6-6-75pct-2_126,1,astar-symmullt-transmul,0.252015,ok +6-6-75pct-2_127,1,astar-symmulgt-transmul,9.79261,ok 
+6-6-75pct-2_127,1,idastar-symmulgt-transmul,7.75248,ok +6-6-75pct-2_127,1,idastar-symmullt-transmul,13.30083,ok +6-6-75pct-2_127,1,astar-symmullt-transmul,5.07632,ok +6-6-75pct-2_128,1,astar-symmulgt-transmul,6.52841,ok +6-6-75pct-2_128,1,idastar-symmulgt-transmul,6.80042,ok +6-6-75pct-2_128,1,idastar-symmullt-transmul,12.41277,ok +6-6-75pct-2_128,1,astar-symmullt-transmul,2.92818,ok +6-6-75pct-2_13,1,astar-symmulgt-transmul,3.1282,ok +6-6-75pct-2_13,1,idastar-symmulgt-transmul,0.608038,ok +6-6-75pct-2_13,1,idastar-symmullt-transmul,0.65204,ok +6-6-75pct-2_13,1,astar-symmullt-transmul,1.20407,ok +6-6-75pct-2_132,1,astar-symmulgt-transmul,45.7149,ok +6-6-75pct-2_132,1,idastar-symmulgt-transmul,19.30921,ok +6-6-75pct-2_132,1,idastar-symmullt-transmul,35.31021,ok +6-6-75pct-2_132,1,astar-symmullt-transmul,24.5975,ok +6-6-75pct-2_135,1,astar-symmulgt-transmul,8.84455,ok +6-6-75pct-2_135,1,idastar-symmulgt-transmul,10.98069,ok +6-6-75pct-2_135,1,idastar-symmullt-transmul,18.59316,ok +6-6-75pct-2_135,1,astar-symmullt-transmul,4.8643,ok +6-6-75pct-2_136,1,astar-symmulgt-transmul,0.17201,ok +6-6-75pct-2_136,1,idastar-symmulgt-transmul,1.08407,ok +6-6-75pct-2_136,1,idastar-symmullt-transmul,1.32808,ok +6-6-75pct-2_136,1,astar-symmullt-transmul,0.064003,ok +6-6-75pct-2_138,1,astar-symmulgt-transmul,4.21626,ok +6-6-75pct-2_138,1,idastar-symmulgt-transmul,5.52034,ok +6-6-75pct-2_138,1,idastar-symmullt-transmul,13.25683,ok +6-6-75pct-2_138,1,astar-symmullt-transmul,1.99612,ok +6-6-75pct-2_142,1,astar-symmulgt-transmul,142.089,ok +6-6-75pct-2_142,1,idastar-symmulgt-transmul,20.05725,ok +6-6-75pct-2_142,1,idastar-symmullt-transmul,27.65373,ok +6-6-75pct-2_142,1,astar-symmullt-transmul,184.684,ok +6-6-75pct-2_143,1,astar-symmulgt-transmul,0.796049,ok +6-6-75pct-2_143,1,idastar-symmulgt-transmul,1.84011,ok +6-6-75pct-2_143,1,idastar-symmullt-transmul,2.50016,ok +6-6-75pct-2_143,1,astar-symmullt-transmul,0.676042,ok +6-6-75pct-2_146,1,astar-symmulgt-transmul,23.9495,ok 
+6-6-75pct-2_146,1,idastar-symmulgt-transmul,1.94412,ok +6-6-75pct-2_146,1,idastar-symmullt-transmul,2.34015,ok +6-6-75pct-2_146,1,astar-symmullt-transmul,23.8695,ok +6-6-75pct-2_147,1,astar-symmulgt-transmul,11.9727,ok +6-6-75pct-2_147,1,idastar-symmulgt-transmul,0.440027,ok +6-6-75pct-2_147,1,idastar-symmullt-transmul,0.584036,ok +6-6-75pct-2_147,1,astar-symmullt-transmul,11.0807,ok +6-6-75pct-2_149,1,astar-symmulgt-transmul,1.6561,ok +6-6-75pct-2_149,1,idastar-symmulgt-transmul,0.264016,ok +6-6-75pct-2_149,1,idastar-symmullt-transmul,0.424026,ok +6-6-75pct-2_149,1,astar-symmullt-transmul,0.852053,ok +6-6-75pct-2_15,1,astar-symmulgt-transmul,10.7367,ok +6-6-75pct-2_15,1,idastar-symmulgt-transmul,9.30058,ok +6-6-75pct-2_15,1,idastar-symmullt-transmul,16.52903,ok +6-6-75pct-2_15,1,astar-symmullt-transmul,8.38852,ok +6-6-75pct-2_150,1,astar-symmulgt-transmul,8.40852,ok +6-6-75pct-2_150,1,idastar-symmulgt-transmul,3.53622,ok +6-6-75pct-2_150,1,idastar-symmullt-transmul,7.16845,ok +6-6-75pct-2_150,1,astar-symmullt-transmul,7.89249,ok +6-6-75pct-2_151,1,astar-symmulgt-transmul,45.0588,ok +6-6-75pct-2_151,1,idastar-symmulgt-transmul,5.29233,ok +6-6-75pct-2_151,1,idastar-symmullt-transmul,4.8483,ok +6-6-75pct-2_151,1,astar-symmullt-transmul,40.7345,ok +6-6-75pct-2_152,1,astar-symmulgt-transmul,782.409,ok +6-6-75pct-2_152,1,idastar-symmulgt-transmul,1.73611,ok +6-6-75pct-2_152,1,idastar-symmullt-transmul,1.6441,ok +6-6-75pct-2_152,1,astar-symmullt-transmul,576.792,ok +6-6-75pct-2_153,1,astar-symmulgt-transmul,203.457,ok +6-6-75pct-2_153,1,idastar-symmulgt-transmul,16.67704,ok +6-6-75pct-2_153,1,idastar-symmullt-transmul,16.60904,ok +6-6-75pct-2_153,1,astar-symmullt-transmul,248.824,ok +6-6-75pct-2_156,1,astar-symmulgt-transmul,30.8299,ok +6-6-75pct-2_156,1,idastar-symmulgt-transmul,2.76417,ok +6-6-75pct-2_156,1,idastar-symmullt-transmul,1.78011,ok +6-6-75pct-2_156,1,astar-symmullt-transmul,32.366,ok +6-6-75pct-2_158,1,astar-symmulgt-transmul,83.4052,ok 
+6-6-75pct-2_158,1,idastar-symmulgt-transmul,23.19345,ok +6-6-75pct-2_158,1,idastar-symmullt-transmul,43.59872,ok +6-6-75pct-2_158,1,astar-symmullt-transmul,97.3781,ok +6-6-75pct-2_16,1,astar-symmulgt-transmul,24.9936,ok +6-6-75pct-2_16,1,idastar-symmulgt-transmul,11.97275,ok +6-6-75pct-2_16,1,idastar-symmullt-transmul,32.006,ok +6-6-75pct-2_16,1,astar-symmullt-transmul,23.1774,ok +6-6-75pct-2_160,1,astar-symmulgt-transmul,14.1169,ok +6-6-75pct-2_160,1,idastar-symmulgt-transmul,10.57266,ok +6-6-75pct-2_160,1,idastar-symmullt-transmul,23.39746,ok +6-6-75pct-2_160,1,astar-symmullt-transmul,14.5289,ok +6-6-75pct-2_161,1,astar-symmulgt-transmul,40.8706,ok +6-6-75pct-2_161,1,idastar-symmulgt-transmul,15.80099,ok +6-6-75pct-2_161,1,idastar-symmullt-transmul,46.14688,ok +6-6-75pct-2_161,1,astar-symmullt-transmul,68.0242,ok +6-6-75pct-2_162,1,astar-symmulgt-transmul,506.452,ok +6-6-75pct-2_162,1,idastar-symmulgt-transmul,60.01175,ok +6-6-75pct-2_162,1,idastar-symmullt-transmul,112.55103,ok +6-6-75pct-2_162,1,astar-symmullt-transmul,539.49,ok +6-6-75pct-2_164,1,astar-symmulgt-transmul,36.4063,ok +6-6-75pct-2_164,1,idastar-symmulgt-transmul,7.62848,ok +6-6-75pct-2_164,1,idastar-symmullt-transmul,10.02062,ok +6-6-75pct-2_164,1,astar-symmullt-transmul,34.6502,ok +6-6-75pct-2_165,1,astar-symmulgt-transmul,3.42821,ok +6-6-75pct-2_165,1,idastar-symmulgt-transmul,2.43215,ok +6-6-75pct-2_165,1,idastar-symmullt-transmul,6.3364,ok +6-6-75pct-2_165,1,astar-symmullt-transmul,3.55622,ok +6-6-75pct-2_167,1,astar-symmulgt-transmul,65.0401,ok +6-6-75pct-2_167,1,idastar-symmulgt-transmul,14.85293,ok +6-6-75pct-2_167,1,idastar-symmullt-transmul,48.047,ok +6-6-75pct-2_167,1,astar-symmullt-transmul,64.8761,ok +6-6-75pct-2_17,1,astar-symmulgt-transmul,0.956059,ok +6-6-75pct-2_17,1,idastar-symmulgt-transmul,6.70842,ok +6-6-75pct-2_17,1,idastar-symmullt-transmul,15.00494,ok +6-6-75pct-2_17,1,astar-symmullt-transmul,1.21208,ok +6-6-75pct-2_170,1,astar-symmulgt-transmul,8.24451,ok 
+6-6-75pct-2_170,1,idastar-symmulgt-transmul,4.45228,ok +6-6-75pct-2_170,1,idastar-symmullt-transmul,7.9565,ok +6-6-75pct-2_170,1,astar-symmullt-transmul,8.12851,ok +6-6-75pct-2_171,1,astar-symmulgt-transmul,1.90412,ok +6-6-75pct-2_171,1,idastar-symmulgt-transmul,9.49659,ok +6-6-75pct-2_171,1,idastar-symmullt-transmul,4.16826,ok +6-6-75pct-2_171,1,astar-symmullt-transmul,1.83611,ok +6-6-75pct-2_172,1,astar-symmulgt-transmul,8.28452,ok +6-6-75pct-2_172,1,idastar-symmulgt-transmul,22.64141,ok +6-6-75pct-2_172,1,idastar-symmullt-transmul,48.48703,ok +6-6-75pct-2_172,1,astar-symmullt-transmul,7.42846,ok +6-6-75pct-2_173,1,astar-symmulgt-transmul,1.7001,ok +6-6-75pct-2_173,1,idastar-symmulgt-transmul,2.71217,ok +6-6-75pct-2_173,1,idastar-symmullt-transmul,5.09232,ok +6-6-75pct-2_173,1,astar-symmullt-transmul,1.90812,ok +6-6-75pct-2_174,1,astar-symmulgt-transmul,5.03231,ok +6-6-75pct-2_174,1,idastar-symmulgt-transmul,34.00212,ok +6-6-75pct-2_174,1,idastar-symmullt-transmul,47.66698,ok +6-6-75pct-2_174,1,astar-symmullt-transmul,4.34027,ok +6-6-75pct-2_175,1,astar-symmulgt-transmul,11.2207,ok +6-6-75pct-2_175,1,idastar-symmulgt-transmul,46.67092,ok +6-6-75pct-2_175,1,idastar-symmullt-transmul,61.59185,ok +6-6-75pct-2_175,1,astar-symmullt-transmul,7.41646,ok +6-6-75pct-2_176,1,astar-symmulgt-transmul,9.21257,ok +6-6-75pct-2_176,1,idastar-symmulgt-transmul,2.04013,ok +6-6-75pct-2_176,1,idastar-symmullt-transmul,4.70429,ok +6-6-75pct-2_176,1,astar-symmullt-transmul,5.23233,ok +6-6-75pct-2_177,1,astar-symmulgt-transmul,1.5601,ok +6-6-75pct-2_177,1,idastar-symmulgt-transmul,2.00813,ok +6-6-75pct-2_177,1,idastar-symmullt-transmul,2.70017,ok +6-6-75pct-2_177,1,astar-symmullt-transmul,1.74811,ok +6-6-75pct-2_18,1,astar-symmulgt-transmul,20.3053,ok +6-6-75pct-2_18,1,idastar-symmulgt-transmul,21.48134,ok +6-6-75pct-2_18,1,idastar-symmullt-transmul,44.42278,ok +6-6-75pct-2_18,1,astar-symmullt-transmul,20.8693,ok +6-6-75pct-2_180,1,astar-symmulgt-transmul,46.0469,ok 
+6-6-75pct-2_180,1,idastar-symmulgt-transmul,5.50034,ok +6-6-75pct-2_180,1,idastar-symmullt-transmul,5.76436,ok +6-6-75pct-2_180,1,astar-symmullt-transmul,23.6375,ok +6-6-75pct-2_181,1,astar-symmulgt-transmul,0.300018,ok +6-6-75pct-2_181,1,idastar-symmulgt-transmul,0.672042,ok +6-6-75pct-2_181,1,idastar-symmullt-transmul,1.15607,ok +6-6-75pct-2_181,1,astar-symmullt-transmul,0.160009,ok +6-6-75pct-2_183,1,astar-symmulgt-transmul,13.8129,ok +6-6-75pct-2_183,1,idastar-symmulgt-transmul,3.43621,ok +6-6-75pct-2_183,1,idastar-symmullt-transmul,4.34827,ok +6-6-75pct-2_183,1,astar-symmullt-transmul,26.8057,ok +6-6-75pct-2_184,1,astar-symmulgt-transmul,1.92412,ok +6-6-75pct-2_184,1,idastar-symmulgt-transmul,1.42409,ok +6-6-75pct-2_184,1,idastar-symmullt-transmul,3.02819,ok +6-6-75pct-2_184,1,astar-symmullt-transmul,0.872053,ok +6-6-75pct-2_185,1,astar-symmulgt-transmul,3.03219,ok +6-6-75pct-2_185,1,idastar-symmulgt-transmul,3.11619,ok +6-6-75pct-2_185,1,idastar-symmullt-transmul,6.03638,ok +6-6-75pct-2_185,1,astar-symmullt-transmul,1.47209,ok +6-6-75pct-2_186,1,astar-symmulgt-transmul,479.142,ok +6-6-75pct-2_186,1,idastar-symmulgt-transmul,26.32165,ok +6-6-75pct-2_186,1,idastar-symmullt-transmul,51.2872,ok +6-6-75pct-2_186,1,astar-symmullt-transmul,410.038,ok +6-6-75pct-2_187,1,astar-symmulgt-transmul,77.4968,ok +6-6-75pct-2_187,1,idastar-symmulgt-transmul,52.09125,ok +6-6-75pct-2_187,1,idastar-symmullt-transmul,97.71011,ok +6-6-75pct-2_187,1,astar-symmullt-transmul,46.9429,ok +6-6-75pct-2_188,1,astar-symmulgt-transmul,0.368022,ok +6-6-75pct-2_188,1,idastar-symmulgt-transmul,0.032002,ok +6-6-75pct-2_188,1,idastar-symmullt-transmul,0.032002,ok +6-6-75pct-2_188,1,astar-symmullt-transmul,1.00406,ok +6-6-75pct-2_190,1,astar-symmulgt-transmul,57.2356,ok +6-6-75pct-2_190,1,idastar-symmulgt-transmul,11.47272,ok +6-6-75pct-2_190,1,idastar-symmullt-transmul,24.12151,ok +6-6-75pct-2_190,1,astar-symmullt-transmul,33.2421,ok +6-6-75pct-2_191,1,astar-symmulgt-transmul,12.8128,ok 
+6-6-75pct-2_191,1,idastar-symmulgt-transmul,0.280017,ok +6-6-75pct-2_191,1,idastar-symmullt-transmul,0.772048,ok +6-6-75pct-2_191,1,astar-symmullt-transmul,1.96412,ok +6-6-75pct-2_193,1,astar-symmulgt-transmul,50.4472,ok +6-6-75pct-2_193,1,idastar-symmulgt-transmul,5.97237,ok +6-6-75pct-2_193,1,idastar-symmullt-transmul,10.50866,ok +6-6-75pct-2_193,1,astar-symmullt-transmul,71.6605,ok +6-6-75pct-2_194,1,astar-symmulgt-transmul,12.4448,ok +6-6-75pct-2_194,1,idastar-symmulgt-transmul,3.31621,ok +6-6-75pct-2_194,1,idastar-symmullt-transmul,3.70823,ok +6-6-75pct-2_194,1,astar-symmullt-transmul,23.5095,ok +6-6-75pct-2_195,1,astar-symmulgt-transmul,74.3246,ok +6-6-75pct-2_195,1,idastar-symmulgt-transmul,19.03719,ok +6-6-75pct-2_195,1,idastar-symmullt-transmul,28.60579,ok +6-6-75pct-2_195,1,astar-symmullt-transmul,109.895,ok +6-6-75pct-2_196,1,astar-symmulgt-transmul,16.397,ok +6-6-75pct-2_196,1,idastar-symmulgt-transmul,24.36952,ok +6-6-75pct-2_196,1,idastar-symmullt-transmul,6.12838,ok +6-6-75pct-2_196,1,astar-symmullt-transmul,15.745,ok +6-6-75pct-2_197,1,astar-symmulgt-transmul,2.06413,ok +6-6-75pct-2_197,1,idastar-symmulgt-transmul,0.192012,ok +6-6-75pct-2_197,1,idastar-symmullt-transmul,0.288018,ok +6-6-75pct-2_197,1,astar-symmullt-transmul,0.300018,ok +6-6-75pct-2_198,1,astar-symmulgt-transmul,2.64016,ok +6-6-75pct-2_198,1,idastar-symmulgt-transmul,0.592036,ok +6-6-75pct-2_198,1,idastar-symmullt-transmul,1.00006,ok +6-6-75pct-2_198,1,astar-symmullt-transmul,5.26833,ok +6-6-75pct-2_199,1,astar-symmulgt-transmul,13.2568,ok +6-6-75pct-2_199,1,idastar-symmulgt-transmul,17.5251,ok +6-6-75pct-2_199,1,idastar-symmullt-transmul,30.28189,ok +6-6-75pct-2_199,1,astar-symmullt-transmul,7.41646,ok +6-6-75pct-2_2,1,astar-symmulgt-transmul,0.996061,ok +6-6-75pct-2_2,1,idastar-symmulgt-transmul,0.636039,ok +6-6-75pct-2_2,1,idastar-symmullt-transmul,0.848053,ok +6-6-75pct-2_2,1,astar-symmullt-transmul,1.15207,ok +6-6-75pct-2_201,1,astar-symmulgt-transmul,5.59635,ok 
+6-6-75pct-2_201,1,idastar-symmulgt-transmul,2.43615,ok +6-6-75pct-2_201,1,idastar-symmullt-transmul,2.88818,ok +6-6-75pct-2_201,1,astar-symmullt-transmul,10.1886,ok +6-6-75pct-2_202,1,astar-symmulgt-transmul,0.344021,ok +6-6-75pct-2_202,1,idastar-symmulgt-transmul,2.41615,ok +6-6-75pct-2_202,1,idastar-symmullt-transmul,1.93212,ok +6-6-75pct-2_202,1,astar-symmullt-transmul,0.300018,ok +6-6-75pct-2_203,1,astar-symmulgt-transmul,2.02013,ok +6-6-75pct-2_203,1,idastar-symmulgt-transmul,0.200012,ok +6-6-75pct-2_203,1,idastar-symmullt-transmul,0.256016,ok +6-6-75pct-2_203,1,astar-symmullt-transmul,1.80411,ok +6-6-75pct-2_204,1,astar-symmulgt-transmul,10.6847,ok +6-6-75pct-2_204,1,idastar-symmulgt-transmul,4.43628,ok +6-6-75pct-2_204,1,idastar-symmullt-transmul,5.98437,ok +6-6-75pct-2_204,1,astar-symmullt-transmul,10.2046,ok +6-6-75pct-2_206,1,astar-symmulgt-transmul,3.51222,ok +6-6-75pct-2_206,1,idastar-symmulgt-transmul,1.94012,ok +6-6-75pct-2_206,1,idastar-symmullt-transmul,2.49215,ok +6-6-75pct-2_206,1,astar-symmullt-transmul,3.68423,ok +6-6-75pct-2_209,1,astar-symmulgt-transmul,75.6247,ok +6-6-75pct-2_209,1,idastar-symmulgt-transmul,43.09469,ok +6-6-75pct-2_209,1,idastar-symmullt-transmul,97.37009,ok +6-6-75pct-2_209,1,astar-symmullt-transmul,114.851,ok +6-6-75pct-2_211,1,astar-symmulgt-transmul,8.18051,ok +6-6-75pct-2_211,1,idastar-symmulgt-transmul,1.78011,ok +6-6-75pct-2_211,1,idastar-symmullt-transmul,1.6881,ok +6-6-75pct-2_211,1,astar-symmullt-transmul,7.46847,ok +6-6-75pct-2_215,1,astar-symmulgt-transmul,345.654,ok +6-6-75pct-2_215,1,idastar-symmulgt-transmul,22.73342,ok +6-6-75pct-2_215,1,idastar-symmullt-transmul,35.92624,ok +6-6-75pct-2_215,1,astar-symmullt-transmul,349.902,ok +6-6-75pct-2_219,1,astar-symmulgt-transmul,20.3093,ok +6-6-75pct-2_219,1,idastar-symmulgt-transmul,3.29621,ok +6-6-75pct-2_219,1,idastar-symmullt-transmul,5.08832,ok +6-6-75pct-2_219,1,astar-symmullt-transmul,11.1927,ok +6-6-75pct-2_220,1,astar-symmulgt-transmul,311.603,ok 
+6-6-75pct-2_220,1,idastar-symmulgt-transmul,661.74936,ok +6-6-75pct-2_220,1,idastar-symmullt-transmul,904.87255,ok +6-6-75pct-2_220,1,astar-symmullt-transmul,428.675,ok +6-6-75pct-2_221,1,astar-symmulgt-transmul,285.502,ok +6-6-75pct-2_221,1,idastar-symmulgt-transmul,26.56166,ok +6-6-75pct-2_221,1,idastar-symmullt-transmul,22.89743,ok +6-6-75pct-2_221,1,astar-symmullt-transmul,196.84,ok +6-6-75pct-2_222,1,astar-symmulgt-transmul,2.20014,ok +6-6-75pct-2_222,1,idastar-symmulgt-transmul,0.600037,ok +6-6-75pct-2_222,1,idastar-symmullt-transmul,0.372023,ok +6-6-75pct-2_222,1,astar-symmullt-transmul,1.72011,ok +6-6-75pct-2_223,1,astar-symmulgt-transmul,23.1654,ok +6-6-75pct-2_223,1,idastar-symmulgt-transmul,14.3329,ok +6-6-75pct-2_223,1,idastar-symmullt-transmul,25.43359,ok +6-6-75pct-2_223,1,astar-symmullt-transmul,24.7575,ok +6-6-75pct-2_224,1,astar-symmulgt-transmul,38.9304,ok +6-6-75pct-2_224,1,idastar-symmulgt-transmul,3.69623,ok +6-6-75pct-2_224,1,idastar-symmullt-transmul,5.64035,ok +6-6-75pct-2_224,1,astar-symmullt-transmul,43.1187,ok +6-6-75pct-2_225,1,astar-symmulgt-transmul,36.8303,ok +6-6-75pct-2_225,1,idastar-symmulgt-transmul,8.81655,ok +6-6-75pct-2_225,1,idastar-symmullt-transmul,19.96925,ok +6-6-75pct-2_225,1,astar-symmullt-transmul,48.583,ok +6-6-75pct-2_227,1,astar-symmulgt-transmul,12.7408,ok +6-6-75pct-2_227,1,idastar-symmulgt-transmul,18.92918,ok +6-6-75pct-2_227,1,idastar-symmullt-transmul,22.59341,ok +6-6-75pct-2_227,1,astar-symmullt-transmul,10.2686,ok +6-6-75pct-2_229,1,astar-symmulgt-transmul,2.73217,ok +6-6-75pct-2_229,1,idastar-symmulgt-transmul,7.65648,ok +6-6-75pct-2_229,1,idastar-symmullt-transmul,13.87287,ok +6-6-75pct-2_229,1,astar-symmullt-transmul,2.99619,ok +6-6-75pct-2_23,1,astar-symmulgt-transmul,1.84411,ok +6-6-75pct-2_23,1,idastar-symmulgt-transmul,0.628039,ok +6-6-75pct-2_23,1,idastar-symmullt-transmul,0.832052,ok +6-6-75pct-2_23,1,astar-symmullt-transmul,2.24814,ok +6-6-75pct-2_230,1,astar-symmulgt-transmul,13.7369,ok 
+6-6-75pct-2_230,1,idastar-symmulgt-transmul,3.51622,ok +6-6-75pct-2_230,1,idastar-symmullt-transmul,3.88824,ok +6-6-75pct-2_230,1,astar-symmullt-transmul,13.9409,ok +6-6-75pct-2_233,1,astar-symmulgt-transmul,23.8215,ok +6-6-75pct-2_233,1,idastar-symmulgt-transmul,8.72855,ok +6-6-75pct-2_233,1,idastar-symmullt-transmul,11.32871,ok +6-6-75pct-2_233,1,astar-symmullt-transmul,16.101,ok +6-6-75pct-2_235,1,astar-symmulgt-transmul,3.2042,ok +6-6-75pct-2_235,1,idastar-symmulgt-transmul,0.608038,ok +6-6-75pct-2_235,1,idastar-symmullt-transmul,0.632039,ok +6-6-75pct-2_235,1,astar-symmullt-transmul,1.38409,ok +6-6-75pct-2_236,1,astar-symmulgt-transmul,4.48028,ok +6-6-75pct-2_236,1,idastar-symmulgt-transmul,2.27214,ok +6-6-75pct-2_236,1,idastar-symmullt-transmul,3.89624,ok +6-6-75pct-2_236,1,astar-symmullt-transmul,2.22814,ok +6-6-75pct-2_238,1,astar-symmulgt-transmul,2.19214,ok +6-6-75pct-2_238,1,idastar-symmulgt-transmul,0.32402,ok +6-6-75pct-2_238,1,idastar-symmullt-transmul,0.340021,ok +6-6-75pct-2_238,1,astar-symmullt-transmul,1.08407,ok +6-6-75pct-2_240,1,astar-symmulgt-transmul,1.24008,ok +6-6-75pct-2_240,1,idastar-symmulgt-transmul,0.496031,ok +6-6-75pct-2_240,1,idastar-symmullt-transmul,0.784049,ok +6-6-75pct-2_240,1,astar-symmullt-transmul,0.456028,ok +6-6-75pct-2_241,1,astar-symmulgt-transmul,99.0302,ok +6-6-75pct-2_241,1,idastar-symmulgt-transmul,9.24458,ok +6-6-75pct-2_241,1,idastar-symmullt-transmul,8.80455,ok +6-6-75pct-2_241,1,astar-symmullt-transmul,69.9884,ok +6-6-75pct-2_242,1,astar-symmulgt-transmul,116.043,ok +6-6-75pct-2_242,1,idastar-symmulgt-transmul,189.97187,ok +6-6-75pct-2_242,1,idastar-symmullt-transmul,315.51572,ok +6-6-75pct-2_242,1,astar-symmullt-transmul,75.4007,ok +6-6-75pct-2_245,1,astar-symmulgt-transmul,3.33221,ok +6-6-75pct-2_245,1,idastar-symmulgt-transmul,6.70842,ok +6-6-75pct-2_245,1,idastar-symmullt-transmul,2.57216,ok +6-6-75pct-2_245,1,astar-symmullt-transmul,1.78011,ok +6-6-75pct-2_248,1,astar-symmulgt-transmul,309.027,ok 
+6-6-75pct-2_248,1,idastar-symmulgt-transmul,29.25783,ok +6-6-75pct-2_248,1,idastar-symmullt-transmul,68.8683,ok +6-6-75pct-2_248,1,astar-symmullt-transmul,190.744,ok +6-6-75pct-2_249,1,astar-symmulgt-transmul,215.585,ok +6-6-75pct-2_249,1,idastar-symmulgt-transmul,5.78836,ok +6-6-75pct-2_249,1,idastar-symmullt-transmul,11.2367,ok +6-6-75pct-2_249,1,astar-symmullt-transmul,143.817,ok +6-6-75pct-2_25,1,astar-symmulgt-transmul,592.317,ok +6-6-75pct-2_25,1,idastar-symmulgt-transmul,19.47722,ok +6-6-75pct-2_25,1,idastar-symmullt-transmul,25.94562,ok +6-6-75pct-2_25,1,astar-symmullt-transmul,462.229,ok +6-6-75pct-2_26,1,astar-symmulgt-transmul,3.68823,ok +6-6-75pct-2_26,1,idastar-symmulgt-transmul,1.82011,ok +6-6-75pct-2_26,1,idastar-symmullt-transmul,1.89612,ok +6-6-75pct-2_26,1,astar-symmullt-transmul,1.76411,ok +6-6-75pct-2_27,1,astar-symmulgt-transmul,2.87218,ok +6-6-75pct-2_27,1,idastar-symmulgt-transmul,0.080005,ok +6-6-75pct-2_27,1,idastar-symmullt-transmul,0.16801,ok +6-6-75pct-2_27,1,astar-symmullt-transmul,1.06807,ok +6-6-75pct-2_28,1,astar-symmulgt-transmul,9.28058,ok +6-6-75pct-2_28,1,idastar-symmulgt-transmul,4.05225,ok +6-6-75pct-2_28,1,idastar-symmullt-transmul,7.58447,ok +6-6-75pct-2_28,1,astar-symmullt-transmul,4.94831,ok +6-6-75pct-2_3,1,astar-symmulgt-transmul,1.39209,ok +6-6-75pct-2_3,1,idastar-symmulgt-transmul,0.32002,ok +6-6-75pct-2_3,1,idastar-symmullt-transmul,0.376023,ok +6-6-75pct-2_3,1,astar-symmullt-transmul,0.66004,ok +6-6-75pct-2_30,1,astar-symmulgt-transmul,0.240014,ok +6-6-75pct-2_30,1,idastar-symmulgt-transmul,2.26814,ok +6-6-75pct-2_30,1,idastar-symmullt-transmul,2.50016,ok +6-6-75pct-2_30,1,astar-symmullt-transmul,0.124007,ok +6-6-75pct-2_31,1,astar-symmulgt-transmul,347.37,ok +6-6-75pct-2_31,1,idastar-symmulgt-transmul,219.80974,ok +6-6-75pct-2_31,1,idastar-symmullt-transmul,355.60622,ok +6-6-75pct-2_31,1,astar-symmullt-transmul,325.7,ok +6-6-75pct-2_32,1,astar-symmulgt-transmul,273.157,ok 
+6-6-75pct-2_32,1,idastar-symmulgt-transmul,28.67379,ok +6-6-75pct-2_32,1,idastar-symmullt-transmul,46.86693,ok +6-6-75pct-2_32,1,astar-symmullt-transmul,208.961,ok +6-6-75pct-2_35,1,astar-symmulgt-transmul,206.153,ok +6-6-75pct-2_35,1,idastar-symmulgt-transmul,33.17407,ok +6-6-75pct-2_35,1,idastar-symmullt-transmul,45.71886,ok +6-6-75pct-2_35,1,astar-symmullt-transmul,240.935,ok +6-6-75pct-2_39,1,astar-symmulgt-transmul,571.292,ok +6-6-75pct-2_39,1,idastar-symmulgt-transmul,48.21901,ok +6-6-75pct-2_39,1,idastar-symmullt-transmul,79.20095,ok +6-6-75pct-2_39,1,astar-symmullt-transmul,628.547,ok +6-6-75pct-2_43,1,astar-symmulgt-transmul,7.30046,ok +6-6-75pct-2_43,1,idastar-symmulgt-transmul,11.57672,ok +6-6-75pct-2_43,1,idastar-symmullt-transmul,23.64548,ok +6-6-75pct-2_43,1,astar-symmullt-transmul,7.81249,ok +6-6-75pct-2_44,1,astar-symmulgt-transmul,2.26414,ok +6-6-75pct-2_44,1,idastar-symmulgt-transmul,2.27614,ok +6-6-75pct-2_44,1,idastar-symmullt-transmul,0.312019,ok +6-6-75pct-2_44,1,astar-symmullt-transmul,2.49615,ok +6-6-75pct-2_45,1,astar-symmulgt-transmul,149.845,ok +6-6-75pct-2_45,1,idastar-symmulgt-transmul,44.27877,ok +6-6-75pct-2_45,1,idastar-symmullt-transmul,100.8623,ok +6-6-75pct-2_45,1,astar-symmullt-transmul,180.887,ok +6-6-75pct-2_46,1,astar-symmulgt-transmul,26.3136,ok +6-6-75pct-2_46,1,idastar-symmulgt-transmul,79.20495,ok +6-6-75pct-2_46,1,idastar-symmullt-transmul,39.19845,ok +6-6-75pct-2_46,1,astar-symmullt-transmul,80.729,ok +6-6-75pct-2_5,1,astar-symmulgt-transmul,4.18826,ok +6-6-75pct-2_5,1,idastar-symmulgt-transmul,5.84037,ok +6-6-75pct-2_5,1,idastar-symmullt-transmul,12.8328,ok +6-6-75pct-2_5,1,astar-symmullt-transmul,57.7876,ok +6-6-75pct-2_50,1,astar-symmulgt-transmul,476.478,ok +6-6-75pct-2_50,1,idastar-symmulgt-transmul,1075.04719,ok +6-6-75pct-2_50,1,idastar-symmullt-transmul,1264.98706,ok +6-6-75pct-2_50,1,astar-symmullt-transmul,544.482,ok +6-6-75pct-2_52,1,astar-symmulgt-transmul,0.688042,ok 
+6-6-75pct-2_52,1,idastar-symmulgt-transmul,2.96819,ok +6-6-75pct-2_52,1,idastar-symmullt-transmul,6.61241,ok +6-6-75pct-2_52,1,astar-symmullt-transmul,1.6081,ok +6-6-75pct-2_54,1,astar-symmulgt-transmul,46.3629,ok +6-6-75pct-2_54,1,idastar-symmulgt-transmul,5.54435,ok +6-6-75pct-2_54,1,idastar-symmullt-transmul,12.48078,ok +6-6-75pct-2_54,1,astar-symmullt-transmul,71.2605,ok +6-6-75pct-2_55,1,astar-symmulgt-transmul,305.535,ok +6-6-75pct-2_55,1,idastar-symmulgt-transmul,24.80155,ok +6-6-75pct-2_55,1,idastar-symmullt-transmul,25.86962,ok +6-6-75pct-2_55,1,astar-symmullt-transmul,294.146,ok +6-6-75pct-2_57,1,astar-symmulgt-transmul,177.931,ok +6-6-75pct-2_57,1,idastar-symmulgt-transmul,16.65304,ok +6-6-75pct-2_57,1,idastar-symmullt-transmul,15.61298,ok +6-6-75pct-2_57,1,astar-symmullt-transmul,172.631,ok +6-6-75pct-2_58,1,astar-symmulgt-transmul,4.11226,ok +6-6-75pct-2_58,1,idastar-symmulgt-transmul,1.08807,ok +6-6-75pct-2_58,1,idastar-symmullt-transmul,1.16407,ok +6-6-75pct-2_58,1,astar-symmullt-transmul,4.12026,ok +6-6-75pct-2_59,1,astar-symmulgt-transmul,90.0976,ok +6-6-75pct-2_59,1,idastar-symmulgt-transmul,12.8968,ok +6-6-75pct-2_59,1,idastar-symmullt-transmul,11.72873,ok +6-6-75pct-2_59,1,astar-symmullt-transmul,143.137,ok +6-6-75pct-2_60,1,astar-symmulgt-transmul,480.51,ok +6-6-75pct-2_60,1,idastar-symmulgt-transmul,42.97869,ok +6-6-75pct-2_60,1,idastar-symmullt-transmul,91.84174,ok +6-6-75pct-2_60,1,astar-symmullt-transmul,758.483,ok +6-6-75pct-2_62,1,astar-symmulgt-transmul,14.1849,ok +6-6-75pct-2_62,1,idastar-symmulgt-transmul,3.49222,ok +6-6-75pct-2_62,1,idastar-symmullt-transmul,5.23233,ok +6-6-75pct-2_62,1,astar-symmullt-transmul,46.9229,ok +6-6-75pct-2_63,1,astar-symmulgt-transmul,1.14407,ok +6-6-75pct-2_63,1,idastar-symmulgt-transmul,1.14407,ok +6-6-75pct-2_63,1,idastar-symmullt-transmul,2.12013,ok +6-6-75pct-2_63,1,astar-symmullt-transmul,2.35615,ok +6-6-75pct-2_66,1,astar-symmulgt-transmul,3.71223,ok 
+6-6-75pct-2_66,1,idastar-symmulgt-transmul,9.73661,ok +6-6-75pct-2_66,1,idastar-symmullt-transmul,3.86424,ok +6-6-75pct-2_66,1,astar-symmullt-transmul,8.34452,ok +6-6-75pct-2_67,1,astar-symmulgt-transmul,8.24851,ok +6-6-75pct-2_67,1,idastar-symmulgt-transmul,0.99206,ok +6-6-75pct-2_67,1,idastar-symmullt-transmul,1.11207,ok +6-6-75pct-2_67,1,astar-symmullt-transmul,12.7488,ok +6-6-75pct-2_68,1,astar-symmulgt-transmul,18.2411,ok +6-6-75pct-2_68,1,idastar-symmulgt-transmul,14.83693,ok +6-6-75pct-2_68,1,idastar-symmullt-transmul,27.78174,ok +6-6-75pct-2_68,1,astar-symmullt-transmul,33.1581,ok +6-6-75pct-2_70,1,astar-symmulgt-transmul,0.604037,ok +6-6-75pct-2_70,1,idastar-symmulgt-transmul,4.64429,ok +6-6-75pct-2_70,1,idastar-symmullt-transmul,5.51634,ok +6-6-75pct-2_70,1,astar-symmullt-transmul,1.20408,ok +6-6-75pct-2_71,1,astar-symmulgt-transmul,6.4724,ok +6-6-75pct-2_71,1,idastar-symmulgt-transmul,8.0645,ok +6-6-75pct-2_71,1,idastar-symmullt-transmul,17.02506,ok +6-6-75pct-2_71,1,astar-symmullt-transmul,5.58835,ok +6-6-75pct-2_72,1,astar-symmulgt-transmul,258.008,ok +6-6-75pct-2_72,1,idastar-symmulgt-transmul,28.89381,ok +6-6-75pct-2_72,1,idastar-symmullt-transmul,25.44959,ok +6-6-75pct-2_72,1,astar-symmullt-transmul,200.625,ok +6-6-75pct-2_73,1,astar-symmulgt-transmul,2.67617,ok +6-6-75pct-2_73,1,idastar-symmulgt-transmul,1.39609,ok +6-6-75pct-2_73,1,idastar-symmullt-transmul,1.16807,ok +6-6-75pct-2_73,1,astar-symmullt-transmul,6.23639,ok +6-6-75pct-2_74,1,astar-symmulgt-transmul,41.2106,ok +6-6-75pct-2_74,1,idastar-symmulgt-transmul,9.04457,ok +6-6-75pct-2_74,1,idastar-symmullt-transmul,10.20864,ok +6-6-75pct-2_74,1,astar-symmullt-transmul,76.7968,ok +6-6-75pct-2_75,1,astar-symmulgt-transmul,2.52016,ok +6-6-75pct-2_75,1,idastar-symmulgt-transmul,1.74011,ok +6-6-75pct-2_75,1,idastar-symmullt-transmul,5.54835,ok +6-6-75pct-2_75,1,astar-symmullt-transmul,5.41634,ok +6-6-75pct-2_76,1,astar-symmulgt-transmul,33.0741,ok 
+6-6-75pct-2_76,1,idastar-symmulgt-transmul,0.080005,ok +6-6-75pct-2_76,1,idastar-symmullt-transmul,0.208013,ok +6-6-75pct-2_76,1,astar-symmullt-transmul,288.734,ok +6-6-75pct-2_78,1,astar-symmulgt-transmul,2.33214,ok +6-6-75pct-2_78,1,idastar-symmulgt-transmul,0.288018,ok +6-6-75pct-2_78,1,idastar-symmullt-transmul,0.564035,ok +6-6-75pct-2_78,1,astar-symmullt-transmul,2.32414,ok +6-6-75pct-2_81,1,astar-symmulgt-transmul,5.04432,ok +6-6-75pct-2_81,1,idastar-symmulgt-transmul,28.51778,ok +6-6-75pct-2_81,1,idastar-symmullt-transmul,32.40602,ok +6-6-75pct-2_81,1,astar-symmullt-transmul,5.09232,ok +6-6-75pct-2_82,1,astar-symmulgt-transmul,15.837,ok +6-6-75pct-2_82,1,idastar-symmulgt-transmul,8.50853,ok +6-6-75pct-2_82,1,idastar-symmullt-transmul,21.80536,ok +6-6-75pct-2_82,1,astar-symmullt-transmul,16.073,ok +6-6-75pct-2_84,1,astar-symmulgt-transmul,16.349,ok +6-6-75pct-2_84,1,idastar-symmulgt-transmul,1.78411,ok +6-6-75pct-2_84,1,idastar-symmullt-transmul,2.57616,ok +6-6-75pct-2_84,1,astar-symmullt-transmul,18.0851,ok +6-6-75pct-2_86,1,astar-symmulgt-transmul,13.4168,ok +6-6-75pct-2_86,1,idastar-symmulgt-transmul,9.94862,ok +6-6-75pct-2_86,1,idastar-symmullt-transmul,30.78992,ok +6-6-75pct-2_86,1,astar-symmullt-transmul,13.8129,ok +6-6-75pct-2_87,1,astar-symmulgt-transmul,2.06413,ok +6-6-75pct-2_87,1,idastar-symmulgt-transmul,1.5721,ok +6-6-75pct-2_87,1,idastar-symmullt-transmul,2.14413,ok +6-6-75pct-2_87,1,astar-symmullt-transmul,2.07213,ok +6-6-75pct-2_88,1,astar-symmulgt-transmul,0.408024,ok +6-6-75pct-2_88,1,idastar-symmulgt-transmul,1.5681,ok +6-6-75pct-2_88,1,idastar-symmullt-transmul,1.6041,ok +6-6-75pct-2_88,1,astar-symmullt-transmul,0.400024,ok +6-6-75pct-2_9,1,astar-symmulgt-transmul,1.42409,ok +6-6-75pct-2_9,1,idastar-symmulgt-transmul,0.17201,ok +6-6-75pct-2_9,1,idastar-symmullt-transmul,0.260016,ok +6-6-75pct-2_9,1,astar-symmullt-transmul,1.05207,ok +6-6-75pct-2_90,1,astar-symmulgt-transmul,6.08838,ok +6-6-75pct-2_90,1,idastar-symmulgt-transmul,6.13638,ok 
+6-6-75pct-2_90,1,idastar-symmullt-transmul,11.78474,ok +6-6-75pct-2_90,1,astar-symmullt-transmul,6.64841,ok +6-6-75pct-2_91,1,astar-symmulgt-transmul,39.4585,ok +6-6-75pct-2_91,1,idastar-symmulgt-transmul,18.94918,ok +6-6-75pct-2_91,1,idastar-symmullt-transmul,41.84661,ok +6-6-75pct-2_91,1,astar-symmullt-transmul,41.3546,ok +6-6-75pct-2_92,1,astar-symmulgt-transmul,0.756047,ok +6-6-75pct-2_92,1,idastar-symmulgt-transmul,7.54847,ok +6-6-75pct-2_92,1,idastar-symmullt-transmul,7.34846,ok +6-6-75pct-2_92,1,astar-symmullt-transmul,1.76011,ok +6-6-75pct-2_93,1,astar-symmulgt-transmul,13.6929,ok +6-6-75pct-2_93,1,idastar-symmulgt-transmul,7.84049,ok +6-6-75pct-2_93,1,idastar-symmullt-transmul,12.03675,ok +6-6-75pct-2_93,1,astar-symmullt-transmul,54.4194,ok +6-6-75pct-2_94,1,astar-symmulgt-transmul,3.54022,ok +6-6-75pct-2_94,1,idastar-symmulgt-transmul,1.50809,ok +6-6-75pct-2_94,1,idastar-symmullt-transmul,1.88012,ok +6-6-75pct-2_94,1,astar-symmullt-transmul,6.3764,ok +6-6-75pct-2_97,1,astar-symmulgt-transmul,0.436026,ok +6-6-75pct-2_97,1,idastar-symmulgt-transmul,0.96406,ok +6-6-75pct-2_97,1,idastar-symmullt-transmul,1.52009,ok +6-6-75pct-2_97,1,astar-symmullt-transmul,0.49203,ok +6-6-75pct-2_98,1,astar-symmulgt-transmul,27.8297,ok +6-6-75pct-2_98,1,idastar-symmulgt-transmul,12.48078,ok +6-6-75pct-2_98,1,idastar-symmullt-transmul,22.61341,ok +6-6-75pct-2_98,1,astar-symmullt-transmul,33.6861,ok +6-6-75pct-2_99,1,astar-symmulgt-transmul,2.05213,ok +6-6-75pct-2_99,1,idastar-symmulgt-transmul,0.728045,ok +6-6-75pct-2_99,1,idastar-symmullt-transmul,1.44809,ok +6-6-75pct-2_99,1,astar-symmullt-transmul,1.5481,ok +8-8-75pct-2_104,1,astar-symmulgt-transmul,145.165,ok +8-8-75pct-2_104,1,idastar-symmulgt-transmul,44.30277,ok +8-8-75pct-2_104,1,idastar-symmullt-transmul,74.79667,ok +8-8-75pct-2_104,1,astar-symmullt-transmul,142.133,ok +8-8-75pct-2_19,1,astar-symmulgt-transmul,252.408,ok +8-8-75pct-2_19,1,idastar-symmulgt-transmul,22.97744,ok 
+8-8-75pct-2_19,1,idastar-symmullt-transmul,23.48147,ok +8-8-75pct-2_19,1,astar-symmullt-transmul,34.4582,ok +8-8-75pct-2_196,1,astar-symmulgt-transmul,38.4504,ok +8-8-75pct-2_196,1,idastar-symmulgt-transmul,117.29533,ok +8-8-75pct-2_196,1,idastar-symmullt-transmul,1324.33076,ok +8-8-75pct-2_196,1,astar-symmullt-transmul,37.9704,ok +8-8-75pct-2_66,1,astar-symmulgt-transmul,538.846,ok +8-8-75pct-2_66,1,idastar-symmulgt-transmul,210.40515,ok +8-8-75pct-2_66,1,idastar-symmullt-transmul,56.60754,ok +8-8-75pct-2_66,1,astar-symmullt-transmul,461.857,ok +8-8-75pct-2_86,1,astar-symmulgt-transmul,9.52859,ok +8-8-75pct-2_86,1,idastar-symmulgt-transmul,26.71367,ok +8-8-75pct-2_86,1,idastar-symmullt-transmul,43.47072,ok +8-8-75pct-2_86,1,astar-symmullt-transmul,30.3379,ok +tdata-3-7-23,1,astar-symmulgt-transmul,1.19607,ok +tdata-3-7-23,1,idastar-symmulgt-transmul,0.140008,ok +tdata-3-7-23,1,idastar-symmullt-transmul,0.148009,ok +tdata-3-7-23,1,astar-symmullt-transmul,1.41209,ok +tdata-3-8-19,1,astar-symmulgt-transmul,1.22408,ok +tdata-3-8-19,1,idastar-symmulgt-transmul,2.10013,ok +tdata-3-8-19,1,idastar-symmullt-transmul,5.25233,ok +tdata-3-8-19,1,astar-symmullt-transmul,1.08407,ok +tdata-3-8-36,1,astar-symmulgt-transmul,8.25652,ok +tdata-3-8-36,1,idastar-symmulgt-transmul,1.35608,ok +tdata-3-8-36,1,idastar-symmullt-transmul,2.55216,ok +tdata-3-8-36,1,astar-symmullt-transmul,2.97619,ok +tdata-3-8-39,1,astar-symmulgt-transmul,0.636039,ok +tdata-3-8-39,1,idastar-symmulgt-transmul,0.532033,ok +tdata-3-8-39,1,idastar-symmullt-transmul,1.24408,ok +tdata-3-8-39,1,astar-symmullt-transmul,0.520032,ok +tdata-4-4-16,1,astar-symmulgt-transmul,1.92412,ok +tdata-4-4-16,1,idastar-symmulgt-transmul,0.88005,ok +tdata-4-4-16,1,idastar-symmullt-transmul,1.08407,ok +tdata-4-4-16,1,astar-symmullt-transmul,3.86024,ok +tdata-4-4-17,1,astar-symmulgt-transmul,7.46447,ok +tdata-4-4-17,1,idastar-symmulgt-transmul,2.30014,ok +tdata-4-4-17,1,idastar-symmullt-transmul,2.08013,ok 
+tdata-4-4-17,1,astar-symmullt-transmul,6.82043,ok +tdata-4-4-19,1,astar-symmulgt-transmul,3.84424,ok +tdata-4-4-19,1,idastar-symmulgt-transmul,1.06407,ok +tdata-4-4-19,1,idastar-symmullt-transmul,0.92806,ok +tdata-4-4-19,1,astar-symmullt-transmul,7.40046,ok +tdata-4-4-21,1,astar-symmulgt-transmul,3.60422,ok +tdata-4-4-21,1,idastar-symmulgt-transmul,2.54416,ok +tdata-4-4-21,1,idastar-symmullt-transmul,2.92818,ok +tdata-4-4-21,1,astar-symmullt-transmul,3.07219,ok +tdata-4-4-22,1,astar-symmulgt-transmul,1.04807,ok +tdata-4-4-22,1,idastar-symmulgt-transmul,0.256016,ok +tdata-4-4-22,1,idastar-symmullt-transmul,0.264016,ok +tdata-4-4-22,1,astar-symmullt-transmul,1.03206,ok +tdata-4-4-23,1,astar-symmulgt-transmul,16.8051,ok +tdata-4-4-23,1,idastar-symmulgt-transmul,3.39221,ok +tdata-4-4-23,1,idastar-symmullt-transmul,2.81618,ok +tdata-4-4-23,1,astar-symmullt-transmul,29.2698,ok +tdata-4-4-27,1,astar-symmulgt-transmul,1.83211,ok +tdata-4-4-27,1,idastar-symmulgt-transmul,0.232014,ok +tdata-4-4-27,1,idastar-symmullt-transmul,0.272017,ok +tdata-4-4-27,1,astar-symmullt-transmul,0.856053,ok +tdata-4-4-29,1,astar-symmulgt-transmul,54.1394,ok +tdata-4-4-29,1,idastar-symmulgt-transmul,8.30452,ok +tdata-4-4-29,1,idastar-symmullt-transmul,10.04463,ok +tdata-4-4-29,1,astar-symmullt-transmul,76.1928,ok +tdata-4-4-31,1,astar-symmulgt-transmul,0.956059,ok +tdata-4-4-31,1,idastar-symmulgt-transmul,0.17201,ok +tdata-4-4-31,1,idastar-symmullt-transmul,0.204012,ok +tdata-4-4-31,1,astar-symmullt-transmul,1.13607,ok +tdata-4-4-4,1,astar-symmulgt-transmul,0.884054,ok +tdata-4-4-4,1,idastar-symmulgt-transmul,0.96006,ok +tdata-4-4-4,1,idastar-symmullt-transmul,1.36008,ok +tdata-4-4-4,1,astar-symmullt-transmul,1.20408,ok +tdata-4-4-8,1,astar-symmulgt-transmul,212.449,ok +tdata-4-4-8,1,idastar-symmulgt-transmul,14.13688,ok +tdata-4-4-8,1,idastar-symmullt-transmul,15.62098,ok +tdata-4-4-8,1,astar-symmullt-transmul,230.794,ok +tdata-4-5-11,1,astar-symmulgt-transmul,2.10013,ok 
+tdata-4-5-11,1,idastar-symmulgt-transmul,2.53216,ok +tdata-4-5-11,1,idastar-symmullt-transmul,2.03213,ok +tdata-4-5-11,1,astar-symmullt-transmul,4.19226,ok +tdata-4-5-14,1,astar-symmulgt-transmul,26.0376,ok +tdata-4-5-14,1,idastar-symmulgt-transmul,3.37621,ok +tdata-4-5-14,1,idastar-symmullt-transmul,3.86024,ok +tdata-4-5-14,1,astar-symmullt-transmul,44.1828,ok +tdata-4-5-15,1,astar-symmulgt-transmul,1.11207,ok +tdata-4-5-15,1,idastar-symmulgt-transmul,0.300018,ok +tdata-4-5-15,1,idastar-symmullt-transmul,0.380023,ok +tdata-4-5-15,1,astar-symmullt-transmul,2.20814,ok +tdata-4-5-18,1,astar-symmulgt-transmul,2.80017,ok +tdata-4-5-18,1,idastar-symmulgt-transmul,2.08013,ok +tdata-4-5-18,1,idastar-symmullt-transmul,2.16013,ok +tdata-4-5-18,1,astar-symmullt-transmul,5.69235,ok +tdata-4-5-19,1,astar-symmulgt-transmul,17.1331,ok +tdata-4-5-19,1,idastar-symmulgt-transmul,13.71686,ok +tdata-4-5-19,1,idastar-symmullt-transmul,26.21364,ok +tdata-4-5-19,1,astar-symmullt-transmul,32.066,ok +tdata-4-5-2,1,astar-symmulgt-transmul,32.702,ok +tdata-4-5-2,1,idastar-symmulgt-transmul,9.78861,ok +tdata-4-5-2,1,idastar-symmullt-transmul,11.96475,ok +tdata-4-5-2,1,astar-symmullt-transmul,48.571,ok +tdata-4-5-22,1,astar-symmulgt-transmul,1.32008,ok +tdata-4-5-22,1,idastar-symmulgt-transmul,0.448027,ok +tdata-4-5-22,1,idastar-symmullt-transmul,0.48003,ok +tdata-4-5-22,1,astar-symmullt-transmul,1.6961,ok +tdata-4-5-23,1,astar-symmulgt-transmul,8.24851,ok +tdata-4-5-23,1,idastar-symmulgt-transmul,3.86424,ok +tdata-4-5-23,1,idastar-symmullt-transmul,4.42428,ok +tdata-4-5-23,1,astar-symmullt-transmul,13.1168,ok +tdata-4-5-24,1,astar-symmulgt-transmul,0.560034,ok +tdata-4-5-24,1,idastar-symmulgt-transmul,0.84405,ok +tdata-4-5-24,1,idastar-symmullt-transmul,1.28808,ok +tdata-4-5-24,1,astar-symmullt-transmul,1.14407,ok +tdata-4-5-25,1,astar-symmulgt-transmul,1.06407,ok +tdata-4-5-25,1,idastar-symmulgt-transmul,0.220013,ok +tdata-4-5-25,1,idastar-symmullt-transmul,0.192012,ok 
+tdata-4-5-25,1,astar-symmullt-transmul,1.89212,ok +tdata-4-5-27,1,astar-symmulgt-transmul,90.4136,ok +tdata-4-5-27,1,idastar-symmulgt-transmul,17.46909,ok +tdata-4-5-27,1,idastar-symmullt-transmul,23.34546,ok +tdata-4-5-27,1,astar-symmullt-transmul,109.855,ok +tdata-4-5-28,1,astar-symmulgt-transmul,0.592036,ok +tdata-4-5-28,1,idastar-symmulgt-transmul,0.460028,ok +tdata-4-5-28,1,idastar-symmullt-transmul,0.284017,ok +tdata-4-5-28,1,astar-symmullt-transmul,1.24008,ok +tdata-4-5-29,1,astar-symmulgt-transmul,21.9494,ok +tdata-4-5-29,1,idastar-symmulgt-transmul,3.39621,ok +tdata-4-5-29,1,idastar-symmullt-transmul,2.64417,ok +tdata-4-5-29,1,astar-symmullt-transmul,38.4824,ok +tdata-4-5-31,1,astar-symmulgt-transmul,89.0816,ok +tdata-4-5-31,1,idastar-symmulgt-transmul,15.58897,ok +tdata-4-5-31,1,idastar-symmullt-transmul,22.30939,ok +tdata-4-5-31,1,astar-symmullt-transmul,180.983,ok +tdata-4-5-33,1,astar-symmulgt-transmul,1.15607,ok +tdata-4-5-33,1,idastar-symmulgt-transmul,0.588036,ok +tdata-4-5-33,1,idastar-symmullt-transmul,0.70804,ok +tdata-4-5-33,1,astar-symmullt-transmul,2.24414,ok +tdata-4-5-35,1,astar-symmulgt-transmul,0.18001,ok +tdata-4-5-35,1,idastar-symmulgt-transmul,0.316019,ok +tdata-4-5-35,1,idastar-symmullt-transmul,0.032002,ok +tdata-4-5-35,1,astar-symmullt-transmul,1.16007,ok +tdata-4-5-36,1,astar-symmulgt-transmul,84.9813,ok +tdata-4-5-36,1,idastar-symmulgt-transmul,8.88856,ok +tdata-4-5-36,1,idastar-symmullt-transmul,12.35677,ok +tdata-4-5-36,1,astar-symmullt-transmul,122.208,ok +tdata-4-5-37,1,astar-symmulgt-transmul,3.12019,ok +tdata-4-5-37,1,idastar-symmulgt-transmul,1.35608,ok +tdata-4-5-37,1,idastar-symmullt-transmul,1.15607,ok +tdata-4-5-37,1,astar-symmullt-transmul,6.14838,ok +tdata-4-5-38,1,astar-symmulgt-transmul,4.8123,ok +tdata-4-5-38,1,idastar-symmulgt-transmul,1.86012,ok +tdata-4-5-38,1,idastar-symmullt-transmul,1.03606,ok +tdata-4-5-38,1,astar-symmullt-transmul,8.66454,ok +tdata-4-5-40,1,astar-symmulgt-transmul,0.788048,ok 
+tdata-4-5-40,1,idastar-symmulgt-transmul,0.112007,ok +tdata-4-5-40,1,idastar-symmullt-transmul,0.132008,ok +tdata-4-5-40,1,astar-symmullt-transmul,1.52009,ok +tdata-4-5-5,1,astar-symmulgt-transmul,11.4607,ok +tdata-4-5-5,1,idastar-symmulgt-transmul,2.45615,ok +tdata-4-5-5,1,idastar-symmullt-transmul,2.02013,ok +tdata-4-5-5,1,astar-symmullt-transmul,19.4572,ok +tdata-4-5-7,1,astar-symmulgt-transmul,89.6056,ok +tdata-4-5-7,1,idastar-symmulgt-transmul,19.1372,ok +tdata-4-5-7,1,idastar-symmullt-transmul,28.20976,ok +tdata-4-5-7,1,astar-symmullt-transmul,147.945,ok +tdata-4-5-8,1,astar-symmulgt-transmul,4.35627,ok +tdata-4-5-8,1,idastar-symmulgt-transmul,0.98406,ok +tdata-4-5-8,1,idastar-symmullt-transmul,0.644039,ok +tdata-4-5-8,1,astar-symmullt-transmul,7.72848,ok +tdata-4-5-9,1,astar-symmulgt-transmul,1.6201,ok +tdata-4-5-9,1,idastar-symmulgt-transmul,0.48403,ok +tdata-4-5-9,1,idastar-symmullt-transmul,0.596037,ok +tdata-4-5-9,1,astar-symmullt-transmul,3.05619,ok +tdata-4-6-1,1,astar-symmulgt-transmul,30.5859,ok +tdata-4-6-1,1,idastar-symmulgt-transmul,4.22826,ok +tdata-4-6-1,1,idastar-symmullt-transmul,3.08819,ok +tdata-4-6-1,1,astar-symmullt-transmul,46.7509,ok +tdata-4-6-10,1,astar-symmulgt-transmul,1568.26,ok +tdata-4-6-10,1,idastar-symmulgt-transmul,408.16151,ok +tdata-4-6-10,1,idastar-symmullt-transmul,604.44578,ok +tdata-4-6-10,1,astar-symmullt-transmul,1287.81,ok +tdata-4-6-11,1,astar-symmulgt-transmul,0.888055,ok +tdata-4-6-11,1,idastar-symmulgt-transmul,2.00413,ok +tdata-4-6-11,1,idastar-symmullt-transmul,2.76017,ok +tdata-4-6-11,1,astar-symmullt-transmul,1.6361,ok +tdata-4-6-12,1,astar-symmulgt-transmul,19.5052,ok +tdata-4-6-12,1,idastar-symmulgt-transmul,6.85243,ok +tdata-4-6-12,1,idastar-symmullt-transmul,13.00881,ok +tdata-4-6-12,1,astar-symmullt-transmul,60.5318,ok +tdata-4-6-13,1,astar-symmulgt-transmul,1.80411,ok +tdata-4-6-13,1,idastar-symmulgt-transmul,0.348021,ok +tdata-4-6-13,1,idastar-symmullt-transmul,0.512032,ok 
+tdata-4-6-13,1,astar-symmullt-transmul,4.25227,ok +tdata-4-6-14,1,astar-symmulgt-transmul,1.20007,ok +tdata-4-6-14,1,idastar-symmulgt-transmul,2.92018,ok +tdata-4-6-14,1,idastar-symmullt-transmul,4.47628,ok +tdata-4-6-14,1,astar-symmullt-transmul,2.30814,ok +tdata-4-6-16,1,astar-symmulgt-transmul,2.82418,ok +tdata-4-6-16,1,idastar-symmulgt-transmul,1.26408,ok +tdata-4-6-16,1,idastar-symmullt-transmul,2.12013,ok +tdata-4-6-16,1,astar-symmullt-transmul,3.77624,ok +tdata-4-6-17,1,astar-symmulgt-transmul,1.35608,ok +tdata-4-6-17,1,idastar-symmulgt-transmul,3.34821,ok +tdata-4-6-17,1,idastar-symmullt-transmul,3.34021,ok +tdata-4-6-17,1,astar-symmullt-transmul,2.82818,ok +tdata-4-6-19,1,astar-symmulgt-transmul,10.3446,ok +tdata-4-6-19,1,idastar-symmulgt-transmul,3.60423,ok +tdata-4-6-19,1,idastar-symmullt-transmul,4.60429,ok +tdata-4-6-19,1,astar-symmullt-transmul,15.333,ok +tdata-4-6-2,1,astar-symmulgt-transmul,87.7375,ok +tdata-4-6-2,1,idastar-symmulgt-transmul,6.63241,ok +tdata-4-6-2,1,idastar-symmullt-transmul,9.01656,ok +tdata-4-6-2,1,astar-symmullt-transmul,116.267,ok +tdata-4-6-20,1,astar-symmulgt-transmul,10.4407,ok +tdata-4-6-20,1,idastar-symmulgt-transmul,0.17201,ok +tdata-4-6-20,1,idastar-symmullt-transmul,0.232014,ok +tdata-4-6-20,1,astar-symmullt-transmul,10.0606,ok +tdata-4-6-22,1,astar-symmulgt-transmul,3.80024,ok +tdata-4-6-22,1,idastar-symmulgt-transmul,3.68423,ok +tdata-4-6-22,1,idastar-symmullt-transmul,5.14032,ok +tdata-4-6-22,1,astar-symmullt-transmul,3.2242,ok +tdata-4-6-23,1,astar-symmulgt-transmul,248.444,ok +tdata-4-6-23,1,idastar-symmulgt-transmul,10.34865,ok +tdata-4-6-23,1,idastar-symmullt-transmul,10.62866,ok +tdata-4-6-23,1,astar-symmullt-transmul,178.011,ok +tdata-4-6-24,1,astar-symmulgt-transmul,34.5622,ok +tdata-4-6-24,1,idastar-symmulgt-transmul,7.77649,ok +tdata-4-6-24,1,idastar-symmullt-transmul,7.29645,ok +tdata-4-6-24,1,astar-symmullt-transmul,56.2795,ok +tdata-4-6-26,1,astar-symmulgt-transmul,0.588036,ok 
+tdata-4-6-26,1,idastar-symmulgt-transmul,1.32408,ok +tdata-4-6-26,1,idastar-symmullt-transmul,1.6961,ok +tdata-4-6-26,1,astar-symmullt-transmul,1.26008,ok +tdata-4-6-27,1,astar-symmulgt-transmul,0.280016,ok +tdata-4-6-27,1,idastar-symmulgt-transmul,3.04819,ok +tdata-4-6-27,1,idastar-symmullt-transmul,2.99619,ok +tdata-4-6-27,1,astar-symmullt-transmul,0.596037,ok +tdata-4-6-3,1,astar-symmulgt-transmul,1.13207,ok +tdata-4-6-3,1,idastar-symmulgt-transmul,3.11619,ok +tdata-4-6-3,1,idastar-symmullt-transmul,4.48428,ok +tdata-4-6-3,1,astar-symmullt-transmul,2.23614,ok +tdata-4-6-31,1,astar-symmulgt-transmul,96.434,ok +tdata-4-6-31,1,idastar-symmulgt-transmul,12.50878,ok +tdata-4-6-31,1,idastar-symmullt-transmul,18.52916,ok +tdata-4-6-31,1,astar-symmullt-transmul,163.686,ok +tdata-4-6-33,1,astar-symmulgt-transmul,174.315,ok +tdata-4-6-33,1,idastar-symmulgt-transmul,12.05675,ok +tdata-4-6-33,1,idastar-symmullt-transmul,20.13326,ok +tdata-4-6-33,1,astar-symmullt-transmul,184.784,ok +tdata-4-6-35,1,astar-symmulgt-transmul,1.09207,ok +tdata-4-6-35,1,idastar-symmulgt-transmul,0.704043,ok +tdata-4-6-35,1,idastar-symmullt-transmul,0.84005,ok +tdata-4-6-35,1,astar-symmullt-transmul,1.40009,ok +tdata-4-6-36,1,astar-symmulgt-transmul,0.448027,ok +tdata-4-6-36,1,idastar-symmulgt-transmul,0.584036,ok +tdata-4-6-36,1,idastar-symmullt-transmul,0.792049,ok +tdata-4-6-36,1,astar-symmullt-transmul,4.44428,ok +tdata-4-6-38,1,astar-symmulgt-transmul,4.03225,ok +tdata-4-6-38,1,idastar-symmulgt-transmul,0.528033,ok +tdata-4-6-38,1,idastar-symmullt-transmul,0.908056,ok +tdata-4-6-38,1,astar-symmullt-transmul,7.32046,ok +tdata-4-6-4,1,astar-symmulgt-transmul,35.4102,ok +tdata-4-6-4,1,idastar-symmulgt-transmul,3.35621,ok +tdata-4-6-4,1,idastar-symmullt-transmul,4.61229,ok +tdata-4-6-4,1,astar-symmullt-transmul,59.3157,ok +tdata-4-6-40,1,astar-symmulgt-transmul,96.794,ok +tdata-4-6-40,1,idastar-symmulgt-transmul,8.0125,ok +tdata-4-6-40,1,idastar-symmullt-transmul,6.74442,ok 
+tdata-4-6-40,1,astar-symmullt-transmul,153.71,ok +tdata-4-6-5,1,astar-symmulgt-transmul,4.01225,ok +tdata-4-6-5,1,idastar-symmulgt-transmul,2.58816,ok +tdata-4-6-5,1,idastar-symmullt-transmul,3.44821,ok +tdata-4-6-5,1,astar-symmullt-transmul,3.97225,ok +tdata-4-6-6,1,astar-symmulgt-transmul,167.718,ok +tdata-4-6-6,1,idastar-symmulgt-transmul,18.22914,ok +tdata-4-6-6,1,idastar-symmullt-transmul,24.38152,ok +tdata-4-6-6,1,astar-symmullt-transmul,174.547,ok +tdata-4-6-7,1,astar-symmulgt-transmul,1.80411,ok +tdata-4-6-7,1,idastar-symmulgt-transmul,0.336021,ok +tdata-4-6-7,1,idastar-symmullt-transmul,0.276017,ok +tdata-4-6-7,1,astar-symmullt-transmul,1.95612,ok +tdata-4-6-8,1,astar-symmulgt-transmul,46.3229,ok +tdata-4-6-8,1,idastar-symmulgt-transmul,8.0205,ok +tdata-4-6-8,1,idastar-symmullt-transmul,13.53285,ok +tdata-4-6-8,1,astar-symmullt-transmul,78.6169,ok +tdata-4-7-10,1,astar-symmulgt-transmul,597.189,ok +tdata-4-7-10,1,idastar-symmulgt-transmul,19.56922,ok +tdata-4-7-10,1,idastar-symmullt-transmul,18.33315,ok +tdata-4-7-10,1,astar-symmullt-transmul,539.422,ok +tdata-4-7-12,1,astar-symmulgt-transmul,13.2768,ok +tdata-4-7-12,1,idastar-symmulgt-transmul,1.92812,ok +tdata-4-7-12,1,idastar-symmullt-transmul,4.8123,ok +tdata-4-7-12,1,astar-symmullt-transmul,12.3168,ok +tdata-4-7-13,1,astar-symmulgt-transmul,222.614,ok +tdata-4-7-13,1,idastar-symmulgt-transmul,2.20414,ok +tdata-4-7-13,1,idastar-symmullt-transmul,6.06438,ok +tdata-4-7-13,1,astar-symmullt-transmul,275.197,ok +tdata-4-7-14,1,astar-symmulgt-transmul,1299.67,ok +tdata-4-7-14,1,idastar-symmulgt-transmul,129.37208,ok +tdata-4-7-14,1,idastar-symmullt-transmul,114.41915,ok +tdata-4-7-14,1,astar-symmullt-transmul,3600,memout +tdata-4-7-17,1,astar-symmulgt-transmul,7.06044,ok +tdata-4-7-17,1,idastar-symmulgt-transmul,1.35208,ok +tdata-4-7-17,1,idastar-symmullt-transmul,1.42809,ok +tdata-4-7-17,1,astar-symmullt-transmul,6.66442,ok +tdata-4-7-19,1,astar-symmulgt-transmul,4.43628,ok 
+tdata-4-7-19,1,idastar-symmulgt-transmul,0.392024,ok +tdata-4-7-19,1,idastar-symmullt-transmul,1.02806,ok +tdata-4-7-19,1,astar-symmullt-transmul,3.84024,ok +tdata-4-7-2,1,astar-symmulgt-transmul,6.13238,ok +tdata-4-7-2,1,idastar-symmulgt-transmul,4.42828,ok +tdata-4-7-2,1,idastar-symmullt-transmul,8.63654,ok +tdata-4-7-2,1,astar-symmullt-transmul,2.98418,ok +tdata-4-7-22,1,astar-symmulgt-transmul,0.104006,ok +tdata-4-7-22,1,idastar-symmulgt-transmul,0.552034,ok +tdata-4-7-22,1,idastar-symmullt-transmul,1.09207,ok +tdata-4-7-22,1,astar-symmullt-transmul,0.16801,ok +tdata-4-7-23,1,astar-symmulgt-transmul,23.8255,ok +tdata-4-7-23,1,idastar-symmulgt-transmul,3.44821,ok +tdata-4-7-23,1,idastar-symmullt-transmul,6.73642,ok +tdata-4-7-23,1,astar-symmullt-transmul,13.0768,ok +tdata-4-7-24,1,astar-symmulgt-transmul,5.82036,ok +tdata-4-7-24,1,idastar-symmulgt-transmul,0.768048,ok +tdata-4-7-24,1,idastar-symmullt-transmul,1.6601,ok +tdata-4-7-24,1,astar-symmullt-transmul,3.32421,ok +tdata-4-7-25,1,astar-symmulgt-transmul,5.03231,ok +tdata-4-7-25,1,idastar-symmulgt-transmul,1.35608,ok +tdata-4-7-25,1,idastar-symmullt-transmul,4.02825,ok +tdata-4-7-25,1,astar-symmullt-transmul,2.74417,ok +tdata-4-7-26,1,astar-symmulgt-transmul,1.90012,ok +tdata-4-7-26,1,idastar-symmulgt-transmul,0.936058,ok +tdata-4-7-26,1,idastar-symmullt-transmul,1.01206,ok +tdata-4-7-26,1,astar-symmullt-transmul,0.896055,ok +tdata-4-7-28,1,astar-symmulgt-transmul,1.85611,ok +tdata-4-7-28,1,idastar-symmulgt-transmul,0.668041,ok +tdata-4-7-28,1,idastar-symmullt-transmul,1.17607,ok +tdata-4-7-28,1,astar-symmullt-transmul,0.96806,ok +tdata-4-7-29,1,astar-symmulgt-transmul,10.4287,ok +tdata-4-7-29,1,idastar-symmulgt-transmul,1.41609,ok +tdata-4-7-29,1,idastar-symmullt-transmul,2.31214,ok +tdata-4-7-29,1,astar-symmullt-transmul,5.62435,ok +tdata-4-7-30,1,astar-symmulgt-transmul,76.4568,ok +tdata-4-7-30,1,idastar-symmulgt-transmul,10.70867,ok +tdata-4-7-30,1,idastar-symmullt-transmul,31.39396,ok 
+tdata-4-7-30,1,astar-symmullt-transmul,54.1794,ok +tdata-4-7-33,1,astar-symmulgt-transmul,17.7731,ok +tdata-4-7-33,1,idastar-symmulgt-transmul,7.26045,ok +tdata-4-7-33,1,idastar-symmullt-transmul,9.97662,ok +tdata-4-7-33,1,astar-symmullt-transmul,17.2371,ok +tdata-4-7-35,1,astar-symmulgt-transmul,200.525,ok +tdata-4-7-35,1,idastar-symmulgt-transmul,57.70361,ok +tdata-4-7-35,1,idastar-symmullt-transmul,159.51797,ok +tdata-4-7-35,1,astar-symmullt-transmul,178.915,ok +tdata-4-7-36,1,astar-symmulgt-transmul,58.4917,ok +tdata-4-7-36,1,idastar-symmulgt-transmul,1.50809,ok +tdata-4-7-36,1,idastar-symmullt-transmul,5.30833,ok +tdata-4-7-36,1,astar-symmullt-transmul,25.9856,ok +tdata-4-7-38,1,astar-symmulgt-transmul,17.7251,ok +tdata-4-7-38,1,idastar-symmulgt-transmul,2.02813,ok +tdata-4-7-38,1,idastar-symmullt-transmul,3.72023,ok +tdata-4-7-38,1,astar-symmullt-transmul,16.497,ok +tdata-4-7-4,1,astar-symmulgt-transmul,162.45,ok +tdata-4-7-4,1,idastar-symmulgt-transmul,18.16914,ok +tdata-4-7-4,1,idastar-symmullt-transmul,33.02206,ok +tdata-4-7-4,1,astar-symmullt-transmul,188.788,ok +tdata-4-7-8,1,astar-symmulgt-transmul,1.93612,ok +tdata-4-7-8,1,idastar-symmulgt-transmul,1.70811,ok +tdata-4-7-8,1,idastar-symmullt-transmul,2.44815,ok +tdata-4-7-8,1,astar-symmullt-transmul,1.21608,ok +tdata-4-7-9,1,astar-symmulgt-transmul,1.14007,ok +tdata-4-7-9,1,idastar-symmulgt-transmul,0.272017,ok +tdata-4-7-9,1,idastar-symmullt-transmul,0.424026,ok +tdata-4-7-9,1,astar-symmullt-transmul,1.30808,ok +tdata-5-10-13,1,astar-symmulgt-transmul,429.047,ok +tdata-5-10-13,1,idastar-symmulgt-transmul,20.39727,ok +tdata-5-10-13,1,idastar-symmullt-transmul,47.943,ok +tdata-5-10-13,1,astar-symmullt-transmul,3600,memout +tdata-5-10-18,1,astar-symmulgt-transmul,247.299,ok +tdata-5-10-18,1,idastar-symmulgt-transmul,158.3979,ok +tdata-5-10-18,1,idastar-symmullt-transmul,183.9355,ok +tdata-5-10-18,1,astar-symmullt-transmul,250.272,ok +tdata-5-4-1,1,astar-symmulgt-transmul,85.0773,ok 
+tdata-5-4-1,1,idastar-symmulgt-transmul,21.68935,ok +tdata-5-4-1,1,idastar-symmullt-transmul,23.80949,ok +tdata-5-4-1,1,astar-symmullt-transmul,112.607,ok +tdata-5-4-13,1,astar-symmulgt-transmul,509.236,ok +tdata-5-4-13,1,idastar-symmulgt-transmul,20.24126,ok +tdata-5-4-13,1,idastar-symmullt-transmul,18.28114,ok +tdata-5-4-13,1,astar-symmullt-transmul,449.224,ok +tdata-5-4-14,1,astar-symmulgt-transmul,4.06025,ok +tdata-5-4-14,1,idastar-symmulgt-transmul,1.89612,ok +tdata-5-4-14,1,idastar-symmullt-transmul,2.06813,ok +tdata-5-4-14,1,astar-symmullt-transmul,4.12426,ok +tdata-5-4-18,1,astar-symmulgt-transmul,58.2356,ok +tdata-5-4-18,1,idastar-symmulgt-transmul,24.12551,ok +tdata-5-4-18,1,idastar-symmullt-transmul,26.79767,ok +tdata-5-4-18,1,astar-symmullt-transmul,57.6556,ok +tdata-5-4-19,1,astar-symmulgt-transmul,1032.43,ok +tdata-5-4-19,1,idastar-symmulgt-transmul,175.89899,ok +tdata-5-4-19,1,idastar-symmullt-transmul,170.68267,ok +tdata-5-4-19,1,astar-symmullt-transmul,827.208,ok +tdata-5-4-2,1,astar-symmulgt-transmul,14.5769,ok +tdata-5-4-2,1,idastar-symmulgt-transmul,12.14076,ok +tdata-5-4-2,1,idastar-symmullt-transmul,13.58085,ok +tdata-5-4-2,1,astar-symmullt-transmul,12.8448,ok +tdata-5-4-20,1,astar-symmulgt-transmul,47.447,ok +tdata-5-4-20,1,idastar-symmulgt-transmul,8.12451,ok +tdata-5-4-20,1,idastar-symmullt-transmul,9.18857,ok +tdata-5-4-20,1,astar-symmullt-transmul,43.2667,ok +tdata-5-4-21,1,astar-symmulgt-transmul,7.9645,ok +tdata-5-4-21,1,idastar-symmulgt-transmul,2.67617,ok +tdata-5-4-21,1,idastar-symmullt-transmul,3.01219,ok +tdata-5-4-21,1,astar-symmullt-transmul,7.20845,ok +tdata-5-4-23,1,astar-symmulgt-transmul,68.0363,ok +tdata-5-4-23,1,idastar-symmulgt-transmul,21.18532,ok +tdata-5-4-23,1,idastar-symmullt-transmul,15.84099,ok +tdata-5-4-23,1,astar-symmullt-transmul,102.726,ok +tdata-5-4-24,1,astar-symmulgt-transmul,430.823,ok +tdata-5-4-24,1,idastar-symmulgt-transmul,125.97187,ok +tdata-5-4-24,1,idastar-symmullt-transmul,129.20808,ok 
+tdata-5-4-24,1,astar-symmullt-transmul,467.333,ok +tdata-5-4-32,1,astar-symmulgt-transmul,78.2409,ok +tdata-5-4-32,1,idastar-symmulgt-transmul,17.04106,ok +tdata-5-4-32,1,idastar-symmullt-transmul,23.44147,ok +tdata-5-4-32,1,astar-symmullt-transmul,87.5495,ok +tdata-5-4-37,1,astar-symmulgt-transmul,1296.51,ok +tdata-5-4-37,1,idastar-symmulgt-transmul,171.07469,ok +tdata-5-4-37,1,idastar-symmullt-transmul,212.53728,ok +tdata-5-4-37,1,astar-symmullt-transmul,966.024,ok +tdata-5-4-39,1,astar-symmulgt-transmul,93.9819,ok +tdata-5-4-39,1,idastar-symmulgt-transmul,26.53766,ok +tdata-5-4-39,1,idastar-symmullt-transmul,25.82961,ok +tdata-5-4-39,1,astar-symmullt-transmul,118.319,ok +tdata-5-4-4,1,astar-symmulgt-transmul,1768.44,ok +tdata-5-4-4,1,idastar-symmulgt-transmul,271.92499,ok +tdata-5-4-4,1,idastar-symmullt-transmul,190.14788,ok +tdata-5-4-4,1,astar-symmullt-transmul,1710.21,ok +tdata-5-4-6,1,astar-symmulgt-transmul,9.50059,ok +tdata-5-4-6,1,idastar-symmulgt-transmul,3.30021,ok +tdata-5-4-6,1,idastar-symmullt-transmul,2.97619,ok +tdata-5-4-6,1,astar-symmullt-transmul,8.62454,ok +tdata-5-4-8,1,astar-symmulgt-transmul,288.11,ok +tdata-5-4-8,1,idastar-symmulgt-transmul,100.01825,ok +tdata-5-4-8,1,idastar-symmullt-transmul,96.31802,ok +tdata-5-4-8,1,astar-symmullt-transmul,267.297,ok +tdata-5-5-1,1,astar-symmulgt-transmul,1359.47,ok +tdata-5-5-1,1,idastar-symmulgt-transmul,20.18526,ok +tdata-5-5-1,1,idastar-symmullt-transmul,36.41828,ok +tdata-5-5-1,1,astar-symmullt-transmul,3600,memout +tdata-5-5-12,1,astar-symmulgt-transmul,1334.64,ok +tdata-5-5-12,1,idastar-symmulgt-transmul,34.20614,ok +tdata-5-5-12,1,idastar-symmullt-transmul,28.65779,ok +tdata-5-5-12,1,astar-symmullt-transmul,1335.55,ok +tdata-5-5-15,1,astar-symmulgt-transmul,101.514,ok +tdata-5-5-15,1,idastar-symmulgt-transmul,18.99319,ok +tdata-5-5-15,1,idastar-symmullt-transmul,30.08988,ok +tdata-5-5-15,1,astar-symmullt-transmul,109.903,ok +tdata-5-5-17,1,astar-symmulgt-transmul,8.74455,ok 
+tdata-5-5-17,1,idastar-symmulgt-transmul,12.66879,ok +tdata-5-5-17,1,idastar-symmullt-transmul,12.95281,ok +tdata-5-5-17,1,astar-symmullt-transmul,8.29252,ok +tdata-5-5-19,1,astar-symmulgt-transmul,194.824,ok +tdata-5-5-19,1,idastar-symmulgt-transmul,59.89974,ok +tdata-5-5-19,1,idastar-symmullt-transmul,82.60516,ok +tdata-5-5-19,1,astar-symmullt-transmul,167.266,ok +tdata-5-5-25,1,astar-symmulgt-transmul,28.5818,ok +tdata-5-5-25,1,idastar-symmulgt-transmul,19.91324,ok +tdata-5-5-25,1,idastar-symmullt-transmul,26.10563,ok +tdata-5-5-25,1,astar-symmullt-transmul,27.5937,ok +tdata-5-5-28,1,astar-symmulgt-transmul,459.761,ok +tdata-5-5-28,1,idastar-symmulgt-transmul,139.42871,ok +tdata-5-5-28,1,idastar-symmullt-transmul,155.1697,ok +tdata-5-5-28,1,astar-symmullt-transmul,414.678,ok +tdata-5-5-30,1,astar-symmulgt-transmul,325.692,ok +tdata-5-5-30,1,idastar-symmulgt-transmul,57.17557,ok +tdata-5-5-30,1,idastar-symmullt-transmul,82.38915,ok +tdata-5-5-30,1,astar-symmullt-transmul,357.03,ok +tdata-5-5-5,1,astar-symmulgt-transmul,37.8944,ok +tdata-5-5-5,1,idastar-symmulgt-transmul,44.00675,ok +tdata-5-5-5,1,idastar-symmullt-transmul,41.82261,ok +tdata-5-5-5,1,astar-symmullt-transmul,34.2821,ok +tdata-5-5-7,1,astar-symmulgt-transmul,951.339,ok +tdata-5-5-7,1,idastar-symmulgt-transmul,57.07157,ok +tdata-5-5-7,1,idastar-symmullt-transmul,65.6361,ok +tdata-5-5-7,1,astar-symmullt-transmul,1200.91,ok +tdata-5-6-25,1,astar-symmulgt-transmul,6.32439,ok +tdata-5-6-25,1,idastar-symmulgt-transmul,3.82024,ok +tdata-5-6-25,1,idastar-symmullt-transmul,4.11226,ok +tdata-5-6-25,1,astar-symmullt-transmul,10.2726,ok +tdata-5-6-26,1,astar-symmulgt-transmul,25.3816,ok +tdata-5-6-26,1,idastar-symmulgt-transmul,25.5656,ok +tdata-5-6-26,1,idastar-symmullt-transmul,24.21751,ok +tdata-5-6-26,1,astar-symmullt-transmul,25.8976,ok +tdata-5-6-3,1,astar-symmulgt-transmul,899.204,ok +tdata-5-6-3,1,idastar-symmulgt-transmul,34.61416,ok +tdata-5-6-3,1,idastar-symmullt-transmul,41.09057,ok 
+tdata-5-6-3,1,astar-symmullt-transmul,911.229,ok +tdata-5-6-32,1,astar-symmulgt-transmul,10.5447,ok +tdata-5-6-32,1,idastar-symmulgt-transmul,8.76455,ok +tdata-5-6-32,1,idastar-symmullt-transmul,11.50472,ok +tdata-5-6-32,1,astar-symmullt-transmul,10.6687,ok +tdata-5-6-34,1,astar-symmulgt-transmul,135.24,ok +tdata-5-6-34,1,idastar-symmulgt-transmul,31.21795,ok +tdata-5-6-34,1,idastar-symmullt-transmul,85.32133,ok +tdata-5-6-34,1,astar-symmullt-transmul,216.362,ok +tdata-5-6-39,1,astar-symmulgt-transmul,588.817,ok +tdata-5-6-39,1,idastar-symmulgt-transmul,215.57347,ok +tdata-5-6-39,1,idastar-symmullt-transmul,30.3819,ok +tdata-5-6-39,1,astar-symmullt-transmul,557.675,ok +tdata-5-6-7,1,astar-symmulgt-transmul,69.2043,ok +tdata-5-6-7,1,idastar-symmulgt-transmul,17.71711,ok +tdata-5-6-7,1,idastar-symmullt-transmul,23.08544,ok +tdata-5-6-7,1,astar-symmullt-transmul,179.459,ok +tdata-5-7-9,1,astar-symmulgt-transmul,26.5057,ok +tdata-5-7-9,1,idastar-symmulgt-transmul,6.48041,ok +tdata-5-7-9,1,idastar-symmullt-transmul,7.17645,ok +tdata-5-7-9,1,astar-symmullt-transmul,47.679,ok +tdata-5-8-31,1,astar-symmulgt-transmul,854.133,ok +tdata-5-8-31,1,idastar-symmulgt-transmul,92.16976,ok +tdata-5-8-31,1,idastar-symmullt-transmul,260.71229,ok +tdata-5-8-31,1,astar-symmullt-transmul,838.956,ok +tdata-5-9-38,1,astar-symmulgt-transmul,538.938,ok +tdata-5-9-38,1,idastar-symmulgt-transmul,28.05375,ok +tdata-5-9-38,1,idastar-symmullt-transmul,32.88605,ok +tdata-5-9-38,1,astar-symmullt-transmul,518.852,ok +BFT-10_16_8_77_15_58-1,1,astar-symmulgt-transmul,356.054,ok +BFT-10_16_8_77_15_58-1,1,idastar-symmulgt-transmul,3600,timeout +BFT-10_16_8_77_15_58-1,1,idastar-symmullt-transmul,3600,timeout +BFT-10_16_8_77_15_58-1,1,astar-symmullt-transmul,60.8998,ok +BFT-10_16_8_77_15_58-16,1,astar-symmulgt-transmul,87.2735,ok +BFT-10_16_8_77_15_58-16,1,idastar-symmulgt-transmul,3202.05211,ok +BFT-10_16_8_77_15_58-16,1,idastar-symmullt-transmul,1141.02731,ok 
+BFT-10_16_8_77_15_58-16,1,astar-symmullt-transmul,3600,memout +BFT-11_16_8_77_31_46-12,1,astar-symmulgt-transmul,69.0083,ok +BFT-11_16_8_77_31_46-12,1,idastar-symmulgt-transmul,3600,timeout +BFT-11_16_8_77_31_46-12,1,idastar-symmullt-transmul,3600,timeout +BFT-11_16_8_77_31_46-12,1,astar-symmullt-transmul,278.237,ok +BFT-17_20_5_60_12_36-1,1,astar-symmulgt-transmul,0.49203,ok +BFT-17_20_5_60_12_36-1,1,idastar-symmulgt-transmul,350.94193,ok +BFT-17_20_5_60_12_36-1,1,idastar-symmullt-transmul,260.39627,ok +BFT-17_20_5_60_12_36-1,1,astar-symmullt-transmul,0.49203,ok +BFT-17_20_5_60_12_36-11,1,astar-symmulgt-transmul,67.1802,ok +BFT-17_20_5_60_12_36-11,1,idastar-symmulgt-transmul,3600,timeout +BFT-17_20_5_60_12_36-11,1,idastar-symmullt-transmul,3600,timeout +BFT-17_20_5_60_12_36-11,1,astar-symmullt-transmul,3600,memout +BFT-17_20_5_60_12_36-14,1,astar-symmulgt-transmul,6.71642,ok +BFT-17_20_5_60_12_36-14,1,idastar-symmulgt-transmul,3600,timeout +BFT-17_20_5_60_12_36-14,1,idastar-symmullt-transmul,3600,timeout +BFT-17_20_5_60_12_36-14,1,astar-symmullt-transmul,0.060003,ok +BFT-17_20_5_60_12_36-15,1,astar-symmulgt-transmul,14.5529,ok +BFT-17_20_5_60_12_36-15,1,idastar-symmulgt-transmul,3600,timeout +BFT-17_20_5_60_12_36-15,1,idastar-symmullt-transmul,3600,timeout +BFT-17_20_5_60_12_36-15,1,astar-symmullt-transmul,0.024001,ok +BFT-17_20_5_60_12_36-17,1,astar-symmulgt-transmul,0.064004,ok +BFT-17_20_5_60_12_36-17,1,idastar-symmulgt-transmul,3600,timeout +BFT-17_20_5_60_12_36-17,1,idastar-symmullt-transmul,3600,timeout +BFT-17_20_5_60_12_36-17,1,astar-symmullt-transmul,0.040002,ok +BFT-17_20_5_60_12_36-19,1,astar-symmulgt-transmul,154.346,ok +BFT-17_20_5_60_12_36-19,1,idastar-symmulgt-transmul,2917.22631,ok +BFT-17_20_5_60_12_36-19,1,idastar-symmullt-transmul,3600,timeout +BFT-17_20_5_60_12_36-19,1,astar-symmullt-transmul,0.180011,ok +BFT-17_20_5_60_12_36-2,1,astar-symmulgt-transmul,0.544033,ok +BFT-17_20_5_60_12_36-2,1,idastar-symmulgt-transmul,3600,timeout 
+BFT-17_20_5_60_12_36-2,1,idastar-symmullt-transmul,3600,timeout +BFT-17_20_5_60_12_36-2,1,astar-symmullt-transmul,0.036002,ok +BFT-17_20_5_60_12_36-3,1,astar-symmulgt-transmul,12.7768,ok +BFT-17_20_5_60_12_36-3,1,idastar-symmulgt-transmul,3600,timeout +BFT-17_20_5_60_12_36-3,1,idastar-symmullt-transmul,3600,timeout +BFT-17_20_5_60_12_36-3,1,astar-symmullt-transmul,7.00044,ok +BFT-17_20_5_60_12_36-6,1,astar-symmulgt-transmul,31.498,ok +BFT-17_20_5_60_12_36-6,1,idastar-symmulgt-transmul,0.368022,ok +BFT-17_20_5_60_12_36-6,1,idastar-symmullt-transmul,0.004,ok +BFT-17_20_5_60_12_36-6,1,astar-symmullt-transmul,0.944058,ok +BFT-17_20_5_60_12_36-7,1,astar-symmulgt-transmul,2.25614,ok +BFT-17_20_5_60_12_36-7,1,idastar-symmulgt-transmul,3600,timeout +BFT-17_20_5_60_12_36-7,1,idastar-symmullt-transmul,3600,timeout +BFT-17_20_5_60_12_36-7,1,astar-symmullt-transmul,0.020001,ok +BFT-17_20_5_60_12_36-8,1,astar-symmulgt-transmul,36.3863,ok +BFT-17_20_5_60_12_36-8,1,idastar-symmulgt-transmul,3600,timeout +BFT-17_20_5_60_12_36-8,1,idastar-symmullt-transmul,3600,timeout +BFT-17_20_5_60_12_36-8,1,astar-symmullt-transmul,12.7968,ok +BFT-17_20_5_60_12_36-9,1,astar-symmulgt-transmul,0.164009,ok +BFT-17_20_5_60_12_36-9,1,idastar-symmulgt-transmul,0.040002,ok +BFT-17_20_5_60_12_36-9,1,idastar-symmullt-transmul,0.004,ok +BFT-17_20_5_60_12_36-9,1,astar-symmullt-transmul,3.06019,ok +BFT-18_20_5_60_12_45-11,1,astar-symmulgt-transmul,1.01206,ok +BFT-18_20_5_60_12_45-11,1,idastar-symmulgt-transmul,6.69242,ok +BFT-18_20_5_60_12_45-11,1,idastar-symmullt-transmul,0.004,ok +BFT-18_20_5_60_12_45-11,1,astar-symmullt-transmul,0.020001,ok +BFT-18_20_5_60_12_45-13,1,astar-symmulgt-transmul,0.044002,ok +BFT-18_20_5_60_12_45-13,1,idastar-symmulgt-transmul,159.11394,ok +BFT-18_20_5_60_12_45-13,1,idastar-symmullt-transmul,3600,timeout +BFT-18_20_5_60_12_45-13,1,astar-symmullt-transmul,3600,memout +BFT-18_20_5_60_12_45-15,1,astar-symmulgt-transmul,0.304018,ok 
+BFT-18_20_5_60_12_45-15,1,idastar-symmulgt-transmul,3600,timeout +BFT-18_20_5_60_12_45-15,1,idastar-symmullt-transmul,3600,timeout +BFT-18_20_5_60_12_45-15,1,astar-symmullt-transmul,3600,memout +BFT-18_20_5_60_12_45-16,1,astar-symmulgt-transmul,12.0168,ok +BFT-18_20_5_60_12_45-16,1,idastar-symmulgt-transmul,3.83624,ok +BFT-18_20_5_60_12_45-16,1,idastar-symmullt-transmul,0.004,ok +BFT-18_20_5_60_12_45-16,1,astar-symmullt-transmul,3600,memout +BFT-18_20_5_60_12_45-20,1,astar-symmulgt-transmul,0.032001,ok +BFT-18_20_5_60_12_45-20,1,idastar-symmulgt-transmul,3600,timeout +BFT-18_20_5_60_12_45-20,1,idastar-symmullt-transmul,3600,timeout +BFT-18_20_5_60_12_45-20,1,astar-symmullt-transmul,0.056003,ok +BFT-18_20_5_60_12_45-3,1,astar-symmulgt-transmul,0.064004,ok +BFT-18_20_5_60_12_45-3,1,idastar-symmulgt-transmul,0.144009,ok +BFT-18_20_5_60_12_45-3,1,idastar-symmullt-transmul,831.50797,ok +BFT-18_20_5_60_12_45-3,1,astar-symmullt-transmul,3600,memout +BFT-18_20_5_60_12_45-5,1,astar-symmulgt-transmul,3.81224,ok +BFT-18_20_5_60_12_45-5,1,idastar-symmulgt-transmul,3600,timeout +BFT-18_20_5_60_12_45-5,1,idastar-symmullt-transmul,3600,timeout +BFT-18_20_5_60_12_45-5,1,astar-symmullt-transmul,0.028001,ok +BFT-18_20_5_60_12_45-6,1,astar-symmulgt-transmul,14.2889,ok +BFT-18_20_5_60_12_45-6,1,idastar-symmulgt-transmul,3600,timeout +BFT-18_20_5_60_12_45-6,1,idastar-symmullt-transmul,3600,timeout +BFT-18_20_5_60_12_45-6,1,astar-symmullt-transmul,0.264015,ok +BFT-18_20_5_60_12_45-7,1,astar-symmulgt-transmul,0.232014,ok +BFT-18_20_5_60_12_45-7,1,idastar-symmulgt-transmul,3600,timeout +BFT-18_20_5_60_12_45-7,1,idastar-symmullt-transmul,0.48403,ok +BFT-18_20_5_60_12_45-7,1,astar-symmullt-transmul,0.032002,ok +BFT-18_20_5_60_12_45-8,1,astar-symmulgt-transmul,0.216013,ok +BFT-18_20_5_60_12_45-8,1,idastar-symmulgt-transmul,3600,timeout +BFT-18_20_5_60_12_45-8,1,idastar-symmullt-transmul,0.020001,ok +BFT-18_20_5_60_12_45-8,1,astar-symmullt-transmul,31.794,ok 
+BFT-19_20_5_60_24_36-11,1,astar-symmulgt-transmul,3.93625,ok +BFT-19_20_5_60_24_36-11,1,idastar-symmulgt-transmul,3600,timeout +BFT-19_20_5_60_24_36-11,1,idastar-symmullt-transmul,3600,timeout +BFT-19_20_5_60_24_36-11,1,astar-symmullt-transmul,3600,memout +BFT-19_20_5_60_24_36-12,1,astar-symmulgt-transmul,0.192011,ok +BFT-19_20_5_60_24_36-12,1,idastar-symmulgt-transmul,3600,timeout +BFT-19_20_5_60_24_36-12,1,idastar-symmullt-transmul,3600,timeout +BFT-19_20_5_60_24_36-12,1,astar-symmullt-transmul,1.91212,ok +BFT-19_20_5_60_24_36-13,1,astar-symmulgt-transmul,1.16407,ok +BFT-19_20_5_60_24_36-13,1,idastar-symmulgt-transmul,3600,timeout +BFT-19_20_5_60_24_36-13,1,idastar-symmullt-transmul,3600,timeout +BFT-19_20_5_60_24_36-13,1,astar-symmullt-transmul,3600,memout +BFT-19_20_5_60_24_36-14,1,astar-symmulgt-transmul,0.48403,ok +BFT-19_20_5_60_24_36-14,1,idastar-symmulgt-transmul,3600,timeout +BFT-19_20_5_60_24_36-14,1,idastar-symmullt-transmul,3600,timeout +BFT-19_20_5_60_24_36-14,1,astar-symmullt-transmul,0.456028,ok +BFT-19_20_5_60_24_36-18,1,astar-symmulgt-transmul,1.6841,ok +BFT-19_20_5_60_24_36-18,1,idastar-symmulgt-transmul,3600,timeout +BFT-19_20_5_60_24_36-18,1,idastar-symmullt-transmul,3600,timeout +BFT-19_20_5_60_24_36-18,1,astar-symmullt-transmul,0.028001,ok +BFT-19_20_5_60_24_36-20,1,astar-symmulgt-transmul,0.064004,ok +BFT-19_20_5_60_24_36-20,1,idastar-symmulgt-transmul,1.90412,ok +BFT-19_20_5_60_24_36-20,1,idastar-symmullt-transmul,0.004,ok +BFT-19_20_5_60_24_36-20,1,astar-symmullt-transmul,381.464,ok +BFT-19_20_5_60_24_36-8,1,astar-symmulgt-transmul,2.14813,ok +BFT-19_20_5_60_24_36-8,1,idastar-symmulgt-transmul,3600,timeout +BFT-19_20_5_60_24_36-8,1,idastar-symmullt-transmul,5.72836,ok +BFT-19_20_5_60_24_36-8,1,astar-symmullt-transmul,3600,memout +BFT-1_16_5_48_10_29-1,1,astar-symmulgt-transmul,80.661,ok +BFT-1_16_5_48_10_29-1,1,idastar-symmulgt-transmul,923.50972,ok +BFT-1_16_5_48_10_29-1,1,idastar-symmullt-transmul,1.13207,ok 
+BFT-1_16_5_48_10_29-1,1,astar-symmullt-transmul,0.064003,ok +BFT-1_16_5_48_10_29-10,1,astar-symmulgt-transmul,0.312018,ok +BFT-1_16_5_48_10_29-10,1,idastar-symmulgt-transmul,0.016001,ok +BFT-1_16_5_48_10_29-10,1,idastar-symmullt-transmul,0.004,ok +BFT-1_16_5_48_10_29-10,1,astar-symmullt-transmul,14.7089,ok +BFT-1_16_5_48_10_29-11,1,astar-symmulgt-transmul,0.012,ok +BFT-1_16_5_48_10_29-11,1,idastar-symmulgt-transmul,3600,timeout +BFT-1_16_5_48_10_29-11,1,idastar-symmullt-transmul,3600,timeout +BFT-1_16_5_48_10_29-11,1,astar-symmullt-transmul,0.024001,ok +BFT-1_16_5_48_10_29-14,1,astar-symmulgt-transmul,0.012,ok +BFT-1_16_5_48_10_29-14,1,idastar-symmulgt-transmul,0.024001,ok +BFT-1_16_5_48_10_29-14,1,idastar-symmullt-transmul,0.004,ok +BFT-1_16_5_48_10_29-14,1,astar-symmullt-transmul,3600,memout +BFT-1_16_5_48_10_29-15,1,astar-symmulgt-transmul,0.012,ok +BFT-1_16_5_48_10_29-15,1,idastar-symmulgt-transmul,141.74086,ok +BFT-1_16_5_48_10_29-15,1,idastar-symmullt-transmul,3600,timeout +BFT-1_16_5_48_10_29-15,1,astar-symmullt-transmul,0.016001,ok +BFT-1_16_5_48_10_29-16,1,astar-symmulgt-transmul,0.024001,ok +BFT-1_16_5_48_10_29-16,1,idastar-symmulgt-transmul,6.18439,ok +BFT-1_16_5_48_10_29-16,1,idastar-symmullt-transmul,44.01075,ok +BFT-1_16_5_48_10_29-16,1,astar-symmullt-transmul,0.012,ok +BFT-1_16_5_48_10_29-18,1,astar-symmulgt-transmul,0.012,ok +BFT-1_16_5_48_10_29-18,1,idastar-symmulgt-transmul,123.71173,ok +BFT-1_16_5_48_10_29-18,1,idastar-symmullt-transmul,183.18345,ok +BFT-1_16_5_48_10_29-18,1,astar-symmullt-transmul,0.016001,ok +BFT-1_16_5_48_10_29-19,1,astar-symmulgt-transmul,0.184011,ok +BFT-1_16_5_48_10_29-19,1,idastar-symmulgt-transmul,2320.70503,ok +BFT-1_16_5_48_10_29-19,1,idastar-symmullt-transmul,3600,timeout +BFT-1_16_5_48_10_29-19,1,astar-symmullt-transmul,0.284017,ok +BFT-1_16_5_48_10_29-2,1,astar-symmulgt-transmul,2.12013,ok +BFT-1_16_5_48_10_29-2,1,idastar-symmulgt-transmul,4.26827,ok +BFT-1_16_5_48_10_29-2,1,idastar-symmullt-transmul,1.11207,ok 
+BFT-1_16_5_48_10_29-2,1,astar-symmullt-transmul,3.1922,ok +BFT-1_16_5_48_10_29-3,1,astar-symmulgt-transmul,1.13607,ok +BFT-1_16_5_48_10_29-3,1,idastar-symmulgt-transmul,3.1402,ok +BFT-1_16_5_48_10_29-3,1,idastar-symmullt-transmul,0.012,ok +BFT-1_16_5_48_10_29-3,1,astar-symmullt-transmul,0.020001,ok +BFT-1_16_5_48_10_29-4,1,astar-symmulgt-transmul,4.45628,ok +BFT-1_16_5_48_10_29-4,1,idastar-symmulgt-transmul,13.01681,ok +BFT-1_16_5_48_10_29-4,1,idastar-symmullt-transmul,0.004,ok +BFT-1_16_5_48_10_29-4,1,astar-symmullt-transmul,0.016001,ok +BFT-1_16_5_48_10_29-5,1,astar-symmulgt-transmul,0.288017,ok +BFT-1_16_5_48_10_29-5,1,idastar-symmulgt-transmul,3600,timeout +BFT-1_16_5_48_10_29-5,1,idastar-symmullt-transmul,3600,timeout +BFT-1_16_5_48_10_29-5,1,astar-symmullt-transmul,0.02,ok +BFT-1_16_5_48_10_29-6,1,astar-symmulgt-transmul,0.012,ok +BFT-1_16_5_48_10_29-6,1,idastar-symmulgt-transmul,3600,timeout +BFT-1_16_5_48_10_29-6,1,idastar-symmullt-transmul,3600,timeout +BFT-1_16_5_48_10_29-6,1,astar-symmullt-transmul,0.020001,ok +BFT-1_16_5_48_10_29-8,1,astar-symmulgt-transmul,3.47222,ok +BFT-1_16_5_48_10_29-8,1,idastar-symmulgt-transmul,1.00006,ok +BFT-1_16_5_48_10_29-8,1,idastar-symmullt-transmul,5.89237,ok +BFT-1_16_5_48_10_29-8,1,astar-symmullt-transmul,1.34008,ok +BFT-1_16_5_48_10_29-9,1,astar-symmulgt-transmul,0.228014,ok +BFT-1_16_5_48_10_29-9,1,idastar-symmulgt-transmul,0.632039,ok +BFT-1_16_5_48_10_29-9,1,idastar-symmullt-transmul,1.50009,ok +BFT-1_16_5_48_10_29-9,1,astar-symmullt-transmul,0.316019,ok +BFT-20_20_5_60_24_45-14,1,astar-symmulgt-transmul,0.396024,ok +BFT-20_20_5_60_24_45-14,1,idastar-symmulgt-transmul,4.41228,ok +BFT-20_20_5_60_24_45-14,1,idastar-symmullt-transmul,0.004,ok +BFT-20_20_5_60_24_45-14,1,astar-symmullt-transmul,0.724044,ok +BFT-20_20_5_60_24_45-18,1,astar-symmulgt-transmul,18.6132,ok +BFT-20_20_5_60_24_45-18,1,idastar-symmulgt-transmul,3600,timeout +BFT-20_20_5_60_24_45-18,1,idastar-symmullt-transmul,3600,timeout 
+BFT-20_20_5_60_24_45-18,1,astar-symmullt-transmul,0.308019,ok +BFT-20_20_5_60_24_45-19,1,astar-symmulgt-transmul,2.60016,ok +BFT-20_20_5_60_24_45-19,1,idastar-symmulgt-transmul,3600,timeout +BFT-20_20_5_60_24_45-19,1,idastar-symmullt-transmul,3600,timeout +BFT-20_20_5_60_24_45-19,1,astar-symmullt-transmul,13.3928,ok +BFT-20_20_5_60_24_45-6,1,astar-symmulgt-transmul,0.156009,ok +BFT-20_20_5_60_24_45-6,1,idastar-symmulgt-transmul,3600,timeout +BFT-20_20_5_60_24_45-6,1,idastar-symmullt-transmul,3600,timeout +BFT-20_20_5_60_24_45-6,1,astar-symmullt-transmul,1.88412,ok +BFT-20_20_5_60_24_45-9,1,astar-symmulgt-transmul,172.567,ok +BFT-20_20_5_60_24_45-9,1,idastar-symmulgt-transmul,14.3569,ok +BFT-20_20_5_60_24_45-9,1,idastar-symmullt-transmul,303.91499,ok +BFT-20_20_5_60_24_45-9,1,astar-symmullt-transmul,0.460028,ok +BFT-21_20_5_80_16_48-1,1,astar-symmulgt-transmul,147.837,ok +BFT-21_20_5_80_16_48-1,1,idastar-symmulgt-transmul,567.29545,ok +BFT-21_20_5_80_16_48-1,1,idastar-symmullt-transmul,30.84993,ok +BFT-21_20_5_80_16_48-1,1,astar-symmullt-transmul,4.33227,ok +BFT-21_20_5_80_16_48-12,1,astar-symmulgt-transmul,1.5961,ok +BFT-21_20_5_80_16_48-12,1,idastar-symmulgt-transmul,3600,timeout +BFT-21_20_5_80_16_48-12,1,idastar-symmullt-transmul,3600,timeout +BFT-21_20_5_80_16_48-12,1,astar-symmullt-transmul,0.412025,ok +BFT-21_20_5_80_16_48-13,1,astar-symmulgt-transmul,202.777,ok +BFT-21_20_5_80_16_48-13,1,idastar-symmulgt-transmul,3600,timeout +BFT-21_20_5_80_16_48-13,1,idastar-symmullt-transmul,3600,timeout +BFT-21_20_5_80_16_48-13,1,astar-symmullt-transmul,11.6047,ok +BFT-21_20_5_80_16_48-14,1,astar-symmulgt-transmul,78.3049,ok +BFT-21_20_5_80_16_48-14,1,idastar-symmulgt-transmul,3600,timeout +BFT-21_20_5_80_16_48-14,1,idastar-symmullt-transmul,3600,timeout +BFT-21_20_5_80_16_48-14,1,astar-symmullt-transmul,3600,memout +BFT-21_20_5_80_16_48-18,1,astar-symmulgt-transmul,0.544033,ok +BFT-21_20_5_80_16_48-18,1,idastar-symmulgt-transmul,3600,timeout 
+BFT-21_20_5_80_16_48-18,1,idastar-symmullt-transmul,3600,timeout +BFT-21_20_5_80_16_48-18,1,astar-symmullt-transmul,193.72,ok +BFT-21_20_5_80_16_48-3,1,astar-symmulgt-transmul,2.98819,ok +BFT-21_20_5_80_16_48-3,1,idastar-symmulgt-transmul,3600,timeout +BFT-21_20_5_80_16_48-3,1,idastar-symmullt-transmul,3600,timeout +BFT-21_20_5_80_16_48-3,1,astar-symmullt-transmul,0.224013,ok +BFT-21_20_5_80_16_48-8,1,astar-symmulgt-transmul,10.2406,ok +BFT-21_20_5_80_16_48-8,1,idastar-symmulgt-transmul,3600,timeout +BFT-21_20_5_80_16_48-8,1,idastar-symmullt-transmul,3600,timeout +BFT-21_20_5_80_16_48-8,1,astar-symmullt-transmul,98.9262,ok +BFT-22_20_5_80_16_60-14,1,astar-symmulgt-transmul,26.1656,ok +BFT-22_20_5_80_16_60-14,1,idastar-symmulgt-transmul,3600,timeout +BFT-22_20_5_80_16_60-14,1,idastar-symmullt-transmul,3600,timeout +BFT-22_20_5_80_16_60-14,1,astar-symmullt-transmul,3600,memout +BFT-22_20_5_80_16_60-17,1,astar-symmulgt-transmul,57.0116,ok +BFT-22_20_5_80_16_60-17,1,idastar-symmulgt-transmul,3600,timeout +BFT-22_20_5_80_16_60-17,1,idastar-symmullt-transmul,3600,timeout +BFT-22_20_5_80_16_60-17,1,astar-symmullt-transmul,3600,memout +BFT-22_20_5_80_16_60-18,1,astar-symmulgt-transmul,13.2768,ok +BFT-22_20_5_80_16_60-18,1,idastar-symmulgt-transmul,3600,timeout +BFT-22_20_5_80_16_60-18,1,idastar-symmullt-transmul,3600,timeout +BFT-22_20_5_80_16_60-18,1,astar-symmullt-transmul,59.6117,ok +BFT-22_20_5_80_16_60-20,1,astar-symmulgt-transmul,0.600036,ok +BFT-22_20_5_80_16_60-20,1,idastar-symmulgt-transmul,3600,timeout +BFT-22_20_5_80_16_60-20,1,idastar-symmullt-transmul,3600,timeout +BFT-22_20_5_80_16_60-20,1,astar-symmullt-transmul,249.1,ok +BFT-22_20_5_80_16_60-3,1,astar-symmulgt-transmul,1.30808,ok +BFT-22_20_5_80_16_60-3,1,idastar-symmulgt-transmul,3600,timeout +BFT-22_20_5_80_16_60-3,1,idastar-symmullt-transmul,3600,timeout +BFT-22_20_5_80_16_60-3,1,astar-symmullt-transmul,0.448027,ok +BFT-22_20_5_80_16_60-5,1,astar-symmulgt-transmul,13.7809,ok 
+BFT-22_20_5_80_16_60-5,1,idastar-symmulgt-transmul,3600,timeout +BFT-22_20_5_80_16_60-5,1,idastar-symmullt-transmul,3600,timeout +BFT-22_20_5_80_16_60-5,1,astar-symmullt-transmul,3600,memout +BFT-22_20_5_80_16_60-6,1,astar-symmulgt-transmul,0.476029,ok +BFT-22_20_5_80_16_60-6,1,idastar-symmulgt-transmul,3600,timeout +BFT-22_20_5_80_16_60-6,1,idastar-symmullt-transmul,3600,timeout +BFT-22_20_5_80_16_60-6,1,astar-symmullt-transmul,0.420025,ok +BFT-22_20_5_80_16_60-8,1,astar-symmulgt-transmul,1.48009,ok +BFT-22_20_5_80_16_60-8,1,idastar-symmulgt-transmul,3600,timeout +BFT-22_20_5_80_16_60-8,1,idastar-symmullt-transmul,3600,timeout +BFT-22_20_5_80_16_60-8,1,astar-symmullt-transmul,4.29227,ok +BFT-23_20_5_80_32_48-1,1,astar-symmulgt-transmul,10.1406,ok +BFT-23_20_5_80_32_48-1,1,idastar-symmulgt-transmul,3600,timeout +BFT-23_20_5_80_32_48-1,1,idastar-symmullt-transmul,3600,timeout +BFT-23_20_5_80_32_48-1,1,astar-symmullt-transmul,3600,memout +BFT-23_20_5_80_32_48-11,1,astar-symmulgt-transmul,0.804049,ok +BFT-23_20_5_80_32_48-11,1,idastar-symmulgt-transmul,3600,timeout +BFT-23_20_5_80_32_48-11,1,idastar-symmullt-transmul,3600,timeout +BFT-23_20_5_80_32_48-11,1,astar-symmullt-transmul,4.34427,ok +BFT-23_20_5_80_32_48-14,1,astar-symmulgt-transmul,0.452027,ok +BFT-23_20_5_80_32_48-14,1,idastar-symmulgt-transmul,3600,timeout +BFT-23_20_5_80_32_48-14,1,idastar-symmullt-transmul,3600,timeout +BFT-23_20_5_80_32_48-14,1,astar-symmullt-transmul,3600,memout +BFT-23_20_5_80_32_48-16,1,astar-symmulgt-transmul,389.204,ok +BFT-23_20_5_80_32_48-16,1,idastar-symmulgt-transmul,3600,timeout +BFT-23_20_5_80_32_48-16,1,idastar-symmullt-transmul,3600,timeout +BFT-23_20_5_80_32_48-16,1,astar-symmullt-transmul,2.06413,ok +BFT-23_20_5_80_32_48-19,1,astar-symmulgt-transmul,1.6801,ok +BFT-23_20_5_80_32_48-19,1,idastar-symmulgt-transmul,3600,timeout +BFT-23_20_5_80_32_48-19,1,idastar-symmullt-transmul,3600,timeout +BFT-23_20_5_80_32_48-19,1,astar-symmullt-transmul,2.04013,ok 
+BFT-23_20_5_80_32_48-2,1,astar-symmulgt-transmul,0.360021,ok +BFT-23_20_5_80_32_48-2,1,idastar-symmulgt-transmul,3600,timeout +BFT-23_20_5_80_32_48-2,1,idastar-symmullt-transmul,3600,timeout +BFT-23_20_5_80_32_48-2,1,astar-symmullt-transmul,3600,memout +BFT-23_20_5_80_32_48-8,1,astar-symmulgt-transmul,57.3236,ok +BFT-23_20_5_80_32_48-8,1,idastar-symmulgt-transmul,76.53678,ok +BFT-23_20_5_80_32_48-8,1,idastar-symmullt-transmul,151.38946,ok +BFT-23_20_5_80_32_48-8,1,astar-symmullt-transmul,12.1088,ok +BFT-24_20_5_80_32_60-1,1,astar-symmulgt-transmul,42.5787,ok +BFT-24_20_5_80_32_60-1,1,idastar-symmulgt-transmul,3600,timeout +BFT-24_20_5_80_32_60-1,1,idastar-symmullt-transmul,3600,timeout +BFT-24_20_5_80_32_60-1,1,astar-symmullt-transmul,3600,memout +BFT-24_20_5_80_32_60-10,1,astar-symmulgt-transmul,25.1896,ok +BFT-24_20_5_80_32_60-10,1,idastar-symmulgt-transmul,3600,timeout +BFT-24_20_5_80_32_60-10,1,idastar-symmullt-transmul,3600,timeout +BFT-24_20_5_80_32_60-10,1,astar-symmullt-transmul,7.66048,ok +BFT-24_20_5_80_32_60-14,1,astar-symmulgt-transmul,0.888055,ok +BFT-24_20_5_80_32_60-14,1,idastar-symmulgt-transmul,3600,timeout +BFT-24_20_5_80_32_60-14,1,idastar-symmullt-transmul,3600,timeout +BFT-24_20_5_80_32_60-14,1,astar-symmullt-transmul,0.216013,ok +BFT-24_20_5_80_32_60-18,1,astar-symmulgt-transmul,373.855,ok +BFT-24_20_5_80_32_60-18,1,idastar-symmulgt-transmul,3600,timeout +BFT-24_20_5_80_32_60-18,1,idastar-symmullt-transmul,3600,timeout +BFT-24_20_5_80_32_60-18,1,astar-symmullt-transmul,11.9687,ok +BFT-24_20_5_80_32_60-4,1,astar-symmulgt-transmul,27.2577,ok +BFT-24_20_5_80_32_60-4,1,idastar-symmulgt-transmul,3600,timeout +BFT-24_20_5_80_32_60-4,1,idastar-symmullt-transmul,3600,timeout +BFT-24_20_5_80_32_60-4,1,astar-symmullt-transmul,3600,memout +BFT-24_20_5_80_32_60-8,1,astar-symmulgt-transmul,204.377,ok +BFT-24_20_5_80_32_60-8,1,idastar-symmulgt-transmul,3600,timeout +BFT-24_20_5_80_32_60-8,1,idastar-symmullt-transmul,3600,timeout 
+BFT-24_20_5_80_32_60-8,1,astar-symmullt-transmul,14.6089,ok +BFT-25_20_8_96_19_58-12,1,astar-symmulgt-transmul,136.221,ok +BFT-25_20_8_96_19_58-12,1,idastar-symmulgt-transmul,3600,timeout +BFT-25_20_8_96_19_58-12,1,idastar-symmullt-transmul,3600,timeout +BFT-25_20_8_96_19_58-12,1,astar-symmullt-transmul,3600,memout +BFT-25_20_8_96_19_58-3,1,astar-symmulgt-transmul,12.9968,ok +BFT-25_20_8_96_19_58-3,1,idastar-symmulgt-transmul,3600,timeout +BFT-25_20_8_96_19_58-3,1,idastar-symmullt-transmul,3600,timeout +BFT-25_20_8_96_19_58-3,1,astar-symmullt-transmul,3600,memout +BFT-28_20_8_96_38_72-19,1,astar-symmulgt-transmul,100.462,ok +BFT-28_20_8_96_38_72-19,1,idastar-symmulgt-transmul,3600,timeout +BFT-28_20_8_96_38_72-19,1,idastar-symmullt-transmul,3600,timeout +BFT-28_20_8_96_38_72-19,1,astar-symmullt-transmul,376.039,ok +BFT-2_16_5_48_10_36-11,1,astar-symmulgt-transmul,0.052003,ok +BFT-2_16_5_48_10_36-11,1,idastar-symmulgt-transmul,3600,timeout +BFT-2_16_5_48_10_36-11,1,idastar-symmullt-transmul,3600,timeout +BFT-2_16_5_48_10_36-11,1,astar-symmullt-transmul,0.024001,ok +BFT-2_16_5_48_10_36-13,1,astar-symmulgt-transmul,232.115,ok +BFT-2_16_5_48_10_36-13,1,idastar-symmulgt-transmul,33.44609,ok +BFT-2_16_5_48_10_36-13,1,idastar-symmullt-transmul,0.004,ok +BFT-2_16_5_48_10_36-13,1,astar-symmullt-transmul,0.012,ok +BFT-2_16_5_48_10_36-14,1,astar-symmulgt-transmul,0.088005,ok +BFT-2_16_5_48_10_36-14,1,idastar-symmulgt-transmul,6.86043,ok +BFT-2_16_5_48_10_36-14,1,idastar-symmullt-transmul,136.76855,ok +BFT-2_16_5_48_10_36-14,1,astar-symmullt-transmul,0.012,ok +BFT-2_16_5_48_10_36-15,1,astar-symmulgt-transmul,0.244014,ok +BFT-2_16_5_48_10_36-15,1,idastar-symmulgt-transmul,3600,timeout +BFT-2_16_5_48_10_36-15,1,idastar-symmullt-transmul,3600,timeout +BFT-2_16_5_48_10_36-15,1,astar-symmullt-transmul,0.304018,ok +BFT-2_16_5_48_10_36-16,1,astar-symmulgt-transmul,0.020001,ok +BFT-2_16_5_48_10_36-16,1,idastar-symmulgt-transmul,4.8043,ok 
+BFT-2_16_5_48_10_36-16,1,idastar-symmullt-transmul,458.34064,ok +BFT-2_16_5_48_10_36-16,1,astar-symmullt-transmul,0.012,ok +BFT-2_16_5_48_10_36-17,1,astar-symmulgt-transmul,24.0855,ok +BFT-2_16_5_48_10_36-17,1,idastar-symmulgt-transmul,3600,timeout +BFT-2_16_5_48_10_36-17,1,idastar-symmullt-transmul,3600,timeout +BFT-2_16_5_48_10_36-17,1,astar-symmullt-transmul,0.064003,ok +BFT-2_16_5_48_10_36-18,1,astar-symmulgt-transmul,0.044002,ok +BFT-2_16_5_48_10_36-18,1,idastar-symmulgt-transmul,100.56228,ok +BFT-2_16_5_48_10_36-18,1,idastar-symmullt-transmul,0.012,ok +BFT-2_16_5_48_10_36-18,1,astar-symmullt-transmul,276.661,ok +BFT-2_16_5_48_10_36-20,1,astar-symmulgt-transmul,0.076004,ok +BFT-2_16_5_48_10_36-20,1,idastar-symmulgt-transmul,2.41615,ok +BFT-2_16_5_48_10_36-20,1,idastar-symmullt-transmul,3.61223,ok +BFT-2_16_5_48_10_36-20,1,astar-symmullt-transmul,0.024001,ok +BFT-2_16_5_48_10_36-3,1,astar-symmulgt-transmul,0.072004,ok +BFT-2_16_5_48_10_36-3,1,idastar-symmulgt-transmul,3600,timeout +BFT-2_16_5_48_10_36-3,1,idastar-symmullt-transmul,3600,timeout +BFT-2_16_5_48_10_36-3,1,astar-symmullt-transmul,5.10032,ok +BFT-2_16_5_48_10_36-4,1,astar-symmulgt-transmul,0.300018,ok +BFT-2_16_5_48_10_36-4,1,idastar-symmulgt-transmul,1.20407,ok +BFT-2_16_5_48_10_36-4,1,idastar-symmullt-transmul,0.0,ok +BFT-2_16_5_48_10_36-4,1,astar-symmullt-transmul,0.136008,ok +BFT-2_16_5_48_10_36-5,1,astar-symmulgt-transmul,11.3567,ok +BFT-2_16_5_48_10_36-5,1,idastar-symmulgt-transmul,3600,timeout +BFT-2_16_5_48_10_36-5,1,idastar-symmullt-transmul,3600,timeout +BFT-2_16_5_48_10_36-5,1,astar-symmullt-transmul,0.024001,ok +BFT-2_16_5_48_10_36-8,1,astar-symmulgt-transmul,162.602,ok +BFT-2_16_5_48_10_36-8,1,idastar-symmulgt-transmul,0.472029,ok +BFT-2_16_5_48_10_36-8,1,idastar-symmullt-transmul,0.0,ok +BFT-2_16_5_48_10_36-8,1,astar-symmullt-transmul,0.032002,ok +BFT-2_16_5_48_10_36-9,1,astar-symmulgt-transmul,0.044002,ok +BFT-2_16_5_48_10_36-9,1,idastar-symmulgt-transmul,9.75261,ok 
+BFT-2_16_5_48_10_36-9,1,idastar-symmullt-transmul,0.0,ok +BFT-2_16_5_48_10_36-9,1,astar-symmullt-transmul,0.008,ok +BFT-3_16_5_48_19_29-1,1,astar-symmulgt-transmul,0.508031,ok +BFT-3_16_5_48_19_29-1,1,idastar-symmulgt-transmul,327.35246,ok +BFT-3_16_5_48_19_29-1,1,idastar-symmullt-transmul,3600,timeout +BFT-3_16_5_48_19_29-1,1,astar-symmullt-transmul,0.016,ok +BFT-3_16_5_48_19_29-10,1,astar-symmulgt-transmul,21.6334,ok +BFT-3_16_5_48_19_29-10,1,idastar-symmulgt-transmul,157.32583,ok +BFT-3_16_5_48_19_29-10,1,idastar-symmullt-transmul,2206.28188,ok +BFT-3_16_5_48_19_29-10,1,astar-symmullt-transmul,0.904056,ok +BFT-3_16_5_48_19_29-11,1,astar-symmulgt-transmul,0.028001,ok +BFT-3_16_5_48_19_29-11,1,idastar-symmulgt-transmul,3600,timeout +BFT-3_16_5_48_19_29-11,1,idastar-symmullt-transmul,3600,timeout +BFT-3_16_5_48_19_29-11,1,astar-symmullt-transmul,0.748046,ok +BFT-3_16_5_48_19_29-14,1,astar-symmulgt-transmul,0.016,ok +BFT-3_16_5_48_19_29-14,1,idastar-symmulgt-transmul,0.024001,ok +BFT-3_16_5_48_19_29-14,1,idastar-symmullt-transmul,0.0,ok +BFT-3_16_5_48_19_29-14,1,astar-symmullt-transmul,3600,memout +BFT-3_16_5_48_19_29-15,1,astar-symmulgt-transmul,0.148009,ok +BFT-3_16_5_48_19_29-15,1,idastar-symmulgt-transmul,5.59635,ok +BFT-3_16_5_48_19_29-15,1,idastar-symmullt-transmul,0.940058,ok +BFT-3_16_5_48_19_29-15,1,astar-symmullt-transmul,1.02806,ok +BFT-3_16_5_48_19_29-16,1,astar-symmulgt-transmul,0.040002,ok +BFT-3_16_5_48_19_29-16,1,idastar-symmulgt-transmul,3600,timeout +BFT-3_16_5_48_19_29-16,1,idastar-symmullt-transmul,3600,timeout +BFT-3_16_5_48_19_29-16,1,astar-symmullt-transmul,22.4254,ok +BFT-3_16_5_48_19_29-17,1,astar-symmulgt-transmul,236.279,ok +BFT-3_16_5_48_19_29-17,1,idastar-symmulgt-transmul,67.2082,ok +BFT-3_16_5_48_19_29-17,1,idastar-symmullt-transmul,3217.8851,ok +BFT-3_16_5_48_19_29-17,1,astar-symmullt-transmul,3600,memout +BFT-3_16_5_48_19_29-18,1,astar-symmulgt-transmul,7.83649,ok +BFT-3_16_5_48_19_29-18,1,idastar-symmulgt-transmul,1442.65016,ok 
+BFT-3_16_5_48_19_29-18,1,idastar-symmullt-transmul,3600,timeout +BFT-3_16_5_48_19_29-18,1,astar-symmullt-transmul,88.6975,ok +BFT-3_16_5_48_19_29-19,1,astar-symmulgt-transmul,0.120007,ok +BFT-3_16_5_48_19_29-19,1,idastar-symmulgt-transmul,3600,timeout +BFT-3_16_5_48_19_29-19,1,idastar-symmullt-transmul,3600,timeout +BFT-3_16_5_48_19_29-19,1,astar-symmullt-transmul,0.028001,ok +BFT-3_16_5_48_19_29-3,1,astar-symmulgt-transmul,0.020001,ok +BFT-3_16_5_48_19_29-3,1,idastar-symmulgt-transmul,0.016,ok +BFT-3_16_5_48_19_29-3,1,idastar-symmullt-transmul,10.42465,ok +BFT-3_16_5_48_19_29-3,1,astar-symmullt-transmul,0.420025,ok +BFT-3_16_5_48_19_29-4,1,astar-symmulgt-transmul,0.036001,ok +BFT-3_16_5_48_19_29-4,1,idastar-symmulgt-transmul,3600,timeout +BFT-3_16_5_48_19_29-4,1,idastar-symmullt-transmul,3600,timeout +BFT-3_16_5_48_19_29-4,1,astar-symmullt-transmul,0.316019,ok +BFT-3_16_5_48_19_29-5,1,astar-symmulgt-transmul,11.8407,ok +BFT-3_16_5_48_19_29-5,1,idastar-symmulgt-transmul,16.41303,ok +BFT-3_16_5_48_19_29-5,1,idastar-symmullt-transmul,0.444027,ok +BFT-3_16_5_48_19_29-5,1,astar-symmullt-transmul,45.0708,ok +BFT-3_16_5_48_19_29-7,1,astar-symmulgt-transmul,0.188011,ok +BFT-3_16_5_48_19_29-7,1,idastar-symmulgt-transmul,76.60479,ok +BFT-3_16_5_48_19_29-7,1,idastar-symmullt-transmul,0.0,ok +BFT-3_16_5_48_19_29-7,1,astar-symmullt-transmul,0.020001,ok +BFT-4_16_5_48_19_36-14,1,astar-symmulgt-transmul,125.848,ok +BFT-4_16_5_48_19_36-14,1,idastar-symmulgt-transmul,3600,timeout +BFT-4_16_5_48_19_36-14,1,idastar-symmullt-transmul,3600,timeout +BFT-4_16_5_48_19_36-14,1,astar-symmullt-transmul,0.016001,ok +BFT-4_16_5_48_19_36-15,1,astar-symmulgt-transmul,0.292017,ok +BFT-4_16_5_48_19_36-15,1,idastar-symmulgt-transmul,3600,timeout +BFT-4_16_5_48_19_36-15,1,idastar-symmullt-transmul,3600,timeout +BFT-4_16_5_48_19_36-15,1,astar-symmullt-transmul,16.213,ok +BFT-4_16_5_48_19_36-16,1,astar-symmulgt-transmul,0.040002,ok +BFT-4_16_5_48_19_36-16,1,idastar-symmulgt-transmul,3600,timeout 
+BFT-4_16_5_48_19_36-16,1,idastar-symmullt-transmul,3600,timeout +BFT-4_16_5_48_19_36-16,1,astar-symmullt-transmul,0.016001,ok +BFT-4_16_5_48_19_36-17,1,astar-symmulgt-transmul,3.40021,ok +BFT-4_16_5_48_19_36-17,1,idastar-symmulgt-transmul,146.00512,ok +BFT-4_16_5_48_19_36-17,1,idastar-symmullt-transmul,1.52009,ok +BFT-4_16_5_48_19_36-17,1,astar-symmullt-transmul,0.448027,ok +BFT-4_16_5_48_19_36-18,1,astar-symmulgt-transmul,0.33202,ok +BFT-4_16_5_48_19_36-18,1,idastar-symmulgt-transmul,565.32333,ok +BFT-4_16_5_48_19_36-18,1,idastar-symmullt-transmul,1292.00874,ok +BFT-4_16_5_48_19_36-18,1,astar-symmullt-transmul,3600,memout +BFT-4_16_5_48_19_36-19,1,astar-symmulgt-transmul,41.3786,ok +BFT-4_16_5_48_19_36-19,1,idastar-symmulgt-transmul,29.50984,ok +BFT-4_16_5_48_19_36-19,1,idastar-symmullt-transmul,140.7568,ok +BFT-4_16_5_48_19_36-19,1,astar-symmullt-transmul,3600,memout +BFT-4_16_5_48_19_36-20,1,astar-symmulgt-transmul,0.084005,ok +BFT-4_16_5_48_19_36-20,1,idastar-symmulgt-transmul,16.52503,ok +BFT-4_16_5_48_19_36-20,1,idastar-symmullt-transmul,17.71311,ok +BFT-4_16_5_48_19_36-20,1,astar-symmullt-transmul,0.248015,ok +BFT-4_16_5_48_19_36-3,1,astar-symmulgt-transmul,0.084005,ok +BFT-4_16_5_48_19_36-3,1,idastar-symmulgt-transmul,3600,timeout +BFT-4_16_5_48_19_36-3,1,idastar-symmullt-transmul,3600,timeout +BFT-4_16_5_48_19_36-3,1,astar-symmullt-transmul,7.82449,ok +BFT-4_16_5_48_19_36-7,1,astar-symmulgt-transmul,3.52422,ok +BFT-4_16_5_48_19_36-7,1,idastar-symmulgt-transmul,3600,timeout +BFT-4_16_5_48_19_36-7,1,idastar-symmullt-transmul,3600,timeout +BFT-4_16_5_48_19_36-7,1,astar-symmullt-transmul,4.28827,ok +BFT-4_16_5_48_19_36-8,1,astar-symmulgt-transmul,0.64804,ok +BFT-4_16_5_48_19_36-8,1,idastar-symmulgt-transmul,2.48415,ok +BFT-4_16_5_48_19_36-8,1,idastar-symmullt-transmul,19.2652,ok +BFT-4_16_5_48_19_36-8,1,astar-symmullt-transmul,3600,memout +BFT-4_16_5_48_19_36-9,1,astar-symmulgt-transmul,1.6121,ok +BFT-4_16_5_48_19_36-9,1,idastar-symmulgt-transmul,0.252015,ok 
+BFT-4_16_5_48_19_36-9,1,idastar-symmullt-transmul,0.18001,ok +BFT-4_16_5_48_19_36-9,1,astar-symmullt-transmul,8.27652,ok +BFT-5_16_5_64_13_38-10,1,astar-symmulgt-transmul,84.6013,ok +BFT-5_16_5_64_13_38-10,1,idastar-symmulgt-transmul,3600,timeout +BFT-5_16_5_64_13_38-10,1,idastar-symmullt-transmul,3600,timeout +BFT-5_16_5_64_13_38-10,1,astar-symmullt-transmul,3600,memout +BFT-5_16_5_64_13_38-11,1,astar-symmulgt-transmul,1.6561,ok +BFT-5_16_5_64_13_38-11,1,idastar-symmulgt-transmul,27.44171,ok +BFT-5_16_5_64_13_38-11,1,idastar-symmullt-transmul,119.81949,ok +BFT-5_16_5_64_13_38-11,1,astar-symmullt-transmul,1.12807,ok +BFT-5_16_5_64_13_38-14,1,astar-symmulgt-transmul,1.02006,ok +BFT-5_16_5_64_13_38-14,1,idastar-symmulgt-transmul,30.98594,ok +BFT-5_16_5_64_13_38-14,1,idastar-symmullt-transmul,32.46203,ok +BFT-5_16_5_64_13_38-14,1,astar-symmullt-transmul,0.916057,ok +BFT-5_16_5_64_13_38-15,1,astar-symmulgt-transmul,0.308019,ok +BFT-5_16_5_64_13_38-15,1,idastar-symmulgt-transmul,3600,timeout +BFT-5_16_5_64_13_38-15,1,idastar-symmullt-transmul,3600,timeout +BFT-5_16_5_64_13_38-15,1,astar-symmullt-transmul,0.084005,ok +BFT-5_16_5_64_13_38-16,1,astar-symmulgt-transmul,0.34002,ok +BFT-5_16_5_64_13_38-16,1,idastar-symmulgt-transmul,0.564035,ok +BFT-5_16_5_64_13_38-16,1,idastar-symmullt-transmul,3600,timeout +BFT-5_16_5_64_13_38-16,1,astar-symmullt-transmul,3600,memout +BFT-5_16_5_64_13_38-18,1,astar-symmulgt-transmul,4.24026,ok +BFT-5_16_5_64_13_38-18,1,idastar-symmulgt-transmul,2.16813,ok +BFT-5_16_5_64_13_38-18,1,idastar-symmullt-transmul,3.75223,ok +BFT-5_16_5_64_13_38-18,1,astar-symmullt-transmul,81.3611,ok +BFT-5_16_5_64_13_38-2,1,astar-symmulgt-transmul,1.99212,ok +BFT-5_16_5_64_13_38-2,1,idastar-symmulgt-transmul,3600,timeout +BFT-5_16_5_64_13_38-2,1,idastar-symmullt-transmul,3600,timeout +BFT-5_16_5_64_13_38-2,1,astar-symmullt-transmul,0.64804,ok +BFT-5_16_5_64_13_38-3,1,astar-symmulgt-transmul,0.088005,ok 
+BFT-5_16_5_64_13_38-3,1,idastar-symmulgt-transmul,1319.99449,ok +BFT-5_16_5_64_13_38-3,1,idastar-symmullt-transmul,3600,timeout +BFT-5_16_5_64_13_38-3,1,astar-symmullt-transmul,11.3407,ok +BFT-5_16_5_64_13_38-4,1,astar-symmulgt-transmul,10.9007,ok +BFT-5_16_5_64_13_38-4,1,idastar-symmulgt-transmul,9.78461,ok +BFT-5_16_5_64_13_38-4,1,idastar-symmullt-transmul,136.92856,ok +BFT-5_16_5_64_13_38-4,1,astar-symmullt-transmul,36.1863,ok +BFT-5_16_5_64_13_38-5,1,astar-symmulgt-transmul,5.36034,ok +BFT-5_16_5_64_13_38-5,1,idastar-symmulgt-transmul,874.23063,ok +BFT-5_16_5_64_13_38-5,1,idastar-symmullt-transmul,7.18445,ok +BFT-5_16_5_64_13_38-5,1,astar-symmullt-transmul,20.5773,ok +BFT-5_16_5_64_13_38-6,1,astar-symmulgt-transmul,174.939,ok +BFT-5_16_5_64_13_38-6,1,idastar-symmulgt-transmul,2.18014,ok +BFT-5_16_5_64_13_38-6,1,idastar-symmullt-transmul,2.11213,ok +BFT-5_16_5_64_13_38-6,1,astar-symmullt-transmul,0.608037,ok +BFT-5_16_5_64_13_38-8,1,astar-symmulgt-transmul,11.8327,ok +BFT-5_16_5_64_13_38-8,1,idastar-symmulgt-transmul,1.01206,ok +BFT-5_16_5_64_13_38-8,1,idastar-symmullt-transmul,3.03219,ok +BFT-5_16_5_64_13_38-8,1,astar-symmullt-transmul,4.33627,ok +BFT-6_16_5_64_13_48-11,1,astar-symmulgt-transmul,0.136008,ok +BFT-6_16_5_64_13_48-11,1,idastar-symmulgt-transmul,0.636039,ok +BFT-6_16_5_64_13_48-11,1,idastar-symmullt-transmul,4.29227,ok +BFT-6_16_5_64_13_48-11,1,astar-symmullt-transmul,8.92056,ok +BFT-6_16_5_64_13_48-12,1,astar-symmulgt-transmul,137.969,ok +BFT-6_16_5_64_13_48-12,1,idastar-symmulgt-transmul,3600,timeout +BFT-6_16_5_64_13_48-12,1,idastar-symmullt-transmul,3600,timeout +BFT-6_16_5_64_13_48-12,1,astar-symmullt-transmul,10.3526,ok +BFT-6_16_5_64_13_48-13,1,astar-symmulgt-transmul,24.2455,ok +BFT-6_16_5_64_13_48-13,1,idastar-symmulgt-transmul,178.61116,ok +BFT-6_16_5_64_13_48-13,1,idastar-symmullt-transmul,60.21176,ok +BFT-6_16_5_64_13_48-13,1,astar-symmullt-transmul,3600,memout +BFT-6_16_5_64_13_48-14,1,astar-symmulgt-transmul,8.80055,ok 
+BFT-6_16_5_64_13_48-14,1,idastar-symmulgt-transmul,543.37396,ok +BFT-6_16_5_64_13_48-14,1,idastar-symmullt-transmul,2848.19,ok +BFT-6_16_5_64_13_48-14,1,astar-symmullt-transmul,155.19,ok +BFT-6_16_5_64_13_48-15,1,astar-symmulgt-transmul,0.49603,ok +BFT-6_16_5_64_13_48-15,1,idastar-symmulgt-transmul,3600,timeout +BFT-6_16_5_64_13_48-15,1,idastar-symmullt-transmul,3600,timeout +BFT-6_16_5_64_13_48-15,1,astar-symmullt-transmul,1.51609,ok +BFT-6_16_5_64_13_48-17,1,astar-symmulgt-transmul,4.36027,ok +BFT-6_16_5_64_13_48-17,1,idastar-symmulgt-transmul,3600,timeout +BFT-6_16_5_64_13_48-17,1,idastar-symmullt-transmul,3600,timeout +BFT-6_16_5_64_13_48-17,1,astar-symmullt-transmul,99.4262,ok +BFT-6_16_5_64_13_48-2,1,astar-symmulgt-transmul,0.556034,ok +BFT-6_16_5_64_13_48-2,1,idastar-symmulgt-transmul,3600,timeout +BFT-6_16_5_64_13_48-2,1,idastar-symmullt-transmul,0.036002,ok +BFT-6_16_5_64_13_48-2,1,astar-symmullt-transmul,0.472029,ok +BFT-6_16_5_64_13_48-20,1,astar-symmulgt-transmul,26.6057,ok +BFT-6_16_5_64_13_48-20,1,idastar-symmulgt-transmul,3405.35282,ok +BFT-6_16_5_64_13_48-20,1,idastar-symmullt-transmul,556.27476,ok +BFT-6_16_5_64_13_48-20,1,astar-symmullt-transmul,91.2577,ok +BFT-6_16_5_64_13_48-3,1,astar-symmulgt-transmul,49.0551,ok +BFT-6_16_5_64_13_48-3,1,idastar-symmulgt-transmul,3600,timeout +BFT-6_16_5_64_13_48-3,1,idastar-symmullt-transmul,3600,timeout +BFT-6_16_5_64_13_48-3,1,astar-symmullt-transmul,222.614,ok +BFT-6_16_5_64_13_48-5,1,astar-symmulgt-transmul,0.104006,ok +BFT-6_16_5_64_13_48-5,1,idastar-symmulgt-transmul,17.38109,ok +BFT-6_16_5_64_13_48-5,1,idastar-symmullt-transmul,48.50303,ok +BFT-6_16_5_64_13_48-5,1,astar-symmullt-transmul,249.756,ok +BFT-6_16_5_64_13_48-6,1,astar-symmulgt-transmul,23.6535,ok +BFT-6_16_5_64_13_48-6,1,idastar-symmulgt-transmul,3484.43776,ok +BFT-6_16_5_64_13_48-6,1,idastar-symmullt-transmul,3600,timeout +BFT-6_16_5_64_13_48-6,1,astar-symmullt-transmul,4.22026,ok +BFT-6_16_5_64_13_48-8,1,astar-symmulgt-transmul,1.73211,ok 
+BFT-6_16_5_64_13_48-8,1,idastar-symmulgt-transmul,3600,timeout +BFT-6_16_5_64_13_48-8,1,idastar-symmullt-transmul,3600,timeout +BFT-6_16_5_64_13_48-8,1,astar-symmullt-transmul,0.688043,ok +BFT-6_16_5_64_13_48-9,1,astar-symmulgt-transmul,3.03219,ok +BFT-6_16_5_64_13_48-9,1,idastar-symmulgt-transmul,1.29208,ok +BFT-6_16_5_64_13_48-9,1,idastar-symmullt-transmul,0.584036,ok +BFT-6_16_5_64_13_48-9,1,astar-symmullt-transmul,6.51641,ok +BFT-7_16_5_64_26_38-1,1,astar-symmulgt-transmul,40.0385,ok +BFT-7_16_5_64_26_38-1,1,idastar-symmulgt-transmul,3600,timeout +BFT-7_16_5_64_26_38-1,1,idastar-symmullt-transmul,51.95525,ok +BFT-7_16_5_64_26_38-1,1,astar-symmullt-transmul,3600,memout +BFT-7_16_5_64_26_38-10,1,astar-symmulgt-transmul,11.6447,ok +BFT-7_16_5_64_26_38-10,1,idastar-symmulgt-transmul,3600,timeout +BFT-7_16_5_64_26_38-10,1,idastar-symmullt-transmul,3600,timeout +BFT-7_16_5_64_26_38-10,1,astar-symmullt-transmul,16.097,ok +BFT-7_16_5_64_26_38-12,1,astar-symmulgt-transmul,0.160009,ok +BFT-7_16_5_64_26_38-12,1,idastar-symmulgt-transmul,163.09819,ok +BFT-7_16_5_64_26_38-12,1,idastar-symmullt-transmul,1.01606,ok +BFT-7_16_5_64_26_38-12,1,astar-symmullt-transmul,7.00044,ok +BFT-7_16_5_64_26_38-14,1,astar-symmulgt-transmul,0.044002,ok +BFT-7_16_5_64_26_38-14,1,idastar-symmulgt-transmul,1975.71547,ok +BFT-7_16_5_64_26_38-14,1,idastar-symmullt-transmul,3600,timeout +BFT-7_16_5_64_26_38-14,1,astar-symmullt-transmul,3600,memout +BFT-7_16_5_64_26_38-15,1,astar-symmulgt-transmul,8.76855,ok +BFT-7_16_5_64_26_38-15,1,idastar-symmulgt-transmul,3600,timeout +BFT-7_16_5_64_26_38-15,1,idastar-symmullt-transmul,3600,timeout +BFT-7_16_5_64_26_38-15,1,astar-symmullt-transmul,3600,memout +BFT-7_16_5_64_26_38-2,1,astar-symmulgt-transmul,1.17207,ok +BFT-7_16_5_64_26_38-2,1,idastar-symmulgt-transmul,344.28952,ok +BFT-7_16_5_64_26_38-2,1,idastar-symmullt-transmul,469.36933,ok +BFT-7_16_5_64_26_38-2,1,astar-symmullt-transmul,57.6196,ok 
+BFT-7_16_5_64_26_38-20,1,astar-symmulgt-transmul,114.171,ok +BFT-7_16_5_64_26_38-20,1,idastar-symmulgt-transmul,3600,timeout +BFT-7_16_5_64_26_38-20,1,idastar-symmullt-transmul,3600,timeout +BFT-7_16_5_64_26_38-20,1,astar-symmullt-transmul,272.981,ok +BFT-7_16_5_64_26_38-7,1,astar-symmulgt-transmul,6.68442,ok +BFT-7_16_5_64_26_38-7,1,idastar-symmulgt-transmul,0.276017,ok +BFT-7_16_5_64_26_38-7,1,idastar-symmullt-transmul,419.2702,ok +BFT-7_16_5_64_26_38-7,1,astar-symmullt-transmul,0.040002,ok +BFT-7_16_5_64_26_38-8,1,astar-symmulgt-transmul,51.4112,ok +BFT-7_16_5_64_26_38-8,1,idastar-symmulgt-transmul,10.22464,ok +BFT-7_16_5_64_26_38-8,1,idastar-symmullt-transmul,18.00913,ok +BFT-7_16_5_64_26_38-8,1,astar-symmullt-transmul,5.42834,ok +BFT-7_16_5_64_26_38-9,1,astar-symmulgt-transmul,66.1201,ok +BFT-7_16_5_64_26_38-9,1,idastar-symmulgt-transmul,3600,timeout +BFT-7_16_5_64_26_38-9,1,idastar-symmullt-transmul,3600,timeout +BFT-7_16_5_64_26_38-9,1,astar-symmullt-transmul,8.17251,ok +BFT-8_16_5_64_26_48-11,1,astar-symmulgt-transmul,15.577,ok +BFT-8_16_5_64_26_48-11,1,idastar-symmulgt-transmul,1894.5584,ok +BFT-8_16_5_64_26_48-11,1,idastar-symmullt-transmul,3600,timeout +BFT-8_16_5_64_26_48-11,1,astar-symmullt-transmul,1.6441,ok +BFT-8_16_5_64_26_48-15,1,astar-symmulgt-transmul,1.17607,ok +BFT-8_16_5_64_26_48-15,1,idastar-symmulgt-transmul,3600,timeout +BFT-8_16_5_64_26_48-15,1,idastar-symmullt-transmul,73.79261,ok +BFT-8_16_5_64_26_48-15,1,astar-symmullt-transmul,28.4498,ok +BFT-8_16_5_64_26_48-2,1,astar-symmulgt-transmul,0.136008,ok +BFT-8_16_5_64_26_48-2,1,idastar-symmulgt-transmul,3600,timeout +BFT-8_16_5_64_26_48-2,1,idastar-symmullt-transmul,3600,timeout +BFT-8_16_5_64_26_48-2,1,astar-symmullt-transmul,0.208012,ok +BFT-8_16_5_64_26_48-6,1,astar-symmulgt-transmul,8.70054,ok +BFT-8_16_5_64_26_48-6,1,idastar-symmulgt-transmul,10.74467,ok +BFT-8_16_5_64_26_48-6,1,idastar-symmullt-transmul,10.12063,ok +BFT-8_16_5_64_26_48-6,1,astar-symmullt-transmul,7.58447,ok 
+BFT-8_16_5_64_26_48-8,1,astar-symmulgt-transmul,0.052003,ok +BFT-8_16_5_64_26_48-8,1,idastar-symmulgt-transmul,3600,timeout +BFT-8_16_5_64_26_48-8,1,idastar-symmullt-transmul,3600,timeout +BFT-8_16_5_64_26_48-8,1,astar-symmullt-transmul,0.164009,ok +BFT-8_16_5_64_26_48-9,1,astar-symmulgt-transmul,116.219,ok +BFT-8_16_5_64_26_48-9,1,idastar-symmulgt-transmul,361.42659,ok +BFT-8_16_5_64_26_48-9,1,idastar-symmullt-transmul,35.41821,ok +BFT-8_16_5_64_26_48-9,1,astar-symmullt-transmul,266.237,ok +BFT-9_16_8_77_15_46-2,1,astar-symmulgt-transmul,185.872,ok +BFT-9_16_8_77_15_46-2,1,idastar-symmulgt-transmul,10.73267,ok +BFT-9_16_8_77_15_46-2,1,idastar-symmullt-transmul,14.50491,ok +BFT-9_16_8_77_15_46-2,1,astar-symmullt-transmul,4.19226,ok +BFT-9_16_8_77_15_46-20,1,astar-symmulgt-transmul,25.4216,ok +BFT-9_16_8_77_15_46-20,1,idastar-symmulgt-transmul,3600,timeout +BFT-9_16_8_77_15_46-20,1,idastar-symmullt-transmul,3600,timeout +BFT-9_16_8_77_15_46-20,1,astar-symmullt-transmul,3600,memout diff --git a/_articles/RJ-2025-045/CPMP-2015_data/citation.bib b/_articles/RJ-2025-045/CPMP-2015_data/citation.bib new file mode 100644 index 0000000000..0e8a9f109b --- /dev/null +++ b/_articles/RJ-2025-045/CPMP-2015_data/citation.bib @@ -0,0 +1,9 @@ +@inproceedings{aslib_premarshalling, + author={Kevin Tierney and Yuri Malitsky}, + title={An Algorithm Selection Benchmark of the Container Pre-Marshalling Problem}, + booktitle={Learning and Intelligent OptimizatioN (LION) 2015}, + year = {2015} +} + + + diff --git a/_articles/RJ-2025-045/CPMP-2015_data/cv.arff b/_articles/RJ-2025-045/CPMP-2015_data/cv.arff new file mode 100644 index 0000000000..bae8367920 --- /dev/null +++ b/_articles/RJ-2025-045/CPMP-2015_data/cv.arff @@ -0,0 +1,534 @@ +@relation R_data_frame + +@attribute instance_id string +@attribute repetition numeric +@attribute fold numeric + +@data +BF1_cpmp_16_5_48_10_29_15,1,1 +BF17_cpmp_20_5_60_12_36_10,1,1 +BF17_cpmp_20_5_60_12_36_20,1,1 +BF18_cpmp_20_5_60_12_45_15,1,1 
+BF18_cpmp_20_5_60_12_45_2,1,1 +BF18_cpmp_20_5_60_12_45_3,1,1 +BF18_cpmp_20_5_60_12_45_9,1,1 +BF19_cpmp_20_5_60_24_36_14,1,1 +BF19_cpmp_20_5_60_24_36_2,1,1 +BF19_cpmp_20_5_60_24_36_6,1,1 +BF2_cpmp_16_5_48_10_36_12,1,1 +BF20_cpmp_20_5_60_24_45_13,1,1 +BF20_cpmp_20_5_60_24_45_7,1,1 +BF21_cpmp_20_5_80_16_48_11,1,1 +BF21_cpmp_20_5_80_16_48_14,1,1 +BF21_cpmp_20_5_80_16_48_9,1,1 +BF22_cpmp_20_5_80_16_60_4,1,1 +BF22_cpmp_20_5_80_16_60_6,1,1 +BF23_cpmp_20_5_80_32_48_10,1,1 +BF23_cpmp_20_5_80_32_48_11,1,1 +BF23_cpmp_20_5_80_32_48_3,1,1 +BF23_cpmp_20_5_80_32_48_7,1,1 +BF26_cpmp_20_8_96_20_72_5,1,1 +BF3_cpmp_16_5_48_20_29_11,1,1 +BF5_cpmp_16_5_64_13_39_1,1,1 +BF5_cpmp_16_5_64_13_39_15,1,1 +BF5_cpmp_16_5_64_13_39_3,1,1 +BF6_cpmp_16_5_64_13_48_1,1,1 +BF6_cpmp_16_5_64_13_48_13,1,1 +BF7_cpmp_16_5_64_26_39_13,1,1 +BF9_cpmp_16_8_77_16_47_8,1,1 +LC2a_lc2a_4,1,1 +LC2b_lc2b_4,1,1 +LC3b_lc3b_4,1,1 +cv_data5-7-40,1,1 +cv_data5-8-38,1,1 +cv_data5-5-29,1,1 +cv_data4-5-20,1,1 +cv_data5-4-7,1,1 +cv_data5-5-8,1,1 +cv_data4-6-2,1,1 +cv_data4-6-29,1,1 +cv_data3-7-27,1,1 +cv_data5-4-25,1,1 +cv_data5-6-31,1,1 +cv_data4-4-2,1,1 +cv_data5-6-35,1,1 +cv_data5-7-11,1,1 +cv_data3-6-24,1,1 +cv_data5-7-31,1,1 +cv_data4-7-29,1,1 +cv_data5-9-3,1,1 +cv_data4-7-15,1,1 +BF1_cpmp_16_5_48_10_29_14,1,2 +BF10_cpmp_16_8_77_16_58_8,1,2 +BF11_cpmp_16_8_77_31_47_10,1,2 +BF17_cpmp_20_5_60_12_36_1,1,2 +BF17_cpmp_20_5_60_12_36_17,1,2 +BF17_cpmp_20_5_60_12_36_3,1,2 +BF2_cpmp_16_5_48_10_36_11,1,2 +BF2_cpmp_16_5_48_10_36_17,1,2 +BF2_cpmp_16_5_48_10_36_3,1,2 +BF20_cpmp_20_5_60_24_45_2,1,2 +BF22_cpmp_20_5_80_16_60_9,1,2 +BF24_cpmp_20_5_80_32_60_13,1,2 +BF3_cpmp_16_5_48_20_29_8,1,2 +BF4_cpmp_16_5_48_20_36_5,1,2 +BF5_cpmp_16_5_64_13_39_11,1,2 +BF6_cpmp_16_5_64_13_48_7,1,2 +BF7_cpmp_16_5_64_26_39_10,1,2 +BF7_cpmp_16_5_64_26_39_14,1,2 +BF7_cpmp_16_5_64_26_39_17,1,2 +BF8_cpmp_16_5_64_26_48_1,1,2 +LC2a_lc2a_3,1,2 +LC2b_lc2b_1,1,2 +LC2b_lc2b_2,1,2 +LC2b_lc2b_5,1,2 +cv_data5-5-12,1,2 +cv_data5-4-21,1,2 +cv_data4-7-37,1,2 
+cv_data4-7-30,1,2 +cv_data5-4-36,1,2 +cv_data5-10-23,1,2 +cv_data5-5-11,1,2 +cv_data4-7-6,1,2 +cv_data4-6-5,1,2 +cv_data4-7-1,1,2 +cv_data5-9-21,1,2 +cv_data5-5-16,1,2 +cv_data5-5-24,1,2 +cv_data4-5-23,1,2 +cv_data5-7-36,1,2 +cv_data3-8-28,1,2 +cv_data5-4-30,1,2 +cv_data4-7-11,1,2 +cv_data4-4-24,1,2 +cv_data5-6-7,1,2 +cv_data4-7-10,1,2 +cv_data5-4-9,1,2 +cv_data5-5-22,1,2 +cv_data4-7-34,1,2 +cv_data3-7-30,1,2 +cv_data4-4-30,1,2 +cv_data4-4-25,1,2 +cv_data3-5-8,1,2 +cv_data5-5-13,1,2 +BF17_cpmp_20_5_60_12_36_5,1,3 +BF18_cpmp_20_5_60_12_45_1,1,3 +BF18_cpmp_20_5_60_12_45_12,1,3 +BF18_cpmp_20_5_60_12_45_7,1,3 +BF19_cpmp_20_5_60_24_36_11,1,3 +BF19_cpmp_20_5_60_24_36_9,1,3 +BF2_cpmp_16_5_48_10_36_15,1,3 +BF2_cpmp_16_5_48_10_36_2,1,3 +BF2_cpmp_16_5_48_10_36_7,1,3 +BF24_cpmp_20_5_80_32_60_2,1,3 +BF25_cpmp_20_8_96_20_58_20,1,3 +BF4_cpmp_16_5_48_20_36_18,1,3 +BF4_cpmp_16_5_48_20_36_19,1,3 +BF4_cpmp_16_5_48_20_36_7,1,3 +BF5_cpmp_16_5_64_13_39_14,1,3 +BF5_cpmp_16_5_64_13_39_4,1,3 +BF5_cpmp_16_5_64_13_39_8,1,3 +BF6_cpmp_16_5_64_13_48_11,1,3 +BF6_cpmp_16_5_64_13_48_12,1,3 +BF7_cpmp_16_5_64_26_39_19,1,3 +BF7_cpmp_16_5_64_26_39_2,1,3 +BF9_cpmp_16_8_77_16_47_1,1,3 +LC2b_lc2b_3,1,3 +LC2b_lc2b_6,1,3 +LC3b_lc3b_10,1,3 +LC3b_lc3b_9,1,3 +cv_data5-7-13,1,3 +cv_data4-4-16,1,3 +cv_data5-8-19,1,3 +cv_data5-5-37,1,3 +cv_data5-6-25,1,3 +cv_data5-6-11,1,3 +cv_data5-6-2,1,3 +cv_data5-5-27,1,3 +cv_data5-5-10,1,3 +cv_data4-7-32,1,3 +cv_data3-7-22,1,3 +cv_data4-6-30,1,3 +cv_data5-4-10,1,3 +cv_data4-4-34,1,3 +cv_data4-4-10,1,3 +cv_data4-7-31,1,3 +cv_data4-7-21,1,3 +cv_data4-6-14,1,3 +cv_data3-7-17,1,3 +cv_data4-6-8,1,3 +cv_data4-6-28,1,3 +cv_data5-6-30,1,3 +cv_data4-7-14,1,3 +cv_data5-10-19,1,3 +cv_data4-5-14,1,3 +cv_data4-6-3,1,3 +cv_data5-4-6,1,3 +BF1_cpmp_16_5_48_10_29_8,1,4 +BF1_cpmp_16_5_48_10_29_9,1,4 +BF11_cpmp_16_8_77_31_47_7,1,4 +BF18_cpmp_20_5_60_12_45_19,1,4 +BF18_cpmp_20_5_60_12_45_5,1,4 +BF19_cpmp_20_5_60_24_36_18,1,4 +BF19_cpmp_20_5_60_24_36_3,1,4 +BF19_cpmp_20_5_60_24_36_7,1,4 
+BF2_cpmp_16_5_48_10_36_1,1,4 +BF2_cpmp_16_5_48_10_36_13,1,4 +BF2_cpmp_16_5_48_10_36_14,1,4 +BF2_cpmp_16_5_48_10_36_9,1,4 +BF20_cpmp_20_5_60_24_45_10,1,4 +BF21_cpmp_20_5_80_16_48_8,1,4 +BF3_cpmp_16_5_48_20_29_1,1,4 +BF3_cpmp_16_5_48_20_29_6,1,4 +BF4_cpmp_16_5_48_20_36_2,1,4 +BF6_cpmp_16_5_64_13_48_4,1,4 +BF6_cpmp_16_5_64_13_48_9,1,4 +BF8_cpmp_16_5_64_26_48_12,1,4 +BF8_cpmp_16_5_64_26_48_17,1,4 +BF8_cpmp_16_5_64_26_48_6,1,4 +BF9_cpmp_16_8_77_16_47_3,1,4 +LC2a_lc2a_10,1,4 +LC2a_lc2a_2,1,4 +LC3a_lc3a_10,1,4 +LC3a_lc3a_7,1,4 +LC3b_lc3b_1,1,4 +cv_data4-7-25,1,4 +cv_data4-5-25,1,4 +cv_data4-6-35,1,4 +cv_data5-4-17,1,4 +cv_data5-6-8,1,4 +cv_data5-5-40,1,4 +cv_data4-7-24,1,4 +cv_data4-7-40,1,4 +cv_data4-7-23,1,4 +cv_data4-4-3,1,4 +cv_data4-5-15,1,4 +cv_data5-5-30,1,4 +cv_data4-4-35,1,4 +cv_data4-6-10,1,4 +cv_data5-5-20,1,4 +cv_data4-7-28,1,4 +cv_data4-5-17,1,4 +cv_data4-6-33,1,4 +cv_data4-6-11,1,4 +cv_data4-5-35,1,4 +cv_data4-6-13,1,4 +cv_data5-5-36,1,4 +cv_data3-6-26,1,4 +cv_data5-10-32,1,4 +cv_data4-7-17,1,4 +BF1_cpmp_16_5_48_10_29_19,1,5 +BF10_cpmp_16_8_77_16_58_3,1,5 +BF19_cpmp_20_5_60_24_36_10,1,5 +BF19_cpmp_20_5_60_24_36_17,1,5 +BF19_cpmp_20_5_60_24_36_20,1,5 +BF2_cpmp_16_5_48_10_36_8,1,5 +BF20_cpmp_20_5_60_24_45_11,1,5 +BF20_cpmp_20_5_60_24_45_15,1,5 +BF20_cpmp_20_5_60_24_45_20,1,5 +BF20_cpmp_20_5_60_24_45_3,1,5 +BF20_cpmp_20_5_60_24_45_6,1,5 +BF20_cpmp_20_5_60_24_45_8,1,5 +BF21_cpmp_20_5_80_16_48_19,1,5 +BF22_cpmp_20_5_80_16_60_11,1,5 +BF23_cpmp_20_5_80_32_48_6,1,5 +BF28_cpmp_20_8_96_39_72_19,1,5 +BF3_cpmp_16_5_48_20_29_9,1,5 +BF4_cpmp_16_5_48_20_36_15,1,5 +BF4_cpmp_16_5_48_20_36_6,1,5 +BF5_cpmp_16_5_64_13_39_2,1,5 +BF5_cpmp_16_5_64_13_39_5,1,5 +BF6_cpmp_16_5_64_13_48_17,1,5 +BF6_cpmp_16_5_64_13_48_19,1,5 +BF7_cpmp_16_5_64_26_39_15,1,5 +LC2a_lc2a_5,1,5 +LC3b_lc3b_6,1,5 +cv_data5-4-23,1,5 +cv_data4-5-1,1,5 +cv_data4-7-38,1,5 +cv_data5-5-38,1,5 +cv_data4-6-7,1,5 +cv_data5-4-4,1,5 +cv_data5-8-6,1,5 +cv_data4-6-20,1,5 +cv_data5-5-33,1,5 +cv_data4-6-16,1,5 
+cv_data5-4-13,1,5 +cv_data5-7-33,1,5 +cv_data4-4-23,1,5 +cv_data5-6-28,1,5 +cv_data5-5-21,1,5 +cv_data5-6-37,1,5 +cv_data3-7-5,1,5 +cv_data5-7-1,1,5 +cv_data5-4-1,1,5 +cv_data4-4-13,1,5 +cv_data5-5-28,1,5 +cv_data5-8-36,1,5 +cv_data5-5-4,1,5 +cv_data4-6-39,1,5 +cv_data5-4-32,1,5 +cv_data4-6-15,1,5 +cv_data5-8-29,1,5 +BF1_cpmp_16_5_48_10_29_7,1,6 +BF10_cpmp_16_8_77_16_58_13,1,6 +BF17_cpmp_20_5_60_12_36_2,1,6 +BF17_cpmp_20_5_60_12_36_6,1,6 +BF17_cpmp_20_5_60_12_36_7,1,6 +BF17_cpmp_20_5_60_12_36_8,1,6 +BF18_cpmp_20_5_60_12_45_14,1,6 +BF18_cpmp_20_5_60_12_45_6,1,6 +BF19_cpmp_20_5_60_24_36_16,1,6 +BF19_cpmp_20_5_60_24_36_8,1,6 +BF21_cpmp_20_5_80_16_48_4,1,6 +BF21_cpmp_20_5_80_16_48_5,1,6 +BF23_cpmp_20_5_80_32_48_12,1,6 +BF27_cpmp_20_8_96_39_58_9,1,6 +BF4_cpmp_16_5_48_20_36_11,1,6 +BF4_cpmp_16_5_48_20_36_13,1,6 +BF5_cpmp_16_5_64_13_39_10,1,6 +BF5_cpmp_16_5_64_13_39_20,1,6 +BF6_cpmp_16_5_64_13_48_16,1,6 +BF6_cpmp_16_5_64_13_48_2,1,6 +BF7_cpmp_16_5_64_26_39_3,1,6 +BF7_cpmp_16_5_64_26_39_7,1,6 +BF8_cpmp_16_5_64_26_48_11,1,6 +BF8_cpmp_16_5_64_26_48_5,1,6 +BF9_cpmp_16_8_77_16_47_6,1,6 +LC3a_lc3a_6,1,6 +cv_data4-5-26,1,6 +cv_data4-5-9,1,6 +cv_data4-4-33,1,6 +cv_data5-8-25,1,6 +cv_data3-5-13,1,6 +cv_data5-5-23,1,6 +cv_data4-4-12,1,6 +cv_data4-5-27,1,6 +cv_data3-8-25,1,6 +cv_data5-7-4,1,6 +cv_data5-6-20,1,6 +cv_data3-8-24,1,6 +cv_data4-5-3,1,6 +cv_data4-6-19,1,6 +cv_data5-5-17,1,6 +cv_data4-6-34,1,6 +cv_data5-5-32,1,6 +cv_data4-7-7,1,6 +cv_data5-7-12,1,6 +cv_data4-7-4,1,6 +cv_data5-4-18,1,6 +cv_data4-5-32,1,6 +cv_data5-5-6,1,6 +cv_data5-6-5,1,6 +cv_data4-4-27,1,6 +cv_data5-5-19,1,6 +cv_data3-8-17,1,6 +BF1_cpmp_16_5_48_10_29_1,1,7 +BF1_cpmp_16_5_48_10_29_12,1,7 +BF1_cpmp_16_5_48_10_29_17,1,7 +BF1_cpmp_16_5_48_10_29_18,1,7 +BF17_cpmp_20_5_60_12_36_11,1,7 +BF17_cpmp_20_5_60_12_36_16,1,7 +BF18_cpmp_20_5_60_12_45_13,1,7 +BF18_cpmp_20_5_60_12_45_18,1,7 +BF2_cpmp_16_5_48_10_36_20,1,7 +BF20_cpmp_20_5_60_24_45_17,1,7 +BF21_cpmp_20_5_80_16_48_18,1,7 +BF21_cpmp_20_5_80_16_48_2,1,7 
+BF21_cpmp_20_5_80_16_48_3,1,7 +BF24_cpmp_20_5_80_32_60_8,1,7 +BF27_cpmp_20_8_96_39_58_12,1,7 +BF3_cpmp_16_5_48_20_29_14,1,7 +BF3_cpmp_16_5_48_20_29_5,1,7 +BF8_cpmp_16_5_64_26_48_20,1,7 +BF9_cpmp_16_8_77_16_47_4,1,7 +BF9_cpmp_16_8_77_16_47_7,1,7 +LC3a_lc3a_1,1,7 +LC3a_lc3a_4,1,7 +LC3a_lc3a_9,1,7 +LC3b_lc3b_5,1,7 +cv_data4-6-1,1,7 +cv_data4-6-18,1,7 +cv_data5-6-6,1,7 +cv_data3-8-38,1,7 +cv_data5-5-25,1,7 +cv_data4-5-30,1,7 +cv_data5-9-38,1,7 +cv_data5-5-9,1,7 +cv_data5-4-35,1,7 +cv_data5-4-29,1,7 +cv_data4-6-12,1,7 +cv_data5-6-22,1,7 +cv_data4-6-32,1,7 +cv_data5-7-37,1,7 +cv_data5-4-14,1,7 +cv_data4-6-40,1,7 +cv_data4-7-5,1,7 +cv_data5-4-12,1,7 +cv_data5-6-33,1,7 +cv_data4-5-12,1,7 +cv_data4-5-28,1,7 +cv_data4-4-7,1,7 +cv_data4-6-27,1,7 +cv_data4-4-28,1,7 +cv_data4-7-33,1,7 +cv_data5-7-18,1,7 +cv_data5-7-16,1,7 +cv_data4-7-9,1,7 +BF1_cpmp_16_5_48_10_29_20,1,8 +BF1_cpmp_16_5_48_10_29_3,1,8 +BF1_cpmp_16_5_48_10_29_5,1,8 +BF1_cpmp_16_5_48_10_29_6,1,8 +BF11_cpmp_16_8_77_31_47_6,1,8 +BF12_cpmp_16_8_77_31_58_14,1,8 +BF18_cpmp_20_5_60_12_45_4,1,8 +BF18_cpmp_20_5_60_12_45_8,1,8 +BF19_cpmp_20_5_60_24_36_15,1,8 +BF2_cpmp_16_5_48_10_36_18,1,8 +BF2_cpmp_16_5_48_10_36_4,1,8 +BF20_cpmp_20_5_60_24_45_1,1,8 +BF22_cpmp_20_5_80_16_60_16,1,8 +BF23_cpmp_20_5_80_32_48_15,1,8 +BF23_cpmp_20_5_80_32_48_16,1,8 +BF27_cpmp_20_8_96_39_58_6,1,8 +BF3_cpmp_16_5_48_20_29_19,1,8 +BF4_cpmp_16_5_48_20_36_14,1,8 +BF4_cpmp_16_5_48_20_36_17,1,8 +BF4_cpmp_16_5_48_20_36_9,1,8 +BF5_cpmp_16_5_64_13_39_9,1,8 +BF6_cpmp_16_5_64_13_48_8,1,8 +BF7_cpmp_16_5_64_26_39_1,1,8 +BF7_cpmp_16_5_64_26_39_8,1,8 +BF8_cpmp_16_5_64_26_48_4,1,8 +BF9_cpmp_16_8_77_16_47_10,1,8 +LC2b_lc2b_9,1,8 +LC3a_lc3a_3,1,8 +cv_data4-4-21,1,8 +cv_data4-7-39,1,8 +cv_data4-7-27,1,8 +cv_data4-5-22,1,8 +cv_data5-7-15,1,8 +cv_data4-7-35,1,8 +cv_data5-8-2,1,8 +cv_data4-5-24,1,8 +cv_data5-4-15,1,8 +cv_data5-4-27,1,8 +cv_data5-4-31,1,8 +cv_data4-7-26,1,8 +cv_data5-5-7,1,8 +cv_data5-4-3,1,8 +cv_data4-5-39,1,8 +cv_data4-4-14,1,8 +cv_data4-6-25,1,8 
+cv_data5-4-38,1,8 +cv_data5-8-4,1,8 +cv_data4-7-16,1,8 +cv_data4-4-29,1,8 +cv_data4-5-31,1,8 +cv_data5-7-7,1,8 +cv_data3-8-7,1,8 +BF1_cpmp_16_5_48_10_29_4,1,9 +BF11_cpmp_16_8_77_31_47_16,1,9 +BF17_cpmp_20_5_60_12_36_4,1,9 +BF18_cpmp_20_5_60_12_45_17,1,9 +BF19_cpmp_20_5_60_24_36_1,1,9 +BF2_cpmp_16_5_48_10_36_5,1,9 +BF20_cpmp_20_5_60_24_45_5,1,9 +BF21_cpmp_20_5_80_16_48_16,1,9 +BF23_cpmp_20_5_80_32_48_18,1,9 +BF23_cpmp_20_5_80_32_48_4,1,9 +BF3_cpmp_16_5_48_20_29_13,1,9 +BF3_cpmp_16_5_48_20_29_16,1,9 +BF4_cpmp_16_5_48_20_36_10,1,9 +BF4_cpmp_16_5_48_20_36_3,1,9 +BF4_cpmp_16_5_48_20_36_8,1,9 +BF5_cpmp_16_5_64_13_39_12,1,9 +BF6_cpmp_16_5_64_13_48_6,1,9 +BF7_cpmp_16_5_64_26_39_12,1,9 +BF7_cpmp_16_5_64_26_39_20,1,9 +BF7_cpmp_16_5_64_26_39_6,1,9 +BF8_cpmp_16_5_64_26_48_7,1,9 +LC2a_lc2a_1,1,9 +LC2a_lc2a_7,1,9 +LC3a_lc3a_8,1,9 +cv_data4-4-19,1,9 +cv_data5-7-19,1,9 +cv_data3-8-37,1,9 +cv_data5-4-37,1,9 +cv_data5-5-5,1,9 +cv_data4-6-9,1,9 +cv_data4-7-19,1,9 +cv_data5-5-14,1,9 +cv_data3-7-18,1,9 +cv_data5-5-18,1,9 +cv_data5-4-20,1,9 +cv_data4-7-13,1,9 +cv_data5-9-32,1,9 +cv_data5-5-3,1,9 +cv_data4-6-37,1,9 +cv_data5-5-31,1,9 +cv_data5-7-22,1,9 +cv_data4-7-12,1,9 +cv_data4-5-29,1,9 +cv_data5-4-22,1,9 +cv_data5-8-8,1,9 +cv_data5-8-3,1,9 +cv_data4-7-8,1,9 +cv_data5-4-24,1,9 +cv_data5-4-16,1,9 +cv_data5-5-15,1,9 +cv_data4-4-20,1,9 +cv_data4-5-19,1,9 +cv_data5-9-10,1,9 +BF11_cpmp_16_8_77_31_47_11,1,10 +BF17_cpmp_20_5_60_12_36_12,1,10 +BF17_cpmp_20_5_60_12_36_15,1,10 +BF18_cpmp_20_5_60_12_45_16,1,10 +BF19_cpmp_20_5_60_24_36_5,1,10 +BF2_cpmp_16_5_48_10_36_19,1,10 +BF2_cpmp_16_5_48_10_36_6,1,10 +BF20_cpmp_20_5_60_24_45_12,1,10 +BF20_cpmp_20_5_60_24_45_14,1,10 +BF20_cpmp_20_5_60_24_45_9,1,10 +BF21_cpmp_20_5_80_16_48_7,1,10 +BF22_cpmp_20_5_80_16_60_7,1,10 +BF24_cpmp_20_5_80_32_60_17,1,10 +BF3_cpmp_16_5_48_20_29_20,1,10 +BF3_cpmp_16_5_48_20_29_3,1,10 +BF4_cpmp_16_5_48_20_36_1,1,10 +BF4_cpmp_16_5_48_20_36_16,1,10 +BF4_cpmp_16_5_48_20_36_20,1,10 +BF5_cpmp_16_5_64_13_39_17,1,10 
+BF6_cpmp_16_5_64_13_48_15,1,10 +BF6_cpmp_16_5_64_13_48_18,1,10 +BF6_cpmp_16_5_64_13_48_3,1,10 +BF7_cpmp_16_5_64_26_39_11,1,10 +BF7_cpmp_16_5_64_26_39_16,1,10 +BF9_cpmp_16_8_77_16_47_12,1,10 +LC2b_lc2b_7,1,10 +LC3b_lc3b_8,1,10 +cv_data5-10-33,1,10 +cv_data4-5-2,1,10 +cv_data5-6-26,1,10 +cv_data5-8-39,1,10 +cv_data4-4-15,1,10 +cv_data4-7-20,1,10 +cv_data4-5-5,1,10 +cv_data4-6-21,1,10 +cv_data5-5-39,1,10 +cv_data4-7-22,1,10 +cv_data3-8-2,1,10 +cv_data4-6-36,1,10 +cv_data5-9-4,1,10 +cv_data4-7-36,1,10 +cv_data4-5-21,1,10 +cv_data4-7-2,1,10 +cv_data5-4-33,1,10 +cv_data3-7-12,1,10 +cv_data4-6-4,1,10 +cv_data4-7-3,1,10 +cv_data5-7-32,1,10 +cv_data5-4-34,1,10 +cv_data5-6-19,1,10 +cv_data4-6-24,1,10 +cv_data4-5-34,1,10 diff --git a/_articles/RJ-2025-045/CPMP-2015_data/description.txt b/_articles/RJ-2025-045/CPMP-2015_data/description.txt new file mode 100644 index 0000000000..de57ccecef --- /dev/null +++ b/_articles/RJ-2025-045/CPMP-2015_data/description.txt @@ -0,0 +1,82 @@ +algorithm_cutoff_memory: 5120 +algorithm_cutoff_time: 3600 +default_steps: +- orig +- lfa1 +- lfa2 +feature_steps: + lfa1: + provides: + - left-density + - tier-weighted-groups + - avg-l1-top-left-lg-group + - cont-empty-grt-estack + lfa2: + provides: + - overstowing-2cont-stack-pct + - pct-bottom-pct-on-top + orig: + provides: + - stacks + - tiers + - stack-tier-ratio + - container-density + - empty-stack-pct + - overstowing-stack-pct + - group-same-min + - group-same-max + - group-same-mean + - group-same-stdev + - top-good-min + - top-good-max + - top-good-mean + - top-good-stdev + - overstowage-pct + - bflb +features_cutoff_memory: 512 +features_cutoff_time: 30 +features_deterministic: +- stacks +- tiers +- stack-tier-ratio +- container-density +- empty-stack-pct +- overstowing-stack-pct +- overstowing-2cont-stack-pct +- group-same-min +- group-same-max +- group-same-mean +- group-same-stdev +- top-good-min +- top-good-max +- top-good-mean +- top-good-stdev +- overstowage-pct +- bflb +- 
left-density +- tier-weighted-groups +- avg-l1-top-left-lg-group +- cont-empty-grt-estack +- pct-bottom-pct-on-top +features_stochastic: null +maximize: +- false +metainfo_algorithms: + astar-symmulgt-transmul: + configuration: '' + deterministic: true + astar-symmullt-transmul: + configuration: '' + deterministic: true + idastar-symmulgt-transmul: + configuration: '' + deterministic: true + idastar-symmullt-transmul: + configuration: '' + deterministic: true +number_of_feature_steps: 3 +performance_measures: +- runtime +performance_type: +- runtime +scenario_id: CPMP-2015 diff --git a/_articles/RJ-2025-045/CPMP-2015_data/feature_runstatus.arff b/_articles/RJ-2025-045/CPMP-2015_data/feature_runstatus.arff new file mode 100644 index 0000000000..02c4e7f940 --- /dev/null +++ b/_articles/RJ-2025-045/CPMP-2015_data/feature_runstatus.arff @@ -0,0 +1,536 @@ +@RELATION feature_runstatus_premarshalling_astar_2013 + +@ATTRIBUTE instance_id STRING +@ATTRIBUTE repetition NUMERIC +@ATTRIBUTE orig {ok, timeout, memout, presolved, crash, other, unknown} +@ATTRIBUTE lfa1 {ok, timeout, memout, presolved, crash, other, unknown} +@ATTRIBUTE lfa2 {ok, timeout, memout, presolved, crash, other, unknown} + +@DATA +BF1_cpmp_16_5_48_10_29_1,1,ok,ok,ok +BF1_cpmp_16_5_48_10_29_12,1,ok,ok,ok +BF1_cpmp_16_5_48_10_29_14,1,ok,ok,ok +BF1_cpmp_16_5_48_10_29_15,1,ok,ok,ok +BF1_cpmp_16_5_48_10_29_17,1,ok,ok,ok +BF1_cpmp_16_5_48_10_29_18,1,ok,ok,ok +BF1_cpmp_16_5_48_10_29_19,1,ok,ok,ok +BF1_cpmp_16_5_48_10_29_20,1,ok,ok,ok +BF1_cpmp_16_5_48_10_29_3,1,ok,ok,ok +BF1_cpmp_16_5_48_10_29_4,1,ok,ok,ok +BF1_cpmp_16_5_48_10_29_5,1,ok,ok,ok +BF1_cpmp_16_5_48_10_29_6,1,ok,ok,ok +BF1_cpmp_16_5_48_10_29_7,1,ok,ok,ok +BF1_cpmp_16_5_48_10_29_8,1,ok,ok,ok +BF1_cpmp_16_5_48_10_29_9,1,ok,ok,ok +BF10_cpmp_16_8_77_16_58_13,1,ok,ok,ok +BF10_cpmp_16_8_77_16_58_3,1,ok,ok,ok +BF10_cpmp_16_8_77_16_58_8,1,ok,ok,ok +BF11_cpmp_16_8_77_31_47_10,1,ok,ok,ok +BF11_cpmp_16_8_77_31_47_11,1,ok,ok,ok
+BF11_cpmp_16_8_77_31_47_16,1,ok,ok,ok +BF11_cpmp_16_8_77_31_47_6,1,ok,ok,ok +BF11_cpmp_16_8_77_31_47_7,1,ok,ok,ok +BF12_cpmp_16_8_77_31_58_14,1,ok,ok,ok +BF17_cpmp_20_5_60_12_36_1,1,ok,ok,ok +BF17_cpmp_20_5_60_12_36_10,1,ok,ok,ok +BF17_cpmp_20_5_60_12_36_11,1,ok,ok,ok +BF17_cpmp_20_5_60_12_36_12,1,ok,ok,ok +BF17_cpmp_20_5_60_12_36_15,1,ok,ok,ok +BF17_cpmp_20_5_60_12_36_16,1,ok,ok,ok +BF17_cpmp_20_5_60_12_36_17,1,ok,ok,ok +BF17_cpmp_20_5_60_12_36_2,1,ok,ok,ok +BF17_cpmp_20_5_60_12_36_20,1,ok,ok,ok +BF17_cpmp_20_5_60_12_36_3,1,ok,ok,ok +BF17_cpmp_20_5_60_12_36_4,1,ok,ok,ok +BF17_cpmp_20_5_60_12_36_5,1,ok,ok,ok +BF17_cpmp_20_5_60_12_36_6,1,ok,ok,ok +BF17_cpmp_20_5_60_12_36_7,1,ok,ok,ok +BF17_cpmp_20_5_60_12_36_8,1,ok,ok,ok +BF18_cpmp_20_5_60_12_45_1,1,ok,ok,ok +BF18_cpmp_20_5_60_12_45_12,1,ok,ok,ok +BF18_cpmp_20_5_60_12_45_13,1,ok,ok,ok +BF18_cpmp_20_5_60_12_45_14,1,ok,ok,ok +BF18_cpmp_20_5_60_12_45_15,1,ok,ok,ok +BF18_cpmp_20_5_60_12_45_16,1,ok,ok,ok +BF18_cpmp_20_5_60_12_45_17,1,ok,ok,ok +BF18_cpmp_20_5_60_12_45_18,1,ok,ok,ok +BF18_cpmp_20_5_60_12_45_19,1,ok,ok,ok +BF18_cpmp_20_5_60_12_45_2,1,ok,ok,ok +BF18_cpmp_20_5_60_12_45_3,1,ok,ok,ok +BF18_cpmp_20_5_60_12_45_4,1,ok,ok,ok +BF18_cpmp_20_5_60_12_45_5,1,ok,ok,ok +BF18_cpmp_20_5_60_12_45_6,1,ok,ok,ok +BF18_cpmp_20_5_60_12_45_7,1,ok,ok,ok +BF18_cpmp_20_5_60_12_45_8,1,ok,ok,ok +BF18_cpmp_20_5_60_12_45_9,1,ok,ok,ok +BF19_cpmp_20_5_60_24_36_1,1,ok,ok,ok +BF19_cpmp_20_5_60_24_36_10,1,ok,ok,ok +BF19_cpmp_20_5_60_24_36_11,1,ok,ok,ok +BF19_cpmp_20_5_60_24_36_14,1,ok,ok,ok +BF19_cpmp_20_5_60_24_36_15,1,ok,ok,ok +BF19_cpmp_20_5_60_24_36_16,1,ok,ok,ok +BF19_cpmp_20_5_60_24_36_17,1,ok,ok,ok +BF19_cpmp_20_5_60_24_36_18,1,ok,ok,ok +BF19_cpmp_20_5_60_24_36_2,1,ok,ok,ok +BF19_cpmp_20_5_60_24_36_20,1,ok,ok,ok +BF19_cpmp_20_5_60_24_36_3,1,ok,ok,ok +BF19_cpmp_20_5_60_24_36_5,1,ok,ok,ok +BF19_cpmp_20_5_60_24_36_6,1,ok,ok,ok +BF19_cpmp_20_5_60_24_36_7,1,ok,ok,ok +BF19_cpmp_20_5_60_24_36_8,1,ok,ok,ok 
+BF19_cpmp_20_5_60_24_36_9,1,ok,ok,ok +BF2_cpmp_16_5_48_10_36_1,1,ok,ok,ok +BF2_cpmp_16_5_48_10_36_11,1,ok,ok,ok +BF2_cpmp_16_5_48_10_36_12,1,ok,ok,ok +BF2_cpmp_16_5_48_10_36_13,1,ok,ok,ok +BF2_cpmp_16_5_48_10_36_14,1,ok,ok,ok +BF2_cpmp_16_5_48_10_36_15,1,ok,ok,ok +BF2_cpmp_16_5_48_10_36_17,1,ok,ok,ok +BF2_cpmp_16_5_48_10_36_18,1,ok,ok,ok +BF2_cpmp_16_5_48_10_36_19,1,ok,ok,ok +BF2_cpmp_16_5_48_10_36_2,1,ok,ok,ok +BF2_cpmp_16_5_48_10_36_20,1,ok,ok,ok +BF2_cpmp_16_5_48_10_36_3,1,ok,ok,ok +BF2_cpmp_16_5_48_10_36_4,1,ok,ok,ok +BF2_cpmp_16_5_48_10_36_5,1,ok,ok,ok +BF2_cpmp_16_5_48_10_36_6,1,ok,ok,ok +BF2_cpmp_16_5_48_10_36_7,1,ok,ok,ok +BF2_cpmp_16_5_48_10_36_8,1,ok,ok,ok +BF2_cpmp_16_5_48_10_36_9,1,ok,ok,ok +BF20_cpmp_20_5_60_24_45_1,1,ok,ok,ok +BF20_cpmp_20_5_60_24_45_10,1,ok,ok,ok +BF20_cpmp_20_5_60_24_45_11,1,ok,ok,ok +BF20_cpmp_20_5_60_24_45_12,1,ok,ok,ok +BF20_cpmp_20_5_60_24_45_13,1,ok,ok,ok +BF20_cpmp_20_5_60_24_45_14,1,ok,ok,ok +BF20_cpmp_20_5_60_24_45_15,1,ok,ok,ok +BF20_cpmp_20_5_60_24_45_17,1,ok,ok,ok +BF20_cpmp_20_5_60_24_45_2,1,ok,ok,ok +BF20_cpmp_20_5_60_24_45_20,1,ok,ok,ok +BF20_cpmp_20_5_60_24_45_3,1,ok,ok,ok +BF20_cpmp_20_5_60_24_45_5,1,ok,ok,ok +BF20_cpmp_20_5_60_24_45_6,1,ok,ok,ok +BF20_cpmp_20_5_60_24_45_7,1,ok,ok,ok +BF20_cpmp_20_5_60_24_45_8,1,ok,ok,ok +BF20_cpmp_20_5_60_24_45_9,1,ok,ok,ok +BF21_cpmp_20_5_80_16_48_11,1,ok,ok,ok +BF21_cpmp_20_5_80_16_48_14,1,ok,ok,ok +BF21_cpmp_20_5_80_16_48_16,1,ok,ok,ok +BF21_cpmp_20_5_80_16_48_18,1,ok,ok,ok +BF21_cpmp_20_5_80_16_48_19,1,ok,ok,ok +BF21_cpmp_20_5_80_16_48_2,1,ok,ok,ok +BF21_cpmp_20_5_80_16_48_3,1,ok,ok,ok +BF21_cpmp_20_5_80_16_48_4,1,ok,ok,ok +BF21_cpmp_20_5_80_16_48_5,1,ok,ok,ok +BF21_cpmp_20_5_80_16_48_7,1,ok,ok,ok +BF21_cpmp_20_5_80_16_48_8,1,ok,ok,ok +BF21_cpmp_20_5_80_16_48_9,1,ok,ok,ok +BF22_cpmp_20_5_80_16_60_11,1,ok,ok,ok +BF22_cpmp_20_5_80_16_60_16,1,ok,ok,ok +BF22_cpmp_20_5_80_16_60_4,1,ok,ok,ok +BF22_cpmp_20_5_80_16_60_6,1,ok,ok,ok +BF22_cpmp_20_5_80_16_60_7,1,ok,ok,ok 
+BF22_cpmp_20_5_80_16_60_9,1,ok,ok,ok +BF23_cpmp_20_5_80_32_48_10,1,ok,ok,ok +BF23_cpmp_20_5_80_32_48_11,1,ok,ok,ok +BF23_cpmp_20_5_80_32_48_12,1,ok,ok,ok +BF23_cpmp_20_5_80_32_48_15,1,ok,ok,ok +BF23_cpmp_20_5_80_32_48_16,1,ok,ok,ok +BF23_cpmp_20_5_80_32_48_18,1,ok,ok,ok +BF23_cpmp_20_5_80_32_48_3,1,ok,ok,ok +BF23_cpmp_20_5_80_32_48_4,1,ok,ok,ok +BF23_cpmp_20_5_80_32_48_6,1,ok,ok,ok +BF23_cpmp_20_5_80_32_48_7,1,ok,ok,ok +BF24_cpmp_20_5_80_32_60_13,1,ok,ok,ok +BF24_cpmp_20_5_80_32_60_17,1,ok,ok,ok +BF24_cpmp_20_5_80_32_60_2,1,ok,ok,ok +BF24_cpmp_20_5_80_32_60_8,1,ok,ok,ok +BF25_cpmp_20_8_96_20_58_20,1,ok,ok,ok +BF26_cpmp_20_8_96_20_72_5,1,ok,ok,ok +BF27_cpmp_20_8_96_39_58_12,1,ok,ok,ok +BF27_cpmp_20_8_96_39_58_6,1,ok,ok,ok +BF27_cpmp_20_8_96_39_58_9,1,ok,ok,ok +BF28_cpmp_20_8_96_39_72_19,1,ok,ok,ok +BF3_cpmp_16_5_48_20_29_1,1,ok,ok,ok +BF3_cpmp_16_5_48_20_29_11,1,ok,ok,ok +BF3_cpmp_16_5_48_20_29_13,1,ok,ok,ok +BF3_cpmp_16_5_48_20_29_14,1,ok,ok,ok +BF3_cpmp_16_5_48_20_29_16,1,ok,ok,ok +BF3_cpmp_16_5_48_20_29_19,1,ok,ok,ok +BF3_cpmp_16_5_48_20_29_20,1,ok,ok,ok +BF3_cpmp_16_5_48_20_29_3,1,ok,ok,ok +BF3_cpmp_16_5_48_20_29_5,1,ok,ok,ok +BF3_cpmp_16_5_48_20_29_6,1,ok,ok,ok +BF3_cpmp_16_5_48_20_29_8,1,ok,ok,ok +BF3_cpmp_16_5_48_20_29_9,1,ok,ok,ok +BF4_cpmp_16_5_48_20_36_1,1,ok,ok,ok +BF4_cpmp_16_5_48_20_36_10,1,ok,ok,ok +BF4_cpmp_16_5_48_20_36_11,1,ok,ok,ok +BF4_cpmp_16_5_48_20_36_13,1,ok,ok,ok +BF4_cpmp_16_5_48_20_36_14,1,ok,ok,ok +BF4_cpmp_16_5_48_20_36_15,1,ok,ok,ok +BF4_cpmp_16_5_48_20_36_16,1,ok,ok,ok +BF4_cpmp_16_5_48_20_36_17,1,ok,ok,ok +BF4_cpmp_16_5_48_20_36_18,1,ok,ok,ok +BF4_cpmp_16_5_48_20_36_19,1,ok,ok,ok +BF4_cpmp_16_5_48_20_36_2,1,ok,ok,ok +BF4_cpmp_16_5_48_20_36_20,1,ok,ok,ok +BF4_cpmp_16_5_48_20_36_3,1,ok,ok,ok +BF4_cpmp_16_5_48_20_36_5,1,ok,ok,ok +BF4_cpmp_16_5_48_20_36_6,1,ok,ok,ok +BF4_cpmp_16_5_48_20_36_7,1,ok,ok,ok +BF4_cpmp_16_5_48_20_36_8,1,ok,ok,ok +BF4_cpmp_16_5_48_20_36_9,1,ok,ok,ok +BF5_cpmp_16_5_64_13_39_1,1,ok,ok,ok 
+BF5_cpmp_16_5_64_13_39_10,1,ok,ok,ok +BF5_cpmp_16_5_64_13_39_11,1,ok,ok,ok +BF5_cpmp_16_5_64_13_39_12,1,ok,ok,ok +BF5_cpmp_16_5_64_13_39_14,1,ok,ok,ok +BF5_cpmp_16_5_64_13_39_15,1,ok,ok,ok +BF5_cpmp_16_5_64_13_39_17,1,ok,ok,ok +BF5_cpmp_16_5_64_13_39_2,1,ok,ok,ok +BF5_cpmp_16_5_64_13_39_20,1,ok,ok,ok +BF5_cpmp_16_5_64_13_39_3,1,ok,ok,ok +BF5_cpmp_16_5_64_13_39_4,1,ok,ok,ok +BF5_cpmp_16_5_64_13_39_5,1,ok,ok,ok +BF5_cpmp_16_5_64_13_39_8,1,ok,ok,ok +BF5_cpmp_16_5_64_13_39_9,1,ok,ok,ok +BF6_cpmp_16_5_64_13_48_1,1,ok,ok,ok +BF6_cpmp_16_5_64_13_48_11,1,ok,ok,ok +BF6_cpmp_16_5_64_13_48_12,1,ok,ok,ok +BF6_cpmp_16_5_64_13_48_13,1,ok,ok,ok +BF6_cpmp_16_5_64_13_48_15,1,ok,ok,ok +BF6_cpmp_16_5_64_13_48_16,1,ok,ok,ok +BF6_cpmp_16_5_64_13_48_17,1,ok,ok,ok +BF6_cpmp_16_5_64_13_48_18,1,ok,ok,ok +BF6_cpmp_16_5_64_13_48_19,1,ok,ok,ok +BF6_cpmp_16_5_64_13_48_2,1,ok,ok,ok +BF6_cpmp_16_5_64_13_48_3,1,ok,ok,ok +BF6_cpmp_16_5_64_13_48_4,1,ok,ok,ok +BF6_cpmp_16_5_64_13_48_6,1,ok,ok,ok +BF6_cpmp_16_5_64_13_48_7,1,ok,ok,ok +BF6_cpmp_16_5_64_13_48_8,1,ok,ok,ok +BF6_cpmp_16_5_64_13_48_9,1,ok,ok,ok +BF7_cpmp_16_5_64_26_39_1,1,ok,ok,ok +BF7_cpmp_16_5_64_26_39_10,1,ok,ok,ok +BF7_cpmp_16_5_64_26_39_11,1,ok,ok,ok +BF7_cpmp_16_5_64_26_39_12,1,ok,ok,ok +BF7_cpmp_16_5_64_26_39_13,1,ok,ok,ok +BF7_cpmp_16_5_64_26_39_14,1,ok,ok,ok +BF7_cpmp_16_5_64_26_39_15,1,ok,ok,ok +BF7_cpmp_16_5_64_26_39_16,1,ok,ok,ok +BF7_cpmp_16_5_64_26_39_17,1,ok,ok,ok +BF7_cpmp_16_5_64_26_39_19,1,ok,ok,ok +BF7_cpmp_16_5_64_26_39_2,1,ok,ok,ok +BF7_cpmp_16_5_64_26_39_20,1,ok,ok,ok +BF7_cpmp_16_5_64_26_39_3,1,ok,ok,ok +BF7_cpmp_16_5_64_26_39_6,1,ok,ok,ok +BF7_cpmp_16_5_64_26_39_7,1,ok,ok,ok +BF7_cpmp_16_5_64_26_39_8,1,ok,ok,ok +BF8_cpmp_16_5_64_26_48_1,1,ok,ok,ok +BF8_cpmp_16_5_64_26_48_11,1,ok,ok,ok +BF8_cpmp_16_5_64_26_48_12,1,ok,ok,ok +BF8_cpmp_16_5_64_26_48_17,1,ok,ok,ok +BF8_cpmp_16_5_64_26_48_20,1,ok,ok,ok +BF8_cpmp_16_5_64_26_48_4,1,ok,ok,ok +BF8_cpmp_16_5_64_26_48_5,1,ok,ok,ok +BF8_cpmp_16_5_64_26_48_6,1,ok,ok,ok 
+BF8_cpmp_16_5_64_26_48_7,1,ok,ok,ok +BF9_cpmp_16_8_77_16_47_1,1,ok,ok,ok +BF9_cpmp_16_8_77_16_47_10,1,ok,ok,ok +BF9_cpmp_16_8_77_16_47_12,1,ok,ok,ok +BF9_cpmp_16_8_77_16_47_3,1,ok,ok,ok +BF9_cpmp_16_8_77_16_47_4,1,ok,ok,ok +BF9_cpmp_16_8_77_16_47_6,1,ok,ok,ok +BF9_cpmp_16_8_77_16_47_7,1,ok,ok,ok +BF9_cpmp_16_8_77_16_47_8,1,ok,ok,ok +LC2a_lc2a_1,1,ok,ok,ok +LC2a_lc2a_10,1,ok,ok,ok +LC2a_lc2a_2,1,ok,ok,ok +LC2a_lc2a_3,1,ok,ok,ok +LC2a_lc2a_4,1,ok,ok,ok +LC2a_lc2a_5,1,ok,ok,ok +LC2a_lc2a_7,1,ok,ok,ok +LC2b_lc2b_1,1,ok,ok,ok +LC2b_lc2b_2,1,ok,ok,ok +LC2b_lc2b_3,1,ok,ok,ok +LC2b_lc2b_4,1,ok,ok,ok +LC2b_lc2b_5,1,ok,ok,ok +LC2b_lc2b_6,1,ok,ok,ok +LC2b_lc2b_7,1,ok,ok,ok +LC2b_lc2b_9,1,ok,ok,ok +LC3a_lc3a_1,1,ok,ok,ok +LC3a_lc3a_10,1,ok,ok,ok +LC3a_lc3a_3,1,ok,ok,ok +LC3a_lc3a_4,1,ok,ok,ok +LC3a_lc3a_6,1,ok,ok,ok +LC3a_lc3a_7,1,ok,ok,ok +LC3a_lc3a_8,1,ok,ok,ok +LC3a_lc3a_9,1,ok,ok,ok +LC3b_lc3b_1,1,ok,ok,ok +LC3b_lc3b_10,1,ok,ok,ok +LC3b_lc3b_4,1,ok,ok,ok +LC3b_lc3b_5,1,ok,ok,ok +LC3b_lc3b_6,1,ok,ok,ok +LC3b_lc3b_8,1,ok,ok,ok +LC3b_lc3b_9,1,ok,ok,ok +cv_data5-10-33,1,ok,ok,ok +cv_data4-6-1,1,ok,ok,ok +cv_data4-4-19,1,ok,ok,ok +cv_data5-4-23,1,ok,ok,ok +cv_data4-5-2,1,ok,ok,ok +cv_data4-5-26,1,ok,ok,ok +cv_data5-5-12,1,ok,ok,ok +cv_data4-5-9,1,ok,ok,ok +cv_data5-7-19,1,ok,ok,ok +cv_data4-6-18,1,ok,ok,ok +cv_data4-4-21,1,ok,ok,ok +cv_data4-7-39,1,ok,ok,ok +cv_data4-7-27,1,ok,ok,ok +cv_data5-7-40,1,ok,ok,ok +cv_data5-6-26,1,ok,ok,ok +cv_data5-6-6,1,ok,ok,ok +cv_data4-5-1,1,ok,ok,ok +cv_data5-4-21,1,ok,ok,ok +cv_data4-5-22,1,ok,ok,ok +cv_data5-7-13,1,ok,ok,ok +cv_data5-8-38,1,ok,ok,ok +cv_data4-7-38,1,ok,ok,ok +cv_data3-8-38,1,ok,ok,ok +cv_data4-4-16,1,ok,ok,ok +cv_data4-7-37,1,ok,ok,ok +cv_data5-7-15,1,ok,ok,ok +cv_data4-7-25,1,ok,ok,ok +cv_data5-5-38,1,ok,ok,ok +cv_data5-8-39,1,ok,ok,ok +cv_data5-5-29,1,ok,ok,ok +cv_data5-5-25,1,ok,ok,ok +cv_data4-7-30,1,ok,ok,ok +cv_data5-4-36,1,ok,ok,ok +cv_data4-4-15,1,ok,ok,ok +cv_data5-8-19,1,ok,ok,ok +cv_data5-10-23,1,ok,ok,ok 
+cv_data5-5-11,1,ok,ok,ok +cv_data4-7-20,1,ok,ok,ok +cv_data4-7-35,1,ok,ok,ok +cv_data4-6-7,1,ok,ok,ok +cv_data4-7-6,1,ok,ok,ok +cv_data4-5-25,1,ok,ok,ok +cv_data5-8-2,1,ok,ok,ok +cv_data4-4-33,1,ok,ok,ok +cv_data4-5-5,1,ok,ok,ok +cv_data3-8-37,1,ok,ok,ok +cv_data4-5-20,1,ok,ok,ok +cv_data5-4-37,1,ok,ok,ok +cv_data4-6-5,1,ok,ok,ok +cv_data4-6-35,1,ok,ok,ok +cv_data4-7-1,1,ok,ok,ok +cv_data5-5-5,1,ok,ok,ok +cv_data5-5-37,1,ok,ok,ok +cv_data4-5-24,1,ok,ok,ok +cv_data5-4-15,1,ok,ok,ok +cv_data5-9-21,1,ok,ok,ok +cv_data5-4-27,1,ok,ok,ok +cv_data5-6-25,1,ok,ok,ok +cv_data5-8-25,1,ok,ok,ok +cv_data4-6-21,1,ok,ok,ok +cv_data5-5-16,1,ok,ok,ok +cv_data5-4-4,1,ok,ok,ok +cv_data5-5-39,1,ok,ok,ok +cv_data4-7-22,1,ok,ok,ok +cv_data5-6-11,1,ok,ok,ok +cv_data4-6-9,1,ok,ok,ok +cv_data3-5-13,1,ok,ok,ok +cv_data3-8-2,1,ok,ok,ok +cv_data5-6-2,1,ok,ok,ok +cv_data5-4-17,1,ok,ok,ok +cv_data5-8-6,1,ok,ok,ok +cv_data4-6-20,1,ok,ok,ok +cv_data5-4-7,1,ok,ok,ok +cv_data4-7-19,1,ok,ok,ok +cv_data5-5-33,1,ok,ok,ok +cv_data5-5-27,1,ok,ok,ok +cv_data5-4-31,1,ok,ok,ok +cv_data4-5-30,1,ok,ok,ok +cv_data5-6-8,1,ok,ok,ok +cv_data5-5-14,1,ok,ok,ok +cv_data4-6-16,1,ok,ok,ok +cv_data5-5-40,1,ok,ok,ok +cv_data4-7-26,1,ok,ok,ok +cv_data5-5-7,1,ok,ok,ok +cv_data3-7-18,1,ok,ok,ok +cv_data5-5-24,1,ok,ok,ok +cv_data5-4-3,1,ok,ok,ok +cv_data5-5-23,1,ok,ok,ok +cv_data5-5-10,1,ok,ok,ok +cv_data5-5-18,1,ok,ok,ok +cv_data4-7-24,1,ok,ok,ok +cv_data5-4-13,1,ok,ok,ok +cv_data5-9-38,1,ok,ok,ok +cv_data5-5-8,1,ok,ok,ok +cv_data4-7-40,1,ok,ok,ok +cv_data5-7-33,1,ok,ok,ok +cv_data4-4-12,1,ok,ok,ok +cv_data4-6-36,1,ok,ok,ok +cv_data5-4-20,1,ok,ok,ok +cv_data4-5-23,1,ok,ok,ok +cv_data4-7-13,1,ok,ok,ok +cv_data5-9-32,1,ok,ok,ok +cv_data4-7-32,1,ok,ok,ok +cv_data5-9-4,1,ok,ok,ok +cv_data4-7-23,1,ok,ok,ok +cv_data5-5-3,1,ok,ok,ok +cv_data4-5-39,1,ok,ok,ok +cv_data5-7-36,1,ok,ok,ok +cv_data4-4-3,1,ok,ok,ok +cv_data4-7-36,1,ok,ok,ok +cv_data5-5-9,1,ok,ok,ok +cv_data4-4-23,1,ok,ok,ok +cv_data3-7-22,1,ok,ok,ok 
+cv_data5-6-28,1,ok,ok,ok +cv_data4-6-37,1,ok,ok,ok +cv_data5-5-21,1,ok,ok,ok +cv_data4-6-30,1,ok,ok,ok +cv_data4-5-27,1,ok,ok,ok +cv_data4-5-21,1,ok,ok,ok +cv_data3-8-25,1,ok,ok,ok +cv_data4-5-15,1,ok,ok,ok +cv_data5-5-31,1,ok,ok,ok +cv_data4-7-2,1,ok,ok,ok +cv_data5-4-10,1,ok,ok,ok +cv_data4-4-34,1,ok,ok,ok +cv_data5-7-4,1,ok,ok,ok +cv_data4-4-14,1,ok,ok,ok +cv_data5-7-22,1,ok,ok,ok +cv_data4-6-2,1,ok,ok,ok +cv_data4-7-12,1,ok,ok,ok +cv_data5-5-30,1,ok,ok,ok +cv_data3-8-28,1,ok,ok,ok +cv_data5-6-37,1,ok,ok,ok +cv_data5-6-20,1,ok,ok,ok +cv_data4-6-29,1,ok,ok,ok +cv_data4-4-10,1,ok,ok,ok +cv_data4-5-29,1,ok,ok,ok +cv_data3-8-24,1,ok,ok,ok +cv_data5-4-35,1,ok,ok,ok +cv_data5-4-22,1,ok,ok,ok +cv_data5-4-29,1,ok,ok,ok +cv_data3-7-27,1,ok,ok,ok +cv_data3-7-5,1,ok,ok,ok +cv_data4-5-3,1,ok,ok,ok +cv_data4-6-12,1,ok,ok,ok +cv_data5-8-8,1,ok,ok,ok +cv_data4-6-25,1,ok,ok,ok +cv_data4-7-31,1,ok,ok,ok +cv_data5-6-22,1,ok,ok,ok +cv_data4-6-19,1,ok,ok,ok +cv_data5-4-30,1,ok,ok,ok +cv_data5-5-17,1,ok,ok,ok +cv_data4-6-34,1,ok,ok,ok +cv_data5-5-32,1,ok,ok,ok +cv_data4-6-32,1,ok,ok,ok +cv_data5-4-33,1,ok,ok,ok +cv_data3-7-12,1,ok,ok,ok +cv_data4-7-11,1,ok,ok,ok +cv_data4-4-35,1,ok,ok,ok +cv_data5-4-25,1,ok,ok,ok +cv_data4-6-10,1,ok,ok,ok +cv_data4-7-21,1,ok,ok,ok +cv_data4-6-4,1,ok,ok,ok +cv_data5-4-38,1,ok,ok,ok +cv_data4-7-7,1,ok,ok,ok +cv_data4-7-3,1,ok,ok,ok +cv_data5-8-3,1,ok,ok,ok +cv_data5-7-37,1,ok,ok,ok +cv_data5-5-20,1,ok,ok,ok +cv_data4-4-24,1,ok,ok,ok +cv_data5-4-14,1,ok,ok,ok +cv_data5-8-4,1,ok,ok,ok +cv_data4-6-40,1,ok,ok,ok +cv_data5-6-31,1,ok,ok,ok +cv_data4-6-14,1,ok,ok,ok +cv_data5-7-1,1,ok,ok,ok +cv_data5-7-32,1,ok,ok,ok +cv_data4-7-28,1,ok,ok,ok +cv_data4-4-2,1,ok,ok,ok +cv_data5-7-12,1,ok,ok,ok +cv_data4-7-5,1,ok,ok,ok +cv_data5-4-1,1,ok,ok,ok +cv_data4-5-17,1,ok,ok,ok +cv_data4-6-33,1,ok,ok,ok +cv_data4-7-8,1,ok,ok,ok +cv_data5-4-24,1,ok,ok,ok +cv_data4-4-13,1,ok,ok,ok +cv_data5-4-12,1,ok,ok,ok +cv_data5-6-7,1,ok,ok,ok +cv_data5-6-35,1,ok,ok,ok 
+cv_data4-6-11,1,ok,ok,ok +cv_data5-4-16,1,ok,ok,ok +cv_data4-7-4,1,ok,ok,ok +cv_data5-7-11,1,ok,ok,ok +cv_data5-6-33,1,ok,ok,ok +cv_data5-5-28,1,ok,ok,ok +cv_data5-5-15,1,ok,ok,ok +cv_data3-6-24,1,ok,ok,ok +cv_data5-4-34,1,ok,ok,ok +cv_data4-5-35,1,ok,ok,ok +cv_data5-7-31,1,ok,ok,ok +cv_data4-7-10,1,ok,ok,ok +cv_data4-7-29,1,ok,ok,ok +cv_data5-4-18,1,ok,ok,ok +cv_data5-6-19,1,ok,ok,ok +cv_data4-5-32,1,ok,ok,ok +cv_data4-6-13,1,ok,ok,ok +cv_data5-8-36,1,ok,ok,ok +cv_data4-5-12,1,ok,ok,ok +cv_data4-5-28,1,ok,ok,ok +cv_data5-5-4,1,ok,ok,ok +cv_data4-4-7,1,ok,ok,ok +cv_data5-5-36,1,ok,ok,ok +cv_data4-7-16,1,ok,ok,ok +cv_data4-6-39,1,ok,ok,ok +cv_data5-5-6,1,ok,ok,ok +cv_data5-4-9,1,ok,ok,ok +cv_data4-4-29,1,ok,ok,ok +cv_data5-6-5,1,ok,ok,ok +cv_data3-7-17,1,ok,ok,ok +cv_data4-6-27,1,ok,ok,ok +cv_data5-5-22,1,ok,ok,ok +cv_data5-9-3,1,ok,ok,ok +cv_data4-4-28,1,ok,ok,ok +cv_data4-4-27,1,ok,ok,ok +cv_data4-6-24,1,ok,ok,ok +cv_data4-6-8,1,ok,ok,ok +cv_data4-4-20,1,ok,ok,ok +cv_data3-6-26,1,ok,ok,ok +cv_data4-7-34,1,ok,ok,ok +cv_data5-10-32,1,ok,ok,ok +cv_data5-4-32,1,ok,ok,ok +cv_data4-7-33,1,ok,ok,ok +cv_data4-6-28,1,ok,ok,ok +cv_data3-7-30,1,ok,ok,ok +cv_data5-5-19,1,ok,ok,ok +cv_data4-4-30,1,ok,ok,ok +cv_data4-7-15,1,ok,ok,ok +cv_data5-6-30,1,ok,ok,ok +cv_data4-7-17,1,ok,ok,ok +cv_data4-5-31,1,ok,ok,ok +cv_data4-5-34,1,ok,ok,ok +cv_data4-4-25,1,ok,ok,ok +cv_data5-7-18,1,ok,ok,ok +cv_data3-8-17,1,ok,ok,ok +cv_data4-6-15,1,ok,ok,ok +cv_data5-7-7,1,ok,ok,ok +cv_data4-7-14,1,ok,ok,ok +cv_data5-10-19,1,ok,ok,ok +cv_data5-8-29,1,ok,ok,ok +cv_data3-8-7,1,ok,ok,ok +cv_data4-5-14,1,ok,ok,ok +cv_data5-7-16,1,ok,ok,ok +cv_data4-5-19,1,ok,ok,ok +cv_data3-5-8,1,ok,ok,ok +cv_data5-9-10,1,ok,ok,ok +cv_data4-7-9,1,ok,ok,ok +cv_data5-5-13,1,ok,ok,ok +cv_data4-6-3,1,ok,ok,ok +cv_data5-4-6,1,ok,ok,ok diff --git a/_articles/RJ-2025-045/CPMP-2015_data/feature_runstatus_test.arff b/_articles/RJ-2025-045/CPMP-2015_data/feature_runstatus_test.arff new file mode 100644 index 
0000000000..9e1051965e --- /dev/null +++ b/_articles/RJ-2025-045/CPMP-2015_data/feature_runstatus_test.arff @@ -0,0 +1,557 @@ +@RELATION feature_runstatus_premarshalling_astar_2013 + +@ATTRIBUTE instance_id STRING +@ATTRIBUTE repetition NUMERIC +@ATTRIBUTE orig {ok, timeout, memout, presolved, crash, other, unknown} +@ATTRIBUTE lfa1 {ok, timeout, memout, presolved, crash, other, unknown} +@ATTRIBUTE lfa2 {ok, timeout, memout, presolved, crash, other, unknown} + +@DATA + +4-6-75pct-2_10,1,ok,ok,ok +4-6-75pct-2_102,1,ok,ok,ok +4-6-75pct-2_103,1,ok,ok,ok +4-6-75pct-2_104,1,ok,ok,ok +4-6-75pct-2_105,1,ok,ok,ok +4-6-75pct-2_106,1,ok,ok,ok +4-6-75pct-2_107,1,ok,ok,ok +4-6-75pct-2_109,1,ok,ok,ok +4-6-75pct-2_11,1,ok,ok,ok +4-6-75pct-2_112,1,ok,ok,ok +4-6-75pct-2_113,1,ok,ok,ok +4-6-75pct-2_115,1,ok,ok,ok +4-6-75pct-2_116,1,ok,ok,ok +4-6-75pct-2_117,1,ok,ok,ok +4-6-75pct-2_119,1,ok,ok,ok +4-6-75pct-2_121,1,ok,ok,ok +4-6-75pct-2_122,1,ok,ok,ok +4-6-75pct-2_124,1,ok,ok,ok +4-6-75pct-2_125,1,ok,ok,ok +4-6-75pct-2_129,1,ok,ok,ok +4-6-75pct-2_130,1,ok,ok,ok +4-6-75pct-2_134,1,ok,ok,ok +4-6-75pct-2_136,1,ok,ok,ok +4-6-75pct-2_144,1,ok,ok,ok +4-6-75pct-2_146,1,ok,ok,ok +4-6-75pct-2_147,1,ok,ok,ok +4-6-75pct-2_149,1,ok,ok,ok +4-6-75pct-2_151,1,ok,ok,ok +4-6-75pct-2_152,1,ok,ok,ok +4-6-75pct-2_153,1,ok,ok,ok +4-6-75pct-2_155,1,ok,ok,ok +4-6-75pct-2_156,1,ok,ok,ok +4-6-75pct-2_157,1,ok,ok,ok +4-6-75pct-2_159,1,ok,ok,ok +4-6-75pct-2_161,1,ok,ok,ok +4-6-75pct-2_163,1,ok,ok,ok +4-6-75pct-2_165,1,ok,ok,ok +4-6-75pct-2_166,1,ok,ok,ok +4-6-75pct-2_169,1,ok,ok,ok +4-6-75pct-2_17,1,ok,ok,ok +4-6-75pct-2_171,1,ok,ok,ok +4-6-75pct-2_172,1,ok,ok,ok +4-6-75pct-2_178,1,ok,ok,ok +4-6-75pct-2_181,1,ok,ok,ok +4-6-75pct-2_183,1,ok,ok,ok +4-6-75pct-2_185,1,ok,ok,ok +4-6-75pct-2_187,1,ok,ok,ok +4-6-75pct-2_19,1,ok,ok,ok +4-6-75pct-2_193,1,ok,ok,ok +4-6-75pct-2_194,1,ok,ok,ok +4-6-75pct-2_198,1,ok,ok,ok +4-6-75pct-2_199,1,ok,ok,ok +4-6-75pct-2_2,1,ok,ok,ok +4-6-75pct-2_20,1,ok,ok,ok
+4-6-75pct-2_209,1,ok,ok,ok +4-6-75pct-2_21,1,ok,ok,ok +4-6-75pct-2_210,1,ok,ok,ok +4-6-75pct-2_214,1,ok,ok,ok +4-6-75pct-2_216,1,ok,ok,ok +4-6-75pct-2_218,1,ok,ok,ok +4-6-75pct-2_219,1,ok,ok,ok +4-6-75pct-2_221,1,ok,ok,ok +4-6-75pct-2_224,1,ok,ok,ok +4-6-75pct-2_225,1,ok,ok,ok +4-6-75pct-2_226,1,ok,ok,ok +4-6-75pct-2_227,1,ok,ok,ok +4-6-75pct-2_23,1,ok,ok,ok +4-6-75pct-2_230,1,ok,ok,ok +4-6-75pct-2_231,1,ok,ok,ok +4-6-75pct-2_235,1,ok,ok,ok +4-6-75pct-2_236,1,ok,ok,ok +4-6-75pct-2_239,1,ok,ok,ok +4-6-75pct-2_242,1,ok,ok,ok +4-6-75pct-2_243,1,ok,ok,ok +4-6-75pct-2_248,1,ok,ok,ok +4-6-75pct-2_25,1,ok,ok,ok +4-6-75pct-2_28,1,ok,ok,ok +4-6-75pct-2_31,1,ok,ok,ok +4-6-75pct-2_33,1,ok,ok,ok +4-6-75pct-2_35,1,ok,ok,ok +4-6-75pct-2_36,1,ok,ok,ok +4-6-75pct-2_44,1,ok,ok,ok +4-6-75pct-2_47,1,ok,ok,ok +4-6-75pct-2_49,1,ok,ok,ok +4-6-75pct-2_55,1,ok,ok,ok +4-6-75pct-2_60,1,ok,ok,ok +4-6-75pct-2_62,1,ok,ok,ok +4-6-75pct-2_64,1,ok,ok,ok +4-6-75pct-2_65,1,ok,ok,ok +4-6-75pct-2_68,1,ok,ok,ok +4-6-75pct-2_71,1,ok,ok,ok +4-6-75pct-2_75,1,ok,ok,ok +4-6-75pct-2_76,1,ok,ok,ok +4-6-75pct-2_78,1,ok,ok,ok +4-6-75pct-2_83,1,ok,ok,ok +4-6-75pct-2_84,1,ok,ok,ok +4-6-75pct-2_86,1,ok,ok,ok +4-6-75pct-2_89,1,ok,ok,ok +4-6-75pct-2_93,1,ok,ok,ok +4-6-75pct-2_95,1,ok,ok,ok +4-6-75pct-2_96,1,ok,ok,ok +4-6-75pct-2_97,1,ok,ok,ok +4-6-75pct-2_99,1,ok,ok,ok +6-6-75pct-2_0,1,ok,ok,ok +6-6-75pct-2_1,1,ok,ok,ok +6-6-75pct-2_102,1,ok,ok,ok +6-6-75pct-2_103,1,ok,ok,ok +6-6-75pct-2_104,1,ok,ok,ok +6-6-75pct-2_105,1,ok,ok,ok +6-6-75pct-2_108,1,ok,ok,ok +6-6-75pct-2_113,1,ok,ok,ok +6-6-75pct-2_114,1,ok,ok,ok +6-6-75pct-2_115,1,ok,ok,ok +6-6-75pct-2_117,1,ok,ok,ok +6-6-75pct-2_119,1,ok,ok,ok +6-6-75pct-2_120,1,ok,ok,ok +6-6-75pct-2_122,1,ok,ok,ok +6-6-75pct-2_126,1,ok,ok,ok +6-6-75pct-2_127,1,ok,ok,ok +6-6-75pct-2_128,1,ok,ok,ok +6-6-75pct-2_13,1,ok,ok,ok +6-6-75pct-2_132,1,ok,ok,ok +6-6-75pct-2_135,1,ok,ok,ok +6-6-75pct-2_136,1,ok,ok,ok +6-6-75pct-2_138,1,ok,ok,ok +6-6-75pct-2_142,1,ok,ok,ok 
+6-6-75pct-2_143,1,ok,ok,ok +6-6-75pct-2_146,1,ok,ok,ok +6-6-75pct-2_147,1,ok,ok,ok +6-6-75pct-2_149,1,ok,ok,ok +6-6-75pct-2_15,1,ok,ok,ok +6-6-75pct-2_150,1,ok,ok,ok +6-6-75pct-2_151,1,ok,ok,ok +6-6-75pct-2_152,1,ok,ok,ok +6-6-75pct-2_153,1,ok,ok,ok +6-6-75pct-2_156,1,ok,ok,ok +6-6-75pct-2_158,1,ok,ok,ok +6-6-75pct-2_16,1,ok,ok,ok +6-6-75pct-2_160,1,ok,ok,ok +6-6-75pct-2_161,1,ok,ok,ok +6-6-75pct-2_162,1,ok,ok,ok +6-6-75pct-2_164,1,ok,ok,ok +6-6-75pct-2_165,1,ok,ok,ok +6-6-75pct-2_167,1,ok,ok,ok +6-6-75pct-2_17,1,ok,ok,ok +6-6-75pct-2_170,1,ok,ok,ok +6-6-75pct-2_171,1,ok,ok,ok +6-6-75pct-2_172,1,ok,ok,ok +6-6-75pct-2_173,1,ok,ok,ok +6-6-75pct-2_174,1,ok,ok,ok +6-6-75pct-2_175,1,ok,ok,ok +6-6-75pct-2_176,1,ok,ok,ok +6-6-75pct-2_177,1,ok,ok,ok +6-6-75pct-2_18,1,ok,ok,ok +6-6-75pct-2_180,1,ok,ok,ok +6-6-75pct-2_181,1,ok,ok,ok +6-6-75pct-2_183,1,ok,ok,ok +6-6-75pct-2_184,1,ok,ok,ok +6-6-75pct-2_185,1,ok,ok,ok +6-6-75pct-2_186,1,ok,ok,ok +6-6-75pct-2_187,1,ok,ok,ok +6-6-75pct-2_188,1,ok,ok,ok +6-6-75pct-2_190,1,ok,ok,ok +6-6-75pct-2_191,1,ok,ok,ok +6-6-75pct-2_193,1,ok,ok,ok +6-6-75pct-2_194,1,ok,ok,ok +6-6-75pct-2_195,1,ok,ok,ok +6-6-75pct-2_196,1,ok,ok,ok +6-6-75pct-2_197,1,ok,ok,ok +6-6-75pct-2_198,1,ok,ok,ok +6-6-75pct-2_199,1,ok,ok,ok +6-6-75pct-2_2,1,ok,ok,ok +6-6-75pct-2_201,1,ok,ok,ok +6-6-75pct-2_202,1,ok,ok,ok +6-6-75pct-2_203,1,ok,ok,ok +6-6-75pct-2_204,1,ok,ok,ok +6-6-75pct-2_206,1,ok,ok,ok +6-6-75pct-2_209,1,ok,ok,ok +6-6-75pct-2_211,1,ok,ok,ok +6-6-75pct-2_215,1,ok,ok,ok +6-6-75pct-2_219,1,ok,ok,ok +6-6-75pct-2_220,1,ok,ok,ok +6-6-75pct-2_221,1,ok,ok,ok +6-6-75pct-2_222,1,ok,ok,ok +6-6-75pct-2_223,1,ok,ok,ok +6-6-75pct-2_224,1,ok,ok,ok +6-6-75pct-2_225,1,ok,ok,ok +6-6-75pct-2_227,1,ok,ok,ok +6-6-75pct-2_229,1,ok,ok,ok +6-6-75pct-2_23,1,ok,ok,ok +6-6-75pct-2_230,1,ok,ok,ok +6-6-75pct-2_233,1,ok,ok,ok +6-6-75pct-2_235,1,ok,ok,ok +6-6-75pct-2_236,1,ok,ok,ok +6-6-75pct-2_238,1,ok,ok,ok +6-6-75pct-2_240,1,ok,ok,ok +6-6-75pct-2_241,1,ok,ok,ok 
+6-6-75pct-2_242,1,ok,ok,ok +6-6-75pct-2_245,1,ok,ok,ok +6-6-75pct-2_248,1,ok,ok,ok +6-6-75pct-2_249,1,ok,ok,ok +6-6-75pct-2_25,1,ok,ok,ok +6-6-75pct-2_26,1,ok,ok,ok +6-6-75pct-2_27,1,ok,ok,ok +6-6-75pct-2_28,1,ok,ok,ok +6-6-75pct-2_3,1,ok,ok,ok +6-6-75pct-2_30,1,ok,ok,ok +6-6-75pct-2_31,1,ok,ok,ok +6-6-75pct-2_32,1,ok,ok,ok +6-6-75pct-2_35,1,ok,ok,ok +6-6-75pct-2_39,1,ok,ok,ok +6-6-75pct-2_43,1,ok,ok,ok +6-6-75pct-2_44,1,ok,ok,ok +6-6-75pct-2_45,1,ok,ok,ok +6-6-75pct-2_46,1,ok,ok,ok +6-6-75pct-2_5,1,ok,ok,ok +6-6-75pct-2_50,1,ok,ok,ok +6-6-75pct-2_52,1,ok,ok,ok +6-6-75pct-2_54,1,ok,ok,ok +6-6-75pct-2_55,1,ok,ok,ok +6-6-75pct-2_57,1,ok,ok,ok +6-6-75pct-2_58,1,ok,ok,ok +6-6-75pct-2_59,1,ok,ok,ok +6-6-75pct-2_60,1,ok,ok,ok +6-6-75pct-2_62,1,ok,ok,ok +6-6-75pct-2_63,1,ok,ok,ok +6-6-75pct-2_66,1,ok,ok,ok +6-6-75pct-2_67,1,ok,ok,ok +6-6-75pct-2_68,1,ok,ok,ok +6-6-75pct-2_70,1,ok,ok,ok +6-6-75pct-2_71,1,ok,ok,ok +6-6-75pct-2_72,1,ok,ok,ok +6-6-75pct-2_73,1,ok,ok,ok +6-6-75pct-2_74,1,ok,ok,ok +6-6-75pct-2_75,1,ok,ok,ok +6-6-75pct-2_76,1,ok,ok,ok +6-6-75pct-2_78,1,ok,ok,ok +6-6-75pct-2_81,1,ok,ok,ok +6-6-75pct-2_82,1,ok,ok,ok +6-6-75pct-2_84,1,ok,ok,ok +6-6-75pct-2_86,1,ok,ok,ok +6-6-75pct-2_87,1,ok,ok,ok +6-6-75pct-2_88,1,ok,ok,ok +6-6-75pct-2_9,1,ok,ok,ok +6-6-75pct-2_90,1,ok,ok,ok +6-6-75pct-2_91,1,ok,ok,ok +6-6-75pct-2_92,1,ok,ok,ok +6-6-75pct-2_93,1,ok,ok,ok +6-6-75pct-2_94,1,ok,ok,ok +6-6-75pct-2_97,1,ok,ok,ok +6-6-75pct-2_98,1,ok,ok,ok +6-6-75pct-2_99,1,ok,ok,ok +8-8-75pct-2_104,1,ok,ok,ok +8-8-75pct-2_19,1,ok,ok,ok +8-8-75pct-2_196,1,ok,ok,ok +8-8-75pct-2_66,1,ok,ok,ok +8-8-75pct-2_86,1,ok,ok,ok +tdata-3-7-23,1,ok,ok,ok +tdata-3-8-19,1,ok,ok,ok +tdata-3-8-36,1,ok,ok,ok +tdata-3-8-39,1,ok,ok,ok +tdata-4-4-16,1,ok,ok,ok +tdata-4-4-17,1,ok,ok,ok +tdata-4-4-19,1,ok,ok,ok +tdata-4-4-21,1,ok,ok,ok +tdata-4-4-22,1,ok,ok,ok +tdata-4-4-23,1,ok,ok,ok +tdata-4-4-27,1,ok,ok,ok +tdata-4-4-29,1,ok,ok,ok +tdata-4-4-31,1,ok,ok,ok +tdata-4-4-4,1,ok,ok,ok +tdata-4-4-8,1,ok,ok,ok 
+tdata-4-5-11,1,ok,ok,ok +tdata-4-5-14,1,ok,ok,ok +tdata-4-5-15,1,ok,ok,ok +tdata-4-5-18,1,ok,ok,ok +tdata-4-5-19,1,ok,ok,ok +tdata-4-5-2,1,ok,ok,ok +tdata-4-5-22,1,ok,ok,ok +tdata-4-5-23,1,ok,ok,ok +tdata-4-5-24,1,ok,ok,ok +tdata-4-5-25,1,ok,ok,ok +tdata-4-5-27,1,ok,ok,ok +tdata-4-5-28,1,ok,ok,ok +tdata-4-5-29,1,ok,ok,ok +tdata-4-5-31,1,ok,ok,ok +tdata-4-5-33,1,ok,ok,ok +tdata-4-5-35,1,ok,ok,ok +tdata-4-5-36,1,ok,ok,ok +tdata-4-5-37,1,ok,ok,ok +tdata-4-5-38,1,ok,ok,ok +tdata-4-5-40,1,ok,ok,ok +tdata-4-5-5,1,ok,ok,ok +tdata-4-5-7,1,ok,ok,ok +tdata-4-5-8,1,ok,ok,ok +tdata-4-5-9,1,ok,ok,ok +tdata-4-6-1,1,ok,ok,ok +tdata-4-6-10,1,ok,ok,ok +tdata-4-6-11,1,ok,ok,ok +tdata-4-6-12,1,ok,ok,ok +tdata-4-6-13,1,ok,ok,ok +tdata-4-6-14,1,ok,ok,ok +tdata-4-6-16,1,ok,ok,ok +tdata-4-6-17,1,ok,ok,ok +tdata-4-6-19,1,ok,ok,ok +tdata-4-6-2,1,ok,ok,ok +tdata-4-6-20,1,ok,ok,ok +tdata-4-6-22,1,ok,ok,ok +tdata-4-6-23,1,ok,ok,ok +tdata-4-6-24,1,ok,ok,ok +tdata-4-6-26,1,ok,ok,ok +tdata-4-6-27,1,ok,ok,ok +tdata-4-6-3,1,ok,ok,ok +tdata-4-6-31,1,ok,ok,ok +tdata-4-6-33,1,ok,ok,ok +tdata-4-6-35,1,ok,ok,ok +tdata-4-6-36,1,ok,ok,ok +tdata-4-6-38,1,ok,ok,ok +tdata-4-6-4,1,ok,ok,ok +tdata-4-6-40,1,ok,ok,ok +tdata-4-6-5,1,ok,ok,ok +tdata-4-6-6,1,ok,ok,ok +tdata-4-6-7,1,ok,ok,ok +tdata-4-6-8,1,ok,ok,ok +tdata-4-7-10,1,ok,ok,ok +tdata-4-7-12,1,ok,ok,ok +tdata-4-7-13,1,ok,ok,ok +tdata-4-7-14,1,ok,ok,ok +tdata-4-7-17,1,ok,ok,ok +tdata-4-7-19,1,ok,ok,ok +tdata-4-7-2,1,ok,ok,ok +tdata-4-7-22,1,ok,ok,ok +tdata-4-7-23,1,ok,ok,ok +tdata-4-7-24,1,ok,ok,ok +tdata-4-7-25,1,ok,ok,ok +tdata-4-7-26,1,ok,ok,ok +tdata-4-7-28,1,ok,ok,ok +tdata-4-7-29,1,ok,ok,ok +tdata-4-7-30,1,ok,ok,ok +tdata-4-7-33,1,ok,ok,ok +tdata-4-7-35,1,ok,ok,ok +tdata-4-7-36,1,ok,ok,ok +tdata-4-7-38,1,ok,ok,ok +tdata-4-7-4,1,ok,ok,ok +tdata-4-7-8,1,ok,ok,ok +tdata-4-7-9,1,ok,ok,ok +tdata-5-10-13,1,ok,ok,ok +tdata-5-10-18,1,ok,ok,ok +tdata-5-4-1,1,ok,ok,ok +tdata-5-4-13,1,ok,ok,ok +tdata-5-4-14,1,ok,ok,ok +tdata-5-4-18,1,ok,ok,ok 
+tdata-5-4-19,1,ok,ok,ok +tdata-5-4-2,1,ok,ok,ok +tdata-5-4-20,1,ok,ok,ok +tdata-5-4-21,1,ok,ok,ok +tdata-5-4-23,1,ok,ok,ok +tdata-5-4-24,1,ok,ok,ok +tdata-5-4-32,1,ok,ok,ok +tdata-5-4-37,1,ok,ok,ok +tdata-5-4-39,1,ok,ok,ok +tdata-5-4-4,1,ok,ok,ok +tdata-5-4-6,1,ok,ok,ok +tdata-5-4-8,1,ok,ok,ok +tdata-5-5-1,1,ok,ok,ok +tdata-5-5-12,1,ok,ok,ok +tdata-5-5-15,1,ok,ok,ok +tdata-5-5-17,1,ok,ok,ok +tdata-5-5-19,1,ok,ok,ok +tdata-5-5-25,1,ok,ok,ok +tdata-5-5-28,1,ok,ok,ok +tdata-5-5-30,1,ok,ok,ok +tdata-5-5-5,1,ok,ok,ok +tdata-5-5-7,1,ok,ok,ok +tdata-5-6-25,1,ok,ok,ok +tdata-5-6-26,1,ok,ok,ok +tdata-5-6-3,1,ok,ok,ok +tdata-5-6-32,1,ok,ok,ok +tdata-5-6-34,1,ok,ok,ok +tdata-5-6-39,1,ok,ok,ok +tdata-5-6-7,1,ok,ok,ok +tdata-5-7-9,1,ok,ok,ok +tdata-5-8-31,1,ok,ok,ok +tdata-5-9-38,1,ok,ok,ok +BFT-10_16_8_77_15_58-1,1,ok,ok,ok +BFT-10_16_8_77_15_58-16,1,ok,ok,ok +BFT-11_16_8_77_31_46-12,1,ok,ok,ok +BFT-17_20_5_60_12_36-1,1,ok,ok,ok +BFT-17_20_5_60_12_36-11,1,ok,ok,ok +BFT-17_20_5_60_12_36-14,1,ok,ok,ok +BFT-17_20_5_60_12_36-15,1,ok,ok,ok +BFT-17_20_5_60_12_36-17,1,ok,ok,ok +BFT-17_20_5_60_12_36-19,1,ok,ok,ok +BFT-17_20_5_60_12_36-2,1,ok,ok,ok +BFT-17_20_5_60_12_36-3,1,ok,ok,ok +BFT-17_20_5_60_12_36-6,1,ok,ok,ok +BFT-17_20_5_60_12_36-7,1,ok,ok,ok +BFT-17_20_5_60_12_36-8,1,ok,ok,ok +BFT-17_20_5_60_12_36-9,1,ok,ok,ok +BFT-18_20_5_60_12_45-11,1,ok,ok,ok +BFT-18_20_5_60_12_45-13,1,ok,ok,ok +BFT-18_20_5_60_12_45-15,1,ok,ok,ok +BFT-18_20_5_60_12_45-16,1,ok,ok,ok +BFT-18_20_5_60_12_45-20,1,ok,ok,ok +BFT-18_20_5_60_12_45-3,1,ok,ok,ok +BFT-18_20_5_60_12_45-5,1,ok,ok,ok +BFT-18_20_5_60_12_45-6,1,ok,ok,ok +BFT-18_20_5_60_12_45-7,1,ok,ok,ok +BFT-18_20_5_60_12_45-8,1,ok,ok,ok +BFT-19_20_5_60_24_36-11,1,ok,ok,ok +BFT-19_20_5_60_24_36-12,1,ok,ok,ok +BFT-19_20_5_60_24_36-13,1,ok,ok,ok +BFT-19_20_5_60_24_36-14,1,ok,ok,ok +BFT-19_20_5_60_24_36-18,1,ok,ok,ok +BFT-19_20_5_60_24_36-20,1,ok,ok,ok +BFT-19_20_5_60_24_36-8,1,ok,ok,ok +BFT-1_16_5_48_10_29-1,1,ok,ok,ok +BFT-1_16_5_48_10_29-10,1,ok,ok,ok 
+BFT-1_16_5_48_10_29-11,1,ok,ok,ok +BFT-1_16_5_48_10_29-14,1,ok,ok,ok +BFT-1_16_5_48_10_29-15,1,ok,ok,ok +BFT-1_16_5_48_10_29-16,1,ok,ok,ok +BFT-1_16_5_48_10_29-18,1,ok,ok,ok +BFT-1_16_5_48_10_29-19,1,ok,ok,ok +BFT-1_16_5_48_10_29-2,1,ok,ok,ok +BFT-1_16_5_48_10_29-3,1,ok,ok,ok +BFT-1_16_5_48_10_29-4,1,ok,ok,ok +BFT-1_16_5_48_10_29-5,1,ok,ok,ok +BFT-1_16_5_48_10_29-6,1,ok,ok,ok +BFT-1_16_5_48_10_29-8,1,ok,ok,ok +BFT-1_16_5_48_10_29-9,1,ok,ok,ok +BFT-20_20_5_60_24_45-14,1,ok,ok,ok +BFT-20_20_5_60_24_45-18,1,ok,ok,ok +BFT-20_20_5_60_24_45-19,1,ok,ok,ok +BFT-20_20_5_60_24_45-6,1,ok,ok,ok +BFT-20_20_5_60_24_45-9,1,ok,ok,ok +BFT-21_20_5_80_16_48-1,1,ok,ok,ok +BFT-21_20_5_80_16_48-12,1,ok,ok,ok +BFT-21_20_5_80_16_48-13,1,ok,ok,ok +BFT-21_20_5_80_16_48-14,1,ok,ok,ok +BFT-21_20_5_80_16_48-18,1,ok,ok,ok +BFT-21_20_5_80_16_48-3,1,ok,ok,ok +BFT-21_20_5_80_16_48-8,1,ok,ok,ok +BFT-22_20_5_80_16_60-14,1,ok,ok,ok +BFT-22_20_5_80_16_60-17,1,ok,ok,ok +BFT-22_20_5_80_16_60-18,1,ok,ok,ok +BFT-22_20_5_80_16_60-20,1,ok,ok,ok +BFT-22_20_5_80_16_60-3,1,ok,ok,ok +BFT-22_20_5_80_16_60-5,1,ok,ok,ok +BFT-22_20_5_80_16_60-6,1,ok,ok,ok +BFT-22_20_5_80_16_60-8,1,ok,ok,ok +BFT-23_20_5_80_32_48-1,1,ok,ok,ok +BFT-23_20_5_80_32_48-11,1,ok,ok,ok +BFT-23_20_5_80_32_48-14,1,ok,ok,ok +BFT-23_20_5_80_32_48-16,1,ok,ok,ok +BFT-23_20_5_80_32_48-19,1,ok,ok,ok +BFT-23_20_5_80_32_48-2,1,ok,ok,ok +BFT-23_20_5_80_32_48-8,1,ok,ok,ok +BFT-24_20_5_80_32_60-1,1,ok,ok,ok +BFT-24_20_5_80_32_60-10,1,ok,ok,ok +BFT-24_20_5_80_32_60-14,1,ok,ok,ok +BFT-24_20_5_80_32_60-18,1,ok,ok,ok +BFT-24_20_5_80_32_60-4,1,ok,ok,ok +BFT-24_20_5_80_32_60-8,1,ok,ok,ok +BFT-25_20_8_96_19_58-12,1,ok,ok,ok +BFT-25_20_8_96_19_58-3,1,ok,ok,ok +BFT-28_20_8_96_38_72-19,1,ok,ok,ok +BFT-2_16_5_48_10_36-11,1,ok,ok,ok +BFT-2_16_5_48_10_36-13,1,ok,ok,ok +BFT-2_16_5_48_10_36-14,1,ok,ok,ok +BFT-2_16_5_48_10_36-15,1,ok,ok,ok +BFT-2_16_5_48_10_36-16,1,ok,ok,ok +BFT-2_16_5_48_10_36-17,1,ok,ok,ok +BFT-2_16_5_48_10_36-18,1,ok,ok,ok 
+BFT-2_16_5_48_10_36-20,1,ok,ok,ok +BFT-2_16_5_48_10_36-3,1,ok,ok,ok +BFT-2_16_5_48_10_36-4,1,ok,ok,ok +BFT-2_16_5_48_10_36-5,1,ok,ok,ok +BFT-2_16_5_48_10_36-8,1,ok,ok,ok +BFT-2_16_5_48_10_36-9,1,ok,ok,ok +BFT-3_16_5_48_19_29-1,1,ok,ok,ok +BFT-3_16_5_48_19_29-10,1,ok,ok,ok +BFT-3_16_5_48_19_29-11,1,ok,ok,ok +BFT-3_16_5_48_19_29-14,1,ok,ok,ok +BFT-3_16_5_48_19_29-15,1,ok,ok,ok +BFT-3_16_5_48_19_29-16,1,ok,ok,ok +BFT-3_16_5_48_19_29-17,1,ok,ok,ok +BFT-3_16_5_48_19_29-18,1,ok,ok,ok +BFT-3_16_5_48_19_29-19,1,ok,ok,ok +BFT-3_16_5_48_19_29-3,1,ok,ok,ok +BFT-3_16_5_48_19_29-4,1,ok,ok,ok +BFT-3_16_5_48_19_29-5,1,ok,ok,ok +BFT-3_16_5_48_19_29-7,1,ok,ok,ok +BFT-4_16_5_48_19_36-14,1,ok,ok,ok +BFT-4_16_5_48_19_36-15,1,ok,ok,ok +BFT-4_16_5_48_19_36-16,1,ok,ok,ok +BFT-4_16_5_48_19_36-17,1,ok,ok,ok +BFT-4_16_5_48_19_36-18,1,ok,ok,ok +BFT-4_16_5_48_19_36-19,1,ok,ok,ok +BFT-4_16_5_48_19_36-20,1,ok,ok,ok +BFT-4_16_5_48_19_36-3,1,ok,ok,ok +BFT-4_16_5_48_19_36-7,1,ok,ok,ok +BFT-4_16_5_48_19_36-8,1,ok,ok,ok +BFT-4_16_5_48_19_36-9,1,ok,ok,ok +BFT-5_16_5_64_13_38-10,1,ok,ok,ok +BFT-5_16_5_64_13_38-11,1,ok,ok,ok +BFT-5_16_5_64_13_38-14,1,ok,ok,ok +BFT-5_16_5_64_13_38-15,1,ok,ok,ok +BFT-5_16_5_64_13_38-16,1,ok,ok,ok +BFT-5_16_5_64_13_38-18,1,ok,ok,ok +BFT-5_16_5_64_13_38-2,1,ok,ok,ok +BFT-5_16_5_64_13_38-3,1,ok,ok,ok +BFT-5_16_5_64_13_38-4,1,ok,ok,ok +BFT-5_16_5_64_13_38-5,1,ok,ok,ok +BFT-5_16_5_64_13_38-6,1,ok,ok,ok +BFT-5_16_5_64_13_38-8,1,ok,ok,ok +BFT-6_16_5_64_13_48-11,1,ok,ok,ok +BFT-6_16_5_64_13_48-12,1,ok,ok,ok +BFT-6_16_5_64_13_48-13,1,ok,ok,ok +BFT-6_16_5_64_13_48-14,1,ok,ok,ok +BFT-6_16_5_64_13_48-15,1,ok,ok,ok +BFT-6_16_5_64_13_48-17,1,ok,ok,ok +BFT-6_16_5_64_13_48-2,1,ok,ok,ok +BFT-6_16_5_64_13_48-20,1,ok,ok,ok +BFT-6_16_5_64_13_48-3,1,ok,ok,ok +BFT-6_16_5_64_13_48-5,1,ok,ok,ok +BFT-6_16_5_64_13_48-6,1,ok,ok,ok +BFT-6_16_5_64_13_48-8,1,ok,ok,ok +BFT-6_16_5_64_13_48-9,1,ok,ok,ok +BFT-7_16_5_64_26_38-1,1,ok,ok,ok +BFT-7_16_5_64_26_38-10,1,ok,ok,ok 
+BFT-7_16_5_64_26_38-12,1,ok,ok,ok +BFT-7_16_5_64_26_38-14,1,ok,ok,ok +BFT-7_16_5_64_26_38-15,1,ok,ok,ok +BFT-7_16_5_64_26_38-2,1,ok,ok,ok +BFT-7_16_5_64_26_38-20,1,ok,ok,ok +BFT-7_16_5_64_26_38-7,1,ok,ok,ok +BFT-7_16_5_64_26_38-8,1,ok,ok,ok +BFT-7_16_5_64_26_38-9,1,ok,ok,ok +BFT-8_16_5_64_26_48-11,1,ok,ok,ok +BFT-8_16_5_64_26_48-15,1,ok,ok,ok +BFT-8_16_5_64_26_48-2,1,ok,ok,ok +BFT-8_16_5_64_26_48-6,1,ok,ok,ok +BFT-8_16_5_64_26_48-8,1,ok,ok,ok +BFT-8_16_5_64_26_48-9,1,ok,ok,ok +BFT-9_16_8_77_15_46-2,1,ok,ok,ok +BFT-9_16_8_77_15_46-20,1,ok,ok,ok diff --git a/_articles/RJ-2025-045/CPMP-2015_data/feature_values.arff b/_articles/RJ-2025-045/CPMP-2015_data/feature_values.arff new file mode 100644 index 0000000000..d025951ac7 --- /dev/null +++ b/_articles/RJ-2025-045/CPMP-2015_data/feature_values.arff @@ -0,0 +1,555 @@ +@RELATION feature_values_premarshalling_astar_2013 + +@ATTRIBUTE instance_id STRING +@ATTRIBUTE repetition NUMERIC +@ATTRIBUTE stacks NUMERIC +@ATTRIBUTE tiers NUMERIC +@ATTRIBUTE stack-tier-ratio NUMERIC +@ATTRIBUTE container-density NUMERIC +@ATTRIBUTE empty-stack-pct NUMERIC +@ATTRIBUTE overstowing-stack-pct NUMERIC +@ATTRIBUTE overstowing-2cont-stack-pct NUMERIC +@ATTRIBUTE group-same-min NUMERIC +@ATTRIBUTE group-same-max NUMERIC +@ATTRIBUTE group-same-mean NUMERIC +@ATTRIBUTE group-same-stdev NUMERIC +@ATTRIBUTE top-good-min NUMERIC +@ATTRIBUTE top-good-max NUMERIC +@ATTRIBUTE top-good-mean NUMERIC +@ATTRIBUTE top-good-stdev NUMERIC +@ATTRIBUTE overstowage-pct NUMERIC +@ATTRIBUTE bflb NUMERIC +@ATTRIBUTE left-density NUMERIC +@ATTRIBUTE tier-weighted-groups NUMERIC +@ATTRIBUTE avg-l1-top-left-lg-group NUMERIC +@ATTRIBUTE cont-empty-grt-estack NUMERIC +@ATTRIBUTE pct-bottom-pct-on-top NUMERIC + +@DATA +BF1_cpmp_16_5_48_10_29_1,1,16,5,0.3125,0.6,0.0625,0.625,0.909091,0,6,4.36364,1.55346,2,8,4.6,2.00998,0.3625,29,0.382222,0.441162,0.509615,0.175,0 
+BF1_cpmp_16_5_48_10_29_12,1,16,5,0.3125,0.6,0.1875,0.5625,1,0,7,4.36364,1.61091,2,9,4.55556,2.26623,0.3625,29,0.413333,0.427399,0.663462,0.3375,0 +BF1_cpmp_16_5_48_10_29_14,1,16,5,0.3125,0.6,0.1875,0.625,0.833333,0,7,4.36364,1.72008,2,7,4.3,1.84662,0.3625,29,0.111111,0.405997,0.615385,0.2625,0 +BF1_cpmp_16_5_48_10_29_15,1,16,5,0.3125,0.6,0.0625,0.5,0.888889,0,6,4.36364,1.61091,2,9,3.5,2.29129,0.3625,29,0.677778,0.409975,0.576923,0.1125,0 +BF1_cpmp_16_5_48_10_29_17,1,16,5,0.3125,0.6,0.125,0.5625,0.9,0,6,4.36364,1.55346,2,10,4.66667,2.44949,0.3625,29,0.471111,0.428914,0.569231,0.3375,0 +BF1_cpmp_16_5_48_10_29_18,1,16,5,0.3125,0.6,0.0625,0.625,0.909091,0,6,4.36364,1.55346,2,8,4.5,1.85742,0.3625,29,0.151111,0.438952,0.615385,0.1375,0 +BF1_cpmp_16_5_48_10_29_19,1,16,5,0.3125,0.6,0.0625,0.5625,0.818182,0,7,4.36364,1.77214,2,10,5.33333,2.10819,0.3625,29,0.648889,0.433775,0.528846,0.0625,0 +BF1_cpmp_16_5_48_10_29_20,1,16,5,0.3125,0.6,0.125,0.625,0.769231,0,6,4.36364,1.55346,2,10,4.9,2.66271,0.3625,29,0.317778,0.436995,0.519231,0.2125,0 +BF1_cpmp_16_5_48_10_29_3,1,16,5,0.3125,0.6,0.125,0.75,0.923077,0,6,4.36364,1.49379,2,9,4.33333,2.28522,0.3625,29,0.0666667,0.424242,0.740385,0.4,0 +BF1_cpmp_16_5_48_10_29_4,1,16,5,0.3125,0.6,0.1875,0.625,0.909091,0,6,4.36364,1.49379,2,9,5,2.09762,0.3625,29,0.582222,0.414457,0.538462,0.25,0 +BF1_cpmp_16_5_48_10_29_5,1,16,5,0.3125,0.6,0.125,0.5625,0.75,0,7,4.36364,1.72008,2,8,5,2,0.3625,29,0.495556,0.446907,0.5,0.2125,0 +BF1_cpmp_16_5_48_10_29_6,1,16,5,0.3125,0.6,0.25,0.75,1,0,6,4.36364,1.49379,2,7,3.91667,1.65622,0.3625,29,0.277778,0.402399,0.429487,0.325,0 +BF1_cpmp_16_5_48_10_29_7,1,16,5,0.3125,0.6,0.0625,0.625,0.909091,0,5,4.36364,1.43164,2,8,4.3,2.05183,0.3625,29,0.728889,0.428977,0.507692,0.1625,0 +BF1_cpmp_16_5_48_10_29_8,1,16,5,0.3125,0.6,0.125,0.5625,0.9,0,6,4.36364,1.55346,2,9,4.55556,2.00616,0.3625,29,0.646667,0.415909,0.548077,0.35,0 
+BF1_cpmp_16_5_48_10_29_9,1,16,5,0.3125,0.6,0.125,0.75,0.923077,0,6,4.36364,1.91988,2,8,4.66667,1.84089,0.3625,29,0.553333,0.426894,0.884615,0.2,0 +BF10_cpmp_16_8_77_16_58_13,1,16,8,0.5,0.601562,0.1875,0.625,0.909091,0,6,4.52941,1.37702,2,12,8,3.4641,0.453125,58,0.677778,0.401784,0.49375,0.289062,0.0625 +BF10_cpmp_16_8_77_16_58_3,1,16,8,0.5,0.601562,0.1875,0.625,1,0,8,4.52941,1.57621,3,15,6.9,3.61801,0.453125,58,0.531944,0.392541,0.65625,0.296875,0 +BF10_cpmp_16_8_77_16_58_8,1,16,8,0.5,0.601562,0.25,0.625,1,0,6,4.52941,1.33362,2,12,6.1,3.36006,0.453125,58,0.555556,0.375117,0.6875,0.304688,0.0625 +BF11_cpmp_16_8_77_31_47_10,1,16,8,0.5,0.601562,0,0.75,0.8,0,5,2.40625,0.896499,2,26,11.8333,7.32386,0.367188,49,0.377778,0.441853,0.53125,0,0.0625 +BF11_cpmp_16_8_77_31_47_11,1,16,8,0.5,0.601562,0,0.9375,0.9375,0,5,2.40625,1.34302,2,23,10.8667,6.05383,0.367188,49,0.513889,0.438422,0.8125,0,0 +BF11_cpmp_16_8_77_31_47_16,1,16,8,0.5,0.601562,0,0.875,0.875,0,5,2.40625,0.963696,2,17,10.2857,4.39852,0.367188,51,0.4,0.481308,0.65625,0,0 +BF11_cpmp_16_8_77_31_47_6,1,16,8,0.5,0.601562,0,1,1,0,7,2.40625,1.19529,2,25,12.5,6.99107,0.367188,51,0.266667,0.475875,0.359375,0,0 +BF11_cpmp_16_8_77_31_47_7,1,16,8,0.5,0.601562,0,0.9375,0.9375,0,5,2.40625,1.08568,2,26,11.7333,6.9519,0.367188,49,0.505556,0.499464,0.625,0,0.125 +BF12_cpmp_16_8_77_31_58_14,1,16,8,0.5,0.601562,0.125,0.75,1,0,6,2.40625,1.08568,3,29,14.1667,7.74417,0.453125,58,0.443056,0.407667,0.510417,0.226562,0.0625 +BF17_cpmp_20_5_60_12_36_1,1,20,5,0.25,0.6,0.1,0.55,0.785714,0,7,4.61538,1.54613,2,12,6.36364,3.11223,0.36,36,0.581295,0.449188,0.666667,0.17,0.05 +BF17_cpmp_20_5_60_12_36_10,1,20,5,0.25,0.6,0.15,0.6,1,0,8,4.61538,1.68881,2,11,6,2.48328,0.36,36,0.513669,0.441239,0.613333,0.15,0.05 +BF17_cpmp_20_5_60_12_36_11,1,20,5,0.25,0.6,0.05,0.65,0.928571,0,7,4.61538,1.59511,2,10,4.61538,2.33826,0.36,37,0.405755,0.427051,0.480952,0.09,0 
+BF17_cpmp_20_5_60_12_36_12,1,20,5,0.25,0.6,0.05,0.65,1,0,7,4.61538,1.59511,2,10,5.23077,2.69341,0.36,36,0.168345,0.419701,0.42,0.14,0.1 +BF17_cpmp_20_5_60_12_36_15,1,20,5,0.25,0.6,0.25,0.65,0.928571,0,8,4.61538,1.77757,2,11,5.92308,2.46394,0.36,36,0.690647,0.394017,0.633333,0.35,0 +BF17_cpmp_20_5_60_12_36_16,1,20,5,0.25,0.6,0.15,0.65,0.866667,0,7,4.61538,1.54613,2,11,5.53846,2.70656,0.36,36,0.117986,0.435256,0.626667,0.3,0.05 +BF17_cpmp_20_5_60_12_36_17,1,20,5,0.25,0.6,0.15,0.65,0.928571,0,6,4.61538,1.49556,3,10,5.46154,2.06119,0.36,36,0.433094,0.43265,0.672222,0.31,0.1 +BF17_cpmp_20_5_60_12_36_2,1,20,5,0.25,0.6,0.05,0.6,0.8,0,8,4.61538,1.82033,2,10,5.25,2.34965,0.36,36,0.302158,0.443761,0.491667,0.19,0.1 +BF17_cpmp_20_5_60_12_36_20,1,20,5,0.25,0.6,0.1,0.7,0.933333,0,8,4.61538,1.77757,2,12,6.07143,3.26156,0.36,36,0.546763,0.452735,0.670833,0.14,0.05 +BF17_cpmp_20_5_60_12_36_3,1,20,5,0.25,0.6,0.05,0.65,0.8125,0,7,4.61538,1.54613,2,10,5.30769,2.64239,0.36,37,0.610072,0.440812,0.6,0.21,0 +BF17_cpmp_20_5_60_12_36_4,1,20,5,0.25,0.6,0.15,0.55,0.733333,0,7,4.61538,1.68881,2,7,4.45455,1.97086,0.36,37,0.296403,0.394701,0.258333,0.24,0 +BF17_cpmp_20_5_60_12_36_5,1,20,5,0.25,0.6,0.05,0.75,0.882353,0,7,4.61538,1.49556,2,11,5.73333,2.86279,0.36,37,0.398561,0.466966,0.526667,0.08,0 +BF17_cpmp_20_5_60_12_36_6,1,20,5,0.25,0.6,0.05,0.75,1,0,7,4.61538,1.49556,2,10,5,2.47656,0.36,36,0.661871,0.427265,0.493333,0.11,0.05 +BF17_cpmp_20_5_60_12_36_7,1,20,5,0.25,0.6,0.05,0.55,0.785714,0,7,4.61538,1.59511,2,9,5.54545,2.31059,0.36,36,0.156835,0.427479,0.44,0.18,0 +BF17_cpmp_20_5_60_12_36_8,1,20,5,0.25,0.6,0.1,0.75,0.9375,0,8,4.61538,2.13175,2,11,4.93333,2.79205,0.36,36,0.425899,0.408889,0.666667,0.24,0 +BF18_cpmp_20_5_60_12_45_1,1,20,5,0.25,0.6,0.25,0.7,1,0,6,4.61538,1.4432,2,11,5.92857,3.15015,0.45,45,0.538129,0.36953,0.46,0.39,0.05 +BF18_cpmp_20_5_60_12_45_12,1,20,5,0.25,0.6,0.3,0.6,1,0,7,4.61538,1.54613,2,11,5.33333,2.71825,0.45,45,0.741007,0.345385,0.658333,0.3,0.05 
+BF18_cpmp_20_5_60_12_45_13,1,20,5,0.25,0.6,0.4,0.6,1,0,7,4.61538,1.68881,2,11,6.41667,2.8419,0.45,45,0.258993,0.357393,0.586667,0.4,0.15 +BF18_cpmp_20_5_60_12_45_14,1,20,5,0.25,0.6,0.35,0.65,1,0,7,4.61538,1.59511,3,12,7.30769,3.07307,0.45,45,0.467626,0.380641,0.706667,0.4,0.05 +BF18_cpmp_20_5_60_12_45_15,1,20,5,0.25,0.6,0.4,0.6,1,0,7,4.61538,1.59511,2,10,5.33333,2.65623,0.45,45,0.28777,0.331752,0.566667,0.4,0 +BF18_cpmp_20_5_60_12_45_16,1,20,5,0.25,0.6,0.3,0.6,1,0,7,4.61538,1.54613,2,12,5.83333,2.99537,0.45,45,0.45036,0.378889,0.57619,0.36,0.05 +BF18_cpmp_20_5_60_12_45_17,1,20,5,0.25,0.6,0.3,0.65,1,0,7,4.61538,1.59511,2,10,5.92308,3.19763,0.45,45,0.37554,0.403077,0.514286,0.37,0.05 +BF18_cpmp_20_5_60_12_45_18,1,20,5,0.25,0.6,0.3,0.6,1,0,7,4.61538,1.68881,2,11,6.58333,2.81242,0.45,45,0.323741,0.38094,0.533333,0.4,0.05 +BF18_cpmp_20_5_60_12_45_19,1,20,5,0.25,0.6,0.35,0.65,1,0,7,4.61538,1.54613,2,11,6.69231,2.99704,0.45,45,0.0935252,0.376154,0.558333,0.38,0.15 +BF18_cpmp_20_5_60_12_45_2,1,20,5,0.25,0.6,0.3,0.6,1,0,7,4.61538,1.59511,4,12,7.08333,2.92855,0.45,45,0.379856,0.39312,0.393333,0.36,0.15 +BF18_cpmp_20_5_60_12_45_3,1,20,5,0.25,0.6,0.3,0.6,1,0,7,4.61538,1.73376,2,11,5.25,2.71186,0.45,45,0.814388,0.362051,0.766667,0.39,0 +BF18_cpmp_20_5_60_12_45_4,1,20,5,0.25,0.6,0.25,0.6,1,0,6,4.61538,1.4432,2,12,5.83333,2.85287,0.45,45,0.151079,0.385043,0.55,0.35,0.1 +BF18_cpmp_20_5_60_12_45_5,1,20,5,0.25,0.6,0.3,0.6,1,0,9,4.61538,2.20274,2,9,5.16667,2.54406,0.45,45,0.258993,0.364231,0.622222,0.4,0.15 +BF18_cpmp_20_5_60_12_45_6,1,20,5,0.25,0.6,0.4,0.6,1,0,6,4.61538,1.49556,3,10,6.5,2.5658,0.45,45,0.438849,0.365812,0.613333,0.4,0 +BF18_cpmp_20_5_60_12_45_7,1,20,5,0.25,0.6,0.3,0.65,1,0,7,4.61538,1.54613,2,11,5.92308,3.04988,0.45,45,0.359712,0.379829,0.526667,0.37,0.05 +BF18_cpmp_20_5_60_12_45_8,1,20,5,0.25,0.6,0.35,0.6,1,0,8,4.61538,1.94297,2,12,5.5,2.81366,0.45,45,0.699281,0.381581,0.55,0.4,0.1 
+BF18_cpmp_20_5_60_12_45_9,1,20,5,0.25,0.6,0.3,0.6,1,0,6,4.61538,1.38888,2,11,7,2.41523,0.45,45,0.143885,0.387778,0.486667,0.4,0.1 +BF19_cpmp_20_5_60_24_36_1,1,20,5,0.25,0.6,0.1,0.7,0.933333,0,5,2.4,0.979796,2,21,10.1429,5.9983,0.36,36,0.505036,0.418044,0.85,0.2,0.1 +BF19_cpmp_20_5_60_24_36_10,1,20,5,0.25,0.6,0.15,0.7,0.875,0,6,2.4,1.13137,2,20,10.8571,5.70535,0.36,36,0.342446,0.451756,0.622222,0.29,0 +BF19_cpmp_20_5_60_24_36_11,1,20,5,0.25,0.6,0.1,0.5,0.769231,0,5,2.4,0.938083,4,24,9.4,5.78273,0.36,36,0.630216,0.393667,0.566667,0.2,0 +BF19_cpmp_20_5_60_24_36_14,1,20,5,0.25,0.6,0,0.7,0.933333,0,4,2.4,0.748331,2,16,8.21429,4.00319,0.36,36,0.425899,0.419644,0.822222,0,0 +BF19_cpmp_20_5_60_24_36_15,1,20,5,0.25,0.6,0.05,0.65,0.866667,0,5,2.4,1.0583,2,21,7.76923,5.0559,0.36,36,0.391367,0.394756,0.6,0.06,0 +BF19_cpmp_20_5_60_24_36_16,1,20,5,0.25,0.6,0.05,0.65,0.928571,0,4,2.4,0.848528,3,17,9.23077,4.50903,0.36,37,0.346763,0.412489,0.416667,0.18,0 +BF19_cpmp_20_5_60_24_36_17,1,20,5,0.25,0.6,0.05,0.55,0.916667,0,4,2.4,0.979796,2,19,8.36364,5.22708,0.36,36,0.0748201,0.407311,0.677778,0.12,0.1 +BF19_cpmp_20_5_60_24_36_18,1,20,5,0.25,0.6,0.05,0.65,0.866667,0,4,2.4,0.8,2,20,10.0769,5.10598,0.36,36,0.277698,0.433467,0.777778,0.27,0 +BF19_cpmp_20_5_60_24_36_2,1,20,5,0.25,0.6,0.05,0.6,1,0,5,2.4,0.979796,3,20,10.1667,4.82758,0.36,36,0.456115,0.424467,0.811111,0.06,0 +BF19_cpmp_20_5_60_24_36_20,1,20,5,0.25,0.6,0.2,0.55,0.846154,0,4,2.4,1.0198,2,22,8.54545,6.21382,0.36,36,0.541007,0.387467,0.683333,0.4,0.15 +BF19_cpmp_20_5_60_24_36_3,1,20,5,0.25,0.6,0.2,0.7,0.933333,0,6,2.4,1.2,2,22,10.0714,6.46379,0.36,36,0.581295,0.384267,0.716667,0.34,0 +BF19_cpmp_20_5_60_24_36_5,1,20,5,0.25,0.6,0.15,0.6,1,0,4,2.4,0.848528,2,23,10.75,6.70976,0.36,36,0.215827,0.403089,0.716667,0.16,0.05 +BF19_cpmp_20_5_60_24_36_6,1,20,5,0.25,0.6,0.1,0.6,0.8,0,4,2.4,0.938083,2,24,8.58333,6.03405,0.36,37,0.323741,0.393222,0.883333,0.18,0 
+BF19_cpmp_20_5_60_24_36_7,1,20,5,0.25,0.6,0.15,0.55,1,0,4,2.4,1.0198,3,24,8.63636,5.80396,0.36,36,0.421583,0.402044,0.633333,0.28,0.1 +BF19_cpmp_20_5_60_24_36_8,1,20,5,0.25,0.6,0,0.75,0.9375,0,4,2.4,0.848528,4,21,11.2667,4.9189,0.36,39,0.497842,0.436444,0.622222,0,0 +BF19_cpmp_20_5_60_24_36_9,1,20,5,0.25,0.6,0.05,0.65,1,0,5,2.4,0.894427,3,16,8.84615,4.12956,0.36,38,0.263309,0.405222,0.4,0.17,0 +BF2_cpmp_16_5_48_10_36_1,1,16,5,0.3125,0.6,0.3125,0.625,1,0,7,4.36364,1.66639,2,10,4.8,2.6,0.45,36,0.197778,0.358081,0.538462,0.4,0 +BF2_cpmp_16_5_48_10_36_11,1,16,5,0.3125,0.6,0.25,0.625,1,0,6,4.36364,1.49379,2,8,4.6,2.2,0.45,36,0.202222,0.368308,0.548077,0.3125,0 +BF2_cpmp_16_5_48_10_36_12,1,16,5,0.3125,0.6,0.25,0.625,1,0,9,4.36364,2.1436,2,9,5,2.44949,0.45,36,0.493333,0.379419,0.602564,0.4,0 +BF2_cpmp_16_5_48_10_36_13,1,16,5,0.3125,0.6,0.3125,0.6875,1,0,7,4.36364,1.87193,2,10,5.90909,2.4663,0.45,36,0.171111,0.402904,0.489011,0.35,0 +BF2_cpmp_16_5_48_10_36_14,1,16,5,0.3125,0.6,0.3125,0.625,1,0,7,4.36364,1.82272,2,8,4.9,2.02237,0.45,36,0.555556,0.389141,0.740385,0.3625,0 +BF2_cpmp_16_5_48_10_36_15,1,16,5,0.3125,0.6,0.3125,0.625,1,0,7,4.36364,1.66639,2,10,5,2.79285,0.45,36,0.402222,0.390593,0.638462,0.4,0 +BF2_cpmp_16_5_48_10_36_17,1,16,5,0.3125,0.6,0.3125,0.6875,1,0,5,4.36364,1.43164,2,10,5.45455,2.38799,0.45,36,0.455556,0.391098,0.584615,0.3125,0 +BF2_cpmp_16_5_48_10_36_18,1,16,5,0.3125,0.6,0.3125,0.625,1,0,6,4.36364,1.55346,2,10,5.4,2.90517,0.45,36,0.677778,0.39072,0.570513,0.4,0 +BF2_cpmp_16_5_48_10_36_19,1,16,5,0.3125,0.6,0.3125,0.625,1,0,6,4.36364,1.61091,2,8,5,2.40832,0.45,36,0.42,0.362626,0.487179,0.35,0 +BF2_cpmp_16_5_48_10_36_2,1,16,5,0.3125,0.6,0.25,0.6875,1,0,7,4.36364,1.72008,2,9,5.90909,2.35312,0.45,36,0.171111,0.417361,0.6,0.375,0 +BF2_cpmp_16_5_48_10_36_20,1,16,5,0.3125,0.6,0.25,0.625,1,0,7,4.36364,1.87193,2,10,5,2.40832,0.45,36,0.277778,0.373422,0.384615,0.35,0 
+BF2_cpmp_16_5_48_10_36_3,1,16,5,0.3125,0.6,0.3125,0.625,1,0,6,4.36364,1.55346,2,7,4.5,1.74642,0.45,36,0.1,0.37178,0.448718,0.4,0 +BF2_cpmp_16_5_48_10_36_4,1,16,5,0.3125,0.6,0.3125,0.625,1,0,7,4.36364,1.77214,2,9,5.8,2.4,0.45,36,0.277778,0.38529,0.519231,0.3625,0 +BF2_cpmp_16_5_48_10_36_5,1,16,5,0.3125,0.6,0.3125,0.625,1,0,6,4.36364,1.55346,2,10,5.4,2.37487,0.45,36,0.16,0.387374,0.5,0.325,0 +BF2_cpmp_16_5_48_10_36_6,1,16,5,0.3125,0.6,0.3125,0.625,1,0,6,4.36364,1.55346,2,9,4.6,2.2891,0.45,36,0.444444,0.369003,0.701923,0.3125,0 +BF2_cpmp_16_5_48_10_36_7,1,16,5,0.3125,0.6,0.375,0.625,1,0,8,4.36364,1.82272,2,9,4.9,2.34307,0.45,36,0.677778,0.347412,0.384615,0.4,0 +BF2_cpmp_16_5_48_10_36_8,1,16,5,0.3125,0.6,0.3125,0.625,1,0,6,4.36364,1.55346,2,8,5.2,1.93907,0.45,36,0.177778,0.357702,0.378205,0.4,0 +BF2_cpmp_16_5_48_10_36_9,1,16,5,0.3125,0.6,0.3125,0.625,1,0,7,4.36364,1.77214,2,8,5.1,2.3,0.45,36,0.257778,0.367361,0.576923,0.3125,0 +BF20_cpmp_20_5_60_24_45_1,1,20,5,0.25,0.6,0.3,0.6,1,0,5,2.4,0.938083,2,23,10.6667,7.31817,0.45,45,0.755396,0.337667,0.483333,0.39,0.1 +BF20_cpmp_20_5_60_24_45_10,1,20,5,0.25,0.6,0.4,0.6,1,0,4,2.4,0.848528,3,22,11.75,5.86124,0.45,45,0.856115,0.341156,0.816667,0.4,0.15 +BF20_cpmp_20_5_60_24_45_11,1,20,5,0.25,0.6,0.35,0.65,1,0,7,2.4,1.49666,2,22,7.53846,5.28591,0.45,45,0,0.3126,0.4,0.4,0.2 +BF20_cpmp_20_5_60_24_45_12,1,20,5,0.25,0.6,0.3,0.65,1,0,5,2.4,1.09545,3,24,11,6.02559,0.45,45,0.352518,0.342822,0.783333,0.4,0.05 +BF20_cpmp_20_5_60_24_45_13,1,20,5,0.25,0.6,0.3,0.65,1,0,5,2.4,1.2,3,21,11.5385,5.87845,0.45,45,0.116547,0.386222,0.533333,0.33,0.1 +BF20_cpmp_20_5_60_24_45_14,1,20,5,0.25,0.6,0.35,0.6,1,0,4,2.4,0.979796,3,20,10.8333,5.66912,0.45,45,0.62446,0.3544,0.583333,0.36,0.15 +BF20_cpmp_20_5_60_24_45_15,1,20,5,0.25,0.6,0.3,0.6,1,0,4,2.4,0.8,2,21,9.66667,5.57275,0.45,45,0.791367,0.327711,0.5,0.34,0 +BF20_cpmp_20_5_60_24_45_17,1,20,5,0.25,0.6,0.25,0.65,1,0,5,2.4,0.979796,6,23,13.6923,4.76178,0.45,45,0.156835,0.399933,0.4,0.38,0.25 
+BF20_cpmp_20_5_60_24_45_2,1,20,5,0.25,0.6,0.4,0.6,1,0,4,2.4,0.894427,4,22,12.0833,5.64887,0.45,45,0.791367,0.336444,0.683333,0.4,0 +BF20_cpmp_20_5_60_24_45_20,1,20,5,0.25,0.6,0.25,0.6,1,0,5,2.4,0.894427,3,21,12.5833,5.34569,0.45,45,0.381295,0.377378,0.666667,0.4,0 +BF20_cpmp_20_5_60_24_45_3,1,20,5,0.25,0.6,0.3,0.65,1,0,5,2.4,1.09545,2,22,10.6154,5.83805,0.45,45,0.028777,0.346422,0.216667,0.35,0.05 +BF20_cpmp_20_5_60_24_45_5,1,20,5,0.25,0.6,0.3,0.6,1,0,4,2.4,0.894427,7,21,12.6667,4.17,0.45,45,0.595683,0.3644,0.533333,0.36,0.2 +BF20_cpmp_20_5_60_24_45_6,1,20,5,0.25,0.6,0.3,0.6,1,0,4,2.4,1.0198,4,17,10,3.937,0.45,45,0.438849,0.342311,0.95,0.32,0 +BF20_cpmp_20_5_60_24_45_7,1,20,5,0.25,0.6,0.3,0.6,1,0,5,2.4,1.0198,2,22,9.33333,5.63225,0.45,45,0.155396,0.337822,0.633333,0.34,0.05 +BF20_cpmp_20_5_60_24_45_8,1,20,5,0.25,0.6,0.35,0.6,1,0,5,2.4,1.0198,3,19,9.41667,5.1228,0.45,45,0.561151,0.324356,0.5,0.39,0 +BF20_cpmp_20_5_60_24_45_9,1,20,5,0.25,0.6,0.3,0.6,1,0,5,2.4,1.0583,2,23,13.1667,5.95586,0.45,45,0.74964,0.381,0.611111,0.34,0.15 +BF21_cpmp_20_5_80_16_48_11,1,20,5,0.25,0.8,0,0.8,0.888889,0,6,4.70588,1.22545,2,15,7.375,3.72282,0.48,51,0.305036,0.523399,0.66,0,0 +BF21_cpmp_20_5_80_16_48_14,1,20,5,0.25,0.8,0,0.9,0.947368,0,6,4.70588,1.36186,2,13,5.83333,2.71314,0.48,52,0.107914,0.526176,0.533333,0,0.05 +BF21_cpmp_20_5_80_16_48_16,1,20,5,0.25,0.8,0,0.85,0.85,0,7,4.70588,1.4858,2,14,7.94118,3.40364,0.48,50,0.320863,0.528399,0.483333,0,0 +BF21_cpmp_20_5_80_16_48_18,1,20,5,0.25,0.8,0,0.8,0.842105,0,9,4.70588,1.63652,2,15,7.9375,4.40835,0.48,50,0.0877698,0.549641,0.52,0,0.05 +BF21_cpmp_20_5_80_16_48_19,1,20,5,0.25,0.8,0,1,1,0,7,4.70588,1.52488,2,16,6.2,3.89358,0.48,54,0.14964,0.508529,0.733333,0,0.15 +BF21_cpmp_20_5_80_16_48_2,1,20,5,0.25,0.8,0,0.8,0.8,0,6,4.70588,1.44567,2,13,5.9375,3.28764,0.48,52,0.197122,0.515098,0.722222,0,0 +BF21_cpmp_20_5_80_16_48_3,1,20,5,0.25,0.8,0,0.75,0.882353,0,9,4.70588,1.80733,2,11,5.66667,3.0037,0.48,49,0.094964,0.517288,0.683333,0,0.1 
+BF21_cpmp_20_5_80_16_48_4,1,20,5,0.25,0.8,0,0.85,0.894737,0,6,4.70588,1.40439,2,14,6.76471,3.48998,0.48,50,0.263309,0.530327,0.411111,0,0 +BF21_cpmp_20_5_80_16_48_5,1,20,5,0.25,0.8,0,0.85,0.894737,0,6,4.70588,1.36186,2,16,6.82353,4.10502,0.48,50,0.448921,0.520163,0.494444,0,0.05 +BF21_cpmp_20_5_80_16_48_7,1,20,5,0.25,0.8,0,0.75,0.833333,0,6,4.70588,1.31796,2,13,4.86667,2.9409,0.48,51,0.172662,0.49183,0.566667,0,0 +BF21_cpmp_20_5_80_16_48_8,1,20,5,0.25,0.8,0,0.75,0.789474,0,7,4.70588,1.52488,2,12,6.8,3.61847,0.48,51,0.410072,0.493725,0.546667,0,0 +BF21_cpmp_20_5_80_16_48_9,1,20,5,0.25,0.8,0,0.75,0.833333,0,7,4.70588,1.52488,2,12,6.06667,2.93182,0.48,50,0.0517986,0.503039,0.725,0,0 +BF22_cpmp_20_5_80_16_60_11,1,20,5,0.25,0.8,0.1,0.8,1,0,6,4.70588,1.36186,2,16,8.75,4.08503,0.6,60,0.282014,0.503366,0.533333,0.1,0.2 +BF22_cpmp_20_5_80_16_60_16,1,20,5,0.25,0.8,0.15,0.85,1,0,7,4.70588,1.67208,2,16,8.05882,3.85732,0.6,61,0.195683,0.477124,0.477778,0.15,0.1 +BF22_cpmp_20_5_80_16_60_4,1,20,5,0.25,0.8,0.05,0.9,1,0,8,4.70588,1.7069,3,15,8.44444,3.13089,0.6,62,0.254676,0.505948,0.553333,0.05,0 +BF22_cpmp_20_5_80_16_60_6,1,20,5,0.25,0.8,0.15,0.8,0.941176,0,8,4.70588,1.52488,2,12,7.25,3.23071,0.6,61,0.0460432,0.449412,0.586667,0.15,0 +BF22_cpmp_20_5_80_16_60_7,1,20,5,0.25,0.8,0.2,0.8,1,0,7,4.70588,1.77448,2,11,6.375,2.97647,0.6,62,0.352518,0.439216,0.366667,0.2,0.1 +BF22_cpmp_20_5_80_16_60_9,1,20,5,0.25,0.8,0.15,0.85,1,0,7,4.70588,1.52488,2,12,6.70588,3.32132,0.6,62,0.414388,0.46683,0.38,0.17,0.05 +BF23_cpmp_20_5_80_32_48_10,1,20,5,0.25,0.8,0,0.9,1,0,5,2.42424,1.07394,2,17,8.66667,5.04425,0.48,52,0.169784,0.435741,0.3,0,0 +BF23_cpmp_20_5_80_32_48_11,1,20,5,0.25,0.8,0,0.85,0.894737,0,4,2.42424,0.985664,2,25,10.0588,6.27292,0.48,51,0.0589928,0.501515,0.358333,0,0.1 +BF23_cpmp_20_5_80_32_48_12,1,20,5,0.25,0.8,0,0.9,0.947368,0,6,2.42424,1.20681,2,28,10.7778,7.32997,0.48,49,0.185612,0.466296,0.588889,0,0.1 
+BF23_cpmp_20_5_80_32_48_15,1,20,5,0.25,0.8,0,0.8,0.842105,0,5,2.42424,0.922129,2,30,14.3125,7.48932,0.48,51,0.0129496,0.527744,0.833333,0,0 +BF23_cpmp_20_5_80_32_48_16,1,20,5,0.25,0.8,0,0.9,0.947368,0,7,2.42424,1.39328,2,28,11.8889,7.68757,0.48,51,0.299281,0.494562,0.75,0,0 +BF23_cpmp_20_5_80_32_48_18,1,20,5,0.25,0.8,0,0.65,0.764706,0,5,2.42424,1.25602,2,24,15.0769,8.21314,0.48,51,0.195683,0.499714,0.333333,0,0 +BF23_cpmp_20_5_80_32_48_3,1,20,5,0.25,0.8,0,0.85,0.894737,0,4,2.42424,0.888659,3,27,11.1765,6.51907,0.48,51,0.0848921,0.476599,0.677778,0,0.1 +BF23_cpmp_20_5_80_32_48_4,1,20,5,0.25,0.8,0,0.75,0.789474,0,5,2.42424,0.922129,2,25,12.3333,7.28164,0.48,50,0.105036,0.50569,0.506667,0,0 +BF23_cpmp_20_5_80_32_48_6,1,20,5,0.25,0.8,0.05,0.85,0.944444,0,6,2.42424,1.41486,5,23,14.3529,5.38998,0.48,50,0.430216,0.511987,0.5,0.05,0 +BF23_cpmp_20_5_80_32_48_7,1,20,5,0.25,0.8,0,0.75,0.833333,0,5,2.42424,1.04534,2,25,13.6667,7.11493,0.48,51,0.0057554,0.498889,0.533333,0,0 +BF24_cpmp_20_5_80_32_60_13,1,20,5,0.25,0.8,0,0.85,1,0,4,2.42424,0.853879,2,26,11.8235,8.13298,0.6,64,0.368345,0.459091,0.575,0,0.1 +BF24_cpmp_20_5_80_32_60_17,1,20,5,0.25,0.8,0.15,0.8,1,0,4,2.42424,0.779678,4,30,15.125,7.43198,0.6,60,0.258993,0.459091,0.488889,0.16,0 +BF24_cpmp_20_5_80_32_60_2,1,20,5,0.25,0.8,0.1,0.8,1,0,5,2.42424,0.888659,2,32,14.625,7.40671,0.6,61,0.0920863,0.456582,0.55,0.1,0.15 +BF24_cpmp_20_5_80_32_60_8,1,20,5,0.25,0.8,0.2,0.8,1,0,5,2.42424,1.1018,2,27,13.375,7.61475,0.6,60,0.381295,0.440522,0.683333,0.2,0.1 +BF25_cpmp_20_8_96_20_58_20,1,20,8,0.4,0.6,0,0.8,0.842105,0,9,4.57143,1.56057,2,13,6.875,3.70599,0.3625,62,0.397482,0.475134,0.458333,0,0 +BF26_cpmp_20_8_96_20_72_5,1,20,8,0.4,0.6,0.2,0.6,0.923077,0,8,4.57143,1.62045,2,17,9.91667,4.99096,0.45,72,0.338129,0.405466,0.649306,0.34375,0 +BF27_cpmp_20_8_96_39_58_12,1,20,8,0.4,0.6,0,0.8,0.842105,0,4,2.4,0.830662,3,27,15.25,7.70146,0.3625,59,0.304856,0.451712,0.736111,0,0.05 
+BF27_cpmp_20_8_96_39_58_6,1,20,8,0.4,0.6,0,0.8,0.8,0,4,2.4,0.860233,2,34,14.5625,10.1794,0.3625,60,0.357914,0.449005,0.583333,0,0 +BF27_cpmp_20_8_96_39_58_9,1,20,8,0.4,0.6,0,0.9,0.947368,0,5,2.4,0.943398,2,30,15.0556,8.2964,0.3625,61,0.609712,0.458776,0.569444,0,0 +BF28_cpmp_20_8_96_39_72_19,1,20,8,0.4,0.6,0.05,0.7,1,0,6,2.4,1.09087,2,32,16.7857,9.88634,0.45,72,0.198741,0.413438,0.701389,0.18125,0 +BF3_cpmp_16_5_48_20_29_1,1,16,5,0.3125,0.6,0.1875,0.6875,0.916667,0,6,2.28571,1.45219,3,15,8.45455,3.82251,0.3625,29,0.7,0.391005,0.346154,0.3,0.125 +BF3_cpmp_16_5_48_20_29_11,1,16,5,0.3125,0.6,0.125,0.5625,0.9,0,3,2.28571,0.764875,2,18,8.22222,4.91659,0.3625,29,0.06,0.387401,0.628205,0.3625,0 +BF3_cpmp_16_5_48_20_29_13,1,16,5,0.3125,0.6,0.125,0.625,0.909091,0,4,2.28571,0.933139,2,14,8.5,3.98121,0.3625,29,0.688889,0.433499,0.75,0.3375,0.0625 +BF3_cpmp_16_5_48_20_29_14,1,16,5,0.3125,0.6,0.0625,0.75,0.923077,0,5,2.28571,1.16058,3,17,7.33333,3.85861,0.3625,30,0.395556,0.405853,0.525641,0.3,0 +BF3_cpmp_16_5_48_20_29_16,1,16,5,0.3125,0.6,0.0625,0.5625,0.9,0,5,2.28571,1.07539,2,19,10.1111,5.25874,0.3625,29,0.197778,0.401918,0.519231,0.1125,0 +BF3_cpmp_16_5_48_20_29_19,1,16,5,0.3125,0.6,0,0.625,0.833333,0,6,2.28571,1.20091,3,12,6.6,2.69072,0.3625,30,0.162222,0.375595,0.596154,0,0 +BF3_cpmp_16_5_48_20_29_20,1,16,5,0.3125,0.6,0.1875,0.625,0.909091,0,5,2.28571,0.982846,2,19,7.3,4.98096,0.3625,29,0.5,0.361905,0.403846,0.3125,0 +BF3_cpmp_16_5_48_20_29_3,1,16,5,0.3125,0.6,0.0625,0.5625,0.818182,0,5,2.28571,0.982846,2,12,6.44444,3.49956,0.3625,29,0.473333,0.40043,0.711538,0.175,0 +BF3_cpmp_16_5_48_20_29_5,1,16,5,0.3125,0.6,0.1875,0.6875,0.916667,0,4,2.28571,0.824786,3,19,8.09091,5.16024,0.3625,29,0.446667,0.398181,0.589744,0.275,0.0625 +BF3_cpmp_16_5_48_20_29_6,1,16,5,0.3125,0.6,0.125,0.625,0.909091,0,4,2.28571,0.933139,3,18,9.6,5.08331,0.3625,29,0.551111,0.418618,0.487179,0.125,0 
+BF3_cpmp_16_5_48_20_29_8,1,16,5,0.3125,0.6,0.1875,0.75,1,0,4,2.28571,0.824786,2,18,9.08333,4.78641,0.3625,29,0.417778,0.380423,0.634615,0.35,0.0625 +BF3_cpmp_16_5_48_20_29_9,1,16,5,0.3125,0.6,0.0625,0.625,1,0,4,2.28571,0.880631,2,10,5.5,2.94109,0.3625,29,0.637778,0.393618,0.692308,0.225,0.0625 +BF4_cpmp_16_5_48_20_36_1,1,16,5,0.3125,0.6,0.375,0.625,1,0,4,2.28571,0.764875,3,14,8.1,3.7,0.45,36,0.1,0.331052,0.576923,0.4,0.0625 +BF4_cpmp_16_5_48_20_36_10,1,16,5,0.3125,0.6,0.375,0.625,1,0,5,2.28571,0.933139,4,19,10,4.51664,0.45,36,0.433333,0.345536,0.653846,0.4,0 +BF4_cpmp_16_5_48_20_36_11,1,16,5,0.3125,0.6,0.375,0.625,1,0,5,2.28571,0.933139,2,13,7,2.93258,0.45,36,0.144444,0.325959,0.717949,0.375,0.0625 +BF4_cpmp_16_5_48_20_36_13,1,16,5,0.3125,0.6,0.3125,0.625,1,0,7,2.28571,1.35023,3,19,9.3,5.38609,0.45,36,0.144444,0.354464,0.807692,0.35,0.3125 +BF4_cpmp_16_5_48_20_36_14,1,16,5,0.3125,0.6,0.375,0.625,1,0,5,2.28571,1.11879,3,19,13.4,5.04381,0.45,36,0.422222,0.394081,0.615385,0.375,0.0625 +BF4_cpmp_16_5_48_20_36_15,1,16,5,0.3125,0.6,0.3125,0.625,1,0,6,2.28571,1.20091,2,19,9.3,5.9,0.45,36,0.666667,0.368651,0.403846,0.375,0 +BF4_cpmp_16_5_48_20_36_16,1,16,5,0.3125,0.6,0.375,0.625,1,0,4,2.28571,0.880631,3,19,12.4,5.0636,0.45,36,0.377778,0.377712,0.423077,0.375,0.0625 +BF4_cpmp_16_5_48_20_36_17,1,16,5,0.3125,0.6,0.3125,0.625,1,0,4,2.28571,0.982846,3,19,9.7,4.38292,0.45,36,0.655556,0.373578,0.730769,0.4,0.125 +BF4_cpmp_16_5_48_20_36_18,1,16,5,0.3125,0.6,0.3125,0.625,1,0,5,2.28571,0.933139,2,19,7.1,4.65725,0.45,36,0.357778,0.343684,0.333333,0.35,0.1875 +BF4_cpmp_16_5_48_20_36_19,1,16,5,0.3125,0.6,0.25,0.5625,1,0,4,2.28571,1.11879,4,18,9.77778,4.02155,0.45,36,0.746667,0.384259,0.423077,0.35,0.125 +BF4_cpmp_16_5_48_20_36_2,1,16,5,0.3125,0.6,0.3125,0.5625,0.9,0,4,2.28571,0.933139,4,20,10.4444,4.78681,0.45,36,0.704444,0.371759,0.596154,0.35,0.125 
+BF4_cpmp_16_5_48_20_36_20,1,16,5,0.3125,0.6,0.25,0.6875,1,0,6,2.28571,1.27775,2,18,10.3636,5.26174,0.45,36,0.36,0.405721,0.692308,0.3375,0 +BF4_cpmp_16_5_48_20_36_3,1,16,5,0.3125,0.6,0.3125,0.6875,1,0,4,2.28571,0.933139,2,20,10.0909,5.40125,0.45,36,0.155556,0.354001,0.692308,0.3625,0.0625 +BF4_cpmp_16_5_48_20_36_5,1,16,5,0.3125,0.6,0.3125,0.625,1,0,8,2.28571,1.54744,2,14,6.4,3.87814,0.45,36,0.722222,0.334987,0.634615,0.3625,0.0625 +BF4_cpmp_16_5_48_20_36_6,1,16,5,0.3125,0.6,0.25,0.625,1,0,3,2.28571,0.699854,2,19,10.4,5.62494,0.45,36,0.302222,0.384028,0.371795,0.2875,0.1875 +BF4_cpmp_16_5_48_20_36_7,1,16,5,0.3125,0.6,0.375,0.625,1,0,4,2.28571,0.824786,2,16,7.1,4.32319,0.45,36,0.495556,0.311574,0.711538,0.4,0.125 +BF4_cpmp_16_5_48_20_36_8,1,16,5,0.3125,0.6,0.25,0.6875,1,0,4,2.28571,0.982846,2,16,7.90909,4.58167,0.45,36,0.784444,0.346263,0.884615,0.3875,0.0625 +BF4_cpmp_16_5_48_20_36_9,1,16,5,0.3125,0.6,0.25,0.5625,1,0,4,2.28571,0.764875,2,18,9.66667,5.22813,0.45,36,0.277778,0.359425,0.730769,0.4,0.0625 +BF5_cpmp_16_5_64_13_39_1,1,16,5,0.3125,0.8,0,0.8125,0.8125,0,7,4.57143,1.54524,2,10,4.46154,2.27411,0.4875,42,0.14,0.526687,0.707692,0,0.0625 +BF5_cpmp_16_5_64_13_39_10,1,16,5,0.3125,0.8,0.0625,0.8125,1,0,7,4.57143,1.4983,3,13,6.53846,2.46874,0.4875,39,0.242222,0.505308,0.528846,0.1125,0.0625 +BF5_cpmp_16_5_64_13_39_11,1,16,5,0.3125,0.8,0,0.8125,0.8125,0,7,4.57143,1.44984,2,10,5.46154,2.4997,0.4875,42,0.18,0.519246,0.669231,0,0.0625 +BF5_cpmp_16_5_64_13_39_12,1,16,5,0.3125,0.8,0,0.75,0.857143,0,7,4.57143,1.63507,2,12,6.25,3.4187,0.4875,41,0.115556,0.53497,0.438462,0,0.0625 +BF5_cpmp_16_5_64_13_39_14,1,16,5,0.3125,0.8,0,0.875,0.875,0,9,4.57143,1.80136,2,13,7.07143,3.32661,0.4875,41,0.293333,0.566419,0.338462,0,0 +BF5_cpmp_16_5_64_13_39_15,1,16,5,0.3125,0.8,0.0625,0.9375,1,0,6,4.57143,1.34771,2,13,6.2,3.03754,0.4875,40,0.355556,0.527232,0.569231,0.075,0 
+BF5_cpmp_16_5_64_13_39_17,1,16,5,0.3125,0.8,0,0.875,0.875,0,6,4.57143,1.44984,2,12,5.78571,3.38439,0.4875,42,0.108889,0.540476,0.461538,0,0.0625 +BF5_cpmp_16_5_64_13_39_2,1,16,5,0.3125,0.8,0.0625,0.9375,1,0,10,4.57143,2.16182,2,13,7,3.01109,0.4875,40,0.111111,0.555704,0.410256,0.075,0.0625 +BF5_cpmp_16_5_64_13_39_20,1,16,5,0.3125,0.8,0,0.875,0.933333,0,6,4.57143,1.39971,2,13,5.85714,3.77694,0.4875,40,0.106667,0.540327,0.592308,0,0.125 +BF5_cpmp_16_5_64_13_39_3,1,16,5,0.3125,0.8,0,0.75,0.923077,0,6,4.57143,1.4983,3,10,5.58333,2.25308,0.4875,40,0.186667,0.530952,0.538462,0,0.125 +BF5_cpmp_16_5_64_13_39_4,1,16,5,0.3125,0.8,0,0.8125,0.928571,0,6,4.57143,1.44984,3,12,6.76923,2.57664,0.4875,40,0.557778,0.557143,0.638462,0,0.1875 +BF5_cpmp_16_5_64_13_39_5,1,16,5,0.3125,0.8,0.0625,0.875,0.933333,0,7,4.57143,1.54524,2,10,5.85714,2.79942,0.4875,41,0.0555556,0.54122,0.551282,0.1625,0.0625 +BF5_cpmp_16_5_64_13_39_8,1,16,5,0.3125,0.8,0,0.75,0.75,0,6,4.57143,1.34771,2,11,5.41667,2.92855,0.4875,41,0.111111,0.516369,0.548077,0,0 +BF5_cpmp_16_5_64_13_39_9,1,16,5,0.3125,0.8,0,0.875,0.875,0,9,4.57143,1.76126,2,12,5.85714,3.09047,0.4875,41,0.00888889,0.524901,0.692308,0,0.0625 +BF6_cpmp_16_5_64_13_48_1,1,16,5,0.3125,0.8,0.1875,0.8125,1,0,7,4.57143,1.63507,3,13,7.30769,3.24356,0.6,48,0.286667,0.498909,0.692308,0.1875,0.125 +BF6_cpmp_16_5_64_13_48_11,1,16,5,0.3125,0.8,0.1875,0.8125,1,0,7,4.57143,1.63507,2,11,6.30769,3.098,0.6,48,0.186667,0.470337,0.5,0.1875,0 +BF6_cpmp_16_5_64_13_48_12,1,16,5,0.3125,0.8,0.1875,0.8125,1,0,6,4.57143,1.59079,2,11,5.69231,2.64239,0.6,50,0.0444444,0.456448,0.394231,0.1875,0 +BF6_cpmp_16_5_64_13_48_13,1,16,5,0.3125,0.8,0.1875,0.8125,1,0,7,4.57143,1.63507,2,12,5.07692,2.99901,0.6,50,0.0444444,0.426538,0.357143,0.2,0.0625 +BF6_cpmp_16_5_64_13_48_15,1,16,5,0.3125,0.8,0.125,0.8125,1,0,7,4.57143,1.54524,2,12,6.53846,3.00296,0.6,48,0.251111,0.496875,0.592308,0.1875,0 
+BF6_cpmp_16_5_64_13_48_16,1,16,5,0.3125,0.8,0.125,0.875,1,0,6,4.57143,1.39971,3,10,6.35714,2.02157,0.6,50,0,0.505853,0.519231,0.1625,0.125 +BF6_cpmp_16_5_64_13_48_17,1,16,5,0.3125,0.8,0.125,0.8125,1,0,7,4.57143,1.4983,2,13,6.38462,3.4981,0.6,49,0,0.493899,0.515385,0.15,0 +BF6_cpmp_16_5_64_13_48_18,1,16,5,0.3125,0.8,0.0625,0.8125,1,0,7,4.57143,1.4983,2,10,5.53846,2.53029,0.6,50,0.0444444,0.502877,0.492308,0.0625,0 +BF6_cpmp_16_5_64_13_48_19,1,16,5,0.3125,0.8,0.1875,0.8125,1,0,7,4.57143,1.59079,2,12,7,3.50823,0.6,48,0.0444444,0.483879,0.326923,0.2,0.125 +BF6_cpmp_16_5_64_13_48_2,1,16,5,0.3125,0.8,0.1875,0.8125,1,0,6,4.57143,1.39971,2,11,6.38462,2.70437,0.6,48,0.322222,0.477282,0.509615,0.2,0.0625 +BF6_cpmp_16_5_64_13_48_3,1,16,5,0.3125,0.8,0.1875,0.8125,1,0,7,4.57143,1.59079,2,10,6.76923,2.72182,0.6,49,0.00888889,0.495288,0.615385,0.1875,0.125 +BF6_cpmp_16_5_64_13_48_4,1,16,5,0.3125,0.8,0.125,0.8125,1,0,8,4.57143,1.72023,2,13,5.69231,3.47263,0.6,49,0.357778,0.477579,0.569231,0.125,0 +BF6_cpmp_16_5_64_13_48_6,1,16,5,0.3125,0.8,0.1875,0.8125,1,0,8,4.57143,1.63507,2,12,6.30769,3.42804,0.6,48,0.4,0.475694,0.692308,0.1875,0 +BF6_cpmp_16_5_64_13_48_7,1,16,5,0.3125,0.8,0.1875,0.8125,1,0,6,4.57143,1.44984,2,10,6,2.48069,0.6,49,0.177778,0.471032,0.451923,0.1875,0 +BF6_cpmp_16_5_64_13_48_8,1,16,5,0.3125,0.8,0.1875,0.8125,1,0,7,4.57143,1.59079,2,13,7.07692,3.09991,0.6,48,0.0355556,0.475248,0.553846,0.1875,0.0625 +BF6_cpmp_16_5_64_13_48_9,1,16,5,0.3125,0.8,0.125,0.8125,1,0,8,4.57143,1.76126,3,12,7.15385,3.25449,0.6,48,0.277778,0.515823,0.628205,0.15,0.25 +BF7_cpmp_16_5_64_26_39_1,1,16,5,0.3125,0.8,0,0.8125,0.928571,0,5,2.37037,0.948611,2,21,9.61538,6.43952,0.4875,41,0.168889,0.525643,0.615385,0,0.1875 +BF7_cpmp_16_5_64_26_39_10,1,16,5,0.3125,0.8,0,0.875,0.875,0,5,2.37037,1.09369,2,18,8.85714,5.02646,0.4875,42,0.171111,0.465484,0.653846,0,0 
+BF7_cpmp_16_5_64_26_39_11,1,16,5,0.3125,0.8,0,0.8125,0.8125,0,4,2.37037,0.908729,3,21,10.0769,5.52563,0.4875,42,0.255556,0.511677,0.461538,0,0.1875 +BF7_cpmp_16_5_64_26_39_12,1,16,5,0.3125,0.8,0,0.8125,0.928571,0,5,2.37037,1.02372,2,21,9.61538,6.58131,0.4875,41,0.248889,0.474949,0.307692,0,0 +BF7_cpmp_16_5_64_26_39_13,1,16,5,0.3125,0.8,0,0.875,0.875,0,5,2.37037,1.09369,2,21,11.4286,5.76584,0.4875,41,0.131111,0.533668,0.397436,0,0 +BF7_cpmp_16_5_64_26_39_14,1,16,5,0.3125,0.8,0.125,0.875,1,0,4,2.37037,0.908729,2,22,9.64286,6.76765,0.4875,40,0.128889,0.478524,0.423077,0.1625,0.125 +BF7_cpmp_16_5_64_26_39_15,1,16,5,0.3125,0.8,0,0.875,0.933333,0,4,2.37037,0.823189,2,23,10.5,6.75859,0.4875,41,0.462222,0.462603,0.538462,0,0.25 +BF7_cpmp_16_5_64_26_39_16,1,16,5,0.3125,0.8,0,0.75,0.857143,0,4,2.37037,0.908729,2,20,9.75,5.44862,0.4875,41,0.162222,0.488066,0.634615,0,0 +BF7_cpmp_16_5_64_26_39_17,1,16,5,0.3125,0.8,0,0.8125,0.8125,0,5,2.37037,1.09369,3,17,8.07692,4.74685,0.4875,43,0.306667,0.502315,0.519231,0,0 +BF7_cpmp_16_5_64_26_39_19,1,16,5,0.3125,0.8,0,0.9375,1,0,4,2.37037,0.823189,2,20,10.6667,4.86712,0.4875,40,0.271111,0.516461,0.538462,0,0 +BF7_cpmp_16_5_64_26_39_2,1,16,5,0.3125,0.8,0,0.9375,0.9375,0,4,2.37037,0.948611,2,24,10.2,7.13863,0.4875,41,0.244444,0.507305,0.5,0,0.1875 +BF7_cpmp_16_5_64_26_39_20,1,16,5,0.3125,0.8,0.0625,0.875,1,0,5,2.37037,0.986882,3,19,9.21429,5.14335,0.4875,41,0.266667,0.46659,0.442308,0.0625,0 +BF7_cpmp_16_5_64_26_39_3,1,16,5,0.3125,0.8,0,0.9375,0.9375,0,5,2.37037,1.05929,3,24,10.2667,6.15864,0.4875,43,0.433333,0.485545,0.692308,0,0.125 +BF7_cpmp_16_5_64_26_39_6,1,16,5,0.3125,0.8,0,0.875,0.933333,0,5,2.37037,1.15944,2,21,10.3571,5.65189,0.4875,41,0.146667,0.504758,0.634615,0,0 +BF7_cpmp_16_5_64_26_39_7,1,16,5,0.3125,0.8,0,0.875,0.933333,0,4,2.37037,0.823189,2,19,9.21429,4.94511,0.4875,42,0.304444,0.507536,0.692308,0,0.125 
+BF7_cpmp_16_5_64_26_39_8,1,16,5,0.3125,0.8,0,0.8125,0.866667,0,4,2.37037,0.908729,3,21,10.9231,6.14519,0.4875,42,0.328889,0.500952,0.653846,0,0 +BF8_cpmp_16_5_64_26_48_1,1,16,5,0.3125,0.8,0.125,0.875,1,0,6,2.37037,1.41809,2,21,13.2143,6.24704,0.6,50,0.0977778,0.476543,0.602564,0.15,0 +BF8_cpmp_16_5_64_26_48_11,1,16,5,0.3125,0.8,0.125,0.875,1,0,5,2.37037,0.948611,2,25,12.7857,7.07287,0.6,48,0.195556,0.493416,0.641026,0.125,0.125 +BF8_cpmp_16_5_64_26_48_12,1,16,5,0.3125,0.8,0.125,0.8125,1,0,4,2.37037,0.776895,2,20,9.15385,5.15672,0.6,49,0.08,0.421399,0.75,0.125,0.0625 +BF8_cpmp_16_5_64_26_48_17,1,16,5,0.3125,0.8,0.125,0.8125,1,0,4,2.37037,0.908729,2,25,12.1538,7.02573,0.6,48,0.177778,0.468879,0.615385,0.125,0.1875 +BF8_cpmp_16_5_64_26_48_20,1,16,5,0.3125,0.8,0.125,0.8125,1,0,4,2.37037,0.823189,2,26,12.1538,7.38862,0.6,48,0.197778,0.462551,0.653846,0.1375,0.125 +BF8_cpmp_16_5_64_26_48_4,1,16,5,0.3125,0.8,0.1875,0.8125,1,0,5,2.37037,0.823189,4,24,14,6.57501,0.6,48,0.4,0.474434,0.365385,0.1875,0.25 +BF8_cpmp_16_5_64_26_48_5,1,16,5,0.3125,0.8,0.125,0.8125,1,0,5,2.37037,1.15944,2,24,13.2308,6.84053,0.6,48,0.186667,0.483796,0.538462,0.125,0.1875 +BF8_cpmp_16_5_64_26_48_6,1,16,5,0.3125,0.8,0.0625,0.875,1,0,4,2.37037,0.823189,2,26,11.4286,6.74764,0.6,49,0.1,0.481507,0.403846,0.1125,0.0625 +BF8_cpmp_16_5_64_26_48_7,1,16,5,0.3125,0.8,0.0625,0.8125,1,0,6,2.37037,1.02372,2,22,11.5385,6.75523,0.6,50,0.226667,0.473843,0.5,0.0625,0 +BF9_cpmp_16_8_77_16_47_1,1,16,8,0.5,0.601562,0,0.75,0.75,0,9,4.52941,2.40386,2,12,6.91667,3.6391,0.367188,49,0.109722,0.431014,0.552083,0,0 +BF9_cpmp_16_8_77_16_47_10,1,16,8,0.5,0.601562,0,0.75,0.75,0,6,4.52941,1.33362,2,16,6.58333,4.27119,0.367188,48,0.406944,0.480427,0.585938,0,0.0625 +BF9_cpmp_16_8_77_16_47_12,1,16,8,0.5,0.601562,0,0.75,0.8,0,7,4.52941,1.71902,2,15,5.66667,3.61325,0.367188,49,0.4375,0.420296,0.53125,0,0 
+BF9_cpmp_16_8_77_16_47_3,1,16,8,0.5,0.601562,0,0.875,0.875,0,6,4.52941,1.28876,2,10,5.64286,2.43801,0.367188,51,0.359722,0.457475,0.53125,0,0 +BF9_cpmp_16_8_77_16_47_4,1,16,8,0.5,0.601562,0,0.8125,0.866667,0,7,4.52941,1.4191,2,13,6.46154,3.34239,0.367188,48,0.333333,0.473669,0.65625,0,0 +BF9_cpmp_16_8_77_16_47_6,1,16,8,0.5,0.601562,0,0.875,0.875,0,6,4.52941,1.4191,2,13,6.42857,3.7362,0.367188,49,0.301389,0.438523,0.695312,0,0 +BF9_cpmp_16_8_77_16_47_7,1,16,8,0.5,0.601562,0,0.875,0.875,0,8,4.52941,1.81878,2,11,5.78571,2.54049,0.367188,51,0.255556,0.485864,0.541667,0,0.125 +BF9_cpmp_16_8_77_16_47_8,1,16,8,0.5,0.601562,0,0.9375,0.9375,0,8,4.52941,1.68445,2,9,4.73333,1.98214,0.367188,51,0.233333,0.435235,0.523438,0,0 +LC2a_lc2a_1,1,12,6,0.5,0.694444,0,0.666667,0.727273,0,11,4.54545,3.17271,2,5,3.125,0.780625,0.263889,22,0.376543,0.530794,0.6,0,0 +LC2a_lc2a_10,1,12,6,0.5,0.694444,0,0.666667,0.666667,0,9,4.54545,2.46295,3,8,5.625,1.72753,0.263889,21,0.157407,0.583987,0.572917,0,0 +LC2a_lc2a_2,1,12,6,0.5,0.694444,0,0.833333,0.833333,0,9,4.54545,2.70903,2,6,4.1,1.22066,0.263889,21,0.527778,0.584369,0.635417,0,0 +LC2a_lc2a_3,1,12,6,0.5,0.694444,0,0.75,0.75,0,9,4.54545,2.8079,2,7,3.66667,1.69967,0.263889,21,0.0987654,0.598757,0.736111,0,0 +LC2a_lc2a_4,1,12,6,0.5,0.694444,0,0.916667,0.916667,0,8,4.54545,2.14746,2,6,4,1.2792,0.263889,22,0.160494,0.534554,0.708333,0,0 +LC2a_lc2a_5,1,12,6,0.5,0.694444,0,0.75,0.75,0,9,4.54545,2.49959,2,5,3.44444,1.06574,0.263889,22,0.166667,0.538315,0.631944,0,0 +LC2a_lc2a_7,1,12,6,0.5,0.694444,0,0.666667,0.666667,0,8,4.54545,2.27091,2,7,3.5,1.58114,0.263889,21,0.398148,0.53717,0.675,0,0 +LC2b_lc2b_1,1,12,6,0.5,0.694444,0,0.833333,1,0,14,4.54545,3.4997,2,8,4.2,1.66132,0.486111,37,0.364198,0.438631,0.652778,0,0 +LC2b_lc2b_2,1,12,6,0.5,0.694444,0,0.833333,0.909091,0,9,4.54545,2.14746,2,10,4.4,2.2891,0.486111,37,0.435185,0.483159,0.645833,0,0 
+LC2b_lc2b_3,1,12,6,0.5,0.694444,0,1,1,0,9,4.54545,2.74238,2,8,4.16667,2.11476,0.486111,39,0.280864,0.508393,0.482143,0,0 +LC2b_lc2b_4,1,12,6,0.5,0.694444,0.0833333,0.833333,0.909091,0,7,4.54545,1.87634,3,8,5.4,1.74356,0.486111,37,0.648148,0.526433,0.613095,0.125,0 +LC2b_lc2b_5,1,12,6,0.5,0.694444,0.0833333,0.833333,1,0,8,4.54545,2.34961,2,10,5.1,2.07123,0.486111,38,0.512346,0.517931,0.45,0.0972222,0 +LC2b_lc2b_6,1,12,6,0.5,0.694444,0.0833333,0.833333,1,0,10,4.54545,2.70903,2,7,5,1.34164,0.486111,37,0.138889,0.485775,0.527778,0.222222,0 +LC2b_lc2b_7,1,12,6,0.5,0.694444,0.0833333,0.833333,1,0,6,4.54545,1.61604,2,5,3,1.18322,0.486111,38,0.265432,0.432854,0.28125,0.0972222,0 +LC2b_lc2b_9,1,12,6,0.5,0.694444,0,0.75,0.9,0,8,4.54545,1.82725,2,9,5,2.66667,0.486111,36,0.145062,0.483486,0.65,0,0 +LC3a_lc3a_1,1,12,6,0.5,0.75,0,0.583333,0.583333,0,9,4.90909,2.31417,2,9,4.71429,2.6573,0.277778,21,0.160494,0.591999,0.803571,0,0 +LC3a_lc3a_10,1,12,6,0.5,0.75,0,0.75,0.75,0,9,4.90909,2.50289,3,7,4.66667,1.24722,0.277778,22,0.151235,0.596359,0.729167,0,0 +LC3a_lc3a_3,1,12,6,0.5,0.75,0,0.666667,0.666667,0,9,4.90909,2.31417,2,6,3.875,1.16592,0.277778,22,0.435185,0.58148,0.576389,0,0 +LC3a_lc3a_4,1,12,6,0.5,0.75,0,0.75,0.75,0,8,4.90909,2.50289,2,7,3.77778,1.6178,0.277778,21,0.287037,0.576848,0.660714,0,0 +LC3a_lc3a_6,1,12,6,0.5,0.75,0,0.75,0.75,0,8,4.90909,2.02056,2,6,3.55556,1.42292,0.277778,22,0.169753,0.548834,0.691667,0,0 +LC3a_lc3a_7,1,12,6,0.5,0.75,0,0.916667,0.916667,0,9,4.90909,2.67835,2,8,3.54545,1.67134,0.277778,22,0.111111,0.520929,0.597222,0,0 +LC3a_lc3a_8,1,12,6,0.5,0.75,0,0.75,0.818182,0,9,4.90909,2.50289,2,7,4.22222,1.5476,0.277778,22,0.367284,0.537225,0.541667,0,0 +LC3a_lc3a_9,1,12,6,0.5,0.75,0,0.75,0.818182,0,9,4.90909,2.96815,2,8,3,1.82574,0.277778,22,0.200617,0.485939,0.569444,0,0 +LC3b_lc3b_1,1,12,6,0.5,0.75,0,1,1,0,9,4.90909,2.23422,2,10,4.41667,2.66015,0.513889,40,0.225309,0.523054,0.479167,0,0 
+LC3b_lc3b_10,1,12,6,0.5,0.75,0,0.833333,0.909091,0,12,4.90909,3.20382,2,10,5.3,2.32594,0.513889,39,0.231481,0.512208,0.508333,0,0 +LC3b_lc3b_4,1,12,6,0.5,0.75,0.0833333,0.916667,1,0,11,4.90909,2.74539,2,5,3.45455,1.07565,0.513889,40,0.0987654,0.432854,0.4375,0.152778,0 +LC3b_lc3b_5,1,12,6,0.5,0.75,0.0833333,0.916667,1,0,10,4.90909,2.42916,2,7,3.72727,1.71044,0.513889,39,0.160494,0.465555,0.569444,0.0833333,0 +LC3b_lc3b_6,1,12,6,0.5,0.75,0,0.916667,1,0,8,4.90909,2.27454,2,8,5,2.17423,0.513889,40,0.410494,0.559244,0.552083,0,0 +LC3b_lc3b_8,1,12,6,0.5,0.75,0.0833333,0.916667,1,0,8,4.90909,2.27454,2,7,3.90909,1.72966,0.513889,39,0.685185,0.434216,0.4375,0.138889,0 +LC3b_lc3b_9,1,12,6,0.5,0.75,0,0.833333,1,0,10,4.90909,2.71208,2,9,4.5,2.61725,0.513889,40,0.475309,0.518149,0.576389,0,0 +cv_data3-5-13,1,5,5,1,0.6,0,1,1,1,1,1,0,1,7,3.8,2.31517,0.36,12,0.4,0.427111,0.6,0,0 +cv_data3-5-8,1,5,5,1,0.6,0,1,1,1,1,1,0,1,9,4.6,3.2619,0.36,11,0.4,0.425481,0.466667,0,0 +cv_data3-6-24,1,6,5,0.833333,0.6,0,1,1,1,1,1,0,1,11,4.66667,3.39935,0.333333,13,0.4,0.43786,0.8125,0,0 +cv_data3-6-26,1,6,5,0.833333,0.6,0,1,1,1,1,1,0,1,7,3.83333,2.11476,0.366667,14,0.4,0.409156,0.5,0,0 +cv_data3-7-12,1,7,5,0.714286,0.6,0,0.714286,0.714286,1,1,1,0,2,15,8.2,4.83322,0.257143,10,0.4,0.446259,0.823529,0,0.142857 +cv_data3-7-17,1,7,5,0.714286,0.6,0,1,1,1,1,1,0,2,13,8,3.74166,0.314286,14,0.4,0.439078,0.764706,0,0.285714 +cv_data3-7-18,1,7,5,0.714286,0.6,0,0.714286,0.714286,1,1,1,0,6,13,9.6,2.72764,0.285714,11,0.4,0.478005,0.588235,0,0.571429 +cv_data3-7-22,1,7,5,0.714286,0.6,0,1,1,1,1,1,0,1,11,6.28571,3.53409,0.371429,16,0.4,0.416553,0.470588,0,0.285714 +cv_data3-7-27,1,7,5,0.714286,0.6,0,1,1,1,1,1,0,1,14,6.14286,4.25705,0.342857,14,0.4,0.434014,0.470588,0,0.142857 +cv_data3-7-30,1,7,5,0.714286,0.6,0,1,1,1,1,1,0,3,16,7.71429,4.19913,0.314286,13,0.4,0.453364,0.352941,0,0.285714 +cv_data3-7-5,1,7,5,0.714286,0.6,0,1,1,1,1,1,0,1,14,5,4,0.314286,14,0.4,0.445578,0.882353,0,0 
+cv_data3-8-17,1,8,5,0.625,0.6,0,0.75,0.75,1,1,1,0,2,21,10.6667,6.28932,0.225,10,0.4,0.454398,0.666667,0,0.25 +cv_data3-8-2,1,8,5,0.625,0.6,0,0.75,0.75,1,1,1,0,1,16,7.33333,4.98888,0.25,12,0.4,0.455845,0.666667,0,0.25 +cv_data3-8-24,1,8,5,0.625,0.6,0,0.75,0.75,1,1,1,0,1,18,9.33333,6.77413,0.225,10,0.4,0.442998,0.444444,0,0.375 +cv_data3-8-25,1,8,5,0.625,0.6,0,0.75,0.75,1,1,1,0,2,20,10,5.50757,0.25,11,0.4,0.469444,0.833333,0,0.375 +cv_data3-8-28,1,8,5,0.625,0.6,0,0.875,0.875,1,1,1,0,1,20,8.28571,6.27271,0.275,12,0.4,0.466146,0.722222,0,0.25 +cv_data3-8-37,1,8,5,0.625,0.6,0,0.75,0.75,1,1,1,0,1,17,7.66667,5.82142,0.225,10,0.4,0.46956,0.5,0,0.25 +cv_data3-8-38,1,8,5,0.625,0.6,0,1,1,1,1,1,0,1,18,7.5,5.12348,0.3,14,0.4,0.432234,0.611111,0,0 +cv_data3-8-7,1,8,5,0.625,0.6,0,1,1,1,1,1,0,1,11,6.25,3.34477,0.325,16,0.4,0.442824,0.5,0,0.125 +cv_data4-4-10,1,4,6,1.5,0.666667,0,1,1,1,1,1,0,1,10,4.75,3.56195,0.333333,10,0.333333,0.492131,0.375,0,0 +cv_data4-4-12,1,4,6,1.5,0.666667,0,1,1,1,1,1,0,4,8,5.75,1.47902,0.416667,13,0.333333,0.520908,0.75,0,0.75 +cv_data4-4-13,1,4,6,1.5,0.666667,0,1,1,1,1,1,0,4,8,6,1.58114,0.458333,15,0.333333,0.468076,0.5,0,0.5 +cv_data4-4-14,1,4,6,1.5,0.666667,0,1,1,1,1,1,0,2,14,5.75,4.81534,0.416667,13,0.333333,0.464366,0.6875,0,0 +cv_data4-4-15,1,4,6,1.5,0.666667,0,1,1,1,1,1,0,2,11,5,3.53553,0.291667,9,0.333333,0.489771,0.4375,0,0.25 +cv_data4-4-16,1,4,6,1.5,0.666667,0,1,1,1,1,1,0,1,9,5,3.16228,0.5,17,0.333333,0.441097,0.4375,0,0 +cv_data4-4-19,1,4,6,1.5,0.666667,0,1,1,1,1,1,0,1,9,5.5,3.20156,0.416667,13,0.333333,0.464591,0.75,0,0.25 +cv_data4-4-2,1,4,6,1.5,0.666667,0,1,1,1,1,1,0,1,13,7.25,4.60299,0.375,11,0.333333,0.502248,0.375,0,0 +cv_data4-4-20,1,4,6,1.5,0.666667,0,1,1,1,1,1,0,6,13,8.5,2.69258,0.458333,14,0.333333,0.530238,0.6875,0,0.5 +cv_data4-4-21,1,4,6,1.5,0.666667,0,1,1,1,1,1,0,1,9,4.5,2.95804,0.416667,13,0.333333,0.471223,0.4375,0,0.25 +cv_data4-4-23,1,4,6,1.5,0.666667,0,1,1,1,1,1,0,1,11,7,3.74166,0.5,16,0.333333,0.475382,0.4375,0,0.25 
+cv_data4-4-24,1,4,6,1.5,0.666667,0,1,1,1,1,1,0,1,9,5.5,2.95804,0.416667,13,0.333333,0.512478,0.625,0,0.25 +cv_data4-4-25,1,4,6,1.5,0.666667,0,1,1,1,1,1,0,2,10,6,2.91548,0.458333,14,0.333333,0.464928,0.4375,0,0.5 +cv_data4-4-27,1,4,6,1.5,0.666667,0,1,1,1,1,1,0,1,7,4.75,2.27761,0.416667,14,0.333333,0.467963,0.875,0,0.25 +cv_data4-4-28,1,4,6,1.5,0.666667,0,1,1,1,1,1,0,1,5,3,1.58114,0.291667,9,0.333333,0.505171,0.5625,0,0.25 +cv_data4-4-29,1,4,6,1.5,0.666667,0,1,1,1,1,1,0,1,9,5,3.16228,0.375,12,0.333333,0.482689,0.75,0,0.25 +cv_data4-4-3,1,4,6,1.5,0.666667,0,1,1,1,1,1,0,3,13,6.5,3.90512,0.458333,14,0.333333,0.486398,0.625,0,0.5 +cv_data4-4-30,1,4,6,1.5,0.666667,0,1,1,1,1,1,0,1,9,4.5,2.95804,0.333333,10,0.333333,0.489096,0.5625,0,0 +cv_data4-4-33,1,4,6,1.5,0.666667,0,1,1,1,1,1,0,1,7,4.25,2.38485,0.416667,14,0.333333,0.483588,0.875,0,0.25 +cv_data4-4-34,1,4,6,1.5,0.666667,0,1,1,1,1,1,0,2,9,5,2.54951,0.375,11,0.333333,0.489321,0.625,0,0.5 +cv_data4-4-35,1,4,6,1.5,0.666667,0,1,1,1,1,1,0,1,10,4.5,3.5,0.5,16,0.333333,0.442896,0.6875,0,0 +cv_data4-4-7,1,4,6,1.5,0.666667,0,1,1,1,1,1,0,1,10,5.25,3.26917,0.458333,14,0.333333,0.457509,0.625,0,0.25 +cv_data4-5-1,1,5,6,1.2,0.666667,0,1,1,1,1,1,0,1,10,5.2,2.92575,0.4,16,0.333333,0.442734,0.529412,0,0.2 +cv_data4-5-12,1,5,6,1.2,0.666667,0,0.8,0.8,1,1,1,0,1,9,6,3.08221,0.366667,13,0.333333,0.489928,0.705882,0,0.6 +cv_data4-5-14,1,5,6,1.2,0.666667,0,1,1,1,1,1,0,2,18,8.6,5.71314,0.433333,16,0.333333,0.483094,0.647059,0,0 +cv_data4-5-15,1,5,6,1.2,0.666667,0,1,1,1,1,1,0,1,9,5.8,2.78568,0.466667,18,0.333333,0.461583,0.647059,0,0.4 +cv_data4-5-17,1,5,6,1.2,0.666667,0,1,1,1,1,1,0,2,13,6,3.89872,0.466667,18,0.333333,0.465324,0.470588,0,0.4 +cv_data4-5-19,1,5,6,1.2,0.666667,0,1,1,1,1,1,0,3,9,5.8,2.13542,0.4,16,0.333333,0.486259,0.882353,0,0.2 +cv_data4-5-2,1,5,6,1.2,0.666667,0,1,1,1,1,1,0,1,12,7.4,3.92938,0.433333,16,0.333333,0.471799,0.588235,0,0.4 
+cv_data4-5-20,1,5,6,1.2,0.666667,0,1,1,1,1,1,0,1,12,6.6,4.36348,0.433333,15,0.333333,0.474604,0.764706,0,0 +cv_data4-5-21,1,5,6,1.2,0.666667,0,1,1,1,1,1,0,1,7,4.2,2.31517,0.4,16,0.333333,0.446835,0.588235,0,0 +cv_data4-5-22,1,5,6,1.2,0.666667,0,1,1,1,1,1,0,1,10,5.8,2.92575,0.466667,18,0.333333,0.445468,0.470588,0,0 +cv_data4-5-23,1,5,6,1.2,0.666667,0,1,1,1,1,1,0,1,10,5.6,3.49857,0.433333,17,0.333333,0.460432,0.411765,0,0 +cv_data4-5-24,1,5,6,1.2,0.666667,0,0.8,0.8,1,1,1,0,2,8,4.25,2.27761,0.333333,12,0.333333,0.454101,0.588235,0,0.2 +cv_data4-5-25,1,5,6,1.2,0.666667,0,1,1,1,1,1,0,1,18,9.8,6.04649,0.466667,17,0.333333,0.490504,0.823529,0,0.2 +cv_data4-5-26,1,5,6,1.2,0.666667,0,1,1,1,1,1,0,1,10,4.6,3.00666,0.433333,17,0.333333,0.47,0.764706,0,0.2 +cv_data4-5-27,1,5,6,1.2,0.666667,0,1,1,1,1,1,0,1,12,6.4,3.97995,0.333333,12,0.333333,0.497842,0.588235,0,0.4 +cv_data4-5-28,1,5,6,1.2,0.666667,0,1,1,1,1,1,0,1,12,7.2,3.86782,0.366667,13,0.333333,0.510072,0.588235,0,0.4 +cv_data4-5-29,1,5,6,1.2,0.666667,0,1,1,1,1,1,0,1,16,6.6,5.23832,0.433333,16,0.333333,0.461871,0.529412,0,0.2 +cv_data4-5-3,1,5,6,1.2,0.666667,0,1,1,1,1,1,0,2,12,7.4,3.44093,0.466667,18,0.333333,0.468273,0.529412,0,0.4 +cv_data4-5-30,1,5,6,1.2,0.666667,0,1,1,1,1,1,0,1,8,4.2,2.31517,0.466667,18,0.333333,0.433237,0.529412,0,0 +cv_data4-5-31,1,5,6,1.2,0.666667,0,1,1,1,1,1,0,1,13,6.6,5.004,0.466667,17,0.333333,0.457482,0.705882,0,0 +cv_data4-5-32,1,5,6,1.2,0.666667,0,1,1,1,1,1,0,8,14,10.8,2.31517,0.5,19,0.333333,0.521727,0.647059,0,0.6 +cv_data4-5-34,1,5,6,1.2,0.666667,0,1,1,1,1,1,0,1,13,6.4,4.17612,0.466667,17,0.333333,0.469281,0.705882,0,0.2 +cv_data4-5-35,1,5,6,1.2,0.666667,0,1,1,1,1,1,0,1,7,3.6,2.15407,0.333333,13,0.333333,0.47,0.705882,0,0 +cv_data4-5-39,1,5,6,1.2,0.666667,0,1,1,1,1,1,0,1,11,5.8,3.76298,0.4,15,0.333333,0.466691,0.823529,0,0.2 +cv_data4-5-5,1,5,6,1.2,0.666667,0,1,1,1,1,1,0,1,9,4,2.82843,0.366667,14,0.333333,0.430647,0.529412,0,0 
+cv_data4-5-9,1,5,6,1.2,0.666667,0,1,1,1,1,1,0,2,14,8.2,4.30813,0.466667,17,0.333333,0.481079,0.647059,0,0.4 +cv_data4-6-1,1,6,6,1,0.666667,0,1,1,1,1,1,0,1,14,8,4.86484,0.416667,18,0.333333,0.473421,0.666667,0,0.333333 +cv_data4-6-10,1,6,6,1,0.666667,0,1,1,1,1,1,0,2,11,7,3.16228,0.416667,19,0.333333,0.468076,0.666667,0,0.166667 +cv_data4-6-11,1,6,6,1,0.666667,0,1,1,1,1,1,0,1,16,6.5,5.25198,0.388889,17,0.333333,0.483463,0.666667,0,0.166667 +cv_data4-6-12,1,6,6,1,0.666667,0,1,1,1,1,1,0,3,17,9.83333,4.56131,0.444444,19,0.333333,0.476469,0.333333,0,0.5 +cv_data4-6-13,1,6,6,1,0.666667,0,0.833333,0.833333,1,1,1,0,2,16,7,4.81664,0.25,10,0.333333,0.48796,0.611111,0,0.166667 +cv_data4-6-14,1,6,6,1,0.666667,0,1,1,1,1,1,0,1,22,9.5,7.32006,0.361111,15,0.333333,0.498751,0.833333,0,0 +cv_data4-6-15,1,6,6,1,0.666667,0,1,1,1,1,1,0,2,15,8.83333,4.0586,0.444444,20,0.333333,0.469724,0.555556,0,0.333333 +cv_data4-6-16,1,6,6,1,0.666667,0,0.833333,0.833333,1,1,1,0,3,18,9.4,6.01997,0.333333,13,0.333333,0.51284,0.833333,0,0.333333 +cv_data4-6-18,1,6,6,1,0.666667,0,1,1,1,1,1,0,1,16,7.66667,5.15321,0.472222,21,0.333333,0.457084,0.444444,0,0.333333 +cv_data4-6-19,1,6,6,1,0.666667,0,1,1,1,1,1,0,1,21,7.66667,7.11024,0.416667,17,0.333333,0.478717,0.777778,0,0.166667 +cv_data4-6-2,1,6,6,1,0.666667,0,1,1,1,1,1,0,2,11,6,3.05505,0.25,11,0.333333,0.538419,0.944444,0,0.333333 +cv_data4-6-20,1,6,6,1,0.666667,0,0.833333,0.833333,1,1,1,0,1,16,8.6,5.23832,0.361111,15,0.333333,0.482064,0.722222,0,0.333333 +cv_data4-6-21,1,6,6,1,0.666667,0,1,1,1,1,1,0,1,17,8.16667,5.01387,0.444444,20,0.333333,0.444794,0.555556,0,0.166667 +cv_data4-6-24,1,6,6,1,0.666667,0,1,1,1,1,1,0,1,15,8,5.16398,0.416667,19,0.333333,0.481515,0.611111,0,0.333333 +cv_data4-6-25,1,6,6,1,0.666667,0,1,1,1,1,1,0,2,14,6.33333,3.94405,0.444444,19,0.333333,0.414468,0.611111,0,0.166667 +cv_data4-6-27,1,6,6,1,0.666667,0,1,1,1,1,1,0,2,19,8.83333,5.81425,0.388889,16,0.333333,0.490857,0.944444,0,0.5 
+cv_data4-6-28,1,6,6,1,0.666667,0,1,1,1,1,1,0,2,15,7.33333,4.26875,0.444444,20,0.333333,0.478567,0.611111,0,0.5 +cv_data4-6-29,1,6,6,1,0.666667,0,1,1,1,1,1,0,1,13,6.33333,4.14997,0.361111,16,0.333333,0.473571,0.777778,0,0 +cv_data4-6-3,1,6,6,1,0.666667,0,1,1,1,1,1,0,4,17,9.5,4.34933,0.416667,18,0.333333,0.498102,0.777778,0,0.5 +cv_data4-6-30,1,6,6,1,0.666667,0,0.833333,0.833333,1,1,1,0,1,14,7.6,5.1614,0.361111,15,0.333333,0.481115,0.722222,0,0.333333 +cv_data4-6-32,1,6,6,1,0.666667,0,1,1,1,1,1,0,1,21,9.16667,6.84146,0.416667,17,0.333333,0.48756,0.777778,0,0.166667 +cv_data4-6-33,1,6,6,1,0.666667,0,0.833333,0.833333,1,1,1,0,1,20,7.6,6.65132,0.305556,12,0.333333,0.493106,0.722222,0,0.333333 +cv_data4-6-34,1,6,6,1,0.666667,0,1,1,1,1,1,0,2,15,7.83333,4.52462,0.472222,21,0.333333,0.474121,0.611111,0,0 +cv_data4-6-35,1,6,6,1,0.666667,0,1,1,1,1,1,0,3,17,8,4.76095,0.416667,17,0.333333,0.486611,0.777778,0,0.166667 +cv_data4-6-36,1,6,6,1,0.666667,0,1,1,1,1,1,0,1,19,6.33333,5.8784,0.388889,17,0.333333,0.458733,0.444444,0,0.166667 +cv_data4-6-37,1,6,6,1,0.666667,0,1,1,1,1,1,0,1,12,5.16667,3.43592,0.361111,16,0.333333,0.453637,0.944444,0,0.166667 +cv_data4-6-39,1,6,6,1,0.666667,0,1,1,1,1,1,0,1,8,3.83333,2.26691,0.305556,14,0.333333,0.482464,0.611111,0,0 +cv_data4-6-4,1,6,6,1,0.666667,0,1,1,1,1,1,0,3,15,8,4.54606,0.416667,18,0.333333,0.496703,0.611111,0,0.333333 +cv_data4-6-40,1,6,6,1,0.666667,0,1,1,1,1,1,0,1,15,8.33333,4.88763,0.472222,21,0.333333,0.473671,0.388889,0,0.5 +cv_data4-6-5,1,6,6,1,0.666667,0,0.833333,0.833333,1,1,1,0,1,20,6.8,7.02567,0.388889,15,0.333333,0.488259,0.888889,0,0.333333 +cv_data4-6-7,1,6,6,1,0.666667,0,1,1,1,1,1,0,1,19,8.66667,6.59966,0.444444,19,0.333333,0.457234,0.611111,0,0.166667 +cv_data4-6-8,1,6,6,1,0.666667,0,0.833333,0.833333,1,1,1,0,1,15,8.2,5.34416,0.361111,14,0.333333,0.483963,0.888889,0,0.333333 +cv_data4-6-9,1,6,6,1,0.666667,0,0.833333,0.833333,1,1,1,0,2,14,7,4.97996,0.361111,15,0.333333,0.451888,0.555556,0,0.166667 
+cv_data4-7-1,1,7,6,0.857143,0.666667,0,1,1,1,1,1,0,2,14,7.57143,4.49943,0.380952,19,0.333333,0.469975,0.947368,0,0.285714 +cv_data4-7-10,1,7,6,0.857143,0.666667,0,1,1,1,1,1,0,3,26,10.2857,7.51597,0.380952,18,0.333333,0.494054,0.684211,0,0.428571 +cv_data4-7-11,1,7,6,0.857143,0.666667,0,1,1,1,1,1,0,1,17,8.42857,5.60248,0.404762,21,0.333333,0.461313,0.526316,0,0.142857 +cv_data4-7-12,1,7,6,0.857143,0.666667,0,1,1,1,1,1,0,3,22,11,7.11136,0.452381,22,0.333333,0.461533,0.789474,0,0.285714 +cv_data4-7-13,1,7,6,0.857143,0.666667,0,1,1,1,1,1,0,1,17,9.14286,5.84144,0.333333,17,0.333333,0.485979,0.947368,0,0.285714 +cv_data4-7-14,1,7,6,0.857143,0.666667,0,1,1,1,1,1,0,1,19,7.85714,5.69282,0.428571,21,0.333333,0.482235,0.842105,0,0.285714 +cv_data4-7-15,1,7,6,0.857143,0.666667,0,1,1,1,1,1,0,1,16,8.42857,5.49954,0.380952,20,0.333333,0.451549,0.473684,0,0.142857 +cv_data4-7-16,1,7,6,0.857143,0.666667,0,0.714286,0.714286,1,1,1,0,2,7,4.2,1.72047,0.214286,11,0.333333,0.511562,0.421053,0,0.285714 +cv_data4-7-17,1,7,6,0.857143,0.666667,0,0.857143,0.857143,1,1,1,0,3,16,11,5.06623,0.357143,17,0.333333,0.485832,0.473684,0,0.571429 +cv_data4-7-19,1,7,6,0.857143,0.666667,0,1,1,1,1,1,0,1,26,11.2857,7.86882,0.380952,18,0.333333,0.483666,0.736842,0,0.285714 +cv_data4-7-2,1,7,6,0.857143,0.666667,0,1,1,1,1,1,0,1,19,8.57143,6.25316,0.404762,19,0.333333,0.485024,0.947368,0,0.285714 +cv_data4-7-20,1,7,6,0.857143,0.666667,0,1,1,1,1,1,0,2,22,11.2857,6.88091,0.428571,21,0.333333,0.480436,0.631579,0,0.285714 +cv_data4-7-21,1,7,6,0.857143,0.666667,0,1,1,1,1,1,0,1,21,7.14286,6.74915,0.428571,22,0.333333,0.43628,0.578947,0,0 +cv_data4-7-22,1,7,6,0.857143,0.666667,0,1,1,1,1,1,0,1,19,8.42857,5.52545,0.404762,20,0.333333,0.47181,0.526316,0,0.428571 +cv_data4-7-23,1,7,6,0.857143,0.666667,0,0.857143,0.857143,1,1,1,0,1,23,10,6.87992,0.309524,14,0.333333,0.518756,0.684211,0,0.428571 +cv_data4-7-24,1,7,6,0.857143,0.666667,0,1,1,1,1,1,0,2,16,7.57143,4.43548,0.404762,20,0.333333,0.452613,0.473684,0,0.142857 
+cv_data4-7-25,1,7,6,0.857143,0.666667,0,1,1,1,1,1,0,1,25,8.28571,7.81417,0.380952,18,0.333333,0.480987,0.473684,0,0.142857 +cv_data4-7-26,1,7,6,0.857143,0.666667,0,1,1,1,1,1,0,1,14,7,4,0.357143,18,0.333333,0.468544,0.789474,0,0.142857 +cv_data4-7-27,1,7,6,0.857143,0.666667,0,1,1,1,1,1,0,1,26,10.2857,8.54759,0.452381,22,0.333333,0.479592,0.631579,0,0.142857 +cv_data4-7-28,1,7,6,0.857143,0.666667,0,1,1,1,1,1,0,1,12,6.28571,4.06076,0.47619,25,0.333333,0.421451,0.526316,0,0.142857 +cv_data4-7-29,1,7,6,0.857143,0.666667,0,1,1,1,1,1,0,2,22,8.57143,6.49961,0.357143,17,0.333333,0.499596,0.578947,0,0.285714 +cv_data4-7-3,1,7,6,0.857143,0.666667,0,0.714286,0.714286,1,1,1,0,2,19,7.6,6.01997,0.309524,14,0.333333,0.492475,0.894737,0,0.142857 +cv_data4-7-30,1,7,6,0.857143,0.666667,0,0.857143,0.857143,1,1,1,0,2,19,9.83333,5.14512,0.357143,17,0.333333,0.482125,0.578947,0,0.428571 +cv_data4-7-31,1,7,6,0.857143,0.666667,0,1,1,1,1,1,0,1,9,4.71429,2.6573,0.380952,21,0.333333,0.438445,0.631579,0,0.142857 +cv_data4-7-32,1,7,6,0.857143,0.666667,0,1,1,1,1,1,0,1,20,8.57143,6.09114,0.380952,19,0.333333,0.46803,0.789474,0,0 +cv_data4-7-33,1,7,6,0.857143,0.666667,0,1,1,1,1,1,0,1,14,7.14286,4.94046,0.380952,19,0.333333,0.472177,0.736842,0,0.142857 +cv_data4-7-34,1,7,6,0.857143,0.666667,0,1,1,1,1,1,0,1,22,8,6.76123,0.357143,17,0.333333,0.480877,0.947368,0,0.142857 +cv_data4-7-35,1,7,6,0.857143,0.666667,0,1,1,1,1,1,0,1,11,5.42857,3.45821,0.428571,22,0.333333,0.437931,0.368421,0,0 +cv_data4-7-36,1,7,6,0.857143,0.666667,0,1,1,1,1,1,0,1,23,10.2857,7.9231,0.404762,19,0.333333,0.48095,0.578947,0,0.142857 +cv_data4-7-37,1,7,6,0.857143,0.666667,0,1,1,1,1,1,0,1,13,6,3.70328,0.380952,19,0.333333,0.46546,0.736842,0,0.285714 +cv_data4-7-38,1,7,6,0.857143,0.666667,0,1,1,1,1,1,0,1,23,8.85714,7.15998,0.333333,16,0.333333,0.472434,0.421053,0,0.142857 +cv_data4-7-39,1,7,6,0.857143,0.666667,0,0.857143,0.857143,1,1,1,0,2,25,10.8333,8.19383,0.357143,16,0.333333,0.48407,0.736842,0,0.285714 
+cv_data4-7-4,1,7,6,0.857143,0.666667,0,1,1,1,1,1,0,1,22,8.71429,7.59162,0.428571,21,0.333333,0.48073,0.947368,0,0 +cv_data4-7-40,1,7,6,0.857143,0.666667,0,1,1,1,1,1,0,1,17,8.57143,5.57692,0.380952,19,0.333333,0.481097,0.842105,0,0.285714 +cv_data4-7-5,1,7,6,0.857143,0.666667,0,1,1,1,1,1,0,1,20,8.71429,5.64963,0.404762,20,0.333333,0.457862,0.578947,0,0.142857 +cv_data4-7-6,1,7,6,0.857143,0.666667,0,1,1,1,1,1,0,3,18,9.42857,5.85191,0.5,26,0.333333,0.456688,0.473684,0,0.142857 +cv_data4-7-7,1,7,6,0.857143,0.666667,0,1,1,1,1,1,0,2,16,8.28571,4.36592,0.452381,23,0.333333,0.464065,0.684211,0,0.142857 +cv_data4-7-8,1,7,6,0.857143,0.666667,0,1,1,1,1,1,0,1,16,7.42857,5.42105,0.333333,17,0.333333,0.47816,0.684211,0,0.142857 +cv_data4-7-9,1,7,6,0.857143,0.666667,0,1,1,1,1,1,0,1,16,7,5.60612,0.357143,18,0.333333,0.484657,0.789474,0,0.142857 +cv_data5-10-19,1,10,7,0.7,0.714286,0,1,1,1,1,1,0,2,25,11.6,8.26075,0.457143,36,0.285714,0.48866,0.541667,0,0.2 +cv_data5-10-23,1,10,7,0.7,0.714286,0,1,1,1,1,1,0,1,32,14.7,11.0639,0.457143,35,0.285714,0.460788,0.375,0,0 +cv_data5-10-32,1,10,7,0.7,0.714286,0,0.9,0.9,1,1,1,0,2,20,10.6667,5.07718,0.4,31,0.285714,0.511379,0.958333,0,0.5 +cv_data5-10-33,1,10,7,0.7,0.714286,0,1,1,1,1,1,0,1,28,14.4,8.61626,0.5,40,0.285714,0.484207,0.666667,0,0.2 +cv_data5-4-1,1,4,7,1.75,0.714286,0,1,1,1,1,1,0,3,9,5.5,2.29129,0.428571,16,0.285714,0.488916,0.444444,0,0.25 +cv_data5-4-10,1,4,7,1.75,0.714286,0,1,1,1,1,1,0,3,13,7,3.937,0.428571,15,0.285714,0.509975,0.722222,0,0 +cv_data5-4-12,1,4,7,1.75,0.714286,0,1,1,1,1,1,0,1,8,3.5,2.69258,0.428571,17,0.285714,0.494027,0.555556,0,0 +cv_data5-4-13,1,4,7,1.75,0.714286,0,1,1,1,1,1,0,1,17,6.25,6.37868,0.464286,17,0.285714,0.473892,0.666667,0,0 +cv_data5-4-14,1,4,7,1.75,0.714286,0,1,1,1,1,1,0,1,11,6,3.80789,0.535714,20,0.285714,0.46564,0.611111,0,0.25 +cv_data5-4-15,1,4,7,1.75,0.714286,0,1,1,1,1,1,0,5,16,10.5,4.272,0.5,17,0.285714,0.543904,0.444444,0,0.5 
+cv_data5-4-16,1,4,7,1.75,0.714286,0,1,1,1,1,1,0,1,11,5.25,3.76663,0.5,18,0.285714,0.491749,0.5,0,0.25 +cv_data5-4-17,1,4,7,1.75,0.714286,0,1,1,1,1,1,0,1,13,7,5.52268,0.5,18,0.285714,0.485283,0.611111,0,0 +cv_data5-4-18,1,4,7,1.75,0.714286,0,0.75,0.75,1,1,1,0,2,8,5,2.44949,0.25,8,0.285714,0.555788,0.777778,0,0.5 +cv_data5-4-20,1,4,7,1.75,0.714286,0,1,1,1,1,1,0,1,9,4.75,3.03109,0.428571,15,0.285714,0.505111,0.888889,0,0.25 +cv_data5-4-21,1,4,7,1.75,0.714286,0,1,1,1,1,1,0,4,8,6.25,1.47902,0.5,19,0.285714,0.510222,0.777778,0,0.25 +cv_data5-4-22,1,4,7,1.75,0.714286,0,1,1,1,1,1,0,4,9,6.25,1.92029,0.464286,17,0.285714,0.530419,0.833333,0,0.5 +cv_data5-4-23,1,4,7,1.75,0.714286,0,1,1,1,1,1,0,1,11,6,3.80789,0.321429,11,0.285714,0.52032,0.777778,0,0.25 +cv_data5-4-24,1,4,7,1.75,0.714286,0,1,1,1,1,1,0,1,13,5.75,4.76314,0.5,18,0.285714,0.462254,0.611111,0,0.25 +cv_data5-4-25,1,4,7,1.75,0.714286,0,1,1,1,1,1,0,4,16,9.75,5.30919,0.464286,16,0.285714,0.546305,0.666667,0,0.5 +cv_data5-4-27,1,4,7,1.75,0.714286,0,1,1,1,1,1,0,2,19,9,6.20484,0.5,17,0.285714,0.504557,0.666667,0,0.25 +cv_data5-4-29,1,4,7,1.75,0.714286,0,1,1,1,1,1,0,4,11,7.25,2.86138,0.535714,20,0.285714,0.514286,0.777778,0,0.5 +cv_data5-4-3,1,4,7,1.75,0.714286,0,1,1,1,1,1,0,4,16,8.25,4.60299,0.5,17,0.285714,0.515825,0.611111,0,0.5 +cv_data5-4-30,1,4,7,1.75,0.714286,0,1,1,1,1,1,0,3,12,6.25,3.49106,0.464286,16,0.285714,0.499631,0.888889,0,0.25 +cv_data5-4-31,1,4,7,1.75,0.714286,0,1,1,1,1,1,0,2,13,8.25,4.02337,0.464286,17,0.285714,0.543904,0.611111,0,0.75 +cv_data5-4-32,1,4,7,1.75,0.714286,0,1,1,1,1,1,0,1,8,3.75,2.68095,0.5,19,0.285714,0.445751,0.333333,0,0 +cv_data5-4-33,1,4,7,1.75,0.714286,0,1,1,1,1,1,0,3,12,8.5,3.3541,0.464286,16,0.285714,0.481281,0.388889,0,0.25 +cv_data5-4-34,1,4,7,1.75,0.714286,0,1,1,1,1,1,0,1,15,8.25,5.26189,0.5,18,0.285714,0.504926,0.611111,0,0.25 +cv_data5-4-35,1,4,7,1.75,0.714286,0,1,1,1,1,1,0,2,11,6.25,3.83243,0.464286,17,0.285714,0.490394,0.666667,0,0 
+cv_data5-4-36,1,4,7,1.75,0.714286,0,1,1,1,1,1,0,5,16,10.5,4.03113,0.5,17,0.285714,0.504433,0.666667,0,0 +cv_data5-4-37,1,4,7,1.75,0.714286,0,1,1,1,1,1,0,1,16,10.25,5.76086,0.5,17,0.285714,0.514717,0.666667,0,0 +cv_data5-4-38,1,4,7,1.75,0.714286,0,1,1,1,1,1,0,1,11,4.25,3.96074,0.464286,16,0.285714,0.480973,0.444444,0,0 +cv_data5-4-4,1,4,7,1.75,0.714286,0,1,1,1,1,1,0,4,17,9.75,5.11737,0.535714,19,0.285714,0.518103,0.777778,0,0.25 +cv_data5-4-6,1,4,7,1.75,0.714286,0,1,1,1,1,1,0,1,10,6,3.39116,0.392857,14,0.285714,0.520628,0.611111,0,0.25 +cv_data5-4-7,1,4,7,1.75,0.714286,0,1,1,1,1,1,0,1,11,6.5,3.64005,0.392857,14,0.285714,0.50234,0.5,0,0.25 +cv_data5-4-9,1,4,7,1.75,0.714286,0,1,1,1,1,1,0,3,6,4.5,1.11803,0.428571,16,0.285714,0.488608,0.555556,0,0.5 +cv_data5-5-10,1,5,7,1.4,0.714286,0,1,1,1,1,1,0,1,12,5,3.89872,0.428571,19,0.285714,0.476768,0.631579,0,0 +cv_data5-5-11,1,5,7,1.4,0.714286,0,1,1,1,1,1,0,1,6,3.8,1.72047,0.4,17,0.285714,0.462424,0.473684,0,0 +cv_data5-5-12,1,5,7,1.4,0.714286,0,1,1,1,1,1,0,1,15,7.2,4.91528,0.428571,18,0.285714,0.492611,0.789474,0,0.4 +cv_data5-5-13,1,5,7,1.4,0.714286,0,1,1,1,1,1,0,6,11,8.4,1.85472,0.514286,23,0.285714,0.466443,0.684211,0,0.4 +cv_data5-5-14,1,5,7,1.4,0.714286,0,1,1,1,1,1,0,4,12,8.6,2.87054,0.485714,21,0.285714,0.485951,0.421053,0,0.2 +cv_data5-5-15,1,5,7,1.4,0.714286,0,1,1,1,1,1,0,1,10,4.6,3.38231,0.457143,19,0.285714,0.472828,0.631579,0,0 +cv_data5-5-16,1,5,7,1.4,0.714286,0,1,1,1,1,1,0,1,16,7,5.93296,0.4,16,0.285714,0.475271,0.736842,0,0 +cv_data5-5-17,1,5,7,1.4,0.714286,0,1,1,1,1,1,0,3,8,5.6,1.85472,0.457143,21,0.285714,0.459783,0.894737,0,0 +cv_data5-5-18,1,5,7,1.4,0.714286,0,1,1,1,1,1,0,2,9,6,2.44949,0.428571,19,0.285714,0.497419,0.526316,0,0.4 +cv_data5-5-19,1,5,7,1.4,0.714286,0,1,1,1,1,1,0,3,17,7.6,5.004,0.485714,22,0.285714,0.463133,0.526316,0,0.2 +cv_data5-5-20,1,5,7,1.4,0.714286,0,1,1,1,1,1,0,1,16,5.8,5.49181,0.428571,17,0.285714,0.488828,0.473684,0,0.2 
+cv_data5-5-21,1,5,7,1.4,0.714286,0,1,1,1,1,1,0,1,16,7,5.93296,0.457143,19,0.285714,0.487921,0.526316,0,0.2 +cv_data5-5-22,1,5,7,1.4,0.714286,0,1,1,1,1,1,0,1,16,7.2,5.84466,0.428571,19,0.285714,0.484296,0.421053,0,0.2 +cv_data5-5-23,1,5,7,1.4,0.714286,0,1,1,1,1,1,0,1,12,5.4,4.31741,0.4,17,0.285714,0.506167,0.736842,0,0.2 +cv_data5-5-24,1,5,7,1.4,0.714286,0,1,1,1,1,1,0,2,12,6.4,3.61109,0.514286,23,0.285714,0.445596,0.368421,0,0.2 +cv_data5-5-25,1,5,7,1.4,0.714286,0,0.8,0.8,1,1,1,0,2,15,7.75,4.71036,0.428571,17,0.285714,0.505222,0.526316,0,0.4 +cv_data5-5-27,1,5,7,1.4,0.714286,0,1,1,1,1,1,0,2,14,9.2,4.53431,0.514286,23,0.285714,0.490286,0.684211,0,0.4 +cv_data5-5-28,1,5,7,1.4,0.714286,0,1,1,1,1,1,0,2,19,7.4,6.08605,0.457143,19,0.285714,0.497813,0.578947,0,0 +cv_data5-5-29,1,5,7,1.4,0.714286,0,1,1,1,1,1,0,1,14,6,5.09902,0.485714,21,0.285714,0.445557,0.578947,0,0 +cv_data5-5-3,1,5,7,1.4,0.714286,0,1,1,1,1,1,0,1,9,4.4,3.07246,0.485714,21,0.285714,0.432355,0.473684,0,0.2 +cv_data5-5-30,1,5,7,1.4,0.714286,0,1,1,1,1,1,0,1,14,7.8,4.9558,0.4,16,0.285714,0.538719,0.789474,0,0.2 +cv_data5-5-31,1,5,7,1.4,0.714286,0,1,1,1,1,1,0,1,17,9.2,5.45527,0.542857,23,0.285714,0.472079,0.842105,0,0 +cv_data5-5-32,1,5,7,1.4,0.714286,0,1,1,1,1,1,0,3,18,8.8,5.41849,0.457143,18,0.285714,0.508966,0.631579,0,0.4 +cv_data5-5-33,1,5,7,1.4,0.714286,0,1,1,1,1,1,0,3,14,8.2,4.53431,0.542857,24,0.285714,0.461714,0.842105,0,0 +cv_data5-5-36,1,5,7,1.4,0.714286,0,1,1,1,1,1,0,3,9,6.2,2.13542,0.457143,20,0.285714,0.454345,0.789474,0,0 +cv_data5-5-37,1,5,7,1.4,0.714286,0,1,1,1,1,1,0,2,13,6.2,4.16653,0.4,16,0.285714,0.512394,0.947368,0,0.2 +cv_data5-5-38,1,5,7,1.4,0.714286,0,0.8,0.8,1,1,1,0,2,15,9.25,5.11737,0.4,15,0.285714,0.518345,0.842105,0,0.2 +cv_data5-5-39,1,5,7,1.4,0.714286,0,1,1,1,1,1,0,2,13,6.2,3.96989,0.4,16,0.285714,0.527094,0.789474,0,0.2 +cv_data5-5-4,1,5,7,1.4,0.714286,0,1,1,1,1,1,0,2,16,6.4,5.08331,0.485714,21,0.285714,0.48197,0.947368,0,0 
+cv_data5-5-40,1,5,7,1.4,0.714286,0,1,1,1,1,1,0,1,15,7.6,4.96387,0.457143,20,0.285714,0.521537,0.842105,0,0.2 +cv_data5-5-5,1,5,7,1.4,0.714286,0,1,1,1,1,1,0,3,17,9.4,4.84149,0.485714,20,0.285714,0.520315,0.421053,0,0.6 +cv_data5-5-6,1,5,7,1.4,0.714286,0,1,1,1,1,1,0,1,19,8,6.22896,0.428571,17,0.285714,0.496,0.684211,0,0.2 +cv_data5-5-7,1,5,7,1.4,0.714286,0,1,1,1,1,1,0,1,20,7.2,6.79412,0.457143,18,0.285714,0.490167,0.894737,0,0.2 +cv_data5-5-8,1,5,7,1.4,0.714286,0,1,1,1,1,1,0,1,15,6.2,4.9558,0.4,16,0.285714,0.507586,0.526316,0,0.2 +cv_data5-5-9,1,5,7,1.4,0.714286,0,0.8,0.8,1,1,1,0,1,14,5.5,5.02494,0.285714,11,0.285714,0.530601,0.631579,0,0.4 +cv_data5-6-11,1,6,7,1.16667,0.714286,0,1,1,1,1,1,0,3,18,9.5,5.34634,0.428571,21,0.285714,0.488205,0.5,0,0.333333 +cv_data5-6-19,1,6,7,1.16667,0.714286,0,1,1,1,1,1,0,1,11,4.83333,3.23608,0.380952,20,0.285714,0.454953,0.55,0,0 +cv_data5-6-2,1,6,7,1.16667,0.714286,0,0.833333,0.833333,1,1,1,0,2,15,7.6,4.75815,0.380952,18,0.285714,0.5052,0.65,0,0.166667 +cv_data5-6-20,1,6,7,1.16667,0.714286,0,1,1,1,1,1,0,1,18,6.5,6.07591,0.404762,19,0.285714,0.503722,0.75,0,0.166667 +cv_data5-6-22,1,6,7,1.16667,0.714286,0,1,1,1,1,1,0,2,14,6.5,4.11299,0.452381,23,0.285714,0.484346,0.85,0,0.166667 +cv_data5-6-25,1,6,7,1.16667,0.714286,0,1,1,1,1,1,0,2,18,7,5.41603,0.47619,23,0.285714,0.488889,0.7,0,0.166667 +cv_data5-6-26,1,6,7,1.16667,0.714286,0,1,1,1,1,1,0,1,13,4.83333,3.97562,0.547619,29,0.285714,0.436946,0.45,0,0 +cv_data5-6-28,1,6,7,1.16667,0.714286,0,1,1,1,1,1,0,1,15,7.83333,5.39804,0.47619,23,0.285714,0.479201,0.45,0,0 +cv_data5-6-30,1,6,7,1.16667,0.714286,0,1,1,1,1,1,0,1,19,11.1667,5.98377,0.428571,20,0.285714,0.502791,0.8,0,0.333333 +cv_data5-6-31,1,6,7,1.16667,0.714286,0,1,1,1,1,1,0,2,15,7.16667,5.2731,0.428571,22,0.285714,0.486617,0.5,0,0.166667 +cv_data5-6-33,1,6,7,1.16667,0.714286,0,1,1,1,1,1,0,3,17,8.16667,4.52462,0.5,26,0.285714,0.478736,0.65,0,0.333333 
+cv_data5-6-35,1,6,7,1.16667,0.714286,0,1,1,1,1,1,0,4,11,8,2.38048,0.5,26,0.285714,0.468774,0.45,0,0.5 +cv_data5-6-37,1,6,7,1.16667,0.714286,0,1,1,1,1,1,0,2,14,9,3.91578,0.452381,23,0.285714,0.48139,0.45,0,0.333333 +cv_data5-6-5,1,6,7,1.16667,0.714286,0,1,1,1,1,1,0,1,15,7.83333,5.17741,0.52381,27,0.285714,0.484319,0.5,0,0.166667 +cv_data5-6-6,1,6,7,1.16667,0.714286,0,1,1,1,1,1,0,1,21,12,7.78888,0.428571,21,0.285714,0.491845,0.4,0,0.166667 +cv_data5-6-7,1,6,7,1.16667,0.714286,0,1,1,1,1,1,0,1,23,9.66667,8.03465,0.47619,23,0.285714,0.498604,0.75,0,0 +cv_data5-6-8,1,6,7,1.16667,0.714286,0,0.833333,0.833333,1,1,1,0,6,12,9.4,2.15407,0.428571,20,0.285714,0.518446,0.75,0,0.5 +cv_data5-7-1,1,7,7,1,0.714286,0,1,1,1,1,1,0,3,27,9.71429,7.66652,0.387755,22,0.285714,0.51883,0.666667,0,0.142857 +cv_data5-7-11,1,7,7,1,0.714286,0,1,1,1,1,1,0,2,26,9.28571,7.24498,0.469388,26,0.285714,0.49625,0.857143,0,0.142857 +cv_data5-7-12,1,7,7,1,0.714286,0,1,1,1,1,1,0,4,15,8.71429,3.69224,0.489796,29,0.285714,0.476666,0.666667,0,0.428571 +cv_data5-7-13,1,7,7,1,0.714286,0,1,1,1,1,1,0,1,14,8,4,0.44898,27,0.285714,0.460842,0.428571,0,0.142857 +cv_data5-7-15,1,7,7,1,0.714286,0,1,1,1,1,1,0,1,18,7.42857,5.36808,0.469388,27,0.285714,0.468583,0.761905,0,0.142857 +cv_data5-7-16,1,7,7,1,0.714286,0,1,1,1,1,1,0,1,18,7.14286,5.98638,0.408163,23,0.285714,0.465447,0.52381,0,0 +cv_data5-7-18,1,7,7,1,0.714286,0,1,1,1,1,1,0,1,25,9.42857,7.30669,0.530612,32,0.285714,0.447874,0.333333,0,0.142857 +cv_data5-7-19,1,7,7,1,0.714286,0,1,1,1,1,1,0,2,14,6.57143,3.8492,0.428571,24,0.285714,0.471137,0.571429,0,0.142857 +cv_data5-7-22,1,7,7,1,0.714286,0,0.857143,0.857143,1,1,1,0,1,9,5.16667,2.91071,0.367347,21,0.285714,0.499889,0.333333,0,0.428571 +cv_data5-7-31,1,7,7,1,0.714286,0,0.857143,0.857143,1,1,1,0,1,15,8.5,5.18813,0.408163,22,0.285714,0.466432,0.47619,0,0.142857 +cv_data5-7-32,1,7,7,1,0.714286,0,1,1,1,1,1,0,3,24,11.8571,7.29831,0.510204,30,0.285714,0.497778,0.904762,0,0.285714 
+cv_data5-7-33,1,7,7,1,0.714286,0,1,1,1,1,1,0,1,30,12.1429,8.93514,0.510204,30,0.285714,0.495466,0.714286,0,0.285714 +cv_data5-7-36,1,7,7,1,0.714286,0,1,1,1,1,1,0,2,28,11.1429,9.6574,0.489796,29,0.285714,0.502121,0.857143,0,0.428571 +cv_data5-7-37,1,7,7,1,0.714286,0,0.857143,0.857143,1,1,1,0,1,30,10.6667,9.99444,0.387755,20,0.285714,0.500875,0.666667,0,0.285714 +cv_data5-7-4,1,7,7,1,0.714286,0,1,1,1,1,1,0,2,25,11.1429,6.87498,0.408163,22,0.285714,0.495989,0.904762,0,0.142857 +cv_data5-7-40,1,7,7,1,0.714286,0,1,1,1,1,1,0,2,28,10.7143,8.31031,0.510204,29,0.285714,0.46798,0.714286,0,0 +cv_data5-7-7,1,7,7,1,0.714286,0,1,1,1,1,1,0,2,26,13.7143,7.75913,0.428571,24,0.285714,0.522811,0.666667,0,0.285714 +cv_data5-8-19,1,8,7,0.875,0.714286,0,1,1,1,1,1,0,1,36,13.25,11.5081,0.446429,27,0.285714,0.48419,0.772727,0,0.125 +cv_data5-8-2,1,8,7,0.875,0.714286,0,1,1,1,1,1,0,1,32,15.75,9.09327,0.446429,27,0.285714,0.498337,0.863636,0,0.125 +cv_data5-8-25,1,8,7,0.875,0.714286,0,1,1,1,1,1,0,3,27,14.375,9.2997,0.446429,28,0.285714,0.466502,0.454545,0,0.25 +cv_data5-8-29,1,8,7,0.875,0.714286,0,1,1,1,1,1,0,1,26,9.875,7.95986,0.410714,27,0.285714,0.481404,0.363636,0,0.25 +cv_data5-8-3,1,8,7,0.875,0.714286,0,1,1,1,1,1,0,2,28,11.875,7.81725,0.5,34,0.285714,0.467811,0.727273,0,0.25 +cv_data5-8-36,1,8,7,0.875,0.714286,0,1,1,1,1,1,0,1,33,12.75,10.779,0.446429,27,0.285714,0.488932,0.681818,0,0.25 +cv_data5-8-38,1,8,7,0.875,0.714286,0,1,1,1,1,1,0,3,25,12.5,6.76387,0.482143,31,0.285714,0.47694,0.772727,0,0.125 +cv_data5-8-39,1,8,7,0.875,0.714286,0,1,1,1,1,1,0,1,21,9.375,7.39827,0.464286,30,0.285714,0.471444,0.818182,0,0.25 +cv_data5-8-4,1,8,7,0.875,0.714286,0,1,1,1,1,1,0,2,22,10.875,7.37288,0.464286,30,0.285714,0.479895,0.454545,0,0.125 +cv_data5-8-6,1,8,7,0.875,0.714286,0,1,1,1,1,1,0,1,18,7.5,5.5,0.446429,30,0.285714,0.462962,0.818182,0,0 +cv_data5-8-8,1,8,7,0.875,0.714286,0,1,1,1,1,1,0,2,18,9.25,5.23808,0.428571,29,0.285714,0.475015,0.363636,0,0.375 
+cv_data5-9-10,1,9,7,0.777778,0.714286,0,1,1,1,1,1,0,2,23,10,6.84755,0.412698,31,0.285714,0.471848,0.869565,0,0.222222 +cv_data5-9-21,1,9,7,0.777778,0.714286,0,1,1,1,1,1,0,1,28,9.22222,7.64167,0.365079,27,0.285714,0.467494,0.956522,0,0 +cv_data5-9-3,1,9,7,0.777778,0.714286,0,1,1,1,1,1,0,1,21,7.88889,6.13631,0.47619,35,0.285714,0.439956,0.652174,0,0 +cv_data5-9-32,1,9,7,0.777778,0.714286,0,1,1,1,1,1,0,3,31,15.3333,8.65384,0.444444,31,0.285714,0.490069,0.869565,0,0.222222 +cv_data5-9-38,1,9,7,0.777778,0.714286,0,1,1,1,1,1,0,2,25,13.4444,8.26117,0.492063,37,0.285714,0.487733,0.608696,0,0 +cv_data5-9-4,1,9,7,0.777778,0.714286,0,1,1,1,1,1,0,5,20,10.4444,4.64545,0.412698,30,0.285714,0.502427,0.608696,0,0.333333 diff --git a/_articles/RJ-2025-045/CPMP-2015_data/feature_values_test.arff b/_articles/RJ-2025-045/CPMP-2015_data/feature_values_test.arff new file mode 100644 index 0000000000..58dfa6f55a --- /dev/null +++ b/_articles/RJ-2025-045/CPMP-2015_data/feature_values_test.arff @@ -0,0 +1,575 @@ +@RELATION feature_values_premarshalling_astar_2013 + +@ATTRIBUTE instance_id STRING +@ATTRIBUTE repetition NUMERIC +@ATTRIBUTE stacks NUMERIC +@ATTRIBUTE tiers NUMERIC +@ATTRIBUTE stack-tier-ratio NUMERIC +@ATTRIBUTE container-density NUMERIC +@ATTRIBUTE empty-stack-pct NUMERIC +@ATTRIBUTE overstowing-stack-pct NUMERIC +@ATTRIBUTE overstowing-2cont-stack-pct NUMERIC +@ATTRIBUTE group-same-min NUMERIC +@ATTRIBUTE group-same-max NUMERIC +@ATTRIBUTE group-same-mean NUMERIC +@ATTRIBUTE group-same-stdev NUMERIC +@ATTRIBUTE top-good-min NUMERIC +@ATTRIBUTE top-good-max NUMERIC +@ATTRIBUTE top-good-mean NUMERIC +@ATTRIBUTE top-good-stdev NUMERIC +@ATTRIBUTE overstowage-pct NUMERIC +@ATTRIBUTE bflb NUMERIC +@ATTRIBUTE left-density NUMERIC +@ATTRIBUTE tier-weighted-groups NUMERIC +@ATTRIBUTE avg-l1-top-left-lg-group NUMERIC +@ATTRIBUTE cont-empty-grt-estack NUMERIC +@ATTRIBUTE pct-bottom-pct-on-top NUMERIC + +@DATA 
+4-6-75pct-2_10,1,4,8,2,0.5625,0,0.75,0.75,2,2,2,0,1,4,2.33333,1.24722,0.34375,12,0.586538,0.430458,0.55,0,0.25 +4-6-75pct-2_102,1,4,8,2,0.5625,0,1,1,2,2,2,0,1,7,3,2.44949,0.40625,16,0.413462,0.472711,0.75,0,0 +4-6-75pct-2_103,1,4,8,2,0.5625,0,1,1,2,2,2,0,2,4,3,1,0.375,14,0.596154,0.447672,0.65,0,0 +4-6-75pct-2_104,1,4,8,2,0.5625,0,1,1,2,2,2,0,1,7,3.25,2.48747,0.40625,15,0.288462,0.465376,0.775,0,0 +4-6-75pct-2_105,1,4,8,2,0.5625,0,1,1,2,2,2,0,1,2,1.5,0.5,0.375,15,0.365385,0.424198,0.55,0,0 +4-6-75pct-2_106,1,4,8,2,0.5625,0,1,1,2,2,2,0,2,5,3.5,1.11803,0.375,14,0.403846,0.465571,0.775,0,0 +4-6-75pct-2_107,1,4,8,2,0.5625,0,0.75,0.75,2,2,2,0,4,7,6,1.41421,0.3125,11,0.5,0.526017,0.825,0,0.5 +4-6-75pct-2_109,1,4,8,2,0.5625,0,1,1,2,2,2,0,1,5,2.5,1.5,0.34375,13,0.538462,0.450704,0.7,0,0 +4-6-75pct-2_11,1,4,8,2,0.5625,0,1,1,2,2,2,0,1,3,1.75,0.829156,0.375,16,0.375,0.436913,0.775,0,0 +4-6-75pct-2_112,1,4,8,2,0.5625,0,0.75,0.75,2,2,2,0,3,7,4.33333,1.88562,0.375,13,0.538462,0.49687,0.85,0,0.25 +4-6-75pct-2_113,1,4,8,2,0.5625,0,1,1,2,2,2,0,1,7,3.75,2.38485,0.40625,16,0.336538,0.451976,0.575,0,0.25 +4-6-75pct-2_115,1,4,8,2,0.5625,0,1,1,2,2,2,0,1,3,2.25,0.829156,0.3125,13,0.365385,0.437304,0.525,0,0 +4-6-75pct-2_116,1,4,8,2,0.5625,0,1,1,2,2,2,0,2,7,4.5,2.06155,0.375,14,0.538462,0.467919,0.725,0,0 +4-6-75pct-2_117,1,4,8,2,0.5625,0,0.75,0.75,2,2,2,0,1,3,2,0.816497,0.21875,8,0.538462,0.46205,0.7,0,0 +4-6-75pct-2_119,1,4,8,2,0.5625,0,1,1,2,2,2,0,1,7,3,2.44949,0.375,14,0.509615,0.473494,0.825,0,0 +4-6-75pct-2_121,1,4,8,2,0.5625,0,0.75,1,2,2,2,0,1,7,3.33333,2.62467,0.4375,15,0.442308,0.443466,0.825,0,0 +4-6-75pct-2_122,1,4,8,2,0.5625,0,1,1,2,2,2,0,1,4,2,1.22474,0.28125,12,0.5,0.469581,0.725,0,0 +4-6-75pct-2_124,1,4,8,2,0.5625,0,1,1,2,2,2,0,1,3,2.25,0.829156,0.3125,14,0.5,0.454128,0.675,0,0.25 +4-6-75pct-2_125,1,4,8,2,0.5625,0,0.75,0.75,2,2,2,0,3,5,4,0.816497,0.375,13,0.576923,0.470462,0.675,0,0.25 
+4-6-75pct-2_129,1,4,8,2,0.5625,0,1,1,2,2,2,0,1,6,2.75,1.92029,0.25,10,0.538462,0.482003,0.775,0,0.25 +4-6-75pct-2_130,1,4,8,2,0.5625,0,1,1,2,2,2,0,1,5,3,1.58114,0.3125,12,0.451923,0.45804,0.625,0,0.25 +4-6-75pct-2_134,1,4,8,2,0.5625,0,0.75,0.75,2,2,2,0,2,3,2.66667,0.471405,0.3125,11,0.423077,0.43437,0.5,0,0.25 +4-6-75pct-2_136,1,4,8,2,0.5625,0,0.75,0.75,2,2,2,0,1,6,4,2.16025,0.375,13,0.288462,0.486796,0.55,0,0 +4-6-75pct-2_144,1,4,8,2,0.5625,0,0.75,0.75,2,2,2,0,3,4,3.33333,0.471405,0.375,13,0.25,0.450411,0.8,0,0 +4-6-75pct-2_146,1,4,8,2,0.5625,0,1,1,2,2,2,0,1,7,3.25,2.27761,0.3125,12,0.365385,0.449824,0.65,0,0 +4-6-75pct-2_147,1,4,8,2,0.5625,0,1,1,2,2,2,0,1,6,2.5,2.06155,0.40625,16,0.548077,0.445031,0.5,0,0 +4-6-75pct-2_149,1,4,8,2,0.5625,0,0.75,0.75,2,2,2,0,4,7,5,1.41421,0.375,13,0.625,0.494327,0.825,0,0.5 +4-6-75pct-2_151,1,4,8,2,0.5625,0,1,1,2,2,2,0,1,7,3.75,2.38485,0.40625,17,0.5,0.471733,0.6,0,0 +4-6-75pct-2_152,1,4,8,2,0.5625,0,1,1,2,2,2,0,1,3,1.75,0.829156,0.375,15,0.336538,0.439847,0.575,0,0 +4-6-75pct-2_153,1,4,8,2,0.5625,0,1,1,2,2,2,0,2,6,4.5,1.5,0.40625,16,0.586538,0.484742,0.6,0,0.25 +4-6-75pct-2_155,1,4,8,2,0.5625,0,1,1,2,2,2,0,1,3,1.75,0.829156,0.34375,14,0.326923,0.445423,0.625,0,0 +4-6-75pct-2_156,1,4,8,2,0.5625,0,1,1,2,2,2,0,1,5,2.25,1.63936,0.375,15,0.336538,0.443466,0.65,0,0 +4-6-75pct-2_157,1,4,8,2,0.5625,0,1,1,2,2,2,0,1,5,3,1.58114,0.375,14,0.336538,0.483079,0.725,0,0 +4-6-75pct-2_159,1,4,8,2,0.5625,0,1,1,2,2,2,0,1,6,3.5,2.06155,0.375,14,0.326923,0.444933,0.675,0,0 +4-6-75pct-2_161,1,4,8,2,0.5625,0,1,1,2,2,2,0,1,6,2.5,2.06155,0.3125,12,0.538462,0.46831,0.75,0,0 +4-6-75pct-2_163,1,4,8,2,0.5625,0,1,1,2,2,2,0,2,5,3.5,1.5,0.28125,11,0.586538,0.515649,0.9,0,0.25 +4-6-75pct-2_165,1,4,8,2,0.5625,0,1,1,2,2,2,0,3,5,3.75,0.829156,0.40625,16,0.25,0.460094,0.725,0,0.25 +4-6-75pct-2_166,1,4,8,2,0.5625,0,1,1,2,2,2,0,2,4,3,0.707107,0.375,15,0.413462,0.491002,0.775,0,0 
+4-6-75pct-2_169,1,4,8,2,0.5625,0,1,1,2,2,2,0,1,7,3.75,2.77263,0.40625,16,0.5,0.485329,0.65,0,0 +4-6-75pct-2_17,1,4,8,2,0.5625,0,1,1,2,2,2,0,1,6,4,2.12132,0.4375,16,0.375,0.480829,0.75,0,0 +4-6-75pct-2_171,1,4,8,2,0.5625,0,1,1,2,2,2,0,1,4,2.25,1.29904,0.40625,17,0.326923,0.441412,0.725,0,0 +4-6-75pct-2_172,1,4,8,2,0.5625,0,1,1,2,2,2,0,2,6,4,1.58114,0.40625,16,0.375,0.477015,0.675,0,0 +4-6-75pct-2_178,1,4,8,2,0.5625,0,0.75,0.75,2,2,2,0,1,3,2,0.816497,0.3125,12,0.596154,0.402191,0.45,0,0.25 +4-6-75pct-2_181,1,4,8,2,0.5625,0,1,1,2,2,2,0,1,7,3,2.34521,0.4375,17,0.336538,0.438674,0.75,0,0 +4-6-75pct-2_183,1,4,8,2,0.5625,0,1,1,2,2,2,0,2,6,4.25,1.78536,0.375,14,0.673077,0.487774,0.625,0,0.25 +4-6-75pct-2_185,1,4,8,2,0.5625,0,0.75,0.75,2,2,2,0,3,7,4.33333,1.88562,0.34375,12,0.403846,0.51741,0.9,0,0.25 +4-6-75pct-2_187,1,4,8,2,0.5625,0,1,1,2,2,2,0,2,7,4.5,1.80278,0.375,14,0.365385,0.482981,0.825,0,0 +4-6-75pct-2_19,1,4,8,2,0.5625,0,1,1,2,2,2,0,1,3,2,0.707107,0.40625,17,0.336538,0.429871,0.575,0,0 +4-6-75pct-2_193,1,4,8,2,0.5625,0,1,1,2,2,2,0,1,4,2,1.22474,0.25,10,0.461538,0.436326,0.575,0,0 +4-6-75pct-2_194,1,4,8,2,0.5625,0,1,1,2,2,2,0,1,6,3.75,1.78536,0.375,14,0.548077,0.480634,0.875,0,0 +4-6-75pct-2_198,1,4,8,2,0.5625,0,1,1,2,2,2,0,1,4,2,1.22474,0.4375,19,0.5,0.434468,0.65,0,0 +4-6-75pct-2_199,1,4,8,2,0.5625,0,1,1,2,2,2,0,1,4,2.25,1.08972,0.3125,12,0.25,0.441706,0.525,0,0.25 +4-6-75pct-2_2,1,4,8,2,0.5625,0,0.75,0.75,2,2,2,0,2,4,3,0.816497,0.25,9,0.490385,0.463713,0.675,0,0 +4-6-75pct-2_20,1,4,8,2,0.5625,0,0.75,0.75,2,2,2,0,2,4,3,0.816497,0.34375,12,0.288462,0.431436,0.375,0,0 +4-6-75pct-2_209,1,4,8,2,0.5625,0,1,1,2,2,2,0,1,5,2.75,1.47902,0.40625,15,0.451923,0.466549,0.6,0,0 +4-6-75pct-2_21,1,4,8,2,0.5625,0,1,1,2,2,2,0,1,5,2.5,1.5,0.25,10,0.375,0.501076,0.725,0,0 +4-6-75pct-2_210,1,4,8,2,0.5625,0,1,1,2,2,2,0,2,6,3.25,1.63936,0.375,15,0.375,0.489241,0.75,0,0.25 +4-6-75pct-2_214,1,4,8,2,0.5625,0,1,1,2,2,2,0,2,3,2.5,0.5,0.3125,12,0.509615,0.484742,0.8,0,0.25 
+4-6-75pct-2_216,1,4,8,2,0.5625,0,1,1,2,2,2,0,1,5,3,2,0.34375,14,0.375,0.459018,0.65,0,0 +4-6-75pct-2_218,1,4,8,2,0.5625,0,0.75,0.75,2,2,2,0,4,7,5,1.41421,0.34375,12,0.461538,0.483568,0.6,0,0.5 +4-6-75pct-2_219,1,4,8,2,0.5625,0,1,1,2,2,2,0,1,7,2.75,2.48747,0.375,15,0.5,0.446205,0.775,0,0 +4-6-75pct-2_221,1,4,8,2,0.5625,0,1,1,2,2,2,0,1,5,2.5,1.65831,0.34375,14,0.461538,0.476721,0.775,0,0 +4-6-75pct-2_224,1,4,8,2,0.5625,0,1,1,2,2,2,0,2,4,3,1,0.40625,16,0.25,0.461659,0.675,0,0 +4-6-75pct-2_225,1,4,8,2,0.5625,0,1,1,2,2,2,0,1,4,2,1.22474,0.3125,12,0.538462,0.468505,0.675,0,0 +4-6-75pct-2_226,1,4,8,2,0.5625,0,1,1,2,2,2,0,2,6,3.5,1.65831,0.4375,17,0.586538,0.460974,0.775,0,0.25 +4-6-75pct-2_227,1,4,8,2,0.5625,0,1,1,2,2,2,0,1,5,2.5,1.65831,0.3125,12,0.413462,0.492077,0.725,0,0 +4-6-75pct-2_23,1,4,8,2,0.5625,0,1,1,2,2,2,0,1,7,3.75,2.16506,0.40625,16,0.461538,0.493153,0.7,0,0 +4-6-75pct-2_230,1,4,8,2,0.5625,0,1,1,2,2,2,0,1,3,2,0.707107,0.34375,15,0.5,0.42234,0.625,0,0 +4-6-75pct-2_231,1,4,8,2,0.5625,0,0.75,0.75,2,2,2,0,2,6,4.33333,1.69967,0.34375,12,0.423077,0.493349,0.625,0,0.25 +4-6-75pct-2_235,1,4,8,2,0.5625,0,1,1,2,2,2,0,2,6,4.25,1.78536,0.40625,16,0.461538,0.461952,0.5,0,0.25 +4-6-75pct-2_236,1,4,8,2,0.5625,0,1,1,2,2,2,0,1,4,2.25,1.29904,0.40625,16,0.423077,0.417938,0.575,0,0 +4-6-75pct-2_239,1,4,8,2,0.5625,0,1,1,2,2,2,0,1,5,3,2,0.40625,15,0.596154,0.456768,0.725,0,0 +4-6-75pct-2_242,1,4,8,2,0.5625,0,1,1,2,2,2,0,1,4,2.5,1.11803,0.28125,11,0.461538,0.447085,0.6,0,0 +4-6-75pct-2_243,1,4,8,2,0.5625,0,0.75,0.75,2,2,2,0,3,5,4,0.816497,0.375,13,0.423077,0.473103,0.8,0,0.25 +4-6-75pct-2_248,1,4,8,2,0.5625,0,1,1,2,2,2,0,1,5,3,2,0.375,14,0.634615,0.455595,0.825,0,0 +4-6-75pct-2_25,1,4,8,2,0.5625,0,0.75,1,2,2,2,0,1,4,2.33333,1.24722,0.40625,14,0.25,0.436326,0.825,0,0.25 +4-6-75pct-2_28,1,4,8,2,0.5625,0,0.75,0.75,2,2,2,0,2,3,2.66667,0.471405,0.34375,13,0.326923,0.438478,0.525,0,0 +4-6-75pct-2_31,1,4,8,2,0.5625,0,1,1,2,2,2,0,1,5,3.25,1.47902,0.40625,15,0.326923,0.429871,0.5,0,0.25 
+4-6-75pct-2_33,1,4,8,2,0.5625,0,1,1,2,2,2,0,1,4,2,1.22474,0.34375,14,0.365385,0.429871,0.65,0,0 +4-6-75pct-2_35,1,4,8,2,0.5625,0,0.75,0.75,2,2,2,0,2,6,3.66667,1.69967,0.3125,11,0.423077,0.500391,0.85,0,0.25 +4-6-75pct-2_36,1,4,8,2,0.5625,0,1,1,2,2,2,0,3,6,4.5,1.11803,0.40625,16,0.326923,0.483764,0.575,0,0.25 +4-6-75pct-2_44,1,4,8,2,0.5625,0,0.75,0.75,2,2,2,0,1,5,3.33333,1.69967,0.34375,12,0.326923,0.447379,0.6,0,0.25 +4-6-75pct-2_47,1,4,8,2,0.5625,0,1,1,2,2,2,0,1,6,3.25,1.78536,0.40625,15,0.336538,0.459996,0.675,0,0 +4-6-75pct-2_49,1,4,8,2,0.5625,0,1,1,2,2,2,0,1,6,2.75,2.04634,0.375,15,0.548077,0.457746,0.675,0,0 +4-6-75pct-2_55,1,4,8,2,0.5625,0,1,1,2,2,2,0,1,4,2.25,1.29904,0.28125,11,0.326923,0.441901,0.75,0,0 +4-6-75pct-2_60,1,4,8,2,0.5625,0,1,1,2,2,2,0,1,7,4.25,2.16506,0.4375,17,0.365385,0.478482,0.775,0,0.25 +4-6-75pct-2_62,1,4,8,2,0.5625,0,1,1,2,2,2,0,1,5,2.5,1.65831,0.375,14,0.461538,0.469777,0.725,0,0 +4-6-75pct-2_64,1,4,8,2,0.5625,0,1,1,2,2,2,0,1,5,3,1.58114,0.34375,13,0.25,0.431045,0.35,0,0 +4-6-75pct-2_65,1,4,8,2,0.5625,0,1,1,2,2,2,0,1,4,2.25,1.29904,0.40625,16,0.538462,0.451291,0.675,0,0 +4-6-75pct-2_68,1,4,8,2,0.5625,0,0.75,0.75,2,2,2,0,1,6,3,2.16025,0.3125,11,0.403846,0.460876,0.8,0,0 +4-6-75pct-2_71,1,4,8,2,0.5625,0,1,1,2,2,2,0,1,5,2.25,1.63936,0.34375,13,0.403846,0.449335,0.75,0,0 +4-6-75pct-2_75,1,4,8,2,0.5625,0,0.75,0.75,2,2,2,0,2,4,3,0.816497,0.25,9,0.326923,0.500293,0.675,0,0.25 +4-6-75pct-2_76,1,4,8,2,0.5625,0,0.75,0.75,2,2,2,0,1,3,1.66667,0.942809,0.34375,13,0.423077,0.465082,0.75,0,0 +4-6-75pct-2_78,1,4,8,2,0.5625,0,0.75,1,2,2,2,0,1,7,4,2.44949,0.40625,14,0.442308,0.451095,0.85,0,0 +4-6-75pct-2_83,1,4,8,2,0.5625,0,1,1,2,2,2,0,1,5,3,1.41421,0.375,14,0.336538,0.4643,0.775,0,0 +4-6-75pct-2_84,1,4,8,2,0.5625,0,1,1,2,2,2,0,1,5,3.25,1.47902,0.375,14,0.509615,0.4643,0.55,0,0 +4-6-75pct-2_86,1,4,8,2,0.5625,0,1,1,2,2,2,0,2,5,3.5,1.11803,0.40625,15,0.403846,0.464397,0.775,0,0 
+4-6-75pct-2_89,1,4,8,2,0.5625,0,1,1,2,2,2,0,2,7,5.25,1.92029,0.40625,16,0.5,0.487578,0.775,0,0.25 +4-6-75pct-2_93,1,4,8,2,0.5625,0,1,1,2,2,2,0,1,5,2.5,1.5,0.375,14,0.586538,0.443955,0.625,0,0.25 +4-6-75pct-2_95,1,4,8,2,0.5625,0,0.75,0.75,2,2,2,0,2,4,3,0.816497,0.375,14,0.25,0.437011,0.375,0,0.25 +4-6-75pct-2_96,1,4,8,2,0.5625,0,1,1,2,2,2,0,1,6,3,2.12132,0.40625,16,0.25,0.456279,0.7,0,0 +4-6-75pct-2_97,1,4,8,2,0.5625,0,1,1,2,2,2,0,1,3,2.25,0.829156,0.40625,16,0.403846,0.409038,0.6,0,0 +4-6-75pct-2_99,1,4,8,2,0.5625,0,0.75,1,2,2,2,0,3,7,4.33333,1.88562,0.375,13,0.336538,0.46293,0.6,0,0.25 +6-6-75pct-2_0,1,6,8,1.33333,0.5625,0,0.833333,1,1,2,1.92857,0.257539,2,7,4.8,2.03961,0.395833,20,0.25,0.417924,0.431818,0,0 +6-6-75pct-2_1,1,6,8,1.33333,0.5625,0,0.833333,1,1,2,1.92857,0.257539,1,9,5.8,2.78568,0.375,19,0.528846,0.470154,0.818182,0,0.166667 +6-6-75pct-2_102,1,6,8,1.33333,0.5625,0,1,1,1,2,1.92857,0.257539,1,6,3,1.63299,0.333333,19,0.423077,0.443746,0.818182,0,0 +6-6-75pct-2_103,1,6,8,1.33333,0.5625,0,1,1,1,2,1.92857,0.257539,1,9,3.66667,2.68742,0.354167,20,0.423077,0.442614,0.704545,0,0 +6-6-75pct-2_104,1,6,8,1.33333,0.5625,0,1,1,1,2,1.92857,0.257539,1,10,5.5,3.30404,0.395833,21,0.548077,0.440602,0.590909,0,0 +6-6-75pct-2_105,1,6,8,1.33333,0.5625,0,0.833333,1,1,2,1.92857,0.257539,1,7,3,2.09762,0.270833,14,0.336538,0.436787,0.840909,0,0 +6-6-75pct-2_108,1,6,8,1.33333,0.5625,0,1,1,1,2,1.92857,0.257539,2,8,4,2.23607,0.375,22,0.336538,0.437248,0.590909,0,0 +6-6-75pct-2_113,1,6,8,1.33333,0.5625,0,0.833333,0.833333,1,2,1.92857,0.257539,1,9,3.8,2.92575,0.229167,12,0.451923,0.473047,0.681818,0,0 +6-6-75pct-2_114,1,6,8,1.33333,0.5625,0,1,1,1,2,1.92857,0.257539,2,7,4,1.63299,0.395833,22,0.509615,0.447602,0.636364,0,0.166667 +6-6-75pct-2_115,1,6,8,1.33333,0.5625,0,0.833333,0.833333,1,2,1.92857,0.257539,1,7,3.8,1.93907,0.354167,18,0.634615,0.458291,0.795455,0,0.166667 
+6-6-75pct-2_117,1,6,8,1.33333,0.5625,0,1,1,1,2,1.92857,0.257539,1,7,4.33333,2.13437,0.354167,20,0.576923,0.460094,0.681818,0,0 +6-6-75pct-2_119,1,6,8,1.33333,0.5625,0,0.833333,0.833333,1,2,1.92857,0.257539,1,7,4.2,2.31517,0.333333,18,0.586538,0.480173,0.681818,0,0 +6-6-75pct-2_120,1,6,8,1.33333,0.5625,0,0.833333,0.833333,1,2,1.92857,0.257539,2,8,4.8,2.03961,0.354167,19,0.586538,0.474304,0.795455,0,0 +6-6-75pct-2_122,1,6,8,1.33333,0.5625,0,1,1,1,2,1.92857,0.257539,1,11,4.16667,3.3375,0.3125,17,0.538462,0.477616,0.590909,0,0 +6-6-75pct-2_126,1,6,8,1.33333,0.5625,0,0.833333,1,1,2,1.92857,0.257539,1,7,4,2,0.395833,21,0.326923,0.418301,0.886364,0,0 +6-6-75pct-2_127,1,6,8,1.33333,0.5625,0,1,1,1,2,1.92857,0.257539,1,7,3.83333,2.03443,0.375,21,0.25,0.4305,0.795455,0,0 +6-6-75pct-2_128,1,6,8,1.33333,0.5625,0,0.833333,1,1,2,1.92857,0.257539,1,12,4.4,3.92938,0.395833,20,0.798077,0.433392,0.681818,0,0.166667 +6-6-75pct-2_13,1,6,8,1.33333,0.5625,0,1,1,1,2,1.92857,0.257539,1,8,4.5,2.36291,0.375,22,0.548077,0.420733,0.636364,0,0 +6-6-75pct-2_132,1,6,8,1.33333,0.5625,0,1,1,1,2,1.92857,0.257539,1,8,4.33333,2.49444,0.375,21,0.509615,0.453052,0.704545,0,0 +6-6-75pct-2_135,1,6,8,1.33333,0.5625,0.166667,0.833333,1,1,2,1.92857,0.257539,1,7,4,2.19089,0.375,19,0.25,0.408996,0.568182,0.4375,0 +6-6-75pct-2_136,1,6,8,1.33333,0.5625,0,1,1,1,2,1.92857,0.257539,2,7,4,1.91485,0.3125,18,0.25,0.433518,0.545455,0,0.166667 +6-6-75pct-2_138,1,6,8,1.33333,0.5625,0,1,1,1,2,1.92857,0.257539,1,11,5.16667,3.67045,0.375,20,0.25,0.446848,0.704545,0,0.166667 +6-6-75pct-2_142,1,6,8,1.33333,0.5625,0,1,1,1,2,1.92857,0.257539,1,7,5,2.23607,0.395833,23,0.365385,0.463238,0.772727,0,0.166667 +6-6-75pct-2_143,1,6,8,1.33333,0.5625,0,0.833333,1,1,2,1.92857,0.257539,1,11,4.2,3.54401,0.395833,20,0.798077,0.417002,0.659091,0,0.166667 +6-6-75pct-2_146,1,6,8,1.33333,0.5625,0,1,1,1,2,1.92857,0.257539,1,7,4.16667,2.11476,0.333333,19,0.586538,0.442153,0.613636,0,0.166667 
+6-6-75pct-2_147,1,6,8,1.33333,0.5625,0,1,1,1,2,1.92857,0.257539,1,9,5,3.10913,0.416667,23,0.326923,0.451501,0.590909,0,0 +6-6-75pct-2_149,1,6,8,1.33333,0.5625,0,0.833333,0.833333,1,2,1.92857,0.257539,1,10,4.4,3.13688,0.395833,21,0.25,0.430584,0.636364,0,0.166667 +6-6-75pct-2_15,1,6,8,1.33333,0.5625,0,1,1,1,2,1.92857,0.257539,1,11,4.83333,3.5316,0.375,20,0.538462,0.432805,0.568182,0,0 +6-6-75pct-2_150,1,6,8,1.33333,0.5625,0,1,1,1,2,1.92857,0.257539,1,8,5.16667,2.47768,0.354167,20,0.538462,0.4528,0.659091,0,0 +6-6-75pct-2_151,1,6,8,1.33333,0.5625,0,0.833333,0.833333,1,2,1.92857,0.257539,3,12,6,3.52136,0.333333,17,0.596154,0.475897,0.772727,0,0 +6-6-75pct-2_152,1,6,8,1.33333,0.5625,0,1,1,1,2,1.92857,0.257539,1,7,4,2.16025,0.375,21,0.634615,0.448441,0.75,0,0.166667 +6-6-75pct-2_153,1,6,8,1.33333,0.5625,0,0.833333,0.833333,1,2,1.92857,0.257539,2,5,3.4,1.2,0.333333,18,0.413462,0.448734,0.636364,0,0 +6-6-75pct-2_156,1,6,8,1.33333,0.5625,0,1,1,1,2,1.92857,0.257539,1,7,3.83333,2.26691,0.395833,22,0.548077,0.417798,0.477273,0,0 +6-6-75pct-2_158,1,6,8,1.33333,0.5625,0,1,1,1,2,1.92857,0.257539,1,11,4.33333,3.54338,0.354167,19,0.288462,0.456321,0.659091,0,0 +6-6-75pct-2_16,1,6,8,1.33333,0.5625,0,0.833333,0.833333,1,2,1.92857,0.257539,1,10,4,3.16228,0.3125,16,0.403846,0.457034,0.636364,0,0.166667 +6-6-75pct-2_160,1,6,8,1.33333,0.5625,0,1,1,1,2,1.92857,0.257539,1,8,4.83333,2.26691,0.375,22,0.288462,0.475352,0.681818,0,0.166667 +6-6-75pct-2_161,1,6,8,1.33333,0.5625,0,1,1,1,2,1.92857,0.257539,1,8,3.83333,2.26691,0.395833,22,0.326923,0.415912,0.454545,0,0 +6-6-75pct-2_162,1,6,8,1.33333,0.5625,0,1,1,1,2,1.92857,0.257539,3,7,5.16667,1.46249,0.333333,19,0.451923,0.475059,0.568182,0,0 +6-6-75pct-2_164,1,6,8,1.33333,0.5625,0,0.833333,0.833333,1,2,1.92857,0.257539,1,7,3.4,2.24499,0.354167,19,0.548077,0.457705,0.681818,0,0 +6-6-75pct-2_165,1,6,8,1.33333,0.5625,0,0.833333,0.833333,1,2,1.92857,0.257539,2,5,4,1.09545,0.333333,18,0.365385,0.455734,0.613636,0,0.166667 
+6-6-75pct-2_167,1,6,8,1.33333,0.5625,0,0.833333,0.833333,1,2,1.92857,0.257539,1,11,4.4,3.55528,0.375,19,0.326923,0.432093,0.659091,0,0.166667 +6-6-75pct-2_17,1,6,8,1.33333,0.5625,0,1,1,1,2,1.92857,0.257539,1,10,5.5,3.68556,0.395833,22,0.538462,0.461268,0.840909,0,0.166667 +6-6-75pct-2_170,1,6,8,1.33333,0.5625,0,0.833333,0.833333,1,2,1.92857,0.257539,1,7,3.8,2.31517,0.3125,17,0.586538,0.469022,0.795455,0,0 +6-6-75pct-2_171,1,6,8,1.33333,0.5625,0,0.833333,1,1,2,1.92857,0.257539,2,9,5.4,2.24499,0.375,19,0.25,0.462316,0.795455,0,0.166667 +6-6-75pct-2_172,1,6,8,1.33333,0.5625,0,0.833333,1,1,2,1.92857,0.257539,1,8,5,2.44949,0.354167,19,0.25,0.433141,0.636364,0,0.166667 +6-6-75pct-2_173,1,6,8,1.33333,0.5625,0,0.833333,0.833333,1,2,1.92857,0.257539,1,8,3.8,2.48193,0.291667,15,0.509615,0.4445,0.409091,0,0 +6-6-75pct-2_174,1,6,8,1.33333,0.5625,0,0.833333,0.833333,1,2,1.92857,0.257539,1,8,4.8,2.78568,0.354167,19,0.596154,0.436242,0.704545,0,0 +6-6-75pct-2_175,1,6,8,1.33333,0.5625,0,0.833333,1,1,2,1.92857,0.257539,1,6,3.4,1.74356,0.333333,17,0.528846,0.449069,0.613636,0,0 +6-6-75pct-2_176,1,6,8,1.33333,0.5625,0,1,1,1,2,1.92857,0.257539,2,9,4.66667,2.80872,0.354167,19,0.336538,0.460807,0.772727,0,0 +6-6-75pct-2_177,1,6,8,1.33333,0.5625,0,0.833333,0.833333,1,2,1.92857,0.257539,2,9,5.2,2.31517,0.291667,15,0.25,0.479502,0.681818,0,0.166667 +6-6-75pct-2_18,1,6,8,1.33333,0.5625,0,0.833333,0.833333,1,2,1.92857,0.257539,1,10,4.4,3.00666,0.270833,14,0.413462,0.452129,0.75,0,0.166667 +6-6-75pct-2_180,1,6,8,1.33333,0.5625,0.166667,0.833333,1,1,2,1.92857,0.257539,2,10,4,3.03315,0.354167,18,0.336538,0.434021,0.636364,0.4375,0.166667 +6-6-75pct-2_181,1,6,8,1.33333,0.5625,0,0.833333,1,1,2,1.92857,0.257539,2,8,4.4,2.33238,0.375,20,0.442308,0.418804,0.75,0,0 +6-6-75pct-2_183,1,6,8,1.33333,0.5625,0,1,1,1,2,1.92857,0.257539,1,9,4,2.70801,0.395833,22,0.625,0.425931,0.522727,0,0.166667 
+6-6-75pct-2_184,1,6,8,1.33333,0.5625,0,1,1,1,2,1.92857,0.257539,1,10,4.16667,3.0777,0.291667,17,0.375,0.451543,0.409091,0,0 +6-6-75pct-2_185,1,6,8,1.33333,0.5625,0,0.833333,1,1,2,1.92857,0.257539,1,10,4.4,3.2619,0.375,19,0.25,0.424673,0.636364,0,0 +6-6-75pct-2_186,1,6,8,1.33333,0.5625,0,1,1,1,2,1.92857,0.257539,4,12,7,2.94392,0.395833,22,0.5,0.487886,0.818182,0,0 +6-6-75pct-2_187,1,6,8,1.33333,0.5625,0,0.833333,1,1,2,1.92857,0.257539,1,10,4.4,3.2619,0.3125,16,0.461538,0.426517,0.454545,0,0 +6-6-75pct-2_188,1,6,8,1.33333,0.5625,0,1,1,1,2,1.92857,0.257539,1,8,4.16667,2.40947,0.333333,20,0.548077,0.471873,0.590909,0,0.166667 +6-6-75pct-2_190,1,6,8,1.33333,0.5625,0,1,1,1,2,1.92857,0.257539,2,9,5,2.58199,0.354167,19,0.365385,0.46613,0.454545,0,0.166667 +6-6-75pct-2_191,1,6,8,1.33333,0.5625,0,1,1,1,2,1.92857,0.257539,2,9,6,2.58199,0.395833,21,0.365385,0.477825,0.840909,0,0.166667 +6-6-75pct-2_193,1,6,8,1.33333,0.5625,0,1,1,1,2,1.92857,0.257539,3,11,6.33333,2.98142,0.4375,24,0.461538,0.47334,0.613636,0,0 +6-6-75pct-2_194,1,6,8,1.33333,0.5625,0,0.666667,0.8,1,2,1.92857,0.257539,1,10,5.25,3.49106,0.333333,17,0.682692,0.447099,0.522727,0,0 +6-6-75pct-2_195,1,6,8,1.33333,0.5625,0,1,1,1,2,1.92857,0.257539,1,8,5,2.23607,0.291667,16,0.586538,0.491113,0.772727,0,0.166667 +6-6-75pct-2_196,1,6,8,1.33333,0.5625,0,0.833333,0.833333,1,2,1.92857,0.257539,1,9,5.8,3.05941,0.3125,16,0.509615,0.463405,0.75,0,0 +6-6-75pct-2_197,1,6,8,1.33333,0.5625,0,1,1,1,2,1.92857,0.257539,2,8,4.5,2.06155,0.395833,22,0.336538,0.442698,0.818182,0,0.166667 +6-6-75pct-2_198,1,6,8,1.33333,0.5625,0,0.833333,0.833333,1,2,1.92857,0.257539,2,4,2.8,0.748331,0.354167,19,0.576923,0.451962,0.545455,0,0.166667 +6-6-75pct-2_199,1,6,8,1.33333,0.5625,0,1,1,1,2,1.92857,0.257539,1,13,6,4.16333,0.3125,17,0.336538,0.502809,0.795455,0,0.166667 +6-6-75pct-2_2,1,6,8,1.33333,0.5625,0,0.833333,0.833333,1,2,1.92857,0.257539,1,12,7.4,4.17612,0.375,19,0.576923,0.449195,0.727273,0,0 
+6-6-75pct-2_201,1,6,8,1.33333,0.5625,0,0.833333,0.833333,1,2,1.92857,0.257539,3,10,5.6,2.41661,0.3125,16,0.576923,0.460848,0.818182,0,0 +6-6-75pct-2_202,1,6,8,1.33333,0.5625,0,0.833333,1,1,2,1.92857,0.257539,3,7,4.8,1.32665,0.375,19,0.442308,0.445087,0.681818,0,0 +6-6-75pct-2_203,1,6,8,1.33333,0.5625,0,1,1,1,2,1.92857,0.257539,1,10,6.16667,3.13138,0.354167,20,0.326923,0.47422,0.613636,0,0 +6-6-75pct-2_204,1,6,8,1.33333,0.5625,0,1,1,1,2,1.92857,0.257539,1,8,4.66667,2.35702,0.354167,21,0.548077,0.434985,0.659091,0,0 +6-6-75pct-2_206,1,6,8,1.33333,0.5625,0,1,1,1,2,1.92857,0.257539,1,7,3.33333,2.0548,0.3125,18,0.25,0.425344,0.409091,0,0 +6-6-75pct-2_209,1,6,8,1.33333,0.5625,0,1,1,1,2,1.92857,0.257539,3,9,5.66667,2.4267,0.354167,20,0.451923,0.458166,0.659091,0,0 +6-6-75pct-2_211,1,6,8,1.33333,0.5625,0,1,1,1,2,1.92857,0.257539,2,11,5.5,3.2532,0.291667,16,0.586538,0.481514,0.681818,0,0 +6-6-75pct-2_215,1,6,8,1.33333,0.5625,0,1,1,1,2,1.92857,0.257539,1,8,3.66667,2.21108,0.354167,20,0.326923,0.43356,0.681818,0,0.166667 +6-6-75pct-2_219,1,6,8,1.33333,0.5625,0,1,1,1,2,1.92857,0.257539,1,4,2.5,0.957427,0.291667,17,0.413462,0.431087,0.5,0,0 +6-6-75pct-2_220,1,6,8,1.33333,0.5625,0,0.833333,1,1,2,1.92857,0.257539,1,5,3.6,1.49666,0.416667,22,0.25,0.434817,0.659091,0,0 +6-6-75pct-2_221,1,6,8,1.33333,0.5625,0,0.833333,1,1,2,1.92857,0.257539,1,9,4,2.82843,0.4375,23,0.615385,0.416541,0.727273,0,0.166667 +6-6-75pct-2_222,1,6,8,1.33333,0.5625,0,0.833333,0.833333,1,2,1.92857,0.257539,2,9,4.4,2.498,0.3125,16,0.403846,0.489437,0.681818,0,0.166667 +6-6-75pct-2_223,1,6,8,1.33333,0.5625,0,0.833333,1,1,2,1.92857,0.257539,2,6,3.8,1.46969,0.3125,17,0.336538,0.43859,0.545455,0,0.166667 +6-6-75pct-2_224,1,6,8,1.33333,0.5625,0,0.833333,0.833333,1,2,1.92857,0.257539,1,10,5.4,3.61109,0.375,19,0.326923,0.46307,0.704545,0,0 +6-6-75pct-2_225,1,6,8,1.33333,0.5625,0,1,1,1,2,1.92857,0.257539,1,11,5.16667,3.80424,0.395833,21,0.403846,0.440057,0.75,0,0 
+6-6-75pct-2_227,1,6,8,1.33333,0.5625,0,0.833333,1,1,2,1.92857,0.257539,1,6,3.2,1.72047,0.3125,16,0.721154,0.445758,0.613636,0,0 +6-6-75pct-2_229,1,6,8,1.33333,0.5625,0,1,1,1,2,1.92857,0.257539,2,8,5,2.3094,0.291667,16,0.365385,0.478538,0.704545,0,0 +6-6-75pct-2_23,1,6,8,1.33333,0.5625,0,1,1,1,2,1.92857,0.257539,2,7,4.83333,1.57233,0.375,21,0.509615,0.432973,0.454545,0,0 +6-6-75pct-2_230,1,6,8,1.33333,0.5625,0,1,1,1,2,1.92857,0.257539,3,10,6.16667,2.73354,0.354167,20,0.5,0.486796,0.681818,0,0 +6-6-75pct-2_233,1,6,8,1.33333,0.5625,0,0.833333,0.833333,1,2,1.92857,0.257539,2,11,5.6,3.2,0.3125,16,0.423077,0.467304,0.636364,0,0.166667 +6-6-75pct-2_235,1,6,8,1.33333,0.5625,0,1,1,1,2,1.92857,0.257539,2,6,3.5,1.38444,0.395833,22,0.596154,0.42308,0.545455,0,0 +6-6-75pct-2_236,1,6,8,1.33333,0.5625,0,1,1,1,2,1.92857,0.257539,1,11,5.5,3.40343,0.3125,17,0.509615,0.485496,0.886364,0,0.166667 +6-6-75pct-2_238,1,6,8,1.33333,0.5625,0,0.833333,1,1,2,1.92857,0.257539,1,7,3.6,1.95959,0.375,20,0.25,0.429033,0.545455,0,0.166667 +6-6-75pct-2_240,1,6,8,1.33333,0.5625,0.166667,0.833333,1,1,2,1.92857,0.257539,1,12,5,3.84708,0.354167,17,0.288462,0.45607,0.727273,0.4375,0 +6-6-75pct-2_241,1,6,8,1.33333,0.5625,0,0.666667,0.8,1,2,1.92857,0.257539,3,9,7,2.44949,0.25,13,0.375,0.471202,0.636364,0,0 +6-6-75pct-2_242,1,6,8,1.33333,0.5625,0,1,1,1,2,1.92857,0.257539,1,9,4.33333,2.56038,0.375,22,0.509615,0.443494,0.704545,0,0 +6-6-75pct-2_245,1,6,8,1.33333,0.5625,0,0.833333,0.833333,1,2,1.92857,0.257539,1,7,3.2,2.13542,0.375,20,0.413462,0.451585,0.75,0,0 +6-6-75pct-2_248,1,6,8,1.33333,0.5625,0,0.833333,1,1,2,1.92857,0.257539,2,5,3.4,1.0198,0.395833,21,0.25,0.425008,0.840909,0,0 +6-6-75pct-2_249,1,6,8,1.33333,0.5625,0,0.833333,0.833333,1,2,1.92857,0.257539,1,6,3.8,2.03961,0.333333,18,0.336538,0.477364,0.931818,0,0 +6-6-75pct-2_25,1,6,8,1.33333,0.5625,0,1,1,1,2,1.92857,0.257539,2,7,4.16667,1.57233,0.375,22,0.375,0.454393,0.522727,0,0.166667 
+6-6-75pct-2_26,1,6,8,1.33333,0.5625,0,0.833333,1,1,2,1.92857,0.257539,2,9,6,2.60768,0.333333,17,0.326923,0.468142,0.727273,0,0 +6-6-75pct-2_27,1,6,8,1.33333,0.5625,0,0.833333,0.833333,1,2,1.92857,0.257539,1,12,6,3.63318,0.375,19,0.586538,0.47728,0.75,0,0 +6-6-75pct-2_28,1,6,8,1.33333,0.5625,0,1,1,1,2,1.92857,0.257539,2,6,3.66667,1.49071,0.375,21,0.375,0.437374,0.613636,0,0 +6-6-75pct-2_3,1,6,8,1.33333,0.5625,0,0.833333,1,1,2,1.92857,0.257539,1,5,2.6,1.35647,0.354167,19,0.461538,0.431087,0.727273,0,0 +6-6-75pct-2_30,1,6,8,1.33333,0.5625,0,1,1,1,2,1.92857,0.257539,1,6,3.83333,1.86339,0.354167,20,0.375,0.44886,0.795455,0,0 +6-6-75pct-2_31,1,6,8,1.33333,0.5625,0,1,1,1,2,1.92857,0.257539,1,10,4.33333,3.19722,0.354167,19,0.288462,0.440309,0.590909,0,0 +6-6-75pct-2_32,1,6,8,1.33333,0.5625,0,1,1,1,2,1.92857,0.257539,2,7,4.33333,1.88562,0.354167,21,0.461538,0.455651,0.681818,0,0.166667 +6-6-75pct-2_35,1,6,8,1.33333,0.5625,0,0.833333,0.833333,1,2,1.92857,0.257539,2,8,4.6,2.05913,0.333333,18,0.288462,0.446974,0.75,0,0.166667 +6-6-75pct-2_39,1,6,8,1.33333,0.5625,0,0.833333,0.833333,1,2,1.92857,0.257539,2,6,3.8,1.46969,0.395833,21,0.403846,0.413397,0.75,0,0 +6-6-75pct-2_43,1,6,8,1.33333,0.5625,0,1,1,1,2,1.92857,0.257539,1,10,4,2.94392,0.3125,18,0.25,0.456112,0.545455,0,0.166667 +6-6-75pct-2_44,1,6,8,1.33333,0.5625,0,0.833333,0.833333,1,2,1.92857,0.257539,2,7,4.4,1.85472,0.354167,19,0.509615,0.443285,0.568182,0,0 +6-6-75pct-2_45,1,6,8,1.33333,0.5625,0,0.833333,0.833333,1,2,1.92857,0.257539,4,9,6,1.67332,0.395833,21,0.288462,0.459842,0.681818,0,0.166667 +6-6-75pct-2_46,1,6,8,1.33333,0.5625,0,0.833333,0.833333,1,2,1.92857,0.257539,2,8,3.6,2.24499,0.395833,21,0.336538,0.419349,0.613636,0,0 +6-6-75pct-2_5,1,6,8,1.33333,0.5625,0,0.833333,0.833333,1,2,1.92857,0.257539,1,10,5,3.74166,0.291667,15,0.586538,0.491071,0.772727,0,0.166667 +6-6-75pct-2_50,1,6,8,1.33333,0.5625,0,1,1,1,2,1.92857,0.257539,4,12,6.5,2.62996,0.395833,22,0.336538,0.461729,0.340909,0,0 
+6-6-75pct-2_52,1,6,8,1.33333,0.5625,0,0.833333,0.833333,1,2,1.92857,0.257539,1,10,5.4,2.87054,0.375,19,0.365385,0.47619,0.818182,0,0 +6-6-75pct-2_54,1,6,8,1.33333,0.5625,0,0.833333,1,1,2,1.92857,0.257539,3,7,5.4,1.62481,0.395833,20,0.682692,0.435865,0.704545,0,0.166667 +6-6-75pct-2_55,1,6,8,1.33333,0.5625,0,1,1,1,2,1.92857,0.257539,4,9,6,1.63299,0.416667,23,0.5,0.467388,0.727273,0,0 +6-6-75pct-2_57,1,6,8,1.33333,0.5625,0,1,1,1,2,1.92857,0.257539,2,7,4,1.73205,0.416667,24,0.365385,0.442991,0.409091,0,0 +6-6-75pct-2_58,1,6,8,1.33333,0.5625,0,1,1,1,2,1.92857,0.257539,2,11,5.66667,3.19722,0.354167,19,0.326923,0.469651,0.772727,0,0 +6-6-75pct-2_59,1,6,8,1.33333,0.5625,0,1,1,1,2,1.92857,0.257539,1,7,3.83333,2.03443,0.395833,23,0.375,0.423206,0.5,0,0 +6-6-75pct-2_60,1,6,8,1.33333,0.5625,0,1,1,1,2,1.92857,0.257539,1,10,5,3.10913,0.395833,21,0.461538,0.430919,0.522727,0,0.166667 +6-6-75pct-2_62,1,6,8,1.33333,0.5625,0,1,1,1,2,1.92857,0.257539,2,7,4.33333,1.79505,0.354167,20,0.423077,0.454812,0.886364,0,0.166667 +6-6-75pct-2_63,1,6,8,1.33333,0.5625,0,0.833333,0.833333,1,2,1.92857,0.257539,2,8,4.8,2.31517,0.3125,17,0.490385,0.407277,0.545455,0,0.166667 +6-6-75pct-2_66,1,6,8,1.33333,0.5625,0,0.833333,0.833333,1,2,1.92857,0.257539,1,8,5,2.68328,0.395833,21,0.423077,0.451249,0.681818,0,0.166667 +6-6-75pct-2_67,1,6,8,1.33333,0.5625,0,1,1,1,2,1.92857,0.257539,1,10,6,3.16228,0.375,21,0.625,0.465879,0.681818,0,0 +6-6-75pct-2_68,1,6,8,1.33333,0.5625,0,1,1,1,2,1.92857,0.257539,1,12,5.16667,4.0586,0.395833,22,0.336538,0.432302,0.477273,0,0 +6-6-75pct-2_70,1,6,8,1.33333,0.5625,0,0.666667,0.8,1,2,1.92857,0.257539,1,7,5.25,2.48747,0.416667,21,0.25,0.434398,0.704545,0,0 +6-6-75pct-2_71,1,6,8,1.33333,0.5625,0,0.833333,0.833333,1,2,1.92857,0.257539,2,8,5,2.28035,0.333333,18,0.548077,0.457663,0.522727,0,0 +6-6-75pct-2_72,1,6,8,1.33333,0.5625,0,1,1,1,2,1.92857,0.257539,1,10,5.16667,3.18416,0.416667,23,0.538462,0.4598,0.727273,0,0 
+6-6-75pct-2_73,1,6,8,1.33333,0.5625,0,0.833333,1,1,2,1.92857,0.257539,1,6,3.6,1.74356,0.291667,16,0.759615,0.446932,0.727273,0,0 +6-6-75pct-2_74,1,6,8,1.33333,0.5625,0,0.833333,1,1,2,1.92857,0.257539,2,8,5,2.19089,0.375,20,0.682692,0.436536,0.568182,0,0 +6-6-75pct-2_75,1,6,8,1.33333,0.5625,0.166667,0.833333,1,1,2,1.92857,0.257539,1,12,8,3.74166,0.416667,20,0.413462,0.455902,0.704545,0.4375,0 +6-6-75pct-2_76,1,6,8,1.33333,0.5625,0,1,1,1,2,1.92857,0.257539,4,8,5.66667,1.69967,0.354167,20,0.365385,0.456489,0.613636,0,0.166667 +6-6-75pct-2_78,1,6,8,1.33333,0.5625,0,0.833333,0.833333,1,2,1.92857,0.257539,2,10,6.2,3.6,0.416667,21,0.423077,0.45674,0.818182,0,0.166667 +6-6-75pct-2_81,1,6,8,1.33333,0.5625,0,0.833333,0.833333,1,2,1.92857,0.257539,1,12,4.8,3.86782,0.333333,17,0.365385,0.464453,0.818182,0,0.166667 +6-6-75pct-2_82,1,6,8,1.33333,0.5625,0,0.833333,0.833333,1,2,1.92857,0.257539,1,8,5.2,3.05941,0.333333,17,0.25,0.471957,0.772727,0,0 +6-6-75pct-2_84,1,6,8,1.33333,0.5625,0,1,1,1,2,1.92857,0.257539,1,6,3.5,1.89297,0.395833,23,0.461538,0.433015,0.795455,0,0 +6-6-75pct-2_86,1,6,8,1.33333,0.5625,0,0.833333,0.833333,1,2,1.92857,0.257539,1,7,5,2.19089,0.333333,18,0.365385,0.487215,0.795455,0,0.166667 +6-6-75pct-2_87,1,6,8,1.33333,0.5625,0,1,1,1,2,1.92857,0.257539,1,11,6.16667,3.28718,0.395833,21,0.25,0.439931,0.363636,0,0 +6-6-75pct-2_88,1,6,8,1.33333,0.5625,0,1,1,1,2,1.92857,0.257539,1,9,4.5,3.2532,0.375,21,0.336538,0.459884,0.681818,0,0 +6-6-75pct-2_9,1,6,8,1.33333,0.5625,0,0.833333,0.833333,1,2,1.92857,0.257539,3,6,4.2,1.16619,0.354167,19,0.509615,0.418637,0.431818,0,0.166667 +6-6-75pct-2_90,1,6,8,1.33333,0.5625,0,0.833333,1,1,2,1.92857,0.257539,1,6,3,1.78885,0.354167,19,0.326923,0.436033,0.590909,0,0 +6-6-75pct-2_91,1,6,8,1.33333,0.5625,0,0.833333,0.833333,1,2,1.92857,0.257539,1,12,5,3.89872,0.333333,17,0.663462,0.485454,0.636364,0,0.166667 
+6-6-75pct-2_92,1,6,8,1.33333,0.5625,0,0.833333,1,1,2,1.92857,0.257539,1,7,3.8,2.31517,0.375,19,0.288462,0.445213,0.886364,0,0.166667 +6-6-75pct-2_93,1,6,8,1.33333,0.5625,0,0.833333,1,1,2,1.92857,0.257539,2,6,3.8,1.83303,0.354167,19,0.509615,0.423541,0.75,0,0 +6-6-75pct-2_94,1,6,8,1.33333,0.5625,0,1,1,1,2,1.92857,0.257539,2,11,6,2.88675,0.3125,17,0.413462,0.498156,0.727273,0,0.166667 +6-6-75pct-2_97,1,6,8,1.33333,0.5625,0,0.833333,1,1,2,1.92857,0.257539,1,8,4.6,2.93939,0.375,20,0.25,0.386989,0.409091,0,0.166667 +6-6-75pct-2_98,1,6,8,1.33333,0.5625,0,1,1,1,2,1.92857,0.257539,2,5,3.16667,1.06719,0.375,21,0.288462,0.423499,0.522727,0,0 +6-6-75pct-2_99,1,6,8,1.33333,0.5625,0,0.833333,0.833333,1,2,1.92857,0.257539,1,10,5.4,3.2619,0.354167,18,0.365385,0.464202,0.454545,0,0 +8-8-75pct-2_104,1,8,10,1.25,0.6,0,1,1,2,2,2,0,1,16,7.375,4.63512,0.35,30,0.393103,0.454332,0.589286,0,0 +8-8-75pct-2_19,1,8,10,1.25,0.6,0,0.875,0.875,2,2,2,0,2,12,5.57143,3.41665,0.3625,31,0.272414,0.44505,0.5,0,0.25 +8-8-75pct-2_196,1,8,10,1.25,0.6,0,0.75,0.75,2,2,2,0,1,11,5.66667,3.24893,0.4,34,0.503448,0.431817,0.803571,0,0.125 +8-8-75pct-2_66,1,8,10,1.25,0.6,0,0.75,0.75,2,2,2,0,3,19,8.16667,5.45945,0.425,35,0.2,0.415109,0.642857,0,0.125 +8-8-75pct-2_86,1,8,10,1.25,0.6,0,0.875,1,2,2,2,0,3,19,9.28571,5.96931,0.425,35,0.462069,0.435087,0.357143,0,0 +tdata-3-7-23,1,7,5,0.714286,0.6,0,1,1,1,1,1,0,2,16,9.14286,4.96929,0.342857,14,0.4,0.478382,0.705882,0,0.285714 +tdata-3-8-19,1,8,5,0.625,0.6,0,1,1,1,1,1,0,1,18,8.625,6.44084,0.35,16,0.4,0.427836,0.555556,0,0 +tdata-3-8-36,1,8,5,0.625,0.6,0,1,1,1,1,1,0,1,20,9.125,5.84032,0.325,15,0.4,0.4386,0.722222,0,0.375 +tdata-3-8-39,1,8,5,0.625,0.6,0,1,1,1,1,1,0,1,12,6.125,3.95087,0.25,13,0.4,0.440336,0.444444,0,0 +tdata-4-4-16,1,4,6,1.5,0.666667,0,1,1,1,1,1,0,1,7,3.25,2.27761,0.375,12,0.333333,0.460207,0.5625,0,0 +tdata-4-4-17,1,4,6,1.5,0.666667,0,1,1,1,1,1,0,2,7,4.5,2.06155,0.458333,15,0.333333,0.461781,0.5,0,0.25 
+tdata-4-4-19,1,4,6,1.5,0.666667,0,1,1,1,1,1,0,1,7,4.25,2.38485,0.416667,14,0.333333,0.430643,0.5,0,0 +tdata-4-4-21,1,4,6,1.5,0.666667,0,1,1,1,1,1,0,2,10,5.75,3.03109,0.416667,12,0.333333,0.495166,0.5625,0,0.5 +tdata-4-4-22,1,4,6,1.5,0.666667,0,1,1,1,1,1,0,1,8,5,2.73861,0.458333,15,0.333333,0.464928,0.5625,0,0.5 +tdata-4-4-23,1,4,6,1.5,0.666667,0,1,1,1,1,1,0,1,10,5.25,3.49106,0.416667,13,0.333333,0.432779,0.375,0,0 +tdata-4-4-27,1,4,6,1.5,0.666667,0,1,1,1,1,1,0,1,5,3.25,1.47902,0.375,12,0.333333,0.467176,0.9375,0,0 +tdata-4-4-29,1,4,6,1.5,0.666667,0,1,1,1,1,1,0,2,12,5.75,3.76663,0.458333,14,0.333333,0.482239,0.5625,0,0.25 +tdata-4-4-31,1,4,6,1.5,0.666667,0,1,1,1,1,1,0,2,15,6,5.24404,0.291667,9,0.333333,0.498426,0.625,0,0 +tdata-4-4-4,1,4,6,1.5,0.666667,0,1,1,1,1,1,0,2,6,3.75,1.47902,0.458333,15,0.333333,0.461893,0.625,0,0.25 +tdata-4-4-8,1,4,6,1.5,0.666667,0,1,1,1,1,1,0,1,11,6.5,3.64005,0.458333,14,0.333333,0.504496,0.625,0,0.5 +tdata-4-5-11,1,5,6,1.2,0.666667,0,1,1,1,1,1,0,1,12,5.6,3.77359,0.433333,17,0.333333,0.446331,0.941176,0,0.2 +tdata-4-5-14,1,5,6,1.2,0.666667,0,1,1,1,1,1,0,1,9,5.4,2.87054,0.4,15,0.333333,0.46259,0.823529,0,0.4 +tdata-4-5-15,1,5,6,1.2,0.666667,0,1,1,1,1,1,0,1,9,5.2,2.99333,0.4,16,0.333333,0.459496,0.352941,0,0.4 +tdata-4-5-18,1,5,6,1.2,0.666667,0,1,1,1,1,1,0,2,16,7.8,5.38145,0.466667,17,0.333333,0.463885,0.529412,0,0.2 +tdata-4-5-19,1,5,6,1.2,0.666667,0,1,1,1,1,1,0,2,8,4.8,2.13542,0.366667,14,0.333333,0.493957,0.764706,0,0.4 +tdata-4-5-2,1,5,6,1.2,0.666667,0,1,1,1,1,1,0,1,7,4.4,2.15407,0.433333,17,0.333333,0.450863,0.588235,0,0.4 +tdata-4-5-22,1,5,6,1.2,0.666667,0,1,1,1,1,1,0,1,13,5.2,4.4,0.4,15,0.333333,0.43777,0.588235,0,0 +tdata-4-5-23,1,5,6,1.2,0.666667,0,0.8,0.8,1,1,1,0,2,12,6.25,3.63146,0.333333,11,0.333333,0.492878,0.647059,0,0.4 +tdata-4-5-24,1,5,6,1.2,0.666667,0,0.8,0.8,1,1,1,0,2,13,7,3.937,0.333333,11,0.333333,0.498633,0.823529,0,0.4 
+tdata-4-5-25,1,5,6,1.2,0.666667,0,1,1,1,1,1,0,1,15,5.4,5.08331,0.4,14,0.333333,0.497338,0.705882,0,0.2 +tdata-4-5-27,1,5,6,1.2,0.666667,0,1,1,1,1,1,0,2,11,6,3.74166,0.433333,17,0.333333,0.469424,0.647059,0,0.2 +tdata-4-5-28,1,5,6,1.2,0.666667,0,1,1,1,1,1,0,1,17,6.6,5.57136,0.4,14,0.333333,0.466403,0.588235,0,0.2 +tdata-4-5-29,1,5,6,1.2,0.666667,0,0.8,0.8,1,1,1,0,1,17,9,5.83095,0.333333,11,0.333333,0.512878,0.764706,0,0.4 +tdata-4-5-31,1,5,6,1.2,0.666667,0,1,1,1,1,1,0,1,8,5,2.60768,0.433333,17,0.333333,0.476763,0.647059,0,0.4 +tdata-4-5-33,1,5,6,1.2,0.666667,0,1,1,1,1,1,0,1,11,5.6,3.87814,0.4,14,0.333333,0.470791,0.411765,0,0.4 +tdata-4-5-35,1,5,6,1.2,0.666667,0,1,1,1,1,1,0,1,14,8.2,5.03587,0.4,14,0.333333,0.488417,0.647059,0,0.2 +tdata-4-5-36,1,5,6,1.2,0.666667,0,1,1,1,1,1,0,2,13,5.4,3.92938,0.366667,13,0.333333,0.477194,0.764706,0,0.2 +tdata-4-5-37,1,5,6,1.2,0.666667,0,1,1,1,1,1,0,1,11,5.4,3.2619,0.3,11,0.333333,0.501151,0.882353,0,0.4 +tdata-4-5-38,1,5,6,1.2,0.666667,0,1,1,1,1,1,0,1,11,5.8,3.54401,0.3,11,0.333333,0.463022,0.588235,0,0 +tdata-4-5-40,1,5,6,1.2,0.666667,0,1,1,1,1,1,0,1,8,5,2.60768,0.366667,14,0.333333,0.457122,0.764706,0,0.2 +tdata-4-5-5,1,5,6,1.2,0.666667,0,1,1,1,1,1,0,1,11,5.8,4.0694,0.366667,14,0.333333,0.462446,0.588235,0,0.2 +tdata-4-5-7,1,5,6,1.2,0.666667,0,1,1,1,1,1,0,2,7,4.4,1.85472,0.433333,16,0.333333,0.447122,0.647059,0,0.2 +tdata-4-5-8,1,5,6,1.2,0.666667,0,1,1,1,1,1,0,1,9,5.6,2.87054,0.333333,12,0.333333,0.494029,0.647059,0,0.4 +tdata-4-5-9,1,5,6,1.2,0.666667,0,1,1,1,1,1,0,1,16,8.6,5.23832,0.466667,17,0.333333,0.495396,0.647059,0,0.4 +tdata-4-6-1,1,6,6,1,0.666667,0,1,1,1,1,1,0,1,12,5.66667,3.72678,0.388889,18,0.333333,0.456485,0.5,0,0.166667 +tdata-4-6-10,1,6,6,1,0.666667,0,1,1,1,1,1,0,1,12,5.66667,3.85861,0.5,23,0.333333,0.444145,0.722222,0,0.333333 +tdata-4-6-11,1,6,6,1,0.666667,0,1,1,1,1,1,0,1,13,7.33333,3.59011,0.444444,20,0.333333,0.464928,0.666667,0,0.5 
+tdata-4-6-12,1,6,6,1,0.666667,0,1,1,1,1,1,0,1,11,4.66667,3.39935,0.416667,19,0.333333,0.433603,0.333333,0,0 +tdata-4-6-13,1,6,6,1,0.666667,0,0.833333,0.833333,1,1,1,0,2,10,6.8,2.92575,0.388889,16,0.333333,0.47562,0.888889,0,0.333333 +tdata-4-6-14,1,6,6,1,0.666667,0,1,1,1,1,1,0,3,12,7.83333,3.02306,0.388889,17,0.333333,0.468225,0.555556,0,0.333333 +tdata-4-6-16,1,6,6,1,0.666667,0,0.833333,0.833333,1,1,1,0,1,8,4.2,2.31517,0.305556,13,0.333333,0.463679,0.5,0,0 +tdata-4-6-17,1,6,6,1,0.666667,0,1,1,1,1,1,0,1,14,6.5,5.12348,0.444444,20,0.333333,0.460981,0.666667,0,0.166667 +tdata-4-6-19,1,6,6,1,0.666667,0,1,1,1,1,1,0,1,11,5.66667,3.63624,0.361111,17,0.333333,0.501849,0.611111,0,0.166667 +tdata-4-6-2,1,6,6,1,0.666667,0,1,1,1,1,1,0,2,13,6.33333,3.94405,0.388889,17,0.333333,0.497452,0.5,0,0.166667 +tdata-4-6-20,1,6,6,1,0.666667,0,1,1,1,1,1,0,2,13,6.83333,4.0586,0.416667,19,0.333333,0.467876,0.722222,0,0.166667 +tdata-4-6-22,1,6,6,1,0.666667,0,0.833333,0.833333,1,1,1,0,4,13,8.4,3.00666,0.333333,14,0.333333,0.4997,0.666667,0,0.333333 +tdata-4-6-23,1,6,6,1,0.666667,0,1,1,1,1,1,0,1,14,7.83333,4.52462,0.361111,15,0.333333,0.479167,0.333333,0,0.333333 +tdata-4-6-24,1,6,6,1,0.666667,0,0.833333,0.833333,1,1,1,0,1,12,6.8,3.86782,0.361111,15,0.333333,0.476918,0.611111,0,0.333333 +tdata-4-6-26,1,6,6,1,0.666667,0,1,1,1,1,1,0,2,19,8,5.35413,0.333333,14,0.333333,0.486361,0.666667,0,0.166667 +tdata-4-6-27,1,6,6,1,0.666667,0,1,1,1,1,1,0,1,14,6.5,5.12348,0.416667,18,0.333333,0.469225,0.777778,0,0.166667 +tdata-4-6-3,1,6,6,1,0.666667,0,1,1,1,1,1,0,1,22,6.5,7.13559,0.333333,15,0.333333,0.484712,0.777778,0,0 +tdata-4-6-31,1,6,6,1,0.666667,0,0.833333,0.833333,1,1,1,0,4,14,8.6,3.87814,0.277778,11,0.333333,0.496153,0.611111,0,0.5 +tdata-4-6-33,1,6,6,1,0.666667,0,1,1,1,1,1,0,2,12,5.66667,3.39935,0.444444,20,0.333333,0.436701,0.611111,0,0 +tdata-4-6-35,1,6,6,1,0.666667,0,1,1,1,1,1,0,1,17,7.66667,5.70575,0.333333,14,0.333333,0.469574,0.722222,0,0.166667 
+tdata-4-6-36,1,6,6,1,0.666667,0,1,1,1,1,1,0,5,17,8.83333,3.97562,0.444444,20,0.333333,0.459183,0.5,0,0.333333 +tdata-4-6-38,1,6,6,1,0.666667,0,0.833333,0.833333,1,1,1,0,2,17,8.6,4.96387,0.333333,13,0.333333,0.480765,0.944444,0,0.333333 +tdata-4-6-4,1,6,6,1,0.666667,0,1,1,1,1,1,0,1,17,6.66667,5.46707,0.388889,16,0.333333,0.484263,0.944444,0,0.166667 +tdata-4-6-40,1,6,6,1,0.666667,0,1,1,1,1,1,0,1,16,8.83333,5.2095,0.416667,17,0.333333,0.4998,0.722222,0,0.5 +tdata-4-6-5,1,6,6,1,0.666667,0,0.833333,0.833333,1,1,1,0,1,9,4.4,2.6533,0.222222,9,0.333333,0.496952,0.944444,0,0 +tdata-4-6-6,1,6,6,1,0.666667,0,1,1,1,1,1,0,1,8,3.83333,2.26691,0.444444,21,0.333333,0.4369,0.611111,0,0 +tdata-4-6-7,1,6,6,1,0.666667,0,1,1,1,1,1,0,3,20,8.83333,5.33594,0.444444,20,0.333333,0.480516,0.555556,0,0.5 +tdata-4-6-8,1,6,6,1,0.666667,0,1,1,1,1,1,0,1,15,6.33333,4.74927,0.416667,19,0.333333,0.457134,0.611111,0,0 +tdata-4-7-10,1,7,6,0.857143,0.666667,0,1,1,1,1,1,0,1,23,9.28571,7.75913,0.428571,20,0.333333,0.478858,0.578947,0,0.285714 +tdata-4-7-12,1,7,6,0.857143,0.666667,0,1,1,1,1,1,0,2,24,10.4286,7.49966,0.357143,17,0.333333,0.495375,0.789474,0,0.285714 +tdata-4-7-13,1,7,6,0.857143,0.666667,0,1,1,1,1,1,0,3,23,10.8571,7.05951,0.428571,21,0.333333,0.496403,0.578947,0,0.285714 +tdata-4-7-14,1,7,6,0.857143,0.666667,0,0.857143,0.857143,1,1,1,0,1,16,9,5.44671,0.380952,18,0.333333,0.473315,0.631579,0,0.285714 +tdata-4-7-17,1,7,6,0.857143,0.666667,0,1,1,1,1,1,0,1,19,7.42857,5.62792,0.380952,19,0.333333,0.463441,0.631579,0,0 +tdata-4-7-19,1,7,6,0.857143,0.666667,0,1,1,1,1,1,0,1,19,8,6.52468,0.380952,18,0.333333,0.471223,0.789474,0,0.142857 +tdata-4-7-2,1,7,6,0.857143,0.666667,0,1,1,1,1,1,0,2,25,9.57143,7.38448,0.380952,18,0.333333,0.464212,0.631579,0,0.142857 +tdata-4-7-22,1,7,6,0.857143,0.666667,0,1,1,1,1,1,0,1,8,4.14286,2.2315,0.404762,21,0.333333,0.435215,0.631579,0,0 +tdata-4-7-23,1,7,6,0.857143,0.666667,0,1,1,1,1,1,0,1,16,7.85714,4.96929,0.404762,21,0.333333,0.466892,0.421053,0,0.285714 
+tdata-4-7-24,1,7,6,0.857143,0.666667,0,1,1,1,1,1,0,2,25,12.1429,8.27092,0.428571,21,0.333333,0.482345,0.736842,0,0.142857 +tdata-4-7-25,1,7,6,0.857143,0.666667,0,1,1,1,1,1,0,2,22,9.14286,6.51216,0.404762,19,0.333333,0.461166,0.631579,0,0.285714 +tdata-4-7-26,1,7,6,0.857143,0.666667,0,0.857143,0.857143,1,1,1,0,2,22,8.5,6.84957,0.333333,15,0.333333,0.488438,0.842105,0,0.285714 +tdata-4-7-28,1,7,6,0.857143,0.666667,0,1,1,1,1,1,0,1,16,6,5.07093,0.357143,18,0.333333,0.450228,0.578947,0,0 +tdata-4-7-29,1,7,6,0.857143,0.666667,0,0.857143,0.857143,1,1,1,0,1,21,9.33333,8.09664,0.357143,16,0.333333,0.505286,0.736842,0,0.428571 +tdata-4-7-30,1,7,6,0.857143,0.666667,0,1,1,1,1,1,0,3,16,10.2857,4.19913,0.428571,22,0.333333,0.480693,0.894737,0,0.571429 +tdata-4-7-33,1,7,6,0.857143,0.666667,0,1,1,1,1,1,0,2,12,6,3.42261,0.428571,22,0.333333,0.418514,0.631579,0,0 +tdata-4-7-35,1,7,6,0.857143,0.666667,0,1,1,1,1,1,0,2,17,8,5.12696,0.333333,17,0.333333,0.485648,0.526316,0,0.285714 +tdata-4-7-36,1,7,6,0.857143,0.666667,0,1,1,1,1,1,0,1,17,7.71429,5.8728,0.404762,20,0.333333,0.449567,0.842105,0,0.142857 +tdata-4-7-38,1,7,6,0.857143,0.666667,0,0.857143,0.857143,1,1,1,0,1,13,6.83333,4.74049,0.309524,15,0.333333,0.467846,0.421053,0,0.285714 +tdata-4-7-4,1,7,6,0.857143,0.666667,0,0.714286,0.714286,1,1,1,0,1,21,7,7.29383,0.261905,12,0.333333,0.493393,0.631579,0,0.285714 +tdata-4-7-8,1,7,6,0.857143,0.666667,0,1,1,1,1,1,0,1,11,6.57143,3.45821,0.380952,21,0.333333,0.453568,0.684211,0,0 +tdata-4-7-9,1,7,6,0.857143,0.666667,0,1,1,1,1,1,0,1,23,8.28571,7.55389,0.380952,18,0.333333,0.474086,0.736842,0,0.142857 +tdata-5-10-13,1,10,7,0.7,0.714286,0,1,1,1,1,1,0,2,37,17.5,12.2902,0.485714,37,0.285714,0.470877,0.583333,0,0.3 +tdata-5-10-18,1,10,7,0.7,0.714286,0,0.9,0.9,1,1,1,0,1,38,14.3333,12.8149,0.428571,32,0.285714,0.481153,0.583333,0,0.1 +tdata-5-4-1,1,4,7,1.75,0.714286,0,1,1,1,1,1,0,3,10,6.5,2.69258,0.464286,18,0.285714,0.481404,0.444444,0,0.5 
+tdata-5-4-13,1,4,7,1.75,0.714286,0,1,1,1,1,1,0,1,11,6.5,3.84057,0.428571,14,0.285714,0.536576,0.666667,0,0.25 +tdata-5-4-14,1,4,7,1.75,0.714286,0,1,1,1,1,1,0,1,15,7.5,5.02494,0.464286,15,0.285714,0.492672,0.666667,0,0 +tdata-5-4-18,1,4,7,1.75,0.714286,0,1,1,1,1,1,0,1,10,5.5,3.3541,0.428571,16,0.285714,0.479495,0.944444,0,0.25 +tdata-5-4-19,1,4,7,1.75,0.714286,0,1,1,1,1,1,0,6,17,11.5,5.02494,0.571429,21,0.285714,0.548953,0.611111,0,0.5 +tdata-5-4-2,1,4,7,1.75,0.714286,0,1,1,1,1,1,0,3,11,6,3.08221,0.464286,16,0.285714,0.487931,0.611111,0,0 +tdata-5-4-20,1,4,7,1.75,0.714286,0,1,1,1,1,1,0,1,6,3.5,1.80278,0.428571,15,0.285714,0.482635,0.333333,0,0.25 +tdata-5-4-21,1,4,7,1.75,0.714286,0,1,1,1,1,1,0,4,9,7,1.87083,0.464286,18,0.285714,0.50899,0.555556,0,0.25 +tdata-5-4-23,1,4,7,1.75,0.714286,0,1,1,1,1,1,0,2,9,6.25,2.68095,0.464286,18,0.285714,0.473645,0.944444,0,0 +tdata-5-4-24,1,4,7,1.75,0.714286,0,1,1,1,1,1,0,2,14,8.75,4.96865,0.5,17,0.285714,0.523645,0.833333,0,0 +tdata-5-4-32,1,4,7,1.75,0.714286,0,1,1,1,1,1,0,1,12,4.5,4.38748,0.428571,14,0.285714,0.475924,0.944444,0,0 +tdata-5-4-37,1,4,7,1.75,0.714286,0,1,1,1,1,1,0,1,11,4.25,3.96074,0.5,19,0.285714,0.464594,0.777778,0,0.25 +tdata-5-4-39,1,4,7,1.75,0.714286,0,1,1,1,1,1,0,1,12,5.75,4.20565,0.464286,17,0.285714,0.458805,0.388889,0,0 +tdata-5-4-4,1,4,7,1.75,0.714286,0,1,1,1,1,1,0,1,11,4.75,3.89711,0.357143,12,0.285714,0.491502,0.611111,0,0.25 +tdata-5-4-6,1,4,7,1.75,0.714286,0,1,1,1,1,1,0,2,12,8,3.74166,0.392857,13,0.285714,0.491564,0.555556,0,0 +tdata-5-4-8,1,4,7,1.75,0.714286,0,1,1,1,1,1,0,2,16,7,5.38516,0.5,18,0.285714,0.514101,0.777778,0,0.25 +tdata-5-5-1,1,5,7,1.4,0.714286,0,1,1,1,1,1,0,1,16,6.8,5.77581,0.457143,19,0.285714,0.494778,0.526316,0,0 +tdata-5-5-12,1,5,7,1.4,0.714286,0,1,1,1,1,1,0,2,16,7.8,5.6356,0.457143,18,0.285714,0.502621,0.947368,0,0.2 +tdata-5-5-15,1,5,7,1.4,0.714286,0,1,1,1,1,1,0,2,19,9.8,6.24179,0.428571,18,0.285714,0.533399,0.578947,0,0.4 
+tdata-5-5-17,1,5,7,1.4,0.714286,0,1,1,1,1,1,0,1,24,9.6,8.21219,0.514286,22,0.285714,0.482286,0.736842,0,0.2 +tdata-5-5-19,1,5,7,1.4,0.714286,0,1,1,1,1,1,0,1,11,7,3.52136,0.428571,19,0.285714,0.518778,0.421053,0,0.4 +tdata-5-5-25,1,5,7,1.4,0.714286,0,1,1,1,1,1,0,2,19,8,5.83095,0.514286,23,0.285714,0.478542,0.631579,0,0 +tdata-5-5-28,1,5,7,1.4,0.714286,0,1,1,1,1,1,0,2,22,11,7.94984,0.514286,22,0.285714,0.494266,0.684211,0,0.2 +tdata-5-5-30,1,5,7,1.4,0.714286,0,1,1,1,1,1,0,1,12,6.6,4.22374,0.457143,20,0.285714,0.502187,0.842105,0,0.2 +tdata-5-5-5,1,5,7,1.4,0.714286,0,0.8,0.8,1,1,1,0,2,11,6.5,3.64005,0.371429,15,0.285714,0.514246,0.736842,0,0.2 +tdata-5-5-7,1,5,7,1.4,0.714286,0,1,1,1,1,1,0,2,15,8,4.3359,0.428571,18,0.285714,0.492887,0.894737,0,0.4 +tdata-5-6-25,1,6,7,1.16667,0.714286,0,0.833333,0.833333,1,1,1,0,2,23,10.8,8.58836,0.428571,19,0.285714,0.505774,0.85,0,0.333333 +tdata-5-6-26,1,6,7,1.16667,0.714286,0,0.833333,0.833333,1,1,1,0,1,11,4.6,3.49857,0.357143,17,0.285714,0.485961,0.95,0,0.166667 +tdata-5-6-3,1,6,7,1.16667,0.714286,0,1,1,1,1,1,0,1,19,8.5,5.7373,0.428571,20,0.285714,0.493514,0.5,0,0.166667 +tdata-5-6-32,1,6,7,1.16667,0.714286,0,1,1,1,1,1,0,1,16,8,5.71548,0.428571,22,0.285714,0.517433,0.85,0,0.166667 +tdata-5-6-34,1,6,7,1.16667,0.714286,0,1,1,1,1,1,0,1,24,12.1667,7.96695,0.428571,20,0.285714,0.523673,0.7,0,0.333333 +tdata-5-6-39,1,6,7,1.16667,0.714286,0,0.833333,0.833333,1,1,1,0,1,29,10,10.1587,0.452381,21,0.285714,0.498905,0.75,0,0.333333 +tdata-5-6-7,1,6,7,1.16667,0.714286,0,1,1,1,1,1,0,3,11,6.5,2.62996,0.5,27,0.285714,0.460564,0.75,0,0.333333 +tdata-5-7-9,1,7,7,1,0.714286,0,1,1,1,1,1,0,1,19,8.28571,6.81834,0.44898,26,0.285714,0.467437,0.714286,0,0.142857 +tdata-5-8-31,1,8,7,0.875,0.714286,0,1,1,1,1,1,0,4,27,15.125,7.99121,0.410714,27,0.285714,0.497491,0.318182,0,0.25 +tdata-5-9-38,1,9,7,0.777778,0.714286,0,1,1,1,1,1,0,3,21,11.3333,5.67646,0.412698,30,0.285714,0.512267,0.391304,0,0.222222 
+BFT-10_16_8_77_15_58-1,1,16,8,0.5,0.601562,0.1875,0.6875,0.846154,0,9,4.8125,2.42947,2,13,6.45455,3.17271,0.375,48,0.319444,0.466288,0.450893,0.34375,0 +BFT-10_16_8_77_15_58-16,1,16,8,0.5,0.601562,0.125,0.6875,0.846154,0,10,4.8125,2.7436,2,15,6,4.17786,0.390625,50,0.105556,0.405617,0.463542,0.367188,0 +BFT-11_16_8_77_31_46-12,1,16,8,0.5,0.601562,0.0625,0.75,0.8,0,7,2.40625,1.67443,2,20,8.91667,5.6783,0.351562,47,0.330556,0.403809,0.28125,0.226562,0 +BFT-17_20_5_60_12_36-1,1,20,5,0.25,0.6,0.2,0.5,0.666667,0,9,4.61538,2.70437,2,11,5.9,3.2078,0.29,29,0.592806,0.410043,0.666667,0.34,0 +BFT-17_20_5_60_12_36-11,1,20,5,0.25,0.6,0.1,0.75,0.882353,0,8,4.61538,2.13175,2,8,4.46667,1.92758,0.33,35,0.473381,0.402821,0.683333,0.27,0 +BFT-17_20_5_60_12_36-14,1,20,5,0.25,0.6,0.1,0.65,0.866667,0,8,4.61538,2.30513,2,11,5.15385,2.56743,0.32,33,0.479137,0.414316,0.406667,0.17,0 +BFT-17_20_5_60_12_36-15,1,20,5,0.25,0.6,0.1,0.7,0.875,0,10,4.61538,2.16754,2,10,4.14286,2.35606,0.32,34,0.305036,0.37859,0.483333,0.25,0 +BFT-17_20_5_60_12_36-17,1,20,5,0.25,0.6,0.15,0.65,0.866667,0,8,4.61538,2.23739,3,12,5.07692,2.30256,0.35,36,0.493525,0.450256,0.491667,0.38,0.05 +BFT-17_20_5_60_12_36-19,1,20,5,0.25,0.6,0.2,0.75,1,0,7,4.61538,2.09536,2,11,4.73333,2.64491,0.34,34,0.515108,0.407094,0.677778,0.27,0 +BFT-17_20_5_60_12_36-2,1,20,5,0.25,0.6,0.15,0.8,1,0,8,4.61538,2.09536,2,10,4.8125,2.12776,0.35,36,0.138129,0.388077,0.416667,0.23,0.05 +BFT-17_20_5_60_12_36-3,1,20,5,0.25,0.6,0.2,0.5,0.666667,0,7,4.61538,2.23739,2,8,4.1,2.02237,0.24,24,0.61295,0.407094,0.7,0.32,0 +BFT-17_20_5_60_12_36-6,1,20,5,0.25,0.6,0.2,0.7,0.933333,0,8,4.61538,2.40315,2,11,4.78571,2.59611,0.37,37,0.804317,0.397222,0.5,0.39,0.05 +BFT-17_20_5_60_12_36-7,1,20,5,0.25,0.6,0.15,0.6,0.75,0,7,4.61538,1.98217,2,7,3.83333,1.46249,0.23,24,0.0503597,0.422265,0.658333,0.31,0.05 +BFT-17_20_5_60_12_36-8,1,20,5,0.25,0.6,0.1,0.7,0.875,0,8,4.61538,2.13175,2,8,3.92857,1.53364,0.3,30,0.145324,0.448419,0.725,0.37,0 
+BFT-17_20_5_60_12_36-9,1,20,5,0.25,0.6,0.2,0.6,0.857143,0,10,4.61538,2.76067,2,7,4.25,1.63936,0.29,29,0.135252,0.376752,0.2,0.39,0.1 +BFT-18_20_5_60_12_45-11,1,20,5,0.25,0.6,0.25,0.6,0.857143,0,9,4.61538,3.02651,3,12,4.83333,2.60875,0.27,27,0.546763,0.396923,0.592593,0.3,0 +BFT-18_20_5_60_12_45-13,1,20,5,0.25,0.6,0.2,0.55,0.733333,0,7,4.61538,1.98217,2,11,5.54545,2.74238,0.29,29,0.379856,0.433077,0.473333,0.27,0 +BFT-18_20_5_60_12_45-15,1,20,5,0.25,0.6,0.2,0.7,0.933333,0,11,4.61538,2.70437,2,7,3.92857,1.43747,0.36,37,0.189928,0.378632,0.477778,0.32,0 +BFT-18_20_5_60_12_45-16,1,20,5,0.25,0.6,0.15,0.75,0.9375,0,9,4.61538,2.37093,2,12,6.86667,3.34398,0.36,36,0.497842,0.477393,0.592593,0.34,0.1 +BFT-18_20_5_60_12_45-20,1,20,5,0.25,0.6,0.1,0.55,0.733333,0,7,4.61538,2.09536,2,9,4.90909,2.50289,0.32,32,0.263309,0.45735,0.509524,0.27,0 +BFT-18_20_5_60_12_45-3,1,20,5,0.25,0.6,0.25,0.6,0.857143,0,9,4.61538,3.24721,2,10,4.41667,2.62864,0.34,34,0.394245,0.350598,0.557143,0.38,0 +BFT-18_20_5_60_12_45-5,1,20,5,0.25,0.6,0.2,0.7,1,0,9,4.61538,2.52795,2,12,5.57143,2.63803,0.36,36,0.238849,0.397735,0.566667,0.32,0 +BFT-18_20_5_60_12_45-6,1,20,5,0.25,0.6,0.15,0.65,0.866667,0,6,4.61538,1.82033,2,8,4.46154,1.7809,0.31,31,0.515108,0.404316,0.326667,0.33,0 +BFT-18_20_5_60_12_45-7,1,20,5,0.25,0.6,0.2,0.55,0.785714,0,8,4.61538,2.09536,2,10,6.45455,2.49959,0.28,28,0.323741,0.456581,0.722222,0.27,0.05 +BFT-18_20_5_60_12_45-8,1,20,5,0.25,0.6,0.15,0.75,0.9375,0,8,4.61538,1.90297,2,11,6.06667,3.08689,0.37,37,0.4,0.452009,0.644444,0.22,0 +BFT-19_20_5_60_24_36-11,1,20,5,0.25,0.6,0.1,0.65,0.866667,0,7,2.4,1.72047,3,22,8.92308,4.79521,0.33,33,0.047482,0.399289,0.6,0.31,0 +BFT-19_20_5_60_24_36-12,1,20,5,0.25,0.6,0.05,0.6,0.75,0,6,2.4,1.5748,2,18,8.91667,5.1228,0.33,34,0.315108,0.415733,0.591667,0.18,0 +BFT-19_20_5_60_24_36-13,1,20,5,0.25,0.6,0.1,0.5,0.714286,0,5,2.4,1.6,3,17,8.5,4.47772,0.33,33,0.4,0.3242,0.766667,0.1,0 
+BFT-19_20_5_60_24_36-14,1,20,5,0.25,0.6,0.1,0.7,0.823529,0,5,2.4,1.41421,2,18,7.85714,5.71786,0.36,36,0.303597,0.330533,0.4,0.23,0 +BFT-19_20_5_60_24_36-18,1,20,5,0.25,0.6,0.15,0.55,0.733333,0,6,2.4,1.81108,2,8,4,1.80907,0.3,31,0.417266,0.346089,0.45,0.36,0 +BFT-19_20_5_60_24_36-20,1,20,5,0.25,0.6,0.2,0.6,0.857143,0,5,2.4,1.26491,2,17,8.08333,4.78641,0.26,26,0.631655,0.4254,0.406667,0.28,0 +BFT-19_20_5_60_24_36-8,1,20,5,0.25,0.6,0.2,0.7,0.875,0,7,2.4,2,2,19,8.92857,6.1929,0.33,33,0.158273,0.375422,0.3,0.34,0 +BFT-1_16_5_48_10_29-1,1,16,5,0.3125,0.6,0.1875,0.5625,0.818182,0,16,4.36364,4.20547,2,7,3.77778,1.5476,0.3375,27,0.233333,0.341035,0.346154,0.3,0 +BFT-1_16_5_48_10_29-10,1,16,5,0.3125,0.6,0.1875,0.5625,0.75,0,11,4.36364,2.93173,2,10,5.66667,2.62467,0.3375,27,0.00888889,0.408775,0.484615,0.3125,0 +BFT-1_16_5_48_10_29-11,1,16,5,0.3125,0.6,0.1875,0.625,0.909091,0,9,4.36364,2.77235,2,7,4.2,1.77764,0.3,24,0.56,0.438321,0.571429,0.3,0 +BFT-1_16_5_48_10_29-14,1,16,5,0.3125,0.6,0.25,0.625,1,0,9,4.36364,2.34609,2,6,4.5,1.28452,0.3875,31,0.211111,0.40928,0.474359,0.3625,0 +BFT-1_16_5_48_10_29-15,1,16,5,0.3125,0.6,0.1875,0.6875,0.916667,0,7,4.36364,2.45959,3,9,4.81818,1.94554,0.35,28,0.475556,0.451768,0.565934,0.3,0 +BFT-1_16_5_48_10_29-16,1,16,5,0.3125,0.6,0.125,0.625,0.909091,0,9,4.36364,2.60324,2,9,3.7,2.2383,0.3,24,0.604444,0.42803,0.570513,0.275,0 +BFT-1_16_5_48_10_29-18,1,16,5,0.3125,0.6,0.125,0.75,0.923077,0,9,4.36364,2.77235,2,7,3.91667,1.49768,0.275,23,0.448889,0.459659,0.631868,0.35,0 +BFT-1_16_5_48_10_29-19,1,16,5,0.3125,0.6,0.0625,0.625,0.833333,0,9,4.36364,2.42235,2,9,4.6,2.498,0.35,28,0.393333,0.445265,0.307692,0.1375,0 +BFT-1_16_5_48_10_29-2,1,16,5,0.3125,0.6,0.1875,0.625,0.833333,0,8,4.36364,2.26727,2,6,4.2,1.6,0.2875,23,0.0644444,0.425063,0.5,0.3875,0 +BFT-1_16_5_48_10_29-3,1,16,5,0.3125,0.6,0,0.5625,0.692308,0,9,4.36364,2.22681,2,5,3.11111,0.993808,0.275,23,0.284444,0.438384,0.605769,0,0 
+BFT-1_16_5_48_10_29-4,1,16,5,0.3125,0.6,0.1875,0.6875,0.916667,0,7,4.36364,1.96666,2,9,4.09091,2.23422,0.35,28,0.326667,0.383396,0.538462,0.3125,0 +BFT-1_16_5_48_10_29-5,1,16,5,0.3125,0.6,0.1875,0.625,0.909091,0,8,4.36364,2.26727,2,9,5.2,2.67582,0.35,28,0.104444,0.453283,0.466346,0.225,0 +BFT-1_16_5_48_10_29-6,1,16,5,0.3125,0.6,0.125,0.625,0.769231,0,8,4.36364,2.01236,2,8,4.4,1.90788,0.3125,25,0.468889,0.453662,0.666667,0.1375,0 +BFT-1_16_5_48_10_29-8,1,16,5,0.3125,0.6,0.25,0.625,1,0,9,4.36364,2.67217,2,10,4.9,2.3,0.35,28,0.177778,0.415025,0.521978,0.375,0 +BFT-1_16_5_48_10_29-9,1,16,5,0.3125,0.6,0.0625,0.6875,0.916667,0,8,4.36364,1.91988,2,10,4.63636,2.90056,0.2625,21,0.24,0.437879,0.391026,0.1875,0 +BFT-20_20_5_60_24_45-14,1,20,5,0.25,0.6,0.15,0.6,0.8,0,6,2.5,1.70783,3,20,8.25,6.15257,0.31,31,0.446043,0.413519,0.553333,0.24,0 +BFT-20_20_5_60_24_45-18,1,20,5,0.25,0.6,0.1,0.55,0.6875,0,7,2.4,1.85472,2,14,7.72727,3.79212,0.27,28,0.510791,0.390533,0.3,0.18,0 +BFT-20_20_5_60_24_45-19,1,20,5,0.25,0.6,0.25,0.7,0.933333,0,5,2.4,1.41421,2,17,6.42857,5.05278,0.36,36,0.231655,0.339689,0.533333,0.35,0.05 +BFT-20_20_5_60_24_45-6,1,20,5,0.25,0.6,0.1,0.6,0.75,0,5,2.4,1.23288,5,21,10.1667,4.66964,0.28,28,0.435971,0.391444,0.4,0.2,0 +BFT-20_20_5_60_24_45-9,1,20,5,0.25,0.6,0.15,0.6,0.75,0,5,2.4,1.5748,2,18,9.5,5.17204,0.35,35,0.515108,0.404378,0.611111,0.27,0 +BFT-21_20_5_80_16_48-1,1,20,5,0.25,0.8,0,0.85,0.894737,0,7,4.70588,1.99307,2,10,5.82353,2.54917,0.41,45,0.0978417,0.535,0.561111,0,0 +BFT-21_20_5_80_16_48-12,1,20,5,0.25,0.8,0.05,0.85,0.944444,0,8,4.70588,2.18994,2,12,6.29412,3.62613,0.5,53,0.352518,0.496667,0.666667,0.05,0.15 +BFT-21_20_5_80_16_48-13,1,20,5,0.25,0.8,0.1,0.7,0.777778,0,10,4.70588,2.16291,2,13,5.28571,2.81396,0.44,47,0.0877698,0.460392,0.505556,0.13,0 +BFT-21_20_5_80_16_48-14,1,20,5,0.25,0.8,0.1,0.9,1,0,9,4.70588,2.29487,2,13,5.88889,3.41384,0.49,51,0.128058,0.482712,0.458333,0.12,0.1 
+BFT-21_20_5_80_16_48-18,1,20,5,0.25,0.8,0.1,0.75,0.833333,0,9,4.70588,2.24302,2,13,7.6,4.14407,0.4,41,0.217266,0.520948,0.593333,0.1,0 +BFT-21_20_5_80_16_48-3,1,20,5,0.25,0.8,0.05,0.8,0.941176,0,10,4.70588,2.53829,2,13,6.0625,3.59633,0.48,51,0.22446,0.491569,0.622222,0.08,0 +BFT-21_20_5_80_16_48-8,1,20,5,0.25,0.8,0.05,0.8,0.941176,0,12,4.70588,3.42593,2,12,6.125,3.29536,0.5,52,0,0.454869,0.358333,0.05,0 +BFT-22_20_5_80_16_60-14,1,20,5,0.25,0.8,0.1,0.75,0.882353,0,9,4.70588,1.96333,2,15,6.46667,4.3645,0.46,48,0.258993,0.487745,0.527778,0.1,0.1 +BFT-22_20_5_80_16_60-17,1,20,5,0.25,0.8,0.1,0.85,0.944444,0,9,4.70588,2.62935,3,13,7,3.44708,0.47,49,0.197122,0.511863,0.491667,0.1,0 +BFT-22_20_5_80_16_60-18,1,20,5,0.25,0.8,0.05,0.85,0.894737,0,7,4.70588,1.60017,2,10,5.58824,2.74524,0.44,47,0.115108,0.510294,0.426667,0.05,0 +BFT-22_20_5_80_16_60-20,1,20,5,0.25,0.8,0.05,0.9,1,0,8,4.70588,2.07973,2,14,6.27778,2.7448,0.51,54,0.218705,0.453333,0.571429,0.05,0.25 +BFT-22_20_5_80_16_60-3,1,20,5,0.25,0.8,0.05,0.8,0.842105,0,8,4.70588,1.96333,2,16,6.375,4.04467,0.41,43,0.315108,0.501144,0.644444,0.05,0 +BFT-22_20_5_80_16_60-5,1,20,5,0.25,0.8,0.15,0.8,0.941176,0,11,4.70588,2.62935,2,11,5.875,2.6428,0.48,49,0.335252,0.478235,0.386667,0.16,0 +BFT-22_20_5_80_16_60-6,1,20,5,0.25,0.8,0.15,0.85,1,0,11,4.70588,3.04408,2,13,5.17647,2.87454,0.5,54,0.217266,0.457745,0.466667,0.16,0 +BFT-22_20_5_80_16_60-8,1,20,5,0.25,0.8,0,0.85,0.85,0,11,4.70588,3.00519,2,11,4.76471,2.60157,0.44,48,0.392806,0.487647,0.574074,0,0 +BFT-23_20_5_80_32_48-1,1,20,5,0.25,0.8,0.05,0.9,1,0,6,2.42424,1.41486,2,22,11.6111,7.04855,0.47,50,0.215827,0.470185,0.516667,0.05,0.05 +BFT-23_20_5_80_32_48-11,1,20,5,0.25,0.8,0.1,0.8,0.888889,0,8,2.42424,1.79275,4,18,9.5,4,0.53,56,0.166906,0.421549,0.8,0.12,0 +BFT-23_20_5_80_32_48-14,1,20,5,0.25,0.8,0.1,0.75,0.882353,0,6,2.42424,1.47772,3,22,9.26667,4.94593,0.5,53,0.0359712,0.40963,0.35,0.14,0.15 
+BFT-23_20_5_80_32_48-16,1,20,5,0.25,0.8,0.05,0.85,0.944444,0,6,2.42424,1.49809,2,23,11.6471,7.52987,0.52,55,0.0230216,0.429714,0.433333,0.07,0 +BFT-23_20_5_80_32_48-19,1,20,5,0.25,0.8,0.05,0.75,0.833333,0,6,2.42424,1.57692,2,22,10.1333,6.33368,0.48,51,0.0690647,0.435657,0.35,0.05,0 +BFT-23_20_5_80_32_48-2,1,20,5,0.25,0.8,0.05,0.9,1,0,5,2.42424,1.34908,2,24,9.16667,6.12146,0.47,49,0.103597,0.45665,0.9,0.07,0 +BFT-23_20_5_80_32_48-8,1,20,5,0.25,0.8,0.05,0.7,0.777778,0,6,2.42424,1.43612,2,26,11.0714,7.75946,0.43,46,0.218705,0.51202,0.658333,0.05,0 +BFT-24_20_5_80_32_60-1,1,20,5,0.25,0.8,0.05,0.85,0.894737,0,6,2.42424,1.30338,3,22,9.05882,4.92856,0.47,51,0.428777,0.467879,0.35,0.05,0 +BFT-24_20_5_80_32_60-10,1,20,5,0.25,0.8,0,0.85,0.894737,0,7,2.42424,1.49809,2,28,11.4118,7.21973,0.48,52,0.264748,0.47899,0.533333,0,0 +BFT-24_20_5_80_32_60-14,1,20,5,0.25,0.8,0.1,0.8,0.888889,0,6,2.42424,1.70614,2,28,11.25,7.12829,0.48,50,0.0676259,0.448838,0.48,0.13,0 +BFT-24_20_5_80_32_60-18,1,20,5,0.25,0.8,0,0.8,0.888889,0,6,2.42424,1.41486,3,24,10.75,6.0156,0.53,58,0.368345,0.502155,0.35,0,0 +BFT-24_20_5_80_32_60-4,1,20,5,0.25,0.8,0.15,0.7,0.823529,0,5,2.42424,1.41486,2,18,8.85714,5.16661,0.45,47,0.0359712,0.438131,0.433333,0.15,0.1 +BFT-24_20_5_80_32_60-8,1,20,5,0.25,0.8,0.1,0.8,0.888889,0,6,2.42424,1.32643,2,28,10.625,6.41166,0.47,51,0.128058,0.487054,0.477778,0.14,0 +BFT-25_20_8_96_19_58-12,1,20,8,0.4,0.6,0.1,0.8,0.888889,0,9,4.8,2.76767,2,15,7.25,4.22049,0.38125,62,0.431655,0.472843,0.641975,0.4,0 +BFT-25_20_8_96_19_58-3,1,20,8,0.4,0.6,0.15,0.85,1,0,11,4.8,2.92575,2,12,5.70588,3.30356,0.38125,62,0.221223,0.413019,0.75,0.30625,0 +BFT-28_20_8_96_38_72-19,1,20,8,0.4,0.6,0.05,0.85,0.894737,0,6,2.46154,1.56641,2,30,9.88235,8.9435,0.35625,58,0.254496,0.417628,0.490741,0.11875,0 +BFT-2_16_5_48_10_36-11,1,16,5,0.3125,0.6,0.25,0.625,0.909091,0,8,4.36364,2.45959,3,7,4.7,1.41774,0.35,28,0.555556,0.398106,0.448718,0.35,0 
+BFT-2_16_5_48_10_36-13,1,16,5,0.3125,0.6,0.1875,0.5,0.8,0,8,4.36364,2.45959,2,8,5.125,1.83286,0.3,24,0.464444,0.462058,0.717949,0.3625,0 +BFT-2_16_5_48_10_36-14,1,16,5,0.3125,0.6,0.125,0.6875,0.846154,0,9,4.36364,2.56808,2,10,5,2.41209,0.3125,25,0.375556,0.436364,0.769231,0.25,0 +BFT-2_16_5_48_10_36-15,1,16,5,0.3125,0.6,0.0625,0.6875,0.916667,0,8,4.36364,2.1436,2,8,3.72727,1.81363,0.3375,27,0.451111,0.430366,0.634615,0.2875,0 +BFT-2_16_5_48_10_36-16,1,16,5,0.3125,0.6,0.0625,0.5625,0.75,0,7,4.36364,1.82272,2,6,3.55556,1.25708,0.275,22,0.424444,0.431503,0.673077,0.1375,0 +BFT-2_16_5_48_10_36-17,1,16,5,0.3125,0.6,0.125,0.625,0.909091,0,7,4.36364,2.10077,3,6,4.4,1.11355,0.3125,26,0.171111,0.409722,0.576923,0.225,0 +BFT-2_16_5_48_10_36-18,1,16,5,0.3125,0.6,0.0625,0.5625,0.818182,0,9,4.36364,2.42235,2,7,3.33333,1.82574,0.3125,25,0.522222,0.406187,0.673077,0.0875,0 +BFT-2_16_5_48_10_36-20,1,16,5,0.3125,0.6,0.1875,0.5625,0.75,0,9,4.36364,2.26727,2,5,3.44444,1.16534,0.2875,23,0.731111,0.409533,0.628205,0.275,0 +BFT-2_16_5_48_10_36-3,1,16,5,0.3125,0.6,0.125,0.5625,0.75,0,10,4.36364,2.80495,2,6,3.11111,1.1967,0.3125,25,0.482222,0.397664,0.807692,0.25,0 +BFT-2_16_5_48_10_36-4,1,16,5,0.3125,0.6,0.125,0.5625,0.818182,0,9,4.36364,2.22681,2,10,4.44444,2.62937,0.3,24,0.717778,0.443497,0.557692,0.2125,0 +BFT-2_16_5_48_10_36-5,1,16,5,0.3125,0.6,0.0625,0.75,1,0,9,4.36364,2.34609,2,7,4,1.68325,0.35,29,0.18,0.413258,0.480769,0.3125,0 +BFT-2_16_5_48_10_36-8,1,16,5,0.3125,0.6,0,0.625,0.769231,0,10,4.36364,2.77235,2,8,4.2,1.83303,0.2625,23,0.335556,0.426326,0.407692,0,0 +BFT-2_16_5_48_10_36-9,1,16,5,0.3125,0.6,0.1875,0.5,0.727273,0,9,4.36364,2.42235,2,8,4.125,2.14695,0.2375,19,0.764444,0.393434,0.740385,0.275,0 +BFT-3_16_5_48_19_29-1,1,16,5,0.3125,0.6,0.125,0.75,1,0,5,2.4,1.52971,2,16,6.66667,4.22953,0.375,30,0.566667,0.400486,0.589744,0.225,0 +BFT-3_16_5_48_19_29-10,1,16,5,0.3125,0.6,0.125,0.6875,0.846154,0,6,2.4,1.42829,2,13,6.09091,2.93736,0.275,22,0.0933333,0.425417,0.507692,0.3875,0 
+BFT-3_16_5_48_19_29-11,1,16,5,0.3125,0.6,0.125,0.625,0.833333,0,5,2.4,1.46287,4,18,7.3,4.77598,0.375,31,0.422222,0.340972,0.432692,0.25,0.125 +BFT-3_16_5_48_19_29-14,1,16,5,0.3125,0.6,0.0625,0.5625,0.692308,0,5,2.4,1.71464,2,15,6.88889,4.72451,0.3,24,0.611111,0.46691,0.446154,0.275,0 +BFT-3_16_5_48_19_29-15,1,16,5,0.3125,0.6,0.125,0.5,0.666667,0,6,2.4,1.42829,2,14,9,4.41588,0.275,22,0.424444,0.40441,0.538462,0.225,0 +BFT-3_16_5_48_19_29-16,1,16,5,0.3125,0.6,0.1875,0.75,1,0,7,2.4,1.90788,3,11,6.5,2.92973,0.3625,29,0.568889,0.394757,0.5,0.4,0.125 +BFT-3_16_5_48_19_29-17,1,16,5,0.3125,0.6,0.125,0.5625,0.9,0,7,2.4,1.85472,2,18,7,4.8074,0.35,28,0.577778,0.363611,0.548077,0.1375,0 +BFT-3_16_5_48_19_29-18,1,16,5,0.3125,0.6,0.25,0.625,0.833333,0,7,2.4,1.772,2,16,7.2,4.996,0.35,28,0.186667,0.377569,0.679487,0.3375,0 +BFT-3_16_5_48_19_29-19,1,16,5,0.3125,0.6,0.1875,0.625,0.909091,0,7,2.4,1.62481,2,13,7.5,3.72156,0.4125,34,0.1,0.361146,0.269231,0.2625,0 +BFT-3_16_5_48_19_29-3,1,16,5,0.3125,0.6,0.1875,0.6875,0.846154,0,7,2.4,2.0347,2,15,8.18182,4.04111,0.2625,21,0.635556,0.425833,0.519231,0.3375,0 +BFT-3_16_5_48_19_29-4,1,16,5,0.3125,0.6,0.0625,0.5625,0.75,0,6,2.4,1.68523,3,11,6.66667,2.35702,0.325,28,0.657778,0.377431,0.461538,0.15,0.0625 +BFT-3_16_5_48_19_29-5,1,16,5,0.3125,0.6,0.125,0.6875,0.785714,0,7,2.4,1.49666,2,16,7.36364,4.39572,0.325,26,0.36,0.41934,0.384615,0.3375,0.125 +BFT-3_16_5_48_19_29-7,1,16,5,0.3125,0.6,0.25,0.5625,0.818182,0,6,2.52632,1.49977,2,10,5.77778,2.04275,0.3375,27,0.0955556,0.371455,0.403846,0.3,0 +BFT-4_16_5_48_19_36-14,1,16,5,0.3125,0.6,0.125,0.6875,0.846154,0,5,2.4,1.52971,3,16,6.90909,3.60441,0.3375,28,0.431111,0.405278,0.596154,0.2875,0.0625 +BFT-4_16_5_48_19_36-15,1,16,5,0.3125,0.6,0.1875,0.625,0.833333,0,6,2.4,1.59374,2,13,5.8,3.18748,0.3375,28,0.546667,0.403681,0.476923,0.3375,0.0625 +BFT-4_16_5_48_19_36-16,1,16,5,0.3125,0.6,0.1875,0.6875,0.916667,0,10,2.4,2.63439,3,13,7.72727,3.95637,0.325,26,0.18,0.415174,0.692308,0.2875,0 
+BFT-4_16_5_48_19_36-17,1,16,5,0.3125,0.6,0.1875,0.5,0.666667,0,5,2.4,1.35647,2,12,6.875,3.0999,0.275,22,0.482222,0.393924,0.442308,0.275,0 +BFT-4_16_5_48_19_36-18,1,16,5,0.3125,0.6,0.125,0.625,0.909091,0,6,2.4,1.85472,2,14,6.3,4.02616,0.3375,27,0.686667,0.338507,0.307692,0.2,0 +BFT-4_16_5_48_19_36-19,1,16,5,0.3125,0.6,0.3125,0.6875,1,0,6,2.4,1.62481,3,17,6.36364,4.14011,0.3375,27,0.16,0.384965,0.630769,0.3625,0.125 +BFT-4_16_5_48_19_36-20,1,16,5,0.3125,0.6,0.125,0.6875,0.916667,0,5,2.4,1.42829,2,16,6.90909,3.98758,0.3375,27,0.215556,0.435069,0.512821,0.2125,0.0625 +BFT-4_16_5_48_19_36-3,1,16,5,0.3125,0.6,0.1875,0.5625,0.818182,0,5,2.4,1.28062,2,13,6.88889,4.01233,0.3625,29,0.471111,0.393299,0.507692,0.2625,0 +BFT-4_16_5_48_19_36-7,1,16,5,0.3125,0.6,0.0625,0.6875,0.785714,0,7,2.4,1.8,4,12,7.09091,2.4663,0.275,24,0.315556,0.416771,0.75,0.1,0.125 +BFT-4_16_5_48_19_36-8,1,16,5,0.3125,0.6,0,0.8125,0.928571,0,9,2.4,2.10713,3,14,6.92308,3.64716,0.3125,27,0.288889,0.41941,0.673077,0,0 +BFT-4_16_5_48_19_36-9,1,16,5,0.3125,0.6,0.1875,0.5625,0.75,0,5,2.4,1.46287,2,19,8.22222,5.3287,0.2625,21,0.386667,0.425,0.769231,0.2875,0.0625 +BFT-5_16_5_64_13_38-10,1,16,5,0.3125,0.8,0,0.8125,0.928571,0,8,4.57143,1.91663,2,9,5.30769,1.72692,0.5375,46,0.346667,0.481349,0.538462,0,0.125 +BFT-5_16_5_64_13_38-11,1,16,5,0.3125,0.8,0.0625,0.875,0.933333,0,7,4.57143,1.91663,2,12,5.78571,2.62348,0.475,40,0.164444,0.518998,0.476923,0.0625,0 +BFT-5_16_5_64_13_38-14,1,16,5,0.3125,0.8,0.0625,0.8125,0.928571,0,9,4.57143,2.44114,2,10,5.23077,2.85964,0.4375,37,0,0.509871,0.692308,0.0875,0 +BFT-5_16_5_64_13_38-15,1,16,5,0.3125,0.8,0.0625,0.75,0.857143,0,9,4.57143,2.3819,2,8,4.58333,2.01901,0.4,34,0.231111,0.468948,0.525641,0.0625,0 +BFT-5_16_5_64_13_38-16,1,16,5,0.3125,0.8,0,0.8125,0.928571,0,9,4.57143,2.55551,2,10,5.07692,2.30256,0.45,40,0.251111,0.524454,0.615385,0,0 
+BFT-5_16_5_64_13_38-18,1,16,5,0.3125,0.8,0.0625,0.8125,0.928571,0,10,4.57143,2.5274,2,13,6.15385,3.52674,0.45,37,0.473333,0.510069,0.621795,0.15,0.0625 +BFT-5_16_5_64_13_38-2,1,16,5,0.3125,0.8,0.125,0.875,1,0,8,4.92308,2.12898,2,9,3.85714,1.84612,0.4125,35,0.186667,0.505342,0.642857,0.125,0.125 +BFT-5_16_5_64_13_38-3,1,16,5,0.3125,0.8,0,0.875,1,0,10,4.57143,2.69163,2,8,4.85714,1.95876,0.4625,40,0.04,0.509673,0.615385,0,0 +BFT-5_16_5_64_13_38-4,1,16,5,0.3125,0.8,0,0.875,0.933333,0,11,4.57143,2.71804,2,11,5.14286,2.66879,0.4625,40,0.202222,0.495734,0.336538,0,0 +BFT-5_16_5_64_13_38-5,1,16,5,0.3125,0.8,0,0.8125,0.866667,0,7,4.57143,1.80136,2,11,5.53846,2.73483,0.45,39,0.253333,0.541815,0.598901,0,0 +BFT-5_16_5_64_13_38-6,1,16,5,0.3125,0.8,0.0625,0.75,0.8,0,11,4.57143,2.79577,2,6,4.25,1.36168,0.475,41,0.0977778,0.452827,0.730769,0.0875,0 +BFT-5_16_5_64_13_38-8,1,16,5,0.3125,0.8,0.0625,0.875,0.933333,0,9,4.57143,2.12852,2,11,6.85714,2.97266,0.475,40,0.253333,0.546528,0.553846,0.0875,0.0625 +BFT-6_16_5_64_13_48-11,1,16,5,0.3125,0.8,0.0625,0.875,0.933333,0,7,4.57143,1.91663,2,8,4.5,1.67971,0.4375,39,0.155556,0.513046,0.569231,0.0625,0 +BFT-6_16_5_64_13_48-12,1,16,5,0.3125,0.8,0.125,0.875,1,0,12,4.57143,3.04054,3,11,6.71429,2.43277,0.5375,46,0.455556,0.555605,0.509615,0.125,0 +BFT-6_16_5_64_13_48-13,1,16,5,0.3125,0.8,0.0625,0.9375,1,0,12,4.57143,3.26703,2,10,4.33333,2.82056,0.45,39,0.4,0.487302,0.346154,0.0625,0 +BFT-6_16_5_64_13_48-14,1,16,5,0.3125,0.8,0.0625,0.75,0.8,0,8,4.57143,1.63507,2,10,5.25,2.89036,0.425,36,0.115556,0.537698,0.453846,0.075,0.0625 +BFT-6_16_5_64_13_48-15,1,16,5,0.3125,0.8,0.0625,0.6875,0.785714,0,10,4.57143,2.66497,2,10,5.63636,2.53243,0.525,44,0.08,0.533036,0.653846,0.1,0.1875 +BFT-6_16_5_64_13_48-17,1,16,5,0.3125,0.8,0.125,0.8125,1,0,13,4.57143,2.96923,2,9,4.92308,1.77424,0.5375,46,0.606667,0.463492,0.538462,0.1375,0 
+BFT-6_16_5_64_13_48-2,1,16,5,0.3125,0.8,0.0625,0.875,1,0,10,4.57143,2.7701,2,10,5,2.23607,0.5,44,0.471111,0.475397,0.403846,0.0625,0 +BFT-6_16_5_64_13_48-20,1,16,5,0.3125,0.8,0.125,0.8125,0.928571,0,10,4.57143,2.32115,2,12,4.84615,2.98319,0.5125,42,0.117778,0.443502,0.461538,0.15,0.0625 +BFT-6_16_5_64_13_48-3,1,16,5,0.3125,0.8,0.0625,0.875,1,0,10,4.57143,2.84641,2,11,4.85714,2.94854,0.5625,48,0.235556,0.472222,0.598901,0.1375,0 +BFT-6_16_5_64_13_48-5,1,16,5,0.3125,0.8,0,0.8125,0.866667,0,8,4.57143,2.47023,2,8,5,1.7097,0.4125,36,0.468889,0.479663,0.471154,0,0.0625 +BFT-6_16_5_64_13_48-6,1,16,5,0.3125,0.8,0.125,0.8125,0.928571,0,7,4.57143,1.67819,2,12,6.46154,3.3193,0.525,43,0.5,0.52123,0.416667,0.1375,0.0625 +BFT-6_16_5_64_13_48-8,1,16,5,0.3125,0.8,0.125,0.8125,0.928571,0,8,4.57143,2.12852,2,11,4.61538,2.94928,0.5,43,0.166667,0.479613,0.546154,0.125,0 +BFT-6_16_5_64_13_48-9,1,16,5,0.3125,0.8,0.0625,0.8125,0.866667,0,8,4.57143,2.12852,2,9,4.30769,2.19736,0.3875,33,0.0444444,0.479266,0.355769,0.0625,0 +BFT-7_16_5_64_26_38-1,1,16,5,0.3125,0.8,0.0625,0.75,0.857143,0,9,2.37037,1.88853,4,17,7.91667,3.6391,0.5125,45,0,0.411394,0.576923,0.0625,0 +BFT-7_16_5_64_26_38-10,1,16,5,0.3125,0.8,0.0625,0.8125,0.866667,0,6,2.37037,1.86881,4,19,10.3077,4.40951,0.4625,39,0,0.505787,0.230769,0.1,0 +BFT-7_16_5_64_26_38-12,1,16,5,0.3125,0.8,0,0.875,0.875,0,8,2.37037,1.76694,2,20,8.42857,6.67313,0.475,40,0.00888889,0.476389,0.519231,0,0.0625 +BFT-7_16_5_64_26_38-14,1,16,5,0.3125,0.8,0.0625,0.8125,1,0,7,2.37037,1.59044,2,20,7.76923,5.89855,0.4625,38,0,0.42482,0.5,0.125,0.125 +BFT-7_16_5_64_26_38-15,1,16,5,0.3125,0.8,0.125,0.8125,0.928571,0,8,2.37037,1.84889,2,15,5.92308,4.4972,0.4375,37,0.337778,0.383873,0.461538,0.125,0 +BFT-7_16_5_64_26_38-2,1,16,5,0.3125,0.8,0.125,0.875,1,0,5,2.37037,1.44397,2,18,9.14286,4.71861,0.5,41,0.111111,0.421116,0.423077,0.15,0 
+BFT-7_16_5_64_26_38-20,1,16,5,0.3125,0.8,0.0625,0.75,0.8,0,5,2.37037,1.51897,2,17,7.41667,4.62706,0.45,38,0.195556,0.484336,0.551282,0.0875,0 +BFT-7_16_5_64_26_38-7,1,16,5,0.3125,0.8,0.0625,0.75,0.8,0,5,2.37037,1.4694,2,17,10.8333,4.31728,0.425,38,0.177778,0.496914,0.461538,0.0625,0 +BFT-7_16_5_64_26_38-8,1,16,5,0.3125,0.8,0,0.8125,0.928571,0,6,2.37037,1.51897,3,19,6.76923,4.20904,0.475,43,0.115556,0.416692,0.605769,0,0 +BFT-7_16_5_64_26_38-9,1,16,5,0.3125,0.8,0.0625,0.8125,0.928571,0,7,2.37037,1.96541,5,16,9.84615,4.16665,0.4625,42,0.106667,0.476466,0.596154,0.0625,0.0625 +BFT-8_16_5_64_26_48-11,1,16,5,0.3125,0.8,0.0625,0.875,1,0,6,2.37037,1.28086,2,19,9,5.87975,0.5125,43,0.06,0.438735,0.365385,0.0625,0 +BFT-8_16_5_64_26_48-15,1,16,5,0.3125,0.8,0.0625,0.8125,0.928571,0,5,2.37037,1.28086,2,14,8.76923,3.8261,0.5375,46,0.0355556,0.437474,0.5,0.1125,0 +BFT-8_16_5_64_26_48-2,1,16,5,0.3125,0.8,0.125,0.6875,0.846154,0,5,2.37037,1.33744,3,19,10.2727,5.04689,0.4875,40,0.02,0.475849,0.846154,0.1375,0 +BFT-8_16_5_64_26_48-6,1,16,5,0.3125,0.8,0,0.875,0.875,0,7,2.37037,1.92735,2,21,9.92857,5.72544,0.475,41,0.342222,0.467618,0.5,0,0 +BFT-8_16_5_64_26_48-8,1,16,5,0.3125,0.8,0.0625,0.8125,0.928571,0,6,2.37037,1.49439,2,17,9.84615,5.461,0.4375,37,0.0555556,0.478987,0.570513,0.0625,0 +BFT-8_16_5_64_26_48-9,1,16,5,0.3125,0.8,0,0.875,1,0,5,2.37037,1.68101,2,14,6.07143,3.71222,0.475,42,0.217778,0.428138,0.346154,0,0.125 +BFT-9_16_8_77_15_46-2,1,16,8,0.5,0.601562,0.0625,0.75,0.8,0,10,4.8125,2.76629,2,11,5.41667,2.89995,0.351562,46,0.391667,0.437624,0.559028,0.0703125,0 +BFT-9_16_8_77_15_46-20,1,16,8,0.5,0.601562,0.0625,0.8125,0.928571,0,8,4.8125,2.15693,2,11,5,2.82843,0.390625,52,0.333333,0.431847,0.550781,0.273438,0 diff --git a/_articles/RJ-2025-045/CPMP-2015_data/index.html b/_articles/RJ-2025-045/CPMP-2015_data/index.html new file mode 100644 index 0000000000..107a4ee6f4 --- /dev/null +++ b/_articles/RJ-2025-045/CPMP-2015_data/index.html @@ -0,0 +1,120 @@ + + + + +Scenario 
overview + + + + +

    Scenario CPMP-2015

    + +
    ## Scenario id                           : CPMP-2015
    +## Performance measures                  : runtime
    +## Performance types                     : runtime
    +## Algorithm cutoff time                 : 3600
    +## Algorithm cutoff mem                  : 5120
    +## Instance Feature cutoff time          : 30
    +## Instance Feature cutoff mem           : 512
    +## Algorithm Feature cutoff time         : -
    +## Algorithm Feature cutoff mem          : -
    +## Nr. of instances                      : 527
    +## Instance Features (deterministic) ( 22)  : stacks, tiers, stack.tier.ratio, container.density, empty...
    +## Instance Features (stochastic)        : -
    +## Algorithm Features (deterministic)        : -
    +## Algorithm Features (stochastic)        : -
    +## Feature repetitions                   : 1 - 1
    +## Feature costs                         : No
    +## Algo.                          (  4)  : astar.symmulgt.transmul, astar.symmullt.transmul, idastar...
    +## Algo. repetitions                     : 1 - 1
    +## Algo. runs (inst x algo x rep)        : 2108
    +## Feature steps                  (  3)  : lfa1, lfa2, orig
    +## CV repetitions                        : 1
    +## CV folds                              : 10
    +
    +
+ + + +Back to scenario list + + + + diff --git a/_articles/RJ-2025-045/CPMP-2015_data/readme.txt b/_articles/RJ-2025-045/CPMP-2015_data/readme.txt new file mode 100644 index 0000000000..2f407a1fcc --- /dev/null +++ b/_articles/RJ-2025-045/CPMP-2015_data/readme.txt @@ -0,0 +1,22 @@ +source: An Algorithm Selection Benchmark for the Container Pre-Marshalling Problem (CPMP) +authors: K. Tierney and Y. Malitsky (features) / K. Tierney and D. Pacino and S. Voss (algorithms) +translator in coseal format: K. Tierney + +This is an extension of the 2013 premarshalling dataset that includes more features and a set of test instances. + +There are three sets of features: + +feature_values.arff contains the full set of features from iteration 2 of our latent feature analysis (LFA) process (see paper) +feature_values_itr1.arff contains only the features after iteration 1 of LFA +feature_values_orig.arff contains the features used in PREMARSHALLING-ASTAR-2013 + +We also provide test data with an identical naming scheme (see _test). + +The features for the pre-marshalling problem are all extremely easy and fast to +compute, thus the feature_costs.arff file has been omitted, as it would be time +0 for every feature (regardless of using original, iteration 1 or iteration 2 +features). + +The feature computation code is available at https://bitbucket.org/eusorpb/cpmp-as + +Note: previously the scenario was called PREMARSHALLING-ASTAR-2015. To save some space, we renamed the scenario.
\ No newline at end of file diff --git a/_articles/RJ-2025-045/RJ-2025-045.R b/_articles/RJ-2025-045/RJ-2025-045.R new file mode 100644 index 0000000000..3d0867e1f9 --- /dev/null +++ b/_articles/RJ-2025-045/RJ-2025-045.R @@ -0,0 +1,687 @@ +# Generated by `rjournal_pdf_article()` using `knitr::purl()`: do not edit by hand +# Please edit RJ-2025-045.Rmd to modify this file + +## ----setup, include=FALSE----------------------------------------------------- +knitr::opts_chunk$set(echo = FALSE, warning = FALSE, message = FALSE, fig.path = "figures/", cache = TRUE, fig.align = "center", fig.pos = "ht") +library(ASML) +library(knitr) +library(kableExtra) +library(ggplot2) +library(viridisLite) +library(caret) +library(aslib) +library(llama) +library(rvest) +library(dplyr) +library(reshape2) +set.seed(1234) + + +## ----ASkerschke, include=TRUE, out.width = "100%",fig.align="center", fig.cap = "Schematic overview of the interplay between problem instance features (top left), algorithm performance data (bottom left), selector construction (center), and the assessment of selector performance (bottom right). Adapted from Kerschke et al. (2019).", echo=FALSE---- +if (knitr::is_html_output()) { + knitr::include_graphics("figures/AS_kerschke_drawio.png") +} else if (knitr::is_latex_output()) { + knitr::include_graphics("figures/AS_kerschke_drawio.pdf") +} + + +## ----rice, include=TRUE, out.width = "80%",fig.align="center", fig.cap = "Scheme of the algorithm selection problem by Rice (1976).", echo=FALSE---- +knitr::include_graphics("figures/ASP_Rice.png") + + +## ----ASMLvsllama, echo=FALSE, message=FALSE----------------------------------- +library(knitr) +library(kableExtra) + +if (knitr::is_latex_output()) { + check <- "\\ding{51}" + cross <- "\\ding{55}" + br <- "\\hspace{4cm}" +} else { + check <- "✓" # HTML ✓ + cross <- "✗" # HTML ✗ + br <- "
    " # salto de línea en HTML +} + +checklist <- data.frame( + Aspect = c( + "Input data", + "Normalized KPIs", + "ML backend", + "Hyperparameter tuning", + "Parallelization", + "Results summary", + "Visualization", + "Model interpretability tools", + "ASlib integration", + "Latest release" + ), + ASML = c( + paste0("features", br, " KPIs", br, "split by families supported"), + paste0(check), + paste0("caret"), + paste0("ASML::AStrain()", br, "supports arguments passed to caret (trainControl(), tuneGrid)"), + paste0(check, br, " with snow"), + paste0( + "Per algorithm", br, + "Best overall and per instance", br, + "ML-selected" + ), + paste0( + "Boxplots (per algorithm and ML-selected)", br, + "Ranking plots", br, + "Barplots (best vs ML-selected)" + ), + paste0(check, br, " with DALEX"), + paste0("basic support"), + "CRAN 1.1.0 (2025)" + ), + llama = c( + paste0("features", br, " KPIs", br, "feature costs supported"), + paste0(cross), + paste0("mlr"), + paste0("llama::cvFolds", br, "llama::tuneModel"), + paste0(check, br, " with parallelMap"), + paste0( + "Virtual best and single best per instance", br, + "Aggregated scores (PAR, count, successes)" + ), + paste0("Scatter plots comparing two algorithm selectors"), + paste0(cross), + paste0("extended support"), + "CRAN 0.10.1 (2021)" + ), + stringsAsFactors = FALSE +) + +is_latex <- knitr::is_latex_output() +if (knitr::is_latex_output()) { + fs <- 9 +} else { + fs <- NULL +} +tbl <- kbl( + checklist, + caption = "Comparative overview of ASML and llama for algorithm selection.", + escape = FALSE, + booktabs = TRUE, + align = "lcc" +) + +if (is_latex) { + tbl %>% + kable_styling(latex_options = c("hold_position", "striped"), font_size = fs) %>% + kableExtra::row_spec(0, bold = TRUE) %>% + column_spec(1, latex_column_spec = ">{\\\\raggedright\\\\arraybackslash}m{3cm}") %>% + column_spec(2, latex_column_spec = ">{\\\\centering\\\\arraybackslash}m{4.5cm}") %>% + column_spec(3, latex_column_spec = 
">{\\\\centering\\\\arraybackslash}m{4.5cm}") +} else { + tbl %>% + kable_styling( + bootstrap_options = c("striped", "hover", "condensed"), + full_width = FALSE + ) %>% + column_spec(1, width = "25%") %>% + column_spec(2:3, width = "35%") +} + + +## ----preparedata, echo=FALSE, eval=TRUE--------------------------------------- +data(branching) +datap <- branching +features.complete <- datap$x +featcol <- c( + "number_of_variables", + "percentage_of_variables_degree_one", + "percentage_of_variables_degree_two", + "mean_variable_ranges", + "variance_variable_ranges", + "q50_variable_ranges", + "variance_variable_densities_RLT_variables", + "mean_percentage_of_constraints_and_objfun_in_which_each_variable_appears", + "variance_percentage_of_constraints_and_objfun_in_which_each_variable_appears", + "number_of_constraints", + "number_of_equality_constraints_:_number_of_constraints", + "number_of_linear_constraints_:_number_of_constraints", + "number_of_quadratic_constraints_:_number_of_constraints", + "degree", + "number_of_monomials", + "density", + "number_of_linear_monomials_:_number_of_monomials", + "number_of_quadratic_monomials_:_number_of_monomials", + "number_of_linear_rlt_variables_:_number_of_rlt_variables", + "number_of_quadratic_rlt_variables_:_number_of_rlt_variables", + "mean_percentage_of_monomials_in_each_constraint_and_objfun", + "mean_coefficients", + "variance_coefficients", + "number_of_variables_:_number_of_constraints_and_objfun", + "number_of_variables_:_degree", + "number_of_monomials_:_number_of_constraints_and_objfun", + "number_of_rlt_variables_:_number_of_constraints_and_objfun", + "intersection_graph_density", + "alternative_graph_density", + "intersection_greedy_max_modularity", + "alternative_greedy_max_modularity", + "intersection_graph_treewidth", + "alternative_graph_treewidth" +) +Description <- c( + "Number of variables", + "Pct. of variables not present in any monomial with degree greater than one", + "Pct. 
of variables not present in any monomial with degree greater than two", + "Average of the ranges of the variables", + "Variance of the ranges of the variables", + "Median of the ranges of the variables", + "Variance of the density of the variables", + "Average of the no. of appearances of each variable", + "Variance of the no. of appearances of each variable", + "Number of constraints", + "Pct. of equality constraints", + "Pct. of linear constraints", + "Pct. of quadratic constraints", + "Degree", + "Number of monomials", + "Density", + "Pct. of linear monomials", + "Pct. of quadratic monomials", + "Pct. of linear RLT variables", + "Pct. of quadratic RLT variables", + "Average pct. of monomials in each constraint and in the objective function", + "Average of the coefficients", + "Variance of the coefficients", + "Number of variables divided by number of constraints", + "Number of variables divided by degree", + "Number of monomials divided by number of constraints", + "Number of RLT variables divided by number of constraints", + "Density of VIG", + "Density of CMIG", + "Modularity of VIG", + "Modularity of CMIG", + "Treewidth of VIG", + "Treewidth of CMIG" +) +features <- features.complete[, which(names(features.complete) %in% featcol)] + + +## ----optsum------------------------------------------------------------------- +aux <- paste(table(branching$x[, 1]), " (", names(table(branching$x[, 1])), ")", sep = "", collapse = ", ") +C1 <- c("Algorithms", "KPI", "Number of instances", "Number of instances per library", "Number of features") +C2 <- c("max, sum, dual, range, eig-VI, eig-CMI", "pace", nrow(features), aux, ncol(features)) +df <- data.frame(C1, C2) +df$C2[1:2] <- kableExtra::cell_spec(df$C2[1:2], monospace = TRUE) +colnames(df) <- NULL +if (knitr::is_latex_output()) { + fs <- 9 +} else { + fs <- NULL +} +kableExtra::kbl(df, escape = FALSE, booktabs = TRUE, caption = "Summary of the branching rule selection problem.") %>% + kableExtra::row_spec(1, extra_css = 
"border-top: 1px solid") %>% + kableExtra::row_spec(nrow(df), extra_css = "border-bottom: 1px solid") %>% + kableExtra::kable_styling(full_width = FALSE, position = "center", latex_options = "hold_position", font_size = fs) + + +## ----featsum------------------------------------------------------------------ +# Filter the columns that are in featcol +filtered_columns <- names(datap$x)[names(datap$x) %in% featcol] +filtered_indices <- which(names(datap$x) %in% filtered_columns) +# Get the descriptions in the order of the columns of datap$x +filtered_descriptions <- Description[match(filtered_columns, featcol)] +df <- data.frame(filtered_indices, filtered_descriptions) +df[, 1] <- cell_spec(df[, 1], monospace = TRUE) +colnames(df) <- c("Index", "Description") +kableExtra::kbl(df, escape = FALSE, booktabs = TRUE, caption = "Features from the branching dataset.") %>% + kableExtra::row_spec(0, bold = TRUE) %>% + kableExtra::kable_styling(full_width = FALSE, position = "center") %>% + kableExtra::footnote(general = "Index refers to columns of branching$x.") + + +## ----ASMLcall, echo = TRUE, tidy=TRUE----------------------------------------- +set.seed(1234) +library(ASML) +data(branching) +features <- branching$x +KPI <- branching$y +lab_rules <- c("max", "sum", "dual", "range", "eig-VI", "eig-CMI") + + +## ----ASMLpre, echo = TRUE, tidy=TRUE------------------------------------------ +data <- partition_and_normalize(features, KPI, family_column = 1, split_by_family = TRUE, better_smaller = TRUE) +names(data) + + +## ----partitionPLOT, echo = FALSE, tidy=TRUE, out.width = ifelse(knitr::is_latex_output(), "70%" , "100%"), fig.cap="Train/Test partition preserving the percentage of instances for each library.", fig.alt = "Train/Test partition preserving the percentage of instances for each library."---- +library(ggplot2) +library(viridisLite) +data_plot <- rbind( + data.frame( + Family = data$families.train[[1]], + Dataset = "Train" + ), + data.frame( + Family = 
data$families.test[[1]], + Dataset = "Test" + ) +) +data_plot$Dataset <- factor(data_plot$Dataset, levels = c("Train", "Test")) +ggplot(data_plot, aes(x = Dataset, fill = Family)) + + geom_bar(position = "fill") + + labs( + title = "Train/Test Partition", + y = "Proportion" + ) + + scale_fill_viridis_d(name = "Library") + + theme_minimal() + + scale_x_discrete(labels = c( + "Train" = paste0("Train (n=", nrow(data$families.train), ")"), + "Test" = paste0("Test (n=", nrow(data$families.test), ")") + )) + + +## ----splitPLOT, echo = TRUE, tidy=TRUE, out.width = ifelse(knitr::is_latex_output(), "70%" , "100%"), fig.cap="Boxplots of instance-normalized KPI for each algorithm across instances in the train set.", fig.alt = "Boxplots of instance-normalized KPI for each algorithm across instances in the train set."---- +boxplots(data, test = FALSE, by_families = FALSE, labels = lab_rules) + + +## ----rank, echo = TRUE, tidy=TRUE, out.width = ifelse(knitr::is_latex_output(), "70%" , "100%"), fig.cap="Ranking of algorithms based on the instance-normalized KPI for the training sample, categorized by family. The bars represent the percentage of times each algorithm appeared in different ranking positions, with the numbers indicating the mean value of the KPI.", fig.alt = "Ranking of algorithms based on the instance-normalized KPI for the training sample, categorized by family. 
The bars represent the percentage of times each algorithm appeared in different ranking positions, with the numbers indicating the mean value of the KPI."---- +ranking(data, test = FALSE, by_families = TRUE, labels = lab_rules) + + +## ----precaret, echo = TRUE, tidy=TRUE----------------------------------------- +preProcValues <- caret::preProcess(data$x.train, method = "YeoJohnson") +data$x.train <- predict(preProcValues, data$x.train) +data$x.test <- predict(preProcValues, data$x.test) + + +## ----AMSLtrain, echo = TRUE, tidy=TRUE---------------------------------------- +library(quantregForest) +tune_grid <- expand.grid(mtry = 10) +training <- AStrain(data, method = "qrf", tuneGrid = tune_grid) + + +## ----AMSLpred, echo = TRUE, tidy=TRUE----------------------------------------- +predict_test <- ASpredict(training, newdata = data$x.test) + + +## ----AMSLtab, echo = TRUE, tidy=TRUE, eval=FALSE------------------------------ +# KPI_table(data, predictions = predict_test) + + +## ----AMSLtab2, echo = FALSE, tidy=TRUE, eval=TRUE----------------------------- +KPItab <- KPI_table(data, predictions = predict_test) +KPItab <- round(KPItab, 3) +rownames(KPItab) <- c("ML", "max", "sum", "dual", "range", "eig-VI", "eig-CMI") +rownames(KPItab) <- kableExtra::cell_spec(rownames(KPItab), monospace = TRUE) +# Define column names based on the output format +if (knitr::is_html_output()) { + col_names <- c("Arithmetic mean\ninst-norm KPI", "Geometric mean\n inst-norm KPI", "Arithmetic mean\nnon-norm KPI", "Geometric mean\nnon-norm KPI") + wi <- "3.3cm" +} else { + col_names <- c("Arith. mean\\newline inst-norm KPI", "Geom. mean\\newline inst-norm KPI", "Arith. mean\\newline non-norm KPI", "Geom. 
mean\\newline non-norm KPI") + wi <- "2.5cm" +} +if (knitr::is_latex_output()) { + fs <- 9 +} else { + fs <- NULL +} +kableExtra::kbl(KPItab, + escape = F, caption = "Arithmetic and geometric mean of the KPI (both instance-normalized and non-normalized) for each algorithm on the test set, along with the results for the algorithm selected by the learning model (first row).", + col.names = col_names, + booktabs = TRUE, +) %>% + kableExtra::kable_styling(full_width = FALSE, position = "center", latex_options = "hold_position", font_size = fs) %>% + column_spec(column = 2:5, width = wi) + + +## ----AMSLtabsum, echo = TRUE, tidy=TRUE, eval=FALSE--------------------------- +# KPI_summary_table(data, predictions = predict_test) + + +## ----AMSLtabsum2, echo = FALSE, tidy=TRUE, eval=TRUE-------------------------- +KPItab <- KPI_summary_table(data, predictions = predict_test) +KPItab <- round(KPItab, 3) +rownames(KPItab) <- c("single best", "ML", "optimal") +rownames(KPItab) <- kableExtra::cell_spec(rownames(KPItab), monospace = TRUE) +# Define column names based on the output format +if (knitr::is_html_output()) { + col_names <- c("Arithmetic mean\nnon-norm KPI", "Geometric mean\nnon-norm KPI") + wi <- "3.3cm" +} else { + col_names <- c("Arith. mean\\newline non-norm KPI", "Geom. 
mean\\newline non-norm KPI") + wi <- "2.5cm" +} +if (knitr::is_latex_output()) { + fs <- 9 +} else { + fs <- NULL +} +kableExtra::kbl(KPItab, + escape = F, caption = "Arithmetic and geometric mean of the non-normalized KPI for single best choice, ML choice, and optimal choice.", + col.names = col_names, + booktabs = TRUE, +) %>% + kableExtra::kable_styling(full_width = FALSE, position = "center", latex_options = "hold_position", font_size = fs) %>% + column_spec(column = 2:3, width = wi) + + +## ----ASMLplot, echo = TRUE, tidy=TRUE, eval=FALSE----------------------------- +# boxplots(data, predictions = predict_test, labels = c(lab_rules, "ML")) +# boxplots(data, predictions = predict_test, labels = c(lab_rules, "ML"), by_families = TRUE) +# ranking(data, predictions = predict_test, labels = c("ML", lab_rules), by_families = TRUE) +# figure_comparison(data, predictions = predict_test, by_families = FALSE, labels = lab_rules) + + +## ----ASMLplot1, echo = FALSE, tidy=TRUE, out.width = ifelse(knitr::is_latex_output(), "70%" , "100%"), fig.cap="Boxplots of instance-normalized KPI for each algorithm, including the ML algorithm, across instances in the test set.", fig.alt = "Boxplots of instance-normalized KPI for each algorithm, including the ML algorithm, across instances in the test set."---- +boxplots(data, predictions = predict_test, labels = c(lab_rules, "ML")) + + +## ----ASMLplot2, echo = FALSE, tidy=TRUE, out.width = ifelse(knitr::is_latex_output(), "70%" , "100%"), fig.cap="Boxplots of instance-normalized KPI for each algorithm, including the ML algorithm, across instances in the test set, categorized by family.", fig.alt = "Boxplots of instance-normalized KPI for each algorithm, including the ML algorithm, across instances in the test set, categorized by family."---- +boxplots(data, predictions = predict_test, labels = c(lab_rules, "ML"), by_families = TRUE) + + +## ----ASMLplot3, echo = FALSE, tidy=TRUE, out.width = ifelse(knitr::is_latex_output(), "70%" , 
"100%"), fig.cap="Ranking of algorithms, including the ML algorithm, based on the instance-normalized KPI for the test sample, categorized by family. The bars represent the percentage of times each algorithm appeared in different ranking positions, with the numbers indicating the mean value of the normalized KPI.", fig.alt = "Ranking of algorithms, including the ML algorithm, based on the instance-normalized KPI for the test sample, categorized by family. The bars represent the percentage of times each algorithm appeared in different ranking positions, with the numbers indicating the mean value of the normalized KPI."---- +ranking(data, predictions = predict_test, labels = c("ML", lab_rules), by_families = TRUE) + + +## ----ASMLplot4, echo = FALSE, tidy=TRUE, out.width = ifelse(knitr::is_latex_output(), "70%" , "100%"), fig.cap="Comparison of the best-performing rules: The right stack shows the proportion of times each of the original rules is identified as the best-performing option, while the left stack presents the frequency of selection by ML.", fig.alt = "Comparison of the best-performing rules: The right stack shows the proportion of times each of the original rules is identified as the best-performing option, while the left stack presents the frequency of selection by ML."---- +figure_comparison(data, predictions = predict_test, by_families = FALSE, labels = lab_rules) + + +## ----AMSLtrain2, echo = TRUE, tidy=TRUE, eval=FALSE--------------------------- +# qrf_q_predict <- function(modelFit, newdata, what = 0.5, submodels = NULL) { +# out <- predict(modelFit$finalModel, newdata, what = what) +# if (is.matrix(out)) { +# out <- out[, 1] +# } +# out +# } +# +# predict_test_Q1 <- ASpredict(training, newdata = data$x.test, f = "qrf_q_predict", what = 0.25) +# KPI_summary_table(data, predictions = predict_test_Q1) + + +## ----inline = TRUE, results='asis'-------------------------------------------- +if (knitr::is_html_output()) { + knitr::asis_output("The results 
are summarized in Table \\@ref(tab:AMSLtabsum22).") +} + + +## ----AMSLtabsum22, echo = FALSE, tidy=TRUE, eval=TRUE------------------------- +if (knitr::is_html_output()) { + .GlobalEnv$qrf_q_predict <- function(modelFit, newdata, what = 0.5, submodels = NULL) { + out <- predict(modelFit$finalModel, newdata, what = what) + if (is.matrix(out)) { + out <- out[, 1] + } + out + } + predict_test_Q1 <- ASML::ASpredict(training, newdata = data$x.test, f = "qrf_q_predict", what = 0.25) + KPItab <- KPI_summary_table(data, predictions = predict_test_Q1) + KPItab <- round(KPItab, 3) + rownames(KPItab) <- c("single best", "ML", "optimal") + rownames(KPItab) <- kableExtra::cell_spec(rownames(KPItab), monospace = TRUE) + # Define column names based on the output format + if (knitr::is_html_output()) { + col_names <- c("Arithmetic mean\nnon-norm KPI", "Geometric mean\nnon-norm KPI") + wi <- "3.3cm" + } else { + col_names <- c("Arith. mean\\newline non-norm KPI", "Geom. mean\\newline non-norm KPI") + wi <- "2.5cm" + } + # kableExtra::kbl(KPItab, escape = F, caption=paste0("Arithmetic and geometric mean of the non-normalized KPI #for single best choice, ML choice, and optimal choice. The ML choice is based on the predictions of the", + # bquote(alpha), + # "-conditional quantile for ", + # bquote(alpha), + # "=0.25.") , + if (knitr::is_latex_output()) { + fs <- 9 + } else { + fs <- NULL + } + kableExtra::kbl(KPItab, + escape = F, caption = "Arithmetic and geometric mean of the non-normalized KPI for single best choice, ML choice, and optimal choice. 
The ML choice is based on the predictions of the alpha-conditional quantile for alpha=0.25.", booktabs = TRUE, + col.names = col_names + ) %>% + kableExtra::kable_styling(full_width = FALSE, position = "center", latex_options = "hold_position", font_size = fs) %>% + column_spec(column = 2:3, width = wi) +} else { + cat("") +} + + +## ----ASML_DALEX, echo = TRUE, tidy=TRUE, eval=FALSE--------------------------- +# # Create DALEX explainers for each trained model +# explainers_qrf <- ASexplainer(training, data = data$x.test, y = data$y.test, labels = lab_rules) +# # Compute model performance metrics for each explainer +# mp_qrf <- lapply(explainers_qrf, DALEX::model_performance) +# # Plot the performance metrics +# do.call(plot, unname(mp_qrf)) + theme_bw(base_line_size = 0.5) + + +## ----DALEX1, echo = FALSE, tidy=TRUE, out.width = ifelse(knitr::is_latex_output(), "70%" , "100%"), fig.cap="Reversed empirical cumulative distribution function of the absolute residuals of the trained models.", fig.alt = "Reversed empirical cumulative distribution function of the absolute residuals of the trained models."---- +explainers_qrf <- ASexplainer(training, data = data$x.test, y = data$y.test, labels = lab_rules, verbose = FALSE) +mp_qrf <- lapply(explainers_qrf, DALEX::model_performance) +do.call(plot, unname(mp_qrf)) + theme_bw(base_line_size = 0.5) + + +## ----ASML_DALEX2, echo = TRUE, tidy=TRUE, eval=FALSE-------------------------- +# # Compute feature importance for each model in the explainers list +# vi_qrf <- lapply(explainers_qrf, DALEX::model_parts) +# # Plot the top 5 most important variables for each model +# do.call(plot, c(unname(vi_qrf), list(max_vars = 5))) +# # Compute PDP for the variable "degree" for each model +# pdp_qrf <- lapply(explainers_qrf, DALEX::model_profile, variable = "degree", type = "partial") +# # Plot the PDPs generated +# do.call(plot, unname(pdp_qrf)) + theme_bw(base_line_size = 0.5) + + +## ----AMSLtimes, echo = FALSE, tidy=TRUE,
eval=TRUE---------------------------- +library(kableExtra) +set.seed(1234) +# Table with fixed (precomputed) values +general_times <- data.frame( + Function = c("ASML::partition_and_normalize", "caret::preProcess"), + Time_sec = c(0.03, 1.55) # values obtained in previous runs +) + +# Column names +col_names <- c("Stage", "Execution time (seconds)") + +# Optional width for the time column +wi <- "10em" + +# Build the formatted table +if (knitr::is_latex_output()) { + fs <- 9 +} else { + fs <- NULL +} +kableExtra::kbl( + general_times, + escape = TRUE, + caption = "Execution times (in seconds) on the SpMVformat dataset for the main preprocessing stages.", + col.names = col_names, booktabs = TRUE +) %>% + kableExtra::kable_styling(full_width = FALSE, position = "center", latex_options = "hold_position", font_size = fs) %>% + kableExtra::column_spec(column = 2, width = wi) + + +## ----AMSLtimes2, echo = FALSE, tidy=TRUE, eval=TRUE--------------------------- +library(kableExtra) + +# Table with fixed (precomputed) values +train_times <- data.frame( + Method = c("nnet", "svmRadial", "rf"), + `parallel \\= FALSE` = c(236.58, 881.03, 4753.00), + `parallel \\= TRUE` = c(50.75, 263.60, 1289.68) +) + +# Optional column widths +wi <- "10em" + +# Build a formatted table that is also PDF-compatible +if (knitr::is_latex_output()) { + fs <- 9 +} else { + fs <- NULL +} +kableExtra::kbl( + train_times, + escape = TRUE, # column labels are escaped manually above + caption = "Training times (in seconds) on the SpMVformat dataset for different methods using ASML::AStrain.
The first column shows execution without parallelization (parallel = FALSE) and the second column shows execution with parallelization (parallel = TRUE).", + col.names = c("Method", "parallel = FALSE", "parallel = TRUE"), booktabs = TRUE +) %>% + kableExtra::kable_styling(latex_options = c("hold_position"), font_size = fs) %>% + kableExtra::column_spec(column = 2:3, width = wi) %>% + kableExtra::add_header_above(c(" " = 1, "Execution times (in seconds) of ASML::AStrain" = 2)) + + +## ----ASMLnnet, echo = TRUE, tidy=TRUE, eval=FALSE----------------------------- +# set.seed(1234) +# data(SpMVformat) +# features <- SpMVformat$x +# KPI <- SpMVformat$y +# data <- partition_and_normalize(features, KPI, better_smaller = FALSE) +# preProcValues <- caret::preProcess(data$x.train, method = "YeoJohnson") +# data$x.train <- predict(preProcValues, data$x.train) +# data$x.test <- predict(preProcValues, data$x.test) +# training <- AStrain(data, method = "nnet", parallel = TRUE) +# pred <- ASpredict(training, newdata = data$x.test) +# ranking(data, predictions = pred) + + +## ----ASMLnnet1, echo = FALSE, tidy=TRUE, out.width = ifelse(knitr::is_latex_output(), "70%" , "100%"), fig.cap="Ranking of storage formats, including the ML-selected format, based on the instance-normalized KPI for the test sample. The bars represent the percentage of times each storage format appeared in different ranking positions, with the numbers indicating the mean value of the normalized KPI.", fig.alt = "Ranking of storage formats, including the ML-selected format, based on the instance-normalized KPI for the test sample.
The bars represent the percentage of times each storage format appeared in different ranking positions, with the numbers indicating the mean value of the normalized KPI."---- +data(SpMVformat) +features <- SpMVformat$x +KPI <- SpMVformat$y +data <- partition_and_normalize(features, KPI, better_smaller = FALSE) +preProcValues <- caret::preProcess(data$x.train, method = "YeoJohnson") +data$x.train <- predict(preProcValues, data$x.train) +data$x.test <- predict(preProcValues, data$x.test) +training <- AStrain(data, method = "nnet", parallel = TRUE) +pred <- ASpredict(training, newdata = data$x.test) +ranking(data, predictions = pred) + + +## ----echo = TRUE, tidy=TRUE, eval=TRUE---------------------------------------- +set.seed(1234) +library(tidyverse) +library(rvest) +scen <- "CPMP-2015" +url <- paste0("https://coseal.github.io/aslib-r/scenario-pages/", scen, "/data_files") +page <- read_html(paste0(url, ".html")) +file_links <- page %>% + html_nodes("a") %>% + html_attr("href") + +# Create directory for downloaded files +dir_data <- paste0(scen, "_data") +dir.create(dir_data, showWarnings = FALSE) + +# Download files +for (link in file_links) { + full_link <- ifelse(grepl("^http", link), link, paste0(url, "/", link)) + file_name <- basename(link) + dest_file <- file.path(dir_data, file_name) + if (!is.na(full_link)) { + download.file(full_link, dest_file, mode = "wb", quiet = TRUE) + } +} + + +## ----echo = TRUE, tidy=TRUE, eval=TRUE---------------------------------------- +library(aslib) +ASScen <- aslib::parseASScenario(dir_data) +llamaScen <- aslib::convertToLlama(ASScen) +folds <- llama::cvFolds(llamaScen) + + +## ----echo = TRUE, tidy=TRUE, eval=TRUE---------------------------------------- +KPI <- folds$data[, folds$performance] +features <- folds$data[, folds$features] +cutoff <- ASScen$desc$algorithm_cutoff_time +is.timeout <- ASScen$algo.runstatus[, -c(1, 2)] != "ok" +KPI_pen <- KPI * ifelse(is.timeout, 10, 1) +nins <- length(getInstanceNames(ASScen)) +ID 
<- 1:nins + + +## ----echo = TRUE, tidy=TRUE, eval=TRUE---------------------------------------- +data <- partition_and_normalize(x = features, y = KPI, x.test = features, y.test = KPI, better_smaller = TRUE) +train_control <- caret::trainControl(index = folds$train, savePredictions = "final") +training <- AStrain(data, method = "qrf", trControl = train_control) + + +## ----echo = TRUE, tidy=TRUE, eval=TRUE---------------------------------------- +pred_list <- lapply(training, function(model) { + model$pred %>% + arrange(rowIndex) %>% + pull(pred) +}) + +pred <- do.call(cbind, pred_list) +alg_sel <- apply(pred, 1, which.max) + +succ <- mean(!is.timeout[cbind(ID, alg_sel)]) +par10 <- mean(KPI_pen[cbind(ID, alg_sel)]) +mcp <- mean(KPI[cbind(ID, alg_sel)] - apply(KPI, 1, min)) + + +## ----AMSLtabASLIB, echo = FALSE, tidy=TRUE, eval=TRUE------------------------- +results_table <- data.frame(Model = "ASML qrf", succ = format(succ, nsmall = 3, digits = 3), par10 = format(par10, nsmall = 3, digits = 3), mcp = format(mcp, nsmall = 3, digits = 3)) + +# Add manually defined rows +manual_rows <- data.frame( + Model = c("baseline vbs", "baseline singleBest", "regr.lm", "regr.rpart", "regr.randomForest"), + succ = format(c(1.000, 0.812, 0.843, 0.843, 0.846), nsmall = 3, digits = 3), + par10 = format(c(227.605, 7002.907, 5887.326, 5916.120, 5748.065), nsmall = 3, digits = 3), + mcp = format(c(0.000, 688.774, 556.875, 585.669, 540.574), nsmall = 3, digits = 3) +) + +# Combine with the results table +results_table <- rbind(manual_rows, results_table) + +# Update column names if necessary +colnames(results_table) <- kableExtra::cell_spec(colnames(results_table), monospace = TRUE) + +# Create the table +if (knitr::is_latex_output()) { + fs <- 9 +} else { + fs <- NULL +} +if (knitr::is_html_output()) { + results_table %>% + kableExtra::kbl( + escape = FALSE, caption = "Performance results of various models on the CPMP-2015 dataset. 
The last row represents the performance of the quantile random forest model based on instance-normalized KPI using the \\CRANpkg{ASML} package. + The preceding rows detail the results (all taken from the original ASlib study^[Available at: https://coseal.github.io/aslib-r/scenario-pages/CPMP-2015/llama.html (Accessed October 25, 2024).]) of the virtual best solver (vbs), single best solver (singleBest), and the considered regression methods (linear model, regression trees and regression random forest).", + align = "lrrr" + ) %>% + kableExtra::kable_styling(full_width = FALSE, position = "center", latex_options = "hold_position", font_size = fs) %>% + kableExtra::row_spec(0, bold = TRUE) %>% # Bold header row + kableExtra::row_spec(c(1, 2), background = "#E6F2FA") %>% # Shade the baseline rows + kableExtra::row_spec(3:5, background = "#B3D8E5") %>% # Shade the regression benchmark rows + kableExtra::row_spec(nrow(results_table), background = "#99C4DE") # Shade the ASML result row +} else { + results_table %>% + kableExtra::kbl( + escape = FALSE, caption = "Performance results of various models on the CPMP-2015 dataset. The last row represents the performance of the quantile random forest model based on instance-normalized KPI using the \\CRANpkg{ASML} package.
+ The preceding rows detail the results (all taken from the original ASlib study) of the virtual best solver (vbs), single best solver (singleBest), and the considered regression methods (linear model, regression trees and regression random forest).", booktabs = TRUE, + align = "lrrr" + ) %>% + kableExtra::kable_styling(full_width = FALSE, position = "center", latex_options = "hold_position", font_size = fs) %>% + kableExtra::row_spec(0, bold = TRUE) +} + + +## ----AMSLtabASLIB_OLD, echo = FALSE, tidy=TRUE, eval=FALSE-------------------- +# results_table <- cbind(succ, par10, mcp) +# colnames(results_table) <- kableExtra::cell_spec(colnames(results_table), monospace = TRUE) +# if (knitr::is_latex_output()) { +# fs <- 9 +# } else { +# fs <- NULL +# } +# kableExtra::kbl(results_table, escape = F, booktabs = TRUE, caption = "Performance results of quantile random forest on the CPMP-2015 dataset based on instance-normalized KPI.") %>% +# kableExtra::kable_styling(full_width = FALSE, position = "center", latex_options = "hold_position", font_size = fs) + diff --git a/_articles/RJ-2025-045/RJ-2025-045.Rmd b/_articles/RJ-2025-045/RJ-2025-045.Rmd new file mode 100644 index 0000000000..acf4f9872e --- /dev/null +++ b/_articles/RJ-2025-045/RJ-2025-045.Rmd @@ -0,0 +1,881 @@ +--- +title: 'ASML: An R Package for Algorithm Selection with Machine Learning' +date: '2026-02-10' +abstract: | + For extensively studied computational problems, it is commonly acknowledged that different instances may require different algorithms for optimal performance. The R package ASML focuses on the task of efficiently selecting, from a given portfolio of algorithms, the most suitable one for each specific problem instance, based on significant instance features.
The package allows for the use of the machine learning tools available in the R package caret and additionally offers visualization tools and summaries of results that make it easier to interpret how algorithm selection techniques perform, helping users better understand and assess their behavior and performance improvements. +draft: no +author: +- name: Ignacio Gómez-Casares + affiliation: Universidade de Santiago de Compostela + address: + - Department of Statistics, Mathematical Analysis and Optimization + - Santiago de Compostela, Spain + email: ignaciogomez.casares@usc.es +- name: Beatriz Pateiro-López + affiliation: Universidade de Santiago de Compostela + address: + - Department of Statistics, Mathematical Analysis and Optimization + - CITMAga (Galician Center for Mathematical Research and Technology) + - Santiago de Compostela, Spain + email: beatriz.pateiro@usc.es + orcid: 0000-0002-7714-1835 +- name: Brais González-Rodríguez + affiliation: Universidade de Vigo + address: + - Department of Statistics and Operational Research + - SiDOR Research Group + - Vigo, Spain + email: brais.gonzalez.rodriguez@uvigo.gal +- name: Julio González-Díaz + affiliation: Universidade de Santiago de Compostela + address: + - Department of Statistics, Mathematical Analysis and Optimization + - CITMAga (Galician Center for Mathematical Research and Technology) + - Santiago de Compostela, Spain + email: julio.gonzalez@usc.es + orcid: 0000-0002-4667-4348 +type: package +output: + rjtools::rjournal_web_article: + self_contained: yes + toc: no + mathjax: https://cdn.jsdelivr.net/npm/mathjax@4/tex-mml-chtml.js +bibliography: gomez-pateiro-gonzalez-gonzalez.bib +header-includes: +- \usepackage{longtable} +- \usepackage[table]{xcolor} +- \usepackage{pifont} +- \usepackage{float} +date_received: '2025-03-04' +volume: 17 +issue: 4 +slug: RJ-2025-045 +journal: + lastpage: 236 + firstpage: 216 + +--- + + +```{r setup, include=FALSE} +knitr::opts_chunk$set(echo = FALSE, warning = FALSE, 
message = FALSE, fig.path = "figures/", cache = TRUE, fig.align = "center", fig.pos = "ht") +library(ASML) +library(knitr) +library(kableExtra) +library(ggplot2) +library(viridisLite) +library(caret) +library(aslib) +library(llama) +library(rvest) +library(dplyr) +library(reshape2) +set.seed(1234) +``` + +# Introduction + +Selecting from a set of algorithms the most appropriate one for solving a given problem instance (understood as an individual problem case with its own specific characteristics) is a common issue that comes up in many different situations, such as in combinatorial search problems `r knitr::asis_output(ifelse(knitr::is_html_output(), '[@kot16; @dra20]', '\\citep{kot16,dra20}'))`, planning and scheduling problems `r knitr::asis_output(ifelse(knitr::is_html_output(), '[@spe21; @mes14]', '\\citep{spe21,mes14}'))`, or in machine learning (ML), where the multitude of available techniques often makes it challenging to determine the best approach for a particular dataset `r knitr::asis_output(ifelse(knitr::is_html_output(), '[@van19]', '\\citep{van19}'))`. For an extensive survey on automated algorithm selection and application areas, we refer to `r knitr::asis_output(ifelse(knitr::is_html_output(), '@ker19', '\\cite{ker19}'))`. + +Figure `r knitr::asis_output(ifelse(knitr::is_html_output(), '\\@ref(fig:ASkerschke)', '\\@ref(fig:ASkerschke)'))` presents a general scheme, adapted from Figure 1 in `r knitr::asis_output(ifelse(knitr::is_html_output(), '@ker19', '\\cite{ker19}'))`, illustrating the use of ML for algorithm selection. A set of problem instances is given, each described by associated features, together with a portfolio of algorithms that have been evaluated on all instances. The instance features and performance results are then fed into a ML framework, which is trained to produce a selector capable of predicting the best-performing algorithm for an unseen instance. 
Note that we are restricting attention to *offline* algorithm selection, in which the selector is constructed using a training set of instances and then applied to new problem instances. + +```{r, ASkerschke, include=TRUE, out.width = "100%",fig.align="center", fig.cap = "Schematic overview of the interplay between problem instance features (top left), algorithm performance data (bottom left), selector construction (center), and the assessment of selector performance (bottom right). Adapted from Kerschke et al. (2019).", echo=FALSE} +if (knitr::is_html_output()) { + knitr::include_graphics("figures/AS_kerschke_drawio.png") +} else if (knitr::is_latex_output()) { + knitr::include_graphics("figures/AS_kerschke_drawio.pdf") +} +``` + +Algorithm selection tools also demonstrate significant potential in the field of optimization, enhancing performance on problems where multiple solving strategies are often available. For example, a key factor in the efficiency of state-of-the-art global solvers in mixed integer linear programming and also in nonlinear optimization is the design of branch-and-bound algorithms and, in particular, of their branching rules. There is no single branching rule that outperforms all others on every problem instance. Instead, different branching rules exhibit optimal performance on different types of problem instances. Developing methods for the automatic selection of branching rules based on instance features has proven to be an effective strategy toward solving optimization problems more efficiently `r knitr::asis_output(ifelse(knitr::is_html_output(), '[@lod17;@ben21;@gha23]', '\\citep{lod17, ben21, gha23}'))`. + +In algorithm selection, not only do the problem domain to which it applies and the algorithms for addressing problem instances play a crucial role, but also the metrics used to assess algorithm effectiveness ---referred to in this work as Key Performance Indicators (KPIs).
KPIs are used in different fields to assess and measure the performance of specific objectives or goals. In a business context, these indicators are quantifiable metrics that provide valuable insights into how well an individual, team, or entire organization is progressing towards achieving its defined targets. In the context of algorithms, KPIs serve as quantifiable measures used to evaluate the effectiveness and efficiency of algorithmic processes. For instance, in the realm of computer science and data analysis, KPIs can include measures like execution time, accuracy, and scalability. Monitoring these KPIs allows for a comprehensive assessment of algorithmic performance, aiding in the selection of the most appropriate algorithm for a given instance and facilitating continuous improvement in algorithmic design and implementation. + +Additionally, in many applications, normalizing the KPI to a standardized range like $[0, 1]$ provides a more meaningful basis for comparison. The KPI obtained through this process, which we will refer to as instance-normalized KPI, reflects the performance of each algorithm relative to the best-performing one for each specific instance. For example, if we have multiple algorithms and we are measuring execution time that can vary across instances, normalizing the execution time for each instance relative to the fastest algorithm within that same instance allows for a fairer evaluation. This is particularly important when the values of execution time might not directly reflect the relative performance of the algorithms due to wide variations in the scale of the measurements. Thus, normalizing puts all algorithms on an equal footing, allowing a clearer assessment of their relative efficiency. 
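The instance-wise normalization just described can be sketched in a few lines of base R. The matrix `kpi` below is hypothetical (execution times of three algorithms on four instances, smaller being better), and the scaling shown is one plausible convention for mapping the best algorithm to 1, not necessarily the exact transformation implemented in the package:

```r
# Hypothetical KPI matrix: execution times (in seconds) of three algorithms
# (columns) on four instances (rows); smaller is better
kpi <- matrix(c(1.2, 0.8, 2.0,
                5.0, 6.5, 4.9,
                0.3, 0.3, 0.9,
                10,  12,  30),
              nrow = 4, byrow = TRUE,
              dimnames = list(paste0("inst", 1:4), c("A", "B", "C")))

# Instance-normalized KPI: divide each row by its best (smallest) value, so the
# best algorithm on every instance scores exactly 1 and worse ones fall in (0, 1);
# the length-4 vector of row minima recycles down the columns of `kpi`
kpi_norm <- apply(kpi, 1, min) / kpi
round(kpi_norm, 2)
```

After this transformation the best-performing algorithm on each instance scores 1, so values are comparable across instances regardless of their absolute time scale.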
+ +Following the general framework illustrated in Figure `r knitr::asis_output(ifelse(knitr::is_html_output(), '\\@ref(fig:ASkerschke)', '\\@ref(fig:ASkerschke)'))`, the R package \CRANpkg{ASML} `r knitr::asis_output(ifelse(knitr::is_html_output(), '[@Rasml]', '\\citep{Rasml}'))` provides a wrapper for ML methods to select from a portfolio of algorithms based on the value of a given KPI. It uses a set of features in a training set to learn a regression model for the instance-normalized KPI value for each algorithm. Then, the instance-normalized KPI is predicted for unseen test instances, and the algorithm with the best predicted value is chosen. As learning techniques for algorithm selection, the user can invoke any regression method from the \CRANpkg{caret} package `r knitr::asis_output(ifelse(knitr::is_html_output(), '[@Rcaret]', '\\citep{Rcaret}'))` or use a custom function defined by the user. This makes our package flexible, as it automatically supports new methods when they are added to \CRANpkg{caret}. Although initially designed for selecting branching rules in nonlinear optimization problems, its versatility allows the package to effectively address algorithm selection challenges across a wide range of domains. It can be applied to a broad spectrum of disciplines whenever there is a diverse set of instances within a specific problem domain, a suite of algorithms with varying behaviors across instances, clearly defined metrics for evaluating the performance of the available algorithms, and known features or characteristics of the instances that can be computed and are ideally correlated with algorithm performance. The visualization tools implemented in the package allow for an effective evaluation of the performance of the algorithm selection techniques. 
A key distinguishing element of \CRANpkg{ASML} is its learning-phase approach, which uses instance-normalized KPI values and trains a separate regression model for each algorithm to predict its normalized KPI on unseen instances. + +# Background + + +The algorithm selection problem was first outlined in the seminal work by `r knitr::asis_output(ifelse(knitr::is_html_output(), '@ric76', '\\cite{ric76}'))`. In simple terms, for a given set of problem instances (problem space) and a set of algorithms (algorithm space), the goal is to determine a selection model that maps each problem instance to the most suitable algorithm for it. By *most suitable*, we mean the best according to a specific metric that associates each combination of instance and algorithm with its respective performance. Formally, let $\mathcal{P}$ denote the problem space or set of problem instances. The algorithm space or set of algorithms is denoted by $\mathcal{A}$. The metric $p:\mathcal{P}\times\mathcal{A}\rightarrow \mathbb{R}^n$ measures the performance $p(x,A)$ of +any algorithm $A\in \mathcal{A}$ on instance $x\in\mathcal{P}$. The goal is to construct a selector $S:\mathcal{P}\rightarrow \mathcal{A}$ that maps any problem +instance $x\in \mathcal{P}$ to an algorithm $S(x)=A\in \mathcal{A}$ in such a way that its performance is optimal.
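To make the regression-based construction of $S$ concrete, the following self-contained sketch fits one model per algorithm with plain `lm` on simulated data and defines the selector as the argmax of the predicted performances. All names here are hypothetical illustration, not the package's API:

```r
set.seed(1)
# Simulated training data: two features and the performances of two
# algorithms, A and B (larger is better); A dominates when f1 is large
n  <- 200
f  <- data.frame(f1 = runif(n), f2 = runif(n))
pA <- 2 * f$f1 + rnorm(n, sd = 0.1)
pB <- 2 * f$f2 + rnorm(n, sd = 0.1)

# One regression model per algorithm
models <- list(A = lm(pA ~ f1 + f2, data = f),
               B = lm(pB ~ f1 + f2, data = f))

# The selector S: predict each algorithm's performance on the new instances
# and return the name of the algorithm with the best prediction
# (rbind keeps the result a matrix even for a single instance)
S <- function(newdata) {
  preds <- rbind(sapply(models, predict, newdata = newdata))
  colnames(preds)[max.col(preds)]
}

S(data.frame(f1 = c(0.9, 0.1), f2 = c(0.1, 0.9)))
```

With the seed above the call returns `"A"` for the first instance and `"B"` for the second. \CRANpkg{ASML} follows the same per-algorithm scheme but delegates the regression step to \CRANpkg{caret}.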
A scheme of the algorithm selection problem, as described in `r knitr::asis_output(ifelse(knitr::is_html_output(), '@ric76', '\\cite{ric76}'))`, is shown in Figure `r knitr::asis_output(ifelse(knitr::is_html_output(), '\\@ref(fig:rice)', '\\@ref(fig:rice)'))`. + +```{r, rice, include=TRUE, out.width = "80%",fig.align="center", fig.cap = "Scheme of the algorithm selection problem by Rice (1976).", echo=FALSE} +knitr::include_graphics("figures/ASP_Rice.png") +``` + +For the practical derivation of the selection model $S$, we use training data consisting of features $f(x)$ and performances $p(x,A)$, where $x \in \mathcal{P}^\prime \subset \mathcal{P}$ and $A \in \mathcal{A}$. The task is to learn the selector $S$ based on the training data. The model allows us to forecast the performance on unobserved instance problems based on their features and subsequently select the algorithm with the highest predicted performance. A comprehensive discussion of various aspects of algorithm selection techniques can be found in `r knitr::asis_output(ifelse(knitr::is_html_output(), '@kot16', '\\cite{kot16}'))` and `r knitr::asis_output(ifelse(knitr::is_html_output(), '@pul22', '\\cite{pul22}'))`. + +# Algorithm selection tools in R + +The task of algorithm selection has seen significant advancements in recent years, with R packages facilitating this process. Here we present some of the existing tools that offer a range of functionalities, including flexible model-building frameworks, automated workflows, and standardized scenario formats, providing valuable resources for both researchers and end-users in algorithm selection. + +The \CRANpkg{llama} package `r knitr::asis_output(ifelse(knitr::is_html_output(), '[@Rllama]', '\\citep{Rllama}'))` provides a flexible implementation within R for evaluating algorithm portfolios. It simplifies the task of building predictive models to solve algorithm selection scenarios, allowing users to apply ML models effectively. 
In \CRANpkg{llama}, ML algorithms are defined using the \CRANpkg{mlr} package `r knitr::asis_output(ifelse(knitr::is_html_output(), '[@Rmlr]', '\\citep{Rmlr}'))`, offering a structured approach to model selection. On the other hand, the Algorithm Selection Library (ASlib) `r knitr::asis_output(ifelse(knitr::is_html_output(), '[@bis16]', '\\citep{bis16}'))` proposes a standardized format for representing algorithm selection scenarios and introduces a repository that hosts an expanding collection of datasets from the literature. It serves as a benchmark for evaluating algorithm selection techniques under consistent conditions. It is accessible to R users through the \CRANpkg{aslib} package. This integration simplifies the process for those working within the R environment. Furthermore, \CRANpkg{aslib} interfaces with the \CRANpkg{llama} package, facilitating the analysis of algorithm selection techniques within the benchmark scenarios it provides. + +Our \CRANpkg{ASML} package offers an approach to algorithm selection based on the powerful and flexible \CRANpkg{caret} framework. By using \CRANpkg{caret}’s ability to work with many different ML models, along with its model tuning and validation tools, \CRANpkg{ASML} makes the selection process easy and effective, especially for users already familiar with \CRANpkg{caret}. Thus, while \CRANpkg{ASML} shares some conceptual similarities with \CRANpkg{llama}, it distinguishes itself through its interface to the ML models in \CRANpkg{caret} instead of \CRANpkg{mlr}, which is currently considered retired by the mlr-org team, potentially leading to compatibility issues with certain learners, and has been succeeded by the next-generation \CRANpkg{mlr3} `r knitr::asis_output(ifelse(knitr::is_html_output(), '[@Rmlr3]', '\\citep{Rmlr3}'))`. 
In addition, \CRANpkg{ASML} automates the normalization of KPIs based on the best-performing algorithm for each instance, addressing the challenges that arise when performance metrics vary significantly across instances. \CRANpkg{ASML} further provides new visualization tools that can be useful for understanding the results of the learning process. A comparative overview of the main features and differences between these packages can be seen in Table `r knitr::asis_output('\\@ref(tab:ASMLvsllama)')`. + +```{r ASMLvsllama, echo=FALSE, message=FALSE} +library(knitr) +library(kableExtra) + +if (knitr::is_latex_output()) { + check <- "\\ding{51}" + cross <- "\\ding{55}" + br <- "\\hspace{4cm}" +} else { + check <- "✓" # HTML ✓ + cross <- "✗" # HTML ✗ + br <- "
    " # salto de línea en HTML +} + +checklist <- data.frame( + Aspect = c( + "Input data", + "Normalized KPIs", + "ML backend", + "Hyperparameter tuning", + "Parallelization", + "Results summary", + "Visualization", + "Model interpretability tools", + "ASlib integration", + "Latest release" + ), + ASML = c( + paste0("features", br, " KPIs", br, "split by families supported"), + paste0(check), + paste0("caret"), + paste0("ASML::AStrain()", br, "supports arguments passed to caret (trainControl(), tuneGrid)"), + paste0(check, br, " with snow"), + paste0( + "Per algorithm", br, + "Best overall and per instance", br, + "ML-selected" + ), + paste0( + "Boxplots (per algorithm and ML-selected)", br, + "Ranking plots", br, + "Barplots (best vs ML-selected)" + ), + paste0(check, br, " with DALEX"), + paste0("basic support"), + "CRAN 1.1.0 (2025)" + ), + llama = c( + paste0("features", br, " KPIs", br, "feature costs supported"), + paste0(cross), + paste0("mlr"), + paste0("llama::cvFolds", br, "llama::tuneModel"), + paste0(check, br, " with parallelMap"), + paste0( + "Virtual best and single best per instance", br, + "Aggregated scores (PAR, count, successes)" + ), + paste0("Scatter plots comparing two algorithm selectors"), + paste0(cross), + paste0("extended support"), + "CRAN 0.10.1 (2021)" + ), + stringsAsFactors = FALSE +) + +is_latex <- knitr::is_latex_output() +if (knitr::is_latex_output()) { + fs <- 9 +} else { + fs <- NULL +} +tbl <- kbl( + checklist, + caption = "Comparative overview of ASML and llama for algorithm selection.", + escape = FALSE, + booktabs = TRUE, + align = "lcc" +) + +if (is_latex) { + tbl %>% + kable_styling(latex_options = c("hold_position", "striped"), font_size = fs) %>% + kableExtra::row_spec(0, bold = TRUE) %>% + column_spec(1, latex_column_spec = ">{\\\\raggedright\\\\arraybackslash}m{3cm}") %>% + column_spec(2, latex_column_spec = ">{\\\\centering\\\\arraybackslash}m{4.5cm}") %>% + column_spec(3, latex_column_spec = 
">{\\\\centering\\\\arraybackslash}m{4.5cm}") +} else { + tbl %>% + kable_styling( + bootstrap_options = c("striped", "hover", "condensed"), + full_width = FALSE + ) %>% + column_spec(1, width = "25%") %>% + column_spec(2:3, width = "35%") +} +``` + +There are also automated approaches that streamline the process of selecting and optimizing ML models within the R environment. Tools like \CRANpkg{h2o} provide robust functionalities specifically designed for R users, facilitating an end-to-end ML workflow. These frameworks automate various tasks, including algorithm selection, hyperparameter optimization, and feature engineering, thereby simplifying the process for users of all skill levels. By integrating these automated solutions into R, users can efficiently explore a wide range of models and tuning options without needing extensive domain knowledge or manual intervention. This automation not only accelerates the model development process but also improves the overall performance of ML projects by allowing a systematic evaluation of different approaches and configurations. However, while \CRANpkg{h2o} excels at automating the selection of ML models and hyperparameter tuning, it does not perform algorithm selection based on instance-specific features, which is the primary focus of our approach. Instead, it evaluates multiple algorithms in parallel and selects the best-performing one based on predetermined metrics. 
+ +# Using the ASML package + +```{r preparedata, echo=FALSE, eval=TRUE} +data(branching) +datap <- branching +features.complete <- datap$x +featcol <- c( + "number_of_variables", + "percentage_of_variables_degree_one", + "percentage_of_variables_degree_two", + "mean_variable_ranges", + "variance_variable_ranges", + "q50_variable_ranges", + "variance_variable_densities_RLT_variables", + "mean_percentage_of_constraints_and_objfun_in_which_each_variable_appears", + "variance_percentage_of_constraints_and_objfun_in_which_each_variable_appears", + "number_of_constraints", + "number_of_equality_constraints_:_number_of_constraints", + "number_of_linear_constraints_:_number_of_constraints", + "number_of_quadratic_constraints_:_number_of_constraints", + "degree", + "number_of_monomials", + "density", + "number_of_linear_monomials_:_number_of_monomials", + "number_of_quadratic_monomials_:_number_of_monomials", + "number_of_linear_rlt_variables_:_number_of_rlt_variables", + "number_of_quadratic_rlt_variables_:_number_of_rlt_variables", + "mean_percentage_of_monomials_in_each_constraint_and_objfun", + "mean_coefficients", + "variance_coefficients", + "number_of_variables_:_number_of_constraints_and_objfun", + "number_of_variables_:_degree", + "number_of_monomials_:_number_of_constraints_and_objfun", + "number_of_rlt_variables_:_number_of_constraints_and_objfun", + "intersection_graph_density", + "alternative_graph_density", + "intersection_greedy_max_modularity", + "alternative_greedy_max_modularity", + "intersection_graph_treewidth", + "alternative_graph_treewidth" +) +Description <- c( + "Number of variables", + "Pct. of variables not present in any monomial with degree greater than one", + "Pct. of variables not present in any monomial with degree greater than two", + "Average of the ranges of the variables", + "Variance of the ranges of the variables", + "Median of the ranges of the variables", + "Variance of the density of the variables", + "Average of the no. 
of appearances of each variable", + "Variance of the no. of appearances of each variable", + "Number of constraints", + "Pct. of equality constraints", + "Pct. of linear constraints", + "Pct. of quadratic constraints", + "Degree", + "Number of monomials", + "Density", + "Pct. of linear monomials", + "Pct. of quadratic monomials", + "Pct. of linear RLT variables", + "Pct. of quadratic RLT variables", + "Average pct. of monomials in each constraint and in the objective function", + "Average of the coefficients", + "Variance of the coefficients", + "Number of variables divided by number of constraints", + "Number of variables divided by degree", + "Number of monomials divided by number of constraints", + "Number of RLT variables divided by number of constraints", + "Density of VIG", + "Density of CMIG", + "Modularity of VIG", + "Modularity of CMIG", + "Treewidth of VIG", + "Treewidth of CMIG" +) +features <- features.complete[, which(names(features.complete) %in% featcol)] +``` + +Here, we illustrate the usage of the \CRANpkg{ASML} package with an example within the context of algorithm selection for spatial branching in polynomial optimization, aligning with the problem discussed in `r knitr::asis_output(ifelse(knitr::is_html_output(), '@gha23', '\\cite{gha23}'))` and further explored in `r knitr::asis_output(ifelse(knitr::is_html_output(), '@gon24', '\\cite{gon24}'))`. Table `r knitr::asis_output('\\@ref(tab:optsum)')` provides an overview of the problem and a summary of the components that we will discuss in detail below.
+ +```{r optsum} +aux <- paste(table(branching$x[, 1]), " (", names(table(branching$x[, 1])), ")", sep = "", collapse = ", ") +C1 <- c("Algorithms", "KPI", "Number of instances", "Number of instances per library", "Number of features") +C2 <- c("max, sum, dual, range, eig-VI, eig-CMI", "pace", nrow(features), aux, ncol(features)) +df <- data.frame(C1, C2) +df$C2[1:2] <- kableExtra::cell_spec(df$C2[1:2], monospace = TRUE) +colnames(df) <- NULL +if (knitr::is_latex_output()) { + fs <- 9 +} else { + fs <- NULL +} +kableExtra::kbl(df, escape = FALSE, booktabs = TRUE, caption = "Summary of the branching rule selection problem.") %>% + kableExtra::row_spec(1, extra_css = "border-top: 1px solid") %>% + kableExtra::row_spec(nrow(df), extra_css = "border-bottom: 1px solid") %>% + kableExtra::kable_styling(full_width = FALSE, position = "center", latex_options = "hold_position", font_size = fs) +``` + +A well-known approach for finding global optima in polynomial optimization problems is based on the use of the Reformulation-Linearization Technique (RLT) `r knitr::asis_output(ifelse(knitr::is_html_output(), '[@she92]', '\\citep{she92}'))`. Without delving into intricate details, RLT operates by creating a linear relaxation of the original polynomial problem, which is then integrated into a branch-and-bound framework. The branching process involves assigning a score to each variable, based on the violations of the RLT identities it participates in, after solving the corresponding relaxation at each node. Subsequently, the variable with the highest score is selected for branching. The computation of these scores is a critical aspect and allows for various approaches, leading to distinct branching rules that constitute our algorithm selection portfolio. Specifically, in our example, we will examine six distinct branching rules (referred to interchangeably as branching rules or algorithms), labeled as `max`, `sum`, `dual`, `range`, `eig-VI`, and `eig-CMI` rules.
For the definitions and a comprehensive understanding of the rationale behind these rules, refer to `r knitr::asis_output(ifelse(knitr::is_html_output(), '@gha23', '\\cite{gha23}'))`. + +Measuring the performance of different algorithms in the context of optimization is crucial for evaluating their effectiveness and efficiency. Two common metrics for this evaluation are running time and optimality gap, measured as a function of the lower and upper bounds for the objective function value at the end of the algorithm (a small optimality gap indicates that the algorithm is producing solutions close to the optimal solution). Both metrics are important and are often considered together to evaluate algorithm performance. For instance, it is meaningful to consider the time required to reduce the optimality gap by one unit as the KPI. In our example, and to ensure that the metric is well defined, we use a slightly different one, which we refer to as pace, defined as the time required to increase the lower bound by one unit. For the pace, a smaller value is preferred, as it indicates better performance. + +As depicted in Figure `r knitr::asis_output('\\@ref(fig:rice)')`, a crucial aspect of the methodology involves selecting input variables (features) that facilitate the prediction of the KPI for each branching rule. We consider `r ncol(features)` features representing global information of the polynomial optimization problems, such as relevant characteristics of variables, constraints, monomials, coefficients, or other attributes. A detailed description of the considered features can be found in Table `r knitr::asis_output('\\@ref(tab:featsum)')`. Although we will not delve into these aspects here, determining appropriate features is often complex, and using feature-selection methods can be beneficial for choosing the most relevant ones.
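The pace KPI described above can be sketched as a simple ratio (illustrative code with hypothetical values, not part of the package):

```r
# Pace: time needed to increase the lower bound by one unit (smaller is better).
pace <- function(time, lb_start, lb_end) time / (lb_end - lb_start)
pace(time = 120, lb_start = -5.0, lb_end = -3.5)  # 80 seconds per unit of bound gain
```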
+ +```{r featsum} +# Keep only the columns listed in featcol +filtered_columns <- names(datap$x)[names(datap$x) %in% featcol] +filtered_indices <- which(names(datap$x) %in% filtered_columns) +# Retrieve the descriptions in the order of the columns of datap$x +filtered_descriptions <- Description[match(filtered_columns, featcol)] +df <- data.frame(filtered_indices, filtered_descriptions) +df[, 1] <- cell_spec(df[, 1], monospace = TRUE) +colnames(df) <- c("Index", "Description") +kableExtra::kbl(df, escape = FALSE, booktabs = TRUE, caption = "Features from the branching dataset.") %>% + kableExtra::row_spec(0, bold = TRUE) %>% + kableExtra::kable_styling(full_width = FALSE, position = "center") %>% + kableExtra::footnote(general = "Index refers to columns of branching$x.") +``` + +To assess the performance of the algorithm selection methods in this context, we have a diverse set of `r nrow(features)` instances from different optimization problems, taken from three well-known benchmarks `r knitr::asis_output(ifelse(knitr::is_html_output(), '[@bus03; @dal16; @fur18]', '\\citep{bus03, dal16, fur18}'))`, corresponding respectively to the MINLPLib, DS, and QPLIB libraries. Details are given in Table `r knitr::asis_output('\\@ref(tab:optsum)')`. The data for this analysis is contained within the `branching` dataset included in the package. We begin by defining two data frames. The `features` data frame includes two initial columns that provide the instance names and the corresponding family (library in our example) for each instance. The remaining columns consist of the features listed in Table `r knitr::asis_output('\\@ref(tab:featsum)')`. + +We also define the `KPI` data frame, which is derived from `branching$y`. This data frame contains the pace values for each of the six branching rules considered in this study (specified by the labels in the `lab_rules` vector). These data frames will serve as the input for our subsequent analyses.
+ +```{r ASMLcall, echo = TRUE, tidy=TRUE} +set.seed(1234) +library(ASML) +data(branching) +features <- branching$x +KPI <- branching$y +lab_rules <- c("max", "sum", "dual", "range", "eig-VI", "eig-CMI") +``` + +## Pre-processing the data + +As with any analysis, the first step involves preprocessing the data. This includes using the function `partition_and_normalize`, which not only divides the dataset into training and test sets but also normalizes the KPI relative to the best result for each instance. The argument `better_smaller` specifies whether a lower KPI value is preferred (as in our case, where the KPI represents pace and smaller values indicate better performance) or a higher one. + +```{r ASMLpre, echo = TRUE, tidy=TRUE} +data <- partition_and_normalize(features, KPI, family_column = 1, split_by_family = TRUE, better_smaller = TRUE) +names(data) +``` + +When using the function `partition_and_normalize`, the resulting object is of class `as_data` and contains several key components essential for our study. Specifically, the object includes `x.train` and `x.test`, representing the feature sets for the training and test datasets, respectively. Additionally, it contains `y.train` and `y.test`, with the instance-normalized KPI corresponding to each dataset, along with their original counterparts, `y.train.original` and `y.test.original`. This structure allows us to retain the original KPI values while working with the instance-normalized data. Furthermore, when the parameter `split_by_family` is set to `TRUE`, as in the example, the object also includes `families.train` and `families.test`, indicating the family affiliation for each observation within the training and test sets. Figure `r knitr::asis_output('\\@ref(fig:partitionPLOT)')` illustrates how the split preserves the proportions of instances for each library.
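To fix ideas, the instance-normalization can be pictured as follows. This is a hedged sketch of one natural choice of normalization; the package's exact formula may differ.

```r
# With better_smaller = TRUE, one plausible normalization divides the best
# (smallest) KPI of each instance by each entry, so the best algorithm per
# instance scores 1 and worse ones score below 1.
normalize_kpi <- function(kpi) t(apply(kpi, 1, function(r) min(r) / r))
kpi <- matrix(c(2, 4, 1, 5), nrow = 2, byrow = TRUE)
normalize_kpi(kpi)  # rows: (1, 0.5) and (1, 0.2)
```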
+ +```{r partitionPLOT, echo = FALSE, tidy=TRUE, out.width = ifelse(knitr::is_latex_output(), "70%" , "100%"), fig.cap="Train/Test partition preserving the percentage of instances for each library.", fig.alt = "Train/Test partition preserving the percentage of instances for each library."} +library(ggplot2) +library(viridisLite) +data_plot <- rbind( + data.frame( + Family = data$families.train[[1]], + Dataset = "Train" + ), + data.frame( + Family = data$families.test[[1]], + Dataset = "Test" + ) +) +data_plot$Dataset <- factor(data_plot$Dataset, levels = c("Train", "Test")) +ggplot(data_plot, aes(x = Dataset, fill = Family)) + + geom_bar(position = "fill") + + labs( + title = "Train/Test Partition", + y = "Proportion" + ) + + scale_fill_viridis_d(name = "Library") + + theme_minimal() + + scale_x_discrete(labels = c( + "Train" = paste0("Train (n=", nrow(data$families.train), ")"), + "Test" = paste0("Test (n=", nrow(data$families.test), ")") + )) +``` + +As a tool for visualizing the performance of the considered algorithms, the `boxplots` function operates on objects of class `as_data` and generates boxplots for the instance-normalized KPI. This visualization facilitates the comparison of performance differences across instances. The function can be applied to both training and test observations and can also group the results by family. Additionally, it accepts common arguments typically used in R plotting functions. Figure `r knitr::asis_output('\\@ref(fig:splitPLOT)')` shows the instance-normalized KPI of the instances in the train set. What becomes evident from the boxplots is that there is no branching rule that outperforms the others across all instances, and making a wrong choice of branching rule for certain problems can lead to very poor performance.
+ +```{r splitPLOT, echo = TRUE, tidy=TRUE, out.width = ifelse(knitr::is_latex_output(), "70%" , "100%"), fig.cap="Boxplots of instance-normalized KPI for each algorithm across instances in the train set.", fig.alt = "Boxplots of instance-normalized KPI for each algorithm across instances in the train set."} +boxplots(data, test = FALSE, by_families = FALSE, labels = lab_rules) +``` + +The `ranking` function, specifically designed for the \CRANpkg{ASML} package, is also valuable for visualizing the differing behaviors of the algorithms under investigation, depending on the analyzed instances. After ranking the algorithms for each instance, based on the instance-normalized KPI, the function generates a bar chart for each algorithm, indicating the percentage of times it occupies each ranking position. The numbers displayed within the bars represent the mean value of the instance-normalized KPI for the problems associated with that specific ranking position. Again, the representation can be made both for the training and test sets, as well as by family. In Figure `r knitr::asis_output(ifelse(knitr::is_html_output(), '\\@ref(fig:rank)', '\\@ref(fig:rank)'))`, we present the chart corresponding to the training sample and categorized by family. In particular, it is observed that certain rules, when not the best choice for a given instance, can perform quite poorly in terms of instance-normalized KPI (see, for example, the results on the MINLPLib library). This highlights the importance of not only selecting the best algorithm for each instance but also ensuring that the chosen algorithm does not perform too poorly when it isn’t optimal. In some cases, even if an algorithm isn't the best-performing option, it may still provide reasonably good results, whereas a wrong choice can result in significantly worse outcomes. 
+ +```{r rank, echo = TRUE, tidy=TRUE, out.width = ifelse(knitr::is_latex_output(), "70%" , "100%"), fig.cap="Ranking of algorithms based on the instance-normalized KPI for the training sample, categorized by family. The bars represent the percentage of times each algorithm appeared in different ranking positions, with the numbers indicating the mean value of the KPI.", fig.alt = "Ranking of algorithms based on the instance-normalized KPI for the training sample, categorized by family. The bars represent the percentage of times each algorithm appeared in different ranking positions, with the numbers indicating the mean value of the KPI."} +ranking(data, test = FALSE, by_families = TRUE, labels = lab_rules) +``` + +Additionally, functions from the \CRANpkg{caret} package can be applied if further operations on the predictors are needed. Here we show an example where the Yeo-Johnson transformation is applied to the training set, and the same transformation is subsequently applied to the test set to ensure consistency across both datasets. The flexibility of \CRANpkg{caret} also allows for the inclusion of advanced techniques, such as feature selection and dimensionality reduction, to improve the quality of the algorithm selection process. + +```{r precaret, echo = TRUE, tidy=TRUE} +preProcValues <- caret::preProcess(data$x.train, method = "YeoJohnson") +data$x.train <- predict(preProcValues, data$x.train) +data$x.test <- predict(preProcValues, data$x.test) +``` + +## Training models and predicting the performance of the algorithms + +The approach in \CRANpkg{ASML} to algorithm selection is based on building regression models that predict the instance-normalized KPI of each considered algorithm. To this end, users can take advantage of the wide range of ML models available in the \CRANpkg{caret} package, which provides a unified interface for training and tuning various types of models. 
Models trained with \CRANpkg{caret} can be seamlessly integrated into the \CRANpkg{ASML} workflow using the `AStrain` function from \CRANpkg{ASML}, as shown in the next example. Just for illustrative purposes, we use quantile random forest `r knitr::asis_output(ifelse(knitr::is_html_output(), '[@mei06]', '\\citep{mei06}'))` to model the behavior of the instance-normalized KPI based on the features. This is done with the `qrf` method in the \CRANpkg{caret} package, which relies on the \CRANpkg{quantregForest} package `r knitr::asis_output(ifelse(knitr::is_html_output(), '[@mei24]', '\\citep{mei24}'))`. + +```{r AMSLtrain, echo = TRUE, tidy=TRUE} +library(quantregForest) +tune_grid <- expand.grid(mtry = 10) +training <- AStrain(data, method = "qrf", tuneGrid = tune_grid) +``` + +Additional arguments for `caret::train` can also be passed directly to `ASML::AStrain`. This allows users to take advantage of the flexibility of the \CRANpkg{caret} package, including specifying control methods (such as cross-validation), tuning parameters, or any other relevant settings provided by `caret::train`. This integration ensures that the \CRANpkg{ASML} workflow can fully make use of the modeling capabilities offered by \CRANpkg{caret}. To make the execution faster (it is not our intention here to delve into the choice of the best model), we use a `tune_grid` that sets a fixed value for `mtry`. This avoids the need for an exhaustive search for this hyperparameter, speeding up the model training process. Other modeling approaches should also be considered, as they may offer better performance depending on the specific characteristics of the data and the problem at hand. For more computationally intensive models or larger datasets, the `ASML::AStrain` function includes the argument `parallel`, which can be set to TRUE to enable parallel execution using the \CRANpkg{snow} package `r knitr::asis_output(ifelse(knitr::is_html_output(), '[@Rsnow]', '\\citep{Rsnow}'))`. 
This allows the training step to be distributed across multiple cores, reducing computation time. A detailed example on a larger dataset is provided in the following section, showing the scalability of the workflow and the effect of parallelization on training time. + +The function `caret::train` returns a trained model along with performance metrics, predictions, and tuning parameters, providing insights into the model's effectiveness. In a similar manner, `ASML::AStrain` offers the same type of output but for each algorithm under consideration, allowing straightforward comparison within the \CRANpkg{ASML} framework. The `ASML::ASpredict` function generates the predictions for new data by using the models created during the training phase for each algorithm under evaluation. Thus, predictions for the algorithms are obtained simultaneously, facilitating a direct comparison of their performance. By using `ASML::ASpredict` as follows, we obtain a matrix where each row corresponds to an instance from the test set, and each column represents the predicted instance-normalized KPIs for the six branching rules using the `qrf` method. + +```{r AMSLpred, echo = TRUE, tidy=TRUE} +predict_test <- ASpredict(training, newdata = data$x.test) +``` + +## Evaluating and visualizing the results + +One of the key strengths of the \CRANpkg{ASML} package lies in its ability to evaluate results collectively and provide intuitive visualizations. This approach not only aids in identifying the most effective algorithms but also contributes to the interpretability of the results, making it easier for users to make informed decisions based on the performance metrics and visual representations provided. 
For example, the function `KPI_table` returns a table showing the arithmetic and geometric mean of the KPI (both instance-normalized and not normalized) obtained on the test set for each algorithm, as well as for the algorithm selected by the learning model (the one with the largest instance-normalized predicted KPI for each instance). In Table `r knitr::asis_output('\\@ref(tab:AMSLtab2)')`, the results for our case study are shown. It is important to note that larger values are better in the columns for the arithmetic and geometric mean of the instance-normalized KPI (where values close to 1 indicate the best performance). Conversely, in the columns for non-normalized values, lower numbers reflect better outcomes. In all cases, the best results are obtained for the ML algorithm. Note also that in this case, the differences in the performance of the algorithms are likely better reflected by the geometric mean, since it captures relative differences more faithfully. + +```{r AMSLtab, echo = TRUE, tidy=TRUE, eval=FALSE} +KPI_table(data, predictions = predict_test) +``` + +```{r AMSLtab2, echo = FALSE, tidy=TRUE, eval=TRUE} +KPItab <- KPI_table(data, predictions = predict_test) +KPItab <- round(KPItab, 3) +rownames(KPItab) <- c("ML", "max", "sum", "dual", "range", "eig-VI", "eig-CMI") +rownames(KPItab) <- kableExtra::cell_spec(rownames(KPItab), monospace = TRUE) +# Define column names based on the output format +if (knitr::is_html_output()) { + col_names <- c("Arithmetic mean\ninst-norm KPI", "Geometric mean\ninst-norm KPI", "Arithmetic mean\nnon-norm KPI", "Geometric mean\nnon-norm KPI") + wi <- "3.3cm" +} else { + col_names <- c("Arith. mean\\newline inst-norm KPI", "Geom. mean\\newline inst-norm KPI", "Arith. mean\\newline non-norm KPI", "Geom.
mean\\newline non-norm KPI", + wi <- "2.5cm" +} +if (knitr::is_latex_output()) { + fs <- 9 +} else { + fs <- NULL +} +kableExtra::kbl(KPItab, + escape = F, caption = "Arithmetic and geometric mean of the KPI (both instance-normalized and non-normalized) for each algorithm on the test set, along with the results for the algorithm selected by the learning model (first row).", + col.names = col_names, + booktabs = TRUE +) %>% + kableExtra::kable_styling(full_width = FALSE, position = "center", latex_options = "hold_position", font_size = fs) %>% + column_spec(column = 2:5, width = wi) +``` + +Additionally, the function `KPI_summary_table` generates a concise comparative table displaying values for three different choices: single best, ML, and optimal; see Table `r knitr::asis_output('\\@ref(tab:AMSLtabsum2)')`. The single best choice refers to selecting the same algorithm for all instances based on the lowest geometric mean of the non-normalized KPI (in this case, the `range` rule). This approach evaluates the performance of each algorithm across all instances and chooses the one that consistently performs best overall, rather than optimizing for individual instances. The ML choice represents the algorithm selected by the quantile random forest model. The optimal choice corresponds to solving each instance with the algorithm that performs best for that specific instance. The ML choice shows promising results, with a mean KPI close to the optimal choice, demonstrating its capability to select algorithms that yield competitive performance.
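The three choices can be reproduced by hand on a toy KPI matrix. This is an illustrative sketch, not the package's internal code; `kpi` and `pred` hold made-up values.

```r
# kpi: raw KPI (smaller is better); pred: predicted instance-normalized KPI
# (larger is better). One row per instance, one column per algorithm.
kpi  <- matrix(c(2, 4, 3,  5, 1, 2,  3, 6, 1), nrow = 3, byrow = TRUE)
pred <- matrix(c(0.9, 0.4, 0.6,  0.3, 1.0, 0.7,  0.5, 0.2, 0.8),
               nrow = 3, byrow = TRUE)
geom_mean   <- function(x) exp(mean(log(x)))
single_best <- kpi[, which.min(apply(kpi, 2, geom_mean))]  # one algorithm for all
ml          <- kpi[cbind(1:3, apply(pred, 1, which.max))]  # model's pick per instance
optimal     <- apply(kpi, 1, min)                          # best per instance
c(single_best = geom_mean(single_best), ML = geom_mean(ml),
  optimal = geom_mean(optimal))
```

On this toy example the model's picks happen to coincide with the optimal choices, so the ML and optimal rows agree.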
+ +```{r AMSLtabsum, echo = TRUE, tidy=TRUE, eval=FALSE} +KPI_summary_table(data, predictions = predict_test) +``` + +```{r AMSLtabsum2, echo = FALSE, tidy=TRUE, eval=TRUE} +KPItab <- KPI_summary_table(data, predictions = predict_test) +KPItab <- round(KPItab, 3) +rownames(KPItab) <- c("single best", "ML", "optimal") +rownames(KPItab) <- kableExtra::cell_spec(rownames(KPItab), monospace = TRUE) +# Define column names based on the output format +if (knitr::is_html_output()) { + col_names <- c("Arithmetic mean\nnon-norm KPI", "Geometric mean\nnon-norm KPI") + wi <- "3.3cm" +} else { + col_names <- c("Arith. mean\\newline non-norm KPI", "Geom. mean\\newline non-norm KPI") + wi <- "2.5cm" +} +if (knitr::is_latex_output()) { + fs <- 9 +} else { + fs <- NULL +} +kableExtra::kbl(KPItab, + escape = F, caption = "Arithmetic and geometric mean of the non-normalized KPI for single best choice, ML choice, and optimal choice.", + col.names = col_names, + booktabs = TRUE +) %>% + kableExtra::kable_styling(full_width = FALSE, position = "center", latex_options = "hold_position", font_size = fs) %>% + column_spec(column = 2:3, width = wi) +``` + +The following code generates several visualizations that help us compare how well the algorithms perform according to the response variable (instance-normalized KPI) and also illustrate the behavior of the learning process. These plots give us good insights into how effective the algorithm selection process is and how it behaves in comparison to using the same branching rule for all instances. Figure `r knitr::asis_output('\\@ref(fig:ASMLplot1)')` shows the boxplots comparing the performance of each algorithm in terms of the instance-normalized KPI, including the instance-normalized KPI of the rules selected by the ML process for the test set.
In Figure `r knitr::asis_output('\\@ref(fig:ASMLplot2)')`, the performance is presented by family, allowing for a more detailed comparison across the different sets of instances. In Figure `r knitr::asis_output('\\@ref(fig:ASMLplot3)')`, we show the ranking of algorithms based on the instance-normalized KPI for the test sample, including the ML rule, categorized by family. Finally, in Figure `r knitr::asis_output('\\@ref(fig:ASMLplot4)')`, the right-side bar in the stacked bar plot (optimal) illustrates the proportion of instances in which each of the original rules is identified as the best-performing option. In contrast, the left-side bar (ML) depicts the frequency with which ML selects each rule as the top choice. Although the rule chosen by ML in each instance does not always match the best one for that case, ML tends to select the different rules in a similar proportion to how often those rules are the best across the test set. This means it does not consistently favor a particular rule or ignore any that are the best in a significant percentage of instances.
+ +```{r ASMLplot, echo = TRUE, tidy=TRUE, eval=FALSE} +boxplots(data, predictions = predict_test, labels = c(lab_rules, "ML")) +boxplots(data, predictions = predict_test, labels = c(lab_rules, "ML"), by_families = TRUE) +ranking(data, predictions = predict_test, labels = c("ML", lab_rules), by_families = TRUE) +figure_comparison(data, predictions = predict_test, by_families = FALSE, labels = lab_rules) +``` + +```{r ASMLplot1, echo = FALSE, tidy=TRUE, out.width = ifelse(knitr::is_latex_output(), "70%" , "100%"), fig.cap="Boxplots of instance-normalized KPI for each algorithm, including the ML algorithm, across instances in the test set.", fig.alt = "Boxplots of instance-normalized KPI for each algorithm, including the ML algorithm, across instances in the test set."} +boxplots(data, predictions = predict_test, labels = c(lab_rules, "ML")) +``` + +```{r ASMLplot2, echo = FALSE, tidy=TRUE, out.width = ifelse(knitr::is_latex_output(), "70%" , "100%"), fig.cap="Boxplots of instance-normalized KPI for each algorithm, including the ML algorithm, across instances in the test set, categorized by family.", fig.alt = "Boxplots of instance-normalized KPI for each algorithm, including the ML algorithm, across instances in the test set, categorized by family."} +boxplots(data, predictions = predict_test, labels = c(lab_rules, "ML"), by_families = TRUE) +``` + +```{r ASMLplot3, echo = FALSE, tidy=TRUE, out.width = ifelse(knitr::is_latex_output(), "70%" , "100%"), fig.cap="Ranking of algorithms, including the ML algorithm, based on the instance-normalized KPI for the test sample, categorized by family. The bars represent the percentage of times each algorithm appeared in different ranking positions, with the numbers indicating the mean value of the normalized KPI.", fig.alt = "Ranking of algorithms, including the ML algorithm, based on the instance-normalized KPI for the test sample, categorized by family. 
The bars represent the percentage of times each algorithm appeared in different ranking positions, with the numbers indicating the mean value of the normalized KPI."} +ranking(data, predictions = predict_test, labels = c("ML", lab_rules), by_families = TRUE) +``` + +```{r ASMLplot4, echo = FALSE, tidy=TRUE, out.width = ifelse(knitr::is_latex_output(), "70%" , "100%"), fig.cap="Comparison of the best-performing rules: The right stack shows the proportion of times each of the original rules is identified as the best-performing option, while the left stack presents the frequency of selection by ML.", fig.alt = "Comparison of the best-performing rules: The right stack shows the proportion of times each of the original rules is identified as the best-performing option, while the left stack presents the frequency of selection by ML."} +figure_comparison(data, predictions = predict_test, by_families = FALSE, labels = lab_rules) +``` + +## Custom user-defined methods + +While \CRANpkg{caret} provides a range of built-in methods for model training and prediction, there may be situations where researchers want to explore additional methods not directly integrated into the package. Considering alternative methods can improve the analysis and provide greater flexibility in modeling choices. + +In this section, we present an example of how to modify the quantile random forest `qrf` method. The `qrf` implementation in \CRANpkg{caret} does not allow users to specify the conditional quantile to predict, which is set to the median by default. In this case, rather than creating an entirely new method, we only need to adjust the prediction function to include the `what` argument, allowing us to specify the desired conditional quantile for prediction. In this execution example, we base the algorithm selection method on the predictions of the $\alpha$-conditional quantile of the instance-normalized KPI for $\alpha = 0.25$. 
+ +```{r AMSLtrain2, echo = TRUE, tidy=TRUE, eval=FALSE} +qrf_q_predict <- function(modelFit, newdata, what = 0.5, submodels = NULL) { + out <- predict(modelFit$finalModel, newdata, what = what) + if (is.matrix(out)) { + out <- out[, 1] + } + out +} + +predict_test_Q1 <- ASpredict(training, newdata = data$x.test, f = "qrf_q_predict", what = 0.25) +KPI_summary_table(data, predictions = predict_test_Q1) +``` + +```{r, inline = TRUE, results='asis'} +if (knitr::is_html_output()) { + knitr::asis_output("The results are summarized in Table \\@ref(tab:AMSLtabsum22).") +} +``` + +```{r AMSLtabsum22, echo = FALSE, tidy=TRUE, eval=TRUE} +if (knitr::is_html_output()) { + .GlobalEnv$qrf_q_predict <- function(modelFit, newdata, what = 0.5, submodels = NULL) { + out <- predict(modelFit$finalModel, newdata, what = what) + if (is.matrix(out)) { + out <- out[, 1] + } + out + } + predict_test_Q1 <- ASML::ASpredict(training, newdata = data$x.test, f = "qrf_q_predict", what = 0.25) + KPItab <- KPI_summary_table(data, predictions = predict_test_Q1) + KPItab <- round(KPItab, 3) + rownames(KPItab) <- c("single best", "ML", "optimal") + rownames(KPItab) <- kableExtra::cell_spec(rownames(KPItab), monospace = TRUE) + # Define column names based on the output format + if (knitr::is_html_output()) { + col_names <- c("Arithmetic mean\nnon-norm KPI", "Geometric mean\nnon-norm KPI") + wi <- "3.3cm" + } else { + col_names <- c("Arith. mean\\newline non-norm KPI", "Geom. mean\\newline non-norm KPI") + wi <- "2.5cm" + } + if (knitr::is_latex_output()) { + fs <- 9 + } else { + fs <- NULL + } + kableExtra::kbl(KPItab, + escape = F, caption = "Arithmetic and geometric mean of the non-normalized KPI for single best choice, ML choice, and optimal choice. The ML choice is based on the predictions of the alpha-conditional quantile for alpha=0.25.", booktabs = TRUE, + col.names = col_names + ) %>% + kableExtra::kable_styling(full_width = FALSE, position = "center", latex_options = "hold_position", font_size = fs) %>% + column_spec(column = 2:3, width = wi) +} else { + cat("") +} +``` + +## Model interpretability + +Predictive modeling often relies on flexible but complex methods. These methods typically involve many parameters or hyperparameters, which can make the models difficult to interpret. To address this, interpretable ML techniques provide tools for exploring *black-box* models. \CRANpkg{ASML} integrates seamlessly with the package \CRANpkg{DALEX} (moDel Agnostic Language for Exploration and eXplanation); see `r knitr::asis_output(ifelse(knitr::is_html_output(), '@DALEX', '\\cite{DALEX}'))`. With \CRANpkg{DALEX}, users can obtain model performance metrics, evaluate feature importance, and generate partial dependence plots (PDPs), among other analyses. + +To simplify the use of \CRANpkg{DALEX} within our framework, \CRANpkg{ASML} provides the function `ASexplainer`. This function automatically creates \CRANpkg{DALEX} explainers for the models trained with `AStrain` (one for each algorithm in the portfolio). Once the explainers are created, users can easily apply \CRANpkg{DALEX} functions to explore and compare the behavior of each model.
The following example shows how to obtain a plot of the reversed empirical cumulative distribution function of the absolute residuals, based on the performance metrics computed with `DALEX::model_performance`; see Figure `r knitr::asis_output('\\@ref(fig:DALEX1)')`.
+
+```{r ASML_DALEX, echo = TRUE, tidy=TRUE, eval=FALSE}
+# Create DALEX explainers for each trained model
+explainers_qrf <- ASexplainer(training, data = data$x.test, y = data$y.test, labels = lab_rules)
+# Compute model performance metrics for each explainer
+mp_qrf <- lapply(explainers_qrf, DALEX::model_performance)
+# Plot the performance metrics
+do.call(plot, unname(mp_qrf)) + theme_bw(base_line_size = 0.5)
+```
+
+```{r DALEX1, echo = FALSE, tidy=TRUE, out.width = ifelse(knitr::is_latex_output(), "70%", "100%"), fig.cap="Reversed empirical cumulative distribution function of the absolute residuals of the trained models.", fig.alt = "Reversed empirical cumulative distribution function of the absolute residuals of the trained models."}
+explainers_qrf <- ASexplainer(training, data = data$x.test, y = data$y.test, labels = lab_rules, verbose = FALSE)
+mp_qrf <- lapply(explainers_qrf, DALEX::model_performance)
+do.call(plot, unname(mp_qrf)) + theme_bw(base_line_size = 0.5)
+```
+The code below illustrates how to obtain feature importance (via `DALEX::model_parts`) and a PDP for the predictor variable `degree` (via `DALEX::model_profile`). Plots are not displayed in this manuscript, but they can be generated by executing the code.
+
+```{r ASML_DALEX2, echo = TRUE, tidy=TRUE, eval=FALSE}
+# Compute feature importance for each model in the explainers list
+vi_qrf <- lapply(explainers_qrf, DALEX::model_parts)
+# Plot the top 5 most important variables for each model
+do.call(plot, c(unname(vi_qrf), list(max_vars = 5)))
+# Compute PDP for the variable "degree" for each model
+pdp_qrf <- lapply(explainers_qrf, DALEX::model_profile, variable = "degree", type = "partial")
+# Plot the PDPs generated
+do.call(plot, unname(pdp_qrf)) + theme_bw(base_line_size = 0.5)
+```
+
+# Example on a larger dataset
+
+To analyze the scalability of \CRANpkg{ASML}, we now consider an example of algorithm selection in the field of high-performance computing (HPC), specifically in the context of the automatic selection of the most suitable storage format for sparse matrices on GPUs. This is a well-known problem in HPC, since the storage format has a decisive impact on the performance of many scientific kernels such as the sparse matrix–vector multiplication (SpMV). For this study, we use the dataset introduced by `r knitr::asis_output(ifelse(knitr::is_html_output(), '@pic18', '\\cite{pic18}'))`, which contains 8111 sparse matrices and is available in the \CRANpkg{ASML} package under the name `SpMVformat`. Each matrix is described by a set of nine structural features, and the performance of the single-precision SpMV kernel was measured on an NVIDIA GeForce GTX TITAN GPU under three storage formats: compressed row storage (CSR), ELLPACK (ELL), and hybrid (HYB). For each matrix and format, performance is expressed as the average GFLOPS (billions of floating-point operations per second) over 1000 SpMV operations. This setup allows us to study how matrix features relate to the most efficient storage format.
+
+The workflow follows the standard \CRANpkg{ASML} pipeline: the data are partitioned, normalized, and preprocessed, and models are trained using `ASML::AStrain`. 
We considered different learning methods available in \CRANpkg{caret} and evaluated execution times both with and without parallel processing, which is controlled via the `parallel` argument in `ASML::AStrain`. The selected methods were run with their default configurations in \CRANpkg{caret}, without additional hyperparameter tuning. All experiments were performed on a machine equipped with a 12th Gen Intel(R) Core(TM) i7-12700 processor (12 cores, 2.11 GHz) and 32 GB of RAM. The execution times are summarized in Tables `r knitr::asis_output('\\@ref(tab:AMSLtimes)')` and `r knitr::asis_output('\\@ref(tab:AMSLtimes2)')`.
+
+```{r AMSLtimes, echo = FALSE, tidy=TRUE, eval=TRUE}
+library(kableExtra)
+set.seed(1234)
+# Table with fixed, previously measured values
+general_times <- data.frame(
+  Function = c("ASML::partition_and_normalize", "caret::preProcess"),
+  Time_sec = c(0.03, 1.55) # values obtained beforehand
+)
+
+# Column names
+col_names <- c("Stage", "Execution time (seconds)")
+
+# Optional width of the time column
+wi <- "10em"
+
+# Build the formatted table
+if (knitr::is_latex_output()) {
+  fs <- 9
+} else {
+  fs <- NULL
+}
+kableExtra::kbl(
+  general_times,
+  escape = TRUE,
+  caption = "Execution times (in seconds) on the SpMVformat dataset for the main preprocessing stages.",
+  col.names = col_names, booktabs = TRUE
+) %>%
+  kableExtra::kable_styling(full_width = FALSE, position = "center", latex_options = "hold_position", font_size = fs) %>%
+  kableExtra::column_spec(column = 2, width = wi)
+```
+
+```{r AMSLtimes2, echo = FALSE, tidy=TRUE, eval=TRUE}
+library(kableExtra)
+
+# Table with fixed, previously measured values
+train_times <- data.frame(
+  Method = c("nnet", "svmRadial", "rf"),
+  `parallel \\= FALSE` = c(236.58, 881.03, 4753.00),
+  `parallel \\= TRUE` = c(50.75, 263.60, 1289.68)
+)
+
+# Optional column widths
+wi <- "10em"
+
+# Build the formatted table (PDF-compatible)
+if (knitr::is_latex_output()) {
+  fs <- 9
+} else {
+  fs <- NULL
+}
+kableExtra::kbl(
+  train_times,
+  escape = TRUE, # names were already escaped manually above
+  caption = "Training times (in seconds) on the SpMVformat dataset for different methods using ASML::AStrain. The second and third columns show execution times without parallelization (parallel = FALSE) and with parallelization (parallel = TRUE), respectively.",
+  col.names = c("Method", "parallel = FALSE", "parallel = TRUE"), booktabs = TRUE
+) %>%
+  kableExtra::kable_styling(latex_options = c("hold_position"), font_size = fs) %>%
+  kableExtra::column_spec(column = 2:3, width = wi) %>%
+  kableExtra::add_header_above(c(" " = 1, "Execution times (in seconds) of ASML::AStrain" = 2))
+```
+
+The majority of the computational cost is associated with model training, which depends on the learning method in \CRANpkg{caret}. We observe that training times vary across methods: `nnet` (a simple feed-forward neural network), `svmRadial` (support vector machines with radial kernel), and `rf` (random forest). Parallel execution substantially reduces training times for all selected methods, demonstrating that the workflow scales efficiently to larger datasets while keeping preprocessing overhead minimal.
+
+Beyond execution times, we briefly comment on the outcome of algorithm selection in this application example. In particular, we illustrate the model’s ability to identify the most efficient storage format by reporting the results obtained with the `nnet` method, see Figure `r knitr::asis_output('\\@ref(fig:ASMLnnet1)')`. The trained model selects the best-performing format in more than 85% of the test cases, and even when it does not, the chosen format still achieves high performance, with a mean normalized KPI (normalized average GFLOPS) of around 0.9.
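+
+Timings such as those in the tables above can be reproduced by timing the corresponding calls; a minimal sketch (assuming `data` holds the partitioned SpMVformat dataset, constructed as in the next code chunk) could read:
+
+```{r AMSLtimesSketch, echo = TRUE, tidy=TRUE, eval=FALSE}
+# Wall-clock training time without and with parallelization
+t_serial <- system.time(AStrain(data, method = "rf", parallel = FALSE))["elapsed"]
+t_parallel <- system.time(AStrain(data, method = "rf", parallel = TRUE))["elapsed"]
+c(serial = t_serial, parallel = t_parallel)
+```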
+ +```{r ASMLnnet, echo = TRUE, tidy=TRUE, eval=FALSE} +set.seed(1234) +data(SpMVformat) +features <- SpMVformat$x +KPI <- SpMVformat$y +data <- partition_and_normalize(features, KPI, better_smaller = FALSE) +preProcValues <- caret::preProcess(data$x.train, method = "YeoJohnson") +data$x.train <- predict(preProcValues, data$x.train) +data$x.test <- predict(preProcValues, data$x.test) +training <- AStrain(data, method = "nnet", parallel = TRUE) +pred <- ASpredict(training, newdata = data$x.test) +ranking(data, predictions = pred) +``` + +```{r ASMLnnet1, echo = FALSE, tidy=TRUE, out.width = ifelse(knitr::is_latex_output(), "70%" , "100%"), fig.cap="Ranking of storage formats, including the ML selected, based on the instance-normalized KPI for the test sample. The bars represent the percentage of times each storage format appeared in different ranking positions, with the numbers indicating the mean value of the normalized KPI.", fig.alt = "Ranking of storage formats, including the ML selected, based on the instance-normalized KPI for the test sample. 
The bars represent the percentage of times each storage format appeared in different ranking positions, with the numbers indicating the mean value of the normalized KPI."} +data(SpMVformat) +features <- SpMVformat$x +KPI <- SpMVformat$y +data <- partition_and_normalize(features, KPI, better_smaller = FALSE) +preProcValues <- caret::preProcess(data$x.train, method = "YeoJohnson") +data$x.train <- predict(preProcValues, data$x.train) +data$x.test <- predict(preProcValues, data$x.test) +training <- AStrain(data, method = "nnet", parallel = TRUE) +pred <- ASpredict(training, newdata = data$x.test) +ranking(data, predictions = pred) +``` + +# Using ASML for algorithm selection on ASlib scenarios + +While the primary purpose of the \CRANpkg{ASML} package is not to systematically conduct algorithm selection studies like those found in ASlib (an area for which the \CRANpkg{llama} toolkit is especially helpful), it does offer a complementary approach for reproducing results from the ASlib benchmark (https://coseal.github.io/aslib-r/scenario-pages/index.html). Our method allows for a comparative analysis using instance-normalized KPIs, which, as demonstrated in the following example, can sometimes yield improved performance results. Additionally, it can be useful for evaluating algorithm selection approaches based on methods that are not available in the \CRANpkg{mlr} package used by \CRANpkg{llama} but are accessible in \CRANpkg{caret}. + +## Data download and preparation + +First, we identify the specific scenario from ASlib we are interested in, in this case, `CPMP-2015`. Using the scenario name, we construct a URL that points to the corresponding page on the ASlib website. 
Then, we fetch the HTML content of the page and create a local directory to store the downloaded files^[A more direct approach would be to use the `getCosealASScenario` function from the \CRANpkg{aslib} package; however, at the time of writing, this function does not appear to work, likely due to changes in the directory structure of the scenarios.].
+
+```{r echo = TRUE, tidy=TRUE, eval=TRUE}
+set.seed(1234)
+library(tidyverse)
+library(rvest)
+scen <- "CPMP-2015"
+url <- paste0("https://coseal.github.io/aslib-r/scenario-pages/", scen, "/data_files")
+page <- read_html(paste0(url, ".html"))
+file_links <- page %>%
+  html_nodes("a") %>%
+  html_attr("href")
+
+# Create directory for downloaded files
+dir_data <- paste0(scen, "_data")
+dir.create(dir_data, showWarnings = FALSE)
+
+# Download files
+for (link in file_links) {
+  full_link <- ifelse(grepl("^http", link), link, paste0(url, "/", link))
+  file_name <- basename(link)
+  dest_file <- file.path(dir_data, file_name)
+  if (!is.na(full_link)) {
+    download.file(full_link, dest_file, mode = "wb", quiet = TRUE)
+  }
+}
+```
+
+## Data preparation with aslib
+
+Now, we use the \CRANpkg{aslib} package to parse the scenario data and extract the relevant features and performance metrics. The `parseASScenario` function from \CRANpkg{aslib} creates a structured object `ASScen` that contains information regarding the algorithms and instances being evaluated. We then transform this data into cross-validation folds using the `cvFolds` function from \CRANpkg{llama}. This conversion facilitates a fair evaluation of algorithm performance across different scenarios, allowing us to compare the results with those published `r if (knitr::is_html_output()) {
+  '^[Available at: https://coseal.github.io/aslib-r/scenario-pages/CPMP-2015/llama.html (Accessed October 25, 2024).]'
+} else {
+  '\\footnote{\\label{foot}Available at: https://coseal.github.io/aslib-r/scenario-pages/CPMP-2015/llama.html (Accessed October 25, 2024).}'
+}`.
+ +```{r echo = TRUE, tidy=TRUE, eval=TRUE} +library(aslib) +ASScen <- aslib::parseASScenario(dir_data) +llamaScen <- aslib::convertToLlama(ASScen) +folds <- llama::cvFolds(llamaScen) +``` + +Then we extract the key performance indicator (KPI) and features from the folds object. In this case, `KPI` refers to runtime. As described in the ASlib documentation, `KPI_pen` measures the penalized runtime. If an instance is solved within the timeout (`cutoff`) by the selected algorithm, the actual runtime is used. However, if a timeout occurs, the timeout value is multiplied by 10 to penalize the algorithm's performance. We also define `nins` as the number of instances and `ID` as unique identifiers for each instance. + +```{r echo = TRUE, tidy=TRUE, eval=TRUE} +KPI <- folds$data[, folds$performance] +features <- folds$data[, folds$features] +cutoff <- ASScen$desc$algorithm_cutoff_time +is.timeout <- ASScen$algo.runstatus[, -c(1, 2)] != "ok" +KPI_pen <- KPI * ifelse(is.timeout, 10, 1) +nins <- length(getInstanceNames(ASScen)) +ID <- 1:nins +``` + +## Quantile random forest using ASML on instance-normalized KPI + +We use the \CRANpkg{ASML} package to perform quantile random forest on instance-normalized KPI. We have already established the folds beforehand, and we want to use those partitions to maintain consistency with the original ASlib scenario design. Therefore, we provide `x.test` and `y.test` as arguments directly to the `partition_and_normalize` function. 
+
+```{r echo = TRUE, tidy=TRUE, eval=TRUE}
+data <- partition_and_normalize(x = features, y = KPI, x.test = features, y.test = KPI, better_smaller = TRUE)
+train_control <- caret::trainControl(index = folds$train, savePredictions = "final")
+training <- AStrain(data, method = "qrf", trControl = train_control)
+```
+
+In the next code block, we process the predictions made by the models trained using \CRANpkg{ASML} and calculate the same performance metrics used in ASlib, namely, the percentage of solved instances (`succ`), the penalized average runtime (`par10`), and the misclassification penalty (`mcp`), as detailed in the ASlib documentation `r knitr::asis_output(ifelse(knitr::is_html_output(), '[@bis16]', '\\citep{bis16}'))`.
+
+```{r echo = TRUE, tidy=TRUE, eval=TRUE}
+pred_list <- lapply(training, function(model) {
+  model$pred %>%
+    arrange(rowIndex) %>%
+    pull(pred)
+})
+
+pred <- do.call(cbind, pred_list)
+alg_sel <- apply(pred, 1, which.max)
+
+succ <- mean(!is.timeout[cbind(ID, alg_sel)])
+par10 <- mean(KPI_pen[cbind(ID, alg_sel)])
+mcp <- mean(KPI[cbind(ID, alg_sel)] - apply(KPI, 1, min))
+```
+
+In Table `r knitr::asis_output('\\@ref(tab:AMSLtabASLIB)')`, we present the results. We observe that, in this example, using the instance-normalized KPI with a quantile random forest provides an alternative to the standard regression models employed in the original ASlib study (linear model, regression trees, and regression random forest), and yields improved performance.
+ +```{r AMSLtabASLIB, echo = FALSE, tidy=TRUE, eval=TRUE} +results_table <- data.frame(Model = "ASML qrf", succ = format(succ, nsmall = 3, digits = 3), par10 = format(par10, nsmall = 3, digits = 3), mcp = format(mcp, nsmall = 3, digits = 3)) + +# Add manually defined rows +manual_rows <- data.frame( + Model = c("baseline vbs", "baseline singleBest", "regr.lm", "regr.rpart", "regr.randomForest"), + succ = format(c(1.000, 0.812, 0.843, 0.843, 0.846), nsmall = 3, digits = 3), + par10 = format(c(227.605, 7002.907, 5887.326, 5916.120, 5748.065), nsmall = 3, digits = 3), + mcp = format(c(0.000, 688.774, 556.875, 585.669, 540.574), nsmall = 3, digits = 3) +) + +# Combine with the results table +results_table <- rbind(manual_rows, results_table) + +# Update column names if necessary +colnames(results_table) <- kableExtra::cell_spec(colnames(results_table), monospace = TRUE) + +# Create the table +if (knitr::is_latex_output()) { + fs <- 9 +} else { + fs <- NULL +} +if (knitr::is_html_output()) { + results_table %>% + kableExtra::kbl( + escape = FALSE, caption = "Performance results of various models on the CPMP-2015 dataset. The last row represents the performance of the quantile random forest model based on instance-normalized KPI using the \\CRANpkg{ASML} package. 
+ The preceding rows detail the results (all taken from the original ASlib study^[Available at: https://coseal.github.io/aslib-r/scenario-pages/CPMP-2015/llama.html (Accessed October 25, 2024).]) of the virtual best solver (vbs), single best solver (singleBest), and the considered regression methods (linear model, regression trees and regression random forest).",
+      align = "lrrr"
+    ) %>%
+    kableExtra::kable_styling(full_width = FALSE, position = "center", latex_options = "hold_position", font_size = fs) %>%
+    kableExtra::row_spec(0, bold = TRUE) %>% # Bold header row
+    kableExtra::row_spec(c(1, 2), background = "#E6F2FA") %>% # Shading for the baseline rows
+    kableExtra::row_spec(3:5, background = "#B3D8E5") %>% # Shading for the regression-method rows
+    kableExtra::row_spec(nrow(results_table), background = "#99C4DE") # Shading for the ASML row
+} else {
+  results_table %>%
+    kableExtra::kbl(
+      escape = FALSE, caption = "Performance results of various models on the CPMP-2015 dataset. The last row represents the performance of the quantile random forest model based on instance-normalized KPI using the \\CRANpkg{ASML} package. 
+ The preceding rows detail the results (all taken from the original ASlib study) of the virtual best solver (vbs), single best solver (singleBest), and the considered regression methods (linear model, regression trees and regression random forest).", booktabs = TRUE,
+      align = "lrrr"
+    ) %>%
+    kableExtra::kable_styling(full_width = FALSE, position = "center", latex_options = "hold_position", font_size = fs) %>%
+    kableExtra::row_spec(0, bold = TRUE)
+}
+```
+
+```{r AMSLtabASLIB_OLD, echo = FALSE, tidy=TRUE, eval=FALSE}
+results_table <- cbind(succ, par10, mcp)
+colnames(results_table) <- kableExtra::cell_spec(colnames(results_table), monospace = TRUE)
+if (knitr::is_latex_output()) {
+  fs <- 9
+} else {
+  fs <- NULL
+}
+kableExtra::kbl(results_table, escape = F, booktabs = TRUE, caption = "Performance results of quantile random forest on the CPMP-2015 dataset based on instance-normalized KPI.") %>%
+  kableExtra::kable_styling(full_width = FALSE, position = "center", latex_options = "hold_position", font_size = fs)
+```
+
+It is important to note that this is merely an illustrative example; there are other scenarios in ASlib where replication may not be feasible in the same manner, due to factors not considered in \CRANpkg{ASML} (for more robust behavior across the ASlib benchmark, we refer to \CRANpkg{llama}). Despite these limitations, \CRANpkg{ASML} provides a flexible framework that allows researchers to explore various methodologies, including those not directly applicable with \CRANpkg{llama} through \CRANpkg{mlr}, and to improve algorithm selection processes across different scenarios, ultimately contributing to improved understanding and performance in algorithm selection tasks.
+
+# Summary and discussion
+
+In this work, we present \CRANpkg{ASML}, an R package to select the best algorithm from a portfolio of candidates based on a chosen KPI. 
\CRANpkg{ASML} uses instance-specific features and historical performance data to estimate how well each algorithm is likely to perform on new instances, via a model selected by the user: any regression method from the \CRANpkg{caret} package or a custom function. This enables the automatic selection of the most suitable algorithm. The use of instance-normalized KPIs for algorithm selection is a novel aspect of this package, allowing a unified comparison across different algorithms and problem instances.
+
+While the motivation and examples presented in this work focus on optimization problems, particularly the automatic selection of branching rules and decision strategies in polynomial optimization, the \CRANpkg{ASML} framework is inherently flexible and can be applied more broadly. In particular, in the context of ML, selecting the right algorithm is a crucial factor for the success of ML applications. Traditionally, this process involves empirically assessing potential algorithms with the available data, which can be resource-intensive. In contrast, in so-called Meta-Learning, the aim is to predict the performance of ML algorithms based on features of the learning problems (meta-examples). Each meta-example contains details about a previously solved learning problem, including its features and the performance achieved by the candidate algorithms on that problem. A common Meta-Learning approach involves using regression algorithms to forecast the value of a selected performance metric (such as classification error) for the candidate algorithms based on the problem features. This method is commonly referred to as Meta-Regression in the literature. Thus, \CRANpkg{ASML} could also be used in this context, providing a flexible tool for algorithm selection across a variety of domains.
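+
+As a minimal sketch of this Meta-Regression use case (the objects `meta_features`, a matrix of problem descriptors, and `accuracy`, a data frame with one column of achieved accuracies per candidate ML algorithm, are hypothetical), the standard \CRANpkg{ASML} pipeline would apply unchanged:
+
+```{r metaRegressionSketch, echo = TRUE, tidy=TRUE, eval=FALSE}
+# Hypothetical meta-learning data: rows are previously solved problems
+# (meta-examples); accuracy is a "larger is better" KPI
+data <- partition_and_normalize(meta_features, accuracy, better_smaller = FALSE)
+training <- AStrain(data, method = "rf")
+pred <- ASpredict(training, newdata = data$x.test)
+ranking(data, predictions = pred)
+```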
+ +# Acknowledgments + +The authors would like to thank María Caseiro-Arias, Antonio Fariña-Elorza and Manuel Timiraos-López for their contributions to the development of the \CRANpkg{ASML} package. +This work is part of the R\&D projects PID2024-158017NB-I00, PID2020-116587GB-I00 and PID2021-124030NB-C32 granted by MICIU/AEI/10.13039/501100011033. This research was also funded by Grupos de Referencia Competitiva ED431C-2021/24 and ED431C 2025/03 from the Consellería de Educación, Ciencia, Universidades e Formación Profesional, Xunta de Galicia. Brais González-Rodríguez acknowledges the support from MICIU, through grant BG23/00155. diff --git a/_articles/RJ-2025-045/RJ-2025-045.html b/_articles/RJ-2025-045/RJ-2025-045.html new file mode 100644 index 0000000000..50cc82c353 --- /dev/null +++ b/_articles/RJ-2025-045/RJ-2025-045.html @@ -0,0 +1,4295 @@ + + + + + + + + + + + + + + + + + + + + + + ASML: An R Package for Algorithm Selection with Machine Learning + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
    +

    ASML: An R Package for Algorithm Selection with Machine Learning

    + + + +

    For extensively studied computational problems, it is commonly acknowledged that different instances may require different algorithms for optimal performance. The R package ASML focuses on the task of efficiently selecting, from a given portfolio of algorithms, the most suitable one for each specific problem instance based on significant instance features. The package allows for the use of the machine learning tools available in the R package caret and additionally offers visualization tools and summaries of results that make it easier to interpret how algorithm selection techniques perform, helping users better understand and assess their behavior and performance improvements.

    +
    + + + +
    +

    1 Introduction

    +

    Selecting from a set of algorithms the most appropriate one for solving a given problem instance (understood as an individual problem case with its own specific characteristics) is a common issue that comes up in many different situations, such as in combinatorial search problems (Kotthoff 2016; Drake et al. 2020), planning and scheduling problems (Messelis and De Causmaecker 2014; Speck et al. 2021), or in machine learning (ML), where the multitude of available techniques often makes it challenging to determine the best approach for a particular dataset (Vanschoren 2019). For an extensive survey on automated algorithm selection and application areas, we refer to Kerschke et al. (2019).

    +

    Figure 1 presents a general scheme, adapted from Figure 1 in Kerschke et al. (2019), illustrating the use of ML for algorithm selection. A set of problem instances is given, each described by associated features, together with a portfolio of algorithms that have been evaluated on all instances. The instance features and performance results are then fed into a ML framework, which is trained to produce a selector capable of predicting the best-performing algorithm for an unseen instance. Note that we are restricting attention to offline algorithm selection, in which the selector is constructed using a training set of instances and then applied to new problem instances.

    +
    +
    +Schematic overview of the interplay between problem instance features (top left), algorithm performance data (bottom left), selector construction (center), and the assessment of selector performance (bottom right). Adapted from Kerschke et al. (2019). +

    +Figure 1: Schematic overview of the interplay between problem instance features (top left), algorithm performance data (bottom left), selector construction (center), and the assessment of selector performance (bottom right). Adapted from Kerschke et al. (2019). +

    +
    +
    +

    Algorithm selection tools also demonstrate significant potential in the field of optimization, enhancing performance on problems where multiple solving strategies are available. For example, a key factor in the efficiency of state-of-the-art global solvers in mixed integer linear programming, and also in nonlinear optimization, is the design of branch-and-bound algorithms and, in particular, of their branching rules. No single branching rule outperforms all others on every problem instance. Instead, different branching rules exhibit optimal performance on different types of problem instances. Developing methods for the automatic selection of branching rules based on instance features has proven to be an effective strategy toward solving optimization problems more efficiently (Lodi and Zarpellon 2017; Bengio et al. 2021; Ghaddar et al. 2023).

    +

    In algorithm selection, not only do the problem domain to which it applies and the algorithms for addressing problem instances play a crucial role, but also the metrics used to assess algorithm effectiveness —referred to in this work as Key Performance Indicators (KPIs). KPIs are used in different fields to assess and measure the performance of specific objectives or goals. In a business context, these indicators are quantifiable metrics that provide valuable insights into how well an individual, team, or entire organization is progressing towards achieving its defined targets. In the context of algorithms, KPIs serve as quantifiable measures used to evaluate the effectiveness and efficiency of algorithmic processes. For instance, in the realm of computer science and data analysis, KPIs can include measures like execution time, accuracy, and scalability. Monitoring these KPIs allows for a comprehensive assessment of algorithmic performance, aiding in the selection of the most appropriate algorithm for a given instance and facilitating continuous improvement in algorithmic design and implementation.

    +

    Additionally, in many applications, normalizing the KPI to a standardized range like \([0, 1]\) provides a more meaningful basis for comparison. The KPI obtained through this process, which we will refer to as instance-normalized KPI, reflects the performance of each algorithm relative to the best-performing one for each specific instance. For example, if we have multiple algorithms and we are measuring execution time that can vary across instances, normalizing the execution time for each instance relative to the fastest algorithm within that same instance allows for a fairer evaluation. This is particularly important when the values of execution time might not directly reflect the relative performance of the algorithms due to wide variations in the scale of the measurements. Thus, normalizing puts all algorithms on an equal footing, allowing a clearer assessment of their relative efficiency.
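
    For instance, if the KPI is an execution time \(t(x,A)\) of algorithm \(A\) on instance \(x\), one natural normalization (an illustrative formula; the exact definition implemented by the package may differ) is
    \[ \tilde{t}(x,A) = \frac{\min_{A^\prime} t(x,A^\prime)}{t(x,A)} \in (0,1], \]
    where the minimum runs over the algorithms in the portfolio, so that the fastest algorithm on each instance scores 1 and slower algorithms score proportionally less.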

    +

    Following the general framework illustrated in Figure 1, the R package ASML (González-Rodríguez et al. 2025b) provides a wrapper for ML methods to select from a portfolio of algorithms based on the value of a given KPI. It uses a set of features in a training set to learn a regression model for the instance-normalized KPI value for each algorithm. Then, the instance-normalized KPI is predicted for unseen test instances, and the algorithm with the best predicted value is chosen. As learning techniques for algorithm selection, the user can invoke any regression method from the caret package (Kuhn 2008) or supply a custom function. This makes our package flexible, as it automatically supports new methods when they are added to caret. Although initially designed for selecting branching rules in nonlinear optimization problems, its versatility allows the package to effectively address algorithm selection challenges across a wide range of domains. It can be applied to a broad spectrum of disciplines whenever there is a diverse set of instances within a specific problem domain, a suite of algorithms with varying behaviors across instances, clearly defined metrics for evaluating the performance of the available algorithms, and known features or characteristics of the instances that can be computed and are ideally correlated with algorithm performance. The visualization tools implemented in the package allow for an effective evaluation of the performance of the algorithm selection techniques. A key distinguishing element of ASML is its learning-phase approach, which uses instance-normalized KPI values and trains a separate regression model for each algorithm to predict its normalized KPI on unseen instances.

    +

    2 Background

    + +

    The algorithm selection problem was first outlined in the seminal work by Rice (1976). In simple terms, for a given set of problem instances (problem space) and a set of algorithms (algorithm space), the goal is to determine a selection model that maps each problem instance to the most suitable algorithm for it. By most suitable, we mean the best according to a specific metric that associates each combination of instance and algorithm with its respective performance. Formally, let \(\mathcal{P}\) denote the problem space or set of problem instances. The algorithm space or set of algorithms is denoted by \(\mathcal{A}\). The metric \(p:\mathcal{P}\times\mathcal{A}\rightarrow \mathbb{R}^n\) measures the performance \(p(x,A)\) of
+any algorithm \(A\in \mathcal{A}\) on instance \(x\in\mathcal{P}\). The aim is then to construct a selector \(S:\mathcal{P}\rightarrow \mathcal{A}\) that maps any problem
+instance \(x\in \mathcal{P}\) to an algorithm \(S(x)=A\in \mathcal{A}\) in such a way that its performance is optimal.
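
    In this notation, and assuming for simplicity that \(p\) is scalar and larger values are better (a vector-valued performance measure must first be aggregated into a single score), an ideal selector can be written as
    \[ S(x) \in \underset{A \in \mathcal{A}}{\arg\max}\, p(x,A). \]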

    +

    As discussed in the Introduction, many algorithm selection methods in the literature use ML tools to model the relationship between problem instances and algorithm performance, using features derived from these instances. The pivotal step in this process is defining appropriate features that can be readily computed and are likely to impact algorithm performance. That is, given \(x\in\mathcal{P}\), we make use of informative features \(f(x) = (f_1(x),\ldots,f_k(x))\in \mathbb{R}^k\). In this framework, the selector \(S\) maps the simpler feature space \(\mathbb{R}^k\) into the algorithm space \(\mathcal{A}\). A scheme of the algorithm selection problem, as described in Rice (1976), is shown in Figure 2.

    +
    +
    +Scheme of the algorithm selection problem by Rice (1976). +

    +Figure 2: Scheme of the algorithm selection problem by Rice (1976). +

    +
    +
    +

    For the practical derivation of the selection model \(S\), we use training data consisting of features \(f(x)\) and performances \(p(x,A)\), where \(x \in \mathcal{P}^\prime \subset \mathcal{P}\) and \(A \in \mathcal{A}\). The task is to learn the selector \(S\) based on the training data. The model allows us to forecast the performance on unobserved problem instances based on their features and subsequently select the algorithm with the highest predicted performance. A comprehensive discussion of various aspects of algorithm selection techniques can be found in Kotthoff (2016) and Pulatov et al. (2022).


    3 Algorithm selection tools in R


    The task of algorithm selection has seen significant advancements in recent years, with R packages facilitating this process. Here we present some of the existing tools that offer a range of functionalities, including flexible model-building frameworks, automated workflows, and standardized scenario formats, providing valuable resources for both researchers and end-users in algorithm selection.


    The llama package (Kotthoff et al. 2021) provides a flexible implementation within R for evaluating algorithm portfolios. It simplifies the task of building predictive models to solve algorithm selection scenarios, allowing users to apply ML models effectively. In llama, ML algorithms are defined using the mlr package (Bischl et al. 2016b), offering a structured approach to model selection. On the other hand, the Algorithm Selection Library (ASlib) (Bischl et al. 2016a) proposes a standardized format for representing algorithm selection scenarios and introduces a repository that hosts an expanding collection of datasets from the literature. It serves as a benchmark for evaluating algorithm selection techniques under consistent conditions. It is accessible to R users through the aslib package. This integration simplifies the process for those working within the R environment. Furthermore, aslib interfaces with the llama package, facilitating the analysis of algorithm selection techniques within the benchmark scenarios it provides.


    Our ASML package offers an approach to algorithm selection based on the powerful and flexible caret framework. By using caret’s ability to work with many different ML models, along with its model tuning and validation tools, ASML makes the selection process easy and effective, especially for users already familiar with caret. Thus, while ASML shares some conceptual similarities with llama, it distinguishes itself through its interface to the ML models in caret instead of mlr, which is currently considered retired by the mlr-org team, potentially leading to compatibility issues with certain learners, and has been succeeded by the next-generation mlr3 (Lang et al. 2019). In addition, ASML automates the normalization of KPIs based on the best-performing algorithm for each instance, addressing the challenges that arise when performance metrics vary significantly across instances. ASML further provides new visualization tools that can be useful for understanding the results of the learning process. A comparative overview of the main features and differences between these packages can be seen in Table 1.

Table 1: Comparative overview of ASML and llama for algorithm selection.

| Aspect | ASML | llama |
|---|---|---|
| Input data | features; KPIs; split by families supported | features; KPIs; feature costs supported |
| Normalized KPIs | ✓ | ✗ |
| ML backend | caret | mlr |
| Hyperparameter tuning | ASML::AStrain() supports arguments passed to caret (trainControl(), tuneGrid) | llama::cvFolds; llama::tuneModel |
| Parallelization | ✓ (with snow) | ✓ (with parallelMap) |
| Results summary | per algorithm; best overall and per instance; ML-selected | virtual best and single best per instance; aggregated scores (PAR, count, successes) |
| Visualization | boxplots (per algorithm and ML-selected); ranking plots; barplots (best vs ML-selected) | scatter plots comparing two algorithm selectors |
| Model interpretability tools | ✓ (with DALEX) | ✗ |
| ASlib integration | basic support | extended support |
| Latest release | CRAN 1.1.0 (2025) | CRAN 0.10.1 (2021) |

    There are also automated approaches that streamline the process of selecting and optimizing ML models within the R environment. Tools like h2o provide robust functionalities specifically designed for R users, facilitating an end-to-end ML workflow. These frameworks automate various tasks, including algorithm selection, hyperparameter optimization, and feature engineering, thereby simplifying the process for users of all skill levels. By integrating these automated solutions into R, users can efficiently explore a wide range of models and tuning options without needing extensive domain knowledge or manual intervention. This automation not only accelerates the model development process but also improves the overall performance of ML projects by allowing a systematic evaluation of different approaches and configurations. However, while h2o excels at automating the selection of ML models and hyperparameter tuning, it does not perform algorithm selection based on instance-specific features, which is the primary focus of our approach. Instead, it evaluates multiple algorithms in parallel and selects the best-performing one based on predetermined metrics.


    4 Using the ASML package


    Here, we illustrate the usage of the ASML package with an example within the context of algorithm selection for spatial branching in polynomial optimization, aligning with the problem discussed in Ghaddar et al. (2023) and further explored in González-Rodríguez et al. (2025a). Table 2 provides an overview of the problem and a summary of the components that we will discuss in detail below.

Table 2: Summary of the branching rule selection problem.

| Algorithms | max, sum, dual, range, eig-VI, eig-CMI |
|---|---|
| KPI | pace |
| Number of instances | 407 |
| Number of instances per library | 180 (DS), 164 (MINLPLib), 63 (QPLIB) |
| Number of features | 33 |

A well-known approach for finding global optima in polynomial optimization problems is based on the use of the Reformulation-Linearization Technique (RLT) (Sherali and Tuncbilek 1992). Without delving into intricate details, RLT operates by creating a linear relaxation of the original polynomial problem, which is then integrated into a branch-and-bound framework. The branching process involves assigning a score to each variable, based on the violations of the RLT identities it participates in, after solving the corresponding relaxation at each node. Subsequently, the variable with the highest score is selected for branching. The computation of these scores is a critical aspect and allows for various approaches, leading to distinct branching rules that constitute our algorithm selection portfolio. Specifically, in our example, we will examine six distinct branching rules (referred to interchangeably as branching rules or algorithms), labeled as max, sum, dual, range, eig-VI, and eig-CMI rules. For the definitions and a comprehensive understanding of the rationale behind these rules, refer to Ghaddar et al. (2023).


    Measuring the performance of different algorithms in the context of optimization is crucial for evaluating their effectiveness and efficiency. Two common metrics for this evaluation are running time and optimality gap, measured as a function of the lower and upper bounds for the objective function value at the end of the algorithm (a small optimality gap indicates that the algorithm is producing solutions close to the optimal). Both metrics are important and are often considered together to evaluate algorithm performance. For instance, it is meaningful to consider the time required to reduce the optimality gap by one unit as KPI. In our example, and to ensure it is well-defined, we make use of a slightly different metric, which we refer to as pace, defined as the time required to increase the lower bound by one unit. For the pace, a smaller value is preferred, as it indicates better performance.
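To make the definition concrete, here is a minimal sketch of the pace computation (this helper and its numbers are purely illustrative; in the package, the pace values come precomputed in the branching dataset):

```r
# Illustrative helper (not part of ASML): pace = time needed to raise the
# lower bound by one unit, so smaller values mean faster progress.
pace <- function(time, lb_start, lb_end) time / (lb_end - lb_start)

# 120 seconds to move the lower bound from 10 to 40: 4 seconds per unit.
pace(120, lb_start = 10, lb_end = 40)
```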


    As depicted in Figure 2, a crucial aspect of the methodology involves selecting input variables (features) that facilitate the prediction of the KPI for each branching rule. We consider 33 features representing global information of the polynomial optimization problems, such as relevant characteristics of variables, constraints, monomials, coefficients, or other attributes. A detailed description of the considered features can be found in Table 3. Although we won’t delve into these aspects, determining appropriate features is often complex, and using feature-selection methods can be beneficial for choosing the most relevant ones.

Table 3: Features from the branching dataset.

| Index | Description |
|---|---|
| 3 | Number of variables |
| 4 | Number of constraints |
| 5 | Degree |
| 6 | Number of monomials |
| 7 | Density |
| 8 | Density of VIG |
| 9 | Modularity of VIG |
| 10 | Treewidth of VIG |
| 11 | Density of CMIG |
| 12 | Modularity of CMIG |
| 13 | Treewidth of CMIG |
| 14 | Pct. of variables not present in any monomial with degree greater than one |
| 15 | Pct. of variables not present in any monomial with degree greater than two |
| 16 | Number of variables divided by number of constraints |
| 17 | Number of variables divided by degree |
| 18 | Pct. of equality constraints |
| 19 | Pct. of linear constraints |
| 20 | Pct. of quadratic constraints |
| 21 | Number of monomials divided by number of constraints |
| 22 | Number of RLT variables divided by number of constraints |
| 23 | Pct. of linear monomials |
| 24 | Pct. of quadratic monomials |
| 25 | Pct. of linear RLT variables |
| 26 | Pct. of quadratic RLT variables |
| 27 | Variance of the ranges of the variables |
| 28 | Variance of the coefficients |
| 29 | Variance of the density of the variables |
| 30 | Variance of the no. of appearances of each variable |
| 31 | Average of the ranges of the variables |
| 32 | Average of the coefficients |
| 33 | Average pct. of monomials in each constraint and in the objective function |
| 34 | Average of the no. of appearances of each variable |
| 35 | Median of the ranges of the variables |

Note: Index refers to columns of branching$x.

    To assess the performance of the algorithm selection methods in this context, we have a diverse set of 407 instances from different optimization problems, taken from three well-known benchmarks (Bussieck et al. 2003; Dalkiran and Sherali 2016; Furini et al. 2018), corresponding respectively to the MINLPLib, DS, and QPLIB libraries. Details are given in Table 2. The data for this analysis is contained within the branching dataset included in the package. We begin by defining two data frames. The features data frame includes two initial columns that provide the instance names and the corresponding family (library in our example) for each instance. The remaining columns consist of the features listed in Table 3.


    We also define the KPI data frame, which is derived from branching$y. This data frame contains the pace values for each of the six branching rules considered in this study (specified by the labels in the lab_rules vector). These data frames will serve as the input for our subsequent analyses.

set.seed(1234)
library(ASML)
data(branching)
features <- branching$x
KPI <- branching$y
lab_rules <- c("max", "sum", "dual", "range", "eig-VI", "eig-CMI")

4.1 Pre-processing the data


As with any analysis, the first step involves preprocessing the data. This includes using the function partition_and_normalize, which not only divides the dataset into training and test sets but also normalizes the KPI relative to the best result for each instance. The argument better_smaller specifies whether a lower KPI value is preferred (as in our case, where the KPI represents pace and smaller values indicate better performance) or whether larger KPI values are considered more advantageous.

data <- partition_and_normalize(features, KPI, family_column = 1, split_by_family = TRUE,
    better_smaller = TRUE)
names(data)

[1] "x.train"          "y.train"          "y.train.original"
[4] "x.test"           "y.test"           "y.test.original" 
[7] "families.train"   "families.test"    "better_smaller"  

    When using the function partition_and_normalize the resulting object is of class as_data and contains several key components essential for our study. Specifically, the object includes x.train and x.test, representing the feature sets for the training and test datasets, respectively. Additionally, it contains y.train and y.test, with the instance-normalized KPI corresponding to each dataset, along with their original counterparts, y.train.original and y.test.original. This structure allows us to retain the original KPI values while working with the instance-normalized data. Furthermore, when the parameter split_by_family is set to TRUE, as in the example, the object also includes families.train and families.test, indicating the family affiliation for each observation within the training and test sets. Figure 3 illustrates how the split preserves the proportions of instances for each library.

Figure 3: Train/Test partition preserving the percentage of instances for each library.

As a tool for visualizing the performance of the considered algorithms, the boxplots function operates on objects of class as_data and generates boxplots for the instance-normalized KPI. This visualization facilitates the comparison of performance differences across instances. The function can be applied to both training and test observations and can also group the results by family. Additionally, it accepts common arguments typically used in R functions. Figure 4 shows the instance-normalized KPI of the instances in the train set. What becomes evident from the boxplots is that no branching rule outperforms the others across all instances, and a wrong choice of rule for certain problems can lead to very poor performance.

boxplots(data, test = FALSE, by_families = FALSE, labels = lab_rules)

Figure 4: Boxplots of instance-normalized KPI for each algorithm across instances in the train set.

    The ranking function, specifically designed for the ASML package, is also valuable for visualizing the differing behaviors of the algorithms under investigation, depending on the analyzed instances. After ranking the algorithms for each instance, based on the instance-normalized KPI, the function generates a bar chart for each algorithm, indicating the percentage of times it occupies each ranking position. The numbers displayed within the bars represent the mean value of the instance-normalized KPI for the problems associated with that specific ranking position. Again, the representation can be made both for the training and test sets, as well as by family. In Figure 5, we present the chart corresponding to the training sample and categorized by family. In particular, it is observed that certain rules, when not the best choice for a given instance, can perform quite poorly in terms of instance-normalized KPI (see, for example, the results on the MINLPLib library). This highlights the importance of not only selecting the best algorithm for each instance but also ensuring that the chosen algorithm does not perform too poorly when it isn’t optimal. In some cases, even if an algorithm isn’t the best-performing option, it may still provide reasonably good results, whereas a wrong choice can result in significantly worse outcomes.

ranking(data, test = FALSE, by_families = TRUE, labels = lab_rules)

Figure 5: Ranking of algorithms based on the instance-normalized KPI for the training sample, categorized by family. The bars represent the percentage of times each algorithm appeared in different ranking positions, with the numbers indicating the mean value of the KPI.

    Additionally, functions from the caret package can be applied if further operations on the predictors are needed. Here we show an example where the Yeo-Johnson transformation is applied to the training set, and the same transformation is subsequently applied to the test set to ensure consistency across both datasets. The flexibility of caret also allows for the inclusion of advanced techniques, such as feature selection and dimensionality reduction, to improve the quality of the algorithm selection process.

preProcValues <- caret::preProcess(data$x.train, method = "YeoJohnson")
data$x.train <- predict(preProcValues, data$x.train)
data$x.test <- predict(preProcValues, data$x.test)

    4.2 Training models and predicting the performance of the algorithms


    The approach in ASML to algorithm selection is based on building regression models that predict the instance-normalized KPI of each considered algorithm. To this end, users can take advantage of the wide range of ML models available in the caret package, which provides a unified interface for training and tuning various types of models. Models trained with caret can be seamlessly integrated into the ASML workflow using the AStrain function from ASML, as shown in the next example. Just for illustrative purposes, we use quantile random forest (Meinshausen 2006) to model the behavior of the instance-normalized KPI based on the features. This is done with the qrf method in the caret package, which relies on the quantregForest package (Meinshausen 2024).

library(quantregForest)
tune_grid <- expand.grid(mtry = 10)
training <- AStrain(data, method = "qrf", tuneGrid = tune_grid)

    Additional arguments for caret::train can also be passed directly to ASML::AStrain. This allows users to take advantage of the flexibility of the caret package, including specifying control methods (such as cross-validation), tuning parameters, or any other relevant settings provided by caret::train. This integration ensures that the ASML workflow can fully make use of the modeling capabilities offered by caret. To make the execution faster (it is not our intention here to delve into the choice of the best model), we use a tune_grid that sets a fixed value for mtry. This avoids the need for an exhaustive search for this hyperparameter, speeding up the model training process. Other modeling approaches should also be considered, as they may offer better performance depending on the specific characteristics of the data and the problem at hand. For more computationally intensive models or larger datasets, the ASML::AStrain function includes the argument parallel, which can be set to TRUE to enable parallel execution using the snow package (Tierney et al. 2021). This allows the training step to be distributed across multiple cores, reducing computation time. A detailed example on a larger dataset is provided in the following section, showing the scalability of the workflow and the effect of parallelization on training time.


    The function caret::train returns a trained model along with performance metrics, predictions, and tuning parameters, providing insights into the model’s effectiveness. In a similar manner, ASML::AStrain offers the same type of output but for each algorithm under consideration, allowing straightforward comparison within the ASML framework. The ASML::ASpredict function generates the predictions for new data by using the models created during the training phase for each algorithm under evaluation. Thus, predictions for the algorithms are obtained simultaneously, facilitating a direct comparison of their performance. By using ASML::ASpredict as follows, we obtain a matrix where each row corresponds to an instance from the test set, and each column represents the predicted instance-normalized KPIs for the six branching rules using the qrf method.

predict_test <- ASpredict(training, newdata = data$x.test)

    4.3 Evaluating and visualizing the results


    One of the key strengths of the ASML package lies in its ability to evaluate results collectively and provide intuitive visualizations. This approach not only aids in identifying the most effective algorithms but also contributes to the interpretability of the results, making it easier for users to make informed decisions based on the performance metrics and visual representations provided. For example, the function KPI_table returns a table showing the arithmetic and geometric mean of the KPI (both instance-normalized and not normalized) obtained on the test set for each algorithm, as well as for the algorithm selected by the learning model (the one with the largest instance-normalized predicted KPI for each instance). In Table 4, the results for our case study are shown. It is important to note that larger values are better in the columns for the arithmetic and geometric mean of the instance-normalized KPI (where values close to 1 indicate the best performance). Conversely, in the columns for non-normalized values, lower numbers reflect better outcomes. In all cases, the best results are obtained for the ML algorithm. Note also that in this case, the differences in the performance of the algorithms are likely better reflected by the geometric mean because it gives a better representation of relative differences.
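The last remark can be checked on a toy vector of instance-normalized KPIs (values are hypothetical): a single very poor instance is penalized far more strongly by the geometric mean than by the arithmetic mean.

```r
kpi <- c(0.95, 0.90, 0.05)   # one instance with very poor normalized KPI
mean(kpi)                    # arithmetic mean: about 0.63
exp(mean(log(kpi)))          # geometric mean: about 0.35
```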

KPI_table(data, predictions = predict_test)
Table 4: Arithmetic and geometric mean of the KPI (both instance-normalized and non-normalized) for each algorithm on the test set, along with the results for the algorithm selected by the learning model (first row).

| | Arithmetic mean inst-norm KPI | Geometric mean inst-norm KPI | Arithmetic mean non-norm KPI | Geometric mean non-norm KPI |
|---|---|---|---|---|
| ML | 0.911 | 0.887 | 88114.19 | 1.035 |
| max | 0.719 | 0.367 | 158716.13 | 2.574 |
| sum | 0.791 | 0.537 | 104402.53 | 1.780 |
| dual | 0.842 | 0.581 | 104393.92 | 1.634 |
| range | 0.879 | 0.644 | 107064.29 | 1.432 |
| eig-VI | 0.781 | 0.474 | 131194.49 | 2.007 |
| eig-CMI | 0.800 | 0.591 | 88197.74 | 1.616 |

    Additionally, the function KPI_summary_table generates a concise comparative table displaying values for three different choices: single best, ML, and optimal, see Table 5. The single best choice refers to selecting the same algorithm for all instances based on the lowest geometric mean of the non-normalized KPI (in this case the range rule). This approach evaluates the performance of each algorithm across all instances and chooses the one that consistently performs best overall, rather than optimizing for individual instances. The ML choice represents the algorithm selected by the quantile random forest model. The optimal choice corresponds to solving each instance with the algorithm that performs best for that specific instance. The ML choice shows promising results, with a mean KPI close to the optimal choice, demonstrating its capability to select algorithms that yield competitive performance.

KPI_summary_table(data, predictions = predict_test)
Table 5: Arithmetic and geometric mean of the non-normalized KPI for single best choice, ML choice, and optimal choice.

| | Arithmetic mean non-norm KPI | Geometric mean non-norm KPI |
|---|---|---|
| single best | 107064.29 | 1.432 |
| ML | 88114.19 | 1.035 |
| optimal | 88085.37 | 0.911 |
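The two reference values that bracket the ML choice can be reproduced on a toy matrix of non-normalized KPIs (smaller is better; the numbers below are hypothetical, not those of the case study):

```r
# Rows: instances; columns: algorithms; entries: non-normalized KPI (pace).
kpi <- matrix(c(4, 7,
                9, 2,
                5, 6),
              nrow = 3, byrow = TRUE,
              dimnames = list(paste0("x", 1:3), c("alg1", "alg2")))
single_best <- min(colMeans(kpi))        # best fixed algorithm: alg2, mean 5
optimal     <- mean(apply(kpi, 1, min))  # virtual best: mean(4, 2, 5) = 3.67
```

Any instance-aware selector, including the ML choice, lies between these two bounds, as observed in the case study.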

The following code generates several visualizations that help us compare how well the algorithms perform according to the response variable (instance-normalized KPI) and also illustrate the behavior of the learning process. These plots give us good insights into how effective the algorithm selection process is and how it behaves in comparison to using the same branching rule for all instances. Figure 6 shows the boxplots comparing the performance of each algorithm in terms of the instance-normalized KPI, including the instance-normalized KPI of the rules selected by the ML process for the test set. In Figure 7, the performance is presented by family, allowing for a more detailed comparison across the different sets of instances. In Figure 8, we show the ranking of algorithms based on the instance-normalized KPI for the test sample, including the ML rule, categorized by family. Finally, in Figure 9, the right-side bar in the stacked bar plot (optimal) illustrates the proportion of instances in which each of the original rules is identified as the best-performing option. In contrast, the left-side bar (ML) depicts the frequency with which ML selects each rule as the top choice. Although the rule chosen by ML in each instance does not always match the best one for that case, ML tends to select the different rules in a similar proportion to how often those rules are the best across the test set. This means it does not consistently favor a particular rule or ignore any that are the best in a significant percentage of instances.

boxplots(data, predictions = predict_test, labels = c(lab_rules, "ML"))
boxplots(data, predictions = predict_test, labels = c(lab_rules, "ML"),
    by_families = TRUE)
ranking(data, predictions = predict_test, labels = c("ML", lab_rules),
    by_families = TRUE)
figure_comparison(data, predictions = predict_test, by_families = FALSE,
    labels = lab_rules)
Figure 6: Boxplots of instance-normalized KPI for each algorithm, including the ML algorithm, across instances in the test set.

Figure 7: Boxplots of instance-normalized KPI for each algorithm, including the ML algorithm, across instances in the test set, categorized by family.

Figure 8: Ranking of algorithms, including the ML algorithm, based on the instance-normalized KPI for the test sample, categorized by family. The bars represent the percentage of times each algorithm appeared in different ranking positions, with the numbers indicating the mean value of the normalized KPI.

Figure 9: Comparison of the best-performing rules: the right stack shows the proportion of times each of the original rules is identified as the best-performing option, while the left stack presents the frequency of selection by ML.

    4.4 Custom user-defined methods


    While caret provides a range of built-in methods for model training and prediction, there may be situations where researchers want to explore additional methods not directly integrated into the package. Considering alternative methods can improve the analysis and provide greater flexibility in modeling choices.


    In this section, we present an example of how to modify the quantile random forest qrf method. The qrf implementation in caret does not allow users to specify the conditional quantile to predict, which is set to the median by default. In this case, rather than creating an entirely new method, we only need to adjust the prediction function to include the what argument, allowing us to specify the desired conditional quantile for prediction. In this execution example, we base the algorithm selection method on the predictions of the \(\alpha\)-conditional quantile of the instance-normalized KPI for \(\alpha = 0.25\).

qrf_q_predict <- function(modelFit, newdata, what = 0.5, submodels = NULL) {
    out <- predict(modelFit$finalModel, newdata, what = what)
    if (is.matrix(out)) {
        out <- out[, 1]
    }
    out
}

predict_test_Q1 <- ASpredict(training, newdata = data$x.test, f = "qrf_q_predict",
    what = 0.25)
KPI_summary_table(data, predictions = predict_test_Q1)

    The results are summarized in Table 6.

Table 6: Arithmetic and geometric mean of the non-normalized KPI for single best choice, ML choice, and optimal choice. The ML choice is based on the predictions of the \(\alpha\)-conditional quantile for \(\alpha = 0.25\).

| | Arithmetic mean non-norm KPI | Geometric mean non-norm KPI |
|---|---|---|
| single best | 107064.29 | 1.432 |
| ML | 88112.80 | 1.110 |
| optimal | 88085.37 | 0.911 |

    4.5 Model interpretability


    Predictive modeling often relies on flexible but complex methods. These methods typically involve many parameters or hyperparameters, which can make the models difficult to interpret. To address this, interpretable ML techniques provide tools for exploring black-box models. ASML integrates seamlessly with the package DALEX (moDel Agnostic Language for Exploration and eXplanation), see Biecek (2018). With DALEX, users can obtain model performance metrics, evaluate feature importance, and generate partial dependence plots (PDPs), among other analyses.


    To simplify the use of DALEX within our framework, ASML provides the function ASexplainer. This function automatically creates DALEX explainers for the models trained with AStrain (one for each algorithm in the portfolio). Once the explainers are created, users can easily apply DALEX functions to explore and compare the behavior of each model. The following example shows how to obtain a plot of the reversed empirical cumulative distribution function of the absolute residuals, from the performance metrics computed with DALEX::model_performance, see Figure 10.

# Create DALEX explainers for each trained model
explainers_qrf <- ASexplainer(training, data = data$x.test, y = data$y.test,
    labels = lab_rules)
# Compute model performance metrics for each explainer
mp_qrf <- lapply(explainers_qrf, DALEX::model_performance)
# Plot the performance metrics
do.call(plot, unname(mp_qrf)) + theme_bw(base_line_size = 0.5)
    Figure 10: Reversed empirical cumulative distribution function of the absolute residuals of the trained models.

    The code below illustrates how to obtain feature importance (via DALEX::model_parts) and a PDP for the predictor variable degree (via DALEX::model_profile). Plots are not displayed in this manuscript, but they can be generated by executing the code.

    # Compute feature importance for each model in the explainers list
    vi_qrf <- lapply(explainers_qrf, DALEX::model_parts)
    # Plot the top 5 most important variables for each model
    do.call(plot, c(unname(vi_qrf), list(max_vars = 5)))
    # Compute PDP for the variable 'degree' for each model
    pdp_qrf <- lapply(explainers_qrf, DALEX::model_profile, variable = "degree",
        type = "partial")
    # Plot the PDPs generated
    do.call(plot, unname(pdp_qrf)) + theme_bw(base_line_size = 0.5)

    5 Example on a larger dataset


    To analyze the scalability of ASML, we now consider an example of algorithm selection in the field of high-performance computing (HPC), specifically in the context of the automatic selection of the most suitable storage format for sparse matrices on GPUs. This is a well-known problem in HPC, since the storage format has a decisive impact on the performance of many scientific kernels such as the sparse matrix–vector multiplication (SpMV). For this study, we use the dataset introduced by Pichel and Pateiro-López (2018), which contains 8111 sparse matrices and is available in the ASML package under the name SpMVformat. Each matrix is described by a set of nine structural features, and the performance of the single-precision SpMV kernel was measured on an NVIDIA GeForce GTX TITAN GPU under three storage formats: compressed row storage (CSR), ELLPACK (ELL), and hybrid (HYB). For each matrix and format, performance is expressed as the average GFLOPS (billions of floating-point operations per second) over 1000 SpMV operations. This setup allows us to study how matrix features relate to the most efficient storage format.
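    To make the instance-normalized KPI concrete for this setting, the following toy sketch (made-up GFLOPS values, not taken from the SpMVformat dataset) divides each matrix's measurements by its per-instance best, so the most efficient format on every matrix receives a normalized KPI of 1:

    ```r
    # Toy illustration of instance normalization for a higher-is-better KPI
    # (made-up GFLOPS values; not taken from the SpMVformat dataset).
    gflops <- matrix(c(10, 25, 22,
                       40, 18, 35,
                        5,  6,  9),
                     nrow = 3, byrow = TRUE,
                     dimnames = list(paste0("matrix", 1:3),
                                     c("CSR", "ELL", "HYB")))
    # Divide each row by its per-instance maximum: the best format on each
    # matrix gets a normalized KPI of exactly 1, all others fall in (0, 1).
    norm_kpi <- gflops / apply(gflops, 1, max)
    round(norm_kpi, 3)
    ```

    The division recycles the vector of row maxima along the rows of the matrix, which is exactly the per-instance normalization described above.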


    The workflow follows the standard ASML pipeline: the data are partitioned, normalized, and preprocessed, and models are trained using ASML::AStrain. We considered different learning methods available in caret and evaluated execution times both with and without parallel processing, which is controlled via the parallel argument in ASML::AStrain. The selected methods were run with their default configurations in caret, without additional hyperparameter tuning. All experiments were performed on a machine equipped with a 12th Gen Intel(R) Core(TM) i7-12700 processor (12 cores, 2.11 GHz) and 32 GB of RAM. The execution times are summarized in Tables 7 and 8.

    Table 7: Execution times (in seconds) on the SpMVformat dataset for the main preprocessing stages.

    | Stage                         | Execution time (seconds) |
    |-------------------------------|--------------------------|
    | ASML::partition_and_normalize | 0.03                     |
    | caret::preProcess             | 1.55                     |
    Table 8: Training times (in seconds) on the SpMVformat dataset for different methods using ASML::AStrain, without parallelization (parallel = FALSE) and with parallelization (parallel = TRUE).

    | Method    | parallel = FALSE | parallel = TRUE |
    |-----------|------------------|-----------------|
    | nnet      | 236.58           | 50.75           |
    | svmRadial | 881.03           | 263.60          |
    | rf        | 4753.00          | 1289.68         |

    The majority of the computational cost is associated with model training, which depends on the learning method in caret. We observe that training times vary across methods: nnet (a simple feed-forward neural network), svmRadial (support vector machines with radial kernel), and rf (random forest). Parallel execution substantially reduces training times for all selected methods, demonstrating that the workflow scales efficiently to larger datasets while keeping preprocessing overhead minimal.
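    The speedups in Table 8 follow the familiar pattern of distributing independent model fits across workers. A generic sketch of this pattern with the base parallel package (illustrative only; not how AStrain is implemented internally):

    ```r
    library(parallel)

    # Stand-in for independent training jobs (e.g., one fit per algorithm/fold)
    fit_one <- function(i) {
      x <- matrix(rnorm(1e4), ncol = 10)
      colMeans(x)
    }

    # Sequential execution
    t_seq <- system.time(res_seq <- lapply(1:8, fit_one))["elapsed"]

    # Parallel execution on a small local cluster of workers
    cl <- makeCluster(2)
    res_par <- parLapply(cl, 1:8, fit_one)
    stopCluster(cl)

    length(res_par)  # one result per job
    ```

    Since the jobs are independent, the results have the same structure either way; only the wall-clock time differs.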


    Apart from the execution times, we also take this opportunity to provide a brief commentary on the outcome of the algorithm selection in this application example. In particular, we illustrate the model's ability to identify the most efficient storage format by reporting the results obtained with the nnet method, see Figure 11. The trained model selects the best-performing format in more than 85% of the test cases, and even when it does not, the chosen format still achieves high performance, with a mean normalized KPI (normalized average GFLOPS) of around 0.9.

    set.seed(1234)
    data(SpMVformat)
    features <- SpMVformat$x
    KPI <- SpMVformat$y
    data <- partition_and_normalize(features, KPI, better_smaller = FALSE)
    preProcValues <- caret::preProcess(data$x.train, method = "YeoJohnson")
    data$x.train <- predict(preProcValues, data$x.train)
    data$x.test <- predict(preProcValues, data$x.test)
    training <- AStrain(data, method = "nnet", parallel = TRUE)
    pred <- ASpredict(training, newdata = data$x.test)
    ranking(data, predictions = pred)
    Figure 11: Ranking of storage formats, including the ML selected, based on the instance-normalized KPI for the test sample. The bars represent the percentage of times each storage format appeared in different ranking positions, with the numbers indicating the mean value of the normalized KPI.

    6 Using ASML for algorithm selection on ASlib scenarios


    While the primary purpose of the ASML package is not to systematically conduct algorithm selection studies like those found in ASlib (an area for which the llama toolkit is especially helpful), it does offer a complementary approach for reproducing results from the ASlib benchmark (https://coseal.github.io/aslib-r/scenario-pages/index.html). Our method allows for a comparative analysis using instance-normalized KPIs, which, as demonstrated in the following example, can sometimes yield improved performance results. Additionally, it can be useful for evaluating algorithm selection approaches based on methods that are not available in the mlr package used by llama but are accessible in caret.


    6.1 Data download and preparation


    First, we identify the specific scenario from ASlib we are interested in, in this case CPMP-2015. Using the scenario name, we construct a URL that points to the corresponding page on the ASlib website. We then fetch the HTML content of the page and create a local directory to store the downloaded files.¹

    set.seed(1234)
    library(tidyverse)
    library(rvest)
    scen <- "CPMP-2015"
    url <- paste0("https://coseal.github.io/aslib-r/scenario-pages/", scen,
        "/data_files")
    page <- read_html(paste0(url, ".html"))
    file_links <- page %>%
        html_nodes("a") %>%
        html_attr("href")

    # Create directory for downloaded files
    dir_data <- paste0(scen, "_data")
    dir.create(dir_data, showWarnings = FALSE)

    # Download files
    for (link in file_links) {
        full_link <- ifelse(grepl("^http", link), link, paste0(url, "/", link))
        file_name <- basename(link)
        dest_file <- file.path(dir_data, file_name)
        if (!is.na(full_link)) {
            download.file(full_link, dest_file, mode = "wb", quiet = TRUE)
        }
    }

    6.2 Data preparation with aslib


    Now, we use the aslib package to parse the scenario data and extract the relevant features and performance metrics. The parseASScenario function from aslib creates a structured object ASScen that contains information regarding the algorithms and instances being evaluated. We then transform these data into cross-validation folds using the cvFolds function from llama. This conversion facilitates a fair evaluation of algorithm performance across different scenarios, allowing us to compare the results with those published.²

    library(aslib)
    ASScen <- aslib::parseASScenario(dir_data)
    llamaScen <- aslib::convertToLlama(ASScen)
    folds <- llama::cvFolds(llamaScen)

    Then we extract the key performance indicator (KPI) and features from the folds object. In this case, KPI refers to runtime. As described in the ASlib documentation, KPI_pen measures the penalized runtime. If an instance is solved within the timeout (cutoff) by the selected algorithm, the actual runtime is used. However, if a timeout occurs, the timeout value is multiplied by 10 to penalize the algorithm’s performance. We also define nins as the number of instances and ID as unique identifiers for each instance.

    KPI <- folds$data[, folds$performance]
    features <- folds$data[, folds$features]
    cutoff <- ASScen$desc$algorithm_cutoff_time
    is.timeout <- ASScen$algo.runstatus[, -c(1, 2)] != "ok"
    KPI_pen <- KPI * ifelse(is.timeout, 10, 1)
    nins <- length(getInstanceNames(ASScen))
    ID <- 1:nins

    6.3 Quantile random forest using ASML on instance-normalized KPI


    We use the ASML package to perform quantile random forest on instance-normalized KPI. We have already established the folds beforehand, and we want to use those partitions to maintain consistency with the original ASlib scenario design. Therefore, we provide x.test and y.test as arguments directly to the partition_and_normalize function.

    data <- partition_and_normalize(x = features, y = KPI, x.test = features,
        y.test = KPI, better_smaller = TRUE)
    train_control <- caret::trainControl(index = folds$train, savePredictions = "final")
    training <- AStrain(data, method = "qrf", trControl = train_control)

    In this code block, we process the predictions made by the models trained using ASML and calculate the same performance metrics used in ASlib, namely, the percentage of solved instances (succ), penalized average runtime (par10), and misclassification penalty (mcp), as detailed in the ASlib documentation (Bischl et al. 2016a).

    pred_list <- lapply(training, function(model) {
        model$pred %>%
            arrange(rowIndex) %>%
            pull(pred)
    })

    pred <- do.call(cbind, pred_list)
    alg_sel <- apply(pred, 1, which.max)

    succ <- mean(!is.timeout[cbind(ID, alg_sel)])
    par10 <- mean(KPI_pen[cbind(ID, alg_sel)])
    mcp <- mean(KPI[cbind(ID, alg_sel)] - apply(KPI, 1, min))

    In Table 9, we present the results. We observe that, in this example, using instance-normalized KPI along with the quantile random forest model offers an alternative modeling option in addition to the standard regression models employed in the original ASlib study (linear model, regression trees and regression random forest), resulting in improved performance outcomes.

    Table 9: Performance results of various models on the CPMP-2015 dataset. The last row represents the performance of the quantile random forest model based on instance-normalized KPI using the ASML package. The preceding rows detail the results (all taken from the original ASlib study³) of the virtual best solver (vbs), single best solver (singleBest), and the considered regression methods (linear model, regression trees and regression random forest).

    | Model               | succ  | par10    | mcp     |
    |---------------------|-------|----------|---------|
    | baseline vbs        | 1.000 | 227.605  | 0.000   |
    | baseline singleBest | 0.812 | 7002.907 | 688.774 |
    | regr.lm             | 0.843 | 5887.326 | 556.875 |
    | regr.rpart          | 0.843 | 5916.120 | 585.669 |
    | regr.randomForest   | 0.846 | 5748.065 | 540.574 |
    | ASML qrf            | 0.873 | 4807.633 | 460.863 |

    It is important to note that this is merely an illustrative example; there are other scenarios in ASlib where replication may not be feasible in the same manner, due to factors not considered in ASML (for more robust behavior across the ASlib benchmark, we refer to llama). Despite these limitations, ASML provides a flexible framework that allows researchers to explore various methodologies, including those not directly applicable with llama through mlr, and to improve algorithm selection processes across different scenarios, ultimately contributing to a better understanding of, and performance in, algorithm selection tasks.


    7 Summary and discussion


    In this work, we present ASML, an R package to select the best algorithm from a portfolio of candidates based on a chosen KPI. ASML uses instance-specific features and historical performance data to estimate how well each algorithm is likely to perform on new instances, via a model selected by the user (any regression method from the caret package or a custom function), thus enabling the automatic selection of the most suitable algorithm. The use of instance-normalized KPIs for algorithm selection is a novel aspect of this package, allowing a unified comparison across different algorithms and problem instances.
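    The per-algorithm regression-and-selection idea can be pictured with a toy sketch (synthetic data and plain lm stand-ins; not ASML internals): fit one model per algorithm on its normalized KPI, predict on a new instance, and keep the algorithm with the best predicted value.

    ```r
    # Toy sketch of per-algorithm prediction and selection (not ASML code).
    set.seed(1)
    train <- data.frame(f1 = runif(50), f2 = runif(50))
    # One response per algorithm: its (higher-is-better) normalized KPI.
    kpi <- data.frame(algA = 0.5 + 0.4 * train$f1 + rnorm(50, sd = 0.05),
                      algB = 0.9 - 0.4 * train$f1 + rnorm(50, sd = 0.05))
    # Fit a separate regression model for each algorithm.
    models <- lapply(kpi, function(y) lm(y ~ f1 + f2, data = cbind(train, y = y)))
    # Predict each algorithm's KPI on a new instance and pick the best one.
    new_instance <- data.frame(f1 = 0.9, f2 = 0.5)
    preds <- sapply(models, predict, newdata = new_instance)
    names(which.max(preds))
    ```

    With these synthetic responses, instances with large f1 favor algA and the selector picks it accordingly.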


    While the motivation and examples presented in this work focus on optimization problems, particularly the automatic selection of branching rules and decision strategies in polynomial optimization, the ASML framework is inherently flexible and can be applied more broadly. In particular, in the context of ML, selecting the right algorithm is a crucial factor for the success of ML applications. Traditionally, this process involves empirically assessing potential algorithms with the available data, which can be resource-intensive. In contrast, in so-called Meta-Learning the aim is to predict the performance of ML algorithms based on features of the learning problems (meta-examples). Each meta-example contains details about a previously solved learning problem, including its features and the performance achieved by the candidate algorithms on that problem. A common Meta-Learning approach involves using regression algorithms to forecast the value of a selected performance metric (such as classification error) for the candidate algorithms based on the problem features. This method is commonly referred to as Meta-Regression in the literature. Thus, ASML could also be used in this context, providing a flexible tool for algorithm selection across a variety of domains.
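    As a minimal illustration of Meta-Regression (fully synthetic meta-examples, with made-up meta-feature names and an assumed relationship), a regression model forecasts the classification error a candidate algorithm would achieve on a new learning problem:

    ```r
    # Synthetic Meta-Regression sketch: predict a candidate algorithm's
    # classification error from meta-features of the learning problem.
    set.seed(42)
    meta <- data.frame(n_obs      = round(runif(100, 100, 10000)),
                       n_features = round(runif(100, 2, 50)))
    # Assumed (made-up) relationship between meta-features and error:
    meta$error <- 0.30 - 0.02 * log(meta$n_obs) +
      0.002 * meta$n_features + rnorm(100, sd = 0.01)
    # One such model would be fitted per candidate algorithm; a single
    # linear model stands in for the idea here.
    fit <- lm(error ~ log(n_obs) + n_features, data = meta)
    predict(fit, newdata = data.frame(n_obs = 5000, n_features = 10))
    ```

    Repeating the fit for each candidate algorithm and selecting the one with the lowest predicted error is exactly the selection scheme that ASML automates.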


    8 Acknowledgments


    The authors would like to thank María Caseiro-Arias, Antonio Fariña-Elorza and Manuel Timiraos-López for their contributions to the development of the ASML package. This work is part of the R&D projects PID2024-158017NB-I00, PID2020-116587GB-I00 and PID2021-124030NB-C32 granted by MICIU/AEI/10.13039/501100011033. This research was also funded by Grupos de Referencia Competitiva ED431C-2021/24 and ED431C 2025/03 from the Consellería de Educación, Ciencia, Universidades e Formación Profesional, Xunta de Galicia. Brais González-Rodríguez acknowledges the support from MICIU, through grant BG23/00155.


    8.1 Supplementary materials


    Supplementary materials are available in addition to this article. They can be downloaded at RJ-2025-045.zip.


    8.2 CRAN packages used


    ASML, caret, llama, mlr, aslib, mlr3, h2o, quantregForest, snow, DALEX


    8.3 CRAN Task Views implied by cited packages


    HighPerformanceComputing, MachineLearning, ModelDeployment, Spatial, TeachingStatistics

    References

    Y. Bengio, A. Lodi and A. Prouvost. Machine learning for combinatorial optimization: A methodological tour d'horizon. European Journal of Operational Research, 290(2): 405–421, 2021. URL https://www.sciencedirect.com/science/article/pii/S0377221720306895.

    P. Biecek. DALEX: Explainers for complex predictive models in R. Journal of Machine Learning Research, 19(84): 1–5, 2018. URL http://jmlr.org/papers/v19/18-416.html.

    B. Bischl, P. Kerschke, L. Kotthoff, M. Lindauer, Y. Malitsky, A. Fréchette, H. Hoos, F. Hutter, K. Leyton-Brown, K. Tierney, et al. ASlib: A benchmark library for algorithm selection. Artificial Intelligence, 237: 41–58, 2016a. DOI 10.1016/j.artint.2016.04.003.

    B. Bischl, M. Lang, L. Kotthoff, J. Schiffner, J. Richter, E. Studerus, G. Casalicchio and Z. M. Jones. mlr: Machine learning in R. Journal of Machine Learning Research, 17(170): 1–5, 2016b. URL https://jmlr.org/papers/v17/15-066.html.

    M. R. Bussieck, A. S. Drud and A. Meeraus. MINLPLib: A collection of test models for mixed-integer nonlinear programming. INFORMS Journal on Computing, 15: 114–119, 2003. DOI 10.1287/ijoc.15.1.114.15159.

    E. Dalkiran and H. D. Sherali. RLT-POS: Reformulation-linearization technique-based optimization software for solving polynomial programming problems. Mathematical Programming Computation, 8: 337–375, 2016. DOI 10.1007/s12532-016-0099-5.

    J. H. Drake, A. Kheiri, E. Özcan and E. K. Burke. Recent advances in selection hyper-heuristics. European Journal of Operational Research, 285(2): 405–428, 2020. URL https://www.sciencedirect.com/science/article/pii/S0377221719306526.

    F. Furini, E. Traversi, P. Belotti, A. Frangioni, A. Gleixner, N. Gould, L. Liberti, A. Lodi, R. Misener, H. Mittelmann, et al. QPLIB: A library of quadratic programming instances. Mathematical Programming Computation, 1: 237–265, 2018. DOI 10.1007/s12532-018-0147-4.

    B. Ghaddar, I. Gómez-Casares, J. González-Díaz, B. González-Rodríguez, B. Pateiro-López and S. Rodríguez-Ballesteros. Learning for spatial branching: An algorithm selection approach. INFORMS Journal on Computing, 35(5): 1024–1043, 2023. URL https://doi.org/10.1287/ijoc.2022.0090.

    B. González-Rodríguez, I. Gómez-Casares, B. Ghaddar, J. González-Díaz and B. Pateiro-López. Learning in spatial branching: Limitations of strong branching imitation. INFORMS Journal on Computing, 2025a.

    B. González-Rodríguez, I. Gómez-Casares, B. Pateiro-López and J. González-Díaz. ASML: Algorithm portfolio selection with machine learning. 2025b. URL https://CRAN.R-project.org/package=ASML. R package version 1.1.0.

    P. Kerschke, H. H. Hoos, F. Neumann and H. Trautmann. Automated algorithm selection: Survey and perspectives. Evolutionary Computation, 27(1): 3–45, 2019. URL https://doi.org/10.1162/evco_a_00242.

    L. Kotthoff. Algorithm selection for combinatorial search problems: A survey. In Data mining and constraint programming: Foundations of a cross-disciplinary approach, Eds C. Bessiere, L. De Raedt, L. Kotthoff, S. Nijssen, B. O'Sullivan and D. Pedreschi, pages 149–190, 2016. Cham: Springer International Publishing. ISBN 978-3-319-50137-6.

    L. Kotthoff, B. Bischl, B. Hurley, T. Rahwan and D. Pulatov. llama: Leveraging learning to automatically manage algorithms. 2021. URL https://CRAN.R-project.org/package=llama. R package version 0.10.1.

    M. Kuhn. Building predictive models in R using the caret package. Journal of Statistical Software, 28(5): 1–26, 2008. URL https://www.jstatsoft.org/index.php/jss/article/view/v028i05.

    M. Lang, M. Binder, J. Richter, P. Schratz, F. Pfisterer, S. Coors, Q. Au, G. Casalicchio, L. Kotthoff and B. Bischl. mlr3: A modern object-oriented machine learning framework in R. Journal of Open Source Software, 2019. URL https://joss.theoj.org/papers/10.21105/joss.01903.

    A. Lodi and G. Zarpellon. On learning and branching: A survey. TOP: An Official Journal of the Spanish Society of Statistics and Operations Research, 25(2): 207–236, 2017. URL https://ideas.repec.org/a/spr/topjnl/v25y2017i2d10.1007_s11750-017-0451-6.html.

    N. Meinshausen. Quantile regression forests. Journal of Machine Learning Research, 7: 983–999, 2006.

    N. Meinshausen. quantregForest: Quantile regression forests. 2024. URL https://CRAN.R-project.org/package=quantregForest. R package version 1.3-7.1.

    T. Messelis and P. De Causmaecker. An automatic algorithm selection approach for the multi-mode resource-constrained project scheduling problem. European Journal of Operational Research, 233(3): 511–528, 2014. URL https://www.sciencedirect.com/science/article/pii/S0377221713006863.

    J. C. Pichel and B. Pateiro-López. A new approach for sparse matrix classification based on deep learning techniques. In 2018 IEEE International Conference on Cluster Computing (CLUSTER), pages 46–54, 2018. DOI 10.1109/CLUSTER.2018.00017.

    D. Pulatov, M. Anastacio, L. Kotthoff and H. Hoos. Opening the black box: Automated software analysis for algorithm selection. In Proceedings of the First International Conference on Automated Machine Learning, Eds I. Guyon, M. Lindauer, M. van der Schaar, F. Hutter and R. Garnett, pages 6/1–18, 2022. PMLR. URL https://proceedings.mlr.press/v188/pulatov22a.html.

    J. R. Rice. The algorithm selection problem. Advances in Computers, 15: 65–118, 1976.

    H. D. Sherali and C. H. Tuncbilek. A global optimization algorithm for polynomial programming problems using a reformulation-linearization technique. Journal of Global Optimization, 2(1): 101–112, 1992. DOI 10.1007/bf00121304.

    D. Speck, A. Biedenkapp, F. Hutter, R. Mattmüller and M. Lindauer. Learning heuristic selection with dynamic algorithm configuration. In Proceedings of the 31st International Conference on Automated Planning and Scheduling (ICAPS 2021), 2021. URL https://arxiv.org/abs/2006.08246.

    L. Tierney, A. J. Rossini, N. Li and H. Sevcikova. snow: Simple Network of Workstations. 2021. URL https://CRAN.R-project.org/package=snow. R package version 0.4-4.

    J. Vanschoren. Meta-learning. In Automated machine learning: Methods, systems, challenges, Eds F. Hutter, L. Kotthoff and J. Vanschoren, pages 35–61, 2019. Springer International Publishing. ISBN 978-3-030-05318-5.
    1. A more direct approach would be to use the getCosealASScenario function from the aslib package; however, this function seems to be currently not working, likely due to changes in the directory structure of the scenarios.↩︎

    2. Available at: https://coseal.github.io/aslib-r/scenario-pages/CPMP-2015/llama.html (Accessed October 25, 2024).↩︎

    3. Available at: https://coseal.github.io/aslib-r/scenario-pages/CPMP-2015/llama.html (Accessed October 25, 2024).↩︎


    Reuse


    Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".


    Citation


    For attribution, please cite this work as

    Gómez-Casares, et al., "ASML: An R Package for Algorithm Selection with Machine Learning", The R Journal, 2026

    BibTeX citation

    @article{RJ-2025-045,
      author = {Gómez-Casares, Ignacio and Pateiro-López, Beatriz and González-Rodríguez, Brais and González-Díaz, Julio},
      title = {ASML: An R Package for Algorithm Selection with Machine Learning},
      journal = {The R Journal},
      year = {2026},
      note = {https://doi.org/10.32614/RJ-2025-045},
      doi = {10.32614/RJ-2025-045},
      volume = {17},
      issue = {4},
      issn = {2073-4859},
      pages = {216-236}
    }
diff --git a/_articles/RJ-2025-045/RJ-2025-045.pdf b/_articles/RJ-2025-045/RJ-2025-045.pdf
new file mode 100644
Binary files /dev/null and b/_articles/RJ-2025-045/RJ-2025-045.pdf differ

diff --git a/_articles/RJ-2025-045/RJ-2025-045.tex b/_articles/RJ-2025-045/RJ-2025-045.tex
new file mode 100644

% !TeX root = RJwrapper.tex
\title{ASML: An R Package for Algorithm Selection with Machine Learning}

\author{by Ignacio Gómez-Casares, Beatriz Pateiro-López, Brais González-Rodríguez, and Julio González-Díaz}

\maketitle

\abstract{%
For extensively studied computational problems, it is commonly acknowledged that different instances may require different algorithms for optimal performance. The R package ASML focuses on the task of efficiently selecting, from a given portfolio of algorithms, the most suitable one for each specific problem instance, based on significant instance features. The package allows for the use of the machine learning tools available in the R package caret and additionally offers visualization tools and summaries of results that make it easier to interpret how algorithm selection techniques perform, helping users better understand and assess their behavior and performance improvements.
}

\section{Introduction}\label{introduction}

Selecting from a set of algorithms the most appropriate one for solving a given problem instance (understood as an individual problem case with its own specific characteristics) is a common issue that comes up in many different situations, such as in combinatorial search problems \citep{kot16,dra20}, planning and scheduling problems \citep{spe21,mes14}, or in machine learning (ML), where the multitude of available techniques often makes it challenging to determine the best approach for a particular dataset \citep{van19}.
For an extensive survey on automated algorithm selection and application areas, we refer to \cite{ker19}.

Figure \ref{fig:ASkerschke} presents a general scheme, adapted from Figure 1 in \cite{ker19}, illustrating the use of ML for algorithm selection. A set of problem instances is given, each described by associated features, together with a portfolio of algorithms that have been evaluated on all instances. The instance features and performance results are then fed into an ML framework, which is trained to produce a selector capable of predicting the best-performing algorithm for an unseen instance. Note that we are restricting attention to \emph{offline} algorithm selection, in which the selector is constructed using a training set of instances and then applied to new problem instances.

\begin{figure}[ht]

{\centering \includegraphics[width=1\linewidth]{figures/AS_kerschke_drawio}

}

\caption{Schematic overview of the interplay between problem instance features (top left), algorithm performance data (bottom left), selector construction (center), and the assessment of selector performance (bottom right). Adapted from Kerschke et al. (2019).}\label{fig:ASkerschke}
\end{figure}

Algorithm selection tools also demonstrate significant potential in the field of optimization, enhancing performance when solving problems for which multiple solving strategies are often available. For example, a key factor in the efficiency of state-of-the-art global solvers in mixed integer linear programming, and also in nonlinear optimization, is the design of branch-and-bound algorithms and, in particular, of their branching rules. There is no single branching rule that outperforms all others on every problem instance. Instead, different branching rules exhibit optimal performance on different types of problem instances.
Developing methods for the automatic selection of branching rules based on instance features has proven to be an effective strategy toward solving optimization problems more efficiently \citep{lod17, ben21, gha23}.

In algorithm selection, a crucial role is played not only by the problem domain and the algorithms for addressing problem instances, but also by the metrics used to assess algorithm effectiveness, referred to in this work as Key Performance Indicators (KPIs). KPIs are used in different fields to assess and measure the performance of specific objectives or goals. In a business context, these indicators are quantifiable metrics that provide valuable insights into how well an individual, team, or entire organization is progressing towards achieving its defined targets. In the context of algorithms, KPIs serve as quantifiable measures used to evaluate the effectiveness and efficiency of algorithmic processes. For instance, in the realm of computer science and data analysis, KPIs can include measures like execution time, accuracy, and scalability. Monitoring these KPIs allows for a comprehensive assessment of algorithmic performance, aiding in the selection of the most appropriate algorithm for a given instance and facilitating continuous improvement in algorithmic design and implementation.

Additionally, in many applications, normalizing the KPI to a standardized range like \([0, 1]\) provides a more meaningful basis for comparison. The KPI obtained through this process, which we will refer to as instance-normalized KPI, reflects the performance of each algorithm relative to the best-performing one for each specific instance. For example, if we have multiple algorithms and we are measuring execution time that can vary across instances, normalizing the execution time for each instance relative to the fastest algorithm within that same instance allows for a fairer evaluation.
This is particularly important when the values of execution time might not directly reflect the relative performance of the algorithms due to wide variations in the scale of the measurements. Thus, normalizing puts all algorithms on an equal footing, allowing a clearer assessment of their relative efficiency.

Following the general framework illustrated in Figure \ref{fig:ASkerschke}, the R package \CRANpkg{ASML} \citep{Rasml} provides a wrapper for ML methods to select from a portfolio of algorithms based on the value of a given KPI. It uses a set of features in a training set to learn a regression model for the instance-normalized KPI value of each algorithm. Then, the instance-normalized KPI is predicted for unseen test instances, and the algorithm with the best predicted value is chosen. As learning techniques for algorithm selection, the user can invoke any regression method from the \CRANpkg{caret} package \citep{Rcaret} or use a custom function defined by the user. This makes our package flexible, as it automatically supports new methods when they are added to \CRANpkg{caret}. Although initially designed for selecting branching rules in nonlinear optimization problems, its versatility allows the package to effectively address algorithm selection challenges across a wide range of domains. It can be applied to a broad spectrum of disciplines whenever there is a diverse set of instances within a specific problem domain, a suite of algorithms with varying behaviors across instances, clearly defined metrics for evaluating the performance of the available algorithms, and known features or characteristics of the instances that can be computed and are ideally correlated with algorithm performance. The visualization tools implemented in the package allow for an effective evaluation of the performance of the algorithm selection techniques.
A key distinguishing element of \CRANpkg{ASML} is its learning-phase approach, which uses instance-normalized KPI values and trains a separate regression model for each algorithm to predict its normalized KPI on unseen instances.

\section{Background}\label{background}

The algorithm selection problem was first outlined in the seminal work by \cite{ric76}. In simple terms, for a given set of problem instances (problem space) and a set of algorithms (algorithm space), the goal is to determine a selection model that maps each problem instance to the most suitable algorithm for it. By \emph{most suitable}, we mean the best according to a specific metric that associates each combination of instance and algorithm with its respective performance. Formally, let \(\mathcal{P}\) denote the problem space or set of problem instances. The algorithm space or set of algorithms is denoted by \(\mathcal{A}\). The metric \(p:\mathcal{P}\times\mathcal{A}\rightarrow \mathbb{R}^n\) measures the performance \(p(x,A)\) of any algorithm \(A\in \mathcal{A}\) on instance \(x\in\mathcal{P}\). The goal is to construct a selector \(S:\mathcal{P}\rightarrow \mathcal{A}\) that maps any problem instance \(x\in \mathcal{P}\) to an algorithm \(S(x)=A\in \mathcal{A}\) in such a way that its performance is optimal.

As discussed in the Introduction, many algorithm selection methods in the literature use ML tools to model the relationship between problem instances and algorithm performance, using features derived from these instances. The pivotal step in this process is defining appropriate features that can be readily computed and are likely to impact algorithm performance. That is, given \(x\in\mathcal{P}\), we make use of informative features \(f (x) = (f_1(x),\ldots,f_k(x))\in \mathbb{R}^k\). In this framework, the selector \(S\) maps the simpler feature space \(\mathbb{R}^k\) into the algorithm space \(\mathcal{A}\).
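For illustration (the notation \(\hat{m}_A\) is introduced here and is not part of the original formulation): if \(\hat{m}_A\) denotes a regression model fitted to predict the performance of algorithm \(A\) from the features, and larger predicted values are better, the learned selector takes the form
\[
S(x) = \underset{A \in \mathcal{A}}{\arg\max}\; \hat{m}_A\bigl(f(x)\bigr).
\]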
A scheme of the algorithm selection problem, as described in \cite{ric76}, is shown in Figure \ref{fig:rice}. + +\begin{figure}[ht] + +{\centering \includegraphics[width=0.8\linewidth]{figures/ASP_Rice} + +} + +\caption{Scheme of the algorithm selection problem by Rice (1976).}\label{fig:rice} +\end{figure} + +For the practical derivation of the selection model \(S\), we use training data consisting of features \(f(x)\) and performances \(p(x,A)\), where \(x \in \mathcal{P}^\prime \subset \mathcal{P}\) and \(A \in \mathcal{A}\). The task is to learn the selector \(S\) based on the training data. The model allows us to forecast the performance on unobserved problem instances based on their features and subsequently select the algorithm with the highest predicted performance. A comprehensive discussion of various aspects of algorithm selection techniques can be found in \cite{kot16} and \cite{pul22}. + +\section{Algorithm selection tools in R}\label{algorithm-selection-tools-in-r} + +The task of algorithm selection has seen significant advancements in recent years, with R packages facilitating this process. Here we present some of the existing tools that offer a range of functionalities, including flexible model-building frameworks, automated workflows, and standardized scenario formats, providing valuable resources for both researchers and end-users in algorithm selection. + +The \CRANpkg{llama} package \citep{Rllama} provides a flexible implementation within R for evaluating algorithm portfolios. It simplifies the task of building predictive models to solve algorithm selection scenarios, allowing users to apply ML models effectively. In \CRANpkg{llama}, ML algorithms are defined using the \CRANpkg{mlr} package \citep{Rmlr}, offering a structured approach to model selection.
On the other hand, the Algorithm Selection Library (ASlib) \citep{bis16} proposes a standardized format for representing algorithm selection scenarios and introduces a repository that hosts an expanding collection of datasets from the literature. It serves as a benchmark for evaluating algorithm selection techniques under consistent conditions. It is accessible to R users through the \CRANpkg{aslib} package. This integration simplifies the process for those working within the R environment. Furthermore, \CRANpkg{aslib} interfaces with the \CRANpkg{llama} package, facilitating the analysis of algorithm selection techniques within the benchmark scenarios it provides. + +Our \CRANpkg{ASML} package offers an approach to algorithm selection based on the powerful and flexible \CRANpkg{caret} framework. By using \CRANpkg{caret}'s ability to work with many different ML models, along with its model tuning and validation tools, \CRANpkg{ASML} makes the selection process easy and effective, especially for users already familiar with \CRANpkg{caret}. Thus, while \CRANpkg{ASML} shares some conceptual similarities with \CRANpkg{llama}, it distinguishes itself through its interface to the ML models in \CRANpkg{caret} instead of \CRANpkg{mlr}. The latter is currently considered retired by the mlr-org team, which can lead to compatibility issues with certain learners, and has been succeeded by the next-generation \CRANpkg{mlr3} \citep{Rmlr3}. In addition, \CRANpkg{ASML} automates the normalization of KPIs based on the best-performing algorithm for each instance, addressing the challenges that arise when performance metrics vary significantly across instances. \CRANpkg{ASML} further provides new visualization tools that can be useful for understanding the results of the learning process. A comparative overview of the main features and differences between these packages can be seen in Table \ref{tab:ASMLvsllama}.
+ +\begin{table}[!h] +\centering +\caption{\label{tab:ASMLvsllama}Comparative overview of ASML and llama for algorithm selection.} +\centering +\fontsize{9}{11}\selectfont +\begin{tabular}[t]{>{\raggedright\arraybackslash}m{3cm}>{\centering\arraybackslash}m{4.5cm}>{\centering\arraybackslash}m{4.5cm}} +\toprule +\textbf{Aspect} & \textbf{ASML} & \textbf{llama}\\ +\midrule +\cellcolor{gray!10}{Input data} & \cellcolor{gray!10}{features\hspace{4cm} KPIs\hspace{4cm}split by families supported} & \cellcolor{gray!10}{features\hspace{4cm} KPIs\hspace{4cm}feature costs supported}\\ +Normalized KPIs & \ding{51} & \ding{55}\\ +\cellcolor{gray!10}{ML backend} & \cellcolor{gray!10}{caret} & \cellcolor{gray!10}{mlr}\\ +Hyperparameter tuning & ASML::AStrain()\hspace{4cm}supports arguments passed to caret (trainControl(), tuneGrid) & llama::cvFolds\hspace{4cm}llama::tuneModel\\ +\cellcolor{gray!10}{Parallelization} & \cellcolor{gray!10}{\ding{51}\hspace{4cm} with snow} & \cellcolor{gray!10}{\ding{51}\hspace{4cm} with parallelMap}\\ +\addlinespace +Results summary & Per algorithm\hspace{4cm}Best overall and per instance\hspace{4cm}ML-selected & Virtual best and single best per instance\hspace{4cm}Aggregated scores (PAR, count, successes)\\ +\cellcolor{gray!10}{Visualization} & \cellcolor{gray!10}{Boxplots (per algorithm and ML-selected)\hspace{4cm}Ranking plots\hspace{4cm}Barplots (best vs ML-selected)} & \cellcolor{gray!10}{Scatter plots comparing two algorithm selectors}\\ +Model interpretability tools & \ding{51}\hspace{4cm} with DALEX & \ding{55}\\ +\cellcolor{gray!10}{ASlib integration} & \cellcolor{gray!10}{basic support} & \cellcolor{gray!10}{extended support}\\ +Latest release & CRAN 1.1.0 (2025) & CRAN 0.10.1 (2021)\\ +\bottomrule +\end{tabular} +\end{table} + +There are also automated approaches that streamline the process of selecting and optimizing ML models within the R environment. 
Tools like \CRANpkg{h2o} provide robust functionalities specifically designed for R users, facilitating an end-to-end ML workflow. These frameworks automate various tasks, including algorithm selection, hyperparameter optimization, and feature engineering, thereby simplifying the process for users of all skill levels. By integrating these automated solutions into R, users can efficiently explore a wide range of models and tuning options without needing extensive domain knowledge or manual intervention. This automation not only accelerates the model development process but also improves the overall performance of ML projects by allowing a systematic evaluation of different approaches and configurations. However, while \CRANpkg{h2o} excels at automating the selection of ML models and hyperparameter tuning, it does not perform algorithm selection based on instance-specific features, which is the primary focus of our approach. Instead, it evaluates multiple algorithms in parallel and selects the best-performing one based on predetermined metrics. + +\section{Using the ASML package}\label{using-the-asml-package} + +Here, we illustrate the usage of the \CRANpkg{ASML} package with an example within the context of algorithm selection for spatial branching in polynomial optimization, aligning with the problem discussed in \cite{gha23} and further explored in \cite{gon24}. Table \ref{tab:optsum} provides an overview of the problem and a summary of the components that we will discuss in detail below. 
+ +\begin{table}[!h] +\centering +\caption{\label{tab:optsum}Summary of the branching rule selection problem.} +\centering +\fontsize{9}{11}\selectfont +\begin{tabular}[t]{ll} +\toprule +Algorithms & \ttfamily{max, sum, dual, range, eig-VI, eig-CMI}\\ +KPI & \ttfamily{pace}\\ +Number of instances & 407\\ +Number of instances per library & 180 (DS), 164 (MINLPLib), 63 (QPLIB)\\ +Number of features & 33\\ +\bottomrule +\end{tabular} +\end{table} + +A well-known approach for finding global optima in polynomial optimization problems is based on the use of the Reformulation-Linearization Technique (RLT) \citep{she92}. Without delving into intricate details, RLT operates by creating a linear relaxation of the original polynomial problem, which is then integrated into a branch-and-bound framework. The branching process involves assigning a score to each variable, based on the violations of the RLT identities it participates in, after solving the corresponding relaxation at each node. Subsequently, the variable with the highest score is selected for branching. The computation of these scores is a critical aspect and allows for various approaches, leading to distinct branching rules that constitute our algorithm selection portfolio. Specifically, in our example, we will examine six distinct branching rules (referred to interchangeably as branching rules or algorithms), labeled as \texttt{max}, \texttt{sum}, \texttt{dual}, \texttt{range}, \texttt{eig-VI}, and \texttt{eig-CMI} rules. For the definitions and a comprehensive understanding of the rationale behind these rules, refer to \cite{gha23}. + +Measuring the performance of different algorithms in the context of optimization is crucial for evaluating their effectiveness and efficiency.
Two common metrics for this evaluation are running time and optimality gap, measured as a function of the lower and upper bounds for the objective function value at the end of the algorithm (a small optimality gap indicates that the algorithm is producing solutions close to the optimum). Both metrics are important and are often considered together to evaluate algorithm performance. For instance, it is meaningful to consider the time required to reduce the optimality gap by one unit as the KPI. In our example, and to ensure it is well-defined, we make use of a slightly different metric, which we refer to as pace, defined as the time required to increase the lower bound by one unit. For the pace, a smaller value is preferred, as it indicates better performance. + +As depicted in Figure \ref{fig:rice}, a crucial aspect of the methodology involves selecting input variables (features) that facilitate the prediction of the KPI for each branching rule. We consider 33 features representing global information of the polynomial optimization problems, such as relevant characteristics of variables, constraints, monomials, coefficients, or other attributes. A detailed description of the considered features can be found in Table \ref{tab:featsum}. Although we will not delve into these aspects, determining appropriate features is often complex, and using feature-selection methods can be beneficial for choosing the most relevant ones.
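+ +As a toy illustration of the pace KPI defined above, the following sketch computes the pace of two hypothetical rules on a single instance and keeps the rule with the smallest value (the numbers are made up and are not taken from the \texttt{branching} dataset): + +\begin{verbatim} +# Hypothetical data for one instance: time spent (in seconds) and the +# resulting increase of the lower bound for two branching rules +time_spent <- c(rule_A = 120, rule_B = 95) +lb_gain <- c(rule_A = 40, rule_B = 19) +pace <- time_spent / lb_gain # time to raise the lower bound by one unit +names(which.min(pace)) # "rule_A": the smaller pace (3 vs 5) wins +\end{verbatim}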
+ +\begin{table} +\centering +\caption{\label{tab:featsum}Features from the branching dataset.} +\centering +\begin{tabular}[t]{ll} +\toprule +\textbf{Index} & \textbf{Description}\\ +\midrule +\ttfamily{3} & Number of variables\\ +\ttfamily{4} & Number of constraints\\ +\ttfamily{5} & Degree\\ +\ttfamily{6} & Number of monomials\\ +\ttfamily{7} & Density\\ +\addlinespace +\ttfamily{8} & Density of VIG\\ +\ttfamily{9} & Modularity of VIG\\ +\ttfamily{10} & Treewidth of VIG\\ +\ttfamily{11} & Density of CMIG\\ +\ttfamily{12} & Modularity of CMIG\\ +\addlinespace +\ttfamily{13} & Treewidth of CMIG\\ +\ttfamily{14} & Pct. of variables not present in any monomial with degree greater than one\\ +\ttfamily{15} & Pct. of variables not present in any monomial with degree greater than two\\ +\ttfamily{16} & Number of variables divided by number of constraints\\ +\ttfamily{17} & Number of variables divided by degree\\ +\addlinespace +\ttfamily{18} & Pct. of equality constraints\\ +\ttfamily{19} & Pct. of linear constraints\\ +\ttfamily{20} & Pct. of quadratic constraints\\ +\ttfamily{21} & Number of monomials divided by number of constraints\\ +\ttfamily{22} & Number of RLT variables divided by number of constraints\\ +\addlinespace +\ttfamily{23} & Pct. of linear monomials\\ +\ttfamily{24} & Pct. of quadratic monomials\\ +\ttfamily{25} & Pct. of linear RLT variables\\ +\ttfamily{26} & Pct. of quadratic RLT variables\\ +\ttfamily{27} & Variance of the ranges of the variables\\ +\addlinespace +\ttfamily{28} & Variance of the coefficients\\ +\ttfamily{29} & Variance of the density of the variables\\ +\ttfamily{30} & Variance of the no. of appearances of each variable\\ +\ttfamily{31} & Average of the ranges of the variables\\ +\ttfamily{32} & Average of the coefficients\\ +\addlinespace +\ttfamily{33} & Average pct. of monomials in each constraint and in the objective function\\ +\ttfamily{34} & Average of the no.
of appearances of each variable\\ +\ttfamily{35} & Median of the ranges of the variables\\ +\bottomrule +\multicolumn{2}{l}{\rule{0pt}{1em}\textit{Note: }}\\ +\multicolumn{2}{l}{\rule{0pt}{1em}Index refers to columns of branching\$x.}\\ +\end{tabular} +\end{table} + +To assess the performance of the algorithm selection methods in this context, we have a diverse set of 407 instances from different optimization problems, taken from three well-known benchmarks \citep{bus03, dal16, fur18}, corresponding respectively to the MINLPLib, DS, and QPLIB libraries. Details are given in Table \ref{tab:optsum}. The data for this analysis is contained within the \texttt{branching} dataset included in the package. We begin by defining two data frames. The \texttt{features} data frame includes two initial columns that provide the instance names and the corresponding family (library in our example) for each instance. The remaining columns consist of the features listed in Table \ref{tab:featsum}. + +We also define the \texttt{KPI} data frame, which is derived from \texttt{branching\$y}. This data frame contains the pace values for each of the six branching rules considered in this study (specified by the labels in the \texttt{lab\_rules} vector). These data frames will serve as the input for our subsequent analyses. + +\begin{verbatim} +set.seed(1234) +library(ASML) +data(branching) +features <- branching$x +KPI <- branching$y +lab_rules <- c("max", "sum", "dual", "range", "eig-VI", "eig-CMI") +\end{verbatim} + +\subsection{Pre-Processing the data}\label{pre-processing-the-data} + +As with any analysis, the first step involves preprocessing the data. This includes using the function \texttt{partition\_and\_normalize}, which not only divides the dataset into training and test sets but also normalizes the KPI relative to the best result for each instance. 
The argument \texttt{better\_smaller} specifies whether a lower KPI value is preferred (such as in our case, where the KPI represents pace, with smaller values indicating better performance) or whether a higher value is desired. + +\begin{verbatim} +data <- partition_and_normalize(features, KPI, family_column = 1, split_by_family = TRUE, + better_smaller = TRUE) +names(data) +\end{verbatim} + +\begin{verbatim} +#> [1] "x.train" "y.train" "y.train.original" "x.test" +#> [5] "y.test" "y.test.original" "families.train" "families.test" +#> [9] "better_smaller" +\end{verbatim} + +When using the function \texttt{partition\_and\_normalize}, the resulting object is of class \texttt{as\_data} and contains several key components essential for our study. Specifically, the object includes \texttt{x.train} and \texttt{x.test}, representing the feature sets for the training and test datasets, respectively. Additionally, it contains \texttt{y.train} and \texttt{y.test}, with the instance-normalized KPI corresponding to each dataset, along with their original counterparts, \texttt{y.train.original} and \texttt{y.test.original}. This structure allows us to retain the original KPI values while working with the instance-normalized data. Furthermore, when the parameter \texttt{split\_by\_family} is set to \texttt{TRUE}, as in the example, the object also includes \texttt{families.train} and \texttt{families.test}, indicating the family affiliation for each observation within the training and test sets. Figure \ref{fig:partitionPLOT} illustrates how the split preserves the proportions of instances for each library.
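+ +To clarify the normalization idea, the following sketch mimics it on a toy KPI matrix. We assume here (our reading of the intended behavior, not the package's actual code) that, with \texttt{better\_smaller = TRUE}, the best (smallest) KPI on each instance is mapped to 1 and worse algorithms receive proportionally smaller values: + +\begin{verbatim} +# Toy KPI matrix: rows are instances, columns are algorithms +# (smaller raw values are better) +kpi <- matrix(c(2, 4, 8, + 3, 3, 6), nrow = 2, byrow = TRUE) +# Divide the per-instance best value by each entry, so the best +# algorithm on every instance scores exactly 1 +kpi_norm <- t(apply(kpi, 1, function(row) min(row) / row)) +\end{verbatim}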
+ +\begin{figure}[ht] + +{\centering \includegraphics[width=0.7\linewidth,alt={Train/Test partition preserving the percentage of instances for each library.}]{figures/partitionPLOT-1} + +} + +\caption{Train/Test partition preserving the percentage of instances for each library.}\label{fig:partitionPLOT} +\end{figure} + +As a tool for visualizing the performance of the considered algorithms, the \texttt{boxplots} function operates on objects of class \texttt{as\_data} and generates boxplots for the instance-normalized KPI. This visualization facilitates the comparison of performance differences across instances. The function can be applied to both training and test observations and can also group the results by family. Additionally, it accepts common arguments typically used in R functions. Figure \ref{fig:splitPLOT} shows the instance-normalized KPI of the instances in the train set. What becomes evident from the boxplots is that there is no branching rule that outperforms the others across all instances, and making a wrong choice of criteria in certain problems can lead to very poor performance. + +\begin{verbatim} +boxplots(data, test = FALSE, by_families = FALSE, labels = lab_rules) +\end{verbatim} + +\begin{figure}[ht] + +{\centering \includegraphics[width=0.7\linewidth,alt={Boxplots of instance-normalized KPI for each algorithm across instances in the train set.}]{figures/splitPLOT-1} + +} + +\caption{Boxplots of instance-normalized KPI for each algorithm across instances in the train set.}\label{fig:splitPLOT} +\end{figure} + +The \texttt{ranking} function, specifically designed for the \CRANpkg{ASML} package, is also valuable for visualizing the differing behaviors of the algorithms under investigation, depending on the analyzed instances. After ranking the algorithms for each instance, based on the instance-normalized KPI, the function generates a bar chart for each algorithm, indicating the percentage of times it occupies each ranking position. 
The numbers displayed within the bars represent the mean value of the instance-normalized KPI for the problems associated with that specific ranking position. Again, the representation can be made both for the training and test sets, as well as by family. In Figure \ref{fig:rank}, we present the chart corresponding to the training sample and categorized by family. In particular, it is observed that certain rules, when not the best choice for a given instance, can perform quite poorly in terms of instance-normalized KPI (see, for example, the results on the MINLPLib library). This highlights the importance of not only selecting the best algorithm for each instance but also ensuring that the chosen algorithm does not perform too poorly when it isn't optimal. In some cases, even if an algorithm isn't the best-performing option, it may still provide reasonably good results, whereas a wrong choice can result in significantly worse outcomes. + +\begin{verbatim} +ranking(data, test = FALSE, by_families = TRUE, labels = lab_rules) +\end{verbatim} + +\begin{figure}[ht] + +{\centering \includegraphics[width=0.7\linewidth,alt={Ranking of algorithms based on the instance-normalized KPI for the training sample, categorized by family. The bars represent the percentage of times each algorithm appeared in different ranking positions, with the numbers indicating the mean value of the KPI.}]{figures/rank-1} + +} + +\caption{Ranking of algorithms based on the instance-normalized KPI for the training sample, categorized by family. The bars represent the percentage of times each algorithm appeared in different ranking positions, with the numbers indicating the mean value of the KPI.}\label{fig:rank} +\end{figure} + +Additionally, functions from the \CRANpkg{caret} package can be applied if further operations on the predictors are needed. 
Here we show an example where the Yeo-Johnson transformation is applied to the training set, and the same transformation is subsequently applied to the test set to ensure consistency across both datasets. The flexibility of \CRANpkg{caret} also allows for the inclusion of advanced techniques, such as feature selection and dimensionality reduction, to improve the quality of the algorithm selection process. + +\begin{verbatim} +preProcValues <- caret::preProcess(data$x.train, method = "YeoJohnson") +data$x.train <- predict(preProcValues, data$x.train) +data$x.test <- predict(preProcValues, data$x.test) +\end{verbatim} + +\subsection{Training models and predicting the performance of the algorithms}\label{training-models-and-predicting-the-performance-of-the-algorithms} + +The approach in \CRANpkg{ASML} to algorithm selection is based on building regression models that predict the instance-normalized KPI of each considered algorithm. To this end, users can take advantage of the wide range of ML models available in the \CRANpkg{caret} package, which provides a unified interface for training and tuning various types of models. Models trained with \CRANpkg{caret} can be seamlessly integrated into the \CRANpkg{ASML} workflow using the \texttt{AStrain} function from \CRANpkg{ASML}, as shown in the next example. Just for illustrative purposes, we use quantile random forest \citep{mei06} to model the behavior of the instance-normalized KPI based on the features. This is done with the \texttt{qrf} method in the \CRANpkg{caret} package, which relies on the \CRANpkg{quantregForest} package \citep{mei24}. + +\begin{verbatim} +library(quantregForest) +tune_grid <- expand.grid(mtry = 10) +training <- AStrain(data, method = "qrf", tuneGrid = tune_grid) +\end{verbatim} + +Additional arguments for \texttt{caret::train} can also be passed directly to \texttt{ASML::AStrain}. 
This allows users to take advantage of the flexibility of the \CRANpkg{caret} package, including specifying control methods (such as cross-validation), tuning parameters, or any other relevant settings provided by \texttt{caret::train}. This integration ensures that the \CRANpkg{ASML} workflow can fully make use of the modeling capabilities offered by \CRANpkg{caret}. To make the execution faster (it is not our intention here to delve into the choice of the best model), we use a \texttt{tune\_grid} that sets a fixed value for \texttt{mtry}. This avoids the need for an exhaustive search for this hyperparameter, speeding up the model training process. Other modeling approaches should also be considered, as they may offer better performance depending on the specific characteristics of the data and the problem at hand. For more computationally intensive models or larger datasets, the \texttt{ASML::AStrain} function includes the argument \texttt{parallel}, which can be set to TRUE to enable parallel execution using the \CRANpkg{snow} package \citep{Rsnow}. This allows the training step to be distributed across multiple cores, reducing computation time. A detailed example on a larger dataset is provided in the following section, showing the scalability of the workflow and the effect of parallelization on training time. + +The function \texttt{caret::train} returns a trained model along with performance metrics, predictions, and tuning parameters, providing insights into the model's effectiveness. In a similar manner, \texttt{ASML::AStrain} offers the same type of output but for each algorithm under consideration, allowing straightforward comparison within the \CRANpkg{ASML} framework. The \texttt{ASML::ASpredict} function generates the predictions for new data by using the models created during the training phase for each algorithm under evaluation. Thus, predictions for the algorithms are obtained simultaneously, facilitating a direct comparison of their performance. 
By using \texttt{ASML::ASpredict} as follows, we obtain a matrix where each row corresponds to an instance from the test set, and each column represents the predicted instance-normalized KPIs for the six branching rules using the \texttt{qrf} method. + +\begin{verbatim} +predict_test <- ASpredict(training, newdata = data$x.test) +\end{verbatim} + +\subsection{Evaluating and visualizing the results}\label{evaluating-and-visualizing-the-results} + +One of the key strengths of the \CRANpkg{ASML} package lies in its ability to evaluate results collectively and provide intuitive visualizations. This approach not only aids in identifying the most effective algorithms but also contributes to the interpretability of the results, making it easier for users to make informed decisions based on the performance metrics and visual representations provided. For example, the function \texttt{KPI\_table} returns a table showing the arithmetic and geometric mean of the KPI (both instance-normalized and not normalized) obtained on the test set for each algorithm, as well as for the algorithm selected by the learning model (the one with the largest instance-normalized predicted KPI for each instance). In Table \ref{tab:AMSLtab2}, the results for our case study are shown. It is important to note that larger values are better in the columns for the arithmetic and geometric mean of the instance-normalized KPI (where values close to 1 indicate the best performance). Conversely, in the columns for non-normalized values, lower numbers reflect better outcomes. In all cases, the best results are obtained for the ML algorithm. Note also that in this case, the differences in the performance of the algorithms are likely better reflected by the geometric mean, since it captures relative differences more faithfully.
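+ +For reference, the choice made by the learning model can also be recovered by hand from the prediction matrix returned by \texttt{ASpredict}: for each test instance, one simply keeps the column with the largest predicted instance-normalized KPI (a sketch; ties and labeling are handled internally by the package and may differ): + +\begin{verbatim} +# Column index of the largest predicted instance-normalized KPI for +# every test instance, mapped back to the rule labels +best_idx <- apply(predict_test, 1, which.max) +ml_choice <- lab_rules[best_idx] +table(ml_choice) # how often each rule is selected by the ML model +\end{verbatim}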
+ +\begin{verbatim} +KPI_table(data, predictions = predict_test) +\end{verbatim} + +\begin{table}[!h] +\centering +\caption{\label{tab:AMSLtab2}Arithmetic and geometric mean of the KPI (both instance-normalized and non-normalized) for each algorithm on the test set, along with the results for the algorithm selected by the learning model (first row).} +\centering +\fontsize{9}{11}\selectfont +\begin{tabular}[t]{l>{\raggedleft\arraybackslash}p{2.5cm}>{\raggedleft\arraybackslash}p{2.5cm}>{\raggedleft\arraybackslash}p{2.5cm}>{\raggedleft\arraybackslash}p{2.5cm}} +\toprule + & Arith. mean\newline inst-norm KPI & Geom. mean\newline inst-norm KPI & Arith. mean\newline non-norm KPI & Geom. mean\newline non-norm KPI\\ +\midrule +\ttfamily{ML} & 0.911 & 0.887 & 88114.19 & 1.035\\ +\ttfamily{max} & 0.719 & 0.367 & 158716.13 & 2.574\\ +\ttfamily{sum} & 0.791 & 0.537 & 104402.53 & 1.780\\ +\ttfamily{dual} & 0.842 & 0.581 & 104393.92 & 1.634\\ +\ttfamily{range} & 0.879 & 0.644 & 107064.29 & 1.432\\ +\addlinespace +\ttfamily{eig-VI} & 0.781 & 0.474 & 131194.49 & 2.007\\ +\ttfamily{eig-CMI} & 0.800 & 0.591 & 88197.74 & 1.616\\ +\bottomrule +\end{tabular} +\end{table} + +Additionally, the function \texttt{KPI\_summary\_table} generates a concise comparative table displaying values for three different choices: single best, ML, and optimal, see Table \ref{tab:AMSLtabsum2}. The single best choice refers to selecting the same algorithm for all instances based on the lowest geometric mean of the non-normalized KPI (in this case the \texttt{range} rule). This approach evaluates the performance of each algorithm across all instances and chooses the one that consistently performs best overall, rather than optimizing for individual instances. The ML choice represents the algorithm selected by the quantile random forest model. The optimal choice corresponds to solving each instance with the algorithm that performs best for that specific instance. 
The ML choice shows promising results, with a mean KPI close to the optimal choice, demonstrating its capability to select algorithms that yield competitive performance. + +\begin{verbatim} +KPI_summary_table(data, predictions = predict_test) +\end{verbatim} + +\begin{table}[!h] +\centering +\caption{\label{tab:AMSLtabsum2}Arithmetic and geometric mean of the non-normalized KPI for single best choice, ML choice, and optimal choice.} +\centering +\fontsize{9}{11}\selectfont +\begin{tabular}[t]{l>{\raggedleft\arraybackslash}p{2.5cm}>{\raggedleft\arraybackslash}p{2.5cm}} +\toprule + & Arith. mean\newline non-norm KPI & Geom. mean\newline non-norm KPI\\ +\midrule +\ttfamily{single best} & 107064.29 & 1.432\\ +\ttfamily{ML} & 88114.19 & 1.035\\ +\ttfamily{optimal} & 88085.37 & 0.911\\ +\bottomrule +\end{tabular} +\end{table} + +The following code generates several visualizations that help us compare how well the algorithms perform according to the response variable (instance-normalized KPI) and also illustrate the behavior of the learning process. These plots give us good insights into how effective the algorithm selection process is and how it behaves in comparison to using the same branching rule for all instances. Figure \ref{fig:ASMLplot1} shows the boxplots comparing the performance of each algorithm in terms of the instance-normalized KPI, including the instance-normalized KPI of the rules selected by the ML process for the test set. In Figure \ref{fig:ASMLplot2}, the performance is presented by family, allowing for a more detailed comparison across the different sets of instances. In Figure \ref{fig:ASMLplot3}, we show the ranking of algorithms based on the instance-normalized KPI for the test sample, including the ML rule, categorized by family. Finally, in Figure \ref{fig:ASMLplot4}, the right-side bar in the stacked bar plot (optimal) illustrates the proportion of instances in which each of the original rules is identified as the best-performing option. 
In contrast, the left-side bar (ML) depicts the frequency with which ML selects each rule as the top choice. Although the rule chosen by ML in each instance does not always match the best one for that case, ML tends to select the different rules in a similar proportion to how often those rules are the best across the test set. This means it does not consistently favor a particular rule or ignore any that are the best in a significant percentage of instances. + +\begin{verbatim} +boxplots(data, predictions = predict_test, labels = c(lab_rules, "ML")) +boxplots(data, predictions = predict_test, labels = c(lab_rules, "ML"), by_families = TRUE) +ranking(data, predictions = predict_test, labels = c("ML", lab_rules), by_families = TRUE) +figure_comparison(data, predictions = predict_test, by_families = FALSE, labels = lab_rules) +\end{verbatim} + +\begin{figure}[ht] + +{\centering \includegraphics[width=0.7\linewidth,alt={Boxplots of instance-normalized KPI for each algorithm, including the ML algorithm, across instances in the test set.}]{figures/ASMLplot1-1} + +} + +\caption{Boxplots of instance-normalized KPI for each algorithm, including the ML algorithm, across instances in the test set.}\label{fig:ASMLplot1} +\end{figure} + +\begin{figure}[ht] + +{\centering \includegraphics[width=0.7\linewidth,alt={Boxplots of instance-normalized KPI for each algorithm, including the ML algorithm, across instances in the test set, categorized by family.}]{figures/ASMLplot2-1} + +} + +\caption{Boxplots of instance-normalized KPI for each algorithm, including the ML algorithm, across instances in the test set, categorized by family.}\label{fig:ASMLplot2} +\end{figure} + +\begin{figure}[ht] + +{\centering \includegraphics[width=0.7\linewidth,alt={Ranking of algorithms, including the ML algorithm, based on the instance-normalized KPI for the test sample, categorized by family.
The bars represent the percentage of times each algorithm appeared in different ranking positions, with the numbers indicating the mean value of the normalized KPI.}]{figures/ASMLplot3-1} + +} + +\caption{Ranking of algorithms, including the ML algorithm, based on the instance-normalized KPI for the test sample, categorized by family. The bars represent the percentage of times each algorithm appeared in different ranking positions, with the numbers indicating the mean value of the normalized KPI.}\label{fig:ASMLplot3} +\end{figure} + +\begin{figure}[ht] + +{\centering \includegraphics[width=0.7\linewidth,alt={Comparison of the best-performing rules: The right stack shows the proportion of times each of the original rules is identified as the best-performing option, while the left stack presents the frequency of selection by ML.}]{figures/ASMLplot4-1} + +} + +\caption{Comparison of the best-performing rules: The right stack shows the proportion of times each of the original rules is identified as the best-performing option, while the left stack presents the frequency of selection by ML.}\label{fig:ASMLplot4} +\end{figure} + +\subsection{Custom user-defined methods}\label{custom-user-defined-methods} + +While \CRANpkg{caret} provides a range of built-in methods for model training and prediction, there may be situations where researchers want to explore additional methods not directly integrated into the package. Considering alternative methods can improve the analysis and provide greater flexibility in modeling choices. + +In this section, we present an example of how to modify the quantile random forest \texttt{qrf} method. The \texttt{qrf} implementation in \CRANpkg{caret} does not allow users to specify the conditional quantile to predict, which is set to the median by default. 
In this case, rather than creating an entirely new method, we only need to adjust the prediction function to include the \texttt{what} argument, allowing us to specify the desired conditional quantile for prediction. In this example, we base the algorithm selection on the predicted \(\alpha\)-conditional quantile of the instance-normalized KPI, with \(\alpha = 0.25\). + +\begin{verbatim} +qrf_q_predict <- function(modelFit, newdata, what = 0.5, submodels = NULL) { + out <- predict(modelFit$finalModel, newdata, what = what) + if (is.matrix(out)) { + out <- out[, 1] + } + out +} + +predict_test_Q1 <- ASpredict(training, newdata = data$x.test, f = "qrf_q_predict", + what = 0.25) +KPI_summary_table(data, predictions = predict_test_Q1) +\end{verbatim} + +\subsection{Model interpretability}\label{model-interpretability} + +Predictive modeling often relies on flexible but complex methods. These methods typically involve many parameters or hyperparameters, which can make the models difficult to interpret. To address this, interpretable ML techniques provide tools for exploring \emph{black-box} models. \CRANpkg{ASML} integrates seamlessly with the package \CRANpkg{DALEX} (moDel Agnostic Language for Exploration and eXplanation); see \cite{DALEX}. With \CRANpkg{DALEX}, users can obtain model performance metrics, evaluate feature importance, and generate partial dependence plots (PDPs), among other analyses. + +To simplify the use of \CRANpkg{DALEX} within our framework, \CRANpkg{ASML} provides the function \texttt{ASexplainer}. This function automatically creates \CRANpkg{DALEX} explainers for the models trained with \texttt{AStrain} (one for each algorithm in the portfolio). Once the explainers are created, users can easily apply \CRANpkg{DALEX} functions to explore and compare the behavior of each model.
The following example shows how to obtain a plot of the reversed empirical cumulative distribution function of the absolute residuals from the performance metrics computed with \texttt{DALEX::model\_performance}; see Figure \ref{fig:DALEX1}. + +\begin{verbatim} +# Create DALEX explainers for each trained model +explainers_qrf <- ASexplainer(training, data = data$x.test, y = data$y.test, labels = lab_rules) +# Compute model performance metrics for each explainer +mp_qrf <- lapply(explainers_qrf, DALEX::model_performance) +# Plot the performance metrics +do.call(plot, unname(mp_qrf)) + theme_bw(base_line_size = 0.5) +\end{verbatim} + +\begin{figure}[ht] + +{\centering \includegraphics[width=0.7\linewidth,alt={Reversed empirical cumulative distribution function of the absolute residuals of the trained models.}]{figures/DALEX1-1} + +} + +\caption{Reversed empirical cumulative distribution function of the absolute residuals of the trained models.}\label{fig:DALEX1} +\end{figure} + +The code below illustrates how to obtain feature importance (via \texttt{DALEX::model\_parts}) and a PDP for the predictor variable \texttt{degree} (via \texttt{DALEX::model\_profile}). Plots are not displayed in this manuscript, but they can be generated by executing the code.
+ +\begin{verbatim} +# Compute feature importance for each model in the explainers list +vi_qrf <- lapply(explainers_qrf, DALEX::model_parts) +# Plot the top 5 most important variables for each model +do.call(plot, c(unname(vi_qrf), list(max_vars = 5))) +# Compute PDP for the variable 'degree' for each model +pdp_qrf <- lapply(explainers_qrf, DALEX::model_profile, variable = "degree", type = "partial") +# Plot the PDPs generated +do.call(plot, unname(pdp_qrf)) + theme_bw(base_line_size = 0.5) +\end{verbatim} + +\section{Example on a larger dataset}\label{example-on-a-larger-dataset} + +To analyze the scalability of \CRANpkg{ASML}, we now consider an example of algorithm selection in the field of high-performance computing (HPC), specifically in the context of the automatic selection of the most suitable storage format for sparse matrices on GPUs. This is a well-known problem in HPC, since the storage format has a decisive impact on the performance of many scientific kernels such as the sparse matrix--vector multiplication (SpMV). For this study, we use the dataset introduced by \cite{pic18}, which contains 8111 sparse matrices and is available in the \CRANpkg{ASML} package under the name \texttt{SpMVformat}. Each matrix is described by a set of nine structural features, and the performance of the single-precision SpMV kernel was measured on an NVIDIA GeForce GTX TITAN GPU under three storage formats: compressed row storage (CSR), ELLPACK (ELL), and hybrid (HYB). For each matrix and format, performance is expressed as the average GFLOPS (billions of floating-point operations per second) over 1000 SpMV operations. This setup allows us to study how matrix features relate to the most efficient storage format. + +The workflow follows the standard \CRANpkg{ASML} pipeline: data are partitioned and normalized, preprocessed, and models are trained using \texttt{ASML::AStrain}.
We considered different learning methods available in \CRANpkg{caret} and evaluated execution times both with and without parallel processing, which is controlled via the \texttt{parallel} argument in \texttt{ASML::AStrain}. The selected methods were run with their default configurations in \CRANpkg{caret}, without additional hyperparameter tuning. All experiments were performed on a machine equipped with a 12th Gen Intel(R) Core(TM) i7-12700 (12 cores), 2.11 GHz processor and 32 GB of RAM. The execution times are summarized in Tables \ref{tab:AMSLtimes} and \ref{tab:AMSLtimes2}. + +\begin{table}[!h] +\centering +\caption{\label{tab:AMSLtimes}Execution times (in seconds) on the SpMVformat dataset for the main preprocessing stages.} +\centering +\fontsize{9}{11}\selectfont +\begin{tabular}[t]{l>{\raggedleft\arraybackslash}p{10em}} +\toprule +Stage & Execution time (seconds)\\ +\midrule +ASML::partition\_and\_normalize & 0.03\\ +caret::preProcess & 1.55\\ +\bottomrule +\end{tabular} +\end{table} + +\begin{table}[!h] +\centering +\caption{\label{tab:AMSLtimes2}Training times (in seconds) on the SpMVformat dataset for different methods using ASML::AStrain. The first column shows execution without parallelization (parallel = FALSE) and the second column shows execution with parallelization (parallel = TRUE).} +\centering +\fontsize{9}{11}\selectfont +\begin{tabular}[t]{l>{\raggedleft\arraybackslash}p{10em}>{\raggedleft\arraybackslash}p{10em}} +\toprule +\multicolumn{1}{c}{ } & \multicolumn{2}{c}{Execution times (in seconds) of ASML::AStrain} \\ +\cmidrule(l{3pt}r{3pt}){2-3} +Method & parallel = FALSE & parallel = TRUE\\ +\midrule +nnet & 236.58 & 50.75\\ +svmRadial & 881.03 & 263.60\\ +rf & 4753.00 & 1289.68\\ +\bottomrule +\end{tabular} +\end{table} + +The majority of the computational cost is associated with model training, which depends on the learning method in \CRANpkg{caret}. 
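+Timings such as those reported in Table \ref{tab:AMSLtimes2} can be measured with base R's \texttt{system.time}; the following sketch assumes a \texttt{data} object already prepared with \texttt{partition\_and\_normalize} as described above:
+
+\begin{verbatim}
+# Sketch: elapsed training time with and without parallelization
+t_seq <- system.time(AStrain(data, method = "nnet", parallel = FALSE))
+t_par <- system.time(AStrain(data, method = "nnet", parallel = TRUE))
+c(sequential = t_seq[["elapsed"]], parallel = t_par[["elapsed"]])
+\end{verbatim}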
We observe that training times vary across methods: \texttt{nnet} (a simple feed-forward neural network), \texttt{svmRadial} (support vector machines with radial kernel), and \texttt{rf} (random forest). Parallel execution substantially reduces training times for all selected methods, demonstrating that the workflow scales efficiently to larger datasets while keeping preprocessing overhead minimal. + +Apart from the execution times, we also take this opportunity to provide a brief commentary on the outcome of the algorithm selection in this application example. In particular, we illustrate the model's ability to identify the most efficient storage format by reporting the results obtained with the \texttt{nnet} method; see Figure \ref{fig:ASMLnnet1}. The trained model selects the best-performing format in more than 85\% of the test cases, and even when it does not, the chosen format still achieves high performance, with a mean normalized KPI (normalized average GFLOPS) of around 0.9. + +\begin{verbatim} +set.seed(1234) +data(SpMVformat) +features <- SpMVformat$x +KPI <- SpMVformat$y +data <- partition_and_normalize(features, KPI, better_smaller = FALSE) +preProcValues <- caret::preProcess(data$x.train, method = "YeoJohnson") +data$x.train <- predict(preProcValues, data$x.train) +data$x.test <- predict(preProcValues, data$x.test) +training <- AStrain(data, method = "nnet", parallel = TRUE) +pred <- ASpredict(training, newdata = data$x.test) +ranking(data, predictions = pred) +\end{verbatim} + +\begin{figure}[ht] + +{\centering \includegraphics[width=0.7\linewidth,alt={Ranking of storage formats, including the ML selection, based on the instance-normalized KPI for the test sample.
The bars represent the percentage of times each storage format appeared in different ranking positions, with the numbers indicating the mean value of the normalized KPI.}]{figures/ASMLnnet1-1} + +} + +\caption{Ranking of storage formats, including the ML selection, based on the instance-normalized KPI for the test sample. The bars represent the percentage of times each storage format appeared in different ranking positions, with the numbers indicating the mean value of the normalized KPI.}\label{fig:ASMLnnet1} +\end{figure} + +\section{Using ASML for algorithm selection on ASlib scenarios}\label{using-asml-for-algorithm-selection-on-aslib-scenarios} + +While the primary purpose of the \CRANpkg{ASML} package is not to systematically conduct algorithm selection studies like those found in ASlib (an area for which the \CRANpkg{llama} toolkit is especially helpful), it does offer a complementary approach for reproducing results from the ASlib benchmark (\url{https://coseal.github.io/aslib-r/scenario-pages/index.html}). Our method allows for a comparative analysis using instance-normalized KPIs, which, as demonstrated in the following example, can sometimes yield improved performance results. Additionally, it can be useful for evaluating algorithm selection approaches based on methods that are not available in the \CRANpkg{mlr} package used by \CRANpkg{llama} but are accessible in \CRANpkg{caret}. + +\subsection{Data download and preparation}\label{data-download-and-preparation} + +First, we identify the specific scenario from ASlib we are interested in; in this case, \texttt{CPMP-2015}. Using the scenario name, we construct a URL that points to the corresponding page on the ASlib website.
Then, we fetch the HTML content of the page and create a local directory to store the downloaded files\footnote{A more direct approach would be to use the \texttt{getCosealASScenario} function from the \CRANpkg{aslib} package; however, this function currently appears not to work, likely due to changes in the directory structure of the scenarios.}. + +\begin{verbatim} +set.seed(1234) +library(tidyverse) +library(rvest) +scen <- "CPMP-2015" +url <- paste0("https://coseal.github.io/aslib-r/scenario-pages/", scen, "/data_files") +page <- read_html(paste0(url, ".html")) +file_links <- page %>% + html_nodes("a") %>% + html_attr("href") + +# Create directory for downloaded files +dir_data <- paste0(scen, "_data") +dir.create(dir_data, showWarnings = FALSE) + +# Download files +for (link in file_links) { + full_link <- ifelse(grepl("^http", link), link, paste0(url, "/", link)) + file_name <- basename(link) + dest_file <- file.path(dir_data, file_name) + if (!is.na(full_link)) { + download.file(full_link, dest_file, mode = "wb", quiet = TRUE) + } +} +\end{verbatim} + +\subsection{Data preparation with aslib}\label{data-preparation-with-aslib} + +Now, we use the \CRANpkg{aslib} package to parse the scenario data and extract the relevant features and performance metrics. The \texttt{parseASScenario} function from \CRANpkg{aslib} creates a structured object \texttt{ASScen} that contains information regarding the algorithms and instances being evaluated. We then transform this data into cross-validation folds using the \texttt{cvFolds} function from \CRANpkg{llama}. This conversion facilitates a fair evaluation of algorithm performance across different scenarios, allowing us to compare the results with those published\footnote{\label{foot}Available at: https://coseal.github.io/aslib-r/scenario-pages/CPMP-2015/llama.html (Accessed October 25, 2024).}.
+ +\begin{verbatim} +library(aslib) +ASScen <- aslib::parseASScenario(dir_data) +llamaScen <- aslib::convertToLlama(ASScen) +folds <- llama::cvFolds(llamaScen) +\end{verbatim} + +Then we extract the key performance indicator (KPI) and features from the folds object. In this case, \texttt{KPI} refers to runtime. As described in the ASlib documentation, \texttt{KPI\_pen} measures the penalized runtime. If an instance is solved within the timeout (\texttt{cutoff}) by the selected algorithm, the actual runtime is used. However, if a timeout occurs, the timeout value is multiplied by 10 to penalize the algorithm's performance. We also define \texttt{nins} as the number of instances and \texttt{ID} as unique identifiers for each instance. + +\begin{verbatim} +KPI <- folds$data[, folds$performance] +features <- folds$data[, folds$features] +cutoff <- ASScen$desc$algorithm_cutoff_time +is.timeout <- ASScen$algo.runstatus[, -c(1, 2)] != "ok" +KPI_pen <- KPI * ifelse(is.timeout, 10, 1) +nins <- length(getInstanceNames(ASScen)) +ID <- 1:nins +\end{verbatim} + +\subsection{Quantile random forest using ASML on instance-normalized KPI}\label{quantile-random-forest-using-asml-on-instance-normalized-kpi} + +We use the \CRANpkg{ASML} package to perform quantile random forest on instance-normalized KPI. We have already established the folds beforehand, and we want to use those partitions to maintain consistency with the original ASlib scenario design. Therefore, we provide \texttt{x.test} and \texttt{y.test} as arguments directly to the \texttt{partition\_and\_normalize} function. 
+ +\begin{verbatim} +data <- partition_and_normalize(x = features, y = KPI, x.test = features, y.test = KPI, + better_smaller = TRUE) +train_control <- caret::trainControl(index = folds$train, savePredictions = "final") +training <- AStrain(data, method = "qrf", trControl = train_control) +\end{verbatim} + +In this code block, we process the predictions made by the models trained using \CRANpkg{ASML} and calculate the same performance metrics used in ASlib, namely, the percentage of solved instances (\texttt{succ}), penalized average runtime (\texttt{par10}), and misclassification penalty (\texttt{mcp}), as detailed in the ASlib documentation \citep{bis16}. + +\begin{verbatim} +pred_list <- lapply(training, function(model) { + model$pred %>% + arrange(rowIndex) %>% + pull(pred) +}) + +pred <- do.call(cbind, pred_list) +alg_sel <- apply(pred, 1, which.max) + +succ <- mean(!is.timeout[cbind(ID, alg_sel)]) +par10 <- mean(KPI_pen[cbind(ID, alg_sel)]) +mcp <- mean(KPI[cbind(ID, alg_sel)] - apply(KPI, 1, min)) +\end{verbatim} + +In Table \ref{tab:AMSLtabASLIB}, we present the results. We observe that, in this example, using instance-normalized KPI along with the quantile random forest model offers an alternative modeling option in addition to the standard regression models employed in the original ASlib study (linear model, regression trees and regression random forest), resulting in improved performance outcomes. + +\begin{table}[!h] +\centering +\caption{\label{tab:AMSLtabASLIB}Performance results of various models on the CPMP-2015 dataset. The last row represents the performance of the quantile random forest model based on instance-normalized KPI using the \CRANpkg{ASML} package. 
+ The preceding rows detail the results (all taken from the original ASlib study) of the virtual best solver (vbs), single best solver (singleBest), and the considered regression methods (linear model, regression trees and regression random forest).} +\centering +\fontsize{9}{11}\selectfont +\begin{tabular}[t]{lrrr} +\toprule +\textbf{\ttfamily{Model}} & \textbf{\ttfamily{succ}} & \textbf{\ttfamily{par10}} & \textbf{\ttfamily{mcp}}\\ +\midrule +baseline vbs & 1.000 & 227.605 & 0.000\\ +baseline singleBest & 0.812 & 7002.907 & 688.774\\ +regr.lm & 0.843 & 5887.326 & 556.875\\ +regr.rpart & 0.843 & 5916.120 & 585.669\\ +regr.randomForest & 0.846 & 5748.065 & 540.574\\ +\addlinespace +ASML qrf & 0.873 & 4807.633 & 460.863\\ +\bottomrule +\end{tabular} +\end{table} + +It is important to note that this is merely an illustrative example; there are other scenarios in ASlib where replication may not be feasible in the same manner, due to factors not considered in \CRANpkg{ASML} (for more robust behavior across the ASlib benchmark, we refer to \CRANpkg{llama}). Despite these limitations, \CRANpkg{ASML} provides a flexible framework that allows researchers to explore various methodologies, including those not directly applicable with \CRANpkg{llama} through \CRANpkg{mlr}, and improve algorithm selection processes across different scenarios, ultimately contributing to improved understanding and performance in algorithm selection tasks. + +\section{Summary and discussion}\label{summary-and-discussion} + +In this work, we present \CRANpkg{ASML}, an R package to select the best algorithm from a portfolio of candidates based on a chosen KPI. \CRANpkg{ASML} uses instance-specific features and historical performance data to estimate how well each algorithm is likely to perform on new instances, via a model selected by the user, which may be any regression method from the \CRANpkg{caret} package or a custom function. This supports the automatic selection of the most suitable algorithm.
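+The instance normalization underlying this unified comparison can be sketched as follows (an illustrative min-max normalization computed per instance; the exact computation, including the \texttt{better\_smaller} convention, is handled internally by \texttt{partition\_and\_normalize}):
+
+\begin{verbatim}
+# Sketch: scale the KPI of each instance (row) to [0, 1] across algorithms,
+# assuming larger raw KPI values are better
+normalize_KPI <- function(y) {
+  t(apply(y, 1, function(k) (k - min(k)) / (max(k) - min(k))))
+}
+\end{verbatim}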
The use of instance-normalized KPIs for algorithm selection is a novel aspect of this package, allowing a unified comparison across different algorithms and problem instances. + +While the motivation and examples presented in this work focus on optimization problems, particularly the automatic selection of branching rules and decision strategies in polynomial optimization, the \CRANpkg{ASML} framework is inherently flexible and can be applied more broadly. In particular, in the context of ML, selecting the right algorithm is a crucial factor for the success of ML applications. Traditionally, this process involves empirically assessing potential algorithms with the available data, which can be resource-intensive. In contrast, in so-called Meta-Learning, the aim is to predict the performance of ML algorithms based on features of the learning problems (meta-examples). Each meta-example contains details about a previously solved learning problem, including its features and the performance achieved by the candidate algorithms on that problem. A common Meta-Learning approach involves using regression algorithms to forecast the value of a selected performance metric (such as classification error) for the candidate algorithms based on the problem features. This method is commonly referred to as Meta-Regression in the literature. Thus, \CRANpkg{ASML} could also be used in this context, providing a flexible tool for algorithm selection across a variety of domains. + +\section{Acknowledgments}\label{acknowledgments} + +The authors would like to thank María Caseiro-Arias, Antonio Fariña-Elorza and Manuel Timiraos-López for their contributions to the development of the \CRANpkg{ASML} package. +This work is part of the R\&D projects PID2024-158017NB-I00, PID2020-116587GB-I00 and PID2021-124030NB-C32 granted by MICIU/AEI/10.13039/501100011033.
This research was also funded by Grupos de Referencia Competitiva ED431C-2021/24 and ED431C 2025/03 from the Consellería de Educación, Ciencia, Universidades e Formación Profesional, Xunta de Galicia. Brais González-Rodríguez acknowledges the support from MICIU, through grant BG23/00155. + +\bibliography{gomez-pateiro-gonzalez-gonzalez.bib} + +\address{% +Ignacio Gómez-Casares\\ +Universidade de Santiago de Compostela\\% +Department of Statistics, Mathematical Analysis and Optimization\\ Santiago de Compostela, Spain\\ +% +% +% +\href{mailto:ignaciogomez.casares@usc.es}{\nolinkurl{ignaciogomez.casares@usc.es}}% +} + +\address{% +Beatriz Pateiro-López\\ +Universidade de Santiago de Compostela\\% +Department of Statistics, Mathematical Analysis and Optimization\\ CITMAga (Galician Center for Mathematical Research and Technology)\\ Santiago de Compostela, Spain\\ +% +% +\textit{ORCiD: \href{https://orcid.org/0000-0002-7714-1835}{0000-0002-7714-1835}}\\% +\href{mailto:beatriz.pateiro@usc.es}{\nolinkurl{beatriz.pateiro@usc.es}}% +} + +\address{% +Brais González-Rodríguez\\ +Universidade de Vigo\\% +Department of Statistics and Operational Research\\ SiDOR Research Group\\ Vigo, Spain\\ +% +% +% +\href{mailto:brais.gonzalez.rodriguez@uvigo.gal}{\nolinkurl{brais.gonzalez.rodriguez@uvigo.gal}}% +} + +\address{% +Julio González-Díaz\\ +Universidade de Santiago de Compostela\\% +Department of Statistics, Mathematical Analysis and Optimization\\ CITMAga (Galician Center for Mathematical Research and Technology)\\ Santiago de Compostela, Spain\\ +% +% +\textit{ORCiD: \href{https://orcid.org/0000-0002-4667-4348}{0000-0002-4667-4348}}\\% +\href{mailto:julio.gonzalez@usc.es}{\nolinkurl{julio.gonzalez@usc.es}}% +} diff --git a/_articles/RJ-2025-045/RJ-2025-045.zip b/_articles/RJ-2025-045/RJ-2025-045.zip new file mode 100644 index 0000000000..c0a15a4e98 Binary files /dev/null and b/_articles/RJ-2025-045/RJ-2025-045.zip differ diff --git a/_articles/RJ-2025-045/RJournal.sty 
b/_articles/RJ-2025-045/RJournal.sty new file mode 100644 index 0000000000..351990be38 --- /dev/null +++ b/_articles/RJ-2025-045/RJournal.sty @@ -0,0 +1,358 @@ +% Package `RJournal' to use with LaTeX2e +% Copyright (C) 2010 by the R Foundation +% Copyright (C) 2013 by the R Journal +% +% Originally written by Kurt Hornik and Friedrich Leisch with subsequent +% edits by the editorial board +% +% CAUTION: +% Do not modify this style file. Any changes to this file will be reset when your +% article is submitted. +% If you must modify the style or add LaTeX packages to the article, these +% should be specified in RJwrapper.tex + +\NeedsTeXFormat{LaTeX2e}[1995/12/01] +\ProvidesPackage{RJournal}[2025/10/05 v0.17 RJournal package] + +\RequirePackage{tikz} + +% Overall page layout, fonts etc ----------------------------------------------- + +% Issues of of \emph{The R Journal} are created from the standard \LaTeX{} +% document class \pkg{report}. + +\RequirePackage{geometry} +\geometry{a4paper, + textwidth=14cm, top=1cm, bottom=1cm, + includehead,includefoot,centering, + footskip=1.5cm} +\raggedbottom +\sloppy +\clubpenalty = 10000 +\widowpenalty = 10000 +\brokenpenalty = 10000 +\usepackage{microtype} + + +\RequirePackage{fancyhdr} +\fancyhead{} +\fancyheadoffset{2cm} +\fancyhead[L]{\textsc{\RJ@sectionhead}} +\fancyhead[R]{\thepage} +\fancyfoot{} +\fancyfoot[L]{The R Journal Vol. 
\RJ@volume/\RJ@number, \RJ@month~\RJ@year} +\fancyfoot[R]{ISSN 2073-4859} +\pagestyle{fancy} + +% We use the following fonts (all with T1 encoding): +% +% rm & palatino +% tt & inconsolata +% sf & helvetica +% math & palatino + +\RequirePackage{microtype} + +\RequirePackage[scaled=0.92]{helvet} +\RequirePackage{palatino,mathpazo} +\RequirePackage[scaled=1.02]{inconsolata} +\RequirePackage[T1]{fontenc} + +\RequirePackage[hyphens]{url} +\RequirePackage[pagebackref]{hyperref} +\renewcommand{\backref}[1]{[p#1]} + +% Dark blue colour for all links +\RequirePackage{color} +\definecolor{link}{rgb}{0.45,0.51,0.67} +\hypersetup{ + colorlinks,% + citecolor=link,% + filecolor=link,% + linkcolor=link,% + urlcolor=link +} + +% Give the text a little room to breath +\setlength{\parskip}{3pt} +\RequirePackage{setspace} +\setstretch{1.05} + +% Issue and article metadata --------------------------------------------------- + +% Basic front matter information about the issue: volume, number, and +% date. + +\newcommand{\volume}[1]{\def\RJ@volume{#1}} +\newcommand{\volnumber}[1]{\def\RJ@number{#1}} +\renewcommand{\month}[1]{\def\RJ@month{#1}} +\renewcommand{\year}[1]{\def\RJ@year{#1}} + + +% Individual articles correspond to +% chapters, and are contained in |article| environments. This makes it +% easy to have figures counted within articles and hence hyperlinked +% correctly. + +% An article has an author, a title, and optionally a subtitle. We use +% the obvious commands for specifying these. Articles will be put in certain +% journal sections, named by \sectionhead. + +\newcommand {\sectionhead} [1]{\def\RJ@sectionhead{#1}} +\renewcommand{\author} [1]{\def\RJ@author{#1}} +\renewcommand{\title} [1]{\def\RJ@title{#1}} +\newcommand {\subtitle} [1]{\def\RJ@subtitle{#1}} + +% Control appearance of titles: make slightly smaller than usual, and +% suppress section numbering. 
See http://tex.stackexchange.com/questions/69749 +% for why we don't use \setcounter{secnumdepth}{-1} + +\usepackage[medium]{titlesec} +\usepackage{titletoc} +\titleformat{\section} {\normalfont\large\bfseries}{\arabic{section}}{1em}{} +\titleformat{\subsection}{\normalfont\normalsize\bfseries}{\arabic{section}.\arabic{subsection}}{0.5em}{} +\titlecontents{chapter} [0em]{}{}{}{\titlerule*[1em]{.}\contentspage} + +% Article layout --------------------------------------------------------------- + +% Environment |article| clears the article header information at its beginning. +% We use |\FloatBarrier| from the placeins package to keep floats within +% the article. +\RequirePackage{placeins} +\newenvironment{article}{\author{}\title{}\subtitle{}\FloatBarrier}{\FloatBarrier} + +% Refereed articles should have an abstract, so we redefine |\abstract| to +% give the desired style + +\renewcommand{\abstract}[1]{\noindent\textbf{Abstract} #1} +\renewenvironment{abstract}{\noindent\textbf{Abstract}~}{} + +% The real work is done by a redefined version of |\maketitle|. Note +% that even though we do not want chapters (articles) numbered, we +% need to increment the chapter counter, so that figures get correct +% labelling. + +\renewcommand{\maketitle}{% +\noindent + \chapter{\RJ@title}\refstepcounter{chapter} + \ifx\empty\RJ@subtitle + \else + \noindent\textbf{\RJ@subtitle} + \par\nobreak\addvspace{\baselineskip} + \fi + \ifx\empty\RJ@author + \else + \noindent\textit{\RJ@author} + \par\nobreak\addvspace{\baselineskip} + \fi + \@afterindentfalse\@nobreaktrue\@afterheading +} + +% Now for some ugly redefinitions. We do not want articles to start a +% new page. 
(Actually, we do, but this is handled via explicit +% \newpage +% +% The name@of@eq is a hack to get hyperlinks to equations to work +% within each article, even though there may be multiple eq.(1) +% \begin{macrocode} +\renewcommand\chapter{\secdef\RJ@chapter\@schapter} +\providecommand{\nohyphens}{% + \hyphenpenalty=10000\exhyphenpenalty=10000\relax} +\newcommand{\RJ@chapter}{% + \edef\name@of@eq{equation.\@arabic{\c@chapter}}% + \renewcommand{\@seccntformat}[1]{}% + \@startsection{chapter}{0}{0mm}{% + -2\baselineskip \@plus -\baselineskip \@minus -.2ex}{\p@}{% + \phantomsection\normalfont\huge\bfseries\raggedright}} + +% Book reviews should appear as sections in the text and in the pdf bookmarks, +% however we wish them to appear as chapters in the TOC. Thus we define an +% alternative to |\maketitle| for reviews. +\newcommand{\review}[1]{ + \pdfbookmark[1]{#1}{#1} + \section*{#1} + \addtocontents{toc}{\protect\contentsline{chapter}{#1}{\thepage}{#1.1}} +} + +% We want bibliographies as starred sections within articles. +% +\RequirePackage[sectionbib,round]{natbib} +\bibliographystyle{abbrvnat} +\renewcommand{\bibsection}{\section*{References}} + +% Equations, figures and tables are counted within articles, but we do +% not show the article number. For equations it becomes a bit messy to avoid +% having hyperref getting it wrong. + +% \numberwithin{equation}{chapter} +\renewcommand{\theequation}{\@arabic\c@equation} +\renewcommand{\thefigure}{\@arabic\c@figure} +\renewcommand{\thetable}{\@arabic\c@table} + +% Issue layout ----------------------------------------------------------------- + +% Need to provide our own version of |\tableofcontents|. We use the +% tikz package to get the rounded rectangle. Notice that |\section*| +% is really the same as |\chapter*|. 
+\renewcommand{\contentsname}{Contents} +\renewcommand\tableofcontents{% + \vspace{1cm} + \section*{\contentsname} + { \@starttoc{toc} } +} + +\renewcommand{\titlepage}{% + \thispagestyle{empty} + \hypersetup{ + pdftitle={The R Journal Volume \RJ@volume/\RJ@number, \RJ@month \RJ@year},% + pdfauthor={R Foundation for Statistical Computing},% + } + \noindent + \begin{center} + \fontsize{50pt}{50pt}\selectfont + The \raisebox{-8pt}{\includegraphics[height=77pt]{Rlogo-5}}\hspace{10pt} + Journal + + \end{center} + {\large \hfill Volume \RJ@volume/\RJ@number, \RJ@month{} \RJ@year \quad} + + \rule{\textwidth}{1pt} + \begin{center} + {\Large A peer-reviewed, open-access publication of the \\ + R Foundation for Statistical Computing} + \end{center} + + % And finally, put in the TOC box. Note the way |tocdepth| is adjusted + % before and after producing the TOC: thus, we can ensure that only + % articles show up in the printed TOC, but that in the PDF version, + % bookmarks are created for sections and subsections as well (provided + % that the non-starred forms are used). + \setcounter{tocdepth}{0} + \tableofcontents + \setcounter{tocdepth}{2} + \clearpage +} + +% Text formatting -------------------------------------------------------------- + +\newcommand{\R}{R} +\newcommand{\address}[1]{\addvspace{\baselineskip}\noindent\emph{#1}} +\newcommand{\email}[1]{\href{mailto:#1}{\normalfont\texttt{#1}}} + +% Simple font selection is not good enough. For example, |\texttt{--}| +% gives `\texttt{--}', i.e., an endash in typewriter font. Hence, we +% need to turn off ligatures, which currently only happens for commands +% |\code| and |\samp| and the ones derived from them. Hyphenation is +% another issue; it should really be turned off inside |\samp|. And +% most importantly, \LaTeX{} special characters are a nightmare. E.g., +% one needs |\~{}| to produce a tilde in a file name marked by |\file|. 
+% Perhaps a few years ago, most users would have agreed that this may be +% unfortunate but should not be changed to ensure consistency. But with +% the advent of the WWW and the need for getting `|~|' and `|#|' into +% URLs, commands which only treat the escape and grouping characters +% specially have gained acceptance + +\DeclareRobustCommand\code{\bgroup\@noligs\@codex} +\def\@codex#1{\texorpdfstring% +{{\normalfont\ttfamily\hyphenchar\font=-1 #1}}% +{#1}\egroup} +\newcommand{\kbd}[1]{{\normalfont\texttt{#1}}} +\newcommand{\key}[1]{{\normalfont\texttt{\uppercase{#1}}}} +\DeclareRobustCommand\samp{`\bgroup\@noligs\@sampx} +\def\@sampx#1{{\normalfont\texttt{#1}}\egroup'} +\newcommand{\var}[1]{{\normalfont\textsl{#1}}} +\let\env=\code +\newcommand{\file}[1]{{`\normalfont\textsf{#1}'}} +\let\command=\code +\let\option=\samp +\newcommand{\dfn}[1]{{\normalfont\textsl{#1}}} +% \acronym is effectively disabled since not used consistently +\newcommand{\acronym}[1]{#1} +\newcommand{\strong}[1]{\texorpdfstring% +{{\normalfont\fontseries{b}\selectfont #1}}% +{#1}} +\let\pkg=\strong +\newcommand{\CRANpkg}[1]{\href{https://CRAN.R-project.org/package=#1}{\pkg{#1}}}% +\let\cpkg=\CRANpkg +\newcommand{\ctv}[1]{\href{https://CRAN.R-project.org/view=#1}{\emph{#1}}} +\newcommand{\BIOpkg}[1]{\href{https://www.bioconductor.org/packages/release/bioc/html/#1.html}{\pkg{#1}}} + +% Example environments --------------------------------------------------------- +\RequirePackage{fancyvrb} +\RequirePackage{alltt} + +\DefineVerbatimEnvironment{example}{Verbatim}{} +\renewenvironment{example*}{\begin{alltt}}{\end{alltt}} + +% Support for output from Sweave, and generic session style code +% These used to have fontshape=sl for Sinput/Scode/Sin, but pslatex +% won't use a condensed font in that case. 
+ +% Update (2015-05-28 by DS): remove fontsize=\small to match example environment + +\DefineVerbatimEnvironment{Sinput}{Verbatim}{} +\DefineVerbatimEnvironment{Soutput}{Verbatim}{} +\DefineVerbatimEnvironment{Scode}{Verbatim}{} +\DefineVerbatimEnvironment{Sin}{Verbatim}{} +\DefineVerbatimEnvironment{Sout}{Verbatim}{} +\newenvironment{Schunk}{}{} + +% Mathematics ------------------------------------------------------------------ + +% The implementation of |\operatorname| is similar to the mechanism +% \LaTeXe{} uses for functions like sin and cos, and simpler than the +% one of \AmSLaTeX{}. We use |\providecommand| for the definition in +% order to keep the one of the \pkg{amstex} if this package has +% already been loaded. +% \begin{macrocode} +\providecommand{\operatorname}[1]{% + \mathop{\operator@font#1}\nolimits} +\RequirePackage{amsfonts} + +\renewcommand{\P}{% + \mathop{\operator@font I\hspace{-1.5pt}P\hspace{.13pt}}} +\newcommand{\E}{% + \mathop{\operator@font I\hspace{-1.5pt}E\hspace{.13pt}}} +\newcommand{\VAR}{\operatorname{var}} +\newcommand{\COV}{\operatorname{cov}} +\newcommand{\COR}{\operatorname{cor}} + +% Figures ---------------------------------------------------------------------- + +% For use with pandoc > 3.2.1 +\newsavebox\pandoc@box +\newcommand*\pandocbounded[1]{% scales image to fit in text height/width + \sbox\pandoc@box{#1}% + \Gscale@div\@tempa{\textheight}{\dimexpr\ht\pandoc@box+\dp\pandoc@box\relax}% + \Gscale@div\@tempb{\linewidth}{\wd\pandoc@box}% + \ifdim\@tempb\p@<\@tempa\p@\let\@tempa\@tempb\fi% select the smaller of both + \ifdim\@tempa\p@<\p@\scalebox{\@tempa}{\usebox\pandoc@box}% + \else\usebox{\pandoc@box}% + \fi% +} + +\RequirePackage[font=small,labelfont=bf]{caption} + +% Wide environments for figures and tables ------------------------------------- +\RequirePackage{environ} + +% An easy way to make a figure span the full width of the page +\NewEnviron{widefigure}[1][]{ +\begin{figure}[#1] +\advance\leftskip-2cm 
+\begin{minipage}{\dimexpr\textwidth+4cm\relax}% + \captionsetup{margin=2cm} + \BODY +\end{minipage}% +\end{figure} +} + +\NewEnviron{widetable}[1][]{ +\begin{table}[#1] +\advance\leftskip-2cm +\begin{minipage}{\dimexpr\textwidth+4cm\relax}% + \captionsetup{margin=2cm} + \BODY +\end{minipage}% +\end{table} +} diff --git a/_articles/RJ-2025-045/RJwrapper.tex b/_articles/RJ-2025-045/RJwrapper.tex new file mode 100644 index 0000000000..d6e3934d24 --- /dev/null +++ b/_articles/RJ-2025-045/RJwrapper.tex @@ -0,0 +1,74 @@ +\documentclass[a4paper]{report} +\usepackage[utf8]{inputenc} +\usepackage[T1]{fontenc} +\usepackage{RJournal} +\usepackage{amsmath,amssymb,array} +\usepackage{booktabs} + + +% tightlist command for lists without linebreak +\providecommand{\tightlist}{% + \setlength{\itemsep}{0pt}\setlength{\parskip}{0pt}} + +\usepackage{longtable} + +% Always define CSL refs as bib entries are contained in separate doc +% Pandoc citation processing +%From Pandoc 3.1.8 +% definitions for citeproc citations +\NewDocumentCommand\citeproctext{}{} +\NewDocumentCommand\citeproc{mm}{% + \begingroup\def\citeproctext{#2}\cite{#1}\endgroup} +\makeatletter + % allow citations to break across lines + \let\@cite@ofmt\@firstofone + % avoid brackets around text for \cite: + \def\@biblabel#1{} + \def\@cite#1#2{{#1\if@tempswa , #2\fi}} +\makeatother +\newlength{\cslhangindent} +\setlength{\cslhangindent}{1.5em} +\newlength{\csllabelwidth} +\setlength{\csllabelwidth}{3em} +\newenvironment{CSLReferences}[2] % #1 hanging-indent, #2 entry-spacing + {\begin{list}{}{% + \setlength{\itemindent}{0pt} + \setlength{\leftmargin}{0pt} + \setlength{\parsep}{0pt} + % turn on hanging indent if param 1 is 1 + \ifodd #1 + \setlength{\leftmargin}{\cslhangindent} + \setlength{\itemindent}{-1\cslhangindent} + \fi + % set entry spacing + \setlength{\itemsep}{#2\baselineskip}}} + {\end{list}} +\usepackage{calc} +\newcommand{\CSLBlock}[1]{#1\hfill\break} 
+\newcommand{\CSLLeftMargin}[1]{\parbox[t]{\csllabelwidth}{#1}} +\newcommand{\CSLRightInline}[1]{\parbox[t]{\linewidth - \csllabelwidth}{#1}\break} +\newcommand{\CSLIndent}[1]{\hspace{\cslhangindent}#1} + + +\usepackage{longtable} +\usepackage[table]{xcolor} +\usepackage{pifont} +\usepackage{float} + +\begin{document} + + +%% do not edit, for illustration only +\sectionhead{Contributed research article} +\volume{17} +\volnumber{4} +\year{2025} +\month{December} +\setcounter{page}{216} + +\begin{article} + \input{RJ-2025-045} +\end{article} + + +\end{document} diff --git a/_articles/RJ-2025-045/figures/ASMLnnet1-1.pdf b/_articles/RJ-2025-045/figures/ASMLnnet1-1.pdf new file mode 100644 index 0000000000..6606ca4704 Binary files /dev/null and b/_articles/RJ-2025-045/figures/ASMLnnet1-1.pdf differ diff --git a/_articles/RJ-2025-045/figures/ASMLnnet1-1.png b/_articles/RJ-2025-045/figures/ASMLnnet1-1.png new file mode 100644 index 0000000000..a9ac25bf25 Binary files /dev/null and b/_articles/RJ-2025-045/figures/ASMLnnet1-1.png differ diff --git a/_articles/RJ-2025-045/figures/ASMLplot1-1.pdf b/_articles/RJ-2025-045/figures/ASMLplot1-1.pdf new file mode 100644 index 0000000000..8a7f520988 Binary files /dev/null and b/_articles/RJ-2025-045/figures/ASMLplot1-1.pdf differ diff --git a/_articles/RJ-2025-045/figures/ASMLplot1-1.png b/_articles/RJ-2025-045/figures/ASMLplot1-1.png new file mode 100644 index 0000000000..7d88e46598 Binary files /dev/null and b/_articles/RJ-2025-045/figures/ASMLplot1-1.png differ diff --git a/_articles/RJ-2025-045/figures/ASMLplot2-1.pdf b/_articles/RJ-2025-045/figures/ASMLplot2-1.pdf new file mode 100644 index 0000000000..7b0fd1d014 Binary files /dev/null and b/_articles/RJ-2025-045/figures/ASMLplot2-1.pdf differ diff --git a/_articles/RJ-2025-045/figures/ASMLplot2-1.png b/_articles/RJ-2025-045/figures/ASMLplot2-1.png new file mode 100644 index 0000000000..a035674f76 Binary files /dev/null and b/_articles/RJ-2025-045/figures/ASMLplot2-1.png differ 
diff --git a/_articles/RJ-2025-045/figures/ASMLplot3-1.pdf b/_articles/RJ-2025-045/figures/ASMLplot3-1.pdf new file mode 100644 index 0000000000..913df6b150 Binary files /dev/null and b/_articles/RJ-2025-045/figures/ASMLplot3-1.pdf differ diff --git a/_articles/RJ-2025-045/figures/ASMLplot3-1.png b/_articles/RJ-2025-045/figures/ASMLplot3-1.png new file mode 100644 index 0000000000..7be2b0a9de Binary files /dev/null and b/_articles/RJ-2025-045/figures/ASMLplot3-1.png differ diff --git a/_articles/RJ-2025-045/figures/ASMLplot4-1.pdf b/_articles/RJ-2025-045/figures/ASMLplot4-1.pdf new file mode 100644 index 0000000000..4d52b4cc1e Binary files /dev/null and b/_articles/RJ-2025-045/figures/ASMLplot4-1.pdf differ diff --git a/_articles/RJ-2025-045/figures/ASMLplot4-1.png b/_articles/RJ-2025-045/figures/ASMLplot4-1.png new file mode 100644 index 0000000000..54a5a47b79 Binary files /dev/null and b/_articles/RJ-2025-045/figures/ASMLplot4-1.png differ diff --git a/_articles/RJ-2025-045/figures/ASP_Rice.png b/_articles/RJ-2025-045/figures/ASP_Rice.png new file mode 100644 index 0000000000..e5bf7e3bed Binary files /dev/null and b/_articles/RJ-2025-045/figures/ASP_Rice.png differ diff --git a/_articles/RJ-2025-045/figures/AS_kerschke_drawio.pdf b/_articles/RJ-2025-045/figures/AS_kerschke_drawio.pdf new file mode 100644 index 0000000000..4d9f661aa4 Binary files /dev/null and b/_articles/RJ-2025-045/figures/AS_kerschke_drawio.pdf differ diff --git a/_articles/RJ-2025-045/figures/AS_kerschke_drawio.png b/_articles/RJ-2025-045/figures/AS_kerschke_drawio.png new file mode 100644 index 0000000000..fc04605b99 Binary files /dev/null and b/_articles/RJ-2025-045/figures/AS_kerschke_drawio.png differ diff --git a/_articles/RJ-2025-045/figures/DALEX1-1.pdf b/_articles/RJ-2025-045/figures/DALEX1-1.pdf new file mode 100644 index 0000000000..10755401e2 Binary files /dev/null and b/_articles/RJ-2025-045/figures/DALEX1-1.pdf differ diff --git a/_articles/RJ-2025-045/figures/DALEX1-1.png 
b/_articles/RJ-2025-045/figures/DALEX1-1.png new file mode 100644 index 0000000000..3f3dfee1bb Binary files /dev/null and b/_articles/RJ-2025-045/figures/DALEX1-1.png differ diff --git a/_articles/RJ-2025-045/figures/partitionPLOT-1.pdf b/_articles/RJ-2025-045/figures/partitionPLOT-1.pdf new file mode 100644 index 0000000000..753af88fe2 Binary files /dev/null and b/_articles/RJ-2025-045/figures/partitionPLOT-1.pdf differ diff --git a/_articles/RJ-2025-045/figures/partitionPLOT-1.png b/_articles/RJ-2025-045/figures/partitionPLOT-1.png new file mode 100644 index 0000000000..59ec7ac4ee Binary files /dev/null and b/_articles/RJ-2025-045/figures/partitionPLOT-1.png differ diff --git a/_articles/RJ-2025-045/figures/rank-1.pdf b/_articles/RJ-2025-045/figures/rank-1.pdf new file mode 100644 index 0000000000..93bb0f9811 Binary files /dev/null and b/_articles/RJ-2025-045/figures/rank-1.pdf differ diff --git a/_articles/RJ-2025-045/figures/rank-1.png b/_articles/RJ-2025-045/figures/rank-1.png new file mode 100644 index 0000000000..56499b3ce3 Binary files /dev/null and b/_articles/RJ-2025-045/figures/rank-1.png differ diff --git a/_articles/RJ-2025-045/figures/splitPLOT-1.pdf b/_articles/RJ-2025-045/figures/splitPLOT-1.pdf new file mode 100644 index 0000000000..23959816c2 Binary files /dev/null and b/_articles/RJ-2025-045/figures/splitPLOT-1.pdf differ diff --git a/_articles/RJ-2025-045/figures/splitPLOT-1.png b/_articles/RJ-2025-045/figures/splitPLOT-1.png new file mode 100644 index 0000000000..f6f1cee29a Binary files /dev/null and b/_articles/RJ-2025-045/figures/splitPLOT-1.png differ diff --git a/_articles/RJ-2025-045/gomez-pateiro-gonzalez-gonzalez.R b/_articles/RJ-2025-045/gomez-pateiro-gonzalez-gonzalez.R new file mode 100644 index 0000000000..e5b895ffba --- /dev/null +++ b/_articles/RJ-2025-045/gomez-pateiro-gonzalez-gonzalez.R @@ -0,0 +1,146 @@ +## ---- Using the ASML package +## ------------------------------------------------------------- +set.seed(1234) 
+library(ASML)
+data(branching)
+features <- branching$x
+KPI <- branching$y
+lab_rules <- c("max", "sum", "dual", "range", "eig-VI", "eig-CMI")
+
+## ---- Data partition
+data <- partition_and_normalize(features, KPI, family_column = 1, split_by_family = TRUE, better_smaller = TRUE)
+names(data)
+
+## ---- Boxplots of instance-normalized KPI for each algorithm across instances in the train set.
+boxplots(data, test = FALSE, by_families = FALSE, labels = lab_rules)
+
+## ---- Ranking of algorithms based on the instance-normalized KPI for the training sample, categorized by family.
+ranking(data, test = FALSE, by_families = TRUE, labels = lab_rules)
+
+## ---- Preprocess features with caret
+preProcValues <- caret::preProcess(data$x.train, method = "YeoJohnson")
+data$x.train <- predict(preProcValues, data$x.train)
+data$x.test <- predict(preProcValues, data$x.test)
+
+## ---- ASML train
+library(quantregForest)
+tune_grid <- expand.grid(mtry = 10)
+training <- AStrain(data, method = "qrf", tuneGrid = tune_grid)
+
+## ---- ASML prediction
+predict_test <- ASpredict(training, newdata = data$x.test)
+
+## ---- ASML summary tables
+KPI_table(data, predictions = predict_test)
+KPI_summary_table(data, predictions = predict_test)
+
+## ---- Boxplots of instance-normalized KPI for each algorithm, including the ML algorithm, across instances in the test set.
+boxplots(data, predictions = predict_test, labels = c(lab_rules, "ML"))
+
+## ---- Boxplots of instance-normalized KPI for each algorithm, including the ML algorithm, across instances in the test set, categorized by family.
+boxplots(data, predictions = predict_test, labels = c(lab_rules, "ML"), by_families = TRUE)
+
+## ---- Ranking of algorithms, including the ML algorithm, based on the instance-normalized KPI for the test sample, categorized by family.
+ranking(data, predictions = predict_test, labels = c("ML", lab_rules), by_families = TRUE)
+
+## ---- Comparison of the best-performing rules
+figure_comparison(data, predictions = predict_test, by_families = FALSE, labels = lab_rules)
+
+## ---- Custom user-defined methods
+qrf_q_predict <- function(modelFit, newdata, what = 0.5, submodels = NULL) {
+  out <- predict(modelFit$finalModel, newdata, what = what)
+  if (is.matrix(out))
+    out <- out[, 1]
+  out
+}
+predict_test_Q1 <- ASpredict(training, newdata = data$x.test, f = "qrf_q_predict", what = 0.25)
+KPI_summary_table(data, predictions = predict_test_Q1)
+
+## ---- Model interpretability with DALEX explainers
+explainers_qrf <- ASexplainer(training, data = data$x.test, y = data$y.test, labels = lab_rules)
+mp_qrf <- lapply(explainers_qrf, DALEX::model_performance)
+do.call(plot, unname(mp_qrf))
+vi_qrf <- lapply(explainers_qrf, DALEX::model_parts)
+do.call(plot, c(unname(vi_qrf), list(max_vars = 5)))
+pdp_qrf <- lapply(explainers_qrf, DALEX::model_profile, variable = "degree", type = "partial")
+do.call(plot, unname(pdp_qrf))
+
+
+## ---- Example on a larger dataset
+## -------------------------------------------------------------
+set.seed(1234)
+data(SpMVformat)
+features <- SpMVformat$x
+KPI <- SpMVformat$y
+data <- partition_and_normalize(features, KPI, better_smaller = FALSE)
+preProcValues <- caret::preProcess(data$x.train, method = "YeoJohnson")
+data$x.train <- predict(preProcValues, data$x.train)
+data$x.test <- predict(preProcValues, data$x.test)
+training <- AStrain(data, method = "nnet", parallel = TRUE)
+pred <- ASpredict(training, newdata = data$x.test)
+ranking(data, predictions = pred)
+
+
+## ---- Using ASML for algorithm selection on ASlib scenarios
+## -------------------------------------------------------------
+set.seed(1234)
+library(tidyverse)
+library(rvest)
+scen <- "CPMP-2015"
+url <- paste0("https://coseal.github.io/aslib-r/scenario-pages/", scen, "/data_files")
+page <- read_html(paste0(url, ".html"))
+file_links <- page %>% html_nodes("a") %>% html_attr("href")
+
+# Create directory for downloaded files
+dir_data <- paste0(scen, "_data")
+dir.create(dir_data, showWarnings = FALSE)
+
+# Download files
+for (link in file_links) {
+  full_link <- ifelse(grepl("^http", link), link, paste0(url, "/", link))
+  file_name <- basename(link)
+  dest_file <- file.path(dir_data, file_name)
+  if (!is.na(full_link)) {
+    download.file(full_link, dest_file, mode = "wb", quiet = TRUE)
+  }
+}
+
+
+## ---- Data preparation with aslib
+library(aslib)
+ASScen <- aslib::parseASScenario(dir_data)
+llamaScen <- aslib::convertToLlama(ASScen)
+folds <- llama::cvFolds(llamaScen)
+
+
+## ---- Key performance indicator
+KPI <- folds$data[, folds$performance]
+features <- folds$data[, folds$features]
+cutoff <- ASScen$desc$algorithm_cutoff_time
+is.timeout <- ASScen$algo.runstatus[, -c(1, 2)] != "ok"
+KPI_pen <- KPI * ifelse(is.timeout, 10, 1)
+nins <- length(getInstanceNames(ASScen))
+ID <- 1:nins
+
+
+## ---- Quantile random forest using ASML on instance-normalized KPI
+data <- partition_and_normalize(x = features, y = KPI, x.test = features, y.test = KPI, better_smaller = TRUE)
+train_control <- caret::trainControl(index = folds$train, savePredictions = "final")
+training <- AStrain(data, method = "qrf", trControl = train_control)
+
+
+## ---- Out-of-fold predictions and evaluation of the selector
+pred_list <- lapply(training, function(model) {
+  model$pred %>% arrange(rowIndex) %>% pull(pred)
+})
+
+pred <- do.call(cbind, pred_list)
+alg_sel <- apply(pred, 1, which.max)
+
+succ <- mean(!is.timeout[cbind(ID, alg_sel)])
+par10 <- mean(KPI_pen[cbind(ID, alg_sel)])
+mcp <- mean(KPI[cbind(ID, alg_sel)] - apply(KPI, 1, min))
+
+results_table <- data.frame(Model = "ASML qrf", succ = format(succ, nsmall = 3, digits = 3), par10 = format(par10, nsmall = 3, digits = 3), mcp = format(mcp, nsmall = 3, digits = 3))
+
+results_table
diff --git a/_articles/RJ-2025-045/gomez-pateiro-gonzalez-gonzalez.bib
b/_articles/RJ-2025-045/gomez-pateiro-gonzalez-gonzalez.bib new file mode 100644 index 0000000000..c08b59f428 --- /dev/null +++ b/_articles/RJ-2025-045/gomez-pateiro-gonzalez-gonzalez.bib @@ -0,0 +1,313 @@ +@article{DALEX, + author = {Przemyslaw Biecek}, + title = {{DALEX}: Explainers for Complex Predictive Models in {R}}, + journal = {Journal of Machine Learning Research}, + year = {2018}, + volume = {19}, + number = {84}, + pages = {1-5}, + url = {http://jmlr.org/papers/v19/18-416.html} +} +@inproceedings{pic18, + author={Pichel, Juan C. and Pateiro-L\'{o}pez, Beatriz}, + booktitle={2018 {IEEE} International Conference on Cluster Computing ({CLUSTER})}, + title={A New Approach for Sparse Matrix Classification Based on Deep Learning Techniques}, + year={2018}, + pages={46-54}, + keywords={Sparse matrices;Training;Computer architecture;Convolution;Kernel;Convolutional neural networks;Sparse matrix, Classification, Deep Learning, CNN, Performance}, + doi={10.1109/CLUSTER.2018.00017} +} +@article{gon24, + title={Learning in Spatial Branching: Limitations of Strong Branching Imitation}, + author={Brais Gonz\'{a}lez-Rodr\'{i}guez and Ignacio G\'{o}mez-Casares and Bissan Ghaddar and Julio Gonz\'{a}lez-D\'{i}az and Beatriz Pateiro-L\'{o}pez}, + journal={INFORMS Journal on Computing}, + year={2025} +} +@article{mei06, + author = {Meinshausen, Nicolai}, + title = {Quantile Regression Forests}, + year = {2006}, + issue_date = {12/1/2006}, + publisher = {JMLR.org}, + volume = {7}, + issn = {1532-4435}, + journal = {Journal of Machine Learning Research}, + month = dec, + pages = {983-999}, + numpages = {17} +} +@manual{Rsnow, + title = {snow: Simple Network of Workstations}, + author = {Luke Tierney and A. J. Rossini and Na Li and H. 
Sevcikova}, + year = {2021}, + note = {R package version 0.4-4}, + url = {https://CRAN.R-project.org/package=snow}, + doi = {10.32614/CRAN.package.snow} +} +@manual{Rasml, + title = {{ASML}: Algorithm Portfolio Selection with Machine Learning}, + author = {Brais Gonz\'{a}lez-Rodr\'{i}guez and Ignacio G\'{o}mez-Casares and Beatriz Pateiro-L\'{o}pez and Julio Gonz\'{a}lez-D\'{i}az}, + year = {2025}, + note = {R package version 1.1.0}, + url = {https://CRAN.R-project.org/package=ASML}, + doi = {10.32614/CRAN.package.ASML} +} +@manual{mei24, + title = {quantregForest: Quantile Regression Forests}, + author = {Nicolai Meinshausen}, + year = {2024}, + note = {R package version 1.3-7.1}, + url = {https://CRAN.R-project.org/package=quantregForest} +} +@article{Rmlr3, + title = {{mlr3}: A modern object-oriented machine learning framework in {R}}, + author = {Michel Lang and Martin Binder and Jakob Richter and Patrick Schratz and Florian Pfisterer and Stefan Coors and Quay Au and Giuseppe Casalicchio and Lars Kotthoff and Bernd Bischl}, + journal = {Journal of Open Source Software}, + year = {2019}, + month = {dec}, + doi = {10.21105/joss.01903}, + url = {https://joss.theoj.org/papers/10.21105/joss.01903} +} +@article{Rmlr, + title = {{mlr}: Machine Learning in {R}}, + author = {Bernd Bischl and Michel Lang and Lars Kotthoff and Julia Schiffner and Jakob Richter and Erich Studerus and Giuseppe Casalicchio and Zachary M. 
Jones}, + journal = {Journal of Machine Learning Research}, + year = {2016}, + volume = {17}, + number = {170}, + pages = {1-5}, + url = {https://jmlr.org/papers/v17/15-066.html} +} +@manual{Raslib, + title = {aslib: Interface to the Algorithm Selection Benchmark Library}, + author = {Bernd Bischl and Lars Kotthoff and Pascal Kerschke and Damir Pulatov}, + year = {2024}, + note = {R package version 0.1.2, commit 2363baf4607971cd2ed1d784d323ecef898b2ea3}, + url = {https://github.com/coseal/aslib-r} +} +@manual{Rllama, + title = {llama: Leveraging Learning to Automatically Manage Algorithms}, + author = {Lars Kotthoff and Bernd Bischl and Barry Hurley and Talal Rahwan and Damir Pulatov}, + year = {2021}, + note = {R package version 0.10.1}, + url = {https://CRAN.R-project.org/package=llama} +} +@article{Rcaret, + title = {Building Predictive Models in {R} Using the caret Package}, + volume = {28}, + url = {https://www.jstatsoft.org/index.php/jss/article/view/v028i05}, + doi = {10.18637/jss.v028.i05}, + number = {5}, + journal = {Journal of Statistical Software}, + author = {{Kuhn} and {Max}}, + year = {2008}, + pages = {1-26} +} +@inproceedings{pul22, + title = {Opening the Black Box: Automated Software Analysis for Algorithm Selection}, + author = {Pulatov, Damir and Anastacio, Marie and Kotthoff, Lars and Hoos, Holger}, + booktitle = {Proceedings of the First International Conference on Automated Machine Learning}, + pages = {6/1--18}, + year = {2022}, + editor = {Guyon, Isabelle and Lindauer, Marius and van der Schaar, Mihaela and Hutter, Frank and Garnett, Roman}, + volume = {188}, + series = {Proceedings of Machine Learning Research}, + month = {25--27 Jul}, + publisher = {PMLR}, + pdf = {https://proceedings.mlr.press/v188/pulatov22a/pulatov22a.pdf}, + url = {https://proceedings.mlr.press/v188/pulatov22a.html} +} +@article{bis16, + title = {{ASlib}: A benchmark library for algorithm selection}, + journal = {Artificial Intelligence}, + volume = {237}, + pages = 
{41-58},
+  year = {2016},
+  issn = {0004-3702},
+  doi = {10.1016/j.artint.2016.04.003},
+  author = {Bernd Bischl and Pascal Kerschke and Lars Kotthoff and Marius Lindauer and Yuri Malitsky and Alexandre Fr\'{e}chette and Holger Hoos and Frank Hutter and Kevin Leyton-Brown and Kevin Tierney and Joaquin Vanschoren}
+}
+@article{ric76,
+  author = {Rice, John R.},
+  journal = {Advances in Computers},
+  pages = {65-118},
+  title = {The Algorithm Selection Problem},
+  volume = {15},
+  year = {1976}
+}
+@article{ker19,
+  author = {Kerschke, Pascal and Hoos, Holger H. and Neumann, Frank and Trautmann, Heike},
+  title = {{Automated Algorithm Selection: Survey and Perspectives}},
+  journal = {Evolutionary Computation},
+  volume = {27},
+  number = {1},
+  pages = {3-45},
+  year = {2019},
+  month = {03},
+  issn = {1063-6560},
+  doi = {10.1162/evco_a_00242},
+  url = {https://doi.org/10.1162/evco_a_00242},
+  eprint = {https://direct.mit.edu/evco/article-pdf/27/1/3/1552398/evco_a_00242.pdf}
+}
+@inproceedings{spe21,
+  author = {Speck, David and Biedenkapp, Andr\'{e} and Hutter, Frank and Mattm{\"u}ller, Robert and Lindauer, Marius},
+  booktitle = {Proceedings of the 31st International Conference on Automated Planning and Scheduling ({ICAPS21})},
+  month = {aug},
+  title = {Learning Heuristic Selection with Dynamic Algorithm Configuration},
+  url = {https://arxiv.org/abs/2006.08246},
+  year = {2021}
+}
+@article{dra20,
+  title = {Recent advances in selection hyper-heuristics},
+  journal = {European Journal of Operational Research},
+  volume = {285},
+  number = {2},
+  pages = {405-428},
+  year = {2020},
+  issn = {0377-2217},
+  doi = {10.1016/j.ejor.2019.07.073},
+  url = {https://www.sciencedirect.com/science/article/pii/S0377221719306526},
+  author = {John H. Drake and Ahmed Kheiri and Ender \"{O}zcan and Edmund K. Burke}
+}
+@article{she92,
+  author = {Hanif D. Sherali and Cihan H.
Tuncbilek}, + doi = {10.1007/bf00121304}, + journal = {Journal of Global Optimization}, + number = {1}, + pages = {101--112}, + publisher = {Springer Science and Business Media {LLC}}, + title = {A global optimization algorithm for polynomial programming problems using a Reformulation-Linearization Technique}, + volume = {2}, + year = {1992} +} +@inbook{kot16, + author={Kotthoff, Lars}, + editor={Bessiere, Christian and De Raedt, Luc and Kotthoff, Lars and Nijssen, Siegfried and O'Sullivan, Barry and Pedreschi, Dino}, + title={Algorithm Selection for Combinatorial Search Problems: A Survey}, + booktitle={Data Mining and Constraint Programming: Foundations of a Cross-Disciplinary Approach}, + year= {2016}, + publisher={Springer International Publishing}, + address={Cham}, + pages={149--190}, + isbn={978-3-319-50137-6} +} +@article{gha23, + author = {Ghaddar, Bissan and G\'{o}mez-Casares, Ignacio and Gonz\'{a}lez-D\'{i}az, Julio and Gonz\'{a}lez-Rodr\'{i}guez, Brais and Pateiro-L\'{o}pez, Beatriz and Rodr\'{i}guez-Ballesteros, Sof\'{i}a}, + title = {Learning for Spatial Branching: An Algorithm Selection Approach}, + journal = {INFORMS Journal on Computing}, + volume = {35}, + number = {5}, + pages = {1024-1043}, + year = {2023}, + doi = {10.1287/ijoc.2022.0090}, + url = {https://doi.org/10.1287/ijoc.2022.0090}, + eprint = {https://doi.org/10.1287/ijoc.2022.0090} +} +@article{lod17, + author={Andrea Lodi and Giulia Zarpellon}, + title={{On learning and branching: a survey}}, + journal={TOP: An Official Journal of the Spanish Society of Statistics and Operations Research}, + year={2017}, + volume={25}, + number={2}, + pages={207-236}, + month={July}, + keywords={Branch and bound; Machine learning}, + doi={10.1007/s11750-017-0451-6}, + url={https://ideas.repec.org/a/spr/topjnl/v25y2017i2d10.1007_s11750-017-0451-6.html} +} +@article{ben21, + title = {Machine learning for combinatorial optimization: A methodological tour d'horizon}, + journal = {European Journal of 
Operational Research},
+  volume = {290},
+  number = {2},
+  pages = {405-421},
+  year = {2021},
+  issn = {0377-2217},
+  doi = {10.1016/j.ejor.2020.07.063},
+  url = {https://www.sciencedirect.com/science/article/pii/S0377221720306895},
+  author = {Yoshua Bengio and Andrea Lodi and Antoine Prouvost}
+}
+@article{bus03,
+  author = {Michael R. Bussieck and Arne Stolbjerg Drud and Alexander Meeraus},
+  doi = {10.1287/ijoc.15.1.114.15159},
+  issn = {1091-9856},
+  journal = {INFORMS Journal on Computing},
+  pages = {114--119},
+  title = {{MINLPLib}-A Collection of Test Models for Mixed-Integer Nonlinear Programming},
+  volume = {15},
+  year = {2003}
+}
+@article{fur18,
+  author = {Fabio Furini and Emiliano Traversi and Pietro Belotti and Antonio Frangioni and Ambros Gleixner and Nick Gould and Leo Liberti and Andrea Lodi and Ruth Misener and Hans Mittelmann and Nikolaos Sahinidis and Stefan Vigerske and Angelika Wiegele},
+  doi = {10.1007/s12532-018-0147-4},
+  issn = {1867-2957},
+  journal = {Mathematical Programming Computation},
+  pages = {237--265},
+  title = {{QPLIB}: a library of quadratic programming instances},
+  volume = {1},
+  year = {2018}
+}
+@article{dal16,
+  author = {Evrim Dalkiran and Hanif D.
Sherali},
+  doi = {10.1007/s12532-016-0099-5},
+  issn = {1867-2949},
+  journal = {Mathematical Programming Computation},
+  pages = {337--375},
+  title = {{RLT}-{POS}: Reformulation-Linearization Technique-based optimization software for solving polynomial programming problems},
+  volume = {8},
+  year = {2016}
+}
+@inbook{van19,
+  author = {Vanschoren, Joaquin},
+  editor = {Hutter, Frank and Kotthoff, Lars and Vanschoren, Joaquin},
+  title = {Meta-Learning},
+  booktitle = {Automated Machine Learning: Methods, Systems, Challenges},
+  year = {2019},
+  publisher = {Springer International Publishing},
+  pages = {35--61},
+  isbn = {978-3-030-05318-5}
+}
+@article{mes14,
+  title = {An automatic algorithm selection approach for the multi-mode resource-constrained project scheduling problem},
+  journal = {European Journal of Operational Research},
+  volume = {233},
+  number = {3},
+  pages = {511-528},
+  year = {2014},
+  issn = {0377-2217},
+  doi = {10.1016/j.ejor.2013.08.021},
+  url = {https://www.sciencedirect.com/science/article/pii/S0377221713006863},
+  author = {Tommy Messelis and Patrick {De Causmaecker}}
+}
+@book{plotly,
+  author = {Carson Sievert},
+  title = {{Interactive Web-Based Data Visualization} with {R}, plotly, and shiny},
+  publisher = {Chapman and Hall/CRC},
+  year = {2020},
+  isbn = {9781138331457},
+  url = {https://plotly-r.com}
+}
+@manual{crosstalk,
+  title = {{crosstalk}: Inter-Widget Interactivity for HTML Widgets},
+  author = {Joe Cheng and Carson Sievert},
+  year = {2021},
+  note = {R package version 1.1.1},
+  url = {https://CRAN.R-project.org/package=crosstalk}
+}
+@article{RJ-2021-050,
+  author = {Earo Wang and Dianne Cook},
+  title = {Conversations in time: interactive visualisation to explore structured temporal data},
+  year = {2021},
+  journal = {The R Journal},
+  doi = {10.32614/RJ-2021-050},
+  url = {https://journal.r-project.org/archive/2021/RJ-2021-050/index.html}
+}
+@manual{palmerpenguins,
+  title = {{palmerpenguins}: Palmer
Archipelago (Antarctica) penguin data},
+  author = {Allison Marie Horst and Alison Presmanes Hill and Kristen B Gorman},
+  year = {2020},
+  note = {R package version 0.1.0},
+  url = {https://allisonhorst.github.io/palmerpenguins/}
+}
diff --git a/_articles/RJ-2025-045/gomez-pateiro-gonzalez-gonzalez.tex b/_articles/RJ-2025-045/gomez-pateiro-gonzalez-gonzalez.tex
new file mode 100644
index 0000000000..889b2f9fc6
--- /dev/null
+++ b/_articles/RJ-2025-045/gomez-pateiro-gonzalez-gonzalez.tex
@@ -0,0 +1,687 @@
+% !TeX root = RJwrapper.tex
+\title{ASML: An R Package for Algorithm Selection with Machine Learning}


+\author{by Ignacio Gómez-Casares, Beatriz Pateiro-López, Brais González-Rodríguez, and Julio González-Díaz}

+\maketitle

+\abstract{%
+For extensively studied computational problems, it is commonly acknowledged that different instances may require different algorithms for optimal performance. The R package ASML focuses on the task of efficiently selecting, from a given portfolio of algorithms, the most suitable one for each specific problem instance based on informative instance features. The package allows for the use of the machine learning tools available in the R package caret and additionally offers visualization tools and summaries of results that make it easier to interpret how algorithm selection techniques perform, helping users better understand and assess their behavior and performance improvements.
+}

+\section{Introduction}\label{introduction}

+Selecting from a set of algorithms the most appropriate one for solving a given problem instance (understood as an individual problem case with its own specific characteristics) is a common issue that arises in many different situations, such as in combinatorial search problems \citep{kot16,dra20}, planning and scheduling problems \citep{spe21,mes14}, or in machine learning (ML), where the multitude of available techniques often makes it challenging to determine the best approach for a particular dataset \citep{van19}. For an extensive survey on automated algorithm selection and application areas, we refer to \cite{ker19}.

+Figure \ref{fig:ASkerschke} presents a general scheme, adapted from Figure 1 in \cite{ker19}, illustrating the use of ML for algorithm selection. A set of problem instances is given, each described by associated features, together with a portfolio of algorithms that have been evaluated on all instances. The instance features and performance results are then fed into an ML framework, which is trained to produce a selector capable of predicting the best-performing algorithm for an unseen instance. Note that we are restricting attention to \emph{offline} algorithm selection, in which the selector is constructed using a training set of instances and then applied to new problem instances.

+\begin{figure}[ht]

+{\centering \includegraphics[width=1\linewidth]{figures/AS_kerschke_drawio}

+}

+\caption{Schematic overview of the interplay between problem instance features (top left), algorithm performance data (bottom left), selector construction (center), and the assessment of selector performance (bottom right). Adapted from Kerschke et al. (2019).}\label{fig:ASkerschke}
+\end{figure}

+Algorithm selection tools also show significant potential in the field of optimization, improving performance on problems for which multiple solving strategies are available.
For example, a key factor in the efficiency of state-of-the-art global solvers in mixed integer linear programming, and also in nonlinear optimization, is the design of branch-and-bound algorithms and, in particular, of their branching rules. No single branching rule outperforms all others on every problem instance. Instead, different branching rules exhibit optimal performance on different types of problem instances. Developing methods for the automatic selection of branching rules based on instance features has proven to be an effective strategy towards solving optimization problems more efficiently \citep{lod17, ben21, gha23}.

+In algorithm selection, not only do the problem domain and the algorithms for addressing its instances play a crucial role, but so do the metrics used to assess algorithm effectiveness ---referred to in this work as Key Performance Indicators (KPIs). KPIs are used in different fields to assess and measure the performance of specific objectives or goals. In a business context, these indicators are quantifiable metrics that provide valuable insights into how well an individual, team, or entire organization is progressing towards achieving its defined targets. In the context of algorithms, KPIs serve as quantifiable measures used to evaluate the effectiveness and efficiency of algorithmic processes. For instance, in the realm of computer science and data analysis, KPIs can include measures like execution time, accuracy, and scalability. Monitoring these KPIs allows for a comprehensive assessment of algorithmic performance, aiding in the selection of the most appropriate algorithm for a given instance and facilitating continuous improvement in algorithmic design and implementation.

+Additionally, in many applications, normalizing the KPI to a standardized range like \([0, 1]\) provides a more meaningful basis for comparison.
The KPI obtained through this process, which we will refer to as the instance-normalized KPI, reflects the performance of each algorithm relative to the best-performing one for each specific instance. For example, if we are comparing multiple algorithms by execution time, which can vary widely across instances, normalizing each instance's execution times relative to the fastest algorithm on that instance allows for a fairer evaluation. This is particularly important when raw execution times might not directly reflect the relative performance of the algorithms due to wide variations in the scale of the measurements. Thus, normalizing puts all algorithms on an equal footing, allowing a clearer assessment of their relative efficiency.

+Following the general framework illustrated in Figure \ref{fig:ASkerschke}, the R package \CRANpkg{ASML} \citep{Rasml} provides a wrapper for ML methods to select from a portfolio of algorithms based on the value of a given KPI. It uses a set of features in a training set to learn a regression model for the instance-normalized KPI value of each algorithm. Then, the instance-normalized KPI is predicted for unseen test instances, and the algorithm with the best predicted value is chosen. As learning techniques for algorithm selection, the user can invoke any regression method from the \CRANpkg{caret} package \citep{Rcaret} or supply a custom user-defined function. This makes our package flexible, as it automatically supports new methods when they are added to \CRANpkg{caret}. Although initially designed for selecting branching rules in nonlinear optimization problems, its versatility allows the package to effectively address algorithm selection challenges across a wide range of domains.
It can be applied in any discipline where there is a diverse set of instances within a specific problem domain, a suite of algorithms whose behavior varies across instances, clearly defined metrics for evaluating the performance of the available algorithms, and known features of the instances that can be computed and are ideally correlated with algorithm performance. The visualization tools implemented in the package allow for an effective evaluation of the performance of the algorithm selection techniques. A key distinguishing element of \CRANpkg{ASML} is its learning-phase approach, which uses instance-normalized KPI values and trains a separate regression model for each algorithm to predict its normalized KPI on unseen instances.

\section{Background}\label{background}

The algorithm selection problem was first outlined in the seminal work by \cite{ric76}. In simple terms, for a given set of problem instances (problem space) and a set of algorithms (algorithm space), the goal is to determine a selection model that maps each problem instance to the most suitable algorithm for it. By \emph{most suitable}, we mean the best according to a specific metric that associates each combination of instance and algorithm with its respective performance. Formally, let \(\mathcal{P}\) denote the problem space or set of problem instances. The algorithm space or set of algorithms is denoted by \(\mathcal{A}\). The metric \(p:\mathcal{P}\times\mathcal{A}\rightarrow \mathbb{R}^n\) measures the performance \(p(x,A)\) of any algorithm \(A\in \mathcal{A}\) on instance \(x\in\mathcal{P}\). The goal is to construct a selector \(S:\mathcal{P}\rightarrow \mathcal{A}\) that maps any problem instance \(x\in \mathcal{P}\) to an algorithm \(S(x)=A\in \mathcal{A}\) in such a way that its performance is optimal.
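The selection scheme just described can be sketched in a few lines of base R. This is a toy illustration, not \CRANpkg{ASML} code: the features, the KPI values, and the helper \texttt{select\_algorithm} are all made up for the sketch, and \texttt{lm} stands in for an arbitrary regression method.

```r
# Toy sketch of the selector S: one regression model per algorithm,
# trained on instance features to predict a KPI (smaller = better),
# then S(x) = the algorithm with the best predicted value.
# All names here are illustrative, not part of the ASML API.
set.seed(1)
n <- 50                                  # training instances
feats <- data.frame(f1 = runif(n), f2 = runif(n))
kpi <- data.frame(A1 = feats$f1 + rnorm(n, sd = 0.1),  # KPI of algorithm A1
                  A2 = feats$f2 + rnorm(n, sd = 0.1))  # KPI of algorithm A2

# One linear model per algorithm (lm is a placeholder for any learner)
models <- lapply(kpi, function(y) lm(y ~ f1 + f2, data = cbind(feats, y = y)))

# The selector: predict each algorithm's KPI and pick the smallest
select_algorithm <- function(x_new) {
  preds <- vapply(models, predict, numeric(1), newdata = x_new)
  names(which.min(preds))
}

select_algorithm(data.frame(f1 = 0.9, f2 = 0.1))  # should select "A2"
```

In the feature-based variant discussed next, the selector sees only \(f(x)\), exactly as in this sketch, rather than the instance itself.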
As discussed in the Introduction, many algorithm selection methods in the literature use ML tools to model the relationship between problem instances and algorithm performance, using features derived from these instances. The pivotal step in this process is defining appropriate features that can be readily computed and are likely to impact algorithm performance. That is, given \(x\in\mathcal{P}\), we make use of informative features \(f(x) = (f_1(x),\ldots,f_k(x))\in \mathbb{R}^k\). In this framework, the selector \(S\) maps the simpler feature space \(\mathbb{R}^k\) into the algorithm space \(\mathcal{A}\). A scheme of the algorithm selection problem, as described in \cite{ric76}, is shown in Figure \ref{fig:rice}.

\begin{figure}[ht]

{\centering \includegraphics[width=0.8\linewidth]{figures/ASP_Rice}

}

\caption{Scheme of the algorithm selection problem by Rice (1976).}\label{fig:rice}
\end{figure}

For the practical derivation of the selection model \(S\), we use training data consisting of features \(f(x)\) and performances \(p(x,A)\), where \(x \in \mathcal{P}^\prime \subset \mathcal{P}\) and \(A \in \mathcal{A}\). The task is to learn the selector \(S\) from the training data. The model allows us to forecast the performance on unseen problem instances based on their features and subsequently select the algorithm with the highest predicted performance. A comprehensive discussion of various aspects of algorithm selection techniques can be found in \cite{kot16} and \cite{pul22}.

\section{Algorithm selection tools in R}\label{algorithm-selection-tools-in-r}

The task of algorithm selection has seen significant advancements in recent years, with R packages facilitating this process.
Here we present some of the existing tools, which offer a range of functionalities, including flexible model-building frameworks, automated workflows, and standardized scenario formats, providing valuable resources for both researchers and end-users in algorithm selection.

The \CRANpkg{llama} package \citep{Rllama} provides a flexible implementation within R for evaluating algorithm portfolios. It simplifies the task of building predictive models to solve algorithm selection scenarios, allowing users to apply ML models effectively. In \CRANpkg{llama}, ML algorithms are defined using the \CRANpkg{mlr} package \citep{Rmlr}, offering a structured approach to model selection. The Algorithm Selection Library (ASlib) \citep{bis16}, in turn, proposes a standardized format for representing algorithm selection scenarios and introduces a repository that hosts an expanding collection of datasets from the literature. It serves as a benchmark for evaluating algorithm selection techniques under consistent conditions and is accessible to R users through the \CRANpkg{aslib} package, which simplifies the process for those working within the R environment. Furthermore, \CRANpkg{aslib} interfaces with the \CRANpkg{llama} package, facilitating the analysis of algorithm selection techniques within the benchmark scenarios it provides.

Our \CRANpkg{ASML} package offers an approach to algorithm selection based on the powerful and flexible \CRANpkg{caret} framework. By using \CRANpkg{caret}'s ability to work with many different ML models, along with its model tuning and validation tools, \CRANpkg{ASML} makes the selection process easy and effective, especially for users already familiar with \CRANpkg{caret}.
Thus, while \CRANpkg{ASML} shares some conceptual similarities with \CRANpkg{llama}, it distinguishes itself through its interface to the ML models in \CRANpkg{caret} instead of \CRANpkg{mlr}. The latter is currently considered retired by the mlr-org team, which can lead to compatibility issues with certain learners, and has been succeeded by the next-generation \CRANpkg{mlr3} \citep{Rmlr3}. In addition, \CRANpkg{ASML} automates the normalization of KPIs based on the best-performing algorithm for each instance, addressing the challenges that arise when performance metrics vary significantly across instances. \CRANpkg{ASML} further provides new visualization tools that are useful for understanding the results of the learning process. A comparative overview of the main features and differences between these packages is given in Table \ref{tab:ASMLvsllama}.

\begin{table}[!h]
\centering
\caption{\label{tab:ASMLvsllama}Comparative overview of ASML and llama for algorithm selection.}
\centering
\fontsize{9}{11}\selectfont
\begin{tabular}[t]{>{\raggedright\arraybackslash}m{3cm}>{\centering\arraybackslash}m{4.5cm}>{\centering\arraybackslash}m{4.5cm}}
\toprule
\textbf{Aspect} & \textbf{ASML} & \textbf{llama}\\
\midrule
\cellcolor{gray!10}{Input data} & \cellcolor{gray!10}{features\hspace{4cm} KPIs\hspace{4cm}split by families supported} & \cellcolor{gray!10}{features\hspace{4cm} KPIs\hspace{4cm}feature costs supported}\\
Normalized KPIs & \ding{51} & \ding{55}\\
\cellcolor{gray!10}{ML backend} & \cellcolor{gray!10}{caret} & \cellcolor{gray!10}{mlr}\\
Hyperparameter tuning & ASML::AStrain()\hspace{4cm}supports arguments passed to caret (trainControl(), tuneGrid) & llama::cvFolds\hspace{4cm}llama::tuneModel\\
\cellcolor{gray!10}{Parallelization} & \cellcolor{gray!10}{\ding{51}\hspace{4cm} with snow} & \cellcolor{gray!10}{\ding{51}\hspace{4cm} with parallelMap}\\
\addlinespace
Results summary & Per algorithm\hspace{4cm}Best overall and per
instance\hspace{4cm}ML-selected & Virtual best and single best per instance\hspace{4cm}Aggregated scores (PAR, count, successes)\\
\cellcolor{gray!10}{Visualization} & \cellcolor{gray!10}{Boxplots (per algorithm and ML-selected)\hspace{4cm}Ranking plots\hspace{4cm}Barplots (best vs ML-selected)} & \cellcolor{gray!10}{Scatter plots comparing two algorithm selectors}\\
Model interpretability tools & \ding{51}\hspace{4cm} with DALEX & \ding{55}\\
\cellcolor{gray!10}{ASlib integration} & \cellcolor{gray!10}{basic support} & \cellcolor{gray!10}{extended support}\\
Latest release & CRAN 1.1.0 (2025) & CRAN 0.10.1 (2021)\\
\bottomrule
\end{tabular}
\end{table}

There are also automated approaches that streamline the process of selecting and optimizing ML models within the R environment. Tools like \CRANpkg{h2o} provide robust functionalities specifically designed for R users, facilitating an end-to-end ML workflow. These frameworks automate various tasks, including algorithm selection, hyperparameter optimization, and feature engineering, thereby simplifying the process for users of all skill levels. By integrating these automated solutions into R, users can efficiently explore a wide range of models and tuning options without needing extensive domain knowledge or manual intervention. This automation not only accelerates the model development process but also improves the overall performance of ML projects by allowing a systematic evaluation of different approaches and configurations. However, while \CRANpkg{h2o} excels at automating the selection of ML models and hyperparameter tuning, it does not perform algorithm selection based on instance-specific features, which is the primary focus of our approach. Instead, it evaluates multiple algorithms in parallel and selects the best-performing one based on predetermined metrics.
\section{Using the ASML package}\label{using-the-asml-package}

Here, we illustrate the usage of the \CRANpkg{ASML} package with an example within the context of algorithm selection for spatial branching in polynomial optimization, aligning with the problem discussed in \cite{gha23} and further explored in \cite{gon24}. Table \ref{tab:optsum} provides an overview of the problem and a summary of the components that we will discuss in detail below.

\begin{table}[!h]
\centering
\caption{\label{tab:optsum}Summary of the branching rule selection problem.}
\centering
\fontsize{9}{11}\selectfont
\begin{tabular}[t]{l|l}
\hline
Algorithms & \ttfamily{max, sum, dual, range, eig-VI, eig-CMI}\\
\hline
KPI & \ttfamily{pace}\\
\hline
Number of instances & 407\\
\hline
Number of instances per library & 180 (DS), 164 (MINLPLib), 63 (QPLIB)\\
\hline
Number of features & 33\\
\hline
\end{tabular}
\end{table}

A well-known approach for finding global optima in polynomial optimization problems is based on the Reformulation-Linearization Technique (RLT) \citep{she92}. Without delving into intricate details, RLT operates by creating a linear relaxation of the original polynomial problem, which is then integrated into a branch-and-bound framework. The branching process involves assigning a score to each variable, based on the violations of the RLT identities it participates in, after solving the corresponding relaxation at each node. Subsequently, the variable with the highest score is selected for branching. The computation of these scores is a critical aspect and allows for various approaches, leading to distinct branching rules that constitute our algorithm selection portfolio. Specifically, in our example, we will examine six distinct branching rules (referred to interchangeably as branching rules or algorithms), labeled as the \texttt{max}, \texttt{sum}, \texttt{dual}, \texttt{range}, \texttt{eig-VI}, and \texttt{eig-CMI} rules.
For the definitions and a comprehensive understanding of the rationale behind these rules, refer to \cite{gha23}.

Measuring the performance of different algorithms in the context of optimization is crucial for evaluating their effectiveness and efficiency. Two common metrics for this evaluation are running time and optimality gap, measured as a function of the lower and upper bounds for the objective function value at the end of the algorithm (a small optimality gap indicates that the algorithm is producing solutions close to the optimal one). Both metrics are important and are often considered together to evaluate algorithm performance. For instance, it is meaningful to consider the time required to reduce the optimality gap by one unit as the KPI. In our example, and to ensure the KPI is well-defined, we use a slightly different metric, which we refer to as pace, defined as the time required to increase the lower bound by one unit. For the pace, a smaller value is preferred, as it indicates better performance.

As depicted in Figure \ref{fig:rice}, a crucial aspect of the methodology involves selecting input variables (features) that facilitate the prediction of the KPI for each branching rule. We consider 33 features representing global information about the polynomial optimization problems, such as relevant characteristics of variables, constraints, monomials, coefficients, or other attributes. A detailed description of the considered features can be found in Table \ref{tab:featsum}. Although we will not delve into these aspects, determining appropriate features is often complex, and using feature-selection methods can be beneficial for choosing the most relevant ones.
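As a minimal illustration of such feature screening (our own sketch, not part of the workflow described in this paper), one could drop nearly redundant features before training. The data frame \texttt{X} below is simulated and merely stands in for a feature matrix; we assume the \CRANpkg{caret} package is installed.

```r
# Illustrative feature screening with caret::findCorrelation.
# X is a simulated feature matrix (a stand-in for real instance features).
library(caret)
set.seed(1)
X <- data.frame(f1 = rnorm(100))
X$f2 <- X$f1 + rnorm(100, sd = 0.01)  # nearly duplicates f1
X$f3 <- rnorm(100)                    # independent feature

# Column indices whose pairwise correlation exceeds the cutoff
drop <- findCorrelation(cor(X), cutoff = 0.95)
X_reduced <- X[, -drop, drop = FALSE]
ncol(X_reduced)  # one of the correlated pair is removed, leaving 2 features
```

More elaborate options, such as recursive feature elimination, are also available through \CRANpkg{caret}.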
\begin{table}
\centering
\caption{\label{tab:featsum}Features from the branching dataset.}
\centering
\begin{tabular}[t]{l|l}
\hline
\textbf{Index} & \textbf{Description}\\
\hline
\ttfamily{3} & Number of variables\\
\hline
\ttfamily{4} & Number of constraints\\
\hline
\ttfamily{5} & Degree\\
\hline
\ttfamily{6} & Number of monomials\\
\hline
\ttfamily{7} & Density\\
\hline
\ttfamily{8} & Density of VIG\\
\hline
\ttfamily{9} & Modularity of VIG\\
\hline
\ttfamily{10} & Treewidth of VIG\\
\hline
\ttfamily{11} & Density of CMIG\\
\hline
\ttfamily{12} & Modularity of CMIG\\
\hline
\ttfamily{13} & Treewidth of CMIG\\
\hline
\ttfamily{14} & Pct. of variables not present in any monomial with degree greater than one\\
\hline
\ttfamily{15} & Pct. of variables not present in any monomial with degree greater than two\\
\hline
\ttfamily{16} & Number of variables divided by number of constraints\\
\hline
\ttfamily{17} & Number of variables divided by degree\\
\hline
\ttfamily{18} & Pct. of equality constraints\\
\hline
\ttfamily{19} & Pct. of linear constraints\\
\hline
\ttfamily{20} & Pct. of quadratic constraints\\
\hline
\ttfamily{21} & Number of monomials divided by number of constraints\\
\hline
\ttfamily{22} & Number of RLT variables divided by number of constraints\\
\hline
\ttfamily{23} & Pct. of linear monomials\\
\hline
\ttfamily{24} & Pct. of quadratic monomials\\
\hline
\ttfamily{25} & Pct. of linear RLT variables\\
\hline
\ttfamily{26} & Pct. of quadratic RLT variables\\
\hline
\ttfamily{27} & Variance of the ranges of the variables\\
\hline
\ttfamily{28} & Variance of the coefficients\\
\hline
\ttfamily{29} & Variance of the density of the variables\\
\hline
\ttfamily{30} & Variance of the no. of appearances of each variable\\
\hline
\ttfamily{31} & Average of the ranges of the variables\\
\hline
\ttfamily{32} & Average of the coefficients\\
\hline
\ttfamily{33} & Average pct.
of monomials in each constraint and in the objective function\\
\hline
\ttfamily{34} & Average of the no. of appearances of each variable\\
\hline
\ttfamily{35} & Median of the ranges of the variables\\
\hline
\multicolumn{2}{l}{\rule{0pt}{1em}\textit{Note: }}\\
\multicolumn{2}{l}{\rule{0pt}{1em}Index refers to columns of branching\$x.}\\
\end{tabular}
\end{table}

To assess the performance of the algorithm selection methods in this context, we have a diverse set of 407 instances of different optimization problems, taken from three well-known benchmarks \citep{bus03, dal16, fur18}, corresponding respectively to the MINLPLib, DS, and QPLIB libraries. Details are given in Table \ref{tab:optsum}. The data for this analysis are contained in the \texttt{branching} dataset included in the package. We begin by defining two data frames. The \texttt{features} data frame includes two initial columns that provide the instance names and the corresponding family (library in our example) for each instance. The remaining columns consist of the features listed in Table \ref{tab:featsum}.

We also define the \texttt{KPI} data frame, which is derived from \texttt{branching\$y}. This data frame contains the pace values for each of the six branching rules considered in this study (specified by the labels in the \texttt{lab\_rules} vector). These data frames will serve as the input for our subsequent analyses.

\begin{verbatim}
set.seed(1234)
library(ASML)
data(branching)
features <- branching$x
KPI <- branching$y
lab_rules <- c("max", "sum", "dual", "range", "eig-VI", "eig-CMI")
\end{verbatim}

\subsection{Pre-processing the data}\label{pre-processing-the-data}

As with any analysis, the first step involves preprocessing the data. This includes using the function \texttt{partition\_and\_normalize}, which not only divides the dataset into training and test sets but also normalizes the KPI relative to the best result for each instance.
The argument \texttt{better\_smaller} specifies whether a lower KPI value is preferred (as in our case, where the KPI represents pace and smaller values indicate better performance) or whether a higher value is desired.

\begin{verbatim}
data <- partition_and_normalize(features, KPI, family_column = 1, split_by_family = TRUE,
                                better_smaller = TRUE)
names(data)
\end{verbatim}

\begin{verbatim}
#> [1] "x.train" "y.train" "y.train.original" "x.test"
#> [5] "y.test" "y.test.original" "families.train" "families.test"
#> [9] "better_smaller"
\end{verbatim}

When using the function \texttt{partition\_and\_normalize}, the resulting object is of class \texttt{as\_data} and contains several key components essential for our study. Specifically, the object includes \texttt{x.train} and \texttt{x.test}, representing the feature sets for the training and test datasets, respectively. Additionally, it contains \texttt{y.train} and \texttt{y.test}, with the instance-normalized KPI corresponding to each dataset, along with their original counterparts, \texttt{y.train.original} and \texttt{y.test.original}. This structure allows us to retain the original KPI values while working with the instance-normalized data. Furthermore, when the parameter \texttt{split\_by\_family} is set to \texttt{TRUE}, as in the example, the object also includes \texttt{families.train} and \texttt{families.test}, indicating the family affiliation of each observation within the training and test sets. Figure \ref{fig:partitionPLOT} illustrates how the split preserves the proportions of instances for each library.
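The instance-wise normalization can be illustrated with a small base-R sketch. This is our own toy reconstruction of the idea (a KPI for which smaller is better, rescaled so that the best algorithm on each instance gets the value 1); the exact formula used by the package may differ, and \texttt{normalize\_instance} is a hypothetical helper, not part of the API.

```r
# Toy illustration of instance-normalized KPIs (not the package's code).
# With better_smaller = TRUE, a natural choice is best/value per instance,
# so the best algorithm on each instance gets 1 and worse ones get < 1.
kpi <- data.frame(A1 = c(10, 2), A2 = c(5, 8))  # rows = instances

normalize_instance <- function(y) {
  y <- as.matrix(y)
  best <- apply(y, 1, min)      # best (smallest) KPI on each instance
  sweep(1 / y, 1, best, `*`)    # best / value, row by row
}

normalize_instance(kpi)
# instance 1: A2 is best (value 1), A1 gets 0.5
# instance 2: A1 is best (value 1), A2 gets 0.25
```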
\begin{figure}[ht]

{\centering \includegraphics[width=0.7\linewidth,alt={Train/Test partition preserving the percentage of instances for each library.}]{figures/partitionPLOT-1}

}

\caption{Train/Test partition preserving the percentage of instances for each library.}\label{fig:partitionPLOT}
\end{figure}

As a tool for visualizing the performance of the considered algorithms, the \texttt{boxplots} function operates on objects of class \texttt{as\_data} and generates boxplots of the instance-normalized KPI. This visualization facilitates the comparison of performance differences across instances. The function can be applied to both training and test observations and can also group the results by family. Additionally, it accepts common optional arguments typically used in R plotting functions. Figure \ref{fig:splitPLOT} shows the instance-normalized KPI of the instances in the train set. What becomes evident from the boxplots is that no branching rule outperforms the others across all instances, and a wrong choice of rule on certain problems can lead to very poor performance.

\begin{verbatim}
boxplots(data, test = FALSE, by_families = FALSE, labels = lab_rules)
\end{verbatim}

\begin{figure}[ht]

{\centering \includegraphics[width=0.7\linewidth,alt={Boxplots of instance-normalized KPI for each algorithm across instances in the train set.}]{figures/splitPLOT-1}

}

\caption{Boxplots of instance-normalized KPI for each algorithm across instances in the train set.}\label{fig:splitPLOT}
\end{figure}

The \texttt{ranking} function, specifically designed for the \CRANpkg{ASML} package, is also valuable for visualizing the differing behaviors of the algorithms under investigation, depending on the analyzed instances. After ranking the algorithms for each instance based on the instance-normalized KPI, the function generates a bar chart for each algorithm, indicating the percentage of times it occupies each ranking position.
The numbers displayed within the bars represent the mean value of the instance-normalized KPI for the problems associated with that specific ranking position. Again, the representation can be produced both for the training and test sets, as well as by family. In Figure \ref{fig:rank}, we present the chart corresponding to the training sample, categorized by family. In particular, it can be observed that certain rules, when not the best choice for a given instance, can perform quite poorly in terms of instance-normalized KPI (see, for example, the results on the MINLPLib library). This highlights the importance of not only selecting the best algorithm for each instance but also ensuring that the chosen algorithm does not perform too poorly when it is not optimal. In some cases, even if an algorithm is not the best-performing option, it may still provide reasonably good results, whereas a wrong choice can result in significantly worse outcomes.

\begin{verbatim}
ranking(data, test = FALSE, by_families = TRUE, labels = lab_rules)
\end{verbatim}

\begin{figure}[ht]

{\centering \includegraphics[width=0.7\linewidth,alt={Ranking of algorithms based on the instance-normalized KPI for the training sample, categorized by family. The bars represent the percentage of times each algorithm appeared in different ranking positions, with the numbers indicating the mean value of the KPI.}]{figures/rank-1}

}

\caption{Ranking of algorithms based on the instance-normalized KPI for the training sample, categorized by family. The bars represent the percentage of times each algorithm appeared in different ranking positions, with the numbers indicating the mean value of the KPI.}\label{fig:rank}
\end{figure}

Additionally, functions from the \CRANpkg{caret} package can be applied if further operations on the predictors are needed.
Here we show an example where the Yeo-Johnson transformation is estimated on the training set and subsequently applied to both the training and test sets to ensure consistency across the two datasets. The flexibility of \CRANpkg{caret} also allows for the inclusion of advanced techniques, such as feature selection and dimensionality reduction, to improve the quality of the algorithm selection process.

\begin{verbatim}
preProcValues <- caret::preProcess(data$x.train, method = "YeoJohnson")
data$x.train <- predict(preProcValues, data$x.train)
data$x.test <- predict(preProcValues, data$x.test)
\end{verbatim}

\subsection{Training models and predicting the performance of the algorithms}\label{training-models-and-predicting-the-performance-of-the-algorithms}

The approach to algorithm selection in \CRANpkg{ASML} is based on building regression models that predict the instance-normalized KPI of each considered algorithm. To this end, users can take advantage of the wide range of ML models available in the \CRANpkg{caret} package, which provides a unified interface for training and tuning various types of models. Models available through \CRANpkg{caret} can be seamlessly integrated into the \CRANpkg{ASML} workflow using the \texttt{AStrain} function from \CRANpkg{ASML}, as shown in the next example. For illustrative purposes, we use quantile random forests \citep{mei06} to model the behavior of the instance-normalized KPI as a function of the features. This is done with the \texttt{qrf} method in the \CRANpkg{caret} package, which relies on the \CRANpkg{quantregForest} package \citep{mei24}.

\begin{verbatim}
library(quantregForest)
tune_grid <- expand.grid(mtry = 10)
training <- AStrain(data, method = "qrf", tuneGrid = tune_grid)
\end{verbatim}

Additional arguments for \texttt{caret::train} can also be passed directly to \texttt{ASML::AStrain}.
This allows users to take advantage of the flexibility of the \CRANpkg{caret} package, including specifying control methods (such as cross-validation), tuning parameters, or any other relevant settings provided by \texttt{caret::train}. This integration ensures that the \CRANpkg{ASML} workflow can fully use the modeling capabilities offered by \CRANpkg{caret}. To make the execution faster (it is not our intention here to delve into the choice of the best model), we use a \texttt{tune\_grid} that sets a fixed value for \texttt{mtry}. This avoids an exhaustive search for this hyperparameter, speeding up the model training process. Other modeling approaches should also be considered, as they may offer better performance depending on the specific characteristics of the data and the problem at hand. For more computationally intensive models or larger datasets, the \texttt{ASML::AStrain} function includes the argument \texttt{parallel}, which can be set to \texttt{TRUE} to enable parallel execution using the \CRANpkg{snow} package \citep{Rsnow}. This allows the training step to be distributed across multiple cores, reducing computation time. A detailed example on a larger dataset is provided in the following section, showing the scalability of the workflow and the effect of parallelization on training time.

The function \texttt{caret::train} returns a trained model along with performance metrics, predictions, and tuning parameters, providing insights into the model's effectiveness. In a similar manner, \texttt{ASML::AStrain} offers the same type of output, but for each algorithm under consideration, allowing straightforward comparison within the \CRANpkg{ASML} framework. The \texttt{ASML::ASpredict} function generates predictions for new data using the models created during the training phase for each algorithm under evaluation. Thus, predictions for all the algorithms are obtained simultaneously, facilitating a direct comparison of their performance.
By using \texttt{ASML::ASpredict} as follows, we obtain a matrix where each row corresponds to an instance from the test set and each column represents the predicted instance-normalized KPIs for the six branching rules using the \texttt{qrf} method.

\begin{verbatim}
predict_test <- ASpredict(training, newdata = data$x.test)
\end{verbatim}

\subsection{Evaluating and visualizing the results}\label{evaluating-and-visualizing-the-results}

One of the key strengths of the \CRANpkg{ASML} package lies in its ability to evaluate results collectively and provide intuitive visualizations. This approach not only aids in identifying the most effective algorithms but also contributes to the interpretability of the results, making it easier for users to make informed decisions based on the performance metrics and visual representations provided. For example, the function \texttt{KPI\_table} returns a table showing the arithmetic and geometric means of the KPI (both instance-normalized and non-normalized) obtained on the test set for each algorithm, as well as for the algorithm selected by the learning model (the one with the largest predicted instance-normalized KPI for each instance). In Table \ref{tab:AMSLtab2}, the results for our case study are shown. It is important to note that larger values are better in the columns for the arithmetic and geometric means of the instance-normalized KPI (where values close to 1 indicate the best performance). Conversely, in the columns for non-normalized values, lower numbers reflect better outcomes. In all cases, the best results are obtained for the ML algorithm. Note also that, in this case, the differences in the performance of the algorithms are likely better reflected by the geometric mean, because it gives a better representation of relative differences.
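The point about relative differences can be seen with a tiny numeric example of our own (unrelated to the package's internals): a single very expensive instance dominates the arithmetic mean, while the geometric mean responds to ratios rather than to absolute scale.

```r
# Why the geometric mean better reflects relative differences (toy example).
pace <- c(1, 2, 1e6)            # one very expensive instance dominates
arith <- mean(pace)             # driven almost entirely by the outlier
geom  <- exp(mean(log(pace)))   # equals (1 * 2 * 1e6)^(1/3), about 126
c(arithmetic = arith, geometric = geom)
```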
\begin{verbatim}
KPI_table(data, predictions = predict_test)
\end{verbatim}

\begin{table}[!h]
\centering
\caption{\label{tab:AMSLtab2}Arithmetic and geometric mean of the KPI (both instance-normalized and non-normalized) for each algorithm on the test set, along with the results for the algorithm selected by the learning model (first row).}
\centering
\fontsize{9}{11}\selectfont
\begin{tabular}[t]{l|>{\raggedleft\arraybackslash}p{2.5cm}|>{\raggedleft\arraybackslash}p{2.5cm}|>{\raggedleft\arraybackslash}p{2.5cm}|>{\raggedleft\arraybackslash}p{2.5cm}}
\hline
 & Arith. mean\newline inst-norm KPI & Geom. mean\newline inst-norm KPI & Arith. mean\newline non-norm KPI & Geom. mean\newline non-norm KPI\\
\hline
\ttfamily{ML} & 0.911 & 0.887 & 88114.19 & 1.035\\
\hline
\ttfamily{max} & 0.719 & 0.367 & 158716.13 & 2.574\\
\hline
\ttfamily{sum} & 0.791 & 0.537 & 104402.53 & 1.780\\
\hline
\ttfamily{dual} & 0.842 & 0.581 & 104393.92 & 1.634\\
\hline
\ttfamily{range} & 0.879 & 0.644 & 107064.29 & 1.432\\
\hline
\ttfamily{eig-VI} & 0.781 & 0.474 & 131194.49 & 2.007\\
\hline
\ttfamily{eig-CMI} & 0.800 & 0.591 & 88197.74 & 1.616\\
\hline
\end{tabular}
\end{table}

Additionally, the function \texttt{KPI\_summary\_table} generates a concise comparative table displaying values for three different choices: single best, ML, and optimal (see Table \ref{tab:AMSLtabsum2}). The single best choice refers to selecting the same algorithm for all instances based on the lowest geometric mean of the non-normalized KPI (in this case, the \texttt{range} rule). This approach evaluates the performance of each algorithm across all instances and chooses the one that performs best overall, rather than optimizing for individual instances. The ML choice represents the algorithm selected by the quantile random forest model. The optimal choice corresponds to solving each instance with the algorithm that performs best for that specific instance.
The ML choice shows promising results, with a mean KPI close to that of the optimal choice, demonstrating its capability to select algorithms that yield competitive performance.

\begin{verbatim}
KPI_summary_table(data, predictions = predict_test)
\end{verbatim}

\begin{table}[!h]
\centering
\caption{\label{tab:AMSLtabsum2}Arithmetic and geometric mean of the non-normalized KPI for the single best choice, ML choice, and optimal choice.}
\centering
\fontsize{9}{11}\selectfont
\begin{tabular}[t]{l|>{\raggedleft\arraybackslash}p{2.5cm}|>{\raggedleft\arraybackslash}p{2.5cm}}
\hline
 & Arith. mean\newline non-norm KPI & Geom. mean\newline non-norm KPI\\
\hline
\ttfamily{single best} & 107064.29 & 1.432\\
\hline
\ttfamily{ML} & 88114.19 & 1.035\\
\hline
\ttfamily{optimal} & 88085.37 & 0.911\\
\hline
\end{tabular}
\end{table}

The following code generates several visualizations that help us compare how well the algorithms perform according to the response variable (instance-normalized KPI) and also illustrate the behavior of the learning process. These plots give good insight into how effective the algorithm selection process is and how it behaves in comparison with using the same branching rule for all instances. Figure \ref{fig:ASMLplot1} shows the boxplots comparing the performance of each algorithm in terms of the instance-normalized KPI, including the instance-normalized KPI of the rules selected by the ML process for the test set. In Figure \ref{fig:ASMLplot2}, the performance is presented by family, allowing for a more detailed comparison across the different sets of instances. In Figure \ref{fig:ASMLplot3}, we show the ranking of algorithms based on the instance-normalized KPI for the test sample, including the ML rule, categorized by family.
Finally, in Figure \ref{fig:ASMLplot4}, the right-side bar in the stacked bar plot (optimal) illustrates the proportion of instances in which each of the original rules is identified as the best-performing option. In contrast, the left-side bar (ML) depicts the frequency with which ML selects each rule as the top choice. Although the rule chosen by ML in each instance does not always match the best one for that case, ML tends to select the different rules in a similar proportion to how often those rules are the best across the test set. This means it does not consistently favor a particular rule or ignore any rule that is the best in a significant percentage of instances.

\begin{verbatim}
boxplots(data, predictions = predict_test, labels = c(lab_rules, "ML"))
boxplots(data, predictions = predict_test, labels = c(lab_rules, "ML"), by_families = TRUE)
ranking(data, predictions = predict_test, labels = c("ML", lab_rules), by_families = TRUE)
figure_comparison(data, predictions = predict_test, by_families = FALSE, labels = lab_rules)
\end{verbatim}

\begin{figure}[ht]

{\centering \includegraphics[width=0.7\linewidth,alt={Boxplots of instance-normalized KPI for each algorithm, including the ML algorithm, across instances in the test set.}]{figures/ASMLplot1-1}

}

\caption{Boxplots of instance-normalized KPI for each algorithm, including the ML algorithm, across instances in the test set.}\label{fig:ASMLplot1}
\end{figure}

\begin{figure}[ht]

{\centering \includegraphics[width=0.7\linewidth,alt={Boxplots of instance-normalized KPI for each algorithm, including the ML algorithm, across instances in the test set, categorized by family.}]{figures/ASMLplot2-1}

}

\caption{Boxplots of instance-normalized KPI for each algorithm, including the ML algorithm, across instances in the test set, categorized by family.}\label{fig:ASMLplot2}
\end{figure}

\begin{figure}[ht]

{\centering \includegraphics[width=0.7\linewidth,alt={Ranking of algorithms,
including the ML algorithm, based on the instance-normalized KPI for the test sample, categorized by family. The bars represent the percentage of times each algorithm appeared in different ranking positions, with the numbers indicating the mean value of the normalized KPI.}]{figures/ASMLplot3-1} + +} + +\caption{Ranking of algorithms, including the ML algorithm, based on the instance-normalized KPI for the test sample, categorized by family. The bars represent the percentage of times each algorithm appeared in different ranking positions, with the numbers indicating the mean value of the normalized KPI.}\label{fig:ASMLplot3} +\end{figure} + +\begin{figure}[ht] + +{\centering \includegraphics[width=0.7\linewidth,alt={Comparison of the best-performing rules: The right stack shows the proportion of times each of the original rules is identified as the best-performing option, while the left stack presents the frequency of selection by ML.}]{figures/ASMLplot4-1} + +} + +\caption{Comparison of the best-performing rules: The right stack shows the proportion of times each of the original rules is identified as the best-performing option, while the left stack presents the frequency of selection by ML.}\label{fig:ASMLplot4} +\end{figure} + +\subsection{Custom user-defined methods}\label{custom-user-defined-methods} + +While \CRANpkg{caret} provides a range of built-in methods for model training and prediction, there may be situations where researchers want to explore additional methods not directly integrated into the package. Considering alternative methods can improve the analysis and provide greater flexibility in modeling choices. + +In this section, we present an example of how to modify the quantile random forest \texttt{qrf} method. The \texttt{qrf} implementation in \CRANpkg{caret} does not allow users to specify the conditional quantile to predict, which is set to the median by default. 
In this case, rather than creating an entirely new method, we only need to adjust the prediction function to include the \texttt{what} argument, allowing us to specify the desired conditional quantile for prediction. In this execution example, we base the algorithm selection method on the predictions of the \(\alpha\)-conditional quantile of the instance-normalized KPI for \(\alpha = 0.25\). + +\begin{verbatim} +qrf_q_predict <- function(modelFit, newdata, what = 0.5, submodels = NULL) { + out <- predict(modelFit$finalModel, newdata, what = what) + if (is.matrix(out)) + out <- out[, 1] + out +} + +predict_test_Q1 <- ASpredict(training, newdata = data$x.test, f = "qrf_q_predict", + what = 0.25) +KPI_summary_table(data, predictions = predict_test_Q1) +\end{verbatim} + +\subsection{Model interpretability}\label{model-interpretability} + +Predictive modeling often relies on flexible but complex methods. These methods typically involve many parameters or hyperparameters, which can make the models difficult to interpret. To address this, interpretable ML techniques provide tools for exploring \emph{black-box} models. \CRANpkg{ASML} integrates seamlessly with the package \CRANpkg{DALEX} (moDel Agnostic Language for Exploration and eXplanation), see \cite{DALEX}. With \CRANpkg{DALEX}, users can obtain model performance metrics, evaluate feature importance, and generate partial dependence plots (PDPs), among other analyses. + +To simplify the use of \CRANpkg{DALEX} within our framework, \CRANpkg{ASML} provides the function \texttt{ASexplainer}. This function automatically creates \CRANpkg{DALEX} explainers for the models trained with \texttt{AStrain} (one for each algorithm in the portfolio). Once the explainers are created, users can easily apply \CRANpkg{DALEX} functions to explore and compare the behavior of each model. 
The following example shows how to obtain a plot of the reversed empirical cumulative distribution function of the absolute residuals, from the performance metrics computed with \texttt{DALEX::model\_performance}, see Figure \ref{fig:DALEX1}. + +\begin{verbatim} +# Create DALEX explainers for each trained model +explainers_qrf <- ASexplainer(training, data = data$x.test, y = data$y.test, labels = lab_rules) +# Compute model performance metrics for each explainer +mp_qrf <- lapply(explainers_qrf, DALEX::model_performance) +# Plot the performance metrics +do.call(plot, unname(mp_qrf)) +\end{verbatim} + +\begin{figure}[ht] + +{\centering \includegraphics[width=0.7\linewidth,alt={Reversed empirical cumulative distribution function of the absolute residuals of the trained models.}]{figures/DALEX1-1} + +} + +\caption{Reversed empirical cumulative distribution function of the absolute residuals of the trained models.}\label{fig:DALEX1} +\end{figure} + +The code below illustrates how to obtain feature importance (via \texttt{DALEX::model\_parts}) and a PDP for the predictor variable \texttt{degree} (via \texttt{DALEX::model\_profile}). Plots are not displayed in this manuscript, but they can be generated by executing the code. 
+ +\begin{verbatim} +# Compute feature importance for each model in the explainers list +vi_qrf <- lapply(explainers_qrf, DALEX::model_parts) +# Plot the top 5 most important variables for each model +do.call(plot, c(unname(vi_qrf), list(max_vars = 5))) +# Compute PDP for the variable 'degree' for each model +pdp_qrf <- lapply(explainers_qrf, DALEX::model_profile, variable = "degree", type = "partial") +# Plot the PDPs generated +do.call(plot, unname(pdp_qrf)) +\end{verbatim} + +\section{Example on a larger dataset}\label{example-on-a-larger-dataset} + +To analyze the scalability of \CRANpkg{ASML}, we now consider an example of algorithm selection in the field of high-performance computing (HPC), specifically in the context of the automatic selection of the most suitable storage format for sparse matrices on GPUs. This is a well-known problem in HPC, since the storage format has a decisive impact on the performance of many scientific kernels such as the sparse matrix--vector multiplication (SpMV). For this study, we use the dataset introduced by \cite{pic18}, which contains 8111 sparse matrices and is available in the \CRANpkg{ASML} package under the name \texttt{SpMVformat}. Each matrix is described by a set of nine structural features, and the performance of the single-precision SpMV kernel was measured on a NVIDIA GeForce GTX TITAN GPU, under three storage formats: compressed row storage (CSR), ELLPACK (ELL), and hybrid (HYB). For each matrix and format, performance is expressed as the average GFLOPS (billions of floating-point operations per second), over 1000 SpMV operations. This setup allows us to study how matrix features relate to the most efficient storage format. + +The workflow follows the standard \CRANpkg{ASML} pipeline: data are partitioned and normalized, preprocessed, and models are trained using \texttt{ASML::AStrain}. 
We considered different learning methods available in \CRANpkg{caret} and evaluated execution times both with and without parallel processing, which is controlled via the \texttt{parallel} argument in \texttt{ASML::AStrain}. The selected methods were run with their default configurations in \CRANpkg{caret}, without additional hyperparameter tuning. All experiments were performed on a machine equipped with a 12th Gen Intel(R) Core(TM) i7-12700 (12 cores), 2.11 GHz processor and 32 GB of RAM. The execution times are summarized in Tables \ref{tab:AMSLtimes} and \ref{tab:AMSLtimes2}. + +\begin{table}[!h] +\centering +\caption{\label{tab:AMSLtimes}Execution times (in seconds) on the SpMVformat dataset for the main preprocessing stages.} +\centering +\fontsize{9}{11}\selectfont +\begin{tabular}[t]{l|>{\raggedleft\arraybackslash}p{10em}} +\hline +Stage & Execution time (seconds)\\ +\hline +ASML::partition\_and\_normalize & 0.03\\ +\hline +caret::preProcess & 1.55\\ +\hline +\end{tabular} +\end{table} + +\begin{table}[!h] +\centering +\caption{\label{tab:AMSLtimes2}Training times (in seconds) on the SpMVformat dataset for different methods using ASML::AStrain. The first column shows execution without parallelization (parallel = FALSE) and the second column shows execution with parallelization (parallel = TRUE).} +\centering +\fontsize{9}{11}\selectfont +\begin{tabular}[t]{l|>{\raggedleft\arraybackslash}p{10em}|>{\raggedleft\arraybackslash}p{10em}} +\hline +\multicolumn{1}{c|}{ } & \multicolumn{2}{c}{Execution times (in seconds) of ASML::AStrain} \\ +\cline{2-3} +Method & parallel = FALSE & parallel = TRUE\\ +\hline +nnet & 236.58 & 50.75\\ +\hline +svmRadial & 881.03 & 263.60\\ +\hline +rf & 4753.00 & 1289.68\\ +\hline +\end{tabular} +\end{table} + +The majority of the computational cost is associated with model training, which depends on the learning method in \CRANpkg{caret}. 
We observe that training times vary across methods: \texttt{nnet} (a simple feed-forward neural network), \texttt{svmRadial} (support vector machines with radial kernel), and \texttt{rf} (random forest). Parallel execution substantially reduces training times for all selected methods, demonstrating that the workflow scales efficiently to larger datasets while keeping preprocessing overhead minimal. + +Apart from the execution times, we also take this opportunity to provide a brief commentary on the outcome of the algorithm selection in this application example. In particular, we illustrate the model's ability to identify the most efficient storage format by reporting the results obtained with the \texttt{nnet} method, see Figure \ref{fig:ASMLnnet1}. The trained model selects the best-performing format in more than 85\% of the test cases, and even when it does not, the chosen format still achieves high performance, with a mean normalized KPI (normalized average GFLOPS) of around 0.9. + +\begin{verbatim} +set.seed(1234) +data(SpMVformat) +features <- SpMVformat$x +KPI <- SpMVformat$y +data <- partition_and_normalize(features, KPI, better_smaller = FALSE) +preProcValues <- caret::preProcess(data$x.train, method = "YeoJohnson") +data$x.train <- predict(preProcValues, data$x.train) +data$x.test <- predict(preProcValues, data$x.test) +training <- AStrain(data, method = "nnet", parallel = TRUE) +pred <- ASpredict(training, newdata = data$x.test) +ranking(data, predictions = pred) +\end{verbatim} + +\begin{figure}[ht] + +{\centering \includegraphics[width=0.7\linewidth,alt={Ranking of storage formats, including the ML selected, based on the instance-normalized KPI for the test sample. 
The bars represent the percentage of times each storage format appeared in different ranking positions, with the numbers indicating the mean value of the normalized KPI.}]{figures/ASMLnnet1-1} + +} + +\caption{Ranking of storage formats, including the ML selected, based on the instance-normalized KPI for the test sample. The bars represent the percentage of times each storage format appeared in different ranking positions, with the numbers indicating the mean value of the normalized KPI.}\label{fig:ASMLnnet1} +\end{figure} + +\section{Using ASML for algorithm selection on ASlib scenarios}\label{using-asml-for-algorithm-selection-on-aslib-scenarios} + +While the primary purpose of the \CRANpkg{ASML} package is not to systematically conduct algorithm selection studies like those found in ASlib (an area for which the \CRANpkg{llama} toolkit is specially helpful), it does offer a complementary approach for reproducing results from the ASlib benchmark (\url{https://coseal.github.io/aslib-r/scenario-pages/index.html}). Our method allows for a comparative analysis using instance-normalized KPIs, which, as demonstrated in the following example, can sometimes yield improved performance results. Additionally, it can be useful for evaluating algorithm selection approaches based on methods that are not available in the \CRANpkg{mlr} package used by \CRANpkg{llama} but are accessible in \CRANpkg{caret}. + +\subsection{Data download and preparation}\label{data-download-and-preparation} + +First, we identify the specific scenario from ASlib we are interested in, in this case, \texttt{CPMP-2015}. Using the scenario name, we construct a URL that points to the corresponding page on the ASlib website. 
Then, we fetch the HTML content of the page and create a local directory to store the downloaded files\footnote{A more direct approach would be to use the \texttt{getCosealASScenario} function from the \CRANpkg{aslib} package; however, this function seems to be currently not working, likely due to changes in the directory structure of the scenarios.}. + +\begin{verbatim} +set.seed(1234) +library(tidyverse) +library(rvest) +scen <- "CPMP-2015" +url <- paste0("https://coseal.github.io/aslib-r/scenario-pages/", scen, "/data_files") +page <- read_html(paste0(url, ".html")) +file_links <- page %>% + html_nodes("a") %>% + html_attr("href") + +# Create directory for downloaded files +dir_data <- paste0(scen, "_data") +dir.create(dir_data, showWarnings = FALSE) + +# Download files +for (link in file_links) { + full_link <- ifelse(grepl("^http", link), link, paste0(url, "/", link)) + file_name <- basename(link) + dest_file <- file.path(dir_data, file_name) + if (!is.na(full_link)) { + download.file(full_link, dest_file, mode = "wb", quiet = TRUE) + } +} +\end{verbatim} + +\subsection{Data preparation with aslib}\label{data-preparation-with-aslib} + +Now, we use the \CRANpkg{aslib} package to parse the scenario data and extract the relevant features and performance metrics. The \texttt{parseASScenario} function from \CRANpkg{aslib} creates a structured object \texttt{ASScen} that contains information regarding the algorithms and instances being evaluated. We then transform this data into cross-validation folds using the \texttt{cvFolds} function from \CRANpkg{llama}. This conversion facilitates a fair evaluation of algorithm performance across different scenarios, allowing us to compare the results with those published \footnote{\label{foot}Available at: https://coseal.github.io/aslib-r/scenario-pages/CPMP-2015/llama.html (Accessed October 25, 2024).}. 
+ +\begin{verbatim} +library(aslib) +ASScen <- aslib::parseASScenario(dir_data) +llamaScen <- aslib::convertToLlama(ASScen) +folds <- llama::cvFolds(llamaScen) +\end{verbatim} + +Then we extract the key performance indicator (KPI) and features from the folds object. In this case, \texttt{KPI} refers to runtime. As described in the ASlib documentation, \texttt{KPI\_pen} measures the penalized runtime. If an instance is solved within the timeout (\texttt{cutoff}) by the selected algorithm, the actual runtime is used. However, if a timeout occurs, the timeout value is multiplied by 10 to penalize the algorithm's performance. We also define \texttt{nins} as the number of instances and \texttt{ID} as unique identifiers for each instance. + +\begin{verbatim} +KPI <- folds$data[, folds$performance] +features <- folds$data[, folds$features] +cutoff <- ASScen$desc$algorithm_cutoff_time +is.timeout <- ASScen$algo.runstatus[, -c(1, 2)] != "ok" +KPI_pen <- KPI * ifelse(is.timeout, 10, 1) +nins <- length(getInstanceNames(ASScen)) +ID <- 1:nins +\end{verbatim} + +\subsection{Quantile random forest using ASML on instance-normalized KPI}\label{quantile-random-forest-using-asml-on-instance-normalized-kpi} + +We use the \CRANpkg{ASML} package to perform quantile random forest on instance-normalized KPI. We have already established the folds beforehand, and we want to use those partitions to maintain consistency with the original ASlib scenario design. Therefore, we provide \texttt{x.test} and \texttt{y.test} as arguments directly to the \texttt{partition\_and\_normalize} function. 
+ +\begin{verbatim} +data <- partition_and_normalize(x = features, y = KPI, x.test = features, y.test = KPI, + better_smaller = TRUE) +train_control <- caret::trainControl(index = folds$train, savePredictions = "final") +training <- AStrain(data, method = "qrf", trControl = train_control) +\end{verbatim} + +In this code block, we process the predictions made by the models trained using \CRANpkg{ASML} and calculate the same performance metrics used in ASlib, namely, the percentage of solved instances (\texttt{succ}), penalized average runtime (\texttt{par10}), and misclassification penalty (\texttt{mcp}), as detailed in the ASlib documentation \citep{bis16}. + +\begin{verbatim} +pred_list <- lapply(training, function(model) { + model$pred %>% + arrange(rowIndex) %>% + pull(pred) +}) + +pred <- do.call(cbind, pred_list) +alg_sel <- apply(pred, 1, which.max) + +succ = mean(!is.timeout[cbind(ID, alg_sel)]) +par10 = mean(KPI_pen[cbind(ID, alg_sel)]) +mcp = mean(KPI[cbind(ID, alg_sel)] - apply(KPI, 1, min)) +\end{verbatim} + +In Table \ref{tab:AMSLtabASLIB}, we present the results. We observe that, in this example, using instance-normalized KPI along with the quantile random forest model offers an alternative modeling option in addition to the standard regression models employed in the original ASlib study (linear model, regression trees and regression random forest), resulting in improved performance outcomes. + +\begin{table}[!h] +\centering +\caption{\label{tab:AMSLtabASLIB}Performance results of various models on the CPMP-2015 dataset. The last row represents the performance of the quantile random forest model based on instance-normalized KPI using the \CRANpkg{ASML} package. 
+ The preceding rows detail the results (all taken from the original ASlib study) of the virtual best solver (vbs), single best solver (singleBest), and the considered regression methods (linear model, regression trees and regression random forest).} +\centering +\fontsize{9}{11}\selectfont +\begin{tabular}[t]{l|r|r|r} +\hline +\textbf{\ttfamily{Model}} & \textbf{\ttfamily{succ}} & \textbf{\ttfamily{par10}} & \textbf{\ttfamily{mcp}}\\ +\hline +baseline vbs & 1.000 & 227.605 & 0.000\\ +\hline +baseline singleBest & 0.812 & 7002.907 & 688.774\\ +\hline +regr.lm & 0.843 & 5887.326 & 556.875\\ +\hline +regr.rpart & 0.843 & 5916.120 & 585.669\\ +\hline +regr.randomForest & 0.846 & 5748.065 & 540.574\\ +\hline +ASML qrf & 0.873 & 4807.633 & 460.863\\ +\hline +\end{tabular} +\end{table} + +It is important to note that this is merely an illustrative example; there are other scenarios in ASlib where replication may not be feasible in the same manner, due to factors not considered in \CRANpkg{ASML} (for more robust behavior across the ASlib benchmark, we refer to \CRANpkg{llama}). Despite these limitations, \CRANpkg{ASML} provides a flexible framework that allows researchers to explore various methodologies, including those not directly applicable with \CRANpkg{llama} through \CRANpkg{mlr}, and improve algorithm selection processes across different scenarios, ultimately contributing to improved understanding and performance in algorithm selection tasks. + +\section{Summary and discussion}\label{summary-and-discussion} + +In this work, we present \CRANpkg{ASML}, an R package to select the best algorithm from a portfolio of candidates based on a chosen KPI. \CRANpkg{ASML} uses instance-specific features and historical performance data to estimate how well each algorithm is likely to perform on new instances, via a model selected by the user, which can be any regression method from the \CRANpkg{caret} package or a custom function. 
This facilitates the automatic selection of the most suitable algorithm. The use of instance-normalized KPIs for algorithm selection is a novel aspect of this package, allowing a unified comparison across different algorithms and problem instances. + +While the motivation and examples presented in this work focus on optimization problems, particularly the automatic selection of branching rules and decision strategies in polynomial optimization, the \CRANpkg{ASML} framework is inherently flexible and can be applied more broadly. In particular, in the context of ML, selecting the right algorithm is a crucial factor for the success of ML applications. Traditionally, this process involves empirically assessing potential algorithms with the available data, which can be resource-intensive. In contrast, in so-called Meta-Learning, the aim is to predict the performance of ML algorithms based on features of the learning problems (meta-examples). Each meta-example contains details about a previously solved learning problem, including its features and the performance achieved by the candidate algorithms on that problem. A common Meta-Learning approach involves using regression algorithms to forecast the value of a selected performance metric (such as classification error) for the candidate algorithms based on the problem features. This method is commonly referred to as Meta-Regression in the literature. Thus, \CRANpkg{ASML} could also be used in this context, providing a flexible tool for algorithm selection across a variety of domains. + +\section{Acknowledgments}\label{acknowledgments} + +The authors would like to thank María Caseiro-Arias, Antonio Fariña-Elorza and Manuel Timiraos-López for their contributions to the development of the \CRANpkg{ASML} package. +This work is part of the R\&D projects PID2024-158017NB-I00, PID2020-116587GB-I00 and PID2021-124030NB-C32 granted by MICIU/AEI/10.13039/501100011033. 
This research was also funded by Grupos de Referencia Competitiva ED431C-2021/24 and ED431C 2025/03 from the Consellería de Educación, Ciencia, Universidades e Formación Profesional, Xunta de Galicia. Brais González-Rodríguez acknowledges the support from MICIU, through grant BG23/00155. + +\bibliography{gomez-pateiro-gonzalez-gonzalez.bib} + +\address{% +Ignacio Gómez-Casares\\ +Universidade de Santiago de Compostela\\% +Department of Statistics, Mathematical Analysis and Optimization\\ Santiago de Compostela, Spain\\ +% +% +% +\href{mailto:ignaciogomez.casares@usc.es}{\nolinkurl{ignaciogomez.casares@usc.es}}% +} + +\address{% +Beatriz Pateiro-López\\ +Universidade de Santiago de Compostela\\% +Department of Statistics, Mathematical Analysis and Optimization\\ CITMAga (Galician Center for Mathematical Research and Technology)\\ Santiago de Compostela, Spain\\ +% +% +\textit{ORCiD: \href{https://orcid.org/0000-0002-7714-1835}{0000-0002-7714-1835}}\\% +\href{mailto:beatriz.pateiro@usc.es}{\nolinkurl{beatriz.pateiro@usc.es}}% +} + +\address{% +Brais González-Rodríguez\\ +Universidade de Vigo\\% +Department of Statistics and Operational Research\\ SiDOR Research Group\\ Vigo, Spain\\ +% +% +% +\href{mailto:brais.gonzalez.rodriguez@uvigo.gal}{\nolinkurl{brais.gonzalez.rodriguez@uvigo.gal}}% +} + +\address{% +Julio González-Díaz\\ +Universidade de Santiago de Compostela\\% +Department of Statistics, Mathematical Analysis and Optimization\\ CITMAga (Galician Center for Mathematical Research and Technology)\\ Santiago de Compostela, Spain\\ +% +% +\textit{ORCiD: \href{https://orcid.org/0000-0002-4667-4348}{0000-0002-4667-4348}}\\% +\href{mailto:julio.gonzalez@usc.es}{\nolinkurl{julio.gonzalez@usc.es}}% +} diff --git a/_articles/RJ-2025-046/RJ-2025-046.Rmd b/_articles/RJ-2025-046/RJ-2025-046.Rmd new file mode 100644 index 0000000000..70aef65e03 --- /dev/null +++ b/_articles/RJ-2025-046/RJ-2025-046.Rmd @@ -0,0 +1,2021 @@ +--- +title: 'drclust: An R Package for Simultaneous 
Clustering and Dimensionality Reduction' +abstract: | + The primary objective of simultaneous methodologies for clustering and + variable reduction is to identify both the optimal partition of units + and the optimal subspace of variables, all at once. The optimality is + typically determined using least squares or maximum likelihood + estimation methods. These simultaneous techniques are particularly + useful when working with Big Data, where the reduction (synthesis) is + essential for both units and variables. Furthermore, a secondary + objective of reducing variables through a subspace is to enhance the + interpretability of the latent variables identified by the subspace + using specific methodologies. The drclust package implements double + K-means (KM), reduced KM, and factorial KM to address the primary + objective. KM with disjoint principal components addresses both the + primary and secondary objectives, while disjoint principal component + analysis and disjoint factor analysis address the latter, producing + the sparsest loading matrix. The models are implemented in C++ for + faster execution, processing large data matrices in a reasonable + amount of time. 
+author: +- name: Ionel Prunila + affiliation: Department of Statistical Sciences, Sapienza University of Rome + orcid: 0009-0009-3773-0481 + address: + - P.le Aldo Moro 5, 00185 Rome + - Italy + - | + [ionel.prunila@uniroma1.it](ionel.prunila@uniroma1.it){.uri} +- name: Maurizio Vichi + affiliation: Department of Statistical Sciences, Sapienza University of Rome + orcid: 0000-0002-3876-444X + address: + - P.le Aldo Moro 5, 00185 Rome + - Italy + - | + [maurizio.vichi@uniroma1.it](maurizio.vichi@uniroma1.it){.uri} +date: '2026-02-04' +date_received: '2024-07-12' +journal: + firstpage: 103 + lastpage: 132 +volume: 17 +issue: 4 +slug: RJ-2025-046 +citation_url: https://rjournal.github.io/ +packages: + cran: + - psych + - ade4 + - FactoMineR + - FactoClass + - factoextra + - NbClust + - drclust + - clustrd + - biplotbootGUI + - Rcpp + - RcppArmadillo + - cluster + - pheatmap + - ggplot2 + - dplyr + - GGally + bioc: [] +preview: preview.png +bibliography: prunila-vichi.bib +CTV: ~ +legacy_pdf: yes +legacy_converted: yes +output: + rjtools::rjournal_web_article: + self_contained: yes + toc: no + mathjax: https://cdn.jsdelivr.net/npm/mathjax@4/tex-mml-chtml.js + md_extension: -tex_math_single_backslash +draft: no + +--- + + +::::::::: article +## Introduction {#Introduction} + +Cluster analysis is the process of identifying homogeneous groups of +units in the data so that those within clusters are perceived with a low +degree of dissimilarity with each other. In contrast, units in different +clusters are perceived as dissimilar, i.e., with a high degree of +dissimilarity. When dealing with large or extremely large data matrices, +often referred to as Big Data, the task of assessing these +dissimilarities becomes computationally intensive due to the sheer +volume of units and variables involved. To manage this vast amount of +information, it is essential to employ statistical techniques that +synthesize and highlight the most significant aspects of the data. 
+Typically, this involves dimensionality reduction for both units and +variables to efficiently summarize the data. + +While cluster analysis synthesizes information across the rows of the +data matrix, variable reduction operates on the columns, aiming to +summarize the features and, ideally, facilitate their interpretation. +This key process involves extracting a subspace from the full space +spanned by the manifest variables, maintaining the principal informative +content. The process allows for the synthesis of common information +mainly among subsets of manifest variables, which represent concepts not +directly observable. As a result, subspace-based variable reduction +identifies a few uncorrelated latent variables that mainly capture +common relationships within these subsets. When using techniques like +Factor Analysis (FA) or Principal Component Analysis (PCA) for this +purpose, interpreting the resulting factors or components can be +challenging, particularly when variables significantly load onto +multiple factors, a situation known as *cross-loading*. Therefore, a +simpler structure in the loading matrix, focusing on the primary +relationship between each variable and its related factor, becomes +desirable for clarity and ease of interpretation. Furthermore, the +latent variables derived from PCA or FA do not provide a unique +solution. An equivalent model fit can be achieved by applying an +orthogonal rotation to the component axes. This aspect of non-uniqueness +is often exploited in practice through Varimax rotation, which is +designed to improve the interpretability of latent variables, without +affecting the fit of the analysis. The rotation promotes a simpler +structure in the loading matrix; however, rotation does not always +ensure enhanced interpretability. 
An alternative approach has been +proposed by (Vichi and Saporta 2009) and (Vichi 2017), with Disjoint +Principal Component Analysis (DPCA) and Disjoint FA (DFA), which +construct each component/factor from a distinct subset of manifest +variables rather than from all available variables, while still optimizing +the same criterion as PCA and FA, respectively. + +It is important to note that data matrix reduction for both rows and +columns is often performed without specialized methodologies by +employing a \"tandem analysis.\" This involves sequentially applying two +methods, such as using PCA or FA for variable reduction, followed by +Cluster Analysis using KM on the resulting factors. Alternatively, one +could start with Cluster Analysis and then proceed to variable +reduction. The outcomes of these two tandem analyses differ since each +approach optimizes distinct objective functions, one before the other. +For instance, when PCA is applied first, the components maximize the +total variance of the manifest variables. However, if the manifest +variables include high-variance variables that lack a clustering +structure, these will be included in the components, even though they +are not necessary for KM, which focuses on explaining only the variance +between clusters. As a result, sequentially optimizing two different +objectives may lead to sub-optimal solutions. In contrast, when +combining KM with PCA or FA in a simultaneous approach, a single +integrated objective function is utilized. This function aims to +optimize both the clustering partition and the subspace simultaneously. +The optimization is typically carried out using an Alternating Least +Squares (ALS) algorithm, which updates the partition for the current +subspace in one step and the subspace for the current partition in the +next. This iterative process ensures convergence to a solution that +represents at least a local minimum of the integrated objective +function. 
In comparison, tandem analysis, which follows a sequential +approach (e.g., PCA followed by KM), does not guarantee joint +optimization. One potential limitation of this sequential method is that +the initial optimization through PCA may obscure relevant information +for the subsequent step of Cluster Analysis or emphasize irrelevant +patterns, ultimately leading to sub-optimal solutions, as mentioned by +(DeSarbo et al. 1990). Indeed, the simultaneous strategy has been shown +to be effective in various studies, like (De Soete and Carroll 1994), +(Vichi and Kiers 2001), (Vichi 2001), (Vichi and Saporta 2009), (Rocci +and Vichi 2008), (Timmerman et al. 2010), (Yamamoto and Hwang 2014). + +In order to spread access to these techniques and their use, software +implementations are needed. Within the R Core Team (2015) environment, +there are different libraries available to perform dimensionality +reduction techniques. Indeed, the plain versions of KM, PCA, and FA are +available in the built-in package stats as `kmeans`, `princomp`, and +`factanal`. Furthermore, some packages go beyond the plain +estimation and output of such algorithms. Indeed, one of the richest +libraries in R is [**psych**](https://CRAN.R-project.org/package=psych) +(W. R. Revelle 2017), which provides functions to easily +simulate data according to different schemes, testing routines, +the calculation of various estimates, and multiple estimation +methods. [**ade4**](https://CRAN.R-project.org/package=ade4) (Dray and +Dufour 2007) allows for dimensionality reduction in the presence of +different types of variables, along with many graphical instruments. The +[**FactoMineR**](https://CRAN.R-project.org/package=FactoMineR) (Lê et +al. 2008) package allows for unit-clustering and extraction of latent +variables, also in the presence of mixed variables. 
[**FactoClass**](https://CRAN.R-project.org/package=FactoClass) (Pardo and Del Campo 2007) implements functions for PCA, Correspondence Analysis (CA) as well as clustering, including the tandem approach. [**factoextra**](https://CRAN.R-project.org/package=factoextra) (Kassambara 2022), instead, provides visualization of the results, aiding their assessment in terms of the choice of the number of latent variables, with elegant dendrograms, scree plots and more. More focused on the choice of the number of clusters is [**NbClust**](https://CRAN.R-project.org/package=NbClust) (Charrad et al. 2014), offering 30 indices for determining the number of clusters and proposing the best option by trying not only different numbers of groups but also different distance measures and clustering methods, going beyond the partitioning ones.

To the knowledge of the authors, two packages are more closely related to the library presented here, implementing a subset of the techniques proposed within [**drclust**](https://CRAN.R-project.org/package=drclust). [**clustrd**](https://CRAN.R-project.org/package=clustrd) (Markos et al. 2019) implements simultaneous methods of clustering and dimensionality reduction. Besides offering functions for continuous data, it also allows for categorical (or mixed) variables. Moreover, at least for the continuous case, its formulation is aligned with the objective function proposed by Yamamoto and Hwang (2014), of which reduced KM (RKM) and factorial KM (FKM) become special cases through a tuning parameter.

Finally, there is [**biplotbootGUI**](https://CRAN.R-project.org/package=biplotbootGUI) (Nieto Librero and Freitas 2023), offering a GUI with graphical tools that aid in the choice of the number of components and clusters. Furthermore, it implements KM with disjoint PCA (DPCA), as described in Vichi and Saporta (2009).
Moreover, it proposes an optimization algorithm for the choice of the initial starting point from which the estimation process for the parameters begins.

Like [**clustrd**](https://CRAN.R-project.org/package=clustrd), the [**drclust**](https://CRAN.R-project.org/package=drclust) package provides implementations of FKM and RKM. However, while [**clustrd**](https://CRAN.R-project.org/package=clustrd) also supports categorical and mixed-type variables, our implementation currently handles only continuous variables. That said, appropriate pre-processing of categorical variables, as suggested in Vichi et al. (2019), can make them compatible with the proposed methods: in essence, one should dummy-encode all the qualitative variables. In terms of performance, [**drclust**](https://CRAN.R-project.org/package=drclust) offers significantly faster execution. Moreover, regarding FKM, our proposal demonstrates superior results in both empirical applications and simulations, in terms of model fit and the Adjusted Rand Index (ARI). Another alternative, [**biplotbootGUI**](https://CRAN.R-project.org/package=biplotbootGUI), implements KM with DPCA and includes built-in plotting functions and an SDP-based initialization of parameters. However, our implementation remains considerably faster and allows users to specify which variables should be grouped together within the same (or different) principal components. This capability enables a partially or fully confirmatory approach to variable reduction. Beyond speed and the confirmatory option, [**drclust**](https://CRAN.R-project.org/package=drclust) offers three methods not currently available in other `R` packages: DPCA and DFA, both designed for pure dimensionality reduction, and double KM (DKM), which performs simultaneous clustering and variable reduction via KM. All methods are implemented in C++ for computational efficiency.
Table \@ref(tab:T2) summarizes the similarities and differences between
`drclust` and existing alternatives.

The package presented within this work aims to facilitate access to,
and the usability of, techniques that fall into two main, overlapping
branches. To this end, some statistical background is first recalled.

## Notation and theoretical background

The main pillars of
[**drclust**](https://CRAN.R-project.org/package=drclust) fall into two
main categories: dimensionality reduction and (partitioning) cluster
analysis. The former may be carried out individually or blended with the
latter. Because both rely on the language of linear algebra, Table
\@ref(tab:T1) contains, for the convenience of the reader, the
mathematical notation needed for this context. Then some theoretical
background is reported.

::: {#tab:notation}
  --------------------------------------------------------------------------------------------------------------------------------------------------------
  Symbol                      Description
  --------------------------- ----------------------------------------------------------------------------------------------------------------------------
  *n*, *J*, *K*, *Q*          number of: units, manifest variables, unit-clusters, latent factors

  $\mathbf{X}$                *n* x *J* data matrix, where the generic element $x_{ij}$ is the real observation on the *i*-th unit within the *j*-th variable

  $\mathbf{x}_i$              *J* x 1 vector representing the generic row of $\mathbf{X}$

  $\mathbf{U}$                *n* x *K* unit-cluster membership matrix, binary and row stochastic, with $u_{ik}$ being the generic element

  $\mathbf{V}$                *J* x *Q* variable-cluster membership matrix, binary and row stochastic, with $v_{jq}$ as the generic element

  $\mathbf{B}$                *J* x *J* variable-weighting diagonal matrix

  $\mathbf{Y}$                *n* x *Q* component/factor score matrix defined on the reduced subspace

  $\mathbf{y}_i$              *Q* x 1 vector representing the
                              generic row of $\mathbf{Y}$

  $\mathbf{A}$                *J* x *Q* variables - factors, "plain", loading matrix

  $\mathbf{C}^+$              Moore-Penrose pseudo-inverse of a matrix $\mathbf{C}$. $\mathbf{C}^+ = (\mathbf{C'C})^{-1}\mathbf{C'}$

  $\bar{\textbf{X}}$          *K* x *J* centroid matrix in the original feature space, i.e., $\bar{\textbf{X}} = \textbf{U}^{+} \textbf{X}$

  $\bar{\mathbf{Y}}$          *K* x *Q* centroid matrix projected in the reduced subspace, i.e., $\bar{\mathbf{Y}} = \bar{\mathbf{X}}\mathbf{A}$

  $\mathbf{H}_{\mathbf{C}}$   Projector operator $\mathbf{H}_\mathbf{C} = \mathbf{C}(\mathbf{C}'\mathbf{C})^{-1}\mathbf{C}'$ spanned by the columns of matrix $\mathbf{C}$

  $\mathbf{E}$                *n* x *J* Error term matrix

  $||\cdot||$                 Frobenius norm
  --------------------------------------------------------------------------------------------------------------------------------------------------------

  : (#tab:T1) Notation
:::

### Latent variables with simple-structure loading matrix

Classical methods such as PCA (Pearson 1901) or FA (Cattell 1965; Lawley and Maxwell 1962) build each latent factor from a combination of *all* the manifest variables. As a consequence, the loading matrix, describing the relations between manifest and latent variables, is usually not immediately interpretable. Ideally, it is desirable to have each variable associated with a single factor. This is typically called *simple structure*, which induces subsets of variables characterizing the factors and, frequently, a partition of the variables. While factor rotation techniques (especially Varimax) go in this direction, they do not guarantee this result. Alternative solutions have been proposed: Zou et al. (2006) frame the PCA problem as a regression problem and introduce an elastic-net penalty, aiming for a sparse solution of the loading matrix **A**. For the present work, we consider two techniques for this purpose: DPCA and DFA, implemented in the proposed package.

#### Disjoint principal component analysis

Vichi and Saporta (2009) propose an alternative solution, DPCA, which leads to the simplest possible structure on **A**, while still maximizing the explained variance. Such a result is obtained by building each latent factor from a subset of variables instead of allowing all the variables to contribute to all the components. This means that it provides *J* non-zero loadings instead of *JQ* of them. To obtain this setting, variables are grouped in such a way that they form a partition of the initial set. The model can be described as a constrained PCA, where the matrix $\mathbf{A}$ is restricted to be reparametrized into the product $\mathbf{A}=\mathbf{BV}$. Thus, the model is described as:

$$\begin{equation}
\label{dpca1}
  \mathbf{X} = \mathbf{X}\mathbf{A}\mathbf{A}' + \mathbf{E}= \mathbf{X}\mathbf{B}\mathbf{V}\mathbf{V}'\mathbf{B} + \mathbf{E},
\end{equation} (\#eq:dpca1)$$
subject to
$$\begin{equation}
\label{dpca2}
  \mathbf{V} = [v_{jq} \in \{0,1\}] \ \ \ \ \ (binarity),
\end{equation} (\#eq:dpca2)$$

$$\begin{equation}
\label{dpca3}
  \mathbf{V}\mathbf{1}_{Q} = \mathbf{1}_{J} \ \ \ (row-stochasticity),
\end{equation} (\#eq:dpca3)$$

$$\begin{equation}
\label{dpca4}
\mathbf{V}'\mathbf{B}\mathbf{B}'\mathbf{V} = \mathbf{I}_{Q} \ \ \ \ \ (orthonormality),
\end{equation} (\#eq:dpca4)$$

$$\begin{equation}
\label{dpca5}
  \mathbf{B} = diag(b_1, \dots, b_J) \ \ \ \ (diagonality).
\end{equation} (\#eq:dpca5)$$
The estimation of the parameters $\mathbf{B}$ and $\mathbf{V}$ is carried out via least squares (LS), by solving the minimization problem
$$\begin{equation}
\label{dpca6}
  RSS_{DPCA}(\mathbf{B}, \mathbf{V}) = ||\mathbf{X} - \mathbf{X}\mathbf{B}\mathbf{V}\mathbf{V}'\mathbf{B}||^2
\end{equation} (\#eq:dpca6)$$
subject to the constraints (\@ref(eq:dpca2), \@ref(eq:dpca3), \@ref(eq:dpca4), \@ref(eq:dpca5)).
An ALS algorithm is employed, guaranteeing at least a local optimum. In order to (at least partially) overcome this downside, multiple random starts are needed, and the best solution is retained.

Therefore, the DPCA method is subject to more structural constraints than standard PCA. Specifically, standard PCA does not enforce the reparameterization $\mathbf{A}=\mathbf{BV}$, meaning its loading matrix $\mathbf{A}$ is free to vary among orthonormal matrices. In contrast, DPCA still requires an orthonormal matrix $\mathbf{A}$, but it also requires that each principal component be associated with a disjoint subset of variables that best reconstructs the data. This implies that each variable contributes to only one component, resulting in a sparse and block-diagonal loading matrix. In essence, DPCA fits *Q* separate PCAs on the *Q* disjoint subsets of variables and, from each, extracts the eigenvector associated with the largest eigenvalue. In general, the total variance explained by DPCA is slightly lower, and the residual of the objective function is larger, compared to PCA. This trade-off is made in exchange for the added constraint, which clearly enhances interpretability. The extent of the reduction depends on the true underlying structure of the latent factors, specifically on whether they are truly uncorrelated. When the observed correlation matrix is block diagonal, with variables within blocks being highly correlated and variables between blocks being uncorrelated, DPCA can explain almost the same amount of variance as PCA, with the advantage of a simpler interpretation.\
It is important to note that DPCA, as implemented, allows for a blend of exploratory and confirmatory approaches. In the confirmatory framework, users can specify a priori which variables should collectively contribute to a factor using the `constr` argument, available for the last three functions in Table \@ref(tab:T2).
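To make this step concrete: conditional on a fixed variable partition (the $\mathbf{V}$ of the model), the disjoint loadings reduce to $Q$ small eigenproblems, one per subset. Below is a minimal NumPy sketch of this single conditional step only; the actual ALS also updates the partition itself, and the toy data are an illustrative assumption:

```python
import numpy as np

def disjoint_loadings(X, groups, Q):
    # Build A = BV: column q holds the leading eigenvector of the
    # covariance matrix of its own variable subset, zeros elsewhere.
    J = X.shape[1]
    A = np.zeros((J, Q))
    Xc = X - X.mean(axis=0)
    for q in range(Q):
        idx = np.where(groups == q)[0]
        S = np.atleast_2d(np.cov(Xc[:, idx], rowvar=False))
        _, vecs = np.linalg.eigh(S)
        A[idx, q] = vecs[:, -1]   # eigenvector of the largest eigenvalue
    return A

rng = np.random.default_rng(0)
z = rng.normal(size=(200, 2))
# Two blocks of highly correlated variables: {0, 1} and {2, 3}.
X = np.column_stack([z[:, 0], z[:, 0] + 0.05 * rng.normal(size=200),
                     z[:, 1], z[:, 1] + 0.05 * rng.normal(size=200)])
A = disjoint_loadings(X, np.array([0, 0, 1, 1]), 2)
```

Because the columns of $\mathbf{A}$ have disjoint supports and unit norm, they are orthonormal by construction, which is one way to see why no rotation step is needed.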
The algorithm assigns the remaining manifest variables, for which no constraint has been specified, to the *Q* factors in a way that ensures the latent variables best reconstruct the manifest ones, capturing the maximum variance. This is accomplished by minimizing the loss function (\@ref(eq:dpca6)). Although each of the *Q* latent variables is derived from a different subset of variables, which involves the spectral decomposition of multiple covariance matrices, their smaller size, combined with the implementation in C++, enables very rapid execution of the routine.

A very positive side effect of the additional constraint in DPCA, compared to standard PCA, is the uniqueness of the solution, which eliminates the need for factor rotation in DPCA.

#### Disjoint factor analysis

Proposed by Vichi (2017), this technique is the model-based counterpart of DPCA. It pursues a similar goal in terms of building *Q* factors from *J* variables, imposing a simple structure on the loading matrix. However, the means by which the goal is pursued are different: unlike DPCA, the estimation method adopted for DFA is Maximum Likelihood, and the model requires additional statistical assumptions. The model can be formulated in matrix form as,
$$\begin{equation}
\label{dfa1}
  \mathbf{X} = \mathbf{Y}\mathbf{A}'+\mathbf{E},
\end{equation} (\#eq:dfa1)$$
where $\mathbf{X}$ is centered, meaning that the mean vector $\boldsymbol{\mu}$ has been subtracted from each multivariate unit $\mathbf{x}_{i}$. Therefore, for a multivariate, centered, unit, the previous model can be expressed as
$$\begin{equation}
\label{dfa2}
  \mathbf{x}_i = \mathbf{A}\mathbf{y}_i + \mathbf{e}_i, \ \ i = 1, \dots, n.
\end{equation} (\#eq:dfa2)$$
where $\mathbf{y}_i$ is the *i*-th row of $\mathbf{Y}$ and $\mathbf{x}_i$, $\mathbf{e}_i$ are, respectively, the $i$-th rows of $\mathbf{X}$ and $\mathbf{E}$, with a multivariate normal distribution on the $J$-dimensional space,
$$\begin{equation}
\label{FAassumptions1}
  \mathbf{x}_i \sim \mathcal{N}(\boldsymbol{0}, \boldsymbol{\Sigma_X}), \ \ \ \mathbf{e}_i \sim \mathcal{N}(\boldsymbol{0}, \mathbf{\Psi}).
\end{equation} (\#eq:FAassumptions1)$$
The covariance structure of the FA model can be written as
$$\begin{equation}
  Cov(\mathbf{x}_i) = \mathbf{\Sigma_X} = \mathbf{AA'} + \mathbf{\Psi},
\end{equation}$$
[]{#dfa6 label="dfa6"} where additional assumptions are needed,
$$\begin{equation}
\label{dfa4}
  Cov(\mathbf{y}_{i}) = \mathbf{\Sigma}_{\mathbf{Y}} = \mathbf{I}_Q,
\end{equation} (\#eq:dfa4)$$

$$\begin{equation}
\label{dfa5}
  Cov(\mathbf{e}_i) = \mathbf{\Sigma}_{\mathbf{E}} = \mathbf{\Psi}, \ \ \ \mathbf{\Psi} = diag(\psi_{1},\dots,\psi_{J}), \ \ \psi_{j}>0, \ \ j = 1, \dots, J,
\end{equation} (\#eq:dfa5)$$

$$\begin{equation}
  Cov(\mathbf{e}_{i}, \mathbf{y}_{i}) = \mathbf{\Sigma}_{\mathbf{EY}} = 0,
\label{dfa5b}
\end{equation} (\#eq:dfa5b)$$

$$\begin{equation}
\mathbf{A} = \mathbf{BV}.
\label{dfa6b}
\end{equation} (\#eq:dfa6b)$$
The objective function can be formulated as the maximization of the likelihood function or, equivalently, as the minimization of the following discrepancy:
$$\begin{align*}
  D_{DFA}(\mathbf{B},\mathbf{V}, \mathbf{\Psi})
  & = \text{ln}|\mathbf{B}\mathbf{V}\mathbf{V}'\mathbf{B} + \mathbf{\Psi}| - \text{ln}|\mathbf{S}| + \text{tr}((\mathbf{B}\mathbf{V}\mathbf{V}'\mathbf{B} + \mathbf{\Psi})^{-1}\mathbf{S}) - \textit{J}, \\
  & \qquad s.t.: \mathbf{V} = [v_{jq}], \ v_{jq} \in \{0,1\}, \ \sum_q{v_{jq}} = 1, \ j = 1, \dots, \textit{J}, \ q = 1, \dots, \textit{Q},
\end{align*}$$
whose parameters are optimized by means of a coordinate descent algorithm.
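The discrepancy above can be evaluated directly for candidate parameters. A small NumPy sketch follows, using an assumed toy covariance built to satisfy the model exactly (so the discrepancy vanishes); it illustrates the objective only, not the package's estimation routine:

```python
import numpy as np

def dfa_discrepancy(S, b, groups, psi):
    # D = ln|Sigma| - ln|S| + tr(Sigma^{-1} S) - J,
    # with Sigma = (BV)(BV)' + Psi and A = BV disjoint (one nonzero per row).
    J = S.shape[0]
    Q = int(groups.max()) + 1
    V = np.zeros((J, Q))
    V[np.arange(J), groups] = 1.0
    A = np.diag(b) @ V
    Sigma = A @ A.T + np.diag(psi)
    _, ld_sigma = np.linalg.slogdet(Sigma)
    _, ld_s = np.linalg.slogdet(S)
    return ld_sigma - ld_s + np.trace(np.linalg.solve(Sigma, S)) - J

groups = np.array([0, 0, 1, 1])
b = np.array([0.9, 0.8, 0.7, 0.6])
psi = np.array([0.3, 0.4, 0.5, 0.6])
V = np.zeros((4, 2)); V[np.arange(4), groups] = 1.0
A = np.diag(b) @ V
S = A @ A.T + np.diag(psi)   # a covariance satisfying the model exactly
d0 = dfa_discrepancy(S, b, groups, psi)
```

The discrepancy is zero when the implied covariance equals $\mathbf{S}$ and strictly positive otherwise, which is the quantity the coordinate descent drives down.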

Apart from the methodological distinctions between DPCA and DFA, the latter exhibits the scale equivariance property. The optimization of the likelihood function implies a higher computational load and, thus, a longer execution time compared to DPCA.

As in the DPCA case, under the constraint $\mathbf{A}=\mathbf{BV}$, the solution provided by the model is unique.

### Joint clustering and variable reduction

The four clustering methods discussed all follow the $K$-means framework, working to partition the units. However, they differ primarily in how they handle variable reduction.

Double KM (DKM) employs a symmetric approach, clustering both the units (rows) and the variables (columns) of the data matrix at the same time. This leads to the simultaneous identification of mean profiles for both dimensions. DKM is particularly suitable for data matrices where both rows and columns represent units. Examples of such matrices include document-by-term matrices used in Text Analysis, product-by-customer matrices in Marketing, and gene-by-sample matrices in Biology.

In contrast, the other three clustering methods adopt an asymmetric approach. They treat rows and columns differently, focusing on mean profiles and clustering for the rows, while employing components or factors for the variables (columns). These methods are more appropriate for typical units-by-variables matrices, where it is beneficial to synthesize the variables using components or factors. At the same time, they emphasize clustering and the mean profiles of the clusters specifically for the rows. The methodologies that fall into this category are RKM, FKM, and DPCAKM.

The estimation is carried out by the LS method, while the computation of the estimates is performed via ALS.

#### Double k-means (DKM)

Proposed by Vichi (2001), DKM is one of the first bi-clustering methods to provide a simultaneous partition of the units and variables, resulting in a two-way extension of the plain KM (MacQueen 1967). The model is described by the following equation,
$$\begin{equation}
\label{dkm1}
  \mathbf{X} = \mathbf{U}\bar{\mathbf{Y}}\mathbf{V}' + \mathbf{E}
\end{equation} (\#eq:dkm1)$$
where $\bar{\mathbf{Y}}$ is the centroid matrix in the reduced space for the rows and columns, enabling a comprehensive summarization of units and variables. By optimizing a single objective function, the DKM method captures valuable information from both dimensions of the dataset simultaneously.

This bi-clustering approach can be applied in several impactful ways. One key application is in the realm of Big Data. DKM can effectively compress expansive datasets that include a vast number of units and variables into a more manageable and robust data matrix $\bar{\mathbf{Y}}$. This compressed matrix, formed by mean profiles for both rows and columns, can then be explored and analyzed using a variety of subsequent statistical techniques, thus facilitating efficient data handling and analysis of Big Data. The algorithm, similarly to the well-known KM, is very fast and converges quickly to a solution, which is at least a local minimum of the problem.

Another significant application of DKM is its capability to achieve optimal clustering for both rows and columns. This dual clustering ability is particularly advantageous in situations where it is essential to discern meaningful patterns and relationships within complex datasets, highlighting the utility of DKM in diverse fields and scenarios.
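The compression just described amounts to $\bar{\mathbf{Y}} = \mathbf{U}^{+}\mathbf{X}\mathbf{V}^{+'}$. A minimal NumPy sketch with assumed, already-given row and column partitions (DKM itself also estimates them):

```python
import numpy as np

def dkm_compress(X, u, v, K, Q):
    # Centroid matrix Ybar = U^+ X V^+' and residual ||X - U Ybar V'||^2
    # for given row (u) and column (v) partitions.
    n, J = X.shape
    U = np.zeros((n, K)); U[np.arange(n), u] = 1.0
    V = np.zeros((J, Q)); V[np.arange(J), v] = 1.0
    Ybar = np.linalg.pinv(U) @ X @ np.linalg.pinv(V).T
    rss = np.linalg.norm(X - U @ Ybar @ V.T) ** 2
    return Ybar, rss

# A perfectly block-constant matrix is compressed without loss.
M = np.array([[1.0, 5.0], [9.0, 2.0]])   # 2 x 2 "core" of mean profiles
u = np.array([0, 0, 0, 1, 1, 1])         # row partition
v = np.array([0, 0, 1])                  # column partition
U = np.zeros((6, 2)); U[np.arange(6), u] = 1.0
V = np.zeros((3, 2)); V[np.arange(3), v] = 1.0
X = U @ M @ V.T
Ybar, rss = dkm_compress(X, u, v, 2, 2)
```

When the data are exactly block-constant, as here, the core of mean profiles is recovered without loss; on real data the residual measures how much structure the double partition fails to capture.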

The Least Squares estimation of the parameters $\mathbf{U}$, $\mathbf{V}$ and $\bar{\mathbf{Y}}$ leads to the minimization problem
$$\begin{equation}
\label{dkm2}
  RSS_{\textit{DKM}}(\mathbf{U}, \mathbf{V}, \bar{\mathbf{Y}}) = {||\mathbf{X} - \mathbf{U}\bar{\mathbf{Y}}\mathbf{V}'||^2},
\end{equation} (\#eq:dkm2)$$

$$\begin{equation}
\label{dkm3}
  s.t.: u_{ik} \in \{0,1\}, \ \ \sum_{k} u_{ik} = 1, \ \ i = 1 ,\dots, N, \ \ k = 1 ,\dots, K,
\end{equation} (\#eq:dkm3)$$

$$\begin{equation}
\label{dkm4}
  \ \ \ \ \ \ \ v_{jq} \in \{0,1\}, \ \ \sum_{q} v_{jq} = 1, \ \ j = 1, \dots, J, \ \ q = 1, \dots, Q.
\end{equation} (\#eq:dkm4)$$
Since $\mathbf{\bar{Y}} = \mathbf{U}^{+}\mathbf{X}\mathbf{V}^{+'}$, (\@ref(eq:dkm2)) can be framed in terms of projector operators, thus:
$$\begin{equation}
\label{dkm5}
RSS_{\textit{DKM}}(\mathbf{U}, \mathbf{V}) = ||\mathbf{X} - \mathbf{H}_\mathbf{U}\mathbf{X}\mathbf{H}_\mathbf{V}||^2.
\end{equation} (\#eq:dkm5)$$
In both cases, one minimizes the sum of squared residuals (or, equivalently, the within deviances associated with the *K* unit-clusters and *Q* variable-clusters), obtaining a (hard) classification of both units and variables. The optimization of \@ref(eq:dkm5) is done via ALS, alternating, in essence, two assignment problems for rows and columns, similar to KM steps.

#### Reduced k-means (RKM)

Proposed by De Soete and Carroll (1994), RKM performs the reduction of the variables by projecting the *J*-dimensional centroid matrix into a *Q*-dimensional subspace ($\textit{Q} \leq$ *J*), spanned by the columns of the loading matrix $\mathbf{A}$, such that it best reconstructs $\mathbf{X}$ by using the orthogonal projector matrix $\mathbf{A}\mathbf{A}'$. Therefore, the model is described by the following equation,
$$\begin{equation}
\label{rkm1}
  \mathbf{X} = \mathbf{U}\bar{\mathbf{X}}\mathbf{A}\mathbf{A}' + \mathbf{E}.
\end{equation} (\#eq:rkm1)$$
The estimation of **U** and **A** can be done via LS, by minimizing
$$\begin{equation}
\label{rkm2}
  RSS_{\textit{RKM}}(\mathbf{U}, \mathbf{A})={||\mathbf{X} - \mathbf{U}\bar{\mathbf{X}}\mathbf{A}\mathbf{A}'||^2},
\end{equation} (\#eq:rkm2)$$

$$\begin{equation}
\label{rkm3}
  s.t.: \ \ \ u_{ik} \in \{0,1\}, \ \ \sum_{k} u_{ik} = 1, \ \ \mathbf{A}'\mathbf{A} = \mathbf{I},
\end{equation} (\#eq:rkm3)$$
which can be optimized, once again, via ALS. In essence, the model alternates a KM step, assigning each original unit $\mathbf{x}_i$ to the closest centroid in the reduced space, and a PCA step, based on the spectral decomposition of $\mathbf{X}'\mathbf{H}_\mathbf{U}\mathbf{X}$, conditioned on the results of the previous iteration. The iterations continue until the difference between the objective function values of two subsequent iterations is smaller than an arbitrarily chosen small constant $\epsilon > 0$.

#### Factorial k-means (FKM)

Proposed by Vichi and Kiers (2001), FKM, unlike RKM, produces a dimension reduction of both the units and the centroids. Its goal is to reconstruct the data in the reduced subspace, $\mathbf{Y}$, by means of the centroids in the reduced space. The FKM model can be obtained by post-multiplying both sides of the RKM model in equation (\@ref(eq:rkm1)) by $\mathbf{A}$, and rewriting the new error term as $\mathbf{E}$,
$$\begin{equation}
  \mathbf{X}\mathbf{A} = \mathbf{U}\bar{\mathbf{X}}\mathbf{A} + \mathbf{E}.
\end{equation}$$
Its estimation via LS results in the optimization of
$$\begin{equation}
\label{fkm1}
  RSS_{\textit{FKM}}(\mathbf{U}, \mathbf{A}, \bar{\mathbf{X}})={||\mathbf{X}\mathbf{A} - \mathbf{U}\bar{\mathbf{X}}\mathbf{A}||^2},
\end{equation} (\#eq:fkm1)$$

$$\begin{equation}
  s.t.: \ \ \ u_{ik} \in \{0,1\}, \ \ \sum_{k} u_{ik} = 1, \ \ \mathbf{A}'\mathbf{A} = \mathbf{I}.
\end{equation}$$
Although the connection with the RKM model appears straightforward, it can be shown that the loss function of the former is always smaller than or equal to that of the latter. Practically, the KM step is applied to $\mathbf{X}\mathbf{A}$, instead of just $\mathbf{X}$, as happens in DKM and RKM. In essence, FKM works better when the data and the centroids lie in the reduced subspace, and not just the centroids, as in RKM.

In order to decide when RKM or FKM can be properly applied, it is important to recall that two types of residuals can be defined in dimensionality reduction: *subspace residuals*, lying on the subspace spanned by the columns of $\mathbf{A}$, and *complement residuals*, lying on the complement of this subspace, i.e., those residuals lying on the subspace spanned by the columns of $\mathbf{A}^\perp$, with $\mathbf{A}^\perp$ a column-wise orthonormal matrix of order $J \times (J-Q)$ such that $\mathbf{A}'\mathbf{A}^{\perp} = \mathbf{O}_{Q \times (J-Q)}$, where $\mathbf{O}_{Q \times (J-Q)}$ is the matrix of zeroes of order $Q \times (J-Q)$. FKM is more effective when there is significant residual variance in the subspace orthogonal to the clustering subspace. In other words, the complement residuals typically represent the error given by those observed variables that scarcely contribute to the clustering subspace to be identified. FKM tends to recover the subspace and clustering structure more accurately when the data contain variables with substantial variance that does not reflect the clustering structure and therefore masks it; FKM can better ignore these variables and focus on the relevant clustering subspace. On the other hand, RKM performs better when the data have significant residual variance within the clustering subspace itself. This means that, when the variables within the subspace show considerable variance, RKM can more effectively capture the clustering structure.
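The claim that the FKM loss never exceeds the RKM loss can be checked numerically for any membership matrix $\mathbf{U}$ and column-orthonormal $\mathbf{A}$ (random here, purely for illustration): the RKM residual splits exactly into the FKM residual plus the variance lying outside the subspace spanned by $\mathbf{A}$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, J, Q, K = 60, 5, 2, 3
X = rng.normal(size=(n, J))

# An arbitrary unit partition U and column-orthonormal loadings A.
U = np.zeros((n, K)); U[np.arange(n), rng.integers(0, K, size=n)] = 1.0
A = np.linalg.qr(rng.normal(size=(J, Q)))[0]
H_U = U @ np.linalg.pinv(U)            # projector onto the columns of U

rss_rkm = np.linalg.norm(X - H_U @ X @ A @ A.T) ** 2
rss_fkm = np.linalg.norm(X @ A - H_U @ X @ A) ** 2
# Variance outside the subspace spanned by A (the complement residuals):
gap = np.linalg.norm(X - X @ A @ A.T) ** 2
```

Since the cross terms vanish when $\mathbf{A}'\mathbf{A} = \mathbf{I}$, one has $RSS_{RKM} = RSS_{FKM} + ||\mathbf{X}(\mathbf{I}-\mathbf{A}\mathbf{A}')||^2$, from which the inequality follows immediately.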

In essence, when most of the variables in the dataset reflect the clustering structure, RKM is more likely to provide a good solution. If this is not the case, FKM may be preferred.

#### Disjoint principal component analysis k-means (DPCAKM)

Starting from the FKM model, the goal here, besides the partition of the units, is to have a parsimonious representation of the relationships between latent and manifest variables, provided by the loading matrix **A**. Vichi and Saporta (2009) propose for FKM the parametrization **A** = **BV**, which allows the simplest structure and thus simplifies the interpretation of the factors,
$$\begin{equation}
\label{cdpca1}
  \mathbf{X} = \mathbf{U}\bar{\mathbf{X}}\mathbf{B}\mathbf{V}\mathbf{V}'\mathbf{B} + \mathbf{E}.
\end{equation} (\#eq:cdpca1)$$
By estimating $\mathbf{U}$, $\mathbf{B}$, $\mathbf{V}$ and $\bar{\mathbf{X}}$ via LS, the loss function of the proposed method becomes:
$$\begin{equation}
\label{cdpca2}
  RSS_{DPCAKM}(\mathbf{U}, \mathbf{B}, \mathbf{V}, \bar{\mathbf{X}}) = ||\mathbf{X} - \mathbf{U}\bar{\mathbf{X}}\mathbf{B}\mathbf{V}\mathbf{V}'\mathbf{B}||^2,
\end{equation} (\#eq:cdpca2)$$

$$\begin{equation}
\label{cdpca3}
  s.t.: u_{ik} \in \{0,1\}, \ \ \sum_{k} u_{ik} = 1, \ \ i = 1 ,\dots, N, \ \ k = 1 ,\dots, K,
\end{equation} (\#eq:cdpca3)$$

$$\begin{equation}
\label{cdpca4}
  \ \ \ \ \ \ \ v_{jq} \in \{0,1\}, \ \ \sum_{q} v_{jq} = 1, \ \ j = 1, \dots, J, \ \ q = 1, \dots, Q,
\end{equation} (\#eq:cdpca4)$$

$$\begin{equation}
\label{cdpca5}
  \ \ \ \ \ \ \ \mathbf{V}'\mathbf{B}\mathbf{B}\mathbf{V} = \mathbf{I}, \ \ \mathbf{B} = diag(b_1, \dots, b_J).
\end{equation} (\#eq:cdpca5)$$
In practice, this model has traits of DPCA, given the projection onto the reduced subspace and the partitioning of the units, resulting in a sparse loading matrix, but also of DKM, given the presence of both **U** and **V**.
Thus, DPCAKM can be considered a bi-clustering methodology with an asymmetric treatment of the rows and columns of **X**. By inheriting the constraint on **A**, the overall fit of the model, compared with FKM for example, is generally worse, although it offers an easier interpretation of the principal components. Nevertheless, it is potentially able to identify a better partition of the units. As in the DPCA case, the difference is negligible when the true latent variables are really disjoint. As implemented, the assignment step is carried out by minimizing the unit-centroid squared-Euclidean distance in the reduced subspace.

## The package

The library offers an implementation of all the models mentioned in the previous section. Each of them corresponds to a specific function implemented using [**Rcpp**](https://CRAN.R-project.org/package=Rcpp) (Eddelbuettel and Francois 2011) and [**RcppArmadillo**](https://CRAN.R-project.org/package=RcppArmadillo) (Eddelbuettel and Sanderson 2014).

::: {#tab:stat_models}
  -----------------------------------------------------------------------------------------------------------------------------------------------------
  Function     Model                         Previous\                                 Main differences\
                                             Implementations                           in `drclust`
  ------------ ----------------------------- ----------------------------------------- ----------------------------------------------------------------
  `doublekm`   DKM\                          None                                      Short runtime (C++);
               (Vichi 2001)

  `redkm`      RKM\                          in `clustrd`;\                            \>50x faster (C++);\
               (De Soete and Carroll 1994)   Mixed variables;                          Continuous variables;

  `factkm`     FKM\                          in `clustrd`;\                            \>20x faster (C++);\
               (Vichi and Kiers 2001)        Mixed variables                           Continuous variables;\
                                                                                       Better fit and classification;

  `dpcakm`     DPCAKM\                       in `biplotbootGUI`;\                      \>10x faster (C++);\
               (Vichi and Saporta 2009)      Continuous variables;\                    Constraint on variable allocation within principal components;
                                             SDP-based initialization of parameters;

  `dispca`     DPCA\                         None                                      Short runtime (C++);\
               (Vichi and Saporta 2009)                                                Constraint on variable allocation within principal components;

  `disfa`      DFA\                          None                                      Short runtime (C++);\
               (Vichi 2017)                                                            Constraint on variable allocation within factors;
  -----------------------------------------------------------------------------------------------------------------------------------------------------

  : (#tab:T2) Statistical methods available in the `drclust` package
:::

Some additional functions have been made available for the user. Most of them are intended to aid the user in evaluating the quality of the results, or in the choice of the hyper-parameters.
+ +::: {#tab:aux_methods} + ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- + **Function** **Technique** **Description** **Goal** + ----------------- ----------------------------- --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- --------------------------------- + `apseudoF` \"relaxed\" pseudoF \"Relaxed\" version of Caliński and Harabasz (1974). Selects the second largest pseudoF value if the difference with the first is less than a fraction. Parameter tuning + + `dpseudoF` DKM-pseudoF Adaptation of the pseudoF criterion proposed by Rocci and Vichi (2008) to bi-clustering. Parameter tuning + + `kaiserCrit` Kaiser criterion Kaiser rule for selecting the number of principal components (Kaiser 1960). Parameter tuning + + `centree` Dendrogram of the centroids Graphical tool showing how close the centroids of a partition are. Visualization + + `silhouette` Silhouette Imported from [**cluster**](https://CRAN.R-project.org/package=cluster) (Maechler et al. 2023) and [**factoextra**](https://CRAN.R-project.org/package=factoextra) (Kassambara 2022). Visualization, parameter tuning + + `heatm` Heatmap Heatmap of distance-ordered units within distance-ordered clusters, adapted from [**pheatmap**](https://CRAN.R-project.org/package=pheatmap) (Kolde 2019). Visualization + + `CronbachAlpha` Cronbach Alpha Index Proposed by Cronbach (1951). Assesses the unidimensionality of a dataset. Assessment + + `mrand` ARI Assesses clustering quality based on the confusion matrix (Rand 1971). 
Assessment

  `cluster`         Membership vector             Returns a multinomial 1 × *n* membership vector from a binary, row-stochastic *n* × *K* membership matrix; mimics `kmeans$cluster`.                                               Encoding
  -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

  : (#tab:T3) Auxiliary functions available in the library
:::

The auxiliary functions (Table \@ref(tab:T3)) have all been implemented in the `R` language, building on top of packages already available on CRAN, such as [**cluster**](https://CRAN.R-project.org/package=cluster) (Maechler et al. 2023), [**factoextra**](https://CRAN.R-project.org/package=factoextra) (Kassambara 2022), and [**pheatmap**](https://CRAN.R-project.org/package=pheatmap) (Kolde 2019), which allowed for an easier implementation. One of the main goals of the proposed package, besides spreading the availability and usability of the statistical methods considered, is speed of computation: provided that memory is sufficient, results can be obtained in a reasonable amount of time even for large data matrices. A first means adopted to pursue this goal is the full implementation of the statistical methods in the C++ language, through [**Rcpp**](https://CRAN.R-project.org/package=Rcpp) (Eddelbuettel and Francois 2011) and [**RcppArmadillo**](https://CRAN.R-project.org/package=RcppArmadillo) (Eddelbuettel and Sanderson 2014), which significantly reduces the required runtime.

A practical issue that arises very often in crisp (hard) clustering, such as KM, is the presence of empty clusters after the assignment step.
+When this happens, a column of $\mathbf{U}$ has all elements equal to zero, which can be proved to be a local minimum solution and prevents the computation of $(\mathbf{U}'\mathbf{U})^{-1}$. This happens even more often when the number of clusters *K* specified by the user is larger than the true one, or in the case of a sub-optimal solution. Among the possible solutions addressing this issue, the one implemented here consists in splitting the cluster with the highest within-deviance. In practice, a KM with $\textit{K} = 2$ is applied to that cluster, and one of the two resulting sub-clusters is assigned to the empty cluster; the procedure is iterated until all the empty clusters are filled. Such a strategy guarantees that the monotonicity of the ALS algorithm is preserved, although it is the most time-consuming option. + +All six implementations of the statistical techniques share some arguments that are set to a default value; Table \@ref(tab:T4) describes them. In particular, `print`, which displays a descriptive summary of the results, is set to zero (so the user must explicitly request this output). `Rndstart` defaults to 20, so that the algorithm is run 20 times until convergence; to gain more confidence (not certainty) that the obtained solution is a global optimum, a higher value for this argument can be provided. With particular regard to `redkm` and `factkm`, the argument `rot`, which performs a Varimax rotation on the loading matrix, is set by default to 0; to have the rotation performed, it must be set equal to 1. Finally, the `constr` argument, which is available for `dpcakm`, `dispca`, and `disfa`, is set by default to a vector (of length *J*) of zeros, so that each variable is assigned to the most appropriate latent variable, according to the logic of the model.
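The empty-cluster strategy described above can be sketched in base R as follows. This is a simplified stand-in: the function and variable names are ours, and the package performs this step in C++ inside its ALS routines.

``` r
# Sketch: fill an empty cluster by splitting the cluster with the
# highest within-cluster deviance via a 2-means step.
# Assumes: X (n x J data matrix), cl (membership vector in 1..K), K.
fill_empty_clusters <- function(X, cl, K) {
  repeat {
    sizes <- tabulate(cl, nbins = K)
    empty <- which(sizes == 0)
    if (length(empty) == 0) break
    # within-deviance of each cluster (0 for tiny/empty clusters)
    wdev <- sapply(seq_len(K), function(k) {
      if (sizes[k] < 2) return(0)
      sum(scale(X[cl == k, , drop = FALSE], scale = FALSE)^2)
    })
    donor <- which.max(wdev)            # most heterogeneous cluster
    idx <- which(cl == donor)
    split <- kmeans(X[idx, , drop = FALSE], centers = 2)$cluster
    # one of the two sub-clusters fills the first empty cluster
    cl[idx[split == 2]] <- empty[1]
  }
  cl
}

set.seed(1)
X <- rbind(matrix(rnorm(40), 20, 2), matrix(rnorm(40, 4), 20, 2))
cl <- c(rep(1, 20), rep(2, 20))  # cluster 3 starts empty
cl <- fill_empty_clusters(X, cl, K = 3)
table(cl)                        # all three clusters are now non-empty
```

Splitting the largest-deviance cluster (rather than reseeding at random) is what preserves the monotone decrease of the ALS objective.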
+ +::: {#tab:defaultarguments} + --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- + **Argument** **Used In** **Description** **Default Value** + -------------- ------------------------------------------------------------ --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- ------------------- + `Rndstart` `doublekm`, `redkm`, `factkm`, `dpcakm`, `dispca`, `disfa` Number of times the model is run until convergence. 20 + + `verbose` `doublekm`, `redkm`, `factkm`, `dpcakm`, `dispca`, `disfa` Outputs basic summary statistics regarding each random start (1 = enabled; 0 = disabled). 0 + + `maxiter` `doublekm`, `redkm`, `factkm`, `dpcakm`, `dispca`, `disfa` Maximum number of iterations allowed for each random start (if convergence is not reached earlier). 100 + + `tol` `doublekm`, `redkm`, `factkm`, `dpcakm`, `dispca`, `disfa` Tolerance threshold (maximum difference between the values of the objective function of two consecutive iterations such that convergence is assumed). $10^{-6}$ + + `tol` `apseudoF` Approximation value: half of the length of the interval built around each pseudoF value (0 \<= `tol` \< 1). 0.05 + + `rot` `redkm`, `factkm` Performs Varimax rotation of the axes obtained via PCA (0 = `False`; 1 = `True`). 0 + + `prep` `doublekm`, `redkm`, `factkm`, `dpcakm`, `dispca`, `disfa` Pre-processing of the data.
1 performs the *z*-score transform; 2 performs the min-max transform; 0 leaves the data un-pre-processed. 1 + + `print` `doublekm`, `redkm`, `factkm`, `dpcakm`, `dispca`, `disfa` Final summary statistics of the performed method (1 = enabled; 0 = disabled). 0 + + `constr` `dpcakm`, `dispca`, `disfa` Vector of length $J$ (number of variables) specifying variable-to-cluster assignments. Each element can be an integer from 1 to $Q$ (number of variable-clusters or components), indicating a fixed assignment, or 0 to leave the variable unconstrained (i.e., assigned by the algorithm). `rep(0,J)` + --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- + + : (#tab:T4) Arguments accepted by functions in the `drclust` + package with default values +::: + +Thanks to their fast execution, all the implemented models make it possible to run multiple random starts of the algorithm in a reasonable amount of time. This feature is particularly useful given that the ALS algorithm, which has an ad-hoc implementation for each model, offers no guarantee of reaching a global optimum. Table \@ref(tab:T5) shows that, compared to the two packages which implement 3 of the 6 models available in [**drclust**](https://CRAN.R-project.org/package=drclust), our proposal is much faster than the corresponding versions implemented in `R`, while providing equally compelling results. + +The `iris` dataset has been used to measure the performance in terms of fit, runtime, and ARI (Rand 1971). The *z*-transform has been applied to all the variables of the dataset.
This implies that, post-transformation, all the variables have mean equal to 0 and variance equal to 1, obtained by subtracting from each variable its mean and dividing the result by its standard deviation. The same result is typically obtained with the `R` function `scale(X)`. + +$$\begin{equation} +\label{eq:ztransform} +\mathbf{Z}_{\cdot j} = \frac{\mathbf{X}_{\cdot j} - \mu_j \mathbf{1_\textit{n}}}{\sigma_j} +\end{equation} (\#eq:ztransform)$$ +where $\mu_j$ is the mean of the *j*-th variable and $\sigma_j$ its standard deviation; the subscript $\cdot j$ refers to the whole *j*-th column of the matrix. This operation prevents the measurement scale from affecting the final result (and is used by default, unless otherwise specified by the user, within all the techniques implemented by `drclust`). In order to avoid the comparison between potentially different objective functions, the between deviance (as described by the authors in the articles where the methods have been proposed) has been used as the fit measure and computed from the output provided by the functions, so as to have homogeneity in the evaluation metric. *K* = 3 and *Q* = 2 have been used for the clustering algorithms, while for the two dimensionality-reduction techniques only *Q* = 2 applies. + +For each method, 100 runs have been performed and the best solution has been picked. For each run, the maximum allowed number of iterations is 100, with a tolerance (i.e., precision) equal to $10^{-6}$.
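As an illustration, the standardization in \@ref(eq:ztransform) and the best-of-100-starts protocol can be sketched in base R. Note that `stats::kmeans` is used here only as a stand-in for the package's ALS routines; duplicate rows are dropped before clustering so that randomly sampled initial centroids stay distinct.

``` r
# Column-wise z-scores, equivalent to scale(X), followed by a
# multi-start loop that keeps the best of 100 random starts
# (stats::kmeans is a stand-in for the package's ALS routines).
data(iris)
X <- as.matrix(iris[, 1:4])
Z <- sweep(sweep(X, 2, colMeans(X), "-"), 2, apply(X, 2, sd), "/")
stopifnot(max(abs(Z - scale(X))) < 1e-12)  # matches scale(X)

Zu <- unique(Z)  # keep random initial centroids distinct
best <- NULL
for (run in 1:100) {
  fit <- kmeans(Zu, centers = 3, iter.max = 100)
  if (is.null(best) || fit$tot.withinss < best$tot.withinss) best <- fit
}
best$betweenss / best$totss  # share of deviance explained by the partition
```

Keeping the solution with the lowest within-deviance across runs is exactly what a higher `Rndstart` buys in the package's own functions.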
+ +::: {#tab:comparison} + ------------------------------------------------------------------------------------------------------ + Library Technique Runtime (s) Fit ARI Fit Measure + --------------- ----------- ----------- ------- ------- ------------------------------------------------ + clustrd RKM 0.73 21.38 0.620 $||\mathbf{U}\bar{\mathbf{Y}}\mathbf{A}'||^2$ + + drclust RKM 0.01 21.78 0.620 $||\mathbf{U}\bar{\mathbf{Y}}\mathbf{A}'||^2$ + + clustrd FKM 1.89 4.48 0.098 $||\mathbf{U}\bar{\mathbf{Y}}||^2$ + + drclust FKM 0.03 21.89 0.620 $||\mathbf{U}\bar{\mathbf{Y}}||^2$ + + biplotbootGUI CDPCA 2.83 21.32 0.676 $||\mathbf{U}\bar{\mathbf{Y}}\mathbf{A}'||^2$ + + drclust CDPCA 0.05 21.34 0.676 $||\mathbf{U}\bar{\mathbf{Y}}\mathbf{A}'||^2$ + + drclust DKM 0.03 21.29 0.652 $||\mathbf{U}\bar{\mathbf{X}}\mathbf{H_V}||^2$ + + drclust DPCA \<0.01 23.70 \- $||\mathbf{Y}\mathbf{A}'||^2$ + + drclust DFA 1.11 55.91 \- $||\mathbf{Y}\mathbf{A}'||^2$ + ------------------------------------------------------------------------------------------------------ + + : (#tab:T5) Performance of the variable reduction and joint + clustering-variable reduction models +::: + +The results of Table \@ref(tab:T5) are visually represented in Figure \@ref(fig:iriscomparison). + + +```{r iriscomparison, fig.cap="ARI, Fit, Runtime for the available implementations", echo=FALSE} +knitr::include_graphics("figures/1iriscomparison.png") +``` + + +Although runtimes heavily depend on the hardware characteristics, they have been reported in Table \@ref(tab:T5) for relative comparison purposes only, all the techniques having been run on the same hardware. For all the computations within the present work, the machine used is an Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz 2.00 GHz. + +Besides the already mentioned difference between DPCA and DFA, it is worth noting that, in terms of implementation, they retrieve the latent variables differently.
+Indeed, while the DPCA relies on the eigendecomposition, the DFA uses an implementation of the power method (Hotelling 1933). + +In essence, the implementation of our proposal, while being very fast, exhibits a goodness of fit very close (and sometimes superior) to that of the available alternatives. + +## Simulation study + +To better understand the capabilities of the proposed methodologies and evaluate the performance of the [**drclust**](https://CRAN.R-project.org/package=drclust) package, a simulation study was conducted. In this study, we assume that the number of clusters (*K*) and the number of factors (*Q*) are known, and we examine how results vary across the DKM, RKM, FKM, and DPCAKM methods. + +### Data generation process + +The performance of these algorithms is tested on synthetic data generated through a specific procedure. Initially, centroids are created using eigendecomposition on a transformed distance matrix, resulting in three equidistant centroids in a reduced two-dimensional space. To model the variances and covariances among the generated units within each cluster and to introduce heterogeneity among the units, a variance-covariance matrix ($\Sigma_O$) is derived from samples taken from a zero-mean Gaussian distribution with a specified standard deviation ($\sigma_u$). + +Membership for the 1,000 units is determined from a (K × 1) vector of prior probabilities, using a multinomial distribution with probabilities (0.2, 0.3, 0.5). For each unit, a sample is drawn from a multivariate Gaussian distribution centered around its corresponding centroid, using the previously generated covariance matrix ($\Sigma_O$). Additionally, four masking variables, which do not exhibit any clustering structure, are generated from a zero-mean multivariate Gaussian and scaled by a standard deviation of $\sigma_m = 6$. These masking variables are added to the two variables that form the clustering structure of the dataset. Then, the final sample dataset is standardized.
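A simplified sketch of this generation process is given below; the variable names and the construction of $\Sigma_O$ (built here from Gaussian draws via `crossprod`, with a small diagonal added for positive definiteness) are illustrative, not the exact code used for the study.

``` r
# Simplified sketch of the data-generation process: 3 equidistant
# centroids in 2 dimensions, multinomial memberships, Gaussian noise
# within clusters, plus 4 masking variables with sigma_m = 6.
set.seed(123)
n <- 1000; sigma_u <- 0.55; sigma_m <- 6
centroids <- 3 * rbind(c(1, 0), c(-0.5, sqrt(3)/2), c(-0.5, -sqrt(3)/2))
cl <- sample(1:3, n, replace = TRUE, prob = c(0.2, 0.3, 0.5))
# within-cluster covariance built from zero-mean Gaussian draws
E <- matrix(rnorm(2 * 2, sd = sigma_u), 2, 2)
Sigma_O <- crossprod(E) + diag(1e-6, 2)  # symmetric positive definite
L <- chol(Sigma_O)
Y <- centroids[cl, ] + matrix(rnorm(n * 2), n, 2) %*% L
M <- matrix(rnorm(n * 4, sd = sigma_m), n, 4)  # masking variables
X <- scale(cbind(Y, M))                        # final standardized dataset
dim(X)  # 1000 x 6
```

Raising `sigma_u` inflates the within-cluster noise (subspace residuals), while `sigma_m` governs how strongly the masking variables drown out the clustering structure.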
+ +It is important to note that the standard deviation $\sigma_u$ controls the amount of variance in the reduced space, thus influencing the level of subspace residuals. Conversely, $\sigma_m$ regulates the variance of the masking variables, impacting the complement residuals. + +This study considers scenarios with $J$ = 6 variables, $n$ = 1,000 units, $K$ = 3 clusters and $Q$ = 2 factors. We explore high, medium, and low within-cluster heterogeneity, with $\sigma_u$ values of 0.8, 0.55, and 0.3, respectively. For each combination of these parameters, $s$ = 100 samples are generated; since the design is fully crossed, a total of 300 datasets is produced. Examples of the generated samples are illustrated in Figure \@ref(fig:sim123), which shows that as the level of within-cluster variance increases, the variables with a clustering structure tend to create overlapping clusters. Note that the two techniques dedicated solely to variable reduction, namely DPCA and DFA, were not included in the simulation study, because the study's primary focus is on joint clustering and dimension reduction and on the comparison with competing implementations. These methods are, in any case, inherently quick, as can be observed from the speed of the methodologies that combine clustering with DPCA or DFA dimension reduction. + +### Performance evaluation + +To evaluate the accuracy in recovering the true cluster membership of the units (**U**), the ARI (Hubert and Arabie 1985) was employed. The ARI quantifies the similarity between the hard partitions generated by the estimated classification matrices and those defined by the true partition, considering both the reference partition and the one produced by the algorithm under evaluation.
The ARI typically ranges from 0 to 1, where 0 indicates the level of agreement expected by random chance and 1 denotes a perfect match; negative values may also occur, indicating agreement worse than what would be expected by chance. In order to assess the models' ability to reconstruct the underlying data structure, the between deviance, denoted by $f$, was computed. This measure is defined in the original works proposing the evaluated methods and is reported in the second column (Fit Measure) of Table \@ref(tab:T6). For comparison, the true between deviance $f^{*}$, calculated from the true, known values of **U** and **A**, was also computed. The difference $f^{*} - f$ was considered, where negative values suggest potential overfitting. Furthermore, the squared Frobenius norm $||\mathbf{A}^* - \mathbf{A}||^2$ was computed to assess how accurately each model estimated the true loading matrix $\mathbf{A}^*$. This evaluation was not applicable to the DKM method, as it does not provide estimates of the loading matrix. For each performance metric presented in Table \@ref(tab:T6), the median value across $s$ = 100 replicates, for each level of error (within deviance), is reported. + +It is important to note that fit and ARI reflect distinct objectives. While fit measures the variance explained by the model, the ARI assesses clustering accuracy, so the two metrics may diverge. A model may achieve high fit by capturing subtle variation or even noise that does not correspond to well-separated clusters, leading to a lower ARI; conversely, a method focused on maximizing cluster separation may yield a high ARI while explaining less overall variance. This trade-off is particularly relevant in unsupervised settings, where there is no external supervision to guide the balance between reconstruction and partitioning. For this reason, we report both metrics to provide a more comprehensive assessment of model performance.
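For reference, the ARI can be computed directly from the contingency table of the two partitions. The helper below is an illustrative base-R implementation, distinct from the package's `mrand`:

``` r
# Adjusted Rand Index from the contingency table of two partitions
# (Hubert and Arabie 1985). Illustrative helper, not the package's mrand().
ari <- function(x, y) {
  tab <- table(x, y)
  a  <- sum(choose(tab, 2))            # pairs agreeing in both partitions
  b  <- sum(choose(rowSums(tab), 2))   # pairs together in x
  c_ <- sum(choose(colSums(tab), 2))   # pairs together in y
  n2 <- choose(length(x), 2)
  (a - b * c_ / n2) / ((b + c_) / 2 - b * c_ / n2)
}

ari(c(1, 1, 2, 2, 3, 3), c(1, 1, 2, 2, 3, 3))  # perfect match: 1
ari(c(1, 1, 1, 1), c(1, 2, 1, 2))              # chance-level agreement: 0
```

The chance correction in the denominator is what allows negative values when two partitions agree less than random labelings would.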
+ +### Algorithm performance and comparison with the competing implementations + +For each sample, the algorithms DKM, RKM, FKM, and DPCAKM are applied using 100 random start solutions, selecting the best one. This significantly reduces the impact of local minima in the clustering and dimension reduction process. Figure \@ref(fig:sim123) depicts the typical situation for each scenario (low, medium, high within-cluster variance). + +```{r sim123, fig.cap="Within-cluster variance of the simulated data (in order: low, medium, high)", echo=FALSE} +knitr::include_graphics("figures/2sim123.png") +``` + + +::: {#tab:simulation} ++-----------+---------------------------------------------------------+---------------+-------------+---------+---------+-----------+-----------------------------------+ +| Technique | Fit Measure | Library | Runtime (s) | Fit | ARI | $f^* - f$ | $||\mathbf{A}^* - \mathbf{A}||^2$ | ++:==========+:========================================================+:==============+:============+:========+:========+:==========+:==================================+ +| **Low** | ++-----------+---------------------------------------------------------+---------------+-------------+---------+---------+-----------+-----------------------------------+ +| RKM | $||\mathbf{U}\bar{\mathbf{X}}\mathbf{A}\mathbf{A}'||^2$ | clustrd | 164.03 | 42.76 | 1.00 | 0.00 | 2.00 | ++-----------+---------------------------------------------------------+---------------+-------------+---------+---------+-----------+-----------------------------------+ +| RKM | $||\mathbf{U}\bar{\mathbf{X}}\mathbf{A}\mathbf{A}'||^2$ | drclust | 0.48 | 42.76 | 1.00 | 0.00 | 2.00 | ++-----------+---------------------------------------------------------+---------------+-------------+---------+---------+-----------+-----------------------------------+ +| FKM | $||\mathbf{U}\bar{\mathbf{X}}\mathbf{A}||^2$ | clustrd | 15.48 | 2.89 | 0.35 | 39.77 | 1.99 |
++-----------+---------------------------------------------------------+---------------+-------------+---------+---------+-----------+-----------------------------------+ +| FKM | $||\mathbf{U}\bar{\mathbf{X}}\mathbf{A}||^2$ | drclust | 0.52 | 42.76 | 1.00 | 0.00 | 2.00 | ++-----------+---------------------------------------------------------+---------------+-------------+---------+---------+-----------+-----------------------------------+ +| DPCAKM | $||\mathbf{U}\bar{\mathbf{X}}\mathbf{A}||^2$ | biplotbootGUI | 41.70 | 42.74 | 1.00 | 0.01 | 2.00 | ++-----------+---------------------------------------------------------+---------------+-------------+---------+---------+-----------+-----------------------------------+ +| DPCAKM | $||\mathbf{U}\bar{\mathbf{X}}\mathbf{A}||^2$ | drclust | 1.37 | 42.74 | 1.00 | 0.01 | 2.00 | ++-----------+---------------------------------------------------------+---------------+-------------+---------+---------+-----------+-----------------------------------+ +| DKM | $||\mathbf{U}\bar{\mathbf{X}}\mathbf{V}||^2$ | drclust | 0.78 | 61.55 | 0.46 | -18.94 | \- | ++-----------+---------------------------------------------------------+---------------+-------------+---------+---------+-----------+-----------------------------------+ +| **Medium** | ++-----------+---------------------------------------------------------+---------------+-------------+---------+---------+-----------+-----------------------------------+ +| RKM | $||\mathbf{U}\bar{\mathbf{X}}\mathbf{A}\mathbf{A}'||^2$ | clustrd | 230.31 | 39.18 | 0.92 | -0.27 | 2.00 | ++-----------+---------------------------------------------------------+---------------+-------------+---------+---------+-----------+-----------------------------------+ +| RKM | $||\mathbf{U}\bar{\mathbf{X}}\mathbf{A}\mathbf{A}'||^2$ | drclust | 0.70 | 39.18 | 0.92 | -0.27 | 2.00 | 
++-----------+---------------------------------------------------------+---------------+-------------+---------+---------+-----------+-----------------------------------+ +| FKM | $||\mathbf{U}\bar{\mathbf{X}}\mathbf{A}||^2$ | clustrd | 14.31 | 2.85 | 0.28 | 36.09 | 1.99 | ++-----------+---------------------------------------------------------+---------------+-------------+---------+---------+-----------+-----------------------------------+ +| FKM | $||\mathbf{U}\bar{\mathbf{X}}\mathbf{A}||^2$ | drclust | 0.76 | 39.18 | 0.92 | -0.27 | 2.00 | ++-----------+---------------------------------------------------------+---------------+-------------+---------+---------+-----------+-----------------------------------+ +| DPCAKM | $||\mathbf{U}\bar{\mathbf{X}}\mathbf{A}||^2$ | biplotbootGUI | 47.76 | 39.15 | 0.92 | -0.25 | 2.00 | ++-----------+---------------------------------------------------------+---------------+-------------+---------+---------+-----------+-----------------------------------+ +| DPCAKM | $||\mathbf{U}\bar{\mathbf{X}}\mathbf{A}||^2$ | drclust | 1.64 | 39.15 | 0.92 | -0.25 | 2.00 | ++-----------+---------------------------------------------------------+---------------+-------------+---------+---------+-----------+-----------------------------------+ +| DKM | $||\mathbf{U}\bar{\mathbf{X}}\mathbf{V}||^2$ | drclust | 0.81 | 5.93 | 0.39 | -21.00 | \- | ++-----------+---------------------------------------------------------+---------------+-------------+---------+---------+-----------+-----------------------------------+ +| **High** | ++-----------+---------------------------------------------------------+---------------+-------------+---------+---------+-----------+-----------------------------------+ +| RKM | $||\mathbf{U}\bar{\mathbf{X}}\mathbf{A}\mathbf{A}'||^2$ | clustrd | 314.89 | 36.61 | 0.62 | -2.11 | 2.00 |
++-----------+---------------------------------------------------------+---------------+-------------+---------+---------+-----------+-----------------------------------+ +| RKM | $||\mathbf{U}\bar{\mathbf{X}}\mathbf{A}\mathbf{A}'||^2$ | drclust | 0.94 | 36.61 | 0.61 | -2.11 | 2.00 | ++-----------+---------------------------------------------------------+---------------+-------------+---------+---------+-----------+-----------------------------------+ +| FKM | $||\mathbf{U}\bar{\mathbf{X}}\mathbf{A}||^2$ | clustrd | 13.87 | 2.90 | 0.19 | 31.55 | 2.00 | ++-----------+---------------------------------------------------------+---------------+-------------+---------+---------+-----------+-----------------------------------+ +| FKM | $||\mathbf{U}\bar{\mathbf{X}}\mathbf{A}||^2$ | drclust | 1.02 | 36.61 | 0.61 | -2.11 | 2.00 | ++-----------+---------------------------------------------------------+---------------+-------------+---------+---------+-----------+-----------------------------------+ +| DPCAKM | $||\mathbf{U}\bar{\mathbf{X}}\mathbf{A}||^2$ | biplotbootGUI | 55.49 | 36.53 | 0.64 | -1.99 | 2.00 | ++-----------+---------------------------------------------------------+---------------+-------------+---------+---------+-----------+-----------------------------------+ +| DPCAKM | $||\mathbf{U}\bar{\mathbf{X}}\mathbf{A}||^2$ | drclust | 2.06 | 36.53 | 0.63 | -2.01 | 2.00 | ++-----------+---------------------------------------------------------+---------------+-------------+---------+---------+-----------+-----------------------------------+ +| DKM | $||\mathbf{U}\bar{\mathbf{X}}\mathbf{V}||^2$ | drclust | 0.84 | 58.97 | 0.29 | -24.37 | \- | ++-----------+---------------------------------------------------------+---------------+-------------+---------+---------+-----------+-----------------------------------+ + +: (#tab:T6) Comparison of joint clustering-variable reduction +methods on simulated data +::: + +For the three scenarios, the results are reported in 
+Table \@ref(tab:T6). + + +```{r simboxplots1, fig.cap="Boxplots of the Fit results in Table \\@ref(tab:T6)", echo=FALSE} +knitr::include_graphics("figures/3Fit.png") +``` + +```{r simboxplots2, fig.cap="Boxplots of the ARI results in Table \\@ref(tab:T6)", echo=FALSE} +knitr::include_graphics("figures/4ARI.png") +``` + +```{r simboxplots3, fig.cap="Boxplots of the $f^* - f$ results in Table \\@ref(tab:T6)", echo=FALSE} +knitr::include_graphics("figures/5fsf.png") +``` + +```{r simboxplots4, fig.cap="Boxplots of the $||\\mathbf{A} - \\mathbf{A}^*||^2$ metric results in Table \\@ref(tab:T6)", echo=FALSE} +knitr::include_graphics("figures/6AsA.png") +``` + +```{r simboxplots5, fig.cap="Boxplots of the runtime results in Table \\@ref(tab:T6), for the RKM", echo=FALSE} +knitr::include_graphics("figures/7runtime_RKM.png") +``` + +```{r simboxplots6, fig.cap="Boxplots of the runtime metric results in Table \\@ref(tab:T6), for DKM, DPCAKM, FKM", echo=FALSE} +knitr::include_graphics("figures/8runtime_others.png") +``` + + +Regarding the RKM, the [**drclust**](https://CRAN.R-project.org/package=drclust) and [**clustrd**](https://CRAN.R-project.org/package=clustrd) performance is very close, both in terms of the ability to recover the data (fit) and in terms of identifying the true classification of the objects. + +The FKM performs considerably better in the [**drclust**](https://CRAN.R-project.org/package=drclust) implementation in terms of fit and ARI. Considering both ARI and fit for the CDPCA algorithm, the difference between the present proposal and that of [**biplotbootGUI**](https://CRAN.R-project.org/package=biplotbootGUI) is almost absent. Referring to the CPU runtime, all of the models proposed are significantly faster than the previously available ones (RKM, FKM and KM with DPCA). For the architecture used for the experiments, the order of magnitude of such differences is specified in the last column of Table \@ref(tab:T2).
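The timing protocol behind such comparisons can be sketched with `system.time`; in the snippet below, `stats::kmeans` merely stands in for the routines being compared:

``` r
# Wall-clock time of 100 random starts of a clustering routine;
# kmeans is a stand-in for the compared implementations.
set.seed(42)
X <- matrix(rnorm(1000 * 6), 1000, 6)
elapsed <- system.time({
  best <- Inf
  for (run in 1:100) {
    fit <- kmeans(X, centers = 3, iter.max = 100)
    best <- min(best, fit$tot.withinss)
  }
})["elapsed"]
elapsed  # seconds for 100 starts on this machine
```

Because absolute timings are hardware-dependent, only ratios between implementations measured this way on the same machine are meaningful.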
+ +In general, [**drclust**](https://CRAN.R-project.org/package=drclust) shows a slight overfit, while there is no evident difference in the ability to recover the true **A**. There is no alternative implementation of the DKM, so no comparison can be made for it; however, except for the ARI, which is lower than for the other techniques, its fit is very close, showing a compelling ability to reconstruct the data. In general, with the exception of the FKM, where our proposal outperforms the implementation available in [**clustrd**](https://CRAN.R-project.org/package=clustrd), the methods are comparable in terms of both fit and ARI; nevertheless, our implementations consistently outperform all alternatives in terms of runtime. Figures \@ref(fig:simboxplots1) - \@ref(fig:simboxplots6) provide a visual summary of the results reported in Table \@ref(tab:T6), illustrating not only the central tendencies but also the variability across the 100 simulation replicates for each scenario. + +## Application on real data + +The six statistical models implemented (Table \@ref(tab:T2)) have a binary argument `print` which, if set to one, displays the main statistics at the end of the execution. In the following examples, such results are shown using the same dataset used by Vichi and Kiers (2001) and made available in [**clustrd**](https://CRAN.R-project.org/package=clustrd) (Markos et al.
+2019), named `macro`. The dataset has been standardized by setting the argument `prep=1` (the default for all the techniques). Moreover, the commands reported in each example do not specify all the arguments available for the function; for the omitted ones, the default values have been kept. + +The first example refers to the DKM (Vichi 2001). As shown, the output contains the fit, expressed as the percentage of the total deviance (i.e., $||\mathbf{X}||^2$) captured by the between deviance of the model, implementing the fit measures in Table \@ref(tab:T5). The second output is the centroid matrix $\bar{\mathbf{Y}}$, which describes the *K* centroids in the *Q*-dimensional space induced by the partition of the variables and the related variable-means. What follows are the sizes and within deviances of each unit cluster and each variable cluster. Finally, the output shows the pseudoF (Caliński and Harabasz 1974) index, which is always computed for the partition of the units. Please note that the data matrix provided to each function of the package needs to be in matrix format.
+ +``` r +# Macro dataset (Vichi & Kiers, 2001) +library(clustrd) +data(macro) +macro <- as.matrix(macro) +# DKM +> dkm <- doublekm(X = macro, K = 5, Q = 3, print = 1) + +>> Variance Explained by the DKM (% BSS / TSS): 44.1039 + +>> Centroid Matrix (Unit-centroids x Variable-centroids): + + V-Clust 1 V-Clust 2 V-Clust 3 +U-Clust 1 0.1282052 -0.31086968 -0.4224182 +U-Clust 2 0.0406931 -0.08362029 0.9046692 +U-Clust 3 1.4321347 0.51191282 -0.7813761 +U-Clust 4 -0.9372541 0.22627768 0.1175189 +U-Clust 5 1.2221058 -2.59078258 -0.1660691 + +>> Unit-clusters: + + U-Clust 1 U-Clust 2 U-Clust 3 U-Clust 4 U-Clust 5 +Size 8 4 4 3 1 +Deviance 23.934373 31.737865 5.878199 4.844466 0.680442 + + + +>> Variable-clusters: + + V-Clust 1 V-Clust 2 V-Clust 3 +Size 3 2 1 +Deviance 40.832173 23.024249 3.218923 + +>> pseudoF Statistic (Calinski-Harabasz): 2.23941 +``` + +The second example shows as output the main quantities computed by `redkm` (De Soete and Carroll 1994). Differently from the DKM, where the variable reduction is operated via averages, the RKM performs it via PCA, leading to a better overall fit and also altering the final unit-partition, as can be observed from the cluster sizes and deviances. + +In addition to what is shown in the DKM example, the RKM also provides the loading matrix, which projects the *J*-dimensional centroids onto the *Q*-dimensional subspace. Another important difference is the summary of the latent factors: this table shows the information captured by the principal components with respect to the original data. In this sense, the output makes it possible to distinguish between the loss due to the variable reduction (accounted for in this table) and the overall loss of the algorithm (which accounts for both the loss due to the reduction of the units and the one due to the reduction of the variables, and is reported in the first line of the output).
+ +``` r +# RKM +> rkm <- redkm(X = macro, K = 5, Q = 3, print = 1) + +>> Variance Explained by the RKM (% BSS / TSS): 55.0935 + +>> Matrix of Centroids (Unit-centroids x Principal Components): + + PC 1 PC 2 PC 3 +Clust 1 -1.3372534 -1.1457414 -0.6150841 +Clust 2 1.8834878 -0.0853912 -0.8907303 +Clust 3 0.5759906 0.4187003 0.3739608 +Clust 4 -0.9538864 1.2392976 0.3454186 +Clust 5 1.0417952 -2.2197178 3.0414445 + +>> Unit-clusters: + Clust 1 Clust 2 Clust 3 Clust 4 Clust 5 +Size 5 5 5 4 1 +Deviance 26.204374 9.921313 11.231563 6.112386 0.418161 + +>> Loading Matrix (Manifest Variables x Latent Variables): + + PC 1 PC 2 PC 3 +GDP -0.5144915 -0.04436269 0.08985135 +LI -0.2346937 -0.01773811 -0.86115069 +UR -0.3529363 0.53044730 0.28002534 +IR -0.4065339 -0.42022401 -0.17016203 +TB 0.1975072 0.69145440 -0.36710245 +NNS 0.5927684 -0.24828525 -0.09062404 + +>> Summary of the latent factors: + + Explained Variance Expl. Var. (%) Cumulated Var. Cum. Var (%) +PC 1 1.699343 28.322378 1.699343 28.322378 +PC 2 1.39612 23.268663 3.095462 51.591041 +PC 3 1.182372 19.706208 4.277835 71.297249 + +>> pseudoF Statistic (Calinski-Harabasz): 4.29923 +``` + +The `factkm` (Vichi and Kiers 2001) has the same output structure as the `redkm`. It exhibits, for the same data and hyperparameters, a similar fit (overall and variable-wise). However, the unit-partition, as well as the latent variables, are different. This difference can be (at least) partially explained by the difference in the objective function, which is most evident in the assignment step.
+ +``` r +# factorial KM +> fkm <- factkm(X = macro, K = 5, Q = 3, print = 1, rot = 1) + +>> Variance Explained by the FKM (% BSS / TSS): 55.7048 + +>> Matrix of Centroids (Unit-centroids x Principal Components): + + PC 1 PC 2 PC 3 +Clust 1 -0.7614810 2.16045496 -1.21025666 +Clust 2 1.1707159 -0.08840133 -0.29876729 +Clust 3 -0.9602731 -1.33141866 0.02370092 +Clust 4 1.0782934 1.17952330 3.59632116 +Clust 5 -1.7634699 0.65075735 0.46486440 + +>> Unit-clusters: + Clust 1 Clust 2 Clust 3 Clust 4 Clust 5 +Size 9 5 3 2 1 +Deviance 6.390576 2.827047 5.018935 3.215995 0 + +>> Loading Matrix (Manifest Variables x Latent Variables): + + PC 1 PC 2 PC 3 +GDP -0.6515084 -0.1780021 0.37482509 +LI -0.3164139 0.1809559 -0.68284917 +UR -0.2944864 -0.5235492 0.01561022 +IR -0.3316254 0.5884434 -0.22101070 +TB 0.1848264 -0.5367239 -0.57166730 +NNS 0.4945307 0.1647067 0.13164438 + +>> Summary of the latent factors: + + Explained Variance Expl. Var. (%) Cumulated Var. Cum. Var (%) +PC 1 1.68496 28.082675 1.68496 28.082675 +PC 2 1.450395 24.173243 3.135355 52.255917 +PC 3 1.079558 17.992635 4.214913 70.248552 + +>> pseudoF Statistic (Calinski-Harabasz): 4.26936 +``` + +`dpcakm` (Vichi and Saporta 2009) shows the same output as RKM and FKM. The partition of the variables, described by the $\mathbf{V}$ term in (\@ref(eq:cdpca4)) - (\@ref(eq:cdpca5)), is readable from the loading matrix, considering a $1$ for each non-zero value. For the `macro` dataset, the additional constraint $\mathbf{A} = \mathbf{B}\mathbf{V}$ does not cause a significant decrease in the objective function. The clusters, however, differ from the previous cases as well.
+
+``` r
+# K-means DPCA
+> cdpca <- dpcakm(X = macro, K = 5, Q = 3, print = 1)
+
+>> Variance Explained by the DPCAKM (% BSS / TSS): 54.468
+
+>> Matrix of Centroids (Unit-centroids x Principal Components):
+
+ PC 1 PC 2 PC 3
+Clust 1 0.6717536 0.01042978 -2.7309458
+Clust 2 3.7343724 -1.18771685 0.6320673
+Clust 3 -0.6729575 -1.80822745 0.7239541
+Clust 4 -0.2496002 1.54537904 0.5263009
+Clust 5 -0.1269212 -0.12464388 -0.1748282
+
+>> Unit-clusters:
+ Clust 1 Clust 2 Clust 3 Clust 4 Clust 5
+Size 7 6 4 2 1
+Deviance 3.816917 2.369948 1.14249 4.90759 0
+
+>> Loading Matrix (Manifest Variables x Latent Variables):
+
+ PC 1 PC 2 PC 3
+GDP 0.5567605 0.0000000 0
+LI 0.0000000 0.7071068 0
+UR 0.5711396 0.0000000 0
+IR 0.0000000 0.0000000 1
+TB 0.0000000 0.7071068 0
+NNS -0.6031727 0.0000000 0
+
+>> Summary of the latent factors:
+ Explained Variance Expl. Var. (%) Cumulated Var. Cum. Var (%)
+PC 1 1 16.666667 1 16.666667
+PC 2 1.703964 28.399406 2.703964 45.066073
+PC 3 1.175965 19.599421 3.87993 64.665494
+
+>> pseudoF Statistic (Calinski-Harabasz): 3.26423
+```
+
+For `dispca` (Vichi and Saporta 2009), the output is mostly similar to
+the ones already shown (except for the unit-clustering part).
+Nevertheless, because the focus here is exclusively on the
+variable-reduction process, some additional information is reported in
+the summary of the latent factors. Indeed, because a single principal
+component summarises each subset of manifest variables, the variance of
+the second component of each subset, along with Cronbach's (1951) Alpha
+index, is computed, so that the user knows when the evidence supports
+this dimensionality-reduction strategy. As mentioned, this function,
+like `dpcakm` and `disfa`, allows the user to constrain a subset of the
+*J* variables to belong to the same cluster.
In the example that follows, the first two manifest
+variables are constrained to contribute to the same principal component
+(which is confirmed by the output `A`). Note that the manifest
+variables whose indices (column-position in the data matrix) correspond
+to the zeros in `constr` remain unconstrained.
+
+``` r
+# DPCA
+# Impose GDP and LI to be in the same cluster
+> out <- dispca(X = macro, Q = 3, print = 1, constr = c(1,1,0,0,0,0))
+
+>> Variance explained by the DPCA (% BSS / TSS)= 63.9645
+
+>> Loading Matrix (Manifest Variables x Latent variables)
+
+ PC 1 PC 2 PC 3
+GDP 0.0000000 0.0000000 0.7071068
+LI 0.0000000 0.0000000 0.7071068
+UR -0.7071068 0.0000000 0.0000000
+IR 0.0000000 -0.7071068 0.0000000
+TB 0.0000000 0.7071068 0.0000000
+NNS 0.7071068 0.0000000 0.0000000
+
+>> Summary of the latent factors:
+ Explained Variance Expl. Var. (%) Cumulated Var.
+PC 1 1.388294 23.13824 1.388294
+PC 2 1.364232 22.73721 2.752527
+PC 3 1.085341 18.08902 3.837868
+ Cum. Var (%) Var. 2nd component Cronbach's Alpha
+PC 1 23.13824 0.6117058 -1.269545
+PC 2 45.87544 0.6357675 -1.145804
+PC 3 63.96447 0.9146585 0.157262
+```
+
+Because `disfa` (Vichi 2017) assumes an underlying probabilistic model,
+it also allows additional evaluation metrics and statistics. The
+overall objective function is not directly comparable with the other
+ones, and is expressed in absolute (not relative, as in the previous
+cases) terms. The $\chi^2$ (`X2`), along with `BIC`, `AIC` and `RMSEA`,
+allow a robust evaluation of the results in terms of fit/parsimony. In
+addition to the DPCA output, for each variable the function displays
+the communality with the factors, providing a standard error as well as
+an associated *p*-value for the estimate.
+
+By comparing the loading matrix in the DPCA case with the DFA one, it
+is possible to assess the similarity of the latent variables.
Part of the difference can be justified (besides the well-known
+distinctions between PCA and FA) by the method used to compute each
+factor. While in all the previous cases the eigendecomposition has been
+employed for this purpose, DFA uses the power iteration method to
+compute the loading matrix (Hotelling 1933).
+
+``` r
+# disjoint FA
+> out <- disfa(X = macro, Q = 3, print = 1)
+>> Discrepancy of DFA: 0.296499
+
+>> Summary statistics:
+
+ Unknown Parameters Chi-square Degrees of Freedom BIC
+ 9 4.447531 12 174.048102
+ AIC RMSEA
+ 165.086511 0.157189
+
+>> Loading Matrix (Manifest Variables x Latent Variables)
+
+ Factor 1 Factor 2 Factor 3
+GDP 0.5318618 0 0.0000000
+LI 0.0000000 1 0.0000000
+UR 0.5668542 0 0.0000000
+IR 0.0000000 0 0.6035160
+TB 0.0000000 0 -0.6035152
+NNS -0.6849942 0 0.0000000
+
+>> Summary of the latent factors:
+
+ Explained Variance Expl. Var. (%) Cum. Var Cum. Var (%)
+Factor 1 1.0734177 17.89029 1.073418 17.89029
+Factor 2 1.0000000 16.66667 2.073418 34.55696
+Factor 3 0.7284622 12.14104 2.801880 46.69800
+ Var. 2nd component Cronbach's Alpha
+Factor 1 0.7001954 -0.6451803
+Factor 2 0.0000000 1.0000000
+Factor 3 0.6357675 -1.1458039
+
+>> Detailed Manifest-variable - Latent-factor relationships
+
+ Associated Factor Corr. Coeff. Std. Error Pr(p>|Z|)
+GDP 1 0.5318618 0.1893572 0.0157923335
+LI 2 1.0000000 0.0000000 0.0000000000
+UR 1 0.5668542 0.1842113 0.0091557523
+IR 3 0.6035160 0.1782931 0.0048411219
+TB 3 -0.6035152 0.1782932 0.0048411997
+NNS 1 -0.6849942 0.1629084 0.0008606488
+ Var. Error Communality
+GDP 0.7171230 0.2828770
+LI 0.0000000 1.0000000
+UR 0.6786764 0.3213236
+IR 0.6357684 0.3642316
+TB 0.6357695 0.3642305
+NNS 0.5307830 0.4692170
+```
+
+In practice, the `K` and `Q` hyper-parameters are usually not known a
+priori.
In such cases, a possible tool for investigating
+plausible values of `Q` is the Kaiser criterion (Kaiser 1960),
+implemented in `R` as `kaiserCrit`. It takes the dataset as its single
+argument and outputs a message, as well as a scalar indicating the
+optimal number of components according to this rule.
+
+``` r
+# Kaiser criterion for the choice of Q, the number of latent components
+> kaiserCrit(X = macro)
+
+The number of components suggested by the Kaiser criterion is: 3
+```
+
+For selecting the number of clusters, `K`, one of the most commonly used
+indices is the *pseudoF* statistic, which, however, tends to
+underestimate the optimal number of clusters. To address this
+limitation, a \"relaxed\" version, referred to as `apseudoF`, has been
+implemented. The `apseudoF` procedure computes the standard `pseudoF`
+index over a range of possible values up to `maxK`. If a higher value of
+`K` yields a pseudoF that is within `tol` $\cdot$ pseudoF of the
+maximum value suggested by the plain pseudoF, then `apseudoF`
+selects this alternative `K` as the optimal number of clusters.
+Additionally, it generates a plot of the pseudoF values computed across
+the specified *K* range. Given the hybrid nature of the proposed
+methods, the function also requires specifying the clustering model to
+be used: 1 = `doublekm`, 2 = `redkm`, 3 = `factkm`, 4 = `dpcakm`.
+Furthermore, the number of components, `Q`, must be provided, as it also
+influences the final quality of the resulting partition.
+
+``` r
+> apseudoF(X = macro, maxK=10, tol = 0.05, model = 2, Q = 3)
+The optimal number of clusters based on the pseudoF criterion is: 5
+```
+
+```{r pF-fkm, fig.cap="Interval-pseudoF polygonal chain", echo=FALSE}
+knitr::include_graphics("figures/9pFfactkm.png")
+```
+
+While this index was conceived for one-mode clustering methods,
+Rocci and Vichi (2008) extended it to two-mode clustering,
+making it applicable to methods like `doublekm`.
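For reference, the plain one-mode index amounts to the Calinski-Harabasz ratio, the between-cluster deviance per degree of freedom divided by the within-cluster deviance per degree of freedom. A minimal base-R sketch on toy data (the textbook formula is assumed here; this is not the package's implementation):

``` r
# pseudoF (Calinski-Harabasz): (BSS / (K - 1)) / (WSS / (n - K))
pseudoF_plain <- function(X, cl) {
  X <- as.matrix(X); n <- nrow(X); K <- length(unique(cl))
  grand <- colMeans(X)
  wss <- 0; bss <- 0
  for (k in unique(cl)) {
    Xk <- X[cl == k, , drop = FALSE]
    ck <- colMeans(Xk)
    wss <- wss + sum(sweep(Xk, 2, ck)^2)          # within-cluster deviance
    bss <- bss + nrow(Xk) * sum((ck - grand)^2)   # between-cluster deviance
  }
  (bss / (K - 1)) / (wss / (n - K))
}

set.seed(2)
X  <- rbind(matrix(rnorm(30, mean = 0), 15, 2),
            matrix(rnorm(30, mean = 4), 15, 2))
cl <- kmeans(X, 2)$cluster
pseudoF_plain(X, cl)  # large for well-separated clusters
```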
The `dpseudoF`
+function implements it; besides the dataset, one provides the
+maximum `K` and `Q` values.
+
+``` r
+> dpseudoF(X = macro, maxK = 10, maxQ = 5)
+ Q = 2 Q = 3 Q = 4 Q = 5
+K = 2 38.666667 22.800000 16.000000 12.222222
+K = 3 22.800000 13.875000 9.818182 7.500000
+K = 4 16.000000 9.818182 6.933333 5.263158
+K = 5 12.222222 7.500000 5.263158 3.958333
+K = 6 9.818182 6.000000 4.173913 3.103448
+K = 7 8.153846 4.950000 3.407407 2.500000
+K = 8 6.933333 4.173913 2.838710 2.051282
+K = 9 6.000000 3.576923 2.400000 1.704545
+K = 10 5.263158 3.103448 2.051282 1.428571
+```
+
+Here, the indices of the maximum value within the matrix are chosen as
+the best `Q` and `K` values.
+
+Just by providing the centroid matrix, one can check how the centroids
+are related. Such information is usually not provided by partitive
+clustering methods, but rather by hierarchical ones. Nevertheless,
+it is always possible to construct a distance matrix based on the
+centroids and represent it via a dendrogram, using an arbitrary
+distance. The `centree` function does exactly this, using the Ward
+(1963) distance, which corresponds to the squared Euclidean one. In
+practice, one provides as an argument the output of one of the four
+clustering methods.
+
+``` r
+> out <- factkm(X = macro, K = 10, Q = 3)
+> centree(drclust_out = out)
+```
+
+```{r fig-centree, fig.cap="Dendrogram of a 10-centroid solution", echo=FALSE}
+knitr::include_graphics("figures/10centreedpca10m.png")
+```
+
+
+If, instead, one wants to visually assess the quality of the obtained
+partition, another instrument is typically used for this purpose: the
+silhouette (Rousseeuw 1987), which summarizes the quality numerically
+and also represents it graphically.
By employing
+[**cluster**](https://CRAN.R-project.org/package=cluster) for the
+computational part and
+[**factoextra**](https://CRAN.R-project.org/package=factoextra) for the
+graphical part, `silhouette` takes as arguments the output of one of
+the four [**drclust**](https://CRAN.R-project.org/package=drclust)
+clustering methods and the dataset, returning the results of the two
+functions with just one command.
+
+``` r
+# Note: The same data must be provided to dpcakm and silhouette
+> out <- dpcakm(X = macro, K = 5, Q = 3)
+> silhouette(X = macro, drclust_out = out)
+```
+
+```{r silhouette, fig.cap="Silhouette of a DPCA KM solution", echo=FALSE}
+knitr::include_graphics("figures/11silhouettek5q3.png")
+```
+
+As can be seen in Figure \@ref(fig:silhouette), the average silhouette width is also
+displayed as a scalar above the plot.
+
+A purely graphical tool used to assess the dis/homogeneity of the groups
+is the `heatmap`. By employing the
+[**pheatmap**](https://CRAN.R-project.org/package=pheatmap) library
+(Kolde 2019) and the result of `doublekm`, `redkm`, `factkm` or
+`dpcakm`, the function orders the observations within each cluster in
+ascending order of their distance to the cluster to which they have
+been assigned. After doing so for each group, the groups are sorted by
+the distance between their centroid and the grand mean (i.e., the mean
+of all observations). The `heatm` function produces this result.
+Figure \@ref(fig:heatmap-dkm) shows its graphical output.
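The ordering just described can be sketched with base R and `stats::kmeans` (toy data; this mirrors the description above, not `heatm`'s actual internals):

``` r
# Within each cluster, sort units by distance to their assigned centroid;
# then sort the clusters by the distance of their centroid to the grand mean.
set.seed(3)
X  <- matrix(rnorm(40), 20, 2)
km <- kmeans(X, 4)
grand <- colMeans(X)

d_unit <- sqrt(rowSums((X - km$centers[km$cluster, ])^2))  # unit-to-centroid
d_cent <- sqrt(rowSums(sweep(km$centers, 2, grand)^2))     # centroid-to-mean

cl_order   <- order(d_cent)  # display order of the clusters
unit_order <- unlist(lapply(cl_order, function(k)
  which(km$cluster == k)[order(d_unit[km$cluster == k])]))
X_sorted <- X[unit_order, ]  # row order ready for a heatmap
```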
+
+``` r
+# Note: the same data must be provided to doublekm and heatm
+> out <- doublekm(X = macro, K = 5, Q = 3)
+> heatm(X = macro, drclust_out = out)
+```
+
+
+```{r heatmap-dkm, fig.cap="Heatmap of a double-KM solution", echo=FALSE}
+knitr::include_graphics("figures/12dkmk5q3heatm.png")
+```
+
+
+Biplots and parallel coordinates plots can be obtained from the
+output of the techniques in the proposed package by means of a few
+instructions, using libraries available on `CRAN`, such as:
+[**ggplot2**](https://CRAN.R-project.org/package=ggplot2) (Wickham et
+al. 2024), `grid` (now a base package),
+[**dplyr**](https://CRAN.R-project.org/package=dplyr) (Wickham et al.
+2023) and [**GGally**](https://CRAN.R-project.org/package=GGally)
+(Schloerke et al. 2024). Therefore, the user can easily visualize the
+subspaces provided by the statistical techniques. In future versions of
+the package, the two functions will be available as built-ins.
+Currently, for the biplot, we have:
+
+``` r
+library(ggplot2)
+library(grid)
+library(dplyr)
+
+out <- factkm(macro, K = 2, Q = 2, Rndstart = 100)
+
+# Prepare data
+Y <- as.data.frame(macro %*% out$A); colnames(Y) <- c("Dim1", "Dim2")
+Y$cluster <- as.factor(cluster(out$U))
+
+arrow_scale <- 5
+A <- as.data.frame(out$A)[, 1:2] * arrow_scale
+colnames(A) <- c("PC1", "PC2")
+A$var <- colnames(macro)
+
+# Axis limits
+lims <- range(c(Y$Dim1, Y$Dim2, A$PC1, A$PC2)) * 1.2
+
+# Circle
+circle <- data.frame(x = cos(seq(0, 2*pi, length.out = 200)) * arrow_scale,
+ y = sin(seq(0, 2*pi, length.out = 200)) * arrow_scale)
+
+ggplot(Y, aes(x = Dim1, y = Dim2, color = cluster)) +
+ geom_point(size = 2) +
+ geom_segment(
+ data = A, aes(x = 0, y = 0, xend = PC1, yend = PC2),
+ arrow = arrow(length = unit(0.2, "cm")), inherit.aes = FALSE, color = "gray40"
+ ) +
+ geom_text(
+ data = A, aes(x = PC1, y = PC2, label = var), inherit.aes = FALSE,
+ hjust = 1.1, vjust = 1.1, size = 3
+ ) +
+ geom_path(data = circle,
aes(x = x, y = y), inherit.aes = FALSE,
+ linetype = "dashed", color = "gray70") +
+ coord_fixed(xlim = lims, ylim = lims) +
+ labs(x = "Component 1", y = "Component 2", title = "Biplot") +
+ theme_minimal()
+```
+
+which leads to the result shown in Figure
+\@ref(fig:boxplot1).
+
+```{r boxplot1, fig.cap="Biplot of a FKM solution", echo=FALSE}
+knitr::include_graphics("figures/13biplot.png")
+```
+
+
+By using essential information in the output provided by `factkm`, we
+are able to see the cluster of each observation, represented in the
+estimated subspace induced by $\mathbf{A}$, as well as the relationships
+between observed and latent variables via the arrows.
+
+In order to obtain the parallel coordinates plot, a few instructions
+are sufficient, based on the same type of output as a starting point.
+Note that the scores `Y` must be recomputed for the new solution.
+
+``` r
+library(GGally)
+out <- factkm(macro, K = 3, Q = 2, Rndstart = 100)
+Y <- as.data.frame(macro %*% out$A)
+Y$cluster <- as.factor(cluster(out$U))
+ggparcoord(
+ data = Y, columns = 1:(ncol(Y)-1),
+ groupColumn = "cluster", scale = "uniminmax",
+ showPoints = FALSE, alphaLines = 0.5
+) +
+ theme_minimal() +
+ labs(title = "Parallel Coordinate Plot",
+ x = "Variables", y = "Normalized Value")
+```
+
+For FKM applied to the `macro` dataset, the output is reported in
+Figure \@ref(fig:parcoord).
+
+
+```{r parcoord, fig.cap="Parallel coordinates plot of a FKM solution", echo=FALSE}
+knitr::include_graphics("figures/14parcoord.png")
+```
+
+
+## Conclusions {#Conclusions}
+
+This work presents an R library that implements techniques of joint
+dimensionality reduction and clustering. Some of them are already
+implemented by other packages. In general, the performance of the
+proposed implementations and the earlier ones is very close, except for
+the FKM, where the new one is always better on the metrics considered
+here.
As an element of novelty, the empty-cluster issue that may
+occur in the estimation process has been addressed by applying 2-means
+to the cluster with the highest deviance, preserving the monotonicity of
+the algorithm and providing slightly better results, at a higher
+computational cost.
+
+The implementations of the two dimensionality-reduction methods,
+`dispca` and `disfa`, as well as `doublekm`, offered by our library are
+novel in the sense that they have no previous implementation in R.
+Besides the methodological difference between these last two, the latent
+variables are computed differently: the former uses the well-known
+eigendecomposition, while the latter adopts the power method. In
+general, by implementing all the models in C/C++, the speed advantage
+has been shown to be remarkable in all the existing comparisons.
+These improvements allow the application of the techniques
+to relatively large datasets, obtaining results in reasonable
+amounts of time. Some additional functions have been implemented to
+help in choosing the values of the hyperparameters. Additionally, they
+can also be used as an assessment tool to evaluate the quality of the
+results provided by the implementations.
+:::::::::
+
+:::::::::::::::::::::::::::::::::::::::::: {#refs .references .csl-bib-body .hanging-indent}
+::: {#ref-calinski1974 .csl-entry}
+Caliński, T., and J. Harabasz. 1974. "A Dendrite Method for Cluster
+Analysis." *Communications in Statistics* 3 (1): 1--27.
+.
+:::
+
+::: {#ref-cattell1965 .csl-entry}
+Cattell, R. B. 1965. "Factor Analysis: An Introduction to Essentials I.
+The Purpose and Underlying Models." *Biometrics* 21 (1): 190--215.
+.
+:::
+
+::: {#ref-charrad2014 .csl-entry}
+Charrad, M., N. Ghazzali, V. Boiteau, and A. Niknafs. 2014. "NbClust: An
+R Package for Determining the Relevant Number of Clusters in a Data
+Set." *Journal of Statistical Software* 61 (6): 1--36.
+.
+::: + +::: {#ref-cronbach1951 .csl-entry} +Cronbach, Lee J. 1951. "Coefficient Alpha and the Internal Structure of +Tests." *Psychometrika* 16 (3): 297--334. +. +::: + +::: {#ref-desoete1994 .csl-entry} +De Soete, G., and J. D. Carroll. 1994. "K-Means Clustering in a +Low-Dimensional Euclidean Space." Chap. 24 in *New Approaches in +Classification and Data Analysis*, edited by E. Diday, Y. Lechevallier, +M. Schader, P. Bertrand, and B. Burtschy. Springer. +. +::: + +::: {#ref-desarbo1990 .csl-entry} +DeSarbo, W. S., K. Jedidi, K. Cool, and D. Schendel. 1990. "Simultaneous +Multidimensional Unfolding and Cluster Analysis: An Investigation of +Strategic Groups." *Marketing Letters* 2: 129--46. +. +::: + +::: {#ref-dray2007 .csl-entry} +Dray, S., and A.-B. Dufour. 2007. "The Ade4 Package: Implementing the +Duality Diagram for Ecologists." *Journal of Statistical Software* 22 +(4): 1--20. . +::: + +::: {#ref-eddelbuettel2011 .csl-entry} +Eddelbuettel, D., and R. Francois. 2011. "Rcpp: Seamless R and C++ +Integration." *Journal of Statistical Software* 40 (8): 1--18. +. +::: + +::: {#ref-eddelbuettel2014 .csl-entry} +Eddelbuettel, D., and C. Sanderson. 2014. "RcppArmadillo: Accelerating R +with High-Performance C++ Linear Algebra." *Computational Statistics and +Data Analysis* 71: 1054--63. +. +::: + +::: {#ref-hotelling1933 .csl-entry} +Hotelling, H. 1933. "Analysis of a Complex of Statistical Variables into +Principal Components." *Journal of Educational Psychology* 24: 417--41, +and 498--520. . +::: + +::: {#ref-HubertArabie .csl-entry} +Hubert, L., and P. Arabie. 1985. "Comparing Partitions." *Journal of +Classification* 2 (1): 193--218. . +::: + +::: {#ref-kaiser1960 .csl-entry} +Kaiser, Henry F. 1960. "The Application of Electronic Computers to +Factor Analysis." *Educational and Psychological Measurement* 20 (1): +141--51. . +::: + +::: {#ref-kassambara2022 .csl-entry} +Kassambara, A. 2022. 
*Factoextra: Extract and Visualize the Results of +Multivariate Data Analyses*. R package version 1.0.7. +. +::: + +::: {#ref-kolde2019 .csl-entry} +Kolde, R. 2019. *Pheatmap: Pretty Heatmaps*. R package 1.0.12. +. +::: + +::: {#ref-lawley1962 .csl-entry} +Lawley, D. N., and A. E. Maxwell. 1962. "Factor Analysis as a +Statistical Method." *Journal of the Royal Statistical Society. Series D +(The Statistician)* 12 (3): 209--29. . +::: + +::: {#ref-le2008 .csl-entry} +Lê, S., J. Josse, and F. Husson. 2008. "FactoMineR: An R Package for +Multivariate Analysis." *Journal of Statistical Software* 25 (1): 1--18. +. +::: + +::: {#ref-maechler2023 .csl-entry} +Maechler, M., P. Rousseeuw, A. Struyf, M. Hubert, and K. Hornik. 2023. +*Cluster: Cluster Analysis Basics and Extensions*. R package version +2.1.6. . +::: + +::: {#ref-markos2019 .csl-entry} +Markos, A., A. I. D'Enza, and M. van de Velden. 2019. "Beyond Tandem +Analysis: Joint Dimension Reduction and Clustering in R." *Journal of +Statistical Software* 91 (10): 1--24. +. +::: + +::: {#ref-mcqueen1967 .csl-entry} +McQueen, J. 1967. "Some Methods for Classification and Analysis of +Multivariate Observations." *Computer and Chemistry* 4: 257--72. +. +::: + +::: {#ref-nietolibreiro2023 .csl-entry} +Nieto Librero, A. B., and A. Freitas. 2023. *biplotbootGUI: Bootstrap on +Classical Biplots and Clustering Disjoint Biplot*. +. +::: + +::: {#ref-pardo2007 .csl-entry} +Pardo, C. E., and P. C. Del Campo. 2007. "Combination of Factorial +Methods and Cluster Analysis in R: The Package FactoClass." *Revista +Colombiana de Estadística* 30 (2): 231--45. +. +::: + +::: {#ref-pearson1901 .csl-entry} +Pearson, K. 1901. "On Lines and Planes of Closest Fit to Systems of +Points in Space." *The London, Edinburgh, and Dublin Philosophical +Magazine and Journal of Science* 2 (11): 559--72. +. +::: + +::: {#ref-R .csl-entry} +R Core Team. 2015. *R: A Language and Environment for Statistical +Computing*. 
R Foundation for Statistical Computing. +. +::: + +::: {#ref-rand1971 .csl-entry} +Rand, W. M. 1971. "Objective Criteria for the Evaluation of Clustering +Methods." *Journal of the American Statistical Association* 66 (336): +846--50. . +::: + +::: {#ref-rocci2008 .csl-entry} +Rocci, R., and M. Vichi. 2008. "Two-Mode Multi-Partitioning." +*Computational Statistics & Data Analysis* 52 (4): 1984--2003. +. +::: + +::: {#ref-ROUSSEEUW198753 .csl-entry} +Rousseeuw, Peter J. 1987. "Silhouettes: A Graphical Aid to the +Interpretation and Validation of Cluster Analysis." *Journal of +Computational and Applied Mathematics* 20: 53--65. +. +::: + +::: {#ref-ggally .csl-entry} +Schloerke, B., D. Cook, H. Hofmann, et al. 2024. *GGally: Extension to +'Ggplot2'*. R package version 2.1.2. +. +::: + +::: {#ref-timmerman2010 .csl-entry} +Timmerman, Marieke E., Eva Ceulemans, Henk A. L. Kiers, and Maurizio +Vichi. 2010. "Factorial and Reduced k-Means Reconsidered." +*Computational Statistics & Data Analysis* 54 (7): 1858--71. +. +::: + +::: {#ref-maurizio2001a .csl-entry} +Vichi, M. 2001. "Double k-Means Clustering for Simultaneous +Classification of Objects and Variables." Chap. 6 in *Advances in +Classification and Data Analysis*, edited by S. Borra, R. Rocci, M. +Vichi, and M. Schader. Springer. +. +::: + +::: {#ref-vichi2017 .csl-entry} +Vichi, M. 2017. "Disjoint Factor Analysis with Cross-Loadings." +*Advances in Data Analysis and Classification* 11 (4): 563--91. +. +::: + +::: {#ref-vichi2001a .csl-entry} +Vichi, Maurizio, and Henk A. L. Kiers. 2001. "Factorial k-Means Analysis +for Two-Way Data." *Computational Statistics & Data Analysis* 37 (1): +49--64. . +::: + +::: {#ref-vichi2009 .csl-entry} +Vichi, Maurizio, and Gilbert Saporta. 2009. "Clustering and Disjoint +Principal Component Analysis." *Computational Statistics & Data +Analysis* 53 (8): 3194--208. +. +::: + +::: {#ref-VichiVicariKiers .csl-entry} +Vichi, M., D. Vicari, and Henk A. L. Kiers. 2019. 
"Clustering and +Dimension Reduction for Mixed Variables." *Behaviormetrika*, 243--69. +. +::: + +::: {#ref-revelle2017 .csl-entry} +W. R. Revelle. 2017. *Psych: Procedures for Personality and +Psychological Research*. +. +::: + +::: {#ref-ward1963 .csl-entry} +Ward, J. H. 1963. "Hierarchical Grouping to Optimize an Objective +Function." *Journal of the American Statistical Association* 58 (301): +236--44. . +::: + +::: {#ref-ggplot2 .csl-entry} +Wickham, H., W. Chang, L. Henry, et al. 2024. *Ggplot2: Elegant Graphics +for Data Analysis*. R package version 3.4.4. +. +::: + +::: {#ref-dplyr .csl-entry} +Wickham, H., R. François, L. Henry, and K. Müller. 2023. *Dplyr: A +Grammar of Data Manipulation*. R package version 1.1.4. +. +::: + +::: {#ref-yamamoto2014 .csl-entry} +Yamamoto, M., and H. Hwang. 2014. "A General Formulation of Cluster +Analysis with Dimension Reduction and Subspace Separation." +*Behaviormetrika* 41: 115--29. . +::: + +::: {#ref-zou2006 .csl-entry} +Zou, H., T. Hastie, and R. Tibshirani. 2006. "Sparse Principal Component +Analysis." *Journal of Computational and Graphical Statistics* 15 (2): +265--86. https://doi.org/. +::: +:::::::::::::::::::::::::::::::::::::::::: diff --git a/_articles/RJ-2025-046/RJ-2025-046.html b/_articles/RJ-2025-046/RJ-2025-046.html new file mode 100644 index 0000000000..5c3a7d3f00 --- /dev/null +++ b/_articles/RJ-2025-046/RJ-2025-046.html @@ -0,0 +1,4974 @@ + + + + + + + + + + + + + + + + + + + + + + drclust: An R Package for Simultaneous Clustering and Dimensionality Reduction + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
    +

    drclust: An R Package for Simultaneous Clustering and Dimensionality Reduction

    + + + +

    The primary objective of simultaneous methodologies for clustering and +variable reduction is to identify both the optimal partition of units +and the optimal subspace of variables, all at once. The optimality is +typically determined using least squares or maximum likelihood +estimation methods. These simultaneous techniques are particularly +useful when working with Big Data, where the reduction (synthesis) is +essential for both units and variables. Furthermore, a secondary +objective of reducing variables through a subspace is to enhance the +interpretability of the latent variables identified by the subspace +using specific methodologies. The drclust package implements double +K-means (KM), reduced KM, and factorial KM to address the primary +objective. KM with disjoint principal components addresses both the +primary and secondary objectives, while disjoint principal component +analysis and disjoint factor analysis address the latter, producing +the sparsest loading matrix. The models are implemented in C++ for +faster execution, processing large data matrices in a reasonable +amount of time.

    +
    + + + +
    +
    +

    1 Introduction

    +

Cluster analysis is the process of identifying homogeneous groups of
+units in the data, so that units within a cluster exhibit a low degree
+of dissimilarity with each other, while units in different clusters
+exhibit a high degree of dissimilarity. When dealing with large or
+extremely large data matrices,
+often referred to as Big Data, the task of assessing these
+dissimilarities becomes computationally intensive due to the sheer
+volume of units and variables involved. To manage this vast amount of
+information, it is essential to employ statistical techniques that
+synthesize and highlight the most significant aspects of the data.
+Typically, this involves dimensionality reduction for both units and
+variables to efficiently summarize the data.

    +

While cluster analysis synthesizes information across the rows of the
+data matrix, variable reduction operates on the columns, aiming to
+summarize the features and, ideally, facilitate their interpretation.
+This key process involves extracting a subspace from the full space
+spanned by the manifest variables, maintaining the principal informative
+content. The process allows for the synthesis of common information
+mainly among subsets of manifest variables, which represent concepts not
+directly observable. As a result, subspace-based variable reduction
+identifies a few uncorrelated latent variables that mainly capture
+common relationships within these subsets. When using techniques like
+Factor Analysis (FA) or Principal Component Analysis (PCA) for this
+purpose, interpreting the resulting factors or components can be
+challenging, particularly when variables significantly load onto
+multiple factors, a situation known as cross-loading. Therefore, a
+simpler structure in the loading matrix, focusing on the primary
+relationship between each variable and its related factor, becomes
+desirable for clarity and ease of interpretation. Furthermore, the
+latent variables derived from PCA or FA do not provide a unique
+solution. An equivalent model fit can be achieved by applying an
+orthogonal rotation to the component axes. This aspect of non-uniqueness
+is often exploited in practice through Varimax rotation, which is
+designed to improve the interpretability of latent variables, without
+affecting the fit of the analysis. The rotation promotes a simpler
+structure in the loading matrix; however, rotation does not always
+ensure enhanced interpretability.
An alternative approach has been
+proposed by Vichi and Saporta (2009) and Vichi (2017), with Disjoint
+Principal Component Analysis (DPCA) and Disjoint FA (DFA), which
+construct each component/factor from a distinct subset of manifest
+variables rather than from all available variables, while still
+optimizing the same estimation criterion as PCA and FA, respectively.

    +

    It is important to note that data matrix reduction for both rows and +columns is often performed without specialized methodologies by +employing a "tandem analysis." This involves sequentially applying two +methods, such as using PCA or FA for variable reduction, followed by +Cluster Analysis using KM on the resulting factors. Alternatively, one +could start with Cluster Analysis and then proceed to variable +reduction. The outcomes of these two tandem analyses differ since each +approach optimizes distinct objective functions, one before the other. +For instance, when PCA is applied first, the components maximize the +total variance of the manifest variables. However, if the manifest +variables include high-variance variables that lack a clustering +structure, these will be included in the components, even though they +are not necessary for KM, which focuses on explaining only the variance +between clusters. As a result, sequentially optimizing two different +objectives may lead to sub-optimal solutions. In contrast, when +combining KM with PCA or FA in a simultaneous approach, a single +integrated objective function is utilized. This function aims to +optimize both the clustering partition and the subspace simultaneously. +The optimization is typically carried out using an Alternating Least +Squares (ALS) algorithm, which updates the partition for the current +subspace in one step and the subspace for the current partition in the +next. This iterative process ensures convergence to a solution that +represents at least a local minimum of the integrated objective +function. In comparison, tandem analysis, which follows a sequential +approach (e.g., PCA followed by KM), does not guarantee joint +optimization. 
One potential limitation of this sequential method is that
+the initial optimization through PCA may obscure relevant information
+for the subsequent step of Cluster Analysis or emphasize irrelevant
+patterns, ultimately leading to sub-optimal solutions, as noted by
+DeSarbo et al. (1990). Indeed, the simultaneous strategy has been shown
+to be effective in various studies, such as De Soete and Carroll (1994),
+Vichi and Kiers (2001), Vichi (2001), Vichi and Saporta (2009), Rocci
+and Vichi (2008), Timmerman et al. (2010), and Yamamoto and Hwang (2014).

    +

In order to spread access to these techniques and their use, software
+implementations are needed. Within the R Core Team (2015) environment,
+there are different libraries available to perform dimensionality
+reduction techniques. Indeed, the plain versions of KM, PCA, and FA are
+available in the built-in package stats, namely: kmeans, princomp, and
+factanal. Furthermore, some packages go beyond the plain
+estimation and output of such algorithms. Indeed, one of the richest
+libraries in R is psych
+(W. R. Revelle 2017), which provides functions to easily
+simulate data according to different schemes, testing routines, the
+calculation of various estimates, and multiple estimation
+methods. ade4 (Dray and
+Dufour 2007) allows for dimensionality reduction in the presence of
+different types of variables, along with many graphical instruments. The
+FactoMineR (Lê et
+al. 2008) package allows for unit-clustering and extraction of latent
+variables, also in the presence of mixed variables.
+FactoClass (Pardo
+and Del Campo 2007) implements functions for PCA, Correspondence
+Analysis (CA) as well as clustering, including the tandem approach.
+factoextra
+(Kassambara 2022), instead, provides visualization of the results,
+aiding their assessment in terms of the choice of the number of latent
+variables, elegant dendrograms, screeplots and more. More focused on the
+choice of the number of clusters is
+NbClust (Charrad et
+al. 2014), offering 30 indices for determining the number of clusters
+and proposing the best method by trying not only different numbers of
+groups but also different distance measures and clustering methods,
+going beyond the partitioning ones.

    +

More closely related to the library presented here, to the best of the authors' knowledge, two packages implement a subset of the techniques offered by drclust. clustrd (Markos et al. 2019) implements simultaneous methods of clustering and dimensionality reduction. Besides offering functions for continuous data, it also allows for categorical (or mixed) variables. Moreover, at least for the continuous case, it provides an implementation aligned with the objective function proposed by Yamamoto and Hwang (2014), of which reduced KM (RKM) and factorial KM (FKM) are special cases obtained through a tuning parameter.

Finally, there is biplotbootGUI (Nieto Librero and Freitas 2023), which offers a GUI with graphical tools that aid in the choice of the number of components and clusters. It also implements KM with disjoint PCA (DPCA), as described in Vichi and Saporta (2009), and provides an optimization algorithm for choosing the initial starting point from which the estimation of the parameters begins.

Like clustrd, the drclust package provides implementations of FKM and RKM. However, while clustrd also supports categorical and mixed-type variables, our implementation currently handles only continuous variables. That said, appropriate pre-processing of categorical variables, as suggested in Vichi et al. (2019), can make them compatible with the proposed methods; in essence, one should dummy-encode all the qualitative variables. In terms of performance, drclust offers significantly faster execution. Moreover, regarding FKM, our proposal demonstrates superior results in both empirical applications and simulations, in terms of model fit and the Adjusted Rand Index (ARI). Another alternative, biplotbootGUI, implements KM with DPCA and includes built-in plotting functions and an SDP-based initialization of the parameters. However, our implementation remains considerably faster and allows users to specify which variables should be grouped together within the same (or different) principal components. This capability enables a partially or fully confirmatory approach to variable reduction. Beyond speed and the confirmatory option, drclust offers three methods not currently available in other R packages: DPCA and DFA, both designed for pure dimensionality reduction, and double KM (DKM), which performs simultaneous clustering and variable reduction via KM. All methods are implemented in C++ for computational efficiency. Table 2 summarizes the similarities and differences between drclust and the existing alternatives.

The package presented in this work aims to facilitate access to, and the usability of, techniques that fall into two main, overlapping branches. To this end, some statistical background is first recalled.

    2 Notation and theoretical background


The main pillars of drclust fall into two main categories: dimensionality reduction and (partitioning) cluster analysis. The former may be carried out individually or blended with the latter. Because both rely on the language of linear algebra, Table 1 contains, for the convenience of the reader, the mathematical notation needed in this context. Some theoretical background is then reported.
Table 1: Notation

| Symbol | Description |
|--------|-------------|
| n, J, K, Q | number of: units, manifest variables, unit-clusters, latent factors |
| \(\mathbf{X}\) | n x J data matrix, where the generic element \(x_{ij}\) is the real observation on the i-th unit within the j-th variable |
| \(\mathbf{x}_i\) | J x 1 vector representing the generic row of \(\mathbf{X}\) |
| \(\mathbf{U}\) | n x K unit-cluster membership matrix, binary and row stochastic, with \(u_{ik}\) being the generic element |
| \(\mathbf{V}\) | J x Q variable-cluster membership matrix, binary and row stochastic, with \(v_{jq}\) as the generic element |
| \(\mathbf{B}\) | J x J variable-weighting diagonal matrix |
| \(\mathbf{Y}\) | n x Q component/factor score matrix defined on the reduced subspace |
| \(\mathbf{y}_i\) | Q x 1 vector representing the generic row of \(\mathbf{Y}\) |
| \(\mathbf{A}\) | J x Q variables-factors ("plain") loading matrix |
| \(\mathbf{C}^+\) | Moore-Penrose pseudo-inverse of a matrix \(\mathbf{C}\): \(\mathbf{C}^+ = (\mathbf{C'C})^{-1}\mathbf{C'}\) |
| \(\bar{\textbf{X}}\) | K x J centroid matrix in the original feature space, i.e., \(\bar{\textbf{X}} = \textbf{U}^{+} \textbf{X}\) |
| \(\bar{\mathbf{Y}}\) | K x Q centroid matrix projected in the reduced subspace, i.e., \(\bar{\mathbf{Y}} = \bar{\mathbf{X}}\mathbf{A}\) |
| \(\mathbf{H}_{\mathbf{C}}\) | projection operator \(\mathbf{H}_\mathbf{C} = \mathbf{C}(\mathbf{C}'\mathbf{C})^{-1}\mathbf{C}'\) spanned by the columns of matrix \(\mathbf{C}\) |
| \(\mathbf{E}\) | n x J error term matrix |
| \(\lVert\cdot\rVert\) | Frobenius norm |
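As a quick numerical sanity check on two of the definitions in Table 1, the snippet below verifies that, for a matrix \(\mathbf{C}\) of full column rank, \(\mathbf{C}^+ = (\mathbf{C'C})^{-1}\mathbf{C'}\) coincides with the Moore-Penrose pseudo-inverse, and that \(\mathbf{H}_\mathbf{C}\) is idempotent. This is an illustrative sketch, not part of the package:

```python
import numpy as np

# Numerical check of two entries of Table 1: for a full-column-rank C,
# C+ = (C'C)^{-1} C' is the Moore-Penrose pseudo-inverse, and
# H_C = C (C'C)^{-1} C' is the orthogonal projector onto span(C).
rng = np.random.default_rng(0)
C = rng.normal(size=(6, 3))                    # full column rank with prob. 1
C_plus = np.linalg.inv(C.T @ C) @ C.T
H_C = C @ C_plus
print(np.allclose(C_plus, np.linalg.pinv(C)))  # matches numpy's pinv
print(np.allclose(H_C @ H_C, H_C))             # idempotent, i.e. a projector
```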

    Latent variables with simple-structure loading matrix


Classical methods such as PCA (Pearson 1901) and FA (Cattell 1965; Lawley and Maxwell 1962) build each latent factor from a combination of all the manifest variables. As a consequence, the loading matrix, describing the relations between manifest and latent variables, is usually not immediately interpretable. Ideally, it is desirable to have each variable associated with a single factor. This is typically called a simple structure, which induces subsets of variables characterizing the factors and, frequently, a partition of the variables. Factor rotation techniques (especially Varimax) go in this direction, even if not exactly, but they do not guarantee the result. Alternative solutions have been proposed: Zou et al. (2006), for instance, frame the PCA problem as a regression one and introduce an elastic-net penalty, aiming for a sparse solution of the loading matrix A. For the present work, we consider two techniques for this purpose, DPCA and DFA, both implemented in the proposed package.

Disjoint principal component analysis

Vichi and Saporta (2009) propose an alternative solution, DPCA, which leads to the simplest possible structure on A while still maximizing the explained variance. Such a result is obtained by building each latent factor from a subset of variables instead of allowing all the variables to contribute to all the components. This means that it provides J non-zero loadings instead of JQ of them. To obtain this setting, the variables are grouped so that they form a partition of the initial set. The model can be described as a constrained PCA, where the matrix \(\mathbf{A}\) is restricted via the reparametrization \(\mathbf{A}=\mathbf{BV}\). Thus, the model is described as:

\[\begin{equation}
\label{dpca1}
  \mathbf{X} = \mathbf{X}\mathbf{A}\mathbf{A}' + \mathbf{E} = \mathbf{X}\mathbf{B}\mathbf{V}\mathbf{V}'\mathbf{B} + \mathbf{E},
\end{equation} \tag{1}\]
subject to
\[\begin{equation}
\label{dpca2}
  \mathbf{V} = [v_{jq} \in \{0,1\}] \ \ \ \ \ (binarity),
\end{equation} \tag{2}\]

\[\begin{equation}
\label{dpca3}
  \mathbf{V}\mathbf{1}_{Q} = \mathbf{1}_{J} \ \ \ (row\text{-}stochasticity),
\end{equation} \tag{3}\]

\[\begin{equation}
\label{dpca4}
  \mathbf{V}'\mathbf{B}\mathbf{B}'\mathbf{V} = \mathbf{I}_{Q} \ \ \ \ \ (orthonormality),
\end{equation} \tag{4}\]

\[\begin{equation}
\label{dpca5}
  \mathbf{B} = diag(b_1, \dots, b_J) \ \ \ \ (diagonality).
\end{equation} \tag{5}\]
The parameters \(\mathbf{B}\) and \(\mathbf{V}\) are estimated via least squares (LS) by solving the minimization problem
\[\begin{equation}
\label{dpca6}
  RSS_{DPCA}(\mathbf{B}, \mathbf{V}) = ||\mathbf{X} - \mathbf{X}\mathbf{B}\mathbf{V}\mathbf{V}'\mathbf{B}||^2
\end{equation} \tag{6}\]
subject to the constraints ((2), (3), (4), (5)). An ALS algorithm is employed, guaranteeing at least a local optimum. In order to (at least partially) overcome this downside, multiple random starts are needed, and the best solution is retained.


Therefore, the DPCA method is subject to more structural constraints than standard PCA. Specifically, standard PCA does not enforce the reparametrization \(\mathbf{A}=\mathbf{BV}\), meaning its loading matrix \(\mathbf{A}\) is free to vary among orthonormal matrices. In contrast, DPCA still requires an orthonormal matrix \(\mathbf{A}\) but additionally requires that each principal component be associated with a disjoint subset of variables that best reconstructs the data. This implies that each variable contributes to only one component, resulting in a sparse and block-diagonal loading matrix. In essence, DPCA fits Q separate PCAs on the Q disjoint subsets of variables and, from each, extracts the eigenvector associated with the largest eigenvalue. In general, the total variance explained by DPCA is slightly lower, and the residual of the objective function larger, compared to PCA. This trade-off is made in exchange for the added constraint, which clearly enhances interpretability. The extent of the reduction depends on the true underlying structure of the latent factors, specifically on whether they are truly uncorrelated. When the observed correlation matrix is block diagonal, with variables within blocks being highly correlated and variables between blocks being uncorrelated, DPCA can explain almost the same amount of variance as PCA, with the advantage of simplifying interpretation.
It is important to note that, as implemented, DPCA allows for a blend of exploratory and confirmatory approaches. In the confirmatory framework, users can specify a priori which variables should collectively contribute to a factor using the constr argument, available for the last three functions in Table 2. The algorithm assigns the remaining manifest variables, for which no constraint has been specified, to the Q factors in a way that ensures the latent variables best reconstruct the manifest ones, capturing the maximum variance. This is accomplished by minimizing the loss function ((6)). Although each of the Q latent variables is derived from a different subset of variables, which involves the spectral decomposition of multiple covariance matrices, their smaller size, combined with the implementation in C++, enables very rapid execution of the routine.
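The per-block view of DPCA described above can be sketched in a few lines: with the variable partition held fixed, each component is the leading eigenvector of the covariance matrix of its own block. The following numpy sketch (with the hypothetical helper name dpca_loadings) illustrates the idea only; it is not the package's ALS/C++ implementation, which also optimizes the partition itself:

```python
import numpy as np

def dpca_loadings(X, groups):
    """Hypothetical helper: given a FIXED partition of the variables
    (groups[j] = index of the block of variable j), build the DPCA-style
    loading matrix A = BV, i.e. each component is the leading eigenvector
    of the covariance matrix of its own block and zero elsewhere."""
    n, J = X.shape
    groups = np.asarray(groups)
    Q = groups.max() + 1
    A = np.zeros((J, Q))
    for q in range(Q):
        idx = np.where(groups == q)[0]
        S = np.atleast_2d(np.cov(X[:, idx], rowvar=False))
        vals, vecs = np.linalg.eigh(S)     # ascending eigenvalues
        A[idx, q] = vecs[:, -1]            # leading eigenvector of the block
    return A

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 6))
A = dpca_loadings(X, [0, 0, 0, 1, 1, 1])   # two disjoint blocks of 3 variables
print(np.allclose(A.T @ A, np.eye(2)))     # disjoint supports make A orthonormal
```

Because each column of A has unit norm and the supports are disjoint, A is column-orthonormal by construction, which mirrors constraint (4).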


A very positive side effect of the additional constraint of DPCA over standard PCA is the uniqueness of its solution, which eliminates the need for factor rotation.

Disjoint factor analysis

Proposed by Vichi (2017), this technique is the model-based counterpart of the DPCA model. It pursues a similar goal in terms of building Q factors from J variables, imposing a simple structure on the loading matrix, but the means by which the goal is pursued are different: unlike DPCA, the estimation method adopted for DFA is Maximum Likelihood, and the model requires additional statistical assumptions. The model can be formulated in matrix form as
\[\begin{equation}
\label{dfa1}
  \mathbf{X} = \mathbf{Y}\mathbf{A}'+\mathbf{E},
\end{equation} \tag{7}\]
where \(\mathbf{X}\) is centered, meaning that the mean vector \(\boldsymbol{\mu}\) has been subtracted from each multivariate unit \(\mathbf{x}_{i}\). Therefore, for a multivariate, centered unit, the previous model can be expressed as
\[\begin{equation}
\label{dfa2}
  \mathbf{x}_i = \mathbf{A}\mathbf{y}_i + \mathbf{e}_i, \ \ i = 1, \dots, n,
\end{equation} \tag{8}\]
where \(\mathbf{y}_i\) is the i-th row of \(\mathbf{Y}\) and \(\mathbf{x}_i\), \(\mathbf{e}_i\) are, respectively, the \(i\)-th rows of \(\mathbf{X}\) and \(\mathbf{E}\), with multivariate normal distributions on the \(J\)-dimensional space,
\[\begin{equation}
\label{FAassumptions1}
  \mathbf{x}_i \sim \mathcal{N}(\boldsymbol{0}, \boldsymbol{\Sigma_X}), \ \ \ \mathbf{e}_i \sim \mathcal{N}(\boldsymbol{0}, \mathbf{\Psi}).
\end{equation} \tag{9}\]
The covariance structure of the FA model can be written as
\[\begin{equation}
  Cov(\mathbf{x}_i) = \mathbf{\Sigma_X} = \mathbf{AA'} + \mathbf{\Psi},
\end{equation}\]
where additional assumptions are needed,
\[\begin{equation}
\label{dfa4}
  Cov(\mathbf{y}_{i}) = \mathbf{\Sigma}_{\mathbf{Y}} = \mathbf{I}_Q,
\end{equation} \tag{10}\]

\[\begin{equation}
\label{dfa5}
  Cov(\mathbf{e}_i) = \mathbf{\Sigma}_{\mathbf{E}} = \mathbf{\Psi}, \ \ \ \mathbf{\Psi} = diag(\psi_{1},\dots,\psi_{J}), \ \ \psi_{j}>0, \ \ j = 1, \dots, J,
\end{equation} \tag{11}\]

\[\begin{equation}
\label{dfa5b}
  Cov(\mathbf{e}_{i}, \mathbf{y}_{i}) = \mathbf{\Sigma}_{\mathbf{EY}} = \mathbf{0},
\end{equation} \tag{12}\]

\[\begin{equation}
\label{dfa6b}
  \mathbf{A} = \mathbf{BV}.
\end{equation} \tag{13}\]
The objective function can be formulated as the maximization of the likelihood function or as the minimization of the following discrepancy:
\[\begin{align*}
  D_{DFA}(\mathbf{B},\mathbf{V}, \mathbf{\Psi})
  & = \text{ln}|\mathbf{B}\mathbf{V}\mathbf{V}'\mathbf{B} + \mathbf{\Psi}| - \text{ln}|\mathbf{S}| + \text{tr}\left((\mathbf{B}\mathbf{V}\mathbf{V}'\mathbf{B} + \mathbf{\Psi})^{-1}\mathbf{S}\right) - \textit{J}, \\
  & \qquad s.t.: \mathbf{V} = [v_{jq}], \ v_{jq} \in \{0,1\}, \ \sum_q{v_{jq}} = 1, \ j = 1, \dots, \textit{J}, \ q = 1, \dots, \textit{Q},
\end{align*}\]
whose parameters are optimized by means of a coordinate descent algorithm.


Apart from the methodological distinctions between DPCA and DFA, the latter exhibits the scale-equivariance property. The optimization of the likelihood function implies a higher computational load and thus a longer execution time compared to DPCA.


As in the DPCA case, under the constraint \(\mathbf{A}=\mathbf{BV}\), the solution provided by the model is unique.


    Joint clustering and variable reduction


The four clustering methods discussed all follow the \(K\)-means framework, working to partition units. However, they differ primarily in how they handle variable reduction.


Double KM (DKM) employs a symmetric approach, clustering both the units (rows) and the variables (columns) of the data matrix at the same time. This leads to the simultaneous identification of mean profiles for both dimensions. DKM is particularly suitable for data matrices where both rows and columns represent units. Examples of such matrices include document-by-term matrices used in Text Analysis, product-by-customer matrices in Marketing, and gene-by-sample matrices in Biology.


In contrast, the other three clustering methods adopt an asymmetric approach. They treat rows and columns differently, focusing on mean profiles and clustering for the rows, while employing components or factors for the variables (columns). These methods are more appropriate for typical units-by-variables matrices, where it is beneficial to synthesize the variables using components or factors while emphasizing the clustering and the mean profiles of the clusters for the rows. The methodologies that fall into this category are RKM, FKM, and DPCAKM.


The estimation is carried out by the LS method, while the computation of the estimates is performed via ALS.

Double k-means (DKM)

Proposed by Vichi (2001), DKM is one of the first bi-clustering methods to provide a simultaneous partition of the units and the variables, resulting in a two-way extension of plain KM (MacQueen 1967). The model is described by the following equation,
\[\begin{equation}
\label{dkm1}
  \mathbf{X} = \mathbf{U}\bar{\mathbf{Y}}\mathbf{V}' + \mathbf{E}
\end{equation} \tag{14}\]
where \(\bar{\mathbf{Y}}\) is the centroid matrix in the reduced space for the rows and columns, enabling a comprehensive summarization of units and variables. By optimizing a single objective function, the DKM method captures valuable information from both dimensions of the dataset simultaneously.


This bi-clustering approach can be applied in several impactful ways. One key application is in the realm of Big Data. DKM can effectively compress expansive datasets, comprising vast numbers of units and variables, into a more manageable and robust data matrix \(\bar{\mathbf{Y}}\). This compressed matrix, formed by mean profiles for both rows and columns, can then be explored and analyzed using a variety of subsequent statistical techniques, thus facilitating the efficient handling and analysis of Big Data. Similarly to the well-known KM, the algorithm is very fast and converges quickly to a solution, which is at least a local minimum of the problem.


Another significant application of DKM is its capability to achieve optimal clustering for both rows and columns. This dual clustering ability is particularly advantageous in situations where it is essential to discern meaningful patterns and relationships within complex datasets, highlighting the utility of DKM in diverse fields and scenarios.


The Least Squares estimation of the parameters \(\mathbf{U}\), \(\mathbf{V}\) and \(\bar{\mathbf{Y}}\) leads to the minimization problem
\[\begin{equation}
\label{dkm2}
  RSS_{\textit{DKM}}(\mathbf{U}, \mathbf{V}, \bar{\mathbf{Y}}) = {||\mathbf{X} - \mathbf{U}\bar{\mathbf{Y}}\mathbf{V}'||^2},
\end{equation} \tag{15}\]

\[\begin{equation}
\label{dkm3}
  s.t.: u_{ik} \in \{0,1\}, \ \ \sum_{k} u_{ik} = 1, \ \ i = 1 ,\dots, n, \ \ k = 1 ,\dots, K,
\end{equation} \tag{16}\]

\[\begin{equation}
\label{dkm4}
  \ \ \ \ \ \ \ v_{jq} \in \{0,1\}, \ \ \sum_{q} v_{jq} = 1, \ \ j = 1, \dots, J, \ \ q = 1, \dots, Q.
\end{equation} \tag{17}\]
Since \(\mathbf{\bar{Y}} = \mathbf{U}^{+}\mathbf{X}\mathbf{V}^{+'}\), ((15)) can be framed in terms of projection operators:
\[\begin{equation}
\label{dkm5}
  RSS_{\textit{DKM}}(\mathbf{U}, \mathbf{V}) = ||\mathbf{X} - \mathbf{H}_\mathbf{U}\mathbf{X}\mathbf{H}_\mathbf{V}||^2.
\end{equation} \tag{18}\]
In both cases one minimizes the sum of squared residuals (or, equivalently, the within deviances associated with the K unit-clusters and the Q variable-clusters), thereby obtaining a (hard) classification of both units and variables. The optimization of (18) is done via ALS, alternating, in essence, two assignment problems for rows and columns similar to the KM steps.
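The alternation behind (18), two assignment problems for rows and columns plus a centroid update, can be sketched as follows. This is a toy numpy illustration under the stated model (double_kmeans is a hypothetical name, not the package's doublekm):

```python
import numpy as np

def double_kmeans(X, K, Q, iters=30, seed=0):
    """Toy ALS for DKM (hypothetical helper): alternately update the
    centroid matrix Ybar, the row labels u and the column labels v, so
    that ||X - U Ybar V'||^2 never increases (local optimum only)."""
    rng = np.random.default_rng(seed)
    n, J = X.shape
    u = rng.integers(0, K, n)
    v = rng.integers(0, Q, J)
    Ybar = np.zeros((K, Q))
    losses = []
    for _ in range(iters):
        for k in range(K):                 # centroid step: block means
            for q in range(Q):
                block = X[np.ix_(u == k, v == q)]
                if block.size:
                    Ybar[k, q] = block.mean()
        # row step: assign each row to the closest centroid profile
        u = np.array([np.argmin([((X[i] - Ybar[k, v]) ** 2).sum()
                                 for k in range(K)]) for i in range(n)])
        # column step: assign each column to the closest centroid profile
        v = np.array([np.argmin([((X[:, j] - Ybar[u, q]) ** 2).sum()
                                 for q in range(Q)]) for j in range(J)])
        losses.append(((X - Ybar[u][:, v]) ** 2).sum())
    return u, v, Ybar, losses

rng = np.random.default_rng(1)
X = np.vstack([np.hstack([rng.normal(m, 0.1, (10, 4)) for m in row])
               for row in [(0.0, 5.0), (5.0, 0.0)]])  # planted 2 x 2 blocks
u, v, Ybar, losses = double_kmeans(X, K=2, Q=2)
print(losses[0] >= losses[-1])             # objective is monotone non-increasing
```

Each of the three updates minimizes the loss with the other quantities held fixed, so the recorded objective values form a non-increasing sequence, mirroring the monotonicity of the ALS algorithm described above.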

Reduced k-means (RKM)

Proposed by De Soete and Carroll (1994), RKM performs the reduction of the variables by projecting the J-dimensional centroid matrix onto a Q-dimensional subspace (\(\textit{Q} \leq J\)), spanned by the columns of the loading matrix \(\mathbf{A}\), such that it best reconstructs \(\mathbf{X}\) through the orthogonal projector matrix \(\mathbf{A}\mathbf{A}'\). Therefore, the model is described by the following equation,
\[\begin{equation}
\label{rkm1}
  \mathbf{X} = \mathbf{U}\bar{\mathbf{X}}\mathbf{A}\mathbf{A}' + \mathbf{E}.
\end{equation} \tag{19}\]
The estimation of U and A can be done via LS, minimizing
\[\begin{equation}
\label{rkm2}
  RSS_{\textit{RKM}}(\mathbf{U}, \mathbf{A})={||\mathbf{X} - \mathbf{U}\bar{\mathbf{X}}\mathbf{A}\mathbf{A}'||^2},
\end{equation} \tag{20}\]

\[\begin{equation}
\label{rkm3}
  s.t.: \ \ \ u_{ik} \in \{0,1\}, \ \ \sum_{k} u_{ik} = 1, \ \ \mathbf{A}'\mathbf{A} = \mathbf{I},
\end{equation} \tag{21}\]
which can be optimized, once again, via ALS. In essence, the algorithm alternates a KM step, assigning each original unit \(\mathbf{x}_i\) to the closest centroid in the reduced space, and a PCA step based on the spectral decomposition of \(\mathbf{X}'\mathbf{H}_\mathbf{U}\mathbf{X}\), each conditioned on the results of the previous iteration. The iterations continue until the difference between two subsequent values of the objective function is smaller than an arbitrarily chosen small constant \(\epsilon > 0\).
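The RKM alternation just described can be sketched as follows; again a toy numpy illustration (reduced_kmeans is a hypothetical name, not the package's redkm), where the A-step takes the Q leading eigenvectors of \(\mathbf{X}'\mathbf{H}_\mathbf{U}\mathbf{X}\):

```python
import numpy as np

def reduced_kmeans(X, K, Q, iters=50, seed=0):
    """Toy ALS for RKM (hypothetical helper): alternate a PCA step on
    X' H_U X and a k-means assignment in the reduced space, targeting
    ||X - U Xbar A A'||^2 (local optimum only)."""
    rng = np.random.default_rng(seed)
    n, J = X.shape
    u = rng.integers(0, K, n)
    for _ in range(iters):
        U = np.eye(K)[u]                     # n x K binary membership matrix
        H_U = U @ np.linalg.pinv(U)          # projector onto span(U)
        vals, vecs = np.linalg.eigh(X.T @ H_U @ X)
        A = vecs[:, -Q:]                     # Q leading eigenvectors
        Xbar = np.linalg.pinv(U) @ X         # K x J centroids, U+ X
        Y, C = X @ A, Xbar @ A               # units and centroids, reduced space
        d = ((Y[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
        u = d.argmin(axis=1)                 # k-means assignment step
    return u, A

rng = np.random.default_rng(2)
centers = rng.normal(0.0, 4.0, (3, 5))
X = np.repeat(centers, 20, axis=0) + rng.normal(0.0, 0.3, (60, 5))
u, A = reduced_kmeans(X, K=3, Q=2)
print(np.allclose(A.T @ A, np.eye(2)))       # A is column-orthonormal
```

The assignment uses distances in the reduced space because, for a column-orthonormal A, minimizing \(||\mathbf{x}_i - \bar{\mathbf{x}}_k\mathbf{A}\mathbf{A}'||^2\) over k is equivalent to minimizing \(||(\mathbf{x}_i - \bar{\mathbf{x}}_k)\mathbf{A}||^2\).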

Factorial k-means (FKM)

Proposed by Vichi and Kiers (2001), FKM, differently from RKM, produces a dimension reduction of both the units and the centroids. Its goal is to reconstruct the data in the reduced subspace, \(\mathbf{Y}\), by means of the centroids in the reduced space. The FKM model can be obtained from the RKM model by post-multiplying both sides of equation ((19)) by \(\mathbf{A}\) and rewriting the new error term as \(\mathbf{E}\),
\[\begin{equation}
  \mathbf{X}\mathbf{A} = \mathbf{U}\bar{\mathbf{X}}\mathbf{A} + \mathbf{E}.
\end{equation}\]
Its estimation via LS results in the optimization of
\[\begin{equation}
\label{fkm1}
  RSS_{\textit{FKM}}(\mathbf{U}, \mathbf{A}, \bar{\mathbf{X}})={||\mathbf{X}\mathbf{A} - \mathbf{U}\bar{\mathbf{X}}\mathbf{A}||^2},
\end{equation} \tag{22}\]

\[\begin{equation}
  s.t.: \ \ \ u_{ik} \in \{0,1\}, \ \ \sum_{k} u_{ik} = 1, \ \ \mathbf{A}'\mathbf{A} = \mathbf{I}.
\end{equation}\]
Although the connection with the RKM model appears straightforward, it can be shown that the loss function of the former is always equal to or smaller than that of the latter. Practically, the KM step is applied to \(\mathbf{X}\mathbf{A}\) instead of just \(\mathbf{X}\), as happens in DKM and RKM. In essence, FKM works better when the data and the centroids lie in the reduced subspace, and not just the centroids as in RKM.
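A toy numpy sketch of the FKM alternation follows (factorial_kmeans is a hypothetical name, not the package's factkm). For fixed \(\mathbf{U}\), minimizing (22) over column-orthonormal \(\mathbf{A}\) amounts to taking the Q eigenvectors of \(\mathbf{X}'(\mathbf{I}-\mathbf{H}_\mathbf{U})\mathbf{X}\) with the smallest eigenvalues, after which the KM step runs on \(\mathbf{X}\mathbf{A}\):

```python
import numpy as np

def factorial_kmeans(X, K, Q, iters=50, seed=0):
    """Toy ALS for FKM (hypothetical helper): A is taken from the Q
    eigenvectors of X'(I - H_U)X with the SMALLEST eigenvalues, then the
    k-means step runs on the projected data XA (local optimum only)."""
    rng = np.random.default_rng(seed)
    n, J = X.shape
    u = rng.integers(0, K, n)
    for _ in range(iters):
        U = np.eye(K)[u]
        H_U = U @ np.linalg.pinv(U)
        vals, vecs = np.linalg.eigh(X.T @ (np.eye(n) - H_U) @ X)
        A = vecs[:, :Q]                      # Q smallest eigenvalues
        Y = X @ A                            # scores in the reduced space
        C = np.linalg.pinv(U) @ Y            # reduced centroids, Xbar A
        d = ((Y[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
        u = d.argmin(axis=1)                 # k-means step on XA
    return u, A

rng = np.random.default_rng(3)
centers = rng.normal(0.0, 4.0, (3, 5))
X = np.repeat(centers, 20, axis=0) + rng.normal(0.0, 0.3, (60, 5))
u, A = factorial_kmeans(X, K=3, Q=2)
print(np.allclose(A.T @ A, np.eye(2)))       # A is column-orthonormal
```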


In order to decide when RKM or FKM can be properly applied, it is important to recall that two types of residuals can be defined in dimensionality reduction: subspace residuals, lying in the subspace spanned by the columns of \(\mathbf{A}\), and complement residuals, lying in the orthogonal complement of this subspace, i.e., the subspace spanned by the columns of \(\mathbf{A}^\perp\), with \(\mathbf{A}^\perp\) a column-wise orthonormal matrix of order \(J \times (J-Q)\) such that \(\mathbf{A}'\mathbf{A}^{\perp} = \mathbf{O}\), the \(Q \times (J-Q)\) matrix of zeroes. FKM is more effective when there is significant residual variance in the subspace orthogonal to the clustering subspace. In other words, the complement residuals typically represent the error given by those observed variables that scarcely contribute to the clustering subspace to be identified. FKM tends to recover the subspace and clustering structure more accurately when the data contain variables with substantial variance that does not reflect the clustering structure and therefore masks it; FKM can better ignore these variables and focus on the relevant clustering subspace. On the other hand, RKM performs better when the data have significant residual variance within the clustering subspace itself. This means that, when the variables within the subspace show considerable variance, RKM can more effectively capture the clustering structure.


In essence, when most of the variables in the dataset reflect the clustering structure, RKM is more likely to provide a good solution. If this is not the case, FKM may be preferred.

Disjoint principal component analysis k-means (DPCAKM)

Starting from the FKM model, the goal here, besides the partition of the units, is a parsimonious representation of the relationships between latent and manifest variables, provided by the loading matrix A. Vichi and Saporta (2009) propose for FKM the parametrization A = BV, which allows the simplest structure and thus simplifies the interpretation of the factors,
\[\begin{equation}
\label{cdpca1}
  \mathbf{X} = \mathbf{U}\bar{\mathbf{X}}\mathbf{B}\mathbf{V}\mathbf{V}'\mathbf{B} + \mathbf{E}.
\end{equation} \tag{23}\]
Estimating \(\mathbf{U}\), \(\mathbf{B}\), \(\mathbf{V}\) and \(\bar{\mathbf{X}}\) via LS, the loss function of the method becomes:
\[\begin{equation}
\label{cdpca2}
  RSS_{DPCAKM}(\mathbf{U}, \mathbf{B}, \mathbf{V}, \bar{\mathbf{X}}) = ||\mathbf{X} - \mathbf{U}\bar{\mathbf{X}}\mathbf{B}\mathbf{V}\mathbf{V}'\mathbf{B}||^2,
\end{equation} \tag{24}\]

\[\begin{equation}
\label{cdpca3}
  s.t.: u_{ik} \in \{0,1\}, \ \ \sum_{k} u_{ik} = 1, \ \ i = 1 ,\dots, n, \ \ k = 1 ,\dots, K,
\end{equation} \tag{25}\]

\[\begin{equation}
\label{cdpca4}
  \ \ \ \ \ \ \ v_{jq} \in \{0,1\}, \ \ \sum_{q} v_{jq} = 1, \ \ j = 1, \dots, J, \ \ q = 1, \dots, Q,
\end{equation} \tag{26}\]

\[\begin{equation}
\label{cdpca5}
  \ \ \ \ \ \ \ \mathbf{V}'\mathbf{B}\mathbf{B}\mathbf{V} = \mathbf{I}, \ \ \mathbf{B} = diag(b_1, \dots, b_J).
\end{equation} \tag{27}\]
In practice, this model has traits of DPCA, given the projection onto the reduced subspace and the partitioning of the units, resulting in a sparse loading matrix, but also of DKM, given the presence of both U and V. Thus, DPCAKM can be considered a bi-clustering methodology with an asymmetric treatment of the rows and columns of X. By inheriting the constraint on A, the overall fit of the model is generally worse than that of FKM, for example, although it offers an easier interpretation of the principal components. Nevertheless, it is potentially able to identify a better partition of the units. As in the DPCA case, the difference is negligible when the true latent variables are really disjoint. As implemented, the assignment step is carried out by minimizing the unit-centroid squared Euclidean distance in the reduced subspace.


    3 The package


The library offers implementations of all the models mentioned in the previous section. Each corresponds to a specific function implemented using Rcpp (Eddelbuettel and Francois 2011) and RcppArmadillo (Eddelbuettel and Sanderson 2014).

Table 2: Statistical methods available in the drclust package

| Function | Model | Previous implementations | Main differences in drclust |
|----------|-------|--------------------------|-----------------------------|
| doublekm | DKM (Vichi 2001) | None | Short runtime (C++) |
| redkm | RKM (De Soete and Carroll 1994) | in clustrd; mixed variables | >50x faster (C++); continuous variables |
| factkm | FKM (Vichi and Kiers 2001) | in clustrd; mixed variables | >20x faster (C++); continuous variables; better fit and classification |
| dpcakm | DPCAKM (Vichi and Saporta 2009) | in biplotbootGUI; continuous variables; SDP-based initialization of parameters | >10x faster (C++); constraint on variable allocation within principal components |
| dispca | DPCA (Vichi and Saporta 2009) | None | Short runtime (C++); constraint on variable allocation within principal components |
| disfa | DFA (Vichi 2017) | None | Short runtime (C++); constraint on variable allocation within factors |

Some additional functions have been made available to the user. Most of them are intended to aid the user in evaluating the quality of the results or in the choice of the hyper-parameters.

Table 3: Auxiliary functions available in the library

| Function | Technique | Description | Goal |
|----------|-----------|-------------|------|
| apseudoF | "relaxed" pseudoF | "Relaxed" version of Caliński and Harabasz (1974). Selects the second largest pseudoF value if the difference with the first is less than a fraction. | Parameter tuning |
| dpseudoF | DKM-pseudoF | Adaptation of the pseudoF criterion proposed by Rocci and Vichi (2008) to bi-clustering. | Parameter tuning |
| kaiserCrit | Kaiser criterion | Kaiser rule for selecting the number of principal components (Kaiser 1960). | Parameter tuning |
| centree | Dendrogram of the centroids | Graphical tool showing how close the centroids of a partition are. | Visualization |
| silhouette | Silhouette | Imported from cluster (Maechler et al. 2023) and factoextra (Kassambara 2022). | Visualization, parameter tuning |
| heatm | Heatmap | Heatmap of distance-ordered units within distance-ordered clusters, adapted from pheatmap (Kolde 2019). | Visualization |
| CronbachAlpha | Cronbach Alpha Index | Proposed by Cronbach (1951). Assesses the unidimensionality of a dataset. | Assessment |
| mrand | ARI | Assesses clustering quality based on the confusion matrix (Rand 1971). | Assessment |
| cluster | Membership vector | Returns a multinomial 1 x n membership vector from a binary, row-stochastic n x K membership matrix; mimics kmeans$cluster. | Encoding |

With regard to the auxiliary functions (Table 3), they have all been implemented in the R language, building on top of packages already available on CRAN, such as cluster (Maechler et al. 2023), factoextra (Kassambara 2022), and pheatmap (Kolde 2019), which allowed for an easier implementation. One of the main goals of the proposed package, besides spreading the availability and usability of the statistical methods considered, is speed of computation, so that (memory permitting) results can be obtained in a reasonable amount of time even for large data matrices. The first means adopted to pursue this goal is the full implementation of the statistical methods in the C++ language, through the Rcpp (Eddelbuettel and Francois 2011) and RcppArmadillo (Eddelbuettel and Sanderson 2014) libraries, which significantly reduces the required runtime.

A practical issue that arises very often in crisp (hard) clustering, such as KM, is the presence of empty clusters after the assignment step. When this happens, a column of \(\mathbf{U}\) has all elements equal to zero, which can be proved to be a local-minimum solution and prevents the computation of \((\mathbf{U}'\mathbf{U})^{-1}\). This happens even more often when the number of clusters K specified by the user is larger than the true one, or in the case of a sub-optimal solution. Among the possible remedies, the one implemented here consists in splitting the cluster with the highest within-cluster deviance: in practice, a KM with \(\textit{K} = 2\) is applied to it, and one of the two resulting clusters is assigned to the empty cluster; the procedure is iterated until all the empty clusters are filled. Such a strategy guarantees that the monotonicity of the ALS algorithm is preserved, although it is the most time-consuming one.
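The splitting strategy just described can be sketched as follows; a toy numpy illustration with a hypothetical helper name, not the routine used internally by the package:

```python
import numpy as np

def fill_empty_clusters(X, u, K):
    """Hypothetical helper sketching the strategy above: while some
    cluster is empty, split the cluster with the largest within-cluster
    deviance with a plain 2-means and move one half into the empty one."""
    u = u.copy()
    empty = [k for k in range(K) if not np.any(u == k)]
    while empty:
        k_e = empty.pop()
        # pick the cluster with the largest within-cluster deviance
        dev = [((X[u == k] - X[u == k].mean(axis=0)) ** 2).sum()
               if np.any(u == k) else -1.0 for k in range(K)]
        k_s = int(np.argmax(dev))
        idx = np.where(u == k_s)[0]
        # plain 2-means on that cluster, seeded with two of its members
        c = X[idx[[0, -1]]].astype(float).copy()
        for _ in range(10):
            lab = ((X[idx, None, :] - c[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
            for g in (0, 1):
                if np.any(lab == g):
                    c[g] = X[idx[lab == g]].mean(axis=0)
        u[idx[lab == 1]] = k_e            # one half fills the empty cluster
        empty = [k for k in range(K) if not np.any(u == k)]
    return u

X = np.vstack([np.zeros((5, 2)), np.full((5, 2), 10.0)])
u = fill_empty_clusters(X, np.zeros(10, dtype=int), K=2)
print(sorted(np.unique(u)))
```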

Among the six implementations of the statistical techniques, some arguments are set to a default value; Table 4 describes them all. In particular, print, which displays a descriptive summary of the results, is set to zero (so the user must explicitly request such output from the function). Rndstart is set by default to 20, so that the algorithm is run 20 times until convergence. In order to have more confidence (not certainty) that the obtained solution is a global optimum, a higher value for this argument can be provided. With particular regard to redkm and factkm, the argument rot, which performs a Varimax rotation of the loading matrix, is set by default to 0; if the user would like this to be performed, it must be set equal to 1. Finally, the constr argument, available for dpcakm, dispca and disfa, is set by default to a vector (of length J) of zeros, so that each variable is assigned to the most appropriate latent variable according to the logic of the model.
Table 4: Arguments accepted by functions in the drclust package with default values

| Argument | Used in | Description | Default value |
|----------|---------|-------------|---------------|
| Rndstart | doublekm, redkm, factkm, dpcakm, dispca, disfa | Number of times the model is run until convergence. | 20 |
| verbose | doublekm, redkm, factkm, dpcakm, dispca, disfa | Outputs basic summary statistics regarding each random start (1 = enabled; 0 = disabled). | 0 |
| maxiter | doublekm, redkm, factkm, dpcakm, dispca, disfa | Maximum number of iterations allowed for each random start (if convergence is not yet reached). | 100 |
| tol | doublekm, redkm, factkm, dpcakm, dispca, disfa | Tolerance threshold (maximum difference between the values of the objective function of two consecutive iterations such that convergence is assumed). | \(10^{-6}\) |
| tol | apseudoF | Approximation value: half of the length of the interval set around each pseudoF value; 0 <= tol < 1. | 0.05 |
| rot | redkm, factkm | Performs a Varimax rotation of the axes obtained via PCA (0 = False; 1 = True). | 0 |
| prep | doublekm, redkm, factkm, dpcakm, dispca, disfa | Pre-processing of the data: 1 performs the z-score transform; 2 performs the min-max transform; 0 leaves the data un-pre-processed. | 1 |
| print | doublekm, redkm, factkm, dpcakm, dispca, disfa | Final summary statistics of the performed method (1 = enabled; 0 = disabled). | 0 |
| constr | dpcakm, dispca, disfa | Vector of length \(J\) (number of variables) specifying variable-to-cluster assignments. Each element can be an integer from 1 to \(Q\) (number of variable-clusters or components), indicating a fixed assignment, or 0 to leave the variable unconstrained (i.e., assigned by the algorithm). | rep(0,J) |

Thanks to their fast execution times, all the implemented models allow the user to run multiple random starts of the algorithm in a reasonable amount of time. This feature is particularly useful given the absence of guarantees of global optimality for the ALS algorithm, which has an ad-hoc implementation for each of the models. Table 5 shows that, compared to the two packages that implement 3 of the 6 models in drclust, our proposal is much faster than the corresponding versions implemented in R, while nevertheless providing compelling results.

The iris dataset has been used to measure the performance in terms of fit, runtime, and ARI (Rand 1971). The z-transform has been applied to all the variables of the dataset, so that, post-transformation, each variable has mean equal to 0 and variance equal to 1, obtained by subtracting the mean from each variable and dividing the result by the standard deviation. The same result is typically obtained with the scale(X) R function.

\[\begin{equation}
\label{eq:ztransform}
\mathbf{Z}_{\cdot j} = \frac{\mathbf{X}_{\cdot j} - \mu_j \mathbf{1}_n}{\sigma_j}
\end{equation} \tag{28}\]
where \(\mu_j\) is the mean of the j-th variable and \(\sigma_j\) its standard deviation; the subscript \(\cdot j\) refers to the whole j-th column of the matrix. This operation prevents the measurement scale from affecting the final result, and is used by default (unless otherwise specified by the user) within all the techniques implemented by drclust. To avoid comparing potentially different objective functions, the between deviance (as described by the authors in the articles where the methods were proposed) has been used as the fit measure and computed from the output provided by the functions, so as to have a homogeneous evaluation metric. K = 3 and Q = 2 have been used for the clustering algorithms, while for the two dimensionality-reduction techniques only Q = 2 is needed.
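As a quick illustration of Eq. (28) in plain Python (rather than R), one column can be standardized as follows; like R's scale(), stdev() uses the \(n-1\) denominator:

```python
from statistics import mean, stdev

def z_transform(column):
    # Z_{.j} = (X_{.j} - mu_j * 1_n) / sigma_j, applied to one column of X.
    # stdev() divides by n - 1, matching R's scale() default.
    mu, sigma = mean(column), stdev(column)
    return [(x - mu) / sigma for x in column]
```

Applying z_transform to each column of the data matrix reproduces the pre-processing performed by prep = 1.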

For each method, 100 runs have been performed and the best solution has been picked. For each run, the maximum allowed number of iterations was set to 100, with a tolerance (i.e., precision) equal to \(10^{-6}\).

Table 5: Performance of the variable reduction and joint clustering-variable reduction models

| Library | Technique | Runtime (s) | Fit | ARI | Fit Measure |
|---|---|---|---|---|---|
| clustrd | RKM | 0.73 | 21.38 | 0.620 | \(\|\mathbf{U}\bar{\mathbf{Y}}\mathbf{A}'\|^2\) |
| drclust | RKM | 0.01 | 21.78 | 0.620 | \(\|\mathbf{U}\bar{\mathbf{Y}}\mathbf{A}'\|^2\) |
| clustrd | FKM | 1.89 | 4.48 | 0.098 | \(\|\mathbf{U}\bar{\mathbf{Y}}\|^2\) |
| drclust | FKM | 0.03 | 21.89 | 0.620 | \(\|\mathbf{U}\bar{\mathbf{Y}}\|^2\) |
| biplotbootGUI | CDPCA | 2.83 | 21.32 | 0.676 | \(\|\mathbf{U}\bar{\mathbf{Y}}\mathbf{A}'\|^2\) |
| drclust | CDPCA | 0.05 | 21.34 | 0.676 | \(\|\mathbf{U}\bar{\mathbf{Y}}\mathbf{A}'\|^2\) |
| drclust | DKM | 0.03 | 21.29 | 0.652 | \(\|\mathbf{U}\bar{\mathbf{X}}\mathbf{H_V}\|^2\) |
| drclust | DPCA | <0.01 | 23.70 | - | \(\|\mathbf{Y}\mathbf{A}'\|^2\) |
| drclust | DFA | 1.11 | 55.91 | - | \(\|\mathbf{Y}\mathbf{A}'\|^2\) |

The results of Table 5 are visually represented in Figure 1.

Figure 1: ARI, Fit, Runtime for the available implementations

Although runtimes heavily depend on the hardware characteristics, they have been reported in Table 5 for relative comparison purposes only, having run all the techniques on the same machine. For all the computations within the present work, the machine used was an Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz (2.00 GHz).

Besides the already mentioned differences between DPCA and DFA, it is worth mentioning that, in terms of implementation, they retrieve the latent variables differently: while DPCA relies on the eigendecomposition, DFA uses an implementation of the power method (Hotelling 1933).

In essence, our implementation, while being very fast, exhibits a goodness of fit very close to (and sometimes better than) that of the available alternatives.

    4 Simulation study


To better understand the capabilities of the proposed methodologies and evaluate the performance of the drclust package, a simulation study was conducted. In this study, we assume that the number of clusters (K) and the number of factors (Q) are known, and we examine how results vary across the DKM, RKM, FKM, and DPCAKM methods.


    Data generation process


The performance of these algorithms is tested on synthetic data generated through a specific procedure. Initially, centroids are created using eigendecomposition on a transformed distance matrix, resulting in three equidistant centroids in a reduced two-dimensional space. To model the variances and covariances among the generated units within each cluster and to introduce heterogeneity among the units, a variance-covariance matrix (\(\Sigma_O\)) is derived from samples taken from a zero-mean Gaussian distribution with a specified standard deviation (\(\sigma_u\)).


Membership for the 1,000 units is determined based on a (K × 1) vector of prior probabilities, utilizing a multinomial distribution with probabilities (0.2, 0.3, 0.5). For each unit, a sample is drawn from a multivariate Gaussian distribution centered around its corresponding centroid, using the previously generated covariance matrix (\(\Sigma_O\)). Additionally, four masking variables, which do not exhibit any clustering structure, are generated from a zero-mean multivariate Gaussian and scaled by a standard deviation of \(\sigma_m = 6\). These masking variables are added to the 2 variables that form the clustering structure of the dataset. Then, the final sample dataset is standardized.

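The generation scheme can be sketched as follows. This is a simplified Python analogue: spherical Gaussian noise stands in for the full \(\Sigma_O\), and the centroid coordinates are fixed by hand rather than derived by eigendecomposition; all names are illustrative.

```python
import random

def simulate(n=1000, probs=(0.2, 0.3, 0.5), sigma_u=0.3, sigma_m=6.0, n_mask=4):
    # K = 3 roughly equidistant centroids in the 2-D structural subspace
    centroids = [(0.0, 2.0), (-1.7, -1.0), (1.7, -1.0)]
    labels, rows = [], []
    for _ in range(n):
        k = random.choices(range(3), weights=probs)[0]  # multinomial membership
        cx, cy = centroids[k]
        # 2 structural variables drawn around the unit's centroid ...
        struct = [random.gauss(cx, sigma_u), random.gauss(cy, sigma_u)]
        # ... plus n_mask masking variables with no clustering structure
        mask = [random.gauss(0.0, sigma_m) for _ in range(n_mask)]
        labels.append(k)
        rows.append(struct + mask)
    return rows, labels
```

Raising sigma_u blurs the three clusters in the structural subspace, while sigma_m controls how strongly the masking variables dominate the total variance, mirroring the two residual levels discussed below.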

It is important to note that the standard deviation \(\sigma_u\) controls the amount of variance in the reduced space, thus influencing the level of subspace residuals. Conversely, \(\sigma_m\) regulates the variance of the masking variables, impacting the complement residuals.


This study considers scenarios with \(J\) = 6 variables, \(n\) = 1,000 units, \(K\) = 3 clusters and \(Q\) = 2 factors. We explore high, medium, and low within-cluster heterogeneity \(\sigma_u\), with values of 0.8, 0.55, and 0.3. For each combination of these parameters, \(s\) = 100 samples are generated; since the design is fully crossed, a total of 300 datasets are produced. Examples of the generated samples are illustrated in Figure 2, which shows that as the level of within-cluster variance increases, the variables with a clustering structure tend to create overlapping clusters. Note that the two techniques dedicated solely to variable reduction, namely DPCA and DFA, were not included in the simulation study, because the study's primary focus is on joint clustering and dimension reduction and on the comparison with competing implementations. These methods are nevertheless inherently quick, as can be inferred from the speed of the methodologies that combine clustering with the DPCA or DFA dimension-reduction methods.


    Performance evaluation


The performance of the proposed methods was assessed through a simulation study. To evaluate the accuracy in recovering the true cluster membership of the units (U), the ARI (Hubert and Arabie 1985) was employed. The ARI quantifies the similarity between the hard partitions generated by the estimated classification matrices and those defined by the true partition, considering both the reference partition and the one produced by the algorithm under evaluation. The ARI typically ranges from 0 to 1, where 0 indicates the level of agreement expected by random chance and 1 denotes a perfect match; negative values may also occur, indicating agreement worse than what would be expected by chance. To assess the models' ability to reconstruct the underlying data structure, the between deviance, denoted by \(f\), was computed. This measure is defined in the original works proposing the evaluated methods and is reported in the second column (Fit Measure) of Table 6. For comparison, the true between deviance \(f^{*}\), calculated from the known true values of U and A, was also computed. The difference \(f^{*} - f\) was considered, where negative values suggest potential overfitting. Furthermore, the squared Frobenius norm \(||\mathbf{A}^* - \mathbf{A}||^2\) was computed to assess how accurately each model estimated the true loading matrix \(\mathbf{A}^*\); this evaluation was not applicable to the DKM method, as it does not provide estimates of the loading matrix. For each performance metric presented in Table 6, the median value across the \(s\) = 100 replicates, for each level of error (within deviance), is reported.

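For reference, the ARI can be computed directly from the contingency table of the two partitions; a compact Python rendition of the standard Hubert-Arabie formula (illustrative, not the package's code) is:

```python
from math import comb
from collections import Counter

def ari(true_labels, pred_labels):
    # Adjusted Rand Index (Hubert & Arabie, 1985) from the contingency table
    n = len(true_labels)
    pairs = Counter(zip(true_labels, pred_labels))  # contingency cell counts
    a = Counter(true_labels)                         # row sums
    b = Counter(pred_labels)                         # column sums
    index = sum(comb(m, 2) for m in pairs.values())
    sum_a = sum(comb(m, 2) for m in a.values())
    sum_b = sum(comb(m, 2) for m in b.values())
    expected = sum_a * sum_b / comb(n, 2)            # chance-adjusted baseline
    max_index = (sum_a + sum_b) / 2
    return (index - expected) / (max_index - expected)
```

Relabeling the clusters leaves the ARI unchanged, which is why it is suited to comparing partitions from different random starts.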

It is important to note that fit and ARI reflect distinct objectives. While fit measures the variance explained by the model, the ARI assesses clustering accuracy. As such, the two metrics may diverge: a model may achieve high fit by capturing subtle variation or even noise, which may not correspond to well-separated clusters, leading to a lower ARI. Conversely, a method focused on maximizing cluster separation may yield a high ARI while explaining less overall variance. This trade-off is particularly relevant in unsupervised settings, where there is no external supervision to guide the balance between reconstruction and partitioning. For this reason, we report both metrics to provide a more comprehensive assessment of model performance.


Algorithm performance and comparison with competing implementations


For each sample, the algorithms DKM, RKM, FKM, and DPCAKM are applied using 100 random-start solutions, selecting the best one. This significantly reduces the impact of local minima in the clustering and dimension-reduction process. Figure 2 depicts the typical situation for each scenario (low, medium, high within-cluster variance).

Figure 2: Within-cluster variance of the simulated data (in order: low, medium, high)
Table 6: Comparison of joint clustering-variable reduction methods on simulated data

| Technique | Fit Measure | Library | Runtime (s) | Fit | ARI | \(f^* - f\) | \(\|\mathbf{A}^* - \mathbf{A}\|^2\) |
|---|---|---|---|---|---|---|---|
| **Low** | | | | | | | |
| RKM | \(\|\mathbf{U}\bar{\mathbf{X}}\mathbf{A}\mathbf{A}'\|^2\) | clustrd | 164.03 | 42.76 | 1.00 | 0.00 | 2.00 |
| RKM | \(\|\mathbf{U}\bar{\mathbf{X}}\mathbf{A}\mathbf{A}'\|^2\) | drclust | 0.48 | 42.76 | 1.00 | 0.00 | 2.00 |
| FKM | \(\|\mathbf{U}\bar{\mathbf{X}}\mathbf{A}\|^2\) | clustrd | 15.48 | 2.89 | 0.35 | 39.77 | 1.99 |
| FKM | \(\|\mathbf{U}\bar{\mathbf{X}}\mathbf{A}\|^2\) | drclust | 0.52 | 42.76 | 1.00 | 0.00 | 2.00 |
| DPCAKM | \(\|\mathbf{U}\bar{\mathbf{X}}\mathbf{A}\|^2\) | biplotbootGUI | 41.70 | 42.74 | 1.00 | 0.01 | 2.00 |
| DPCAKM | \(\|\mathbf{U}\bar{\mathbf{X}}\mathbf{A}\|^2\) | drclust | 1.37 | 42.74 | 1.00 | 0.01 | 2.00 |
| DKM | \(\|\mathbf{U}\bar{\mathbf{X}}\mathbf{V}\|^2\) | drclust | 0.78 | 61.55 | 0.46 | -18.94 | - |
| **Medium** | | | | | | | |
| RKM | \(\|\mathbf{U}\bar{\mathbf{X}}\mathbf{A}\mathbf{A}'\|^2\) | clustrd | 230.31 | 39.18 | 0.92 | -0.27 | 2.00 |
| RKM | \(\|\mathbf{U}\bar{\mathbf{X}}\mathbf{A}\mathbf{A}'\|^2\) | drclust | 0.70 | 39.18 | 0.92 | -0.27 | 2.00 |
| FKM | \(\|\mathbf{U}\bar{\mathbf{X}}\mathbf{A}\|^2\) | clustrd | 14.31 | 2.85 | 0.28 | 36.09 | 1.99 |
| FKM | \(\|\mathbf{U}\bar{\mathbf{X}}\mathbf{A}\|^2\) | drclust | 0.76 | 39.18 | 0.92 | -0.27 | 2.00 |
| DPCAKM | \(\|\mathbf{U}\bar{\mathbf{X}}\mathbf{A}\|^2\) | biplotbootGUI | 47.76 | 39.15 | 0.92 | -0.25 | 2.00 |
| DPCAKM | \(\|\mathbf{U}\bar{\mathbf{X}}\mathbf{A}\|^2\) | drclust | 1.64 | 39.15 | 0.92 | -0.25 | 2.00 |
| DKM | \(\|\mathbf{U}\bar{\mathbf{Y}}\mathbf{V}\|^2\) | drclust | 0.81 | 5.93 | 0.39 | -21.00 | - |
| **High** | | | | | | | |
| RKM | \(\|\mathbf{U}\bar{\mathbf{X}}\mathbf{A}\mathbf{A}'\|^2\) | clustrd | 314.89 | 36.61 | 0.62 | -2.11 | 2.00 |
| RKM | \(\|\mathbf{U}\bar{\mathbf{X}}\mathbf{A}\mathbf{A}'\|^2\) | drclust | 0.94 | 36.61 | 0.61 | -2.11 | 2.00 |
| FKM | \(\|\mathbf{U}\bar{\mathbf{X}}\mathbf{A}\|^2\) | clustrd | 13.87 | 2.90 | 0.19 | 31.55 | 2.00 |
| FKM | \(\|\mathbf{U}\bar{\mathbf{X}}\mathbf{A}\|^2\) | drclust | 1.02 | 36.61 | 0.61 | -2.11 | 2.00 |
| DPCAKM | \(\|\mathbf{U}\bar{\mathbf{X}}\mathbf{A}\|^2\) | biplotbootGUI | 55.49 | 36.53 | 0.64 | -1.99 | 2.00 |
| DPCAKM | \(\|\mathbf{U}\bar{\mathbf{X}}\mathbf{A}\|^2\) | drclust | 2.06 | 36.53 | 0.63 | -2.01 | 2.00 |
| DKM | \(\|\mathbf{U}\bar{\mathbf{X}}\mathbf{V}\|^2\) | drclust | 0.84 | 58.97 | 0.29 | -24.37 | - |

For the three scenarios, the results are reported in Table 6.

Figure 3: Boxplots of the Fit results in Table 6

Figure 4: Boxplots of the ARI results in Table 6

Figure 5: Boxplots of the \(f^* - f\) results in Table 6

Figure 6: Boxplots of the \(||\mathbf{A} - \mathbf{A}^*||^2\) metric results in Table 6

Figure 7: Boxplots of the runtime results in Table 6, for the RKM

Figure 8: Boxplots of the runtime results in Table 6, for DKM, DPCAKM, FKM

Regarding the RKM, the performance of drclust and clustrd is very close, both in terms of the ability to recover the data (fit) and in terms of identifying the true classification of the objects.


The FKM performs considerably better in the drclust case in terms of both fit and ARI. Considering ARI and fit for the CDPCA algorithm, the difference between the present proposal and the one of biplotbootGUI is almost absent. As for CPU runtime, all of the proposed models are significantly faster than the previously available ones (RKM, FKM and KM with DPCA); for the architecture used in the experiments, the order of magnitude of such differences can be read from the runtime column of Table 6.


In general, drclust shows a slight overfit, while there is no evident difference in the ability to recover the true A. There is no alternative implementation for the DKM, so no comparison can be made; however, except for its ARI, which is lower than that of the other techniques, its fit is very close, showing a compelling ability to reconstruct the data. With the exception of the FKM, where our proposal outperforms the implementation in clustrd, the methods are comparable in terms of both fit and ARI. Nevertheless, our implementations consistently outperform all alternatives in terms of runtime.


Figures 3 - 8 provide a visual summary of the results reported in Table 6, illustrating not only the central tendencies but also the variability across the 100 simulation replicates for each scenario.


    5 Application on real data


The six statistical models implemented (Table 2) have a binary argument print which, if set to one, displays the main statistics at the end of the execution. The following examples show such results, using the same dataset used by Vichi and Kiers (2001), named macro and made available in clustrd (Markos et al. 2019); it has been standardized by setting the argument prep = 1, which is the default for all the techniques. Moreover, the commands reported in each example do not specify all the arguments available for the function, for which the default values have been kept.


The first example refers to the DKM (Vichi 2001). As shown, the output contains the fit expressed as the percentage of the total deviance (i.e., \(||\mathbf{X}||^2\)) captured by the between deviance of the model, implementing the fit measures in Table 5. The second output is the centroid matrix \(\bar{\mathbf{Y}}\), which describes the K centroids in the Q-dimensional space induced by the partition of the variables and its related variable-means. What follows are the sizes and within-deviances of each unit-cluster and each variable-cluster. Finally, the output shows the pseudoF (Caliński and Harabasz 1974) index, which is always computed for the partition of the units. Please note that the data matrix provided to each function implemented in the package needs to be in matrix format.

# Macro dataset (Vichi & Kiers, 2001)
library(clustrd)
data(macro)
macro <- as.matrix(macro)
# DKM
> dkm <- doublekm(X = macro, K = 5, Q = 3, print = 1)

>> Variance Explained by the DKM (% BSS / TSS):  44.1039

>> Centroid Matrix (Unit-centroids x Variable-centroids):

           V-Clust 1   V-Clust 2  V-Clust 3
U-Clust 1  0.1282052 -0.31086968 -0.4224182
U-Clust 2  0.0406931 -0.08362029  0.9046692
U-Clust 3  1.4321347  0.51191282 -0.7813761
U-Clust 4 -0.9372541  0.22627768  0.1175189
U-Clust 5  1.2221058 -2.59078258 -0.1660691

>> Unit-clusters:

         U-Clust 1 U-Clust 2 U-Clust 3 U-Clust 4 U-Clust 5
Size     8         4         4         3         1
Deviance 23.934373 31.737865 5.878199  4.844466  0.680442

>> Variable-clusters:

         V-Clust 1 V-Clust 2 V-Clust 3
Size     3         2         1
Deviance 40.832173 23.024249 3.218923

>> pseudoF Statistic (Calinski-Harabasz): 2.23941

The second example shows the main quantities computed for redkm (De Soete and Carroll 1994). Differently from the DKM, where the variable reduction is operated via averages, the RKM does this via PCA, leading to a better overall fit and also altering the final unit-partition, as observable from the sizes and deviances.

In addition to the output shown for the DKM, the RKM also provides the loading matrix, which projects the J-dimensional centroids onto the Q-dimensional subspace. Another important difference is the summary of the latent factors: this table shows the information captured by the principal components with respect to the original data. In this sense, the output allows one to distinguish between the loss due to the variable reduction (accounted for in this table) and the overall loss of the algorithm (which accounts for both the loss due to the reduction of the units and the one due to the reduction of the variables, reported in the first line of the output).

# RKM
> rkm <- redkm(X = macro, K = 5, Q = 3, print = 1)

>> Variance Explained by the RKM (% BSS / TSS): 55.0935

>> Matrix of Centroids (Unit-centroids x Principal Components):

              PC 1       PC 2       PC 3
Clust 1 -1.3372534 -1.1457414 -0.6150841
Clust 2  1.8834878 -0.0853912 -0.8907303
Clust 3  0.5759906  0.4187003  0.3739608
Clust 4 -0.9538864  1.2392976  0.3454186
Clust 5  1.0417952 -2.2197178  3.0414445

>> Unit-clusters:
         Clust 1   Clust 2  Clust 3   Clust 4  Clust 5
Size     5         5        5         4        1
Deviance 26.204374 9.921313 11.231563 6.112386 0.418161

>> Loading Matrix (Manifest Variables x Latent Variables):

          PC 1        PC 2        PC 3
GDP -0.5144915 -0.04436269  0.08985135
LI  -0.2346937 -0.01773811 -0.86115069
UR  -0.3529363  0.53044730  0.28002534
IR  -0.4065339 -0.42022401 -0.17016203
TB   0.1975072  0.69145440 -0.36710245
NNS  0.5927684 -0.24828525 -0.09062404

>> Summary of the latent factors:

     Explained Variance Expl. Var. (%) Cumulated Var. Cum. Var (%)
PC 1 1.699343           28.322378      1.699343       28.322378
PC 2 1.39612            23.268663      3.095462       51.591041
PC 3 1.182372           19.706208      4.277835       71.297249

>> pseudoF Statistic (Calinski-Harabasz): 4.29923

The factkm (Vichi and Kiers 2001) has the same output structure as redkm. It exhibits, for the same data and hyperparameters, a similar fit (overall and variable-wise). However, the unit-partition, as well as the latent variables, are different. This difference can be (at least) partially justified by the difference in the objective function, which is most evident in the assignment step.

# factorial KM
> fkm <- factkm(X = macro, K = 5, Q = 3, print = 1, rot = 1)

>> Variance Explained by the FKM (% BSS / TSS): 55.7048

>> Matrix of Centroids (Unit-centroids x Principal Components):

              PC 1        PC 2        PC 3
Clust 1 -0.7614810  2.16045496 -1.21025666
Clust 2  1.1707159 -0.08840133 -0.29876729
Clust 3 -0.9602731 -1.33141866  0.02370092
Clust 4  1.0782934  1.17952330  3.59632116
Clust 5 -1.7634699  0.65075735  0.46486440

>> Unit-clusters:
         Clust 1  Clust 2  Clust 3  Clust 4  Clust 5
Size     9        5        3        2        1
Deviance 6.390576 2.827047 5.018935 3.215995 0

>> Loading Matrix (Manifest Variables x Latent Variables):

          PC 1       PC 2        PC 3
GDP -0.6515084 -0.1780021  0.37482509
LI  -0.3164139  0.1809559 -0.68284917
UR  -0.2944864 -0.5235492  0.01561022
IR  -0.3316254  0.5884434 -0.22101070
TB   0.1848264 -0.5367239 -0.57166730
NNS  0.4945307  0.1647067  0.13164438

>> Summary of the latent factors:

     Explained Variance Expl. Var. (%) Cumulated Var. Cum. Var (%)
PC 1 1.68496            28.082675      1.68496        28.082675
PC 2 1.450395           24.173243      3.135355       52.255917
PC 3 1.079558           17.992635      4.214913       70.248552

>> pseudoF Statistic (Calinski-Harabasz): 4.26936

dpcakm (Vichi and Saporta 2009) shows the same output as RKM and FKM. The partition of the variables, described by the \(\mathbf{V}\) term in ((26)) - ((27)), is readable within the loading matrix, by considering a \(1\) for each non-zero value. For the macro dataset, the additional constraint \(\mathbf{A} = \mathbf{B}\mathbf{V}\) does not cause a significant decrease in the objective function. The clusters, however, differ from the previous cases as well.

# K-means DPCA
> cdpca <- dpcakm(X = macro, K = 5, Q = 3, print = 1)

>> Variance Explained by the DPCAKM (% BSS / TSS): 54.468

>> Matrix of Centroids (Unit-centroids x Principal Components):

              PC 1        PC 2       PC 3
Clust 1  0.6717536  0.01042978 -2.7309458
Clust 2  3.7343724 -1.18771685  0.6320673
Clust 3 -0.6729575 -1.80822745  0.7239541
Clust 4 -0.2496002  1.54537904  0.5263009
Clust 5 -0.1269212 -0.12464388 -0.1748282

>> Unit-clusters:
         Clust 1  Clust 2  Clust 3 Clust 4 Clust 5
Size     7        6        4       2       1
Deviance 3.816917 2.369948 1.14249 4.90759 0

>> Loading Matrix (Manifest Variables x Latent Variables):

          PC 1      PC 2 PC 3
GDP  0.5567605 0.0000000    0
LI   0.0000000 0.7071068    0
UR   0.5711396 0.0000000    0
IR   0.0000000 0.0000000    1
TB   0.0000000 0.7071068    0
NNS -0.6031727 0.0000000    0

>> Summary of the latent factors:
     Explained Variance Expl. Var. (%) Cumulated Var. Cum. Var (%)
PC 1 1                  16.666667      1              16.666667
PC 2 1.703964           28.399406      2.703964       45.066073
PC 3 1.175965           19.599421      3.87993        64.665494

>> pseudoF Statistic (Calinski-Harabasz): 3.26423

For dispca (Vichi and Saporta 2009), the output is mostly similar to the ones already shown, except for the unit-clustering part. Nevertheless, because the focus here is exclusively on the variable-reduction process, some additional information is reported in the summary of the latent factors. Indeed, because a single principal component summarises a subset of manifest variables, the variance of the second component related to each of the subsets is computed, along with Cronbach's (1951) Alpha index, so that the user can judge when the evidence supports such a strategy of dimensionality reduction. As mentioned, this function, like dpcakm and disfa, allows constraining a subset of the J variables to belong to the same cluster. In the example that follows, the first two manifest variables are constrained to contribute to the same principal component (which is confirmed by the output A). Note that the manifest variables whose indices (column-positions in the data matrix) correspond to the zeros in constr remain unconstrained.

# DPCA
# Impose GDP and LI to be in the same cluster
> out <- dispca(X = macro, Q = 3, print = 1, constr = c(1,1,0,0,0,0))

>> Variance explained by the DPCA (% BSS / TSS) = 63.9645

>> Loading Matrix (Manifest Variables x Latent variables)

          PC 1       PC 2      PC 3
GDP  0.0000000  0.0000000 0.7071068
LI   0.0000000  0.0000000 0.7071068
UR  -0.7071068  0.0000000 0.0000000
IR   0.0000000 -0.7071068 0.0000000
TB   0.0000000  0.7071068 0.0000000
NNS  0.7071068  0.0000000 0.0000000

>> Summary of the latent factors:
     Explained Variance Expl. Var. (%) Cumulated Var.
PC 1           1.388294       23.13824       1.388294
PC 2           1.364232       22.73721       2.752527
PC 3           1.085341       18.08902       3.837868
     Cum. Var (%) Var. 2nd component Cronbach's Alpha
PC 1     23.13824          0.6117058        -1.269545
PC 2     45.87544          0.6357675        -1.145804
PC 3     63.96447          0.9146585         0.157262

disfa (Vichi 2017), by assuming an underlying probabilistic model, allows additional evaluation metrics and statistics as well. The overall objective function is not directly comparable with the previous ones, and is expressed in absolute (not relative, as in the previous cases) terms. The \(\chi^2\) (X2), along with BIC, AIC and RMSEA, allows a robust evaluation of the results in terms of fit/parsimony. In addition to what is shown for the DPCA, for each variable the function displays the communality with the factors, providing a standard error as well as an associated p-value for the estimate.

By comparing the loading matrix of the DPCA case with that of the DFA, it is possible to assess their similarity in terms of latent variables. Part of the difference can be justified (besides the well-known distinctions between PCA and FA) by the method used to compute each factor: while in all the previous cases the eigendecomposition has been employed, the DFA makes use of the power iteration method for the computation of the loading matrix (Hotelling 1933).
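The power method itself is simple; a minimal Python sketch (illustrative, not drclust's internal code) computing the dominant eigenpair of a symmetric matrix S is:

```python
def power_iteration(S, iters=500):
    # Dominant eigenpair of a symmetric matrix S (list of lists) via
    # repeated multiplication and normalization (Hotelling, 1933)
    n = len(S)
    v = [1.0] * n
    lam = 0.0
    for _ in range(iters):
        w = [sum(S[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
        # Rayleigh quotient v'Sv as the eigenvalue estimate
        lam = sum(v[i] * sum(S[i][j] * v[j] for j in range(n)) for i in range(n))
    return lam, v
```

Subsequent factors are obtained in the usual way by deflating S with the eigenpair found so far and iterating again.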

    # disjoint FA
    > out <- disfa(X = macro, Q = 3, print = 1)
    >> Discrepancy of DFA: 0.296499

    >> Summary statistics:

      Unknown Parameters Chi-square Degrees of Freedom BIC       
      9                  4.447531   12                 174.048102
      AIC        RMSEA   
      165.086511 0.157189

    >> Loading Matrix (Manifest Variables x Latent Variables)

          Factor 1 Factor 2   Factor 3
    GDP  0.5318618        0  0.0000000
    LI   0.0000000        1  0.0000000
    UR   0.5668542        0  0.0000000
    IR   0.0000000        0  0.6035160
    TB   0.0000000        0 -0.6035152
    NNS -0.6849942        0  0.0000000

    >> Summary of the latent factors:

             Explained Variance Expl. Var. (%) Cum. Var Cum. Var (%)
    Factor 1          1.0734177       17.89029 1.073418     17.89029
    Factor 2          1.0000000       16.66667 2.073418     34.55696
    Factor 3          0.7284622       12.14104 2.801880     46.69800
             Var. 2nd component Cronbach's Alpha
    Factor 1          0.7001954       -0.6451803
    Factor 2          0.0000000        1.0000000
    Factor 3          0.6357675       -1.1458039

    >> Detailed Manifest-variable - Latent-factor relationships

        Associated Factor Corr. Coeff. Std. Error    Pr(p>|Z|)
    GDP                 1    0.5318618  0.1893572 0.0157923335
    LI                  2    1.0000000  0.0000000 0.0000000000
    UR                  1    0.5668542  0.1842113 0.0091557523
    IR                  3    0.6035160  0.1782931 0.0048411219
    TB                  3   -0.6035152  0.1782932 0.0048411997
    NNS                 1   -0.6849942  0.1629084 0.0008606488
        Var. Error Communality
    GDP  0.7171230   0.2828770
    LI   0.0000000   1.0000000
    UR   0.6786764   0.3213236
    IR   0.6357684   0.3642316
    TB   0.6357695   0.3642305
    NNS  0.5307830   0.4692170
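The power iteration step that disfa relies on can be sketched outside R in a few lines. The following numpy snippet is an illustrative re-implementation under simple assumptions (a symmetric input matrix with a well-separated dominant eigenvalue), not the package's C++ code:

```python
import numpy as np

def power_iteration(S, n_iter=500, tol=1e-10):
    """Approximate the leading eigenpair of a symmetric matrix S."""
    v = np.ones(S.shape[0]) / np.sqrt(S.shape[0])  # normalized initial guess
    for _ in range(n_iter):
        w = S @ v
        w_norm = np.linalg.norm(w)   # Rayleigh-quotient-style eigenvalue estimate
        if np.linalg.norm(w / w_norm - v) < tol:
            v = w / w_norm
            break
        v = w / w_norm
    return w_norm, v  # leading eigenvalue and eigenvector

# Toy covariance matrix with eigenvalues 3 and 1
S = np.array([[2.0, 1.0], [1.0, 2.0]])
lam, v = power_iteration(S)
```

Repeated multiplication by S amplifies the component of v along the dominant eigenvector, which is why the iterates converge to the leading loading direction.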

    In practice, the K and Q hyper-parameters are usually not known a priori. In such cases, a possible tool for investigating plausible values of Q is the Kaiser criterion (Kaiser 1960); its R implementation, kaiserCrit, takes the dataset as its single argument and outputs a message, as well as a scalar indicating the optimal number of components according to this rule.

    # Kaiser criterion for the choice of Q, the number of latent components
    > kaiserCrit(X = macro)

    The number of components suggested by the Kaiser criterion is:  3
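The rule itself is simple: retain as many components as there are eigenvalues of the correlation matrix greater than one. A language-agnostic numpy sketch of the criterion (the simulated data matrix X below is made up for illustration):

```python
import numpy as np

def kaiser_crit(X):
    """Number of eigenvalues of the correlation matrix exceeding 1."""
    R = np.corrcoef(X, rowvar=False)   # p x p correlation matrix
    eigvals = np.linalg.eigvalsh(R)    # eigenvalues of the symmetric matrix R
    return int(np.sum(eigvals > 1.0))

# Two highly correlated blocks of variables -> the rule suggests 2 components
rng = np.random.default_rng(0)
z = rng.standard_normal((200, 2))
X = np.column_stack([z[:, 0], z[:, 0] + 0.1 * rng.standard_normal(200),
                     z[:, 1], z[:, 1] + 0.1 * rng.standard_normal(200)])
q = kaiser_crit(X)
```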

    For selecting the number of clusters, K, one of the most commonly used indices is the pseudoF statistic, which, however, tends to underestimate the optimal number of clusters. To address this limitation, a "relaxed" version, referred to as apseudoF, has been implemented. The apseudoF procedure computes the standard pseudoF index over a range of possible values up to maxK. If a higher value of K yields a pseudoF that is less than tol \(\cdot\) pseudoF (compared to the maximum value suggested by the plain pseudoF), then apseudoF selects this alternative K as the optimal number of clusters. Additionally, it generates a plot of the pseudoF values computed across the specified K range. Given the hybrid nature of the proposed methods, the function also requires specifying the clustering model to be used: 1 = doublekm, 2 = redkm, 3 = factkm, 4 = dpcakm. Furthermore, the number of components, Q, must be provided, as it also influences the final quality of the resulting partition.

    > apseudoF(X = macro, maxK=10, tol = 0.05, model = 2, Q = 3)
    The optimal number of clusters based on the pseudoF criterion is: 5

    Figure 9: Interval-pseudoF polygonal chain
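For reference, the plain pseudoF index is the Caliński–Harabasz statistic: the between-cluster deviance over the within-cluster deviance, each scaled by its degrees of freedom. A hedged numpy sketch of that plain index (the toy data and labels below are illustrative, not the macro dataset):

```python
import numpy as np

def pseudo_f(X, labels):
    """Calinski-Harabasz pseudoF: [B/(K-1)] / [W/(n-K)]."""
    n, K = X.shape[0], len(np.unique(labels))
    grand = X.mean(axis=0)
    B = W = 0.0
    for k in np.unique(labels):
        Xk = X[labels == k]
        ck = Xk.mean(axis=0)
        B += Xk.shape[0] * np.sum((ck - grand) ** 2)  # between-cluster deviance
        W += np.sum((Xk - ck) ** 2)                   # within-cluster deviance
    return (B / (K - 1)) / (W / (n - K))

# Two well-separated toy clusters give a large pseudoF
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
labels = np.array([0, 0, 1, 1])
f = pseudo_f(X, labels)
```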

    While this index was conceived for one-mode clustering methods, Rocci and Vichi (2008) extended it to two-mode clustering methods, making it applicable to methods such as doublekm. The dpseudoF function implements this extension; besides the dataset, the user provides the maximum K and Q values.

    > dpseudoF(X = macro, maxK = 10, maxQ = 5)
               Q = 2     Q = 3     Q = 4     Q = 5
    K = 2  38.666667 22.800000 16.000000 12.222222
    K = 3  22.800000 13.875000  9.818182  7.500000
    K = 4  16.000000  9.818182  6.933333  5.263158
    K = 5  12.222222  7.500000  5.263158  3.958333
    K = 6   9.818182  6.000000  4.173913  3.103448
    K = 7   8.153846  4.950000  3.407407  2.500000
    K = 8   6.933333  4.173913  2.838710  2.051282
    K = 9   6.000000  3.576923  2.400000  1.704545
    K = 10  5.263158  3.103448  2.051282  1.428571

    Here, the row and column indices of the maximum value within the matrix are chosen as the best K and Q values.
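Selecting the pair amounts to an argmax over both dimensions of the matrix; for instance, with numpy (a small matrix of hypothetical values, not the dpseudoF output above):

```python
import numpy as np

# Rows index K = 2..4, columns index Q = 2..4 (toy dpseudoF-like values)
D = np.array([[3.1, 4.7, 2.2],
              [5.9, 8.4, 3.0],
              [1.5, 2.8, 0.9]])
i, j = np.unravel_index(np.argmax(D), D.shape)  # flat argmax -> (row, col)
best_K, best_Q = i + 2, j + 2                   # shift back to the K/Q labels
```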


    Just by providing the centroid matrix, one can check how the centroids are related. Such information is usually not provided by partitive clustering methods, but rather by hierarchical ones. Nevertheless, it is always possible to construct a distance matrix based on the centroids and represent it via a dendrogram, using an arbitrary distance. The centree function does exactly this, using the Ward (1963) distance, which corresponds to the squared Euclidean one. In practice, one provides as an argument the output of one of the four methods performing clustering.

    > out <- factkm(X = macro, K = 10, Q = 3)
    > centree(drclust_out = out)

    Figure 10: Dendrogram of a 10-centroid solution
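Under the hood, this amounts to building a squared-Euclidean distance matrix between centroids and feeding it to a hierarchical (Ward) linkage. A rough numpy sketch of the distance-matrix step (the centroid values below are made up):

```python
import numpy as np

def squared_euclidean_matrix(C):
    """Pairwise squared Euclidean distances between the rows of C."""
    sq = np.sum(C ** 2, axis=1)
    D = sq[:, None] + sq[None, :] - 2.0 * C @ C.T  # ||a||^2 + ||b||^2 - 2 a.b
    return np.maximum(D, 0.0)  # clip tiny negatives from rounding

C = np.array([[0.0, 0.0], [3.0, 4.0], [0.0, 1.0]])  # toy centroids
D = squared_euclidean_matrix(C)
```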

    If, instead, one wants to assess visually the quality of the obtained partition, there is another instrument typically used for this purpose. The silhouette (Rousseeuw 1987), besides summarizing this numerically, also allows one to represent it graphically. By employing cluster for the computational part and factoextra for the graphical part, silhouette takes as arguments the output of one of the four drclust clustering methods and the dataset, returning the results of the two functions with just one command.

    # Note: The same data must be provided to dpcakm and silhouette
    > out <- dpcakm(X = macro, K = 5, Q = 3)
    > silhouette(X = macro, drclust_out = out)

    Figure 11: Silhouette of a DPCA KM solution

    As can be seen in Figure 11, the average silhouette width is also displayed as a scalar above the plot.
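For reference, the silhouette width of observation i is s(i) = (b(i) − a(i)) / max(a(i), b(i)), where a(i) is the mean distance to the other members of its own cluster and b(i) the mean distance to the nearest other cluster. A small numpy sketch of that definition (toy data; the package delegates the actual computation to the cluster package):

```python
import numpy as np

def silhouette_widths(X, labels):
    """Silhouette width s(i) = (b - a) / max(a, b) for each observation."""
    n = X.shape[0]
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)  # pairwise distances
    s = np.zeros(n)
    for i in range(n):
        own = labels == labels[i]
        a = D[i, own & (np.arange(n) != i)].mean()   # mean intra-cluster distance
        b = min(D[i, labels == k].mean()             # nearest other cluster
                for k in np.unique(labels) if k != labels[i])
        s[i] = (b - a) / max(a, b)
    return s

X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
labels = np.array([0, 0, 1, 1])
s = silhouette_widths(X, labels)
```

With two tight, well-separated clusters, all widths are close to 1, matching the intuition of a good partition.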


    A purely graphical tool used to assess the dis/homogeneity of the groups is the heatmap. By employing the pheatmap library (Kolde 2019) and the result of doublekm, redkm, factkm or dpcakm, the function orders the observations within each cluster in ascending order of the distance between each observation and the cluster to which it has been assigned. After doing so for each group, the groups themselves are sorted based on the distance between their centroid and the grand mean (i.e., the mean of all observations). The heatm function produces this result; Figure 12 shows its graphical output.

    # Note: The same data must be provided to doublekm and heatm
    > out <- doublekm(X = macro, K = 5, Q = 3)
    > heatm(X = macro, drclust_out = out)

    Figure 12: Heatmap of a double-KM solution
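The ordering rule described above can be sketched independently of pheatmap: sort observations within each cluster by distance to their centroid, then order the clusters by the distance of their centroid to the grand mean (the helper and toy data below are illustrative, not the heatm code):

```python
import numpy as np

def heatmap_order(X, labels):
    """Row order: clusters sorted by centroid distance to the grand mean,
    observations within each cluster sorted by distance to their centroid."""
    grand = X.mean(axis=0)
    ks = np.unique(labels)
    cdist = {k: np.linalg.norm(X[labels == k].mean(axis=0) - grand) for k in ks}
    order = []
    for k in sorted(ks, key=lambda k: cdist[k]):     # closest cluster first
        idx = np.where(labels == k)[0]
        ck = X[idx].mean(axis=0)
        d = np.linalg.norm(X[idx] - ck, axis=1)      # distance to own centroid
        order.extend(idx[np.argsort(d)].tolist())
    return order

X = np.array([[0.0], [0.2], [9.0], [9.9], [10.5]])
labels = np.array([0, 0, 1, 1, 1])
order = heatmap_order(X, labels)
```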

    Biplots and parallel coordinates plots can be obtained from the output of the techniques in the proposed package by means of a few instructions, using libraries available on CRAN such as ggplot2 (Wickham et al. 2024), grid (which has since become a base package), dplyr (Wickham et al. 2023) and GGally (Schloerke et al. 2024). Therefore, the user can easily visualize the subspaces provided by the statistical techniques. In future versions of the package, the two functions will be available as built-ins. Currently, for the biplot, we have:

    library(ggplot2)
    library(grid)
    library(dplyr)

    out <- factkm(macro, K = 2, Q = 2, Rndstart = 100)

    # Prepare data
    Y <- as.data.frame(macro %*% out$A); colnames(Y) <- c("Dim1", "Dim2")
    Y$cluster <- as.factor(cluster(out$U))

    arrow_scale <- 5
    A <- as.data.frame(out$A)[, 1:2] * arrow_scale
    colnames(A) <- c("PC1", "PC2")
    A$var <- colnames(macro)

    # Axis limits
    lims <- range(c(Y$Dim1, Y$Dim2, A$PC1, A$PC2)) * 1.2

    # Circle
    circle <- data.frame(x = cos(seq(0, 2*pi, length.out = 200)) * arrow_scale,
                         y = sin(seq(0, 2*pi, length.out = 200)) * arrow_scale)

    ggplot(Y, aes(x = Dim1, y = Dim2, color = cluster)) +
      geom_point(size = 2) +
      geom_segment(
        data = A, aes(x = 0, y = 0, xend = PC1, yend = PC2),
        arrow = arrow(length = unit(0.2, "cm")), inherit.aes = FALSE, color = "gray40"
      ) +
      geom_text(
        data = A, aes(x = PC1, y = PC2, label = var), inherit.aes = FALSE,
        hjust = 1.1, vjust = 1.1, size = 3
      ) +
      geom_path(data = circle, aes(x = x, y = y), inherit.aes = FALSE,
                linetype = "dashed", color = "gray70") +
      coord_fixed(xlim = lims, ylim = lims) +
      labs(x = "Component 1", y = "Component 2", title = "Biplot") +
      theme_minimal()

    which leads to the result shown in Figure 13.

    Figure 13: Biplot of a FKM solution

    By using essential information in the output provided by factkm, we are able to see the cluster of each observation, represented in the estimated subspace induced by \(\mathbf{A}\), as well as the relationships between observed and latent variables via the arrows.


    In order to obtain the parallel coordinates plot, a single instruction is sufficient, based on the same output as a starting point.

    library(GGally)
    out <- factkm(macro, K = 3, Q = 2, Rndstart = 100)
    # Rebuild the scores so that Y matches the new solution
    Y <- as.data.frame(macro %*% out$A); colnames(Y) <- c("Dim1", "Dim2")
    Y$cluster <- as.factor(cluster(out$U))
    ggparcoord(
        data = Y, columns = 1:(ncol(Y)-1),
        groupColumn = "cluster", scale = "uniminmax",
        showPoints = FALSE, alphaLines = 0.5
    ) +
        theme_minimal() +
        labs(title = "Parallel Coordinate Plot",
          x = "Variables",  y = "Normalized Value")

    For FKM applied to the macro dataset, the output is reported in Figure 14.

    Figure 14: Parallel coordinates plot of a FKM solution

    6 Conclusions


    This work presents an R library that implements techniques for joint dimensionality reduction and clustering. Some of them are already implemented by other packages. In general, the performance of the proposed implementations and the earlier ones is very close, except for the FKM, where the new one is always better on the metrics considered here. As an element of novelty, the empty-cluster issue that may occur in the estimation process has been addressed by applying 2-means to the cluster with the highest deviance, preserving the monotonicity of the algorithm and providing slightly better results, at a higher computational cost.
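The empty-cluster repair can be sketched as follows: when a cluster empties, the cluster with the highest within-cluster deviance is split by a small 2-means run, and the empty label is reused for one of the halves. The numpy snippet below is an illustrative sketch of that strategy (the function name, the min/max seeding, and the toy data are assumptions, not the package's C++ implementation):

```python
import numpy as np

def split_worst_cluster(X, labels, empty_k, n_iter=20):
    """Fill empty cluster `empty_k` by 2-means on the highest-deviance cluster."""
    ks = np.unique(labels)
    dev = {k: float(np.sum((X[labels == k] - X[labels == k].mean(axis=0)) ** 2))
           for k in ks}
    worst = max(dev, key=dev.get)             # cluster with largest deviance
    idx = np.where(labels == worst)[0]
    sub = X[idx]
    # Crude deterministic seeding for the 2-means run
    c = np.stack([sub.min(axis=0), sub.max(axis=0)]).astype(float)
    for _ in range(n_iter):
        d = np.linalg.norm(sub[:, None, :] - c[None, :, :], axis=2)
        assign = d.argmin(axis=1)
        for j in (0, 1):
            if np.any(assign == j):
                c[j] = sub[assign == j].mean(axis=0)
    new = labels.copy()
    new[idx[assign == 1]] = empty_k  # one half keeps `worst`, the other fills the gap
    return new

# Cluster 0 is two separated blobs; label 2 is currently empty
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 0.0], [5.1, 0.0],
              [100.0, 0.0], [100.1, 0.0]])
labels = np.array([0, 0, 0, 0, 1, 1])
new = split_worst_cluster(X, labels, empty_k=2)
```

Because the split only reassigns points within the worst cluster to a closer sub-centroid, the overall within-cluster deviance cannot increase, which is the intuition behind the preserved monotonicity.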


    The implementations of the two dimensionality reduction methods, dispca and disfa, as well as doublekm, offered by our library are novel in the sense that they have no previous implementation in R. Besides the methodological difference between the last two, their latent variables are computed differently: the former uses the well-known eigendecomposition, while the latter adopts the power method. In general, by implementing all the models in C/C++, the speed advantage has been shown to be remarkable in all the comparisons with existing implementations. These improvements allow the techniques to be applied to relatively large datasets and to obtain results in reasonable amounts of time. Some additional functions have been implemented to help in choosing the values of the hyperparameters. Additionally, they can also be used as assessment tools to evaluate the quality of the results provided by the implementations.

    Caliński, T., and J. Harabasz. 1974. "A Dendrite Method for Cluster Analysis." Communications in Statistics 3 (1): 1–27. https://doi.org/10.1080/03610927408827101.

    Cattell, R. B. 1965. "Factor Analysis: An Introduction to Essentials I. The Purpose and Underlying Models." Biometrics 21 (1): 190–215. https://doi.org/10.2307/2528364.

    Charrad, M., N. Ghazzali, V. Boiteau, and A. Niknafs. 2014. "NbClust: An R Package for Determining the Relevant Number of Clusters in a Data Set." Journal of Statistical Software 61 (6): 1–36. https://doi.org/10.18637/jss.v061.i06.

    Cronbach, Lee J. 1951. "Coefficient Alpha and the Internal Structure of Tests." Psychometrika 16 (3): 297–334. https://doi.org/10.1007/BF02310555.

    De Soete, G., and J. D. Carroll. 1994. "K-Means Clustering in a Low-Dimensional Euclidean Space." Chap. 24 in New Approaches in Classification and Data Analysis, edited by E. Diday, Y. Lechevallier, M. Schader, P. Bertrand, and B. Burtschy. Springer. https://doi.org/10.1007/978-3-642-51175-2_24.

    DeSarbo, W. S., K. Jedidi, K. Cool, and D. Schendel. 1990. "Simultaneous Multidimensional Unfolding and Cluster Analysis: An Investigation of Strategic Groups." Marketing Letters 2: 129–46. https://doi.org/10.1007/BF00436033.

    Dray, S., and A.-B. Dufour. 2007. "The ade4 Package: Implementing the Duality Diagram for Ecologists." Journal of Statistical Software 22 (4): 1–20. https://doi.org/10.18637/jss.v022.i04.

    Eddelbuettel, D., and R. Francois. 2011. "Rcpp: Seamless R and C++ Integration." Journal of Statistical Software 40 (8): 1–18. https://doi.org/10.18637/jss.v040.i08.

    Eddelbuettel, D., and C. Sanderson. 2014. "RcppArmadillo: Accelerating R with High-Performance C++ Linear Algebra." Computational Statistics and Data Analysis 71: 1054–63. https://doi.org/10.1016/j.csda.2013.02.005.

    Hotelling, H. 1933. "Analysis of a Complex of Statistical Variables into Principal Components." Journal of Educational Psychology 24: 417–41, and 498–520. https://doi.org/10.1037/h0071325.

    Hubert, L., and P. Arabie. 1985. "Comparing Partitions." Journal of Classification 2 (1): 193–218. https://doi.org/10.1007/BF01908075.

    Kaiser, Henry F. 1960. "The Application of Electronic Computers to Factor Analysis." Educational and Psychological Measurement 20 (1): 141–51. https://doi.org/10.1177/001316446002000116.

    Kassambara, A. 2022. Factoextra: Extract and Visualize the Results of Multivariate Data Analyses. R package version 1.0.7. https://cran.r-project.org/package=factoextra.

    Kolde, R. 2019. Pheatmap: Pretty Heatmaps. R package version 1.0.12. https://cran.r-project.org/package=pheatmap.

    Lawley, D. N., and A. E. Maxwell. 1962. "Factor Analysis as a Statistical Method." Journal of the Royal Statistical Society. Series D (The Statistician) 12 (3): 209–29. https://doi.org/10.2307/2986915.

    Lê, S., J. Josse, and F. Husson. 2008. "FactoMineR: An R Package for Multivariate Analysis." Journal of Statistical Software 25 (1): 1–18. https://doi.org/10.18637/jss.v025.i01.

    Maechler, M., P. Rousseeuw, A. Struyf, M. Hubert, and K. Hornik. 2023. Cluster: Cluster Analysis Basics and Extensions. R package version 2.1.6. https://CRAN.R-project.org/package=cluster.

    Markos, A., A. I. D'Enza, and M. van de Velden. 2019. "Beyond Tandem Analysis: Joint Dimension Reduction and Clustering in R." Journal of Statistical Software 91 (10): 1–24. https://doi.org/10.18637/jss.v091.i10.

    McQueen, J. 1967. "Some Methods for Classification and Analysis of Multivariate Observations." Computer and Chemistry 4: 257–72. https://www.cs.cmu.edu/~bhiksha/courses/mlsp.fall2010/class14/macqueen.pdf.

    Nieto Librero, A. B., and A. Freitas. 2023. biplotbootGUI: Bootstrap on Classical Biplots and Clustering Disjoint Biplot. https://cran.r-project.org/web/packages/biplotbootGUI/index.html.

    Pardo, C. E., and P. C. Del Campo. 2007. "Combination of Factorial Methods and Cluster Analysis in R: The Package FactoClass." Revista Colombiana de Estadística 30 (2): 231–45. https://revistas.unal.edu.co/index.php/estad/article/view/29478.

    Pearson, K. 1901. "On Lines and Planes of Closest Fit to Systems of Points in Space." The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science 2 (11): 559–72. https://doi.org/10.1080/14786440109462720.

    R Core Team. 2015. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing. http://www.R-project.org/.

    Rand, W. M. 1971. "Objective Criteria for the Evaluation of Clustering Methods." Journal of the American Statistical Association 66 (336): 846–50. https://doi.org/10.2307/2284239.

    Rocci, R., and M. Vichi. 2008. "Two-Mode Multi-Partitioning." Computational Statistics & Data Analysis 52 (4): 1984–2003. https://doi.org/10.1016/j.csda.2007.06.025.

    Rousseeuw, Peter J. 1987. "Silhouettes: A Graphical Aid to the Interpretation and Validation of Cluster Analysis." Journal of Computational and Applied Mathematics 20: 53–65. https://doi.org/10.1016/0377-0427(87)90125-7.

    Schloerke, B., D. Cook, H. Hofmann, et al. 2024. GGally: Extension to 'Ggplot2'. R package version 2.1.2. https://CRAN.R-project.org/package=GGally.

    Timmerman, Marieke E., Eva Ceulemans, Henk A. L. Kiers, and Maurizio Vichi. 2010. "Factorial and Reduced k-Means Reconsidered." Computational Statistics & Data Analysis 54 (7): 1858–71. https://doi.org/10.1016/j.csda.2010.02.009.

    Vichi, M. 2001. "Double k-Means Clustering for Simultaneous Classification of Objects and Variables." Chap. 6 in Advances in Classification and Data Analysis, edited by S. Borra, R. Rocci, M. Vichi, and M. Schader. Springer. https://doi.org/10.1007/978-3-642-59471-7_6.

    Vichi, M. 2017. "Disjoint Factor Analysis with Cross-Loadings." Advances in Data Analysis and Classification 11 (4): 563–91. https://doi.org/10.1007/s11634-016-0263-9.

    Vichi, Maurizio, and Henk A. L. Kiers. 2001. "Factorial k-Means Analysis for Two-Way Data." Computational Statistics & Data Analysis 37 (1): 49–64. https://doi.org/10.1016/S0167-9473(00)00064-5.

    Vichi, Maurizio, and Gilbert Saporta. 2009. "Clustering and Disjoint Principal Component Analysis." Computational Statistics & Data Analysis 53 (8): 3194–208. https://doi.org/10.1016/j.csda.2008.05.028.

    Vichi, M., D. Vicari, and Henk A. L. Kiers. 2019. "Clustering and Dimension Reduction for Mixed Variables." Behaviormetrika, 243–69. https://doi.org/10.1007/s41237-018-0068-6.

    Revelle, W. R. 2017. Psych: Procedures for Personality and Psychological Research. https://cran.r-project.org/web/packages/psych/index.html.

    Ward, J. H. 1963. "Hierarchical Grouping to Optimize an Objective Function." Journal of the American Statistical Association 58 (301): 236–44. https://doi.org/10.1080/01621459.1963.10500845.

    Wickham, H., W. Chang, L. Henry, et al. 2024. Ggplot2: Elegant Graphics for Data Analysis. R package version 3.4.4. https://CRAN.R-project.org/package=ggplot2.

    Wickham, H., R. François, L. Henry, and K. Müller. 2023. Dplyr: A Grammar of Data Manipulation. R package version 1.1.4. https://CRAN.R-project.org/package=dplyr.

    Yamamoto, M., and H. Hwang. 2014. "A General Formulation of Cluster Analysis with Dimension Reduction and Subspace Separation." Behaviormetrika 41: 115–29. https://doi.org/10.2333/bhmk.41.115.

    Zou, H., T. Hastie, and R. Tibshirani. 2006. "Sparse Principal Component Analysis." Journal of Computational and Graphical Statistics 15 (2): 265–86. https://doi.org/10.1198/106186006X113430.

    7 Supplementary materials


    Supplementary materials are available in addition to this article. They can be downloaded at RJ-2025-046.zip


    8 CRAN packages used


    psych, ade4, FactoMineR, FactoClass, factoextra, NbClust, drclust, clustrd, biplotbootGUI, Rcpp, RcppArmadillo, cluster, pheatmap, ggplot2, dplyr, GGally


    9 CRAN Task Views implied by cited packages


    ChemPhys, Cluster, Databases, Environmetrics, HighPerformanceComputing, MissingData, ModelDeployment, NetworkAnalysis, NumericalMathematics, Phylogenetics, Psychometrics, Robust, Spatial, TeachingStatistics


    10 Note


    This article is converted from a Legacy LaTeX article using the texor package. The pdf version is the official version. To report a problem with the html, refer to CONTRIBUTE on the R Journal homepage.


    References


    Reuse


    Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".


    Citation


    For attribution, please cite this work as

    Prunila & Vichi, "drclust: An R Package for Simultaneous Clustering and Dimensionality Reduction", The R Journal, 2026

    BibTeX citation

    @article{RJ-2025-046,
      author = {Prunila, Ionel and Vichi, Maurizio},
      title = {drclust: An R Package for Simultaneous Clustering and Dimensionality Reduction},
      journal = {The R Journal},
      year = {2026},
      note = {https://doi.org/10.32614/RJ-2025-046},
      doi = {10.32614/RJ-2025-046},
      volume = {17},
      issue = {4},
      issn = {2073-4859},
      pages = {103-132}
    }
    + + + + + + + diff --git a/_articles/RJ-2025-046/RJ-2025-046.pdf b/_articles/RJ-2025-046/RJ-2025-046.pdf new file mode 100644 index 0000000000..765639b327 Binary files /dev/null and b/_articles/RJ-2025-046/RJ-2025-046.pdf differ diff --git a/_articles/RJ-2025-046/RJ-2025-046.zip b/_articles/RJ-2025-046/RJ-2025-046.zip new file mode 100644 index 0000000000..0cdea5041d Binary files /dev/null and b/_articles/RJ-2025-046/RJ-2025-046.zip differ diff --git a/_articles/RJ-2025-046/RJournal.sty b/_articles/RJ-2025-046/RJournal.sty new file mode 100644 index 0000000000..351990be38 --- /dev/null +++ b/_articles/RJ-2025-046/RJournal.sty @@ -0,0 +1,358 @@ +% Package `RJournal' to use with LaTeX2e +% Copyright (C) 2010 by the R Foundation +% Copyright (C) 2013 by the R Journal +% +% Originally written by Kurt Hornik and Friedrich Leisch with subsequent +% edits by the editorial board +% +% CAUTION: +% Do not modify this style file. Any changes to this file will be reset when your +% article is submitted. +% If you must modify the style or add LaTeX packages to the article, these +% should be specified in RJwrapper.tex + +\NeedsTeXFormat{LaTeX2e}[1995/12/01] +\ProvidesPackage{RJournal}[2025/10/05 v0.17 RJournal package] + +\RequirePackage{tikz} + +% Overall page layout, fonts etc ----------------------------------------------- + +% Issues of of \emph{The R Journal} are created from the standard \LaTeX{} +% document class \pkg{report}. + +\RequirePackage{geometry} +\geometry{a4paper, + textwidth=14cm, top=1cm, bottom=1cm, + includehead,includefoot,centering, + footskip=1.5cm} +\raggedbottom +\sloppy +\clubpenalty = 10000 +\widowpenalty = 10000 +\brokenpenalty = 10000 +\usepackage{microtype} + + +\RequirePackage{fancyhdr} +\fancyhead{} +\fancyheadoffset{2cm} +\fancyhead[L]{\textsc{\RJ@sectionhead}} +\fancyhead[R]{\thepage} +\fancyfoot{} +\fancyfoot[L]{The R Journal Vol. 
\RJ@volume/\RJ@number, \RJ@month~\RJ@year} +\fancyfoot[R]{ISSN 2073-4859} +\pagestyle{fancy} + +% We use the following fonts (all with T1 encoding): +% +% rm & palatino +% tt & inconsolata +% sf & helvetica +% math & palatino + +\RequirePackage{microtype} + +\RequirePackage[scaled=0.92]{helvet} +\RequirePackage{palatino,mathpazo} +\RequirePackage[scaled=1.02]{inconsolata} +\RequirePackage[T1]{fontenc} + +\RequirePackage[hyphens]{url} +\RequirePackage[pagebackref]{hyperref} +\renewcommand{\backref}[1]{[p#1]} + +% Dark blue colour for all links +\RequirePackage{color} +\definecolor{link}{rgb}{0.45,0.51,0.67} +\hypersetup{ + colorlinks,% + citecolor=link,% + filecolor=link,% + linkcolor=link,% + urlcolor=link +} + +% Give the text a little room to breath +\setlength{\parskip}{3pt} +\RequirePackage{setspace} +\setstretch{1.05} + +% Issue and article metadata --------------------------------------------------- + +% Basic front matter information about the issue: volume, number, and +% date. + +\newcommand{\volume}[1]{\def\RJ@volume{#1}} +\newcommand{\volnumber}[1]{\def\RJ@number{#1}} +\renewcommand{\month}[1]{\def\RJ@month{#1}} +\renewcommand{\year}[1]{\def\RJ@year{#1}} + + +% Individual articles correspond to +% chapters, and are contained in |article| environments. This makes it +% easy to have figures counted within articles and hence hyperlinked +% correctly. + +% An article has an author, a title, and optionally a subtitle. We use +% the obvious commands for specifying these. Articles will be put in certain +% journal sections, named by \sectionhead. + +\newcommand {\sectionhead} [1]{\def\RJ@sectionhead{#1}} +\renewcommand{\author} [1]{\def\RJ@author{#1}} +\renewcommand{\title} [1]{\def\RJ@title{#1}} +\newcommand {\subtitle} [1]{\def\RJ@subtitle{#1}} + +% Control appearance of titles: make slightly smaller than usual, and +% suppress section numbering. 
See http://tex.stackexchange.com/questions/69749 +% for why we don't use \setcounter{secnumdepth}{-1} + +\usepackage[medium]{titlesec} +\usepackage{titletoc} +\titleformat{\section} {\normalfont\large\bfseries}{\arabic{section}}{1em}{} +\titleformat{\subsection}{\normalfont\normalsize\bfseries}{\arabic{section}.\arabic{subsection}}{0.5em}{} +\titlecontents{chapter} [0em]{}{}{}{\titlerule*[1em]{.}\contentspage} + +% Article layout --------------------------------------------------------------- + +% Environment |article| clears the article header information at its beginning. +% We use |\FloatBarrier| from the placeins package to keep floats within +% the article. +\RequirePackage{placeins} +\newenvironment{article}{\author{}\title{}\subtitle{}\FloatBarrier}{\FloatBarrier} + +% Refereed articles should have an abstract, so we redefine |\abstract| to +% give the desired style + +\renewcommand{\abstract}[1]{\noindent\textbf{Abstract} #1} +\renewenvironment{abstract}{\noindent\textbf{Abstract}~}{} + +% The real work is done by a redefined version of |\maketitle|. Note +% that even though we do not want chapters (articles) numbered, we +% need to increment the chapter counter, so that figures get correct +% labelling. + +\renewcommand{\maketitle}{% +\noindent + \chapter{\RJ@title}\refstepcounter{chapter} + \ifx\empty\RJ@subtitle + \else + \noindent\textbf{\RJ@subtitle} + \par\nobreak\addvspace{\baselineskip} + \fi + \ifx\empty\RJ@author + \else + \noindent\textit{\RJ@author} + \par\nobreak\addvspace{\baselineskip} + \fi + \@afterindentfalse\@nobreaktrue\@afterheading +} + +% Now for some ugly redefinitions. We do not want articles to start a +% new page. 
(Actually, we do, but this is handled via explicit +% \newpage +% +% The name@of@eq is a hack to get hyperlinks to equations to work +% within each article, even though there may be multiple eq.(1) +% \begin{macrocode} +\renewcommand\chapter{\secdef\RJ@chapter\@schapter} +\providecommand{\nohyphens}{% + \hyphenpenalty=10000\exhyphenpenalty=10000\relax} +\newcommand{\RJ@chapter}{% + \edef\name@of@eq{equation.\@arabic{\c@chapter}}% + \renewcommand{\@seccntformat}[1]{}% + \@startsection{chapter}{0}{0mm}{% + -2\baselineskip \@plus -\baselineskip \@minus -.2ex}{\p@}{% + \phantomsection\normalfont\huge\bfseries\raggedright}} + +% Book reviews should appear as sections in the text and in the pdf bookmarks, +% however we wish them to appear as chapters in the TOC. Thus we define an +% alternative to |\maketitle| for reviews. +\newcommand{\review}[1]{ + \pdfbookmark[1]{#1}{#1} + \section*{#1} + \addtocontents{toc}{\protect\contentsline{chapter}{#1}{\thepage}{#1.1}} +} + +% We want bibliographies as starred sections within articles. +% +\RequirePackage[sectionbib,round]{natbib} +\bibliographystyle{abbrvnat} +\renewcommand{\bibsection}{\section*{References}} + +% Equations, figures and tables are counted within articles, but we do +% not show the article number. For equations it becomes a bit messy to avoid +% having hyperref getting it wrong. + +% \numberwithin{equation}{chapter} +\renewcommand{\theequation}{\@arabic\c@equation} +\renewcommand{\thefigure}{\@arabic\c@figure} +\renewcommand{\thetable}{\@arabic\c@table} + +% Issue layout ----------------------------------------------------------------- + +% Need to provide our own version of |\tableofcontents|. We use the +% tikz package to get the rounded rectangle. Notice that |\section*| +% is really the same as |\chapter*|. 
+\renewcommand{\contentsname}{Contents} +\renewcommand\tableofcontents{% + \vspace{1cm} + \section*{\contentsname} + { \@starttoc{toc} } +} + +\renewcommand{\titlepage}{% + \thispagestyle{empty} + \hypersetup{ + pdftitle={The R Journal Volume \RJ@volume/\RJ@number, \RJ@month \RJ@year},% + pdfauthor={R Foundation for Statistical Computing},% + } + \noindent + \begin{center} + \fontsize{50pt}{50pt}\selectfont + The \raisebox{-8pt}{\includegraphics[height=77pt]{Rlogo-5}}\hspace{10pt} + Journal + + \end{center} + {\large \hfill Volume \RJ@volume/\RJ@number, \RJ@month{} \RJ@year \quad} + + \rule{\textwidth}{1pt} + \begin{center} + {\Large A peer-reviewed, open-access publication of the \\ + R Foundation for Statistical Computing} + \end{center} + + % And finally, put in the TOC box. Note the way |tocdepth| is adjusted + % before and after producing the TOC: thus, we can ensure that only + % articles show up in the printed TOC, but that in the PDF version, + % bookmarks are created for sections and subsections as well (provided + % that the non-starred forms are used). + \setcounter{tocdepth}{0} + \tableofcontents + \setcounter{tocdepth}{2} + \clearpage +} + +% Text formatting -------------------------------------------------------------- + +\newcommand{\R}{R} +\newcommand{\address}[1]{\addvspace{\baselineskip}\noindent\emph{#1}} +\newcommand{\email}[1]{\href{mailto:#1}{\normalfont\texttt{#1}}} + +% Simple font selection is not good enough. For example, |\texttt{--}| +% gives `\texttt{--}', i.e., an endash in typewriter font. Hence, we +% need to turn off ligatures, which currently only happens for commands +% |\code| and |\samp| and the ones derived from them. Hyphenation is +% another issue; it should really be turned off inside |\samp|. And +% most importantly, \LaTeX{} special characters are a nightmare. E.g., +% one needs |\~{}| to produce a tilde in a file name marked by |\file|. 
+% Perhaps a few years ago, most users would have agreed that this may be +% unfortunate but should not be changed to ensure consistency. But with +% the advent of the WWW and the need for getting `|~|' and `|#|' into +% URLs, commands which only treat the escape and grouping characters +% specially have gained acceptance + +\DeclareRobustCommand\code{\bgroup\@noligs\@codex} +\def\@codex#1{\texorpdfstring% +{{\normalfont\ttfamily\hyphenchar\font=-1 #1}}% +{#1}\egroup} +\newcommand{\kbd}[1]{{\normalfont\texttt{#1}}} +\newcommand{\key}[1]{{\normalfont\texttt{\uppercase{#1}}}} +\DeclareRobustCommand\samp{`\bgroup\@noligs\@sampx} +\def\@sampx#1{{\normalfont\texttt{#1}}\egroup'} +\newcommand{\var}[1]{{\normalfont\textsl{#1}}} +\let\env=\code +\newcommand{\file}[1]{{`\normalfont\textsf{#1}'}} +\let\command=\code +\let\option=\samp +\newcommand{\dfn}[1]{{\normalfont\textsl{#1}}} +% \acronym is effectively disabled since not used consistently +\newcommand{\acronym}[1]{#1} +\newcommand{\strong}[1]{\texorpdfstring% +{{\normalfont\fontseries{b}\selectfont #1}}% +{#1}} +\let\pkg=\strong +\newcommand{\CRANpkg}[1]{\href{https://CRAN.R-project.org/package=#1}{\pkg{#1}}}% +\let\cpkg=\CRANpkg +\newcommand{\ctv}[1]{\href{https://CRAN.R-project.org/view=#1}{\emph{#1}}} +\newcommand{\BIOpkg}[1]{\href{https://www.bioconductor.org/packages/release/bioc/html/#1.html}{\pkg{#1}}} + +% Example environments --------------------------------------------------------- +\RequirePackage{fancyvrb} +\RequirePackage{alltt} + +\DefineVerbatimEnvironment{example}{Verbatim}{} +\renewenvironment{example*}{\begin{alltt}}{\end{alltt}} + +% Support for output from Sweave, and generic session style code +% These used to have fontshape=sl for Sinput/Scode/Sin, but pslatex +% won't use a condensed font in that case. 
+ +% Update (2015-05-28 by DS): remove fontsize=\small to match example environment + +\DefineVerbatimEnvironment{Sinput}{Verbatim}{} +\DefineVerbatimEnvironment{Soutput}{Verbatim}{} +\DefineVerbatimEnvironment{Scode}{Verbatim}{} +\DefineVerbatimEnvironment{Sin}{Verbatim}{} +\DefineVerbatimEnvironment{Sout}{Verbatim}{} +\newenvironment{Schunk}{}{} + +% Mathematics ------------------------------------------------------------------ + +% The implementation of |\operatorname| is similar to the mechanism +% \LaTeXe{} uses for functions like sin and cos, and simpler than the +% one of \AmSLaTeX{}. We use |\providecommand| for the definition in +% order to keep the one of the \pkg{amstex} if this package has +% already been loaded. +% \begin{macrocode} +\providecommand{\operatorname}[1]{% + \mathop{\operator@font#1}\nolimits} +\RequirePackage{amsfonts} + +\renewcommand{\P}{% + \mathop{\operator@font I\hspace{-1.5pt}P\hspace{.13pt}}} +\newcommand{\E}{% + \mathop{\operator@font I\hspace{-1.5pt}E\hspace{.13pt}}} +\newcommand{\VAR}{\operatorname{var}} +\newcommand{\COV}{\operatorname{cov}} +\newcommand{\COR}{\operatorname{cor}} + +% Figures ---------------------------------------------------------------------- + +% For use with pandoc > 3.2.1 +\newsavebox\pandoc@box +\newcommand*\pandocbounded[1]{% scales image to fit in text height/width + \sbox\pandoc@box{#1}% + \Gscale@div\@tempa{\textheight}{\dimexpr\ht\pandoc@box+\dp\pandoc@box\relax}% + \Gscale@div\@tempb{\linewidth}{\wd\pandoc@box}% + \ifdim\@tempb\p@<\@tempa\p@\let\@tempa\@tempb\fi% select the smaller of both + \ifdim\@tempa\p@<\p@\scalebox{\@tempa}{\usebox\pandoc@box}% + \else\usebox{\pandoc@box}% + \fi% +} + +\RequirePackage[font=small,labelfont=bf]{caption} + +% Wide environments for figures and tables ------------------------------------- +\RequirePackage{environ} + +% An easy way to make a figure span the full width of the page +\NewEnviron{widefigure}[1][]{ +\begin{figure}[#1] +\advance\leftskip-2cm 
+\begin{minipage}{\dimexpr\textwidth+4cm\relax}% + \captionsetup{margin=2cm} + \BODY +\end{minipage}% +\end{figure} +} + +\NewEnviron{widetable}[1][]{ +\begin{table}[#1] +\advance\leftskip-2cm +\begin{minipage}{\dimexpr\textwidth+4cm\relax}% + \captionsetup{margin=2cm} + \BODY +\end{minipage}% +\end{table} +} diff --git a/_articles/RJ-2025-046/RJwrapper.md b/_articles/RJ-2025-046/RJwrapper.md new file mode 100644 index 0000000000..4b8cca9950 --- /dev/null +++ b/_articles/RJ-2025-046/RJwrapper.md @@ -0,0 +1,2029 @@ +--- +abstract: | + The primary objective of simultaneous methodologies for clustering and + variable reduction is to identify both the optimal partition of units + and the optimal subspace of variables, all at once. The optimality is + typically determined using least squares or maximum likelihood + estimation methods. These simultaneous techniques are particularly + useful when working with Big Data, where the reduction (synthesis) is + essential for both units and variables. Furthermore, a secondary + objective of reducing variables through a subspace is to enhance the + interpretability of the latent variables identified by the subspace + using specific methodologies. The drclust package implements double + K-means (KM), reduced KM, and factorial KM to address the primary + objective. KM with disjoint principal components addresses both the + primary and secondary objectives, while disjoint principal component + analysis and disjoint factor analysis address the latter, producing + the sparsest loading matrix. The models are implemented in C++ for + faster execution, processing large data matrices in a reasonable + amount of time. 
+address:
+- |
+  Ionel Prunila\
+  Department of Statistical Sciences, Sapienza University of Rome\
+  P.le Aldo Moro 5, 00185 Rome\
+  Italy\
+  ORCiD: 0009-0009-3773-0481\
+  [ionel.prunila@uniroma1.it](ionel.prunila@uniroma1.it){.uri}
+- |
+  Maurizio Vichi\
+  Department of Statistical Sciences, Sapienza University of Rome\
+  P.le Aldo Moro 5, 00185 Rome\
+  Italy\
+  ORCiD: 0000-0002-3876-444X\
+  [maurizio.vichi@uniroma1.it](maurizio.vichi@uniroma1.it){.uri}
+author:
+- by Ionel Prunila and Maurizio Vichi
+bibliography:
+- prunila-vichi.bib
+title: "drclust: An R Package for Simultaneous Clustering and
+  Dimensionality Reduction"
+---
+
+::::::::: article
+## Introduction {#Introduction}
+
+Cluster analysis is the process of identifying homogeneous groups of
+units in the data, so that units within the same cluster exhibit a low
+degree of dissimilarity, while units in different clusters exhibit a
+high degree of dissimilarity. When dealing with large or extremely
+large data matrices, often referred to as Big Data, the task of
+assessing these dissimilarities becomes computationally intensive due
+to the sheer volume of units and variables involved. To manage this
+vast amount of information, it is essential to employ statistical
+techniques that synthesize and highlight the most significant aspects
+of the data. Typically, this involves dimensionality reduction for
+both units and variables to efficiently summarize the data.
+
+While cluster analysis synthesizes information across the rows of the
+data matrix, variable reduction operates on the columns, aiming to
+summarize the features and, ideally, facilitate their interpretation.
+This key process involves extracting a subspace from the full space
+spanned by the manifest variables, maintaining the principal informative
+content. 
The process allows for the synthesis of common information
+mainly among subsets of manifest variables, which represent concepts not
+directly observable. As a result, subspace-based variable reduction
+identifies a few uncorrelated latent variables that mainly capture
+common relationships within these subsets. When using techniques like
+Factor Analysis (FA) or Principal Component Analysis (PCA) for this
+purpose, interpreting the resulting factors or components can be
+challenging, particularly when variables significantly load onto
+multiple factors, a situation known as *cross-loading*. Therefore, a
+simpler structure in the loading matrix, focusing on the primary
+relationship between each variable and its related factor, becomes
+desirable for clarity and ease of interpretation. Furthermore, the
+latent variables derived from PCA or FA do not provide a unique
+solution. An equivalent model fit can be achieved by applying an
+orthogonal rotation to the component axes. This aspect of non-uniqueness
+is often exploited in practice through Varimax rotation, which is
+designed to improve the interpretability of latent variables without
+affecting the fit of the analysis. Rotation promotes a simpler
+structure in the loading matrix; however, it does not always ensure
+enhanced interpretability. An alternative approach has been proposed by
+Vichi and Saporta (2009) and Vichi (2017), with Disjoint Principal
+Component Analysis (DPCA) and Disjoint FA (DFA), in which each
+component/factor is constructed from a distinct subset of manifest
+variables rather than from all available variables, while still
+optimizing the same estimation criterion as PCA and FA, respectively.
+ +It is important to note that data matrix reduction for both rows and +columns is often performed without specialized methodologies by +employing a \"tandem analysis.\" This involves sequentially applying two +methods, such as using PCA or FA for variable reduction, followed by +Cluster Analysis using KM on the resulting factors. Alternatively, one +could start with Cluster Analysis and then proceed to variable +reduction. The outcomes of these two tandem analyses differ since each +approach optimizes distinct objective functions, one before the other. +For instance, when PCA is applied first, the components maximize the +total variance of the manifest variables. However, if the manifest +variables include high-variance variables that lack a clustering +structure, these will be included in the components, even though they +are not necessary for KM, which focuses on explaining only the variance +between clusters. As a result, sequentially optimizing two different +objectives may lead to sub-optimal solutions. In contrast, when +combining KM with PCA or FA in a simultaneous approach, a single +integrated objective function is utilized. This function aims to +optimize both the clustering partition and the subspace simultaneously. +The optimization is typically carried out using an Alternating Least +Squares (ALS) algorithm, which updates the partition for the current +subspace in one step and the subspace for the current partition in the +next. This iterative process ensures convergence to a solution that +represents at least a local minimum of the integrated objective +function. In comparison, tandem analysis, which follows a sequential +approach (e.g., PCA followed by KM), does not guarantee joint +optimization. 
One potential limitation of this sequential method is that
+the initial optimization through PCA may obscure relevant information
+for the subsequent step of Cluster Analysis or emphasize irrelevant
+patterns, ultimately leading to sub-optimal solutions, as noted by
+DeSarbo et al. (1990). Indeed, the simultaneous strategy has been shown
+to be effective in various studies, such as De Soete and Carroll (1994),
+Vichi and Kiers (2001), Vichi (2001), Vichi and Saporta (2009), Rocci
+and Vichi (2008), Timmerman et al. (2010), and Yamamoto and Hwang (2014).
+
+In order to spread access to these techniques and their use, software
+implementations are needed. Within the R environment (R Core Team 2015),
+several libraries are available for dimensionality reduction. The plain
+versions of KM, PCA, and FA are available in the built-in package stats
+as `kmeans`, `princomp`, and `factanal`, respectively. Furthermore,
+some packages go beyond the plain estimation and output of such
+algorithms. One of the richest libraries in R is
+[**psych**](https://CRAN.R-project.org/package=psych)
+(W. R. Revelle 2017), which provides functions to easily
+simulate data according to different schemes, testing routines, the
+calculation of various estimates, as well as multiple estimation
+methods. [**ade4**](https://CRAN.R-project.org/package=ade4) (Dray and
+Dufour 2007) allows for dimensionality reduction in the presence of
+different types of variables, along with many graphical tools. The
+[**FactoMineR**](https://CRAN.R-project.org/package=FactoMineR) (Lê et
+al. 2008) package allows for unit-clustering and extraction of latent
+variables, also in the presence of mixed variables.
+[**FactoClass**](https://CRAN.R-project.org/package=FactoClass) (Pardo
+and Del Campo 2007) implements functions for PCA, Correspondence
+Analysis (CA) as well as clustering, including the tandem approach. 
+[**factoextra**](https://CRAN.R-project.org/package=factoextra)
+(Kassambara 2022), instead, provides visualization of the results,
+aiding their assessment through tools for choosing the number of latent
+variables, elegant dendrograms, scree plots, and more. More focused on
+the choice of the number of clusters is
+[**NbClust**](https://CRAN.R-project.org/package=NbClust) (Charrad et
+al. 2014), which offers 30 indices for determining the number of
+clusters and proposes the best choice by trying not only different
+numbers of groups but also different distance measures and clustering
+methods, going beyond partitioning ones.
+
+More closely related to the package presented here are, to the
+authors' knowledge, two packages that implement a subset of the
+techniques provided by
+[**drclust**](https://CRAN.R-project.org/package=drclust).
+[**clustrd**](https://CRAN.R-project.org/package=clustrd) (Markos et al.
+2019) implements simultaneous methods of clustering and dimensionality
+reduction. Besides offering functions for continuous data, it also
+allows for categorical (or mixed) variables. Moreover, at least for the
+continuous case, it provides an implementation aligned with the
+objective function proposed by Yamamoto and Hwang (2014), of which the
+reduced KM (RKM) and factorial KM (FKM) are special cases obtained for
+particular values of a tuning parameter.
+
+Finally, there is
+[**biplotbootGUI**](https://CRAN.R-project.org/package=biplotbootGUI)
+(Nieto Librero and Freitas 2023), which offers a GUI for interacting
+with graphical tools that aid in the choice of the number of components
+and clusters. Furthermore, it implements KM with disjoint PCA (DPCA), as
+described in Vichi and Saporta (2009). It also provides an optimization
+algorithm for choosing the initial starting point from
+which the estimation process for the parameters begins. 
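
The tandem approach mentioned above can be made concrete with a small
sketch: variable reduction by PCA first, then a plain KM pass on the
component scores. The snippet below is a NumPy illustration written for
this article (the packages above are R libraries; the data and all names
here are invented for the example):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: n = 60 units, J = 5 variables, two well-separated groups.
X = np.vstack([rng.normal(0.0, 0.3, size=(30, 5)),
               rng.normal(3.0, 0.3, size=(30, 5))])
X = X - X.mean(axis=0)              # column-center, as PCA assumes

# Step 1 (variable reduction): project onto the first Q principal components.
Q = 2
_, _, Vt = np.linalg.svd(X, full_matrices=False)
Y = X @ Vt[:Q].T                    # n x Q component scores

# Step 2 (clustering): plain Lloyd/KM iterations on the scores.
def km(Y, K, iters=50):
    # Crude deterministic start for the sketch: evenly spaced rows.
    centroids = Y[::max(1, len(Y) // K)][:K].copy()
    for _ in range(iters):
        d = ((Y[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for k in range(K):
            if np.any(labels == k):
                centroids[k] = Y[labels == k].mean(axis=0)
    return labels, centroids

labels, centroids = km(Y, K=2)
```

Because the two steps optimize different criteria, high-variance
variables without a cluster structure can dominate the components; the
simultaneous methods discussed in this article avoid this by optimizing
a single objective.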
+
+Like [**clustrd**](https://CRAN.R-project.org/package=clustrd), the
+[**drclust**](https://CRAN.R-project.org/package=drclust) package
+provides implementations of FKM and RKM. However, while
+[**clustrd**](https://CRAN.R-project.org/package=clustrd) also supports
+categorical and mixed-type variables, our implementation currently
+handles only continuous variables. That said, appropriate pre-processing
+of categorical variables, as suggested in Vichi et al. (2019), can make
+them compatible with the proposed methods. In essence, one should
+dummy-encode all the qualitative variables. In terms of
+performance, [**drclust**](https://CRAN.R-project.org/package=drclust)
+offers significantly faster execution. Moreover, regarding FKM, our
+proposal demonstrates superior results in both empirical applications
+and simulations, in terms of model fit and the Adjusted Rand Index
+(ARI). Another alternative,
+[**biplotbootGUI**](https://CRAN.R-project.org/package=biplotbootGUI),
+implements KM with DPCA and includes built-in plotting functions and an
+SDP-based initialization of parameters. However, our implementation
+remains considerably faster and allows users to specify which variables
+should be grouped together within the same (or different) principal
+components. This capability enables a partially or fully confirmatory
+approach to variable reduction. Beyond speed and the confirmatory
+option, [**drclust**](https://CRAN.R-project.org/package=drclust) offers
+three methods not currently available in other `R` packages: DPCA and
+DFA, both designed for pure dimensionality reduction, and double KM
+(DKM), which performs simultaneous clustering and variable reduction via
+KM. All methods are implemented in C++ for computational efficiency. 
+Table [2](#tab:T2){reference-type="ref" reference="tab:stat_models"}
+summarizes the similarities and differences between `drclust` and
+existing alternatives.
+
+The package presented in this work aims to facilitate access to, and
+the usability of, techniques that fall into two main, overlapping
+branches. To this end, some statistical background is first recalled.
+
+## Notation and theoretical background
+
+The main pillars of
+[**drclust**](https://CRAN.R-project.org/package=drclust) fall into two
+main categories: dimensionality reduction and (partitioning) cluster
+analysis. The former may be carried out on its own or blended with the
+latter. Because both rely on the language of linear algebra, Table
+[1](#tab:T1){reference-type="ref" reference="tab:notation"} collects,
+for the convenience of the reader, the mathematical notation used in
+this context. Some theoretical background is then reported.
+
+::: {#tab:notation}
+| Symbol | Description |
+|:---|:---|
+| *n*, *J*, *K*, *Q* | number of: units, manifest variables, unit-clusters, latent factors |
+| $\mathbf{X}$ | *n* x *J* data matrix, where the generic element $x_{ij}$ is the real observation on the *i*-th unit within the *j*-th variable |
+| $\mathbf{x}_i$ | *J* x 1 vector representing the generic row of $\mathbf{X}$ |
+| $\mathbf{U}$ | *n* x *K* unit-cluster membership matrix, binary and row stochastic, with $u_{ik}$ being the generic element |
+| $\mathbf{V}$ | *J* x *Q* variable-cluster membership matrix, binary and row stochastic, with $v_{jq}$ as the generic element |
+| $\mathbf{B}$ | *J* x *J* variable-weighting diagonal matrix |
+| $\mathbf{Y}$ | *n* x *Q* component/factor score matrix defined on the reduced subspace |
+| $\mathbf{y}_i$ | *Q* x 1 vector representing the generic row of $\mathbf{Y}$ |
+| $\mathbf{A}$ | *J* x *Q* \"plain\" variables-to-factors loading matrix |
+| $\mathbf{C}^+$ | Moore-Penrose pseudo-inverse of a matrix $\mathbf{C}$, $\mathbf{C}^+ = (\mathbf{C'C})^{-1}\mathbf{C'}$ |
+| $\bar{\textbf{X}}$ | *K* x *J* centroid matrix in the original feature space, i.e., $\bar{\textbf{X}} = \textbf{U}^{+} \textbf{X}$ |
+| $\bar{\mathbf{Y}}$ | *K* x *Q* centroid matrix projected in the reduced subspace, i.e., $\bar{\mathbf{Y}} = \bar{\mathbf{X}}\mathbf{A}$ |
+| $\mathbf{H}_{\mathbf{C}}$ | projector operator $\mathbf{H}_\mathbf{C} = \mathbf{C}(\mathbf{C}'\mathbf{C})^{-1}\mathbf{C}'$ onto the space spanned by the columns of matrix $\mathbf{C}$ |
+| $\mathbf{E}$ | *n* x *J* error term matrix |
+| $\|\cdot\|$ | Frobenius norm |
+
+: (#tab:T1) Notation
+:::
+
+### Latent variables with simple-structure loading matrix
+
+Classical methods of PCA (Pearson 1901) or FA (Cattell 1965; Lawley and
+Maxwell 1962) build each latent factor from a combination of *all* the
+manifest variables. As a consequence, the loading matrix, which
+describes the relations between manifest and latent variables, is
+usually not immediately interpretable. Ideally, it is desirable to have
+each variable associated with a single factor. This is typically called
+*simple structure*; it induces subsets of variables characterizing the
+factors and, frequently, a partition of the variables. Factor rotation
+techniques (especially Varimax) go in this direction, even if not
+exactly, but they do not guarantee this result. Alternative solutions
+have been proposed: Zou et al. (2006) frame the PCA problem as a
+regression one and introduce an elastic-net penalty, aiming for a
+sparse solution of the loading matrix **A**. 
For the present work, we consider +two techniques for this purpose: DPCA and DFA, implemented in the +proposed package. + +#### Disjoint principal component analysis + +Vichi and Saporta (2009) propose an alternative solution, DPCA, which +leads to the simplest possible structure on **A**, while still +maximizing the explained variance. Such a result is obtained by building +each latent factor from a subset of variables instead of allowing all +the variables to contribute to all the components. This means that it +provides *J* non-zero loadings instead of having *JQ* of them. To obtain +this setting, variables are grouped in such a way that they form a +partition of the initial set. The model can be described as a +constrained PCA, where the matrix $\mathbf{A}$ is restricted to be +reparametrized into the product $\mathbf{A}=\mathbf{BV}$. Thus, the +model is described as: + +$$\begin{equation} +\label{dpca1} + \mathbf{X} = \mathbf{X}\mathbf{A}\mathbf{A}' + \mathbf{E}= \mathbf{X}\mathbf{B}\mathbf{V}\mathbf{V}'\mathbf{B} + \mathbf{E}, +\end{equation} (\#eq:dpca1)$$ +subject to +$$\begin{equation} +\label{dpca2} + \mathbf{V} = [v_{jq} \in \{0,1\}] \ \ \ \ \ (binarity), +\end{equation} (\#eq:dpca2)$$ + +$$\begin{equation} +\label{dpca3} + \mathbf{V}\mathbf{1}_{Q} = \mathbf{1}_{J} \ \ \ (row-stochasticity), +\end{equation} (\#eq:dpca3)$$ + +$$\begin{equation} +\label{dpca4} +\mathbf{V}'\mathbf{B}\mathbf{B}'\mathbf{V} = \mathbf{I}_{Q} \ \ \ \ \ (orthonormality), +\end{equation} (\#eq:dpca4)$$ + +$$\begin{equation} +\label{dpca5} + \mathbf{B} = diag(b_1, \dots, b_J) \ \ \ \ (diagonality). 
+\end{equation} (\#eq:dpca5)$$
+The estimation of the parameters $\mathbf{B}$ and $\mathbf{V}$ is
+carried out via least squares (LS), by solving the minimization
+problem
+$$\begin{equation}
+\label{dpca6}
+    RSS_{DPCA}(\mathbf{B}, \mathbf{V}) = ||\mathbf{X} - \mathbf{X}\mathbf{B}\mathbf{V}\mathbf{V}'\mathbf{B}||^2
+\end{equation} (\#eq:dpca6)$$
+subject to the constraints (\@ref(eq:dpca2), \@ref(eq:dpca3),
+\@ref(eq:dpca4), \@ref(eq:dpca5)). An ALS algorithm is employed,
+guaranteeing convergence to at least a local optimum. To (at least
+partially) overcome this limitation, multiple random starts are used,
+and the best solution is retained.
+
+Therefore, the DPCA method is subject to more structural constraints
+than standard PCA. Specifically, standard PCA does not enforce the
+reparameterization $\mathbf{A}=\mathbf{BV}$, meaning its loading matrix
+$\mathbf{A}$ is free to vary among orthonormal matrices. In contrast,
+DPCA still requires an orthonormal matrix $\mathbf{A}$, but it also
+requires that each principal component be associated with a disjoint
+subset of variables that best reconstructs the data. This implies that
+each variable contributes to only one component, resulting in a sparse
+and block-diagonal loading matrix. In essence, DPCA fits *Q* separate
+PCAs on the *Q* disjoint subsets of variables and, from each, extracts
+the eigenvector associated with the largest eigenvalue. In general, the
+total variance explained by DPCA is slightly lower, and the residual of
+the objective function larger, than for PCA. This trade-off is made in
+exchange for the added constraint, which clearly enhances
+interpretability. The extent of the reduction depends on the true
+underlying structure of the latent factors, specifically on whether they
+are truly uncorrelated. 
When the observed correlation matrix is block
+diagonal, with variables within blocks being highly correlated and
+variables between blocks being uncorrelated, DPCA can explain almost
+the same amount of variance as PCA, with the advantage of a simpler
+interpretation.\
+It is important to note that DPCA, as implemented, allows for a
+blend of exploratory and confirmatory approaches. In the confirmatory
+framework, users can specify a priori which variables should
+collectively contribute to a factor using the `constr` argument,
+available for the last three functions in Table
+[2](#tab:T2){reference-type="ref" reference="tab:stat_models"}. The
+algorithm assigns the remaining manifest variables, for which no
+constraint has been specified, to the *Q* factors in a way that ensures
+the latent variables best reconstruct the manifest ones, capturing the
+maximum variance. This is accomplished by minimizing the loss function
+(\@ref(eq:dpca6)). Although each of the *Q* latent variables is derived
+from a different subset of variables, which involves the spectral
+decomposition of multiple covariance matrices, their smaller size,
+combined with the implementation in C++, enables very rapid execution of
+the routine.
+
+A very positive side effect of the additional constraint in DPCA
+compared to standard PCA is the uniqueness of the solution, which
+eliminates the need for factor rotation in DPCA.
+
+#### Disjoint factor analysis
+
+Proposed by Vichi (2017), this technique is the model-based counterpart
+of the DPCA model. It pursues a similar goal in terms of building *Q*
+factors from *J* variables, imposing a simple structure on the loading
+matrix. However, the means by which the goal is pursued are different.
+Unlike DPCA, DFA is estimated by Maximum Likelihood, and the model
+requires additional statistical assumptions. 
The model can be formulated in matrix form as,
+$$\begin{equation}
+\label{dfa1}
+    \mathbf{X} = \mathbf{Y}\mathbf{A}'+\mathbf{E},
+\end{equation} (\#eq:dfa1)$$
+where $\mathbf{X}$ is centered, meaning that the mean vector
+$\boldsymbol{\mu}$ has been subtracted from each multivariate unit
+$\mathbf{x}_{i}$. Therefore, for a multivariate centered unit, the
+previous model can be expressed as
+$$\begin{equation}
+\label{dfa2}
+    \mathbf{x}_i = \mathbf{A}\mathbf{y}_i + \mathbf{e}_i, \ \ i = 1, \dots, n,
+\end{equation} (\#eq:dfa2)$$
+where $\mathbf{y}_i$ is the *i*-th row of $\mathbf{Y}$ and
+$\mathbf{x}_i$, $\mathbf{e}_i$ are, respectively, the $i$-th rows of
+$\mathbf{X}$ and $\mathbf{E}$, with a multivariate normal distribution
+on the $J$-dimensional space,
+$$\begin{equation}
+\label{FAassumptions1}
+    \mathbf{x}_i \sim \mathcal{N}(\boldsymbol{0}, \boldsymbol{\Sigma_X}), \ \ \ \mathbf{e}_i \sim \mathcal{N}(\boldsymbol{0}, \mathbf{\Psi}).
+\end{equation} (\#eq:FAassumptions1)$$
+The covariance structure of the FA model can be written as
+$$\begin{equation}
+    Cov(\mathbf{x}_i) = \mathbf{\Sigma_X} = \mathbf{AA'} + \mathbf{\Psi},
+\end{equation}$$
+[]{#dfa6 label="dfa6"} where additional assumptions are needed,
+$$\begin{equation}
+\label{dfa4}
+    Cov(\mathbf{y}_{i}) = \mathbf{\Sigma}_{\mathbf{Y}} = \mathbf{I}_Q,
+\end{equation} (\#eq:dfa4)$$
+
+$$\begin{equation}
+\label{dfa5}
+    Cov(\mathbf{e}_i) = \mathbf{\Sigma}_{\mathbf{E}} = \mathbf{\Psi}, \ \ \ \mathbf{\Psi} = diag(\psi_{1},\dots,\psi_{J}), \ \ \psi_{j}>0, \ \ j = 1, \dots, J,
+\end{equation} (\#eq:dfa5)$$
+
+$$\begin{equation}
+    Cov(\mathbf{e}_{i}, \mathbf{y}_{i}) = \mathbf{\Sigma}_{\mathbf{EY}} = \mathbf{0},
+\label{dfa5b}
+\end{equation} (\#eq:dfa5b)$$
+
+$$\begin{equation}
+\mathbf{A} = \mathbf{BV}.
+\label{dfa6b}
+\end{equation} (\#eq:dfa6b)$$
+The objective function can be formulated as the maximization of the
+likelihood function or as the minimization of the following discrepancy:
+$$\begin{align*}
D_{DFA}(\mathbf{B},\mathbf{V}, \mathbf{\Psi})
+    & = \text{ln}\,|\mathbf{B}\mathbf{V}\mathbf{V}'\mathbf{B} + \mathbf{\Psi}| - \text{ln}\,|\mathbf{S}| + \text{tr}((\mathbf{B}\mathbf{V}\mathbf{V}'\mathbf{B} + \mathbf{\Psi})^{-1}\mathbf{S}) - \textit{J}, \\
+    & \qquad s.t.: \mathbf{V} = [v_{jq}], \ v_{jq} \in \{0,1\}, \ \sum_q{v_{jq}} = 1, \ j = 1, \dots, \textit{J}, \ q = 1, \dots, \textit{Q},
+\end{align*}$$
+where $\mathbf{S}$ is the sample covariance matrix, and whose
+parameters are optimized by means of a coordinate descent algorithm.
+
+Apart from the methodological distinctions between DPCA and DFA, the
+latter exhibits the scale equivariance property. The optimization of
+the likelihood function implies a higher computational load and, thus,
+a longer execution time compared to DPCA.
+
+As in the DPCA case, under the constraint $\mathbf{A}=\mathbf{BV}$, the
+solution provided by the model is unique.
+
+### Joint clustering and variable reduction
+
+The four clustering methods discussed all follow the $K$-means
+framework, working to partition the units. However, they differ
+primarily in how they handle variable reduction.
+
+Double KM (DKM) employs a symmetric approach, clustering both the units
+(rows) and the variables (columns) of the data matrix at the same time.
+This leads to the simultaneous identification of mean profiles for both
+dimensions. DKM is particularly suitable for data matrices where both
+rows and columns represent units. Examples of such matrices include
+document-by-term matrices used in Text Analysis, product-by-customer
+matrices in Marketing, and gene-by-sample matrices in Biology.
+
+In contrast, the other three clustering methods adopt an asymmetric
+approach. They treat rows and columns differently, focusing on mean
+profiles and clustering for the rows, while employing components or
+factors for the variables (columns). 
These methods are more appropriate for
+typical units-by-variables matrices, where it is beneficial to
+synthesize variables using components or factors. At the same time,
+they emphasize clustering and the mean profiles of the clusters
+specifically for the rows. The methodologies that fall into this
+category are RKM, FKM, and DPCAKM.
+
+The estimation is carried out by the LS method, while the computation of
+the estimates is performed via ALS.
+
+#### Double k-means (DKM)
+
+Proposed by Vichi (2001), DKM is one of the first bi-clustering methods
+introduced, providing a simultaneous partition of the units and
+variables and resulting in a two-way extension of plain KM (McQueen
+1967). The model is described by the following equation,
+$$\begin{equation}
+\label{dkm1}
+    \mathbf{X} = \mathbf{U}\bar{\mathbf{Y}}\mathbf{V}' + \mathbf{E}
+\end{equation} (\#eq:dkm1)$$
+where $\bar{\mathbf{Y}}$ is the centroid matrix in the reduced space for
+the rows and columns, enabling a comprehensive summarization of units
+and variables. By optimizing a single objective function, the DKM method
+captures valuable information from both dimensions of the dataset
+simultaneously.
+
+This bi-clustering approach can be applied in several impactful ways.
+One key application is in the realm of Big Data. DKM can effectively
+compress expansive datasets that include a vast number of units and
+variables into a more manageable and robust data matrix
+$\bar{\mathbf{Y}}$. This compressed matrix, formed by mean profiles for
+both rows and columns, can then be explored and analyzed using a variety
+of subsequent statistical techniques, thus facilitating efficient
+handling and analysis of Big Data. The algorithm, similarly to the
+well-known KM, is very fast and converges quickly to a solution, which
+is at least a local minimum of the problem.
+
+Another significant application of DKM is its capability to achieve
+optimal clustering for both rows and columns. 
This dual clustering
+ability is particularly advantageous in situations where it is
+essential to discern meaningful patterns and relationships within
+complex datasets, highlighting the utility of DKM in diverse fields and
+scenarios.
+
+The least squares estimation of the parameters $\mathbf{U}$,
+$\mathbf{V}$ and $\bar{\mathbf{Y}}$ leads to the minimization of the
+problem
+$$\begin{equation}
+\label{dkm2}
+    RSS_{\textit{DKM}}(\mathbf{U}, \mathbf{V}, \bar{\mathbf{Y}}) = {||\mathbf{X} - \mathbf{U}\bar{\mathbf{Y}}\mathbf{V}'||^2},
+\end{equation} (\#eq:dkm2)$$
+
+$$\begin{equation}
+\label{dkm3}
+    s.t.: u_{ik} \in \{0,1\}, \ \ \sum_{k} u_{ik} = 1, \ \ i = 1 ,\dots, n, \ \ k = 1 ,\dots, K,
+\end{equation} (\#eq:dkm3)$$
+
+$$\begin{equation}
+\label{dkm4}
+    \ \ \ \ \ \ \ v_{jq} \in \{0,1\}, \ \ \sum_{q} v_{jq} = 1, \ \ j = 1, \dots, J, \ \ q = 1, \dots, Q.
+\end{equation} (\#eq:dkm4)$$
+Since $\mathbf{\bar{Y}} = \mathbf{U}^{+}\mathbf{X}\mathbf{V}^{+'}$,
+(\@ref(eq:dkm2)) can be framed in terms of projector operators:
+$$\begin{equation}
+\label{dkm5}
+RSS_{\textit{DKM}}(\mathbf{U}, \mathbf{V}) = ||\mathbf{X} - \mathbf{H}_\mathbf{U}\mathbf{X}\mathbf{H}_\mathbf{V}||^2.
+\end{equation} (\#eq:dkm5)$$
+In both cases, the sum of squared residuals (or, equivalently, the
+within deviance associated with the *K* unit-clusters and the *Q*
+variable-clusters) is minimized. In this way, one obtains a (hard)
+classification of both units and variables. The optimization of
+(\@ref(eq:dkm5)) is carried out via ALS, alternating, in essence, two
+KM-like assignment problems, one for the rows and one for the columns. 
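
The alternation just described can be sketched end-to-end. The following
NumPy toy example is a didactic re-implementation written for this
article under simplifying assumptions (a crude data-driven start instead
of multiple random starts; it is not the package's C++ code, and all
names are invented): it alternates the centroid update
$\bar{\mathbf{Y}} = \mathbf{U}^{+}\mathbf{X}\mathbf{V}^{+'}$ with the
two KM-like assignment steps.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy 40 x 6 matrix with a 2 x 2 bi-cluster mean structure.
means = np.array([[0.0, 4.0],
                  [4.0, 0.0]])
u_true = np.repeat([0, 1], 20)           # true row (unit) clusters
v_true = np.repeat([0, 1], 3)            # true column (variable) clusters
X = means[np.ix_(u_true, v_true)] + rng.normal(0, 0.2, size=(40, 6))

K, Q = 2, 2
# Crude data-driven start for the sketch; in practice multiple random
# starts are used and the best solution is retained.
u = (X[:, 0] > X[:, 0].mean()).astype(int)    # row labels
v = (X[0, :] > X[0, :].mean()).astype(int)    # column labels

def block_means(u, v):
    # Ybar[k, q] = mean of the (k, q) block of X, i.e. U+ X V+'.
    return np.array([[X[np.ix_(u == k, v == q)].mean() for q in range(Q)]
                     for k in range(K)])

for _ in range(20):
    Ybar = block_means(u, v)
    # Row step: reassign each unit to the closest row profile Ybar V'.
    row_profiles = Ybar[:, v]                               # K x J
    u = ((X[:, None, :] - row_profiles[None]) ** 2).sum(axis=2).argmin(axis=1)
    # Column step: reassign each variable to the closest column profile U Ybar.
    col_profiles = Ybar[u, :]                               # n x Q
    v = ((X.T[:, None, :] - col_profiles.T[None]) ** 2).sum(axis=2).argmin(axis=1)

rss = ((X - block_means(u, v)[np.ix_(u, v)]) ** 2).sum()    # eq. (dkm2)
```

On this toy matrix the alternation recovers the planted row and column
partitions, and `rss` reduces to the within-block noise.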
+
+#### Reduced k-means (RKM)
+
+Proposed by De Soete and Carroll (1994), RKM performs the reduction of
+the variables by projecting the *J*-dimensional centroid matrix into a
+*Q*-dimensional subspace ($\textit{Q} \leq$ *J*), spanned by the columns
+of the loading matrix $\mathbf{A}$, such that it best reconstructs
+$\mathbf{X}$ by using the orthogonal projector matrix
+$\mathbf{A}\mathbf{A}'$. Therefore, the model is described by the
+following equation,
+$$\begin{equation}
+\label{rkm1}
+    \mathbf{X} = \mathbf{U}\bar{\mathbf{X}}\mathbf{A}\mathbf{A}' + \mathbf{E}.
+\end{equation} (\#eq:rkm1)$$
+The estimation of **U** and **A** can be done via LS, minimizing
+$$\begin{equation}
+\label{rkm2}
+    RSS_{\textit{RKM}}(\mathbf{U}, \mathbf{A})={||\mathbf{X} - \mathbf{U}\bar{\mathbf{X}}\mathbf{A}\mathbf{A}'||^2},
+\end{equation} (\#eq:rkm2)$$
+
+$$\begin{equation}
+\label{rkm3}
+    s.t.: \ \ \ u_{ik} \in \{0,1\}, \ \ \sum_{k} u_{ik} = 1, \ \ \mathbf{A}'\mathbf{A} = \mathbf{I},
+\end{equation} (\#eq:rkm3)$$
+which can be optimized, once again, via ALS. In essence, the model
+alternates a KM step, assigning each original unit $\mathbf{x}_i$ to
+the closest centroid in the reduced space, and a PCA step, based on the
+spectral decomposition of $\mathbf{X}'\mathbf{H}_\mathbf{U}\mathbf{X}$,
+conditioned on the results of the previous iteration. The iterations
+continue until the difference between two subsequent values of the
+objective function is smaller than a small, arbitrarily chosen constant
+$\epsilon > 0$.
+
+#### Factorial k-means (FKM)
+
+Proposed by Vichi and Kiers (2001), FKM, differently from RKM, produces
+a dimension reduction of both the units and the centroids. Its goal is
+to reconstruct the data in the reduced subspace, $\mathbf{Y}$, by means
+of the centroids in the reduced space. 
The FKM model can be obtained
+from the RKM model by post-multiplying both sides of equation
+(\@ref(eq:rkm1)) by $\mathbf{A}$ and rewriting the new error term as
+$\mathbf{E}$,
+$$\begin{equation}
+    \mathbf{X}\mathbf{A} = \mathbf{U}\bar{\mathbf{X}}\mathbf{A} + \mathbf{E}.
+\end{equation}$$
+Its estimation via LS results in the optimization of the following
+equation,
+$$\begin{equation}
+\label{fkm1}
+    RSS_{\textit{FKM}}(\mathbf{U}, \mathbf{A}, \bar{\mathbf{X}})={||\mathbf{X}\mathbf{A} - \mathbf{U}\bar{\mathbf{X}}\mathbf{A}||^2},
+\end{equation} (\#eq:fkm1)$$
+
+$$\begin{equation}
+    s.t.: \ \ \ u_{ik} \in \{0,1\}, \ \ \sum_{k} u_{ik} = 1, \ \ \mathbf{A}'\mathbf{A} = \mathbf{I}.
+\end{equation}$$
+Although the connection with the RKM model appears straightforward, it
+can be shown that the FKM loss function is always equal to or smaller
+than the RKM one. In practice, the KM step is applied to
+$\mathbf{X}\mathbf{A}$, instead of just $\mathbf{X}$, as happens in
+DKM and RKM. In essence, FKM works better when both the data and the
+centroids lie in the reduced subspace, and not just the centroids, as
+in RKM.
+
+In order to decide when RKM or FKM can be properly applied, it is
+important to recall that two types of residuals can be defined in
+dimensionality reduction: *subspace residuals*, lying on the subspace
+spanned by the columns of $\mathbf{A}$, and *complement residuals*,
+lying on the orthogonal complement of this subspace, i.e., those
+residuals lying on the subspace spanned by the columns of
+$\mathbf{A}^\perp$, with $\mathbf{A}^\perp$ a column-wise orthonormal
+matrix of order $J \times (J-Q)$ such that
+$\mathbf{A}'\mathbf{A}^{\perp} = \mathbf{O}_{Q \times (J-Q)}$, the
+$Q \times (J-Q)$ matrix of zeroes.
+FKM is more effective when there is significant residual variance in the
+subspace orthogonal to the clustering subspace. 
In other words, the
+complement residuals typically represent the error given by those
+observed variables that scarcely contribute to the clustering subspace
+to be identified. FKM tends to recover the subspace and the clustering
+structure more accurately when the data contain variables with
+substantial variance that does not reflect the clustering structure and
+therefore masks it: FKM can better ignore these variables and focus on
+the relevant clustering subspace. On the other hand, RKM performs
+better when the data have significant residual variance within the
+clustering subspace itself: when the variables within the subspace show
+considerable variance, RKM can more effectively capture the clustering
+structure.
+
+In essence, when most of the variables in the dataset reflect the
+clustering structure, RKM is more likely to provide a good solution. If
+this is not the case, FKM may be preferred.
+
+#### Disjoint principal component analysis k-means (DPCAKM)
+
+Starting from the FKM model, the goal here, besides the partition of
+the units, is to have a parsimonious representation of the
+relationships between latent and manifest variables, provided by the
+loading matrix **A**. Vichi and Saporta (2009) propose for FKM the
+parametrization **A** = **BV**, which enforces the simplest structure
+and thus simplifies the interpretation of the factors,
+$$\begin{equation}
+\label{cdpca1}
+  \mathbf{X} = \mathbf{U}\bar{\mathbf{X}}\mathbf{B}\mathbf{V}\mathbf{V}'\mathbf{B} + \mathbf{E}.
+\end{equation} (\#eq:cdpca1)$$
+By estimating $\mathbf{U}$, $\mathbf{B}$, $\mathbf{V}$ and
+$\bar{\mathbf{X}}$ via LS, the loss function of the proposed method
+becomes:
+$$\begin{equation}
+\label{cdpca2}
+  RSS_{DPCAKM}(\mathbf{U}, \mathbf{B}, \mathbf{V}, \bar{\mathbf{X}}) = ||\mathbf{X} - \mathbf{U}\bar{\mathbf{X}}\mathbf{B}\mathbf{V}\mathbf{V}'\mathbf{B}||^2,
+\end{equation} (\#eq:cdpca2)$$
+
+$$\begin{equation}
+\label{cdpca3}
+  s.t.: u_{ik} \in \{0,1\}, \ \ \sum_{k} u_{ik} = 1, \ \ i = 1 ,\dots, N, \ \ k = 1 ,\dots, K,
+\end{equation} (\#eq:cdpca3)$$
+
+$$\begin{equation}
+\label{cdpca4}
+  \ \ \ \ \ \ \ v_{jq} \in \{0,1\}, \ \ \sum_{q} v_{jq} = 1, \ \ j = 1, \dots, J, \ \ q = 1, \dots, Q,
+\end{equation} (\#eq:cdpca4)$$
+
+$$\begin{equation}
+\label{cdpca5}
+  \ \ \ \ \ \ \ \mathbf{V}'\mathbf{B}\mathbf{B}\mathbf{V} = \mathbf{I}, \ \ \mathbf{B} = diag(b_1, \dots, b_J).
+\end{equation} (\#eq:cdpca5)$$
+In practice, this model has traits of the DPCA, given the projection
+onto the reduced subspace and the partitioning of the units, resulting
+in a sparse loading matrix, but also of the DKM, given the presence of
+both **U** and **V**. Thus, DPCAKM can be considered a bi-clustering
+methodology with an asymmetric treatment of the rows and columns of
+**X**. Since it inherits the constraint on **A**, the overall fit of
+the model is generally worse than that of, e.g., FKM, although it
+offers an easier interpretation of the principal components.
+Nevertheless, it is potentially able to identify a better partition of
+the units. As in the DPCA case, the difference is negligible when the
+true latent variables are really disjoint. As implemented, the
+assignment step is carried out by minimizing the unit-centroid
+squared-Euclidean distance in the reduced subspace.
+
+## The package
+
+The library offers the implementation of all the models mentioned in
+the previous section.
Each one of them corresponds
+to a specific function implemented using
+[**Rcpp**](https://CRAN.R-project.org/package=Rcpp)
+(Eddelbuettel and Francois 2011) and
+[**RcppArmadillo**](https://CRAN.R-project.org/package=RcppArmadillo)
+(Eddelbuettel and Sanderson 2014).
+
+::: {#tab:stat_models}
+  -----------------------------------------------------------------------------------------------------------------------------------------------------
+  Function     Model                         Previous\                                 Main differences\
+                                             Implementations                           in `drclust`
+  ------------ ----------------------------- ----------------------------------------- ----------------------------------------------------------------
+  `doublekm`   DKM\                          None                                      Short runtime (C++)
+               (Vichi 2001)
+
+  `redkm`      RKM\                          in `clustrd`;\                            \>50x faster (C++);\
+               (De Soete and Carroll 1994)   Mixed variables                           Continuous variables
+
+  `factkm`     FKM\                          in `clustrd`;\                            \>20x faster (C++);\
+               (Vichi and Kiers 2001)        Mixed variables                           Continuous variables;\
+                                                                                       Better fit and classification
+
+  `dpcakm`     DPCAKM\                       in `biplotbootGUI`;\                      \>10x faster (C++);\
+               (Vichi and Saporta 2009)      Continuous variables;\                    Constraint on variable allocation within principal components
+                                             SDP-based initialization of parameters
+
+  `dispca`     DPCA\                         None                                      Short runtime (C++);\
+               (Vichi and Saporta 2009)                                                Constraint on variable allocation within principal components
+
+  `disfa`      DFA\                          None                                      Short runtime (C++);\
+               (Vichi 2017)                                                            Constraint on variable allocation within factors
+  -----------------------------------------------------------------------------------------------------------------------------------------------------
+
+  : (#tab:T2) Statistical methods available in the `drclust` package
+:::
+
+Some additional functions have been made available for the user. Most
+of them are intended to help the user evaluate the quality of the
+results or choose the hyper-parameters.
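The relation noted in the previous section between the objectives minimized by `redkm` and `factkm` (for the same parameter values, the FKM loss never exceeds the RKM loss) can be checked numerically. The following sketch, written in Python with NumPy rather than the package's R/C++ interface, and with all variable names our own, evaluates both residual sums of squares for an arbitrary partition and loading matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
n, J, Q, K = 30, 5, 2, 3

X = rng.standard_normal((n, J))
X -= X.mean(axis=0)                       # column-centered data matrix

labels = np.repeat(np.arange(K), n // K)  # arbitrary hard partition
U = np.eye(K)[labels]                     # binary, row-stochastic membership matrix

A, _ = np.linalg.qr(rng.standard_normal((J, Q)))  # column-wise orthonormal loadings
Xbar = np.linalg.solve(U.T @ U, U.T @ X)          # centroid matrix (U'U)^{-1} U'X

rss_rkm = np.linalg.norm(X - U @ Xbar @ A @ A.T) ** 2   # RKM loss
rss_fkm = np.linalg.norm(X @ A - U @ Xbar @ A) ** 2     # FKM loss

# RKM additionally pays for the variance lying outside the subspace
# spanned by A: RSS_RKM = RSS_FKM + ||X(I - AA')||^2, so RSS_FKM <= RSS_RKM.
residual_out = np.linalg.norm(X - X @ A @ A.T) ** 2
assert np.isclose(rss_rkm, rss_fkm + residual_out)
```

The decomposition in the final assertion is the formal reason why the FKM loss is bounded above by the RKM loss for the same $(\mathbf{U}, \bar{\mathbf{X}}, \mathbf{A})$.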
+
+::: {#tab:aux_methods}
+  -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+  **Function**      **Technique**                 **Description**                                                                                                                                          **Goal**
+  ----------------- ----------------------------- -------------------------------------------------------------------------------------------------------------------------------------------------------- ---------------------------------
+  `apseudoF`        \"relaxed\" pseudoF           \"Relaxed\" version of Caliński and Harabasz (1974). Selects the second largest pseudoF value if the difference with the first is less than a fraction.   Parameter tuning
+
+  `dpseudoF`        DKM-pseudoF                   Adaptation of the pseudoF criterion proposed by Rocci and Vichi (2008) to bi-clustering.                                                                  Parameter tuning
+
+  `kaiserCrit`      Kaiser criterion              Kaiser rule for selecting the number of principal components (Kaiser 1960).                                                                               Parameter tuning
+
+  `centree`         Dendrogram of the centroids   Graphical tool showing how close the centroids of a partition are.                                                                                       Visualization
+
+  `silhouette`      Silhouette                    Imported from [**cluster**](https://CRAN.R-project.org/package=cluster) (Maechler et al. 2023) and                                                       Visualization, parameter tuning
+                                                  [**factoextra**](https://CRAN.R-project.org/package=factoextra) (Kassambara 2022).
+
+  `heatm`           Heatmap                       Heatmap of distance-ordered units within distance-ordered clusters, adapted from                                                                         Visualization
+                                                  [**pheatmap**](https://CRAN.R-project.org/package=pheatmap) (Kolde 2019).
+
+  `CronbachAlpha`   Cronbach Alpha Index          Proposed by Cronbach (1951). Assesses the unidimensionality of a dataset.                                                                                Assessment
+
+  `mrand`           ARI                           Assesses clustering quality based on the confusion matrix (Rand 1971).                                                                                   Assessment
+
+  `cluster`         Membership vector             Returns a multinomial 1 × *n* membership vector from a binary, row-stochastic *n* × *K* membership matrix; mimics `kmeans$cluster`.                      Encoding
+  -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+
+  : (#tab:T3) Auxiliary functions available in the library
+:::
+
+With regard to the auxiliary functions (Table
+[3](#tab:T3){reference-type="ref" reference="tab:aux_methods"}), they
+have all been implemented in the `R` language, building on packages
+already available on CRAN, such as
+[**cluster**](https://CRAN.R-project.org/package=cluster) (Maechler et
+al. 2023),
+[**factoextra**](https://CRAN.R-project.org/package=factoextra)
+(Kassambara 2022), and
+[**pheatmap**](https://CRAN.R-project.org/package=pheatmap) (Kolde
+2019), which eased their implementation. One of the main goals of the
+proposed package, besides spreading the availability and usability of
+the statistical methods considered, is computational speed: provided
+that memory is sufficient, results can be obtained in a reasonable
+amount of time even for large data matrices. The first means adopted to
+pursue such a goal is the full implementation of the statistical
+methods in the C++ language, through the
+[**Rcpp**](https://CRAN.R-project.org/package=Rcpp)
+(Eddelbuettel and Francois 2011) and
+[**RcppArmadillo**](https://CRAN.R-project.org/package=RcppArmadillo)
+(Eddelbuettel and Sanderson 2014) libraries, which significantly reduce
+the required runtime.
+
+A practical issue that arises very often in crisp (hard) clustering,
+such as KM, is the presence of empty clusters after the assignment
+step.
+When this happens, a column of $\mathbf{U}$ has all elements equal to
+zero, which can be proved to be a local minimum solution and which
+prevents the computation of $(\mathbf{U}'\mathbf{U})^{-1}$. This
+typically happens even more often when the number of clusters *K*
+specified by the user is larger than the true one, or in the case of a
+sub-optimal solution. Among the possible solutions to this issue, the
+one implemented here consists in splitting the cluster with the highest
+within-cluster deviance. In practice, a KM with $\textit{K} = 2$ is
+applied to it, assigning one of the two resulting clusters to the empty
+one; the procedure is iterated until all the empty clusters are filled.
+Such a strategy guarantees that the monotonicity of the ALS algorithm
+is preserved, although it is the most time-consuming one.
+
+Among the six implementations of the statistical techniques, some
+arguments are set to a default value. Table
+[4](#tab:T4){reference-type="ref" reference="tab:defaultarguments"}
+describes all the arguments that have a default value. In particular,
+`print`, which displays a descriptive summary of the results, is set to
+zero (so the user must explicitly request such output from the
+function). `Rndstart` defaults to 20, so that the algorithm is run 20
+times until convergence; to gain more confidence (not certainty) that
+the obtained solution is a global optimum, a higher value for this
+argument can be provided. With particular regard to `redkm` and
+`factkm`, the argument `rot`, which performs a Varimax rotation of the
+loading matrix, is set by default to 0; to perform the rotation, it
+must be set equal to 1. Finally, the `constr` argument, available for
+`dpcakm`, `dispca`, and `disfa`, defaults to a vector (of length *J*)
+of zeros, so that each variable is assigned to the most appropriate
+latent variable, according to the logic of the model.
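The empty-cluster repair strategy described above (splitting the cluster with the highest within-cluster deviance via a $K=2$ k-means) can be sketched as follows. This is a Python/NumPy illustration with hypothetical function names, not the package's C++ implementation, and it uses a simple deterministic far-apart seeding for the 2-means step for brevity:

```python
import numpy as np

def two_means(Xs, iters=20):
    """Plain 2-means (Lloyd) on the rows of Xs, seeded deterministically
    with two far-apart units; returns a 0/1 label per row."""
    i = ((Xs - Xs.mean(0)) ** 2).sum(1).argmax()   # farthest unit from the mean
    j = ((Xs - Xs[i]) ** 2).sum(1).argmax()        # farthest unit from it
    c = np.stack([Xs[i], Xs[j]]).astype(float)
    for _ in range(iters):
        lab = ((Xs[:, None, :] - c[None, :, :]) ** 2).sum(-1).argmin(1)
        for k in (0, 1):
            if (lab == k).any():
                c[k] = Xs[lab == k].mean(0)        # keep old centroid if a side empties
    return lab

def fill_empty_clusters(X, labels, K):
    """While any of the K labels is unused, split the cluster with the
    largest within-cluster deviance and hand one half to the empty label,
    mirroring the repair applied after each assignment step."""
    labels = labels.copy()
    while True:
        counts = np.bincount(labels, minlength=K)
        empty = np.flatnonzero(counts == 0)
        if empty.size == 0:
            return labels
        # within-cluster deviance of each splittable (size > 1) cluster
        wd = [((X[labels == k] - X[labels == k].mean(0)) ** 2).sum()
              if counts[k] > 1 else -1.0 for k in range(K)]
        donor = int(np.argmax(wd))
        idx = np.flatnonzero(labels == donor)
        halves = two_means(X[idx])
        labels[idx[halves == 1]] = empty[0]
```

Each split strictly reduces the within-cluster deviance of the donor cluster, which is why the monotonicity of the ALS iterations is preserved.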
+
+::: {#tab:defaultarguments}
+  ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+  **Argument**   **Used In**                                                  **Description**                                                                                                **Default Value**
+  -------------- ------------------------------------------------------------ -------------------------------------------------------------------------------------------------------------- -------------------
+  `Rndstart`     `doublekm`, `redkm`, `factkm`, `dpcakm`, `dispca`, `disfa`   Number of times the model is run until convergence.                                                            20
+
+  `verbose`      `doublekm`, `redkm`, `factkm`, `dpcakm`, `dispca`, `disfa`   Outputs basic summary statistics regarding each random start (1 = enabled; 0 = disabled).                      0
+
+  `maxiter`      `doublekm`, `redkm`, `factkm`, `dpcakm`, `dispca`, `disfa`   Maximum number of iterations allowed for each random start (if convergence is not yet reached).                100
+
+  `tol`          `doublekm`, `redkm`, `factkm`, `dpcakm`, `dispca`, `disfa`   Tolerance threshold (maximum difference between the values of the objective function of two consecutive        $10^{-6}$
+                                                                              iterations such that convergence is assumed).
+
+  `tol`          `apseudoF`                                                   Approximation value: half of the length of the interval built around each pseudoF value (0 \<= `tol` \< 1).    0.05
+
+  `rot`          `redkm`, `factkm`                                            Performs a Varimax rotation of the axes obtained via PCA (0 = `False`; 1 = `True`).                            0
+
+  `prep`         `doublekm`, `redkm`, `factkm`, `dpcakm`, `dispca`, `disfa`   Pre-processing of the data: 1 performs the *z*-score transform; 2 performs the min-max transform; 0 leaves     1
+                                                                              the data un-pre-processed.
+
+  `print`        `doublekm`, `redkm`, `factkm`, `dpcakm`, `dispca`, `disfa`   Final summary statistics of the performed method (1 = enabled; 0 = disabled).                                  0
+
+  `constr`       `dpcakm`, `dispca`, `disfa`                                  Vector of length $J$ (number of variables) specifying variable-to-cluster assignments. Each element can be     `rep(0,J)`
+                                                                              an integer from 1 to $Q$ (number of variable-clusters or components), indicating a fixed assignment, or 0
+                                                                              to leave the variable unconstrained (i.e., assigned by the algorithm).
+  ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+
+  : (#tab:T4) Arguments accepted by functions in the `drclust`
+  package with default values
+:::
+
+By offering a fast execution time, all the implemented models allow the
+user to run multiple random starts of the algorithm in a reasonable
+amount of time. This feature is particularly useful given the absence
+of guarantees of global optimality for the ALS algorithm, which has an
+ad-hoc implementation for each of the models. Table
+[5](#tab:T5){reference-type="ref" reference="tab:comparison"} shows
+that, compared to the two packages which implement 3 of the 6 models in
+[**drclust**](https://CRAN.R-project.org/package=drclust), our proposal
+is much faster than the corresponding versions implemented in `R`,
+while nevertheless providing compelling results.
+
+The iris dataset has been used in order to measure the performance in
+terms of fit, runtime, and ARI (Rand 1971). The *z*-transform has been
+applied to all the variables of the dataset.
This implies that all the
+variables, post-transformation, have mean equal to 0 and variance equal
+to 1, obtained by subtracting from each variable its mean and dividing
+the result by its standard deviation, as the `R` function `scale(X)`
+typically does:
+
+$$\begin{equation}
+\label{eq:ztransform}
+\mathbf{Z}_{\cdot j} = \frac{\mathbf{X}_{\cdot j} - \mu_j \mathbf{1_\textit{n}}}{\sigma_j}
+\end{equation} (\#eq:ztransform)$$
+where $\mu_j$ is the mean of the *j*-th variable and $\sigma_j$ its
+standard deviation; the subscript $\cdot j$ refers to the whole *j*-th
+column of the matrix. This operation prevents the measurement scale
+from affecting the final result (and is applied by default, unless
+otherwise specified by the user, within all the techniques implemented
+by `drclust`). In order to avoid the comparison between potentially
+different objective functions, the between deviance (intended as
+described by the authors in the articles where the methods have been
+proposed) has been used as a fit measure and computed based on the
+output provided by the functions, aiming at homogeneity in the
+evaluation metric. $K = 3$ and $Q = 2$ have been used for the
+clustering algorithms, maintaining just $Q = 2$ for the two
+dimensionality-reduction techniques.
+
+For each method, 100 runs have been performed and the best solution has
+been picked. For each run, the maximum allowed number of iterations was
+set to 100, with a tolerance error (i.e., precision) equal to
+$10^{-6}$.
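The *z*-transform above amounts to the following minimal Python/NumPy sketch (note one assumption: the population standard deviation, with divisor $n$, is used here, whereas R's `scale()` uses the sample standard deviation with divisor $n-1$):

```python
import numpy as np

def z_transform(X):
    """Column-wise z-scores: subtract each column's mean and divide by its
    standard deviation (population sd, ddof=0, in this sketch)."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 60.0]])
Z = z_transform(X)
# each column of Z now has mean 0 and variance 1
```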
+
+::: {#tab:comparison}
+  ------------------------------------------------------------------------------------------------------
+  Library         Technique   Runtime (s)   Fit     ARI     Fit Measure
+  --------------- ----------- ------------- ------- ------- ------------------------------------------------
+  clustrd         RKM         0.73          21.38   0.620   $||\mathbf{U}\bar{\mathbf{Y}}\mathbf{A}'||^2$
+
+  drclust         RKM         0.01          21.78   0.620   $||\mathbf{U}\bar{\mathbf{Y}}\mathbf{A}'||^2$
+
+  clustrd         FKM         1.89          4.48    0.098   $||\mathbf{U}\bar{\mathbf{Y}}||^2$
+
+  drclust         FKM         0.03          21.89   0.620   $||\mathbf{U}\bar{\mathbf{Y}}||^2$
+
+  biplotbootGUI   CDPCA       2.83          21.32   0.676   $||\mathbf{U}\bar{\mathbf{Y}}\mathbf{A}'||^2$
+
+  drclust         CDPCA       0.05          21.34   0.676   $||\mathbf{U}\bar{\mathbf{Y}}\mathbf{A}'||^2$
+
+  drclust         DKM         0.03          21.29   0.652   $||\mathbf{U}\bar{\mathbf{X}}\mathbf{H_V}||^2$
+
+  drclust         DPCA        \<0.01        23.70   \-      $||\mathbf{Y}\mathbf{A}'||^2$
+
+  drclust         DFA         1.11          55.91   \-      $||\mathbf{Y}\mathbf{A}'||^2$
+  ------------------------------------------------------------------------------------------------------
+
+  : (#tab:T5) Performance of the variable reduction and joint
+  clustering-variable reduction models
+:::
+
+The results of Table [5](#tab:T5){reference-type="ref"
+reference="tab:comparison"} are visually represented in Figure
+[1](#fig:iriscomparison){reference-type="ref"
+reference="fig:iriscomparison"}.
+
+Figure 1: ARI, Fit, Runtime for the available implementations
+
+Although the runtime heavily depends on the hardware characteristics,
+runtimes have been reported within Table
+[5](#tab:T5){reference-type="ref" reference="tab:comparison"} for
+relative comparison purposes only, as all the techniques have been run
+on the same hardware. For all the computations within the present work,
+the machine used is an Intel(R) Core(TM) i7-8550U CPU @ 1.80 GHz.
+
+Besides the already mentioned difference between DPCA and DFA, it is
+worth mentioning that, in terms of implementation, they retrieve the
+latent variables differently: while DPCA relies on the
+eigendecomposition, DFA uses an implementation of the power method
+(Hotelling 1933).
+
+In essence, the implementation of our proposal, while being very fast,
+exhibits a goodness of fit very close (and sometimes superior) to that
+of the available alternatives.
+
+## Simulation study
+
+To better understand the capabilities of the proposed methodologies and
+evaluate the performance of the drclust package, a simulation study was
+conducted. In this study, we assume that the number of clusters (K) and
+the number of factors (Q) are known, and we examine how results vary
+across the DKM, RKM, FKM, and DPCAKM methods.
+
+### Data generation process
+
+The performance of these algorithms is tested on synthetic data
+generated through a specific procedure. Initially, centroids are
+created using eigendecomposition on a transformed distance matrix,
+resulting in three equidistant centroids in a reduced two-dimensional
+space. To model the variances and covariances among the generated units
+within each cluster and to introduce heterogeneity among the units, a
+variance-covariance matrix ($\Sigma_O$) is derived from samples taken
+from a zero-mean Gaussian distribution, with a specified standard
+deviation ($\sigma_u$).
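This first stage of the generation procedure can be sketched as follows. This is a Python/NumPy illustration under our own assumptions (mutually equidistant centroids at distance 2 recovered via classical multidimensional scaling, and an empirical covariance built from zero-mean Gaussian draws); the authors' exact transformation of the distance matrix may differ:

```python
import numpy as np

rng = np.random.default_rng(1)
K, Q, sigma_u = 3, 2, 0.55

# K mutually equidistant centroids in Q dimensions via classical MDS:
# double-center the squared-distance matrix, then eigendecompose it.
D2 = np.full((K, K), 4.0)               # squared pairwise distances (distance 2, assumed)
np.fill_diagonal(D2, 0.0)
C = np.eye(K) - np.ones((K, K)) / K     # centering matrix
B = -0.5 * C @ D2 @ C                   # double-centered Gram matrix
w, V = np.linalg.eigh(B)
top = np.argsort(w)[::-1][:Q]           # indices of the Q largest eigenvalues
centroids = V[:, top] * np.sqrt(w[top]) # K x Q centroid coordinates

# Within-cluster covariance derived from zero-mean Gaussian samples
S = rng.normal(0.0, sigma_u, size=(200, Q))
Sigma_O = S.T @ S / len(S)              # empirical covariance matrix
```

Since the double-centered matrix has rank $K-1 \leq Q$ here, the recovered centroid coordinates reproduce the prescribed pairwise distances exactly.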
+
+Membership for the 1,000 units is determined based on a ($K \times 1$)
+vector of prior probabilities, utilizing a multinomial distribution
+with probabilities (0.2, 0.3, 0.5). For each unit, a sample is drawn
+from a multivariate Gaussian distribution centered around its
+corresponding centroid, using the previously generated covariance
+matrix ($\Sigma_O$). Additionally, four masking variables, which do not
+exhibit any clustering structure, are generated from a zero-mean
+multivariate Gaussian and scaled by a standard deviation of
+$\sigma_m = 6$. These masking variables are added to the two variables
+that form the clustering structure of the dataset. Then, the final
+sample dataset is standardized.
+
+It is important to note that the standard deviation $\sigma_u$ controls
+the amount of variance in the reduced space, thus influencing the level
+of subspace residuals. Conversely, $\sigma_m$ regulates the variance of
+the masking variables, impacting the complement residuals.
+
+This study considers scenarios with $J$ = 6 variables, $n$ = 1,000
+units, $K$ = 3 clusters and $Q$ = 2 factors. We explore high, medium,
+and low within-cluster heterogeneity, with $\sigma_u$ values of 0.8,
+0.55, and 0.3, respectively. For each level, $s$ = 100 samples are
+generated, so a total of 300 datasets are produced. Examples of the
+generated samples are illustrated in Figure
+[2](#fig:sim123){reference-type="ref" reference="fig:sim123"}, which
+shows that as the level of within-cluster variance increases, the
+variables with a clustering structure tend to create overlapping
+clusters. Note that the two techniques dedicated solely to variable
+reduction, namely DPCA and DFA, were not included in the simulation
+study, because the study's primary focus is on clustering and dimension
+reduction and on the comparison with competing implementations.
However, it is worth noting that these methods are
+inherently quick, as can be observed from the speed of the
+methodologies that combine clustering with the DPCA or DFA dimension
+reduction methods.
+
+### Performance evaluation
+
+The performance of the proposed methods was assessed through a
+simulation study. To evaluate the accuracy in recovering the true
+cluster membership of the units (**U**), the ARI (Hubert and Arabie
+1985) was employed. The ARI quantifies the similarity between the hard
+partitions generated by the estimated classification matrices and those
+defined by the true partition, considering both the reference partition
+and the one produced by the algorithm under evaluation. The ARI
+typically ranges from 0 to 1, where 0 indicates a level of agreement
+expected by random chance and 1 denotes a perfect match; negative
+values may also occur, indicating agreement worse than what would be
+expected by chance. In order to assess the models' ability to
+reconstruct the underlying data structure, the between deviance,
+denoted by $f$, was computed. This measure is defined in the original
+works proposing the evaluated methods and is reported in the second
+column (Fit Measure) of Table [6](#tab:T6){reference-type="ref"
+reference="tab:simulation"}. For comparison, the true between deviance
+$f^{*}$, calculated from the true, known values of **U** and **A**, was
+also computed. The difference $f^{*} - f$ was considered, where
+negative values suggest potential overfitting. Furthermore, the squared
+Frobenius norm $||\mathbf{A}^* - \mathbf{A}||^2$ was computed to assess
+how accurately each model estimated the true loading matrix
+$\mathbf{A}^*$. This evaluation was not applicable to the DKM method,
+as it does not provide estimates of the loading matrix.
For each
+performance metric presented in Table [6](#tab:T6){reference-type="ref"
+reference="tab:simulation"}, the median value across the $s$ = 100
+replicates is reported for each level of error (within-cluster
+deviance).
+
+It is important to note that fit and ARI reflect distinct objectives.
+While fit measures the variance explained by the model, the ARI
+assesses clustering accuracy; as such, the two metrics may diverge. A
+model may achieve high fit by capturing subtle variation or even noise,
+which may not correspond to well-separated clusters, leading to a lower
+ARI. Conversely, a method focused on maximizing cluster separation may
+yield a high ARI while explaining less overall variance. This trade-off
+is particularly relevant in unsupervised settings, where there is no
+external supervision to guide the balance between reconstruction and
+partitioning. For this reason, we report both metrics to provide a more
+comprehensive assessment of model performance.
+
+### Algorithm performance and comparison with the competing implementations
+
+For each sample, the algorithms DKM, RKM, FKM, and DPCAKM are applied
+using 100 random starts, selecting the best solution. This
+significantly reduces the impact of local minima in the clustering and
+dimension reduction process. Figure
+[2](#fig:sim123){reference-type="ref" reference="fig:sim123"} depicts
+the typical situation for each scenario (low, medium, high
+within-cluster variance).
+
+Figure 2: Within-cluster variance of the simulated data (in order: low,
+medium, high)
    + +::: {#tab:simulation} ++-----------+---------------------------------------------------------+---------------+-------------+---------+---------+-----------+-----------------------------------+ +| Technique | Fit Measure | Library | Runtime (s) | Fit | ARI | $f^* - f$ | $||\mathbf{A}^* - \mathbf{A}||^2$ | ++:==========+:========================================================+:==============+:============+:========+:========+:==========+:==================================+ +| **Low** | ++-----------+---------------------------------------------------------+---------------+-------------+---------+---------+-----------+-----------------------------------+ +| RKM | $||\mathbf{U}\bar{\mathbf{X}}\mathbf{A}\mathbf{A}'||^2$ | clustrd | 164.03 | 42.76 | 1.00 | 0.00 | 2.00 | ++-----------+---------------------------------------------------------+---------------+-------------+---------+---------+-----------+-----------------------------------+ +| RKM | $||\mathbf{U}\bar{\mathbf{X}}\mathbf{A}\mathbf{A}'||^2$ | drclust | 0.48 | 42.76 | 1.00 | 0.00 | 2.00 | ++-----------+---------------------------------------------------------+---------------+-------------+---------+---------+-----------+-----------------------------------+ +| FKM | $||\mathbf{U}\bar{\mathbf{X}}\mathbf{A}||^2$ | clustrd | 15.48 | 2.89 | 0.35 | 39.77 | 1.99 | ++-----------+---------------------------------------------------------+---------------+-------------+---------+---------+-----------+-----------------------------------+ +| FKM | $||\mathbf{U}\bar{\mathbf{X}}\mathbf{A}||^2$ | drclust | 0.52 | 42.76 | 1.00 | 0.00 | 2.00 | ++-----------+---------------------------------------------------------+---------------+-------------+---------+---------+-----------+-----------------------------------+ +| DPCAKM | $||\mathbf{U}\bar{\mathbf{X}}\mathbf{A}||^2$ | biplotbootGUI | 41.70 | 42.74 | 1.00 | 0.01 | 2.00 | 
++-----------+---------------------------------------------------------+---------------+-------------+---------+---------+-----------+-----------------------------------+ +| DPCAKM | $||\mathbf{U}\bar{\mathbf{X}}\mathbf{A}||^2$ | drclust | 1.37 | 42.74 | 1.00 | 0.01 | 2.00 | ++-----------+---------------------------------------------------------+---------------+-------------+---------+---------+-----------+-----------------------------------+ +| DKM | $||\mathbf{U}\bar{\mathbf{X}}\mathbf{V}||^2$ | drclust | 0.78 | 61.55 | 0.46 | -18.94 | \- | ++-----------+---------------------------------------------------------+---------------+-------------+---------+---------+-----------+-----------------------------------+ +| **Medium** | ++-----------+---------------------------------------------------------+---------------+-------------+---------+---------+-----------+-----------------------------------+ +| RKM | $||\mathbf{U}\bar{\mathbf{X}}\mathbf{A}\mathbf{A}'||^2$ | clustrd | 230.31 | 39.18 | 0.92 | -0.27 | 2.00 | ++-----------+---------------------------------------------------------+---------------+-------------+---------+---------+-----------+-----------------------------------+ +| RKM | $||\mathbf{U}\bar{\mathbf{X}}\mathbf{A}\mathbf{A}'||^2$ | drclust | 0.70 | 39.18 | 0.92 | -0.27 | 2.00 | ++-----------+---------------------------------------------------------+---------------+-------------+---------+---------+-----------+-----------------------------------+ +| FKM | $||\mathbf{U}\bar{\mathbf{X}}\mathbf{A}||^2$ | clustrd | 14.31 | 2.85 | 0.28 | 36.09 | 1.99 | ++-----------+---------------------------------------------------------+---------------+-------------+---------+---------+-----------+-----------------------------------+ +| FKM | $||\mathbf{U}\bar{\mathbf{X}}\mathbf{A}||^2$ | drclust | 0.76 | 39.18 | 0.92 | -0.27 | 2 | 
++-----------+---------------------------------------------------------+---------------+-------------+---------+---------+-----------+-----------------------------------+ +| DPCAKM | $||\mathbf{U}\bar{\mathbf{X}}\mathbf{A}||^2$ | biplotbootGUI | 47.76 | 39.15 | 0.92 | -0.25 | 2.00 | ++-----------+---------------------------------------------------------+---------------+-------------+---------+---------+-----------+-----------------------------------+ +| DPCAKM | $||\mathbf{U}\bar{\mathbf{X}}\mathbf{A}||^2$ | drclust | 1.64 | 39.15 | 0.92 | -0.25 | 2.00 | ++-----------+---------------------------------------------------------+---------------+-------------+---------+---------+-----------+-----------------------------------+ +| DKM | $||\mathbf{U}\bar{\mathbf{Y}}\mathbf{V}||^2$ | drclust | 0.81 | 5.93 | 0.39 | -21.00 | \- | ++-----------+---------------------------------------------------------+---------------+-------------+---------+---------+-----------+-----------------------------------+ +| **High** | ++-----------+---------------------------------------------------------+---------------+-------------+---------+---------+-----------+-----------------------------------+ +| RKM | $||\mathbf{U}\bar{\mathbf{X}}\mathbf{A}\mathbf{A}'||^2$ | clustrd | 314.89 | 36.61 | 0.62 | -2.11 | 2.00 | ++-----------+---------------------------------------------------------+---------------+-------------+---------+---------+-----------+-----------------------------------+ +| RKM | $||\mathbf{U}\bar{\mathbf{X}}\mathbf{A}\mathbf{A}'||^2$ | drclust | 0.94 | 36.61 | 0.61 | -2.11 | 2.00 | ++-----------+---------------------------------------------------------+---------------+-------------+---------+---------+-----------+-----------------------------------+ +| FKM | $||\mathbf{U}\bar{\mathbf{X}}\mathbf{A}||^2$ | clustrd | 13.87 | 2.90 | 0.19 | 31.55 | 2.00 | 
++-----------+---------------------------------------------------------+---------------+-------------+---------+---------+-----------+-----------------------------------+
+| FKM       | $||\mathbf{U}\bar{\mathbf{X}}\mathbf{A}||^2$            | drclust       | 1.02        | 36.61   | 0.61    | -2.11     | 2.00                              |
++-----------+---------------------------------------------------------+---------------+-------------+---------+---------+-----------+-----------------------------------+
+| DPCAKM    | $||\mathbf{U}\bar{\mathbf{X}}\mathbf{A}||^2$            | biplotbootGUI | 55.49       | 36.53   | 0.64    | -1.99     | 2.00                              |
++-----------+---------------------------------------------------------+---------------+-------------+---------+---------+-----------+-----------------------------------+
+| DPCAKM    | $||\mathbf{U}\bar{\mathbf{X}}\mathbf{A}||^2$            | drclust       | 2.06        | 36.53   | 0.63    | -2.01     | 2.00                              |
++-----------+---------------------------------------------------------+---------------+-------------+---------+---------+-----------+-----------------------------------+
+| DKM       | $||\mathbf{U}\bar{\mathbf{X}}\mathbf{V}||^2$            | drclust       | 0.84        | 58.97   | 0.29    | -24.37    | \-                                |
++-----------+---------------------------------------------------------+---------------+-------------+---------+---------+-----------+-----------------------------------+
+
+: (#tab:T6) Comparison of joint clustering-variable reduction
+methods on simulated data
+:::
+
+For the three scenarios, the results are reported in Table
+[6](#tab:T6){reference-type="ref" reference="tab:simulation"}.
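As a reference for the ARI values reported above, the index can be computed directly from the contingency (confusion) matrix of two hard partitions. The following Python sketch is our own minimal implementation of the Hubert-Arabie adjusted form, not the package's `mrand` code:

```python
import numpy as np
from math import comb

def ari(a, b):
    """Adjusted Rand Index of two hard partitions, computed from their
    contingency matrix M (counts of units shared by each pair of clusters)."""
    a, b = np.asarray(a), np.asarray(b)
    ua, ub = np.unique(a), np.unique(b)
    M = np.array([[np.sum((a == i) & (b == j)) for j in ub] for i in ua])
    pairs = sum(comb(int(x), 2) for x in M.ravel())        # agreeing-pair count
    row_pairs = sum(comb(int(x), 2) for x in M.sum(axis=1))
    col_pairs = sum(comb(int(x), 2) for x in M.sum(axis=0))
    total = comb(len(a), 2)
    expected = row_pairs * col_pairs / total               # chance-level agreement
    max_index = (row_pairs + col_pairs) / 2
    return (pairs - expected) / (max_index - expected)

# label permutations do not matter: ari([0, 0, 1, 1], [1, 1, 0, 0]) == 1.0
```

The adjustment subtracts the agreement expected by chance, which is why values near 0 indicate random labelings and negative values are possible.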
    + +
    Figure 3: Boxplots of the Fit results in Table 6
    +
    + +
    + +
    Figure 4: Boxplots of the ARI results in Table 6
    +
    + +
    + +
    Figure 5: Boxplots of the f* − f results +in Table 6
    +
    + +
    + +
    Figure 6: Boxplots of the AA*2 +metric results in Table 6
    +
    + +
    + +
    Figure 7: Boxplots of the runtime results in Table 6, for the RKM
    +
    + +
    + +
    Figure 8: Boxplots of the runtime metric results in Table 6, for DKM, DPCAKM, FKM
    +
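+
+For reference, the ARI reported above is the Adjusted Rand Index of
+Hubert and Arabie (1985). In the usual contingency-table notation
+(ours, not taken from this article), with $n_{ij}$ the number of units
+shared by cluster $i$ of the true partition and cluster $j$ of the
+estimated one, and $a_i$, $b_j$ the corresponding row and column
+totals, it reads
+
+$$\text{ARI} = \frac{\sum_{ij}\binom{n_{ij}}{2} - \left[\sum_{i}\binom{a_i}{2}\sum_{j}\binom{b_j}{2}\right]\Big/\binom{n}{2}}{\frac{1}{2}\left[\sum_{i}\binom{a_i}{2} + \sum_{j}\binom{b_j}{2}\right] - \left[\sum_{i}\binom{a_i}{2}\sum_{j}\binom{b_j}{2}\right]\Big/\binom{n}{2}},$$
+
+so that a value of $1$ indicates perfect recovery of the true
+classification, while values near $0$ indicate chance-level agreement.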
+ +Regarding the RKM, the performance of
+[**drclust**](https://CRAN.R-project.org/package=drclust) and
+[**clustrd**](https://CRAN.R-project.org/package=clustrd) is very
+close, both in terms of the ability to recover the data (fit) and in
+terms of identifying the true classification of the objects.
+
+The FKM performs considerably better in the
+[**drclust**](https://CRAN.R-project.org/package=drclust) case in terms
+of fit and ARI. Considering both ARI and fit for the CDPCA algorithm,
+the difference between the present proposal and the one of
+[**biplotbootGUI**](https://CRAN.R-project.org/package=biplotbootGUI) is
+almost absent. Regarding the CPU runtime, all of the proposed models
+are significantly faster than the previously available ones (RKM, FKM
+and KM with DPCA). For the architecture used in the experiments, the
+order of magnitude of such differences is specified in the last column
+of Table [2](#tab:T2){reference-type="ref"
+reference="tab:stat_models"}.
+
+In general, [**drclust**](https://CRAN.R-project.org/package=drclust)
+shows a slight overfit, while there is no evident difference in the
+ability to recover the true **A**. There is no alternative
+implementation of the DKM, so no comparison can be made; however,
+except for its ARI, which is lower than that of the other techniques,
+its fit is very close, showing a compelling ability to reconstruct the
+data.
Figures [3](#fig:simboxplots1){reference-type="ref"
+reference="fig:simboxplots1"} -
+[8](#fig:simboxplots6){reference-type="ref"
+reference="fig:simboxplots6"} provide a visual summary of the results
+reported in Table [6](#tab:T6){reference-type="ref"
+reference="tab:simulation"}, illustrating not only the central
+tendencies but also the variability across the 100 simulation
+replicates for each scenario. In general, with the exception of the FKM
+method, where our proposed approach outperforms the implementation
+available in [**clustrd**](https://CRAN.R-project.org/package=clustrd),
+the methods are comparable in terms of both fit and ARI. Nevertheless,
+our implementations consistently outperform all alternatives in terms
+of runtime.
+
+## Application on real data
+
+The six statistical models implemented (Table
+[2](#tab:T2){reference-type="ref" reference="tab:stat_models"}) have a
+binary argument `print` which, if set to one, displays the main
+statistics at the end of the execution. The following examples show
+such results, using the `macro` dataset analysed by Vichi and Kiers
+(2001) and made available in
+[**clustrd**](https://CRAN.R-project.org/package=clustrd) (Markos et
+al. 2019); the data have been standardized by setting the argument
+`prep=1`, which is the default for all the techniques. Moreover, the
+commands reported in each example do not specify all the arguments
+available for each function; the default values have been kept for the
+omitted ones.
+
+The first example refers to the DKM (Vichi 2001). 
As shown, the output contains the fit, expressed as the percentage of
+the total deviance (i.e., $||\mathbf{X}||^2$) captured by the between
+deviance of the model, implementing the fit measures in Table
+[5](#tab:T5){reference-type="ref" reference="tab:comparison"}. The
+second output is the centroid matrix $\bar{\mathbf{Y}}$, which
+describes the *K* centroids in the *Q*-dimensional space induced by the
+partition of the variables, together with the related variable-means.
+What follows are the sizes and within deviances of each unit cluster
+and each variable cluster. Finally, the output shows the pseudoF
+(Caliński and Harabasz 1974) index, which is always computed for the
+partition of the units. Please note that the data matrix provided to
+each function implemented in the package needs to be in matrix format.
+
+``` r
+# Macro dataset (Vichi & Kiers, 2001)
+library(clustrd)
+data(macro)
+macro <- as.matrix(macro)
+# DKM
+> dkm <- doublekm(X = macro, K = 5, Q = 3, print = 1)
+
+>> Variance Explained by the DKM (% BSS / TSS): 44.1039
+
+>> Centroid Matrix (Unit-centroids x Variable-centroids):
+
+           V-Clust 1   V-Clust 2  V-Clust 3
+U-Clust 1  0.1282052 -0.31086968 -0.4224182
+U-Clust 2  0.0406931 -0.08362029  0.9046692
+U-Clust 3  1.4321347  0.51191282 -0.7813761
+U-Clust 4 -0.9372541  0.22627768  0.1175189
+U-Clust 5  1.2221058 -2.59078258 -0.1660691
+
+>> Unit-clusters:
+
+         U-Clust 1 U-Clust 2 U-Clust 3 U-Clust 4 U-Clust 5
+Size             8         4         4         3         1
+Deviance 23.934373 31.737865  5.878199  4.844466  0.680442
+
+>> Variable-clusters:
+
+         V-Clust 1 V-Clust 2 V-Clust 3
+Size             3         2         1
+Deviance 40.832173 23.024249  3.218923
+
+>> pseudoF Statistic (Calinski-Harabasz): 2.23941
+```
+
+The second example shows as output the main quantities computed for
+`redkm` (De Soete and Carroll 1994). 
Unlike the DKM, where the variable reduction is performed via averages,
+the RKM performs it via PCA, leading to a better overall fit and also
+altering the final unit-partition, as can be observed from the sizes
+and deviances.
+
+In addition to the output shown in the DKM example, the RKM also
+provides the loading matrix, which projects the *J*-dimensional
+centroids into the *Q*-dimensional subspace. Another important
+difference is the summary of the latent factors: this table shows the
+information captured by the principal components with respect to the
+original data. In this sense, the output makes it possible to
+distinguish between the loss due to the variable reduction (accounted
+for in this table) and the overall loss of the algorithm (which
+accounts for both the loss due to the reduction of the units and the
+one due to the reduction of the variables, reported in the first line
+of the output).
+
+``` r
+# RKM
+> rkm <- redkm(X = macro, K = 5, Q = 3, print = 1)
+
+>> Variance Explained by the RKM (% BSS / TSS): 55.0935
+
+>> Matrix of Centroids (Unit-centroids x Principal Components):
+
+              PC 1       PC 2       PC 3
+Clust 1 -1.3372534 -1.1457414 -0.6150841
+Clust 2  1.8834878 -0.0853912 -0.8907303
+Clust 3  0.5759906  0.4187003  0.3739608
+Clust 4 -0.9538864  1.2392976  0.3454186
+Clust 5  1.0417952 -2.2197178  3.0414445
+
+>> Unit-clusters:
+           Clust 1  Clust 2   Clust 3  Clust 4  Clust 5
+Size             5        5         5        4        1
+Deviance 26.204374 9.921313 11.231563 6.112386 0.418161
+
+>> Loading Matrix (Manifest Variables x Latent Variables):
+
+          PC 1        PC 2        PC 3
+GDP -0.5144915 -0.04436269  0.08985135
+LI  -0.2346937 -0.01773811 -0.86115069
+UR  -0.3529363  0.53044730  0.28002534
+IR  -0.4065339 -0.42022401 -0.17016203
+TB   0.1975072  0.69145440 -0.36710245
+NNS  0.5927684 -0.24828525 -0.09062404
+
+>> Summary of the latent factors:
+
+     Explained Variance Expl. Var. (%) Cumulated Var. Cum. 
Var (%)
+PC 1 1.699343 28.322378 1.699343 28.322378
+PC 2 1.39612 23.268663 3.095462 51.591041
+PC 3 1.182372 19.706208 4.277835 71.297249
+
+>> pseudoF Statistic (Calinski-Harabasz): 4.29923
+```
+
+The `factkm` (Vichi and Kiers 2001) has the same output structure as
+`redkm`. For the same data and hyperparameters, it exhibits a similar
+fit (overall and variable-wise). However, both the unit-partition and
+the latent variables differ. This difference can be (at least)
+partially justified by the difference in the objective function, which
+is most evident in the assignment step.
+
+``` r
+# factorial KM
+> fkm <- factkm(X = macro, K = 5, Q = 3, print = 1, rot = 1)
+
+>> Variance Explained by the FKM (% BSS / TSS): 55.7048
+
+>> Matrix of Centroids (Unit-centroids x Principal Components):
+
+              PC 1        PC 2        PC 3
+Clust 1 -0.7614810  2.16045496 -1.21025666
+Clust 2  1.1707159 -0.08840133 -0.29876729
+Clust 3 -0.9602731 -1.33141866  0.02370092
+Clust 4  1.0782934  1.17952330  3.59632116
+Clust 5 -1.7634699  0.65075735  0.46486440
+
+>> Unit-clusters:
+          Clust 1  Clust 2  Clust 3  Clust 4 Clust 5
+Size            9        5        3        2       1
+Deviance 6.390576 2.827047 5.018935 3.215995       0
+
+>> Loading Matrix (Manifest Variables x Latent Variables):
+
+          PC 1       PC 2        PC 3
+GDP -0.6515084 -0.1780021  0.37482509
+LI  -0.3164139  0.1809559 -0.68284917
+UR  -0.2944864 -0.5235492  0.01561022
+IR  -0.3316254  0.5884434 -0.22101070
+TB   0.1848264 -0.5367239 -0.57166730
+NNS  0.4945307  0.1647067  0.13164438
+
+>> Summary of the latent factors:
+
+     Explained Variance Expl. Var. (%) Cumulated Var. Cum. Var (%)
+PC 1 1.68496 28.082675 1.68496 28.082675
+PC 2 1.450395 24.173243 3.135355 52.255917
+PC 3 1.079558 17.992635 4.214913 70.248552
+
+>> pseudoF Statistic (Calinski-Harabasz): 4.26936
+```
+
+`dpcakm` (Vichi and Saporta 2009) shows the same output as the RKM and
+FKM. 
+The partition of the variables, described by the $\mathbf{V}$ term in
+(\@ref(eq:cdpca4)) - (\@ref(eq:cdpca5)), can be read off the loading
+matrix by treating each non-zero value as a $1$. For the `macro`
+dataset, the additional constraint $\mathbf{A} = \mathbf{B}\mathbf{V}$
+does not cause a significant decrease in the objective function. The
+clusters, however, differ from the previous cases as well.
+
+``` r
+# K-means DPCA
+> cdpca <- dpcakm(X = macro, K = 5, Q = 3, print = 1)
+
+>> Variance Explained by the DPCAKM (% BSS / TSS): 54.468
+
+>> Matrix of Centroids (Unit-centroids x Principal Components):
+
+              PC 1        PC 2       PC 3
+Clust 1  0.6717536  0.01042978 -2.7309458
+Clust 2  3.7343724 -1.18771685  0.6320673
+Clust 3 -0.6729575 -1.80822745  0.7239541
+Clust 4 -0.2496002  1.54537904  0.5263009
+Clust 5 -0.1269212 -0.12464388 -0.1748282
+
+>> Unit-clusters:
+          Clust 1  Clust 2 Clust 3 Clust 4 Clust 5
+Size            7        6       4       2       1
+Deviance 3.816917 2.369948 1.14249 4.90759       0
+
+>> Loading Matrix (Manifest Variables x Latent Variables):
+
+          PC 1      PC 2 PC 3
+GDP  0.5567605 0.0000000    0
+LI   0.0000000 0.7071068    0
+UR   0.5711396 0.0000000    0
+IR   0.0000000 0.0000000    1
+TB   0.0000000 0.7071068    0
+NNS -0.6031727 0.0000000    0
+
+>> Summary of the latent factors:
+     Explained Variance Expl. Var. (%) Cumulated Var. Cum. Var (%)
+PC 1 1 16.666667 1 16.666667
+PC 2 1.703964 28.399406 2.703964 45.066073
+PC 3 1.175965 19.599421 3.87993 64.665494
+
+>> pseudoF Statistic (Calinski-Harabasz): 3.26423
+```
+
+For `dispca` (Vichi and Saporta 2009), the output is mostly similar to
+the ones already shown (except for the unit-clustering part).
+Nevertheless, because the focus here is exclusively on the variable
+reduction process, some additional information is reported in the
+summary of the latent factors. 
Indeed, because a single principal component summarises a subset of the
+manifest variables, the variance of the second component of each
+subset, along with the Cronbach (1951) Alpha index, is computed, so
+that the user knows when the evidence supports such a strategy of
+dimensionality reduction. As mentioned, this function, like those for
+the DPCAKM and the DFA, allows the user to constrain a subset of the
+*J* variables to belong to the same cluster. In the example that
+follows, the first two manifest variables are constrained to contribute
+to the same principal component (which is confirmed by the output `A`).
+Note that the manifest variables whose indices (column-positions in the
+data matrix) correspond to the zeros in `constr` remain unconstrained.
+
+``` r
+# DPCA
+# Impose GDP and LI to be in the same cluster
+> out <- dispca(X = macro, Q = 3, print = 1, constr = c(1,1,0,0,0,0))
+
+>> Variance explained by the DPCA (% BSS / TSS)= 63.9645
+
+>> Loading Matrix (Manifest Variables x Latent variables)
+
+          PC 1       PC 2      PC 3
+GDP  0.0000000  0.0000000 0.7071068
+LI   0.0000000  0.0000000 0.7071068
+UR  -0.7071068  0.0000000 0.0000000
+IR   0.0000000 -0.7071068 0.0000000
+TB   0.0000000  0.7071068 0.0000000
+NNS  0.7071068  0.0000000 0.0000000
+
+>> Summary of the latent factors:
+     Explained Variance Expl. Var. (%) Cumulated Var.
+PC 1 1.388294 23.13824 1.388294
+PC 2 1.364232 22.73721 2.752527
+PC 3 1.085341 18.08902 3.837868
+     Cum. Var (%) Var. 2nd component Cronbach's Alpha
+PC 1 23.13824 0.6117058 -1.269545
+PC 2 45.87544 0.6357675 -1.145804
+PC 3 63.96447 0.9146585 0.157262
+```
+
+`disfa` (Vichi 2017), by assuming an underlying probabilistic model,
+allows additional evaluation metrics and statistics as well. The
+overall objective function is not directly comparable with the other
+ones, and is expressed in absolute terms (not relative, as in the
+previous cases). 
The $\chi^2$ (`X2`), along with the `BIC`, `AIC` and `RMSEA`, allows a
+robust evaluation of the results in terms of fit/parsimony. In
+addition to the DPCA output, the function displays, for each variable,
+the communality with the factors, providing a standard error as well
+as an associated *p*-value for the estimate.
+
+By comparing the loading matrix of the DPCA with that of the DFA, it is
+possible to assess their similarity in terms of latent variables. Part
+of the difference can be justified (besides the well-known distinctions
+between PCA and FA) by the method used to compute each factor. While in
+all the previous cases the eigendecomposition has been employed for
+this purpose, the DFA makes use of the power iteration method for the
+computation of the loading matrix (Hotelling 1933).
+
+``` r
+# disjoint FA
+> out <- disfa(X = macro, Q = 3, print = 1)
+>> Discrepancy of DFA: 0.296499
+
+>> Summary statistics:
+
+ Unknown Parameters Chi-square Degrees of Freedom        BIC
+                  9   4.447531                 12 174.048102
+        AIC    RMSEA
+ 165.086511 0.157189
+
+>> Loading Matrix (Manifest Variables x Latent Variables)
+
+      Factor 1 Factor 2   Factor 3
+GDP  0.5318618        0  0.0000000
+LI   0.0000000        1  0.0000000
+UR   0.5668542        0  0.0000000
+IR   0.0000000        0  0.6035160
+TB   0.0000000        0 -0.6035152
+NNS -0.6849942        0  0.0000000
+
+>> Summary of the latent factors:
+
+         Explained Variance Expl. Var. (%) Cum. Var Cum. Var (%)
+Factor 1 1.0734177 17.89029 1.073418 17.89029
+Factor 2 1.0000000 16.66667 2.073418 34.55696
+Factor 3 0.7284622 12.14104 2.801880 46.69800
+         Var. 2nd component Cronbach's Alpha
+Factor 1 0.7001954 -0.6451803
+Factor 2 0.0000000 1.0000000
+Factor 3 0.6357675 -1.1458039
+
+>> Detailed Manifest-variable - Latent-factor relationships
+
+    Associated Factor Corr. Coeff. Std. 
Error Pr(p>|Z|)
+GDP 1 0.5318618 0.1893572 0.0157923335
+LI 2 1.0000000 0.0000000 0.0000000000
+UR 1 0.5668542 0.1842113 0.0091557523
+IR 3 0.6035160 0.1782931 0.0048411219
+TB 3 -0.6035152 0.1782932 0.0048411997
+NNS 1 -0.6849942 0.1629084 0.0008606488
+    Var. Error Communality
+GDP 0.7171230 0.2828770
+LI 0.0000000 1.0000000
+UR 0.6786764 0.3213236
+IR 0.6357684 0.3642316
+TB 0.6357695 0.3642305
+NNS 0.5307830 0.4692170
+```
+
+In practice, the `K` and `Q` hyper-parameters are usually not known a
+priori. In such a case, a possible tool to investigate plausible values
+for `Q` is the Kaiser criterion (Kaiser 1960): its implementation,
+`kaiserCrit`, takes the dataset as its single argument and prints a
+message, besides returning a scalar indicating the optimal number of
+components based on this rule.
+
+``` r
+# Kaiser criterion for the choice of Q, the number of latent components
+> kaiserCrit(X = macro)
+
+The number of components suggested by the Kaiser criterion is: 3
+```
+
+For selecting the number of clusters, `K`, one of the most commonly
+used indices is the *pseudoF* statistic, which, however, tends to
+underestimate the optimal number of clusters. To address this
+limitation, a "relaxed" version, referred to as `apseudoF`, has been
+implemented. The `apseudoF` procedure computes the standard `pseudoF`
+index over a range of possible values up to `maxK`. If a higher value
+of `K` yields a pseudoF whose decrease, with respect to the maximum
+value suggested by the plain pseudoF, is less than `tol` $\cdot$
+pseudoF, then `apseudoF` selects this alternative `K` as the optimal
+number of clusters. Additionally, it generates a plot of the pseudoF
+values computed across the specified *K* range. Given the hybrid nature
+of the proposed methods, the function also requires specifying the
+clustering model to be used: 1 = `doublekm`, 2 = `redkm`, 3 = `factkm`,
+4 = `dpcakm`. 
+Furthermore, the number of components, `Q`, must be provided, as it also +influences the final quality of the resulting partition. + +``` r +> apseudoF(X = macro, maxK=10, tol = 0.05, model = 2, Q = 3) +The optimal number of clusters based on the pseudoF criterion is: 5 +``` + +
    + +
    Figure 9: Interval-pseudoF polygonal chain
    +
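+
+For reference, the pseudoF maximized above is the Caliński and Harabasz
+(1974) statistic; in our notation (not taken from the output above),
+for $n$ units partitioned into $K$ clusters with between-cluster
+deviance $B$ and within-cluster deviance $W$,
+
+$$\text{pseudoF} = \frac{B/(K-1)}{W/(n-K)},$$
+
+so larger values indicate a partition that is both compact and well
+separated. The Kaiser criterion used for `Q`, analogously, retains the
+components whose eigenvalues of the correlation matrix exceed $1$.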
+ +While this index was conceived for one-mode clustering methods,
+Rocci and Vichi (2008) extended it to two-mode clustering methods,
+making it applicable to methods like `doublekm`. The `dpseudoF`
+function implements it and, besides the dataset, one provides the
+maximum `K` and `Q` values.
+
+``` r
+> dpseudoF(X = macro, maxK = 10, maxQ = 5)
+           Q = 2     Q = 3     Q = 4     Q = 5
+K = 2  38.666667 22.800000 16.000000 12.222222
+K = 3  22.800000 13.875000  9.818182  7.500000
+K = 4  16.000000  9.818182  6.933333  5.263158
+K = 5  12.222222  7.500000  5.263158  3.958333
+K = 6   9.818182  6.000000  4.173913  3.103448
+K = 7   8.153846  4.950000  3.407407  2.500000
+K = 8   6.933333  4.173913  2.838710  2.051282
+K = 9   6.000000  3.576923  2.400000  1.704545
+K = 10  5.263158  3.103448  2.051282  1.428571
+```
+
+Here, the row and column indices of the maximum value within the
+matrix are chosen as the best `K` and `Q` values.
+
+By providing just the centroid matrix, one can also check how the
+centroids are related. Such information is usually not provided by
+partitive clustering methods, but rather by hierarchical ones.
+Nevertheless, it is always possible to construct a distance matrix
+based on the centroids and represent it via a dendrogram, using an
+arbitrary distance. The `centree` function does exactly this, using
+the Ward (1963) distance, which corresponds to the squared Euclidean
+one. In practice, one provides as an argument the output of one of the
+four clustering methods.
+
+``` r
+> out <- factkm(X = macro, K = 10, Q = 3)
+> centree(drclust_out = out)
+```
    + +
    Figure 10: Dendrogram of a 10-centroids
    +
+ +If, instead, one wants to visually assess the quality of the
+obtained partition, another instrument is typically used for this
+purpose: the silhouette (Rousseeuw 1987), which, besides summarizing
+the quality numerically, also allows representing it graphically. By
+employing [**cluster**](https://CRAN.R-project.org/package=cluster)
+for the computational part and
+[**factoextra**](https://CRAN.R-project.org/package=factoextra) for
+the graphical part, `silhouette` takes as arguments the output of one
+of the four [**drclust**](https://CRAN.R-project.org/package=drclust)
+clustering methods and the dataset, returning the results of the two
+functions with just one command.
+
+``` r
+# Note: The same data must be provided to dpcakm and silhouette
+> out <- dpcakm(X = macro, K = 5, Q = 3)
+> silhouette(X = macro, drclust_out = out)
+```
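+
+As a reminder of what is being plotted (notation ours), for each
+observation $i$, let $a(i)$ be its average distance to the other
+members of its own cluster and $b(i)$ the smallest average distance to
+the members of any other cluster. The silhouette width of Rousseeuw
+(1987) is then
+
+$$s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}} \in [-1, 1],$$
+
+with values close to $1$ indicating well-clustered observations and
+negative values suggesting a possible misassignment.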
    + +
    Figure 11: Silhouette of a DPCA KM solution
    +
+ +As can be seen in Figure
+[11](#silhouette dpcakm){reference-type="ref"
+reference="silhouette dpcakm"}, the average silhouette width is also
+displayed as a scalar above the plot.
+
+A purely graphical tool used to assess the dis/homogeneity of the
+groups is the heatmap. By employing the
+[**pheatmap**](https://CRAN.R-project.org/package=pheatmap) library
+(Kolde 2019) and the result of `doublekm`, `redkm`, `factkm` or
+`dpcakm`, the function orders the observations of each cluster in
+ascending order with regard to the distance between the observation
+and the cluster to which it has been assigned. After doing so for each
+group, the groups are sorted based on the distance between their
+centroid and the grand mean (i.e., the mean of all observations). The
+`heatm` function produces such a result. Figure 12 shows its graphical
+output.
+
+``` r
+# Note: The same data must be provided to doublekm and heatm
+> out <- doublekm(X = macro, K = 5, Q = 3)
+> heatm(X = macro, drclust_out = out)
+```
    + +
    Figure 12: heatmap of a double-KM solution
    +
+ +Biplots and parallel coordinates plots can be obtained from the
+output of the techniques in the proposed package by means of a few
+instructions, using libraries available on `CRAN`, such as
+[**ggplot2**](https://CRAN.R-project.org/package=ggplot2) (Wickham et
+al. 2024), `grid` (which is now a base package),
+[**dplyr**](https://CRAN.R-project.org/package=dplyr) (Wickham et al.
+2023) and [**GGally**](https://CRAN.R-project.org/package=GGally)
+(Schloerke et al. 2024). Therefore, the user can easily visualize the
+subspaces provided by the statistical techniques. In future versions
+of the package, the two functions will be available as built-ins.
+Currently, for the biplot, we have:
+
+``` r
+library(ggplot2)
+library(grid)
+library(dplyr)
+
+out <- factkm(macro, K = 2, Q = 2, Rndstart = 100)
+
+# Prepare data
+Y <- as.data.frame(macro %*% out$A); colnames(Y) <- c("Dim1", "Dim2")
+Y$cluster <- as.factor(cluster(out$U))
+
+arrow_scale <- 5
+A <- as.data.frame(out$A)[, 1:2] * arrow_scale
+colnames(A) <- c("PC1", "PC2")
+A$var <- colnames(macro)
+
+# Axis limits
+lims <- range(c(Y$Dim1, Y$Dim2, A$PC1, A$PC2)) * 1.2
+
+# Circle
+circle <- data.frame(x = cos(seq(0, 2*pi, length.out = 200)) * arrow_scale,
+                     y = sin(seq(0, 2*pi, length.out = 200)) * arrow_scale)
+
+ggplot(Y, aes(x = Dim1, y = Dim2, color = cluster)) +
+  geom_point(size = 2) +
+  geom_segment(
+    data = A, aes(x = 0, y = 0, xend = PC1, yend = PC2),
+    arrow = arrow(length = unit(0.2, "cm")), inherit.aes = FALSE, color = "gray40"
+  ) +
+  geom_text(
+    data = A, aes(x = PC1, y = PC2, label = var), inherit.aes = FALSE,
+    hjust = 1.1, vjust = 1.1, size = 3
+  ) +
+  geom_path(data = circle, aes(x = x, y = y), inherit.aes = FALSE,
+            linetype = "dashed", color = "gray70") +
+  coord_fixed(xlim = lims, ylim = lims) +
+  labs(x = "Component 1", y = "Component 2", title = "Biplot") +
+  theme_minimal()
+```
+
+which leads to the result shown in Figure
+[13](#boxplot1){reference-type="ref" 
reference="boxplot1"}. + +
    + +
    Figure 13: Biplot of a FKM solution
    +
+ +By using essential information in the output provided by `factkm`,
+we are able to see the cluster of each observation, represented in the
+estimated subspace induced by $\mathbf{A}$, as well as the
+relationships between observed and latent variables via the arrows.
+
+In order to obtain the parallel coordinates plot, a few instructions
+are sufficient, starting from the same quantities computed for the
+biplot.
+
+``` r
+library(GGally)
+out <- factkm(macro, K = 3, Q = 2, Rndstart = 100)
+# Rebuild the projected data and cluster labels for the new solution
+Y <- as.data.frame(macro %*% out$A); colnames(Y) <- c("Dim1", "Dim2")
+Y$cluster <- as.factor(cluster(out$U))
+ggparcoord(
+  data = Y, columns = 1:(ncol(Y)-1),
+  groupColumn = "cluster", scale = "uniminmax",
+  showPoints = FALSE, alphaLines = 0.5
+) +
+  theme_minimal() +
+  labs(title = "Parallel Coordinate Plot",
+       x = "Variables", y = "Normalized Value")
+```
+
+For the FKM applied to the `macro` dataset, the output is reported in
+Figure [14](#parcoord){reference-type="ref" reference="parcoord"}.
    + +
    Figure 14: Parallel coordinates plot of a FKM +solution
    +
+ +## Conclusions {#Conclusions}
+
+This work presents an R library that implements techniques of joint
+dimensionality reduction and clustering. Some of them are already
+implemented in other packages. In general, the performance of the
+proposed implementations and of the earlier ones is very close, except
+for the FKM, where the new one is always better for the metrics
+considered here. As an element of novelty, the empty-cluster issue
+that may occur in the estimation process has been addressed by
+applying 2-means to the cluster with the highest deviance, preserving
+the monotonicity of the algorithm and providing slightly better
+results, at a higher computational cost.
+
+The implementations of the two dimensionality reduction methods,
+`dispca` and `disfa`, as well as of `doublekm`, offered by our library
+are novel in the sense that they have no previous implementation in R.
+Besides the methodological difference between the last two, their
+latent variables are computed differently: the former uses the
+well-known eigendecomposition, while the latter adopts the power
+method. In general, by implementing all the models in C/C++, the speed
+advantage has been shown to be remarkable compared to all the existing
+implementations. These improvements allow the application of the
+techniques to relatively large datasets, obtaining results in a
+reasonable amount of time. Some additional functions have been
+implemented to help choose the values of the hyperparameters; they can
+also be used as assessment tools to evaluate the quality of the
+results provided by the implementations.
+:::::::::
+
+:::::::::::::::::::::::::::::::::::::::::: {#refs .references .csl-bib-body .hanging-indent}
+::: {#ref-calinski1974 .csl-entry}
+Caliński, T., and J. Harabasz. 1974. "A Dendrite Method for Cluster
+Analysis." *Communications in Statistics* 3 (1): 1--27.
+. 
+::: + +::: {#ref-cattell1965 .csl-entry} +Cattell, R. B. 1965. "Factor Analysis: An Introduction to Essentials i. +The Purpose and Underlying Models." *Biometrics* 21 (1): 190--215. +. +::: + +::: {#ref-charrad2014 .csl-entry} +Charrad, M., N. Ghazzali, V. Boiteau, and A. Niknafs. 2014. "NbClust: An +R Package for Determining the Relevant Number of Clusters in a Data +Set." *Journal of Statistical Software* 61 (6): 1--36. +. +::: + +::: {#ref-cronbach1951 .csl-entry} +Cronbach, Lee J. 1951. "Coefficient Alpha and the Internal Structure of +Tests." *Psychometrika* 16 (3): 297--334. +. +::: + +::: {#ref-desoete1994 .csl-entry} +De Soete, G., and J. D. Carroll. 1994. "K-Means Clustering in a +Low-Dimensional Euclidean Space." Chap. 24 in *New Approaches in +Classification and Data Analysis*, edited by E. Diday, Y. Lechevallier, +M. Schader, P. Bertrand, and B. Burtschy. Springer. +. +::: + +::: {#ref-desarbo1990 .csl-entry} +DeSarbo, W. S., K. Jedidi, K. Cool, and D. Schendel. 1990. "Simultaneous +Multidimensional Unfolding and Cluster Analysis: An Investigation of +Strategic Groups." *Marketing Letters* 2: 129--46. +. +::: + +::: {#ref-dray2007 .csl-entry} +Dray, S., and A.-B. Dufour. 2007. "The Ade4 Package: Implementing the +Duality Diagram for Ecologists." *Journal of Statistical Software* 22 +(4): 1--20. . +::: + +::: {#ref-eddelbuettel2011 .csl-entry} +Eddelbuettel, D., and R. Francois. 2011. "Rcpp: Seamless R and C++ +Integration." *Journal of Statistical Software* 40 (8): 1--18. +. +::: + +::: {#ref-eddelbuettel2014 .csl-entry} +Eddelbuettel, D., and C. Sanderson. 2014. "RcppArmadillo: Accelerating R +with High-Performance C++ Linear Algebra." *Computational Statistics and +Data Analysis* 71: 1054--63. +. +::: + +::: {#ref-hotelling1933 .csl-entry} +Hotelling, H. 1933. "Analysis of a Complex of Statistical Variables into +Principal Components." *Journal of Educational Psychology* 24: 417--41, +and 498--520. . 
+::: + +::: {#ref-HubertArabie .csl-entry} +Hubert, L., and P. Arabie. 1985. "Comparing Partitions." *Journal of +Classification* 2 (1): 193--218. . +::: + +::: {#ref-kaiser1960 .csl-entry} +Kaiser, Henry F. 1960. "The Application of Electronic Computers to +Factor Analysis." *Educational and Psychological Measurement* 20 (1): +141--51. . +::: + +::: {#ref-kassambara2022 .csl-entry} +Kassambara, A. 2022. *Factoextra: Extract and Visualize the Results of +Multivariate Data Analyses*. R package version 1.0.7. +. +::: + +::: {#ref-kolde2019 .csl-entry} +Kolde, R. 2019. *Pheatmap: Pretty Heatmaps*. R package 1.0.12. +. +::: + +::: {#ref-lawley1962 .csl-entry} +Lawley, D. N., and A. E. Maxwell. 1962. "Factor Analysis as a +Statistical Method." *Journal of the Royal Statistical Society. Series D +(The Statistician)* 12 (3): 209--29. . +::: + +::: {#ref-le2008 .csl-entry} +Lê, S., J. Josse, and F. Husson. 2008. "FactoMineR: An R Package for +Multivariate Analysis." *Journal of Statistical Software* 25 (1): 1--18. +. +::: + +::: {#ref-maechler2023 .csl-entry} +Maechler, M., P. Rousseeuw, A. Struyf, M. Hubert, and K. Hornik. 2023. +*Cluster: Cluster Analysis Basics and Extensions*. R package version +2.1.6. . +::: + +::: {#ref-markos2019 .csl-entry} +Markos, A., A. I. D'Enza, and M. van de Velden. 2019. "Beyond Tandem +Analysis: Joint Dimension Reduction and Clustering in R." *Journal of +Statistical Software* 91 (10): 1--24. +. +::: + +::: {#ref-mcqueen1967 .csl-entry} +McQueen, J. 1967. "Some Methods for Classification and Analysis of +Multivariate Observations." *Computer and Chemistry* 4: 257--72. +. +::: + +::: {#ref-nietolibreiro2023 .csl-entry} +Nieto Librero, A. B., and A. Freitas. 2023. *biplotbootGUI: Bootstrap on +Classical Biplots and Clustering Disjoint Biplot*. +. +::: + +::: {#ref-pardo2007 .csl-entry} +Pardo, C. E., and P. C. Del Campo. 2007. "Combination of Factorial +Methods and Cluster Analysis in R: The Package FactoClass." 
*Revista +Colombiana de Estadística* 30 (2): 231--45. +. +::: + +::: {#ref-pearson1901 .csl-entry} +Pearson, K. 1901. "On Lines and Planes of Closest Fit to Systems of +Points in Space." *The London, Edinburgh, and Dublin Philosophical +Magazine and Journal of Science* 2 (11): 559--72. +. +::: + +::: {#ref-R .csl-entry} +R Core Team. 2015. *R: A Language and Environment for Statistical +Computing*. R Foundation for Statistical Computing. +. +::: + +::: {#ref-rand1971 .csl-entry} +Rand, W. M. 1971. "Objective Criteria for the Evaluation of Clustering +Methods." *Journal of the American Statistical Association* 66 (336): +846--50. . +::: + +::: {#ref-rocci2008 .csl-entry} +Rocci, R., and M. Vichi. 2008. "Two-Mode Multi-Partitioning." +*Computational Statistics & Data Analysis* 52 (4): 1984--2003. +. +::: + +::: {#ref-ROUSSEEUW198753 .csl-entry} +Rousseeuw, Peter J. 1987. "Silhouettes: A Graphical Aid to the +Interpretation and Validation of Cluster Analysis." *Journal of +Computational and Applied Mathematics* 20: 53--65. +. +::: + +::: {#ref-ggally .csl-entry} +Schloerke, B., D. Cook, H. Hofmann, et al. 2024. *GGally: Extension to +'Ggplot2'*. R package version 2.1.2. +. +::: + +::: {#ref-timmerman2010 .csl-entry} +Timmerman, Marieke E., Eva Ceulemans, Henk A. L. Kiers, and Maurizio +Vichi. 2010. "Factorial and Reduced k-Means Reconsidered." +*Computational Statistics & Data Analysis* 54 (7): 1858--71. +. +::: + +::: {#ref-maurizio2001a .csl-entry} +Vichi, M. 2001. "Double k-Means Clustering for Simultaneous +Classification of Objects and Variables." Chap. 6 in *Advances in +Classification and Data Analysis*, edited by S. Borra, R. Rocci, M. +Vichi, and M. Schader. Springer. +. +::: + +::: {#ref-vichi2017 .csl-entry} +Vichi, M. 2017. "Disjoint Factor Analysis with Cross-Loadings." +*Advances in Data Analysis and Classification* 11 (4): 563--91. +. +::: + +::: {#ref-vichi2001a .csl-entry} +Vichi, Maurizio, and Henk A. L. Kiers. 2001. 
"Factorial k-Means Analysis +for Two-Way Data." *Computational Statistics & Data Analysis* 37 (1): +49--64. . +::: + +::: {#ref-vichi2009 .csl-entry} +Vichi, Maurizio, and Gilbert Saporta. 2009. "Clustering and Disjoint +Principal Component Analysis." *Computational Statistics & Data +Analysis* 53 (8): 3194--208. +. +::: + +::: {#ref-VichiVicariKiers .csl-entry} +Vichi, M., D. Vicari, and Henk A. L. Kiers. 2019. "Clustering and +Dimension Reduction for Mixed Variables." *Behaviormetrika*, 243--69. +. +::: + +::: {#ref-revelle2017 .csl-entry} +W. R. Revelle. 2017. *Psych: Procedures for Personality and +Psychological Research*. +. +::: + +::: {#ref-ward1963 .csl-entry} +Ward, J. H. 1963. "Hierarchical Grouping to Optimize an Objective +Function." *Journal of the American Statistical Association* 58 (301): +236--44. . +::: + +::: {#ref-ggplot2 .csl-entry} +Wickham, H., W. Chang, L. Henry, et al. 2024. *Ggplot2: Elegant Graphics +for Data Analysis*. R package version 3.4.4. +. +::: + +::: {#ref-dplyr .csl-entry} +Wickham, H., R. François, L. Henry, and K. Müller. 2023. *Dplyr: A +Grammar of Data Manipulation*. R package version 1.1.4. +. +::: + +::: {#ref-yamamoto2014 .csl-entry} +Yamamoto, M., and H. Hwang. 2014. "A General Formulation of Cluster +Analysis with Dimension Reduction and Subspace Separation." +*Behaviormetrika* 41: 115--29. . +::: + +::: {#ref-zou2006 .csl-entry} +Zou, H., T. Hastie, and R. Tibshirani. 2006. "Sparse Principal Component +Analysis." *Journal of Computational and Graphical Statistics* 15 (2): +265--86. https://doi.org/. +::: +:::::::::::::::::::::::::::::::::::::::::: diff --git a/_articles/RJ-2025-046/RJwrapper.tex b/_articles/RJ-2025-046/RJwrapper.tex new file mode 100644 index 0000000000..73458f9934 --- /dev/null +++ b/_articles/RJ-2025-046/RJwrapper.tex @@ -0,0 +1,29 @@ +%% Just added files from R Project to Overleaf. 
+ +\documentclass[a4paper]{report} +\usepackage[utf8]{inputenc} +\usepackage[T1]{fontenc} +\usepackage{RJournal} +\usepackage{amsmath,amssymb,array} +\usepackage{booktabs} + +%% load any required packages here +\usepackage[normalem]{ulem} +\usepackage{xcolor} + +\begin{document} + +%% do not edit, for illustration only +\sectionhead{Contributed research article} +\volume{17} +\volnumber{4} +\year{2025} +\month{December} +\setcounter{page}{103} + +%% replace RJtemplate with your article +\begin{article} + \input{prunila-vichi} +\end{article} + +\end{document} diff --git a/_articles/RJ-2025-046/figures/10centreedpca10m.png b/_articles/RJ-2025-046/figures/10centreedpca10m.png new file mode 100644 index 0000000000..12c445df05 Binary files /dev/null and b/_articles/RJ-2025-046/figures/10centreedpca10m.png differ diff --git a/_articles/RJ-2025-046/figures/11silhouettek5q3.png b/_articles/RJ-2025-046/figures/11silhouettek5q3.png new file mode 100644 index 0000000000..74be54b080 Binary files /dev/null and b/_articles/RJ-2025-046/figures/11silhouettek5q3.png differ diff --git a/_articles/RJ-2025-046/figures/12dkmk5q3heatm.png b/_articles/RJ-2025-046/figures/12dkmk5q3heatm.png new file mode 100644 index 0000000000..05056f880d Binary files /dev/null and b/_articles/RJ-2025-046/figures/12dkmk5q3heatm.png differ diff --git a/_articles/RJ-2025-046/figures/13biplot.png b/_articles/RJ-2025-046/figures/13biplot.png new file mode 100644 index 0000000000..09109c7477 Binary files /dev/null and b/_articles/RJ-2025-046/figures/13biplot.png differ diff --git a/_articles/RJ-2025-046/figures/14parcoord.png b/_articles/RJ-2025-046/figures/14parcoord.png new file mode 100644 index 0000000000..339dbe2a64 Binary files /dev/null and b/_articles/RJ-2025-046/figures/14parcoord.png differ diff --git a/_articles/RJ-2025-046/figures/1iriscomparison.png b/_articles/RJ-2025-046/figures/1iriscomparison.png new file mode 100644 index 0000000000..cccf374d90 Binary files /dev/null and 
b/_articles/RJ-2025-046/figures/1iriscomparison.png differ diff --git a/_articles/RJ-2025-046/figures/2sim123.png b/_articles/RJ-2025-046/figures/2sim123.png new file mode 100644 index 0000000000..41ae701e13 Binary files /dev/null and b/_articles/RJ-2025-046/figures/2sim123.png differ diff --git a/_articles/RJ-2025-046/figures/3Fit.png b/_articles/RJ-2025-046/figures/3Fit.png new file mode 100644 index 0000000000..fb9cf64d96 Binary files /dev/null and b/_articles/RJ-2025-046/figures/3Fit.png differ diff --git a/_articles/RJ-2025-046/figures/4ARI.png b/_articles/RJ-2025-046/figures/4ARI.png new file mode 100644 index 0000000000..65c25e1b11 Binary files /dev/null and b/_articles/RJ-2025-046/figures/4ARI.png differ diff --git a/_articles/RJ-2025-046/figures/5fsf.png b/_articles/RJ-2025-046/figures/5fsf.png new file mode 100644 index 0000000000..bef819c659 Binary files /dev/null and b/_articles/RJ-2025-046/figures/5fsf.png differ diff --git a/_articles/RJ-2025-046/figures/6AsA.png b/_articles/RJ-2025-046/figures/6AsA.png new file mode 100644 index 0000000000..eb587167d0 Binary files /dev/null and b/_articles/RJ-2025-046/figures/6AsA.png differ diff --git a/_articles/RJ-2025-046/figures/7runtime_RKM.png b/_articles/RJ-2025-046/figures/7runtime_RKM.png new file mode 100644 index 0000000000..ed357dad48 Binary files /dev/null and b/_articles/RJ-2025-046/figures/7runtime_RKM.png differ diff --git a/_articles/RJ-2025-046/figures/8runtime_others.png b/_articles/RJ-2025-046/figures/8runtime_others.png new file mode 100644 index 0000000000..625eb4c72b Binary files /dev/null and b/_articles/RJ-2025-046/figures/8runtime_others.png differ diff --git a/_articles/RJ-2025-046/figures/9pFfactkm.png b/_articles/RJ-2025-046/figures/9pFfactkm.png new file mode 100644 index 0000000000..208735cbc4 Binary files /dev/null and b/_articles/RJ-2025-046/figures/9pFfactkm.png differ diff --git a/_articles/RJ-2025-046/figures/Rlogo.png b/_articles/RJ-2025-046/figures/Rlogo.png new file mode 100644 
index 0000000000..72bd08f764 Binary files /dev/null and b/_articles/RJ-2025-046/figures/Rlogo.png differ diff --git a/_articles/RJ-2025-046/prunila-vichi.R b/_articles/RJ-2025-046/prunila-vichi.R new file mode 100644 index 0000000000..99764887e1 --- /dev/null +++ b/_articles/RJ-2025-046/prunila-vichi.R @@ -0,0 +1,713 @@ +library(ggplot2) +library(MASS) +library(clustrd) +library(biplotbootGUI) +library(drclust) +library(dplyr) +library(tidyr) +library(tidyverse) +library(patchwork) +library(forcats) +library(RColorBrewer) +library(grid) +set.seed(92) +# defining two custom-functions needed for the computations + +# Function bsmatrix(): from clustering vector to a membership matrix: +bsmatrix = function(cluster){ + K = max(cluster) + n = length(cluster) + id = rep(1, K) + I = diag(id) + Uc = matrix(0, nrow = n, ncol = K) + for(i in 1:n){ + Uc[i,] = I[cluster[i],] + } + return(U = Uc) +} + +# Function MixSampling(): Generates synthetic data with 2-dimensional clustering structure + 4-dimensional noise +MixSampling = function(n, Pr, su, sc, seed){ + set.seed(seed) + # e.g. MixSampling(1000, c(0.2, 0.3, 0.5), 0.3, 0.01) + ## Input: + # n: nr. 
of objects
+  # Pr: vector of a priori probabilities (K x 1)
+  # su: standard deviation among units
+  # sc: standard deviation of equidistant centroids
+
+  ## Output:
+  # X: data matrix (n x (K-1+4)): K-1 informative dimensions plus 4 noise dimensions
+  # U: membership matrix
+  # us: categorical clustering variable
+
+  # K: number of clusters
+  K = length(Pr)
+
+  # Xm: K equidistant centroids, via classical MDS of a constant distance matrix
+  D = 6*(matrix(1,K,K)-diag(rep(1,K)))
+  Jc = diag(rep(1,K)) - (1/K)*(matrix(1,K,K))
+  cDc = -0.5*Jc%*%D%*%Jc
+  eigs = eigen(cDc)
+  B = diag(eigs$values[1:(K-1)])
+  A = eigs$vectors[,1:(K-1)]
+  # centroid matrix
+  Xm = A%*%sqrt(B)
+
+  # random perturbation of the centroids
+  mu = rep(0, K-1)
+  Sigma = sc*diag(rep(1, K-1))
+  E = mvrnorm(n = K, mu = mu, Sigma = Sigma)
+
+  # add error to the centroids (modify isolation)
+  Xmp = Xm+E
+  # variance and covariance matrix
+  Sigma0 = cov(mvrnorm(n = 100, mu = mu, Sigma = diag(rep(1,K-1)))*su)*(99/100)
+  U = t(rmultinom(n = n, prob = Pr, size = 1))
+  u = U%*%c(1:K)
+  us <- u[order(u)]
+  ius <- order(u)
+  U = U[ius,]
+  X = matrix(nrow = n, ncol = K-1)
+  for(i in 1:n){
+    X[i,] = mvrnorm(mu = Xmp[us[i],], Sigma = Sigma0, n = 1)
+  }
+  # 4 noise dimensions
+  Y = mvrnorm(n = n, mu = c(0,0,0,0), Sigma = diag(c(1,1,1,1)))*6
+  X = cbind(X, Y)
+  colnames(X) <- c("x", "y", "n1", "n2", "n3", "n4")
+  #p <- ggplot(data = as.data.frame(X[,1:2]), aes(x = x, y = y, color = factor(us))) +
+  #  geom_point() +
+  #  labs(x = "Dim1", y = "Dim2", title = "Simulated Data") +
+  #  scale_color_discrete(name = "Cluster")
+  #plot(p)
+
+  return(list(X = X, U = U, us = us))
+}
+##### ------------------------- iris dataset
+## Testing the statistical methods implemented in drclust and comparing with the available alternatives.
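+
+# Quick illustrative check of the two helper functions defined above (a sketch,
+# not part of the published analysis; exact values depend on the seed):
+# Ub <- bsmatrix(c(1, 3, 2, 1))   # 4 x 3 binary membership matrix, one 1 per row
+# sim <- MixSampling(n = 100, Pr = c(0.5, 0.3, 0.2), su = 0.3, sc = 0.01, seed = 1)
+# dim(sim$X)                      # 100 x 6: 2 informative + 4 noise dimensions
+# table(sim$us)                   # cluster sizes roughly proportional to Pr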
+
+# create a table to store the results
+IComp <- matrix(ncol = 5, nrow = 9) # iris comparison
+
+# the following 5 parameters are reported for each technique
+colnames(IComp) <- c("package", "technique", "runtime", "ARI", "fit")
+
+# load the iris data
+data(iris)
+irisn <- as.matrix(iris[,-5])
+# true classification labels
+Utrue <- bsmatrix(as.numeric(factor(iris$Species)))
+# test RKM, FKM and CDPCA on drclust (RKM, FKM, CDPCA, ...), clustrd (RKM, FKM), biplotbootGUI (CDPCA)
+
+### RKM
+pkg <- "clustrd"
+met <- "RKM"
+runtime <- system.time({
+  rkm_a <- clustrd::cluspca(irisn, 3, 2, alpha = 0.5, method = "RKM", nstart = 100)
+})
+# runtime
+rntm <- runtime[3]
+# U matrix
+Urkm_rd <- bsmatrix(rkm_a$cluster)
+# ARI
+ari <- mrand(t(Urkm_rd)%*%Utrue)
+# Fit
+fit <- norm(Urkm_rd%*%rkm_a$centroid%*%t(rkm_a$attcoord), "F")
+IComp[1,] <- c(pkg, met, rntm, ari, fit)
+
+## drclust
+pkg <- "drclust"
+runtime <- system.time({
+  rkm_b <- redkm(irisn, 3, 2, Rndstart = 100)
+})
+# runtime
+rntm <- runtime[3]
+# ARI
+ari <- mrand(t(rkm_b$U)%*%Utrue)
+# Fit
+fit <- norm(rkm_b$U%*%rkm_b$centers%*%t(rkm_b$A), "F")
+IComp[2,] <- c(pkg, met, rntm, ari, fit)
+
+### FKM
+met <- "FKM"
+pkg <- "clustrd"
+
+runtime <- system.time({
+  fkm_a <- clustrd::cluspca(irisn, 3, 2, alpha = 0, method = "FKM", nstart = 100)
+})
+# runtime
+rntm <- runtime[3]
+# U matrix
+Ufkm_rd <- bsmatrix(fkm_a$cluster)
+# ARI
+ari <- mrand(t(Ufkm_rd)%*%Utrue)
+# Fit
+fit <- norm(Ufkm_rd%*%fkm_a$centroid%*%t(fkm_a$attcoord), "F")
+IComp[3,] <- c(pkg, met, rntm, ari, fit)
+
+pkg <- "drclust"
+runtime <- system.time({
+  fkm_b <- factkm(irisn, 3, 2, Rndstart = 100)
+})
+# runtime
+rntm <- runtime[3]
+# ARI
+ari <- mrand(t(fkm_b$U)%*%Utrue)
+# Fit
+fit <- norm(fkm_b$U%*%fkm_b$centers%*%t(fkm_b$A), "F")
+IComp[4,] <- c(pkg, met, rntm, ari, fit)
+
+pkg <- "biplotbootGUI"
+met <- "DPCAKM"
+runtime <- system.time({
+  cdpca_a <- biplotbootGUI::CDpca(irisn, P = 3, Q = 2, tol = 1e-6, maxit = 100, r = 100,
cdpcaplot = F) +}) +# runtime +rntm <- runtime[3] +# Ari +ari <- mrand(t(cdpca_a$U)%*%Utrue) +# Fit +fit <- norm(cdpca_a$U%*%cdpca_a$Ybar%*%t(cdpca_a$A), "F") +IComp[5,] <- c(pkg, met, rntm, ari, fit) + +pkg <- "drclust" + +runtime <- system.time({ + cdpca_b <- dpcakm(irisn, 3, 2, tol = 1e-6, maxiter = 100, Rndstart = 100) +}) +# runtime +rntm<- runtime[3] +# Ari +ari <- mrand(t(cdpca_b$U)%*%Utrue) +fit <- norm(cdpca_b$U%*%cdpca_b$centers%*%t(cdpca_b$A), "F") +IComp[6,] <- c(pkg, met, rntm, ari, fit) + +met <- "DPCA" + +runtime <- system.time({ + dpca <- dispca(irisn, 2, tol = 1e-6, maxiter = 100, Rndstart = 100) +}) +rntm <- runtime[3] +fit <- norm(scale(irisn)%*%dpca$A%*%t(dpca$A), "F") +ari <- NA +IComp[7,] <- c(pkg, met, rntm, ari, fit) + +met <- "DFA" +runtime <- system.time({ + dfa<- drclust::disfa(irisn, 2, tol = 1e-6, maxiter = 100, Rndstart = 100) +}) +rntm <- runtime[3] +fit <- norm(scale(irisn)%*%dfa$A%*%t(dfa$A), "F") +ari <- NA +IComp[8,] <- c(pkg, met, rntm, ari, fit) + +met <- "DKM" +runtime <- system.time({ + dkm <- drclust::doublekm(irisn, 3, 2, Rndstart = 100) +}) +rntm <- runtime[3] +ari <- mrand(t(dkm$U)%*%Utrue) +fit <- norm(dkm$U%*%dkm$centers%*%t(dkm$V), "F") +IComp[9,] <- c(pkg, met, rntm, ari, fit) +IComp +IComp <- as.data.frame(IComp) +IComp[,3:5] <- apply(IComp[,3:5], 2, as.numeric) + +custom_colors <- c( + "DPCAKM (biplotbootGUI)" = "#e78ac3", + "DPCAKM (drclust)" = "#ffd92f", + "DKM (drclust)" = "#cab2d6", + "FKM (clustrd)" = "#66c2a5", + "FKM (drclust)" = "#fc8d62", + "RKM (clustrd)" = "#8da0cb", + "RKM (drclust)" = "#a6d854", + "DPCA (drclust)" = "#e5c494", + "DFA (drclust)" = "#b3b3b3" +) + +IComp <- IComp %>% + mutate(technique_label = paste0(technique, " (", package, ")")) + +IComp$technique_label <- factor(IComp$technique_label, + levels = IComp %>% + arrange(technique, package) %>% + pull(technique_label) %>% + unique()) + +metric_labels <- c(ARI = "ARI", fit = "Fit", runtime = "Runtime (s)") + +IComp_long <- IComp %>% + 
pivot_longer(cols = c(runtime, ARI, fit), names_to = "metric", values_to = "value") %>% + filter(!is.na(value)) %>% + mutate(metric = factor(metric, levels = names(metric_labels), labels = metric_labels)) + +ggplot(IComp_long, aes(x = technique_label, y = value, fill = technique_label)) + + geom_bar(stat = "identity", position = "dodge") + + facet_wrap(~ metric, scales = "free_y", ncol = 1) + + scale_fill_manual(values = custom_colors) + + theme_minimal(base_size = 14) + + theme( + axis.text.x = element_text(angle = 45, hjust = 1), + strip.text = element_text(face = "bold", size = 14) + ) + + labs(x = "Technique (library)", y = "Value", fill = "Technique") +## Simulation +# It takes much time, mainly due to the speed of clustrd +# The estimation of the models on the simulated datasets is parallelized + + +## ------------------------- 3 simulated datasets + + +# Simulations: 3 Scenarios: High, Medium and Low noise. +# Generated via function + +seeds <- sample(1:1e6, 100, replace = FALSE) + +High <- lapply(seeds, function(x) MixSampling(1000, c(0.5, 0.3, 0.2), 0.8, 0.01, x)) +Medium <- lapply(seeds, function(x) MixSampling(1000, c(0.5, 0.3, 0.2), 0.55, 0.01, x)) +Low <- lapply(seeds, function(x) MixSampling(1000, c(0.5, 0.3, 0.2), 0.3, 0.01, x)) + +names(High) <- paste0("High.", seq_along(High)) +names(Medium) <- paste0("Medium.", seq_along(Medium)) +names(Low) <- paste0("Low.", seq_along(Low)) + +Data <- c(High, Medium, Low) + +# parameters reported for each technique +cnames <- c("With. 
Var..", "library", "technique", "runtime", "fit", "ari", "f*-f", "||A*-A||^2") + +Noise <- c(rep("High", 100), rep("Medium", 100), rep("Low", 100)) + +X_sim <- lapply(Data, function(x) x$X) +Utrue_sim <- lapply(Data, function(x) x$U) + +require(doParallel) +cl <- makeCluster(6, outfile = "") # +registerDoParallel(cl) + +f <- function(i) { + library(clustrd) + library(drclust) + library(biplotbootGUI) + cat("Dataset nr.", i, "\n") + + # Atrue [Without Noise] + Atrue <- matrix(0, nrow = 6, ncol = 2) + Atrue[1:2,1:2] = diag(c(1,1)) + + # Xbar + Xbar_true <- solve(t(Utrue_sim[[i]])%*%Utrue_sim[[i]])%*%t(Utrue_sim[[i]])%*%X_sim[[i]] + + # Atrue with noise + Xs = scale(X_sim[[i]]) + su = colSums(Utrue_sim[[i]]) + XX = t(Xs) %*% (Utrue_sim[[i]] %*% diag(1/su) %*% t(Utrue_sim[[i]])) %*% Xs + ## + eigs = eigen(XX) + A = as.matrix(eigs$vectors[,1:2]) + + # Fit of the true model + fstar = norm(Utrue_sim[[i]] %*% solve(t(Utrue_sim[[i]])%*%Utrue_sim[[i]]) %*%t(Utrue_sim[[i]]) %*% scale(X_sim[[i]]) %*%A, "F") + ### RKM + + pkg <- "clustrd" + met <- "RKM" + + runtime <- system.time({ + rkm_clustrd <- clustrd::cluspca(X_sim[[i]], nclus = 3, ndim = 2, method = "RKM", nstart = 100) + }) + rntm <- runtime[3] + Urkm_rd <- bsmatrix(rkm_clustrd$cluster) + ari <- mrand(t(Urkm_rd)%*%Utrue_sim[[i]]) + fit <- norm(Urkm_rd %*% solve(t(Urkm_rd)%*%Urkm_rd) %*%t(Urkm_rd) %*% scale(X_sim[[i]]) %*%rkm_clustrd$attcoord, "F") + + diff_f <- fstar - fit + diff_A <- norm(A-rkm_clustrd$attcoord, "F") + RKM1 <- c(Noise[i], pkg, met, runtime[3], fit, ari, diff_f, diff_A) + + + pkg <- "drclust" + met <- "RKM" + + runtime <- system.time({ + rkm_drclust <- redkm(X_sim[[i]], K = 3, Q = 2, Rndstart = 100) + }) + rntm <- runtime[3] + ari <- mrand(t(rkm_drclust$U)%*%Utrue_sim[[i]]) + fit <- norm(rkm_drclust$U %*% solve(t(rkm_drclust$U)%*%rkm_drclust$U) %*%t(rkm_drclust$U) %*% scale(X_sim[[i]]) %*%rkm_drclust$A, "F") + diff_f <- fstar - fit + diff_A <- norm(A-rkm_drclust$A, "F") + RKM2 <- c(Noise[i], pkg, met, 
runtime[3], fit, ari, diff_f, diff_A)
+
+  ### FKM
+
+  pkg <- "clustrd"
+  met <- "FKM"
+
+  runtime <- system.time({
+    fkm_clustrd <- cluspca(X_sim[[i]], nclus = 3, ndim = 2, method = "FKM", nstart = 100)
+  })
+  rntm <- runtime[3]
+  Ufkm_rd <- bsmatrix(fkm_clustrd$cluster)
+  ari <- mrand(t(Ufkm_rd)%*%Utrue_sim[[i]])
+  fit <- norm(Ufkm_rd %*% solve(t(Ufkm_rd)%*%Ufkm_rd) %*%t(Ufkm_rd) %*% scale(X_sim[[i]]) %*%fkm_clustrd$attcoord, "F")
+  diff_f <- fstar - fit
+  diff_A <- norm(A-fkm_clustrd$attcoord, "F")
+  FKM1 <- c(Noise[i], pkg, met, runtime[3], fit, ari, diff_f, diff_A)
+
+  pkg <- "drclust"
+  met <- "FKM"
+
+  runtime <- system.time({
+    fkm_drclust <- factkm(X_sim[[i]], K = 3, Q = 2, Rndstart = 100)
+  })
+  rntm <- runtime[3]
+  ari <- mrand(t(fkm_drclust$U)%*%Utrue_sim[[i]])
+  fit <- norm(fkm_drclust$U %*% solve(t(fkm_drclust$U)%*%fkm_drclust$U) %*%t(fkm_drclust$U) %*% scale(X_sim[[i]]) %*%fkm_drclust$A, "F")
+  diff_f <- fstar - fit
+  diff_A <- norm(A-fkm_drclust$A, "F")
+  FKM2 <- c(Noise[i], pkg, met, runtime[3], fit, ari, diff_f, diff_A)
+
+  ### CDPCA
+
+  pkg <- "biplotbootGUI"
+  met <- "DPCAKM"
+  runtime <- system.time({
+    cdpca_bpbGUI <- CDpca(X_sim[[i]], P = 3, Q = 2, r = 100, maxit = 100, cdpcaplot = F)
+  })
+  rntm <- runtime[3]
+  ari <- mrand(t(cdpca_bpbGUI$U)%*%Utrue_sim[[i]])
+  fit <- norm(cdpca_bpbGUI$U %*% solve(t(cdpca_bpbGUI$U)%*%cdpca_bpbGUI$U) %*%t(cdpca_bpbGUI$U) %*% scale(X_sim[[i]]) %*%cdpca_bpbGUI$A, "F")
+  diff_f <- fstar-fit
+  diff_A <- norm(A-cdpca_bpbGUI$A, "F")
+  CDPCA1 <- c(Noise[i], pkg, met, runtime[3], fit, ari, diff_f, diff_A)
+
+  pkg <- "drclust"
+  met <- "DPCAKM"
+
+  runtime <- system.time({
+    cdpca_drclust <- dpcakm(X_sim[[i]], 3, 2, Rndstart = 100)
+  })
+  rntm <- runtime[3]
+  ari <- mrand(t(cdpca_drclust$U)%*%Utrue_sim[[i]])
+  fit <- norm(cdpca_drclust$U %*% solve(t(cdpca_drclust$U)%*%cdpca_drclust$U) %*%t(cdpca_drclust$U) %*% scale(X_sim[[i]]) %*%cdpca_drclust$A, "F")
+  diff_f <- fstar-fit
+  diff_A <-
norm(A-cdpca_drclust$A , "F") + CDPCA2 <- c(Noise[i], pkg, met, runtime[3], fit, ari, diff_f, diff_A) + + + ### DKM + + met <- "DKM" + + runtime <- system.time({ + dkm <- doublekm(X_sim[[i]], 3, 2, Rndstart = 100) + }) + rntm <- runtime[3] + ari <- mrand(t(dkm$U)%*%Utrue_sim[[i]]) + fit <- norm(dkm$U %*% solve(t(dkm$U)%*%dkm$U) %*%t(dkm$U) %*% scale(X_sim[[i]]) %*%dkm$V, "F") + diff_f <- fstar-fit + diff_A <- NA + DKM <- c(Noise[i], pkg, met, runtime[3], fit, ari, diff_f, diff_A) + + return(list(DKM = DKM, RKM1 = RKM1, RKM2 = RKM2, FKM1 = FKM1, FKM2 = FKM2, CDPCA1 = CDPCA1, CDPCA2 = CDPCA2, dkm = dkm, rkm_clustrd = rkm_clustrd, rkm_drclust = rkm_drclust, fkm_clustrd = fkm_clustrd, fkm_drclust = fkm_drclust, cdpca_bpbGUI = cdpca_bpbGUI, cdpca_drclust = cdpca_drclust)) +} + +# This takes ~ 6 hours +# models_out <- foreach(data = 1:300) %dopar% {f(data)} + +# The following line loads all the object produced by the script until here +load(url("https://figshare.com/ndownloader/files/58897297")) + +# stopCluster(cl) + +temp <-lapply(models_out, function(x) c(x$RKM1, x$RKM2, x$FKM1, x$FKM2, x$CDPCA1, x$CDPCA2, x$DKM)) +rkm1 <- lapply(temp, function(x) unlist(c((x[1:8])))) +rkm2 <- lapply(temp, function(x) unlist(c((x[9:16])))) +fkm1 <- lapply(temp, function(x) unlist(c((x[17:24])))) +fkm2 <- lapply(temp, function(x) unlist(c((x[25:32])))) +cdpca1 <- lapply(temp, function(x) unlist(c((x[33:40])))) +cdpca2 <- lapply(temp, function(x) unlist(c((x[41:48])))) +dkm <- lapply(temp, function(x) unlist(c((x[49:56])))) + + +RKM1 <- do.call(rbind, rkm1) +RKM2 <- do.call(rbind, rkm2) +FKM1 <- do.call(rbind, fkm1) +FKM2 <- do.call(rbind, fkm2) +CDPCA1 <- do.call(rbind, cdpca1) +CDPCA2 <- do.call(rbind, cdpca2) +DKM <- do.call(rbind, dkm) +colnames(RKM1) <- cnames +CDPCA1[,3] <- "DPCAKM" +CDPCA2[,3] <- "DPCAKM" + +RKM1 <- as.data.frame(RKM1) +RKM2 <- as.data.frame(RKM2) +FKM1 <- as.data.frame(FKM1) +FKM2 <- as.data.frame(FKM2) +CDPCA1 <- as.data.frame(CDPCA1) +CDPCA2 <- 
as.data.frame(CDPCA2) +DKM <- as.data.frame(DKM) + +RKM1[,c(4:8)] <- apply(RKM1[, c(4:8)], 2, as.numeric) +RKM2[,c(4:8)] <- apply(RKM2[, c(4:8)], 2, as.numeric) +FKM1[,c(4:8)] <- apply(FKM1[, c(4:8)], 2, as.numeric) +FKM2[,c(4:8)] <- apply(FKM2[, c(4:8)], 2, as.numeric) +CDPCA1[,c(4:8)] <- apply(CDPCA1[, c(4:8)], 2, as.numeric) +CDPCA2[,c(4:8)] <- apply(CDPCA2[, c(4:8)], 2, as.numeric) +DKM[,c(4:8)] <- apply(DKM[, c(4:8)], 2, as.numeric) + +unique_vals <- unique(RKM1[, 1]) + +rkm1 <- RKM1[1:3,] +rkm2 <- RKM2[1:3,] +fkm1 <- FKM1[1:3,] +fkm2 <- FKM2[1:3,] +dpcakm1 <- CDPCA1[1:3,] +dpcakm2 <- CDPCA2[1:3,] +dkm <- DKM[1:3,] + +for (i in 1:length(unique_vals)) { + val <- unique_vals[i] + idx <- which(RKM1[, 1] == val) + med_vals <- apply(RKM1[idx, 4:8], 2, median) + rkm1[i, 4:8] <- med_vals + rkm1[i,1] <- val + + med_vals <- apply(RKM2[idx, 4:8], 2, median) + rkm2[i, 4:8] <- med_vals + rkm2[i,1] <- val + + med_vals <- apply(FKM1[idx, 4:8], 2, median) + fkm1[i, 4:8] <- med_vals + fkm1[i,1] <- val + + med_vals <- apply(FKM2[idx, 4:8], 2, median) + fkm2[i, 4:8] <- med_vals + fkm2[i,1] <- val + + med_vals <- apply(CDPCA1[idx, 4:8], 2, median) + dpcakm1[i, 4:8] <- med_vals + dpcakm1[i,1] <- val + + med_vals <- apply(CDPCA2[idx, 4:8], 2, median) + dpcakm2[i, 4:8] <- med_vals + dpcakm2[i,1] <- val + + med_vals <- apply(DKM[idx, 4:7], 2, median) + dkm[i, 4:7] <- med_vals + dkm[i,1] <- val + dkm[,8] <- NA + +} +DKM[,2] <- "drclust" +colnames(RKM1) <- cnames +colnames(RKM2) <- cnames +colnames(FKM1) <- cnames +colnames(FKM2) <- cnames +colnames(CDPCA1) <- cnames +colnames(CDPCA2) <- cnames +colnames(DKM) <- cnames + + +# ----------- Boxplots +df_list <- list(RKM1, RKM2, FKM1, FKM2, CDPCA1, CDPCA2, DKM) + +### + +long_data <- purrr::map_df(df_list, function(df) { + colnames(df)[1] <- "Block" + method_label <- paste0(df$technique[1], "\n(", df$library[1], ")") + + numeric_cols <- names(df)[sapply(df, is.numeric)] + numeric_cols <- setdiff(numeric_cols, "Block") + + df %>% + 
pivot_longer(cols = dplyr::all_of(numeric_cols), + names_to = "Measure", values_to = "Value") %>% + mutate(Method = method_label) +}) + +long_data$Block <- factor(long_data$Block, levels = c("High", "Medium", "Low")) + +long_data_clean <- long_data %>% + group_by(Method, Measure, Block) %>% + filter(any(!is.na(Value))) %>% + ungroup() + +unique_measures <- unique(long_data_clean$Measure) + +make_measure_plot <- function(df, misura) { + ggplot(dplyr::filter(df, Measure == misura), + aes(x = Method, y = Value, fill = Method)) + + geom_boxplot(width = 0.5) + + facet_wrap(~ Block, nrow = 1, scales = "free_x", strip.position = "bottom") + + coord_flip(clip = "off") + + theme_minimal() + + labs(title = NULL, x = NULL, y = NULL) + + theme( + strip.placement = "outside", + strip.background = element_blank(), + strip.text.x = element_text(size = 7, face = "bold"), + axis.text.x = element_text(size = 8), + axis.text.y = element_text(size = 9, lineheight = 1.2), + legend.position = "none", + panel.spacing.x = unit(1.2, "lines"), + plot.margin = margin(8, 8, 8, 22) + ) +} + +measures_no_runtime <- setdiff(unique_measures, "runtime") +plots <- purrr::map(measures_no_runtime, ~ make_measure_plot(long_data_clean, .x)) +names(plots) <- measures_no_runtime + +runtime_data <- dplyr::filter(long_data_clean, Measure == "runtime") +runtime_rkm <- dplyr::filter(runtime_data, str_detect(Method, "^RKM\\n")) +runtime_others <- dplyr::filter(runtime_data, !str_detect(Method, "^RKM\\n")) + +make_runtime_plot <- function(df_runtime) { + ggplot(df_runtime, aes(x = Method, y = Value, fill = Method)) + + geom_boxplot(width = 0.5) + + facet_wrap(~ Block, nrow = 1, scales = "free_x", strip.position = "bottom") + + coord_flip(clip = "off") + + theme_minimal() + + labs(title = NULL, x = NULL, y = NULL) + + theme( + axis.text.x = element_text(size = 8), + axis.text.y = element_text(size = 9, lineheight = 1.2), + strip.text.x = element_text(size = 8, face = "bold"), + legend.position = "none", + 
panel.spacing.x = unit(1.2, "lines"), + plot.margin = margin(8, 8, 8, 22) + ) +} + +# Figure 3-8 +p_runtime_rkm <- make_runtime_plot(runtime_rkm) +p_runtime_others <- make_runtime_plot(runtime_others) +plots +p_runtime_rkm +p_runtime_others +### + +# ---------------------------------------------- macro dataset +# The examples are from the section "Application on Real Data" +# Macro dataset (Vichi & Kiers, 2001) +data("macro") +macro <- as.matrix(macro) +irisn <- as.matrix(iris[,-5]) + +kaiserCrit(macro) + +# relaxed pseudoF +apseudoF(irisn, maxK=10, tol = 0.05, model = 2, Q = 3) + +# double pseudoF (Rocci & Vichi, 2008) +dpseudoF(irisn, maxK = 10, maxQ = 3) + +# double k-means +dkm <- doublekm(irisn,4,3, print=1) + +# reduced k-means +rkm <- redkm(irisn, 4, 3, print=1) + +# factorial k-means +fkm <- factkm(irisn, 4, 3, print=1, rot = 1) + +# clustering with disjoint PCA +cdpca <- dpcakm(irisn, 3, 3, print=1) + +# disjoint PCA +# Require GDP and LI variables to lie in the same cluster +out <- dispca(irisn, 3, print = 1, constr = c(1,1,0,0,0,0)) + +# disjoint FA +out <- disfa(irisn, 3, print=1) + +# Kaiser criterion for the choice of Q, the number of latent components +kaiserCrit(macro) + +# relaxed pseudoF +apseudoF(macro, maxK=10, tol = 0.05, model = 2, Q = 3) + +# double pseudoF (Rocci & Vichi, 2008) +dpseudoF(macro, maxK = 10, maxQ = 5) + +# dendrogram of the centroids obtained by FKM +out <- factkm(macro, 10, 3) +centree(out) + +# silhouette for the partition obtained via CDPCA +out <- dpcakm(macro, 4, 3) +silhouette(macro, out) + +# heatmap of the observations +# (sorted by distance(centroid, GrandMean), sorted within each cluster by distance(unit, centroid)) +# projected on the subspace +out <- doublekm(macro,5,3) +heatm(macro, out) + + +#-------------- Biplot +library(ggplot2) +library(grid) +library(dplyr) +library(drclust) + +# Prepare data and perform a model in drclust +out <- factkm(macro, K = 2, Q = 2, Rndstart = 100) + +# Prepare data +Y <- 
as.data.frame(macro%*%out$A); colnames(Y) <- c("Dim1", "Dim2") +Y$cluster <- as.factor(cluster(out$U)) + +arrow_scale <- 5 +A <- as.data.frame(out$A)[, 1:2] * arrow_scale +colnames(A) <- c("PC1", "PC2") +A$var <- colnames(macro) + +# Axis limits +lims <- range(c(Y$Dim1, Y$Dim2, A$PC1, A$PC2)) * 1.2 + +# Circle +circle <- data.frame(x = cos(seq(0, 2*pi, length.out = 200)) * arrow_scale, + y = sin(seq(0, 2*pi, length.out = 200)) * arrow_scale) + +ggplot(Y, aes(x = Dim1, y = Dim2, color = cluster)) + + geom_point(size = 2) + + geom_segment(data = A, aes(x = 0, y = 0, xend = PC1, yend = PC2), + arrow = arrow(length = unit(0.2, "cm")), inherit.aes = FALSE, color = "gray40") + + geom_text(data = A, aes(x = PC1, y = PC2, label = var), inherit.aes = FALSE, + hjust = 1.1, vjust = 1.1, size = 3) + + geom_path(data = circle, aes(x = x, y = y), inherit.aes = FALSE, + linetype = "dashed", color = "gray70") + + coord_fixed(xlim = lims, ylim = lims) + + labs(x = "Component 1", y = "Component 2", title = "Biplot") + + theme_minimal() + + + +#-------------- Parallel Coordinates Plot + +library(GGally) + +out <- factkm(macro, K = 2, Q = 2, Rndstart = 100) +ggparcoord(data = Y, + columns = 1:(ncol(Y)-1), + groupColumn = "cluster", + scale = "uniminmax", + showPoints = FALSE, + alphaLines = 0.5) + + theme_minimal() + + labs(title = "Parallel Coordinate Plot", + x = "Variables", y = "Normalized Value") + diff --git a/_articles/RJ-2025-046/prunila-vichi.bib b/_articles/RJ-2025-046/prunila-vichi.bib new file mode 100644 index 0000000000..2a6cb21267 --- /dev/null +++ b/_articles/RJ-2025-046/prunila-vichi.bib @@ -0,0 +1,386 @@ +@Manual{R, + author = {{R Core Team}}, + title = {{R}: A Language and Environment for Statistical Computing}, + organization = {R Foundation for Statistical Computing}, + address = {Vienna, Austria}, + year = {2015}, + url = {http://www.R-project.org/} +} +@article{desarbo1990, + author = {DeSarbo, W. S. and Jedidi, K. and Cool, K. 
and Schendel, D.},
+  title = {Simultaneous multidimensional unfolding and cluster analysis: An investigation of strategic groups},
+  journal = {Marketing Letters},
+  volume = {2},
+  year = {1990},
+  pages = {129--146},
+  url = {https://doi.org/10.1007/BF00436033}
+}
+@article{knight1978,
+  author = {Knight, Anne},
+  title = {Common Factor Analysis: Some Recent Developments in Theory and Practice},
+  journal = {Journal of the Royal Statistical Society. Series D (The Statistician)},
+  volume = {27},
+  number = {1},
+  year = {1978},
+  pages = {27--42},
+  url = {https://doi.org/10.2307/2988250}
+}
+@incollection{desoete1994,
+  author = {De Soete, G. and Carroll, J. D.},
+  title = {K-means clustering in a low-dimensional {Euclidean} space},
+  booktitle = {New Approaches in Classification and Data Analysis},
+  editor = {Diday, E. and Lechevallier, Y. and Schader, M. and Bertrand, P. and Burtschy, B.},
+  publisher = {Springer},
+  address = {Berlin, Heidelberg},
+  year = {1994},
+  chapter = {24},
+  url = {https://doi.org/10.1007/978-3-642-51175-2_24}
+}
+@incollection{maurizio2001a,
+  author = {Vichi, M.},
+  title = {Double k-means Clustering for Simultaneous Classification of Objects and Variables},
+  booktitle = {Advances in Classification and Data Analysis},
+  editor = {Borra, S. and Rocci, R. and Vichi, M. and Schader, M.},
+  publisher = {Springer},
+  address = {Berlin, Heidelberg},
+  year = {2001},
+  chapter = {6},
+  url = {https://doi.org/10.1007/978-3-642-59471-7_6}
+}
+@inproceedings{mcqueen1967,
+  author = {MacQueen, J. B.},
+  title = {Some Methods for Classification and Analysis of Multivariate Observations},
+  booktitle = {Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability},
+  volume = {1},
+  year = {1967},
+  pages = {281--297},
+  publisher = {University of California Press},
+  url = {https://www.cs.cmu.edu/~bhiksha/courses/mlsp.fall2010/class14/macqueen.pdf}
+}
+@article{vichi2001a,
+  author = {Vichi, Maurizio and Kiers, Henk A.
L.}, + title = {Factorial k-means analysis for two-way data}, + journal = {Computational Statistics \& Data Analysis}, + volume = {37}, + number = {1}, + year = {2001}, + pages = {49--64}, + issn = {0167-9473}, + url = {https://doi.org/10.1016/S0167-9473(00)00064-5} +} +@article{vichi2009, + author = {Vichi, Maurizio and Saporta, Gilbert}, + title = {Clustering and disjoint principal component analysis}, + journal = {Computational Statistics \& Data Analysis}, + volume = {53}, + number = {8}, + year = {2009}, + pages = {3194--3208}, + issn = {0167-9473}, + url = {https://doi.org/10.1016/j.csda.2008.05.028} +} +@article{hotelling1933, + author = {Hotelling, H.}, + title = {Analysis of a complex of statistical variables into principal components}, + journal = {Journal of Educational Psychology}, + volume = {24}, + year = {1933}, + pages = {417--441, and 498--520}, + url = {https://doi.org/10.1037/h0071325} +} +@article{vichi2017, + author = {Vichi, M.}, + title = {Disjoint factor analysis with cross-loadings}, + journal = {Advances in Data Analysis and Classification}, + volume = {11}, + number = {4}, + year = {2017}, + pages = {563--591}, + url = {https://doi.org/10.1007/s11634-016-0263-9} +} +@article{rocci2008, + author = {Rocci, R. and Vichi, M.}, + title = {Two-mode multi-partitioning}, + journal = {Computational Statistics \& Data Analysis}, + volume = {52}, + number = {4}, + year = {2008}, + pages = {1984--2003}, + url = {https://doi.org/10.1016/j.csda.2007.06.025} +} +@article{kaiser1960, + author = {Kaiser, Henry F.}, + title = {The Application of Electronic Computers to Factor Analysis}, + journal = {Educational and Psychological Measurement}, + volume = {20}, + number = {1}, + month = {April}, + year = {1960}, + pages = {141--151}, + url = {https://doi.org/10.1177/001316446002000116} +} +@article{eddelbuettel2011, + author = {Eddelbuettel, D. 
and Francois, R.}, + title = {Rcpp: Seamless {R} and {C++} Integration}, + journal = {Journal of Statistical Software}, + volume = {40}, + number = {8}, + year = {2011}, + pages = {1--18}, + url = {https://doi.org/10.18637/jss.v040.i08} +} +@article{eddelbuettel2014, + author = {Eddelbuettel, D. and Sanderson, C.}, + title = {RcppArmadillo: Accelerating {R} with high-performance {C++} linear algebra}, + journal = {Computational Statistics and Data Analysis}, + volume = {71}, + year = {2014}, + pages = {1054--1063}, + url = {https://doi.org/10.1016/j.csda.2013.02.005} +} +@manual{maechler2023, + author = {Maechler, M. and Rousseeuw, P. and Struyf, A. and Hubert, M. and Hornik, K.}, + title = {cluster: Cluster Analysis Basics and Extensions}, + organization = {R package version 2.1.6}, + year = {2023}, + url = {https://CRAN.R-project.org/package=cluster} +} +@manual{kassambara2022, + author = {Kassambara, A.}, + title = {factoextra: Extract and Visualize the Results of Multivariate Data Analyses}, + organization = {R package version 1.0.7}, + year = {2022}, + url = {https://cran.r-project.org/package=factoextra} +} +@article{cronbach1951, + author = {Cronbach, Lee J.}, + title = {Coefficient alpha and the internal structure of tests}, + journal = {Psychometrika}, + volume = {16}, + number = {3}, + year = {1951}, + pages = {297--334}, + url = {https://doi.org/10.1007/BF02310555} +} +@article{rand1971, + author = {Rand, W. M.}, + title = {Objective criteria for the evaluation of clustering methods}, + journal = {Journal of the American Statistical Association}, + volume = {66}, + number = {336}, + year = {1971}, + pages = {846--850}, + url = {https://doi.org/10.2307/2284239} +} +@manual{kolde2019, + author = {Kolde, R.}, + title = {pheatmap: Pretty Heatmaps}, + organization = {R package 1.0.12}, + year = {2019}, + url = {https://cran.r-project.org/package=pheatmap} +} +@article{kaiser1958, + author = {Kaiser, H. 
F.}, + title = {The varimax criterion for analytic rotation in factor analysis}, + journal = {Psychometrika}, + volume = {23}, + year = {1958}, + pages = {187--200}, + url = {https://doi.org/10.1007/BF02289233} +} +@article{dray2007, + author = {Dray, S. and Dufour, A.-B.}, + title = {The ade4 Package: Implementing the Duality Diagram for Ecologists}, + journal = {Journal of Statistical Software}, + volume = {22}, + number = {4}, + year = {2007}, + pages = {1--20}, + url = {https://doi.org/10.18637/jss.v022.i04} +} +@article{le2008, + author = {Lê, S. and Josse, J. and Husson, F.}, + title = {FactoMineR: An {R} Package for Multivariate Analysis}, + journal = {Journal of Statistical Software}, + volume = {25}, + number = {1}, + year = {2008}, + pages = {1--18}, + url = {https://doi.org/10.18637/jss.v025.i01} +} +@book{lebart2000, + author = {Lebart, L. and Morineau, A. and Piron, M.}, + title = {Statistique Exploratoire Multidimensionnelle}, + year = {2000}, + url = {https://horizon.documentation.ird.fr/exl-doc/pleins_textes/2022-03/010029478.pdf} +} +@article{pardo2007, + author = {Pardo, C. E. and Del Campo, P. C.}, + title = {Combination of Factorial Methods and Cluster Analysis in {R}: The Package FactoClass}, + journal = {Revista Colombiana de Estadística}, + volume = {30}, + number = {2}, + year = {2007}, + pages = {231--245}, + url = {https://revistas.unal.edu.co/index.php/estad/article/view/29478} +} +@article{charrad2014, + author = {Charrad, M. and Ghazzali, N. and Boiteau, V. and Niknafs, A.}, + title = {NbClust: An {R} Package for Determining the Relevant Number of Clusters in a Data Set}, + journal = {Journal of Statistical Software}, + volume = {61}, + number = {6}, + year = {2014}, + pages = {1--36}, + url = {https://doi.org/10.18637/jss.v061.i06} +} +@misc{nietolibreiro2023, + author = {Nieto Librero, A. B. 
and Freitas, A.}, + title = {biplotbootGUI: Bootstrap on Classical Biplots and Clustering Disjoint Biplot}, + year = {2023}, + url = {https://cran.r-project.org/web/packages/biplotbootGUI/index.html} +} +@article{yamamoto2014, + author = {Yamamoto, M. and Hwang, H.}, + title = {A General Formulation of Cluster Analysis with Dimension Reduction and Subspace Separation}, + journal = {Behaviormetrika}, + volume = {41}, + year = {2014}, + pages = {115--129}, + url = {https://doi.org/10.2333/bhmk.41.115} +} +@article{cattell1965, + author = {Cattell, R. B.}, + title = {Factor Analysis: An Introduction to Essentials I. The Purpose and Underlying Models}, + journal = {Biometrics}, + volume = {21}, + number = {1}, + year = {1965}, + pages = {190--215}, + url = {https://doi.org/10.2307/2528364} +} +@article{lawley1962, + author = {Lawley, D. N. and Maxwell, A. E.}, + title = {Factor Analysis as a Statistical Method}, + journal = {Journal of the Royal Statistical Society. Series D (The Statistician)}, + volume = {12}, + number = {3}, + year = {1962}, + pages = {209--229}, + url = {https://doi.org/10.2307/2986915} +} +@article{pearson1901, + author = {Pearson, K.}, + title = {On lines and planes of closest fit to systems of points in space}, + journal = {The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science}, + volume = {2}, + number = {11}, + year = {1901}, + pages = {559--572}, + url = {https://doi.org/10.1080/14786440109462720} +} +@article{zou2006, + author = {Zou, H. and Hastie, T. and Tibshirani, R.}, + title = {Sparse Principal Component Analysis}, + journal = {Journal of Computational and Graphical Statistics}, + volume = {15}, + number = {2}, + year = {2006}, + pages = {265--286}, + doi = {https://doi.org/10.1198/106186006X113430} +} +@article{calinski1974, + author = {Caliński, T. 
and Harabasz, J.}, + title = {A dendrite method for cluster analysis}, + journal = {Communications in Statistics}, + volume = {3}, + number = {1}, + year = {1974}, + pages = {1--27}, + url = {https://doi.org/10.1080/03610927408827101} +} +@article{markos2019, + author = {Markos, A. and D'Enza, A. I. and van de Velden, M.}, + title = {Beyond Tandem Analysis: Joint Dimension Reduction and Clustering in {R}}, + journal = {Journal of Statistical Software}, + volume = {91}, + number = {10}, + year = {2019}, + pages = {1--24}, + url = {https://doi.org/10.18637/jss.v091.i10} +} +@article{ROUSSEEUW198753, + author = {Peter J. Rousseeuw}, + title = {Silhouettes: A graphical aid to the interpretation and validation of cluster analysis}, + journal = {Journal of Computational and Applied Mathematics}, + volume = {20}, + pages = {53--65}, + year = {1987}, + issn = {0377-0427}, + url = {https://doi.org/10.1016/0377-0427(87)90125-7} +} +@article{ward1963, + author = {Ward, J. H.}, + title = {Hierarchical Grouping to Optimize an Objective Function}, + journal = {Journal of the American Statistical Association}, + volume = {58}, + number = {301}, + year = {1963}, + pages = {236--244}, + url = {https://doi.org/10.1080/01621459.1963.10500845} +} +@article{timmerman2010, + author = {Marieke E. Timmerman and Eva Ceulemans and Henk A.L. Kiers and Maurizio Vichi}, + title = {Factorial and reduced K-means reconsidered}, + journal = {Computational Statistics \& Data Analysis}, + volume = {54}, + number = {7}, + pages = {1858--1871}, + year = {2010}, + issn = {0167-9473}, + url = {https://doi.org/10.1016/j.csda.2010.02.009} +} +@misc{revelle2017, + author = "{W. R. Revelle}", + title = {psych: Procedures for Personality and Psychological Research}, + year = {2017}, + url = {https://cran.r-project.org/web/packages/psych/index.html} +} +@article{VichiVicariKiers, + author = {Vichi, M. and Vicari, D. and Kiers, Henk A. 
L.}, + title = {Clustering and dimension reduction for mixed variables}, + journal = {Behaviormetrika}, + year = {2019}, + pages = {243--269}, + url = {https://doi.org/10.1007/s41237-018-0068-6} +} +@manual{ggplot2, + author = {Wickham, H. and Chang, W. and Henry, L. and Pedersen, T. L. and Takahashi, K. and Wilke, C. G. and Yutani, H. and Dunnington, D.}, + title = {ggplot2: Elegant Graphics for Data Analysis}, + organization = {R package version 3.4.4}, + year = {2024}, + url = {https://CRAN.R-project.org/package=ggplot2} +} +@manual{dplyr, + author = {Wickham, H. and François, R. and Henry, L. and Müller, K.}, + title = {dplyr: A Grammar of Data Manipulation}, + organization = {R package version 1.1.4}, + year = {2023}, + url = {https://CRAN.R-project.org/package=dplyr} +} +@manual{ggally, + author = {Schloerke, B. and Cook, D. and Hofmann, H. and Wickham, H. and Garnier, S. and Morrissey, K. and Elberg, A.}, + title = {GGally: Extension to 'ggplot2'}, + organization = {R package version 2.1.2}, + year = {2024}, + url = {https://CRAN.R-project.org/package=GGally} +} +@article{HubertArabie, + author = {Hubert, L. and Arabie, P.}, + title = {Comparing partitions}, + journal = {Journal of Classification}, + volume = {2}, + number = {1}, + pages = {193--218}, + year = {1985}, + url = {https://doi.org/10.1007/BF01908075} +} diff --git a/_articles/RJ-2025-046/prunila-vichi.tex b/_articles/RJ-2025-046/prunila-vichi.tex new file mode 100644 index 0000000000..d330e53cf5 --- /dev/null +++ b/_articles/RJ-2025-046/prunila-vichi.tex @@ -0,0 +1,1000 @@ +% !TeX root = RJwrapper.tex +\title{drclust: An R Package for Simultaneous Clustering and Dimensionality Reduction} +\author{by Ionel Prunila and Maurizio Vichi} +\maketitle +%\begin{abstract} +\begin{abstract} + The primary objective of simultaneous methodologies for clustering and variable reduction is to identify both the optimal partition of units and the optimal subspace of variables, all at once. 
The optimality is typically determined using least squares or maximum likelihood estimation methods. These simultaneous techniques are particularly useful when working with Big Data, where the reduction (synthesis) is essential for both units and variables. Furthermore, a secondary objective of reducing variables through a subspace is to enhance the interpretability of the latent variables identified by the subspace using specific methodologies. + The drclust package implements double K-means (KM), reduced KM, and factorial KM to address the primary objective. KM with disjoint principal components addresses both the primary and secondary objectives, while disjoint principal component analysis and disjoint factor analysis address the latter, producing the sparsest loading matrix. + The models are implemented in C++ for faster execution, processing large data matrices in a reasonable amount of time. +\end{abstract} +%\end{abstract} +% \section{Introduction} +\section{Introduction}\label{Introduction} +Cluster analysis is the process of identifying homogeneous groups of units in the data, such that units within a cluster exhibit a low degree of dissimilarity to each other, whereas units in different clusters exhibit a high degree of dissimilarity. +When dealing with large or extremely large data matrices, often referred to as Big Data, the task of assessing these dissimilarities becomes computationally intensive due to the sheer volume of units and variables involved. To manage this vast amount of information, it is essential to employ statistical techniques that synthesize and highlight the most significant aspects of the data. Typically, this involves dimensionality reduction for both units and variables to efficiently summarize the data.
+ +While cluster analysis synthesizes information across the rows of the data matrix, variable reduction operates on the columns, aiming to summarize the features and, ideally, facilitate their interpretation. This key process involves extracting a subspace from the full space spanned by the manifest variables, while maintaining the principal informative content. The process allows for the synthesis of common information mainly among subsets of manifest variables, which represent concepts not directly observable. As a result, subspace-based variable reduction identifies a few uncorrelated latent variables that mainly capture common relationships within these subsets. When using techniques like Factor Analysis (FA) or Principal Component Analysis (PCA) for this purpose, interpreting the resulting factors or components can be challenging, particularly when variables significantly load onto multiple factors, a situation known as \textit{cross-loading}. Therefore, a simpler structure in the loading matrix, focusing on the primary relationship between each variable and its related factor, becomes desirable for clarity and ease of interpretation. Furthermore, the latent variables derived from PCA or FA do not provide a unique solution. An equivalent model fit can be achieved by applying an orthogonal rotation to the component axes. This non-uniqueness is often exploited in practice through Varimax rotation, which is designed to improve the interpretability of the latent variables without affecting the fit of the analysis. The rotation promotes a simpler structure in the loading matrix; however, rotations do not always ensure enhanced interpretability.
An alternative approach has been proposed by \cite{vichi2009} and \cite{vichi2017}, with Disjoint Principal Component Analysis (DPCA) and Disjoint FA (DFA), +which construct each component/factor from a distinct subset of manifest variables rather than from all available variables, while still optimizing the same criteria as PCA and FA, respectively. + +It is important to note that data matrix reduction for both rows and columns is often performed without specialized methodologies by employing a ``tandem analysis''. This involves sequentially applying two methods, such as using PCA or FA for variable reduction, followed by Cluster Analysis using KM on the resulting factors. Alternatively, one could start with Cluster Analysis and then proceed to variable reduction. +The outcomes of these two tandem analyses differ since each approach optimizes distinct objective functions, one after the other. For instance, when PCA is applied first, the components maximize the total variance of the manifest variables. However, if the manifest variables include high-variance variables that lack a clustering structure, these will be included in the components, even though they are not necessary for KM, which focuses on explaining only the variance between clusters. As a result, sequentially optimizing two different objectives may lead to sub-optimal solutions. +In contrast, when combining KM with PCA or FA in a simultaneous approach, a single integrated objective function is utilized. This function aims to optimize both the clustering partition and the subspace simultaneously. The optimization is typically carried out using an Alternating Least Squares (ALS) algorithm, which updates the partition for the current subspace in one step and the subspace for the current partition in the next. This iterative process ensures convergence to a solution that represents at least a local minimum of the integrated objective function.
+In comparison, tandem analysis, which follows a sequential approach (e.g., PCA followed by KM), does not guarantee joint optimization. One potential limitation of this sequential method is that the initial optimization through PCA may obscure information relevant for the subsequent step of Cluster Analysis, or emphasize irrelevant patterns, ultimately leading to sub-optimal solutions, as noted by \cite{desarbo1990}. Indeed, the simultaneous strategy has been shown to be effective in various studies, e.g., \cite{desoete1994}, \cite{vichi2001a}, \cite{maurizio2001a}, \cite{vichi2009}, \cite{rocci2008}, \cite{timmerman2010}, \cite{yamamoto2014}. + +To broaden access to these techniques, software implementations are needed. Within the R environment \citep{R}, several libraries are available for dimensionality reduction. The plain versions of KM, PCA, and FA are available in the built-in package stats, namely \code{kmeans}, \code{princomp}, and \code{factanal}. Furthermore, some packages go beyond the plain estimation and output of such algorithms. One of the richest libraries in R is \CRANpkg{psych} \citep{revelle2017}, which provides functions to easily simulate data according to different schemes, testing routines, the calculation of various estimates, and multiple estimation methods. \CRANpkg{ade4} \citep{dray2007} supports dimensionality reduction in the presence of different types of variables, along with many graphical tools. The \CRANpkg{FactoMineR} \citep{le2008} package allows for unit clustering and the extraction of latent variables, also in the presence of mixed variables. \CRANpkg{FactoClass} \citep{pardo2007} implements functions for PCA and Correspondence Analysis (CA), as well as clustering, including the tandem approach.
\CRANpkg{factoextra} \citep{kassambara2022}, in turn, provides visualization of the results, aiding their assessment in the choice of the number of latent variables with elegant dendrograms, scree plots, and more. More focused on the choice of the number of clusters is \CRANpkg{NbClust} \citep{charrad2014}, which offers 30 indices for determining the number of clusters and proposes the best method by trying not only different numbers of groups but also different distance measures and clustering methods, going beyond the partitioning ones. + +To the authors' knowledge, two packages are more closely related to the library presented here, implementing a subset of the techniques proposed within \CRANpkg{drclust}. \CRANpkg{clustrd} \citep{markos2019} implements simultaneous methods of clustering and dimensionality reduction. Besides offering functions for continuous data, it also allows for categorical (or mixed) variables. Moreover, at least for the continuous case, its implementation is aligned with the objective function proposed by \citet{yamamoto2014}, of which reduced KM (RKM) and factorial KM (FKM) are special cases obtained through a tuning parameter. + +Finally, there is \CRANpkg{biplotbootGUI} +\citep{nietolibreiro2023}, offering a GUI with interactive graphical tools that aid in the choice of the number of components and clusters. Furthermore, it implements KM with disjoint PCA (DPCA), as described in \citet{vichi2009}, and proposes an optimization algorithm for the choice of the initial starting point from which the estimation of the parameters begins. + +Like \CRANpkg{clustrd}, the \CRANpkg{drclust} package provides implementations of FKM and RKM. However, while \CRANpkg{clustrd} also supports categorical and mixed-type variables, our implementation currently handles only continuous variables.
That said, appropriate pre-processing of categorical variables, as suggested in \citet{VichiVicariKiers}, can make them compatible with the proposed methods. In essence, all qualitative variables should be dummy-encoded. +In terms of performance, \CRANpkg{drclust} offers significantly faster execution. Moreover, regarding FKM, our proposal demonstrates superior results in both empirical applications and simulations, in terms of model fit and the Adjusted Rand Index (ARI). +Another alternative, \CRANpkg{biplotbootGUI}, implements KM with DPCA and includes built-in plotting functions and an SDP-based initialization of parameters. However, our implementation remains considerably faster and allows users to specify which variables should be grouped together within the same (or different) principal components. This capability enables a partially or fully confirmatory approach to variable reduction. +Beyond speed and the confirmatory option, \CRANpkg{drclust} offers three methods not currently available in other \code{R} packages: DPCA and DFA, both designed for pure dimensionality reduction, and double KM (DKM), which performs simultaneous clustering and variable reduction via KM. All methods are implemented in C++ for computational efficiency. Table~\ref{tab:stat_models} summarizes the similarities and differences between \code{drclust} and existing alternatives. + +The package presented in this work aims to facilitate access to, and the usability of, techniques that fall into two main, overlapping branches. To this end, some statistical background is first recalled. + +\section{Notation and theoretical background} +The main pillars of \CRANpkg{drclust} fall into two main categories: dimensionality reduction and (partitioning) cluster analysis. The former may be carried out individually or blended with the latter.
Because both rely on the language of linear algebra, Table \ref{tab:notation} contains, for the convenience of the reader, the mathematical notation needed in this context. Then, some theoretical background is reported. + +\begin{table}[htbp] + \centering + \begin{tabular}{p{2cm} p{12cm}} + \toprule + Symbol & Description \\ + \toprule + \textit{n}, \textit{J}, \textit{K}, \textit{Q} & number of: units, manifest variables, unit-clusters, latent factors \vspace{0.10cm}\\ + $\mathbf{X}$ & $n \times J$ data matrix, where the generic element $x_{ij}$ is the real observation + on the \textit{i}-th unit for the \textit{j}-th variable \vspace{0.10cm}\\ + $\mathbf{x}_i$ & $J \times 1$ vector representing the generic row of $\mathbf{X}$ \vspace{0.10cm}\\ + $\mathbf{U}$ & $n \times K$ unit-cluster membership matrix, binary and row stochastic, with $u_{ik}$ being the generic element \vspace{0.10cm}\\ + $\mathbf{V}$ & $J \times Q$ variable-cluster membership matrix, binary and row stochastic, with $v_{jq}$ as the generic element + \vspace{0.10cm}\\ + $\mathbf{B}$ & $J \times J$ variable-weighting diagonal matrix \vspace{0.10cm}\\ + $\mathbf{Y}$ & $n \times Q$ component/factor score matrix defined on the reduced subspace + \vspace{0.10cm}\\ + $\mathbf{y}_i$ & $Q \times 1$ vector representing the generic row of $\mathbf{Y}$ \vspace{0.10cm}\\ + $\mathbf{A}$ & $J \times Q$ variables-to-factors (``plain'') loading matrix \vspace{0.10cm}\\ + $\mathbf{C}^+$ & Moore-Penrose pseudo-inverse of a matrix $\mathbf{C}$, $\mathbf{C}^+ = (\mathbf{C'C})^{-1}\mathbf{C'}$ \vspace{0.10cm}\\ $\bar{\mathbf{X}}$ & $K \times J$ centroid matrix in the original feature space, i.e., $\bar{\mathbf{X}} = \mathbf{U}^{+} \mathbf{X}$ \vspace{0.10cm}\\ + $\bar{\mathbf{Y}}$ & $K \times Q$ centroid matrix projected into the reduced subspace, i.e., $\bar{\mathbf{Y}} = \bar{\mathbf{X}}\mathbf{A}$ \vspace{0.10cm}\\ + $\mathbf{H}_{\mathbf{C}}$ & Projector $\mathbf{H}_\mathbf{C} = \mathbf{C}(\mathbf{C}'\mathbf{C})^{-1}\mathbf{C}'$ onto the subspace spanned by the columns of matrix $\mathbf{C}$ \vspace{0.10cm}\\ + $\mathbf{E}$ & $n \times J$ error term matrix + \vspace{0.10cm}\\ + $||\cdot||$ & Frobenius norm\\ + \bottomrule + \end{tabular} + \caption{Notation} + \label{tab:notation} +\end{table} + + +\subsection{Latent variables with simple-structure loading matrix} +Classical methods of PCA \citep{pearson1901} or FA \citep{cattell1965, lawley1962} build each latent factor from a combination of \textit{all} the manifest variables. As a consequence, the loading matrix, describing the relations between manifest and latent variables, is usually not immediately interpretable. Ideally, it is desirable to have variables that are associated with a single factor. This is typically called \textit{simple structure}, which induces subsets of variables characterizing the factors and, frequently, a partition of the variables. While factor rotation techniques (especially Varimax) go in this direction, even if not exactly, they do not guarantee this result. Alternative solutions have been proposed: \citet{zou2006} frame the PCA problem as a regression one and introduce an elastic-net penalty, aiming for a sparse solution of the loading matrix $\mathbf{A}$. For the present work, we consider two techniques for this purpose: DPCA and DFA, implemented in the proposed package.
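As a concrete illustration of simple structure, the following sketch (in Python/NumPy for concreteness; purely illustrative, not \CRANpkg{drclust} code, and the membership assignment and weights below are made-up values) builds a loading matrix in which each variable loads on exactly one factor, i.e., $\mathbf{A} = \mathbf{BV}$ with one non-zero entry per row:

```python
import numpy as np

# Purely illustrative: a simple-structure loading matrix for J = 5 variables
# and Q = 2 factors; 'assignment' and 'weights' are made-up values.
assignment = np.array([0, 0, 0, 1, 1])    # each variable -> exactly one factor
weights = np.array([0.8, 0.5, 0.6, 0.9, 0.4])

J, Q = assignment.size, assignment.max() + 1
V = np.zeros((J, Q))                      # binary, row-stochastic membership
V[np.arange(J), assignment] = 1.0
B = np.diag(weights)                      # diagonal variable-weighting matrix

A = B @ V                                 # one non-zero loading per row
A = A / np.linalg.norm(A, axis=0)         # rescale so that A'A = I

print(np.count_nonzero(A))                # 5 non-zero loadings (J, not J*Q)
print(np.allclose(A.T @ A, np.eye(Q)))    # True: columns are orthonormal
```

Because the columns of $\mathbf{A}$ have disjoint supports, orthogonality between different components holds automatically; only a rescaling is needed for orthonormality.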
+ +\subsubsection{Disjoint principal component analysis} +\citet{vichi2009} propose an alternative solution, DPCA, which leads to the simplest possible structure on $\mathbf{A}$, while still maximizing the explained variance. Such a result is obtained by building each latent factor from a subset of variables instead of allowing all the variables to contribute to all the components. This means that the loading matrix has only $J$ non-zero entries instead of $JQ$. To obtain this setting, variables are grouped in such a way that they form a partition of the initial set. The model can be described as a constrained PCA, where the matrix $\mathbf{A}$ is restricted to be reparametrized as the product $\mathbf{A}=\mathbf{BV}$. Thus, the model is described as: + +\begin{equation} +\label{dpca1} + \mathbf{X} = \mathbf{X}\mathbf{A}\mathbf{A}' + \mathbf{E}= \mathbf{X}\mathbf{B}\mathbf{V}\mathbf{V}'\mathbf{B} + \mathbf{E}, +\end{equation} +subject to +\begin{equation} +\label{dpca2} + \mathbf{V} = [v_{jq} \in \{0,1\}] \ \ \ \ \ \text{(binarity)}, +\end{equation} +\begin{equation} +\label{dpca3} + \mathbf{V}\mathbf{1}_{Q} = \mathbf{1}_{J} \ \ \ \text{(row-stochasticity)}, +\end{equation} +\begin{equation} +\label{dpca4} +\mathbf{V}'\mathbf{B}\mathbf{B}'\mathbf{V} = \mathbf{I}_{Q} \ \ \ \ \ \text{(orthonormality)}, +\end{equation} +\begin{equation} +\label{dpca5} + \mathbf{B} = \text{diag}(b_1, \dots, b_J) \ \ \ \ \text{(diagonality)}. +\end{equation} +The estimation of the parameters $\mathbf{B}$ and $\mathbf{V}$ is carried out via least squares (LS), by solving the minimization problem +\begin{equation} +\label{dpca6} + RSS_{DPCA}(\mathbf{B}, \mathbf{V}) = ||\mathbf{X} - \mathbf{X}\mathbf{B}\mathbf{V}\mathbf{V}'\mathbf{B}||^2 +\end{equation} +subject to the constraints (\ref{dpca2}, \ref{dpca3}, \ref{dpca4}, \ref{dpca5}). An ALS algorithm is employed, guaranteeing at least a local optimum.
In order to (at least partially) overcome this downside, multiple random starts are needed, and the best solution is retained. + +Therefore, the DPCA method is subject to more structural constraints than standard PCA. Specifically, standard PCA does not enforce the reparameterization $\mathbf{A}=\mathbf{BV}$, meaning its loading matrix $\mathbf{A}$ is free to vary among orthonormal matrices. In contrast, DPCA still requires an orthonormal matrix $\mathbf{A}$ but also requires that each principal component be associated with a disjoint subset of variables that best reconstructs the data. This implies that each variable contributes to only one component, resulting in a sparse and block-diagonal loading matrix. In essence, DPCA fits \textit{Q} separate PCAs on the \textit{Q} disjoint subsets of variables and, from each, extracts the eigenvector associated with the largest eigenvalue. +In general, the total variance explained by DPCA is slightly lower, and the residual of the objective function is larger, compared to PCA. This trade-off is made in exchange for the added constraint, which clearly enhances interpretability. +%i.e. +%\begin{equation} +%\label{dpca7} +% \mathbf{A} = %\mathbf{B}\mathbf{V} +%\end{equation} +The extent of the reduction depends on the true underlying structure of the latent factors, specifically on whether they are truly uncorrelated. When the observed correlation matrix is block diagonal, with variables within blocks being highly correlated and variables between blocks being uncorrelated, DPCA can explain almost the same amount of variance as PCA, with the advantage of simpler interpretation. +\newline +It is important to note that, as DPCA is implemented, it allows for a blend of exploratory and confirmatory approaches. In the confirmatory framework, users can specify a priori which variables should collectively contribute to a factor using the \code{constr} argument, available for the last three functions in Table \ref{tab:stat_models}.
The algorithm assigns the remaining manifest variables, for which no constraint has been specified, to the \textit{Q} factors in a way that ensures the latent variables best reconstruct the manifest ones, capturing the maximum variance. This is accomplished by minimizing the loss function (\ref{dpca6}). +Although each of the \textit{Q} latent variables is derived from a different subset of variables, which involves the spectral decomposition of multiple covariance matrices, their smaller size, combined with the implementation in C++, enables very rapid execution of the routine. + +A very positive side effect of the additional constraint in DPCA compared to standard PCA is the uniqueness of the solution, which eliminates the need for factor rotation in DPCA. + + +\subsubsection{Disjoint factor analysis} +Proposed by \citet{vichi2017}, this technique is the model-based counterpart of the DPCA model. It pursues a similar goal in terms of building \textit{Q} factors from \textit{J} variables, imposing a simple structure on the loading matrix. However, the means by which the goal is pursued are different. Unlike DPCA, the estimation method adopted for DFA is Maximum Likelihood and the model requires additional statistical assumptions compared to DPCA. The model can be formulated in a matrix form as, +\begin{equation} +\label{dfa1} + \mathbf{X} = \mathbf{Y}\mathbf{A}'+\mathbf{E}, +\end{equation} +where $\mathbf{X}$ is centered, meaning that the mean vector $\boldsymbol{\mu}$ has been subtracted from each multivariate unit $\mathbf{x}_{i}$. Therefore, for a multivariate, centered, unit, the previous model can be expressed as +\begin{equation} +\label{dfa2} + \mathbf{x}_i = \mathbf{A}\mathbf{y}_i + \mathbf{e}_i, \ \ i = 1, \dots, n. 
+\end{equation} +where $\mathbf{y}_i$ is the \textit{i}-th row of $\mathbf{Y}$ and $\mathbf{x}_i$, $\mathbf{e}_i$ are, respectively, the $i$-th rows of $\mathbf{X}$ and $\mathbf{E}$, with a multivariate normal distribution on the $J$-dimensional space, +\begin{equation} +\label{FAassumptions1} + \mathbf{x}_i \sim \mathcal{N}(\boldsymbol{0}, \boldsymbol{\Sigma_X}), \ \ \ \mathbf{e}_i \sim \mathcal{N}(\boldsymbol{0}, \mathbf{\Psi}). +\end{equation} +The covariance structure of the FA model can be written as +\begin{equation} +\label{dfa6} + Cov(\mathbf{x}_i) = \mathbf{\Sigma_X} = \mathbf{AA'} + \mathbf{\Psi}, +\end{equation} +where additional assumptions are needed, +%\textcolor{red}{\sout{ +%\begin{equation} +%\label{dfa3} +% \mathbb{E}(\mathbf{e}_i) = \mathbf{0}_{J} +%\end{equation}}} +\begin{equation} +\label{dfa4} + Cov(\mathbf{y}_{i}) = \mathbf{\Sigma}_{\mathbf{Y}} = \mathbf{I}_Q, +\end{equation} +\begin{equation} +\label{dfa5} + Cov(\mathbf{e}_i) = \mathbf{\Sigma}_{\mathbf{E}} = \mathbf{\Psi}, \ \ \ \mathbf{\Psi} = \text{diag}(\psi_{1},\dots,\psi_{J}), \ \ \psi_{j}>0, \ \ j = 1, \dots, J, +\end{equation} +\begin{equation} Cov(\mathbf{e}_{i}, \mathbf{y}_{i}) = \mathbf{\Sigma}_{\mathbf{EY}} = \mathbf{0}, +\label{dfa5b} +\end{equation} +\begin{equation} +\mathbf{A} = \mathbf{BV}. +\label{dfa6b} +\end{equation} +The objective function can be formulated as the maximization of the likelihood function or as the minimization of the following discrepancy: +\begin{align*} + D_{DFA}(\mathbf{B},\mathbf{V}, \mathbf{\Psi}) + & = \text{ln}|\mathbf{B}\mathbf{V}\mathbf{V}'\mathbf{B} + \mathbf{\Psi}| - \text{ln}|\mathbf{S}| + \text{tr}((\mathbf{B}\mathbf{V}\mathbf{V}'\mathbf{B} + \mathbf{\Psi})^{-1}\mathbf{S}) - \textit{J}, \\ + & \qquad s.t.: \mathbf{V} = [v_{jq}], \ v_{jq} \in \{0,1\}, \ \sum_q{v_{jq}} = 1, \ j = 1, \dots, \textit{J}, \ q = 1, \dots, \textit{Q}, +\end{align*} +whose parameters are optimized by means of a coordinate descent algorithm.
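To make the discrepancy concrete, the following sketch (in Python/NumPy for concreteness; an illustration under made-up values, not the package's ML routine) evaluates $D_{DFA}$ for given $\mathbf{B}$, $\mathbf{V}$, $\mathbf{\Psi}$ and a sample covariance matrix $\mathbf{S}$. When $\mathbf{S}$ coincides with the model covariance $\mathbf{B}\mathbf{V}\mathbf{V}'\mathbf{B} + \mathbf{\Psi}$, the discrepancy vanishes:

```python
import numpy as np

def dfa_discrepancy(B, V, Psi, S):
    """Evaluate D_DFA = ln|Sigma| - ln|S| + tr(Sigma^{-1} S) - J,
    with Sigma = B V V' B + Psi (the DFA model covariance)."""
    Sigma = B @ V @ V.T @ B + Psi
    J = S.shape[0]
    _, logdet_Sigma = np.linalg.slogdet(Sigma)
    _, logdet_S = np.linalg.slogdet(S)
    return logdet_Sigma - logdet_S + np.trace(np.linalg.solve(Sigma, S)) - J

# Made-up example: J = 3 variables, Q = 1 factor, all variables on the factor.
V = np.array([[1.0], [1.0], [1.0]])
B = np.diag([0.7, 0.6, 0.5])          # diagonal loadings b_j
Psi = np.diag([0.51, 0.64, 0.75])     # unique variances psi_j
S = B @ V @ V.T @ B + Psi             # S generated from the model itself
print(dfa_discrepancy(B, V, Psi, S))  # ~0: the model reproduces S exactly
```

Any other choice of $\mathbf{\Psi}$ (or of the other parameters) yields a strictly positive discrepancy, which is what the coordinate descent algorithm drives down.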
+ +Apart from the methodological distinctions between DPCA and DFA, the latter exhibits the scale equivariance property. The optimization of the likelihood function implies a higher computational load and, thus, a longer execution time compared to DPCA. + +As in the DPCA case, under the constraint $\mathbf{A}=\mathbf{BV}$, the solution provided by the model is unique. + +\subsection{Joint clustering and variable reduction} + +The four clustering methods discussed all follow the $K$-means framework, working to partition units. However, they differ primarily in how they handle variable reduction. + +Double KM (DKM) employs a symmetric approach, clustering both the units (rows) and the variables (columns) of the data matrix at the same time. This leads to the simultaneous identification of mean profiles for both dimensions. DKM is particularly suitable for data matrices where both rows and columns represent units. Examples of such matrices include document-by-term matrices used in Text Analysis, product-by-customer matrices in Marketing, and gene-by-sample matrices in Biology. + +In contrast, the other three clustering methods adopt an asymmetric approach. They treat rows and columns differently, focusing on mean profiles and clustering for the rows, while employing components or factors for the variables (columns). These methods are more appropriate for typical units-by-variables matrices, where it is beneficial to synthesize variables using components or factors. At the same time, they emphasize clustering and the mean profiles of the clusters specifically for the rows. The methodologies that fall into this category are RKM, FKM, and DPCAKM. + +The estimation is carried out by the LS method, while the computation of the estimates is performed via ALS.
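All of these ALS algorithms share the same partition-update step: given the current centroids in the relevant (possibly reduced) space, each unit is reassigned to its closest centroid. A minimal sketch of this step (in Python/NumPy for concreteness; illustrative only, not the package's C++ code, with made-up toy scores and centroids):

```python
import numpy as np

def update_membership(Y, Ybar):
    """ALS partition step: assign each row of the score matrix Y (n x Q)
    to the nearest centroid in Ybar (K x Q); returns the binary,
    row-stochastic membership matrix U (n x K)."""
    # squared Euclidean distance between every unit and every centroid
    d2 = ((Y[:, None, :] - Ybar[None, :, :]) ** 2).sum(axis=2)
    U = np.zeros((Y.shape[0], Ybar.shape[0]))
    U[np.arange(Y.shape[0]), d2.argmin(axis=1)] = 1.0
    return U

# Made-up toy scores: two clear groups around (0, 0) and (5, 5).
Y = np.array([[0.1, -0.2], [0.0, 0.3], [5.2, 4.9], [4.8, 5.1]])
Ybar = np.array([[0.0, 0.0], [5.0, 5.0]])
U = update_membership(Y, Ybar)
print(U[:, 1])  # [0. 0. 1. 1.]: the last two units join the second cluster
```

The methods differ only in which scores and centroids enter this step (e.g., $\mathbf{X}$ itself for DKM, $\mathbf{X}\mathbf{A}$ for FKM) and in how the subspace is updated in the alternate step.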
+ +\subsubsection{Double k-means (DKM)} +Proposed by \citet{maurizio2001a}, DKM is one of the first introduced bi-clustering methods that provides a simultaneous partition of the units and variables, resulting in a two-way extension of the plain KM \citep{mcqueen1967}. The model is described by the following equation, \begin{equation} +\label{dkm1} + \mathbf{X} = \mathbf{U}\bar{\mathbf{Y}}\mathbf{V}' + \mathbf{E} +\end{equation} +where $\bar{\mathbf{Y}}$ is the centroid matrix in the reduced space for the rows and columns, enabling a comprehensive summarization of units and variables. By optimizing a single objective function, the DKM method captures valuable information from both dimensions of the dataset simultaneously. + +This bi-clustering approach can be applied in several impactful ways. One key application is in the realm of Big Data. DKM can effectively compress expansive datasets that include a vast number of units and variables into a more manageable and robust data matrix $\bar{\mathbf{Y}}$. This compressed matrix, formed by mean profiles both for rows and columns, can then be explored and analyzed using a variety of subsequent statistical techniques, thus facilitating efficient data handling and analysis of Big Data. Similarly to the well-known KM, the algorithm is very fast and converges quickly to a solution, which is at least a local minimum of the problem. + +Another significant application of DKM is its capability to achieve optimal clustering for both rows and columns. This dual clustering ability is particularly advantageous in situations where it is essential to discern meaningful patterns and relationships within complex datasets, highlighting the utility of DKM in diverse fields and scenarios.
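Given the two memberships $\mathbf{U}$ and $\mathbf{V}$, the compressed matrix is simply the matrix of block means, $\bar{\mathbf{Y}} = \mathbf{U}^{+}\mathbf{X}\mathbf{V}^{+'}$. A small sketch (in Python/NumPy for concreteness; toy, made-up data, not the package's implementation):

```python
import numpy as np

def double_centroids(X, U, V):
    """DKM compression: Ybar = U^+ X V^+', whose (k, q) entry is the mean of
    the block of X formed by unit-cluster k and variable-cluster q."""
    return np.linalg.pinv(U) @ X @ np.linalg.pinv(V).T

# Made-up 4 x 4 matrix with an exact 2 x 2 block-mean structure.
X = np.array([[1.0, 1.0, 9.0, 9.0],
              [1.0, 1.0, 9.0, 9.0],
              [5.0, 5.0, 2.0, 2.0],
              [5.0, 5.0, 2.0, 2.0]])
U = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])  # unit clusters
V = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])  # variable clusters
Ybar = double_centroids(X, U, V)
print(Ybar)                            # block means [[1. 9.], [5. 2.]]
print(np.allclose(U @ Ybar @ V.T, X))  # True: reconstruction is exact here
```

On this perfectly blocked toy matrix the reconstruction $\mathbf{U}\bar{\mathbf{Y}}\mathbf{V}'$ is exact, so the residual $\mathbf{E}$ vanishes; in general $\bar{\mathbf{Y}}$ minimizes the residual for the given partitions.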
+ +The Least Squares estimation of the parameters $\mathbf{U}$, $\mathbf{V}$ and $\bar{\mathbf{Y}}$ leads to the minimization problem +\begin{equation} +\label{dkm2} + RSS_{\textit{DKM}}(\mathbf{U}, \mathbf{V}, \bar{\mathbf{Y}}) = {||\mathbf{X} - \mathbf{U}\bar{\mathbf{Y}}\mathbf{V}'||^2}, +\end{equation} +\begin{equation} +\label{dkm3} + s.t.: u_{ik} \in \{0,1\}, \ \ \sum_{k} u_{ik} = 1, \ \ i = 1 ,\dots, n, \ \ k = 1 ,\dots, K, +\end{equation} +\begin{equation} +\label{dkm4} + \ \ \ \ \ \ \ v_{jq} \in \{0,1\}, \ \ \sum_{q} v_{jq} = 1, \ \ j = 1, \dots, J, \ \ q = 1, \dots, Q. +\end{equation} +Since $\mathbf{\bar{Y}} = \mathbf{U}^{+}\mathbf{X}\mathbf{V}^{+'}$, (\ref{dkm2}) can be framed in terms of projection operators, thus: +\begin{equation} +\label{dkm5} +RSS_{\textit{DKM}}(\mathbf{U}, \mathbf{V}) = ||\mathbf{X} - \mathbf{H}_\mathbf{U}\mathbf{X}\mathbf{H}_\mathbf{V}||^2. +\end{equation} +In both cases, the sum of squared residuals (or, equivalently, the within-cluster deviances associated with the \textit{K} unit-clusters and \textit{Q} variable-clusters) is minimized. In this way, one obtains a (hard) classification of both units and variables. The optimization of (\ref{dkm5}) is done via ALS, alternating, in essence, two assignment problems for rows and columns, similar to KM steps. + +\subsubsection{Reduced k-means (RKM)} +Proposed by \citet{desoete1994}, RKM performs the reduction of the variables by projecting the \textit{J}-dimensional centroid matrix into a \textit{Q}-dimensional subspace ($Q \leq J$), spanned by the columns of the loading matrix $\mathbf{A}$, such that it best reconstructs $\mathbf{X}$ by using the orthogonal projector matrix $\mathbf{A}\mathbf{A}'$. +Therefore, the model is described by the following equation, +\begin{equation} +\label{rkm1} + \mathbf{X} = \mathbf{U}\bar{\mathbf{X}}\mathbf{A}\mathbf{A}' + \mathbf{E}.
+\end{equation}
+The estimation of \textbf{U} and \textbf{A} can be done via LS, by minimizing
+\begin{equation}
+\label{rkm2}
+  RSS_{\textit{RKM}}(\mathbf{U}, \mathbf{A})={||\mathbf{X} - \mathbf{U}\bar{\mathbf{X}}\mathbf{A}\mathbf{A}'||^2},
+\end{equation}
+\begin{equation}
+\label{rkm3}
+  s.t.: \ \ \ u_{ik} \in \{0,1\}, \ \ \sum_{k} u_{ik} = 1, \ \ \mathbf{A}'\mathbf{A} = \mathbf{I},
+\end{equation}
+which can be optimized, once again, via ALS.
+In essence, the model alternates a KM step, assigning each original unit $\mathbf{x}_i$ to the closest centroid in the reduced space, and a PCA step based on the spectral decomposition of $\mathbf{X}'\mathbf{H}_\mathbf{U}\mathbf{X}$, conditioned on the results of the previous iteration. The iterations continue until the difference between two subsequent values of the objective function is smaller than an arbitrarily chosen small constant $\epsilon > 0$.
+
+\subsubsection{Factorial k-means (FKM)}
+Proposed by \citet{vichi2001a}, FKM, differently from RKM, produces a dimension reduction of both the units and the centroids. Its goal is to reconstruct the data in the reduced subspace, $\mathbf{Y}$, by means of the centroids in the reduced space. The FKM model can be obtained by post-multiplying both sides of the RKM model in equation (\ref{rkm1}) by $\mathbf{A}$, and rewriting the new error term as $\mathbf{E}$,
+\begin{equation}
+  \mathbf{X}\mathbf{A} = \mathbf{U}\bar{\mathbf{X}}\mathbf{A} + \mathbf{E}.
+\end{equation}
+Its LS estimation results in the optimization of
+\begin{equation}
+\label{fkm1}
+  RSS_{\textit{FKM}}(\mathbf{U}, \mathbf{A}, \bar{\mathbf{X}})={||\mathbf{X}\mathbf{A} - \mathbf{U}\bar{\mathbf{X}}\mathbf{A}||^2},
+\end{equation}
+
+\begin{equation}
+  s.t.: \ \ \ u_{ik} \in \{0,1\}, \ \ \sum_{k} u_{ik} = 1, \ \ \mathbf{A}'\mathbf{A} = \mathbf{I}.
+\end{equation}
+Although the connection with the RKM model appears straightforward, it can be shown that the loss function of the former is always equal to or smaller than that of the latter. In practice, the KM step is applied to $\mathbf{X}\mathbf{A}$, instead of just $\mathbf{X}$, as happens in DKM and RKM. In essence, FKM works better when both the data and the centroids lie in the reduced subspace, and not just the centroids as in RKM.
+
+In order to decide when RKM or FKM can be properly applied, it is important to recall that two types of residuals can be defined in dimensionality reduction: \textit{subspace residuals}, lying on the subspace spanned by the columns of $\mathbf{A}$, and \textit{complement residuals}, lying on the complement of this subspace, i.e., those residuals lying on the subspace spanned by the columns of $\mathbf{A}^\perp$, with $\mathbf{A}^\perp$ a column-wise orthonormal matrix of order $J \times (J-Q)$ such that $\mathbf{A}'\mathbf{A}^\perp = \mathbf{O}_{Q \times (J-Q)}$, where $\mathbf{O}_{Q \times (J-Q)}$ is the matrix of zeroes of order $Q \times (J-Q)$. FKM is more effective when there is significant residual variance in the subspace orthogonal to the clustering subspace. In other words, the complement residuals typically represent the error given by those observed variables that scarcely contribute to the clustering subspace to be identified. FKM tends to recover the subspace and clustering structure more accurately when the data contain variables with substantial variance that does not reflect the clustering structure and therefore masks it: FKM can better ignore these variables and focus on the relevant clustering subspace.
+On the other hand, RKM performs better when the data have significant residual variance within the clustering subspace itself. This means that when the variables within the subspace show considerable variance, RKM can more effectively capture the clustering structure.
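The RKM alternation described above (a KM assignment step in the reduced space and a PCA step on $\mathbf{X}'\mathbf{H}_\mathbf{U}\mathbf{X}$) can be sketched in a few lines of NumPy. This is an illustrative re-implementation under simplifying assumptions (random initialization, no empty-cluster handling), not the \CRANpkg{drclust} code:

```python
import numpy as np

def reduced_kmeans(X, K, Q, max_iter=100, tol=1e-6, seed=1):
    """Illustrative ALS sketch of reduced k-means (not the drclust code):
    alternate (i) a PCA step, A <- leading Q eigenvectors of X'H_U X, and
    (ii) a k-means assignment step in the subspace spanned by A."""
    rng = np.random.default_rng(seed)
    n, J = X.shape
    u = rng.integers(K, size=n)
    prev = np.inf
    for _ in range(max_iter):
        # centroid matrix Xbar (K x J): cluster means in the original space
        Xbar = np.vstack([X[u == k].mean(0) if np.any(u == k) else X.mean(0)
                          for k in range(K)])
        HX = Xbar[u]                 # H_U X: each unit replaced by its centroid
        # PCA step: H_U is a projector, so X'H_U X = (H_U X)'(H_U X)
        _, V = np.linalg.eigh(HX.T @ HX)
        A = V[:, -Q:]                # leading Q eigenvectors as loadings
        rss = ((X - HX @ A @ A.T) ** 2).sum()
        if prev - rss < tol:
            break
        prev = rss
        # KM step: assign each x_i to the closest centroid in the reduced space
        d = ((X @ A)[:, None, :] - (Xbar @ A)[None, :, :]) ** 2
        u = d.sum(2).argmin(1)
    return u, A, rss
```

The FKM variant would differ only in the criterion being alternated, applying the KM step to $\mathbf{X}\mathbf{A}$ rather than comparing the full profiles through $\mathbf{A}\mathbf{A}'$.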
+
+In essence, when most of the variables in the dataset reflect the clustering structure, RKM is more likely to provide a good solution. If this is not the case, FKM may be preferred.
+
+\subsubsection{Disjoint principal component analysis k-means (DPCAKM)}
+Starting from the FKM model, the goal here, besides the partition of the units, is a parsimonious representation of the relationships between latent and manifest variables, provided by the loading matrix \textbf{A}. \citet{vichi2009} propose for FKM the parametrization $\mathbf{A} = \mathbf{B}\mathbf{V}$, which yields the simplest structure and thus simplifies the interpretation of the factors,
+\begin{equation}
+\label{cdpca1}
+  \mathbf{X} = \mathbf{U}\bar{\mathbf{X}}\mathbf{B}\mathbf{V}\mathbf{V}'\mathbf{B} + \mathbf{E}.
+\end{equation}
+By estimating $\mathbf{U}$, $\mathbf{B}$, $\mathbf{V}$ and $\bar{\mathbf{X}}$ via LS, the loss function of the proposed method becomes:
+\begin{equation}
+\label{cdpca2}
+  RSS_{DPCAKM}(\mathbf{U}, \mathbf{B}, \mathbf{V}, \bar{\mathbf{X}}) = ||\mathbf{X} - \mathbf{U}\bar{\mathbf{X}}\mathbf{B}\mathbf{V}\mathbf{V}'\mathbf{B}||^2,
+\end{equation}
+\begin{equation}
+\label{cdpca3}
+  s.t.: u_{ik} \in \{0,1\}, \ \ \sum_{k} u_{ik} = 1, \ \ i = 1 ,\dots, N, \ \ k = 1 ,\dots, K,
+\end{equation}
+\begin{equation}
+\label{cdpca4}
+  \ \ \ \ \ \ \ v_{jq} \in \{0,1\}, \ \ \sum_{q} v_{jq} = 1, \ \ j = 1, \dots, J, \ \ q = 1, \dots, Q,
+\end{equation}
+\begin{equation}
+\label{cdpca5}
+  \ \ \ \ \ \ \ \mathbf{V}'\mathbf{B}\mathbf{B}\mathbf{V} = \mathbf{I}, \ \ \mathbf{B} = diag(b_1, \dots, b_J).
+\end{equation}
+In practice, this model has traits of DPCA, given the projection onto the reduced subspace and the resulting sparse loading matrix, but also of DKM, given the partitioning of the units and the presence of both \textbf{U} and \textbf{V}. Thus, DPCAKM can be considered a bi-clustering methodology with an asymmetric treatment of the rows and columns of \textbf{X}.
By inheriting the constraint on \textbf{A}, the overall fit of the model, compared with FKM for example, is generally worse, although it offers an easier interpretation of the principal components. Nevertheless, it is potentially able to identify a better partition of the units. As in the DPCA case, the difference is negligible when the true latent variables are really disjoint. As implemented, the assignment step is carried out by minimizing the unit-centroid squared Euclidean distance in the reduced subspace.
+
+\section{The package}
+The library offers an implementation of all the models mentioned in the previous section. Each of them corresponds to a specific function implemented using \CRANpkg{Rcpp} \citep{eddelbuettel2011} and \CRANpkg{RcppArmadillo} \citep{eddelbuettel2014}.
+
+\begin{table}[h!]
+ \renewcommand{\arraystretch}{1.5}
+ \centering
+ \begin{tabular}{p{1.4cm} >{\small}p{4.0cm} >{\small}p{3cm} >{\small}p{4.2cm}}
+ \toprule
+ \normalsize Function & \normalsize Model & \normalsize Previous \newline Implementations & \normalsize Main differences \newline in \code{drclust} \\
+ \toprule
+ \code{doublekm} & DKM \newline \citep{maurizio2001a} & None & Short runtime (C++); \\
+ \code{redkm} & RKM \newline \citep{desoete1994} & in \code{clustrd}; \newline Mixed variables; & >50x faster (C++); \newline Continuous variables;\\
+ \code{factkm} & FKM \newline \citep{vichi2001a} & in \code{clustrd}; \newline Mixed variables & >20x faster (C++); \newline Continuous variables; \newline Better fit and classification;\\
+ \code{dpcakm} & DPCAKM \newline \citep{vichi2009} & in \code{biplotbootGUI}; \newline Continuous variables; \newline SDP-based initialization of parameters; & >10x faster (C++); \newline Constraint on variable allocation within principal components;\\
+ \code{dispca} & DPCA \newline \citep{vichi2009} & None & Short runtime (C++); \newline Constraint on variable allocation within principal components;\\
+ \code{disfa} & DFA \newline 
\citep{vichi2017} & None & Short runtime (C++); \newline Constraint on variable allocation within factors;\\
+ \bottomrule
+ \end{tabular}
+ \caption{Statistical methods available in the \texttt{drclust} package}
+ \label{tab:stat_models}
+\end{table}
+Some additional functions have been made available to the user. Most of them are intended to aid the user in evaluating the quality of the results, or in the choice of the hyper-parameters.
+
+\begin{table}[h!]
+\renewcommand{\arraystretch}{1.3}
+\small
+\centering
+\begin{tabular}{p{1.8cm} p{2.0cm} p{6cm} p{1.8cm}}
+\toprule
+\textbf{Function} & \textbf{Technique} & \textbf{Description} & \textbf{Goal} \\
+\toprule
+\code{apseudoF} & ``relaxed'' pseudoF & ``Relaxed'' version of \citet{calinski1974}. Selects the second largest pseudoF value if the difference with the first is less than a fraction. & Parameter tuning \\
+\code{dpseudoF} & DKM-pseudoF & Adaptation of the pseudoF criterion proposed by \citet{rocci2008} to bi-clustering. & Parameter tuning \\
+\code{kaiserCrit} & Kaiser criterion & Kaiser rule for selecting the number of principal components \citep{kaiser1960}.
& Parameter tuning \\
+\code{centree} & Dendrogram of the centroids & Graphical tool showing how close the centroids of a partition are. & Visualization \\
+\code{silhouette} & Silhouette & Imported from \CRANpkg{cluster} \citep{maechler2023} and \CRANpkg{factoextra} \citep{kassambara2022}. & Visualization, parameter tuning \\
+\code{heatm} & Heatmap & Heatmap of distance-ordered units within distance-ordered clusters, adapted from \CRANpkg{pheatmap} \citep{kolde2019}. & Visualization \\
+\code{CronbachAlpha} & Cronbach Alpha Index & Proposed by \citet{cronbach1951}. Assesses the unidimensionality of a dataset. & Assessment \\
+\code{mrand} & ARI & Assesses clustering quality based on the confusion matrix \citep{rand1971}. & Assessment \\
+\code{cluster} & Membership vector & Returns a multinomial 1 × \textit{n} membership vector from a binary, row-stochastic \textit{n} × \textit{K} membership matrix; mimics \code{kmeans\$cluster}. & Encoding \\
+\bottomrule
+\end{tabular}
+\caption{Auxiliary functions available in the library}
+\label{tab:aux_methods}
+\end{table}
+With regard to the auxiliary functions (Table \ref{tab:aux_methods}), they have all been implemented in the \texttt{R} language, building on top of packages already available on CRAN, such as \CRANpkg{cluster} by \cite{maechler2023}, \CRANpkg{factoextra} by \cite{kassambara2022} and \CRANpkg{pheatmap} by \cite{kolde2019}, which allowed for an easier implementation.
+One of the main goals of the proposed package, besides spreading the availability and usability of the statistical methods considered, is speed of computation: memory permitting, results can thus be obtained in a reasonable amount of time even for large data matrices. A first means adopted to pursue this goal is the full implementation of the statistical methods in the C++ language.
The libraries used are \CRANpkg{Rcpp} \citep{eddelbuettel2011} and \CRANpkg{RcppArmadillo} \citep{eddelbuettel2014}, which significantly reduced the required runtime.
+
+A practical issue that arises very often in crisp (hard) clustering, such as KM, is the presence of empty clusters after the assignment step. When this happens, a column of $\mathbf{U}$ has all elements equal to zero, which can be proved to be a local minimum solution and prevents the computation of $(\mathbf{U}'\mathbf{U})^{-1}$. This typically happens even more often when the number of clusters \textit{K} specified by the user is larger than the true one, or in the case of a sub-optimal solution. Among the possible solutions to this issue, the one implemented here consists in splitting the cluster with the highest within-deviance: in practice, a KM with $\textit{K} = 2$ is applied to it, assigning one of the two resulting clusters to the empty cluster, and the procedure is iterated until all the empty clusters are filled. Such a strategy guarantees that the monotonicity of the ALS algorithm is preserved, although it is the most time-consuming one.
+
+In all six implementations of the statistical techniques, some arguments are set to a default value. Table \ref{tab:defaultarguments} describes all the arguments that have a default value. In particular, \code{print}, which displays a descriptive summary of the results, is set to zero (so the user must explicitly request such output). \code{Rndstart} is set by default to 20, so that the algorithm is run 20 times until convergence; in order to have more confidence (not certainty) that the obtained solution is a global optimum, a higher value for this argument can be provided. With particular regard to \code{redkm} and \code{factkm}, the argument \code{rot}, which performs a Varimax rotation on the loading matrix, is set by default to 0.
If the user would like this rotation to be performed, the argument must be set equal to 1. Finally, the \code{constr} argument, which is available for \code{dpcakm} and \code{dispca}, is set by default to a vector (of length \textit{J}) of zeros, so that each variable is assigned to the most appropriate latent variable, according to the logic of the model.
+\begin{table}[h]
+\small
+\centering
+\renewcommand{\arraystretch}{1.3}
+\begin{tabular}{p{1.3cm} p{2.3cm} p{7cm} p{1cm}}
+\toprule
+\textbf{Argument} & \textbf{Used In} & \textbf{Description} & \textbf{Default Value} \\
+\toprule
+\code{Rndstart} & \code{doublekm}, \code{redkm}, \code{factkm}, \code{dpcakm}, \code{dispca}, \code{disfa} &
+Number of times the model is run until convergence. & 20\\
+\code{verbose} & \code{doublekm}, \code{redkm}, \code{factkm}, \code{dpcakm}, \code{dispca}, \code{disfa} &
+Outputs basic summary statistics regarding each random start (1 = enabled; 0 = disabled). & 0 \\
+\code{maxiter} & \code{doublekm}, \code{redkm}, \code{factkm}, \code{dpcakm}, \code{dispca}, \code{disfa} &
+Maximum number of iterations allowed for each random start (if convergence is not yet reached). & 100 \\
+\code{tol} & \code{doublekm}, \code{redkm}, \code{factkm}, \code{dpcakm}, \code{dispca}, \code{disfa} & Tolerance threshold (maximum difference between the values of the objective function of two consecutive iterations such that convergence is assumed). & $10^{-6}$\\
+\code{tol} & \code{apseudoF} & Approximation value: half of the length of the interval considered around each pF value; 0 <= \code{tol} < 1. & 0.05\\
+\code{rot} & \code{redkm}, \code{factkm} & Performs a Varimax rotation of the axes obtained via PCA (0 = \code{False}; 1 = \code{True}). & 0\\
+\code{prep} & \code{doublekm}, \code{redkm}, \code{factkm}, \code{dpcakm}, \code{dispca}, \code{disfa} &
+Pre-processing of the data.
1 performs the \textit{z}-score transform; 2 performs the min-max transform; 0 leaves the data un-pre-processed. & 1\\
+\code{print} & \code{doublekm}, \code{redkm}, \code{factkm}, \code{dpcakm}, \code{dispca}, \code{disfa} & Final summary statistics of the performed method (1 = enabled; 0 = disabled). & 0\\
+\code{constr} & \code{dpcakm}, \code{dispca}, \code{disfa} & Vector of length \( J \) (number of variables) specifying variable-to-cluster assignments. Each element can be an integer from 0 to \( Q \) (number of variable-clusters or components), indicating a fixed assignment, or 0 to leave the variable unconstrained (i.e., assigned by the algorithm). & \code{rep(0,J)}\\
+\bottomrule
+\end{tabular}
+\caption{Arguments accepted by functions in the \texttt{drclust} package with default values}
+\label{tab:defaultarguments}
+\end{table}
+
+By offering fast execution times, all the implemented models allow running multiple random starts of the algorithm in a reasonable amount of time. This feature is particularly useful given the absence of guarantees of global optimality for the ALS algorithm, which has an ad-hoc implementation for each model. Table \ref{tab:comparison} shows that, compared to the two packages implementing 3 of the 6 models available in \CRANpkg{drclust}, our proposal is much faster than the corresponding versions implemented in \texttt{R}, while nevertheless providing compelling results.
+
+The iris dataset has been used to measure performance in terms of fit, runtime, and ARI \citep{rand1971}. The \textit{z}-transform has been applied to all the variables of the dataset: by subtracting from each variable its mean and dividing the result by the standard deviation, all the variables, post-transformation, have mean equal to 0 and variance equal to 1. The same result is typically obtained with the R function \code{scale(X)}.
+
+\begin{equation}
+\label{eq:ztransform}
+\mathbf{Z}_{\cdot j} = \frac{\mathbf{X}_{\cdot j} - \mu_j \mathbf{1_\textit{n}}}{\sigma_j}
+\end{equation}
+where $\mu_j$ is the mean of the \textit{j}-th variable and $\sigma_j$ its standard deviation; the subscript $\cdot j$ refers to the whole \textit{j}-th column of the matrix.
+This operation prevents the measurement scale from affecting the final result (and is used by default, unless otherwise specified by the user, within all the techniques implemented by \code{drclust}). In order to avoid comparing potentially different objective functions, the between deviance (as described by the authors in the articles where the methods were proposed) has been used as a fit measure, computed from the output provided by the functions, so as to have a homogeneous evaluation metric.
+$\textit{K} = 3$ and $\textit{Q} = 2$ have been used for the clustering algorithms, keeping just $\textit{Q} = 2$ for the two dimensionality-reduction techniques.
+
+For each method, 100 runs have been performed and the best solution has been picked. For each run, the maximum allowed number of iterations was 100, with a tolerance (i.e., precision) of $10^{-6}$.
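As a quick check of the pre-processing in (\ref{eq:ztransform}), the column-wise \textit{z}-transform can be written in a few lines (a NumPy sketch mirroring R's \code{scale(X)}, which uses the sample standard deviation with denominator $n-1$):

```python
import numpy as np

def zscore(X):
    """Column-wise z-transform: subtract each column mean and divide by the
    column standard deviation (ddof=1, matching R's scale(X) default)."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
```

After this transform, every column has mean 0 and (sample) variance 1, so no variable dominates the distance computations because of its measurement scale.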
+
+\begin{table}[htbp]
+ \centering
+ \begin{tabular}{llllll}
+ \toprule
+ Library & Technique & Runtime & Fit & ARI & Fit Measure \\
+ \toprule
+ clustrd & RKM & 0.73 & 21.38 & 0.620 & $||\mathbf{U}\bar{\mathbf{Y}}\mathbf{A}'||^2$ \\
+ drclust & RKM & 0.01 & 21.78 & 0.620 & $||\mathbf{U}\bar{\mathbf{Y}}\mathbf{A}'||^2$ \vspace{0.15cm} \\
+ clustrd & FKM & 1.89 & 4.48 & 0.098 & $||\mathbf{U}\bar{\mathbf{Y}}||^2$ \\
+ drclust & FKM & 0.03 & 21.89 & 0.620 & $||\mathbf{U}\bar{\mathbf{Y}}||^2$ \vspace{0.15cm} \\
+ biplotbootGUI & CDPCA & 2.83 & 21.32 & 0.676 & $||\mathbf{U}\bar{\mathbf{Y}}\mathbf{A}'||^2$ \\
+ drclust & CDPCA & 0.05 & 21.34 & 0.676 & $||\mathbf{U}\bar{\mathbf{Y}}\mathbf{A}'||^2$ \vspace{0.15cm} \\
+ drclust & DKM & 0.03 & 21.29 & 0.652 & $||\mathbf{U}\bar{\mathbf{X}}\mathbf{H_V}||^2$ \vspace{0.15cm} \\
+ drclust & DPCA & <0.01 & 23.70 & - & $||\mathbf{Y}\mathbf{A}'||^2$ \vspace{0.15cm} \\
+ drclust & DFA & 1.11 & 55.91 & - & $||\mathbf{Y}\mathbf{A}'||^2$ \vspace{0.15cm} \\
+ \bottomrule
+ \end{tabular}
+ \caption{Performance of the variable reduction and joint clustering-variable reduction models}
+ \label{tab:comparison}
+\end{table}
+
+The results of Table \ref{tab:comparison} are visually represented in Figure \ref{fig:iriscomparison}.
+
+\begin{figure}[h]
+ \centering
+ \includegraphics[scale=0.45]{figures/1iriscomparison.png}
+ \caption{ARI, Fit, Runtime for the available implementations}
+ \label{fig:iriscomparison}
+\end{figure}
+
+Although runtimes heavily depend on the hardware characteristics, they have been reported in Table \ref{tab:comparison} for relative comparison purposes only, all the techniques having been run on the same hardware. For all the computations within the present work, the machine used is an Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz 2.00 GHz.
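The ARI values above compare the estimated partition with the true class labels. For reference, a generic way to compute the ARI from the confusion (contingency) matrix, following \citet{HubertArabie}, is sketched below; this is an independent illustration, not the package's \code{mrand} function:

```python
import numpy as np
from math import comb

def ari(labels_a, labels_b):
    """Adjusted Rand Index from the contingency matrix of two partitions
    (generic sketch following Hubert & Arabie, not drclust's mrand)."""
    a = np.unique(np.asarray(labels_a), return_inverse=True)[1]
    b = np.unique(np.asarray(labels_b), return_inverse=True)[1]
    n = len(a)
    # contingency (confusion) matrix between the two partitions
    C = np.zeros((a.max() + 1, b.max() + 1), dtype=int)
    for i, j in zip(a, b):
        C[i, j] += 1
    sum_ij = sum(comb(int(x), 2) for x in C.ravel())   # pairs together in both
    sum_a = sum(comb(int(x), 2) for x in C.sum(1))     # pairs together in A
    sum_b = sum(comb(int(x), 2) for x in C.sum(0))     # pairs together in B
    expected = sum_a * sum_b / comb(n, 2)              # chance agreement
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:                          # degenerate partitions
        return 1.0
    return (sum_ij - expected) / (max_index - expected)
```

A value of 1 indicates identical partitions (up to label switching), 0 the agreement expected by chance, and negative values worse-than-chance agreement, as described in the simulation section.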
+
+Besides the already mentioned difference between DPCA and DFA, it is worth mentioning that, in terms of implementation, they retrieve the latent variables differently: while DPCA relies on the eigendecomposition, DFA uses an implementation of the power method \citep{hotelling1933}.
+
+In essence, our implementation, while being very fast, exhibits a goodness of fit very close (and sometimes superior) to that of the available alternatives.
+
+\section{Simulation study}
+To better understand the capabilities of the proposed methodologies and evaluate the performance of the \CRANpkg{drclust} package, a simulation study was conducted. In this study, we assume that the number of clusters ($K$) and the number of factors ($Q$) are known, and we examine how results vary across the DKM, RKM, FKM, and DPCAKM methods.
+\subsection{Data generation process}
+The performance of these algorithms is tested on synthetic data generated through a specific procedure. Initially, centroids are created using an eigendecomposition of a transformed distance matrix, resulting in three equidistant centroids in a reduced two-dimensional space. To model the variances and covariances among the generated units within each cluster and to introduce heterogeneity among the units, a variance-covariance matrix ($\Sigma_O$) is derived from samples taken from a zero-mean Gaussian distribution with a specified standard deviation ($\sigma_u$).
+
+Membership for the 1,000 units is determined based on a ($K \times 1$) vector of prior probabilities, utilizing a multinomial distribution with probabilities (0.2, 0.3, 0.5). For each unit, a sample is drawn from a multivariate Gaussian distribution centered around its corresponding centroid, using the previously generated covariance matrix ($\Sigma_O$). Additionally, four masking variables, which do not exhibit any clustering structure, are generated from a zero-mean multivariate Gaussian and scaled by a standard deviation of $\sigma_m = 6$.
These masking variables are added to the 2 variables that form the clustering structure of the dataset. The final sample dataset is then standardized.
+
+It is important to note that the standard deviation $\sigma_u$ controls the amount of variance in the reduced space, thus influencing the level of subspace residuals. Conversely, $\sigma_m$ regulates the variance of the masking variables, impacting the complement residuals.
+
+This study considers scenarios with $J$ = 6 variables, $n$ = 1,000 units, $K$ = 3 clusters and $Q$ = 2 factors. We explore high, medium, and low within-cluster heterogeneity, with $\sigma_u$ values of 0.8, 0.55, and 0.3, respectively. For each combination of these parameters, $s$ = 100 samples are generated; since the design is fully crossed, a total of 300 datasets is produced. Examples of the generated samples are illustrated in Figure \ref{fig:sim123}, which shows that, as the level of within-cluster variance increases, the variables with a clustering structure tend to create overlapping clusters.
+Note that the two techniques dedicated solely to variable reduction, namely DPCA and DFA, were not included in the simulation study, because its primary focus is on clustering with dimension reduction and on the comparison with competing implementations. However, it is worth noting that these methods are inherently quick, as can be observed from the speed of the methodologies that combine clustering with the DPCA or DFA dimension reduction methods.
+
+\subsection{Performance evaluation}
+The performance of the proposed methods was assessed through a simulation study. To evaluate the accuracy in recovering the true cluster membership of the units (\textbf{U}), the ARI \citep{HubertArabie} was employed. The ARI quantifies the similarity between the hard partitions generated by the estimated classification matrices and those defined by the true partition.
It considers both the reference partition and the one produced by the algorithm under evaluation. The ARI typically ranges from 0 to 1, where 0 indicates the level of agreement expected by random chance and 1 denotes a perfect match; negative values may also occur, indicating agreement worse than what would be expected by chance.
+In order to assess the models' ability to reconstruct the underlying data structure, the between deviance, denoted by $f$, was computed. This measure is defined in the original works proposing the evaluated methods and is reported in the second column (Fit Measure) of Table \ref{tab:simulation}. For comparison, the true between deviance $f^{*}$, calculated from the known true values of \textbf{U} and \textbf{A}, was also computed. The difference $f^{*} - f$ was considered, where negative values suggest potential overfitting.
+Furthermore, the squared Frobenius norm $||\mathbf{A}^* - \mathbf{A}||^2$ was computed to assess how accurately each model estimated the true loading matrix $\mathbf{A}^*$. This evaluation was not applicable to the DKM method, as it does not provide estimates of the loading matrix. For each performance metric presented in Table \ref{tab:simulation}, the median value across the $s$ = 100 replicates, for each level of error (within deviance), is reported.
+
+It is important to note that fit and ARI reflect distinct objectives. While fit measures the variance explained by the model, the ARI assesses clustering accuracy. As such, the two metrics may diverge. A model may achieve high fit by capturing subtle variation or even noise, which may not correspond to well-separated clusters, leading to a lower ARI. Conversely, a method focused on maximizing cluster separation may yield high ARI while explaining less overall variance. This trade-off is particularly relevant in unsupervised settings, where there is no external supervision to guide the balance between reconstruction and partitioning.
For this reason, we report both metrics to provide a more comprehensive assessment of model performance.
+
+
+\subsection{Algorithm performance and comparison with competing implementations}
+For each sample, the algorithms DKM, RKM, FKM, and DPCAKM are applied using 100 random starts, selecting the best solution. This significantly reduces the impact of local minima in the clustering and dimension reduction process. Figure \ref{fig:sim123} depicts the typical situation for each scenario (low, medium, high within-cluster variance).
+\begin{figure}[h!]
+ \centering
+ \includegraphics[scale = 0.45]{figures/2sim123.png}
+ \caption{Within-cluster variance of the simulated data (in order: low, medium, high)}
+ \label{fig:sim123}
+\end{figure}
+\begin{table}[h!!]
+\small
+ \centering
+ \begin{tabular}{llllllll}
+ \toprule
+ Technique & Fit Measure & Library & Runtime (s) & Fit & ARI & $f^* - f$ & $||\mathbf{A}^* - \mathbf{A}||^2$ \\
+ \toprule
+ \multicolumn{8}{c}{\textbf{Low}} \\
+ \midrule
+ RKM & $||\mathbf{U}\bar{\mathbf{X}}\mathbf{A}\mathbf{A}'||^2$ & clustrd & 164.03 & 42.76 & 1.00 & 0.00 & 2.00 \\
+ RKM & $||\mathbf{U}\bar{\mathbf{X}}\mathbf{A}\mathbf{A}'||^2$ & drclust & 0.48 & 42.76 & 1.00 & 0.00 & 2.00 \\
+ FKM & $||\mathbf{U}\bar{\mathbf{X}}\mathbf{A}||^2$ & clustrd & 15.48 & 2.89 & 0.35 & 39.77 & 1.99 \\
+ FKM & $||\mathbf{U}\bar{\mathbf{X}}\mathbf{A}||^2$ & drclust & 0.52 & 42.76 & 1.00 & 0.00 & 2.00 \\
+ DPCAKM & $||\mathbf{U}\bar{\mathbf{X}}\mathbf{A}||^2$ & biplotbootGUI & 41.70 & 42.74 & 1.00 & 0.01 & 2.00 \\
+ DPCAKM & $||\mathbf{U}\bar{\mathbf{X}}\mathbf{A}||^2$ & drclust & 1.37 & 42.74 & 1.00 & 0.01 & 2.00 \\
+ DKM & $||\mathbf{U}\bar{\mathbf{X}}\mathbf{V}||^2$ & drclust & 0.78 & 61.55 & 0.46 & -18.94 & - \\
+ \midrule
+ \multicolumn{8}{c}{\textbf{Medium}} \\
+ \midrule
+ RKM & $||\mathbf{U}\bar{\mathbf{X}}\mathbf{A}\mathbf{A}'||^2$ & clustrd & 230.31 & 39.18 & 0.92 & -0.27 & 2.00 \\
+ RKM & 
$||\mathbf{U}\bar{\mathbf{X}}\mathbf{A}\mathbf{A}'||^2$ & drclust & 0.70 & 39.18 & 0.92 & -0.27 & 2.00 \\
+ FKM & $||\mathbf{U}\bar{\mathbf{X}}\mathbf{A}||^2$ & clustrd & 14.31 & 2.85 & 0.28 & 36.09 & 1.99 \\
+ FKM & $||\mathbf{U}\bar{\mathbf{X}}\mathbf{A}||^2$ & drclust & 0.76 & 39.18 & 0.92 & -0.27 & 2.00 \\
+ DPCAKM & $||\mathbf{U}\bar{\mathbf{X}}\mathbf{A}||^2$ & biplotbootGUI & 47.76 & 39.15 & 0.92 & -0.25 & 2.00 \\
+ DPCAKM & $||\mathbf{U}\bar{\mathbf{X}}\mathbf{A}||^2$ & drclust & 1.64 & 39.15 & 0.92 & -0.25 & 2.00 \\
+ DKM & $||\mathbf{U}\bar{\mathbf{X}}\mathbf{V}||^2$ & drclust & 0.81 & 5.93 & 0.39 & -21.00 & - \\
+ \midrule
+ \multicolumn{8}{c}{\textbf{High}} \\
+ \midrule
+ RKM & $||\mathbf{U}\bar{\mathbf{X}}\mathbf{A}\mathbf{A}'||^2$ & clustrd & 314.89 & 36.61 & 0.62 & -2.11 & 2.00 \\
+ RKM & $||\mathbf{U}\bar{\mathbf{X}}\mathbf{A}\mathbf{A}'||^2$ & drclust & 0.94 & 36.61 & 0.61 & -2.11 & 2.00 \\
+ FKM & $||\mathbf{U}\bar{\mathbf{X}}\mathbf{A}||^2$ & clustrd & 13.87 & 2.90 & 0.19 & 31.55 & 2.00 \\
+ FKM & $||\mathbf{U}\bar{\mathbf{X}}\mathbf{A}||^2$ & drclust & 1.02 & 36.61 & 0.61 & -2.11 & 2.00 \\
+ DPCAKM & $||\mathbf{U}\bar{\mathbf{X}}\mathbf{A}||^2$ & biplotbootGUI & 55.49 & 36.53 & 0.64 & -1.99 & 2.00 \\
+ DPCAKM & $||\mathbf{U}\bar{\mathbf{X}}\mathbf{A}||^2$ & drclust & 2.06 & 36.53 & 0.63 & -2.01 & 2.00 \\
+ DKM & $||\mathbf{U}\bar{\mathbf{X}}\mathbf{V}||^2$ & drclust & 0.84 & 58.97 & 0.29 & -24.37 & - \\
+ \bottomrule
+ \end{tabular}
+ \caption{Comparison of joint clustering-variable reduction methods on simulated data}
+ \label{tab:simulation}
+\end{table}
+For the three scenarios, the results are reported in Table \ref{tab:simulation}.
+ \begin{figure}[h!!]
+ \centering
+ \includegraphics[scale=1]{figures/3Fit.png}
+ \caption{Boxplots of the Fit results in Table \ref{tab:simulation}}
+ \label{fig:simboxplots1}
+\end{figure}
+ \begin{figure}[h!!]
+  \centering
+  \includegraphics[scale=1]{figures/4ARI.png}
+  \caption{Boxplots of the ARI results in Table \ref{tab:simulation}}
+  \label{fig:simboxplots2}
+\end{figure}
+\begin{figure}[h!]
+  \centering
+  \includegraphics[scale=1]{figures/5fsf.png}
+  \caption{Boxplots of the $f^*-f$ results in Table \ref{tab:simulation}}
+  \label{fig:simboxplots3}
+\end{figure}
+\begin{figure}[h!]
+  \centering
+  \includegraphics[scale=1]{figures/6AsA.png}
+  \caption{Boxplots of the $\|\mathbf{A}^*-\mathbf{A}\|^2$ results in Table \ref{tab:simulation}}
+  \label{fig:simboxplots4}
+\end{figure}
+\begin{figure}[h!]
+  \centering
+  \includegraphics[scale=0.85]{figures/7runtime_RKM.png}
+  \caption{Boxplots of the runtime results in Table \ref{tab:simulation}, for the RKM}
+  \label{fig:simboxplots5}
+\end{figure}
+\begin{figure}[h!]
+  \centering
+  \includegraphics[scale=1]{figures/8runtime_others.png}
+  \caption{Boxplots of the runtime results in Table \ref{tab:simulation}, for DKM, DPCAKM, FKM}
+  \label{fig:simboxplots6}
+\end{figure}
+
+Regarding the RKM, the performance of \CRANpkg{drclust} and \CRANpkg{clustrd} is very close, both in terms of the ability to recover the data (fit) and in terms of identifying the true classification of the objects.
+
+The FKM performs considerably better in the \CRANpkg{drclust} implementation in terms of both fit and ARI. Considering both ARI and fit for the CDPCA algorithm, the difference between the present proposal and that of \CRANpkg{biplotbootGUI} is negligible. In terms of CPU runtime, all of the proposed models are significantly faster than the previously available ones (RKM, FKM and KM with DPCA). For the architecture used in the experiments, the order of magnitude of such differences is specified in the last column of Table \ref{tab:stat_models}.
+
+In general, \CRANpkg{drclust} shows slight overfitting, while there is no evident difference in the ability to recover the true \textbf{A}.
There is no alternative implementation of the DKM, so no comparison can be made. However, although its ARI is lower than that of the other techniques, its fit is very close, showing a compelling ability to reconstruct the data.
+In general, with the exception of the FKM, where our proposal outperforms the implementation available in \CRANpkg{clustrd}, the methods are comparable in terms of both fit and ARI. Nevertheless, our implementations consistently outperform all alternatives in terms of runtime.
+
+Figures \ref{fig:simboxplots1}--\ref{fig:simboxplots6} provide a visual summary of the results reported in Table~\ref{tab:simulation}, illustrating not only the central tendencies but also the variability across the 100 simulation replicates for each scenario.
+
+\section{Application on real data}
+
+The six statistical models implemented (Table \ref{tab:stat_models}) have a binary argument \code{print} which, if set to one, displays the main statistics at the end of the execution. The following examples show these results for the \code{macro} dataset used by \citet{vichi2001a} and made available in \CRANpkg{clustrd} \citep{markos2019}; the data have been standardized by setting the argument \code{prep=1}, which is the default for all the techniques. Moreover, the commands reported in each example do not specify all the arguments available for each function; for the remaining ones, the default values have been kept.
+
+The first example refers to the DKM \citep{maurizio2001a}.
As shown, the output contains the fit, expressed as the percentage of the total deviance (i.e., $||\mathbf{X}||^2$) captured by the between deviance of the model, implementing the fit measures in Table \ref{tab:comparison}. The second output is the centroid matrix $\bar{\mathbf{Y}}$, which describes the \textit{K} centroids in the \textit{Q}-dimensional space induced by the partition of the variables and the related variable means. What follows are the sizes and within deviances of each unit cluster and each variable cluster.
+Finally, it shows the pseudoF \citep{calinski1974} index, which is always computed for the partition of the units.
+Please note that the data matrix provided to each function implemented in the package needs to be in matrix format.
+
+\begin{example}
+# Macro dataset (Vichi & Kiers, 2001)
+library(clustrd)
+data(macro)
+macro <- as.matrix(macro)
+# DKM
+> dkm <- doublekm(X = macro, K = 5, Q = 3, print = 1)
+
+>> Variance Explained by the DKM (% BSS / TSS): 44.1039
+
+>> Centroid Matrix (Unit-centroids x Variable-centroids):
+
+ V-Clust 1 V-Clust 2 V-Clust 3
+U-Clust 1 0.1282052 -0.31086968 -0.4224182
+U-Clust 2 0.0406931 -0.08362029 0.9046692
+U-Clust 3 1.4321347 0.51191282 -0.7813761
+U-Clust 4 -0.9372541 0.22627768 0.1175189
+U-Clust 5 1.2221058 -2.59078258 -0.1660691
+
+>> Unit-clusters:
+
+ U-Clust 1 U-Clust 2 U-Clust 3 U-Clust 4 U-Clust 5
+Size 8 4 4 3 1
+Deviance 23.934373 31.737865 5.878199 4.844466 0.680442
+
+
+>> Variable-clusters:
+
+ V-Clust 1 V-Clust 2 V-Clust 3
+Size 3 2 1
+Deviance 40.832173 23.024249 3.218923
+
+>> pseudoF Statistic (Calinski-Harabasz): 2.23941
+\end{example}
+The second example shows as output the main quantities computed for \code{redkm} \citep{desoete1994}. Unlike the DKM, where the variable reduction is carried out via averages, the RKM performs it via PCA, leading to a better overall fit and also altering the final unit partition, as can be observed from the sizes and deviances.
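The contrast between the two reduction strategies can be stated compactly using the fit measures from Table \ref{tab:simulation}. The display below is our summary in the table's notation (where $\mathbf{V}$ denotes the binary variable-membership matrix and $\mathbf{A}$ a columnwise orthonormal loading matrix), not a formula taken from the package:

```latex
% DKM reduces variables by group averages, RKM by principal components:
\text{DKM:}\ \max_{\mathbf{U},\,\mathbf{V}}\; ||\mathbf{U}\bar{\mathbf{X}}\mathbf{V}||^2
\qquad\qquad
\text{RKM:}\ \max_{\mathbf{U},\,\mathbf{A}}\; ||\mathbf{U}\bar{\mathbf{X}}\mathbf{A}\mathbf{A}'||^2
```

The orthonormal projection $\mathbf{A}\mathbf{A}'$ can capture more between-cluster deviance than the averaging operator $\mathbf{V}$, which is consistent with the higher fit observed for the RKM.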
+
+Compared with the DKM example, the RKM additionally provides the loading matrix, which projects the \textit{J}-dimensional centroids into the \textit{Q}-dimensional subspace. Another important difference is the summary of the latent factors: this table shows the information captured by the principal components with respect to the original data. In this sense, the output allows one to distinguish between the loss due to the variable reduction (accounted for in this table) and the overall loss of the algorithm (which accounts for both the loss due to the reduction of the units and the loss due to the reduction of the variables, and is reported in the first line of the output).
+\begin{example}
+# RKM
+> rkm <- redkm(X = macro, K = 5, Q = 3, print = 1)
+
+>> Variance Explained by the RKM (% BSS / TSS): 55.0935
+
+>> Matrix of Centroids (Unit-centroids x Principal Components):
+
+ PC 1 PC 2 PC 3
+Clust 1 -1.3372534 -1.1457414 -0.6150841
+Clust 2 1.8834878 -0.0853912 -0.8907303
+Clust 3 0.5759906 0.4187003 0.3739608
+Clust 4 -0.9538864 1.2392976 0.3454186
+Clust 5 1.0417952 -2.2197178 3.0414445
+
+>> Unit-clusters:
+ Clust 1 Clust 2 Clust 3 Clust 4 Clust 5
+Size 5 5 5 4 1
+Deviance 26.204374 9.921313 11.231563 6.112386 0.418161
+
+>> Loading Matrix (Manifest Variables x Latent Variables):
+
+ PC 1 PC 2 PC 3
+GDP -0.5144915 -0.04436269 0.08985135
+LI -0.2346937 -0.01773811 -0.86115069
+UR -0.3529363 0.53044730 0.28002534
+IR -0.4065339 -0.42022401 -0.17016203
+TB 0.1975072 0.69145440 -0.36710245
+NNS 0.5927684 -0.24828525 -0.09062404
+
+>> Summary of the latent factors:
+
+ Explained Variance Expl. Var. (%) Cumulated Var. Cum. Var (%)
+PC 1 1.699343 28.322378 1.699343 28.322378
+PC 2 1.39612 23.268663 3.095462 51.591041
+PC 3 1.182372 19.706208 4.277835 71.297249
+
+>> pseudoF Statistic (Calinski-Harabasz): 4.29923
+\end{example}
+The \code{factkm} \citep{vichi2001a} has the same output structure as \code{redkm}.
It exhibits, for the same data and hyperparameters, a similar fit (overall and variable-wise). However, both the unit partition and the latent variables differ. This difference can be (at least partially) justified by the difference in the objective function, which is most evident in the assignment step.
+\begin{example}
+# factorial KM
+> fkm <- factkm(X = macro, K = 5, Q = 3, print = 1, rot = 1)
+
+>> Variance Explained by the FKM (% BSS / TSS): 55.7048
+
+>> Matrix of Centroids (Unit-centroids x Principal Components):
+
+ PC 1 PC 2 PC 3
+Clust 1 -0.7614810 2.16045496 -1.21025666
+Clust 2 1.1707159 -0.08840133 -0.29876729
+Clust 3 -0.9602731 -1.33141866 0.02370092
+Clust 4 1.0782934 1.17952330 3.59632116
+Clust 5 -1.7634699 0.65075735 0.46486440
+
+>> Unit-clusters:
+ Clust 1 Clust 2 Clust 3 Clust 4 Clust 5
+Size 9 5 3 2 1
+Deviance 6.390576 2.827047 5.018935 3.215995 0
+
+>> Loading Matrix (Manifest Variables x Latent Variables):
+
+ PC 1 PC 2 PC 3
+GDP -0.6515084 -0.1780021 0.37482509
+LI -0.3164139 0.1809559 -0.68284917
+UR -0.2944864 -0.5235492 0.01561022
+IR -0.3316254 0.5884434 -0.22101070
+TB 0.1848264 -0.5367239 -0.57166730
+NNS 0.4945307 0.1647067 0.13164438
+
+>> Summary of the latent factors:
+
+ Explained Variance Expl. Var. (%) Cumulated Var. Cum. Var (%)
+PC 1 1.68496 28.082675 1.68496 28.082675
+PC 2 1.450395 24.173243 3.135355 52.255917
+PC 3 1.079558 17.992635 4.214913 70.248552
+
+>> pseudoF Statistic (Calinski-Harabasz): 4.26936
+\end{example}
+\code{dpcakm} \citep{vichi2009} produces the same output structure as RKM and FKM. The partition of the variables, described by the $\mathbf{V}$ term in (\ref{cdpca4}) - (\ref{cdpca5}), is readable within the loading matrix, considering a $1$ for each non-zero value. For the \code{macro} dataset, the additional constraint $\mathbf{A} = \mathbf{B}\mathbf{V}$ does not cause a significant decrease in the objective function. The clusters, however, differ from the previous cases as well.
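The disjointness constraint just mentioned can be made explicit. The display below is our paraphrase of the structure referenced in (\ref{cdpca4}) - (\ref{cdpca5}), with $\mathbf{B}$ assumed to be a diagonal weighting matrix; it is not additional package output:

```latex
\mathbf{A} = \mathbf{B}\mathbf{V}, \qquad
\mathbf{V} \in \{0,1\}^{J \times Q}, \qquad
\sum_{q=1}^{Q} v_{jq} = 1 \quad (j = 1, \dots, J),
```

so each row of $\mathbf{A}$ has at most one non-zero entry, and the variable partition can be read directly from the positions of the non-zero loadings, as done in the example below.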
+\begin{example}
+# K-means DPCA
+> cdpca <- dpcakm(X = macro, K = 5, Q = 3, print = 1)
+
+>> Variance Explained by the DPCAKM (% BSS / TSS): 54.468
+
+>> Matrix of Centroids (Unit-centroids x Principal Components):
+
+ PC 1 PC 2 PC 3
+Clust 1 0.6717536 0.01042978 -2.7309458
+Clust 2 3.7343724 -1.18771685 0.6320673
+Clust 3 -0.6729575 -1.80822745 0.7239541
+Clust 4 -0.2496002 1.54537904 0.5263009
+Clust 5 -0.1269212 -0.12464388 -0.1748282
+
+>> Unit-clusters:
+ Clust 1 Clust 2 Clust 3 Clust 4 Clust 5
+Size 7 6 4 2 1
+Deviance 3.816917 2.369948 1.14249 4.90759 0
+
+>> Loading Matrix (Manifest Variables x Latent Variables):
+
+ PC 1 PC 2 PC 3
+GDP 0.5567605 0.0000000 0
+LI 0.0000000 0.7071068 0
+UR 0.5711396 0.0000000 0
+IR 0.0000000 0.0000000 1
+TB 0.0000000 0.7071068 0
+NNS -0.6031727 0.0000000 0
+
+>> Summary of the latent factors:
+ Explained Variance Expl. Var. (%) Cumulated Var. Cum. Var (%)
+PC 1 1 16.666667 1 16.666667
+PC 2 1.703964 28.399406 2.703964 45.066073
+PC 3 1.175965 19.599421 3.87993 64.665494
+
+>> pseudoF Statistic (Calinski-Harabasz): 3.26423
+\end{example}
+For \code{dispca} \citep{vichi2009}, the output is mostly similar to the ones already shown (except for the unit-clustering part). Nevertheless, because the focus here is exclusively on the variable reduction process, some additional information is reported in the summary of the latent factors. Indeed, because a single principal component summarises a subset of manifest variables, the variance of the second component of each subset, along with the \citet{cronbach1951} Alpha index, is computed, so that the user can judge whether the evidence supports this dimensionality-reduction strategy. As mentioned, this function, like those for the DPCAKM and the DFA, allows the user to constrain a subset of the \textit{J} variables to belong to the same cluster.
In the example that follows, the first two manifest variables are constrained to contribute to the same principal component (which is confirmed by the output \code{A}). Note that the manifest variables whose indices (column positions in the data matrix) correspond to the zeros in \code{constr} remain unconstrained.
+\begin{example}
+# DPCA
+# Impose GDP and LI to be in the same cluster
+> out <- dispca(X = macro, Q = 3, print = 1, constr = c(1,1,0,0,0,0))
+
+>> Variance explained by the DPCA (% BSS / TSS)= 63.9645
+
+>> Loading Matrix (Manifest Variables x Latent variables)
+
+ PC 1 PC 2 PC 3
+GDP 0.0000000 0.0000000 0.7071068
+LI 0.0000000 0.0000000 0.7071068
+UR -0.7071068 0.0000000 0.0000000
+IR 0.0000000 -0.7071068 0.0000000
+TB 0.0000000 0.7071068 0.0000000
+NNS 0.7071068 0.0000000 0.0000000
+
+>> Summary of the latent factors:
+ Explained Variance Expl. Var. (%) Cumulated Var.
+PC 1 1.388294 23.13824 1.388294
+PC 2 1.364232 22.73721 2.752527
+PC 3 1.085341 18.08902 3.837868
+ Cum. Var (%) Var. 2nd component Cronbach's Alpha
+PC 1 23.13824 0.6117058 -1.269545
+PC 2 45.87544 0.6357675 -1.145804
+PC 3 63.96447 0.9146585 0.157262
+\end{example}
+The \code{disfa} \citep{vichi2017}, by assuming an underlying probabilistic model, also allows additional evaluation metrics and statistics. The overall objective function is not directly comparable with the other ones, and is expressed in absolute terms (not relative, as in the previous cases). The $\chi^2$ (\code{X2}), along with \code{BIC}, \code{AIC} and \code{RMSEA}, allows a robust evaluation of the results in terms of fit/parsimony. In addition to the DPCA output, for each variable the function displays the communality with the factors, providing a standard error as well as an associated \textit{p}-value for the estimate.
+
+By comparing the loading matrix of the DPCA with that of the DFA, it is possible to assess the similarity in terms of latent variables.
Part of the difference can be justified (besides the well-known distinctions between PCA and FA) by the method used to compute each factor. While in all the previous cases the eigendecomposition has been employed for this purpose, the DFA makes use of the power iteration method for the computation of the loading matrix \citep{hotelling1933}.
+\begin{example}
+# disjoint FA
+> out <- disfa(X = macro, Q = 3, print = 1)
+>> Discrepancy of DFA: 0.296499
+
+>> Summary statistics:
+
+ Unknown Parameters Chi-square Degrees of Freedom BIC
+ 9 4.447531 12 174.048102
+ AIC RMSEA
+ 165.086511 0.157189
+
+>> Loading Matrix (Manifest Variables x Latent Variables)
+
+ Factor 1 Factor 2 Factor 3
+GDP 0.5318618 0 0.0000000
+LI 0.0000000 1 0.0000000
+UR 0.5668542 0 0.0000000
+IR 0.0000000 0 0.6035160
+TB 0.0000000 0 -0.6035152
+NNS -0.6849942 0 0.0000000
+
+>> Summary of the latent factors:
+
+ Explained Variance Expl. Var. (%) Cum. Var Cum. Var (%)
+Factor 1 1.0734177 17.89029 1.073418 17.89029
+Factor 2 1.0000000 16.66667 2.073418 34.55696
+Factor 3 0.7284622 12.14104 2.801880 46.69800
+ Var. 2nd component Cronbach's Alpha
+Factor 1 0.7001954 -0.6451803
+Factor 2 0.0000000 1.0000000
+Factor 3 0.6357675 -1.1458039
+
+>> Detailed Manifest-variable - Latent-factor relationships
+
+ Associated Factor Corr. Coeff. Std. Error Pr(p>|Z|)
+GDP 1 0.5318618 0.1893572 0.0157923335
+LI 2 1.0000000 0.0000000 0.0000000000
+UR 1 0.5668542 0.1842113 0.0091557523
+IR 3 0.6035160 0.1782931 0.0048411219
+TB 3 -0.6035152 0.1782932 0.0048411997
+NNS 1 -0.6849942 0.1629084 0.0008606488
+ Var. Error Communality
+GDP 0.7171230 0.2828770
+LI 0.0000000 1.0000000
+UR 0.6786764 0.3213236
+IR 0.6357684 0.3642316
+TB 0.6357695 0.3642305
+NNS 0.5307830 0.4692170
+\end{example}
+In practice, the \code{K} and \code{Q} hyper-parameters are usually not known a priori.
In such cases, a possible tool for investigating plausible values of \code{Q} is the Kaiser criterion \citep{kaiser1960}. Its implementation, \code{kaiserCrit}, takes the dataset as its single argument and outputs a message, as well as a scalar indicating the optimal number of components according to this rule.
+\begin{example}
+# Kaiser criterion for the choice of Q, the number of latent components
+> kaiserCrit(X = macro)
+
+The number of components suggested by the Kaiser criterion is: 3
+\end{example}
+For selecting the number of clusters, \code{K}, one of the most commonly used indices is the \textit{pseudoF} statistic, which, however, tends to underestimate the optimal number of clusters. To address this limitation, a ``relaxed'' version, referred to as \code{apseudoF}, has been implemented.
+The \code{apseudoF} procedure computes the standard \code{pseudoF} index over a range of possible values up to \code{maxK}. If a higher value of \code{K} yields a pseudoF within \code{tol} $\cdot$ pseudoF of the maximum value attained by the plain pseudoF, then \code{apseudoF} selects this alternative \code{K} as the optimal number of clusters. Additionally, it generates a plot of the pseudoF values computed across the specified \textit{K} range.
+Given the hybrid nature of the proposed methods, the function also requires specifying the clustering model to be used: 1 = \code{doublekm}, 2 = \code{redkm}, 3 = \code{factkm}, 4 = \code{dpcakm}. Furthermore, the number of components, \code{Q}, must be provided, as it also influences the final quality of the resulting partition.
+\begin{example}
+> apseudoF(X = macro, maxK=10, tol = 0.05, model = 2, Q = 3)
+The optimal number of clusters based on the pseudoF criterion is: 5
+\end{example}
+\begin{figure}[h!]
+  \centering
+  \includegraphics[width=0.6\textwidth]{figures/9pFfactkm.png}
+  \caption{Interval-pseudoF polygonal chain}
+  \label{fig:pF fkm}
+\end{figure}
+
+While this index was conceived for one-mode clustering methods, \citet{rocci2008} extended it to two-mode clustering methods, allowing it to be applied to methods like \code{doublekm}. The \code{dpseudoF} function implements this extension; besides the dataset, one provides the maximum \code{K} and \code{Q} values.
+\begin{example}
+> dpseudoF(X = macro, maxK = 10, maxQ = 5)
+ Q = 2 Q = 3 Q = 4 Q = 5
+K = 2 38.666667 22.800000 16.000000 12.222222
+K = 3 22.800000 13.875000 9.818182 7.500000
+K = 4 16.000000 9.818182 6.933333 5.263158
+K = 5 12.222222 7.500000 5.263158 3.958333
+K = 6 9.818182 6.000000 4.173913 3.103448
+K = 7 8.153846 4.950000 3.407407 2.500000
+K = 8 6.933333 4.173913 2.838710 2.051282
+K = 9 6.000000 3.576923 2.400000 1.704545
+K = 10 5.263158 3.103448 2.051282 1.428571
+\end{example}
+Here, the indices of the maximum value within the matrix are chosen as the best \code{Q} and \code{K} values.
+
+By providing the centroid matrix alone, one can check how the centroids are related. Such information is usually not provided by partitive clustering methods, but rather by hierarchical ones. Nevertheless, it is always possible to construct a distance matrix based on the centroids and represent it via a dendrogram, using an arbitrary distance. The \code{centree} function does exactly this, using the \citet{ward1963} distance, which corresponds to the squared Euclidean one. In practice, one provides as an argument the output of one of the four methods performing clustering.
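The dissimilarity underlying this dendrogram is easy to sketch. The following standalone Python fragment is illustrative only — the centroid values are made up, standing in for the centroid matrix returned by one of the clustering functions — and shows the squared Euclidean distances on which the Ward-based dendrogram is built:

```python
# Hypothetical centroid matrix: K = 3 centroids in a Q = 2 dimensional subspace.
centroids = [[0.0, 0.0], [3.0, 4.0], [0.0, 1.0]]

def sqdist(a, b):
    """Squared Euclidean distance between two centroids."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

# Centroid-by-centroid dissimilarity matrix fed to the dendrogram construction.
D = [[sqdist(a, b) for b in centroids] for a in centroids]
print(D[0][1], D[0][2])  # 25.0 1.0
```

A hierarchical agglomeration (e.g. Ward linkage) applied to this matrix then yields the tree displayed in the example below.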
+\begin{example}
+> out <- factkm(X = macro, K = 10, Q = 3)
+> centree(drclust_out = out)
+\end{example}
+\begin{figure}[h]
+  \centering
+  \includegraphics[width=0.7\textwidth]{figures/10centreedpca10m.png}
+  \caption{Dendrogram of a 10-centroid solution}
+  \label{fig:centree dpca10m}
+\end{figure}
+If, instead, one wants to visually assess the quality of the obtained partition, there is another instrument typically used for this purpose. The silhouette \citep{ROUSSEEUW198753} summarizes this quality numerically and also allows it to be represented graphically. By employing \CRANpkg{cluster} for the computational part and \CRANpkg{factoextra} for the graphical part, \code{silhouette} takes as arguments the output of one of the four \CRANpkg{drclust} clustering methods and the dataset, returning the results of the two functions with just one command.
+\begin{example}
+# Note: The same data must be provided to dpcakm and silhouette
+> out <- dpcakm(X = macro, K = 5, Q = 3)
+> silhouette(X = macro, drclust_out = out)
+\end{example}
+\begin{figure}[h!]
+\small
+  \centering
+  \includegraphics[width=0.7\textwidth]{figures/11silhouettek5q3.png}
+  \caption{Silhouette of a DPCA KM solution}
+  \label{silhouette dpcakm}
+\end{figure}
+As can be seen in Figure \ref{silhouette dpcakm}, the average silhouette width is also displayed as a scalar above the plot.
+
+A purely graphical tool used to assess the dis/homogeneity of the groups is the heatmap. By employing the \CRANpkg{pheatmap} library \citep{kolde2019} and the result of \code{doublekm}, \code{redkm}, \code{factkm} or \code{dpcakm}, the function orders the observations within each cluster in ascending order of their distance to the cluster to which they have been assigned. After doing so for each group, the groups themselves are sorted by the distance between their centroid and the grand mean (i.e., the mean of all observations). The \code{heatm} function produces this result.
Figure \ref{fig:heatmap dkm} shows its graphical output.
+\begin{example}
+# Note: The same data must be provided to doublekm and heatm
+> out <- doublekm(X = macro, K = 5, Q = 3)
+> heatm(X = macro, drclust_out = out)
+\end{example}
+\begin{figure}[h!]
+\small
+  \centering
+  \includegraphics[width=0.7\textwidth]{figures/12dkmk5q3heatm.png}
+  \caption{Heatmap of a double-KM solution}
+  \label{fig:heatmap dkm}
+\end{figure}
+Biplots and parallel coordinates plots can be obtained from the output of the techniques in the proposed package by means of a few instructions, using libraries available on CRAN such as \CRANpkg{ggplot2} \citep{ggplot2}, \code{grid} (which is now a base package), \CRANpkg{dplyr} \citep{dplyr} and \CRANpkg{GGally} \citep{ggally}. Therefore, the user can easily visualize the subspaces provided by the statistical techniques. In future versions of the package, the two functions will be available as built-in functions. Currently, for the biplot, we have:
+\begin{example}
+library(ggplot2)
+library(grid)
+library(dplyr)
+
+out <- factkm(macro, K = 2, Q = 2, Rndstart = 100)
+
+# Prepare data
+Y <- as.data.frame(macro %*% out$A); colnames(Y) <- c("Dim1", "Dim2")
+Y$cluster <- as.factor(cluster(out$U))
+
+arrow_scale <- 5
+A <- as.data.frame(out$A)[, 1:2] * arrow_scale
+colnames(A) <- c("PC1", "PC2")
+A$var <- colnames(macro)
+
+# Axis limits
+lims <- range(c(Y$Dim1, Y$Dim2, A$PC1, A$PC2)) * 1.2
+
+# Circle
+circle <- data.frame(x = cos(seq(0, 2*pi, length.out = 200)) * arrow_scale,
+                     y = sin(seq(0, 2*pi, length.out = 200)) * arrow_scale)
+
+ggplot(Y, aes(x = Dim1, y = Dim2, color = cluster)) +
+  geom_point(size = 2) +
+  geom_segment(
+    data = A, aes(x = 0, y = 0, xend = PC1, yend = PC2),
+    arrow = arrow(length = unit(0.2, "cm")), inherit.aes = FALSE, color = "gray40"
+  ) +
+  geom_text(
+    data = A, aes(x = PC1, y = PC2, label = var), inherit.aes = FALSE,
+    hjust = 1.1, vjust = 1.1, size = 3
+  ) +
+  geom_path(data = circle, aes(x
= x, y = y), inherit.aes = FALSE,
+            linetype = "dashed", color = "gray70") +
+  coord_fixed(xlim = lims, ylim = lims) +
+  labs(x = "Component 1", y = "Component 2", title = "Biplot") +
+  theme_minimal()
+\end{example}
+which leads to the result shown in Figure \ref{boxplot1}.
+\begin{figure}[h!]
+\small
+  \centering
+  \includegraphics[width=0.7\textwidth]{figures/13biplot.png}
+  \caption{Biplot of a FKM solution}
+  \label{boxplot1}
+\end{figure}
+
+By using the essential information in the output provided by \code{factkm}, we are able to see the cluster of each observation, represented in the estimated subspace induced by $\mathbf{A}$, as well as the relationships between observed and latent variables via the arrows.
+
+In order to obtain the parallel coordinates plot, a few instructions are sufficient, based on the same data preparation as a starting point. Note that, after re-estimating the model, the projected data \code{Y} must be rebuilt for the new solution.
+\begin{example}
+library(GGally)
+out <- factkm(macro, K = 3, Q = 2, Rndstart = 100)
+# Rebuild the projected data and cluster labels for the new solution
+Y <- as.data.frame(macro %*% out$A); colnames(Y) <- c("Dim1", "Dim2")
+Y$cluster <- as.factor(cluster(out$U))
+ggparcoord(
+  data = Y, columns = 1:(ncol(Y)-1),
+  groupColumn = "cluster", scale = "uniminmax",
+  showPoints = FALSE, alphaLines = 0.5
+) +
+  theme_minimal() +
+  labs(title = "Parallel Coordinate Plot",
+       x = "Variables", y = "Normalized Value")
+\end{example}
+For the FKM applied to the \code{macro} dataset, the output is reported in Figure \ref{parcoord}.
+\begin{figure}[h]
+\small
+  \centering
+  \includegraphics[width=0.7\textwidth]{figures/14parcoord.png}
+  \caption{Parallel coordinates plot of a FKM solution}
+  \label{parcoord}
+\end{figure}
+
+
+
+\section{Conclusions}\label{Conclusions}
+This work presents an R library that implements techniques of joint dimensionality reduction and clustering. Some of them are already available in other packages. In general, the performance of the proposed implementations is very close to that of the earlier ones, except for the FKM, where the new implementation is consistently better on the metrics considered here.
As an element of novelty, the empty-cluster issue that may occur in the estimation process has been addressed by applying 2-means on the cluster with the highest deviance, preserving the monotonicity of the algorithm and providing slightly better results, at a higher computational cost.
+
+The implementations of the two dimensionality-reduction methods, \code{dispca} and \code{disfa}, as well as \code{doublekm}, are novel in the sense that they have no previous implementation in R. Besides the methodological difference between these last two, the latent variables are computed differently: the former uses the well-known eigendecomposition, while the latter adopts the power method. In general, by implementing all the models in C/C++, a remarkable speed advantage has been obtained over all the existing implementations.
+These improvements allow the techniques to be applied to relatively large datasets, obtaining results in a reasonable amount of time. Some additional functions have been implemented to help choose the hyperparameter values; they can also be used to assess the quality of the results provided by the implementations.
+% +% +% \section*{References} +\bibliography{prunila-vichi} + +\address{Ionel Prunila\\ + Department of Statistical Sciences, Sapienza University of Rome\\ + P.le Aldo Moro 5, 00185 Rome\\ + Italy\\ + ORCiD: 0009-0009-3773-0481\\ +\email{ionel.prunila@uniroma1.it}} +% +\vspace{0.5cm} +% +\address{Maurizio Vichi\\ + Department of Statistical Sciences, Sapienza University of Rome\\ + P.le Aldo Moro 5, 00185 Rome\\ + Italy\\ + ORCiD: 0000-0002-3876-444X\\ +\email{maurizio.vichi@uniroma1.it}} diff --git a/_articles/RJ-2025-046/prunila-vichi_files/figure-html5/unnamed-chunk-1-1.png b/_articles/RJ-2025-046/prunila-vichi_files/figure-html5/unnamed-chunk-1-1.png new file mode 100644 index 0000000000..cd21f90d76 Binary files /dev/null and b/_articles/RJ-2025-046/prunila-vichi_files/figure-html5/unnamed-chunk-1-1.png differ diff --git a/_issues/.DS_Store b/_issues/.DS_Store deleted file mode 100644 index 9573b4326c..0000000000 Binary files a/_issues/.DS_Store and /dev/null differ diff --git a/_issues/2025-4/2025-4.Rmd b/_issues/2025-4/2025-4.Rmd new file mode 100644 index 0000000000..ed3e4d977d --- /dev/null +++ b/_issues/2025-4/2025-4.Rmd @@ -0,0 +1,84 @@ +--- +title: Volume 17/4 +description: Articles published in the December 2025 issue +draft: no +date: '2025-12-01' +volume: 17 +issue: 4 +editors: + chief: + - name: Rob J Hyndman + affiliation: Monash University + country: Australia + executive: + - name: Mark van der Loo + affiliation: Statistics Netherlands and Leiden University + country: Netherlands + - name: Emi Tanaka + affiliation: Australian National University + country: Australia + - name: Emily Zabor + affiliation: Cleveland Clinic + country: United States + technical: + - name: Mitchell O'Hara-Wild + affiliation: Monash University + country: Australia +articles: + before: ~ + after: ~ +output: + rjtools::rjournal_pdf_issue: + render_all: yes + rjtools::rjournal_web_issue: default +news: +- RJ-2025-4-bioconductor +- RJ-2025-4-cran +- RJ-2025-4-rforwards +- 
RJ-2025-4-rfoundation + +--- + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +  diff --git a/_issues/2025-4/2025-4.html b/_issues/2025-4/2025-4.html new file mode 100644 index 0000000000..4a37bce980 --- /dev/null +++ b/_issues/2025-4/2025-4.html @@ -0,0 +1,1790 @@ + + + + + + + + + + + + + + + + + + + + + + + + Volume 17/4 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
    +

    Volume 17/4

    + + + +

    Articles published in the December 2025 issue

    +
    + + +
    +

     

    +
    +

    Complete issue + +

    +

    Table of contents

    +

    Editorial
    Rob J Hyndman 3

    +

    Contributed Research Articles

    +

    csurvey: Implementing Order Constraints in Survey Data Analysis
    Xiyue Liao and Mary C. Meyer 4

    +

    LHD: An All-encompassing R Package for Constructing Optimal Latin Hypercube Designs
    Hongzhi Wang, Qian Xiao and Abhyuday Mandal 20

    +

    longevity: An R Package for Modelling Excess Lifetimes
    Léo R. Belzile 37

    +

    movieROC: Visualizing the Decision Rules Underlying Binary Classification
    Sonia Pérez-Fernández, Pablo Martínez-Camblor and Norberto Corral-Blanco 59

    +

    dtComb: A Comprehensive R Library and Web Tool for Combining Diagnostic Tests
    S. Ilayda Yerlitaş Taştan, Serra Bersan Gengeç, Necla Koçhan, Ertürk Zararsız, Selçuk Korkmaz and Zararsız 80

    +

    drclust: An R Package for Simultaneous Clustering and Dimensionality Reduction
    Ionel Prunila and Maurizio Vichi 103

    +

    AcceptReject: An R Package for Acceptance-Rejection Method
    Pedro Rafael Diniz Marinho and Vera L. D. Tomazella 133

    +

    qCBA: An R Package for Postoptimization of Rule Models Learnt on Quantized Data
    Tomas Kliegr 156

    +

    simlandr: Simulation-Based Landscape Construction for Dynamical Systems
    Jingmeng Cui, Merlijn Olthof, Anna Lichtwarck-Aschoff, Tiejun Li and Fred Hasselman 173

    +

    rvif: a Decision Rule to Detect Troubling Statistical Multicollinearity Based on Redefined VIF
    Román Salmerón-Gómez and Catalina B. García-García 192

    +

    ASML: An R Package for Algorithm Selection with Machine Learning
    Ignacio Gómez-Casares, Beatriz Pateiro-López, Brais González-Rodríguez and Julio González-Díaz 216

    +

    elhmc: An R Package for Hamiltonian Monte Carlo Sampling in Bayesian Empirical Likelihood
    Neo Han Wei, Dang Trung Kien and Sanjay Chaudhuri 237

    +

    GeRnika: An R Package for the Simulation, Visualization and Comparison of Tumor Phylogenies
    Aitor Sánchez-Ferrera, Maitena Tellaetxe-Abete and Borja Calvo-Molinos 255

    +

    MultiATSM: An R Package for Arbitrage-Free Macrofinance Multicountry Affine Term Structure Models
    Rubens Moura 275

    +

    memshare: Memory Sharing for Multicore Computation in R with an Application to Feature Selection by Mutual Information using PDE
    Michael C. Thrun and Julian Märte 305

    +

    News and Notes

    +

    Bioconductor Notes, December 2025
    Maria Doyle and Bioconductor Core Developer Team 322

    +

    Changes on CRAN
    Kurt Hornik, Uwe Ligges and Achim Zeileis 326

    +

    News from the Forwards Taskforce
    Heather Turner 328

    +

    R Foundation News
    Torsten Hothorn 330

    + + +
    + +
    +
    + + + + + +
    + + + + + + + diff --git a/_issues/2025-4/2025-4.pdf b/_issues/2025-4/2025-4.pdf new file mode 100644 index 0000000000..df05c84951 Binary files /dev/null and b/_issues/2025-4/2025-4.pdf differ diff --git a/_issues/2025-4/RJournal.sty b/_issues/2025-4/RJournal.sty new file mode 100644 index 0000000000..351990be38 --- /dev/null +++ b/_issues/2025-4/RJournal.sty @@ -0,0 +1,358 @@ +% Package `RJournal' to use with LaTeX2e +% Copyright (C) 2010 by the R Foundation +% Copyright (C) 2013 by the R Journal +% +% Originally written by Kurt Hornik and Friedrich Leisch with subsequent +% edits by the editorial board +% +% CAUTION: +% Do not modify this style file. Any changes to this file will be reset when your +% article is submitted. +% If you must modify the style or add LaTeX packages to the article, these +% should be specified in RJwrapper.tex + +\NeedsTeXFormat{LaTeX2e}[1995/12/01] +\ProvidesPackage{RJournal}[2025/10/05 v0.17 RJournal package] + +\RequirePackage{tikz} + +% Overall page layout, fonts etc ----------------------------------------------- + +% Issues of of \emph{The R Journal} are created from the standard \LaTeX{} +% document class \pkg{report}. + +\RequirePackage{geometry} +\geometry{a4paper, + textwidth=14cm, top=1cm, bottom=1cm, + includehead,includefoot,centering, + footskip=1.5cm} +\raggedbottom +\sloppy +\clubpenalty = 10000 +\widowpenalty = 10000 +\brokenpenalty = 10000 +\usepackage{microtype} + + +\RequirePackage{fancyhdr} +\fancyhead{} +\fancyheadoffset{2cm} +\fancyhead[L]{\textsc{\RJ@sectionhead}} +\fancyhead[R]{\thepage} +\fancyfoot{} +\fancyfoot[L]{The R Journal Vol. 
\RJ@volume/\RJ@number, \RJ@month~\RJ@year} +\fancyfoot[R]{ISSN 2073-4859} +\pagestyle{fancy} + +% We use the following fonts (all with T1 encoding): +% +% rm & palatino +% tt & inconsolata +% sf & helvetica +% math & palatino + +\RequirePackage{microtype} + +\RequirePackage[scaled=0.92]{helvet} +\RequirePackage{palatino,mathpazo} +\RequirePackage[scaled=1.02]{inconsolata} +\RequirePackage[T1]{fontenc} + +\RequirePackage[hyphens]{url} +\RequirePackage[pagebackref]{hyperref} +\renewcommand{\backref}[1]{[p#1]} + +% Dark blue colour for all links +\RequirePackage{color} +\definecolor{link}{rgb}{0.45,0.51,0.67} +\hypersetup{ + colorlinks,% + citecolor=link,% + filecolor=link,% + linkcolor=link,% + urlcolor=link +} + +% Give the text a little room to breath +\setlength{\parskip}{3pt} +\RequirePackage{setspace} +\setstretch{1.05} + +% Issue and article metadata --------------------------------------------------- + +% Basic front matter information about the issue: volume, number, and +% date. + +\newcommand{\volume}[1]{\def\RJ@volume{#1}} +\newcommand{\volnumber}[1]{\def\RJ@number{#1}} +\renewcommand{\month}[1]{\def\RJ@month{#1}} +\renewcommand{\year}[1]{\def\RJ@year{#1}} + + +% Individual articles correspond to +% chapters, and are contained in |article| environments. This makes it +% easy to have figures counted within articles and hence hyperlinked +% correctly. + +% An article has an author, a title, and optionally a subtitle. We use +% the obvious commands for specifying these. Articles will be put in certain +% journal sections, named by \sectionhead. + +\newcommand {\sectionhead} [1]{\def\RJ@sectionhead{#1}} +\renewcommand{\author} [1]{\def\RJ@author{#1}} +\renewcommand{\title} [1]{\def\RJ@title{#1}} +\newcommand {\subtitle} [1]{\def\RJ@subtitle{#1}} + +% Control appearance of titles: make slightly smaller than usual, and +% suppress section numbering. 
See http://tex.stackexchange.com/questions/69749 +% for why we don't use \setcounter{secnumdepth}{-1} + +\usepackage[medium]{titlesec} +\usepackage{titletoc} +\titleformat{\section} {\normalfont\large\bfseries}{\arabic{section}}{1em}{} +\titleformat{\subsection}{\normalfont\normalsize\bfseries}{\arabic{section}.\arabic{subsection}}{0.5em}{} +\titlecontents{chapter} [0em]{}{}{}{\titlerule*[1em]{.}\contentspage} + +% Article layout --------------------------------------------------------------- + +% Environment |article| clears the article header information at its beginning. +% We use |\FloatBarrier| from the placeins package to keep floats within +% the article. +\RequirePackage{placeins} +\newenvironment{article}{\author{}\title{}\subtitle{}\FloatBarrier}{\FloatBarrier} + +% Refereed articles should have an abstract, so we redefine |\abstract| to +% give the desired style + +\renewcommand{\abstract}[1]{\noindent\textbf{Abstract} #1} +\renewenvironment{abstract}{\noindent\textbf{Abstract}~}{} + +% The real work is done by a redefined version of |\maketitle|. Note +% that even though we do not want chapters (articles) numbered, we +% need to increment the chapter counter, so that figures get correct +% labelling. + +\renewcommand{\maketitle}{% +\noindent + \chapter{\RJ@title}\refstepcounter{chapter} + \ifx\empty\RJ@subtitle + \else + \noindent\textbf{\RJ@subtitle} + \par\nobreak\addvspace{\baselineskip} + \fi + \ifx\empty\RJ@author + \else + \noindent\textit{\RJ@author} + \par\nobreak\addvspace{\baselineskip} + \fi + \@afterindentfalse\@nobreaktrue\@afterheading +} + +% Now for some ugly redefinitions. We do not want articles to start a +% new page. 
(Actually, we do, but this is handled via explicit +% \newpage +% +% The name@of@eq is a hack to get hyperlinks to equations to work +% within each article, even though there may be multiple eq.(1) +% \begin{macrocode} +\renewcommand\chapter{\secdef\RJ@chapter\@schapter} +\providecommand{\nohyphens}{% + \hyphenpenalty=10000\exhyphenpenalty=10000\relax} +\newcommand{\RJ@chapter}{% + \edef\name@of@eq{equation.\@arabic{\c@chapter}}% + \renewcommand{\@seccntformat}[1]{}% + \@startsection{chapter}{0}{0mm}{% + -2\baselineskip \@plus -\baselineskip \@minus -.2ex}{\p@}{% + \phantomsection\normalfont\huge\bfseries\raggedright}} + +% Book reviews should appear as sections in the text and in the pdf bookmarks, +% however we wish them to appear as chapters in the TOC. Thus we define an +% alternative to |\maketitle| for reviews. +\newcommand{\review}[1]{ + \pdfbookmark[1]{#1}{#1} + \section*{#1} + \addtocontents{toc}{\protect\contentsline{chapter}{#1}{\thepage}{#1.1}} +} + +% We want bibliographies as starred sections within articles. +% +\RequirePackage[sectionbib,round]{natbib} +\bibliographystyle{abbrvnat} +\renewcommand{\bibsection}{\section*{References}} + +% Equations, figures and tables are counted within articles, but we do +% not show the article number. For equations it becomes a bit messy to avoid +% having hyperref getting it wrong. + +% \numberwithin{equation}{chapter} +\renewcommand{\theequation}{\@arabic\c@equation} +\renewcommand{\thefigure}{\@arabic\c@figure} +\renewcommand{\thetable}{\@arabic\c@table} + +% Issue layout ----------------------------------------------------------------- + +% Need to provide our own version of |\tableofcontents|. We use the +% tikz package to get the rounded rectangle. Notice that |\section*| +% is really the same as |\chapter*|. 
+\renewcommand{\contentsname}{Contents} +\renewcommand\tableofcontents{% + \vspace{1cm} + \section*{\contentsname} + { \@starttoc{toc} } +} + +\renewcommand{\titlepage}{% + \thispagestyle{empty} + \hypersetup{ + pdftitle={The R Journal Volume \RJ@volume/\RJ@number, \RJ@month \RJ@year},% + pdfauthor={R Foundation for Statistical Computing},% + } + \noindent + \begin{center} + \fontsize{50pt}{50pt}\selectfont + The \raisebox{-8pt}{\includegraphics[height=77pt]{Rlogo-5}}\hspace{10pt} + Journal + + \end{center} + {\large \hfill Volume \RJ@volume/\RJ@number, \RJ@month{} \RJ@year \quad} + + \rule{\textwidth}{1pt} + \begin{center} + {\Large A peer-reviewed, open-access publication of the \\ + R Foundation for Statistical Computing} + \end{center} + + % And finally, put in the TOC box. Note the way |tocdepth| is adjusted + % before and after producing the TOC: thus, we can ensure that only + % articles show up in the printed TOC, but that in the PDF version, + % bookmarks are created for sections and subsections as well (provided + % that the non-starred forms are used). + \setcounter{tocdepth}{0} + \tableofcontents + \setcounter{tocdepth}{2} + \clearpage +} + +% Text formatting -------------------------------------------------------------- + +\newcommand{\R}{R} +\newcommand{\address}[1]{\addvspace{\baselineskip}\noindent\emph{#1}} +\newcommand{\email}[1]{\href{mailto:#1}{\normalfont\texttt{#1}}} + +% Simple font selection is not good enough. For example, |\texttt{--}| +% gives `\texttt{--}', i.e., an endash in typewriter font. Hence, we +% need to turn off ligatures, which currently only happens for commands +% |\code| and |\samp| and the ones derived from them. Hyphenation is +% another issue; it should really be turned off inside |\samp|. And +% most importantly, \LaTeX{} special characters are a nightmare. E.g., +% one needs |\~{}| to produce a tilde in a file name marked by |\file|. 
+% Perhaps a few years ago, most users would have agreed that this may be +% unfortunate but should not be changed to ensure consistency. But with +% the advent of the WWW and the need for getting `|~|' and `|#|' into +% URLs, commands which only treat the escape and grouping characters +% specially have gained acceptance + +\DeclareRobustCommand\code{\bgroup\@noligs\@codex} +\def\@codex#1{\texorpdfstring% +{{\normalfont\ttfamily\hyphenchar\font=-1 #1}}% +{#1}\egroup} +\newcommand{\kbd}[1]{{\normalfont\texttt{#1}}} +\newcommand{\key}[1]{{\normalfont\texttt{\uppercase{#1}}}} +\DeclareRobustCommand\samp{`\bgroup\@noligs\@sampx} +\def\@sampx#1{{\normalfont\texttt{#1}}\egroup'} +\newcommand{\var}[1]{{\normalfont\textsl{#1}}} +\let\env=\code +\newcommand{\file}[1]{{`\normalfont\textsf{#1}'}} +\let\command=\code +\let\option=\samp +\newcommand{\dfn}[1]{{\normalfont\textsl{#1}}} +% \acronym is effectively disabled since not used consistently +\newcommand{\acronym}[1]{#1} +\newcommand{\strong}[1]{\texorpdfstring% +{{\normalfont\fontseries{b}\selectfont #1}}% +{#1}} +\let\pkg=\strong +\newcommand{\CRANpkg}[1]{\href{https://CRAN.R-project.org/package=#1}{\pkg{#1}}}% +\let\cpkg=\CRANpkg +\newcommand{\ctv}[1]{\href{https://CRAN.R-project.org/view=#1}{\emph{#1}}} +\newcommand{\BIOpkg}[1]{\href{https://www.bioconductor.org/packages/release/bioc/html/#1.html}{\pkg{#1}}} + +% Example environments --------------------------------------------------------- +\RequirePackage{fancyvrb} +\RequirePackage{alltt} + +\DefineVerbatimEnvironment{example}{Verbatim}{} +\renewenvironment{example*}{\begin{alltt}}{\end{alltt}} + +% Support for output from Sweave, and generic session style code +% These used to have fontshape=sl for Sinput/Scode/Sin, but pslatex +% won't use a condensed font in that case. 
+ +% Update (2015-05-28 by DS): remove fontsize=\small to match example environment + +\DefineVerbatimEnvironment{Sinput}{Verbatim}{} +\DefineVerbatimEnvironment{Soutput}{Verbatim}{} +\DefineVerbatimEnvironment{Scode}{Verbatim}{} +\DefineVerbatimEnvironment{Sin}{Verbatim}{} +\DefineVerbatimEnvironment{Sout}{Verbatim}{} +\newenvironment{Schunk}{}{} + +% Mathematics ------------------------------------------------------------------ + +% The implementation of |\operatorname| is similar to the mechanism +% \LaTeXe{} uses for functions like sin and cos, and simpler than the +% one of \AmSLaTeX{}. We use |\providecommand| for the definition in +% order to keep the one of the \pkg{amstex} if this package has +% already been loaded. +% \begin{macrocode} +\providecommand{\operatorname}[1]{% + \mathop{\operator@font#1}\nolimits} +\RequirePackage{amsfonts} + +\renewcommand{\P}{% + \mathop{\operator@font I\hspace{-1.5pt}P\hspace{.13pt}}} +\newcommand{\E}{% + \mathop{\operator@font I\hspace{-1.5pt}E\hspace{.13pt}}} +\newcommand{\VAR}{\operatorname{var}} +\newcommand{\COV}{\operatorname{cov}} +\newcommand{\COR}{\operatorname{cor}} + +% Figures ---------------------------------------------------------------------- + +% For use with pandoc > 3.2.1 +\newsavebox\pandoc@box +\newcommand*\pandocbounded[1]{% scales image to fit in text height/width + \sbox\pandoc@box{#1}% + \Gscale@div\@tempa{\textheight}{\dimexpr\ht\pandoc@box+\dp\pandoc@box\relax}% + \Gscale@div\@tempb{\linewidth}{\wd\pandoc@box}% + \ifdim\@tempb\p@<\@tempa\p@\let\@tempa\@tempb\fi% select the smaller of both + \ifdim\@tempa\p@<\p@\scalebox{\@tempa}{\usebox\pandoc@box}% + \else\usebox{\pandoc@box}% + \fi% +} + +\RequirePackage[font=small,labelfont=bf]{caption} + +% Wide environments for figures and tables ------------------------------------- +\RequirePackage{environ} + +% An easy way to make a figure span the full width of the page +\NewEnviron{widefigure}[1][]{ +\begin{figure}[#1] +\advance\leftskip-2cm 
+\begin{minipage}{\dimexpr\textwidth+4cm\relax}% + \captionsetup{margin=2cm} + \BODY +\end{minipage}% +\end{figure} +} + +\NewEnviron{widetable}[1][]{ +\begin{table}[#1] +\advance\leftskip-2cm +\begin{minipage}{\dimexpr\textwidth+4cm\relax}% + \captionsetup{margin=2cm} + \BODY +\end{minipage}% +\end{table} +} diff --git a/_issues/2025-4/RJwrapper.tex b/_issues/2025-4/RJwrapper.tex new file mode 100644 index 0000000000..3ad0dbb186 --- /dev/null +++ b/_issues/2025-4/RJwrapper.tex @@ -0,0 +1,153 @@ +\documentclass[a4paper]{report} +\usepackage[utf8]{inputenc} +\usepackage[T1]{fontenc} +\usepackage{RJournal} +\usepackage{amsmath,amssymb,array} +\usepackage{booktabs} + +$if(highlighting-macros)$ +% Pandoc syntax highlighting +$highlighting-macros$ +$endif$ + +% tightlist command for lists without linebreak +\providecommand{\tightlist}{% + \setlength{\itemsep}{0pt}\setlength{\parskip}{0pt}} + +\usepackage{longtable} +$if(tables)$ +% From pandoc table feature +\usepackage{booktabs,array} +$if(multirow)$ +\usepackage{multirow} +$endif$ +\usepackage{calc} % for calculating minipage widths +% Correct order of tables after \paragraph or \subparagraph +\usepackage{etoolbox} +\makeatletter +\patchcmd\longtable{\par}{\if@noskipsec\mbox{}\fi\par}{}{} +\makeatother +% Allow footnotes in longtable head/foot +\IfFileExists{footnotehyper.sty}{\usepackage{footnotehyper}}{\usepackage{footnote}} +\makesavenoteenv{longtable} +$endif$ + +% Always define CSL refs as bib entries are contained in separate doc +% Pandoc citation processing +$if(pandoc318)$ +%From Pandoc 3.1.8 +% definitions for citeproc citations +\NewDocumentCommand\citeproctext{}{} +\NewDocumentCommand\citeproc{mm}{% + \begingroup\def\citeproctext{#2}\cite{#1}\endgroup} +\makeatletter + % allow citations to break across lines + \let\@cite@ofmt\@firstofone + % avoid brackets around text for \cite: + \def\@biblabel#1{} + \def\@cite#1#2{{#1\if@tempswa , #2\fi}} +\makeatother +\newlength{\cslhangindent} 
+\setlength{\cslhangindent}{1.5em} +\newlength{\csllabelwidth} +\setlength{\csllabelwidth}{3em} +\newenvironment{CSLReferences}[2] % #1 hanging-indent, #2 entry-spacing + {\begin{list}{}{% + \setlength{\itemindent}{0pt} + \setlength{\leftmargin}{0pt} + \setlength{\parsep}{0pt} + % turn on hanging indent if param 1 is 1 + \ifodd #1 + \setlength{\leftmargin}{\cslhangindent} + \setlength{\itemindent}{-1\cslhangindent} + \fi + % set entry spacing + \setlength{\itemsep}{#2\baselineskip}}} + {\end{list}} +$else$$if(pandoc317)$ +% definitions for citeproc citations +\NewDocumentCommand\citeproctext{}{} +\NewDocumentCommand\citeproc{mm}{% + \begingroup\def\citeproctext{#2}\cite{#1}\endgroup} +% avoid brackets around text for \cite: +\makeatletter + \def\@biblabel#1{} + \def\@cite#1#2{{#1\if@tempswa , #2\fi}} +\makeatother +\newlength{\cslhangindent} +\setlength{\cslhangindent}{1.5em} +\newlength{\csllabelwidth} +\setlength{\csllabelwidth}{3em} +\newlength{\cslentryspacing} +\setlength{\cslentryspacing}{0em} +\usepackage{enumitem} +\newlist{CSLReferences}{itemize}{1} +\setlist[CSLReferences]{label={}, + leftmargin=\cslhangindent, + itemindent=-1\cslhangindent, + parsep=\parskip, + itemsep=\cslentryspacing} +$else$ +\newlength{\cslhangindent} +\setlength{\cslhangindent}{1.5em} +\newlength{\csllabelwidth} +\setlength{\csllabelwidth}{3em} +\newlength{\cslentryspacingunit} % times entry-spacing +\setlength{\cslentryspacingunit}{\parskip} +% for Pandoc 2.8 to 2.10.1 +\newenvironment{cslreferences}% + {$if(csl-hanging-indent)$\setlength{\parindent}{0pt}% + \everypar{\setlength{\hangindent}{\cslhangindent}}\ignorespaces$endif$}% + {\par} +% For Pandoc 2.11+ +\newenvironment{CSLReferences}[2] % #1 hanging-ident, #2 entry spacing + {% don't indent paragraphs + \setlength{\parindent}{0pt} + % turn on hanging indent if param 1 is 1 + \ifodd #1 + \let\oldpar\par + \def\par{\hangindent=\cslhangindent\oldpar} + \fi + % set entry spacing + \setlength{\parskip}{#2\cslentryspacingunit} + }% 
+ {} +$endif$$endif$ +\usepackage{calc} +\newcommand{\CSLBlock}[1]{#1\hfill\break} +\newcommand{\CSLLeftMargin}[1]{\parbox[t]{\csllabelwidth}{#1}} +\newcommand{\CSLRightInline}[1]{\parbox[t]{\linewidth - \csllabelwidth}{#1}\break} +\newcommand{\CSLIndent}[1]{\hspace{\cslhangindent}#1} + + +$for(header-includes)$ +$header-includes$ +$endfor$ + +\begin{document} + +$for(include-before)$ +$include-before$ + +$endfor$ + +%% do not edit, for illustration only +\sectionhead{$journal.section$} +\volume{$volume$} +\volnumber{$issue$} +\year{$issueyear$} +\month{$issuemonth$} +$if(journal.firstpage)$ +\setcounter{page}{$journal.firstpage$} +$endif$ + +\begin{article} + \input{$filename$} +\end{article} + +$for(include-after)$ +$include-after$ + +$endfor$ + +\end{document} diff --git a/_issues/2025-4/Rlogo-5.png b/_issues/2025-4/Rlogo-5.png new file mode 100644 index 0000000000..077505788a Binary files /dev/null and b/_issues/2025-4/Rlogo-5.png differ diff --git a/_issues/2025-4/doi.xml b/_issues/2025-4/doi.xml new file mode 100644 index 0000000000..ef60f74b75 --- /dev/null +++ b/_issues/2025-4/doi.xml @@ -0,0 +1,629 @@ + + + + 20260224163042 + 20260224163042 + + R Foundation + R-foundation@r-project.org + + CrossRef + + + + + The R Journal + The R Journal + 2073-4859 + + + + + + + 12 + 01 + 2025 + + + 17 + + 4 + + + + + csurvey: Implementing Order Constraints in Survey Data Analysis + + + + Xiyue + Liao + San Diego State University + + + + Mary + C. 
Meyer + Colorado State University + + + + + 01 + 05 + 2026 + + + 4 + 19 + + + 10.32614/RJ-2025-032 + + + 10.32614/RJ-2025-032 + https://journal.r-project.org/articles/RJ-2025-032 + + + + + LHD: An All-encompassing R Package for Constructing Optimal Latin Hypercube Designs + + + + Hongzhi + Wang + [wanghongzhi.ut@gmail.com](wanghongzhi.ut@gmail.com){.uri} + + + + + Qian + Xiao + Department of Statistics, School of Mathematical Sciences, Shanghai +Jiao Tong University + + + + Abhyuday + Mandal + Department of Statistics, University of Georgia + + + + + 01 + 05 + 2026 + + + 20 + 36 + + + 10.32614/RJ-2025-033 + + + 10.32614/RJ-2025-033 + https://journal.r-project.org/articles/RJ-2025-033 + + + + + longevity: An R Package for Modelling Excess Lifetimes + + + + Léo + R. Belzile + Department of Decision Sciences, HEC Montréal + + + + + 01 + 06 + 2026 + + + 37 + 58 + + + 10.32614/RJ-2025-034 + + + 10.32614/RJ-2025-034 + https://journal.r-project.org/articles/RJ-2025-034 + + + + + movieROC: Visualizing the Decision Rules Underlying Binary Classification + + + + Sonia + Pérez-Fernández + Department of Statistics and Operations Research and Mathematics Didactics, University of Oviedo (Asturias, Spain) + + + + Pablo + Martínez-Camblor + Department of Anesthesiology, Geisel School of Medicine at Dartmouth (NH, USA); Faculty of Health Sciences, Universidad Autónoma de Chile (Chile) + + + + Norberto + Corral-Blanco + Department of Statistics and Operations Research and Mathematics Didactics, University of Oviedo (Asturias, Spain) + + + + + 02 + 18 + 2026 + + + 59 + 79 + + + 10.32614/RJ-2025-035 + + + 10.32614/RJ-2025-035 + https://journal.r-project.org/articles/RJ-2025-035 + + + + + dtComb: A Comprehensive R Library and Web Tool for Combining Diagnostic Tests + + + + S. 
+ Ilayda Yerlitaş Taştan + Department of Biostatistics + + + + Serra + Bersan Gengeç + Department of Biostatistics + + + + Necla + Koçhan + Department of Mathematics + + + + Ertürk + Zararsız + Department of Biostatistics + + + + Selçuk + Korkmaz + Department of Biostatistics + + + + Zararsız + + Department of Biostatistics + + + + + 02 + 04 + 2026 + + + 80 + 102 + + + 10.32614/RJ-2025-036 + + + 10.32614/RJ-2025-036 + https://journal.r-project.org/articles/RJ-2025-036 + + + + + drclust: An R Package for Simultaneous Clustering and Dimensionality Reduction + + + + Ionel + Prunila + Department of Statistical Sciences, Sapienza University of Rome + + + + Maurizio + Vichi + Department of Statistical Sciences, Sapienza University of Rome + + + + + 02 + 04 + 2026 + + + 103 + 132 + + + 10.32614/RJ-2025-046 + + + 10.32614/RJ-2025-046 + https://journal.r-project.org/articles/RJ-2025-046 + + + + + AcceptReject: An R Package for Acceptance-Rejection Method + + + + Pedro + Rafael Diniz Marinho + Federal University of Paraíba + + + + Vera + L. D. 
Tomazella + Federal University of São Carlos + + + + + 02 + 04 + 2026 + + + 133 + 155 + + + 10.32614/RJ-2025-037 + + + 10.32614/RJ-2025-037 + https://journal.r-project.org/articles/RJ-2025-037 + + + + + qCBA: An R Package for Postoptimization of Rule Models Learnt on Quantized Data + + + + Tomas + Kliegr + Prague University of Economics and Business + + + + + 02 + 04 + 2026 + + + 156 + 172 + + + 10.32614/RJ-2025-038 + + + 10.32614/RJ-2025-038 + https://journal.r-project.org/articles/RJ-2025-038 + + + + + simlandr: Simulation-Based Landscape Construction for Dynamical Systems + + + + Jingmeng + Cui + University of Groningen,Radboud UniversityUniversity of Groningen,Radboud University + + + + Merlijn + Olthof + University of Groningen,Radboud UniversityUniversity of Groningen,Radboud University + + + + Anna + Lichtwarck-Aschoff + University of Groningen,Radboud UniversityUniversity of Groningen,Radboud University + + + + Tiejun + Li + Peking University + + + + Fred + Hasselman + Radboud University + + + + + 02 + 13 + 2026 + + + 173 + 191 + + + 10.32614/RJ-2025-039 + + + 10.32614/RJ-2025-039 + https://journal.r-project.org/articles/RJ-2025-039 + + + + + rvif: a Decision Rule to Detect Troubling Statistical Multicollinearity Based on Redefined VIF + + + + Román + Salmerón-Gómez + University of Granada + + + + Catalina + B. 
García-García + University of Granada + + + + + 02 + 04 + 2026 + + + 192 + 215 + + + 10.32614/RJ-2025-040 + + + 10.32614/RJ-2025-040 + https://journal.r-project.org/articles/RJ-2025-040 + + + + + ASML: An R Package for Algorithm Selection with Machine Learning + + + + Ignacio + Gómez-Casares + Universidade de Santiago de Compostela + + + + Beatriz + Pateiro-López + Universidade de Santiago de Compostela + + + + Brais + González-Rodríguez + Universidade de Vigo + + + + Julio + González-Díaz + Universidade de Santiago de Compostela + + + + + 02 + 10 + 2026 + + + 216 + 236 + + + 10.32614/RJ-2025-045 + + + 10.32614/RJ-2025-045 + https://journal.r-project.org/articles/RJ-2025-045 + + + + + elhmc: An R Package for Hamiltonian Monte Carlo Sampling in Bayesian Empirical Likelihood + + + + Neo + Han Wei + Citibank, Singapore + + + + Dang + Trung Kien + Independent Consultant + + + + Sanjay + Chaudhuri + University of Nebraska-Lincoln + + + + + 01 + 06 + 2026 + + + 237 + 254 + + + 10.32614/RJ-2025-041 + + + 10.32614/RJ-2025-041 + https://journal.r-project.org/articles/RJ-2025-041 + + + + + GeRnika: An R Package for the Simulation, Visualization and Comparison of Tumor Phylogenies + + + + Aitor + Sánchez-Ferrera + Intelligent Systems Group, Computer Science Faculty, University of the +Basque Country + + + + Maitena + Tellaetxe-Abete + Intelligent Systems Group, Computer Science Faculty, University of the +Basque Country + + + + Borja + Calvo-Molinos + Intelligent Systems Group, Computer Science Faculty, University of the +Basque Country + + + + + 02 + 11 + 2026 + + + 255 + 274 + + + 10.32614/RJ-2025-042 + + + 10.32614/RJ-2025-042 + https://journal.r-project.org/articles/RJ-2025-042 + + + + + MultiATSM: An R Package for Arbitrage-Free Macrofinance Multicountry Affine Term Structure Models + + + + Rubens + Moura + Banco de Mexico + + + + + 02 + 10 + 2026 + + + 275 + 304 + + + 10.32614/RJ-2025-044 + + + 10.32614/RJ-2025-044 + https://journal.r-project.org/articles/RJ-2025-044 + 
+ + + + memshare: Memory Sharing for Multicore Computation in R with an Application to Feature Selection by Mutual Information using PDE + + + + Michael + C. Thrun + University of Marburg,IAP-GmbH Intelligent Analytics ProjectsUniversity of Marburg,IAP-GmbH Intelligent Analytics Projects + + + + Julian + Märte + IAP-GmbH Intelligent Analytics Projects + + + + + 02 + 04 + 2026 + + + 305 + 321 + + + 10.32614/RJ-2025-043 + + + 10.32614/RJ-2025-043 + https://journal.r-project.org/articles/RJ-2025-043 + + + + + diff --git a/_news/RJ-2025-4-bioconductor/RJ-2025-4-bioconductor.R b/_news/RJ-2025-4-bioconductor/RJ-2025-4-bioconductor.R new file mode 100644 index 0000000000..0f605083da --- /dev/null +++ b/_news/RJ-2025-4-bioconductor/RJ-2025-4-bioconductor.R @@ -0,0 +1,16 @@ +# Generated by `rjournal_pdf_article()` using `knitr::purl()`: do not edit by hand +# Please edit RJ-2025-4-bioconductor.Rmd to modify this file + +## ----lkdat,results="hide",echo=FALSE,message=FALSE---------------------------- +library(BiocPkgTools) +softli = biocPkgList(version="3.22") +expli = biocPkgList(version="3.22", repo="BioCexp") +worli = biocPkgList(version="3.22", repo="BioCworkflows") +annli = biocPkgList(version="3.22", repo="BioCann") +curR = "4.5" +nsoft = nrow(softli) +nexp = nrow(expli) +nanno = nrow(annli) +nwkfl = nrow(worli) +nbooks = 8 + diff --git a/_news/RJ-2025-4-bioconductor/RJ-2025-4-bioconductor.Rmd b/_news/RJ-2025-4-bioconductor/RJ-2025-4-bioconductor.Rmd new file mode 100644 index 0000000000..c95cd7d056 --- /dev/null +++ b/_news/RJ-2025-4-bioconductor/RJ-2025-4-bioconductor.Rmd @@ -0,0 +1,173 @@ +--- +title: Bioconductor Notes, December 2025 +date: '2025-12-01' +author: +- name: Maria Doyle + affiliation: University of Limerick + address: Bioconductor Community Manager +- name: Bioconductor Core Developer Team + affiliation: Dana-Farber Cancer Institute + address: Roswell Park Comprehensive Cancer Center, City University of New York, + Fred Hutchinson Cancer Research 
Center, Mass General Brigham +output: + rjtools::rjournal_article: + self_contained: yes + toc: no +volume: 17 +issue: 4 +slug: RJ-2025-4-bioconductor +draft: no +journal: + lastpage: 325 + firstpage: 322 + +--- + + +# Introduction + +[Bioconductor](https://bioconductor.org) provides +tools for the analysis and comprehension of high-throughput genomic +data. The project has entered its twenty-first year, with funding +for core development and infrastructure maintenance secured +through 2025 (NIH NHGRI 2U24HG004059). Additional support is provided +by NIH NCI, Chan-Zuckerberg Initiative, National Science Foundation, +Microsoft, and Amazon. In this news report, we give some updates on +core team and project activities. + +# Software + +```{r lkdat,results="hide",echo=FALSE,message=FALSE} +library(BiocPkgTools) +softli = biocPkgList(version="3.22") +expli = biocPkgList(version="3.22", repo="BioCexp") +worli = biocPkgList(version="3.22", repo="BioCworkflows") +annli = biocPkgList(version="3.22", repo="BioCann") +curR = "4.5" +nsoft = nrow(softli) +nexp = nrow(expli) +nanno = nrow(annli) +nwkfl = nrow(worli) +nbooks = 8 +``` + +Bioconductor [3.22](https://bioconductor.org/news/bioc_3_22_release/), released in October 2025, is now available. It is +compatible with R `r curR` and consists of `r nsoft` software packages, `r nexp` +experiment data packages, `r nanno` up-to-date annotation packages, `r nwkfl` +workflows, and `r nbooks` books. [Books](https://bioconductor.org/books/release/) are +built regularly from source, ensuring full reproducibility; an example is the +community-developed [Orchestrating Single-Cell Analysis with Bioconductor](https://bioconductor.org/books/release/OSCA/). + + +# Core Team and Infrastructure Updates + +Bioconductor has introduced new GPU infrastructure, funded by CZI EOSS 6, to support developers building GPU‑accelerated packages, including Nvidia GPU build nodes, GPU‑aware containers, and a new biocViews term. 
Maintainers can now opt into GPU software builds and mark packages as GPU‑optional or GPU‑required. See [the blog post](https://blog.bioconductor.org/posts/2025-10-10-gpus/) for more details. + +NEWS summaries for three contributed packages chosen at random from the 59 new software contributions are: + + + +- **dmGsea**: The R package dmGsea provides efficient gene set enrichment analysis specifically for DNA methylation data. It addresses key biases, including probe dependency and varying probe numbers per gene. The package supports Illumina 450K, EPIC, and mouse methylation arrays. Users can also apply it to other omics data by supplying custom probe-to-gene mapping annotations. dmGsea is flexible, fast, and well-suited for large-scale epigenomic studies. + +- **goatea**: Geneset Ordinal Association Test Enrichment Analysis (GOATEA) provides a ‘Shiny’ interface with interactive visualizations and utility functions for performing and exploring automated gene set enrichment analysis using the ‘GOAT’ package. ‘GOATEA’ is designed to support large-scale and user-friendly enrichment workflows across multiple gene lists and comparisons, with flexible plotting and output options. Visualizations pre-enrichment include interactive ‘Volcano’ and ‘UpSet’ (overlap) plots. Visualizations post-enrichment include interactive geneset dotplot, geneset treeplot, gene-effectsize heatmap, gene-geneset heatmap and ‘STRING’ database of protein-protein-interactions network graph. ‘GOAT’ reference: Frank Koopmans (2024) . + +- **igblastr**: The igblastr package provides functions to conveniently install and use a local IgBLAST installation from within R. IgBLAST is described at https://pubmed.ncbi.nlm.nih.gov/23671333/. Online IgBLAST: https://www.ncbi.nlm.nih.gov/igblast/. + +See the NEWS section in the [release announcement](https://bioconductor.org/news/bioc_3_22_release/) for +a complete account of changes throughout the ecosystem. 
+ + +# Community and Impact + +## Community Team Updates +Nicholas Cooley has joined the Bioconductor Community Team as Developer Engagement Lead, based at the University of Limerick in Ireland. Working with the Community Manager, his role focuses on supporting package developers, improving onboarding resources, and strengthening connections across the developer community. Nick is funded through [CZI EOSS 6](https://blog.bioconductor.org/posts/2024-07-12-czi-eoss6-grants/). + +Laurah Ondari has also joined the team part‑time, based at the International Institute of Tropical Agriculture (IITA) in Kenya. She works with the Community Manager on communications, social media, and community engagement, including supporting Africa-focused capacity‑building efforts. Laurah’s position is also supported by [CZI EOSS 6](https://blog.bioconductor.org/posts/2024-07-12-czi-eoss6-grants/) funding. + +## Outreachy Internships + +The June–August 2025 [Outreachy](https://www.outreachy.org/) program concluded successfully, with interns contributing to Bioconductor and sharing their reflections in a [blog post](https://blog.bioconductor.org/posts/2025-12-12-outreachy-june25/). + +## Community Updates + +The Bioconductor Seminar Series is a new quarterly online event showcasing recent advances in computational biology and their relevance to Bioconductor methods, workflows, and community practice. Conceived by Bioconductor founder Robert Gentleman and organised by Erica Feick, the series began in December 2025 and brings together expert speakers, moderated discussions, and open Q&A. The first session was on *Deep-learning-based Gene Perturbation Effect Prediction Does Not Yet Outperform Simple Linear Baselines* (Nature Methods, 2025) with speakers Constantin Ahlmann-Eltze, Wolfgang Huber, Simon Anders and discussant Davide Risso. It was well attended with engaging discussion. More details can be found on the [Bioconductor website](https://bioconductor.org/help/seminar-series/). 
+ +## Publications and Preprints + +In November 2025, Crowell et al. released a [preprint](https://www.biorxiv.org/content/10.1101/2025.11.20.688607v1) describing their online book Orchestrating Spatial Transcriptomics Analysis with Bioconductor (OSTA), a collaborative effort supported by leaders across the community and built on two decades of Bioconductor’s foundational work. The project invites feedback, suggestions, and contributions from Bioconductor users and developers. + + +# Conferences and Workshops + +## Recaps + +* **EuroBioC 2025:** The European Bioconductor Conference (EuroBioC 2025) was held in September 2025 in Barcelona. A recap of the conference can be found on the [Bioconductor blog](https://blog.bioconductor.org/posts/2025-10-24-EuroBioC2025-recap/). Recordings of talks are available in a [YouTube playlist](https://www.youtube.com/playlist?list=PLdl4u5ZRDMQS_qvtLJNdDqHL6z5jj5y_7). +* **BioCAsia 2025:** BioCAsia 2025 was held as part of the ABACBS conference in Adelaide, November 27-28. The event focused on hands-on workshops and drew more than 100 participants. For more information, visit the [conference website](https://biocasia2025.bioconductor.org/). +* **Global Training:** We have had a busy second half of the year for training events, with two in-person courses in Africa, in [Ethiopia](https://blog.bioconductor.org/posts/2025-11-24-ethiopia-course/) and [Benin](https://blog.bioconductor.org/posts/2025-12-11-benin-course/). + +## Announcements + +* **EuroBioC 2026:** The European Bioconductor conference is taking place in Turku, Finland from June 3-5, with pre-conference activities June 1-2. For more information, visit the [conference website](https://eurobioc2026.bioconductor.org/). +* **BioC 2026:** The North American Bioconductor conference is taking place in Seattle from August 10-12, with post-conference activities August 13-14. For more information, visit the [conference website](https://bioc2026.bioconductor.org/). 
+ + +# Boards and Working Groups Updates + +## New Board Members + +- The [Community Advisory Board](https://bioconductor.org/about/community-advisory-board/) (CAB) welcomes new members Fabricio Almeida-Silva, Tuomas Borman, Laurent Gatto, Zuguang Gu, Eliana Ibrahimi, Martha Luka and Izabela Mamede. We extend our gratitude to outgoing members Jasmine Daly, Leo Lahti, Nicole Ortogero, Janani Ravi, Luyi Tian, Hedia Tnani and Jiefei Wang for their service. + +- The [Technical Advisory Board](https://bioconductor.org/about/technical-advisory-board/) (TAB) welcomes new members Robert Castelo, Hugo Gruson and Gabriele Sales. Outgoing members Stephanie Hicks, Davide Risso and Charlotte Soneson are also warmly thanked for their contributions. + +# Using Bioconductor + +Start using +Bioconductor by installing the most recent version of R and evaluating +the commands +``` + if (!requireNamespace("BiocManager", quietly = TRUE)) + install.packages("BiocManager") + BiocManager::install() +``` + +Install additional packages and dependencies, +e.g., [SingleCellExperiment](https://bioconductor.org/packages/SingleCellExperiment), with +``` + BiocManager::install("SingleCellExperiment") +``` +[Docker](https://bioconductor.org/help/docker/) +images provide a very effective on-ramp for power users to rapidly +obtain access to standardized and scalable computing environments. +Key resources include: + + +- [bioconductor.org](https://bioconductor.org) to install, + learn, use, and develop Bioconductor packages. +- A list of [available software](https://bioconductor.org/packages) + linking to pages describing each package. +- A question-and-answer style + [user support site](https://support.bioconductor.org) and + developer-oriented [mailing list](https://stat.ethz.ch/mailman/listinfo/bioc-devel). +- A community Zulip workspace ([sign up](https://chat.bioconductor.org)) + for extended technical discussion. 
+- The [F1000Research Bioconductor gateway](https://f1000research.com/gateways/bioconductor) +for peer-reviewed Bioconductor workflows as well as conference contributions. +- The [Bioconductor YouTube](https://www.youtube.com/user/bioconductor) + channel includes recordings of keynote and talks from recent + conferences, in addition to + video recordings of training courses. +- Our [package submission](https://github.com/Bioconductor/Contributions) + repository for open technical review of new packages. + +Upcoming and recently completed events are browsable at our +[events page](https://bioconductor.org/help/events/). + +The [Technical](https://bioconductor.org/about/technical-advisory-board/) and [Community](https://bioconductor.org/about/community-advisory-board/) +Advisory Boards provide guidance to ensure that the project addresses +leading-edge biological problems with advanced technical approaches, +and adopts practices (such as a +project-wide [Code of Conduct](https://bioconductor.org/about/code-of-conduct/)) +that encourages all to participate. We look forward to +welcoming you! + +We welcome your feedback on these updates and invite you to connect with us through the [Bioconductor Zulip](https://chat.bioconductor.org) workspace or by emailing community@bioconductor.org. diff --git a/_news/RJ-2025-4-bioconductor/RJ-2025-4-bioconductor.html b/_news/RJ-2025-4-bioconductor/RJ-2025-4-bioconductor.html new file mode 100644 index 0000000000..0d78de23fe --- /dev/null +++ b/_news/RJ-2025-4-bioconductor/RJ-2025-4-bioconductor.html @@ -0,0 +1,1904 @@ + + + + + + + + + + + + + + + + + + + + + Bioconductor Notes, December 2025 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
    +

    Bioconductor Notes, December 2025

    + + + +

    “Bioconductor Notes, December 2025” published in The R Journal.

    +
    + + + +
    +

    1 Introduction

    +

    Bioconductor provides +tools for the analysis and comprehension of high-throughput genomic +data. The project has entered its twenty-first year, with funding +for core development and infrastructure maintenance secured +through 2025 (NIH NHGRI 2U24HG004059). Additional support is provided +by NIH NCI, Chan-Zuckerberg Initiative, National Science Foundation, +Microsoft, and Amazon. In this news report, we give some updates on +core team and project activities.

    +

    2 Software

    +
    + +
    +

    Bioconductor 3.22, released in October 2025, is now available. It is +compatible with R 4.5 and consists of 2361 software packages, 435 +experiment data packages, 928 up-to-date annotation packages, 29 +workflows, and 8 books. Books are +built regularly from source, ensuring full reproducibility; an example is the +community-developed Orchestrating Single-Cell Analysis with Bioconductor.

    +

    3 Core Team and Infrastructure Updates

    +

    Bioconductor has introduced new GPU infrastructure, funded by CZI EOSS 6, to support developers building GPU‑accelerated packages, including Nvidia GPU build nodes, GPU‑aware containers, and a new biocViews term. Maintainers can now opt into GPU software builds and mark packages as GPU‑optional or GPU‑required. See the blog post for more details.

    +

    NEWS summaries for three contributed packages chosen at random from the 59 new software contributions are:

    + +
      +
    • dmGsea: The R package dmGsea provides efficient gene set enrichment analysis specifically for DNA methylation data. It addresses key biases, including probe dependency and varying probe numbers per gene. The package supports Illumina 450K, EPIC, and mouse methylation arrays. Users can also apply it to other omics data by supplying custom probe-to-gene mapping annotations. dmGsea is flexible, fast, and well-suited for large-scale epigenomic studies.

    • +
    • goatea: Geneset Ordinal Association Test Enrichment Analysis (GOATEA) provides a ‘Shiny’ interface with interactive visualizations and utility functions for performing and exploring automated gene set enrichment analysis using the ‘GOAT’ package. ‘GOATEA’ is designed to support large-scale and user-friendly enrichment workflows across multiple gene lists and comparisons, with flexible plotting and output options. Visualizations pre-enrichment include interactive ‘Volcano’ and ‘UpSet’ (overlap) plots. Visualizations post-enrichment include interactive geneset dotplot, geneset treeplot, gene-effectsize heatmap, gene-geneset heatmap and ‘STRING’ database of protein-protein-interactions network graph. ‘GOAT’ reference: Frank Koopmans (2024) doi:10.1038/s42003-024-06454-5.

    • +
    • igblastr: The igblastr package provides functions to conveniently install and use a local IgBLAST installation from within R. IgBLAST is described at https://pubmed.ncbi.nlm.nih.gov/23671333/. Online IgBLAST: https://www.ncbi.nlm.nih.gov/igblast/.

    • +
    +

    See the NEWS section in the release announcement for +a complete account of changes throughout the ecosystem.

    +

    4 Community and Impact

    +

    4.1 Community Team Updates

    +

    Nicholas Cooley has joined the Bioconductor Community Team as Developer Engagement Lead, based at the University of Limerick in Ireland. Working with the Community Manager, his role focuses on supporting package developers, improving onboarding resources, and strengthening connections across the developer community. Nick is funded through CZI EOSS 6.

    +

    Laurah Ondari has also joined the team part‑time, based at the International Institute of Tropical Agriculture (IITA) in Kenya. She works with the Community Manager on communications, social media, and community engagement, including supporting Africa-focused capacity‑building efforts. Laurah’s position is also supported by CZI EOSS 6 funding.

    +

    4.2 Outreachy Internships

    +

    The June–August 2025 Outreachy program concluded successfully, with interns contributing to Bioconductor and sharing their reflections in a blog post.

    +

    4.3 Community Updates

    +

    The Bioconductor Seminar Series is a new quarterly online event showcasing recent advances in computational biology and their relevance to Bioconductor methods, workflows, and community practice. Conceived by Bioconductor founder Robert Gentleman and organised by Erica Feick, the series began in December 2025 and brings together expert speakers, moderated discussions, and open Q&A. The first session covered Deep-learning-based Gene Perturbation Effect Prediction Does Not Yet Outperform Simple Linear Baselines (Nature Methods, 2025), with speakers Constantin Ahlmann-Eltze, Wolfgang Huber and Simon Anders, and discussant Davide Risso. The session was well attended and generated engaging discussion. More details can be found on the Bioconductor website.

    +

    4.4 Publications and Preprints

    +

    In November 2025, Crowell et al. released a preprint describing their online book Orchestrating Spatial Transcriptomics Analysis with Bioconductor (OSTA), a collaborative effort supported by leaders across the community and built on two decades of Bioconductor’s foundational work. The project invites feedback, suggestions, and contributions from Bioconductor users and developers.

    +

    5 Conferences and Workshops

    +

    5.1 Recaps

    +
      +
    • EuroBioC 2025: The European Bioconductor Conference was held in September 2025 in Barcelona. A recap of the conference can be found on the Bioconductor blog. Recordings of talks are available in a YouTube playlist.
    • +
    • BioCAsia 2025: BioCAsia 2025 was held as part of the ABACBS conference in Adelaide, November 27-28. The event focused on hands-on workshops and drew more than 100 participants. For more information, visit the conference website.
    • +
    • Global Training: The second half of the year has been busy with training events, including two in-person courses in Africa, in Ethiopia and Benin.
    • +
    +

    5.2 Announcements

    +
      +
    • EuroBioC 2026: The European Bioconductor conference is taking place in Turku, Finland from June 3-5, with pre-conference activities June 1-2. For more information, visit the conference website.
    • +
    • BioC 2026: The North American Bioconductor conference is taking place in Seattle from August 10-12, with post-conference activities August 13-14. For more information, visit the conference website.
    • +
    +

    6 Boards and Working Groups Updates

    +

    6.1 New Board Members

    +
      +
    • The Community Advisory Board (CAB) welcomes new members Fabricio Almeida-Silva, Tuomas Borman, Laurent Gatto, Zuguang Gu, Eliana Ibrahimi, Martha Luka and Izabela Mamede. We extend our gratitude to outgoing members Jasmine Daly, Leo Lahti, Nicole Ortogero, Janani Ravi, Luyi Tian, Hedia Tnani and Jiefei Wang for their service.

    • +
    • The Technical Advisory Board (TAB) welcomes new members Robert Castelo, Hugo Gruson and Gabriele Sales. Outgoing members Stephanie Hicks, Davide Risso and Charlotte Soneson are also warmly thanked for their contributions.

    • +
    +

    7 Using Bioconductor

    +

    Start using +Bioconductor by installing the most recent version of R and evaluating +the commands

    +
      if (!requireNamespace("BiocManager", quietly = TRUE))
    +      install.packages("BiocManager")
    +  BiocManager::install()
    +

    Install additional packages and dependencies, +e.g., SingleCellExperiment, with

    +
      BiocManager::install("SingleCellExperiment")
    +

    Docker +images provide a very effective on-ramp for power users to rapidly +obtain access to standardized and scalable computing environments. +Key resources include:

    + +

    Upcoming and recently completed events are browsable at our +events page.

    +

    The Technical and Community +Advisory Boards provide guidance to ensure that the project addresses +leading-edge biological problems with advanced technical approaches, +and adopts practices (such as a +project-wide Code of Conduct) +that encourage all to participate. We look forward to +welcoming you!

    +

    We welcome your feedback on these updates and invite you to connect with us through the Bioconductor Zulip workspace or by emailing community@bioconductor.org.

    +
    + + +
    + +
    +
    + + + + + +
    +

    Reuse

    +

    Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

    +

    Citation

    +

    For attribution, please cite this work as

    +
    Doyle & Bioconductor Core Developer Team, "Bioconductor Notes, December 2025", The R Journal, 2025
    +

    BibTeX citation

    +
    @article{RJ-2025-4-bioconductor,
    +  author = {Doyle, Maria and Bioconductor Core Developer Team, },
    +  title = {Bioconductor Notes, December 2025},
    +  journal = {The R Journal},
    +  year = {2025},
    +  note = {https://journal.r-project.org/news/RJ-2025-4-bioconductor},
    +  volume = {17},
    +  issue = {4},
    +  issn = {2073-4859},
    +  pages = {322-325}
    +}
    +
    + + + + + + + diff --git a/_news/RJ-2025-4-bioconductor/RJ-2025-4-bioconductor.pdf b/_news/RJ-2025-4-bioconductor/RJ-2025-4-bioconductor.pdf new file mode 100644 index 0000000000..a528e0c7cd Binary files /dev/null and b/_news/RJ-2025-4-bioconductor/RJ-2025-4-bioconductor.pdf differ diff --git a/_news/RJ-2025-4-bioconductor/RJ-2025-4-bioconductor.tex b/_news/RJ-2025-4-bioconductor/RJ-2025-4-bioconductor.tex new file mode 100644 index 0000000000..77525fabe0 --- /dev/null +++ b/_news/RJ-2025-4-bioconductor/RJ-2025-4-bioconductor.tex @@ -0,0 +1,187 @@ +% !TeX root = RJwrapper.tex +\title{Bioconductor Notes, December 2025} + + +\author{by Maria Doyle and Bioconductor~Core~Developer~Team} + +\maketitle + + +\section{Introduction}\label{introduction} + +\href{https://bioconductor.org}{Bioconductor} provides +tools for the analysis and comprehension of high-throughput genomic +data. The project has entered its twenty-first year, with funding +for core development and infrastructure maintenance secured +through 2025 (NIH NHGRI 2U24HG004059). Additional support is provided +by NIH NCI, Chan-Zuckerberg Initiative, National Science Foundation, +Microsoft, and Amazon. In this news report, we give some updates on +core team and project activities. + +\section{Software}\label{software} + +Bioconductor \href{https://bioconductor.org/news/bioc_3_22_release/}{3.22}, released in October 2025, is now available. It is +compatible with R 4.5 and consists of 2361 software packages, 435 +experiment data packages, 928 up-to-date annotation packages, 29 +workflows, and 8 books. \href{https://bioconductor.org/books/release/}{Books} are +built regularly from source, ensuring full reproducibility; an example is the +community-developed \href{https://bioconductor.org/books/release/OSCA/}{Orchestrating Single-Cell Analysis with Bioconductor}. 
+ +\section{Core Team and Infrastructure Updates}\label{core-team-and-infrastructure-updates} + +Bioconductor has introduced new GPU infrastructure, funded by CZI EOSS 6, to support developers building GPU‑accelerated packages, including Nvidia GPU build nodes, GPU‑aware containers, and a new biocViews term. Maintainers can now opt into GPU software builds and mark packages as GPU‑optional or GPU‑required. See \href{https://blog.bioconductor.org/posts/2025-10-10-gpus/}{the blog post} for more details. + +NEWS summaries for three contributed packages chosen at random from the 59 new software contributions are: + +\begin{itemize} +\item + \textbf{dmGsea}: The R package dmGsea provides efficient gene set enrichment analysis specifically for DNA methylation data. It addresses key biases, including probe dependency and varying probe numbers per gene. The package supports Illumina 450K, EPIC, and mouse methylation arrays. Users can also apply it to other omics data by supplying custom probe-to-gene mapping annotations. dmGsea is flexible, fast, and well-suited for large-scale epigenomic studies. +\item + \textbf{goatea}: Geneset Ordinal Association Test Enrichment Analysis (GOATEA) provides a `Shiny' interface with interactive visualizations and utility functions for performing and exploring automated gene set enrichment analysis using the `GOAT' package. `GOATEA' is designed to support large-scale and user-friendly enrichment workflows across multiple gene lists and comparisons, with flexible plotting and output options. Visualizations pre-enrichment include interactive `Volcano' and `UpSet' (overlap) plots. Visualizations post-enrichment include interactive geneset dotplot, geneset treeplot, gene-effectsize heatmap, gene-geneset heatmap and `STRING' database of protein-protein-interactions network graph. `GOAT' reference: Frank Koopmans (2024) \url{doi:10.1038/s42003-024-06454-5}. 
+\item + \textbf{igblastr}: The igblastr package provides functions to conveniently install and use a local IgBLAST installation from within R. IgBLAST is described at \url{https://pubmed.ncbi.nlm.nih.gov/23671333/}. Online IgBLAST: \url{https://www.ncbi.nlm.nih.gov/igblast/}. +\end{itemize} + +See the NEWS section in the \href{https://bioconductor.org/news/bioc_3_22_release/}{release announcement} for +a complete account of changes throughout the ecosystem. + +\section{Community and Impact}\label{community-and-impact} + +\subsection{Community Team Updates}\label{community-team-updates} + +Nicholas Cooley has joined the Bioconductor Community Team as Developer Engagement Lead, based at the University of Limerick in Ireland. Working with the Community Manager, his role focuses on supporting package developers, improving onboarding resources, and strengthening connections across the developer community. Nick is funded through \href{https://blog.bioconductor.org/posts/2024-07-12-czi-eoss6-grants/}{CZI EOSS 6}. + +Laurah Ondari has also joined the team part‑time, based at the International Institute of Tropical Agriculture (IITA) in Kenya. She works with the Community Manager on communications, social media, and community engagement, including supporting Africa-focused capacity‑building efforts. Laurah's position is also supported by \href{https://blog.bioconductor.org/posts/2024-07-12-czi-eoss6-grants/}{CZI EOSS 6} funding. + +\subsection{Outreachy Internships}\label{outreachy-internships} + +The June--August 2025 \href{https://www.outreachy.org/}{Outreachy} program concluded successfully, with interns contributing to Bioconductor and sharing their reflections in a \href{https://blog.bioconductor.org/posts/2025-12-12-outreachy-june25/}{blog post}. 
+ +\subsection{Community Updates}\label{community-updates} + +The Bioconductor Seminar Series is a new quarterly online event showcasing recent advances in computational biology and their relevance to Bioconductor methods, workflows, and community practice. Conceived by Bioconductor founder Robert Gentleman and organised by Erica Feick, the series began in December 2025 and brings together expert speakers, moderated discussions, and open Q\&A. The first session covered \emph{Deep-learning-based Gene Perturbation Effect Prediction Does Not Yet Outperform Simple Linear Baselines} (Nature Methods, 2025), with speakers Constantin Ahlmann-Eltze, Wolfgang Huber and Simon Anders, and discussant Davide Risso. The session was well attended and generated engaging discussion. More details can be found on the \href{https://bioconductor.org/help/seminar-series/}{Bioconductor website}. + +\subsection{Publications and Preprints}\label{publications-and-preprints} + +In November 2025, Crowell et al.~released a \href{https://www.biorxiv.org/content/10.1101/2025.11.20.688607v1}{preprint} describing their online book Orchestrating Spatial Transcriptomics Analysis with Bioconductor (OSTA), a collaborative effort supported by leaders across the community and built on two decades of Bioconductor's foundational work. The project invites feedback, suggestions, and contributions from Bioconductor users and developers. + +\section{Conferences and Workshops}\label{conferences-and-workshops} + +\subsection{Recaps}\label{recaps} + +\begin{itemize} +\tightlist +\item + \textbf{EuroBioC 2025:} The European Bioconductor Conference was held in September 2025 in Barcelona. A recap of the conference can be found on the \href{https://blog.bioconductor.org/posts/2025-10-24-EuroBioC2025-recap/}{Bioconductor blog}. Recordings of talks are available in a \href{https://www.youtube.com/playlist?list=PLdl4u5ZRDMQS_qvtLJNdDqHL6z5jj5y_7}{YouTube playlist}. 
+\item + \textbf{BioCAsia 2025:} BioCAsia 2025 was held as part of the ABACBS conference in Adelaide, November 27-28. The event focused on hands-on workshops and drew more than 100 participants. For more information, visit the \href{https://biocasia2025.bioconductor.org/}{conference website}. +\item + \textbf{Global Training:} The second half of the year has been busy with training events, including two in-person courses in Africa, in \href{https://blog.bioconductor.org/posts/2025-11-24-ethiopia-course/}{Ethiopia} and \href{https://blog.bioconductor.org/posts/2025-12-11-benin-course/}{Benin}. +\end{itemize} + +\subsection{Announcements}\label{announcements} + +\begin{itemize} +\tightlist +\item + \textbf{EuroBioC 2026:} The European Bioconductor conference is taking place in Turku, Finland, from June 3-5, with pre-conference activities June 1-2. For more information, visit the \href{https://eurobioc2026.bioconductor.org/}{conference website}. +\item + \textbf{BioC 2026:} The North American Bioconductor conference is taking place in Seattle from August 10-12, with post-conference activities August 13-14. For more information, visit the \href{https://bioc2026.bioconductor.org/}{conference website}. +\end{itemize} + +\section{Boards and Working Groups Updates}\label{boards-and-working-groups-updates} + +\subsection{New Board Members}\label{new-board-members} + +\begin{itemize} +\item + The \href{https://bioconductor.org/about/community-advisory-board/}{Community Advisory Board} (CAB) welcomes new members Fabricio Almeida-Silva, Tuomas Borman, Laurent Gatto, Zuguang Gu, Eliana Ibrahimi, Martha Luka and Izabela Mamede. We extend our gratitude to outgoing members Jasmine Daly, Leo Lahti, Nicole Ortogero, Janani Ravi, Luyi Tian, Hedia Tnani and Jiefei Wang for their service. +\item + The \href{https://bioconductor.org/about/technical-advisory-board/}{Technical Advisory Board} (TAB) welcomes new members Robert Castelo, Hugo Gruson and Gabriele Sales. 
Outgoing members Stephanie Hicks, Davide Risso and Charlotte Soneson are also warmly thanked for their contributions. +\end{itemize} + +\section{Using Bioconductor}\label{using-bioconductor} + +Start using +Bioconductor by installing the most recent version of R and evaluating +the commands + +\begin{verbatim} + if (!requireNamespace("BiocManager", quietly = TRUE)) + install.packages("BiocManager") + BiocManager::install() +\end{verbatim} + +Install additional packages and dependencies, +e.g., \href{https://bioconductor.org/packages/SingleCellExperiment}{SingleCellExperiment}, with + +\begin{verbatim} + BiocManager::install("SingleCellExperiment") +\end{verbatim} + +\href{https://bioconductor.org/help/docker/}{Docker} +images provide a very effective on-ramp for power users to rapidly +obtain access to standardized and scalable computing environments. +Key resources include: + +\begin{itemize} +\tightlist +\item + \href{https://bioconductor.org}{bioconductor.org} to install, + learn, use, and develop Bioconductor packages. +\item + A list of \href{https://bioconductor.org/packages}{available software} + linking to pages describing each package. +\item + A question-and-answer style + \href{https://support.bioconductor.org}{user support site} and + developer-oriented \href{https://stat.ethz.ch/mailman/listinfo/bioc-devel}{mailing list}. +\item + A community Zulip workspace (\href{https://chat.bioconductor.org}{sign up}) + for extended technical discussion. +\item + The \href{https://f1000research.com/gateways/bioconductor}{F1000Research Bioconductor gateway} + for peer-reviewed Bioconductor workflows as well as conference contributions. +\item + The \href{https://www.youtube.com/user/bioconductor}{Bioconductor YouTube} + channel includes recordings of keynotes and talks from recent + conferences, in addition to + video recordings of training courses. 
+\item + Our \href{https://github.com/Bioconductor/Contributions}{package submission} + repository for open technical review of new packages. +\end{itemize} + +Upcoming and recently completed events are browsable at our +\href{https://bioconductor.org/help/events/}{events page}. + +The \href{https://bioconductor.org/about/technical-advisory-board/}{Technical} and \href{https://bioconductor.org/about/community-advisory-board/}{Community} +Advisory Boards provide guidance to ensure that the project addresses +leading-edge biological problems with advanced technical approaches, +and adopts practices (such as a +project-wide \href{https://bioconductor.org/about/code-of-conduct/}{Code of Conduct}) +that encourage all to participate. We look forward to +welcoming you! + +We welcome your feedback on these updates and invite you to connect with us through the \href{https://chat.bioconductor.org}{Bioconductor Zulip} workspace or by emailing \href{mailto:community@bioconductor.org}{\nolinkurl{community@bioconductor.org}}. + + +\address{% +Maria Doyle\\ +University of Limerick\\% +Bioconductor Community Manager\\ +% +% +% +% +} + +\address{% +Bioconductor~Core~Developer~Team\\ +Dana-Farber Cancer Institute\\% +Roswell Park Comprehensive Cancer Center, City University of New York, Fred Hutchinson Cancer Research Center, Mass General Brigham\\ +% +% +% +% +} diff --git a/_news/RJ-2025-4-bioconductor/RJournal.sty b/_news/RJ-2025-4-bioconductor/RJournal.sty new file mode 100644 index 0000000000..351990be38 --- /dev/null +++ b/_news/RJ-2025-4-bioconductor/RJournal.sty @@ -0,0 +1,358 @@ +% Package `RJournal' to use with LaTeX2e +% Copyright (C) 2010 by the R Foundation +% Copyright (C) 2013 by the R Journal +% +% Originally written by Kurt Hornik and Friedrich Leisch with subsequent +% edits by the editorial board +% +% CAUTION: +% Do not modify this style file. Any changes to this file will be reset when your +% article is submitted. 
+% If you must modify the style or add LaTeX packages to the article, these +% should be specified in RJwrapper.tex + +\NeedsTeXFormat{LaTeX2e}[1995/12/01] +\ProvidesPackage{RJournal}[2025/10/05 v0.17 RJournal package] + +\RequirePackage{tikz} + +% Overall page layout, fonts etc ----------------------------------------------- + +% Issues of of \emph{The R Journal} are created from the standard \LaTeX{} +% document class \pkg{report}. + +\RequirePackage{geometry} +\geometry{a4paper, + textwidth=14cm, top=1cm, bottom=1cm, + includehead,includefoot,centering, + footskip=1.5cm} +\raggedbottom +\sloppy +\clubpenalty = 10000 +\widowpenalty = 10000 +\brokenpenalty = 10000 +\usepackage{microtype} + + +\RequirePackage{fancyhdr} +\fancyhead{} +\fancyheadoffset{2cm} +\fancyhead[L]{\textsc{\RJ@sectionhead}} +\fancyhead[R]{\thepage} +\fancyfoot{} +\fancyfoot[L]{The R Journal Vol. \RJ@volume/\RJ@number, \RJ@month~\RJ@year} +\fancyfoot[R]{ISSN 2073-4859} +\pagestyle{fancy} + +% We use the following fonts (all with T1 encoding): +% +% rm & palatino +% tt & inconsolata +% sf & helvetica +% math & palatino + +\RequirePackage{microtype} + +\RequirePackage[scaled=0.92]{helvet} +\RequirePackage{palatino,mathpazo} +\RequirePackage[scaled=1.02]{inconsolata} +\RequirePackage[T1]{fontenc} + +\RequirePackage[hyphens]{url} +\RequirePackage[pagebackref]{hyperref} +\renewcommand{\backref}[1]{[p#1]} + +% Dark blue colour for all links +\RequirePackage{color} +\definecolor{link}{rgb}{0.45,0.51,0.67} +\hypersetup{ + colorlinks,% + citecolor=link,% + filecolor=link,% + linkcolor=link,% + urlcolor=link +} + +% Give the text a little room to breath +\setlength{\parskip}{3pt} +\RequirePackage{setspace} +\setstretch{1.05} + +% Issue and article metadata --------------------------------------------------- + +% Basic front matter information about the issue: volume, number, and +% date. 
+ +\newcommand{\volume}[1]{\def\RJ@volume{#1}} +\newcommand{\volnumber}[1]{\def\RJ@number{#1}} +\renewcommand{\month}[1]{\def\RJ@month{#1}} +\renewcommand{\year}[1]{\def\RJ@year{#1}} + + +% Individual articles correspond to +% chapters, and are contained in |article| environments. This makes it +% easy to have figures counted within articles and hence hyperlinked +% correctly. + +% An article has an author, a title, and optionally a subtitle. We use +% the obvious commands for specifying these. Articles will be put in certain +% journal sections, named by \sectionhead. + +\newcommand {\sectionhead} [1]{\def\RJ@sectionhead{#1}} +\renewcommand{\author} [1]{\def\RJ@author{#1}} +\renewcommand{\title} [1]{\def\RJ@title{#1}} +\newcommand {\subtitle} [1]{\def\RJ@subtitle{#1}} + +% Control appearance of titles: make slightly smaller than usual, and +% suppress section numbering. See http://tex.stackexchange.com/questions/69749 +% for why we don't use \setcounter{secnumdepth}{-1} + +\usepackage[medium]{titlesec} +\usepackage{titletoc} +\titleformat{\section} {\normalfont\large\bfseries}{\arabic{section}}{1em}{} +\titleformat{\subsection}{\normalfont\normalsize\bfseries}{\arabic{section}.\arabic{subsection}}{0.5em}{} +\titlecontents{chapter} [0em]{}{}{}{\titlerule*[1em]{.}\contentspage} + +% Article layout --------------------------------------------------------------- + +% Environment |article| clears the article header information at its beginning. +% We use |\FloatBarrier| from the placeins package to keep floats within +% the article. +\RequirePackage{placeins} +\newenvironment{article}{\author{}\title{}\subtitle{}\FloatBarrier}{\FloatBarrier} + +% Refereed articles should have an abstract, so we redefine |\abstract| to +% give the desired style + +\renewcommand{\abstract}[1]{\noindent\textbf{Abstract} #1} +\renewenvironment{abstract}{\noindent\textbf{Abstract}~}{} + +% The real work is done by a redefined version of |\maketitle|. 
Note +% that even though we do not want chapters (articles) numbered, we +% need to increment the chapter counter, so that figures get correct +% labelling. + +\renewcommand{\maketitle}{% +\noindent + \chapter{\RJ@title}\refstepcounter{chapter} + \ifx\empty\RJ@subtitle + \else + \noindent\textbf{\RJ@subtitle} + \par\nobreak\addvspace{\baselineskip} + \fi + \ifx\empty\RJ@author + \else + \noindent\textit{\RJ@author} + \par\nobreak\addvspace{\baselineskip} + \fi + \@afterindentfalse\@nobreaktrue\@afterheading +} + +% Now for some ugly redefinitions. We do not want articles to start a +% new page. (Actually, we do, but this is handled via explicit +% \newpage +% +% The name@of@eq is a hack to get hyperlinks to equations to work +% within each article, even though there may be multiple eq.(1) +% \begin{macrocode} +\renewcommand\chapter{\secdef\RJ@chapter\@schapter} +\providecommand{\nohyphens}{% + \hyphenpenalty=10000\exhyphenpenalty=10000\relax} +\newcommand{\RJ@chapter}{% + \edef\name@of@eq{equation.\@arabic{\c@chapter}}% + \renewcommand{\@seccntformat}[1]{}% + \@startsection{chapter}{0}{0mm}{% + -2\baselineskip \@plus -\baselineskip \@minus -.2ex}{\p@}{% + \phantomsection\normalfont\huge\bfseries\raggedright}} + +% Book reviews should appear as sections in the text and in the pdf bookmarks, +% however we wish them to appear as chapters in the TOC. Thus we define an +% alternative to |\maketitle| for reviews. +\newcommand{\review}[1]{ + \pdfbookmark[1]{#1}{#1} + \section*{#1} + \addtocontents{toc}{\protect\contentsline{chapter}{#1}{\thepage}{#1.1}} +} + +% We want bibliographies as starred sections within articles. +% +\RequirePackage[sectionbib,round]{natbib} +\bibliographystyle{abbrvnat} +\renewcommand{\bibsection}{\section*{References}} + +% Equations, figures and tables are counted within articles, but we do +% not show the article number. For equations it becomes a bit messy to avoid +% having hyperref getting it wrong. 
+ +% \numberwithin{equation}{chapter} +\renewcommand{\theequation}{\@arabic\c@equation} +\renewcommand{\thefigure}{\@arabic\c@figure} +\renewcommand{\thetable}{\@arabic\c@table} + +% Issue layout ----------------------------------------------------------------- + +% Need to provide our own version of |\tableofcontents|. We use the +% tikz package to get the rounded rectangle. Notice that |\section*| +% is really the same as |\chapter*|. +\renewcommand{\contentsname}{Contents} +\renewcommand\tableofcontents{% + \vspace{1cm} + \section*{\contentsname} + { \@starttoc{toc} } +} + +\renewcommand{\titlepage}{% + \thispagestyle{empty} + \hypersetup{ + pdftitle={The R Journal Volume \RJ@volume/\RJ@number, \RJ@month \RJ@year},% + pdfauthor={R Foundation for Statistical Computing},% + } + \noindent + \begin{center} + \fontsize{50pt}{50pt}\selectfont + The \raisebox{-8pt}{\includegraphics[height=77pt]{Rlogo-5}}\hspace{10pt} + Journal + + \end{center} + {\large \hfill Volume \RJ@volume/\RJ@number, \RJ@month{} \RJ@year \quad} + + \rule{\textwidth}{1pt} + \begin{center} + {\Large A peer-reviewed, open-access publication of the \\ + R Foundation for Statistical Computing} + \end{center} + + % And finally, put in the TOC box. Note the way |tocdepth| is adjusted + % before and after producing the TOC: thus, we can ensure that only + % articles show up in the printed TOC, but that in the PDF version, + % bookmarks are created for sections and subsections as well (provided + % that the non-starred forms are used). + \setcounter{tocdepth}{0} + \tableofcontents + \setcounter{tocdepth}{2} + \clearpage +} + +% Text formatting -------------------------------------------------------------- + +\newcommand{\R}{R} +\newcommand{\address}[1]{\addvspace{\baselineskip}\noindent\emph{#1}} +\newcommand{\email}[1]{\href{mailto:#1}{\normalfont\texttt{#1}}} + +% Simple font selection is not good enough. For example, |\texttt{--}| +% gives `\texttt{--}', i.e., an endash in typewriter font. 
Hence, we +% need to turn off ligatures, which currently only happens for commands +% |\code| and |\samp| and the ones derived from them. Hyphenation is +% another issue; it should really be turned off inside |\samp|. And +% most importantly, \LaTeX{} special characters are a nightmare. E.g., +% one needs |\~{}| to produce a tilde in a file name marked by |\file|. +% Perhaps a few years ago, most users would have agreed that this may be +% unfortunate but should not be changed to ensure consistency. But with +% the advent of the WWW and the need for getting `|~|' and `|#|' into +% URLs, commands which only treat the escape and grouping characters +% specially have gained acceptance + +\DeclareRobustCommand\code{\bgroup\@noligs\@codex} +\def\@codex#1{\texorpdfstring% +{{\normalfont\ttfamily\hyphenchar\font=-1 #1}}% +{#1}\egroup} +\newcommand{\kbd}[1]{{\normalfont\texttt{#1}}} +\newcommand{\key}[1]{{\normalfont\texttt{\uppercase{#1}}}} +\DeclareRobustCommand\samp{`\bgroup\@noligs\@sampx} +\def\@sampx#1{{\normalfont\texttt{#1}}\egroup'} +\newcommand{\var}[1]{{\normalfont\textsl{#1}}} +\let\env=\code +\newcommand{\file}[1]{{`\normalfont\textsf{#1}'}} +\let\command=\code +\let\option=\samp +\newcommand{\dfn}[1]{{\normalfont\textsl{#1}}} +% \acronym is effectively disabled since not used consistently +\newcommand{\acronym}[1]{#1} +\newcommand{\strong}[1]{\texorpdfstring% +{{\normalfont\fontseries{b}\selectfont #1}}% +{#1}} +\let\pkg=\strong +\newcommand{\CRANpkg}[1]{\href{https://CRAN.R-project.org/package=#1}{\pkg{#1}}}% +\let\cpkg=\CRANpkg +\newcommand{\ctv}[1]{\href{https://CRAN.R-project.org/view=#1}{\emph{#1}}} +\newcommand{\BIOpkg}[1]{\href{https://www.bioconductor.org/packages/release/bioc/html/#1.html}{\pkg{#1}}} + +% Example environments --------------------------------------------------------- +\RequirePackage{fancyvrb} +\RequirePackage{alltt} + +\DefineVerbatimEnvironment{example}{Verbatim}{} +\renewenvironment{example*}{\begin{alltt}}{\end{alltt}} + +% 
Support for output from Sweave, and generic session style code +% These used to have fontshape=sl for Sinput/Scode/Sin, but pslatex +% won't use a condensed font in that case. + +% Update (2015-05-28 by DS): remove fontsize=\small to match example environment + +\DefineVerbatimEnvironment{Sinput}{Verbatim}{} +\DefineVerbatimEnvironment{Soutput}{Verbatim}{} +\DefineVerbatimEnvironment{Scode}{Verbatim}{} +\DefineVerbatimEnvironment{Sin}{Verbatim}{} +\DefineVerbatimEnvironment{Sout}{Verbatim}{} +\newenvironment{Schunk}{}{} + +% Mathematics ------------------------------------------------------------------ + +% The implementation of |\operatorname| is similar to the mechanism +% \LaTeXe{} uses for functions like sin and cos, and simpler than the +% one of \AmSLaTeX{}. We use |\providecommand| for the definition in +% order to keep the one of the \pkg{amstex} if this package has +% already been loaded. +% \begin{macrocode} +\providecommand{\operatorname}[1]{% + \mathop{\operator@font#1}\nolimits} +\RequirePackage{amsfonts} + +\renewcommand{\P}{% + \mathop{\operator@font I\hspace{-1.5pt}P\hspace{.13pt}}} +\newcommand{\E}{% + \mathop{\operator@font I\hspace{-1.5pt}E\hspace{.13pt}}} +\newcommand{\VAR}{\operatorname{var}} +\newcommand{\COV}{\operatorname{cov}} +\newcommand{\COR}{\operatorname{cor}} + +% Figures ---------------------------------------------------------------------- + +% For use with pandoc > 3.2.1 +\newsavebox\pandoc@box +\newcommand*\pandocbounded[1]{% scales image to fit in text height/width + \sbox\pandoc@box{#1}% + \Gscale@div\@tempa{\textheight}{\dimexpr\ht\pandoc@box+\dp\pandoc@box\relax}% + \Gscale@div\@tempb{\linewidth}{\wd\pandoc@box}% + \ifdim\@tempb\p@<\@tempa\p@\let\@tempa\@tempb\fi% select the smaller of both + \ifdim\@tempa\p@<\p@\scalebox{\@tempa}{\usebox\pandoc@box}% + \else\usebox{\pandoc@box}% + \fi% +} + +\RequirePackage[font=small,labelfont=bf]{caption} + +% Wide environments for figures and tables ------------------------------------- 
+\RequirePackage{environ} + +% An easy way to make a figure span the full width of the page +\NewEnviron{widefigure}[1][]{ +\begin{figure}[#1] +\advance\leftskip-2cm +\begin{minipage}{\dimexpr\textwidth+4cm\relax}% + \captionsetup{margin=2cm} + \BODY +\end{minipage}% +\end{figure} +} + +\NewEnviron{widetable}[1][]{ +\begin{table}[#1] +\advance\leftskip-2cm +\begin{minipage}{\dimexpr\textwidth+4cm\relax}% + \captionsetup{margin=2cm} + \BODY +\end{minipage}% +\end{table} +} diff --git a/_news/RJ-2025-4-bioconductor/RJwrapper.tex b/_news/RJ-2025-4-bioconductor/RJwrapper.tex new file mode 100644 index 0000000000..80983dd41a --- /dev/null +++ b/_news/RJ-2025-4-bioconductor/RJwrapper.tex @@ -0,0 +1,70 @@ +\documentclass[a4paper]{report} +\usepackage[utf8]{inputenc} +\usepackage[T1]{fontenc} +\usepackage{RJournal} +\usepackage{amsmath,amssymb,array} +\usepackage{booktabs} + + +% tightlist command for lists without linebreak +\providecommand{\tightlist}{% + \setlength{\itemsep}{0pt}\setlength{\parskip}{0pt}} + +\usepackage{longtable} + +% Always define CSL refs as bib entries are contained in separate doc +% Pandoc citation processing +%From Pandoc 3.1.8 +% definitions for citeproc citations +\NewDocumentCommand\citeproctext{}{} +\NewDocumentCommand\citeproc{mm}{% + \begingroup\def\citeproctext{#2}\cite{#1}\endgroup} +\makeatletter + % allow citations to break across lines + \let\@cite@ofmt\@firstofone + % avoid brackets around text for \cite: + \def\@biblabel#1{} + \def\@cite#1#2{{#1\if@tempswa , #2\fi}} +\makeatother +\newlength{\cslhangindent} +\setlength{\cslhangindent}{1.5em} +\newlength{\csllabelwidth} +\setlength{\csllabelwidth}{3em} +\newenvironment{CSLReferences}[2] % #1 hanging-indent, #2 entry-spacing + {\begin{list}{}{% + \setlength{\itemindent}{0pt} + \setlength{\leftmargin}{0pt} + \setlength{\parsep}{0pt} + % turn on hanging indent if param 1 is 1 + \ifodd #1 + \setlength{\leftmargin}{\cslhangindent} + \setlength{\itemindent}{-1\cslhangindent} + \fi + % set 
entry spacing + \setlength{\itemsep}{#2\baselineskip}}} + {\end{list}} +\usepackage{calc} +\newcommand{\CSLBlock}[1]{#1\hfill\break} +\newcommand{\CSLLeftMargin}[1]{\parbox[t]{\csllabelwidth}{#1}} +\newcommand{\CSLRightInline}[1]{\parbox[t]{\linewidth - \csllabelwidth}{#1}\break} +\newcommand{\CSLIndent}[1]{\hspace{\cslhangindent}#1} + + + +\begin{document} + + +%% do not edit, for illustration only +\sectionhead{Contributed research article} +\volume{17} +\volnumber{4} +\year{2025} +\month{December} +\setcounter{page}{322} + +\begin{article} + \input{RJ-2025-4-bioconductor} +\end{article} + + +\end{document} diff --git a/_news/RJ-2025-4-cran/RJ-2025-4-cran.R b/_news/RJ-2025-4-cran/RJ-2025-4-cran.R new file mode 100644 index 0000000000..0cf21d41fc --- /dev/null +++ b/_news/RJ-2025-4-cran/RJ-2025-4-cran.R @@ -0,0 +1,48 @@ +# Generated by `rjournal_pdf_article()` using `knitr::purl()`: do not edit by hand +# Please edit RJ-2025-4-cran.Rmd to modify this file + +## ----global_options, include=FALSE, message=FALSE----------------------------- +knitr::opts_chunk$set(fig.pos = 'H') + + +## ----cran_growth_data, include=FALSE, message=FALSE--------------------------- +from <- as.Date("2025-10-01") +to <- as.Date("2025-12-31") + +cran_growth <- readRDS("cran_growth.rds") +cran_zoo <- cran_growth$zoo +loadNamespace("zoo") + +nmonths <- cran_growth$nmonths +npackages <- cran_growth$npackages +mirrors <- cran_growth$mirrors + + +## ----cran_growth, fig.height=7, fig.width=10, fig.align='center', out.width = "100%", fig.alt="CRAN growth: Number of CRAN packages over time in levels (left) and in logs (right).", echo=FALSE---- +par(mfrow = c(1, 2)) +plot(cran_zoo, lwd = 3, xlab = "Year", ylab = "", main = "Number of CRAN Packages") +plot(cran_zoo, lwd = 3, log = "y", xlab = "Year", ylab = "", main = "Number of CRAN Packages (Log-Scale)") + + +## ----cran_submissions_data, include=FALSE------------------------------------- +## generate on cran master via cran_submissions(from, 
to, ...) +cran_sub <- readRDS("cran_submissions.rds") + + +## ----cran_submissions_autouser, echo=FALSE------------------------------------ +knitr::kable(cran_sub$autouser[, !grepl("clang|special", colnames(cran_sub$autouser))], format = "pipe") + + +## ----cran_submissions_autolast, echo=FALSE------------------------------------ +knitr::kable(cran_sub$autolast, format = "pipe", align = "rr") + + +## ----cran_views_data, include=FALSE------------------------------------------- +## generate via cran_views() +cran_views <- readRDS("cran_views.rds") +cran_views$active <- cran_views$ntotal/as.numeric(tail(cran_zoo, 1)) + + +## ----cran_views_new, echo=FALSE, results="asis"------------------------------- +writeLines(cran_views$new) + diff --git a/_news/RJ-2025-4-cran/RJ-2025-4-cran.Rmd b/_news/RJ-2025-4-cran/RJ-2025-4-cran.Rmd new file mode 100644 index 0000000000..fb728be3b1 --- /dev/null +++ b/_news/RJ-2025-4-cran/RJ-2025-4-cran.Rmd @@ -0,0 +1,143 @@ +--- +title: Changes on CRAN +subtitle: 2025-10-01 to 2025-12-31 +date: '2025-12-01' +draft: no +author: +- name: Kurt Hornik + affiliation: WU Wirtschaftsuniversität Wien + address: Austria + orcid: 0000-0003-4198-9911 + email: Kurt.Hornik@R-project.org +- name: Uwe Ligges + affiliation: TU Dortmund + address: Germany + orcid: 0000-0001-5875-6167 + email: Uwe.Ligges@R-project.org +- name: Achim Zeileis + affiliation: Universität Innsbruck + address: Austria + orcid: 0000-0003-0918-3766 + email: Achim.Zeileis@R-project.org +output: + rjtools::rjournal_article: + self_contained: yes + toc: no +preamble: \usepackage{float, longtable} +volume: 17 +issue: 4 +slug: RJ-2025-4-cran +journal: + lastpage: 327 + firstpage: 326 + +--- + + +```{r global_options, include=FALSE, message=FALSE} +knitr::opts_chunk$set(fig.pos = 'H') +``` + +# CRAN growth + +```{r cran_growth_data, include=FALSE, message=FALSE} +from <- as.Date("2025-10-01") +to <- as.Date("2025-12-31") + +cran_growth <- readRDS("cran_growth.rds") +cran_zoo <- 
cran_growth$zoo +loadNamespace("zoo") + +nmonths <- cran_growth$nmonths +npackages <- cran_growth$npackages +mirrors <- cran_growth$mirrors +``` + +In the past `r nmonths` months, `r npackages["new"]` new packages were +added to the CRAN package repository. `r npackages["unarchived"]` packages +were unarchived, `r npackages["archived"]` were archived and +`r npackages["removed"]` had to be removed. The following shows the +growth of the number of active packages in the CRAN package repository: + +```{r cran_growth, fig.height=7, fig.width=10, fig.align='center', out.width = "100%", fig.alt="CRAN growth: Number of CRAN packages over time in levels (left) and in logs (right).", echo=FALSE} +par(mfrow = c(1, 2)) +plot(cran_zoo, lwd = 3, xlab = "Year", ylab = "", main = "Number of CRAN Packages") +plot(cran_zoo, lwd = 3, log = "y", xlab = "Year", ylab = "", main = "Number of CRAN Packages (Log-Scale)") +``` + +\noindent On `r to`, the number of active packages was around `r cran_zoo[to]`. + + + + + +# CRAN package submissions + +```{r cran_submissions_data, include=FALSE} +## generate on cran master via cran_submissions(from, to, ...) +cran_sub <- readRDS("cran_submissions.rds") +``` + +From `r format(cran_sub$from, "%B %Y")` to `r format(cran_sub$to, "%B %Y")` +CRAN received `r cran_sub$nsubmissions` package submissions. +For these, `r cran_sub$nactions` actions took place of which +`r cran_sub$autocheck["TRUE"]` (`r cran_sub$autocheckrel["TRUE"]`%) were auto processed actions and +`r cran_sub$autocheck["FALSE"]` (`r cran_sub$autocheckrel["FALSE"]`%) manual actions. 
+ +Minus some special cases, a summary of the auto-processed and manually +triggered actions follows: + +```{r cran_submissions_autouser, echo=FALSE} +knitr::kable(cran_sub$autouser[, !grepl("clang|special", colnames(cran_sub$autouser))], format = "pipe") +``` + +These include the final decisions for the submissions which were + +```{r cran_submissions_autolast, echo=FALSE} +knitr::kable(cran_sub$autolast, format = "pipe", align = "rr") +``` + +\noindent where we only count those as _auto_ processed whose publication or +rejection happened automatically in all steps. + +`r gsub("!", ".", cran_sub$newmember, fixed = TRUE)` +`r gsub("~", " ", cran_sub$oldmember, fixed = TRUE)` + + +# CRAN mirror security + +Currently, there are `r mirrors[1]` official CRAN mirrors, +`r mirrors[2]` of which provide both +secure downloads via '`https`' _and_ use secure mirroring from the CRAN master +(via rsync through ssh tunnels). Since the R 3.4.0 release, `chooseCRANmirror()` +offers these mirrors in preference to the others which are not fully secured (yet). + + +# CRAN Task View Initiative + +```{r cran_views_data, include=FALSE} +## generate via cran_views() +cran_views <- readRDS("cran_views.rds") +cran_views$active <- cran_views$ntotal/as.numeric(tail(cran_zoo, 1)) +``` + +```{r cran_views_new, echo=FALSE, results="asis"} +writeLines(cran_views$new) +``` + +Currently, there are `r cran_views$nviews` task views (see ), +with median and mean numbers of CRAN packages covered +`r round(median(cran_views$npackages))` and `r round(mean(cran_views$npackages))`, respectively. +Overall, these task views cover `r cran_views$ntotal` CRAN packages, +which is about `r round(100 * cran_views$active)`% of all active CRAN packages. 
diff --git a/_news/RJ-2025-4-cran/RJ-2025-4-cran.html b/_news/RJ-2025-4-cran/RJ-2025-4-cran.html new file mode 100644 index 0000000000..cc842e2a0b --- /dev/null +++ b/_news/RJ-2025-4-cran/RJ-2025-4-cran.html @@ -0,0 +1,1931 @@ + + + + + + + + + + + + + + + + + + + + + Changes on CRAN + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
    +

    Changes on CRAN

    +

    2025-10-01 to 2025-12-31

    + + +

    “Changes on CRAN” published in The R Journal.

    +
    + + + +
    +

    1 CRAN growth

    +

    In the past 3 months, 590 new packages were +added to the CRAN package repository. 158 packages +were unarchived, 594 were archived and +0 had to be removed. The following shows the +growth of the number of active packages in the CRAN package repository:
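The counts above determine the quarter's net change in active packages. As a quick sanity check on the arithmetic (a plain-Python sketch, not part of the diff or the rendered article; all figures are taken from the paragraph above):

```python
# Package-count changes reported for 2025-10-01 to 2025-12-31
new, unarchived, archived, removed = 590, 158, 594, 0

# Net change in active packages over the quarter:
# additions (new + unarchived) minus losses (archived + removed)
net = (new + unarchived) - (archived + removed)
print(net)  # 154
```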

    +
    +

    CRAN growth: Number of CRAN packages over time in levels (left) and in logs (right).

    +
    +

    On 2025-12-31, the number of active packages was around 22969.

    + +

    2 CRAN package submissions

    +

    From October 2025 to December 2025 +CRAN received 7066 package submissions. +For these, 11187 actions took place of which +8337 (75%) were auto processed actions and +2850 (25%) manual actions.
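The auto/manual shares quoted above follow directly from the action counts; a minimal check (plain Python, not part of the diff, using only the numbers in the paragraph above):

```python
# Action counts reported for October-December 2025
actions_total = 11187
auto, manual = 8337, 2850

# The two kinds of actions partition the total
assert auto + manual == actions_total

# Rounded percentage shares, as quoted in the text
print(round(100 * auto / actions_total))    # 75
print(round(100 * manual / actions_total))  # 25
```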

    +

    Minus some special cases, a summary of the auto-processed and manually +triggered actions follows:

    +
         archive  inspect  newbies  pending  pretest  publish  recheck  waiting
auto        2258      634     1643      146        0     2306      770      308
manual      1072        1        7        7       87     1345      244       83
    +

    These include the final decisions for the submissions which were

    +
         archive         publish
auto     2132 (31.2%)    2099 (30.7%)
manual   1066 (15.6%)    1545 (22.6%)
    +

    where we only count those as auto processed whose publication or +rejection happened automatically in all steps.

    +

    3 CRAN mirror security

    +

Currently, there are 93 official CRAN mirrors, +77 of which provide both +secure downloads via ‘https’ and use secure mirroring from the CRAN master +(via rsync through ssh tunnels). Since the R 3.4.0 release, chooseCRANmirror() +offers these mirrors in preference to the others which are not fully secured (yet).

    +

    4 CRAN Task View Initiative

    +
    + +
    +

    Currently, there are 49 task views (see https://CRAN.R-project.org/web/views/), +with median and mean numbers of CRAN packages covered +110 and 123, respectively. +Overall, these task views cover 5003 CRAN packages, +which is about 22% of all active CRAN packages.
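The coverage percentage above is the ratio of task-view-covered packages to active packages reported earlier; a hedged arithmetic check (plain Python, not part of the diff; both figures come from this article's text):

```python
# Task view coverage figures quoted above
covered = 5003   # CRAN packages covered by at least one task view
active = 22969   # active CRAN packages at the end of the quarter

# Rounded share of active packages covered, as quoted in the text
print(round(100 * covered / active))  # 22
```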

    +
    + + +
    + +
    +
    + + + + + +
    +

    Reuse

    +

    Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

    +

    Citation

    +

    For attribution, please cite this work as

    +
    Hornik, et al., "Changes on CRAN", The R Journal, 2025
    +

    BibTeX citation

    +
    @article{RJ-2025-4-cran,
    +  author = {Hornik, Kurt and Ligges, Uwe and Zeileis, Achim},
    +  title = {Changes on CRAN},
    +  journal = {The R Journal},
    +  year = {2025},
    +  note = {https://journal.r-project.org/news/RJ-2025-4-cran},
    +  volume = {17},
    +  issue = {4},
    +  issn = {2073-4859},
    +  pages = {326-327}
    +}
    +
    + + + + + + + diff --git a/_news/RJ-2025-4-cran/RJ-2025-4-cran.pdf b/_news/RJ-2025-4-cran/RJ-2025-4-cran.pdf new file mode 100644 index 0000000000..9c88c28e30 Binary files /dev/null and b/_news/RJ-2025-4-cran/RJ-2025-4-cran.pdf differ diff --git a/_news/RJ-2025-4-cran/RJ-2025-4-cran.tex b/_news/RJ-2025-4-cran/RJ-2025-4-cran.tex new file mode 100644 index 0000000000..3c3cdeceda --- /dev/null +++ b/_news/RJ-2025-4-cran/RJ-2025-4-cran.tex @@ -0,0 +1,134 @@ +% !TeX root = RJwrapper.tex +\title{Changes on CRAN} + +\subtitle{% +2025-10-01 to 2025-12-31 +} + +\author{by Kurt Hornik, Uwe Ligges, and Achim Zeileis} + +\maketitle + + +\section{CRAN growth}\label{cran-growth} + +In the past 3 months, 590~new packages were +added to the CRAN package repository. 158~packages +were unarchived, 594~were archived and +0~had to be removed. The following shows the +growth of the number of active packages in the CRAN package repository: + +\begin{center}\includegraphics[width=1\linewidth,alt={CRAN growth: Number of CRAN packages over time in levels (left) and in logs (right).}]{RJ-2025-4-cran_files/figure-latex/cran_growth-1} \end{center} + +\noindent On 2025-12-31, the number of active packages was around~22969. + +\section{CRAN package submissions}\label{cran-package-submissions} + +From October 2025 to December 2025 +CRAN received 7066~package submissions. +For these, 11187~actions took place of which +8337~(75\%) were auto processed actions and +2850~(25\%) manual actions. 
+ +Minus some special cases, a summary of the auto-processed and manually +triggered actions follows: + +\begin{longtable}[]{@{} + >{\raggedright\arraybackslash}p{(\linewidth - 16\tabcolsep) * \real{0.0986}} + >{\raggedleft\arraybackslash}p{(\linewidth - 16\tabcolsep) * \real{0.1127}} + >{\raggedleft\arraybackslash}p{(\linewidth - 16\tabcolsep) * \real{0.1127}} + >{\raggedleft\arraybackslash}p{(\linewidth - 16\tabcolsep) * \real{0.1127}} + >{\raggedleft\arraybackslash}p{(\linewidth - 16\tabcolsep) * \real{0.1127}} + >{\raggedleft\arraybackslash}p{(\linewidth - 16\tabcolsep) * \real{0.1127}} + >{\raggedleft\arraybackslash}p{(\linewidth - 16\tabcolsep) * \real{0.1127}} + >{\raggedleft\arraybackslash}p{(\linewidth - 16\tabcolsep) * \real{0.1127}} + >{\raggedleft\arraybackslash}p{(\linewidth - 16\tabcolsep) * \real{0.1127}}@{}} +\toprule\noalign{} +\begin{minipage}[b]{\linewidth}\raggedright +\end{minipage} & \begin{minipage}[b]{\linewidth}\raggedleft +archive +\end{minipage} & \begin{minipage}[b]{\linewidth}\raggedleft +inspect +\end{minipage} & \begin{minipage}[b]{\linewidth}\raggedleft +newbies +\end{minipage} & \begin{minipage}[b]{\linewidth}\raggedleft +pending +\end{minipage} & \begin{minipage}[b]{\linewidth}\raggedleft +pretest +\end{minipage} & \begin{minipage}[b]{\linewidth}\raggedleft +publish +\end{minipage} & \begin{minipage}[b]{\linewidth}\raggedleft +recheck +\end{minipage} & \begin{minipage}[b]{\linewidth}\raggedleft +waiting +\end{minipage} \\ +\midrule\noalign{} +\endhead +\bottomrule\noalign{} +\endlastfoot +auto & 2258 & 634 & 1643 & 146 & 0 & 2306 & 770 & 308 \\ +manual & 1072 & 1 & 7 & 7 & 87 & 1345 & 244 & 83 \\ +\end{longtable} + +These include the final decisions for the submissions which were + +\begin{longtable}[]{@{}lrr@{}} +\toprule\noalign{} +& archive & publish \\ +\midrule\noalign{} +\endhead +\bottomrule\noalign{} +\endlastfoot +auto & 2132 (31.2\%) & 2099 (30.7\%) \\ +manual & 1066 (15.6\%) & 1545 (22.6\%) \\ +\end{longtable} + 
+\noindent where we only count those as \emph{auto} processed whose publication or +rejection happened automatically in all steps. + +\section{CRAN mirror security}\label{cran-mirror-security} + +Currently, there are 93 official CRAN mirrors, +77~of which provide both +secure downloads via `\texttt{https}' \emph{and} use secure mirroring from the CRAN master +(via rsync through ssh tunnels). Since the~R 3.4.0 release, \texttt{chooseCRANmirror()} +offers these mirrors in preference to the others which are not fully secured (yet). + +\section{CRAN Task View Initiative}\label{cran-task-view-initiative} + +Currently, there are 49~task views (see \url{https://CRAN.R-project.org/web/views/}), +with median and mean numbers of CRAN packages covered +110 and~123, respectively. +Overall, these task views cover 5003~CRAN packages, +which is about 22\% of all active CRAN packages. + + +\address{% +Kurt Hornik\\ +WU Wirtschaftsuniversität Wien\\% +Austria\\ +% +% +\textit{ORCiD: \href{https://orcid.org/0000-0003-4198-9911}{0000-0003-4198-9911}}\\% +\href{mailto:Kurt.Hornik@R-project.org}{\nolinkurl{Kurt.Hornik@R-project.org}}% +} + +\address{% +Uwe Ligges\\ +TU Dortmund\\% +Germany\\ +% +% +\textit{ORCiD: \href{https://orcid.org/0000-0001-5875-6167}{0000-0001-5875-6167}}\\% +\href{mailto:Uwe.Ligges@R-project.org}{\nolinkurl{Uwe.Ligges@R-project.org}}% +} + +\address{% +Achim Zeileis\\ +Universität Innsbruck\\% +Austria\\ +% +% +\textit{ORCiD: \href{https://orcid.org/0000-0003-0918-3766}{0000-0003-0918-3766}}\\% +\href{mailto:Achim.Zeileis@R-project.org}{\nolinkurl{Achim.Zeileis@R-project.org}}% +} diff --git a/_news/RJ-2025-4-cran/RJournal.sty b/_news/RJ-2025-4-cran/RJournal.sty new file mode 100644 index 0000000000..351990be38 --- /dev/null +++ b/_news/RJ-2025-4-cran/RJournal.sty @@ -0,0 +1,358 @@ +% Package `RJournal' to use with LaTeX2e +% Copyright (C) 2010 by the R Foundation +% Copyright (C) 2013 by the R Journal +% +% Originally written by Kurt Hornik and Friedrich 
Leisch with subsequent +% edits by the editorial board +% +% CAUTION: +% Do not modify this style file. Any changes to this file will be reset when your +% article is submitted. +% If you must modify the style or add LaTeX packages to the article, these +% should be specified in RJwrapper.tex + +\NeedsTeXFormat{LaTeX2e}[1995/12/01] +\ProvidesPackage{RJournal}[2025/10/05 v0.17 RJournal package] + +\RequirePackage{tikz} + +% Overall page layout, fonts etc ----------------------------------------------- + +% Issues of of \emph{The R Journal} are created from the standard \LaTeX{} +% document class \pkg{report}. + +\RequirePackage{geometry} +\geometry{a4paper, + textwidth=14cm, top=1cm, bottom=1cm, + includehead,includefoot,centering, + footskip=1.5cm} +\raggedbottom +\sloppy +\clubpenalty = 10000 +\widowpenalty = 10000 +\brokenpenalty = 10000 +\usepackage{microtype} + + +\RequirePackage{fancyhdr} +\fancyhead{} +\fancyheadoffset{2cm} +\fancyhead[L]{\textsc{\RJ@sectionhead}} +\fancyhead[R]{\thepage} +\fancyfoot{} +\fancyfoot[L]{The R Journal Vol. 
\RJ@volume/\RJ@number, \RJ@month~\RJ@year} +\fancyfoot[R]{ISSN 2073-4859} +\pagestyle{fancy} + +% We use the following fonts (all with T1 encoding): +% +% rm & palatino +% tt & inconsolata +% sf & helvetica +% math & palatino + +\RequirePackage{microtype} + +\RequirePackage[scaled=0.92]{helvet} +\RequirePackage{palatino,mathpazo} +\RequirePackage[scaled=1.02]{inconsolata} +\RequirePackage[T1]{fontenc} + +\RequirePackage[hyphens]{url} +\RequirePackage[pagebackref]{hyperref} +\renewcommand{\backref}[1]{[p#1]} + +% Dark blue colour for all links +\RequirePackage{color} +\definecolor{link}{rgb}{0.45,0.51,0.67} +\hypersetup{ + colorlinks,% + citecolor=link,% + filecolor=link,% + linkcolor=link,% + urlcolor=link +} + +% Give the text a little room to breath +\setlength{\parskip}{3pt} +\RequirePackage{setspace} +\setstretch{1.05} + +% Issue and article metadata --------------------------------------------------- + +% Basic front matter information about the issue: volume, number, and +% date. + +\newcommand{\volume}[1]{\def\RJ@volume{#1}} +\newcommand{\volnumber}[1]{\def\RJ@number{#1}} +\renewcommand{\month}[1]{\def\RJ@month{#1}} +\renewcommand{\year}[1]{\def\RJ@year{#1}} + + +% Individual articles correspond to +% chapters, and are contained in |article| environments. This makes it +% easy to have figures counted within articles and hence hyperlinked +% correctly. + +% An article has an author, a title, and optionally a subtitle. We use +% the obvious commands for specifying these. Articles will be put in certain +% journal sections, named by \sectionhead. + +\newcommand {\sectionhead} [1]{\def\RJ@sectionhead{#1}} +\renewcommand{\author} [1]{\def\RJ@author{#1}} +\renewcommand{\title} [1]{\def\RJ@title{#1}} +\newcommand {\subtitle} [1]{\def\RJ@subtitle{#1}} + +% Control appearance of titles: make slightly smaller than usual, and +% suppress section numbering. 
See http://tex.stackexchange.com/questions/69749 +% for why we don't use \setcounter{secnumdepth}{-1} + +\usepackage[medium]{titlesec} +\usepackage{titletoc} +\titleformat{\section} {\normalfont\large\bfseries}{\arabic{section}}{1em}{} +\titleformat{\subsection}{\normalfont\normalsize\bfseries}{\arabic{section}.\arabic{subsection}}{0.5em}{} +\titlecontents{chapter} [0em]{}{}{}{\titlerule*[1em]{.}\contentspage} + +% Article layout --------------------------------------------------------------- + +% Environment |article| clears the article header information at its beginning. +% We use |\FloatBarrier| from the placeins package to keep floats within +% the article. +\RequirePackage{placeins} +\newenvironment{article}{\author{}\title{}\subtitle{}\FloatBarrier}{\FloatBarrier} + +% Refereed articles should have an abstract, so we redefine |\abstract| to +% give the desired style + +\renewcommand{\abstract}[1]{\noindent\textbf{Abstract} #1} +\renewenvironment{abstract}{\noindent\textbf{Abstract}~}{} + +% The real work is done by a redefined version of |\maketitle|. Note +% that even though we do not want chapters (articles) numbered, we +% need to increment the chapter counter, so that figures get correct +% labelling. + +\renewcommand{\maketitle}{% +\noindent + \chapter{\RJ@title}\refstepcounter{chapter} + \ifx\empty\RJ@subtitle + \else + \noindent\textbf{\RJ@subtitle} + \par\nobreak\addvspace{\baselineskip} + \fi + \ifx\empty\RJ@author + \else + \noindent\textit{\RJ@author} + \par\nobreak\addvspace{\baselineskip} + \fi + \@afterindentfalse\@nobreaktrue\@afterheading +} + +% Now for some ugly redefinitions. We do not want articles to start a +% new page. 
(Actually, we do, but this is handled via explicit +% \newpage +% +% The name@of@eq is a hack to get hyperlinks to equations to work +% within each article, even though there may be multiple eq.(1) +% \begin{macrocode} +\renewcommand\chapter{\secdef\RJ@chapter\@schapter} +\providecommand{\nohyphens}{% + \hyphenpenalty=10000\exhyphenpenalty=10000\relax} +\newcommand{\RJ@chapter}{% + \edef\name@of@eq{equation.\@arabic{\c@chapter}}% + \renewcommand{\@seccntformat}[1]{}% + \@startsection{chapter}{0}{0mm}{% + -2\baselineskip \@plus -\baselineskip \@minus -.2ex}{\p@}{% + \phantomsection\normalfont\huge\bfseries\raggedright}} + +% Book reviews should appear as sections in the text and in the pdf bookmarks, +% however we wish them to appear as chapters in the TOC. Thus we define an +% alternative to |\maketitle| for reviews. +\newcommand{\review}[1]{ + \pdfbookmark[1]{#1}{#1} + \section*{#1} + \addtocontents{toc}{\protect\contentsline{chapter}{#1}{\thepage}{#1.1}} +} + +% We want bibliographies as starred sections within articles. +% +\RequirePackage[sectionbib,round]{natbib} +\bibliographystyle{abbrvnat} +\renewcommand{\bibsection}{\section*{References}} + +% Equations, figures and tables are counted within articles, but we do +% not show the article number. For equations it becomes a bit messy to avoid +% having hyperref getting it wrong. + +% \numberwithin{equation}{chapter} +\renewcommand{\theequation}{\@arabic\c@equation} +\renewcommand{\thefigure}{\@arabic\c@figure} +\renewcommand{\thetable}{\@arabic\c@table} + +% Issue layout ----------------------------------------------------------------- + +% Need to provide our own version of |\tableofcontents|. We use the +% tikz package to get the rounded rectangle. Notice that |\section*| +% is really the same as |\chapter*|. 
+\renewcommand{\contentsname}{Contents} +\renewcommand\tableofcontents{% + \vspace{1cm} + \section*{\contentsname} + { \@starttoc{toc} } +} + +\renewcommand{\titlepage}{% + \thispagestyle{empty} + \hypersetup{ + pdftitle={The R Journal Volume \RJ@volume/\RJ@number, \RJ@month \RJ@year},% + pdfauthor={R Foundation for Statistical Computing},% + } + \noindent + \begin{center} + \fontsize{50pt}{50pt}\selectfont + The \raisebox{-8pt}{\includegraphics[height=77pt]{Rlogo-5}}\hspace{10pt} + Journal + + \end{center} + {\large \hfill Volume \RJ@volume/\RJ@number, \RJ@month{} \RJ@year \quad} + + \rule{\textwidth}{1pt} + \begin{center} + {\Large A peer-reviewed, open-access publication of the \\ + R Foundation for Statistical Computing} + \end{center} + + % And finally, put in the TOC box. Note the way |tocdepth| is adjusted + % before and after producing the TOC: thus, we can ensure that only + % articles show up in the printed TOC, but that in the PDF version, + % bookmarks are created for sections and subsections as well (provided + % that the non-starred forms are used). + \setcounter{tocdepth}{0} + \tableofcontents + \setcounter{tocdepth}{2} + \clearpage +} + +% Text formatting -------------------------------------------------------------- + +\newcommand{\R}{R} +\newcommand{\address}[1]{\addvspace{\baselineskip}\noindent\emph{#1}} +\newcommand{\email}[1]{\href{mailto:#1}{\normalfont\texttt{#1}}} + +% Simple font selection is not good enough. For example, |\texttt{--}| +% gives `\texttt{--}', i.e., an endash in typewriter font. Hence, we +% need to turn off ligatures, which currently only happens for commands +% |\code| and |\samp| and the ones derived from them. Hyphenation is +% another issue; it should really be turned off inside |\samp|. And +% most importantly, \LaTeX{} special characters are a nightmare. E.g., +% one needs |\~{}| to produce a tilde in a file name marked by |\file|. 
+% Perhaps a few years ago, most users would have agreed that this may be +% unfortunate but should not be changed to ensure consistency. But with +% the advent of the WWW and the need for getting `|~|' and `|#|' into +% URLs, commands which only treat the escape and grouping characters +% specially have gained acceptance + +\DeclareRobustCommand\code{\bgroup\@noligs\@codex} +\def\@codex#1{\texorpdfstring% +{{\normalfont\ttfamily\hyphenchar\font=-1 #1}}% +{#1}\egroup} +\newcommand{\kbd}[1]{{\normalfont\texttt{#1}}} +\newcommand{\key}[1]{{\normalfont\texttt{\uppercase{#1}}}} +\DeclareRobustCommand\samp{`\bgroup\@noligs\@sampx} +\def\@sampx#1{{\normalfont\texttt{#1}}\egroup'} +\newcommand{\var}[1]{{\normalfont\textsl{#1}}} +\let\env=\code +\newcommand{\file}[1]{{`\normalfont\textsf{#1}'}} +\let\command=\code +\let\option=\samp +\newcommand{\dfn}[1]{{\normalfont\textsl{#1}}} +% \acronym is effectively disabled since not used consistently +\newcommand{\acronym}[1]{#1} +\newcommand{\strong}[1]{\texorpdfstring% +{{\normalfont\fontseries{b}\selectfont #1}}% +{#1}} +\let\pkg=\strong +\newcommand{\CRANpkg}[1]{\href{https://CRAN.R-project.org/package=#1}{\pkg{#1}}}% +\let\cpkg=\CRANpkg +\newcommand{\ctv}[1]{\href{https://CRAN.R-project.org/view=#1}{\emph{#1}}} +\newcommand{\BIOpkg}[1]{\href{https://www.bioconductor.org/packages/release/bioc/html/#1.html}{\pkg{#1}}} + +% Example environments --------------------------------------------------------- +\RequirePackage{fancyvrb} +\RequirePackage{alltt} + +\DefineVerbatimEnvironment{example}{Verbatim}{} +\renewenvironment{example*}{\begin{alltt}}{\end{alltt}} + +% Support for output from Sweave, and generic session style code +% These used to have fontshape=sl for Sinput/Scode/Sin, but pslatex +% won't use a condensed font in that case. 
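The markup commands defined above are easiest to see in a short usage sketch. The snippet below is hypothetical — the package name, option, and file name are illustrative, not taken from any article in this issue:

```latex
% Inside an {article} environment of an issue built with RJournal.sty:
% \CRANpkg links the bold package name to its CRAN page,
% \code sets inline code in typewriter type with ligatures disabled,
% \samp wraps a literal option in quotes, and \file marks a file name.
The \CRANpkg{ggplot2} package is attached with \code{library(ggplot2)};
pass the \samp{--vanilla} option, or edit \file{DESCRIPTION} directly.
```

Because \code disables ligatures, \code{--} prints two hyphens rather than the endash that plain \texttt{} would produce, which is the motivation given in the comment above.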
+ +% Update (2015-05-28 by DS): remove fontsize=\small to match example environment + +\DefineVerbatimEnvironment{Sinput}{Verbatim}{} +\DefineVerbatimEnvironment{Soutput}{Verbatim}{} +\DefineVerbatimEnvironment{Scode}{Verbatim}{} +\DefineVerbatimEnvironment{Sin}{Verbatim}{} +\DefineVerbatimEnvironment{Sout}{Verbatim}{} +\newenvironment{Schunk}{}{} + +% Mathematics ------------------------------------------------------------------ + +% The implementation of |\operatorname| is similar to the mechanism +% \LaTeXe{} uses for functions like sin and cos, and simpler than the +% one of \AmSLaTeX{}. We use |\providecommand| for the definition in +% order to keep the one of the \pkg{amstex} if this package has +% already been loaded. +% \begin{macrocode} +\providecommand{\operatorname}[1]{% + \mathop{\operator@font#1}\nolimits} +\RequirePackage{amsfonts} + +\renewcommand{\P}{% + \mathop{\operator@font I\hspace{-1.5pt}P\hspace{.13pt}}} +\newcommand{\E}{% + \mathop{\operator@font I\hspace{-1.5pt}E\hspace{.13pt}}} +\newcommand{\VAR}{\operatorname{var}} +\newcommand{\COV}{\operatorname{cov}} +\newcommand{\COR}{\operatorname{cor}} + +% Figures ---------------------------------------------------------------------- + +% For use with pandoc > 3.2.1 +\newsavebox\pandoc@box +\newcommand*\pandocbounded[1]{% scales image to fit in text height/width + \sbox\pandoc@box{#1}% + \Gscale@div\@tempa{\textheight}{\dimexpr\ht\pandoc@box+\dp\pandoc@box\relax}% + \Gscale@div\@tempb{\linewidth}{\wd\pandoc@box}% + \ifdim\@tempb\p@<\@tempa\p@\let\@tempa\@tempb\fi% select the smaller of both + \ifdim\@tempa\p@<\p@\scalebox{\@tempa}{\usebox\pandoc@box}% + \else\usebox{\pandoc@box}% + \fi% +} + +\RequirePackage[font=small,labelfont=bf]{caption} + +% Wide environments for figures and tables ------------------------------------- +\RequirePackage{environ} + +% An easy way to make a figure span the full width of the page +\NewEnviron{widefigure}[1][]{ +\begin{figure}[#1] +\advance\leftskip-2cm 
+\begin{minipage}{\dimexpr\textwidth+4cm\relax}% + \captionsetup{margin=2cm} + \BODY +\end{minipage}% +\end{figure} +} + +\NewEnviron{widetable}[1][]{ +\begin{table}[#1] +\advance\leftskip-2cm +\begin{minipage}{\dimexpr\textwidth+4cm\relax}% + \captionsetup{margin=2cm} + \BODY +\end{minipage}% +\end{table} +} diff --git a/_news/RJ-2025-4-cran/RJwrapper.tex b/_news/RJ-2025-4-cran/RJwrapper.tex new file mode 100644 index 0000000000..9cb6138bc7 --- /dev/null +++ b/_news/RJ-2025-4-cran/RJwrapper.tex @@ -0,0 +1,71 @@ +\documentclass[a4paper]{report} +\usepackage[utf8]{inputenc} +\usepackage[T1]{fontenc} +\usepackage{RJournal} +\usepackage{amsmath,amssymb,array} +\usepackage{booktabs} + + +% tightlist command for lists without linebreak +\providecommand{\tightlist}{% + \setlength{\itemsep}{0pt}\setlength{\parskip}{0pt}} + +\usepackage{longtable} + +% Always define CSL refs as bib entries are contained in separate doc +% Pandoc citation processing +%From Pandoc 3.1.8 +% definitions for citeproc citations +\NewDocumentCommand\citeproctext{}{} +\NewDocumentCommand\citeproc{mm}{% + \begingroup\def\citeproctext{#2}\cite{#1}\endgroup} +\makeatletter + % allow citations to break across lines + \let\@cite@ofmt\@firstofone + % avoid brackets around text for \cite: + \def\@biblabel#1{} + \def\@cite#1#2{{#1\if@tempswa , #2\fi}} +\makeatother +\newlength{\cslhangindent} +\setlength{\cslhangindent}{1.5em} +\newlength{\csllabelwidth} +\setlength{\csllabelwidth}{3em} +\newenvironment{CSLReferences}[2] % #1 hanging-indent, #2 entry-spacing + {\begin{list}{}{% + \setlength{\itemindent}{0pt} + \setlength{\leftmargin}{0pt} + \setlength{\parsep}{0pt} + % turn on hanging indent if param 1 is 1 + \ifodd #1 + \setlength{\leftmargin}{\cslhangindent} + \setlength{\itemindent}{-1\cslhangindent} + \fi + % set entry spacing + \setlength{\itemsep}{#2\baselineskip}}} + {\end{list}} +\usepackage{calc} +\newcommand{\CSLBlock}[1]{#1\hfill\break} 
+\newcommand{\CSLLeftMargin}[1]{\parbox[t]{\csllabelwidth}{#1}} +\newcommand{\CSLRightInline}[1]{\parbox[t]{\linewidth - \csllabelwidth}{#1}\break} +\newcommand{\CSLIndent}[1]{\hspace{\cslhangindent}#1} + + +\usepackage{float, longtable} + +\begin{document} + + +%% do not edit, for illustration only +\sectionhead{Contributed research article} +\volume{17} +\volnumber{4} +\year{2025} +\month{December} +\setcounter{page}{326} + +\begin{article} + \input{RJ-2025-4-cran} +\end{article} + + +\end{document} diff --git a/_news/RJ-2025-4-cran/cran_growth.rds b/_news/RJ-2025-4-cran/cran_growth.rds new file mode 100644 index 0000000000..e0f4ac39ae Binary files /dev/null and b/_news/RJ-2025-4-cran/cran_growth.rds differ diff --git a/_news/RJ-2025-4-cran/cran_submissions.rds b/_news/RJ-2025-4-cran/cran_submissions.rds new file mode 100644 index 0000000000..f0d542af65 Binary files /dev/null and b/_news/RJ-2025-4-cran/cran_submissions.rds differ diff --git a/_news/RJ-2025-4-cran/cran_views.rds b/_news/RJ-2025-4-cran/cran_views.rds new file mode 100644 index 0000000000..9e3843eb65 Binary files /dev/null and b/_news/RJ-2025-4-cran/cran_views.rds differ diff --git a/_news/RJ-2025-4-editorial/RJ-2025-4-editorial.R b/_news/RJ-2025-4-editorial/RJ-2025-4-editorial.R new file mode 100644 index 0000000000..e1312d65c6 --- /dev/null +++ b/_news/RJ-2025-4-editorial/RJ-2025-4-editorial.R @@ -0,0 +1,8 @@ +# Generated by `rjournal_pdf_article()` using `knitr::purl()`: do not edit by hand +# Please edit RJ-2025-4-editorial.Rmd to modify this file + +## ----setup, include=FALSE----------------------------------------------------- +knitr::opts_chunk$set(echo = FALSE, warning = FALSE, message = FALSE, fig.align = "center", + fig.retina = 5, fig.path = "figs/") + diff --git a/_news/RJ-2025-4-editorial/RJ-2025-4-editorial.Rmd b/_news/RJ-2025-4-editorial/RJ-2025-4-editorial.Rmd new file mode 100644 index 0000000000..2eea35f40b --- /dev/null +++ 
b/_news/RJ-2025-4-editorial/RJ-2025-4-editorial.Rmd @@ -0,0 +1,49 @@ +--- +title: Editorial +draft: no +author: +- name: Rob J Hyndman + affiliation: Monash University + url: https://journal.r-project.org + email: r-journal@r-project.org +date: '2025-12-01' +creative_commons: CC BY +output: + rjtools::rjournal_article: + self_contained: yes + toc: no +volume: 17 +issue: 4 +slug: RJ-2025-4-editorial +journal: + lastpage: 3 + firstpage: 3 + +--- + + +```{r setup, include=FALSE} +knitr::opts_chunk$set(echo = FALSE, warning = FALSE, message = FALSE, fig.align = "center", + fig.retina = 5, fig.path = "figs/") +``` + +# Editorial changes {-} + +This is the last issue for 2025, and so it also marks the end of Mark van der Loo's term as an Executive Editor of the *R Journal*. We thank him for his work over the last four years, and especially for his service as Editor-in-Chief during 2024. He has consistently worked to improve the *R Journal*, and raise the standard of published articles and packages, and we are grateful for his contributions. + +This is also the last issue for me as Editor-in-Chief of the *R Journal*. It has been an honour and a pleasure to serve in this role, and a privilege to work with a great group of editors and associate editors. I am happy to hand the reins to Dr Emi Tanaka as the incoming Editor-in-Chief, who has been an Executive Editor since 2024. Emi is a Senior Lecturer at the Australian National University, and is a very active member of the R community with several contributed packages on CRAN. + +We also welcome Professor Vincent Arel-Bundock, from the University of Montreal, as a new Executive Editor for the period 2026--2029. He joins Emi Tanaka, Emily Zabor and me as the team of Executive Editors for 2026. + +We also welcome two new Associate Editors: Selçuk Korkmaz and Maciej Beręsewicz. We are grateful to them, and the large team of Associate Editors, for their willingness to contribute to the *R Journal*. 
+ +Finally, thanks to Mitchell O'Hara-Wild, who is stepping down as Technical Editor of the Journal. He has provided wonderful support to the *R Journal* over many years, solving countless bewildering technical issues in order to make the Journal website function smoothly. In his place, we welcome Abhishek Ulayil, who will be the new Technical Editor from 2026. We are grateful to Abhishek for taking on this valuable role. + +# New guidelines + +We recently introduced new guidelines for papers about R packages (which cover the vast majority of the papers we publish). The new guidelines are available on the [*R Journal* website](https://journal.r-project.org/R_package_guidelines.html). The intention is to make the expectations for authors clearer, and to improve the quality and consistency of articles published in the *R Journal*. Prospective authors should follow these guidelines when preparing their submissions. + +# In this issue {-} + +On behalf of the editorial board, I am pleased to present Volume 17 Issue 4 of the R Journal. This issue features 15 research articles. Each article relates to an R package available on CRAN, providing an overview of the package, its functionality, and examples of its use. Supplementary material for each article, with fully reproducible code, is available for download from the Journal website. We also include news from CRAN, Bioconductor, the Forwards Taskforce, and the R Foundation. diff --git a/_news/RJ-2025-4-editorial/RJ-2025-4-editorial.html b/_news/RJ-2025-4-editorial/RJ-2025-4-editorial.html new file mode 100644 index 0000000000..eda0cad7e0 --- /dev/null +++ b/_news/RJ-2025-4-editorial/RJ-2025-4-editorial.html @@ -0,0 +1,1808 @@
diff --git a/_news/RJ-2025-4-editorial/RJ-2025-4-editorial.pdf b/_news/RJ-2025-4-editorial/RJ-2025-4-editorial.pdf new file mode 100644 index 0000000000..640c7b9fe7 Binary files /dev/null and b/_news/RJ-2025-4-editorial/RJ-2025-4-editorial.pdf differ diff --git a/_news/RJ-2025-4-editorial/RJ-2025-4-editorial.tex b/_news/RJ-2025-4-editorial/RJ-2025-4-editorial.tex new file mode 100644 index 0000000000..f26d5cc6af --- /dev/null +++ b/_news/RJ-2025-4-editorial/RJ-2025-4-editorial.tex @@ -0,0 +1,41 @@ +% !TeX root = RJwrapper.tex +\title{Editorial} + + +\author{by Rob J Hyndman} + +\maketitle + + +\section*{Editorial changes}\label{editorial-changes} +\addcontentsline{toc}{section}{Editorial changes} + +This is the last issue for 2025, and so it also marks the end of Mark van der Loo's term as an Executive Editor of the \emph{R Journal}. We thank him for his work over the last four years, and especially for his service as Editor-in-Chief during 2024. He has consistently worked to improve the \emph{R Journal}, and raise the standard of published articles and packages, and we are grateful for his contributions. + +This is also the last issue for me as Editor-in-Chief of the \emph{R Journal}. It has been an honour and a pleasure to serve in this role, and a privilege to work with a great group of editors and associate editors. I am happy to hand the reins to Dr Emi Tanaka as the incoming Editor-in-Chief, who has been an Executive Editor since 2024. Emi is a Senior Lecturer at the Australian National University, and is a very active member of the R community with several contributed packages on CRAN. + +We also welcome Professor Vincent Arel-Bundock, from the University of Montreal, as a new Executive Editor for the period 2026--2029. He joins Emi Tanaka, Emily Zabor and me as the team of Executive Editors for 2026. + +We also welcome two new Associate Editors: Selçuk Korkmaz and Maciej Beręsewicz. 
We are grateful to them, and the large team of Associate Editors, for their willingness to contribute to the \emph{R Journal}. + +Finally, thanks to Mitchell O'Hara-Wild, who is stepping down as Technical Editor of the Journal. He has provided wonderful support to the \emph{R Journal} over many years, solving countless bewildering technical issues in order to make the Journal website function smoothly. In his place, we welcome Abhishek Ulayil, who will be the new Technical Editor from 2026. We are grateful to Abhishek for taking on this valuable role. + +\section{New guidelines}\label{new-guidelines} + +We recently introduced new guidelines for papers about R packages (which cover the vast majority of the papers we publish). The new guidelines are available on the \href{https://journal.r-project.org/R_package_guidelines.html}{\emph{R Journal} website}. The intention is to make the expectations for authors clearer, and to improve the quality and consistency of articles published in the \emph{R Journal}. Prospective authors should follow these guidelines when preparing their submissions. + +\section*{In this issue}\label{in-this-issue} +\addcontentsline{toc}{section}{In this issue} + +On behalf of the editorial board, I am pleased to present Volume 17 Issue 4 of the R Journal. This issue features 15 research articles. Each article relates to an R package available on CRAN, providing an overview of the package, its functionality, and examples of its use. Supplementary material for each article, with fully reproducible code, is available for download from the Journal website. We also include news from CRAN, Bioconductor, the Forwards Taskforce, and the R Foundation. 
+ + +\address{% +Rob J Hyndman\\ +Monash University\\% +\\ +% +\url{https://journal.r-project.org}\\% +% +\href{mailto:r-journal@r-project.org}{\nolinkurl{r-journal@r-project.org}}% +} diff --git a/_news/RJ-2025-4-editorial/RJournal.sty b/_news/RJ-2025-4-editorial/RJournal.sty new file mode 100644 index 0000000000..351990be38 --- /dev/null +++ b/_news/RJ-2025-4-editorial/RJournal.sty @@ -0,0 +1,358 @@ +% Package `RJournal' to use with LaTeX2e +% Copyright (C) 2010 by the R Foundation +% Copyright (C) 2013 by the R Journal +% +% Originally written by Kurt Hornik and Friedrich Leisch with subsequent +% edits by the editorial board +% +% CAUTION: +% Do not modify this style file. Any changes to this file will be reset when your +% article is submitted. +% If you must modify the style or add LaTeX packages to the article, these +% should be specified in RJwrapper.tex + +\NeedsTeXFormat{LaTeX2e}[1995/12/01] +\ProvidesPackage{RJournal}[2025/10/05 v0.17 RJournal package] + +\RequirePackage{tikz} + +% Overall page layout, fonts etc ----------------------------------------------- + +% Issues of \emph{The R Journal} are created from the standard \LaTeX{} +% document class \pkg{report}. + +\RequirePackage{geometry} +\geometry{a4paper, + textwidth=14cm, top=1cm, bottom=1cm, + includehead,includefoot,centering, + footskip=1.5cm} +\raggedbottom +\sloppy +\clubpenalty = 10000 +\widowpenalty = 10000 +\brokenpenalty = 10000 +\usepackage{microtype} + + +\RequirePackage{fancyhdr} +\fancyhead{} +\fancyheadoffset{2cm} +\fancyhead[L]{\textsc{\RJ@sectionhead}} +\fancyhead[R]{\thepage} +\fancyfoot{} +\fancyfoot[L]{The R Journal Vol. 
\RJ@volume/\RJ@number, \RJ@month~\RJ@year} +\fancyfoot[R]{ISSN 2073-4859} +\pagestyle{fancy} + +% We use the following fonts (all with T1 encoding): +% +% rm & palatino +% tt & inconsolata +% sf & helvetica +% math & palatino + +\RequirePackage{microtype} + +\RequirePackage[scaled=0.92]{helvet} +\RequirePackage{palatino,mathpazo} +\RequirePackage[scaled=1.02]{inconsolata} +\RequirePackage[T1]{fontenc} + +\RequirePackage[hyphens]{url} +\RequirePackage[pagebackref]{hyperref} +\renewcommand{\backref}[1]{[p#1]} + +% Dark blue colour for all links +\RequirePackage{color} +\definecolor{link}{rgb}{0.45,0.51,0.67} +\hypersetup{ + colorlinks,% + citecolor=link,% + filecolor=link,% + linkcolor=link,% + urlcolor=link +} + +% Give the text a little room to breath +\setlength{\parskip}{3pt} +\RequirePackage{setspace} +\setstretch{1.05} + +% Issue and article metadata --------------------------------------------------- + +% Basic front matter information about the issue: volume, number, and +% date. + +\newcommand{\volume}[1]{\def\RJ@volume{#1}} +\newcommand{\volnumber}[1]{\def\RJ@number{#1}} +\renewcommand{\month}[1]{\def\RJ@month{#1}} +\renewcommand{\year}[1]{\def\RJ@year{#1}} + + +% Individual articles correspond to +% chapters, and are contained in |article| environments. This makes it +% easy to have figures counted within articles and hence hyperlinked +% correctly. + +% An article has an author, a title, and optionally a subtitle. We use +% the obvious commands for specifying these. Articles will be put in certain +% journal sections, named by \sectionhead. + +\newcommand {\sectionhead} [1]{\def\RJ@sectionhead{#1}} +\renewcommand{\author} [1]{\def\RJ@author{#1}} +\renewcommand{\title} [1]{\def\RJ@title{#1}} +\newcommand {\subtitle} [1]{\def\RJ@subtitle{#1}} + +% Control appearance of titles: make slightly smaller than usual, and +% suppress section numbering. 
See http://tex.stackexchange.com/questions/69749 +% for why we don't use \setcounter{secnumdepth}{-1} + +\usepackage[medium]{titlesec} +\usepackage{titletoc} +\titleformat{\section} {\normalfont\large\bfseries}{\arabic{section}}{1em}{} +\titleformat{\subsection}{\normalfont\normalsize\bfseries}{\arabic{section}.\arabic{subsection}}{0.5em}{} +\titlecontents{chapter} [0em]{}{}{}{\titlerule*[1em]{.}\contentspage} + +% Article layout --------------------------------------------------------------- + +% Environment |article| clears the article header information at its beginning. +% We use |\FloatBarrier| from the placeins package to keep floats within +% the article. +\RequirePackage{placeins} +\newenvironment{article}{\author{}\title{}\subtitle{}\FloatBarrier}{\FloatBarrier} + +% Refereed articles should have an abstract, so we redefine |\abstract| to +% give the desired style + +\renewcommand{\abstract}[1]{\noindent\textbf{Abstract} #1} +\renewenvironment{abstract}{\noindent\textbf{Abstract}~}{} + +% The real work is done by a redefined version of |\maketitle|. Note +% that even though we do not want chapters (articles) numbered, we +% need to increment the chapter counter, so that figures get correct +% labelling. + +\renewcommand{\maketitle}{% +\noindent + \chapter{\RJ@title}\refstepcounter{chapter} + \ifx\empty\RJ@subtitle + \else + \noindent\textbf{\RJ@subtitle} + \par\nobreak\addvspace{\baselineskip} + \fi + \ifx\empty\RJ@author + \else + \noindent\textit{\RJ@author} + \par\nobreak\addvspace{\baselineskip} + \fi + \@afterindentfalse\@nobreaktrue\@afterheading +} + +% Now for some ugly redefinitions. We do not want articles to start a +% new page. 
(Actually, we do, but this is handled via an explicit +% \newpage.) +% +% The name@of@eq is a hack to get hyperlinks to equations to work +% within each article, even though there may be multiple eq.(1) +% \begin{macrocode} +\renewcommand\chapter{\secdef\RJ@chapter\@schapter} +\providecommand{\nohyphens}{% + \hyphenpenalty=10000\exhyphenpenalty=10000\relax} +\newcommand{\RJ@chapter}{% + \edef\name@of@eq{equation.\@arabic{\c@chapter}}% + \renewcommand{\@seccntformat}[1]{}% + \@startsection{chapter}{0}{0mm}{% + -2\baselineskip \@plus -\baselineskip \@minus -.2ex}{\p@}{% + \phantomsection\normalfont\huge\bfseries\raggedright}} + +% Book reviews should appear as sections in the text and in the pdf bookmarks, +% however we wish them to appear as chapters in the TOC. Thus we define an +% alternative to |\maketitle| for reviews. +\newcommand{\review}[1]{ + \pdfbookmark[1]{#1}{#1} + \section*{#1} + \addtocontents{toc}{\protect\contentsline{chapter}{#1}{\thepage}{#1.1}} +} + +% We want bibliographies as starred sections within articles. +% +\RequirePackage[sectionbib,round]{natbib} +\bibliographystyle{abbrvnat} +\renewcommand{\bibsection}{\section*{References}} + +% Equations, figures and tables are counted within articles, but we do +% not show the article number. For equations it becomes a bit messy to avoid +% having hyperref getting it wrong. + +% \numberwithin{equation}{chapter} +\renewcommand{\theequation}{\@arabic\c@equation} +\renewcommand{\thefigure}{\@arabic\c@figure} +\renewcommand{\thetable}{\@arabic\c@table} + +% Issue layout ----------------------------------------------------------------- + +% Need to provide our own version of |\tableofcontents|. We use the +% tikz package to get the rounded rectangle. Notice that |\section*| +% is really the same as |\chapter*|. 
+\renewcommand{\contentsname}{Contents} +\renewcommand\tableofcontents{% + \vspace{1cm} + \section*{\contentsname} + { \@starttoc{toc} } +} + +\renewcommand{\titlepage}{% + \thispagestyle{empty} + \hypersetup{ + pdftitle={The R Journal Volume \RJ@volume/\RJ@number, \RJ@month \RJ@year},% + pdfauthor={R Foundation for Statistical Computing},% + } + \noindent + \begin{center} + \fontsize{50pt}{50pt}\selectfont + The \raisebox{-8pt}{\includegraphics[height=77pt]{Rlogo-5}}\hspace{10pt} + Journal + + \end{center} + {\large \hfill Volume \RJ@volume/\RJ@number, \RJ@month{} \RJ@year \quad} + + \rule{\textwidth}{1pt} + \begin{center} + {\Large A peer-reviewed, open-access publication of the \\ + R Foundation for Statistical Computing} + \end{center} + + % And finally, put in the TOC box. Note the way |tocdepth| is adjusted + % before and after producing the TOC: thus, we can ensure that only + % articles show up in the printed TOC, but that in the PDF version, + % bookmarks are created for sections and subsections as well (provided + % that the non-starred forms are used). + \setcounter{tocdepth}{0} + \tableofcontents + \setcounter{tocdepth}{2} + \clearpage +} + +% Text formatting -------------------------------------------------------------- + +\newcommand{\R}{R} +\newcommand{\address}[1]{\addvspace{\baselineskip}\noindent\emph{#1}} +\newcommand{\email}[1]{\href{mailto:#1}{\normalfont\texttt{#1}}} + +% Simple font selection is not good enough. For example, |\texttt{--}| +% gives `\texttt{--}', i.e., an endash in typewriter font. Hence, we +% need to turn off ligatures, which currently only happens for commands +% |\code| and |\samp| and the ones derived from them. Hyphenation is +% another issue; it should really be turned off inside |\samp|. And +% most importantly, \LaTeX{} special characters are a nightmare. E.g., +% one needs |\~{}| to produce a tilde in a file name marked by |\file|. 
+% Perhaps a few years ago, most users would have agreed that this may be +% unfortunate but should not be changed to ensure consistency. But with +% the advent of the WWW and the need for getting `|~|' and `|#|' into +% URLs, commands which only treat the escape and grouping characters +% specially have gained acceptance + +\DeclareRobustCommand\code{\bgroup\@noligs\@codex} +\def\@codex#1{\texorpdfstring% +{{\normalfont\ttfamily\hyphenchar\font=-1 #1}}% +{#1}\egroup} +\newcommand{\kbd}[1]{{\normalfont\texttt{#1}}} +\newcommand{\key}[1]{{\normalfont\texttt{\uppercase{#1}}}} +\DeclareRobustCommand\samp{`\bgroup\@noligs\@sampx} +\def\@sampx#1{{\normalfont\texttt{#1}}\egroup'} +\newcommand{\var}[1]{{\normalfont\textsl{#1}}} +\let\env=\code +\newcommand{\file}[1]{{`\normalfont\textsf{#1}'}} +\let\command=\code +\let\option=\samp +\newcommand{\dfn}[1]{{\normalfont\textsl{#1}}} +% \acronym is effectively disabled since not used consistently +\newcommand{\acronym}[1]{#1} +\newcommand{\strong}[1]{\texorpdfstring% +{{\normalfont\fontseries{b}\selectfont #1}}% +{#1}} +\let\pkg=\strong +\newcommand{\CRANpkg}[1]{\href{https://CRAN.R-project.org/package=#1}{\pkg{#1}}}% +\let\cpkg=\CRANpkg +\newcommand{\ctv}[1]{\href{https://CRAN.R-project.org/view=#1}{\emph{#1}}} +\newcommand{\BIOpkg}[1]{\href{https://www.bioconductor.org/packages/release/bioc/html/#1.html}{\pkg{#1}}} + +% Example environments --------------------------------------------------------- +\RequirePackage{fancyvrb} +\RequirePackage{alltt} + +\DefineVerbatimEnvironment{example}{Verbatim}{} +\renewenvironment{example*}{\begin{alltt}}{\end{alltt}} + +% Support for output from Sweave, and generic session style code +% These used to have fontshape=sl for Sinput/Scode/Sin, but pslatex +% won't use a condensed font in that case. 
+ +% Update (2015-05-28 by DS): remove fontsize=\small to match example environment + +\DefineVerbatimEnvironment{Sinput}{Verbatim}{} +\DefineVerbatimEnvironment{Soutput}{Verbatim}{} +\DefineVerbatimEnvironment{Scode}{Verbatim}{} +\DefineVerbatimEnvironment{Sin}{Verbatim}{} +\DefineVerbatimEnvironment{Sout}{Verbatim}{} +\newenvironment{Schunk}{}{} + +% Mathematics ------------------------------------------------------------------ + +% The implementation of |\operatorname| is similar to the mechanism +% \LaTeXe{} uses for functions like sin and cos, and simpler than the +% one of \AmSLaTeX{}. We use |\providecommand| for the definition in +% order to keep the one of the \pkg{amstex} if this package has +% already been loaded. +% \begin{macrocode} +\providecommand{\operatorname}[1]{% + \mathop{\operator@font#1}\nolimits} +\RequirePackage{amsfonts} + +\renewcommand{\P}{% + \mathop{\operator@font I\hspace{-1.5pt}P\hspace{.13pt}}} +\newcommand{\E}{% + \mathop{\operator@font I\hspace{-1.5pt}E\hspace{.13pt}}} +\newcommand{\VAR}{\operatorname{var}} +\newcommand{\COV}{\operatorname{cov}} +\newcommand{\COR}{\operatorname{cor}} + +% Figures ---------------------------------------------------------------------- + +% For use with pandoc > 3.2.1 +\newsavebox\pandoc@box +\newcommand*\pandocbounded[1]{% scales image to fit in text height/width + \sbox\pandoc@box{#1}% + \Gscale@div\@tempa{\textheight}{\dimexpr\ht\pandoc@box+\dp\pandoc@box\relax}% + \Gscale@div\@tempb{\linewidth}{\wd\pandoc@box}% + \ifdim\@tempb\p@<\@tempa\p@\let\@tempa\@tempb\fi% select the smaller of both + \ifdim\@tempa\p@<\p@\scalebox{\@tempa}{\usebox\pandoc@box}% + \else\usebox{\pandoc@box}% + \fi% +} + +\RequirePackage[font=small,labelfont=bf]{caption} + +% Wide environments for figures and tables ------------------------------------- +\RequirePackage{environ} + +% An easy way to make a figure span the full width of the page +\NewEnviron{widefigure}[1][]{ +\begin{figure}[#1] +\advance\leftskip-2cm 
+\begin{minipage}{\dimexpr\textwidth+4cm\relax}% + \captionsetup{margin=2cm} + \BODY +\end{minipage}% +\end{figure} +} + +\NewEnviron{widetable}[1][]{ +\begin{table}[#1] +\advance\leftskip-2cm +\begin{minipage}{\dimexpr\textwidth+4cm\relax}% + \captionsetup{margin=2cm} + \BODY +\end{minipage}% +\end{table} +} diff --git a/_news/RJ-2025-4-editorial/RJwrapper.tex b/_news/RJ-2025-4-editorial/RJwrapper.tex new file mode 100644 index 0000000000..39cdbed766 --- /dev/null +++ b/_news/RJ-2025-4-editorial/RJwrapper.tex @@ -0,0 +1,70 @@ +\documentclass[a4paper]{report} +\usepackage[utf8]{inputenc} +\usepackage[T1]{fontenc} +\usepackage{RJournal} +\usepackage{amsmath,amssymb,array} +\usepackage{booktabs} + + +% tightlist command for lists without linebreak +\providecommand{\tightlist}{% + \setlength{\itemsep}{0pt}\setlength{\parskip}{0pt}} + +\usepackage{longtable} + +% Always define CSL refs as bib entries are contained in separate doc +% Pandoc citation processing +%From Pandoc 3.1.8 +% definitions for citeproc citations +\NewDocumentCommand\citeproctext{}{} +\NewDocumentCommand\citeproc{mm}{% + \begingroup\def\citeproctext{#2}\cite{#1}\endgroup} +\makeatletter + % allow citations to break across lines + \let\@cite@ofmt\@firstofone + % avoid brackets around text for \cite: + \def\@biblabel#1{} + \def\@cite#1#2{{#1\if@tempswa , #2\fi}} +\makeatother +\newlength{\cslhangindent} +\setlength{\cslhangindent}{1.5em} +\newlength{\csllabelwidth} +\setlength{\csllabelwidth}{3em} +\newenvironment{CSLReferences}[2] % #1 hanging-indent, #2 entry-spacing + {\begin{list}{}{% + \setlength{\itemindent}{0pt} + \setlength{\leftmargin}{0pt} + \setlength{\parsep}{0pt} + % turn on hanging indent if param 1 is 1 + \ifodd #1 + \setlength{\leftmargin}{\cslhangindent} + \setlength{\itemindent}{-1\cslhangindent} + \fi + % set entry spacing + \setlength{\itemsep}{#2\baselineskip}}} + {\end{list}} +\usepackage{calc} +\newcommand{\CSLBlock}[1]{#1\hfill\break} 
+\newcommand{\CSLLeftMargin}[1]{\parbox[t]{\csllabelwidth}{#1}} +\newcommand{\CSLRightInline}[1]{\parbox[t]{\linewidth - \csllabelwidth}{#1}\break} +\newcommand{\CSLIndent}[1]{\hspace{\cslhangindent}#1} + + + +\begin{document} + + +%% do not edit, for illustration only +\sectionhead{Contributed research article} +\volume{17} +\volnumber{4} +\year{2025} +\month{December} +\setcounter{page}{3} + +\begin{article} + \input{RJ-2025-4-editorial} +\end{article} + + +\end{document} diff --git a/_news/RJ-2025-4-rforwards/RJ-2025-4-rforwards.R b/_news/RJ-2025-4-rforwards/RJ-2025-4-rforwards.R new file mode 100644 index 0000000000..bd1745232c --- /dev/null +++ b/_news/RJ-2025-4-rforwards/RJ-2025-4-rforwards.R @@ -0,0 +1,6 @@ +# Generated by `rjournal_pdf_article()` using `knitr::purl()`: do not edit by hand +# Please edit RJ-2025-4-rforwards.Rmd to modify this file + +## ----setup, include=FALSE----------------------------------------------------- +knitr::opts_chunk$set(echo = FALSE, warning = FALSE, message = FALSE) + diff --git a/_news/RJ-2025-4-rforwards/RJ-2025-4-rforwards.Rmd b/_news/RJ-2025-4-rforwards/RJ-2025-4-rforwards.Rmd new file mode 100644 index 0000000000..b72d06d795 --- /dev/null +++ b/_news/RJ-2025-4-rforwards/RJ-2025-4-rforwards.Rmd @@ -0,0 +1,139 @@ +--- +title: News from the Forwards Taskforce +date: '2025-12-01' +abstract: | + [Forwards](https://forwards.github.io/) is an R Foundation taskforce working to widen the participation of under-represented groups in the R project and in related activities, such as the *useR!* conference. This report rounds up activities of the taskforce during 2025. 
+draft: no +author: +- name: Heather Turner + affiliation: University of Warwick + address: Coventry, United Kingdom + orcid: 0000-0002-1256-3375 + url: https://warwick.ac.uk/heatherturner + email: heather.turner@r-project.org +type: news +output: + rjtools::rjournal_article: + self_contained: yes + toc: no +volume: 17 +issue: 4 +slug: RJ-2025-4-rforwards +journal: + lastpage: 329 + firstpage: 328 + +--- + + +```{r setup, include=FALSE} +knitr::opts_chunk$set(echo = FALSE, warning = FALSE, message = FALSE) +``` + +# Accessibility + +Work continued on the [R Consortium funded project](https://r-consortium.org/all-projects/2023-group-2.html#accessibility-enhancements-for-the-r-journal) +to facilitate adding alternative text (_alt text_) to figures in R Journal +articles. +Di Cook, Jonathan Godfrey and Heather Turner advised Maliny Po in developing +two tools: + +* A [Shiny app](https://maliny.shinyapps.io/alt-text/) for generating alt text +based on a plot and the code used to generate it. +* The [autoAlt](https://github.com/numbats/autoAlt) package for generating +alt text for Quarto or R markdown files. + +Both of these are works in progress. Di worked with Jacob Voo at +[OceaniaR Hackathon 2025](https://github.com/StatSocAus/oceaniar-hack-2025/issues/7) +to package up some of the scripts used by the Shiny app into the autoAlt package. + +# Community engagement + +Ella Kaye was selected as a 2025 Software Sustainability Institute Fellow, +supporting her work with the [rainbowR](https://rainbowr.org/) community +for LGBTQ+ folk who code in R. This has enabled rainbowR to become more +established, with an expanded [leadership team](https://rainbowr.org/committees.html) +and [an improved Code of Conduct](https://rainbowr.org/posts/2025-12-16_coc-update/). 
+The membership has grown to over 250 members and the community have had the +capacity to expand their activities to include a book club (discussing +[Queer Data](https://kevinguyan.com/queer-data/) as the first book) and an +[online conference](https://conference.rainbowr.org) which will be held 25-26 February 2026. The committee has also worked towards securing the long-term sustainability of rainbowR, including adopting a constitution and electing the leadership committee. They will hold their first Annual General Meeting to vote on these developments in January 2026. + +Heather Turner and Ella Kaye attended the "Data Science for Girls" open event +hosted by [R-Girls](https://greenoak.bham.sch.uk/r-girls-school-network/) at +Green Oak Academy, Birmingham, UK. They wrote a [post on the Forwards blog](https://forwards.github.io/blog/2025/r-girls-open-event-2025/) to +report back on this event. Mohammed A Mohammed, a co-founder of the R-Girls +initiative, has joined Forwards to help maintain the connection with this group. + +Ella was also Rotating Curator on the [We Are R-Ladies Bluesky account](https://bsky.app/profile/weare.rladies.org) for a week in December, +where she was able to promote Forwards and rainbowR, as well as share tips and resources regarding contributing to base R. + +# R Contribution/on-ramps + +Forwards was once again heavily involved in activities of the +[R Contribution Working Group](https://contributor.r-project.org/working-group), +aiming to foster a larger and more diverse community of contributors to R. +2025 saw the organization of the first bilingual French/English R Dev Day as +a satellite to [RencontresR 2025](https://github.com/r-devel/r-dev-day/blob/main/reports/2025-05-22_RencontresR2025.md), as well as the first R Dev Days in Oceania. 
The one in [Australia](https://pretix.eu/r-contributors/r-dev-day-oz-25/) facilitated
+collaboration between online participants and people at venues in Melbourne,
+Perth and Brisbane, while the one in
+[New Zealand](https://pretix.eu/r-contributors/r-dev-day-nz-25/) ran in two
+streams to facilitate collaboration between online participants and people
+attending in Auckland. These experiments were part of a commitment to run
+R Contributor events as hybrid in future, for greater inclusion. The R Dev Day
+at RSECon25 was another significant event, as funding was available from Heather
+Turner's EPSRC Research Software Engineering Fellowship
+(grant number: EP/V052128/1) to provide full travel support for 9 participants
+from the Global South, enabling them to participate in person.
+
+Heather presented a keynote talk at [LatinR 2025](https://youtu.be/xbCT4v5LkD4?t=26418)
+on [*Lowering Barriers to Contributing to R*](https://hturner.github.io/LatinR2025)
+and highlighted the contributions of [*R-Ladies at R Dev Days*](https://hturner.github.io/RLadiesMelbourne2025/) in a talk to the
+R-Ladies Melbourne chapter.
+
+Ella Kaye facilitated a second run of the [C Study Group](https://contributor.r-project.org/events/c-study-group-2025); a cohort
+for Oceania was organised by Nick Tierney and Fonti Kar.
+
+# Conferences
+
+Dillon Sparks finalised the [report on the useR! 2024 survey](https://forwards.github.io/docs/useR2024_survey/) and joined the
+organising committee for useR! 2025 to support the running of this survey that
+helps to track demographics of participants over time, as well as inform
+future events. He has been working with our new members
+Imani Oluwafumilayo Maliti and Lois Adler-Johnson on promoting the survey and
+analysing the results. Another new member, Jesica Formoso, is helping to
+maintain the [useR! infoboard](https://rconf.gitlab.io/userinfoboard/) that
+summarises key data from useR! conferences over time.
+ +Kevin O'Brien and gwynn gebeyhu were on the advisory committee for the +[GhanaR 2025 conference](https://ghana-rusers.org/events/?event=5449), the +second instance of this conference. + +# Teaching + +The materials for the Forwards Package Development Workshop were updated and +added to a new section of the Forwards website: +. They are shared under the +CC-BY-NC-SA 4.0 license. + +The updated materials were used to teach the workshop online to two cohorts in +June-July, the first led by Ella Kaye and Pao Corrales, the second led by +Joyce Robbins, Emma Rand and Heather Turner. The workshops were promoted in +collaboration with rainbowR and R-Ladies Remote, encouraging participation +from underrepresented groups. Around 50 people participated in these events and +we plan to re-run the workshops in 2026. + +# Social Media/Branding + +The [Forwards website](https://forwards.github.io/) has a fresh look, thanks to +Ella Kaye. Ella also developed a +[Quarto revealjs extension for Forwards](https://github.com/forwards/forwardspres) +that is used in the updated teaching materials. + +# Membership changes + +We were happy to welcome four new members in 2025: Lois Adler-Johnson, +Jesica Formoso, Mohammed A Mohammed, and Imani Oluwafumilayo Maliti. + +One member, Emma Rand, has stepped down from the taskforce and we thank her +for her contributions over several years. diff --git a/_news/RJ-2025-4-rforwards/RJ-2025-4-rforwards.html b/_news/RJ-2025-4-rforwards/RJ-2025-4-rforwards.html new file mode 100644 index 0000000000..11629de477 --- /dev/null +++ b/_news/RJ-2025-4-rforwards/RJ-2025-4-rforwards.html @@ -0,0 +1,1885 @@ + + + + + + + + + + + + + + + + + + + + + News from the Forwards Taskforce + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
    + + + + + + + diff --git a/_news/RJ-2025-4-rforwards/RJ-2025-4-rforwards.pdf b/_news/RJ-2025-4-rforwards/RJ-2025-4-rforwards.pdf new file mode 100644 index 0000000000..83238f053b Binary files /dev/null and b/_news/RJ-2025-4-rforwards/RJ-2025-4-rforwards.pdf differ diff --git a/_news/RJ-2025-4-rforwards/RJ-2025-4-rforwards.tex b/_news/RJ-2025-4-rforwards/RJ-2025-4-rforwards.tex new file mode 100644 index 0000000000..a13231d82a --- /dev/null +++ b/_news/RJ-2025-4-rforwards/RJ-2025-4-rforwards.tex @@ -0,0 +1,135 @@ +% !TeX root = RJwrapper.tex +\title{News from the Forwards Taskforce} + + +\author{by Heather Turner} + +\maketitle + +\abstract{% +\href{https://forwards.github.io/}{Forwards} is an R Foundation taskforce working to widen the participation of under-represented groups in the R project and in related activities, such as the \emph{useR!} conference. This report rounds up activities of the taskforce during 2025. +} + +\section{Accessibility}\label{accessibility} + +Work continued on the \href{https://r-consortium.org/all-projects/2023-group-2.html\#accessibility-enhancements-for-the-r-journal}{R Consortium funded project} +to facilitate adding alternative text (\emph{alt text}) to figures in R Journal +articles. +Di Cook, Jonathan Godfrey and Heather Turner advised Maliny Po in developing +two tools: + +\begin{itemize} +\tightlist +\item + A \href{https://maliny.shinyapps.io/alt-text/}{Shiny app} for generating alt text + based on a plot and the code used to generate it. +\item + The \href{https://github.com/numbats/autoAlt}{autoAlt} package for generating + alt text for Quarto or R markdown files. +\end{itemize} + +Both of these are works in progress. Di worked with Jacob Voo at +\href{https://github.com/StatSocAus/oceaniar-hack-2025/issues/7}{OceaniaR Hackathon 2025} +to package up some of the scripts used by the Shiny app into the autoAlt package. 
+ +\section{Community engagement}\label{community-engagement} + +Ella Kaye was selected as a 2025 Software Sustainability Institute Fellow, +supporting her work with the \href{https://rainbowr.org/}{rainbowR} community +for LGBTQ+ folk who code in R. This has enabled rainbowR to become more +established, with an expanded \href{https://rainbowr.org/committees.html}{leadership team} +and \href{https://rainbowr.org/posts/2025-12-16_coc-update/}{an improved Code of Conduct}. +The membership has grown to over 250 members and the community have had the +capacity to expand their activities to include a book club (discussing +\href{https://kevinguyan.com/queer-data/}{Queer Data} as the first book) and an +\href{https://conference.rainbowr.org}{online conference} which will be held 25-26 February 2026. The committee has also worked towards securing the long-term sustainability of rainbowR, including adopting a constitution and electing the leadership committee. They will hold their first Annual General Meeting to vote on these developments in January 2026. + +Heather Turner and Ella Kaye attended the ``Data Science for Girls'' open event +hosted by \href{https://greenoak.bham.sch.uk/r-girls-school-network/}{R-Girls} at +Green Oak Academy, Birmingham, UK. They wrote a \href{https://forwards.github.io/blog/2025/r-girls-open-event-2025/}{post on the Forwards blog} to +report back on this event. Mohammed A Mohammed, a co-founder of the R-Girls +initiative, has joined Forwards to help maintain the connection with this group. + +Ella was also Rotating Curator on the \href{https://bsky.app/profile/weare.rladies.org}{We Are R-Ladies Bluesky account} for a week in December, +where she was able to promote Forwards and rainbowR, as well as share tips and resources regarding contributing to base R. 
+
+\section{R Contribution/on-ramps}\label{r-contributionon-ramps}
+
+Forwards was once again heavily involved in activities of the
+\href{https://contributor.r-project.org/working-group}{R Contribution Working Group},
+aiming to foster a larger and more diverse community of contributors to R.
+2025 saw the organization of the first bilingual French/English R Dev Day as
+a satellite to \href{https://github.com/r-devel/r-dev-day/blob/main/reports/2025-05-22_RencontresR2025.md}{RencontresR 2025}, as well as the first R Dev Days in Oceania. The one in \href{https://pretix.eu/r-contributors/r-dev-day-oz-25/}{Australia} facilitated
+collaboration between online participants and people at venues in Melbourne,
+Perth and Brisbane, while the one in
+\href{https://pretix.eu/r-contributors/r-dev-day-nz-25/}{New Zealand} ran in two
+streams to facilitate collaboration between online participants and people
+attending in Auckland. These experiments were part of a commitment to run
+R Contributor events as hybrid in future, for greater inclusion. The R Dev Day
+at RSECon25 was another significant event, as funding was available from Heather
+Turner's EPSRC Research Software Engineering Fellowship
+(grant number: EP/V052128/1) to provide full travel support for 9 participants
+from the Global South, enabling them to participate in person.
+
+Heather presented a keynote talk at \href{https://youtu.be/xbCT4v5LkD4?t=26418}{LatinR 2025}
+on \href{https://hturner.github.io/LatinR2025}{\emph{Lowering Barriers to Contributing to R}}
+and highlighted the contributions of \href{https://hturner.github.io/RLadiesMelbourne2025/}{\emph{R-Ladies at R Dev Days}} in a talk to the
+R-Ladies Melbourne chapter.
+
+Ella Kaye facilitated a second run of the \href{https://contributor.r-project.org/events/c-study-group-2025}{C Study Group}; a cohort
+for Oceania was organised by Nick Tierney and Fonti Kar.
+
+\section{Conferences}\label{conferences}
+
+Dillon Sparks finalised the \href{https://forwards.github.io/docs/useR2024_survey/}{report on the useR! 2024 survey} and joined the
+organising committee for useR! 2025 to support the running of this survey that
+helps to track demographics of participants over time, as well as inform
+future events. He has been working with our new members
+Imani Oluwafumilayo Maliti and Lois Adler-Johnson on promoting the survey and
+analysing the results. Another new member, Jesica Formoso, is helping to
+maintain the \href{https://rconf.gitlab.io/userinfoboard/}{useR! infoboard} that
+summarises key data from useR! conferences over time.
+
+Kevin O'Brien and gwynn gebeyhu were on the advisory committee for the
+\href{https://ghana-rusers.org/events/?event=5449}{GhanaR 2025 conference}, the
+second instance of this conference.
+
+\section{Teaching}\label{teaching}
+
+The materials for the Forwards Package Development Workshop were updated and
+added to a new section of the Forwards website:
+\url{https://forwards.github.io/package-dev/}. They are shared under the
+CC-BY-NC-SA 4.0 license.
+
+The updated materials were used to teach the workshop online to two cohorts in
+June-July, the first led by Ella Kaye and Pao Corrales, the second led by
+Joyce Robbins, Emma Rand and Heather Turner. The workshops were promoted in
+collaboration with rainbowR and R-Ladies Remote, encouraging participation
+from underrepresented groups. Around 50 people participated in these events and
+we plan to re-run the workshops in 2026.
+
+\section{Social Media/Branding}\label{social-mediabranding}
+
+The \href{https://forwards.github.io/}{Forwards website} has a fresh look, thanks to
+Ella Kaye. Ella also developed a
+\href{https://github.com/forwards/forwardspres}{Quarto revealjs extension for Forwards}
+that is used in the updated teaching materials.
+
+\section{Membership changes}\label{membership-changes}
+
+We were happy to welcome four new members in 2025: Lois Adler-Johnson,
+Jesica Formoso, Mohammed A Mohammed, and Imani Oluwafumilayo Maliti.
+
+One member, Emma Rand, has stepped down from the taskforce and we thank her
+for her contributions over several years.
+
+
+\address{%
+Heather Turner\\
+University of Warwick\\%
+Coventry, United Kingdom\\
+%
+\url{https://warwick.ac.uk/heatherturner}\\%
+\textit{ORCiD: \href{https://orcid.org/0000-0002-1256-3375}{0000-0002-1256-3375}}\\%
+\href{mailto:heather.turner@r-project.org}{\nolinkurl{heather.turner@r-project.org}}%
+}
diff --git a/_news/RJ-2025-4-rforwards/RJournal.sty b/_news/RJ-2025-4-rforwards/RJournal.sty
new file mode 100644
index 0000000000..351990be38
--- /dev/null
+++ b/_news/RJ-2025-4-rforwards/RJournal.sty
@@ -0,0 +1,358 @@
+% Package `RJournal' to use with LaTeX2e
+% Copyright (C) 2010 by the R Foundation
+% Copyright (C) 2013 by the R Journal
+%
+% Originally written by Kurt Hornik and Friedrich Leisch with subsequent
+% edits by the editorial board
+%
+% CAUTION:
+% Do not modify this style file. Any changes to this file will be reset when your
+% article is submitted.
+% If you must modify the style or add LaTeX packages to the article, these
+% should be specified in RJwrapper.tex
+
+\NeedsTeXFormat{LaTeX2e}[1995/12/01]
+\ProvidesPackage{RJournal}[2025/10/05 v0.17 RJournal package]
+
+\RequirePackage{tikz}
+
+% Overall page layout, fonts etc -----------------------------------------------
+
+% Issues of \emph{The R Journal} are created from the standard \LaTeX{}
+% document class \pkg{report}.
+
+\RequirePackage{geometry}
+\geometry{a4paper,
+  textwidth=14cm, top=1cm, bottom=1cm,
+  includehead,includefoot,centering,
+  footskip=1.5cm}
+\raggedbottom
+\sloppy
+\clubpenalty = 10000
+\widowpenalty = 10000
+\brokenpenalty = 10000
+\usepackage{microtype}
+
+
+\RequirePackage{fancyhdr}
+\fancyhead{}
+\fancyheadoffset{2cm}
+\fancyhead[L]{\textsc{\RJ@sectionhead}}
+\fancyhead[R]{\thepage}
+\fancyfoot{}
+\fancyfoot[L]{The R Journal Vol. \RJ@volume/\RJ@number, \RJ@month~\RJ@year}
+\fancyfoot[R]{ISSN 2073-4859}
+\pagestyle{fancy}
+
+% We use the following fonts (all with T1 encoding):
+%
+% rm & palatino
+% tt & inconsolata
+% sf & helvetica
+% math & palatino
+
+\RequirePackage{microtype}
+
+\RequirePackage[scaled=0.92]{helvet}
+\RequirePackage{palatino,mathpazo}
+\RequirePackage[scaled=1.02]{inconsolata}
+\RequirePackage[T1]{fontenc}
+
+\RequirePackage[hyphens]{url}
+\RequirePackage[pagebackref]{hyperref}
+\renewcommand{\backref}[1]{[p#1]}
+
+% Dark blue colour for all links
+\RequirePackage{color}
+\definecolor{link}{rgb}{0.45,0.51,0.67}
+\hypersetup{
+  colorlinks,%
+  citecolor=link,%
+  filecolor=link,%
+  linkcolor=link,%
+  urlcolor=link
+}
+
+% Give the text a little room to breathe
+\setlength{\parskip}{3pt}
+\RequirePackage{setspace}
+\setstretch{1.05}
+
+% Issue and article metadata ---------------------------------------------------
+
+% Basic front matter information about the issue: volume, number, and
+% date.
+
+\newcommand{\volume}[1]{\def\RJ@volume{#1}}
+\newcommand{\volnumber}[1]{\def\RJ@number{#1}}
+\renewcommand{\month}[1]{\def\RJ@month{#1}}
+\renewcommand{\year}[1]{\def\RJ@year{#1}}
+
+
+% Individual articles correspond to
+% chapters, and are contained in |article| environments. This makes it
+% easy to have figures counted within articles and hence hyperlinked
+% correctly.
+
+% An article has an author, a title, and optionally a subtitle. We use
+% the obvious commands for specifying these.
Articles will be put in certain +% journal sections, named by \sectionhead. + +\newcommand {\sectionhead} [1]{\def\RJ@sectionhead{#1}} +\renewcommand{\author} [1]{\def\RJ@author{#1}} +\renewcommand{\title} [1]{\def\RJ@title{#1}} +\newcommand {\subtitle} [1]{\def\RJ@subtitle{#1}} + +% Control appearance of titles: make slightly smaller than usual, and +% suppress section numbering. See http://tex.stackexchange.com/questions/69749 +% for why we don't use \setcounter{secnumdepth}{-1} + +\usepackage[medium]{titlesec} +\usepackage{titletoc} +\titleformat{\section} {\normalfont\large\bfseries}{\arabic{section}}{1em}{} +\titleformat{\subsection}{\normalfont\normalsize\bfseries}{\arabic{section}.\arabic{subsection}}{0.5em}{} +\titlecontents{chapter} [0em]{}{}{}{\titlerule*[1em]{.}\contentspage} + +% Article layout --------------------------------------------------------------- + +% Environment |article| clears the article header information at its beginning. +% We use |\FloatBarrier| from the placeins package to keep floats within +% the article. +\RequirePackage{placeins} +\newenvironment{article}{\author{}\title{}\subtitle{}\FloatBarrier}{\FloatBarrier} + +% Refereed articles should have an abstract, so we redefine |\abstract| to +% give the desired style + +\renewcommand{\abstract}[1]{\noindent\textbf{Abstract} #1} +\renewenvironment{abstract}{\noindent\textbf{Abstract}~}{} + +% The real work is done by a redefined version of |\maketitle|. Note +% that even though we do not want chapters (articles) numbered, we +% need to increment the chapter counter, so that figures get correct +% labelling. 
+
+\renewcommand{\maketitle}{%
+\noindent
+  \chapter{\RJ@title}\refstepcounter{chapter}
+  \ifx\empty\RJ@subtitle
+  \else
+    \noindent\textbf{\RJ@subtitle}
+    \par\nobreak\addvspace{\baselineskip}
+  \fi
+  \ifx\empty\RJ@author
+  \else
+    \noindent\textit{\RJ@author}
+    \par\nobreak\addvspace{\baselineskip}
+  \fi
+  \@afterindentfalse\@nobreaktrue\@afterheading
+}
+
+% Now for some ugly redefinitions. We do not want articles to start a
+% new page. (Actually, we do, but this is handled via explicit
+% \newpage.)
+%
+% The name@of@eq is a hack to get hyperlinks to equations to work
+% within each article, even though there may be multiple eq.(1)
+% \begin{macrocode}
+\renewcommand\chapter{\secdef\RJ@chapter\@schapter}
+\providecommand{\nohyphens}{%
+  \hyphenpenalty=10000\exhyphenpenalty=10000\relax}
+\newcommand{\RJ@chapter}{%
+  \edef\name@of@eq{equation.\@arabic{\c@chapter}}%
+  \renewcommand{\@seccntformat}[1]{}%
+  \@startsection{chapter}{0}{0mm}{%
+    -2\baselineskip \@plus -\baselineskip \@minus -.2ex}{\p@}{%
+    \phantomsection\normalfont\huge\bfseries\raggedright}}
+
+% Book reviews should appear as sections in the text and in the pdf bookmarks,
+% however we wish them to appear as chapters in the TOC. Thus we define an
+% alternative to |\maketitle| for reviews.
+\newcommand{\review}[1]{
+  \pdfbookmark[1]{#1}{#1}
+  \section*{#1}
+  \addtocontents{toc}{\protect\contentsline{chapter}{#1}{\thepage}{#1.1}}
+}
+
+% We want bibliographies as starred sections within articles.
+%
+\RequirePackage[sectionbib,round]{natbib}
+\bibliographystyle{abbrvnat}
+\renewcommand{\bibsection}{\section*{References}}
+
+% Equations, figures and tables are counted within articles, but we do
+% not show the article number. For equations it becomes a bit messy to avoid
+% having hyperref getting it wrong.
+ +% \numberwithin{equation}{chapter} +\renewcommand{\theequation}{\@arabic\c@equation} +\renewcommand{\thefigure}{\@arabic\c@figure} +\renewcommand{\thetable}{\@arabic\c@table} + +% Issue layout ----------------------------------------------------------------- + +% Need to provide our own version of |\tableofcontents|. We use the +% tikz package to get the rounded rectangle. Notice that |\section*| +% is really the same as |\chapter*|. +\renewcommand{\contentsname}{Contents} +\renewcommand\tableofcontents{% + \vspace{1cm} + \section*{\contentsname} + { \@starttoc{toc} } +} + +\renewcommand{\titlepage}{% + \thispagestyle{empty} + \hypersetup{ + pdftitle={The R Journal Volume \RJ@volume/\RJ@number, \RJ@month \RJ@year},% + pdfauthor={R Foundation for Statistical Computing},% + } + \noindent + \begin{center} + \fontsize{50pt}{50pt}\selectfont + The \raisebox{-8pt}{\includegraphics[height=77pt]{Rlogo-5}}\hspace{10pt} + Journal + + \end{center} + {\large \hfill Volume \RJ@volume/\RJ@number, \RJ@month{} \RJ@year \quad} + + \rule{\textwidth}{1pt} + \begin{center} + {\Large A peer-reviewed, open-access publication of the \\ + R Foundation for Statistical Computing} + \end{center} + + % And finally, put in the TOC box. Note the way |tocdepth| is adjusted + % before and after producing the TOC: thus, we can ensure that only + % articles show up in the printed TOC, but that in the PDF version, + % bookmarks are created for sections and subsections as well (provided + % that the non-starred forms are used). + \setcounter{tocdepth}{0} + \tableofcontents + \setcounter{tocdepth}{2} + \clearpage +} + +% Text formatting -------------------------------------------------------------- + +\newcommand{\R}{R} +\newcommand{\address}[1]{\addvspace{\baselineskip}\noindent\emph{#1}} +\newcommand{\email}[1]{\href{mailto:#1}{\normalfont\texttt{#1}}} + +% Simple font selection is not good enough. For example, |\texttt{--}| +% gives `\texttt{--}', i.e., an endash in typewriter font. 
Hence, we +% need to turn off ligatures, which currently only happens for commands +% |\code| and |\samp| and the ones derived from them. Hyphenation is +% another issue; it should really be turned off inside |\samp|. And +% most importantly, \LaTeX{} special characters are a nightmare. E.g., +% one needs |\~{}| to produce a tilde in a file name marked by |\file|. +% Perhaps a few years ago, most users would have agreed that this may be +% unfortunate but should not be changed to ensure consistency. But with +% the advent of the WWW and the need for getting `|~|' and `|#|' into +% URLs, commands which only treat the escape and grouping characters +% specially have gained acceptance + +\DeclareRobustCommand\code{\bgroup\@noligs\@codex} +\def\@codex#1{\texorpdfstring% +{{\normalfont\ttfamily\hyphenchar\font=-1 #1}}% +{#1}\egroup} +\newcommand{\kbd}[1]{{\normalfont\texttt{#1}}} +\newcommand{\key}[1]{{\normalfont\texttt{\uppercase{#1}}}} +\DeclareRobustCommand\samp{`\bgroup\@noligs\@sampx} +\def\@sampx#1{{\normalfont\texttt{#1}}\egroup'} +\newcommand{\var}[1]{{\normalfont\textsl{#1}}} +\let\env=\code +\newcommand{\file}[1]{{`\normalfont\textsf{#1}'}} +\let\command=\code +\let\option=\samp +\newcommand{\dfn}[1]{{\normalfont\textsl{#1}}} +% \acronym is effectively disabled since not used consistently +\newcommand{\acronym}[1]{#1} +\newcommand{\strong}[1]{\texorpdfstring% +{{\normalfont\fontseries{b}\selectfont #1}}% +{#1}} +\let\pkg=\strong +\newcommand{\CRANpkg}[1]{\href{https://CRAN.R-project.org/package=#1}{\pkg{#1}}}% +\let\cpkg=\CRANpkg +\newcommand{\ctv}[1]{\href{https://CRAN.R-project.org/view=#1}{\emph{#1}}} +\newcommand{\BIOpkg}[1]{\href{https://www.bioconductor.org/packages/release/bioc/html/#1.html}{\pkg{#1}}} + +% Example environments --------------------------------------------------------- +\RequirePackage{fancyvrb} +\RequirePackage{alltt} + +\DefineVerbatimEnvironment{example}{Verbatim}{} +\renewenvironment{example*}{\begin{alltt}}{\end{alltt}} + +% 
Support for output from Sweave, and generic session style code +% These used to have fontshape=sl for Sinput/Scode/Sin, but pslatex +% won't use a condensed font in that case. + +% Update (2015-05-28 by DS): remove fontsize=\small to match example environment + +\DefineVerbatimEnvironment{Sinput}{Verbatim}{} +\DefineVerbatimEnvironment{Soutput}{Verbatim}{} +\DefineVerbatimEnvironment{Scode}{Verbatim}{} +\DefineVerbatimEnvironment{Sin}{Verbatim}{} +\DefineVerbatimEnvironment{Sout}{Verbatim}{} +\newenvironment{Schunk}{}{} + +% Mathematics ------------------------------------------------------------------ + +% The implementation of |\operatorname| is similar to the mechanism +% \LaTeXe{} uses for functions like sin and cos, and simpler than the +% one of \AmSLaTeX{}. We use |\providecommand| for the definition in +% order to keep the one of the \pkg{amstex} if this package has +% already been loaded. +% \begin{macrocode} +\providecommand{\operatorname}[1]{% + \mathop{\operator@font#1}\nolimits} +\RequirePackage{amsfonts} + +\renewcommand{\P}{% + \mathop{\operator@font I\hspace{-1.5pt}P\hspace{.13pt}}} +\newcommand{\E}{% + \mathop{\operator@font I\hspace{-1.5pt}E\hspace{.13pt}}} +\newcommand{\VAR}{\operatorname{var}} +\newcommand{\COV}{\operatorname{cov}} +\newcommand{\COR}{\operatorname{cor}} + +% Figures ---------------------------------------------------------------------- + +% For use with pandoc > 3.2.1 +\newsavebox\pandoc@box +\newcommand*\pandocbounded[1]{% scales image to fit in text height/width + \sbox\pandoc@box{#1}% + \Gscale@div\@tempa{\textheight}{\dimexpr\ht\pandoc@box+\dp\pandoc@box\relax}% + \Gscale@div\@tempb{\linewidth}{\wd\pandoc@box}% + \ifdim\@tempb\p@<\@tempa\p@\let\@tempa\@tempb\fi% select the smaller of both + \ifdim\@tempa\p@<\p@\scalebox{\@tempa}{\usebox\pandoc@box}% + \else\usebox{\pandoc@box}% + \fi% +} + +\RequirePackage[font=small,labelfont=bf]{caption} + +% Wide environments for figures and tables ------------------------------------- 
+\RequirePackage{environ} + +% An easy way to make a figure span the full width of the page +\NewEnviron{widefigure}[1][]{ +\begin{figure}[#1] +\advance\leftskip-2cm +\begin{minipage}{\dimexpr\textwidth+4cm\relax}% + \captionsetup{margin=2cm} + \BODY +\end{minipage}% +\end{figure} +} + +\NewEnviron{widetable}[1][]{ +\begin{table}[#1] +\advance\leftskip-2cm +\begin{minipage}{\dimexpr\textwidth+4cm\relax}% + \captionsetup{margin=2cm} + \BODY +\end{minipage}% +\end{table} +} diff --git a/_news/RJ-2025-4-rforwards/RJwrapper.tex b/_news/RJ-2025-4-rforwards/RJwrapper.tex new file mode 100644 index 0000000000..b35d9c37b8 --- /dev/null +++ b/_news/RJ-2025-4-rforwards/RJwrapper.tex @@ -0,0 +1,70 @@ +\documentclass[a4paper]{report} +\usepackage[utf8]{inputenc} +\usepackage[T1]{fontenc} +\usepackage{RJournal} +\usepackage{amsmath,amssymb,array} +\usepackage{booktabs} + + +% tightlist command for lists without linebreak +\providecommand{\tightlist}{% + \setlength{\itemsep}{0pt}\setlength{\parskip}{0pt}} + +\usepackage{longtable} + +% Always define CSL refs as bib entries are contained in separate doc +% Pandoc citation processing +%From Pandoc 3.1.8 +% definitions for citeproc citations +\NewDocumentCommand\citeproctext{}{} +\NewDocumentCommand\citeproc{mm}{% + \begingroup\def\citeproctext{#2}\cite{#1}\endgroup} +\makeatletter + % allow citations to break across lines + \let\@cite@ofmt\@firstofone + % avoid brackets around text for \cite: + \def\@biblabel#1{} + \def\@cite#1#2{{#1\if@tempswa , #2\fi}} +\makeatother +\newlength{\cslhangindent} +\setlength{\cslhangindent}{1.5em} +\newlength{\csllabelwidth} +\setlength{\csllabelwidth}{3em} +\newenvironment{CSLReferences}[2] % #1 hanging-indent, #2 entry-spacing + {\begin{list}{}{% + \setlength{\itemindent}{0pt} + \setlength{\leftmargin}{0pt} + \setlength{\parsep}{0pt} + % turn on hanging indent if param 1 is 1 + \ifodd #1 + \setlength{\leftmargin}{\cslhangindent} + \setlength{\itemindent}{-1\cslhangindent} + \fi + % set entry 
spacing + \setlength{\itemsep}{#2\baselineskip}}} + {\end{list}} +\usepackage{calc} +\newcommand{\CSLBlock}[1]{#1\hfill\break} +\newcommand{\CSLLeftMargin}[1]{\parbox[t]{\csllabelwidth}{#1}} +\newcommand{\CSLRightInline}[1]{\parbox[t]{\linewidth - \csllabelwidth}{#1}\break} +\newcommand{\CSLIndent}[1]{\hspace{\cslhangindent}#1} + + + +\begin{document} + + +%% do not edit, for illustration only +\sectionhead{Contributed research article} +\volume{17} +\volnumber{4} +\year{2025} +\month{December} +\setcounter{page}{328} + +\begin{article} + \input{RJ-2025-4-rforwards} +\end{article} + + +\end{document} diff --git a/_news/RJ-2025-4-rfoundation/RJ-2025-4-rfoundation.R b/_news/RJ-2025-4-rfoundation/RJ-2025-4-rfoundation.R new file mode 100644 index 0000000000..8b712b0fca --- /dev/null +++ b/_news/RJ-2025-4-rfoundation/RJ-2025-4-rfoundation.R @@ -0,0 +1,4 @@ +# Generated by `rjournal_pdf_article()` using `knitr::purl()`: do not edit by hand +# Please edit RJ-2025-4-rfoundation.Rmd to modify this file + + diff --git a/_news/RJ-2025-4-rfoundation/RJ-2025-4-rfoundation.Rmd b/_news/RJ-2025-4-rfoundation/RJ-2025-4-rfoundation.Rmd new file mode 100644 index 0000000000..0c7062f1fe --- /dev/null +++ b/_news/RJ-2025-4-rfoundation/RJ-2025-4-rfoundation.Rmd @@ -0,0 +1,103 @@ +--- +title: R Foundation News +date: '2025-12-01' +draft: no +author: +- name: Torsten Hothorn + affiliation: Universität Zürich + address: Switzerland + email: Torsten.Hothorn@R-project.org + orcid: 0000-0001-8301-0471 +output: + rjtools::rjournal_article: + self_contained: yes + toc: no + rjtools::rjournal_pdf_article: + latex_engine: xelatex +header-includes: +- \usepackage{xeCJK} +- \setCJKmainfont{Noto Sans CJK SC} +volume: 17 +issue: 4 +slug: RJ-2025-4-rfoundation +journal: + lastpage: 330 + firstpage: 330 + +--- + + +# Donations and members + +Membership fees and donations received between +2025-11-06 and 2026-01-05. 
+ +## Donations + +Andreas Büttner (Germany); +Tobias Fellinger (Austria); +Shalese Fitzgerald (United States); +Roger Koenker (United Kingdom); +Bruce Larson (United States); +Joseph Luchman (United States); +HMS Analytical Software GmbH (Germany); +Kem Phillips (United States); +Fergus Reig Gracia (Spain); +Zane Troyer (United States); +Roland Wedekind (France). + +## Supporting institutions + +oikostat GmbH, Ettiswil (Switzerland). + +## Supporting members + +Richard Abdill (United States); +Justan Baker (United States); +Gilberto Camara (Brazil); +Michael Chirico (United States); +Tom Clarke (United Kingdom); +Robin Crockett (United Kingdom); +Dan Dediu (Spain); +Kevin DeMaio (United States); +Ian Dinwoodie (United States); +Anna Doizy (Réunion); +Fraser Edwards (United Kingdom); +Anthony Alan Egerton (Malaysia); +Neil Frazer (United States); +Sven Garbade (Germany); +Brian Gramberg (Netherlands); +Spencer Graves (United States); +Krushi Gurudu (United States); +Joe Harwood (United Kingdom); +Kieran Healy (United States); +Adam Hill (United States); +Sebastian Jeworutzki (Germany); +June Kee Kim (South Korea); +Vishal Lama (United States); +Thierry Lecerf (Switzerland); +Thomas Levine (United States); +David Luckett (Australia); +Mehrad Mahmoudian (Finland); +Keon-Woong Moon (South Korea); +Yoshinobu Nakahashi (Japan); +Dan Orsholits (Switzerland); +Sermet Pekin (Türkiye); +Elgin Perry (United States); +Peter Ruckdeschel (Germany); +Choonghyun Ryu (South Korea); +Raoul Schorer (Switzerland); +Dejan Schuster (Germany); +Tobias Strapatsas (Germany); +Kai Streicher (Switzerland); +Robert Szabo (Sweden); +Koki Takayama (Japan); +Jan Tarabek (Czechia); +Marcus Vollmer (Germany); +Petr Waldauf (Czechia); +Jaap Walhout (Netherlands); +Sandra Ware (Australia); +Fredrik Wartenberg (Sweden); +Vaidotas Zemlys-Balevičius (Lithuania); +Lim Zhong Hao (Singapore); +广宇 曾 (China). 
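The `header-includes` entries in the article front matter above load the xeCJK package so that the CJK donor name (广宇 曾) typesets correctly in the PDF output. As a standalone sketch of that setup — assuming the document is compiled with XeLaTeX and the Noto Sans CJK SC font is installed, as the article's own YAML requests:

```latex
% Minimal sketch of the xeCJK setup used by this article's header-includes.
% Assumes compilation with xelatex and an installed Noto Sans CJK SC font.
\documentclass{article}
\usepackage{xeCJK}              % CJK support for XeLaTeX
\setCJKmainfont{Noto Sans CJK SC} % font used for CJK characters
\begin{document}
Supporting member: 广宇 曾 (China).
\end{document}
```

Without a CJK-capable engine and font, the name would render as missing glyphs, which is why the article selects the `xelatex` engine in its output options.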
diff --git a/_news/RJ-2025-4-rfoundation/RJ-2025-4-rfoundation.html b/_news/RJ-2025-4-rfoundation/RJ-2025-4-rfoundation.html new file mode 100644 index 0000000000..d93099428d --- /dev/null +++ b/_news/RJ-2025-4-rfoundation/RJ-2025-4-rfoundation.html @@ -0,0 +1,1865 @@
+[HTML head omitted: page title “R Foundation News”, styles and scripts.]
+R Foundation News
+
+“R Foundation News” published in The R Journal.
+
+1 Donations and members
+
+Membership fees and donations received between 2025-11-06 and 2026-01-05.
+
+1.1 Donations
+1.2 Supporting institutions
+1.3 Supporting members
+
+[The rendered donation, supporting-institution, and supporting-member lists
+repeat the names from the Rmd source above.]
+
+Reuse
+
+Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
+
+Citation
+
+For attribution, please cite this work as
+
+Hothorn, "R Foundation News", The R Journal, 2025
+
+BibTeX citation
    @article{RJ-2025-4-rfoundation,
    +  author = {Hothorn, Torsten},
    +  title = {R Foundation News},
    +  journal = {The R Journal},
    +  year = {2025},
    +  note = {https://journal.r-project.org/news/RJ-2025-4-rfoundation},
    +  volume = {17},
    +  issue = {4},
    +  issn = {2073-4859},
    +  pages = {330-330}
    +}
    +
    + + + + + + + diff --git a/_news/RJ-2025-4-rfoundation/RJ-2025-4-rfoundation.pdf b/_news/RJ-2025-4-rfoundation/RJ-2025-4-rfoundation.pdf new file mode 100644 index 0000000000..dec67edac7 Binary files /dev/null and b/_news/RJ-2025-4-rfoundation/RJ-2025-4-rfoundation.pdf differ diff --git a/_news/RJ-2025-4-rfoundation/RJ-2025-4-rfoundation.tex b/_news/RJ-2025-4-rfoundation/RJ-2025-4-rfoundation.tex new file mode 100644 index 0000000000..3919f9bd6a --- /dev/null +++ b/_news/RJ-2025-4-rfoundation/RJ-2025-4-rfoundation.tex @@ -0,0 +1,94 @@ +% !TeX root = RJwrapper.tex +\title{R Foundation News} + + +\author{by Torsten Hothorn} + +\maketitle + + +\section{Donations and members}\label{donations-and-members} + +Membership fees and donations received between +2025-11-06 and 2026-01-05. + +\subsection{Donations}\label{donations} + +Andreas Büttner (Germany); +Tobias Fellinger (Austria); +Shalese Fitzgerald (United States); +Roger Koenker (United Kingdom); +Bruce Larson (United States); +Joseph Luchman (United States); +HMS Analytical Software GmbH (Germany); +Kem Phillips (United States); +Fergus Reig Gracia (Spain); +Zane Troyer (United States); +Roland Wedekind (France). + +\subsection{Supporting institutions}\label{supporting-institutions} + +oikostat GmbH, Ettiswil (Switzerland). 
+ +\subsection{Supporting members}\label{supporting-members} + +Richard Abdill (United States); +Justan Baker (United States); +Gilberto Camara (Brazil); +Michael Chirico (United States); +Tom Clarke (United Kingdom); +Robin Crockett (United Kingdom); +Dan Dediu (Spain); +Kevin DeMaio (United States); +Ian Dinwoodie (United States); +Anna Doizy (Réunion); +Fraser Edwards (United Kingdom); +Anthony Alan Egerton (Malaysia); +Neil Frazer (United States); +Sven Garbade (Germany); +Brian Gramberg (Netherlands); +Spencer Graves (United States); +Krushi Gurudu (United States); +Joe Harwood (United Kingdom); +Kieran Healy (United States); +Adam Hill (United States); +Sebastian Jeworutzki (Germany); +June Kee Kim (South Korea); +Vishal Lama (United States); +Thierry Lecerf (Switzerland); +Thomas Levine (United States); +David Luckett (Australia); +Mehrad Mahmoudian (Finland); +Keon-Woong Moon (South Korea); +Yoshinobu Nakahashi (Japan); +Dan Orsholits (Switzerland); +Sermet Pekin (Türkiye); +Elgin Perry (United States); +Peter Ruckdeschel (Germany); +Choonghyun Ryu (South Korea); +Raoul Schorer (Switzerland); +Dejan Schuster (Germany); +Tobias Strapatsas (Germany); +Kai Streicher (Switzerland); +Robert Szabo (Sweden); +Koki Takayama (Japan); +Jan Tarabek (Czechia); +Marcus Vollmer (Germany); +Petr Waldauf (Czechia); +Jaap Walhout (Netherlands); +Sandra Ware (Australia); +Fredrik Wartenberg (Sweden); +Vaidotas Zemlys-Balevičius (Lithuania); +Lim Zhong Hao (Singapore); +广宇 曾 (China). 
+ + +\address{% +Torsten Hothorn\\ +Universität Zürich\\% +Switzerland\\ +% +% +\textit{ORCiD: \href{https://orcid.org/0000-0001-8301-0471}{0000-0001-8301-0471}}\\% +\href{mailto:Torsten.Hothorn@R-project.org}{\nolinkurl{Torsten.Hothorn@R-project.org}}% +} diff --git a/_news/RJ-2025-4-rfoundation/RJournal.sty b/_news/RJ-2025-4-rfoundation/RJournal.sty new file mode 100644 index 0000000000..351990be38 --- /dev/null +++ b/_news/RJ-2025-4-rfoundation/RJournal.sty @@ -0,0 +1,358 @@ +% Package `RJournal' to use with LaTeX2e +% Copyright (C) 2010 by the R Foundation +% Copyright (C) 2013 by the R Journal +% +% Originally written by Kurt Hornik and Friedrich Leisch with subsequent +% edits by the editorial board +% +% CAUTION: +% Do not modify this style file. Any changes to this file will be reset when your +% article is submitted. +% If you must modify the style or add LaTeX packages to the article, these +% should be specified in RJwrapper.tex + +\NeedsTeXFormat{LaTeX2e}[1995/12/01] +\ProvidesPackage{RJournal}[2025/10/05 v0.17 RJournal package] + +\RequirePackage{tikz} + +% Overall page layout, fonts etc ----------------------------------------------- + +% Issues of of \emph{The R Journal} are created from the standard \LaTeX{} +% document class \pkg{report}. + +\RequirePackage{geometry} +\geometry{a4paper, + textwidth=14cm, top=1cm, bottom=1cm, + includehead,includefoot,centering, + footskip=1.5cm} +\raggedbottom +\sloppy +\clubpenalty = 10000 +\widowpenalty = 10000 +\brokenpenalty = 10000 +\usepackage{microtype} + + +\RequirePackage{fancyhdr} +\fancyhead{} +\fancyheadoffset{2cm} +\fancyhead[L]{\textsc{\RJ@sectionhead}} +\fancyhead[R]{\thepage} +\fancyfoot{} +\fancyfoot[L]{The R Journal Vol. 
\RJ@volume/\RJ@number, \RJ@month~\RJ@year} +\fancyfoot[R]{ISSN 2073-4859} +\pagestyle{fancy} + +% We use the following fonts (all with T1 encoding): +% +% rm & palatino +% tt & inconsolata +% sf & helvetica +% math & palatino + +\RequirePackage{microtype} + +\RequirePackage[scaled=0.92]{helvet} +\RequirePackage{palatino,mathpazo} +\RequirePackage[scaled=1.02]{inconsolata} +\RequirePackage[T1]{fontenc} + +\RequirePackage[hyphens]{url} +\RequirePackage[pagebackref]{hyperref} +\renewcommand{\backref}[1]{[p#1]} + +% Dark blue colour for all links +\RequirePackage{color} +\definecolor{link}{rgb}{0.45,0.51,0.67} +\hypersetup{ + colorlinks,% + citecolor=link,% + filecolor=link,% + linkcolor=link,% + urlcolor=link +} + +% Give the text a little room to breath +\setlength{\parskip}{3pt} +\RequirePackage{setspace} +\setstretch{1.05} + +% Issue and article metadata --------------------------------------------------- + +% Basic front matter information about the issue: volume, number, and +% date. + +\newcommand{\volume}[1]{\def\RJ@volume{#1}} +\newcommand{\volnumber}[1]{\def\RJ@number{#1}} +\renewcommand{\month}[1]{\def\RJ@month{#1}} +\renewcommand{\year}[1]{\def\RJ@year{#1}} + + +% Individual articles correspond to +% chapters, and are contained in |article| environments. This makes it +% easy to have figures counted within articles and hence hyperlinked +% correctly. + +% An article has an author, a title, and optionally a subtitle. We use +% the obvious commands for specifying these. Articles will be put in certain +% journal sections, named by \sectionhead. + +\newcommand {\sectionhead} [1]{\def\RJ@sectionhead{#1}} +\renewcommand{\author} [1]{\def\RJ@author{#1}} +\renewcommand{\title} [1]{\def\RJ@title{#1}} +\newcommand {\subtitle} [1]{\def\RJ@subtitle{#1}} + +% Control appearance of titles: make slightly smaller than usual, and +% suppress section numbering. 
See http://tex.stackexchange.com/questions/69749 +% for why we don't use \setcounter{secnumdepth}{-1} + +\usepackage[medium]{titlesec} +\usepackage{titletoc} +\titleformat{\section} {\normalfont\large\bfseries}{\arabic{section}}{1em}{} +\titleformat{\subsection}{\normalfont\normalsize\bfseries}{\arabic{section}.\arabic{subsection}}{0.5em}{} +\titlecontents{chapter} [0em]{}{}{}{\titlerule*[1em]{.}\contentspage} + +% Article layout --------------------------------------------------------------- + +% Environment |article| clears the article header information at its beginning. +% We use |\FloatBarrier| from the placeins package to keep floats within +% the article. +\RequirePackage{placeins} +\newenvironment{article}{\author{}\title{}\subtitle{}\FloatBarrier}{\FloatBarrier} + +% Refereed articles should have an abstract, so we redefine |\abstract| to +% give the desired style + +\renewcommand{\abstract}[1]{\noindent\textbf{Abstract} #1} +\renewenvironment{abstract}{\noindent\textbf{Abstract}~}{} + +% The real work is done by a redefined version of |\maketitle|. Note +% that even though we do not want chapters (articles) numbered, we +% need to increment the chapter counter, so that figures get correct +% labelling. + +\renewcommand{\maketitle}{% +\noindent + \chapter{\RJ@title}\refstepcounter{chapter} + \ifx\empty\RJ@subtitle + \else + \noindent\textbf{\RJ@subtitle} + \par\nobreak\addvspace{\baselineskip} + \fi + \ifx\empty\RJ@author + \else + \noindent\textit{\RJ@author} + \par\nobreak\addvspace{\baselineskip} + \fi + \@afterindentfalse\@nobreaktrue\@afterheading +} + +% Now for some ugly redefinitions. We do not want articles to start a +% new page. 
(Actually, we do, but this is handled via explicit +% \newpage +% +% The name@of@eq is a hack to get hyperlinks to equations to work +% within each article, even though there may be multiple eq.(1) +% \begin{macrocode} +\renewcommand\chapter{\secdef\RJ@chapter\@schapter} +\providecommand{\nohyphens}{% + \hyphenpenalty=10000\exhyphenpenalty=10000\relax} +\newcommand{\RJ@chapter}{% + \edef\name@of@eq{equation.\@arabic{\c@chapter}}% + \renewcommand{\@seccntformat}[1]{}% + \@startsection{chapter}{0}{0mm}{% + -2\baselineskip \@plus -\baselineskip \@minus -.2ex}{\p@}{% + \phantomsection\normalfont\huge\bfseries\raggedright}} + +% Book reviews should appear as sections in the text and in the pdf bookmarks, +% however we wish them to appear as chapters in the TOC. Thus we define an +% alternative to |\maketitle| for reviews. +\newcommand{\review}[1]{ + \pdfbookmark[1]{#1}{#1} + \section*{#1} + \addtocontents{toc}{\protect\contentsline{chapter}{#1}{\thepage}{#1.1}} +} + +% We want bibliographies as starred sections within articles. +% +\RequirePackage[sectionbib,round]{natbib} +\bibliographystyle{abbrvnat} +\renewcommand{\bibsection}{\section*{References}} + +% Equations, figures and tables are counted within articles, but we do +% not show the article number. For equations it becomes a bit messy to avoid +% having hyperref getting it wrong. + +% \numberwithin{equation}{chapter} +\renewcommand{\theequation}{\@arabic\c@equation} +\renewcommand{\thefigure}{\@arabic\c@figure} +\renewcommand{\thetable}{\@arabic\c@table} + +% Issue layout ----------------------------------------------------------------- + +% Need to provide our own version of |\tableofcontents|. We use the +% tikz package to get the rounded rectangle. Notice that |\section*| +% is really the same as |\chapter*|. 
+\renewcommand{\contentsname}{Contents} +\renewcommand\tableofcontents{% + \vspace{1cm} + \section*{\contentsname} + { \@starttoc{toc} } +} + +\renewcommand{\titlepage}{% + \thispagestyle{empty} + \hypersetup{ + pdftitle={The R Journal Volume \RJ@volume/\RJ@number, \RJ@month \RJ@year},% + pdfauthor={R Foundation for Statistical Computing},% + } + \noindent + \begin{center} + \fontsize{50pt}{50pt}\selectfont + The \raisebox{-8pt}{\includegraphics[height=77pt]{Rlogo-5}}\hspace{10pt} + Journal + + \end{center} + {\large \hfill Volume \RJ@volume/\RJ@number, \RJ@month{} \RJ@year \quad} + + \rule{\textwidth}{1pt} + \begin{center} + {\Large A peer-reviewed, open-access publication of the \\ + R Foundation for Statistical Computing} + \end{center} + + % And finally, put in the TOC box. Note the way |tocdepth| is adjusted + % before and after producing the TOC: thus, we can ensure that only + % articles show up in the printed TOC, but that in the PDF version, + % bookmarks are created for sections and subsections as well (provided + % that the non-starred forms are used). + \setcounter{tocdepth}{0} + \tableofcontents + \setcounter{tocdepth}{2} + \clearpage +} + +% Text formatting -------------------------------------------------------------- + +\newcommand{\R}{R} +\newcommand{\address}[1]{\addvspace{\baselineskip}\noindent\emph{#1}} +\newcommand{\email}[1]{\href{mailto:#1}{\normalfont\texttt{#1}}} + +% Simple font selection is not good enough. For example, |\texttt{--}| +% gives `\texttt{--}', i.e., an endash in typewriter font. Hence, we +% need to turn off ligatures, which currently only happens for commands +% |\code| and |\samp| and the ones derived from them. Hyphenation is +% another issue; it should really be turned off inside |\samp|. And +% most importantly, \LaTeX{} special characters are a nightmare. E.g., +% one needs |\~{}| to produce a tilde in a file name marked by |\file|. 
+% Perhaps a few years ago, most users would have agreed that this may be +% unfortunate but should not be changed to ensure consistency. But with +% the advent of the WWW and the need for getting `|~|' and `|#|' into +% URLs, commands which only treat the escape and grouping characters +% specially have gained acceptance + +\DeclareRobustCommand\code{\bgroup\@noligs\@codex} +\def\@codex#1{\texorpdfstring% +{{\normalfont\ttfamily\hyphenchar\font=-1 #1}}% +{#1}\egroup} +\newcommand{\kbd}[1]{{\normalfont\texttt{#1}}} +\newcommand{\key}[1]{{\normalfont\texttt{\uppercase{#1}}}} +\DeclareRobustCommand\samp{`\bgroup\@noligs\@sampx} +\def\@sampx#1{{\normalfont\texttt{#1}}\egroup'} +\newcommand{\var}[1]{{\normalfont\textsl{#1}}} +\let\env=\code +\newcommand{\file}[1]{{`\normalfont\textsf{#1}'}} +\let\command=\code +\let\option=\samp +\newcommand{\dfn}[1]{{\normalfont\textsl{#1}}} +% \acronym is effectively disabled since not used consistently +\newcommand{\acronym}[1]{#1} +\newcommand{\strong}[1]{\texorpdfstring% +{{\normalfont\fontseries{b}\selectfont #1}}% +{#1}} +\let\pkg=\strong +\newcommand{\CRANpkg}[1]{\href{https://CRAN.R-project.org/package=#1}{\pkg{#1}}}% +\let\cpkg=\CRANpkg +\newcommand{\ctv}[1]{\href{https://CRAN.R-project.org/view=#1}{\emph{#1}}} +\newcommand{\BIOpkg}[1]{\href{https://www.bioconductor.org/packages/release/bioc/html/#1.html}{\pkg{#1}}} + +% Example environments --------------------------------------------------------- +\RequirePackage{fancyvrb} +\RequirePackage{alltt} + +\DefineVerbatimEnvironment{example}{Verbatim}{} +\renewenvironment{example*}{\begin{alltt}}{\end{alltt}} + +% Support for output from Sweave, and generic session style code +% These used to have fontshape=sl for Sinput/Scode/Sin, but pslatex +% won't use a condensed font in that case. 
+ +% Update (2015-05-28 by DS): remove fontsize=\small to match example environment + +\DefineVerbatimEnvironment{Sinput}{Verbatim}{} +\DefineVerbatimEnvironment{Soutput}{Verbatim}{} +\DefineVerbatimEnvironment{Scode}{Verbatim}{} +\DefineVerbatimEnvironment{Sin}{Verbatim}{} +\DefineVerbatimEnvironment{Sout}{Verbatim}{} +\newenvironment{Schunk}{}{} + +% Mathematics ------------------------------------------------------------------ + +% The implementation of |\operatorname| is similar to the mechanism +% \LaTeXe{} uses for functions like sin and cos, and simpler than the +% one of \AmSLaTeX{}. We use |\providecommand| for the definition in +% order to keep the one of the \pkg{amstex} if this package has +% already been loaded. +% \begin{macrocode} +\providecommand{\operatorname}[1]{% + \mathop{\operator@font#1}\nolimits} +\RequirePackage{amsfonts} + +\renewcommand{\P}{% + \mathop{\operator@font I\hspace{-1.5pt}P\hspace{.13pt}}} +\newcommand{\E}{% + \mathop{\operator@font I\hspace{-1.5pt}E\hspace{.13pt}}} +\newcommand{\VAR}{\operatorname{var}} +\newcommand{\COV}{\operatorname{cov}} +\newcommand{\COR}{\operatorname{cor}} + +% Figures ---------------------------------------------------------------------- + +% For use with pandoc > 3.2.1 +\newsavebox\pandoc@box +\newcommand*\pandocbounded[1]{% scales image to fit in text height/width + \sbox\pandoc@box{#1}% + \Gscale@div\@tempa{\textheight}{\dimexpr\ht\pandoc@box+\dp\pandoc@box\relax}% + \Gscale@div\@tempb{\linewidth}{\wd\pandoc@box}% + \ifdim\@tempb\p@<\@tempa\p@\let\@tempa\@tempb\fi% select the smaller of both + \ifdim\@tempa\p@<\p@\scalebox{\@tempa}{\usebox\pandoc@box}% + \else\usebox{\pandoc@box}% + \fi% +} + +\RequirePackage[font=small,labelfont=bf]{caption} + +% Wide environments for figures and tables ------------------------------------- +\RequirePackage{environ} + +% An easy way to make a figure span the full width of the page +\NewEnviron{widefigure}[1][]{ +\begin{figure}[#1] +\advance\leftskip-2cm 
+\begin{minipage}{\dimexpr\textwidth+4cm\relax}% + \captionsetup{margin=2cm} + \BODY +\end{minipage}% +\end{figure} +} + +\NewEnviron{widetable}[1][]{ +\begin{table}[#1] +\advance\leftskip-2cm +\begin{minipage}{\dimexpr\textwidth+4cm\relax}% + \captionsetup{margin=2cm} + \BODY +\end{minipage}% +\end{table} +} diff --git a/_news/RJ-2025-4-rfoundation/RJwrapper.tex b/_news/RJ-2025-4-rfoundation/RJwrapper.tex new file mode 100644 index 0000000000..700f697366 --- /dev/null +++ b/_news/RJ-2025-4-rfoundation/RJwrapper.tex @@ -0,0 +1,72 @@ +\documentclass[a4paper]{report} +\usepackage[utf8]{inputenc} +\usepackage[T1]{fontenc} +\usepackage{RJournal} +\usepackage{amsmath,amssymb,array} +\usepackage{booktabs} + + +% tightlist command for lists without linebreak +\providecommand{\tightlist}{% + \setlength{\itemsep}{0pt}\setlength{\parskip}{0pt}} + +\usepackage{longtable} + +% Always define CSL refs as bib entries are contained in separate doc +% Pandoc citation processing +%From Pandoc 3.1.8 +% definitions for citeproc citations +\NewDocumentCommand\citeproctext{}{} +\NewDocumentCommand\citeproc{mm}{% + \begingroup\def\citeproctext{#2}\cite{#1}\endgroup} +\makeatletter + % allow citations to break across lines + \let\@cite@ofmt\@firstofone + % avoid brackets around text for \cite: + \def\@biblabel#1{} + \def\@cite#1#2{{#1\if@tempswa , #2\fi}} +\makeatother +\newlength{\cslhangindent} +\setlength{\cslhangindent}{1.5em} +\newlength{\csllabelwidth} +\setlength{\csllabelwidth}{3em} +\newenvironment{CSLReferences}[2] % #1 hanging-indent, #2 entry-spacing + {\begin{list}{}{% + \setlength{\itemindent}{0pt} + \setlength{\leftmargin}{0pt} + \setlength{\parsep}{0pt} + % turn on hanging indent if param 1 is 1 + \ifodd #1 + \setlength{\leftmargin}{\cslhangindent} + \setlength{\itemindent}{-1\cslhangindent} + \fi + % set entry spacing + \setlength{\itemsep}{#2\baselineskip}}} + {\end{list}} +\usepackage{calc} +\newcommand{\CSLBlock}[1]{#1\hfill\break} 
+\newcommand{\CSLLeftMargin}[1]{\parbox[t]{\csllabelwidth}{#1}} +\newcommand{\CSLRightInline}[1]{\parbox[t]{\linewidth - \csllabelwidth}{#1}\break} +\newcommand{\CSLIndent}[1]{\hspace{\cslhangindent}#1} + + +\usepackage{xeCJK} +\setCJKmainfont{Noto Sans CJK SC} + +\begin{document} + + +%% do not edit, for illustration only +\sectionhead{Contributed research article} +\volume{17} +\volnumber{4} +\year{2025} +\month{December} +\setcounter{page}{330} + +\begin{article} + \input{RJ-2025-4-rfoundation} +\end{article} + + +\end{document} diff --git a/_site.yml b/_site.yml index ccb2b32085..65e0db9962 100644 --- a/_site.yml +++ b/_site.yml @@ -17,7 +17,7 @@ navbar: - text: "Home" href: index.html - text: "Current" - href: issues/2025-3 + href: issues/2025-4 - text: "Issues" href: issues.html - text: "News" diff --git a/editors.Rmd b/editors.Rmd index 3ce1578c28..2776102ccb 100644 --- a/editors.Rmd +++ b/editors.Rmd @@ -14,20 +14,21 @@ d-article > * { ## Editor-in-Chief -[Rob Hyndman](https://robjhyndman.com), Monash University, Australia. r-journal@r-project.org +[Emi Tanaka](https://emitanaka.org), Australian National University, Australia. r-journal@r-project.org + ## Executive editors -- [Emi Tanaka](https://emitanaka.org), Australian National University, Australia -- [Mark van der Loo](https://www.markvanderloo.eu), Statistics Netherlands and University of Leiden, The Netherlands +- [Vincent Arel-Bundock](https://arelbundock.com), University of Montreal, Canada +- [Rob Hyndman](https://robjhyndman.com), Monash University, Australia. - [Emily Zabor](https://www.emilyzabor.com), Case Western Reserve University, Cleveland Clinic and Taussig Cancer Institute, USA. 
## Associate editors - [Rafael de Andrade Moral](https://rafamoral.github.io), Maynooth University, Ireland - [David Ardia](https://ardiad.github.io), HEC Montréal, Canada -- [Vincent Arel-Bundock](https://arelbundock.com), University of Montreal, Canada - [Rasmus Bååth](https://www.sumsar.net), Lund University, Sweden +- [Maciej Beręsewicz](https://maciejberesewicz.com/), Poznań University of Economics and Business, Poland - [Przemek Biecek](https://pbiecek.github.io), University of Warsaw, Poland - [Kevin Burke](https://kevinburke.ie), University of Limerick, Ireland - [Mine Çetinkaya-Rundel](https://mine-cr.com), Duke University, USA @@ -57,7 +58,7 @@ d-article > * { ## Technical editor -- [Mitchell O'Hara-Wild](https://mitchelloharawild.com), Nectric, Australia +- [Abhishek Ulayil](https://www.linkedin.com/in/abhishek-ulayil-666b647b/), India ## Editorial advisory board @@ -76,7 +77,7 @@ Vince Carey (2009), Peter Dalgaard (2010), Heather Turner (2011), Martyn Plummer (2012), Hadley Wickham (2013), Deepayan Sarkar (2014), Bettina Grün (2015), Michael Lawrence (2016), Roger Bivand (2017), John Verzani (2018), Norman Matloff (2019), Michael Kane (2020), Dianne Cook (2021), Catherine -Hurley (2022), Simon Urbanek (2023), Mark van der Loo (2024) +Hurley (2022), Simon Urbanek (2023), Mark van der Loo (2024), Rob Hyndman (2025) ## Past Associate Editors diff --git a/resources/article_status_data.Rdata b/resources/article_status_data.Rdata index c102cc7740..c5841e082e 100644 Binary files a/resources/article_status_data.Rdata and b/resources/article_status_data.Rdata differ diff --git a/resources/article_status_plot.png b/resources/article_status_plot.png index 5c2b40461f..df3da5f5fd 100644 Binary files a/resources/article_status_plot.png and b/resources/article_status_plot.png differ diff --git a/resources/time_to_accept_data.Rdata b/resources/time_to_accept_data.Rdata index 92db27b75c..2b90de9619 100644 Binary files a/resources/time_to_accept_data.Rdata and 
b/resources/time_to_accept_data.Rdata differ diff --git a/resources/time_to_accept_plot.png b/resources/time_to_accept_plot.png index 8a4dedfc4e..37ceeaa5c5 100644 Binary files a/resources/time_to_accept_plot.png and b/resources/time_to_accept_plot.png differ diff --git a/submissions.Rmd b/submissions.Rmd index 3954d51217..19493b67c0 100644 --- a/submissions.Rmd +++ b/submissions.Rmd @@ -30,6 +30,8 @@ Figures and tables should have alt-text in chunk specifications, to assist with Titles and abstract should be in plain text, with the abstract no more than 250 words. +Ensure that R outputs are generated by evaluating R code, rather than entered manually in the Rmd file. + ### Rmarkdown using rticles The `rticles::rjournal_article` output format has been deprecated in favour of `rjtools`. Please use `rjtools` instead of `rticles`. @@ -44,6 +46,10 @@ If you do use LaTeX, you must use the [LaTeX template](https://github.com/rjourn The `rjtools` check functions described below can also be applied to check your files prior to submission. +### bib files + +Whether you use Rmarkdown or LaTeX, your references need to be contained in a `.bib` file. Your submitted bib file should contain only the references actually cited in your paper; please do not send enormous bib files full of uncited entries. Please also check that all reference fields are properly formatted. To preserve capitalization, wrap words such as `{R}` or package names in braces. Do not wrap entire titles in braces, as this overrides the journal's reference style. + # Checking your article The `rjtools` package has a number of functions which can help you check that your article is ready to submit. These include: @@ -60,9 +66,14 @@ The `rjtools` package has a number of functions which can help you check that yo - `check_packages_available()` additionally checks that referenced packages are available on CRAN or Bioconductor - Each of these can also be run individually.
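As a sketch of the brace-protection advice above, a well-formed entry might look like the following (the entry itself is hypothetical, for illustration only). Note the braces around `{R}` and the package name `{mvnTest}` to preserve their capitalization, while the title as a whole is left unwrapped so the journal's reference style can apply its own casing:

```bibtex
@article{smith2024,
  author  = {Smith, Jane},
  title   = {{mvnTest}: Testing Multivariate Normality in {R}},
  journal = {The R Journal},
  year    = {2024},
}
```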
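A minimal sketch of running the checks above from an R session (consult the `rjtools` documentation for the current function names and arguments; the article path is a placeholder):

```r
# install.packages("rjtools")
library(rjtools)

# Run the full set of submission checks on your article directory
initial_check_article(path = "path/to/your/article")

# Or run an individual check, e.g. on the title
check_title(path = "path/to/your/article")
```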
+ +## Long-term accessibility + +Authors are encouraged to think about the long-term accessibility of any key components of their submission that are hosted online, such as interactive applications or documentation websites. Rather than linking directly to a specific hosting service that may change, move, or shut down over time, consider pointing readers to a stable landing page under your control or to a resource with a permanent identifier. For example, if you host a Shiny application, link to a landing page you maintain that in turn links to the live app. If the hosting service changes or the app must be moved, you can then update the landing page without breaking the link in your article. + # Submitting your article -To submit an article to the R Journal, you will need to complete [this form](https://forms.gle/ykj6QcoGQD5ctnA5A). +To submit an article to the R Journal, you will need to complete [this form](https://forms.gle/Eqkf6cFJM3mjuxZUA). Your files will need to be uploaded in a zip file that should contain: