1. fast_group_by 2. data_table_bind
Codecov Report
@@            Coverage Diff             @@
##           master    #1158    +/-   ##
==========================================
- Coverage   95.00%   94.40%   -0.60%
==========================================
  Files          11       11
  Lines        1421     1430       +9
==========================================
  Hits         1350     1350
- Misses         71       80       +9
Continue to review full report at Codecov.
bpbond left a comment:
Minor style changes only. Thanks @kanishkan91!
I wonder if we should look for opportunities to use this throughout the codebase, for example in the current slowest chunks. Thoughts, @pralitp?
fast_group_by<- function(df,by,colname="value",func= "sum"){
#Convert relevant column to numeric
For consistency with the rest of the codebase, please add a space after each of these #s.
df<- df[, (colname) := (get(func)(get(colname))), by]
#Save back to tibble
df<- as_tibble(df)
I would just make line 537 the last line of the function, as plain as_tibble(df) with no assignment, so the tibble is returned directly.
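For reference, here is a minimal R sketch of fast_group_by with the style suggestions from this review applied (a space after each #, spaces in the signature, and as_tibble() as the final expression). It is not the code from the PR: the as.data.table() copy, which keeps := from modifying the caller's data by reference, and the as.numeric() line standing in for the "Convert relevant column to numeric" step are assumptions, since the excerpts here show only part of the function body.

```r
library(data.table)
library(tibble)

# Sketch of fast_group_by with the review's style suggestions applied.
# Assumption: the input is copied with as.data.table() so that := does not
# modify the caller's data by reference.
fast_group_by <- function(df, by, colname = "value", func = "sum") {
  dt <- as.data.table(df)

  # Convert relevant column to numeric (assumed implementation of this step)
  dt[, (colname) := as.numeric(get(colname))]

  # Apply func to colname within each group defined by the `by` columns;
  # the per-group result is recycled to every row, as with group_by + mutate
  dt[, (colname) := get(func)(get(colname)), by = by]

  # Return as tibble (last expression, so its value is returned)
  as_tibble(dt)
}

# Usage with a hypothetical grouping column "region"
df <- tibble(region = c("a", "a", "b"), value = c(1L, 2L, 5L))
fast_group_by(df, by = "region")
fast_group_by(df, by = "region", func = "mean")
```

Used in a pipe, df %>% fast_group_by(by = "region") is meant to give the same values as df %>% group_by(region) %>% mutate(value = sum(value)) %>% ungroup(), which is the group_by, mutate and ungroup pattern the function replaces.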
df <- rbindlist(list_for_bind,use.names=TRUE)
#Return as tibble
df<-as_tibble(df)
list_for_bind =list(...)
#bind into one dataframe using rbindlist
df <- rbindlist(list_for_bind,use.names=TRUE)
Suggested change:
- df <- rbindlist(list_for_bind,use.names=TRUE)
+ df <- rbindlist(list_for_bind, use.names = TRUE)
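Putting the lines quoted in these threads together, a minimal R sketch of data_table_bind with the suggested spacing might look like this. The function wrapper around list(...) and returning as_tibble(df) directly as the last expression are assumptions about the parts of the function not shown here.

```r
library(data.table)
library(tibble)

# Sketch of data_table_bind with the suggested spacing applied.
# Assumption: the function simply collects its ... arguments into a list.
data_table_bind <- function(...) {
  list_for_bind <- list(...)

  # Bind into one data frame using rbindlist, matching columns by name
  df <- rbindlist(list_for_bind, use.names = TRUE)

  # Return as tibble
  as_tibble(df)
}

# Usage: column order differs between inputs, but use.names = TRUE aligns them
a <- tibble(x = 1:2, y = c("p", "q"))
b <- tibble(y = "r", x = 3L)
data_table_bind(a, b)
```

One behavioural difference from dplyr::bind_rows() is worth checking before swapping it in: with use.names = TRUE and the default fill = FALSE, rbindlist() stops with an error when the inputs do not share the same set of columns, whereas bind_rows() fills missing columns with NA. Passing fill = TRUE to rbindlist() narrows that gap.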
#' @importFrom dplyr %>%
#' @author kbn 24 Mar 2020
#' @export
fast_group_by<- function(df,by,colname="value",func= "sum"){
Suggested change:
- fast_group_by<- function(df,by,colname="value",func= "sum"){
+ fast_group_by <- function(df, by, colname = "value", func = "sum"){
Adds two functions, with documentation:
fast_group_by: a faster, data.table-based alternative to the traditional dplyr approach. It groups the data, applies a function (by default "sum") to one column within each group, and ungroups, so it does the work of group_by, mutate and ungroup in a single call. It can be used within dplyr pipes. The speed advantage over dplyr grows with the volume of the underlying data.
data_table_bind: a faster alternative to dplyr's bind_rows that binds all input datasets with data.table's rbindlist and returns the result as a tibble.