Data table functions by kanishkan91 · Pull Request #1158 · JGCRI/gcamdata

kanishkan91 · 2020-03-26T17:19:51Z

Adding two functions with documentation:

fast_group_by: A faster alternative to the usual dplyr approach, built on data.table. It groups the data, applies a function, and ungroups; in effect it performs a group_by, mutate and ungroup in one call. It can be used within dplyr pipes. The speed advantage grows with the volume of the underlying data.
data_table_bind: A faster alternative to bind_rows that takes advantage of data.table's data processing capabilities. Returns a tibble after binding all input datasets.
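
For illustration, a minimal sketch of the group, apply, ungroup pattern described for fast_group_by. This is not the PR's code: the name fast_group_by_sketch, the explicit copy, and the example data are assumptions; only the signature and the use of data.table's `:=` with `by` follow the diff quoted in this thread.

```r
library(data.table)
library(tibble)

# Hypothetical reconstruction for illustration only.
fast_group_by_sketch <- function(df, by, colname = "value", func = "sum") {
  # as.data.table() copies, so the caller's data frame is not modified by reference
  dt <- as.data.table(df)
  group_cols <- by
  fn <- match.fun(func)

  # Convert the target column to numeric
  dt[, (colname) := as.numeric(get(colname))]

  # Apply fn within each group; the single result is recycled to every row of the group
  dt[, (colname) := fn(get(colname)), by = group_cols]

  # Return a tibble so the result can continue through a dplyr pipe
  as_tibble(dt)
}

# Example data (made up for this sketch)
df <- tibble(region = c("a", "a", "b"), value = c(1, 2, 5))
out <- fast_group_by_sketch(df, by = "region")
```

Like group_by followed by mutate, this keeps one row per input row rather than collapsing each group to a single row.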

1. fast_group_by 2. data_table_bind

codecov · 2020-03-26T17:52:01Z

Codecov Report

Merging #1158 into master will decrease coverage by 0.59%.
The diff coverage is 0.00%.

@@            Coverage Diff             @@
##           master    #1158      +/-   ##
==========================================
- Coverage   95.00%   94.40%   -0.60%     
==========================================
  Files          11       11              
  Lines        1421     1430       +9     
==========================================
  Hits         1350     1350              
- Misses         71       80       +9

Impacted Files	Coverage Δ
R/utils.R	`94.82% <0.00%> (-3.53%)`	⬇️

Last update c8156b3...705e40d.

bpbond

Minor style changes only. Thanks @kanishkan91 !

I wonder if we should look for opportunities to use this throughout the codebase--for example, in the current slowest chunks. Thoughts @pralitp ?

bpbond · 2020-03-27T09:55:23Z

R/utils.R

+fast_group_by<- function(df,by,colname="value",func= "sum"){
+
+
+    #Convert relevant column to numeric


For consistency with the rest of the codebase, please add a space after all these #s

bpbond · 2020-03-27T09:55:48Z

R/utils.R

+    df<- df[, (colname) := (get(func)(get(colname))), by]
+
+    #Save back to tibble
+    df<- as_tibble(df)


I would just make line 537 the last one of the function: as_tibble(df)
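
One possible reason behind this suggestion, sketched in base R: a function whose last expression is an assignment returns its value invisibly, while a bare final expression returns it visibly. The function names below are hypothetical, and whether the PR's function had further lines after the assignment is not shown in this thread.

```r
# Last expression is an assignment: the value is returned invisibly
ends_with_assignment <- function(x) {
  x <- as.numeric(x)
}

# Last expression is the value itself: returned visibly
ends_with_expression <- function(x) {
  as.numeric(x)
}

# withVisible() reports whether a value would auto-print at the console
vis_assign <- withVisible(ends_with_assignment("1"))$visible
vis_expr <- withVisible(ends_with_expression("1"))$visible
```

Both functions return the same value; only the visibility flag differs, which affects printing at the console.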

bpbond · 2020-03-27T09:55:56Z

R/utils.R

+    df <- rbindlist(list_for_bind,use.names=TRUE)
+
+    #Return as tibble
+    df<-as_tibble(df)


bpbond · 2020-03-27T09:56:30Z

R/utils.R

+    list_for_bind =list(...)
+
+    #bind into one dataframe using rbindlist
+    df <- rbindlist(list_for_bind,use.names=TRUE)


Suggested change

df <- rbindlist(list_for_bind,use.names=TRUE)

df <- rbindlist(list_for_bind, use.names = TRUE)
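
For context, a minimal sketch of how the lines under review might fit into data_table_bind, assuming the function simply collects its arguments, binds them with rbindlist, and returns a tibble. The name data_table_bind_sketch and the example inputs are assumptions, not the PR's code.

```r
library(data.table)
library(tibble)

# Hypothetical reconstruction for illustration only.
data_table_bind_sketch <- function(...) {
  # Collect all inputs into a list
  list_for_bind <- list(...)

  # use.names = TRUE matches columns by name rather than by position
  bound_dt <- rbindlist(list_for_bind, use.names = TRUE)

  # Return as tibble
  as_tibble(bound_dt)
}

# Example inputs with the same columns in a different order (made up for this sketch)
a <- tibble(x = 1, y = "a")
b <- tibble(y = "b", x = 2)
bound <- data_table_bind_sketch(a, b)
```

One difference from bind_rows worth keeping in mind: without fill = TRUE, rbindlist raises an error when the inputs do not share the same set of columns, whereas bind_rows fills the missing ones with NA.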

bpbond · 2020-03-27T09:57:10Z

R/utils.R

+#' @importFrom dplyr %>%
+#' @author kbn 24 Mar 2020
+#' @export
+fast_group_by<- function(df,by,colname="value",func= "sum"){


Suggested change

fast_group_by<- function(df,by,colname="value",func= "sum"){

fast_group_by <- function(df, by, colname = "value", func = "sum"){

kanishkan91 added 2 commits March 25, 2020 19:13

Adding functions for fast_group_by and data_table_bind

da921d2

Commiting 2 functions with documentation

705e40d

1. fast_group_by 2. data_table_bind

kanishkan91 added the enhancement label Mar 26, 2020

kanishkan91 requested a review from bpbond March 26, 2020 17:19

kanishkan91 self-assigned this Mar 26, 2020

bpbond requested changes Mar 27, 2020


Base automatically changed from master to main January 19, 2021 20:24

kanishkan91 wants to merge 2 commits into main from data-table-functions
