-
Notifications
You must be signed in to change notification settings - Fork 27
Add csv post process for time series variables #440
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add csv post process for time series variables #440
Conversation
elenya-grant
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like a really great tool to save timeseries profiles! Thank you! I have a few suggestions and would be happy to chat more about other possible considerations. For example,
- The input to the function could be the .sql filepath and the case index then the filename could be output as the same as the sql filename but includes the case number:
output_fpath = sql_fpath.parent / f"{sql_fpath.stem}_Case{case_index}.csv"- Would there be a case where someone would want to save all the timeseries data? We could make this function also able to save all the timeseries variables if a specified name list isn't input.
| # get the cases as a list | ||
| cases = list(cr.get_cases()) | ||
|
|
||
| variable_list = [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that "variable_list" and "units_list" may be easier to read if it was a dictionary. Like:
variable_to_units_dict = {'battery.SOC': 'percent', 'battery.electricity_in': 'MW'} | name_list = var.split(".") | ||
| name_list.append(units_list[loc]) | ||
| save_key = " ".join(name_list) | ||
| data[save_key] = case.get_val(var, units=units_list[loc]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you wanted to make this flexible enough to handle a case where a user doesn't specify units, you can get units for a variable using: case._get_units(var)
Also - if you wanted to be consistent with the column naming in convert_sql_to_csv_summary() (h2integrate/postprocess/sql_to_csv.py) then I think the save_key would be formatted as f"{var} ({data_units})" where var is the variable naming that you have in variable_list.
I also think it'd be good to have a check that each variable is an array or else you may get errors from pandas if trying to make a data frame of 8760s but one of the variables is a scalar.
| import pandas as pd | ||
|
|
||
| def save_as_csv(variable_list, units_list, case, filename, alternative_name_list=None): | ||
| data = pd.DataFrame() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know this is a draft but just a reminder to include docstrings!
…ess_csv' into feature/post_process_csv
…ure/post_process_csv
johnjasa
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fun PR! I'm really happy to see you, @genevievestarke, contributing to the code in nice, digestible chunks as you work through a bigger behemoth in #407.
I have some comments and questions that I've marked. The main quasi-blocking one is on if the number of outputs saved is intentional. I left more info in a bigger comment with an example output showing what I mean.
I made some small edits directly to the branch, too.
My other suggestion is to add at least a mention of this functionality to the docs (postprocessing_results.md). A bigger inclusion could show using it within an .ipynb file but that isn't strictly necessary.
Lastly, let me acknowledge that this is a more in-depth review than is purely needed here, but really because I want to give good feedback and make sure this is optimally useful for users. I love the addition and I think it helps answer the question "so I just ran H2I... now what?" so that's very helpful! Thank you!
elenya-grant
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great! Thank you for adding this in! I gave one non-blocking comment but this looks good to go!
| assert all(c_name in res.columns.to_list() for c_name in expected_colnames) | ||
|
|
||
|
|
||
| def test_alternative_column_names(subtests, run_example_02_sql_fpath): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Love this new test! Thank you!
|
|
||
| ### Saving outputs | ||
|
|
||
| The time series outputs can be saved to a csv output using the `save_case_timeseries_as_csv` function. If no variables are specified, then the function saves all time series variables in the simulation. Otherwise, the specified variables are saves. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Non-blocking): could be good to document the different inputs (like the cases done in the tests for this function) here. I'm thinking about specifying units and/or giving it alternative names.
johnjasa
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great, thanks for this meaningful quality-of-life improvement to H2I, @genevievestarke! I really like the functionality this adds and I appreciate the effort you put into iterating on it.
I pushed up some doc changes based on Elenya's last non-block suggestion. Great to come in and I'll merge it after tests pass!
Add csv post process for time series variables
This pull requests adds a function to save specified time series variables to a csv file for post processing. The function is meant to only save time series variables from a single H2I run. It loads the data from an sql file, so the outputs of the H2I run need to be saved with a recorder. You can save all the time series variables from a run by not specifying specific variables, or you can enter a dictionary or list of the time series variables you would like to save. You can also provide units and/or alternative names for the variables. The output is a dictionary of the time series variables, and a csv file if the
save_to_fileinput is set toTrue.Additionally, there is a minor bug fix to return the recorder path.
Section 1: Type of Contribution
Section 3: General PR Checklist
docs/files are up-to-date, or added when necessaryCHANGELOG.mdhas been updated to describe the changes made in this PRSection 3: Related Issues
Section 4: Impacted Areas of the Software
Section 4.1: New Files
h2integrate\postprocess\sql_timeseries_to_csv.py: tests for all possible different use cases of the functionsave_case_timeseries_as_csvcheck_get_units_for_var: Check the units for a variable within a case and returns the variable value in the specified units and the specified units.save_case_timeseries_as_csv: Summarize timeseries data from a case within an sql recorder file to a DataFrameand save to csv file if
save_to_fileis True.h2integrate\postprocess\test\test_sql_timeseries_to_csv.pySection 4.2: Modified Files
h2integrate/core/pose_optimization.pyset_recorders(): minor bugfix to return recorder path ifrecorder_attachment=="model"examples/02_texas_ammonia/driver_config.yaml: updated to user recorderexamples/02_texas_ammonia/run_texas_ammonia_plant.py: updated to showcase saving timeseries functionalitySection 5: Additional Supporting Information
Section 6: Test Results, if applicable