adding files related to NOA GFS poc #1824
base: master
Summary of Changes
Hello @balit-raibot, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request establishes the foundational data and configuration for a Proof of Concept (PoC) to integrate NOAA Global Forecast System (GFS) data into the system. It includes the raw GFS data, the necessary metadata and mapping files to process this data, and the resulting transformed output, along with the corresponding statistical variable definitions and schema.
Code Review
This pull request adds new files for importing NOAA GFS data as a proof-of-concept for Data Commons. While this is a valuable addition, I've identified several critical and high-severity issues in the generated files and configurations that should be addressed to ensure the data is imported correctly and the process is robust. Key issues include hardcoded values leading to data precision loss, incomplete and fragile TMCF definitions, malformed data in the output CSV, and stylistic issues in the MCF files.
Node: E:noa_gfs_output->E0
observationDate: C:noa_gfs_output->observationDate
value: C:noa_gfs_output->value
observationAbout: C:noa_gfs_output->observationAbout
variableMeasured: C:noa_gfs_output->variableMeasured
unit: C:noa_gfs_output->unit
typeOf: dcs:StatVarObservation
referenceTime: 2025-12-24T00:00:00Z
validTime: 2025-12-24T00:00:00Z
The TMCF file is incomplete and not robust for a general import process.
- Missing Mappings: It's missing mappings for the level, latitude, and longitude columns from the CSV. These are important properties for the observations and should be included.
- Hardcoded Times: referenceTime and validTime are hardcoded. While they are constant for this specific input file, this approach is fragile. If data from other times is processed, this will lead to incorrect data. These should be mapped from their respective columns in the CSV.
A corrected version would look like this:
Node: E:noa_gfs_output->E0
typeOf: dcs:StatVarObservation
observationDate: C:noa_gfs_output->observationDate
value: C:noa_gfs_output->value
observationAbout: C:noa_gfs_output->observationAbout
variableMeasured: C:noa_gfs_output->variableMeasured
unit: C:noa_gfs_output->unit
level: C:noa_gfs_output->level
latitude: C:noa_gfs_output->latitude
longitude: C:noa_gfs_output->longitude
referenceTime: C:noa_gfs_output->referenceTime
validTime: C:noa_gfs_output->validTime
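Both problems raised in this comment (CSV columns with no mapping, and properties set to literal values) can be checked mechanically. The following is a minimal Python sketch and is not part of the PR: the audit_tmcf helper, the simple single-node parser, and the CSV header in ASSUMED_HEADER are all hypothetical, chosen only to match the column names mentioned in the review.

```python
# Hypothetical audit helper; not part of the PR or of any Data Commons tool.

# TMCF as originally submitted in the PR (copied from the diff above).
ORIGINAL_TMCF = """\
Node: E:noa_gfs_output->E0
observationDate: C:noa_gfs_output->observationDate
value: C:noa_gfs_output->value
observationAbout: C:noa_gfs_output->observationAbout
variableMeasured: C:noa_gfs_output->variableMeasured
unit: C:noa_gfs_output->unit
typeOf: dcs:StatVarObservation
referenceTime: 2025-12-24T00:00:00Z
validTime: 2025-12-24T00:00:00Z
"""

# ASSUMPTION: the real CSV header is not shown in the review; this list is
# inferred from the column names the reviewer refers to.
ASSUMED_HEADER = [
    "observationDate", "value", "observationAbout", "variableMeasured",
    "unit", "level", "latitude", "longitude", "referenceTime", "validTime",
]


def parse_tmcf(text):
    """Return {property: value} for a TMCF containing a single node."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first colon only; values may contain colons themselves.
        key, _, value = line.partition(":")
        props[key.strip()] = value.strip()
    return props


def audit_tmcf(tmcf_text, csv_header, table="noa_gfs_output"):
    """Return (unmapped CSV columns, properties whose values are literals)."""
    props = parse_tmcf(tmcf_text)
    prefix = f"C:{table}->"
    mapped = {v[len(prefix):] for v in props.values() if v.startswith(prefix)}
    unmapped = [col for col in csv_header if col not in mapped]
    hardcoded = sorted(
        key for key, value in props.items()
        if key not in ("Node", "typeOf") and not value.startswith(prefix)
    )
    return unmapped, hardcoded


unmapped, hardcoded = audit_tmcf(ORIGINAL_TMCF, ASSUMED_HEADER)
print("Unmapped CSV columns:", unmapped)
print("Hardcoded properties:", hardcoded)
```

Under the assumed header, the original TMCF is reported as leaving level, latitude, longitude, referenceTime, and validTime unmapped, with referenceTime and validTime hardcoded. Running the same check against the corrected version above should report neither.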
@@ -0,0 +1,77 @@
key,p1,v1,p2,v2,p3,v3,p4,v4
S.No,observationAbout,country/USA,observationDate,2025
The observationDate is hardcoded to 2025, which leads to a loss of precision since the input data contains full timestamps. This should be derived from the Reference_Time column.
To fix this, you should:
- Change this line to remove the observationDate mapping.
- Add a new line to map Reference_Time to observationDate, for example:
Reference_Time,observationDate,{Data},#Eval,"observationDate=format_date(Data, '%Y-%m-%d')"
S.No,observationAbout,country/USA
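To illustrate the precision point, here is a small standalone Python sketch of deriving the observation date from a reference timestamp instead of hardcoding the year. It is not the format_date function used in the example mapping above; to_observation_date is a hypothetical helper, and the timestamp layout is an assumption based on the 2025-12-24T00:00:00Z values seen in the TMCF.

```python
from datetime import datetime, timezone


def to_observation_date(reference_time, fmt="%Y-%m-%d"):
    """Reformat a UTC timestamp such as 2025-12-24T06:00:00Z.

    ASSUMPTION: Reference_Time values follow the same layout as the
    referenceTime literal in the TMCF; adjust the parse pattern otherwise.
    """
    parsed = datetime.strptime(reference_time, "%Y-%m-%dT%H:%M:%SZ")
    parsed = parsed.replace(tzinfo=timezone.utc)  # the trailing Z means UTC
    return parsed.strftime(fmt)


# Day-level precision, as in the reviewer's example format string.
print(to_observation_date("2025-12-24T06:00:00Z"))
# Hour-and-minute precision, if forecast cycles within a day must stay distinct.
print(to_observation_date("2025-12-24T06:00:00Z", fmt="%Y-%m-%dT%H:%M"))
```

Either output keeps more information than the constant 2025. Whether the day-level format is enough depends on whether several forecast cycles per day need to remain distinguishable, which the review does not settle.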
Adding GFS Data of NOA for Data Commons POC