Conversation
This program can be used to download GEDI files using a text file of URLs provided by Earthdata Search. Combined with the gediFinder.py program, one should be able to gather data from a large bounding box via Earthdata Search in a timely manner and without a lot of computer storage space.
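A minimal sketch of reading such a URL list; the function name and the file format (one URL per line, as Earthdata Search exports) are assumptions, not the program's actual API:

```python
def read_url_list(path: str) -> list[str]:
    """Parse a text file of download URLs, one per line, skipping blank lines."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]
```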
crich011 left a comment
I made a handful of comments: one structural comment about the order of downloading vs. checking the output format, but mostly about docs.
I think this all looks good. Once you take a look at those, add trailing spaces to some of the argparse help strings (see, e.g., L355 of process_l2a, which should get a space at the end of that string), and make any modifications to the docstrings, I think we should merge it and get started on writing tests to port it to the earthshot repo.
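The trailing-space issue usually comes from implicit concatenation of adjacent string literals in a help= value; a minimal illustration (the strings here are hypothetical, not the actual help text):

```python
# Adjacent string literals are concatenated at compile time, so a missing
# trailing space on the first piece fuses the words across the line break.
broken = ("File type for the output."   # no trailing space
          "Defaults to csv.")
fixed = ("File type for the output. "   # trailing space added
         "Defaults to csv.")
print(broken)  # File type for the output.Defaults to csv.
print(fixed)   # File type for the output. Defaults to csv.
```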
a = np.sort(a, axis=1)
count = (~np.isnan(a)).sum(axis=1)  # count number of non-nans in row
groups = np.unique(count)  # returns sorted unique values
groups = groups[groups > 0]  # only returns groups with at least 1 non-nan value\n",
Suggested change:
- groups = groups[groups > 0]  # only returns groups with at least 1 non-nan value\n",
+ groups = groups[groups > 0]  # only returns groups with at least 1 non-nan value
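For context, the snippet above groups rows by how many valid values they contain; a self-contained sketch of that pattern with made-up data:

```python
import numpy as np

a = np.array([[3.0, np.nan, 1.0],
              [np.nan, np.nan, np.nan],
              [2.0, 5.0, 4.0]])

a = np.sort(a, axis=1)               # np.sort pushes NaNs to the end of each row
count = (~np.isnan(a)).sum(axis=1)   # non-NaN count per row -> [2, 0, 3]
groups = np.unique(count)            # sorted unique counts -> [0, 2, 3]
groups = groups[groups > 0]          # keep only groups with valid data -> [2, 3]
```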
Parameters
----------
file_dir : str
The docstring here can be updated to remove the file_dir kwarg, since the function now takes full paths instead of a directory path and filenames.
if args.filetype.lower() == "csv":
    filename = os.path.join(args.dir, args.outfile + ".csv")
    print(f"Writing to file {filename}")
    df.to_csv(filename, index=False)
elif args.filetype.lower() == "parquet":
    filename = os.path.join(args.dir, args.outfile + ".parquet.gzip")
    print(f"Writing to file {filename}")
    df.to_parquet(filename, compression="gzip")
elif args.filetype.lower() == "geojson":
    filename = os.path.join(args.dir, args.outfile + ".geojson")
    print(f"Writing to file {filename}")
    df_to_geojson(df, filename)
else:
    raise ValueError(
        f"Received unsupported file type {args.filetype}. Please provide one of: csv, parquet, or GeoJSON."
    )
It might make more sense to do the file-extension handling before the downloading and unpacking, in case a user provides the wrong file extension (or has a typo); that avoids the risk of losing all that downloaded and unpacked work. Another option would be to store the output of the intermediate steps, but just checking up front seems simpler to me.
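One way to do that up-front check is to validate the extension at argument-parsing time, before any network work begins; a sketch under assumptions (the option name and the supported set are illustrative, not the PR's actual interface):

```python
import argparse

# Hypothetical module-level constant listing the supported output formats.
SUPPORTED_FILETYPES = {"csv", "parquet", "geojson"}

def valid_filetype(value: str) -> str:
    """argparse `type` callable: fail fast on an unsupported extension."""
    filetype = value.lower()
    if filetype not in SUPPORTED_FILETYPES:
        raise argparse.ArgumentTypeError(
            f"Received unsupported file type {value}. "
            "Please provide one of: csv, parquet, or GeoJSON."
        )
    return filetype

parser = argparse.ArgumentParser()
# Registering the validator as `type` runs the check at parse time,
# i.e. before any downloading or unpacking starts.
parser.add_argument("--filetype", type=valid_filetype, default="csv")
args = parser.parse_args(["--filetype", "Parquet"])
```

With this in place, the write-out branch at the end of the script can assume args.filetype is already one of the supported, lowercased values.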
    None
    """
    filepath = os.path.join(dir, "granuleData.zip")
    r = requests.get(url, stream=True)
I think the way this does the checking and prints a descriptive error to the user is useful, especially since we want this function to return a bool rather than throw the exception outright.
But, for future reference, a helpful tip someone taught me: in most cases, raise_for_status (i.e. r.raise_for_status() here) does the sensible thing you'd want when checking a response. It raises an HTTPError if the response came back with an error status code, and returns None otherwise. Again, I think that is not actually what we want here, but it's worth having in your toolbox for the times when it is what you want to do.
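For reference, a hedged sketch of combining raise_for_status with the bool-returning contract this function wants; the function name, timeout, and messages are assumptions, not the PR's actual code:

```python
import requests

def download_zip(url: str, dest: str) -> bool:
    """Return True on a successful download, False otherwise."""
    try:
        r = requests.get(url, stream=True, timeout=30)
        r.raise_for_status()  # raises requests.HTTPError on a 4xx/5xx status
    except requests.RequestException as exc:
        # RequestException also covers timeouts and connection failures.
        print(f"Download failed for {url}: {exc}")
        return False
    with open(dest, "wb") as f:
        for chunk in r.iter_content(chunk_size=8192):
            f.write(chunk)
    return True
```

Catching requests.RequestException keeps the descriptive-message-plus-bool behavior while letting raise_for_status do the status-code checking.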
Downloads and unpacks the zip file from the provided url
Returns
-------
None
Suggested change:
- None
+ bool
+     True indicates a successful download. False indicates that the download was unsuccessful.