Suggestion: cmake system for downloading datasets #324
base: master
Conversation
…GAMBIT modules

modified: CMakeLists.txt
modified: cmake/cleaning.cmake
new file: cmake/datasets.cmake
modified: cmake/scripts/safe_dl.sh
This sounds like a good idea. A few thoughts spring to mind:
For now I'd be inclined not to incorporate these dummy backends into the new system at all. Since they represent data connected to a specific backend, I think they quite naturally belong within the backends.cmake framework, as something that eventually goes into the …

It's a question whether we should make the cmake target for SomeBit depend on its dataset targets, but I think it's probably better not to. I don't think the user should have to download a potentially large dataset just to compile SomeBit, since they may not be interested in using the module function that requires the data. That means, however, that it would be up to the module function using the dataset to first check that it exists, and throw a sensible error (with the suggested make command) if it's not present.

One perhaps useful thing would be to somehow register each dataset in datasets.cmake with its corresponding GAMBIT module, so that when running cmake we could output a message saying which "make get-dataset-blah" commands should be run to get all the datasets for the non-ditched modules.
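The registration idea above could be sketched roughly as follows. This is only an illustration: `register_dataset`, `report_datasets` and `DITCHED_MODULES` are hypothetical names, not existing GAMBIT cmake code.

```cmake
# Sketch: associate each registered dataset with its GAMBIT module via
# global properties, so cmake can tell the user which targets to build.
function(register_dataset name module)
  set_property(GLOBAL APPEND PROPERTY GAMBIT_DATASETS "${name}")
  set_property(GLOBAL PROPERTY DATASET_${name}_MODULE "${module}")
endfunction()

# At the end of configuration, print the suggested make command for
# every dataset whose module has not been ditched.
function(report_datasets)
  get_property(datasets GLOBAL PROPERTY GAMBIT_DATASETS)
  foreach(ds IN LISTS datasets)
    get_property(mod GLOBAL PROPERTY DATASET_${ds}_MODULE)
    if(NOT "${mod}" IN_LIST DITCHED_MODULES)
      message(STATUS "Module ${mod} uses dataset ${ds}: run 'make dataset-${ds}' to fetch it.")
    endif()
  endforeach()
endfunction()
```

`report_datasets()` would be called once near the end of the top-level CMakeLists.txt, after all datasets have been registered.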
Good idea. Will do that.
I agree that it's better not to make the Bits all depend on datasets that they might not use, but I don't see a good argument for maintaining two lots of dataset targets, one associated with backends and one not. I think it would be a lot neater to just make datasets associated with backends part of … You'd then just have some other custom function/macro that you could put in …
OK, thanks, sounds like a good plan. I'll go with that.
Hi, sorry for the late reply. I agree with most of what you guys discussed. My main suggestion is not to assign datasets to the modules themselves, but to the module functions that need them. We can create a …
Comment from Core meeting: I will get back to this once I'm done with PR #485. We will probably keep the first version of this system simple, i.e. working just at the level of the cmake targets (backends and modules). Introducing a more fine-grained system, at the level of dependency resolution, can be future work. As a first concrete example I will add the …
As discussed in the NeutrinoBit meeting today, we sometimes need to use fairly large datasets. (The current example is that Super-K provides their 4D tabulated chi^2 function in two text files with a total size of 150 MB.) And this issue will only become more frequent going forward -- e.g. we probably don't want to include every ATLAS FullLikelihood .json file directly in our repo, as each of these files is a few MB.
So I think it would be good to have a cmake system for downloading these types of datasets.
In the NeutrinoBit meeting we briefly discussed how we could probably just use a "fake" backend (e.g. in the Super-K case just a BE convenience function that performs interpolation) to effectively get a downloadable dataset in the current cmake system. But I think it will be easier and less confusing to have a separate part of our cmake system properly dedicated to downloading datasets that in reality aren't connected with any backend. Typically, these are the datasets that we would put in SomeBit/data/.
This PR is a suggestion for such a cmake system. It's essentially just a new file `datasets.cmake` where we can register downloadable datasets, much like how we register backends in `backends.cmake`. The current dummy example in `datasets.cmake` downloads our own CMSSM/NUHM best-fit SLHA files as a tarball from Zenodo and puts them in `ExampleBit_A/data/best_fits_SLHA_1705_07935`. The generated make targets are `make dataset-best_fits_SLHA_1705_07935`, `make nuke-dataset-best_fits_SLHA_1705_07935` and `make nuke-datasets`. The target `make dataset-best_fits_SLHA_1705_07935` is not added if ExampleBit_A is ditched.

What do you think, @tegonzalo? (Added you as reviewer.) Also tagging @patscott for your thoughts on this.
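For concreteness, a registered entry in `datasets.cmake` might look roughly like the sketch below. The `add_dataset` helper and the argument order of `safe_dl.sh` are assumptions for illustration, not necessarily what this PR implements.

```cmake
# Hypothetical helper for datasets.cmake: generates the per-dataset
# download and removal targets described above.
function(add_dataset name url md5 dest)
  # Download-and-unpack target; the real PR drives this via safe_dl.sh,
  # whose exact interface is assumed here.
  add_custom_target(dataset-${name}
    COMMAND ${PROJECT_SOURCE_DIR}/cmake/scripts/safe_dl.sh ${url} ${md5} ${dest}
    COMMENT "Downloading dataset ${name} into ${dest}")

  # Per-dataset removal target.
  add_custom_target(nuke-dataset-${name}
    COMMAND ${CMAKE_COMMAND} -E remove_directory ${dest}
    COMMENT "Removing dataset ${name}")

  # Collect all removal targets under a single 'make nuke-datasets'.
  if(NOT TARGET nuke-datasets)
    add_custom_target(nuke-datasets)
  endif()
  add_dependencies(nuke-datasets nuke-dataset-${name})
endfunction()

# Example registration mirroring the dummy dataset in this PR
# (URL and checksum are placeholders):
add_dataset(best_fits_SLHA_1705_07935
  "https://zenodo.org/record/..." "md5-placeholder"
  "${PROJECT_SOURCE_DIR}/ExampleBit_A/data/best_fits_SLHA_1705_07935")
```

The call to `add_dataset` would be skipped (or guarded by an `if`) when the owning module, here ExampleBit_A, is ditched, so that its `dataset-...` target is never created.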