Parallel remeshing #684
base: develop
|
thanks @fsalmon001. I have compiled parmmg2d now, so waiting to hear how to run with it |
|
Hi @tdcwilliams, it is in the attached file. With the default options, you just need to add regrid=mmg to the numerics options.
|
Hi again @fsalmon001, I was able to compile nextsim after |
|
Hi @tdcwilliams, USE_MMG=true is set in the environment file, e.g. env_compile_gnu_linux.bash. With the container you don't use those files, so you should set it in the container instead. Then, yes, you only need numerics.regrid=mmg, or you can leave it unset: when USE_MMG=true, MMG is the default choice in the options.cpp file (you can change it if you want).
|
actually USE_MMG was unset, and it doesn't compile. Where is libmetis? I have in /opt/local/parmmg2d: and |
|
Hi @fsalmon001, |
|
Hi @fsalmon001, |
|
Hi @tdcwilliams, I just pushed a new commit for parmmg2d to fix the installation path, which was wrong before. The metis library is now installed at the same location as the parmmg2d library.
|
Hi @fsalmon001 |
|
Hi @fsalmon001, Your fix was nicer though and also works. |
|
Hi @fsalmon001 |
|
Hi @tdcwilliams, |
|
Hi again @fsalmon001, For |
|
Hi @fsalmon001 |
|
Ok, I see @tdcwilliams. It was not really an error, even though MMG complains about it: h_min = coef_min * h_mid. So to have a strictly uniform mesh, you need h_min = h_max, or at least h_min almost equal to h_max. I would choose 0.999 and 1.001, for instance, to avoid the warning and still have a uniform mesh by default. Do you still have the issue with BAMG?
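To make the relation above concrete, here is a minimal C++ sketch of how the two coefficients bound the edge length. The function name `edge_length_bounds` is hypothetical, and the formula h_max = coef_max * h_mid is an assumption made by symmetry with the h_min formula quoted in the comment; neither is taken from the nextsim or MMG sources.

```cpp
#include <cassert>
#include <utility>

// Hypothetical helper (not nextsim or MMG code).
// h_min = coef_min * h_mid comes from the discussion above;
// h_max = coef_max * h_mid is ASSUMED by symmetry.
// Returns the pair {h_min, h_max}.
std::pair<double, double> edge_length_bounds(double h_mid,
                                             double coef_min,
                                             double coef_max)
{
    assert(h_mid > 0.0);
    assert(0.0 < coef_min && coef_min <= coef_max);
    return {coef_min * h_mid, coef_max * h_mid};
}
```

With h_mid = 10000 m and the suggested coefficients 0.999 and 1.001, the bounds differ from h_mid by about 10 m on each side, so the mesh stays uniform in practice while h_min remains strictly smaller than h_max.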
|
yes I still have a bamg error, with this cfg file and 32 cpus |
…x closer together
|
I tried your case @tdcwilliams, but without the ocean model (constant instead) and with piomas as the ice-type. I don't have the drifter file either, so I did not use drifters. And it works. I think I have only seen this kind of error, and if I remember correctly, it is related to an issue with a netcdf file (netcdf uses the argument ncuts in some functions).
|
Hi @fsalmon001 Can you do this (NB remove |
|
Hi @tdcwilliams, I would agree with you, but we still use one interpolation function of BAMG (for nodes) in interpFields_parallel, so we cannot remove BAMG completely so far. It is not that easy to remove, because this function calls other functions in BAMG. Moreover, on a supercomputer, I had an error in the node interpolation function, probably due to a round-off (epsilon) error and a point which was outside the domain by maybe 1e-14 m, and this caused a crash. So the functions in BAMG are indeed not perfect. So what do we do? I could do what you want relatively quickly, I think. But maybe, before that, you could try 2 km meshes for instance, to check that everything is ok even with finer meshes and your configuration file (I don't have as many options as you in my tests).
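The failure mode described above (a point that lies outside the domain by about 1e-14 m) is the classic case for a point-in-triangle test with a small tolerance. The following is a minimal sketch using barycentric coordinates; `Point`, `barycentric`, `inside_with_tolerance` and the default tolerance are all hypothetical illustrations and are not BAMG's or nextsim's implementation.

```cpp
#include <array>
#include <cassert>

// Hypothetical types and helpers (not BAMG or nextsim code).
struct Point { double x, y; };

// Barycentric coordinates of p with respect to triangle (a, b, c).
std::array<double, 3> barycentric(Point a, Point b, Point c, Point p)
{
    const double det = (b.y - c.y) * (a.x - c.x) + (c.x - b.x) * (a.y - c.y);
    assert(det != 0.0);  // degenerate triangles are not handled here
    const double l0 = ((b.y - c.y) * (p.x - c.x) + (c.x - b.x) * (p.y - c.y)) / det;
    const double l1 = ((c.y - a.y) * (p.x - c.x) + (a.x - c.x) * (p.y - c.y)) / det;
    return {l0, l1, 1.0 - l0 - l1};
}

// Accept points whose barycentric coordinates are negative only by
// round-off. The tolerance is dimensionless and its default is an
// ASSUMED value, to be tuned for the mesh at hand.
bool inside_with_tolerance(Point a, Point b, Point c, Point p,
                           double tol = 1e-12)
{
    const auto l = barycentric(a, b, c, p);
    return l[0] >= -tol && l[1] >= -tol && l[2] >= -tol;
}
```

On the unit triangle (0,0), (1,0), (0,1), the point (0.5, -1e-14) is rejected with a tolerance of 0 but accepted with the default tolerance, which is the behaviour one would want for a node sitting on the boundary up to round-off.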
|
Yes, there is no need for us to have two remeshing methods, even though @fsalmon001 obviously needs to compare the two in his paper. But if we remove bamg, we need to update the Docker and Apptainer files in https://github.com/nansencenter/nextsim-env before merging into develop.
|
Hi @fsalmon001, Hi @einola, |
|
I just thought of this @tdcwilliams, @einola: when the paper is under review, maybe the reviewers should have access to a version with both MMG and BAMG, to check and compare the results? Maybe we could leave BAMG in this version, and I do another pull request to remove BAMG from the code afterwards? My post-doc ends at the end of January, so we have enough time; it will be a quick PR. I also had a look at the interpolation function from BAMG. I think I can code another one, but I am not sure so far that it will be as fast as the original one. I will keep you informed.
|
Hi @fsalmon001, we can tag the current version of develop, and the one after this PR, if they want to compare bamg and mmg. |
|
Hi @tdcwilliams, yes that sounds good. For BAMG, I am not convinced that the bamg option is unstable in this PR. There is only one difference when using BAMG compared to the current dev branch: it is in the explicitSolve function, and the difference is epsilon. So I don't see how this branch could bring issues when using BAMG compared to the current nextsim version.
|
Hi @fsalmon001 same config as before but different mesh and timestep is 450s. |
|
With your compile option, it crashes at the beginning, as it does for you. So I will be able to look into this problem.
|
I have found the error at the beginning, inside the interpolation function. The error is stupid and is just at line 86 : I have found another error in the parallel interpolation function, which appears after more than 20 time steps. So I am looking into this before making a commit.
Yes, I see. Well, the forcing data (winds, currents, etc) are much higher resolution than the initial conditions we had problems with. So that still works fine. I don't expect we'll ever be in a situation where the forcing data is so coarse that we have this problem with those. |
|
Ok @einola. I made a commit to correct buffer overflows (same error found by @tdcwilliams in the drifters). Please, tell me how it works now (here it seems to run) |
|
Hi @fsalmon001 |
|
With your compilation options, it crashes exactly at the line where there is an issue. I think if you just add some std::cerr, it should be possible to see where it crashes. I don't really see where it can crash (I think I have never had crashes here), but there are some sqrt, maybe of negative numbers. This might happen due to epsilon precision. |
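The suspicion above, that a square root may receive a slightly negative argument because of round-off, can be guarded against with a small wrapper. This is a minimal sketch; `sqrt_clamped` and its default tolerance are hypothetical and are not part of nextsim, BAMG or MMG.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper (not nextsim, BAMG or MMG code).
// Arguments that are negative only by round-off (within `tol`, an
// ASSUMED default) are treated as zero. Genuinely negative arguments
// are passed through to std::sqrt, which returns NaN, so that a real
// logic error is not silently hidden.
double sqrt_clamped(double x, double tol = 1e-12)
{
    assert(tol >= 0.0);
    if (x < 0.0 && x >= -tol)
        return 0.0;
    return std::sqrt(x);
}
```

Checking the result with std::isnan right after the call, and printing the offending value to std::cerr, is one way to locate which expression goes negative.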
|
I have narrowed the crash down to find_z1_z2, I think in the while loop (I don't have comm in there so haven't got a barrier before the printouts) - can you see anything there? |
|
This makes sense. I made a commit about this loop last week because there were some issues. So the error is inside: I think the error is that k becomes negative because there is no solution. Can you check please?
|
Can you replace inside find_z1_z2 by this: Also, if it doesn't crash anymore, please tell me if you have an error message "INACCURACIES OR ERROR IN THE METRIC COMPUTATION:" and the values given. |
|
yes, it did get negative. One question: would initialising with Anyway, I'll try your code out |
|
Yes indeed you are right. Even if this is not the problem here, it should be initialized by k = nb_vertices |
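The two points discussed here (start the search at k = nb_vertices, and never let k become negative when no solution exists) can be illustrated with a generic guarded loop. This is only a sketch of the pattern, not the actual body of find_z1_z2: the function name `find_first_valid_k`, the `accepts` predicate, and the assumption that valid values of k lie in 1..nb_vertices are all hypothetical. It requires C++17 for std::optional.

```cpp
#include <cassert>
#include <functional>
#include <optional>

// Hypothetical illustration (NOT the real find_z1_z2).
// Searches k = nb_vertices, nb_vertices - 1, ..., 1 and returns the
// first k accepted by `accepts`, or std::nullopt if none is accepted.
// The valid range 1..nb_vertices is an ASSUMPTION.
std::optional<int> find_first_valid_k(int nb_vertices,
                                      const std::function<bool(int)>& accepts)
{
    assert(nb_vertices >= 0);
    int k = nb_vertices;            // start at the upper bound
    while (k > 0 && !accepts(k))    // the k > 0 guard stops the descent
        --k;
    if (k == 0)
        return std::nullopt;        // no solution: report it explicitly
    return k;
}
```

Returning an empty optional forces the caller to handle the "no solution" case, for example by printing a warning and falling back to a default, instead of indexing an array with a negative k.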
|
ok, what about |
|
ok, with your code it doesn't crash anymore, but I do get |
|
Ok, this is not surprising. Instead, can you try with this please (the first case with k=np_vertices should be handled before the loop, but this was not done correctly):
|
Hi @fsalmon001, |
|
Thank you @tdcwilliams. And now, can you check a mix of both to see if the inaccuracy is still the same? PS: I edited the code below at 8h55
|
didn't crash but gave NB I used the latest edit of the code |
|
Ok, so the solution is not the first k and the issue is that there is no solution. I have run a lot of cases and I have never faced this. I don't know how it is possible. Basically, this means that the number of vertices will increase a bit in the new mesh. So it is not a huge issue. Do you get this message again at the next regrid step?
|
yes I seem to get it every time there is a regrid |
|
Does the mesh seem ok? I am not sure the origin of the problem is in the metric computation if this happens every time. Or this is a very particular case which is permanent here for some reason.
|
I don't know about the mesh, but the moorings outputs seem ok. |
|
Ok, I will try to understand if this is theoretically possible. At least, this is not a critical error, rather an inaccuracy in the requested metric.
|
can I save some fields or something for you to test? |
|
Maybe you could give me the log file of your run (with the LOG[DEBUG], etc), please? I could compare it with mine to see where the two simulations diverge, given that we have the same config file. Maybe there is still an error in the code, and depending on the configuration, we hit it or not.
|
In the 3 commits, I corrected:
|
Quick_user_guide.odt
Integration of parallel remeshing into nextsim using parmmg2d. The parallel interpolation is also added.
The use of BAMG is still possible.
There will be some minor differences with the develop branch even with bamg, due to changes in the explicitSolve function (these induce only machine-precision differences).
To use parmmg2d, you must first compile it. This should be done by adding in the container:
I will edit this message and add a quick user guide for using the parallel remeshing this afternoon.