Fixed inconsistent results of oracle.get_links across runs #196
Open
PauBadiaM wants to merge 1 commit into morris-lab:master
Hi @KenjiKamimoto-wustl122,
I have observed that the method `oracle.get_links` unfortunately returns different results across runs (celloracle == 0.18.0). While these differences are not huge (mean Jaccard index of 0.9 between different runs), it is important for results to be reproducible.

Even though you correctly use `BaggingRegressor` with a fixed seed, the problem comes from upstream, since you use sets to store TF gene symbols in `oracle.TFdict`. The iteration order of a set of strings depends on the string hashes, which Python randomizes per process, so the order can differ slightly on each run. This makes `BaggingRegressor` sample differently even though it uses the same seed every time. However, the solution is simple: fix the order of the selected TFs by sorting them alphabetically.

With this change, results are always the same.
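To illustrate why a fixed seed alone is not enough, here is a minimal sketch using only the standard library. It is not celloracle code: `pick_regulators` is a hypothetical helper, the gene names are illustrative, and `random.Random.sample` stands in for the seeded sampling done by `BaggingRegressor`. The two lists mimic the order a set might yield in two separate Python processes.

```python
import random


def pick_regulators(candidates, n, seed=123, sort_first=False):
    """Draw n names with a fixed seed (hypothetical helper, not celloracle code)."""
    names = sorted(candidates) if sort_first else list(candidates)
    rng = random.Random(seed)  # same seed on every call
    return rng.sample(names, n)


# The same five TFs in two different orders, standing in for the order a set
# might yield in two separate Python processes (names are illustrative).
run_a = ["Gata1", "Spi1", "Cebpa", "Klf1", "Runx1"]
run_b = ["Runx1", "Klf1", "Gata1", "Cebpa", "Spi1"]

unsorted_a = pick_regulators(run_a, 3)
unsorted_b = pick_regulators(run_b, 3)
sorted_a = pick_regulators(run_a, 3, sort_first=True)
sorted_b = pick_regulators(run_b, 3, sort_first=True)

# The seeded sampler picks the same positions each time, so a different
# input order yields different names unless the input is sorted first.
print(unsorted_a == unsorted_b)  # False
print(sorted_a == sorted_b)      # True
```

Sorting makes the input to the sampler independent of how the set happened to be laid out, which is all the seed needs in order to behave deterministically.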
Note that to observe the different results with the previous version, you need to restart the kernel or run the script again so that the hash seed is reset. Running the same code repeatedly within a single JupyterLab session yields the same results; restarting the notebook kernel does not.
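The restart dependence can be checked without celloracle. The sketch below is an assumption-laden illustration, not part of this PR: it launches fresh interpreters with the `PYTHONHASHSEED` environment variable set explicitly and reads back the iteration order of a small set of illustrative gene names. `set_order_in_fresh_interpreter` is a hypothetical helper name.

```python
import ast
import os
import subprocess
import sys

# Code run in each fresh interpreter: print the iteration order of a set.
SNIPPET = "print(list({'Gata1', 'Spi1', 'Cebpa', 'Klf1', 'Runx1', 'Tal1'}))"


def set_order_in_fresh_interpreter(hash_seed):
    """Return the set's iteration order in a new process with the given hash seed."""
    env = dict(os.environ, PYTHONHASHSEED=str(hash_seed))
    result = subprocess.run(
        [sys.executable, "-c", SNIPPET],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return ast.literal_eval(result.stdout.strip())


# Same hash seed: the order is reproducible across processes.
print(set_order_in_fresh_interpreter(1) == set_order_in_fresh_interpreter(1))

# Different hash seeds: the order may differ, but the sorted list never does.
order_1 = set_order_in_fresh_interpreter(1)
order_2 = set_order_in_fresh_interpreter(2)
print(order_1, order_2)
print(sorted(order_1) == sorted(order_2))
```

Setting `PYTHONHASHSEED` is a workaround for a whole process; sorting the TFs makes the result independent of the hash seed altogether.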
Hope this is helpful!