Fixed inconsistent results of `oracle.get_links` across runs by PauBadiaM · Pull Request #196 · morris-lab/CellOracle

PauBadiaM · 2024-05-07T11:58:44Z

Hi @KenjiKamimoto-wustl122

I have observed that the method oracle.get_links returns different results across runs (celloracle == 0.18.0). The differences are not huge (mean Jaccard index of 0.9 between runs), but results should be fully reproducible when the seed is fixed.

Even though you correctly use BaggingRegressor with a fixed seed, the problem comes from upstream: oracle.TFdict stores TF gene symbols in sets. The iteration order of a set of strings depends on the string hashes, which Python randomizes for each interpreter process unless PYTHONHASHSEED is set, so the order can differ slightly on each run. This makes BaggingRegressor sample differently even though it uses the same seed every time. The solution is simple: fix the order of the selected TFs by sorting them alphabetically:

# Sort so the TF order is the same on every run
reg_all = sorted(reg_all)

With this simple change results are always the same.
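To illustrate why the input order matters even with a fixed seed, here is a minimal sketch using only the standard library. It does not call CellOracle or BaggingRegressor; `sample_features` is a hypothetical stand-in for any seeded sampler that picks items by position, and the TF names are made up for the example.

```python
import random


def sample_features(names, k, seed=0):
    """Seeded draw of k names; a stand-in for a seeded estimator's sampling step."""
    rng = random.Random(seed)
    return rng.sample(list(names), k)


# Same TFs in two different orders, as could happen when a set is iterated
# in two separate Python processes (illustrative names, not CellOracle data).
tfs_run_a = ["Gata1", "Spi1", "Cebpa", "Klf1", "Runx1"]
tfs_run_b = ["Runx1", "Cebpa", "Gata1", "Spi1", "Klf1"]

# The seed fixes which positions are drawn, not which names sit at those positions.
unsorted_a = sample_features(tfs_run_a, 3)
unsorted_b = sample_features(tfs_run_b, 3)

# Sorting first makes the position-to-name mapping identical in both runs.
sorted_a = sample_features(sorted(tfs_run_a), 3)
sorted_b = sample_features(sorted(tfs_run_b), 3)

print("unsorted:", unsorted_a, unsorted_b)
print("sorted:  ", sorted_a, sorted_b)
```

The unsorted draws differ because the same seeded positions map to different names, while the sorted draws agree.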

Note that to get different results with the previous version you need to restart the kernel or run the script again, so that the hash seed is reset. Re-running the same code within one JupyterLab session yields the same results, but restarting the notebook kernel does not.
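The per-process behaviour can be checked with a small sketch that launches fresh interpreters under different `PYTHONHASHSEED` values. The helper `set_order` and the TF names are assumptions made for this illustration; nothing here touches CellOracle itself.

```python
import os
import subprocess
import sys

# Code executed in each child interpreter: print a set of strings in iteration order.
SNIPPET = "print(','.join({'Gata1', 'Spi1', 'Cebpa', 'Klf1', 'Runx1', 'Tal1'}))"


def set_order(hash_seed):
    """Return the set's iteration order in a fresh process with the given hash seed."""
    env = dict(os.environ, PYTHONHASHSEED=str(hash_seed))
    result = subprocess.run(
        [sys.executable, "-c", SNIPPET],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip().split(",")


# Each seed mimics a separate kernel start; the raw order may differ between
# seeds, but the sorted order is the same for all of them.
orders = {seed: set_order(seed) for seed in (1, 2, 3)}
for seed, order in orders.items():
    print(seed, order, sorted(order))
```

Within one process the hash seed never changes, which is consistent with results only diverging after a kernel restart.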
Hope this is helpful!

Fixed inconsistent results across runs

3537f61

PauBadiaM wants to merge 1 commit into morris-lab:master from PauBadiaM:master