Conversation
janfb
left a comment
Awesome, looks very good already.
I was wondering whether we would also need to refactor process_simulator accordingly? E.g., we can likely drop the internal torch wrapping and accept numpy simulators (because we wrap to numpy again anyway). I would have to look into it in detail, but I think we can simplify the function a bit.
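For context, a minimal sketch of what the simplified wrapping could look like (the helper name `run_numpy_simulator` is hypothetical, not the actual sbi API): convert the torch thetas to numpy once at the boundary, call the user's numpy simulator, and convert the result back.

```python
import numpy as np
import torch


def run_numpy_simulator(simulator, theta: torch.Tensor) -> torch.Tensor:
    # Hypothetical helper: accept a numpy-based simulator directly,
    # converting to/from torch only at the boundary.
    theta_np = theta.detach().cpu().numpy()
    x_np = np.asarray(simulator(theta_np))
    return torch.from_numpy(x_np).float()


# Example: a numpy simulator that adds Gaussian noise to theta.
def numpy_sim(theta):
    return theta + np.random.randn(*theta.shape)


x = run_numpy_simulator(numpy_sim, torch.zeros(10, 2))
print(x.shape)  # torch.Size([10, 2])
```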
tomMoral
left a comment
A few comments.
A question: is it possible to use a return type which is str? It is unclear to me where this would fit in the current sbi.
```diff
     seed: Optional[int] = None,
     show_progress_bar: bool = True,
-) -> Tuple[Tensor, Tensor]:
+) -> Tuple[Tensor, Tensor | List[str] | str]:
```
Shouldn't it be:

```diff
-) -> Tuple[Tensor, Tensor | List[str] | str]:
+) -> Tuple[Theta, X]:
```
```python
        UserWarning,
        stacklevel=2,
    )
    batches = [theta for theta in thetas]
```
Why? In this context, we could use the `batch_size` parameter of joblib (useful if initialization of the simulator is slow).
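For reference, joblib's `Parallel` accepts a `batch_size` argument that groups tasks into batches automatically, so each worker dispatch amortizes per-task overhead without pre-chunking the thetas by hand; a minimal sketch:

```python
from joblib import Parallel, delayed


def simulator(theta):
    # Stand-in for a simulator whose per-dispatch overhead matters.
    return theta * 2


thetas = list(range(8))

# batch_size groups several thetas into one dispatch per worker,
# instead of one dispatch per theta.
results = Parallel(n_jobs=2, batch_size=4)(
    delayed(simulator)(t) for t in thetas
)
print(results)  # [0, 2, 4, 6, 8, 10, 12, 14]
```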
```python
    )
    batches = [theta for theta in thetas]
else:
    batches = [theta for theta in thetas]
```
Unclear why not keep batches = thetas? The current solution duplicates theta in memory, which can be a large overhead for a large number of simulations (probably for a later version; right now everything needs to fit in memory).
```python
num_batches = (
    num_simulations + simulation_batch_size - 1
) // simulation_batch_size
batches = [
```
I would use an iterator if possible, once again to avoid filling the memory and duplicating; in particular, if this is a tensor, use `torch.split`?
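For illustration, `torch.split` returns views into the original tensor rather than copies, so batching this way avoids duplicating the thetas in memory (sketch, assuming `thetas` is a `Tensor`):

```python
import torch

thetas = torch.arange(10.0).reshape(10, 1)
simulation_batch_size = 4

# torch.split yields views (no copy); the last chunk may be smaller.
batches = torch.split(thetas, simulation_batch_size)
print([b.shape[0] for b in batches])  # [4, 4, 2]

# The views share storage with the original tensor.
assert batches[0].data_ptr() == thetas.data_ptr()
```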
```python
# Run in parallel
# Generate seeds
batch_seeds = np.random.randint(low=0, high=1_000_000, size=(len(batches),))
```
We need to be able to seed this part, no?
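One way to make this reproducible is to derive the per-batch seeds from the user-supplied `seed` via a dedicated numpy `Generator`, so the global numpy state is untouched (sketch; the helper name is illustrative):

```python
import numpy as np


def make_batch_seeds(seed, num_batches):
    # A dedicated Generator keeps this independent of the global
    # numpy state; seed=None still gives fresh entropy.
    rng = np.random.default_rng(seed)
    return rng.integers(low=0, high=1_000_000, size=num_batches)


a = make_batch_seeds(42, 5)
b = make_batch_seeds(42, 5)
assert (a == b).all()  # same seed -> identical batch seeds
```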
```python
context = parallel_config(n_jobs=num_workers)

with context:
```
```diff
-context = parallel_config(n_jobs=num_workers)
-with context:
+with parallel_config(n_jobs=num_workers):
```
```python
# 1. No arguments
@parallelize_simulator
def decorated_sim(theta):
```
Why not use simple_simulator_torch, which is the same?
This PR refactors the simulation utilities to improve parallel execution efficiency and modularity.

- New `parallelize_simulator` decorator/function: introduces a tool that wraps any simulator to enable parallel execution using `joblib`.
- New `simulate_from_thetas` utility: a lightweight wrapper around `parallelize_simulator` that directly executes simulations for a given set of parameters (thetas).
- Refactored `simulate_for_sbi` (backward compatible): updated the internal implementation of `simulate_for_sbi` to leverage the new `simulate_from_thetas` backend.
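A minimal sketch of what such a decorator could look like (illustrative only; the actual implementation in this PR may differ): split the thetas into batches, run the wrapped simulator on each batch via joblib, and concatenate the results.

```python
import torch
from joblib import Parallel, delayed


def parallelize_simulator(simulator=None, *, num_workers=1, batch_size=1):
    # Hypothetical sketch of the decorator described above.
    def decorator(sim):
        def wrapped(thetas: torch.Tensor) -> torch.Tensor:
            # torch.split yields views, so no copy of thetas is made.
            batches = torch.split(thetas, batch_size)
            results = Parallel(n_jobs=num_workers)(
                delayed(sim)(batch) for batch in batches
            )
            return torch.cat(results)

        return wrapped

    # Support both @parallelize_simulator and @parallelize_simulator(...).
    return decorator if simulator is None else decorator(simulator)


@parallelize_simulator
def sim(theta):
    return theta * 2.0


out = sim(torch.ones(6, 2))
print(out.shape)  # torch.Size([6, 2])
```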