Skip to content

Minor change to PackedPose.scores dictionary construction#612

Open
klimaj wants to merge 24 commits intoRosettaCommons:mainfrom
klimaj:secure_packages
Open

Minor change to PackedPose.scores dictionary construction#612
klimaj wants to merge 24 commits intoRosettaCommons:mainfrom
klimaj:secure_packages

Conversation

@klimaj
Copy link
Member

@klimaj klimaj commented Jan 26, 2026

Currently in PyRosetta, instantiating a PackedPose object triggers deserialization of the Pose.cache dictionary (due to the line self.scores = dict(pose_or_pack.cache)). This is usually fine for single Python processes where users have already added secure packages to the unpickle-allowed list (i.e., via the pyrosetta.secure_unpickle module); however, PyRosettaCluster spawns new threads using the billiard package on Dask workers, where binary blobs are deserialized into instantiated PackedPose objects, triggering deserialization of the Pose.cache dictionary before users have had a chance to add secure packages to the unpickle-allowed list in the billiard subprocess. This can raise UnpickleSecurityError exceptions if users cache arbitrary Python types in the Pose.cache dictionary, even when users add secure packages in their PyRosetta protocols, since PackedPose instantiation happens before the user-provided PyRosetta protocols run.

In order to remedy this, this PR updates the PackedPose class to instantiate an empty dictionary on the PackedPose.scores attribute, rather than a detached, flattened copy of the Pose.cache dictionary. Not only does this free up duplicated memory usage from all of the Pose.cache scores copied onto the PackedPose object, but I think it improves the PackedPose API intention a bit: the PackedPose class already implements the PackedPose.update_scores method, which is specifically designed to embed new scores into the Pose object; and because the PackedPose.scores attribute is currently a detached, flattened copy of the Pose.cache dictionary, this can cause confusion about what are the ground truth scores in the PackedPose object.

Regarding PyRosettaCluster, by preventing deserialization of the Pose.cache dictionary upon PackedPose instantiation, users can now programmatically add secure packages in their PyRosetta protocols per Python process (as the pyrosetta.secure_unpickle module was designed), and the underlying PyRosettaCluster infrastructure does not unintentionally deserialize the Pose.cache dictionary just to compress PackedPose objects in transit between the client and Dask workers.

Nonetheless, this minor change has the potential to break scripts in the wild. Anyone's scripts that call PackedPose.scores to access score values (that were originally embedded in the Pose object) now needs to call PackedPose.pose.cache (or the deprecated PackedPose.pose.scores). However, the PackedPose.scores attribute still functions the same way as before as a detached dictionary from the Pose.cache infrastructure, but herein this PR it is just not pre-populated with the Pose.cache dictionary upon PackedPose instantiation; users adding scores to the PackedPose.scores dictionary can still access them there (even though the preferred syntax continues to use the PackedPose.update_scores method).

@klimaj klimaj added bug Something isn't working enhancement New feature or request 03 PyRosetta industry labels Jan 26, 2026
@klimaj klimaj requested a review from lyskov January 29, 2026 19:11
@klimaj klimaj added the ready_for_review This PR is ready to be reviewed and merged. label Jan 29, 2026
…ay be used by default, depending on pandas version
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

03 PyRosetta bug Something isn't working enhancement New feature or request industry ready_for_review This PR is ready to be reviewed and merged.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant