Minor change to PackedPose.scores dictionary construction#612
Open
klimaj wants to merge 24 commits intoRosettaCommons:mainfrom
Open
Minor change to PackedPose.scores dictionary construction#612klimaj wants to merge 24 commits intoRosettaCommons:mainfrom
klimaj wants to merge 24 commits intoRosettaCommons:mainfrom
Conversation
…Pose.scores dictionary
…rdless of if compression is enabled
…ete files: outputs'
…rityError exceptions
…ay be used by default, depending on pandas version
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Currently in PyRosetta, instantiating a
PackedPoseobject triggers deserialization of thePose.cachedictionary (due to the lineself.scores = dict(pose_or_pack.cache)). This is usually fine for single Python processes where users have already added secure packages to the unpickle-allowed list (i.e., via thepyrosetta.secure_unpicklemodule); however,PyRosettaClusterspawns new threads using thebilliardpackage on Dask workers, where binary blobs are deserialized into instantiatedPackedPoseobjects, triggering deserialization of thePose.cachedictionary before users have had a chance to add secure packages to the unpickle-allowed list in thebilliardsubprocess. This can raiseUnpickleSecurityErrorexceptions if users cache arbitrary Python types in thePose.cachedictionary, even when users add secure packages in their PyRosetta protocols, sincePackedPoseinstantiation happens before the user-provided PyRosetta protocols run.In order to remedy this, this PR updates the
PackedPoseclass to instantiate an empty dictionary on thePackedPose.scoresattribute, rather than a detached, flattened copy of thePose.cachedictionary. Not only does this free up duplicated memory usage from all of thePose.cachescores copied onto thePackedPoseobject, but I think it improves thePackedPoseAPI intention a bit: thePackedPoseclass already implements thePackedPose.update_scoresmethod, which is specifically designed to embed new scores into thePoseobject; and because thePackedPose.scoresattribute is currently a detached, flattened copy of thePose.cachedictionary, this can cause confusion about what are the ground truth scores in thePackedPoseobject.Regarding
PyRosettaCluster, by preventing deserialization of thePose.cachedictionary uponPackedPoseinstantiation, users can now programmatically add secure packages in their PyRosetta protocols per Python process (as thepyrosetta.secure_unpicklemodule was designed), and the underlyingPyRosettaClusterinfrastructure does not unintentionally deserialize thePose.cachedictionary just to compressPackedPoseobjects in transit between the client and Dask workers.Nonetheless, this minor change has the potential to break scripts in the wild. Anyone's scripts that call
PackedPose.scoresto access score values (that were originally embedded in thePoseobject) now needs to callPackedPose.pose.cache(or the deprecatedPackedPose.pose.scores). However, thePackedPose.scoresattribute still functions the same way as before as a detached dictionary from thePose.cacheinfrastructure, but herein this PR it is just not pre-populated with thePose.cachedictionary uponPackedPoseinstantiation; users adding scores to thePackedPose.scoresdictionary can still access them there (even though the preferred syntax continues to use thePackedPose.update_scoresmethod).