Supporting secure unpickling in PyRosetta#523
Merged
lyskov merged 54 commits intoRosettaCommons:mainfrom Sep 24, 2025
Merged
Conversation
…g required attributes
Added a warning about the risks of unpickling data from untrusted sources in the secure_unpickle.py file.
Updated warning message regarding secure unpickling to clarify risks and emphasize the importance of trusted sources.
lyskov
reviewed
Sep 23, 2025
Comment on lines
70
to
79
| from pyrosetta.secure_unpickle import ( | ||
| add_secure_package, | ||
| remove_secure_package, | ||
| clear_secure_packages, | ||
| get_disallowed_packages, | ||
| get_secure_packages, | ||
| set_secure_packages, | ||
| UnpickleSecurityError, | ||
| UnpickleIntegrityError, | ||
| ) |
Member
There was a problem hiding this comment.
@klimaj could you please elaborate why do we need these at top level? Ie: in general I would like to keep default import list lean (and these will not be useful unless distributed framework is enabled).
Member
Author
There was a problem hiding this comment.
@lyskov Good point! The original idea was for convenience, but on second thought, I agree that we don't want to clutter the pyrosetta.* namespace. Also please note that these methods are still active outside of the pyrosetta.distributed framework, since these are for configuring the secure packages for the Pose.cache dictionary.
Member
Author
lyskov
approved these changes
Sep 24, 2025
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Currently,
PackedPoseobjects are serialized/deserialized using thepicklemodule (introduced in ~2019), and thePose.cachedictionary (introduced in #430) supports caching arbitrary datatypes in thePoseobject using thepicklemodule. Additionally, #462 enables saving compressedPackedPoseobjects to disk (i.e., as*.b64_poseand*.pkl_posefiles) for sharing PyRosettaPoseobjects with the scientific community. However, use of thepicklemodule is not secure (see warning here as outlined in #519).Herein this PR, a secure
pickle.loadsmethod is developed and slotted into thePackedPoseandPose.cacheinfrastructure to permanently disallow certain risky packages, modules, and namespaces from being unpickled/loaded (e.g.,exec,eval,os.system,subprocess.run, etc., and will be updated over time as needed), thus significantly improving the security of handlingPackedPoseandPoseobjects in memory if received from a second party (i.e., over a socket, queue, interprocess communication, etc.) or when reading a file received from a second party (i.e., usingpyrosetta.distributed.io.pose_from_filewith a*.b64_poseand*.pkl_posefile). By default, onlypyrosettaandnumpypackages, and certainbuiltinsmodules (likedict,complex,tuple, etc.), are considered secure and permitted to be unpickled/loaded. Other packages that the user may want to serialize/deserialize may be assigned as secure per-process by the user in-code (see methods below). It is worth noting that PyTorch developers have implemented a similar strategy with the torch.serialization.add_safe_globals() method.Another aim of this PR is to implement an optional Hash-based Message Authentication Code (HMAC) key in the
Pose.cachedictionary for data integrity verification. While not a security feature, this new API allows the user to set a HMAC key to be prepended to every score value in thePose.cachedictionary that effectively says "this was saved by PyRosetta", so that it intentionally raises an error when the HMAC key is missing or differs upon retrieval, indicating that the data appears to have been tampered with or modified. By default, the HMAC key is disabled (being set toNone) in order to reduce memory overhead of thePose.cachedictionary; e.g., if 32 bytes are prepended to each score value, with 1,000 score values that's 32,000 bytes or 32 KB of overhead, and with a million score values that's 32 MB of overhead.The following are newly added functions:
pyrosetta.secure_unpickle.add_secure_package: Add a package to the unpickle allowed listpyrosetta.secure_unpickle.remove_secure_package: Remove a package from the unpickle allowed listpyrosetta.secure_unpickle.clear_secure_packages: Remove all packages from the unpickle allowed listpyrosetta.secure_unpickle.get_disallowed_packages: Return all permanently disallowed packages/modules/prefixespyrosetta.secure_unpickle.get_secure_packages: Return all packages in the unpickle allowed listpyrosetta.secure_unpickle.set_secure_packages: Set all packages in the unpickle allowed listpyrosetta.secure_unpickle.set_unpickle_hmac_key: Set the HMAC key for thePose.cachedictionarypyrosetta.secure_unpickle.get_unpickle_hmac_key: Return the HMAC key for thePose.cachedictionary