feat: Implementing parallelization for unzipping files #1114

Open
Roaimkhan wants to merge 1 commit into google-deepmind:main from Roaimkhan:features/bugs

Conversation

@Roaimkhan

Description

This PR implements GNU Parallel-based unzipping for 200,000+ *.cif.gz files in the AlphaFold pipeline.

The Problem:

The current unzipping process is strictly serial: files are decompressed one after another, however many CPU cores are available. For a dataset of this size that is extremely slow, and it delays all downstream processing.

The Fix:

Adds a check for GNU Parallel availability.

Automatically detects the number of CPU cores, leaving one core free for I/O-bound tasks and using the remaining cores for parallel unzipping.

Falls back to the existing serial method if GNU Parallel is not installed.

Updates README.md to describe the new parallelization option and how to use it.
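For reviewers, the detection and fallback logic described above could be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the function name `unzip_all`, its directory argument, and the `find ... -exec gunzip {} +` serial fallback are assumptions made for the example.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the approach described above; not the PR's actual code.
set -euo pipefail

# unzip_all DIR: decompress every *.gz file under DIR (name and argument are assumptions).
unzip_all() {
  local raw_dir="$1"
  local cores jobs

  # Use GNU Parallel only if the `parallel` on PATH really is GNU Parallel
  # (other tools, e.g. the one from moreutils, share the same command name).
  if command -v parallel >/dev/null 2>&1 &&
     parallel --version 2>/dev/null | grep -q 'GNU parallel'; then
    # Detect the core count, then leave one core free (never going below one job).
    cores="$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN)"
    jobs=$(( cores > 1 ? cores - 1 : 1 ))
    # NUL-delimited file names keep unusual paths safe.
    find "${raw_dir}" -type f -iname '*.gz' -print0 |
      parallel -0 -j "${jobs}" gunzip {}
  else
    # Serial fallback (assumed form): gunzip handles the files one after another.
    find "${raw_dir}" -type f -iname '*.gz' -exec gunzip {} +
  fi
}
```

It would be called as `unzip_all /path/to/mmcif/raw`, where the path is a placeholder. Checking the `--version` output rather than only the command name avoids handing GNU-specific flags to a different `parallel` binary.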

Fixes: #1075

Development

Successfully merging this pull request may close these issues.

parallelization opportunity