ERROR: invalid file header with NCBI genomic data; file works in Python #89

@pcjentsch

The NCBI archive described below opens fine with Python's `zipfile` module.
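
A minimal sketch of a stronger check than just opening the archive: stream one member through Python's standard `zipfile` module in fixed-size chunks, so that even a member of well over 100 GB can be read end to end without loading it into memory. The helper name `stream_member` is hypothetical, and the archive path in the usage line assumes the CLI's default output name `ncbi_dataset.zip`.

```python
import zipfile


def stream_member(source, member, chunk_size=1 << 20):
    """Decompress one member in chunks and return its uncompressed size.

    `source` may be a path or a seekable binary file object. Reading in
    fixed-size chunks keeps memory use bounded regardless of member size.
    """
    total = 0
    with zipfile.ZipFile(source) as zf:
        expected = zf.getinfo(member).file_size
        with zf.open(member) as fh:
            # ZipExtFile checks the CRC-32 once the last chunk has been read
            # and raises zipfile.BadZipFile on a mismatch.
            while chunk := fh.read(chunk_size):
                total += len(chunk)
    if total != expected:
        raise ValueError(f"{member}: read {total} bytes, expected {expected}")
    return total
```

Usage (hypothetical path): `stream_member("ncbi_dataset.zip", "ncbi_dataset/data/genomic.fna")`. If this completes, Python has decompressed the whole member and verified its CRC-32, not just parsed the central directory.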

The archive is 39 GB, so I cannot easily include a working example, but if you are inclined to try it, it can be reproduced with:

```shell
conda create -n ncbi_datasets
conda activate ncbi_datasets
conda install -c conda-forge ncbi-datasets-cli
datasets download virus genome taxon sars-cov-2 --host human
```
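
An archive this large (39 GB) has its central directory beyond the 4 GiB mark, so under the PKWARE ZIP specification (APPNOTE.TXT) it must use the ZIP64 extensions, including a 20-byte ZIP64 end-of-central-directory locator placed directly before the classic end-of-central-directory (EOCD) record. The following is a hedged sketch for checking whether a given download actually carries that locator; the helper `zip64_tail` is hypothetical, while the signatures and record sizes come from the specification.

```python
import os

EOCD_SIG = b"PK\x05\x06"           # end of central directory record
ZIP64_LOCATOR_SIG = b"PK\x06\x07"  # ZIP64 end of central directory locator
EOCD_SIZE = 22                     # EOCD record without its trailing comment
LOCATOR_SIZE = 20                  # fixed size of the ZIP64 locator
MAX_COMMENT = 0xFFFF               # archive comments are at most 65535 bytes


def zip64_tail(fp):
    """Return (has_eocd, has_zip64_locator) for a seekable binary file."""
    fp.seek(0, os.SEEK_END)
    size = fp.tell()
    # The EOCD lies within the last EOCD_SIZE + MAX_COMMENT bytes; keep extra
    # room for the locator that directly precedes it.
    tail_len = min(size, LOCATOR_SIZE + EOCD_SIZE + MAX_COMMENT)
    fp.seek(size - tail_len)
    tail = fp.read(tail_len)
    eocd = tail.rfind(EOCD_SIG)
    if eocd < 0:
        return False, False
    if eocd < LOCATOR_SIZE:
        return True, False
    locator = tail[eocd - LOCATOR_SIZE:eocd]
    return True, locator.startswith(ZIP64_LOCATOR_SIG)
```

Usage (hypothetical path): `with open("ncbi_dataset.zip", "rb") as fp: print(zip64_tail(fp))`. A well-formed ZIP64 archive would be expected to report `True` for both values; a reader that ignores the locator would see only 32-bit offsets.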

In case it is helpful, here is the output of `zipf.infolist()` in Python:

```python
>>> zipf.infolist()
[<ZipInfo filename='README.md' compress_type=deflate filemode='?rw-------' file_size=1604 compress_size=769>,
 <ZipInfo filename='ncbi_dataset/data/data_report.jsonl' compress_type=deflate filemode='?rw-------' file_size=81889507642 compress_size=4292597995>,
 <ZipInfo filename='ncbi_dataset/data/biosample.jsonl' compress_type=deflate filemode='?rw-------' file_size=7826671566 compress_size=205379661>,
 <ZipInfo filename='ncbi_dataset/data/cds.fna' compress_type=deflate filemode='?rw-------' file_size=177621771946 compress_size=11180195822>,
 <ZipInfo filename='ncbi_dataset/data/genomic.fna' compress_type=deflate filemode='?rw-------' file_size=167811715365 compress_size=13523743233>,
 <ZipInfo filename='ncbi_dataset/data/protein.faa' compress_type=deflate filemode='?rw-------' file_size=82837067420 compress_size=3110887927>,
 <ZipInfo filename='ncbi_dataset/data/virus_dataset.md' compress_type=deflate filemode='?rw-------' file_size=2431 compress_size=1057>,
 <ZipInfo filename='ncbi_dataset/data/dataset_catalog.json' compress_type=deflate filemode='?rw-------' file_size=845 compress_size=321>]
```
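
The listing shows uncompressed sizes such as `file_size=81889507642`, far above 0xFFFFFFFF (4,294,967,295 bytes), the largest value the classic 32-bit ZIP size fields can hold, so those sizes can only be stored in ZIP64 extra fields. One plausible but unconfirmed explanation for the "invalid file header" error is that the failing reader does not handle this, or does not handle local headers that defer their sizes to a trailing data descriptor. The sketch below uses only Python's standard library to flag entries that overflow the 32-bit fields and to decode each raw 30-byte local file header; `needs_zip64`, `describe_local_header`, and `report` are hypothetical helper names.

```python
import struct
import zipfile

ZIP32_MAX = 0xFFFFFFFF                     # largest classic 32-bit field value
LOCAL_HEADER = struct.Struct("<4s5H3L2H")  # 30-byte local file header
LOCAL_SIG = b"PK\x03\x04"


def needs_zip64(info):
    """True if the entry's sizes or header offset overflow 32-bit fields."""
    return (info.file_size > ZIP32_MAX
            or info.compress_size > ZIP32_MAX
            or info.header_offset > ZIP32_MAX)


def describe_local_header(fp, info):
    """Decode one entry's raw local header and summarise fields a reader checks."""
    fp.seek(info.header_offset)
    (sig, _version, flags, _method, _time, _date,
     _crc, csize, usize, _name_len, _extra_len) = LOCAL_HEADER.unpack(
        fp.read(LOCAL_HEADER.size))
    return {
        "signature_ok": sig == LOCAL_SIG,
        # Flag bit 3: sizes and CRC follow the data in a data descriptor.
        "data_descriptor": bool(flags & 0x08),
        # 0xFFFFFFFF means "see the ZIP64 extra field for the real size".
        "zip64_sentinel": ZIP32_MAX in (csize, usize),
    }


def report(fp):
    """Return (filename, needs_zip64, local_header_summary) for each entry.

    `fp` must be a seekable binary file object, e.g. open(path, "rb").
    """
    with zipfile.ZipFile(fp) as zf:
        infos = zf.infolist()
    # ZipFile does not close a file object it was handed, so fp is still usable.
    return [(i.filename, needs_zip64(i), describe_local_header(fp, i)) for i in infos]
```

Usage (hypothetical path): `with open("ncbi_dataset.zip", "rb") as fp: for row in report(fp): print(row)`. Comparing these rows with what the failing reader expects may narrow down which header it rejects.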
