no filtering with polished genome assembly #7

@sjfleck

I'm trying to use your program on a ~700 MB assembly that has undergone 3 rounds of racon and 3 rounds of pilon for polishing. I ran pseudohaploid on that assembly and got this result:

SLURM_JOBID=3284934
SLURM_JOB_NODELIST=cpn-k06-05
SLURM_NNODES=1
SLURMTMPDIR=/scratch/3284934
working directory = /gpfs/scratch/sjfleck/0_PINGUICULA/PseudoHaploid
Generating pseudohaploid genome sequence

GENOME: pilon3_ping_giga_all.fasta
OUTPREFIX: pilon3_ping_giga_all
MIN_IDENTITY: 90
MIN_LENGTH: 1000
MIN_CONTAIN: 93
MAX_CHAIN_GAP: 20000

  1. Aligning pilon3_ping_giga_all.fasta to itself with nucmer
    Original assembly has 885 contigs

  2. Filter for alignments longer than 1000 bp and below 90 identity

  3. Generating coords file

  4. Identifying alignment chains: min_id: 90 min_contain: 93 max_gap: 20000
    Processing coords file (pilon3_ping_giga_all.filter.coords)...
    Processed 0 alignment records [0 valid]
    Finding chains for 0 contigs...
    Found 0 total edges [0 constructtime, 0 searchtime, 0 stackadd]
    Looking for contained contigs...
    Found 0 joint contained contigs
    Printed 0 total contained contigs

  5. Generating a list of redundant contig ids using min_contain: 93
    Identified 0 redundant contig to remove in pilon3_ping_giga_all.contained.ids

  6. Creating final pseudohaploid assembly in pilon3_ping_giga_all.pseudohap.fa
    Pseudohaploid assembly has 885 contigs
    All Done!

It only ran for 1 minute, so I don't think it ran properly. The nucmer.err file said:
20200609|153050| 12618| ERROR: mummer and/or mgaps returned non-zero

and my .err file printed:
ERROR: Could not parse delta file, pilon3_ping_giga_all.delta
error no: 400
ERROR: Could not parse delta file, pilon3_ping_giga_all.filter.delta
error no: 402
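
Because the pipeline carried on after the alignment step failed, one way to catch this earlier is to check the delta file before filtering. This is a minimal sketch, assuming the usual MUMmer delta layout (line 1 holds the two input paths, line 2 is NUCMER or PROMER). The `check_delta` helper is hypothetical and not part of create_pseudohaploid.sh:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of create_pseudohaploid.sh): sanity-check a
# MUMmer delta file before running delta-filter / show-coords on it.
check_delta() {
  local delta="$1"
  # The file must exist and be non-empty.
  if [ ! -s "$delta" ]; then
    echo "missing or empty: $delta"
    return 1
  fi
  # Line 2 of a delta file names the aligner that produced it (assumed layout).
  local kind
  kind=$(sed -n '2p' "$delta")
  if [ "$kind" != "NUCMER" ] && [ "$kind" != "PROMER" ]; then
    echo "unexpected header in $delta"
    return 1
  fi
  # Count alignment headers: lines starting with '>'.
  local n
  n=$(grep -c '^>' "$delta" || true)
  echo "ok: $delta ($n sequence pairs)"
}
```

Calling `check_delta pilon3_ping_giga_all.delta` right after the nucmer step would report an empty or truncated file before delta-filter gets to it.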

Any insight into what I'm doing wrong would be very helpful. I also want to mention that I did not alter the create_pseudohaploid.sh file. I was confused by this line:

# You may want to replace this with sge_mummer for large genomes
# See: https://github.com/fritzsedlazeck/sge_mummer

if [ ! -r $PREFIX.delta ]
then
  echo "1. Aligning $GENOME to itself with nucmer"
  (nucmer --maxmatch -c 100 -l 500 $GENOME $GENOME -p $PREFIX) >& nucmer.log
  numorig=`grep -c '^>' $GENOME`
  echo "Original assembly has $numorig contigs"
  echo
fi

What exactly should I be replacing with sge_mummer? Also, my university uses the SLURM workload manager, not the Sun Grid Engine (SGE), so does that mean multithreading is not available to me? Thanks for any and all insight you can provide.
-Steve
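
On the threading question, the nucmer call in the script is an ordinary command, so under SLURM it runs on whatever node the job was given. This is a hedged sketch, assuming MUMmer 4 is installed (its nucmer is assumed to accept `-t`/`--threads`) and that the job was submitted with `--cpus-per-task`, so that `SLURM_CPUS_PER_TASK` is set. `nucmer_cmd` is a hypothetical helper that only prints the command it would run:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a nucmer self-alignment command that uses the
# cores SLURM allocated to the job (falls back to 1 outside SLURM).
nucmer_cmd() {
  local genome="$1" prefix="$2"
  local threads="${SLURM_CPUS_PER_TASK:-1}"
  # -t is assumed to be available (MUMmer 4); the other flags match the script.
  echo "nucmer --maxmatch -c 100 -l 500 -t $threads $genome $genome -p $prefix"
}
```

For example, `nucmer_cmd pilon3_ping_giga_all.fasta pilon3_ping_giga_all` prints the command so it can be checked before it replaces the nucmer line in the script.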
