
Minimum overlap between reference and query #5

@JosieLikesCats


I am currently testing Recomb-Mix on a smaller simulated dataset (190 AFR-EUR individuals), with the intention of applying it to larger, biobank-scale participant data. However, I run into problems with the output when the query and reference panel SNPs do not overlap 100%.

My query panel comprises 190 AFR-EUR admixed individuals simulated using Haptools from a subset of the AFR and EUR samples in the HGDP-1KG joint-call panel. My reference panel comprises the remaining 999 AFR and EUR samples from the joint-call panel. Both panels have been QC'd to remove indels, multi-allelic SNPs and strand-ambiguous SNPs, and then filtered on MAF (≥ 0.005) and HWE (removing SNPs with p < 1e-10). I am running Recomb-Mix with the default parameters and am currently testing on chromosome 1.
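The QC steps above can be sketched as a per-variant filter. This is a minimal illustration, not the actual pipeline used here (the issue does not show it): the `Variant` record and `passes_qc` function are hypothetical, MAF and HWE p-values are assumed to be precomputed, and the thresholds mirror the values stated above.

```python
from dataclasses import dataclass

# Complementary bases: an A/T or C/G SNP reads the same on both strands,
# so its strand cannot be resolved from the alleles alone.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


@dataclass
class Variant:
    # Hypothetical record; real data would come from a VCF/PLINK file.
    chrom: str
    pos: int
    ref: str
    alts: tuple[str, ...]  # more than one ALT allele means multi-allelic
    maf: float             # minor allele frequency, assumed precomputed
    hwe_p: float           # Hardy-Weinberg p-value, assumed precomputed


def passes_qc(v: Variant, min_maf: float = 0.005, hwe_threshold: float = 1e-10) -> bool:
    """Return True if the variant survives the QC filters described above."""
    if len(v.alts) != 1:
        return False  # multi-allelic site
    alt = v.alts[0]
    if len(v.ref) != 1 or len(alt) != 1:
        return False  # indel (or multi-base substitution)
    if v.ref not in COMPLEMENT or alt not in COMPLEMENT:
        return False  # non-ACGT allele
    if COMPLEMENT[v.ref] == alt:
        return False  # strand-ambiguous A/T or C/G SNP
    if v.maf < min_maf:
        return False  # too rare
    if v.hwe_p < hwe_threshold:
        return False  # fails HWE
    return True
```

Usage would be along the lines of `kept = [v for v in variants if passes_qc(v)]`. Applying the same filter independently to each panel is one way the two SNP sets can end up only partially overlapping, since MAF and HWE are evaluated on different samples.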

When I use only the intersecting SNPs (907,046 SNPs, i.e. 100% overlap between reference and query), the tool outputs reasonable-looking local ancestry inference (LAI) results. However, when the overlap is lower (between 90% and 98%, with ~1M query SNPs and ~1.07M reference SNPs), the output contains many very short segments, sometimes only a single SNP long, assigned either AFR or -1 (I assume -1 means "could not be assigned" or missing). Even at 98% overlap, every sample has many short and missing segments.
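One way to measure the overlap described above, and to build the 100%-overlap input that gave reasonable results, is to key each SNP on chromosome, position and alleles and intersect the two sets. This is a minimal sketch, assuming the SNP lists have already been extracted from the query and reference files (e.g. from the VCF CHROM/POS/REF/ALT columns); `overlap_report` and `restrict_to_shared` are hypothetical helpers, not part of Recomb-Mix.

```python
from typing import Iterable

# A SNP is identified by chromosome, position and both alleles, so sites that
# share a position but carry different alleles are not counted as shared.
SnpKey = tuple[str, int, str, str]


def overlap_report(query: Iterable[SnpKey], reference: Iterable[SnpKey]) -> dict:
    """Summarise how many SNPs the query and reference panels have in common."""
    q = set(query)
    r = set(reference)
    shared = q & r
    return {
        "query_snps": len(q),
        "reference_snps": len(r),
        "shared_snps": len(shared),
        "query_fraction_shared": len(shared) / len(q) if q else 0.0,
        "reference_fraction_shared": len(shared) / len(r) if r else 0.0,
    }


def restrict_to_shared(snps: Iterable[SnpKey], other: Iterable[SnpKey]) -> list[SnpKey]:
    """Keep only SNPs that are also present in the other panel, in input order."""
    other_set = set(other)
    return [s for s in snps if s in other_set]
```

Matching on alleles as well as position is stricter than matching on position alone. Note that this sketch does not handle REF/ALT swaps between panels; such sites are simply counted as not shared.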

Is this expected behaviour, i.e. do the query and reference panels need to match 100%?
