Description
I am currently testing Recomb-Mix on a smaller simulated dataset (190 AFR-EUR individuals) with the intention of applying it to larger, biobank-scale participant data. However, I am running into some issues with the output when there is not a 100% match between the query and the reference panel SNPs.
My query panel comprises 190 AFR-EUR admixed individuals simulated with Haptools from a subset of the AFR and EUR samples in the HGDP-1KG joint-call panel. My reference panel comprises the remaining 999 AFR and EUR samples from the joint-call panel. Both panels were QCed to remove indels, multi-allelic SNPs, and strand-ambiguous SNPs, and were filtered at MAF 0.005 and HWE p < 1e-10. I am running Recomb-Mix with the default parameters and am currently testing on chromosome 1.
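For reference, the site-level filters above (biallelic SNVs only, no A/T or C/G sites, MAF threshold) can be sketched as a per-site predicate. This is a minimal illustration, not the pipeline actually used here (which was presumably run with standard tools such as PLINK or bcftools). The function name `passes_site_qc` and its arguments are hypothetical, the alternate-allele frequency is assumed to be precomputed, and the HWE filter is omitted because it needs genotype counts.

```python
# Hypothetical sketch of the site-level QC described above; HWE filtering is omitted.
BASES = set("ACGT")
# A/T and C/G SNPs cannot be resolved by strand flipping, so they are dropped.
STRAND_AMBIGUOUS = {frozenset("AT"), frozenset("CG")}


def passes_site_qc(ref: str, alts: list, alt_freq: float, min_maf: float = 0.005) -> bool:
    """Return True if a site is a biallelic, non-ambiguous SNV with MAF >= min_maf."""
    if len(alts) != 1:  # multi-allelic site
        return False
    alt = alts[0]
    # Indels or non-ACGT alleles (e.g. "AT" -> "A") are not SNVs.
    if len(ref) != 1 or len(alt) != 1 or ref not in BASES or alt not in BASES:
        return False
    if ref == alt:
        return False
    if frozenset(ref + alt) in STRAND_AMBIGUOUS:
        return False
    # Minor allele frequency is the smaller of the two allele frequencies.
    maf = min(alt_freq, 1.0 - alt_freq)
    return maf >= min_maf


print(passes_site_qc("A", ["G"], 0.10))  # True: biallelic, unambiguous, common
print(passes_site_qc("A", ["T"], 0.10))  # False: strand-ambiguous
```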
When I use only the intersecting SNPs (907,046 SNPs, with 100% overlap between reference and query), the tool outputs reasonable-looking LAI results. However, when the overlap is lower (between 90% and 98%, with ~1M and ~1.07M SNPs in the query and reference respectively), the tool outputs many very short segments (sometimes only 1 SNP long) assigned either AFR or -1 (which I assume means "unassigned" or missing). Even at 98% overlap, every sample has many short and missing segments.
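To quantify both effects described above (the fraction of query SNPs present in the reference, and how fragmented each sample's calls are), something like the following sketch could be used. It assumes SNPs are keyed as `(chrom, pos, ref, alt)` tuples and that each haplotype's output has already been expanded into one ancestry label per SNP, with `-1` treated as unassigned; neither assumption reflects Recomb-Mix's actual output format, and all function names are hypothetical.

```python
from itertools import groupby


def snp_overlap(query_keys, ref_keys):
    """Return the shared SNP keys and the fraction of query SNPs found in the reference."""
    query, ref = set(query_keys), set(ref_keys)
    shared = query & ref
    frac = len(shared) / len(query) if query else 0.0
    return shared, frac


def segments(labels):
    """Collapse per-SNP labels into (label, start_index, n_snps) runs."""
    out, start = [], 0
    for label, run in groupby(labels):
        n = sum(1 for _ in run)
        out.append((label, start, n))
        start += n
    return out


def short_segment_summary(labels, max_snps=1):
    """Return (n_segments, n_segments_with_at_most_max_snps, n_unassigned_snps)."""
    segs = segments(labels)
    n_short = sum(1 for _, _, n in segs if n <= max_snps)
    n_unassigned = sum(n for label, _, n in segs if label == -1)
    return len(segs), n_short, n_unassigned


query = [("1", 100, "A", "G"), ("1", 200, "C", "T"), ("1", 300, "G", "A")]
ref = [("1", 100, "A", "G"), ("1", 300, "G", "A"), ("1", 400, "T", "C")]
shared, frac = snp_overlap(query, ref)
print(len(shared), round(frac, 3))  # 2 0.667

labels = ["EUR", "EUR", "AFR", "EUR", "EUR", -1, "EUR"]
print(short_segment_summary(labels))  # (5, 3, 1)
```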
Is this expected behaviour, i.e. must the SNPs in the query and reference panels match 100%?