Update Mouse Genome
#Overview of steps
Update Jax Imputed Database SNP Locations
- Jax markers can be found here [http://cgd.jax.org/datasets/popgen/imputed.shtml Center for Genome Dynamics]
- Limiting Jax Markers.R imports data and limits SNPs based on confidence and information
Update Sanger Institute C57BL/6J and DBA/2J SNP locations
- Here is the link to their latest version of SNPs [ftp://ftp-mouse.sanger.ac.uk/current_snps/ Mouse Genome Project]
- I haven't dealt with their new format of SNPs
Update locations of BXD SNPs from the Wellcome Trust-CTC (for PhenoGen and eQTL - not mask)
- Data for the mm37 build can be downloaded from here [http://gscan.well.ox.ac.uk/gsBleadingEdge/mouse.snp.selector.cgi Mouse SNP Selector]
Update Mouse Exon and 430 version 2 SNP Masks
- Download sequence information from the UCSC Genome Browser. It is easiest to download the combined file.
- Download probe sequence information from Affymetrix
- Download and install BLAT
- Align probes to new genome
- Update masks
- Identify probes that hit the genome once and only once (findPerfectMatches.R)
- Use BEDtools to find overlap between SNPs and probes (intersectSNPsAndPerfectMatchedProbes.txt)
- Reformat original PGF file (read in pgf file.txt)
- Eliminate probes from temporary PGF file (Eliminate.Probes.R)
- Create new PS files (Eliminate.Probes.R)
- Create new MPS files (Eliminate.Probes.R, updatingFullTransFile.R, and creating new *.mps file.txt)
- Reformat new PGF file (creating new PGF file.txt)
- Normalize Rat Exon Array data using new mask
- Calculate heritability
- Calculate eQTL with genome-wide pvalues
- Calculate eQTL with locus-specific pvalues
- Generate LOD plots
#Current Method Aug 2013
Run blat
./blat -stepSize=5 -minScore=20 -minIdentity=1 -repMatch=2253 /Volumes/Data/mm10/index/mm10.fa RaEx-1_0-st-v1.probe.fa blatOutput.psl
Note: It may seem safe to increase minScore and minIdentity, but doing so can cause blat to miss perfect matches. Leave them alone and filter the .psl file with the R script or some other method.
Identify probes that hit the genome once and only once (findPerfectMatches_Exon.R for the Exon array, or findPerfectMatches_3Prime.R)
- Modified to output only the probe locations, as a BED file. Combining with SNPs is skipped at this step so that different combinations of SNPs can be used for masking.
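The R scripts are not reproduced on this page. As a rough illustration of the filtering this step describes, the Python sketch below keeps probes with exactly one perfect hit in a .psl file and returns BED-style rows. The definition of "perfect" used here (every probe base matched, no mismatches, no gaps) and the name perfect_unique_hits are assumptions, not taken from findPerfectMatches_Exon.R.

```python
from collections import defaultdict

def perfect_unique_hits(psl_lines):
    """Return (chrom, start, end, probe, score, strand) for probes with exactly one perfect hit.

    Assumes the standard 21-column PSL layout written by blat.
    """
    perfect = defaultdict(list)
    for line in psl_lines:
        f = line.rstrip("\n").split("\t")
        if len(f) < 21 or not f[0].isdigit():
            continue  # skip the PSL header and blank lines
        matches, mismatches, rep_matches = int(f[0]), int(f[1]), int(f[2])
        q_gaps, t_gaps = int(f[4]), int(f[6])
        q_size = int(f[10])
        # Assumed meaning of "perfect": full-length alignment, no mismatches, no gaps.
        if (matches + rep_matches == q_size and mismatches == 0
                and q_gaps == 0 and t_gaps == 0):
            # PSL tStart/tEnd are 0-based, half-open, the same convention as BED.
            perfect[f[9]].append((f[13], int(f[15]), int(f[16]), f[9], 0, f[8]))
    # Keep only probes that hit the genome once and only once.
    return [hits[0] for hits in perfect.values() if len(hits) == 1]
```

Filtering after the fact like this is what lets the blat thresholds stay permissive, as the note above recommends.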
Create BED files for SNPs for each strain. (Currently only ILS/ISS, which were already separate.)
- Converted from VCF using BEDOPS (vcf2bed). The complete files were used, not the unique SNPs (ILS_VQSR_BOTH_homozygousVariants.vcf and ISS_VQSR_BOTH_homozygousVariants.vcf)
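The main thing to keep in mind in this conversion is the coordinate shift: VCF positions are 1-based, while BED intervals are 0-based and half-open. The Python sketch below shows only that shift for illustration; it is not a replacement for vcf2bed and does not reproduce its full output columns. The function name, the chrom_prefix parameter, and the use of the REF length for the interval end are assumptions of this sketch.

```python
def vcf_snps_to_bed(vcf_lines, chrom_prefix=""):
    """Convert VCF records to (chrom, start, end, id) rows in BED coordinates."""
    rows = []
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip meta-information and the column header
        chrom, pos, vid, ref, _alt = line.rstrip("\n").split("\t")[:5]
        start = int(pos) - 1          # 1-based VCF POS -> 0-based BED start
        end = start + len(ref)        # a single-base SNP gives end == POS
        # chrom_prefix is a hypothetical convenience for VCFs that name
        # chromosomes "1" where the probe BED file uses "chr1".
        rows.append((chrom_prefix + chrom, start, end, vid))
    return rows
```

BEDtools compares chromosome names as plain strings, so the SNP and probe files need to use the same naming before they are intersected.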
Find intersection of Probe bed and SNP bed for each strain.
- Use BEDtools to find overlap between SNPs and probes (intersectSNPsAndPerfectMatchedProbes.txt)
./intersectBed -a /Volumes/Data/mm10/array/MoEx/Aligned/MoEx1_0st_probe.default.perfectMatches.wStrand.txt -b /Volumes/Data/mm10/snps/sanger/snps_DBA.bed -c > /Volumes/Data/mm10/array/MoEx/SNPs_Probes/DBA_snpsInProbes.txt
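With -c, intersectBed writes each probe from the -a file followed by the number of -b features (here SNPs) that overlap it. For checking a handful of probes by hand, the same count can be sketched in Python as below. This assumes single-base SNPs in BED coordinates; the function name and tuple layout are illustrative only.

```python
from bisect import bisect_left
from collections import defaultdict

def count_snps_per_probe(probes, snps):
    """Append to each probe the number of single-base SNPs falling inside it.

    probes: iterable of (chrom, start, end, name), 0-based half-open
    snps:   iterable of (chrom, start, end) with end == start + 1 (assumed)
    """
    starts = defaultdict(list)
    for chrom, start, _end in snps:
        starts[chrom].append(start)
    for positions in starts.values():
        positions.sort()
    counted = []
    for chrom, start, end, name in probes:
        positions = starts.get(chrom, [])
        # A SNP at 0-based position s overlaps the probe when start <= s < end.
        n = bisect_left(positions, end) - bisect_left(positions, start)
        counted.append((chrom, start, end, name, n))
    return counted
```

Probes with a count of zero are unaffected by that strain's SNPs; presumably the non-zero ones are what the masking step acts on.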
Combine SNPs and probes, filter, and output masks (Exon arrays)
perl CreateMaskExon.pl /Volumes/Data/mm10/array/MoEx MoEx-1_0-st-v1.r2.pgf DBA MoEx-1_0-st-v1.r2.dt1.mm9. MoEx-1_0-st-v1.r2.DBA.MASKED.perl.pgf mm10 MASKED.perl MoEx-1_0-st-v1.r2.dt1.mm9.csv
Arguments are:
- Path to the folder for the array type (e.g. MoEx)
- PGF file name
- Comma-separated list of strains whose SNPs to include (just the strain-name prefix of each Strain_snpsInProbes.txt file), e.g. Strain,Strain2,Strain3
- Prefix of the source files (downloaded from Affy), including the last "." before core, extended, etc.
- Output PGF file name
- Genome version, e.g. mm9 or mm10
- Label to append to all output files (this can be anything; avoid spaces)
- The CSV file linking transcript clusters and probe sets (downloaded from Affy)
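Because the script takes eight positional arguments, it is easy to get the order wrong. One option is to build the command from named parameters, as in the Python sketch below. The wrapper and its parameter names are hypothetical; only the argument order comes from the list above.

```python
def create_mask_exon_command(array_dir, pgf_in, strains, source_prefix,
                             pgf_out, genome, label, transcript_csv,
                             script="CreateMaskExon.pl"):
    """Build the argument list for CreateMaskExon.pl in the documented order."""
    if " " in label:
        raise ValueError("label must not contain spaces")
    return [
        "perl", script,
        array_dir,            # folder for the array type, e.g. .../MoEx
        pgf_in,               # input PGF file name
        ",".join(strains),    # Strain,Strain2,Strain3
        source_prefix,        # prefix of the Affy source files, ending in "."
        pgf_out,              # output PGF file name
        genome,               # e.g. mm9 or mm10
        label,                # appended to all output files
        transcript_csv,       # transcript cluster / probe set CSV
    ]
```

The returned list can be passed to subprocess.run(cmd, check=True) from the directory that holds the script.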