
Update Mouse Genome

Spencer Mahaffey edited this page Aug 14, 2015 · 5 revisions

# Overview of steps

Update Jax Imputed Database SNP Locations

  • Jax markers can be found at the [Center for Genome Dynamics](http://cgd.jax.org/datasets/popgen/imputed.shtml)
  • Limiting Jax Markers.R imports the data and keeps only the SNPs that meet confidence and information thresholds
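A minimal Python sketch of the kind of filter Limiting Jax Markers.R applies. The `ImputedSnp` type, its `confidence` and `info` fields, and the default cutoffs are hypothetical placeholders, not the actual columns of the CGD files or the thresholds used in the R script.

```python
from dataclasses import dataclass


@dataclass
class ImputedSnp:
    """One imputed SNP call (hypothetical fields, for illustration only)."""
    snp_id: str
    chrom: str
    pos: int           # 1-based genomic position
    confidence: float  # hypothetical per-SNP confidence score
    info: float        # hypothetical information score


def limit_markers(snps, min_confidence=0.9, min_info=0.5):
    """Keep SNPs whose confidence and information scores both meet the cutoffs.

    The default cutoffs are placeholders, not the values used in
    Limiting Jax Markers.R.
    """
    return [s for s in snps if s.confidence >= min_confidence and s.info >= min_info]


example = [
    ImputedSnp("snpA", "1", 3000100, 0.95, 0.80),  # passes both cutoffs
    ImputedSnp("snpB", "1", 3000200, 0.60, 0.90),  # low confidence
    ImputedSnp("snpC", "2", 5000000, 0.99, 0.20),  # low information
]
print([s.snp_id for s in limit_markers(example)])  # ['snpA']
```

Filtering on both scores at once keeps the marker set small while dropping SNPs that are either poorly imputed or uninformative.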

Update Sanger Institute C57BL/6J and DBA/2J SNP locations

  • Their latest SNP release is available from the [Mouse Genome Project](ftp://ftp-mouse.sanger.ac.uk/current_snps/)
  • I haven't yet worked with their new SNP file format

Update locations of BXD SNPs from the Wellcome Trust-CTC (for PhenoGen and eQTL analyses, not for masks)

  • Data for the mm37 build can be downloaded from the [Mouse SNP Selector](http://gscan.well.ox.ac.uk/gsBleadingEdge/mouse.snp.selector.cgi)

Update Mouse Exon and 430 version 2 SNP Masks

  • Download the genome sequence from the UCSC Genome Browser. It is easiest to download the combined file.
  • Download probe sequence information from Affymetrix
  • Download and install BLAT
  • Align the probes to the new genome
  • Update masks
    • Identify probes that hit the genome once and only once (findPerfectMatches.R)
    • Use BEDtools to find overlap between SNPs and probes (intersectSNPsAndPerfectMatchedProbes.txt)
    • Reformat original PGF file (read in pgf file.txt)
    • Eliminate probes from temporary PGF file (Eliminate.Probes.R)
    • Create new PS files (Eliminate.Probes.R)
    • Create new MPS files (Eliminate.Probes.R, updatingFullTransFile.R, and creating new *.mps file.txt)
    • Reformat new PGF file (creating new PGF file.txt)
  • Normalize Mouse Exon Array data using the new mask
  • Calculate heritability
  • Calculate eQTL with genome-wide p-values
  • Calculate eQTL with locus-specific p-values
  • Generate LOD plots

# Current Method (Aug 2013)

  1. Run blat
    ./blat -stepSize=5 -minScore=20 -minIdentity=1 -repMatch=2253 /Volumes/Data/mm10/index/mm10.fa RaEx-1_0-st-v1.probe.fa blatOutput.psl
    Note: It may seem that you can raise -minScore and -minIdentity, but doing so can cause BLAT to miss perfect matches. Leave them as shown and filter the .psl file afterward with the R script or another method.

  2. Identify probes that hit the genome once and only once (findPerfectMatches_Exon.R for exon arrays, findPerfectMatches_3Prime.R for 3' arrays)

    • Modified to output only the probe locations as a BED file. SNPs are no longer combined at this step, so that different combinations of SNPs can be used for masking.
  3. Create a SNP BED file for each strain (currently only ILS and ISS, which were already in separate files).

    • Converted from VCF with BEDOPS (vcf2bed), using the complete files rather than only the unique SNPs (ILS_VQSR_BOTH_homozygousVariants.vcf and ISS_VQSR_BOTH_homozygousVariants.vcf)
  4. Find the intersection of the probe BED file with each strain's SNP BED file.

    • Use BEDtools to find overlap between SNPs and probes (intersectSNPsAndPerfectMatchedProbes.txt)
      ./intersectBed -a /Volumes/Data/mm10/array/MoEx/Aligned/MoEx1_0st_probe.default.perfectMatches.wStrand.txt -b /Volumes/Data/mm10/snps/sanger/snps_DBA.bed -c > /Volumes/Data/mm10/array/MoEx/SNPs_Probes/DBA_snpsInProbes.txt
  5. Combine SNPs and probes, filter, and output the masks (exon arrays)
    perl CreateMaskExon.pl /Volumes/Data/mm10/array/MoEx MoEx-1_0-st-v1.r2.pgf DBA MoEx-1_0-st-v1.r2.dt1.mm9. MoEx-1_0-st-v1.r2.DBA.MASKED.perl.pgf mm10 MASKED.perl MoEx-1_0-st-v1.r2.dt1.mm9.csv

    • Arguments are:
      • Path to the folder for the array type (e.g. MoEx)
      • PGF file name
      • Comma-separated list of strains whose SNPs to include, given as the strain-name prefix of each Strain_snpsInProbes.txt file (e.g. Strain,Strain2,Strain3)
      • Prefix of the source files, up to and including the last "." before core, extended, etc. (downloaded from Affymetrix)
      • Output PGF file name
      • Genome version (e.g. mm9, mm10)
      • Label to append to all output files (can be anything; avoid spaces)
      • CSV file linking transcript clusters and probe sets (downloaded from Affymetrix)
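Steps 1 and 2 can be sketched in Python: keep only BLAT alignments that cover the whole probe with no mismatches or gaps, then keep probes with exactly one such hit and emit them as BED rows. This is an illustration under assumptions, not the logic of findPerfectMatches_Exon.R: it counts only perfect hits when deciding uniqueness, and the `_psl` helper that builds example lines is hypothetical.

```python
from collections import defaultdict

# 0-based column indices of a standard 21-column PSL line.
MATCHES, MISMATCHES, REP_MATCHES = 0, 1, 2
Q_NUM_INSERT, T_NUM_INSERT = 4, 6
STRAND, Q_NAME, Q_SIZE, Q_START, Q_END = 8, 9, 10, 11, 12
T_NAME, T_START, T_END, BLOCK_COUNT = 13, 15, 16, 17


def is_perfect(f):
    """True if the alignment spans the whole probe with no mismatches or gaps."""
    q_size = int(f[Q_SIZE])
    return (
        # repMatches are added in case matching bases fall in masked repeats
        int(f[MATCHES]) + int(f[REP_MATCHES]) == q_size
        and int(f[MISMATCHES]) == 0
        and int(f[Q_NUM_INSERT]) == 0
        and int(f[T_NUM_INSERT]) == 0
        and int(f[Q_START]) == 0
        and int(f[Q_END]) == q_size
        and int(f[BLOCK_COUNT]) == 1
    )


def unique_perfect_matches(psl_lines):
    """Return sorted BED6 tuples for probes with exactly one perfect hit."""
    hits = defaultdict(list)
    for line in psl_lines:
        f = line.rstrip("\n").split("\t")
        if len(f) < 21 or not f[0].isdigit():
            continue  # skip psLayout header lines and blank lines
        if is_perfect(f):
            hits[f[Q_NAME]].append(f)
    rows = []
    for probe, aligns in hits.items():
        if len(aligns) == 1:
            f = aligns[0]
            # PSL target coordinates are 0-based and half-open, like BED.
            rows.append((f[T_NAME], int(f[T_START]), int(f[T_END]), probe, 0, f[STRAND]))
    return sorted(rows)


def _psl(matches, mismatches, strand, probe, chrom, start):
    """Hypothetical helper: build a 25-mer, single-block PSL line."""
    f = [matches, mismatches, 0, 0, 0, 0, 0, 0, strand, probe, 25, 0, 25,
         chrom, 1000, start, start + 25, 1, "25,", "0,", f"{start},"]
    return "\t".join(map(str, f))


example = [
    "psLayout version 3",
    _psl(25, 0, "+", "p1", "chr1", 100),  # single perfect hit: kept
    _psl(25, 0, "+", "p2", "chr1", 200),  # two perfect hits: dropped
    _psl(25, 0, "-", "p2", "chr2", 300),
    _psl(24, 1, "+", "p3", "chr3", 50),   # one mismatch: not perfect
    _psl(25, 0, "-", "p4", "chr2", 10),   # one perfect plus one imperfect: kept
    _psl(24, 1, "+", "p4", "chr5", 400),
]
for row in unique_perfect_matches(example):
    print("\t".join(map(str, row)))
```

Filtering here, rather than raising -minScore or -minIdentity, follows the note in step 1: BLAT is run permissively and strictness is applied afterward.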
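For step 3, the essential part of a VCF-to-BED conversion for single-nucleotide SNPs is a coordinate shift: VCF `POS` is 1-based, while BED intervals are 0-based and half-open, so a SNP at `POS` p becomes the interval [p-1, p). The sketch below is a simplified stand-in for BEDOPS vcf2bed, not a reimplementation of it; it skips indels and multi-base alleles and keeps only the chromosome, interval, and ID.

```python
def vcf_snvs_to_bed(vcf_lines):
    """Convert single-nucleotide VCF records to BED4 tuples (0-based, half-open).

    Simplified sketch: indels, multi-base alleles and records without an ALT
    allele are skipped; only CHROM, the interval and ID are kept.
    """
    bed = []
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip meta-information and header lines
        chrom, pos, snp_id, ref, alt = line.rstrip("\n").split("\t")[:5]
        alts = alt.split(",")
        if len(ref) != 1 or alt == "." or any(len(a) != 1 for a in alts):
            continue
        start = int(pos) - 1  # VCF POS is 1-based; BED start is 0-based
        bed.append((chrom, start, start + 1, snp_id))
    return bed


example_vcf = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t101\tsnp1\tA\tG\t50\tPASS\t.",
    "chr1\t126\tsnp2\tC\tT\t50\tPASS\t.",
    "chr2\t5\tsnp3\tG\tA\t50\tPASS\t.",
    "chr2\t30\tindel1\tGA\tG\t50\tPASS\t.",  # deletion: skipped
]
for row in vcf_snvs_to_bed(example_vcf):
    print("\t".join(map(str, row)))
```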
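Step 4 relies on `intersectBed -c`, which reports every interval in `-a` together with the number of `-b` intervals that overlap it. A minimal Python sketch of that count for 1-bp SNP intervals, using sorted SNP starts and binary search; the 4-tuple probe layout is an assumption, and real runs should keep using BEDtools.

```python
import bisect
from collections import defaultdict


def count_snps_in_probes(probes, snps):
    """Count 1-bp SNP intervals overlapping each probe, like intersectBed -c.

    probes: (chrom, start, end, name) tuples, 0-based half-open.
    snps:   (chrom, start, end, id) tuples, each assumed to be 1 bp long.
    Every probe is returned with its count, including probes with zero SNPs.
    """
    starts = defaultdict(list)
    for chrom, start, _end, _snp_id in snps:
        starts[chrom].append(start)
    for positions in starts.values():
        positions.sort()

    counts = []
    for chrom, start, end, name in probes:
        positions = starts.get(chrom, [])
        # A SNP at [p, p + 1) overlaps [start, end) exactly when start <= p < end.
        n = bisect.bisect_left(positions, end) - bisect.bisect_left(positions, start)
        counts.append((chrom, start, end, name, n))
    return counts


example_probes = [("chr1", 100, 125, "p1"), ("chr2", 10, 35, "p4")]
example_snps = [("chr1", 100, 101, "snp1"), ("chr1", 125, 126, "snp2"), ("chr2", 4, 5, "snp3")]
for row in count_snps_in_probes(example_probes, example_snps):
    print("\t".join(map(str, row)))
```

Note that snp2 at position 125 is not counted for p1: the probe interval [100, 125) ends before it, which is the same half-open convention BEDtools uses.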
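The first part of step 5, deciding which probes to mask, can be sketched as taking the union of probes with a nonzero SNP count across the strains listed in the third CreateMaskExon.pl argument. This is a hypothetical illustration, not the Perl script's filtering: it assumes the probe name is the fourth column of each Strain_snpsInProbes.txt file and that the files live in `<array folder>/SNPs_Probes/`, as in the intersectBed example in step 4.

```python
from pathlib import Path


def probes_with_snps(count_lines, name_col=3):
    """Return names of probes whose overlap count (last column) is above zero.

    Assumes intersectBed -c output: the probe BED columns followed by the
    count. name_col=3 (the BED name field) is an assumption about the layout
    of the perfect-match probe file.
    """
    masked = set()
    for line in count_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= name_col + 1:
            continue  # too few columns to hold both a name and a count
        if int(fields[-1]) > 0:
            masked.add(fields[name_col])
    return masked


def probes_to_mask(array_dir, strains):
    """Union of SNP-containing probes over a comma-separated strain list.

    Reads <array_dir>/SNPs_Probes/<Strain>_snpsInProbes.txt for each strain
    (directory layout assumed from the intersectBed example).
    """
    masked = set()
    for strain in strains.split(","):
        path = Path(array_dir) / "SNPs_Probes" / f"{strain.strip()}_snpsInProbes.txt"
        with path.open() as handle:
            masked |= probes_with_snps(handle)
    return masked


example_counts = [
    "chr1\t100\t125\tp1\t0\t+\t1",
    "chr2\t10\t35\tp4\t0\t-\t0",
]
print(probes_with_snps(example_counts))  # {'p1'}
```

Taking the union means a probe is masked if it carries a SNP in any listed strain, which matches the idea of passing several strains (e.g. ILS,ISS) to build one combined mask.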

