MoTeX: A word-based HPC tool for single and structured MoTif eXtraction

Description: MoTeX is an accurate and efficient tool for single and structured MoTif eXtraction. It comes in three flavors: the standard CPU version, the OpenMP-based version, and the MPI-based version. It also includes a tool that implements measures for assessing the statistical significance of the reported motifs.

Installation: To compile MoTeX-II, please follow the instructions given in file INSTALL.

Usage:

m    m       mmmmmmm        m    m
##  ##  mmm     #     mmm    #  #
# ## # #" "#    #    #"  #    ##
# "" # #   #    #    #""""   m""m
#    # "#m#"    #    "#mm"  m"  "m

 Usage: motexCPU|motexOMP|motexMPI <options>
 Standard (Mandatory):
  -a, --alphabet            <str>     `DNA' for nucleotide  sequences or `PROT'
                                      for protein  sequences. You may use `USR'
                                      for user-defined alphabet; edit the file
                                      motexdefs.h accordingly.
  -i, --input-file          <str>     (Multi)FASTA input filename.
  -o, --output-file         <str>     MoTeX output filename.
  -d, --distance            <int>     The  distance  used  for extracting  the
                                      motifs. It can be  either 0 (for Hamming
                                      distance) or 1 (for edit distance).
  -k, --motifs-length       <int>     The length for motifs.
  -e, --errors              <int>     Limit the  max number  of errors to this
                                      value.
  -q, --quorum              <int>     The quorum is the minimum percentage (%)
                                      of sequences in which a motif must occur.

 Optional:
  -Q, --max-quorum          <int>     The maximum percentage (%) of sequences
                                      in which a motif can occur (default: 100).
  -n, --num-of-occ          <int>     The minimum  number of  occurrences of a
                                      reported  motif in any  of the sequences
                                      (default: 1).
  -N, --max-num-of-occ      <int>     The maximum  number of  occurrences of a
                                      reported  motif in any  of the sequences
                                      (default: 10000).
  -s, --structured-motifs   <str>     Input filename  for the structure of the
                                      boxes in the case of structured motifs.
  -S, --SMILE-out-file      <str>     SMILE-like output filename to be used by
                                      SMILE.
  -b, --background-in-file  <str>     MoTeX background filename for statistical
                                      evaluation passed as input.
  -t, --threads             <int>     Number of threads to be used by the OMP
                                      version (default: 4).
  -L, --long-sequences      <int>     If the number of input sequences is less
                                      than  the number of  processors  used by
                                      the MPI version, this should be set to 1
                                      (default: 0); useful  for a few (or one)
                                      very long sequence(s), e.g. a chromosome.
  -u, --un-out-file         <str>     Output filename for foreground motifs not
                                      matched exactly with any background motif
                                      in the file passed with the `-b' option.
  -I, --un-in-file          <str>     Input filename of the aforementioned file
                                      with the unmatched  motifs. These  motifs
                                      will be approximately searched  as motifs
                                      in the file passed with the `-i' option.
  -U, --SMILE-un-out-file   <str>     SMILE-like output filename for foreground
                                      motifs  not  matched  exactly  with  any
                                      background motif in the file passed with
                                      the `-b' option.
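
To make the interplay of `-k`, `-e` and `-q` concrete, the following is a minimal brute-force sketch in Python of what it means for a motif to satisfy a quorum under Hamming distance (`-d 0`). It is not MoTeX's algorithm, which is word-based and far more efficient; the function names and the toy sequences are hypothetical, and the `-n`/`-N` occurrence limits are ignored.

```python
def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def occurs(motif: str, seq: str, e: int) -> bool:
    """True if some length-k window of seq is within e mismatches of motif."""
    k = len(motif)
    return any(
        hamming(motif, seq[i:i + k]) <= e
        for i in range(len(seq) - k + 1)
    )


def quorum_percent(motif: str, seqs: list[str], e: int) -> float:
    """Percentage of sequences containing at least one occurrence of motif."""
    hits = sum(occurs(motif, s, e) for s in seqs)
    return 100.0 * hits / len(seqs)


def satisfies_quorum(motif: str, seqs: list[str], e: int, q: float) -> bool:
    """True if motif occurs in at least q percent of the sequences."""
    return quorum_percent(motif, seqs, e) >= q


# Hypothetical toy input: four short DNA sequences.
sequences = ["ACGTACGT", "TTACGAAC", "GGGGCCCC", "ACCTTTTT"]

print(quorum_percent("ACGT", sequences, 0))        # 25.0 (exact matches only)
print(quorum_percent("ACGT", sequences, 1))        # 75.0 (one mismatch allowed)
print(satisfies_quorum("ACGT", sequences, 1, 50))  # True
```

With no errors allowed, `ACGT` occurs in one of the four sequences; allowing one mismatch raises this to three of four, so the motif passes a quorum of 50 only in the second case.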

Example: For typical runs, see file EXAMPLES.
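
As a rough illustration of how the mandatory options listed under Usage fit together, the Python sketch below assembles an argument list for one of the three binaries. The helper name `motex_command` and the file names are hypothetical; only the binary names, flags, and value ranges are taken from the help text above. The sketch does not launch anything, and the file EXAMPLES remains the authoritative source for real invocations.

```python
import shlex


def motex_command(binary, alphabet, input_file, output_file,
                  distance, k, errors, quorum, threads=None):
    """Build an argument list from the mandatory MoTeX options."""
    if binary not in ("motexCPU", "motexOMP", "motexMPI"):
        raise ValueError("unknown binary: " + binary)
    if alphabet not in ("DNA", "PROT", "USR"):
        raise ValueError("alphabet must be DNA, PROT or USR")
    if distance not in (0, 1):
        raise ValueError("distance must be 0 (Hamming) or 1 (edit)")
    if not 0 <= quorum <= 100:
        raise ValueError("quorum is a percentage")
    cmd = [binary,
           "-a", alphabet,
           "-i", input_file,
           "-o", output_file,
           "-d", str(distance),
           "-k", str(k),
           "-e", str(errors),
           "-q", str(quorum)]
    if threads is not None:
        # -t is documented for the OMP version only.
        cmd += ["-t", str(threads)]
    return cmd


# Hypothetical file names; Hamming distance, motifs of length 8,
# at most 1 error, present in at least 50% of the sequences.
cmd = motex_command("motexCPU", "DNA", "input.fasta", "output.motex",
                    distance=0, k=8, errors=1, quorum=50)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run` once the binaries have been built as described in INSTALL.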

Citation:

S. P. Pissis: 
MoTeX-II: structured MoTif eXtraction from large-scale datasets. 
BMC Bioinformatics 15: 235 (2014)

