MotifScanner - Search for Transcription Factor Binding Sites

We now want to scan our dataset of 53 human promoter sequences for transcription factor binding sites (TFBS). MotifScanner, one of the programs integrated in TOUCAN, can be used for this purpose. Choose "Motifs" > "MotifScanner", which brings up a window like the following one:
               
MotifScanner Input
           
Essentially, you have to make three selections here.

1. "PWM database" is the database of transcription factor binding site profiles you want to use. For human sequences you may choose, e.g., "TRANSFAC 6.0 public - Vertebrates" (Wingender et al., 2001) or the independent JASPAR database (Sandelin et al., 2004). For comparison, it can be useful to perform several runs using different PWM databases. A PWM (position weight matrix) represents a TFBS as a matrix that gives, for each position of the site, the frequency of each of the four nucleotides among experimentally determined binding sites. The last column gives the consensus derived from these frequencies in IUPAC code; at ambiguous positions IUPAC ambiguity letters are used (e.g. Y, which stands for C or T). The following matrix, adapted from the TRANSFAC 6.0 public database, shows the binding profile of the factor NF-kappaB (p50) as an illustrative example.
     
AC   M00051
XX
ID V$NFKAPPAB50_01
XX
DE NF-kappaB (p50)
XX
BF T00593 NF-kappaB1; Species: human, Homo sapiens.
XX
PO    A   C   G   T
01    0   0  18   0   G
02    0   0  18   0   G
03    0   0  18   0   G
04    2   0  16   0   G
05   16   1   0   1   A
06    0   0   3  15   T
07    0   7   1  10   Y
08    0  16   0   2   C
09    0  18   0   0   C
10    0  17   1   0   C
             
2. The "Background Model" describes the average nucleotide composition of promoters in the species of interest. A candidate site is scored by comparing how well it fits the matrix model with how well it fits this background. If you are scanning human promoter sequences, you should therefore also use a background model calculated from human promoter sequences, such as "EPD Human", which is based on the human promoter sequences stored in the Eukaryotic Promoter Database, EPD (Schmid et al., 2004). The "Background Model" list offers Markov models of different orders; 3rd-order models are fine in most cases. In a 1st-order background model, the probability of a nucleotide depends on the one nucleotide (1 bp) preceding it, so the background frequencies are calculated for each dinucleotide (AA, AT, etc.). In a 2nd-order model, it depends on the two preceding nucleotides, so the background score of a nucleotide is derived from the frequency of the corresponding trinucleotide (e.g. AAT if T is being scored).

3. The "Prior" value sets the stringency with which a sequence segment is accepted as an instance of a TFBS matrix. The higher the prior, the more instances of each motif will be found, so a lower prior (like 0.1) is more stringent than a higher one (like 0.9). The length of the analyzed sequences should also be taken into account, since a longer sequence is more likely to contain a genuine site. As a rule of thumb, use a prior of 0.1-0.2 for sequences shorter than 300 bp and 0.9 for sequences longer than 1500 bp.
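
To make the three settings more concrete, here is a Python sketch. It scores windows of a sequence using the NF-kappaB (p50) counts from the matrix above, compares them against an order-k Markov background model, and then applies a prior. This is not MotifScanner's implementation. The following are all assumptions chosen for illustration:

- the pseudocount of 0.25
- the add-one smoothing of the background
- the Cavener-style consensus rule
- the per-window posterior (prior odds multiplied by the likelihood ratio)
- the 0.5 reporting threshold

Only the forward strand is scanned, and every function name is hypothetical.

```python
import math
from collections import Counter
from itertools import product

BASES = "ACGT"

# Counts of A, C, G, T per position, copied from the NF-kappaB (p50) matrix above.
NFKB_P50_COUNTS = [
    (0, 0, 18, 0), (0, 0, 18, 0), (0, 0, 18, 0), (2, 0, 16, 0), (16, 1, 0, 1),
    (0, 0, 3, 15), (0, 7, 1, 10), (0, 16, 0, 2), (0, 18, 0, 0), (0, 17, 1, 0),
]

# Two-base IUPAC ambiguity codes (e.g. Y = C or T).
IUPAC_PAIRS = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
}

def consensus(counts):
    """Assumed Cavener-style rule: one base if it has >50% and more than twice
    the runner-up, a two-base IUPAC code if the top two exceed 75%, else N."""
    letters = []
    for row in counts:
        total = sum(row)
        (f1, b1), (f2, b2) = sorted(zip(row, BASES), reverse=True)[:2]
        if f1 / total > 0.5 and f1 > 2 * f2:
            letters.append(b1)
        elif (f1 + f2) / total > 0.75:
            letters.append(IUPAC_PAIRS[frozenset(b1 + b2)])
        else:
            letters.append("N")
    return "".join(letters)

def to_probabilities(counts, pseudocount=0.25):
    """Turn counts into per-position base probabilities (pseudocount avoids log(0))."""
    probs = []
    for row in counts:
        total = sum(row) + 4 * pseudocount
        probs.append({b: (c + pseudocount) / total for b, c in zip(BASES, row)})
    return probs

def train_background(sequences, order):
    """Order-k Markov model: P(base | preceding k bases), add-one smoothed.
    Keys are (k+1)-mers, e.g. for order 2 the key "AAT" holds P(T | AA)."""
    counts = Counter()
    for seq in sequences:
        for i in range(order, len(seq)):
            counts[seq[i - order:i + 1]] += 1
    model = {}
    for context in ("".join(p) for p in product(BASES, repeat=order)):
        total = sum(counts[context + b] for b in BASES) + 4
        for b in BASES:
            model[context + b] = (counts[context + b] + 1) / total
    return model

def log_odds(seq, start, pwm, background, order):
    """log P(window | matrix) - log P(window | background); needs start >= order."""
    score = 0.0
    for j, probs in enumerate(pwm):
        i = start + j
        score += math.log(probs[seq[i]]) - math.log(background[seq[i - order:i + 1]])
    return score

def posterior(llr, prior):
    """Assumed Bayesian step: posterior odds = prior odds * likelihood ratio (prior < 1)."""
    odds = prior / (1.0 - prior) * math.exp(llr)
    return odds / (1.0 + odds)

def scan(seq, pwm, background, order, prior, threshold=0.5):
    """Report (start, site, posterior) for forward-strand windows at or above threshold."""
    hits = []
    for start in range(order, len(seq) - len(pwm) + 1):
        p = posterior(log_odds(seq, start, pwm, background, order), prior)
        if p >= threshold:
            hits.append((start, seq[start:start + len(pwm)], p))
    return hits

# Toy data (hypothetical, not EPD): every 4-mer occurs exactly once, so the
# 3rd-order background is uniform (each conditional probability is 0.25).
toy_background = ["".join(p) for p in product(BASES, repeat=4)]
bg3 = train_background(toy_background, order=3)
pwm = to_probabilities(NFKB_P50_COUNTS)
promoter = "TTAGC" + "GGGGATTCCC" + "ATATCGA"

print(consensus(NFKB_P50_COUNTS))  # GGGGATYCCC
strict = scan(promoter, pwm, bg3, order=3, prior=0.1)
relaxed = scan(promoter, pwm, bg3, order=3, prior=0.9)
```

In this sketch a higher prior raises the posterior of every window, so the set of reported sites can only grow. This mirrors the behavior of the "Prior" setting described above. Raising `order` makes the background depend on longer preceding contexts, as in the 1st- and 2nd-order examples.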

Results are returned in GFF format. To annotate them on your sequences, simply choose "YES". The output is a color-coded list of TFBS in the "Feature list" on the left side of the TOUCAN window, together with a visualization of these sites along the input sequences in the main window ("Sequence set").

Within the "Feature list", you can right-click on individual TFs to show, hide or re-color them. You can also display features simply by selecting them and pressing the "Enter" key (this also works for CDS, exons, ...). Alternatively, you can click on individual boxes (TFBS) in the main window to display their properties in the "Feature Info" window. The following image shows this for the factor NF-kappaB (p50) (red). To find out which gene or promoter you are currently looking at, simply click into the region of its first exon; this information is then shown in the "Feature Info" window (cyan).
                    
MotifScanner output
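
GFF is a tab-separated text format with nine columns per feature:

- sequence name
- source
- feature type
- start
- end
- score
- strand
- frame
- attributes

This means the annotations can also be processed outside TOUCAN. The Python sketch below parses such lines and groups the features per transcription factor, roughly like the "Feature list". The sample records are hypothetical and are not actual MotifScanner output; this includes the `id` attribute assumed to hold the matrix name. Check a real output file for the attribute names it actually uses.

```python
from collections import defaultdict
from typing import NamedTuple, Optional

class Feature(NamedTuple):
    seqname: str
    source: str
    feature: str
    start: int  # GFF coordinates are 1-based and inclusive
    end: int
    score: Optional[float]
    strand: str
    attributes: dict

def parse_attributes(field):
    """Parse GFF2-style 'tag "value"; tag value' pairs into a dict."""
    attrs = {}
    for part in field.split(";"):
        part = part.strip()
        if part:
            tag, _, value = part.partition(" ")
            attrs[tag] = value.strip().strip('"')
    return attrs

def parse_gff(lines):
    """Yield one Feature per data line, skipping blank and '#' comment lines."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 8:
            continue  # not a valid GFF record
        score = None if cols[5] == "." else float(cols[5])
        attrs = parse_attributes(cols[8]) if len(cols) > 8 else {}
        yield Feature(cols[0], cols[1], cols[2], int(cols[3]), int(cols[4]),
                      score, cols[6], attrs)

def group_by_factor(features, key="id"):
    """Group features by an attribute (hypothetical key "id"), else by feature type."""
    groups = defaultdict(list)
    for f in features:
        groups[f.attributes.get(key, f.feature)].append(f)
    return dict(groups)

# Hypothetical sample records, for illustration only.
sample = [
    "##gff-version 2",
    'seq1\tMotifScanner\tTFBS\t120\t129\t0.98\t+\t.\tid "V$NFKAPPAB50_01"; site "GGGGATTCCC"',
    'seq1\tMotifScanner\tTFBS\t410\t419\t0.87\t+\t.\tid "V$NFKAPPAB50_01"; site "GGGAATTCCC"',
    'seq2\tMotifScanner\tTFBS\t55\t64\t0.91\t+\t.\tid "HYPOTHETICAL_MOTIF"',
]

for factor, hits in group_by_factor(parse_gff(sample)).items():
    print(factor, [(h.seqname, h.start, h.end, h.strand) for h in hits])
```

Grouping on an attribute rather than on the feature-type column keeps sites of the same matrix together even when they lie on different sequences.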
                
