Help: Prokaryotic background models



Overview

  1. Calculation of the higher order background models
  2. Statistics
  3. References
Calculation of the higher order background models

All GenBank files corresponding to the listed accession numbers were downloaded. Intergenic regions were delineated with the modules of INCLUSive (Thijs et al., 2002), and higher-order background models were calculated from these intergenic sequences. To build a model of order m, all oligonucleotides of length m and m+1 are counted; these counts yield, respectively, the oligonucleotide frequency table and the transition matrix of the background model. The transition matrix stores the probability of each nucleotide given the m preceding nucleotides in the sequence. The following table shows part of the third-order background model based on the intergenic sequences of Escherichia coli (K12). The first element in the matrix thus represents the probability of finding an A in the intergenic sequences of E. coli given that the three preceding nucleotides are A's.

Content  A       C       G       T
AAA      0.3788  0.1715  0.1655  0.2842
AAC      0.3530  0.2052  0.2405  0.2014
AAG      0.2678  0.2447  0.2461  0.2414
AAT      0.3142  0.2007  0.2009  0.2842
ACA      0.3625  0.1596  0.2208  0.2571
...      ...     ...     ...     ...
TTC      0.3325  0.2136  0.1680  0.2859
TTG      0.2333  0.3207  0.1180  0.3279
TTT      0.2301  0.1907  0.1801  0.3992
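
The counting procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the INCLUSive implementation: the function name `background_model`, the handling of ambiguous bases, the optional pseudocount and the uniform fallback for unseen contexts are all assumptions made for the example.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"

def background_model(sequences, order=3, pseudocount=0.0):
    """Return {context: {base: P(base | context)}} for a model of the given order."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        # Count every word of length order + 1 (context plus the next base).
        for i in range(len(seq) - order):
            word = seq[i:i + order + 1]
            # Assumption: words containing ambiguity codes such as N are skipped.
            if all(base in ALPHABET for base in word):
                counts[word] += 1

    model = {}
    for context in map("".join, product(ALPHABET, repeat=order)):
        row = [counts[context + base] + pseudocount for base in ALPHABET]
        total = sum(row)
        # Assumption: a context that was never observed gets a uniform row.
        model[context] = {
            base: (count / total if total else 0.25)
            for base, count in zip(ALPHABET, row)
        }
    return model

# Toy intergenic sequences (hypothetical data, for illustration only).
toy_sequences = ["AAAACAAAAG", "AAAATAAAAA"]
toy_model = background_model(toy_sequences, order=3)
print(toy_model["AAA"])
```

Each row of the resulting dictionary corresponds to one row of the table above: the key is the three-nucleotide context and the four values are the probabilities of A, C, G and T following it.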


Overview of the computed models
The table below lists the accession numbers for which a background model was calculated, a description of each entry and the single-nucleotide frequencies of its intergenic sequences. Note first that for Vibrio cholerae, Brucella melitensis and Deinococcus radiodurans the intergenic sequences of both chromosomes were combined to calculate a single genome background model. Second, separate background models were calculated for genomes and for plasmids.

Accession Number   Description of the organism   Single Nucleotide Frequency (A C G T)
NC_000854 Aeropyrum pernix 0.241861 0.255022 0.260723 0.242394
NC_002147 Agrobacterium tumefaciens (plasmid pTi-SAKURA) 0.240459 0.259457 0.26361 0.236475
NC_003306 Agrobacterium tumefaciens (C58 Dupont plasmid AT) 0.245357 0.258775 0.261389 0.234479
NC_003304 Agrobacterium tumefaciens (C58 U.Wash circular chromosome) 0.242733 0.26016 0.26263 0.234478
NC_003305 Agrobacterium tumefaciens (C58 U.Wash linear chromosome) 0.243302 0.258823 0.261405 0.23647
NC_003062 Agrobacterium tumefaciens (C58 circular chromosome) 0.243589 0.259362 0.258827 0.238222
NC_003063 Agrobacterium tumefaciens (C58 linear chromosome) 0.245508 0.259408 0.257901 0.237183
NC_003064 Agrobacterium tumefaciens (C58 plasmid AT) 0.243119 0.259851 0.262967 0.234063
NC_003065 Agrobacterium tumefaciens (C58 plasmid Ti) 0.263051 0.239196 0.241812 0.255941
NC_000918 Aquifex aeolicus 0.30665 0.194991 0.204513 0.293847
NC_000917 Archaeoglobus fulgidus 0.317611 0.180168 0.187041 0.315179
NC_003995 Bacillus anthracis (A2012) 0.359211 0.131324 0.169541 0.339925
NC_001496 Bacillus anthracis (virulence plasmid PX01) 0.361732 0.141994 0.161688 0.334586
NC_002570 Bacillus halodurans 0.319638 0.172334 0.207455 0.300574
NC_000964 Bacillus subtilis 0.328008 0.167476 0.193845 0.310671
NC_003278 Bacteriophage phi CTX 0.190154 0.339068 0.308041 0.162737
NC_001318 Borrelia burgdorferi 0.395511 0.089595 0.11843 0.396465
NC_002528 Buchnera sp. (APS) 0.4282 0.077825 0.081557 0.412418
NC_002163 Campylobacter jejuni 0.396611 0.102881 0.1212 0.379308
NC_002696 Caulobacter crescentus 0.193431 0.30843 0.30567 0.192469
NC_002620 Chlamydia muridarum 0.321674 0.166279 0.177119 0.334928
NC_002179 Chlamydia pneumoniae (AR39) 0.333693 0.160117 0.160409 0.345781
NC_000117 Chlamydia trachomatis 0.313897 0.172219 0.183978 0.329906
NC_000922 Chlamydophila pneumoniae (CWL029) 0.328908 0.165686 0.167276 0.33813
NC_003030 Clostridium acetobutylicum (ATCC824) 0.382556 0.109831 0.153613 0.354
NC_003366 Clostridium perfringens 0.408966 0.080243 0.124114 0.386676
NC_001895 Enterobacteria phage (P2) 0.296663 0.220681 0.21143 0.271226
NC_002371 Enterobacteria phage (P22) 0.298211 0.217804 0.217362 0.266622
NC_000913 Escherichia coli (K12) 0.295326 0.205887 0.20227 0.296517
NC_002655 Escherichia coli (O157:H7 EDL933) 0.295658 0.204987 0.206437 0.292918
NC_002695 Escherichia coli (O157:H7) 0.295608 0.204037 0.20508 0.295275
NC_002142 Escherichia coli (plasmid pB171) 0.284904 0.217931 0.219671 0.277494
NC_002525 Escherichia coli (plasmid R721) 0.291134 0.209497 0.21193 0.287439
NC_003295 Ralstonia solanacearum 0.255556 0.246667 0.24 0.257778
NC_000907 Haemophilus influenzae (Rd) 0.353833 0.148793 0.153898 0.343475
NC_002607 Halobacterium sp. (NRC-1) 0.188925 0.313247 0.310605 0.187223
NC_000915 Helicobacter pylori (26695) 0.3593 0.137307 0.154089 0.349304
NC_000921 Helicobacter pylori (J99) 0.351909 0.148488 0.162548 0.337055
NC_002137 Lactococcus cremoris (plasmid pNZ4000) 0.355004 0.145747 0.173055 0.326193
NC_002662 Lactococcus lactis 0.378586 0.124909 0.160222 0.336283
NC_003212 Listeria innocua (Clip11262) 0.351615 0.142858 0.176998 0.328529
NC_003210 Listeria monocytogenes (EGD) 0.346104 0.147406 0.177076 0.329414
NC_002682 Mesorhizobium loti (plasmid pMLb) 0.214231 0.284752 0.291247 0.20977
NC_002678 Mesorhizobium loti 0.218228 0.284727 0.285867 0.211177
NC_000916 Methanobacterium thermoautotrophicum (delta H) 0.319494 0.18352 0.19296 0.304026
NC_000909 Methanococcus jannaschii 0.383845 0.11403 0.129283 0.372843
NC_003551 Methanopyrus kandleri (AV19) 0.200434 0.297155 0.307352 0.195059
NC_003552 Methanosarcina acetivorans (C2A) 0.333612 0.169457 0.174811 0.322119
NC_003901 Methanosarcina mazei (Goe1) 0.335876 0.164747 0.16769 0.331688
NC_002677 Mycobacterium leprae (TN) 0.187086 0.27649 0.321192 0.215232
NC_002755 Mycobacterium tuberculosis (CDC1551) 0.196068 0.302233 0.31504 0.186659
NC_000962 Mycobacterium tuberculosis (H37Rv) 0.196432 0.304403 0.313149 0.186016
NC_000908 Mycoplasma genitalium 0.367269 0.141328 0.14336 0.348044
NC_000912 Mycoplasma pneumoniae 0.336135 0.168198 0.168528 0.327139
NC_002771 Mycoplasma pulmonis 0.422614 0.08577 0.093849 0.397767
NC_003116 Neisseria meningitidis (serogroup A strain Z2491) 0.287045 0.224549 0.220981 0.267426
NC_003112 Neisseria meningitidis (serogroup B strain MC58) 0.293869 0.220158 0.215051 0.270922
NC_003272 Nostoc sp. (PCC 7120) 0.328164 0.176426 0.176844 0.318566
NC_002663 Pasteurella multocida 0.332998 0.163356 0.172938 0.330708
NC_002122 Plasmid ColIb-P9 0.263065 0.244201 0.23187 0.260863
NC_002483 Plasmid F 0.293327 0.199083 0.210064 0.297526
NC_002134 Plasmid R100 0.28753 0.217654 0.223334 0.271481
NC_002516 Pseudomonas aeruginosa 0.203574 0.30486 0.293744 0.197822
NC_003350 Pseudomonas putida (plasmid pWW0) 0.221135 0.283861 0.288276 0.206728
NC_003364 Pyrobaculum aerophilum 0.277496 0.220089 0.2264 0.276015
NC_003296 Pyrobaculum aerophilum 0.277496 0.220089 0.2264 0.276015
NC_000868 Pyrococcus abyssi 0.303426 0.190595 0.203458 0.302522
NC_003413 Pyrococcus furiosus (DSM 3638) 0.31836 0.175281 0.188468 0.317891
NC_000961 Pyrococcus horikoshii 0.318463 0.175196 0.189504 0.316837
NC_003103 Rickettsia conorii (Malish 7) 0.369363 0.138443 0.148575 0.343619
NC_000963 Rickettsia prowazekii (Madrid E) 0.385694 0.112105 0.120806 0.381395
NC_002638 Salmonella Choleraesuis (50k virulence plasmid) 0.269815 0.220568 0.224913 0.284704
NC_003384 Salmonella Typhi (plasmid pHCM1) 0.290802 0.202909 0.21092 0.295369
NC_003385 Salmonella Typhi (plasmid pHCM2) 0.301451 0.210195 0.20977 0.278584
NC_003198 Salmonella Typhi 0.289873 0.208486 0.210412 0.291228
NC_002305 Salmonella typhi (plasmid R27) 0.292833 0.200335 0.210706 0.296126
NC_003277 Salmonella typhimurium (LT2 plasmid pSLT) 0.250919 0.24901 0.251732 0.248339
NC_003197 Salmonella typhimurium (LT2) 0.290432 0.20943 0.209792 0.290345
NC_002698 Shigella flexneri (virulence plasmid pWR501) 0.299489 0.206613 0.214861 0.279037
NC_003047 Sinorhizobium meliloti (1021) 0.221659 0.279783 0.284867 0.213692
NC_003037 Sinorhizobium meliloti (plasmid pSymA) 0.22263 0.279295 0.28446 0.213615
NC_003078 Sinorhizobium meliloti (plasmid pSymB) 0.224171 0.279466 0.285155 0.211207
NC_002758 Staphylococcus aureus (Mu50) 0.373193 0.121018 0.151865 0.353924
NC_002745 Staphylococcus aureus (N315) 0.373372 0.1203 0.149601 0.356727
NC_003098 Streptococcus pneumoniae (R6) 0.350542 0.144933 0.177469 0.327057
NC_003028 Streptococcus pneumoniae (R6) 0.350542 0.144933 0.177469 0.327057
NC_003485 Streptococcus pyogenes (MGAS8232) 0.343958 0.15266 0.187757 0.315626
NC_002737 Streptococcus pyogenes 0.341701 0.154167 0.186625 0.317507
NC_003888 Streptomyces coelicolor (A3(2)) 0.158916 0.349382 0.341986 0.149716
NC_003106 Sulfolobus tokodaii 0.359505 0.138665 0.142451 0.359379
NC_000911 Synechocystis sp. (PCC 6803) 0.288694 0.210883 0.206234 0.294189
NC_003869 Thermoanaerobacter tengcongensis (MB4T) 0.348676 0.140259 0.196223 0.314841
NC_002689 Thermoplasma volcanium 0.343795 0.157582 0.160852 0.337771
NC_000853 Thermotoga maritima 0.311903 0.17974 0.213007 0.29535
NC_000919 Treponema pallidum 0.222977 0.236436 0.296936 0.243651
NC_002488 Xylella fastidiosa (9a5c) 0.267285 0.227061 0.226993 0.278661
NC_003131 Yersinia pestis plasmid (pCD1) 0.30731 0.187034 0.196966 0.30869
NC_003134 Yersinia pestis plasmid (pMT1) 0.288638 0.219575 0.215738 0.27605
NC_003143 Yersinia pestis strain (CO92) 0.300678 0.196472 0.199165 0.303685
Bmelitensis Brucella melitensis (chr I and II) 0.25082 0.24512 0.253735 0.250325
Vcholerae Vibrio cholerae (chr I and II) 0.296128 0.200417 0.203164 0.300292
NC_004041 Rhizobium etli (symbiotic plasmid p42d) 0.227587 0.272952 0.277418 0.222043
NC_003450 Corynebacterium glutamicum 0.26862 0.230742 0.235683 0.264956
Dradiodurans Deinococcus radiodurans (R1) 0.191602 0.313635 0.300836 0.193927
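
The single-nucleotide frequencies in the table, including the pooling of two chromosomes into one model, could be computed along the following lines. This is a sketch under stated assumptions: the function name `single_nucleotide_frequencies` and the toy sequences are hypothetical, and characters other than A, C, G and T are simply ignored.

```python
from collections import Counter

def single_nucleotide_frequencies(sequences):
    """Return the relative frequency of A, C, G and T over all given sequences."""
    counts = Counter()
    for seq in sequences:
        # Assumption: anything other than A, C, G or T is ignored.
        counts.update(base for base in seq.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no A, C, G or T found in the input sequences")
    return {base: counts[base] / total for base in "ACGT"}

# Hypothetical intergenic sequences from two chromosomes of one organism.
chr1_intergenics = ["ACGTAA", "TTGA"]
chr2_intergenics = ["GGCC", "AT"]

# Pooling both chromosomes gives a single genome-wide frequency vector.
pooled = single_nucleotide_frequencies(chr1_intergenics + chr2_intergenics)
print(pooled)
```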


Statistics

Figure 1 compares the third-order transition probabilities of two pairs of genomes. In panel A the third-order transition matrices of E. coli and S. typhimurium are compared. The relatedness of the two species is reflected by their almost identical transition probabilities: the dots scatter closely around the 1:1 line. Panel B shows the same plot for two species with different GC content: E. coli K12 (AT-rich) and S. coelicolor A3(2) (GC-rich). These plots indicate that exchanging the background models of E. coli and S. typhimurium will not deteriorate the results obtained with the Motif Sampler, whereas exchanging background models between organisms with strongly different base composition will clearly result in an increase in false positives.
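
The visual comparison against the 1:1 line can also be summarized as a single number. The sketch below pairs up the entries of two transition matrices and reports their root-mean-square difference, which is 0 when every dot lies exactly on the 1:1 line. The function names are assumptions, and this summary statistic is an illustration rather than something the original analysis reports. The E. coli rows are taken from the table above; the `related` and `gc_rich` rows are hypothetical values invented for the example.

```python
import math

BASES = "ACGT"

def paired_probabilities(model_a, model_b):
    """Pair up P(base | context) from two models, one pair per dot in the scatter plot."""
    if set(model_a) != set(model_b):
        raise ValueError("models must cover the same contexts")
    return [(model_a[ctx][b], model_b[ctx][b]) for ctx in sorted(model_a) for b in BASES]

def rms_difference(model_a, model_b):
    """Root-mean-square difference; 0 means all dots lie on the 1:1 line."""
    pairs = paired_probabilities(model_a, model_b)
    return math.sqrt(sum((x - y) ** 2 for x, y in pairs) / len(pairs))

# Two rows of the E. coli K12 third-order model (from the table above).
ecoli = {
    "AAA": {"A": 0.3788, "C": 0.1715, "G": 0.1655, "T": 0.2842},
    "AAC": {"A": 0.3530, "C": 0.2052, "G": 0.2405, "T": 0.2014},
}
# Hypothetical rows for a closely related genome and for a GC-rich genome.
related = {
    "AAA": {"A": 0.37, "C": 0.18, "G": 0.17, "T": 0.28},
    "AAC": {"A": 0.35, "C": 0.21, "G": 0.24, "T": 0.20},
}
gc_rich = {
    "AAA": {"A": 0.15, "C": 0.35, "G": 0.35, "T": 0.15},
    "AAC": {"A": 0.15, "C": 0.35, "G": 0.35, "T": 0.15},
}

print(rms_difference(ecoli, related))  # small: models are interchangeable
print(rms_difference(ecoli, gc_rich))  # large: use a distinct background model
```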

Figure 1. Comparison of transition probabilities (panels 1.A and 1.B)


The tree in Figure 2 shows the relatedness between organisms based on their frequencies of oligonucleotides of length 4 (third-order transition probabilities). To construct this tree, each genome was characterized by its transition probability vector of order 3. The Euclidean distance was used to calculate pairwise distances between all 106 probability vectors, and a tree was constructed by complete linkage clustering (TREECON; Van de Peer and De Wachter, 1994). For organisms or entries that cluster together in this tree, background models can safely be exchanged (e.g. the base composition of the E. coli strains and the Salmonella strains is almost identical). If, however, two organisms are far apart in the tree, the use of a distinct background model is essential (see also Figure 1.B). This information can be useful for, e.g., phylogenetic footprinting: when searching for a motif in the intergenic regions of orthologs from distinct organisms with the Motif Sampler, it is advisable to check the background distributions of these species.
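
The clustering step can be illustrated with a small pure-Python sketch of Euclidean distances followed by complete linkage. It is not TREECON and makes several assumptions: the function names are hypothetical, and for brevity each organism is represented by its four single-nucleotide frequencies from the table above, as a low-dimensional stand-in for the 256-element third-order transition vector used for the actual tree.

```python
import math
from itertools import combinations

def euclidean(u, v):
    """Euclidean distance between two equally long probability vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def complete_linkage(vectors):
    """Agglomerative clustering; returns the merges as (cluster_a, cluster_b, distance)."""
    names = list(vectors)
    dist = {
        frozenset(pair): euclidean(vectors[pair[0]], vectors[pair[1]])
        for pair in combinations(names, 2)
    }
    clusters = [(name,) for name in names]
    merges = []

    def linkage(c1, c2):
        # Complete linkage: cluster distance is the largest member-to-member distance.
        return max(dist[frozenset((a, b))] for a in c1 for b in c2)

    while len(clusters) > 1:
        # Merge the two clusters that are closest under complete linkage.
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        merges.append((c1, c2, linkage(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges

# Stand-in vectors: single-nucleotide frequencies (A, C, G, T) from the table above.
vectors = {
    "E. coli K12": (0.295326, 0.205887, 0.20227, 0.296517),
    "S. typhimurium LT2": (0.290432, 0.20943, 0.209792, 0.290345),
    "S. coelicolor": (0.158916, 0.349382, 0.341986, 0.149716),
    "M. tuberculosis H37Rv": (0.196432, 0.304403, 0.313149, 0.186016),
}

merges = complete_linkage(vectors)
for left, right, distance in merges:
    print(left, "+", right, round(distance, 4))
```

With these stand-in vectors the two enterobacteria merge first and the two GC-rich genomes second, and the final merge joins the two groups at a much larger distance. This mirrors the guidance above: entries that merge early can share a background model, whereas groups that only join near the root should not.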

Figure 2



References

  • Thijs G et al. (2002) INCLUSive: INtegrated Clustering, Upstream sequence retrieval and motif Sampling. Bioinformatics 18:331-332.
  • Van de Peer Y, De Wachter R (1994) TREECON for Windows: a software package for the construction and drawing of evolutionary trees for the Microsoft Windows environment. Comput. Appl. Biosci. 10:569-570.



    This page is maintained by Gert Thijs. Last update 2002/07/02.
    Email: gert.thijs@esat.kuleuven.ac.be
    Copyright © 2002, KULeuven.