       A     C     G     T
    .------------------------- P(S[i]|S[i-1])
  A |  1/12  1/12  1/12  9/12
  C |  9/20  1/20  1/20  9/20
  G |  9/20  1/20  1/20  9/20
  T |  9/12  1/12  1/12  1/12
MMg: an AT-rich, order-1 Markov model.
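
As an illustration, here is a minimal Python sketch of drawing a sequence from an order-1 model such as MMg. The transition rows follow the table above, with each row summing to 1 (so P(T|C) = P(T|G) = 9/20). The function name `sample_mmg`, the uniform choice of the first character, and the seed are assumptions made for this example; they are not taken from the original program.

```python
import random
from fractions import Fraction as F

ALPHABET = "ACGT"

# P(S[i] | S[i-1]): one row per previous character, columns in A, C, G, T order.
MMG = {
    "A": [F(1, 12), F(1, 12), F(1, 12), F(9, 12)],
    "C": [F(9, 20), F(1, 20), F(1, 20), F(9, 20)],
    "G": [F(9, 20), F(1, 20), F(1, 20), F(9, 20)],
    "T": [F(9, 12), F(1, 12), F(1, 12), F(1, 12)],
}

def sample_mmg(n, seed=None):
    """Draw n characters from the order-1 model MMG (hypothetical helper)."""
    rng = random.Random(seed)
    seq = [rng.choice(ALPHABET)]  # assumption: first character is uniform
    while len(seq) < n:
        weights = [float(p) for p in MMG[seq[-1]]]
        seq.append(rng.choices(ALPHABET, weights=weights)[0])
    return "".join(seq[:n])

if __name__ == "__main__":
    print(sample_mmg(50, seed=1))
```

Exact fractions are used for the table so that the row sums can be checked exactly; they are converted to floats only when sampling.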
S1 and S2 are two unrelated sequences drawn from
a population modelled by an order-1 Markov model (MMg, above).
(The model is just an example, but it is not implausible:
the genome of Plasmodium falciparum is about 80% AT,
and AT-rich regions appear in other genomes.)
Assuming a uniform random population,
they appear to be related (alignment 390 bits against null 400 bits),
but assuming an order-0 or (better) an order-1
population model (whose parameters are learned from the data), they are
correctly seen to be unrelated (for order-1, null 283 bits against alignment 339 bits).
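
The null-hypothesis message lengths can be sketched in Python as follows. The uniform model costs exactly 2 bits per character. The adaptive order-k coder below uses a simple "+1" pseudo-count estimator, which is an assumption: the estimator used by the original program is not stated here, so the adaptive figures need not match the output that follows. The function names are hypothetical.

```python
from collections import defaultdict
from math import log2

# S1 as listed in the program output, with the spacing removed.
S1 = "".join(
    "GCTATAGTAA TGCTATAATG ATATATTATA TATCTATATA TATATTATAT "
    "ATACTAATAT GATAATATAT ATATATATCT ATAGTCATAT CTATATACAT".split()
)

def msg_len_uniform(seq, alphabet="ACGT"):
    """Bits to state seq when every character is equally likely."""
    return len(seq) * log2(len(alphabet))

def msg_len_adaptive(seq, order=0, alphabet="ACGT"):
    """Bits to state seq under an adaptive order-`order` Markov model.

    Assumption: each context starts with a count of 1 per character and the
    counts are updated after every character is encoded.
    """
    counts = defaultdict(lambda: dict.fromkeys(alphabet, 1))
    bits = 0.0
    for i, ch in enumerate(seq):
        ctx = seq[max(0, i - order):i]   # up to `order` preceding characters
        c = counts[ctx]
        bits += -log2(c[ch] / sum(c.values()))
        c[ch] += 1
    return bits

if __name__ == "__main__":
    print(f"uniform: {msg_len_uniform(S1):.1f} bits")
    print(f"order-0: {msg_len_adaptive(S1, 0):.1f} bits")
    print(f"order-1: {msg_len_adaptive(S1, 1):.1f} bits")
```

For S1 the uniform cost is 200.0 bits, which agrees with the `200.0{S1}` term in the uniform output; an AT-rich sequence costs fewer bits once the model is allowed to learn the character frequencies.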
> Align Compressible Sequences:
> S1:
1 GCTATAGTAA TGCTATAATG ATATATTATA TATCTATATA TATATTATAT
51 ATACTAATAT GATAATATAT ATATATATCT ATAGTCATAT CTATATACAT 100
> S2:
1 GCATGTATAT TATATATATA CTTATGTATG ATTATTATAT ATCATAGACT
51 ATCATATATT TATAATATAT CACATATATA TGATATACTA TGATATCTAT 100
> Models: 2 x Uniform:
> msgLen null = 400.0 bits = 200.0{S1} + 200.0{S2} = 2.0000 b/ch
> msgLen S1~S2 = 390.0 bits = 9.7+0.0+0.0{H} + 380.3{S1~S2|H} = 1.9498 b/ch
> GCTATAGTAATGCTATAATGATATA-TTATATATCTATA-TATATATTAT
> || || || || ||| || ||||| |||| ||| || ||||||| ||
> GC-AT-GT-ATATTAT-AT-ATATACTTATGTATGATTATTATATATCAT
> ATA-TACTAATATGATAATATATATAT-ATATCTATAGTCATAT-CTAT-
> | | || | |||| | ||| |||||| | || |||| | |||| ||||
> AGACTA-TCATATATTTATA-ATATATCACATATATA-TGATATACTATG
> ATA-C-AT
> ||| | ||
> ATATCTAT
> [Frequencies =:77.0 ~:15.0 i:8.0 d:8.0 tot:108.0]
> model implies ALIGNMENT:unrelated = 2^10.0 : 1 +/- a pinch of salt
> ---
> Models: 2 x Order-0 Markov:
> msgLen null = 339.4 bits = 167.4{S1} + 172.0{S2} = 1.6969 b/ch
> msgLen S1~S2 = 370.9 bits = 9.7+9.2+9.0{H} + 343.0{S1~S2|H} = 1.8545 b/ch
> GCTATAGTAATGCTATAATGATATA-TTATATATCTATA-TATATATTAT
> $$ || $| || ||| || ||||| |||| ||| || ||||||| ||
> GC-AT-GT-ATATTAT-AT-ATATACTTATGTATGATTATTATATATCAT
> ATA-TA-CTAATA-TGATAATATATATATATATCTATAGTCATATCTAT-
> | | || $ ||| | ||||||||| | |||| ||| $ ||| $|||
> AGACTATCATATATTTATAATATAT-CACATATATAT-GATATA-CTATG
> ATA-C-AT
> ||| $ ||
> ATATCTAT
> [Frequencies =:76.0 ~:16.0 i:8.0 d:8.0 tot:108.0]
> model implies alignment:UNRELATED = 1 : 2^31.5 +/- a pinch of salt
> ---
> Models: 2 x Order-1 Markov:
> msgLen null = 283.1 bits = 135.6{S1} + 147.4{S2} = 1.4153 b/ch
> msgLen S1~S2 = 339.2 bits = 9.7+26.4+26.1{H} + 277.0{S1~S2|H} = 1.6959 b/ch
> GCTATAGTAATGCTATAATGATATA-TTATATATCTATATATATATTAT-
> $$ || $| || ||| || ||||| |$|| ||| ||| |||| $||
> GC-AT-GT-ATATTAT-AT-ATATACTTATGTATGATTAT-TATA-TATC
> ATATACTA--ATATGATAATATATATAT-ATATCTATAGTCATAT-CTAT
> ||| |$|| $||| | ||| $||||| | || |||| | |||| $|||
> ATAGACTATCATATATTTATA-ATATATCACATATATA-TGATATACTAT
> -ATA-C-AT
> ||| $ ||
> GATATCTAT
> [Frequencies =:78.0 ~:13.0 i:9.0 d:9.0 tot:109.0]
> model implies alignment:UNRELATED = 1 : 2^56.1 +/- a pinch of salt
> --- end ---
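
The "model implies" line of each block follows from the two message lengths: the difference in bits is the log (base 2) of the odds in favour of the shorter hypothesis. A small Python sketch of that arithmetic, using the figures printed above (the function name is hypothetical):

```python
def log2_odds_related(null_bits, align_bits):
    """log2 of the odds alignment:unrelated; positive favours the alignment."""
    return null_bits - align_bits

# (model, msgLen null, msgLen S1~S2) as printed in the output above.
RESULTS = [
    ("uniform", 400.0, 390.0),
    ("order-0", 339.4, 370.9),
    ("order-1", 283.1, 339.2),
]

if __name__ == "__main__":
    for name, null_bits, align_bits in RESULTS:
        d = log2_odds_related(null_bits, align_bits)
        verdict = "related" if d > 0 else "unrelated"
        print(f"{name}: {d:+.1f} bits in favour of the alignment -> {verdict}")
```

Only the uniform model gives a positive difference (10.0 bits, odds of 2^10 : 1 for the alignment); the order-0 and order-1 models give 31.5 and 56.1 bits against it.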