## e.g., unrelated

```
       A     C     G     T
   .------------------------  P(S[i]|S[i-1])
 A| 1/12  1/12  1/12  9/12
 C| 9/20  1/20  1/20  9/20
 G| 9/20  1/20  1/20  9/20
 T| 9/12  1/12  1/12  1/12
```
MMg: an AT-rich, order-1 Markov model.

S1 and S2 are two unrelated sequences drawn from a population modelled by an order-1 Markov model (above). (The model is just an example, but it is not implausible: the genome of Plasmodium falciparum is 80% AT, and AT-rich regions appear in other genomes.)
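As a rough illustration of what "drawn from a population modelled by an order-1 Markov model" means, the sketch below samples a sequence from MMg. It is not the program that generated S1 and S2. Two things are assumptions: the C and G rows are taken to be 9/20, 1/20, 1/20, 9/20 so that every row sums to 1, and the first base is chosen uniformly because the page does not give an initial distribution. The names `MMG`, `sample_sequence` and `at_fraction` are hypothetical.

```python
import random
from fractions import Fraction as F

ALPHABET = "ACGT"

# Transition probabilities P(S[i] | S[i-1]); each key is the previous base and
# each list is ordered A, C, G, T.
# ASSUMPTION: the C and G rows are (9/20, 1/20, 1/20, 9/20) so rows sum to 1.
MMG = {
    "A": [F(1, 12), F(1, 12), F(1, 12), F(9, 12)],
    "C": [F(9, 20), F(1, 20), F(1, 20), F(9, 20)],
    "G": [F(9, 20), F(1, 20), F(1, 20), F(9, 20)],
    "T": [F(9, 12), F(1, 12), F(1, 12), F(1, 12)],
}


def sample_sequence(length, rng):
    """Sample `length` bases; each base depends only on the one before it."""
    # ASSUMPTION: the first base is uniform over the alphabet.
    seq = [rng.choice(ALPHABET)]
    while len(seq) < length:
        weights = [float(p) for p in MMG[seq[-1]]]
        seq.append(rng.choices(ALPHABET, weights=weights)[0])
    return "".join(seq)


def at_fraction(seq):
    """Fraction of the sequence that is A or T."""
    return sum(base in "AT" for base in seq) / len(seq)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    s = sample_sequence(100, rng)
    print(s)
    print("AT fraction:", at_fraction(s))
```

Under these assumed rows the long-run AT fraction of the chain works out to 27/32, about 84%, so two independent samples share many AT-rich runs purely by chance.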

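Under a uniform random population model, the sequences appear to be related: the alignment costs 390 bits against 400 bits for the null hypothesis. Under an order-0 or, better, an order-1 population model whose parameters are learned from the data, they are correctly seen to be unrelated: with the order-1 model the null hypothesis costs 283 bits against 339 bits for the alignment.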
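To make "a population model whose parameters are learned from the data" concrete, here is a minimal sketch of the null-hypothesis cost of a single sequence under a uniform model and under an adaptive order-k Markov model. The adaptive scheme with +1 pseudo-counts is an assumption, as are the function names; the program behind the figures on this page may estimate and cost its parameters differently, so the totals from this sketch need not match its output.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"

# S1 as listed on this page, with the spacing between blocks of ten removed.
S1 = "".join(
    "GCTATAGTAA TGCTATAATG ATATATTATA TATCTATATA TATATTATAT "
    "ATACTAATAT GATAATATAT ATATATATCT ATAGTCATAT CTATATACAT".split()
)


def uniform_bits(seq):
    """Cost under the uniform model: log2(4) = 2 bits per base."""
    return len(seq) * math.log2(len(ALPHABET))


def adaptive_markov_bits(seq, order):
    """Cost under an adaptive order-`order` Markov model.

    ASSUMPTION: every context starts with a count of 1 per base, and counts
    are updated after each base is coded, so no separate parameter statement
    is needed.
    """
    counts = defaultdict(lambda: dict.fromkeys(ALPHABET, 1))
    bits = 0.0
    for i, base in enumerate(seq):
        context = seq[max(0, i - order):i]  # shorter at the very start
        ctx = counts[context]
        bits -= math.log2(ctx[base] / sum(ctx.values()))
        ctx[base] += 1
    return bits


if __name__ == "__main__":
    print("uniform :", uniform_bits(S1))  # 100 bases x 2 bits = 200.0
    print("order-0 :", round(adaptive_markov_bits(S1, 0), 1))
    print("order-1 :", round(adaptive_markov_bits(S1, 1), 1))
```

The point of the comparison is that an AT-rich sequence is compressible on its own, so a fair test of relatedness has to charge the null hypothesis the compressed cost rather than a flat 2 bits per base.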

```
Align Compressible Sequences:

S1:
1 GCTATAGTAA TGCTATAATG ATATATTATA TATCTATATA TATATTATAT
51 ATACTAATAT GATAATATAT ATATATATCT ATAGTCATAT CTATATACAT  100

S2:
1 GCATGTATAT TATATATATA CTTATGTATG ATTATTATAT ATCATAGACT
51 ATCATATATT TATAATATAT CACATATATA TGATATACTA TGATATCTAT  100

Models: 2 x Uniform:
msgLen  null = 400.0 bits = 200.0{S1} + 200.0{S2} = 2.0000 b/ch
msgLen S1~S2 = 390.0 bits = 9.7+0.0+0.0{H} + 380.3{S1~S2|H} = 1.9498 b/ch
GCTATAGTAATGCTATAATGATATA-TTATATATCTATA-TATATATTAT
|| || || ||  ||| || ||||| |||| |||   || ||||||| ||
GC-AT-GT-ATATTAT-AT-ATATACTTATGTATGATTATTATATATCAT

ATA-TACTAATATGATAATATATATAT-ATATCTATAGTCATAT-CTAT-
| | || | ||||  | ||| |||||| | || |||| | |||| ||||
AGACTA-TCATATATTTATA-ATATATCACATATATA-TGATATACTATG

ATA-C-AT
||| | ||
ATATCTAT

[Frequencies =:77.0 ~:15.0 i:8.0 d:8.0 tot:108.0]
model implies  ALIGNMENT:unrelated = 2^10.0 : 1  +/- a pinch of salt
---
```

The sequences appear to be related under the uniform population model above, but with an order-0 population model, learned from the data, we get ...

```
Models: 2 x Order-0 Markov:
msgLen  null = 339.4 bits = 167.4{S1} + 172.0{S2} = 1.6969 b/ch
msgLen S1~S2 = 370.9 bits = 9.7+9.2+9.0{H} + 343.0{S1~S2|H} = 1.8545 b/ch
GCTATAGTAATGCTATAATGATATA-TTATATATCTATA-TATATATTAT
$$ || $| ||  ||| || ||||| |||| |||   || ||||||| ||
GC-AT-GT-ATATTAT-AT-ATATACTTATGTATGATTATTATATATCAT

ATA-TA-CTAATA-TGATAATATATATATATATCTATAGTCATATCTAT-
| | || $  ||| | |||||||||  | |||| ||| $  ||| $|||
AGACTATCATATATTTATAATATAT-CACATATATAT-GATATA-CTATG

ATA-C-AT
||| $ ||
ATATCTAT

[Frequencies =:76.0 ~:16.0 i:8.0 d:8.0 tot:108.0]
model implies  alignment:UNRELATED = 1 : 2^31.5  +/- a pinch of salt
---
```

. . . the sequences are seen to be unrelated. The same conclusion is reached, with even greater confidence, if an order-1 population model (learned from the data) is used ...

```
Models: 2 x Order-1 Markov:
msgLen  null = 283.1 bits = 135.6{S1} + 147.4{S2} = 1.4153 b/ch
msgLen S1~S2 = 339.2 bits = 9.7+26.4+26.1{H} + 277.0{S1~S2|H} = 1.6959 b/ch
GCTATAGTAATGCTATAATGATATA-TTATATATCTATATATATATTAT-
$$ || $| ||  ||| || ||||| |$|| |||   ||| |||| $||
GC-AT-GT-ATATTAT-AT-ATATACTTATGTATGATTAT-TATA-TATC

ATATACTA--ATATGATAATATATATAT-ATATCTATAGTCATAT-CTAT
||| |$||  $|||  | ||| $||||| | || |||| | |||| $|||
ATAGACTATCATATATTTATA-ATATATCACATATATA-TGATATACTAT

-ATA-C-AT
 ||| $ ||
GATATCTAT

[Frequencies =:78.0 ~:13.0 i:9.0 d:9.0 tot:109.0]
model implies  alignment:UNRELATED = 1 : 2^56.1  +/- a pinch of salt
-- end ---
```

. . . fairly strong odds that the sequences are unrelated, as is truly the case.
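The odds quoted in each output block follow directly from the two message lengths: the difference in bits between the null hypothesis and the alignment hypothesis is the log-odds, base 2. The short sketch below redoes that arithmetic with the figures reported above; the function name `log_odds_bits` is hypothetical.

```python
def log_odds_bits(null_bits, alignment_bits):
    """Log-odds (base 2) in favour of the sequences being related.

    Positive: the alignment hypothesis gives the shorter message (related).
    Negative: the null hypothesis gives the shorter message (unrelated).
    """
    return null_bits - alignment_bits


# (null, S1~S2) message lengths in bits, copied from the outputs above.
RESULTS = {
    "uniform": (400.0, 390.0),
    "order-0": (339.4, 370.9),
    "order-1": (283.1, 339.2),
}

if __name__ == "__main__":
    for name, (null_bits, align_bits) in RESULTS.items():
        d = round(log_odds_bits(null_bits, align_bits), 1)
        if d > 0:
            print(f"{name}: alignment:unrelated = 2^{d} : 1")
        else:
            print(f"{name}: alignment:unrelated = 1 : 2^{-d}")
```

This gives 2^10.0 : 1 for the uniform model, 1 : 2^31.5 for order-0 and 1 : 2^56.1 for order-1, which are the odds shown in the three outputs.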


