DNA Sequence Compression Example
> Compress: L.Allison, Computer Science, Monash University 4/1998
1 TGATAGGTGA TAGATAGATT GATAGATGAT AGAAGATTGA TAGATGATAG
51 ATACATAGGT GATAGTAGAT GTAAGATGAT AGATGATAGA TAGATAGATG
101 ATAGACAGAT TGATAGATGA TAGAGAGA 128
> order-0 Markov Model
> [plot: vertical axis 0.0 to 4.0 b in 0.5 b steps, horizontal rule at 2.0 b]
> compress: Sequence length=128, |Alphabet|=4, log2(|Alphabet|) =2.0000
> hypothesis: (H) =10.0 bits
> data: (D|H) =211.0 bits, =1.6487 b/ch
> total: (H)+(D|H) =221.0 bits, =1.7269 b/ch
> ran 00/01/21 from 15:32:55 to 15:32:55
> order-1 Markov Model
> [plot: vertical axis 0.0 to 4.0 b in 0.5 b steps, horizontal rule at 2.0 b]
> compress: Sequence length=128, |Alphabet|=4, log2(|Alphabet|) =2.0000
> hypothesis: (H) =30.0 bits
> data: (D|H) =152.1 bits, =1.1886 b/ch
> total: (H)+(D|H) =182.2 bits, =1.4233 b/ch
> ran 00/01/21 from 15:32:55 to 15:32:55
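The per-character data costs of the order-0 and order-1 models can be approximated with a short sketch. The Python below is not the Compress program. It assumes a plain plug-in estimate, in which each conditional probability is the relative frequency observed in the sequence itself, and the names `SEQ` and `empirical_bits_per_char` are hypothetical. It leaves out the hypothesis cost (H) and does not charge for the first `order` characters, so its figures need not agree exactly with the (D|H) lines.

```python
import math
from collections import Counter

# The 128-character example sequence, copied from the listing at the top.
SEQ = (
    "TGATAGGTGATAGATAGATTGATAGATGATAGAAGATTGATAGATGATAG"
    "ATACATAGGTGATAGTAGATGTAAGATGATAGATGATAGATAGATAGATG"
    "ATAGACAGATTGATAGATGATAGAGAGA"
)


def empirical_bits_per_char(seq: str, order: int) -> float:
    """Average of -log2 P(char | previous `order` chars) over the sequence.

    P is the relative frequency observed in `seq` itself (an assumption,
    not necessarily the estimator used by the original program).
    """
    context_counts = Counter()  # how often each context occurs
    pair_counts = Counter()     # how often each (context, next char) occurs
    for i in range(order, len(seq)):
        context = seq[i - order:i]  # empty string when order == 0
        context_counts[context] += 1
        pair_counts[(context, seq[i])] += 1

    total_bits = 0.0
    for (context, _char), n in pair_counts.items():
        total_bits += n * -math.log2(n / context_counts[context])
    return total_bits / (len(seq) - order)


if __name__ == "__main__":
    for k in (0, 1):
        print(f"order-{k}: {empirical_bits_per_char(SEQ, k):.4f} b/ch")
```

A higher order conditions each character on more context, which lowers the data cost but raises the number of parameters that the hypothesis part has to state.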
> AED fwd approx repeats
> [Frequencies B:58.6 R:3.2 C:68.2 E:3.2 =:66.4 ~:2.0 i:1.0 d:2.1 tot:204.8]
> [Frequencies B:41.0 R:4.2 C:86.0 E:4.2 =:83.6 ~:2.0 i:1.4 d:3.3 tot:225.8]
> [Frequencies B:37.3 R:4.8 C:89.3 E:4.8 =:87.4 ~:1.7 i:1.5 d:3.6 tot:230.6]
> [plot: vertical axis 0.0 to 4.0 b in 0.5 b steps, horizontal rule at 2.0 b]
> compress: Sequence length=128, |Alphabet|=4, log2(|Alphabet|) =2.0000
> hypothesis: (H) =49.5 bits
> data: (D|H) =128.5 bits, =1.0040 b/ch
> total: (H)+(D|H) =178.0 bits, =1.3906 b/ch
> ran 00/01/21 from 15:32:55 to 15:32:58
> --- end ---
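The three totals can be set against the 2.0 bits per character of an uncompressed 4-letter alphabet, i.e. 128 x 2.0 = 256 bits. The sketch below only redoes that arithmetic on the reported totals; `REPORTED_TOTAL_BITS` and `summarize` are illustrative names and are not part of the Compress output.

```python
LENGTH = 128             # "Sequence length" from the compress: lines
RAW_BITS_PER_CHAR = 2.0  # log2(|Alphabet|) for a 4-letter alphabet

# Totals (H)+(D|H) as printed in the three summaries.
REPORTED_TOTAL_BITS = {
    "order-0 Markov Model": 221.0,
    "order-1 Markov Model": 182.2,
    "AED fwd approx repeats": 178.0,
}


def summarize(total_bits, length=LENGTH, raw=RAW_BITS_PER_CHAR):
    """Return (bits per character, bits saved versus the raw encoding)."""
    baseline = length * raw
    return total_bits / length, baseline - total_bits


if __name__ == "__main__":
    for name, total in REPORTED_TOTAL_BITS.items():
        rate, saved = summarize(total)
        print(f"{name}: {rate:.4f} b/ch, {saved:.1f} bits below raw")
```

Because the totals are printed to one decimal place, the recomputed rates can differ from the listed b/ch values in the last digits.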
------------------------------------------------------------------------------
L.Allison, Computer Science and SWE, Monash University, Australia 3168
http://www.csse.monash.edu.au/~lloyd/tildeStrings/
Fri Jan 21 15:32:58 EST 2000