Given: a data set on a certain protein from various individuals. The data set includes a set-valued variable (attribute, column), mSets. For each individual, the set contains the positions at which mutations are observed in the protein compared to the "normal" protein. ms is the union of all the individual sets; 100+ positions have been observed to mutate at least once. Most of the sets are sparse and contain fewer than six mutations. The mutations could be stored as 100+ Boolean variables, but a set-valued representation is convenient and is the way that the data were presented.
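The relation between the set-valued form and the Boolean-column form can be sketched in a few lines. This is only an illustration: the toy data in exampleSets are made up, the names allPositions and indicator are hypothetical, and sets are assumed to be plain lists, which is consistent with the use of elem on mSets in the pairwise code.

```haskell
import Data.List (nub, sort)

-- Toy data, invented for illustration: each inner list is one
-- individual's set of mutated positions.
exampleSets :: [[Int]]
exampleSets = [[3, 17], [], [17, 42], [3], [3, 17, 99]]

-- The union of all individual sets (the role played by ms),
-- as a sorted, duplicate-free list.
allPositions :: [[Int]] -> [Int]
allPositions = sort . nub . concat

-- One Boolean column for a given position:
-- True where the individual's set contains that position.
indicator :: Int -> [[Int]] -> [Bool]
indicator pos = map (elem pos)
```

With these definitions, indicator 17 exampleSets yields one Boolean per individual, so the 100+ Boolean variables never need to be stored; each column can be derived from the sets when it is wanted.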

We want to see if knowing whether or not a set contains mutation 1 tells us anything about whether or not it is likely to contain mutation 2. We choose pairs of different mutation positions, mut1 and mut2, from ms, and compute the information content of mut2 in two ways: first independent of mut1, by indep = estMultiState mut2s, indepCost = msg2 indep mut2s, and then dependent on mut1, by depd = estFiniteFunction mut1s mut2s, depdCost = msg2 (functionModel2model depd) mut12s. The difference between these costs, save = indepCost - depdCost, shows how much information, if any, knowing about mut1 gives us on mut2. (Note that this quantity is not necessarily symmetric between mut1 and mut2.) For each mut1 the savings are sorted into decreasing order and those that are not positive are discarded; the surviving lists are then sorted by their best saving.
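To make the idea of a "saving" concrete without the library, here is a minimal sketch that swaps in a simple adaptive (add-one) code for two-state data. It is a stand-in technique, not the library's estMultiState, estFiniteFunction or msg2, and the names adaptiveCost, conditionalCost and saving are hypothetical.

```haskell
-- Length in bits of a Boolean sequence under an adaptive add-one code:
-- each symbol costs -log2 of its current smoothed relative frequency.
-- (A stand-in for the library's cost functions, not a copy of them.)
adaptiveCost :: [Bool] -> Double
adaptiveCost = go 0 0
  where
    go :: Int -> Int -> [Bool] -> Double
    go _ _ [] = 0
    go t f (x : xs) =
      let n = fromIntegral (t + f + 2)
          c = fromIntegral (if x then t + 1 else f + 1)
          bits = negate (logBase 2 (c / n))
      in bits + (if x then go (t + 1) f xs else go t (f + 1) xs)

-- Cost of ys when the code used for each y may depend on the paired x:
-- one adaptive code for the ys where x is True, another where x is False.
conditionalCost :: [Bool] -> [Bool] -> Double
conditionalCost xs ys =
  adaptiveCost [y | (x, y) <- zip xs ys, x]
    + adaptiveCost [y | (x, y) <- zip xs ys, not x]

-- Bits saved on ys by knowing xs; it can be negative.
saving :: [Bool] -> [Bool] -> Double
saving xs ys = adaptiveCost ys - conditionalCost xs ys
```

When ys simply repeats xs, for example eight True values followed by eight False values, the saving is positive. When ys is unrelated to xs the saving can come out slightly negative, because two separate codes each have to be learned from half of the data.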

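import Data.List (sortBy)   -- sortBy is not in the Prelude
-- estMultiState, estFiniteFunction, msg2 and functionModel2model come from
-- the Inductive Programming library (its imports are not shown here).

-- For each mut1: the triples (mut1, mut2, save) with save > 0, best first;
-- the non-empty lists are themselves ordered by their best saving.
pairwise mSets ms =
 ((sortBy (\((_, _, s1):_) -> \((_, _, s2):_) -> compare s2 s1))
 .(filter (not.null)))
 [ m1s |
   mut1 <- ms,   -- for each mutation position, mut1
   let
     mut1s = map (elem mut1) mSets   -- does each set contain mut1?
     m1s = ((takeWhile (\(_, _, s) -> s > 0))   -- keep positive savings
           .(sortBy (\(_, _, s1) -> \(_, _, s2) -> compare s2 s1)))
           [ (mut1, mut2, save) |
             mut2 <- ms,           -- for each mutation position, mut2
             mut1 /= mut2,         -- must differ
             let
               mut2s  = map (elem mut2) mSets   -- does each set contain mut2?
               mut12s = zip mut1s mut2s
               indep  = estMultiState           mut2s                -- 2
               depd   = estFiniteFunction mut1s mut2s                -- 2|1
               indepCost = msg2                     indep  mut2s   -- cost 2
               depdCost  = msg2 (functionModel2model depd) mut12s  -- cost 2|1
               save = indepCost - depdCost
           ]  -- mut2 within mut1
 ]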
11/2005
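The control structure of pairwise can be tried out without the library by passing the saving calculation in as a parameter. The following is a sketch under that assumption: pairwiseWith and its argument saveFn are hypothetical names, and saveFn stands in for the estMultiState / estFiniteFunction / msg2 computation.

```haskell
import Data.List (sortBy)
import Data.Ord (comparing, Down (..))

-- Same shape as pairwise, but the saving for a pair of Boolean columns
-- is computed by a caller-supplied function (an assumption of this sketch).
pairwiseWith :: Eq a
             => ([Bool] -> [Bool] -> Double)
             -> [[a]] -> [a] -> [[(a, a, Double)]]
pairwiseWith saveFn mSets ms =
  sortBy (comparing (Down . best)) (filter (not . null) groups)
  where
    third (_, _, s) = s
    -- Best saving of a group; groups are non-empty once filtered.
    best ((_, _, s) : _) = s
    best []              = 0
    groups =
      [ takeWhile ((> 0) . third) (sortBy (comparing (Down . third)) row)
      | mut1 <- ms
      , let mut1s = map (elem mut1) mSets
            row = [ (mut1, mut2, saveFn mut1s mut2s)
                  | mut2 <- ms
                  , mut1 /= mut2
                  , let mut2s = map (elem mut2) mSets ]
      ]
```

Any function of type [Bool] -> [Bool] -> Double can be supplied as saveFn, which makes it easy to check the sorting and filtering on a handful of toy sets before a real cost function is plugged in.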

map, zip, elem, takeWhile, sortBy, and list comprehensions [exp | x<-s, ...] are standard Haskell-98 features (sortBy comes from the standard List library rather than the Prelude). estMultiState, estFiniteFunction (regression), msg2, and functionModel2model are part of the Inductive Programming machine-learning library[1.0,1.1].
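For readers new to these features, a small example unrelated to the mutation data may help. The function names oddSquares and topAbove are made up for illustration.

```haskell
import Data.List (sortBy)

-- A list comprehension with a generator, a guard and a local let:
-- the squares of the odd numbers in xs.
oddSquares :: [Int] -> [Int]
oddSquares xs = [ sq | x <- xs, odd x, let sq = x * x ]

-- Sort into decreasing order, then keep the leading run of values
-- above a limit (the same sortBy / takeWhile pattern as in pairwise).
topAbove :: Int -> [Int] -> [Int]
topAbove limit = takeWhile (> limit) . sortBy (\a b -> compare b a)
```

Because the list is sorted before takeWhile is applied, stopping at the first value that fails the test discards exactly the values that are not above the limit.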

(See TID for more information on this problem.)

© L. Allison   http://www.allisons.org/ll/   (or as otherwise indicated),
Faculty of Information Technology (Clayton), Monash University, Australia 3800 (6/'05 was School of Computer Science and Software Engineering, Fac. Info. Tech., Monash University,
was Department of Computer Science, Fac. Comp. & Info. Tech., '89 was Department of Computer Science, Fac. Sci., '68-'71 was Department of Information Science, Fac. Sci.)