Given: a data set on a certain protein from various individuals. The data set includes a variable (attribute, column), mSets, which is set-valued. For each individual, the set contains the positions at which mutations are observed in the protein compared to the "normal" protein. ms is the union of all the individual sets; 100+ positions have been observed to mutate at least once. Most of the sets are sparse and contain fewer than six mutations. The mutations could be stored as 100+ Boolean variables, but a set-valued representation is convenient and is the way that the data were presented.
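
As a small illustration of the two representations, the sketch below derives a Boolean column for one position from the set-valued data on demand. It assumes that positions are Ints and that each set is a plain list; the names Position, toyMSets, allPositions and column are hypothetical, as are the toy values, and none of them come from the real data set or the library.

```haskell
import Data.List (nub, sort)

-- Assumption: a mutation position is just an Int.
type Position = Int

-- Hypothetical toy data: one list of mutated positions per individual.
toyMSets :: [[Position]]
toyMSets = [[3, 17], [], [17], [3, 17, 42], [42]]

-- Every position seen at least once, i.e. the role played by ms in the text.
allPositions :: [[Position]] -> [Position]
allPositions = sort . nub . concat

-- The Boolean column for one position: does each individual carry it?
column :: Position -> [[Position]] -> [Bool]
column mut = map (elem mut)
```

With 100+ positions and mostly sparse sets, computing a column only when it is needed avoids storing a wide, mostly-False table.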

We want to see whether knowing that a set does or does not contain one mutation, mut1, tells us anything about whether it is likely to contain another mutation, mut2. For each ordered pair of different mutation positions, mut1 and mut2 from ms, compute the information content of mut2, first independently of mut1 by indep = estMultiState mut2s, indepCost = msg2 indep mut2s, and then dependent on mut1 by depd = estFiniteFunction mut1s mut2s, depdCost = msg2 (functionModel2model depd) mut12s. The difference, save = indepCost - depdCost, shows how much information, if any, knowing about mut1 gives us about mut2. (Note that this quantity is not necessarily symmetric between mut1 and mut2.) For each mut1 the savings are sorted in decreasing order and only the positive ones are kept; the non-empty lists that remain are then sorted by their largest saving.

-- sortBy is from Data.List (module List in Haskell 98).
pairwise mSets ms =
 ((sortBy (\((_, _, s1):_) -> \((_, _, s2):_) -> compare s2 s1))  -- best mut1 first
 .(filter (not.null)))              -- drop any mut1 with no positive saving
 [ m1s |
   mut1 <- ms,   -- for each mutation position, mut1
   let
     mut1s = map (elem mut1) mSets  -- is mut1 present in each individual?
     m1s = ((takeWhile (\(_, _, s) -> s > 0))  -- keep positive savings only
           .(sortBy (\(_, _, s1) -> \(_, _, s2) -> compare s2 s1)))
           [ (mut1, mut2, save) |
             mut2 <- ms,           -- for each mutation position, mut2
             mut1 /= mut2,         -- must differ
             let
               mut2s  = map (elem mut2) mSets
               mut12s = zip mut1s mut2s
               indep  = estMultiState           mut2s               -- 2
               depd   = estFiniteFunction mut1s mut2s               -- 2|1
               indepCost = msg2                     indep  mut2s  --cost 2
               depdCost  = msg2 (functionModel2model depd) mut12s --cost 2|1
               save = indepCost - depdCost
           ]  -- mut2 within mut1
 ]  -- mut1

map, zip, elem, takeWhile, sortBy, and list comprehensions [exp| x<-s, ...] are standard Haskell-98 features. estMultiState, estFiniteFunction (regression), msg2, and functionModel2model are part of the Inductive Programming machine-learning library[1.0,1.1].
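
The library functions are not reproduced here. As a rough, self-contained sketch of the same idea, the code below swaps in a simple adaptive code for Boolean sequences in place of estMultiState, estFiniteFunction and msg2; it is not the library's implementation and its costs need not match the library's. The names adaptiveBits, conditionalBits and saving are hypothetical.

```haskell
-- Assumption: a stand-in cost, not the library's msg2.
-- Adaptive-code length, in bits, of a Boolean sequence: each symbol is
-- coded with probability (count of that symbol so far + 1) / (seen + 2).
adaptiveBits :: [Bool] -> Double
adaptiveBits = go 0 0
  where
    go :: Int -> Int -> [Bool] -> Double
    go _ _ [] = 0
    go t f (b:bs) =
      let c    = if b then t else f
          p    = fromIntegral (c + 1) / fromIntegral (t + f + 2)
          rest = if b then go (t + 1) f bs else go t (f + 1) bs
      in negate (logBase 2 p) + rest

-- Cost of ys given xs: the ys that accompany x = True and the ys that
-- accompany x = False are coded as two separate sequences.
conditionalBits :: [Bool] -> [Bool] -> Double
conditionalBits xs ys =
  adaptiveBits [y | (x, y) <- zip xs ys, x]
    + adaptiveBits [y | (x, y) <- zip xs ys, not x]

-- Bits saved on ys by knowing xs; it may be negative.
saving :: [Bool] -> [Bool] -> Double
saving xs ys = adaptiveBits ys - conditionalBits xs ys
```

Under this stand-in, saving mut1s mut2s plays the role of save in pairwise. Splitting the data in two is not free, because each part must learn its own frequencies, so unrelated columns tend to give a negative saving, which is why only positive savings are kept.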

(See TID for more information on this problem.)


© L. Allison   http://www.allisons.org/ll/   (or as otherwise indicated),
Faculty of Information Technology (Clayton), Monash University, Australia 3800 (6/'05 was School of Computer Science and Software Engineering, Fac. Info. Tech., Monash University,
was Department of Computer Science, Fac. Comp. & Info. Tech., '89 was Department of Computer Science, Fac. Sci., '68-'71 was Department of Information Science, Fac. Sci.)
Created with "vi (Linux + Solaris)",  charset=iso-8859-1,  fetched Monday, 31-Aug-2015 18:47:33 EST.