## V

FP II Ver. 1.1

Given: a data set on a certain protein from various individuals. The data set includes a variable (attribute, column), `mSets`, which is set-valued: for each individual, the set contains the positions at which mutations are observed in the protein, compared with the "normal" protein. `ms` is the union of all the individual sets; more than 100 positions have been observed to mutate at least once. Most of the sets are sparse, containing fewer than six mutations. The mutations could be stored as 100+ Boolean variables, but a set-valued representation is convenient, and it is the form in which the data were presented.
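To make the two representations concrete, here is a small sketch. The toy data `mSetsExample` and the helpers `unionOfSets`, `indicator` and `booleanColumns` are hypothetical names introduced only for illustration; `indicator pos` builds the same Boolean column as `map (elem mut1) mSets` does inside `pairwise`.

```haskell
import Data.List (nub, sort)

-- Hypothetical toy data: each inner list is one individual's set of
-- mutated positions (the real data have 100+ distinct positions).
mSetsExample :: [[Int]]
mSetsExample = [[3, 17], [17], [], [3, 17, 42], [42]]

-- ms: the union of all the individual sets, as a sorted list.
unionOfSets :: Ord a => [[a]] -> [a]
unionOfSets = sort . nub . concat

-- The equivalent Boolean variable for one position: True exactly where
-- that individual's set contains the position.
indicator :: Eq a => a -> [[a]] -> [Bool]
indicator pos = map (elem pos)

-- The whole data set as one Boolean column per observed position.
booleanColumns :: Ord a => [[a]] -> [(a, [Bool])]
booleanColumns sets = [ (pos, indicator pos sets) | pos <- unionOfSets sets ]
```

Here `unionOfSets mSetsExample` evaluates to `[3, 17, 42]`. With the real data, `booleanColumns` would yield 100+ columns that are almost entirely `False`, which is why the sparse set form is the more convenient one.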

We want to see whether knowing if a set contains mutation 1 tells us anything about whether it is likely to contain mutation 2. We choose pairs of different mutation positions, `mut1` and `mut2`, from `ms`. Here `mut1s` and `mut2s` are the Boolean columns recording, for each individual, whether its set contains `mut1` (respectively `mut2`), and `mut12s` pairs them up. We compute the information content of `mut2` first independently of `mut1`, by `indep = estMultiState mut2s` and `indepCost = msg2 indep mut2s`, and then conditionally on `mut1`, by `depd = estFiniteFunction mut1s mut2s` and `depdCost = msg2 (functionModel2model depd) mut12s`. The difference between these costs, `save = indepCost - depdCost`, shows how much information, if any, knowing about `mut1` gives us about `mut2`. (Note that this quantity is not necessarily symmetric in `mut1` and `mut2`.) The savings are sorted in decreasing order, and those that are not positive are discarded.
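The library's `estMultiState`, `estFiniteFunction` and `msg2` are not reproduced here. As an assumed stand-in only, the sketch below codes a Boolean column with an adaptive Laplace (add-one) estimator, whose code length includes the cost of learning the probabilities from the data. The conditional cost keeps separate counts for the individuals with and without `mut1`, and `mut1s` itself is treated as already known. The names `laplaceBits`, `indepBits`, `depdBits` and `savingBits` are hypothetical, and the numbers will not match the library's.

```haskell
-- Stand-in only: an adaptive Laplace code, NOT the library's estimators.

-- Code length in bits of a Boolean sequence: each item costs -log2 of
-- its current estimate (count + 1) / (items seen so far + 2).
laplaceBits :: [Bool] -> Double
laplaceBits = go 0 0
  where
    go :: Int -> Int -> [Bool] -> Double
    go _ _ [] = 0
    go nT nF (b : bs) =
      let c = if b then nT else nF
          p = fromIntegral (c + 1) / fromIntegral (nT + nF + 2)
          (nT', nF') = if b then (nT + 1, nF) else (nT, nF + 1)
      in negate (logBase 2 p) + go nT' nF' bs

-- Cost of mut2s ignoring mut1 (plays the role of indepCost).
indepBits :: [Bool] -> Double
indepBits = laplaceBits

-- Cost of mut2s given mut1s (plays the role of depdCost): one adaptive
-- code for individuals that have mut1 and another for those that do not.
depdBits :: [Bool] -> [Bool] -> Double
depdBits mut1s mut2s =
  laplaceBits [m2 | (m1, m2) <- pairs, m1]
    + laplaceBits [m2 | (m1, m2) <- pairs, not m1]
  where
    pairs = zip mut1s mut2s

-- Saving from knowing mut1; this can be negative.
savingBits :: [Bool] -> [Bool] -> Double
savingBits mut1s mut2s = indepBits mut2s - depdBits mut1s mut2s
```

Under this stand-in, `savingBits xs xs` is large and positive when `mut2` simply copies `mut1`. It is negative when `mut2` is unrelated to `mut1`, because the conditional code must learn two sets of counts instead of one. This illustrates how a saving can come out negative, and why only positive savings indicate a useful dependency.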

```haskell
import Data.List (sortBy)

-- estMultiState, estFiniteFunction, msg2 and functionModel2model come
-- from the Inductive Programming library; they are not defined here.

pairwise mSets ms =
  ( (sortBy (\((_, _, s1):_) -> \((_, _, s2):_) -> compare s2 s1))
  . (filter (not . null)) )
  [ m1s
  | mut1 <- ms,                          -- for each mutation position, mut1
    let mut1s = map (elem mut1) mSets
        m1s = ( (takeWhile (\(_, _, s) -> s > 0))
              . (sortBy (\(_, _, s1) -> \(_, _, s2) -> compare s2 s1)) )
              [ (mut1, mut2, save)
              | mut2 <- ms,              -- for each mutation position, mut2
                mut1 /= mut2,            -- must differ
                let mut2s = map (elem mut2) mSets
                    mut12s = zip mut1s mut2s
                    indep = estMultiState mut2s                          -- 2
                    depd = estFiniteFunction mut1s mut2s                 -- 2|1
                    indepCost = msg2 indep mut2s                         -- cost 2
                    depdCost = msg2 (functionModel2model depd) mut12s    -- cost 2|1
                    save = indepCost - depdCost
              ]                          -- mut2 within mut1
  ]
```

11/2005
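The control structure of `pairwise` can be exercised without the library by passing the saving in as a function. The sketch below is a hypothetical variant, `pairwiseWith`, whose extra argument `saveFn` stands in for `indepCost - depdCost`. It uses `comparing` and `Down` from `Data.Ord` in place of the hand-written comparison lambdas, which gives the same descending, stable order.

```haskell
import Data.List (sortBy)
import Data.Ord (Down (..), comparing)

-- Hypothetical variant of pairwise: the library-based saving is replaced
-- by a caller-supplied function saveFn applied to the mut1 and mut2 columns.
pairwiseWith :: Eq a
             => ([Bool] -> [Bool] -> Double)  -- saveFn mut1s mut2s
             -> [[a]]                         -- mSets: one set per individual
             -> [a]                           -- ms: every observed position
             -> [[(a, a, Double)]]
pairwiseWith saveFn mSets ms =
  -- non-empty groups, ordered by their best saving, largest first
  sortBy (comparing (Down . bestSave)) (filter (not . null) groups)
  where
    -- Boolean column: does each individual's set contain pos?
    column pos = map (elem pos) mSets
    -- for each mut1, its positive savings against every other mut2,
    -- largest first
    groups =
      [ takeWhile (\(_, _, s) -> s > 0)
          (sortBy (comparing (\(_, _, s) -> Down s))
             [ (mut1, mut2, saveFn (column mut1) (column mut2))
             | mut2 <- ms, mut1 /= mut2 ])
      | mut1 <- ms ]
    bestSave ((_, _, s) : _) = s
    bestSave [] = 0  -- not reached: empty groups are filtered out
```

For instance, with a toy `saveFn` that scores agreement between the two columns, positions that always co-occur come out as mutual top pairs, while a position with no positive saving against any other is dropped entirely by `filter (not . null)`.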

`map`, `zip`, `elem`, `takeWhile`, `sortBy` (from the `List` library module, `Data.List` in current Haskell) and list comprehensions `[exp | x <- s, ...]` are standard Haskell 98 features. `estMultiState`, `estFiniteFunction` (regression), `msg2` and `functionModel2model` are part of the Inductive Programming machine-learning library [1.0, 1.1].