Search engine run on: http://www.csse.monash.edu.au/
Glookbib search for: MolBio protein structure LAllison
%A J. Collier
%A L. Allison
%A A. Lesk
%A M. Garcia de La Banda
%A A. Konagurthu
%T A new statistical framework to assess structural alignment quality using
information compression
%J ECCB
%W Strasbourg
%M SEP
%D 2014
%K conf, ECCB 14, MolBio, c2014, c201x, c20xx, zz0914, LAllison, ArunK, AMLesk,
protein, 3D, similar, structure, alignment, match, MML, MDL, AIC, complexity,
bioinformatics, 13th European Conference on Computational Biology,
I value, Ivalue,
%X "... proposes a new statistical framework to assess structural alignment
quality and significance based on lossless information compression. This is
a radical departure from the traditional approach of formulating scoring
functions. It links the structural alignment problem to the general class of
statistical inductive inference problems, solved using the
information-theoretic criterion of minimum message length. Based on this, we
developed an efficient and reliable measure of structural alignment quality,
I-value. The performance of I-value is demonstrated in comparison with a
number of popular scoring functions, on a large collection of competing
alignments. Our analysis shows that I-value provides a rigorous and reliable
quantification of structural alignment quality, addressing a major gap in
the field."
-- [doi:10.1093/bioinformatics/btu460],
[more].
%A A. S. Konagurthu
%A P. Kasarapu
%A L. Allison
%A J. H. Collier
%A A. M. Lesk
%T On sufficient statistics of least-squares superposition of vector sets
%J RECOMB
%I SpringerVerlag
%S LNCS/LNBI
%V 8394
%M APR
%P 144-159
%D 2014
%K conf, RECOMB, MolBio, c2014, c201x, c20xx, zz0514, ArunK, LAllison, AMLesk,
bioinformatics, RECOMB18, least squares, RMS, error, protein, structure, 3D,
match, matching, additive, orthogonal, rigid, vector set, Kearsley, algorithm
%X "Superposition by orthogonal transformation of vector sets by minimizing the
least-squares error is a fundamental task in many areas of science, notably
in structural molecular biology. Its widespread use for structural analyses
is facilitated by exact solutions of this problem, computable in linear time.
However, in several of these analyses it is common to invoke this
superposition routine a very large number of times, often operating (through
addition or deletion) on previously superposed vector sets. This paper
derives a set of sufficient statistics for the least-squares orthogonal
transformation problem. These sufficient statistics are additive. This
property allows for the superposition parameters (rotation, translation, &
root mean square deviation) to be computable as constant time updates from
the statistics of partial solutions. We demonstrate that this results in a
massive speed up in the computational effort, when compared to the method
that recomputes superpositions ab initio. Among others, protein structural
alignment algorithms stand to benefit from our results."
-- [doi:10.1007/978-3-319-05269-4_11]['14],
[more].
%A A. S. Konagurthu
%A L. Allison
%A D. Abramson
%A P. J. Stuckey
%A A. M. Lesk
%T How precise are reported protein coordinate data?
%J Acta Cryst.
%V D70
%N 3
%P 904-906
%M MAR
%D 2014
%K jrnl, MolBio, c2014, c201x, c20xx, zz0314, protein, 3D, tertiary, structure,
precision, accuracy, PDB, LAllison, AMLesk
%X "Atomic coordinates in the Worldwide Protein Data Bank (wwPDB) are generally
reported to greater precision than the experimental structure determinations
have actually achieved. By using information theory & data compression to
study the compressibility of protein atomic coordinates, it is possible to
quantify the amount of randomness in the coordinate data & thereby to
determine the realistic precision of the reported coordinates. On average, the
value of each C_alpha coordinate in a set of selected protein structures solved
at a variety of resolutions is good to about 0.1 Å."
-- [doi:10.1107/S1399004713031787]['14],
[more].
%A A. S. Konagurthu
%A A. M. Lesk
%A D. Abramson
%A P. J. Stuckey
%A L. Allison
%T Statistical inference of protein "LEGO bricks"
%J ICDM
%M DEC
%D 2013
%K conf, ICDM, ICDM13, MolBio, c2013, c201x, c20xx, zz1213, LAllison, ArunK,
AMLesk, protein, tertiary, 3D, MML, structure, structures, recurrent,
structural, motifs, backbone, folds, MDL, library, dictionary, fragments,
blocks, bioinformatics, data mining
%X "Proteins are biomolecules of life. They fold into a great variety of
three-dimensional (3D) shapes. Underlying these folding patterns are many
recurrent structural fragments or building blocks (analogous to "LEGO(r)
bricks"). This paper reports an innovative statistical inference approach to
discover a comprehensive dictionary of protein structural building blocks
from a large corpus of experimentally determined protein structures. Our
approach is built on the Bayesian and information-theoretic criterion of
minimum message length [MML]. To the best of our knowledge, this work is the
first systematic and rigorous treatment of a very important data mining
problem that arises in the cross-disciplinary area of structural
bioinformatics. The quality of the dictionary we find is demonstrated by its
explanatory power - any protein within the corpus of known 3D structures can
be dissected into successive regions assigned to fragments from this
dictionary. This induces a novel one-dimensional representation of
three-dimensional protein folding patterns, suitable for application of the rich
repertoire of character-string processing algorithms, for rapid
identification of folding patterns of newly determined structures. This paper
presents the details of the methodology used to infer the dictionary of
building blocks, and is supported by illustrative examples to demonstrate its
effectiveness and utility."
-- [doi:10.1109/ICDM.2013.73]['14],
[more], and
1310.1462@[arXiv]['13].
%A A. S. Konagurthu
%A A. M. Lesk
%A L. Allison
%T Minimum message length inference of secondary structure from protein
coordinate data
%J J. Bioinformatics
%I OUP
%V 28
%N 12
%P i97-i105
%M JUN
%D 2012
%O ISMB, Long Beach
%K conf, ISMB12, MolBio, c2012, c201x, c20xx, zz0612, LAllison, ArunK, AMLesk,
SST, bioinformatics, protein, secondary structure, assignment, helix,
extended strand, sheet, coil, mmld, fold, MML, MDL, model
%X "Motivation: Secondary structure underpins the folding pattern and
architecture of most proteins. Accurate assignment of the secondary structure
(SS) elements is therefore an important problem. Although many approximate
solutions of the SS assignment problem exist, the statement of the problem has
resisted a consistent & mathematically rigorous definition. A variety of
comparative studies have highlighted major disagreements in the way the
available methods define & assign SS to coordinate data.
Results: We report a new method to infer SS based on the Bayesian method of
Minimum Message Length (MML) inference. It treats assignments of SS as
hypotheses that explain the given coordinate data. The method seeks to maximise
the joint probability of a hypothesis & the data. There is a natural null
hypothesis & any assignment that cannot better it is unacceptable. We
developed a program SST based on this approach & compared it to popular
programs such as DSSP & STRIDE amongst others. Our evaluation suggests that SST
gives reliable assignments even on low resolution structures."
-- [doi:10.1093/bioinformatics/bts223]['12],
[www]['12].
More: [www]['12].
%A A. S. Konagurthu
%A L. Allison
%A P. J. Stuckey
%A A. M. Lesk
%T Piecewise linear approximation of protein structures using the principle of
minimum message length
%J J. Bioinformatics
%V 27
%N 13
%P i43-i51
%M JUL
%D 2011
%K conf, MolBio, MML, c2011, c201x, c20xx, zz0711, ISMB, LAllison, ArunK,
AMLesk, protein, fold, cartoon, description, ribbon diagram, structure,
segmentation, minimum message length, MDL, information theoretic
%X "Simple & concise representations of protein-folding patterns provide
powerful abstractions for visualizations, comparisons, classifications,
searching & aligning structural data. Structures are often abstracted by
replacing standard secondary structural features - that is, helices & strands
of sheet - by vectors or linear segments. Relying solely on standard secondary
structure may result in a significant loss of structural information. Further,
traditional methods of simplification crucially depend on the consistency &
accuracy of external methods to assign secondary structure (SS) to protein
coordinate data. Although many methods exist automatically to identify SS, the
impreciseness of definitions, along with errors & inconsistencies in
experimental structure data, drastically limit their applicability to generate
reliable simplified representations, especially for structural comparison.
This article introduces a mathematically rigorous algorithm to delineate protein
structure using the elegant statistical & inductive inference framework of
minimum message length (MML). Our method generates consistent & statistically
robust piecewise linear explanations of protein coordinate data, resulting in
a powerful & concise representation of the structure. The delineation is
completely independent of the approaches of using hydrogen-bonding patterns
or inspecting local substructural geometry that the current methods use.
Indeed, as is common with applications of the MML criterion, this method is
free of parameters & thresholds, in striking contrast to the existing
programs which are often beset by them.
The analysis of results over a large number of proteins suggests that the
method produces consistent delineation of structures that encompasses, among
others, the segments corresponding to standard secondary structure."
-- [doi:10.1093/bioinformatics/btr240]['11].
(Also see [more].)
%A T. Edgoose
%A L. Allison
%A D. L. Dowe
%T An MML classification of protein structure that knows about angles and
sequence
%J Pacific Symposium on Biocomputing '98
%P 585-596
%M JAN
%D 1998
%K conf, MolBio, PSB, PSB3, PSB98, von Mises, vonMises, angle, dihedral, class,
cluster, clustering, HMM, SNOB, time series, timeseries, ARC A49602504,
LAllison, MDL, distribution, bioinformatics, Monash, c1998, c199x, c19xx
%I World Scientific
%X SNOB + vonMises circular probability distribution + 1st order Markov model.
phi-psi pairs give 17 classes and a class seq' correlation matrix.
[paper]
[paper.pdf@stanford.edu]['98]; uk us isbn:9810232780.
von Mises, probability density:
f(x | mu, kappa) = (1/(2.pi.I0(kappa))).exp(kappa.cos(x-mu))
where I0(kappa) is the modified Bessel function of the first kind, order 0,
and 2.pi.I0(kappa) normalises the density over one full turn.
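A minimal numerical sketch of the density above, for checking values by hand.
It is not taken from the paper: the names `bessel_i0` and `von_mises_pdf` are
hypothetical, and I0 is summed from its power series, which is only assumed
adequate for moderate kappa.

```python
import math


def bessel_i0(kappa, terms=60):
    """Modified Bessel function of the first kind, order 0.

    Uses the power series I0(kappa) = sum_{m>=0} (kappa^2/4)^m / (m!)^2.
    Illustrative only; the fixed number of terms suits moderate kappa.
    """
    half_sq = (kappa / 2.0) ** 2
    total = 0.0
    term = 1.0  # the m = 0 term
    for m in range(terms):
        total += term
        # ratio of consecutive terms: (kappa^2/4) / (m+1)^2
        term *= half_sq / ((m + 1) ** 2)
    return total


def von_mises_pdf(x, mu, kappa):
    """f(x | mu, kappa) = exp(kappa.cos(x - mu)) / (2.pi.I0(kappa))."""
    return math.exp(kappa * math.cos(x - mu)) / (2.0 * math.pi * bessel_i0(kappa))
```

With kappa = 0 the density reduces to the uniform 1/(2.pi) on the circle;
larger kappa concentrates the mass around mu.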
[Bioinformatics].
%A D. L. Dowe
%A L. Allison
%A T. I. Dix
%A L. Hunter
%A C. S. Wallace
%A T. Edgoose
%T Circular clustering of protein dihedral angles by minimum message length
%J Pacific Symposium on Biocomputing '96
%M JAN
%P 242-255
%D 1996
%I World Scientific
%O TR 95/237, Dept. Computer Science, Monash University, Oct 1995
%K PSB, PSB96, TR 237, TR237, Monash, DLD, CSW, CSWallace, LAllison, MolBio,
Monash, classification, angle, von Mises, vonMises, protein structure,
inductive inference, II, MML, MDL, conf, bioinformatics, c1996, c199x, c19xx
%X L. Hunter - NLM, NIH. PSB '96: 3-6 Jan 1996, Hawaii; uk us isbn:9810225784.
[paper],
[paper.ps][1/'96],
[eProceedings][1/'96].
%A D. L. Dowe
%A J. Oliver
%A L. Allison
%A T. I. Dix
%A C. S. Wallace
%T Learning rules for protein secondary structure prediction
%J Proc. 1992 Department Research Conf.
%I Dept. Computer Science, University of Western Australia
%E C. McDonald
%E J. Rohl
%E R. Owens
%M JUL
%D 1992
%O TR 92/163, Dept. Computer Science, Monash University, JUN '92
%K LAllison, CSW, DLD, Monash, UWA, WA, conf, MolBio, decision tree, trees,
graph, protein, amino acid, AA, secondary structure, SS, prediction,
rule, rules, alpha helix, beta strand, extended sheet, coil, turn,
CSWallace, inductive inference, II, MML, minimum message length, c1992,
c199x, c19xx, bioinformatics, TR 92 163, TR92-163, TR163
%X [TR92/163.ps]
Also see [Bioinformatics],
and TR 92/163.
[CSci UWA home]['00]; uk us isbn:0864221959.
%A D. L. Dowe
%A J. Oliver
%A T. I. Dix
%A L. Allison
%A C. S. Wallace
%T A decision graph explanation of protein secondary structure prediction
%J 26th Hawaii Int. Conf. Sys. Sci.
%V 1
%P 669-678
%M JAN
%D 1993
%K LAllison, CSW, Monash, conf, MolBio, protein secondary structure prediction,
conformation, alpha helix, ss, AA, beta sheet extended strand, turn, coil,
II, inductive inference, decision graph tree, DTree, DGraph, CSWallace, CSW,
MML, Minimum message length encoding, description, MDL, Bayesian,
TR163 163, c1993, c199x, c19xx, bioinformatics, HICSS, HICSS26, HICSS93
%X Oliver and Wallace (IJCAI '91) introduced `decision graphs' -
a generalisation of decision trees - here applied to protein secondary
structure prediction.
[more],
[paper (HTML)].
Also see TR 92/163.