The Centre for Research in Intelligent Systems is running a series of activities from Nov 23 to 27, centred on visits by Tom Mitchell and Eamonn Keogh.
Monday Nov 23, 10:00am to 3:00pm, 26/135
Tom Mitchell will give a talk, and CRIS members will present short overviews of their research.
Talk: Brains, Meaning and Corpus Statistics
How does the human brain represent meanings of words and pictures in terms of the underlying neural activity? This talk will present our research using machine learning methods together with fMRI brain imaging to study this question. One line of our research has involved training classifiers that identify which word a person is thinking about, based on their neural activity observed using fMRI. These classifiers essentially provide a virtual sensor of the information encoded by neural activity in different regions of the brain. A more recent line of research involves developing a computational model that predicts the neural activity associated with arbitrary English words, including words for which we do not yet have brain image data. This computational model is trained using fMRI data collected while people think about different nouns, together with noun statistics gathered from a trillion-word text corpus. Once trained, the model predicts fMRI activation for any other concrete noun appearing in the text corpus, with highly significant accuracies over the 60 nouns for which we currently have fMRI data.
Wednesday Nov 25, 2:00pm, 26/135
Talk: Read the Web: Toward Never-Ending Language Learning
We consider the problem of developing a never-ending language learner to learn to read the web, and focus on an approach with three characteristics that we hypothesize make it viable. First, in contrast to the very difficult problem of reading information from a single document, we consider the much easier problem of reading hundreds of millions of documents simultaneously, so that our system can extract facts that are stated many times by combining evidence from many documents. Second, our system begins with a given ontology that defines the types of information to be extracted, enabling it to focus its effort and to ignore most of the text which is irrelevant to the target ontology. Third, the system uses a new class of semi-supervised learning algorithms to learn how to extract information from web pages -- algorithms designed to achieve greater accuracy when given more complex ontologies. Our experiments show that this approach can produce knowledge bases containing tens of thousands of facts to populate ontologies with approximately 90% accuracy, starting with only a handful of labeled training examples and 200 million unlabeled web pages. An example Knowledge Base extracted by our system, containing approximately 40,000 assertions, is available at http://rtw.ml.cmu.edu/readtheweb.html
Thursday Nov 26, 2:00pm, 26/135
Friday Nov 27, 9:30am to 1:00pm, 26/135
Places are strictly limited and must be reserved by RSVP. In the first instance they are limited to members, associate members and postgraduate members of CRIS and FIT members of staff. If places remain they will be made available to others. Please RSVP to Jeanette Niehus