#This file was created by Wed Dec 2 17:16:47 1998 #LyX 0.12 (C) 1995-1998 Matthias Ettrich and the LyX Team \lyxformat 2.15 \textclass report \begin_preamble \documentclass[11pt,honoursthesis,oneandhalfspace]{sdthesis} \usepackage{chicago} %% input the chicago bib package. \usepackage{epsf} \usepackage{fancybox} \usepackage{lgrind} \usepackage{verbatim} \newcommand{\eg}{e.g.,\ } \newcommand{\ie}{i.e.,\ } % Thesis definitions \renewcommand{\thesisauthor}{Daryl S. M. Moulder} \renewcommand{\thesisauthorlastname}{Moulder} \renewcommand{\thesisauthorpreviousdegrees}{BComp(Software Development)} \renewcommand{\thesismonth}{July} \renewcommand{\thesisyear}{1998} \renewcommand{\thesistitle}{Software Agents for the Internet and NFACT (News Filtering Agent Communication Tool)} \renewcommand{\thesissupervisor}{Jason Lowder and Xindong Wu} %\renewcommand{\thesisresearchunittitle{School} \renewcommand{\thesisdepartment}{Computer Science and Software Engineering} % \renewcommand{\thesisauthoraddress}{School of Computer Science and Software Engineering\\Monash University\\Australia} %\renewcommand{\thesisdedication}{} % start the document \begin{document} \frontmatter %% start the thesis front matter. \thesiscopyrightpage %% Generate the copyright page. \thesistitlepage %% Generate the title page. \tableofcontents %% Generate a table of contents. \listoftables %% Generate a list of tables (optional). \listoffigures %% Generate a list of figures (optional). \begin{thesisabstract} %% generate the abstract page. The exponential growth of the Internet has created a substantial increase in the information that is available online. The Internet, and Internet News in particular, provides a large amount of information, through which the user has to search. One approach to this problem is to use intelligent agents which gather information which the user would find potentially useful. 
This research provides a framework from which a prototype agent was developed to effectively filter Internet news and co-operate with other news agents to retrieve articles. It also uses some natural language processing techniques which enable the agent to find more relevant articles. The Java language is used to provide some heuristics which classify articles that have been returned from a search engine into different categories such as ``Flames'' and ``me too'' articles. The system also gives an indication of how formal the articles are, from 1 (formal) to 5 (very informal). NFACT gives the user a better way of browsing articles because it provides a more informative view of Internet news articles than a standard news reader. %\input{abstract.tex} \end{thesisabstract} %\thesisdeclarationpage %% generate the declaration page %(optional). %\begin{thesisacknowledgments} %% generate the acknowledgements page (optional). %\input{ack.tex} %\end{thesisacknowledgments} %\thesisdedicationpage %% Generate the dedication page %(optional). \mainmatter \end_preamble \language default \inputencoding default \fontscheme helvet \graphics dvips \float_placement ht \paperfontsize 11 \spacing onehalf \papersize a4paper \paperpackage a4 \use_geometry 0 \use_amsmath 0 \paperorientation portrait \secnumdepth 3 \tocdepth 3 \paragraph_separation indent \defskip medskip \quotes_language english \quotes_times 2 \papercolumns 1 \papersides 1 \paperpagestyle plain \layout Chapter Introduction \layout Section The Internet and Information Acquisition \layout Quote \begin_inset Quotes eld \end_inset We are drowning in information, but starved of knowledge. \begin_inset Quotes erd \end_inset \begin_inset LatexCommand \cite{nasisbitt} \end_inset \layout Standard The Internet is a world wide network of heterogeneous computers that are linked together via a network protocol called TCP/IP. 
It has been growing at an almost exponential rate, increasing the amount of information that is being provided \begin_inset LatexCommand \cite{internet} \end_inset . This increase of information becomes overwhelming to the average user. The lack of a coherent structure also makes it very difficult to find the information that one is looking for. \layout Standard Information itself has become an important trading commodity as the information age approaches \begin_inset LatexCommand \cite{bjorn} \end_inset . This is due both to the importance of information to individuals and large corporations and to the increased availability of this information that the Internet provides. \layout Standard The method that has traditionally been used to locate and filter information has been the search engine. The search engine has various shortcomings, however. It finds a whole range of different web pages and news articles, and only a small number of these may contain information relevant to the user. Search engines rely on keywords typed in by the user, and do not take into account the context in which they are used \latex latex \latex default \begin_inset LatexCommand \cite{DoR} \end_inset . \layout Standard An agent can help alleviate the information crisis by filtering the information that is being searched for without doing all the searching for the user. The agent then becomes an extension of the user's own searching methods and is able to complement them. This is superior to using a search engine alone because the agent uses a profile, entered into the system by the user, which defines the context of the information that is being searched for. \layout Standard Agents have been used for quite some time, but it has only been in recent years that they have become prominent in both the research community and the marketplace. Agents have also become very popular in the general community. 
Many products (such as Word \begin_inset LatexCommand \cite{word} \end_inset ) try to brand themselves as containing an intelligent agent to entice users to buy their products. \layout Standard Agents have become popular with developers not just because they are a current trend, but because implementing agents adds some very important functionality to the product. Microsoft Word, for example, actively assists the user by offering suggestions. When the words \begin_inset Quotes eld \end_inset Dear John, \begin_inset Quotes erd \end_inset are typed, the user guide offers up a suggestion. It asks whether the user is typing a letter, and whether the user would be interested in using a letter wizard to help. Although these programs try to brand themselves as intelligent agents, they do not possess all the capabilities of an agent. The ability to learn from the user's behaviour is an example of a capability that they lack. \layout Standard The field of artificial intelligence is constantly expanding, and in particular the topic of intelligent agents. It is still quite difficult, however, to define what intelligence and intelligent behaviour are. Intelligent agents are systems that exhibit the activities necessary to be seen as intelligent, as opposed to merely imitating intelligence. This research aims to further this goal by developing a good reasoning model for the agent. \layout Standard An agent needs a reasoning model to deduce, from a set of rules and the information that it has been given, information which can be shown to the user. This is achieved by using an expert system, which is a program that keeps its knowledge base of rules separate from the inference engine which applies these rules (see section \begin_inset LatexCommand \ref{agents_and_reasoning} \end_inset for more details). \layout Standard This project examines a particular type of agent called an information agent. 
These agents actively search for information and then filter information based on what the agent has learned about the user \begin_inset LatexCommand \cite{DaR} \end_inset . The agent used as an example in this research knows the context of the search from the use of a thesaurus or a set of rules, which provides the user with substantially more relevant information. \layout Section Purpose of the Research \layout Standard This research aims to reduce the amount of time the user requires to find relevant information. This is achieved by building a hybrid of a search engine and a rule based intelligent agent. The particular solution used is in the form of a news client that takes advantage of available search engines to help find relevant information. The set of news articles found is then reduced further by deleting references that are not within the user's chosen context. \latex latex \layout Standard The research provides an example of an intelligent agent that can be used to filter information on the Internet \begin_inset LatexCommand \cite{internet} \end_inset . The agent is also able to communicate with other agents to send and receive articles on common topics. \layout Section Research Questions \layout Standard What kind of reasoning model should be provided in an intelligent Internet News filtering agent? Such a model could be developed further from similar news filtering agents by taking into account the type of information the user is interested in and not just the keywords of the topic itself. It would include limiting the sources of information from which an agent retrieves relevant information. This is achieved by limiting the agent to specific news groups or by automatically cutting out articles from a specific author using appropriate filters. However, simple filtering alone is not enough to enable an agent to make a decision as to which articles should be presented to the user. 
By using heuristics, the agent can approximately classify an article into different types and represent this classification in a symbolic way. For example, a derogatory article (see section \begin_inset LatexCommand \ref{flames} \end_inset ) can be shown in the dialog box with a symbol of a flame to indicate that the agent has made an educated guess as to the nature of the article. \layout Standard How can user agents communicate with each other, so that each could provide recommendations to the other? By sharing articles, agents can retrieve articles from another user that have a greater chance of being relevant. \layout Standard The agents \shape italic NewT \shape default \latex latex \begin_inset LatexCommand \cite{pattie2} \end_inset \latex default (see Section \begin_inset LatexCommand \ref{NewT} \end_inset ) and \shape italic INFOS \shape default \latex latex \begin_inset LatexCommand \cite{INFOS} \end_inset \latex default (see section \latex latex \begin_inset LatexCommand \ref{INFOS} \end_inset \latex default ) do not have these features. They do, however, provide important features which have been incorporated into this research: \layout Itemize Weighting with keyword searches and simple natural language processing (INFOS). \layout Itemize The ability for a different agent to be allocated to each topic the user is interested in (NewT). \layout Section Research Methods \layout Standard The project will be developed from existing research on information agents. It will also build up a rule base by analysing the user's responses to the system. The preferred method of achieving this is by using an expert system. Expert systems, and production systems in particular, are flexible and are easily implemented. \latex latex \layout Standard A simple news client called JavaNews was used as part of the prototype which was developed in Java \begin_inset LatexCommand \cite{java_news} \end_inset . 
Extra classes were then added to it so that the project prototype was built in an optimal time period. All the JavaNews classes were updated to version 1.1 of the Java language. This was done to increase efficiency because it uses the new event model. It also removed deprecation warnings on compilation. \layout Standard The first method used to address these problems is to filter information taking into consideration the subjective needs of the user. A dialog box was used in the prototype to provide the user the opportunity to specify criteria with which to filter articles based on their own personal preferences. \layout Standard The second aim for the research is to obtain recommendations from other agents that have similar preferences installed. \layout Standard The scope of the research is for the agent to filter articles that the user is searching for and to also assess the appropriateness of the information. The information filtering will be done with the aid of a rule base which will determine if the article is of sufficient interest. If so, the article will be kept and presented to the user for confirmation of its interest. \layout Standard Four agents in all were developed for the prototype which provides enough ability to demonstrate the system. \layout Section Thesis Plan \layout Standard Chapter 2 outlines the current research in the area of intelligent agents. It also discusses aspects of computer reasoning and in particular how expert systems reason. This is because the News agent being designed in this document uses an expert system to reason effectively as part of its reasoning model. \layout Standard Chapter 3 explains the framework of a completely implemented system. This framework gives enough flexibility for the developer to implement a complete system in their own way. \layout Standard Chapter 4 discusses an abstract syntax of how agents can communicate with each other across a network. 
It provides a set of protocols for achieving article swapping between agents. \layout Standard Chapter 5 then gives an example of a prototype system which has been developed using the Java \latex latex \begin_inset LatexCommand \cite{java} \end_inset \latex default language and uses Remote Method Invocation (RMI). \layout Standard Finally, chapter 6 discusses future directions if the project is to be developed further. These include further information about security concerns and the initial contacting of other news agents. \layout Chapter News Agents and the Internet \layout Section Definition of an Agent \layout Standard \latex latex \backslash citeN{julia} \latex default states that: \layout Quotation \align left \begin_inset Quotes eld \end_inset An agent is a piece of software that works independently of the user to perform some task. This is something that the user has specified directly or indirectly by observing the users behaviour and making decisions which would help the user in performing their tasks. \begin_inset Quotes erd \end_inset \layout Standard There are nine attributes which are considered important when examining agent behaviour. These are \emph on autonomy \emph toggle , \emph on personalisability \emph toggle , \emph on communication, delegation, domain \emph toggle , \emph on graceful degradation \emph toggle , \emph on cooperation \emph toggle , \emph on anthropomorphism \emph toggle and \emph on expectations \emph toggle \latex latex \latex default \begin_inset LatexCommand \cite{julia} \end_inset , \latex latex \latex default \begin_inset LatexCommand \cite{pattie2} \end_inset . \layout Subsection Autonomy \layout Standard Any agent should be able to work separately from the user and be able to initiate tasks on behalf of the user independently. 
This type of behaviour has several aspects: \shape italic periodic action \shape default in which the agent completes actions at particular times, \shape italic spontaneous execution \shape default where the agent executes an action without being prompted by the user, and \shape italic initiative \shape default where the agent carries out tasks not set by the user, which will benefit the user in some way. An agent must also be able to make independent or preemptive actions which will eventually benefit the user \latex latex \latex default \begin_inset LatexCommand \cite{julia} \end_inset . \layout Standard A garbage collector is not an agent: it performs a task preemptively but does not provide a way to interact with the user. At periodic intervals it collects objects which are no longer referenced. An agent will act in reaction to circumstances, not just on a timed basis. This provides a measure of intelligent behaviour. Julia \latex latex \begin_inset LatexCommand \cite{julia} \end_inset \latex default (see section \begin_inset LatexCommand \ref{domain} \end_inset ) is able to roam around her domain as one of her tasks. If she notices that a new room has been added since she last visited, she will add it to her store of information about the particular MUD (Multi User Dungeon). \layout Standard Most programs are non-autonomous because they execute instructions when the user performs some action, or on a timed basis, such as the automatic saving of documents. \layout Subsection Personalisability \layout Standard Agents are personalised to a particular user. A good agent will \shape italic learn \shape default from the user, so that the agent does not have to be instructed specifically what to do. Agents learn by observing the user and committing those observations to memory. This could be achieved by using a knowledge base, where the system is able to make new rules and add them to its rule base or replace an existing contradictory rule. 
They are also tools which allow users to handle tasks effectively, and must be able to be educated to perform these tasks \begin_inset LatexCommand \cite{learning} \end_inset . \layout Subsection Communication \layout Standard For the agent to fulfill a particular task, communication with the agent is needed. This communication should be \shape italic two way \shape default so that the agent can give feedback about what it is doing and clarify what the user's task is. \layout Standard One type of communication language which has been specifically written as a protocol to talk to other agents is Knowledge Query and Manipulation Language (KQML) \begin_inset LatexCommand \label{KQML} \end_inset \latex latex \latex default \begin_inset LatexCommand \cite{kqml} \end_inset . KQML is intended to be a high-level language to be used by knowledge-based systems to share knowledge at run time. It is a language for programs to communicate with each other even if they have been written in other languages, in a very similar way to the OMG's ORB (Object Request Broker) \begin_inset LatexCommand \cite{orb} \end_inset . The OMG's IDL (Interface Definition Language) from CORBA specifies the interfaces and types to be used in the client but does not implement any code. The main problem with KQML is that it is more of a specification language as opposed to an implemented solution such as Sunsoft's RMI (Remote Method Invocation) or CORBA. KQML also has the problem of being overcomplicated for simple tasks, where either the OMG's CORBA or RMI would be more practical. \layout Subsection Delegation \begin_inset LatexCommand \label{delegation} \end_inset \layout Standard For an agent to be effective it must be able to handle tasks, such as searching for relevant web pages or news articles, on its own without the user worrying about how well it is performing the task. The user trusts that the agent is capable of the tasks that it has been set. 
If the agent is trusted to work on the user's behalf, there is always a chance that something will go wrong and the agent will not carry out the task the way the user intended. If the agent cannot complete the task, it should complete at least some of the task. This makes the agent easier to trust, and the agent becomes more efficient at performing the task once the search has been refined and it has a much better idea of the user's needs. \layout Standard The agent is working in a domain chosen by the user, which gives the user an opportunity to weigh up the risks of using the agent given its topic domain. \layout Subsection Domain \begin_inset LatexCommand \label{domain} \end_inset \layout Standard The domain is the space in which the agent is working, which is tied to the task that the agent has been set to achieve. The Julia \latex latex \begin_inset LatexCommand \cite{julia} \end_inset \latex default agent works in a MUD (Multi-User Dungeon) environment. Julia is an agent which emulates a player character connected to a MUD and behaves in a very human-like way. Julia has three main functions: \layout Enumerate To map out the area and become a guide so that other users in the MUD can find their way around. \layout Enumerate To observe other users. \layout Enumerate To quote users on a topic that Julia is unfamiliar with. \layout Standard If Julia cannot understand something she will quote another user on the same topic. Julia does this by storing a database of quotes and searching through them by keyword when quoting on a subject. Failing that she will talk on another subject. If that fails she will start to talk on the subject that she is familiar with (in Julia's case this is hockey). \latex latex \backslash citeN{mauldin} \latex default classifies Julia as a \begin_inset Quotes eld \end_inset Chatterbot \begin_inset Quotes erd \end_inset . A Chatterbot splits the different functions into modules. 
These modules are able to automate a player in a MUD. The module which deals with conversation is implemented as a prioritised layer of mini-experts. These are collections of patterns and associated responses. Julia is more sophisticated than Weizenbaum's Eliza \latex latex \begin_inset LatexCommand \cite{eliza} \end_inset \latex default program (see \series bold Anthropomorphism \series default in section \begin_inset LatexCommand \ref{anthro} \end_inset ) because Julia is able to handle more user responses using better pattern matching techniques and has a memory of past events \latex latex \latex default \begin_inset LatexCommand \cite{enter} \end_inset . \layout Subsection Graceful Degradation \begin_inset LatexCommand \label{graceful degradation} \end_inset \layout Standard When two parties communicate incorrectly some form of error recovery needs to be invoked. This generally involves only part of the task being fulfilled which is arguably better than a direct fail for a particular task. Ways in which errors in communication occur are \latex latex \latex default \begin_inset LatexCommand \cite{julia} \end_inset : \layout Enumerate A communications mismatch - the two parties mis-communicate and do not realise it. \layout Enumerate A domain mismatch - one or both parties could be talking about something outside their current domain of understanding and may not realise it. \layout Standard If the agent only partially fails, the user has more trust in the agent's performance. \layout Subsection Cooperation \begin_inset LatexCommand \label{cooperation} \end_inset \layout Standard The user and the agent are collaborating in the construction of a contract. The user provides a guide to what the agent should be doing. The agent specifies what it can do and then produces results. Communication is two way giving the user a chance to verify what is required when the agent gives feedback on what it believes the user wants. 
This form of communication is more of a peer to peer style rather than a user issuing \shape italic "commands" \shape default to the system. \layout Subsection Anthropomorphism \begin_inset LatexCommand \label{anthro} \end_inset \layout Standard Anthropomorphism is the degree to which the agent acts like a human being. The program ELIZA \latex latex \begin_inset LatexCommand \cite{eliza} \end_inset \latex default (a psychologist emulator) pretends to be human, but one could not call it an agent as it lacks some of the other essential ingredients (such as being useful to its domain). The ELIZA program simulated a Rogerian psychotherapist by rephrasing any of the patient's statements as questions and posing them to the patient. It worked by simple pattern recognition and substitution of keywords into standard phrases. Note that the example given in table \begin_inset LatexCommand \ref{julia_match} \end_inset is from Julia and that the Eliza program matches human speech in a similar way. \layout Standard \begin_float tab \layout Standard \align center \LyXTable multicol5 7 1 0 0 -1 -1 -1 -1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 2 1 1 "" "" 0 2 0 1 0 0 0 "" "" 0 2 0 1 0 0 0 "" "" 0 2 0 1 0 0 0 "" "" 0 2 0 1 0 0 0 "" "" 0 2 0 1 0 0 0 "" "" 0 2 0 1 0 0 0 "" "" 0 2 0 1 0 0 0 "" "" else if ((MATCH (lcmsg, \begin_inset Quotes eld \end_inset *how*are*you* \begin_inset Quotes erd \end_inset ) \begin_inset Formula \( | \) \end_inset \begin_inset Formula \( | \) \end_inset \newline MATCH(lcmsg, \begin_inset Quotes eld \end_inset how's my*favour* \begin_inset Quotes erd \end_inset ) \begin_inset Formula \( | \) \end_inset \begin_inset Formula \( | \) \end_inset \newline MATCH(lcmsg, \begin_inset Quotes eld \end_inset how is my*favou* \begin_inset Quotes erd \end_inset ) \begin_inset Formula \( | \) \end_inset \begin_inset Formula \( | \) \end_inset \newline MATCH(lcmsg, \begin_inset Quotes eld \end_inset how*do*you* \begin_inset Quotes erd \end_inset ) 
\begin_inset Formula \( | \) \end_inset \begin_inset Formula \( | \) \end_inset \newline MATCH(lcmsg, \begin_inset Quotes eld \end_inset *how's*life* \begin_inset Quotes erd \end_inset ) \begin_inset Formula \( | \) \end_inset \begin_inset Formula \( | \) \end_inset \newline MATCH(lcmsg, \begin_inset Quotes eld \end_inset *how is *life* \begin_inset Quotes erd \end_inset ) \begin_inset Formula \( | \) \end_inset \begin_inset Formula \( | \) \end_inset \newline MATCH(lcmsg, \begin_inset Quotes eld \end_inset *are* you* OK* \begin_inset Quotes erd \end_inset ) \begin_inset Formula \( | \) \end_inset \begin_inset Formula \( | \) \end_inset \layout Caption \begin_inset LatexCommand \label{julia_match} \end_inset Julia Matching Algorithm \end_float Table \begin_inset LatexCommand \ref{julia_match} \end_inset illustrates how the Julia algorithm uses meta characters such as the `*', which can match a string of characters. These are used in regular expressions to find pattern matches. In the first instance the algorithm finds the words \begin_inset Quotes eld \end_inset how \begin_inset Quotes erd \end_inset , \begin_inset Quotes eld \end_inset are \begin_inset Quotes erd \end_inset and \begin_inset Quotes eld \end_inset you \begin_inset Quotes erd \end_inset in that order, with any number of characters between those words. The same method is used in the other cases to find out whether or not the user is asking Julia what her present circumstances are. With such a simple parser, ELIZA (as well as Julia) experiences many failures of communication. \layout Standard ELIZA was so convincing, however, that many people became emotionally involved with it. All this was due to people's tendency to attach to words meanings which the computer never put there. Another program which brings the concept of emotions into agents is \begin_inset Quotes eld \end_inset Woggles \begin_inset Quotes erd \end_inset \begin_inset LatexCommand \cite{emotion} \end_inset , which are three creatures called Shrimp, Bear and Wolf. 
These creatures are self-animating and use some of the techniques used by Disney animators to show life-like emotions. These include facial characteristics when goals are fulfilled or unmet, and quirks such as scratching an ear and falling over, which demonstrate a unique `character trait'. Joseph Bates, who created the Woggles, argues that emotion is important for the users to identify with the agents and for these agents to be truly anthropomorphic. \layout Subsection Expectations \begin_inset LatexCommand \label{expectations} \end_inset \layout Standard When one is interacting with an agent, the interaction becomes a lot better when the user's expectations are fulfilled. Agents are most useful in domains in which graceful degradation and a correct balance of risk to trust can be obtained. The user's expectations compared with the performance that the agent delivers are important. \layout Section Types of Agents \layout Standard There are many different types of agents, some of which are \emph on Autonomous Agents \emph toggle , \emph on Symbolic and Cooperative Agents \emph toggle , \emph on Anthropomorphic Agents \emph toggle and \emph on Multi Agent Systems \emph toggle . These agents are described as follows: \layout Subsection Autonomous Agents \layout Standard Autonomous agents work on behalf of the user without the user giving any direct commands to the agent. They can act without the user being there. Some of these agents search for information among the world's databases \begin_inset LatexCommand \cite{scully} \end_inset . They get this information by accessing gopher databases or the World Wide Web (WWW) to return interesting and relevant information. \layout Standard Two of the problems that these information retrieval agents encounter are: \layout Enumerate Delays with a slow network connection. \layout Enumerate The vast quantity of information. \layout Standard The type of the information also has to be assessed. 
The agent searches intelligently by not searching in areas where the type of information is not what the user requires. For example, if an agent is looking for material suitable for an academic audience, it would be unlikely to retrieve information from an online press source, because such information is likely to be biased in some way or not technical enough in its content. \layout Subsection Symbolic and Cooperative Agents \layout Standard Symbolic and cooperative agents assist in the user's current task by providing alternative views and additional information. \layout Standard Eager \latex latex \begin_inset LatexCommand \cite{cypher} \end_inset \latex default observes the user's interaction and uses it to create macros and useful program defaults. The CAD Helper, which is described by Tognazzini \latex latex \begin_inset LatexCommand \cite{tog} \end_inset \latex default , spots useful relationships in engineering drawings including start, midpoint, end and tangent lines. Two other agents, Clarke and Smyth's (1993) room arrangement and Fisher's Critics (1991), use a large knowledge base of domain rules in a similar way to an expert system. An expert system is able to use a knowledge base of rules and an inference engine to scan those rules to see which apply to the user's input. If any rules are applicable, those rules are invoked and their results are fed back to the user \latex latex \latex default \begin_inset LatexCommand \cite{crash} \end_inset . The rules for the kitchen agent include ones for layout and design of the kitchen. These types of agents are similar to guides, except that they also provide complementary information. \layout Subsection Anthropomorphic Agents \layout Standard Anthropomorphic agents provide human-like behaviour. 
Microsoft and WordPerfect have Wizards \latex latex \begin_inset LatexCommand \cite{word} \end_inset \latex default and Coaches \latex latex \begin_inset LatexCommand \cite{wordperfect} \end_inset \latex default respectively. These guides act as tutors when the user is using the program. They are more advanced than a traditional help system in that they take an active role in helping the user use the program effectively. Guides for Apple Macintosh Computers \begin_inset LatexCommand \cite{oren} \end_inset use a hypermedia (a hyperlinked document with different media types such as pictures, sounds etc) database. These guides have some of the attributes of agents and have been labelled as such by the companies that made them. However, they are not agents as they do not learn from the user. For example, even though they try to give suggestions to the user, they do not keep a user profile which could be used to prompt the user if they keep on making the same errors, or to provide extra help for functions that the user is having difficulty with. For information on true anthropomorphic agents see the Anthropomorphism section \begin_inset LatexCommand \ref{anthro} \end_inset . \layout Subsection Multi-agent Systems \layout Standard Multi-agent systems are systems which are made up of a variety of agents. Often when the term agent is discussed the end user is also considered. In a workgroup situation the software which manages the working group is very agent-like. Multi-agent systems can be distributed as well. The KQML language mentioned in section \begin_inset LatexCommand \ref{KQML} \end_inset is a way of having agents communicate together on a common task. It allows agents, no matter what language they are written in and what their purpose is, to share information and use it in their tasks. Multi-agent systems can be written in the Java \latex latex \begin_inset LatexCommand \cite{java} \end_inset \latex default language. 
Aglets \latex latex \begin_inset LatexCommand \cite{ibm} \end_inset \latex default allow agents to be written that not only communicate with each other, but can also move to other sites while retaining their state. This creates a more reliable system because agents can easily be moved from an unreliable site to a more reliable one. \layout Section Difference Between Internet Search Agents and Search Engines \layout Standard Many people have criticised agents for performing a similar role to that of a search engine, which would make an agent unnecessary. This section describes how agents differ from search engines by providing features that search engines do not have. The differences between a search engine and an Internet search agent are listed below \latex latex \latex default \begin_inset LatexCommand \cite{bjorn} \end_inset : \layout Itemize Agents are able to take into account the context of the words the user is searching for, unlike a search engine, which searches only for keywords. This includes using a tool such as a thesaurus to look for alternative meanings of words. \layout Itemize Agents are able to search for a longer period and to gather information without the user taking the time to do so. A search engine takes up the user's time, as the user has to tell the engine specifically what information is wanted and sometimes repeat the search if the results are irrelevant. \layout Itemize Agents are able to do preemptive searches, which will often provide information that the user was not specifically looking for but which is relevant to the user's interests, or will provide specific information that the user searched for earlier. \layout Section Agents and Reasoning \begin_inset LatexCommand \label{agents_and_reasoning} \end_inset \layout Standard Agents have the ability to make decisions on behalf of the user. Agents achieve this by implementing a reasoning model.
Reasoning is the process of deduction, where the program is able to deduce the correct answer based on the information it is given and the rules that are applied. \layout Standard The goal of scientists over many years has been to discover how humans solve problems and the steps involved in doing so. This in turn led artificial intelligence experts to provide various means of implementing these and other methods of reasoning. \layout Standard \latex latex \backslash citeN{rep_uncertain_know} \latex default state that: \layout Quote \align left \begin_inset Quotes eld \end_inset An AI program manipulates symbols that somehow represent pieces of information about the world to perform a task that we normally take to require intelligence, such as playing chess, diagnosing an illness in a patient, or understanding an English sentence. \begin_inset Quotes erd \end_inset \layout Standard The kind of reasoning model that is being used by the agent is an expert system, which is an example of this sort of symbol manipulation. \layout Subsection Expert Systems \begin_inset LatexCommand \label{expert_systems} \end_inset \layout Standard Expert systems are constructed by obtaining knowledge from a human expert in a particular domain (for example medicine). This knowledge is then encoded into a form that the computer can use to solve problems. Experts usually apply rules of thumb to solve problems. These rules of thumb, when placed in an expert system, are called heuristics. Experts also have a general knowledge of their specific domain, which is also placed in the expert system \begin_inset LatexCommand \cite{ai} \end_inset . \layout Standard Expert systems use a combination of if-then rules which are applied to various facts. In the example given in section \begin_inset LatexCommand \ref{forward_chain} \end_inset , a set of rules is applied to solve the problem of an electric light which does not function properly.
When the system is initiated, a set of facts is placed into the rulebase. In the case of the given example this is information about the given situation. The system then activates any rules which apply to the facts given. The facts and rules make use of various symbols, which are stored as facts and manipulated by the rule base (the if-then rules). \layout Standard The computer is unaware of the meaning of these symbols but is able to manipulate them successfully to get an appropriate output. Expert systems are used when the knowledge that is being stored is organised in a highly structured way. Agents can use an expert system to deduce relevant information. As the agent learns about the user through the process of knowledge acquisition, rules can be added to the expert system to improve its reliability. \layout Subsubsection Forward Chaining \begin_inset LatexCommand \label{forward_chain} \end_inset \layout Standard In forward reasoning, one starts with the available information in the knowledge base and applies the inference rules until the goal has been reached or it has been shown that the goal cannot be reached. Inference rules are usually formulated in a forward way: if one knows one piece of information, then implicitly one also knows other information.
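\layout Standard
As a minimal sketch of this forward reasoning loop, the following Python fragment keeps firing any rule whose conditions are all known facts until nothing new can be derived. The function name forward_chain, the representation of a rule as a (conditions, conclusion) pair and the three simplified rules, which are loosely based on the electric light example of this section, are assumptions made for illustration only; they are not the rule format used by the prototype.

```python
def forward_chain(rules, facts):
    """Fire rules until no new fact can be derived; return all known facts."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            # A rule fires when every one of its conditions is a known fact.
            if conclusion not in known and all(c in known for c in conditions):
                known.add(conclusion)
                changed = True  # a new fact may enable further rules
    return known


# Hypothetical, simplified rules: (list of conditions, conclusion).
RULES = [
    (["other lights work"], "there is electricity"),
    (["there is electricity", "switch is not broken"], "bulb is broken"),
    (["bulb is broken"], "light will not come on"),
]

derived = forward_chain(RULES, ["other lights work", "switch is not broken"])
print("light will not come on" in derived)
```

\layout Standard
The loop restarts from the first rule after every pass in which a fact was added, which corresponds to the strategy of storing the generated facts and starting all over again.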
\layout LyX-Code Rule 1: \layout LyX-Code If there is no electricity, \layout LyX-Code THEN the light will not come on \layout LyX-Code Rule 2: \layout LyX-Code If the bulb is broken \layout LyX-Code THEN the light will not come on \layout LyX-Code Rule 3: \layout LyX-Code If the switch is broken \layout LyX-Code THEN the light will not come on \layout LyX-Code Rule 4: \layout LyX-Code If the bulb is broken \layout LyX-Code THEN the light will not come on \layout LyX-Code Rule 5: \layout LyX-Code If other electrical appliances \layout LyX-Code OR other lights work \layout LyX-Code THEN there is electricity \layout LyX-Code ELSE there is no electricity \layout LyX-Code Rule 6: \layout LyX-Code If the electricity is on \layout LyX-Code AND the bulb is not broken \layout LyX-Code THEN the switch is broken \layout LyX-Code Rule 7: \layout LyX-Code If the electricity is on \layout LyX-Code AND the switch is not broken \layout LyX-Code THEN the bulb is broken \layout Standard The following facts are stored: \layout LyX-Code Fact 1: Other lights are still on \layout LyX-Code Fact 2: The switch is not broken \layout Standard Suppose we would like to find out whether the light will not work, and we want to use this rule base to do so. If we only have information that matches the left hand side (LHS) of a rule, then the information on its right hand side (RHS) has to be inferred from it. \layout Standard If the system were forward chaining, it would test the left hand sides of all the rules and store any facts generated in the knowledge base. This could be done either by starting from the top, finding any applicable rule, storing the facts generated and starting all over again, or by going on to the next rule each time and applying the facts in the database.
The first rule to be activated will be rule 5, which notes that other lights work and asserts the fact: \layout LyX-Code Fact 3: There is electricity \layout Standard Then rule 7 will be activated, stating that because there is electricity and the switch is not broken, the bulb is broken. Finally the rulebase states that the light will not work because the bulb is broken (rule 4). \layout Subsubsection Backward Chaining \layout Standard In backward chaining, one starts out with a goal, and the inference rules are applied in reverse to find all the information needed for the goal to be determined. \layout Standard Using the rules mentioned earlier, one would start with the goal that the light will not work. The interpreter then finds the applicable domain rules. Domain rules are rules that apply to the particular problem (in this case why the light will not work). If the domain rules conclude that the goal is true, then the system will report back TRUE and it is then known that the light will not work. \layout Subsubsection Breadth First Searching \layout Standard Breadth first searching descends the tree in figure \begin_inset LatexCommand \ref{search_tree} \end_inset level by level. This search tree diagram is taken from the example in section \begin_inset LatexCommand \ref{forward_chain} \end_inset . The search starts from the root of the tree and descends one level at a time. In this example the A node (the root node) of the tree would be explored first, then the B and C nodes, after that the D, E, F and G nodes, and then the H, I, J and K nodes. Finally the L, M, N and O nodes, which are at the bottom layer of the tree, would be explored.
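\layout Standard
The level by level order can be sketched with a queue, as in the following Python fragment. Only the top three levels of the tree are modelled, and the assignment of D and E to B and of F and G to C is an assumption, since the exact edges of the figure are not reproduced in the text; the names TREE and breadth_first are likewise illustrative.

```python
from collections import deque

# Hypothetical tree: node -> list of children (top three levels only).
TREE = {"A": ["B", "C"], "B": ["D", "E"], "C": ["F", "G"]}


def breadth_first(tree, root):
    """Return the nodes in the order a breadth first search visits them."""
    order = []
    queue = deque([root])
    while queue:
        node = queue.popleft()  # oldest unexplored node first
        order.append(node)
        # Children join the back of the queue, so a whole level is
        # explored before any node of the next level.
        queue.extend(tree.get(node, []))
    return order


print(breadth_first(TREE, "A"))
```

\layout Standard
For this assumed tree the nodes are visited in the order A, B, C, D, E, F, G.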
\layout Standard \begin_float fig \layout Standard \align center \begin_inset Figure size 405 228 file search_tree.eps flags 9 \end_inset \layout Caption \begin_inset LatexCommand \label{search_tree} \end_inset Search Tree For Expert System Example \end_float \layout Subsubsection Depth First Searching \layout Standard In contrast to breadth first searching, a depth first search goes deeper into the search space whenever this is possible. Only when a state has no further descendants to explore are its siblings considered. \layout Standard This can be illustrated in Figure \begin_inset LatexCommand \ref{search_tree} \end_inset by the search proceeding in the order A, B, D, H, I, E, L, M, K, N, O, followed by the other half of the tree: C, F and G. In this example the left node is always considered before the right node of the tree is traversed \latex latex \latex default \begin_inset LatexCommand \cite{ai_page} \end_inset . \layout Standard In backward reasoning, applying an inference rule reduces the goal to a set of subgoals. This allows the system to solve an easier set of subgoals, although these subgoals may themselves be complicated. The difference between breadth first and depth first searching is the order in which the system tries to solve the subgoals. \layout Standard Suppose a goal G can in principle be reduced either to S or to S' (the first and second reductions). In a depth first search using backward reasoning, the interpreter will initially use only the first reduction and try to reduce it to its subgoals. The second reduction, S', will only be tried after all the possibilities of the first reduction, S, have been exhausted. This process is called back-tracking. In the search tree example, when a `No' node is reached and there is no way to descend the tree further, the program will backtrack to the next accessible branch of the tree. For example, if the program reaches the `I' node then the program will backtrack to the `E' node.
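\layout Standard
The goal reduction with back-tracking described above can be sketched as a short recursive function. The names solve and REDUCTIONS, the subgoals p, q and r, and the use of S2 to stand for the second reduction S' are hypothetical and chosen only to mirror the description; the sketch also assumes that the reductions contain no cycles.

```python
def solve(goal, reductions, facts):
    """Depth first backward reasoning: True if the goal can be established."""
    if goal in facts:
        return True
    # Reductions are tried in order. A later reduction is attempted only
    # after every possibility of the earlier one has failed (back-tracking).
    for subgoals in reductions.get(goal, []):
        if all(solve(sub, reductions, facts) for sub in subgoals):
            return True
    return False


# Hypothetical reductions: goal -> list of alternative subgoal sets.
REDUCTIONS = {
    "G": [["S"], ["S2"]],   # G reduces to S first, otherwise to S2
    "S": [["p", "q"]],
    "S2": [["r"]],
}

# Only r is known, so the first reduction fails and the second succeeds.
print(solve("G", REDUCTIONS, {"r"}))
```

\layout Standard
With only r known, the attempt through S fails at p, the interpreter backtracks, and the goal is established through the second reduction.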
\layout Subsubsection Advantages and Disadvantages of Expert Systems \layout Standard One of the main advantages of expert systems is the ability to store knowledge separately from the inference engine. This allows the facts to be changed, and new rules to be added, while the same reasoning is still used. When the system changes the rules itself using input from the user, this can provide a means of computer learning. \layout Standard The main disadvantage of expert systems is their lack of introspection. This means that they are unable to easily let the user know the way in which a decision is made. \layout Subsection Non Monotonic and Inexact Reasoning \layout Subsubsection Key Components in Inexact Reasoning \layout Standard Inexact reasoning uses a measure of either fuzzy degree (see section \begin_inset LatexCommand \ref{fuzzy} \end_inset ) or probability to describe imperfect data. It can also use a measure of conditional probability or a rule strength to represent imperfect rules. Real world problems are often fuzzy in nature and require a set of heuristics to determine an approximate answer when an exact match cannot be achieved. This is true for news filtering agents, which use inexact information provided by natural language processing of the news article text. \layout Standard An inexact reasoning model contains a set of computing formulae to: \layout Enumerate Evaluate the certainty factor of each conclusion on the right hand side (RHS) of a rule according to the certainty factors of all its conditions. This includes taking into account how the conditions on the left hand side (LHS) of the rule are combined by conjunction, disjunction and negation. \layout Enumerate Compute the certainty factor of a conclusion which is supported by a set of rules. \layout Standard These are important issues to address in a news agent, so that it can provide the user with information about the certainty of the expert system's conclusions when analysing and classifying articles.
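\layout Standard
The two kinds of formulae listed above can be illustrated with one common convention, which the text does not prescribe and which is assumed here purely for illustration: certainties lie between 0 and 1, a conjunction takes the minimum, a disjunction the maximum, a negation one minus the value, a rule scales the certainty of its conditions by its rule strength, and two rules supporting the same conclusion are combined as cf1 + cf2 (1 - cf1). The evidence values and rule strengths are hypothetical.

```python
def cf_and(*cfs):
    """Certainty of a conjunction of conditions (assumed: the minimum)."""
    return min(cfs)


def cf_or(*cfs):
    """Certainty of a disjunction of conditions (assumed: the maximum)."""
    return max(cfs)


def cf_not(cf):
    """Certainty of a negated condition (assumed: one minus the value)."""
    return 1.0 - cf


def cf_conclusion(condition_cf, rule_strength):
    """Certainty of a rule's conclusion, scaled by the rule strength."""
    return condition_cf * rule_strength


def cf_combine(cf1, cf2):
    """Combine two rules that support the same conclusion."""
    return cf1 + cf2 * (1.0 - cf1)


# Hypothetical evidence extracted from one article.
insults, capitals, quoted_text = 0.8, 0.6, 0.3

# Rule A: insults AND capitals -> flame (strength 0.9)
rule_a = cf_conclusion(cf_and(insults, capitals), 0.9)
# Rule B: capitals OR NOT quoted_text -> flame (strength 0.5)
rule_b = cf_conclusion(cf_or(capitals, cf_not(quoted_text)), 0.5)

flame = cf_combine(rule_a, rule_b)
print(round(flame, 3))
```

\layout Standard
Under these assumptions a second supporting rule can only raise the certainty of a conclusion, and the combined value never exceeds 1.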
\layout Subsubsection Sources of Noise \layout Standard With predicate calculus, correct conclusions can be formed from correct premises. In realistic applications such as intelligent agents, however, these premises cannot be assumed to be 100 percent correct. \layout Standard Using a knowledge base such as that used for the reasoning of some agents, conclusions are drawn from poorly formed and uncertain evidence. For example, an indication of the style of an article can only be ascertained in an approximate manner by the use of heuristics rather than precise rules. \layout Standard Usually, in deductive reasoning a conclusion must follow from a set of premises. This is because deductive reasoning uses mathematics to prove or disprove conclusions. However, where the information is uncertain, it is impossible to predict with classical mathematical logic that the conclusion is true. Modern extensions to mathematical logic or monotonic logic have been made to deal with this uncertainty \begin_inset LatexCommand \cite{rep_uncertain_know} \end_inset . \layout Standard Formal logic requires the manipulation of symbols such as (p or \begin_inset Formula \( \neg \) \end_inset p), which can represent a condition such as it is raining or it is not raining. It does not matter what p represents; the expression would always be true. \layout Standard Model theory is used to specify the semantics or meaning behind the logic. For a machine to be built that can reason, legal methods of reasoning would need to be incorporated by providing syntactic manipulations on sentences in the logical language \latex latex \begin_inset LatexCommand \cite{rep_uncertain_know} \end_inset \latex default . \layout Standard Non-monotonic logic and reasoning with beliefs use a quantitative measure of modeling where inference is based on beliefs and assumptions. \layout Standard A non-monotonic system addresses the problem of changing beliefs.
Uncertainty is handled by making reasonable assumptions when information is uncertain. The system reasons as if these assumptions are true. Later a belief may change. When this happens, conclusions determined from the previous set of beliefs, and any conclusions derived from them, will have to be re-examined and changed. \layout Standard For example, when it rains the streets are wet, so if the streets are wet it is possible to infer that it has been raining. However, there could be other reasons why the streets are wet, for example a fire hydrant has burst. If another piece of information was added stating that the sky was clear, then the conclusion that it had been raining would have to be reassessed \begin_inset LatexCommand \cite{know_rep} \end_inset . \layout Standard To reduce the complexity of revising conclusions when beliefs are changed, truth maintenance systems are used which store justifications for each inference. \layout Subsubsection Fuzzy Logic \begin_inset LatexCommand \label{fuzzy} \end_inset \layout Standard Fuzzy logic uses propositional logic with truth values between 0 and 1. These values are used to describe the `vagueness' property. \layout Standard For example, high temperature could be defined as a value which is greater than or equal to 35 degrees Celsius. Intuitively, though, a high temperature might not be seen as having such a precise cutoff point. Certainly if the temperature was 43 degrees it would be defined as being high and 25 degrees as being normal, but 39 degrees could be seen with increasing conviction as being high but perhaps not definitely. \layout Standard A characteristic function can take graded values in the [0,1] interval. These fuzzy sets were first introduced by \latex latex \backslash citeN{zadeh65} \latex default . \layout Section Agents and the Internet \layout Standard The main problem with the Internet is that it is so large that there is too much information for a single user to absorb.
The information is also not properly organised. To see how the Internet developed see appendix \begin_inset LatexCommand \ref{The_internet} \end_inset . Added to this is the difficulty of finding time to continually search the net for a particular article one is looking for. In addition, there may be times when important information has been added online of which the user has no knowledge. \layout Subsection Information and Newsgroups \layout Standard One of the main problems with the Internet in general and newsgroups in particular is the transient nature of the information. What is available one day may not be available the next, making it even more important that the user capture the information before it is removed. An agent is able to capture information while the user is performing other activities, which means that less time is spent searching for information and more time is devoted to more productive activities, thus increasing user productivity. One of the important points to note is that an agent should not replace the user in the search for information, but instead complement the user's existing search patterns. This includes monitoring the user's searches in order to pick up a pattern of searching. The pattern is then used by the agent to search the Internet for relevant information that fits it \begin_inset LatexCommand \cite{internet} \end_inset . \layout Subsection Examples of News Filtering Agents \layout Subsubsection NewT \begin_inset LatexCommand \label{NewT} \end_inset \layout Standard NewT \latex latex \begin_inset LatexCommand \cite{pattie2} \end_inset \latex default is a news filtering agent which has been implemented in C++ on a Unix platform. Users can train their own agents, which are assigned to specific subjects. These agents are trained by the user to select certain articles and discard others. An agent is initialised by giving it some examples of accepted and rejected articles.
The agent performs its analysis using the vector space model for documents \begin_inset LatexCommand \cite{vector} \end_inset . This model is used to retrieve the words in the text that may be relevant. The model also remembers the article, author and source assigned indices information. The user can also manually configure the agent to search for articles based on a desired set of article templates. An example of this is for the user to ask the system to search for articles by \begin_inset Quotes eld \end_inset Michael Smith \begin_inset Quotes erd \end_inset about \begin_inset Quotes eld \end_inset expert systems \begin_inset Quotes erd \end_inset . \layout Standard Once the agent has been primed it can start recommending articles to the user, who can give positive or negative feedback on different parts of an article, such as the author, subject or a paragraph, or on the complete article. One of the main advantages of NewT is that it complements the user's own search for information and does not try to completely automate the user's information retrieval. The NewT system, however, has several disadvantages. One is that the user has to give constant feedback on which articles are appropriate. The current program is also limited to filtering for a particular user: recommendations from one user's agent are not passed on to another user's agent. In addition, the system looks for keywords only and does not have any natural language understanding. \layout Subsubsection INFOS (Intelligent News Filtering Organisational System) \begin_inset LatexCommand \label{INFOS} \end_inset \layout Standard INFOS \latex latex \begin_inset LatexCommand \cite{INFOS} \end_inset \latex default is an agent which provides a mixture of simple natural language processing and keyword searching. This agent is designed to reduce the time the user spends searching by automatically eliminating the Usenet articles predicted to be irrelevant.
Usenet News is otherwise known as Internet News and is described in more detail in appendix \begin_inset LatexCommand \ref{Internet_news_appendix} \end_inset . The author states that using a knowledge based system to analyse the text can decrease the errors made by the agent, but has the disadvantage of not being able to scale up to large domains. Statistical and keyword approaches, on the other hand, scale up quite easily, but produce more errors because the system does not take the context of the input into account. A hybrid system is proposed as a compromise between the two approaches. \layout Standard INFOS uses the Global Hill Climbing \latex latex \begin_inset LatexCommand \cite{hill} \end_inset \latex default algorithm. This model uses both the Bayesian induction and TF-IDF schemes. These are statistical methods of assigning weightings to items. It is a linear discrimination method based on a table of features. This table counts the number of times a feature is found in each type of article. The table can be manipulated by users since there is only one variable per type of article. The table is built up as the user reads articles and indicates whether each article is liked or not. The outcome is to increment the table's weights.
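\layout Standard
A table of this kind can be sketched as a pair of counters per word. The following Python fragment is a deliberate simplification and not the Global Hill Climbing algorithm itself: the class name FeatureTable, the whitespace tokenisation and the scoring of a new article as the sum of accepted minus rejected counts over its unique words are all assumptions made for illustration.

```python
from collections import defaultdict


class FeatureTable:
    """Counts how often each word occurs in accepted and rejected articles."""

    def __init__(self):
        self.counts = defaultdict(lambda: {"accepted": 0, "rejected": 0})

    def update(self, text, accepted):
        """Increment the table for every unique word of a rated article."""
        column = "accepted" if accepted else "rejected"
        for word in set(text.lower().split()):
            self.counts[word][column] += 1

    def score(self, text):
        """Assumed scoring rule: sum of (accepted - rejected) per known word."""
        total = 0
        for word in set(text.lower().split()):
            if word in self.counts:
                entry = self.counts[word]
                total += entry["accepted"] - entry["rejected"]
        return total


table = FeatureTable()
table.update("genetic algorithm", accepted=True)
table.update("flames algorithm", accepted=False)
print(table.score("a new genetic algorithm"))
```

\layout Standard
In this sketch a word that has been accepted as often as it has been rejected contributes nothing to the score, and words that have never been seen are ignored.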
\layout LyX-Code \begin_float tab \layout Standard \align center \LyXTable multicol5 7 3 0 0 -1 -1 -1 -1 1 1 0 0 1 1 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 1 0 0 8 1 0 "" "" 8 1 0 "" "" 8 1 1 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" \series bold Word \series default \newline \series bold Accepted \series default \newline \series bold Rejected \series default \newline genetic \newline 5 \newline 0 \newline algorithm \newline 3 \newline 3 \newline flames \newline 2 \newline 7 \newline grog@ucdavis \newline 3 \newline 1 \newline Kiki Accepted \newline 4 \newline 1 \newline Kiki Rejected \newline 2 \newline 3 \layout Caption \begin_inset LatexCommand \label{Global_Hill_Weights} \end_inset Global Hill Climbing Table of Weights \end_float \layout Standard Table \begin_inset LatexCommand \ref{Global_Hill_Weights} \end_inset is used to evaluate new articles. The system extracts the unique words in the article. Every time an article is accepted or rejected this table is updated. If the word has the same number rejected and accepted then this word is of no interest. Words such as `the', `a', `I', for example are quickly eliminated as they will fall into this category. If a particular word is accepted more than rejected, then this word/topic is of interest. \layout Standard The system does not use this method alone as words such as `bicycle' and `bike' mean the same thing but there are two separate entries generated. \layout Standard INFOS uses a case-based reasoning model. By retrieving individual cases and using the classification system of those cases to classify new articles. 
The system is able to compare concepts rather than individual words. INFOS uses WordNet \latex latex \begin_inset LatexCommand \cite{miller} \end_inset \latex default to map words into concepts. WordNet works by using speech identification, synonyms, frequency usage etc to classify words. It also uses a hierarchical way of classifying words. For example a woman is a person. An oak is a tree, is a plant, is an organism. WordNet also allows for the different senses in which a word can be used. For example: \layout LyX-Code Sense 1 \layout LyX-Code Nut - is a tree, is an organism \layout LyX-Code Sense 2 \layout LyX-Code Nut - is a mechanical part \layout LyX-Code Sense 3 \layout LyX-Code Nut - is mentally ill \layout Standard These senses are ordered from the specific to more abstract concepts. INFOS removes irrelevant nouns and verbs that are part of the article by using Paice's index extraction algorithm \begin_inset LatexCommand \cite{paice} \end_inset . This algorithm assumes that sentences repeat an underlying concept within a \begin_inset Quotes eld \end_inset topic neighbourhood \begin_inset Quotes erd \end_inset of a few sentences. The words that occur with the most frequency are likely to be more relevant to the user's chosen topic. \layout Standard INFOS uses Paice's algorithm to work with concepts rather than words. The nouns and verbs are extracted and then WordNet is used to find the senses of these words. Then the irrelevant senses of the word are deleted. For example \begin_inset Quotes eld \end_inset I like to eat nuts \begin_inset Quotes erd \end_inset clearly refers to the plant kind of nut. \layout Standard One of the main disadvantages of this approach is that there is no way to evaluate the type of information. This would make it difficult to distinguish between an academic journal type article and a press news article on the subject.
\layout Section Summary \layout Standard From the literature that has been reviewed, the field of intelligent agents is one that is rapidly growing in interest and research. There are many organisations that are involved with producing intelligent agents, including `The Agent Society' and many universities such as the Massachusetts Institute of Technology (MIT) and the Royal Melbourne Institute of Technology (RMIT). Recently commercial companies have been developing intelligent agents or providing agent development tools, such as IBM's Aglets, which are agents developed in the Java programming language. \layout Standard Some of the issues that have arisen from the introduction of agents are: \layout Itemize Interacting with the user in a human like way: the degree of anthropomorphism an agent should exhibit without confusing the user into thinking it is another human. \layout Itemize Natural language processing: how far an agent that filters information should be able to understand the context of the information the user is looking for. \layout Itemize Communication with other agents: tools such as KQML allow agents to share information. This allows agents to get information not just from their current domain but from other agents that are attempting a similar task. \layout Itemize Trust in the agent: if the agent is delegated a particular task then the agent must be able to complete the task effectively. \layout Standard At least some of these issues need to be addressed when developing an agent. The types of issues involved depend on the type of agent that is being developed. \layout Standard The domain of an agent is important so that the agent is able to react well to its environment and able to follow any existing standards.
\layout Standard The examples of news filtering agents that were mentioned earlier do not take into account the user's subjective needs about the quality of information, nor do they try to gain information from other agents. The agents could also be improved by increasing the natural language skills of the agent in a form that is able to scale better than existing processing methods. \layout Chapter A Framework for Building News Filtering Agents \begin_inset LatexCommand \label{sec: Framework} \end_inset \layout Standard This framework chapter describes a model for the creation of an intelligent news agent. The prototype (which provides an example of how the model functions) is demonstrated in chapter \begin_inset LatexCommand \ref{NFACT} \end_inset . \layout Standard This system provides information which matches the user's profile. \begin_float fig \layout Standard \align center \begin_inset Figure size 364 182 file proj_frmwrk.eps flags 9 \end_inset \layout Caption \begin_inset LatexCommand \label{Project_diagram} \end_inset A Framework for Agent Based News Filtering \end_float \layout Standard \align left Figure \begin_inset LatexCommand \ref{Project_diagram} \end_inset shows four agents storing sorted matching articles in a database, and retrieving them from it, using an \emph on Agent Coordinator. \emph toggle Such a system might typically include the following components: \layout Itemize \emph on Agent's NNTP Client \emph toggle , \layout Itemize \emph on Client Interface, \emph toggle \layout Itemize \emph on User's NNTP Client, \layout Itemize \emph on Dialog Based Feedback Mechanism, \emph toggle \layout Itemize \emph on Expert system to filter articles, \emph toggle \layout Itemize \emph on Learning mechanism and a \layout Itemize \emph on Database of Matching Articles \emph toggle . \layout Standard These are represented in figure \begin_inset LatexCommand \ref{Project_diagram} \end_inset .
Each component is fully described in the following sections: \layout Section Agent's NNTP (Network News Transfer Protocol) Client \layout Standard Agents can retrieve articles either by using a search engine (such as \latex latex \backslash citeN{deja_news} \latex default ) or by connecting to the news server to which the client connects. The agent can then retrieve articles by issuing commands to the server. Details of these commands can be found in Appendix \begin_inset LatexCommand \ref{Internet_news_appendix} \end_inset or in the Request For Comments (RFC) 977 document \latex latex \latex default \begin_inset LatexCommand \cite{rfc977} \end_inset , which describes NNTP. \layout Standard It is important that agents do not keep entire articles but only references to them, which saves space if two or more agents refer to the same article. Articles should be referred to by their unique \emph on Message_Id \emph toggle . This is important because agents from other user systems will also be asking for articles and using their \emph on Agent Coordinator \emph toggle (see section \begin_inset LatexCommand \ref{agent-co} \end_inset ) to check by \emph on Message_Id \emph toggle which articles have already been stored in the database file. Retrieving information for the user in this way is one of the delegated tasks performed for the user. The user trusts that the agent is able to do this task effectively, as mentioned in section \begin_inset LatexCommand \ref{delegation} \end_inset . \layout Section Client Interface \layout Standard The client interface should give the user access to all the abilities of a standard Usenet news client and provide a menu which saves the agent's initial profile to disk and is able to read and alter topic profiles. The system should also have the ability to search for other user agents and connect to them.
An example client interface can be found in the prototype shown in section \begin_inset LatexCommand \ref{client_interface} \end_inset . \layout Section User's NNTP Client \layout Standard The NNTP user client connects to a \emph on news server \emph toggle and performs all the commands necessary to read and post news articles. This is achieved using the commands specified in the RFC 977 document. \layout Section Dialog Based Feedback \layout Standard The \emph on Dialog Feedback Mechanism \emph toggle component creates the starting rules based on the user profile. The user profile is created from the limitations or initial search settings that the user enters into the system, which include information about which author, subject, author's email address or article content should be searched for. \layout Standard The basic rules for the system should be set up as a template, with the specific information needed to filter the articles being provided by the feedback from the dialog box. The \emph on \emph toggle component is the part of the user interface which brings up a list of articles that the agent has found for the user. When an article is selected, the main text of the article is shown in a listbox. Clicking on other articles will add them to the text area. De-selecting an article removes its main text from the text area. \layout Standard When the user has finished selecting the articles which they consider the most relevant, the system analyses the articles and tries to induce a pattern which can be used to search for other matching articles. This is described in more detail in section \begin_inset LatexCommand \ref{agent_learning} \end_inset . \layout Standard Useful feedback to the user provides a method of dialogue with the agent, which is one of the attributes of an agent mentioned in section \begin_inset LatexCommand \ref{cooperation} \end_inset .
\layout Section Expert System Filtering for an Agent \begin_inset LatexCommand \label{expert filter} \end_inset \layout Standard Expert systems were discussed earlier in section \begin_inset LatexCommand \ref{expert_systems} \end_inset . The agent uses an expert system to classify articles. The \emph on Expert System \emph toggle is a component to which the agent sends the main text of an article so that it can be processed and the resulting classification sent back to the system. The system also provides a mechanism for introspection, which involves outputting text to a listbox. The text describes the decisions that the agent has made in classifying and choosing articles. \layout Standard This process is important because it allows users to decide whether the agent has classified an article correctly and then to decide what to do with the article. It is known as decision support because it is the user rather than the system who makes the final decision about the classification of an article. Using an expert system provides a set of rules to which new rules can be added and in which old ones can be deleted or modified. \layout Section Agent Learning and Updating the Rulebase \begin_inset LatexCommand \label{agent_learning} \end_inset \layout Standard Learning is defined in the Australian Pocket Oxford Dictionary (1988) as \begin_inset Quotes eld \end_inset get knowledge of or skill in by study, experience, or by being taught \begin_inset Quotes erd \end_inset (p. 391). Sestito and Dillon (1994) define several types of learning strategy, which include: \layout Itemize Rote learning and direct implanting of new knowledge. \layout Itemize Learning by instruction. \layout Itemize Learning by analogy. \layout Itemize Learning by example. \layout Itemize Learning by observation and discovery.
\layout Standard The three main strategies that are of benefit to an intelligent agent are learning by example, learning by observation and discovery, and learning by instruction. Many news agents use more than one of these strategies, with one being more prevalent than the others. \layout Standard News agents are given examples of articles which the user finds useful and then extrapolate rules from them by using pattern matching techniques. The agent also observes the user while they are reading articles and tries to ascertain whether the user is interested in each article. If the agent believes an article to be useful it processes it in the manner just described. \layout Standard Because of the limited time allowed to complete this research project, only one particular method of pattern matching has been included in the design framework. This method, by \latex latex \backslash citeN{learning_alg} \latex default , finds patterns in pieces of text by looking at the distance between words. It compares pairs of words in each of the articles chosen and rejected and searches for pairs of words from each article which are similar. \layout Section Database of Matching Articles \begin_inset LatexCommand \label{database} \end_inset \layout Standard The database file is accessed via the Agent Coordinator. This component stores the articles that belong to the agents. Agents retrieve articles from the database and save articles to it by means of references. These references are stored in a hashtable, which is shown in table \begin_inset LatexCommand \ref{hashtable_of_articles} \end_inset .
Each article processed by the system can be stored in a news object, which contains the following attributes: \layout Enumerate The System Id (used to keep track of where the article is in the database) \layout Enumerate The Header \layout Enumerate The Author and the Author's Email Address \layout Enumerate The Subject \layout Enumerate The Main Text of the Article \layout Enumerate The Newsgroups \layout Standard The news object then becomes a record in the database. The attributes stored in each record provide the system with a mechanism to access different parts of the article. This provides the flexibility to examine parts of an article such as the author without having to search the entire article for them, allowing the program to behave efficiently. For example if the system needed to check for a particular author's name, the system could retrieve the author's name directly without having to search the entire record, thus increasing the speed of retrieval from the database. \layout Chapter Inter-agent Communication \begin_inset LatexCommand \label{communication} \end_inset \layout Section Inter-agent Communication Language for Agents \layout Standard One of the properties of agents is their ability to communicate with other agents. NFACT communicates with other agents to retrieve useful articles. The agent does not retrieve rules from other agents because the rules that are used need to be generated from the user's own preferences and not another user, who may have different interests even within the same topic domain. An abstract language has been created in this thesis to allow communication between other agents and the \emph on Agent Coordinator \emph toggle . \layout Subsection General Syntax \layout Description \family sans \begin_inset Formula \( < \) \end_inset \family typewriter ID \family sans \begin_inset Formula \( > \) \end_inset = Unique Usenet Message Id. 
\layout Description \family typewriter Topics \family sans = Each agent collects articles for a particular topic (eg. \begin_inset Quotes eld \end_inset Sweets \begin_inset Quotes erd \end_inset ). \family typewriter Topics \family sans is a keyword which indicates all the topics that the system is carrying. \layout Description \family typewriter myTopics \family sans = Array or Vector of type String. \layout Description \family typewriter Entire_Article \family sans = The \begin_inset Formula \( < \) \end_inset \family typewriter ID \family sans \begin_inset Formula \( > \) \end_inset for that article. \layout Description \family typewriter Topic \family sans = \begin_inset Formula \( < \) \end_inset \family typewriter Topic \family sans \begin_inset Formula \( > \) \end_inset String name of topic. \layout Description \family typewriter Topics_Array \family sans = Array or Vector of Topic names of type String \layout Subsection Send \layout Description Syntax \layout LyX-Code \begin_inset Formula \( Send\left\{ \begin{array}{l} Topics\\ Topics-Array\\ Message-ID\\ Entire-Article\\ Article-Array \end{array}\right. 
\) \end_inset \layout Description Usage \layout Itemize \family typewriter Send Topics \family sans \newline \family typewriter Topics \family sans = Keyword to send all Topics available on the system \layout Itemize \family typewriter Send Topics_Array \family sans \newline \family typewriter Topics_Array \family sans = Array or Vector of type String \layout Itemize \family typewriter Send Message_Id \family sans \newline \family typewriter Message_Id \family sans = \begin_inset Formula \( < \) \end_inset \family typewriter ID \family sans \begin_inset Formula \( > \) \end_inset , \begin_inset Formula \( < \) \end_inset \family typewriter ID \family sans \begin_inset Formula \( > \) \end_inset = Unique Usenet Message Identifier \layout Itemize \family typewriter Send Entire_Article \family sans \newline \family typewriter Entire_Article \family sans = \begin_inset Formula \( < \) \end_inset \family typewriter ID \family sans \begin_inset Formula \( > \) \end_inset \layout Itemize \family typewriter Send Articles_Array \family sans \newline \family typewriter Articles_Array \family sans = Array of News articles objects \layout Description About \layout Standard Whenever a request is made to another agent the request is sent via the \emph on Agent Coordinator \emph toggle which processes requests and sends the information via its controlled port (see section \begin_inset LatexCommand \ref{agent_to_database_man} \end_inset ). \layout Standard When using the default \family typewriter Send Topics, \family default a complete list of topics is sent to the receiving agent as an array or vector. The system can send a specific list of topics in an array or vector rather than a complete list. Allowing the agent to send a specific list to the receiving agent rather than a complete list enables the agent to send the topics which do not contain sensitive information as part of its security feature. 
A full description of agent security is outside the scope of this project but is explained briefly in section \begin_inset LatexCommand \ref{security} \end_inset . \layout Standard When sending a \family typewriter Message_Id \family default of an article, it first has to be extracted from the article. A \family typewriter Message_Id \family default is the unique id for that article which is generated by Usenet news. A particular article can also be sent by the \family typewriter Message_Id \family default of the article to be acquired. \layout Description Examples \layout Itemize \family typewriter Send Topics \layout Itemize \family typewriter Send myTopics \layout Itemize \family typewriter Send Id(foo@bar34343) \layout Subsection Get \layout Standard \begin_inset Formula \( Get\left\{ \begin{array}{l} Topics\\ Topics-Array\\ Topic-Message-IDs\\ Topic\\ Entire-Article\\ Articles-Array\\ Topic \end{array}\right. \) \end_inset \layout Description Usage \layout Itemize \family typewriter Get Topics \family sans \newline Returns a vector or array of topics \layout Itemize \family typewriter Get Message_Ids, Topic \family sans \newline Get \family typewriter Message_Ids \family sans for a particular topic. 
\family typewriter Message_Ids \family sans = \begin_inset Formula \( < \) \end_inset \family typewriter ID \family sans \begin_inset Formula \( > \) \end_inset \layout Itemize \family typewriter Get Entire_Article \family sans \newline \family typewriter Entire_Article \family sans = \begin_inset Formula \( < \) \end_inset \family typewriter ID \family sans \begin_inset Formula \( > \) \end_inset \protected_separator \protected_separator \protected_separator Returns an article with \family typewriter Message_Id \family sans \begin_inset Formula \( < \) \end_inset \family typewriter ID \family sans \begin_inset Formula \( > \) \end_inset \layout Itemize \family typewriter Get Articles_Array, Topic \family sans \newline Retrieve an Array of articles from a particular topic. \layout Itemize \family typewriter Get Topic, Message_Ids \family sans \newline Returns Usenet \family typewriter Message_Ids \family sans for a particular topic. \layout Itemize \family typewriter Get Topic \family sans \newline \family typewriter Topic \family sans = \begin_inset Formula \( < \) \end_inset \family typewriter Topic \family sans \begin_inset Formula \( > \) \end_inset \protected_separator \protected_separator \protected_separator (String name of topic) Returns a vector or an array of all the articles for that topic \family default . \layout Description About \layout Standard When using the default, \family typewriter Get Topics \family default the requesting agent receives a complete list of topics that the target \emph on Agent Coordinator \emph toggle holds. \layout Standard A particular article can also be received by sending the \family typewriter Message_Id \family default of the article. \layout Standard The system can retrieve an array or vector of articles relating to a particular topic. 
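\layout Standard As a rough illustration of how a receiving \emph on Agent Coordinator \emph toggle might interpret these commands, the following Java sketch parses a Send or Get line into a verb, a kind of request and an argument. It is not taken from the NFACT source. The class name AgentCommand, its enum names, the use of plain double quotes around topic names, and the rule that an operand containing an at sign is treated as a Message_Id are all assumptions made for this example.

```java
/** Hypothetical parser for the abstract Send/Get language; not the NFACT implementation. */
final class AgentCommand {
    enum Verb { GET, SEND }
    enum Kind { ALL_TOPICS, TOPIC_ARTICLES, TOPIC_MESSAGE_IDS, ARTICLE_BY_ID, NAMED_ARRAY }

    final Verb verb;
    final Kind kind;
    final String argument; // topic name, Message_Id or variable name; empty for ALL_TOPICS

    private AgentCommand(Verb verb, Kind kind, String argument) {
        this.verb = verb;
        this.kind = kind;
        this.argument = argument;
    }

    static AgentCommand parse(String line) {
        String text = line.trim();
        int space = text.indexOf(' ');
        if (space < 0) {
            throw new IllegalArgumentException("Missing operand: " + line);
        }
        String verbWord = text.substring(0, space);
        Verb verb;
        if (verbWord.equalsIgnoreCase("Get")) {
            verb = Verb.GET;
        } else if (verbWord.equalsIgnoreCase("Send")) {
            verb = Verb.SEND;
        } else {
            throw new IllegalArgumentException("Unknown verb: " + verbWord);
        }
        String rest = text.substring(space + 1).trim();

        // The keyword Topics stands for every topic the system carries.
        if (rest.equals("Topics")) {
            return new AgentCommand(verb, Kind.ALL_TOPICS, "");
        }
        // Assumed convention: a quoted operand is a topic name,
        // optionally followed by ", Message_Ids".
        if (rest.startsWith("\"")) {
            int close = rest.indexOf('"', 1);
            if (close < 0) {
                throw new IllegalArgumentException("Unterminated topic name: " + line);
            }
            String topic = rest.substring(1, close);
            String tail = rest.substring(close + 1).trim();
            if (tail.isEmpty()) {
                return new AgentCommand(verb, Kind.TOPIC_ARTICLES, topic);
            }
            if (tail.startsWith(",") && tail.substring(1).trim().equals("Message_Ids")) {
                return new AgentCommand(verb, Kind.TOPIC_MESSAGE_IDS, topic);
            }
            throw new IllegalArgumentException("Unexpected text after topic: " + tail);
        }
        // Id(...) wraps a Usenet Message_Id, as in the Send example.
        if (rest.startsWith("Id(") && rest.endsWith(")")) {
            return new AgentCommand(verb, Kind.ARTICLE_BY_ID, rest.substring(3, rest.length() - 1));
        }
        // Assumption: a bare operand containing '@' is a Message_Id.
        if (rest.indexOf('@') >= 0) {
            return new AgentCommand(verb, Kind.ARTICLE_BY_ID, rest);
        }
        // Anything else is taken to name an array or vector such as myTopics.
        return new AgentCommand(verb, Kind.NAMED_ARRAY, rest);
    }
}
```

\layout Standard Keeping the parsed form separate from the socket handling would let the same parser serve both local agents and agents on other machines, although the thesis does not prescribe this division.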
\layout Description Examples \layout Itemize \family typewriter Get Topics \layout Itemize \family typewriter Get myTopics \layout Itemize \family typewriter Get 12332@foo.bar \layout Itemize \family typewriter Get \begin_inset Quotes eld \end_inset Blue Whales \begin_inset Quotes erd \end_inset \layout Itemize \family typewriter Get \begin_inset Quotes eld \end_inset Blue Whales \begin_inset Quotes erd \end_inset , Message_Ids \layout Subsection Example of Agent Interaction \begin_inset LatexCommand \label{sec: Agent Interaction Example} \end_inset \layout Standard \added_space_top 0.3cm \added_space_bottom 0.3cm \align center \begin_float tab \layout Standard \align center \LyXTable multicol5 17 5 0 0 -1 -1 -1 -1 1 1 0 0 1 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 8 1 0 "" "" 2 1 0 "" "" 2 1 0 "" "" 2 1 0 "" "" 2 1 1 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 2 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" 
"" 0 8 0 1 0 0 0 "" "" 0 2 0 1 0 0 0 "" "" 0 2 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 2 0 1 0 0 0 "" "" 0 2 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 2 0 1 0 0 0 "" "" 0 2 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 2 0 1 0 0 0 "" "" 0 2 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 2 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" \size small Time \newline Agent A Messages \newline Agent A Processes \newline Agent B Messages \newline Agent B Processes \newline 1 \newline Get Topics \newline \newline \newline \newline 2 \newline \newline Receive Topics \newline Send Topics \newline \newline 3 \newline Send Message_Ids, \newline Send Message Id's of \newline \newline \newline \newline Topic \newline articles that have the \newline \newline \newline \newline ......... 
\newline same topics \newline \newline \newline 4 \newline Send Message_Ids \newline \newline Get {Article_Array, \newline Get Articles where \newline \newline \newline \newline Message_Ids} \newline the Message_Ids are \newline \newline \newline \newline \newline not found in the \newline \newline \newline \newline \newline database file \newline 5 \newline \newline \newline Send Articles \newline \newline 6 \newline \newline Receive Articles \newline \newline \newline 7 \newline Send Articles \newline Send Articles \newline \newline Receive Articles \newline \newline \newline which Agent B does \newline \newline \newline \newline \newline not have \newline \newline \newline 8 \newline \newline Process_and_Store \newline \newline Process_and_Store \newline \newline \newline articles \newline \newline articles \layout Caption \begin_inset LatexCommand \label{communicating_agents} \end_inset Agent Communication and Backscratching \end_float \latex latex \backslash vspace{1ex} \layout Standard Table \begin_inset LatexCommand \ref{communicating_agents} \end_inset illustrates how an agent ( \emph on Agent A \emph toggle ) can retrieve articles from another user's agent ( \emph on Agent B \emph toggle ). The table splits the activities of each agent into its processes and the messages (or commands) that it sends. The \emph on Time \emph toggle column indicates the sequence of events and which events happen at the same time. \emph on Agent A \emph toggle begins by asking \emph on Agent B \emph toggle whether it is searching for articles on the same topic. If so, the titles of the articles that \emph on Agent B \emph toggle has stored are sent back to \emph on Agent A \emph toggle together with their unique \family typewriter Message_Ids \family default . If there are articles that \emph on Agent B \emph toggle does not have, these articles are sent by \emph on Agent A \emph toggle .
\emph on Agent B \emph toggle then gives \emph on Agent A \emph toggle all of its articles that were not given in the original list. These articles are filtered by \emph on Agent A \emph toggle , and any that are not rejected are stored in the database. Finally \emph on Agent B \emph toggle filters any new articles given by \emph on Agent A \emph toggle through its rule base and stores them in its database. \layout Standard After \emph on Agent A \emph toggle has received the articles that it asked for, it can send articles to \emph on Agent B \emph toggle in return, which is known as \emph on back scratching \emph toggle . \emph on Agent A \emph toggle knows which articles \emph on Agent B \emph toggle does not have because it has a list of all the \family typewriter Message_Ids \family default that \emph on Agent B \emph toggle has requested from it. If there are any articles that \emph on Agent A \emph toggle holds and that \emph on Agent B \emph toggle does not hold in its database, they can be sent to \emph on Agent B \emph toggle . If the topic held by \emph on Agent A \emph toggle has been restricted for security reasons, it cannot be accessed and therefore no articles can be sent to \emph on Agent B \emph toggle (see section \begin_inset LatexCommand \ref{security} \end_inset ). \layout Section Agent to Agent Coordinator Interaction \begin_inset LatexCommand \label{agent_to_database_man} \end_inset \layout Standard \begin_float fig \layout Standard \align center \begin_inset Figure size 318 192 file database_com.eps flags 9 \end_inset \layout Caption \begin_inset LatexCommand \label{agent-co} \end_inset Agents communicating through an Agent Coordinator \end_float The \emph on Agent Coordinator \emph toggle is a component which coordinates requests between agents and the database. It also enables an agent on one user's system to communicate with an agent on another user's system.
Figure \begin_inset LatexCommand \ref{agent-co} \end_inset shows an \emph on Agent Coordinator \emph toggle coordinating requests from agents. The \emph on Agent Coordinator \emph toggle sends these requests through its own socket. Each \emph on Agent Coordinator \emph toggle uses only one socket rather than devoting a socket to each agent. The problem with each agent having its own socket is that a machine has only a limited number of sockets that can be used. In addition, the Agent Coordinator is unaware of which sockets are available for use. The way that sockets can be requested is outside the scope of this project but is explained briefly in section \begin_inset LatexCommand \ref{socket} \end_inset . When the \emph on Agent Coordinator \emph toggle on the other machine receives a message from an agent, it in turn forwards the message to the agent to which it was addressed. \layout Standard The \emph on Agent Coordinator \emph toggle handles requests to the database for articles. The agents do not keep any of the articles themselves but keep a set of references to articles. When asking for an article or list of articles, the agent sends a request to the \emph on Agent Coordinator \emph toggle with the reference or list of references required to access the articles in its database. This database is simply a random access file. The \emph on Agent Coordinator \emph toggle then reads in these articles and sends them to the agent that requested them. \layout Standard If the agent requires the database to send articles to another user's agent, the \emph on Agent Coordinator \emph toggle sends these articles through its own socket to the machine where the other agent is located. The other user's \emph on Agent Coordinator \emph toggle then stores the articles and gives the receiving agent a hashtable with a list of references.
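\layout Standard The reference scheme described above can be sketched in Java as follows. This is a minimal illustration and not the NFACT code: the class name ArticleStore, its method names, the in-memory index of file offsets and the use of a temporary file are assumptions introduced for the example. Only the ideas of a random access file, a system Article Id and a Usenet Message_Id come from the text.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical article database: agents hold Article Ids, the coordinator holds the file. */
final class ArticleStore implements AutoCloseable {
    private final RandomAccessFile file;
    // Article Id -> {offset, length} of the article text inside the file.
    private final Map<Integer, long[]> index = new HashMap<Integer, long[]>();
    // Usenet Message_Id -> Article Id, so that each article is stored only once.
    private final Map<String, Integer> byMessageId = new HashMap<String, Integer>();
    private int nextArticleId = 1;

    ArticleStore(File path) throws IOException {
        this.file = new RandomAccessFile(path, "rw");
    }

    /** Convenience factory for experiments; the file is deleted when the JVM exits. */
    static ArticleStore inTempFile() {
        try {
            File f = File.createTempFile("nfact-articles", ".db");
            f.deleteOnExit();
            return new ArticleStore(f);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Appends the article and returns its Article Id; a known Message_Id is not stored again. */
    int store(String messageId, String text) {
        Integer existing = byMessageId.get(messageId);
        if (existing != null) {
            return existing;
        }
        try {
            byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
            long offset = file.length();
            file.seek(offset);
            file.write(bytes);
            int id = nextArticleId++;
            index.put(id, new long[] { offset, bytes.length });
            byMessageId.put(messageId, id);
            return id;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads the article for a reference, or returns null if the reference is unknown. */
    String fetch(int articleId) {
        long[] entry = index.get(articleId);
        if (entry == null) {
            return null;
        }
        try {
            byte[] bytes = new byte[(int) entry[1]];
            file.seek(entry[0]);
            file.readFully(bytes);
            return new String(bytes, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Lets a remote request that quotes a Usenet Message_Id be mapped to a local reference. */
    Integer articleIdFor(String messageId) {
        return byMessageId.get(messageId);
    }

    @Override
    public void close() {
        try {
            file.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

\layout Standard In this sketch two agents that store the same Usenet Message_Id receive the same Article Id, which reflects the space saving that keeping references instead of whole articles is intended to give.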
\layout Standard \begin_float tab \layout Standard \align center \LyXTable multicol5 4 2 0 0 -1 -1 -1 -1 1 1 0 0 1 1 0 0 0 1 0 0 0 1 0 0 8 1 0 "" "" 8 1 1 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" Article ID \newline Usenet Message ID \newline 15 \newline foo@bar3433 \newline 16 \newline me@mega.com34t3 \newline 17 \newline fuzz@moca.com.au34 \layout Caption \begin_inset LatexCommand \label{hashtable_of_articles} \end_inset Hashtable References for Articles in the Database \end_float Table \begin_inset LatexCommand \ref{hashtable_of_articles} \end_inset is an example hashtable with some articles. The agent uses it to access the articles that it owns in the database. The first column, the \emph on Article Id \emph toggle , is the primary key that the system uses so that the article can be accessed by the \emph on Agent Coordinator \emph toggle . The second column, the \emph on Usenet Id \emph toggle , is the article's unique \emph on Id \emph toggle on the Internet. When an agent on another system requests an article, it can supply a \emph on Usenet Id \emph toggle , which the \emph on Agent Coordinator \emph toggle uses to send back the required article. \layout Subsection Retrieving Articles from the Database \layout Standard Agents make requests to the \emph on Agent Coordinator \emph toggle to retrieve articles to which they hold references. Each article has its own unique \emph on Article Id \emph toggle , which the system uses to write and retrieve the articles that a particular agent owns. Each agent has a list of these \emph on Article Ids \emph toggle , which are then used to retrieve articles from the database. \layout Subsection Communicating with other Agents \layout Standard Agents can also send information to other agents. This information can be in the form of: \layout Enumerate Topics.
\layout Enumerate Usenet \emph on Message_Ids \emph toggle . \layout Enumerate Articles. \layout Standard When an agent sends out a request to another agent, the request is first received by the sender's \emph on Agent Coordinator \emph toggle , which passes the information to the receiving agent via that agent's own \emph on Agent Coordinator \emph toggle . \layout Section Agent to World Interaction \layout Standard Agents communicate with the world by first querying the \emph on Agent Coordinator \emph toggle . It is this component which then sends the message to the outside world via its own port. Messages received are passed from the \emph on Agent Coordinator \emph toggle to the particular agent. This is possible because each agent has its own unique identifier, so the \emph on Agent Coordinator \emph toggle knows where to send them. \layout Chapter Implementation of NFACT (News Filtering Agent Communication Tool) Example \begin_inset LatexCommand \label{NFACT} \end_inset \layout Section The Design of the News Agent \layout Standard This section discusses a particular implementation of NFACT, a prototype system designed to illustrate the principles of the framework. It outlines each of the components that were implemented, which include the \emph on Agent's NNTP Client \emph toggle , \emph on Client Interface \emph toggle , \emph on User's NNTP Client \emph toggle , \emph on Dialog Based Feedback \emph toggle , an \emph on expert system to filter articles \emph toggle , a \emph on learning mechanism \emph toggle and a \emph on Database of Matching Articles \begin_float footnote \layout Standard The source code and Java files can be found at \begin_inset Quotes eld \end_inset http://www.sd.monash.edu.au/~dmoulder/nfact.html \begin_inset Quotes erd \end_inset . \end_float . \layout Subsection Agent's NNTP Client \layout Standard The \emph on Agent's NNTP Client \emph toggle is in this case the Deja News \begin_inset LatexCommand \cite{deja_news} \end_inset search engine.
The agent searches for news articles by making requests to the search engine and bringing back relevant articles, which are then classified by the agent's expert system (see section \begin_inset LatexCommand \ref{expert filter} \end_inset ). \layout Subsection Client Interface \begin_inset LatexCommand \label{client_interface} \end_inset \layout Standard The client interface is made up of a general news reader (in this case Java News \begin_inset LatexCommand \cite{java_news} \end_inset ), which is shown in figure \begin_inset LatexCommand \ref{JavaNews} \end_inset , and an added dialog box, shown in figure \begin_inset LatexCommand \ref{user_profile_screen} \end_inset , which gives the user a way of priming the expert system and retrieving the type of articles that the user is looking for. The system also uses this information to set up a filter in the search engine, Deja News \begin_inset LatexCommand \cite{deja_news} \end_inset , and retrieves relevant articles, which are filtered through the system's rule base (see section \begin_inset LatexCommand \ref{expert filter} \end_inset ).
\layout Subsubsection Before Retrieval \layout Standard \begin_float fig \layout Standard \align center \begin_inset Figure size 338 257 file profile.ps flags 9 \end_inset \layout Caption \begin_inset LatexCommand \label{user_profile_screen} \end_inset User Profile Screen \end_float \begin_float fig \layout Standard \align center \begin_inset Figure size 351 163 file JavaNews.ps flags 9 \end_inset \layout Caption \begin_inset LatexCommand \label{JavaNews} \end_inset The JavaNews Client Screen with Agent Menu \end_float \layout Standard The types of information that the system asks the user for and the general method of searching are as follows: \layout Description General \series bold Searching Information \layout Standard The default search for the newsgroups, authors and subjects parameters is for all the entered keywords to be searched (shown in figure \begin_inset LatexCommand \ref{user_profile_screen} \end_inset ). If however `or' is typed after the keyword then either the first \emph on or \emph toggle the second subject keyword is searched for. \layout Standard To exclude words from the search the exclude checkbox can be pressed which excludes the following keyword from being searched for. This can help narrow down the context of a search. If the keyword `and' is typed after the keyword the first keyword \emph on and \emph toggle the second will be searched for. \layout Standard \added_space_top 0.3cm \added_space_bottom 0.3cm \align center \begin_float tab \layout Standard \align center \LyXTable multicol5 3 1 0 0 -1 -1 -1 -1 1 0 0 0 0 0 0 0 0 1 0 0 2 1 1 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 2 0 1 0 0 0 "" "" \family typewriter bread and \newline cheese &! 
\newline baker \layout Caption \begin_inset LatexCommand \label{subject_example} \end_inset Example for the Subject Parameter \end_float \layout Standard Table \begin_inset LatexCommand \ref{subject_example} \end_inset shows a search for any articles about bread and cheese in the subject heading but will disregard any articles about bakers. \layout Description Subject \series bold Keywords \layout Standard These are keywords which are used to find the subject or subjects that the user is interested in. There is an option for the user to specify subject keywords which are to be ignored, for example growing nuts but not peanuts. Each subject is entered into a text field. When the user presses the return or enter key it is entered into a list box. The field is then left blank to enter the next subject. \layout Description Author \series medium \series bold or Authors \layout Standard The \emph on Author \emph toggle field allows the program to look for or eliminate articles from specific author or authors when making its selection. \layout Description Main \series bold Text Keywords \layout Standard The \emph on Keyword Search \emph toggle field allows the system to favour articles that have these keywords in the main text of the article. \layout Description Newsgroup(s) \layout Standard The agent can be configured to search only particular newsgroups for relevant articles. Each newsgroup is entered into a text field. When the user presses enter, it is added to a list box. The field is then blanked out for entry of the next newsgroup. 
\layout Subsubsection After Retrieval \begin_inset LatexCommand \label{sec: After Retrieval} \end_inset \layout Standard \begin_float fig \layout Standard \align center \begin_inset Figure size 276 194 file keep.ps flags 9 \end_inset \layout Caption \begin_inset LatexCommand \label{keep_articles} \end_inset The User Choosing Articles which have been Retrieved \end_float \layout Standard When the articles have been retrieved by the agent, the `Keep Articles' dialog box shown in figure \begin_inset LatexCommand \ref{keep_articles} \end_inset appears. At this point the agent has filtered the articles and placed them into the following categories. \layout Description Style \series bold of an Article \layout Standard In this agent the style of an article is a measure of how formal the article is. Before an article is processed by the expert system it has a style rating of 1 (Formal), and the rating increases as informalities are discovered in the text. The ratings are as follows: \layout Enumerate Formal. \layout Enumerate Abbreviations. \layout Enumerate Use of informalisms such as \begin_inset Quotes eld \end_inset a lot \begin_inset Quotes erd \end_inset or \begin_inset Quotes eld \end_inset something like that \begin_inset Quotes erd \end_inset . \layout Enumerate Slang words. \layout Enumerate Swear words. \layout Standard Each of these qualities increases the informality of an article. The progression of the ratings was chosen arbitrarily, based on knowledge about what is required in a formal piece of writing. Style encompasses more information than the agent reports; however, implementing rules of this nature is too complicated given the time constraints of an honours thesis. When abbreviations are discovered the rating changes to 2, when slang words are used the rating changes to 4, and so on. The abbreviations, informalisms, slang words and swear words are stored in a file on disk.
Each rating is displayed to the user on the left hand side of the article title as shown in figure \begin_inset LatexCommand \ref{keep_articles} \end_inset . \layout Description Delete \series bold \begin_inset Quotes eld \end_inset me too \begin_inset Quotes erd \end_inset Articles \layout Standard \begin_inset Quotes eld \end_inset Me too \begin_inset Quotes erd \end_inset articles are written with an agreement to a particular point of view without contributing constructively to the discussion. It appears in the agent profile as a checkbox option that can be selected. Articles of this nature usually have a ` \begin_inset Formula \( > \) \end_inset ' at the start of a line for any text which has been quoted from a previous article. The \emph on Expert Filtering System \emph toggle checks to see how many lines of text have not been quoted. If there are only two or three lines that are not quoted and specific phrases such as \begin_inset Quotes eld \end_inset I agree \begin_inset Quotes erd \end_inset or \begin_inset Quotes eld \end_inset me too \begin_inset Quotes erd \end_inset etc appear in the text, then these types of articles are classified as \begin_inset Quotes eld \end_inset me too \begin_inset Quotes erd \end_inset articles. \layout Description Delete \series bold Suspected Flames \begin_inset LatexCommand \label{flames} \end_inset \layout Standard A flame is an article which includes derogatory remarks about a person and often does not add any constructive criticism to the argument or discussion. This option is a checkbox which allows the user to delete any suspected articles which as mentioned in section \begin_inset LatexCommand \ref{dialog_proto} \end_inset are highlighted with a symbol of a flame. 
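\layout Standard The 
\begin_inset Quotes eld
\end_inset 
me too 
\begin_inset Quotes erd
\end_inset 
 test described above can be sketched in Java as follows. The sketch is only an approximation of the heuristic and is not the prototype's code, which the text says uses the Stevesoft regular expression package. The class name MeTooHeuristic, the phrase list and the threshold of at most three unquoted lines are assumptions made for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Hypothetical "me too" detector based on quoted-line counting and agreement phrases. */
final class MeTooHeuristic {
    // Assumed phrase list; the text names only "I agree" and "me too" as examples.
    private static final List<String> AGREEMENT_PHRASES = Arrays.asList("i agree", "me too");
    // Assumed threshold: the text speaks of "only two or three" unquoted lines.
    private static final int MAX_ORIGINAL_LINES = 3;

    static boolean isMeToo(String body) {
        int originalLines = 0;
        StringBuilder originalText = new StringBuilder();
        for (String line : body.split("\\r?\\n")) {
            String trimmed = line.trim();
            // Skip blank lines and lines quoted from a previous article.
            if (trimmed.isEmpty() || trimmed.startsWith(">")) {
                continue;
            }
            originalLines++;
            originalText.append(trimmed.toLowerCase(Locale.ROOT)).append('\n');
        }
        // Too much original text suggests a real contribution.
        if (originalLines == 0 || originalLines > MAX_ORIGINAL_LINES) {
            return false;
        }
        String text = originalText.toString();
        for (String phrase : AGREEMENT_PHRASES) {
            if (text.contains(phrase)) {
                return true;
            }
        }
        return false;
    }
}
```

\layout Standard Because the result is only a suspicion, it fits the decision support approach described earlier: the article is flagged and the user decides whether to delete it.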
\layout Subsection User's NNTP Client \layout Standard The user's NNTP client module is an adaptation of \shape italic Java News \shape default , a Java Usenet News client \begin_inset LatexCommand \cite{java_news} \end_inset . This client will be altered so that the user can set up the initial conditions for the Agent and the search engine filter for Deja News \begin_inset LatexCommand \cite{deja_news} \end_inset . \layout Subsection Dialog Based Feedback \begin_inset LatexCommand \label{dialog_proto} \end_inset \layout Standard The \emph on Dialog Based Feedback \emph toggle dialog box shown in figure \begin_inset LatexCommand \ref{keep_articles} \end_inset lists all the articles that have been recommended by the system. The listbox appears after the user-specified time limit for searching for relevant articles has expired. The longer the system is given, the more relevant articles it returns, up to the limit specified by the user (for example, a maximum of twenty articles). \layout Standard Any articles which are suspected to be flames are marked with an indicator, such as a picture of a flame, next to the article. Flames are discussed in section \begin_inset LatexCommand \ref{flames} \end_inset . \layout Subsection Expert System Filtering for an Agent \begin_inset LatexCommand \label{sec: Expert System Filtering and Jess} \end_inset \layout Standard The general way that this component behaves is described in section \begin_inset LatexCommand \ref{expert filter} \end_inset . The focus here is on the specific implementation of this component.
\layout Standard The expert system used for filtering articles is implemented in Java using the Perl 5 regular expressions package \begin_inset Quotes eld \end_inset Stevesoft Perl Regular Expressions Library \begin_inset Quotes erd \end_inset \begin_inset LatexCommand \cite{stevesoft} \end_inset , which provides a way to do simple filtering in a similar way to Julia (see table \begin_inset LatexCommand \ref{julia_match} \end_inset ). In the future, rules will be provided using Jess \begin_inset LatexCommand \cite{jess} \end_inset . Jess is an expert system shell written in Java which uses the Rete algorithm (see section \begin_inset LatexCommand \ref{sec: Rete} \end_inset ) to operate as a production system. The expert system is used to classify articles. It identifies \begin_inset Quotes eld \end_inset me too \begin_inset Quotes erd \end_inset articles and flames, and rates the formality of articles, as described in section \begin_inset LatexCommand \ref{client_interface} \end_inset . \layout Subsubsection The Rete Algorithm \begin_inset LatexCommand \label{sec: Rete} \end_inset \layout Standard Jess uses the Rete (Latin for `net') algorithm \begin_inset LatexCommand \cite{rete} \end_inset . The Rete algorithm has been used in several expert system shells including OPS5 \begin_inset LatexCommand \cite{OPS5} \end_inset , its descendant ART \begin_inset LatexCommand \cite{ART} \end_inset , and CLIPS \begin_inset LatexCommand \cite{CLIPS} \end_inset . \layout Standard A basic production system checks each if-then statement to see which ones should be executed based on the facts in the database, looping back to the first rule when it has finished. The Rete algorithm gains efficiency over this basic algorithm by remembering past test results across iterations of the rule loop. Only new facts are tested against the rule left-hand sides (LHSs). Additionally, new facts are tested against only the rule LHSs which may be relevant.
As a result, the computational complexity per iteration decreases to approximately the order of \begin_inset Formula \( O(\sqrt{RP}) \) \end_inset , where R is the number of rules and P is the average number of patterns per rule LHS. The description given here is brief; for more information see the book \begin_inset Quotes eld \end_inset Expert Systems: Principles and Programming \begin_inset Quotes erd \end_inset by \latex latex \backslash citeN{rete2} \latex default . \layout Standard The Rete algorithm is implemented by building a network of nodes, each of which represents one or more tests found on a rule LHS. Facts that are being added to or removed from the fact list are processed by this network of nodes. The nodes at the bottom of the network represent individual rules. When a set of facts filters down to the bottom of the network, it has passed all the tests on the LHS of a particular rule, and this set becomes an \emph on activation \emph toggle . If one or more facts are removed from the \emph on activation set \emph toggle , the \emph on activation \emph toggle is invalidated. The associated rule will only have its right-hand side (RHS) executed ( \emph on be fired \emph toggle ) if the \emph on activation \emph toggle is not invalidated. \layout Standard Within the network itself there are, as a general definition, two kinds of nodes: \layout Enumerate One-input nodes \layout Enumerate Two-input nodes \layout Standard One-input nodes perform tests on individual facts. Two-input nodes, however, perform tests across facts and perform the grouping function. Subtypes of these two classes of node are also used, and there are also auxiliary types such as the terminal nodes, which are the last nodes to be executed.
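\layout Standard The two kinds of node can be illustrated with a small Java sketch that hand-builds the network for the two rules of table \begin_inset LatexCommand \ref{rete_example} \end_inset . It is a deliberately simplified illustration and not how Jess is implemented: all class and method names are assumptions, facts are plain strings, there are no variable-binding tests in the two-input nodes, and removal of facts is not handled.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical, much-simplified Rete-style network for two example rules. */
final class ReteSketch {
    /** One-input node: tests a single fact and passes it down if the test succeeds. */
    static final class OneInputNode {
        private final String expected;
        private final Consumer<String> next;

        OneInputNode(String expected, Consumer<String> next) {
            this.expected = expected;
            this.next = next;
        }

        void accept(String fact) {
            if (expected.equals(fact)) {
                next.accept(fact); // a failed test simply does nothing
            }
        }
    }

    /** Two-input node: remembers both inputs and emits every left/right grouping. */
    static final class TwoInputNode {
        private final List<List<String>> leftMemory = new ArrayList<>();
        private final List<String> rightMemory = new ArrayList<>();
        private final List<Consumer<List<String>>> successors = new ArrayList<>();

        void addSuccessor(Consumer<List<String>> successor) {
            successors.add(successor);
        }

        void left(List<String> token) {
            leftMemory.add(token);
            for (String fact : rightMemory) {
                emit(token, fact);
            }
        }

        void right(String fact) {
            rightMemory.add(fact);
            for (List<String> token : leftMemory) {
                emit(token, fact);
            }
        }

        private void emit(List<String> token, String fact) {
            List<String> joined = new ArrayList<>(token);
            joined.add(fact);
            for (Consumer<List<String>> successor : successors) {
                successor.accept(joined);
            }
        }
    }

    /** Activations recorded by the terminal nodes, in the order they occur. */
    final List<String> activations = new ArrayList<>();
    private final List<OneInputNode> topNodes = new ArrayList<>();

    ReteSketch() {
        TwoInputNode xy = new TwoInputNode();   // shared by both rules
        TwoInputNode xyz = new TwoInputNode();  // only needed by the three-pattern rule
        xy.addSuccessor(token -> activations.add("example-2 " + token));
        xy.addSuccessor(xyz::left);
        xyz.addSuccessor(token -> activations.add("example-1 " + token));
        topNodes.add(new OneInputNode("x", f -> xy.left(Collections.singletonList(f))));
        topNodes.add(new OneInputNode("y", xy::right));
        topNodes.add(new OneInputNode("z", xyz::right));
    }

    /** Presents a new fact to every node at the top of the network. */
    void assertFact(String fact) {
        for (OneInputNode node : topNodes) {
            node.accept(fact);
        }
    }

    public static void main(String[] args) {
        ReteSketch net = new ReteSketch();
        net.assertFact("x");
        net.assertFact("y");
        net.assertFact("z");
        System.out.println(net.activations);
    }
}
```

\layout Standard Because the first two-input node feeds both a terminal node and the second two-input node, the tests for the patterns shared by the two rules are carried out only once. Each new fact is matched only against what the node memories already hold, rather than against every rule from scratch.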
\begin_float tab \layout Standard \align center \LyXTable multicol5 5 2 0 0 -1 -1 -1 -1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 8 1 0 "" "" 8 1 1 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" (defrule example-1 \newline (defrule example-2 \newline (x) \newline (x) \newline (y) \newline (y) \newline (z) \newline \begin_inset Formula \( \Rightarrow \) \end_inset \protected_separator ) \newline \begin_inset Formula \( \Rightarrow \) \end_inset \protected_separator ) \newline \layout Caption \begin_inset LatexCommand \label{rete_example} \end_inset Example of the Rete Algorithm \end_float \layout Standard The two example rules shown in table \begin_inset LatexCommand \ref{rete_example} \end_inset may be compiled into the network illustrated in figure \begin_inset LatexCommand \ref{rete_example_figure} \end_inset . \layout Standard \begin_float fig \layout Standard \align center \begin_inset Figure size 299 182 file rete_net.eps flags 10 \end_inset \layout Caption \begin_inset LatexCommand \label{rete_example_figure} \end_inset Network for Rete Algorithm Example \end_float The nodes marked `x?', `y?' and `z?' test whether a fact contains the given data, while the nodes marked `+' remember all facts and fire whenever they have received data from both their left and right inputs. \layout Standard To run the network, the Rete algorithm presents new facts to each node at the top of the network when they are appended to the fact list. Each node takes input from the top and sends its output downwards. A one-input node generally receives a fact from above and applies a test to it. If the test is passed, the fact is sent downward to the next node; if the test fails, the one-input node simply does nothing.
The two-input nodes integrate facts from their left and right inputs, and because of this their behaviour is more complex. \layout Standard Any facts that reach the top of a two-input node could potentially contribute to an \emph on activation \emph toggle . That is, they pass all tests that can be applied to single facts. The two-input nodes therefore must remember all facts that are presented to them, and attempt to group facts arriving on their left inputs with facts arriving on their right inputs to make up complete \emph on activation sets \emph toggle . A two-input node therefore has a `left memory' and a `right memory'. Using memories improves efficiency because two tasks can be working simultaneously. A convenient distinction is to divide the network into two logical components: the one-input nodes comprise the \emph on pattern network \emph toggle , while the two-input nodes make up the \emph on join network \emph toggle \begin_inset LatexCommand \cite{jess_manual} \end_inset . \layout Subsection Communication With Other Users' Agents \begin_inset LatexCommand \label{sec: Implement communication} \end_inset \layout Standard Section \begin_inset LatexCommand \ref{communication} \end_inset described the general way in which agents communicate with each other. This section discusses NFACT's particular implementation of this protocol. \layout Standard The method chosen for communicating with other agents is Sunsoft's RMI (Remote Method Invocation) \begin_inset LatexCommand \cite{java} \end_inset . This method was used in the prototype because of the author's experience with distributed object technologies, and it was chosen in preference to CORBA because the system is a homogeneous Java environment. This means that an ordinary Java interface can be used to define the calls that send both string messages and the hashtable that links all of the news objects for a particular agent.
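\layout Standard As an illustration of how an ordinary Java interface can serve as the contract between agents, the following sketch declares a remote interface and exercises it with two in-process agents. All names here (Article, NewsAgent, LocalNewsAgent and their methods) are assumptions made for the example and are not NFACT's actual classes. For brevity the sketch uses a pull style, in which one agent fetches what it needs, rather than a sequence of calls in both directions.

```java
import java.io.Serializable;
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Hashtable;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical article record; Serializable so that RMI can pass it by value. */
final class Article implements Serializable {
    private static final long serialVersionUID = 1L;
    final String messageId;
    final String topic;
    final String body;

    Article(String messageId, String topic, String body) {
        this.messageId = messageId;
        this.topic = topic;
        this.body = body;
    }
}

/** Hypothetical remote interface: every remote method declares RemoteException. */
interface NewsAgent extends Remote {
    String[] getTopics() throws RemoteException;
    String[] getMessageIds(String topic) throws RemoteException;
    Article[] getArticles(String[] messageIds) throws RemoteException;
}

/** In-process implementation, enough to exercise the interface without a registry. */
final class LocalNewsAgent implements NewsAgent {
    // Hashtable keyed by message id, linking the news objects held by this agent.
    private final Hashtable<String, Article> store = new Hashtable<>();

    void add(Article article) { store.put(article.messageId, article); }

    boolean has(String messageId) { return store.containsKey(messageId); }

    public String[] getTopics() {
        TreeSet<String> topics = new TreeSet<>();
        for (Article a : store.values()) topics.add(a.topic);
        return topics.toArray(new String[0]);
    }

    public String[] getMessageIds(String topic) {
        List<String> ids = new ArrayList<>();
        for (Article a : store.values()) if (a.topic.equals(topic)) ids.add(a.messageId);
        return ids.toArray(new String[0]);
    }

    public Article[] getArticles(String[] messageIds) {
        List<Article> found = new ArrayList<>();
        for (String id : messageIds) if (store.containsKey(id)) found.add(store.get(id));
        return found.toArray(new Article[0]);
    }

    /** Fetches from the peer every article on a shared topic that this agent lacks. */
    int fetchMissingFrom(NewsAgent peer) throws RemoteException {
        List<String> mine = Arrays.asList(getTopics());
        List<String> wanted = new ArrayList<>();
        for (String topic : peer.getTopics()) {
            if (!mine.contains(topic)) continue;
            for (String id : peer.getMessageIds(topic)) if (!has(id)) wanted.add(id);
        }
        Article[] received = peer.getArticles(wanted.toArray(new String[0]));
        for (Article a : received) add(a);
        return received.length;
    }

    public static void main(String[] args) throws RemoteException {
        LocalNewsAgent a = new LocalNewsAgent();
        LocalNewsAgent b = new LocalNewsAgent();
        a.add(new Article("<1@a>", "java", "RMI question"));
        b.add(new Article("<2@b>", "java", "RMI answer"));
        b.add(new Article("<3@b>", "cooking", "Soup"));
        System.out.println("fetched " + a.fetchMissingFrom(b) + " article(s)");
    }
}
```

\layout Standard In a distributed deployment the implementation object would additionally be exported, for example with java.rmi.server.UnicastRemoteObject, and looked up through an RMI registry. That step is omitted here so that the sketch runs in a single virtual machine.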
\begin_float tab \layout Standard \align center \LyXTable multicol5 20 5 0 0 -1 -1 -1 -1 1 1 0 0 1 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 8 1 0 "" "" 8 1 0 "" "" 8 1 0 "" "" 8 1 0 "" "" 8 1 1 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 
0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" \size scriptsize Time \newline Agent A Messages \newline Agent A Processes \newline Agent B Messages \newline Agent B Processes \newline 1 \newline agentB.getTopics(); \newline \newline \newline \newline 2 \newline \newline Receive Topics \newline agentA.sendTopics \newline \newline \newline \newline \newline (String[] topics); \newline \newline 3 \newline agentA.sendTopic \newline \newline \newline \newline \newline (String [] Message_Ids \newline Send Message Id's of \newline \newline Recieve Topics \newline \newline ,String topic); \newline articles that have the \newline \newline \newline \newline ......... \newline same topics \newline \newline \newline 4 \newline sendMessage_Ids \newline \newline agentA.getArticles \newline Get Articles where \newline \newline (String[] message_Ids) \newline \newline (Article[] articles, \newline the Message_Ids are \newline \newline \newline \newline String[] message_Ids) \newline not found in the \newline \newline \newline \newline \newline database file \newline 5 \newline \newline \newline agentA.sendArticles \newline \newline \newline \newline \newline (Article[] articles); \newline \newline 6 \newline \newline Receive Articles \newline \newline \newline 7 \newline agentB.sendArticles \newline Send Articles \newline \newline Receive Articles \newline \newline (Aricle[] articles); \newline which Agent B does \newline \newline \newline \newline \newline not have \newline \newline \newline 8 \newline \newline Process_and_Store \newline \newline Process_and_Store \newline \newline \newline articles \newline \newline articles \layout Caption \begin_inset LatexCommand \label{agent_rmicalls} \end_inset RMI Calls For Two 
Agents \end_float \layout Standard Table \begin_inset LatexCommand \ref{agent_rmicalls} \end_inset shows two agents communicating using RMI to pass topics back and forth. Agent A invokes the \family typewriter getTopics() \family default method on Agent B to request a list of topics, and Agent B replies by invoking \family typewriter sendTopics() \family default on Agent A. Agent B automatically sends the topics of each agent on its system. This provides an implementation of the backscratching routine described in section \begin_inset LatexCommand \ref{sec: Agent Interaction Example} \end_inset . \layout Subsection Database of Matching Articles \layout Standard The database of matching articles is stored as a random access file which is manipulated by the \emph on Agent Coordinator \emph toggle . Articles are retrieved from the database when the agents submit the \emph on Article Id \emph toggle or group of \emph on Article Ids \emph toggle by which they are referenced. The \emph on Agent Coordinator \emph toggle can also retrieve articles by their \emph on Usenet Ids \emph toggle when requests are made to retrieve articles from another user's agent via \emph on its Agent Coordinator. \emph toggle The agent keeps a reference to both a \emph on Usenet Id \emph toggle and an \emph on Article Id \emph toggle . Articles requested by \emph on Usenet Id \emph toggle need to be retrieved sequentially because the database file can have only the \emph on Article Id \emph toggle as an index. \layout Subsection Implementation Issues \layout Standard The prototype was developed using the Java language because of its ease of implementation and because it can be executed on many hardware platforms, including Windows NT \latex latex \begin_inset LatexCommand \cite{word} \end_inset \latex default and Solaris \begin_inset LatexCommand \cite{java} \end_inset . One of the main problems with this approach, however, is that bytecode executed by Java's virtual machine runs very slowly.
Therefore other languages such as C++ could be used to increase the performance of the agent. \layout Standard Another problem is that several features could not be implemented because of time constraints. These include \emph on a system for agent learning, finding other users' agents and security features \emph toggle to ensure that only certain articles can be given to other agents. \layout Chapter Results from the implementation of NFACT \layout Section Testing Strategy \layout Standard The testing of the prototype application involves the input of various test data to illustrate the different capabilities of the NFACT prototype. This includes testing for flames, for example by using the keyword `idiot' as an input. The number of articles returned was limited to twenty as this is representative of a typical query. NFACT is tested against manually searching for the same queries using the Unix `tin' newsreader. The percentage of \emph on unhelpful, helpful, incorrect \emph toggle or \emph on correct \emph toggle articles is reported. \layout Description incorrect: articles that have been incorrectly classified (for example, an article which has been classified as a flame when it is not one). \layout Description correct: articles that have been correctly classified. \layout Description rating: each article is rated from 1 (most helpful) to 10 (unhelpful). The number of articles given each rating is shown in the `Results as Expected?' column of table \begin_inset LatexCommand \ref{testing} \end_inset . \layout Description Time: the time taken to retrieve and classify articles is also given in table \begin_inset LatexCommand \ref{testing} \end_inset . \layout Subsection Methodology \layout Standard Table \begin_inset LatexCommand \ref{testing} \end_inset illustrates the testing of the display of different classifications by entering key words which, if found in the main text of an article, should bring up a particular classification.
For example entering \begin_inset Quotes eld \end_inset idiot \begin_inset Quotes erd \end_inset into the \emph on Keyword Search field \emph toggle the system should classify articles as a `Flame' if that keyword appears in the text. \layout Standard \spacing onehalf \begin_float tab \layout Standard \align center \LyXTable multicol5 10 6 0 0 -1 -1 -1 -1 1 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 2 1 0 "" "" 2 1 0 "" "" 8 1 0 "" "" 8 1 0 "" "" 8 1 0 "" "" 2 1 1 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 2 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 2 0 1 0 0 0 "" "" 0 2 0 1 0 0 0 "" "" 0 2 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 2 0 1 0 0 0 "" "" 0 2 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" \size tiny Test Made \newline Expected Results \newline Results as \newline Time \newline Articles \newline Articles Flagged \newline \newline \newline Expected ? 
\newline (mins) \newline Tested \newline and Type \newline Entered some dummy \newline Flame and \newline yes \newline \newline 3 \newline 1 \begin_inset Quotes eld \end_inset Me too \begin_inset Quotes erd \end_inset \newline articles into the KeepArticles \newline \begin_inset Quotes eld \end_inset me too \begin_inset Quotes erd \end_inset articles \newline \newline \newline \newline 1 Flame \newline class with various classifications. \newline classified. \newline \newline \newline \newline \newline Entered \begin_inset Quotes eld \end_inset idiot \begin_inset Quotes erd \end_inset into query field \newline Flame articles classified \newline yes (20/20 \newline 1:33:85 \newline 20 \newline 9 Flame \newline \newline \newline Correctly \size default \size tiny classified) \newline \newline \newline (Formal N/A) \newline Entered \begin_inset Quotes eld \end_inset me too \begin_inset Quotes erd \end_inset into \newline \begin_inset Quotes eld \end_inset Me too \begin_inset Quotes erd \end_inset articles classified \newline yes (20/20 \newline 1:09:67 \newline 20 \newline 12 Me Too \size default \newline \size tiny the query field \size default \newline \newline \size tiny Correctly \size default \size tiny classified) \size default \newline \newline \newline \size tiny 1 Flame \size default \newline \size tiny \newline \newline \newline \newline \newline (Formal N/A) \layout Caption \begin_inset LatexCommand \label{testing} \end_inset Testing Table 1 \end_float In table \begin_inset LatexCommand \ref{tab: Testing 2} \end_inset , different topics are entered into the \emph on Keyword Search field. \emph toggle The articles are then checked to see how helpful or unhelpful they are by rating them from 1 (very helpful) to 10 (unhelpful) which is shown in the `Results as Expected?' column. The number of articles which were flagged as being of a particular type (such as `Flame' or `Me too') are shown in column `Articles Flagged and Type'. 
The `Articles Flagged and Type' column also displays how many articles were classified as a particular formality type. For example \begin_inset Quotes eld \end_inset 5, 3 Formal \begin_inset Quotes erd \end_inset indicates that 5 articles of a formal rating of 3 have been found. For more information about the formality rating system see section \begin_inset LatexCommand \ref{sec: After Retrieval} \end_inset . \begin_float tab \layout Standard \align center \LyXTable multicol5 42 6 0 0 -1 -1 -1 -1 1 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 8 1 0 "" "" 8 1 0 "" "" 8 1 0 "" "" 8 1 0 "" "" 8 1 0 "" "" 8 1 1 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 
0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 
0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" \size tiny Test Made \size default \newline \size 
tiny Expected Results \size default \newline \size tiny Results as \size default \newline \size tiny Time \size default \newline \size tiny Articles \size default \newline \size tiny Articles Flagged \size default \newline \newline \size tiny \size default \newline \size tiny Expected ? \size default \newline \size tiny (mins) \size default \newline \size tiny Tested \size default \newline \size tiny and Type \size default \newline \size tiny Entered \begin_inset Quotes eld \end_inset ZX Spectrum \begin_inset Quotes erd \end_inset into \size default \newline \size tiny Articles Correctly \size default \newline \size tiny no \size default \newline \size tiny 0:50 \size default \newline \newline \size tiny none \size default \newline \size tiny the query field \size default \newline \newline \size tiny 17 Incorrectly \size default \newline \newline \newline \size tiny 17, 1 Formal \size default \newline \newline \newline \size tiny Classified a Formal 1 \size default \newline \newline \newline \size tiny 3, 3 Formal \size default \newline \newline \newline \size tiny 17 Rating 10 \size default \newline \newline \newline \newline \newline \newline \size tiny 1 Rating 4 \size default \newline \newline \newline \newline \newline \newline \size tiny 1 Rating 5 \size default \newline \newline \newline \newline \size tiny Entered \begin_inset Quotes eld \end_inset Intel vs powerpc \begin_inset Quotes erd \end_inset \size default \newline \size tiny Articles Correctly \size default \newline \size tiny yes \size default \newline \size tiny 3:09 \size default \newline \size tiny 20 \size default \newline \size tiny none \size default \newline \size tiny into the query field. 
\size default \newline \size tiny Classified \size default \newline \size tiny 1 \size default \size tiny Rating 1 \size default \newline \newline \newline \size tiny 1 , 1 Formal \size default \newline \newline \size tiny and a \size default \newline \size tiny 1, \size default \size tiny Rating 3 \size default \newline \newline \newline \size tiny 17, 3 Formal \size default \newline \newline \size tiny majority \size default \newline \size tiny 2 \size default \size tiny Rating 4 \size default \newline \newline \newline \size tiny 2, 4 formal \size default \newline \newline \size tiny of helpful \size default \newline \size tiny 3 \size default \size tiny Rating 5 \size default \newline \newline \newline \size tiny 3 Incorrect \size default \newline \newline \size tiny articles \size default \newline \size tiny 1 \size default \size tiny Rating 6 \size default \newline \newline \newline \newline \newline \size tiny found. \size default \newline \size tiny 1 Rating 8 \size default \newline \newline \newline \newline \newline \newline \size tiny 2 Rating 9 \size default \newline \newline \newline \newline \newline \newline \size tiny 7 Rating 10 \size default \newline \newline \newline \newline \size tiny Entered \begin_inset Quotes eld \end_inset Mac OS X \begin_inset Quotes erd \end_inset \newline Articles Classified \newline no (many articles \newline 1:40 \newline 20 \newline 2 Flames \newline into the query field \newline Correctly \newline filled with \newline \newline \newline 7, 3 Formal \newline \newline and a \newline nonsense characters) \newline \newline \newline 13, 1 Formal \newline \newline majority \newline 1 Rating 3 \newline \newline \newline 10 Incorrect \newline \newline of helpful \newline 2 Rating 4 \newline \newline \newline \newline \newline article \newline 1 Rating 5 \newline \newline \newline \newline \newline found. 
\newline 1 Rating 8 \newline \newline \newline \newline \newline \newline 2 Rating 9 \newline \newline \newline \newline \newline \newline 12 Rating 10 \newline \newline \newline \newline Entered \begin_inset Quotes eld \end_inset Linux ppc \begin_inset Quotes erd \end_inset \newline Articles \newline 3 Rating 1 \newline 1:10 \newline 20 \newline none \newline into the query field \newline Classified \newline 2 Rating 2 \newline \newline \newline 7, 1 Formal \newline \newline and a \newline 2 Rating 3 \newline \newline \newline 3, 2 Formal \newline \newline majority \newline 2 Rating 5 \newline \newline \newline 10, 3 Formal \newline \newline of helpful \newline 2 Rating 6 \newline \newline \newline 8 Incorrect \size default \newline \size tiny \newline articles found \newline 2 Rating 7 \newline \newline \newline \newline \newline \newline 2 Rating 9 \newline \newline \newline \newline \newline \newline 5 Rating 10 \newline \newline \newline \newline Entered \begin_inset Quotes eld \end_inset Be OS \begin_inset Quotes erd \end_inset \newline Articles Classified \newline 1 Rating 2 \newline 1:10 \newline 20 \newline none \newline into the query field \newline and a \newline 3 Rating 3 \newline \newline \newline 9, 1 Formal \newline Entered comp.sys.be \newline majority \newline 1 Rating 4 \newline \newline \newline 1, 2 Formal \newline into the newsgroup \newline of helpful \newline 1 Rating 5 \newline \newline \newline 10, 3 Formal \newline field as a general \newline articles found \newline 1 Rating 7 \newline \newline \newline \newline search did not \newline \newline 2 Rating 8 \newline \newline \newline \newline produce any results \newline \newline 3 Rating 9 \newline \newline \newline \newline \newline \newline 8,Rating 10 \newline \newline \newline \layout Caption \begin_inset LatexCommand \label{tab: Testing 2} \end_inset Testing Table 2 (NFACT Testing) \end_float Finally the table \begin_inset LatexCommand \ref{tab: Testing 3} \end_inset shows the same 
queries that were made in the second section except that they have been searched for manually to illustrate a search by the user in using the `tin' Unix news reader. The first twenty articles which look as though they would satisfy the search criteria are examined in a similar way to the second section of the table and rated accordingly. The user also manually places a classification for the article (for example a `Flame' or `Me too' article). \layout Standard The aim for this testing is to illustrate how the agent can improve the user's search for relevant articles, and by classifying them can reduce the amount of irrelevant articles that would normally be browsed. One would not normally read a `Me too' type article for example. \begin_float tab \layout Standard \align center \LyXTable multicol5 27 5 0 0 -1 -1 -1 -1 1 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 8 1 0 "" "" 8 1 0 "" "" 8 1 0 "" "" 8 1 0 "" "" 8 1 1 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 
"" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" \size tiny Test Made \newline Results of \newline Time \newline Articles \newline Articles \newline \newline search \newline (mins) \newline Tested \newline Manually \newline \newline \newline 
\newline \newline Classified and Type \newline Looked for \begin_inset Quotes eld \end_inset ZX Spectrum \begin_inset Quotes erd \end_inset \newline 0 \newline 2 \newline 0 \newline n/a \newline in the \newline articles \newline \newline \newline \newline fido7.zx.spectrum \newline found \newline \newline \newline \newline newsgroup \newline \newline \newline \newline \newline Looked for \begin_inset Quotes eld \end_inset Intel vs powerpc \begin_inset Quotes erd \end_inset \newline 1 Rating 5 \newline 15:18 \newline 16 \newline none \newline in the comp.sys.powerpc and \newline 2 Rating 7 \newline \newline \newline 1, 2 Formal \newline comp.sys.intel \newline 2 Rating 9 \newline \newline \newline 10, 3 Formal \newline newsgroups. \newline 11 Rating 10 \newline \newline \newline 4, 4 Formal \newline Looked for \begin_inset Quotes eld \end_inset Mac OS X \begin_inset Quotes erd \end_inset \newline 1 Rating 1 \newline 10:10 \newline 7 \newline none \newline in the \newline 1 Rating 2 \newline \newline \newline 1, 5 Rating \newline comp.sys.mac.developer \newline 1 Rating 5 \newline \newline \newline 6, 3 Rating \newline and comp.sys.mac.advocacy \newline 1 Rating 7 \newline \newline \newline \newline newsgroups. \newline 1 Rating 8 \newline \newline \newline \newline \newline 2 Rating 10 \newline \newline \newline \newline Looked for \begin_inset Quotes eld \end_inset Linux ppc \begin_inset Quotes erd \end_inset \newline 2 Rating 1 \newline 17:10 \newline 5 \newline none \newline in the \newline 1 Rating 2 \newline \newline \newline 3, 3 Formal \newline comp.sys.linux.* \newline 1 Rating 3 \newline \newline \newline 2, 4 Formal \newline hierarchy.
\newline 1 Rating 5 \newline \newline \newline \newline \newline \newline \newline \newline \newline \newline \newline \newline \newline \newline Looked for \begin_inset Quotes eld \end_inset Be OS \begin_inset Quotes erd \end_inset \newline 1 Rating 2 \newline 23:15 \newline 3 \newline none \newline in the \newline 1 Rating 5 \newline \newline \newline 2, 3 Formal \newline comp.sys.be \newline 1 Rating 10 \newline \newline \newline 1, 4 Formal \newline newsgroup \newline \newline \newline \newline \layout Caption \begin_inset LatexCommand \label{tab: Testing 3} \end_inset Testing Table 3 (Manual Testing) \end_float Figure \begin_inset LatexCommand \ref{fig: nfact testing chart} \end_inset shows the five queries in table \begin_inset LatexCommand \ref{tab: Testing 2} \end_inset with the ratings for the twenty articles that were retrieved by the agent. \layout Standard Figure \begin_inset LatexCommand \ref{fig: manual testing chart} \end_inset is the corresponding chart for the manual searches in table \begin_inset LatexCommand \ref{tab: Testing 3} \end_inset and can be compared with figure \begin_inset LatexCommand \ref{fig: nfact testing chart} \end_inset . Only a few articles were found manually for each query, as the \begin_inset Quotes eld \end_inset Totals \begin_inset Quotes erd \end_inset on the x axis show. The reason is that the search engine used by the agent can retrieve articles which are no longer kept on the news server, and can search through a wider range of groups. The agent therefore retrieves more articles than would be found in a manual search. \layout Standard Figures \begin_inset LatexCommand \ref{fig: manual testing chart} \end_inset and \begin_inset LatexCommand \ref{fig: nfact testing chart} \end_inset suggest that more useful articles were found by using NFACT than by searching for relevant articles manually. The extra information given by the agent also helps the user judge whether an article might be relevant or not, which is an advantage over using a search engine such as Deja News by itself.
\begin_float fig \layout Standard \align center \begin_inset Figure size 337 242 file nfact_chart.ps flags 9 \end_inset \layout Caption \begin_inset LatexCommand \label{fig: nfact testing chart} \end_inset Article Rating Frequency for Table 2 (NFACT Testing) \end_float \begin_float fig \layout Standard \align center \begin_inset Figure size 351 272 file manual_chart.ps flags 9 \end_inset \layout Caption \begin_inset LatexCommand \label{fig: manual testing chart} \end_inset Article Rating Frequency for Table 3 (Manual Testing) \end_float \layout Chapter Conclusion and Future Work \layout Standard The project addressed some key questions, such as what kind of reasoning model should be provided in an intelligent agent, that is, the model that provides the brains for the agent to function effectively. This model has been developed further from similar news filtering agents such as INFOS by taking into account the type of information the user is interested in and not just the keywords of the topic itself. The agent can limit the sources from which it retrieves information. This is achieved by restricting the agent to specific newsgroups or by automatically filtering out articles from a specific author. However, simple filtering alone is not enough to enable the agent to decide which articles should be presented to the user. By using heuristics the agent can approximately classify an article into different types. For example, a derogatory article (see section \begin_inset LatexCommand \ref{flames} \end_inset ) is shown in the dialog box with a flame symbol to indicate that the agent has made an educated guess as to the nature of the article. \layout Standard Another question that has been addressed is how user agents can communicate with each other, so that each can provide recommendations to the other.
By sharing articles, agents can retrieve articles from another user that have a greater chance of being relevant. \layout Section Agent Model \layout Standard Figure \begin_inset LatexCommand \ref{Project_diagram} \end_inset shows four agents retrieving sorted matching articles from a database, and placing them in it, using an \emph on Agent Coordinator \emph toggle . Such a system might typically include the following components: \emph on Agent's NNTP Client \emph toggle , \emph on Client Interface \emph toggle , \emph on User's NNTP Client \emph toggle , \emph on Dialog Based Feedback \emph toggle , an \emph on expert system to filter articles \emph toggle , a \emph on learning mechanism \emph toggle and a \emph on Database of Matching Articles \emph toggle . These are represented in figure \begin_inset LatexCommand \ref{Project_diagram} \end_inset . Each component is briefly summarised as follows: \layout Itemize Agent's NNTP Client \begin_inset LatexCommand \label{sec: Agent's NNTP Client} \end_inset \newline Agents can retrieve articles either by using a search engine (such as Deja News \begin_inset LatexCommand \cite{deja_news} \end_inset ) or by connecting to the news server that the user's client connects to. The agent can then retrieve articles by issuing commands to the server. \layout Itemize Client Interface \newline The client interface should give the user access to all the abilities of a standard Usenet news client and provide a menu which saves the agent's initial profile to disk and can read and alter topic profiles. The system should also have the ability to search for other user agents and connect to them. An example client interface can be found in the prototype shown in figure \begin_inset LatexCommand \ref{user_profile_screen} \end_inset . \layout Itemize User's NNTP Client \newline The NNTP user client connects to a \emph on news server \emph toggle and performs all the commands necessary to read and post news articles.
\layout Itemize Dialog Based Feedback \newline The \emph on Dialog Feedback Mechanism \emph toggle component creates the starting rules based on the user profile. The user profile is created from the restrictions or initial search settings that the user enters into the system, which specify the author, subject, author's email address or article content that should be searched for. \layout Itemize Expert System Filtering for an Agent \newline The agent uses an expert system to enable it to classify articles. The \emph on Expert System \emph toggle is a component to which the agent sends the main text of an article so that it can be processed and the resulting classification sent back to the system. The system also provides a mechanism for introspection which involves outputting text to a listbox. The text contains information about what decisions the agent has made in regard to classifying and choosing articles. \layout Itemize Agent Learning and Updating the Rulebase \newline News agents are given examples of articles which the user finds useful and then extrapolate some rules from them by using pattern matching techniques. The agent also observes the user while they are reading articles and tries to ascertain whether the user is interested in each article. If the agent believes an article to be useful it will present it to the user using the \emph on Dialog Feedback Mechanism \emph toggle . \layout Itemize Database of Matching Articles \newline The database file is accessed via the Agent Coordinator. This component stores the articles that belong to the agents. The agents are able to retrieve and save articles to the database by the use of references. These references are stored in a hashtable, which is shown in table \begin_inset LatexCommand \ref{hashtable_of_articles} \end_inset .
Each article processed by the system can be stored in a news object, which contains several attributes such as the \emph on System Id \emph toggle and the \emph on Subject \emph toggle of the article. \layout Itemize Inter-agent Communication Language for Agents \newline One of the properties of agents is their ability to communicate with other agents. NFACT communicates with other agents to retrieve useful articles. The agent does not retrieve rules from other agents because the rules that are used need to be generated from the user's own preferences and not those of another user, who may have different interests even within the same topic domain. An abstract language has been created to allow communication between other agents and the \emph on Agent Coordinator \emph toggle . The language is fully described in section \begin_inset LatexCommand \ref{communication} \end_inset . \layout Section Future Research \layout Standard This project addresses two main areas: a reasoning model for the agent, which is implemented in NFACT as a production system, and a communication protocol between agents to negotiate and swap relevant articles. What has not been addressed, however, is the problem of finding other agents on the Internet. Would the agent be able to use a DNS lookup system to find other agents? Would it be possible to use a search engine with which agents could register their existence? \layout Standard One of the other main issues that needs to be addressed is that of security. It is important to restrict the information that can be retrieved from other agents because some groups, and the information contained therein, should only be local to a particular server. \layout Standard Four of these issues are explained further here: restricting the distribution of articles for better security, socket sharing between Agent Coordinators, implementing and testing an agent learning facility, and implementing and testing an agent communication facility.
\layout Subsection Security (Scope of Access to Articles) \begin_inset LatexCommand \label{security} \end_inset \layout Standard Security is important for a news agent in order to restrict the information that can be retrieved from other agents. Some newsgroups are only made available locally on a particular site or sub-network. Some of these local newsgroups could contain sensitive information that the organisation where the group is based would not like released to a more public audience. Because agents are retrieving articles from other agents they could inadvertently retrieve articles of this nature. \layout Subsection Socket Sharing for Agent Coordinators \begin_inset LatexCommand \label{socket} \end_inset \layout Standard One of the main problems with Internet sockets is that there is only a limited number of port numbers, and only some of the widely used applications such as \emph on Internet News \emph toggle and the \emph on World Wide Web \emph toggle have standardised on a particular port. For this reason the \emph on Agent Coordinator \emph toggle could have a problem if another program or another \emph on Agent Coordinator \emph toggle were using the same port number. \layout Standard Dynamic IP addressing has been in use for the past few years. Internet service providers with a limited allocation of IP addresses to give to their customers can dynamically allocate an address to a client who is logging in via a modem. In the same way port numbers could be negotiated by an \emph on Agent Coordinator Port Allocator \emph toggle (ACPA) program. \layout Standard When a program needs a port number it first requests one from the ACPA.
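\layout Standard The allocation idea can be sketched in a few lines of Python. This is only an illustration of the proposal, not part of the NFACT prototype: the class name PortAllocator, its methods and the port range 5000 to 5002 are all hypothetical, and the sketch ignores how programs would contact the allocator over the network.

```python
class PortAllocator:
    """Toy stand-in for the proposed ACPA: leases port numbers from a fixed pool."""

    def __init__(self, first_port, last_port):
        # Ports that are currently free, in ascending order (hypothetical range).
        self._free = list(range(first_port, last_port + 1))
        # Maps each leased port to the name of the program holding it.
        self._leases = {}

    def request(self, owner):
        """Lease the lowest free port to `owner` and return its number."""
        if not self._free:
            raise RuntimeError("no free ports left in the pool")
        port = self._free.pop(0)
        self._leases[port] = owner
        return port

    def release(self, port):
        """Return a leased port to the pool; unknown ports are ignored."""
        if port in self._leases:
            del self._leases[port]
            self._free.append(port)
            self._free.sort()

    def owner_of(self, port):
        """Return the program holding `port`, or None if it is free."""
        return self._leases.get(port)


allocator = PortAllocator(5000, 5002)
first = allocator.request("coordinator-a")
second = allocator.request("coordinator-b")
```

\layout Standard Because every lease goes through one allocator, two Agent Coordinators in this sketch can never be handed the same port, which is the conflict the proposal is meant to avoid.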
\layout Subsection Implementation and Testing of the Agent Learning Facility \layout Standard The prototype at this stage enables the user to search for articles via a search engine (see section \begin_inset LatexCommand \ref{sec: Agent's NNTP Client} \end_inset ) and to retrieve a list of filtered articles. The agent does not yet implement a way of learning from the user which articles are important to keep. The general way an agent could learn from the user is discussed in section \begin_inset LatexCommand \ref{agent_learning} \end_inset . By choosing the most appropriate of the retrieved articles, the user would enable the agent to learn about the user's preferences by adding rules to the Jess rulebase (see section \begin_inset LatexCommand \ref{sec: Expert System Filtering and Jess} \end_inset ). When a similar search is performed, the system would use this rulebase to place the articles which could be more relevant at the top of the list. \layout Subsection Implementation and Testing of Agent Communication \layout Standard The agent communication discussed in chapter \begin_inset LatexCommand \ref{communication} \end_inset has still to be implemented. The way that this would be achieved has been described in section \begin_inset LatexCommand \ref{sec: Implement communication} \end_inset . The RMI (Remote Method Invocation) interface for the prototype has been created; however, only the passing of a single article has been implemented. \layout Standard There are a number of ways that the agent could be extended to improve its capabilities; however, quite a number of features and ideas have already been implemented in the prototype (section \begin_inset LatexCommand \ref{NFACT} \end_inset ) or designed in the framework given in section \begin_inset LatexCommand \ref{sec: Framework} \end_inset , which gives a solid basis for future work.
\layout Standard \begin_inset LatexCommand \BibTeX[chicago]{thesis_draft} \end_inset \layout Chapter \start_of_appendix The Internet \begin_inset LatexCommand \label{The_internet} \end_inset \layout Standard Recent years have seen an explosive growth of the Internet. The Internet was originally built in the 1960s by DARPA (Defense Advanced Research Projects Agency) \begin_inset LatexCommand \cite{rfc823} \end_inset . \layout Standard It was a group of heterogeneous computers linked up in a way which would survive a nuclear explosion. For example, even if part of the network had been destroyed the information would be re-routed in a different way. This network was called ARPANET (Advanced Research Projects Agency Network). \layout Standard In time universities started to connect to the network and the defense department constructed its own network, the Defense Data Network (DDN), to contain sensitive information. It still maintained a presence on the original network, however, so that it would have access to the research being carried out by universities around the country. \layout Standard This network became NSFNET (National Science Foundation Network) which still is the backbone of the Internet today. Finally many private individuals and non-profit organisations which became known as ISPs (Internet Service Providers) became connected. But it was the advent of the World Wide Web that became the starting point for the large scale growth of the Internet. Now large corporations and small businesses have joined to advertise their services and to keep in contact with relevant university research in their area. This began the commercialisation of the Internet into what it has become today: a large network with millions of computers and an even larger number of people using those computers. \layout Standard The main problem in recent years has been the shortage of IP addresses.
Just as people have a physical home address, each computer connected to the Internet has an address which is used for outgoing and incoming information. The IP address is four numbers separated by full stops. For example, a computer at Monash University called Silas has the IP address 130.194.1.100 . The IP address forms part of the TCP/IP protocol suite. This stands for Transmission Control Protocol/Internet Protocol. TCP ensures that the packets reach their destination in the right order and that there are no packets missing. This is useful for FTP (File Transfer Protocol), which is used for transferring files, and for other applications where it is critical to receive all the packets correctly. \layout Standard Another form of transferring packets is UDP (User Datagram Protocol), where packets are sent without worrying about error correction. This allows packets to be sent from the source to the destination very quickly, but packets that are lost or corrupted are simply missed. UDP is usually used in multimedia streaming such as RealVideo or RealAudio, where it is not vital that every single packet is received correctly, just that there is a large enough flow of data to maintain a constant audio or visual stream. \layout Section Internet News \layout Standard Netnews was originally designed on UNIX systems to exchange information in a common area. Now most of the platforms connected to the Internet, such as Macintosh, PC and VAX computers, use this form of news protocol. \layout Standard Information on netnews is divided into groups called newsgroups. These newsgroups cover specific areas of interest. There are more than a thousand newsgroups, although not all of them are available on any one server. For more information about Internet News see chapter \begin_inset LatexCommand \ref{Internet_news_appendix} \end_inset . \layout Subsection Other Internet Services \layout Subsubsection FTP (File Transfer Protocol) \layout Standard FTP is a way of sending and retrieving files over the Internet.
It is used both by dedicated clients and by other programs such as Web browsers (e.g. Netscape Navigator and Microsoft Internet Explorer). It uses TCP/IP to provide reliable connections, whose error correcting facilities resend any corrupted or missing packets of data. \layout Subsubsection Telnet \layout Standard Telnet allows users to remotely access a computer located at another site. By telnetting to a particular machine one is able to remotely log on to the machine. This can be done anonymously if there is a public account, by typing \family typewriter `anonymous' \family default for the user name and the user's current email address for the password, or with a name and password for a private account. This means one can use the resources on the remote machine, such as being able to read mail or news. \layout Subsubsection Gopher \layout Standard Gopher is a way of finding files on the Internet, with clients connecting to gopher services. It provides a hierarchical tree structure to sort information in a logical way. For example, a book from the local library would be found in the \begin_inset Quotes eld \end_inset Libraries/Australian Libraries/Melbourne Libraries/Boroondara/Hawthorn \begin_inset Quotes erd \end_inset directory. \layout Subsubsection Archie \layout Standard Archie is a way of searching for programs. Users would typically telnet to an Archie server such as Archie.au and be presented with a text menu which allows them to search using regular expressions or just the name of the file that is being searched for. The system then searches through its database, which is regularly updated, and provides the user with a list of sites which have the particular file and the directory in which it can be found. With a graphical client the user just has to double click on one of the sites listed to download the file from that particular site.
\layout Subsubsection The World Wide Web (WWW) \layout Standard The World Wide Web (WWW) is the most popular of these services and is a combination of information retrieval and hypertext. \layout Standard The WWW seeks to provide access to the web of information available. It does this using HTML (Hypertext Markup Language), which is a text language used to describe a web page. An example of how to make a piece of text bold is shown in table \begin_inset LatexCommand \ref{html_text_bold} \end_inset . \layout Standard \begin_float tab \layout Standard \align center \LyXTable multicol5 3 1 0 0 -1 -1 -1 -1 1 0 0 0 0 0 0 0 0 1 0 0 8 1 1 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" 0 8 0 1 0 0 0 "" "" \begin_inset Formula \( < \) \end_inset b \begin_inset Formula \( > \) \end_inset This would be bold on a web page \begin_inset Formula \( < \) \end_inset /b \begin_inset Formula \( > \) \end_inset \newline or for a large web page heading: \newline \begin_inset Formula \( < \) \end_inset h1 \begin_inset Formula \( > \) \end_inset This is my Large Heading \begin_inset Formula \( < \) \end_inset /h1 \begin_inset Formula \( > \) \end_inset \layout Caption \begin_inset LatexCommand \label{html_text_bold} \end_inset HTML (Hypertext Markup Language) Example \end_float The WWW contains many different types of data, from pictures to video to streamed audio such as RealAudio. It does not have a top or a bottom; like the spider web it resembles, it spreads out in many different directions. Anyone can publish information, and an individual can have just as much web real estate as a large company such as IBM or Apple Computer. \layout Chapter Internet News \begin_inset LatexCommand \label{Internet_news_appendix} \end_inset \layout Standard Internet News (hereafter called \begin_inset Quotes eld \end_inset Netnews \begin_inset Quotes erd \end_inset ) was originally designed on UNIX systems to exchange information in a common area.
Now most of the platforms connected to the Internet, such as Macintosh, PC and VAX computers, use this form of news protocol. \layout Standard Information on Netnews is divided into groups called newsgroups. These newsgroups cover specific areas of interest. There are more than a thousand newsgroups, although not all of them are available on any one server. They are arranged in a hierarchical tree fashion, with each root of the tree devoted to a major topic, for example: \layout LyX-Code \begin_float fig \layout Standard \align center \begin_inset Figure size 109 76 file tree.eps flags 9 \end_inset \layout Caption A Newsgroup Tree \end_float \layout Standard Some of the major roots are: \layout Itemize alt - Alternative newsgroups often used for prototype newsgroups. \layout Itemize bionet - Biology related newsgroups. \layout Itemize bit - Bitnet newsgroups. \layout Itemize gnu - Newsgroups from the Free Software Foundation. \layout Itemize rec - Recreation related newsgroups. \layout Itemize comp - Computer related newsgroups. \layout Itemize sci - Scientific newsgroups. \layout Standard Some of the newsreader programs available on Unix systems are \shape italic rn, tin, gnunews \shape default , and \shape italic nn \shape default . Some newsgroups are moderated. This means there is a person nominated as a moderator who only allows articles which are deemed appropriate to the newsgroup to be posted. If non-text data has to be sent in a news posting, MIME (Multipurpose Internet Mail Extensions) \latex latex \begin_inset LatexCommand \cite{rfc2045} \end_inset \latex default can encode the binary data found in pictures, Microsoft Word documents, music files etc. into a text format that can be decoded at the other end. On Unix this kind of conversion was traditionally done with the uuencode and uudecode utilities; MIME defines its own transfer encodings such as Base64, and newsreaders on many other platforms have an encoder and decoder built into them.
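\layout Standard As a rough illustration of turning binary data into text that can travel in a news posting, the sketch below uses Base64, one of the transfer encodings defined by MIME, through Python's standard base64 module. The payload bytes are made up for the example and do not come from a real attachment.

```python
import base64

# Made-up binary payload standing in for part of a picture or document.
payload = bytes([0x89, 0x50, 0x4E, 0x47, 0x00, 0xFF])

# Encode the bytes as plain ASCII text that is safe to place in an article body.
encoded = base64.b64encode(payload).decode("ascii")

# The receiving newsreader reverses the step to recover the original bytes.
decoded = base64.b64decode(encoded)

print(encoded)
print(decoded == payload)
```

\layout Standard The encoded form is longer than the original data, which is the price paid for being able to send it through a text-only medium.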
\layout Section Design of NNTP \layout Standard NNTP is a protocol which specifies the distribution, inquiry, retrieval and posting of news articles using a reliable stream-based transmission of news over the Internet \begin_inset LatexCommand \cite{rfc977} \end_inset . \layout Standard Articles are stored in a central database, allowing the client to download only the ones that the user wishes to read. The server indexes these articles, cross-references them with articles on the same topic, and deletes news articles that have expired. The Unix Usenet news system provided these facilities but was restricted to Unix hosts. The NNTP standard allows a wide range of different platforms to have their own clients. A client talks to the server by opening a connection to the news port (typically TCP port 119) and sending text commands to ask for groups and articles. Some typical commands are: \layout List \labelwidthstring 00.00.0000 NEWGROUPS - Requests a list of new newsgroups that have been created on the server since a given date and time, typically when the client last connected. Any group which the local server is allowed to access can be created as a new newsgroup. \layout List \labelwidthstring 00.00.0000 NEWNEWS - Receives a list of new articles from the server in the form of message-Ids. \layout List \labelwidthstring 00.00.0000 NEXT - Retrieve next article, uses the \begin_inset Quotes eld \end_inset current article pointer \begin_inset Quotes erd \end_inset . This pointer keeps track of which article is currently selected. \layout List \labelwidthstring 00.00.0000 POST - Post an article. If posting is not allowed the response code 440 is returned by the server. A line containing a single period at the end of the text indicates the end of the article. \layout List \labelwidthstring 00.00.0000 QUIT - Finish transactions with the server and close the connection.
\layout List \labelwidthstring 00.00.0000 SLAVE - Indicates to the server that the client is a slave server, and can receive messages at a higher priority than other clients. Slave servers are used to reduce network traffic and buffer articles from the main server for faster service. If a slave server is not available the client can in some cases (depending on how the network is set up) connect directly to the main news server. \layout List \labelwidthstring 00.00.0000 HELP - Provides a short summary of commands that are understood by the server. \layout List \labelwidthstring 00.00.0000 IHAVE - This command informs the server that the client has a particular article which is defined by its message Id. \layout List \labelwidthstring 00.00.0000 GROUP - This command asks the server to select a newsgroup, for example comp.sys.mac.games. \layout List \labelwidthstring 00.00.0000 LIST - List of valid newsgroups on the server. \layout List \labelwidthstring 00.00.0000 ARTICLE - This command asks the server to provide a selected article, the parameter being the message Id. \layout List \labelwidthstring 00.00.0000 HEAD - Similar to ARTICLE except that it only returns the head of an article. \layout List \labelwidthstring 00.00.0000 BODY - Similar to ARTICLE except that it only returns the article body. \layout List \labelwidthstring 00.00.0000 STAT - Similar to ARTICLE except that no text is returned; it just changes the current pointer to the selected article. \layout Subsection Responses from the Server \layout Standard There are two types of responses given by the server to commands: text responses and status responses. \layout Standard A text response is sent as a series of lines of text, each terminated with a carriage return-line feed pair. A line containing a single period (.) indicates the end of a section of text. Text messages are displayed on the user's terminal whereas command/status messages are interpreted by the client.
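\layout Standard The framing rule for text responses can be sketched as a small Python function. This is an illustrative sketch rather than code from the NFACT prototype: the function name read_text_response is hypothetical and the sample lines are invented. It also undoes the period doubling that RFC 977 specifies for text lines which themselves begin with a period.

```python
def read_text_response(lines):
    """Collect the lines of an NNTP text response up to the terminating '.' line."""
    body = []
    for raw in lines:
        line = raw.rstrip("\r\n")  # strip the carriage return-line feed pair
        if line == ".":
            return body  # a lone period marks the end of the text
        if line.startswith(".."):
            line = line[1:]  # undo the doubled leading period
        body.append(line)
    raise ValueError("response ended without a terminating '.' line")


# Invented sample data; a real client would read these lines from the socket.
sample = [
    "net.wombats 00543 00501 y\r\n",
    "..hidden.group 00010 00001 n\r\n",
    ".\r\n",
    "this line belongs to the next response\r\n",
]
result = read_text_response(sample)
```

\layout Standard Reading stops at the lone period, so anything the server sends afterwards is left for the next command's response.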
\layout Standard Status responses report the outcome of the last command sent by the client. Each takes the form of a three-digit numeric code; some codes indicate that more text is to be transmitted by the server. The first digit indicates the success, failure, or progress of the previous command. The second digit indicates the category of function that the response relates to. A complete list of status codes is given in RFC 977 \begin_inset LatexCommand \cite{rfc977} \end_inset . \layout Subsection Example of a Client-Server Interaction \layout LyX-Code Server: (listens at TCP port 119) \layout LyX-Code Client: (requests connection on TCP port 119) \layout LyX-Code Server: 200 newsserver news server ready - posting OK \layout Standard (client asks for a current newsgroup list) \layout LyX-Code Client: LIST \layout LyX-Code Server: 215 List of newsgroups follows \layout LyX-Code Server: net.wombats 00543 00501 y \layout LyX-Code Server: net.unix-wizards 10125 10011 y \layout LyX-Code \protected_separator \protected_separator \protected_separator \protected_separator \protected_separator \protected_separator \protected_separator (more information here) \layout LyX-Code Server: net.idiots 00100 00001 n \layout LyX-Code Server: .
\layout Standard (client selects a newsgroup) \layout LyX-Code Client: GROUP net.unix-wizards \layout LyX-Code Server: 211 104 10011 10125 net.unix-wizards group \layout LyX-Code \protected_separator \protected_separator \protected_separator \protected_separator selected (There are 104 articles on file, \layout LyX-Code \protected_separator \protected_separator \protected_separator \protected_separator from 10011 to 10125) \layout Standard (client selects an article to read) \layout LyX-Code Client: STAT 10110 \layout LyX-Code Server: 223 10110 <23445@sdcsvax.ARPA> article \layout LyX-Code \protected_separator \protected_separator \protected_separator \protected_separator retrieved - statistics only \layout LyX-Code \protected_separator \protected_separator \protected_separator \protected_separator (article 10110 selected, its \layout LyX-Code \protected_separator \protected_separator \protected_separator \protected_separator message-id is \layout LyX-Code \protected_separator \protected_separator \protected_separator \protected_separator \protected_separator \protected_separator \protected_separator \protected_separator <23445@sdcsvax.ARPA>) \layout Standard (client examines the header) \layout LyX-Code Client: HEAD \layout LyX-Code Server: 221 10110 <23445@sdcsvax.ARPA> article \layout LyX-Code \protected_separator \protected_separator \protected_separator \protected_separator retrieved - head follows (text of header \layout LyX-Code \protected_separator \protected_separator \protected_separator \protected_separator appears here) \layout LyX-Code Server: .
\layout Standard (client wants to see the text body of the article) \layout LyX-Code Client: BODY \layout LyX-Code Server: 222 10110 <23445@sdcsvax.ARPA> article \layout LyX-Code \protected_separator \protected_separator \protected_separator \protected_separator retrieved - body follows (body text \layout LyX-Code \protected_separator \protected_separator \protected_separator \protected_separator here) \layout LyX-Code Server: . \layout Standard (client selects next article in group) \layout LyX-Code Client: NEXT \layout LyX-Code Server: 223 10113 <21495@comptec.uucp> article \layout LyX-Code \protected_separator \protected_separator \protected_separator \protected_separator retrieved - statistics only (article \layout LyX-Code \protected_separator \protected_separator \protected_separator \protected_separator 10113 was next in group) \layout Standard (client finishes session) \layout LyX-Code Client: QUIT \layout LyX-Code Server: 205 goodbye. \the_end