Structured Process Input/Output
Cameron McCormack
clm@csse.monash.edu.au
http://www.csse.monash.edu.au/~clm/uni/honours/
Supervisor: John Hurst

What's the project about?
- Moving UNIX process I/O away from flat text (line-based records) to structured text (XML documents)

The UNIX Command Line Environment
- Why is it popular?
- Idea of the "software tool" (Kernighan 1976)
  - Many simple, specific programs
  - Composition of these simple programs to make complex ones
- Focus is on flat text processing
  - Line-based records
  - Fields separated by whitespace
- Is this a problem?

A problem with flat text
- Not all data conform to this record/field model

An example
- A hierarchical directory listing
[Figure: hier2.png]

An example
- Output of "ls -R"
.:
documents
./documents:
personal
shared
./documents/personal:
phonebook
resume
./documents/shared:
housekeeping
- The hierarchical information doesn't fit into records/fields
- Using this output, how do you find all files two directories deep?

Another problem
- Many programs format output for human reading
- More difficult for programs to parse

Another example
- Output of ls long format
$ ls -l
total 0
-rw-r--r-- 1 cameron cameron 253 Oct 22 23:40 phonebook
-rw-r--r-- 1 cameron cameron 763 Sep 5 2001 resume
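A program reading the listing above has to rely on column positions. The following is a minimal Python sketch, assuming exactly the layout shown (five leading fields, then month, day, and a third token). The helper name `modified_field` is made up for illustration.

```python
# The two entries from the "ls -l" listing above, copied verbatim.
LISTING = """\
-rw-r--r-- 1 cameron cameron 253 Oct 22 23:40 phonebook
-rw-r--r-- 1 cameron cameron 763 Sep 5 2001 resume
"""


def modified_field(line):
    """Return the three whitespace-separated tokens that hold the timestamp.

    Assumption: the column layout is exactly the one shown above. Any
    change to the number of leading columns breaks the slice.
    """
    fields = line.split()
    return fields[5:8]


for line in LISTING.splitlines():
    month, day, time_or_year = modified_field(line)
    # The third token is a clock time for one entry and a year for the
    # other, so the caller still has to guess which one it received.
    kind = "time" if ":" in time_or_year else "year"
    print(month, day, time_or_year, "(" + kind + ")")
```

The slice `fields[5:8]` encodes knowledge of the presentation rather than of the data, and the third token changes meaning from one entry to the next.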
- How do you extract the modified time?
  - No clear field delimiter
  - Normally use "cut"
  - But this needs knowledge of output format
  - Modified time format can also change
- Need to separate information from presentation
- So, what can we do?

The solution
- Incorporate a well-defined structure into a process's input/output
- XML as the data format
  - Standardised
  - Parsers already exist
  - Still human readable
  - Hierarchical
- Added benefit: Unicode

The new model - programs
- Programs divided into three categories
  - Data generating programs
  - Data filtering programs
  - User interface to the environment (the shell)

The new model - data format
- Programs take an XML document on standard input and produce one on standard output
- Standard error remains plain text
- Most programs will generate documents of the form:
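[XML example not recoverable from this copy]

Data generating programs
- Programs such as ls, ps, df
- These programs take no input
- Not responsible for formatting the output
- They do have formatting options, however
- Formatting preferences recorded, but not acted upon until later

[XML example; markup not recoverable. Surviving text content: Smythe, Jon, 9555 1234, 9600 4321, ...]

The updated ls
- The output generated by our new ls program:

[XML example not recoverable from this copy]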
- Now how easy is it to
  - Find files two directories deep
  - Extract the last modified time

Data filtering programs
- Without more XML-aware programs, it is still difficult to extract information from this output
- We need some filters
  - An equivalent for cut, grep, sort, etc.
  - Some of these filters can utilise XPath
  - Take an XML document on standard input
  - Produce an XML document on standard output
- We can now say

[Example command not recoverable from this copy]
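To make the filter idea concrete, here is a minimal Python sketch of an XPath-style filter that reads one XML document and writes another. The listing format is hypothetical: the element names `directory`, `file`, and `result` and the attributes `name` and `modified` are assumptions for illustration, not the format used by the ls described in this talk. The file names and timestamps are taken from the listings earlier in the talk. Python's standard library supports only a subset of XPath, which is enough for these two queries.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical XML directory listing (element and attribute names are
# assumptions, not the talk's actual format).
SAMPLE = """\
<directory name=".">
  <directory name="documents">
    <directory name="personal">
      <file name="phonebook" modified="Oct 22 23:40"/>
      <file name="resume" modified="Sep 5 2001"/>
    </directory>
    <directory name="shared">
      <file name="housekeeping"/>
    </directory>
  </directory>
</directory>
"""


def xpath_filter(path, instream, outstream):
    """Read an XML document, keep elements matching path, write XML out."""
    root = ET.parse(instream).getroot()
    result = ET.Element("result")  # hypothetical wrapper element
    for match in root.findall(path):
        result.append(match)
    ET.ElementTree(result).write(outstream, encoding="unicode")


def run(path, text=SAMPLE):
    """Run the filter over in-memory streams standing in for stdin/stdout."""
    out = io.StringIO()
    xpath_filter(path, io.StringIO(text), out)
    return out.getvalue()


# Files two directories deep: a fixed-depth path, no text parsing needed.
two_deep = ET.fromstring(run("./directory/directory/file"))
print([f.get("name") for f in two_deep])

# Last modified time of one file: a named field, not a column position.
resume = ET.fromstring(run(".//file[@name='resume']"))
print(resume[0].get("modified"))
```

Because the output is itself an XML document, the result of one such filter could be fed straight into another, which is the composition property the flat-text tools have.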
User interface
- This XML output is not suitable for presenting to the user
- Need to transform it to some format for the terminal
- Can use XSLT to do the transformation
- But it is unwieldy to manually transform every command's output
- Transformation must happen automatically
- We need support from the shell

The new shell
- Handles composition of programs just like Bourne shell
- But also detects output type and transforms it appropriately

Automatic transformation
- The shell inspects the output of the command
- It looks at the namespace of the document element
  [XML example not recoverable; only "..." survives]
- It checks /etc/transforms.xml to determine which XSLT stylesheet to transform the output with
- It runs the XSLT transformer, the output going to the terminal

Prevent automatic transformation
- Allow the user to view the XML source, if they wish
- Helpful for determining how to construct a command pipeline
- The ^ character tells the shell to keep the markup

[Example command not recoverable from this copy]
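The automatic transformation steps described above (inspect the namespace of the document element, then look up a stylesheet) could be sketched roughly as follows in Python. The talk names /etc/transforms.xml but its schema is not shown, so the `transforms`/`transform` layout, the namespace URIs, and the stylesheet names below are all assumptions. The sketch stops at choosing a stylesheet; running the XSLT transformer itself is left out because Python's standard library has no XSLT processor.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for /etc/transforms.xml (schema is an assumption).
TRANSFORMS = """\
<transforms>
  <transform namespace="http://example.org/ns/ls" stylesheet="ls.xsl"/>
  <transform namespace="http://example.org/ns/ps" stylesheet="ps.xsl"/>
</transforms>
"""


def load_transforms(xml_text):
    """Map each namespace URI to the stylesheet registered for it."""
    root = ET.fromstring(xml_text)
    return {t.get("namespace"): t.get("stylesheet") for t in root.iter("transform")}


def document_namespace(output):
    """Return the namespace URI of the document element, or None.

    None covers both non-XML output and XML without a namespace.
    """
    try:
        tag = ET.fromstring(output).tag
    except ET.ParseError:
        return None
    # ElementTree spells a namespaced tag as "{uri}localname".
    if tag.startswith("{"):
        return tag[1:].split("}", 1)[0]
    return None


def choose_stylesheet(output, table):
    """Pick the stylesheet for a command's output, or None to pass it through."""
    return table.get(document_namespace(output))


table = load_transforms(TRANSFORMS)
print(choose_stylesheet('<listing xmlns="http://example.org/ns/ls"/>', table))
print(choose_stylesheet("plain text from an old program", table))
```

Returning None for output that does not parse as XML is one way to let existing non-XML programs pass through the shell untouched.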
What we can achieve with this
- We can now work with the underlying information without having to work around the formatting of any one program
- Allows the user to construct commands more intuitively

Limitations of the model
- Not everything is suited to XML!
  - Non-XML documents must be supported as well
- All documents must be well-formed
  - Does it make sense to run a filter on part of a document tree?
  - An example: XInclude

Demonstration

Significant outcomes
- Positive
  - Commands to extract data are easier to produce
  - Programs are simplified, not having to worry about presentation
  - Named fields are important
- Negative
  - Users must think about the problem at a different level
  - Must work from the source, not just from what's on the screen

Conclusions drawn
- Separation of form and content is definitely an advantage
- Programs interoperate better with a standard file format
- Simplicity is lost
- Will be difficult to get people to change

Future work
- A UNIX-like operating system in which this environment would be maximally useful
- Extending the environment to access network objects (XML over HTTP)

Thanks for listening!