This thesis involves the creation of an editor which enables users to view the graphical representation of a document. In general, if there is no graphical interface for editing a document, a text-based editor is used. For computer scientists, this is not a problem, as they are quite happy to work with a text-based system. However, less computer-literate users generally prefer to work with a graphical representation of a document, using software with a graphical user interface (GUI) rather than a text-based system. Therefore, the rationale behind this project is to provide users with a more user-friendly graphical representation of a document.
More specifically, this project aims to build an editor for the AXE (Ajh's XML Engine) style sheet or translation document, namely AXEsse (AXE style sheet editor). AXE is an XML tool that allows users to translate an XML (Extensible Markup Language) document to different types of document [8], such as HTML, RTF (Rich Text Format) and LaTeX, based on its style sheet. AXE and its style sheet will be discussed further in Sections 4.6 and 3.3, respectively. XML, including its background, translation process and the XML document itself, will be discussed in more detail in Chapter 2.
To achieve the project aim, the Perl (Practical Extraction and Report Language) programming language and the GTK (GIMP Toolkit) Perl binding were chosen to build the editor. Perl was chosen because it is good at text processing and is well suited to working with markup languages such as HTML and XML, both of which are central to this project. GTK was selected as the library for creating the graphical user interface because of its "look and feel" infrastructure [15]. The details of Perl and GTK will be discussed in Sections 5.1 and 5.2 respectively.
As stated before, the XML background, its translation process and the XML document will be discussed in Chapter 2, followed by Chapter 3 on the style sheets available for XML. Chapter 4 describes several available XML tools and compares them. The design of the editor is covered in Chapter 5, while Chapter 6 explains its implementation. The results are discussed in Chapter 7. The conclusions and future work are discussed in Chapter 8. The User Manual, the AXEsse program interface and the program source code are included in the appendices.
On today's Internet, the most popular web markup language is the Hypertext Markup Language (HTML). HTML has been a powerful asset to Web development, but it lacks the capability of specialization [11]. HTML is designed to describe how web page data is presented, not what the data represents. XML (Extensible Markup Language), on the other hand, is a markup language designed to describe what the data represents. Like HTML, XML has an interoperability feature, yet it differs from HTML in some significant ways: it is extensible and it separates document structure from document representation.
According to Kiely [16], the best part of XML is the 'X' in the acronym, which stands for extensible. This means that users have the ability to create their own vocabularies to describe the information. With this ability, an XML document can be designed to fit specific purposes. However, this is not possible with HTML. For instance, in building a document that stores address book information, XML allows users to create their own tags (tags are those that start with the 'less than' symbol, < and end with the 'greater than' symbol, >) such as <Name>, <Address>, <Phone> and <Email>. These XML tags are more meaningful and easier to understand compared to tags used in HTML. Tags will be discussed further in Section 2.3.6.
Unlike HTML, XML separates document structure from document representation. This allows users to use the same XML document to produce different document representations. Moreover, the relevant information in XML can be retrieved more easily by search engines than information in HTML. For example, suppose a user is searching for the names of all students at Monash University using a search engine. If an XML-based search engine is used, the user is more likely to be able to retrieve all the names of Monash students. However, if an HTML-based search engine is used, only those documents which contain the word "name" will be displayed, which is often not what the user wanted.
Like HTML, XML has an interoperability feature. It gives users the ability to exchange documents across different platforms and applications, essentially because an XML document is a simple text document [17]. In this sense, XML can be said to stand for exchangeable markup language rather than extensible markup language [18].
Both HTML and XML originate from the Standard Generalized Markup Language (SGML). SGML is an international standard for the semantic tagging of documents [7] that was issued in 1986. It is particularly popular in large industries such as aircraft, aerospace, power and telecommunications, because it deals with large quantities of highly structured data which need to be presented in the form of documents. It assists computers in cataloging and indexing. However, SGML is very complex and expensive.
A simpler and cheaper alternative, XML was developed in 1996 by an
XML Working Group chaired by Jon Bosak of Sun Microsystems, under the auspices
of the World Wide Web Consortium (W3C). As a subset of SGML, XML
retained the structural power and flexibility of SGML but eliminated much
of the syntactic complexity of SGML [29].
An XML document can be translated into different types of documents
such as those in HTML, RTF or TeX.
In the translation process, the parser is used to check the well-formedness
and/or validity of an XML document. If the document is well-formed
and/or valid, it can then be transformed by the Transformer or Code Generator
into a different document type (see Figure 1). The well-formedness
in an XML document will be discussed in Section 2.3.6 while the validity
will be discussed in Section 2.3.4. To transform an XML document
to any other document, a translation file or style sheet is also needed.
The style sheet will be discussed in Chapter 3.
The XML document is made up of character data and markup. Character
data is the basic information of the document. Markup, on the other
hand, describes the properties of the document. The kinds of markup
described below include entities, CDATA sections, declarations, document type
definitions, elements, comments, character references, and processing instructions.
An entity is a storage unit that contains particular parts of an XML
document [6]. It may be a file, a database
record or a network resource [19].
Before an entity can be referred to through an entity reference, it must
be declared using an entity declaration. For example, suppose an entity named
xml is declared with the content "Extensible Markup Language".
The XML processor will replace each instance of the entity reference, i.e.
&xml;, with Extensible Markup Language. An example is shown in
Figure 2.
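The entity mechanism described above can be sketched in a few lines.
Python and its standard xml.etree.ElementTree module are used here
purely for illustration (the thesis itself uses Perl), and the element
name Doc is a hypothetical choice; only the entity name xml and its
content come from the text.

```python
import xml.etree.ElementTree as ET

# The internal DTD subset declares the entity "xml"; the element name
# "Doc" is a hypothetical choice for this illustration.
DOCUMENT = """<!DOCTYPE Doc [
  <!ENTITY xml "Extensible Markup Language">
]>
<Doc>&xml; is a markup language.</Doc>"""

# The parser replaces the entity reference &xml; with its declared content.
root = ET.fromstring(DOCUMENT)
expanded_text = root.text
print(expanded_text)
```

The application never sees the reference itself, only the replacement
text, which is what makes entities useful for frequently repeated content.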
Entities are of the following types: internal entities, external
entities, parsed entities and unparsed entities, examples of which are shown
in Figure 3. Internal entities are defined completely within the
XML document, while external entities acquire their content
from another source located via a URL (Uniform Resource Locator) [6].
Parsed entities are entities whose replacement text is parsed
as part of the document in which a reference to them occurs [19]. Unparsed
entities are external entities that the XML processor should not parse
as XML in the current document [19].
CDATA, which stands for character data, is used as a verbatim quotation to
escape blocks of text containing characters which would otherwise be recognized
as markup. A CDATA section is not to be confused with the character
data of the document. CDATA is usually used to write markup code as plain text
in XML. A CDATA section begins with the string "<![CDATA[" and
ends with the string "]]>". The content between these strings is
not interpreted as markup by the XML parser. An example of CDATA is given
in Figure 4.
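As a small sketch of this behaviour (again in Python with the standard
xml.etree.ElementTree module, used only for illustration), the fragment
below parses a CDATA section. The element name Example and its content
are hypothetical.

```python
import xml.etree.ElementTree as ET

# Inside the CDATA section, "<b>" and "&" are plain characters, not markup.
DOCUMENT = "<Example><![CDATA[<b>bold</b> & more]]></Example>"

root = ET.fromstring(DOCUMENT)
cdata_text = root.text      # the verbatim text of the CDATA section
child_count = len(root)     # no child elements were created from "<b>"
print(cdata_text, child_count)
```

Without the CDATA section, the same content would have to be written
with escaped characters such as &lt; and &amp;.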
The document type declaration specifies the document type definition
a document uses [6]. The document type declaration is not to be confused
with the document type definition (DTD). The DTD is the set of rules
that specifies the structure of a document. It defines the legal
building blocks of an XML document [20].
An example is given in Figure
5, where <!DOCTYPE Document [ … ]> is the document type declaration
and the <!ELEMENT … (…)> declarations form the DTD.
An XML document is valid if it matches the constraints listed in the
DTD. From the example shown in Figure 5, the constraints are:
There are three parts to the XML declaration: the version information,
the encoding declaration and the standalone document declaration.
The XML declaration must precede the document type declaration if both
are provided.
Comments are inserted into an XML document to improve the readability
of that document. They are ignored by the processing software.
A comment must begin with the string "<!--", followed by any text,
and end with the string "-->". For compatibility, the string "--"
must not occur within comments. An example is given in Figure 8.
Each XML document contains one or more elements, which fall
into two categories: elements with content and empty elements.
An element with content begins with a start-tag and finishes
with an end-tag. The start-tag consists of the name of the element type,
known as the generic identifier (GI), enclosed by a 'less than' symbol
< and a 'greater than' symbol >. An end-tag consists of the string
"</", the same GI as the start-tag and a 'greater than' symbol >.
An example is shown in Figure 9.
The second category is the empty element, which is an element without any
content. There are two ways to denote an empty element:
either by simply leaving out the content or by using an empty tag.
An empty tag consists of a 'less than' symbol <, followed by the GI and
closed by the string "/>". To provide a better understanding of an
empty element, two examples are shown below.
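The two notations for an empty element can be compared with a short
sketch. Python's standard xml.etree.ElementTree module is used for
illustration only, and the element name Photo is hypothetical.

```python
import xml.etree.ElementTree as ET

# Two ways of writing an empty element; "Photo" is a hypothetical name.
with_end_tag = ET.fromstring("<Photo></Photo>")   # content simply left out
with_empty_tag = ET.fromstring("<Photo/>")        # empty tag


def is_empty(element):
    """Return True if the element has neither text nor child elements."""
    return not element.text and len(element) == 0


both_empty = is_empty(with_end_tag) and is_empty(with_empty_tag)
print(both_empty)
```

Both forms yield the same parsed element, so the choice between them is
a matter of notation rather than meaning.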
An XML document is said to be well-formed if all its elements' tags
are in properly nested pairs, that is, for each start-tag there is a matching
end-tag, except for tags denoting empty elements.
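A well-formedness check of this kind can be sketched as follows. This is
a minimal illustration in Python, relying on the standard
xml.etree.ElementTree parser to reject documents that are not
well-formed. The function name is_well_formed and the sample documents
are hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# Every start-tag has a matching end-tag.
print(is_well_formed("<Name><First>Ann</First></Name>"))
# The end-tag for <First> is missing.
print(is_well_formed("<Name><First>Ann</Name>"))
# An empty tag needs no separate end-tag.
print(is_well_formed("<Name><Photo/></Name>"))
```

Note that this only tests well-formedness; checking validity against a
DTD requires a validating parser, which the module used here does not provide.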
In addition to content, elements may have attributes which will be
discussed in Section 2.3.8.
A character reference refers to a specific character in the Unicode character
set [20]. Unicode is the native character set of XML, which can be
displayed by the XML browser [6]. Every Unicode character is identified by a
number (its code point); the most commonly used characters lie between 0 and
65,535. A decimal character reference consists of the string "&#",
the character's decimal code and a semicolon (;). If the hexadecimal
character code is used, the character reference starts with the string "&#x",
followed by the character's hexadecimal code and a semicolon (;). For example, the
Greek letter pi has the Unicode decimal value 960, so it can be inserted
into an XML document as &#960; or &#x3C0;.
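The relationship between the decimal and hexadecimal forms can be
checked with a short sketch in Python (used for illustration only). The
element name Symbol and the helper char_refs are hypothetical; the value
960 for the Greek letter pi comes from the text.

```python
import xml.etree.ElementTree as ET

# Decimal and hexadecimal references to the same character (Greek pi).
root = ET.fromstring("<Symbol>&#960;&#x3C0;</Symbol>")
decoded = root.text


def char_refs(character):
    """Build the decimal and hexadecimal character references for a character."""
    code = ord(character)
    return "&#{};".format(code), "&#x{:X};".format(code)


# Both references decode to the same character, so the two forms agree.
print(decoded[0] == decoded[1], char_refs(decoded[0]))
```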
A processing instruction (PI) is an explicit mechanism for embedding
information in a document intended for a proprietary application rather than
the XML parser or browser [6]. It allows an XML document to contain
instructions for applications. The XML parser will pass the instructions
to the application and the application will decide what to do with the
instructions. If the application does not recognize the instruction,
the instruction will be ignored.
An XML document only specifies the content of the document;
it does not say anything about how the content should look. Information
about an XML document's appearance is stored in a style sheet. Different
style sheets can be applied to a single document to produce different appearances.
A few examples of XML style sheets are CSS, XSL and the AXE style sheet.
These will be discussed in Sections
3.1, 3.2 and 3.3 respectively.
Section 3.4 compares these style sheets.
Cascading Style Sheets (CSS) were introduced in 1996 as a standard
for adding information about style properties to HTML documents. CSS
is a simple declarative language that allows stylistic information, such
as font, spacing and colour, to be applied to structured documents
written in HTML or even XML [23].
It allows elements to be rendered
by associating them with properties (e.g. font-size, font-weight, color)
and values (e.g. 24pt, bold, blue). For instance, in Figure 14, the Greeting
element is rendered as a block-level element in 24-point bold blue text.
Extensible Stylesheet Language (XSL) is a specification under development
within the W3C for applying formatting to XML documents [21]. It
is a language for expressing style sheets [5]. XSL itself is an XML
application. It contains two major parts:
In Figure 15, XSL transforms the Greeting element to 24-point bold
blue text.
AXE (Ajh's XML Engine) was developed by John Hurst at Monash University.
After experimenting with several XML tools that did not perform to expectations,
he built his own [8]. AXE was built to translate an XML document
to any document type, such as HTML or LaTeX [8]. All information
below relating to the details of AXE and its style sheet is derived from
Hurst's work [8].
AXE has its own style sheet mechanism, in which a style sheet may contain comments,
"include commands" and translation commands. A comment is denoted
by a '#' character, which must not be preceded by any other character,
followed by a string of text. The "include commands" allow
users to include other style sheets in the current style sheet, so that
users can reuse any style sheets that they have created. An include command begins
with the string "include", followed by the filename. An example of
two comments and an "include command" can be seen in Figure 16.
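The two rules above (a '#' in the first column marks a comment, and the
word "include" followed by a filename marks an include command) can be
sketched as a small line classifier. The sketch is in Python for
illustration only and is not taken from AXE or AXEsse. The function name
classify_line, the filename common.axe and the assumption that the
filename is separated from "include" by whitespace are all hypothetical.

```python
def classify_line(line):
    """Classify one style sheet line as 'comment', 'include file' or 'other'."""
    if line.startswith("#"):
        # '#' must be the very first character of the line.
        return {"type": "comment", "content": line[1:].strip()}
    words = line.split()
    # Assumption: the filename follows "include" after whitespace.
    if len(words) == 2 and words[0] == "include":
        return {"type": "include file", "content": words[1]}
    return {"type": "other", "content": line}


print(classify_line("# shared definitions"))
print(classify_line("include common.axe"))
print(classify_line("  # not a comment: '#' is preceded by spaces"))
```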
The translation commands define the translations to be applied upon
recognizing each element's start and end tag in the document. The
translation may consist of various texts or Perl code fragments.
Furthermore, the translation for each element may be divided into three parts:
prefix, infix and postfix translation (see Figure 17). The prefix
translation is applied before the element content is translated.
The element content, or infix, is indicated either by a variable name or
by the string "^^". The postfix translation is applied
after the element content is translated.
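The idea of prefix and postfix translations wrapped around the
translated element content can be sketched as follows. This is only a
conceptual illustration in Python and does not use AXE's actual style
sheet syntax or implementation. The RULES table, the element names and
the HTML output are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical translation rules: element name -> (prefix, postfix).
RULES = {
    "Greeting": ("<h1>", "</h1>"),
    "Name": ("<b>", "</b>"),
}


def translate(element):
    """Emit the prefix, then the translated content (infix), then the postfix."""
    prefix, postfix = RULES.get(element.tag, ("", ""))
    parts = [prefix, element.text or ""]
    for child in element:
        parts.append(translate(child))   # child elements are translated recursively
        parts.append(child.tail or "")   # text that follows the child element
    parts.append(postfix)
    return "".join(parts)


document = ET.fromstring("<Greeting>Hello <Name>World</Name>!</Greeting>")
translated = translate(document)
print(translated)
```

Swapping in a different RULES table would produce a different output
format from the same XML document, which is the role the style sheet plays.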
More detailed information about the AXE translation can be found on
John Hurst's homepage
(http://www.csse.monash.edu.au/~ajh/research/doctech/index.html).
CSS allows elements to be rendered by associating them with properties
and values. The CSS syntax is much simpler than that of XSL, and
CSS has the advantage of broader browser support [6].
CSS does not allow the user to change or reorder the content of an XML document or add
extra information like a signature block [6]. In other words, it
cannot provide a display structure that deviates from the structure of
the XML document [24].
XSL and AXE, on the other hand, allow the user to rearrange and reorder
elements. XSL is more flexible and powerful, and better suited
to XML documents, than CSS [6]. Furthermore, XSL and AXE allow
the user to access and display the content of attributes easily, which
cannot be done in CSS. CSS can only be applied to the elements' content,
not their attributes. Therefore, with CSS, any data that the user
wants to display must be part of an element's content rather
than one of its attributes. Both XSL and AXE are transformation
languages which enable the user to convert an XML document into different
types of document, such as HTML and RTF. Like CSS, XSL is also a formatting
language. AXE is only a transformation language, not
a formatting language.
Many software packages are available that enable the user to view and
modify XML documents and style sheets, as well as to translate an XML
document based on a style sheet. This report provides information
about some of them, namely XML Spy, XMLwriter, UltraXML, XED,
XML Notepad and AXE. A comparison of these packages is provided
in Section 4.7.
XML Spy is developed by the Altova company in Austria. As a member
of W3C, Altova has been actively involved in XML software technologies.
XML Spy is software that provides users with major aspects of XML in one
powerful and easy-to-use product [13]. It allows XML editing &
validation, DTD editing & validation and XSL editing & transformation.
The screenshot is shown in Figure 18.
To fulfil users' preferences, XML Spy provides four advanced views of
XML documents. These four advanced views are the enhanced grid view
for structured editing, the database/table view that shows repeated elements
in a tabular fashion, a text view with syntax-coloring for low-level work
and an integrated browser view that supports both CSS and XSL style-sheets
[13].
Other than the four advanced views, XML Spy also has several important
features such as [13]:
Other information includes:
For more information about XML Spy, please visit the XML Spy homepage
(http://www.xmlspy.com).
XMLwriter is developed by Wattle Software, a company based in Sydney,
Australia, whose aim is to produce high quality, user-friendly applications
with comprehensive online help and support. As an XML editor, XMLwriter
is designed to help users take advantage of the latest XML and XML-related
technologies such as XSL and XQL [1] by providing a range of XML
functionality, such as validation of XML documents against a DTD or XML
Schema and the ability to convert XML to HTML using XSL style sheets [1].
The screenshot can be seen in Figure 19.
Furthermore, it provides users with three different windows to
make the job easier, namely the Workspace, Document and Preview windows [2].
Other information includes:
For more information about XMLwriter, please visit its homepage
(http://www.XMLwriter.com).
According to WebX System Ltd., UltraXML is the very first true native
WYSIWYG integrated XML editor solution available [9]. UltraXML allows
users to see the appearance of the XML document directly as it is created.
The important features of UltraXML are [9]:
Other information includes:
For more information about UltraXML, please visit its homepage
(http://www.webxsystems.com/UltraXML.htm).
XED is an XML editor created by Henry S. Thompson of the University
of Edinburgh. It only supports hand-authoring of small-to-medium
size XML documents [12]. An outstanding feature of XED is that it ensures
that only well-formed documents are produced. Moreover, XED can be used
on different platforms, such as Windows 95/98/NT, Linux, FreeBSD, and Solaris
2.5 [12]. The screenshot is shown in Figure 20.
XED is a very simple editor that supports some simple functionality
[12], such as:
Other information includes:
For more information about XED, please visit the XED homepage
(http://www.ltg.edu.ac.uk/~ht/xed.html).
XML Notepad is a product of Microsoft Corporation. It is a simple
application that enables the rapid building and editing of an XML document
[10]. It provides a simple user interface that graphically represents
XML data in a tree structure.
As shown in Figure 21, the structure of the document is represented
in the left column while the values of the nodes are displayed in the right
column [14]. Elements in XML Notepad are represented either by folder
icons if they have dependent structures such as other elements, attributes,
etc, or leaf icons if they contain no substructures [14]. Attributes,
text and comments are represented by 3-D blocks, text icons and exclamation
mark icons, respectively [14]. Apart from elements, attributes, comments
and character data, no other kinds of markup are supported by XML Notepad.
Other information includes:
For more information, please visit XML Notepad's homepage
(http://msdn.microsoft.com/xml/NOTEPAD/intro.asp).
As explained in Section 3.3, AXE is an XML tool that allows a general-purpose
XML document, which only needs to be well-formed, to be translated to any
document type [8]. The unique features of AXE are that it
has its own style sheet and translation mechanism and it allows Perl code
to be included as part of the translation. However, AXE does not
have a graphical interface for its style sheet yet.
Other information includes:
For more information about AXE and its style sheet, please visit the AXE
homepage
(http://www.csse.monash.edu.au/~ajh/research/doctech/index.html).
Having described the available XML software in the previous sections,
the differences among the packages will now be analyzed.
In terms of speed and quality, XML Spy, XMLwriter and UltraXML are powerful editors which rank
at the higher end of the scale. On the other hand, editors such as XED and XML Notepad offer simple features and simple functionality.
Different XML software packages are developed for different groups of users. Hence, the ease of use and user-friendliness of the software vary.
XML Spy, XMLwriter and XML Notepad are some of the easier-to-use editors on the market.
There are also variations in price, based on the features and quality of the products.
At the higher end of the price range is UltraXML, costing a hefty $3000, while XMLwriter and XML Spy cost less than $150. XED,
AXE and XML Notepad are available free of charge from their websites.
XML Notepad has a simple user interface, while AXE does not have a graphical interface.
UltraXML is powerful enough to let users see the appearance of the XML document directly as it is being created.
The six editors vary in the range of features and functionality offered, such as their editing and validation capabilities,
their window display and their translation mechanisms.
With regard to platform availability, all six editors can be used on Microsoft Windows, that is,
Windows 95/98/NT. AXE can also be used on Unix and Linux, and XED on Solaris 2.5, FreeBSD and Linux. XML Notepad
can also be used with Internet Explorer 4.01, which shows its flexibility in usage.
This project is about building an editor for the AXE style sheet, namely
AXEsse (AXE style sheet editor). The editor enables users to
see the content of the file in a user-friendly way. Thus, the graphical
user interface for the AXE style sheet is designed to have the following features:
To achieve the design specified above, the program was written in Perl
with GTK. Both Perl and GTK will be discussed in more detail in Sections
5.1 and 5.2 respectively.
Perl, which stands for Practical Extraction and Report Language, was
originally developed in 1987 by Larry Wall as a glue language for the UNIX
operating system [25].
It was developed to produce reports from many
files with many cross-references between files. For this reason,
Perl is good at text processing, such as scanning arbitrary text files,
extracting information from those files and processing that information
with various built-in functions.
There are many reasons for the success of Perl. These include:
Since this project deals with an XML style sheet, Perl's ability
to work with XML and to process text makes it a good choice for building
AXEsse. Furthermore, the fact that it does not impose arbitrary
limitations on data makes the choice even stronger.
In addition to the reasons given above, using GTK through its Perl
binding is much simpler than using it from C. With Perl, programmers do not
need to worry about the type casting that is needed in C, because the binding
takes care of it.
Examples of GTK with C and Perl are given in Figures 22 and 23 respectively.
GTK (GIMP Toolkit) is an Open Source Free Software GUI toolkit written
in the C programming language by its primary authors, Peter Mattis, Spencer
Kimball and Josh MacDonald [15]. Although it was primarily
developed for use with the X Window System, GTK is now also used
to build many different software projects. Though it is
written in C, GTK is essentially an object-oriented application programming
interface (API) because it is implemented using the idea of classes and callback
functions (pointers to functions) [15].
GTK, which is built on top of GDK (GIMP Drawing Kit), is a library
for creating graphical user interfaces with the "look and feel" infrastructure.
Designed to be small and efficient, it is still flexible enough to allow
the programmer freedom in the interfaces created. It allows the programmer
to use a variety of standard user interface widgets such as push, radio
and check buttons, menus, lists and frames. It also provides several container
widgets which can be used to control the layout of the user interface elements.
Being written in C, GTK can be used in C, but there are also GTK bindings
for many other languages including C++, Guile, Perl, Python, TOM, Ada95,
Objective C, Free Pascal, and Eiffel [15]. Since AXEsse is
the graphical interface for an editor which is written in Perl, GTK with
Perl binding became the natural choice.
The following section describes the development and the implementation
of the AXEsse interface and functionalities. The obstacles faced
in the process of completing this project will be discussed as well as
the successes.
The initial design specified a screen containing a "tree view" on the
left-hand side and a text box on the right-hand side, separated by a GTK
widget named Hpaned which enabled the user to resize the screen horizontally.
The expandable "tree view" structure was used to provide a brief description
of each component of the style sheet based on the DTD structure.
The text box was used to display the content of each component in the style
sheet. However, this "tree view" structure was not retained because
there was no DTD in the style sheet.
Another factor which made the use of a "tree view" structure desirable
was that it was able to display the content of the included files (using
include commands) in the same window used for the style sheet. Nonetheless,
this structure was not finally implemented as the user would not have seen
the content of the current file and included files simultaneously.
In the final design, the list structure was chosen to replace this
"tree view" structure to display the content of the style sheet.
To display the content of any included file, another window is displayed
when the user selects the included file component in this list structure.
The list structure will be further discussed in Section 6.6.
The pictures used to distinguish the components of the style sheet were
created as image files using the Icon Editor tool written by Thomas Tanghus,
which is available on Linux. To display these image files in AXEsse,
the GDK pixmap widget was used. In addition, the icons used in AXEsse
were created using this tool.
The right-hand side of the interface contains a text box which displays
the content of the file. This interface was further improved by using
four different text boxes to display the content of each component of the
style sheet, enabling the user to see the contents clearly. The first
text box is used for displaying the content of the component as a whole.
The second one is used to display the prefix of the component. The
third one is used to display the infix of the component. The last
one is used to display the postfix of the component. These text boxes
were retained and used in the final interface of AXEsse.
In order to distinguish the prefix, the infix and the postfix of the
content, a pattern matching mechanism was required. Perl provides
regular expressions for pattern matching, so this
was easily achieved. One obstacle faced in using regular expressions
was finding a tag component whose start tag and end tag were not written
on one line. One possible solution was to delete the newline
('\n') at the end of each line, combine the lines, and
then apply the regular expression to find the start tag and end tag.
However, this solution also changes the structure of the content. In the
end, it was not used in the final implementation of AXEsse because
a pattern matching modifier, 's', was found that treats the string as a
single line in the regular expression. Using this
modifier, the structure of the content could be retained while the start
tag and end tag could be found easily. This mechanism was implemented
in AXEsse.
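The effect of the single-line modifier can be sketched as follows. The
sketch is in Python, whose re.DOTALL flag plays the same role as Perl's
's' modifier (it lets "." match a newline). It is not the regular
expression used in AXEsse, and the tag syntax and sample content are
hypothetical.

```python
import re

# A hypothetical component whose start tag and end tag are on different lines.
CONTENT = "<title>\n  Some heading text\n</title>"

# "." does not match a newline by default, so the pattern cannot span lines.
PATTERN = r"<(\w+)>(.*)</\1>"

default_match = re.search(PATTERN, CONTENT)             # fails across lines
dotall_match = re.search(PATTERN, CONTENT, re.DOTALL)   # analogous to Perl's /s

print(default_match is None)
print(dotall_match.group(1))
```

Because the newlines stay in the matched text, the original layout of
the content is preserved, which is the advantage noted above over
deleting the newlines first.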
To store a component of the file, the following attributes are needed:
Type (to distinguish whether the component is a ‘tag’, ‘comment’,
‘include file’, ‘new line’ or ‘element’), Tag name, Content, Prefix, Infix,
Postfix and Comment (the text that will appear in the status bar).
Originally, a 2D array was to be used to store these attributes.
For example, to access the type of the first component, $array[0][0] was
used. However, instead of a 2D array, an array of hashes
was implemented in the final design, because it makes the code more understandable.
Thus, for the example above, the type of the first component is accessed
as $array[0]{'type'}, which is more understandable than
$array[0][0].
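The difference between the two storage schemes can be sketched in
Python, where a list of dictionaries is the closest analogue of a Perl
array of hashes. The key names follow the attributes listed above, while
the sample values are hypothetical.

```python
# Positional storage: the meaning of each column has to be remembered.
components_2d = [["comment", "", "# a sample comment"]]

# Keyed storage, analogous to a Perl array of hashes.
components = [
    {
        "type": "comment",
        "tag_name": "",
        "content": "# a sample comment",
        "prefix": "",
        "infix": "",
        "postfix": "",
        "comment": "Comment line",   # text shown in the status bar
    }
]

first_type_positional = components_2d[0][0]   # like $array[0][0]
first_type_keyed = components[0]["type"]      # like $array[0]{'type'}
print(first_type_positional, first_type_keyed)
```

Both expressions return the same value, but only the keyed form states
which attribute is being read.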
As stated before, when the user selects an included file component
in the list structure, another window pops up to display the content
of the included file. To prevent several windows popping up for the
same file, a temporary file, in this case ".AXEsse.tb", is created to store
the filenames of the files open in AXEsse. Whenever a file is closed,
its filename is removed from ".AXEsse.tb". When no filenames remain
in ".AXEsse.tb", it is removed from the directory. Thus, when
the user selects an included file component in the list structure and the
file is already displayed in another window, the status bar displays a
message to inform the user that the file is already open. This
does not mean the user cannot open the same file in different windows;
the user can still open the same file by using the Open menu item or toolbar
item. If there is an inconsistency in this temporary file, for example
because the user accidentally or deliberately deletes the ".AXEsse.tb" file, then when
another included file component in the list structure is selected, the
system informs the user that an inconsistency has occurred
and suggests that the user reload the editor.
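The bookkeeping described above can be sketched as follows. This is a
minimal illustration in Python, not the AXEsse implementation. The
function names and the one-filename-per-line layout of the registry file
are assumptions; only the file name ".AXEsse.tb" comes from the text.

```python
import os

REGISTRY_NAME = ".AXEsse.tb"  # file name taken from the text


def _read(registry):
    """Return the list of registered filenames (empty if no registry exists)."""
    if not os.path.exists(registry):
        return []
    with open(registry) as handle:
        return [line.rstrip("\n") for line in handle if line.strip()]


def _write(registry, names):
    """Store the names, removing the registry file when none are left."""
    if names:
        with open(registry, "w") as handle:
            handle.write("\n".join(names) + "\n")
    elif os.path.exists(registry):
        os.remove(registry)


def register_open(registry, filename):
    """Record filename as open; return False if it was already recorded."""
    names = _read(registry)
    if filename in names:
        return False
    _write(registry, names + [filename])
    return True


def register_close(registry, filename):
    """Remove filename from the registry when its window is closed."""
    _write(registry, [name for name in _read(registry) if name != filename])
```

A caller would show the included file only when register_open returns
True, and otherwise report in the status bar that the file is already open.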
The present interface of AXEsse (see Figure 24) consists of:
Each of them is explained in the sections below.
The Menu bar consists of the File, Edit, Preference and Help menus. To
create the menu bar, GTK provides a built-in widget called Accel Group.
Accel Group provides a mechanism to include shortcut keys for each menu
item. Thus, by using the Accel Group widget, design features such
as a basic menu bar and the provision of shortcut keys were achieved.
Unfortunately, a minor problem was encountered while using the Accel
Group widget: it could not add the ‘F1’ key as a shortcut key.
Thus, the ‘F1’ shortcut key that is usually used to provide the user with
the Help function was not implemented in this application. Instead, the user
simply clicks on the Help menu and then Content to call the Help function.
The File menu contains the New Window, New, Open, Save, Save As and
Exit menu items. The Edit menu has the Undo, Redo, Update, Insert,
Append, Delete, Cut, Copy and Paste items. The Preference
menu consists of the Icon only, Text only and Icon & Text menu items.
Finally, the Help menu consists of the Content and About AXEsse menu items.
Clicking on the New Window menu item in the File menu displays a new window
for the user. This was done by calling the Perl function system,
which executes any program on the system, in this case "perl AXEsse.pl".
The New file menu item allows the user to create a new style sheet.
To ensure no data is lost, if the current file has been modified and has not
been saved, a message box is displayed to inform the user that the file
has been modified and to ask whether the file should be saved before
the new style sheet is created.
The Open menu item allows the user to open an existing style sheet.
This was implemented using a file selection dialog box, where the user
can choose either to select the file from the file list field or to type
in the filename in the provided text box. The selected file is then
opened for the user. The file selection dialog box is discussed in
Section 6.4. Similar to the New menu item, if the current file has
been modified and has not been saved, the message box is popped up to warn
the user.
The Save and Save As menu items allow the user to save the current
file. If the Save menu item is selected and the user has not yet
specified a filename, a file selection dialog box is displayed for the
user to provide the filename. After a filename is provided, the data is
saved to that file. The Save As menu item, like its counterpart in
word processors, always displays a file selection dialog box to allow
the user to select or type in the filename to which the data is saved.
If the file already exists, a message box is popped up to inform the
user that the file already exists and to ask whether the user wants to
overwrite it.
The Exit menu item allows the user to quit the application. If
the current file has been modified and has not been saved, a message box
pops up to inform the user of this fact and asks whether the file should
be saved before quitting the application.
The Undo menu item provides the user with the ability to undo changes
made previously, such as insert, append, update and delete. The Redo
menu item allows the user to redo the actions that have been undone by
Undo. To provide the unlimited undo and redo features specified in the
design, two arrays, which are treated as stacks, are used to store
each action that the user performs. Whenever an action is performed,
the undo array is updated with the details of the action, such
as the action type, the column number that was deleted and the content
of that column. The redo array, on the other hand, is updated
only when the user clicks on the Undo menu item or the undo icon.
However, the undo and redo functions do not support the cut and paste
functionalities. This is because the content that was cut or pasted
could not be obtained from the buffer used in GTK, and the tutorial and
the technical report for GTK did not provide useful information on this
topic. Thus, the undo and redo features can be used for the update,
insert, append and delete capabilities only.
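The two-stack mechanism described above can be illustrated with a short
sketch. It is written in Python purely for illustration (AXEsse itself is
written in Perl), and the class name, method names and record layout are
assumptions rather than the actual AXEsse code.

```python
class UndoRedoHistory:
    """Illustrative sketch: unlimited undo/redo with two lists used as stacks."""

    def __init__(self):
        self.undo_stack = []  # one record per action the user performs
        self.redo_stack = []  # filled only when an action is undone

    def record(self, action_type, position, content):
        # Store the details needed to reverse the action later.
        # The (type, position, content) layout is a hypothetical choice.
        self.undo_stack.append((action_type, position, content))

    def undo(self):
        # Move the most recent action from the undo stack to the redo stack.
        if not self.undo_stack:
            return None
        action = self.undo_stack.pop()
        self.redo_stack.append(action)
        return action

    def redo(self):
        # Move the most recently undone action back onto the undo stack.
        if not self.redo_stack:
            return None
        action = self.redo_stack.pop()
        self.undo_stack.append(action)
        return action
```

The caller is expected to apply or reverse the returned action on the
component list; because the stacks are ordinary lists, the number of
undo and redo steps is limited only by available memory.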
The Update, Insert, Append and Delete menu items provide the capability
to modify the content of the current style sheet. The functionalities
of the Insert, Append and Delete menu items are discussed in Section 6.8.
The Update toolbar item is used to update a component of the style sheet.
After modifying the content of the Content text box, the user needs to
click on either the Update menu item or the Update toolbar item. If the
user clicks on another list item in the Component Display screen without
clicking on Update, the modification is lost. Updates are applied
explicitly in this way to keep the display consistent with the content
after the modification. For example, if the current component is a 'tag'
component and the modification introduces an error, the component should
be categorized as an error rather than as a 'tag' component.
Specifically for the 'tag' component, the user can also modify the
content of the Prefix, Infix and Postfix text boxes. However, when
the user modifies these three text boxes, the user must not modify the
content of the Content text box. If both the Content text box and one
or more of the other text boxes are modified, the system cannot know
which modification should be applied. For example, the content of the
component is <tag><B><I>.^^.</B></I></tag>. After the modification
of the Content and Prefix text box, the content becomes:
Notice that the modifications make the contents of the text boxes
inconsistent with each other, so it is unclear which modification should
be applied. Therefore, for a modification of the Prefix, Infix and
Postfix text boxes to be applied, the content of the Content text box
must not be changed. If it is changed, the update is based on the
modification made in the Content text box.
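The precedence rule described above can be expressed as a small decision
function. This is a sketch in Python for illustration only, not the
AXEsse Perl code; the function name, the dictionary keys and the return
values are hypothetical.

```python
def choose_update_source(original, edited):
    """Decide which text boxes an Update should be based on.

    Both arguments are dictionaries with the hypothetical keys
    "content", "prefix", "infix" and "postfix", holding the text box
    contents before and after the user's editing.
    """
    # A change to the Content text box always takes precedence.
    if edited["content"] != original["content"]:
        return "content"
    # Otherwise the Prefix, Infix and Postfix text boxes are used, if changed.
    if any(edited[key] != original[key] for key in ("prefix", "infix", "postfix")):
        return "parts"
    # Nothing was modified, so there is nothing to update.
    return None
```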
The last three Edit menu items are the Cut, Copy and Paste menu items.
These menu items apply to all the text boxes available in AXEsse.
These functionalities were implemented using the callback functions
provided for the text box widget in GTK.
The Preference menu contains only the Icon only, Text only and Text
& Icon functionalities. They enable the user to view the items
in the toolbar as icons, as text, or as both text and icons. These
functionalities are accomplished through built-in functions provided by
GTK.
Finally, Content and About AXEsse in the Help menu provide the
user manual of AXEsse (see Appendix A) and information about AXEsse to
the user. When the user clicks on the Content menu item, the application
opens Netscape to display the Help content of AXEsse, written in HTML.
If the About AXEsse menu item is clicked, a message box which provides
information about the version of AXEsse, the copyright, the author and
the author’s email address is displayed (see Figure 25).
When the message box or the file selection dialog box is displayed,
the main window is dimmed or de-activated to prevent the user from doing
other actions until an option provided in the message box or file selection
dialog box is chosen.
The toolbar items provided in AXEsse are New, Open, Save, Undo, Redo,
Cut, Copy, Paste and Update. The functionalities of all of these toolbar
items have been discussed in the previous section. As with the menu
bar, GTK provides a built-in toolbar widget which allows each toolbar
item to be created easily. Furthermore, the toolbar widget in GTK can
show a tip about each toolbar item. This is normally done by using the
Tooltips widget. If the user leaves the pointer over a toolbar item for
a few seconds, the tip for that toolbar item is displayed.
The Status bar implemented in this application is essentially a label.
A message is displayed when the user clicks on a list item in the
Component List screen, giving information about the currently selected
list item. For the include command component, if the file cannot be
opened, a message is displayed in the status bar to inform the user of
this. If the file is already displayed in another window, the status bar
informs the user that the file is currently open. The GTK status bar
widget keeps a stack of messages, and messages are displayed and removed
by pushing them onto and popping them off this stack [28]. Because this
is complicated and cumbersome for the simple messages needed here, the
GTK status bar widget was not used for this application.
The file selection dialog box allows the user to select an existing
filename. The GTK built-in file selection widget was used for the file
selection dialog box, which reduced the programming time.
As shown in Figure 26, it provides several functionalities such as a Create
Dir button, Delete File button, Rename File button, Directory drop down
list box, Directory list screen, File list screen, Selection text box,
OK button and Cancel button.
The Create Dir button is used to create a new directory in the current
selected directory. As the names suggest, the Delete File and Rename
File buttons provide the user with the ability to remove the selected file
from that directory and to alter the name of the selected file. The
Directory List screen enables the user to change the directory to search
for a certain file. The user can select the file by clicking on the
filename in the File list screen. Finally, the OK button allows the
user to proceed to the open or save actions, while the Cancel button allows
the user to cancel the actions that are to be carried out.
The message box, which is used to provide some important information
to the user, was created by using the GTK dialog widget. This simple
dialog widget is basically a window with two boxes and a separator packed
into it. The first box provides the ability to include pictures and
text, while the second box, which is called the action area [28], allows
the addition of several buttons, such as OK, Cancel, Yes and No.
The message box interface that asks the user to save the file before the
next action is carried out is shown in Figure 27.
The Component List screen was implemented using the GTK Clist widget
to briefly display the content of the style sheet. The Clist widget provides
easy access to the list items. The scrolledwindow widget was also
used to provide a scrollbar automatically when it is needed.
With the Clist widget, an item can be text only, picture only, or both
picture and text. A feature that allowed picture and text to be
included together was considered necessary for the interface.
The pictures are used to clearly distinguish the components of the style
sheet, such as ‘tags’, ‘comments’, ‘include files’, ‘new lines’ and
‘elements’. The text is used to briefly describe the content of
each component.
Compared to Figure 24, Figure 28 includes a scrollbar, which appears
automatically, and list elements that are displayed as pictures and
text. Because this report is printed in black and white, the pictures
are shown in black rather than in their actual colours. The original
background colour of the pictures is yellow, and the selected item is
highlighted in blue. The picture for each component is:
Originally, adjacent comments in the style sheet were each displayed
as separate list items. This specification was altered to display
adjoining comments as a single list item, because adjoining comments
usually describe the same fragment of code. On the other hand, if
two comments are separated by a new line, they are displayed in different
list elements.
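The grouping of adjoining comments can be sketched as follows. The code
is in Python for illustration only (AXEsse itself is written in Perl),
and the function name and the representation of a component as a
(kind, text) pair are assumptions.

```python
def group_adjacent_comments(components):
    """Merge runs of adjoining comments into a single list item.

    `components` is a list of (kind, text) pairs in style sheet order,
    where kind is a hypothetical label such as "comment", "newline" or "tag".
    """
    grouped = []
    for kind, text in components:
        if kind == "comment" and grouped and grouped[-1][0] == "comment":
            # The previous item is also a comment: extend it instead of
            # starting a new list item.
            previous_text = grouped[-1][1]
            grouped[-1] = ("comment", previous_text + "\n" + text)
        else:
            # Any other component, including a new line, starts a new item.
            grouped.append((kind, text))
    return grouped
```

Because a new line is itself a component in this representation, two
comments separated by a new line are never merged.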
The content display screen consists of four text boxes. The first
text box displays the content of the element selected in the element list
screen. The prefix, infix and postfix text boxes are used to display
the prefix, infix and postfix components of the element content.
The GTK Vpaned widget is used to allow the size of these text boxes to
be adjusted vertically as the user desires. If the content does not
fit in a text box, a scroll bar automatically appears to
allow the user to scroll through the whole content. As shown in Figure
28, only two text boxes have scroll bars. This is because their content
cannot be seen in full in their text boxes. This mechanism is implemented
using the GTK scrolledwindow widget, which automatically displays the
scroll bar when it is needed. Notice that there is a curved arrow symbol
in two of the text boxes. This symbol indicates that the line of
text is too long to fit onto a single line of the display window, so
the text is wrapped onto the next line.
To allow the user to modify the content of the style sheet, a few GTK
command buttons, namely the Insert, Append and Delete buttons, are
provided. The Insert button allows the user to insert a component of the
translation file before the currently selected component in the list.
The Append button allows the user to append a component of the style
sheet at the end of the component list. Both the Insert and Append
buttons use the components that are typed by the user in the
Insert/Append text box. The Delete button is used to delete a single
component in the list.
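The three operations can be summarised with a minimal sketch on an
ordinary list. It is written in Python for illustration only; the
function names are hypothetical, and the sketch ignores the GTK widgets
that AXEsse actually updates.

```python
def insert_component(components, selected_index, new_component):
    # Insert: place the new component before the currently selected one.
    components.insert(selected_index, new_component)


def append_component(components, new_component):
    # Append: place the new component at the end of the component list.
    components.append(new_component)


def delete_component(components, selected_index):
    # Delete: remove a single component and return it, so that the caller
    # could, for example, keep it for a later undo.
    return components.pop(selected_index)
```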
AXEsse was built to achieve the aim of this project, that is, to
provide an AXE style sheet editor. AXEsse, an editor built with a
graphical user interface, provides the user with several functionalities
to view and to modify the style sheet.
In viewing the style sheet, AXEsse gives the user the following benefits:
In modifying the style sheet, the user also has the following benefits:
Although more features could be added to AXEsse, it enables the user
to edit and modify the style sheet in a graphical way, so it can be
said that AXEsse does fulfil the requirements asked of it.
This project has successfully built and tested an editor, namely AXEsse
(AXE style sheet editor), for the AXE style sheet. AXEsse enables
the user to view and modify the content of the style sheet in a graphical
way. The user can easily distinguish different components, such
as 'tag', 'comment', 'include file', 'element' or error, in the style sheet
because AXEsse provides a different icon for each component. For 'tag'
components, AXEsse displays their prefix, infix and postfix components.
The user can also modify the content of the style sheet, as AXEsse provides
update, insert, append and delete functionalities. Furthermore,
AXEsse provides users with unlimited undo and redo capabilities, so major
corruption of the style sheet can be prevented.
The current version of AXEsse could be further improved in various
ways to provide an even better editing mechanism to the user. Firstly,
tags could be given a graphical representation. Secondly, the Undo and
Redo capabilities could be extended to the Cut and Paste
functionalities. Thirdly, Print and Find capabilities could be added to
enable the user to print the style sheet and to search for a specific
component in the style sheet. Fourthly, the capability to change the
font and its size would enable the user to view the content in the
preferred font and size. Finally, the capability to display the rendered
document would enable the user to view what the XML document will look
like. With these additions, AXEsse would be considered a complete editor.
2.2. XML Translation Process
Figure 1: Translation Process
2.3. XML Document
2.3.1. Entity
Figure 2: Example of entity declaration and entity reference
Figure 3: Example of internal, external, parsed and unparsed
entity declarations
2.3.2. CDATA
Figure 4: Example of CDATA
2.3.3. Document Type Declaration
Figure 5: Example of an XML document with DTD
¤ A Document needs to have a Greeting part and a Body part.
It cannot have more than one of each.
¤ The Greeting and Body have to be parsed character data (PCDATA).
2.3.4. Declaration
¤ The version information part declares the version of XML that
is in use. This part is required in all XML declarations (Figure 6).
Figure 6: Example of XML declaration with version information
Figure 7: Example of XML declaration with version information, encoding
declaration and standalone document declaration
2.3.5. Comment
Figure 8: Example of comments
2.3.6. Element
Figure 9: Example of an element with content
Figure 10: Example of an empty element with a start-tag and an end-tag
Figure 11: Example of an empty element with an empty tag
2.3.7. Character Reference
2.3.8. Attribute
Attributes are a way of attaching characteristics or properties to
the elements of a document. Each attribute consists of a name and value
pair. For instance, a person has height and weight as his/her properties.
Thus, these properties can be transformed into a person element's attributes
(see Figure 12). Note that the attribute name does not go in quotes
while the attribute value does.
Figure 12: Example of attribute
2.3.9. Processing Instruction
A PI starts with the string "<?", followed by strings of text, and
ends with the string "?>". An example of a PI is shown in Figure 13,
where the PI passes gcc HelloWorld.c to the application.
Figure 13: Example of a Processing Instruction
3. XML Style Sheet
3.1. CSS
Figure 14: Example of CSS
3.2. XSL
Figure 15: Example of XSL
3.3. AXE Style Sheet
Figure 16: Example of AXE comments and an include command
Figure 17: Example of AXE Style Sheet
3.4. Comparison of The Style Sheets
4. XML Software
4.1. XML Spy
Figure 18: XML Spy Interface – taken from
http://new.xmlspy.com/features_intro.html on 19/7/2000
4.2. XMLwriter
Figure 19: XMLwriter interface – taken from
http://www.xmlwriter.net/images/fullscreen.gif on 19/7/2000
4.3. UltraXML
4.4. XED
Figure 20: XED Interface - taken from
http://www.ltg.ed.ac.uk/~ht/xed.html on 19/7/2000
4.5. XML Notepad
Figure 21: XML Notepad Interface – taken from
http://msdn.microsoft.com/xml/notepad/run.asp on 19/7/2000
4.6. AXE
4.7. Comparison of XML Software
5. System Design
¤ An AXE style sheet displayed in a graphical way. This
could be achieved by building an editor that has a graphical front end.
GTK was chosen to accomplish this project because it is a library for
building graphical user interfaces. Furthermore, the design could
be further improved by providing the display of tags using images.
5.1. Perl Language
Figure 22: Example of GTK with C binding
Figure 23: Example of GTK with Perl binding
5.2. GTK
6. System Implementation
6.1. Menu Bar
Figure 24: Final AXEsse editor screen
Figure 25: The About AXEsse message box
6.2. Toolbar
6.3. Status Bar
6.4. File Selection Dialog Box
Figure 26: File Selection dialog box
6.5. Message Box
Figure 27: A Message Box interface
6.6. Component List screen
6.7. Content Screen
Figure 28: AXEsse interface with file content
6.8. Insert, Append and Delete Buttons
7. Result
¤ The user can view the current and the included files' content
simultaneously, because AXEsse can display the included files' content
in other windows.
8. Conclusions and Future Works
Appendix A - AXEsse User Manual
Please go to AXE User Manual
Appendix B - Program Interface
Interface of AXEsse
Last updated by Susanti on 16/11/2000