Constraint-Based Document Layout for Mobile Computers

Lawrence Teo <Lawrence.Teo@infotech.monash.edu.au>

Supervised by:
Assoc. Prof. Arkady Zaslavsky <Arkady.Zaslavsky@infotech.monash.edu.au>
Assoc. Prof. Kim Marriott <Kim.Marriott@infotech.monash.edu.au>

Monash University, Melbourne, Australia
Report Date: 26 January 2001
Timeframe: November 2000 - February 2001

This is a joint project between the Distributed Systems and Software Engineering (DSSE) and Optimisation and Constraint Solving (O&CS) research groups. It is funded by DSTC as a summer vacation student project.

Introduction

One of the main technical challenges of document delivery over the Internet results from the tensions among the document designer's desire to specify the exact appearance of the document, the document viewer's desires and needs, and the capabilities of the viewing device. This is particularly true for hand-held computing devices whose screen area is extremely limited. Constraint-based layout provides a possible solution [1,2]. Constraints allow the designer to specify what are the desired properties of the layout, rather than precisely detail the layout, thus allowing more flexibility.

The aim of this project is to determine whether constraint-based layout is provides sufficient flexibility for hand-held devices [2].

Brief Background

This project uses two technologies for formatting document layouts on mobile devices: constraints and Composite Capabilities/Preferences Profiles (CC/PP).

Constraints provide a declarative way of specifying the desired layout of a document. Constraints have previously been researched in areas such as web development. In this project, constraints was tested for its feasibility as a means for formatting document layouts on mobile devices.

CC/PP is a draft standard proposed by the World Wide Web Consortium (W3C). CC/PP is used to state the capabilities and preferences of users and their user agents. For example, the CC/PP for a PDA would describe its hardware, software, operating system, etc. It would also state the user's preferences (such as whether the user wants sound to be turned on), and hardware extensions.

How the System Works

A system was designed to use constraints and CC/PP to format document layout on mobile devices. The system was implemented as a Perl script called pdaformat.pl. The system takes in an input document, applies constraints and CC/PP attributes to it in order to produce a Palm database (.pdb) file which will be shown on a PalmOS device.

In the system, instead of focusing on all kinds of documents, we chose to use tables as our input document. We then define constraints specific to tables (such as column widths). CC/PP is used to provide the screen width for the device, which is then used as an input value for the constraints (in this case, the screen width is used as the value for the total width of the table).

The constraints are declared in a file using a simple constraint declaration language (which will be described in the section Constraint Declaration Language).

Required Software

The following software are required to run the system:

Software Homepage Download URL
Perl 5.6 http://www.perl.com/ stable.tar.gz
Qoca constraint-solving toolkit http://www.csse.monash.edu.au/projects/qoca/ Use the CVS version.
Qoca Perl None Through CVS
XML-Parser 2.30 http://search.cpan.org/search?dist=XML-Parser XML-Parser.2.30.tar.gz
Local copy
expat 1.95.1 http://sourceforge.net/projects/expat/ expat-1.95.1.tar.gz
Local copy
MobileDB convertor for Linux
(other platforms are also supported)
http://www.mobilegeneration.com/ mdbconvlinuxi386.tar.gz
Local copy
xls2csv http://www.ice.ru/~vitus/catdoc/ catdoc-0.91.4.tar.gz
Local copy

In order to show the PDB file, you need a PalmOS device (e.g. a Palm or Handspring Visor) or a PalmOS emulator.

Installing the System

You can download the system from here: pdaformat.tar.gz (971KB). Here are some short instructions on installing the system:

  1. Install the required software on a GNU/Linux system (other UNIX versions may also work, but have not been tested).
  2. Extract the pdaformat.tar.gz file into a directory, let's say, pdaformat/.
  3. Make sure that the Qc.so and Qc.pm files from Qoca Perl are in the pdaformat/ directory. Also make sure that the xls2csv and mdbconv files are in the pdaformat/ directory as well. (Note: the pdaformat.tar.gz tarball has these files in the directory already. However, if you compile a new version of Qoca Perl, xls2csv and mdbconv, then please make sure these files are there).

Another way to obtain the latest version of the system is through CVS. The system has been placed on the CVS server at bowman.csse.monash.edu.au on 24 January 2001. To obtain the CVS version, you can use the following instructions on a UNIX system.

export CVSROOT=:pserver:anonymous@bowman.csse.monash.edu.au:/home/cvsroot
cvs login
(password is "anonymous")
cvs co cns-tbls

Running the System

Go to the pdaformat/ directory and type ./pdaformat.pl at the shell. You should get the following output:

pdaformat
Copyright (c) 2001 DSTC Pty Ltd.
Written by Lawrence Teo, includes code from Qoca Perl by Nathan Hurst

usage: perl pdaformat.pl [constraints_filename]
OPTIONS:
    -i filename       the input filename (eg. spreadsheet.xls). Accepted
                      file types are Excel (.xls), basic XHTML tables
                      (.html,.xml) and comma-separated values (.csv)
    -c filename       the constraint filename (eg. constraints.con)
    -p filename       the CC/PP profile (eg. palm-iii.ccpp)
    -o filename       the output filename (eg. database.pdb)
    -n name           the name of the database (eg. "Phonebook")
    -D level          display debugging info (eg. 1,2,3,...)
    -h                this help screen

The pdaformat.tar.gz package comes with a few sample files that can be used to test the pdaformat.pl script. These files are described as follows:

File Description
phonebook.csv A comma-separated values file containing phonebook entries (name, email and phone). Best used for testing simple constraints.
table.html The same phonebook in the form of an XHTML table. Best used for testing simple constraints.
phonebook.con Sample constraints file for the phonebook.
wines3.xls This is an Excel spreadsheet describing a few wines. Best used for testing weighted constraints.
wines3.con Sample constraints file for the wines3.xls file.
palm-iii.ccpp Sample CC/PP profile describing the Palm III. Please note that this is not an accurate description of the Palm III.

The sample files are shown as follows:

phonebook.csv:

"Name","Email","Phone"
"Arkady","arkady@monash.edu.au","32 479"
"Kim","kim@monash.edu.au","55 525"
"Lawrence","lawrence@monash.edu.au","32 309"
"Nathan","nathan@monash.edu.au","55 779"
"Peter","peter@monash.edu.au","55 779"

table.html:

Phone Book

This is an XHTML file

Name Email Phone
Arkady arkady@monash.edu.au 32 479
Kim kim@monash.edu.au 55 525
Lawrence lawrence@monash.edu.au 32 309
Nathan nathan@monash.edu.au 55 779
Peter peter@monash.edu.au 55 779

wines3.xls (represented here in HTML):

Manufacturer Name Description Price/bottle
Chandon Vintage Brut Pinot Noir and Chardonnay A$35
Brown Brothers Moscato Fresh and fruity A$20
Eyton Shiraz Dense red/purple tints A$40
Chandon Cuvee Riche Dessert wine A$30
Taylor Vintage Port Berry, black pepper, blackberry A$40
Saxenburg Chardonnay A unique Chardonnay A$30

palm-iii.ccpp:

<?xml version="1.0"?>
<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:prf="http://www.w3.org/TR/WD-profile-vocabulary#">
 <rdf:Description about="HardwarePlatform">
   <prf:Defaults
    Vendor="Palm"
    Model="IIIe"
    Type="PDA"
    ScreenSize="160x150x2"
    CPU="PPC"
    Keyboard="Yes"
    Memory="8mB"
    Bluetooth="No"
    Speaker="Yes" />
  <prf:Modifications
    Memory="32mB" />
 </rdf:Description>
 <rdf:Description about="SoftwarePlatform">
  <prf:Defaults
   OS="PalmOS3.5"
   HTMLVersion="4.0"
   JavaScriptVersion="4.0"
   WAPVersion="1.0"
   WMLScript="1.0" />
  <prf:Modifications
   Sound="Off"
   Images="Off" />
 </rdf:Description>
 <rdf:Description about="EpocEmail1.0">
   <prf:Defaults
   HTMLVersion="4.0" />
 </rdf:Description>
  <rdf:Description about="EpocCalendar1.0">
  <prf:Defaults
   HTMLVersion="4.0" />
 </rdf:Description>
  <rdf:Description about="UserPreferences">
 <prf:Defaults
        Language="English"/>
 </rdf:Description>
</rdf:RDF>

You can test the script as follows:

./pdaformat.pl -i phonebook.csv -c phonebook.con -p palm-iii.ccpp
-o phonebook.pdb -n "myphonebook"

That command should generate a file called phonebook.pdb. In summary, the command takes in the phonebook.csv file, and uses constraints declared in phonebook.con and the screen size of the Palm specified in palm-iii.ccpp. It then generates the phonebook.pdb database file with the name "myphonebook". The following figure shows a screenshot of the PDB file being displayed on a PalmOS emulator.

The -D option can be used to print out (hopefully) useful debugging information. Here's a sample use for it:

./pdaformat.pl -i wines3.xls -c wines3.con -p palm-iii.ccpp
-o wines3.pdb -n "My wines" -D1

The command generates the wines3.pdb file from wines3.xls, wines3.con and palm-iii.ccpp. The debug level was set to "1", which prints out debugging information. The greater the level, the more debugging information is printed.

Features

  • Support for Excel (.xls), comma-separated values (.csv), and basic XHTML tables (.html,.xml) as input documents
  • Support for simple and weighted constraints
  • Uses a simple constraint declaration language

Constraint declaration language

The system uses a simple constraint declaration language, which is defined in "pseudo-BNF" as follows:

<!type> ::= 'simple' | 'weighted'

<weighted-constraint> ::= <weight> <linpoly> <symbol> <linpoly>

<simple-constraint> ::= <linpoly> <symbol> <linpoly>

<weight> ::= 'verystrong' | 'strong' | 'medium' | 'weak' | 'veryweak'

<symbol> ::= '==' | '>=' | '<='

<linpoly> ::= <digit>+ | <var> | <linpoly> <operator>

<operator> ::= '+' | '-' | '*' | '/'

<var> ::= cw<digit> | ce<digit> | cl<digit> | tw

For a better explanation of these constraints, please refer to the constraint examples given below.

The variable names used in the constraints have special meanings attached to them. These meanings are described as follows:

Variable name Meaning
tw This is the total width of the table. Usually the constraint for tw is specified as tw == cw1+cw2+...+cwn. tw is also replaced by the screen width obtained from the CC/PP.
cwx This represents the width of a column. For example, cw3 is the width of the third column.
cex This represents the width of the widest entry in a column. For example, ce2 is the width of the widest entry in column 2.
clx This represents the width of the longest word in a column. For example, cl4 is the width of the longest word in column 4.

On average, each character is 5.6 pixels wide on the Palm. The actual width of each character is variable - for example, a "w" will occupy more pixels compared to an "i". 5.6 pixels was chosen as the average size, as the width of most characters range from 5 to 6 pixels.

Here is a sample file containing simple constraints:

# Constraints for phonebook

tw == cw1+cw2+cw3

# make col 1 equal to col 3
cw1 == cw3

#  let col 2 be twice the size of col 1
cw2 == 2*cw1

The constraints above can be summarised like this: The total width of the table is equal to the sum of the widths of columns 1, 2 and 3. The width of column 1 must be equal to column 3. Column 2's width will be twice that of column 1.

Here is an example of a constraints file using weighted constraints.

# Constraints for wines3.xls
#
# Weights: verystrong, strong, medium, weak, veryweak

# Constraint type: "simple" or "weighted"
!type = weighted

verystrong  tw == cw1+cw2+cw3+cw4

strong      cw1 >= ce1
verystrong  cw1 >= cl1

strong      cw2 >= ce2
strong      cw2 >= cl2

veryweak    cw3 >= ce3
weak        cw3 >= cl3

strong      cw4 >= ce4
verystrong  cw4 >= cl4

The set of sample weighted constraints given above can be summarised like this: The "!type = weighted" statement means that the constraints specified in this file are weighted constraints. The system will therefore expect to read weighted constraint declaration lines.

"cw1 >= ce1" means that the width of column 1 must be greater than the width of the widest entry in column 1. "cw1 >= cl1" means that the width of column 1 must be greater than the width of the longest word in column 1. Likewise, the widths of the rest of the columns are greater than the widths of the widest entries and the longest words in those columns ("cwx >= cex" and "cwx >= clx").

There are weights associated with each constraint. In this example, the strong and verystrong weights are associated with columns 1,2 and 4, because those are important columns. Column 3 is given a weak weight. The figures below show the screenshots of the output using these set of constraints on the sample wines3.xls file (the wines3.xls file has four columns: Manufacturer, Name, Description, Price/bottle). Note that column 3 has been given the weakest weight, and therefore its width is not greater than the widest entry nor the longest word. On the other hand, columns 1, 2 and 4 have been given greater priority.

Internal Workings

This section presents some notes on the internal workings of the system.

The program is summarised in the following order:

  1. Generate a temporary filename to store the temporary CSV file (in the format "/tmp/pdaformat<inputfilename>-<pid>-<time>").
  2. Check the input file. If it's an Excel spreadsheet or an XHTML file, convert it to a temporary CSV file. If it's a CSV file, use it as it is. The Excel to CSV conversion is done using xls2csv, while the XHTML table is parsed using XML::Parser and converted manually.
  3. Parse the CC/PP file and obtain the screen width of the target device using XML::Parser.
  4. Parse the constraints file and add the constraints to Qoca (using Qc::add_constraint() or Qc::add_weighted_constraint() depending on whether the constraints are simple or weighted).
  5. Solve the constraints using Qc::Solve().
  6. If we have cex or clx variable names, look for the widest entry and the longest word in each column. This is stored in two arrays - @widest_entry and @longest_word. (as I am writing this, I just realised that there is a problem with the logic here. I used "if (weighted)" here instead of checking if there are variables named ce and cl as a condition for executing this part of the code. This needs to be fixed.)
  7. Assign the screen width to the tw variable using Qc::AddEditVar(); Qc::BeginEdit(); Qc::SuggestValue().
  8. If we have cex or clx variable names, suggest values for them using Qc::AddEditVar() and Qc::SuggestValue(). (I just realised I used "if (weighted)" here again as well).
  9. Resolve the constraints using Qc::Resolve() and Qc::EndEdit().
  10. Assign the values of the cwx variables into an array called @cw_values.
  11. Convert the CSV file into a PDB file using mdbconv, using the widths from @cw_values.
  12. Delete the temporary CSV file.

The weight strings ("verystrong", "strong", ...) and their respective weight values are stored using an associative array called %weight_values which has the following associations:

Weight string Value
verystrong 1e25
strong 1e20
medium 1e15
weak 1e10
veryweak 1e5

Known Issues

  1. Numeric data in Excel spreadsheets may cause problems ("30" and "$55" are not okay; "A$30" or "B2" are okay). This is a problem with xls2csv, the conversion tool used to convert Excel spreadsheets to CSV files.
  2. The add_constraint() and add_weighted_constraint() functions in pdaformat.pl have a lot of redundant code. It would be better to move these parts of the code to another function so that it's more efficient.
  3. There is only limited error checking in the code.
  4. The system at present does not enforce the total width variable (tw) to be equal to the screen width. There is a part of the code in the pdaformat.pl file that can be used to enforce this, but has been commented out (look for the part that says "# UNCOMMENT THESE THREE LINES to force TW == screen_width"). The reason why they have been commented out is because another possible, and probably more elegant, way to do it would be to introduce another constraint variable to represent the screen width. That way, we can enforce the constraint in the constraints file with a line like "verystrong tw == screen_width".

Future Work

  1. Allow interactive swapping of columns. This was not pursued in the project because it requires developing programs on the PalmOS platform, which is outside the scope and time limit for this project.
  2. Automatically adapt the table in the most optimal manner upon reading device characterisitics. The system should be able to switch rows into columns, and columns into rows, depending on the size of the screen of the device obtained from the CC/PP. For example, if there are more rows than columns in the table, and the height of the PDA screen is longer than the width, then the rows and columns should be swapped to enable more data to be displayed on the PDA.
  3. Allow abbreviation of words. Some words may be too long to fit in a single column. In such cases, it would be beneficial to abbreviate such words to fit into the column. An example is "computer science" which can be abbreviated into "comp. sci." or "c.s.". One issue to look out for would be collisions. For example, "computer science" and "constraint solving" can both be abbreviated to "c.s.". In such cases, the system should be intelligent enough to detect and avoid such collisions by choosing the best abbreviation in a certain scenario.
  4. Dynamically generate constraints for documents. To be able to dynamically generate constraints for arbitrary documents would be an interesting challenge. This would probably require the content and context of the document to be examined before proper constraints can be automatically generated.
  5. Design a proper installation procedure for the system. At present, the system's installation procedure is somewhat unstructured. A proper installation procedure such as "make install" or something similar would be good.

Seminar

A seminar was held to present the results and status of this project:

Constraint-Based Document Layout for Mobile Computers, Distributed Systems Technology Centre, Room B4.61, Monash University, Caulfield, 24 January 2001, 4pm.

The Powerpoint presentation can be downloaded here: seminar.ppt or in online HTML format.

Abstract

As mobile devices become more popular, their users have an increasing need for such devices to be connected to the Internet. However, the limitations of such devices make it difficult for Internet content to be displayed on them. As an example, some webpages that are graphically rich are almost impossible to display on a Personal Digital Assistant with a limited screen size without colours.

There is therefore a need to adapt the content of Internet resources into a format that fits the limitations of a mobile device. Content adaptation has been done before, and it remains active in research today.

This project investigates the feasibility of two technologies - constraints and Composite Capabilities/Preferences Profiles (CC/PP) - as a means for formatting document layouts on mobile devices. I will also present a small prototype of a mobile computing system which merges both technologies to format documents for mobile devices. An interesting aspect of this system is that it uses Qoca, a constraint-solving toolkit developed by CSSE.

Acknowledgments

I would like to acknowledge the help of my supervisors Arkady Zaslavsky and Kim Marriott. Thanks also goes out to Peter Moulder for his help on Qoca, CVS, and many other things, and Nathan Hurst for writing Qoca Perl. Dr Seng Wai Loke introduced me to resources on CC/PP. Rob Gray helped me with administrative matters.

It has been a great pleasure working with all of you -- and tell me when you guys get a paper published! :-)

Resources

  1. A. Borning, R. Lin and K. Marriott. Constraints for the Web. Fifth ACM International Multi-Media Conference, pages 173--182. Seattle, Nov. 1997.
  2. G. Badros, A. Borning, K. Marriott and P. Stuckey. Constraint Cascading Style Sheets for the Web. ACM 12th Symposium on User Interface Software and Technology (UIST), Nov. 1999.
  3. World Wide Web Consortium. Composite Capabilities/Preferences Profiles. http://www.w3.org/Mobile/CCPP/


Personal Notes

This section describes some of my personal experiences throughout the duration of this project.

When I first started the project, I had no idea about constraint solving and CC/PP. My research interest was in network security; therefore, constraints and CC/PP were alien to me. I really wondered whether I could actually finish this entire project.

I had a few meetings with Arkady, Kim, Peter and Nathan in the initial weeks of the project. Kim brought me some papers on constraints, which proved to be immensely useful. It provided me with some ideas on constraint solving. Arkady introduced me to Dr Seng Wai Loke, who in turn showed me the W3C resources on CC/PP.

I remember facing a lot of difficulties in the beginning. I was running an OpenBSD box in my lab, and I ran into compilation problems when trying to compile Qoca because of the non-GNU utilities on the system. I exchanged a lot of emails with Peter, who helped me out tremendously. In the end, Peter and Nathan gave me an account on bowman, their Linux box in their lab, and I developed the system there ever since. Although the compilation problems delayed my project for awhile, I did learn a lot about UNIX development tools, especially how BSD and GNU utilities differ (note to Peter: okay, I shall make sure that my future BSD boxes have the proper GNU tools on them! :) ).

I also had difficulty understanding how Qoca worked in the beginning. The documentation was actually there, just that I was looking in the wrong places. Eventually I found the tutorial on the Qoca project webpage, and using the information on that page together with trial-and-error, I finally understood how it worked. I soon got it working on my system. And I'm now thinking of using Qoca for some of my own projects in the future! :)

The goals for the project changed as well. Originally, I defined two goals in the project:

Goal #1: Convert an Excel file to a PDB file, using constraints and CC/PP to format the document layout. Demonstrate this on MobileDB on the Palm emulator. The aim of this is to show the basics of using constraint-solving techniques and CC/PP to format documents on present mobile devices.

Goal #2: Convert an Excel file to an XML file, using constraints and CC/PP to format the document layout. Demonstrate this on Tarpaulin, the XML browser written by Nathan. This will be used to show more advanced uses of constraints and CC/PP to format documents on "future" mobile devices.

These goals evolved along the way. At the end of it, I was supporting Excel spreadsheets, XHTML tables and CSV files. It supported weighted constraints too, which just got added into Qoca recently. Unfortunately, I didn't manage to use Tarpaulin as planned. Tarpaulin was an XML browser that was supposed to be used in the project to simulate future and more advanced mobile devices.

Presenting a seminar about the project was an interesting experience. I received a lot of questions from the audience, a lot of which I never expected, but the audience input was very helpful in determining the future areas that can be looked into in this project. I had a good discussion with the audience. I was glad to have Arkady, Kim and Peter in the seminar too to help me answer questions which I have no idea of!

Overall, I think my whole DSTC summer project has been a really rewarding experience. I solved many interesting problems and enjoyed the numerous challenges along the way. It was also great to work with a team of people who supported me all the way. Although the project wasn't easy, I had a lot of fun and learnt a heap of stuff, which will definitely be useful in the future.