Character/Ligature identification
- all the printed shapes on the page are identified and their locations
recorded
- the shapes are combined into complete characters using rules based
on criteria such as overlap, closeness, column width, and maximum height.
- the page is then "walked" column by column to record the sequence
and locations of characters and ligatures.
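
The three steps above can be sketched roughly as follows. This is a minimal
illustration, not the actual implementation: the function names
(find_shapes, merge_shapes, walk_columns), the max_gap parameter, and the
text-art page format are all hypothetical. It assumes a shape is a
4-connected region of ink, applies only the overlap and closeness rules
(column width and maximum height are left out), and takes "column" to mean
horizontal position, so the walk is simply a left-to-right ordering.

```python
from collections import deque

def find_shapes(bitmap):
    """Step 1: return bounding boxes (left, top, right, bottom) of ink regions."""
    rows, cols = len(bitmap), len(bitmap[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if bitmap[r][c] != "#" or seen[r][c]:
                continue
            # Flood-fill one shape, growing its bounding box as we go.
            left, top, right, bottom = c, r, c, r
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                left, right = min(left, x), max(right, x)
                top, bottom = min(top, y), max(bottom, y)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and bitmap[ny][nx] == "#" and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((left, top, right, bottom))
    return boxes

def merge_shapes(boxes, max_gap=2):
    """Step 2: combine shapes that overlap horizontally and sit close vertically.

    Assumed rule for illustration only. A single pass is made, so chains of
    merges that would need several passes are not handled.
    """
    merged = []
    for box in sorted(boxes):
        for i, other in enumerate(merged):
            overlaps = box[0] <= other[2] and other[0] <= box[2]
            # Empty rows between the two boxes (negative if they overlap).
            gap = max(box[1], other[1]) - min(box[3], other[3]) - 1
            if overlaps and gap <= max_gap:
                merged[i] = (min(box[0], other[0]), min(box[1], other[1]),
                             max(box[2], other[2]), max(box[3], other[3]))
                break
        else:
            merged.append(box)
    return merged

def walk_columns(chars):
    """Step 3: order characters by left edge, then by top edge."""
    return sorted(chars, key=lambda b: (b[0], b[1]))

# A tiny page: a dotted stem on the left, a second glyph on the right.
PAGE = [
    "#...##",
    "....#.",
    "#...##",
    "#...#.",
    "#...##",
]

shapes = find_shapes(PAGE)           # three separate shapes
characters = merge_shapes(shapes)    # dot and stem become one character
sequence = walk_columns(characters)  # left-to-right reading order
```

Here the dot at the top left and the stem below it are found as separate
shapes, then merged into a single box because they share a pixel column and
are one row apart. The glyph on the right stays separate.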