View and Rank Trains

John Hurst 0.1.0

1. Overview

This document describes a set of Python CGI scripts for viewing the author's railway photograph collection. The collection is held in a database (a directory plus sub-directories) that can be accessed from a variety of web pages (see, for example, my school server page). There are two main scripts: one to view an image at full resolution, and one to view the popularity rankings of the images. Other scripts described here help in maintaining the system.

The first script defined is viewtrains.py. It takes a single parameter, the short name of an image, searches the database for that image, and renders it at full resolution together with data about the image retrieved from the associated XML file.

The other main script is ranktrains.py, which delivers a page of thumbnail images in order of popularity. Because of the large number of images, each such page delivers only a few of the total number of images, but buttons are included to navigate around the complete rankings.

tidytrains.py

ranking.py

rank.py

1.1 Global Constants used in each Program

1.1 Define Global Constants

(year, month, day, hour, minute, second, weekday, yday, DST) = \
    time.localtime(time.time())
MacOSX='MacOSX'
Solaris='Solaris'
system=MacOSX # this line gets nurgled by the makefile
if system==MacOSX:
    # constants for the local (MacOSX) server
    CGIBIN="http://localhost/~ajh/cgi-bin"
    HOMEPAGE="http://localhost/~ajh"
    BASEPAGE="/home/ajh/www"
    LOGFILE="/Library/WebServer/Documents/ajh/logs"
elif system==Solaris:
    # constants for the CSSE (Solaris) server
    CGIBIN="http://www.csse.monash.edu.au/~ajh/cgi-bin"
    HOMEPAGE="http://www.csse.monash.edu.au/~ajh"
    BASEPAGE="/u/web/homes/ajh"
    LOGFILE=BASEPAGE+"/logs"
else:
    print "Unknown system %s" % (system)
    sys.exit(1)
EXTN=".xml"

2. viewtrains.py Main Body

#!/usr/local/bin/python
# DO NOT EDIT THIS FILE!
# use ~/Computers/python/viewtrains/viewtrains.xlp instead
print "Content-Type: text/html\n\n";
htmltitle="Full size image of "+imageparm
if imageisrelative:
    display(imageparm)
else:
    res=visit(top,0,imageparm)
scriptnameparm="viewtrains.py?image=%s" % (escimageparm)

The main work of this script is done in one of the two procedures display and visit, depending upon whether the user offered a relative image path or not.

Contrary to normal usage, a relative path here gives the location of the image relative to the base trains directory rather than the root directory, hence the terminology. A non-relative path gives only the image name, so a full directory search must be carried out to locate the image.

2.1 Initialisation

2.1.1 Imports

import cgi
import os
import datetime
import math
import rank
from rank import DECAY
import re
import string
import sys
import time
import urllib
from xml.dom.minidom import parse, parseString, Node

2.1.2 Define Global Constants

tm = "%4d%02d%02d:%02d%02d" % (year, month, day, hour, minute)
SCRIPT=CGIBIN+"/viewtrains.py"
SCRIPTIMAGE=SCRIPT+"?image="
top=BASEPAGE+"/trains"
now=datetime.datetime.now()
startnow=datetime.datetime.now()
today=startnow.strftime("%Y%m%d")

2.1.3 Define Regular Expression Patterns

ignoredirs = re.compile('(tmp)|(units)')

2.1.4 Define Subroutines

These procedures are of sufficient significance that they have been moved to a separate section.

2.2 Supporting Code

2.2.1 collect cgi parameters

form = cgi.FieldStorage()
#print form
#print cgi.print_environ()
ipadr=convertIPtoHex(os.getenv("REMOTE_ADDR"))
gotparms=0; dontlog=0
#print "QUERY_STRING=",os.getenv("QUERY_STRING"),"<BR/>"
#print "USER_AGENT=",os.getenv("USER_AGENT"),"<BR/>"
if form.has_key("image"):
    imageparm=form["image"].value
    gotparms=1
    # discard any ".jpg" suffix
    res=re.match('([^.]+).jpg$',imageparm)
    if res:
        imageparm=res.group(1)
    # a leading "trains/" marks a relative path
    res=re.match('^trains/',imageparm)
    if res:
        imageisrelative=1
    else:
        imageisrelative=0
if form.has_key("disablevote"):
    dontlog=1
if not gotparms:
    print "<H1>Error!</H1>"
    print "<P>You are using a browser which has not passed in the ",
    print "cgi parameters ",
    print "correctly. Please use a different browser that does ",
    print "handle parameters properly. ",
    print "(Mozilla, Safari, Epiphany, Firefox are known to work).</P>"
    print "<P>As a consolation prize, you get the second most popular image!</P>"
    imageparm="trains/private/9-2"
    imageisrelative=1
    dontlog=1
escimageparm=urllib.quote(imageparm)

Use the Python library to retrieve the CGI parameters. Currently there is only one, image, which is the name of an image in the train library. Two alternatives are available:

  1. The parameter starts with "trains/", in which case it is a relative pathname into the trains directory, and no searching is required; or
  2. It does not, in which case the name must be searched against the image library to find the required image.
The choice between these two is flagged in the variable imageisrelative.

Discard any ".jpg" suffix.

Escape any suspect URL parameter characters.
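The three steps above (strip the suffix, classify the path, escape the result) can be sketched in isolation as follows. This is only an illustration written for Python 3: the function name classify_image_parameter is hypothetical, and urllib.parse.quote stands in for the Python 2 urllib.quote call used by the script.

```python
import re
from urllib.parse import quote


def classify_image_parameter(raw):
    """Return (name, is_relative, escaped) for a raw 'image' parameter.

    Hypothetical helper, not part of viewtrains.py.
    """
    name = raw
    # Discard any ".jpg" suffix.
    m = re.match(r'(.+)\.jpg$', name)
    if m:
        name = m.group(1)
    # A leading "trains/" means the name is a path into the trains
    # directory, so no directory search is needed.
    is_relative = name.startswith("trains/")
    # Escape any suspect URL characters ("/" is left alone by default).
    return name, is_relative, quote(name)


print(classify_image_parameter("trains/vic/R707-1.jpg"))
print(classify_image_parameter("4472=R761-11"))
```

The first call yields a relative path that can be displayed directly; the second yields a bare image name, which would have to be located by a directory search.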

2.2.2 Collect Previous Rankings

All previous rankings have been reduced to a single vote value for each image. These values are stored in a file RANKINGS, together with the date and time of the rankings. These votes are exponentially decayed, and used as the base values for any additional votes cast since that date.

RANKINGS=LOGFILE+"/trainrank"
VIEWINGS=LOGFILE+"/trainview"
totalimages, datatime, votefactor, table = rank.rankdata(RANKINGS)
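As a rough illustration of the exponential decay applied to stored votes, the sketch below scales a vote by exp(-days/DECAY). The value DECAY=15.0 is the one that appears in the ranking.py listing; the function name decayed_vote is hypothetical, and the internals of rank.rankdata are not shown in this document, so this should not be read as its actual implementation.

```python
import math

DECAY = 15.0  # days; value taken from the ranking.py listing


def decayed_vote(vote, days_elapsed, decay=DECAY):
    """Scale a stored vote by the exponential decay factor (hypothetical helper)."""
    return vote * math.exp(-days_elapsed / decay)


# A vote cast just now keeps its full value ...
print(decayed_vote(1.0, 0.0))
# ... and after DECAY days it is worth 1/e of its original value.
print(decayed_vote(1.0, 15.0))
```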

2.2.3 print header of html page

This code prints the header part of the html page.

print """ <html> <head>

print the starting lines, then ...

<title>""",print htmltitle,print """</title>

print the page title, including the "MONASH UNIVERSITY", "INFORMATION TECHNOLOGY" and "Clayton School" parts.

<base href=\""""+HOMEPAGE+"""/"/> <link rel="stylesheet" HREF="styles/monash.css" type="text/css" /> </head> <body> <div id="global-header"> <div id="global-images"> <table width="100%" bgcolor="white"> <tr width="100%"> <td align="left"> <table> <tr> <td align="left"> <a href="http://www.monash.edu.au"> <span style="font-family:sans-serif;font-size:+160%;font-weight:bold; background-color:#ffffff;color:black"> MONASH UNIVERSITY </span> </a> </td> </tr> <tr> <td align="left"> <a href="http://www.infotech.monash.edu.au" COLOR="black"> <span style="font-family:sans-serif;font-size:+140%;font-weight:bold; background-color:#ffffff;color:black">INFORMATION TECHNOLOGY</span> </a> </td> </tr> <tr> <td align="left"> <a href="http://www.csse.monash.edu.au" COLOR="black"> <span style="font-family:sans-serif;font-size:+120%; font-weight:bold;background-color:#ffffff;color:black"> Clayton School</span> </a> </td> </tr> </table> </td> <td align="right">""",

Generate the trains image on the trains page. We do this from the list of available images in web/images/banner (added manually to this list), by choosing one indexed by the low order bits of the current microsecond, that is, pseudo-randomly.

rightnow=datetime.datetime.now()
locos=["R707-1.jpg", "6029-32.jpg", "4472=R761-11.jpg",\
       "621-16.jpg", "F255-1.jpg", "5910-4.jpg",\
       "3813-5.jpg", "5112+5910-4a.jpg","W933-8.jpg",\
       "520-6.jpg", "3203-3.jpg", "3642-1.jpg",\
       "Rx207-15.jpg", "38=D3+K=R-3.jpg", "D3-639+R707-1.png",\
       "J549-21.jpg", "5367-2.jpg", "W22-1.png",\
       "3813-5a.jpg", "tgv-13.jpg", "S300-1.jpg",\
       "4472=R761-11.jpg","6029-6.jpg", "Callington-1.jpg",\
       "NYCHudson-1.jpg", "R711-3.jpg", "MerddinEmrys-2.jpg",\
       "3801+3813+3820-2.jpg"]
loco=locos[rightnow.microsecond % len(locos)]
print "<img align=\"right\" SRC=\"images/banner/" + loco,
print "\" height=\"79\" alt=\"steam loco " + loco + "\"\/>",
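The selection step can be tried on its own with a short Python 3 sketch. The function name pick_banner and its optional moment parameter are hypothetical, introduced here only so that the choice can be reproduced with a fixed timestamp.

```python
import datetime


def pick_banner(images, moment=None):
    """Choose an image using the microsecond field of a timestamp.

    Hypothetical helper; 'moment' defaults to the current time.
    """
    if moment is None:
        moment = datetime.datetime.now()
    # The microsecond value changes fast enough between requests that
    # the index behaves pseudo-randomly.
    return images[moment.microsecond % len(images)]


print(pick_banner(["R707-1.jpg", "6029-32.jpg", "621-16.jpg"]))
```

Because the index depends only on the clock, no random number generator or seed is needed, at the cost of a selection that is not statistically rigorous.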

Now complete the final part of the header. A warning message about using Internet Explorer is also added, as that application is not W3C compliant.

print """ </td> </tr> </table> </div> <div class="spacer"></div> <table style="background-color:#339933;border-top:1px solid #000000" width="100%" id="global-nav" summary="Layout for site-wide navigation"> <tr> <td valign="center"> <div style="font-size:+140%;margin-left:1em"> <a HREF="index"""+EXTN+"""">JOHN HURST</a> Warning: This page works with any browser EXCEPT Internet Explorer! <xsl:copy-of select="$GlobalNavBar"/> </div> </td> </tr> </table> <!-- U T I L I T Y N A V I G A T I O N --> <table style="background-color:#3c6;color:#fff;vertical-align:middle; text-align: right" width="100%" id="global-utils" summary="Layout for utility navigation"> <tr> <td align="left"> <a HREF="position/index"""+EXTN+"""">Position</a> | <a HREF="research/index"""+EXTN+"""">Research</a> | <a HREF="teaching/index"""+EXTN+"""">Teaching</a> | <a HREF="admin/index"""+EXTN+"""">Administration</a> | <a HREF="professional/index"""+EXTN+"""">Professional</a> | <a HREF="personal/index"""+EXTN+"""">Personal</a> | <a HREF="trains/index"""+EXTN+"""">Railways</a> | <a HREF=\""""+CGIBIN+"""/train-map.py">Site map</a> </td> </tr> </table><TABLE WIDTH="100%" BGCOLOR="#fff" CELLSPACING="0" CELLPADDING="0"><TR><TD COLOR="#ffffff" BGCOLOR="#33cc66" COLSPAN="40" ALIGN="center"><B>Central Shunting Yard</B></TD></TR><TR><TD ALIGN="center" BGCOLOR="silver" COLSPAN="3">Main</TD><TD ALIGN="center" BGCOLOR="lightgreen" COLSPAN="7">Australia</TD><TD ALIGN="center" BGCOLOR="lightpink" COLSPAN="3">Miscellaneous</TD><TD ALIGN="center" BGCOLOR="lightblue" COLSPAN="5">Rest of World</TD></TR><TR><TD ALIGN="center" BGCOLOR="white"><A HREF="trains/./index"""+EXTN+""""><IMG SRC="trains/./trains.gif" ALT="Main Railway Page" HEIGHT="30" WIDTH="30"></A></TD><TD ALIGN="center" BGCOLOR="white"><A HREF="trains/new/index"""+EXTN+""""><IMG SRC="trains/new/trains.gif" ALT="The Latest Additions" HEIGHT="30" WIDTH="30"></A></TD><TD ALIGN="center" BGCOLOR="white"><A HREF="trains/pops/index"""+EXTN+""""><IMG 
SRC="trains/pops/thumb/ajh.gif" ALT="The Most Popular Images" HEIGHT="30" WIDTH="30"></A></TD><TD ALIGN="center" BGCOLOR="white"><A HREF="trains/anr/index"""+EXTN+""""><IMG SRC="trains/anr/trains.gif" ALT="Australian National Railways" HEIGHT="30" WIDTH="30"></A></TD><TD ALIGN="center" BGCOLOR="white"><A HREF="trains/nsw/index"""+EXTN+""""><IMG SRC="trains/nsw/trains.gif" ALT="New South Wales Railways" HEIGHT="30" WIDTH="30"></A></TD><TD ALIGN="center" BGCOLOR="white"><A HREF="trains/qld/index"""+EXTN+""""><IMG SRC="trains/qld/trains.gif" ALT="Queensland Railways" HEIGHT="30" WIDTH="30"></A></TD><TD ALIGN="center" BGCOLOR="white"><A HREF="trains/sa/index"""+EXTN+""""><IMG SRC="trains/sa/trains.gif" ALT="South Australian Railways" HEIGHT="30" WIDTH="30"></A></TD><TD ALIGN="center" BGCOLOR="white"><A HREF="trains/tas/index"""+EXTN+""""><IMG SRC="trains/tas/trains.gif" ALT="Tasmanian Railways" HEIGHT="30" WIDTH="30"></A></TD><TD ALIGN="center" BGCOLOR="white"><A HREF="trains/vic/index"""+EXTN+""""><IMG SRC="trains/vic/trains.gif" ALT="Victorian Railways" HEIGHT="30" WIDTH="30"></A></TD><TD ALIGN="center" BGCOLOR="white"><A HREF="trains/wa/index"""+EXTN+""""><IMG SRC="trains/wa/trains.gif" ALT="West Australian Railways" HEIGHT="30" WIDTH="30"></A></TD><TD ALIGN="center" BGCOLOR="white"><A HREF="trains/misc/index"""+EXTN+""""><IMG SRC="trains/misc/trains.gif" ALT="Miscellaneous Railway Items" HEIGHT="30" WIDTH="30"></A></TD><TD ALIGN="center" BGCOLOR="white"><A HREF="trains/private/index"""+EXTN+""""><IMG SRC="trains/private/trains.gif" ALT="Private Railways" HEIGHT="30" WIDTH="30"></A></TD><TD ALIGN="center" BGCOLOR="white"><A HREF="trains/tourist/index"""+EXTN+""""><IMG SRC="trains/tourist/trains.gif" ALT="Tourist and Preservation" HEIGHT="30" WIDTH="30"></A></TD><TD ALIGN="center" BGCOLOR="white"><A HREF="trains/./rest"""+EXTN+""""><IMG SRC="trains/./thumb/rest.gif" ALT="African/Asian Railways" HEIGHT="30" WIDTH="30"></A></TD><TD ALIGN="center" BGCOLOR="white"><A 
HREF="trains/br/index"""+EXTN+""""><IMG SRC="trains/br/trains.gif" ALT="British Railways" HEIGHT="30" WIDTH="30"></A></TD><TD ALIGN="center" BGCOLOR="white"><A HREF="trains/europe/index"""+EXTN+""""><IMG SRC="trains/europe/trains.gif" ALT="Continental European Railways" HEIGHT="30" WIDTH="30"></A></TD><TD ALIGN="center" BGCOLOR="white"><A HREF="trains/nz/index"""+EXTN+""""><IMG SRC="trains/nz/trains.gif" ALT="New Zealand Railways" HEIGHT="30" WIDTH="30"></A></TD><TD ALIGN="center" BGCOLOR="white"><A HREF="trains/usa/index"""+EXTN+""""><IMG SRC="trains/usa/trains.gif" ALT="North American Railways" HEIGHT="30" WIDTH="30"></A></TD></TR><TR><TD ALIGN="center" BGCOLOR="white"><A HREF="trains/./index"""+EXTN+"""">Central</A></TD><TD ALIGN="center" BGCOLOR="white"><A HREF="trains/new/index"""+EXTN+"""">Latest</A></TD><TD ALIGN="center" BGCOLOR="white"><A HREF="trains/pops/index"""+EXTN+"""">VoxPop</A></TD><TD ALIGN="center" BGCOLOR="white"><A HREF="trains/anr/index"""+EXTN+"""">ANR</A></TD><TD ALIGN="center" BGCOLOR="white"><A HREF="trains/nsw/index"""+EXTN+"""">NSW</A></TD><TD ALIGN="center" BGCOLOR="white"><A HREF="trains/qld/index"""+EXTN+"""">QLD</A></TD><TD ALIGN="center" BGCOLOR="white"><A HREF="trains/sa/index"""+EXTN+"""">SA</A></TD><TD ALIGN="center" BGCOLOR="white"><A HREF="trains/tas/index"""+EXTN+"""">TAS</A></TD><TD ALIGN="center" BGCOLOR="white"><A HREF="trains/vic/index"""+EXTN+"""">VIC</A></TD><TD ALIGN="center" BGCOLOR="white"><A HREF="trains/wa/index"""+EXTN+"""">WA</A></TD><TD ALIGN="center" BGCOLOR="white"><A HREF="trains/misc/index"""+EXTN+"""">Misc</A></TD><TD ALIGN="center" BGCOLOR="white"><A HREF="trains/private/index"""+EXTN+"""">Private</A></TD><TD ALIGN="center" BGCOLOR="white"><A HREF="trains/tourist/index"""+EXTN+"""">Tourist</A></TD><TD ALIGN="center" BGCOLOR="white"><A HREF="trains/./rest"""+EXTN+"""">Rest</A></TD><TD ALIGN="center" BGCOLOR="white"><A HREF="trains/br/index"""+EXTN+"""">BR</A></TD><TD ALIGN="center" BGCOLOR="white"><A 
HREF="trains/europe/index"""+EXTN+"""">Europe</A></TD><TD ALIGN="center" BGCOLOR="white"><A HREF="trains/nz/index"""+EXTN+"""">NZ</A></TD><TD ALIGN="center" BGCOLOR="white"><A HREF="trains/usa/index"""+EXTN+"""">US&amp;</A></TD></TR></TABLE></div>"""

2.2.4 Print Trailer of HTML Page

print """ <HR SIZE="4" NOSHADE="on" COLOR="#339"/> <TABLE width="100%" align="center" border="0" cellspacing="0" cellpadding="0"> <TR><TD height="10"></TD></TR> <TR> <TD>This page maintained by John Hurst. <BR/> Copyright <A HREF="http://www.adm.monash.edu.au/unisec/pol/itec12.html"> Monash University Acceptable Use Policy </A> </TD> <TD><xsl:copy-of select="$GlobalCounter"/></TD> <TD ALIGN="right" ROWSPAN="2"> <IMG VALIGN="bottom" SRC="images/MadeOnMac.gif"/> <A HREF="index"""+EXTN+""""> <IMG ALIGN="center" height="50" width="33" SRC="family/john9808.gif" alt="My Photo"/></A> <A HREF="trains/index"""+EXTN+""""> <IMG ALIGN="center" height="50" width="33" SRC="images/train.gif" alt="Train Photo"/></A> </TD> </TR> <TR> <TD ALIGN="left" valign="bottom" COLSPAN="3"> <SPAN STYLE="font-size:80%"> <P> Dynamically generated at """+\ tm+"""\ <BR/> Maintainer use only; not generally accessible: <!-- **** NB! The "localhost" in the following MUST be split to avoid being converted for other server contexts -->"""ind=' 'print ind+'<A href="http://local'+'host/cgi-bin/ajh/'+scriptnameparm+'">Local Server</A>'print ind+"<xsl:text>&#x0a;</xsl:text>"print ind+'<A href="http://www.ajhurst.org/cgi-bin/ajh/'+scriptnameparm+'">Home Server</A>'print ind+"<xsl:text>&#x0a;</xsl:text>"print ind+'<A href="http://bendigo.csse.monash.edu.au/cgi-bin/ajh/'+scriptnameparm+'">'print ind+'Work Server</A>'print ind+"<xsl:text>&#x0a;</xsl:text>"print ind+'<A href="http://www.csse.monash.edu.au/cgi-bin/cgiwrap/ajh/'+scriptnameparm+'">'print ind+'CSSE Server</A>'print """ <xsl:text>&#x0a;</xsl:text> </P> </SPAN> </TD> </TR> </TABLE> </body></html>"""

2.3 Define Subroutines

2.3.1 define function convertIPtoHex

def convertIPtoHex(ipadrDec):
    ipadrHex=ipadrDec
    res=re.match(r'(\d+)\.(\d+)\.(\d+)\.(\d+)',ipadrDec)
    if res:
        d1=int(res.group(1))
        d2=int(res.group(2))
        d3=int(res.group(3))
        d4=int(res.group(4))
        ipadrHex = "%02x%02x%02x%02x" % (d1,d2,d3,d4)
    return ipadrHex

The logic of this function is simple enough: extract the integer (decimal) values of each field in an IP address, and convert each to a two-digit hexadecimal value. Concatenate all these into a single hex string, which is returned.
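For experimentation outside the CGI environment, the same conversion can be written compactly in Python 3. This is a sketch rather than the script's own code: the name convert_ip_to_hex is hypothetical, and unlike the original it also tolerates a missing address (None), which is an added assumption.

```python
import re


def convert_ip_to_hex(ip_dec):
    """Convert a dotted-decimal IPv4 address to an 8-digit hex string.

    Anything that does not look like an IPv4 address is returned unchanged.
    """
    m = re.match(r'(\d+)\.(\d+)\.(\d+)\.(\d+)$', ip_dec or "")
    if not m:
        return ip_dec
    # Two hex digits per field, concatenated.
    return "%02x%02x%02x%02x" % tuple(int(g) for g in m.groups())


print(convert_ip_to_hex("130.194.64.1"))
```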

2.3.2 define procedure log

def log(ipadr,image,acc,ok):
    global dontlog
    if dontlog: return
    access=""
    if not acc:
        # the CGI variable is spelled HTTP_REFERER (a single R)
        refer = os.getenv("HTTP_REFERER")
        access=" *** not served *** (ref: %s)" % (refer)
    elif not ok:
        access=" *** already voted ***"
    try:
        f=open(VIEWINGS,'a')
    except:
        print "Cannot open logfile %s" % (VIEWINGS)
        return # nothing can be written without a log file
    f.write("%s %s %s%s\n" % (tm,ipadr,image,access))
    f.close()
    #print "<P>Logged %s %s</P>" % (tm,image)

Every image access is logged, for recording its popularity. The one exception is an explicit request not to log (dontlog is true), in which case nothing is written. An access where the image could not be delivered, or where the visitor has already voted, is still written to the log, but with a marker between asterisks so that the ranking code can skip it.
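The shape of a log entry, and the way marked entries can be recognised later, can be sketched as follows (Python 3). The function format_log_entry and the name NOT_COUNTED are hypothetical; the pattern itself is the one the scripts compile as notserved.

```python
import re

# Same pattern the scripts use to skip marked entries.
NOT_COUNTED = re.compile(r".*\*\*\* .* \*\*\*")


def format_log_entry(stamp, ip_hex, image, served=True, first_vote=True,
                     referer=None):
    """Build one log line (without the trailing newline). Hypothetical helper."""
    access = ""
    if not served:
        access = " *** not served *** (ref: %s)" % (referer,)
    elif not first_vote:
        access = " *** already voted ***"
    return "%s %s %s%s" % (stamp, ip_hex, image, access)


good = format_log_entry("20050101:1200", "82c24001", "trains/vic/R707-1")
repeat = format_log_entry("20050101:1205", "82c24001", "trains/vic/R707-1",
                          first_vote=False)
print(good)
print(repeat)
print(bool(NOT_COUNTED.match(good)), bool(NOT_COUNTED.match(repeat)))
```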

2.3.3 Determine rank of image

logentrypat=\
  re.compile("(\d{4})(\d{2})(\d{2}):(\d{2})(\d{2}) "+\
             "([0-9a-f\.]+ )?trains/(.*)$")
def getrank(path,imageparm):
    global ipadr
    res=re.match(r'trains/(.*)$',imageparm)
    if res:
        imageparm=res.group(1)
    ok=1
    notserved=re.compile(".*\*\*\* .* \*\*\*")
    res=re.match('.*trains/(.*)$',path)
    if res:
        path=res.group(1)
    try:
        data=open(VIEWINGS)
    except:
        print "Cannot open logfile %s" % (VIEWINGS)
        sys.exit(1)
    for l in data.readlines():
        res=notserved.match(l)
        if res: continue
        # the log entry matching described in the following fragment goes here
        pass
    if ok:
        if table.has_key(imageparm):
            if not dontlog:
                table[imageparm]+=1.0 # add one for this viewing!
        else:
            table[imageparm]=1.0 # add one for this viewing!
    list=[]
    for key in sorted(table.keys()):
        list.append((table[key],key))
        #print "%f %s" % (table[key],key)
    sortlist=sorted(list,reverse=True)
    totalimages=len(sortlist)
    i=1; last=0.0; rank=1; thisrank=0
    for (n,k) in sortlist:
        if last!=n:
            rank=i
        #print "%4d %2.6f %s<BR/>" % (rank,n,k)
        i+=1
        last=n
        if k==path:
            thisrank=rank
            break
    return (totalimages,thisrank,ok)

Match the various fields in a logfile entry. These are the year, month, day, hour and minute of the entry (in the format YYYYMMDD:hhmm), followed by the IP address (now stored in hexadecimal, but originally in decimal, and before that, not at all), and then the image address, including the base directory trains/, which is stripped off. Note that any logging of whether the image actually was served, or had already been voted upon, has been discarded previously.

res=logentrypat.match(l)
if res:
    logdate=l[0:8]
    year=int(res.group(1))
    month=int(res.group(2))
    day=int(res.group(3))
    hour=int(res.group(4))
    minute=int(res.group(5))
    accesstime = datetime.datetime(year,month,day,hour,minute)
    timesinceaccess=now-accesstime
    dayssinceaccess=timesinceaccess.days+timesinceaccess.seconds/86400.0
    expval=-dayssinceaccess/DECAY
    voteval=math.exp(expval)
    #print "%1.4f %2.6f %2.5f" % (voteval,expval,dayssinceaccess)
    thisipadr=res.group(6)
    if thisipadr: thisipadr=thisipadr.strip()
    imagename=res.group(7)
    #print "%s %s %s : %s %s <%s> %s" % \
    #      (year,month,day,hour,minute,thisipadr,imagename)
    if table.has_key(imagename):
        table[imagename]+=voteval
    else:
        table[imagename]=voteval
    if thisipadr==ipadr and imagename==imageparm and logdate==today:
        ok = 0
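To see how a single log line turns into a vote value, the matching and decay steps can be combined in a standalone Python 3 sketch. The function vote_from_log_line is hypothetical, and DECAY=15.0 is assumed from the ranking.py listing; the regular expression is the one given for logentrypat.

```python
import datetime
import math
import re

DECAY = 15.0  # days; assumed from the ranking.py listing
LOG_ENTRY = re.compile(r"(\d{4})(\d{2})(\d{2}):(\d{2})(\d{2}) "
                       r"([0-9a-f.]+ )?trains/(.*)$")


def vote_from_log_line(line, now, decay=DECAY):
    """Return (image name, decayed vote) for one log line, or None.

    Hypothetical helper combining the matching and decay steps.
    """
    m = LOG_ENTRY.match(line.rstrip("\n"))
    if not m:
        return None
    year, month, day, hour, minute = (int(m.group(i)) for i in range(1, 6))
    age = now - datetime.datetime(year, month, day, hour, minute)
    days = age.days + age.seconds / 86400.0
    # A vote is worth exp(-age/decay): 1.0 when fresh, 1/e after 'decay' days.
    return m.group(7), math.exp(-days / decay)


now = datetime.datetime(2005, 1, 16, 12, 0)
print(vote_from_log_line("20050101:1200 82c24001 trains/vic/R707-1\n", now))
```

Entries written before IP addresses were recorded still match, because the address group is optional.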

2.3.4 Define the XML Dispatch Routines

def doname(elem,path): if elem.firstChild: text=elem.firstChild.nodeValue pathsplit=re.match(BASEPAGE+'/(.*)/([^/]*)$',path) if pathsplit: pathbase=pathsplit.group(1) pathfile=pathsplit.group(2) dirattr=elem.getAttribute('dir') dir="" if dirattr: dir=dirattr page="" pageattr=elem.getAttribute('page') if pageattr: url=pathbase+'/'+pageattr+EXTN+'#'+pathfile page=pageattr else: url=pathbase+'/index'+EXTN+'#'+pathfile print "<LI><I>name:</I> <A HREF=\"%s\">%s</A> dir=%s page=%s</LI>" % (url,text,dir,page) else: print "<LI><I>name:</I> %s</LI>" % (text)def dothumb(elem,path): pass #text=elem.firstChild.nodeValue #print "<LI><I>thumb:</I> %s</LI>" % textdef dosize(elem,path): bytes=pixels="" print "%s" % (elem) attrs=elem.attributes for i in range(attrs.length): attr = attrs.item(i) if attr.name=='bytes': bytes=attr.value if attr.name=='pixels': pixels=attr.value print "<LI><I>size:</I> %s bytes, %s pixels</LI>" % (bytes,pixels)def dodate(elem,path): taken=catalogued="" attrs=elem.attributes for i in range(attrs.length): attr = attrs.item(i) if attr.name=='taken': taken=attr.value if attr.name=='catalogued': catalogued=attr.value print "<LI><I>date:</I> taken: %s, catalogued %s</LI>" % (taken,catalogued)def dophotographer(elem,path): text=elem.firstChild.nodeValue print "<LI><I>photographer:</I> %s</LI>" % textdef doindex(elem,path): if elem.firstChild: text=elem.firstChild.nodeValue print "<LI><I>index terms:</I> %s</LI>" % textdef totext(node,path): if node.nodeType==node.TEXT_NODE: return node.nodeValue elif node.nodeType==node.ELEMENT_NODE: text='' for n in node.childNodes: text=text+totext(n,path) if node.tagName=='narrower': return "<DIV style=\"margin-left:20;font-style:italic\">%s</DIV>" % text elif node.tagName=='uri': attributes=node.attributes for i in range(attributes.length): attr = attributes.item(i) if attr.name=='href': href=attr.value return "<A HREF=\"%s\">%s</A>" % (href,text) elif node.tagName=='p': return "<P>%s</P>" % (text) elif node.tagName=='b': 
return "<B>%s</B>" % (text) elif node.tagName=='i': return "<I>%s</I>" % (text) elif node.tagName=='em': return "<EM>%s</EM>" % (text) elif node.tagName=='dq': return "\"%s\"" % (text) elif node.tagName=='description': return text else: return "&amp;lt;%s>%s&amp;lt;/%s>" % (node.tagName,text,node.tagName) elif node.childNodes: text='' for n in node.childNodes: text=text+totext(n,path) return text else: return "**unknown node**"def dodescription(elem,path): text=totext(elem,path) print "<LI><I>description:</I> %s</LI>" % textdispatch={'name':doname, 'thumb':dothumb, 'size':dosize, 'date':dodate, 'photographer':dophotographer, 'index':doindex, 'description':dodescription}

2.3.5 define procedure to display the image

This is the procedure that does most of the real work in displaying the full image.

def display(image): global totalimages,imageparm path=top[0:len(top)-6]+image xmlfile=path+EXTN jpgfile=path+".jpg" acc=os.access(jpgfile,os.R_OK) pathsplit=re.match(BASEPAGE+'/(.*)/([^/]*)$',path) if pathsplit: pathbase=pathsplit.group(1) pathfile=pathsplit.group(2) if not acc: if pathsplit: gifpath=pathbase+'/thumb/'+pathfile+'.gif' gifacc=os.access(BASEPAGE+'/'+gifpath,os.R_OK) else: gifacc=False #print "<P>%s,%08x</P>" % (jpgfile,acc) print "<P><B>The file %s is not available</B>" % (jpgfile) if gifacc: print '<IMG SRC="'+gifpath+'"/></P>' print '<P>The image has been removed for space reasons. ' print 'It will be retrieved overnight.</P>' else: print "</P>" res=re.match(".*/([^/]*)$",path) name="Sorry" if res: name=res.group(1) print else: #print "<P>%s,%08x</P>" % (jpgfile,acc) print "<IMG SRC=\"%s\"/>" % (image+".jpg") print "<LI><I>file:</I> %s </LI>" % (xmlfile) dom=parse(xmlfile) elems=dom.getElementsByTagName('image').item(0).childNodes for n in elems: if n.nodeType == Node.ELEMENT_NODE: #print n.tagName if dispatch.has_key(n.tagName): dispatch[n.tagName](n,path) #print '<P><A href="%s">Go to %s</A></P>' % (image,image) (totalimages,rank,ok)=getrank(path,imageparm) log(ipadr,image,acc,ok) startrank = 25 * ((rank-1) / 25) print if not ok: print '<P>You have already voted for this image today!</P>\n' return"""<P>This may be because the file has been relocated to a different location.Try clicking this link to search the website:<FORM action=\""""+SCRIPT+"""\" method=\"post\" name=\"image\"><INPUT type=\"submit\" name=\"image\" value=\"%s\"><IMG SRC="%s"/></INPUT></FORM>If that does not work, it may be that the image has been removed for space reasons. Sorry.""" % (name,gifpath)"""<FORM action=\""""+CGIBIN+"""/ranktrains.py" method="post"><INPUT type="hidden" name="number" value="25"/><P>This image ranks %d out of %d<BUTTON type="submit" name="startnum" value="%d">%04d-%04d</BUTTON></P></FORM>""" % (rank,totalimages,startrank,startrank+1,startrank+25)

2.3.6 Define Procedure to Search Directories

The procedure visit is called when we have an image name, but no path to the image. The procedure recursively visits all directories reachable from the initial parameter dir, and if it finds the image, calls display to perform the actual display of the image. It then returns. Thumbnail directories are skipped. It is assumed that the initial dir parameter contains the substring trains/, indicating where the trains subdirectory begins.

def visit(dir,level,image):
    list = os.listdir(dir)
    for f in list:
        if f == 'thumb': continue
        path = dir + "/" + f
        if f == image+'.jpg':
            res=re.match('(.*)(trains/[^.]*)\.jpg',path)
            if res:
                display(res.group(2))
                return 1
        if os.path.isdir(path):
            res=visit(path,level+2,image)
            if res: return 1
    return 0
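The search itself can be illustrated without the display step. The Python 3 sketch below uses os.walk instead of explicit recursion, which is a substitution and not the script's own method; the function name find_image is hypothetical, and it returns the relative path (starting at the top directory's own name) rather than calling display.

```python
import os


def find_image(top, image, skip=("thumb",)):
    """Search 'top' recursively for image + '.jpg'.

    Returns a path such as 'trains/vic/R707-1' (no extension), or None
    if the image is not found. Directories named in 'skip' are not entered.
    """
    top = os.path.abspath(top)
    target = image + ".jpg"
    for dirpath, dirnames, filenames in os.walk(top):
        # Prune thumbnail directories in place so os.walk skips them.
        dirnames[:] = [d for d in dirnames if d not in skip]
        if target in filenames:
            full = os.path.join(dirpath, target)
            rel = os.path.relpath(full, os.path.dirname(top))
            return os.path.splitext(rel)[0].replace(os.sep, "/")
    return None
```

Called as find_image("/home/ajh/www/trains", "R707-1"), this would return something like "trains/vic/R707-1" if the image exists under that directory; the path in this call is only an example.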

3. ranktrains.py

#!/usr/local/bin/python
# DO NOT EDIT THIS FILE!
# use ~/Computers/python/viewtrains/viewtrains.xlp instead
import re,string,sys,datetime
import cgi,os
import time
import urllib
import math
import rank
from xml.dom.minidom import parse, parseString, Node
print "Content-Type: text/html\n\n";
print "<base href=\""+HOMEPAGE+"/\">\n"
htmltitle="Dynamic Rankings"
scriptnameparm="ranktrains.py"

3.1 ranktrains: define constants

This macro is also defined in other sections.

tm = time.asctime(time.localtime(time.time())) + ["", "(Daylight savings)"][DST]
SCRIPT=CGIBIN+"/ranktrains.py"

3.2 ranktrains: collect cgi parameters

form = cgi.FieldStorage()
gotparms=0
numtodisplay=25; startnum=0
if form.has_key("number"):
    numtodisplay=int(form["number"].value)
    gotparms=1
if form.has_key("startnum"):
    startnum=int(form["startnum"].value)
    gotparms=1
stopnum=startnum+numtodisplay
#print "numtodisplay = %d, startnum = %d" % (numtodisplay,startnum)

3.3 ranktrains: collect previous ranking information

RANKINGS=LOGFILE+"/trainrank"
VIEWINGS=LOGFILE+"/trainview"
totalimages, datatime, votefactor, table = rank.rankdata(RANKINGS)

3.4 ranktrains: Update Rankings with Latest Log Info

logentrypat=\
  re.compile("(\d{4})(\d{2})(\d{2}):(\d{2})(\d{2}) "+\
             "([0-9a-f\.]+ )?(trains/)?(.*)$")
notserved=re.compile(".*\*\*\* .* \*\*\*")
totalimages, logcount, ranktime, sorttime, sortlist = \
    rank.ranklog(VIEWINGS,table,notserved)

3.5 rankings: Print Forward and Backward Buttons

print "<FORM action=\""+CGIBIN+"/ranktrains.py\" method=\"post\">\n"
print "<table align=\"center\"><tr>\n"
print "<INPUT type=\"hidden\" name=\"number\" value=\"%d\"/>" % (numtodisplay)
if startnum-numtodisplay>=0:
    print "<td><BUTTON type=\"submit\" name=\"startnum\" "
    print "value=\"%d\">Prev (%d-%d)</BUTTON></td>" % \
          (startnum-numtodisplay,startnum-numtodisplay+1,startnum)
else:
    print "<td><BUTTON type=\"submit\">(Prev)</BUTTON></td>"
print "<td><BUTTON type=\"submit\" name=\"startnum\" "
print "value=\"%d\">Next (%d-%d)</BUTTON></td>" % \
      (startnum+numtodisplay,startnum+numtodisplay+1,startnum+2*numtodisplay)
print "</tr></table>\n"
print "</FORM>\n"
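The arithmetic behind the button labels is easy to get wrong by one, so a small Python 3 sketch may help. The function page_ranges is hypothetical; it only reproduces the calculation of the submitted start value and the displayed first and last ranks, not the HTML output.

```python
def page_ranges(startnum, numtodisplay):
    """Return (prev, next), each a (start value, first rank, last rank) tuple.

    'prev' is None when there is no earlier page. The start value is
    zero-based, while the displayed ranks are one-based.
    """
    prev_range = None
    if startnum - numtodisplay >= 0:
        prev_start = startnum - numtodisplay
        prev_range = (prev_start, prev_start + 1, startnum)
    next_start = startnum + numtodisplay
    next_range = (next_start, next_start + 1, startnum + 2 * numtodisplay)
    return prev_range, next_range


print(page_ranges(0, 25))
print(page_ranges(50, 25))
```

As in the script, the Next range is not clipped to the total number of images, so the last page may advertise ranks that do not exist.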

3.6 ranktrains: Generate Image Rankings

print "<H1>Image Rankings %d - %d</H1>\n" % (startnum+1,stopnum)print "<FORM action=\""+CGIBIN+"/viewtrains.py\" "print "method=\"post\" name=\"image\"><table align=\"center\">\n"perline=5; posonline=0i=1; last=0.0; rank=1for (n,k) in sortlist: #print "%1.4f %s <BR/>" % (n,k) #(n,k) = sortlist[i] if last!=n: rank=i if i>startnum: if posonline==0: print "<tr>\n" res=re.match("(.*)/([^/]*)",k) path="" ; image="" if res: path=res.group(1) image=res.group(2) else: image=k caption=k if len(caption)>17: caption=path+"<BR/>"+image try: xmlfile = open(BASEPAGE+"/trains/"+path+"/"+image+".xml") dom = parse(xmlfile) nameelem=dom.getElementsByTagName('name').item(0) pageattr=nameelem.getAttributeNode('page') if pageattr: page=pageattr.nodeValue else: page='index' xmlfile.close() print "<td align=\"center\">" print "<table><tr><td>%4d</td>" % (rank) print "<td align=\"right\">%2.6f</td></tr>" % (n) print "<tr><td colspan=\"2\" align=\"center\">" #print '<INPUT type="hidden" name="disablevote" value="1"/>' print "<BUTTON type=\"submit\" name=\"image\" value=\"trains/%s\">" % (k) print "<IMG align=\"center\" ALT=\"click me for full image\" " print "SRC=\"trains/%s/thumb/%s.gif\">" % (path,image) print "</BUTTON></td></tr><tr><td colspan=\"2\" align=\"center\">" print "<A HREF=\"trains/%s/%s.xml#%s\">%s</A></td></tr></table></td>" % \ (path,page,image,caption) except IOError: print "<td align=\"center\">" print "<table><tr><td>%4d</td>" % (rank) print "<td align=\"right\">%2.6f</td></tr>" % (n) print "<tr><td colspan=\"2\" align=\"center\">" print "Cannot access<BR/>" print "%s</td></tr></table></td>" % (caption) pass posonline+=1 if posonline==perline: print "</tr>\n" posonline=0 i+=1 if i>stopnum: break last=nprint "</table></FORM>\n"

3.7 ranktrains: print rankings table

print "<H1>Rankings Images Table</H1>\n"print "<FORM action=\""+CGIBIN+"/ranktrains.py\" method=\"post\">\n"print "<table align=\"center\"><tr>\n"print "<INPUT type=\"hidden\" name=\"number\" value=\"%d\"/>" % (numtodisplay)linecount=0# compute score from first image(score,key)=sortlist[0]print "<td align=\"left\">%7.4f</td>" % (score)nimages=len(sortlist)-1for i in range(0,totalimages,numtodisplay): print "<td align=\"center\">" print "<BUTTON type=\"submit\" name=\"startnum\" " print "value=\"%d\">%04d-%04d</BUTTON></td>" % (i,i+1,i+numtodisplay) linecount+=1 # compute score from image number i+numtodisplay j=i+numtodisplay-1 if j > nimages: j = nimages (score,key)=sortlist[j] if linecount % 6 == 2: print "<td align=\"left\">%7.4f</td>" % (score) if linecount % 6 == 4: print "<td align=\"left\">%7.4f</td>" % (score) if linecount % 6 == 0: print "<td align=\"left\">%7.4f</td>" % (score) print "</tr><tr>\n" # compute score from image number i+numtodisplay+1 j=i+numtodisplay if j > nimages: j = nimages (score,key)=sortlist[j] print "<td align=\"left\">%7.4f</td>" % (score)# compute score from last image(score,key)=sortlist[len(sortlist)-1]print "<td align=\"left\">%7.4f</td>" % (score)print "</tr></table>\n"print "</FORM>\n"print """<P> A full explanation of how these rankings are computed can be found on the <A HREF=\""""+HOMEPAGE+"""/trains/pops/index"""+EXTN+"""">Vox Pops Page</A></P>"""

3.8 ranktrains: Print Ranking Analysis

print "<H1>Ranking Analysis Data</H1>\n"
print '<P>Time analyses based on wallclock times</P>'
print 'votefactor=%f ' % (votefactor) + \
      '(this is the decay since last rankings were computed)<BR/>'
print "Ranking data input took %d.%06d seconds for %d images<BR/>" % \
      (datatime.seconds,datatime.microseconds,totalimages)
print "Logfile input took %d.%06d seconds for %d entries<BR/>" % \
      (datatime.seconds,datatime.microseconds,logcount)
print "Input analysis and sorting took %d.%06d seconds<BR/>" % \
      (sorttime.seconds,sorttime.microseconds)
print "Data ranking took %d.%06d seconds<BR/>" % \
      (ranktime.seconds,ranktime.microseconds)

4. tidytrains.py

# read two files to determine current rankings. The first file# contains a single entry for each image, containing the current# voting score, and the second file contains a list of votes since# then.#!/usr/local/bin/pythonimport re,string,sys,datetimeimport cgiimport getoptimport mathimport osimport shutilimport timeimport urllibimport rankTHRESHOLD=0.000170MASTERDIR='/home/ajh/web/trains/'WEBDIR='/home/ajh/Sites/'WEBLOG=WEBDIR+'logs/'WEBPAGE=WEBDIR+'trains/'WEBRANK=WEBLOG+'trainrank'WEBVIEW=WEBLOG+'trainview'SCSSEPAGE='/u/homes1/ajh/trains/'opts, args = getopt.getopt(sys.argv[1:],"s:f:")#CURRENT=WEBRANK#LOGFILE=WEBVIEWCURRENT='/tmp/ajh/trains/trainrank'LOGFILE='/tmp/ajh/trains/trainview'LISTFILE='/tmp/ajh/trains/trainlist'command='/usr/local/bin/rsync -auv -e "ssh -2" nexus.csse.monash.edu.au:web/logs/ /tmp/ajh/trains/'print commandstatus=os.system(command)if status: print "Urrk 1! %d" % (status) sys.exit(status)command="/usr/bin/ssh -2 nexus '(find web/trains -name \*.jpg)' >%s" % LISTFILEstatus=os.system(command)if status: print "Urrk 2! 
%d" % (status)
    sys.exit(status)
available=[]
avfile=open(LISTFILE)
for l in avfile.readlines():
    l=l.strip()
    l=l[11:] # strip off web/trains/
    #print ">%s<" % l
    available.append(l)
avfile.close()
#os.remove(LISTFILE)
oneday=datetime.timedelta(days=1)
yesterday=datetime.datetime.now()-oneday
starttime=yesterday.strftime("%Y%m%d:000000")
finishtime="20201231:235959"
for opt,val in opts:
    print "%s %s" % (opt,val)
    if opt=='-s':
        starttime=val
    elif opt=='-f':
        finishtime=val
    else:
        print "Unknown option %s" % (opt)
print "starttime = %s" % (starttime)
totalimages, datatime, votefactor, table = rank.rankdata(CURRENT)
ignorepat=re.compile(".*\*\*\* [^*]* \*\*\*")
totalimages,logcount, ranktime,sorttime,sortlist = \
    rank.ranklog(LOGFILE,table,ignorepat,starttime,finishtime)
for (val,key) in sortlist:
    if val<THRESHOLD:
        #path=WEBPAGE+key+'.jpg'
        path=SCSSEPAGE+key+'.jpg'
        if key in available:
            print 'removing %s' % (path)
            command='ssh -1 nexus.csse.monash.edu.au "rm %s"' % (path)
            status=os.system(command)
            #os.remove(path)
        else:
            print '%s already removed' % (path)
            pass
table = {}
ignorepat=re.compile(r'.*((-[0-9a-z]+)|(already voted \*\*\*))$')
totalimages,logcount, ranktime,sorttime,sortlist = \
    rank.ranklog(LOGFILE,table,ignorepat,starttime,finishtime)
pathpat=re.compile(r'([^ ]+) ')
for (val,key) in sortlist:
    #print "%s %7.6f" % (key,val)
    res=pathpat.match(key)
    if res:
        path=res.group(1)
        webpath = MASTERDIR+path+'.jpg'
        #sitepath = WEBPAGE+path+'.jpg'
        sitepath = SCSSEPAGE+path+'.jpg'
        #webacc=os.access(webpath,os.F_OK)
        #siteacc=os.access(sitepath,os.F_OK)
        #if webacc:
            #if siteacc:
                #pass
            #else:
                #print 'cp -p %s %s' % (webpath,sitepath)
                #shutil.copy2(webpath,sitepath)
        #else:
            #if siteacc:
                #print 'Anomalous: %s exists but %s does not' % (sitepath,webpath)
            #else:
                #print 'Cannot recover %s as there is no master copy' % (sitepath)
        command='/usr/local/bin/rsync -auv %s nexus.csse.monash.edu.au:%s &>/dev/null' % \
            (webpath,sitepath)
        status=os.system(command)
        if not status:
            print 'recovered %s' % (sitepath)
        else:
            print 'Could NOT recover %s, status:%d' % (sitepath,status)
sys.exit(0)

5. ranking.py

#!/usr/local/bin/python
# read two files to determine current rankings. The first file
# contains a single entry for each image, containing the current
# voting score, and the second file contains a list of votes since
# then. A third file is then constructed, containing the updated
# rankings.
# DO NOT EDIT THIS FILE!
# use ~/Computers/python/viewtrains/viewtrains.xlp instead
import re,string,sys,datetime
import cgi,os
import getopt
import time
import urllib
import math
<u name="define global constants"/>
tm = time.asctime(time.localtime(time.time())) + ["", "(Daylight savings)"][DST]
startnow=datetime.datetime.now()
ignoredirs = re.compile('(tmp)|(units)')
top=BASEPAGE+"/trains"
jpgpat=re.compile(r'(.*)\.jpg$')
xmlpat=re.compile(r'.*\.xml$')
datepat=re.compile(r'(\d{4})(\d{2})(\d{2}):(\d{2})(\d{2})(\d{2})')

def strtotime(str,default):
    res=datepat.match(str)
    if res:
        thisdatetime=datetime.datetime(int(res.group(1)), # year
                                       int(res.group(2)), # month
                                       int(res.group(3)), # day
                                       int(res.group(4)), # hour
                                       int(res.group(5)), # minute
                                       int(res.group(6))) # second
        return thisdatetime
    else:
        return default

opts, args = getopt.getopt(sys.argv[1:],"s:f:")
CURRENT=args[0]
LOGFILE=args[1]
NEW=args[2]
starttime="20050101:000000"
finishtime="20051231:235959"
for opt,val in opts:
    print "%s %s" % (opt,val)
    if opt=='-s':
        starttime=val
    elif opt=='-f':
        finishtime=val
    else:
        print "Unknown option %s" % (opt)
print "%s %s" % (starttime,finishtime)
starttime=strtotime(starttime,None)
finishtime=strtotime(finishtime,None)
print "%s %s" % (starttime,finishtime)
currcount=0
currlist=file(CURRENT,"r")
table={}
DECAY=15.0
currdate=currlist.readline()
currdatetime=strtotime(currdate,startnow)
timesincelast=startnow-currdatetime
dayssincelast=timesincelast.days+timesincelast.seconds/86400.0
expval=-dayssincelast/DECAY
votefactor=math.exp(expval)
print 'votefactor=%f' % (votefactor)
totalimages=0
for l in currlist.readlines():
    res=re.match(r'([^ ]+) +([0-9.]+)$',l)
    if res:
        lastvote=float(res.group(2))
        nowvote=votefactor*lastvote
        table[res.group(1)]=nowvote
    else:
        print 'bad format in %s' % (l)
    totalimages+=1
currlist.close()
datanow=datetime.datetime.now()
datatime = datanow-startnow
print "Data input took %d.%06d seconds for %d images<BR/>" % \
    (datatime.seconds,datatime.microseconds,totalimages)
data=open(LOGFILE)
pat=re.compile("(\d{4})(\d{2})(\d{2}):(\d{2})(\d{2}) ([0-9a-f\.]+ )?(trains)?(.*)$")
notserved=re.compile(".*\*\*\* [^*]* \*\*\*")
logcount=0
for l in data.readlines():
    logcount+=1
    res=notserved.match(l)
    if res:
        continue
    res=pat.match(l)
    if res:
        year=int(res.group(1))
        month=int(res.group(2))
        day=int(res.group(3))
        hour=int(res.group(4))
        minute=int(res.group(5))
        accesstime = datetime.datetime(year,month,day,hour,minute)
        #print "%s %s %s" % (accesstime,starttime,finishtime)
        if accesstime < starttime or accesstime > finishtime:
            print "ignoring %s" % (l)
            continue
        timesinceaccess=startnow-accesstime
        dayssinceaccess=timesinceaccess.days+timesinceaccess.seconds/86400.0
        expval=-dayssinceaccess/DECAY
        voteval=math.exp(expval)
        #print "%1.4f %2.6f %2.5f" % (voteval,expval,dayssinceaccess)
        ipadr=res.group(6)
        if ipadr:
            ipadr=ipadr.strip()
        imagename=res.group(8)
        res=re.match('(.*)(/[^/.]+/\.\./)(.*)$',imagename)
        while res:
            imagename=res.group(1)+'/'+res.group(3)
            res=re.match('(.*)(/[^/.]+/\.\./)(.*)$',imagename)
            print imagename
        imagename=imagename[1:]
        if table.has_key(imagename):
            table[imagename]+=voteval
        else:
            table[imagename]=voteval
    pass
ranknow=datetime.datetime.now()
ranktime = ranknow-datanow
print "Input analysis took %d.%06d seconds for %d entries<BR/>" % \
    (ranktime.seconds,ranktime.microseconds,logcount)
list=[]
for key in sorted(table.keys()):
    list.append((table[key],key))
sortlist=sorted(list,reverse=True)
currlist=file(NEW,"w")
currlist.write('%04d%02d%02d:%02d%02d%02d\n' % (startnow.year,
                                                startnow.month,
                                                startnow.day,
                                                startnow.hour,
                                                startnow.minute,
                                                startnow.second))
#for key in sorted(table.keys()):
#    currlist.write('%s %f\n' % (key,table[key]))
for (val,key) in sortlist:
    currlist.write('%s %f\n' % (key,val))
currlist.close()
closenow=datetime.datetime.now()
sorttime = closenow-ranknow
print "Data sorting took %d.%06d seconds<BR/>" % (sorttime.seconds,sorttime.microseconds)
sys.exit(0)
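The while loop in ranking.py that rewrites imagename removes `name/../` segments from a logged path, one segment per iteration. A minimal sketch of that step in isolation follows; the function name collapse and the sample paths are hypothetical, chosen only for illustration.

```python
import re

# Same expression as used in ranking.py, written as a raw string.
dotdot = re.compile(r'(.*)(/[^/.]+/\.\./)(.*)$')

def collapse(imagename):
    """Repeatedly remove '/name/../' segments until none remain."""
    res = dotdot.match(imagename)
    while res:
        imagename = res.group(1) + '/' + res.group(3)
        res = dotdot.match(imagename)
    return imagename

# Hypothetical paths, for illustration only.
simple = collapse("/vic/tmp/../steam/example")
nested = collapse("/a/b/c/../../d")
```

Because the character class `[^/.]+` excludes dots, a directory whose name contains a dot is left alone by this expression. ranking.py then drops the leading slash with imagename[1:] before using the name as a table key.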

6. rank.py

This module provides three procedures for analysing the rankings of John's railway photographs. Ranking data is stored in two files, generically known as the trainrank and trainview files. The first stores ranking data for a set of images, as computed at a specified date and time. The second stores access requests for images in the database, together with the time and IP address of each request. Note that there is one entry for each unique image in the first file, whereas there may be multiple entries in the second file for any given image.

import re,string,sys,datetime
import cgi
import getopt
import math
import os
import time
import urllib
startnow=datetime.datetime.now()
datepat=re.compile(r'(\d{4})(\d{2})(\d{2}):(\d{2})(\d{2})(\d{2})')
pat=re.compile("(\d{4})(\d{2})(\d{2}):(\d{2})(\d{2}) ([0-9a-f\.]+ )?(trains/)?(.*)$")
DECAY=15.0
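To illustrate how pat splits a trainview entry into its fields, here is a small sketch. The log line is invented for the example (the real log format is only implied by the pattern), so treat the address and path as assumptions.

```python
import re

# Same pattern as in rank.py, written as a raw string.
pat = re.compile(
    r"(\d{4})(\d{2})(\d{2}):(\d{2})(\d{2}) ([0-9a-f\.]+ )?(trains/)?(.*)$")

# Hypothetical log line: timestamp, optional IP address, image path.
line = "20061010:0845 10.0.0.1 trains/vic/example"

res = pat.match(line)
# Groups 1 to 5 are year, month, day, hour and minute.
year, month, day, hour, minute = [int(res.group(i)) for i in range(1, 6)]
# Group 6 is the optional address, captured with its trailing space.
ipadr = res.group(6)
if ipadr:
    ipadr = ipadr.strip()
# Group 8 is the image path with any leading "trains/" removed.
imagename = res.group(8)

# A line without an address leaves group 6 as None.
no_ip = pat.match("20061010:0845 trains/vic/example")
```

Since the address group is optional, callers need the `if ipadr:` guard before stripping it, as ranklog does.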

6.1 rank: define the strtotime function

def strtotime(str,default):
    res=datepat.match(str)
    if res:
        thisdatetime=datetime.datetime(int(res.group(1)), # year
                                       int(res.group(2)), # month
                                       int(res.group(3)), # day
                                       int(res.group(4)), # hour
                                       int(res.group(5)), # minute
                                       int(res.group(6))) # second
        return thisdatetime
    else:
        return default
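A short usage sketch for strtotime follows. The function is restated in compact form so the example is self-contained; the parameter name text and the sample strings are illustrative assumptions, not part of the original module.

```python
import datetime
import re

datepat = re.compile(r'(\d{4})(\d{2})(\d{2}):(\d{2})(\d{2})(\d{2})')

def strtotime(text, default):
    """Parse 'YYYYMMDD:HHMMSS'; return default if the text does not match."""
    res = datepat.match(text)
    if res:
        return datetime.datetime(*[int(g) for g in res.groups()])
    return default

# A header line as returned by readline() still carries its newline;
# match() only anchors at the start, so the trailing newline is harmless.
stamp = strtotime("20061010:084500\n", None)

# Text that does not match the pattern yields the default.
fallback = strtotime("not a timestamp", None)
```

Note that the default is returned only when the pattern fails to match. A string that matches but holds an impossible date (month 13, say) makes datetime.datetime raise ValueError instead.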

6.2 rank: define rankdata procedure

The rankdata procedure reads the train rank file (the path to which is passed as the parameter). The file contains the date and time when it was created, followed by a line for every image in the database. Each line contains the image path, starting at the trains subdirectory, together with the (decayed) vote value. Each vote is decayed by an exponential factor votefactor, computed as exp(-d/DECAY), where d is the number of days since the creation date of the file. The updated vote value is entered into an associative table, which is returned along with the housekeeping values totalimages (the total number of distinct images processed), datatime (the elapsed wall time spent in processing this file), and votefactor itself.

def rankdata(CURRENT):
    datastart=datetime.datetime.now()
    totalimages=0
    table={}
    currlist=file(CURRENT,"r")
    currdate=currlist.readline()
    currdatetime=strtotime(currdate,startnow)
    timesincelast=startnow-currdatetime
    dayssincelast=timesincelast.days+timesincelast.seconds/86400.0
    expval=-dayssincelast/DECAY
    votefactor=math.exp(expval)
    for l in currlist.readlines():
        res=re.match(r'([^ ]+) +([0-9.]+)$',l)
        if res:
            lastvote=float(res.group(2))
            nowvote=votefactor*lastvote
            table[res.group(1)]=nowvote
        else:
            print 'bad format in %s' % (l)
        totalimages+=1
    currlist.close()
    datanow=datetime.datetime.now()
    datatime = datanow-datastart
    return totalimages, datatime, votefactor, table
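To give a feel for the size of the decay applied by rankdata, the following sketch evaluates the same exponential for a few ages. The helper name decay_factor is hypothetical; DECAY is the value used in rank.py.

```python
import math

DECAY = 15.0  # decay constant in days, as in rank.py

def decay_factor(days_since_last):
    """Factor by which a stored vote shrinks after the given number of days."""
    return math.exp(-days_since_last / DECAY)

fresh = decay_factor(0.0)        # a file written just now: no decay
one_decay = decay_factor(DECAY)  # after DECAY days: exp(-1), about 0.37

# Number of days after which a vote is worth half its original value.
half_life = DECAY * math.log(2)
```

With DECAY set to 15.0, a vote therefore halves in value roughly every 10.4 days.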

6.3 rank: define ranklog procedure

def ranklog(LOGFILE,table,ignorepat,
            starttime="20050101:000000",finishtime="20061231:235959"):
    logstart=datetime.datetime.now()
    data=open(LOGFILE)
    logcount=0
    starttime=strtotime(starttime,None)
    finishtime=strtotime(finishtime,None)
    for l in data.readlines():
        logcount+=1
        l=l.strip()
        res=ignorepat.match(l)
        if res:
            continue
        res=pat.match(l)
        if res:
            year=int(res.group(1))
            month=int(res.group(2))
            day=int(res.group(3))
            hour=int(res.group(4))
            minute=int(res.group(5))
            accesstime = datetime.datetime(year,month,day,hour,minute)
            #print "%s %s %s" % (accesstime,starttime,finishtime)
            if accesstime < starttime or accesstime > finishtime:
                #print "ignoring %s" % (l)
                continue
            timesinceaccess=startnow-accesstime
            dayssinceaccess=timesinceaccess.days+timesinceaccess.seconds/86400.0
            expval=-dayssinceaccess/DECAY
            voteval=math.exp(expval)
            #print "%1.4f %2.6f %2.5f" % (voteval,expval,dayssinceaccess)
            ipadr=res.group(6)
            if ipadr:
                ipadr=ipadr.strip()
            imagename=res.group(8)
            if table.has_key(imagename):
                table[imagename]+=voteval
            else:
                table[imagename]=voteval
        pass
    ranknow=datetime.datetime.now()
    ranktime = ranknow-logstart
    list=[]
    for key in sorted(table.keys()):
        list.append((table[key],key))
    sortlist=sorted(list,reverse=True)
    totalimages = len(sortlist)
    closenow=datetime.datetime.now()
    sorttime = closenow-ranknow
    return totalimages, logcount, ranktime, sorttime, sortlist
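The core of ranklog is the accumulation of one decayed vote per access, followed by a descending sort on (value, key) pairs. The sketch below isolates that logic under some assumptions: the function name accumulate, the explicit now argument (rank.py uses the module-level startnow instead), and the sample accesses are all hypothetical, and log parsing is left out.

```python
import datetime
import math

DECAY = 15.0  # decay constant in days, as in rank.py

def accumulate(accesses, now, table=None):
    """Add one decayed vote per (accesstime, imagename) pair and rank the result."""
    if table is None:
        table = {}
    for accesstime, imagename in accesses:
        age = now - accesstime
        days = age.days + age.seconds / 86400.0
        table[imagename] = table.get(imagename, 0.0) + math.exp(-days / DECAY)
    # Sorting (value, key) tuples in reverse puts the highest score first.
    return sorted(((val, key) for key, val in table.items()), reverse=True)

now = datetime.datetime(2006, 10, 10, 8, 45)
old = datetime.datetime(2006, 9, 25, 8, 45)   # exactly 15 days earlier
accesses = [
    (now, "vic/a"),   # one access just now: worth 1.0
    (old, "vic/b"),   # three accesses 15 days ago: worth exp(-1) each
    (old, "vic/b"),
    (old, "vic/b"),
]
sortlist = accumulate(accesses, now)
```

In this example the three older accesses to vic/b total about 1.10, which is enough to outrank the single fresh access to vic/a at 1.0.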

7. TO DO

8. Indices

8.1 Identifier Index

8.2 Chunk Index

8.3 File Index

This page maintained by John Hurst.

Dynamically generated at 20090704:1619 from an XML file modified on 20061010:0845.