I'm working on displaying OCR text from BHL using SVG, and these are just some quick notes on font size. Specifically how SVG font size corresponds to the size of letters, and how you work out what point size was used to print text on a BHL page.
SVG font-size corresponds to the EM square of the font. Hence, if I specify a font-size of 100px then text looks like this (you'll need need a browser that supports SVG to see this):
The yellow box is the EM square (in this example 100px by 100px). The height of the letter "M" is set by the properties of the font which in this case is Times-Roman which has a capheight of 662. This value (and others) are defined in the font description file (Adobe-Core35_AFMs-314.tar.gz).
Below is a diagram showing attributes of Times Roman with respect to the 1000 x 1000 EM square:
Couple of things to note. The first is that the height of a digital font is not given by simply adding the capheight and descender, the height of the font is the EM square. If you know the capheight and the font metrics you can compute the size of the EM square (for Times Roman capheight / 0.662 gives you the EM square). Hence it is possible to fairly accurately reproduce printed text in SVG. I had hopes that I could then go on to infer the actual point size used on the printed page (being able to say "this is 10pt" seemed more elegant than this font is "x pixels"). Turns out that "point size" is a terribly elusive concept, see Point Size and the Em Square: Not What People Think. I've clearly lots to learn about typography. BHL would be a gold mine for anyone interested in the development of type faces and printing technology over time.