For a long time I toyed with the idea of applying Fourier analysis to a typographic page. It seemed to me that certain hard-to-define visual qualities of a perfectly set page could be revealed, and perhaps even measured, by taking a close look at its Fourier spectrum. It remained just an idea until I came across Peter Burnhill’s wonderful book on Aldine typography (Type spaces: in-house norms in the typography of Aldus Manutius), which contains a very detailed analysis of typographic norms and page geometry. My interest in the subject was renewed, and I realized that I had everything I needed to go ahead and try my idea out. Here is the list of materials:

Leaves from Cassius Dio, IN HOC VOLVMINE HAEC CONTINENTVR (Aldine Press, 1519)
CanoScan N650U
PaintShop Pro 5
Alex Chirokov's FFT/IFFT plugins

The plugins are free, work with PSP and Photoshop, and are available with source code (side note: by strange coincidence, Alex Chirokov graduated from the same place as I did, only 14 years later).

IN HOC VOLVMINE HAEC CONTINENTVR (a.k.a. Scriptores historiae Augustae, Ren. 87:8; Adams S781; BMSTC I 217; Palau 48) is typographically similar to the famous octavo classics series (1501–15). The author, Cassius Dio (164–c.235), was a Roman historian of Greek descent; the full text of this work, in English translation, is available online. As far as I can tell, the leaf refers to the brief reign of Didius Julianus. It is set in Griffo’s italic, and is representative of high-quality Aldine typography.

The original 2400dpi scan proved to be too large for processing, so I scaled it down 5 times and cropped the margins. The resulting image of the text block is 3MB on disk and 9.5MB in memory (32 bits per pixel). Here are the exact dimensions:

original textblock width (measured on Totum'q; line): 60.7mm

original textblock height (from top ascender to bottom descender, sans running head): 120mm

image size in pixels: 1340 × 2490

textblock width in pixels: 1150 (yields 18.945 dpmm resolution)

textblock height in pixels: 2277 (yields 18.975 dpmm resolution)

original scan resolution: 2400dpi = 94.49 dpmm

scaled down 5 times: 18.898 dpmm

Averaging the measurements above, we get an 18.95±0.05 dpmm image resolution. Thus, the real size of the image is 70.7±0.2mm × 131.4±0.3mm.
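The arithmetic behind these figures can be rechecked with a few lines of Python. This is only a sketch using the measurements quoted above; the variable names are mine, and the nominal resolution is recomputed from 25.4 mm per inch rather than copied from the list.

```python
# Measurements quoted in the text.
MM_PER_INCH = 25.4

width_px, width_mm = 1150, 60.7      # text block width
height_px, height_mm = 2277, 120.0   # text block height
image_px = (1340, 2490)              # whole cropped image, width x height

# Two independent estimates of the resolution in dots per millimetre.
res_w = width_px / width_mm
res_h = height_px / height_mm
# Nominal value: a 2400 dpi scan reduced by a factor of 5.
res_nominal = 2400 / MM_PER_INCH / 5

# Plain average of the three estimates.
resolution = (res_w + res_h + res_nominal) / 3
# Physical size of the cropped image in millimetres.
image_mm = tuple(n / resolution for n in image_px)

print(f"estimates: {res_w:.3f}, {res_h:.3f}, {res_nominal:.3f} dpmm")
print(f"mean: {resolution:.2f} dpmm")
print(f"image size: {image_mm[0]:.1f} x {image_mm[1]:.1f} mm")
```

All three estimates agree to within about 0.1 dpmm, which is where the ±0.05 uncertainty comes from.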

Transformation

Fourier transformation converts a greyscale image into an equivalent form showing the phases and amplitudes of sine waves of various frequencies; the position of a pixel defines the angle and frequency of the wave, while the color information for the pixel stores the phase and amplitude/intensity of the wave. In the original image space, each individual wave looks like a regular set of parallel bars with intensity alternating between darker and lighter shades of grey. The wave covers the whole picture and its average intensity is ½ (1=white, 0=black). The phase of a wave determines the positions of its maxima; the interference of waves of many different frequencies, amplitudes and phases recreates the original image. In theory, the transformation does not lose any information and is reversible. When applied to discrete images, rounding errors do occur, but in most cases, including ours, the result of direct transformation (FFT) followed by inverse transformation (IFFT) is visually indistinguishable from the original.
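The reversibility claim is easy to try out. The sketch below uses NumPy rather than the plugins, and a random array as a stand-in for a scanned page, so it illustrates the principle only and says nothing about the plugins' internals.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for a greyscale page: 8-bit values, 0 = black, 255 = white.
image = rng.integers(0, 256, size=(64, 48)).astype(np.uint8)

spectrum = np.fft.fft2(image)            # forward transform (complex array)
restored = np.fft.ifft2(spectrum).real   # inverse transform

# Rounding errors exist, but they are far below one grey level.
max_error = np.abs(restored - image).max()
# After rounding back to 8 bits the round trip is exact.
roundtrip = np.clip(np.rint(restored), 0, 255).astype(np.uint8)

print(max_error, np.array_equal(roundtrip, image))
```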

Alex Chirokov’s FFT plugin converts the lightness channel of any 32-bit image into its FFT form in-place, writing the Fourier image in the resulting image’s H (hue) and L (lightness) channels. The H channel contains phase information; the L channel contains amplitudes. After FFT, the S (saturation) channel of the H-S-L split is set to a grey level of ½ (127) and is not used in reverse transformation. To operate on channels independently, the image is split into three (one per channel) and later recombined (most image editors support HSL splitting/combining; PaintShop represents channels as 8-bit greyscale images). Of the two channels carrying information, the L, which encodes waves’ amplitudes, is more interesting (see picture on the left). The phase channel (H) is harder to interpret and is usually left intact, whereas the intensity channel can be edited, recombined with the other two, and transformed back into original image form.
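For readers without the plugins, the same split into an amplitude plane and a phase plane can be sketched with NumPy. The function names here are hypothetical, and the arrays stay in floating point: I am not reproducing how the plugin packs these values into 8-bit H and L channels.

```python
import numpy as np

def split_spectrum(image):
    """Return (amplitude, phase) of the centred 2-D Fourier transform."""
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    return np.abs(spectrum), np.angle(spectrum)

def combine_spectrum(amplitude, phase):
    """Rebuild an image from amplitude and phase arrays."""
    spectrum = amplitude * np.exp(1j * phase)
    return np.fft.ifft2(np.fft.ifftshift(spectrum)).real

rng = np.random.default_rng(1)
page = rng.random((32, 40))              # hypothetical greyscale data in [0, 1)

amplitude, phase = split_spectrum(page)  # the editable plane and the phase plane
rebuilt = combine_spectrum(amplitude, phase)

print(np.allclose(rebuilt, page))
```

Editing `amplitude` before calling `combine_spectrum` corresponds to editing the L channel and recombining.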

The intensity picture of the original page shows small-to-medium variations of the tone between adjacent pixels (representing waves of close frequency and direction). These changes, when blurred by the eye, give very smooth transitions with very little contrast except for a few places around the middle. The contrast can be artificially enhanced for analysis by averaging and “posterizing” the picture into a small number of fixed intensities. I chose 3-bit posterization (8 intensity levels) and colored the result using map-like pseudocolors (blue to white through green). Also, since the averaging loses important intensity peaks, I added those peaks back using intensity thresholding on the non-averaged channel. The peaks look like small white dots located in the central area.
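The enhancement steps can be approximated in NumPy as follows. This is a rough sketch, not the image-editor workflow I actually used: the box blur, the threshold value, and the function names are all assumptions chosen for illustration, and the pseudocolor mapping is left out.

```python
import numpy as np

def box_blur(channel, radius=1):
    """Average each pixel with its neighbours in a (2*radius+1)^2 window."""
    size = 2 * radius + 1
    padded = np.pad(channel, radius, mode="edge")
    rows, cols = channel.shape
    total = np.zeros((rows, cols), dtype=float)
    for dy in range(size):
        for dx in range(size):
            total += padded[dy:dy + rows, dx:dx + cols]
    return total / size**2

def posterize(channel, bits=3):
    """Quantize values in [0, 1] to integer levels 0 .. 2**bits - 1."""
    n = 2**bits
    return np.minimum((channel * n).astype(int), n - 1)

def mark_peaks(channel, threshold):
    """Boolean mask of pixels at or above the threshold."""
    return channel >= threshold

rng = np.random.default_rng(2)
channel = rng.random((20, 20)) * 0.5     # hypothetical amplitude channel
channel[10, 10] = 1.0                    # one isolated intensity peak

levels = posterize(box_blur(channel))    # averaged, 8 levels
peaks = mark_peaks(channel, 0.9)         # taken from the non-averaged channel

print(levels[10, 10], int(peaks.sum()))
```

In this toy example the blur pulls the isolated peak down into the middle levels, which is why the peaks have to be recovered from the unblurred channel.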

Analysis

When discrete Fourier transformation is applied to a finite image, there are natural limits for the waves’ frequencies: the shortest identifiable wavelength is two pixels, while the longest wavelength is equal to the size of the picture (if the picture is not a square, horizontal and vertical components of the frequency can have different maxima). The FFT image is laid out so that the information concerning the long waves is in the center of the image, while the shortest waves are represented by pixels close to the edges. Since amplitude channels of FFT images are symmetrical around the center, only one half of the image needs to be analyzed.
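These limits and the symmetry can be checked numerically. The sketch below assumes NumPy's conventions (`fftfreq` returns cycles per pixel, `fftshift` moves the zero frequency to the centre); the small random array stands in for a real image.

```python
import numpy as np

height, width = 2490, 1340   # image size in pixels (rows, columns)

# Vertical frequencies in cycles per pixel, reordered so zero sits in the centre.
fy = np.fft.fftshift(np.fft.fftfreq(height))

centre_row, centre_col = height // 2, width // 2   # 1245 and 670
longest = 1 / abs(fy[centre_row + 1])   # first row off centre
shortest = 1 / np.abs(fy).max()         # row at the edge of the image

print(centre_col, centre_row, longest, shortest)

# Amplitudes of a real image are point-symmetric about the centre.
rng = np.random.default_rng(3)
small = rng.random((16, 12))
amp = np.abs(np.fft.fftshift(np.fft.fft2(small)))
cy, cx = 8, 6
symmetric = bool(np.isclose(amp[cy + 3, cx + 2], amp[cy - 3, cx - 2]))
print(symmetric)
```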

The picture on the left displays the central part of the FFT image, showing waves with wavelengths of 0.5 mm and larger. The edges of the FFT image are important for analysis of the higher frequencies responsible for grain, imperfections, scanner noise etc. The nested ellipses in the picture show the areas containing waves with wavelengths larger than 1.0, 0.5, 0.25, and 0.125mm respectively.

First, let’s take a look at the peaks close to the center (I marked them with red dots). The center pixel and pixels around the center encode amplitudes of very long waves, responsible for background color and very smooth transitions of the tone in the original image; they aren't very interesting. There are half a dozen peaks right above and below the center point, spaced at equal intervals. The first three peaks above the center are marked vf, vf*2, and vf*3; they represent the main vertical wave and its harmonics (shorter waves with frequencies which are exact multiples of the main wave). As is the case with any musical instrument, the main tone is accompanied by its harmonics with higher frequencies; when superimposed, they shape the actual waveform, turning the pure sinusoidal shape into something more sophisticated. In our case, harmonics allow for more abrupt transitions between black and white, which will be presented below.

Positions of the peaks can be measured and compared to the central pixel at 670,1245:

vf, 3rd harmonic up: 670,1146 (99 up)

vf, 2nd harmonic up: 670,1179 (66 up)

vf, 1st harmonic up: 670,1212 (33 up)

central point: 670,1245

vf, 1st harmonic dn: 670,1278 (33 dn)

vf, 2nd harmonic dn: 670,1311 (66 dn)

vf, 3rd harmonic dn: 670,1344 (99 dn)

Given the vertical offset of the peak in pixels, the actual wavelength in pixels can be calculated as the quotient of the height of the image and the offset; for the main vertical wave we have 2490/33 = 75.5±2.5pix between maxima. In real terms, dividing by the 18.95±0.05 dpmm resolution, we get a 4.0±0.1mm wavelength (distance between maxima). As you will see below, this wave corresponds to the vertical line period, and its length is equal to the Aldine classics line increment measured by Peter Burnhill (figure 35 in Type spaces).
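The same calculation in Python, as a small sketch using the numbers above (the helper name is mine). A one-pixel uncertainty in the peak position gives the quoted error bounds.

```python
IMAGE_HEIGHT_PX = 2490
RESOLUTION_DPMM = 18.95

def wavelength_px(offset_px, size_px=IMAGE_HEIGHT_PX):
    """Wavelength in pixels of a peak found offset_px away from the centre."""
    return size_px / offset_px

# The fundamental and its two marked harmonics.
for harmonic, offset in enumerate((33, 66, 99), start=1):
    wl = wavelength_px(offset)
    print(f"vf*{harmonic}: {wl:.1f} px = {wl / RESOLUTION_DPMM:.2f} mm")

# A one-pixel uncertainty in the peak position bounds the fundamental.
low, high = wavelength_px(34), wavelength_px(32)
print(f"fundamental between {low:.1f} and {high:.1f} px")
```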

The remaining points of interest are not peaks of high intensity, but rather centers of “hills”, responsible for many diagonal waves of close frequencies and orientation. These centers are marked df1, df2, and df3 (since the FFT image is symmetrical, the same points are present below the center). Here are the positions of the centers and their offsets from the center point:

df3: 581±5,1026±8 (89 left, 219 up)

df2: 581±10,1122±15 (89 left, 123 up)

df1: 586,1220(approx.) (84 left, 25 up)

central point: 670,1245

df1: 754,1270(approx.) (84 right, 25 dn)

df2: 759±10,1368±15 (89 right, 123 dn)

df3: 759±5,1464±8 (89 right, 219 dn)

With these numbers, we can easily calculate the orientation and wavelengths of the diagonal waves. It can be done componentwise, e.g. the horizontal component of df3's wavelength (x) is 1340/89 = 15pix between maxima along the horizontal axis; the vertical component (y) is 2490/219 = 11.4pix. The actual wavelength is x×y/sqrt(x²+y²), which in our case is 9.0±0.3pix or 0.48±0.02mm in real units. The angle is atan(x/y), or 53 degrees clockwise from vertical. Making these calculations for all centers, we get the following results (vf is added for completeness):

vf: 75.5±2.5pix or 4.0±0.1mm, 90° from vertical

df3: 9±0.3pix or 0.48±0.02mm, 53° cw from vertical

df2: 12±1pix or 0.63±0.05mm, 37° cw from vertical

df1: 16pix or 0.85mm (average), 9° cw from vertical
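The table above can be reproduced with the sketch below. It uses the fact that x×y/sqrt(x²+y²) equals one over the length of the frequency vector (dx/width, dy/height), and that atan(x/y) equals atan2 of the vertical and horizontal frequency components; this form also handles vf, where the horizontal offset is zero. The function name is hypothetical.

```python
import math

WIDTH_PX, HEIGHT_PX = 1340, 2490
RESOLUTION_DPMM = 18.95

def wave_from_offset(dx, dy):
    """Return (wavelength_px, angle_deg) for a peak dx, dy pixels off centre."""
    fx = dx / WIDTH_PX     # horizontal frequency, cycles per pixel
    fy = dy / HEIGHT_PX    # vertical frequency, cycles per pixel
    wavelength = 1 / math.hypot(fx, fy)
    # Same as atan(x / y) with x = WIDTH_PX / dx and y = HEIGHT_PX / dy.
    angle = math.degrees(math.atan2(fy, fx))
    return wavelength, angle

# Offsets (pixels left/right, pixels up/down) taken from the lists above.
offsets = {"vf": (0, 33), "df3": (89, 219), "df2": (89, 123), "df1": (84, 25)}

results = {name: wave_from_offset(dx, dy) for name, (dx, dy) in offsets.items()}
for name, (wl, angle) in results.items():
    print(f"{name}: {wl:.1f} px = {wl / RESOLUTION_DPMM:.2f} mm, {angle:.0f} deg")
```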

Visualization

A nice thing about the Fourier transform is that you can invert it and get the original image back. However, it is more interesting to tweak the amplitude channel before making the inverse transformation; this way one can filter out some unneeded frequencies or emphasize the important ones. Basic filters for low and high frequencies just sharpen or soften the image; we will try something fancier here.

The easiest thing to do is to mask the waves we aren’t interested in and see what happens to the image after the FFT’s H, S, and modified L channels are recombined and IFFT is applied to get back to the image space. To mask the waves we don’t need, we can select the useful ones (above and below the center), invert the selection, and reduce the brightness of the selection by, say, 80%. It is important to keep the very center of the FFT amplitude channel unaltered: it is responsible for the overall brightness of the image, and dimming it will make the image black, with most of the information lost to numerical roundoff errors. The result is shown on the left; the individual letterforms are gone and the picture resembles what it actually is: an interference of many waves of various frequencies coming from different angles. However, one can notice that some characteristics of the original picture remain; the lines are blurred but still clearly distinguishable, and even the running head is still visible. This is possible because, although many component waves are lost or suppressed, the phase plane, which keeps the origin points for all waves, is intact and can still determine the layout of the interference picture.
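A minimal NumPy sketch of this masking step follows. It assumes linear amplitudes, so "reduce brightness by 80%" becomes a multiplication by 0.2; the plugin's L channel may be scaled differently. The function name and the synthetic striped "page" are made up for illustration.

```python
import numpy as np

def dim_outside(image, keep_mask, factor=0.2):
    """Scale amplitudes outside keep_mask by factor; keep phase and centre intact."""
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    amplitude, phase = np.abs(spectrum), np.angle(spectrum)
    gain = np.where(keep_mask, 1.0, factor)
    cy, cx = image.shape[0] // 2, image.shape[1] // 2
    gain[cy, cx] = 1.0                       # never dim the overall brightness
    filtered = gain * amplitude * np.exp(1j * phase)
    return np.fft.ifft2(np.fft.ifftshift(filtered)).real

rows, cols = 64, 48
y = np.arange(rows)[:, None]
rng = np.random.default_rng(4)
stripes = 0.5 + 0.4 * np.cos(2 * np.pi * y / 8)          # "lines of text"
page = stripes + 0.1 * rng.standard_normal((rows, cols))  # plus fine detail

# Keep a narrow band of frequencies above and below the centre.
keep = np.zeros((rows, cols), dtype=bool)
keep[:, cols // 2 - 1: cols // 2 + 2] = True

result = dim_outside(page, keep)
print(abs(result.mean() - page.mean()))
```

The mask should be symmetric about the centre, as it is here; taking the real part at the end would otherwise silently symmetrize it.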

Alternatively, we can emphasize the interesting waves by selecting them in the FFT’s amplitude channel and increasing their brightness without touching the rest. The result (shown on the right) has a lot in common with the original picture, but the details of the original letterforms are traded for the intensity of major repeating features and now the relation between the two becomes more obvious. The main vertical wave (called vf on the FFT image) is a line increment; diagonal waves correspond to various repeated font features. The waves can be seen more clearly by emphasizing each one individually, dimming other frequencies and superimposing the picture of the individual wave with the original text. Although the result is somewhat artificial from the pure FFT standpoint, it shows much more clearly which features correspond to each wave.

The most obvious match is the 9° wave (df1); it corresponds to the main repeated stem rate and is angled at the average ascender / descender angle. Its average horizontal period is 0.84mm, which corresponds to 2½ of Burnhill's units (0.333mm × 2.5 = 0.833mm, ½ of the x-height). The next wave, df2, can be attributed to repeated connection strokes and frequent ligatures which have inclined shapes, different from those of standalone letters. At 37°, it is close to the prototype pen angle. The last one, df3, is less pronounced than the first two. Its angle is steeper (53°) and its maxima are closer together (although the horizontal spacing is the same as for df2). Closer inspection allows us to identify it with features of a, e, æ, and possibly with high connection strokes of standalone m and n. The details can be viewed by clicking on the pictures below.

df1: 0.85mm (average), 9° cw
df2: 0.63±0.05mm, 37° cw
df3: 0.48±0.02mm, 53° cw

Acknowledgements

I am grateful to Alex Chirokov for making this work possible and to Andrew Pochinsky for his thoughtful comments.

Miscellanea

To see the role of phase information, observe what happens to the original picture when the FFT phase channel is zeroed. The amplitudes are the same, but all the waves are shifted to a meaningless common origin, which results in an unrecognizable interference picture.
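This experiment is simple to repeat in NumPy. In the sketch below a white square with one dark bar stands in for the page (an assumption for illustration); with the phase set to zero every wave has a maximum at the origin, so the brightest pixel of the result lands in the corner and the bar disappears.

```python
import numpy as np

rows, cols = 64, 64
page = np.ones((rows, cols))
page[20:24, 10:50] = 0.0                 # a dark "line of text" off centre

amplitude = np.abs(np.fft.fft2(page))    # keep the amplitudes
zero_phase = np.fft.ifft2(amplitude).real   # same amplitudes, phase = 0

# The amplitude spectrum is unchanged by the operation.
same_amplitudes = np.allclose(np.abs(np.fft.fft2(zero_phase)), amplitude)
# All waves now peak together at the origin (flat index 0).
brightest = int(np.argmax(zero_phase))

print(same_amplitudes, brightest, np.allclose(zero_phase, page))
```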