As anyone who has ever produced a large document knows, writing it is
just the beginning. In our last installment, I listed
some of the things left to do before I can offer bound copies of Through
Darkest Zymurgia! for sale on-line. I’ve got much of that work done
now; in particular, I’ve chosen the font and the page style, set the
page size to 5″ by 8″, and got the front matter of the book almost
completely ready to go. (You can download a preview
of the front of the book in (what else) PDF format if you’re
interested.)
But there was one big step which I had forgotten–or suppressed, I’m not
sure. And that big step is policing line-breaks or, in a word,
hyphenation.
The soul of TeX is its justification algorithm. TeX is extraordinarily
good at producing high-quality fully justified output that looks as though
it were typeset by hand by a skilled typesetter. Unfortunately, that
beautiful output comes with a cost–by TeX’s standards, not all text is
capable of being beautifully typeset. This usually results in what TeX
calls an “overfull hbox”–that is, a line that it simply can’t break
without introducing “too much” whitespace into the paragraph. In such
cases TeX reports the error and allows the line to run a little long and
stick out into the margin. If desired, it will also mark the error with
a big black box, so that it will be easier to find visually.
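
For anyone who wants to see the black box, here is a minimal sketch, assuming a LaTeX document built on the standard book class (the class and the placeholder text are my assumptions, not necessarily the setup used for Zymurgia). TeX draws the marker whenever \overfullrule is wider than zero; the standard classes' draft option should have the same effect.

```latex
\documentclass{book}

% Any positive width makes TeX draw a black rule in the margin
% beside each overfull line; 0pt hides the rule again.
\overfullrule=5pt

\begin{document}
Placeholder text for the book goes here.
\end{document}
```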
There are several ways to solve the “overfull hbox” problem. TeX is good
at hyphenation, but of course it doesn’t know anything about made-up
words and names, nor is it aware of all of the possible word-breaks even
in standard English. Often it’s possible to solve the problem by
inserting an explicit hyphen here or there.
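
As a sketch of what an explicit hyphen looks like, assuming LaTeX: the \- command marks a place where TeX may break the word, and prints nothing unless the break is actually used. The break points shown here for “Zymurgia” are illustrative guesses, and the sentence is a placeholder rather than text from the book.

```latex
\documentclass{book}
\begin{document}
% \- is a discretionary hyphen: a permitted break point inside a word.
% This occurrence of the word will be broken only at the marked points.
The expedition set out at dawn for the heart of Zy\-mur\-gia.
\end{document}
```

The hint applies only to that one occurrence of the word, so it suits one-off fixes better than names that recur throughout the book.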
In more serious cases the appropriate words in the errant paragraph
simply do not admit of hyphenation. You can’t hyphenate the word “good”,
for example. In such cases, you can tell TeX to be “sloppy” about
formatting the paragraph; this allows it to add more interword space than
it would ordinarily do, and usually solves the problem.
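
In LaTeX, one way to do this is the sloppypar environment, which relaxes the spacing rules for just the paragraph it encloses. This is a minimal sketch with placeholder text; the \sloppy declaration does the same thing for everything that follows it, until \fussy switches back.

```latex
\documentclass{book}
\begin{document}
An ordinary paragraph, set with TeX's usual strict spacing rules.

% Only the paragraph inside the environment is set sloppily.
\begin{sloppypar}
The troublesome paragraph goes here; TeX may now stretch the
spaces between words further than it normally would.
\end{sloppypar}

The paragraphs that follow are set strictly again.
\end{document}
```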
Sloppy formatting has its own perils, however–once in a while it results
in the dreaded “underfull hbox” error. This means that TeX has had to add
too much whitespace between the words on a line, and that its poetic soul
has rebelled. One can ignore “underfull hbox” errors, as TeX inserts the
space anyway, but the annoying thing is that TeX is usually right. Too
much whitespace sticks out like a sore thumb. In this case, you
generally have to modify the text in some way. Sometimes you can split
the paragraph in two; other times, you actually have to change the
wording slightly.
There’s an additional problem associated with hyphenation, which is that
people’s names shouldn’t be hyphenated if it can possibly be avoided. It’s
possible to specify that a word is not to be hyphenated, but all too often
so specifying leads to all of the problems listed above.
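
One common way to forbid hyphenation of a single occurrence, assuming LaTeX, is to wrap the word in \mbox, since TeX never breaks the contents of a box. The sentence below is a placeholder, not text from the book.

```latex
\documentclass{book}
\begin{document}
% The contents of an \mbox are kept on one line, so the name
% cannot be hyphenated here, whatever that does to the spacing.
The lecture was delivered by Professor Leon \mbox{Thintwhistle}.
\end{document}
```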
TeX has no idea whether a word is a person’s name or not; and sometimes
even when hyphenation can’t be avoided it will hyphenate names in the
wrong place. Consider the narrator of Zymurgia, Professor Leon
Thintwhistle. The good professor’s last name is pronounced
“Thint-whistle”, yet TeX decided that it could hyphenate it
“Thin-twhistle”. It’s possible to educate TeX about such matters, but it
requires looking through the finished PDF file for hyphenation problems.
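
The usual way to educate TeX, sketched here on the assumption that the book is a LaTeX document, is a \hyphenation declaration in the preamble. It lists the only places where the word may be broken, and it applies to every occurrence in the document. The sentence in the body is a placeholder.

```latex
\documentclass{book}

% Allow a break only between "Thint" and "whistle".
% Several words can be listed, separated by spaces.
\hyphenation{Thint-whistle}

\begin{document}
Professor Leon Thintwhistle cleared his throat and began.
\end{document}
```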
All of this, I may say, is slow going. I’ve now spent two or three hours
at it, and I’ve made it through chapter 10 (of 41).
It’s not all bad, though. I’m taking the opportunity to add drop caps
at the beginning of each chapter, and as I read through the output
looking for bad line-breaks I’m finding a number of other small errors.
In the next installment–I’m not sure yet. We’ll see.