Semantic Linefeeds

Date:	3 April 2012
Tags:	python, computing, document-processing

I give some advice each year in my annual Sphinx tutorial at PyCon. A grateful student asked where I myself had learned the tip. I have done some archæology and finally have an answer. Let me share what I teach them about “semantic linefeeds,” then I will reveal its source — which turns out to have been written when I was only a few months old!

In the tutorial, I ask students whether or not the Sphinx text files in their project will be read by end-users. If not, then I encourage students to treat the files as private “source code” that they are free to format semantically. Instead of fussing with the lines of each paragraph so that they all end near the right margin, they can add linefeeds anywhere that there is a break between ideas.

By starting a new line at the end of each sentence, and splitting sentences themselves at natural breaks between clauses, a text file becomes far easier to edit and to keep under version control. Text editors are very good at manipulating lines, so when each sentence is a contiguous block of lines, your editor suddenly becomes a very powerful mechanism for quickly rearranging clauses and ideas.
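
I make these breaks by hand as I type, but the rule is mechanical enough to sketch in a few lines of Python. The function below is only a rough illustration, and its name `semantic_linefeeds` is my own invention: it assumes that sentences end with a period, question mark, or exclamation point, and that clauses end at commas, semicolons, and colons. It will happily split an abbreviation like "e.g." in the wrong place, and it knows nothing about natural breaks that are not marked by punctuation.

```python
import re


def semantic_linefeeds(paragraph: str) -> str:
    """Reflow a paragraph so each sentence and clause starts a new line.

    A naive sketch: it looks only at punctuation, not at grammar.
    """
    # Undo any existing hard wrapping by collapsing runs of whitespace.
    text = " ".join(paragraph.split())
    # Start a new line after sentence-ending punctuation...
    text = re.sub(r"(?<=[.!?])\s+", "\n", text)
    # ...and after punctuation that usually closes a clause.
    text = re.sub(r"(?<=[,;:])\s+", "\n", text)
    return text


sample = (
    "The beauty of this scheme is that now, if you\n"
    "change your mind, you win. Try it."
)
print(semantic_linefeeds(sample))
```

Run on that short sample, the sketch puts "The beauty of this scheme is that now," on the first line, "if you change your mind," on the second, and each of the two remaining short sentences on a line of its own.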

And your version-control system will love semantic linefeeds. Have you ever changed a few words at the beginning of a paragraph, only to discover that version control now thinks the whole text has changed?

 ...
 the definition in place of it.

-The beauteous scheme is that now, if you change
-your mind about what a paragraph should look
-like, you can change the formatted output merely
-by changing the definition of ‘‘.PP’’ and
-re-running the formatter.
+The beauty of this scheme is that now, if you
+change your mind about what a paragraph should
+look like, you can change the formatted output
+merely by changing the definition of ‘‘.PP’’
+and re-running the formatter.

 As a rule of thumb, for all but the most
 ...

With every sentence and clause on its own line, you can make exactly the same change to the same paragraph without the rest of the paragraph even noticing:

 ...
 the definition in place of it.

-The beauteous scheme is that now,
+The beauty of this scheme is that now,
 if you change your mind
 about what a paragraph should look like,
 you can change the formatted output
 merely by changing
 the definition of ‘‘.PP’’
 and re-running the formatter.

 As a rule of thumb, for all but the most
 ...
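
The difference between the two diffs above can be measured with Python's standard `difflib` module. This is a hedged sketch rather than a description of how any particular version-control system works: the helper names `changed_lines` and `fill` are hypothetical, the 48-column width is an arbitrary choice, and I have written `.PP` without the troff-style quotes to keep the strings simple.

```python
import difflib
import textwrap


def changed_lines(before, after):
    """Return (removed, added) line counts between two lists of lines."""
    matcher = difflib.SequenceMatcher(a=before, b=after, autojunk=False)
    removed = added = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            removed += i2 - i1
            added += j2 - j1
    return removed, added


def fill(lines, width=48):
    """Re-wrap lines to a fixed width, as an editor's fill command might."""
    return textwrap.wrap(" ".join(lines), width=width, break_on_hyphens=False)


# The paragraph broken by hand at clause boundaries.
SEMANTIC_OLD = [
    "The beauteous scheme is that now,",
    "if you change your mind",
    "about what a paragraph should look like,",
    "you can change the formatted output",
    "merely by changing",
    "the definition of .PP",
    "and re-running the formatter.",
]
# The same edit as in the diffs above: only the opening words change.
SEMANTIC_NEW = ["The beauty of this scheme is that now,"] + SEMANTIC_OLD[1:]

semantic = changed_lines(SEMANTIC_OLD, SEMANTIC_NEW)
filled = changed_lines(fill(SEMANTIC_OLD), fill(SEMANTIC_NEW))

print("semantic linefeeds (removed, added):", semantic)
print("filled paragraph   (removed, added):", filled)
```

With semantic linefeeds the edit touches a single line. Once the paragraph is filled to a fixed width, the longer opening words push text from one line to the next, so the same edit shows up as changes on several lines.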

“Semantic linefeeds,” as I call them, have been making my life easier for more than twenty years, and have governed how my text files look behind the scenes, whether my markup format is HTML, TeX, RST, or the venerable troff macro typesetter.

For a long time I believed that my source must have been the UNIX Documenter's Workbench manual. The Workbench was an attempt by AT&T to market the operating system that had become such a cult hit internally among Bell Labs engineers, by bundling the system with its most powerful typesetting tools. The attempt failed, of course — I am told that AT&T was terrible at marketing computers, just as Xerox had no idea what to do with the ideas that were bubbling at PARC in the 1970s — but my father worked at Bell Labs and had a copy of the Workbench documentation around the house. (I cannot find a copy on the Internet — were all public copies destroyed during the devastating copyright battle that justly brought SCO to its ruin?)

But after an extensive search, I have found an earlier source — and I could not be any happier to discover that my inspiration is none other than Brian W. Kernighan!

He published “UNIX for Beginners” [PDF] as Bell Labs Technical Memorandum 74-1273-18 on 29 October 1974. It describes a far more primitive version of the operating system than his more famous and more widely available “UNIX for Beginners — Second Edition” from 1978. After a long search I have found the lone copy linked above, hosted on an obscure Japanese web page about UNIX 6th Edition which has now disappeared but can still be viewed on the Internet Archive’s Wayback Machine (to which both of the links above point). In the section “Hints for Preparing Documents,” Kernighan shares this wisdom:

Note how Pythonic his advice sounds — he replaces the fiction of “write-once” documents with a realistic focus on making text that is easy to edit later!

I must have read this when I was first learning UNIX and somehow carried it with me all these years. It says something very powerful about the UNIX plain-text approach that advice given in 1974, aimed at making text easier to edit in the terribly cramped ed text editor, applies just as well to our modern world of colorful full-screen editors like Emacs and Vim, and of distributed version control systems that were not even imagined in the 1970s.

If you are interested in more early UNIX documentation — including the Second Edition of Kernighan's “Beginners” guide — check out the 7th Edition manuals which Bell Labs has kindly made available online, both as PDF files and also as plain-text files marked up for the troff typesetter. Note that you can still compile the troff files successfully on a modern system — try that with any other richly-formatted text from the 1970s!